Debugging a Batch Job

This assumes that you cannot attach a debugger to the processes in question. If you can, debugging the live process directly might be the better option.

If a batch job seems to be stuck and its log gives no hint as to why, you can terminate the stuck process in a way that makes it write a core dump, which you can then use to investigate the cause:

kill -s 3 PID
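Signal 3 is SIGQUIT on Linux, so kill -s QUIT PID is equivalent to the command above. As a small sketch, bash's builtin kill can translate between signal numbers and names, and it also decodes an exit status above 128, which helps if the shell reports exit status 131 for your program:

```shell
#!/usr/bin/env bash
# bash's builtin kill translates between signal numbers and names.
kill -l 3      # number -> name: QUIT
kill -l QUIT   # name -> number: 3

# The shell reports "killed by signal N" as exit status 128 + N,
# so a process killed by SIGQUIT (3) shows up as 131.
kill -l 131    # exit status -> name: QUIT
```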

Batch schedulers usually offer a similar command; with Slurm, for example:

scancel -s 3 JOBID

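The -s 3 option sends signal 3 (SIGQUIT) instead of simply terminating the job. Unless the program handles SIGQUIT itself, the default action is to exit and write a core dump. Note that a core file is only written if the core file size limit (ulimit -c) is non-zero; its name and location are determined by the kernel's core_pattern setting.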
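On many systems the core file size limit defaults to 0, so no core file appears. A minimal Slurm job script sketch that raises the limit before starting the program; the job name, time limit, and ./my_program are placeholders, and whether the raised limit reaches the tasks launched by srun depends on the site's Slurm configuration (the PropagateResourceLimits setting):

```shell
#!/bin/bash
#SBATCH --job-name=debug-run
#SBATCH --time=01:00:00

# Allow core files of any size for this shell and the processes it starts.
# This fails if the hard limit is lower; ask your admins in that case.
ulimit -c unlimited

# ./my_program is a placeholder for your executable.
srun ./my_program
```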

You can then load the core dump(s) into your favorite debugger (gdb, DDT, or another).

With gdb that would be:

gdb EXECUTABLE COREDUMP
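For a quick, non-interactive look, gdb's batch mode (-batch combined with -ex) can print the backtrace of every thread and exit. A minimal sketch of a hypothetical helper, print_backtraces, that does this for one or more core files (for example one per MPI rank); the core file names in the usage line are placeholders, since they depend on the system's core_pattern:

```shell
#!/usr/bin/env bash
# Print the backtrace of every thread for each core file given,
# using gdb's non-interactive batch mode.
# Usage: print_backtraces EXECUTABLE CORE [CORE...]
print_backtraces() {
    local exe=$1
    shift
    local core
    for core in "$@"; do
        # Skip arguments that are not existing files (e.g. an unmatched glob).
        if [ ! -f "$core" ]; then
            echo "skipping $core: no such file" >&2
            continue
        fi
        echo "=== $core ==="
        gdb -batch -ex "thread apply all bt" "$exe" "$core"
    done
}
```

Usage, with hypothetical names: print_backtraces ./my_program core.*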
Chris Byrohl
Second year PhD Candidate in Physical Cosmology