Debugging a Batched Job

This guide assumes that you cannot attach a debugger to the processes in question. If you can, attaching directly may be a good alternative.

If a batched job appears to be stuck and its log gives no useful hint as to why, consider cancelling the job in a way that creates a core dump, which you can then use to investigate the cause.

kill -s 3 PID

Schedulers usually provide a similar command, e.g. with Slurm:

scancel -s 3 JOBID

The -s option sends signal 3 (SIGQUIT) to the program. The default action for this signal is to terminate the process and write a core dump.
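A core dump is only written if the core file size limit of the process is not zero, so it can be worth checking this before the job starts. The following is a minimal sketch under that assumption; the helper name core_dumps_enabled and the program name ./my_program are hypothetical and not part of the original workflow.

```shell
# Show the current soft limit for core files; "0" means no core file is written.
ulimit -c

# Hypothetical helper: succeeds only if core dumps are currently enabled.
core_dumps_enabled() {
    [ "$(ulimit -c)" != "0" ]
}

# In a job script, the limit could be raised before starting the program, e.g.:
#   ulimit -c unlimited
#   ./my_program
```

Whether the limit can be raised, and where the core file ends up, depends on how the cluster is configured.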

You can then load the core dump(s) with your favorite debugger (gdb, ddt, or another).

With gdb that would be:
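A minimal sketch, assuming the executable is ./my_program and the core file is named core.12345. Both are placeholders, since the real core file name and location depend on the system configuration. The helper name core_backtrace is hypothetical.

```shell
# Interactive use: load the executable together with its core file,
# then type `bt` at the (gdb) prompt to see where the program was stuck:
#   gdb ./my_program core.12345

# Hypothetical helper: print a backtrace of every thread in a core dump
# without entering an interactive session.
# $1: path to the executable, $2: path to the core file.
core_backtrace() {
    gdb -batch -ex "thread apply all bt" "$1" "$2"
}

# Example call (placeholder paths):
#   core_backtrace ./my_program core.12345
```

If the job produced one core file per process, the helper can be called once per file. The backtraces are most readable when the program was compiled with debug symbols (-g).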

Chris Byrohl
PhD Student

My research interests include Lyman-$\alpha$ radiation to study galaxies and the large-scale structure, supernovae type Ia and high-performance computing.