Monitoring Jobs on compute.cla
Checking Job Status
To see the status of your job, run the qstat command at the command prompt. The result lists each job on its own line.
The output is fairly self-explanatory. The main item to note is the State (“S”) column, where “R” indicates that the job is running. Other entries you may see in that column are “Q” for “queued”, “E” for “exiting”, and “C” for “completed”.
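As a rough illustration of the state codes above, the following shell sketch wraps the status check and spells out the one-letter codes. The function names my_jobs and job_state_name are hypothetical, and the sketch assumes your qstat accepts the -u option to limit the listing to one user's jobs.

```shell
# Hypothetical helpers; the function names are illustrative, not part of PBS.

# List one user's jobs (defaults to the current user).
# Assumes qstat supports "-u <user>".
my_jobs() {
    qstat -u "${1:-$USER}"
}

# Translate a one-letter code from the "S" column into a word.
job_state_name() {
    case "$1" in
        R) echo "running" ;;
        Q) echo "queued" ;;
        E) echo "exiting" ;;
        C) echo "completed" ;;
        *) echo "other ($1)" ;;   # any code not described in this document
    esac
}
```

For example, “job_state_name Q” prints “queued”.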
The “qstat -f” command gives more information on the jobs you have in the queue, including, for example, the execution host(s), variable list, and walltime remaining. More information on the qstat command can be found on its man page (“man qstat”).
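The full listing can be long, so it is sometimes handy to pull out only a few fields. The sketch below is one way to do that. The function name job_summary is hypothetical, and the field names matched by the pattern (job_state, exec_host, and anything containing “walltime”) are assumptions about what your qstat prints; adjust the pattern to match your own output.

```shell
# Hypothetical helper: show a few selected fields from "qstat -f" for one job.
# The field names in the pattern are assumptions; check your own output.
job_summary() {
    qstat -f "$1" | grep -E -i 'job_state|exec_host|walltime'
}
```

Usage: “job_summary 9876”, where 9876 stands for your own job number.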
Checking Job Array Status
To check the status of an entire job array, run qstat with the -t option. Each array element appears as a separate job in the queue, and the normal scheduling rules apply to each element. The name of the array is the job number assigned by PBS followed by a set of brackets. For example, if the assigned job number is 9876, the entire job array is denoted as 9876[] and the individual jobs as 9876[1], 9876[2], 9876[3] and so on, with the number in brackets being the element’s array index.
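To look at the elements of a single array, the “qstat -t” listing can be filtered by job number. The following is a minimal sketch. The function name array_elements is hypothetical, and it assumes each element's job ID appears at the start of its line as the job number followed by a bracketed index.

```shell
# Hypothetical helper: list only the elements of one job array.
# Assumes element IDs start the line as <jobnumber>[<index>].
array_elements() {
    # \[ matches a literal opening bracket right after the job number
    qstat -t | grep "^$1\["
}
```

Usage: “array_elements 9876”. If you pass an array name such as 9876[] directly to a command, quote it, since unquoted brackets are pattern characters in most shells.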
Checking Job Logs
By default, PBS writes both stdout and stderr to log files in the job submission directory. (See the document for submitting jobs for information on how to have PBS log to another location.) If your job doesn’t run as expected, check the stderr log for errors. If you submit a large batch of jobs, an easy way to check for errors is to look for stderr files whose size is greater than 0.
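One way to run that check is sketched below; it is an example under stated assumptions, not necessarily the exact command used on this system. It assumes the default naming for stderr files (<jobname>.e<jobid>), that the logs sit directly in the directory you pass in, and that the file names contain no spaces. The function name check_stderr_logs is hypothetical.

```shell
# Hypothetical helper: find non-empty stderr logs in a directory and print
# only the lines that do not mention "loaded" (module load/unload messages).
check_stderr_logs() {
    dir="${1:-.}"
    # -size +0c selects files larger than 0 bytes.
    # Assumes the default stderr file name <jobname>.e<jobid>.
    find "$dir" -maxdepth 1 -type f -name '*.e[0-9]*' -size +0c | xargs grep -iv loaded
}
```

Usage: run “check_stderr_logs” in the job submission directory, or pass the directory as the first argument.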
Note: Module file loads and unloads are written to stderr, so Torque error logs will be non-empty whenever your job loads or unloads a module. Piping the list of non-empty stderr files through “| xargs grep -iv loaded” is a workaround that filters out those lines. If you aren’t loading any modules when you run your job, you can leave out the “| xargs grep -iv loaded” part and just check for error files that have a size greater than 0.