Submitting jobs on compute.cla


Interactive vs batch jobs

 

There are two basic ways to run jobs on the cluster -- interactively or via a batch script.

What is an interactive job?

Interactive jobs are the simplest way to use the cluster. You log in, run commands which execute immediately, and log off when you’re finished. You can use either the command line or a graphical environment when running an interactive job.

 

When should you use an interactive job?

  • Short tasks

  • Tasks that require frequent user interaction

  • Graphically intensive tasks

 

Submitting an interactive job

To submit an interactive job, ssh to compute.cla.umn.edu and enter "qsub -IX" on the command line. This will drop you onto a compute node where you can run your interactive job.

 

    user@somehost:~$ ssh -X compute.cla.umn.edu

 

    user@compute:~$ qsub -IX
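
The same -l resource syntax used for batch jobs (described below) also applies to interactive jobs. For example, to request two processors and a two-hour wall-clock limit (the values here are illustrative):

    user@compute:~$ qsub -I -X -l nodes=1:ppn=2,walltime=02:00:00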

 

 

What is a batch job?

 

A batch job involves writing a script that specifies the tasks you want to run. The script is then submitted to the job scheduler and runs without user interaction. This is an efficient way to leverage the power of the cluster: once your batch job has been submitted, you can log off and wait for it to complete.

 

When should you use a batch job?

  • Longer running processes

  • Parallel processes

  • Running large numbers of short jobs simultaneously

  • Tasks that can be left running for a significant amount of time without any interaction

 

Submitting a batch job

The easiest way to submit a batch job is to first create a Portable Batch System (PBS) script which defines the commands and cluster resources that will be needed to run the job. This script is then submitted to PBS using the qsub command.

Creating a PBS Script

To set the parameters for your job, you can create a file that contains the commands to be executed. Typically, this is in the form of a PBS script. You can find a list of some of the more commonly-used PBS directives in the Useful PBS Parameters section below.

Here is a sample PBS file, named myjob.pbs, followed by an explanation of each line of the file:

 

#!/bin/bash

#PBS -q batch
#PBS -l nodes=1:ppn=2
#PBS -l walltime=01:00:00
#PBS -l mem=500mb

cd $PBS_O_WORKDIR/
module load stata/14.1
stata-se -b do test.do

 

  • The first line in the file identifies which shell will be used for the job. In this example, bash is used, but csh or other valid shells would also work.

  • The second line specifies which queue to use. In this case, we are submitting the job to the “batch” queue.

  • The third line specifies the number of nodes and processors desired for this job. In this example, one node with two processors is being requested. Note: the "-l" flag is a lowercase "L" (for "resource_list"), not the number one. There are no spaces around the "=" and ":" signs.

  • The fourth line specifies how much wall-clock time is being requested. The format for the walltime option is hh:mm:ss. In this example, one hour of wall time has been requested.

  • The fifth line in the PBS file requests a maximum of 500 MB of physical memory. Note that on Linux, the "-l mem" directive is ignored if the number of nodes being requested is not 1.

  • The sixth line tells the cluster to cd to the directory from which the batch job was submitted. By default, a new job starts in your home directory. Including this line in your script makes it convenient to edit a script and then submit the job from the same directory.

  • The seventh line loads the Stata 14.1 environment module in preparation for running a Stata script (a quick sketch of browsing available modules follows this list). More information on environment modules can be found on the LATIS website.

  • The last line tells the cluster to run the program. In this example, it runs Stata in batch mode, executing test.do from the same directory from which the job was submitted.
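
If you are unsure which versions of a package are available, the standard Environment Modules commands can list them before you load one (a quick sketch; the module names and versions shown are from the example above and may differ on the cluster):

    user@compute:~$ module avail stata
    user@compute:~$ module load stata/14.1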

Submitting the job

Once you have your job script ready to go, you will need to submit it to the cluster. To run the job, enter the following command on compute.cla.umn.edu:

 

    user@compute:~$ qsub myjob.pbs
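
qsub prints the ID assigned to your job. You can then monitor the job's progress with qstat; for example, to list all of your own jobs:

    user@compute:~$ qstat -u $USER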

 

 

Job Arrays

 

Job arrays can be used to simplify the task of submitting multiple similar jobs. For example, let’s say you want to run 10 jobs using the same script to analyze 10 different data files. Rather than submit 10 jobs individually, using a job array allows you to submit a single job. PBS will then create the 10 individual jobs using the requested script and file parameters. Please note that job arrays are inherently serial unless the code itself is parallelized.

 

Submitting a Job That Uses an Array

 

An easy way to prepare your data files for submission with a job array is to rename them by appending sequential numbers to each file's name, such as data-1, data-2, etc. Here is an example of a PBS script that uses a job array to run a script 10 times, each time with a different input file, starting with data-1 and ending with data-10:
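
If your input files are not already named this way, a short shell loop can do the renaming. This is a minimal sketch that assumes your inputs are the .csv files in the current directory; adjust the glob and target names to match your data:

#!/bin/bash

# Rename inputs to data-1, data-2, ... (in the glob's lexicographic order)
i=1
for f in *.csv; do
    mv "$f" "data-$i"
    i=$((i+1))
done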

 

#!/bin/bash

#PBS -q batch
#PBS -l nodes=1:ppn=2
#PBS -l walltime=01:00:00

#PBS -t 1-10

cd $PBS_O_WORKDIR/
./myscript -input=data-${PBS_ARRAYID}

 

 

The -t parameter sets the range of the PBS_ARRAYID variable. In the above example, the parameter 1-10 causes qsub to create 10 jobs in the array, each running the same script with PBS_ARRAYID set to a different value from 1 to 10, and therefore each using a different input data file from data-1 to data-10. The argument to the -t parameter can be a single integer ID or a range of integers, and multiple IDs or ID ranges can be combined in a comma-delimited list (e.g., -t 1,10,50-100).
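
Torque displays a job array under a single array job ID; you can list the individual sub-jobs with qstat's -t flag (the job ID shown here is illustrative):

    user@compute:~$ qstat -t 12345[]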

 

You can also limit the number of jobs that will run simultaneously by specifying the number of job slots that you want. For example, if you change the -t parameter in the above example to:

 

#PBS -t 1-10%5

 

you are specifying an array with 10 elements as before, but the "%5" at the end tells the system that only 5 should be running at any one time. Limiting the number of simultaneous jobs in this manner can be useful when you are sharing limited cluster resources with others.

 

 

Useful PBS Parameters

   

This is a partial list of some of the more commonly-used PBS parameters. A complete list can be found in the qsub man page or the Torque Administrator Guide.
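
The man page is available directly on the cluster:

    user@compute:~$ man qsub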


#PBS -N stata-test
    Sets the name of the job that will be seen in the qstat output. If not set, the name defaults to the name of the script.

#PBS -o myprog.out
    Where to write stdout. Defaults to $PBS_JOBNAME.o$PBS_JOBID in the job submission directory.

#PBS -e myprog.err
    Where to write stderr. Defaults to $PBS_JOBNAME.e$PBS_JOBID in the job submission directory.

#PBS -o mylogs/
    Write stdout logs to the mylogs subdirectory of the job submission directory. You can also specify the filename, such as mylogs/myprog.out. Note that when specifying just a directory, you need to include the trailing slash.

#PBS -e mylogs/
    Write stderr logs to the mylogs subdirectory of the job submission directory. As above, you can also specify the filename, but you need to include the trailing slash when specifying just a directory.

#PBS -j $arg
    The argument to this directive determines how the standard error and standard output streams will be joined. The $arg argument can be one of the following:

  • oe - both output streams will be merged, intermixed, as standard output.

  • eo - both output streams will be merged, intermixed, as standard error.

  • n - the two streams will be kept as separate files. This is also the default if the "-j" directive is not used.

#PBS -m abe
    Send an email when the job aborts, begins, or ends.

#PBS -m $mail_options
    Defines the set of conditions under which the execution server will send a mail message about the job. The mail_options argument is a string consisting of either the single character "n" or "p", or one or more of the characters "a", "b", "e", and "f":

  • a - mail is sent when the job is aborted.

  • b - mail is sent when the job begins execution.

  • e - mail is sent when the job terminates.

  • f - mail is sent when the job terminates with a non-zero exit code.

  • n - no normal mail is sent.

  • p - no mail is sent.

#PBS -S /bin/$shell
    Sets the shell to be used in executing your script. If left out, it defaults to your normal login shell. Typical values for the $shell argument are /bin/bash, /bin/tcsh, /bin/csh, or /bin/sh.

#PBS -V
    Export all environment variables in the qsub command environment to the batch job environment.
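
Putting several of these directives together, a job script header might look like the following sketch (the job name, log directory, and resource values are illustrative; the Stata commands are taken from the earlier example):

#!/bin/bash

#PBS -N stata-test
#PBS -q batch
#PBS -l nodes=1:ppn=2
#PBS -l walltime=01:00:00
#PBS -o mylogs/
#PBS -j oe
#PBS -m abe

cd $PBS_O_WORKDIR/
module load stata/14.1
stata-se -b do test.do

With -j oe, each job writes a single combined log file to mylogs/; note that the mylogs directory generally must exist before the job runs.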

 
