The default job scheduler on Gemini is SLURM. SLURM has replaced Sun Grid Engine (SGE) as the job scheduling system, and as a result any previously developed workflows need to be modified to work with SLURM. Equivalent commands and instructions for using the most common features are described below.
Jobs can be launched directly from the command line with srun (or salloc), or submitted via a batch script with sbatch. For example:
srun -n 8 my_application <application arguments>
salloc -n 8 -N 1 mpirun <mpirun arguments> my_application <application arguments>
In the above examples, -n 8 requests eight CPUs, and adding -N 1 restricts those eight CPUs to a single compute node. In either case the application must already be parallelized (e.g. via OpenMP or MPI).
Alternatively all of the necessary information can be placed inside the submission script. For example:
#!/bin/bash
#SBATCH -n 4
module load shared openmpi/gcc
mpirun --mca btl openib --report-bindings sleep 10
In the above example, options for the job scheduler are provided as #SBATCH directives at the top of the script. The modules needed for the job are loaded, and then the command for the specific application is given. In this case we're using OpenMPI over InfiniBand, and each process will execute the command 'sleep' for 10 seconds.
The job would then be dispatched to the scheduler using sbatch, e.g. sbatch my_script.sh. A few points to note:
- By default jobs run in the directory they were submitted from.
- For MPI jobs you do not need to specify a hosts file. SLURM takes care of this for you.
- Environment variables can be explicitly exported (see below), and SLURM sets a number of variables you can access from your scripts.
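As a sketch of the last point, a batch script can read the variables SLURM sets for it. The variable names below are standard Slurm; the `${VAR:-fallback}` expansions are only there so the script degrades gracefully when run outside a job:

```shell
#!/bin/bash
#SBATCH -n 4

# Slurm exports these variables into the job's environment at run time.
echo "Job ID:     ${SLURM_JOB_ID:-not set}"
echo "Task count: ${SLURM_NTASKS:-not set}"
echo "Node list:  ${SLURM_JOB_NODELIST:-not set}"
echo "Submit dir: ${SLURM_SUBMIT_DIR:-$PWD}"
```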
| SLURM Command | Description | SGE Equivalent |
|---|---|---|
| sbatch | Used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks. | qsub |
| squeue | Reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order. | qstat |
| scancel | Used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step. | qdel |
| sinfo | Reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options. | qhost |
| sbcast | Used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system. | N/A |
| sview | Graphical user interface to get and update state information for jobs, partitions, and nodes managed by Slurm. | qmon |
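For day-to-day use, the SGE-to-SLURM translation typically looks like the following session (the job ID and script name are illustrative):

```shell
sbatch myjob.sh    # was: qsub myjob.sh  (prints the assigned job ID)
squeue -u $USER    # was: qstat          (show only your own jobs)
scancel 12345      # was: qdel 12345     (cancel by job ID)
sinfo              # was: qhost          (partition/node status)
```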
| Specification | SLURM (sbatch) | SGE (qsub) |
|---|---|---|
| Queue | -p [queue] | -q [queue] |
| Node Count | -N [min[-max]] | N/A |
| CPU Count | -n [count] | -pe [PE] [count] |
| Generic Resources | --gres=[resource_spec] | -l [resource]=[value] |
| StdOut/StdErr | -o [file_name] / -e [file_name] | -o [file_name] / -e [file_name] |
| Copy Environment | --export=[ALL \| NONE \| variables] | -V |
| Email Address | --mail-user=[address] | -M [address] |
| Job Name | --job-name=[name] | -N [name] |
| Working Directory | --workdir=[dir_name] | -wd [directory] |
| Tasks Per Node | --ntasks-per-node=[count] | Fixed Allocation |
| CPUs Per Task | --cpus-per-task=[count] | N/A |
| Job Dependency | --dependency=[state:job_id] | -hold_jid [job_id \| job_name] |
| Job Arrays | --array=[array_spec] | -t [array_spec] |
| Job Host Preference | --nodelist=[nodes] and/or --exclude=[nodes] | -q [queue]@[node] or -q [queue]@@[hostgroup] |

| Environment Variable | SLURM | SGE |
|---|---|---|
| Job Array Index | $SLURM_ARRAY_TASK_ID | $SGE_TASK_ID |