This page describes the commands used to run jobs with Slurm on SRVOAD. Note that running a program on SRVOAD requires allocating resources so as not to interfere with other users' experiments; Slurm manages this resource allocation when you use one of the three commands below.
salloc
salloc is used to allocate a Slurm job allocation, which is a set of resources (CPUs), possibly with some set of constraints (e.g. time limit, memory per CPU). When salloc successfully obtains the requested allocation, it runs the command specified by the user. When that command completes, salloc relinquishes the job allocation.
The command may be any program the user wishes; typical commands include xterm or a shell script containing srun commands. If no command is specified, salloc runs the user's default shell.
$ salloc --cpus-per-task=1 --mem-per-cpu=100mb
salloc: Granted job allocation 130214
$ exit
exit
salloc: Relinquishing job allocation 130214
salloc: Job allocation 130214 has been revoked.
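If a command is passed to salloc, it runs inside the allocation and the resources are released as soon as it exits. A minimal sketch (./myscript.sh is a hypothetical placeholder; this requires a Slurm cluster to actually run):

```shell
# Run a script inside a temporary allocation of 2 CPUs;
# the allocation is relinquished when the script exits.
# (./myscript.sh is a hypothetical placeholder)
salloc --cpus-per-task=2 --mem-per-cpu=100mb ./myscript.sh
```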
srun
Run a parallel job on SRVOAD. If necessary, srun will first create a resource allocation in which to run the parallel job.
To run your code, you need to specify several parameter values to allocate resources:
Option | Short | Description |
---|---|---|
--nodes=1 | -N1 | Number of nodes used (SRVOAD has only one node) |
--ntasks=1 | -n1 | Number of tasks to be run |
--cpus-per-task=1 | -c1 | Number of CPUs used by each task |
--time=0-00:30:00 | -t0-00:30:00 | Time limit (d-hh:mm:ss) |
--mem-per-cpu=100mb | ... | Minimum amount of memory allocated per CPU |
--mem=100mb | ... | Maximum amount of real memory allocated per node |
$ srun --nodes=1 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=100mb --time=0-00:00:30 sleep 20 &
$ srun -N1 -n1 -c1 -t1 --mem=100mb sleep 20 &
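The trailing "&" puts each srun in the background, so several job steps can run concurrently. A sketch of launching two steps and waiting for both (requires a Slurm cluster to actually run):

```shell
# Launch two job steps concurrently, then block until both finish
srun -N1 -n1 -c1 -t1 --mem=100mb sleep 20 &
srun -N1 -n1 -c1 -t1 --mem=100mb sleep 20 &
wait  # returns once both background srun commands have completed
```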
sbatch
sbatch submits a batch script to Slurm. The batch script may be given to sbatch through a file name on the command line, or if no file name is specified, sbatch will read in a script from standard input. The batch script may contain options preceded with "#SBATCH" before any executable commands in the script. sbatch will stop processing further #SBATCH directives once the first non-comment non-whitespace line has been reached in the script.
Caution: The sbatch command runs one single job split into several tasks, meaning that your tasks will be executed if and only if there are enough resources available to run all the tasks simultaneously. A better practice for running several independent tasks in parallel is to use job arrays.
sbatch myscript.sh
Here is an example batch script:
#!/bin/bash
#SBATCH --job-name=parallel_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (Not available on SRVOAD)
#SBATCH --mail-user=email@imt-atlantique.fr # Where to send mail
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=1 # Number of processes
#SBATCH --cpus-per-task=4 # Number of CPU per task
#SBATCH --mem=1gb # Total memory limit
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
#SBATCH --output=example_%j.log # Standard output and error log
./my-app arg1 arg2
This script will run one task using 4 CPUs and at most 1gb of memory for at most 1 hour, and will write its output to example_%j.log (%j is replaced by the job allocation id).
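A typical submit-and-check workflow for such a script might look as follows (requires a Slurm cluster; the log file name is illustrative, with the actual job id substituted for %j):

```shell
sbatch myscript.sh        # prints "Submitted batch job <jobid>"
squeue                    # check the job state (R = running)
cat example_<jobid>.log   # inspect the output once the job has ended
```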
A second example batch script:
#!/bin/bash
#SBATCH --job-name=parallel_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (Not available on SRVOAD)
#SBATCH --mail-user=email@imt-atlantique.fr # Where to send mail
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=10 # Number of processes
#SBATCH --cpus-per-task=1 # Number of CPU per task
#SBATCH --mem=1gb # Total memory limit
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
#SBATCH --output=example_%j.log # Standard output and error log
for i in {1..10}
do
./my-app arg1 $i &
done
wait
This will run 10 different tasks in parallel. Note that they will appear as a single job if you use squeue and that you can't separate the output into a different file for each task (to do that, you can refer to the Job array section).
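The fork/join pattern used in the script (background each command with "&", then wait) can be checked in plain bash, independently of Slurm; three 1-second tasks started in the background finish in about 1 second rather than 3:

```shell
#!/bin/bash
# Three "tasks" of 1 second each, run concurrently
start=$SECONDS
for i in 1 2 3
do
    sleep 1 &        # "&" sends each task to the background
done
wait                 # block until all background tasks are done
echo "elapsed: $((SECONDS - start))s"
```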
Example:
$ sbatch myscript.sh
Submitted batch job 130977
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
130977 ls2n parallel user R 0:08 1 srvoad
Common errors:
- "my sbatch option does not seem to be set"
  - bash will not interpret the #SBATCH options; make sure to use sbatch
  - sbatch stops interpreting directives as soon as it encounters anything other than whitespace, a comment, or a #SBATCH option
- "my tasks are not performed in parallel"
  - make sure that you set the correct number of tasks in the options
  - make sure that you put an "&" at the end of each command line (in your file)
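The first error can be reproduced directly: when a script is started with plain bash, the #SBATCH lines are ordinary comments, and Slurm never sets the corresponding environment variables (here SLURM_CPUS_PER_TASK, which Slurm derives from --cpus-per-task):

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4   # ignored by bash: this line is just a comment
# Run with `bash script.sh` and the variable stays unset;
# submitted with `sbatch script.sh`, Slurm would set it to 4.
echo "CPUS=${SLURM_CPUS_PER_TASK:-unset}"
```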