# Create Slurm job arrays

Authored by LOGER Benoit.
Another approach to running several independent tasks in parallel is to use Slurm job arrays. This only requires adding a single *#SBATCH* option to your sbatch file. Here is a simple example using the *--array* option:
```
#!/bin/bash
#SBATCH --job-name=myarray_job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array=1-10%5
#SBATCH --mem-per-cpu=1gb
#SBATCH --time=0-00:30:00
#SBATCH --output=slurm-%j.log
./myapp arg1 arg2
```
This script (note the *--array=1-10%5* option) submits one *parent* job that spawns 10 *child* jobs, each performing a single task.
**What is different?**
- The *parent* job starts a new *child* job every time enough compute resources are available (instead of waiting until all of them can run in parallel)
- You can cap how many *child* jobs run in parallel: *--array=1-10%5* will run at most 5 *child* jobs simultaneously
- You can define the set of identifiers for your *child* jobs (e.g. *--array=1,5,6*)
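For reference, the *--array* option accepts several index forms. The fragment below lists the common ones side by side for comparison; an actual script would contain only one *--array* line:

```shell
#SBATCH --array=1-10        # indices 1 to 10
#SBATCH --array=1-10%5      # indices 1 to 10, at most 5 running at once
#SBATCH --array=1,5,6       # an explicit list of indices
#SBATCH --array=0-20:4      # indices 0,4,8,12,16,20 (a step of 4)
```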
## Separated outputs
Job arrays also make it easy to define a separate output file for each *child* job.
```
#SBATCH --output=slurm-%A-%a.log
```
This option creates one output file per *child* job (*%A* is replaced by the ID of the *parent* job and *%a* by the ID of the *child* job).
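As an illustration (outside Slurm), the shell loop below mimics the *%A*/*%a* substitution to show the file names this option would produce for a hypothetical parent job 130995 with child jobs 1 to 3:

```shell
# Mimic Slurm's %A/%a substitution (illustration only; Slurm does this itself).
A=130995                  # %A: parent job ID (hypothetical)
for a in 1 2 3; do        # %a: child job IDs
  logfile="slurm-${A}-${a}.log"
  echo "$logfile"
done
```

Each child job therefore writes to its own file, e.g. `slurm-130995-1.log`.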
## Using configuration files
Running a job array sets several Slurm environment variables, in particular *SLURM_ARRAY_TASK_ID*.
This variable can then be used in your sbatch file to look up per-task information in a configuration file.
Here is an example of how you can use Slurm variables to configure the execution of your jobs.
```
#!/bin/bash
#SBATCH --job-name=myarray_job # Name of the parent job
#SBATCH --ntasks=1 # Each child job runs 1 task
#SBATCH --cpus-per-task=1 # Each task requires 1 CPU
#SBATCH --array=1-10%5 # Run 10 child jobs with IDs in [1,10], at most 5 at a time
#SBATCH --mem-per-cpu=1gb # Using at most 1gb of memory per cpu
#SBATCH --time=0-00:30:00 # Child jobs will be killed if longer than 30 minutes
#SBATCH --output=logs/array_%A-%a.log # One log file per child job (the logs/ directory must exist beforehand)
# Specify the path to the config file
config=config.txt
# Extract the instance number for the current $SLURM_ARRAY_TASK_ID
inst=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $2}' $config)
# Extract the value of a parameter for the current $SLURM_ARRAY_TASK_ID
param=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $3}' $config)
# Execute my application/code with the parameters specified in my configuration file
./myapp $inst $param
```
And the corresponding configuration file *config.txt*:
```
ArrayTaskID Instance Parameter
1 1 15
2 2 15
3 3 15
4 4 15
5 5 20
6 6 20
7 7 20
8 8 30
9 9 30
10 10 30
```
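The awk lookup used above can be tested outside Slurm by setting *SLURM_ARRAY_TASK_ID* by hand (here to 5, against an excerpt of the configuration file); under Slurm the variable is set automatically for each child job:

```shell
# Standalone sketch of the awk lookup: SLURM_ARRAY_TASK_ID is set by hand here.
config=$(mktemp)
cat > "$config" <<'EOF'
ArrayTaskID Instance Parameter
1 1 15
2 2 15
3 3 15
4 4 15
5 5 20
EOF
SLURM_ARRAY_TASK_ID=5
# Print column 2 (Instance) and column 3 (Parameter) of the matching row
inst=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $2}' "$config")
param=$(awk -v ArrayTaskID=$SLURM_ARRAY_TASK_ID '$1==ArrayTaskID {print $3}' "$config")
echo "inst=$inst param=$param"   # prints: inst=5 param=20
rm -f "$config"
```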
## Running the example
```
$ sbatch script_array_job.sh
Submitted batch job 130995
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
130995_[6-10%5] ls2n myarray_ b19loger PD 0:00 1 (JobArrayTaskLimit)
130995_1 ls2n myarray_ b19loger R 0:02 1 srvoad
130995_2 ls2n myarray_ b19loger R 0:02 1 srvoad
130995_3 ls2n myarray_ b19loger R 0:02 1 srvoad
130995_4 ls2n myarray_ b19loger R 0:02 1 srvoad
130995_5 ls2n myarray_ b19loger R 0:02 1 srvoad
```
For more advanced usage and more information, check the [Slurm documentation](https://slurm.schedmd.com/job_array.html).