Sample Slurm Scripts¶
Below are a number of sample scripts that can be used as a template for building your own
Slurm submission scripts for use on HiPerGator. These scripts are
also located at: /data/training/Slurm/
, and can be copied from there. As with any script you copy, if
you choose to copy one of these sample scripts, please make sure you
understand what each #SBATCH
directive does before using the script
to submit your jobs. Otherwise, you may not get the result you want and
may waste valuable computing resources.
Please see the SchedMD sbatch documentation for more detailed explanations of each of the sbatch
options below.
Note
There is a maximum limit of 3000 jobs per user.
See Annotated Slurm Script for a step-by-step explanation of all options.
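Once you have a submission script, it is submitted and monitored with the standard Slurm client commands. A minimal example (the script name below is a placeholder):
sbatch my_job_script.sh      # submit the script; Slurm prints the assigned job ID
squeue -u $USER              # list your pending and running jobs
scancel JOBID                # cancel a job by its ID if needed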
A Note on Memory Requests¶
Many users request far more memory (RAM) than their jobs use
(sometimes 100-10,000 times!). Groups often
find themselves with jobs pending due to having reached their memory
limits (QOSGrpMemLimit
listed in the squeue
output).
While it is important to request more memory than will be used (10-20% is usually sufficient), requesting 100x, or even 10,000x, more memory only reduces the number of jobs that a group can run, as well as overall throughput on the cluster. Many groups, and our overall user community, will be able to run far more jobs if they request more reasonable amounts of memory.
The email sent when a job finishes shows users how much memory the job
actually used and can be used to adjust memory requests for future jobs.
The Slurm directives for memory requests are --mem
or --mem-per-cpu
.
It is in the user’s best interest to adjust the memory request to a more
realistic value.
Requesting more memory than needed will not speed up analyses.
Based on their experience of finding their personal computers run faster when adding more memory, users often believe that requesting more memory will make their analyses run faster. This is not the case.
An application running on HiPerGator will have access to all of the memory it requests, and we never swap RAM to disk. If an application can use more memory, it will get more memory. Only when the job crosses the limit based on the memory request does Slurm kill the job.
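To right-size future requests, you can also check how much memory a completed job actually used with sacct (a standard Slurm accounting command; the exact fields available can vary with site configuration). MaxRSS reports the peak memory the job steps actually used:
sacct -j JOBID --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State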
Basic, Single-Threaded Job¶
This script can serve as the template for many single-processor
applications. The --mem
flag can be used to request the
appropriate amount of memory for your job. Please make sure to test your
application and set this value to a reasonable number based on actual
memory use. The %j
in the --output
line tells Slurm to substitute
the job ID in the name of the output file. You can also add a -e
or
--error
line with an error file name to separate output and error
logs.
A single-core job example
#!/bin/bash
#(1)!
#SBATCH --job-name=serial_job_test # Job name (2)
#SBATCH --mail-type=END,FAIL # Mail events (3)
#SBATCH --mail-user=email@ufl.edu # Where to send mail (4)
#SBATCH --ntasks=1 # Run on a single CPU (5)
#SBATCH --mem=1gb # Job memory request (6)
#SBATCH --time=00:05:00 # Time limit hrs:min:sec (7)
#SBATCH --output=serial_test_%j.log # Standard output and error log (8)
pwd; hostname; date #(9)!
module purge #(10)!
module load python #(11)!
echo "Running plot script on a single CPU core" #(12)!
python /data/training/Slurm/plot_template.py #(13)!
date #(14)!
-
The line above should be the first line of your script. It indicates that the script is a bash script, the only type of script that Slurm takes as input.
-
Name the job to make it easier to see in the job queue. By default, only the first eight characters of the job name are displayed by squeue
, so focus on the start of the name to distinguish among your jobs. See the Slurm documentation for full details and options.
-
What emails you want to receive about your job. Options include:
- BEGIN: when your job starts
- END: when your job finishes
- FAIL: if your job fails for some reason
- ALL: all of the above
- You can also specify multiple types using a comma-separated list, e.g., FAIL,END
See the Slurm documentation for full details and options.
-
This can be your UF email, or a different one if you want.
-
This example is for a single-CPU job, so we will ask for 1 task.
See the Slurm documentation for full details and options.
-
This example requests 1GB of RAM. Slurm will allocate 1GB of memory (RAM) on the computer where the job runs. If the job tries to access more than that 1GB, it will be terminated.
See the Slurm documentation for full details and options.
-
How long the job should be able to run in the format
hh:mm:ss
. Other formats are possible. If the job is still running at the end of that time limit, the job will be terminated. See the Slurm documentation for full details and options.
-
Name the output file so that it is meaningful to you. Any text that would normally display on the screen when the job runs will be written to this file. The %j
will be substituted with the job ID. You may also add a --error
or -e
line to separate STDOUT
and STDERR
. See the Slurm documentation for --output
and --error
details and options. -
These commands are helpful for keeping track of things and are the first part of the script that is run when the job starts. The information is printed to the file specified with the --output
directive. pwd
prints the current directory, allowing you to verify the starting path for your job. hostname
prints the name of the server that the job runs on. This can be helpful for tracking down problematic servers. date
prints the date and time. This can help you keep track of when the job was run. Adding date
at the end of the script also lets you see how long the job took to run.
- To prevent modules that you may have had loaded when submitting a job from interfering with your job, we suggest using
module purge
to clear out all modules before loading the ones you want to use. -
If you are using an application maintained by UFIT-RC, you will likely need to load one or more modules. See the environment module system pages for more details.
Suggestion: Use full module name, including the version number
While the documentation here often omits the version number to avoid it becoming stale, it is best practice to include the version number when loading a module in your scripts, e.g.,
module load python/3.11
By including the version:
- You know what version was used.
- You retain the same version until you decide to change it.
- Avoid having new versions unexpectedly change how your script runs.
-
Print any helpful logging information to the
--output
file. In Bash, echo
prints the text that follows it. - Run your application! This is the main thing that you want your job to do. It can be one line, as it is here, or a long script with many steps. These are the Bash commands that will be run on the computer where your job runs. They will be run in order, one after another, until all have been run (or the job time or memory limits are reached).
- Print the
date
again, providing both a way to see how long the job took and an indication that the script ran until the end.
Multi-Threaded SMP Job¶
This script can serve as a template for applications that are capable of using multiple processors on a single server or physical computer. These applications are commonly referred to as threaded, OpenMP, PTHREADS, or shared memory applications. While they can use multiple processors, they cannot make use of multiple servers and all the processors must be on the same node.
These applications require shared memory and can only run on one node; as such, it is important to remember the following:
- You must set
--ntasks=1
, and then set--cpus-per-task
to the number of cores (or OpenMP threads) you wish to use. - You must make the application aware of how many processors to use.
How that is done depends on the application:
- For some applications, set
OMP_NUM_THREADS
to a value less than or equal to the number ofcpus-per-task
you set. - For some applications, use a command line option when calling that application.
- For some applications, set
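A minimal hedged sketch of both approaches; the application name and its --threads option are hypothetical, so check your application's manual for the real option name:
# Environment-variable case, for OpenMP applications:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Command-line case (option name is hypothetical):
myapp --threads "$SLURM_CPUS_PER_TASK" input.dat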
Expand to view example
The lines changed from the single-core job are highlighted and annotated.
#!/bin/bash
#SBATCH --job-name=parallel_job # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu # Where to send mail
#SBATCH --nodes=1 # Run all processes on a single node (1)
#SBATCH --ntasks=1 # Run a single task (2)
#SBATCH --cpus-per-task=4 # Number of CPU cores per task (3)
#SBATCH --mem=1gb # Job memory request
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=parallel_%j.log # Standard output and error log
pwd; hostname; date
echo "Running prime number generator program on $SLURM_CPUS_ON_NODE CPU cores" #(4)!
/data/training/Slurm/prime/prime
# In this example, the application automatically detects the number of cores.
# You may need to specify the number to use. Read the application's manual.
date
-
While not strictly needed here, the
--nodes
directive sets how many individual servers to use. In this case, we want all of the processors on the same server. As there is only one task specified in the next line, that will happen anyway, but by adding it here, we can emphasize that we want a single node for this example. See the Slurm documentation for full details and options.
-
For shared memory applications, specify a single task,
--ntasks=1
. In most cases, nodes
and ntasks
should be one unless you are using MPI code that can run on multiple servers. See the Slurm documentation for full details and options.
-
To request multiple processors (or cores) for an application, use the
--cpus-per-task
directive. In this case, four cores will be allocated to the job. See the Slurm documentation for full details and options.
-
In this example, the Slurm environment variable
$SLURM_CPUS_ON_NODE
is used to get the number of cores allocated to the job on the server. In this case, the value will be 4, and that value is printed in the echo
output. You can also use this value to tell your application how many cores to use, depending on how it gets that information. See the full list of Slurm output environment variables; these are set at the start of the job and are available in the job's environment.
Expand to view another example, setting OMP_NUM_THREADS
#!/bin/bash
#SBATCH --job-name=parallel_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<email@ufl.edu> # Where to send mail
#SBATCH --nodes=1 # Run all processes on a single node
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=4 # Number of CPU cores per task
#SBATCH --mem=600mb # Total memory limit
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=parallel_%j.log # Standard output and error log
date;hostname;pwd
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} #(1)!
module purge
module load gcc
./YOURPROGRAM INPUT
date
- For OMP programs, set the value of the environment variable
OMP_NUM_THREADS
. In this case, we use the Slurm environment variable$SLURM_CPUS_PER_TASK
to avoid needing to make changes if the--cpus-per-task
line is changed.
Multiprocessing jobs¶
While most Python libraries work well with the --cpus-per-task
directive as outlined above, there are some cases where you may want to use multiple tasks, e.g. --ntasks=4
. If you run multiprocessing code, for example using the Python multiprocessing
module, and use the --ntasks
directive, make sure to specify a single node and the number of tasks that
your code will use.
These types of jobs can only run on a single server (node), and once you specify more than 1 task, Slurm may split them across servers unless you tell it not to.
Expand to view example of multi-task job on a single node
#!/bin/bash
#SBATCH --job-name=parallel_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu # Where to send mail
#SBATCH --nodes=1 # Run all processes on a single node (1)
#SBATCH --ntasks=4 # Number of processes (2)
#SBATCH --mem=1gb # Total memory limit
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
#SBATCH --output=multiprocess_%j.log # Standard output and error log
date;hostname;pwd
module purge
module load python/3
python script.py
date
- Any time
--ntasks
is greater than 1, Slurm may split the tasks across multiple servers (nodes). To prevent this, or to control how many servers are used, you must specify the number of nodes with the --nodes
directive. - Using
--ntasks
will sometimes work for applications, but it is important to test and make sure the application is really using the multiple processors. Many applications cannot make use of multiple tasks and will run on a single processor, leaving the others idle with this format of request.
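One hedged pattern is to let the script read the task count from the Slurm environment rather than hard-coding it; script.py and its --processes flag are hypothetical here, but $SLURM_NTASKS is a standard Slurm output variable:
# Pass the Slurm-provided process count to the script instead of hard-coding it
python script.py --processes "$SLURM_NTASKS"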
Message Passing Interface (MPI) Jobs¶
Example¶
The following example requests 24 tasks, each with a single core. It
further specifies that these should be split evenly on 2 servers (nodes), and
within the nodes, the 12 tasks should be evenly split on the two
sockets. So, each socket (roughly a multi-core CPU) on the two nodes will have 6 tasks, each with its
own dedicated core. The --distribution
option will ensure that tasks
are assigned cyclically among the allocated nodes and sockets.
Slurm is very flexible and allows users to be very specific about their resource requests. Thinking about your application and doing some testing will be important to determine the best set of resources for your specific job.
Expand to see MPI example.
The lines changed from the single-core job are highlighted and annotated.
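A sketch of such a script is shown below; the job name, memory request, time limit, module list, and application path are placeholders that you should adapt to your own application:
#!/bin/bash
#SBATCH --job-name=mpi_test             # Job name
#SBATCH --mail-type=END,FAIL            # Mail events
#SBATCH --mail-user=email@ufl.edu       # Where to send mail
#SBATCH --nodes=2                       # Number of nodes (1)
#SBATCH --ntasks=24                     # Number of MPI ranks (2)
#SBATCH --cpus-per-task=1               # Number of cores per MPI rank (3)
#SBATCH --ntasks-per-node=12            # How many tasks on each node (4)
#SBATCH --ntasks-per-socket=6           # How many tasks on each socket (5)
#SBATCH --distribution=cyclic:cyclic    # Distribute tasks cyclically on nodes and sockets (6)
#SBATCH --mem-per-cpu=600mb             # Memory per core (7)
#SBATCH --time=00:05:00                 # Time limit hrs:min:sec
#SBATCH --output=mpi_test_%j.log        # Standard output and error log
pwd; hostname; date
module purge
module load gcc openmpi
srun --mpi=${HPC_PMIX} /path/to/your/mpi_application  #(8)!
date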
- The
--nodes
directive controls the number of physical servers (nodes) allocated to the job. Typically, a single number is used, indicating the minimum number of nodes. In this case, at least 2 servers are requested. Without the--ntasks-per-node
directive on line 8, more nodes could be allocated. Optionally, a range of node counts can be used. See the SchedMD sbatch documentation for more details. - The
--ntasks
directive specifies the number of tasks, or MPI ranks. Unless constrained by other directives, each task can be allocated on a different server by Slurm. Slurm will select the distribution that would allow the job to start the soonest. - As with other examples,
--cpus-per-task
controls how many cores are allocated to each task. Here, a single core is used. Hybrid MPI applications, where each task uses multiple cores, are covered below. -
--ntasks-per-node
is optional and controls how tasks are allocated across servers. See the Slurm documentation for full details and options.
-
--ntasks-per-socket
is optional and controls how tasks are allocated across sockets (see the FAQ below for details on sockets; in general, you can think of a socket as a set of cores with the fastest access to a portion of the server's RAM). See the Slurm documentation for full details and options.
-
Since Slurm and OpenMPI have different default methods of distributing tasks, this aligns things so that tasks are distributed correctly.
-
Unlike in single-node jobs above, where
--mem
is used for total memory, we suggest using --mem-per-cpu
, which is the per-core memory. If some tasks use more memory than others, it is important to request the highest value. See the Slurm documentation for full details and options.
-
A few things to note about this line:
- Use
srun
, notmpirun
ormpiexec
- To avoid needing to update the PMIx version, we suggest using the environment variable $HPC_PMIX, which will be set when the module file is loaded.
- Do not pass the
-n
or-np
flags as Slurm will communicate the information to OpenMPI.
- Use
MPI FAQs¶
How many nodes?
The number of servers, or nodes, to request depends on a number of factors.
- Check out the node features page to see how much memory (RAM) and how many cores and sockets different servers have. HiPerGator is a heterogeneous cluster with different generations of servers. Depending on the size of your job, you may need to split things to fit on servers of particular types.
- Some applications perform lots of communication among MPI ranks; these perform best when tasks are grouped on fewer nodes, since more of that communication can use the faster, within-node interfaces.
- Some applications have high memory bandwidth use. These can saturate the available memory bandwidth on a server, and tasks might perform better split across more nodes.
- The above may also be the case for I/O bandwidth.
- Ultimately, especially as you scale applications up to use more resources, it is critical to do some testing to ensure that you are efficiently using the resources your job is requesting.
What is a socket?
This used to be an easier answer... Once upon a time, computers had one or more places on the motherboard where the processors plugged in. These are referred to as sockets. The architecture of the motherboard was such that the processor plugged into one socket usually had faster access to some of the memory chips, and the processor plugged into a different socket had faster access to a different set of memory. All of the processors could still reach all of the memory, but access was non-uniform. From this came the idea of a NUMA node (Non-Uniform Memory Access).
Slurm adopted the socket terminology and things were good. A socket referred to the physical plug where a multi-core CPU processor was plugged into a computer motherboard.
But then technology advanced, and AMD introduced processors with multiple NUMA nodes. The multi-core processors plugged into a single socket on the motherboard now had multiple NUMA nodes--some cores had faster access to some memory chips than other cores on the same processor.
So now, the Slurm "socket" refers to these sets of cores with fast access to RAM, not the thing that is plugged into the motherboard.
See the node features page for the details of how many sockets each server has.
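If you want to see how Slurm views the layout of a particular server, scontrol can report the socket and core counts; the node name below is just a placeholder:
scontrol show node NODENAME | grep -Eo '(Sockets|CoresPerSocket|ThreadsPerCore)=[0-9]+'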
Do I need to specify --ntasks-per-node
and --ntasks-per-socket
?
Not necessarily, but without these, Slurm has free rein on how your tasks are allocated across the cluster. In some cases, that may not be an issue. With the example set up above, requesting --nodes=2
and --ntasks=24
, without the --ntasks-per-node
line, while two nodes will be used, one node might get 23 tasks, and the other 1 (or any other split, totalling 24). In many cases, that may be fine. But some applications overwhelm memory or IO bandwidth on a server and spreading them across servers can improve performance.
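As a minimal sketch, these are the directives from the example above that pin down the even split:
#SBATCH --nodes=2             # use exactly 2 nodes
#SBATCH --ntasks=24           # 24 tasks in total
#SBATCH --ntasks-per-node=12  # force an even 12/12 split rather than leaving it to Slurm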
What PMIx version should I use?
To avoid needing to update the PMIx version, we suggest using the environment variable $HPC_PMIX
, which will be set when the module file is loaded.
The example below shows the use of the prte_info
command to check the PMIx version if you want.
Expand to see example of prte_info
command.
You can determine the appropriate PMIx version to use by running the prte_info
command
after loading the desired OpenMPI environment module.
$ module load gcc/14.2.0 openmpi/5.0.7
$ prte_info
PRTE: 3.0.8
PRTE repo revision: 2025-04-02
PRTE release date: @PMIX_RELEASE_DATE@
PMIx: OpenPMIx 5.0.6 (PMIx Standard: 4.2, Stable ABI: 0.0, Provisional ABI:
0.0)
<snip>
Based on this output (PMIx 5.x), the matching srun option is --mpi=pmix_v5
Hybrid MPI/Threaded job¶
This script can serve as a template for hybrid MPI/SMP applications. These are MPI applications where each MPI process is multi-threaded (usually via either OpenMP or POSIX Threads) and can use multiple processors.
Our testing has found that it is best to be very specific about how you want your MPI ranks laid out across nodes and even sockets (multi-core CPUs). SLURM and OpenMPI have some conflicting behavior if you leave too much to chance. Please refer to the full Slurm sbatch documentation, as well as the information in the MPI example above.
The following example requests 8 tasks, each with 4 cores. It further specifies that these should be split evenly on 2 nodes, and within the nodes, the 4 tasks should be evenly split on the two sockets. So each CPU on the two nodes will have 2 tasks, each with 4 cores. The distribution option will ensure that MPI ranks are distributed cyclically on nodes and sockets.
Expand to view example
#!/bin/bash
#SBATCH --job-name=hybrid_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu # Where to send mail
#SBATCH --ntasks=8 # Number of MPI ranks
#SBATCH --cpus-per-task=4 # Number of cores per MPI rank
#SBATCH --nodes=2 # Number of nodes
#SBATCH --ntasks-per-node=4 # How many tasks on each node
#SBATCH --ntasks-per-socket=2 # How many tasks on each CPU or socket
#SBATCH --mem-per-cpu=100mb # Memory per core
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=hybrid_test_%j.log # Standard output and error log
pwd; hostname; date
module purge
module load gcc/9.3.0 openmpi/4.1.1 raxml-ng/1.1.0
srun --mpi=${HPC_PMIX} raxml-ng ...
date
The following example requests 8 tasks, each with 8 cores. It further specifies that these should be split evenly on 4 nodes, and within the nodes, the 2 tasks should be split, one on each of the two sockets. So each CPU on the four nodes will have 1 task, each with 8 cores. The distribution option will ensure that MPI ranks are distributed cyclically on nodes and sockets.
Also note setting OMP_NUM_THREADS so that OpenMP knows how many threads to use per task.
- Note that MPI gets -np from Slurm automatically.
- Note there are many directives available to control processor layout.
- Some to pay particular attention to are:
--nodes
if you care exactly how many nodes are used--ntasks-per-node
to limit number of tasks on a node--distribution
one of several directives (see also--contiguous
,--cores-per-socket
,--mem_bind
,--ntasks-per-socket
,--sockets-per-node
) to control how tasks, cores and memory are distributed among nodes, sockets and cores. While Slurm will generally make appropriate decisions for setting up jobs, careful use of these directives can significantly enhance job performance and users are encouraged to profile application performance under different conditions.
Expand to view example
#!/bin/bash
#SBATCH --job-name=LAMMPS
#SBATCH --output=LAMMPS_%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=<email_address>
#SBATCH --nodes=4 # Number of nodes
#SBATCH --ntasks=8 # Number of MPI ranks
#SBATCH --ntasks-per-node=2 # Number of MPI ranks per node
#SBATCH --ntasks-per-socket=1 # Number of tasks per processor socket on the node
#SBATCH --cpus-per-task=8 # Number of OpenMP threads for each MPI process/rank
#SBATCH --mem-per-cpu=2000mb # Per processor memory request
#SBATCH --time=4-00:00:00 # Walltime in hh:mm:ss or d-hh:mm:ss
date;hostname;pwd
module purge
module load gcc/12.2.0 openmpi/4.1.5
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun --mpi=${HPC_PMIX} /path/to/app/lmp_gator2 < in.Cu.v.24nm.eq_xrd
date
Array job¶
Please see the Slurm Job Arrays page for more information on job arrays. Note that we take the simplest 'single-threaded' job example from above and extend it to an array of jobs. Modify the following script using the parallel, MPI, or hybrid job layout as needed.
Expand to view script
#!/bin/bash
#SBATCH --job-name=array_job_test # Job name
#SBATCH --mail-type=FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=email@ufl.edu # Where to send mail
#SBATCH --ntasks=1 # Run a single task
#SBATCH --mem=1gb # Job Memory
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.log # Standard output and error log (1)
#SBATCH --array=1-5 # Array range (2)
pwd; hostname; date
echo This is task $SLURM_ARRAY_TASK_ID #(3)!
date
- Note the use of
%A
for the master job ID of the array, and the%a
for the task ID in the output filename. - The
--array
directive makes this an array job that will submit multiple, nearly identical jobs, or tasks. Each task will have a unique value for the environment variable $SLURM_ARRAY_TASK_ID
. The 1-5
here indicates that the tasks will be numbered 1, 2, 3, 4, 5. Other task numbering and control methods are described on the Slurm Job Arrays page. - The only difference among the tasks will be the value of the environment variable
$SLURM_ARRAY_TASK_ID
. You can use the value to select input files, change parameter values, name output files, etc. There are many options here. Again, see Slurm Job Arrays for more details.
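As a hedged illustration of using the task ID, the snippet below selects a different input and output file for each task; the file-naming pattern and script name are hypothetical:
INPUT=input_${SLURM_ARRAY_TASK_ID}.txt      # e.g., input_1.txt for task 1
OUTPUT=result_${SLURM_ARRAY_TASK_ID}.txt
python process.py "$INPUT" > "$OUTPUT"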
GPU job¶
Please see GPU Access for more information regarding the use of HiPerGator GPUs. Note that the order in which the environment modules are loaded is important.
Expand to view VASP script
#!/bin/bash
#SBATCH --job-name=vasptest
#SBATCH --output=vasp.out
#SBATCH --error=vasp.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@ufl.edu
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=8
#SBATCH --distribution=cyclic:cyclic
#SBATCH --mem-per-cpu=7000mb
#SBATCH --partition=gpu (1)
#SBATCH --gpus=4 (2)
#SBATCH --time=00:30:00
module purge
module load cuda intel openmpi vasp #(3)!
srun --mpi=${HPC_PMIX} vasp_gpu
- Use the
gpu
partition for GPU jobs. - Use the
--gpus
directive to specify the number of GPUs. - The cuda module will typically be required in the list of modules for GPU jobs.
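To confirm which GPUs were allocated to the job, it can be helpful to print the GPU list inside the job script before launching the application; nvidia-smi is generally available on GPU nodes:
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"
nvidia-smi --query-gpu=index,name,memory.total --format=csv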
Expand to view NAMD script
#!/bin/bash
#SBATCH --job-name=stmv
#SBATCH --output=std.out
#SBATCH --error=std.err
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=4
#SBATCH --distribution=block:block
#SBATCH --time=30:00:00
#SBATCH --mem-per-cpu=1gb
#SBATCH --mail-type=NONE
#SBATCH --mail-user=some_user@ufl.edu
#SBATCH --partition=gpu
#SBATCH --gpus=a100:2
module load cuda/11.0.207 intel/2020.0.166 namd/2.14b2
echo "NAMD2 = $(which namd2)"
echo "SBATCH_CPU_BIND_LIST = $SBATCH_CPU_BIND_LIST"
echo "SBATCH_CPU_BIND = $SBATCH_CPU_BIND "
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK "
gpuList=$(echo $CUDA_VISIBLE_DEVICES | sed -e 's/,/ /g')
N=0
devList=""
for gpu in $gpuList
do
devList="$devList $N"
N=$(($N + 1))
done
devList=$(echo $devList | sed -e 's/ /,/g')
echo "devList = $devList"
namd2 +p$SLURM_CPUS_PER_TASK +idlepoll +devices $devList stmv.namd