Slurm and GPU Use

After purchase, NGU allocations are included in your group's resources (quality of service).

Interactive Access

Interactive sessions are limited to 12 hours. To request interactive command-line access to a GPU under SLURM, use commands similar to these:

To request access to an L4 GPU for a session of the default 10-minute length:

srun --gpus=1 --pty -u bash -i

srun --partition=hpg-turin --gpus=1 --pty -u bash -i

To request access to two B200 GPUs on a single node for a 3-hour session with 300 GB of RAM:

srun --partition=hpg-b200 --gpus=2 --mem=300gb --time=3:00:00 --ntasks=1 --pty bash -i
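Once an interactive session starts, it can be useful to confirm which GPUs were actually allocated. The following is a minimal sketch, assuming Slurm exports CUDA_VISIBLE_DEVICES inside the job (common for GPU allocations, but dependent on site configuration) and that nvidia-smi is installed on the compute node. The function name count_visible_gpus is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count the entries listed in CUDA_VISIBLE_DEVICES.
# Assumption: Slurm sets this variable to a comma-separated list inside a GPU job.
count_visible_gpus() {
    devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Put each comma-separated entry on its own line and count the non-empty lines.
    echo "$devices" | tr ',' '\n' | grep -c .
}

echo "GPUs visible to this shell: $(count_visible_gpus)"

# List the physical devices if the NVIDIA tools are available on this node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not query any GPU"
fi
```

For the two-GPU request above, the count would be expected to match the value passed to --gpus.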

Open OnDemand Access

To access GPUs using Open OnDemand, check the form for your application. If your application supports multiple GPU types, choose the GPU partition and specify the number and type of GPUs:

To request one L4 GPU, select the cluster partition 'hpg-turin' and use this gres string:

gpu:1

To request one B200 GPU, select the cluster partition 'hpg-b200' and use this gres string:

gpu:1

To request multiple GPUs of any type, use this gres string, where n is the number of GPUs you need:

gpu:n

Batch Jobs

To request GPU resources in a batch job, use lines similar to the following in your job script.

In this example, two L4 GPUs on a single server (--nodes defaults to "1") will be allocated to the job:

#SBATCH --partition=hpg-turin
#SBATCH --gpus=2

In this example, two B200 GPUs on a single server (--nodes defaults to "1") will be allocated to the job:

#SBATCH --partition=hpg-b200
#SBATCH --gpus=2

Alternatively, use the '--gres=gpu:1' or '--gres=gpu:b200:1' format. Note that if the '--gpus=' format is used, SLURM will not report GPU usage data to slurmInfo, and those GPUs will not appear in slurmInfo output.
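Putting these directives together, a complete but minimal batch script might look like the sketch below. It uses the '--gres' format so that, per the note above, the GPUs appear in slurmInfo output. The job name, memory, and time values are placeholders chosen for illustration, and the helper function report_allocation is hypothetical. SLURM_JOB_ID, SLURM_JOB_NODELIST, and CUDA_VISIBLE_DEVICES are assumed to be set by Slurm inside the job.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-check        # placeholder job name
#SBATCH --partition=hpg-b200
#SBATCH --gres=gpu:b200:2           # typed gres form instead of --gpus=2
#SBATCH --ntasks=1
#SBATCH --mem=8gb                   # assumed value, adjust to your workload
#SBATCH --time=00:10:00             # assumed value, adjust to your workload

# Hypothetical helper: print what Slurm allocated to this job.
# Outside of a Slurm job the variables are unset, so placeholders are shown.
report_allocation() {
    echo "Job ID    = ${SLURM_JOB_ID:-not running under Slurm}"
    echo "Node list = ${SLURM_JOB_NODELIST:-n/a}"
    echo "GPUs      = ${CUDA_VISIBLE_DEVICES:-none visible}"
}

report_allocation
```

Printing the allocation at the top of the job output makes it easier to check later that the job received the resources it asked for.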

If no GPUs are available, your request will be queued and your connection will be established once a GPU becomes available. Alternatively, you may cancel your job and resubmit it with fewer requested resources. If you requested more time than you need, please end your session when you are done so that the GPU becomes available to other users.
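While a request is queued, the standard Slurm tools squeue and scancel can show its state or cancel it. The following is a minimal sketch, assuming both commands are on your PATH on the login node; the helper name queue_status is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list this user's pending jobs, or explain why that is not possible here.
queue_status() {
    if command -v squeue >/dev/null 2>&1; then
        # --start lists pending jobs along with Slurm's estimated start time, when it has one.
        squeue -u "${USER:-$(id -un)}" --start
    else
        echo "squeue not found: run this on a cluster login node"
    fi
}

queue_status

# To give up on a pending request, cancel it by job ID (placeholder shown):
#   scancel <jobid>
```

The estimated start time is only a scheduler prediction and can change as other jobs finish or are submitted.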

SLURM Options for B200 GPUs

To use B200 GPUs for interactive sessions or batch jobs, use SLURM parameters such as the following:

--partition=hpg-b200
--gpus=2

Job Script Example

This is a sample script for an MPI-parallel VASP job that requests and uses GPUs under SLURM:


#!/bin/bash
#SBATCH --job-name=vasptest
#SBATCH --output=vasp.out
#SBATCH --error=vasp.err
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@ufl.edu
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=8
#SBATCH --ntasks-per-socket=4
#SBATCH --mem-per-cpu=7000mb
#SBATCH --distribution=cyclic:cyclic
#SBATCH --partition=hpg-turin
#SBATCH --gres=gpu:4
#SBATCH --time=00:30:00

echo "Date      = $(date)"
echo "host      = $(hostname -s)"
echo "Directory = $(pwd)"

module purge
module load cuda intel openmpi vasp  # You may want to specify versions.
                # Use `module spider vasp` to see available versions of VASP,
                # and `module spider vasp/###` (with the version number) to see options
                # for cuda, intel, and openmpi versions.

T1=$(date +%s)
srun --mpi=${HPC_PMIX} vasp_gpu
T2=$(date +%s)

ELAPSED=$((T2 - T1))
echo "Elapsed Time = ${ELAPSED} seconds"
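The sample script reports the elapsed time as a raw number of seconds. For longer runs, a more readable HH:MM:SS form may be preferable. The following is a minimal sketch of the same timing pattern; the function name format_elapsed is hypothetical, and the sleep command merely stands in for the real workload.

```shell
#!/bin/bash
# Hypothetical helper: convert a number of seconds to HH:MM:SS.
format_elapsed() {
    local total=$1
    printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

T1=$(date +%s)
sleep 1                      # stand-in for the real workload
T2=$(date +%s)

echo "Elapsed Time = $(format_elapsed $((T2 - T1)))"
```

Comparing this value with the --time request helps in choosing a tighter time limit for later submissions.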