parabricks¶

Description¶

Parabricks is a software suite for genomic analysis. It delivers major improvements in throughput for common analytical tasks in genomics, including germline and somatic analysis. The core of the Parabricks software is its data pipeline, which takes raw data and transforms it according to the user's requirements.

Environment Modules¶

Run module spider parabricks to find out what environment modules are available for this application.

Environment Variables¶

HPC_PARABRICKS_DIR - installation directory
HPC_PARABRICKS_BIN - executable directory
HPC_PARABRICKS_IMAGE - the parabricks container image
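
A quick way to confirm that the module is loaded is to check that the three variables above are present in the environment. The following is a minimal sketch, not part of the Parabricks installation; the function name check_parabricks_env is hypothetical, and it assumes the module exports the variables (so that printenv can see them).

```shell
#!/bin/bash
# Hypothetical helper: report which of the documented HPC_PARABRICKS_*
# variables are missing from the environment.
# Assumption: the parabricks module exports these variables.
check_parabricks_env() {
    local name missing=0
    for name in HPC_PARABRICKS_DIR HPC_PARABRICKS_BIN HPC_PARABRICKS_IMAGE; do
        # printenv exits non-zero when the variable is not exported.
        if ! printenv "${name}" > /dev/null; then
            echo "missing: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Print a hint instead of failing when the module has not been loaded yet.
check_parabricks_env || echo "Run 'module load parabricks' first." >&2
```

Calling the function near the top of a job script, right after module load parabricks, surfaces a missing module before any GPU time is spent.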

Additional Usage Information¶

An example job resource request based on the Nvidia recommendation:

srun -N 1 --cpus-per-task=16 --gpus=2 --mem=32gb --time=200:00 --pty bash -i

Parabricks requires 2 to 8 GPUs to run. The pbrun argument '--num-gpus X' must match the number X of GPUs requested for the job. If it is not specified, Parabricks will try to run on all GPUs on the compute node and exit with an error.
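
One way to keep the two numbers in sync is to derive the pbrun value from the job environment instead of typing it twice. The sketch below assumes that Slurm sets CUDA_VISIBLE_DEVICES to a comma-separated list of the GPUs allocated to the job (common for GPU allocations, but worth verifying on your system); the helper name count_gpus is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the entries in a comma-separated device list,
# such as the value Slurm typically places in CUDA_VISIBLE_DEVICES.
count_gpus() {
    local devices="${1:-}"
    if [ -z "${devices}" ]; then
        # No devices listed: report zero.
        echo 0
        return 0
    fi
    # One device per line, then count the non-empty lines.
    printf '%s\n' "${devices}" | tr ',' '\n' | grep -c .
}

# Assumption: CUDA_VISIBLE_DEVICES reflects the GPUs allocated to this job.
NUM_GPUS="$(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
echo "GPUs visible to this job: ${NUM_GPUS}"
```

The result can then be passed to pbrun as --num-gpus "${NUM_GPUS}", so that changing the Slurm request does not require editing the pbrun command.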

A Parabricks run may fail if the paths passed as arguments resolve through symlinks. Containerized tools need real paths: use the full /blue/mygroup/myuser/project/inputdir path instead of a shorter ~/blue/project/inputdir path that goes through a symlink, even if the ~/blue symlink points to the /blue/mygroup/myuser/ directory.
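
Paths can be converted to their real form before they are handed to pbrun. This is a minimal sketch that assumes the realpath utility from GNU coreutils is available on the node; the helper name resolve_path is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the physical path of an existing file or
# directory, with every symlink component resolved.
# Assumption: realpath (GNU coreutils) is installed.
resolve_path() {
    realpath "$1"
}

# Example: show the real path of the current working directory.
resolve_path .
```

For instance, INPUT_DIR="$(resolve_path ~/blue/project/inputdir)" would store the symlink-free form of the shorter path from the paragraph above, which can then be used in the pbrun arguments.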

Job Script Examples¶

#!/bin/bash
#SBATCH --partition=hpg-ai               # partition
#SBATCH --time=4:00:00              # wall time
#SBATCH --mem=64gb              # memory per node
#SBATCH --mail-type=FAIL        # only send email on failure
#SBATCH --mail-user=your@email.com
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gpus=2  # 2-8 GPUS
#SBATCH --output=pb_%j.log
date;hostname;pwd

# Load the Parabricks environment module
module load parabricks

# Set data directories
DATA_DIR="/blue/nvidia-parabricks/g.burnett/parabricks_sample"
SAMPLE_1="${DATA_DIR}/Data/sample_1.fq.gz"
SAMPLE_2="${DATA_DIR}/Data/sample_2.fq.gz"
REF="${DATA_DIR}/Ref/Homo_sapiens_assembly38.fasta"
OUTPUT_DIR="${DATA_DIR}"

# Make the output directory
mkdir -p "${OUTPUT_DIR}"

# Run germline; --num-gpus must match the "#SBATCH --gpus" request above
pbrun germline \
        --ref "${REF}" \
        --in-fq "${SAMPLE_1}" "${SAMPLE_2}" \
        --num-gpus 2 \
        --out-bam "${OUTPUT_DIR}/germline.bam" \
        --out-variants "${OUTPUT_DIR}/germline.vcf" |& tee "${OUTPUT_DIR}/germline.log"

Parabricks on HiPerGator Tutorial¶

GitHub repository with code examples: https://github.com/hw-ju/genomics_uf_tutorials

Categories¶

phylogenetics