GATK
Description
GATK is, first, a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data, and second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller, and a local realigner. GATK aims to work well with both samtools and Picard by providing complementary tools. The GATK SNP calling pipeline (Q score recalibration -> multiple sequence realignment -> SNP/indel calling) is a particular area of focus.
Note: For gatk module versions before 4.0, we provide a GenomeAnalysisTK wrapper script that calls java with the correct jar file, so you can run GATK as "GenomeAnalysisTK -T analysis_type [options] ...".
We provide the JEXL library, which lets you specify sophisticated option combinations for VariantFiltration. See http://goo.gl/YjzZf4 for more details.
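As a rough illustration of a JEXL expression in use, the sketch below passes one to VariantFiltration through the GenomeAnalysisTK wrapper. It assumes a gatk module older than 4.0 is loaded. The file names, the filter name, and the QD/FS thresholds are hypothetical placeholders, not site recommendations.

```shell
# Hypothetical inputs: replace with your own reference and VCF files.
REF=reference.fasta
IN_VCF=input.vcf
OUT_VCF=filtered.vcf

# JEXL expression combining two INFO annotations (illustrative thresholds).
FILTER_EXPR='QD < 2.0 || FS > 60.0'
# Label written to the FILTER column of records that match the expression.
FILTER_NAME=low_QD_or_high_FS

# Run only when the wrapper is on PATH (a pre-4.0 gatk module is loaded).
if command -v GenomeAnalysisTK >/dev/null 2>&1; then
    GenomeAnalysisTK -T VariantFiltration \
        -R "$REF" \
        -V "$IN_VCF" \
        --filterExpression "$FILTER_EXPR" \
        --filterName "$FILTER_NAME" \
        -o "$OUT_VCF"
else
    echo "GenomeAnalysisTK not found; load a gatk module older than 4.0 first." >&2
fi
```

Records that match the expression are kept in the output and marked with the filter name rather than removed, so the thresholds can be adjusted later without rerunning variant calling.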
Environment Modules
Run module spider gatk to find out what environment modules are available for this application.
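A minimal sketch of finding and loading the module, assuming an Lmod-style module command is available in your shell. The helper name load_gatk is hypothetical, and module load gatk is assumed to pick the site default version.

```shell
# Hypothetical helper: list the available gatk modules, then load the default one.
load_gatk() {
    if ! command -v module >/dev/null 2>&1; then
        echo "The module command is not available in this shell." >&2
        return 1
    fi
    module spider gatk   # list every gatk version that can be loaded
    module load gatk     # load the default; use gatk/<version> to pin one
    module list          # confirm what is currently loaded
}

load_gatk || echo "gatk was not loaded." >&2
```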
Environment Variables
- HPC_GATK_DIR - installation directory
- HPC_GATK_DOC - guide book directory
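To check what these variables point to after loading a module, something like the following can be used. The function name show_gatk_env is a hypothetical helper, not part of the module.

```shell
# Print both variables, or a placeholder when no gatk module is loaded.
show_gatk_env() {
    echo "HPC_GATK_DIR=${HPC_GATK_DIR:-<not set>}"
    echo "HPC_GATK_DOC=${HPC_GATK_DOC:-<not set>}"
    # List the installation directory when it exists.
    if [ -n "${HPC_GATK_DIR:-}" ] && [ -d "$HPC_GATK_DIR" ]; then
        ls "$HPC_GATK_DIR"
    fi
}

show_gatk_env
```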
Additional Usage Information
We provide a wrapper script GenomeAnalysisTK for gatk module versions before 4.0 that is equivalent to running:
mkdir -p tmp
export TMPDIR=$(pwd)/tmp
java -Djava.io.tmpdir=$TMPDIR -cp /apps/gatk/jexl/2.1.1/commons-jexl-2.1.1.jar -jar $HPC_GATK_DIR/GenomeAnalysisTK.jar
If you do not use the wrapper, you must create and use a local TMPDIR in your /blue space when running GenomeAnalysisTK.jar. Otherwise /tmp will be used by default, which fills up the /tmp partitions on compute nodes and causes node failures.
Starting with GATK4, new upstream wrappers are available, so we no longer include our own wrapper. The available commands are GenomeAnalysisTK and gatk. Running GenomeAnalysisTK will show you how to run gatk tools. To get a full list of tools, run gatk --list.
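A short sketch of typical use of the upstream gatk wrapper, assuming a GATK4 module is loaded. HaplotypeCaller is used only as an example tool, and the file names are hypothetical placeholders.

```shell
# Hypothetical inputs: replace with your own reference and alignment files.
REF=reference.fasta
BAM=sample.bam
OUT=sample.vcf.gz

if command -v gatk >/dev/null 2>&1; then
    gatk --list                   # print the full list of tools
    gatk HaplotypeCaller --help   # show the arguments of a single tool
    # Run the tool only when the example inputs actually exist.
    if [ -f "$REF" ] && [ -f "$BAM" ]; then
        gatk HaplotypeCaller -R "$REF" -I "$BAM" -O "$OUT"
    fi
else
    echo "gatk not found; load a GATK4 module first." >&2
fi
```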
Categories
biology, ngs