SHAPEIT4
Description
SHAPEIT4 is a fast and accurate method for estimating haplotypes (also known as phasing) from SNP array and high-coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm.
Environment Modules
Run module spider shapeit4
to find out what environment modules are available for this application.
Environment Variables
- HPC_SHAPEIT4_DIR - installation directory
- HPC_SHAPEIT4_BIN - executable directory
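As a quick sanity check, the two variables listed above can be inspected after loading the module. The sketch below is illustrative only: `check_shapeit4_env` is a hypothetical helper name, not something the module or SHAPEIT4 provides, and it assumes a bash shell.

```shell
#!/bin/bash
# Hypothetical helper (not part of SHAPEIT4 or the module): report whether the
# documented variables are set and point at existing directories.
check_shapeit4_env() {
    local status=0 name value
    for name in HPC_SHAPEIT4_DIR HPC_SHAPEIT4_BIN; do
        # Indirect expansion: read the variable whose name is stored in $name
        value="${!name:-}"
        if [ -z "${value}" ]; then
            echo "${name} is not set (was the module loaded?)"
            status=1
        elif [ ! -d "${value}" ]; then
            echo "${name}=${value} is not a directory"
            status=1
        else
            echo "${name}=${value}"
        fi
    done
    return "${status}"
}

# Typical use on the cluster, after: module load shapeit4
check_shapeit4_env || echo "SHAPEIT4 environment looks incomplete"
```

If both variables resolve to directories, the function prints them and returns 0; otherwise it reports which one is missing.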
Job Script Examples
Below is a sample job script that uses SHAPEIT4:
#!/bin/bash
#SBATCH --job-name=shapeit4_test
#SBATCH --mail-type=NONE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=shapeit4_test.log
echo "Setting up test environment..."
TEST_PWD=/data/apps/tests/shapeit4
TEST_SAMPLEDIR=${TEST_PWD}/example_data
TEST_WORKDIR=${TEST_PWD}/output
cd ${TEST_PWD}
module load shapeit4
# Remove any previous test results and create a fresh working directory
if [ -d "${TEST_WORKDIR}" ]; then rm -rf "${TEST_WORKDIR}"; fi
mkdir -p "${TEST_WORKDIR}"
echo "Starting test run at $(date) on $(hostname)..."
# Phase a bgzipped VCF input
shapeit4 \
    --input "${TEST_SAMPLEDIR}/unphased.vcf.gz" \
    --map "${TEST_SAMPLEDIR}/chr20.b37.gmap.gz" \
    --region 20 \
    --output "${TEST_WORKDIR}/phased.vcf.gz" \
    --thread "${SLURM_CPUS_PER_TASK:-1}"
# Test with BCF files...
shapeit4 \
    --input "${TEST_SAMPLEDIR}/unphased.bcf" \
    --map "${TEST_SAMPLEDIR}/chr20.b37.gmap.gz" \
    --region 20 \
    --output "${TEST_WORKDIR}/phased.bcf" \
    --thread "${SLURM_CPUS_PER_TASK:-1}"
# There should be some files in the work directory
echo "There should be some results listed below:"
find ${TEST_WORKDIR} -type f ! -empty -ls
echo "Test complete at $(date)."
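The sample script ends by listing whatever non-empty files it finds. A stricter check, sketched below, tests for the two specific outputs the script is expected to write. `verify_outputs` is a hypothetical helper introduced here for illustration, and the default path simply mirrors `TEST_WORKDIR` from the sample script; adjust it to your own output directory.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every listed file exists and is non-empty.
verify_outputs() {
    local missing=0 f
    for f in "$@"; do
        if [ -s "${f}" ]; then
            echo "OK       ${f}"
        else
            echo "MISSING  ${f}"
            missing=1
        fi
    done
    return "${missing}"
}

# Default mirrors the sample script; override TEST_WORKDIR for your own runs.
TEST_WORKDIR=${TEST_WORKDIR:-/data/apps/tests/shapeit4/output}
if verify_outputs "${TEST_WORKDIR}/phased.vcf.gz" "${TEST_WORKDIR}/phased.bcf"; then
    echo "Both phased outputs are present."
else
    echo "At least one phased output is missing or empty."
fi
```

This only confirms that the files exist and have content. It does not validate that they are well-formed VCF or BCF files.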
Citation
If you publish research that uses SHAPEIT4 you have to cite it as follows:
Olivier Delaneau, Jean-Francois Zagury, Matthew R Robinson, Jonathan Marchini, Emmanouil Dermitzakis. Accurate, scalable and integrative haplotype estimation. Nat. Comm. 2019.
Categories
biology, sequencing