PGAP
Description
The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids). Genome annotation is a multi-level process that includes prediction of protein-coding genes as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, and pseudogenes. NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Annotation Pipeline was developed in 2001, and it is regularly upgraded to improve structural and functional annotation quality (Li W, O'Neill KR et al 2021). Recent improvements include the use of curated protein profile hidden Markov models (HMMs) and curated complex domain architectures for functional annotation of proteins, as well as annotation of Enzyme Commission numbers and Gene Ontology terms.
Environment Modules
Run module spider pgap to find out what environment modules are available for this application.
Environment Variables
- HPC_PGAP_DIR - installation directory
- HPC_PGAP_BIN - executable directory
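Once the module is loaded, a quick check can confirm that both variables are present in the environment. The sketch below is illustrative only: check_pgap_env is a hypothetical helper name, not something the module provides, and it assumes the variables are exported, as environment modules normally do.

```shell
# Hypothetical helper: report whether the PGAP module variables are set.
# Prints NAME=value for each variable found; returns 1 if any is missing.
check_pgap_env() {
    local name value status
    status=0
    for name in HPC_PGAP_DIR HPC_PGAP_BIN; do
        # printenv exits non-zero when the variable is not in the environment
        if value=$(printenv "$name"); then
            echo "$name=$value"
        else
            echo "$name is not set; load the pgap module first" >&2
            status=1
        fi
    done
    return "$status"
}

# Typical use, after "module load pgap":
#   check_pgap_env
```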
Additional Usage Information
The PGAP module provides a wrapper function, pgap.py, to fine-tune usage on HPG. This means that you should call pgap.py ... without a path when running the command. See the Job Script Examples section below for an example.
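To confirm that the bare command name resolves after loading the module, something like the following can be used. This is a sketch: show_pgap_wrapper is a hypothetical helper name, and what the type builtin prints (shell function, alias, or file path) depends on how the wrapper is installed, which this page does not specify.

```shell
# Hypothetical helper: show how the current shell resolves pgap.py.
show_pgap_wrapper() {
    if command -v pgap.py >/dev/null 2>&1; then
        # type reports whether pgap.py is a function, an alias, or a file on PATH
        type pgap.py
    else
        echo "pgap.py not found; run 'module load pgap' first" >&2
        return 1
    fi
}

# Typical use, after "module load pgap":
#   show_pgap_wrapper
```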
Job Script Examples
Script used to test application installation
#!/bin/bash
#SBATCH --job-name=pgap_20220414_test
#SBATCH --mail-type=NONE
#SBATCH --cpus-per-task=28
#SBATCH --mem-per-cpu=4gb
#SBATCH --time=24:00:00
#SBATCH --output=pgap_20220414_test.log
echo "Setting up test environment..."
TEST_PWD=/data/apps/tests/pgap/20220414
TEST_DATADIR=${TEST_PWD}/example_data
TEST_WORKDIR=${TEST_PWD}/test_output
cd "${TEST_PWD}" || exit 1
module load pgap/20220414
# Remove previous test results: every directory whose name starts with ${TEST_WORKDIR}
if [ -d "${TEST_WORKDIR}" ]; then rm -rf "${TEST_WORKDIR}"*/; fi
echo "Starting test run at $(date) on $(hostname)..."
###############################################################################
pgap.py \
    --cpus "${SLURM_CPUS_ON_NODE:-4}" \
    -o "${TEST_WORKDIR}" \
    "${HPC_PGAP_DIR}/test_genomes/MG37/input.yaml"
###############################################################################
# There should be some files in the work directory
echo "There should be some results listed below:"
find "${TEST_WORKDIR}" -type f ! -empty -ls
echo "Test complete at $(date)."
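The --cpus argument in the script above falls back to 4 when SLURM_CPUS_ON_NODE is unset. Because the job requests its cores with --cpus-per-task, one possible variation is to prefer SLURM_CPUS_PER_TASK and fall back in stages. This is a sketch, not part of the tested script; pgap_cpus is a hypothetical helper name.

```shell
# Hypothetical helper: choose a CPU count for pgap.py inside a Slurm job.
pgap_cpus() {
    # Prefer the per-task allocation, then the per-node count, then a default of 4.
    echo "${SLURM_CPUS_PER_TASK:-${SLURM_CPUS_ON_NODE:-4}}"
}

# Example use in a job script (commented out so the snippet can be sourced safely):
#   pgap.py --cpus "$(pgap_cpus)" -o "${TEST_WORKDIR}" "${HPC_PGAP_DIR}/test_genomes/MG37/input.yaml"
```

Outside a Slurm allocation both variables are normally unset, so the helper prints the default of 4.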
Citation
If you publish research that uses PGAP, please cite NCBI (Li W, O'Neill KR et al 2021).
Categories
biology, genomics, annotation, pipeline