AlphaFold
Description
This package provides an implementation of the inference pipeline of AlphaFold v2.3. Note: use the alphafold_full_db.sh command to run with the default data path arguments pre-populated, or use alphafold to launch without any required arguments pre-populated.
Environment Modules
Run module spider alphafold
to find out what environment modules are available for this application.
Environment Variables
- HPC_ALPHAFOLD_DIR - installation directory
- HPC_ALPHAFOLD_BIN - executable directory
- HPC_ALPHAFOLD_REF - reference directory
- HPC_ALPHAFOLD_DATA - data directory
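A quick way to confirm that these variables are set in the current shell is to print them. This is a minimal sketch, assuming a POSIX shell; the helper name show_alphafold_env is hypothetical and is not provided by the module.

```shell
# Hypothetical helper (not part of the alphafold module): print each
# AlphaFold variable listed above, or "<unset>" if it is not defined.
show_alphafold_env() {
    for name in HPC_ALPHAFOLD_DIR HPC_ALPHAFOLD_BIN HPC_ALPHAFOLD_REF HPC_ALPHAFOLD_DATA; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        printf '%s=%s\n' "$name" "${value:-<unset>}"
    done
}

show_alphafold_env
```

Run it after module load alphafold; any line ending in <unset> suggests the module is not loaded in that shell.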
Additional Usage Information
To simplify usage, run the alphafold_full_db.sh script. For example:
alphafold_full_db.sh --fasta_paths=${HPC_ALPHAFOLD_REF}/test.fasta --output_dir="${HOME}/scratch" --max_template_date=2020-05-14 --use_gpu_relax=1
As of version 2.3, the AlphaFold documentation recommends running AlphaFold as a Docker container. However, Docker is not compatible with the HPC environment. AlphaFold has therefore been installed as an Apptainer container, and the alphafold_full_db.sh wrapper script has been created to mimic the behavior of docker/run_docker.py as referenced in the AlphaFold documentation. alphafold_full_db.sh specifies the database location options required by alphafold.
To specify these options manually, use run_alphafold.sh instead.
If using the --model_preset=multimer option, use the alphafold_multimer_db.sh launch script instead. Example:
alphafold_multimer_db.sh --model_preset=multimer --fasta_paths=${HPC_ALPHAFOLD_REF}/test.fasta --output_dir="${HOME}/scratch" --max_template_date=2020-05-14 --use_gpu_relax=1
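The choice between the two launch scripts can be derived from the model preset. This is a minimal sketch, assuming a POSIX shell; pick_launcher is a hypothetical helper that is not part of the installation, and it assumes that every preset other than multimer uses alphafold_full_db.sh.

```shell
# Hypothetical helper: map a --model_preset value to one of the launch
# scripts named on this page. Assumption: any non-multimer preset uses
# the full-database wrapper.
pick_launcher() {
    case "$1" in
        multimer) echo "alphafold_multimer_db.sh" ;;
        *)        echo "alphafold_full_db.sh" ;;
    esac
}

preset="multimer"
launcher=$(pick_launcher "$preset")
echo "Launcher for preset '${preset}': ${launcher}"
```

A job script could then call "${launcher}" with the usual arguments rather than hard-coding the script name.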
Job Script Examples
Note that AlphaFold has large memory requirements, and some of its stages use 4 or 8 CPUs in addition to a GPU. Example job scripts are shown below; the script for version 2.3.1 runs the test data included with the software.
- Sample script for version 2.1.2:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=ai
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=48gb
#SBATCH --time=12:00:00
date;hostname;pwd
module load alphafold
run_alphafold.py \
--data_dir "${HPC_ALPHAFOLD_REF}" \
--output_dir "$(pwd)" \
--fasta_paths query.fasta \
--uniref90_database_path=${HPC_ALPHAFOLD_REF}/uniref90/uniref90.fasta \
--mgnify_database_path=${HPC_ALPHAFOLD_REF}/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=${HPC_ALPHAFOLD_REF}/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=${HPC_ALPHAFOLD_REF}/pdb_mmcif/obsolete.dat \
--use_gpu_relax=1 \
--bfd_database_path=${HPC_ALPHAFOLD_REF}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=${HPC_ALPHAFOLD_REF}/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--pdb70_database_path=${HPC_ALPHAFOLD_REF}/pdb70/pdb70
date
- Sample script for version 2.3.1:
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=300gb
#SBATCH --time=96:00:00
date;hostname;pwd
module load alphafold
alphafold_full_db.sh \
--fasta_paths=${HPC_ALPHAFOLD_REF}/test.fasta \
--output_dir="${HOME}/scratch" \
--max_template_date=2020-05-14 \
--use_gpu_relax=1
date
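Because the sample jobs request up to 96 hours, a quick input check before submitting can avoid a wasted run. This is a minimal sketch, assuming a POSIX shell; check_fasta is a hypothetical helper, and it only tests that the file exists, is non-empty, and begins with a '>' header line. The sequence in the usage example is a placeholder.

```shell
# Hypothetical pre-flight check: succeed only if the file exists, is
# non-empty, and its first line is a FASTA header starting with '>'.
check_fasta() {
    fasta="$1"
    if [ ! -s "$fasta" ]; then
        echo "error: ${fasta} is missing or empty" >&2
        return 1
    fi
    first_line=$(head -n 1 "$fasta")
    case "$first_line" in
        '>'*) return 0 ;;
        *)    echo "error: ${fasta} does not start with a '>' header" >&2
              return 1 ;;
    esac
}

# Usage with a throwaway file; replace with the real query path.
tmp_fasta=$(mktemp)
printf '>example\nMKT\n' > "$tmp_fasta"
check_fasta "$tmp_fasta" && echo "FASTA looks OK"
rm -f "$tmp_fasta"
```

Calling check_fasta on the path passed to --fasta_paths at the top of a job script, and exiting on failure, stops the job before any database search starts.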
Categories
protein, structure, machine_learning