AlphaFold

Description

alphafold website

This package provides an implementation of the inference pipeline of AlphaFold v2.3. Note: use the alphafold_full_db.sh command to run with the default data path arguments pre-populated, or use alphafold to launch without any of the required arguments pre-populated.

Environment Modules

Run module spider alphafold to find out what environment modules are available for this application.

Environment Variables

  • HPC_ALPHAFOLD_DIR - installation directory
  • HPC_ALPHAFOLD_BIN - executable directory
  • HPC_ALPHAFOLD_REF - reference directory
  • HPC_ALPHAFOLD_DATA - data directory
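
Before building paths from these variables, it can help to confirm that they are actually set in the current shell. The following is a minimal sketch and not part of the AlphaFold installation; the function name check_alphafold_env is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report which of the HPC_ALPHAFOLD_* variables
# listed above are set, and return non-zero if any is missing.
check_alphafold_env() {
    missing=0
    for name in HPC_ALPHAFOLD_DIR HPC_ALPHAFOLD_BIN HPC_ALPHAFOLD_REF HPC_ALPHAFOLD_DATA; do
        # Look up the variable whose name is stored in $name.
        # The names come from the fixed list above, so eval is safe here.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical use would be to run module load alphafold followed by check_alphafold_env at the top of a job script, so that a missing module shows up in the job log before any long-running step starts.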

Additional Usage Information

To simplify usage, use the alphafold_full_db.sh wrapper script. For example:

alphafold_full_db.sh --fasta_paths="${HPC_ALPHAFOLD_REF}/test.fasta" --output_dir="${HOME}/scratch" --max_template_date=2020-05-14 --use_gpu_relax=1

Starting with version 2.3, the AlphaFold documentation recommends running AlphaFold as a Docker container. However, Docker is not compatible with the HPC environment. AlphaFold has therefore been installed as an Apptainer container, and the alphafold_full_db.sh wrapper script has been created to mimic the behavior of docker/run_docker.py as referenced in the AlphaFold documentation. alphafold_full_db.sh specifies the database location options required by AlphaFold.

To specify these options manually, use run_alphafold.sh instead.

If using the --model_preset=multimer option, use the alphafold_multimer_db.sh launch script instead. Example:

alphafold_multimer_db.sh --model_preset=multimer --fasta_paths="${HPC_ALPHAFOLD_REF}/test.fasta" --output_dir="${HOME}/scratch" --max_template_date=2020-05-14 --use_gpu_relax=1
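
AlphaFold's monomer pipeline expects exactly one sequence per FASTA file, while multimer runs are given a multi-sequence FASTA file. Counting header lines is therefore a quick way to decide which launch script applies. The sketch below is an illustration only: the helper names count_fasta_records and pick_launcher are hypothetical, and choosing by record count is an assumption rather than something the wrapper scripts do themselves.

```shell
#!/bin/bash
# Count FASTA records: every record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the name of the launch script suited to the given FASTA file
# (assumption: more than one record means a multimer run).
pick_launcher() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    # grep -c exits non-zero when nothing matches, so ignore its status.
    n=$(count_fasta_records "$1") || true
    if [ "$n" -gt 1 ]; then
        echo alphafold_multimer_db.sh
    else
        echo alphafold_full_db.sh
    fi
}
```

For example, pick_launcher query.fasta prints the script name to use. The --model_preset=multimer option still has to be passed explicitly when the multimer script is selected.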

Job Script Examples

Note that AlphaFold has large memory requirements, and some of its stages use 4 or 8 CPUs in addition to a GPU. Example job scripts are shown below; the version 2.3.1 script runs the test data included with the software.

  • Sample script for version 2.1.2:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=ai
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=48gb
#SBATCH --time=12:00:00

# Record the start time, node, and working directory in the job log.
date;hostname;pwd

# Run 'module spider alphafold' to see which versions are available.
module load alphafold

# Database locations are passed explicitly. query.fasta is the input
# sequence and results are written to the current working directory.
run_alphafold.py \
    --data_dir="${HPC_ALPHAFOLD_REF}" \
    --output_dir="$(pwd)" \
    --fasta_paths=query.fasta \
    --uniref90_database_path="${HPC_ALPHAFOLD_REF}/uniref90/uniref90.fasta" \
    --mgnify_database_path="${HPC_ALPHAFOLD_REF}/mgnify/mgy_clusters_2018_12.fa" \
    --template_mmcif_dir="${HPC_ALPHAFOLD_REF}/pdb_mmcif/mmcif_files" \
    --max_template_date=2020-05-14 \
    --obsolete_pdbs_path="${HPC_ALPHAFOLD_REF}/pdb_mmcif/obsolete.dat" \
    --use_gpu_relax=1 \
    --bfd_database_path="${HPC_ALPHAFOLD_REF}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
    --uniclust30_database_path="${HPC_ALPHAFOLD_REF}/uniclust30/uniclust30_2018_08/uniclust30_2018_08" \
    --pdb70_database_path="${HPC_ALPHAFOLD_REF}/pdb70/pdb70"

# Record the finish time.
date

  • Sample script for version 2.3.1:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --constraint=a100
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gpus=1
#SBATCH --mem=300gb
#SBATCH --time=96:00:00

date;hostname;pwd

module load alphafold

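# ${HOME} is used rather than ~ because bash does not expand ~ after "="
# inside an option such as --output_dir=~/scratch.
alphafold_full_db.sh \
    --fasta_paths="${HPC_ALPHAFOLD_REF}/test.fasta" \
    --output_dir="${HOME}/scratch" \
    --max_template_date=2020-05-14 \
    --use_gpu_relax=1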

date
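
Once a job finishes, AlphaFold writes its results to a subdirectory of the output directory named after the FASTA file (without its extension), and saves the top-ranked model there as ranked_0.pdb. A minimal sketch of a post-run check follows. The function name check_alphafold_output is hypothetical, and the sketch assumes the wrapper scripts keep AlphaFold's standard output layout.

```shell
#!/bin/bash
# Hypothetical helper: check that the top-ranked model exists.
# Usage: check_alphafold_output OUTPUT_DIR FASTA_PATH
check_alphafold_output() {
    output_dir=$1
    fasta=$2
    # Strip the directory and the final extension: /ref/test.fasta -> test
    target=$(basename "$fasta")
    target=${target%.*}
    result="${output_dir}/${target}/ranked_0.pdb"
    # -s is true only if the file exists and is not empty.
    if [ -s "$result" ]; then
        echo "found ${result}"
    else
        echo "missing ${result}" >&2
        return 1
    fi
}
```

For the sample job above, the call would look like check_alphafold_output "${HOME}/scratch" "${HPC_ALPHAFOLD_REF}/test.fasta". It could be placed before the final date command so that a failed run is visible in the job log.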

Categories

protein, structure, machine_learning