PAUDA

Description

PAUDA ("Protein Alignment Using a DNA Aligner") is an approach to comparing DNA reads against a database of protein reference sequences that scales to very large datasets of hundreds of millions or billions of reads. It harnesses the high efficiency of DNA read aligners to compute BLASTX-like alignments between sequencing reads and a protein database in a small fraction of the time required by BLASTX. According to its authors, PAUDA processes DNA reads at a rate of millions of reads per CPU hour and runs about 10,000 times faster than BLASTX. Loading the module sets the environment variables listed under Environment Variables.

Environment Modules

Run module spider pauda to find out what environment modules are available for this application.

Environment Variables

HPC_PAUDA_DIR - installation directory
HPC_PAUDA_BIN - executable directory
HPC_PAUDA_DOC - documentation directory
HPC_PAUDA_DATA - data directory
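
The variables above can be inspected from a shell session or job script. The following is a minimal sketch, not an official recipe: it assumes the module is loaded with module load pauda (use the exact name and version reported by module spider pauda), and pauda_env_report is a hypothetical helper name introduced here only for illustration.

```shell
#!/bin/sh
# Load the module only where the `module` command exists (e.g. on the cluster).
# Assumption: the module is named "pauda"; check `module spider pauda` first.
if command -v module >/dev/null 2>&1; then
    module load pauda
fi

# Hypothetical helper: print each documented variable, or <unset> if it is empty or missing.
pauda_env_report() {
    for name in HPC_PAUDA_DIR HPC_PAUDA_BIN HPC_PAUDA_DOC HPC_PAUDA_DATA; do
        eval "value=\${$name:-}"
        printf '%s=%s\n' "$name" "${value:-<unset>}"
    done
}

pauda_env_report
```

If a variable prints as <unset>, the module was most likely not loaded in the current shell.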

Citation

If you publish research that uses PAUDA, please cite it as follows:

Huson, D. H., & Xie, C. (2014). A poor man's BLASTX - high-throughput metagenomic protein database search using PAUDA. Bioinformatics, 30(1), 38-39. https://doi.org/10.1093/bioinformatics/btt254

Categories

biology, ngs, phylogenetics