khmer¶
Description¶
khmer is a library and suite of command-line tools for working with DNA sequences. It is primarily aimed at short-read sequencing data such as that produced by the Illumina platform.
Environment Modules¶
To find out what environment modules are available for this application, run:
module spider khmer
Environment Variables¶
- HPC_KHMER_DIR
- HPC_KHMER_BIN
- HPC_KHMER_LIB
- HPC_KHMER_SANDBOX
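A quick way to confirm that the module is loaded is to check whether these variables are defined. The snippet below is a minimal sketch, assuming each variable holds a directory path (this page does not say so explicitly); the helper name khmer_env is hypothetical.

```python
import os

# Variable names exactly as listed on this page.
KHMER_VARS = (
    "HPC_KHMER_DIR",
    "HPC_KHMER_BIN",
    "HPC_KHMER_LIB",
    "HPC_KHMER_SANDBOX",
)


def khmer_env(environ=None):
    """Map each HPC_KHMER_* variable to its value, or None if it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in KHMER_VARS}


if __name__ == "__main__":
    for name, value in khmer_env().items():
        shown = value if value is not None else "(not set; is the module loaded?)"
        print(f"{name} = {shown}")
```

Passing a plain dictionary instead of the real environment makes the helper easy to try out without loading the module.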
Additional Usage Information¶
To begin using khmer in a script or an interactive Python session, import it:
import khmer
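Python can only find the package once the environment module is loaded, so a guarded import can give a clearer message than a bare traceback. This is a minimal sketch: the helper name load_khmer is hypothetical, and the version lookup assumes the installed release exposes a __version__ attribute, falling back to "unknown" if it does not.

```python
import importlib


def load_khmer():
    """Return the khmer module if it can be imported, otherwise None."""
    try:
        return importlib.import_module("khmer")
    except ImportError:
        return None


khmer = load_khmer()
if khmer is None:
    print("khmer is not importable; load the khmer environment module first.")
else:
    # __version__ is an assumption; fall back gracefully if it is absent.
    print("khmer version:", getattr(khmer, "__version__", "unknown"))
```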
Available Scripts:
- abundance-dist.py
- count-median.py
- do-partition.sh
- filter-abund.py
- find-knots.py
- load-into-counting.py
- merge-partitions.py
- normalize-by-median.py
- partition-graph.py
- annotate-partitions.py
- count-overlap.py
- extract-partitions.py
- filter-stoptags.py
- load-graph.py
- make-initial-stoptags.py
- normalize-by-kadian.py
- normalize-by-min.py
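The scripts listed above are ordinary command-line programs, so they can also be driven from Python. The sketch below rests on assumptions: it guesses that the scripts live in the directories named by HPC_KHMER_BIN or HPC_KHMER_SANDBOX, and the helper names find_script and script_command are hypothetical. It only requests --help, because the options each script accepts can differ between khmer releases.

```python
import os
import subprocess

# Assumption: scripts are installed under one of these directories.
SEARCH_VARS = ("HPC_KHMER_BIN", "HPC_KHMER_SANDBOX")


def find_script(name, environ=None):
    """Return the full path of a khmer script, or None if it is not found."""
    environ = os.environ if environ is None else environ
    for var in SEARCH_VARS:
        directory = environ.get(var)
        if directory:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


def script_command(name, *args, environ=None):
    """Build an argument list for subprocess.run.

    Falls back to the bare script name (resolved via PATH) when the
    script is not found in the assumed directories.
    """
    path = find_script(name, environ)
    return [path or name, *args]


if __name__ == "__main__":
    cmd = script_command("normalize-by-median.py", "--help")
    print("Command:", " ".join(cmd))
    # Only run it if the script was actually located; this also assumes
    # the file is executable.
    if find_script("normalize-by-median.py"):
        subprocess.run(cmd, check=False)
```

Building the argument list separately from running it keeps the lookup logic easy to check before any job is submitted.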
Citation¶
If you use the khmer software, you must cite:
Crusoe, M. R., Alameldin, H. F., Awad, S., Boucher, E., Caldwell, A., Cartwright, R., Charbonneau, A., Constantinides, B., Edvenson, G., Fay, S., Fenton, J., Fenzl, T., Fish, J., Garcia-Gutierrez, L., Garland, P., Gluck, J., González, I., Guermond, S., Guo, J., ... Brown, C. T. (2014). The khmer software package: Enabling efficient sequence analysis. Figshare. https://doi.org/10.6084/m9.figshare.979190
If you use any of khmer's published scientific methods, you should also cite the relevant paper(s) listed below:
Graph partitioning and/or compressible graph representation: The load-graph.py, partition-graph.py, and find-knots.py scripts are part of the compressible graph representation and partitioning algorithms described in:
- Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J. M., & Brown, C. T. (2012). Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proceedings of the National Academy of Sciences of the United States of America, 109(33), 13272–13277. https://doi.org/10.1073/pnas.1121464109
Digital normalization: The normalize-by-median.py and count-median.py scripts are part of the digital normalization algorithm, described in:
- Brown, C. T., Howe, A. C., Zhang, Q., Pyrkosz, A. B., & Brom, T. H. (2012). A reference-free algorithm for computational normalization of shotgun sequencing data. arXiv preprint, arXiv:1203.4802 [q-bio.GN]. https://doi.org/10.48550/arXiv.1203.4802
K-mer counting: The abundance-dist.py, filter-abund.py, and load-into-counting.py scripts implement the probabilistic k-mer counting described in:
- Zhang, Q., Pell, J., Canino-Koning, R., Howe, A. C., & Brown, C. T. (2014). These are not the k-mers you are looking for: Efficient online k-mer counting using a probabilistic data structure. PLoS ONE, 9(7), e101271. https://doi.org/10.1371/journal.pone.0101271
Categories¶
biology, ngs