
khmer

Description

khmer website

khmer is a library and suite of command-line tools for working with DNA sequences. It is primarily aimed at short-read sequencing data such as that produced by the Illumina platform.

Environment Modules

Run module spider khmer to find out what environment modules are available for this application.

Environment Variables

  • HPC_KHMER_DIR
  • HPC_KHMER_BIN
  • HPC_KHMER_LIB
  • HPC_KHMER_SANDBOX
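
As a minimal sketch, the variables above could be inspected from Python once a khmer module is loaded. The helper name khmer_env is hypothetical, and this page does not state what each variable points to, so the code only reports whatever values are set.

```python
import os

# Variable names taken from the list above; they are expected to be set
# only after a khmer environment module has been loaded.
KHMER_VARS = (
    "HPC_KHMER_DIR",
    "HPC_KHMER_BIN",
    "HPC_KHMER_LIB",
    "HPC_KHMER_SANDBOX",
)


def khmer_env(environ=None):
    """Map each HPC_KHMER_* variable to its value, or None if it is unset.

    khmer_env is a hypothetical helper for illustration, not part of khmer.
    """
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in KHMER_VARS}


if __name__ == "__main__":
    for name, value in khmer_env().items():
        shown = value if value is not None else "(not set; is the module loaded?)"
        print(f"{name} = {shown}")
```

Passing a plain dictionary instead of os.environ makes the helper easy to check without loading the module.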

Additional Usage Information

Use import khmer in your script or in an interactive Python session to begin using khmer.
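
A quick way to confirm that the import works in the current environment might look like the sketch below. The function name khmer_version is hypothetical, and the presence of a __version__ attribute on the package is an assumption, so the code falls back to "unknown" if it is missing.

```python
import importlib


def khmer_version():
    """Return the khmer version string, or None if khmer cannot be imported.

    khmer_version is a hypothetical helper for illustration only.
    """
    try:
        khmer = importlib.import_module("khmer")
    except ImportError:
        return None
    # Assumption: the package exposes __version__; use "unknown" otherwise.
    return str(getattr(khmer, "__version__", "unknown"))


if __name__ == "__main__":
    version = khmer_version()
    if version is None:
        print("khmer is not importable; load the environment module first.")
    else:
        print(f"khmer {version} is available")
```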

Available Scripts:

  • abundance-dist.py
  • count-median.py
  • do-partition.sh
  • filter-abund.py
  • find-knots.py
  • load-into-counting.py
  • merge-partitions.py
  • normalize-by-median.py
  • partition-graph.py
  • annotate-partitions.py
  • count-overlap.py
  • extract-partitions.py
  • filter-stoptags.py
  • load-graph.py
  • make-initial-stoptags.py
  • normalize-by-kadian.py
  • normalize-by-min.py
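
To check which of these scripts are reachable in the current session, a small sketch such as the following could be used. The helper name locate_scripts is hypothetical, and the code assumes that loading the module places the scripts on PATH, which this page does not state explicitly.

```python
import shutil

# A few script names copied from the list above; extend as needed.
SCRIPTS = [
    "abundance-dist.py",
    "load-into-counting.py",
    "normalize-by-median.py",
]


def locate_scripts(names):
    """Map each script name to its full path on PATH, or None if not found.

    locate_scripts is a hypothetical helper, not part of khmer.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_scripts(SCRIPTS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A script reported as not found may simply live in a directory that is not on PATH, so the result is a hint rather than proof that the script is missing.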

Citation

If you use the khmer software, you must cite:

Crusoe, M. R., Alameldin, H. F., Awad, S., Boucher, E., Caldwell, A., Cartwright, R., Charbonneau, A., Constantinides, B., Edvenson, G., Fay, S., Fenton, J., Fenzl, T., Fish, J., Garcia-Gutierrez, L., Garland, P., Gluck, J., González, I., Guermond, S., Guo, J., Brown, C. T. (2014). The khmer software package: Enabling efficient sequence analysis. Figshare. https://doi.org/10.6084/m9.figshare.979190

If you use any of khmer's published scientific methods, you should also cite the relevant paper(s) listed below:

Graph partitioning and/or compressible graph representation: The load-graph.py, partition-graph.py, and find-knots.py scripts are part of the compressible graph representation and partitioning algorithms described in:

  • Pell, J., Hintze, A., Canino-Koning, R., Howe, A., Tiedje, J. M., & Brown, C. T. (2012). Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Proceedings of the National Academy of Sciences of the United States of America, 109(33), 13272–13277. https://doi.org/10.1073/pnas.1121464109

Digital normalization: The normalize-by-median.py and count-median.py scripts are part of the digital normalization algorithm, described in:

  • Brown, C. T., Howe, A. C., Zhang, Q., Pyrkosz, A. B., & Brom, T. H. (2012). A reference-free algorithm for computational normalization of shotgun sequencing data. arXiv preprint, arXiv:1203.4802 [q-bio.GN]. https://doi.org/10.48550/arXiv.1203.4802

K-mer counting: The abundance-dist.py, filter-abund.py, and load-into-counting.py scripts implement the probabilistic k-mer counting described in:

  • Zhang, Q., Pell, J., Canino-Koning, R., Howe, A. C., & Brown, C. T. (2014). These are not the k-mers you are looking for: Efficient online k-mer counting using a probabilistic data structure. PLoS ONE, 9(7), e101271. https://doi.org/10.1371/journal.pone.0101271

Categories

biology, ngs