kmerfreq

Description

kmerfreq website

kmerfreq counts the frequency of K-mers (substrings of length K) in input sequence data. The input is typically sequencing reads, but reference genome sequences can also be used. A k-mer and its reverse complement are treated as the same k-mer, and the strand with the smaller bit-value represents it.

The frequency of each unique K-mer is stored as a 16-bit integer with a maximum value of 65535. Any K-mer that occurs more than 65535 times is recorded as 65535. The program stores all frequency values in an array of 4^K 16-bit (2-byte) integers, indexed by the k-mer bit-value, so total memory usage is 2 × 4^K bytes regardless of input size. For K = 15, 16, 17, 18, and 19, this is a constant 2, 8, 32, 128, and 512 GiB of memory, respectively.

kmerfreq uses a simple, highly parallel design to run as fast as possible. Its output files can be used as input for the programs GCE and correct_error_reads.
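The counting scheme described above (canonical k-mers, a dense table indexed by k-mer bit-value, and saturating 16-bit counters) can be illustrated with the minimal Python sketch below. It is not kmerfreq's source code. It assumes an A=0, C=1, G=2, T=3 2-bit encoding and assumes that windows containing a non-ACGT base such as N are skipped. The function names `count_kmers`, `reverse_complement_code`, and `memory_bytes` are hypothetical.

```python
from array import array

# Assumed 2-bit nucleotide encoding (hypothetical; kmerfreq's exact encoding is not stated here).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
MAX_COUNT = 65535  # ceiling of an unsigned 16-bit counter


def reverse_complement_code(code: int, k: int) -> int:
    """Return the 2-bit code of the reverse complement of a k-mer code."""
    rc = 0
    for _ in range(k):
        # With A=0, C=1, G=2, T=3, the complement of base b is 3 - b.
        # The last base of the k-mer becomes the first base of its reverse complement.
        rc = (rc << 2) | (3 - (code & 3))
        code >>= 2
    return rc


def count_kmers(seq: str, k: int) -> array:
    """Count canonical k-mers into a dense table of 4**k saturating 16-bit counters."""
    counts = array("H", [0]) * (4 ** k)  # 2 bytes per slot, 2 * 4**k bytes in total
    mask = (1 << (2 * k)) - 1            # keeps only the last k bases of the rolling code
    code = 0
    valid = 0  # consecutive valid bases in the current window
    for base in seq.upper():
        b = ENCODE.get(base)
        if b is None:  # assumption: a non-ACGT base such as 'N' restarts the window
            code = 0
            valid = 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= k:
            # A k-mer and its reverse complement share one slot: the smaller code.
            canonical = min(code, reverse_complement_code(code, k))
            if counts[canonical] < MAX_COUNT:  # saturate at 65535 instead of overflowing
                counts[canonical] += 1
    return counts


def memory_bytes(k: int) -> int:
    """Size of the dense counter table: 4**k slots of 2 bytes each."""
    return 2 * 4 ** k
```

For example, `count_kmers("AAAA", 2)` and `count_kmers("TTTT", 2)` both place a count of 3 in slot 0, because AA and TT are reverse complements. `memory_bytes(17) / 2**30` gives `32.0`, which matches the 32 GiB figure for K = 17. A dense table makes each update a single array access. The cost is memory that grows fourfold with every increment of K.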

Environment Modules

Run module spider kmerfreq to find out what environment modules are available for this application.

Environment Variables

  • HPC_KMERFREQ_DIR - installation directory
  • HPC_KMERFREQ_BIN - executable directory

Citation

If you publish research that uses kmerfreq, you must cite it as follows:

  • Liu, B., Shi, Y., Yuan, J., Hu, X., Zhang, H., Li, N., Li, Z., Chen, Y., Mu, D., Fan, W., et al. (2013). Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint, arXiv:1308.2012 [q-bio.GN]. https://doi.org/10.48550/arXiv.1308.2012

  • Wang, H., Liu, B., Zhang, Y., Jiang, F., Ren, Y., Yin, L., Liu, H., Wang, S., & Fan, W. (2020). Estimation of genome size using k-mer frequencies from corrected long reads. arXiv preprint, arXiv:2003.11817 [q-bio.GN]. https://doi.org/10.48550/arXiv.2003.11817

Categories

biology, genomics