RepeatAnalysisTools
Description
This repository contains instructions for processing and repeat analysis of sequence data generated with the PacBio No-Amp Targeted Sequencing Protocol with simplified double Cas9 cut.
UPDATE: The RepeatAnalysisTools scripts in this repository now use Python 3.
Outputs from the analysis scripts include high-accuracy (>=QV20) CCS sequences for target regions, so users can easily analyze the results with third-party tools as necessary.
Environment Modules
Run module spider repeatanalysistools to find out what environment modules are available for this application.
Environment Variables
- HPC_RATOOLS_DIR - installation directory
- HPC_RATOOLS_BIN - executable directory
Additional Usage Information
To use any of the Bash or Python scripts provided with the tools (i.e., the files ending with .sh or .py), prefix the script name with ${HPC_RATOOLS_DIR}/
For example, a preprocess.sh command might look something like:
${HPC_RATOOLS_DIR}/preprocess.sh \
m64012_191221_044659.subreads.bam \
m64012_191221_044659.adapters.fasta \
/data/reference/genomes/human/hs37d5/hs37d5.fa \
./output \
16 \
16 \
local
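Because every script must be prefixed with ${HPC_RATOOLS_DIR}/, a small wrapper can catch the common mistake of running a script before the environment module is loaded. The following is a minimal sketch, not part of RepeatAnalysisTools: the function name ratools_script is hypothetical, and it assumes only that HPC_RATOOLS_DIR is set by the environment module as described above.

```shell
# Hypothetical helper (not shipped with RepeatAnalysisTools):
# print the full path of a script under HPC_RATOOLS_DIR, or fail with a message.
ratools_script() {
    script_name="$1"

    # HPC_RATOOLS_DIR is expected to be set by the environment module.
    if [ -z "${HPC_RATOOLS_DIR:-}" ]; then
        echo "HPC_RATOOLS_DIR is not set; load the environment module first" >&2
        return 1
    fi

    script_path="${HPC_RATOOLS_DIR}/${script_name}"

    # Fail early if the requested script is not in the installation directory.
    if [ ! -f "${script_path}" ]; then
        echo "Script not found: ${script_path}" >&2
        return 1
    fi

    printf '%s\n' "${script_path}"
}
```

With the module loaded, the resolved path can stand in for the prefixed script name, for example "$(ratools_script preprocess.sh)" followed by the same arguments as in the command above.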
Citation
If you publish research that uses RepeatAnalysisTools, please cite the following entry, which references GNU Parallel:
@software{tange_2021_5013933,
author = {Tange, Ole},
title = {GNU Parallel 20210622 ('Protasevich')},
month = Jun,
year = 2021,
note = {{GNU Parallel is a general parallelizer to run
multiple serial command line programs in parallel
without changing them.}},
publisher = {Zenodo},
doi = {10.5281/zenodo.5013933},
url = {https://doi.org/10.5281/zenodo.5013933}
}
Categories
biology, genomics, sequencing, utility