SRA¶
Description¶
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. The toolkit contains loading and dumping tools with their respective libraries for building new and accessing existing runs. The online documentation is located at https://github.com/ncbi/sra-tools/wiki
Environment Modules¶
Run module spider sra
to find out what environment modules are available for this application.
Environment Variables¶
- HPC_SRA_DIR - location of the installation directory
- HPC_SRA_BIN - location of the executables directory
- HPC_SRA_DOC - location of the documentation directory
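As a quick check, the variables above can be inspected once a module is loaded. The following is a minimal sketch; the helper name show_sra_env is hypothetical, and the `module load sra` line is an assumption based on the module name used with `module spider`.

```shell
# Assumption: run this first on the cluster (module name taken from "module spider sra"):
#   module load sra

# Print each HPC_SRA_* variable, or a note if it is unset.
show_sra_env() {
    for name in HPC_SRA_DIR HPC_SRA_BIN HPC_SRA_DOC; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name"
        fi
    done
}

show_sra_env
```

If every variable is reported as not set, the module has most likely not been loaded in the current shell.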
Additional Usage Information¶
Warning
sra creates a $HOME/ncbi/public directory for caching prefetched data files.
Set Prefetch Directory¶
The sra toolkit creates a $HOME/ncbi/public directory for caching prefetched data files. The home directory has a 40 GB limit, and using it for job data storage violates the RC storage policy.
Change that location to a directory in your blue directory tree before running the sra toolkit. The official approach is to use the vdb-config tool in interactive mode:
vdb-config -i
In the config tool change the directory to, for example, /blue/$GROUP/$USER/ncbi/public. See the SRA Toolkit Configuration Documentation for more details.
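For batch scripts, a non-interactive variant may be more convenient than the interactive tool. The sketch below assumes that `vdb-config -s name=value` is available and that the public cache location is stored under the key /repository/user/main/public/root; both can differ between toolkit versions, so confirm the result with `vdb-config -i`. The helper sra_cache_path and the group name "mygroup" are placeholders.

```shell
# Build the cache path used in this section from a group and a user name.
sra_cache_path() {
    printf '/blue/%s/%s/ncbi/public\n' "$1" "$2"
}

# "mygroup" is a placeholder; replace it with your own group name.
cache_dir=$(sra_cache_path "mygroup" "${USER:-myuser}")

# Apply the setting only when vdb-config is on the PATH (module loaded).
if command -v vdb-config >/dev/null 2>&1; then
    mkdir -p "$cache_dir"
    # Assumption: this key controls the public cache root in your toolkit version.
    vdb-config -s "/repository/user/main/public/root=$cache_dir"
else
    echo "vdb-config not found; load the sra module first" >&2
fi
```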
Alternatively, use a symlink to redirect the cache directory to blue.
- Remove the 'ncbi' directory in your home directory:
$ cd
$ rm -rf ncbi
- Create a 'ncbi' directory in your /blue space (replace mygroup with your group name):
$ mkdir /blue/mygroup/$USER/ncbi
- Symlink the ncbi directory in blue into your home directory:
$ ln -s /blue/mygroup/$USER/ncbi/ ~/ncbi
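The symlink steps above can also be wrapped in a small function. This is a sketch rather than the documented procedure: unlike the `rm -rf` step, it copies any existing cache contents to the new location before removing the old directory. The function name link_ncbi_cache and the paths in the example call are hypothetical.

```shell
# Redirect <home>/ncbi to <target> with a symlink, keeping any cached files.
# Arguments: $1 = home directory, $2 = target directory (e.g. on /blue).
link_ncbi_cache() {
    home_dir="$1"
    target="$2"
    link="$home_dir/ncbi"

    mkdir -p "$target" || return 1

    if [ -L "$link" ]; then
        # An older symlink exists: drop it so it can be recreated.
        rm "$link" || return 1
    elif [ -d "$link" ]; then
        # A real directory exists: copy its contents, then remove it.
        cp -Rp "$link/." "$target/" || return 1
        rm -rf "$link" || return 1
    fi

    ln -s "$target" "$link"
}

# Example call (placeholder group name "mygroup"):
#   link_ncbi_cache "$HOME" "/blue/mygroup/$USER/ncbi"
```

Afterwards, `ls -l ~/ncbi` should show the link pointing at the directory in blue.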
Uploads¶
Data uploads to NCBI appear to work only from the login servers, likely because of IP address range blocks. If you are concerned about being disconnected, start a screen session before beginning an upload.
Categories¶
biology, ngs