SRA

Description

sra website

The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. The toolkit contains loading and dumping tools with their respective libraries for building new and accessing existing runs. The online documentation is located at https://github.com/ncbi/sra-tools/wiki

Environment Modules

Run module spider sra to find out what environment modules are available for this application.

Environment Variables

  • HPC_SRA_DIR - location of the installation directory
  • HPC_SRA_BIN - location of the executables directory
  • HPC_SRA_DOC - location of the documentation directory
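
To confirm that the variables listed above are available in a session, a small check like the following could be run after loading the module. This is a minimal sketch: it assumes the module is loaded with `module load sra` (the name used with `module spider` above), and `show_sra_env` is a hypothetical helper name, not part of the toolkit.

```shell
# Hypothetical helper: print each HPC_SRA_* variable, or report it as unset.
# Assumes the module has been loaded first, e.g. with: module load sra
show_sra_env() {
    for name in HPC_SRA_DIR HPC_SRA_BIN HPC_SRA_DOC; do
        # Indirect variable lookup that also works in plain POSIX sh.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set (is the sra module loaded?)\n' "$name"
        fi
    done
}

show_sra_env
```

If every line reports a path, the module is loaded and the executables can be listed with `ls "$HPC_SRA_BIN"`.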

Additional Usage Information

Warning

sra creates a $HOME/ncbi/public directory for caching prefetched data files.

Set Prefetch Directory

By default, the sra toolkit creates a $HOME/ncbi/public directory for caching prefetched data files. The home directory has a 40 GB limit, and using it for job data storage violates the RC storage policy.

Change the cache location to a directory in your blue directory tree before running the sra toolkit. The official approach is to use the vdb-config tool:

vdb-config -i

In the config tool change the directory to, for example, /blue/$GROUP/$USER/ncbi/public. See the SRA Toolkit Configuration Documentation for more details.
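
It can help to create the target directory before opening the config tool, so that the path already exists when it is entered. The following is a sketch under stated assumptions: `make_cache_dir` and `BLUE_ROOT` are hypothetical names introduced for illustration, and the fallback to the primary group name when `$GROUP` is unset is an assumption that may not match your blue allocation.

```shell
# Hypothetical helper: build the example cache path and create it.
# BLUE_ROOT is an illustrative override; it defaults to /blue.
make_cache_dir() {
    group=${GROUP:-$(id -gn)}   # assumption: fall back to the primary group name
    user=${USER:-$(id -un)}
    dir="${BLUE_ROOT:-/blue}/$group/$user/ncbi/public"
    # Create the directory (and any missing parents), then print its path.
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Example use: create the directory, then enter the printed path in vdb-config -i
# make_cache_dir
```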

Alternatively, use a symlink to redirect the cache directory to blue.

  • Remove the 'ncbi' directory in your home directory (this deletes any files already cached there):
        $ cd
        $ rm -rf ncbi
    
  • Create an 'ncbi' directory in your /blue space:
        $ mkdir /blue/mygroup/$USER/ncbi
    
  • Symlink the 'ncbi' directory in blue into your home directory:
        $ ln -s /blue/mygroup/$USER/ncbi/ ~/ncbi
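
The symlink steps can also be wrapped in a small function that copies any existing cache contents to the new location instead of discarding them. This is a minimal sketch, not an official procedure: `redirect_ncbi_cache` is a hypothetical name, and `mygroup` in the example call is the same placeholder used in the steps above.

```shell
# Hypothetical helper: point <home_dir>/ncbi at <target_dir> via a symlink,
# keeping any files that were already cached.
# Usage: redirect_ncbi_cache <home_dir> <target_dir>
redirect_ncbi_cache() {
    home_dir=$1
    target=$2
    link="$home_dir/ncbi"

    mkdir -p "$target" || return 1

    if [ -L "$link" ]; then
        # Already a symlink: remove it so it can be recreated below.
        rm "$link" || return 1
    elif [ -d "$link" ]; then
        # Real directory: copy its contents to the target, then remove it.
        cp -a "$link/." "$target/" || return 1
        rm -rf "$link" || return 1
    fi

    ln -s "$target" "$link"
}

# Example call:
# redirect_ncbi_cache "$HOME" "/blue/mygroup/$USER/ncbi"
```

Afterwards, `ls -ld ~/ncbi` should show the link pointing into the blue directory tree.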
    

Uploads

Data uploads to NCBI appear to work only from the login servers, probably because of IP address range blocks. If you are concerned about being disconnected, start a screen session before beginning an upload.
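
A quick way to verify that an upload is about to run inside a screen session is to check the `STY` variable, which GNU screen sets for the shells it starts. The sketch below assumes GNU screen; `in_screen` is a hypothetical helper and `sra_upload` is an example session name.

```shell
# Hypothetical helper: succeed if the current shell runs inside GNU screen.
# GNU screen exports STY to the processes it starts.
in_screen() {
    [ -n "${STY:-}" ]
}

if in_screen; then
    echo "Inside screen session: $STY"
else
    echo "Not inside screen; start one first, e.g.: screen -S sra_upload"
fi
```

A session started with `screen -S sra_upload` can be detached with Ctrl-a d and reattached later with `screen -r sra_upload`.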

Categories

biology, ngs