R

Description

R website

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, and clustering) as well as graphical techniques, and it is highly extensible. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.

Note: This module's environment is compatible with rstudio/1.1.419, so personal packages installed under either module will work with the other. The default installation directory for personal packages is ~/R/x86_64-pc-linux-gnu-library/3.4/. The modules system sets up the environment variables listed under Environment Variables for this module.

Environment Modules

Run module spider R to find out what environment modules are available for this application.

Environment Variables

  • HPC_R_DIR - installation directory
  • HPC_R_BIN - executable directory

Additional Usage Information

R can be run on the command line (or through the batch system) with either the Rscript myscript.R or the R CMD BATCH myscript.R command. For script development or visualization, the RStudio GUI application can be used; see the Open OnDemand documentation for details. Alternatively, an instance of RStudio Server can be started in a job, and you can then connect to it through an SSH tunnel from a web browser on your local computer.
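As a minimal sketch of what a script run this way might look like, the example below reads an optional command-line argument and prints a small summary. The file name myscript.R comes from the commands above; the helper parse_n and the toy computation are hypothetical and only illustrate the pattern.

```r
# myscript.R: a hypothetical script meant to be run non-interactively, e.g.
#   Rscript myscript.R 1000
# With R CMD BATCH myscript.R, printed output goes to myscript.Rout instead
# of the terminal.

# Parse the first trailing argument as a positive integer, falling back to a
# default when it is missing or not a number. (parse_n is an illustrative
# helper, not part of R.)
parse_n <- function(args, default = 100L) {
  if (length(args) < 1) return(default)
  n <- suppressWarnings(as.integer(args[1]))
  if (is.na(n) || n < 1L) default else n
}

# commandArgs(trailingOnly = TRUE) returns only the arguments that follow
# the script name (or that follow --args).
n <- parse_n(commandArgs(trailingOnly = TRUE))

# A small reproducible computation standing in for real work.
set.seed(42)
x <- rnorm(n)
result <- c(n = n, mean = mean(x), sd = sd(x))

print(result)
```

Keeping argument parsing in a small function makes the script behave the same whether or not an argument is supplied, which is convenient when the same file is used both interactively and in a job.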

Notes and Warnings

  • The parallel::detectCores() function returns the total number of cores on a compute node, not the number of cores assigned to your job by the scheduler. Instead, use something like numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE")) to obtain the number of CPU cores 'X' requested in your job script with #SBATCH --cpus-per-task=X

  • Default RData format: As of R 3.6.0, the default serialization format used to save RData files is version 3 (RDX3), which R versions prior to 3.5.0 cannot read. Keep this in mind if you copy RData files from HiPerGator to an external system with an older R installed.

  • Java: rJava users need to load the java module manually, for example with module load java/1.7.0_79. Use the java module version appropriate for your case.

  • TMPDIR: If temporary files are produced, they may fill up the memory disks on HPG2 nodes and cause node and job failures. Use something like

    mkdir -p tmp
    export TMPDIR=$(pwd)/tmp
    
    in your job script to prevent this, and launch your job from the respective directory rather than from your home directory.

  • For users working with PHI and FERPA data: It is particularly important to set your working directory and TMPDIR to locations inside your project's PHI/FERPA-configured directory in /blue when working with R. Writing files to $HOME or to the default $TMPDIR could expose restricted data to unauthorized users.

  • Tasks vs. cores for parallel runs: Parallel threads in an R job will be bound to the same CPU core even if multiple tasks (ntasks) are specified in the job script. Use cpus-per-task to use the R 'parallel' package correctly. For example, for an 8-thread parallel job, use the following resource request in your job script:

    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    
  • See the single-threaded and multi-threaded examples in the Sample SLURM Scripts page for more details.
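
Several of the notes above can be combined in one short R script. The following is a sketch under stated assumptions, not a recommended implementation: it takes the core count from SLURM_CPUS_ON_NODE rather than detectCores(), runs a toy workload with parallel::mclapply, and saves the result in serialization format version 2 so that older R installations can read it. The names slurm_cores, slow_square, and squares.RData are hypothetical.

```r
library(parallel)

# Number of cores granted by the scheduler. Falls back to a default when the
# variable is unset or not a positive integer (e.g. outside a SLURM job).
# slurm_cores is an illustrative helper, not part of the parallel package.
slurm_cores <- function(default = 1L) {
  n <- suppressWarnings(
    as.integer(Sys.getenv("SLURM_CPUS_ON_NODE", unset = NA))
  )
  if (is.na(n) || n < 1L) default else n
}

num_cores <- slurm_cores()

# Toy workload standing in for a real computation.
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# mclapply forks worker processes, so it needs a Unix-like system such as
# the cluster's Linux nodes; mc.cores caps the number of workers.
results <- mclapply(1:16, slow_square, mc.cores = num_cores)
squares <- unlist(results)

# version = 2 writes the older serialization format, which R releases before
# 3.5.0 can still load. The file lands in the current working directory.
save(squares, file = "squares.RData", version = 2)
```

With the 8-core resource request shown above, num_cores would be expected to be 8 inside the job; outside a job the fallback keeps the script runnable on a single core.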

Categories

statistics