
R

Description

R website

This module enables the use of the R software.

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

Environment Modules

Run module spider R to find out which environment modules are available for this application.

Environment Variables

The modules system sets the following environment variables for this module:

  • HPC_R_DIR - installation directory
  • HPC_R_BIN - executable directory

Additional Usage Information

R can be run from the command line (or in a batch job) with either Rscript myscript.R or R CMD BATCH myscript.R. Rscript writes the script's output to standard output (the job's output file in a batch job), while R CMD BATCH writes it to myscript.Rout by default. For script development or visualization, the RStudio GUI application can be used; see the Open OnDemand documentation for details. Alternatively, an RStudio_Server instance can be started in a job and accessed from a web browser on your local computer through an SSH tunnel.

Notes and Warnings

  • parallel::detectCores() returns the total number of cores on the compute node, not the number of cores the scheduler assigned to your job. Instead, use something like numCores <- as.integer(Sys.getenv("SLURM_CPUS_ON_NODE")) to get the number of CPU cores X requested in your job script with #SBATCH --cpus-per-task=X.

  • Default RData format: As of R 3.6.0, the default serialization format for saving RData files is version 3 (RDX3), which R versions older than 3.5.0 cannot read. Keep this in mind if you copy RData files from HiPerGator to an external system with an older R installed; if needed, save with version = 2, e.g. save(x, file = "x.RData", version = 2).

  • Java: rJava users need to load the java module manually, e.g. module load java/1.7.0_79. Load the java module version that fits your case.

  • TMPDIR: Temporary files produced by R may fill up the memory disks on HPG2 nodes and cause node and job failures. Use something like

    mkdir -p tmp
    export TMPDIR=$(pwd)/tmp
    
    in your job script to prevent this, and submit the job from that directory rather than from your home directory.
  • PHI and FERPA users: When working with R, it is particularly important to set both your working directory and TMPDIR to your project's PHI/FERPA-configured directory in /blue. Writing files to $HOME or to the default $TMPDIR could expose restricted data to unauthorized users.

  • Tasks vs. cores for parallel runs: Parallel threads in an R job will be bound to the same CPU core even if multiple tasks are requested with --ntasks in the job script. Request cores with --cpus-per-task to use R's parallel package correctly. For example, for an 8-thread parallel job use the following resource request in your job script:

    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    
  • See the single-threaded and multi-threaded examples on the Sample SLURM Scripts page for more details.
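
Taken together, the notes on cores and TMPDIR amount to a few lines at the top of a job script. The sketch below assumes a single-node job submitted with --cpus-per-task; the function name setup_r_job_env and the variable R_NUM_CORES are illustrative names, not something the R module defines.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the R module): prepare the
# environment for an R job before calling Rscript.
setup_r_job_env() {
    # Cores Slurm allocated on this node; fall back to 1 when run
    # outside a Slurm job (e.g. a quick test on a login node).
    export R_NUM_CORES="${SLURM_CPUS_ON_NODE:-1}"

    # Keep R's temporary files in the submission directory instead of
    # the node's memory disk, as recommended in the TMPDIR note.
    local tmp_dir="${PWD}/tmp"
    mkdir -p "${tmp_dir}"
    export TMPDIR="${tmp_dir}"
}

setup_r_job_env
echo "R will use ${R_NUM_CORES} core(s); TMPDIR=${TMPDIR}"
# In a real job script, follow with e.g.: Rscript myRscript.R
```

Inside the R script, numCores <- as.integer(Sys.getenv("R_NUM_CORES", "1")) then gives a value you can pass on, for example as mc.cores in parallel::mclapply().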

Installed Libraries:

You can install your own libraries (packages) for use with R. These are stored in your home directory (/home). For details, see the section "How do I install R packages?" in our Applications FAQ.

Make sure the library directory for your version of R exists; otherwise R will try to install to a system path and fail. For example, for R/4.3 run the following command before attempting to install a package:

mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.3

You can set a custom library path with the R_LIBS_USER environment variable. From https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html:

R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and R versions. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers.
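
If you prefer not to rely on R's default, you can set R_LIBS_USER yourself, for example in your job script or ~/.bashrc. A minimal sketch, assuming R 4.3 on the x86_64-pc-linux-gnu platform used in the mkdir example; R_VERSION is an illustrative variable name and should match the R module you load.

```shell
#!/bin/bash
# Illustrative: set this to the major.minor version of the R module you load.
R_VERSION="4.3"

# Same layout as R's default ~/R/%p-library/%v on x86_64 Linux.
export R_LIBS_USER="${HOME}/R/x86_64-pc-linux-gnu-library/${R_VERSION}"

# R ignores R_LIBS_USER if the folder does not exist, so create it first.
mkdir -p "${R_LIBS_USER}"
echo "User packages will be installed to ${R_LIBS_USER}"
```

Running .libPaths() in an R session started afterwards should include this directory.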

To see a list of installed libraries in the currently loaded version of R:

$ R
> installed.packages()

R MPI Example


Example of using the Rmpi library to run MPI jobs under SLURM. The job scripts below refer to this code as rmpi_test.R.

# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}

ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves=ns)
#
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function() {
    if (is.loaded("mpi_initialize")) {
        if (mpi.comm.size(1) > 0) {
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}
# Tell all slaves to return a message identifying themselves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

# Test computations
x <- 5
x <- mpi.remote.exec(rnorm, x)
length(x)
x

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves(dellog = FALSE)
mpi.quit()

Example job script that runs the rmpi_test.R script:

#!/bin/sh
#SBATCH --job-name=mpi_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE # Where to send mail  
#SBATCH --cpus-per-task=1 # Number of cores per MPI rank 
#SBATCH --nodes=2 #Number of nodes
#SBATCH --ntasks=8 # Number of MPI ranks
#SBATCH --ntasks-per-node=4 #How many tasks on each node
#SBATCH --ntasks-per-socket=2 #How many tasks on each CPU or socket
#SBATCH --distribution=cyclic:cyclic #Distribute tasks cyclically on nodes and sockets
#SBATCH --mem-per-cpu=1gb # Memory per processor
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=mpi_test_%j.out # Standard output and error log
pwd; hostname; date

echo "Running example Rmpi script. Using $SLURM_JOB_NUM_NODES nodes with $SLURM_NTASKS tasks, each with $SLURM_CPUS_PER_TASK cores."
module purge; module load gcc openmpi rmpi

srun --mpi=${HPC_PMIX} Rscript /data/training/SLURM/rmpi_test.R

date

With the rmpi/4.0 module, the following command works in place of the srun line:

mpiexec -n ${SLURM_NTASKS} Rscript rmpi_test.R

The rmpi configuration file /apps/rmpi/conf/Rprofile must be available as .Rprofile in the working directory. If the rmpi module does not add this symlink automatically, create it with:

ln -s /apps/rmpi/conf/Rprofile .Rprofile
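
When the same job script is reused in several directories, a guarded version of the link step avoids the "File exists" error from ln -s and leaves any .Rprofile you already keep there untouched. This is a sketch; it assumes only the /apps/rmpi/conf/Rprofile path given above.

```shell
#!/bin/bash
# Path to the rmpi configuration file, as given in the instructions above.
RMPI_RPROFILE="/apps/rmpi/conf/Rprofile"

# -L catches an existing symlink even if its target is missing;
# -e catches a regular .Rprofile file.
if [ -L .Rprofile ] || [ -e .Rprofile ]; then
    echo ".Rprofile already present in ${PWD}; leaving it unchanged"
else
    ln -s "${RMPI_RPROFILE}" .Rprofile
    echo "Linked ${RMPI_RPROFILE} as ${PWD}/.Rprofile"
fi
```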

Job Script Examples

#!/bin/bash
#SBATCH --job-name=R_test   #Job name   
#SBATCH --mail-type=END,FAIL   # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE   # Where to send mail    
#SBATCH --ntasks=1
#SBATCH --mem=1gb   # Memory per node
#SBATCH --time=00:05:00   # Walltime
#SBATCH --output=r_job.%j.out   # Name output file 
#Record the time and compute node the job ran on
date; hostname; pwd
#Use modules to load the environment for R
module load R

#Run R script 
Rscript myRscript.R

date
