Skip to content

R

Description

R website

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and graphical techniques. It is highly extensible. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.

Modules system sets up the following environment variables for this module:

Environment Modules

Run module spider R to find out what environment modules are available for this application.

Environment Variables

  • HPC_R_DIR - installation directory
  • HPC_R_BIN - executable directory

Additional Usage Information

R can be run on the command-line (or the batch system) using the Rscript myscript.R or R CMD BATCH myscript.R command. For script development or visualization RStudio GUI application can be used. See the Open OnDemand documentation for details. Alternatively an instance of RStudio_Server can be started in a job. Then you can connect to it through an SSH tunnel from a web browser on your local computer.

Notes and Warnings

  • The parallel::detectCores() function will return the total number of cores on a compute node and not the number of cores assigned to your job by the scheduler. Instead, use something like numCores = as.integer(Sys.getenv("SLURM_CPUS_ON_NODE")) to find out the number of CPU cores 'X' requested in your job script by: #SBATCH --cpus-per-task=X

  • Default RData format In R-3.6.0 the default serialization format used to save RData files has been changed to version 3 (RDX3), so R versions prior to 3.5.0 will not be able to open it. Keep this in mind if you copy RData files from HiPerGator to an external system with old R installed.

  • Java rJava users need to load the java module manually with module load java/1.7.0_79. Use the correct java module version for your case.

  • TMPDIR If temporary files are produced the may fill up memory disks on HPG2 nodes and cause node and job failures. Use something like

    mkdir -p tmp
    export TMPDIR=$(pwd)/tmp
    
    in your job script to prevent this and launch your job from the respective directory and not from your home directory.
  • For users of PHI and FERPA: It is particularly important to set your working and TMPDIR directories to be in your project's PHI/FERPA configured directory in /blue when working with R. Writing files to $HOME or $TMPDIR could expose restricted data to unauthorized users.

  • Tasks vs Cores for parallel runs Parallel threads in an R job will be bound to the same CPU core even if multiple ntasks are specified in the job script. Use cpus-per-task to use R 'parallel' module correctly. For example, for an 8-thread parallel job use the following resource request in your job script:

    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8
    
  • See the single-threaded and multi-threaded examples in the Sample SLURM Scripts page for more details.

Installed Libraries:

You can install your own libraries to use with R. These are stored in your /home/ environment. For details visit our Applications FAQ and see the section "How do I install R packages?".

Make sure the directory for that version of R is created or R will try to install to a system path and fail. E.g. for R/4.3 run the following command before attempting to install a package:

mkdir ~/R/x86_64-pc-linux-gnu-library/4.3
You can set a custom library path with the R_LIBS_USER environment variable. From https://cran.r-project.org/web/packages/startup/vignettes/startup-intro.html:

R_LIBS_USER - user's library path, e.g. R_LIBS_USER=~/R/%p-library/%v is the folder specification used by default on all platforms and and R version. The folder must exist, otherwise it is ignored by R. The %p (platform) and %v (version) parts are R-specific conversion specifiers.

To see a list of installed libraries in the currently loaded version of R:

$ R
> installed.packages()

R MPI Example

Expand this section to view an example of using R MPI code

Example, of using the parallel module to run MPI jobs under SLURM with Rmpi library.

# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
    }

ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves=ns)
#
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function(){
      if (is.loaded("mpi_initialize")){
          if (mpi.comm.size(1) > 0){
              print("Please use mpi.close.Rslaves() to close slaves.")
              mpi.close.Rslaves()
          }
          print("Please use mpi.quit() to quit R")
          .Call("mpi_finalize")
      }
}
# Tell all slaves to return a message identifying themselves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))

# Test computations
x <- 5
x <- mpi.remote.exec(rnorm, x)
length(x)
x

# Tell all slaves to close down, and exit the program
mpi.close.Rslaves(dellog = FALSE)
mpi.quit()

Example job script using rmpi_test.R script.

#!/bin/sh
#SBATCH --job-name=mpi_job_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE # Where to send mail  
#SBATCH --cpus-per-task=1 # Number of cores per MPI rank 
#SBATCH --nodes=2 #Number of nodes
#SBATCH --ntasks=8 # Number of MPI ranks
#SBATCH --ntasks-per-node=4 #How many tasks on each node
#SBATCH --ntasks-per-socket=2 #How many tasks on each CPU or socket
#SBATCH --distribution=cyclic:cyclic #Distribute tasks cyclically on nodes and sockets
#SBATCH --mem-per-cpu=1gb # Memory per processor
#SBATCH --time=00:05:00 # Time limit hrs:min:sec
#SBATCH --output=mpi_test_%j.out # Standard output and error log
pwd; hostname; date

echo "Running example Rmpi script. Using $SLURM_JOB_NUM_NODES nodes with $SLURM_NTASKS 
tasks, each with $SLURM_CPUS_PER_TASK cores."
module purge; module load gcc openmpi rmpi

srun --mpi=${HPC_PMIX} Rscript /data/training/SLURM/rmpi_test.R

date

For rmpi/4.0 module the following command will work

mpiexec -n ${SLURM_NTASKS} Rscript rmpi_test.R

Link the /apps/rmpi/conf/Rprofile as .Rprofile in the current directory configuration file that must be placed in the working directory if the rmpi module doesn't add a symlink automatically.

ln -s /apps/rmpi/conf/Rprofile .Rprofile

Job Script Examples

Expand this section to view an example R job script
#!/bin/bash
#SBATCH --job-name=R_test   #Job name   
#SBATCH --mail-type=END,FAIL   # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=ENTER_YOUR_EMAIL_HERE   # Where to send mail    
#SBATCH --ntasks=1
#SBATCH --mem=1gb   # Per processor memory
#SBATCH --time=00:05:00   # Walltime
#SBATCH --output=r_job.%j.out   # Name output file 
#Record the time and compute node the job ran on
date; hostname; pwd
#Use modules to load the environment for R
module load R

#Run R script 
Rscript myRscript.R

date

Categories

statistics