Storage¶
UF Research Computing maintains several shared storage systems that are intended for different user activities. A summary of the filesystems and their use can be found in the RC storage policy. Here we discuss practical use of the filesystems on HiPerGator. See our FAQ to see what to do when you run out of storage space.
Home Storage¶
Quick notes on Home storage
- Quota in the home directory is 40GB.
- Do not use `/home` for job input and output (reading or writing files).
- There is one week of daily snapshots maintained at `~/.snapshot/`.
- Check your quota with the `home_quota` command.
Your home directory is the first directory you see when you log into HiPerGator. It is always reachable as `~`, `/home/$USER`, or `$HOME`; these shell variables and expansions can also be used in scripts.
The home directories are the smallest storage allocations available to our users. They contain files important for setting up your shell environment and secure shell connections. Do not remove the `~/.bashrc` or `~/.bash_profile` files or the `.ssh` directory; you will have problems using your HiPerGator account. If you do run into issues, open a support request and we can reset the files to standard versions.
The first rule of using the home directory is to not use it for reading or writing data files in any analyses run on HiPerGator. It is permissible to keep software builds, conda environments, text documents, and valuable scripts in `$HOME`, as it is somewhat protected by daily snapshots.
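The equivalent ways of referring to your home directory can be checked from the shell; `$HOME` and `$USER` are set automatically at login (the `/home/jdoe` value shown is just an illustrative example):

```shell
# All of these refer to the same home directory:
echo "$HOME"        # e.g. /home/jdoe
echo ~              # tilde expansion performed by the shell
echo "/home/$USER"  # built from the USER environment variable
```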
Blue Storage¶
Quick notes on Blue storage
- Blue is our main high-performance parallel filesystem.
- Blue is the primary location for all files read or written during job execution.
- Each group should have a Blue folder at `/blue/groupname/`.
- Quotas are based on investment and are set at the group level.
- Check your quota with the `blue_quota` command.
Blue Storage is our main high-performance parallel filesystem. This is where all job input/output (a.k.a. 'job i/o', reading and writing files) must happen. By default your personal directory tree starts at `/blue/groupname/username`. That directory cannot be modified by other group members. There is a shared directory at `/blue/groupname/share` for groups that prefer to share all their data between group members.
UFIT-RC only creates the user directory for your primary group. If you have secondary groups, you will need to create your own folder in that group's directory.
The parallel nature of Blue Storage makes it very efficient at reading and writing large files, which can be 'striped', or broken into pieces stored on different storage servers. It does not deal well with directories that contain large numbers of very small files. If a job produces such files, it is advisable to use the Temporary Directories to reduce the burden on Blue Storage and keep it responsive and performant for everyone.
For groups that purchased separate storage for additional projects, the default path to the project directories is `/blue/PROJECT`. That directory is set up similarly to the 'share' directory in the primary group directory tree.
Orange Storage¶
Quick notes on Orange storage
- The Orange filesystem is primarily intended for archival purposes.
- If an investment has been made, your group's Orange folder will be at `/orange/groupname`.
- Quotas are based on investment and are set at the group level.
- Check your quota with the `orange_quota` command.
Orange storage is cheaper than Blue, but its hardware is also more limited, so it cannot support the full brunt of the applications running on HiPerGator. Limit its use to long-term storage of data that is not currently in use, or to very gentle access patterns such as serially reading raw data for QC/filtering, sending the output of that first step to your Blue Storage directory tree as in many workflows.
We only create the `/orange/groupname` directory when a new quota is added (there are no user or share directories like the ones pre-created for a group in `/blue`). Users in a group are expected to work out their own approach to storing and sharing data in their `/orange` directory tree.
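The gentle Orange access pattern described above can be sketched as follows; the paths, sample name, and `quality_filter` tool are all placeholders, not real HiPerGator paths or software:

```shell
# Read raw data serially from Orange and write the filtered result
# to Blue; only this first, gentle pass touches Orange. All later,
# heavier steps should then read and write on Blue only.
zcat /orange/groupname/raw/sample1.fastq.gz \
  | quality_filter \
  > "/blue/groupname/$USER/filtered/sample1.fastq"
```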
Red Storage¶
Red storage is fully flash based and can support high rates of i/o. The point to remember about Red storage is that the allocations are short-term and the data is removed within 24 hrs of the allocation's end date. See the policy page for how to request an allocation.
Storage Backup¶
Unless purchased separately, NOTHING is backed up!
Storage backup is available as an option, but Home, Blue, Orange and Red do not provide any backup mechanism.
It is unfortunate, but users regularly accidentally delete important files.
Either invest in tape backup or keep your own backups of important files!
Storage Automounting¶
Many directories on HiPerGator are automounted: a directory does not exist on the server until it is accessed by a user. This means that if you list the contents of `/blue` or `/orange`, your group's directory may not appear! This can be concerning, but fear not: change into the group directory and it will be mounted and ready for use.
If you are using Jupyter Notebook or another GUI or web application that makes it difficult to browse to a specific path, you can create a symlink (shortcut) as shown in Create the Link.
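For example, a symlink in your home directory makes the Blue tree easy to reach from file browsers ('groupname' is a placeholder for your actual group):

```shell
# -s makes a symbolic link, -f replaces an existing link,
# -n treats an existing symlink to a directory as a plain file.
ln -sfn "/blue/groupname/$USER" "$HOME/blue"
```

Accessing the link (for example, opening `~/blue` in Jupyter's file browser) also triggers the automount of the target directory.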
Local Scratch Storage or Temporary Directories¶
Quick notes on temporary directories
- Use `$SLURM_TMPDIR` for a job's temporary directory.
All HiPerGator compute nodes have local storage. That storage is flash-based on HPG3 and newer nodes and can support high i/o rates. Older nodes use spinning disks with lower i/o rates compared to flash.
Using local scratch storage (Temporary Directories) on HiPerGator compute nodes is a way to insulate an analysis from most of the other jobs running on HiPerGator, which generally use Blue storage. It may therefore be possible to use `$SLURM_TMPDIR` to get much higher i/o rates, since the job only competes for local scratch i/o with the limited number of jobs on the same compute node that also chose to use local scratch.
The caveat is that using local scratch requires staging in the input data (copying it from `/blue` to `$SLURM_TMPDIR` within a job) and staging out the results (copying result files back to `/blue`), since the job's temporary scratch directory is automatically removed at the end of the job and any files left on it are irretrievably lost.
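A minimal sketch of the stage-in/compute/stage-out pattern in a SLURM batch script; the group, user, and file names are placeholders, and `myprogram` is a hypothetical analysis command:

```shell
#!/bin/bash
#SBATCH --job-name=scratch_demo
#SBATCH --time=01:00:00
#SBATCH --mem=4gb

# Work in the node-local scratch directory provided by SLURM.
cd "$SLURM_TMPDIR"

# Stage in: copy input data from Blue to fast local scratch.
cp "/blue/groupname/$USER/input.dat" .

# Run the analysis entirely against local files.
myprogram input.dat > results.dat

# Stage out: copy results back to Blue before the job ends;
# $SLURM_TMPDIR is deleted when the job finishes.
cp results.dat "/blue/groupname/$USER/"
```

Submit with `sbatch` as usual; make sure the stage-out step runs after every command that produces output you want to keep.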
Checking Quotas and Managing Space¶
The `ufrc` environment module has several tools useful for checking storage use and quotas, as well as for exploring directories and their space use.
- `home_quota` - show your HiPerGator Home directory quota usage.
- `blue_quota` - show HiPerGator Blue Storage (`/blue`) quota usage for your user and group.
- `orange_quota` - show HiPerGator Orange Storage (`/orange`) quota usage for your project(s).
- `ncdu` - an interactive program for showing directory sizes, browsing a directory tree, and removing files and directories in a terminal (ssh session).
Shared Work and Storage Management¶
Note that HiPerGator is a Red Hat Enterprise Linux based cluster. Its main shared filesystems are based on Lustre. All filesystem management limitations follow from this setup.
The sponsor of a group on HiPerGator has ultimate authority over any data produced by their group members, within the limitations of the Linux kernel, the Linux filesystem permission model, and the Lustre filesystem's implementation of POSIX Access Control Lists.
- In practical terms this means that the sponsor can decide on any action pertaining to the disposition of the files under their control, but how a particular change can be implemented, if it is possible at all, can vary in scope, in the initial and maintenance effort required, and in the support request timeline.
- It's important to understand both the security model and limitations imposed by the system when considering what approaches to data management within a group or a project directory are possible.
In a default setup, each primary or project group with a Blue storage quota will have a `/blue/groupname` top-level directory, which contains individual user directories and a `share` sub-directory for collaborative projects. The initial permissions are set such that only individual users have write access to their personal directories, while the share directory is group-writable. By default, group members have read access to other group members' folders.
For the Orange filesystem, the `/orange/groupname` directory is group-writable, with no other directories or permissions created by default. Access to files in the `/orange/groupname` or `/blue/groupname/share` directories depends on individual `umask` settings or the results of `chmod` commands, and is within the purview of the group members with no RC staff involvement.
- It is not expected that users give write access to another group member's files outside of the share directory.
- When an account is deleted because of inactivity or an explicit sponsor support request, the default action is to move the user's personal directory to `/blue/groupname/share` and `chown` all files in the Blue or Orange group directory trees that were owned by the account in question to the username of the group's sponsor.
- A sponsor can open a support request to ask for a different disposition of the former group member's files.
If the default directory and permission configuration and the account removal procedure fit how the group operates, no further changes are necessary. However, we have also observed other approaches to group data management, as well as support requests for changes that are contrary to system limitations or are difficult to implement and maintain, and that may therefore be inadvisable even when somewhat feasible.
Collaborative Approaches¶
In a fully collaborative research group on HiPerGator, all members are encouraged to set `umask 007` in their `~/.bashrc` file to make sure that group write permissions are set on all files and directories created by group members.
- This allows all users in the group to manage (read, write, execute) all group files. If protection against accidental file deletion is desirable, the group is advised to purchase Tivoli Backup from UFIT ICT.
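With `umask 007`, newly created files get mode 660 (rw-rw----) and directories get mode 770 (rwxrwx---), so the whole group can manage them:

```shell
# Add this line to ~/.bashrc:
umask 007

# Files and directories created afterwards are group-writable:
touch shared_file    # created with mode 660
mkdir -p shared_dir  # created with mode 770
```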
If there is a need to share files with members of other groups or with external collaborators, multiple approaches are possible. The most straightforward is to share a directory via a Globus Collection. This works for both HPG and external collaborators to make a copy of the data, and permissions are controlled in Globus.
- If a selected set of users from multiple HPG groups must work on a project, the preferred approach is for a sponsor or sponsors of the project to request creation of a project group and purchase a storage quota for the project. All members of the project will then be added to the project group as secondary members and will manage the project files in a manner similar to how the Blue share or Orange directory is managed.
If it is necessary to give a member of a different HPG group access to the directory, the most straightforward solution from a system administration viewpoint is to add that user as a secondary member of the sharing group. However, this gives the user access to all group-readable files and the ability to use the group's computational resources on HiPerGator, which may not be desirable for one reason or another. The request to add a user to a group should be made by, and requires the approval of, the group's sponsor via a support request.
- There are more complex situations. Unfortunately, there is no general mechanism in Linux to allow hierarchical access permissions on filesystems.
- It may be possible to set Lustre filesystem ACLs on `/blue` and `/orange` directories to allow more complex permissions, for example allowing a user to manage another user's files, meaning they can modify, rename, move, or remove files and directories they do not own and have no group-level write access to. Using Lustre ACLs is not a straightforward approach, and the ACL interactions with Linux filesystem permissions may still preclude write access to some files and directories.
- We are willing to make such changes or to work with the group to determine a usable `setfacl` command. However, success is not guaranteed, and the ACL permissions may be lost or may not apply to new files. Please use the RC Support System to get in touch if you really need to use filesystem ACLs and are having trouble or need group-wide settings.
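As an illustration only (the path and the username `jdoe` are hypothetical, and as noted above the result is not guaranteed on every directory):

```shell
# Give jdoe read/write access, plus traverse (capital X: execute
# only where it already makes sense), recursively through a tree:
setfacl -R -m u:jdoe:rwX /blue/groupname/share/project

# Also set a default ACL so newly created files and directories
# inherit the same access:
setfacl -R -d -m u:jdoe:rwX /blue/groupname/share/project

# Review the resulting ACLs:
getfacl /blue/groupname/share/project
```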