HiPerGator-RV Data Management
Procedures and tools are in place for users to manage the data stored in the HiPerGator-RV system.
Definitions
- HiPerGator-RV has two ways to store data:
  - My Vault: Files are stored through an interface very similar to cloud storage services such as Google Drive, Dropbox, OneDrive/Teams, and Box.
  - Drives: Files are stored in encrypted drives that can be mounted in Linux or Windows virtual machines (VMs) for access and processing.
- Snapshots of files are copies taken at regular intervals, with a number of historical copies maintained by the system for each file. This allows users to recover a version of a file at some point in the past, as long as it is within the list of retained copies. Snapshots are often made on the same storage system and do not protect from system failure. Snapshots can be enabled on Windows VMs.
- Replication of a file maintains a copy of the file that is updated at a regular interval, such as daily. When the original file is deleted, modified, or corrupted, the replica remains intact until the next synchronization time, at which point the modification or corruption also reaches the replica. Replicas are therefore most useful for protecting against obvious damage that is noticed immediately, so that replication can be stopped or the file restored from the replica before the next synchronization time. Replicas are usually made on a different system in the same data hall and protect against single-system failure, but not against data hall-wide issues like fire or flood.
- A backup of a file consists of copies of various versions of the file on another storage device, often one using a different technology (magnetic tapes instead of spinning disks).
- Off-site copies are stored on a storage device that is located at another geographic location. Off-site copies are the only way to protect against total system failure and issues that affect the whole data center or geographic site, such as flooding and destruction.
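The difference between replication and snapshots described above can be illustrated with a small toy model. This is only a sketch for intuition, not how HiPerGator-RV is implemented: the `ProtectedFile` class, its method names, and the `max_snapshots` value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedFile:
    """Toy model of one file with a replica and a bounded snapshot history."""
    content: str
    replica: str = ""
    snapshots: list = field(default_factory=list)
    max_snapshots: int = 3  # hypothetical retention count, not a real setting

    def take_snapshot(self) -> None:
        # Keep only the most recent `max_snapshots` historical copies.
        self.snapshots.append(self.content)
        self.snapshots = self.snapshots[-self.max_snapshots:]

    def synchronize(self) -> None:
        # Replication copies the current state, good or bad, over the replica.
        self.replica = self.content


f = ProtectedFile(content="v1")
f.take_snapshot()
f.synchronize()

f.content = "corrupted"          # damage happens after the last synchronization
replica_before_sync = f.replica  # still the good version: restore is possible

f.take_snapshot()
f.synchronize()                  # the corruption now reaches the replica
replica_after_sync = f.replica

print(replica_before_sync)   # v1
print(replica_after_sync)    # corrupted
print("v1" in f.snapshots)   # True: an older snapshot still holds the good version
```

In this model the replica is only useful until the next synchronization, while the snapshot history keeps the good version for as long as it stays within the retained copies.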
Process
- At the hardware level, all storage systems have RAID redundancy to protect against hardware failures of disk drives.
- All files in My Vault are individually encrypted and stored on the primary storage system.
- The files are replicated to a secondary storage system once per day at 1 am.
- The files are also backed up incrementally to tape, with an off-site copy.
- Incremental copies are retained for 90 days, with the last retained copy (after the file is deleted) kept for 1 year.
- The drives are listed in the Drives sub-tab of the Virtual Machines tab.
- The drives are subject to an incremental snapshot backup process using the virtual device system (QEMU) that works with the KVM hypervisor used in HiPerGator-RV to run VMs. The QEMU system ensures that the state of the virtual drive is consistent before it takes an incremental snapshot and replicates it to the secondary storage device. That way, drives can be restored from one of the incremental snapshot backups on secondary storage.
- This process works on encrypted data; neither the QEMU system nor any HiPerGator-RV administrator has access to the data at any time.
- Windows drives have the Microsoft Volume Shadow Copy Service (VSS) option turned on by default; Linux drives do not have this capability. The service uses a fraction of the virtual drive to store snapshots of previous versions of all files that change, allowing users to recover them easily. The frequency and timing of the snapshots, the space available for them, and the number of versions kept are configurable by the Research Computing staff who maintain the VM images. This mechanism handles 90% of the typical requests for restoring deleted or corrupted files. It does not protect the data from underlying system-level problems.
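The My Vault schedule and retention windows above (daily replication at 1 am, incremental copies kept for 90 days, last copy of a deleted file kept for 1 year) can be expressed as simple date arithmetic. The following is a minimal sketch: the function names are hypothetical, "1 year" is approximated as 365 days, the year is assumed to start at the deletion time (the text does not say), and times are treated as naive local times.

```python
from datetime import datetime, timedelta

INCREMENTAL_RETENTION = timedelta(days=90)  # from the process description
LAST_COPY_RETENTION = timedelta(days=365)   # "1 year", approximated as 365 days
REPLICATION_HOUR = 1                        # daily replication at 1 am


def next_replication(now: datetime) -> datetime:
    """Return the next daily 1 am replication time strictly after `now`."""
    candidate = now.replace(hour=REPLICATION_HOUR, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def incremental_copy_retained(copy_time: datetime, now: datetime) -> bool:
    """True while an incremental backup copy is inside the 90-day window."""
    return now - copy_time <= INCREMENTAL_RETENTION


def last_copy_retained(deleted_at: datetime, now: datetime) -> bool:
    """True while the last copy of a deleted file is inside the 1-year window.

    Assumption: the window is counted from the deletion time.
    """
    return now - deleted_at <= LAST_COPY_RETENTION


now = datetime(2024, 3, 10, 14, 30)
print(next_replication(now))                                 # 2024-03-11 01:00:00
print(incremental_copy_retained(datetime(2024, 1, 1), now))  # True
print(last_copy_retained(datetime(2022, 1, 1), now))         # False
```

A helper like `next_replication` shows how much time remains to notice damage to a My Vault file before it is copied to the replica. Whether a specific copy can actually be restored should always be confirmed with Research Computing.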
Best Practices
We recommend that users choose a preferred workflow for using HiPerGator-RV. It is best practice to review the contractual requirements for the project's data to ensure that the selected workflow satisfies the contract's terms. Start simple and add complexity as the need arises. We describe a few use cases.
- Using My Vault – For some workflows that only involve storing, tracking and viewing files, My Vault may be sufficient for the project.
- Using encrypted drives and VM for processing – The files for the project can be stored in one or more encrypted virtual drives, with all the work done in Windows or Linux VMs.
- Using a single-user VM – Users can work on data stored in drives using applications available in a Linux or Windows VM. Several VM templates are available that have applications pre-installed and are ready to use.
- Using a multi-user VM – To support larger and more complex operations, multiple users can log in to a single Linux or Windows VM that provides access to the drives shared by the users in the group. Within the VM, the security mechanisms of the operating system provide access control to the files in the mounted drives.
Costs
- There is no additional cost for the My Vault replication and backup processes.
- Backup and off-site copies for virtual drives are not included in the base-level pricing for HiPerGator-RV. Please contact Research Computing if you require this service for your project.
Data Recovery Process
- Restoring files to a previous version using the VSS feature within Windows VMs can be done by users. No request to Research Computing is needed for this type of recovery. For instructions on the procedure, please refer to the following link: https://it.clas.ufl.edu/kb/recover-files/
- Data recovery requests for My Vault files or encrypted virtual drives should be sent to UFIT Research Computing by submitting a ticket through http://support.rc.ufl.edu
- Recovery timeframes depend on the type and size of recovery requested.