Conda

Description

conda website

Conda and mamba provide package, dependency, and environment management for any language.

Environment Modules

Run module spider conda to find out what environment modules are available for this application.

Environment Variables

The modules system sets up the following environment variables for this module:

  • HPC_CONDA_DIR - installation directory

Additional Usage Information

Background

Many projects that use Python code require careful management of the respective Python environments. Rapid changes in package dependencies, package version conflicts, deprecation of APIs (function calls) by individual projects, and obsolescence of system drivers and libraries make it virtually impossible to use an arbitrary set of packages or create one all-encompassing environment that will serve everyone's needs over long periods of time. The high velocity of changes in the popular ML/DL frameworks and packages and GPU computing exacerbates the problem.

The Problem with pip install

Most guides and project documentation for installing Python packages recommend using pip install. While pip is easy to use and works for many use cases, it has some major drawbacks on a supercomputer like HiPerGator:

  • By default, pip installs binary packages (wheels), which are often built on systems incompatible with HiPerGator. This can lead to import errors, and pip's attempts to build from source will fail without additional configuration.
  • If you pip install a package that is, or later will be, installed in an environment provided by UFIT-RC, your pip-installed copy takes precedence. Over time its dependencies drift out of sync with the provided environment, causing errors.
  • Different packages may require different versions of the same dependency, leading to installation scenarios that are impossible to reconcile. This is hard to manage with pip because it has no method to swap active versions.
  • On its own, pip installs everything in one location: ~/.local/lib/python3.X/site-packages/.
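
To see where user-level pip installs land for the Python currently on your PATH, the following is a minimal sketch. It assumes python3 is available; the variable name user_site is only for illustration.

```shell
# Ask Python for the per-user site-packages directory used by `pip install --user`.
# The command exits non-zero when user site-packages are disabled (for example
# inside a virtualenv), so `|| true` keeps the script going.
user_site="$(python3 -m site --user-site || true)"
echo "User-level pip installs go to: ${user_site}"

# List what is already installed there; the directory may not exist yet.
if [ -d "${user_site}" ]; then
    ls "${user_site}"
else
    echo "Nothing has been installed at the user level yet."
fi
```

Packages listed there are visible to any Python of the same minor version, which is how a stray pip install can shadow a package in another environment.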

Conda and Mamba to the Rescue!

conda and the newer, faster, drop-in replacement mamba were written to solve some of these issues. They represent a higher level of packaging abstraction that can combine compiled packages, applications, and libraries as well as pip-installed Python packages. They also allow easier management of project-specific environments and switching between environments as needed. They make it much easier to report the exact configuration of packages in an environment, facilitating reproducibility. Moreover, conda environments don't even have to be activated to be used; in most cases, adding the path to the conda environment's bin directory to $PATH in the shell environment is sufficient.

A Caveat

conda and mamba get packages from channels, which are repositories of prebuilt packages. While there are several available channels, like conda-forge or bioconda, not every Python package is available from them, because each package has to be packaged for conda first. You may still need to use pip to install some packages, as noted later. However, conda still helps manage environments by installing packages into separate directory trees rather than installing everything into a single folder as pip does.

The ~/.condarc Configuration File

conda's behavior is controlled by a configuration file in your home directory called .condarc. The dot at the start of the name means that the file is hidden from the ls file-listing command by default. If you have not run conda before, you won't have this file. Whether the file exists or not, the steps here will help you modify it to work best on HiPerGator.

Tip

The first time a HiPerGator user loads the conda environment module, it will put the current best practice .condarc into their home directory.

For example:

[username@login7 ~]$ module load conda

No ~/.condarc found, creating a new config from HPG defaults

If you have a configuration file created automatically as above, you can likely skip the details of this section as your envs and pkgs directories are already configured to be on /blue storage.

The following sections require editing the ~/.condarc file. One way to edit this file is to type: nano ~/.condarc

Configure the Conda Package Cache Location

By default, conda caches (keeps a copy of) all downloaded packages in the ~/.conda/pkgs directory tree. If you install many packages, you may end up filling your home quota. To change the default package cache path, add or change the pkgs_dirs setting in your ~/.condarc configuration file, e.g.:

pkgs_dirs:
- /blue/mygroup/$USER/conda/pkgs

Replace mygroup with your actual group name.

Configure the Conda Environment Location

conda puts all packages installed in a particular environment into a single directory. By default, named conda environments are created in the ~/.conda/envs directory tree. They can quickly grow in size and, especially if you have many environments, fill the 40GB home directory quota. To change the default path for named environments (conda create -n NAME), add or change the envs_dirs setting in the ~/.condarc configuration file, e.g.:

envs_dirs:
- /blue/mygroup/$USER/conda/envs

Replace mygroup with your actual group name. You can also use a group's share folder, e.g.: - /blue/mygroup/share/conda/envs
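
To check how much of your home quota conda's default locations already occupy, the following is a small sketch. The function name conda_home_usage is hypothetical; the two paths are the defaults named above.

```shell
# Print the disk usage of conda's default cache and environment directories.
conda_home_usage() {
    for dir in "$HOME/.conda/pkgs" "$HOME/.conda/envs"; do
        if [ -d "$dir" ]; then
            # -s: one summary line per directory, -h: human-readable sizes
            du -sh "$dir"
        else
            echo "$dir does not exist"
        fi
    done
}

conda_home_usage
```

If either directory is large, moving pkgs_dirs and envs_dirs to /blue as described above keeps future installs out of your home directory.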

Your ~/.condarc should look something like this when you are done editing (again, replacing mygroup and $USER in the paths with your group and username).

channels:
- conda-forge
- bioconda
- defaults
pkgs_dirs:
- /blue/mygroup/$USER/.conda/pkgs
envs_dirs:
- /blue/mygroup/$USER/.conda/envs
auto_activate_base: false
anaconda_upload: false

Info

You do not need to manually create the folders that you set up in your ~/.condarc file. mamba will take care of that for you when you create environments.

Create and Activate a Conda Environment

The UFIT Research Computing Applications Team uses conda for many application installs behind the scenes. We are happy to install applications on request for you. However, if you would like to use conda to create multiple environments for your personal projects we encourage you to do so. Here are some recommendations for successful conda use on HiPerGator.

  • See the Conda project's documentation on managing conda environments.
  • We recommend creating environments by 'path' in /blue. The resulting environment should be located in the project(s) directory tree in /blue for better tracking of installs and better filesystem performance compared to home.

If you plan on using a GPU

To make sure your code will run on GPUs, install a recent cudatoolkit package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s).

See the UFIT-RC provided tensorflow or pytorch installs for examples if needed.

Mamba can detect whether a GPU is available on the computer, so the easiest approach is to run the mamba install command in a GPU session. Alternatively, when installing on a node without a GPU, or when a CPU-only pytorch package was already installed, explicitly require a GPU build of pytorch when running mamba install. Quote the package spec so the shell does not try to expand the *, e.g.: mamba install cudatoolkit=11.3 "pytorch=1.12.1=gpu_cuda*" -c pytorch
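
After installing, you can check whether the environment picked up a GPU-capable build. This is a minimal sketch, assuming pytorch is installed in the active environment. The function name check_gpu_env is hypothetical.

```shell
# Report whether this node has a visible GPU and whether pytorch can use it.
check_gpu_env() {
    # nvidia-smi ships with the NVIDIA driver, so it is absent on CPU-only nodes.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found: this node likely has no GPU"
    fi

    # Ask the pytorch in the active environment whether it can see a CUDA device.
    python -c 'import torch; print("pytorch", torch.__version__, "CUDA available:", torch.cuda.is_available())' \
        || echo "pytorch could not be imported in this environment"
}

check_gpu_env
```

On a node without a GPU the check reports that CUDA is unavailable even for a GPU build, so run it in a GPU session.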

Load the conda Module

Before we can run conda or mamba on HiPerGator, we need to load the conda module:

module load conda

Create Your Environment

Create a Name Based Environment

To create a name-based conda environment (path-based instructions follow below), use the -n argument. For example, to create an environment named my_env:

mamba create -n my_env

Example command and output:

[username@login7 ~]$ mamba create -n my_env

Looking for: []

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

To activate this environment, use

    $ mamba activate my_env

To deactivate an active environment, use

    $ mamba deactivate

[username@login7 ~]$ 

Tip

When creating a conda environment, you can also install conda packages at the same time, e.g.: mamba create -n another_env python=3.11 pytorch numpy=2.2

Create a Path Based Environment

To create a path-based conda environment, use the -p argument. For example, to create a path-based environment at /blue/mygroup/share/project42/conda/envs/another_env/:

mamba create -p /blue/mygroup/share/project42/conda/envs/another_env/

Example command and output:

[username@login7 ~]$ mamba create -p /blue/mygroup/share/project42/conda/envs/another_env/
Looking for: []
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To activate this environment, use
    $ mamba activate /blue/mygroup/share/project42/conda/envs/another_env/
To deactivate an active environment, use
    $ mamba deactivate
[username@login7 ~]$ 

Activate the new environment

To activate our environment:

  • For name based environments: mamba activate my_env
  • For path based environments: mamba activate /blue/mygroup/share/project42/conda/envs/another_env

Success

Notice that your command prompt changes when you activate an environment to indicate which environment is active, showing that in parentheses before the other information:

(my_env) [username@c0907a-s23 ~]$
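
Besides the prompt, you can confirm the active environment from the shell itself. This is a short sketch that relies on the CONDA_PREFIX variable, which activation sets to the environment's directory.

```shell
# CONDA_PREFIX is set by activation and points at the active environment's directory.
echo "Active environment: ${CONDA_PREFIX:-none}"

# Show which python would run. If python is installed in the active environment,
# this prints the path to its bin/python.
command -v python || echo "no python found on PATH"
```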

Tip

Activation of your environment is really only needed for package installation. For using the environment just add the path to its bin directory to $PATH in your job script.

E.g., if your conda environment is at /blue/mygroup/$USER/conda/envs/project1/, add the following to your job script before executing any commands:

export PATH=/blue/mygroup/$USER/conda/envs/project1/bin:$PATH

Adjust the path as needed for your environment. The path should include the bin folder as shown above.
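
In context, a job script using this approach might look like the sketch below. It assumes SLURM-style #SBATCH directives. The job name, time limit, and memory request are placeholder values, and env_dir is a variable introduced here for readability.

```shell
#!/bin/bash
#SBATCH --job-name=conda_path_demo   # placeholder job name
#SBATCH --time=00:10:00              # placeholder time limit
#SBATCH --mem=1gb                    # placeholder memory request

# Assumption: replace with the path to your own environment.
env_dir="/blue/mygroup/$USER/conda/envs/project1"

# Put the environment's bin directory first so its programs are found before any others.
export PATH="${env_dir}/bin:${PATH}"

# Confirm which interpreter the job will use before running real work.
command -v python || echo "python not found on PATH; check env_dir"
```

Because the environment's bin directory comes first in $PATH, no activation step is needed inside the job.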

Once you are done installing packages inside the environment, you can deactivate it with:

conda deactivate

Export or import an environment

Export Your Environment to an environment.yml File

Now that you have your environment working, you may want to document its contents and/or share it with others. The environment.yml file defines the environment and can be used to build a new environment with the same setup.

To export an environment file from an existing environment, run:

conda env export > my_env.yml

You can inspect the contents of this file with cat my_env.yml. This file defines the packages and versions that make up the environment as it is at this point in time. Note that it also includes packages that were installed via pip.

Create an Environment from a YAML File

If you share the environment yaml file created above with another user, they can create a copy of your environment using the command:

conda env create --file my_env.yml

They may need to edit the last line of the file (the prefix: entry) so that the location matches where they want their environment created.
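
As an alternative to hand-editing, the location line can be dropped before sharing and the target chosen at creation time. This is a sketch under stated assumptions: strip_prefix is a hypothetical helper, the file and directory names are placeholders, and it relies on the -p option of conda env create to set the location.

```shell
# Hypothetical helper: print an exported environment file without its "prefix:" line.
strip_prefix() {
    grep -v '^prefix:' "$1"
}

# Example use (file and directory names are placeholders):
#   strip_prefix my_env.yml > my_env_shared.yml
#   conda env create --file my_env_shared.yml -p /blue/mygroup/$USER/conda/envs/my_env_copy
```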

Group environments

It is possible to create a shared environment accessed by a group on HiPerGator, storing the environment in, for example, /blue/group/share/conda. In general, this works best if only one user has write access to the environment. All installs should be made by that one user and communicated to the other users in the group. We recommend that each user's umask be set to group-friendly permissions, such as umask 007. See Sharing Within A Cluster.
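
The effect of umask 007 can be seen with a quick experiment. This sketch uses a scratch directory from mktemp as a stand-in for a shared environment path; the directory and file names are illustrative only.

```shell
# Make new files and directories group-writable and closed to everyone else.
umask 007

# Demonstrate the effect in a scratch directory (stand-in for a shared env path).
demo_dir="$(mktemp -d)"
mkdir "${demo_dir}/shared_env"
touch "${demo_dir}/shared_env/example_file"

# With umask 007 the directory is typically created as drwxrwx---
# and the file as -rw-rw----.
ls -ld "${demo_dir}/shared_env" "${demo_dir}/shared_env/example_file"
```

Setting the umask before installing means files created in the shared environment stay readable and writable by the group.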

For More Information

There is additional information on adding conda environments as Jupyter Kernels on the Managing Conda Environments page.
