Conda¶
Description¶
Conda and mamba provide package, dependency, and environment management for any language.
Environment Modules¶
Run module spider conda to find out what environment modules are available for this application.
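For example, to list the available versions:
module spider conda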
Environment Variables¶
The modules system sets up the following environment variables for this module:
- HPC_CONDA_DIR - installation directory
Additional Usage Information¶
Background¶
Many projects that use Python code require careful management of the respective Python environments. Rapid changes in package dependencies, package version conflicts, deprecation of APIs (function calls) by individual projects, and obsolescence of system drivers and libraries make it virtually impossible to use an arbitrary set of packages or create one all-encompassing environment that will serve everyone's needs over long periods of time. The rapid pace of change in popular ML/DL frameworks, their package ecosystems, and GPU computing exacerbates the problem.
Expand this section to see the problems with pip and how conda addresses them.
The Problem with pip install¶
Most guides and project documentation for installing Python packages recommend using pip install for package installation. While pip is easy to use and works for many use cases, it has some major drawbacks, particularly on a supercomputer like HiPerGator:
- By default, pip installs binary packages (wheels), which are often built on systems incompatible with HiPerGator. This can lead to import errors, and pip's attempts to build from source will fail without additional configuration.
- If you pip install a package that is, or will be, installed in an environment provided by UFIT-RC, your pip-installed version will take precedence. Eventually the dependencies become incompatible and cause errors.
- Different packages may require different versions of the same dependency, leading to installation scenarios that are impossible to reconcile. This is difficult to manage with pip, as there is no method to swap active versions.
- On its own, pip installs everything in one location: ~/.local/lib/python3.X/site-packages/. See the sketch after this list.
- In Jupyter Notebooks, even with a kernel selected, pip installs into .local. While many tutorials tell you to run !pip install ____, this is poor advice that can cause lots of problems.
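As a quick illustration of that default location, here is a minimal sketch of checking where a bare pip install lands when no environment is active (numpy is just a placeholder package; the path is illustrative):
python -m pip install --user numpy
python -m pip show numpy | grep Location
# Location: /home/username/.local/lib/python3.X/site-packages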
Conda to the Rescue!¶
conda solves some of these issues. Conda represents a higher level of packaging abstraction that can combine compiled packages, applications, and libraries as well as pip-installed Python packages. Conda also allows easier management of project-specific environments and switching between environments as needed. Conda makes it much easier to report the exact configuration of packages in an environment, facilitating reproducibility. Moreover, conda environments don't even have to be activated to be used; in most cases adding the path to the conda environment's bin directory to the $PATH in the shell environment is sufficient for using them.
A Caveat¶
conda gets packages from channels, which are repositories of prebuilt packages. While there are several available channels, like conda-forge or bioconda, not every Python package is available from a channel, as packages have to be built for conda first. You may still need to use pip to install some packages, as noted later. However, conda still helps manage environments by installing packages into separate directory trees rather than trying to install all packages into the single folder that pip uses.
Conda Configuration¶
Prior to usage, you may need to configure conda. The first time a HiPerGator user loads the conda environment module, a recommended configuration is created. This configuration can be viewed and modified with the conda config command.
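For instance, to inspect the current configuration or just the configured channels (the output simply reflects your own ~/.condarc, so nothing here is specific to any account):
conda config --show
conda config --show channels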
Conda storage locations¶
The settings for package and environment storage will be of particular interest to HiPerGator users. By default, conda stores these files in a user's home directory, which can rapidly use up the 40GB home storage quota.
The first time a HiPerGator user loads the conda environment module, these values are set to a location in /blue/groupname/username/. If a user's primary group does not have a blue storage allocation, these values will need to be set with conda config.
The conda storage locations are configured with the envs_dirs and pkgs_dirs settings, which determine where environments and downloaded packages are stored.
To view your current settings:
[username@login12 ~]$ conda config --show envs_dirs pkgs_dirs
envs_dirs:
- /blue/groupname/username/.conda/envs
pkgs_dirs:
- /blue/groupname/username/.conda/pkgs
These settings are lists of locations. To add new locations as the defaults, run these commands:
conda config --prepend envs_dirs /blue/othergroup/username/.conda/envs
conda config --prepend pkgs_dirs /blue/othergroup/username/.conda/pkgs
Info
You do not need to manually create the folders that you set up in your conda configuration. conda will take care of that for you when you create environments.
Create and Activate a Conda Environment¶
The UFIT Research Computing Applications Team uses conda for many application installs behind the scenes, and we are happy to install applications on request. However, if you would like to use conda to create multiple environments for your personal projects, we encourage you to do so. Here are some recommendations for successful conda use on HiPerGator.
- See the Conda project's documentation on managing conda environments.
- We recommend creating environments by 'path' in /blue. The resulting environment should be located in the project(s) directory tree in /blue for better tracking of installs and better filesystem performance compared to home.
If you plan on using a GPU
To make sure your code will run on GPUs, install a recent cudatoolkit package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s).
See the UFIT-RC provided tensorflow or pytorch installs for examples if needed.
Conda can detect if there is a GPU available on the computer, so the easiest approach is to run the conda install command in a GPU session. Alternatively, you can run conda install on any node, or, if a cpu-only pytorch package was already installed, explicitly require a GPU version of pytorch when running conda install, as in the sketch after this note.
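A minimal sketch of such an install, assuming the PyTorch conda channel and the CUDA 12.x drivers currently on HPG; the exact pytorch-cuda version is an assumption and should be matched to the drivers and to PyTorch's install instructions:
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia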
Load the conda Module¶
Before we can run conda on HiPerGator, we need to load the conda module:
module load conda
Create Your Environment¶
Create a Name Based Environment¶
To create your first name based conda environment (see path based instructions below), run the following command. For example, to create an environment named my_env:
conda create -n my_env
Expand to see example command and output
[username@login7 ~]$ conda create -n my_env
Looking for: []
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To activate this environment, use
$ conda activate my_env
To deactivate an active environment, use
$ conda deactivate
[username@login7 ~]$
Tip
When creating a Conda environment you can also install Conda packages as needed at the same time, e.g.:
conda create -n another_env python=3.11 pytorch numpy=2.22
Create a Path Based Environment¶
To create a path based conda environment, use the -p argument. For example, to create a path based environment at /blue/mygroup/share/project42/conda/envs/another_env/:
conda create -p /blue/mygroup/share/project42/conda/envs/another_env/
Expand to see example command and output
[username@login7 ~]$ conda create -p /blue/mygroup/share/project42/conda/envs/another_env/
Looking for: []
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To activate this environment, use
$ conda activate /blue/mygroup/share/project42/conda/envs/another_env/
To deactivate an active environment, use
$ conda deactivate
[username@login7 ~]$
Activate the New Environment¶
To activate our environment:
- For name based environments:
conda activate my_env
- For path based environments:
conda activate /blue/mygroup/share/project42/conda/envs/another_env
Success
Notice that your command prompt changes when you activate an environment: the active environment's name is shown in parentheses before the other information:
(my_env) [username@c0907a-s23 ~]$
Tip
Activation of your environment is really only needed for package installation. For using the environment, just add the path to its bin directory to $PATH in your job script.
E.g., if your conda environment is at /blue/mygroup/$USER/conda/envs/project1/, add the following to your job script before executing any commands:
export PATH=/blue/mygroup/$USER/conda/envs/project1/bin:$PATH
Adjust the path as needed for your environment. The path should include the bin folder as shown above.
Once you are done installing packages inside the environment, you can deactivate it with:
conda deactivate
Export or Import an Environment¶
Expand to see how to export your environment to an environment.yml file
Now that you have your environment working, you may want to document its contents and/or share it with others. The environment.yml file defines the environment and can be used to build a new environment with the same setup.
To export an environment file from an existing environment, run:
conda env export > my_env.yml
You can inspect the contents of this file with cat my_env.yml. This file defines the packages and versions that make up the environment as it is at this point in time. Note that it also includes packages that were installed via pip.
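As an illustration only, the exported file has roughly the following shape; the package names, versions, and paths here are placeholders rather than output from a real environment:
cat my_env.yml
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - pip:
      - huggingface_sb3
prefix: /blue/groupname/username/.conda/envs/my_env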
Expand to see how to create an environment from a yaml file
If you share the environment yaml file created above with another user, they can create a copy of your environment using the command:
conda env create --file my_env.yml
They may need to edit the last line (the prefix: line) to change the location to match where they want their environment created.
Group Environments¶
It is possible to create a shared environment accessed by a group on HiPerGator, storing the environment in, for example, /blue/group/share/conda. In general, this works best if only one user has write access to the environment. All installs should be made by that one user and should be communicated to the other users in the group. It is recommended that the installing user's umask is set to group-friendly permissions, such as umask 007. See Sharing Within A Cluster.
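A minimal sketch, assuming your group has a shared /blue/group/share directory and that you are the designated maintainer (the environment name and Python version are placeholders):
umask 007
module load conda
conda create -p /blue/group/share/conda/envs/shared_env python=3.11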
Install Packages into your Environment with Mamba or Pip¶
Now we are ready to start adding things to our environment. There are a few ways to do this. We can install things one-by-one with either mamba install ____ or pip install ____, or build the environment from a yaml file as described above.
Note: when an environment is active, running pip install will install the package into that environment. So, even if you continue using pip, adding conda environments solves the problem of everything being installed in one location: each environment has its own site-packages folder and is isolated from other environments. A quick way to verify this is sketched below.
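A minimal sketch of that verification, assuming an environment named my_env; requests is just a placeholder package and the paths are illustrative:
conda activate my_env
which python
# /blue/groupname/username/.conda/envs/my_env/bin/python
python -m pip install requests
python -m pip show requests | grep Location
# Location: /blue/groupname/username/.conda/envs/my_env/lib/python3.11/site-packages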
Expand this section to view instructions.
Mamba Install Packages¶
Now we are ready to install packages using mamba install ___.
Note: if you plan on using a GPU, make sure your code will run on GPUs by installing a recent cudatoolkit package that works with the NVIDIA drivers on HPG (currently 12.x, but older versions are still supported) alongside the pytorch or tensorflow package(s). See the RC-provided tensorflow or pytorch installs for examples if needed. Mamba can detect if there is a GPU in the environment, so the easiest approach is to run the mamba install command in a GPU session. Alternatively, you can run mamba install on any node, or, if a cpu-only pytorch package was already installed, explicitly require a GPU version of pytorch when running mamba install. E.g.:
mamba install cudatoolkit=11.3 pytorch pytorch-cuda=11.3 -c pytorch -c nvidia
Install PyTorch:¶
From the PyTorch Installation page, we should use:
mamba install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
mamba will look in the repositories for the specified packages and their dependencies. Note that we are specifying a particular version of cudatoolkit. As of May 2022, when this example was written, that was the correct version on HiPerGator; check the current NVIDIA driver version (currently 12.x) and adjust accordingly.
Tensorflow Installation Alternative:¶
While not needed for this tutorial, many users will want TensorFlow instead of PyTorch, so we will provide the command for that here. To install TensorFlow, use this command:
mamba install tensorflow cudatoolkit=11.2
Install Additional Packages:¶
This tutorial creates an environment for the Hugging Face Deep Reinforcement Learning Course; you can either follow along with that or adapt it to your needs.
You can list more than one package at a time in the mamba install command. We need a couple more, so run:
mamba install gym-box2d stable-baselines3
Add Packages to Our Environment with pip install:¶
As noted above, not everything is available in a conda channel. For example, the next thing we want to install is huggingface_sb3.
If we type mamba install huggingface_sb3, we get a message saying nothing provides it, as seen below:
Encountered problems while solving:
- nothing provides requested huggingface_sb3
(hfrl) [magitz@c0907a-s23 magitz]$
If we know of a conda source that provides the package, we can add it to the channels: section of our ~/.condarc file. That will prompt mamba to include that location when searching.
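For example, a minimal sketch of adding a channel; conda-forge here is only an illustration, and you would use whichever channel actually provides the package:
conda config --append channels conda-forge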
But many things are only available via pip. So...
pip install huggingface_sb3
This installs huggingface_sb3. Again, because we are using environments and have the hfrl environment active, pip will not install huggingface_sb3 in our ~/.local/lib/python3.X/site-packages/ directory, but rather within our hfrl directory, at /blue/group/user/conda/envs/hfrl/lib/python3.10/site-packages. This prevents the issues and headaches mentioned at the start.
Install Additional Packages¶
As with mamba, we could list multiple packages in the pip install command, but again, we only need one more:
pip install ale-py==0.7.4
Use your environment from command line or scripts¶
Now that we have our environment ready, we can use it from the command line or a script using something like:
module load conda
conda activate hfrl
# Run my amazing python script
python amazing_script.py
Or, without activating the environment, add its bin directory to $PATH:
# Set the path to the environment we want and prepend it to the PATH variable
env_path=/blue/mygroup/share/project42/conda/bin
export PATH=$env_path:$PATH
# Run my amazing python script
python amazing_script.py
Adding Conda Environments as Jupyter Kernels¶
To have a conda environment show up as a kernel in Jupyter, two more steps are needed. The conda environment must be activated when these steps are run.
Install the ipykernel package¶
In order to use an environment in Jupyter, we need to make sure we install the ipykernel package in the environment:
conda install ipykernel
Create the kernel definition¶
The ipykernel python module includes a utility for adding a kernel definition. With the conda environment active, run the following command:
python -m ipykernel install --user --name my_kernel_name --display-name my_display_name
"my_kernel_name" and "my_display_name" can be any value you like, but each kernel you create should have a distinct name.
Troubleshooting Kernels¶
If your kernel doesn't display, or won't launch, check the Jupyter output log.
In Open OnDemand, each session has a card with a link to the Session ID. Click on that link and open the output.log file. That log file will often provide clues as to why kernels cannot launch.
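Another common cause is a kernel definition that points at a Python interpreter outside the intended environment. For kernels installed with --user, the definitions typically live under ~/.local/share/jupyter/kernels; a minimal sketch of inspecting one (my_kernel_name is whatever name you chose above):
ls ~/.local/share/jupyter/kernels/
cat ~/.local/share/jupyter/kernels/my_kernel_name/kernel.json
# The "argv" entry should point at the python inside your conda environment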
Categories¶
programming