Background
Many projects that use Python code require careful management of the respective Python environments. Rapid changes in package dependencies, package version conflicts, deprecation of APIs (function calls) by individual projects, and obsolescence of system drivers and libraries make it virtually impossible to use an arbitrary set of packages or create one all-encompassing environment that will serve everyone's needs over long periods of time. The high velocity of changes in the popular ML/DL frameworks and packages and GPU computing exacerbates the problem.
The Problem with pip install
Most guides and project documentation for Python packages recommend pip install for installation. While pip is easy to use and works for many use cases, it has some major drawbacks on a supercomputer like HiPerGator:
- On its own, pip installs everything in one location: ~/.local/lib/python3.X/site-packages/.
- Packages installed with pip outside of a conda environment will be loaded anytime python is used. This can interfere with the operation of applications installed by UFIT Research Computing and accessed via module load.
- Different packages may require different versions of the same dependency, leading to installation scenarios that are impossible to reconcile. This is a challenge to manage with pip, as there isn't a method to swap active versions.
- In Jupyter Notebooks, even with a Kernel selected, pip installs in .local. While many tutorials out there will tell you to !pip install ____, this is poor advice that can cause lots of problems.
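To check how these points apply to a given interpreter, here is a minimal diagnostic sketch that uses only the Python standard library. It reports the per-user site-packages directory, whether that directory is on the import path, and which distributions are installed there. The function name user_site_report is hypothetical, and the path it prints depends on the Python version in use.

```python
import importlib.metadata
import site
import sys


def user_site_report():
    """Describe the per-user site-packages directory for this interpreter."""
    user_site = site.getusersitepackages()

    # Collect the names of distributions whose metadata lives in the
    # user site directory, i.e. packages installed outside any environment.
    names = set()
    for dist in importlib.metadata.distributions(path=[user_site]):
        name = dist.metadata.get("Name")
        if name:
            names.add(name)

    return {
        "user_site": user_site,
        # True when packages in user_site are importable from this interpreter.
        "on_sys_path": user_site in sys.path,
        "packages": sorted(names),
    }


if __name__ == "__main__":
    report = user_site_report()
    print("User site-packages:", report["user_site"])
    print("On sys.path:", report["on_sys_path"])
    print("Packages installed there:", ", ".join(report["packages"]) or "(none)")
```

If the package list is non-empty and the directory is on the import path, those packages are visible to this interpreter regardless of which environment or module is active.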
Conda to the Rescue!
conda solves some of these issues. Conda represents a higher level of packaging abstraction that can combine compiled packages, applications, and libraries as well as pip-installed Python packages. Conda also allows easier management of project-specific environments and switching between environments as needed. Conda makes it much easier to report the exact configuration of packages in an environment, facilitating reproducibility.
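As one illustration of reporting an environment's configuration, the sketch below wraps the conda env export command from Python. The helper names and the example environment name my_project are hypothetical. The command only runs if a conda executable is found on PATH, which may require loading or initializing conda first.

```python
import shutil
import subprocess


def conda_export_command(env_name):
    """Build the argument list for exporting a named conda environment."""
    return ["conda", "env", "export", "--name", env_name]


def export_environment(env_name):
    """Return the environment specification as YAML text.

    Returns None if conda is not on PATH or the export command fails
    (for example, because the named environment does not exist).
    """
    if shutil.which("conda") is None:
        return None
    result = subprocess.run(
        conda_export_command(env_name),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    spec = export_environment("my_project")  # hypothetical environment name
    print(spec if spec is not None else "No environment specification available")
```

Saving the returned text to a file gives a record of the environment's packages and versions that can be shared alongside a project.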
A Caveat
conda gets packages from channels, which are repositories of prebuilt packages. While there are several available channels, like conda-forge or bioconda, not every Python package is available from a channel, because each package has to be packaged for conda first. You may still need to use pip to install some packages, as noted later. However, conda still helps manage environments by installing packages into separate directory trees rather than trying to install all packages into a single folder, as pip does.
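The separation into directory trees can be observed from inside Python. The sketch below, with the hypothetical helper name describe_environment, reads sys.prefix (the root of the running interpreter's installation) and the CONDA_PREFIX and CONDA_DEFAULT_ENV variables, which conda sets when an environment is activated. It assumes the environment was activated with conda activate; otherwise those variables may be unset.

```python
import os
import sys
import sysconfig


def describe_environment(environ=None):
    """Summarize which directory tree the running interpreter belongs to."""
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")
    return {
        # Root directory of the interpreter that is running this code.
        "interpreter_prefix": sys.prefix,
        # Where this interpreter installs pure-Python packages.
        "site_packages": sysconfig.get_paths()["purelib"],
        "conda_env_name": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": conda_prefix,
        # True only when the interpreter itself lives in the active conda environment.
        "python_in_conda_env": conda_prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this in two different activated environments should show two different interpreter_prefix and site_packages values, one per environment.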
UFIT Research Computing Conda Usage
The UFIT Research Computing Applications Team uses conda for many application installs behind the scenes. We are happy to install applications on request for you. However, if you would like to use conda to create multiple environments for your personal projects, we encourage you to do so.
Next: Conda Configuration