3.4. Conda Environment#
In order to use a conda environment on the cluster, you will need to create a conda environment and then activate it on the compute node. FASRC uses mamba
as a replacement for conda
to manage conda environments. mamba
is a drop-in replacement for conda
that is generally much faster.
3.4.1. Building a Conda environment under the default home directory#
3.4.1.1. What is a conda environment and why should you use it?#
A conda environment is a directory that contains a self-contained instance of Python along with a specific set of packages. For instance, you can create a conda environment called myenv
with numpy version 1.26.4 installed. When executing Python code within that environment, numpy version 1.26.4 will be used. If you need to run code from a different project requiring an older version of numpy, you can use a different conda environment with that older version of numpy installed.
Overall, conda environments allow you to isolate package versions for different projects or repositories, which reduces conflicts and dependency issues. The use of conda environments also ensures reproducibility; you can export your conda environment, allowing others to run your code in precisely the same environment.
3.4.1.2. Creating a Conda Environment#
Step 1: See a list of load modules using the
module list
command.Tip
Verifying the loaded modules is a good practice to avoid conflicts between the loaded modules and any upcoming modules.
Step 2: Purge the loaded modules using the
module purge
command if you have any loaded modules and you want to start fresh.Step 3: Load
python
module using themodule load python
command. This will load the default version of Python.Step 5: Create a conda environment using the
mamba create
command. For example, to create a conda environment namedmyenv
with Python 3.12 and installing pip and numpy, you can use the following command:mamba create --name myenv python=3.12 pip numpy
Note
You can also create the environment from a
yaml
file. Please see the conda’s official documentation for more information.Tip
You can also add a channel to the command if needed:
mamba create --name myenv python=3.12 pip numpy -c conda-forge
You can also specify the list and order of channels to look for the software pacakges.
conda config --add channels conda-forge conda config --add channels bioconda
Step 6: Check the location of the conda environment using the
mamba info --envs
command. This will show the list of conda environments and their locations.$ mamba info --envs mamba version : 1.5.5 # conda environments: # myenv /n/home10/<username>/.conda/envs/myenv base /n/sw/Mambaforge-23.11.0-0
Step 7: Activate the conda environment using the
mamba activate
command. For example, to activate themyenv
environment, you can use the following command:mamba activate myenv
Warning
Be aware that the size of the new conda environment can occupy several gigabytes, depending on the number of packages being installed. To conserve disk space, it is advisable to delete any outdated environments. Read more here.
3.4.1.3. Using a Conda Environment with Jupyter#
If you plan to use Jupyter notebooks or JupyterLab on the cluster, and would like to use a conda environment, you need to install ipykernel
within that conda environment:
mamba activate myenv
pip install ipykernel
You should now be able to change the kernel of the notebook to your conda environment.
See also
For details on how select a kernel when running a jupyter notebook in VSCode, please see the section Using Jupyter notebooks within VSCode.
3.4.1.4. Exporting a Conda Environment#
You can export your conda environment into a yml
file. This means that other users can recreate your exact conda environment using the yml
file.
To export your conda environment, run
mamba activate <environment name>
mamba env export > environment.yml
With this command, the environment.yml
file will contain information about every single package in your environment, including low-level ones you did not explicitly install. Since some of these may be operating system-specific, this could mean that the conda environment is not reproducible across operating systems (Mac OS, Windows, Linux).
To help ensure your environment is reproducible across operating systems, you can instead run:
mamba activate <environment name>
mamba env export --from-history > environment.yml
The --from-history
argument ensures that only packages you specifically chose to install are exported. Note however, that if you did not specify the versions of packages when installing them, the versions will not be included in the yml
file when using the from-history
flag.
The --from-history
flag will also result in a yml
file that does not include information on packages you installed with pip. To include packages you installed using pip, you can run
pip freeze > requirements.txt
Then manually add the following lines to your environment.yml
file:
- pip
- pip:
- -r file:requirements.txt
A user can recreate your conda this yml
file can recreate your conda environment by running:
mamba env create --file environment.yml
They will need both the environment.yml
file and the requirements.txt
file.
See also
The Conda documentation may be useful for further information.
3.4.2. Building a Conda environment in a user-defined directory#
The Conda environment and cache directory for installed packages can easily exceed tens of gigabytes. It is recommended to create the Conda environment in the lab directory instead of the home directory. In this section, we assume you prefer not to have a Conda environment in the default home directory, and that your default Conda environment is located in the lab directory. The lab directory (under your username) does not have the 100 GB space limitation, providing you with more room to create Conda environments.
Warning
Please note that this is a major change to the default behavior of Conda. If you are unsure about this change, please consult with the FAS Research Computing.
Also be aware that creating numerous Conda environments can exhaust the file system’s inodes, as Conda generates a large number of files. To avoid this, use the
$SCRATCH
space for your Conda environments. You can learn more about the Scratch space policy on the FASRC website.
Here are the steps to create a Conda environment in the lab directory:
Step 1: Locate your lab directory
The labs are located at the following path: /n/holylabs/LABS
(or other filesystems such as holylfs04
, holylfs05
, holylfs06
). Inside the LABS directory, you will find directories for individual labs. Within each lab directory, there is a Users
folder. Inside this folder, there should be a folder with your username. If such a folder does not exist, request the FASRC help desk to create one for you. This folder will serve as your personal directory under your affiliated lab.
Step 2: Create the following directories in your lab directory:
.conda
: This directory will be your default Conda directory..conda/envs
: This directory will store the conda environments..conda/pkgs
: This directory will store the cached packages.
Step 3: Set the following environment variables in your
~/.bashrc
file:export CONDA_ENVS=/n/holylabs/LABS/<lab_name>/<username>/.conda/envs export CONDA_PKGS_DIRS=/n/holylabs/LABS/<lab_name>/<username>/.conda/pkgs export PATH="/n/holylabs/LABS/<lab_name>/<username>/.conda:$PATH"
Replace
<lab_name>
with the name of your lab and<username>
with your username. TheCONDA_ENVS
environment variable specifies the directory where the Conda environments will be stored, and theCONDA_PKGS_DIRS
environment variable specifies the directory where the cached packages will be stored.Run the following command to apply the changes:
source ~/.bashrc
Step 4: Add these default directories to your
.condarc
file:envs_dirs: - /n/holylabs/LABS/<lab_name>/<username>/.conda/envs pkgs_dirs: - /n/holylabs/LABS/<lab_name>/<username>/.conda/pkgs
Replace
<lab_name>
with the name of your lab and<username>
with your username.Step 5: Create a Conda environment in the lab directory using the
conda create
command. For example, to create a Conda environment namedmyenv
with Python 3.12 and installing pip and numpy, you can use the following command:module load python/3.10.12-fasrc01 conda create --name myenv python=3.12 pip numpy
Step 6: Check if the Conda environment is created successfully in the lab directory using the following command:
conda env list
This command will list all the Conda environments, including the one you just created.
Done! You have successfully created a Conda environment in the lab directory.