Software (self-)Installation

A list of applications that are available on BEAR's facilities can be found here: https://bear-apps.bham.ac.uk. Wherever possible we would recommend using one of the centrally provided applications.

Where you require an application that isn't currently available we would, in most cases, recommend that you submit an Application Installation Request via Service Now, using the following link: https://intranet.birmingham.ac.uk/bear/sd/new-bear-software. The Research Software Group will then make the application available centrally and ensure that it is suitably optimised for BEAR's facilities.

However, we understand that there are situations where users would prefer to self-install, particularly Python modules and R packages. If this is the case then please refer to the documentation below.

BlueBEAR's configuration – overview

BlueBEAR is a heterogeneous HPC system, which means that it is made up of a variety of different node types. This presents some challenges when compiling and installing software. The applications maintained by the Research Software Group make these node differences transparent to the user: when an application module is loaded, it provides the correct binary executable(s) for the given node type. If you are installing software yourself then you will need to manage this process manually, and if your installation doesn't distinguish between these node types then you may run into issues. For example, if you install a package on a Cascade Lake node and then try to run it on a Haswell node (which is older), it may or may not work. If you install on a Haswell node and try to run it on a POWER9 node then it will almost certainly fail, because POWER9 is a different processor architecture. (Note that the error messages resulting from these issues are often hard to interpret.)
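
As a rough illustration of what "managing this manually" can look like, the sketch below derives a node-specific install location from the BB_CPU environment variable that the instructions further down this page rely on. The function names node_type and node_specific_prefix are hypothetical helpers invented for this example, and treating an unset BB_CPU as an error is an assumption rather than BEAR policy.

```shell
#!/bin/sh
# Print the node-type label held in BB_CPU, or fail if it is not set.
# (Hypothetical helper; failing on an empty value is a choice made for this sketch.)
node_type() {
    if [ -z "${BB_CPU:-}" ]; then
        echo "BB_CPU is not set; is this a BlueBEAR node?" >&2
        return 1
    fi
    printf '%s\n' "${BB_CPU}"
}

# Append the node type to a base path, so that each node type gets its own copy.
# (Hypothetical helper: $1 is the base path of the installation.)
node_specific_prefix() {
    _cpu="$(node_type)" || return 1
    printf '%s-%s\n' "$1" "${_cpu}"
}
```

For example, node_specific_prefix "${HOME}/software/my-tool" would print a path ending in "-haswell" on a node where BB_CPU is "haswell".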

Self-installing Python modules

N.B. the term "module" in this context refers to an extension to Python's functionality that can be used by including e.g. import flake8 in your Python code. It should not be confused with the BEAR application modules that are loaded with the module load command.

These are the most commonly used methods for installing Python modules:

  • pip install flake8
  • python setup.py install

Python module installation process

Where a Python module is available at pypi.org it can be installed using pip, the Python package installer. The default pip install command will not work on BlueBEAR, as users don't have permission to write to the directory where pip normally places Python modules. It is possible to pass the --user option so that pip installs into your home directory, but this is problematic for the reasons described above: it won't distinguish between node types and your jobs may subsequently fail.

We therefore recommend using a node-specific Python virtual environment. This solution applies to both the pip installation method and the "python setup.py install" method.

The process for creating and using a node-specific virtual environment is as follows:

Creating a virtual environment and installing a Python module

  1. Load the BEAR Python module on which you want to base your virtual environment, e.g. "module load BEAR-Python-DataScience/2019b-foss-2019b-Python-3.7.4". Note that the DataScience module provides various additional Python modules (including NumPy and SciPy) on which many other modules rely, although you could use a "thinner" module such as Python/3.7.4-GCCcore-8.3.0. See the tips section below for further information on Python module dependencies.
    (We strongly recommend using a module instead of the system Python version. Also, note that we do not recommend the use of Python 2 as it's no longer supported by the Python developers.)
  2. Change to the directory in which you want to create the virtual environment. (Alternatively you can specify the full path in the following step.)
  3. Create a virtual environment, including the environment variable ${BB_CPU} in its name to identify the node-type:
    • virtualenv --system-site-packages my-virtual-env-${BB_CPU}
  4. Activate the virtual environment:
    • source my-virtual-env-${BB_CPU}/bin/activate
  5. Run your Python module installations as normal (N.B. don't include --user):
    • pip install flake8

Using your node-specific virtual environment

  1. First load the same BEAR Python module as you used to create the virtual environment in the previous step. This is important, else your Python commands will likely fail.
  2. Activate the virtual environment:
    • source my-virtual-env-${BB_CPU}/bin/activate
  3. Execute your Python code.
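
Before running your code it can be worth confirming that the activated environment really is the one built for the current node type. The following is a minimal sketch, assuming the my-virtual-env-${BB_CPU} naming convention used above; venv_matches_node is a hypothetical helper name, and VIRTUAL_ENV is the variable that the environment's activate script sets.

```shell
#!/bin/sh
# Succeed only if a virtual environment is active and its path ends in
# "-<node type>", following the naming convention used in the steps above.
venv_matches_node() {
    [ -n "${VIRTUAL_ENV:-}" ] && [ -n "${BB_CPU:-}" ] || return 1
    case "${VIRTUAL_ENV}" in
        *"-${BB_CPU}") return 0 ;;
        *) return 1 ;;
    esac
}
```

A job script could then stop early with something like: venv_matches_node || { echo "wrong or missing virtual environment" >&2; exit 1; }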

Example script

All of the above steps can be encapsulated in a script, which can be included as part of the batch script that you submit to BlueBEAR:

#!/bin/bash
set -e

module purge; module load bluebear
module load BEAR-Python-DataScience/2019b-foss-2019b-Python-3.7.4

export VENV_DIR="${HOME}/virtual-environments"
export VENV_PATH="${VENV_DIR}/my-virtual-env-${BB_CPU}"

# Create a master venv directory if necessary
mkdir -p "${VENV_DIR}"

# Check if virtual environment exists and create it if not
if [[ ! -d "${VENV_PATH}" ]]; then
    virtualenv --system-site-packages "${VENV_PATH}"
fi

# Activate the virtual environment
source "${VENV_PATH}/bin/activate"

# Perform any required pip installations. For reasons of consistency we would recommend
# that you define the version of the Python module – this will also ensure that if the
# module is already installed in the virtual environment it won't be modified.
pip install flake8==3.8.4

# Execute your Python scripts
python my-script.py
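
If you install several Python modules, one way to follow the version-pinning advice in the script comments is to list them in a requirements file and check that every entry is pinned before installing. This is only a sketch: all_pinned is a hypothetical helper, requirements.txt is an assumed file name, and the check simply looks for "==" on each line rather than parsing pip's full requirement syntax.

```shell
#!/bin/sh
# Return 0 if every non-blank, non-comment line of the file contains "==".
# Return 2 if the file cannot be read. ($1 is the path to the requirements file.)
all_pinned() {
    [ -r "$1" ] || return 2
    # First grep drops blank lines and comments; second grep looks for any
    # remaining line without "==". The leading "!" inverts that result.
    ! grep -Ev '^[[:space:]]*(#|$)' "$1" | grep -qv '=='
}
```

It could be used in place of the single pip command in the script, for example: all_pinned requirements.txt && pip install -r requirements.txt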

Tips

  • Install the minimum of what's required. For example, if the Python module that you're installing has a dependency on Matplotlib, load the BEAR Matplotlib module first (module load matplotlib/3.1.1-foss-2019b-Python-3.7.4) and then perform your virtual env installations.
  • Further to the above tip, you may need to be aware of version constraints on dependencies. For example, if a Python module needs a newer version of Matplotlib than the one we provide, first check whether BEAR Applications has the later version. If not, see whether you can install an earlier version of the module you require that will work with the BEAR Applications version of Matplotlib. This would be our recommendation, as some Python modules are complex to install. Finally, you can load the BEAR Python module instead of the BEAR Matplotlib module and then install everything yourself although, as mentioned, this may be difficult depending on the complexity of the modules' installation processes.

Self-installing R packages

As mentioned above, first check at https://bear-apps.bham.ac.uk to see whether the package you need is already installed, being aware that it may exist as a dependency of another R package.

The self-installation of R packages is simpler than the process for Python and essentially involves defining a single environment variable:

export R_LIBS_USER=${HOME}/R/library/${EBVERSIONR}/${BB_CPU}

Then, from within R, execute the install.packages command. By default it will try, and fail, to use the main library, but will then fall back to the user directory defined above:

> install.packages("vioplot")
Warning in install.packages("vioplot") :
  'lib = "/rds/bear-apps/2019b/EL7-haswell/software/R/3.6.2-foss-2019b/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/rds/homes/a/a-user/R/library/3.6.2/haswell’
to install packages into? (yes/No/cancel) yes

Note that if you're performing a CRAN install in a batch script you will need to specify a repo so that R doesn't ask you to select a CRAN mirror, e.g. install.packages("vioplot", repos='https://www.stats.bris.ac.uk/R/')
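
For a batch job, the R steps above might be wrapped in shell along the following lines. This is a sketch under stated assumptions: r_user_lib and install_r_package are hypothetical helper names; EBVERSIONR is assumed to be set once an R module has been loaded; and the library directory is created up front because a batch job cannot answer the interactive prompts shown in the transcript above.

```shell
#!/bin/sh
# Build the node-specific R library path described above.
# Fails if EBVERSIONR or BB_CPU is missing.
r_user_lib() {
    if [ -z "${EBVERSIONR:-}" ] || [ -z "${BB_CPU:-}" ]; then
        echo "EBVERSIONR and BB_CPU must both be set (load an R module first)" >&2
        return 1
    fi
    printf '%s/R/library/%s/%s\n' "${HOME}" "${EBVERSIONR}" "${BB_CPU}"
}

# Install one CRAN package ($1) into the node-specific library,
# using the mirror given in the note above.
install_r_package() {
    R_LIBS_USER="$(r_user_lib)" || return 1
    export R_LIBS_USER
    # Create the library beforehand; there is nobody to answer R's prompts in a batch job.
    mkdir -p "${R_LIBS_USER}"
    Rscript -e 'install.packages(commandArgs(trailingOnly = TRUE), lib = Sys.getenv("R_LIBS_USER"), repos = "https://www.stats.bris.ac.uk/R/")' "$1"
}
```

After loading an R module, a batch script could then call install_r_package vioplot before running the R code that depends on it.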
