Software (self-)Installation

A list of applications that are available on BEAR's facilities can be found here: https://bear-apps.bham.ac.uk. Wherever possible we would recommend using one of the centrally provided applications.

Where you require an application that isn't currently available we would, in most cases, recommend that you submit an Application Installation Request via Service Now, using the following link: https://intranet.birmingham.ac.uk/bear/sd/new-bear-software. The Research Software Group will then make the application available centrally (which allows all users to utilise it) and will ensure that it is suitably optimised for BEAR's facilities.

However, we understand that there are situations where users would prefer to self-install, particularly for Python modules, R packages, and codes under active development by the user. If this is the case then please refer to the documentation below.

BlueBEAR's configuration – overview

BlueBEAR is a heterogeneous HPC system, meaning that it is made up of a variety of different node types. This presents challenges when compiling and installing software: the applications maintained by the Research Software Group make these node differences transparent to the user by ensuring that when an application module is loaded it provides the correct binary executable(s) for the given node type. If you are installing software yourself then you will need to manage this process manually, and if your installation doesn't distinguish between these node types then you may run into issues. For example, if you install a package on a Cascade Lake node and then try to run it on a Haswell node (which is older), it may or may not work. In contrast, if you install a package on the older Haswell node, it will work on the Cascade Lake nodes, but likely with suboptimal performance. Errors resulting from these sorts of compilation issues can be convoluted, but often include the message "Illegal Instruction".
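For reference, the examples below use the ${BB_CPU} environment variable, which on a BlueBEAR node expands to the node type. A quick way to see which node type you are currently on:

echo ${BB_CPU}   # prints e.g. "haswell" or "cascadelake"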

Self-installing Python modules

N.B. the term "module" in this context refers to the extensions to Python's functionality that can be used by including e.g. import flake8 in your Python code (not to be confused with the environment modules loaded via the module command).

These are the most commonly used methods for installing Python modules:

  • pip install flake8
  • python setup.py install

Python module installation process

Where a Python module is available at pypi.org it can be installed using pip, Python's package installer. Executing the default pip install command will not work on BlueBEAR as users don't have the file permissions to write to the directory where this process normally places Python modules. It is possible to pass the --user option to the command so that it installs into your home directory, but this is problematic for the reasons described above, i.e. it won't distinguish between node types and your jobs may subsequently fail.

Our recommendation is therefore to use a node-specific Python virtual environment; this solution applies both to the pip installation method and to the "python setup.py install" method.

The process for creating and using a node-specific virtual environment is as follows:

Creating a virtual environment and installing a Python module

  1. Load the BEAR Python module on which you want to base your virtual environment, e.g. "module load BEAR-Python-DataScience/2019b-foss-2019b-Python-3.7.4". Note that the DataScience module provides various additional Python modules (including NumPy and SciPy) on which many other modules rely, although you could use a "thinner" module such as Python/3.7.4-GCCcore-8.3.0. See the tips section below for further information on Python module dependencies.
    (We strongly recommend using a module instead of the system Python version. Also, note that we do not recommend the use of Python 2 as it's no longer supported by the Python developers.)
  2. Change to the directory in which you want to create the virtual environment. (Alternatively you can specify the full path in the following step.)
  3. Create a virtual environment, including the environment variable ${BB_CPU} in its name to identify the node-type:
    • python3 -m venv --system-site-packages my-virtual-env-${BB_CPU}
  4. Activate the virtual environment:
    • source my-virtual-env-${BB_CPU}/bin/activate
  5. Run your Python module installations as normal (N.B. don't include --user):
    • pip install flake8

Using your node-specific virtual environment

  1. First load the same BEAR Python module as you used to create the virtual environment in the previous section. This is important; otherwise your Python commands will likely fail.
  2. Activate the virtual environment:
    • source my-virtual-env-${BB_CPU}/bin/activate
  3. Execute your Python code.

Example script

All of the above steps can be encapsulated in a script, which can be included as part of the batch script that you submit to BlueBEAR:

#!/bin/bash
set -e

module purge; module load bluebear
module load BEAR-Python-DataScience/2019b-foss-2019b-Python-3.7.4

export VENV_DIR="${HOME}/virtual-environments"
export VENV_PATH="${VENV_DIR}/my-virtual-env-${BB_CPU}"

# Create the top-level venv directory if necessary
mkdir -p "${VENV_DIR}"

# Check if virtual environment exists and create it if not
if [[ ! -d "${VENV_PATH}" ]]; then
    python3 -m venv --system-site-packages "${VENV_PATH}"
fi

# Activate the virtual environment
source "${VENV_PATH}/bin/activate"

# Perform any required pip installations. For reasons of consistency we would recommend
# that you define the version of the Python module – this will also ensure that if the
# module is already installed in the virtual environment it won't be modified.
pip install flake8==3.8.4

# Execute your Python scripts
python my-script.py

Tips

  • Install the minimum of what is required. For example, if the Python module that you're installing has a dependency on Matplotlib, load the BEAR Matplotlib module first (module load matplotlib/3.1.1-foss-2019b-Python-3.7.4) and then perform your virtual environment installations; a sketch of this workflow follows these tips.
  • Further to the above tip, you may need to be aware of dependencies' version constraints. For example, if a Python module needs a newer version of Matplotlib than the one we provide, first check whether BEAR Applications has the later version. If not, see whether you can install an earlier version of the module you require that will work with the BEAR Applications version of Matplotlib; this would be our recommendation, as some Python modules are complex to install. Finally, you can use the BEAR Python module instead of the BEAR Matplotlib module and then install everything yourself although, as mentioned, this may be difficult depending on the complexity of the modules' installation processes.
  • Python libraries on PyPI are either binary packages, known as 'wheels', which are self-contained and include compiled code, or source packages, which rely on external dependencies such as compiled C/C++/Fortran libraries and which compile at installation time. For the latter, you may find that installing through pip fails and that you need to load additional modules from BEAR Apps before retrying the installation, or that you need to compile the dependencies yourself.
  • Some package authors recommend installing Python packages via Anaconda or Miniconda. We do not recommend this method on the BlueBEAR cluster and would encourage you to contact us if the package you want to use suggests this method of installation.
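As a minimal sketch of the first tip above (the package name here is hypothetical; the module versions are the 2019b ones used elsewhere on this page):

module purge; module load bluebear
# Load the BEAR Matplotlib module first so that pip sees Matplotlib as already satisfied
module load matplotlib/3.1.1-foss-2019b-Python-3.7.4
python3 -m venv --system-site-packages my-virtual-env-${BB_CPU}
source my-virtual-env-${BB_CPU}/bin/activate
pip install some-plotting-package   # hypothetical package that depends on Matplotlib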

Self-installing R packages

As mentioned above, first search at https://bear-apps.bham.ac.uk to see whether the package you need is already installed, being aware that it may exist as a dependency of another R package or as part of the Bioconductor package.

The self-installation of R packages is simpler than the process for Python and essentially involves defining a single environment variable and ensuring that the path it points at exists:

export R_LIBS_USER=${HOME}/R/library/${EBVERSIONR}/${BB_CPU}

Then, from within R, execute the install.packages command. By default it will attempt to use the main library, fail because that library is not writable, and then fall back to the user directory defined above:

> install.packages("vioplot")
Warning in install.packages("vioplot") :
  'lib = "/rds/bear-apps/2019b/EL7-haswell/software/R/3.6.2-foss-2019b/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘/rds/homes/a/a-user/R/library/3.6.2/haswell’
to install packages into? (yes/No/cancel) yes

Note that if you're performing a CRAN install in a batch script (i.e. non-interactively) you will need to specify a repo so that R doesn't ask you to select a CRAN mirror, e.g. install.packages("vioplot", repos='https://www.stats.bris.ac.uk/R/'). You will also need to ensure that the directory specified by $R_LIBS_USER exists before launching R, by including the following command: mkdir -p "${R_LIBS_USER}"
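Putting this together, a minimal batch-script sketch (my-analysis.R is a hypothetical script; the R module version is the one shown in the output above):

#!/bin/bash
set -e

module purge; module load bluebear
module load R/3.6.2-foss-2019b

# Node-specific user library, as described above
export R_LIBS_USER=${HOME}/R/library/${EBVERSIONR}/${BB_CPU}
mkdir -p "${R_LIBS_USER}"

# Non-interactive CRAN install; specifying a repo avoids the mirror prompt
Rscript -e 'install.packages("vioplot", repos="https://www.stats.bris.ac.uk/R/")'

# Run your R code
Rscript my-analysis.R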

Self-installing C/C++/Fortran packages

Because of BlueBEAR's heterogeneous architecture, described above, you will need to consider how you compile and run packages written in compiled languages. We provide a number of tools to help with your software development needs. Generally, compiling code on BlueBEAR is not as straightforward as on other HPC machines or on your own machine.

Accessing Compilers and Build Tools

We provide access to several families of compilers on BlueBEAR, and their use will depend on the application you are compiling.

GNU Compiler Collection

Generally, we recommend that most people start by using the GNU family of compilers (along with FFTW, OpenMPI and OpenBLAS), which can be accessed via the foss toolchain:

module load foss/2020a
# Compiling C, C++ and Fortran applications:
gcc -o my_c_app my_c_app.c
g++ -o my_cpp_app my_cpp_app.cpp
gfortran -o my_fortran_app my_fortran_app.f90

# Compiling C, C++ and Fortran MPI applications:
mpicc -o my_mpi_c_app my_mpi_c_app.c
mpicxx -o my_mpi_cpp_app my_mpi_cpp_app.cpp
mpifort -o my_mpi_fortran_app my_mpi_fortran_app.f90

In order to get the best performance for your application on each of the node types, it is necessary to pass flags which tell the compiler to generate efficient code. Generally, you will want to specify at least the -O2 flag. The build scripts for many scientific packages will also add the flag -march=native to compilation commands, which tells the compiler to build for the processor that the compiler is running on; this matters because newer processors support additional optimised operations via what are known as instruction sets. We recommend that users submit their compilations (once they have fixed any errors) as job scripts, labelling their build/install directories appropriately in order to take advantage of the hardware. For example, you could submit two variations of the following script, changing the constraint to each of 'cascadelake' and 'haswell':

#!/bin/bash
#SBATCH --time 10:0
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --constraint cascadelake
module purge; module load bluebear

module load bear-apps/2020a
module load foss/2020a

export BUILDDIR=myapplication_${BB_CPU}
mkdir -p ${BUILDDIR}
gcc -o ${BUILDDIR}/myexecutable -march=native -O2 test.c

Then, in any job script, you would be able to run your processor-optimised application with:

./myapplication_${BB_CPU}/myexecutable

For some external packages, a file called 'configure' will be found in the source directory of the application. Usually, but not always, this will have been generated by the tool Autoconf, and running it generates a Makefile. Where this is the case, you can specify an installation directory in your script (note that configure requires an absolute path for --prefix):

./configure --prefix=$(pwd)/installdir_${BB_CPU}
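After configuring, the usual build and install steps then place the application under the architecture-specific prefix:

make
make install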

Users of the common CMake build system should create a separate build and installation directory for each architecture:

mkdir -p build_${BB_CPU}
cd build_${BB_CPU}
cmake ../path/to/application/source/directory -DCMAKE_INSTALL_PREFIX=install_${BB_CPU}
make install

Intel Parallel Studio

Alternatively, you can load the Intel compiler, Math Kernel Library and OpenMPI using the iomkl toolchain:

module load iomkl/2020a
icc -o my_c_app my_c_app.c
icpc -o my_cpp_app my_cpp_app.cpp
ifort -o my_fortran_app my_fortran_app.f90

Note that the compilation wrapper commands for MPI applications are the same as for GCC. We do not provide the Intel MPI library on BlueBEAR; we provide OpenMPI instead.

Most of the advice for the GNU compilers also applies here. It is important to note, however, that the optimisations performed by the Intel compilers can be more aggressive than those of the GNU compilers, resulting in better performance but at the expense of numerical accuracy in calculations. A particularly important flag to take note of is "-fp-model", which tells the Intel compiler how aggressively floating-point calculations can be optimised. By default the flag is set to "-fp-model fast=1", and this results in calculations being less accurate than the IEEE 754 behaviour which the GNU compilers use by default. Because of this, if you find that your code gets different results with the Intel compiler, you may want to adjust this setting by using the flags "-fp-model precise" or "-fp-model strict".
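For example, to favour reproducible floating-point results over the default aggressive optimisation:

module load iomkl/2020a
icc -O2 -fp-model precise -o my_c_app my_c_app.c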

Using BlueBEAR Provided Compiled Libraries

Loading the Appropriate Modules

When you load a compiler toolchain like foss or iomkl, you should generally use versions of libraries which have been compiled with the same compiler. Some libraries will be labelled with the toolchain they are compiled with, e.g. PETSc/3.11.1-foss-2020a, but for others you will need to directly specify the compiler version instead.

Toolchain versions and the compilers which they include

  Toolchain     Compiler versions
  foss/2019a    GCC 8.2.0
  foss/2019b    GCC 8.3.0
  foss/2020a    GCC 9.3.0
  foss/2020b    GCC 10.2.0
  foss/2021a    GCC 10.3.0
  iomkl/2019a   Intel 2019.1.144 and GCC 8.2.0
  iomkl/2019b   Intel 2019.5.281 and GCC 8.3.0
  iomkl/2020a   Intel 2020.1.217 and GCC 9.3.0
  iomkl/2020b   Intel 2020.4.304 and GCC 10.2.0

To take an example: we provide several variants of the GNU Scientific Library as a module called GSL. If you wanted to use this in code being compiled with the foss/2020a toolchain, you would need to load the GSL module as:

module load foss/2020a
module load GSL/2.6-GCC-9.3.0

Where versions conflict, you will see output warning you that some modules have been replaced; for example, if after loading foss/2020a a user loads the older version GSL/2.6-GCC-8.3.0, they would see the following output:
The following have been reloaded with a version change:
  1) GCC/9.3.0 => GCC/8.3.0
  2) GCCcore/9.3.0 => GCCcore/8.3.0
  3) binutils/2.34-GCCcore-9.3.0 => binutils/2.32-GCCcore-8.3.0
  4) zlib/1.2.11-GCCcore-9.3.0 => zlib/1.2.11-GCCcore-8.3.0
This should be treated as a sign that something is wrong, as version clashes such as this may lead to further errors when compiling or when loading modules.
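The fix is to load library versions from matching toolchains; per the table above, GCC 8.3.0 belongs to the 2019b toolchains, so GSL/2.6-GCC-8.3.0 pairs with foss/2019b:

module purge; module load bluebear
module load foss/2019b
module load GSL/2.6-GCC-8.3.0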


Using Loaded Libraries

We recommend the use of the tool pkg-config, which allows you to query the flags which should be passed to the compiler. For example, a user wishing to use the GNU Scientific Library can query the compiler and linker flags from pkg-config:

module load foss/2020a
module load GSL/2.6-GCC-9.3.0

gcc $(pkg-config --cflags gsl) test.c $(pkg-config --libs gsl)

You can see a list of all of the variables available for a particular package by running:

pkg-config <packagename> --print-variables

For some packages, the default settings returned by pkg-config may not be optimal. This is especially the case where the library provides multiple variants, e.g. both a serial version and a parallelised version; an example of this is the FFTW library. We encourage you to check that the flags and list of libraries returned are correctly specified.

In the case that you need more control over the flags, we provide environment variables, prefixed EBROOT<packagename>, which store the installation directory of every library we provide. For example, you can find all of the libraries and header files for the GNU Scientific Library within ${EBROOTGSL}.
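For example, a sketch that compiles against GSL using this variable directly, assuming the usual GSL link libraries (-lgsl and -lgslcblas):

gcc test.c -I${EBROOTGSL}/include -L${EBROOTGSL}/lib -lgsl -lgslcblas -o test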

Build Tools

Most real-world projects invoke compilers via build tools in order to simplify the building process. There are many choices, and we try to provide up-to-date versions of these. While you may find that some of these tools are available immediately after logging in to BlueBEAR, those are the versions provided and required by the operating system, and are generally older versions which many projects no longer support. Because of this, we strongly recommend that you load build tools such as those discussed below (e.g. CMake, Autotools, Bazel and Ninja) from the modules system.

With all of these tools, you may find that they do not choose the compiler that you want by default; for example, you may wish to use the Intel compilers and find that the build tools instead choose the GNU compilers. In this case, you will need to specify the compiler. Each of the tools has its own way of doing this:

How to specify compilers to build tools

  Tool        Basic invocation
  make        CC=gcc CXX=g++ FC=gfortran make
  CMake       cmake . -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_Fortran_COMPILER=gfortran
  Autotools   ./configure CC=gcc CXX=g++ FC=gfortran
  Bazel       CC=gcc bazel

Please note that Ninja build files are generally generated by another tool, such as CMake, and so should be regenerated rather than trying to specify the compiler to them directly.

It is worth noting that build tools will often try to find external dependencies during their configuration stage. Sometimes this is automatic; for example, CMake will often find packages specified in the pkg-config paths and so will detect BlueBEAR modules that you have loaded. However, if a tool is not set up to detect the package you are loading automatically, you may need to specify the locations of dependencies yourself, either by modifying the build scripts or by passing variables to the tool. Using the EBROOT<packagename> variables is usually helpful for this, as sketched below.
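For example, CMake's bundled FindGSL module accepts a GSL_ROOT_DIR hint, so a loaded GSL module can be pointed to directly via its EBROOT variable:

cmake ../path/to/application/source/directory -DGSL_ROOT_DIR=${EBROOTGSL}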
