Software on the SCITAS clusters

 

Compiler and MPI combinations

Software policy

Modules and LMOD

The slightly ugly reality

Saving your environment

GPU environment

Reverting to the old environment

Behind the scenes

 

 

Compiler and MPI combinations

SCITAS supports the following compiler and MPI variants:

 

Intel Composer 2016 with Intel MPI 2016 

GCC 5.3 with MVAPICH2 version 2.2

GCC 5.3 with OpenMPI version 1.10

 

All software installed by SCITAS will, where applicable, be built for all three variants. 

 

For GCC we recommend MVAPICH2, as it provides better performance; OpenMPI is also provided because certain codes (e.g. Trilinos) require it.
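As a quick illustration, compiling and running an MPI code with the recommended GCC + MVAPICH2 combination might look like the sketch below; the source file name and process count are placeholders, not part of any official recipe:

# load the recommended GCC + MVAPICH2 toolchain
module purge
module load gcc
module load mvapich2

# compile a (hypothetical) MPI program with the MPI compiler wrapper
mpicc -O2 -o hello_mpi hello_mpi.c

# run it with 16 MPI ranks via the batch system
srun -n 16 ./hello_mpi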

 

For each compiler/MPI variant there is also an associated BLAS implementation (a linking example is sketched after the list):

 

Intel: MKL 11.3.3

GCC: OpenBLAS 0.2.18
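As an unofficial illustration, linking a code against the GCC BLAS stack might look like the following; the module name "openblas" and the source file are assumptions that should be checked with "module avail":

module load gcc
module load openblas

# link a (hypothetical) C program against OpenBLAS
gcc -O2 -o blas_test blas_test.c -lopenblas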

 

Software policy

 

SCITAS will install libraries compatible with the aforementioned compiler+MPI variants. These libraries may be installed in multiple versions depending on the implementation and functionality required (e.g. serial vs. OpenMP vs. MPI).

 

For scientific codes (i.e. not libraries) SCITAS will install only one version, with the configuration decided by the SCITAS application experts. If users require different options, SCITAS will provide assistance so that they can compile their own version.
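As a rough sketch of what compiling your own version against the SCITAS toolchain might involve (the package, configure options and install prefix below are purely illustrative):

# build against the SCITAS-provided toolchain and libraries
module load gcc
module load mvapich2
module load fftw

./configure --prefix=$HOME/software/mycode CC=gcc MPICC=mpicc
make -j 4
make install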

 

Modules and LMOD

 

The SCITAS-managed clusters use the LMOD tool to manage scientific software. It is compatible with the classical Modules tool but brings a large number of improvements.

The official LMOD documentation can be consulted at: http://lmod.readthedocs.io/en/latest/

 

A slightly simplified example of using LMOD is:

 

(1) Connect to a cluster and see what base modules are available. These are either compilers or stand-alone packages such as MATLAB.

$ module avail

-------------- /path/to/base/modules ---------------  

cmake  gcc  intel  matlab  

 

(2) Load a compiler to see the modules built with the chosen compiler. These may be scientific libraries, serial (non-MPI) codes or MPI libraries.

$ module load gcc

$ module avail

--------------- /path/to/gcc/modules ----------------  

gdb  fftw  hdf5  mvapich2  openmpi  python R

-------------- /path/to/base/modules ---------------  

cmake  gcc  intel  matlab  

 

(3) Load an MPI library to see the modules that use this MPI flavour.

$ module load mvapich2

$ module avail

-------------- /path/to/mvapich/modules -------------  

boost  fftw  hdf5  gromacs  espresso  parmetis

--------------- /path/to/gcc/modules ----------------  

cmake  gdb  fftw  hdf5  mvapich2  openmpi  python

-------------- /path/to/base/modules ----------------  

cmake  gcc  intel  matlab  

 

LMOD knows which modules are incompatible and will take the necessary steps to ensure a consistent environment:

 

$ module load openmpi

Lmod is automatically replacing "mvapich2" with "openmpi"

 

 

 

The slightly ugly reality

 

In reality, running "module avail fftw" (after having loaded gcc) returns:

 

[user@system ~]$ module avail fftw
------------------- /ssoft/spack/lafnetscha/share/spack/lmod/x86_E5v2_IntelIB/gcc/5.3.0 -------------------
   fftw/3.3.4-openmp    fftw/3.3.4 (D)
  Where:
   D:  Default Module

 

The names are <module name>/<version>-<options>, with the options being the "key" configuration choices such as MPI or OpenMP activation.

The (D) after a module name indicates that, if two or more versions of the same package are available, this is the version that will be loaded by default.

 

If you need a specific version because of the options with which it was built, then you have to specify the full name:

 

module purge
module load gcc
module load mvapich2
module load fftw/3.3.4-mpi-openmp

If you really want to know how a module was built, run "module help <modulename>":

 

$ module load intel
$ module load intelmpi
$ module help hdf5/1.8.16-mpi
------------------------------------------- Module Specific Help for "hdf5/1.8.16-mpi" --------------------------------------
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, 
and is designed for flexible and efficient I/O and for high volume and complex data. 
spec : hdf5@1.8.16%intel@16.0.3~cxx~debug+fortran+mpi+shared+szip~threadsafe arch=x86_E5v1_IntelIB
    ^intelmpi@5.1.3%intel@16.0.3 arch=x86_E5v1_IntelIB
    ^szip@2.1%intel@16.0.3 arch=x86_E5v1_IntelIB
    ^zlib@1.2.8%intel@16.0.3 arch=x86_E5v1_IntelIB

The spec line is interpreted as follows:

hdf5@1.8.16%intel@16.0.3

HDF5 version 1.8.16 compiled with the Intel 16.0.3 compiler.

~cxx~debug+fortran+mpi+shared+szip~threadsafe

Here "~" means without/disabled and "+" means with/activated so HDF5 has been configured with Fortran, MPI and szip support and without the C++, debug and threadsafe options.

The remaining "arch=foo" sections specify the target architecture and dependencies and are of no interest to end users. 

 

 

 

 

Saving your environment

If there are sets of modules that you use regularly, a nice feature of LMOD is the ability to save them as a collection and reload it with a single command:

 

[user@system ~]$ module list
Currently Loaded Modules:
  1) gcc/5.3.0   2) mvapich2/2.2b   3) gdb/7.11

[user@system ~]$ module save dev_env
Saved current collection of modules to: dev_env, for system: "x86_E5v2_IntelIB"

[user@system ~]$ module purge

[user@system ~]$ module list
No modules loaded

[user@system ~]$ module restore dev_env
Restoring modules to user's dev_env, for system: "x86_E5v2_IntelIB"

[user@system ~]$ module list
Currently Loaded Modules:
  1) gcc/5.3.0   2) mvapich2/2.2b   3) gdb/7.11 

 

Because each cluster has a different base module path, the saved set is only valid for one architecture (the system type shown when you save).

If you try to load a module collection on a different system type you will see:

 

[user@other_system ~]$ module restore dev_env
Lmod has detected the following error:   User module collection: "dev_env" does not exist.
 Try "module savelist" for possible choices.

For this reason you should never use module restore in job scripts. You can, of course, save the same set of modules with the same name on multiple clusters so as to have the same environment everywhere.
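For example, a job script that is portable across clusters might load its modules explicitly rather than relying on a saved collection. The sketch below is illustrative only; the module names, resources and binary are placeholders:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 16
#SBATCH --time 01:00:00

# load the required modules explicitly instead of using "module restore"
module purge
module load gcc
module load mvapich2
module load fftw

srun ./my_simulation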

 

GPU environment

 

For homogeneous clusters such as Bellatrix and Castor the environment on the compute nodes is the same as that on the front-end machines. 

 

Deneb is slightly different as it has a partition containing machines with GPUs, as well as a different InfiniBand configuration.

 

If you wish to have access to the GPU node environment (i.e. the CUDA runtime and correct MPI) on the login machines then run:

 

[user@system ~]$ slmodules -s x86_E5v2_Mellanox_GPU -v
[INFO] S+L release: stable
[INFO] S+L systype: x86_E5v2_Mellanox_GPU
[INFO] S+L engaged!

 

To switch back to the architecture of the machine on which you are running the commands, run "slmodules" without any options.
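By way of illustration, a login-node session for preparing GPU software might look like the sketch below; the "cuda" module name and the source file are assumptions that should be checked with "module avail":

# switch the login session to the GPU node environment
slmodules -s x86_E5v2_Mellanox_GPU

# load a CUDA toolchain and compile (module names are illustrative)
module purge
module load gcc
module load cuda
nvcc -O2 -o my_gpu_code my_gpu_code.cu

# return to the login node's native environment
slmodules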

 

Reverting to the old environment

 

SCITAS provides three software releases:

  • deprecated
  • stable
  • future 

 

deprecated

This is the old production environment and is no longer supported but will be retained for one year. 

stable

This is the current stable release for the year in progress. We guarantee that the modules here will work and that modules will not be removed.

future

This is the testing area for what will become the next production environment and it is not guaranteed to exist! Modules in here are not guaranteed to work and may be removed without warning. 

 

When you connect to the clusters you will see the production release. To switch to a different release, run "slmodules -r <release>".

 

To revert to the old environment (pre July 19th 2016), run "slmodules -r deprecated":

 

[user@system ]$ slmodules -r deprecated -v
[INFO] Loaded old-style modules

In a job script you need to source the script using its full path, "/ssoft/spack/bin/slmodules.sh":

 

source /ssoft/spack/bin/slmodules.sh -r deprecated -v
module purge
module load foo

 

 

Behind the scenes

The software environment on the clusters is managed using the SPACK toolkit, to which EPFL is a major contributor: http://software.llnl.gov/spack/

This allows us to deploy software for multiple architectures and compiler/MPI variants in a consistent and automated manner.
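For the curious, a spec such as the one shown in the "module help" output above corresponds roughly to a Spack install command of the following form; this is a sketch of the upstream tool's syntax, not of the SCITAS deployment procedure:

# HDF5 1.8.16 built with Intel 16.0.3, with Fortran and MPI enabled,
# C++ and thread safety disabled, and Intel MPI as the MPI dependency
spack install hdf5@1.8.16 %intel@16.0.3 +fortran +mpi ~cxx ~threadsafe ^intelmpi@5.1.3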