This page lists the work carried out during the 2016 SCITAS annual maintenance week.

CHANGELOG

 

Monday 11/7/2016

 

08h15 - All users logged out of the clusters and no new connections possible 

09h30 - GPFS shut down, so there is no access to /home, /work or /scratch until next week.

 

SLURM updated to version 15.08.12 (from 15.08.7) on Bellatrix, Castor and Deneb 

Analysis of the queued jobs on all the clusters shows that only one user followed the instructions on how to ensure that jobs held in the queue will not crash after the maintenance. A decision on what to do with the queued jobs will be taken later in the week.

 

Tuesday 12/7/2016

IBM engineers are continuing to work on the GSS storage and its external Ethernet networking. The new switches have been installed and are now being configured.

 

Reinstallation of the clusters is now beginning - Red Hat 6.7 with the 2.6.32-573.22.1.el6.x86_64 kernel, along with the latest OFED.

1012 compute nodes, plus the login and administration servers, need upgrading, so this will take most of the rest of the week.

 

Firmware updates for the IB adapters in the Deneb GPU nodes (FW: 2.36.5000)

 

Wednesday 13/7/2016

Documentation updated to reflect the new software environment

http://scitas.epfl.ch/documentation/using-clusters 

http://scitas.epfl.ch/software-clusters

 

Reinstallations ongoing.

 

Thursday 14/7/2016

Deneb: All compute nodes reinstalled with RH 6.7

Castor: All compute nodes reinstalled with RH 6.7 

 

Deneb: CUDA driver for the GPU nodes updated to 352.39 (CUDA 7.5) 

 

Work ongoing on the GSS storage.  

 

Deneb phase II BIOS updates completed.

 

Friday 15/7/2016

Bellatrix updates and reinstallation ongoing.

Central GPFS (GSS) is up and performance tests are being carried out. 

Deneb admin nodes updated 

 

Saturday 16/7/2016

Update to GPFS 4.2 for the BG/Q, Bellatrix and Castor

 

Monday 18/7/2016

Update to GPFS 4.2 on Deneb

Update of the Bellatrix administration and login nodes

New software environment in place

 

Tuesday 19/7/2016

10h15 - Users allowed to log in to the frontend nodes.

Compute nodes will be brought online during the rest of the day.