BlueBEAR Regeneration overview

BlueBEAR Regeneration

The last major upgrade of BlueBEAR was more than 4 years ago, and that is a long time in technology terms. The worker nodes are now out of warranty and are likely to become less reliable, as well as more expensive to maintain, over the coming months.

None of this is unexpected. The team has been working hard to prepare for the introduction of the new generation of hardware, to make sure applications are ready for the new architecture, and to ensure that the minor user-affecting changes are documented and publicised. A deliberate choice was made to select the same hardware for the new BlueBEAR cluster as is going into service for BEAR Cloud. This not only gives us a standard specification, with the flexibility and familiarity that brings, but also reduces the management overhead. The hardware (Lenovo warm-water-cooled NeXtScale) is cost-effective both in terms of price/performance and in terms of compute power/energy consumption.

A new NeXtScale-based version of BlueBEAR is currently in pilot, with a small number of researchers already running batch jobs successfully. The new cluster has a number of other notable features, including more cores and memory per node, larger ‘large’ memory nodes and a new automated workload manager, along with some tweaks to the way the resource is shared. The underlying principles remain the same, but we expect to see benefits, particularly in the throughput of the high number of short-running or single-core jobs, because the limitation that only a single user can have access to a node has been removed. The updated Job Submission page details how to submit jobs.

In parallel with the upgrade to the processing capacity of BlueBEAR, we are also planning the replacement of the closely coupled storage (dedicated to active processing on the cluster). The new storage has a number of upgraded or additional features, all designed to boost efficiency and throughput. For further technical detail, visit the BlueBEAR Regeneration technical web pages.

As plans progress over the next few months, we will complete development and testing and install more NeXtScale nodes to build up the pool before we start withdrawing the 4-5 year old nodes from service. Users and research groups will be migrated individually from the old nodes to the new ones to ensure a clean cutover, minimise disruption, and allow any issues to be picked up and resolved immediately. This will be a progressive process over a number of weeks. In the run-up to this, we will ensure that any changes in the way the system works for users are documented. We are going to considerable lengths to limit these changes and to make the new environment feel familiar.

Unless users have built up large data holdings on BlueBEAR, no action is required until they are contacted. Those with significant data should read the December 2016 news article, BlueBEAR Storage Policy and Changes to Use, for more information and should start preparing for their data migration.


Last modified: 26th June 2017