BlueBEAR Storage Policy and Changes to Storage Use

The release and steady uptake of the BEAR Research Data Store (RDS) service, which offers working storage to research groups, means it makes sense to reappraise the use of the limited, closely coupled storage on BlueBEAR.
As you will know, the bulk of the storage on BlueBEAR is not backed up and was always intended for transitory use associated with processing. As requirements expand along with the size of the data sets to be processed, it is becoming increasingly important that the space on BlueBEAR is reserved for this purpose. With the advent of the RDS, IT Services now provides a practical means for researchers to work in this way. The RDS offers a store for working data which is connected to BlueBEAR at high speed via InfiniBand, enabling the efficient transfer of data for processing and the transfer back of results. What’s more, by default, data in the RDS is both mirrored to our secondary Data Centre and backed up for resilience and disaster recovery purposes.
The Research Computing Team is aware that a body of data has built up on BlueBEAR, much of it not involved in immediate or recent processing. Early in the New Year, and as part of a wider re-registration process for the BlueBEAR service, we will be contacting data owners in order to rationalize data holdings by:
  • Identifying and deleting any obsolete data
  • Relocating current data that does not relate to any immediate processing
  • Archiving data that is now static but supports published papers
  • Potentially saving to tape large data sets that need to be retained but do not relate to published research and are not required for processing in the near term
The University has funded a default provision of up to 3 TB in the RDS for each research project, on application by the PI. Similarly, 1 TB of archive storage per project is being funded in our Research Data Archive (RDA), specifically for the data supporting published research papers. Requirements in excess of the defaults can be catered for, but there is an expectation that research grants will fund those requirements.
We ask you now to evaluate the data you have on BlueBEAR, to clear out obsolete data, and to identify data for archiving or backing off to tape. Please also assess your storage requirements. By preparing in this way you can help make the transition for your project efficient and minimize any impact on your work.
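As a starting point for that evaluation, the following is a minimal sketch (not a tool provided by the service) that lists files which have not been modified for some time, largest first. The function name `find_stale_files`, the 180-day threshold, and any directory you pass to it are illustrative assumptions; modification time is only a rough proxy for whether data is still needed.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days=180, now=None):
    """Return (path, size_bytes, age_days) for files under root that have
    not been modified within max_age_days, sorted largest first.

    The threshold is an arbitrary illustration, not a policy value.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # do not follow symbolic links
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                age_days = (now - info.st_mtime) / 86400
                stale.append((str(path), info.st_size, age_days))
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

Calling `find_stale_files("/path/to/your/data")` with a directory of your own gives a list you can review by hand before deciding what to delete, relocate, or archive; the sketch itself never deletes anything.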

Note that next year we will provide mechanisms to automate the data transfer process where possible as part of ‘pre’ and ‘post’ processing routines (though we do not have this capability today).
Data can be read directly from the RDS by jobs running on BlueBEAR worker nodes; however, the RDS should not be used for intermediate data generated or processed by HPC applications. Whilst this will work, the overhead of the two-site replication may have a significant impact on your job's throughput. Users are advised to generate intermediate data on the BlueBEAR file system. Final output data can be written directly to the RDS; however, you may find that it is more efficient to copy the data on completion of your job using the mechanisms mentioned above.
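The pattern described above (intermediate data on the BlueBEAR file system, final output copied to the RDS at the end of the job) might be sketched as follows. This is an illustration only, not the automated 'pre' and 'post' processing mechanism, which is not yet available. The function `run_with_local_scratch`, its parameters, and whatever paths you pass in are hypothetical; the actual mount points for BlueBEAR storage and the RDS are not given here.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_local_scratch(process, scratch_parent, rds_output_dir):
    """Run a job with intermediate data kept off the RDS.

    process: callable taking a scratch directory (Path) and returning
             the paths of the final output files it produced there.
    scratch_parent: a directory on the BlueBEAR file system (assumed path).
    rds_output_dir: destination directory on the RDS (assumed path).
    """
    rds_output_dir = Path(rds_output_dir)
    rds_output_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # The scratch directory, with all intermediate files, is removed on exit.
    with tempfile.TemporaryDirectory(dir=scratch_parent) as scratch:
        final_files = process(Path(scratch))
        for src in final_files:
            dest = rds_output_dir / Path(src).name
            shutil.copy2(src, dest)  # copy file contents and timestamps
            copied.append(dest)
    return copied
```

Only the files returned by `process` reach the RDS, so the two-site replication is exercised once per final file rather than on every intermediate write.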