BlueBEAR has multiple login nodes, which are allocated in round-robin fashion from the general login address bluebear.bham.ac.uk to ensure that the workload is shared between them. Login nodes are meant for tasks such as submitting batch jobs, checking job status, editing files, and performing other simple tasks.
They must not be used for work requiring significant CPU resources or more than one core; this includes analysis jobs run from a graphical user interface (GUI), such as those provided by some engineering or Computational Fluid Dynamics (CFD) codes. Jobs requiring significant CPU, or multiple cores, should be run in the batch system.
If several users were to run such jobs on the login nodes, the nodes would respond very poorly or crash, preventing all BlueBEAR users from logging in to the cluster. Those wishing to run GUI applications should do so in an interactive job.
Any work that is consuming significant CPU, or causing problems such as poor responsiveness of the login nodes for other users, is subject to the following procedure:
- after a process has used 15 minutes of CPU time: the process is set to run at a lower priority
- after a process has used 30 minutes of CPU time: the process is suspended (stopped), but can be resumed
- at this stage the process can be resumed in case any tidying-up is required; to do this, use the command kill -CONT pid, where pid is the process ID given in the email that you will have received. It is the numerical value at the beginning of the line describing the process, for example 12345. The following steps are taken if the process is resumed:
- after a process has used 40 minutes of CPU time (that is, an additional 10 minutes after the previous stage): the process is signalled to terminate, which should end it gracefully, honouring any clean-up routine that is part of the application. Not all applications honour this signal, so the following final step is taken if the process is still active
- after a process has used 50 minutes of CPU time: if still running, the process is forcibly killed (kill -KILL).
Should this happen, the user will be emailed at each of the above steps and invited to discuss their work with IT Services to see how their needs can be met on this service.
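As a rough illustration of the thresholds above, the following sketch lists your own processes with their accumulated CPU time and the stage each would have reached. The function names (cputime_to_seconds, cpu_stage, report_my_processes) are hypothetical and not part of any BlueBEAR tooling, and the sketch assumes a Linux ps that prints TIME as [DD-]HH:MM:SS. Enforcement is carried out by the service itself, so treat this only as a self-check.

```shell
#!/bin/sh
# Illustrative sketch only: these helpers are hypothetical, not BlueBEAR tools.

# Convert a ps TIME value of the form [DD-]HH:MM:SS into seconds.
cputime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else         print $1 * 3600 + $2 * 60 + $3
    }'
}

# Map accumulated CPU seconds to the stage described in the list above:
# 15 min = 900 s, 30 min = 1800 s, 40 min = 2400 s, 50 min = 3000 s.
cpu_stage() {
    secs=$1
    if   [ "$secs" -ge 3000 ]; then echo "killed"
    elif [ "$secs" -ge 2400 ]; then echo "terminate-signalled"
    elif [ "$secs" -ge 1800 ]; then echo "suspended"
    elif [ "$secs" -ge 900  ]; then echo "low-priority"
    else                            echo "ok"
    fi
}

# Print PID, CPU time, stage and command name for each of your processes.
report_my_processes() {
    ps -u "$(id -un)" -o pid= -o time= -o comm= |
    while read -r pid t comm; do
        stage=$(cpu_stage "$(cputime_to_seconds "$t")")
        printf '%s\t%s\t%s\t%s\n' "$pid" "$t" "$stage" "$comm"
    done
}

# Usage: report_my_processes
```

Calling report_my_processes prints one line per process. A process that has already been suspended at the 30-minute stage can then be resumed with kill -CONT pid, as described above.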
Please be aware that if you lose your connection whilst working interactively on a node, your session (and potentially your work) will also be lost. To mitigate this, we recommend working inside a tmux or screen session, which remains active after you disconnect from the node. This only works for non-graphical applications.
To create either a tmux or screen session please issue one of the following commands:
$ tmux new-session -s session1
$ screen -S session1
To re-attach to an existing session (for example, following a forced disconnect) issue one of the following:
$ tmux attach -t session1
$ screen -r session1
Note that you can name the sessions as you please – session1 is just an example.
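If you would rather not remember whether a session already exists, the create and re-attach steps can be combined. The sketch below wraps tmux new-session with its -A flag, which attaches to the named session if it exists and creates it otherwise. The helper name attach_or_create is hypothetical, and session1 is simply the example name used above.

```shell
#!/bin/sh
# Illustrative sketch only: attach_or_create is a hypothetical helper name.

# Attach to the named tmux session, creating it first if it does not exist.
# With no argument, the example name session1 is used.
attach_or_create() {
    name=${1:-session1}
    # -A makes new-session behave like attach-session when the session exists.
    tmux new-session -A -s "$name"
}

# Usage: attach_or_create session1
```

If you have forgotten a session name, tmux ls and screen -ls list the sessions that currently exist.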
Further information on both of these applications can be found online. In general we recommend tmux, as it has a richer feature set; more detailed usage information is available in the tmux manual page (man tmux).
Last modified: 19 March 2019