This page has information about developing and running parallel programs on BlueBEAR, together with a selection of links to parallel programming information, tutorials and presentations.
Please report any problems, such as broken links, to the IT Service Desk.
This page has the following sections:
Parallel programming on BlueBEAR
Interaction of the batch system with parallel codes
Related external links
Parallel programming on BlueBEAR
Parallel jobs can either consist of a single program distributed across several cores, with communication between the distributed components, or of the same program running independently on several cores. The former will usually be a program that uses a library such as MPI to provide the distributed computing facilities, whilst the latter will use a facility such as pbsdsh to spawn multiple copies of the same program on different nodes. Both methods are supported on this cluster. Examples of MPI programs can be found on the help pages for the compilers, which are linked from the applications help page.
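As a sketch of the second approach, a Torque job script can use pbsdsh to start one copy of a program on every core the scheduler allocates. The resource values and the program name myprog below are illustrative, not taken from this page:

```shell
#!/bin/bash
# Illustrative Torque job script (resource values are examples only):
# request 2 nodes with 4 cores each, one hour of walltime.
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# pbsdsh starts one copy of the named program on each allocated core.
pbsdsh "$PBS_O_WORKDIR/myprog"
```

The script is then submitted with qsub in the usual way; pbsdsh itself reads the allocation from the batch system, so no host list needs to be given.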
Interaction of the batch system with parallel codes
The number of cores required on a node can be specified with the ppn option when the job is submitted; see the help for the batch system for more details of specifying job parameters. Note that a parallel program which explicitly starts multiple processes or threads can use more cores than this allocation, so you should request from the batch system the same number of cores that the program will actually use.
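For example, the ppn option can be given on the qsub command line when the job is submitted; the script name and resource values here are illustrative:

```shell
# Request one node with 4 cores for the job script jobscript.sh
# (the script name and core count are examples only).
qsub -l nodes=1:ppn=4 jobscript.sh
```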
A program written in most current conventional languages, without explicit multi-processing or multi-threading calls, will use only one core at a time, following a serial algorithm (Java programs, however, are often implicitly multi-threaded). A program designed for multi-core use should ideally detect how many nodes and cores the job scheduler has assigned to it, for example by reading $PBS_NODEFILE, and use only those. Programs that have not been written with such Torque/PBS-aware detection in mind may need the number of cores specified explicitly; this should be described in the code's documentation. Codes that have been linked against the OpenMPI libraries built on this cluster, as described in the compiler help pages linked from the applications help page, can detect the number of available cores with a call to MPI_Comm_size, since those OpenMPI libraries were built with Torque support.
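A minimal sketch of the $PBS_NODEFILE approach: under Torque/PBS the nodefile contains one line per allocated core, so the assigned core count is simply its line count. This snippet assumes it is running inside a job, where the batch system has set $PBS_NODEFILE:

```shell
# $PBS_NODEFILE lists one hostname per allocated core (Torque/PBS
# convention), so counting its lines gives the total core allocation.
NUM_CORES=$(wc -l < "$PBS_NODEFILE")
echo "Assigned ${NUM_CORES} cores"
```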
If a job's software does not detect how many cores it has been allocated but is nevertheless multi-core or multi-threaded, it is the user's responsibility to know this and to request a number of cores corresponding to the job's actual behaviour.
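For software that threads via OpenMP, one common way to keep the thread count in step with the allocation is to set the standard OMP_NUM_THREADS environment variable in the job script from the nodefile; this sketch assumes a single-node job, where the nodefile line count equals the local core count:

```shell
# Match the OpenMP thread count to the allocated cores (single-node job
# assumed: every line of $PBS_NODEFILE refers to this node).
export OMP_NUM_THREADS=$(wc -l < "$PBS_NODEFILE")
```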
There are other considerations when specifying the number of cores for jobs. Jobs on a node share many resources besides cores, for example RAM capacity, memory bandwidth, network bandwidth and local disk access. An application that is single-core but memory-intensive, say, will probably give significantly different run times depending on whether 1, 2, 3 or 4 instances are run on the same node, even if enough cores are allocated to run the job without apparent core contention.
Related external links
Although the system-specific information, such as how to run jobs on the systems at these institutions, will not be directly applicable to BlueBEAR, the tutorials and documentation are generally applicable.
NCSA Cyberinfrastructure tour (includes on-line tutorials, requires free registration)
Indiana State University MPI Tutorial
Maui HPC Centre Parallel Processing Workshop
Lawrence Livermore National Laboratory HPC Training
Guelph University MPI Introduction
Argonne's MPI Page
Last modified: 19 November 2012