Spark 2.2.0

Apache Spark is a general-purpose cluster computing framework that, unlike Hadoop MapReduce, can cache intermediate data in memory, making it well suited to iterative and interactive workloads.

Accessing the software

To load the module:

module load Spark/2.2.0-Hadoop-2.6-Java-1.8.0_152

Example Batch Job

The following example can be copied to a test directory and submitted to the batch system using sbatch.

#!/bin/bash
#SBATCH -n 4          # request 4 tasks (cores)
#SBATCH -t 1:00:00    # wall-clock limit of 1 hour

module purge; module load bluebear
module load Spark/2.2.0-Hadoop-2.6-Java-1.8.0_152

# Download the Monte Carlo pi example from the Apache Spark repository
wget https://raw.githubusercontent.com/apache/spark/master/examples/src/main/python/pi.py

# Create a unique working directory for this job
WORKING=$(mktemp -d XXXX.${SLURM_JOBID})

cp pi.py ${WORKING}
cd ${WORKING}

# Run pi.py with 100 partitions, using 4 executor cores and 5 GB of executor memory
spark-submit --total-executor-cores 4 --executor-memory 5G pi.py 100
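The pi.py example estimates pi by Monte Carlo sampling: it scatters random points in the unit square and counts the fraction that fall inside the quarter circle, a calculation Spark parallelises across partitions. A minimal plain-Python sketch of the same idea, without Spark (the function name, seed, and sample count here are illustrative, not part of pi.py itself):

```python
import random

def estimate_pi(num_samples, seed=0):
    """Estimate pi by sampling points in the unit square and
    counting the fraction that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # (quarter-circle area) / (square area) = pi / 4
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

The argument 100 passed to pi.py in the batch script plays a similar role to the sample count here: it sets how many partitions of random samples Spark distributes across the executor cores.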

Further information can be found at the following URL: https://researchcomputing.princeton.edu/faq/spark-via-slurm

Accessing Previous Versions

Wherever possible, previous versions of this application will be retained for continuity, especially for research projects that require a consistent version of the software throughout the project. Such versions, however, may be unsupported by IT Services or the applications vendor, and may be withdrawn at short or no notice if they can no longer run on the cluster - for example, essential operating system upgrades may be incompatible with old versions.

At present there are no previous versions of this application on the BlueBEAR service.

Known Problems & Limitations

None.

Other Information

The Support Level for this application is An.

Visit the Spark website for more information regarding this application.


Last modified: 20 March 2019