samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.
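The memory figure above can be turned into a rough estimate when sizing a job. The following is a minimal sketch that assumes the quoted rate of about 20MB per 1M read pairs holds linearly; the function name estimate_samblaster_mem_mb is hypothetical and is not part of samblaster.

```shell
# Rough memory estimate for duplicate marking, assuming the quoted rate of
# about 20 MB per 1 million read pairs. The function name is hypothetical.
estimate_samblaster_mem_mb() {
    # Integer arithmetic, rounded up to the next whole MB.
    echo $(( ($1 * 20 + 999999) / 1000000 ))
}

# Example: estimate for 50 million read pairs (result is in MB).
estimate_samblaster_mem_mb 50000000
```

Treat the result as a lower bound for the samblaster step only; other programs in the same pipeline need their own memory on top of this.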

(A read-id grouped SAM file is one in which all alignments for a read-id (QNAME) are grouped together in adjacent lines. Aligners naturally produce such files. They can also be created by sorting a SAM file by read-id. But as shown below, sorting the input to samblaster by read-id is not required if the alignments are already grouped.)
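To check whether an existing SAM file is already read-id grouped, one option is a small awk scan such as the sketch below. It is not part of samblaster; the function name check_readid_grouped is hypothetical, and the approach assumes the file is plain-text, tab-separated SAM. It keeps every read-id in memory, so it is best suited to spot checks on modest files.

```shell
# Succeeds (exit status 0) if every QNAME forms one contiguous block of lines.
# Hypothetical helper; usage: check_readid_grouped file.sam
check_readid_grouped() {
    awk -F'\t' '
        /^@/ { next }                  # skip SAM header lines
        $1 != prev {                   # a new block of alignments starts here
            if ($1 in seen) {          # this read-id already had a block earlier
                print "not grouped: " $1
                bad = 1
                exit                   # jump to END
            }
            seen[$1] = 1
            prev = $1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

If the check fails, the file can be sorted by read name first, for example with samtools sort -n, before being passed to samblaster.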

Accessing the Software

To load the module:

$ module load samblaster/0.1.24-foss-2016.10

An example command to include in your job script:

bwa mem <idxbase> samp.r1.fq samp.r2.fq | samblaster | samtools view -Sb - > samp.out.bam

This takes input alignments directly from bwa mem, marks duplicates with samblaster, and pipes the output to samtools view, which compresses the SAM stream to BAM.
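The same pipeline can also write the optional discordant and split-read outputs mentioned in the introduction. The sketch below wraps it in a shell function so that nothing runs until the function is called. The function name, argument order, and output file names are assumptions made for illustration; the -d and -s options are taken from the upstream samblaster documentation and should be confirmed against the help output of the installed version.

```shell
# Hypothetical wrapper; defining it runs nothing until it is called.
# Arguments: <idxbase> <reads_1.fq> <reads_2.fq> <output prefix>
mark_dups_with_extras() {
    idx=$1; r1=$2; r2=$3; out=$4
    # -d: file for discordant read pairs; -s: file for split read mappings
    # (option names as given in the samblaster documentation; verify locally).
    bwa mem "$idx" "$r1" "$r2" \
        | samblaster -d "$out.disc.sam" -s "$out.split.sam" \
        | samtools view -Sb - > "$out.bam"
}
```

In a job script this could be called after loading the module, for example as mark_dups_with_extras <idxbase> samp.r1.fq samp.r2.fq samp.out, where <idxbase> is the bwa index prefix as in the command above.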

Accessing Previous Versions

Wherever possible, previous versions of this application will be retained for continuity, especially for research projects that require a consistent version of the software throughout the project.

At present there are no previous versions of this application on the BlueBEAR service.

Other Information

Visit the GitHub repository for more information regarding this application.

