Cluster Flow: A user friendly workflow tool

Supplementary Information

Figure S1: Example Pipeline Script

/* ------FastQ to Bismark Pipeline (RRBS Data) ------This pipeline takes FastQ files as input, runs FastQC, Trim Galore, then processed with Bismark (alignment, deduplication, methylation extractor then tidying up and report generation). It requires a genome index with a corresponding genome folder and bismark converted genome.

Trim Galore is run with the RRBS parameter. */

#fastqc #trim_galore RRBS #bismark_align #bismark_methXtract #bismark_report >bismark_summary_report

Typical pipeline script showing analysis pipeline for reduced representation bisulfite (RRBS) data, from FastQ files to methylation calls with a project summary report. Pipeline steps will run in parallel for each read group for steps prefixed with a hash symbol (#). All input files will be channelled into the final process, prefixed with a greater-than symbol (>). Table S1: List of supported tools

Program Module Name Description Program Website Name

Custom script - takes a BED file and https://github.com/mel-astar/mel- ngs/blob/master/mel- bedToNrf - computes the NRF (non-redundant chipseq/chipseq- fraction). metrics/CalBedNrf.pl

bedtools_bamToBed bedtools Convert BAM files to BED files and sort. http://bedtools.readthedocs.io

bedtools_intersectN Filter out blacklisted regions from a BAM bedtools http://bedtools.readthedocs.io eg using a BED file.

http://www.bioinformatics.babraha bismark_align Bismark Align Bisulfite data m.ac.uk/projects/bismark

Remove duplicates from aligned Bisulfite http://www.bioinformatics.babraha bismark_deduplicate Bismark data m.ac.uk/projects/bismark

Extract methylation calls from aligned http://www.bioinformatics.babraha bismark_methXtract Bismark Bisulfite data m.ac.uk/projects/bismark

Create a sample report of Bismark http://www.bioinformatics.babraha bismark_report Bismark analysis m.ac.uk/projects/bismark

bismark_summary_rep Create a summary report of a Bismark http://www.bioinformatics.babraha Bismark ort project m.ac.uk/projects/bismark

Run either Bowtie 1 or 2, depending on bowtie Bowtie 1 / 2 read length

bowtie1 Bowite 1 Align reads using Bowtie 1 http://bowtie-bio.sourceforge.net

http://bowtie- bowtie2 Bowtie 2 Align reads ugins Bowtie 2 bio.sourceforge.net/bowtie2

bwa BWA Align reads using BWA http://bio-bwa.sourceforge.net

cf_download Cluster Flow Download data from the web http://clusterflow.io

Merge files using a regex (BAM, gzipped cf_merge_files Cluster Flow http://clusterflow.io or text).

deeptools_bamCovera Creates a bigWig coverage from from a deepTools http://deeptools.readthedocs.io ge BAM file

deeptools_bamFinger Creates a fingerprint plot for ChIP-seq deepTools http://deeptools.readthedocs.io print data

FastQ Runs FastQ Screen to check for http://www.bioinformatics.babraha fastq_screen Screen contaminants m.ac.uk/projects/fastq_screen

http://www.bioinformatics.babraha fastqc FastQC Runs FastQC for basic QC metrics m.ac.uk/projects/fastqc

Subread Counts reads overlapping exons in a http://bioinf.wehi.edu.au/featureCo featureCounts featureCoun GTF file, grouped by gene. unts ts

Mapping and quality control for Hi-C http://www.bioinformatics.babraha hicup HiCUP data. m.ac.uk/projects/hicup Program Module Name Description Program Website Name hisat2 HISAT2 Align RNA-seq reads with HISAT2 https://ccb.jhu.edu/software/hisat2

HTSeq- Counts reads overlapping exons in a http://www- htseq_counts huber.embl.de/users/anders/HTSe count GTF file. q/doc/count.html

Quantify abundances of transcripts from kallisto Kallisto https://pachterlab.github.io/kallisto RNA-Seq data

Aggregates results from bioinformatics multiqc MultiQC analyses across many samples into a http://multiqc.info single report

Phantom phantompeaktools_ru Runs cross correlation analysis on BAM https://github.com/kundajelab/pha Peak Qual nSpp files ntompeakqualtools Tools

http://broadinstitute.github.io/picar picard_dedup Picard Mark duplicates in a BAM files d

Calculate the complexity of a sequencing http://smithlabresearch.org/softwa preseq_calc Preseq library re/preseq rseqc_geneBody_cove Calculate the average coverage across RSeQC http://rseqc.sourceforge.net rage genes rseqc_inner_distanc Calculate the distance between reads for RSeQC http://rseqc.sourceforge.net e an RNA-seq library

QC plots for exon junction annoation and rseqc_junctions RSeQC http://rseqc.sourceforge.net saturation

Calculates a histogram showing the GC rseqc_read_GC RSeQC http://rseqc.sourceforge.net content of reads samtools_bam2sam Samtools Convert a BAM file to SAM http://www.htslib.org

Mark and remove duplicated sequences samtools_dedup Samtools http://www.htslib.org from BAM files samtools_sort_index Samtools Sort and index a BAM file http://www.htslib.org

Extract csqual and csfasta files from .sra sra_abidump SRA-Tools http://ncbi.github.io/sra-tools input sra_fqdump SRA-Tools Extract FastQ files from .sra input http://ncbi.github.io/sra-tools

https://github.com/alexdobin/STA star STAR Align RNA data with STAR R tophat TopHat Align RNA data with TopHat https://ccb.jhu.edu/software/tophat

Trim adapter sequence and low quality http://www.bioinformatics.babraha trim_galore TrimGalore! bases from reads m.ac.uk/projects/trim_galore

List of modules with tool description and URL. Core Cluster Flow modules excluded. List valid at time of writing for Cluster Flow v0.4.