Cluster Flow: A user friendly bioinformatics workflow tool
Supplementary Information
Figure S1: Example Pipeline Script
/* ------FastQ to Bismark Pipeline (RRBS Data) ------This pipeline takes FastQ files as input, runs FastQC, Trim Galore, then processed with Bismark (alignment, deduplication, methylation extractor then tidying up and report generation). It requires a genome index with a corresponding genome folder and bismark converted genome.
Trim Galore is run with the RRBS parameter. */
#fastqc #trim_galore RRBS #bismark_align #bismark_methXtract #bismark_report >bismark_summary_report
Typical pipeline script showing analysis pipeline for reduced representation bisulfite sequencing (RRBS) data, from FastQ files to methylation calls with a project summary report. Pipeline steps will run in parallel for each read group for steps prefixed with a hash symbol (#). All input files will be channelled into the final process, prefixed with a greater-than symbol (>). Table S1: List of supported tools
Program Module Name Description Program Website Name
Custom script - takes a BED file and https://github.com/mel-astar/mel- ngs/blob/master/mel- bedToNrf - computes the NRF (non-redundant chipseq/chipseq- fraction). metrics/CalBedNrf.pl
bedtools_bamToBed bedtools Convert BAM files to BED files and sort. http://bedtools.readthedocs.io
bedtools_intersectN Filter out blacklisted regions from a BAM bedtools http://bedtools.readthedocs.io eg using a BED file.
http://www.bioinformatics.babraha bismark_align Bismark Align Bisulfite data m.ac.uk/projects/bismark
Remove duplicates from aligned Bisulfite http://www.bioinformatics.babraha bismark_deduplicate Bismark data m.ac.uk/projects/bismark
Extract methylation calls from aligned http://www.bioinformatics.babraha bismark_methXtract Bismark Bisulfite data m.ac.uk/projects/bismark
Create a sample report of Bismark http://www.bioinformatics.babraha bismark_report Bismark analysis m.ac.uk/projects/bismark
bismark_summary_rep Create a summary report of a Bismark http://www.bioinformatics.babraha Bismark ort project m.ac.uk/projects/bismark
Run either Bowtie 1 or 2, depending on bowtie Bowtie 1 / 2 read length
bowtie1 Bowite 1 Align reads using Bowtie 1 http://bowtie-bio.sourceforge.net
http://bowtie- bowtie2 Bowtie 2 Align reads ugins Bowtie 2 bio.sourceforge.net/bowtie2
bwa BWA Align reads using BWA http://bio-bwa.sourceforge.net
cf_download Cluster Flow Download data from the web http://clusterflow.io
Merge files using a regex (BAM, gzipped cf_merge_files Cluster Flow http://clusterflow.io or text).
deeptools_bamCovera Creates a bigWig coverage from from a deepTools http://deeptools.readthedocs.io ge BAM file
deeptools_bamFinger Creates a fingerprint plot for ChIP-seq deepTools http://deeptools.readthedocs.io print data
FastQ Runs FastQ Screen to check for http://www.bioinformatics.babraha fastq_screen Screen contaminants m.ac.uk/projects/fastq_screen
http://www.bioinformatics.babraha fastqc FastQC Runs FastQC for basic QC metrics m.ac.uk/projects/fastqc
Subread Counts reads overlapping exons in a http://bioinf.wehi.edu.au/featureCo featureCounts featureCoun GTF file, grouped by gene. unts ts
Mapping and quality control for Hi-C http://www.bioinformatics.babraha hicup HiCUP data. m.ac.uk/projects/hicup Program Module Name Description Program Website Name hisat2 HISAT2 Align RNA-seq reads with HISAT2 https://ccb.jhu.edu/software/hisat2
HTSeq- Counts reads overlapping exons in a http://www- htseq_counts huber.embl.de/users/anders/HTSe count GTF file. q/doc/count.html
Quantify abundances of transcripts from kallisto Kallisto https://pachterlab.github.io/kallisto RNA-Seq data
Aggregates results from bioinformatics multiqc MultiQC analyses across many samples into a http://multiqc.info single report
Phantom phantompeaktools_ru Runs cross correlation analysis on BAM https://github.com/kundajelab/pha Peak Qual nSpp files ntompeakqualtools Tools
http://broadinstitute.github.io/picar picard_dedup Picard Mark duplicates in a BAM files d
Calculate the complexity of a sequencing http://smithlabresearch.org/softwa preseq_calc Preseq library re/preseq rseqc_geneBody_cove Calculate the average coverage across RSeQC http://rseqc.sourceforge.net rage genes rseqc_inner_distanc Calculate the distance between reads for RSeQC http://rseqc.sourceforge.net e an RNA-seq library
QC plots for exon junction annoation and rseqc_junctions RSeQC http://rseqc.sourceforge.net saturation
Calculates a histogram showing the GC rseqc_read_GC RSeQC http://rseqc.sourceforge.net content of reads samtools_bam2sam Samtools Convert a BAM file to SAM http://www.htslib.org
Mark and remove duplicated sequences samtools_dedup Samtools http://www.htslib.org from BAM files samtools_sort_index Samtools Sort and index a BAM file http://www.htslib.org
Extract csqual and csfasta files from .sra sra_abidump SRA-Tools http://ncbi.github.io/sra-tools input sra_fqdump SRA-Tools Extract FastQ files from .sra input http://ncbi.github.io/sra-tools
https://github.com/alexdobin/STA star STAR Align RNA data with STAR R tophat TopHat Align RNA data with TopHat https://ccb.jhu.edu/software/tophat
Trim adapter sequence and low quality http://www.bioinformatics.babraha trim_galore TrimGalore! bases from reads m.ac.uk/projects/trim_galore
List of modules with tool description and URL. Core Cluster Flow modules excluded. List valid at time of writing for Cluster Flow v0.4.