Table S1. Short-Read Sequence Alignment Programs

Columns: Name; Description; Paired-end option; Use FASTQ quality; Gapped; Multi-threaded; License; Website. Fields with no value in the table are omitted.

BarraCUDA
  Description: A GPGPU-accelerated Burrows-Wheeler transform (FM-index) short-read alignment program based on BWA; supports alignment of INDELs with gap openings and extensions.
  Paired-end option: Yes
  Use FASTQ quality: No
  Gapped: Yes
  Multi-threaded: Yes (POSIX Threads and CUDA)
  License: GPL
  Website: http://seqbarracuda.sourceforge.net/

BBMap
  Description: Uses short k-mers to rapidly index the genome; no size or scaffold count limit. Higher sensitivity and specificity than Burrows-Wheeler aligners, with similar or greater speed. Performs affine-transform-optimized global alignment, which is slower but more accurate than Smith-Waterman. Handles Illumina, 454, PacBio, Sanger, and Ion Torrent data. Splice-aware; capable of processing long INDELs and RNA-seq. Pure Java; runs on any platform. Used by the Joint Genome Institute.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: Yes
  License: BSD
  Website: http://sourceforge.net/projects/bbmap/

BFAST
  Description: Explicit time and accuracy tradeoff with prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color-space reads). Performs a full Smith-Waterman alignment.
  Multi-threaded: Yes (POSIX Threads)
  License: GPL
  Website: http://bfast.sourceforge.net/

BigBWA
  Description: Tool to run the Burrows-Wheeler Aligner (BWA) on a Hadoop cluster. It supports the algorithms BWA-MEM, BWA-ALN, and BWA-SW, working with paired and single reads. Running on a Hadoop cluster substantially reduces computation time and adds scalability and fault tolerance.
  Paired-end option: Yes
  Use FASTQ quality: Low quality bases trimming
  Gapped: Yes
  Multi-threaded: Yes
  License: GPL v3
  Website: https://github.com/citiususc/BigBWA

BLASTN
  Description: BLAST's nucleotide alignment program; slow and not accurate for short reads, and uses a sequence database (EST, Sanger sequence) rather than a reference genome.
  Website: http://blast.ncbi.nlm.nih.gov/

BLAT
  Description: Made by Jim Kent. Can handle one mismatch in the initial alignment step.
  Multi-threaded: Yes (client/server)
  License: Free for academic and non-commercial use.
  Website: http://www.soe.ucsc.edu/~kent/

Bowtie
  Description: Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for the human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policies.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: No
  Multi-threaded: Yes (POSIX Threads)
  License: Artistic License
  Website: http://bowtie-bio.sourceforge.net/

HIVE-hexagon
  Description: Uses a hash table and bloom matrix to create and filter potential positions on the genome. For higher efficiency, uses cross-similarity between short reads and avoids realigning non-unique redundant sequences. It is faster than Bowtie and BWA and allows INDELs and divergent sensitive alignments on viruses and bacteria as well as more conservative eukaryotic alignments.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: Yes
  License: Free for academic and non-commercial users registered to a HIVE deployment instance.
  Website: https://hive.biochemistry.gwu.edu/dna.cgi?cmd=about

BWA
  Description: Uses a Burrows-Wheeler transform to create an index of the genome. It is a bit slower than Bowtie but allows INDELs in alignment.
  Paired-end option: Yes
  Use FASTQ quality: Low quality bases trimming
  Gapped: Yes
  Multi-threaded: Yes
  License: GPL
  Website: http://bio-bwa.sourceforge.net/

BWA-PSSM
  Description: A probabilistic short-read aligner based on the use of position-specific scoring matrices (PSSM). The aligner is adaptable in the sense that it can take into account the quality scores of the reads and models of data-specific biases, such as those observed in ancient DNA, PAR-CLIP data, or genomes with biased nucleotide compositions.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: Yes
  License: GPL
  Website: http://bwa-pssm.binf.ku.dk/

CASHX
  Description: Quantify and manage large quantities of short-read sequence data. The CASHX pipeline contains a set of tools that can be used together or as independent modules. The algorithm is very accurate for perfect hits to a reference genome.
  Flag (column not recoverable): No
  License: Free for academic and non-commercial use.
  Website: http://carringtonlab.org/resources/cashx

Cloudburst
  Description: Short-read mapping using Hadoop MapReduce.
  Multi-threaded: Yes (Hadoop MapReduce)
  License: Artistic License
  Website: http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.php

CUDA-EC
  Description: Short-read alignment error correction using GPUs.
  Multi-threaded: Yes (GPU enabled)
  Website: http://www.nvidia.com/object/ec_on_tesla.html

CUSHAW
  Description: A CUDA-compatible short-read aligner to large genomes based on the Burrows-Wheeler transform.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: No
  Multi-threaded: Yes (GPU enabled)
  License: GPL
  Website: http://cushaw.sourceforge.net/

CUSHAW2
  Description: Gapped short-read and long-read alignment based on maximal exact match seeds. This aligner supports both base-space (e.g. from Illumina, 454, Ion Torrent and PacBio sequencers) and ABI SOLiD color-space read alignments.
  Paired-end option: Yes
  Use FASTQ quality: No
  Gapped: Yes
  Multi-threaded: Yes
  License: GPL
  Website: http://cushaw2.sourceforge.net/

CUSHAW2-GPU
  Description: GPU-accelerated CUSHAW2 short-read aligner.
  Paired-end option: Yes
  Use FASTQ quality: No
  Gapped: Yes
  Multi-threaded: Yes
  License: GPL
  Website: http://cushaw2.sourceforge.net/

CUSHAW3
  Description: Sensitive and accurate base-space and color-space short-read alignment with hybrid seeding.
  Paired-end option: Yes
  Use FASTQ quality: No
  Gapped: Yes
  Multi-threaded: Yes
  License: GPL
  Website: http://cushaw3.sourceforge.net/

drFAST
  Description: Read-mapping alignment software that implements cache obliviousness to minimize main/cache memory transfers, like mrFAST and mrsFAST, but designed for the SOLiD sequencing platform (color-space reads). It also returns all possible map locations for improved structural variation discovery.
  Paired-end option: Yes
  Use FASTQ quality: Yes (for structural variation)
  Gapped: Yes
  Multi-threaded: No
  License: BSD
  Website: http://drfast.sourceforge.net/

ELAND
  Description: Implemented by Illumina. Includes ungapped alignment with a finite read length.

ERNE
  Description: Extended Randomized Numerical alignEr for accurate alignment of NGS reads. It can map bisulfite-treated reads.
  Paired-end option: Yes
  Use FASTQ quality: Low quality bases trimming
  Gapped: Yes
  Multi-threaded: Multithreading and MPI-enabled
  License: GPL v3
  Website: http://erne.sourceforge.net/

GASSST
  Description: Finds global alignments of short DNA sequences against large DNA banks.
  Multi-threaded: Multithreading
  License: CeCILL version 2 License.
  Website: http://www.irisa.fr/symbiose/projects/gassst/

GEM
  Description: High-quality alignment engine (exhaustive mapping with substitutions and INDELs). More accurate and several times faster than BWA or Bowtie 1/2. Many standalone biological applications (mapper, split mapper, mappability, and others) are provided.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: Yes
  License: Dual (free for non-commercial use); GEM source is currently unavailable
  Website: http://gemlibrary.sourceforge.net/

Genalice MAP
  Description: Ultra-fast and comprehensive NGS read aligner with high precision and small storage footprint.
  Paired-end option: Yes
  Use FASTQ quality: Low quality bases trimming
  Gapped: Yes
  Multi-threaded: Yes
  License: Commercial
  Website: http://www.genalice.com/product/genalice-map/

Geneious Assembler
  Description: Fast, accurate overlap assembler with the ability to handle any combination of sequencing technology, read length, and pairing orientation, with any spacer size for the pairing, with or without a reference genome.
  Flag (column not recoverable): Yes
  License: Commercial
  Website: http://www.geneiousserver.com/

GensearchNGS
  Description: Complete framework with a user-friendly GUI to analyze NGS data. It integrates a proprietary high-quality alignment algorithm as well as plug-in capability to integrate various public aligners into a framework that imports short reads, aligns them, detects variants, and generates reports. It is geared towards re-sequencing projects, namely in a diagnostic setting.
  Paired-end option: Yes
  Use FASTQ quality: No
  Gapped: Yes
  Multi-threaded: Yes
  License: Commercial
  Website: http://www.phenosystems.com/www/index.php/products/gensearchngs

GMAP and GSNAP
  Description: Robust, fast short-read alignment. GMAP: longer reads, with multiple INDELs and splices (see entry above under Genomics analysis); GSNAP: shorter reads, with a single INDEL or up to two splices per read. Useful for digital gene expression, SNP and INDEL genotyping. Developed by Thomas Wu at Genentech. Used by the National Center for Genome Resources (NCGR) in Alpheus.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: Yes
  License: Free for academic and non-commercial use.
  Website: http://research-pub.gene.com/gmap/

GNUMAP
  Description: Accurately performs gapped alignment of sequence data obtained from next-generation sequencing machines (specifically Solexa/Illumina) back to a genome of any size. Includes adaptor trimming, SNP calling, and bisulfite sequence analysis.
  Use FASTQ quality: Yes (also supports Illumina *_int.txt and *_prb.txt files with all 4 quality scores for each base)
  Multi-threaded: Multithreading and MPI-enabled
  Website: http://dna.cs.byu.edu/gnumap/

iSAAC
  Description: iSAAC has been designed to take full advantage of all the computational power available on a single server node. As a result, iSAAC scales well over a broad range of hardware architectures, and alignment performance improves with hardware capabilities.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: Yes
  License: BSD
  Website: https://github.com/sequencing/isaac_aligner

LAST
  Description: LAST uses adaptive seeds and copes more efficiently with repeat-rich sequences (e.g. genomes). For example, it can align reads to genomes without repeat-masking, without becoming overwhelmed by repetitive hits.
  Paired-end option: Yes
  Use FASTQ quality: Yes
  Gapped: Yes
  Multi-threaded: No
  License: GPL
  Website: http://last.cbrc.jp/

MAQ
  Description: Ungapped alignment that takes into account quality scores for each base.
  License: GPL
  Website: http://maq.sourceforge.net/

mrFAST and mrsFAST
  Description: Gapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory transfers. They are designed for the Illumina sequencing platform and can return all possible map locations for improved structural variation discovery.
  Paired-end option: Yes
  Use FASTQ quality: Yes (for structural variation)
  Gapped: Yes
  Multi-threaded: No
  License: BSD
  Website: http://mrfast.sourceforge.net/

MOM
  Description: MOM, or maximum oligonucleotide mapping, is a query matching tool that captures a maximal-length match within the short read.
  Flag (column not recoverable): Yes
  Website: http://mom.csbc.vcu.edu/

MOSAIK
  Description: Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long.
  Flag (column not recoverable): Yes
  Website: http://bioinformatics.bc.edu/marthlab/Mosaik

MPscan
  Description: Fast aligner based on a filtration strategy (no indexing; uses q-grams and Backward Nondeterministic DAWG Matching).
  Website: http://www.atgc-montpellier.fr/mpscan/
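Several entries above (BarraCUDA, Bowtie, BWA, CUSHAW) are described as indexing the genome with a Burrows-Wheeler transform. As a rough illustration of that idea only, the sketch below builds the transform of a short string and counts exact matches of a read by FM-index backward search. It is a teaching sketch, not the implementation used by any of the listed programs: the function names `bwt` and `count_occurrences` are hypothetical, the rotation sort is quadratic in memory, and the occurrence counts are computed by naive scanning, whereas production aligners use suffix arrays and sampled rank tables.

```python
from collections import Counter


def bwt(text):
    """Return the Burrows-Wheeler transform of text ('$' sentinel appended)."""
    s = text + "$"  # '$' sorts before the letters A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern via FM-index backward search."""
    # first_row[c]: number of characters in the text strictly smaller than c,
    # i.e. the first row of the sorted rotations that starts with c.
    counts = Counter(bwt_str)
    first_row, total = {}, 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a naive scan for clarity.
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # half-open range of matching rows
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACATTA")
    print(count_occurrences(index, "TTA"))
```

Because the search walks the read from its last base to its first and touches only the transformed string, the cost of a lookup depends on the read length rather than on the genome length, which is one reason this family of indexes suits short reads.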

