Table S1. Short-read programs.

Paired-end Use FASTQ Name Description Gapped Multi-threaded License Website option quality A GPGPU accelerated Burrows-Wheeler Yes (POSIX transform (FM-index) short read alignment http://seqbarracuda. BarraCUDA Yes No Yes Threads and GPL program based on BWA, supports alignment of sourceforge.net/ CUDA) with gap openings and extensions. Uses a short kmers to rapidly index ; no size or scaffold count limit. Higher sensitivity and specificity than Burrows-Wheeler aligners, with similar or greater speed. Performs affine- transform-optimized global alignment, which is http://sourceforge.n BBMap slower but more accurate than Smith-Waterman. Yes Yes Yes Yes BSD et/projects/bbmap/ Handles Illumina, 454, PacBio, Sanger, and Ion Torrent data. Splice-aware; capable of processing long INDELs and RNA-seq. Pure Java; runs on any platform. Used by the Joint Genome Institute. Explicit time and accuracy tradeoff with prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses Yes (POSIX http://bfast.sourcefo BFAST indexes. Can handle billions of short reads. Can GPL Threads) rge.net/ handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith Waterman alignment. Tool to run the Burrows-Wheeler Aligner-BWA on a Hadoop cluster. It supports the BWA-MEM, BWA-ALN, and BWA-SW, working Low quality https://github.com/c BigBWA with paired and single reads. It implies an Yes Yes Yes GPL v3 bases trimming itiususc/BigBWA important reduction in the computational time when running in a Hadoop cluster, adding scalability and fault-tolerance. BLAST's alignment program, slow and not accurate for short reads, and uses a http://blast.ncbi.nlm BLASTN sequence (EST, sanger sequence) rather .nih.gov/ than a reference genome. Free for academic Made by Jim Kent. Can handle one mismatch in Yes http://www.soe.ucsc BLAT and non- initial alignment step. (client/server). .edu/~kent/ commercial use. Uses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for genome. Aligns Yes (POSIX Artistic http://bowtie- Bowtie Yes Yes No more than 25 million Illumina reads in 1 CPU Threads) License bio.sourceforge.net/ hour. Supports Maq-like and SOAP-like alignment policies Free for Uses a hash table and bloom matrix to create and academic filter potential positions on the genome. For and non- higher efficiency uses cross-similarity between commercial https://hive.biochem short reads and avoids realigning non unique HIVE-hexagon Yes Yes Yes Yes users istry.gwu.edu/dna.c redundant sequences. It is faster than bowtie and registered gi?cmd=about BWA and allows INDELs and divergent sensitive to HIVE alignments on viruses and as well as deployment more conservative eukaryotic alignments. instance. Uses a Burrows-Wheeler transform to create an http://bio- Low quality BWA index of the genome. It's a bit slower than bowtie Yes Yes Yes GPL bwa.sourceforge.net bases trimming but allows INDELs in alignment. / A probabilistic short read aligner based on the use of position specific scoring matrices (PSSM). The aligner is adaptable in the sense that it can http://bwa- BWA-PSSM take into account the quality scores of the reads Yes Yes Yes Yes GPL pssm.binf.ku.dk/ and models of data specific biases, such as those observed in Ancient DNA, PAR-CLIP data or with biased nucleotide compositions. Quantify and manage large quantities of short- Free for read sequence data. CASHX pipeline contains a academic set of tools that can be used together or as http://carringtonlab. CASHX No and non- independent modules on their own. This org/resources/cashx commercial is very accurate for perfect hits to a use. reference genome. http://sourceforge.n Yes (Hadoop Artistic et/apps/mediawiki/c Cloudburst Short-read mapping using Hadoop MapReduce MapReduce) License loudburst- bio/index.php http://www.nvidia.c Short-read alignment error correction using Yes (GPU CUDA-EC om/object/ec_on_tes GPUs. enabled) la. A CUDA compatible short read aligner to large Yes (GPU http://cushaw.sourc CUSHAW Yes Yes No GPL genomes based on Burrows-Wheeler transform. enabled) eforge.net/ Gapped short-read and long-read alignment http://cushaw2.sour CUSHAW2 Yes No Yes Yes GPL based on maximal exact match seeds. This ceforge.net/ aligner supports both base-space (e.g. from Illumina, 454, Ion Torrent and PacBio sequencers) and ABI SOLiD color-space read alignments. http://cushaw2.sour CUSHAW2-GPU GPU-accelerated CUSHAW2 short-read aligner. Yes No Yes Yes GPL ceforge.net/ Sensitive and Accurate Base-Space and Color- http://cushaw3.sour CUSHAW3 Space Short-Read Alignment with Hybrid Yes No Yes Yes GPL ceforge.net/ Seeding Read mapping alignment that implements cache obliviousness to minimize main/cache memory transfers like mrFAST and Yes (for http://drfast.sourcef drFAST mrsFAST, however designed for the SOLiD Yes structural Yes No BSD orge.net/ platform (color space reads). It also variation) returns all possible map locations for improved structural variation discovery. Implemented by Illumina. Includes ungapped ELAND alignment with a finite read length. Extended Randomized Numerical alignEr for Multithreading Low quality http://erne.sourcefor ERNE accurate alignment of NGS reads. It can map Yes Yes and MPI- GPL v3 bases trimming ge.net/ bisulfite-treated reads. enabled CeCILL http://www.irisa.fr/s Finds global alignments of short DNA sequences GASSST Multithreading version 2 ymbiose/projects/ga against large DNA banks License. ssst/ Dual (free High-quality alignment engine (exhaustive for non- mapping with substitutions and INDELs). More commercial accurate and several times faster than BWA or http://gemlibrary.so GEM Yes Yes Yes Yes use); GEM Bowtie 1/2. Many standalone biological urceforge.net/ source is applications (mapper, split mapper, mappability, currently and other) provided. unavailable http://www.genalice Ultra-fast and comprehensive NGS read aligner Low quality Genalice MAP Yes Yes Yes Commercial .com/product/genali with high precision and small storage footprint. bases trimming ce-map/ Fast, accurate overlap assembler with the ability to handle any combination of sequencing Geneious http://www.geneiou technology, read length, any pairing orientations, Yes Commercial Assembler sserver.com/ with any spacer size for the pairing, with or without a reference genome. Complete framework with user-friendly GUI to http://www.phenos GensearchNGS analyze NGS data. It integrates a proprietary Yes No Yes Yes Commercial ystems.com/www/i high quality alignment algorithm as well as plug- in capability to integrate various public aligners ndex.php/products/ into a framework allowing to import short reads, gensearchngs align them, detect variants and generate reports. It is geared towards re-sequencing projects, namely in a diagnostic setting. Robust, fast short-read alignment. GMAP: longer reads, with multiple INDELs and splices (see entry above under analysis); GSNAP: Free for shorter reads, with a single or up to two academic http://research- GMAP and splices per read. Useful for digital Yes Yes Yes Yes and non- pub.gene.com/gmap GSNAP expression, SNP and INDEL genotyping. commercial / Developed by Thomas Wu at Genentech. Used use. by the National Center for Genome Resources (NCGR) in Alpheus. Yes (also Accurately performs gapped alignment of supports sequence data obtained from next-generation Illumina *_int.txt Multithreading sequencing machines (specifically that of http://dna.cs.byu.ed GNUMAP and *_prb.txt and MPI- Solexa/Illumina) back to a genome of any size. u/gnumap/ files with all 4 enabled Includes adaptor trimming, SNP calling and quality scores for Bisulfite . each base) iSAAC has been designed to take full advantage of all the computational power available on a https://github.com/s single server node. As a result, iSAAC scales well iSAAC Yes Yes Yes Yes BSD equencing/isaac_ali over a broad range of hardware architectures, gner and alignment performance improves with hardware capabilities LAST uses adaptive seeds and copes more efficiently with repeat-rich sequences (e.g. LAST genomes). For example: it can align reads to Yes Yes Yes No GPL http://last.cbrc.jp/ genomes without repeat-masking, without becoming overwhelmed by repetitive hits. Ungapped alignment that takes into account http://maq.sourcefor MAQ GPL quality scores for each base. ge.net/ Gapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory Yes (for mrFAST and http://mrfast.sourcef transfers. They are designed for the Illumina Yes structural Yes No BSD mrsFAST orge.net/ sequencing platform and they can return all variation) possible map locations for improved structural variation discovery. MOM or maximum oligonucleotide mapping is a http://mom.csbc.vcu MOM query matching tool that captures a maximal Yes .edu/ length match within the short read. Fast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith- http://bioinformatics MOSAIK Waterman algorithm seeded by results from a Yes .bc.edu/marthlab/M kmer hashing scheme. Supports read ranging in osaik size from very short to very long. Fast aligner based on a filtration strategy (no http://www.atgc- MPscan indexing, use q-grams and Backward montpellier.fr/mpsc Nondeterministic DAWG Matching) an/ Single Gapped alignment of single end and paired end threaded Illumina GA I & II, ABI Color space & ION Multi-threading version free Torrent reads.. High sensitivity and specificity, and MPI Novoalign & for http://www.novocra using base qualities at all steps in the alignment. Yes Yes Yes versions NovoalignCS academic ft.com/ Includes adapter trimming, base quality available with and non- calibration, Bi-Seq alignment, and option to paid license. commercial report multiple alignments per read. use. NextGENe® software has been developed specifically for use by biologists performing analysis of next generation sequencing data from http://softgenetics.co NextGENe Roche Genome Sequencer FLX, Illumina Yes Yes Yes Yes Commercial m/NextGENe.html GA/HiSeq, Life Technologies Applied BioSystems’ SOLiD™ System, PacBio and Ion Torrent platforms. Flexible and fast read mapping program (twice as fast as BWA), achieves a mapping sensitivity comparable to Stampy. Internally uses a memory efficient index structure (hash table) to store the positions of all 13-mers present in the reference Yes (POSIX genome. Mapping regions where pairwise https://github.com/ Threads, Open NextGenMap alignments are required are dynamically Yes No Yes Cibiv/NextGenMap/ OpenCL/CUDA, Source determined for each read. Uses fast SIMD wiki SSE) instructions (SSE) to accelerate the alignment calculations on the CPU. If available, alignments are computed on the GPU (using OpenCL/CUDA) resulting in an additional runtime reduction of 20 - 50%. The Omixon Variant Toolkit includes highly http://www.omixon. Omixon sensitive and highly accurate tools for detecting Yes Yes Yes Yes Commercial com/ SNPs and INDELs. It offers a solution to map NGS short reads with a moderate distance (up to 30% sequence divergence) from reference genomes. It poses no restrictions on the size of the reference, which, combined with its high sensitivity, makes the Variant Toolkit well-suited for targeted sequencing projects and diagnostics. PALMapper, efficiently computes both spliced and unspliced alignments at high accuracy. Relying on a ML strategy combined with a fast http://www.fml.tue PALMapper mapping based on a banded Smith-Waterman- Yes GPL bingen.mpg.de/raets like algorithm it aligns around 7 million reads ch/suppl/palmapper per hour on a single CPU. It refines the originally proposed QPALMA approach. Partek® Flow software has been developed specifically for use by biologists and bioinformaticians. It supports un-gapped, gapped and splice-junction alignment from single and paired-end reads from Illumina, Life Multiprocessor/ technologies Solid TM, Roche 454 and Ion Core, Client- Commercial http://www.partek.c Partek Torrent raw data (with or without quality Yes Yes Yes Server , FREE trial om/ information). It integrates powerful quality installation version control on FASTQ/Qual level and on aligned possible data. Additional functionality includes trimming and filtering of raw reads, SNP and INDEL detection, mRNA and microRNA quantification and fusion gene detection. Free for Indexes the genome, then extends seeds using academic pre-computed alignments of words. Works with http://pass.cribi.uni PASS Yes Yes Yes Yes and non- base space as well as color space (SOLID) and pd.it/ commercial can align genomic and spliced RNA-seq reads. use. Indexes the genome with periodic seeds to quickly find alignments with full sensitivity up https://code.google.c PerM to four mismatches. It can map Illumina and Yes GPL om/p/perm/ SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths. Indexes the genome with a kmer lookup table https://www.researc No (multiple with full sensitivity up to an adjustable number hgate.net/publicatio PRIMEX No No Yes processes per of mismatches. It is best for mapping 15-60bp n/233734306_mex- search) sequences to a genome. 0.99.tar Is able to take advantage of quality scores, intron lengths and computation splice site predictions to perform and performs an unbiased alignment. http://www.fml.tue Yes QPalma Can be trained to the specifics of a RNA-seq GPLv2 bingen.mpg.de/raets (client/server) experiment and genome. Useful for splice ch/projects/qpalma site/intron discovery and for gene model building. (See PALMapper for a faster version). No read length limit. Hamming or mapping with configurable error rates. http://www.seqan.d RazerS Configurable and predictable sensitivity LGPL e/projects/razers.ht (runtime/sensitivity tradeoff). Supports paired- ml end read mapping. REAL is an efficient, accurate, and sensitive tool for aligning short reads obtained from next- generation sequencing. The programme can handle an enormous amount of single-end reads http://sco.h- REAL, cREAL generated by the next-generation Illumina/Solexa Yes Yes GPL its.org/exelixis/real/ Genome Analyzer. cREAL is a simple extension of REAL for aligning short reads obtained from next-generation sequencing to a genome with circular structure. Can map reads with or without error probability information (quality scores) and supports paired- http://rulai.cshl.edu/ RMAP end reads or bisulfite-treated read mapping. Yes Yes Yes GPL v3 rmap/ There are no limitations on read length or number of mismatches. Multithreading A randomized Numerical Aligner for Accurate Low quality http://iga- rNA Yes Yes and MPI- GPL v3 alignment of NGS reads bases trimming .sourceforge.net/ enabled Extremely fast, tolerant to high INDEL and substitution counts. Includes full read alignment. Free for Product includes comprehensive pipelines for Yes, for variant individual http://www.realtime RTG Investigator Yes Yes Yes variant detection and metagenomic analysis with calling investigator genomics.com/ any combination of Illumina, Complete use. Genomics and Roche 454 data. Free for http://www.bioinf.u Can handle insertions, deletions and mismatches. non- ni- Segemehl Yes No Yes Yes Uses enhanced suffix arrays. commercial leipzig.de/Software/ use segemehl/ Up to 5 mixed substitutions and Free for http://biogibbs.stanf SeqMap insertions/deletions. Various tuning options and academic ord.edu/~jiangh/Seq input/output formats. and non- Map/ commercial use. http://www.informa Short read error correction with a suffix data Shrec Yes (Java) tik.uni- structure. kiel.de/~jasc/Shrec/ Indexes the reference genome as of version 2. BSD http://compbio.cs.to SHRiMP Uses masks to generate possible keys. Can map Yes Yes Yes Yes (OpenMP) derivative ronto.edu/shrimp ABI SOLiD color space reads. Slider is an application for the Illumina Sequence Analyzer output that uses the "probability" files http://www.bcgsc.ca SLIDER instead of the sequence files as an input for /platform/bioinfo/so alignment to a reference sequence or a set of ftware/slider reference sequences. SOAP: Robust with a small (1-3) number of gaps and mismatches. Speed improvement over BLAT, uses a 12 letter hash table. SOAP2: using Yes (POSIX bidirectional BWT to build the index of reference, Threads), SOAP, SOAP2, and it is much faster than the first version. https://sourceforge. SOAP3- SOAP3, SOAP3- SOAP3 and SOAP3: GPU-accelerated version that could find Yes No GPL net/projects/soap3d dp:Yes dp need GPU SOAP3-dp all 4-mismatch alignments in tens of seconds per p/ with CUDAsupp one million reads. SOAP3-dp, also GPU ort. accelerated, supports arbitrary number of mismatches and gaps according to affine scores. For ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color http://socs.biology.g SOCS Yes GPL errors). Uses an iterative version of the Rabin- atech.edu/ Karp string search algorithm. SparkBWA is a tool that integrates the Burrows- Wheeler Aligner—BWA on an Apache framework running on the top of Hadoop. The Low quality https://github.com/c SparkBWA Yes Yes Yes GPLv3 current version of SparkBWA (v0.1, march 2016) bases trimming itiususc/SparkBWA supports the algorithms BWA-MEM and BWA- ALN. Both work with paired-end reads. Free for academic http://www.sanger.a SSAHA and Fast for a small number of variants. and non- c.uk/Software/analy SSAHA2 commercial sis/SSAHA2/ use. For Illumina reads. High specificity, and Free for http://www.well.ox. Stampy sensitive for reads with INDELs, structural Yes Yes Yes No academic ac.uk/project- variants, or many SNPs. Slow, but speed and non- stampy increased dramatically by using BWA for first commercial alignment pass). use For Illumina or ABI SOLiD reads, with SAM native output. Highly sensitive for reads with many errors, INDELs (full from 0 to 15, http://bioinfo.lifl.fr/y extended support otherwise). Uses spaced seeds Open SToRM No Yes Yes Yes (OpenMP) ass/iedera_solid/stor (single hit) and a very fast SSE/SSE2/AVX2/AVX- source m/ 512 banded alignment filter. For fixed-length reads only, authors recommend SHRiMP2 otherwise. http://subread.sourc Superfast and accurate read aligners. Subread eforge.net/ can be used to map both gDNA-seq and RNA- Subread and http://bioconductor. seq reads. Subjunc detects exon-exon junctions Yes Yes Yes Yes GPL3 Subjunc org/packages/releas and maps RNA-seq reads. They employ a novel e/bioc/html/Rsubrea mapping paradigm called "seed-and-vote". d.html Free for academic http://taipan.sourcef Taipan de-novo Assembler for Illumina reads and non- orge.net/ commercial use. Visual interface both for Bowtie and BWA, as Open http://ugene.unipro. UGENE Yes Yes Yes Yes well as an embedded aligner source, GPL ru/ FPGA-accelerated reference sequence alignment mapping tool from TimeLogic. Faster than Burrows-Wheeler transform-based http://www.timelogi VelociMapper algorithms like BWA and Bowtie. Supports up to Yes Yes Yes Yes Commercial c.com/catalog/799/v 7 mismatches and/or INDELs with no elocimapper performance penalty. Produces sensitive Smith- Waterman gapped alignments. FPGA based sliding window short read aligner which exploits the property of short read alignment. Performance scales linearly with number of transistors on a Free for chip (i.e. performance guaranteed to double with http://www.bcgsc.ca academic each iteration of Moore's Law without /platform/bioinfo/so XpressAlign and non- modification to algorithm). Low power ftware/fpga- commercial consumption is useful for data center equipment. illumina-aligner use. Predictable runtime. Better price/performance than software sliding window aligners on current hardware, but not better than software BWT- based aligners currently. Can cope with large numbers (>2) of mismatches. Will find all hit positions for all seeds. Single-FPGA experimental version, needs work to develop it into a multi- FPGA production version. 100% sensitivity for a reads between 15 - 240bp http://www.bioinfor with practical mismatches. Very fast. Support Yes (GUI) No ZOOM Commercial .com/zoom/general/ insertions and deletions. Works with Illumina & (CLI). overview.html SOLiD instruments, not 454.

Table S2. Variant annotation programs.

Tools Description External resources use Website SnpEff annotates variants based on their genomic locations and predicts ENSEMBL, UCSC and organism based e.g. http://snpeff.sourceforge.net/SnpEff_ SnpEff coding effects. Uses an interval forest approach FlyBase, WormBase and TAIR manual.htm Provides the location of specific variants in individuals. Variants are VEP dbSNP, Ensembl, UCSC and NCBI http://www.ensembl.org/ calculated using sanger-style resequencing data This tool is suitable for pinpointing a small subset of functionally http://www.openbioinformatics.org/a ANNOVAR UCSC, RefSe and Ensembl important variants. Uses mutation prediction approach for annotation nnovar/ SVM-based method using sequence information retrieved by BLAST PhD-SNP UniRef90 http://snps.biofold.org/phd-snp/ algorithm. Suitable for predicting damaging effects of missense mutations. Uses http://genetics.bwh.harvard.edu/pph PolyPhen-2 sequence conservation, structure to model position of UniPort 2/ substitution, and SWISS-PROT annotation Ensembl, 1000 Genomes Project, ExAC, Suitable for predicting damaging effects of all intragenic mutations MutationTaster UniProt, ClinVar, phyloP, phastCons, http://www.mutationtaster.org/ (DNA and level), including INDELs. nnsplice, polyadq (...) An SVM-trained predictor of the damaging effects of missense UniProt, PDB,Phyre2 for predicted mutations. Uses sequence conservation, structure and network http://www.sbg.bio.ic.ac.uk/suspect/i SuSPect structures, DOMINE and STRING for (interactome) information to model phenotypic effect of amino acid ndex.html interactome substitution. Accepts VCF file PolyPhen, SIFT, SNPeffect, SNPs3D, LS- SNP, ESEfinder, RescueESE, ESRSearch, Computationally predicts functional SNPs for disease association F-SNP PESX, Ensembl, TFSearch, Consite, http://compbio.cs.queensu.ca/F-SNP/ studies. GoldenPath, Ensembl, KinasePhos, OGPET, Sulfinator, GoldenPath dbSNP, UCSC, GATK refGene, GAD, Design to Identify novel and SNP/SNV, INDEL and SV/CNV. AnnTools published lists of common structural AnnTools searches for overlaps with regulatory elements, disease/trait associated http://anntools.sourceforge.net/ genomic variation, Database of Genomic loci, known segmental duplications and artifact prone regions Variants, lists of conserved TFBs, miRNA Analyses the potential functional significance of SNPs derived from dbSNP, EntrezGene, UCSC Browser, SNPit - genome wide association studies HGMD, ECR Browser, Haplotter, SIFT Uses physical and functional based annotation to categorize according to http://www.scandb.org/newinterface/ SCAN their position relative to and according to linkage disequilibrium - about.html (LD) patterns and effects on expression levels A neural network-based method for the prediction of the functional Ensembl, UCSC, Uniprot, UniProt, , http://www.rostlab.org/services/SNA SNAP effects of non-synonymous SNPs DAS-CBS, MINT, BIND, KEGG, TreeFam P SVM-based method using sequence information, SNPs&GO UniRef90, GO, PANTHER, PDB http://snps.biofold.org/snps-and-go/ annotation and when available . Maps nsSNPs onto protein sequences, functional pathways and LS-SNP UniProtKB, Genome Browser, dbSNP, PD http://www.salilab.org/LS-SNP comparative protein structure models TREAT is a tool for facile navigation and mining of the variants from http://ndc.mayo.edu/mayo/research/b TREAT - both targeted resequencing and whole exome sequencing iostat/stand-alone-packages.cfm Suitable for non-specific or support non-model organism data. https://code.google.com/p/snpdat/do SNPdat SNPdat does not require the creation of any local relational or - wnloads/ pre-processing of any mandatory input files Annotate SNPs comparing the reference amino acid and the non- http://stothard.afns.ualberta.ca/down NGS – SNP Ensembl, NCBI and UniProt reference amino acid to each orthologue loads/NGS-SNP/ NCBI RefSeq, Ensembl, variation databases, SVA Predicted biological function to variants identified UCSC, HGNC, GO, KEGG, HapMap, 1000 http://www.svaproject.org/ Genomes Project and DG VARIANT increases the information scope outside the coding regions by including all the available information on regulation, DNA structure, dbSNP,1000 genomes, disease-related VARIANT http://variant.bioinfo.cipf.es/ conservation, evolutionary pressures, etc. Regulatory variants constitute variants from GWAS,OMIM, COSMIC a recognized, but still unexplored, cause of pathologies SIFT is a program that predicts whether an amino acid substitution SIFT affects protein function. SIFT uses sequence to predict PROT/TrEMBL, or NCBI's http://blocks.fhcrc.org/sift/SIFT.html whether an amino acid substitution will affect protein function NCBI dbSNP, Ensembl, TFSearch, A web server that allows users to efficiently identify and prioritize high- PolyPhen, ESEfinder, RescueESE, FAS-ESS, FAST-SNP risk SNPs according to their phenotypic risks and putative functional http://fastsnp.ibms.sinica.edu.tw/ SwissProt, UCSC Golden Path, NCBI Blast effects and HapMap PANTHER relate protein sequence to the evolution of specific protein functions and biological roles. The source of protein sequences STKE, KEGG, MetaCyc, FREX and PANTHER http://www.pantherdb.org/ used to build the trees and used a computer-assisted Reactome manual curation step to better define the protein family clusters Meta-SNP SVM-based meta predictor including 4 different methods. PhD-SNP, PANTHER, SIFT, SNAP http://snps.biofold.org/meta-snp