Structural Variation Detection from Next Generation Sequencing

Journal of Next Generation Sequencing & Applications
Ye, et al., Next Generat Sequenc & Applic 2016, S1:007
ISSN: 2469-9853
DOI: 10.4172/2469-9853.S1-007
Review Article, Open Access

Kai Ye1*, George Hall2,3 and Zemin Ning2*

1 McDonnell Genome Institute, Washington University School of Medicine, Forest Park Avenue, Saint Louis, Missouri, USA
2 The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
3 Department of Computer Science, University of York, Heslington, York, YO10 5GH, UK

*Corresponding authors:
Kai Ye, McDonnell Genome Institute, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, Missouri 63108, USA, Tel: 1-314-813-0879; E-mail: [email protected]
Zemin Ning, The Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK, Tel: 44-1223494705; E-mail: [email protected]

Rec date: Nov 26, 2015; Acc date: Feb 10, 2016; Pub date: Feb 15, 2016

Copyright: © 2016 Ye K, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Structural variations (SVs) are genetic variations in chromosome structure that arise from different types of rearrangements. They comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to genetic diversity and disease susceptibility. In the genomics community, substantial efforts have been devoted to improving understanding of the roles of SVs in genome functions relating to diseases, and researchers are working actively to develop effective algorithms to reliably identify various types of SVs such as deletions, insertions, duplications and inversions.
Structural variant detection using next-generation sequencing (NGS) data is difficult, and identifying large and complex structural variations is extremely challenging. In this short review, we mainly discuss algorithms and computational tools for identifying SVs of different types and sizes, with a brief introduction to complex SVs. At the end, we highlight the impact and potential applications of third-generation sequencing data generated on the PacBio and Oxford Nanopore long-read sequencing platforms.

Keywords: Structural variations; Bioinformatics; Algorithms; Next generation sequencing; Single molecule sequencing

Introduction

The complete catalogue of genetic variants contains substitutions, small insertions and deletions (indels) and large complex structural variants. The most abundant genomic variations are substitutions of single (single nucleotide variant, SNV or SNP) or multiple consecutive nucleic bases (multiple nucleotide variant or MNV). Because they occur at a higher density than other types of variants, SNPs are often used in genome-wide association studies (GWAS) to mark genome fragments related to diseases or certain traits. In addition to substitutions, insertions or deletions of bases in the DNA are another form of genomic variation between individuals. In low-complexity regions, especially homopolymer runs or microsatellite tracts, substantially higher mutation rates are observed.

Structural variation (SV) was originally defined as variation other than substitutions and small indels. Currently, SV includes large insertions, duplications, deletions, inversions and translocations. With substantial advances in sequencing technologies and analysis strategies, the definition of SV has been widened to include variants as small as 50 bp in length [1-3]. While substitutions and short indels can be seen directly in short-read data, SVs are largely detected from indirect evidence, namely the disturbance of read mapping around the variation.

There are extensive studies on structural variations reported in the literature, in terms of both method development and data collection [1-4]. A number of review papers have also been published on general methods [5,6] and on the analysis of the human [7-9], mouse [10] and, more recently, pig [11] genomes. However, the detection algorithms and archived datasets are predominantly for Illumina paired-end sequencing [4]. With recent advances in single molecule sequencing on long-read platforms such as PacBio and Oxford Nanopore, substantial impact and changes can be expected in the genomics community. In this review, our focus is on the current algorithms used to identify SVs, a type of variant that is difficult to detect but has huge medical and health implications. At the end, we briefly discuss the characteristics of long-read data and how they might affect SV detection.

Methods of Structural Variation Detection

Over the past years, sequencing technology has fuelled genomic studies, and many computational tools have been developed to deepen our understanding of structural variation and its role in genome functions. Rapid reductions in NGS costs have also made genome-wide SV detection possible, even at population scale [1-3]. Detection of structural variation is an active research area, and different methods are being explored for different applications. It is widely agreed that the methods can be classified into four categories: read-pair (RP), split-read (SR), read-depth (RD) and assembly. Figure 1 shows the four general methods one by one; in each, SVs are identified after reads are aligned against a given reference sequence. Table 1 lists structural variation tools published in the literature. A complete list cannot be compiled because new tools come out every year. It should be noted that some recent tools combine more than one algorithm in order to increase specificity and sensitivity. Our aim is to provide an overall introduction to the methods generally used in structural variation detection, and no effort is made to compare software tools on metrics of performance, specificity or sensitivity. Below, we briefly introduce these four algorithms and discuss the strengths and weaknesses associated with each method.

Next Generat Sequenc & Applic Sequencing Technologies ISSN:2469-9853 JNGSA, an open access
Citation: Ye K, Hall G, Ning Z (2016) Structural Variation Detection from Next Generation Sequencing. Next Generat Sequenc & Applic S1: 007. doi:10.4172/2469-9853.S1-007

| Program Name | Mutation Signals | SV Type | Data Input | Citations* | Authors and reference | URL |
|---|---|---|---|---|---|---|
| Breakdancer | RP | INS;DEL;INV;TRANS | WGS BAM | 602 | Chen, et al. [12] | http://breakdancer.sourceforge.net/ |
| BreaKmer | Assembly | INS;DEL;INV;TAN;TRANS | WGS BAM | 16 | Abo, et al. [36] | https://github.com/a-bioinformatician/BreaKmer |
| Breakpointer | SR | INS;DEL | WGS BAM | 11 | Sun, et al. [27] | http://www.bioinfo.org/wiki/index.php/Breakpointer |
| BreakSeek | SR | INS;DEL | WGS SAM | 1 | Zhao, et al. [11] | http://sourceforge.net/projects/breakseek/ |
| cn.MOPS | RD | CNV | WGS BAM | 91 | Klambauer, et al. [31] | http://www.bioinf.jku.at/software/cnmops/ |
| CNVnator | RD | CNV | WGS BAM | 306 | Abyzov, et al. [30] | https://github.com/abyzovlab/CNVnator |
| CREST | SR | INS;DEL;INV;TRANS | WGS BAM | 210 | Wang, et al. [26] | http://www.stjuderesearch.org/site/lab/zhang |
| DELLY | RP+SR | DEL;INV;TAN;TRANS | WGS BAM | 165 | Rausch, et al. [14] | https://github.com/tobiasrausch/delly |
| GASVPro | RP+RD | DEL;INV | WGS BAM | 66 | Sindi, et al. [15] | http://compbio.cs.brown.edu/projects/gasv/ |
| GenomeSTRiP | RD | DEL;CNV | WGS BAM | 25 | Handsaker, et al. [32] | http://www.broadinstitute.org/software/genomestrip/ |
| HYDRA | Assembly | INS;DEL;INV;TAN | WGS BAM | 158 | Quinlan, et al. [37] | https://github.com/arq5x/Hydra |
| inGAP-sv | RD | INS;DEL;INV;TRANS | WGS SAM | 49 | Qi, et al. [33] | http://ingap.sourceforge.net/ |
| LUMPY | RP+SR | INS;DEL;INV;TAN;CNV;TRANS | WGS BAM | 52 | Layer, et al. [16] | https://github.com/arq5x/lumpy-sv |
| MultiBreak-SV | SR | INS;DEL;INV;TRANS | WGS BLASR | 7 | Ritz, et al. [29] | https://github.com/raphael-group/multibreak-sv |
| NovelSeq | Assembly | INS | WGS FASTA/FASTQ | 81 | Hajirasouliha, et al. [39] | http://novelseq.sourceforge.net/Home |
| Pindel | SR | INS;DEL;INV;TAN;TRANS | WGS BAM | 602 | Ye, et al. [23] | https://github.com/genome/pindel |
| PRISM | RP+SR | INS;DEL;INV;TAN | WGS SAM | 50 | Jiang, et al. [19] | http://compbio.cs.toronto.edu/prism/ |
| RDXplorer | RD | CNV | WGS BAM | 329 | Yoon, et al. [34] | http://rdxplorer.sourceforge.net/ |
| ReadDepth | RD | CNV | WGS bed | 81 | Miller, et al. [35] | https://github.com/chrisamiller/readDepth |
| SOAPindel | Assembly | INS;DEL | WGS SOAP/SAM | 47 | Li, et al. [38] | http://soap.genomics.org.cn/soapindel.html |
| SoftSearch | RP+SR | INS;DEL;INV;TRANS | WGS BAM | 17 | Hart, et al. [17] | https://code.google.com/p/softsearch/ |
| SplitRead | SR | INS;DEL | RNA-seq FASTQ | 71 | Karakoc, et al. [28] | http://splitread.sourceforge.net/ |
| SVdetect | RP or RD | INS;DEL;INV;TAN;CNV | WGS BAM | 104 | Zeitouni, et al. [18] | http://svdetect.sourceforge.net/Site/Home.html |
| SVseq2 | SR | INS;DEL | WGS BAM | 24 | Zhang, et al. [24] | http://www.engr.uconn.edu/~jiz08001/svseq.html |
| TIGRA-SV | Assembly | INS;DEL;INV;TRANS | WGS BAM | 30 | Chen, et al. [12] | http://bioinformatics.mdanderson.org/main/ |

Table 1: List of computational tools for structural variation detection.

in a pair are often incorrectly mapped to remote regions of the genome.
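To make the read-pair (RP) signal concrete, the following is a minimal sketch of how a single aligned read pair might be labelled as concordant or discordant. It is not taken from any tool in Table 1. The `ReadPair` type, the `classify_pair` function, the `k = 3` standard-deviation cutoff and the signal labels are all assumptions chosen for illustration, and the orientation flag assumes an Illumina-style paired-end library in which mates are expected to face each other.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record of one aligned read pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    proper_orientation: bool  # True if mates face each other as expected


def classify_pair(pair: ReadPair, mu: float, sigma: float, k: float = 3.0) -> str:
    """Label a pair by the kind of SV signal it would suggest.

    mu and sigma are the mean and standard deviation of the library's
    mapped distance between mates; k is an assumed cutoff.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation-like"
    if not pair.proper_orientation:
        return "inversion-like"
    span = abs(pair.pos2 - pair.pos1)
    if span > mu + k * sigma:
        # Mates map further apart on the reference than the library allows.
        return "deletion-like"
    if span < mu - k * sigma:
        # Mates map closer together on the reference than expected.
        return "insertion-like"
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, "chr1", 1400, True),
    ReadPair("chr1", 1000, "chr1", 6000, True),
    ReadPair("chr1", 1000, "chr1", 1200, True),
    ReadPair("chr1", 1000, "chr1", 1400, False),
    ReadPair("chr1", 1000, "chr5", 1400, True),
]
labels = [classify_pair(p, mu=400.0, sigma=30.0) for p in pairs]
```

Because pairs are often mismapped to remote regions, a single discordant pair is weak evidence on its own. A practical caller would be expected to require several independent pairs supporting the same breakpoints before reporting an SV.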
