Next Generation Sequencing: Technology and Applications
Leonardo A. Meza-Zepeda, Ph.D.
Genomics Core Facility, Helse Sør-Øst/University of Oslo, The Norwegian Radium Hospital
oslo.genomics.no

Topics
• Introduction
• Technologies
• Applications

Human Genome Project
• 3 billion bases
• Cost approx. 3 billion dollars
• Public HGP and Celera Genomics, 1990-2003

Development of Sequencing Technologies
[Figure: from the Human Genome Project to massively parallel sequencing] (Stratton MR et al, Nature 2009)

Development of Sequencing Technologies in the Last 10 Years
(ER Mardis, Nature 470, 198-203 (2011))

Cost per Megabase

Costs for a Human Genome
• Capillary sequencing, Applied Biosystems 3730xl (2004): US$ 15,000,000
• Next generation sequencing, HiSeq 2500 (today): US$ 6,000

Sequencing Costs per Genome
[Figure: cost per human genome (US$ 10k to 100M) and number of individual human genomes sequenced (100 to 1M), 2007-2012; milestones: Venter, Watson, African, Asian, cancer pair, 169 in GenBank]

Sequencing Technologies
• Solexa (Illumina): sequencing by synthesis
• 454 (Roche): pyrosequencing
• SOLiD (Life Technologies): sequencing by ligation
• Non-optical: Ion Torrent/Ion Proton
• Single-molecule sequencing: Helicos, Pacific Biosciences, Oxford Nanopore

Common Attributes of Commercial Sequencers
• Random fragmentation of DNA
• Ligation of adapters, creating a library (genome/transcriptome)
• Library amplification on a solid surface
• Direct sequencing for single-molecule platforms
• Direct, step-by-step detection of nucleotide incorporation
• Shorter read length than traditional sequencers
• Digital read type, enabling direct quantitative comparisons
• Possible to read both ends of a DNA fragment: single-read or paired-end reads

Single and Paired-End Libraries
• Single-read library (up to 100 bp), e.g. ChIP-Seq, miRNA-Seq
• Short-insert paired-end library (200-500 bp), with read 1 and read 2 taken from the two ends of the fragment, e.g. RNA-Seq, genomic sequencing
• Long-insert mate-pair library (2-10 kb), e.g. genomic sequencing
Paired-end reads offer advantages for sequencing larger and complex genomes, and they allow more accurate positioning (mapping) of the reads than single reads.

Paired-End Sequencing and Alignment

Mate-Pair Sequencing and Alignment
Combination of short-read and mate-pair sequence reads for de novo sequencing.

Barcoding Libraries
Libraries are prepared separately, pooled and sequenced together; both the insert and the index are sequenced.

Sequencing Technologies

DNA Fragmentation: Adaptive Focused Acoustics (COVARIS)
• Acoustic energy wave that converges and focuses on a small, localized area
• Shearing of DNA, RNA, chromatin and more
• Random fragmentation
• Plate and single-sample formats

Library Construction

Nextera Library Construction, Illumina

Targeted Amplification
• HaloPlex (Agilent)
• TruSeq Custom Amplicon (Illumina)
• AmpliSeq (Ion Torrent): 12- to 3072-plex
• Single-molecule molecular inversion probes (O'Roak BJ, et al Science 2012; Hiatt JB, et al Genome Res 2013)

Library Amplification: Emulsion PCR
Roche/454 and Ion Torrent/Proton (Metzker, Nature Reviews Genetics 2010)

Roche 454: Pyrosequencing
Problems with homopolymers (Metzker, Nature Reviews Genetics 2010)

Roche/454 Technology
Instrument              Run time (h)  Read length (bp)        Yield (Mb/run)  Error type  Error rate (%)
GS FLX Titanium XL+     23            up to 1,000 (mean 700)  700             Indel       1
GS FLX Titanium XLR70   10            up to 600 (mean 450)    450             Indel       1
GS Junior               10            400                     35              Indel       1
• Mate-pair reads with 3 kb, 8 kb and 20 kb inserts
• Cost per run makes sequencing an entire human genome prohibitive
• Great platform for targeted validation
• Good for de novo sequencing in combination with short reads

SOLiD, Life Technologies
Beads are placed on the surface of the flow cell (Tucker, Am J Hum Gen 2009; Metzker, Nature Reviews Genetics 2010).

SOLiD Technology
Instrument  Run time (days)  Read length (bp)  Yield (Gb/run)  Error type  Error rate (%)
5500 W      2-8              1x50              80              A-T bias    0.01
                             1x75              120
                             2x50              160
5500xl W    2-8              1x50              160             A-T bias    0.01
                             1x75              240
                             2x50              320
• 6-lane flow chip with independent lanes
• Very high accuracy data due to two-base encoding
• Conversion of color space to base space
• Paired-end chemistry enabled
• Wildfire chemistry being implemented (replaces ePCR)

Ion Torrent/Proton, Life Technologies
• Library construction and emulsion PCR
• Semiconductor chip
• Problems with homopolymers
• Different chip sizes
• Ion Proton aims to sequence one human genome per day for US$ 1,000

Illumina Sample Preparation
1. Library preparation: fragment DNA; repair ends and add A overhang; ligate adapters; select ligated DNA
2. Automated cluster generation (1-8 samples): hybridize to flow cell; extend hybridized oligos; perform bridge amplification
3. Sequencing (1-16 samples): sequence the forward strand; regenerate the reverse strand; sequence the reverse strand

Illumina Library Preparation

Nextera Library Construction, Illumina
• Low DNA input: 50 ng
• Fast library prep: 90 minutes
• Automation friendly
• Larger insert size
• GC bias (insertion site)

Illumina Cluster Generation
Library preparation → cluster growth (amplification) → cluster density: sequencing library (8 pmol) on a single-molecule array, up to 3 billion clusters (scale bar: 100 μm).

Sequencing-by-Synthesis
• Add the four fluorescently labelled nucleotides (Fl-NTPs, A, T, C, G) and polymerase
• The incorporated Fl-NTP is imaged
• The terminator and fluorescent dye are cleaved from the Fl-NTP
• 3 billion clusters x 2x100 bp = 600 gigabases per run, e.g. 100 exomes at 30x, or 5 human genomes at 30x coverage

Illumina Instruments
2010: HiSeq 2000
• Two flow cells per run, 100 Gbp per flow cell, or one human genome
• New scanning mechanics: scans both surfaces of the flow cell lanes
2011: HiSeq 2000
• Improved chemistry: increased yield and accuracy
• Approx. 600 Gb, 5-6 human genomes
2011: MiSeq Personal Sequencer
• One flow cell per run
• 2x150 bp, approx. 4-5 Gb
• Fast sequencing, 4-27 h per run
2012: MiSeq v2 chemistry
• Scans both surfaces of the flow cell, doubling the capacity
• 2x250 bp, approx. 8-10 Gb
2013: HiSeq 2500
• One flow cell per run
• Rapid mode: 27 h sequencing run
• One human genome per flow cell
2013: MiSeq v3 chemistry
• Scans both surfaces of the flow cell, doubling the capacity
• 2x300 bp, approx. 15 Gb

Illumina
Instrument              Run time  Read length (bp)  Yield (Gb/run)  Error type    Error rate (%)
HiSeq 2500 high output  2 days    1x36              108             Substitution  0.1
                        5 days    2x50              300
                        11 days   2x100             600
HiSeq 2500 rapid mode   7 h       1x36              22              Substitution  0.1
                        27 h      2x100             120
                        40 h      2x150             360
MiSeq                   4 h       1x50              1.3             Substitution  0.1
                        24 h      2x150             7.5
                        65 h      2x300             15
New development: ordered array of clusters.

Moleculo: Long-Read Sequencing
(Voskoboynik A, et al eLife 2013)

Single-Molecule Sequencing
• Helicos
• Pacific Biosciences
• Oxford Nanopore

Helicos
[Figure: example strands, top: CTAGTC, bottom: CAGCTA] (Metzker, Nature Reviews Genetics 2010)

Pacific Biosciences
• Single Molecule Real Time (SMRT) sequencing technology
• No amplification
• Single-pass read accuracy 85%
• 150,000 zero-mode waveguides per SMRT cell
• Approx. 3,000-5,000 bp sequence length
(Metzker, Nature Reviews Genetics 2010)

Nanopore Technology
A nanopore is, essentially, a nanoscale hole.
• Biological: pore-forming protein in a membrane (lipid bilayer)
• Solid-state: formed in synthetic materials (silicon nitride)
• Hybrid: pore-forming protein set in a synthetic material
To come.

Data Analysis
Raw signal (fluorescent signal, pH change or conductivity) → number → base call → alignment or de novo assembly → biological interpretation. Requires a large IT infrastructure.

NGS Applications
• Genomes: re-sequencing or de novo
• Point mutation/indel/structural variation discovery
• Protein:DNA binding
  • Chromatin IP/histone binding
  • Nucleosome/transcription factor binding, etc.
• ncRNA discovery/sequencing/variants
• Transcriptome sequencing (RNA-seq)
• Genome-wide DNA methylation (Methyl-seq)
• Clinical sequencing for therapeutic decisions

de novo Genome Sequencing

Resequencing
(Meyerson et al, Nature Reviews Genetics 2010)

ICGC: International Network of Cancer Genome Projects
• Description of genomic, transcriptomic and epigenomic changes in 50 different tumour types and/or subtypes of clinical and societal importance across the globe
• Data available to the entire research community
• Patient-matched control samples (500 of each)
• ~US$ 25 million per project
• Osteosarcomas (Myklebost/Meza-Zepeda)
• Wellcome Trust Sanger Institute (Michael Stratton)
• Similar US initiative: The Cancer Genome Atlas (TCGA)
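The paired-end slides state that reading both ends of a fragment improves mapping, and the applications list includes structural variation discovery. The following is a minimal Python sketch, under stated assumptions, of how an aligned pair could be checked against the insert-size range of a short-insert library (200-500 bp). The `ReadPair` type, its coordinate convention (read 1 on the forward strand, read 2 on the reverse strand, 0-based half-open positions) and the function names are hypothetical, chosen for illustration rather than taken from any particular aligner.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical aligned pair: read 1 maps to the forward strand, read 2 to the reverse strand."""
    chrom1: str
    start1: int  # 0-based leftmost reference position of read 1
    chrom2: str
    end2: int    # 0-based, exclusive rightmost reference position of read 2


def insert_size(pair: ReadPair) -> int:
    """Length of the reference span between the outer ends of the two reads."""
    return pair.end2 - pair.start1


def classify_pair(pair: ReadPair, min_insert: int = 200, max_insert: int = 500) -> str:
    """Label a pair concordant or discordant relative to the library's expected insert range."""
    if pair.chrom1 != pair.chrom2:
        # Mates on different chromosomes can indicate a translocation.
        return "discordant: inter-chromosomal"
    if min_insert <= insert_size(pair) <= max_insert:
        return "concordant"
    # A span longer than expected can indicate a deletion in the sample relative
    # to the reference; a shorter one can indicate an insertion.
    return "discordant: insert size"


pairs = [
    ReadPair("chr1", 1_000, "chr1", 1_350),
    ReadPair("chr1", 1_000, "chr1", 6_000),
    ReadPair("chr1", 1_000, "chr5", 2_000),
]
for p in pairs:
    print(p.chrom1, p.start1, insert_size(p), classify_pair(p))
```

A mate-pair library (2-10 kb) would need a different range, and real aligners also check read orientation, which this sketch ignores.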
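The barcoding slide describes libraries that are prepared separately, pooled, and sequenced together with their index. A minimal sketch of the reverse step, assigning pooled reads back to samples by index sequence, is shown below in Python. The sample names, the 6-bp barcodes and the one-mismatch tolerance are hypothetical; the rule of accepting a read only when exactly one barcode is within tolerance is one reasonable choice, not a description of any vendor's software.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the sample whose barcode matches the index read, or 'undetermined'.

    A read is assigned only if exactly one barcode lies within max_mismatches,
    so a sequencing error in the index cannot silently send it to the wrong sample.
    """
    hits = [sample for sample, bc in barcodes.items()
            if hamming(index_read, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads: list[tuple[str, str]], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Split (insert_sequence, index_read) tuples from a pooled run into per-sample bins."""
    bins: dict[str, list[str]] = defaultdict(list)
    for insert, index in reads:
        bins[assign_sample(index, barcodes, max_mismatches)].append(insert)
    return dict(bins)


barcodes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}  # hypothetical 6-bp indexes
pooled = [
    ("TTGACC", "ACGTAC"),  # exact match to sample_A
    ("GGCATA", "ACGTAA"),  # one mismatch, still sample_A
    ("CCATGA", "TGCATG"),  # exact match to sample_B
    ("AATTCC", "GGGGGG"),  # matches neither index
]
print(demultiplex(pooled, barcodes))
# {'sample_A': ['TTGACC', 'GGCATA'], 'sample_B': ['CCATGA'], 'undetermined': ['AATTCC']}
```

Barcodes that differ at many positions (here all six) leave room for a mismatch tolerance without ambiguity.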
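The Roche 454 and Ion Torrent slides both mention problems with homopolymers, and the 454 table lists indels as the main error type. A minimal, idealized Python sketch of why: nucleotides are flowed one at a time, so a run of identical bases is read as a single signal whose height must be converted to a length. The flow order "TACG" and the noise-free signal model are assumptions for illustration; real instruments apply their own flow orders and signal corrections.

```python
from itertools import cycle


def flowgram(seq: str, flow_order: str = "TACG") -> list[int]:
    """Ideal, noise-free signal for each nucleotide flow over a template read.

    In each flow the polymerase extends through every consecutive base that
    matches the flowed nucleotide, so a homopolymer of length n gives one
    signal of height n rather than n separate signals.
    """
    if not set(seq) <= set(flow_order):
        raise ValueError("sequence contains bases that are never flowed")
    signals = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos == len(seq):
            break
        run = 0
        while pos < len(seq) and seq[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def call_bases(signals: list[float], flow_order: str = "TACG") -> str:
    """Naive base calling: round each flow's signal to the nearest homopolymer length."""
    return "".join(nuc * max(0, round(s)) for nuc, s in zip(cycle(flow_order), signals))


ideal = flowgram("TTAGGGC")
print(ideal)              # [2, 1, 0, 3, 0, 0, 1]
print(call_bases(ideal))  # TTAGGGC
# A slightly too-bright signal in the G flow turns GGG into GGGG: an insertion error.
print(call_bases([2.0, 1.0, 0.0, 3.6, 0.0, 0.0, 1.0]))  # TTAGGGGC
```

The longer the homopolymer, the smaller the relative difference between a signal of n and n+1, which is why indels rather than substitutions dominate on these platforms.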
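The SOLiD slides credit its accuracy to two-base encoding and mention the conversion of color space to base space. A minimal Python sketch of that conversion, assuming the standard mapping in which each dinucleotide's color equals the XOR of 2-bit base codes (A=0, C=1, G=2, T=3) and assuming the first base is known (in practice it comes from the adapter sequence). The function names are illustrative only.

```python
# Two-bit code for each base. In SOLiD two-base encoding, the color of a
# dinucleotide equals the XOR of the codes of its two bases.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = {code: base for base, code in CODE.items()}


def encode(seq: str) -> list[int]:
    """Base space to color space: one color (0-3) per pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Color space to base space, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


print(encode("ATGGC"))            # [3, 1, 0, 3]
print(decode("A", [3, 1, 0, 3]))  # ATGGC
# A true SNP (G to C at the third base) changes two adjacent colors consistently.
print(encode("ATCGC"))            # [3, 2, 3, 3]
# A single color-call error corrupts every downstream base when decoded naively.
print(decode("A", [3, 2, 0, 3]))  # ATCCG
```

Because a real variant alters two adjacent colors while a measurement error alters one, color-space reads are commonly compared with the reference before being converted, which lets isolated color errors be recognised instead of propagated.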
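The sequencing-by-synthesis slide multiplies clusters by read length to get 600 gigabases per run, and the instrument timeline translates this into 5-6 human genomes. A minimal Python sketch of that arithmetic, assuming a 3.0 Gb genome (the "3 billion bases" from the Human Genome Project slide) and a target of 30x mean coverage. It ignores duplicate, unmapped and unevenly distributed reads, which in practice lower the usable number of genomes.

```python
import math


def run_yield_gb(clusters: float, reads_per_cluster: int, read_length_bp: int) -> float:
    """Total bases per run in gigabases: clusters x reads per cluster x read length."""
    return clusters * reads_per_cluster * read_length_bp / 1e9


def genomes_per_run(yield_gb: float, genome_size_gb: float, coverage: float) -> int:
    """Whole genomes that fit in one run at the requested mean coverage (no losses)."""
    return math.floor(yield_gb / (genome_size_gb * coverage))


y = run_yield_gb(3e9, 2, 100)        # 2x100 bp on ~3 billion clusters
print(y)                             # 600.0
print(genomes_per_run(y, 3.0, 30))   # 6
```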
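The platform tables quote per-base error rates of 1% (Roche/454), 0.1% (Illumina) and 0.01% (SOLiD). A minimal Python sketch, assuming independent and uniformly distributed errors, that converts these rates to the Phred scale (Q = -10 log10 P) and to the expected number of erroneous bases per read. The read lengths used here are taken from the tables; the uniform-error assumption is a simplification, since real error rates vary along a read.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_rate)


def expected_errors(read_length_bp: int, error_rate: float) -> float:
    """Mean number of erroneous bases per read at a uniform per-base error rate."""
    return read_length_bp * error_rate


# Per-base error rates from the platform tables, as fractions rather than percentages.
for platform, rate, length in [("Roche/454", 0.01, 700),
                               ("Illumina", 0.001, 100),
                               ("SOLiD", 0.0001, 75)]:
    print(platform, round(phred_quality(rate)), expected_errors(length, rate))
```

On this scale 1%, 0.1% and 0.01% correspond to Q20, Q30 and Q40.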