Next Generation Sequencing technology and applications
Leonardo A. Meza-Zepeda, Ph.D.
Genomics Core Facility, Helse Sør-Øst/University of Oslo
oslo.genomics.no
Topics
• Introduction
• Technologies
• Applications
Human Genome Project: 3 billion bases
Cost approx. US$3 billion
Public HGP and Celera Genomics, 1990-2003
Development of Sequencing Technologies
Massively parallel sequencing
Human Genome Project; Stratton MR et al., Nature 2009
Development of Sequencing Technologies in the Last 10 Years
ER Mardis. Nature 470, 198-203 (2011)
Cost per Megabase
Costs for a Human Genome
Capillary sequencing: Applied Biosystems 3730xl (2004), US$15,000,000
Next Generation Sequencing: HiSeq 2500 (today), US$6,000
The Norwegian Radium Hospital
Sequencing Costs per Genome
Figure: sequencing cost per human genome (left axis, $10k to $100M) and number of sequenced human genomes (right axis, 100 to 1M), 2007-2012. Annotated milestones: Venter, Watson, African, Asian, cancer pair, 169 in GenBank, 1,000 individual genomes.
Sequencing Technologies
• Solexa (Illumina): sequencing by synthesis
• 454 (Roche): pyrosequencing
• SOLiD (Life Technologies): sequencing by ligation
• Non-optical: Ion Torrent/Ion Proton
• Single molecule sequencing: Helicos, Pacific Biosciences, Oxford Nanopore
Common attributes of commercial sequencers
• Random fragmentation of DNA
• Ligation of adapters, creating a library (genome/transcriptome)
• Library amplification on a solid surface
• Direct sequencing for single-molecule platforms
• Direct, step-by-step detection of nucleotide incorporation
• Shorter read length than traditional sequencers
• Digital read type, enabling direct quantitative comparisons
• Possible to read both ends of a DNA fragment
• Single-read or paired-end reads
Single and Paired-End Libraries
• Single-read library (up to 100 bp), e.g. ChIP-Seq, miRNA-Seq: Read 1 only
• Short-insert paired-end library (200-500 bp), e.g. RNA-Seq, genomic sequencing: Read 1 and Read 2
• Long-insert mate-pair library (2-10 kb), e.g. genomic sequencing: Read 1 and Read 2
Paired-end reads offer advantages for sequencing larger and complex genomes. They facilitate more accurate positioning (mapping) of the reads than single reads do.
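Paired reads improve mapping because the expected distance between the two reads is roughly known. The following minimal Python sketch estimates the insert (fragment) size once both reads are aligned. It assumes a hypothetical `Alignment` record with 0-based coordinates and the forward/reverse (FR) orientation of a short-insert paired-end library. Mate-pair libraries can have other orientations and are not handled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Hypothetical minimal alignment record (0-based, half-open coordinates)."""
    chrom: str
    start: int      # leftmost reference position covered by the read
    length: int     # number of reference bases covered by the read
    reverse: bool   # True if the read aligned to the reverse strand

    @property
    def end(self) -> int:
        return self.start + self.length


def insert_size(read1: Alignment, read2: Alignment) -> Optional[int]:
    """Fragment length for a forward/reverse (FR) pair on one chromosome, else None."""
    if read1.chrom != read2.chrom or read1.reverse == read2.reverse:
        return None  # different chromosomes or same strand: not a simple FR pair
    fwd, rev = (read2, read1) if read1.reverse else (read1, read2)
    if fwd.start >= rev.end:
        return None  # forward read lies beyond the reverse read: reads point apart
    # The fragment spans from the 5' end of the forward read
    # to the 5' end of the reverse read (its rightmost aligned base).
    return rev.end - fwd.start


# Two 100-bp reads: forward at 1,000, reverse at 1,250 -> 350-bp fragment
pair = (Alignment("chr1", 1000, 100, False), Alignment("chr1", 1250, 100, True))
print(insert_size(*pair))
```

Pairs whose estimated size falls far outside the library's expected range (e.g. 200-500 bp for a short-insert library) can flag mapping errors or structural rearrangements.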
Paired-end Sequencing and Alignment
Mate-pair Sequencing and Alignment
Combination of short-read and mate-pair sequence reads for de novo sequencing
Barcoding Libraries
Workflow: library preparation, pooling, sequencing, and separation of the pooled sequences by index
Both the insert and the index are sequenced.
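Software usually performs the final separation step by comparing each read's index sequence with the barcode of every pooled sample. The sketch below is a hypothetical illustration: the barcodes, sample names and one-mismatch tolerance are all assumptions. Reads that match no barcode, or more than one, are set aside as "undetermined".

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: Iterable[Tuple[str, str]],
                barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Assign (index_sequence, insert_sequence) reads to samples by barcode."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for index, insert in reads:
        hits = [sample for sample, code in barcodes.items()
                if len(code) == len(index) and hamming(index, code) <= max_mismatches]
        # Unmatched or ambiguous indexes go to a separate bin instead of a sample.
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append(insert)
    return dict(bins)


barcodes = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}  # hypothetical barcodes
reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("TGCATG", "GGGG"), ("NNNNNN", "TTTT")]
print(demultiplex(reads, barcodes))
```

Tolerating one mismatch is only unambiguous when every pair of barcodes differs at three or more positions. The two example barcodes differ at all six.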
Sequencing Technologies
DNA Fragmentation
Adaptive Focused Acoustics, Covaris
• Acoustic energy wave that converges and focuses on a small, localized area
• Shearing of DNA, RNA, chromatin, etc.
• Random fragmentation
Formats: plates or single samples
Library Construction
Nextera Library Construction, Illumina
Targeted Amplification
HaloPlex (Agilent)
Targeted Amplification
TruSeq Custom Amplicon (Illumina)
Targeted Amplification
AmpliSeq (Ion Torrent), 12- to 3072-plex
Targeted Amplification
Single Molecule Molecular Inversion Probes
O'Roak BJ et al., Science 2012; Hiatt JB et al., Genome Res 2013
Library Amplification, Emulsion PCR
Roche/454 and Ion Torrent/Proton
Metzker, Nature Reviews Genetics 2010
Roche 454, Pyrosequencing
Problems with homopolymers
Metzker, Nature Reviews Genetics 2010
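The homopolymer problem follows from how a pyrosequencing flowgram is read. Nucleotides are flowed one at a time in a fixed cyclic order, and the light signal in each flow is roughly proportional to the number of identical bases incorporated. The toy base caller below assumes a TACG flow order and idealized, background-corrected signals (both illustrative). It simply rounds each signal to a homopolymer length.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order (illustrative)


def call_bases(flow_signals, flow_order=FLOW_ORDER):
    """Turn per-flow light signals into bases by rounding to homopolymer lengths.

    Flow i delivers nucleotide flow_order[i % 4]; a signal near n means n
    identical bases were incorporated, and a signal near 0 means none.
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(0, round(signal)))
    return "".join(bases)


print(call_bases([1.02, 0.05, 2.1, 0.9]))  # T, no A, CC, G -> TCCG
```

Each extra base in a long homopolymer adds only a small relative increase to the signal. Signals for, say, a 5-mer and a 6-mer are therefore hard to separate, and a rounding error becomes an extra or missing base rather than a substitution. This fits the indel-dominated error type listed for the 454 instruments.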
Roche/454 Technology
Instrument | Run time (hr) | Read length (bp) | Yield (Mb/run) | Error type | Error rate (%)
GS FLX Titanium XL+ | 23 | 1000 (mean 700) | 700 | Indel | 1
GS FLX Titanium XLR70 | 10 | 600 (mean 450) | 450 | Indel | 1
GS Junior | 10 | 400 | 35 | Indel | 1
• Mate-pair (paired-end) reads with 3 kb, 8 kb and 20 kb inserts
• Cost per run makes sequencing an entire human genome prohibitive
• Great platform for targeted validation
• Good for de novo sequencing in combination with short reads
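The usual coverage arithmetic supports the cost argument: mean depth = total sequenced bases / genome size. This sketch assumes a 3-Gb human genome (as on the Human Genome Project slide), a 30x target, and the per-run yields from the Roche/454 and Illumina tables. It ignores duplicate, unmapped and low-quality reads, so real run counts would be higher.

```python
import math

HUMAN_GENOME_BP = 3e9  # approximate haploid human genome size


def mean_coverage(total_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


def runs_needed(target_coverage: float, yield_per_run: float,
                genome_size: float = HUMAN_GENOME_BP) -> int:
    """Smallest whole number of runs whose combined yield reaches the target depth."""
    return math.ceil(target_coverage * genome_size / yield_per_run)


print(runs_needed(30, 700e6))  # GS FLX Titanium XL+, 700 Mb/run -> 129 runs
print(runs_needed(30, 600e9))  # HiSeq 2500 high output, 600 Gb/run -> 1 run
```

The same 30x genome needs roughly 129 GS FLX runs but a single HiSeq run. This is what makes whole human genomes impractical on the 454 platform.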
SOLiD, Life Technologies
Beads are placed on the surface of the flow cell
SOLiD, Life Technologies
Tucker, Am J Hum Genet 2009; Metzker, Nature Reviews Genetics 2010
SOLiD Technology
Instrument | Run time (days) | Read length (bp) | Yield (Gb/run) | Error type | Error rate (%)
5500 W | 2-8 | 1 x 50 / 1 x 75 / 2 x 50 | 80 / 120 / 160 | A-T bias | 0.01
5500xl W | 2-8 | 1 x 50 / 1 x 75 / 2 x 50 | 160 / 240 / 320 | A-T bias | 0.01
• 6-lane flow chip with independent lanes
• Very high accuracy data due to two-base encoding
• Conversion of color space to base space
• Paired-end chemistry enabled
• Wildfire chemistry being implemented (replaces ePCR)
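Two-base encoding can be illustrated with the SOLiD colour code, in which each of four colours encodes a pair of adjacent bases. A compact way to express the colour chart is to code bases as A=0, C=1, G=2, T=3 and take the XOR of the two codes. Converting colour space to base space then needs the first base, which is known from the adapter. A minimal sketch:

```python
from typing import List

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit base codes
BASES = "ACGT"                           # BASES[code] -> base


def encode(seq: str) -> List[int]:
    """Base space -> colour space: each colour is code(b1) XOR code(b2)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colours: List[int]) -> str:
    """Colour space -> base space, starting from the known first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)


colours = encode("TAGCT")
print(colours)                    # [3, 2, 3, 2]
print(decode("T", colours))       # TAGCT
print(decode("T", [3, 1, 3, 2]))  # one wrong colour -> TACGA (three bases change)
```

The last line shows why SOLiD reads are aligned in colour space rather than decoded first: one wrong colour corrupts every downstream base. A true SNP, by contrast, changes two adjacent colours. That difference is what lets two-base encoding separate measurement errors from real variants.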
Ion Torrent/Proton, Life Technologies
Library Construction and Emulsion PCR
Semiconductor Chip
Problems with homopolymers
Different Chip Sizes
Instruments
Ion Proton: target of sequencing one human genome per day for US$1,000
Illumina Sample Preparation
1. Library preparation: fragment DNA; repair ends and add A overhang; ligate adapters; select ligated DNA
2. Automated cluster generation (1-8 samples): hybridize to flow cell; extend hybridized oligos; perform bridge amplification
3. Sequencing (1-16 samples): sequence the forward strand; regenerate the reverse strand; sequence the reverse strand
Illumina Library Preparation
Nextera Library Construction, Illumina
• Low DNA input: 50 ng
• Fast library prep: 90 minutes
• Automation friendly
• Larger insert size
• GC bias (insertion site)
Illumina Cluster Generation
Figure: library preparation (sequencing library, 100 μm, 8 pmol), single-molecule array, cluster growth by amplification, and cluster density (3 billion clusters).
Sequencing-by-Synthesis
Cycle: add the four fluorescently labelled nucleotides (Fl-NTPs) and polymerase; image the incorporated Fl-NTP; cleave the terminator and fluorescent dye from the Fl-NTP; repeat.
3 billion clusters x 2 x 100 bp = 600 gigabases per run, i.e. 100-150 exomes at 30x or 5-6 human genomes at 30x coverage.
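For each cluster, a sequencing-by-synthesis cycle yields one intensity per dye, and the base call is essentially the brightest channel. A minimal sketch, assuming non-negative, background-corrected intensities in ACGT channel order (both assumptions). It also reports brightest / (brightest + second brightest), similar in spirit to the "chastity" value Illumina uses when filtering clusters, so low values mark ambiguous cycles.

```python
from typing import List, Sequence, Tuple

CHANNELS = "ACGT"  # assumed order of the four dye intensities in each cycle


def call_cycle(intensities: Sequence[float]) -> Tuple[str, float]:
    """Return (base, purity) for one cycle; purity = brightest / (brightest + second)."""
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    purity = best / (best + second) if best + second > 0 else 0.0
    return CHANNELS[ranked[0]], purity


def call_read(cycles: Sequence[Sequence[float]]) -> Tuple[str, List[float]]:
    """Call one base per sequencing cycle for a single cluster."""
    calls = [call_cycle(c) for c in cycles]
    return "".join(base for base, _ in calls), [p for _, p in calls]


bases, purity = call_read([[900, 50, 30, 20], [40, 60, 1000, 80], [500, 450, 10, 5]])
print(bases)  # AGA
print([round(p, 2) for p in purity])
```

The third cycle is called as A, but with a purity close to 0.5 it is nearly a tie with C.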
Illumina Instruments
2010: HiSeq 2000
• Two flow cells per run, 100 Gbp per flow cell or one human genome
• New scanning mechanics: scans both surfaces of the flow cell lanes
2011: HiSeq 2000
• Improved chemistry: increased yield and accuracy
• Approx. 600 Gb, 5-6 human genomes
2011: MiSeq Personal Sequencer
• One flow cell per run
• 2x150 bp, approx. 4-5 Gb
• Fast sequencing, 4-27 hr per run
2012: MiSeq v.2 chemistry
• Scans both surfaces of the flow cell, doubling the capacity
• 2x250 bp, approx. 8-10 Gb
2013: HiSeq 2500
• One flow cell per run
• RAPID mode, 27 hr sequencing run
• One human genome per flow cell
2013: MiSeq v.3 chemistry
• Scans both surfaces of the flow cell, doubling the capacity
• 2x300 bp, approx. 15 Gb
Illumina
Instrument | Run time | Read length (bp) | Yield (Gb/run) | Error type | Error rate (%)
HiSeq 2500, high output mode | 2 / 5 / 11 days | 1 x 36 / 2 x 50 / 2 x 100 | 108 / 300 / 600 | Substitution | 0.1
HiSeq 2500, rapid mode | 7 / 27 / 40 hr | 1 x 36 / 2 x 100 / 2 x 150 | 22 / 120 / 360 | Substitution | 0.1
MiSeq | 4 / 24 / 65 hr | 1 x 50 / 2 x 150 / 2 x 300 | 1.3 / 7.5 / 15 | Substitution | 0.1
New development: Ordered array of clusters
Moleculo, Long-Read Sequencing
Voskoboynik A, et al eLife 2013
Single Molecule Sequencing
• Helicos
• Pacific Biosciences
• Oxford Nanopore
Helicos
Top: CTAGTC; Bottom: CAGCTA
Metzker, Nature Reviews Genetics 2010
Pacific Biosciences
• Single Molecule Real Time (SMRT) sequencing technology
• No amplification
• Single-pass read accuracy 85%
• 150,000 zero-mode waveguides per SMRT cell
• Read length approx. 3,000-5,000 bp
Metzker, Nature Reviews Genetics 2010
Nanopore Technology
A nanopore is, essentially, a nanoscale hole.
• Biological: a pore-forming protein in a membrane (lipid bilayer)
• Solid-state: formed in synthetic materials (e.g. silicon nitride)
• Hybrid: a pore-forming protein set in a synthetic material
To come
Data analysis
Raw signal (fluorescent signal, pH change or conductivity) → numbers → base calls → alignment or de novo assembly → biological interpretation
Large IT infrastructure
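Base calls are normally stored with a per-base quality value. This sketch assumes the widely used FASTQ format with Phred+33 quality encoding; the slide names neither. Under that encoding, a quality character c corresponds to Q = ord(c) - 33 and an error probability of 10^(-Q/10). A minimal reader:

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (name, sequence, phred_scores) from 4-line FASTQ records.

    Quality characters are decoded as Phred+33: Q = ord(char) - 33.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip()
            next(it)  # '+' separator line
            qual = next(it).rstrip()
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip()[1:], seq, [ord(ch) - 33 for ch in qual]


def error_probability(q: int) -> float:
    """Phred score Q -> probability that the base call is wrong (10^(-Q/10))."""
    return 10 ** (-q / 10)


records = list(read_fastq(["@read1\n", "ACGT\n", "+\n", "II#5\n"]))
print(records)  # [('read1', 'ACGT', [40, 40, 2, 20])]
print(error_probability(20))
```

Because the reader only iterates over lines, an open file handle can be passed to it directly.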
NGS Applications
NGS applications
• Genomes: re-sequencing or de novo
• Point mutation/indel/structural variation discovery
• Protein:DNA binding
• Chromatin IP/histone binding
• Nucleosome/transcription factor binding, etc.
• ncRNA discovery/sequencing/variants
• Transcriptome sequencing (RNA-seq)
• Genome-wide DNA methylation (Methyl-seq)
• Clinical sequencing for therapeutic decisions
De novo Genome Sequencing
Resequencing
Meyerson et al., Nature Reviews Genetics 2010
ICGC (International Cancer Genome Consortium)
• Description of genomic, transcriptomic and epigenomic changes in 50 different tumour types and/or subtypes of clinical and societal importance across the globe
• Patient-matched control samples (500 of each)
• Approx. US$25 million per project
• Data available to the entire research community
• Osteosarcomas (Myklebost/Meza-Zepeda)
• Wellcome Trust Sanger Institute (Michael Stratton)
A similar US initiative: The Cancer Genome Atlas (TCGA)
International network of cancer genome projects. Nature 2010
Sequence Capture
• Candidate regions
• Exome (1/40 of the genome)
• Sequencing of disease and control samples
• Private SNPs
Meyerson et al., Nature Reviews Genetics 2010; Mardis ER, Nat Rev Gastroenterol Hepatol 2012
Metagenomics
Main metagenomics applications, from metagenomic library construction and screening to next generation sequencing, gene counting and genome reconstruction.
Lepage P et al. Gut 2011
RNA-Seq
Single reads, paired-end
RNA-Seq (Quantification)
IOR/MOS (osteosarcoma cell line); Lorenz et al.
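Reads are counted digitally, so expression can be compared once counts are normalized for transcript length and sequencing depth. One common measure is RPKM, reads per kilobase of transcript per million mapped reads; the slide does not specify which measure was used. The gene names and numbers below are hypothetical.

```python
from typing import Dict, Optional


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads).
    If total_mapped is not given, the sum of the supplied counts is used.
    """
    total = total_mapped if total_mapped is not None else sum(counts.values())
    return {gene: n * 1e9 / (lengths_bp[gene] * total) for gene, n in counts.items()}


# Hypothetical genes: equal read counts, but GENE_B is twice as long
print(rpkm({"GENE_A": 500, "GENE_B": 500}, {"GENE_A": 1000, "GENE_B": 2000},
           total_mapped=10_000_000))  # {'GENE_A': 50.0, 'GENE_B': 25.0}
```

Both genes receive the same number of reads, but the longer transcript gets half the RPKM because it offers twice as many positions for reads to land.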
Transcript Variants
RUNX2
Håkelien et al
Alternative isoform regulation in human tissues
Exon usage
Wang, Sandberg et al. Nature 2008
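Exon usage is often summarized as "percent spliced in" (PSI), the fraction of a gene's transcripts that include a given exon. The following minimal sketch estimates PSI for a cassette exon from junction-read counts. The junction-number normalization is a simplifying assumption, and the counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int,
                       inclusion_junctions: int = 2,
                       exclusion_junctions: int = 1) -> float:
    """Estimate PSI for a cassette exon from junction-read counts.

    inclusion_reads: reads on the upstream-exon and exon-downstream junctions
    exclusion_reads: reads on the junction that skips the exon
    Counts are divided by the number of junctions each isoform offers, since
    the inclusion isoform has two junctions that can be sampled and the
    skipping isoform only one.
    """
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    if inc + exc == 0:
        raise ValueError("no junction reads informative for this exon")
    return inc / (inc + exc)


print(percent_spliced_in(60, 10))  # 0.75
```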
Small RNA Sequencing
Size selection
Quantification and discovery
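In small RNA libraries the reads are usually longer than the RNA molecules. The 3' adapter must therefore be trimmed before in silico size selection, quantification or discovery. The sketch assumes the Illumina TruSeq small RNA 3' adapter sequence and an 18-30 nt window; both are illustrative parameters. Real trimmers also tolerate sequencing errors and partial adapters at read ends.

```python
from typing import Iterable, List

ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # assumed 3' adapter (illustrative)


def trim_and_select(reads: Iterable[str], adapter: str = ADAPTER, seed: int = 8,
                    min_len: int = 18, max_len: int = 30) -> List[str]:
    """Cut reads at the first exact match to the adapter's first `seed` bases,
    then keep inserts within the miRNA-sized window [min_len, max_len]."""
    probe = adapter[:seed]
    kept = []
    for read in reads:
        pos = read.find(probe)
        if pos == -1:
            continue  # no adapter found: insert length unknown, so skip the read
        insert = read[:pos]
        if min_len <= len(insert) <= max_len:
            kept.append(insert)
    return kept


reads = ["TGAGGTAGTAGGTTGTATAGTT" + "TGGAATTCTCGG",  # 22-nt insert, kept
         "ACGT" + "TGGAATTCTCGGGTGCCAAGG",           # 4-nt insert, too short
         "T" * 30]                                   # no adapter, skipped
print(trim_and_select(reads))
```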
miRNAs in Colorectal Cancer
Schee et al., PLoS One 2013
ChIP-Seq
Sequencing
Epigenetic Regulation
Epigenetic landscapes
Figure: histone modification profiles (H3K4me3, H3K27ac, H3K27me3, H3K36me3, H3K9ac and total H3) in undifferentiated (Undiff.) and differentiated (Diff.) samples.
Håkelien et al.
Gene Expression and Histone Modifications
Håkelien et al
RNA Sequencing, Fusion Transcript
Figure: Gene A and Gene B form a fused mRNA with a gene junction. Paired-end sequencing of an mRNA fragment (~240-280 bp) between Adapter 1 and Adapter 2, primed by sequencing primers 1 and 2, gives Read 1 (75 bp) and Read 2 (75 bp).
Fusion Transcript Analysis
• Paired-end reads (2 x 75 bp) from mRNA fragments of ~240-280 bp; 70 million reads
• Mapping to transcriptome and genome
• Candidate fusion genes: reads that are not correctly mapped
• Spanning reads: read pairs whose two mates map to Gene A and Gene B
• Breakpoint reads: reads that cross the breakpoint between Gene A and Gene B in the fusion transcript
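Once reads are aligned, simple rules can tell the two kinds of fusion evidence apart. The sketch assumes a hypothetical, simplified input in which each mate is already assigned to a gene (or None). Reads crossing a fusion junction are assumed to be already flagged, for example by aligning them to candidate fusion sequences. The sketch only counts support per gene pair, which is where filtering would begin.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class ReadPair(NamedTuple):
    """Hypothetical simplified record for one aligned read pair."""
    gene1: Optional[str]                              # gene hit by read 1 (None if unassigned)
    gene2: Optional[str]                              # gene hit by read 2
    junction_genes: Optional[Tuple[str, str]] = None  # set if a read crosses the fusion junction


def count_fusion_support(pairs: Iterable[ReadPair]) -> Tuple[Dict, Dict]:
    """Count spanning and breakpoint reads per (unordered) gene pair."""
    spanning, breakpoint_reads = Counter(), Counter()
    for p in pairs:
        if p.junction_genes is not None:
            breakpoint_reads[tuple(sorted(p.junction_genes))] += 1
        elif p.gene1 and p.gene2 and p.gene1 != p.gene2:
            # Mates land in two different genes: the pair spans the fusion.
            spanning[tuple(sorted((p.gene1, p.gene2)))] += 1
    return dict(spanning), dict(breakpoint_reads)


pairs = [ReadPair("GENE_A", "GENE_B"), ReadPair("GENE_B", "GENE_A"),
         ReadPair("GENE_A", "GENE_A"), ReadPair("GENE_A", None, ("GENE_A", "GENE_B"))]
print(count_fusion_support(pairs))
```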
Filtering Candidate Fusions
Sample | Fusions
IOR-MOS | 34
IOR-OS15 | 404
IOR-OS18 | 547
MHM | 56
OSA | 191
ZK58 | 391
KPD | 248
MG-63 | 324
SAOS-2 | 175
IOR-OS10 | 38
SARG | 189
Total | 2597
Filtering pipeline, starting with 2597 reported fusions:
• Filter fusions found in normal control tissues: -491
• Filter read-through events: -65
• Filter repetitive/low-complexity sequence at the breakpoint: -27
• Filter ribosomal proteins: -590
… still 1424 fusions, 1100 unique
Lorenz et al.
Breakpoint at Genomic Level
IOR-‐OS15
PMP22 ELOVL5
K. Szuhai, Leiden
RNAseq only + DNAseq
Circulating Tumour DNA
Leary et al., Sci Transl Med 2012
Biomarkers
Leary R J et al. Sci Transl Med 2010
Non-invasive prenatal test
Bianchi DW, Nature Medicine 2012
DNA Methylation
Gene silencing
Bisulfite Sequencing
Bisulfite conversion
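Bisulfite converts unmethylated cytosines to uracil, which is read as T after PCR, while methylated cytosines remain C. Comparing aligned reads with the reference therefore gives a methylation call at each CpG: C means methylated, T means unmethylated. A minimal sketch, assuming forward-strand, gap-free alignments. Real bisulfite aligners must also handle both strands and the reduced sequence complexity.

```python
from typing import Dict, Iterable, List, Set, Tuple


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate conversion: unmethylated C reads as T, methylated C stays C."""
    return "".join("T" if base == "C" and i not in methylated else base
                   for i, base in enumerate(seq))


def methylation_levels(reference: str,
                       aligned_reads: Iterable[Tuple[str, int]]) -> Dict[int, float]:
    """Methylation level per reference CpG = C reads / (C + T reads).

    aligned_reads: (read_sequence, 0-based reference start) for forward-strand,
    gap-free alignments.
    """
    cpg_sites: List[int] = [i for i in range(len(reference) - 1)
                            if reference[i:i + 2] == "CG"]
    counts = {site: [0, 0] for site in cpg_sites}  # [methylated, unmethylated]
    for read, start in aligned_reads:
        for site in cpg_sites:
            j = site - start
            if 0 <= j < len(read):
                if read[j] == "C":
                    counts[site][0] += 1
                elif read[j] == "T":
                    counts[site][1] += 1
    return {site: m / (m + u) for site, (m, u) in counts.items() if m + u > 0}


reference = "ACGTTCGA"  # toy reference with CpGs at positions 1 and 5
reads = [(bisulfite_convert(reference, {1}), 0),     # only CpG 1 methylated
         (bisulfite_convert(reference, {1, 5}), 0)]  # both methylated
print(methylation_levels(reference, reads))  # {1: 1.0, 5: 0.5}
```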
Affinity-based Methods (enrichment)
MeDIP-Seq, MBD-Seq
Reduced Representations
Reduced Representation Bisulfite Sequencing
• 0.01-0.3 μg of DNA
• Base-pair resolution
• Theoretical genomic coverage 10%
• CpG-rich regions
Deep Sequencing
Gu H et al., Nature Protocols (2011)
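RRBS concentrates sequencing on CpG-rich regions by digesting DNA with MspI and sequencing only a size-selected fraction of the fragments. MspI cuts C^CGG regardless of methylation of the internal CpG. An in silico digest sketches the idea; the 40-220 bp window is an assumed, illustrative size range, and protocols differ.

```python
import re
from typing import List, Tuple


def mspi_fragments(seq: str) -> List[str]:
    """In silico MspI digest of one strand: cut between C and CGG at each CCGG."""
    cuts = [m.start() + 1 for m in re.finditer("(?=CCGG)", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rrbs_selection(seq: str, min_len: int = 40,
                   max_len: int = 220) -> Tuple[List[str], int, int]:
    """Size-select MspI fragments; return (fragments, CpGs covered, CpGs total)."""
    selected = [f for f in mspi_fragments(seq) if min_len <= len(f) <= max_len]
    # The cut (C^CGG) never splits a CpG, so per-fragment counts add up correctly.
    covered = sum(f.count("CG") for f in selected)
    return selected, covered, seq.count("CG")


print(mspi_fragments("AAAACCGGTTTT"))  # ['AAAAC', 'CGGTTTT']
```

Every selected fragment starts with CGG or ends with C of an MspI site, so each one carries at least one CpG at its end. This is why the method enriches for CpG-rich regions.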
Detecting DNA Base Modifications
WGBS of a newborn (NB) and a centenarian (Y103) individual
Genome-wide DNA methylation levels in the NB, Y26, and Y103 individuals
Heyn H et al. PNAS 2012
Personalised Medicine
Making the treatment as individualized as the disease, by identifying genetic, genomic and clinical information that allows accurate predictions to be made about a person's susceptibility to developing disease, the course of disease, and its response to treatment.
Advantages of personalised medicine:
• Ability to make more informed medical decisions
• Higher probability of desired outcomes thanks to better-targeted therapies
• Reduced probability of negative side effects
• Earlier disease intervention
• Reduced healthcare costs
Cancer classification based on genomic profile
Gene profile A: reflects a mechanism that drives tumour growth, and possible sensitivity to a specific targeted therapy
Gene profile B
Gene profile C
Treatments based on genetic profiles
Figure: treatment for lung cancer and treatment for colon cancer; after sequencing: colon cancer treatment for lung cancer? Sarcoma treatment for colon cancer? New medicine?
Targeted Resequencing of Cancer Genes
Lung cancer incidence
Figure: lung cancer incidence per 100,000 person-years in women and men, 1988-2007.
• Men: 29% increase (absolute numbers)
• Women: 163% increase (absolute numbers)
• Age-adjusted incidence: 36/100,000 in men (1.4% annual increase); 28/100,000 in women (4.9% annual increase)
Sagerup et al., Thorax 2011
A Deadly Disease
Males Females
Cancer incidence in Norway 2005-2009
Cancer Register of Norway
Poor Prognosis
Localized disease, 5-year survival: males 44%, females 55%
Need for new therapeutic options
Cancer Register of Norway
Personalised Cancer Medicine
Aims: to map mutation frequencies in lung cancer and to identify novel therapeutic targets
• Large lung cancer biobank (Drs. Helland and Brustugun)
• 430 clinical surgical biopsies, NSCLC
• Early stage
• Untreated samples
• 100 cases, tumour/normal pairs
• Targeted resequencing
• All kinase genes + 100 cancer genes
• All exons and UTRs
• Cancer Clinic, Bioinformatics and Genomics Core Facility
Kinases Regulate Cell Signaling
Kinases Targeted in Clinical Trials
Agilent Kinome Set
Agilent Kinome Set: 612 genes, 3.2 Mbp in total
• Protein kinase genes (517), e.g. AAK1, AATK, ABL1, ABL2, ACTR2, ACVR1, ACVR1B, ACVR1C, ACVR2A, ACVR2B, ACVRL1, ADCK1, ADCK4, ADCK5, ADRBK1, ADRBK2, AKT1, AKT2, AKT3, ALK; ATM, AXL, CDKs, EGFR, FGFRs, FLTs, IGF1R, JAK1, KIT, MAPKs, MET, MTOR, PDGFRs, RAF1, TGFBRs
• PI3K domain proteins (12): PIK3C2A, PIK3C2B, PIK3C2G, PIK3C3, PIK3CA, PIK3CB, PIK3CD, PIK3CG, PI4KA, PI4KB, PI4K2B, PI4K2A
• Diglyceride kinases (13): AGK, CERK, DGKA, DGKB, DGKD, DGKE, DGKG, DGKH, DGKI, DGKQ, DGKZ, SPHK1, SPHK2
• PIK3 regulatory components (6): PIK3R1, PIK3R2, PIK3R3, PIK3R4, PIK3R5, PIK3R6
• Inositol polyphosphate kinases (9): IP6K1, IP6K2, IP6K3, IPMK, IPPK, ITPK1, ITPKA, ITPKB, ITPKC
• PIP4/PIP5 kinases (9): PIKFYVE, PIP4K2A, PIP4K2B, PIP4K2C, PIP5K1A, PIP5K1B, PIP5K1C, PIP5KL1, PIPSL
• Additional cancer genes (20): CDC6, CHD3, HRAS, KRAS, NRAS, PTEN, CDH1, TP53, CDKN2A, CDKN2B, APC, RB1, CTNNB1, BRCA1, BRCA2, NF1, NF2, GATA3, MYC, INPP4A
• Breast cancer genes (16): COL1A1, GAB1, HAUS3, IRS2, IRS4, KIAA1468, KLHL4, NFKB1, NFKBIA, NFKBIE, PALB2, RHEB, RNF220, SNX4, SP1, USP28
• More cancer genes (11), including CCND1, CCND2, CCND3, ESR1, ESR2, FBXW7, IDH1, IDH2, MLH1, TERT
Analysis Pipeline
Steps: mapped reads, calling variations, calling somatic mutations, indels, DNA copy number changes, genomic rearrangements, annotation
Somatic Mutation Detection Pipeline
Collaboration with the UiO/HSØ Bioinformatics Core Facility
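At its core, somatic mutation calling compares the tumour with the patient-matched normal sample. A variant is a somatic candidate when it is well supported in the tumour and essentially absent from the normal. The thresholds below (minimum depth, tumour variant allele fraction, maximum alternative reads in the normal) are hypothetical. Production callers use statistical models rather than fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical read counts at one candidate variant site."""
    tumour_ref: int
    tumour_alt: int
    normal_ref: int
    normal_alt: int


def is_somatic(c: SiteCounts, min_depth: int = 20, min_tumour_vaf: float = 0.10,
               max_normal_alt: int = 1) -> bool:
    """Fixed-threshold somatic filter (all thresholds are illustrative)."""
    tumour_depth = c.tumour_ref + c.tumour_alt
    normal_depth = c.normal_ref + c.normal_alt
    if tumour_depth < min_depth or normal_depth < min_depth:
        return False  # too little coverage to judge one of the samples
    tumour_vaf = c.tumour_alt / tumour_depth  # variant allele fraction in the tumour
    # Somatic: clearly present in the tumour, (almost) absent from the matched normal.
    return tumour_vaf >= min_tumour_vaf and c.normal_alt <= max_normal_alt


print(is_somatic(SiteCounts(60, 40, 50, 0)))   # True: tumour-only variant
print(is_somatic(SiteCounts(50, 50, 25, 25)))  # False: also in normal (germline)
```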
Single Nucleotide Variants
Somatic Mutation Distribution
Functional and Clinical Annotation Pipeline
Sigve Nakken et al
EGFR Mutation
CTG>CGG, Leu>Arg missense mutation
Lorenz, Nakken, Vodak, Madoui, et al.
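The reported change can be checked against the standard genetic code. CTG encodes leucine and CGG encodes arginine, so the single T>G change at the second codon position gives a missense mutation. The following small sketch classifies a codon change using the standard code only, with no reading-frame or transcript context.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, amino acids in TCAG x TCAG x TCAG codon order ('*' = stop)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
THREE_LETTER = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
                "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
                "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
                "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
                "*": "Stop"}


def classify_codon_change(ref: str, alt: str) -> Tuple[str, str, str]:
    """Return (reference aa, alternative aa, consequence) for one codon change."""
    ref_aa, alt_aa = CODON_TABLE[ref.upper()], CODON_TABLE[alt.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return THREE_LETTER[ref_aa], THREE_LETTER[alt_aa], kind


print(classify_codon_change("CTG", "CGG"))  # ('Leu', 'Arg', 'missense')
```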
EGFR mutations in lung cancer
Figure: EGFR protein domains (EGF ligand binding, transmembrane (TM), tyrosine kinase, autophosphorylation) with residue positions 718, 745, 776, 835, 858, Y869, 947 and 964, and the GXGXXG, DFG and LREA motifs.
Most responders to tyrosine kinase inhibitors (TKIs), e.g. Iressa and Tarceva, have EGFR mutations
EGFR in lung cancer
• Epidermal Growth Factor Receptor (EGFR)
• Mutations increase the activity of EGFR
• Increased cell survival and proliferation
• Drives tumour growth
• EGFR inhibitors
• Patients treated with EGFR inhibitors
• 10-15% of patients have EGFR mutations
Figure: EGFR structure (ligand; extracellular domain with L-domains and furin-like domain; transmembrane domain; intracellular tyrosine kinase domain) signalling to survival and proliferation; lung cancer before and after treatment.
Pao & Chmielecki, Nat Rev Cancer 2010
Norwegian Cancer Genomics Consortium, www.cancergenomics.no
Oslo, Bergen, Trondheim and Tromsø
Key investigators:
• Ola Myklebost, PI, OUS
• Ragnhild Lothe, OUS
• Harald Holte, OUS, Translational medicine group
• Per Eystein Lønning, Haukeland
• Anders Waage, St Olavs
• Giske Ursin, Norwegian Cancer Registry
• Leonardo A. Meza-Zepeda, OUS, Sequencing Technology
• Eivind Hovig, OUS, Bioinformatics
NCGC Project
• Sequencing 4000 tumour/blood pairs
• Exome and custom targeted resequencing
• Different cancer types: breast cancer, lymphoma, leukemia, colorectal cancer, malignant melanoma, sarcoma, multiple myeloma, gynecological cancers, prostate cancer
• Provide a national network for the implementation of personalised cancer medicine in Norway
• Provide and disseminate methodology for sequencing of tumour material and identification of somatic mutations
• Initiate a number of research projects to determine the applicability of mutation profiles from the individual tumour for therapeutic decisions
NCGC data logistics
Figure: NCGC data logistics, linking cancer types (myeloma, lymphoma, leukemia, colon, melanoma, prostate, breast, sarcoma, others?), sites (OUS, Haukeland, St Olavs), sequencing, bioinformatics, a national mutation database, the Cancer Registry, clinical data and accumulated national data.
DNase-Seq
• Identify open chromatin associated with active DNA elements
• Footprints of transcription factors
Zeng et al., Nature Immunology 2012
Chromosome Conforma�on Capture
Similar for 4C, Hi-C and ChIA-PET
Stadhouders et al., Nature Protocols 2013
Ribosome Profiling
Ingolia, et al Nature Protocols 2012