Precision Diagnostics
Total Page:16
File Type:pdf, Size:1020Kb
Precision diagnostics the importance of bioinformatics for Next Generation Sequencing Saarbrücken, 11th of November, 2013 Professor Dr. Andreas Keller Chair for Clinical Bioinformatics Saarland University University Hospital > Agenda Introduction Genome Sequencing Exome Sequencing Gene Panel Sequencing Other applications 02.12.13 Seite 2 ACTC1 > Chair for Clinical Bioinformatics > Research at a glance Non-Invasive Biomarkers Genetic Testing by NGS new SNV 1183 new SNV 1184 new SNV 1185 new SNV 1186 Detection of miRNA and protein biomarker SID381 SID308 WholeSID298 SID307 genome, exome or gene panel patterns from human blood or serum sequencing of DNA in order to detect samples using microarrays, NGS, genetic causes for human diseases. qRT-PCR & mass spectrometry. Understanding the effect of the 35075000 35080000 35085000 35090000 Biostatistical evaluation & va- position on chr respective genetic variants for lidation of the complex profiles. different disease phenotypes. Precision Healthcare Bacterial Resistance Systems Biology Understanding the genetic cause Model and understand how DNA, of bacterial resistance and correlate mRNA, microRNA, methylation and the bacterial resistance to classical proteins mutually interact in order culture based tests in order to derive generate a holistic and multi- the minimal inhibitory concentration and scale molecular representa- best therapy with anti bacterial agents. tion of human pathologies. 02.12.13 Seite 3 ACTC1 > Chair for Clinical Bioinformatics > Research at a glance Non-Invasive Biomarkers Genetic Testing by NGS new SNV 1183 new SNV 1184 new SNV 1185 new SNV 1186 Detection of miRNA and protein biomarker SID381 SID308 WholeSID298 SID307 genome, exome or gene panel patterns from human blood or serum sequencing of DNA in order to detect samples using microarrays, NGS, genetic causes for human diseases. qRT-PCR & mass spectrometry. Understanding the effect of the 35075000 35080000 35085000 35090000 Biostatistical evaluation & va- position on chr respective genetic variants for lidation of the complex profiles. different disease phenotypes. Precision Healthcare Bacterial Resistance Systems Biology Understanding the genetic cause Model and understand how DNA, of bacterial resistance and correlate mRNA, microRNA, methylation and the bacterial resistance to classical proteins mutually interact in order culture based tests in order to derive generate a holistic and multi- the minimal inhibitory concentration and scale molecular representa- best therapy with anti bacterial agents. tion of human pathologies. 02.12.13 Seite 4 > The vision: integrated companion diagnostics The vision: • Using advanced informatics and biomarkers in order to… - deliver the right treatment – to the right patient – at the right time … for improving patients outcome cDX - strongly growing from a weak basis • Currently, below 5% of drugs on the market have cDX • 35% of Iate development pipelines (phase IIb –IV) relying on biomarkers • 58% of preclinical trials relying on biomarker data An approach healthy diseased treatment I treatment II marker 1 marker 2 marker 3 longitudinal data integration 02.12.13 Seite 5 ACTC1 > Chair for Clinical Bioinformatics > Research at a glance Non-Invasive Biomarkers Genetic Testing by NGS new SNV 1183 new SNV 1184 new SNV 1185 new SNV 1186 Detection of miRNA and protein biomarker SID381 SID308 WholeSID298 SID307 genome, exome or gene panel patterns from human blood or serum sequencing of DNA in order to detect samples using microarrays, NGS, genetic causes for human diseases. qRT-PCR & mass spectrometry. Understanding the effect of the 35075000 35080000 35085000 35090000 Biostatistical evaluation & va- position on chr respective genetic variants for lidation of the complex profiles. different disease phenotypes. Precision Healthcare Bacterial Resistance Systems Biology Understanding the genetic cause Model and understand how DNA, of bacterial resistance and correlate mRNA, microRNA, methylation and the bacterial resistance to classical proteins mutually interact in order culture based tests in order to derive generate a holistic and multi- the minimal inhibitory concentration and scale molecular representa- best therapy with anti bacterial agents. tion of human pathologies. 02.12.13 Seite 6 > Multi Scales – from molecules to populations DNA, RNA Proteins Metabolites Cells / Tissue Organs Body Society 25.000 100.000 2.400 7 trillion 9 1 7 billion Genes Proteins Compounds Human cells Organ- Human People systems being 4 Basic types 29 of tissue Organs 02.12.13 Seite 7 > Bioinformatics becomes key DNA/ RNA Proteins Data Set Size In Bytes (log scale) High-Throughput 1012 (Tera) Sequencing 1011 1010 109 (Giga) Mass Spec Microarray 108 107 6 Sanger Sequencing 10 (Mega) 105 COMPLEXITY COMPLEXITY 104 ELISA 3 Southern Blot Norther Blot Western Blot 10 (Kilo) 102 In molecular diagnostics there is a clear shift towards exponentially increasing complexity – the data can not be interpreted without IT support – Bioinformatics becomes clinically relevant 02.12.13 Seite 8 > Next-Generation DNA Sequencing three complexity stages Routine Targeted Gene Sets 3*105 Whole Exome 3*107 Whole Genome 3*109 Research 02.12.13 Seite 9 > From Sanger to HiSeq Sanger HiSeq The 10 year Human Genome Today, a genome is sequenced for Project sequenced the first human < $5,000 in less than 2 weeks on a reference genome the cost of single sequencing machine roughly $3 billion 02.12.13 Seite 10 > Development of sequencing cost / throughput Throughput Cost per genome 100 billion 1,000,000,000$ 1st generation 100,000,000,000 10,000,000$ 1,000,000,000 2nd generation 10,000,000 100,000$ 100,000 1,000$ rd 1,000 3 generation bases perday 10$ 1990 1998 2004 2012 1990 1998 2004 2012 NHGRI - analysis http://www.genome.gov/sequencingcosts/ 02.12.13 Seite 11 > Different Generations of Sequencing Three generations of sequencing 3rd Generation today Technically feasible nd 2 Generation Clinical applications Limited clinical usage st Complexity up 1 Generation Costper bases down 1980 1990 2000 2010 2020 2030 First generation • Classical sequencing approaches that are purely serial – Most relevant examples: Maxam-Gilbert Sequencing and Sanger Sequencing Second generation • High-throughput and parallel sequencing approaches that do not have single cell / genome resolution – Illumina GA, HiSeq, ABI SOLiD, IonTorrent Third generation • Nanopore based sequencing approaches and single cell / genome resolution approaches – oxford Nanopores, PacBio 02.12.13 Seite 12 > Paradigm shift with NGS > Towards complex genetic disorders Sanger Sequencing NGS cancer cardio- myopathy heritability heritability Genetic complexity Genetic complexity GOOD While many tests for monogenetic disorders (Mendelian Disorders) as cystic fibrosis can be carried out using Sanger sequencing for complex genetic disorders as cardiomyopathies or especially cancer Sanger sequencing lacks throughput. In addition, already for a single gene NGS is less expensive and equally accurate as predictability Sanger sequencing. genetic complexity 02.12.13 Seite 13 > Next-Generation Sequencing 37 technologies… … with very heterogeneous performance metrics …. Illumina Ion Torrent Roche-454 Property Minimum Maximum AB-SOLiD Helicos Read length 100 1,000 Pacific Bio OxfordNanopore Polonator # reads 200,000 6,000,000,000 CGI Intelligent Bio Output in GB 0.5 600 Genapsys Electronic Biosci Nabsys Accuracy 95% 99.95% IBM-Roche NobleGen Price per GB $ 10 $ 4,000 Genia LightSpeed GnuBio Price per device $ 100,000 $ 1,000,000 Bionanomatrix Halcyon GB per day 0.1 50 ZS Genetics Genizon BioSci LaserGen Visigen/Starlight GE Global Stratos Genomics Reveo … and the biggest problem becomes bioinformatics Base4innovation Li-Cor U.S. Genomics One human genome consists of …. Mobious Genomics Nanophotonics Biosci … up to 500 GB raw data Network Biosystems SeiraD … 3 billion reads of 100 bases Affymetrix Population Gen Tech AQI Sciences And has to be interpreted by clinicians 02.12.13 Seite 14 > Illumina at a glance Workflow Library Prep Cluster generation Sequencing Image Size comparison • 50 µm — typical length of a human liver cell, an average-sized body cell • 78 µm — width of a pixel on the display of the iPhone 4 (Retina Display) • 90 µm — paper thickness on average 02.12.13 Seite 15 > Steps of NGS bioinformatics analysis NGS analytics pipeline Solutions NGS Commercial available PROTOTYPES 0.3-8 days 1 – 500 GB raw data (fastq) Alignment of reads to target genome Detection of Variant calling genomic 0.5 –8days aberrations Primary Analysis 1 – 500 GB alignment file (bam) 1 – 500 MB variant file (vcf) Detect the single disease causing variant Understand influence on pathways Understand influence on protein structure and therapy 0–4weeks Segregation analysis Downstreamanalysis Carry out targeted therapy descriptions 50 kb clinical report 02.12.13 Seite 16 > Primary Analysis Example - Alignment > Geenral definition for alignments Basics • Given an alphabet Σ = {A,C,T,G} • Given a set S of k sequences S = {s1,..,sk} on the alphabet Σ, a sequence alignment is a set A = {a1,..,ak} of sequences on the alphabet Σ’ = {A,C,T,G,-} such that • All sequences of A are of the same length • After removal of {-}, ai = si for all I • In all columns at least one character of Σ has to be Original algorithms • Global alignment – Needleman-Wunsch Dynamic Programming • Local alignment – Smith-Waterman Dynamic Programming • Heuristics such as BLAST è Not suited for NGS because of runtime constrains 02.12.13 Seite 17