The Road from Genomics to Personalized Medicine
Raeka Aiyar
EMBL
[email protected]
Steinmetz Group, Genome Biology Unit, EMBL Heidelberg
EUSJA Visit, 18 July 2011
www.embl.de/research/units/genome_biology/steinmetz

• 6 billion base pairs, 22K protein-coding genes, ~300K proteins
• ~6 million differences between two individuals

[Figure: five domains of genomic research plotted against four periods: 1990-2003 (Human Genome Project), 2004-2010, 2011-2020, Beyond 2020]
• Understanding the structure of genomes
• Understanding the biology of genomes
• Understanding the biology of disease
• Advancing the science of medicine
• Improving the effectiveness of healthcare
Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)

Personalized/P4 Medicine
• Predictive - development of probabilistic health projection based on individual DNA and gene expression
• Preventative - creation of therapeutics that will prevent a disease a person is at risk of developing
• Personalized - treating an individual based on their unique human genetic variation, complementing the predictive and preventative efforts above
• Participatory - patient's active, informed involvement in their medical choices and care, acting in partnership with their health providers

The success of genomics
[Figure: Genomic achievements since the Human Genome Project. Timeline of milestones from 1865 to 2010, with an inset plotting cost per human genome sequence against Moore's law (for details, see http://genome.gov/sequencingcosts).]
Earlier landmarks shown:
• 1865: Mendel discovers laws of genetics
• 1953: Watson & Crick describe the DNA double helix
• 1966: Nirenberg, Khorana & Holley determine the genetic code
• 1977: Sanger and Maxam & Gilbert develop DNA sequencing methods
• 1982: GenBank database established
• 1990: Human Genome Project launched
• 1996: Yeast (Saccharomyces cerevisiae) genome sequence
• 1997: Escherichia coli genome sequence
• 1998: Roundworm (Caenorhabditis elegans) genome sequence
• 2000: Fruit fly (Drosophila melanogaster) genome sequence
• 2001: Draft human genome sequence
• 2002: Mouse genome sequence
• 2003: End of the Human Genome Project
Milestones shown for 2004-2010 (exact years not recoverable from the extracted figure):
• Publication of finished human genome sequence
• Rat, chicken, chimpanzee, dog, honeybee, sea urchin, rhesus macaque, platypus and bovine genome sequences
• Phase I HapMap; HapMap 3
• First genome-wide association study published (CFH, age-related macular degeneration); Wellcome Trust Case Control Consortium publication; 500th genome-wide association study published
• International data release workshop
• >1,000 mouse knockout mutations
• ENCODE pilot project complete; modENCODE publications
• NCBI's Database of Genotypes and Phenotypes (dbGaP) launched
• Completion of the Mammalian Gene Collection (MGC)
• First personal genome sequenced; first personal genome sequenced using new technologies
• Human genetic variation is breakthrough of the year
• First direct-to-consumer whole-genome test
• Genetic Information Nondiscrimination Act (GINA) passed in US
• First cancer genome sequence (AML); comprehensive genomic analysis of glioblastoma
• Yoruba, Han Chinese, Korean and Southern African genome sequences
• First human methylome map
• Miller syndrome gene discovered by exome sequencing
• 1000 Genomes pilot project complete
• Neanderthal genome sequence
• UK Biobank reaches 500,000 participants
• Nuffield Council on Bioethics publication on personalized healthcare
Design by Darryl Leja (NHGRI, NIH). Watson and Crick photograph: A. Barrington Brown/Photo Researchers; images of Science covers courtesy of AAAS. © 2011 Macmillan Publishers Limited. All rights reserved
Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)

The evolution of sequencing technology
[Figure: years 2000-2010 on the horizontal axis; cost per million base pairs of sequence (log scale, $1 to $10,000) on one vertical axis and billions of base pairs (0 to 300) on the other. Series shown: Trace, Whole Genome Shotgun Sequence, and gene sequence stored in international public databases.]
• The Trace archive, started in 2000, houses raw sequence data, and currently holds 1.8 trillion base pairs.
• AUTOMATED SANGER SEQUENCING: Based on a decades-old method. At the peak of the technique, a single machine could produce hundreds of thousands of base pairs in a single run.
• 454 PYROSEQUENCING: Released in 2005, 454 sequencing is considered the first 'next-generation' technique. A machine could sequence hundreds of millions of base pairs in a single run.
• SEQUENCING BY SYNTHESIS: Other companies such as Solexa (now Illumina) modified the next-generation, sequencing-by-synthesis techniques and can produce billions of base pairs in a single run.
• SEQUENCING BY LIGATION: This technique, employed in SOLiD, uses a different chemistry from previous technologies and samples every base twice, reducing the error rate.
• THIRD-GENERATION SEQUENCING: Companies such as Helicos BioSciences already read sequence from short, single DNA molecules. Others, such as Pacific Biosciences, Oxford Nanopore and Ion Torrent, say they can read from longer molecules as they pass through a pore.
Human genome at ten: The sequence explosion. Nature (2010)

What would you do if you could sequence everything?
[Figure: applications of sequencing]
• Temporal changes
• Populations
• Mutation discovery & profiling
• Protein-DNA interactions
• Copy number variation
• Tissue types & substructures
• Environmental exposures
• Transcriptionally active sites
• DNA bar codes
• Metagenomic & heterogeneous samples
• Compound libraries
• mRNA expression & discovery
• microRNA expression & discovery
• Alternative splicing & allele-specific expression
Kahvejian A et al., Nature Biotechnology (2008)

Complex diseases
• Do not follow Mendelian inheritance and result from multiple alleles and environment
• Responsible alleles contribute different amounts to phenotype
• Alleles may be present in only a fraction of all individuals with the phenotype
• Need large sample sizes and high-density marker maps to find alleles
Peltonen, L & McKusick, V. Science Online (2001)

Understanding the biology of genomes

Most of the genome is transcribed
Pervasive transcription covers:
• 70% of human/mammalian genomes (Carninci et al. 2005)
• 85% of yeast genome (David et al. 2006)
Gingeras, TR. Origin of phenotypes: genes and transcripts. Genome Research (2007)
Zhenyu Xu, Wu Wei, Julien Gagneur, Sandra Clauder-Münster

Bidirectional promoters generate pervasive transcription
• >30% of promoters are bidirectional, accounting for >50% of unannotated transcripts in the genome
Xu, Z., Wei, W., Gagneur, J., Perocchi, F., Clauder-Muenster, S., Camblong, J., Guffanti, E., Stutz, F., Huber, W. and Steinmetz, L.M. Bidirectional promoters generate pervasive transcription in yeast. Nature 2009.
Neil, H., Malabat, C., d'Aubenton-Carafa, Y., Xu, Z., Steinmetz, L.M. and Jacquier, A. Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature 2009.
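
As a rough illustration of what it means to call a promoter bidirectional, the sketch below flags pairs of transcripts that start close together on opposite strands and run away from each other. It is not the method of Xu et al. or Neil et al.: the `Transcript` record, the `divergent_pairs` helper, the transcript names and the 500 bp `max_gap` threshold are all assumptions made up for this example.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical annotation record (not a real file format)."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site, in bp


def divergent_pairs(transcripts: List[Transcript],
                    max_gap: int = 500) -> List[Tuple[str, str]]:
    """Return (minus-strand, plus-strand) name pairs sharing a promoter region.

    A pair counts as divergent when the minus-strand TSS lies at or before
    the plus-strand TSS and the two are at most max_gap bp apart, so the
    transcripts point away from each other. max_gap is an assumed cutoff.
    """
    minus = [t for t in transcripts if t.strand == "-"]
    plus = [t for t in transcripts if t.strand == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            if m.chrom == p.chrom and 0 <= p.tss - m.tss <= max_gap:
                pairs.append((m.name, p.name))
    return pairs


# Made-up coordinates: ncRNA1/mRNA1 diverge, ncRNA2/mRNA2 converge.
example = [
    Transcript("mRNA1", "chrI", "+", 1000),
    Transcript("ncRNA1", "chrI", "-", 800),
    Transcript("mRNA2", "chrI", "+", 5000),
    Transcript("ncRNA2", "chrI", "-", 5200),
]
print(divergent_pairs(example))
```

With the made-up coordinates above, only the first pair is reported, because the second pair of transcripts points towards each other rather than away. A real analysis would start from measured transcript boundaries and would need a biologically justified distance cutoff.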
Promoter bidirectionality is universal
[Figure: transcripts arising around a promoter region, drawn from -2 kb upstream through the gene.
a) Yeast: CUTs and SUTs, mRNA (long RNAs).
b) Mammals: PALRs, PROMPTs, mRNA (long RNAs); short RNAs: TSSa-RNAs (non-capped), PASRs (capped), NRO-RNAs.]
Carninci, Nature (2009)
Yeast:
• Xu, Z. et al., Bidirectional promoters generate pervasive transcription in yeast. Nature Jan. 2009
• Neil, H. et al., Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature Jan. 2009
Human:
• Core, L.J. et al., Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science Dec. 2008
• Preker, P. et al., RNA exosome depletion reveals transcription upstream of active human promoters. Science Dec. 2008
• He, Y. et al., The antisense transcriptomes of human cells. Science Dec. 2008
Mouse:
• Seila, A.C. et al., Divergent transcription from active promoters. Science Dec. 2008
Parasites:
• Teodorovic et al., Bidirectional transcription is an inherent feature of Giardia lamblia promoters and contributes to an abundance of sterile antisense transcripts throughout the genome. Nucleic Acids Res. 2007

Regulatory roles of antisense expression
• Transcription interference:
  • IME4 (Hongay et al., 2006)
• Inactivating histone modification:
  • Pho84 (Camblong et al., 2007)
  • Ty1 (Berretta et al., 2008)
  • Gal1-10 (Houseley et al., 2008)
• Activating:
  • Pho5 (Uhler et al., 2007)

ORFs with antisense are more often 'switched off'
[Figure: density of minimal expression levels across segregants, for ORFs without antisense and ORFs with antisense]
18% of ORFs with antisense are switched off vs. 8% without
Xu Z et al. Antisense expression increases gene expression variability and locus interdependency. Mol Sys Biol (2011)

Ultrasensitivity model for antisense function
[Figure: sense expression as a function of regulatory signal, with an off state and an on state; labels: TF, antisense, regulatory signal -> sense promoter -> sense activation]
Functional classes of genes with antisense transcripts:
• cell fate decision genes (e.g. IME4)
• condition-specific genes: plasma membrane and stress response genes

Understanding the biology of genomes

Non-coding transcription: Summary
• Pervasive transcription originates from bidirectional promoters
• Antisense transcripts in yeast play a regulatory role in switching genes off
• Increase sensitivity to genetic and environmental changes
• Beneficial
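
To make the idea of ultrasensitivity concrete, here is a minimal sketch using a Hill function, a standard textbook way to describe switch-like responses. It is an assumption chosen for illustration, not necessarily the model in Xu et al. (2011). The function names, the `half_max` parameter and the Hill coefficients are hypothetical.

```python
def hill_response(signal: float, half_max: float = 1.0,
                  hill_n: float = 1.0) -> float:
    """Fraction of maximal sense expression at a given regulatory signal."""
    s = signal ** hill_n
    return s / (half_max ** hill_n + s)


def signal_for_response(fraction: float, half_max: float = 1.0,
                        hill_n: float = 1.0) -> float:
    """Invert the Hill function: signal needed to reach a response fraction."""
    return half_max * (fraction / (1.0 - fraction)) ** (1.0 / hill_n)


def switch_ratio(hill_n: float) -> float:
    """Fold change in signal needed to go from 10% to 90% of full response."""
    low = signal_for_response(0.1, hill_n=hill_n)
    high = signal_for_response(0.9, hill_n=hill_n)
    return high / low


# Graded response (n = 1) versus a steeper, switch-like response (n = 4).
for n in (1.0, 4.0):
    print(f"Hill coefficient {n}: 10%->90% needs a "
          f"{switch_ratio(n):.1f}-fold change in signal")
```

In this toy model a graded response (Hill coefficient 1) needs about an 81-fold change in signal to move from 10% to 90% of full expression, while a coefficient of 4 needs only about 3-fold. That narrower window is what makes a gene behave like an off/on switch and makes its state more sensitive to small genetic or environmental changes in the signal.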