The road from genomics to personalized medicine
Raeka Aiyar, EMBL
Steinmetz Group, Genome Biology Unit, EMBL Heidelberg
EUSJA Visit, 18 July 2011
www.embl.de/research/units/genome_biology/steinmetz
6 billion base pairs, 22K protein-coding genes, ~300K proteins
~6 million differences between two individuals
Understanding the structure of genomes
Understanding the biology of genomes
Understanding the biology of disease
Advancing the science of medicine
Improving the effectiveness of healthcare
1990-2003 Human Genome Project
2004-2010
2011-2020
Beyond 2020
Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)
Personalized/P4 Medicine
• Predictive - development of probabilistic health projection based on individual DNA and gene expression
• Preventative - creation of therapeutics that will prevent a disease a person is at risk of developing
• Personalized - treating an individual based on their unique human genetic variation, complementing the predictive and preventative efforts above
• Participatory - patient's active, informed involvement in their medical choices and care, acting in partnership with their health providers
The success of genomics
[Figure: Genomic achievements since the Human Genome Project. Timeline of milestones from Mendel's laws of genetics (1865) and Watson & Crick's DNA double helix (1953) through the launch and end of the Human Genome Project, HapMap, ENCODE, the 1000 Genomes pilot project, and the first personal and cancer genome sequences (to 2010). Also shown: cost per human genome sequence and Moore's law. For details, see http://genome.gov/sequencingcosts. Design by Darryl Leja (NHGRI, NIH). © 2011 Macmillan Publishers Limited.]
Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)
The evolution of sequencing technology
[Figure: cost per million base pairs of sequence (log scale, $1 to $10,000) and billions of base pairs stored (0 to 300), 2000-2010. Series: Trace; Whole Genome Shotgun Sequence; Gene sequence stored in international public databases.]
The Trace archive, started in 2000, houses raw sequence data, and currently holds 1.8 trillion base pairs.
AUTOMATED SANGER SEQUENCING: Based on a decades-old method, at the peak of the technique, a single machine could produce hundreds of thousands of base pairs in a single run.
454 PYROSEQUENCING: Released in 2005, 454 sequencing is considered the first 'next-generation' technique. A machine could sequence hundreds of millions of base pairs in a single run.
SEQUENCING BY SYNTHESIS: Other companies such as Solexa (now Illumina) modified the next-generation, sequencing-by-synthesis techniques and can produce billions of base pairs in a single run.
SEQUENCING BY LIGATION: This technique employed in SOLiD and [...] chemistry from previous technologies and samples every base twice, reducing the error rate.
THIRD-GENERATION SEQUENCING: Companies such as Helicos BioSciences already read sequence from short, single DNA molecules. Others, such as Pacific Biosciences, Oxford Nanopore and Ion Torrent say they can read from longer molecules as they pass through a pore.
Human genome at ten: The sequence explosion. Nature (2010)
What would you do if you could sequence everything?
• Temporal changes
• Populations
• Mutation discovery & profiling
• Protein-DNA interactions
• Copy number variation
• Tissue types & substructures
• Environmental exposures
• Transcriptionally active sites
• DNA bar codes
• Metagenomic & heterogeneous samples
• Compound libraries
• mRNA expression & discovery
• microRNA expression & discovery
• Alternative splicing & allele-specific expression
Kahvejian A et al., Nature Biotechnology (2008)
Complex diseases
• Do not follow Mendelian inheritance and result from multiple alleles and environment
• Responsible alleles contribute different amounts to phenotype
• Alleles may be present in only a fraction of all individuals with the phenotype
• Need large sample sizes and high-density marker maps to find alleles
Peltonen, L & McKusick, V. SCIENCE Online 2001
Understanding the biology of genomes
Most of the genome is transcribed
Pervasive transcription covers:
• 70% of human/mammalian genomes (Carninci et al. 2005)
• 85% of yeast genome (David et al. 2006)
Gingeras, TR. Origin of phenotypes: genes and transcripts. Genome Research (2007)
Zhenyu Xu, Wu Wei, Julien Gagneur, Sandra Clauder-Münster
Bidirectional promoters generate pervasive transcription
• >30% of promoters are bidirectional, accounting for >50% of unannotated transcripts in the genome
Xu, Z., Wei, W., Gagneur, J., Perocchi, F., Clauder-Muenster, S., Camblong, J., Guffanti, E., Stutz, F., Huber, W. and Steinmetz, L.M. Bidirectional promoters generate pervasive transcription in yeast. Nature 2009.
Neil, H., Malabat, C., d'Aubenton-Carafa, Y., Xu, Z., Steinmetz, L.M. and Jacquier, A. Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature 2009.
Promoter bidirectionality is universal
[Figure: transcripts arising from the promoter region, within 2 kb upstream of the gene. a, Yeast: mRNA; long RNAs (CUTs and SUTs). b, Mammals: mRNA; long RNAs (PALRs, PROMPTs); short RNAs (non-capped: TSSa-RNAs; capped: PASRs; NRO-RNAs).]
Carninci, Nature (2009)
Yeast:
Xu, Z. et al., Bidirectional promoters generate pervasive transcription in yeast. Nature Jan. 2009
Neil, H. et al., Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature Jan. 2009
Human:
Core, L.J. et al., Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science Dec. 2008
Preker, P. et al., RNA exosome depletion reveals transcription upstream of active human promoters. Science Dec. 2008
He, Y. et al., The antisense transcriptomes of human cells. Science Dec. 2008
Mouse:
Seila, A.C. et al., Divergent transcription from active promoters. Science Dec. 2008
Parasites:
Teodorovic et al., Bidirectional transcription is an inherent feature of Giardia lamblia promoters and contributes to an abundance of sterile antisense transcripts throughout the genome. Nucleic Acids Res. 2007
Regulatory roles of antisense expression
• Transcription interference:
   • IME4 (Hongay et al., 2006)
• Inactivating histone modification:
   • Pho84 (Camblong et al., 2007)
   • Ty1 (Berretta et al., 2008)
   • Gal1-10 (Houseley et al., 2008)
• Activating:
   • Pho5 (Uhler et al., 2007)
ORFs with antisense are more often 'switched off'
[Figure: density of minimal expression levels across segregants, for ORFs without antisense and ORFs with antisense]
18% of ORFs with antisense are switched off vs. 8% without
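The two percentages can be put on a single scale as a fold enrichment or an odds ratio. The sketch below uses only the 18% and 8% quoted on the slide; the function names are illustrative and do not come from the paper.

```python
def fold_enrichment(p_with: float, p_without: float) -> float:
    """Ratio of the switched-off fraction with antisense to that without."""
    return p_with / p_without


def odds_ratio(p_with: float, p_without: float) -> float:
    """Odds of being switched off with antisense, relative to without."""
    return (p_with / (1 - p_with)) / (p_without / (1 - p_without))


# Fractions quoted on the slide: 18% with antisense vs. 8% without.
p_with, p_without = 0.18, 0.08

print(round(fold_enrichment(p_with, p_without), 2))  # 2.25
print(round(odds_ratio(p_with, p_without), 2))       # 2.52
```

In other words, an ORF with an antisense transcript is a little over twice as likely to be switched off in at least one segregant.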
Xu Z et al. Antisense expression increases gene expression variability and locus interdependency. Mol Sys Biol (2011)
Ultrasensitivity model for antisense function
[Figure: Off state: antisense expression. On state: sense expression (TF). Regulatory signal -> sense promoter -> sense activation.]
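Ultrasensitivity means a switch-like response: little output below a threshold, then a steep rise. A common generic way to illustrate it is a Hill function with coefficient n > 1. The sketch below is only that generic illustration; the slide does not state a functional form, and the Hill form, the threshold `k` and the coefficient `n` are assumptions, not the authors' model.

```python
def hill_response(signal: float, k: float = 1.0, n: float = 1.0) -> float:
    """Fraction of maximal sense expression for a given regulatory signal.

    k is the signal giving half-maximal response; n is the Hill coefficient.
    Both are hypothetical parameters chosen for illustration.
    """
    return signal**n / (k**n + signal**n)


for s in (0.5, 1.0, 2.0):
    graded = hill_response(s, n=1)       # n = 1: graded response
    switch_like = hill_response(s, n=4)  # n = 4: ultrasensitive response
    print(f"signal={s:.1f}  graded={graded:.3f}  switch-like={switch_like:.3f}")
```

With n = 4 the response stays lower below the threshold and rises higher above it than the graded curve does, which is the behaviour the off/on diagram describes qualitatively.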
Functional classes of genes with antisense transcripts:
• cell fate decision genes (e.g. IME4)
• condition-specific genes: plasma membrane and stress response genes
Non-coding transcription: Summary
• Pervasive transcription originates from bidirectional promoters
• Antisense transcripts in yeast play a regulatory role in switching genes off
• Antisense transcripts increase sensitivity to genetic and environmental changes
• Beneficial for adaptation: greater response to different environments, cell-to-cell variation
Understanding the biology of disease
Why are some mosquitoes resistant to the malaria parasite?
Stephanie Blandin
Which genetic factors determine resistance to Plasmodium?
Rui Wang-Sattler
[Figure: parasite counts in the mosquito midgut 8 d post-infection (axes: number of live parasites per midgut, number of melanised parasites per midgut, percentage of mosquitoes) and lod score profiles along the chromosomes for the traits "resistant" and "melanising". Outcome diagram: invading parasites are either live or killed; killed parasites are lysed or melanised.]
Blandin, S et al. Dissecting the genetic basis of resistance to malaria parasites in Anopheles gambiae. Science (2009)
Fig. [...]. Mapping of loci controlling resistance and the mode of clearance of dead parasites using counts as traits. (A) Parasite elimination in A. gambiae. Some of the invading parasites are killed, and dead parasites are disposed of either by melanisation or by lysis. Resistance loci (Res) determine the efficiency of parasite killing, while melanisation loci (Mel) control the choice of mechanism(s) for the clearance of dead parasites. (B) Non-parametri[...] Using the binary traits 'resistant' and 'melanising' ([...], positioned below chromosomes for comparison). [...]
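The lod score plotted in the mapping figure is a base-10 log likelihood ratio comparing a model in which a marker is linked to the trait against a model with no linkage. The sketch below shows the idea for a single marker with two genotype classes and a normally distributed trait, where LOD = (n/2) log10(RSS_null / RSS_marker). This is a simplified marker regression under stated assumptions, not necessarily the analysis used in Blandin et al.; the trait values and genotype labels are made up for illustration.

```python
import math


def rss(values, fitted):
    """Residual sum of squares between observed and fitted values."""
    return sum((v - f) ** 2 for v, f in zip(values, fitted))


def lod_score(trait, genotype):
    """LOD for one marker: null model (one mean) vs. one mean per genotype.

    Assumes normally distributed residuals and a non-zero within-genotype
    residual sum of squares.
    """
    n = len(trait)
    grand_mean = sum(trait) / n
    rss_null = rss(trait, [grand_mean] * n)

    group_means = {}
    for g in set(genotype):
        members = [t for t, gt in zip(trait, genotype) if gt == g]
        group_means[g] = sum(members) / len(members)
    rss_marker = rss(trait, [group_means[g] for g in genotype])

    return (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical parasite counts for eight mosquitoes.
trait = [10, 12, 11, 13, 30, 28, 31, 29]
linked = ["S", "S", "S", "S", "R", "R", "R", "R"]    # genotype tracks the trait
unlinked = ["S", "R", "S", "R", "S", "R", "S", "R"]  # genotype unrelated to trait

print(f"linked marker:   LOD = {lod_score(trait, linked):.2f}")
print(f"unlinked marker: LOD = {lod_score(trait, unlinked):.2f}")
```

A marker whose genotype separates high from low counts gives a large LOD, while a marker unrelated to the trait gives a LOD near zero; peaks in the lod profile therefore point to candidate loci.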
Reciprocal allele-specific RNAi (rasRNAi)
[Figure: rasRNAi results for the dsRa and dsSa treatments; significance marked ns or **]
In progress: search for additional genetic factors
Blandin, S et al. Dissecting the genetic basis of resistance to malaria parasites in Anopheles gambiae. Science (2009)
Malaria and rasRNAi: Future Work
Field studies for malaria infection
Metagenomic sequencing to test association of mosquito and malaria genotypes in the field
Collaboration with Isabelle Morlais in Yaounde, Cameroon
rasRNAi in human cell lines
HeLa cells for mitosis phenotypes
Patient-derived lines
Advancing the science of medicine
Mitochondria: a diverse, essential organelle
Dual genetic origin
Tightly integrated with cellular function
1000 proteins in yeast
1500 proteins in human
[Figure labels: intermembrane space, cristae, inner membrane, outer membrane, matrix]
Conservation: Mito: 60%, Cell: 46%
Yeast models of mitochondrial ATP synthase disorders
Components and structure of yeast mitochondrial ATP synthase
[Figure labels: Nucleus, Cytosol, TOM, OM, IMS, TIM, IM, Matrix; 18 subunits; ASSEMBLY; Atp6p, Atp8p, Atp9p; Fmc1p; S. cerevisiae mtDNA]
Lefebvre-Legendre et al., JBC 2001
Dilution series shows severity of yeast mutant respiratory growth defects
[Figure: growth on Glucose (Fermentation) and Glycerol (Respiration) at 35°C]
Strain          % ATP synthesis
Wild type       100
atp6-T8993G     7
atp6-T8993C     50
atp6-T9176G     <5
atp6-T9176C     74
atp6-T8851C     6
fmc1Δ           8
Elodie Couplan, INSERM Brest
Severity of homologous mutations in yeast correlates with patient phenotypes
Raeka Aiyar
Screen for chemicals active against ATP synthase disorders