
The road from to personalized

Raeka Aiyar, EMBL

Steinmetz Group, Genome Biology Unit, EMBL Heidelberg

EUSJA Visit, 18 July 2011

www.embl.de/research/units/genome_biology/steinmetz

6 billion base pairs, 22K protein-coding genes, ~300K

~6 million differences between two individuals

Understanding the structure of genomes

Understanding the biology of genomes

Understanding the biology of disease

Advancing the science of medicine

Improving the effectiveness of healthcare

1990-2003

2004-2010

2011-2020

Beyond 2020

Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)

Personalized/P4 Medicine

• Predictive - development of probabilistic health projection based on individual DNA and gene expression

• Preventative - creation of therapeutics that will prevent a disease a person is at risk of developing

• Personalized - treating an individual based on their unique human , complementing the predictive and preventative efforts above

• Participatory - patient's active, informed involvement in their medical choices and care, acting in partnership with their health providers

The success of genomics

Genomic achievements since the Human Genome Project

[Figure: timeline of genomic milestones, from Mendel's laws of inheritance (1865) and Watson & Crick's description of the DNA double helix (1953), through the Human Genome Project, to the genome sequences, association studies and policy milestones of 2004-2010, with an inset of cost per human genome sequence. For details, see http://genome.gov/sequencingcosts. Design by Darryl Leja (NHGRI, NIH). © 2011 Macmillan Publishers Limited.]

Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)

The of sequencing technology

[Figure: cost per million base pairs of sequence (log scale, $1 to $10,000) and billions of base pairs stored (0 to 300), plotted for 2000-2010. Labels: Trace, Whole Genome Shotgun Sequence, Gene sequence stored in international public databases.]

The Trace archive, started in 2000, houses raw sequence data, and currently holds 1.8 trillion base pairs.

AUTOMATED SANGER SEQUENCING: Based on a decades-old method, at the peak of the technique, a single machine could produce hundreds of thousands of base pairs in a single run.

454 PYROSEQUENCING: Released in 2005, 454 sequencing is considered the first 'next-generation' technique. A machine could sequence hundreds of millions of base pairs in a single run.

SEQUENCING BY SYNTHESIS: Other companies such as Solexa (now Illumina) modified the next-generation, sequencing-by-synthesis techniques and can produce billions of base pairs in a single run.

SEQUENCING BY LIGATION: This technique employed in SOLiD and [...] chemistry from previous technologies and samples every base twice, reducing the error rate.

THIRD-GENERATION SEQUENCING: Companies such as Helicos BioSciences already read sequence from short, single DNA molecules. Others, such as Pacific Biosciences, Oxford Nanopore and Ion Torrent say they can read from longer molecules as they pass through a pore.

Human genome at ten: The sequence explosion. Nature (2010)

What would you do if you could sequence everything?

• Populations

• Temporal changes

• Mutation discovery & profiling

• Protein-DNA interactions

• Copy number variation

• Tissue types & substructures

• Environmental exposures

• Transcriptionally active sites

• DNA bar codes

• Metagenomic & heterogeneous samples

• Compound libraries

• mRNA expression & discovery

• microRNA expression & discovery

• Alternative splicing & allele-specific expression

Kahvejian A et al., Nature (2008)

Complex diseases

• Do not follow and result from multiple alleles and environment

• Responsible alleles contribute different amounts to

• Alleles may be present in only a fraction of all individuals with the phenotype

• Need large sample sizes and high density marker maps to find alleles

Peltonen, L & McKusick, V. SCIENCE Online 2001

Understanding the biology of genomes

Most of the genome is transcribed

Pervasive transcription covers:

• 70% of human/mammalian genomes (Carninci et al. 2005)

• 85% of yeast genome (David et al. 2006)

Gingeras, TR. Origin of phenotypes: genes and transcripts. Genome Research (2007)

Zhenyu Xu, Wu Wei, Julien Gagneur, Sandra Clauder-Münster

Bidirectional promoters generate pervasive transcription


• >30% of promoters are bidirectional, accounting for >50% of unannotated transcripts in the genome

Xu, Z., Wei, W., Gagneur, J., Perocchi, F., Clauder-Muenster, S., Camblong, J., Guffanti, E., Stutz, F., Huber, W. and Steinmetz, L.M. Bidirectional promoters generate pervasive transcription in yeast. Nature 2009.

Neil, H., Malabat, C., d’Aubenton-Carafa, Y., Xu, Z., Steinmetz, L.M. and Jacquier, A. Widespread bidirectional promoters are the major source of cryptic unstable transcripts in yeast. Nature 2009.

Promoter bidirectionality is universal

[Figure: transcripts around the promoter region, positioned -2 kb to -1 kb relative to the gene. Panel a, Yeast: CUTs and SUTs, mRNA (long RNAs). Panel b, Mammals: PALRs, PROMPTs, mRNA (long RNAs); TSSa-RNAs, PASRs, NRO-RNAs (short RNAs, capped and non-capped).]

Carninci, Nature (2009)

Yeast:

Xu, Z. et al., Bidirectional promoters generate pervasive transcription in yeast. Nature Jan. 2009

Neil, H. et al., Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature Jan. 2009

H:

Core, L.J. et al., Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science Dec. 2008

Preker, P. et al., RNA exosome depletion reveals transcription upstream of active human promoters. Science Dec. 2008

He, Y. et al., The antisense transcriptomes of human cells. Science Dec. 2008

Mouse:

Seila, A.C. et al., Divergent transcription from active promoters. Science Dec. 2008

Parasites:

Teodorovic et al., Bidirectional transcription is an inherent feature of Giardia lamblia promoters and contributes to an abundance of sterile antisense transcripts throughout the genome. Nucleic Acids Res. 2007

Regulatory roles of antisense expression

• Transcription interference:

• IME4 (Hongay et al., 2006)

• Inactivating histone modification:

• Pho84 (Camblong et al., 2007)

• Ty1 (Berretta et al., 2008)

• Gal1-10 (Houseley et al., 2008)

• Activating:

• Pho5 (Uhler et al., 2007)

ORFs with antisense are more often ‘switched off’

[Figure: density of minimal expression levels across segregants, for ORFs without antisense and ORFs with antisense.]

18% of ORFs with antisense are switched off vs. 8% without
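To put the two percentages above on a common scale, the comparison can be expressed as a fold enrichment and an odds ratio. This is only an illustrative sketch of the arithmetic: the function and variable names are hypothetical, and the inputs are the rounded fractions quoted on the slide, not the underlying counts from Xu et al.

```python
# Rounded fractions quoted on the slide (assumed inputs, not raw counts).
OFF_WITH_ANTISENSE = 0.18     # ORFs with antisense that are switched off
OFF_WITHOUT_ANTISENSE = 0.08  # ORFs without antisense that are switched off


def fold_enrichment(p_with: float, p_without: float) -> float:
    """Ratio of the two switched-off fractions."""
    return p_with / p_without


def odds_ratio(p_with: float, p_without: float) -> float:
    """Odds of being switched off with antisense, relative to without."""
    odds_with = p_with / (1.0 - p_with)
    odds_without = p_without / (1.0 - p_without)
    return odds_with / odds_without


if __name__ == "__main__":
    fe = fold_enrichment(OFF_WITH_ANTISENSE, OFF_WITHOUT_ANTISENSE)
    orr = odds_ratio(OFF_WITH_ANTISENSE, OFF_WITHOUT_ANTISENSE)
    print(f"fold enrichment: {fe:.2f}")
    print(f"odds ratio:      {orr:.2f}")
```

Because the inputs are rounded percentages, both numbers are approximate; a significance test would need the actual ORF counts from the paper.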

Xu Z et al. Antisense expression increases gene expression variability and locus interdependency. Mol Sys Biol (2011)

Ultrasensitivity model for antisense function

[Figure: off state versus on state. Labels: TF, antisense, sense expression; regulatory signal -> sense promoter -> sense activation.]

Functional classes of genes with antisense transcripts:

• cell fate decision genes (e.g. IME4)

• condition-specific genes: plasma membrane and stress response genes

Understanding the biology of genomes

Non-coding transcription: Summary


• Pervasive transcription originates from bidirectional promoters

• Antisense transcripts in yeast play a regulatory role in switching genes off

• Increase sensitivity to genetic and environmental changes

• Beneficial for adaptation: greater response to different environments, cell-to-cell variation

Understanding the biology of disease

Why are some mosquitoes resistant to the malaria parasite?

Stephanie Blandin

Which genetic factors determine resistance to Plasmodium?

Rui Wang-Sattler

[Figure: fate of parasites invading the mosquito midgut (live, killed, lysed, melanised), with lod score plots of the mapped loci.]

Blandin, S et al. Dissecting the genetic basis of resistance to malaria parasites in Anopheles gambiae. Science (2009)

[Figure caption: Mapping of loci controlling resistance and the mode of clearance of dead parasites using counts as traits. (A) Parasite elimination in A. gambiae. Some of the invading parasites are killed, and dead parasites are disposed of either by melanisation or by lysis. Resistance loci (Res) determine the efficiency of parasite killing, while melanisation loci (Mel) control the choice of mechanism(s) for the clearance of dead parasites. (B) Non-parametri[...] Using the binary traits 'resistant' and 'melanising' ([...], positioned below chromosomes for comparison). Geneti[...]]

Reciprocal allele-specific RNAi (rasRNAi)

[Figure: rasRNAi results comparing dsRa and dsSa treatments; significance marks: ns, **.]

In progress: search for additional genetic factors

Blandin, S et al. Dissecting the genetic basis of resistance to malaria parasites in Anopheles gambiae. Science (2009)

Understanding the biology of disease

Malaria and rasRNAi: Future Work

Field studies for malaria infection

• Metagenomic sequencing to test association of mosquito and malaria genotypes in the field

• Collaboration with Isabelle Morlais in Yaounde, Cameroon

rasRNAi in human cell lines

• HeLa cells for mitosis phenotypes

• Patient derived lines

Advancing the science of medicine

Mitochondria: a diverse, essential organelle

• Dual genetic origin

• Tightly integrated with cellular function

• 1000 proteins in yeast

• 1500 proteins in human

• Conservation: Mito 60%, Cell 46%

[Figure: mitochondrion. Labels: outer membrane, intermembrane space, inner membrane, cristae, matrix.]

Yeast models of mitochondrial ATP synthase disorders

Components and structure of yeast mitochondrial ATP synthase

[Figure: ATP synthase (18 subunits) and its assembly. Labels: Nucleus, Cytosol, TOM, TIM, OM, IMS, IM, Matrix, S. cerevisiae mtDNA, Atp6p, Atp8p, Atp9p, Fmc1p (Lefebvre-Legendre et al., JBC 2001).]

Dilution series shows severity of yeast ATP synthase mutant respiratory growth defects

[Figure: dilution series on Glucose (Fermentation) and Glycerol (Respiration) at 35°C.]

Strain          % ATP synthesis
Wild type       100
atp6-T8993G     7
atp6-T8993C     50
atp6-T9176G     <5
atp6-T9176C     74
atp6-T8851C     6
fmc1Δ           8

Elodie Couplan, INSERM Brest

Severity of homologous mutations in yeast correlates with patient phenotypes

Raeka Aiyar

Screen for chemicals active against ATP synthase disorders

Δfmc1, ~20 hits

[Figure: plate screen. Labels: No growth, YEAST NARP, Drug C7. Inset: drug concentration versus distance from the disk, showing diffusion of the drug around the disk.]

• Simple

• Sensitive

• Large dynamic range for dosage

• High-throughput

Couplan E, Aiyar RS et al. A yeast-based assay identifies drugs active against human mitochondrial disorders. PNAS (2011)

Chemical genomics in yeast

Finding drug targets via gene dosage

[Figure: growth (normal versus inhibited by drug) under two gene-dosage approaches: haploinsufficiency (deletion sensitivity) and overexpression (multicopy suppression).]

Chemical genomics in yeast

Exploiting yeast genome-wide collections

Hoon et al., Nat Chem Biol 2008

Understanding the structure of genomes

Understanding the biology of genomes

Understanding the biology of disease

Advancing the science of medicine

Improving the effectiveness of healthcare

1990-2003 Human Genome Project

2004-2010

2011-2020

Beyond 2020

Green, ED et al. Charting a course for genomic medicine from base pairs to bedside. Nature (2011)

~1 Day Ago

February 2011 NHGRI Published New Vision for Genomics