
An example of the power of genome study: floral MADS-box genes in the Amborella genome

Sangtae Kim, Department of Biology, Sungshin University

The Amborella genome has been published, 2013 Dec. 20

The Amborella Genome and the Evolution of Flowering Plants
Amborella Genome Project‡

‡All authors with affiliations and contributions are listed at the end of the paper. Authorship of this paper should be cited as “Amborella Genome Project.” Participants are arranged by working group, and then are listed in alphabetical order. Significant contributions (†) and the author for correspondence (*) are indicated within each working group. Joshua P. Der (project manager), Srikar Chamala, Andre S. Chanderbali, and James C. Estill made significant and equal contributions to this work.

Research leadership: Victor A. Albert, W. Bradley Barbazuk, Claude W. dePamphilis*, Joshua P. Der†, James Leebens-Mack, Hong Ma, Steve Rounsley, David Sankoff, Stephan Schuster, Douglas E. Soltis, Pamela S. Soltis, Susan R. Wessler, Rod A. Wing

Genome sequencing and assembly: Victor A. Albert, W. Bradley Barbazuk*, Srikar Chamala†, Andre S. Chanderbali, Claude W. dePamphilis, Joshua P. Der, Ronald Determann, Raju Jetty, James Leebens-Mack, Hong Ma, Paula Ralph, Steve Rounsley, Stephan Schuster, Douglas E. Soltis, Pamela S. Soltis, Jason Talag, Lynn Tomsho, Brandon Walts, Stefan Wanke, Rod A. Wing

Cytogenetics: Victor A. Albert, W. Bradley Barbazuk, Srikar Chamala, Andre S. Chanderbali†, Tien-Hao Chang, Ronald Determann, Tianying Lan, Douglas E. Soltis*, Pamela S. Soltis

Genome annotation and database development: Siwaret Arikit, Michael Axtell, Saravanaraj Ayyampalayam, W. Bradley Barbazuk, James M. Burnette III, Srikar Chamala, Emanuel De Paoli, Claude W. dePamphilis, Joshua P. Der, James C. Estill†, Alex Harkess, Yuannian Jiao, James Leebens-Mack*, Kun Liu, Wenbin Mei, Blake Meyers, Saima Shahid, Eric Wafula, Brandon Walts, Susan R. Wessler, Jixian Zhai, Xiaoyu Zhang

Synteny analysis: Victor A. Albert*, Lorenzo Carretero-Paulet, Claude W. dePamphilis, Joshua P. Der, Yuannian Jiao, James Leebens-Mack, Eric Lyons, David Sankoff†, Haibao Tang, Eric Wafula, Chunfang Zheng

Global gene family analysis: Victor A. Albert, Naomi S. Altman, W. Bradley Barbazuk, Claude W. dePamphilis*, Joshua P. Der†, James C. Estill, Yuannian Jiao†, James Leebens-Mack, Kun Liu, Wenbin Mei, Eric Wafula

Targeted gene family curation and analysis: Naomi S. Altman, Siwaret Arikit, Michael Axtell, Srikar Chamala, Andre S. Chanderbali, Feng Chen, Jian-Qun Chen, Vincent Chiang, Emanuel De Paoli, Claude W. dePamphilis, Joshua P. Der*, Ronald Determann, Bruno Fogliani, Chunce Guo, Jesper Harholt, Alex Harkess, Claudette Job, Dominique Job, Sangtae Kim, Hongzhi Kong, James Leebens-Mack, Guanglin Li, Lin Li, Jie Liu, Hong Ma, Blake Meyers, Jongsun Park, Xinshuai Qi, Loïc Rajjou, Valerie Sarramegna, Ron Sederoff, Saima Shahid, Douglas E. Soltis, Pamela S. Soltis, Ying Hsuan Sun, Peter Ulvskov, Matthieu Villegente, Jia-Yu Xue, Ting-Feng Yeh, Xianxian Yu, Jixian Zhai

Population genomics: Juan J. Acosta, Victor A. Albert, W. Bradley Barbazuk, Riva A. Bruenn, Alex de Kochko, Claude W. dePamphilis, Joshua P. Der, Luis R. Herrera-Estrella, Enrique Ibarr-Laclette, Matias Kirst, James Leebens-Mack, Solon P. Pissis, Valerie Poncet, Stephan Schuster, Douglas E. Soltis, Pamela S. Soltis*, Lynn Tomsho

(Amborella Genome Project, 2013. Science)
78 authors in 44 institutes from 12 countries
In Science vol. 342 (Dec. 20, 2013)

• The Amborella Genome and the Evolution of Flowering Plants (Amborella Genome Project, 2013): 11 pages of Research Article and 119 pages of Supplementary Materials

• Related papers and reports in the same issue
- Assembly and Validation of the Genome of the Nonmodel Angiosperm Amborella (Chamala et al., 2013), Report
- Horizontal Transfer of Entire Genomes via Mitochondrial Fusion in the Angiosperm Amborella (Rice et al., 2013), Research Article
- Genomic Clues to the Ancestral Flowering Plants (Keith Adams, 2013), Perspective

“News” in Nature, New York Times, Discovery News, Sci-News

Why Amborella?

A sister to all other angiosperms! …

• Darwin’s Abominable Mystery: explosive evolution and diversification of angiosperms

“…the rapid rise and early diversification of angiosperms is an abominable mystery…” *

- Charles Darwin, letter to Joseph Dalton Hooker, 22 July 1879.

• Early branching angiosperm (sister to all other angiosperms):

Ranalian complex?

Amentiferae?

Magnolia? Drimys?

Ceratophyllum?

Debates on the most-basal angiosperm

1) STRAIGHT PHYLOGENETIC ANALYSIS

Soltis et al. (1999) Nature
- Selected 500 taxa
- Multi-gene analyses: rbcL+atpB+18S

[Figure: angiosperm phylogeny by order with example crops, running from the core eudicots through Trochodendrales, Sabiaceae, monocots, Ceratophyllales, Chloranthales, Winterales, Illiciaceae, Schisandraceae, and Austrobaileyaceae to Amborellaceae]

Debates on the most-basal angiosperm

2) NETWORK OF DUPLICATED GENES

[Figure: phylogeny of B-class MADS-box genes, showing the AP3 subfamily and the PI subfamily. Kim et al. (2004) AJB]

Debates on the most-basal angiosperm

3) Structure of B-class genes also supports that Amborella is a basal-most angiosperm

• End of debates?

Soltis et al., 1999; Mathews and Donoghue, 1999; Qiu et al., 1999; Soltis, …, Kim, 2000; Savolainen et al., 2000; Barkman et al., 2000; Zanis et al., 2002; Kuzoff and Gasser, 2000; Zanis et al., 2002; Borsch et al., 2003; Goremykin et al., 2003; Hilu et al., 2003; Soltis and Soltis, 2004; Soltis et al., 2004; Kim et al., 2004; Martin et al., 2005; Leebens-Mack et al., 2005; Goremykin et al., 2006; Moore et al., 2007; Jansen et al., 2007; Soltis et al., 2012; …

[Figure: angiosperm phylogeny by order with example crops, from the core eudicots through monocots, Ceratophyllales, Chloranthales, Magnoliales (nutmeg), Laurales (avocado), Piperales (black pepper), Winterales, Illiciaceae (star anise), Schisandraceae, Austrobaileyaceae, Nymphaeaceae, and Amborellaceae to gymnosperms]

Amborella genome: evolutionary reference of angiosperms?

RECENT GENOME STUDIES ON ANGIOSPERM EVOLUTION

• Summary of angiosperm phylogeny based on recent molecular phylogenetic studies (Soltis et al., 2012; Moore et al., 2007; Jansen et al., 2007)

• Published or downloadable genome sequences on GenBank. Genomes of FORTY angiosperm genera have been sequenced as of Dec. 2013.

[Figure: angiosperm phylogeny with sequenced genera mapped onto the major groups: Campanulids (Euasterids II), Lamiids (Euasterids I), Malvids (Eurosids II), Fabids (Eurosids I), core eudicots, basal eudicots, monocots, basal angiosperms, and extant gymnosperms. Genera labeled include Solanum, Capsella, Citrus, Cucumis, Arabidopsis, Gossypium, Theobroma, Glycine, Thellungiella, Carica, Brassica, Malus, Citrullus, Fragaria, Prunus, Pyrus, Cannabis, Medicago, Lotus, Cicer, Cajanus, Phaseolus, Populus, Ricinus, Manihot, Hevea, Linum, Vitis, Eucalyptus, Beta, Aquilegia, Oryza, Sorghum, Zea, Brachypodium, Phoenix, Musa, Setaria, and Amborella]

Amborella trichopoda
• Small
• Unisexual
• Parts arranged spirally
• Moderate number of parts
• Undifferentiated

[Figure: female and male flowers, labeled carpel, tepals, and staminode]

Amborella genome sequences: genome assembly and validation

• A highest-quality genome ever published in land plants, except Arabidopsis!

• Genome size: ~870 Mb

• Assembled: covering over 96% of the genome; 48 Gb of sequence data used
1) single-end (SE) 454-FLX
2) SE 454-FLX+
3) 11-kb paired-end (PE) 454-FLX
4) 3-kb PE Illumina HiSeq reads
5) 69,466 BAC-end sequences

• To evaluate the fidelity and the chromosomal positioning of scaffolds, FISH (fluorescence in situ hybridization) and optical chromosomal mapping (OpGen) were used.

Amborella Genome Project, 2013 Science

Major contents: 1) Angiosperm-wide Whole Genome Duplications (WGDs)

• WGDs are general events throughout the evolution and diversification of angiosperms.

• Genomic hexaploidization is confirmed in eudicots

Amborella Genome Project, 2013 Science

Gene duplication

At least three (alpha, beta, gamma) WGDs were confirmed during the analysis of the Arabidopsis genome (Bowers, 2003 Nature)

After the publication of the Vitis genome: Soltis, Bell, Kim, and Soltis (2008) Ann. N.Y. Acad. Sci.

Major contents: 1) Angiosperm-wide Whole Genome Duplications (WGDs)

Macrosynteny and microsynteny between genomic regions in Amborella and grape

Amborella Genome Project, 2013 Science

Major contents: 1) Angiosperm-wide Whole Genome Duplications (WGDs)

• Genomic hexaploidization is confirmed in eudicots

Amborella Genome Project, 2013 Science

Major contents: 2) Ancestral angiosperm gene contents

An ancestral gene set for angiosperms: at least 10,088 genes

Amborella Genome Project, 2013 Science

Major contents: 3) Gene family expansions in angiosperms: MADS-box genes, GSK3 genes, Storage Globulins, Terpene Synthase genes…

Amborella Genome Project, 2013 Science

Major contents: 4) Transposable Element content in Amborella

Amborella Genome Project, 2013 Science

Major contents: 5) Population Genomics and Conservation Implications

The genomes of 12 individuals of Amborella, sampled from nearly all known populations

Amborella Genome Project, 2013 Science

Flowers are the central identifying structure of angiosperms

Amborella as the reference for understanding the developmental molecular genetics of flower evolution

Our understanding of floral developmental genetics in “model” systems

ABC model (Coen and Meyerowitz, 1991): Genetic Control of Floral Organ Identity in Model Plants

Class:  A      A+B    B+C     C
Organ:  Sepal  Petal  Stamen  Carpel

Extended ABC model (Theissen, 2001)

Class:  A      A+B+E  B+C+E   C+E     D+E
Organ:  Sepal  Petal  Stamen  Carpel  Ovule
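The class-to-organ lookup of the classic ABC model can be sketched in a few lines of Python. This is only an illustration of the table on the slide: the names `ABC_MODEL`, `WILD_TYPE_WHORLS`, and `organ_identities` are hypothetical, and the sketch deliberately ignores the mutual antagonism between A and C classes, so it is only meaningful for the wild type and for loss of B function.

```python
# Classic ABC model as summarized on the slide: each combination of active
# gene classes in a whorl specifies one organ identity.
ABC_MODEL = {
    frozenset({"A"}): "Sepal",
    frozenset({"A", "B"}): "Petal",
    frozenset({"B", "C"}): "Stamen",
    frozenset({"C"}): "Carpel",
}

# Classes active in whorls 1 to 4 of a wild-type flower.
WILD_TYPE_WHORLS = [{"A"}, {"A", "B"}, {"B", "C"}, {"C"}]


def organ_identities(whorls, lost_class=None):
    """Return the organ predicted for each whorl, optionally removing one class.

    Simplification (assumption of this sketch): A/C antagonism is not modeled,
    so combinations outside the table are reported as "unspecified".
    """
    organs = []
    for active in whorls:
        remaining = frozenset(active) - {lost_class}
        organs.append(ABC_MODEL.get(remaining, "unspecified"))
    return organs


if __name__ == "__main__":
    print(organ_identities(WILD_TYPE_WHORLS))
    print(organ_identities(WILD_TYPE_WHORLS, lost_class="B"))
```

Removing class B turns whorls 2 and 3 into sepals and carpels, which is the kind of phenotype a B-class mutant such as Arabidopsis pi is used to test.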

The ‘quartet model’ of floral organ specification in Arabidopsis (modified from Theissen, 2001)

Gene classes:
A: AP1, AP2
B: AP3, PI
C: AG
D: STK/SHP1/SHP2
E: SEP1/SEP2/SEP3/SEP4

Protein quartets:
Sepals: AP1, AP1, SEP, SEP
Petals: AP1, SEP, AP3, PI
Stamens: AG, SEP, AP3, PI
Carpels: AG, AG, SEP, SEP
Ovules: STK, SHP, SEP, SEP

“MADS” search of basal angiosperms (Kim et al., 2005 Plant J)

[Figure: angiosperm phylogeny showing extant gymnosperms, Amborellaceae, Nymphaeaceae, Austrobaileyales, Canellales, Piperales, Magnoliales, Laurales, monocots, Ranunculales, Proteales, Buxaceae, Sabiaceae, Trochodendraceae, Gunnerales, Santalales, Saxifragales, Caryophyllales, rosids, and asterids, with basal angiosperms and eudicots marked. Species pictured: Amborella trichopoda, Nuphar advena, Illicium floridanum, Magnolia grandiflora, Eupomatia bennettii, Asimina longifolia, Persea americana]

Expression Studies

• Relative Quantitative RT-PCR
• Real-time PCR
• In situ Hybridization

• Broad expression of B-class genes (Am.tr.PI, Am.tr.AP3)

In situ hybridization
• Broad expression of B-class genes

female
                        Tepals              Carpels
Class    Gene           1st    2nd    3rd    4th
B-class  Am.tr.PI       +++    +++    +++    +
B-class  Am.tr.AP3.1    +++    +++    ++     -
B-class  Am.tr.AP3.2    +++    +++    -      -
C-class  Am.tr.AG       -      +++    +++    -
E-class  Am.tr.AGL2     +++    +++    +++    -

RQ RT-PCR

• Evolution of expression patterns of floral MADS-box genes in angiosperms.
Color-filled floral part: strongly expressed
Empty floral part: not expressed/weakly expressed
Dashed organs: equivocal or uncertain

[Figure: expression of AP1 (A), AP3 and PI (B), and AG (C) homologues mapped onto floral parts of eudicots (Gerbera, Petunia, Antirrhinum, Arabidopsis, Silene, Ranunculus), monocots (Sagittaria, Asparagus, Tulipa, Oryza, Zea), magnoliids (Asarum, Magnolia, Asimina, Persea), Nuphar, and Amborella. Kim et al. (2005) Plant J]

Amborella MADS genes from genome sequences

Copy numbers of MADS-box genes in four representative species

Species                 Type I   Type II   Total   Reference
Amborella trichopoda    13       23        36      this study
Arabidopsis thaliana    57       50        107     (PARENICOVA et al. 2003)
Arabidopsis thaliana    60       45        105     this study
Oryza sativa            32       43        75      (ARORA et al. 2007)
Oryza sativa            27       37        64      this study
Populus trichocarpa     41       64        105     (LESEBERG et al. 2006)
Populus trichocarpa     39       45        84      this study

Amborella Genome Project, 2013 Science
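As a quick consistency check on the copy-number table, the Total column should equal Type I plus Type II in every row. The sketch below assumes the row layout reconstructed from the slide (the Arabidopsis thaliana label is inferred from the PARENICOVA et al. 2003 reference); the names `COPY_NUMBERS` and `inconsistent_rows` are hypothetical.

```python
# (species, Type I, Type II, Total, reference) as read from the slide.
# The Arabidopsis thaliana label is an inference from the cited reference.
COPY_NUMBERS = [
    ("Amborella trichopoda", 13, 23, 36, "this study"),
    ("Arabidopsis thaliana", 57, 50, 107, "PARENICOVA et al. 2003"),
    ("Arabidopsis thaliana", 60, 45, 105, "this study"),
    ("Oryza sativa", 32, 43, 75, "ARORA et al. 2007"),
    ("Oryza sativa", 27, 37, 64, "this study"),
    ("Populus trichocarpa", 41, 64, 105, "LESEBERG et al. 2006"),
    ("Populus trichocarpa", 39, 45, 84, "this study"),
]


def inconsistent_rows(rows):
    """Return (species, reference) for rows where Type I + Type II != Total."""
    return [
        (species, reference)
        for species, type_i, type_ii, total, reference in rows
        if type_i + type_ii != total
    ]


if __name__ == "__main__":
    # An empty list means every Total matches its two components.
    print(inconsistent_rows(COPY_NUMBERS))
```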

Phylogenetic relationships of Type I MADS-box genes in Amborella trichopoda and Arabidopsis thaliana. The tree was constructed using maximum likelihood in PhyML. Bootstrap values > 50% are shown.

Amborella Genome Project, 2013 Science

Phylogenetic relationships of Type II MADS-box genes in Amborella trichopoda, Arabidopsis thaliana, Populus trichocarpa, and Oryza sativa. The tree was constructed using maximum likelihood in PhyML. Bootstrap values > 50% are shown.

Amborella Genome Project, 2013 Science

Identified 36 Amborella MADS-box genes

Type | Putative subfamily name | Gene name | Protein ID | Location | Cufflinks coverage | Source
type II | AP1 | Am.tr.AP1 | scaffold00047.105 | scaffold00047_3064751_3090556 | full coverage, 200-3000 | this study
type II | AP3 | Am.tr.AP3.1 | scaffold00001.225 | scaffold00001_4348140_4349431 | full coverage, 1500-3800 | (KIM et al. 2004); EST
type II | AP3 | Am.tr.AP3.2 | scaffold00066.97 | scaffold00066_1372288_1376738 | full coverage, 1000-3000 | (KIM et al. 2004); EST
type II | PI | Am.tr.PI.1 | scaffold00017.226 | scaffold00017_5901058_5901961 | full coverage, 2000-6000 | (KIM et al. 2004); EST
type II | PI | Am.tr.PI.2 | scaffold00089.36 | scaffold00089_907931_909820 | full coverage, 100-190 | this study
type II | AG | Am.tr.AG | scaffold00021.296 | scaffold00021_6744499_6778414 | full coverage, 800-1700 | (KIM et al. 2005); EST
type II | STK | Am.tr.STK | scaffold00071.203 | scaffold00071_3440534_3462334 | full coverage, 350-721 | EST
type II | AGL2 | Am.tr.AGL2 | scaffold00047.121 | scaffold00047_3344579_3375581 | full coverage, 3000-6000 | (KIM et al. 2005); EST
type II | AGL9 | Am.tr.AGL9 | scaffold00013.53 | scaffold00013_1262781_1295413 | full coverage, 1000-2400 | (ZAHN et al. 2005); EST
type II | AGL6 | Am.tr.AGL6 | scaffold00001.413 | scaffold00001_8828935_8888309 | full coverage, 1900-4000 | (KIM et al. 2005); EST
type II | AGL12 | Am.tr.AGL12 | scaffold00071.216 | scaffold00071_3650392_3652966 | partial coverage, 40-89 | this study
type II | AGL15 | Am.tr.AGL15 | scaffold00053.185 | scaffold00053_4857033_4870850 | full coverage, 50-185 | this study
type II | TM3 | Am.tr.TM3 | scaffold00001.409 | scaffold00001_8708320_8773009 | full coverage, 1000-2100 | this study
type II | STMADS11 | Am.tr.StM11 | scaffold00127.17 | scaffold00127_599439_654192 | full coverage, 1500-4000 | EST
type II | OsMADS32 | Am.tr.OsM32 | scaffold00001.226 | scaffold00001_4397899_4400065 | full coverage, 1000-2700 | this study
type II | TM8 | Am.tr.TM8.1 | scaffold00013.57 | scaffold00013_1472673_1472858 | full coverage, 900-1900 | this study
type II | TM8 | Am.tr.TM8.2 | scaffold00013.60 | scaffold00013_1578058_1662108 | full coverage, 50-320 | EST
type II | ANR1 | Am.tr.ANR1.1 | scaffold00109.2 | scaffold00109_29724_30613 | partial coverage, 30-64 | this study
type II | ANR1 | Am.tr.ANR1.2 | scaffold00046.134 | scaffold00046_3995369_4044817 | full coverage, 100-660 | this study
type II | GMM13 | Am.tr.GMM13.1 | scaffold00001.461 | scaffold00001_10158193_10163424 | full coverage, 10-20 | this study
type II | GMM13 | Am.tr.GMM13.2 | scaffold00002.466 | scaffold00002_8048974_8049195 | partial coverage, 1-2 | this study
type II | MIKC*-P | Am.tr.MP | scaffold00010.504 | scaffold00010_8389805_8396126 | full coverage, 5-179 | this study
type II | MIKC*-S | Am.tr.MS | scaffold00010.217 | scaffold00010_4213059_4213797 | partial coverage, 5-15 | this study
type I | α | Am.tr.MA1 | scaffold00140.17 | scaffold00140_402687_403790 | full coverage, 5-19 | this study
type I | α | Am.tr.MA2 | scaffold00159.17 | scaffold00159_495293_495928 | partial coverage, 1-4 | this study
type I | α | Am.tr.MA3 | scaffold00159.19 | scaffold00159_504687_505322 | full coverage, 1-17 | this study
type I | α | Am.tr.MA4 | scaffold00159.18 | scaffold00159_499712_500182 | partial coverage, 1-6 | this study
type I | α | Am.tr.MA5 | scaffold00050.92 | scaffold00050_4694781_4695569 | full coverage, 5-34 | this study
type I | α | Am.tr.MA6 | scaffold00025.385 | scaffold00025_5897664_5898831 | partial coverage, 1-3 | this study
type I | β | Am.tr.MB1 | scaffold00025.268 | scaffold00025_3949692_3950678 | full coverage, 2-36 | this study
type I | β | Am.tr.MB2 | scaffold00025.271 | scaffold00025_3987163_3987890 | unavailable | this study
type I | β | Am.tr.MB3 | scaffold00176.19 | scaffold00176_386806_387873 | full coverage, 5-72 | this study
type I | β | Am.tr.MB4 | scaffold00022.374 | scaffold00022_6148497_6149633 | unavailable | this study
type I | β | Am.tr.MB5 | scaffold00116.30 | scaffold00116_1677969_1678568 | full coverage, 2-50 | this study
type I | β | Am.tr.MB6 | scaffold00017.X# | scaffold00017_3178608_3179039 | full coverage, 2-16 | this study
type I | γ | Am.tr.MC | scaffold00095.150 | scaffold00095_2797522_2798208 | partial coverage, 1-5 | this study

Amborella Genome Project, 2013 Science
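The Location strings in the gene table appear to follow a scaffold_start_end pattern. Under that assumption (it is an interpretation of the table, not something the slide states), a minimal Python sketch can split the field and group genes by scaffold to spot physically close copies. The names `LOCATIONS`, `parse_location`, and `genes_by_scaffold` are hypothetical, and only five rows of the table are included.

```python
from collections import defaultdict

# A few Location values copied from the gene table (gene name -> location).
LOCATIONS = {
    "Am.tr.AP1": "scaffold00047_3064751_3090556",
    "Am.tr.AGL2": "scaffold00047_3344579_3375581",
    "Am.tr.MA2": "scaffold00159_495293_495928",
    "Am.tr.MA3": "scaffold00159_504687_505322",
    "Am.tr.MA4": "scaffold00159_499712_500182",
}


def parse_location(location):
    """Split 'scaffold_start_end' into (scaffold, start, end).

    Assumes the last two underscore-separated fields are integer coordinates.
    """
    scaffold, start, end = location.rsplit("_", 2)
    return scaffold, int(start), int(end)


def genes_by_scaffold(locations):
    """Group genes by scaffold, sorted by start coordinate within a scaffold."""
    grouped = defaultdict(list)
    for gene, location in locations.items():
        scaffold, start, end = parse_location(location)
        grouped[scaffold].append((start, end, gene))
    return {scaffold: sorted(genes) for scaffold, genes in grouped.items()}


if __name__ == "__main__":
    for scaffold, genes in genes_by_scaffold(LOCATIONS).items():
        print(scaffold, [gene for _, _, gene in genes])
```

With these rows, the three type I α genes on scaffold00159 sort next to each other within about 10 kb, which is the kind of pattern one would inspect when looking for tandem duplicates.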

Expression of newly detected MADS-box genes from the Amborella genome

[Figure: gel with lanes TE(F), TE(M), SN+SD, CA, and LE for Am.tr.GLO.2, Am.tr.SQUA, and ACTIN]

Relative quantitative RT-PCR results of newly identified key floral MADS-box genes from the genome study (26 PCR cycles). Expression of ACTIN was compared with that of the new genes in each tissue. TE(M), tepals in male flowers; TE(F), tepals in female flowers; SN, stamens; SD, staminodes; CA, carpels; LE, leaves.


• Evolution of expression patterns of floral MADS-box genes in angiosperms.
Color-filled floral part: strongly expressed
Empty floral part: not expressed/weakly expressed
Dashed organs: equivocal or uncertain

[Figure: expression of AP1 (A), AP3 and PI (B), and AG (C) homologues mapped onto floral parts of eudicots, monocots, magnoliids, Illicium, Nuphar, and Amborella. Kim et al. unpublished]

Finding additional MADS-box genes from genome sequences changed our understanding of the ancestral expression pattern in angiosperms.

The ‘quartet model’ of floral organ specification in Arabidopsis (modified from Theissen, 2001)

[Figure: quartet model diagram with gene classes A (AP1, AP2), B (AP3, PI), C (AG), D (STK/SHP1/SHP2), and E (SEP1/SEP2/SEP3/SEP4) and the protein quartets for sepals, petals, stamens, carpels, and ovules]

Conserved aa residues in the K-domain of angiosperm PI- and AP3-homologues, except Amborella

PI-homologues
PI.Arabidopsis          EIDRIKKENDSLQLELRHLEN
LtPI.Liriodendron       EVERIKKENDSMQIKLRHLEN
CsPI.Chloranthus        ELDRIKKENDSMQIELRHLEN
CfPI.1.Calycanthus      EVERIKKENDSMQIKLRHLEN
Pe.am.PI.Persea         EVERIKKENDSMLIKLRHLEN
Il.pa.PI.Illicium       EVDRVRKENESMQIELKHLEN
Nu.ad.PI.Nuphar         ELDRIRKENENMQIELRHFEN
Nu.ve.PI.Nuphar         ELDRIRKENENMQIELRHFEN
Am.tr.PI.Amborella      EVDRMKKDNNEQMRIELRHLR

AP3-homologues
Am.tr.AP3-1.Amborella   DLGNLKEEESNRLRKLIRQKR
Am.tr.AP3-2.Amborella   ELSSLKEENNRLEN QKLIRQK
Nu.ve.AP3.1.Nuphar      EFNKLKEKNERLRRSIRQRN R
Nu.ve.AP3.2.Nuphar      EFNKLKEKNERLRKSIRQRN R
Il.pa.AP3.Illicium      ELNKLKEENNKLR-KIRQRN R
CsAP3.Chloranthus       YFEKLKETNNKLRKEIRQRN R
CfAP3.1.Calycanthus     HLSKLTEDNNRLRREIRQRN R
Pe.am.AP3.Persea        HLNKLKDDNNKLRREIRQRN R
EbAP3.1.Eupomatia       HLNKLKEDNNNLRREIRQRN R
AP3.Arabidopsis         TKRKLLETNRNLRTQIKQRN R
…
GGM2.Gnetum             ELIKERRENEKLRSKLRYM

Kim et al. (2004) AJB

Similar C-domain sequences between Amborella AP3 and PI

[Figure: alignment of the K-domain and C-domain (exons 4 to 7) of AP3- and PI-homologues, marking the PI motif and the PI-derived motif. Sequences shown: CpPI.Caltha, Illicium PI*, Am.tr.AP3.Amborella, EbPI.Eupomatia, TcAP3.Tacca, CfAP3.1.Calycanthus, MfPI.Michelia, SLM2.Silene, CsPI.Chloranthus, Am.tr.PI.Amborella, DEF.Antirrhinum, CsAP3.Chloranthus, HPDEF1., AePI.Asarum, MASAKO.BP.Rosa, CfPI.1.Calycanthus, AeAP3.1.Asarum, Il.pa.AP3.Illicium, AP3.Arabidopsis, Nu.ad.PI.Nuphar, MfAP3.Michelia, Nu.va.AP3-2.Nuphar, HmTM6., SmPI.Sagittaria, GLO.Antirrhinum, Eu.be.AP3-1.Eupomatia, Nu.va.AP3-1.Nuphar, Pe.am.PI.Persea, Nu.va.PI.Nuphar, Pe.am.AP3.Persea, ScAP3.Sanguinaria, OrcPI.Orchis, FBP1.Petunia, PI.Arabidopsis. Kim et al. (2004) AJB]
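To make "conserved residues" concrete, the sketch below recomputes which columns are identical across the eight non-Amborella PI-homologue K-domain segments quoted on the slide. It assumes those 21-residue segments are already aligned column by column, as they appear to be; the names `PI_K_DOMAIN` and `conserved_columns` are hypothetical. The Amborella rows are left out because the extracted text may have lost their gap characters.

```python
# K-domain segments of PI-homologues as quoted on the slide (Amborella omitted).
PI_K_DOMAIN = {
    "PI.Arabidopsis": "EIDRIKKENDSLQLELRHLEN",
    "LtPI.Liriodendron": "EVERIKKENDSMQIKLRHLEN",
    "CsPI.Chloranthus": "ELDRIKKENDSMQIELRHLEN",
    "CfPI.1.Calycanthus": "EVERIKKENDSMQIKLRHLEN",
    "Pe.am.PI.Persea": "EVERIKKENDSMLIKLRHLEN",
    "Il.pa.PI.Illicium": "EVDRVRKENESMQIELKHLEN",
    "Nu.ad.PI.Nuphar": "ELDRIRKENENMQIELRHFEN",
    "Nu.ve.PI.Nuphar": "ELDRIRKENENMQIELRHFEN",
}


def conserved_columns(sequences, variable="."):
    """Return a string with the residue where all sequences agree, else `variable`.

    Assumes equal-length, pre-aligned sequences; raises ValueError otherwise.
    """
    seqs = list(sequences)
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else variable
        for column in zip(*seqs)
    )


if __name__ == "__main__":
    print(conserved_columns(PI_K_DOMAIN.values()))
```

Any candidate sequence, such as Am.tr.PI, could then be compared position by position against the returned pattern once its alignment is confirmed.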

“DEAER” motif in PI-homologues and AP3-homologues

Soltis, … Kim, et al., 2006 Adv. Bot. Res.

Expected protein quartets based on DNA sequence data

[Figure: expected quartets of B-class proteins (PI, AP3-1, AP3-2) in eudicots, monocots, and maybe most of basal angiosperms, compared with Amborella]

Yeast two-hybridization system

[Figure: yeast cell carrying a bait plasmid and a prey plasmid]
• Bait: fused to the DNA-binding domain (BD); plasmid marker TRP1
• Prey: fused to the activation domain (AD); plasmid marker LEU2
• Medium: -Trp/-Leu
• Reporter genes: ADE2, HIS3, lacZ, MEL1

Yeast two-hybridization system

[Figure: bait X fused to BD and prey Y fused to AD in one yeast cell]
• Medium: -Trp/-Leu/-His/-Ade
• Interaction between X and Y brings AD to BD; RNA polymerase then transcribes the reporter genes (ADE2, HIS3, lacZ, MEL1)

Y2H

[Figure: interaction matrix. Columns: AtPI, AtAP3, GGM2, Am.tr.AG, sAm.tr.AP3.1, sAm.tr.AP3.2, Am.tr.PI, Am.tr.AP3.1, Am.tr.AP3.2, Am.tr.AGL6, Am.tr.AGL2, sAm.tr.AGL9, sAm.tr.PI. Rows: sAm.tr.PI, sAm.tr.AP3.1, sAm.tr.AP3.2, Am.tr.PI, Am.tr.AP3.1, Am.tr.AP3.2, Am.tr.AGL6, Am.tr.AG, Am.tr.AGL2, sAm.tr.AGL9, AtPI, AtAP3, GGM2]

Y2H in Kim lab (Korea)

ΔC indicates the truncation of the C-terminal regions of Am.tr.AGL9 and Am.tr.AGL6, because the binding-domain vector constructs of the full-length candidate genes auto-activated the expression of reporter genes.
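The selection logic behind the yeast two-hybrid slides, including why an auto-activating bait gives a false positive, can be sketched as a small truth table in Python. This is a simplified model based only on the markers and media named on the slides; the function names and the `bait_autoactivates` flag are hypothetical, and real screens involve controls not modeled here.

```python
# Genes a yeast cell must express to grow on each dropout medium.
MEDIA = {
    "-Trp/-Leu": {"TRP1", "LEU2"},
    "-Trp/-Leu/-His/-Ade": {"TRP1", "LEU2", "HIS3", "ADE2"},
}

REPORTERS = {"ADE2", "HIS3", "lacZ", "MEL1"}


def expressed_genes(bait_plasmid, prey_plasmid, interaction, bait_autoactivates=False):
    """Return the marker and reporter genes expressed in one yeast cell.

    Simplified assumptions: the BD (bait) plasmid carries TRP1, the AD (prey)
    plasmid carries LEU2, and reporters fire when bait and prey interact or
    when the bait construct activates transcription on its own.
    """
    genes = set()
    if bait_plasmid:
        genes.add("TRP1")
    if prey_plasmid:
        genes.add("LEU2")
    reporters_on = bait_plasmid and (
        bait_autoactivates or (prey_plasmid and interaction)
    )
    if reporters_on:
        genes |= REPORTERS
    return genes


def grows_on(medium, **yeast):
    """True if the cell expresses every gene the medium selects for."""
    return MEDIA[medium] <= expressed_genes(**yeast)


if __name__ == "__main__":
    cell = dict(bait_plasmid=True, prey_plasmid=True, interaction=False)
    print(grows_on("-Trp/-Leu", **cell))
    print(grows_on("-Trp/-Leu/-His/-Ade", **cell))
    print(grows_on("-Trp/-Leu/-His/-Ade", bait_autoactivates=True, **cell))
```

In this model a non-interacting pair grows only on -Trp/-Leu, while an auto-activating bait also grows on the quadruple-dropout medium, which is why such constructs were truncated (ΔC) before testing.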

Y2H in Kong lab (China)

Amborella Genome Project, 2013 Science

Amborella Genome Project, 2013 Science

• Diverse PPI patterns occurred after gene duplication.
• This has contributed to innovations in the regulatory network for reproductive organ development and to the origin of the flower.

Amborella Genome Project, 2013 Science

[Figure panels: EMSA (electrophoretic mobility shift assays); Y2H]

Y2H in Theissen lab (Germany)

Melzer, … Kim, … et al., 2014 Annals of Botany

Conclusion
• The Amborella genome provides a unique reference for inferring the genome content and structure of the MRCA of living angiosperms.
• WGDs have led to the origin and diversification of angiosperms.
• The power of genome sequencing in the field of gene family studies is well represented in the Amborella genome, specifically in the case of MADS-box transcription factors.

Ongoing Research
Am.tr.PI.1 DOESN’T recover the Arabidopsis pi phenotype!!!
[Figure: wild type, Arabidopsis pi, and Am.tr.PI.1 panels]

Thanks to:
- Amborella Genome Project Group: D. Soltis, P. Soltis, C. dePamphilis, H. Ma, H. Kong…
- Other members of the Plant Molecular Phylogeny Lab in Sungshin Univ.

2014, Happy New Year

Amborella trichopoda Baill.
Designed by Sangtae Kim