Novel Insights into Tree Biology and Genome Evolution as Revealed Through Genomics
David B. Neale,1 Pedro J. Martínez-García,1 Amanda R. De La Torre,1 Sara Montanari,1 and Xiao-Xin Wei2

1Department of Plant Sciences, University of California, Davis, California 95616; email: [email protected]
2State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China

Annu. Rev. Plant Biol. 2017. 68:457–83
First published online as a Review in Advance on February 6, 2017
The Annual Review of Plant Biology is online at plant.annualreviews.org
https://doi.org/10.1146/annurev-arplant-042916-041049
Copyright © 2017 by Annual Reviews. All rights reserved
PP68CH17-Neale ARI 6 April 2017 10:21
Annu. Rev. Plant Biol. 2017.68:457-483. Downloaded from www.annualreviews.org. Access provided by INSEAD on 04/06/18. For personal use only.

Keywords
woody plants, genome size, transposable elements, perennialism, adaptation, fruit quality

Abstract
Reference genome sequences are the key to the discovery of genes and gene families that determine traits of interest. Recent progress in sequencing technologies has enabled a rapid increase in genome sequencing of tree species, allowing the dissection of complex characters of economic importance, such as fruit and wood quality and resistance to biotic and abiotic stresses. Although the number of reference genome sequences for trees lags behind those for other plant species, it is not too early to gain insight into the unique features that distinguish trees from nontree plants.
Our review of the published data suggests that, although many gene families are conserved among herbaceous and tree species, some gene families, such as those involved in resistance to biotic and abiotic stresses and in the synthesis and transport of sugars, are often expanded in tree genomes. As the genomes of more tree species are sequenced, comparative genomics will further elucidate the complexity of tree genomes and how this relates to traits unique to trees.

Contents
INTRODUCTION
  Tree Genomes Sequenced and Published to Date
  Chronology and Sequencing and Assembly Strategies
NONCODING DNA CONTENT AND GENOME SIZE VARIATION IN TREES AND THE EVOLUTIONARY MECHANISMS RESPONSIBLE FOR THESE DIFFERENCES
  Genome Size Variation
  Whole-Genome Duplication
  Noncoding DNA Content
  Identification of Novel Classes of Noncoding RNAs
GENES, GENE FAMILIES, AND EXPRESSION PATTERNS THAT UNDERLIE THE PERENNIAL HABIT AND ADAPTATION TO THE ENVIRONMENT IN TREES
  The Perennial Habit: Genes Associated with the Floral Transition, Bud Dormancy, and Woody Growth
  Adaptation to the Environment: Abiotic Stress
  Biotic Stress
GENES, GENE FAMILIES, AND EXPRESSION PATTERNS THAT UNDERLIE FRUIT DEVELOPMENT AND FRUIT QUALITY
  Fruit Development and Ripening
  The Metabolism of Sugars
  Flavor: Key Features for Fruit Quality
  Fruit Characteristics Associated with Beneficial Effects in the Human Diet
CONCLUSIONS
INTRODUCTION

The fundamental importance and applied value of a reference genome sequence for any organism are widely recognized. In plants, the first reference genome sequence was obtained for the model plant Arabidopsis thaliana (5). The first tree genome to be sequenced, and just the third plant genome, was that of black cottonwood (Populus trichocarpa Torr. & Gray), which was published in 2006 (108); in the decade since then, another 40 tree genomes have been sequenced and published (Table 1). Several recent reviews have included genome sequencing of tree species as well as broader developments in tree genomics (17, 28, 87, 94).

The purpose of this review is to chronicle the history of reference genome sequencing in tree species by first reviewing the species sequenced and the technologies applied, and then focusing on three specific areas of plant biology where new knowledge has been gained that might not otherwise have been obtained from the reference sequences of nontree genomes. The three areas we discuss are (a) the noncoding and repetitive DNA content of tree versus nontree species, which might account for the large sizes of some tree genomes and the distinct characteristics of long-lived, perennial organisms; (b) genes, gene families, and expression patterns that underlie the perennial growth habit and biotic and abiotic adaptations to the environment; and (c) genes, gene families, and expression patterns that underlie edible fruit development and quality.

Eudicots (true dicots): a monophyletic clade that includes most of the dicot flowering plants; also called tricolpates

We have chosen to use a very broad definition of tree that includes perennial plants with an elongated stem as well as all seed plants (eudicots, monocots, and gymnosperms). We also
Table 1 Genomic resources in tree species

Family | Species | Common name | Genome size (Mb) | 2n | Sequencing strategy | Sequencing technology | Assembly N50 scaffold (kb) | Assembly software | Reference(s)
Actinidiaceae | Actinidia chinensis | Kiwifruit | 758 | 58 | WGS | Illumina | 646.8 | ALLPATHS-LG and GapCloser | 53
Arecaceae | Elaeis guineensis | African oil palm | 1,800 | 32 | WGS and BAC pool | Roche 454 and Sanger | 1,270 | Newbler (Roche GS De Novo Assembler) | 102
Arecaceae | Phoenix dactylifera | Date palm | 658a | 36 | WGS | Illumina | 9.3 (excluding scaffolds of <500 base pairs) | SOAPdenovo | 2
Arecaceae | Phoenix dactylifera | Date palm | 671 | 36 | WGS and BAC | Roche 454 and SOLiD | 329.9 | Newbler (Roche GS De Novo Assembler) and BioScope | 3
Betulaceae | Betula nana | Dwarf birch | 450 | 28 | WGS | Illumina | 18.7 | SOAPdenovo-63mer | 113
Caricaceae | Carica papaya | Papaya | 372 | 18 | WGS | Sanger | NA | Arachne | 79
Euphorbiaceae | Hevea brasiliensis | Rubber tree | 2,150b | 36 | WGS | Illumina, Roche 454, and SOLiD | 3 | Newbler (Roche GS De Novo Assembler) | 97
Euphorbiaceae | Hevea brasiliensis | Rubber tree | 2,150c | 36 | WGS | Illumina and PacBio | 67.2 | Platanus and PBJelly2 | 66
Euphorbiaceae | Hevea brasiliensis | Rubber tree | 2,150c | 36 | WGS and BAC | Illumina | 1,280 | SOAPdenovo and SSPACE | 105
Euphorbiaceae | Jatropha curcas | Barbados nut | 410 | 22 | Pooled BAC; WGS | Sanger; Illumina and Roche 454 | 3.8 | PCAP.REP, MIRA | 98
Fagaceae | Juglans regia | Persian walnut | 606 | 32 | WGS | Illumina | 465 | SOAPdenovo and MaSuRCA | 75
Fagaceae | Quercus robur | Pedunculate oak | 740 | 24 | WGS and BAC | Illumina, Roche 454, and Sanger | 260 | Newbler (Roche GS De Novo Assembler), SSPACE, and GapCloser | 93
Malvaceae | Theobroma cacao | Cacao | 430 | 20 | WGS | Illumina, Roche 454, and Sanger | 473.8 | Newbler (Roche GS De Novo Assembler) | 6
Meliaceae | Azadirachta indica | Neem tree | 364c | 30 | WGS | Illumina, IonTorrent, and Sanger | 452 | SOAPdenovo | 65
Moraceae | Morus notabilis | Mulberry | 357c | 14 | WGS | Illumina | 390.1 | SOAPdenovo | 47
Musaceae | Musa acuminata (A genome) | Banana | 523 | 22 | WGS (DH) | Roche 454 and Sanger | 1,300 | Newbler (Roche GS De Novo Assembler) | 25
Musaceae | Musa balbisiana (B genome) | Banana | 438d | 22 | WGS | Illumina | 467 | CLC Genomics Workbench | 27
Myrtaceae | Eucalyptus camaldulensis | Red river gum | 650e | 22 | WGS and BAC | Roche 454 and Sanger | NA | CABOG | 49
Myrtaceae | Eucalyptus grandis | Flooded gum | 640 | 22 | WGS and BAC end | Sanger | L50 = 53,900f | Arachne and Rebuilder | 81
Oleaceae | Olea europaea | Olive tree | 1,400–1,500 | 46 | WGS | Illumina and Roche 454 | 1.5 | CLC Genomics Workbench and Minimus2 assembler | 8
Pinaceae | Picea abies | Norway spruce | 19,600 | 24 | WGS (1n) and pooled fosmid (2n) | Illumina | 4.9 | CLC Assembly Cell, BESST, and GAM-NGS | 85
Pinaceae | Picea glauca | White spruce | 20,000b | 24 | WGS (2n) | Illumina | 20.3 | ABySS | 13
Pinaceae | Picea glauca | White spruce | 20,000b | 24 | WGS (2n) | Illumina | NG50 = 83.0g | ABySS | 115
Pinaceae | Pinus lambertiana | Sugar pine | 31,000 | 24 | WGS (1n and 2n) | Illumina | 246.6 | SOAPdenovo and MaSuRCA | 104
Pinaceae | Pinus taeda | Loblolly pine | 22,000 | 24 | WGS (1n and 2n) | Illumina | 66.9 | MaSuRCA | 84, 116, 128
Rhamnaceae | Ziziphus jujuba | Jujube | 444c | 24 | WGS and BAC-to-BAC | Illumina | 301.1 | SOAPdenovo and SSPACE | 71
Rosaceae | Malus × domestica | Apple | 750 | 34 | WGS | Roche 454 and Sanger | L50 = 1.5f | Programs developed at Myriad Genetics | 109
Rosaceae | Prunus mume | Mei | 280c | 16 | WGS | Illumina | 577.8 | SOAPdenovo | 126
Rosaceae | Prunus persica | Peach | 265 | 16 | WGS (DH) | Sanger | 26.8 | Arachne | 111
Rosaceae | Pyrus communis | European pear | 600 | 34 | WGS | Roche 454 | 88.1 | Newbler (Roche GS De Novo Assembler) | 18
Rosaceae | Pyrus × bretschneideri | Chinese pear | 527 | 34 | WGS and BAC-to-BAC | Illumina | 540.8 | SOAPdenovo and SSPACE | 119
Rubiaceae | Coffea canephora | Coffee | 710 | 22 | WGS (DH); BAC | Illumina and Roche 454; Sanger | 1,261 | Newbler (Roche GS De Novo Assembler) and GapCloser | 30
Rutaceae | Citrus clementina | Clementine mandarin | 301h | 18 | WGS, fosmid, and BAC (1n) | Sanger | L50 = 31,400f | Arachne | 118
Rutaceae | Citrus sinensis | Sweet orange | 367 | 18 | WGS | Roche 454 and Sanger | L50 = 250.5f | Newbler (Roche GS De Novo Assembler) | 118
Rutaceae | Citrus sinensis | Sweet orange | 367 | 18 | WGS (DH) | Illumina | 1,690 | SOAPdenovo and Opera | 120
Salicaceae | Populus euphratica | Desert poplar | 593c | 38 | WGS and fosmid pool | Illumina | 482 | SOAPdenovo and SSPACE | 74
Salicaceae | Populus trichocarpa | Black cottonwood | 485 ± 10c | 38 | WGS and BAC | Sanger | 3.1 | Jazz | 108
Salicaceae | Salix | Purple | 429 | 38 | WGS | Illumina and Roche 454 | 925.0 | Newbler (Roche GS De | 26