bioRxiv preprint doi: https://doi.org/10.1101/2020.01.13.905158; this version posted January 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Short Title: Comparison of Nicotiana plastid genomes

Plastid genomics of Nicotiana (Solanaceae): insights into molecular evolution, positive selection and the origin of the maternal genome of Aztec tobacco (Nicotiana rustica)

Furrukh Mehmood1, Abdullah1, Zartasha Ubaid1, Iram Shahzadi1, Ibrar Ahmed2, Mohammad Tahir Waheed1, Péter Poczai*3, Bushra Mirza*1

1Department of Biochemistry, Quaid-i-Azam University, Islamabad, Pakistan
2Alpha Genomics Private Limited, Islamabad, Pakistan
3Finnish Museum of Natural History (Botany Unit), University of Helsinki, Helsinki, Finland
*Corresponding authors: Bushra Mirza ([email protected]), Péter Poczai ([email protected])

Abstract

Species of the genus Nicotiana (family Solanaceae), commonly referred to as tobacco plants, are cultivated as garden ornamentals. Besides their use in the worldwide production of tobacco leaves, they are also used as evolutionary model systems because of their complex development history, which is complicated by polyploidy and hybridization. Here, we assembled the plastid genomes of five tobacco species, namely N. knightiana, N. rustica, N. paniculata, N. obtusifolia and N. glauca.
The de novo assembled tobacco plastid genomes showed the typical quadripartite structure, consisting of a pair of inverted repeat (IR) regions (25,323–25,369 bp each) separated by a large single-copy (LSC) region (86,510–86,716 bp) and a small single-copy (SSC) region (18,441–18,555 bp). Comparative analyses showed that the GC content, gene content, codon usage, simple sequence repeats, oligonucleotide repeats, RNA editing sites and substitutions of the Nicotiana plastid genomes were similar to those of currently available Solanaceae genome sequences. We identified twenty highly polymorphic regions, mostly belonging to intergenic spacer (IGS) regions, which could be appropriate for the development of robust and cost-effective markers to infer the phylogeny of the genus Nicotiana and the family Solanaceae. Our comparative plastid genome analysis revealed that the maternal parent of the tetraploid N. rustica was the common ancestor of N. paniculata and N. knightiana, and that the latter species is more closely related to N. rustica. The relaxed molecular clock analyses estimated that the speciation event between N. rustica and N. knightiana occurred 0.56 Ma (HPD 0.65–0.46). The biogeographical analysis showed a south-to-north range expansion and diversification for N. rustica and related species, where N. undulata and N. paniculata evolved in North/Central Peru, while N. rustica developed in Southern Peru and separated from N. knightiana, which adapted to the Southern coastal climatic regimes. We further inspected selective pressure on protein-coding genes among tobacco species to determine whether this adaptation process affected the evolution of plastid genes. These analyses indicated that four genes involved in different plastid functions, such as DNA replication (rpoA) and photosynthesis (atpB, ndhD and ndhF), came under positive selective pressure as a result of specific environmental conditions.
Genetic mutations in these genes might have contributed to survival and better adaptation during the evolutionary history of tobacco species.

Key words: Nicotiana, Chloroplast genome, Substitution and InDels, Mutational hotspots, substitutions, positive selection.

1. Introduction

The plant family Solanaceae consists of 98 genera and ~2,700 species (Olmstead et al., 2008; Olmstead & Bohs, 2007). This megadiverse family ranges from herbaceous annual species to perennial trees, with a natural distribution ranging from deserts to rainforests (Knapp et al., 2004). Nicotiana L. is the fifth largest genus in the family, comprising 76 species, which were subdivided into three subgenera and fourteen sections by Goodspeed (1954). The subgenera of Nicotiana, as proposed by Goodspeed (1954), were not monophyletic (Aoki & Ito, 2000; Chase et al., 2003), but most of Goodspeed's sections were natural groups. The formal classification of the genus has been refined to reflect the growing body of evidence on Nicotiana and now consists of thirteen sections (Knapp, Chase & Clarkson, 2004). Most Nicotiana species are diploid (2n = 2x = 24), although allopolyploid species have also been reported (Leitch et al., 2008). Phylogenetic studies have shown that these allopolyploids were formed 0.2 million (N. rustica L. and N. tabacum L.) to more than 10 million years ago (species of sect. Suaveolentes) (Clarkson et al., 2004; Leitch et al., 2008). Cultivated tobacco (N.
tabacum L.), commonly grown for its leaves and an important economic and agricultural crop around the world (Occhialini et al., 2016), is a natural amphiploid derived from two progenitors (Smith, 1974). Nicotiana species, especially N. tabacum, are also used as model organisms in plant sciences and genetics (Zhang et al., 2011). The first complete chloroplast genome sequence was also published for this species (Shinozaki et al., 1986). Since the publication of this sequence, the structure and composition of chloroplast genomes have become widely used to identify unique genetic changes and the evolutionary relationships of various groups of plants, while plastid genes have also been linked with important crop traits such as yield and resistance to various pests and pathogens (Jin & Daniell, 2015). Chloroplasts (cp) are large double-membrane organelles with a genome size of 75–250 kb (Palmer, 1985). Chloroplast proteins are involved not only in photosynthesis but also in the synthesis of fatty acids and amino acids (Cooper, 2000). Angiosperm plastomes commonly contain ~130 genes, with up to 80 protein-coding, 30 transfer RNA (tRNA), and four ribosomal RNA (rRNA) genes (Daniell et al., 2016). The plastid genome exists in circular and linear forms (Oldenburg & Bendich, 2015), and the percentage of each form varies within plant cells (Oldenburg & Bendich, 2016). Circular plastomes have a typical quadripartite structure, with two inverted repeat regions (IRa and IRb) separated by one large single-copy (LSC) and one small single-copy
(SSC) region (Palmer, 1985; Amiryousefi, Hyvönen & Poczai, 2018a; Abdullah et al., 2019b). Numerous mutational events occur in plastid genomes: variations in tandem repeats, insertions and deletions (indels), and point mutations; inversions and translocations are also common (Jheng et al., 2012; Xu et al., 2015; Abdullah et al., 2019a). The plastid genome of angiosperms has a uniparental maternal inheritance (Daniell, 2007), but paternal inheritance has been recorded in a few gymnosperm species (Neale & Sederoff, 1989). The conserved organization of the plastid genome makes it extremely useful in exploring phylogenetic relationships at various taxonomic levels (Ravi et al., 2008). Polymorphism in the chloroplast genome has been exploited to solve taxonomic issues, to infer phylogeny and to investigate species adaptation to their natural habitats (Daniell et al., 2016). Genes in the plastid genome encode proteins and several types of RNA molecules, which play a vital role in functional plant metabolism and can consequently undergo selective pressures. Most plastid protein-coding genes are under purifying selection to maintain their function, while positive selection might act on some genes in response to environmental changes. Complete plastid genome sequences are also useful tools in population genetics (Ahmad, 2014), species barcoding (Nguyen et al., 2017), transplastomics (Waheed et al., 2011, 2015) and the conservation of endangered species (Wambugu et al., 2015). Here, we assembled the plastid genomes of five Nicotiana species and compared their sequences to gain insight into the chloroplast genome structure of the genus Nicotiana. We also inferred the phylogenetic relationships of the genus Nicotiana and investigated the selection pressures acting on protein-coding genes, then identified mutational hotspots in