
ARTICLES https://doi.org/10.1038/s41477-018-0349-9 Genome structure and evolution of Antirrhinum majus L Miaomiao Li 1,2,8, Dongfen Zhang1,8, Qiang Gao3,8, Yingfeng Luo4,8, Hui Zhang1,2,8, Bin Ma3,8, Chunhai Chen5,8, Annabel Whibley6, Yu’e Zhang1, Yinghao Cao3, Qun Li1, Han Guo1,2, Junhui Li1,2, Yanzhai Song1,2, Yue Zhang1,2, Lucy Copsey6, Yan Li3, Xiuxiu Li2,3, Ming Qi3, Jiawei Wang7, Yan Chen5, Dan Wang5, Jinyang Zhao5, Guocheng Liu5, Bin Wu5, Lili Yu5, Chunyan Xu5, Jiang Li5, Shancen Zhao5, Yijing Zhang7, Songnian Hu 2,4, Chengzhi Liang 2,3*, Ye Yin 5*, Enrico Coen 6* and Yongbiao Xue 1,2,4* Snapdragon (Antirrhinum majus L.), a member of the Plantaginaceae family, is an important model for plant genetics and molecular studies on plant growth and development, transposon biology and self-incompatibility. Here we report a near-com- plete genome assembly of A. majus cultivar JI7 (A. majus cv.JI7) comprising 510 Megabases (Mb) of genomic sequence and containing 37,714 annotated protein-coding genes. Scaffolds covering 97.12% of the assembled genome were anchored on eight chromosomes. Comparative and evolutionary analyses revealed that a whole-genome duplication event occurred in the Plantaginaceae around 46–49 million years ago (Ma). We also uncovered the genetic architectures associated with complex traits such as flower asymmetry and self-incompatibility, identifying a unique duplication of TCP family genes dated to around 46–49 Ma and reconstructing a near-complete ψS-locus of roughly 2 Mb. The genome sequence obtained in this study not only provides a representative genome sequenced from the Plantaginaceae but also brings the popular plant model system of Antirrhinum into the genomic age. he genus Antirrhinum belongs to the family Plantaginaceae Here we report a near-complete genome assembly of A. majus. and includes about 20 species with the chromosome number We annotated 37,714 protein-coding genes on the basis of expres- Tof 2n = 2x = 16. Antirrhinum originated in Europe and is mainly sion and homology evidence. The assembly was generated by com- distributed in Europe, Asia and Africa around the Mediterranean bining whole-genome shotgun (WGS) sequencing of short reads coast. Different species in the genus Antirrhinum exhibit differences on the Illumina platform and single-molecule real time (SMRT) in flower colour, flower pattern, fragrance and flowering time; inter- long reads on the Pacific Biosciences (PacBio) platform. Most of specific hybridization has also been described. The genus exhibits the assembled sequences were anchored onto chromosomes to two major mechanisms that promote outcrossing: insect pollina- form eight pseudomolecules using a genetic map. Comparative tion (entomophily) and self-incompatibility1–3. The self-compatible analysis based on this sequence reveals that the Plantaginaceae and A. majus was domesticated as a garden ornamental over 2,000 years ago1. Solanaceae diverged from their most recent ancestor about 62 Ma, Antirrhinum has served as a model system for molecular and and that a whole-genome duplication (WGD) event occurred developmental genetics for the past three decades1,4. Several key flo- around 46–49 Ma. We found that the WGD contributed to the ral genes were first identified in Antirrhinum including founding evolution of the TCP gene family related to flower asymmetry in members of the MADS (DEFICIENS) and TCP (CYCLOIDEA) gene A. majus. We also analysed the near-complete genomic structure families, MYB genes controlling petal epidermal cell shape (MIXTA) of the pseudo (ψ) S-locus of A. majus of roughly 2 Mb, which con- or flower colour (ROSEA and VENOSA) and SLFs (S-Locus F-box) tained 102 genes including 37 SLF genes. The genome sequence controlling self-incompatibility5–12. Isolation and analysis of genes provided here will accelerate genomic and evolutionary studies in in Antirrhinum have been facilitated by the availability of endoge- this model species. nous active transposons1,13. For example, five transposable elements (Tam1, Tam2, Tam3, Tam4 and Tam11) 14–18 have contributed to the Results identification of floral homeotic genes. However, so far these studies Genome sequencing, assembly and annotation of A. majus. We have been carried out without the benefit of a genome sequence to sequenced a highly inbred Antirrhinum line (A. majus cv. JI7) provide an overall evolutionary and architectural context for these using a combination of Illumina short-read and PacBio long-read genes, transposons and traits. sequencing technologies. The genome size was estimated from 1State Key Laboratory of Plant Cell and Chromosome Engineering and National Center of Plant Gene Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China. 2University of Chinese Academy of Sciences, Beijing, China. 3State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China. 4Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China. 5BGI-Shenzhen, Shenzhen, China. 6John Innes Centre, Norwich, UK. 7National Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China. 8These authors contributed equally: Miaomiao Li, Dongfen Zhang, Qiang Gao, Yingfeng Luo, Hui Zhang, Bin Ma, Chunhai Chen *e-mail: [email protected]; [email protected]; [email protected]; [email protected] 174 NATURE PLANTS | VOL 5 | FEBRUARY 2019 | 174–183 | www.nature.com/natureplants NATURE PLANTS ARTICLES 50 8 40 Chr1 Chr 0 30 10 20 Table 1 | Statistics for the Antirrhinum genome and gene 20 VII 30 VI 10 40 annotation V 50 IV 0 60 50 III Estimate of genome size 520 Mb II 70 40 I 0 GC content 35.50% 30 10 Chr7 N50 length (contig) 0.73 Mb 20 20 Chr2 Longest contig 3.74 Mb 10 30 Total size of assembled contigs 510.20 Mb 0 40 50 N50 length (scaffold) 2.62 Mb 50 60 Longest scaffold 9.90 Mb 40 70 Total size of assembled scaffolds 511.70 Mb 30 6 0 Number of genes 37,714 Chr 20 10 Average gene length 3,166 bp 10 20 Gene density 73.95 Mb−1 0 3 30 Chr 70 Transcripts number 52,780 40 60 Average coding sequence length 1,036 bp 50 50 60 Average protein length 344 amino acids 40 0 10 30 5 Chr 20 Average exon length 245 bp 30 50 20 40 0 10 4 Chr Average intron length 314 bp Tandem repeat 13.03 Mb Fig. 1 | An overview of the genomic features of A. majus JI7. Roman numerals refer to: I, duplications of genomic paralogous sequences; II, guanine-cytosine (GC) content; III, simple sequence repeats; IV, gene To evaluate the assembled genome quality, we aligned 25,651 density; V, retroelement density; VI, recombination rate (deep colour expressed sequence tags (ESTs) of Antirrhinum downloaded from shows high recombination rates) and VII, eight Antirrhinum chromosomes National Center for Biotechnology Information (NCBI) nucleo- with physical distances including low copy number repetitive elements: tide database to the assembled genome, and found that 96.59% of telomere repeat TTTAGGG (green), 5S recombinant DNA (orange) and the ESTs could be mapped. Alignments between the assembled pericentromeric repeats CentA1 and CentA2 (purple). A ruler with marks genome and three sequenced Bacterial Artificial Chromosomes every 1 Mb is drawn on each chromosome. (BACs) indicated an average nucleotide accuracy of 99.65% in the assembly. BUSCO23 analysis showed 93.88% complete genes at the genome mode and 93.40% at the protein mode, which suggested k-mer distributions to be about 520 Mb. We obtained 90.85 gigabases that the quality of the assembled Antirrhinum genome sequence (Gb) high-quality Illumina paired-end reads, equivalent to 174-fold was comparable to that of other published plant genomes (Petunia sequence coverage of the genome. We used CANU19 to correct and and Arabidopsis) (Supplementary Fig. 5–7 and Supplementary Data assemble 25.89 Gb PacBio reads into contigs and SSPACE20 for scaf- Set 2). Taken together, these results suggested that the A. majus folding with Mate-paired short reads. The assembled genome size genome assembly was both highly accurate and near completion. was 510 Mb with contig and scaffold N50 (the size above which 50% We predicted a total of 37,714 protein-coding genes with an of the total length of the sequence assembly can be found) sizes of average transcript length of 3,166 base pairs (bp) by using a com- 0.73 and 2.6 Mb, respectively (Fig. 1, Table 1, Supplementary Fig. 1 bination of ab initio and evidence-based methods24. We used and Supplementary Table 1–4). The mapping ratio of × 42.22 Illumina Antirrhinum EST sequences and RNA-seq data from six major data was 99.55% (Supplementary Table 5) and the coverage of contigs tissues: leaf, root, stem, stamen, pistil and pollen (Supplementary using PacBio data was 99.97%. The heterozygosity of the assembled Data Set 3) to confirm the expression of the genes. Approximately genome was estimated to be 51 single nucleotide polymorphs (SNPs) 89% of the genes were functionally annotated. The average gene per 1 Mb (0.0051%) by using Illumina sequencing data. density in Antirrhinum was one gene per 15.5 kilobase (kb), which To anchor the A. majus genome sequence to chromosomes, is about three times lower than Arabidopsis (one gene per 4.5 kb) we created linkage maps by re-sequencing 48 recombinant inbred and slightly higher than tomato (one gene per 25.7 kb). Genes were lines (RILs) derived from A. majus crossed to the self-incompati- distributed unevenly, being more abundant towards the ends of ble species A. charidemi. We identified a total of 4,523,444 homo- the chromosomal arms (Fig.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages16 Page
-
File Size-