Bewick et al. Genome Biology (2017) 18:65
DOI 10.1186/s13059-017-1195-1

RESEARCH Open Access

The evolution of CHROMOMETHYLASES and gene body DNA methylation in plants

Adam J. Bewick1, Chad E. Niederhuth1, Lexiang Ji2, Nicholas A. Rohr1, Patrick T. Griffin1, Jim Leebens-Mack3 and Robert J. Schmitz1*

Abstract

Background: The evolution of gene body methylation (gbM), its origins, and its functional consequences are poorly understood. By pairing the largest collection of transcriptomes (>1000) and methylomes (77) across Viridiplantae, we provide novel insights into the evolution of gbM and its relationship to CHROMOMETHYLASE (CMT) proteins.

Results: CMTs are evolutionarily conserved DNA methyltransferases in Viridiplantae. Duplication events gave rise to what are now referred to as CMT1, 2 and 3. Independent losses of CMT1, 2, and 3 in eudicots, and of CMT2 and ZMET in monocots and monocots/commelinids, together with variation in copy number and non-neutral evolution, suggest overlapping or fluid functional evolution of this gene family. DNA methylation within genes is widespread and is found in all major taxonomic groups of Viridiplantae investigated. Genes enriched with methylated CGs (mCG) were also identified in species sister to angiosperms. The proportion of genes and DNA methylation patterns associated with gbM are restricted to angiosperms with a functional CMT3 or ortholog. However, mCG-enriched genes in the gymnosperm Pinus taeda shared some similarities with gbM genes in Amborella trichopoda. Additionally, gymnosperms and ferns share a CMT homolog closely related to CMT2 and 3. Hence, the dependency of gbM on a CMT most likely extends to all angiosperms and possibly gymnosperms and ferns.

Conclusions: The resulting gene family phylogeny of CMT transcripts from the most diverse sampling of plants to date redefines our understanding of CMT evolution and its evolutionary consequences on DNA methylation.
Future functional tests of homologous and paralogous CMTs will uncover novel roles and consequences for the epigenome.

Keywords: CHROMOMETHYLASE, Phylogenetics, DNA methylation, Whole-genome bisulfite sequencing, WGBS

* Correspondence: [email protected]
1Department of Genetics, University of Georgia, Athens, GA 30602, USA
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

DNA methylation is an important chromatin modification that protects the genome from selfish genetic elements, is important for proper gene expression, and is involved in genome stability. In plants, DNA methylation is found at cytosines (C) in three sequence contexts: CG, CHG, and CHH (H is any nucleotide except G). A suite of distinct de novo and maintenance DNA methyltransferases establish and maintain DNA methylation at these three sequence contexts, respectively. CHROMOMETHYLASES (CMTs) are an important class of plant-specific DNA methylation enzymes, which are characterized by the presence of a CHRomatin Organisation MOdifier (CHROMO) domain between the cytosine methyltransferase catalytic motifs I and IV [1]. Identification, expression, and functional characterization of CMTs have been extensively performed in the model plant Arabidopsis thaliana [2–4] and in the model grass species Zea mays [5–7].

There are three CMT genes encoded in the A. thaliana genome: CMT1, CMT2, and CMT3 [2, 8–10]. CMT1 is the least studied of the three, as a handful of A. thaliana accessions contain an Evelknievel retroelement insertion or a frameshift mutation that truncates the protein, which suggests that CMT1 is non-essential [8]. The majority of DNA methylation at CHH sites (mCHH) within long transposable elements in pericentromeric regions of the genome is targeted by a CMT2-dependent pathway [3, 4]. Allelic variation at CMT2 has been shown to alter genome-wide levels of mCHH, and alleles of CMT2 may play a role in adaptation to temperature [11–13]. In contrast, DNA methylation at CHG sites (mCHG) is often maintained by CMT3 through a reinforcing loop with histone H3 lysine 9 di-methylation (H3K9me2) catalyzed by the KRYPTONITE (KYP)/SU(VAR)3-9 HOMOLOG 4 (SUVH4), SUVH5, and SUVH6 lysine methyltransferases [2, 6, 14, 15]. In Z. mays, ZMET2 is a functional homolog of CMT3 and catalyzes the maintenance of mCHG [5–7]. A paralog of ZMET2, ZMET5, contributes to the maintenance of mCHG to a lesser degree in Z. mays [5, 7]. Homologous CMTs have been identified in other flowering plants (angiosperms) [16–19], in early diverging land plants (Embryophyta), namely the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii, and in the green algae (Chlorophyta) Chlorella sp. NC64A and Volvox carteri [20]. The function of CMTs in non-angiosperms is poorly understood. However, in at least P. patens a CMT protein contributes to mCHG [21].

Within angiosperms, a large number of genes exclusively contain CG DNA methylation (mCG) in the transcribed region and show a depletion of mCG at both the transcriptional start and stop sites (referred to as "gene body DNA methylation" or "gbM") [22–25]. Genes with gbM are generally constitutively expressed, evolutionarily conserved, and typically longer than unmethylated genes [25–27]. How gbM is established and subsequently maintained is unclear. Recently it was discovered that CMT3 has been independently lost in two angiosperm species belonging to the Brassicaceae family, Conringia planisiliqua and Eutrema salsugineum, and that this coincides with the loss of gbM [19, 25], indicating CMT3 is required for maintenance of DNA methylation. This has led to a hypothesis that the evolution of gbM is linked to incorporation/methylation of histone H3 lysine 9 di-methylation (H3K9me2) in gene bodies with subsequent failure of INCREASED IN BONSAI METHYLATION 1 (IBM1) to de-methylate H3K9me2 [19, 28]. This provides a substrate for CMT3 to bind and methylate DNA and, through an unknown mechanism, leads to mCG. Over evolutionary timescales, mCG is maintained by the CMT3-dependent mechanism and, during DNA replication, by the maintenance methyltransferase DNA METHYLTRANSFERASE 1 (MET1). Methylated DNA then provides a substrate for binding by KRYPTONITE (KYP) and related family members through their SRA domains, which increases the rate at which H3K9 is di-methylated [29]. Finally, mCG spreads throughout the gene over evolutionary time [19].

[...] the placement of non-seed plant CMTs is more closely related to CMT3 [21]. However, these studies were not focused on resolving phylogenetic relationships within the CMT gene family, but rather relationships of CMTs between a handful of species. These studies have without question laid the groundwork to understand CMT-dependent DNA methylation pathways and patterns in plants. However, the massive increase in transcriptome data from a broad sampling of plant species, together with advancements in sequence alignment and phylogenetic inference algorithms, has made it possible to incorporate thousands of sequences into a single phylogeny, allowing for a more complete understanding of the CMT gene family. Understanding the evolutionary relationships of CMT proteins is foundational for inferring the evolutionary origins, maintenance, and consequences of genome-wide DNA methylation and gbM.

Here we investigate phylogenetic relationships of CMTs at a much larger evolutionary timescale using data generated by the 1KP Consortium (www.onekp.com). In the present study, we have identified and analyzed 771 messenger RNA transcripts from de novo assembled transcriptomes and primary coding sequences from annotated genomes belonging to the CMT gene family. A final set of sequences was obtained from an extensive taxonomic sampling of 443 different species including eudicots (basal, core, rosid, and asterid), basal angiosperms, monocots and monocots/commelinids, magnoliids, gymnosperms (conifers, Cycadales, Ginkgoales), monilophytes (ferns and fern allies), lycophytes, bryophytes (mosses, liverworts and hornworts), and green algae. CMT homologs identified across Viridiplantae (land plants and green algae) indicate that CMT genes originated prior to the origin of Embryophyta (land plants) (≥480 million years ago [MYA]) [30–33]. In addition, phylogenetic relationships suggest at least two duplication events occurred within the angiosperm lineage, giving rise to the CMT1, CMT2, and CMT3 clades. In the light of CMT evolution, we explored patterns of genomic and genic DNA methylation levels in 77 species of Viridiplantae, revealing diversity of the epigenome within and between major taxonomic groups and the evolution of gbM in association with the origin of CMT3 and orthologous sequences.

Results

The origins of CHROMOMETHYLASES

CMTs are found in most major taxonomic groups of land plants and some algae: eudicots, basal angiosperms, monocots and commelinids, magnoliids, gymnosperms, ferns, lycophytes, mosses,