Capture of a Functionally Active Methyl-CpG Binding Domain by an Arthropod Retrotransposon Family
Downloaded from genome.cshlp.org on September 23, 2021 - Published by Cold Spring Harbor Laboratory Press

Research

Capture of a functionally active methyl-CpG binding domain by an arthropod retrotransposon family

Alex de Mendoza,1,2 Jahnvi Pflueger,1,2 and Ryan Lister1,2

1 Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, The University of Western Australia, Perth, Western Australia, 6009, Australia; 2 Harry Perkins Institute of Medical Research, Perth, Western Australia, 6009, Australia

The repressive capacity of cytosine DNA methylation is mediated by recruitment of silencing complexes by methyl-CpG binding domain (MBD) proteins. Despite MBD proteins being associated with silencing, we discovered that a family of arthropod Copia retrotransposons have incorporated a host-derived MBD. We functionally show how retrotransposon-encoded MBDs preferentially bind to CpG-dense methylated regions, which correspond to transposable element regions of the host genome, in the myriapod Strigamia maritima. Consistently, young MBD-encoding Copia retrotransposons (CopiaMBD) accumulate in regions with higher CpG densities than other LTR-retrotransposons also present in the genome. This would suggest that retrotransposons use MBDs to integrate into heterochromatic regions in Strigamia, avoiding potentially harmful insertions into host genes. In contrast, CopiaMBD insertions in the spider Stegodyphus dumicola genome disproportionately accumulate in methylated gene bodies compared with other spider LTR-retrotransposons. Given that transposons are not actively targeted by DNA methylation in the spider genome, this distribution bias would also support a role for MBDs in the integration process. Together, these data show that retrotransposons can co-opt host-derived epigenome readers, potentially harnessing the host epigenome landscape to advantageously tune the retrotransposition process.
[Supplemental material is available for this article.]

Cytosine DNA methylation is a base modification associated with gene and transposable element repression in animals (Zemach and Zilberman 2010; Schübeler 2015; Deniz et al. 2019). Methylation is deposited by DNA methyltransferases on CpG dinucleotides (CpGs), but despite DNA methyltransferases being deeply conserved across animal genomes (Lyko 2018), there is extensive variability regarding the genome methylation levels and distribution between lineages. In vertebrates, there is widespread high methylation across the genome, mostly only absent from CpG island promoters and active regulatory regions (Schübeler 2015). In contrast, in invertebrates, methylation is “mosaic,” concentrated on active gene bodies and, in some instances, on transposable elements (Suzuki and Bird 2008). However, some invertebrate lineages have lost DNA methylation, such as Drosophila melanogaster and Caenorhabditis elegans, whereas others have lost methylation only on transposable elements, including most insects and crustaceans (Bewick et al. 2017; Gatzmann et al. 2018).

Critical to the function of cytosine DNA methylation are methylation “readers,” proteins capable of binding and interpreting the methylation state and subsequently altering the transcriptional output or chromatin environment (Law and Jacobsen 2010; Zhu et al. 2016). The major family of methylation readers are the methyl-CpG binding domain (MBD) proteins (Bogdanović and Veenstra 2009; Du et al. 2015). The 70-amino-acid-long MBD is responsible for binding to methylated CpGs, whereas most MBD family members have additional protein domains that can recruit silencing complexes (Du et al. 2015). Not all MBD family members are able to bind methylated cytosines, despite encoding an MBD; for instance, SETDB and BAZ2 chromatin remodelers or the mammalian MBD3 ortholog prefer unmethylated cytosines (Hendrich and Tweedie 2003). With the exception of MBD4, which is known for its role in DNA repair after methylated cytosine deamination, all other MBD family members are highly associated with gene repression and heterochromatin formation (Bogdanović and Veenstra 2009; Du et al. 2015). Thus, cytosine methylation and heterochromatin formation by MBD proteins together constitute one of the main defense mechanisms that the host genome possesses to silence and control transposable elements (Levin and Moran 2011; Deniz et al. 2019).

In turn, transposable elements are engaged in a continual arms race with their hosts. To proliferate, transposons must develop strategies to escape from silencing mechanisms and targeting by the host. Among transposable elements, there are two main types depending on their replication strategy: DNA transposons, which are excised and copied in the genome as DNA, and retrotransposons, which have an intermediate RNA step before retrotranscription to DNA and integration (Wicker et al. 2007; Bourque et al. 2018). Retrotransposons are further divided into two main types: long terminal repeat (LTR) retrotransposons, which possess repetitive sequences flanking the retrotransposon, and long interspersed nuclear elements (LINEs), which lack LTRs. Autonomous LTR-retrotransposons require at least a reverse transcriptase (RT) and an integrase (INT) to replicate by themselves; however, additional protein domains might influence the retrotransposition process. Here, we report how a family of retrotransposons has benefited from integrating an MBD into their coding sequence, challenging our views on MBD function and retrotransposon evolution.

Corresponding authors: [email protected], [email protected]

Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.243774.118.

© 2019 de Mendoza et al. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

Genome Research 29:1–10. Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/19; www.genome.org

Results

While profiling the evolution of genes involved in DNA methylation in animal genomes, we serendipitously discovered that the centipede Strigamia maritima (Chipman et al. 2014) encoded hundreds of MBD-containing proteins (Fig. 1A), in contrast to most animal genomes that encode between one and 10 MBD family members. Some of the Strigamia MBD-containing gene models also presented typical retrotransposon domains, such as INTs and RTs, specifically the RVT_2 domain characteristic of Copia retrotransposons (Wicker et al. 2007). By performing a de novo annotation of repetitive elements in the Strigamia genome, we confirmed that 98% of the MBD gene models were not host genes belonging to conserved gene families but were in fact in open reading frames (ORFs) belonging to retrotransposons. Some of the copies displayed well-conserved LTRs typical of Copia retrotransposons; thus, we called this new type of retrotransposon CopiaMBDs (Fig. 1B).

[Figure 1 panels: (A) cladogram of animal species (Drosophila melanogaster, Tribolium castaneum, Daphnia pulex, Strigamia maritima, Centruroides sculpturatus, Stegodyphus mimosarum, Crassostrea gigas, Capitella teleta, Homo sapiens, Nematostella vectensis) against DNMT1, DNMT3, and the MBD-containing gene families MBD4/MECP2, MBD1/2/3, BAZ2A/B, SETDB1/2, MBD-Fbox, and CopiaMBD; (B) CopiaMBD structure, shown for Strigamia maritima SMAR00078 (5,767 bp; LTRs flanking ORF1, ORF2/Gag, ORF3/Pol) and Stegodyphus mimosarum KK115746:15,345-20,254 (4,909 bp; LTRs flanking ORF1/Pol), with domains AP, gag-pre, rve, MBD, RVT_2, and RNase_H; (C) MBD phylogeny; (D) MBD alignment colored by amino acid type (positive charge, negative charge, polar, hydrophobic) and annotated with mCpG binding (yes, no, unknown).]

Figure 1. A family of arthropod Copia retrotransposons have incorporated an MBD into their coding sequence. (A) Cladogram showing a subset of animal species, taxonomic affiliation, and the presence/

To test whether this retrotransposon family was specific to the centipede Strigamia, we used the MBD sequence to scan for its presence in other animal genomes. The only genomes in which we identified similarity hits were those of the spiders Stegodyphus mimosarum (Sanggaard et al. 2014) and Stegodyphus dumicola (Liu et al. 2019), whereas it was not detected in other arachnid, pancrustacean, or myriapod genomes (Fig. 1A). We then asked whether CopiaMBDs in Strigamia and Stegodyphus evolved through recruiting MBDs independently or whether the MBD capture occurred once and was then vertically inherited. To test this, we built a phylogenetic tree of eukaryotic RTs belonging to Copia retrotransposons (Supplemental Fig. S1), confirming that all CopiaMBD are monophyletic and thus share a common ancestor that already encoded