Stella modulates transcriptional and endogenous retrovirus
programs during maternal-to-zygotic transition
Yun Huang1,7, Jong Kyoung Kim2,6,7, Dang Vinh Do1, Caroline Lee1, Christopher A. Penfold1,
Jan J. Zylicz1, John C. Marioni2,3,5, Jamie A. Hackett1,4,8, M. Azim Surani1,8
1 Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis
Court Road, Cambridge CB2 1QN, UK; Department of Physiology, Development and
Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
2 European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
3 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA,
UK
4 Present Address: European Molecular Biology Laboratory (EMBL) - Monterotondo, via
Ramarini 32, 00015 Rome, Italy
5 Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE,
UK
6 Department of New Biology, DGIST, Daegu, 42988, Republic of Korea
7 Co-first authors
8 Co-corresponding authors [email protected] [email protected]
Abstract

The maternal-to-zygotic transition (MZT) marks the period when the embryonic genome is
activated and acquires control of development. Maternally inherited factors play a key role in
this critical developmental process, which occurs at the 2-cell stage in mice. We investigated
the function of the maternally inherited factor Stella (encoded by Dppa3) using
single-cell/embryo approaches. We show that loss of maternal Stella results in widespread
transcriptional misregulation and a partial failure of MZT. Strikingly, activation of
endogenous retroviruses (ERVs) is significantly impaired in Stella maternal/zygotic knockout
embryos, which in turn leads to a failure to upregulate chimeric transcripts. Amongst ERVs,
MuERV-L activation is particularly affected by the absence of Stella, and direct in vivo
knockdown of MuERV-L impacts the developmental potential of the embryo. We propose
that Stella is involved in ensuring activation of ERVs, which themselves play a potentially
key role during early development, either directly or through influencing embryonic gene
expression.

Introduction

Maternally inherited factors in the zygote play a critical role during early development
(Ancelin et al. 2016, Li et al. 2010, Wasson et al. 2016). The maternal-to-zygotic transition
(MZT) marks the time of transfer of developmental control to the embryo following activation
of the zygotic genome (Lee et al. 2014, Li et al. 2013). In mice, the major wave of zygotic
genome activation (ZGA) occurs at the late 2-cell stage (Golbus et al. 1973). The earliest
zygotically transcribed genes are preferentially enriched in essential processes such as
transcription, RNA metabolism and ribosome biogenesis (Hamatani et al. 2004, Xue et al.
2013). Other events that characterise early pre-implantation development include extensive
erasure of global DNA methylation and dynamic changes in histone modifications (Cantone
et al. 2013).

Previous studies have shown that Stella, encoded by the Dppa3 gene locus, is a maternally
inherited factor that is required for normal pre-implantation development (Bortvin et al. 2004,
Payer et al. 2003). Stella is a small basic protein with a putative SAP-like domain and a
splicing factor-like motif; it also contains nuclear localisation and export signals and has the
potential to bind to DNA and RNA in vitro (Nakamura et al. 2007, Payer et al. 2003) (Figure
1 – Figure Supplement 1A). Expression of Stella is high in oocytes, continues through
pre-implantation development, and subsequently occurs specifically in primordial germ cells.
Stella is also expressed in naïve embryonic stem cells, but is downregulated upon exit from
pluripotency (Hayashi et al. 2008). Lack of maternal inheritance of Stella results in
developmental defects; these manifest from the 2-cell stage onwards, and result in only a
small proportion of embryos developing to the blastocyst stage. Notably, a maternally
inherited pool of Stella in zygotic Stella knockout embryos, derived from Stella heterozygous
females, allows development to progress normally. This indicates that Stella has a key role
during the earliest developmental events (Payer et al. 2003).

Stella was suggested to protect maternal pronuclei (PN) from TET3-mediated active DNA
demethylation (Nakamura et al. 2007, Nakamura et al. 2012) (Figure 1 – Figure Supplement
1B). In the absence of Stella, zygotes display enrichment of 5hmC in both parental PN
(Wossidlo et al. 2011), and an aberrant accumulation of γH2AX in maternal chromatin
(Nakatani et al. 2015). In addition, Stella protects DNA methylation levels at selected
imprinted genes and transposable elements (Nakamura et al. 2007). However, embryos
depleted of maternal effect proteins known to regulate imprinted genes only exhibit
developmental defects post-implantation (Denomme et al. 2013), implying that the 2-cell
phenotype in Stella maternal/zygotic knockout (Stella M/Z KO) embryos is not primarily
linked with imprint defects. Moreover, TET3 only partially contributes to DNA demethylation
and its absence is compatible with embryonic development (Amouroux et al. 2016, Peat et al.
2014, Shen et al. 2014, Tsukada et al. 2015). Thus, what impairs pre-implantation
embryonic development in the absence of maternal Stella remains unclear.

A significant number of transposable elements (TEs) are preferentially activated during early
development and in a sub-population of mouse embryonic stem cells (Gifford et al. 2013,
Macfarlan et al. 2012, Rowe et al. 2011). Notably, at the 2-cell (2C) stage in mouse
development, there is selective upregulation of endogenous retroviruses (ERVs), which are a
subset of TEs characterised by the presence of LTRs that mediate expression and
retrotransposition (Kigami et al. 2003, Ribet et al. 2008). Growing evidence suggests that
activation of some TEs has important biological functions during early development (Beraldi
et al. 2006, Kigami et al. 2003). TEs can regulate tissue-specific gene expression or splicing
through their exaptation as gene regulatory elements, and may also play a key role during
speciation (Bohne et al. 2008, Gifford et al. 2013, Rebollo et al. 2012). TEs additionally drive
expression of genes directly by acting as alternative promoters that generate chimeric
transcripts, which include both TE and protein-coding sequences (Macfarlan et al. 2012,
Peaston et al. 2004). Thus, the expression of TEs may be functionally important during early
embryo development, and understanding the regulation of TEs themselves is therefore of
great interest.

We adopted an unbiased approach to investigate the role of Stella during the mouse
maternal-to-zygotic transition, using single-cell/embryo RNA-seq analysis of mutant embryos
(Deng et al. 2014, Xue et al. 2013, Yan et al. 2013). We find that Stella M/Z KO 2-cell embryos
fail to upregulate key zygotic genes involved in regulation of ribosome and RNA processing.
Furthermore, the absence of Stella results in widespread misregulation of TEs and of
chimeric transcripts that are derived from these TEs. In particular, Stella M/Z KO embryos
exhibit a general failure to upregulate MuERV-L transcripts and protein at the 2-cell stage.
Our perturbation data are consistent with MuERV-L playing a functionally important role during
pre-implantation embryonic development, implying that MuERV-L is amongst the critical
factors affected by Stella in early embryos.

Results

Stella M/Z KO embryos are transcriptionally distinct from wild type embryos
We used Stella knockout mice (Payer et al. 2003) to collect mutant oocytes and embryos for
single-cell/embryo RNA-seq, using a modified version of a published protocol (Tang et al.
2010). Stella knockout (KO) oocytes and Stella maternal/zygotic knockout (Stella M/Z KO, KO)
1-cell and 2-cell embryos, lacking both maternal and zygotic Stella, were harvested.
Strain-matched wild type (WT) oocytes and embryos served as controls (Figure 1A; Figure 1 –
Figure Supplement 2A). Following normalisation, potentially confounding technical factors
were controlled for using Removal of Unwanted Variation (RUV) (Risso et al. 2014) (Figure 1 –
Figure Supplement 2B, Figure 1 – source data 1, Supplementary File 1).

Principal component (PC) analysis revealed that the first PC represents progression from
oocyte to the 2-cell stage (Figure 1B). The largest separation is observed between
oocyte/1-cell and 2-cell embryos, consistent with the degradation of maternal transcripts and
activation of the zygotic genome at MZT. The second and third PCs capture the separation
between WT and KO samples, suggesting distinct genome-wide gene expression changes
between WT and KO embryos. Furthermore, although KO oocytes and 1-cell embryos
exhibit some separation from their WT counterparts, the difference is more clearly
pronounced at the 2-cell stage. Notably, Stella M/Z KO 1-cell and 2-cell embryos clustered
separately from each other, suggesting that Stella M/Z KO defects at the 2-cell stage are not
simply due to delayed developmental progression (Figure 1B). Differential gene expression
analysis between WT and KO samples identified 5881 misregulated genes (adjusted
P-value < 0.05) across the developmental stages, of which 360 were misregulated at all stages
(Figure 1C, Figure 1 – source data 2).

Stella M/Z KO embryos are impaired in maternal-to-zygotic transition
Next, we compared the differentially expressed genes (DEGs) against a public database of
early mouse embryonic transcriptomes (DBTMEE) (Park et al. 2015). This database
categorises genes by their stage-specific expression patterns. Strikingly, we
found significant enrichment of maternal transcripts (maternal RNA, minor zygotic genome
activation and 1-cell transient genes) and depletion of zygotic transcripts (major zygotic
genome activation, 2-cell transient and mid pre-implantation activation) in the Stella M/Z KO
2-cell embryos compared to WT (Figure 1D). While Stella KO oocytes are transcriptionally
distinct from WT (Figure 1B), Stella KO oocytes do not exhibit overt abnormalities (Bortvin et
al. 2004, Payer et al. 2003). Furthermore, the apparent enrichment in maternal transcripts
only manifests at the 2-cell stage, suggesting that the observed transcriptional differences are
not inherited from aberrant transcripts in the Stella KO oocyte but are likely linked to failed
downregulation of these maternal transcripts.

To independently characterise differences in gene expression dynamics between WT and
KO embryos during MZT, genes were clustered based on their pattern of expression in WT
oocytes, 1-cell and 2-cell embryos. We identified a cluster of maternal transcripts (ED),
defined as highly expressed in WT oocytes and 1-cell embryos and with reduced expression at
the 2-cell stage, that are differentially expressed in KO samples (n=849, adjusted p-value
<0.05) (Figure 1E, Figure 1 – source data 3). Of these genes, 37.7% did not show the expected
reduction in expression in Stella M/Z KO 2-cell embryos. Furthermore, we detected a group
of zygotically activated genes (ZAG) (EU) with low expression in WT oocytes and 1-cell
embryos and increased expression in 2-cell embryos, which are differentially expressed in
KO samples (n=698, adjusted p-value <0.05). Moreover, 39.5% of this group of genes
exhibited significantly dampened 2-cell activation in the absence of maternal and zygotic
Stella (Figure 1E, Figure 1 – source data 4). The combined analysis suggests that Stella M/Z
KO embryos exhibit partial impairment in maternal-to-zygotic transition.

Stella M/Z KO 2-cell embryos fail to upregulate essential ribosomal and RNA
processing genes
Gene ontology analysis of genes more highly expressed in KO than WT 2-cell embryos
reveals an enrichment of chromatin modifiers, and genes involved in microtubule-based
processes and response to DNA damage (Figure 2A, Figure 2 – source data 1), consistent
with a previous observation of abnormal γH2AX enrichment in maternal PN (Nakatani et al.
2015). In addition, downregulated genes in Stella M/Z KO 2-cell embryos are enriched for
RNA processing and ribosome biogenesis, with an overall depletion of genes associated
with ribosomes in the KEGG pathway and related gene ontology (Figure 2B, Figure 2 –
Figure Supplement 1). The findings from single-cell/embryo RNA-sequencing were validated
by independent qRT-PCR analyses. Consistently, genes associated with chromatin
modifiers (Rbbp7 and Kdm1b) and DNA damage (Fam175a and Brip1) were more highly
expressed in KO embryos (Figure 2C), while those associated with ribosome biogenesis
(Nop16, Emg1), RNA binding (Larp1b) and cell cycle regulation (Cdc25c) showed lower
expression in KO 2-cell embryos (Figure 2D). To explore potential regulatory targets of
Stella, a genome-wide co-expression network analysis was performed across the WT
dataset. Overall, 910 genes showed correlated expression patterns with Dppa3 (absolute
Pearson correlation > 0.9) (Figure 2 – source data 2). Of these, 710 genes are positively
correlated while 200 genes are negatively correlated with Dppa3. Genes with a positive
expression correlation with Dppa3 are significantly enriched for RNA processing and RNA
splicing (Figure 2E), including Snrpd1 and Snrpb2, both core components of the spliceosome
(Figure 2F).
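
The co-expression screen described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the pipeline used in this study: the expression
values, the gene names other than Dppa3 and the helper `coexpressed_with` are hypothetical,
and only the correlation threshold of 0.9 is taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_with(target, expr, threshold=0.9):
    """Return (positively, negatively) correlated genes with |r| > threshold."""
    ref = expr[target]
    positive, negative = [], []
    for gene, values in expr.items():
        if gene == target:
            continue
        r = pearson(ref, values)
        if r > threshold:
            positive.append(gene)
        elif r < -threshold:
            negative.append(gene)
    return positive, negative


# Hypothetical expression values across five WT samples (illustration only).
expr = {
    "Dppa3": [9.0, 8.5, 8.8, 3.1, 2.9],
    "geneA": [7.9, 7.6, 7.7, 2.0, 2.2],  # follows Dppa3
    "geneB": [1.0, 1.3, 1.1, 6.5, 6.9],  # mirrors Dppa3
    "geneC": [4.0, 5.2, 3.1, 4.9, 4.1],  # no clear relationship
}
positive, negative = coexpressed_with("Dppa3", expr)
```

Splitting the result by sign mirrors the way the correlated genes are reported above, as a
positively and a negatively correlated group.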

Overall, our analysis suggests that the Stella M/Z KO 2-cell embryos do not effectively
transition from maternal to zygotic control, as judged by the aberrant enrichment of maternal
transcripts and depletion of zygotic transcripts. Genes regulating essential processes such
as RNA processing are depleted in Stella M/Z KO 2-cell embryos and Dppa3 exhibits
positive expression correlation with genes enriched in these processes. Notably, however,
most genes do undergo appropriate MZT changes in mutant embryos, arguing against a
general developmental delay. Taken together, these results support a hypothesis whereby Stella
plays an important role in upregulating the expression of a subset of genes that are essential
during zygotic genome activation.

Expression of TEs is dysregulated in Stella KO oocytes and Stella M/Z KO embryos
Stella has been shown to affect the DNA methylation of particular TEs in zygotes (Nakamura
et al. 2007) and primordial germ cells (Nakashima et al. 2013). Since many TEs are
specifically activated during early development, we investigated their expression in WT and
KO oocytes and embryos. We remapped the single-cell/embryo RNA sequencing data to
obtain read counts for class I TEs (retrotransposons) (Figure 3 – source data 1-2). To get an
overview of the extent of TE activation, we calculated the number of reads mapped to
TEs as a fraction of total mapped reads. We found a dramatic increase in the fraction of reads
mapped to TEs, primarily of the LTR class, at the 2-cell stage compared to oocyte and 1-cell,
suggesting widespread TE activation coincident with MZT (Figure 3A, Figure 3B). Further
inspection revealed that activation of the LTR class is dominated by upregulation of ERVL
and ERVL-MaLR families, while LINE and SINE classes are primarily accounted for by L1
and Alu families, respectively (Figure 3 – Figure Supplement 1A). Moreover, principal
component analysis showed that developmental progression can be clearly captured by TE
expression alone along PC1 (Figure 3C), similar to recent studies identifying stage-specific
transcription initiation of ERV expression in early human embryos (Göke et al. 2015, Grow et
al. 2015). In addition, WT and KO oocytes and embryos can be partially separated in
individual PCAs at each of the developmental stages, signifying differences in
expression of TEs between WT and KO samples.
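
The TE read fraction used above is a simple ratio, sketched here for clarity. The sketch
assumes that per-class TE read counts and the total number of mapped reads are already
available for each sample; the function name `te_read_fraction` and all counts are
hypothetical and are not taken from the source data.

```python
def te_read_fraction(te_counts, total_mapped_reads):
    """Fraction of mapped reads assigned to TEs, overall and per TE class."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    per_class = {cls: n / total_mapped_reads for cls, n in te_counts.items()}
    return sum(per_class.values()), per_class


# Hypothetical read counts for two samples (illustration only).
samples = {
    "oocyte": ({"LTR": 20_000, "LINE": 15_000, "SINE": 5_000}, 1_000_000),
    "2-cell": ({"LTR": 150_000, "LINE": 20_000, "SINE": 10_000}, 1_000_000),
}
fractions = {
    name: te_read_fraction(counts, total)
    for name, (counts, total) in samples.items()
}
```

Reporting the per-class fractions alongside the overall value makes it possible to see which
class, here LTR in the toy data, accounts for a change between stages.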

Stella is associated with the expression of specific TEs
We identified ‘maternal TEs’, defined as TEs expressed higher in WT oocytes than 2-cell
embryos, and ‘zygotic TEs’, defined as TEs expressed higher in WT 2-cell embryos than
oocytes. To characterise the disparities between TE expression in WT and KO, we compared
differentially expressed TEs at the 2-cell stage with maternal and zygotic TEs (Figure 3D).
Strikingly, TEs upregulated in Stella M/Z KO 2-cells are significantly enriched for maternal
TEs while TEs downregulated in Stella M/Z KO 2-cells are associated with zygotic TEs.
Furthermore, Stella M/Z KO 2-cell embryos display lower activation of LTR elements,
specifically the endogenous retrovirus (LTR-ERVL) family, with small relative enrichment for
LINE and SINE classes (Figure 3E, Figure 3 – Figure Supplement 1B).

A particular element of interest is MuERV-L (Kigami et al. 2003, Ribet et al. 2008), which
encodes a canonical retroviral Gag and Pol, flanked by 5’ and 3’ LTRs (also known as
MT2_Mm). We observed a selective upregulation of MuERV-L-Int and MT2_Mm transcripts
at the 2-cell stage, but this was significantly reduced in Stella M/Z KO embryos (Figure 3F).
Many MuERV-L elements have an open reading frame and we therefore tested protein
levels using a MuERV-L Gag antibody. This antibody was validated by showing that staining
overlaps well with a MuERV-L reporter ESC line, 2C::tdTomato (Macfarlan et al. 2012), and
is specific to 2-cell embryos whilst not detected in metaphase II oocytes (Figure 3 – Figure
Supplement 1C). Strikingly, immunofluorescence (IF) staining showed a significantly lower
signal and loss of perinuclear localisation of MuERV-L Gag (Ribet et al. 2008) in Stella M/Z
KO 2-cell embryos compared to WT, implying that Stella influences both transcript and protein
levels of MuERV-L (Figure 3G).

To assess whether Stella directly binds to specific transcripts/TEs, we turned to ESCs, since
obtaining sufficient 2-cell embryos for this analysis is intractable. Chromatin
immunoprecipitation-seq (ChIP-seq) of HA-tagged Stella resulted in minimal peak calls
(n=56) (Supplementary File 2), implying that Stella does not efficiently bind DNA (including
MuERV-L) in ESCs. Consistently, neither overexpression nor knockdown of Dppa3 in mESCs
had a marked effect on the expression of MuERV-L (Figure 3 – Figure Supplement 2).
These results suggest that Stella-mediated control of MuERV-L is likely specific to the
unique chromatin/cellular context of early embryos. Notably, Stella has the potential to bind
both RNA and DNA and could thus affect TEs through multiple modes. Overall, our data
suggest that Stella plays a key role in regulating a subset of TEs specifically in 2-cell embryos,
which includes promoting strong activation of MuERV-L elements.

Expression of a subset of TEs is positively correlated with their nearest gene
Next we explored the relationship between TE and protein-coding gene expression in WT
and KO samples. First, we considered the intersection between differentially expressed
genes and genes that are neighbours of misregulated TEs at the 2-cell stage, and
discovered a highly significant overlap of 12% between the two populations (p = 3.19 × 10⁻⁸⁷)
(Figure 4A). Notably, a greater number of TEs downregulated in KO 2-cell embryos are
located within 10 kb of the transcriptional start site of zygotically activated genes (ZAG,
n=698) than expected by chance (p < 10⁻⁴) (Figure 4B). Altogether, this suggests that
perturbations in TE expression may be linked to neighbouring mRNA expression levels in
Stella M/Z KO embryos. To determine if this relationship is applicable genome-wide, we
performed global expression correlation analysis between a TE and its nearest gene. To
eliminate confounding factors, we excluded TEs that overlap directly with an exon. Notably,
we detected 387 TEs whose expression levels were positively correlated with their nearest
gene (Spearman’s correlation > 0.7) while 224 TEs were negatively correlated (Spearman’s
correlation < -0.7) (Figure 4 – Figure Supplement 1A). This relationship was validated using
single-embryo qRT-PCR at 4 genomic loci, which demonstrated a significant correlation
between the expression of a TE and its downstream gene (Figure 4C). These loci were
selected to represent a range of LTR class elements; RMER19A, ORR1A2-Int and MT2B
are further categorised into ERVK, ERVL-MaLR and ERVL families respectively.
Interestingly, 2 of the genes, Chka and Zfp54, have been implicated in regulation of early
embryonic development (Albertsen et al. 2010, Wu et al. 2008). Thus, the expression levels
of a subset of TEs are highly correlated with the expression of their nearest gene.
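
The correlation between a TE and its nearest gene described above can be illustrated with a
short sketch. It assumes that each TE has already been paired with its nearest gene and that
TEs overlapping an exon have been removed beforehand; the TE names, expression values and
the helper `classify_pairs` are hypothetical, and only the ±0.7 cut-offs come from the text.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def classify_pairs(pairs, cutoff=0.7):
    """Label each TE/nearest-gene pair as positive, negative or none."""
    labels = {}
    for name, (te_expr, gene_expr) in pairs.items():
        rho = spearman(te_expr, gene_expr)
        if rho > cutoff:
            labels[name] = "positive"
        elif rho < -cutoff:
            labels[name] = "negative"
        else:
            labels[name] = "none"
    return labels


# Hypothetical expression of three TEs and their nearest genes in five embryos.
pairs = {
    "TE_1": ([1, 5, 9, 14, 30], [2, 3, 8, 9, 40]),
    "TE_2": ([1, 5, 9, 14, 30], [50, 20, 9, 4, 1]),
    "TE_3": ([1, 5, 9, 14, 30], [3, 9, 1, 8, 4]),
}
labels = classify_pairs(pairs)
```

A rank-based measure is a natural choice here because it does not assume a linear
relationship between TE and gene expression levels.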

Whilst association does not equate to causality, there are a number of ways TEs have been
implicated in the regulation of gene expression (Elbarbary et al. 2016, Thompson et al.
2016). One possibility is the establishment of alternative splicing events, which create a
junction between an expressed TE and a downstream gene exon, thus leading to the
formation of a chimeric transcript (Peaston et al. 2004). Indeed, MuERV-L has been shown
to form such fusion transcripts with 307 genes at the 2-cell stage (Macfarlan et al. 2012).
Here, we find that Stella M/Z KO embryos with the lowest MuERV-L expression also express
lower levels of genes that make a chimeric junction with MuERV-L elements (Figure 4 –
Figure Supplement 1B). In contrast, the KO embryo with the highest MuERV-L expression
expresses higher levels of these genes. Importantly, we next identified a number of chimeric
transcripts that are expressed in 2-cell embryos, validated them by PCR and sequenced
across the chimeric junction (Figure 4D). Single-embryo qRT-PCR shows that most of these
chimeric transcripts are expressed at lower levels in KO compared to WT 2-cell embryos
(Figure 4E). These chimeric transcripts include Snrpc, which encodes a component of
the ribonucleoprotein required for the formation of the spliceosome, and the RNA binding
protein Rbm8a. Our data imply a link between activation of some TEs and the emergent
gene expression network, which is perturbed in Stella M/Z KO embryos due to altered TE
expression. This is, in part, mediated by TEs directly regulating gene expression through
chimeric transcripts.

MuERV-L plays a functional role during pre-implantation development
We propose that the dysregulation in TE expression might directly contribute to the
abnormal pre-implantation phenotype observed in Stella M/Z KO embryos. We focused on
assessing the function of MuERV-L elements, as they represent one of the most highly
activated TEs at the 2-cell stage (Kigami et al. 2003), as well as being amongst the most
significantly downregulated in Stella M/Z KO embryos (Figure 3F and Figure 3G). We
considered that attenuated MuERV-L activation in Stella M/Z KO embryos could in turn
disrupt activation of associated 2-cell transcripts driven by their LTRs (Figure 4 – Figure
Supplement 1B). Alternatively, MuERV-L mRNA/protein itself may also have a functional
role during early embryonic development.

We detected 583 copies of the full-length MuERV-L element, containing the complete gag
and pol genes, in the genome. This makes it experimentally challenging to completely
eliminate MuERV-L activation in early embryos. Nevertheless, we utilised siRNA to
knock down the expression of MuERV-L elements in 1-cell embryos and monitored the
effects on embryonic development (Figure 5A). We designed siRNA to target the consensus
sequence of MuERV-L from the Dfam database (Hubley et al. 2016) (Figure 5B).
Computational analysis indicates that the MuERV-L siRNA targets 80.5% of full-length MuERV-L
elements with a perfect match and 99.5% with ≤2 bp mismatches, while the scramble siRNA has
no targets with ≤2 bp mismatches. At 20 μM siRNA, we observed a slight decrease in
MuERV-L transcripts (Figure 5C) but a significant reduction in MuERV-L Gag protein
staining in 2-cell embryos (Figure 5D). These embryos exhibited a mild developmental delay
at 4 to 8-cell stage progression on day 2, with fewer embryos reaching the blastocyst stage
on day 4, and those that did were smaller and with fewer cells (Figure 5E). This suggests
that a modest reduction in MuERV-L at the 2-cell stage has a notable effect on
pre-implantation development.
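
The target coverage figures quoted above depend on counting, for each full-length element,
the best match to the siRNA target site. One way such a count could be made is sketched
below. This is an assumption about the general approach rather than the analysis performed
in this study: the sequences are invented, only one strand is scanned, and the helper names
are hypothetical.

```python
def min_mismatches(target_site, element):
    """Smallest Hamming distance between the site and any same-length window."""
    k = len(target_site)
    if len(element) < k:
        return None  # element too short to contain the site
    return min(
        sum(a != b for a, b in zip(target_site, element[i:i + k]))
        for i in range(len(element) - k + 1)
    )


def fraction_targeted(target_site, elements, max_mismatches):
    """Fraction of elements with a window within max_mismatches of the site."""
    hits = 0
    for seq in elements:
        d = min_mismatches(target_site, seq)
        if d is not None and d <= max_mismatches:
            hits += 1
    return hits / len(elements)


# Invented target site and element sequences (illustration only).
site = "GATTACAGG"
elements = [
    "TTGATTACAGGAA",  # contains a perfect match
    "TTGATTTCAGGAA",  # best window has one mismatch
    "TTGTTTTCAGCAA",  # best window has three mismatches
    "CCCCCCCCCCCCC",  # no close match
]
perfect = fraction_targeted(site, elements, 0)
within_two = fraction_targeted(site, elements, 2)
```

Applying the same function to a scrambled control site would give the corresponding
off-target estimate.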

There is nonetheless variability in MuERV-L expression at the 2-cell stage, as shown in
other studies (Ancelin et al. 2016), and we therefore speculated that a higher concentration
of siRNA might increase the repression efficiency of MuERV-L transcripts. Indeed, 80 μM
MuERV-L siRNA was more effective (Figure 5F, 5G), and in turn resulted in >90% of embryos
arresting at the 2-cell stage (Figure 5H), although some indirect effects at higher
concentrations of siRNA may contribute to the embryonic arrest. Notably, MuERV-L knockdown
resulted in repression of some chimeric transcripts previously characterised in 2-cell embryos
(Figure 4D, Figure 5 – Figure Supplement 1). These data collectively suggest that MuERV-L may
have a functional role in embryonic development.

Discussion
The maternal-to-zygotic transition is a complex process that entails extensive global
transcriptional and epigenetic changes. Maternally inherited factors such as Stella play a role
in this transition in mouse oocytes but the difficulty of obtaining sufficient material has
hindered investigations on how they contribute to MZT. Here, we use single-cell/embryo
approaches and find that deletion of maternal and zygotic Stella affects MZT. Furthermore,
we observed a failure to activate endogenous retroviruses (LTR-ERVL), and specifically
MuERV-L elements. In parallel there is an accumulation of maternal transcripts and a failure
to upregulate many zygotic transcripts. These events appear to be partially coupled, with
altered LTR promoter activity leading to changes in gene expression, for example through
the modulation of chimeric transcripts. In addition, we reveal the biological relevance of TE
expression, as knockdown of MuERV-L at the 2-cell stage hinders developmental
progression, which likely contributes to the phenotype observed following loss of Stella.

Maternal mRNA degradation and zygotic genome activation are critical aspects of MZT
(Li et al. 2013). Maternal mRNA clearance plays both a permissive and instructive role
(Tadros et al. 2009), while ZGA also promotes the final degradation of maternal transcripts.
Stella has the potential to function as an RNA binding protein, which might facilitate the
degradation of maternal transcripts (Nakamura et al. 2007, Payer et al. 2003). In addition,
Stella KO oocytes exhibit transcriptional differences compared to WT oocytes. This is
consistent with a previous study showing that Stella KO oocytes are defective in transcriptional
repression (Liu et al. 2012), which may be needed for dosage regulation of critical maternally
inherited transcripts. However, Stella has also been demonstrated to be required beyond
oogenesis (Nakamura et al. 2007). Maternally inherited factors likely influence the timing and
specificity of gene expression during ZGA, which is essentially determined by interplay
between the transcriptional machinery and histone modifications that influence chromatin
accessibility (Lee et al. 2014).

The failure of zygotic transcript activation may be linked to the failure of zygotic TE activation.
We detected extensive changes in global TE expression during MZT. This is accompanied
by strong upregulation of the LTR-ERVL family in 2-cell embryos, which is consistent with a
recent study showing that the chromatin surrounding ERVL is in a highly accessible state at
the 2-cell stage (Wu et al. 2016). LTR-ERVL upregulation is significantly reduced in the
absence of maternal Stella, suggesting that Stella may facilitate the activation of many ERVs.
The expression changes are validated at the protein level for a specific LTR-ERVL element,
MuERV-L. Importantly, the decrease in MuERV-L is unlikely to be a secondary effect of
2-cell embryonic arrest, as maternal Kdm1a (LSD1) mutant embryos, which undergo pan 2-cell
arrest, maintain normal MuERV-L transcript expression (Ancelin et al. 2016). Notably, whilst
many epigenetic mechanisms have been identified for the suppression of ERVs (Gifford et al.
2013), we have identified a factor associated with the activation of a subset of ERVs during
mammalian MZT.

The question remains, how does Stella regulate TE expression? In primordial germ cells,
Stella directly binds to LINE1 and IAP elements to facilitate DNA demethylation of these
targets (Nakashima et al. 2013). Whilst experimentally challenging, it would be pertinent to
determine potential DNA binding targets of Stella in 2-cell embryos. At the same time, a
function of Stella at the RNA level cannot be excluded, or it could act by altering the
epigenetic status of cells at this time. For example, the H3K9 methyltransferase Setdb1 is
aberrantly enriched in Stella M/Z KO 2-cell embryos, and this has been shown to suppress
the expression of ERVs and chimeric transcripts (Eymery et al. 2016, Karimi et al. 2011, Kim
et al. 2016). Since a human orthologue of Dppa3 is expressed in ESCs, germ cells and
pre-implantation embryos (Bowles et al. 2003, Deng et al. 2014, Payer et al. 2003), it would be
of interest to know whether DPPA3 plays a similar role during human pre-implantation
development; for example, in the stage-specific activation of endogenous retroviruses (Göke
et al. 2015, Grow et al. 2015).

TEs have evidently been co-opted for the regulation of mammalian development as
exemplified by the domestication of the retroviral env gene, which is essential for placental
development (Dupressoir et al. 2012). Here, we have shown that the expression of a subset
of TEs is closely linked to that of their nearest gene during early embryonic development. As we
have shown in several cases, TEs could regulate gene expression through chimeric
transcripts, which are misregulated in the absence of Stella. The impact of transcriptional
inhibition of the ERV programme and associated chimeric transcripts during MZT, and their
subsequent effects on development, merit further investigation. Intriguingly, we discovered
that reducing MuERV-L mRNA/protein levels at the 2-cell stage affects development. This
could suggest that the presence of MuERV-L Gag protein has a functional role during early
mouse development, similar to a recent report illustrating a role of endogenous LTR viral-like
particles in human blastocyst development (Grow et al. 2015). It is also possible that
MuERV-L siRNA KD may downregulate a critical subset of chimeric transcripts that include
MuERV-L sequence, or could feed back to target transcriptional repression to MuERV-L loci
through small RNA pathways (Castel et al. 2013). Disentangling these possibilities warrants
further investigation.

In conclusion, we have revealed that Stella is involved in orchestrating the transcriptional
changes and activation of endogenous retroviral (ERV) elements during maternal-to-zygotic
transition. The appropriate expression of LTR-ERV driven zygotic genes and specific ERV
elements (MuERV-L) in turn contributes towards normal early embryonic development in mice.
385
386 Materials and methods
387
388 Experimental methods
389
390 Collection of mouse oocytes and embryos
All husbandry and experiments involving mice were carried out according to the Home Office
guidelines and the local ethics committee. Here, we refer to Stella as the protein encoded by
the Dppa3 gene. Stella knockout (KO) mice were generated as previously described (Payer
et al. 2003) (RRID:MGI:2683730) in a pure 129/SvEv strain. For RNA-seq, ovulated oocytes
were collected from Stella KO or wild type (WT) females crossed with vasectomised males at
9am, embryonic day (E) 0.5. Stella KO females were crossed with Stella KO males to
produce Stella maternal and zygotic knockout (Stella M/Z KO, KO) embryos, while Stella WT
females were crossed with Stella WT males to obtain WT embryos. 1-cell and 2-cell embryos
were collected at 4pm on E0.5 and 9am-3pm on E1.5, respectively. All oocytes and embryos
were morphologically assessed to ensure only viable samples were collected.

Single embryo RNA sequencing and RT-qPCR
Oocytes were incubated with 0.3 mg/ml hyaluronidase (Sigma, St. Louis, MO) to remove the
cumulus cells. The zona pellucida was removed from oocytes and embryos prior to lysis. RNA
was extracted from single oocytes, 1-cell embryos or whole 2-cell embryos, and cDNA was
amplified as described previously with modifications (Tang et al. 2010). A 1:10,000 to
1:60,000 dilution of ERCC RNA Spike-In mix (ThermoFisher Scientific, Hemel Hempstead,
UK) was added to the lysis buffer to estimate the efficiency of amplification (Jiang et al. 2011). A 1:10
dilution of the cDNA was used for RT-qPCR. For relative expression, all gene/TE
transcript levels were normalised to 3 housekeeping genes (ARBP, PPIA and GAPDH). For
primer sequences see Supplementary File 3. For RNA-seq, cDNA was fragmented to
~200bp with a Focused-ultrasonicator (Covaris, Woburn, MA) and adaptor-ligated libraries were
generated using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina® (New England Biolabs,
Ipswich, MA). Single-end 50bp sequencing was performed with a HiSeq1500 (Illumina, San
Diego, CA) to an average depth of 13.5 million reads per sample.
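As an illustration of the RT-qPCR normalisation described above, the following Python sketch computes relative expression against the mean Ct of the three housekeeping genes. The exact calculation is not specified in the text, so the 2^-ΔCt formula, the assumption of 100% amplification efficiency, the function name and the Ct values shown are illustrative assumptions, not the procedure or data of this study.

```python
from statistics import mean

# Housekeeping genes named in the text.
REFERENCE_GENES = ("ARBP", "PPIA", "GAPDH")


def relative_expression(ct_values, target, references=REFERENCE_GENES):
    """Return 2^-(dCt), where dCt = Ct(target) - mean Ct(reference genes).

    Assumes (hypothetically) a doubling of product per cycle.
    """
    reference_ct = mean(ct_values[gene] for gene in references)
    delta_ct = ct_values[target] - reference_ct
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for a single embryo (not data from the study).
sample_ct = {"ARBP": 20.0, "PPIA": 19.0, "GAPDH": 21.0, "MuERV-L": 18.0}
muervl_level = relative_expression(sample_ct, "MuERV-L")
print(muervl_level)
```

Averaging the reference Ct values is equivalent to normalising to the geometric mean of the reference gene quantities, which is one common way of combining several housekeeping genes.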

2-cell embryo immunofluorescence staining
2-cell embryos were fixed in 4% paraformaldehyde (PFA) in PBS for 15mins at room
temperature (RT). After permeabilisation with IF buffer (0.1% Triton, 1% BSA in PBS) for
30mins at RT, embryos were incubated with primary antibody overnight at 4°C. After
washing 3x with IF buffer, embryos were incubated with the corresponding secondary antibody
and 1 µg/ml of DAPI (4’,6’-diamidino-2-phenylindole) for DNA visualisation for 1hr at RT.
After washing, embryos were mounted in VECTASHIELD® Mounting Medium (Vector
Laboratories, Peterborough, UK). For each experiment, WT and KO embryos were stained
at the same time and processed identically using the same settings for confocal acquisition to
allow comparison of relative fluorescence intensity. The following antibodies were used:
rabbit anti-MuERV-L Gag (ER50102, Hangzhou HuaAn Biotechnology Co., LTD, China)
(RRID:AB_2636876), 1:200; Alexa488 donkey anti-rabbit IgG (ab150073, Abcam,
Cambridge, UK) (RRID:AB_2636877), 1:500.

siRNA injections
B6CBAF1/J (F1) female mice were superovulated by injection of 5 International Units (IU) of
pregnant mare’s serum gonadotrophin (PMSG) (Sigma, St. Louis, MO), followed by 5 IU of
human chorionic gonadotrophin (hCG) (Sigma, St. Louis, MO) after 48 hours and then mated
with F1 male mice. Zygotes were harvested from oviducts 17-22 hours post hCG injection.
Cumulus cells were removed by incubation with 0.3mg/ml hyaluronidase in M2 medium
(Sigma, St. Louis, MO). 20 µM or 80 µM of Stealth RNAi siRNA (ThermoFisher Scientific,
Waltham, MA) was micro-injected into the cytoplasm of zygotes using a FemtoJet 4i device
(Eppendorf, UK). The following are the siRNA sequences: scramble (sense:
UUCCUCUCCACGCGCAGUACAUUUA) and MuERV-L (sense:
GAAGAUAUGCCUUUCACCAGCUCUA). Injected embryos were cultured in M16 medium
(Sigma, St. Louis, MO) in a BD Falcon Organ Culture Dish at 5% CO2 and 37°C for a total of
5 days. The number of surviving 2-cell embryos 24hrs after micro-injection represents the
total number of embryos analysed per experimental group. 20 µM or 80 µM siRNA injections
were repeated twice independently.

Stella-HA chromatin immunoprecipitation
We fused three copies of the hemagglutinin tag (HA, 27bp) to the C-terminus of Stella and used
a high-affinity HA antibody to pull down Stella-bound chromatin. The Stella-HA construct
was expressed in the absence of endogenous Stella by using Stella M/Z KO mESCs, with
NANOG-HA and eGFP-HA as positive and negative controls, respectively. The ChIP protocol
was modified from a previously described Stella ChIP protocol (Nakamura et al. 2012).
Briefly, cells were cross-linked with 1% paraformaldehyde (PFA) for 8 minutes at RT,
quenched with 200mM glycine for 5 minutes and washed twice with PBS. Samples were
suspended in 2ml/IP of radioimmunoprecipitation assay (RIPA) buffer (50mM Tris-HCl, pH8,
150mM NaCl, 1mM EDTA, 1% NP-40, 0.5% deoxycholate and 0.1% SDS) with a 1:20 dilution
of protease inhibitor cocktail (Roche, Welwyn Garden City, UK) and sonicated to an
average fragment length of 200-1000bp. Dynabeads® Protein G (ThermoFisher Scientific, Hemel
Hempstead, UK) were pre-incubated with mouse anti-HA antibody (MMS-101P-500,
Covance Research Products Inc, Denver, CA) (RRID:AB_291261) at 7 µg/IP for 1 hour at 4°C.
The antibody-bead complex was then incubated with the chromatin solution overnight at 4°C.
Beads were washed twice with RIPA buffer, twice with high-salt wash buffer (20mM Tris-HCl,
pH8, 500mM NaCl, 1mM EDTA, 1% NP-40, 0.5% deoxycholate and 0.1% SDS), twice with
LiCl wash buffer (250mM LiCl, 20mM Tris-HCl, pH8, 1mM EDTA, 1% NP-40 and 0.5%
deoxycholate) for 10 minutes at 4°C and a final wash with Tris-EDTA. Chromatin was eluted
by adding 100 µl of SDS elution buffer (50mM Tris-HCl (pH 7.5), 10mM EDTA, 1% SDS) and
incubating at 68°C for 10 minutes. Reverse crosslinking was performed with 100 µl of proteinase
K (1mg/ml) for 2 hours at 42°C and 5 hours at 68°C. Finally, DNA was extracted by phenol
purification and precipitated with NaCl into pellets. DNA pellets were re-suspended in 11 µl of
H2O. We detected enrichment of NANOG in known binding regions using ChIP-qPCR, but
not of eGFP, supporting the efficacy of the ChIP (data not shown). Stella-HA ChIP
sequencing libraries were prepared with the NEBNext® Ultra™ Library Prep Kit for Illumina®
(New England Biolabs, Ipswich, MA) with ~8ng DNA input. Single-end 50bp sequencing was
performed with a HiSeq2500 (Illumina, San Diego, CA). Stella-HA ChIP-seq enriched peaks
are shown in Supplementary File 2.

2C::tdTomato ESC culture and Dppa3 transfection
2C::tdTomato reporter (Addgene plasmid #40281) ESCs (Macfarlan et al. 2012) were
cultured in GMEM (Life Technologies, Carlsbad, CA), 10% KnockOut™ Serum Replacement
(ThermoFisher Scientific, Waltham, MA), 2mM L-glutamine (Life Technologies), 0.1mM
MEM non-essential amino acids (Life Technologies), 100U/ml penicillin and 100µg/ml
streptomycin (Life Technologies), 1mM sodium pyruvate (Sigma), 0.05mM 2-mercaptoethanol
(Life Technologies), LIF (1000U/ml; ESGRO; Merck Millipore) with 2i
(PD0325901 (1µM; Stemgent, Cambridge, MA) and CHIR99021 (3µM; Stemgent,
Cambridge, MA)). For Dppa3 over-expression, a Tet-On® 3G inducible vector containing
the Dppa3 CDS was transfected into ESCs with Lipofectamine 2000 (ThermoFisher Scientific,
Waltham, MA). 500ng/ml of doxycycline (DOX) was added on day 0. For the Dppa3 knockdown experiment,
an ON-TARGETplus SMARTpool (Dharmacon, Lafayette, CO) containing 4 siRNAs targeting
Dppa3 was transfected into ESCs using Lipofectamine® RNAiMAX transfection reagent
(ThermoFisher Scientific, Waltham, MA). The Dppa3 siRNA SMARTpool includes the
following siRNA sequences: GGAUGAUACAGACGUCCUA; UAGAUAGGAUGCACAACGA;
AGAAAGUCGACCCAAUGAA; GAGUAUGUACGUUCUAAUU. ON-TARGETplus Non-targeting
(scramble) siRNA (Dharmacon) was transfected as a negative control. tdTomato
expression was analysed with a BD LSRFortessa (BD Biosciences, San Jose, CA).

Data processing and analysis

Mapping reads for gene-level counts and data processing
Single-end reads were mapped to the Mus musculus genome (GRCm38) using GSNAP
(version 2014-07-21) with default options (Wu et al. 2010). To detect splice junctions in
reads, we extracted known splice sites from the GTF file of GRCm38 provided by Ensembl
(release 79). Uniquely mapped reads were counted for each gene using htseq-count
(Anders et al. 2014). We assessed the quality of cells following the criteria previously described
(Kolodziejczyk et al. 2015) (Figure 1 – Figure Supplement 2A). To remove unwanted variation
between batches, we applied a generalised linear model to the gene-level counts using the
RUVs function of the RUVSeq package of R with k=4 (Risso et al. 2014) (Figure 1 – source
data 1). To perform principal component analysis, the batch-adjusted gene counts were
normalised by the size factor estimated by the DESeq2 package of R, and lowly expressed
genes whose mean normalised counts were below 100 were filtered out. The normalised
counts were log-transformed after adding a pseudocount of 1. The prcomp function of R
was applied to the log-transformed gene counts with the scaling option enabled.
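The normalisation, filtering and log-transformation steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the analysis code of this study: it re-implements the median-of-ratios size factor estimate that DESeq2 uses by default, the base of the logarithm (log2 here) is an assumption because the text does not state it, the function names and toy counts are hypothetical, and the batch adjustment and PCA steps are not reproduced.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps a gene name to a list of raw counts (one entry per sample).
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with any zero count are skipped.
    log_geo_mean = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - lgm for g, lgm in log_geo_mean.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise_filter_log(counts, min_mean=100.0):
    """Divide by size factors, drop genes with mean below min_mean, return log2(x + 1)."""
    factors = size_factors(counts)
    result = {}
    for gene, row in counts.items():
        normalised = [c / f for c, f in zip(row, factors)]
        if sum(normalised) / len(normalised) >= min_mean:
            result[gene] = [math.log2(x + 1.0) for x in normalised]
    return result


# Toy counts (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = {"geneA": [100, 200], "geneB": [400, 800], "geneC": [5, 10]}
factors = size_factors(toy_counts)
log_counts = normalise_filter_log(toy_counts)
print(factors)
print(sorted(log_counts))
```

In this toy example the two size factors differ by a factor of two, so the normalised values of each gene agree between samples, and the lowly expressed geneC is removed by the mean-count filter.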

Differential expression analysis between WT and KO
From the batch-adjusted counts, we identified differentially expressed genes between WT
and KO for each developmental stage using the DESeq function of the DESeq2 package of
R with default options (Love et al. 2014). Genes with an adjusted P-value below an
FDR cutoff of 0.05 were considered differentially expressed (Figure 1 – source data
2).
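To make the FDR cutoff concrete, the sketch below applies the Benjamini-Hochberg adjustment, which is the default multiple-testing correction in DESeq2, to a list of raw P-values and keeps genes with an adjusted P-value below 0.05. It is an illustrative re-implementation under stated assumptions: the gene names and P-values are hypothetical, and the independent filtering that DESeq2 performs before adjustment is not reproduced.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that the adjusted
    # values are monotone in the rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def differentially_expressed(p_by_gene, fdr=0.05):
    """Return {gene: adjusted P-value} for genes passing the FDR cutoff."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return {g: a for g, a in zip(genes, adjusted) if a < fdr}


# Hypothetical raw P-values for four genes.
raw_p = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.5}
hits = differentially_expressed(raw_p)
print(hits)
```

Note that geneB and geneC have raw P-values below 0.05 but do not pass after adjustment, which is the practical effect of controlling the FDR rather than thresholding raw P-values.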

Clustering of time-series data
To cluster the dynamic expression profiles of genes in WT, we defined 9 classes according to
any differences between adjacent time points: EE, ED, EU, DE, DD, DU, UE, UD, UU, where
“E” denotes equally expressed, “D” denotes downregulated and “U” denotes upregulated at a
later time point. A gene is considered to show a significant difference between adjacent time
points (“D” or “U”) if the adjusted P-value estimated by the DESeq function of the DESeq2
package with default options is less than 0.1. Otherwise, the difference is called “E”. For
example, “ED” means 1) equally expressed between oocyte and 1-cell and 2)
downregulated at 2-cell compared to 1-cell. Differential expression analysis was performed
between WT and KO samples; an adjusted P-value <0.05 was considered significantly different.
WT class ED genes that are differentially expressed in KO samples are represented in the
left heatmap (Figure 1E, Figure 1 – source data 3). WT class EU genes that are differentially
expressed in KO samples are represented in the right heatmap (Figure 1E, Figure 1 –
source data 4).
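The class assignment described above can be written as a short Python sketch. It assumes, for illustration only, that each transition is summarised by an adjusted P-value and a log2 fold change of the later versus the earlier stage, and that a missing adjusted P-value is treated as “E”; the function names and example values are hypothetical.

```python
def transition_label(padj, log2_fold_change, alpha=0.1):
    """Label one transition as "E", "D" or "U".

    padj: adjusted P-value for later vs earlier stage (None if unavailable).
    log2_fold_change: log2(later / earlier); the sign gives the direction.
    """
    if padj is None or padj >= alpha:
        return "E"
    return "U" if log2_fold_change > 0 else "D"


def expression_class(oocyte_to_1cell, onecell_to_2cell, alpha=0.1):
    """Combine the two adjacent-stage labels into one of the 9 classes."""
    first = transition_label(oocyte_to_1cell[0], oocyte_to_1cell[1], alpha)
    second = transition_label(onecell_to_2cell[0], onecell_to_2cell[1], alpha)
    return first + second


# Hypothetical (padj, log2 fold change) pairs for two genes.
maternal_decay_gene = expression_class((0.6, -0.2), (0.001, -3.0))
zygotic_gene = expression_class((0.8, 0.1), (0.0005, 4.2))
print(maternal_decay_gene, zygotic_gene)
```

Under these assumptions the first gene falls into class ED and the second into class EU, the two classes shown in the heatmaps of Figure 1E.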

Gene ontology analysis
Gene ontology (GO) analyses were performed using DAVID 6.7
(https://david.ncifcrf.gov/home.jsp) for differentially expressed upregulated or downregulated
genes in KO compared to WT 2-cell embryos. Adjusted p-values from BP_Fat were plotted
with REVIGO (Supek et al. 2011) allowing for medium similarity, with a selection of GO terms
displayed (Figure 2A, Figure 2 – source data 1).

Genome-wide Dppa3 gene co-expression analysis
A Dppa3 co-expression network (Bassel et al. 2011) was constructed for genes expressed in
WT oocyte, 1-cell, and 2-cell stages. The co-expression network can be represented as an
adjacency matrix, with an edge being drawn between two genes if the magnitude
of the Pearson correlation coefficient between their log2-transformed CPM (log2(CPM+1))
was greater than a particular threshold value: