Stella modulates transcriptional and endogenous retrovirus programs during maternal-to-zygotic transition

Yun Huang1,7, Jong Kyoung Kim2,6,7, Dang Vinh Do1, Caroline Lee1, Christopher A. Penfold1, Jan J. Zylicz1, John C. Marioni2,3,5, Jamie A. Hackett1,4,8, M. Azim Surani1,8

1 Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK; Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK

2 European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK

3 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SA, UK

4 Present Address: European Molecular Biology Laboratory (EMBL) - Monterotondo, via Ramarini 32, 00015 Rome, Italy

5 Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK

6 Department of New Biology, DGIST, Daegu, 42988, Republic of Korea

7 Co-first authors

8 Co-corresponding authors

Abstract

The maternal-to-zygotic transition (MZT) marks the period when the embryonic genome is activated and acquires control of development. Maternally inherited factors play a key role in this critical developmental process, which occurs at the 2-cell stage in mice. We investigated the function of the maternally inherited factor Stella (encoded by Dppa3) using single-cell/embryo approaches. We show that loss of maternal Stella results in widespread transcriptional misregulation and a partial failure of MZT. Strikingly, activation of endogenous retroviruses (ERVs) is significantly impaired in Stella maternal/zygotic knockout embryos, which in turn leads to a failure to upregulate chimeric transcripts. Amongst ERVs, MuERV-L activation is particularly affected by the absence of Stella, and direct in vivo knockdown of MuERV-L impacts the developmental potential of the embryo. We propose that Stella is involved in ensuring activation of ERVs, which themselves play a potentially key role during early development, either directly or through influencing embryonic gene expression.

Introduction

Maternally inherited factors in the zygote play a critical role during early development (Ancelin et al. 2016, Li et al. 2010, Wasson et al. 2016). The maternal-to-zygotic transition (MZT) marks the time of transfer of developmental control to the embryo following activation of the zygotic genome (Lee et al. 2014, Li et al. 2013). In mice, the major wave of zygotic genome activation (ZGA) occurs at the late 2-cell stage (Golbus et al. 1973). The earliest zygotically transcribed genes are preferentially enriched in essential processes such as transcription, RNA metabolism and ribosome biogenesis (Hamatani et al. 2004, Xue et al. 2013). Other events that characterise early pre-implantation development include extensive erasure of global DNA methylation and dynamic changes in histone modifications (Cantone et al. 2013).

Previous studies have shown that Stella, encoded by the Dppa3 gene locus, is a maternally inherited factor that is required for normal pre-implantation development (Bortvin et al. 2004, Payer et al. 2003). Stella is a small basic protein with a putative SAP-like domain and splicing-like factor; it also contains nuclear localisation and export signals and has the potential to bind to DNA and RNA in vitro (Nakamura et al. 2007, Payer et al. 2003) (Figure 1 – Figure Supplement 1A). Expression of Stella is high in oocytes, continues through pre-implantation development, and subsequently occurs specifically in primordial germ cells. Stella is also expressed in naïve embryonic stem cells, but is downregulated upon exit from pluripotency (Hayashi et al. 2008). Lack of maternal inheritance of Stella results in developmental defects; these manifest from the 2-cell stage onwards, and result in only a small proportion of embryos developing to the blastocyst stage. Notably, a maternally inherited pool of Stella in zygotic Stella knockout embryos, derived from Stella heterozygous females, allows development to progress normally. This indicates Stella has a key role during the earliest developmental events (Payer et al. 2003).

Stella was suggested to protect maternal pronuclei (PN) from TET3-mediated active DNA demethylation (Nakamura et al. 2007, Nakamura et al. 2012) (Figure 1 – Figure Supplement 1B). In the absence of Stella, zygotes display enrichment of 5hmC in both parental PN (Wossidlo et al. 2011), and an aberrant accumulation of γH2AX in maternal PN (Nakatani et al. 2015). In addition, Stella protects DNA methylation levels at selected imprinted genes and transposable elements (Nakamura et al. 2007). However, embryos depleted of maternal effect factors known to regulate imprinted genes only exhibit developmental defects post-implantation (Denomme et al. 2013), implying that the 2-cell phenotype in Stella maternal/zygotic knockout (Stella M/Z KO) embryos is not primarily linked with imprint defects. Moreover, TET3 only partially contributes to DNA demethylation and its absence is compatible with embryonic development (Amouroux et al. 2016, Peat et al. 2014, Shen et al. 2014, Tsukada et al. 2015). Thus, what impairs pre-implantation embryonic development in the absence of maternal Stella remains unclear.

A significant number of transposable elements (TEs) are preferentially activated during early development and in a sub-population of mouse embryonic stem cells (Gifford et al. 2013, Macfarlan et al. 2012, Rowe et al. 2011). Notably, at the 2-cell (2C) stage in mouse development, there is selective upregulation of endogenous retroviruses (ERVs), which are a subset of TEs characterised by the presence of long terminal repeats (LTRs) that mediate expression and retrotransposition (Kigami et al. 2003, Ribet et al. 2008). Growing evidence suggests that activation of some TEs has important biological functions during early development (Beraldi et al. 2006, Kigami et al. 2003). TEs can regulate tissue-specific gene expression or splicing through their exaptation as gene regulatory elements, and may also play a key role during speciation (Bohne et al. 2008, Gifford et al. 2013, Rebollo et al. 2012). TEs additionally drive expression of genes directly by acting as alternative promoters that generate chimeric transcripts, which include both TE and protein-coding sequences (Macfarlan et al. 2012, Peaston et al. 2004). Thus, the expression of TEs may be functionally important during early embryo development, and understanding the regulation of TEs themselves is therefore of great interest.

We adopted an unbiased approach to investigate the role of Stella during mouse maternal-to-zygotic transition, using single-cell/embryo RNA-seq analysis of mutant embryos (Deng et al. 2014, Xue et al. 2013, Yan et al. 2013). We find Stella M/Z KO 2-cell embryos fail to upregulate key zygotic genes involved in ribosome biogenesis and RNA processing. Furthermore, the absence of Stella results in widespread misregulation of TEs and of chimeric transcripts that are derived from these TEs. In particular, Stella M/Z KO embryos exhibit a general failure to upregulate MuERV-L transcripts and protein at the 2-cell stage. Our perturbation data are consistent with MuERV-L playing a functionally important role during pre-implantation embryonic development, implying that MuERV-L is amongst the critical factors affected by Stella in early embryos.

Results

Stella M/Z KO embryos are transcriptionally distinct from wild type embryos

We used Stella knockout mice (Payer et al. 2003) to collect mutant oocytes and embryos for single-cell/embryo RNA-seq, performed as previously described with modifications (Tang et al. 2010). Stella knockout (KO) oocytes and Stella maternal/zygotic knockout (Stella M/Z KO, KO) 1-cell and 2-cell embryos, lacking both maternal and zygotic Stella, were harvested. Strain-matched wild type (WT) oocytes and embryos served as controls (Figure 1A; Figure 1 – Figure Supplement 2A). Following normalisation, potentially confounding technical factors were controlled for using Removal of Unwanted Variation (RUV) (Risso et al. 2014) (Figure 1 – Figure Supplement 2B, Figure 1 – source data 1, Supplementary File 1).

Principal component (PC) analysis revealed that the first PC represents progression from oocyte to the 2-cell stage (Figure 1B). The largest separation is observed between oocyte/1-cell and 2-cell embryos, consistent with the degradation of maternal transcripts and activation of the zygotic genome at MZT. The second and third PCs capture the separation between WT and KO samples, suggesting distinct genome-wide gene expression changes between WT and KO embryos. Furthermore, although KO oocytes and 1-cell embryos exhibit some separation from their WT counterparts, the difference is more clearly pronounced at the 2-cell stage. Notably, Stella M/Z KO 1-cell and 2-cell embryos clustered separately from each other, suggesting that Stella M/Z KO defects at the 2-cell stage are not simply due to delayed developmental progression (Figure 1B). Differential gene expression analysis between WT and KO samples identified 5881 misregulated genes (adjusted P-value < 0.05) across the developmental stages with 360 genes overlapping across all stages (Figure 1C, Figure 1 – source data 2).

Stella M/Z KO embryos are impaired in maternal-to-zygotic transition

Next, we compared the differentially expressed genes (DEG) against a public database of early mouse embryonic transcriptomes (DBTMEE) (Park et al. 2015). This dataset categorises genes by specific expression pattern at any developmental stage. Strikingly, we found significant enrichment of maternal transcripts (maternal RNA, minor zygotic genome activation and 1-cell transient genes) and depletion of zygotic transcripts (major zygotic genome activation, 2-cell transient and mid pre-implantation activation) in the Stella M/Z KO 2-cell embryos compared to WT (Figure 1D). While Stella KO oocytes are transcriptionally distinct from WT (Figure 1B), Stella KO oocytes do not exhibit overt abnormalities (Bortvin et al. 2004, Payer et al. 2003). Furthermore, the apparent enrichment in maternal transcripts only manifests at the 2-cell stage, suggesting the observed transcriptional differences are not inherited from aberrant transcripts in the Stella KO oocyte but are likely linked to failed downregulation of these maternal transcripts.

To independently characterise differences in gene expression dynamics between WT and KO embryos during MZT, genes were clustered based on their pattern of expression in WT oocytes, 1-cell and 2-cell embryos. We identified a cluster of maternal transcripts (ED), defined as highly expressed in WT oocytes and 1-cell embryos and with reduced expression at the 2-cell stage, that are differentially expressed in KO samples (n=849, adjusted P-value < 0.05) (Figure 1E, Figure 1 – source data 3). Of these genes, 37.7% did not show the expected reduction in expression in Stella M/Z KO 2-cell embryos. Furthermore, we detected a group of zygotically activated genes (ZAG) (EU) with low expression in WT oocytes and 1-cell embryos and increased expression in 2-cell embryos, which are differentially expressed in KO samples (n=698, adjusted P-value < 0.05). Moreover, 39.5% of this group of genes exhibited significantly dampened 2-cell activation in the absence of maternal and zygotic Stella (Figure 1E, Figure 1 – source data 4). The combined analysis suggests that Stella M/Z KO embryos exhibit partial impairment in maternal-to-zygotic transition.
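
The idea of grouping genes by their WT expression pattern can be illustrated with a deliberately simple sketch. It is not the clustering procedure used in this study, which is not specified here. It assumes one expression value per stage for each gene and uses a hypothetical log2 fold-change cutoff (`min_log2_fc`) and pseudocount; the function name and the example values are invented for illustration.

```python
from math import log2


def classify_pattern(oocyte, one_cell, two_cell, min_log2_fc=1.0, pseudocount=1.0):
    """Label a gene by comparing its 2-cell expression with its earlier expression.

    Assumption: the mean of oocyte and 1-cell values stands in for the
    'pre-MZT' level, and a fixed log2 fold-change cutoff defines a change.
    """
    early = (oocyte + one_cell) / 2
    lfc = log2((two_cell + pseudocount) / (early + pseudocount))
    if lfc <= -min_log2_fc:
        return "maternal-like"   # high early, reduced at the 2-cell stage
    if lfc >= min_log2_fc:
        return "zygotic-like"    # low early, increased at the 2-cell stage
    return "unchanged"


# Hypothetical expression values (oocyte, 1-cell, 2-cell), not data from this study.
labels = {
    "gene_maternal": classify_pattern(100.0, 90.0, 10.0),
    "gene_zygotic": classify_pattern(2.0, 4.0, 60.0),
    "gene_flat": classify_pattern(50.0, 50.0, 55.0),
}
```

A threshold rule like this is only a stand-in for pattern-based clustering; its appeal is that the maternal-like and zygotic-like labels map directly onto the verbal definitions of the two gene groups given above.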

Stella M/Z KO 2-cell embryos fail to upregulate essential ribosomal and RNA processing genes

Gene ontology analysis of genes more highly expressed in KO than WT 2-cell embryos reveals an enrichment of chromatin modifiers, and genes involved in microtubule-based processes and response to DNA damage (Figure 2A, Figure 2 – source data 1), consistent with a previous observation of abnormal γH2AX enrichment in maternal PN (Nakatani et al. 2015). In addition, downregulated genes in Stella M/Z KO 2-cell embryos are enriched for RNA processing and ribosome biogenesis, with an overall depletion of genes associated with ribosomes in the KEGG pathway and related gene ontology (Figure 2B, Figure 2 – Figure Supplement 1). The findings from single-cell/embryo RNA-sequencing were validated by independent qRT-PCR analyses. Consistently, genes associated with chromatin modifiers (Rbbp7 and Kdm1b) and DNA damage (Fam175a and Brip1) were more highly expressed in KO embryos (Figure 2C), while those associated with ribosome biogenesis (Nop16, Emg1), RNA binding (Larp1b) and cell cycle regulation (Cdc25c) showed lower expression in KO 2-cell embryos (Figure 2D). To explore potential regulatory targets of Stella, a genome-wide co-expression network analysis was performed across the WT dataset. Overall, 910 genes showed correlated expression patterns with Dppa3 (absolute Pearson correlation > 0.9) (Figure 2 – source data 2). Of these, 710 genes are positively correlated while 200 genes are negatively correlated with Dppa3. Genes with a positive expression correlation with Dppa3 are significantly enriched for RNA processing and RNA splicing (Figure 2E), including Snrpd1 and Snrpb2, both core components of the spliceosome (Figure 2F).
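
The co-expression criterion described above (absolute Pearson correlation with Dppa3 above 0.9) can be sketched as follows. This is a minimal illustration, assuming expression profiles are held as lists of values across the same ordered WT samples; the helper names, the toy gene names other than Dppa3, and all numbers are hypothetical, and the authors' actual network analysis may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def coexpressed(expr, target="Dppa3", cutoff=0.9):
    """Split genes into those positively / negatively correlated with the target."""
    ref = expr[target]
    positive, negative = [], []
    for gene, values in expr.items():
        if gene == target:
            continue
        r = pearson(ref, values)
        if r > cutoff:
            positive.append(gene)
        elif r < -cutoff:
            negative.append(gene)
    return positive, negative


# Hypothetical expression profiles across four samples, not data from this study.
expr = {
    "Dppa3": [10.0, 8.0, 6.0, 2.0],
    "geneA": [20.0, 16.5, 12.0, 4.5],
    "geneB": [1.0, 3.0, 5.0, 9.0],
    "geneC": [5.0, 1.0, 6.0, 2.0],
}
positive, negative = coexpressed(expr)
```

Reporting the positive and negative sets separately mirrors the split into positively and negatively correlated genes described in the text.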

Overall, our analysis suggests that the Stella M/Z KO 2-cell embryos do not effectively transition from maternal to zygotic control, as judged by the aberrant enrichment of maternal transcripts and depletion of zygotic transcripts. Genes regulating essential processes such as RNA processing are depleted in Stella M/Z KO 2-cell embryos and Dppa3 exhibits positive expression correlation with genes enriched in these processes. Notably, however, most genes do undergo appropriate MZT changes in mutant embryos, arguing against a general developmental delay. Taken together, this supports a hypothesis whereby Stella plays an important role in upregulating the expression of a subset of genes that are essential during zygotic genome activation.

Expression of TEs is dysregulated in Stella KO oocytes and Stella M/Z KO embryos

Stella has been shown to affect the DNA methylation of particular TEs in zygotes (Nakamura et al. 2007) and primordial germ cells (Nakashima et al. 2013). Since many TEs are specifically activated during early development, we investigated their expression in WT and KO oocytes and embryos. We remapped the single-cell/embryo RNA sequencing data to obtain read counts for class I TEs (retrotransposons) (Figure 3 – source data 1-2). To get an overview of the extent of TE activation, we calculated the fraction of reads mapped to TEs relative to total mapped reads. We found a dramatic increase in the ratio of reads mapped to TEs, primarily of the LTR class, at the 2-cell stage compared to oocyte and 1-cell, suggesting widespread TE activation coincident with MZT (Figure 3A, Figure 3B). Further inspection revealed that activation of the LTR class is dominated by upregulation of ERVL and ERVL-MaLR families, while LINE and SINE classes are primarily accounted for by L1 and Alu families, respectively (Figure 3 – Figure Supplement 1A). Moreover, principal component analysis showed developmental progression can be clearly captured by TE expression alone along PC1 (Figure 3C), similar to recent studies identifying stage-specific transcription initiation of ERV expression in early human embryos (Göke et al. 2015, Grow et al. 2015). In addition, WT and KO oocytes and embryos can be partially separated in individual PCAs along each of the developmental stages, signifying differences in expression of TEs between WT and KO samples.
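
As a minimal sketch of the TE read fraction described above, assume that per-sample read counts are already available as a mapping from TE class to count, together with the total number of mapped reads. The function name, sample names and counts below are invented for illustration and are not data from this study.

```python
def te_fraction(te_counts, total_mapped):
    """Return the fraction of mapped reads assigned to each TE class and to TEs overall."""
    if total_mapped <= 0:
        raise ValueError("total_mapped must be positive")
    fractions = {cls: n / total_mapped for cls, n in te_counts.items()}
    fractions["all_TE"] = sum(te_counts.values()) / total_mapped
    return fractions


# Hypothetical counts per TE class and total mapped reads for two samples.
samples = {
    "oocyte_example": ({"LTR": 50, "LINE": 30, "SINE": 20}, 1000),
    "two_cell_example": ({"LTR": 300, "LINE": 40, "SINE": 30}, 1000),
}
fractions = {name: te_fraction(counts, total) for name, (counts, total) in samples.items()}
```

Normalising by total mapped reads makes samples with different sequencing depths comparable, which is why a ratio rather than a raw count is the natural summary here.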

Stella is associated with the expression of specific TEs

We identified ‘maternal TEs’, defined as TEs expressed higher in WT oocytes than 2-cell embryos, and ‘zygotic TEs’, defined as TEs expressed higher in WT 2-cell embryos than oocytes. To characterise the disparities between TE expression in WT and KO, we compared differentially expressed TEs at the 2-cell stage with maternal and zygotic TEs (Figure 3D). Strikingly, TEs upregulated in Stella M/Z KO 2-cell embryos are significantly enriched for maternal TEs while TEs downregulated in Stella M/Z KO 2-cell embryos are associated with zygotic TEs. Furthermore, Stella M/Z KO 2-cell embryos display lower activation of the LTR class, specifically the endogenous retrovirus (LTR-ERVL) family, with small relative enrichment for LINE and SINE classes (Figure 3E, Figure 3 – Figure Supplement 1B).

A particular element of interest is MuERV-L (Kigami et al. 2003, Ribet et al. 2008), which encodes a canonical retroviral Gag and Pol, flanked by 5’ and 3’ LTRs (also known as MT2_Mm). We observed a selective upregulation of MuERV-L-Int and MT2_Mm transcripts at the 2-cell stage, but this was significantly reduced in Stella M/Z KO embryos (Figure 3F). Many MuERV-L elements have an open reading frame and we therefore tested protein levels using a MuERV-L Gag antibody. This antibody was validated by showing that its staining overlaps well with a MuERV-L reporter ESC line, 2C::tdTomato (Macfarlan et al. 2012), and is specific to 2-cell embryos whilst not detected in metaphase II oocytes (Figure 3 – Figure Supplement 1C). Strikingly, immunofluorescence (IF) staining showed a significantly lower signal and loss of perinuclear localisation of MuERV-L Gag (Ribet et al. 2008) in Stella M/Z KO 2-cell embryos compared to WT, implying Stella influences both transcript and protein levels of MuERV-L (Figure 3G).

To assess whether Stella directly binds to specific transcripts/TEs, we turned to ESCs, since obtaining sufficient 2-cell embryos for this analysis is intractable. Chromatin immunoprecipitation-seq (ChIP-seq) of HA-tagged Stella resulted in minimal peak calls (n=56) (Supplementary File 2), implying Stella does not efficiently bind DNA (including MuERV-L) in ESCs. Consistently, neither overexpression nor knockdown of Dppa3 in mESCs had a marked effect on the expression of MuERV-L (Figure 3 – Figure Supplement 2). These results suggest that Stella-mediated control of MuERV-L is likely specific to the unique chromatin/cellular context of early embryos. Notably, Stella has the potential to bind both RNA and DNA and could thus affect TEs through multiple modes. Overall, our data suggest Stella plays a key role in regulating a subset of TEs specifically in 2-cell embryos, which includes promoting strong activation of MuERV-L elements.

Expression of a subset of TEs is positively correlated with their nearest gene

Next we explored the relationship between TE and protein-coding gene expression in WT and KO samples. First, we considered the intersection between differentially expressed genes and genes that are neighbours of misregulated TEs at the 2-cell stage, and discovered a highly significant overlap of 12% between the two populations (p = 3.19 × 10^-87) (Figure 4A). Notably, a greater number of TEs downregulated in KO 2-cell embryos are located within 10 kb of the transcriptional start site of zygotically activated genes (ZAG, n=698) than expected by chance (p < 10^-4) (Figure 4B). Altogether, this suggests perturbations in TE expression may be linked to neighbouring mRNA expression levels in Stella M/Z KO embryos. To determine if this relationship is applicable genome-wide, we performed global expression correlation analysis between a TE and its nearest gene. To eliminate confounding factors, we excluded TEs that overlap directly with an exon. Notably, we detected 387 TEs whose expression levels were positively correlated with their nearest gene (Spearman’s correlation > 0.7) while 224 TEs were negatively correlated (Spearman’s correlation < -0.7) (Figure 4 – Figure Supplement 1A). This relationship was validated using single-embryo qRT-PCR at 4 genomic loci, which demonstrated a significant correlation between the expression of a TE and its downstream gene (Figure 4C). These loci were selected to represent a range of LTR class elements; RMER19A, ORR1A2-Int and MT2B are further categorised into ERVK, ERVL-MaLR and ERVL families respectively. Interestingly, 2 of the genes, Chka and Zfp54, have been implicated in regulation of early embryonic development (Albertsen et al. 2010, Wu et al. 2008). Thus, the expression levels of a subset of TEs are highly correlated with the expression of their nearest gene.
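
The TE-to-nearest-gene correlation step can be sketched as below. This is an illustrative implementation of Spearman's rank correlation (Pearson correlation of average ranks) with the 0.7 and -0.7 cutoffs quoted above. The pairing of each TE with its nearest gene and the exclusion of exon-overlapping TEs are assumed to have been done upstream; the identifiers and values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def label_pair(te_expr, gene_expr, cutoff=0.7):
    rho = spearman(te_expr, gene_expr)
    if rho > cutoff:
        return "positive"
    if rho < -cutoff:
        return "negative"
    return "uncorrelated"


# Hypothetical (TE expression, nearest-gene expression) pairs across five embryos.
pairs = {
    "TE_1": ([1, 2, 3, 4, 5], [2, 4, 5, 9, 20]),
    "TE_2": ([5, 4, 3, 2, 1], [1, 2, 3, 4, 6]),
    "TE_3": ([1, 2, 3, 4, 5], [3, 1, 4, 2, 5]),
}
labels = {te: label_pair(te_x, gene_x) for te, (te_x, gene_x) in pairs.items()}
```

A rank-based measure is a reasonable choice for this kind of comparison because it is insensitive to the very different dynamic ranges of TE and gene read counts.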

Whilst association does not equate to causality, there are a number of ways TEs have been implicated in the regulation of gene expression (Elbarbary et al. 2016, Thompson et al. 2016). One possibility is the establishment of alternative splicing events, which create a junction between an expressed TE and a downstream gene exon, thus leading to the formation of a chimeric transcript (Peaston et al. 2004). Indeed, MuERV-L has been shown to form such fusion transcripts with 307 genes at the 2-cell stage (Macfarlan et al. 2012). Here, we find Stella M/Z KO embryos with the lowest MuERV-L expression also express lower levels of genes that make a chimeric junction with MuERV-L elements (Figure 4 – Figure Supplement 1B). In contrast, the KO embryo with the highest MuERV-L expression expresses higher levels of these genes. Importantly, we next identified and validated by PCR a number of chimeric transcripts that are expressed in 2-cell embryos, and sequenced across the chimeric junction (Figure 4D). Single-embryo qRT-PCR shows most of these chimeric transcripts are expressed at lower levels in KO compared to WT 2-cell embryos (Figure 4E). These chimeric transcripts include Snrpc, which encodes a component of the ribonucleoprotein required for the formation of the spliceosome, and the RNA binding protein, Rbm8a. Our data imply a link between activation of some TEs and the emergent gene expression network, which is perturbed in Stella M/Z KO embryos due to altered TE expression. This is, in part, mediated by TEs directly regulating gene expression through chimeric transcripts.

MuERV-L plays a functional role during pre-implantation development

We propose that the dysregulation in TE expression might directly contribute to the abnormal pre-implantation phenotype observed in Stella M/Z KO embryos. We focused on assessing the function of MuERV-L elements, as they represent one of the most highly activated TEs at the 2-cell stage (Kigami et al. 2003), as well as being amongst the most significantly downregulated in Stella M/Z KO embryos (Figure 3F and Figure 3G). We considered that attenuated MuERV-L activation in Stella M/Z KO embryos could in turn disrupt activation of associated 2-cell transcripts driven by their LTRs (Figure 4 – Figure Supplement 1B). Alternatively, MuERV-L mRNA/protein itself may also have a functional role during early embryonic development.

We detected 583 copies of the full-length MuERV-L element, containing the complete gag and pol genes, in the genome. This makes it experimentally challenging to completely eliminate MuERV-L activation in early embryos. Nevertheless, we utilised siRNA to knock down the expression of MuERV-L elements in 1-cell embryos and monitored the effects on embryonic development (Figure 5A). We designed siRNA to target the consensus sequence of MuERV-L from the Dfam database (Hubley et al. 2016) (Figure 5B). Computational analysis reveals that the MuERV-L siRNA targets 80.5% of full-length MuERV-L elements with a perfect match and 99.5% with ≤2 bp mismatches, while the scramble siRNA has no targets with ≤2 bp mismatches. At 20 μM siRNA, we observed a slight decrease in MuERV-L transcripts (Figure 5C) but a significant reduction in MuERV-L Gag protein staining in 2-cell embryos (Figure 5D). These embryos exhibited a mild developmental delay at 4 to 8-cell stage progression on day 2, with fewer embryos reaching the blastocyst stage on day 4, and those that did were smaller and had fewer cells (Figure 5E). This suggests that a modest reduction in MuERV-L at the 2-cell stage has a notable effect on pre-implantation development.
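
The siRNA target coverage figures above (fraction of elements with a perfect match, and with at most 2 mismatches) could be computed along the following lines. This is a minimal sketch, assuming an ungapped comparison of the siRNA target sequence against every window of each element on a single strand; the actual computational analysis is not described in this section. The sequences are short invented strings, not MuERV-L or siRNA sequences.

```python
def min_mismatches(target, element):
    """Smallest Hamming distance between target and any same-length window of element.

    Returns None when the element is shorter than the target.
    """
    k = len(target)
    if len(element) < k:
        return None
    best = k
    for start in range(len(element) - k + 1):
        window = element[start:start + k]
        distance = sum(1 for a, b in zip(target, window) if a != b)
        if distance < best:
            best = distance
            if best == 0:
                break  # a perfect match cannot be improved
    return best


def targeting_summary(target, elements, max_mismatch=2):
    """Fractions of elements with a perfect match and with <= max_mismatch mismatches."""
    if not elements:
        raise ValueError("elements must not be empty")
    distances = [min_mismatches(target, e) for e in elements]
    perfect = sum(1 for d in distances if d == 0)
    near = sum(1 for d in distances if d is not None and d <= max_mismatch)
    return perfect / len(elements), near / len(elements)


# Hypothetical toy sequences for illustration only.
target = "ACGTAC"
elements = ["TTACGTACGG", "TTACGAACGG", "TTTTTTTTTT", "GGACCTTCAA"]
perfect_fraction, near_fraction = targeting_summary(target, elements)
```

Running the same summary on a scrambled control sequence gives the corresponding specificity check, analogous to the scramble siRNA comparison reported in the text.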

There is nonetheless variability in MuERV-L expression at the 2-cell stage, as shown in other studies (Ancelin et al. 2016), and we therefore speculated that a higher concentration of siRNA might increase the repression efficiency of MuERV-L transcripts. Indeed, 80 μM MuERV-L siRNA was more effective (Figure 5F, 5G), and in turn resulted in >90% of embryos arresting at the 2-cell stage (Figure 5H), although some indirect effects at higher concentrations of siRNA may contribute to the embryonic arrest. Notably, MuERV-L knockdown resulted in repression of some chimeric transcripts previously characterised in 2-cell embryos (Figure 4D, Figure 5 – Figure Supplement 1). These data collectively suggest MuERV-L may have a functional role in embryonic development.

Discussion

The maternal-to-zygotic transition is a complex process that entails extensive global transcriptional and epigenetic changes. Maternally inherited factors such as Stella play a role in this transition in mouse oocytes, but the difficulty of obtaining sufficient material has hindered investigations into how they contribute to MZT. Here, we use single-cell/embryo approaches and find that deletion of maternal and zygotic Stella affects MZT. Furthermore, we observed a failure to activate endogenous retroviruses (LTR-ERVL), and specifically MuERV-L elements. In parallel there is an accumulation of maternal transcripts and a failure to upregulate many zygotic transcripts. These events appear to be partially coupled, with altered LTR promoter activity leading to changes in gene expression, for example through the modulation of chimeric transcripts. In addition, we reveal the biological relevance of TE expression, as knockdown of MuERV-L at the 2-cell stage hinders developmental progression, which likely contributes to the phenotype observed following loss of Stella.

Maternal mRNA degradation and zygotic genome activation are critical aspects of MZT (Li et al. 2013). Maternal mRNA clearance plays both a permissive and instructive role (Tadros et al. 2009), while ZGA also promotes the final degradation of maternal transcripts. Stella has the potential to function as an RNA binding protein, which might facilitate the degradation of maternal transcripts (Nakamura et al. 2007, Payer et al. 2003). In addition, Stella KO oocytes exhibit transcriptional differences compared to WT oocytes. This is consistent with a previous study showing Stella KO oocytes are defective in transcriptional repression (Liu et al. 2012), which may be needed for dosage regulation of critical maternally inherited transcripts. However, Stella has also been demonstrated to be required beyond oogenesis (Nakamura et al. 2007). Maternally inherited factors likely influence the timing and specificity of gene expression during ZGA, which is essentially determined by interplay between the transcriptional machinery and histone modifications that influence chromatin accessibility (Lee et al. 2014).

The failure of zygotic transcript activation may be linked to the failure of zygotic TE activation. We detected extensive changes in global TE expression during MZT. This is accompanied by strong upregulation of the LTR-ERVL family in 2-cell embryos, which is consistent with a recent study showing that the chromatin surrounding ERVL is in a highly accessible state at the 2-cell stage (Wu et al. 2016). LTR-ERVL upregulation is significantly reduced in the absence of maternal Stella, suggesting that Stella may facilitate the activation of many ERVs. The expression changes are validated at the protein level for a specific LTR-ERVL element, MuERV-L. Importantly, the decrease in MuERV-L is unlikely to be a secondary effect of 2-cell embryonic arrest, as maternal Kdm1a (LSD1) mutant embryos, which undergo pan 2-cell arrest, maintain normal MuERV-L transcript expression (Ancelin et al. 2016). Notably, whilst many epigenetic mechanisms have been identified for the suppression of ERVs (Gifford et al. 2013), we have identified a factor associated with the activation of a subset of ERVs during mammalian MZT.

The question remains, how does Stella regulate TE expression? In primordial germ cells, Stella directly binds to LINE1 and IAP elements to facilitate DNA demethylation of these targets (Nakashima et al. 2013). Whilst experimentally challenging, it would be pertinent to determine potential DNA binding targets of Stella in 2-cell embryos. At the same time, a function of Stella at the RNA level cannot be excluded, or it could act by altering the epigenetic status of cells at this time. For example, the H3K9 methyltransferase Setdb1 is aberrantly enriched in Stella M/Z KO 2-cell embryos, and this has been shown to suppress expression of ERVs and chimeric transcripts (Eymery et al. 2016, Karimi et al. 2011, Kim et al. 2016). Since a human orthologue of Dppa3 is expressed in ESCs, germ cells and pre-implantation embryos (Bowles et al. 2003, Deng et al. 2014, Payer et al. 2003), it would be of interest to know whether DPPA3 plays a similar role during human pre-implantation development; for example, in the stage-specific activation of endogenous retroviruses (Göke et al. 2015, Grow et al. 2015).

TEs have evidently been co-opted for the regulation of mammalian development, as exemplified by the domestication of the retroviral env gene, which is essential for placental development (Dupressoir et al. 2012). Here, we have shown that the expression of a subset of TEs is intimately linked to that of their nearest genes during early embryonic development. As we have shown in several cases, TEs could regulate gene expression through chimeric transcripts, which are misregulated in the absence of Stella. The impact of transcriptional inhibition of the ERV programme and associated chimeric transcripts during MZT, and their subsequent effects on development, merit further investigation. Intriguingly, we discovered that reducing MuERV-L mRNA/protein levels at the 2-cell stage affects development. This could suggest that the presence of MuERV-L Gag protein has a functional role during early mouse development, similar to a recent report illustrating a role of endogenous LTR viral-like particles in human blastocyst development (Grow et al. 2015). It is also possible that MuERV-L siRNA KD may downregulate a critical subset of chimeric transcripts that include MuERV-L sequence, or could feed back to target transcriptional repression to MuERV-L loci through small RNA pathways (Castel et al. 2013). Disentangling these possibilities warrants further investigation.

380

In conclusion, we have revealed that Stella is involved in orchestrating the transcriptional changes and activation of endogenous retroviral (ERV) elements during maternal-to-zygotic transition. The appropriate expression of LTR-ERV driven zygotic genes and specific ERV elements (MuERV-L) in turn contributes towards normal early embryonic development in mice.

Materials and methods

Experimental methods

Collection of mouse oocytes and embryos
All husbandry and experiments involving mice were carried out according to the Home Office guidelines and the local ethics committee. Here, we refer to Stella as the protein encoded by the Dppa3 gene. Stella knockout (KO) mice were generated as previously described (Payer et al. 2003) (RRID:MGI:2683730) in a pure 129/SvEv strain. For RNA-seq, ovulated oocytes were collected from Stella KO or wild type (WT) females crossed with vasectomised males at 9am on embryonic day (E) 0.5. Stella KO females were crossed with Stella KO males to produce Stella maternal and zygotic knockout (Stella M/Z KO, KO) embryos, while Stella WT females were crossed with Stella WT males to obtain WT embryos. 1-cell and 2-cell embryos were collected at 4pm on E0.5 and 9am-3pm on E1.5, respectively. All oocytes and embryos were morphologically assessed to ensure only viable samples were collected.

Single embryo RNA sequencing and RT-qPCR
Oocytes were incubated with 0.3 mg/ml hyaluronidase (Sigma, St. Louis, MO) to remove the cumulus cells. The zona pellucida was removed from oocytes and embryos prior to lysis. RNA was extracted from single oocytes, 1-cell embryos or whole 2-cell embryos, and cDNA was amplified as described previously with modifications (Tang et al. 2010). A 1:10,000 to 1:60,000 dilution of ERCC RNA Spike-In mix (ThermoFisher Scientific, Hemel Hempstead, UK) was added to the lysis buffer to estimate the efficiency of amplification (Jiang et al. 2011). A 1:10 dilution of the cDNA was used for RT-qPCR. For relative expression, all genes/TE transcripts were normalised to 3 housekeeping genes (ARBP, PPIA and GAPDH). For primer sequences see Supplementary File 3. For RNA-seq, cDNA was fragmented to ~200bp with a Focused-ultrasonicator (Covaris, Woburn, MA) and adaptor-ligated libraries were generated using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich, MA). Single-end 50bp sequencing was performed with a HiSeq1500 (Illumina, San Diego, CA) to an average depth of 13.5 million reads per sample.
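The normalisation of RT-qPCR signal to the three housekeeping genes can be illustrated with a short sketch. This is not the authors' analysis code: it assumes the common 2^-ΔCt approach, 100% amplification efficiency, and a reference Ct taken as the mean Ct of ARBP, PPIA and GAPDH. The Ct values and the function name are hypothetical.

```python
from statistics import mean

# Housekeeping genes named in the text.
HOUSEKEEPING = ("ARBP", "PPIA", "GAPDH")


def relative_expression(ct_values, target, reference_genes=HOUSEKEEPING):
    """Return 2^-(Ct_target - mean reference Ct) for one sample.

    Assumption: 100% PCR efficiency (a doubling per cycle).
    """
    reference_ct = mean(ct_values[gene] for gene in reference_genes)
    delta_ct = ct_values[target] - reference_ct
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for a single embryo.
sample = {"ARBP": 20.0, "PPIA": 21.0, "GAPDH": 19.0, "MuERV-L": 22.0}
rel = relative_expression(sample, "MuERV-L")
print(rel)
```

Averaging Ct values across reference genes is equivalent to taking the geometric mean of their estimated quantities, which limits the influence of any single unstable reference gene.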

2-cell embryo immunofluorescence staining
2-cell embryos were fixed in 4% paraformaldehyde (PFA) in PBS for 15 min at room temperature (RT). After permeabilisation with IF buffer (0.1% Triton, 1% BSA in PBS) for 30 min at RT, embryos were incubated with primary antibody overnight at 4ºC. After washing 3x with IF buffer, embryos were incubated with the corresponding secondary antibody and 1 µg/ml of DAPI (4′,6-diamidino-2-phenylindole) for DNA visualisation for 1 hr at RT. After washing, embryos were mounted in VECTASHIELD® Mounting Medium (Vector Laboratories, Peterborough, UK). For each experiment, WT and KO embryos were stained at the same time and processed identically, using the same settings for confocal acquisition to allow comparison of relative fluorescence intensity. The following antibodies were used: rabbit anti-MuERV-L Gag (ER50102, Hangzhou HuaAn Biotechnology Co., LTD, China) (RRID:AB_2636876), 1:200; Alexa488 donkey anti-rabbit IgG (ab150073, Abcam, Cambridge, UK) (RRID:AB_2636877), 1:500.

siRNA injections
B6CBAF1/J (F1) female mice were superovulated by injection of 5 International Units (IU) of pregnant mare’s serum gonadotrophin (PMSG) (Sigma, St. Louis, MO), followed by 5 IU of human chorionic gonadotrophin (hCG) (Sigma, St. Louis, MO) after 48 hours, and then mated with F1 male mice. Zygotes were harvested from oviducts 17-22 hours post hCG injection. Cumulus cells were removed by incubation with 0.3 mg/ml hyaluronidase in M2 medium (Sigma, St. Louis, MO). 20 µM or 80 µM of Stealth RNAi siRNA (ThermoFisher Scientific, Waltham, MA) was micro-injected into the cytoplasm of zygotes using a FemtoJet 4i device (Eppendorf, UK). The siRNA sequences were: scramble (sense: UUCCUCUCCACGCGCAGUACAUUUA) and MuERV-L (sense: GAAGAUAUGCCUUUCACCAGCUCUA). Injected embryos were cultured in M16 medium (Sigma, St. Louis, MO) in BD Falcon Organ Culture Dishes at 5% CO2 and 37ºC for a total of 5 days. The number of surviving 2-cell embryos 24 hrs after micro-injection represents the total number of embryos analysed per experimental group. The 20 µM and 80 µM siRNA injections were repeated twice independently.

Stella-HA Chromatin immunoprecipitation
We tagged three copies of hemagglutinin (HA, 27bp) to the C-terminus of Stella and used a high-affinity HA antibody to pull down Stella-bound chromatin. The Stella-HA construct was expressed in the absence of endogenous Stella by using Stella M/Z KO mESCs, with NANOG-HA and eGFP-HA as positive and negative controls, respectively. The ChIP protocol was modified from a previously described Stella ChIP protocol (Nakamura et al. 2012). Briefly, cells were cross-linked with 1% paraformaldehyde (PFA) for 8 minutes at RT, quenched with 200mM glycine for 5 minutes and washed twice with PBS. Samples were suspended in 2ml/IP of radio immunoprecipitation assay (RIPA) buffer (50mM Tris-HCl, pH8, 150mM NaCl, 1mM EDTA, 1% NP-40, 0.5% deoxycholate and 0.1% SDS) with a 1:20 dilution of proteinase inhibitor cocktail (Roche, Welwyn Garden City, UK) and sonicated to an average length of 200-1000bp. Dynabeads® Protein G (ThermoFisher Scientific, Hemel Hempstead, UK) were pre-incubated with mouse anti-HA antibody (MMS-101P-500, Covance Research Products Inc, Denver, CA) (RRID:AB_291261) at 7 µg/IP for 1 hour at 4℃. The antibody-bead complex was then incubated with the chromatin solution overnight at 4℃. Beads were washed twice with RIPA buffer, twice with high-salt wash buffer (20mM Tris-HCl, pH8, 500mM NaCl, 1mM EDTA, 1% NP-40, 0.5% deoxycholate and 0.1% SDS), twice with LiCl wash buffer (250mM LiCl, 20mM Tris-HCl, pH8, 1mM EDTA, 1% NP-40 and 0.5% deoxycholate) for 10 minutes at 4℃, followed by a final wash with Tris-EDTA. Chromatin was eluted by adding 100 µl of SDS elution buffer (50mM Tris-HCl (pH 7.5), 10mM EDTA, 1% SDS) and incubating at 68℃ for 10 minutes. Cross-links were reversed with 100 µl of proteinase K (1mg/ml) for 2 hours at 42℃ and 5 hours at 68℃. Finally, DNA was extracted with phenol purification and precipitated with NaCl into pellets. DNA pellets were re-suspended in 11 µl of H2O. We detected enrichment of NANOG in known binding regions using ChIP-qPCR, but not of eGFP, supporting the efficacy of the ChIP (data not shown). Stella-HA ChIP sequencing libraries were prepared with the NEBNext® Ultra™ Library Prep Kit for Illumina® (New England Biolabs, Ipswich, MA) with ~8ng DNA input. Single-end 50bp sequencing was performed with a HiSeq2500 (Illumina, San Diego, CA). Stella-HA ChIP-seq enriched peaks are shown in Supplementary File 2.

2C::tdTomato ESC culture and Dppa3 transfection
2C::tdTomato reporter (Addgene plasmid #40281) ESCs (Macfarlan et al. 2012) were cultured in GMEM (Life Technologies, Carlsbad, CA), 10% KnockOut™ Serum Replacement (ThermoFisher Scientific, Waltham, MA), 2mM L-glutamine (Life Technologies), 0.1mM MEM non-essential amino acids (Life Technologies), 100U/ml penicillin and 100µg/ml streptomycin (Life Technologies), 1mM sodium pyruvate (Sigma), 0.05mM 2-mercaptoethanol (Life Technologies), LIF (1000U/ml; ESGRO; Merck Millipore) with 2i (PD0325901 (1µM; Stemgent, Cambridge, MA) and CHIR99021 (3µM; Stemgent, Cambridge, MA)). For Dppa3 over-expression, a Tet-ON® 3G inducible vector containing the Dppa3 CDS was transfected into ESCs with Lipofectamine 2000 (ThermoFisher Scientific, Waltham, MA). 500ng/ml of DOX was added on day 0. For the Dppa3 knockdown experiment, an ON-TARGETplus SMARTpool (Dharmacon, Lafayette, CO), which contains 4 siRNAs targeting Dppa3, was transfected into ESCs using Lipofectamine® RNAiMAX transfection reagent (ThermoFisher Scientific, Waltham, MA). The Dppa3 siRNA SMARTpool includes the following siRNA sequences: GGAUGAUACAGACGUCCUA; UAGAUAGGAUGCACAACGA; AGAAAGUCGACCCAAUGAA; GAGUAUGUACGUUCUAAUU. ON-TARGETplus Non-targeting (scramble) siRNA (Dharmacon) was transfected as a negative control. tdTomato expression was analysed with a BD LSRFortessa (BD Biosciences, San Jose, CA).

Data processing and analysis

Mapping reads for gene-level counts and data processing
Single-end reads were mapped to the Mus musculus genome (GRCm38) using GSNAP (version 2014-07-21) with default options (Wu et al. 2010). To detect splice junctions in reads, we extracted known splice sites from the GTF file of GRCm38 provided by Ensembl (release 79). Uniquely mapped reads were counted for each gene using htseq-count (Anders et al. 2014). We assessed the quality of cells following the criteria previously described (Kolodziejczyk et al. 2015) (Figure 1 – Figure Supplement 2A). To remove unwanted variation between batches, we applied a generalised linear model to the gene-level counts using the RUVs function of the RUVSeq package of R with k=4 (Risso et al. 2014) (Figure 1 – source data 1). To perform principal component analysis, the batch-adjusted gene counts were normalised by the size factor estimated by the DESeq2 package of R, and lowly expressed genes whose mean normalised counts were below 100 were filtered out. The normalised counts were log-transformed after adding a pseudocount of 1. The prcomp function of R was applied to the log-transformed gene counts with the scaling option enabled.
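The pre-processing steps that precede the principal component analysis can be sketched as follows. This is an illustrative re-implementation in plain Python, not the R code used in the study: it assumes the median-of-ratios estimator that DESeq2 uses for size factors by default and a natural logarithm for the transformation (the text does not state the base). The function names and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> list of per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene, using only genes with no zero counts.
    log_geo_mean = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - lgm for g, lgm in log_geo_mean.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise_filter_log(counts, min_mean=100):
    """Divide by size factors, drop genes with mean < min_mean, return log(x + 1)."""
    factors = size_factors(counts)
    result = {}
    for gene, row in counts.items():
        normalised = [c / f for c, f in zip(row, factors)]
        if sum(normalised) / len(normalised) >= min_mean:
            result[gene] = [math.log(x + 1) for x in normalised]
    return result


# Toy example: the second sample is sequenced twice as deeply as the first.
toy_counts = {"g1": [100, 200], "g2": [300, 600], "g3": [10, 20]}
factors = size_factors(toy_counts)
processed = normalise_filter_log(toy_counts)
print(factors, sorted(processed))
```

In the toy data the depth difference is removed by the size factors, and the lowly expressed gene g3 is dropped by the mean-count filter before the log transformation.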

Differential expression analysis between WT and KO
From the batch-adjusted counts, we identified differentially expressed genes between WT and KO for each developmental stage using the DESeq function of the DESeq2 package of R with default options (Love et al. 2014). Genes with an adjusted P-value below an FDR cutoff of 0.05 were considered differentially expressed (Figure 1 – source data 2).
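The adjusted P-values reported by DESeq2 are, by default, Benjamini-Hochberg corrected. A minimal sketch of that correction and of the 0.05 cutoff is given below. It omits the independent filtering step that DESeq2 also applies by default, and the P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(padj) if q < 0.05]
print(padj, significant)
```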

Clustering of time-series data
To cluster the dynamic expression profiles of genes in WT, we defined 9 classes according to the differences between adjacent time points: EE, ED, EU, DE, DD, DU, UE, UD, UU, where “E” denotes equally expressed, “D” denotes downregulated and “U” denotes upregulated at the later time point. A gene is considered to show a significant difference between adjacent time points (“D” or “U”) if the adjusted P-value estimated by the DESeq function of the DESeq2 package with default options is less than 0.1. Otherwise, the difference is called “E”. For example, “ED” means 1) equally expressed between oocyte and 1-cell and 2) downregulated at 2-cell compared to 1-cell. Differential expression analysis was then performed between WT and KO samples, with an adjusted P-value <0.05 considered significantly different. WT class ED genes that are differentially expressed in KO samples are represented in the left heatmap (Figure 1E, Figure 1 – source data 3). WT class EU genes that are differentially expressed in KO samples are represented in the right heatmap (Figure 1E, Figure 1 – source data 4).
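The two-letter class labels can be derived from the two adjacent-stage comparisons as in the sketch below. It assumes each comparison is summarised by a log2 fold change (later stage over earlier stage) and an adjusted P-value, and that a missing adjusted P-value is treated as “E”. The function names and example values are hypothetical.

```python
def call_change(log2_fold_change, padj, alpha=0.1):
    """Return 'U', 'D' or 'E' for one comparison of adjacent time points."""
    if padj is None or padj >= alpha:
        return "E"  # no significant difference (or no test result available)
    return "U" if log2_fold_change > 0 else "D"


def expression_class(oocyte_vs_1cell, onecell_vs_2cell, alpha=0.1):
    """Combine two (log2 fold change, adjusted P-value) pairs into a class label."""
    return call_change(*oocyte_vs_1cell, alpha) + call_change(*onecell_vs_2cell, alpha)


# Hypothetical genes: one degraded at the 2-cell stage, one zygotically activated.
print(expression_class((0.2, 0.8), (-3.0, 0.001)))  # ED
print(expression_class((0.1, 0.5), (2.5, 0.01)))    # EU
```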

Gene ontology analysis
Gene ontology (GO) analyses were performed using DAVID 6.7 (https://david.ncifcrf.gov/home.jsp) for genes that were differentially upregulated or downregulated in KO compared to WT 2-cell embryos. Adjusted p-values from BP_Fat were plotted with ReviGO (Supek et al. 2011) allowing for medium similarity, with a selection of GO terms displayed (Figure 2A, Figure 2 – source data 1).

Genome-wide Dppa3 gene co-expression analysis
A Dppa3 co-expression network (Bassel et al. 2011) was constructed for genes expressed in WT oocyte, 1-cell, and 2-cell stages. The co-expression network can be represented as an adjacency matrix, $A$, with an edge being drawn between two genes $i$ and $j$ if the magnitude of the Pearson correlation coefficient, $C_{i,j}$, between their log2 transformed CPM (log2(CPM+1)) was greater than a particular threshold value, $\tau$:

$$A_{i,j} = \begin{cases} C_{i,j}, & |C_{i,j}| \geq \tau, \\ 0, & |C_{i,j}| < \tau. \end{cases}$$
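The construction of a thresholded co-expression adjacency can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: retained edges keep the correlation value as their weight, genes with constant expression are given a correlation of 0, and the threshold of 0.9 and the CPM values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_adjacency(cpm, threshold):
    """Return {(gene_i, gene_j): weight} with weight 0.0 below the threshold."""
    logged = {gene: [math.log2(v + 1) for v in values] for gene, values in cpm.items()}
    adjacency = {}
    for gene_i, gene_j in combinations(sorted(logged), 2):
        r = pearson(logged[gene_i], logged[gene_j])
        adjacency[(gene_i, gene_j)] = r if abs(r) >= threshold else 0.0
    return adjacency


# Hypothetical CPM values across four samples.
cpm = {
    "Dppa3": [0, 1, 3, 7],
    "geneA": [1, 3, 7, 15],
    "geneB": [7, 3, 1, 0],
    "geneC": [1, 0, 0, 1],
}
adjacency = coexpression_adjacency(cpm, threshold=0.9)
print(adjacency[("Dppa3", "geneA")], adjacency[("Dppa3", "geneC")])
```

Because the magnitude of the correlation is thresholded, strongly anti-correlated genes (geneB in the toy data) are retained as edges alongside positively correlated ones.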

Mapping reads and normalisation for TE counts
To investigate the expression of class I TEs (retrotransposons), we remapped the sequencing data to the same genome using GSNAP, randomly keeping only one genomic alignment for each multi-mapped read. Using the RepeatMasker GTF file provided by the UCSC Table Browser (downloaded on 2015-08-25), uniquely mapped reads were counted for each genomic region of a TE (annotated as repName in RepeatMasker) using htseq-count (Figure 3 – source data 1). We also generated multi-mapped counts by considering both uniquely and multi-mapped reads. To this end, the optional NH tags of all alignments in the SAM files were set to 1 and mapped reads were counted for each TE using htseq-count (Figure 3 – source data 2). We show that multi-mapped reads can be unambiguously assigned to a class (repClass in RepeatMasker) and family (repFamily in RepeatMasker) 98% of the time, but only 54% of the time at the level of an element (repName in RepeatMasker) (Figure 3 – Figure Supplement 3A). For this reason, all analyses at the level of retrotransposon class and family are based on multi-mapped counts, while analyses at the level of individual transposable elements are based only on uniquely mapped counts. To adjust for different sequencing depths, we normalised the TE counts by the total number of reads mapped to TEs and used the within-sample normalised values as our estimates of the expression levels of TEs (Figure 3B). The dynamic expression profiles of TEs during MZT were unchanged even when we normalised the TE counts by the size factor estimated by the DESeq2 package of R (between-sample normalisation method) (Figure 3 – Figure Supplement 3B).
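The within-sample normalisation of TE counts amounts to expressing each TE as a fraction of all TE-mapped reads in that sample. A minimal sketch, with hypothetical counts and function name, is:

```python
def normalise_te_counts(te_counts):
    """Divide each TE count by the total number of TE-mapped reads in the sample."""
    total = sum(te_counts.values())
    if total == 0:
        raise ValueError("sample has no reads mapped to TEs")
    return {te: count / total for te, count in te_counts.items()}


# Hypothetical counts for one embryo.
sample_counts = {"MERVL-int": 300, "MT2_Mm": 100, "L1Md_T": 600}
fractions = normalise_te_counts(sample_counts)
print(fractions)
```

Values normalised in this way are relative abundances within the TE-derived transcriptome, so a change in one abundant family shifts the fractions of all others; this is one reason to confirm the profiles with a between-sample method, as done in the text.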

Stella ChIP-Seq analysis
We mapped single-end reads of ChIP-seq experiments to the Mus musculus genome (mm10) using Bowtie2 (version 2.2.7) in local alignment mode (--local) with otherwise default options (Langmead et al. 2012). After retaining only uniquely mapped reads, peak calling was performed with HOMER (version 4.8.1) (Heinz et al. 2010) with default options (findPeaks -style factor), using the input sequencing run as a control. The peaks were associated with nearby genes (mm10) using HOMER annotatePeaks.pl.

Correlation analysis between genes and TEs
Uniquely mapped BAM files were uploaded into SeqMonk (version 0.32.0) (http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk/). The edgeR package (McCarthy et al. 2012) was used to identify differentially expressed genes and TEs between WT and KO 2-cell embryos. We intersected the differentially expressed genes with genes within ±20kb of differentially expressed TEs. A hypergeometric test was used to determine the significance of the overlap. For genome-wide analysis, we calculated Spearman’s rank correlation coefficient between each TE and its nearest neighbouring gene using the expression profiles of 66 cells. We excluded all TEs that overlapped any exon, including UTRs, allowing a maximum gap of 51bp, equal to the read length. The maximum distance between genes and TEs was set to 1Mbp.
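The hypergeometric test for the overlap between two gene sets can be written out explicitly. The sketch below computes the upper-tail probability of observing at least the given overlap; the choice of the background population (for example, all expressed genes) is an assumption that the text does not specify, and the numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap, population, n_set_a, n_set_b):
    """P(X >= overlap) when n_set_b genes are drawn from a population
    containing n_set_a 'marked' genes (one-sided enrichment test)."""
    upper = min(n_set_a, n_set_b)
    favourable = sum(
        comb(n_set_a, k) * comb(population - n_set_a, n_set_b - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, n_set_b)


# Hypothetical example: 10 genes in total, two sets of 5 that overlap completely.
p_value = hypergeom_upper_tail(overlap=5, population=10, n_set_a=5, n_set_b=5)
print(p_value)
```

Here set A would correspond to the differentially expressed genes and set B to the genes within ±20kb of differentially expressed TEs.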

Testing TE enrichments of zygotically activated genes (ZAGs)
To test whether TEs depleted in KO 2-cell embryos (adjusted P-value less than 0.05) are enriched in the neighbourhood of ZAGs, we counted the mean number of depleted TEs within ±10kb of the transcription start site (TSS) of ZAGs (n=698). ZAGs are defined as the WT class EU genes that are differentially expressed in KO samples in the time-series clustering (Figure 1 – source data 4). This process was then repeated 10,000 times for randomly selected groups of genes (n=698). We excluded all depleted TEs that overlapped any exon. The observed mean number was then tested against a null model assuming no enrichment of depleted repeats in the neighbourhood of ZAGs by computing the empirical P-value of the observed mean number based on the null distribution of simulated mean numbers of depleted TEs (Figure 4B).
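The resampling test can be sketched as follows, assuming the number of depleted TEs within ±10kb of each TSS has already been tabulated per gene. The add-one correction in the empirical P-value is a common convention and an assumption here, as are the function name, the random seed and the toy data.

```python
import random


def empirical_enrichment_pvalue(te_counts_per_gene, target_genes, n_iter=10_000, seed=0):
    """Compare the mean TE count near target genes with random gene sets of equal size."""
    rng = random.Random(seed)
    all_genes = sorted(te_counts_per_gene)
    n = len(target_genes)
    observed = sum(te_counts_per_gene[g] for g in target_genes) / n
    hits = 0
    for _ in range(n_iter):
        sampled = rng.sample(all_genes, n)
        simulated = sum(te_counts_per_gene[g] for g in sampled) / n
        if simulated >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Toy data: 100 genes, of which 5 hypothetical ZAGs each have 10 depleted TEs nearby.
toy = {f"gene{i}": 0 for i in range(100)}
zags = [f"gene{i}" for i in range(5)]
for gene in zags:
    toy[gene] = 10
observed_mean, p_emp = empirical_enrichment_pvalue(toy, zags, n_iter=1000)
print(observed_mean, p_emp)
```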

Chimeric transcript identification
We extracted split (split but mapped to the same chromosome) or translocated (mapped to different chromosomes) reads from the SAM files remapped by GSNAP with the novel splicing option (--novelsplicing=1 --distant-splice-penalty=0). Only one genomic alignment was kept for each multi-mapped read. From the CIGAR strings of the extracted reads, we calculated the genomic positions of both ends of split or translocated reads. If one end of a read overlapped a gene and the other end overlapped a repeat element unambiguously, we considered it a read supporting a chimeric transcript between a gene and a TE. Chimeric transcripts were validated by PCR amplification across the predicted chimeric junction, and the DNA sequence at the junction was validated by Sanger sequencing.
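For split reads on a single chromosome, the genomic positions of the two ends can be recovered from the SAM start position and CIGAR string as sketched below. This is a simplified illustration: it handles only reads whose split is encoded as an N operation in one alignment record, it does not cover translocated reads, and the gene and TE intervals are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks, split at N operations."""
    blocks = []
    start = cursor = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            if cursor > start:
                blocks.append((start, cursor - 1))
            cursor += length
            start = cursor
        elif op in REFERENCE_CONSUMING:
            cursor += length
    if cursor > start:
        blocks.append((start, cursor - 1))
    return blocks


def overlaps(block, feature):
    """True if two inclusive intervals share at least one base."""
    return block[0] <= feature[1] and feature[0] <= block[1]


def supports_chimera(pos, cigar, gene_interval, te_interval):
    """True if one end of a split read overlaps the gene and the other end the TE."""
    blocks = aligned_blocks(pos, cigar)
    if len(blocks) < 2:
        return False
    first, last = blocks[0], blocks[-1]
    return (overlaps(first, gene_interval) and overlaps(last, te_interval)) or (
        overlaps(first, te_interval) and overlaps(last, gene_interval)
    )


print(aligned_blocks(100, "20M1000N30M"))
print(supports_chimera(100, "20M1000N30M", (90, 130), (1100, 1200)))
```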

siRNA target analysis
To search for full-length MuERV-L copies with complete gag and pol genes, we aligned the sequences of the MuERV-L gag and pol genes (GenBank accession number: Y12713) to the Mus musculus genome (GRCm38) using Bowtie2 (version 2.3.0) with options “-a -D 20 -R 3 -N 1 -L 20 -i S,1,0.50”. From the RepeatMasker GTF file used for TE counts, we extracted all MERVL-int elements that overlapped the alignments to the gag and pol genes (n=583). To calculate the specificity of the MuERV-L siRNA, we mapped the sequence of the MuERV-L siRNA to the Mus musculus genome (GRCm38) using Bowtie2 (version 2.3.0) with options “-a -N 1 -L 10”. We stratified the SAM alignments into perfect match, one mismatch, and two mismatches by examining the “XM:i:” field and counted the alignments that overlapped the full-length MERVL-int elements.
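Stratifying alignments by mismatch number relies on the optional XM:i: tag that Bowtie2 writes for each aligned read. A minimal sketch of that step is shown below; it leaves out the subsequent overlap with full-length MERVL-int elements, and the SAM records are hypothetical.

```python
from collections import Counter


def mismatch_count(sam_line):
    """Return the integer value of the XM:i: tag, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags follow the 11 mandatory SAM fields
        if tag.startswith("XM:i:"):
            return int(tag[5:])
    return None


def stratify_by_mismatches(sam_lines):
    """Count mapped alignments by their number of mismatches."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & 4:
            continue  # read is unmapped
        mismatches = mismatch_count(line)
        if mismatches is not None:
            counts[mismatches] += 1
    return counts


# Hypothetical SAM records for one siRNA sequence reported at several loci.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "siRNA\t0\tchr1\t1000\t255\t25M\t*\t0\t0\t*\t*\tAS:i:0\tXM:i:0",
    "siRNA\t256\tchr2\t5000\t255\t25M\t*\t0\t0\t*\t*\tAS:i:-6\tXM:i:1",
    "siRNA\t272\tchr3\t7000\t255\t25M\t*\t0\t0\t*\t*\tAS:i:-6\tXM:i:1",
    "other\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\tYT:Z:UU",
]
strata = stratify_by_mismatches(example_sam)
print(dict(strata))
```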

Confocal acquisition and image analysis
Full projections of images were taken every 0.5 µm (Figure 5E) or 1.5 µm (Figure 3G, Figure 3 – Figure Supplement 1C, Figure 5D & G) along the z-axis with an inverted Leica TCS SP5 confocal microscope. For MuERV-L staining in 2-cell embryos, image analysis was carried out using Fiji (Schindelin et al. 2012). Measurements of fluorescence intensity were automated using a custom ImageJ macro that creates a Huang-thresholded 3D cell mask from the fluorescent signal and measures the mean intensity inside each embryo. To measure the number of DAPI+ cells in day 5 blastocysts (Figure 5E), we wrote a Jython script. To allow reliable counting of clustered cells with widely varying amounts of labelling, a hierarchical k-means segmentation algorithm (Dufour et al. 2008) was implemented to generate a 3D mask, and the ImageJ 3D Object Counter plugin (Bolte et al. 2007) was used to determine how many separate objects were present by binary connexity analysis.
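The final counting step, determining how many separate objects a binary 3D mask contains, is a connected-component analysis. The sketch below illustrates the idea in plain Python; it is not the ImageJ plugin or the Jython script used in the study, and the use of 26-neighbour connectivity and the toy mask are assumptions made for illustration.

```python
from collections import deque
from itertools import product


def count_objects(mask):
    """Count connected foreground objects in a binary 3D mask indexed as mask[z][y][x]."""
    depth = len(mask)
    height = len(mask[0]) if depth else 0
    width = len(mask[0][0]) if height else 0
    # All 26 neighbours of a voxel (assumed connectivity).
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    seen = set()
    n_objects = 0
    for z, y, x in product(range(depth), range(height), range(width)):
        if not mask[z][y][x] or (z, y, x) in seen:
            continue
        n_objects += 1
        seen.add((z, y, x))
        queue = deque([(z, y, x)])
        while queue:  # breadth-first flood fill of one object
            cz, cy, cx = queue.popleft()
            for dz, dy, dx in offsets:
                nz, ny, nx = cz + dz, cy + dy, cx + dx
                inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                if inside and mask[nz][ny][nx] and (nz, ny, nx) not in seen:
                    seen.add((nz, ny, nx))
                    queue.append((nz, ny, nx))
    return n_objects


# Toy mask with two z-slices containing two separate objects.
toy_mask = [
    [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
]
n_nuclei = count_objects(toy_mask)
print(n_nuclei)
```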

Acknowledgements
We thank Richard Butler for his support with the confocal imaging analysis, Charles Bradshaw for bioinformatic support, and Todd S. Macfarlan and Samuel L. Pfaff for the 2C::tdTomato ESCs. We also thank members of the Surani lab for their critical input and helpful discussions on this project. The work was funded by a studentship to YH from the James Baird Fund, University of Cambridge, by the DGIST Start-up Fund of the Ministry of Science, ICT and Future Planning to JKK, by a Wellcome Trust Senior Investigator Award to MAS, and by a core grant from the Wellcome Trust and Cancer Research UK to the Gurdon Institute.

Competing interests
No competing interests declared.

References

Albertsen, Maria, Marta Teperek, Grethe Elholm, Ernst-Martin Füchtbauer, and Karin Lykke-Hartmann. 2010. "Localization and differential expression of the Krüppel-associated box zinc finger proteins 1 and 54 in early mouse development." DNA and cell biology 29 (10):589-601. doi: 10.1089/dna.2010.1040.

Amouroux, R., B. Nashun, K. Shirane, S. Nakagawa, P. W. Hill, Z. D'Souza, M. Nakayama, M. Matsuda, A. Turp, E. Ndjetehe, V. Encheva, N. R. Kudo, H. Koseki, H. Sasaki, and P. Hajkova. 2016. "De novo DNA methylation drives 5hmC accumulation in mouse zygotes." Nat Cell Biol, 225-33.

Ancelin, Katia, Laurène Syx, Maud Borensztein, Noémie Ranisavljevic, Ivaylo Vassilev, Luis Briseño-Roa, Tao Liu, Eric Metzger, Nicolas Servant, Emmanuel Barillot, Chong-Jian Chen, Roland Schüle, and Edith Heard. 2016. "Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation." eLife 5. doi: 10.7554/eLife.08851.

Anders, S., P. T. Pyl, and W. Huber. 2014. "HTSeq--a Python framework to work with high-throughput sequencing data." Bioinformatics 31 (2):166-9. doi: 10.1093/bioinformatics/btu638.

Aravind, L., and Eugene V. Koonin. 2000. "SAP – a putative DNA-binding motif involved in chromosomal organization." Trends in Biochemical Sciences 25 (3):112-114. doi: 10.1016/S0968-0004(99)01537-6.

Bassel, G. W., H. Lan, E. Glaab, D. J. Gibbs, T. Gerjets, N. Krasnogor, A. J. Bonner, M. J. Holdsworth, and N. J. Provart. 2011. "Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions." Proc Natl Acad Sci U S A 108 (23):9709-14. doi: 10.1073/pnas.1100958108.

Beraldi, Rosanna, Carmine Pittoggi, Ilaria Sciamanna, Elisabetta Mattei, and Corrado Spadafora. 2006. "Expression of LINE-1 retroposons is essential for murine preimplantation development." Molecular Reproduction and Development 73 (3):279-287. doi: 10.1002/mrd.20423.

Bohne, A., F. Brunet, D. Galiana-Arnoux, C. Schultheis, and J. N. Volff. 2008. "Transposable elements as drivers of genomic and biological diversity in vertebrates." Chromosome Res 16 (1):203-15. doi: 10.1007/s10577-007-1202-6.

Bolte, S., and F. P. Cordelieres. 2007. "A guided tour into subcellular colocalization analysis in light microscopy." J Microsc 224 (Pt 3):213-32. doi: 10.1111/j.1365-2818.2006.01706.x.

Bortvin, Alex, Mary Goodheart, Michelle Liao, and David C. Page. 2004. "Dppa3 / Pgc7 / stella is a maternal factor and is not required for germ cell specification in mice." BMC developmental biology 4:2. doi: 10.1186/1471-213X-4-2.

Bowles, J., R. P. Teasdale, K. James, and P. Koopman. 2003. "Dppa3 is a marker of pluripotency and has a human homologue that is expressed in germ cell tumours." Cytogenetic and Genome Research 101 (3-4):261-265. doi: 74346.

Cantone, I., and A. G. Fisher. 2013. "Epigenetic programming and reprogramming during development." Nat Struct Mol Biol 20 (3):282-9. doi: 10.1038/nsmb.2489.

Castel, S. E., and R. A. Martienssen. 2013. "RNA interference in the nucleus: roles for small RNAs in transcription, and beyond." Nat Rev Genet 14 (2):100-12. doi: 10.1038/nrg3355.

Deng, Qiaolin, Daniel Ramsköld, Björn Reinius, and Rickard Sandberg. 2014. "Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells." Science 343 (6167):193-196. doi: 10.1126/science.1245316.

Denomme, M. M., and M. R. Mann. 2013. "Maternal control of genomic imprint maintenance." Reprod Biomed Online 27 (6):629-36. doi: 10.1016/j.rbmo.2013.06.004.

Dufour, A., V. Meas-Yedid, A. Grassart, and J. Olivo-Marin. 2008. "Automated quantification of cell endocytosis using active contours and wavelets." 2008 19th International Conference on Pattern Recognition.

Dupressoir, A., C. Lavialle, and T. Heidmann. 2012. "From ancestral infectious retroviruses to bona fide cellular genes: role of the captured syncytins in placentation." Placenta 33 (9):663-71. doi: 10.1016/j.placenta.2012.05.005.

Elbarbary, Reyad A., Bronwyn A. Lucas, and Lynne E. Maquat. 2016. "Retrotransposons as regulators of gene expression." Science (New York, N.Y.) 351 (6274):aac7247. doi: 10.1126/science.aac7247.

Eymery, A., Z. Liu, E. A. Ozonov, M. B. Stadler, and A. H. Peters. 2016. "The methyltransferase Setdb1 is essential for meiosis and mitosis in mouse oocytes and early embryos." Development 143 (15):2767-79. doi: 10.1242/dev.132746.

Gifford, Wesley D., Samuel L. Pfaff, and Todd S. Macfarlan. 2013. "Transposable elements as genetic regulatory substrates in early development." Trends in Cell Biology 23 (5):218-226. doi: 10.1016/j.tcb.2013.01.001.

Göke, Jonathan, Xinyi Lu, Yun-Shen Chan, Huck-Hui Ng, Lam-Ha Ly, Friedrich Sachs, and Iwona Szczerbinska. 2015. "Dynamic transcription of distinct classes of endogenous retroviral elements marks specific populations of early human embryonic cells." Cell Stem Cell 16 (2):135-141. doi: 10.1016/j.stem.2015.01.005.

Golbus, M. S., P. G. Calarco, and C. J. Epstein. 1973. "The effects of inhibitors of RNA synthesis (alpha-amanitin and actinomycin D) on preimplantation mouse embryogenesis." The Journal of Experimental Zoology 186 (2):207-216. doi: 10.1002/jez.1401860211.

Grow, E. J., R. A. Flynn, S. L. Chavez, N. L. Bayless, M. Wossidlo, D. J. Wesche, L. Martin, C. B. Ware, C. A. Blish, H. Y. Chang, R. A. Pera, and J. Wysocka. 2015. "Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells." Nature 522 (7555):221-5. doi: 10.1038/nature14308.

Hamatani, Toshio, Mark G. Carter, Alexei A. Sharov, and Minoru S. H. Ko. 2004. "Dynamics of global gene expression changes during mouse preimplantation development." Developmental Cell 6 (1):117-131.

Hayashi, Katsuhiko, Susana M. Chuva de Sousa Lopes, Fuchou Tang, and M. Azim Surani. 2008. "Dynamic equilibrium and heterogeneity of mouse pluripotent stem cells with distinct functional and epigenetic states." Cell stem cell 3 (4):391-401. doi: 10.1016/j.stem.2008.07.027.

Heinz, S., C. Benner, N. Spann, E. Bertolino, Y. C. Lin, P. Laslo, J. X. Cheng, C. Murre, H. Singh, and C. K. Glass. 2010. "Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities." Mol Cell 38 (4):576-89. doi: 10.1016/j.molcel.2010.05.004.

Hubley, Robert, Robert D. Finn, Jody Clements, Sean R. Eddy, Thomas A. Jones, Weidong Bao, Arian F. A. Smit, and Travis J. Wheeler. 2016. "The Dfam database of repetitive DNA families." Nucleic Acids Research 44 (D1):D81-89. doi: 10.1093/nar/gkv1272.

Jiang, Lichun, Felix Schlesinger, Carrie A. Davis, Yu Zhang, Renhua Li, Marc Salit, Thomas R. Gingeras, and Brian Oliver. 2011. "Synthetic spike-in standards for RNA-seq experiments." Genome Research 21 (9):1543-1551. doi: 10.1101/gr.121095.111.

Karimi, Mohammad M., Preeti Goyal, Irina A. Maksakova, Misha Bilenky, Danny Leung, Jie Xin Tang, Yoichi Shinkai, Dixie L. Mager, Steven Jones, Martin Hirst, and Matthew C. Lorincz. 2011. "DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements, and chimeric transcripts in mESCs." Cell Stem Cell 8 (6):676-687. doi: 10.1016/j.stem.2011.04.004.

Kigami, Daisuke, Naojiro Minami, Hanae Takayama, and Hiroshi Imai. 2003. "MuERV-L is one of the earliest transcribed genes in mouse one-cell embryos." Biology of Reproduction 68 (2):651-654.

Kim, J., H. Zhao, J. Dan, S. Kim, S. Hardikar, D. Hollowell, K. Lin, Y. Lu, Y. Takata, J. Shen, and T. Chen. 2016. "Maternal Setdb1 Is Required for Meiotic Progression and Preimplantation Development in Mouse." PLoS Genet 12 (4):e1005970. doi: 10.1371/journal.pgen.1005970.

Kolodziejczyk, A. A., J. K. Kim, J. C. Tsang, T. Ilicic, J. Henriksson, K. N. Natarajan, A. C. Tuck, X. Gao, M. Buhler, P. Liu, J. C. Marioni, and S. A. Teichmann. 2015. "Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation." Cell Stem Cell 17 (4):471-85. doi: 10.1016/j.stem.2015.09.011.

Langmead, B., and S. L. Salzberg. 2012. "Fast gapped-read alignment with Bowtie 2." Nat Methods 9 (4):357-9. doi: 10.1038/nmeth.1923.

Lee, Miler T., Ashley R. Bonneau, and Antonio J. Giraldez. 2014. "Zygotic genome activation during the maternal-to-zygotic transition." Annual Review of Cell and Developmental Biology 30:581-613. doi: 10.1146/annurev-cellbio-100913-013027.

Li, Lei, Xukun Lu, and Jurrien Dean. 2013. "The maternal to zygotic transition in mammals." Molecular Aspects of Medicine 34 (5):919-938. doi: 10.1016/j.mam.2013.01.003.

Li, Lei, Ping Zheng, and Jurrien Dean. 2010. "Maternal control of early mouse development." Development (Cambridge, England) 137 (6):859-870. doi: 10.1242/dev.039487.

Liu, Yu-Jung, Toshinobu Nakamura, and Toru Nakano. 2012. "Essential Role of DPPA3 for Chromatin Condensation in Mouse Oocytogenesis." Biology of Reproduction 86 (2):40. doi: 10.1095/biolreprod.111.095018.

Love, M. I., W. Huber, and S. Anders. 2014. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome Biol 15 (12):550. doi: 10.1186/s13059-014-0550-8.

Macfarlan, Todd S., Wesley D. Gifford, Shawn Driscoll, Karen Lettieri, Helen M. Rowe, Dario Bonanomi, Amy Firth, Oded Singer, Didier Trono, and Samuel L. Pfaff. 2012. "Embryonic stem cell potency fluctuates with endogenous retrovirus activity." Nature 487 (7405):57-63. doi: 10.1038/nature11244.

McCarthy, D. J., Y. Chen, and G. K. Smyth. 2012. "Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation." Nucleic Acids Res 40 (10):4288-97. doi: 10.1093/nar/gks042.

Nakamura, Toshinobu, Yoshikazu Arai, Hiroki Umehara, Masaaki Masuhara, Tohru Kimura, Hisaaki Taniguchi, Toshihiro Sekimoto, Masahito Ikawa, Yoshihiro Yoneda, Masaru Okabe, Satoshi Tanaka, Kunio Shiota, and Toru Nakano. 2007. "PGC7/Stella protects against DNA demethylation in early embryogenesis." Nature cell biology 9 (1):64-71. doi: 10.1038/ncb1519.

Nakamura, Toshinobu, Yu-Jung Liu, Hiroyuki Nakashima, Hiroki Umehara, Kimiko Inoue, Shogo Matoba, Makoto Tachibana, Atsuo Ogura, Yoichi Shinkai, and Toru Nakano. 2012. "PGC7 binds histone H3K9me2 to protect against conversion of 5mC to 5hmC in early embryos." Nature 486 (7403):415-419. doi: 10.1038/nature11093.

Nakashima, Hiroyuki, Tohru Kimura, Yoshiaki Kaga, Tsunetoshi Nakatani, Yoshiyuki Seki, Toshinobu Nakamura, and Toru Nakano. 2013. "Effects of dppa3 on DNA methylation dynamics during primordial germ cell development in mice." Biology of Reproduction 88 (5):125. doi: 10.1095/biolreprod.112.105932.

Nakatani, Tsunetoshi, Kazuo Yamagata, Tohru Kimura, Masaaki Oda, Hiroyuki Nakashima, Mayuko Hori, Yoichi Sekita, Tatsuhiko Arakawa, Toshinobu Nakamura, and Toru Nakano. 2015. "Stella preserves maternal chromosome integrity by inhibiting 5hmC-induced γH2AX accumulation." EMBO reports 16 (5):582-589. doi: 10.15252/embr.201439427.

Park, Sung-Joon, Katsuhiko Shirahige, Miho Ohsugi, and Kenta Nakai. 2015. "DBTMEE: a database of transcriptome in mouse early embryos." Nucleic Acids Research 43 (D1):D771-D776. doi: 10.1093/nar/gku1001.

Payer, Bernhard, Mitinori Saitou, Sheila C. Barton, Rosemary Thresher, John P. C. Dixon, Dirk Zahn, William H. Colledge, Mark B. L. Carlton, Toru Nakano, and M. Azim Surani. 2003. "stella Is a Maternal Effect Gene Required for Normal Early Development in Mice." Current Biology 13 (23):2110-2117. doi: 10.1016/j.cub.2003.11.026.

Peaston, Anne E., Alexei V. Evsikov, Joel H. Graber, Wilhelmine N. de Vries, Andrea E. Holbrook, Davor Solter, and Barbara B. Knowles. 2004. "Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos." Developmental Cell 7 (4):597-606. doi: 10.1016/j.devcel.2004.09.004.

Peat, J. R., W. Dean, S. J. Clark, F. Krueger, S. A. Smallwood, G. Ficz, J. K. Kim, J. C. Marioni, T. A. Hore, and W. Reik. 2014. "Genome-wide bisulfite sequencing in zygotes identifies demethylation targets and maps the contribution of TET3 oxidation." Cell Rep 9 (6):1990-2000. doi: 10.1016/j.celrep.2014.11.034.

Rebollo, Rita, Mark T. Romanish, and Dixie L. Mager. 2012. "Transposable elements: an abundant and natural source of regulatory sequences for host genes." Annual Review of Genetics 46:21-42. doi: 10.1146/annurev-genet-110711-155621.

Ribet, David, Sophie Louvet-Vallée, Francis Harper, Nathalie de Parseval, Marie Dewannieux, Odile Heidmann, Gérard Pierron, Bernard Maro, and Thierry Heidmann. 2008. "Murine endogenous retrovirus MuERV-L is the progenitor of the 'orphan' epsilon viruslike particles of the early mouse embryo." Journal of Virology 82 (3):1622-1625. doi: 10.1128/JVI.02097-07.

Risso, Davide, John Ngai, Terence P. Speed, and Sandrine Dudoit. 2014. "Normalization of RNA-seq data using factor analysis of control genes or samples." Nature Biotechnology 32 (9):896-902. doi: 10.1038/nbt.2931.

Rowe, Helen M., and Didier Trono. 2011. "Dynamic control of endogenous retroviruses during development." Virology 411 (2):273-287. doi: 10.1016/j.virol.2010.12.007.

Schindelin, J., I. Arganda-Carreras, E. Frise, V. Kaynig, M. Longair, T. Pietzsch, S. Preibisch, C. Rueden, S. Saalfeld, B. Schmid, J. Y. Tinevez, D. J. White, V. Hartenstein, K. Eliceiri, P. Tomancak, and A. Cardona. 2012. "Fiji: an open-source platform for biological-image analysis." Nat Methods 9 (7):676-82. doi: 10.1038/nmeth.2019.

Shen, Li, Azusa Inoue, Jin He, Yuting Liu, Falong Lu, and Yi Zhang. 2014. "Tet3 and DNA replication mediate demethylation of both the maternal and paternal genomes in mouse zygotes." Cell Stem Cell 15 (4):459-470. doi: 10.1016/j.stem.2014.09.002.

Supek, F., M. Bosnjak, N. Skunca, and T. Smuc. 2011. "REVIGO summarizes and visualizes long lists of gene ontology terms." PLoS One 6 (7):e21800. doi: 10.1371/journal.pone.0021800.

Tadros, Wael, and Howard D. Lipshitz. 2009. "The maternal-to-zygotic transition: a play in two acts." Development (Cambridge, England) 136 (18):3033-3042. doi: 10.1242/dev.033183.

Tang, Fuchou, Catalin Barbacioru, Ellen Nordman, Bin Li, Nanlan Xu, Vladimir I. Bashkirov, Kaiqin Lao, and M. Azim Surani. 2010. "RNA-Seq analysis to capture the transcriptome landscape of a single cell." Nature Protocols 5 (3):516-535. doi: 10.1038/nprot.2009.236.

Thompson, P. J., T. S. Macfarlan, and M. C. Lorincz. 2016. "Long Terminal Repeats: From Parasitic Elements to Building Blocks of the Transcriptional Regulatory Repertoire." Mol Cell 62 (5):766-76. doi: 10.1016/j.molcel.2016.03.029.

Tsukada, Y., T. Akiyama, and K. I. Nakayama. 2015. "Maternal TET3 is dispensable for embryonic development but is required for neonatal growth." Sci Rep 5:15876. doi: 10.1038/srep15876.

Wasson, Jadiel A., Ashley K. Simon, Dexter A. Myrick, Gernot Wolf, Shawn Driscoll, Samuel L. Pfaff, Todd S. Macfarlan, and David J. Katz. 2016. "Maternally provided LSD1/KDM1A enables the maternal-to-zygotic transition and prevents defects that manifest postnatally." eLife 5. doi: 10.7554/eLife.08848.

Wossidlo, Mark, Toshinobu Nakamura, Konstantin Lepikhov, C. Joana Marques, Valeri Zakhartchenko, Michele Boiani, Julia Arand, Toru Nakano, Wolf Reik, and Jörn Walter. 2011. "5-Hydroxymethylcytosine in the mammalian zygote is linked with epigenetic reprogramming." Nature Communications 2:241. doi: 10.1038/ncomms1240.

Wu, Gengshu, Chieko Aoyama, Stephen G. Young, and Dennis E. Vance. 2008. "Early embryonic lethality caused by disruption of the gene for choline kinase alpha, the first enzyme in phosphatidylcholine biosynthesis." The Journal of Biological Chemistry 283 (3):1456-1462. doi: 10.1074/jbc.M708766200.

Wu, J., B. Huang, H. Chen, Q. Yin, Y. Liu, Y. Xiang, B. Zhang, B. Liu, Q. Wang, W. Xia, W. Li, Y. Li, J. Ma, X. Peng, H. Zheng, J. Ming, W. Zhang, J. Zhang, G. Tian, F. Xu, Z. Chang, J. Na, X. Yang, and W. Xie. 2016. "The landscape of accessible chromatin in mammalian preimplantation embryos." Nature. doi: 10.1038/nature18606.

Wu, T. D., and S. Nacu. 2010. "Fast and SNP-tolerant detection of complex variants and splicing in short reads." Bioinformatics 26 (7):873-81. doi: 10.1093/bioinformatics/btq057.

Xue, Zhigang, Kevin Huang, Chaochao Cai, Lingbo Cai, Chun-yan Jiang, Yun Feng, Zhenshan Liu, Qiao Zeng, Liming Cheng, Yi E. Sun, Jia-yin Liu, Steve Horvath, and Guoping Fan. 2013. "Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing." Nature 500 (7464):593-597. doi: 10.1038/nature12364.

Yan, Liying, Mingyu Yang, Hongshan Guo, Lu Yang, Jun Wu, Rong Li, Ping Liu, Ying Lian, Xiaoying Zheng, Jie Yan, Jin Huang, Ming Li, Xinglong Wu, Lu Wen, Kaiqin Lao, Ruiqiang Li, Jie Qiao, and Fuchou Tang. 2013. "Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells." Nature Structural & Molecular Biology 20 (9):1131-1139. doi: 10.1038/nsmb.2660.

Figure title and legends

Figure 1. Stella M/Z KO embryos are impaired in maternal-to-zygotic transition.

A) A schematic illustration of the single-cell / embryo RNA-seq experimental setup. The total number of oocytes and embryos collected is indicated. The colour scheme represents the transition from maternal (green) to zygotic (red) transcripts. Maternal Stella is represented by a green circle.

B) A score plot of the first three principal components for 66 cells using gene counts. The lower panels represent the score plots of the first two principal components using cells belonging to a specific developmental time point. The developmental time points are indicated by colour and the genotypes for Stella are indicated by shape.

C) A Venn diagram illustrating the overlap of differentially expressed genes (DEG) between WT and KO oocytes, 1-cell and 2-cell embryos (adjusted P-value < 0.05) (Figure 1 – source data 2).

D) The heatmap represents the log2 ratio of the number of upregulated to downregulated genes in KO compared to WT at the oocyte, 1-cell and 2-cell stages, for genes belonging to a given cluster of DBTMEE (Park et al. 2015). Fisher’s exact test was performed and statistically significant p-values are stated.

E) A time-series clustering of gene expression dynamics from oocyte to 2-cell embryo (see Materials and Methods). (Left) Heatmap shows a cluster of WT maternal transcripts (ED), which are differentially expressed in KO samples (adjusted p-value <0.05, Figure 1 – source data 3). (Right) Heatmap shows a cluster of WT zygotically activated genes (ZAG) (EU), which are differentially expressed in KO samples (adjusted p-value <0.05, Figure 1 – source data 4).

Also see Figure 1 – Figure Supplement 1-2 and Figure 1 – source data 1-4.
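The per-cluster summary described for panel D of Figure 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code behind the figure: the function names, the pseudocount, the layout of the 2x2 table (genes up or down in KO against genes inside or outside a DBTMEE cluster) and the example counts are all assumptions made for this sketch.

```python
from math import comb, log2


def log2_up_down_ratio(n_up, n_down, pseudocount=1):
    """log2(number of upregulated / number of downregulated genes).

    The pseudocount is an assumption added here to avoid division by zero.
    """
    return log2((n_up + pseudocount) / (n_down + pseudocount))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = genes up / down in KO,
    columns = genes inside / outside the cluster of interest.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(x_min, x_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: 30 cluster genes up and 120 down in KO,
# against 500 up and 400 down outside the cluster.
print(log2_up_down_ratio(30, 120))
print(fisher_exact_two_sided(30, 500, 120, 400))
```

A negative log2 ratio means that more genes of the cluster are downregulated than upregulated in KO; the exact test asks whether that imbalance differs from the imbalance seen outside the cluster.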


Figure 2. Stella is associated with the activation of essential zygotic genes.

A) Revigo plots (Supek et al. 2011) of a selection of gene ontology (GO) terms enriched in 2-cell DEG (Figure 2 – source data 1). The colour of the circle represents the log10 p-value. Semantic space clusters GO terms of similar functions together.

B) A boxplot of normalised counts of genes belonging to the Ribosome KEGG pathway in WT and KO across the developmental stages. A two-sided Wilcoxon rank sum test was performed between WT and KO and the statistically significant p-value is stated.

C) and D) Boxplots of single-embryo qRT-PCR validation of RNA-seq in WT (white) and KO (light blue) 2-cell embryos. All genes are normalised to housekeeping genes, expressed relative to one WT embryo and log2 transformed. C) shows genes significantly upregulated and D) genes significantly downregulated in KO relative to WT 2-cell embryos (p<0.05 = *; p<0.01 = **; p<0.001 = ***; p<0.0001 = ****).

E) A table of the top 10 GO terms enriched in 710 genes whose expression levels are positively correlated with Dppa3 (Pearson’s correlation coefficient >0.9), identified from the genome-wide co-expression network analysis (Figure 2 – source data 2).

F) Scatter plots of the expression of Snrpd1 and Snrpb2 against Dppa3. Gene expression was log2 transformed (counts per million + 1) and Pearson’s correlation analysis was performed.

Also see Figure 2 – Figure Supplement 1 and Figure 2 – source data 1-2.
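The transformation and correlation named for panels E and F of Figure 2, log2(counts per million + 1) followed by Pearson’s correlation, can be illustrated as follows. This is a sketch under stated assumptions: the function names, library sizes and counts are hypothetical and are not taken from the study.

```python
from math import log2, sqrt


def log2_cpm_plus_one(count, library_size):
    """log2(CPM + 1), where CPM = count * 1e6 / library_size."""
    return log2(count * 1e6 / library_size + 1)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical raw counts for two genes in four samples (not data from the study).
library_sizes = [2_000_000, 2_500_000, 1_800_000, 2_200_000]
dppa3_counts = [1500, 2400, 900, 2000]
other_counts = [60, 110, 30, 85]

dppa3 = [log2_cpm_plus_one(c, n) for c, n in zip(dppa3_counts, library_sizes)]
other = [log2_cpm_plus_one(c, n) for c, n in zip(other_counts, library_sizes)]
r = pearson_r(dppa3, other)
print(f"r = {r:.3f}; above the 0.9 cut-off named in panel E: {r > 0.9}")
```

Adding 1 before taking the logarithm keeps genes with zero counts defined, at the cost of compressing differences between very lowly expressed genes.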

Figure 3. TEs are mis-expressed in the absence of Stella.

A) Top panel shows boxplots of the ratio of reads mapped to TEs to total reads mapped in WT and KO oocytes, 1-cell and 2-cell embryos. The number of reads mapped to TEs is based on uniquely and multi-mapped counts (Figure 3 – source data 2).

B) A heatmap of the relative expression of LTR, LINE and SINE elements in oocytes (green), 1-cell (yellow) and 2-cell embryos (red). Blue indicates higher expression and white indicates lower expression. Samples are clustered by row.

C) A score plot of the first three principal components for 66 cells using uniquely mapped TE counts (Figure 3 – source data 1). The lower panels represent the score plots of the first two principal components using cells belonging to a specific developmental time point. The developmental time points are indicated by colour and the genotypes of Stella are indicated by shape.

D) A bar chart of the odds ratio of TEs up- and downregulated in KO 2-cell embryos compared to WT, intersected with ‘maternal TEs’ and ‘zygotic TEs’. For maternal TEs enriched in TEs upregulated in KO 2-cell, *** p=5.35 × 10⁻⁶; for zygotic TEs enriched in TEs downregulated in KO 2-cell, *** p=8.13 × 10⁻⁷.

E) Bar plots showing the relative expression of TE families (LTR-ERVL, LINE1-L1 and SINE-Alu) in WT (white) and KO (light blue) oocytes, 1-cell and 2-cell embryos, analysed from single-cell / embryo RNA sequencing data. A two-sided Wilcoxon rank sum test was performed between WT and KO samples and statistically significant p-values are stated.

F) Top: an illustration of the structure of the full-length MuERV-L element flanked by 5’ and 3’ LTRs. Bottom: boxplots of the relative expression (counts per million) of MuERV-L Int and MT2_Mm transcripts in WT (white) and KO (light blue) oocytes, 1-cell and 2-cell embryos (p<0.05 corresponds to * and p<0.01 corresponds to **).

G) Immunofluorescence staining with an antibody against MuERV-L Gag in 2-cell embryos. (Left) The top panel shows bright-field, the middle panel shows MuERV-L Gag (green) and the bottom panel shows merged images of MuERV-L Gag and DAPI, which counterstains DNA. Representative projections of z-stacks are shown for WT and KO embryos. (Right) A boxplot of the z-stack quantifications of the relative fluorescence intensity of MuERV-L Gag protein in WT and KO 2-cell embryos.

Also see Figure 3 – Figure Supplement 1-3 and Figure 3 – source data 1-2.

Figure 4. A positive correlation between the expression of a subset of TEs and their nearest gene.

A) A Venn diagram illustrating a significant overlap between differentially expressed genes and genes within ±20kb of differentially expressed TEs in WT vs KO 2-cell embryos.

B) A histogram of the mean number of depleted TEs in Stella M/Z KO 2-cell embryos within ±10kb of the TSS of a group of genes. The red line represents zygotically activated genes (ZAG, n=698) that belong to WT class EU and are differentially expressed in KO samples (adjusted p-value <0.05) (Figure 1 – source data 4). The grey bars represent each of 10,000 random sets of 698 genes.

C) Top: schematic illustrations of the relationship between the TE and the downstream gene, where the grey box represents the TE and its orientation and the white box represents the gene and its orientation. Bottom: scatter plots of qRT-PCR expression in 2-cell embryos. WT (white circle) n=19 and KO (light blue triangle) n=17. Spearman’s correlation analysis was performed and p<0.0001 corresponds to ****.

D) Characterisation of chimeric transcripts. (Middle) A schematic illustration of the alternative splicing event that occurred between the TE and the downstream gene. SD = splice donor site and SA = splice acceptor site. The endogenous start codon (ATG) is indicated. (Left) The PCR product of the chimeric transcript, where the forward primer originates from the TE and the reverse primer originates from the gene. The PCR product is significantly shorter than the annotated distance between them, confirming a splicing event. (Right) Sanger sequencing chromatograms from the PCR product for each chimeric transcript, confirming the site of the chimeric junction. * indicates a nucleotide that differs from the annotation.

E) Box plots of single-embryo qRT-PCR of the relative expression of chimeric transcripts in WT (white) and KO (light blue) 2-cell embryos. P<0.01 corresponds to ** and p<0.001 corresponds to ***. All expression data are normalised to 3 housekeeping genes and expressed relative to 1 WT embryo.

Also see Figure 4 – Figure Supplement 1.
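The comparison in panel B of Figure 4, one observed gene set against 10,000 random gene sets of the same size, follows the logic of a resampling test. The sketch below shows that logic on toy data. The function name, the input format, the fixed random seed and the add-one empirical p-value are assumptions made for illustration; the legend does not state how, or whether, a p-value was derived from the random sets.

```python
import random
from statistics import mean


def random_set_test(te_counts, gene_set, n_sets=10_000, seed=0):
    """Compare the mean TE count of gene_set with same-sized random gene sets.

    te_counts: dict mapping gene -> number of depleted TEs near its TSS.
    Returns (observed mean, list of null means, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    genes = list(te_counts)
    observed = mean(te_counts[g] for g in gene_set)
    null_means = [
        mean(te_counts[g] for g in rng.sample(genes, len(gene_set)))
        for _ in range(n_sets)
    ]
    n_extreme = sum(m >= observed for m in null_means)
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (n_extreme + 1) / (n_sets + 1)
    return observed, null_means, p_value


# Toy data (hypothetical): 1,000 genes, of which the first 10 each have 5 nearby TEs.
toy_counts = {f"gene{i}": (5 if i < 10 else 0) for i in range(1000)}
toy_set = [f"gene{i}" for i in range(10)]
observed, null_means, p = random_set_test(toy_counts, toy_set, n_sets=200)
print(observed, max(null_means), p)
```

In the figure, the null means correspond to the grey histogram and the observed mean to the red line; the further the red line sits to the right of the histogram, the smaller the empirical p-value.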

Figure 5. MuERV-L plays a functional role during pre-implantation development

A) A schematic diagram illustrating the experimental setup.

B) Left: an illustration of the structure of the MuERV-L element (GenBank accession number: Y12713), depicting the exact site of the siRNA target. Right: a table showing the number of full-length MuERV-L elements targeted by MuERV-L siRNA and the number of off-targets.

C) Box plots of single-embryo qRT-PCR of MuERV-L pol transcript and D) MuERV-L Gag protein (fluorescence) intensity in uninjected controls and in 2-cell embryos injected with 20µM scramble siRNA or 20µM MuERV-L siRNA. (Right) Representative z-stack projections of immunofluorescence (IF) staining against MuERV-L Gag in 2-cell embryos. A two-tailed Wilcoxon rank sum test was performed for both C) and D); p<0.01 corresponds to *, and other results are marked not significant (n.s.) or stated explicitly.

E) The effect of MuERV-L siRNA on developmental progression of pre-implantation embryos. (Left) Bar graphs illustrating the number of embryos observed at different stages on days 2 to 4 of in vitro culture. Blue background indicates the area of interest. Experiments were repeated twice and the circle dots represent the data points for each replicate. A total of n=17, n=58 and n=53 embryos were analysed for uninjected, scramble and MuERV-L siRNA injections respectively. A one-tailed Student’s t-test was performed and there was no statistically significant difference between the uninjected, scramble and MuERV-L siRNA injected groups. (Middle) Day 5 bright-field images of late blastocysts injected with scramble or MuERV-L siRNA. (Right) A box plot of IF quantification of the number of DAPI+ cells at the late blastocyst stage on day 5 in embryos injected with 20µM of scramble (white) or MuERV-L (grey) siRNA.

F) and G) Box plots of single-embryo qRT-PCR of MuERV-L pol transcript and MuERV-L Gag protein (fluorescence) intensity in uninjected control, 80µM scramble siRNA and 80µM MuERV-L siRNA injected 2-cell embryos. (Right) Representative z-stack projections of IF staining with an antibody against MuERV-L Gag in 2-cell embryos. A two-tailed Wilcoxon rank sum test was performed for both F) and G); p<0.01 corresponds to *, and other results are marked not significant (n.s.) or stated explicitly.

H) A bar graph plotting the number of embryos observed at each stage on day 2 of in vitro culture in uninjected embryos and in embryos injected with 80µM scramble (white) or MuERV-L (grey) siRNA. Experiments were repeated twice and the circle dots represent the data points for each replicate. A total of n=17, n=54 and n=50 embryos were analysed for uninjected, scramble and MuERV-L siRNA respectively. A one-tailed Student’s t-test was performed. P<0.05 corresponds to * and P<0.01 corresponds to **.

Figure Supplement Legends

Figure 1 – Figure Supplement 1. Stella protein domains

A) A schematic diagram illustrating the putative Stella protein domains: SAP-like domain, splice-like factor, nuclear export and nuclear localization signals (Aravind et al. 2000, Payer et al. 2003).

B) A schematic diagram illustrating that the N-terminus of Stella (amino acids 1-75) is required for H3K9me2 binding in maternal pronuclei (PN), while the C-terminus of Stella (amino acids 76-150) is required to exclude TET3 from the maternal PN (Nakamura et al. 2012).

Figure 1 – Figure Supplement 2. Quality control of single-cell / embryo RNA-seq.

A) A scatter plot of the number of reads mapped to exons against the proportion of reads mapped to mitochondrial genes. The dashed red lines indicate the thresholds for a good-quality sample: 1) the total number of reads mapped to exons is greater than 500,000; 2) the proportion of reads mapped to the 37 genes on the mitochondrial chromosome is less than 10%. All 66 samples are of good quality.

B) A score plot of the first 2 principal components for cells of each developmental stage. RUV = removal of unwanted variation (see Materials and Methods, Figure 1 – source data 1). The top panel shows sample clustering pre-RUV, and the bottom panel shows sample clustering post-RUV. Each batch represents an independent experiment in which samples were collected, and is indicated by a different colour (Supplementary File 1).
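The two quality-control thresholds stated for panel A of Figure 1 – Figure Supplement 2 can be written as a simple filter. This is a minimal sketch: the function name, the sample records and the choice of total mapped reads as the denominator of the mitochondrial proportion are assumptions, since the legend does not specify the denominator.

```python
def passes_qc(exon_reads, mito_reads, total_mapped_reads,
              min_exon_reads=500_000, max_mito_fraction=0.10):
    """Return True if a sample meets both thresholds described in panel A.

    Assumption: the mitochondrial proportion is taken over all mapped reads.
    """
    mito_fraction = mito_reads / total_mapped_reads
    return exon_reads > min_exon_reads and mito_fraction < max_mito_fraction


# Hypothetical samples (not data from the study).
samples = {
    "oocyte_1": {"exon_reads": 2_400_000, "mito_reads": 90_000,
                 "total_mapped_reads": 3_000_000},
    "2cell_7": {"exon_reads": 420_000, "mito_reads": 30_000,
                "total_mapped_reads": 600_000},
}
kept = [name for name, s in samples.items() if passes_qc(**s)]
print(kept)
```

Both conditions must hold: the second hypothetical sample has a low mitochondrial proportion but is still rejected because too few reads map to exons.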

Figure 2 – Figure Supplement 1. Ribosome associated genes are depleted in Stella M/Z KO 2-cell embryos.

Comparison of normalised counts of genes belonging to the Ribosome associated gene ontology pathway between WT (white) and KO (blue) across the developmental stages. A two-sided Wilcoxon rank sum test was performed between WT and KO samples and the statistically significant p-value is stated.

Figure 3 – Figure Supplement 1. TEs are mis-expressed in the absence of Stella.

A) Heatmaps of the relative expression of individual families of LTR, LINE and SINE elements in oocyte (green), 1-cell (yellow) and 2-cell (red) embryos. Samples are clustered by row. Blue indicates relatively high expression while white indicates relatively low expression of TEs.

B) Bar charts of relative expression of LTR, LINE and SINE elements in WT (white) and KO (light blue) oocytes, 1-cell and 2-cell embryos. A two-sided Wilcoxon rank sum test was performed between WT and KO samples and statistically significant p-values are stated.

C) Validation of the MuERV-L Gag antibody. (Left) IF staining of an ESC reporter line for MuERV-L, 2C::tdTomato (Macfarlan et al. 2012), illustrating that MuERV-L Gag (GFP) staining overlaps well with tdTomato+ ESCs. (Right) IF staining showing robust detection of MuERV-L Gag (GFP) in 2-cell embryos, while it is virtually undetectable in metaphase II (MII) oocytes. DNA is counterstained with DAPI.

Figure 3 – Figure Supplement 2. Dppa3 does not affect MuERV-L expression in mESCs

A) Left: histogram plots of tdTomato expression in 2C::tdTomato mESCs 48hrs and 96hrs after addition of DOX to induce Dppa3 expression. Red represents tdTomato frequency with Dppa3 over-expression (OE), while blue represents the empty vector transfection control. The horizontal bars depict the expression threshold set for tdTomato and the proportion of tdTomato+ cells in each condition is stated. Right: a bar plot of Dppa3 expression 48hrs and 96hrs after addition of DOX. Dppa3 expression is normalized to a housekeeping gene (GAPDH) and expressed relative to the empty vector sample.

B) Left: histogram plots of tdTomato expression in 2C::tdTomato mESCs 48hrs and 96hrs after siRNA transfection. Red represents tdTomato frequency with Dppa3 siRNA, while blue represents the scramble siRNA control. The bar depicts the expression threshold set for tdTomato and the proportion of tdTomato+ cells in each condition is stated. Right: a bar plot of Dppa3 expression 48hrs and 96hrs after siRNA transfection. Dppa3 expression is normalized to a housekeeping gene (GAPDH) and expressed relative to the scramble siRNA sample.

Figure 3 – Figure Supplement 3. Quality control of TE reads mapping and normalization.

A) A bar plot of the proportion of multi-mapped TE reads (Figure 3 – source data 2) which can be unambiguously assigned at the level of class (repClass), family (repFamily) and element (repName) based on the RepeatMasker annotation provided by the UCSC table browser.

B) A heatmap of the expression of LTR, LINE and SINE elements in oocyte (green), 1-cell (yellow) and 2-cell (red) embryos based on the between-sample normalization method (see Materials and Methods). Samples are clustered by row.
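One way to read "unambiguously assigned" in panel A of Figure 3 – Figure Supplement 3 is that all alignments of a multi-mapped read fall in annotations sharing the same repClass, repFamily or repName. The sketch below illustrates that reading. The function name, the input structure and the example reads are hypothetical, and the assignment rule itself is an assumption rather than the documented procedure.

```python
def unambiguous_fraction(read_hits, level):
    """Fraction of multi-mapped reads whose hits all agree at the given level.

    read_hits: dict mapping read id -> list of annotation dicts, one per
    alignment, each with the keys "repClass", "repFamily" and "repName".
    """
    n_unambiguous = sum(
        1 for hits in read_hits.values()
        if len({hit[level] for hit in hits}) == 1
    )
    return n_unambiguous / len(read_hits)


# Hypothetical multi-mapped reads (annotation values chosen for illustration).
mt2 = {"repClass": "LTR", "repFamily": "ERVL", "repName": "MT2_Mm"}
mervl = {"repClass": "LTR", "repFamily": "ERVL", "repName": "MERVL-int"}
l1 = {"repClass": "LINE", "repFamily": "L1", "repName": "L1Md_T"}
reads = {
    "read1": [mt2, mt2],      # same element at both loci
    "read2": [mt2, mervl],    # same family, different elements
    "read3": [mervl, l1],     # different classes
}
for level in ("repClass", "repFamily", "repName"):
    print(level, unambiguous_fraction(reads, level))
```

Because repName is the finest level, the unambiguous fraction can only stay the same or fall when moving from class to family to element.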

Figure 4 – Figure Supplement 1. A subset of TEs is positively correlated with its nearest genes.

A) A histogram of Spearman’s correlation of the expression between a gene and its nearest TE within a distance of 1Mbp. The red lines indicate |correlation| >0.7. 387 TE/gene pairs have a correlation >0.7, while 224 TE/gene pairs have a correlation <-0.7.

B) (Left) A bar chart of MuERV-L int transcript expression in individual WT (white) and KO (grey) 2-cell embryos. Red stars indicate the 3 KO embryos expressing the lowest MuERV-L transcript levels (KO2, KO8 and KO1), while the green circle highlights KO6, the KO embryo which expresses the highest MuERV-L levels. (Right) A heatmap of expression of genes identified to make a chimeric junction with MuERV-L (Macfarlan et al. 2012). The heatmap is normalised by row and clustered by column.
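The selection described in panel A of Figure 4 – Figure Supplement 1, Spearman’s correlation for each TE/gene pair followed by a cut-off at |0.7|, can be sketched as below. The function names, the input format and the expression values are hypothetical; the code assumes neither expression vector is constant, since the correlation is undefined in that case.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def strongly_correlated(pairs, threshold=0.7):
    """Keep TE/gene pairs with |rho| above the threshold.

    pairs: dict mapping (te, gene) -> (te_expression, gene_expression).
    """
    rhos = {key: spearman_rho(te, gene) for key, (te, gene) in pairs.items()}
    return {key: rho for key, rho in rhos.items() if abs(rho) > threshold}


# Hypothetical expression values across four embryos.
pairs = {
    ("TE_a", "gene_a"): ([1, 2, 3, 4], [2, 3, 5, 9]),
    ("TE_b", "gene_b"): ([1, 2, 3, 4], [2, 1, 2, 1]),
}
selected = strongly_correlated(pairs)
print(selected)
```

Working on ranks makes the measure insensitive to the scale of expression, which suits comparisons between TEs and genes expressed at very different levels.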

Figure 5 – Figure Supplement 1. Chimeric transcript expression in MuERV-L knockdown embryos.

A scatter dot plot of qRT-PCR analysis of chimeric transcripts in uninjected (circle, n=4), 80µM scramble siRNA (square, n=12) and 80µM MuERV-L siRNA (triangle, n=8) treated 2-cell embryos. The median and interquartile range are indicated. All transcripts are normalised to housekeeping genes and expressed relative to one uninjected embryo. A two-sided Wilcoxon rank sum test was performed and statistically significant results are stated.
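The normalisation described for the qRT-PCR data in Figure 5 – Figure Supplement 1 (housekeeping genes first, then one reference embryo) is commonly implemented with the 2^-ΔΔCt approach. The sketch below assumes that approach and 100% amplification efficiency; neither is stated in the legend, and the function name and Ct values are hypothetical.

```python
from statistics import mean


def log2_relative_expression(ct_target, ct_housekeeping,
                             ref_ct_target, ref_ct_housekeeping):
    """log2 expression of a target relative to a reference embryo.

    Assumes the 2^-ddCt approach with 100% amplification efficiency:
    dCt = Ct(target) - mean Ct(housekeeping genes),
    ddCt = dCt(sample) - dCt(reference embryo).
    """
    d_ct = ct_target - mean(ct_housekeeping)
    d_ct_ref = ref_ct_target - mean(ref_ct_housekeeping)
    return -(d_ct - d_ct_ref)


# Hypothetical Ct values: the target amplifies two cycles later in the
# injected embryo than in the reference embryo, at equal housekeeping levels.
value = log2_relative_expression(
    ct_target=26.0, ct_housekeeping=[18.0, 20.0],
    ref_ct_target=24.0, ref_ct_housekeeping=[18.0, 20.0],
)
print(value, 2 ** value)
```

A later Ct means less starting template, so a positive ΔΔCt translates into a negative log2 relative expression.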

Source data file legends

Supplementary File 1. Sample details for single-cell / embryo RNA sequencing

Supplementary File 2. Stella ChIP-seq enriched peaks

Supplementary File 3. A list of primer sequences used in this study

Figure 1 – source data 1. Gene counts for WT and KO oocyte, 1-cell and 2-cell embryos. Gene counts for 66 samples from single cell / embryo RNA-seq experiments. Removal of Unwanted Variation analysis has been performed to control for potentially confounding technical factors.

Figure 1 – source data 2. List of differentially expressed genes between WT and KO samples at oocyte, 1-cell and 2-cell stage, from single cell/embryo RNA-seq analysis. ENSEMBL IDs and names of genes identified as significantly differentially expressed with adjusted P-value (padj) <0.05. Log2FoldChange=Log2(WT/KO). Stage indicates the developmental stage that is analysed.
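The sign convention Log2FoldChange=Log2(WT/KO) is easy to misread, so a small worked example may help. A plain ratio of two mean normalised counts is used here purely for illustration; the values in the source data come from the differential expression analysis, and the function name and numbers below are hypothetical.

```python
from math import log2


def log2_fold_change(wt_mean, ko_mean):
    """Log2FoldChange as defined in the legend: log2(WT / KO)."""
    return log2(wt_mean / ko_mean)


# Hypothetical mean normalised counts.
print(log2_fold_change(8, 2))   # positive: higher in WT, i.e. reduced in KO
print(log2_fold_change(2, 8))   # negative: lower in WT, i.e. increased in KO
```

Under this convention a gene that fails to be activated in KO embryos carries a positive value, the opposite of the more common KO/WT orientation.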

Figure 1 – source data 3. List of maternal transcripts, WT class = ED, which are differentially expressed in KO samples. "E" denotes equally expressed, "D" denotes downregulated and "U" denotes upregulated at a later time point. Adjusted p-value (padj) shows significant differential expression between WT and KO class.

Figure 1 – source data 4. List of zygotically activated genes (ZAG), WT class = EU, which are differentially expressed in KO samples. "E" denotes equally expressed, "D" denotes downregulated and "U" denotes upregulated at a later time point. Adjusted p-value (padj) shows significant differential expression between WT and KO class.

Figure 2 – source data 1. List of gene ontology terms enriched in differentially expressed genes between WT and KO 2-cell embryos. Gene ontology terms enriched for biological processes were calculated using DAVID software (version 6.7).

Figure 2 – source data 2. Genome-wide Dppa3 co-expression network analysis. Table shows Ensembl IDs and official names of genes exhibiting significant expression correlation with Dppa3 (Pearson correlation |r| > 0.9).

Figure 3 – source data 1. Uniquely mapped transposable element (TE) counts. TE counts for 66 samples from single cell / embryo RNA-seq experiments. Uniquely mapped reads only.

Figure 3 – source data 2. Uniquely and multi-mapped transposable element (TE) counts. TE counts for 66 samples from single cell/embryo RNA-seq experiments. Uniquely and multi-mapped reads.

43 Figure 1

A Oocyte 1-cell 2-cell B

X Stella WT Stella WT Stella Stella Stella 30 n=12 n=12 n=9 20

Stella KO

X 10 Stella KO 0 n=12 n=12 n=9

60 PC2 (8.55%) PC3 (5.09%) −10 40 C 20 Oocyte (WT) Overlap in DEG between WT and KO Oocyte ( KO) −20 0 oocytes, 1-cell and 2-cell embryos −20 1-Cell (WT) −40 1-Cell ( KO) −30 Oocyte −100 −80 −60 −40 −20 0 20 40 60 2-Cell (WT) PC1 (43.9%) 2-cell ( KO) 1564 Oocyte 1-Cell 2-Cell 40 40 354 40 839 0

360 0 1614 0 918 232 PC2 (9.3%) PC2 (15.6%) PC2 (10.4%)

1-Cell 2-cell −40 −40 −40 −80 −40 0 40 −40 0 40 −40 0 40 PC1 (23.7%) PC1 (18.3%) PC1 (21%) D Classification according to DBTMEE E Z−Score 1.7e-18 Maternal RNA

−1 1 1.1e-54 Minor ZGA −5 0 5 No. of UP Log2 No. of DOWN 0.038 1-Cell Transient genes in KO vs WT 0.016 3.5e-27 Major ZGA samples 1.3e-15 2-Cell Transient WT KO WT KO WT KO WT KO WT KO WT KO -33 Mid-preimplantation 5.7e activation Oocyte 1-cell 2-cell Oocyte 1-cell 2-cell

1-Cell Oocyte 2-Cell Figure 2 A Gene Ontology terms B KEGG pathway - Ribosome

KO 2-cell UP KO 2-cell DOWN p=0.024 log log10 10 p-value chromatin modification p-value 0 0.0 5 ribonucleoprotein 5 −10 −6.0 RNA processing complex biogenesis xpression −20 e 100 cell cycle process cellular response to microtubule DNA damage stimulus 0 cell cellcycle division 0 mitotic cell cycle process based process DNA metabolic WT

process malised gene KO semantic space x

No r 1

semantic space x −5 −5 protein transport RNA localization

−10 −5 0 5 10 −5 0 5 Oocyte 1-cell 2- cell semantic space y semantic space y

C D Translation DNA damage Histone deacetylase complex Ribosome biogenesis RNA binding initiation Fam175a Brip1 Rbbp7 Kdm1b Nop16 Emg1 Larp1b Ppp1r15b 6 * 10 * 5.0 * 4 * ** p=0.15 ** 0 ** 0 0.0 5 0 −1 4 2.5 2 0 −5 −2.5 −2 2 −2 −5 0.0 0 −3 −10 −5.0 0 −10 −4 −4 −2

Relative gene expression −2.5 −15 Relative gene expression microtubule protein Cell signalling Cell cycle Cell signalling Mapre1 Pla2g4c Dusp1 Cdc25c Socs3 Hipk1 Smad7 5.0 *** 4 *** * **** **** 0 *** 5 * 15 2 2 2.5 2 0 −5 0 10 0 0 0.0 0 −5 WT −5 WT 5 -2 −2 −10 −2.5 −2 −10 0 KO −4 −10 KO Relative gene expression -4 −15 −5.0 −4 Relative gene expression −15 WT n=19, KO n=17 One-sided Wilcoxon rank sum test WT n=19, KO n=17 One-sided Wilcoxon rank sum test

E F

Gene Ontolgoy Terms FDR 6 corr = 0.970 corr = 0.976 1.22x10-11 RNA processing 7 mRNA metabolic process 2.73x10-8 5 4.47x10-8 mRNA processing (CPM+1) (CPM+1) 2 2 6 RNA splicing 1.58x10-5 log log -5 4 M phase 8.54x10 1.97x10-4 5 pb2 chromosome segregation pd1 3 organelle fission 2.03x10-4 Sn r Sn r -4 nuclear division 4.42x10 4 mitosis 4.42x10-4 9 10 11 9 10 11 2 2 4.62x10-4 cell cycle Dppa3 log2 (CPM+1) Dppa3 log2(CPM+1) Figure 3

A C 0.5 TEs /

0.4 20 0.3 15 0.2 10 otal reads mapped 5 0.1 T 0 Ratio of reads mapped to 30 PC3 (5.55%)

−5 20 1-cell_WT1-cell_KO2-cell_WT2-cell_KO Oocyte_WTOocyte_KO 10 0 PC2 (7.13%) −10 B −10 Oocyte (WT) Relative TE −20 Oocyte (KO) −15 expression −50 −40 −30 −20 −10 0 10 20 1-Cell (WT) 1-Cell (KO) PC1 (47.5%) 2- Cell (WT) 0.2 0.6 2- cell (KO) Oocyte 1-cell 2-cell

Oocyte 10 20 1-Cell 10 0 0 2- Cell 10 −10 0 −10 −20 PC2 (15.5%) PC2 (11.8%) PC2 (9.86%) −10 TR −20 L −30 LINE SINE −20 −10 0 10 20 −20−10 0 10 20 −20−10 0 10 20 30 PC1 (18.4%) PC1 (16%) PC1 (29.7%) D *** E LTR-ERVL LINE-L1 SINE-Alu 4 *** 0.04 0.0001 0.3 0.100 0.12 3 Maternal TEs atio xpression xpression Zygotic TEs xpression e e e 0.075 0.2 2 0.03 0.08 Odds r 0.050 WT 1 0.1 0.002 0.04 KO e to total TE e to total TE e to total TE 0.025 v v v 0 UP in DOWN in 0.0 0.000 0.00 Relati Relati Relati KO 2-cell KO 2-cell Oocyte 1-Cell 2- Cell Oocyte 1-Cell 2- Cell Oocyte 1-Cell 2- Cell F 5’LTR MuERV-L 3’LTR gag pol MT2_Mm MT2_Mm MuERV-L Int MT2_Mm ** * 20000 12000 10000 4000 Counts per million Counts per million 0 0 WT KO WT KO WT KO WT KO WT KO WT KO Oocyte 1-Cell 2-Cell Oocyte 1-Cell 2-Cell One-sided Wilcoxon rank sum test G Wild type Stella M/Z KO

-4 80 5.0x10 Bright Field 60 WT n=6

40 KO n=10 MuERV-L Gag 20

MuERV-L Gag Relative MuERV-L Gag intensity DAPI 0 WT KO Two-sided Wilcoxon rank sum test Figure 4

[Figure 4 panels.
A: relationship between expression of TEs and genes in 2-cell embryos; overlap of differentially expressed genes and genes associated with differentially expressed TEs (68%, 12%, 20%); p=3.19x10-87, hypergeometric test performed.
B: random genes and zygotically activated genes; frequency against mean no. of TEs +/- 10kb of a group of genes.
C: TE and gene expression, WT and KO, with Spearman r annotations (0.67, 0.81, 0.69, 0.63): RMER19A (668bp), 1.5kb, Exoc3l2; ORR1A2-Int (1805bp), 52kb, Atf7ip2; ORR1A2-Int (726bp), 27kb, Chka; MT2B (563bp), 1.2kb, Zfp54.
D: chimeric transcripts, with splice donor (SD), splice acceptor (SA) and ATG marked; TE, gene, annotated sizes and junction sequence as labelled:
  MT2_Mm, Snrpc, 155bp, 1.71kb, CTGCCGACTACGGAGTAACTAATCC
  MT2C_Mm, GM12617, 106bp, 2.67kb, CTGCTGACTGAGGCCAATTGGCAGA
  MT2B2, Bex6, 147bp, 2.10kb, ACTACAAATTGGCACTGTCTCCAGTC
  MT2B, Gpbp1, 101bp, 31.5kb, CTGCAAACTGGACTTGCCATGAGGT
  B2_Mur2, Bri3bp, 101bp, 8.30kb, AGATTGAAGTTACTGTCCAGGCTGA
  MT2B1, Rbm8a, 147bp, 0.5kb, CTGCAGCCCCTTCCCGAGGTGCTTA
E: chimeric transcript expression in 2-cell embryos; relative transcript expression for Snrpc, GM12617, Bex6, Gpbp1, Bri3bp and Rbm8a; WT n=10, KO n=10. One-sided Wilcoxon rank sum test.]

Figure 5

[Figure 5 panels.
A: siRNA injection at the 1-cell stage, followed through 2-cell, 4-cell, 8-cell, morula and blastocyst.
B: MuERV-L siRNA target on full-length MuERV-L (6471 bp; LTR, Gag, Pro-pol-dUTPase, LTR; primer_bind, polypurine tract, siRNA target). siRNA matches:

                      Perfect match   1bp mismatch   2bp mismatch   ≤ 2bp mismatch
  Full length MuERV-L 469 (80.5%)     105 (18.0%)    6 (1.0%)       580 (99.5%)
  Off targets         12              8              6              26

C: MuERV-L pol transcript expression, 20μM; relative transcript expression for uninjected, scramble siRNA and MuERV-L siRNA.
D: MuERV-L Gag protein expression, 20μM; MuERV-L Gag protein level as relative fluorescence intensity. Two-sided Wilcoxon rank sum test.
E: development from Day 2 to Day 5 for uninjected, scramble siRNA 20μM and MuERV-L siRNA 20μM; percentage of total embryos by stage (2-cell, 3-cell, 4-cell, 8-cell, morula, blastocyst, died); number of DAPI+ cells. One-sided Wilcoxon rank sum test; one-tailed Student's t-test.
F to H: MuERV-L pol transcript expression, 80μM; MuERV-L Gag protein expression and protein level, 80μM; Day 2 percentage of total embryos (2 cell, 4 cell - 8 cell) for scramble siRNA 80μM and MuERV-L siRNA 80μM. Two-sided Wilcoxon rank sum test; one-tailed Student's t-test.]

Figure 1 - Figure Supplement 1. Stella protein domains

[Panels A and B: SAP-like domain; splice-like factor; nuclear export signal; nuclear localisation signal; H3K9me2 binding domain; TET3 exclusion from maternal PN. Scale: 100 amino acids.]

Figure 1 - Figure Supplement 2

[Panels A and B: proportion of reads mapped to mitochondrial genes against number of reads mapped to exons; PC1 against PC2 for oocyte, 1-cell and 2-cell samples, pre-RUV and post-RUV, coloured by batch (Batch 1 to Batch 7).]

Figure 2 - Figure Supplement 1

[GO:0005840 - Ribosome: normalised gene expression, WT and KO, in oocyte, 1-cell and 2-cell; p=0.027.]

Figure 3 - Figure Supplement 1

[Panels.
A: relative TE expression by TE family within the LTR, LINE and SINE classes, in oocyte, 1-cell and 2-cell. Legible family labels include L1, L2, CR1, Alu, B2, B4, ID, MIR, Deu, tRNA and Gypsy.
B: LTR, LINE and SINE expression relative to total TE expression, WT and KO, in oocyte, 1-cell and 2-cell.
C: MuERV-L Gag, tdTomato, DAPI and merged images; 2-cell, 2C::tdTomato and MII oocyte.]

Figure 3 - Figure Supplement 2

[Panels.
A: Dppa3 OE, 48hrs + DOX and 96hrs + DOX; frequency against tdTomato (MuERV-L). 48hrs: empty vector = 12.0%, Dppa3 OE = 12.2%. 96hrs: empty vector = 8.94%, Dppa3 OE = 10.4%. Dppa3 relative expression at 48hrs and 96hrs, empty vector and Dppa3 over-expression.
B: 48hrs + siRNA and 96hrs + siRNA; frequency against tdTomato (MuERV-L). 48hrs: scramble siRNA = 7.83%, Dppa3 siRNA = 7.30%. 96hrs: scramble siRNA = 9.10%, Dppa3 siRNA = 9.31%. Dppa3 relative expression at 48hrs and 96hrs, scramble siRNA and Dppa3 siRNA.]

Figure 3 - Figure Supplement 3

[Panels A and B: proportion of unambiguous multimapped reads at the element, family and class level; between-sample normalisation method, TE expression for LTR, LINE and SINE.]

Figure 4 - Figure Supplement 1

[Panels.
A: frequency of Spearman's correlation for TE/gene pairs; annotated counts 224 (TE/gene < -0.7) and 387 (TE/gene > 0.7).
B: MuERV-L transcript expression in 2-cell embryo, counts per million, for samples WT1 to WT9 and KO1 to KO9; genes with chimeric junction to MuERV-L, row Z-score.]

Figure 5 - Figure Supplement 1

[Relative transcript expression of MT2B1_Rbm8a, MT2C_Mm_GM12617, MT2B2_Bex6, MT2_Mm Snrpc, MT2B_Gpbp1 and B2_Bur2_Bri3bp in uninjected (n=4), 80µM scramble siRNA (n=12) and 80µM MuERV-L siRNA (n=8) embryos. Annotated p-values: p=0.057, p=0.001, p=0.001. Two-sided Wilcoxon rank sum test.]