A PAX5-OCT4-PRDM1 developmental switch specifies human primordial germ cells

Authors: Fang, Fang, Benjamin Angulo, Ninuo Xia, Meena Sukhwani, Zhengyuan Wang, Charles C. Carey, Aurelien Mazurie, Jun Cui, Royce Wilkinson, Blake Wiedenheft, Naoko Irie, M. Azim Surani, Kyle E. Orwig, and Renee A. Reijo Pera

This is a postprint of an article that originally appeared in Nature Cell Biology in April 2018. The final version can be found at https://dx.doi.org/10.1038/s41556-018-0094-3.

Fang, Fang, Benjamin Angulo, Ninuo Xia, Meena Sukhwani, Zhengyuan Wang, Charles C. Carey, Aurelien Mazurie, Jun Cui, Royce Wilkinson, Blake Wiedenheft, Naoko Irie, M. Azim Surani, Kyle E. Orwig, and Renee A. Reijo Pera. "A PAX5-OCT4-PRDM1 developmental switch specifies human primordial germ cells." Nature Cell Biology 20, no. 6 (April 2018): 655-665. DOI:10.1038/s41556-018-0094-3.

Made available through Montana State University’s ScholarWorks (scholarworks.montana.edu)

Fang Fang 1,2*, Benjamin Angulo1,2, Ninuo Xia1,2, Meena Sukhwani3, Zhengyuan Wang4, Charles C. Carey5, Aurélien Mazurie5, Jun Cui1,2, Royce Wilkinson5, Blake Wiedenheft5, Naoko Irie6, M. Azim Surani6, Kyle E. Orwig3 and Renee A. Reijo Pera 1,2

Dysregulation of genetic pathways during human development leads to infertility. Here, we analysed bona fide human primordial germ cells (hPGCs) to probe the developmental genetics of human germ cell specification and differentiation. We examined the distribution of OCT4 occupancy in hPGCs relative to human embryonic stem cells (hESCs). We demonstrated that development, from pluripotent stem cells to germ cells, is driven by switching partners with OCT4 from SOX2 to PAX5 and PRDM1. Gain- and loss-of-function studies revealed that PAX5 encodes a critical regulator of hPGC development. Moreover, an epistasis analysis indicated that PAX5 acts upstream of OCT4 and PRDM1. The PAX5–OCT4–PRDM1 proteins form a core transcriptional network that activates germline and represses somatic programmes during human germ cell differentiation. These findings illustrate the power of combined genome editing, cell differentiation and engraftment for probing human developmental genetics that have historically been difficult to study.

Substantial research has centred on the identification and characterization of genes that are required for the specification, maintenance and differentiation of the mammalian PGCs that ultimately give rise to the sperm and eggs required to perpetuate life1–13. In mice, several transcription factors have been identified that are required for the in vitro specification and induction of the earliest stages of germ cells. These germ cells are ultimately able to mature and fulfil the greatest test of germ cell identity, the ability to produce live offspring10,14–18. However, the transcriptional network of hPGCs differs substantially from that of mice, making it difficult to translate this knowledge directly to humans11. For example, hPGCs express lineage specifier genes, such as SOX17, that are not expressed in mouse PGCs19.

Although hPGCs are committed to the germ cell lineage, they share with hESCs expression profiles of several pluripotency genes, including OCT4 (also known as POU5F1); however, other key pluripotency genes, such as SOX2, are not expressed in hPGCs11,20,21. How the network of pluripotency genes, which encode transcription factors, functions differently in hESCs and hPGCs is a fundamental question in the field of human germ cell developmental genetics that has remained unaddressed. Thus, in this study, we elucidated genetic mechanisms that underlie the development of hPGCs by identifying transcription factors that might function in a network to mediate hPGC specification and differentiation. We focused on OCT4, an essential gene that encodes a transcription factor that is expressed in both hESCs and hPGCs and is required to maintain cell identity in both cell types22–24. We developed methods to map the genome-wide binding of OCT4 in highly heterogeneous tissue samples and identified OCT4 protein partners in hPGCs. We then used gain- and loss-of-function gene analyses to probe the function of PAX5, a member of the paired box (PAX) family, and discovered that it encodes a critical component of a genetic switch that is required for the transition from pluripotent stem cells to the differentiation of hPGCs.

Results

Global redistribution of OCT4 occupancy in the transition from hPSCs to hPGCs. To probe the role of OCT4 in hPGCs, we performed OCT4 chromatin immunoprecipitation (ChIP) assays with sequencing (ChIP–seq) on germ cells from second trimester human fetal testis cells. At this developmental stage, hPGCs have colonized the testis and are in the process of expanding to approximately 1–2 million cells in total, but have not differentiated to spermatogonia or spermatocytes6. We noted that OCT4-positive cells are only present in the seminiferous tubules of the testis and not within the interstitial spaces (Fig. 1a, Supplementary Fig. 1a). Immunostaining data also indicated that OCT4-positive cells are a subpopulation of c-KIT-positive cells and do not express DDX4, which is an evolutionarily conserved germ cell marker of later stages of development (post-PGC; Supplementary Fig. 1b,c). However, since only 1% of the cells in the human fetal testis are OCT4-positive hPGCs (Fig. 1a, Supplementary Fig. 1a), and conventional ChIP protocols require a large number of homogeneous cells, we adapted protocols from carrier ChIP25 and tissue ChIP26 to detect binding specificities of individual transcription factors within a heterogeneous cell mixture. We validated our protocol using a heterogeneous control mixture of 10,000 OCT4-positive hESCs mixed with 990,000 OCT4-negative fibroblast cells to model the composition of fetal testis (Supplementary Fig. 1d). We compared these data with those generated by conventional ChIP on a pure population of 1 million hESCs by quantitative PCR (qPCR; Supplementary Fig. 1e) and ChIP–seq. We found that the result from mixed-ChIP correlated highly with

[Fig. 1 image. Labels recovered from the panels: a, OCT4 and DAPI; b, OCT4 peak heatmaps for ESCs and PGCs (4,905 peaks) and normalized quantity of peaks against distance to TSS; c, scatter plot of OCT4 ChIP–seq in ESCs against OCT4 ChIP–seq in PGCs, with labelled genes including OCT4, NANOG, NODAL, DPPA4, TDGF1, TERT, LIN28A, PIWIL1, NANOS2, PUM1, PUM2, MLH1, DDX4, DMC1 and ZP4; d, OCT4 ChIP–seq tracks for ESCs and PGCs at OCT4 and PIWIL1; e, Venn diagram for ESCs and PGCs (348, 1,358 and 742 genes) with GO terms (male gamete generation, lipid glycosylation, glucuronate metabolic process, cellular glucuronidation; synaptic transmission, axon development, neuron projection development, nervous system development, neuron development, neuron differentiation, neurogenesis, neuron projection, synapse) plotted by P value and gene number.]

Fig. 1 | Global redistribution of OCT4 binding in PGCs compared with ESCs. a, Cross-section of a human fetal testis (22 weeks) immunostained for OCT4. The panel on the right is an enlargement of the region enclosed within the white broken lines of the left panel. Scale bars, 50 µm; inset, 100 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. b, Left panel: heatmap visualization of OCT4 ChIP–seq data, depicting all binding events centred on the peak region within a 5 kb window around the peak. Right panel: distribution and peak heights of OCT4 peaks around the transcription start site (TSS). Peak heights are reported in reads per million. c, Scatterplot comparing OCT4 binding in PGCs and ESCs. Selected genes known to be associated with pluripotency are highlighted in blue, and those highly expressed in the germline are highlighted in red. d, Genome browser representation of ChIP–seq tracks for OCT4 in ESCs (red) and PGCs (yellow) at the OCT4 and PIWIL1 loci. Regions that were bound exclusively by OCT4 in ESCs or PGCs are highlighted by pink shaded boxes. Y-axes represent ChIP–seq signals in units of SPMR (signals per million reads). ChIP–seq experiments were independently repeated twice and similar results were obtained. e, Venn diagram of unique and shared genes bound by OCT4 in ESCs and PGCs. GO analysis results are shown to the right and bottom of the Venn diagram. Y-axes represent each category of molecular functions and biological processes. The analysis was performed twice and similar results were obtained based on two independent ChIP–seq datasets.

that from conventional ChIP (Supplementary Fig. 1f–h). Thus, our methods are reliable for generating binding data from a heterogeneous mixture of cells when coupled with highly specific antibodies. We then applied the mixed-ChIP protocol to human fetal testis samples and generated a global binding profile for OCT4 in bona fide hPGCs. Two biological replicates were used and these replicates demonstrated a gene overlap of >90% (Supplementary Fig. 1h). Although the enrichment profile of OCT4 around the transcription start sites was similar in both hPGCs and hESCs (Fig. 1b), there was a substantial redistribution of OCT4 binding. This redistribution was characterized by reduced binding near pluripotency-related genes (for example, OCT4, NANOG and LIN28A) and an enrichment of binding near germ cell-related genes (for example, PIWIL1, DDX4 and NANOS2) in hPGCs relative to hESCs (Fig. 1c,d). Furthermore, a gene ontology (GO) analysis of genes bound by OCT4 only in hPGCs revealed that genes were enriched in GO terms that include male gamete generation and spermatogenesis, while genes bound by OCT4 in both hESCs and hPGCs were enriched in neuronal development. This result suggests that OCT4 may repress ectoderm differentiation in both cell types (Fig. 1e). Together, our data demonstrated that the cell fate change from pluripotent hESCs to bona fide germline hPGCs is associated with a global reorganization of OCT4 occupancy.

OCT4 switches partners to PAX5 and PRDM1 in hPGCs. We next set out to determine whether the genomic redistribution of OCT4 could be due to alternative OCT4 binding partners in hPGCs relative to hESCs. Immunostaining and RNA expression data showed that the expression of SOX2, OTX2 and ZIC2, genes that encode well-characterized functional partners of OCT4 in hESCs27–30, is significantly downregulated in hPGCs31 (Supplementary Fig. 2a,b). This result indicated that OCT4 might require different partners in hPGCs compared with hESCs. To screen for potential OCT4-interacting transcription factors in hPGCs, we performed de novo sequence motif searches using OCT4-bound sequences exclusively in hPGCs and discovered motifs that are similar to consensus motifs for the transcription factors PAX5 and PRDM1 (Fig. 2a). Immunostaining demonstrated extensive co-expression of PAX5 and PRDM1 in OCT4-positive cells (Fig. 2b). Quantitation indicated that approximately 60% of the OCT4-positive cells are also positive for PAX5 and PRDM1. An RNA expression analysis also showed significant induction of PAX5 and PRDM1 in hPGCs compared with hESCs31 (Supplementary Fig. 2b). This result suggested that PAX5 and PRDM1 might co-occupy genomic loci with OCT4 as functional complexes in a germ cell-specific transcriptional network.

To determine the binding profiles of PAX5 and PRDM1 and to probe the potential association with OCT4 in hPGCs, we performed a ChIP–seq analysis for both PAX5 and PRDM1 (Supplementary Fig. 2c). Among the top 5,000 genes bound by each transcription factor, 1,441 genes, including germ cell-specific genes (for example, DDX4, DAZL and PIWIL1), were collectively bound by all three transcription factors (Fig. 2c,d). These results suggested extensive co-occupancy of PAX5, PRDM1 and OCT4 in hPGCs. An annotation analysis revealed that these co-bound genes are enriched for GO terms of germ cell signalling, hESC pluripotency and bone morphogenetic protein (BMP) signalling pathways (Fig. 2e). In addition, we observed that a recombinant OCT4–GST (glutathione S-transferase) fusion protein can immunoprecipitate recombinant PAX5 protein in vitro (Fig. 2f), indicating that PAX5 may interact directly with OCT4 protein in hPGCs. We did not detect direct protein interactions between recombinant OCT4 and PRDM1 proteins. However, PRDM1 protein was pulled down by PAX5 and vice versa (Supplementary Fig. 2d), indicating that OCT4, PAX5 and PRDM1 may assemble as a protein complex through both direct and indirect interactions. In summary, our observations suggested that OCT4, partnering with PAX5 and PRDM1 proteins, might constitute an extensive and unique transcription network in hPGCs.

Overexpression of PAX5 and PRDM1 induces human germ cell differentiation. Although PRDM1 is a well-studied key regulator of PGC development in both mice and humans15,19,32, the discovery of PRDM1 as a binding partner of OCT4 has not been previously shown. Additionally, although PAX5 is most commonly known for its role in development of the blood system33, a potential role for PAX5 in PGC development or specification has not yet been reported. To investigate the roles of both genes in hPGCs, we examined whether overexpression of PAX5 and PRDM1 is capable of directing hESCs to the germ cell lineage in an in vitro differentiation system. We overexpressed PAX5 or PRDM1 by approximately 10–15 times above the endogenous expression levels (Supplementary Fig. 3a) to a level comparable to that of bona fide hPGCs (Supplementary Fig. 2b). Forced expression of either PAX5 or PRDM1 did not alter the identity of ESCs in routine hESC maintenance (Supplementary Fig. 3b,c). Next, we differentiated hESCs overexpressing PAX5 or PRDM1 by exposing the cells to BMPs for 7 days, as previously described5,34. The expression of germ cell marker genes, including DAZL, DDX4, DPPA3 and NANOS3, was significantly upregulated in cells overexpressing these factors compared with control cells (Fig. 3a, Supplementary Fig. 3d). PAX5 also demonstrated binding on several later-stage germ cell genes, such as SYCP1 and SYCP3, and activated their expression during differentiation (Fig. 3a, Supplementary Fig. 3e). Interestingly, co-expression of PAX5 and PRDM1 showed no obvious additive effects relative to overexpression of either gene alone (Supplementary Fig. 3f). Immunostaining revealed a DDX4 signal in cells overexpressing PAX5 but not control cells (Fig. 3b). We also used a hESC–DDX4–mOrange knock-in reporter cell line to better characterize the effects of in vitro differentiation. FACS analyses revealed a significantly higher percentage of DDX4–mOrange-positive cells in the cells overexpressing PAX5 relative to control cells (Fig. 3c). These data demonstrated that induced expression of PAX5 and PRDM1 in hESCs strongly promotes the differentiation of germ cells in vitro and prompted us to explore whether these cells may further mature if placed in the somatic niche via xenotransplantation.

To investigate the differentiation potential in vivo, we used a previously developed xenotransplantation platform2–4. Briefly, we transplanted green fluorescent protein (GFP)-tagged human cells into busulfan-treated immunodeficient mice, which are depleted of endogenous germ cells (Fig. 3d, Supplementary Fig. 4a). Two months post-transplantation, we analysed the testes and observed significant human germ cell engraftment in the tubules for cells overexpressing PAX5 or PRDM1, as indicated by the presence of GFP-positive cells that co-expressed DDX4 (Fig. 3e). More importantly, cells overexpressing PAX5 differentiated to a later stage germ cell fate and expressed mature germ cell markers, such as DAZL and DAZ1 (Fig. 3f). To quantify germ cell potential, we counted GFP-positive and DDX4-positive human germ cells and tubules across entire cross-sections. We then calculated the percentage of positive tubules and determined the number of GFP-positive and DDX4-positive cells in each positively stained tubule. Both values were significantly higher in cells overexpressing PAX5 or PRDM1 (Fig. 3g,h). Consistent with the in vitro differentiation results, cells overexpressing both PAX5 and PRDM1 promoted germ cell differentiation of hESCs in the mouse seminiferous tubule; however, no additive effects were detected compared with cells overexpressing each factor alone (Supplementary Fig. 4b–d). These data provide strong evidence that overexpression of PAX5 and PRDM1 is able to greatly promote the differentiation potential of hESCs towards the germ cell lineage in vitro and in vivo.

Knockout of PAX5 or PRDM1 reduces the germ cell potential of hESCs. We further examined the role of PAX5 in germ cell differentiation by performing loss-of-function studies. We generated a PAX5-knockout hESC line via CRISPR (clustered regularly interspaced short palindromic repeats)–Cas9-based genome editing (Supplementary Fig. 5a,b). Knockout of PAX5 was confirmed by immunofluorescence and western blot analyses upon induced neuronal differentiation35,36 (Supplementary Fig. 5c,d). Moreover, a significant reduction of germ cell gene expression was observed after in vitro differentiation (Fig. 4a) and DDX4 was not detected (Fig. 4b), further indicating that germ cell differentiation is severely reduced in vitro. We then examined whether germ cells could be differentiated and maintained from these cell lines via in vivo xenotransplantation. PAX5-knockout cells gave rise to significantly fewer positive tubules with human germ cells and fewer GFP-positive and DDX4-positive cells in the positive tubules; most of the

Nature Cell Biology | www.nature.com/naturecellbiology

[Fig. 2 image. Labels recovered from the panels: a, sequence logos (bits) for OCT4, PRDM1 and PAX5; b, PAX5, OCT4 and DAPI (upper) and PRDM1, OCT4 and DAPI (lower); c, Venn diagram for OCT4, PRDM1 and PAX5 with the values 823; 1,769; 1,883; 1,441; 967; 853; 1,739; d, OCT4, PAX5 and PRDM1 ChIP tracks at TBX3 and PIWIL1; e, GO terms plotted by −log(P value) and ratio (germ cell signalling; human ESC pluripotency; transcriptional regulation in ESCs; role of OCT4 in ESCs; role of NANOG in ESCs; sperm mobility; BMP signalling pathway; cell cycle: G1/M; cell cycle by BTG family); f, GST pull-down blots for PAX5, OCT4 and GST probed with anti-PAX5, anti-OCT4 and anti-GST.]

Fig. 2 | Co-occurrence of OCT4 with PAX5 and PRDM1 in PGCs. a, The position weight matrix of an enriched motif found in OCT4 ChIP–seq data from PGCs. The motif resembles the binding motifs for PRDM1 and PAX5. b, Cross-section of a human fetal testis (22 weeks). Upper panel: immunostained for PAX5 (red) and OCT4 (green), and DAPI stained for nuclei (blue). Lower panel: immunostained for PRDM1 (red) and OCT4 (green), and DAPI stained for nuclei (blue). Enlarged panels on the right represent the region enclosed within the white broken lines of the far left panel. White arrows indicate co-localization of PAX5 and OCT4 or PRDM1 and OCT4. Scale bars (original images), 100 µm; (expanded images) 50 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. c, Venn diagram of unique and shared genes bound by OCT4, PRDM1 and PAX5 in PGCs. The numbers of genes bound exclusively by each transcription factor or co-bound by multiple transcription factors are labelled. d, Genome browser representation of ChIP–seq tracks for OCT4 (yellow), PAX5 (blue) and PRDM1 (green) at the TBX3 and PIWIL1 loci. Regions that are bound collectively by OCT4, PAX5 and PRDM1 in PGCs are highlighted by pink shaded boxes. Y-axes represent ChIP–seq signals in units of SPMR (signals per million reads). ChIP–seq experiments were independently repeated twice and similar results were obtained. e, GO analysis of co-bound genes. The analysis was performed twice and similar results were obtained. f, GST pull-down assay performed using OCT4 and PAX5 recombinant proteins. Pull-down was repeated three times and similar results were obtained. Unprocessed scans of western blots are shown in Supplementary Fig. 8.
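The co-occupancy counts in panel c can be checked with simple set arithmetic. The sketch below assumes that the seven numbers printed in the Venn diagram map onto its regions as written in the code; that assignment is inferred here from the diagram layout and from the statement that the top 5,000 genes were taken for each factor, and it is not spelled out in the text. All variable and function names are illustrative.

```python
# Region counts for the three-way Venn diagram in Fig. 2c.
# ASSUMPTION: the mapping of numbers to regions is inferred, not stated in the text.
regions = {
    frozenset({"OCT4"}): 1769,
    frozenset({"PRDM1"}): 1883,
    frozenset({"PAX5"}): 1739,
    frozenset({"OCT4", "PRDM1"}): 823,
    frozenset({"OCT4", "PAX5"}): 967,
    frozenset({"PRDM1", "PAX5"}): 853,
    frozenset({"OCT4", "PRDM1", "PAX5"}): 1441,
}


def genes_bound_by(factor):
    """Total genes bound by one factor: sum of every region that includes it."""
    return sum(count for members, count in regions.items() if factor in members)


totals = {factor: genes_bound_by(factor) for factor in ("OCT4", "PRDM1", "PAX5")}
co_bound_all = regions[frozenset({"OCT4", "PRDM1", "PAX5"})]
fraction_shared = co_bound_all / totals["OCT4"]

print(totals)
print(f"{fraction_shared:.1%} of the OCT4-bound set is shared by all three factors")
```

Under this assignment the regions for each factor sum to 5,000, which agrees with the top-5,000 cut-off, and the 1,441 genes bound by all three factors make up a little under 29% of each factor's set.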

[Fig. 3 image. Labels recovered from the panels: a, heatmap (raw Z score, −2 to 2) of germ cell genes (DAZL, DDX4, DPPA3, NANOS2, NANOS3, SYCP1, SYCP3) and pluripotency genes (DPPA4, FGF4, NANOG, OCT4, PRDM14, TDGF1) for H1, H1_BMPs, PAX5 OE, PRDM1 OE, PAX5 OE_BMPs and PRDM1 OE_BMPs; b, DAPI and DDX4 in hESCs and in cells with PAX5 overexpression; c, mOrange counts in gate P1 for the no-reporter control (0.2%), the H1-DDX4 reporter (3.3%) and the H1-DDX4 reporter with PAX5 overexpression (10.7%); d, germ cell depletion by busulfan, empty testis, injection of GFP-tagged human cells to seminiferous tubules, analysis by immunohistochemistry 2 months after injection; e, DDX4, GFP and DAPI for PAX5 OE, PRDM1 OE and control hESCs; f, DAZL, GFP and DAPI, and DAZ1, GFP and DAPI; g, h, percentage of all tubules and number of cells for hESCs, PAX5 OE and PRDM1 OE (P = 0.0020, P = 0.0002, P < 0.0001, P < 0.0001).]

Fig. 3 | Overexpression of PAX5 and PRDM1 enhances the germ cell potential of ESCs. a, Heatmap of FPKM (fragments per kilobase of transcript per million mapped reads) values for genes associated with germline programming (top) and pluripotency (bottom). OE, overexpression. b, Differentiated cells from hESCs and cells overexpressing PAX5 immunostained for DDX4 and DAPI. Scale bars, 50 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. c, Gating strategy to sort mOrange-positive cells from the H1 hESC line after differentiation by BMPs. d, Schematic of the experimental design for xenotransplantation. Transplantations were performed by independently injecting GFP-tagged human cells directly into seminiferous tubules of busulfan-treated mouse testes that were depleted of endogenous germ cells. Testis xenografts were analysed by immunohistochemistry 2 months after injection. e, Immunohistochemical analysis of testis xenografts derived from cells overexpressing PAX5 or PRDM1 and of control H1 hESCs. In all panels, broken white lines indicate the outer edges of spermatogonial tubules and enlarged views are shown on the right. White asterisks represent GFP-positive and DDX4-positive donor cells near the basement membrane. Scale bars (original images), 50 µm; (expanded images) 50 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. f, Testis xenografts derived from H1 hESCs overexpressing PAX5 immunostained for the later stage PGC markers DAZL and DAZ1. Scale bars (original images), 50 µm. The enlarged panel on the right represents the region enclosed within the white rectangles in the left panel. Scale bars (insets), 25 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. g, The percentage of tubules positive for GFP-positive and DDX4-positive cells was calculated across multiple cross-sections (relative to the total number of tubules). Data represent the mean ± s.d. of n = 4 independent replicates. P values were calculated by two-tailed Student’s t-test. h, For each positive tubule, the ratio of GFP-positive and DDX4-positive cells per tubule was determined. Data represent the mean ± s.d. of n = 5 independent replicates. P values were calculated by two-tailed Student’s t-test. Source data for g and h are provided in Supplementary Table 2.
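Panel a reports FPKM values, and the heatmap scale is labelled as a Z score. As a minimal sketch of how these two quantities are conventionally computed (the standard FPKM definition and a per-gene Z score across samples), the example below uses invented numbers; the normalization pipeline actually used for the figure is not described in this section, so the function names and inputs are assumptions.

```python
import statistics


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def row_z_scores(values):
    """Centre and scale one gene's values across samples (sample s.d.)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


# Hypothetical gene: 1,000 fragments on a 2 kb transcript, 20 million mapped fragments.
example_fpkm = fpkm(1000, 2000, 20_000_000)

# Hypothetical FPKM values for one gene in three samples.
example_z = row_z_scores([1.0, 2.0, 3.0])

print(example_fpkm)
print(example_z)
```

Scaling each gene separately, as `row_z_scores` does, lets genes with very different absolute expression share one colour scale, which is why a heatmap of this kind shows relative changes between conditions rather than absolute FPKM.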

[Fig. 4 image. Labels recovered from the panels: a, relative expression of DDX4, NANOS3, DAZL and SYCP3 in hESCs_BMPs and PAX5 KO_BMPs (P = 0.0395, P = 0.0029, P = 0.0096, P = 0.044); b, DDX4 and DAPI; c, f, GFP, DDX4 and DAPI for control, PAX5 KO and PRDM1 KO; d, e, percentage of all tubules and number of cells for hESCs and PAX5 KO (P = 0.0003, P = 0.0003); g, h, the corresponding plots for hESCs and PRDM1 KO (P < 0.0001, P < 0.0001).]

Fig. 4 | Knockout of PAX5 or PRDM1 reduces the germ cell potential of ESCs. a, RT-qPCR analysis of control H1 hESCs after BMP-induced differentiation (hESCs_BMPs) and PAX5-knockout H1 hESCs after BMP-induced differentiation (PAX5 KO_BMPs). Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. b, Differentiated cells immunostained for DDX4 (green) and DAPI stained for nuclei (blue). Scale bars, 100 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. c, f, Immunohistochemical analysis of testis xenografts derived from control H1 hESCs and from PAX5 knockout (c) and PRDM1 knockout (f) H1 hESCs. All images are merged from DDX4-stained (red) and GFP-stained (green) cells and DAPI-stained nuclei. Scale bars, 50 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. d, g, The percentage of tubules positive for GFP-positive and DDX4-positive cells was calculated across multiple cross-sections (relative to total number of tubules) for PAX5 knockout (d) and PRDM1 knockout (g) H1 hESCs. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. e, h, For each positive tubule, the ratio of GFP-positive and DDX4-positive cells per tubule was determined for PAX5 knockout (e) and PRDM1 knockout (h) H1 hESCs. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. Source data for a, d, e, g and h are provided in Supplementary Table 2.
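The captions summarize each comparison as mean ± s.d. with a two-tailed Student’s t-test. For illustration only, the sketch below shows how the summary values and the t statistic are obtained for two groups of n = 3 under the equal-variance form of the test. The replicate values are invented and are not taken from Supplementary Table 2, and the function name is an assumption. Converting t into a P value also requires the t distribution (available in statistics packages), which is left out to keep the example dependency-free.

```python
import math
import statistics


def students_t(group_a, group_b):
    """Equal-variance (Student's) t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Hypothetical relative-expression values, three replicates per group.
control = [10.0, 12.0, 14.0]
knockout = [1.0, 2.0, 3.0]

for name, group in (("control", control), ("knockout", knockout)):
    mean = statistics.mean(group)
    sd = statistics.stdev(group)
    print(f"{name}: {mean:.2f} ± {sd:.2f} (s.d., n = {len(group)})")

t_stat, dof = students_t(control, knockout)
print(f"t = {t_stat:.3f}, df = {dof}")
```

With only three replicates per group the test has four degrees of freedom, so a difference must be large relative to the pooled s.d. before it reaches the P values reported in the panels.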

DDX4-positive cells were mouse germ cells regenerated after treatment (Fig. 4c). There was a greater than threefold reduction of positive tubules and a greater than fivefold reduction of GFP-positive and DDX4-positive cells in the positive tubules (Fig. 4d,e), indicating that genetic knockout of PAX5 greatly reduced germ cell differentiation from hESCs.

Previous studies have revealed that PRDM1-knockout hESCs fail to develop into PGCs in vitro19. To determine whether these cells are also compromised in terms of germ cell differentiation in vivo, we used previously reported PRDM1-knockout hESCs19 to test their ability to differentiate into germ cells in murine xenotransplants. We observed that PRDM1-knockout hESCs were severely deficient in germ cell differentiation in vivo (Fig. 4f), with the majority of tubules devoid of any human germ cell engraftment. Only a small number of tubules were observed, with sparse GFP-positive and DDX4-positive cells. Counts of the engraftment revealed a greater than tenfold reduction in the formation of human germ cells in the mouse tubules (Fig. 4g,h), resulting in a more severe defect in germ cell potential compared with PAX5-knockout cells.

Epistasis of PAX5, OCT4 and PRDM1 in hPGCs. To explore the molecular mechanism of PAX5 function in hPGCs, we re-analysed our ChIP–seq data and observed that the enhancers of OCT4, which are bound by OCT4 itself in hESCs, are bound by PAX5 in hPGCs (Fig. 5a). Thus, we hypothesized that one role of PAX5 is to regulate and maintain OCT4 expression, as germ cell differentiation proceeds and requires expression of OCT4 at moderate levels37. We first tested this hypothesis by examining OCT4 expression during in vitro differentiation. Following BMP-induced differentiation, OCT4 expression in cells overexpressing PAX5 was substantially elevated relative to differentiated control cells (Fig. 5b). However, due to the developmental limitation of current in vitro differentiation, genes essential for later stage germ cells (for example, DDX4 and DAZL), including PAX5, cannot be induced to the functional level19,31,38 (Supplementary Fig. 6a,b). Thus, PAX5-knockout cells only exhibited a minor decrease in OCT4 expression relative to control differentiated cells for all three protocols (Supplementary Fig. 6c). To overcome this limitation, we sorted hPGCs that were formed in vivo in mouse seminiferous tubules via GFP and c-KIT immunostaining and analysed the effects of PAX5 knockout in vivo (Fig. 5c,d). Note that the niche of mouse seminiferous tubules provides a superior differentiation environment for hESCs to develop to more mature hPGCs that express genes essential for later stage germ cells, including DDX4 (Fig. 3e,f). In these in vivo-derived hPGCs, there was a significant downregulation of OCT4 expression in cells formed by PAX5-knockout cells, and, as expected, a significant upregulation of OCT4 expression in cells formed by cells overexpressing PAX5 compared with cells formed by control hESCs (Fig. 5e).

To further determine whether PAX5 regulates OCT4 expression by regulating the enhancer of OCT4, a luciferase reporter assay was

[Fig. 5 image. Labels recovered from the panels: a, OCT4 ChIP in ESCs and OCT4, PAX5 and PRDM1 ChIP in PGCs at OCT4; b, relative expression for hESCs, control and PAX5 OE (P = 0.0217); c, GFP, c-KIT and DAPI; d, GFP against c-KIT for control (0.8%), PAX5 OE (1.7%) and PAX5 KO (0.3%); e, relative expression for control, PAX5 KO and PAX5 OE; f, reporter schematic (OCT4 enhancer, minimal promoter, luciferase) and relative luciferase activity for GFP OE and PAX5 OE with the minimal promoter alone or with the enhancer. P values shown across panels e and f: P = 0.0007, P = 0.0035.]

Fig. 5 | PAX5 acts upstream of OCT4. a, Genome browser representation of ChIP–seq tracks at the OCT4 locus. Enhancer regions are highlighted by pink shaded boxes. Y-axes represent ChIP–seq signals in units of SPMR. ChIP–seq was independently repeated twice and similar results were obtained. b, OCT4 expression in hESCs, control cells and cells overexpressing PAX5 during in vitro differentiation. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. c, Mouse testis xenografts immunostained for GFP and c-KIT. Scale bar, 50 µm. Immunostaining experiments were independently repeated a minimum of three times and similar results were obtained. DAPI stained for nuclei. d, Flow cytometry analysis for GFP and c-KIT of mouse testis xenografts. e, RT-qPCR analysis of OCT4 expression in hPGCs formed in mouse seminiferous tubules by cells overexpressing PAX5, PAX5 knockout cells or control hESCs. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. f, Reporter construct used for measuring OCT4 enhancer activity is shown. The genomic fragment bound by PAX5 and OCT4 (red) was inserted upstream of a luciferase gene driven by a minimal promoter. The y axis represents the fold enrichment of luciferase activity. Data represent the mean ± s.d. of n = 3 independent replicates. Source data for b, e and f are provided in Supplementary Table 2.

performed in HEK293T cells. As expected, overexpression of PAX5 caused a significant increase in luciferase activity, suggesting that PAX5 could activate OCT4 expression through its enhancer (Fig. 5f). We also identified the binding motif of PAX5 in the region of the OCT4 enhancer sequences (Supplementary Fig. 7a–c). The results indicated that a mutation in the PAX5-binding motif abolished the induction effect of the PAX5 protein (Supplementary Fig. 7d).

Further analysis of the ChIP–seq data indicated that PAX5 and OCT4 bind to PRDM1 enhancers with high intensity (Fig. 6a), suggesting that PAX5 and OCT4 might act upstream of PRDM1 to regulate its expression. We found an increase of PRDM1 in hPGCs formed from cells overexpressing PAX5 and a significant decrease in hPGCs formed from PAX5-knockout cells (Fig. 6c). The luciferase reporter assay in HEK293T cells showed that both PAX5 and OCT4 were able to significantly increase luciferase activity driven by the PRDM1 enhancer (Fig. 6d). These results indicated that PAX5 and OCT4 could act on the PRDM1 enhancer as activators to induce PRDM1 expression, potentially during germline differentiation. A mutation in the PAX5-binding motif in the PRDM1 enhancer region abolished the induction effects of the PAX5 protein (Supplementary Fig. 7e–h). Since PRDM1 is a critical gene for germ cell specification and it could be downstream of PAX5 and OCT4, we overexpressed PRDM1 in PAX5-knockout cells to test whether PRDM1 could rescue the defect of PAX5-knockout cells. Indeed, we observed that overexpression of PRDM1 restored the germ cell potential of PAX5-knockout cells (Fig. 6e).

Taken together, our data shed light on the epistasis of these three transcription factors during differentiation from hESCs to hPGCs

[Fig. 6 panels: a, OCT4 ChIP and PAX5 ChIP tracks in PGCs; b, c, relative PRDM1 expression in control, PAX5 OE and PAX5 KO cells; d, relative luciferase activity for the minimal promoter with and without the PRDM1 enhancer (GFP OE, PAX5 OE, OCT4 OE); e, relative expression of DDX4, NANOS3, DAZL and SYCP3 in hESCs, PAX5 KO cells and PAX5 KO cells with PRDM1 OE; f, schematic of OCT4, PAX5 and PRDM1 enhancer occupancy in the pluripotency and germline states.]

Fig. 6 | PAX5 and OCT4 act upstream of PRDM1. a, Genome browser representation of ChIP–seq tracks at the PRDM1 locus. Enhancer regions bound by OCT4 and PAX5 in PGCs are highlighted by pink shaded boxes. Y axes represent ChIP–seq signals in units of SPMR. ChIP–seq was independently repeated twice and similar results were obtained. b, PRDM1 expression in control cells, cells overexpressing PAX5 and PAX5 knockout cells during in vitro differentiation. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. c, RT-qPCR analysis of PRDM1 expression in hPGCs formed in the mouse seminiferous tubules by PAX5 overexpressing, PAX5 knockout and control hESCs. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. d, Reporter construct used for measuring PRDM1 enhancer activity is shown. The genomic fragment bound by PAX5 and OCT4 (red) was inserted upstream of a luciferase gene driven by a minimal promoter. The y axis represents the fold enrichment of luciferase activity. Data represent the mean ± s.d. of n = 3 independent replicates. e, RT-qPCR analysis of the expression of genes associated with germline programming. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. f, Model for gene regulation in pluripotency and germline programmes. In pluripotent stem cells, OCT4, together with other transcription factors (TFs) and cofactors, binds to its own enhancer to activate and maintain its high expression. While differentiating towards germline cells, PAX5 replaces OCT4 and binds to the enhancer of OCT4 to maintain a moderate expression of OCT4. Meanwhile, PAX5 and OCT4 bind to the enhancer of PRDM1 and activate its expression to initiate the germline programme. Source data for b, c, d and e are provided in Supplementary Table 2.

(Fig. 6f). In pluripotent stem cells, OCT4 interacts with SOX2 and other cofactors, and binds to its own enhancer to activate and maintain high expression. In contrast, during the differentiation of germ cells, PAX5 replaces OCT4, recognizes its own binding motif and binds to the enhancer of OCT4 to maintain a moderate expression of OCT4. Concurrently, PAX5 and OCT4 may bind to the enhancer of PRDM1 and activate its expression to initiate the germ cell programme.

[Fig. 7 panels: a–c, relative expression of ectoderm, mesoderm and endoderm genes (ID2, KDR, AFP, OTX2, ATCT1, NCAM1, TUBB3, GATA6, GATA4) in hESCs versus PAX5 OE (a), PAX5 KO (b) and PRDM1 KO (c) cells; d, schematic of the PAX5–OCT4–PRDM1 network and its germline, mesoderm, endoderm and ectoderm outputs in PAX5 OE, PAX5 KO and PRDM1 KO cells; e, schematic of hESCs, early hPGCs and late hPGCs with their markers under in vitro differentiation by BMPs and in vivo differentiation by xenotransplantation.]

Fig. 7 | Role of PAX5 and PRDM1 in hPGC specification in vitro. a–c, RT-qPCR analyses of gene expression in all three germ layers. Comparison between H1 hESCs and cells overexpressing PAX5 (a), PAX5 knockout cells (b), and PRDM1 knockout cells (c) after BMP-induced differentiation. Data represent the mean ± s.d. of n = 3 independent replicates. P values were calculated by two-tailed Student’s t-test. d, Proposed molecular model for a transcriptional network centred on PAX5, OCT4 and PRDM1 in hPGCs. Upon induced germ cell differentiation with BMPs, OCT4 expression is reduced to moderate levels and maintained in partnership with PAX5. To efficiently induce germline programmes, OCT4 represses ectodermal genes and, at the same time, together with PAX5, activates PRDM1 to repress mesodermal and endodermal genes. In PAX5 knockout cells, OCT4 expression has decreased to levels so low that the expression of ectodermal genes is not suppressed effectively. Thus, the efficiency of induction of germ cells is low in PAX5 knockout cells and lower in PRDM1 knockout cells. Due to the low expression of OCT4 and loss of PRDM1 function, genes in all somatic lineages are upregulated and germ cell programmes fail to be activated. e, Summary of data establishing the roles of PAX5 and PRDM1 in hPGC specification in vitro and in vivo. The identity of hESCs is maintained by a core transcriptional network centred on OCT4, SOX2 and NANOG. Induced by BMP signals in vitro or in vivo by xenotransplantation, hESCs start to differentiate to early hPGCs, which express early germ cell markers, such as OCT4, SOX17, PRDM1 and NANOS3 (grey line with arrowhead). Overexpression of PAX5 enhances the efficiency of differentiation to early hPGCs and promotes the progression of early hPGCs to the later stage, which expresses mature germ cell markers, such as DDX4, DAZL and DAZ1 (black line with arrowhead). Loss of PAX5 significantly reduces the germ cell potential of hESCs (grey broken line with arrowhead), while loss of PRDM1 leads to failure of hPGC specification (red line with an end bar). Source data for a–c are provided in Supplementary Table 2.

Molecular model of the PAX5–OCT4–PRDM1 network. OCT4 is known to repress ectoderm formation from hESCs39 and during germ cell differentiation (Fig. 1e). In addition, PRDM1 has been shown to suppress endoderm and other somatic genes during germ cell specification19. Thus, we wondered whether PAX5, together with OCT4 and PRDM1, might function globally during germ cell differentiation. We examined the expression of somatic genes during in vitro differentiation (Supplementary Table 1). We detected downregulation of somatic genes belonging to the three primary germ layers in cells overexpressing PAX5 (Fig. 7a). In contrast, a significant upregulation of expression in ectodermal genes was observed in PAX5-knockout cells (Fig. 7b). This result suggests that PAX5 could repress ectoderm and that the repression might be mediated through the ability of PAX5 to activate or maintain OCT4 expression. Moreover, consistent with previous studies of PRDM1 (ref. 19), we observed significant upregulation of somatic genes in all three germ layers during the differentiation of PRDM1-knockout cells (Fig. 7c), confirming the role of this gene in suppressing the differentiation of somatic lineages during human germ cell development.

Based on our data, we propose a molecular model of human germ cell development (Fig. 7d). Upon external signalling via factors such as BMPs, germ cell differentiation is induced under both in vitro and in vivo conditions, and OCT4 expression is reduced to a moderate level that is partly maintained by a partnership with PAX5. To efficiently induce and maintain germline programmes, OCT4 represses ectodermal genes and at the same time, together with PAX5, activates PRDM1 to repress mesodermal and endodermal genes. In PAX5-knockout cells, OCT4 expression was so low that the expression of ectodermal genes was not suppressed effectively. Thus, the efficiency of induction of germ cells is very low in PAX5-knockout cells. A more severe case is observed in PRDM1-knockout cells, in which there is an almost complete loss of expression of OCT4 and PRDM1. Moreover, genes in all somatic lineages are upregulated and the germ cell programmes fail to be activated.

Discussion

This work prompts a molecular model for germ cell development (Fig. 7e). In hESCs, OCT4 partners with pluripotent master regulators, including SOX2, to form the core transcriptional network that governs self-renewal and pluripotency. Induced by BMP signals in vitro and in vivo, hESCs differentiate into early hPGCs with low efficiency. Induction of PAX5 expression in hESCs results in their differentiation into more mature human germ cells, with significantly higher efficiency. These later stage differentiated cells closely resemble late hPGCs or gonocytes and express genes such as DDX4, KIT and DAZL, which mark mature human germ cells. Conversely, loss of function of PAX5 results in a lower efficiency in germ cell differentiation and loss of function of PRDM1 results in a complete failure of germ cell specification. Thus, this study provides evidence that human cell fate determination, at the juncture of pluripotency and somatic and germ line differentiation, may be the result of a balance of forces. That is, the co-expression of pluripotency genes (for example, OCT4 and NANOG) simultaneously with lineage specifiers (for example, SOX17, PAX5 and PRDM1) distinguishes hPGCs from all other human cells and from mouse PGCs. To maintain cell identity, hPGCs require a precise regulation and balance of pluripotency and lineage specifiers to move forward from the pluripotent stem cell state while repressing somatic lineage development and activating germ cell programmes. The PAX5–OCT4–PRDM1 axis that has been identified, along with the transcription factor genome-wide binding profiles, defines the identity of bona fide hPGCs at stages beyond those commonly reported in vitro. The results of these studies may shed light on genetic requirements for human germ cell differentiation, enable more faithful and efficient production of human germ cells in vitro, and contribute to knowledge and models of human germ cell pathologies.

Methods

Methods, including statements of data availability and any associated accession codes and references, are available at https://doi.org/10.1038/s41556-018-0094-3.

Received: 5 January 2018; Accepted: 23 March 2018; Published: xx xx xxxx

References

1. Ramathal, C. et al. DDX3Y gene rescue of a Y chromosome AZFa deletion restores germ cell formation and transcriptional programs. Sci. Rep. 5, 15041 (2015).
2. Dominguez, A., Chiang, H., Sukhwani, M., Orwig, K. & Reijo Pera, R. A. Human germ cell formation in xenotransplants of induced pluripotent stem cells carrying X chromosome aneuploidies. Sci. Rep. 4, 6432 (2014).
3. Ramathal, C. et al. Fate of iPSCs derived from azoospermic and fertile men following xenotransplantation to seminiferous tubules. Cell Rep. 7, 1284–1297 (2014).
4. Durruthy, J. D. et al. Fate of induced pluripotent stem cells following transplantation to murine seminiferous tubules. Hum. Mol. Genet. 23, 3071–3084 (2014).
5. Kee, K., Angeles, V., Flores, M., Nguyen, H. & Reijo Pera, R. A. Human DAZL, DAZ and BOULE genes modulate primordial germ cell and haploid gamete formation. Nature 462, 222–225 (2009).
6. Salto Mamsen, L., Lutterodt, M. C., Andersen, E. W., Byskov, A. G. & Andersen, C. Y. Germ cell numbers in human embryonic and fetal gonads during the first two trimesters of pregnancy: analysis of six published studies. Hum. Reprod. 26, 2140–2145 (2011).
7. Clark, A. T. et al. Spontaneous differentiation of germ cells from human embryonic stem cells in vitro. Hum. Mol. Genet. 13, 727–739 (2004).
8. Reijo Pera, R. A., Alagappan, R. K., Patrizio, P. & Page, D. C. Severe oligospermia resulting from deletions of the azoospermia factor gene on the Y chromosome. Lancet 347, 1290–1293 (1996).
9. Irie, N., Tang, W. W. & Azim Surani, M. Germ cell specification and pluripotency in mammals: a perspective from early embryogenesis. Reprod. Med. Biol. 13, 203–215 (2014).
10. Magnusdottir, E. et al. A tripartite transcription factor network regulates primordial germ cell specification in mice. Nat. Cell Biol. 15, 905–915 (2013).
11. Tang, W. W., Kobayashi, T., Irie, N., Dietmann, S. & Surani, M. A. Specification and epigenetic programming of the human germ line. Nat. Rev. Genet. 17, 585–600 (2016).
12. Saitou, M., Barton, S. C. & Surani, M. A. A molecular programme for the specification of germ cell fate in mice. Nature 418, 293–300 (2002).
13. Saitou, M. & Yamaji, M. Primordial germ cells in mice. Cold Spring Harb. Perspect. Biol. 4, a008375 (2012).
14. Nakaki, F. et al. Induction of mouse germ-cell fate by transcription factors in vitro. Nature 501, 222–226 (2013).
15. Ohinata, Y. et al. Blimp1 is a critical determinant of the germ cell lineage in mice. Nature 436, 207–213 (2005).
16. Weber, S. et al. Critical function of AP-2 gamma/TCFAP2C in mouse embryonic germ cell maintenance. Biol. Reprod. 82, 214–223 (2010).
17. Yamaji, M. et al. Critical function of Prdm14 for the establishment of the germ cell lineage in mice. Nat. Genet. 40, 1016–1022 (2008).
18. Zhou, Q. et al. Complete meiosis from embryonic stem cell-derived germ cells in vitro. Cell Stem Cell 18, 330–340 (2016).
19. Irie, N. et al. SOX17 is a critical specifier of human primordial germ cell fate. Cell 160, 253–268 (2015).
20. Tang, W. W. et al. A unique gene regulatory network resets the human germline epigenome for development. Cell 161, 1453–1467 (2015).
21. Li, L. et al. Single-cell RNA-seq analysis maps development of human germline cells and gonadal niche interactions. Cell Stem Cell 20, 858–873 (2017).
22. Kehler, J. et al. Oct4 is required for primordial germ cell survival. EMBO Rep. 5, 1078–1083 (2004).
23. Niwa, H., Miyazaki, J. & Smith, A. G. Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat. Genet. 24, 372–376 (2000).
24. Scholer, H. R., Dressler, G. R., Balling, R., Rohdewohld, H. & Gruss, P. Oct-4: a germline-specific transcription factor mapping to the mouse t-complex. EMBO J. 9, 2185–2195 (1990).
25. O’Neill, L. P., VerMilyea, M. D. & Turner, B. M. Epigenetic characterization of the early embryo with a chromatin immunoprecipitation protocol applicable to small cell populations. Nat. Genet. 38, 835–841 (2006).
26. Cotney, J. L. & Noonan, J. P. Chromatin immunoprecipitation with fixed animal tissues and preparation for high-throughput sequencing. Cold Spring Harb. Protoc. 2015, 191–199 (2015).
27. Nishimoto, M., Fukushima, A., Okuda, A. & Muramatsu, M. The gene for the embryonic stem cell coactivator UTF1 carries a regulatory element which selectively interacts with a complex composed of Oct-3/4 and Sox-2. Mol. Cell. Biol. 19, 5453–5465 (1999).
28. Boyer, L. A. et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122, 947–956 (2005).
29. Buecker, C. et al. Reorganization of enhancer patterns in transition from naive to primed pluripotency. Cell Stem Cell 14, 838–853 (2014).
30. Pardo, M. et al. An expanded Oct4 interaction network: implications for stem cell biology, development, and disease. Cell Stem Cell 6, 382–395 (2010).
31. Gkountela, S. et al. The ontogeny of cKIT+ human primordial germ cells proves to be a resource for human germ line reprogramming, imprint erasure and in vitro differentiation. Nat. Cell Biol. 15, 113–122 (2012).
32. Vincent, S. D. et al. The zinc finger transcriptional repressor Blimp1/Prdm1 is dispensable for early axis formation but is required for specification of primordial germ cells in the mouse. Development 132, 1315–1325 (2005).
33. Medvedovic, J., Ebert, A., Tagoh, H. & Busslinger, M. Pax5: a master regulator of B cell development and leukemogenesis. Adv. Immunol. 111, 179–206 (2011).
34. Kee, K. & Reijo Pera, R. A. Human germ cell lineage differentiation from embryonic stem cells. Cold Spring Harb. Protoc. https://doi.org/10.1101/pdb.prot5048 (2008).
35. Zhang, P., Xia, N. & Reijo Pera, R. A. Directed dopaminergic neuron differentiation from human pluripotent stem cells. J. Vis. Exp. https://doi.org/10.3791/51737 (2014).
36. Perrier, A. L. et al. Derivation of midbrain dopamine neurons from human embryonic stem cells. Proc. Natl Acad. Sci. USA 101, 12543–12548 (2004).
37. Yeom, Y. I. et al. Germline regulatory element of Oct-4 specific for the totipotent cycle of embryonal cells. Development 122, 881–894 (1996).
38. Sasaki, K. et al. Robust in vitro induction of human germ cell fate from pluripotent stem cells. Cell Stem Cell 17, 178–194 (2015).
39. Wang, Z., Oron, E., Nelson, B., Razis, S. & Ivanova, N. Distinct lineage specification roles for NANOG, OCT4, and SOX2 in human embryonic stem cells. Cell Stem Cell 10, 440–454 (2012).

Acknowledgements

This work was supported by P50 HD 068158 to R.A.R.P. (Project I).

Author contributions

The study was conceived and designed by F.F. and R.A.R.P. F.F. performed most experiments (including ChIP–seq, immunohistochemistry, RNA-seq, protein pull-down assays, luciferase reporter assays, gene expression profiling) and analysed the data. N.X. performed bioinformatics analyses for ChIP–seq and luciferase reporter assays, flow cytometry and gene expression analysis for the xenotransplantation experiments. B.A. generated the PAX5-knockout hESC lines and performed part of the immunohistochemistry in xenotransplantation samples. Z.W. performed the initial bioinformatics analysis for ChIP–seq data. M.S. and K.E.O. conducted the xenotransplantation. C.C.C. and A.M. performed RNA-seq analysis. J.C. constructed the H1-DDX4 reporter. R.W. and B.W. designed and constructed the PAX5-knockout plasmids. M.A.S. and N.I. provided the PRDM1-knockout hESC line and protocol. The manuscript was written by F.F. and R.A.R.P. with input from the other authors.

Competing interests

The authors declare no competing interests.

Additional information

Supplementary information is available for this paper at https://doi.org/10.1038/s41556-018-0094-3.
Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to F.F.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Methods

hESC culture and differentiation. H1 hESCs (WiCell) were maintained feeder-free on Matrigel (BD Biosciences)-coated plates as previously described1–5. The cell line was authenticated by immunostaining for cell-type specific proteins and tested as negative for mycoplasma contamination. All cultures were grown at 37 °C with 5% CO2 in mTeSR1 medium (Stemcell Technologies). For differentiation, cells were seeded overnight at a density of 200,000 per well in 6-well plates and medium was replaced the next day with differentiation media (knockout DMEM supplemented with 20% fetal bovine serum, 1 mM L-glutamine, 0.1 mM nonessential amino acids, 0.1 mM β-mercaptoethanol and 50 ng ml−1 recombinant human BMP4, BMP7 and BMP8b (R&D systems)) as previously described5,34,40. Differentiation medium was changed every other day for 7 days.

Xenotransplantation assay. Human cell lines were transplanted into the testes of busulfan-treated, immunodeficient nude mice (NCr nu/nu; Taconic) as previously described1–4,41. Immunodeficient nude mice were treated with a single dose of busulfan (40 mg kg−1; Sigma-Aldrich) at 5–6 weeks of age to eliminate endogenous spermatogenesis. Xenotransplantation was then performed 5–6 weeks after busulfan treatment by injecting 7–8 μl cell suspensions (1.5–3 million cells per testis) containing 10% trypan blue (Invitrogen) into the seminiferous tubules of each recipient testis via cannulation of the efferent ducts. At 8 weeks after transplantation, recipient mouse testes were collected for histology and immunohistochemical analyses. Procedures were evaluated and approved by the University of Pittsburgh and Montana State University Institutional Animal Care and Use Committee (IACUC); all procedures were compliant with all relevant ethical regulations regarding animal research.

ChIP on human fetal testes. Second trimester human fetal testes were staged and procured from Advanced Bioscience Resources. The ChIP protocol for human fetal testes (22 weeks) was adapted from published protocols25,26 and optimized as follows. The protocol for tissue procurement and use was approved by the Institutional Review Board of Montana State University (RR-P031014-EX); all procedures were compliant with all relevant ethical regulations regarding human research with unidentifiable banked tissue. Consent was obtained from all the participants. Human fetal testis samples (~25 mg for each ChIP) were placed in a 100 mm dish on ice and finely minced using a clean scalpel. Minced tissue was then transferred into a 15 ml conical tube. PBS (1 ml) with protease inhibitor cocktail (Roche) was added to the tissue. To crosslink, tissue was fixed in 1.5% formaldehyde and rocked at room temperature for 20 min, quenched with 1 volume of 250 mM glycine (room temperature, 5 min), and rinsed twice with chilled TBSE buffer (20 mM Tris-HCl, 150 mM NaCl, 1 mM EDTA). Crosslinked tissue was resuspended in PBS with protease inhibitor cocktail (1 ml 0.1% SDS buffer for every 100 μl of nuclear pellet) and transferred to a Dounce homogenizer. Tissue pieces were then disaggregated into single cell suspension with 20–25 strokes. The cell suspension was transferred to a 15 ml conical tube and centrifuged at 1,000g in a bench-top centrifuge for 5 min at 4 °C. Cells were lysed with 1% SDS lysis buffer (on ice, 5 min) and then centrifuged (450g, 10 min). Supernatant was removed as described above and samples were resuspended in 0.1% SDS buffer with protease inhibitor cocktail. Samples were then sonicated with glass beads 12 times (30 s pulses with 30 s break interval) using a Bioruptor water bath sonicator (Diagenode). Chromatin extracts were then precleared with Dynal Magnetic Beads (Invitrogen) (4 °C, 1 h) followed by centrifugation (450g, 30 min). Supernatant (precleared chromatin) was immunoprecipitated overnight with Dynal Magnetic Beads coupled with the following antibodies: anti-OCT4 (sc-8628, Santa Cruz Biotechnology), anti-PAX5 (sc-1974, Santa Cruz Biotechnology), and anti-PRDM1 (C14A4, Cell Signaling Technology). On the next day, beads were washed three times with 0.1% SDS buffer followed by a wash with TE buffer. Chromatin was eluted by incubating beads in TE buffer supplemented with 1% SDS at 65 °C for 30 min. Afterwards, chromatin underwent reverse crosslinking with pronase at 42 °C for 2 h and at 67 °C for 6 h. DNA was then purified using phenol–chloroform extraction. ChIP DNA was subjected to 15 cycles of amplification using a SeqPlex DNA Amplification kit (Sigma). Real-time qPCR (RT-qPCR) was performed (ABI PRISM 7900 Sequence Detection System), and relative occupancy values were determined based on the immunoprecipitation efficiency (amount of immunoprecipitated DNA relative to input). Illumina HiSeq 2 × 60 base-pair paired end reads were used for sequencing.

Immunofluorescence analysis for cell culture. Cells were fixed with 4% paraformaldehyde at room temperature for 20 min, permeabilized with 0.3% Triton X-100 in PBS (PBST) for 10 min, and blocked for 45 min at room temperature in PBST containing 5% normal donkey serum (Jackson ImmunoResearch Laboratories). Primary antibodies were diluted at 1:200 in blocking solution and incubated at 4 °C overnight. The following primary antibodies were used in this study: DDX4 (R&D; AF2030), OCT4 (Santa Cruz; sc-8628), PAX5 (Santa Cruz; sc-1974), PRDM1 (Cell Signaling; 9115) and c-KIT (DAKO; A4502). Appropriate Alexa Fluor 488, 594 or 647-conjugated secondary antibodies (Jackson ImmunoResearch Laboratories) were diluted at 1:300 in PBS containing 0.1% bovine serum albumin (BSA) and incubated at room temperature for 1 h. For nuclear staining, 4,6-diamidino-2-phenylindole (DAPI; 1 μg ml–1) was used. Images shown are representative of at least three independent experiments.

RNA isolation, library preparation and sequencing analysis. Total RNA was extracted using an Arcturus PicoPure RNA Isolation kit (Life Technologies). RNA quality was determined using a Bioanalyzer 2100 (Agilent). Sequencing libraries were constructed using a SMARTer universal low input RNA kit (Clontech) according to the manufacturer’s instructions. DNA library samples were submitted to the Stanford Genomics Facility and 100--end high-throughput sequencing was performed. All sequenced libraries were mapped to the reference genome using TopHat and Cufflinks42,43 and the default parameter setup. Differential expression was analysed using StrandNGS (AvadisNGS).

Gene expression analysis by qPCR. Total RNA for qPCR was extracted from cells using a RNeasy Plus Mini kit (Qiagen) and 1 μg RNA was reverse transcribed using the SuperScript® III First-Strand Synthesis System. qPCR was performed in triplicate using Power SYBR® Green PCR Master Mix (both from Life Technologies) with the data normalized to housekeeping genes. The primer sequences are listed in Supplementary Table 1. Data shown are representative of at least three independent experiments.

Immunohistochemistry of recipient mouse testes. Paraformaldehyde-fixed mouse testes were sectioned by AML Laboratories by paraffin embedding and conducting serial cross-sectioning every 5 mm. For deparaffinization, two consecutive xylene (Sigma-Aldrich) treatments (each 10 min long) were performed followed by rehydration in 100%, 100%, 90%, 80% and 70% ethanol treatments. A 10-min wash in tap water was then performed. Antigen retrieval was conducted by boiling slides for 30 min in 0.01 M sodium citrate (pH 6.0; Sigma-Aldrich), cooling slides for 30 min, followed by a 10 min wash in PBS. Blocking and permeabilization were conducted by adding 10% normal donkey serum (Jackson Immunoresearch) with 0.1% Triton X (Sigma-Aldrich) in PBS for 1 h, followed by incubation with primary antibodies diluted in 1% blocking solution overnight at 4 °C in a humidified chamber. The following antibodies were used: DDX4 (R&D; AF2030), GFP (Abcam; ab13970), OCT4 (Santa Cruz; sc-8628) and DAZL (Novus; NB100-2437). Slides were washed in PBST, followed by incubation for 1 h with fluorescently labelled secondary antibodies raised in donkey, followed by additional washes in PBST. The primary antibodies were used at 1:200 dilution and secondary antibodies were used at 1:300 dilution. All samples were mounted using ProLong Gold Anti-fade mounting media containing DAPI (Life Technologies). Samples were then imaged using a confocal microscope (Zeiss). Quantification of GFP-positive and DDX4-positive staining was determined manually from multiple sections taken from two to three different depths within the testis (technical replicates) and from multiple clonal replicates for each transplanted cell line (biological replicates).

Motif analysis. Motif analysis was performed using the MEME-ChIP suite and the default parameters. The novel motif for OCT4 in hPGCs was searched in TRANSFAC and JASPAR databases to find transcription factors with similar consensus sequences44,45.

CRISPR design and PAX5-knockout derivation. CRISPR guide RNAs (gRNAs) were designed using the online CRISPR design tool from the Massachusetts Institute of Technology (http://crispr.mit.edu/). Candidate gRNAs with the highest score were chosen for each genomic region. Oligonucleotides for these gRNAs were synthesized and cloned into plasmid pX459 (Addgene; 48139) carrying both Cas9 and gRNA expression cassettes, with one modification of the original plasmid in which the Cas9 promoter was replaced by the EEF1A1 promoter. The cutting efficiency of each gRNA construct was validated by transfecting HEK293T cells and sequencing the target regions in the genome. CRISPR pairs were nucleofected into H1 hESCs and plated as single cells. Single cells were clonally expanded and isolated for PCR to test for successful PAX5 deletion and for sequencing.

Construction of overexpression constructs by lentiviral vectors. The sequences of PAX5 and PRDM1 were assembled using Gibson Assembly Cloning technology (NEB). Briefly, gBlocks (gene fragments) were synthesized by Integrated DNA Technologies, and individual gBlocks were assembled to one gene transcript with Gibson Assembly Cloning technology followed by an amplification reaction with Phusion DNA polymerase according to the manufacturer’s instructions. Amplified genes were ligated into a pENTR/D-TOPO vector (Life Technologies) using the Multisite Gateway system. Clones were transformed into One-Shot Competent Escherichia coli, and DNA was purified and sequenced. Positive clones were used for a recombination reaction with the Gateway destination vector (pcDNA-DEST40). Subsequent transformation into One-Shot Competent E. coli, followed by DNA purification and sequencing for verification of correct cloning, resulted in overexpression vectors for PAX5 and PRDM1.

Luciferase assay for enhancer activity. Enhancer sequences identified for POU5F1 and PRDM1 were generated by PCR of human genomic DNA and then cloned into pGL4.28(luc2CP/minP/Hygro) (Promega; E8461) with restriction enzymes KpnI and BmtI (NEB). The minimal promoter pGL4.28 was used as negative control. Luciferase activity was measured using a dual-luciferase reporter assay system (Promega) as per the manufacturer’s protocol.

GO analysis. GO enrichment was performed using GREAT analysis and the default parameters46.

allocated into groups receiving various cell line injections. All statistical analyses were conducted using GraphPad Prism (version 5). Two-tailed Student’s t-tests were used when data met criteria for parametric analysis (normal distribution or equal variances). P values are shown in the figures, and the number of biological replicates for each experiment is indicated in the figure legends. Experiments were repeated independently at least three times.
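The statistical analyses above were run in GraphPad Prism, and the original analysis files are not reproduced here. As a rough illustration only, the equal-variance (pooled) two-tailed Student’s t-test for two groups of n = 3 replicates could be sketched in Python as follows. The replicate values are invented for the example, and the helper names (t_pdf, two_tailed_p, student_t_test) are hypothetical and not part of any published pipeline. The P value is approximated by numerically integrating the t density with Simpson’s rule, which is a substitute for the exact routine a statistics package would use.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Approximate two-tailed P value; steps must be even for Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # probability mass in [-|t|, |t|]
    return max(0.0, 1.0 - central_area)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, df, two-tailed P).

    Assumes each group has at least two values and non-zero pooled variance.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, df, two_tailed_p(t_stat, df)


if __name__ == "__main__":
    # Invented relative-expression values for two groups of three replicates.
    control = [1.0, 2.0, 3.0]
    treated = [4.0, 5.0, 6.0]
    t_stat, df, p = student_t_test(control, treated)
    print(f"mean ± s.d. control: {mean(control):.2f} ± {stdev(control):.2f}")
    print(f"mean ± s.d. treated: {mean(treated):.2f} ± {stdev(treated):.2f}")
    print(f"t = {t_stat:.3f}, df = {df}, P = {p:.4f}")
```

With n = 3 per group the test has only 4 degrees of freedom, so the normality and equal-variance criteria mentioned above cannot be checked reliably from the data alone; the sketch simply assumes they hold.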

FACS and flow cytometry. Cells were dissociated in 0.25% trypsin–EDTA buffer (Gibco) at 37 °C for 5 min and collected by centrifugation at 200g in an Eppendorf 5702R centrifuge. Then the cells were passed through 70 μm strainers (BD Biosciences) to ensure they were digested as single cells before they were subject to flow cytometry analysis. Mouse testes cells were dissociated with 0.25% trypsin–EDTA buffer for 30 min at 37 °C. Dissociated cells were incubated in 1% BSA in PBS containing primary antibodies (c-KIT (DAKO; A4502)) on ice for 20 min. Cells were then analysed for mOrange expression or c-KIT and GFP using a BD FACSAriaII cell sorter. Analysis was performed using LSRII (Becton Dickinson) and FlowJo software (Tree Star).

GST pull-down assay. The recombinant protein OCT4 (Novus; H00005460-P01) was bound to glutathione–sepharose beads (Amersham) and incubated with recombinant PAX5 protein (Novus; H00005079-P01) overnight at 4 °C. Beads were washed six times with cell lysis buffer. The eluents were analysed by western blotting. The blots shown are representative of at least three independent experiments.

Differentiation to neuronal progenitor cells from hESCs. Differentiation of hESCs into neuronal progenitor cells was carried out as previously described35,36. hESCs were dissociated into single cells with accutase, depleted of mouse embryonic fibroblasts (MEF) feeders by incubating on a gelatin-treated culture dish for 30 min, and then plated onto a Matrigel-coated dish at a density of 36,000 cells cm–2 in mTeSR1 (Stemcell Technologies) in the presence of 2 μM thiazovivin (Santa Cruz Biotechnology). Differentiation was started 48 h later with KnockOut Serum Replacement (KSR) differentiation medium supplemented with different combinations of small molecules6. Each well of the 6-well plate was coated with 1 ml fibronectin, which was diluted with cold (4 °C) PBS to 2 µg ml–1, and the plates were incubated at 37 °C overnight before seeding the cells. After 20 days of differentiation, cells were replated into plates coated with poly-L-ornithine, laminin and fibronectin and cultured with B27 differentiation medium.

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability. All sequencing data that support the findings of this study have been deposited in NCBI’s GEO under accession code GSE100639. Previously published sequencing data that were re-analysed here are available under accession codes GSE60138, GSE39821 and GSE67259. Source data for the following figures are provided in Supplementary Table 2: Fig. 3g,h; Fig. 4a,d,e,g,h; Fig. 5d,e,g,h,i; Fig. 6a–c; Supplementary Fig. 2b; Supplementary Fig. 3a,d,f; Supplementary Fig. 4c,d; Supplementary Fig. 6a,b; and Supplementary Fig. 7a,c,d,e,g,h. All other data supporting the findings of this study are available from the corresponding author upon reasonable request.

References

40. Kee, K., Gonsalves, J. M., Clark, A. T. & Reijo Pera, R. A. Bone morphogenetic proteins induce germ cell differentiation from human embryonic stem cells. Stem Cells Dev. 15, 831–837 (2006).
41. Hermann, B. P. et al. Spermatogonial stem cell transplantation into rhesus testes regenerates spermatogenesis producing functional sperm. Cell Stem Cell 11, 715–726 (2012).
42. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
43. Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
44. Sandelin, A. et al. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics 5, 99 (2004).
45. Wingender, E., Dietze, P., Karas, H. & Knuppel, R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241 (1996).
Statistics and reproducibility. No statistical methods were used to predetermine 46. McLean, C. Y. et al. GREAT improves functional interpretation of cis- samples or outcomes. For the xenotransplantation studies, animals were regulatory regions. Nat. Biotechnol. 28, 495–501 (2010). randomly nature research | reporting summary

Corresponding author(s): Fang Fang

Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistical parameters
When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main text, or Methods section).
n/a Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
- Clearly defined error bars. State explicitly what error bars represent (e.g. SD, SE, CI)

Our web collection on statistics for biologists may be useful.

Software and code
Policy information about availability of computer code
Data collection: No software was used.

Data analysis: For ChIP-seq analysis: Bowtie (1.1.2), MACS (1.4.1). For RNA-seq analysis: TopHat (2.1.1), Cufflinks (2.2.1), StrandNGS. For motif analysis: MEME-ChIP (4.12.0). Statistical analysis: GraphPad Prism 5 (5.03). Flow cytometry analysis: FlowJo (10.4.2).

For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

March 2018

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

All sequencing data that support the findings of this study have been deposited in NCBI’s GEO under accession number GSE100639. Source data for Fig. 3g, h; 4a, d, e, g, h; 5d, e, g, h, i; 6a-c and Supplementary Fig. 2b; 3a, d, f; 4c-d; 6a, b; 7a, c, d, e, g, h are provided in Supplementary Table 2. All other data supporting the findings of this study are available from the corresponding authors on reasonable request.
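As a small aid for locating the deposited data, the sketch below builds the address of the GEO accession viewer page for the accession number quoted above. The helper name `geo_record_url` is hypothetical and is not part of the study; the sketch only formats a URL and performs no download.

```python
from urllib.parse import urlencode

# Accession number quoted in the data availability statement.
PRIMARY_ACCESSION = "GSE100639"

# Address of the GEO accession viewer.
GEO_ACCESSION_PAGE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_record_url(accession: str) -> str:
    """Return the GEO accession viewer URL for a series (GSE) accession.

    Hypothetical helper for illustration; it validates only the GSE + digits form.
    """
    accession = accession.strip().upper()
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_ACCESSION_PAGE}?{urlencode({'acc': accession})}"


if __name__ == "__main__":
    print(geo_record_url(PRIMARY_ACCESSION))
```

The same helper can be applied to any other GEO series accession.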

Field-specific reporting
Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
- Life sciences
- Behavioural & social sciences
For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf

Life sciences

Study design
All studies must disclose on these points even when the disclosure is negative.
Sample size: No statistical method was used to predetermine sample size. The sample size was determined based on previous publications and experimental designs with similar objectives (Ramathal et al., Cell Rep., 2014; Ramathal et al., Sci. Rep., 2015; Durruthy et al., Hum. Mol. Genet., 2014; Kee et al., Nature, 2009). Three or more independent results were used to perform statistical analyses; when fewer were available, no statistics were performed on those samples. All source data required for statistical tests are indicated in Supplementary Table 2 (Statistics data source).

Data exclusions: Data were excluded when negative or positive controls did not work.

Replication: For each experiment, we performed at least two independent biological replicates and all attempts were successful.

Randomization: For the xenotransplantation studies, animals were randomly allocated into groups receiving various cell line injections. Randomization (formal or otherwise) was not relevant for other data included in the manuscript.

Blinding: Investigators were not blinded to group allocation. For the counting analysis for xenotransplantation, investigators were blinded to the samples.
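The random allocation described under Randomization can be illustrated with a minimal sketch. The source does not state how the allocation was actually carried out, so the shuffle-and-deal scheme, the function name `allocate_randomly`, the animal identifiers and the group labels below are all assumptions made for illustration.

```python
import random


def allocate_randomly(animal_ids, groups, seed=None):
    """Shuffle the animals, then deal them out to the groups in turn.

    Illustrative scheme only; the study does not specify its procedure.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for index, animal in enumerate(shuffled):
        allocation[groups[index % len(groups)]].append(animal)
    return allocation


if __name__ == "__main__":
    # Hypothetical animal identifiers and group labels.
    mice = [f"mouse_{n:02d}" for n in range(1, 13)]
    cell_lines = ["line_A", "line_B", "line_C"]
    for group, members in allocate_randomly(mice, cell_lines, seed=1).items():
        print(group, sorted(members))
```

Dealing the shuffled animals out in turn keeps group sizes within one animal of each other, which is one common design choice; simple independent assignment per animal would not guarantee that.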

Materials & experimental systems
Policy information about availability of materials
n/a Involved in the study
- Unique materials
- Antibodies
- Eukaryotic cell lines
- Research animals
- Human research participants

Antibodies
Antibodies used:
- anti-OCT4 (sc-8628, Santa Cruz Biotechnology, Lot#: C2613)
- anti-PAX5 (sc-1974, Santa Cruz Biotechnology, Lot#: A3113)
- anti-PRDM1 (C14A4, Cell Signaling Technology, Lot: 4)
- anti-DDX4 (R&D #AF2030, Lot: KPX0 4051)
- anti-GFP (Abcam; ab18373, Lot: GR254056-13)
- anti-DAZL (Novus; NB100-2437; Lot: P1)
- anti-CKIT (A4502; DAKO)
- anti-beta-ACTIN (8H10D10, Cell Signaling Technology, Lot: 15)
- HRP-conjugated anti-rabbit IgG secondary antibody, Supplier: Santa Cruz Biotechnology, Cat.: sc-2004, Lot: H0913
- HRP-conjugated anti-goat IgG secondary antibody, Supplier: Santa Cruz Biotechnology, Cat.: sc-2020, Lot: B0613
- HRP-conjugated anti-mouse IgG secondary antibody, Supplier: Santa Cruz Biotechnology, Cat.: sc-2005
- Alexa Fluor® 488 AffiniPure Donkey Anti-Goat IgG (H+L), Jackson ImmunoResearch Laboratories, Code: 705-545-003
- Alexa Fluor® 594 AffiniPure Donkey Anti-Rabbit IgG (H+L), Jackson ImmunoResearch Laboratories, Code: 711-585-152

For ChIP and Co-IP assays, antibodies were used at 1:100. For immunostaining, primary antibodies were used at 1:200 and secondary antibodies were used at 1:300. For western blot, primary antibodies were used at 1:1000 and secondary antibodies were used at 1:5000.
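The stated dilutions translate into stock volumes by simple division. The sketch below assumes that "1:N" means one part antibody stock in N parts final volume, and it uses a hypothetical final volume of 5 ml; neither the convention nor the volume is given in the source.

```python
def antibody_volume_ul(final_volume_ul: float, dilution_factor: float) -> float:
    """Volume of antibody stock (in µl) needed for a 1:dilution_factor dilution.

    Assumes "1:N" means one part stock in N parts final volume.
    """
    if final_volume_ul <= 0 or dilution_factor < 1:
        raise ValueError("volume must be positive and dilution factor >= 1")
    return final_volume_ul / dilution_factor


# Dilution factors stated in the reporting summary.
DILUTIONS = {
    "ChIP / Co-IP": 100,
    "immunostaining, primary": 200,
    "immunostaining, secondary": 300,
    "western blot, primary": 1000,
    "western blot, secondary": 5000,
}

if __name__ == "__main__":
    final_volume_ul = 5000.0  # hypothetical 5 ml of antibody solution
    for use, factor in DILUTIONS.items():
        print(f"{use}: {antibody_volume_ul(final_volume_ul, factor):.2f} µl stock")
```

Under these assumptions, a 1:1000 western blot primary antibody in 5 ml corresponds to 5 µl of stock.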

Validation: The commercial antibodies were validated based on the information in the manufacturers' instructions.

Eukaryotic cell lines
Policy information about cell lines
Cell line source(s): H1 hESCs (WiCell) (feeder free); 293T (ATCC)

Authentication: Immunostaining of cell type specific markers was performed for cell line authentication.

Mycoplasma contamination: All lines tested negative for mycoplasma contamination.

Commonly misidentified lines: No commonly misidentified cell lines were used (see ICLAC register).

Research animals
Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research
Animals/animal-derived materials: In this study, we used busulfan-treated, immune-deficient nude male mice (NCr nu/nu; Taconic). Mice were treated with busulfan (40 mg/kg) at 5-6 weeks of age and then transplanted 5-12 weeks after busulfan treatment.

Human research participants
Policy information about studies involving human research participants
Population characteristics: In this study, we obtained second trimester human fetal testis tissue. Note that the protocol for tissue procurement and use was approved by the Institutional Review Board of Montana State University (RR-P031014-EX); all procedures were compliant with all relevant ethical regulations regarding human research with unidentifiable banked tissue.

Method-specific reporting
n/a Involved in the study
- ChIP-seq
- Flow cytometry
- Magnetic resonance imaging

ChIP-seq
Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links (may remain private before publication): https://www.ncbi.nlm.nih.gov/geo/submission/update/?acc=GSE100639

Files in database submission:
- 1M_H9_OCT4_ChIP_CGATGT-04-30-2013.fastq
- 2nd_OCT4_SC_GCCAAT-04-30-2013.fastq
- 2nd_Test_OCT4_Ab_ACAGTG-04-30-2013.fastq
- IM_Mix_OCT4_TGACCA-04-30-2013.fastq
- testes_prdm1_chip-11-26-2013.fastq
- PAX5-testis-130909_BRISCOE_0116_BD2CRUACXX_L6_CTTGTA_pf.fastq.gz
- input-11-26-2013.fastq
- 1M_H9_OCT4.sorted.dedup.bam
- 2nd_OCT4_SC.sorted.dedup.bam
- 2nd_Test_OCT4_Ab.sorted.dedup.bam
- IM_Mix_OCT4.sorted.dedup.bam
- PAX5_TESTES_BRISCOE_0116_BD2CRUACXX_L6_CTTGTA_pf.bam
- TESTES_PRDM1.bowtie2.srt.bam

Genome browser session (e.g. UCSC): no longer applicable

Methodology
Replicates: We performed two replicates for the OCT4 ChIP in human fetal testis; for each of the other ChIP experiments, we provided only one.

Sequencing depth: single-end, 36 bp

Antibodies: anti-OCT4 (sc-8628, Santa Cruz Biotechnology); anti-PAX5 (sc-1974, Santa Cruz Biotechnology); anti-PRDM1 (C14A4, Cell Signaling Technology)

Peak calling parameters:
- Effective genome size: 2700000000.0
- Tag size: 49
- Band width: 300
- Pvalue cutoff for peak detection: 1e-05
- Select the regions with MFOLD high-confidence enrichment ratio against background to build model: 32
- Parse xls files into distinct interval files: False
- Save shifted raw tag count at every bp into a wiggle file: wig
- Extend tag from its middle point to a wigextend size fragment: -1
- Resolution for saving wiggle files: 10
- Use fixed background lambda as local lambda for every peak region: False
- 3 levels of regions around the peak region to calculate the maximum lambda as local lambda: 1000,5000,10000
- Build Model: create_model
- Diagnosis report: no_diag
- Perform the new peak detection method (futurefdr): False
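To make the roles of the p-value cutoff and the three local-lambda window sizes concrete, the sketch below illustrates the general idea behind Poisson-based peak calling as used by MACS: the expected tag rate at a candidate region is taken as the largest of the background rate and the rates estimated from the surrounding windows, and a region passes when the Poisson upper-tail probability of its tag count is at or below the cutoff. This is not MACS code; the function names and all rate values are hypothetical, and only the cutoff (1e-05) and the window sizes (1000, 5000, 10000) come from the parameters above.

```python
import math

P_VALUE_CUTOFF = 1e-05                 # from the peak calling parameters
LOCAL_WINDOWS_BP = (1000, 5000, 10000)  # from the peak calling parameters


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(background_rate: float, window_rates: dict) -> float:
    """Take the largest of the background rate and the per-window rates."""
    return max([background_rate, *window_rates.values()])


def min_tags_for_cutoff(lam: float, p_cutoff: float = P_VALUE_CUTOFF) -> int:
    """Smallest tag count whose upper-tail probability is <= p_cutoff."""
    k = 0
    while poisson_upper_tail(k, lam) > p_cutoff:
        k += 1
    return k


if __name__ == "__main__":
    # Hypothetical expected tag counts per candidate region.
    rates = dict(zip(LOCAL_WINDOWS_BP, (0.8, 2.0, 1.4)))
    lam = local_lambda(background_rate=0.5, window_rates=rates)
    print("local lambda:", lam)
    print("minimum tags to pass cutoff:", min_tags_for_cutoff(lam))
```

Using the maximum over several window sizes makes the test more conservative in regions where the local background is elevated.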

Data quality:
- OCT4 ChIP in human fetal testis: 27061
- PAX5 ChIP in human fetal testis: 22728
- PRDM1 ChIP in human fetal testis: 48399
- OCT4 ChIP in hESCs: 14799

Software: Alignment by Bowtie-illumina/Galaxy; peak calling by MACS-ChIP-seq/Galaxy; BigWig generated by WigtoBigWig/Galaxy; motif detection by MEME-ChIP

Flow Cytometry
Plots
Confirm that:
- The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
- The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
- All plots are contour plots with outliers or pseudocolor plots.
- A numerical value for number of cells or percentage (with statistics) is provided.

Methodology
Sample preparation: hESCs were dissociated in 0.25% trypsin–EDTA (Gibco BRL) at 37 °C for 5 min and collected by centrifugation at 200g in an Eppendorf 5702 R centrifuge. The cells were then passed through 70 µm strainers (BD Biosciences) to make sure they were digested as single cells before they were subjected to flow cytometry.

Mouse testis cells were dissociated with 0.25% trypsin–EDTA for 5 min at 37 °C. Dissociated cells were passed through 40 µm strainers (BD Biosciences) and then incubated in 1% BSA in PBS containing primary antibodies on ice for 20 min.

Instrument: BD FACSAria 2.0

Software: For data collection: BD FACSDiva v 6.1.2 software. For data analysis: FlowJo (10.4.2).

Cell population abundance: Sorted cells were directly placed into DNA extraction buffer. The abundance for these samples could not be assessed.

Gating strategy: After cells were selected in the FSC/SSC dot plot to remove debris, they were gated to exclude cellular aggregates in the FSC/FSC dot plot. Gates of GFP-FITC, mOrange-PE, or CKIT-PE cells were set and compared with a control sample with no detectable fluorochrome expression.
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
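The three gating steps described above (debris removal, aggregate exclusion, and a fluorescence gate set against a non-fluorescent control) can be sketched as sequential filters. This is an illustration only: the numeric thresholds, the reading of the FSC/FSC plot as FSC-area versus FSC-height, the 99.9th-percentile control cutoff and all function and field names are assumptions, not values from the study.

```python
import math


def is_cell(event, min_fsc=20_000, min_ssc=5_000):
    """Debris gate: keep events above hypothetical FSC/SSC minimums."""
    return event["fsc_a"] >= min_fsc and event["ssc_a"] >= min_ssc


def is_singlet(event, low=0.8, high=1.25):
    """Aggregate gate: FSC-A / FSC-H ratio near 1 (hypothetical bounds)."""
    return low <= event["fsc_a"] / event["fsc_h"] <= high


def percentile(values, fraction):
    """Nearest-rank percentile of a non-empty list (fraction between 0 and 1)."""
    if not values:
        raise ValueError("no events left to compute a percentile from")
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def positive_threshold(control_events, channel, fraction=0.999):
    """Fluorescence cutoff taken from gated cells of the negative control."""
    gated = [e[channel] for e in control_events if is_cell(e) and is_singlet(e)]
    return percentile(gated, fraction)


def gate_positive(events, control_events, channel):
    """Return single cells whose signal exceeds the control-derived cutoff."""
    cutoff = positive_threshold(control_events, channel)
    return [
        e for e in events
        if is_cell(e) and is_singlet(e) and e[channel] > cutoff
    ]


if __name__ == "__main__":
    # Hypothetical events; "gfp" stands in for any of the gated channels.
    control = [
        {"fsc_a": 50_000, "fsc_h": 48_000, "ssc_a": 20_000, "gfp": float(i)}
        for i in range(100)
    ]
    sample = [
        {"fsc_a": 1_000, "fsc_h": 1_000, "ssc_a": 500, "gfp": 5_000.0},      # debris
        {"fsc_a": 90_000, "fsc_h": 45_000, "ssc_a": 20_000, "gfp": 5_000.0},  # aggregate
        {"fsc_a": 50_000, "fsc_h": 48_000, "ssc_a": 20_000, "gfp": 50.0},     # negative
        {"fsc_a": 50_000, "fsc_h": 48_000, "ssc_a": 20_000, "gfp": 5_000.0},  # positive
    ]
    print(len(gate_positive(sample, control, "gfp")), "positive single cell(s)")
```

Applying the same scatter gates to the control before deriving the cutoff keeps the fluorescence threshold comparable between control and sample.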
