Nucleic Acids Research, 2017, Vol. 45, No. 21, 12152–12169
Published online 11 September 2017
doi: 10.1093/nar/gkx798

Principles for the regulation of multiple developmental pathways by a versatile transcriptional factor, BLIMP1

Tadahiro Mitani1,2, Yukihiro Yabuta1,2, Hiroshi Ohta1,2, Tomonori Nakamura1,2, Chika Yamashiro1,2, Takuya Yamamoto3,4, Mitinori Saitou1,2,3,5,* and Kazuki Kurimoto1,2,*

1Department of Anatomy and Cell Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606–8501, Japan, 2JST, ERATO, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606–8501, Japan, 3Center for iPS Cell Research and Application, Kyoto University, 53 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606–8507, Japan, 4AMED-CREST, AMED 1-7-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan and 5Institute for Integrated Cell-Material Sciences, Kyoto University, Yoshida-Ushinomiya-cho, Sakyo-ku, Kyoto 606–8501, Japan

Received April 13, 2017; Revised August 25, 2017; Editorial Decision August 26, 2017; Accepted August 30, 2017

ABSTRACT

Single transcription factors (TFs) regulate multiple developmental pathways, but the underlying mechanisms remain unclear. Here, we quantitatively characterized the genome-wide occupancy profiles of BLIMP1, a key transcriptional regulator for diverse developmental processes, during the development of three germ-layer derivatives (photoreceptor precursors, embryonic intestinal epithelium and plasmablasts) and the germ cell lineage (primordial germ cells). We identified BLIMP1-binding sites shared among multiple developmental processes, and such sites were highly occupied by BLIMP1 with a stringent recognition motif and were located predominantly in promoter proximities. A subset of bindings common to all the lineages exhibited a new, strong recognition sequence, a GGGAAA repeat. Paradoxically, however, the shared/common bindings had only a slight impact on the associated gene expression. In contrast, BLIMP1 occupied more distal sites in a cell type-specific manner; despite lower occupancy and flexible sequence recognitions, such bindings contributed effectively to the repression of the associated genes. Recognition motifs of other key TFs in BLIMP1-binding sites had little impact on the expression-level changes. These findings suggest that the shared/common sites might serve as potential reservoirs of BLIMP1 that functions at the specific sites, providing the foundation for a unified understanding of the genome regulation by BLIMP1, and, possibly, TFs in general.

INTRODUCTION

Transcription factors (TFs) recognize short DNA sequences and control the expression of associated genes, contributing to the generation and maintenance of diverse cell types throughout the body based on a single set of genomic information. Remarkably, single TFs can function in the development of many distinct cell types, and clarification of the mechanism underlying this phenomenon remains a fundamental challenge. To understand this mechanism, it will be critical to identify the genome-wide binding profiles of relevant TFs in multiple developmental processes in a systematic and quantitative manner. Studies along this line have been performed on cultured cell lines and a limited number of developmental lineages, and have revealed a number of key regulatory mechanisms for transcriptional activation, including the selection and activation of specific enhancers by collaborative TF interactions at closely spaced DNA recognition motifs [reviewed in (1,2)].

On the other hand, cellular development proceeds under cross-talking signals that may promote irrelevant differentiation or cellular states, and thus repressive transcriptional programs are also vital for appropriate cellular development. Repressive transcriptional programs often play a key role in transient cell populations, but there have been relatively few analyses investigating such programs with regard to TF-binding profiles across multiple cell lineages.

B lymphocyte-induced maturation protein 1 [BLIMP1, also known as PR domain containing 1 (PRDM1)] was originally identified as a key factor for the differentiation of plasma cells from B lymphocytes (3,4). It has been shown to act primarily as a transcriptional repressor and to recognize specific DNA sequences proximal to the transcription start sites (TSSs) in complexes with various co-repressors (3–11). BLIMP1 has subsequently been shown to play critical roles in a wide variety of developmental pathways in embryos and adults, including embryonic derivatives from

*To whom correspondence should be addressed. Tel: +81 75 753 4337; Fax: +81 75 751 7286; Email: [email protected] Correspondence may also be addressed to Mitinori Saitou. Tel: +81 75 753 4335; Fax: +81 75 751 7286; Email: [email protected]

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

all three germ layers, the germ line and extraembryonic lineages (12). Thus, BLIMP1 is one of the TFs required for the widest ranges of developmental processes and would be instructive in a comparative analysis of repressive programs. Accordingly, genome-wide BLIMP1-binding profiles have been analyzed in several lineages (13–16), and the function of BLIMP1 as a transcriptional activator has also been documented (15). On the other hand, systematic comparisons of BLIMP1-binding profiles across distinct cell types have been difficult or impractical, due to differences in the technology employed for obtaining such datasets. Thus, key questions related to the mechanism of action of BLIMP1 remain unanswered, including: How do the binding patterns differ among cell types? Which binding sites are cell-type specific or common? How do the binding differences influence gene expression? Is there any function of BLIMP1 common to all cell types?

Using a unified, quantitative ChIP-seq method amenable to a relatively small number of cells (13), we here investigated the BLIMP1-binding profiles and their impacts on gene expression during four distinct developmental processes in mice: (i) differentiation of photoreceptors from their precursors (photoreceptor precursors; PRP cells) (17,18), (ii) maturation of the intestinal epithelium (IE) from its embryonic form (emIE) (19,20), (iii) differentiation of plasmablasts (PBs) from B cells (4,15,21), (iv-a) the specification process of primordial germ cells (PGCs) at embryonic day (E) 6.5∼E9.5 [reconstituted in vitro as induction of PGC-like cells (PGCLCs) from embryonic stem cells (ESCs) via epiblast-like cells (EpiLCs)] (13,22), and (iv-b) late PGC development (∼E12.5) (23). Based on the results, we then clarified the mechanisms of action of this highly versatile transcriptional regulator.

MATERIALS AND METHODS

The methods are described in detail in the Supplementary materials and methods section.

Animals

All the animal experiments were performed under the ethical guidelines of Kyoto University. Homozygous EGFP-Blimp1 knock-in mice (EGFP-BLIMP1 mice) (Supplementary Figure S1A) were generated as reported previously (13).

Immunofluorescence (IF)

Embryos of EGFP-BLIMP1 mice at various developmental stages were dissected from euthanized pregnant females, fixed in freshly prepared ice-cold 4% PFA (TAAB) for 30 min on ice, and embedded in OCT compound (Sakura Finetek). The frozen samples were sectioned at 10 µm thickness at −20°C, and incubated with blocking solution [0.1% BSA (Sigma) and 0.1% Triton X-100 (Wako)] containing the primary antibodies overnight at 4°C, followed by washing and incubation with secondary antibodies and 4′,6-diamidino-2-phenylindole (DAPI). The samples were then observed under a confocal laser scanning microscope (Olympus FV1000) (Figure 1).

Cell purification

The photoreceptor progenitors (PRPs) [postnatal day (P) 4 for ChIP-seq, P0, P4, P6 and P10 for RNA-seq], embryonic intestinal epithelium (emIE) cells (E16.5 for ChIP-seq, and E16.5, P3 and P14 for RNA-seq), and male and female PGCs (PGCMs and PGCFs, respectively) (E12.5 for ChIP-seq) were purified from EGFP-BLIMP1 mice using fluorescence activated cell sorting (FACS) (ARIA III; BD Biosciences) (Figure 2A; Supplementary Figures S3A–D and S6A–D). PBs were induced from splenic B cells of EGFP-BLIMP1 mice, and purified with FACS. The cell suspensions of the respective cell types were prepared as described previously (24–28) with minor modifications. For RNA-seq, we took two biological replicates of the above cell types at each time point. For ChIP-seq, we took two biological replicates for PRP, emIE and PB cells, and single pooled samples for male and female PGCs. The cell types and biological replicates for ChIP-seq and RNA-seq are summarized in Supplementary Table S1.

RNA-seq library construction

Total RNA was purified with an RNeasy Micro Kit (QIAGEN) according to the manufacturer's instructions. One nanogram of purified total RNA was used for synthesis and amplification of cDNAs, and for construction of the cDNA libraries for RNA sequencing; in constructing the cDNA libraries, a previously reported method was used to selectively detect the 3′ ends of mRNA (29,30).

Chromatin immunoprecipitation

ChIP of EGFP-BLIMP1 was performed as described previously (13,31) with reference to the published guidelines (32).

Library preparation for ChIP-seq

Libraries for next-generation sequencing were prepared as described previously (13) with a modification for application to the Illumina systems. Briefly, ChIP-ed and input DNAs were sheared to an average size of 150 bp by ultra-sonication (Covaris). The sheared DNAs were end-repaired, dA-tailed, ligated to an amplification adaptor (P1-T Adaptor/F and Barcode-Internal+12-mer/R) (Supplementary Table S2), amplified with a 10-cycle polymerase chain reaction (PCR) using Library PCR primer 1 (Life Technologies) and Library PCR primer Barcode001 + Internal adaptor (Supplementary Table S2), and purified with AMPure XP beads supplemented with PEG 8000 and NaCl (AMPure XP purification, hereafter).

For sequencing on the Illumina systems, we then added an adaptor and index to the amplified library with two rounds of additional PCRs. The first round of additional PCR was performed for six cycles with 1 µM each of Read1-P1 and Read2-internal-adaptor primers (Supplementary Table S2), using Platinum Taq DNA Polymerase (Life Technologies) at a 50-µl reaction scale, followed by three rounds of AMPure XP purification. The second round of PCR was performed for four cycles with 1 µM each of Illumina P5-Read1 primer and P7-index N-Read2 primer (Supplementary Table S2) to produce the index-tagged library DNAs, followed by two rounds of AMPure XP purification (Supplementary Figure S3E for scheme). For the ChIP-seq of d2 and d6 PGCLCs, we performed the two rounds of additional PCRs for the previously amplified libraries (13).

To exclude the initial constant region consisting of the amplification adaptor sequence (P1-T Adaptor) and maximize the efficient sequencing reads, we designed a custom sequencing primer, Custom-primer-29mer (Supplementary Table S2), which met the criteria for sequencing on an Illumina platform (33) in terms of the length, GC content and melting temperature (Supplementary Figure S3E for scheme). The DNAs were then sequenced on the HiSeq 2500 platform (Illumina) in high-throughput mode to generate single-end 100-bp reads, per the manufacturer's instructions (Supplementary Figure S3E).

Mapping and normalization of RNA-seq data

GRCm38/mm10 for the mouse genome and ref GRCm38 for the mouse transcript annotation were used for mapping of the read data. Read trimming, mapping and estimation of expression levels were performed as described previously (29,30). The reads per million mapped reads (RPM) values were defined as expression levels, and then converted to log2 (RPM+1) values and averaged for the two biological replicates.

ChIP-seq data analyses

ChIP-seq data analysis and normalization were performed according to the previously described method (13) with modification. The BLIMP1 occupancy levels (see also the Supplementary materials and methods section) were confirmed to be reproducible in the biological replicates of PRP, emIE and PB cells, and d2 and d6 PGCLCs (Figure 2C and D; Supplementary Figure S3F), and were averaged between the replicates for the subsequent analyses. For E12.5 male/female PGCs, we used the occupancy levels of single samples for the analyses. We defined the 'BLIMP1-binding sites' as the peaks with occupancy levels >6.

The classes of the BLIMP1-binding sites detected in PRP cells, emIE cells, PBs and d2 PGCLCs were defined as follows: those exclusively detected in one cell type were defined as the 'specific sites'; those detected in two or three cell types as the 'shared sites'; and those detected in all four cell types as the 'common sites' (Figure 3B and Supplementary Figure S4A). The differential read densities at the specific sites were also confirmed with EdgeR software (Supplementary Figure S4B) (34). The enrichment or depletion of each class of binding sites (Figure 3E) was calculated by comparing the observed and randomly expected distribution of each binding-site class.

We defined BLIMP1-binding sites that were 'proximal to TSSs' (ref GRCm38) (within 1 Kb), 'around genes' (within 15 Kb from TSSs) and 'at large distances' (>15 Kb) (Figure 4 and Supplementary Figure S4D). We defined those located at a distance of more than 50 Kb as 'far sites'. We defined BLIMP1-associated genes according to binding-site class within 15 Kb of the TSSs. GO terms were analyzed using the DAVID tool (35).

Analyses of sequence properties

We performed a sequence motif search for genomic regions within 1 Kb of the center of BLIMP1-binding sites using the MEME-ChIP program suite (MEME v4.10.1) (36). The best-matched sequences of TF motifs were analyzed for the regions within 200 bp of the center of the binding sites and in random regions, using position frequency matrices (PFMs) (37) from the JASPAR database (Figure 5 and Supplementary Figure S5).

The GGGAAA repeats were considered to be present if this hexamer appeared 10 times or more in a 72-bp window (i.e., 12 × 6 bp) contained in the BLIMP1-binding sites or non-overlapping 2-Kbp tiles of the whole genome.

Expression analyses

We identified genes significantly expressed in at least one cell type examined in this study (expressed genes) [maximum log2 (RPM+1) >4, 12 876 genes], and used these log2 expression levels in the subsequent analyses. Among the 12 876 significantly expressed genes, we defined genes associated with the three binding-site classes (specific, shared and common) using the same criteria described above (Supplementary Figure S7C).

Differentially expressed genes (DEGs) during the developmental processes examined (photoreceptor differentiation, perinatal IE development, terminal B cell differentiation, germ cell specification in vitro and germ cell development in vivo) were defined with the following three criteria: (i) log2 expression levels >4 in at least one cell type (average of biological replicates) during the examination process; (ii) significant variance with one-way ANOVA test (P < 0.05) in at least one developmental process; and (iii) more than 2-fold expression-level differences in at least one pairwise comparison among the cell types (average of biological replicates) within each process.

DEGs were classified by k-means clustering using the differences from the mean of expression levels (k = 2 for photoreceptor differentiation, perinatal IE development, terminal B cell differentiation and germ cell development; k = 3 for germ cell specification). Such clusters were classified into either those positively or those negatively correlated with the Blimp1 level (Figure 6C).

Integration of ChIP-seq and gene expression

We calculated the enrichment level of DEGs associated with each class of the BLIMP1-binding sites during the developmental processes (Figure 6C and D) as the ratio of BLIMP1-binding-site-associated genes between DEGs and all genes. The enrichment levels of genes up/downregulated by Blimp1 mutation associated with the BLIMP1-binding sites (Figure 6E) were similarly calculated.

Sequence motifs and gene expression

We calculated enrichment levels of DEGs associated with cell type-specific sites containing recognition motifs of BLIMP1 and other TFs (Figure 6F) as the ratios between the percentages of DEGs associated with the BLIMP1-binding sites bearing TF motifs and those of all DEGs associated with the BLIMP1-binding sites.
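The binding-site classes defined under 'ChIP-seq data analyses' (occupancy level >6 counts as bound; bound in one cell type is 'specific', in two or three is 'shared', in all four is 'common') can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `classify_site`, the cell-type keys and the toy occupancy values are hypothetical and serve only to make the rule concrete.

```python
from collections import Counter

# From the text: a peak is a BLIMP1-binding site when its occupancy level is > 6.
OCCUPANCY_THRESHOLD = 6.0
# Hypothetical keys for the four cell types compared in the study.
CELL_TYPES = ("PRP", "emIE", "PB", "d2_PGCLC")


def classify_site(occupancy):
    """Return (class_name, bound_cell_types) for one peak.

    `occupancy` maps each cell type to its (replicate-averaged) occupancy level.
    """
    bound = [ct for ct in CELL_TYPES if occupancy.get(ct, 0.0) > OCCUPANCY_THRESHOLD]
    if not bound:
        return "unbound", bound
    if len(bound) == 1:
        return "specific", bound
    if len(bound) == len(CELL_TYPES):
        return "common", bound
    return "shared", bound  # bound in two or three cell types


# Toy peaks; the values are illustrative only and are not data from the paper.
peaks = {
    "peak_a": {"PRP": 7.2, "emIE": 1.0, "PB": 0.5, "d2_PGCLC": 2.0},
    "peak_b": {"PRP": 6.5, "emIE": 8.1, "PB": 3.0, "d2_PGCLC": 2.0},
    "peak_c": {"PRP": 9.0, "emIE": 8.5, "PB": 7.7, "d2_PGCLC": 10.2},
    "peak_d": {"PRP": 5.9, "emIE": 6.0, "PB": 1.0, "d2_PGCLC": 0.0},
}

classes = {name: classify_site(occ)[0] for name, occ in peaks.items()}
counts = Counter(classes.values())
print(classes)
print(counts)
```

Note that the threshold is strict in this sketch (a level of exactly 6 is treated as not bound), following the '>6' wording of the text.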

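The GGGAAA-repeat criterion stated under 'Analyses of sequence properties' (the hexamer appears 10 times or more within a 72-bp window) can be sketched as follows. The function name `has_hexamer_repeat` is hypothetical, and the sketch assumes that only the given strand is scanned and that every window offset is tested; the text does not specify either point, so treat both as assumptions.

```python
HEXAMER = "GGGAAA"
WINDOW = 72       # 12 x 6 bp, as stated in the text
MIN_COPIES = 10   # "10 times or more"


def has_hexamer_repeat(seq, hexamer=HEXAMER, window=WINDOW, min_copies=MIN_COPIES):
    """Return True if any `window`-bp stretch of `seq` holds >= `min_copies` hexamers.

    Assumption: only the given strand is scanned; pass the reverse complement
    separately if both strands should be considered.
    """
    seq = seq.upper()
    if len(seq) <= window:
        # The whole sequence fits in a single window.
        return seq.count(hexamer) >= min_copies
    # Slide the window one base at a time over the sequence.
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count(hexamer) >= min_copies:
            return True
    return False


# Toy sequences (illustrative only).
tandem = "ACGT" * 5 + "GGGAAA" * 10 + "ACGT" * 5   # 10 tandem copies inside 72 bp
spaced = ("GGGAAA" + "T" * 6) * 10                  # 10 copies spread over 120 bp
print(has_hexamer_repeat(tandem), has_hexamer_repeat(spaced))
```

Because GGGAAA cannot overlap a shifted copy of itself, counting occurrences with `str.count` is unambiguous here; the same function could be applied to binding-site sequences or to 2-Kbp genome tiles.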
Analyses of common BLIMP1 targets

The DEGs were classified on the basis of the number of developmental processes in which the genes were consistently correlated with the Blimp1 level, and genes associated with the common BLIMP1-binding sites (<15 Kb from TSSs) were identified (Figure 7).

Analyses of BLIMP1-binding sites in the late PGCs

We identified 8776 BLIMP1-binding sites in E12.5 male or female PGCs (late PGCs). For the expression analyses, we selected genes significantly expressed in at least one cell type among EpiLCs, d2-d6 PGCLCs and E9.5-E13.5 male/female PGCs [log2 (RPM+1) >4, 9339 genes]. Among these genes, 1579 and 1990 were associated with the BLIMP1-binding sites in the d2 PGCLCs and late PGCs, respectively (within 15 Kb from TSSs). This gene set was used in the analyses shown in Figure 8E and F.

RESULTS

Expression of EGFP-BLIMP1 in embryos and adults

We have established homozygous knock-in mice that bear an epitope tag, an enhanced green fluorescence protein (EGFP), at the N terminus of Blimp1 (EGFP-BLIMP1 mice) (13) and analyzed the EGFP-BLIMP1 expression in embryos and adults (8–10 weeks old) using immunofluorescence (IF) staining with an anti-GFP antibody (Figure 1; Supplementary Figures S1 and 2). We detected EGFP-BLIMP1 in the nuclei of all cells previously shown to express BLIMP1, including cells of the surface ectoderm, ectoderm and mesenchyme of the branchial arch, limb bud mesenchyme and midgut endoderm (emIE), as well as placental spongioblasts, otocysts, myotomes, PGCs, PRP cells in the retina, dental papilla, keratinocytes of the hair shafts and cells of the surface layer of the developing tongue epithelium (Figure 1 and Supplementary Figure S1) (38–40).

Moreover, we detected EGFP-BLIMP1 in the simple squamous epithelium within the embryonic ventricular myocardium, lung mesenchyme, kidney glomerulus and neural epithelium (Figure 1 and Supplementary Figure S1), which were only found in embryos and were most likely premature vascular endothelia, since they were positive for platelet-endothelial cell adhesion molecules (PECAM) and had squamous morphology (Figure 1; Supplementary Figures S1 and 2). We also detected EGFP-BLIMP1 in granular layers of the stratified squamous epithelia of the epidermis, esophagus, and vagina, and migratory cells in the lamina propria of the small intestine, which were most likely plasma cells (Supplementary Figure S2). These results reinforced previous findings and identified novel cell types expressing BLIMP1, demonstrating that the N-terminal EGFP tag was appropriately integrated in all cellular contexts examined.

Figure 1. Exploration of EGFP-BLIMP1 protein in the three germ layer derivatives, germ cells and trophoblast cell lineage. (A–D) IF analyses of GFP in whole embryos at E8.5 (A and B) and E9.5 (C) sectioned as illustrated (D). Panel (C) was prepared by piecing together high resolution images. (E–Q) IF of GFP in the E10.5 midgut (E) and placenta (F), the E12.5 lung (G), kidney (H), myotome (I) and male (J) and female (K) PGCs, the E15.5 retina (L), dental papilla (M), heart (N) and vibrissae (O), and the E16.5 hair follicle (P) and tongue (Q). Counterstaining was performed with 4′,6-diamidino-2-phenylindole (DAPI) for DNA, and phalloidin for actin (A–I, L–Q) or DDX4 (J and K). Bar: 100 µm.

The landscapes of BLIMP1 bindings

Using a recently developed ChIP-seq method (13), we went on to evaluate the genomic occupancy profiles of BLIMP1 in the following purified cell types from EGFP-BLIMP1 mice: (i) PRP cells [EGFP-positive (+), postnatal day (P) 4], in which BLIMP1 is highly expressed before its downregulation upon differentiation to rod cells (17,18); (ii) emIE cells [EGFP (+)/EpCAM (+), E16.5], in which BLIMP1 is expressed to prevent premature remodeling to adult-type IE before its downregulation upon the suckling-weaning transition (14,19,20); (iii) PBs [EGFP (+)/CD138 (+), day (d) 3 of induction from splenic B cells], in which BLIMP1 represses the B cell program and activates a part of the plasma cell program (15,21); (iv-a) PGCLCs, in which BLIMP1 represses somatic programs (d2 PGCLCs, which recapitulate PGC specification with a transient expression profile similar to that of the early mesoderm at around E6.5; and d6 PGCLCs, which recapitulate established PGCs at E9.5) (13); and (iv-b) E12.5 male/female PGCs [EGFP (+)/SSEA1 (+)], the maintenance of which requires BLIMP1 (23) (Figure 2A and Supplementary Figure S3A–D). A consistent use of the single epitope tag, antibody, library construction method, and sequencing platform allowed a quantitative comparison of the BLIMP1-binding profiles among these cell types (Supplementary Figure S3E and Supplementary Tables S1–4).

We obtained EGFP-BLIMP1 peaks with a sufficient signal-to-noise ratio, and quantified their read-tag densities (normalized IP/input for the regions within 500 bp from the peak centers; BLIMP1 occupancy levels hereafter) ('Materials and Methods' section). Pairwise comparisons revealed reproducibility between the biological replicates (Supplementary Figure S3F), and consistency with the reported data for the whole embryonic intestine and PBs (14,15) (Supplementary Figure S3G and H). Nonetheless, we detected substantial quantitative differences between our datasets and those reported previously, which were likely attributable to the methodological differences, reinforcing the importance of performing analyses under the same experimental conditions. As shown in Figure 2B, ChIP-seq tracks in the genomic loci proximal to Myc, a canonical target of BLIMP1, and around the Hoxa cluster revealed binding properties characteristic of the four cell types.

We found that the BLIMP1-occupancy levels of PGCLCs and PGCs at E12.5 were highly correlated (Figure 2C and Supplementary Figure S3I), and decided to use the datasets for the d2 PGCLCs as representative of the BLIMP1 binding of PGCs, since they capture the immediate BLIMP1 binding upon PGC specification. On the other hand, PRP cells, emIE cells, PBs and d2 PGCLCs exhibited highly varied occupancies of BLIMP1 as revealed by scatterplots, correlation coefficients and principal component analysis (PCA) (41) (Figure 2C, D and Supplementary Figure S3J), demonstrating that the BLIMP1-binding profiles vary widely among the cell types.

Figure 2. Overview of the BLIMP1-binding profiles. (A) Scheme of ChIP-seq and RNA-seq in this study. The cell types used for ChIP-seq analyses are indicated with colored circles enclosed by dotted lines, and those used for the RNA-seq are indicated with red arrowheads. (B) ChIP-seq tracks of EGFP-BLIMP1 around the Myc (30 Kb) and Hoxa cluster (300 Kb). (C) Heat map of the correlation coefficients of BLIMP1-occupancy levels among the cell types examined, with color codes as indicated. (D) PCA (PC1-PC3) of BLIMP1-occupancy levels in PRP cells, emIE cells, PBs and d2 PGCLCs.

Defining specific, shared and common BLIMP1-binding sites

Specific, shared and common BLIMP1-binding sites. We identified 7203, 3751, 5150 and 6385 BLIMP1-binding sites with significant occupancy levels (>6) in PRP cells, emIE cells, PBs and d2 PGCLCs, respectively (Figure 3A and Supplementary Table S5), and defined three binding-site classes: (i) those exclusively detected in one cell type (specific sites: 4739, 1436, 2911 and 4051, respectively); (ii) those shared in two or three cell types (shared sites: 1610, 1461, 1385 and 1480, respectively); and (iii) those common to all four cell types (common sites: 854) (Figure 3A, B and Supplementary Figure S4A, 'Materials and Methods' section). The scatterplot comparisons and the PCA of the BLIMP1-occupancy levels clearly delineated the characteristics of the three binding-site classes (Figure 3C and D); in the three-dimensional (3D) space defined by the PC1–3 axes, the specific sites created coherent clusters representing the binding characteristics of individual cell types, the shared sites exhibited a sparse distribution between/among the clusters of the specific sites for the shared cell types, and the common sites were plotted around the origin of the PCA (Figure 3D).

As shown in Figure 3B, while the numbers of the specific sites varied widely, those of the shared sites were relatively uniform (there were between 1385 and 1610 shared sites); consistently, the numbers of the overlapped binding sites in pairwise comparisons were relatively constant (from 1364 to 1638) (Supplementary Figure S4C), indicating that there was essentially no bias in cell types for the shared binding sites. The numbers of specific and common sites were enriched in comparison to those expected from a random distribution among all the binding sites (∼2- and >4-fold, respectively), while the shared sites were largely depleted (Figure 3E, 'Materials and Methods' section), suggesting that the allocation of the binding-site classes reflects a regulatory consequence rather than a random distribution. We found that BLIMP1 exhibited the highest occupancy levels in the common sites, an intermediate level in the shared sites, and the lowest level in the specific sites (Figure 3F). Pairwise comparisons revealed that the BLIMP1 occupancy levels of the shared/common sites were similar between cell types (fold difference <2 for 70–90%) (Figure 3G), indicating that BLIMP1 binds such sites at nearly constant levels in the four cell types.

Genomic distributions of the three binding-site classes. We next investigated the genomic distributions of the three binding-site classes. We explored the distances of the BLIMP1-binding sites from the TSSs, and found that as a whole, they were highly enriched in the genomic regions around TSSs (<15 Kb), particularly in the most proximal regions (<1 Kb) (Figure 4A and Supplementary Figure S4D). The shared/common sites exhibited this trend in a pronounced fashion (>59% of the common sites, and 52–73% of the shared sites were within 15 Kb of the TSSs, respectively), whereas, notably, a substantial proportion (46–66%) of the specific sites were also located at more distal regions (Figure 4B and C). We also examined the local densities of the binding sites, and found that, irrespective of the binding-site classes, the vast majority (77–88%) were distributed in a solitary manner (>10 Kb from the next-nearest neighbor) (Supplementary Figure S4E–G); nonetheless, we found an enrichment of the binding-site clusters in the regions around TSSs (1–15 Kb) (but not in the region most proximal to the TSSs) in all four cell types (Supplementary Figure S4H).

We next examined the association of the BLIMP1-binding sites with the genes (for simplicity of the assignment of the bindings to the genes, we defined the binding sites <15 Kb from the TSSs as those associated with the genes). A majority of the specific, shared, and common binding sites were associated with distinct sets of genes, respectively, in all four cell types (Figure 4D), and notably, the genes associated with the specific sites were highly specific to the respective cell types (Figure 4E) and were enriched with gene ontology (GO) functional terms (42) related to the respective developmental processes: e.g. 'photoreceptor cell development' for PRP cells; 'positive regulation of macromolecule metabolic process' for emIE; 'regulation of lymphocyte activation' for PBs; and 'embryonic morphogenesis', which was repressed in PGCs (43), for d2 PGCLCs (Figure 4F). On the other hand, the genes associated with the shared/common sites exhibited a high enrichment with GO terms for more general processes such as 'transcription, DNA-templated' (Supplementary Figure S4I).

Sequences recognized by BLIMP1. We next went on to investigate the DNA recognition mechanism by BLIMP1, and performed a motif discovery using the MEME-ChIP program (36). We identified highly enriched sequences similar to the canonical recognition motif of BLIMP1 (6,11) [MA0508.1 in the JASPAR database (44)] in all the binding-site classes in the four cell types (Figure 5A). To examine the actual distribution of this motif, we searched for the best-matched sequences in the regions within 200 bp from the center of the binding sites and in random loci, and calculated their similarity to this motif. We consider that, if a sequence contained in a binding site of a TF and sequences abundantly found in the genome showed the same similarity level to the recognition motif of the TF, such a sequence in the TF-binding site would not explain the binding of the TF. Thus, the similarity level to a motif in a binding site was evaluated by the probability of finding sequences that showed such a similarity in the random loci [an occurrence rate in random loci (ORR)]. We then plotted the percentages of the occurrence of the best-matched sequences in the binding sites against their ORR scores, and defined the 'motif-matched sequences' by ORR < 0.1 (Figure 5B) ('Materials and Methods' section).

We found that the distributions of the ORRs were highly varied among the binding-site classes (Figure 5B and C). Notably, the shared/common sites were highly enriched with these sequences (50–74%), particularly those with very low ORRs (ORR < 10⁻³) (Figure 5B and C). In contrast, the motif-matched sequences were relatively depleted in the PRP- and PB-specific sites (26 and 41%, respectively), while they were more pervasive in the emIE- and d2 PGCLC-specific sites (54 and 55%, respectively) (Figure 5B). Nonetheless, binding sites without the motif-matched sequences (ORR > 0.1) still contained sequences somewhat similar to the BLIMP1 motif (Figure 5D). Interestingly, the presence or absence of the motif-matched sequences had little effect on the BLIMP1 occupancy levels in all binding-site classes (Supplementary Figure S5A), suggesting that sequence recognition by BLIMP1 may be relatively flexible.

We identified a small number of the common binding sites bearing a unique repeat of the GGGAAA hexamer (the hexamer was repeated ≥10 times) with ORRs of 5∼6 × 10⁻³ (N = 155) (Figure 5A, C and D; Supplementary Figure

S5B). In fact, BLIMP1 very highly occupied the majority of such repeats identified in the genome (58%, 155/268) (Supplementary Figure S5C), and thus these repeats formed a distinct subclass of the common binding sites. This subclass was predisposed to locate at large distances from the TSSs (>15 Kb for 85%) (Supplementary Figure S5D). Nonetheless, a subset of the repeats were distributed around TSSs or within gene bodies of 81 genes, which were enriched with genes related to ‘spermatogenesis’, including Prdm9, a determinant of meiotic recombination hotspots (Supplementary Figure S5E and F).

Next, we found that some sequence motifs discovered in the specific sites were similar to recognition motifs of key TFs for the respective lineages [e.g. Cone-rod homeobox (CRX) in PRP cells, CDX2 and HNF4A in emIE cells, PU.1 (also known as SPI1) in PBs, and TFAP2C and SOX2 in d2 PGCLCs] (Figure 5E and Supplementary Figure S5G). Among these, the CRX motif was abundant in the PRP-specific sites (37%), with an abundance greater than that of the BLIMP1 motif (26%) (Figure 5F). We also found a long terminal repeat (RLTR22), bearing motifs of CDX2 and HNF4A, two key IE regulators, nearly exclusively in the emIE-specific sites (3.7%, 53/1436) (Supplementary Figure S5H and I). The specific distances and orientations between the motifs, which are hallmarks for ‘on-DNA TF-TF interactions’ (45), were not found in most pairs of BLIMP1 and the other TF motifs, except for a joint motif with PU.1 in the PB-specific sites (1.8%, 51/2911) (Supplementary Figure S5J). Interestingly, the evolutionary conservation of PRP-specific sites was as high as that of all TSSs (Supplementary Figure S5K), implying the importance of their sequence properties. The importance of distal specific binding sites (>50 Kb from TSSs) was highlighted by their evolutionary conservation, which was nearly as high as that proximal to (<1 Kb from) the TSSs (Supplementary Figure S5L).

Collectively, these findings indicate that BLIMP1 binds to the genome in at least three distinct modes. First, BLIMP1 binds to the shared/common sites with constant, relatively high occupancy levels: such sites are similar in number among the cell types, are enriched with sites more proximal to TSSs, and bear relatively stringent recognition sequences. Second, a subset of the common sites covered a majority of GGGAAA repeats in the genome with very high occupancy levels of BLIMP1: such sites are located at large distance from the nearest TSSs. Third, BLIMP1 binds to the specific sites with lower occupancy levels: such sites are diverse in number among the cell types, are located more distal to TSSs, and show less stringent recognition sequences.

The relationship between BLIMP1-binding and gene expression

Gene expression dynamics of the four lineages. Using an RNA-sequencing (RNA-seq) method (29,30), we next determined the gene expression dynamics during the relevant developmental processes of the four lineages: (i) for photoreceptor development, rod cell precursors [P0 EGFP (+), P4 EGFP (+), P4 CD73 (−)] and early rod cells [P4/P6/P10 CD73 (+)]; (ii) for perinatal IE development, IE cells at E16.5, P3 and P14; (iii) for PB differentiation, B cells, Pre-PB and PBs; and (iv) for germ-cell development, epiblast-like cells (EpiLCs), d2 to d6 PGCLCs and E9.5 to E13.5 PGCs (Figure 2A; Supplementary Figure S6A–G and Tables S1–4) (28,46,47) (‘Materials and Methods’ section). The RNA-seq data were highly reproducible between biological replicates and showed a good dynamic range for the efficient detection of expression dynamics, as previously reported (29,30) (Supplementary Figure S6G). We used genes with expression levels of log2 [RPM (reads per million mapped reads)+1] > 4, which corresponds to ∼10 copies per cell (29), in at least one cell type (12 876 genes) for the subsequent analyses. Figure 6A shows the expression dynamics of key genes for the respective lineages. Note that during photoreceptor and IE development, Blimp1 was initially expressed at high levels and was rapidly downregulated, whereas during PB and germ-cell differentiation/development, Blimp1 was acutely upregulated upon their differentiation/development and maintained thereafter (Figure 6A).

The unsupervised hierarchical clustering (UHC) and PCA successfully classified the developing cells of the four lineages (Figure 6B and Supplementary Figure S7A). We identified the DEGs during the development of each lineage (difference of the expression levels >2-fold in at least one pairwise comparison, ANOVA, P < 0.05). Interestingly, by k-means clustering we also found that the DEGs could be categorized into two major classes, those monotonically and progressively up- or downregulated, for photoreceptor, perinatal IE and PB development, and into three major classes, those monotonically and progressively up- or downregulated or those that show highly acute transient upregulation followed by subsequent downregulation, for germ-cell development (Figure 6C) (‘Materials and Methods’ section). These findings demonstrate that the four developmental processes reflect a typical cell-state transition in which part of the initial character of the cells is lost and instead a novel character is gained in a relatively straightforward fashion.

BLIMP1 regulates gene expression through specific binding sites. Among the total 12 876 expressed genes, around 15–

[Figure 3 graphic. Panels: (A) ChIP-seq tracks over 40-kb windows for PRP, emIE, PB and d2 PGCLC around Tbx3, Otx2/Otx2os1, Cdc42ep4 and Stat4/Stat1. (B) Number of BLIMP1-binding sites, given as common/shared/specific (total): PRP, 854/1610/4739 (7203); emIE, 854/1461/1436 (3751); PB, 854/1385/2911 (5150); d2 PGCLC, 854/1480/4051 (6385). (C) Scatterplots of log2(IP/input). (D) Factor loadings, PC1–PC3. (E) Observed/expected numbers of binding sites. (F) BLIMP1 occupancy level, log2(IP/input). (G) Absolute log2 difference of BLIMP1 occupancy level in pairwise comparisons.]

Figure 3. Specific, shared and common BLIMP1-binding sites and their BLIMP1 occupancy levels. (A) ChIP-seq track of EGFP-BLIMP1 around Tbx3, Otx2, Cdc42ep4 and Stat1, which represent BLIMP1-binding sites specific to one cell type (emIE), shared among two (PRP and d2 PGCLCs) or three (emIE, PBs and d2 PGCLCs) cell types, and common to all cell types, respectively. (B) Stacked bar graphs showing the number of specific, shared and common BLIMP1-binding sites in PRP cells, emIE cells, PBs and d2 PGCLCs. (C) Scatterplots of BLIMP1-occupancy levels between PRP cells and d2 PGCLCs (left) and between emIE cells and PBs (right). The binding-site classes are color coded as indicated. (D) Scatterplots of factor loadings (PC1–PC3) of BLIMP1-occupancy levels for the specific sites (leftmost), the sites shared by two (second leftmost) or three (second rightmost) cell types, and the common binding sites (rightmost). The binding-site classes are color coded as indicated. (E) Bar graphs showing the ratios between the observed numbers of BLIMP1-binding sites and those expected from the random distribution for each binding-site class. (F) Boxplots for BLIMP1-occupancy levels in the indicated binding-site classes. The numbers of sites are indicated in the brackets. Asterisks indicate statistical significance by t-test (two-sided P < 0.001). (G) Boxplots for the absolute log2 difference of BLIMP1-occupancy levels in pairwise comparisons of specific (left) and shared/common sites (right).
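The three binding-site classes used throughout (specific to one cell type, shared among two or three, common to all four) can be illustrated with a minimal sketch. It assumes each site is already annotated with the set of cell types in which a peak was called; the function name `classify_site` and the dictionary layout are hypothetical and are not taken from the authors' pipeline. The four example loci and their class assignments follow the Figure 3A caption.

```python
from collections import Counter

# The four cell types profiled in the study.
CELL_TYPES = ("PRP", "emIE", "PB", "d2PGCLC")


def classify_site(bound_in):
    """Classify a binding site by the number of cell types in which it is bound."""
    n = len(set(bound_in) & set(CELL_TYPES))
    if n == 0:
        raise ValueError("site is not bound in any of the four cell types")
    if n == 1:
        return "specific"
    if n == len(CELL_TYPES):
        return "common"
    return "shared"  # bound in two or three cell types


# Example loci from the Figure 3A caption (binding status as stated there).
example_sites = {
    "Tbx3": {"emIE"},
    "Otx2": {"PRP", "d2PGCLC"},
    "Cdc42ep4": {"emIE", "PB", "d2PGCLC"},
    "Stat1": set(CELL_TYPES),
}

classes = {name: classify_site(bound) for name, bound in example_sites.items()}
class_counts = Counter(classes.values())
print(classes)
print(class_counts)
```

How overlapping peaks from different cell types are merged into one site is not specified in this section, so the sketch starts from sites that are already matched across cell types.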

[Figure 4 graphic. Panels: (A) Percentages of binding sites against distance from TSS (Kb, log10 scale) for PRP, emIE, PB and d2 PGCLC; percentages of sites within <1 Kb / 1–15 Kb of a TSS: PRP, specific 12.9/23.0, shared 40.7/22.4, common 37.5/21.7; emIE, specific 11.4/33.8, shared 30.5/30.0, common 37.5/21.7; PB, specific 20.5/32.8, shared 47.2/25.8, common 37.5/21.7; d2 PGCLC, specific 4.7/28.9, shared 18.5/33.7, common 37.5/21.7. (B) Distance from TSSs (top 50%), with 1 Kb and 15 Kb marked. (C) Percentages of specific, shared and common sites against distance from TSS (Kb). (D) Genes associated with each BLIMP1-binding site class. (E) Genes associated with specific binding sites: PRP (2156), emIE (882), PB (2021), d2 PGCLC (1722). (F) GO terms for genes associated with specific binding sites: PRP, sensory perception of light stimulus (1.1E-10), eye morphogenesis (6.0E-6), photoreceptor cell development (2.5E-5); emIE, regulation of actin filament depolymerization (1.1E-3), positive regulation of macromolecule metabolic process (4.2E-3); PB, regulation of lymphocyte activation (7.9E-12), apoptosis (2.1E-11), negative regulation of B cell activation (2.0E-4); d2 PGCLC, embryonic morphogenesis (8.1E-6), cell fate commitment (4.3E-5), pattern specification process (1.76E-4).]

Figure 4. Genomic distribution and gene association of BLIMP1-binding sites. (A) Histograms of BLIMP1-binding sites against the log10 distance (Kb) from TSSs. Binding-site classes in the stacked bars are colored as indicated, and the percentages of the sites within 1 Kb and within 1–15 Kb for each class are shown to the left of the color-code symbols. The dotted lines indicate 15 Kb from the TSSs. (B) Heat maps of the top 50% of BLIMP1-binding sites sorted by distances from the nearest TSSs. The distances of 1 and 15 Kb are indicated with dotted lines. (C) Area charts showing the percentages of binding-site classes located within the indicated distances from TSSs. (D) Venn diagrams showing the definition of genes associated with each BLIMP1-binding-site class (solid lines). The numbers of TSSs located within 15 Kb of the nearest binding sites are indicated. (E) Venn diagrams showing the overlap of genes associated with the specific binding sites in different cell types. (F) Selected GO terms and lists for genes associated with the specific binding sites.
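The distance bins used in Figure 4 (<1 Kb, 1–15 Kb, and beyond 15 Kb from the nearest TSS) can be sketched as follows. This is an illustration only: it assumes a single chromosome, TSS coordinates given as a sorted list of integers in bp, and one coordinate per binding site (its center). The helper names and the toy coordinates are hypothetical.

```python
import bisect


def nearest_tss_distance(site_center, sorted_tss):
    """Distance (bp) from a binding-site center to the nearest TSS on one chromosome."""
    if not sorted_tss:
        raise ValueError("no TSS coordinates given")
    i = bisect.bisect_left(sorted_tss, site_center)
    # Only the TSSs immediately left and right of the site can be nearest.
    neighbours = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(site_center - tss) for tss in neighbours)


def distance_bin(distance_bp):
    """Assign the distance classes used in Figure 4A."""
    if distance_bp < 1_000:
        return "<1 Kb"
    if distance_bp < 15_000:
        return "1-15 Kb"
    return ">15 Kb"


# Toy coordinates (hypothetical, not taken from the data).
tss_positions = [10_000, 50_000, 120_000]
site_centers = [10_400, 56_000, 85_000]
bins = [distance_bin(nearest_tss_distance(s, tss_positions)) for s in site_centers]
print(bins)
```

Under the gene-association rule stated in the Figure 4D caption, a gene would count as associated with a site when the first two bins apply, that is, when its TSS lies within 15 Kb of the site.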

[Figure 5 graphic. Panels: (A) Sequence logos and E-values by class: PRP-specific, 1.4e-816; emIE-specific, 1.5e-314; PB-specific, 1.6e-401; d2 PGCLC-specific, 5.3e-292; shared by 2 cell types, 6.4e-271; shared by 3 cell types, 1.3e-153; common, 7.9e-354; BLIMP1 (JASPAR, MA0508.1) shown for reference. (B) Occurrence rate in binding sites against occurrence rate in random loci (ORR), both on a 0–1.0 scale. (C) Percentages against ORR (BLIMP1 motif, 10⁻³ to 1) for PRP, emIE, PB, d2 PGCLC and common sites. (D) Sequence logos for ORR < 10⁻³, 10⁻³ < ORR < 10⁻¹, 5 × 10⁻³ < ORR < 6 × 10⁻³, and 10⁻¹ < ORR. (E) Probability against position of best site in sequence (bp, −1000 to 1000) in PRP-specific sites (BLIMP1, CRX), emIE-specific sites (BLIMP1, CDX2, HNF4A), PB-specific sites (BLIMP1, PU.1, POU2F2) and d2 PGCLC-specific sites (BLIMP1, TFAP2C, SOX2). (F) Percentages of sites with the BLIMP1 motif only, a lineage-related TF motif only (CRX, CDX2, HNF4A, PU.1, POU2F2, TFAP2C and SOX2), both, or neither, for specific, shared, common and random sites.]

Figure 5. Sequence properties of BLIMP1-binding sites. (A) Identified sequence motifs enriched in the respective binding sites and their E-values. The BLIMP1 motifs registered in the JASPAR database are shown on the bottom. (B) Occurrence-rate plots of the sequences best-matched to the BLIMP1 motif in the binding sites against the ORRs. The vertical solid line indicates ORR 0.1. The dotted diagonal line represents coverages that equal the ORR (i.e. coverages in random genomic loci). Binding-site classes are color coded as indicated. (C) Histograms of ORRs for the sequences best-matched to the BLIMP1 motif in the indicated binding-site classes. Dotted lines indicate an ORR of 0.1. The black closed circle indicates ORRs between 5 × 10⁻³ and 6 × 10⁻³, which were highly enriched with the GGGAAA repeat. (D) Sequence logos corresponding to the indicated ORR values. (E) Enrichment of the indicated TF motifs in specific sites. (F) Stacked bar graphs showing the frequencies of the combinations of the BLIMP1 motif and the indicated TF motifs in the binding sites (ORR < 0.1). The numbers of binding sites are indicated.
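The ORR described in the text can be read as an empirical tail probability, sketched below. The sketch assumes that a best-match similarity score has already been computed for each binding site and for each random locus (the scoring itself is described in the Methods and is not reproduced here), and that "showing such a similarity" means a score at least as high as that of the site. The function names and the toy integer scores are hypothetical.

```python
import bisect


def occurrence_rate_in_random_loci(site_score, sorted_random_scores):
    """Fraction of random loci whose best-match score is >= the site's score."""
    n = len(sorted_random_scores)
    if n == 0:
        raise ValueError("no random-locus scores given")
    # bisect_left returns the index of the first score >= site_score.
    first_at_least = bisect.bisect_left(sorted_random_scores, site_score)
    return (n - first_at_least) / n


def is_motif_matched(orr, cutoff=0.1):
    """'Motif-matched sequences' are defined in the text by ORR < 0.1."""
    return orr < cutoff


# Toy best-match scores for 100 random loci (hypothetical integer scores 1..100).
random_scores = sorted(range(1, 101))

orr_strong = occurrence_rate_in_random_loci(96, random_scores)
orr_weak = occurrence_rate_in_random_loci(50, random_scores)
print(orr_strong, is_motif_matched(orr_strong))
print(orr_weak, is_motif_matched(orr_weak))
```

Sorting the random-locus scores once keeps each lookup logarithmic, which matters if thousands of binding sites are scored against a large set of random loci.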

22% were associated with the BLIMP1-binding sites (<15 Kb from the TSSs) in the four cell types (Supplementary Figure S7B), and those associated with the specific, shared and common binding sites were largely non-overlapped and showed a wide range of expression levels similar to the range for the total group of expressed genes (Supplementary Figure S7C and D). By examining the enrichment levels of genes associated with the BLIMP1-binding sites in DEGs, we made two general findings. First, among the genes associated with the specific, shared and common binding sites, those associated with the specific binding sites were selectively enriched in DEGs (photoreceptor, perinatal IE and PB development) (Figure 6C) (‘Materials and Methods’ section), and such DEGs were highly specific to the respective developmental processes (Supplementary Figure S7E). Second, while the specific BLIMP1 bindings were associated with DEGs in general, they tended to be more enriched around DEGs negatively correlated with the Blimp1 level [i.e. genes upregulated when Blimp1 was repressed (photoreceptor development) or genes downregulated when Blimp1 was upregulated (PB and germ-cell development)] (Figure 6C). An exception to the first general finding was that genes associated with the shared binding sites in PGCLCs were also enriched in DEGs, and an exception to the second general finding was that during the perinatal IE development, the genes with specific BLIMP1 bindings were both up- and downregulated at similar enrichment levels when Blimp1 was repressed (Figure 6C). Moreover, we found that the enrichment levels of genes associated with the specific (all four cell types) or shared (PGCLCs) BLIMP1-binding sites in DEGs were progressively increased along with increases in the degree of differential expression (Figure 6D).

We next analyzed the effects of the loss of Blimp1 on the expression of genes bound by BLIMP1 in emIE cells (19), PBs (15) and PGCs (43) (‘Materials and Methods’ section). The genes misregulated in the Blimp1 mutants were highly enriched with those bearing specific, but not shared/common (except for d2 PGCLC shared sites), BLIMP1-binding sites (Figure 6E) and the enrichment levels were increased progressively with the increase of the level of misregulation (Figure 6E): the specific sites were associated with 26% (5/19; IE development), 34% (65/189; PB differentiation) and 12% (74/615; PGC specification) of genes upregulated in the Blimp1 mutant (>4-fold), and 18% (25/140; PB differentiation) and 10% (30/288; PGC specification) of the downregulated genes (<1/4-fold) (Supplementary Table S6). The shared sites in d2 PGCLCs were linked with 7% (40/615) and 8% (23/288) of the up- and downregulated genes, respectively (Supplementary Table S6). Thus, these findings demonstrate that, primarily, BLIMP1 represses the genes through binding to the specific, but not shared/common, binding sites.

Sequence motifs and gene regulation. We next analyzed the impacts of the motif-matched sequences of BLIMP1 and other TFs in the specific BLIMP1-binding sites on gene expression. Strikingly, the presence/absence and combinations of such sequences did not show a major influence on the association between the specific sites and the DEGs during the normal development (Figure 6F), raising a possibility that BLIMP1 may act on gene expression independent of the stringent sequence recognitions or collaboration with the other TFs. Nonetheless, we found a slight enrichment of the DEGs among genes associated with the emIE- and PB-specific sites that contained motifs of both BLIMP1 and other TFs (Figure 6F). The CRX motifs were frequently detected without a BLIMP1 motif in the PRP-specific sites (29%), and were significantly enriched with DEGs in the photoreceptor differentiation (P < 0.001); such sites were highly associated with the genes for photoreception (‘Visual perception’; Rbp3, Gucy2f) (Supplementary Figure S7F), implying that BLIMP1 may block the maturation of CRX-mediated photoreceptor programs. The RLTR22s exhibited no association with the DEGs, but were located in the neighborhood of B4galnt2 and Fut8 (8.3 and 37.4 Kb from TSSs, respectively), which encode enzymes for protein glycosylation expressed in the adult IE (Supplementary Figure S7G).

Common BLIMP1 targets. Our analyses so far indicated that genes bearing the common sites were not enriched around the DEGs in normal development, or around the genes affected by Blimp1 mutation (Figure 6C–E). Nonetheless, among the common sites, 10–19% were associated with DEGs, while 20–30% were associated with stably expressed genes (log2 RPM+1 > 4) or stably silenced genes (log2 RPM+1 < 4) (Supplementary Figure S7H). To more directly identify genes that may be commonly regulated by BLIMP1, we looked for genes that were consistently negatively or positively correlated with the Blimp1 level, and then examined whether such genes harbored the common BLIMP1-binding sites (<15 Kb from TSSs) (‘Materials and Methods’ section) (Figure 7). This analysis showed that the genes showing a consistent correlation with Blimp1 in two or more processes were fewer in number than the genes correlated with Blimp1 in only one process; 3022, 483, 49 and 1 genes were negatively correlated, and 2600, 493, 48 and 5 genes were positively correlated, with Blimp1 in one, two, three and four developmental processes, respectively (Figure 7). Interestingly, when the genes were consistently neg-

[Figure 6 graphic. Panels: (A) log2(RPM+1) of key genes: photoreceptor differentiation (Blimp1, Otx2, Crx, Nrl, Rhodopsin), perinatal IE development (Blimp1, Sis, Ass1, Cyp4v3, Afp), terminal B cell development (Blimp1, Pax5, Bcl6, Irf4, Xbp1) and germ cell specification (Blimp1, Prdm14, Tfap2c, Nanog, Uhrf1). (B) PC1 against PC2 for each developmental process. (C) Blimp1 expression, heat maps of relative expression value (−3 to 3), and enrichment (0 to 2.5) of genes associated with specific, shared and common sites, with representative genes. (D) Enrichment of BLIMP1-bound genes against thresholds of fold changes (>0, >2, >4, >8) for PRP, emIE, PB and d2 PGCLC. (E) BLIMP1 mutants: emIE (E18.5), PB and PGC; enrichment of BLIMP1-bound genes among genes up-regulated and down-regulated in mutants (fold change). (F) Enrichment of DEGs in the indicated combinations of TF motifs (BLIMP1 with CRX, CDX2, HNF4A, PU.1, POU2F2, TFAP2C or SOX2) for genes associated with specific binding sites (* p < 0.001).]

Figure 6. Impact of BLIMP1 binding on gene expression. (A) Line graphs showing the expression dynamics of key genes in the indicated developmental processes. (B) Scatterplots of PC scores (PC1 and PC2) of the individual developmental processes. The cell types are represented with closed circles and colored as indicated. The percentages of variances for PC1 and PC2 are indicated. The time courses are traced as represented by the gray round arrows. (C) Line graphs of Blimp1 expression dynamics (top left), and the heat-maps of relative changes in the expression of DEGs classified by k-means clustering (k = 2 in photoreceptor differentiation, IE development and terminal B cell differentiation; k = 3 in germ cell specification) (bottom left). For each developmental process, 1000 randomly selected DEGs are visualized. Bar graphs for the enrichments of BLIMP1-associated genes and the lists of representative genes in the indicated classes are shown on the right. Asterisks indicate statistical significance by Chi-squared test (P < 0.001). The color code is the same as in Figure 4A. (D) Enrichment plots of the BLIMP1-associated genes against the indicated thresholds of fold change (the greatest changes during each developmental process). (E) Enrichment plots of BLIMP1-associated genes against the indicated thresholds of fold changes between the wild-type and Blimp1 mutants for genes up- (left) and downregulated (right) in the mutants. The color code is the same as in (D). (F) Bar graphs of the enrichment levels of DEGs among those associated with the specific sites containing the indicated combinations of motif-matched sequences for BLIMP1 and other TFs. Asterisks indicate statistical significance by Chi-squared test (P < 0.001).
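The enrichment values and Chi-squared significance marks in Figure 6 could be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: enrichment is taken here as the fraction of DEGs that are site-associated divided by the same fraction among all expressed genes, and the test is a plain Pearson Chi-squared test on a 2 × 2 table without continuity correction. Apart from the total of 12 876 expressed genes, all counts are hypothetical.

```python
import math


def enrichment(deg_assoc, deg_total, all_assoc, all_total):
    """Ratio of the site-associated fraction among DEGs to that among all genes."""
    return (deg_assoc / deg_total) / (all_assoc / all_total)


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and P value (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail of chi-squared equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts; only the total of 12 876 expressed genes comes from the text.
all_total, all_assoc = 12_876, 1_500   # expressed genes, of which site-associated
deg_total, deg_assoc = 1_000, 200      # DEGs, of which site-associated

ratio = enrichment(deg_assoc, deg_total, all_assoc, all_total)
chi2, p_value = chi2_2x2(
    deg_assoc,                                          # DEG, associated
    deg_total - deg_assoc,                              # DEG, not associated
    all_assoc - deg_assoc,                              # non-DEG, associated
    (all_total - deg_total) - (all_assoc - deg_assoc),  # non-DEG, not associated
)
print(f"enrichment = {ratio:.2f}, chi2 = {chi2:.1f}, significant = {p_value < 0.001}")
```

The closed-form P value holds only for one degree of freedom, which is all a 2 × 2 table needs; a general-purpose statistics library would be the safer choice for larger tables.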

atively correlated with Blimp1 in a greater number of developmental processes, the greater percentages of such genes were associated with the common sites; 4% (121/3022), 9% (45/483), and 14% (7/49) of the genes bore the common sites, when they were negatively correlated with Blimp1 in one, two and three processes, respectively (Figure 7). In contrast, genes positively correlated with Blimp1 or those without consistent correlation were associated with common sites at nearly constant percentages that were similar to the percentage for the total group of expressed genes (∼4%). Thus, a small number of genes were commonly repressed, not activated, by BLIMP1. Myc, a prototypical target of BLIMP1 (5), was included in the seven genes that were negatively correlated with Blimp1 in three of the four processes (Myc, Klf9, Wls, Psmb10, Plekha1, Sept6, Mrgbp) (Supplementary Figure S7I), and was associated with a prominent common binding site (Figure 2B).

BLIMP1-binding profile during germ cell development

Finally, we investigated how the BLIMP1-binding profiles and hence the targets of BLIMP1 might change during the development of a certain cell lineage, using the development of germ cells as a paradigm, since BLIMP1 is expressed consistently during a period of ∼7 days from their specification through to sexually dimorphic development (23,48,49). We defined 8776 BLIMP1-binding sites in E12.5 male/female PGCs and compared them with those in d2 PGCLCs. First, consistent with the correlation analyses (Figure 2C), a scatterplot comparison revealed that the BLIMP1-binding profiles were significantly correlated between d2 PGCLCs and E12.5 male/female PGCs, and did not show apparent sexual dimorphism (Figure 8A). However, we identified 348 and 177 BLIMP1-binding sites that exhibited greater occupancy levels (>2-fold) in E12.5 PGCs and in d2 PGCLCs, respectively. Interestingly, the 348 sites with greater occupancy levels in E12.5 PGCs were strongly enriched with the common sites (50%, 174/348), especially those with the GGGAAA repeats (83%, 132/174) (Figure 8B), and such repeats exhibited very high occupancy levels specific to E12.5 PGCs (Figure 8C), whereas the 177 sites with greater occupancy levels in d2 PGCLCs were enriched with the specific sites (Figure 8B).

Second, we examined the genes associated with the BLIMP1-binding sites in d2 PGCLCs and E12.5 PGCs. Among 9399 genes expressed in d2 PGCLCs or in E12.5 PGCs [log2(RPM+1) > 4], 1579 and 1990 genes were associated with the BLIMP1-binding sites in d2 PGCLCs and in E12.5 PGCs, respectively, among which 1203 genes were common to d2 PGCLCs and E12.5 PGCs (common PGC genes), and 376 and 787 genes were specific to d2 PGCLCs and E12.5 PGCs, respectively (early- and late-PGC specific genes, respectively, see ‘Materials and Methods’ section) (Figure 8D). The early-PGC, but not late-PGC, specific genes (376 genes) were highly enriched with those bearing the d2 PGCLC-specific binding sites (Figure 8E), reinforcing the case for the involvement of BLIMP1 in a specific gene regulation upon PGC specification (13,43,48). On the other hand, we found that among the 87 genes upregulated when Blimp1 was conditionally knocked out in late PGCs (knocked out at E10.5 and analyzed at E11.5), 60 genes (∼69%) were bound by BLIMP1 at E12.5 and 90% (54/60) of such genes were also bound by BLIMP1 in d2 PGCLCs (Figure 8F). These findings indicate that a majority of key BLIMP1 targets during germ-cell development are occupied by BLIMP1 upon germ cell specification.

[Figure 7 graphic. Stacked bars of the percentage of common-site associated genes by correlation with the Blimp1 level and number of developmental processes. Gene counts given as common-site associated / common-site non-associated: negatively correlated in four, three, two and one processes, 0/1, 7/42, 45/438 and 121/2901; positively correlated in four, three, two and one processes, 0/5, 1/47, 21/472 and 97/2503; inconsistently correlated, 83/1600; not correlated, 187/4305; all, 562/12314. * p < 0.001.]

Figure 7. Common targets of BLIMP1. Stacked bar graphs showing the percentages of the common-site associated genes among genes consistently negatively (top 4, blue) and positively (fifth to eighth from the top, pink) correlated with the Blimp1 level in the indicated number of developmental processes. Genes that showed inconsistent correlation with Blimp1 (i.e. negatively correlated in one cell type and positively in another; ninth from the top, orange), those not correlated with Blimp1 in all four processes (10th from the top, dark gray) and all expressed genes (bottom, yellow) are also shown. The numbers of genes are overlapped on the graphs. The asterisks indicate statistical significance by Chi-squared test (P < 0.001).

[Figure 8, panels A–G. (A) Scatterplots of BLIMP1 occupancy levels, log2(IP/input), in d2 PGCLCs versus E12.5 male (E12.5_PGCM) and female (E12.5_PGCF) PGCs; 348 and 177 sites are highlighted. (B) Percentage of BLIMP1-binding sites in d2 PGCLCs by class (specific, shared, common). (C) BLIMP1 occupancy level of common sites with and without GGGAAA repeats. (D) Genes associated with BLIMP1-binding sites: d2 PGCLC (1579) and late PGC (1990), comprising 376 early-PGC specific, 1203 common and 787 late-PGC specific genes. (E) Percentage of genes by binding-site class. (F) Genes up-regulated in BLIMP1 cKO PGCs. (G) Summary schematic.]

Figure 8. Transition of BLIMP1-binding sites in the late PGCs. (A) Scatterplots of BLIMP1-occupancy levels between d2 PGCLCs and E12.5 male and female PGCs. Binding sites in d2 PGCLCs showing higher and lower occupancy levels in late PGCs (>2-fold) are colored red and blue, respectively. (B) Stacked bar graphs showing the binding-site classes in d2 PGCLCs for the sites defined in (A). (C) Boxplots of the log2 occupancy levels of GGGAAA repeats. (D) Venn diagrams showing overlap of the BLIMP1-associated genes in d2 PGCLCs and late PGCs. Early- and late-PGC specific genes, and common PGC genes are indicated. (E) Stacked bar graphs showing the percentages of genes bearing specific sites (blue) among the early-PGC specific genes (leftmost) and all genes associated with BLIMP1-binding sites in d2 PGCLCs (middle). Note that early-PGC specific genes are included in genes associated with BLIMP1-binding sites in d2 PGCLCs. The percentages of genes bearing BLIMP1-binding sites detected in the late PGCs and not detected in the other lineages (PRP, emIE or PBs) (pink, rightmost) are also shown. (F) Pie chart showing the percentages of BLIMP1-associated genes among the upregulated genes in Blimp1 cKO. The classes of the genes are color-coded as indicated. (G) Summary of the genomic binding patterns of BLIMP1 and gene regulations.
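The >2-fold criterion used in panel (A) can be illustrated with a minimal sketch. Because occupancy is expressed as log2(IP/input), a 2-fold change corresponds to a difference of 1 between the two log2 values. The function name, labels and example values below are hypothetical and are not taken from the paper's data or analysis code.

```python
import math

# Sketch only: names and example values are hypothetical, not the paper's data.


def classify_site(log2_d2, log2_late, fold=2.0):
    """Label a binding site by its occupancy change from d2 PGCLCs to late PGCs.

    Inputs are log2(IP/input) values, so a `fold`-fold change corresponds to
    a difference of log2(fold) between the two values.
    """
    threshold = math.log2(fold)
    delta = log2_late - log2_d2
    if delta > threshold:
        return "higher in late PGCs"
    if delta < -threshold:
        return "lower in late PGCs"
    return "unchanged"


# Hypothetical occupancy values: (d2 PGCLC, late PGC), both as log2(IP/input).
example_sites = {
    "site_a": (2.0, 4.5),
    "site_b": (5.0, 3.5),
    "site_c": (3.0, 3.4),
}

labels = {name: classify_site(d2, late) for name, (d2, late) in example_sites.items()}
for name, label in labels.items():
    print(name, label)
```

Applying the threshold to the log2 difference rather than to raw ratios treats increases and decreases symmetrically.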

DISCUSSION

Using the single epitope tag, antibody and ChIP-seq protocol, we have performed a systematic comparison of the genome-wide binding profiles of BLIMP1 across four distinct developing cell types and investigated the regulatory logic of this highly versatile transcriptional regulator. Our key findings included that the specific BLIMP1-binding sites, which are divergent in number among the different cell types, locate more distally to the TSSs, show less stringent recognition sequences and lower BLIMP1 occupancy levels, and contribute more effectively to the regulation of downstream genes, than the shared/common binding sites, which locate more proximally to the TSSs, show stringent recognition sequences and exhibit higher BLIMP1 occupancy levels (Figure 8G). We assume that our estimation of the contribution of the specific binding sites to gene regulation represents a lower limit, since we only included the binding sites <15 Kb from the TSSs as those associated with the genes for simplicity of the assignment of the bindings to the genes, yet a majority of the specific binding sites locate >15 Kb from the TSSs (Figure 4A–C). For example, there were two PRP-specific sites at 22.8 and 17.7 Kb upstream of Vsx2 (Supplementary Figure S4E), a critical regulator for the bipolar cell differentiation repressed by BLIMP1 (17,18), and the latter site was reported to be a bipolar-cell enhancer (50), which may thus be a target for repression by BLIMP1.

Previous experiments primarily using cultured cell lines have highlighted that BLIMP1 represses gene expression by recruiting various co-repressors around the BLIMP1-binding sites proximal to the TSSs (7–10). Consistent with such previous findings, we did identify an enrichment of the BLIMP1-binding sites, particularly the shared/common sites, around the TSSs (<1 Kb) (Figure 4A–C). On the other hand, based on our aforementioned findings at the genome-wide level and involving four distinct lineages, the extent to which the shared/common sites contribute to gene regulation during developmental processes in vivo may not necessarily be significant, which should be verified experimentally using methodologies such as mutating the binding sequences by a genome-editing technology, etc. In this regard, it is also interesting to note that the shared BLIMP1-binding sites in d2 PGCLCs, which exhibited a significant association with the regulation of downstream genes, were also distributed more distally than those of the other lineages (Figures 4A–C and 6C–E). Collectively, these findings therefore suggest that, in general, BLIMP1 controls gene expression through distally located, cell type-specific binding sites.

In good agreement with the previous notion that BLIMP1 functions primarily as a transcriptional repressor (3–11), specific BLIMP1 bindings exhibited a significant correlation with the repression of associated genes in all four cell types analyzed (both in wild-type and knockout analyses) (Figure 6C–E). On the other hand, in line with the demonstration that BLIMP1 also functions as a transcriptional activator (15), specific BLIMP1 bindings exhibited a significant correlation with the activation of associated genes in perinatal IE development (in wild-type analyses) and in PB and PGC development (in knockout analyses) (Figure 6C and E). We found that the presence of the lineage-related TF motifs in the BLIMP1-binding sites had only a limited impact on the regulation of the associated genes, suggesting that BLIMP1 may exhibit specific functions without intimately cooperating with other TFs. If so, this would stand in contrast to proposed mechanisms of enhancer activation, in which specific TF combinations target cell type-specific enhancers depending on individual recognition sequences located in juxtaposition (∼100 bp) (51), or TFs operate collectively through bindings to recognition sequences with flexible orientation or spacing (52) [reviewed in (1,2)], although the motif occurrence patterns in the BLIMP1-binding sites were compatible with these models.

Understanding the mechanisms for gene regulation by BLIMP1 via the specific binding sites with the flexible motif recognition may require extensive investigations of the epigenome, such as analysis of histone modifications. Published data for the distribution of histone H3 lysine 27 trimethylation (H3K27me3) [terminal B cell differentiation (15) and PGC specification in vitro (13)] suggested that a substantial elevation of H3K27me3 levels after BLIMP1-bindings to the specific sites may characterize gene repression by BLIMP1, and when genes were activated by BLIMP1 or were expressed stably, the associated binding sites did not show the recruitment of H3K27me3 (Supplementary Figure S8A).

Along with the accumulation of datasets for the binding of TFs across the genome, it has now become well-recognized that the number of TF-binding sites across the genome is much higher than the number of genes that appear to be controlled directly by the relevant TFs [reviewed in (1,53)]. Mutational experiments have shown that the loss of one such element often has only a modest effect on the expression of the associated genes in a cellular context (54–57), suggesting that the TF-binding sites may function in an additive manner or apparently ‘spurious’ binding sites may play critical roles only upon environmental stresses or mutations in other binding sites [reviewed in (1,53)]. Even in this regard, it has been surprising to find that the common BLIMP1-binding sites, which locate more proximally to the TSSs, show stringent recognition sequences and exhibit higher BLIMP1 occupancy levels, do not essentially influence the expression of associated genes (Figure 6C–E). This might suggest the possibility that the common/shared BLIMP1-binding sites serve as a reservoir of the nuclear BLIMP1 protein, from which BLIMP1 is recruited to specific sites to regulate gene expression. A new sequence motif, a GGGAAA repeat, which was very strongly occupied by BLIMP1, might also play such a role. We note that the specific sites associated with gene regulation by BLIMP1 tend to coexist with the common sites within topologically associated domains (58) (Supplementary Figure S8B), which might support this argument.

In conclusion, our datasets on the correlation of specific BLIMP1 bindings with gene repression/activation in four distinct lineages should serve as a valuable resource for exploring the specific mechanisms underlying BLIMP1-mediated control of gene repression/activation through its distal binding sites. Such studies should be facilitated with analyses on relevant epigenetic features around the BLIMP1-binding sites, including open chromatin states and histone modifications, as well as 3D genome structures.

DATA AVAILABILITY

The ChIP-seq and RNA-seq data of EGFP-BLIMP1 were deposited to the Gene Expression Omnibus (GEO) (GSE91041).

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank the members of our laboratory for their helpful input on this study, and R. Kabata, N. Konishi, Y. Sakaguchi, Y. Nagai, M. Kawasaki, T. Sato and M. Kabata for
their technical assistance. We thank K. Hayashi, K. Haniuda and D. Kitamura for their advice on plasmablast induction. We used experimental instruments at the Medical Research Support Center, Graduate School of Medicine, Kyoto University.

FUNDING

Japan Society for the Promotion of Science (JSPS) KAKENHI Grants (JP16H04720, JP16H01216 to K.K., JP25.5081 to T.M., in part); Japan Science and Technology Agency-Exploratory Research for Advanced Technology (JST-ERATO) Grant (JPMJER1104 to M.S.). Funding for open access charge: JSPS KAKENHI Grant (JP16H04720).

Conflict of interest statement. None declared.

REFERENCES

1. Heinz,S., Romanoski,C.E., Benner,C. and Glass,C.K. (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.
2. Long,H.K., Prescott,S.L. and Wysocka,J. (2016) Ever-changing landscapes: transcriptional enhancers in development and evolution. Cell, 167, 1170–1187.
3. Keller,A.D. and Maniatis,T. (1991) Identification and characterization of a novel repressor of beta-interferon gene expression. Genes Dev., 5, 868–879.
4. Turner,C.A. Jr, Mack,D.H. and Davis,M.M. (1994) Blimp-1, a novel zinc finger-containing protein that can drive the maturation of B lymphocytes into immunoglobulin-secreting cells. Cell, 77, 297–306.
5. Lin,Y., Wong,K. and Calame,K. (1997) Repression of c-myc transcription by Blimp-1, an inducer of terminal B cell differentiation. Science, 276, 596–599.
6. Kuo,T.C. and Calame,K.L. (2004) B lymphocyte-induced maturation protein (Blimp)-1, IFN regulatory factor (IRF)-1, and IRF-2 can bind to the same regulatory sites. J. Immunol., 173, 5556–5563.
7. Ren,B., Chee,K.J., Kim,T.H. and Maniatis,T. (1999) PRDI-BF1/Blimp-1 repression is mediated by corepressors of the Groucho family of proteins. Genes Dev., 13, 125–137.
8. Yu,J., Angelin-Duclos,C., Greenwood,J., Liao,J. and Calame,K. (2000) Transcriptional repression by blimp-1 (PRDI-BF1) involves recruitment of histone deacetylase. Mol. Cell. Biol., 20, 2592–2603.
9. Gyory,I., Wu,J., Fejer,G., Seto,E. and Wright,K.L. (2004) PRDI-BF1 recruits the histone H3 methyltransferase G9a in transcriptional silencing. Nat. Immunol., 5, 299–308.
10. Su,S.T., Ying,H.Y., Chiu,Y.K., Lin,F.R., Chen,M.Y. and Lin,K.I. (2009) Involvement of histone demethylase LSD1 in Blimp-1-mediated gene repression during plasma cell differentiation. Mol. Cell. Biol., 29, 1421–1431.
11. Doody,G.M., Care,M.A., Burgoyne,N.J., Bradford,J.R., Bota,M., Bonifer,C., Westhead,D.R. and Tooze,R.M. (2010) An extended set of PRDM1/BLIMP1 target genes links binding motif type to dynamic repression. Nucleic Acids Res., 38, 5336–5350.
12. Hohenauer,T. and Moore,A.W. (2012) The Prdm family: expanding roles in stem cells and development. Development, 139, 2267–2282.
13. Kurimoto,K., Yabuta,Y., Hayashi,K., Ohta,H., Kiyonari,H., Mitani,T., Moritoki,Y., Kohri,K., Kimura,H., Yamamoto,T. et al. (2015) Quantitative dynamics of chromatin remodeling during germ cell specification from mouse embryonic stem cells. Cell Stem Cell, 16, 517–532.
14. Mould,A.W., Morgan,M.A., Nelson,A.C., Bikoff,E.K. and Robertson,E.J. (2015) Blimp1/Prdm1 functions in opposition to Irf1 to maintain neonatal tolerance during postnatal intestinal maturation. PLoS Genet., 11, e1005375.
15. Minnich,M., Tagoh,H., Bonelt,P., Axelsson,E., Fischer,M., Cebolla,B., Tarakhovsky,A., Nutt,S.L., Jaritz,M. and Busslinger,M. (2016) Multifunctional role of the transcription factor Blimp-1 in coordinating plasma cell differentiation. Nat. Immunol., 17, 331–343.
16. Mackay,L.K., Minnich,M., Kragten,N.A., Liao,Y., Nota,B., Seillet,C., Zaid,A., Man,K., Preston,S., Freestone,D. et al. (2016) Hobit and Blimp1 instruct a universal transcriptional program of tissue residency in lymphocytes. Science, 352, 459–463.
17. Brzezinski,J.A.t., Lamba,D.A. and Reh,T.A. (2010) Blimp1 controls photoreceptor versus bipolar cell fate choice during retinal development. Development, 137, 619–629.
18. Katoh,K., Omori,Y., Onishi,A., Sato,S., Kondo,M. and Furukawa,T. (2010) Blimp1 suppresses Chx10 expression in differentiating retinal photoreceptor precursors to ensure proper photoreceptor development. J. Neurosci., 30, 6515–6526.
19. Harper,J., Mould,A., Andrews,R.M., Bikoff,E.K. and Robertson,E.J. (2011) The transcriptional repressor Blimp1/Prdm1 regulates postnatal reprogramming of intestinal enterocytes. Proc. Natl. Acad. Sci. U.S.A., 108, 10585–10590.
20. Muncan,V., Heijmans,J., Krasinski,S.D., Buller,N.V., Wildenberg,M.E., Meisner,S., Radonjic,M., Stapleton,K.A., Lamers,W.H., Biemond,I. et al. (2011) Blimp1 regulates the transition of neonatal to adult intestinal epithelium. Nat. Commun., 2, 452.
21. Shaffer,A.L., Lin,K.I., Kuo,T.C., Yu,X., Hurt,E.M., Rosenwald,A., Giltnane,J.M., Yang,L., Zhao,H., Calame,K. et al. (2002) Blimp-1 orchestrates plasma cell differentiation by extinguishing the mature B cell gene expression program. Immunity, 17, 51–62.
22. Hayashi,K., Ohta,H., Kurimoto,K., Aramaki,S. and Saitou,M. (2011) Reconstitution of the mouse germ cell specification pathway in culture by pluripotent stem cells. Cell, 146, 519–532.
23. Yamashiro,C., Hirota,T., Kurimoto,K., Nakamura,T., Yabuta,Y., Nagaoka,S.I., Ohta,H., Yamamoto,T. and Saitou,M. (2016) Persistent requirement and alteration of the key targets of PRDM1 during primordial germ cell development in mice. Biol. Reprod., 94, 7.
24. Furukawa,A., Koike,C., Lippincott,P., Cepko,C.L. and Furukawa,T. (2002) The mouse Crx 5’-upstream transgene sequence directs cell-specific and developmentally regulated expression in retinal photoreceptor cells. J. Neurosci., 22, 1640–1647.
25. Gracz,A.D., Puthoff,B.J. and Magness,S.T. (2012) Identification, isolation, and culture of intestinal epithelial stem cells from murine intestine. Methods Mol. Biol., 879, 89–107.
26. Hayashi,K., Nittono,R., Okamoto,N., Tsuji,S., Hara,Y., Goitsuka,R. and Kitamura,D. (2000) The B cell-restricted adaptor BASH is required for normal development and antigen receptor-mediated activation of B cells. Proc. Natl. Acad. Sci. U.S.A., 97, 2755–2760.
27. Nojima,T., Haniuda,K., Moutai,T., Matsudaira,M., Mizokawa,S., Shiratori,I., Azuma,T. and Kitamura,D. (2011) In-vitro derived germinal centre B cells differentially generate memory B or plasma cells in vivo. Nat. Commun., 2, 465.
28. Kagiwada,S., Kurimoto,K., Hirota,T., Yamaji,M. and Saitou,M. (2013) Replication-coupled passive DNA demethylation for the erasure of genome imprints in mice. EMBO J., 32, 340–353.
29. Nakamura,T., Yabuta,Y., Okamoto,I., Aramaki,S., Yokobayashi,S., Kurimoto,K., Sekiguchi,K., Nakagawa,M., Yamamoto,T. and Saitou,M. (2015) SC3-seq: a method for highly parallel and quantitative measurement of single-cell gene expression. Nucleic Acids Res., 43, e60.
30. Nakamura,T., Okamoto,I., Sasaki,K., Yabuta,Y., Iwatani,C., Tsuchiya,H., Seita,Y., Nakamura,S., Yamamoto,T. and Saitou,M. (2016) A developmental coordinate of pluripotency among mice, monkeys and humans. Nature, 537, 57–62.
31. Luo,R.X., Postigo,A.A. and Dean,D.C. (1998) Rb interacts with histone deacetylase to repress transcription. Cell, 92, 463–473.
32. Landt,S.G., Marinov,G.K., Kundaje,A., Kheradpour,P., Pauli,F., Batzoglou,S., Bernstein,B.E., Bickel,P., Brown,J.B., Cayting,P. et al. (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res., 22, 1813–1831.
33. Caporaso,J.G., Lauber,C.L., Walters,W.A., Berg-Lyons,D., Lozupone,C.A., Turnbaugh,P.J., Fierer,N. and Knight,R. (2011) Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl. Acad. Sci. U.S.A., 108(Suppl. 1), 4516–4522.
34. Robinson,M.D., McCarthy,D.J. and Smyth,G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139–140.
35. Huang da,W., Sherman,B.T. and Lempicki,R.A. (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc., 4, 44–57.
36. Machanick,P. and Bailey,T.L. (2011) MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics, 27, 1696–1697.

37. Stormo,G.D., Schneider,T.D., Gold,L. and Ehrenfeucht,A. (1982) Use of the ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res., 10, 2997–3011.
38. Chang,D.H., Cattoretti,G. and Calame,K.L. (2002) The dynamic expression pattern of B lymphocyte induced maturation protein-1 (Blimp-1) during mouse embryonic development. Mech. Dev., 117, 305–309.
39. Robertson,E.J., Charatsi,I., Joyner,C.J., Koonce,C.H., Morgan,M., Islam,A., Paterson,C., Lejsek,E., Arnold,S.J., Kallies,A. et al. (2007) Blimp1 regulates development of the posterior forelimb, caudal pharyngeal arches, heart and sensory vibrissae in mice. Development, 134, 4335–4345.
40. Ohinata,Y., Sano,M., Shigeta,M., Yamanaka,K. and Saitou,M. (2008) A comprehensive, non-invasive visualization of primordial germ cell development in mice by the Prdm1-mVenus and Dppa3-ECFP double transgenic reporter. Reproduction, 136, 503–514.
41. Yeung,K.Y. and Ruzzo,W.L. (2001) Principal component analysis for clustering gene expression data. Bioinformatics, 17, 763–774.
42. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
43. Kurimoto,K., Yabuta,Y., Ohinata,Y., Shigeta,M., Yamanaka,K. and Saitou,M. (2008) Complex genome-wide transcription dynamics orchestrated by Blimp1 for the specification of the germ cell lineage in mice. Genes Dev., 22, 1617–1635.
44. Yang,L., Zhou,T., Dror,I., Mathelier,A., Wasserman,W.W., Gordan,R. and Rohs,R. (2014) TFBSshape: a motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res., 42, D148–D155.
45. Jolma,A., Yin,Y., Nitta,K.R., Dave,K., Popov,A., Taipale,M., Enge,M., Kivioja,T., Morgunova,E. and Taipale,J. (2015) DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature, 527, 384–388.
46. Sasaki,K., Yokobayashi,S., Nakamura,T., Okamoto,I., Yabuta,Y., Kurimoto,K., Ohta,H., Moritoki,Y., Iwatani,C., Tsuchiya,H. et al. (2015) Robust in vitro induction of human germ cell fate from pluripotent stem cells. Cell Stem Cell, 17, 178–194.
47. Ohta,H., Kurimoto,K., Okamoto,I., Nakamura,T., Yabuta,Y., Miyauchi,H., Yamamoto,T., Okuno,Y., Hagiwara,M., Shirane,K. et al. (2017) In vitro expansion of mouse primordial germ cell-like cells recapitulates an epigenetic blank slate. EMBO J., 36, 1888–1907.
48. Ohinata,Y., Payer,B., O’Carroll,D., Ancelin,K., Ono,Y., Sano,M., Barton,S.C., Obukhanych,T., Nussenzweig,M., Tarakhovsky,A. et al. (2005) Blimp1 is a critical determinant of the germ cell lineage in mice. Nature, 436, 207–213.
49. Vincent,S.D., Dunn,N.R., Sciammas,R., Shapiro-Shalef,M., Davis,M.M., Calame,K., Bikoff,E.K. and Robertson,E.J. (2005) The zinc finger transcriptional repressor Blimp1/Prdm1 is dispensable for early axis formation but is required for specification of primordial germ cells in the mouse. Development, 132, 1315–1325.
50. Kim,D.S., Matsuda,T. and Cepko,C.L. (2008) A core paired-type and POU homeodomain-containing transcription factor program drives retinal bipolar cell gene expression. J. Neurosci., 28, 7748–7764.
51. Kazemian,M., Pham,H., Wolfe,S.A., Brodsky,M.H. and Sinha,S. (2013) Widespread evidence of cooperative DNA binding by transcription factors in Drosophila development. Nucleic Acids Res., 41, 8237–8252.
52. Junion,G., Spivakov,M., Girardot,C., Braun,M., Gustafson,E.H., Birney,E. and Furlong,E.E. (2012) A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell, 148, 473–486.
53. Spivakov,M. (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.
54. Spivakov,M., Akhtar,J., Kheradpour,P., Beal,K., Girardot,C., Koscielny,G., Herrero,J., Kellis,M., Furlong,E.E. and Birney,E. (2012) Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol., 13, R49.
55. Korkmaz,G., Lopes,R., Ugalde,A.P., Nevedomskaya,E., Han,R., Myacheva,K., Zwart,W., Elkon,R. and Agami,R. (2016) Functional genetic screens for enhancer elements in the human genome using CRISPR-Cas9. Nat. Biotechnol., 34, 192–198.
56. Sanjana,N.E., Wright,J., Zheng,K., Shalem,O., Fontanillas,P., Joung,J., Cheng,C., Regev,A. and Zhang,F. (2016) High-resolution interrogation of functional elements in the noncoding genome. Science, 353, 1545–1549.
57. Ulirsch,J.C., Nandakumar,S.K., Wang,L., Giani,F.C., Zhang,X., Rogov,P., Melnikov,A., McDonel,P., Do,R., Mikkelsen,T.S. et al. (2016) Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell, 165, 1530–1545.
58. Dixon,J.R., Selvaraj,S., Yue,F., Kim,A., Li,Y., Shen,Y., Hu,M., Liu,J.S. and Ren,B. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376–380.