A high-resolution landscape of mutations in the BCL6 super-enhancer in normal human B cells

Jiang-Cheng Shen^a, Ashwini S. Kamath-Loeb^a, Brendan F. Kohrn^a, Keith R. Loeb^a,b, Bradley D. Preston^a, and Lawrence A. Loeb^a,c,1

^aDepartment of Pathology, University of Washington, Seattle, WA 98195; ^bClinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109; and ^cDepartment of Biochemistry, University of Washington, Seattle, WA 98195

MEDICAL SCIENCES

Edited by James E. Cleaver, University of California San Francisco Medical Center, San Francisco, CA, and approved October 28, 2019 (received for review August 14, 2019)

The super-enhancers (SEs) of lineage-specific genes in B cells are off-target sites of somatic hypermutation. However, the inability to detect sufficient numbers of mutations in normal human B cells has precluded the generation of a high-resolution mutational landscape of SEs. Here we captured and sequenced 12 SEs at single-nucleotide resolution from 10 healthy individuals across diverse ethnicities. We detected a total of approximately 9,000 subclonal mutations (allele frequencies <0.1%); of these, approximately 8,000 are present in the BCL6 SE alone. Within the BCL6 SE, we identified 3 regions of clustered mutations in which the mutation frequency is ∼7 × 10^−4. Mutational spectra show a predominance of C > T/G > A and A > G/T > C substitutions, consistent with the activities of activation-induced cytidine deaminase (AID) and the A-T mutator, DNA polymerase η, respectively, in mutagenesis in normal B cells. Analyses of mutational signatures further corroborate the participation of these factors in this process. Single base substitution signatures SBS85, SBS37, and SBS39 were found in the BCL6 SE. While SBS85 is a designated signature of AID in lymphoid cells, the etiologies of SBS37 and SBS39 are unknown. Our analysis suggests the contribution of error-prone DNA polymerases to the latter signatures. The high-resolution mutation landscape has enabled accurate profiling of subclonal mutations in B cell SEs in normal individuals. Because subclonal SE mutations are clonally expanded in B cell lymphomas, our studies also offer the potential for early detection of neoplastic alterations.

duplex sequencing | somatic hypermutation | BCL6 | super-enhancer | mutational signatures

Significance

We used duplex sequencing to detect low-frequency mutations in the BCL6 super-enhancer locus in normal human B cells. The landscape of preexisting mutations is remarkably conserved across different ethnicities and reveals clustered mutational hotspots that correlate with reported sites of clonal mutations and translocation breakpoints in human B cell lymphomas. This high-resolution genomic landscape revealed by duplex sequencing offers accurate and thorough profiling of low-frequency preexisting mutations in normal individuals along with the potential for early detection of neoplastic alterations.

Author contributions: J.-C.S., B.D.P., and L.A.L. designed research; J.-C.S. and A.S.K.-L. performed research; B.F.K. and K.R.L. contributed new reagents/analytic tools; J.-C.S., A.S.K.-L., B.F.K., K.R.L., B.D.P., and L.A.L. analyzed data; and J.-C.S., A.S.K.-L., and L.A.L. wrote the paper.
Competing interest statement: L.A.L. is a founder and equity holder at TwinStrand Biosciences.
This article is a PNAS Direct Submission.
Published under the PNAS license.
Data deposition: The data in this paper have been uploaded to NCBI BioProject (accession no. PRJNA574179).
1To whom correspondence may be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1914163116/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1914163116

In response to antigen stimulation, naïve B cells congregate in germinal centers, where they undergo multiple rounds of cell division and antigenic selection, ultimately maturing into antibody-producing plasma cells and memory B cells (1–3). In the germinal center, somatic hypermutation (SHM) occurs, targeting the variable domain of the Ig genes and resulting in nonfunctional, low-affinity, or high-affinity antibodies. This process involves the creation of multiple single-base substitutions in Ig genes by the mutagenic SHM machinery (4). SHM includes 2 stages of mutagenesis: cytidine deamination by activation-induced cytidine deaminase (AID) to generate a G:U mispair, and subsequent repair synthesis by an error-prone DNA polymerase.

Although SHM is a tightly regulated process, it can act aberrantly, even in normal cells, to somatically mutate other (off-target) sites. Aberrant SHM (aSHM) sites are frequently found in noncoding DNA regions containing cis-regulatory elements, such as intragenic super-enhancers (SEs), which are binding sites for multiple factors that increase transcription (5–7). These regions are proximal to the transcription start site (TSS) of actively transcribed genes and are in an “open” chromatin conformation (p300- and H3K27ac-positive).

aSHM targets were delineated by low-resolution single-cell-based polymerase chain reaction (PCR) sequencing of sites with clustered mutations (8). The first of these targets to be identified was the promoter/SE region of the BCL6 gene (8, 9). The BCL6 gene encodes a transcriptional repressor, whose expression is tightly coordinated with entry and exit from germinal centers (10, 11). Germinal center B cells have high levels of the BCL6 protein, which regulates the expression of many genes involved in B cell differentiation. On the other hand, antibody-secreting plasma cells and memory B cells that exit the germinal center turn off BCL6 expression to facilitate the switch between differentiation and stable cell state maintenance. It has been hypothesized that mutations within the BCL6 SE dysregulate BCL6 expression to promote lymphomagenesis (12, 13). Indeed, Sanger sequencing of PCR amplicons from diffuse large B cell lymphomas has identified clonal mutations in the BCL6 SE (12). Furthermore, reporter assays in cultured cells showed that some of these mutations interfere with the binding of transcriptional repressor factors to up-regulate BCL6 expression (14). More recently, next-generation whole-genome sequencing studies of diffuse large B cell lymphomas reported the presence of mutation clusters within the BCL6 SE locus (7); however, only approximately 30 clonal mutations were observed in a total of 10 samples, an inadequate number for characterizing the types and distributions of mutations.

A high-resolution landscape of SE mutations in B cells from healthy individuals will aid in unravelling the process of aSHM and enable early detection of preexisting SE mutations that might signal lymphomagenesis. We reasoned that DNA mutated by aSHM is preserved in circulating memory B cells such that deep sequencing of purified B cells would reveal details of the mutational landscape. We used duplex sequencing (DS) (15, 16) for this purpose. DS provides a mutational landscape of genomic DNA at single-nucleotide resolution to reveal mutational patterns and potential underlying mechanisms. The accuracy of DS stems from copying both strands of single DNA molecules; mutations are defined based on complementarity and presence in both strands at the same position. As a result, DS is approximately 1,000-fold more accurate than routine next-generation sequencing and allows the identification of rare mutations, those present at frequencies as low as 1 base substitution in 10^7 nucleotides sequenced.

Using targeted capture, we purified and sequenced 12 SE loci in genomic DNA samples of human CD19+ B cells isolated from 10 healthy individuals of diverse ethnic backgrounds. The mutational landscape shows clustered mutations, most elevated in the SE of BCL6 and to a lesser extent in the SEs of PAX5 and CD83. A total of ∼9,000 base substitutions were detected in the 10 samples; ∼8,000 of these were found in the BCL6 SE. The mutational profiles of the BCL6 SE are remarkably similar in individuals across various ethnic groups, suggesting that they are not the result of stochastic events but instead may function in normal developmental processes. Mutational spectra and signatures at the BCL6 SE locus further suggest that mutations result from the activities of AID and of the error-prone DNA polymerase, Pol η.

Results

High Frequency of Mutations in the BCL6 SE. We sequenced 12 aSHM targets (SI Appendix, Table S1) in CD19+ B cells from 10 healthy blood donors of different ethnicities (SI Appendix, Table S2). The target gene segments are located ∼1 kb 3′ of the TSS and are reported SEs of the associated genes (5–7). As a control, we also captured and sequenced an SHM target locus frequently found in memory B cells: the variable region of the Ig gene, IgHV3-23 (17, 18). Of the 12 sequenced SE loci, we found that BCL6 is the most frequently mutated. We observed more than 8,000 subclonal mutations (allele fraction <0.5%) in 10 individuals, to yield an average mutation frequency (MF; mutations/total nucleotides sequenced) of 2.2 × 10^−4 (Fig. 1 and SI Appendix, Table S3), comparable to that observed at the SHM target, IgHV3-23 (MF 1.3 × 10^−4; Fig. 1 and SI Appendix, Table S3). Mutant allele fractions (MAFs; number of mutations at each nucleotide position/total alleles sequenced at each nucleotide position) ranging from ∼0.002% to ∼0.5% were detected in the BCL6 SE (Fig. 2 A and B). Since circulating memory B cells in peripheral blood comprise between 5% and 20% of the B cell population (19), we estimate that there are between 0.8 and 3.2 mutations within the 1-kb BCL6 SE region in each circulating memory B cell, assuming that other circulating B cells do not contain frequent mutations in BCL6.

Fig. 1. Mutation frequencies in various TSS-SE gene loci and in the Ig gene IgHV3-23 in B cells from 10 healthy individuals. Twelve aSHM targets as well as the Ig gene IgHV3-23 were captured from CD19+ B cells purified from 10 independent blood donors and sequenced by DS. The mutation frequency (the total number of mutations divided by the total number of nucleotides sequenced) was calculated for each sample and plotted as shown. Note that the mutation frequency at the BCL6 SE (2.2 × 10^−4) is as high as that of IgHV3-23 (1.3 × 10^−4). Data are presented as mean ± SEM (n = 10). SEs with mutation frequencies >1.0 × 10^−4, between 2.5 × 10^−6 and 1.0 × 10^−4, and <2.5 × 10^−6 are highlighted in red, blue, and orange, respectively.

The SEs of PAX5, POU2AF, and CD83 also showed high mutation frequencies; however, they are more than one order of magnitude lower than that of the BCL6 SE. The mutation frequencies are 9.7 × 10^−6, 9.4 × 10^−6, and 6.9 × 10^−6, respectively (Fig. 1 and SI Appendix, Table S3). The SE of H2AFX was the least mutated, with an average mutation frequency of 2.8 × 10^−7, close to background detection levels (Fig. 1 and SI Appendix, Table S3). The mutation frequencies of the other sequenced SEs ranged between those of BCL6 and H2AFX SEs (Fig. 1 and SI Appendix, Table S3). Since large numbers of mutations lend confidence to data interpretation, we focused our studies on mutational analyses of the BCL6 SE locus.

Mutations in the BCL6 SE Locus Show a Remarkably Similar Pattern in All Individuals. High-resolution duplex DNA sequencing enabled us to delineate the landscape of subclonal mutations in the BCL6 SE locus (Fig. 2 A and B). We found that close to 90% of the ∼8,000 mutations are located in a 600-bp region that corresponds to a tentative “open” chromatin region enriched for histone markers H3K4me3 and H3K27Ac in germinal center B cells (20, 21). The density of mutations is extensive, with as many as 3 mutations present in a single read (SI Appendix, Fig. S1). Within this 600-bp region, we observed 2 major mutational clusters, each spanning ∼50 nt (Fig. 2A and SI Appendix, Fig. S2). The average mutation frequencies for clusters 1 and 2 are 7.2 × 10^−4 and 7.7 × 10^−4, respectively. These are in contrast to a frequency of 3.5 × 10^−6 in a 50-bp region downstream from the hotspot clusters (Fig. 2A). The mutation load at the 2 hotspots suggests that 20% to 80% of memory B cells have at least 1 mutation within 1 of these clusters. Notably, both the location of these mutation clusters and the mutation frequencies are remarkably conserved in healthy individuals across different ethnicities (Fig. 2B and SI Appendix, Fig. S3 and Table S4). The average mutation frequency of the BCL6 SE is 1.5 × 10^−4 in Caucasians, 2.0 × 10^−4 in African Americans, 4.2 × 10^−4 in Asians, and 2.0 × 10^−4 in Hispanics (SI Appendix, Fig. S3 and Table S4), and the average across all individuals is 2.2 × 10^−4 (Fig. 1 and SI Appendix, Table S3). There is no significant difference among the 4 ethnic groups (P = 0.26) (SI Appendix, Fig. S3).

We also identified a third cluster of mutations in the TAA/WA sequences, with an average mutation frequency of 5.4 × 10^−4 (Fig. 2B). Although mutations here are present at high density, they are at lower frequencies than those in clusters 1 and 2. Clustered mutations were also detected in the PAX5 and CD83 SEs (SI Appendix, Fig. S4 A and B, respectively), suggesting that the SEs of B cell lineage-specific genes are targeted by a similar mutagenic mechanism.

To confirm that these clustered mutations are specific to the memory B cell population, we purified and sequenced different cells (memory B cells, naïve B cells, monocytes, and T lymphocytes, as well as cultured primary fibroblasts, transformed embryonic kidney cells [293T], and the colon cancer cell line SW480) using the same SE capture probe set. We found that among all cell types, only memory B lymphocytes have an increased frequency of mutations at the BCL6 SE locus (Table 1).

Fig. 2. Landscape of aSHM at the BCL6 SE locus. (A) Average MAFs (number of mutations at a given nucleotide position/sequence read depth at that position) at each nucleotide position within the 1,001-bp region of the BCL6 SE. Average MAFs in the two 50-bp hotspot clusters and in a 50-bp cold spot are indicated. (B) Mutational profiles of the BCL6 SE locus are conserved in individuals of different ethnicities. Most of the mutations are present in the 600-bp segment at the 5′-end of the genomic DNA sequence, that is, between positions +400 and +1,000 from the TSS. Three mutational clusters are apparent. Whereas mutations in clusters 1 and 2 have higher mutant allele fractions, those in the TAA/WA sequence have lower fractions but higher density.
[Figure 2 plots: mutant allele fraction versus nucleotide position (Chr3: 187462513 to 187463513); panel B shows donors ID1 to ID10 and their average.]

Table 1. Mutation frequencies of BCL6 SE in various cell types

Sample                      BCL6 SE mutation frequency
Memory B cells              1.8 × 10^−4
Naïve B cells               8.5 × 10^−6
Monocytes                   <3.7 × 10^−7
T cells                     <3.6 × 10^−7
Fibroblasts (AG01440)       <2.8 × 10^−7
Fibroblasts (AG09860)       <3.4 × 10^−7
HEK 293T                    5.1 × 10^−6
Colon cancer (SW480)        1.4 × 10^−6

The BCL6 SE Locus Is Targeted by AID and Error-Prone DNA Polymerases. To gain insight into mutational processes operating at the BCL6 SE, we examined the spectrum of base substitutions. We observed that transition base substitutions occur more frequently than transversions (SI Appendix, Fig. S5A). A > G/T > C and C > T/G > A transitions account for 28% and 23% of all mutations, respectively, while C > G/G > C, A > C/T > G, A > T/T > A, and C > A/G > T transversions constitute 18%, 13%, 12%, and 4%, respectively, of the remainder. The frequency of transition base substitutions in the BCL6 SE is similar to that observed at the SHM target locus IgHV3-23, in which C > T/G > A and A > G/T > C transitions comprise 34% and 23%, respectively, of all mutations (SI Appendix, Fig. S5B). The C > T transitions in SHM are generated primarily by AID. AID is an initiator of SHM; it deaminates cytidine to uracil, which base pairs with adenine.

In addition to AID, error-prone DNA polymerases are also implicated in SHM. The leading candidate is Pol η. Of the known error-prone DNA polymerases, Pol η is believed to function in a noncanonical base excision repair or mismatch repair pathway following AID-catalyzed cytidine deamination (4). Pol η preferentially generates G:T or A:C mismatches at A:T base pairs during DNA synthesis to generate A > G/T > C transition mutations (22). We observed that almost every A/T site within the TAA/WA repeat region in the BCL6 SE shows a predominance of transition mutations (SI Appendix, Fig. S6).

Our mutation spectrum data suggest that, similar to the Ig genes, the BCL6 SE locus is a target of AID and Pol η, resulting in a high prevalence (>60%) of A > G and C > T transition mutations. Moreover, the C > G/G > C transversions that we observed could also result from the sequential activities of AID and error-prone DNA replication. The G:U mismatches generated by AID can be efficiently converted to G:apurinic/apyrimidinic (AP) nucleotide pairs by uracil DNA glycosylase (UDG). While replicative DNA polymerases insert A residues across unrepaired AP sites (23), error-prone polymerases such as Rev1 preferentially insert C residues to generate C > G/G > C transversions instead of C > T/G > A transitions (24, 25). Pol η may also play a role in AID-induced C > G transversions (24, 25), but the mechanism for this is unclear. Thus, the contributions of AID, Rev1, and Pol η to mutations (C > T, C > G, and A > G) within the BCL6 SE could be as high as ∼70% (SI Appendix, Fig. S5A).

Mutant Allele Fractions Reveal the Sequence of Mutagenesis at the BCL6 SE. Allele fractions of mutations can be used to unravel sequential mutagenic events in a cell population. It is well established that SHM of Ig genes involves multiple cycles of mutagenesis, mutation selection, and cell division to generate plasma and memory B cells expressing high-affinity antibodies (1–4). While clonal selection enriches variant Ig genes, mutations in other genes, such as BCL6, are likely unselected or much less selected in healthy individuals. Thus, mutations at the BCL6 locus preserve an unbiased history of mutagenic events that have transpired in memory B cells. For example, a mutagenic event that occurs during earlier rounds of maturation will be clonally expanded by more cell divisions and is thus likely to be more frequently detected in terminally selected memory B cells, while later mutagenic events will be present at a lower frequency due to fewer rounds of cell expansion. Based on this premise, we delineated the order of mutagenic events by parsing the observed mutations into 5 bins corresponding to the following MAFs: bin 1, >1 × 10^−3; bin 2, >5 × 10^−4 to 1 × 10^−3; bin 3, >1 × 10^−4 to 5 × 10^−4; bin 4, >5 × 10^−5 to 1 × 10^−4; and bin 5, >1.7 × 10^−5 to 5 × 10^−5. We then determined whether there were differences in the spectrum of base substitutions in different bins.
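As an illustration of the five-bin MAF scheme described above, the following is a minimal sketch and not the authors' pipeline. It assumes each mutation arrives as a hypothetical `(ref, alt, maf)` tuple; the function names `maf_bin` and `spectrum_by_bin` are invented for this example. Only the bin boundaries and the six paired substitution classes come from the text.

```python
from collections import Counter, defaultdict

# Bin boundaries quoted in the text: lower bound exclusive, upper bound inclusive.
BINS = [
    (1, 1e-3, float("inf")),
    (2, 5e-4, 1e-3),
    (3, 1e-4, 5e-4),
    (4, 5e-5, 1e-4),
    (5, 1.7e-5, 5e-5),
]

# Collapse each substitution and its complement into one of the 6 classes
# used in the text (e.g. C>T and G>A are reported together).
PAIRED = {
    "C>T": "C>T/G>A", "G>A": "C>T/G>A",
    "C>G": "C>G/G>C", "G>C": "C>G/G>C",
    "C>A": "C>A/G>T", "G>T": "C>A/G>T",
    "A>G": "A>G/T>C", "T>C": "A>G/T>C",
    "A>C": "A>C/T>G", "T>G": "A>C/T>G",
    "A>T": "A>T/T>A", "T>A": "A>T/T>A",
}


def maf_bin(maf):
    """Return the bin number (1 to 5) for a mutant allele fraction, or None."""
    for label, low, high in BINS:
        if low < maf <= high:
            return label
    return None  # below the lowest bin boundary


def spectrum_by_bin(mutations):
    """Fraction of each paired substitution class within each MAF bin.

    mutations: iterable of (ref, alt, maf) tuples (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for ref, alt, maf in mutations:
        label = maf_bin(maf)
        if label is not None:
            counts[label][PAIRED[f"{ref}>{alt}"]] += 1
    return {
        label: {cls: n / sum(c.values()) for cls, n in c.items()}
        for label, c in counts.items()
    }


# Toy input for illustration only; these are not data from the study.
toy = [
    ("C", "T", 2e-3), ("G", "C", 1.5e-3),                     # bin 1
    ("A", "G", 7e-4), ("T", "C", 6e-4), ("C", "T", 8e-4),     # bin 2
    ("A", "G", 1e-5),                                         # below bin 5, dropped
]
fractions = spectrum_by_bin(toy)
```

Comparing the per-bin fractions of C-site and A-site substitutions is the kind of readout the binning is meant to provide; whether mutations are binned individually or per nucleotide position is a choice this sketch leaves to the caller.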

Shen et al. PNAS Latest Articles | 3of7 Downloaded by guest on September 26, 2021 100% Mutational Signatures of BCL6 SE Are Consistent with the A η 90% Contribution of AID, Pol , and Rev1 to aSHM. We determined C>A/G>T mutational signatures at the BCL6 SE by examining the identity 80% A>T/T>A of adjacent 5′ and 3′ nucleotides (26). We observed that A > G/ 70% A>C/T>G T > C transitions are mostly present at ATA sites (Fig. 4A), while 60% C>G/G>C both C > T/G > A transitions and C > G/G > C transversions C>T/G>A 50% occur most often in the trinucleotide sequence, ACA (Fig. 4A). A>G/T>C

Fracon This spectrum is similar in all 10 B cell samples (SI Appendix, 40% Fig. S9A). In contrast, C > T/G > A transition mutations in 30% IgHV3-23 show a sequence preference for GCT and ACC sites, 20% whereas C > G/G > C transversions are more frequent at GCT

10% and ACT sequences (Fig. 4B). Moreover, mutational signatures of IgHV3-23 differ among the 10 individuals (SI Appendix, Fig. 0% Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 S9B), likely due to selection of different mutant IgH clones in each person. We further used deconstructSigs (27) to deconstruct our sig- B 100% natures into COSMICv3 single base substitution (SBS) signa- 90% tures. We identified 6 mutational signatures in IgHV3-23 (Fig. A>C/T>G — — 80% 4B), 2 of which SBS84 (25.2%) and SBS85 (9%) have SHM- A>G/T>C related etiology. SBS84 shows predominant C > T transitions 70% A>T/T>A (likely AID-mediated), and SBS85 has equal contributions of C>A/G>T 60% T > A and T > C substitutions (likely the result of DNA synthesis A:T Pairs C>G/G>C 50% C>T/G>A by error-prone DNA polymerases). We extracted 2 additional

Fracon 40% signatures, SBS37 and SBS39, which also potentially could be attributed to error-prone DNA polymerase activity during SHM. 30% SBS37 (9%) and SBS39 (23.3%) are represented by T > C 20% G:C Pairs 10% 0% A 1.Original Profile 4. Identified Signatures Bin 1 Bin 2 Bin 3 Bin 4 Bin 5 C>A C>G C>T T>A T>C T>G η 0.04 SBS16 Fig. 3. Order of AID and Pol activities revealed by mutational frequencies 10.6 % and spectra. Cumulative mutations in BCL6 SE from 10 individuals were 0.02 Unknown 0.00 SBS30 23.9 % sorted into 5 bins based on the mutant allele fractions at each nucleotide Frequency 8.2 % position: bin 1, >1 × 10−3;bin2,>5 × 10−4 to 1 × 10−3; bin 3, >1 × 10−4 to 5 × 2. Combined SBS Signatures −4 −5 −4 −5 −5 0.04 SBS37 10 ;bin4,>5 × 10 to 1 × 10 ; and bin 5, >1.7 × 10 to 5 × 10 .(A) Order SBS85 0.02 13.1 % of mutations as a function of the fraction of each base substitution type in 8.6 % 0.00 each bin. C > T/G > A (AID-mediated) and C > G/G > C substitutions dominate Frequency SBS51 bin 1 (∼80%), decrease in bins 2 and 3, and then gradually increase in bins 4 3. Differences 7 % SBS39 > > η 0.03 and 5. In contrast, A G/T C substitutions (Pol -generated) constitute only 0.00 28.6 %

a minor fraction of all substitutions in bin 1 but increase in bins 2 and 3 Error −0.03 before declining. (B) Order of mutations at G:C vs. A:T pairs. TTT TTT TTT ATT TTA ATT TTA ATT TTA TCT TCT TCT ATA TTC ATA TTC ATA TTC CTT CTT CTT ACT TCA ACT TCA ACT TCA ATC CTA GTT TTG ATC CTA GTT TTG ATC CTA GTT TTG CCT CCT CCT ACA TCC ACA TCC ACA TCC ATG CTC GTA ATG CTC GTA ATG CTC GTA CCA GCT CCA GCT CCA GCT ACC TCG ACC TCG ACC TCG CTG GTC CTG GTC CTG GTC CCC GCA CCC GCA CCC GCA ACG ACG ACG GTG GTG GTG CCG GCC CCG GCC CCG GCC GCG GCG GCG C>A C>G C>T T>A T>C T>G The base changes at C residues, resulting primarily from the activity of AID are likely the earliest events, as C > T/G > A and > > > B 1. Original Profile 4. Identified Signatures C G/G C alterations comprise 70% of all mutation types in C>A C>G C>T T>A T>C T>G bin 1 (highest MAF; Fig. 3A and SI Appendix, Fig. S7A). A > G/ 0.06 SBS32 Unknown > η 0.04 10.3 % T C substitutions, reflective of Pol activity, increase in 0.02 13.1 % 0.00 SBS37

abundance in bins 2 and 3, concomitant with a decrease in C Frequency 9 % SBS85 > > 2. Combined SBS Signatures 9 % substitutions. A G/T C substitutions then decrease in bins 4 0.06 and 5, while C > T/G > A and C > G/G > C alterations increase 0.04 progressively. A similar pattern was observed when we compared 0.02 SBS39 0.00 23.3 % SBS84 substitutions at G:C versus A:T pairs. Close to 80% of the sub- Frequency 25.2 % 3. Differences stitutions in bin 1 are at G:C pairs; this fraction decreased in bins 0.04 SBS46 2 and 3 but then increased in bins 4 and 5 in coordination with an 0.00 10 % Error −0.04 increase and decrease, respectively, of mutations at A:T pairs TTT TTT TTT ATT TTA ATT TTA ATT TTA TCT TCT TCT ATA CTT TTC ATA CTT TTC ATA CTT TTC TCA TCA TCA ACT ACT ACT ATC CTA GTT TTG ATC CTA GTT TTG ATC CTA GTT TTG ACA CCT TCC ACA CCT TCC ACA CCT TCC ATG CTC GTA ATG CTC GTA ATG CTC GTA GCT GCT GCT ACC CCA TCG ACC CCA TCG ACC CCA TCG CTG GTC CTG GTC CTG GTC GCA GCA GCA ACG CCC ACG CCC ACG CCC GTG GTG GTG GCC GCC GCC CCG CCG CCG GCG GCG (Fig. 3B and SI Appendix, Fig. S7B). These data suggest that GCG mutations at the BCL6 SE locus are not the result of random C>A C>G C>T T>A T>C T>G events but instead arise from the coordinated action of AID and Fig. 4. Deconstruction of signatures of unique mutations in BCL6 SE (A)and Pol η. They further suggest that AID inscribes its mark on the IgHV3-23 (B) from 10 merged B cell samples using deconstructSigs (24). mutational landscape before Pol η. In contrast, mutational Panels represent the original profiles (A-1 and B-1), the signature as com- footprints of both AID and Pol η appear early (bin 1: MAF >1 × puted using the best-fitting combination of COSMICv3 SBS signatures (A-2 − 10 3, indicative of clonal expansion) in the Ig gene IgHV3-23 (SI and B-2), the difference between the preceding 2 figures (A-3 and B-3), and the breakdown of the original profile into COSMIC v3 SBS signatures (A-4 Appendix, Fig. S8). 
After initial rounds of selection, however, > × −4 and B-4). Types of base substitutions and the array of trinucleotide motifs AID catalysis becomes predominant in bin 2 (MAF 5 10 to are displayed; motifs with the most frequent substitutions found in the −3 1 × 10 ), with approximately equal contributions of AID and original profile (panel 1) are highlighted with colors designated for each of Pol η thereafter (SI Appendix, Fig. S8). the six base substitution types.

4of7 | www.pnas.org/cgi/doi/10.1073/pnas.1914163116 Shen et al. Downloaded by guest on September 26, 2021 transitions and C > G transversions, respectively (SI Appendix, where SHM occurs (42). The activities of both mutators are also Fig. S10). Similarly, we identified 6 mutational signatures at the evident in the extracted mutational signatures (Fig. 4A). SBS84 BCL6 SE (Fig. 4A); 3 of these—SBS85 (8.6%), SBS37 (13.1%), and SBS85 are designated as signatures of AID-induced somatic and SBS39 (28.6%)—are shared with IgHV3-23. While the eti- mutagenesis in lymphoid cells in COSMICv3. We suggest that ology of SBS85 is known, the etiologies of SBS37 and SBS39 SBS37 and SBS39 are signatures of error-prone DNA polymer- are not. ases Pol η and Rev1, respectively, in SHM, based on their Based on the spectrum of nucleotide substitutions found in prevalence in the SHM target IgHV3-23 and the dominance of SBS37 and SBS39, and their presence in IgHV3-23 (the target of nucleotide substitutions characteristic of Pol η (22) and Rev1 SHM), we hypothesize that SBS37 and SBS39 could be signa- (24). This notion is further supported by the prevalence of sig- tures of error-prone DNA polymerases in SHM. A combination nature SBS37 in non-Hodgkin B cell lymphoma and the reve- of signatures SBS 37, 39, 84, and/or 85 was also identified at the lation by Supek and Lehner (43) of a putative Pol η signature, PAX5 and CD83 SE loci, where they accounted for as much as similar to SBS37 (prevalence of A > G transitions in WA motifs), 75% of all signatures (SI Appendix, Fig. S11). Thus, signatures of at sites of clustered mutations in human B cell lymphomas. Thus, SHM permeate the mutational landscape of B cell SEs. SHM accounts for the etiology of at least 50% of the mutational signatures at BCL6 SE. Approximately one-quarter of mutations Discussion are of unknown signatures (Fig. 
4A), raising the possibility that We sequenced the SE regions of 12 highly expressed genes in B mutagenic processes besides SHM also operate at the BCL6 SE cells from 10 healthy individuals. Previous studies of mutagenesis locus. Interestingly, SBS37 and excess mutations in Pol η muta- of B cell SEs were based on low-resolution single-cell PCR se- ble motifs, TAA/WA, are also found in non-B cell cancers, where quencing (8) or mutant mouse models (5). A recent study in- in fact there is a negative correlation between the activities of volving whole-genome sequencing of single human B cells AID and Pol η (44). Perhaps deregulation of Pol η expres- reported mutations at the BCL6 SE locus; however, very few sion and activity contribute to Pol η-induced mutations in these mutations were identified in this study, precluding a detailed cancers. analysis of aSHM (28). We used exceptionally accurate high- Mutant allele frequencies also facilitated an evaluation of the depth DS (15, 16) to reveal a high-resolution landscape of mu- order of mutational events during aSHM. We show that AID tations and mutational processes in B cell SEs in normal healthy initiates aSHM and that Pol η operates later to extend the mu- individuals. tational landscape (Fig. 3). While the order of AID and Pol η We show that of all sequenced SEs, the BCL6 SE is the most mutagenesis during SHM varies slightly from that during aSHM, favored target of aSHM, with ∼8,000 subclonal mutations in 10 MEDICAL SCIENCES − it is evident that a balance of these 2 mutagenic activities is re- individuals. The average mutation frequency of 2.2 × 10 4 at the quired for fine-tuning antibody diversity. BCL6 SE locus is as high as the SHM frequency of the Ig genes One of the unique features of the BCL6 SE mutation land- (Fig. 1 and SI Appendix, Table S3). 
The accurate identification of large numbers of single nucleotide substitutions at the single-nucleotide level enabled the identification of refined clustered mutations. There are 3 clusters of mutations within the BCL6 SE, 2 of which overlap with AID target motifs (RGYW) (29) and the third of which lies within the TAA/WA repeat sequences (Fig. 2B and SI Appendix, Figs. S2 and S6) (30). In addition, there are multiple mutations, located <100 nt apart, in multiple sequence reads (SI Appendix, Fig. S1). While the density of mutations within these clusters is high, the allele frequency of individual mutations is <0.5%, with most even lower (Fig. 2B). Remarkably, the location and allele frequency of these clustered mutations are highly conserved in individuals across different ethnicities (Fig. 2B and SI Appendix, Table S2). BCL6 and IgH could be within the same topologically associating domain or proximal to each other in the chromatin architecture (7); therefore, the subclonal mutations in the BCL6 SE may be the result of collateral damage of SHM at the IgH locus. The unusually high number of mutations at the BCL6 SE locus suggests that they are not random events, but rather may be a part of the normal processes of B cell maturation and differentiation.

The identification of thousands of mutations in the BCL6 SE enabled the extraction of mutation signatures and analyses of mutational processes operative during aSHM in normal cells. We show that mutations at the BCL6 SE are dominated by A > G, C > T, and C > G substitutions (SI Appendix, Fig. S5A). Together, these 3 mutation types comprise 70% of all mutations at this locus. While C > T mutations are in accord with cytidine deamination by AID (4), A > G and C > G substitutions are consistent with synthesis by error-prone DNA polymerases Pol η and Rev1, respectively (22, 24, 25, 31–34). It is well established that AID is essential for B cell development in response to antigen stimulation and for antibody diversification in germinal centers (35, 36). Multiple error-prone DNA polymerases have been implicated in SHM (33, 37–39); of these, Pol η in particular shows increased expression in germinal center B cells (31, 40, 41) and can be visualized specifically expressed in the dark zone, [...]

[...]scape is the high density of mutations in DNA sequences located between 400 and 1,000 bp downstream of the TSS. There is a 20-fold difference in mutation frequencies between sequences with low variation (0 to 400 bp downstream of TSS; MF 1.7 × 10⁻⁵) and those with clustered mutations (400 to 1,000 bp; MF 3.5 × 10⁻⁴). This demarcation in mutation burden tracks with the positioning of histones H3K27ac and H3K4me3 (21). The “mutation-hot” region shows high levels of histone H3 acetylation and methylation, indicative of open chromatin and active transcription with a high density of cis-regulatory elements (45), such as SEs. The “cold” region is devoid of H3K27ac, suggesting that the nucleosome is in a closed conformation and lacks regulatory function.

While an open chromatin conformation facilitates active transcription of BCL6 during B cell development, it also increases the accessibility of this region to the SHM machinery, which generates mutations and promotes strand breakage (46). Indeed, it has been reported that this “mutation-hot” region is also a hotbed for DNA breaks leading to genomic translocations, frequently fusing BCL6 with the Ig heavy chain genes (20, 47). In addition to chromatin conformation, AID access to nuclear DNA and AID-induced C > T mutations are also regulated by [...] such as GANP, which facilitates translocation of AID from the cytoplasm into the nucleus and targets it to sites of SHM (48).

Multiple rounds of affinity selection and clonal expansion are required for high-affinity antibody production (13). B cells expressing low-affinity immunoglobulins are rapidly eliminated by apoptosis. In the case of BCL6, specific subclonal mutations may be selected and expanded to cause aberrant cell differentiation or malignant phenotypes. In fact, clonal BCL6 SE mutations and translocations are characteristic of diffuse large B cell lymphomas (12). They can affect the binding of transcription factors, such as the IRF4 repressor, to deregulate BCL6 protein expression (14); BCL6 is frequently overexpressed in B cell lymphomas (49).
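As a quick arithmetic check, the 20-fold contrast between the low-variation and clustered regions follows directly from the two mutation frequencies (MF) quoted above. The short sketch below only divides those two reported values; the variable names are illustrative and not taken from the study.

```python
# Mutation frequencies (MF) quoted in the text for the two regions
# downstream of the BCL6 transcription start site (TSS).
mf_cold = 1.7e-5  # 0 to 400 bp downstream of the TSS (low variation)
mf_hot = 3.5e-4   # 400 to 1,000 bp downstream of the TSS (clustered mutations)

# Ratio of the clustered region to the low-variation region.
fold_difference = mf_hot / mf_cold
print(f"hot/cold MF ratio: {fold_difference:.1f}")  # about 20.6, i.e. roughly 20-fold
```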

The mutational landscape of the BCL6 SE might serve as a sensitive indicator for mutation loads in healthy individuals. Recent studies have reported the presence of low-frequency coding mutations in a large number of genes, including tumor driver genes, in normal individuals (50–52). Many of these are found to be clonally expanded in the tumor genome. The B cell SE subclonal mutations represent a similar class of preexistent mutations in regulatory regions of genomic DNA. Apparent shifts in the pattern of mutation position, density (the distance between mutations), and/or intensity (the extent of mutations) could signal deregulated [...] and an enhanced risk for lymphomagenesis.

Methods

Human B Cells. The CD19+ B cells were purchased from AllCells and ReachBio. Peripheral blood mononuclear cells (PBMCs) were purified from whole blood drawn from 10 healthy donors (SI Appendix, Table S2). CD19+ B cells were purified by positive selection using an anti-CD19 antibody-conjugated affinity column. Purified memory B cells (CD27+), naïve B cells, monocytes, and T cells (all purchased from Bloodworks Northwest) were isolated using cell-specific isolation kits and separation on autoMACS columns (Miltenyi Biotec). All samples were deidentified before use.

DNA Isolation. Purified CD19+ B cells were lysed with proteinase K, and DNA was extracted using the Qiagen DNeasy Blood and Tissue Kit. DNA concentrations were quantified with a Nanodrop 2000 spectrophotometer.

Capture Library. Synthetic DNA oligonucleotides (purchased from IDT) were used for targeted gene capture. A capture-probe pool comprising 5′-biotinylated DNA oligonucleotides (120 nt) was designed and used in sequential rounds of hybridization to enrich and sequence the indicated SEs (SI Appendix, Table S1).

DS. DS was performed using published protocols (15, 16). In brief, genomic DNA from purified CD19+ B cells was fragmented by sonication, end-repaired, and A-tailed. Purified DNA fragments (∼250 bp) were ligated with DS adaptors containing unique molecular barcodes comprising 12 random nucleotides. The ligated DNA was PCR-amplified, and purified amplicons were hybridized to 5′-biotinylated capture probes. Double capturing was performed as described previously (53) to improve on-target capture efficiency. After purification by binding to streptavidin beads, the final PCR amplified the captured DNA library using distinct index primers for each sample to enable multiplex sequencing. The prepared libraries were mixed and sequenced using an Illumina HiSeq 2500 sequencer.

Data Processing. Sequencing data were processed as described previously (16). Sequencing reads with 12-nt molecular barcodes were assembled and aligned to RefSeq to generate single-strand consensus sequences. Pairs of single-strand consensus sequences with complementary barcodes were grouped to establish double-strand consensus sequences. Mutations were scored only if base substitutions were present at the same position in both strands and were complementary to each other.

Data Availability. Sequencing data presented in this paper can be accessed at NCBI BioProject PRJNA574179 (54).

ACKNOWLEDGMENTS. We thank Mark Fielden, Herve Lebrec, and Hisham Hamadeh for helpful input during the course of these studies; and Clint Valentine for bioinformatics assistance. This study was supported by Amgen, NIH/National Cancer Institute Cancer Center Support Grant P30 CA015704, a pilot grant from the Core Center of Excellence in Hematology (P30 DK 56465), Seattle Translational Tumor Research, and NIH/National Cancer Institute Grants P01 CA077852 and R01 CA193649.
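The consensus logic described under Data Processing can be illustrated with a toy sketch. This is not the published pipeline (16): the function names, the representation of a barcode as a `(tag_a, tag_b)` pair whose partner strand carries `(tag_b, tag_a)`, the minimum family size, and the 70% majority threshold are all assumptions made for illustration. Reads are assumed to be already aligned, of equal length, and reported in reference orientation, so a mutation present on both strands appears as the same base in both families.

```python
from collections import Counter

def single_strand_consensus(reads, min_reads=3, min_fraction=0.7):
    """Collapse aligned reads from one barcode family into a consensus string.

    Positions without a clear majority base are masked as 'N'.
    Returns None when the family has fewer than min_reads reads.
    (Both thresholds are illustrative assumptions.)
    """
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)

def duplex_consensus(sscs_a, sscs_b):
    """Keep a base only where both strand consensuses agree; otherwise 'N'."""
    return "".join(a if a == b and a != "N" else "N" for a, b in zip(sscs_a, sscs_b))

def call_mutations(reference, dcs):
    """List (position, ref_base, alt_base) for substitutions supported by both strands."""
    return [(i, r, d) for i, (r, d) in enumerate(zip(reference, dcs))
            if d != "N" and d != r]

def process(families, reference):
    """families maps a hypothetical (tag_a, tag_b) barcode pair to its aligned reads."""
    sscs = {tag: single_strand_consensus(reads) for tag, reads in families.items()}
    calls = {}
    for (tag_a, tag_b), first in sscs.items():
        # Visit each duplex once; skip the swapped key and self-pairs.
        if (tag_a, tag_b) >= (tag_b, tag_a):
            continue
        partner = sscs.get((tag_b, tag_a))
        if first is None or partner is None:
            continue
        calls[(tag_a, tag_b)] = call_mutations(reference, duplex_consensus(first, partner))
    return calls

reference = "ACGTACGT"
families = {
    # C>T at index 5 on both strands; one read also carries an isolated error at index 3.
    ("AAA", "CCC"): ["ACGTATGT", "ACGTATGT", "ACGTATGT"],
    ("CCC", "AAA"): ["ACGTATGT", "ACGTATGT", "ACGAATGT"],
    # G>T at index 2 seen on one strand only, so it is not scored.
    ("GGG", "TTT"): ["ACTTACGT", "ACTTACGT", "ACTTACGT"],
    ("TTT", "GGG"): ["ACGTACGT", "ACGTACGT", "ACGTACGT"],
}
print(process(families, reference))
# {('AAA', 'CCC'): [(5, 'C', 'T')], ('GGG', 'TTT'): []}
```

In this toy input, the substitution present in both strand families is scored, whereas the single-read error and the single-strand variant are masked. This mirrors the requirement that a mutation appear at the same position in both strands.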

1. U. Klein, R. Dalla-Favera, Germinal centres: Role in B cell physiology and malignancy. Nat. Rev. Immunol. 8, 22–33 (2008).
2. G. D. Victora, M. C. Nussenzweig, Germinal centers. Annu. Rev. Immunol. 30, 429–457 (2012).
3. N. S. De Silva, U. Klein, Dynamics of B cells in germinal centres. Nat. Rev. Immunol. 15, 137–148 (2015).
4. J. U. Peled et al., The biochemistry of somatic hypermutation. Annu. Rev. Immunol. 26, 481–511 (2008).
5. M. Liu et al., Two levels of protection for the B cell genome during somatic hypermutation. Nature 451, 841–845 (2008).
6. F. L. Meng et al., Convergent transcription at intragenic super-enhancers targets AID-initiated genomic instability. Cell 159, 1538–1548 (2014).
7. J. Qian et al., B cell super-enhancers and regulatory clusters recruit AID tumorigenic activity. Cell 159, 1524–1537 (2014).
8. L. Pasqualucci et al., BCL-6 mutations in normal germinal center B cells: Evidence of somatic hypermutation acting outside Ig loci. Proc. Natl. Acad. Sci. U.S.A. 95, 11816–11821 (1998).
9. H. M. Shen, A. Peters, B. Baron, X. Zhu, U. Storb, Mutation of BCL-6 gene in normal B cells by the process of somatic hypermutation of Ig genes. Science 280, 1750–1752 (1998).
10. S. Crotty, R. J. Johnston, S. P. Schoenberger, Effectors and memories: Bcl-6 and Blimp-1 in T and B lymphocyte differentiation. Nat. Immunol. 11, 114–120 (2010).
11. K. Basso, R. Dalla-Favera, Roles of BCL6 in normal and transformed germinal center B cells. Immunol. Rev. 247, 172–183 (2012).
12. A. Migliazza et al., Frequent somatic hypermutation of the 5′ noncoding region of the BCL6 gene in B-cell lymphoma. Proc. Natl. Acad. Sci. U.S.A. 92, 12520–12524 (1995).
13. K. Basso, R. Dalla-Favera, Germinal centres and B cell lymphomagenesis. Nat. Rev. Immunol. 15, 172–184 (2015).
14. M. Saito et al., A signaling pathway mediating downregulation of BCL6 in germinal center B cells is blocked by BCL6 gene alterations in B cell lymphoma. Cancer Cell 12, 280–292 (2007).
15. M. W. Schmitt et al., Detection of ultra-rare mutations by next-generation sequencing. Proc. Natl. Acad. Sci. U.S.A. 109, 14508–14513 (2012).
16. S. R. Kennedy et al., Detecting ultralow-frequency mutations by duplex sequencing. Nat. Protoc. 9, 2586–2606 (2014).
17. Y. C. Wu et al., High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood 116, 1070–1078 (2010).
18. S. Weller et al., CD40-CD40L independent Ig gene hypermutation suggests a second B cell diversification pathway in humans. Proc. Natl. Acad. Sci. U.S.A. 98, 1166–1170 (2001).
19. H. Morbach, E. M. Eichhorn, J. G. Liese, H. J. Girschick, Reference values for B cell subpopulations from infancy to adulthood. Clin. Exp. Immunol. 162, 271–279 (2010).
20. Z. Lu et al., BCL6 breaks occur at different AID sequence motifs in Ig-BCL6 and non-Ig-BCL6 rearrangements. Blood 121, 4551–4554 (2013).
21. Z. Lu et al., Convergent BCL6 and lncRNA promoters demarcate the major breakpoint region for BCL6 translocations. Blood 126, 1730–1731 (2015).
22. T. Matsuda, K. Bebenek, C. Masutani, F. Hanaoka, T. A. Kunkel, Low-fidelity DNA synthesis by human DNA polymerase-η. Nature 404, 1011–1013 (2000).
23. L. A. Loeb, B. D. Preston, Mutagenesis by apurinic/apyrimidinic sites. Annu. Rev. Genet. 20, 201–230 (1986).
24. C. Kano, F. Hanaoka, J. Y. Wang, Analysis of mice deficient in both REV1 catalytic activity and POLH reveals an unexpected role for POLH in the generation of C to G and G to C transversions during Ig gene hypermutation. Int. Immunol. 24, 169–174 (2012).
25. K. Masuda et al., A critical role for REV1 in regulating the induction of C:G transitions and A:T mutations during Ig gene hypermutation. J. Immunol. 183, 1846–1850 (2009).
26. L. B. Alexandrov et al.; Australian Pancreatic Cancer Genome Initiative; ICGC Breast Cancer Consortium; ICGC MMML-Seq Consortium; ICGC PedBrain, Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
27. R. Rosenthal, N. McGranahan, J. Herrero, B. S. Taylor, C. Swanton, DeconstructSigs: Delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
28. L. Zhang et al., Single-cell whole-genome sequencing reveals the functional landscape of somatic mutations in B lymphocytes across the human lifespan. Proc. Natl. Acad. Sci. U.S.A. 116, 9014–9019 (2019).
29. I. B. Rogozin, N. A. Kolchanov, Somatic hypermutagenesis in immunoglobulin genes. Biochim. Biophys. Acta Gene Structure Expression 1171, 11–18 (1992).
30. Y. Zhao et al., Mechanism of somatic hypermutation at the WA motif by human DNA polymerase η. Proc. Natl. Acad. Sci. U.S.A. 110, 8146–8151 (2013).
31. X. Zeng et al., DNA polymerase eta is an A-T mutator in somatic hypermutation of immunoglobulin variable genes. Nat. Immunol. 2, 537–541 (2001).
32. I. B. Rogozin, Y. I. Pavlov, K. Bebenek, T. Matsuda, T. A. Kunkel, Somatic mutation hotspots correlate with DNA polymerase eta error spectrum. Nat. Immunol. 2, 530–536 (2001).
33. M. Seki, P. J. Gearhart, R. D. Wood, DNA polymerases and somatic hypermutation of immunoglobulin genes. EMBO Rep. 6, 1143–1148 (2005).
34. K. J. Zanotti, P. J. Gearhart, Antibody diversification caused by disrupted mismatch repair and promiscuous DNA polymerases. DNA Repair (Amst.) 38, 110–116 (2016).
35. U. Storb, J. Stavnezer, Immunoglobulin genes: Generating diversity with AID and UNG. Curr. Biol. 12, R725–R727 (2002).
36. C. Keim, D. Kazadi, G. Rothschild, U. Basu, Regulation of AID, the B-cell genome mutator. Genes Dev. 27, 1–17 (2013).
37. H. Zan et al., The translesion DNA polymerase ζ plays a major role in Ig and bcl-6 somatic hypermutation. Immunity 14, 643–653 (2001).
38. H. Zan et al., The translesion DNA polymerase θ plays a dominant role in immunoglobulin gene somatic hypermutation. EMBO J. 24, 3757–3769 (2005).
39. J.-C. Weill, C.-A. Reynaud, DNA polymerases in adaptive immunity. Nat. Rev. Immunol. 8, 302–312 (2008).
40. R. Ouchida et al., Genetic analysis reveals an intrinsic property of the germinal center B cells to generate A:T mutations. DNA Repair (Amst.) 7, 1392–1398 (2008).

41. K. Masuda et al., DNA polymerase η is a limiting factor for A:T mutations in Ig genes and contributes to antibody affinity maturation. Eur. J. Immunol. 38, 2796–2805 (2008).
42. L. J. McHeyzer-Williams, P. J. Milpied, S. L. Okitsu, M. G. McHeyzer-Williams, Class-switched memory B cells remodel BCRs within secondary germinal centers. Nat. Immunol. 16, 296–305 (2015).
43. F. Supek, B. Lehner, Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547.e23 (2017).
44. I. B. Rogozin et al., DNA polymerase η mutational signatures are found in a variety of different types of cancer. Cell Cycle 17, 348–355 (2018).
45. E. Calo, J. Wysocka, Modification of enhancer chromatin: What, how, and why? Mol. Cell 49, 825–837 (2013).
46. M. R. Lieber, Mechanisms of human lymphoid chromosomal translocations. Nat. Rev. Cancer 16, 387–398 (2016).
47. B. H. Ye et al., Chromosomal translocations cause deregulated BCL6 expression by promoter substitution in B cell lymphoma. EMBO J. 14, 6209–6217 (1995).
48. M. M. A. Eid et al., Integrity of immunoglobulin variable regions is supported by GANP during AID-induced somatic hypermutation in germinal center B cells. Int. Immunol. 29, 211–220 (2017).
49. K. Hatzi, A. Melnick, Breaking bad in the germinal center: How deregulation of BCL6 contributes to lymphomagenesis. Trends Mol. Med. 20, 343–352 (2014).
50. I. Martincorena et al., Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
51. A. Yokoyama et al., Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
52. K. Yizhak et al., RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
53. M. W. Schmitt et al., Sequencing small genomic targets with high efficiency and extreme accuracy. Nat. Methods 12, 423–425 (2015).
54. J.-C. Shen, L. A. Loeb, A high resolution landscape of mutations in the BCL6 super-enhancer in normal B cells. BioProject. https://www.ncbi.nlm.nih.gov/sra/PRJNA574179. Deposited 25 September 2019.
