J Immunol 2007; 178:4322-4334; doi: 10.4049/jimmunol.178.7.4322
http://www.jimmunol.org/content/178/7/4322


The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852. Copyright © 2007 by The American Association of Immunologists. All rights reserved. Print ISSN: 0022-1767. Online ISSN: 1550-6606.

Analysis of 6912 Unselected Somatic Hypermutations in Human VDJ Rearrangements Reveals Lack of Strand Specificity and Correlation between Phase II Substitution Rates and Distance to the Nearest 3′ Activation-Induced Cytidine Deaminase Target¹

Line Ohm-Laursen² and Torben Barington³

The initial event of somatic hypermutation (SHM) is the deamination of cytidine residues by activation-induced cytidine deaminase (AID). Deamination is followed by replication over uracil and/or different error-prone repair events. We sequenced 659 nonproductive human IgH rearrangements (IGHV3-23*01) from blood B lymphocytes enriched for CD27-positive memory cells. Analyses of 6,912 unique, unselected substitutions showed that in vivo hot and cold spots for the SHM of C and G residues corresponded closely to the target preferences reported for AID in vitro. A detailed analysis of all possible four-nucleotide motifs present on both strands of the VH showed significant correlations between the substitution frequencies in reverse complementary motifs. This suggests that the SHM machinery targets both strands equally well. An analysis of individual JH and D gene segments showed that the substitution frequencies in the individual motifs were comparable to the frequencies found in the VH gene. Interestingly, JH6-carrying sequences were less likely to undergo SHM (average 15.2 substitutions per VH region) than sequences using JH4 (18.1 substitutions, p = 0.03). We also found that the substitution rates in G and T residues correlated inversely with the distance to the nearest 3′ WRC AID hot spot motif on both the nontranscribed and transcribed strands. This suggests that phase II SHM takes place 5′ of the initial AID deamination target and primarily targets T and G residues or, alternatively, the corresponding A and C residues on the opposite strand. The Journal of Immunology, 2007, 178: 4322–4334.
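The distance measure in the final sentences of the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure:

- Distance is assumed to be counted in nucleotides from a residue to the targeted C of the nearest WRC motif lying strictly 3′ of it on the same strand.
- Transcribed-strand distances are assumed to come from scanning the reverse complement of the nontranscribed-strand sequence.
- The helper names (`wrc_c_positions`, `distance_to_nearest_3prime_wrc`, `distances_both_strands`) are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Complement table for A/C/G/T (gap-free DNA assumed).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# WRC hot spot: W = A or T, R = A or G, followed by the targeted C.
# The zero-width lookahead lets finditer report overlapping motifs.
WRC = re.compile(r"(?=[AT][AG]C)")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def wrc_c_positions(seq: str) -> List[int]:
    """0-based positions of the C in every WRC motif on this strand."""
    return [m.start() + 2 for m in WRC.finditer(seq)]


def distance_to_nearest_3prime_wrc(seq: str, i: int) -> Optional[int]:
    """Distance from position i to the nearest WRC C strictly 3' of it.

    Assumption: distance is measured to the targeted C; None if no WRC lies 3'.
    """
    downstream = [c - i for c in wrc_c_positions(seq) if c > i]
    return min(downstream) if downstream else None


def distances_both_strands(seq: str, i: int) -> Tuple[Optional[int], Optional[int]]:
    """Distances for residue i (nontranscribed-strand coordinate) on both strands."""
    nontranscribed = distance_to_nearest_3prime_wrc(seq, i)
    # The same base pair sits at index len(seq) - 1 - i on the reverse complement,
    # which reads the transcribed strand 5' to 3'.
    transcribed = distance_to_nearest_3prime_wrc(revcomp(seq), len(seq) - 1 - i)
    return nontranscribed, transcribed


print(distances_both_strands("GGTAGCTT", 4))  # (1, 3)
```

In the toy sequence, the G at index 4 sits one nucleotide 5′ of a WRC C on the nontranscribed strand. It sits three nucleotides from the nearest downstream WRC C on the transcribed strand. Relating such distances to per-residue G and T substitution rates would be the next step. The binning and statistics used in the study are not given in this excerpt.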

Somatic hypermutations (SHMs)⁴ in the form of nucleotide substitutions, insertions, and deletions are found throughout the variable regions of Ig rearrangements in postgerminal center B cells. SHM is followed by selection against unfavorable mutations and for high affinity binding to Ag. The process is necessary for the generation of high affinity Abs and B cell memory.

SHM is dependent on several cis-acting elements, including the promoter and the elements of the Ig enhancer regions (1–6). The promoter and enhancer requirement is likely to be due to a requirement for transcription, although other properties of the enhancers, e.g., binding, are likely to be involved as well (6). The transcription dependence is further supported by the fact that the mutation frequency decays exponentially from a starting point ∼150–200 bp from the promoter (7, 8). The 3′ boundary is in the intron region downstream of the J gene segment, and no mutations are normally found in the constant domain. In contrast, the entire variable domain is targeted by SHM. The CDRs have been described as being more prone to mutation than the framework regions (FRs). This has been attributed to the presence of more RGYW hot spot motifs (9, 10).

The only trans-acting factor described to be absolutely mandatory for SHM is activation-induced cytidine deaminase (AID). AID is required not only for SHM but also for class switch recombination (CSR) and gene conversion (11–13). Ectopic expression of AID can turn on mutation and CSR in human hybridomas, Escherichia coli, and murine fibroblasts (14–17), proving that it is the only B cell-specific factor necessary for SHM. AID is a cytidine deaminase shown to be able to deaminate cytidine residues in ssDNA, in particular in WRC motifs (18–20). It has also been suggested that AID could be involved in SHM by modulation of the mRNA of an involved protein (11).

According to the current model for SHM (21, 22), the process is initiated by cytidine deamination by AID. The targeted sequence is thought to be single stranded because of the ongoing transcription from the Ig promoter. The generated uracil can either be replicated over or be removed. Replication generates a C to T transition in the sister cell, or G to A if the transcribed strand is targeted (phase I). Uracil DNA-glycosylase (UNG) and a complex of MutS homologs (MSH) 2 and 6 (MSH2 and MSH6) have both been described as being capable of uracil removal. Deletion of or mutation in the murine and human UNG gene changes the mutation pattern of G and C residues almost exclusively to transitions (21, 23, 24). MSH2/MSH6 deficiency leads to impaired mutation of A and T residues (21, 25, 26). UNG-MSH6 double knock-out mice have almost exclusively C to T and G to A transitions, i.e., phase I mutations (27). A and T residues are targeted during phase II of SHM. Phase II involves resolution of the abasic site generated by uracil removal. It has been suggested to involve repair by error-prone DNA polymerases such as polymerase (pol) η and pol ζ, as well as exonuclease I (EXO1) (28–35). Pol θ has been shown to be involved in phase II C/G mutation (36). An alternative hypothesis suggests that incorporation of dUTP may be the cause of A/T mutations (22).

In this study we have analyzed 6,912 substitutions in 386 mutated, nonproductive human H chain rearrangements using IGHV3-23*01. We also analyzed 542 substitutions in 56 mutated rearrangements using the IGHV3-h pseudogene. This high number of nonproductive sequences enables us to study nonselected mutations with good statistical power. We show that mutations in C and G residues can be assigned to AID deamination equally targeted to both strands. The substitution rate depends on the motifs in which the nucleotide is found, and the recognized motifs are at least four nucleotides long. The substitution rates in A and T residues also depend on the motifs. However, as for phase I C/G mutations, we find no sign of strand specificity. Interestingly, the phase II substitution rates in T and G showed an inverse correlation to the distance to the nearest 3′ WRC. This indicates that phase II mutations predominately occur in T and G residues 5′ of the initial AID-deaminated cytidine residue or, alternatively, in the corresponding A and C residues on the opposite strand.

Department of Clinical Immunology, Odense University Hospital, Denmark

Received for publication July 13, 2006. Accepted for publication January 8, 2007.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ This study was supported by Danish Medical Research Council Grant 22-01-0156.
² Current address: University of Oxford, The Peter Medawar Building for Pathogen Research, South Parks Road, Oxford, U.K.
³ Address correspondence and reprint requests to Prof. Torben Barington, Department of Clinical Immunology, Odense University Hospital, 5000 Odense C, Denmark. E-mail address: [email protected]
⁴ Abbreviations used in this paper: SHM, somatic hypermutation; AID, activation-induced cytidine deaminase; CSR, class switch recombination; EXO1, exonuclease I; FR, framework region; MSH, MutS homolog; pol, DNA polymerase; UNG, uracil DNA glycosylase.

FIGURE 1. Number of substitutions in the VH region of 659 nonproductive rearrangements. Only the 386 sequences with more than three substitutions were used in the mutation analysis to reduce the influence of Taq errors.

Materials and Methods

A material consisting of 659 nonproductive and 5,670 productive rearrangements of IGHV3-23 was collected and validated as described (37) (European Molecular Biology Laboratory (EMBL) accession nos. AM076988–AM083316). The rearrangements used predominately IGHJ4 and IGHJ6. In brief, 100 ml of blood was collected from 28 healthy, adult volunteers after informed consent (the study was approved by the Ethics Committee for Funen and Vejle Counties, Denmark). Donors had been selected to be homozygous for IGHV3-23*01 and IGHJ6*02, which are the most frequent genotypes in the Danish Caucasian population (genotype frequency 0.64 and 0.51, respectively) (see Ref. 38).

DNA was purified from magnetic bead-isolated memory B cells (MACS B cell isolation kit II followed by CD27 isolation (Miltenyi Biotec via Biotech Line)) using the QIAamp blood DNA mini kit (Qiagen via VWR International). IgH rearrangements were PCR amplified with 3-23cn9.F (5′-CTGAGCTGGCTTTTTTTCTTGTG-3′) as the forward primer. Either JH4.R (5′-GCCGCTGTTGCCTCAGG-3′) or JH6.R1 (5′-CCCACAGGCAGTAGCAGAA-3′) served as the reverse primer. PCR products were cloned using the TOPO TA cloning kit (Invitrogen Life Technologies). Plasmid DNA was purified using the Wizard SV 9600 plasmid purification system (Promega via Ramcon). The rearrangements were sequenced with the BigDye Terminator kit (Applied Biosystems) on an ABI Prism 3100 genetic analyzer (Applied Biosystems). A set of 103 rearrangements using the IGHV3-h pseudogene was generated in a similar way (37) (EMBL accession nos. AM282702–AM282804).

The CDR3 regions were analyzed for D genes, P and N nucleotides, and trimming using the JointML algorithm as described elsewhere (37). An online version of the algorithm can be found at www.cbs.dtu.dk/services/VDJsolver. Statistical analyses were performed using the Analyze-it add-in for Microsoft Excel.
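The motif-level rates reported in Table II follow a specific counting rule. An occurrence of a motif in a given sequence is counted only if no motif position other than the targeted one is mutated.

Below is a minimal Python sketch of that rule. It assumes gap-free sequences already aligned to the germline, so insertions and deletions are ignored. `motif_rate` and `reverse_complement_target` are hypothetical helper names; the authors' own software is not described in the text.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_target(motif: str, target: int) -> Tuple[str, int]:
    """Map a 0-based target index in a motif to the same base pair in its reverse complement."""
    return motif.translate(COMPLEMENT)[::-1], len(motif) - 1 - target


def motif_rate(germline: str, mutants: List[str], motif: str, target: int) -> Tuple[int, int]:
    """Return (substitutions, occurrences tested) for motif[target].

    Assumes every mutant is gap-free and the same length as the germline.
    An occurrence is skipped when any non-target motif position is mutated.
    """
    k = len(motif)
    starts = [s for s in range(len(germline) - k + 1) if germline[s:s + k] == motif]
    substituted = tested = 0
    for seq in mutants:
        for s in starts:
            if any(seq[s + p] != germline[s + p] for p in range(k) if p != target):
                continue  # neighboring substitution: exclude this occurrence
            tested += 1
            if seq[s + target] != germline[s + target]:
                substituted += 1
    return substituted, tested


# C at position 3 (index 2) of TGCC pairs with G at position 2 (index 1) of GGCA.
print(reverse_complement_target("TGCC", 2))  # ('GGCA', 1)

germline = "AGCTAGCT"
mutants = ["AGTTAGCT", "AGCTAACT", "AGCTAGCT", "TGTTAGCT"]
print(motif_rate(germline, mutants, "AGCT", 2))  # (1, 6)
```

Dividing the two returned numbers gives the per-motif rate. For the C residues of AGCT in Table II, this is 301/871 ≈ 34.6%. Pairing each C/A motif with the output of `reverse_complement_target` is one way to line up the left and right halves of Table II.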

FIGURE 2. Substitution rates in the different positions of the IGHV3-23*01 VH gene segment in 386 somatically hypermutated, nonproductive rearrangements. A total of 6,912 substitutions were analyzed. FR/CDR boundaries are indicated according to ImMunoGeneTics (IMGT) nomenclature.
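A per-position profile like the one described for Fig. 2 amounts to counting, at every germline position, the fraction of sequences that differ from germline. The minimal sketch below assumes gap-free sequences aligned to the germline. It also tallies substitution types per position, the breakdown plotted in Fig. 3. The function name `position_profile` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def position_profile(germline: str, mutants: List[str]) -> Tuple[List[float], List[Dict[str, int]]]:
    """Per-position substitution rate and per-position counts of each substitution type.

    Assumes all sequences are gap-free and the same length as the germline.
    """
    n = len(mutants)
    rates, types = [], []
    for i, g in enumerate(germline):
        # Label each change as "from>to", e.g. "C>T".
        changes = Counter(f"{g}>{seq[i]}" for seq in mutants if seq[i] != g)
        rates.append(sum(changes.values()) / n)
        types.append(dict(changes))
    return rates, types


rates, types = position_profile("ACGT", ["ACGT", "TCGT", "GCGA", "ACGA"])
print(rates)     # [0.5, 0.0, 0.0, 0.5]
print(types[0])  # {'A>T': 1, 'A>G': 1}
```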

Table I. VH substitutions in 386 nonproductive IGHV3-23*01 sequences with more than three substitutions in the VH regionᵃ

From | To A | To T | To C | To G | Transitions | Transversions | Mutated (%)ᵇ
A | - | 0.061ᶜ (424) | 0.052 (359) | 0.118 (816) | 0.118 (816) | 0.113 (783) | 6.8
T | 0.046 (316) | - | 0.089 (615) | 0.054 (375) | 0.089 (615) | 0.100 (691) | 5.6
C | 0.040 (273) | 0.130 (895) | - | 0.093 (645) | 0.130 (895) | 0.133 (918) | 6.7
G | 0.166 (1148) | 0.042 (290) | 0.109 (756) | - | 0.166 (1148) | 0.151 (1046) | 6.1
Total | | | | | 0.503 (3474) | 0.497 (3438) | 6.3

ᵃ The fraction of each type of substitution among all substitutions is given, as well as the absolute numbers (in parentheses). The overall transition/transversion ratio was 1.01. The rightmost column shows the substitution rates for each of the four nucleotides and the overall substitution rate.
ᵇ Substitution rates varied from 5.6% for T to 6.8% for A, which is statistically significant (p < 0.0001, χ² test).
ᶜ Fraction of the given substitution among all substitutions.
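The transition/transversion split in Table I can be recomputed directly from the substitution counts. In the minimal Python sketch below, the counts are copied from Table I. The only rule applied is that a transition exchanges a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T).

```python
from typing import Dict, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Substitution counts copied from Table I: from-base -> to-base -> count.
TABLE_I: Dict[str, Dict[str, int]] = {
    "A": {"T": 424, "C": 359, "G": 816},
    "T": {"A": 316, "C": 615, "G": 375},
    "C": {"A": 273, "T": 895, "G": 645},
    "G": {"A": 1148, "T": 290, "C": 756},
}


def is_transition(src: str, dst: str) -> bool:
    """True for a purine<->purine or pyrimidine<->pyrimidine exchange."""
    pair = {src, dst}
    return src != dst and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv(counts: Dict[str, Dict[str, int]]) -> Tuple[int, int]:
    """Total transitions and transversions in a substitution-count matrix."""
    ti = tv = 0
    for src, row in counts.items():
        for dst, n in row.items():
            if is_transition(src, dst):
                ti += n
            else:
                tv += n
    return ti, tv


ti, tv = ti_tv(TABLE_I)
print(ti, tv, round(ti / tv, 2))  # 3474 3438 1.01
```

Dividing each count by the 6,912 total reproduces the fractions in the table, e.g., 1148/6912 ≈ 0.166 for G to A.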

Results

The data set consisted of 659 nonproductive and 5,670 productive IGHV3-23*01 rearrangements. Sequences were considered nonproductive in either of two cases: when the V-D-JH joint changed the normal reading frame of the JH segment, or when the D segment contained one or more stop codons not resulting from SHM (i.e., germline encoded). Because the mutations in nonproductive sequences have not been selected for Ag binding, the mutations were expected to represent the intrinsic preferences of the mutation mechanism. Great care was taken to remove clonally related sequences from the material (37), and hence the mutations are expected to have arisen independently. All sequences contained exon 2 of the VH gene, starting in codon 1 or further upstream and continuing through the 3′ end of the JH gene. The precise germline sequences of the VH gene (IGHV3-23*01) and the JH6 allele (IGHJ6*02) were known, because all donors had been typed and found homozygous in the respective loci (see Ref. 38). The VDJ joints were analyzed using the JointML algorithm, which we have shown to be currently the best algorithm for identifying D genes (37).

Of the 659 nonproductive rearrangements, 185 were unmutated in the VH region (codons 1–107) and were therefore excluded. To minimize interference from Taq errors in the analyses, sequences with one to three mutations were also excluded (88 sequences). The remaining 386 sequences contained from four to 70 mutations (median 14, average 16.5) in the VH region, excluding insertions and deletions (see Fig. 1). Altogether, these sequences contained 6,912 substitutions. Less than 2.4% of these substitutions were expected to be Taq errors based on an estimated error rate of 0.00048–0.0015 substitutions per nucleotide (37). Furthermore, 153 deletions and 70 insertions were found in the VH regions, accounting for 2 and 1% of the total mutations, respectively. Insertions and deletions will be described in detail elsewhere (T. Barington and L. Ohm-Laursen, manuscript in preparation). A further 855 substitutions were found in the JH regions and 213 in the D regions of IGHV3-23-using sequences. The VH region of the IGHV3-h-using sequences contained 542 substitutions in total.

To exclude the possibility that some of the rearrangements used other VH genes, all sequences were thoroughly compared with all known IGHV alleles in the ImMunoGeneTics (IMGT) database (http://imgt.cines.fr) (39). We always found maximal identity with IGHV3-23*01 or IGHV3-h*01, respectively. Some donors frequently had a specific mutation in IGHD3-3, suggesting the existence of a new D gene allele. However, this could not be confirmed by sequencing of the IGHD3-3 germline gene (data not shown). No other mutations in the IGHD genes or IGHJ genes indicated the presence of new alleles.

FIGURE 3. Mutation rates given as the log-transformed fraction of 386 nonproductive IGHV3-23*01 VH sequences (codons 1–107) carrying a specific substitution type (of three possible) in individual C (a), G (b), A (c), and T (d) residues. For clarity, the individual nucleotides have been placed in the order of declining total substitution rates along the x-axis. For all four nucleotides, transitions (light gray marks) predominate. However, for transversions two distinct patterns are seen. For C and G residues, transversions to the complementary nucleotides (black marks) are more common than transversions to the noncomplementary nucleotides (dark gray marks). In A and T residues, both types of transversions are equally frequent. *, The T residue in position three of codon 15, with a remarkably high level of T to G transversions.

Table II. Substitution rates of the targeted C/G (top portion) and A/T (bottom portion) residue in the given four-nucleotide motifs in 386 nonproductive, mutated rearrangements (codons 1 through 107)ᵃ

C/G substitutions in the nontranscribed strand
Motif (C at position 3) | No. of substitutions | No. of motifs tested | Total (%) | C to T (%) | C to G (%) | C to A (%) | Motif (G at position 2) | No. of substitutions | No. of motifs tested | Total (%) | G to A (%) | G to C (%) | G to T (%)
AGCT* | 301 | 871 | 34.6 | 14.5 | 17.2 | 2.9 | AGCT* | 309 | 879 | 35.2 | 17.4 | 15.5 | 2.3
AGCG | - | - | - | - | - | - | CGCT | 71 | 299 | 23.8 | 13.0 | 6.0 | 5.0
AGCA | 110 | 470 | 23.4 | 13.0 | 5.1 | 5.3 | TGCT | - | - | - | - | - | -
TGCC* | 73 | 334 | 21.9 | 11.7 | 4.8 | 5.4 | GGCA* | - | - | - | - | - | -
TGCT* | - | - | - | - | - | - | AGCA* | 92 | 452 | 20.4 | 8.9 | 10.4 | 1.1
TACT* | 76 | 477 | 15.9 | 7.6 | 5.0 | 3.4 | AGTA* | - | - | - | - | - | -
CGCT | 40 | 268 | 14.9 | 7.5 | 6.0 | 1.1 | AGCG | - | - | - | - | - | -
TACC* | - | - | - | - | - | - | GGTA* | 87 | 731 | 11.9 | 8.2 | 1.2 | 2.5
AACC* | - | - | - | - | - | - | GGTT* | 34 | 328 | 10.4 | 4.0 | 5.5 | 0.9
TACG | 26 | 255 | 10.2 | 6.3 | 2.0 | 2.0 | CGTA | 59 | 279 | 21.2 | 11.5 | 6.1 | 3.6
ACCT | 30 | 308 | 9.7 | 4.9 | 3.3 | 1.6 | AGGT | 8 | 355 | 2.3 | 1.1 | 0.9 | 0.3
ATCT | 51 | 580 | 8.8 | 3.5 | 4.1 | 1.2 | AGAT | - | - | - | - | - | -
AGCC* | 108 | 1367 | 7.9 | 5.1 | 1.8 | 1.0 | GGCT* | 76 | 1040 | 7.3 | 4.4 | 1.9 | 1.0
CACC | 51 | 648 | 7.9 | 3.2 | 3.7 | 0.9 | GGTG | 27 | 600 | 4.5 | 2.7 | 1.3 | 0.5
ATCC | - | - | - | - | - | - | GGAT | 21 | 305 | 6.9 | 3.6 | 2.6 | 0.7
AACA | 42 | 618 | 6.8 | 3.4 | 2.3 | 1.1 | TGTT | 22 | 350 | 6.3 | 3.7 | 1.4 | 1.1
TTCT | - | - | - | - | - | - | AGAA | 19 | 304 | 6.3 | 2.3 | 4.0 | 0.0
TGCA | 60 | 1007 | 6.0 | 3.8 | 1.1 | 1.1 | TGCA | 32 | 979 | 3.3 | 1.8 | 1.0 | 0.4
CACT | - | - | - | - | - | - | AGTG | 52 | 879 | 5.9 | 3.3 | 1.7 | 0.9
GACT | 33 | 696 | 4.7 | 1.3 | 2.9 | 0.6 | AGTC | 35 | 354 | 9.9 | 4.2 | 5.7 | 0.0
TTCC | 13 | 284 | 4.6 | 2.5 | 1.8 | 0.4 | GGAA | 8 | 327 | 2.5 | 1.8 | 0.3 | 0.3
ACCC | - | - | - | - | - | - | GGGT | 47 | 1076 | 4.4 | 2.4 | 1.6 | 0.4
CGCA | 11 | 288 | 3.8 | 2.1 | 1.0 | 0.7 | TGCG | 7 | 295 | 2.4 | 1.7 | 0.7 | 0.0
TGCG | 11 | 299 | 3.7 | 2.0 | 1.3 | 0.3 | CGCA | 21 | 298 | 7.1 | 4.7 | 1.7 | 0.7
GGCT | 32 | 996 | 3.2 | 1.8 | 1.3 | 0.1 | AGCC | 39 | 1298 | 3.0 | 1.1 | 1.5 | 0.5
TCCT | 11 | 345 | 3.2 | 1.5 | 0.3 | 1.5 | AGGA | 47 | 340 | 13.8 | 3.5 | 9.7 | 0.6
TTCA | 20 | 644 | 3.1 | 1.6 | 0.8 | 0.8 | TGAA | 17 | 636 | 2.7 | 2.0 | 0.6 | 0.0
TTCG | - | - | - | - | - | - | CGAA | 7 | 244 | 2.9 | 2.1 | 0.8 | 0.0
CACG | 17 | 594 | 2.8 | 1.9 | 0.8 | 0.2 | CGTG | 10 | 349 | 2.9 | 0.9 | 1.2 | 0.9
CGCC | 10 | 362 | 2.8 | 2.2 | 0.3 | 0.3 | GGCG | - | - | - | - | - | -
CACA | 6 | 240 | 2.5 | 0.8 | 0.8 | 0.8 | TGTG | 7 | 657 | 1.1 | 0.5 | 0.3 | 0.3
GGCA | - | - | - | - | - | - | TGCC | 6 | 267 | 2.3 | 1.1 | 0.8 | 0.4
CTCA | 7 | 325 | 2.2 | 1.5 | 0.3 | 0.3 | TGAG | 17 | 907 | 1.9 | 0.7 | 0.9 | 0.3
CTCC | 28 | 1363 | 2.1 | 0.8 | 0.8 | 0.4 | GGAG | 22 | 1007 | 2.2 | 0.3 | 1.9 | 0.0
TCCC | 6 | 368 | 1.6 | 0.8 | 0.3 | 0.5 | GGGA | 8 | 710 | 1.1 | 1.0 | 0.0 | 0.1
TCCA | 16 | 999 | 1.6 | 0.7 | 0.7 | 0.2 | TGGA | 26 | 1005 | 2.6 | 1.9 | 0.3 | 0.4
GACA | 10 | 674 | 1.5 | 0.9 | 0.3 | 0.3 | TGTC | - | - | - | - | - | -
GGCC | 10 | 723 | 1.4 | 0.6 | 0.4 | 0.4 | GGCC | 8 | 721 | 1.1 | 0.1 | 0.4 | 0.6
ACCA | 4 | 295 | 1.4 | 1.0 | 0.0 | 0.3 | TGGT | 57 | 1013 | 5.6 | 3.5 | 1.1 | 1.1
TACA | 4 | 298 | 1.3 | 1.0 | 0.3 | 0.0 | TGTA | 24 | 242 | 9.9 | 4.6 | 2.9 | 2.5
ACCG | - | - | - | - | - | - | CGGT | 4 | 303 | 1.3 | 0.0 | 1.0 | 0.3
GTCT | 8 | 661 | 1.2 | 0.6 | 0.3 | 0.3 | AGAC | 17 | 1000 | 1.7 | 0.7 | 0.7 | 0.3
TCCG | 8 | 695 | 1.2 | 0.9 | 0.0 | 0.3 | CGGA | - | - | - | - | - | -
CTCT | 6 | 669 | 0.9 | 0.5 | 0.3 | 0.2 | AGAG | 10 | 672 | 1.5 | 0.3 | 1.0 | 0.2
CTCG | - | - | - | - | - | - | CGAG | 2 | 305 | 0.7 | 0.7 | 0.0 | 0.0
GACC | - | - | - | - | - | - | GGTC | 6 | 1035 | 0.6 | 0.3 | 0.2 | 0.1
GCCC | - | - | - | - | - | - | GGGC | 4 | 712 | 0.6 | 0.3 | 0.3 | 0.0
CCCC | - | - | - | - | - | - | GGGG | 12 | 2176 | 0.6 | 0.3 | 0.1 | 0.1
CCCA | - | - | - | - | - | - | TGGG | 7 | 1313 | 0.5 | 0.1 | 0.4 | 0.1
GCCT | 4 | 906 | 0.4 | 0.2 | 0.2 | 0.0 | AGGC | 3 | 627 | 0.5 | 0.2 | 0.2 | 0.2
GCCA | 2 | 606 | 0.3 | 0.2 | 0.0 | 0.2 | TGGC | - | - | - | - | - | -
GCCG | 3 | 947 | 0.3 | 0.2 | 0.0 | 0.1 | CGGC | 3 | 350 | 0.9 | 0.6 | 0.0 | 0.3
CCCT | 1 | 356 | 0.3 | 0.0 | 0.0 | 0.3 | AGGG | 5 | 1034 | 0.5 | 0.4 | 0.1 | 0.0
GTCC | 2 | 717 | 0.3 | 0.1 | 0.1 | 0.0 | GGAC | 0 | 297 | 0.0 | 0.0 | 0.0 | 0.0
GGCG | - | - | - | - | - | - | CGCC | 1 | 353 | 0.3 | 0.3 | 0.0 | 0.0

A/T substitutions in the nontranscribed strand
Motif (A at position 3) | No. of substitutions | No. of nucleotides tested | Total (%) | A to G (%) | A to T (%) | A to C (%) | Motif (T at position 2) | No. of substitutions | No. of nucleotides tested | Total (%) | T to C (%) | T to A (%) | T to G (%)
GTAT | 106 | 520 | 20.4 | 6.7 | 7.8 | 5.8 | ATAC | 47 | 239 | 19.4 | 7.5 | 5.4 | 6.7
CTAT | 65 | 367 | 17.7 | 7.1 | 6.0 | 4.6 | ATAG | - | - | - | - | - | -
GTAG | 60 | 388 | 15.5 | 10.1 | 3.4 | 2.1 | CTAC | 26 | 235 | 11.1 | 6.8 | 2.6 | 1.7
TTAC | 31 | 238 | 13.0 | 5.0 | 3.8 | 4.2 | GTAA | - | - | - | - | - | -
CTAC | 29 | 238 | 12.2 | 4.2 | 3.8 | 4.2 | GTAG | 52 | 380 | 13.7 | 6.3 | 3.2 | 4.2
GAAT | - | - | - | - | - | - | ATTC | 75 | 641 | 11.7 | 5.3 | 3.7 | 2.7
GTAA | - | - | - | - | - | - | TTAC | 26 | 233 | 11.2 | 4.7 | 3.0 | 3.4
CAAT | 31 | 291 | 10.7 | 4.8 | 2.4 | 3.4 | ATTG | - | - | - | - | - | -
ATAC | 22 | 214 | 10.3 | 4.2 | 3.7 | 2.3 | GTAT | 38 | 452 | 8.4 | 4.4 | 1.6 | 2.4
GTAC | 32 | 319 | 10.0 | 6.3 | 1.9 | 1.9 | GTAC | 31 | 318 | 9.8 | 2.8 | 3.5 | 3.5
ATAT | 25 | 250 | 10.0 | 4.8 | 3.6 | 1.6 | ATAT | 20 | 245 | 8.2 | 4.5 | 2.9 | 0.8
CTAA | - | - | - | - | - | - | TTAG | 53 | 567 | 9.3 | 6.7 | 1.1 | 1.6
AAAT | 29 | 314 | 9.2 | 5.7 | 1.9 | 1.6 | ATTT | - | - | - | - | - | -
CAAG | 28 | 318 | 8.8 | 7.2 | 0.9 | 0.6 | CTTG | 19 | 342 | 5.6 | 4.7 | 0.3 | 0.6
CAAA | 25 | 304 | 8.2 | 6.3 | 1.0 | 1.0 | TTTG | - | - | - | - | - | -
GAAC | 50 | 619 | 8.1 | 5.5 | 0.7 | 1.9 | GTTC | 26 | 318 | 8.2 | 4.7 | 1.3 | 2.2
ATAG | - | - | - | - | - | - | CTAT | 26 | 328 | 7.9 | 4.6 | 2.1 | 1.2
CCAT | 44 | 608 | 7.2 | 3.3 | 2.6 | 1.3 | ATGG | - | - | - | - | - | -
GAAA | 15 | 212 | 7.1 | 4.3 | 0.5 | 2.4 | TTTC | - | - | - | - | - | -
TTAG | 35 | 549 | 6.4 | 2.9 | 2.4 | 1.1 | CTAA | - | - | - | - | - | -
AAAG | 10 | 160 | 6.3 | 5.0 | 0.6 | 0.6 | CTTT | 18 | 296 | 6.1 | 4.4 | 1.0 | 0.7
GAAG | 40 | 696 | 5.8 | 4.5 | 0.7 | 0.6 | CTTC | - | - | - | - | - | -
CCAA | 17 | 312 | 5.5 | 3.9 | 1.0 | 0.6 | TTGG | 22 | 685 | 3.2 | 1.3 | 0.3 | 1.6
TAAT | - | - | - | - | - | - | ATTA | 26 | 482 | 5.4 | 1.9 | 1.9 | 1.7
GCAT | - | - | - | - | - | - | ATGC | 12 | 230 | 5.2 | 4.4 | 0.9 | 0.0
TCAC | 32 | 651 | 4.9 | 2.2 | 1.2 | 1.5 | GTGA | 10 | 343 | 2.9 | 1.5 | 0.3 | 1.2
ACAT | 11 | 250 | 4.4 | 2.0 | 1.2 | 1.2 | ATGT | - | - | - | - | - | -
ACAA | 14 | 321 | 4.4 | 3.1 | 0.3 | 0.9 | TTGT | - | - | - | - | - | -
CCAC | - | - | - | - | - | - | GTGG | 44 | 1038 | 4.2 | 1.8 | 1.5 | 1.0
GGAC | 13 | 310 | 4.2 | 2.6 | 0.3 | 1.3 | GTCC | 10 | 725 | 1.4 | 0.4 | 0.4 | 0.6
GCAA | 12 | 296 | 4.1 | 2.7 | 0.7 | 7.0 | TTGC | - | - | - | - | - | -
CGAA | 10 | 247 | 4.1 | 2.0 | 0.8 | 1.2 | TTCG | - | - | - | - | - | -
TCAG | 7 | 176 | 4.0 | 1.7 | 0.0 | 2.3 | CTGA | 22 | 704 | 3.1 | 1.9 | 0.9 | 0.4
AGAG | 26 | 688 | 3.8 | 2.0 | 0.0 | 1.7 | CTCT | 12 | 675 | 1.8 | 0.7 | 0.2 | 0.9
TGAA | 24 | 643 | 3.7 | 2.5 | 0.0 | 1.2 | TTCA | 11 | 635 | 1.7 | 0.5 | 0.3 | 0.9
ACAC | 22 | 643 | 3.4 | 2.3 | 0.8 | 0.3 | GTGT | - | - | - | - | - | -
AGAA | 10 | 295 | 3.4 | 2.4 | 0.7 | 0.3 | TTCT | - | - | - | - | - | -
GCAC | 6 | 183 | 3.3 | 2.2 | 0.6 | 0.6 | GTGC | 27 | 982 | 2.8 | 1.5 | 0.7 | 0.5
TCAT | - | - | - | - | - | - | ATGA | 20 | 622 | 3.2 | 1.6 | 0.5 | 1.1
CCAG | 32 | 1057 | 3.0 | 2.0 | 0.5 | 0.6 | CTGG | 100 | 1634 | 6.1 | 0.8 | 0.4 | 5.0
GCAG | 33 | 1096 | 3.0 | 2.0 | 0.6 | 0.4 | CTGC | 6 | 271 | 2.2 | 1.9 | 0.0 | 0.4
GGAG | 30 | 1015 | 3.0 | 1.6 | 0.6 | 0.8 | CTCC | 36 | 1371 | 2.6 | 2.0 | 0.2 | 0.4
CAAC | - | - | - | - | - | - | GTTG | 10 | 339 | 3.0 | 1.5 | 0.9 | 0.6
CGAG | 9 | 312 | 2.9 | 1.3 | 1.0 | 0.6 | CTCG | - | - | - | - | - | -
TGAG | 25 | 915 | 2.7 | 1.2 | 0.6 | 1.0 | CTCA | 11 | 329 | 3.3 | 1.8 | 0.3 | 1.2
TAAA | - | - | - | - | - | - | TTTA | 8 | 300 | 2.7 | 1.7 | 0.3 | 0.7
GGAA | 8 | 327 | 2.5 | 1.8 | 0.3 | 0.3 | TTCC | 13 | 284 | 4.6 | 3.5 | 0.0 | 1.1
GGAT | 7 | 291 | 2.1 | 1.7 | 0.0 | 0.7 | ATCC | - | - | - | - | - | -
ACAG | 12 | 616 | 2.0 | 1.0 | 0.7 | 0.3 | CTGT | 12 | 1134 | 1.1 | 0.7 | 0.4 | 0.0
AGAC | 19 | 1002 | 1.9 | 1.5 | 0.2 | 0.2 | GTCT | 10 | 663 | 1.5 | 0.9 | 0.5 | 0.2
AGAT | - | - | - | - | - | - | ATCT | 10 | 539 | 1.9 | 0.9 | 0.0 | 0.9

ᵃ All possible four-nucleotide motifs found in the IGHV3-23*01 germline gene are included in the table. Substitutions in C or A (position 3) are given in the left half of the table. Substitutions in the corresponding G or T (position 2) of the reverse complementary motif are given in the right half. The total substitution rate as well as the substitution rates to each of the three possible nucleotides are given. A dash (-) means that the motif is not found in the IGHV3-23*01 germline gene. Motifs were not considered if nucleotides other than the targeted nucleotide were mutated in the given motif in the given sequence. The motifs are listed according to decreasing total substitution frequency in C or A. Where the motif containing C or A was not present, they are listed by the frequency in G or T, respectively. An asterisk (*) indicates that the motif is contained within the WRCY/RGYW motif.

Fig. 2 shows the substitution frequencies in the different positions in the VH region in the 386 nonproductive IGHV3-23*01 rearrangements. The overall substitution frequency per nucleotide in the VH region was 6.3%, varying from 5.6% for T residues to 6.8% for A residues (see Table I). The transition/transversion ratio was 1.01. In the mutated productive sequences, the transition/transversion ratio was significantly higher (1.30, p < 0.0001) and the mutation rate was lower (5.0%, p = 0.0026). This was largely due to a lower rate of replacement mutation, yielding a lower replacement-to-silent mutation ratio (2.53 vs 4.02, p < 0.0001) that is indicative of
selection and highlights the importance of using nonselected sequences to study the mechanism of SHM. To confirm that the mutations in the nonproductive rearrangements were indeed unselected, we studied mutations (including insertions and deletions) abrogating the open reading frame of the VH segment. Only 46 (1%) of 3701 mutated, productive rearrangements contained one or more stop codons. These were likely to result from Taq errors, because B cells lacking a functional Ag receptor are rapidly lost from the circulation and should therefore not appear in our material (40). In contrast, as many as 171 of the 386 nonproductive sequences contained stop codon(s) in the VH segment (44%, p < 0.0001 when compared with productive rearrangements). This significantly high proportion was not different from that found in rearrangements of the pseudogene IGHV3-h (45%, p = 1.00). This supports the notion that substitutions in the nonproductive IGHV3-23*01-derived sequences are indeed unselected.

Substitution in different nucleotides

Fig. 3 shows the distribution of substitution rates for all C (Fig. 3a), G (Fig. 3b), A (Fig. 3c), and T (Fig. 3d) positions, respectively, divided into the three different substitution types. The curves suggest that the substitutions in C and G residues were caused by a mechanism with a very high preference for some positions (substitution rates >30%) and a low preference for other positions (substitution rates <1%). In general, transitions predominated, followed by transversions to the complementary nucleotide. In contrast, the substitution rates in the A and T positions were less variable, and the rates for the two types of transversions were comparable. The graphs for C and G have similar courses, and so do the graphs for A and T, suggesting that the mutation mechanism is strand symmetric.

C and G substitution rates in different motifs correlate closely to the reported hot spot and cold spot motifs for cytidine deamination by AID

To further investigate strand specificity, we compared the substitution rates in different motifs with those of the reverse complementary motifs when both were present in the nontranscribed strand of the germline gene. If the mutations are targeted to both strands by similar mechanisms, this comparison should show a correlation between the mutation rates. The reason is that the mutation of a given residue in a motif on the nontranscribed strand should correspond to the mutation of the same nucleotide in the same motif on the transcribed strand. We analyzed all possible three-, four-, and five-nucleotide motifs and substitutions in all positions within these motifs. The strongest correlations between substitutions in the same position of the same motif on the two strands were found for C and A residues in position 3 in four-nucleotide motifs on the nontranscribed strand. These were compared with G and T, respectively, in position 2 in the reverse complementary motifs of the same strand (data not shown).

Table II (top portion) shows the substitution rates in all pairs of reverse complementary C/G motifs. Motifs are ranked by declining total substitution rate in the C residue, or in the corresponding G in case the C motif was not found on the nontranscribed strand. For a given motif, only sequences in which all motif positions other than the nucleotide in question were unmutated were included. This minimizes the risk of looking at an influence from neighboring substitutions. With this restriction, ∼70% of the 6,912 detected substitutions were included in the analysis. The 10 most mutable motifs (which all have a substitution rate of >10% for C and/or G) included six of the seven RGYW/WRCY motifs present in the sequence (where R is A or G, Y is C or T, and W is T or A). RGYW/WRCY motifs have earlier been defined as hot spots for SHM in vivo (41–43). The remaining RGYW/WRCY motif pair (AGCC/GGCT) had mutation rates of 7.8% and 7.3%, respectively. This indicates that RGYW/WRCY is indeed good at predicting high mutability. Throughout, the targeted nucleotide is C or A at position 3, or G or T at position 2, of the four-nucleotide motif, as in Table II. WRCY includes WRC, which has been found to be a hot spot for AID deamination of the C residue (19, 20, 44). Hence we find a good correlation between deamination hot spots and C/G SHM hot spots. The other four highly mutable motifs only deviate from RGYW/WRCY in position 1 or 4: CGCT [G], AGCA [C], TACG [C], and CGCT [C], with the targeted residue in brackets. Notably, the first three of these motifs are WRC/GYW motifs, suggesting that the first three nucleotides within the motifs are most important for the mutability.

SYC (where S is G or C) and SSC have been described as deamination cold spots (19, 20, 44). With one exception (GGTC), the 12 motifs with a substitution rate of <1% were all compatible with these or the reverse complementary motifs. The substitution rates in these motifs were only marginally above the Taq error rate (0.00048–0.0015 substitutions per nucleotide). Furthermore, those of the 24 possible SYCx and SSCx motifs (or the 24 reverse complementary motifs) that were present in the sequence had substitution rates of <3.9%. One exception was CGCT, with a substitution rate of 14.9%; however, this motif only deviates from the hot spot WRCY in position 1.

A set of rearrangements using IGHV3-h was also analyzed. Fifty-six of 103 sequences had more than three mutations in the VH region and were thus included in the substitution analysis. IGHV3-h is a pseudogene due to a disturbed translation initiation codon, and thus these sequences have not undergone selection following SHM. Generally, the mutation frequencies in the different motifs were lower than in the similar motifs in the IGHV3-23-using rearrangements. ATCC, TACT, AGCT, and AACT were the only motifs where the C mutation frequency was >10%. The last three are included in the consensus WRCY hot spot motif. ATCC and AGCT are both present only once in the sequence, leading to considerable uncertainty in the relatively small sample. All possible combinations of SYC and SSC present in the germline sequence were also found to have a mutation frequency of <1% in IGHV3-h.

A and T substitution rates in different motifs

For A/T substitutions, the 17 most mutated four-nucleotide motifs (see Table II, bottom portion) contained WA/TW motifs previously described as A/T SHM hot spots (30, 45). No motifs had an A/T substitution rate of <1%, suggesting that there are no A/T mutational cold spots.

In IGHV3-h sequences the A/T mutation frequency tended to be lower than in IGHV3-23. This was also the case for the C/G mutation frequency, indicating that IGHV3-h is less targeted by SHM than IGHV3-23. However, the three most mutable A/T motifs were the same in the two VH genes.

Correlation between substitution rates in reverse complementary motifs

The apparent correlation between the substitution rates in reverse complementary motifs prompted a closer analysis. Fig. 4a shows a significant correlation between the total substitution rates in C residues in position 3 in four-nucleotide motifs and in G residues in the corresponding reverse complementary motifs (p < 0.0001, Pearson's correlation analysis). Fig. 4b shows that the same holds for A/T substitutions (p < 0.0001). These data show that reverse complementary motifs were targeted equally well, indicating that the SHM machinery targeted individual motifs similarly on the two strands.

FIGURE 4. Correlation analysis for substitution rates (log transformed) in reverse complementary motifs on the transcribed and nontranscribed strand for C/G (a) and A/T (b) motifs, respectively. Only positions with more than five substitutions are shown in the figure. Strong correlations between substitution rates in reverse complementary motifs were found for both C/G motifs (p < 0.0001, R = 0.86, Pearson's correlation analysis) and A/T motifs (p < 0.0001, R = 0.83).

Substitutions in and around runs of G residues

Seven areas in the FRs contain 3–6 G residues in a row. These showed a particularly low degree of substitution, which correlates well with CCC/GGG being a cold spot. An interesting observation, however, was the substitution pattern of the T residue in position 3 of codon 15, immediately adjacent to a run of six G residues. This residue had a very high substitution frequency of 22.5%, the highest for any T residue (marked by an asterisk (*) in Fig. 3d). Ninety-three percent of the substitutions were transversions to G, a substitution type that accounted for only 23% of substitutions in other T residues (p < 0.0001, Fisher's exact test). Only two of 262 (0.8%) sequences with fewer than three mutations in the VH region

Table III. Substitution rates in the C and G nucleotides (boldfaced) of the four germline encoded AGCT motifs present in the 386 nonproductive IGHV3–23*01 rearrangements studieda

AGCTAGCT

Location Mutated Unmutated Rate (%) Mutated Unmutated Rate (%)

Codon 4/3 (FR1) 85 290 22.7 64 312 17.0 Codon 32 (CDR1) 80 286 21.9 144 219 39.7 Codon 40 (FR2) 86 279 23.6 130 233 35.8 Codon 55 (FR2/CDR2) 227 134 62.9 174 191 47.7

a AGCT motifs affected by deletions or insertions were excluded. The columns show the number of motifs that were either mutated or unmutated in the given position as well as the substitution rates. For both AGCT and AGCT the substitution rates varied significantly between the four locations ( p Ͻ 0.0001, ␹2 test). Also, the C and the G substitution rates within the same motif were found to differ significantly for the last three motifs ( p Ͻ 0.0005, pairwise t tests). had a substitution in this T residue (one to G and one to C). This tution rates of C on the nontranscribed and transcribed strands (G is significantly lower than in mutated sequences ( p Ͻ 0.0001), on the nontranscribed strand) within the same motif. indicating that Taq errors cannot account for the high When comparing individual substitution rates for most of the

substitution rate. other motifs that exist more than once, we find that they also vary Downloaded from In the IGHV3-h-using sequences the substitution rate in the cor- without any obvious relation to location (data not shown). This responding residue was also remarkably high (14.6%) and again indicates that a four-nucleotide motif only partially explains the consisted of mostly T to G transversions. Also, the T residue in the mutability in a given position. The substitution rate might still be last position of codon 7 preceding a run of five G residues showed influenced by the base composition in the neighboring region or predominately T to G transversions (75%), but at a somewhat the location in the gene and, hence, by the distance from the pro- lower substitution rate (5.3%). moter or perhaps the distance to an AID target. http://www.jimmunol.org/ Substitution rates when the motif is present more than once Many of the motifs are present more than once in the IGHV3-23 Correlation between the substitution rate and the distance to the Ј gene. The highly mutable AGCT motif, for example, is present nearest 3 AID target four times, but many other four-base motifs are present up to six Replication over an AID-generated uracil can only account for C times. As seen in Table III, the substitution rates of C and G in the to T transitions and G to A transitions (when occurring on the individual AGCT motifs are clearly different without any obvious transcribed strand), and the different processes involved in the re- relation to the location in the gene, i.e., whether the position is in pair of a uracil are thought to generate other mutations. These may a FR or a CDR. Nor is there any correlation between the substi- involve the generation of an abasic site by uracil removal or the by guest on September 26, 2021

removal of a stretch of nucleotides by endonucleases and/or exonucleases, followed by gap filling by error-prone polymerases that introduce substitutions in the flanking nucleotides. If this is the case, one would expect to find a correlation between the substitution rate and the distance to the nearest AID target motif. We tested such correlations in both directions on both strands, that is, the distance to the nearest 5′ WRC and the distance to the nearest 3′ WRC. Naturally, one cannot determine whether a mutation originally occurred on the nontranscribed or the transcribed strand, so correlations were calculated for two extreme scenarios: that all substitutions had occurred on the nontranscribed strand, or that all had occurred on the transcribed strand. Substitutions on the transcribed strand were counted as the complementary substitution on the nontranscribed strand, and correlations were calculated to the G in GYW on the nontranscribed strand. Table IV shows a statistically significant inverse correlation between substitution rates in T and G residues on both strands and the distance to the nearest 3′ WRC motif on the same strand. The only exception is T to G transversions on the nontranscribed strand. However, when the substitutions of the T residues in position 3 of codons 15 and 7 (the ones preceding the runs of G residues) are omitted, the inverse correlation between T to G substitutions and the distance to WRC on the nontranscribed strand becomes stronger (correlation coefficient −0.24, p = 0.06). For substitutions of A, there was a trend toward an inverse correlation between the substitution rate and the distance to the nearest 3′ AID hot spot, which was borderline significant for transversion to C only. There was no correlation between substitution rates and the distance to the nearest 5′ WRC for any type of substitution, suggesting that the error-inducing repair process following AID deamination works only 5′ of the deaminated C.

Table IV. Correlations between the substitution rate of a given nucleotide and the distance (in base pairs) to the C in the nearest 5′ or 3′ WRC AID deamination hot spot motif, respectivelyᵃ

| Substitution | Nontranscribed, 3′ WRC: r | p valueᵇ | Nontranscribed, 5′ WRC: r | p value | Transcribedᶜ, 3′ WRC: r | p valueᵇ | Transcribedᶜ, 5′ WRC: r | p value |
|---|---|---|---|---|---|---|---|---|
| All A | −0.28 | 0.0309 | 0.03 | 0.8267 | −0.19 | 0.1399 | −0.19 | 0.1675 |
| A to G | −0.19 | 0.1446 | 0.04 | 0.7626 | −0.13 | 0.3187 | −0.24 | 0.0680 |
| A to T | −0.33 | 0.0088 | −0.14 | 0.2910 | −0.19 | 0.1392 | −0.20 | 0.1287 |
| A to C | −0.33 | 0.0106 | 0.08 | 0.5532 | −0.31 | 0.0145 | −0.04 | 0.7950 |
| All T | −0.36 | 0.0040 | −0.05 | 0.6843 | −0.43 | 0.0004 | −0.09 | 0.5097 |
| T to C | −0.40 | 0.0013 | 0.05 | 0.6748 | −0.38 | 0.0020 | −0.09 | 0.5173 |
| T to A | −0.35 | 0.0044 | −0.07 | 0.5823 | −0.51 | <0.0001 | −0.15 | 0.2484 |
| T to G | −0.17 | 0.1794 | −0.12 | 0.3382 | −0.26 | 0.0430 | −0.04 | 0.7861 |
| All Cᵈ | −0.19 | 0.1848 | 0.01 | 0.9234 | −0.21 | 0.0819 | 0.01 | 0.9338 |
| C to T | −0.17 | 0.2423 | 0.04 | 0.7772 | −0.25 | 0.0358 | −0.04 | 0.7303 |
| C to G | −0.20 | 0.1641 | −0.03 | 0.8084 | −0.14 | 0.2360 | 0.03 | 0.7951 |
| C to A | −0.25 | 0.0723 | −0.03 | 0.8415 | −0.19 | 0.1077 | −0.06 | 0.6301 |
| All G | −0.37 | 0.0003 | 0.04 | 0.6909 | −0.41 | 0.0003 | −0.02 | 0.9006 |
| G to A | −0.35 | 0.0006 | −0.01 | 0.9197 | −0.41 | 0.0004 | −0.06 | 0.5952 |
| G to C | −0.36 | 0.0004 | 0.07 | 0.5310 | −0.35 | 0.0026 | −0.03 | 0.8238 |
| G to T | −0.21 | 0.0433 | 0.03 | 0.7936 | −0.24 | 0.0459 | −0.19 | 0.1087 |

ᵃ r, correlation coefficient; 3′ WRC and 5′ WRC, distance to the nearest 3′ or 5′ WRC motif. In the transcribed-strand columns, the substitutions are indicated as if they occurred on the transcribed strand (the strand containing the WRC motif), although technically they were detected as the reverse complementary substitutions on the nontranscribed strand. Distances to the nearest 3′ AID hot spot correlate significantly with T and G substitutions on both strands, while the same trend was only borderline significant for A substitutions. No correlation between substitution rates and the distance to the nearest 5′ WRC was found for any of the substitutions on either strand.
ᵇ The p value for the correlation between substitution frequency and distance to the given motif was obtained by Spearman correlation analysis. Values with p < 0.005 are considered significant; values with 0.005 < p < 0.05 are considered borderline significant because of the many comparisons.
ᶜ WRC on the transcribed strand is seen as GYW on the nontranscribed strand, A on the transcribed strand is seen as T on the nontranscribed strand, etc.
ᵈ Distance 0 (the C in WRC itself) was not included in any of the calculations.

Influence of substitutions in the neighboring nucleotide

We also tested whether substitutions in the C or G of a given AGCT target would influence the mutation rate of the neighboring G/C by changing the hot spot motif to a less mutable one. When comparing sequences with such substitutions to those without, we found that the substitution rate in the neighboring G/C was less than half as high (average 43%). This was true for all four AGCT motifs present in the germline sequence, indicating that a substitution in a given position can indeed influence the mutability of the neighboring nucleotide in subsequent rounds of SHM. We saw, however, no significant differences in the substitution pattern beyond the closest nucleotide (data not shown).

Substitution rates in JH gene and D gene motifs resemble those of the VH region

Because of the size of our material, we had an opportunity to study mutations in individual JH genes and in several D genes. Fig. 5 shows the substitution rates for individual residues in IGHJ6 (average 4.7% per nucleotide) and IGHJ4 (average 3.9%); these averages are not significantly different (p = 0.19). The highest substitution rates were seen in the 5′ end of the JH genes, which falls within the CDR3 and contains most of the motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAT, and CTAT; see Table II). Substitution rates in these motifs in the JH genes correspond well to the rates found in IGHV3-23, indicating that the same regulatory mechanism controls VH and JH mutations. The 3′ ends of the genes, which encode FR4, have fewer mutations, consistent with a high content of cold spots (e.g., GCC, GGC, GTC, GGC, and GGG).

The overall mutation rate in D segments was rather high (average 7.8% per nucleotide). An analysis of individual motifs showed that the high mutation rate could be explained by a high content of hot spot motifs in the D genes. The targeted hot spots had substitution frequencies comparable to those of the same motifs in the VH region (see Table V).

FIGURE 5. Substitution rates in IGHJ6 (n = 250) (a) and IGHJ4 (n = 113) (b). The first two positions in each sequence were not included in the analysis to reduce bias from uncertain definition of the 5′ end of the JH gene when the first or second base was mutated. Most of the substitutions are in the 5′ end of the genes, lying within the CDR3, whereas the part lying in FR4 has a low intrinsic mutability. The high mutability of the CDR3 parts could be explained by a high content of motifs found to have a high substitution rate in the VH region (e.g., TACT, GGTA, CTAC, and CTAC; the targeted nucleotide is set in boldface type), while the FR4 parts are rich in cold spot motifs (e.g., stretches of multiple G or C residues).

Mutation frequencies depend on the JH gene

We noticed that the fraction of unmutated sequences, defined as sequences with fewer than three mutations in the VH region, varied depending on the JH gene. Twenty-one percent of the sequences using JH4 (30 of 143 sequences) were unmutated, whereas 46.5% of the JH6-carrying sequences were unmutated (223 of 473 sequences). This difference was statistically significant (p < 0.0001).
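The JH4/JH6 comparison above can be rechecked from the stated counts. The text does not name the test used, so the sketch below assumes a Pearson chi-square test on the 2 × 2 table of unmutated versus mutated sequences, without continuity correction; the function name `chi_square_2x2` is illustrative only. Computed directly from these counts, the unmutated fractions are 30/143 ≈ 21.0% and 223/473 ≈ 47.1%.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and its p value
    for the 2 x 2 table [[a, b], [c, d]], which has one degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts from the text; "unmutated" means fewer than three VH mutations.
jh4_unmutated, jh4_total = 30, 143
jh6_unmutated, jh6_total = 223, 473

chi2, p = chi_square_2x2(
    jh4_unmutated, jh4_total - jh4_unmutated,  # JH4: unmutated, mutated
    jh6_unmutated, jh6_total - jh6_unmutated,  # JH6: unmutated, mutated
)

print(f"JH4 unmutated: {jh4_unmutated / jh4_total:.1%}")
print(f"JH6 unmutated: {jh6_unmutated / jh6_total:.1%}")
print(f"chi-square = {chi2:.1f}, p = {p:.1e}")
```

For one degree of freedom the chi-square tail probability reduces to a complementary error function, so only the standard library is needed; a Fisher's exact test would be a reasonable alternative if the counts were small.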

Table V. Substitution rates in different motifs in D genes in mutated, nonproductive rearrangements compared with those of the same motifs in IGHV3-23ᵃ

C and G substitutions

| Motif (C) | IGHV3-23 rate (%) | D gene: no. of substitutions | D gene: positions tested | D gene: rate (%) | Motif (G) | IGHV3-23 rate (%) | D gene: no. of substitutions | D gene: positions tested | D gene: rate (%) |
|---|---|---|---|---|---|---|---|---|---|
| AGCT | 34.6 | 15 | 91 | 16.5 | AGCT | 35.2 | 19 | 92 | 20.7 |
| AGCA | 23.4 | 2 | 36 | 5.6 | TGCT | — | 4 | 52 | 7.7 |
| TACT | 15.9 | 8 | 31 | 25.8 | AGTA | — | 11 | 102 | 10.8 |
| TACC | — | 11 | 47 | 23.4 | GGTA | 11.9 | 1 | 24 | 4.2 |
| TACG | 10.2 | 1 | 25 | 4.0 | CGTA | 21.2 | — | — | — |
| AACC | — | — | — | — | GGTT | 10.4 | 11 | 54 | 20.4 |
| CACT | — | — | — | — | AGTG | 5.9 | 2 | 49 | 4.1 |
| ACCA | — | 8 | 51 | 15.7 | TGGT | 5.6 | 6 | 122 | 4.9 |
| GACT | 4.7 | 0 | 27 | 0.0 | AGTC | 9.9 | — | — | — |

A and T substitutions

| Motif (A) | IGHV3-23 rate (%) | D gene: no. of substitutions | D gene: positions tested | D gene: rate (%) | Motif (T) | IGHV3-23 rate (%) | D gene: no. of substitutions | D gene: positions tested | D gene: rate (%) |
|---|---|---|---|---|---|---|---|---|---|
| GTAG | 20.4 | 8 | 114 | 7.0 | CTAC | 11.1 | 2 | 17 | 11.8 |
| CTAT | 17.7 | 4 | 25 | 16.0 | ATAG | — | 4 | 31 | 12.9 |
| TTAC | 13.0 | 2 | 34 | 5.9 | GTAA | — | — | — | — |
| CTAC | 12.2 | 5 | 20 | 25.0 | GTAG | 13.7 | 15 | 125 | 12.0 |
| ATAT | 10.0 | 4 | 43 | 9.3 | ATAT | 8.2 | 1 | 12 | 8.3 |
| GTAC | 10.0 | 2 | 14 | 14.3 | GTAC | 9.8 | 6 | 63 | 9.5 |
| ATAG | — | 3 | 22 | 13.6 | CTAT | 7.9 | 2 | 31 | 6.5 |
| AAAT | 9.2 | — | — | — | ATTT | — | 1 | 32 | 3.1 |
| CCAC | — | — | — | — | GTGG | 4.2 | 4 | 104 | 3.9 |
| GCAG | 3.0 | 0 | 46 | 0.0 | CTGC | 2.2 | 3 | 80 | 3.8 |
| CCAG | 3.0 | 3 | 47 | 6.4 | CTGG | 6.1 | 0 | 32 | 0.0 |
| GGAG | 3.0 | 0 | 46 | 0.0 | CTCC | 2.6 | — | — | — |

ᵃ Substitutions in residues at the ends of D segments tend to be underestimated because substitutions in these residues may change the way the joint region is interpreted. To compensate for this problem, the two 5′ and the two 3′ nucleotides of each D segment were excluded from the analysis. The analyzed residue is the C, G, A, or T indicated by the column heading, and the motifs are listed in order of decreasing substitution rate in C or A residues on the nontranscribed strand, respectively. A dash (—) means that the motif is not found in the gene(s).
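Each row of Table V pairs a motif with its reverse complement, so the two halves of a row describe the same site read from opposite strands. The sketch below illustrates the strand-symmetry comparison (the Pearson correlation used for Fig. 4) on the small subset of A/T rows in which both motifs occur in IGHV3-23. It covers seven pairs only and is not a reproduction of the Fig. 4 analysis, which used all motifs in the VH region; the helper names `reverse_complement` and `pearson` are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# IGHV3-23 rates (%) from the A/T half of Table V:
# (A motif, A rate, T motif, T rate); only rows where both motifs occur in IGHV3-23.
rows = [
    ("GTAG", 20.4, "CTAC", 11.1),
    ("CTAC", 12.2, "GTAG", 13.7),
    ("ATAT", 10.0, "ATAT", 8.2),
    ("GTAC", 10.0, "GTAC", 9.8),
    ("GCAG", 3.0, "CTGC", 2.2),
    ("CCAG", 3.0, "CTGG", 6.1),
    ("GGAG", 3.0, "CTCC", 2.6),
]

# Each row pairs a motif with its reverse complement (same site, other strand).
for a_motif, _, t_motif, _ in rows:
    assert reverse_complement(a_motif) == t_motif

r = pearson([row[1] for row in rows], [row[3] for row in rows])
print(f"Pearson r (A rate vs. T rate in the reverse complement): {r:.2f}")
```

With so few pairs the coefficient is only indicative; the same functions could be applied to the full set of motif rates behind Fig. 4.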

looking at the mutated sequences only (more than three mutations based on 25 nonproductive human ␭ IgL rearrangements and 17 in the VH region) we also found that the substitution frequency IgH rearrangements (47). Similarly, the reported deamination cold varied between the two subsets of sequences. JH4-carrying se- spots (SYC and SSC) (19, 20, 44) were found to be cold spots for quences had an average of 18.1 substitutions in the VH region SHMs in C residues on both strands. ϭ compared with 15.2 for JH6 ( p 0.03). Substitution rates in the It is interesting to note that the many highly mutable four-nu- different hot and cold spot motifs were comparable in sequences cleotide motifs include a hot spot for C deamination on both using JH6 and JH4. Also, there was no difference in the ratio be- strands, e.g., AGCT, as this provides a simple explanation for the tween C/G and A/T substitutions in JH6- and JH4-carrying se- double-strand breaks shown to appear during the course of SHM quences ( p ϭ 0.29), suggesting that it was the overall substitution (48, 49). Others, however, have not been able to find a clear cor- rate that was decreased. relation between SHM and double-strand breaks in the BL2 cell line (50). Discussion Replication over an AID-generated uracil can only account for C and G substitution motifs and enzyme specificities C/G transitions and, thus, the preferences of other enzymes in- Using the hitherto largest published set of nonfunctionally rear- volved in the mutation process may influence the mutability of the ranged, somatically mutated human IgH sequences, we found that nucleotides in and around a given motif. UNG or a complex of the most mutable motifs for the SHM of C and G residues corre- MSH2 and MSH6 are proposed to be able to remove the created sponded well to the previously described WRCY/RGYW four- uracil, and the sequence specificities of these enzymes may there- nucleotide motifs (41–43). 
It has been claimed that WRCH/ fore influence the resolution of the U:G mismatch and hence the DGYW is an even better predictor for C/G mutability (46); mutability of different motifs. Bovine and E. coli UNG have been however, this cannot be supported by our data (targeted nucleo- shown to have high activity in ATU (51, 52), which corresponds tides are set in boldface type). TACA and TGCA, for example, well with the finding of ATC being a hot spot. However, there are have mutation rates as low as 1.3 and 3.3%, respectively, which is also some discrepancies because AGU, corresponding to the AGC lower than average. The discrepancies can be due to the differences mutational hot spot, displays intermediate to low uracil removal in sample sizes and methods. efficiency (51, 52). It is possible that human UNG has a different The WRCY motifs include the reported WRC deamination pref- nucleotide preference or that MSH2/MSH6 provides the necessary erence of AID (19, 20, 44). This is in line with a previous report backup. Substitutions in C and G residues during phase II could The Journal of Immunology 4331 also influence the substitution rates. In support of this, it has been the location of most WRC sequences in the CDRs. However, the shown in mice that the inactivation of pol ␪ reduces the number of fact that not only the distance but also the direction is important C/G substitutions, particularly in hot spot motifs (36). Another shows that this is not the case. These findings suggest that phase II possible phase II C/G mutator is Rev1. Rev1-deficient mice have SHM predominately targets T and G residues in the AID-targeted been shown to have fewer C to G and G to C mutations while the strand 5Ј of the deaminated C. Alternatively, phase II substitutions relative frequencies of C to A, T to C, and A to T substitutions could target the corresponding A and/or C on the opposite strand. 
were increased (53), suggesting that — at least in mice — Rev1 is Because both strands are targeted, both models account for phase involved in the generation of several types of phase II II substitutions in all four nucleotides. These two models are dis- substitutions. cussed further below. A and T substitutions G and T substitutions 5Ј of the AID target suggests involvement Ј Ј A and T substitutions occur only during phase II. Several enzymes ofa3-5 nuclease followed by gap filling have been shown to be involved, among them some error-prone The inverse correlation between the substitution rate in G and T DNA polymerases likely to be involved in DNA repair following residues and the distance to the nearest 3Ј WRC motif suggests a the removal of the AID-generated uracil. One such error-prone molecular mechanism involving a 3Ј-5Ј exonuclease and/or an en- polymerase is the translesion pol ␩. Pol ␩-deficient mice and pa- donuclease. Such enzyme(s) could be recruited to the abasic site tients with variant who have a mutation created at the site of the initial deamination event where it/they in the gene encoding pol ␩ display a reduced level of A/T substi- could remove a stretch of DNA 5Ј of the abasic site. DNA removal tution despite a normal overall mutation rate (29, 31, 54). Mouse in turn could be followed by error-prone gap filling. Downloaded from pol ␩ is expressed in germinal center B cells (29) and have been Several human 3Ј-5Ј exonucleases are known. These include found to interact with MSH2/6 (31), suggesting a possible way of polymerases ␦ and ␧, WRN, APE1, and MRE11 (62). MRE11 recruitment. The substitution pattern of mouse and human pol ␩ in forms the MRN complex along with RAD50 and NBS1. Ectopic vitro shows a preference for mutations in WA/TW motifs (30), expression of NBS1 increases SHM in a hypermutating Ramos cell which corresponds well with our findings that the most mutable A line (63), suggesting that MRN is involved. 
This is further sup- and T four-nucleotide motifs include the WA and TW motif, ported by the finding that MRE11 binds to a rearranged VH region http://www.jimmunol.org/ respectively. only in mutating cells and that recombinant MRE11/RAD50 can Pol ␨ and pol ␫ have also been suggested as being involved in cleave abasic sites in ssDNA (64). The ability to cleave DNA is phase II mutations (32, 33, 35, 55), and their error preferences may separable from the 3Ј-5Ј activity (64) and it is possible that both also influence the substitution patterns. Pol ␫ is, for example, functions are important for SHM. APE1 is also capable of DNA known for a preference for creating A to G transversions and for cleavage at abasic sites; however, APE1 does not bind to VH re- incorporating G and T opposite dUTP and A opposite an abasic gion (64), speaking against an involvement in SHM. site (56). EXO1-deficient mice have normal mutation frequencies but their mutations are C/G biased and hot spot focused (34), suggest- Strand symmetry indicates that SHM happens on both strands ing a possible involvement of EXO1 in phase II SHM. EXO1 binds by guest on September 26, 2021

Substitution rates in complementary A and T residues showed to the VH region, but not the C region in hypermutating BL-2 cells strand symmetry, indicating that both strands are targeted by phase (34). However, EXO1 is a 5Ј-3Ј exonuclease and therefore does II mutations. Likewise, the correlation between the C substitution not fit into this model unless it also has 3Ј-5Ј exonuclease or en- frequency in a particular motif and the G substitution frequency in donuclease activity as previously suggested (65). Alternatively, its the motif corresponding to the same motif on the transcribed strand involvement in SHM may not be as a nuclease. strongly suggests that AID can deaminate both strands equally As mentioned earlier, several error-prone DNA polymerases well during the initial phase of SHM. This is in agreement with have been suggested as being involved in phase II mutations in- data from Foster et al. and Boursier et al. who found that SHM cluding polymerases ␪, ␩, ␫, and ␨ (29, 31, 54, 32, 33, 55). These could target both strands in the human ␬ locus (42) and ␭ locus could be involved in gap filling following DNA removal. How- (47), respectively. Studies of AID deamination in vitro are, how- ever, to account for the finding of a correlation only between T and ever, contradictory at this point, as some find only deamination of G substitution rates and distances to WRC, the involved poly- the nontranscribed strand (19, 57) while others have shown that the merase(s) would have to make mistakes mainly opposite A and C transcribed strand can also be targeted (58–60), although in some residues. cases to a lesser extent than the nontranscribed strand. 
This dis- An alternative explanation that easily accounts for the strong crepancy could be due to the different experimental ways of de- correlation between substitutions in T residues but a less strong tecting deamination in vitro, and the presence of cofactors in vivo correlation for A substitutions is that a large fraction of the T/A may help the targeting of AID to both strands. One such cofactor, substitutions could be caused by the occasional incorporation of which has very recently been shown to be involved in targeting dUTP (instead of dTTP) opposite A during phase II repair. The AID in engineered mice, is MSH6 (61). MSH6 thus seems to be occasional incorporation of dUTP as a means of generating SHMs involved in not only phase II but also phase I SHM. has been suggested by Neuberger et al. (22). According to their model, the incorporated dUTP would subsequently be excised Models to explain correlations between substitutions in T and G and substitutions would be generated during replication over Ј residues and the distance to the nearest 3 AID target the abasic site. Interestingly, we found significant inverse correlations between phase II substitution rates on T and G residues and the distance to Phase II substitution on the opposite strand the nearest 3Ј WRC AID hot spot. Only nonsignificant or border- As mentioned above, the finding of inverse correlation between T and line significant trends were found for A and C substitutions. In G substitution rates and the distance to the nearest 3Ј WRC can also contrast, no correlations were found between substitution rates and be explained if the main targets for phase II mutations are C and A the distance to the nearest 5Ј WRC. It could be argued that the residues on the strand opposite the AID target. 
This could, for ex- inverse correlation to the distance to WRC is an artifact caused by ample, be the case if the generated uracil is either removed to 4332 SHM OCCUR IN AND 5Ј OF AID HOT SPOTS ON BOTH STRANDS generate an abasic site or is left untouched until replication. During DNA 2–5 nucleotides upstream of G quartets (66). This endonu- replication, the abasic site/uracil could cause the DNA polymerase clease may possibly be involved in the cleavage of the DNA lead- to stall and recruit an error-prone translesion polymerase. In fact, ing to SHM, although the observed substitution pattern is not all of the error-prone polymerases implicated in SHM (␪, ␩, ␫, and readily explained. ␨) are known translesion polymerases. The translesion polymerase would be predicted to be engaged opposite the targeted C and D and JH gene substitutions introduce errors while synthesizing a short DNA segment. If errors We found that substitution rates in motifs in the VH region were are preferentially introduced opposite T and G residues, this model comparable to the substitution rates found in the same motifs in the would explain our findings. This is, for example, the case for poly- D and VH genes. It is noteworthy that the two JH genes analyzed merase ␫, which has been show to preferentially incorporate G contain very few C/G mutational hot spot motifs in the region opposite T, leading toaTtoCtransition on the AID-targeted encoding FR4 while the regions encoding CDR3 have several hot Ј strand 5 of the targeted C (56). spots, for example four overlapping TACT motifs in JH6 creating hot spots for T, A, and C mutations. Hot spots are also common in Influence by substitutions on the neighboring nucleotides the D genes contributing to the CDR3. In contrast, the FR4 regions

We found that substitutions in C and G residues in hot spot motifs encoded by the JH genes contain many cold spot motifs and decreased the substitution frequency of the neighboring G/C res- showed very low substitution rates. This suggests that there has idue. This can be explained if the SHM machinery does not nor- been an evolutionary selection against mutational hot spots in the mally make substitutions in neighboring nucleotides during the FR4 region. That would be in line with earlier studies suggesting same cell cycle, because substitutions in the hot spot motifs most that the codon usage in CDR regions of IgV has been optimized for Downloaded from often lead to a less mutable motif such as AGCA (mutated in SHM (9, 10, 68). 23.8% of the motifs) becoming AACA (8% mutation), for exam- ple. It is also possible that substitutions do occur on both strands The substitution rate depends on the JH gene during the same round of SHM but that the two strands subse- Although substitution patterns in the JH genes are the same as in quently end up in sister cells before the substitutions are fixed. the VH region, we find that the mutation status of a rearrangement

Despite the previously shown inverse correlation between the partly depends on the JH gene. JH6-carrying sequences are less http://www.jimmunol.org/ mutation rate of a given nucleotide and the distance to the nearest likely to be mutated than JH4-carrying sequences and, when mu- 3Ј WRC, we find that mutations in a given AGCT position (in- tated, they contain fewer substitutions on average. The mutation cluded in the WRC/GYW motif) do not influence the overall mu- pattern does not seem to change but the overall frequency is de- tation distribution in the sequence. This suggests that the C/G nu- creased. The fact that the difference was found in nonproductive cleotide that undergoes the initial deamination step (index rearrangements makes the simplest explanation, namely that the nucleotide) leading to phase II SHM is sometimes repaired, while cells with a JH6-containing rearrangement constituted a special on other occasions the mutation is fixed during phase II. If the cell subset containing fewer mutations, very unlikely, because to index nucleotide mutation is always fixed during phase II we account for the observed findings rearrangements on both alleles Ј by guest on September 26, 2021 would expect to find more mutations 5 of the index nucleotide in would then have to use the same JH gene. This is not thought to be the sequences mutated in the index and, on the contrary, if the the case. index is always repaired we would expect to find fewer mutations. Rearrangements using JH6 tend to have longer CDR3 loops than rearrangements with, for example, JH4 (69) (L. Ohm-Laursen Substitution pattern of T residue preceding a run of G residues et al., manuscript in preparation) and CDR3s are longer among The substitution pattern of the T residue in the last position of unmutated sequences compared with mutated (69–71 and S. codon 15 is remarkable because the frequency is very high, it has Petersen and T. Barington, manuscript in preparation). 
This cor- almost exclusively T to G transversions, and it is outside any responds well with the finding of more JH6 sequences in the un- known SHM hot spot motifs. The adjacent run of six G residues mutated subset. However, even when the mutation analysis is re- contains almost no substitutions but a high frequency of 1–3 nu- stricted to rearrangements within a narrow range of CDR3 lengths cleotide insertions likely to be caused by slippage. (44–52 bp), we still find that the JH6-carrying sequences are less However, comparison of the numbers of substitutions in the T likely to be mutated and, when mutated, are significantly less mu- residue in the mutated (62 in 383 sequences) and unmutated (2 in tated than JH4-carrying sequences (data not shown). Thus, the 263 sequences), nonproductive sequences clearly shows that Taq length of the CDR3 does not seem to account for the changed errors are not the cause of the unusual substitution pattern. Sub- mutation frequency. stitutions in T or A residues preceding the other runs of at least Therefore, we suggest that rearrangements using JH6 have spe- four G residues are also predominantly to G. The JH genes also cial properties influencing the mutation rate. Perhaps a binding site have a run of four G residues, but this region is found to contain for a cofactor is located within the intronic region upstream of JH6 very few mutations in all sequences. and is therefore deleted when JH6 is used. Also, it is possible that Ј Runs of three or four G residues in G-rich motifs are known to JH6 is simply too close to the regulatory elements in the 3 intronic be able to form G quartets when single stranded (66). G quartets, enhancer (E␮) (72, 73) for optimal effect. are for example, formed in the G-rich nontranscribed strand of the Another possible regulator could be the E box motif 5Ј-CAG switch region during CSR, and AID is found to bind to them (67). GTG-3Ј, which is known to bind the regulatory E47 protein (74). 
Ј It can be hypothesized that the runs of G residues in the variable This motif is found in the 3 end of JH1, JH2, JH4, and JH5 but not region also fold into G quartets during the transcription-dependent, in JH3 and JH6, where the last nucleotide of the motif is exchanged single-stranded phase of SHM. Although AID may bind to the to an A. When inserted into the ␬ locus, the 5Ј-CAGGTG-3Ј motif quartets, the activity of the enzyme may be inhibited, which would has been shown to enhance SHM in transgenic mice without account for the low substitution rate. How this can lead toaTto changing the mutation pattern (74). Mutation of the E box motif to G transversion in the flanking base is unknown. One possibility is 5Ј-AAGGTG-3Ј decreases this effect. Furthermore, inactivation of that the quartets attract other . GQN1 is a human endonu- the E2A gene in the DT40 chicken B cell line reduces SHM. Mu- clease highly expressed in B cells and has been shown to cleave tations can be restored by the expression of either of the E2A splice The Journal of Immunology 4333 variants, E47 or E12, showing the importance of these proteins for tion- induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell the level of SHM (75). Because the 5Ј-CAGGTA-3Ј motifs found 102: 553–563. Ј Ј 12. Revy, P., T. Muto, Y. Levy, F. Geissmann, A. Plebani, O. Sanal, N. Catalan, in JH3 and JH6 deviate from the consensus 5 -CANNTG-3 E-box M. Forveille, R. Dufourcq-Labelouse, A. Gennery, et al. 2000. Activation-in- motif, we therefore hypothesize that this one nucleotide difference duced cytidine deaminase (AID) deficiency causes the autosomal recessive form of the Hyper-IgM syndrome (HIGM2). Cell 102: 565–575. may be involved in reducing the mutational load of JH6-carrying 13. Arakawa, H., J. Hauschild, and J. M. Buerstedde. 2002. Requirement of the sequences compared with JH4-carrying sequences. Regardless of activation-induced deaminase (AID) gene for immunoglobulin gene conversion. 
Concluding remarks

The Ig repertoire is known to be shaped by the SHM of the variable regions during an immune response. In this study we report that the mutation machinery operates equally well on both strands. The substitution frequency of a given residue is dependent on the motif in which it resides and on the distance to the nearest 3′ AID deamination hot spot, suggesting that phase II substitutions occur 5′ of the site of the initial deamination. Alternatively, phase II substitutions occur on the opposite strand 3′ of the G residue facing the AID-targeted C residue. Substitutions in the neighboring nucleotide also influence the substitution frequency of C and G in AGCT double hot spots. Motifs are the same in VH, D, and JH genes; however, the JH gene of the rearrangement influences the overall mutation frequency, because JH6-using rearrangements are found to contain fewer mutations than JH4-using rearrangements. The sequences in this study use the IGHV3-23*01 VH gene, and it can therefore be argued that the findings may be specific to this VH gene. However, when possible we have confirmed the results by analysis of a set of sequences using the IGHV3-h pseudogene. Also, previous work from many groups suggests that the mutation process is similar irrespective of which VH genes have been studied. We therefore suggest that the results presented in this paper are also applicable to other human VH genes.
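The distance measure in these concluding remarks is how far a residue lies from the nearest AID hot spot on its 3′ side. That measure, and the argument that both strands are targeted, can be made concrete with a short scan. This is a sketch under stated assumptions and is not the authors' pipeline:

- The hot spot is taken to be WRC (W = A/T, R = A/G, followed by the target C), as in the abstract.
- Distance is counted in nucleotides from the residue to that C, which must lie strictly 3′ of it on the same strand.
- The other strand is handled by reverse-complementing the input.

All function names are hypothetical.

```python
import bisect
import re
from typing import List, Optional

# WRC hot spot: W = A/T, R = A/G, then the C that AID deaminates.
# The zero-width lookahead lets overlapping motifs all be reported.
WRC = re.compile(r"(?=[AT][AG]C)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wrc_targets(seq: str) -> List[int]:
    """0-based positions of the targeted C in every WRC motif of seq."""
    return [m.start() + 2 for m in WRC.finditer(seq.upper())]


def distance_to_nearest_3prime_wrc(seq: str) -> List[Optional[int]]:
    """For each position i, nucleotides to the nearest WRC target C lying 3' of i.

    None marks positions with no WRC target further downstream.
    """
    targets = wrc_targets(seq)  # sorted, because finditer scans left to right
    distances: List[Optional[int]] = []
    for i in range(len(seq)):
        k = bisect.bisect_right(targets, i)  # first target strictly greater than i
        distances.append(targets[k] - i if k < len(targets) else None)
    return distances


if __name__ == "__main__":
    top = "GTAGCTGG"  # hypothetical toy sequence containing an AGCT double hot spot
    print(wrc_targets(top))                     # [4]
    print(distance_to_nearest_3prime_wrc(top))  # [4, 3, 2, 1, None, None, None, None]
    bottom = revcomp(top)                       # 'CCAGCTAC'
    print(wrc_targets(bottom))                  # [4, 7]
```

AGCT is its own reverse complement, so in the toy sequence it yields a WRC target on both strands. On the top strand the target is the C at index 4. On the bottom strand it is index 4, which corresponds to the G at top-strand index 3. This is the sense in which AGCT is a double hot spot. Measuring to the target C rather than to the start of the motif, and excluding the residue itself, are modelling choices made for this sketch.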
Human uracil-DNA glycosylase deficiency associated with profoundly impaired immunoglobulin process is similar irrespective of which VH genes have been stud- class-switch recombination. Nat. Immunol. 4: 1023–1028. ied. We therefore suggest that the results presented in this paper 25. Rada, C., M. R. Ehrenstein, M. S. Neuberger, and C. Milstein. 1998. Hot spot by guest on September 26, 2021 are also applicable to other human VH genes. focusing of somatic hypermutation in MSH2-deficient mice suggests two stages of mutational targeting. Immunity 9: 135–141. 26. Martomo, S. A., W. W. Yang, and P. J. Gearhart. 2004. A role for Msh6 but not Acknowledgment Msh3 in somatic hypermutation and class switch recombination. J. Exp. Med. We are grateful to Nina Eggers for excellent technical assistance. 200: 61–68. 27. Shen, H. M., A. Tanaka, G. Bozek, D. Nicolae, and U. Storb. 2006. Somatic Disclosures hypermutation and class switch recombination in Msh6Ϫ/ϪUngϪ/Ϫ double- knockout mice. J. Immunol. 177: 5386–5392. The authors have no financial conflict of interest. 28. Matsuda, T., K. Bebenek, C. Masutani, I. B. Rogozin, F. Hanaoka, and T. A. Kunkel. 2001. Error rate and specificity of human and murine DNA poly- References merase eta. J. Mol. Biol. 312: 335–346. 1. Peters, A., and U. Storb. 1996. Somatic hypermutation of immunoglobulin genes 29. Zeng, X., D. B. Winter, C. Kasmer, K. H. Kraemer, A. R. Lehmann, and is linked to transcription initiation. Immunity 4: 57–65. P. J. Gearhart. 2001. DNA polymerase eta is an A-T mutator in somatic hyper- 2. Fukita, Y., H. Jacobs, and K. Rajewsky. 1998. Somatic hypermutation in the mutation of immunoglobulin variable genes. Nat. Immunol. 26: 537–541. heavy chain locus correlates with transcription. Immunity 9: 105–114. 30. Rogozin, I. B., Y. I. Pavlov, K. Bebenek, T. Matsuda, and T. A. Kunkel. 2001. 3. Delpy, L., C. Sirac, C. Le Morvan, and M. Cogne. 2004. 
Transcription-dependent Somatic mutation hot spots correlate with DNA polymerase eta error spectrum. somatic hypermutation occurs at similar levels on functional and nonfunctional Nat. Immunol. 2: 530–536. rearranged IgH alleles. J. Immunol. 173: 1842–1848. 31. Martomo, S. A., W. W. Yang, R. P. Wersto, T. Ohkumo, Y. Kondo, M. Yokoi, 4. Betz, A. G., C. Milstein, A. Gonzalez-Fernandez, R. Pannell, T. Larson, and C. Masutani, F. Hanaoka, and P. J. Gearhart. 2005. Different mutation signatures M. S. Neuberger. 1994. Elements regulating somatic hypermutation of an immu- in DNA polymerase eta- and MSH6-deficient mice suggest separate roles in noglobulin ␬ gene: critical role for the intron enhancer/matrix attachment region. antibody diversification. Proc. Natl. Acad. Sci. USA 102: 8656–8661. Cell 77: 239–248. 32. Diaz, M., L. K. Verkoczy, M. F. Flajnik, and N. R. Klinman. 2001. Decreased 5. Terauchi, A., K. Hayashi, D. Kitamura, Y. Kozono, N. Motoyama, and frequency of somatic hypermutation and impaired affinity maturation but intact T. Azuma. 2001. A pivotal role for DNase I-sensitive regions 3b and/or 4 in the germinal center formation in mice expressing antisense RNA to DNA polymerase induction of somatic hypermutation of IgH genes. J. Immunol. 167: 811–820. ␨. J. Immunol. 167: 327–335. 6. Kodama, M., R. Hayashi, H. Nishizumi, F. Nagawa, T. Takemori, and H. Sakano. 33. Zan, H., A. Komori, Z. Li, A. Cerutti, A. Schaffer, M. F. Flajnik, M. Diaz, and 2001. The PU. 1 and NF-EM5 binding motifs in the Igkappa 3Ј enhancer are P. Casali. 2001. The translesion DNA polymerase ␨ plays a major role in Ig and responsible for directing somatic hypermutations to the intrinsic hot spots in the bcl-6 somatic hypermutation. Immunity 14: 643–653. transgenic V(kappa) gene. Int. Immunol. 13: 1415–1422. 34. Bardwell, P. D., C. J. Woo, K. Wei, Z. Li, A. Martin, S. Z. Sack, T. Parris, 7. Rada, C., and C. Milstein. 2001. The intrinsic hypermutability of antibody heavy W. Edelmann, and M. D. 
Scharff. 2004. Altered somatic hypermutation and re- and light chain genes decays exponentially. EMBO J. 20: 4570–4576. duced class-switch recombination in exonuclease 1-mutant mice. Nat. Immunol. 8. Rada, C., J. Yelamos, W. Dean, and C. Milstein. 1997. The 5Ј hypermutation 5: 224–229. boundary of ␬ chains is independent of local and neighbouring sequences and 35. Faili, A., S. Aoufouchi, E. Flatter, Q. Gueranger, C. A. Reynaud, and J. C. Weill. related to the distance from the initiation of transcription. Eur. J. Immunol. 27: 2002. Induction of somatic hypermutation in immunoglobulin genes is dependent 3115–3120. on DNA polymerase iota. Nature 419: 944–947. 9. Wagner, S. D., C. Milstein, and M. S. Neuberger. 1995. Codon bias targets 36. Masuda, K., R. Ouchida, A. Takeuchi, T. Saito, H. Koseki, K. Kawamura, mutation. Nature 376: 732. M. Tagawa, T. Tokuhisa, T. Azuma, and J. Wang. 2005. DNA polymerase theta 10. Cowell, L. G., H. J. Kim, T. Humaljoki, C. Berek, and T. B. Kepler. 1999. contributes to the generation of C/G mutations during somatic hypermutation of Enhanced evolvability in immunoglobulin V genes under somatic hypermutation. Ig genes. Proc. Natl. Acad. Sci. USA 102: 13986–13991. J. Mol. Evol. 49: 23–26. 37. Ohm-Laursen, L., M. Nielsen, S. R. Larsen, and T. Barington. 2006. No evidence 11. Muramatsu, M., K. Kinoshita, S. Fagarasan, S. Yamada, Y. Shinkai, and for the use of DIR, D-D fusions, 15 open reading frames or VH T. Honjo. 2000. Class switch recombination and hypermutation require activa- replacement in the peripheral repertoire was found when applying an improved 4334 SHM OCCUR IN AND 5Ј OF AID HOT SPOTS ON BOTH STRANDS

algorithm, JointML, to 6329 human IgH rearrangements. Immunology 119: 56. Zhang, Y., X. Yuan, X. Wu, and Z. Wang. 2000. Preferential incorporation of G 265–277. opposite template T by the low-fidelity human DNA polymerase ␫. Mol. Cell. 38. Ohm-Laursen, L., S. R. Larsen, and T. Barington. 2005. Identification of two new Biol. 20: 7099–7108. alleles, IGHV3-23*04 and IGHJ6*04, and the complete sequence of the 57. Martomo, S. A., D. Fu, W. W. Yang, N. S. Joshi, and P. J. Gearhart. 2005. IGHV3-h pseudogene in the human immunoglobulin locus and their prevalences Deoxyuridine is generated preferentially in the nontranscribed strand of DNA in Danish caucasians. Immunogenetics 57: 621–627. from cells expressing activation-induced cytidine deaminase. J. Immunol. 174: 39. Lefranc, M. P., V. Giudicelli, C. Ginestoux, J. Bodmer, W. Muller, R. Bontrop, 7787–7791. M. Lemaitre, A. Malik, V. Barbie, and D. Chaume. 1999. IMGT, the international 58. Chaudhuri, J., M. Tian, C. Khuong, K. Chua, E. Pinaud, and F. W. Alt. 2003. immunogenetics database. Nucleic Acids Res. 27: 209–212. Transcription-targeted DNA deamination by the AID antibody diversification en- 40. Lam, K. P., R. Ku¨hn, and K. Rajewsky. 1997. In vivo ablation of surface im- zyme. Nature 422: 726–730. munoglobulin on mature B cells by inducible gene targeting results in rapid cell 59. Besmer, E., E. Market, and F. N. Papavasiliou. 2006. The transcription elongation death. Cell 90: 1073–1083. complex directs activation-induced cytidine deaminase-mediated DNA deamina- 41. Rogozin, I. B., and N. A. Kolchanov. 1992. Somatic hypermutagenesis in im- tion. Mol. Cell. Biol. 26: 4378–4385. munoglobulin genes, II: influence of neighbouring base sequences on mutagen- 60. Shen, H. M., S. Ratnam, and U. Storb. 2005. Targeting of the activation-induced esis. Biochim. Biophys. Acta 1171: 11–18. cytosine deaminase is strongly influenced by the sequence and structure of the 42. Foster, S. J., T. Dorner, and P. E. Lipsky. 1999. 
Somatic hypermutation of V␬J␬ targeted DNA. Mol. Cell. Biol. 25: 10815–10821. rearrangements: targeting of RGYW motifs on both DNA strands and preferential 61. Li, Z., C. Zhao, M. D. Iglesias-Ussel, Z. Polonskaya, M. Zhuang, G. Yang, selection of mutated codons within RGYW motifs. Eur. J. Immunol. 29: Z. Luo, W. Edelmann, and M. D. Scharff. 2006. The mismatch repair protein 4011–4021. Msh6 influences the in vivo AID targeting to the Ig locus. Immunity 24: 393–403. 43. Dorner, T., H. P. Brezinschek, R. I. Brezinschek, S. J. Foster, R. Domiati-Saad, 62. Shevelev, I. V., and U. Hu¨bscher. 2002. The 3Ј-5Ј exonucleases. Nat. Rev. Mol. and P. E. Lipsky. 1997. Analysis of the frequency and pattern of somatic muta- Cell. Biol. 3: 1–12. tions within nonproductively rearranged human variable heavy chain genes. 63. Yabuki, M., M. M. Fujii, and N. Maizels. 2005. The MRE11-RAD50-NBS1 J. Immunol. 158: 2779–2789. complex accelerates somatic hypermutation and gene conversion of immuno- 44. Yu, K., F. T. Huang, and M. R. Lieber. 2004. DNA substrate length and sur- globulin variable regions. Nat. Immunol. 6: 730–736. rounding sequence affect the activation-induced deaminase activity at cytidine.

64. Larson, E. D., W. J. Cummings, D. W. Bednarski, and N. Maizels. 2005. MRE11/ Downloaded from J. Biol. Chem. 279: 6496–6500. RAD50 cleaves DNA in the AID/UNG-dependent pathway of immunoglobulin 45. Pavlov, Y. I., I. B. Rogozin, A. P. Galkin, A. Y. Aksenova, F. Hanaoka, C. Rada, gene diversification. Mol. Cell 20: 367–375. and T. A. Kunkel. 2002. Correlation of somatic hypermutation specificity and 65. Genschel, J., L. R. Bazemore, and P. Modrich. 2002. Human Exonuclease I Is A-T substitution errors by DNA polymerase eta during copying of a Required for 5Ј and 3Ј Mismatch Repair. J. Biol. Chem. 277: 13302–13311. mouse immunoglobulin ␬ light chain transgene. Proc. Natl. Acad. Sci. USA 99: 66. Sun, H., A. Yabuki, and N. Maizels. 2001. A human nuclease specific for G4 9954–9959. DNA. Proc. Natl. Acad. Sci. USA 98: 12444–12449. 46. Rogozin, I. B., and M. Diaz. 2004. Cutting edge: DGYW/WRCH is a better 67. Duquette, M. L., P. Pham, M. F. Goodman, and N. Maizels. 2005. AID binds to predictor of mutability at G:C bases in Ig hypermutation than the widely accepted transcription-induced structures in c-MYC that map to regions associated with

RGYW/WRCY motif and probably reflects a two-step activation-induced cyti- http://www.jimmunol.org/ translocation and hypermutation. Oncogene 24: 5791–5798. dine deaminase-triggered process. J. Immunol. 172: 3382–3384. 68. Monson, N. L., T. Dorner, and P. E. Lipsky. 2000. Targeting and selection of 47. Boursier, L., W. Su, and J. Spencer. 2004. Analysis of strand biased ‘G’.C hy- ␭ permutation in human immunoglobulin V(␭) gene segments suggests that both mutations in human V rearrangements. Eur. J. Immunol. 30: 1597–1605. DNA strands are targets for deamination by activation-induced cytidine deami- 69. Rosner, K., D. B. Winter, R. E. Tarone, G. L. Skovgaard, V. A. Bohr, and nase. Mol. Immunol. 40: 1273–1278. P. J. Gearhart. 2001. Third complementarity-determining region of mutated VH 48. Sale, J. E., and M. S. Neuberger. 1998. TdT-accessible breaks are scattered over immunoglobulin genes contains shorter V, D, J, P, and N components than non- the immunoglobulin V domain in a constitutively hypermutating B cell line. mutated genes. Immunology 103: 179–187. Immunity 9: 859–869. 70. Luger, E., M. Lamers, G. Achatz-Straussberger, R. Geisberger, D. Infuhr, 49. Bross, L., Y. Fukita, F. McBlane, C. Demolliere, K. Rajewsky, and H. Jacobs. M. Breitenbach, R. Crameri, and G. Achatz. 2001. Somatic diversity of the im- 2001. DNA double-strand breaks in immunoglobulin genes undergoing somatic munoglobulin repertoire is controlled in an isotype-specific manner. Eur. J. Im- hypermutation. Immunity 13: 589–597. munol. 31: 2319–2330.

50. Faili, A., S. Aoufouchi, Q. Gueranger, C. Zober, A. Leon, B. Bertocci, 71. Brezinschek, H. P., S. J. Foster, T. Dorner, R. I. Brezinschek, and P. E. Lipsky. by guest on September 26, 2021 J. C. Weill, and C. A. Reynaud. 2006. AID-dependent somatic hypermutation 1998. Pairing of variable heavy and variable kappa chains in individual naive and occurs as a DNA single-strand event in the BL2 cell line. Nat. Immunol. 39: memory B cells. J. Immunol. 160: 4762–4767. 815–821. 72. Morvan, C. L., E. Pinaud, C. Decourt, A. Cuvillier, and M. Cogne. 2003. The 51. Eftedal, I., P. H. Guddal, G. Slupphaug, G. Volden, and H. E. Krokan. 1993. immunoglobulin heavy-chain locus hs3b and hs4 3Ј enhancers are dispensable for Consensus sequences for good and poor removal of uracil from double stranded VDJ assembly and somatic hypermutation. Blood 102: 1421–1427. DNA by uracil-DNA glycosylase. Nucleic Acids Res. 21: 2095–2101. 73. Bottaro, A., F. Young, J. Chen, M. Serwe, F. Sablitzky, and F. W. Alt. 1998. 52. Eftedal, I., G. Volden, and H. E. Krokan. 1994. Excision of uracil from double- Deletion of the IgH intronic enhancer and associated matrix-attachment regions stranded DNA by uracil-DNA glycosylase is sequence specific. Ann. NY Acad. decreases, but does not abolish, class switching at the ␮ locus. Int. Immunol. 10: Sci. 726: 312–314. 799–806. 53. Jansen, J. G., P. Langerak, A. Tsaalbi-Shtylik, P. ven den Berk, H. Jacobs, and 74. Michael, N., H. M. Shen, S. Longerich, N. Kim, A. Longacre, and U. Storb. 2003. N. de Wind. 2006. Strand-biased defect in G/C transversions in hypermutating The E box motif CAGGTG enhances somatic hypermutation without enhancing immunoglobulin genes in Rev1-deficient mice. J. Exp. Med. 203: 319–323. transcription. Immunity 19: 235–242. 54. Zeng, X., G. A. Negrete, C. Kasmer, W. W. Yang, and P. J. Gearhart. 2004. 75. Schoetz, U., M. Cervelli, Y. D. Wang, P. Fiedler, and J. M. Buerstedde. 2006. 
Absence of DNA polymerase ␩ reveals targeting of C mutations on the nontran- E2A expression stimulates Ig hypermutation. J. Immunol. 177: 395–400. scribed strand in immunoglobulin switch regions. J. Exp. Med. 199: 917–924. 76. Wasserman, R., Y. Ito, N. Galili, M. Yamada, B. A. Reichard, S. Shane, 55. Delbos, F., A. De Smet, A. Faili, S. Aoufouchi, J. C. Weill, and C. A. Reynaud. B. Lange, and G. Rovera. 1992. The pattern of joining (JH) gene usage in the 2005. Contribution of DNA polymerase ␩ to immunoglobulin gene hypermuta- human IgH chain is established predominantly at the B precursor cell stage. tion in the mouse. J. Exp. Med. 201: 1191–1196. J. Immunol. 149: 511–516.