Multiple Transcription Factor Binding Sites Predict AID Targeting in Non-Ig Genes

Jamie L. Duke,* Man Liu,†,1 Gur Yaari,‡ Ashraf M. Khalil,§ Mary M. Tomayko,¶ Mark J. Shlomchik,†,§ David G. Schatz,†,‖ and Steven H. Kleinstein*,‡

The Journal of Immunology. Published online March 20, 2013; doi:10.4049/jimmunol.1202547
http://www.jimmunol.org/content/early/2013/03/20/jimmunol.1202547
Supplementary material: http://www.jimmunol.org/content/suppl/2013/03/20/jimmunol.1202547.DC1
Copyright © 2013 by The American Association of Immunologists, Inc. All rights reserved. Print ISSN: 0022-1767. Online ISSN: 1550-6606.

Aberrant targeting of the enzyme activation-induced cytidine deaminase (AID) results in the accumulation of somatic mutations in ∼25% of expressed genes in germinal center B cells.
Observations in Ung−/− Msh2−/− mice suggest that many other genes efficiently repair AID-induced lesions, so that up to 45% of genes may actually be targeted by AID. It is important to understand the mechanisms that recruit AID to certain genes, because this mistargeting represents an important risk for genome instability. We hypothesize that several mechanisms combine to target AID to each locus. To resolve which mechanisms affect AID targeting, we analyzed 7.3 Mb of sequence data, along with the regulatory context, from 83 genes in Ung−/− Msh2−/− mice to identify common properties of AID targets. This analysis identifies three transcription factor binding sites (E-box motifs, along with YY1 and C/EBP-β binding sites) that may work together to recruit AID. Based on previous knowledge and these newly discovered features, a classification tree model was built to predict genome-wide AID targeting. Using this predictive model, we were able to identify a set of 101 high-interest genes that are likely targets of AID. The Journal of Immunology, 2013, 190: 000–000.

Somatic hypermutation (SHM) occurs in germinal center (GC) B cells, resulting in the introduction of point mutations into Ig genes. Although SHM provides an important source of genetic diversity, capable of producing specific Abs for quickly evolving pathogens, the process also poses a severe threat to genomic stability. Activation-induced cytidine deaminase (AID), the enzyme that deaminates cytosines to initiate SHM, can act outside of the Ig locus. In a previous sequencing study, we showed that >45% of expressed genes in GC B cells are targeted by AID in Ung−/− Msh2−/− double-knockout (dKO) mice, where the absence of DNA repair reveals the “footprint” of AID. Even among genes that were targeted by AID, this study revealed a wide range of mutation frequencies observed across 83 genes (1).
In this study, we seek to address two basic questions that are raised by the former study: Why are some genes targeted by AID, whereas others are not? and How do the genes targeted by AID accumulate different levels of mutation? The main hypothesis we pursue is that sequence features of each gene are responsible for this differential targeting.

The current model of SHM proposes two phases (2). In the first phase, AID converts a cytosine (C) residue to a uracil (U) in ssDNA created during the process of transcription, which, if left unrepaired, leads to a C to T (thymine) transition mutation when the DNA is replicated for cell division (3). The second phase of SHM begins when DNA repair mechanisms attempt to remove the uracil lesion from the DNA. The repair of the uracil happens via two pathways: base excision repair with UNG and mismatch repair facilitated by the MSH2/MSH6 complex, both of which are capable of working in an error-prone fashion and contributing to the observed mutation frequency (4). In the dKO setting, the second phase of SHM is unavailable, thus revealing the underlying “footprint” of AID, where the expectation is primarily C → T transition mutations. We previously sequenced 83 non-Ig genes from dKO mice on average 70 times per gene over a 1-kb region downstream of the transcription start site (TSS) (1). Mutation frequencies varied widely, ranging from <1 × 10⁻⁵ to 116.1 × 10⁻⁵ mutations/bp, but they were highly predictable for the same gene across samples from multiple mice. In the same system, sequencing of an IgH positive control, specifically the VhJ558-Jh4 intron 3′ flanking region (hereafter referred to as the Jh4 intron), found a mutation frequency of 9.96 × 10⁻³ mutations/bp. Each gene represents a unique genomic context in which to explore the various properties associated with AID targeting.

Differential AID activity in non-Ig genes may be influenced by multiple underlying mechanisms. A higher transcription rate may be associated with an increased mutation frequency. Genes with a higher mutation frequency may contain a large number of AID hotspots, such as WRC (W = A/T; R = A/G), and/or few AID coldspots, such as SYC (S = C/G; Y = C/T), where the C is the mutated position (5, 6). Clonal recruitment of AID to certain genes may lead to an increased mutation frequency (7). Finally, the genes for which high mutation frequencies are observed may share functional elements, like transcription factor binding sites, which recruit AID to the locus for mutation. In this study, we first examine each of the possible mechanisms independently and then develop an integrated model to predict targeting of AID in the

*Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511; †Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06510; ‡Department of Pathology, Yale University School of Medicine, New Haven, CT 06510; §Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT 06510; ¶Department of Dermatology, Yale University School of Medicine, New Haven, CT 06510; and ‖Howard Hughes Medical Institute, New Haven, CT 06510

1Current address: Drinker Biddle & Reath LLP, Washington, D.C.

Received for publication September 26, 2012. Accepted for publication February 15, 2013.

J.L.D. was supported in part by the Pharmaceutical Research and Manufacturers of America Foundation and National Institutes of Health Grant T15 LM07056 from the National Library of Medicine. D.G.S. is an investigator of the Howard Hughes Medical Institute. Computational resources were provided by the Yale University Biomedical High Performance Computing Center (National Institutes of Health Grant RR19895).

J.L.D. and S.H.K. designed the analyses; M.L. and D.G.S. designed the RNA polymerase II ChIP-Seq experiment; M.L. performed the ChIP portion of the experiment; M.M.T. and M.J.S. provided the microarray expression data; J.L.D. and G.Y. wrote software for the analyses; J.L.D. performed the analyses; and J.L.D., D.G.S., and S.H.K. wrote the manuscript. All authors commented on the manuscript.

The microarray data presented in this article have been submitted to the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE44260) under accession number GSE44260.

Address correspondence and reprint requests to Dr. Steven H. Kleinstein, Yale University School of Medicine, 300 George Street, Suite 505, New Haven, CT 06511. E-mail address: [email protected]

The online version of this article contains supplemental material.

Abbreviations used in this article: AID, activation-induced cytidine deaminase; ChIP, chromatin immunoprecipitation; ChIP-Seq, chromatin immunoprecipitation followed by massively parallel sequencing; CSR, class switch recombination; dKO, double knockout; FDR, false-discovery rate; GC, germinal center; GSEA, gene set enrichment analysis; KO, knockout; KW, Kruskal–Wallis test; MWU, Mann–Whitney U test; NB, negative binomial; RefSeq, National Center for Biotechnology Information Reference Sequence; SHM, somatic hypermutation; TC-Seq, translocation-capture sequencing; TSS, transcription start site; ZI-NB, zero-inflated negative binomial.

Dynabeads Protein A (Invitrogen) were incubated with RNA polymerase II Ab N20 (Santa Cruz Biotechnology) or normal rabbit serum. Excess Ab was washed away. Then Ab-bound beads were incubated with chromatin from 20 million sorted spleen GC B cells (previously cross-linked with 1% HCHO and then sonicated to shear the DNA fragments to 100–300 bp) at 4°C overnight. Beads were washed, chromatin was eluted, and the cross-linking was reversed. DNA was purified, precipitated, and redissolved in TE buffer. Precipitated DNA was quantified using a PicoGreen dsDNA quantification kit (Molecular Probes). A total of 200 ng chromatin immunoprecipitation (ChIP) DNA (from 40 million cells) was end-repaired using polynucleotide kinase and Klenow enzyme, followed by treatment with Taq polymerase to generate a protruding 3′ A base used for adaptor ligation.
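The AID hotspot and coldspot motifs named in the introduction (WRC and SYC, with the C as the mutated position) can be illustrated with a minimal Python sketch. This is not the authors' analysis software: the function names (`count_motif`, `count_both_strands`, `hotspot_summary`) are hypothetical, and scanning both DNA strands by also searching the reverse complement is an assumption made here for illustration.

```python
import re

# IUPAC codes used in the text: W = A/T, R = A/G, S = C/G, Y = C/T.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "S": "[CG]", "Y": "[CT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of an IUPAC motif on the given strand only."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A lookahead lets overlapping occurrences each be counted.
    return len(re.findall(f"(?={pattern})", seq.upper()))


def count_both_strands(seq: str, motif: str) -> int:
    """Assumption: a motif on either strand counts, so scan both."""
    seq = seq.upper()
    return count_motif(seq, motif) + count_motif(reverse_complement(seq), motif)


def hotspot_summary(seq: str) -> dict:
    """Hypothetical helper: tally WRC hotspots and SYC coldspots in a region."""
    return {
        "WRC_hotspots": count_both_strands(seq, "WRC"),
        "SYC_coldspots": count_both_strands(seq, "SYC"),
    }
```

For the toy sequence `AGCTACGCC`, `hotspot_summary` reports three WRC hotspots (two on the given strand, one on the reverse complement) and one SYC coldspot. Dividing such counts by the region length would give a density that can be compared between genes with high and low mutation frequencies.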