
Protein−DNA binding in the absence of specific base-pair recognition

Ariel Afek(a), Joshua L. Schipper(b), John Horton(b), Raluca Gordân(b,1), and David B. Lukatsky(a,1)

(a) Department of Chemistry, Ben-Gurion University of the Negev, Be’er Sheva 8410501, Israel; and (b) Center for Genomic and Computational Biology, Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708

Edited by Roger D. Kornberg, Stanford University School of Medicine, Stanford, CA, and approved September 22, 2014 (received for review June 6, 2014)

Until now, it has been reasonably assumed that specific base-pair recognition is the only mechanism controlling the specificity of transcription factor (TF)−DNA binding. Contrary to this assumption, here we show that nonspecific DNA sequences possessing certain repeat symmetries, when present outside of specific TF binding sites (TFBSs), statistically control TF−DNA binding preferences. We used high-throughput protein−DNA binding assays to measure the binding levels and free energies of binding for several human TFs to tens of thousands of short DNA sequences with varying repeat symmetries. Based on statistical mechanics modeling, we identify a new protein−DNA binding mechanism induced by DNA sequence symmetry in the absence of specific base-pair recognition, and experimentally demonstrate that this mechanism indeed governs protein−DNA binding preferences.

protein−DNA binding | nonspecific protein−DNA binding | transcriptional regulation

Significance

Understanding molecular mechanisms of how regulatory proteins, called transcription factors (TFs), recognize their specific binding sites encoded into genomic DNA represents one of the central, long-standing problems of molecular biophysics. Strikingly, our experiments demonstrate that DNA context characterized by certain repeat symmetries surrounding specific TF binding sites significantly influences binding specificity. We expect that our results will significantly impact the understanding of molecular, biophysical principles of transcriptional regulation, and significantly improve our ability to predict how variations in DNA sequences, i.e., mutations or polymorphisms, and protein concentrations influence gene expression programs in living cells.

Author contributions: A.A., R.G., and D.B.L. designed research; A.A., J.L.S., J.H., R.G., and D.B.L. performed research; J.L.S. and J.H. contributed new reagents/analytic tools; A.A., R.G., and D.B.L. analyzed data; and A.A., R.G., and D.B.L. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession nos. GSE59845, GSE61854, and GSE61920).

(1) To whom correspondence may be addressed. Email: [email protected] or raluca.[email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1410569111/-/DCSupplemental.

17140–17145 | PNAS | December 2, 2014 | vol. 111 | no. 48 | www.pnas.org/cgi/doi/10.1073/pnas.1410569111

Forty years ago, von Hippel et al. demonstrated that nonspecific protein−DNA binding is an important biophysical mechanism operating in a living cell (1). This seminal work makes it possible to interpret experiments that measured how transcription factors (TFs) search for their specific target sites flanked by nonconsensus sequence elements (1–10). A specific consensus motif is a short DNA sequence, typically 6–20 base pairs (bp), that possesses an enhanced binding affinity for a particular TF. For example, the sequence CACGTG represents the specific consensus motif for the human protein Max used in this study (Fig. 1). The process of establishing specific, consensus protein−DNA binding requires the formation of precise geometrical fit between the protein and its consensus DNA motif, accompanied by the formation of specific hydrogen and electrostatic contacts at the protein−DNA binding interface (6, 7) (Fig. 1).

In addition to binding to their consensus DNA motifs, transcription factors can also bind, albeit with lower affinity, to DNA regions lacking any consensus motifs. The term “nonspecific protein−DNA binding” (6) is typically used to describe these weaker interactions. Von Hippel and Berg suggested classifying nonspecific protein−DNA binding into two related mechanisms (6). The first mechanism includes protein binding to its mutated specific motifs that retain some residual, reduced specificity. The second mechanism is largely DNA sequence independent, and it involves electrostatic binding modulated by the overall DNA geometry (6). Despite significant experimental progress, molecular mechanisms responsible for these two types of nonspecific binding remain poorly understood, and the free energy of nonspecific protein−DNA binding has not been systematically characterized (11–14).

The interplay between consensus and nonconsensus DNA sequence elements emerges as a dominant factor that governs protein−DNA binding preferences. However, this interplay is also poorly understood (15, 16). Until now, it has been reasonably assumed that specific (consensus) base-pair recognition must control the genome-wide specificity of TF−DNA binding. Contrary to this assumption, here we identify a general mechanism for protein−DNA binding in the absence of specific base-pair recognition, and show that it stems from statistical interactions between proteins and DNA sequence correlations, i.e., statistically repeated DNA sequence patterns (17). We use the term “nonconsensus protein−DNA binding” to describe such statistical interactions.

Using high-throughput protein−DNA binding assays combined with statistical mechanics analysis, we demonstrate here that nonconsensus protein−DNA binding induced by DNA sequence correlations is an entropy-dominated, statistical effect. Contrary to the case of specific protein−DNA binding, which stems from a single protein−DNA binding site, the nonconsensus effect characterized in our study is nonlocal, as it stems from multiple nonspecific interactions between protein and statistically repeated DNA sequence patterns. Here we show that this effect is quantitatively significant, and that, for natural genomic sequences, its strength is comparable to the effect of mutations in the specific TF motif.

In addition, we directly measure, for the first time, the free energy of nonconsensus protein−DNA binding. We use tens of thousands of computationally designed DNA sequences, each 36 bp long, with varying symmetry and length scale of DNA sequence correlations, but having identical GC content and identical specific protein−DNA binding site. We demonstrate that statistically, on average, the nonconsensus effect alone (coming exclusively from the flanking DNA regions, and without contribution from the specific protein−DNA binding site) contributes a free energy on the order of ∼1 k_B T ≈ 0.6 kcal/mol per DNA sequence (k_B is the Boltzmann constant and T is the temperature), for human TF Max. We show that, even for short DNA sequences, this nonconsensus effect induces a nearly threefold difference in the amount of Max protein bound to DNA molecules containing identical specific binding site, but different symmetries and correlations scales in the flanking regions (Fig. 2B). The magnitude of the identified nonconsensus effect reaches 66% of consensus (specific) TF−DNA binding. This demonstrates that the identified effect of nonconsensus protein−DNA binding is highly significant compared with consensus (specific) binding. In addition, we demonstrate that for DNA sequences lacking specific motif, the magnitude of the nonconsensus effect is as significant as for DNA sequences containing specific motif. We also demonstrate that the nonconsensus effect depends on the length of repeated

four different sequence symmetries, and for each symmetry type, we computationally designed sequence sets with different values of the length scale of DNA sequence correlations (see SI Appendix, Fig. S1). The larger the correlation scale, the higher the DNA sequence symmetry, and the larger the number of repeated sequence elements for each symmetry type (SI Appendix, Fig. S1). The notion of the correlation scale in our sequence design is qualitatively analogous to the correlation length of the one-dimensional Ising model (20), or analogous to the frequency of the nearest-neighbor base sequences in DNA (21). All DNA sequences in our experimental design have identical nucleotide content, and all of them have an identical specific motif in the center of the sequence, flanked by the nonconsensus background. The sequences are 36 bp long, with the 10-bp specific motif located in the center.

We initially focused on a well-characterized human transcription factor, Max (Fig. 1). We used a large, 10-bp specific DNA motif (GTCACGTGAC) instead of the conventional 6-bp specific motif for the Max protein (CACGTG) to significantly reduce the possibility that a residual specific binding might lead to the variation in the experimentally measured protein−DNA binding levels (Fig. 2A). Our design procedure

Fig. 1. Examples of specific protein−DNA binding, involving proteins used in this study. Crystal structures of specific protein−DNA complexes formed by proteins from the two structural families explored in this work: bHLH family (Max) and E2F/DP family (E2F4:Dp2).
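
As a rough numerical check on the free-energy scale quoted in the text (∼1 k_B T ≈ 0.6 kcal/mol), the following minimal Python sketch converts k_B T to kcal/mol and evaluates the Boltzmann factor for a free-energy difference of 1 k_B T. The temperature (298.15 K), the function names, and the assumption that bound protein scales with the Boltzmann factor (a dilute, weak-binding limit) are illustrative choices made here, not taken from the paper; the comparison with the reported nearly threefold difference is only an order-of-magnitude consistency check, not the authors' analysis.

```python
import math

# Molar gas constant in kcal/(mol*K); per mole, k_B*T equals R*T.
R_KCAL_PER_MOL_K = 1.987204e-3


def kbt_in_kcal_per_mol(temperature_k: float = 298.15) -> float:
    """Return the thermal energy k_B*T expressed in kcal/mol.

    The default temperature (298.15 K) is an assumption for illustration.
    """
    return R_KCAL_PER_MOL_K * temperature_k


def boltzmann_ratio(delta_g_in_kbt: float) -> float:
    """Return exp(delta_g / k_B T) for a free-energy difference given in k_B T.

    Assumption: in a dilute, weak-binding limit the amount of bound protein
    is proportional to this factor. This is a simplification, not the
    statistical mechanics model used in the paper.
    """
    return math.exp(delta_g_in_kbt)


if __name__ == "__main__":
    print(f"k_B T at 298.15 K: {kbt_in_kcal_per_mol():.3f} kcal/mol")
    print(f"Boltzmann factor for 1 k_B T: {boltzmann_ratio(1.0):.2f}")
```

Under these assumptions, a 1 k_B T difference corresponds to a factor of e ≈ 2.7 in Boltzmann weight, which is of the same order as the nearly threefold difference in bound Max reported in the text.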