Specificity Landscapes Unmask Submaximal Binding Site Preferences of Transcription Factors
BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Devesh Bhimsaria(a,b,1), José A. Rodríguez-Martínez(a,2), Junkun Pan(c), Daniel Roston(c), Elif Nihal Korkmaz(c), Qiang Cui(c,3), Parameswaran Ramanathan(b), and Aseem Z. Ansari(a,d,4)

(a) Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706; (b) Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, WI 53706; (c) Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53706; and (d) The Genome Center of Wisconsin, University of Wisconsin–Madison, Madison, WI 53706

Edited by Michael Levine, Princeton University, Princeton, NJ, and approved September 24, 2018 (received for review July 13, 2018)

Author contributions: D.B., J.A.R.-M., Q.C., P.R., and A.Z.A. designed research; D.B., J.A.R.-M., J.P., D.R., E.N.K., Q.C., P.R., and A.Z.A. performed research; D.B., P.R., and A.Z.A. contributed new reagents/analytic tools; D.B., J.A.R.-M., J.P., D.R., and E.N.K. analyzed data; and D.B., J.A.R.-M., J.P., D.R., E.N.K., Q.C., P.R., and A.Z.A. wrote the paper.
Conflict of interest statement: A.Z.A. is the sole member of VistaMotif, LLC and founder of the educational nonprofit WINStep Forward.
This article is a PNAS Direct Submission.
Published under the PNAS license.
(1) Present address: Bio Informaticals, Jaipur, Rajasthan 302016, India.
(2) Present address: Department of Biology, University of Puerto Rico–Rio Piedras, San Juan, Puerto Rico 00925.
(3) Present address: Departments of Chemistry and Physics & Biomedical Engineering, Boston University, Boston, MA 02215.
(4) To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1811431115/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1811431115

We have developed Differential Specificity and Energy Landscape (DiSEL) analysis to comprehensively compare DNA–protein interactomes (DPIs) obtained by high-throughput experimental platforms and cutting edge computational methods. While high-affinity DNA binding sites are identified by most methods, DiSEL uncovered nuanced sequence preferences displayed by homologous transcription factors. Pairwise analysis of 726 DPIs uncovered homolog-specific differences at moderate- to low-affinity binding sites (submaximal sites). DiSEL analysis of variants of 41 transcription factors revealed that many disease-causing mutations result in allele-specific changes in binding site preferences. We focused on a set of highly homologous factors that have different biological roles but "read" DNA using identical amino acid side chains. Rather than direct readout, our results indicate that DNA noncontacting side chains allosterically contribute to sculpt distinct sequence preferences among closely related members of transcription factor families.

Differential Specificity and Energy Landscapes | cognate site identification | DNA–protein interactome | DNA sequence recognition | allostery

Significance

Several experimental platforms and computational methods have been developed to identify DNA binding sites of over 1,000 transcription factors. Often, high-affinity (maximal) binding sites are reported as consensus motifs. Differences between experimental platforms contribute to uncertainty in ascribing binding to submaximal sites. However, biological studies emphasize the importance of submaximal binding sites in shaping regulatory functions of transcription factors. To bridge this gap, we developed Differential Specificity and Energy Landscapes to unmask differences between experimental and computational methods as well as capture distinct submaximal binding site preferences of transcription factors. Our results suggest that subtle variation in protein structure can allosterically confer homolog-specific differences in binding to submaximal affinity sites.

Genome-wide binding profiles of hundreds of transcription factors (TFs) have made it abundantly clear that these proteins bind to a large spectrum of sequences to manifest their biological functions (1–3). The affinity for different biologically relevant binding sites can vary dramatically. Surprisingly, only a fraction of the genomic sites occupied in living cells can be annotated using high-affinity motifs assigned to a given TF (1). To further confound annotation, high-affinity sites can be bound interchangeably by TFs that bear a common DNA binding fold (4–6). This is especially true for highly homologous TFs that often bind indistinguishably to consensus high-affinity sites (6, 7). Increasingly, moderate- to low-affinity (submaximal or suboptimal affinity) binding sites have been shown to guide selective binding of individual TFs to distinct genomic loci (8–11). In other words, energetically subtle preferences for different moderate- to low-affinity sites govern selective binding and distinct biological roles of closely related homologous TFs (8–11).

The quest to identify consensus binding sites of all DNA (and RNA) binding proteins encoded within the human genome is being driven by high-throughput experimental platforms and new computational approaches (12, 13). Each experimental and computational approach has inbuilt advantages and limitations (14–16). While high-affinity sites are readily identified, binding to submaximal affinity sites is nontrivial and is often overlooked. However, an unexpected result from recent analyses is that high-affinity "consensus" binding sites often do not predict in vivo genome-wide binding profiles (chromatin immunoprecipitation followed by sequencing or ChIP-seq) as effectively as models that include sequences of submaximal affinities (17). Pairwise comparisons of DNA–protein interactomes (DPIs) suggest that most experimental platforms capture high-affinity sites with remarkable fidelity (18–23). However, the extent to which platform-dependent idiosyncrasies thwart the identification of submaximal binding sites is underscrutinized and poorly understood.

Here, we report the development of Differential Specificity and Energy Landscapes (DiSEL) to compare experimental platforms, computational methods, and interactomes of TFs, especially those factors that bind identical consensus motifs. Our results reveal that (i) most high-throughput experimental platforms reliably identify high-affinity motifs but yield less reliable information on submaximal sites; (ii) with few exceptions, computational methods model DPIs with a focus on high-affinity sites; (iii) submaximal sites improve the annotation of biologically relevant binding sites across genomes; (iv) among members of TF families, homolog-specific preferences are most evident at submaximal affinity sites rather than high-affinity motifs; (v) among closely related homologs that use identical side chains to interact with DNA, the residues that face away from the DNA can allosterically confer homolog-specific preferences for submaximal sites; and (vi) among naturally occurring alleles of specific factors, several disease-causing alleles impact binding to submaximal affinity sites (24). Taken together, DiSEL analysis readily unmasks the differences between experimental platforms and computational models and identifies submaximal sites that are preferred by homologous proteins with indistinguishable high-affinity target sites. Our results highlight the importance of nonobvious allosteric contributors in conferring differential sequence specificity. While widely ignored, such allosteric effects likely contribute to sequence specificity beyond current models of direct and indirect readout of DNA sequence and shape. Increased evaluation of differential binding to submaximal affinity sites will undoubtedly improve the ability to decipher how genomic information is utilized by TFs to manifest their regulatory functions in vivo.

Results

Specificity and Energy Landscapes Display Binding Affinities for an Entire Sequence Space. DPIs from high-throughput experimental methods are typically distilled down to a position weight matrix (PWM)-based "consensus motif" or a limited set of motifs (12, 25, 26) (Fig. 1A). While PWM-based motifs efficiently summarize sequence preferences of a DNA binding protein, they compress related sequences into a consensus, overlook the impact of flanking sequences, and underestimate the full spectrum of cognate sites contained within a given interactome. We utilize sequence specificity landscapes (SSLs) to visualize individual interactomes (19, 27) (Fig. 1B). When binding affinities are measured and correlated with cognate sites within an interactome, the resulting plots display binding energy landscapes [Specificity and Energy Landscapes (SELs)] of individual TFs (27, 28). In SSL/SEL plots, the binding affinities for a k-mer sequence space are represented in a series of concentric rings organized by a "seed motif." All sequences in a DPI are then placed at different positions along the concentric circles based on sequence similarity to the seed motif (Fig. 1B and SI Appendix, Fig. S1). SELs of different classes of proteins reveal the range of binding modes displayed by a given TF and impact of flanking sequences and mismatches on binding (6, 19).
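The ring organization described above can be illustrated with a minimal sketch. It assumes that "sequence similarity to the seed motif" is measured as the smallest number of mismatches (Hamming distance) between the seed and any window of the k-mer, so that ring 0 holds sequences containing an exact seed match, ring 1 holds one-mismatch sequences, and so on. The function names, the seed CACGTG, and the intensity values are hypothetical and chosen only for illustration; they are not taken from the paper. Reverse complements and the angular ordering of sequences within a ring are not modeled.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(kmer, seed):
    """Smallest Hamming distance between the seed and any window of the k-mer."""
    if len(kmer) < len(seed):
        raise ValueError("k-mer must be at least as long as the seed")
    last_start = len(kmer) - len(seed)
    return min(
        hamming(kmer[i:i + len(seed)], seed) for i in range(last_start + 1)
    )


def assign_rings(intensities, seed):
    """Group (k-mer, value) pairs by mismatch count to the seed.

    Assumption: ring index = minimum number of mismatches to the seed.
    """
    rings = defaultdict(list)
    for kmer, value in intensities.items():
        rings[min_mismatches(kmer, seed)].append((kmer, value))
    # Sort each ring alphabetically so the output is deterministic.
    for members in rings.values():
        members.sort(key=lambda item: item[0])
    return dict(rings)


# Hypothetical toy interactome: 8-mers with made-up binding intensities.
toy = {
    "ACACGTGA": 9.5,
    "GCACGTGC": 8.8,
    "ACACGTTA": 4.1,
    "TCACGCGA": 3.7,
    "AAAAAAAA": 0.2,
}

rings = assign_rings(toy, "CACGTG")
for ring in sorted(rings):
    print(ring, rings[ring])
```

In this toy input, the two 8-mers that contain the intact seed fall in ring 0, the two single-mismatch variants fall in ring 1, and the unrelated poly-A sequence falls in ring 5. Plotting the value of each sequence as height at its ring position would give a landscape-style view in which flanking-sequence effects appear as variation within a ring.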
To elucidate similarities and differences in DPIs obtained by various experimental methods and sequence preferences of highly homologous TFs, we now report the development