Detection of secondary binding sites in using fragment screening

R. Frederick Ludlow, Marcel L. Verdonk, Harpreet K. Saini, Ian J. Tickle, and Harren Jhoti1

Astex Pharmaceuticals, Cambridge CB4 0QA, United Kingdom

Edited by Gregory A. Petsko, Weill Cornell Medical College, New York, NY, and approved November 9, 2015 (received for review September 27, 2015) Proteins need to be tightly regulated as they control biological ) will not necessarily detect binding to a noncompetitive site. processes in most normal cellular functions. The precise mechanisms Conversely, some functional assay formats and direct ligand- of regulation are rarely completely understood but can involve binding methods (e.g., isothermal titration calorimetry, surface binding of endogenous ligands and/or partner proteins at specific plasmon resonance) may detect the effect of a small molecule locations on a that can modulate function. Often, these binding to a secondary site but be unable to distinguish it from a additional secondary binding sites appear separate to the primary competitive ligand. Furthermore, many computational methods binding site, which, for example for an , may bind a sub- have been developed to predict ligand-binding sites from protein strate. In previous work, we have uncovered several examples in sequence conservation, 3D structure (12), or both (13). These which secondary binding sites were discovered on proteins using tools have had some success in predicting sites from the apo fragment screening approaches. In each case, we were able to es- structure (14), including the presence of cryptic pockets (15). tablish that the newly identified secondary binding site was biolog- However, identifying ligand-induced pockets remains a largely ically relevant as it was able to modulate function by the binding of unsolved problem. Clearly, strategies to identify secondary sites a small molecule. In this study, we investigate how often secondary more readily would be of great importance to help unravel the binding sites are located on proteins by analyzing 24 protein targets mechanisms by which protein function is regulated and could for which we have performed a fragment screen using X-ray crystal- provide novel approaches for therapeutic intervention. lography. Our analysis shows that, surprisingly, the majority of pro- The emergence of fragment-based drug discovery as an ap- teins contain secondary binding sites based on their ability to bind proach to identify novel small-molecule drug candidates has been BIOPHYSICS AND fragments. Furthermore, sequence analysis of these previously un-

well documented in recent years (16). In particular, screening of COMPUTATIONAL BIOLOGY known sites indicate high conservation, which suggests that they fragment libraries using X-ray crystallography has been shown to may have a biological function, perhaps via an allosteric mechanism. Comparing the physicochemical properties of the secondary sites be a rather effective approach to sample chemical space, revealing with known primary ligand binding sites also shows broad similari- previously unobserved pockets and ligand binding modes. The approach allows detection of direct binding of the fragment to the ties indicating that many of the secondary sites may be druggable in > nature with small molecules that could provide new opportunities to target, is very sensitive (Kd 10 mM, depending on the solubility modulate potential therapeutic targets. of the compound), and can also identify ligand-induced pockets (17), as well as preformed sites. The immediate availability of the protein structure | protein function | X-ray crystallography | crystal structure is also of great use in assessing the potential fragment-based druggability of a newly identified site, facilitating the decision to undertake further biological work to determine whether the site is llosteric and other noncompetitive regulatory binding sites likely to have a functional role. Finally, given a diverse fragment Aon proteins have long been of interest to biologists and also to drug discovery scientists (1–4). The intricate mechanisms by Significance which proteins are regulated often invoke binding at these sites, which can be viewed as “secondary sites” separate from the The regulation of proteins in biological systems is essential to primary site where, for example, an enzyme would perform its their function and nature has evolved a diverse array of catalytic function. Indeed, given the significant size of most mechanisms by which to achieve such regulation. Indeed, the proteins it could be argued that it would be inefficient for evo- primary function of a protein may be regulated by interaction lution not to have resulted in multiple binding sites that could be with endogenous ligands or other protein partners binding at used to regulate function. Furthermore, drug molecules that secondary sites. In this study, we report that fragment target such sites can offer an orthogonal mechanism for modu- screening using X-ray crystallography can identify such sec- lating the biological activity of a target protein and may have ondary sites that may have a biological function, which in turn improved selectivity (5–7) and resistance profiles (8, 9). The implies that the opportunities for modulating protein function different mechanisms by which orthosteric and allosteric inhibi- with small molecules via such sites are far more widespread tors operate also provide different opportunities (and pitfalls) than previously assumed. Many of the secondary sites we for drug development (10). In particular, ligands targeting these discovered were previously unknown and therefore offer po- sites can provide a mechanism for activating , something tential for novel approaches to modulate these protein targets. that is unlikely to be achievable by binding to the catalytic site. An example of this is the allosteric activators that Author contributions: R.F.L., M.L.V., and H.J. designed research; R.F.L., M.L.V., and H.K.S. performed research; R.F.L., M.L.V., H.K.S., and I.J.T. analyzed data; I.J.T. prepared X-ray are currently being investigated for treating type 2 diabetes (11). structure PDB submissions; and R.F.L., M.L.V., and H.J. wrote the paper. Experimentally identifying biologically relevant secondary sites The authors declare no conflict of interest. is challenging: although the existence of some sites may be infer- This article is a PNAS Direct Submission. red, for example from the dependency on a protein–protein in- Data deposition: The atomic coordinates and structure factors have been deposited in the teraction (PPI), other sites may currently have no known function Protein Data Bank, www.pdb.org (PDB ID codes 5FP5, 5FP6, 5FPD, 5FPE, 5FPM, 5FPN, or endogenous ligand. Despite their interest, it remains a chal- 5FPO, 5FPR, 5FPS, 5FPT, and 5FPY). lenging task to identify such sites on proteins and confirm their 1To whom correspondence should be addressed. Email: [email protected]. biological effect. Assay formats that detect binding at a certain site This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. (e.g., displacement of a fluorescently tagged or radiolabeled 1073/pnas.1518946112/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1518946112 PNAS Early Edition | 1of6 Downloaded by guest on October 1, 2021 library, the process allows a chemically unbiased screen of the all forms was used. Putative sites were identified through a available protein surface, which maximizes the probability of combination of predictive tools [e.g., LIGSITE (29)], peak- finding alternative binding sites. finding methods, and visual inspection of the electron density. Indeed, our laboratory has previously reported on the successful Once defined, ligand placement into the electron density was application of fragment screening using X-ray crystallography for carried out automatically for the vast majority of the structures identifying secondary, allosteric, sites for three separate targets. using our Autosolve software (28). The first of these studies involved the discovery of an allosteric We were careful to exclude any incidental buffer molecules, inhibitor of the viral - protein HCV NS3 (18). In inorganic ions, and cryoprotectants from our analysis. We also this case, fragment screening against the full-length protein iden- excluded a significant number of occupied sites that we thought tified a previously unknown allosteric binding site at the interface were likely to be an artifact of the crystal environment, such as of the two domains. Close inspection of the HCV NS3 crystal ligands bound to a site formed by the protein and a symmetry- structures led to the hypothesis that compounds binding at this site related molecule. Finally, for each protein target, we used visual would stabilize an inactive conformation of the protein, and would inspection to cluster the multiple ligands identified into discrete thus inhibit its protease function, which was confirmed experi- sites. Sites were divided in “primary sites” and “secondary sites.” mentally. In a second example, a fragment screen against the For each target, one primary site was selected based on knowledge pyruvate M2 enzyme (PKM2) (19) revealed L-serine bound of the protein function—e.g., for enzymes, we selected the active to a previously uncharacterized binding pocket. It was sub- site, and for PPI targets, we selected the main PPI interaction site. sequently shown that L-serine binding to this allosteric site acti- Small-molecule sites that were systematically occupied in vates PKM2. Finally, the allosteric bicarbonate binding site of our X-ray fragment screen (e.g., the glutathione site in PGDS) human soluble adenylate cyclase (SolAC) was also characterized were also assigned to be primary sites. All remaining sites were using X-ray crystallography (20, 21), and a subsequent fragment defined to be secondary sites. Some of the secondary sites have a screen against this target identified inhibitors that occupy this site. known biological function, but for the majority of these sites their In other studies, Arnold and coworkers (22) described a frag- function is unknown. ment screen against HIV-1 reverse transcriptase in which seven Although we did not have a strict definition for how big or small previously unobserved binding sites were identified. An enzyme a site could be, we tried to be conservative in our estimate of the assay confirmed that three of the alternative site ligands were number of sites per target. For example, multiple subpockets inhibitory. Jahnke et al. (23) at Novartis have used NMR frag- within a larger pocket would be counted as a single site. ment screening followed by X-ray crystallography to identify a The results of this analysis are presented in Table 1. A total of previously unknown and druggable allosteric pocket on the pro- 53 distinct sites was observed across 24 protein targets, an average tein farnesyl pyrophosphate (FPPS). Development of the of 2.2 sites per target with at least two sites observed for the fragment hits resulted in a number of cell-active submicromolar majority (67%) of targets. These numbers should be seen as a inhibitors of FPPS. Novartis scientists have also reported the re- lower bound: in addition to our use of conservative site definitions, sults of an NMR-based fragment screen against c- (24) some sites may have been occluded by crystal contacts in the showing binding in the allosteric myristoyl pocket (25). Finally, particular crystal system used for the fragment screen, whereas a fragment tethering approach was used by Shokat and co- some other sites may bind chemo-types that were not represented workers to identify an allosteric pocket on the oncogenic in the screening library. G12C mutant of K-Ras (Kirsten rat sarcoma) (26). Of par- ticular note is the fact that this pocket could not be observed in Examples of Newly Identified Sites. Fig. 1 shows all primary and the apo X-ray structure due to the disordered nature of that part secondary sites that we identified for four of the targets reported in of the protein. Table 1. The first example is CDK2 where the same fragment Although there have been several previous reports confirming (compound 1)isseentobindtotwodifferentpocketsonthe the presence of biologically relevant secondary binding sites on target: one molecule binds to the known ATP binding site, and the proteins, it is unknown whether this is a general feature of pro- other interacts with a region on the C-terminal lobe, well away teins. To answer this question, we analyzed in-house data from 24 previous fragment-based drug discovery campaigns against a wide variety of protein targets where X-ray crystallography had been Table 1. Number of ligand-binding sites detected across 24 used as a primary screening technique. To the best of our knowl- in-house fragment screening campaigns edge, this is the first analysis of this kind, and our results indicate No. sites No. sites that secondary binding sites are indeed present in a majority of the Target observed Target observed proteins analyzed. In most of these cases, these secondary binding sites have not been previously identified and therefore have sig- MetAP2 2 FGFR1 1 nificant implications for the potential to generate new chemical MELK 2 hRAS 2 tools to probe unexploited biological mechanisms. We therefore PGDS (r)* 4 KEAP1 (m)* 2 propose that fragment screening against proteins represents a Lp-PLA2 2 PKM2 6 useful and well-validated method for the identification of sec- CDK2 3 ERK2 2 ondary, potentially biologically relevant, sites. 1 eIF4E 2 PARP1 (m)* 1 XIAP 1 Results HCV NS3 4 DNA (s.a)* 3 Analysis of Fragment Screening Campaigns. In this analysis, we iNOS (m)* 1 ATAD2 2 † gathered all X-ray crystal structures of protein targets for which SolAC 1 BACE1 1 at least 100 different fragments had been screened in-house us- JAK2 1 HSPA2 5 ing X-ray crystallography. This resulted in 5,590 holo X-ray HSP90A 2 SETD2 2 crystal structures, comprising 4,950 distinct compounds and 24 In total, 53 sites were discovered, with an average of 2.2 ± 1.2 sites per target. protein targets. The exact protocols and screening cascades that *Mouse (m), rat (r), and Staphylococcus aureus (s.a) proteins were used for led to the data presented here will have varied according to the these targets. † target, but in broad lines our approach has been described by Although we did identify an allosteric bicarbonate site for SolAC (20), we Hartshorn et al. (27, 28). In some cases we had collected data for considered this too close to the main to be counted as a separate different mutant forms of a protein, and in these cases data from site for this analysis.

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1518946112 Ludlow et al. Downloaded by guest on October 1, 2021 HN O NH2 O SH O N NH2 Cl OH O NH OH N N NH2 N N N N HN N N OH F Cl 1 234 567 O O OH O NH2 OH N O OH NH NH2 Br OH N O OH Br N NH 819 01112

Fig. 1. Overlaid binding sites observed for four targets in this study. (A) CDK2 (PDB 5FP5) with compound 1 (yellow, cyan, PDB 5FP5) and compound 2 (green, PDB 5FP6). (Inset) Overlay of D-Luceferin derivative from PDB 4D1Z and compound 1.(B) HSPA2 protein from PDB 5FPE with fragments 3 (dark gray and cyan), 4 (green, BIOPHYSICS AND

PDB 5FPM), 5 (light gray, PDB 5FPD), and 6 (orange, PDB 5FPN). (C) DNA ligase PDB 5FPO with fragment 7 (green, PDB 5FPR) and 8 (yellow, cyan, PDB 5FPO). COMPUTATIONAL BIOLOGY (D) Protease (gray) and helicase (blue) domains of HCV NS3 in complex with 8 (gray, PDB 4B6E), 9 (green, PDB 5FPS), 10 (yellow, PDB 5FPY), 11 (cyan, PDB 5FPT). (Inset) Overlay of our fragment 11 (cyan, PDB 5FPT) with compound 12 reported by Boehringer Ingelheim (yellow, PDB 4OJQ).

from the active site. Interestingly, a derivative of D-Luciferin has secondary sites were biologically relevant, we would anticipate that recently been observed to bind in this site, through a very similar many, if not most, of these newly discovered sites would be simi- hydrogen-bonding motif (Inset) (30). Compound 2 was also ob- larly relevant. However, experimentally validating all of these sec- served to bind at two locations, in this case at the ATP site and at a ondary sites is unpractical so we resorted to analyzing each in terms highly conserved secondary site (Fig. 1A, green ligand) in the re- of three properties: (i) the level of evolutionary conservation of the gion where CDK2 interacts with Cyclin A (PDB 1FIN) (31). Li- residues within the site, (ii) the conformational flexibility of the gand binding has been previously reported at this site by Betzi et al. residues lining the site, and (iii) the area and polarity of the mo- (32) (PDB 3PXF), who propose that disrupting the CDK2–cyclinA lecular surfaces buried upon binding. complex via this site could lead to allosteric inhibitors of CDK2. The second example (Fig. 1B) is the heat shock protein Evolutionary Conservation of Binding Sites. The first analysis we HSPA2 (also known as HSP70) for which we observed fragments performed was an assessment of the sequence conservation of binding at five distinct sites. One of these sites is the known pocket residues. For each protein target, we identified orthologous nucleotide binding cleft (dark gray ligand), but we are unaware proteins from other vertebrate species. Our expectation is that of any previous reports of the other four sites. Rodina et al. (33) orthologs (e.g., human and mouse CDK2) will share similar reg- have previously reported allosteric inhibitors of HSPA2, though ulatory mechanisms. We do not expect such regulatory mecha- none of the sites we have observed overlap with the site postu- nisms to be as strongly conserved among paralogs (e.g., human lated in their report. The third example (Fig. 1C) depicts NAD- CDK2 and human P38) (36, 37). For a protein site with no bi- dependent DNA ligase, an antibacterial target (34). In addition ological function, we would expect the sequence conservation of to the known adenosine and nicotinamide binding sites (yellow pocket residues between the human and an orthologous protein to and green ligands, respectively), a fragment was observed in be similar to the overall sequence conservation between the two another previously unknown site. Finally, Fig. 1D shows four proteins. Conversely, a relatively high sequence conservation within fragments bound to the HCV NS3 protein. The allosteric tunnel a site suggests there is some evolutionary pressure to conserve that site discussed in the Introduction is shown (gray ligand) as well as pocket. We think that such a situation is suggestive of biological fragments bound to the ATP site (green) and two distinct regions function. Therefore, we generated a multiple-sequence alignment of the nucleic acid binding groove (cyan and yellow). Compound (MSA) for each of the targets presented here against orthologous 12, originally reported by workers at Schering Plough (35), also proteins from closely related species (overall sequence identity binds within the nucleic acid binding groove, as confirmed by a greater than 80%). Orthologs for each of our targets were identi- recent X-ray structure published by researchers at Boehringer fied by BLASTP searches (E value < 0.01) against SwissProt/ Ingelheim (14). Our compound 11 is very similar to compound TrEMBL (38) protein sequences from the mammalian, vertebrate, 12 and adopts the same binding mode (PDB 4OJQ; Inset). and rodent databases. (For DNA ligase and HCV NS3, we used The data presented in Table 1 represent all of the sites we have the bacterial and viral databases, respectively.) Only the top observed throughout our work on these targets that we believe are BLASTP hit from each species was considered and an MSA of the genuine features of the protein rather than a product of the crystal query and hit sequences was constructed using MAFFT (39). Next, packing. Based on our previous experience with HCV, SolAC, and we compared the global sequence identity of each ortholog to the PKM2, where we were able to unequivocally demonstrate that the sequence identity within each of the identified binding sites. [To

Ludlow et al. PNAS Early Edition | 3of6 Downloaded by guest on October 1, 2021 A B

Fig. 2. Ortholog site vs. global sequence identity for (A) primary sites (active sites, known cofactors, etc.) and (B) secondary sites. Each point represents a single ortholog sequence, jitter applied for clarity. A reference plot (the distribution expected if there was no particular conservation of the site residues) is shown in Supporting Information (Fig. S1).

define the site, we selected the 20 protein residues closest to a using a computational approach. The method we used is based on representative ligand. Where possible, we used the native ligand a knowledge-based force field we developed, and a Voronoi tes- (e.g., nucleotide); in other cases, we took the largest synthetic ligand sellation algorithm described by McConkey et al. (40) (Supporting with Mr < 500 for which we had an X-ray structure.] Information). The resulting atomic mobility index (AMI) ranges Fig. 2 shows the local, site-based sequence identity for the from 0.0 Å for completely buried atoms to 2.8 Å for completely orthologs, plotted against the global sequence identity for that solvent exposed atoms with the potential to be highly mobile. The ortholog; sequence identities are relative to the reference sequence algorithm was run on all structures in our test set, after removing (i.e., the sequence used in the X-ray crystal structure). Fig. 2A all ligands and water molecules, and the results are shown in Fig. shows data for primary sites. For comparison, Fig. S1 shows the 3A. As expected, we were able to show a significant difference expected distribution if site conservation equaled global residue between the AMI values for general surface atoms and those in conservation. It is clear that in the vast majority (90%) of cases, primary ligand-binding sites. More importantly, the secondary sites the site-based sequence identity is greater than the global se- we discovered in our fragment screening campaigns also exhibited quence identity. This indicates that, as expected, the primary sites lower AMI values than general protein surface atoms. For all 27 of the targets in our test set are highly conserved. Fig. 2B shows a primary sites, the mean mobility index of the site is lower than that similar analysis but now for all of the secondary sites identified in of general surface atoms (P < 0.0001). For secondary sites, the the protein targets, including the three examples discussed in the mean mobility index values are lower than those of general surface Introduction. Interestingly, the trend is very similar to that of the atoms for 23 out of 26 sites (P < 0.0001). This, albeit a simplistic primary sites: for the majority (74%) of cases, the sequence model of local residue flexibility, suggests that, similar to primary identity within the secondary site is higher than the global se- ligand-binding sites, most of the secondary sites we discovered are quence identity, that is, in general, the secondary sites that locally more structured than general protein surface atoms. support fragment binding tend to be more conserved across We also analyzed the atomic displacement parameters—or “B orthologous proteins. Even when we exclude those secondary sites factors”—of residues in the primary and secondary sites to explore with a known biological function (Fig. S2), the sequence conser- their mobility. A similar analysis was performed by Yuan et al. (41), vation within sites is still higher than the global sequence identity where they reported a tendency for active-site residues in apo in 71% of cases (P < 0.0001). [The probability of observing these structures to have lower normalized B factors than non–active-site results by chance can be calculated by bootstrapping (Supporting residues. We calculated the chain-normalized residual B factors Information); in all cases, the results are significant at P < 0.0001.] [after subtracting the translation, libration, screw-rotation (TLS) This suggests that many of the secondary sites may indeed have a component; Supporting Information] for all surface atoms and the biological function. On the whole, relative to the primary sites, mean B factors for each site, and these are shown plotted in Fig. 3B. the sequence conservation for the secondary sites is slightly lower. Contrary to what was observed by Yuan et al., we observed no This could be due to a number of reasons. It is likely that some significant difference between the B factors of residues involved in of the sites simply have no biological function—the cluster of ligand binding and other surface residues. It could be that this HSPA2 points at the bottom of Fig. 2B certainly suggests that discrepancy is simply a reflection of the types of targets that were there is at least one very poorly conserved site in our dataset (also included in the two studies. However, another explanation could be see Fig. S3). Another explanation is that the regulatory sites for that, as far as we are aware, Yuan et al. did not correct for the TLS these proteins are less conserved between the species included componentoftheBfactors.Wewould suggest that this correction here as they may have diverged to bind other endogenous ligands should be included because the TLS component represents the or perform other functions. rigid-body atomic displacements that may not be correlated with flexibility of individual residues (also see Fig. S4).Thefactthatthe Mobility of Binding Sites. Although some level of conformational B factors of primary- and secondary-site atoms are not lower than movement of residues is often required on ligand binding, we may those of general protein surface atoms indicates either that un- expect more generally, however, ligand binding sites to be relatively occupied ligand-binding sites genuinely are not considerably more rigid compared with surface residues, as this reduces the entropic structured than other parts of the protein surface, or that experi- cost of the formation of the complex. We decided to investigate mental B factors from X-ray crystal structures are poor estimates mobility of the residues in both the primary and secondary sites of the flexibility of protein atoms in solution. Long-timescale

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1518946112 Ludlow et al. Downloaded by guest on October 1, 2021 AB

Fig. 3. Flexibility of primary (green)- and secondary (red)-site atoms, compared with general protein surface atoms (blue). Results are shown for (A) the mean computed local atomic mobility index (AMI) for the structures in our test set, and (B) the mean normalized B factors as observed in X-ray apo structures. Sites for which we developed a potent lead compound are shown in filled circles. The empty circles are sites we have not attempted to, or were so far unable to develop potent lead compounds against.

molecular-dynamics simulations or NMR structures might provide of activities, but nearly all are enzymes that may require more ex- a better assessment of whether ligand-binding sites on proteins are tensive levels of regulation. In most cases, the primary site of the more structured than other surface areas, but this is beyond the enzyme has been previously well characterized and strategies to scope of this work. inhibit its function would typically involve developing compounds that bind at the primary site, for example, ATP competitive inhib-

Physical Properties of the Sites. We also explored the potential of itors of protein . In contrast, the secondary sites that we BIOPHYSICS AND the secondary sites to support the binding of drug-like molecules. discovered may allow development of compounds that inhibit (or COMPUTATIONAL BIOLOGY Our approach was to compare the secondary sites to the primary activate) the enzyme via an alternative mechanism, be that allosteric sites in terms of level of buried protein on fragment binding and regulation or inhibition of other binding events. It is of note that the also physical properties such as polarity of the protein surface. majority of the proteins in this study exhibited a secondary site that Fig. S5 shows the fraction of ligand area that is buried on binding may suggest that opportunities to modulate protein function with- to the protein, Fb, for primary and secondary sites. On average, out directly inhibiting the primary site may be more widespread ligands binding to secondary sites are slightly less buried than li- than currently perceived. gands binding primary sites: hFbi = 0.85 ± 0.09 for primary sites and Although we have clearly identified multiple secondary sites on a hFbi = 0.74 ± 0.14 for secondary sites. However, when we consider variety of proteins at which fragment binding has been observed, the Fb values for sites we have developed lead series against (filled translating these to useful drugs will require significant effort and is circles, Fig. S5A), many of the secondary sites fall within this range. dependent on many factors. For example, whether the site allows Furthermore, the fraction of buried protein surface that is polar in development of drug molecules that are orthosteric in nature and nature, Fp, is shown in Fig. S5B, and again, primary and secondary compete with a natural ligand or alternatively are more allosteric in sites are highly similar. For primary sites, hFpi = 0.30 ± 0.10, nature, stabilizing a particular conformation, will be dependent on whereas for secondary sites, hFpi = 0.31 ± 0.12. By comparison, the specifics of the particular protein target. In addition, the when we consider the entire protein surface area, the average strategies required in either case will be different (10). polarity is significantly higher, hFpi = 0.45 ± 0.03, which indicates Given that generating experimental confirmation of a bi- that both primary and secondary sites are lipophilic in nature. ological role for each of these newly discovered secondary sites is Finally, we analyzed the physical properties of fragments that impractical, we instead used computational analysis to assess bind primary sites compared with those that bind secondary sites. their nature. First, our sequence analysis of orthologs supports We generated distributions for four properties: heavy-atom count, their potential biological relevance as it is unlikely that the se- ClogP, and number of H-bond donors and acceptors (Fig. S6). For quences would be conserved by evolution if they were not per- each property, the distribution for primary-site binders was highly forming an important function. We are not, however, suggesting similar to the distribution for secondary sites. Taken together, these that all of the secondary sites will have a biological role. Indeed, results indicate that secondary sites exhibit features that are in the sequence conservation for some of the sites is not strong, and general similar to primary sites and may indeed be druggable. these sites we would suggest are less likely to be functionally relevant. For example, for HSPA2, there is a cluster of orthologs Discussion indicating lower sequence conservation of a particular secondary Proteins are complex molecules and their appropriate regulation in site (shown in Fig. 1B and Fig. S3), which may indicate this site is biological processes is essential to maintain normal cellular func- not functionally relevant or has evolved to bind a different en- tions. The mechanisms by which proteins are controlled vary widely, dogenous ligand. However, in the majority of cases, the second- and in this study we have provided evidence that most proteins may ary sites exhibit good sequence conservation, which is consistent have more than one binding site for endogenous ligands and/or with their performing an important biological role. Furthermore, other proteins that regulate their function. With hindsight, this our analysis of local mobility and physical properties of these discovery may not be too surprising given that the typical size of secondary sites suggests that many are in general similar in nature proteins should accommodate more than one binding site and that to primary sites that are known to be druggable. Specifically, both evolution often drives biological systems to maximize efficiency. primary and secondary sites exhibit comparable mobility sug- However, the existence of multiple binding sites on proteins has not, gesting that they may be preorganized to a similar degree, and to date, been broadly recognized as a general feature for the ma- also the fraction of ligand surface buried and the polarity of the jority of proteins. The targets that we investigated have a wide range protein surface are not dissimilar between the two types of sites.

Ludlow et al. PNAS Early Edition | 5of6 Downloaded by guest on October 1, 2021 In this study, we have also demonstrated that fragment-based exhibit very low binding affinity, with Kd often in the millimolar screening using X-ray crystallography can be effective in identi- range, the interactions they make are highly efficient (27). In- fying alternative, potentially biologically relevant, secondary sites deed, these crystal structures of protein–fragment complexes are on protein targets. This underlines the potential of fragment- a good basis to develop novel chemical tools that would probe based screening to efficiently sample chemical space and reveal the function of these secondary binding sites and experimentally previously unknown interactions given its unbiased nature. As confirm their role, if any, in biological processes. shown in the examples, such as CDK2, it is not unusual for a particular fragment to bind in more than one pocket of a single ACKNOWLEDGMENTS. We thank all of our crystallographers, as well as all of protein, or indeed in more than one protein. We would em- the other scientists at Astex, past and present for the tremendous amount of phasize that this is not nonspecific binding that is occurring, but work that has gone into generating the >5,000 X-ray crystal structures of pro- because the fragment has low molecular complexity it is able to tein–fragment complexes, which has enabled us to perform our analyses. In particular, we thank Marc O’Reilly, Joe Patel, Dominic Tisi, Tom Davies, and bind to complementary protein motifs in several locations. This Puja Pathuri for enabling and providing the crystal structures for the examples is consistent with the fact that the electron densities of the shown in Fig. 2. We also thank Chris Murray and Glyn Williams for helpful fragments are typically very well defined and that, although they comments on the manuscript.

1. Hardy JA, Wells JA (2004) Searching for new allosteric sites in enzymes. Curr Opin 23. Jahnke W, et al. (2010) Allosteric non-bisphosphonate FPPS inhibitors identified by Struct Biol 14(6):706–715. fragment-based discovery. Nat Chem Biol 6(9):660–666. 2. Conn PJ, Christopoulos A, Lindsley CW (2009) Allosteric modulators of GPCRs: A novel 24. Jahnke W, et al. (2010) Binding or bending: Distinction of allosteric Abl kinase ago- approach for the treatment of CNS disorders. Nat Rev Drug Discov 8(1):41–54. nists from antagonists by an NMR-based conformational assay. J Am Chem Soc 3. Nussinov R, Tsai CJ (2013) Allostery in disease and in drug discovery. Cell 153(2): 132(20):7043–7048. 293–305. 25. Zhang J, et al. (2010) Targeting Bcr-Abl by combining allosteric with ATP-binding-site 4. Hauske P, Ottmann C, Meltzer M, Ehrmann M, Kaiser M (2008) of inhibitors. Nature 463(7280):501–506. . ChemBioChem 9(18):2920–2928. 26. Ostrem JM, Peters U, Sos ML, Wells JA, Shokat KM (2013) K-Ras(G12C) inhibitors al- 5. Langmead CJ, Christopoulos A (2014) Functional and structural perspectives on allo- losterically control GTP affinity and effector interactions. Nature 503(7477):548–551. steric modulation of GPCRs. Curr Opin Cell Biol 27:94–101. 27. Hartshorn MJ, et al. (2005) Fragment-based lead discovery using X-ray crystallogra- 6. Lewis JA, Lebois EP, Lindsley CW (2008) Allosteric modulation of kinases and GPCRs: phy. J Med Chem 48(2):403–413. Design principles and structural diversity. Curr Opin Chem Biol 12(3):269–280. 28. Mooij WT, et al. (2006) Automated protein-ligand crystallography for structure-based – 7. Ohren JF, et al. (2004) Structures of human MAP kinase kinase 1 (MEK1) and MEK2 drug design. ChemMedChem 1(8):827 838. describe novel noncompetitive kinase inhibition. Nat Struct Mol Biol 11(12): 29. Hendlich M, Rippmann F, Barnickel G (1997) LIGSITE: Automatic and efficient de- 1192–1197. tection of potential small molecule-binding sites in proteins. J Mol Graph Model 15(6): – 8. Di Santo R (2014) Inhibiting the HIV integration process: Past, present, and the future. 359 363, 389. J Med Chem 57(3):539–566. 30. Rothweiler U, et al. (2015) Luciferin and derivatives as a DYRK selective scaffold for – 9. Adrián FJ, et al. (2006) Allosteric inhibitors of Bcr-abl-dependent cell proliferation. the design of inhibitors. Eur J Med Chem 94:140 148. Nat Chem Biol 2(2):95–102. 31. Jeffrey PD, et al. (1995) Mechanism of CDK activation revealed by the structure of a – 10. Nussinov R, Tsai CJ (2012) The different ways through which specificity works in or- cyclinA-CDK2 complex. Nature 376(6538):313 320. 32. Betzi S, et al. (2011) Discovery of a potential allosteric ligand binding site in CDK2. thosteric and allosteric drugs. Curr Pharm Des 18(9):1311–1316. ACS Chem Biol 6(5):492–501. 11. Grimsby J, Berthel SJ, Sarabu R (2008) Glucokinase activators for the potential 33. Rodina A, et al. (2013) Identification of an allosteric pocket on human hsp70 reveals a treatment of type 2 diabetes. Curr Top Med Chem 8(17):1524–1532. mode of inhibition of this therapeutically important protein. Chem Biol 20(12): 12. Kozakov D, et al. (2015) The FTMap family of web servers for determining and 1469–1480. characterizing ligand-binding hot spots of proteins. Nat Protoc 10(5):733–755. 34. Howard S, et al. (2013) Fragment-based discovery of 6-azaindazoles as inhibitors of 13. Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA (2009) Predicting bacterial DNA ligase. ACS Med Chem Lett 4(12):1208–1212. protein ligand binding sites by combining evolutionary sequence conservation and 3D 35. Gesell J, McCoy M, Senior M, Wang Y-S, Wyss D (2006) Modern Magnetic Resonance, structure. PLoS Comput Biol 5(12):e1000585. ed Webb GA (Springer, Dordrecht, The Netherlands), pp 1419–1428. 14. LaPlante SR, et al. (2014) Integrated strategies for identifying leads that target the 36. Tharmalingam S, Burns AR, Roy PJ, Hampson DR (2012) Orthosteric and allosteric drug NS3 helicase of the hepatitis C virus. J Med Chem 57(5):2074–2090. binding sites in the Caenorhabditis elegans mgl-2 metabotropic glutamate receptor. 15. Bowman GR, Geissler PL (2012) Equilibrium fluctuations of a single folded protein Neuropharmacology 63(4):667–674. reveal a multitude of potential cryptic allosteric sites. Proc Natl Acad Sci USA 109(29): 37. Hudson JW, Golding GB, Crerar MM (1993) Evolution of allosteric control in glycogen – 11681 11686. . J Mol Biol 234(3):700–721. 16. Murray CW, Rees DC (2009) The rise of fragment-based drug discovery. Nat Chem 38. Bairoch A, et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res – 1(3):187 192. 33(Database issue):D154–D159. 17. Murray CW, Verdonk ML, Rees DC (2012) Experiences in fragment-based drug dis- 39. Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: A novel method for rapid – covery. Trends Pharmacol Sci 33(5):224 232. multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 18. Saalau-Bethell SM, et al. (2012) Discovery of an allosteric mechanism for the regula- 30(14):3059–3066. – tion of HCV NS3 protein function. Nat Chem Biol 8(11):920 925. 40. McConkey BJ, Sobolev V, Edelman M (2002) Quantification of protein surfaces, vol- 19. Chaneton B, et al. (2012) Serine is a natural ligand and allosteric activator of pyruvate umes and atom-atom contacts using a constrained Voronoi procedure. Bioinformatics kinase M2. Nature 491(7424):458–462. 18(10):1365–1373. 20. Saalau-Bethell SM, et al. (2014) Crystal structure of human soluble adenylate cyclase 41. Yuan Z, Zhao J, Wang ZX (2003) Flexibility analysis of enzyme active sites by crys- reveals a distinct, highly flexible allosteric bicarbonate binding pocket. ChemMedChem tallographic temperature factors. Protein Eng 16(2):109–114. 9(4):823–832. 42. Yang CY, Wang R, Wang S (2005) A systematic analysis of the effect of small-molecule 21. Kleinboelting S, et al. (2014) Crystal structures of human soluble binding on protein flexibility of the ligand-binding sites. J Med Chem 48(18): reveal mechanisms of catalysis and of its activation through bicarbonate. Proc Natl 5648–5650. Acad Sci USA 111(10):3727–3732. 43. Howlin B, Butler SA, Moss DS, Harris GW, Driessen HPC (1993) TLSANL: TLS parameter- 22. Bauman JD, et al. (2013) Detecting allosteric sites of HIV-1 reverse transcriptase by analysis program for segmented anisotropic refinement of macromolecular struc- X-ray crystallographic fragment screening. J Med Chem 56(7):2738–2746. tures. J Appl Cryst 26:622–624.

6of6 | www.pnas.org/cgi/doi/10.1073/pnas.1518946112 Ludlow et al. Downloaded by guest on October 1, 2021