Small Molecules of Different Origins Have Distinct Distributions of Structural Complexity That Correlate with Protein-Binding Profiles

Paul A. Clemons [a,1], Nicole E. Bodycombe [a], Hyman A. Carrinski [a], J. Anthony Wilson [a], Alykhan F. Shamji [a], Bridget K. Wagner [a], Angela N. Koehler [a], and Stuart L. Schreiber [a,b,c,1]

[a] Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142; [b] Howard Hughes Medical Institute and [c] Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138

Contributed by Stuart L. Schreiber, September 3, 2010 (sent for review August 9, 2010)

PNAS | November 2, 2010 | vol. 107 | no. 44 | 18787–18792 | www.pnas.org/cgi/doi/10.1073/pnas.1012741107

Using a diverse collection of small molecules generated from a variety of sources, we measured protein-binding activities of each individual compound against each of 100 diverse (sequence-unrelated) proteins using small-molecule microarrays. We also analyzed structural features, including complexity, of the small molecules. We found that compounds from different sources (commercial, academic, natural) have different protein-binding behaviors and that these behaviors correlate with general trends in stereochemical and shape descriptors for these compound collections. Increasing the content of sp3-hybridized and stereogenic atoms relative to compounds from commercial sources, which comprise the majority of current screening collections, improved binding selectivity and frequency. The results suggest structural features that synthetic chemists can target when synthesizing screening collections for biological discovery. Because binding proteins selectively can be a key feature of high-value probes and drugs, synthesizing compounds having features identified in this study may result in improved performance of screening collections.

chemical diversity | cheminformatics | natural products | small-molecule microarrays | small-molecule probes

Author contributions: P.A.C., A.S., A.N.K., and S.L.S. designed research; P.A.C., N.E.B., and A.N.K. performed research; P.A.C., N.E.B., H.A.C., J.A.W., B.K.W., and A.N.K. contributed new reagents/analytic tools; P.A.C., N.E.B., H.A.C., A.S., and A.N.K. analyzed data; and P.A.C. and S.L.S. wrote the paper.

The authors declare no conflict of interest.

Freely available online through the PNAS open access option.

[1] To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1012741107/-/DCSupplemental.

Small-molecule probe- and drug-discovery activities in academia and the pharmaceutical industry often begin with high-throughput screening. Many thousands of small molecules are tested with the expectation that each has potential as a discovery lead. Thus, assembling or synthesizing compound collections for small-molecule screening represents an important step in discovery success, particularly when selecting among compounds from a variety of synthetic and natural sources. Unbiased methods to evaluate the assay performance of compounds from different sources, and to relate performance to chemical structure (defined by computed structural properties) (1, 2), can provide guidance to one element of more valuable small-molecule screening collections.

Comparative analyses between compounds often involve cheminformatic analysis of compound structures (3–5) or retrospective analysis of compound performance by mining the literature (6–8) or historical data (9, 10). For example, intermediate molecular complexity has been suggested as theoretically preferable for drug leads (11), and this relationship is supported by evidence mined from historical data (9). In this study, we performed unbiased comparisons of compounds from natural and synthetic sources by first identifying compounds with unknown activities and then exposing them to a common assay platform. We identified a compound collection comprising three subsets: (i) 6,152 compounds from commercial sources that are representative of many common screening collections (commercial compounds; CC); (ii) 6,623 compounds assembled from the academic synthetic chemistry community using, e.g., diversity-oriented synthesis (diverse compounds; DC); and (iii) 2,477 naturally occurring compounds (natural products; NP). We then (i) analyzed distributions of stereochemical and shape complexity for each set; (ii) measured protein-binding activities of each member against each of 100 diverse proteins using small-molecule microarrays (SMMs) (12, 13); and (iii) correlated these computed and measured properties (Fig. 1). The resulting correlations suggest that structural features of small molecules relating to hybridization and stereochemistry are important contributors to binding proteins selectively.

Results

To quantify stereochemical and shape complexity, we calculated two parameters for each compound for comparative analysis between the three compound sources. While it is likely that more complex descriptors may better physically model molecular shape complexity, one motivation of the current study was to link simple size-independent metrics with compound performance. First, we defined stereochemical complexity as the proportion of carbon atoms that are stereogenic (C_stereogenic/C_total). This metric provides a size-independent global assessment of stereochemical complexity, varying on the range [0,1] for each molecule. Inspecting histograms of this metric as a function of compound source (Fig. 2) revealed that CC is lowest in stereochemical complexity (median = 0.00; mean = 0.022). In contrast, NP is highest in stereochemical complexity (median = 0.24; mean = 0.24), while DC has intermediate values (median = 0.11; mean = 0.12).

Second, we defined shape complexity as the ratio of sp3-hybridized carbon atoms to total sp3- and sp2-hybridized carbons (C_sp3/[C_sp2 + C_sp3]). This metric is similar to the recently reported Fsp3 metric (14); again, this metric is size-independent, varying on the range [0,1] for each molecule. Using this metric (Fig. 3A), we observed that CC is lowest in proportion of sp3-hybridized carbons (median = 0.22; mean = 0.27), NP is highest (median = 0.55; mean = 0.55), and DC has an intermediate distribution (median = 0.36; mean = 0.39). When we restricted our analysis to carbon atoms in the molecular scaffold (15), we observed a decrease in overall sp3 carbon proportions in all three populations (Fig. 3B). This effect was most striking in CC (median = 0.071; mean = 0.16), indicating that a substantial portion of sp3 carbons in these molecules are in appendages rather than in (predominantly flat) core skeletons. In contrast, NP molecules retain a large proportion of their sp3 carbons in core skeletons (median = 0.50; mean = 0.49).

Fig. 1. Study design to relate structural complexity to protein-binding profiles. Three sources of compounds were studied; diverse samples of each subset are shown to illustrate differences between the subsets (all structures in the study are presented in Dataset S1 and Dataset S2).

In addition to analyzing computed properties, we sought to determine differences in protein-binding abilities of compounds from different origins. For this study, we analyzed a dataset resulting from testing all members of the compound collection individually in binding assays against each of 100 different purified proteins using the SMM platform, which gives a preliminary indication of protein binding. In terms of sequence, proteins were selected having a wide range of structural types rather than representing a family or families of proteins. In terms of function, the proteins were selected having varying roles in transcriptional regulation. SMM slides were exposed in triplicate to each of 100 purified proteins in independent experiments and to a common epitope tag as a control. For the present study, we scored as “hits” for each protein those compounds whose average deviation from control-spot intensities exceeded a fixed threshold for statistical significance, after correction for multiple hypothesis testing (see Materials and Methods). We also eliminated compounds that bound the antibody to the control epitope tag, because these events likely do not correspond to specific binding to the 100 panel proteins. These experiments resulted in a matrix of binary hit calls for 15,252 compounds versus 100 proteins, of which 3,433 (22.5%) compounds bound at least one protein (Fig. 4). We note that this high fraction of compounds binding any protein is not unexpected with 100 parallel protein-binding assays; for example, if hit rates were 0.5% for each protein, and hit compounds were selected randomly for each of 100 proteins, we would expect 1 − (0.995)^100 ≈ 39% of compounds to bind at least one protein.
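The back-of-envelope estimate above can be checked numerically. The following is a minimal sketch, assuming (as the text does) that hits are drawn independently for each protein at a common per-protein rate; the function name is ours and is used only for illustration.

```python
def expected_fraction_binding_any(per_protein_hit_rate: float, n_proteins: int) -> float:
    """Probability that a compound scores in at least one of n independent assays.

    Assumes every assay has the same hit rate and that hits are independent,
    which is the simplifying assumption stated in the text.
    """
    if not 0.0 <= per_protein_hit_rate <= 1.0:
        raise ValueError("per_protein_hit_rate must lie in [0, 1]")
    if n_proteins < 0:
        raise ValueError("n_proteins must be non-negative")
    # Complement of "no hit in any of the n assays".
    return 1.0 - (1.0 - per_protein_hit_rate) ** n_proteins


# Values taken from the text: 0.5% per-protein hit rate, 100 proteins.
estimate = expected_fraction_binding_any(0.005, 100)

# Observed fraction reported in the text: 3,433 of 15,252 compounds.
observed = 3433 / 15252

print(f"expected under independence: {estimate:.1%}")
print(f"observed in the study:       {observed:.1%}")
```

Under these assumptions the estimate comes out near 39%, so the observed 22.5% is lower than this random-selection reference rather than unexpectedly high.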
We characterized the resulting 100 protein-binding profiles for each of CC, DC, and NP subsets using two measures of performance that inform how useful a small-molecule collection might be to the screening community: (i) a measure of the rate at which hits were identified from each subset, and (ii) a measure of specificity of the discovered hits from each subset, based on the number of proteins bound by a given hit. First, we examined the propensity of compounds from different sources to score positive in any protein-binding assay.
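To make the two performance measures concrete, the sketch below computes them from a binary hit matrix. It is an illustration under stated assumptions, not the authors' analysis: the function name, the toy data, and the use of "mean number of proteins bound per hit" as the specificity summary are all ours, since this excerpt does not spell out the exact statistic used.

```python
from collections import defaultdict


def summarize_hits(hit_matrix, sources):
    """Summarize hit rate and a simple specificity proxy per compound source.

    hit_matrix: one row per compound, one 0/1 entry per protein.
    sources:    one source label per compound (e.g. "CC", "DC", "NP").
    """
    if len(hit_matrix) != len(sources):
        raise ValueError("hit_matrix and sources must have the same length")

    totals = defaultdict(int)          # compounds per source
    proteins_bound = defaultdict(list) # proteins bound by each hit, per source

    for row, source in zip(hit_matrix, sources):
        totals[source] += 1
        n_bound = sum(row)
        if n_bound > 0:  # a "hit" binds at least one protein
            proteins_bound[source].append(n_bound)

    summary = {}
    for source, total in totals.items():
        bound = proteins_bound[source]
        summary[source] = {
            # Measure (i): fraction of compounds that bind any protein.
            "hit_rate": len(bound) / total,
            # Measure (ii), illustrative proxy: lower means more selective hits.
            "mean_proteins_per_hit": sum(bound) / len(bound) if bound else 0.0,
        }
    return summary


# Made-up toy data: 4 compounds screened against 3 proteins.
toy_matrix = [[0, 0, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
toy_sources = ["CC", "CC", "CC", "NP"]
toy_summary = summarize_hits(toy_matrix, toy_sources)
print(toy_summary)
```

Keeping the two measures separate matters because a subset can have a high hit rate driven mostly by promiscuous binders, which would be less useful for finding selective probes.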
