Peptide Design by Optimization on a Data-Parameterized Protein
Peptide design by optimization on a data-parameterized protein interaction landscape

Justin M. Jenson(a,1), Vincent Xue(b,1), Lindsey Stretz(a), Tirtha Mandal(a), Lothar “Luther” Reich(a), and Amy E. Keating(a,b,c,d,2)

(a) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139; (b) Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA 02139; (c) Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139; and (d) Koch Institute for Integrative Cancer Research, Cambridge, MA 02139

Edited by William F. DeGrado, University of California, San Francisco, CA, and approved September 18, 2018 (received for review July 28, 2018)

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Many applications in protein engineering require optimizing multiple protein properties simultaneously, such as binding one target but not others or binding a target while maintaining stability. Such multistate design problems require navigating a high-dimensional space to find proteins with desired characteristics. A model that relates protein sequence to functional attributes can guide design to solutions that would be hard to discover via screening. In this work, we measured thousands of protein–peptide binding affinities with the high-throughput interaction assay amped SORTCERY and used the data to parameterize a model of the alpha-helical peptide-binding landscape for three members of the Bcl-2 family of proteins: Bcl-xL, Mcl-1, and Bfl-1. We applied optimization protocols to explore extremes in this landscape to discover peptides with desired interaction profiles. Computational design generated 36 peptides, all of which bound with high affinity and specificity to just one of Bcl-xL, Mcl-1, or Bfl-1, as intended. We designed additional peptides that bound selectively to two out of three of these proteins. The designed peptides were dissimilar to known Bcl-2–binding peptides, and high-resolution crystal structures confirmed that they engaged their targets as expected. Excellent results on this challenging problem demonstrate the power of a landscape modeling approach, and the designed peptides have potential uses as diagnostic tools or cancer therapeutics.

protein–protein interactions | peptide design | energy landscape | Bcl-2 inhibitor | machine learning

Significance

Medicine, agriculture, and the biofuel industry use engineered proteins to perform functions such as binding, catalysis, and signaling. Designing useful proteins faces the “needle in a haystack” problem posed by the astronomical number of possible sequences. Proteins of utility can be found by experimentally screening 10^2–10^9 molecules for properties of interest. We posit that such screens can serve as the beginning of a powerful computationally aided design process. Data collected in high-throughput experiments can be used to learn aspects of the relationship between protein sequence and function. We show how models trained on data can guide computational exploration of huge sequence spaces. This can enable rational design of molecules with custom properties that would be difficult to discover using other techniques.

Protein–protein interactions play an essential role in all cellular functions, making them attractive targets for controlling biological processes. But interaction affinity and specificity are encoded in sequence in a complex way that is not completely understood. A long-standing goal is to describe the relationship between sequence and affinity accurately enough to inform studies of disease mechanisms, for example, by predicting the effects of mutations on function. Models that accurately link sequence to function can also support the design of new binding partners.

Families of structurally similar protein domains provide examples of how subtle details of the protein sequence–structure relationship affect biological function. Paralogs often have distinct functions, and many biomedical applications require reagents that can bind or inhibit just a single member within a larger family. Modern computational and experimental strategies for engineering selective inhibitors can be effective (1–5), but most methods treat each target design as a new problem, typically requiring a new set of screens or selections. Further, focusing on the “best” sequences obtained from a protein design screen or selection leads to an artificially narrow view of allowable sequence variation, which can be skewed by biases of the experimental setup (6–9). The need for custom, selective, protein-binding molecules has compelled our pursuit of methods for modeling a wide swath of the sequence landscape for a protein family.

Proteins of the B cell lymphoma 2 (Bcl-2) family are regulators of apoptosis that have emerged as key therapeutic targets in cancer biology. Overexpression of human antiapoptotic proteins Bcl-2, Mcl-1, Bfl-1, Bcl-xL, and Bcl-w contributes to oncogenesis and resistance to chemotherapy (10, 11). These five proteins share the same fold and are 18–53% identical in sequence (12). Many native partners of Bcl-2, Mcl-1, Bfl-1, Bcl-xL, and Bcl-w contain an ∼23-residue Bcl-2 homology 3 (BH3) motif that is disordered in isolation but forms a helix upon binding. Small molecules or peptides that block binding to this helix can inhibit antiapoptotic function and have shown promise in preclinical and clinical studies (13–17). Bcl-2 inhibitors that are selective for certain family members are preferred because off-target binding can lead to cytotoxic effects (13, 18).

Designing selective Bcl-2 family inhibitors has proven challenging. Both experimental and computational efforts to develop selective protein or peptide inhibitors have had to innovate to achieve this, either by including extensive counter screening in experimental approaches (19–21) or by expanding the size and extent of the inhibitor interface with its Bcl-2 target (22–24). Approaches used so far have treated each target as a separate design problem [e.g., “identify a tight and selective binder of protein Bfl-1” (25)], and for every design objective, a new set of experiments has been performed. Here we recast the specificity design problem as one of mapping the high-dimensional binding landscape of multiple family members at once. We quantified thousands of protein–peptide interactions for Bfl-1, Mcl-1, and Bcl-xL and used the data to construct models that capture determinants of binding affinity and specificity (26, 27). We applied our models to peptide design, showing that a data-driven approach provides new peptides with customized binding properties. Our success illustrates an approach for integrating data collection and modeling to map high-dimensional protein binding landscapes and guide exploration of large sequence spaces. Such methods can enable the discovery of novel peptides with great potential utility.

Results

Our approach to mapping the Bcl-2 binding landscape was to collect data using the high-throughput experimental protocol SORTCERY and apply it to derive computational models to describe the functions of unobserved sequences. SORTCERY uses yeast cell surface display, fluorescence-activated cell sorting (FACS), and deep sequencing to obtain information about the binding of thousands of peptides in parallel (26, 27). Briefly, yeast cells displaying peptide ligands are separated into pools based on their normalized signals for binding to a target protein, using FACS.

elaboration of the original SORTCERY protocol affinity mapped (amped) SORTCERY (Fig. 1A).

We applied amped SORTCERY to measure binding of the three Bcl-2 family proteins Bcl-xL, Mcl-1, and Bfl-1 to members of a diverse library of BH3-like peptides. Approximately 10,000 peptides were selected from larger combinatorial libraries that were computationally designed to be enriched in selective binders of Bcl-xL, Mcl-1, or Bfl-1 (25). The libraries contained peptides with up to eight amino acid mutations compared with human Bim or Puma BH3 motifs and had a theoretical diversity of 27,696,384 members; we refer to this set of sequences as the input library (Fig. 1A and SI Appendix, section A and Table S1). The 10,000 clones to be assayed were preselected to have a range of affinities for Bcl-xL, Mcl-1, and Bfl-1. We performed high-throughput amped SORTCERY binding experiments for each target, in duplicate, generating six datasets. After computational filtering, each experiment provided binding data for between 1,292 and 3,489 unique peptides (Table 1).

Amped SORTCERY measures affinity (A) in arbitrary units related to normalized FACS signals for pools of sorted cells. Theory predicts that A will be linearly related to cell surface binding free energies over a certain resolution range, under certain conditions (26). To test this relationship, and to convert A measurements to apparent binding free energies in kcal/mol, we titrated peptide standards with each of the three target proteins and fit the resulting curves (SI Appendix, Table S2 and

Author contributions: J.M.J., V.X., and A.E.K. designed research; J.M.J., V.X., L.S., and T.M. performed research; L.L.R. contributed new reagents/analytic tools; J.M.J., V.X., L.S., and A.E.K. analyzed data; and J.M.J., V.X., and A.E.K. wrote the paper.
Conflict of interest statement: Massachusetts Institute of Technology is filing a patent application to cover the sequences reported herein.
This article is a PNAS Direct Submission.
Published under the PNAS license.
Data deposition: The atomic coordinates have been deposited in the Protein Data Bank, www.wwpdb.org (PDB ID codes 6E3I and 6E3J). Raw SORTCERY data have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo (accession no. GSE118147).
1 J.M.J. and V.X. contributed equally to this work.
2 To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1812939115/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1812939115 PNAS Latest Articles | 1 of 10
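
The conversion of amped SORTCERY affinity values (A) to apparent binding free energies, described in Results, relies on a linear relationship calibrated with titrated peptide standards. The Python sketch below illustrates one way such a calibration could work; it is not the authors' code. It assumes that each standard has a dissociation constant Kd from its titration curve, that the free energy is taken as RT ln(Kd) at an assumed temperature of 298.15 K, and that an ordinary least-squares line is adequate. The standards listed are hypothetical values invented for illustration, and the function names are likewise illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temp_k=298.15):
    """Apparent binding free energy (kcal/mol) from a Kd in molar units.

    The temperature is an assumption of this sketch, not a value from the paper.
    """
    return R_KCAL * temp_k * math.log(kd_molar)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical standards: (A in arbitrary units, titration-derived Kd in molar).
# These numbers are made up; real standards would come from experiments.
standards = [(2.0, 1e-9), (4.0, 1e-8), (6.0, 1e-7), (8.0, 1e-6)]

a_values = [a for a, _ in standards]
dg_values = [kd_to_dg(kd) for _, kd in standards]
slope, intercept = fit_line(a_values, dg_values)


def a_to_dg(a):
    """Convert an A measurement to apparent free energy using the fitted line."""
    return slope * a + intercept


print(f"slope = {slope:.3f} kcal/mol per A unit, intercept = {intercept:.3f} kcal/mol")
print(f"A = 5.0 maps to {a_to_dg(5.0):.2f} kcal/mol")
```

In practice the fit would be restricted to the range of A over which linearity holds, and the sign and magnitude of the slope would be determined by the measured standards and not assumed in advance.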