De novo discovery of synthetic peptide binders to protein-protein interfaces

by

Anthony James Quartararo

B.S. Chemistry; B.S. Microbiology and Cell Science University of Florida, 2014

Submitted to the Department of Chemistry in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

at the

Massachusetts Institute of Technology

September 2020

© 2020 Massachusetts Institute of Technology All rights reserved

Signature of Author: ______Department of Chemistry August 13, 2020

Certified by: ______Bradley L. Pentelute Associate Professor of Chemistry Thesis Supervisor

Accepted by: ______Robert W. Field Haslam and Dewey Professor of Chemistry Chair, Departmental Committee on Graduate Students

This doctoral thesis has been examined by a committee of the Department of Chemistry as follows:

______Alex K. Shalek Pfizer-Laubach Career Development Associate Professor of Chemistry Thesis Committee Chair

______Bradley L. Pentelute Associate Professor of Chemistry Thesis Supervisor

______Ronald T. Raines Firmenich Professor of Chemistry


De novo discovery of synthetic peptide binders to protein-protein interfaces

by

Anthony James Quartararo

Submitted to the Department of Chemistry on August 13, 2020 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Abstract

Protein-protein interactions (PPIs) play crucial roles in mediating normal cellular physiology, but their modulation has historically been challenging. PPIs tend to be intractable to small molecule inhibition because of their wide and relatively featureless interfaces, and biologics are generally not viable for the approximately two-thirds of PPIs that take place in the intracellular milieu. This class of target has therefore been deemed undruggable in many circles. Peptides are an emerging therapeutic modality for disrupting PPIs. With proper engineering, they can engage proteins over large surface areas and in some cases be modified to access the cytosol. PPI-disrupting peptides are often discovered from highly diverse combinatorial libraries, accessed through either genetic or chemical means. Genetically encoded approaches can reliably investigate enormous libraries (10⁸–10¹³ members), typically via selection. However, despite progress in this area, these libraries are generally limited to natural chemical space. Peptides identified from such approaches therefore require extensive engineering to improve proteolytic stability and promote cell penetration, at the potential cost of potency. Synthetic libraries, on the other hand, are highly amenable to non-canonical amino acid incorporation and a wide variety of chemical modifications. However, these libraries are typically examined by screening, which in practice limits the diversities that can be explored to ~10⁶. In this thesis, a magnetic bead-based affinity selection-mass spectrometry (AS-MS) workflow was developed to interrogate fully randomized, chemically accessed peptide libraries comprising up to 10⁸ members with high fidelity. This approach takes advantage of recent advances in nano-liquid chromatography-tandem mass spectrometry (nLC-MS/MS)-based peptide sequencing, which facilitates high-confidence decoding of complex peptide mixtures. Starting with a model selection target, an anti-hemagglutinin monoclonal antibody, it was demonstrated that high enrichments of true binders could be achieved from a library comprising 10⁶ members. The number of binders identified scaled in proportion to library diversity as diversity was increased from 10⁶ to 10⁸. Beyond 10⁸, the complexity of isolated pools from single-pass selections became too great for reliable decoding. These results were applied to selections against biomedically relevant targets, enabling the identification of p53-like binders to the oncogenic ubiquitin ligase MDM2, and a family of low-nanomolar affinity, α/β-peptide-based binders to 14-3-3. Both sets of binders engage these targets at PPI epitopes. Finally, machine learning methods were developed to distinguish non-specific from true binders identified by AS-MS, which we anticipate will greatly streamline future discovery efforts.

Thesis supervisor: Bradley L. Pentelute Associate Professor of Chemistry


Acknowledgements

It is hard to put into words what the last six years have meant to me. I have learned so much, gained so many skills, broken so much glassware, and built so many close relationships with a lot of truly amazing people. Graduate school has taught me about chemistry, yes, about peptide synthesis and libraries and biophysical characterization of binding partners…but it has also taught me how to think critically, how to analyze problems and evaluate solutions, how to manage projects and people, and how to communicate a vision that can allow you to lead. I came into graduate school wondering if I could weather it. I am writing these acknowledgements wondering if I am ready to leave it behind.

A sincere, first and foremost thank you to my advisor, Prof. Brad Pentelute, for believing in me enough to let me be a part of his lab and share in his vision for the past six years. It was my meetings with Brad and his lab members that inspired me to come to MIT, so needless to say I was excited at the opportunity to conduct my graduate school journey in his lab from day one. I owe a huge debt to Brad for his support over the years, for his excitement over new ideas, his sky-is-the-limit attitude, and his injection of energy into a project when I myself was low on it. I thank him for pushing me when I needed to be pushed, for listening when I needed to be listened to, and for telling me to have fun when I needed that, too. As Brad would always say, you are only limited by your imagination, and in his lab that really felt true.

I would like to thank my thesis chair, Prof. Alex Shalek, for providing guidance over the years and for our helpful conversations regarding research, career advice, and general graduate school life. His feedback after oral exams has helped me become a better and more confident communicator. I would also like to thank Prof. Catherine Drennan for welcoming me to MIT in the summer of 2014, for exemplifying what it means to be a great professor when I was a young and nervous teaching assistant for her 5.111 class, and for serving on my thesis committee for the better part of six years. To Prof. Ron Raines, who generously agreed to join my committee when scheduling conflicts arose, thank you as well.

In my time in graduate school I have had the pleasure of working with many incredible people. I would like to thank Dr. Zak Gates, who served as my mentor in lab when I first started, for training me in his meticulous and rigorous style, for instilling in me his views on “mental toughness”, and for teaching me the value of a control. I would also like to thank Dr. Amy Rabideau for being a model senior graduate student when I first joined, and for exemplifying how to perform research and lead with positive energy and humble intensity. To all of those who served on team DARPA, in particular Dr. Alex Vinogradov, Dr. Chi Zhang, and Dr. Surin Mong, some of my biggest role models in grad school, I would like to thank them for bringing me on board, for talking to me as a colleague when it felt like I knew nothing, and for persevering under incredibly intense circumstances. We survived to the end.

I have worked with so many people during my time at MIT that it is impossible to detail how each of them shaped my development, but I can’t faithfully write an acknowledgements section without calling them out by name here.
To: John Albin, Sarah Antilla, Matthew Bakalar, Anupam Bandyopadhyay, Joseph Brown, Ivan Buslov, Alex Callahan, Puguang Chen, Dan Chinnapen, Zi-Ning Choo, Dan Cohen, Amanda Cowfer, Kyan D'Angelo, Peng Dai, Diomedes Dieppa-Matos, Heemal Dhanjee, Daniel Dunkelmann, Matt Elkins, Ethan Evans, Colin Fadzen, Charlotte Farquhar, Nathalie Grob, Cameron Hanna, Stephanie Hanna, Nina Hartrampf, Patrick Hauck, Yuki Hirata, Rebecca Holden, Katie Halloran, Mette Ishoey, Muhammad Jbara, Kashif Khan, Guillaume Lautrette, Yen-Chun Lee, Chengxi Li, Shunying Liu, Alex Loftis, Eva Maria Lopez Vidal, Mike Lu, Mikael Madsen, Yuta Maki, Aaron Mallek, Akihiro Manbo, Massi Menichelli, Alex Mijalis, Somesh Mohapatra, Taichi Nakamura, Sebastian Pomplun, Mackenzie Poskus, Elaine Qian, Jacob Rodriguez, Anthony Rojas, Azin Saebi, Michael Santos, Carly Schissel, Adeline Schmitt, Deborah Sementa, Timothy Senter, Arisa Shimada, Christopher Shugrue, Mark Simon, Bente Somsen, Xuyu Tan, Jason Tao, Dale Thomas, Kyle Totaro, Faycal Touti, Nicholas Truex, Suan Tuang, Ekaterina Vinogradova, Binyou Wang, Justin Wolfe, Jessica Wilson, Xiyun Ye, Genwei Zhang, and Vanessa Zuger, thank you. It was a pleasure working with all of you. In particular, to Bente and Arisa, thank you for being patient with me and helping me to push this work forward. I am very lucky to have worked with such talented visiting students.

There are sprints and there are marathons, and then there is grad school. I would never have made it to the end without the encouragement and support of my friends. Thank you to Anmol, Chris, Lisa, Jennifer, and Louis, who made my transition into Biochemistry (now Chemical Biology, maybe a sign I have been here too long) at MIT so much brighter. Thank you to the summer 2014 crew, A Dance with Doctorates, for making coming to MIT such a fun experience. To Nina, Rebecca, Colin, and Azin, thank you for all of your fun, down-to-earth energy, and for always humoring my many horrible puns. And to Loftis, Carly, and Tuang, without whom I could not have survived here (re: workouts, science discussions, nights out, volleyball, personal crises, squash games, and many, many laughs), a thank you from the bottom of my heart. Your friendship means more to me than I can express in a reasonable space here.

Finally, I would like to thank my family, whose love and support made my journey to and at MIT possible. To my mom, who taught me to never cut corners, to believe in myself, and who instilled in me the importance of humble ambition (keep your feet on the ground and reach for the stars), thank you for always being there. I promise I will keep dreaming big dreams. To my dad, who got me interested in science at an early age, either through trips to the science museum or one of many Discovery Channel specials on black holes/antimatter/string theory that always seemed to be on, thank you. Your can-do-anything attitude has given me encouragement every step along the way, especially at those times when I really needed it. To my cousins, Joey, Tina, and Lisa, and my Aunt Diane and Uncle Donald: for you, three dogs and a baby might be chaotic; for me, it’s therapeutic (of course, I am not responsible for them). Thank you for all the fun and the laughs; I know we’ll keep them coming. And to my brother, Justin: it’s been nine years since I shadowed you in lab, thirteen years since you tutored me in chemistry, and twenty-eight years I’ve been able to call you my best friend. No one has been a greater inspiration in my professional development than you. Thank you for the support, the guidance, the conversations/debates/vehement disagreements, and the encouragement. I hope this work makes you proud. And I hope I can one day make as much money as you. But that’s a conversation for another time.

To all of you, thank you.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1. Background and Overview
1.1. Protein-protein interactions: a broad and challenging class of drug target
1.2. Development of peptides as protein-protein interaction modulators
1.3. Discovery of new peptide-based binders at protein-protein interaction interfaces
1.3.1. Genetically encoded approaches
1.3.2. Chemical approaches
1.4. Thesis overview
1.5. References
Chapter 2. Affinity Selection-Mass Spectrometry of High Diversity Synthetic Peptide Libraries
2.1. Introduction
2.2. Results
2.2.1. AS-MS recovers high-affinity ligands from high dilution
2.2.2. AS-MS enriches 12ca5 binders from a 10⁶-member library
2.2.3. Enrichment is maintained as diversity increases from 10⁶–10⁸
2.2.4. Recovery of binders drops from libraries beyond 10⁸ members
2.2.5. Parallel selections distinguish non-specific binders
2.3. Discussion
2.4. Experimental
2.4.1. Materials
2.4.2. SPPS of anti-hemagglutinin (HA) epitope and analogues
2.4.3. Competition fluorescence polarization of HA epitope and analogues
2.4.4. BioLayer Interferometry of 25 nM-affinity 12ca5 ligand
2.4.5. Preparation of biotinylated 12ca5
2.4.6. Affinity capture of 12ca5-binding peptides: effect of ligand concentration
2.4.7. Affinity capture of 12ca5-binding peptides: effect of capture protocol
2.4.8. Affinity capture of 12ca5-binding peptides: effect of magnetic bead concentration
2.4.9. Affinity capture of 12ca5-binding peptides: effect of ligand concentration, with concentration of eluate
2.4.10. Preparation of 2 × 10⁶, 2 × 10⁷, and 2 × 10⁸-member (X)9K-CONH2 libraries
2.4.11. Characterization of 2 × 10⁶, 2 × 10⁷, and 2 × 10⁸-member (X)9K-CONH2 libraries
2.4.12. Affinity selections of 2 × 10⁶, 2 × 10⁷, and 2 × 10⁸-member libraries against 12ca5
2.4.13. SPPS of HA epitope Ala scan mutants
2.4.14. Competition fluorescence polarization of HA epitope Ala scan mutants
2.4.15. Investigation of enrichment vs. sample loading
2.4.16. Preparation of a 1 × 10⁹-member (X)9K-CONH2 library
2.4.17. Characterization of a 1 × 10⁹-member (X)9K-CONH2 library
2.4.18. Affinity selections of 10⁸ and 10⁹-member libraries against 12ca5: effect of library diversity
2.4.19. Affinity selections of 10⁸ and 10⁹-member libraries against 12ca5: effect of increased starting material of library
2.4.20. Affinity selections of 10⁸ and 10⁹-member libraries against 12ca5: effect of increased starting material of selection target
2.4.21. Affinity selections of 10⁸ and 10⁹-member libraries against 12ca5: effect of exogenous competitor on selections from a 10⁸-member library
2.4.22. Side-by-side selections against 12ca5 and human polyclonal IgG1
2.5. Acknowledgements
2.6. Appendix
2.7. References
Chapter 3. De novo discovery of synthetic peptide binders to MDM2/p53 and 14-3-3/phosphoprotein interfaces
3.1. Introduction
3.2. Results
3.2.1. High-diversity libraries enable discovery of MDM2-binding peptides
3.2.2. AS-MS identifies non-canonical 14-3-3γ-binding peptides
3.2.3. Discovered peptides bind 14-3-3γ with low nanomolar affinity
3.2.4. β-amino acids facilitate a key binding contact with 14-3-3
3.3. Discussion
3.4. Experimental
3.4.1. Materials
3.4.2. Preparation of a 1 × 10⁹-member (X)12K-CONH2 library
3.4.3. Characterization of a 1 × 10⁹-member (X)12K-CONH2 library
3.4.4. Preparation of synthetic (25-109)MDM2 K36(biotin)
3.4.5. Affinity selections against MDM2: multi-pot selections of five (2 × 10⁸)-member libraries
3.4.6. Affinity selections against MDM2: one-pot selections of 2 × 10⁷ and 1 × 10⁹-member libraries
3.4.7. Expression of 14-3-3γ
3.4.8. Biotinylation of 14-3-3γ
3.4.9. Preparation of a non-canonical (X)4(pS)(X)4-CONH2 library
3.4.10. Characterization of a non-canonical (X)4(pS)(X)4-CONH2 library
3.4.11. Affinity selections against 14-3-3γ
3.4.12. SPPS of FITC-labeled putative 14-3-3γ-binding peptides
3.4.13. Fluorescence anisotropy binding assay of 14-3-3γ-binding peptides
3.4.14. SPPS of unlabeled 14-3-3γ-binding peptides
3.4.15. Competition fluorescence anisotropy binding assay of 14-3-3γ-binding peptides
3.4.16. Expression of 14-3-3σΔc
3.4.17. Binding validation of 14-3-3.12 and 14-3-3σΔc
3.4.18. Single crystal X-Ray diffraction analysis of 14-3-3σΔc in complex with synthetic peptide binder 14-3-3.12
3.5. Acknowledgements
3.6. Appendix
3.7. References
Chapter 4. Machine learning facilitates classification of specific and non-specific peptide binders
4.1. Introduction
4.2. Results
4.2.1. AS-MS identifies both 12ca5-specific binders and non-specific binders
4.2.2. Unsupervised learning distinguishes 12ca5-specific and non-specific binders
4.2.3. Supervised learning classifies unknown sequences with 99% accuracy
4.3. Discussion
4.4. Experimental
4.4.1. Preparation and characterization of libraries
4.4.2. Affinity selection of libraries against 12ca5
4.5. Acknowledgements
4.6. Appendix
4.7. References
Appendix: Discovery of peptide-based binders to the SARS-CoV-2 spike protein receptor binding domain and ACE2
A.1. Introduction
A.2. Results
A.3. Discussion
A.4. Experimental
A.4.1. Materials
A.4.2. Synthesis and characterization of a 10⁹-member (X)12K-CONH2 library
A.4.3. Biotinylation of SARS-CoV-2 spike RBD
A.4.4. Affinity selections for SARS-CoV-2 spike RBD binding
A.4.5. Solid phase synthesis of putative RBD-binding peptides and controls
A.4.6. BioLayer Interferometry of putative RBD-binding peptides and controls
A.4.7. BioLayer Interferometry competition assay of RBD-binding peptides
A.4.8. Synthesis of a non-canonical (X)12K-CONH2 library
A.4.9. Characterization of a non-canonical (X)12K-CONH2 library
A.4.10. Affinity selections for ACE2 binding
A.5. Acknowledgements
A.6. Appendix
A.7. References

List of Figures

Figure 2.2.1. AS-MS enables discovery from randomized, high-diversity chemical libraries
Figure 2.2.2. Magnetic bead capture recovers high-affinity binders from both a) simple mixtures and b) random libraries
Figure 2.2.3. Loss of binder identification from a 10⁹-member library cannot be attributed to material losses or competition
Figure 2.2.4. Parallel selections enable differentiation of specific and non-specific binders
Figure 2.6.1. LC-MS characterization of the native HA epitope
Figure 2.6.2. LC-MS characterization of FDYEDYAEWK
Figure 2.6.3. LC-MS characterization of HA epitope mutant Ala9Gly
Figure 2.6.4. LC-MS characterization of Gypyeydwe
Figure 2.6.5. LC-MS characterization of ypyefdeph
Figure 2.6.6. Binding characterization of FDYEDYAEWK
Figure 2.6.7. MS calibration curves for 12ca5-binding peptides
Figure 2.6.8. Magnetic bead capture efficiently recovers high-affinity 12ca5 ligands
Figure 2.6.9. High-affinity 12ca5-binding peptides are efficiently recovered with ‘direct’ capture at high dilution
Figure 2.6.10. Magnetic bead concentration does not significantly improve recovery for high affinity binders
Figure 2.6.11. Detection of recovered 12ca5 binders from 1 pM concentration is enabled with post-pulldown concentration of eluate
Figure 2.6.12. nLC-MS/MS characterization of a (X)9K-CONH2 library
Figure 2.6.13. LC-MS characterization of HA epitope Tyr1Ala mutant
Figure 2.6.14. LC-MS characterization of HA epitope Pro2Ala mutant
Figure 2.6.15. LC-MS characterization of HA epitope Tyr3Ala mutant
Figure 2.6.16. LC-MS characterization of HA epitope Asp4Ala mutant
Figure 2.6.17. LC-MS characterization of HA epitope Val5Ala mutant
Figure 2.6.18. LC-MS characterization of HA epitope Pro6Ala mutant
Figure 2.6.19. LC-MS characterization of HA epitope Asp7Ala mutant
Figure 2.6.20. LC-MS characterization of HA epitope Tyr8Ala mutant
Figure 2.6.21. Competition fluorescence polarization of HA epitope Ala scan mutants confirms Asp4, Asp7, Tyr8, and Ala9 as ‘hot spot’ residues
Figure 2.6.22. Enrichment decreases at increased sample loadings in a selection for 12ca5-binding performed at 200 pM/member
Figure 2.6.23. nLC-MS/MS characterization of a (X)9K-CONH2 library
Figure 2.6.24. Parallel selections enable distinction of specific binders
Figure 2.6.25. Parallel selections identify non-specific binders
Figure 3.2.1. Magnetic bead-based affinity selection-mass spectrometry (AS-MS) enables de novo discovery of peptide-based binders to various protein targets
Figure 3.2.2. Biotinylated (25-109)MDM2 is readily accessed for use in selections via automated fast flow synthesis
Figure 3.2.3. High-diversity libraries facilitate discovery of p53-like peptides
Figure 3.2.4. Synthetic libraries identify a 14-3-3γ-binding consensus based on β-amino acids
Figure 3.6.1. Fully automated fast flow synthesis enables rapid access to biotinylated MDM2 ring domain
Figure 3.6.2. nLC-MS/MS characterization of a (X)12K-CONH2 library
Figure 3.6.3. Incomplete peptide backbone fragmentation during MS/MS results in potentially inaccurate sequence assignments
Figure 3.6.4. nLC-MS/MS characterization of a (X)4(pS)(X)4K-CONH2 library
Figure 3.6.5. Selections of a library comprised of non-canonical amino acids against 14-3-3γ identify sequences with prominent N-term and C-term motifs
Figure 3.6.6. Select putative 14-3-3γ binders are reproducibly pulled down in the presence of 14-3-3
Figure 3.6.7. 12ca5-unique sequences are not reproducibly pulled down and are likely selection artifacts
Figure 3.6.8. LC-MS characterization of FITC-labeled 14-3-3.1
Figure 3.6.9. LC-MS characterization of FITC-labeled 14-3-3.6
Figure 3.6.10. LC-MS characterization of FITC-labeled 14-3-3.12
Figure 3.6.11. LC-MS characterization of FITC-labeled NB.1
Figure 3.6.12. LC-MS characterization of unlabeled 14-3-3.1
Figure 3.6.13. LC-MS characterization of unlabeled 14-3-3.6
Figure 3.6.14. LC-MS characterization of unlabeled 14-3-3.12
Figure 3.6.15. LC-MS characterization of unlabeled NB.1
Figure 3.6.16. FITC-labeled 14-3-3.12 retains binding activity for 14-3-3σ as measured by fluorescence anisotropy
Figure 3.6.17. 4-Nitrophenylalanine9 makes key contacts with 14-3-3σ
Figure 4.1.1. Machine learning streamlines peptide binder discovery to a model selection target
Figure 4.2.1. Unsupervised learning over chemical similarities distinguishes 12ca5-specific and non-specific binders
Figure 4.2.2. Supervised learning classifies 12ca5-specific, non-specific, and non-binders
Figure A.2.1. Identified peptides bind SARS-CoV-2 spike RBD with nanomolar affinity
Figure A.2.2. A consensus motif based on non-canonical amino acids is identified from selections against ACE2
Figure A.6.1. Analysis of selection hit RBD.1
Figure A.6.2. Analysis of selection hit RBD.2
Figure A.6.3. Analysis of selection hit RBD.3
Figure A.6.4. Structure and LC-MS characterization of RBD.1-biotin
Figure A.6.5. Structure and LC-MS characterization of RBD.2-biotin
Figure A.6.6. Structure and LC-MS characterization of RBD.4-biotin
Figure A.6.7. LC-MS characterization of Scr1, a sequence scrambled variant of RBD.1
Figure A.6.8. LC-MS characterization of Scr2, a sequence scrambled variant of RBD.1
Figure A.6.9. Identified RBD-binding peptide shows no binding activity towards control protein 12ca5
Figure A.6.10. RBD.1 does not compete with ACE2 for RBD binding
Figure A.6.11. Analysis of putative ACE2-binding peptide Asp-DPro-Msn-4Af-Amb-Amb-Amb-Dff-Ile-Gly-Gln-Php-Lys
Figure A.6.12. Analysis of putative ACE2-binding peptide Orn-Dff-DPro-Asp-Aad-Amb-Dff-Gly-Gly-Amb-hAr-Php-Lys
Figure A.6.13. Analysis of putative ACE2-binding peptide Tha-Dmf-Aad-Aad-Aad-Asp-Cha-Dff-DPro-Cpa-4Py-Php-Lys
Figure A.6.14. Analysis of putative ACE2-binding peptide Php-Aad-Aad-Tha-4Af-Ile-Gly-Gln-Php-Msn-Asp-Amb-Lys

List of Tables

Table 2.2.1. Mixtures of up to 10⁹ peptides contain sufficient material for MS-based sequencing
Table 2.6.1. Characterization of 12ca5-binding peptides by competition fluorescence polarization
Table 2.6.2. Affinity-capture mass spectrometry identifies 12ca5-binding sequences in proportion with library size
Table 2.6.3. Replicate selections from a 2 × 10⁸-member library identify similar populations of 12ca5-binding peptides
Table 2.6.4. Use of a precursor selection threshold modulates enrichment based on sample loading
Table 2.6.5. Selections against 12ca5 identify a decreasing number of motif-containing sequences as library size is increased from 10⁸ to 10⁹
Table 2.6.6. Increasing scale of 10⁹-member library selections does not restore recovery of 12ca5-binding peptides
Table 2.6.7. Ten-fold increase in 12ca5 in selections from a 10⁹-member library does not restore recovery of 12ca5-binding peptides
Table 2.6.8. Peptides identified in presence of high exogenous competitor exhibit generally stronger signal intensities than those identified only in the presence of low exogenous competitor
Table 2.6.9. Sequence subtraction from side-by-side selections yields modestly improved enrichments
Table 3.6.1. Multi-pot selections against MDM2 identify p53-like peptides from a 10⁹-member library
Table 3.6.2. Side-by-side selections against 14-3-3γ and 12ca5 identify 17 sequences pulled down uniquely in the presence of 14-3-3
Table 3.6.3. Data collection and refinement statistics for the 14-3-3σΔC/14-3-3.12 complex
Table 4.6.1. Affinity selections of 10⁶–10⁸-member (X)9K and 10⁹-member (X)12K libraries against 12ca5 identify 1055 12ca5-binding peptides
Table 4.6.2. Affinity selections of 10⁶–10⁸-member (X)9K and 10⁹-member (X)12K libraries against 12ca5 identify 1649 non-specific binders
Table A.2.1. AS-MS identifies three selective RBD-binding peptides
Table A.6.1. Selections for ACE2 binding identify 54 putative binders

Chapter 1. Background and Overview

1.1. Protein-protein interactions: a broad and challenging class of drug target

Protein-protein interactions (PPIs) are critical for maintaining a wide variety of cellular processes, including signal transduction, metabolic processing, and cell cycle control¹. Although proteins individually perform essential functions, such as catalysis or ion transport, their actions independent of other biomolecules are short-ranged, limited by their half-lives and the crowded cellular environment². The output of proteins acting individually, therefore, is insufficient to sustain life. Proteins instead exert their influence over longer ranges through vast, complex networks of interactions with other proteins (and other biomolecules), enabling highly coordinated regulation of cellular physiology³,⁴. It is estimated that, from only ~20,000 protein-coding genes, the human interactome consists of ~650,000 PPIs⁵.

Given their abundance and importance for sustaining normal cell function, it is perhaps not surprising that aberrant signaling among PPIs can lead to a variety of human disease states³.

PPIs have therefore drawn much interest from the pharmaceutical industry as a broad class of therapeutic target, providing opportunities for intervention in indications such as oncology, infectious diseases, heart failure, inflammation, and neurological disorders⁶. Indeed, Venetoclax (a small molecule) and Pembrolizumab (a monoclonal antibody) are two recent examples of FDA-approved PPI inhibitors used for the treatment of various cancers, each a first-in-class inhibitor of its respective target.

Despite these successes, however, the development of potent PPI modulators has historically been very challenging. PPIs typically interact over large interfacial surface areas⁷ of approximately 1000–2000 Å², in contrast⁶ to more traditional enzyme active site targets, which interact with their substrates over surface areas of ~300–500 Å². Moreover, PPI interfaces often tend to be relatively flat and featureless, lacking the defined pockets characteristic of enzyme active sites. PPI modulation is therefore a challenging task for traditional small molecules (typically < 500 Da), which tend to lack the surface area or chemical topology required for potent binding at PPI interfaces (though some exceptions to this generality exist, particularly for PPIs with smaller buried surface areas)⁶,⁸. The success of the small molecule Venetoclax (868 Da), which inhibits the apoptotic regulator Bcl-2, demonstrates the importance of molecular weight (and contact surface area) for PPI inhibition⁹.

Due to their ability to engage targets over large surface areas, large biologics, such as nanobodies and monoclonal antibodies, are an alternative strategy for PPI inhibition (in fact, most therapeutic antibodies engage protein targets at PPI interfaces). Pembrolizumab, for example, disrupts the interaction between the programmed cell death-1 (PD-1) receptor on T-cells and programmed cell death ligand-1 (PD-L1) on the surface of tumors, abrogating a negative regulatory signal that otherwise downregulates T-cell effector function¹⁰. This inhibition has the net effect of stimulating the immune system to recognize and kill tumor cells.

Bevacizumab, another FDA-approved, PPI-disrupting monoclonal antibody, binds to vascular endothelial growth factor A (VEGF-A) and prevents its interaction with VEGF-A receptors, which play critical roles in the development of angiogenesis¹¹. Inhibition of these interactions results in reduced microvascular growth of tumor blood vessels, and ultimately favors of tumor endothelial cells¹¹.

Despite their promising binding properties, however, large biologics face a key limitation in that they cannot easily traverse the cell membrane. Their inability to efficiently escape endosomes and reach the intracellular milieu severely limits the scope of PPIs amenable to biologic-mediated inhibition, since it is estimated that approximately two-thirds of the human proteome is intracellular¹². Although many strategies now exist to help improve cytosolic access of biologics, such as cell-penetrating peptides¹³, liposomes and nanoparticles¹⁴, and various protein modification approaches¹⁵,¹⁶, the ability to efficiently deliver a biologic inside the cell remains a formidable challenge¹². Therefore, biologics are generally not therapeutically viable for inhibition of a significant number of PPIs.

For these reasons, intracellular PPIs had for many years been considered “undruggable” by the pharmaceutical industry. Their flat interfaces provide few docking sites for small molecules to latch onto, and their location either in the cytosol or in intracellular compartments generally restricts their access to large biologics. However, recent progress in PPI inhibitor development, particularly with respect to modalities outside of low molecular weight small molecules or antibodies (such as peptides, peptoids, and foldamers), has generated enthusiasm for the future of therapeutic PPI modulation.

1.2. Development of peptides as protein-protein interaction modulators

Peptides are a promising class of compound for the interrogation and perturbation of PPIs. Given the unique niche of chemical space they occupy, possessing molecular weights in between those of small molecules (<500 Da) and large biologics (up to ~150,000 Da), peptides have come to be considered a “middle space” therapeutic modality17. By nature of their larger molecular weights relative to small molecules, they can make more contacts with a protein surface, in some cases enabling high-affinity binding. Moreover, through chemical modifications, they can be endowed with the ability to access the cell cytosol and reach intracellular targets, in contrast to most biologics.

In addition to favorable chemical properties, peptides possess a number of synthetically attractive features, which aid in their development as PPI modulators. They are readily accessible through solid phase peptide synthesis (SPPS)18, enabling easy assembly of analogue libraries for structure-activity relationship (SAR) studies (recent developments in automated flow-based SPPS have especially streamlined this process19,20). Peptides are also amenable to significant chemical modification, enabling facile tuning of functional properties. Such modifications, including incorporation of non-canonical amino acids, head-to-tail macrocyclization, and α-helical stapling, are important because they can render peptides more proteolytically stable21, more cell-penetrant22,23, and even increase binding affinity24,25 relative to their natural, underivatized counterparts, which on their own tend to exhibit poor pharmacological properties.

The development of peptide-based (and small molecule-based) PPI modulators has benefited from the observation that not all contacts at a PPI interface are critical for binding affinity. Instead, only a few contacts, so-called ‘hot spots’ (originally defined as residues for which substitution with alanine yielded a change in binding free energy of ≥ 2.0 kcal/mol), confer the majority of the binding energy26. This phenomenon has made structure-guided inhibitor design practical: such designs aim to position the most important amino acids for binding on a topologically defined scaffold, ideally mimicking the structure of the native partner protein at the interaction site27. A number of examples of how this type of design, such as mimicry of loops, α-helices, and β-strands, has facilitated development of peptide-based PPI modulators are provided here.
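To make the hot-spot threshold concrete, the following minimal Python sketch converts between a change in binding free energy and the corresponding fold change in KD, using the standard relation ΔΔG = RT ln(KD,mut/KD,wt). The function names, the temperature of 298.15 K, and the example KD values are illustrative assumptions and are not taken from the cited work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wt, kd_mut, temp_k=298.15):
    """Change in binding free energy (kcal/mol) when KD shifts from kd_wt to kd_mut."""
    return R_KCAL * temp_k * math.log(kd_mut / kd_wt)


def fold_change_from_ddg(ddg, temp_k=298.15):
    """Fold change in KD that corresponds to a given ddG (kcal/mol)."""
    return math.exp(ddg / (R_KCAL * temp_k))


def is_hot_spot(kd_wt, kd_mut, threshold=2.0, temp_k=298.15):
    """Apply the >= 2.0 kcal/mol alanine-scanning criterion (threshold is adjustable)."""
    return ddg_from_kd(kd_wt, kd_mut, temp_k) >= threshold


if __name__ == "__main__":
    # Hypothetical alanine mutant that weakens binding from 10 nM to 1 uM.
    print(round(ddg_from_kd(10e-9, 1e-6), 2), "kcal/mol")
    print(round(fold_change_from_ddg(2.0), 1), "fold loss in affinity at the threshold")
```

Under these assumptions, the 2.0 kcal/mol threshold corresponds to roughly a 29-fold loss in affinity upon alanine substitution at room temperature.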

A classic example of rational, peptide-based PPI modulator design is the development of RGD-based compounds that antagonize integrin receptors. Integrins constitute a large family of α/β heterodimeric transmembrane receptors that play important roles in cell-cell adhesion, and are also implicated in driving cell invasion, angiogenesis, and metastasis in various cancers28. Because most integrin-binding proteins contain the common Arg-Gly-Asp tripeptide, many efforts have focused on development of peptides and derivatives based on these residues, which have been shown to be hot spots for integrin binding. Kessler and coworkers, for example, developed a series of cyclic RGD-containing pentapeptides that selectively and potently antagonize the αvβ3 integrin. In particular, cyclo(RGDfV) exhibited markedly increased binding affinity and suppression of tumor-induced angiogenesis relative to linear RGDfV, demonstrating the importance of structure for facilitating binding to a protein surface28,29.

A number of rational design efforts have focused on mimicking α-helices found at certain PPI interfaces. An important technology that has aided these efforts is the introduction of an all-hydrocarbon staple30 at the i, i + 4 or i, i + 7 positions via ring-closing metathesis31, which can help stabilize helicity, boost stability towards proteases, and in some cases increase cell penetration23. In 2012, Verdine and coworkers used this strategy to develop stapled peptide binders to β-catenin, a hub in the Wnt signaling pathway, based on the β-catenin-binding domain of the protein Axin32. Aberrant activation of Wnt signaling is implicated in the progression of several types of cancer, and specific inhibitors of this pathway have therefore attracted interest. The Verdine group showed that incorporation of a staple at positions not involved in β-catenin binding yielded one variant with ~80-fold improved binding affinity relative to the unstapled parent peptide (60 nM vs. ~5 µM KD), again demonstrating the importance of conformational constraint for binding. Further optimization led to constructs that were cell-penetrant, reduced β-catenin-mediated transcriptional activity, and reduced the viability of Wnt-dependent cancer cells32.

α-helical mimicry has also been used to design peptide-based inhibitors of the oncoprotein KRAS, an important but notoriously difficult target in cancer biology. KRAS, a GTPase, is implicated in the pathogenesis of approximately 30% of all human tumors, but PPI inhibitor development has been challenging due to the particularly large and featureless interfaces that characterize KRAS-mediated interactions8. In 2011, Arora and coworkers reported a peptide-based inhibitor of KRAS developed using a hydrogen bond surrogate (HBS) approach, in which the main chain N-terminal hydrogen bond between the carbonyl of the i-th amino acid and the amine of the (i + 4)-th amino acid of the peptide is replaced with a carbon-carbon bond, yielding a preorganized α-helix33. The peptide, derived from an α-helix in the guanine nucleotide exchange factor SOS1, bound KRAS with mid-micromolar affinity. In 2015, the Walensky group developed sub-micromolar affinity binders to KRAS based on the same SOS1 α-helix, using i, i + 4 hydrocarbon stapling34. Their most potent peptide, SAH-SOS1A, optimized for cell penetration, bound KRAS with 100–175 nM KD, directly inhibited nucleotide association to KRAS, and reduced the viability of KRAS-mutant cancer cells, though it should be noted that the specificity of this peptide for KRAS has recently been called into question35.

Mimicry of α-helices has been exploited for inhibition of a number of other PPIs as well. Verdine and coworkers used i, i + 4 hydrocarbon stapling to develop inhibitors of the anti-apoptotic protein Bcl-2, based on an interacting α-helix derived from the BH3 domain of the pro-apoptotic BID36. Their most potent analogue could specifically activate apoptosis to kill leukemia cells, and inhibited the growth of human leukemia xenografts in mice. Other Bcl-2 inhibitors, derived from BH3 domains of other pro-apoptotic proteins, have also been reported37.

Over the past decade, Aileron has developed a stapled peptide, ATSP-7041, that potently inhibits the oncoproteins MDM2 and MDMX38. Natively, MDM2 and MDMX act to inhibit and degrade p53, a tumor suppressor that induces cell-cycle arrest and apoptosis in response to DNA damage and cellular stress39. ATSP-7041 was initially developed based on an α-helix in p53 that mediates binding to MDM2, and was identified after many rounds of lead optimization to give high-affinity MDM2 (Ki = 0.9 nM) and MDMX (Ki = 7 nM) binding. This peptide was shown to reactivate p53 with great potency, and a derivative (ALRN-6924) has progressed to clinical trials for treatment of various cancers.

α-helices and loops are not the only secondary structural elements amenable to peptide-based mimicry. Peptides based on β-strands have been developed for disrupting PPIs involving β-sheets as well40. A common approach for inhibiting PPIs that mediate amyloid protein aggregation, for instance, is to use peptides based on the sequence recognition element (SRE) of an amyloid protein, which typically adopts a β-strand structure in the misfolded or aggregated forms40. These SREs contain the hot spots for the interactions involved in forming β-sheet structures, and therefore in driving aggregation. Using this approach, peptide-based inhibitors of superoxide dismutase 1 (SOD1) aggregation41, implicated in the pathogenesis of amyotrophic lateral sclerosis, were developed based on the β6 and β7 strands of SOD1. This same approach was used in developing inhibitors of aggregation mediated by β-sheet interactions of amyloid β and of tau protein, implicated in Alzheimer’s disease42.

Another strategy to disrupt PPIs involving β-sheet structures is using cyclic β-hairpin mimics. Cyclic dodecapeptides, derived from the N-terminal region of domain 1 of ICAM-1, were developed to disrupt the ICAM-1/LFA-1 interaction involved in T-cell adhesion43. The β-turn in these peptides, formed around the Pro-Arg-Gly sequence, was stabilized by macrocyclization via a disulfide bond, and adopted a similar structure to the β-turn observed at this site in ICAM-1. These peptides demonstrated stronger inhibition than their linear analogues, again demonstrating the importance of conformational constraint for binding.

β-hairpin mimicry was similarly used to develop inhibitors of the T-cell CD2/epithelial cell receptor CD58 interaction, implicated in autoimmune diseases such as rheumatoid arthritis44. Epitopes derived from the CD2 interacting interface were grafted into either sunflower trypsin inhibitor or mammalian rhesus theta defensin scaffolds, and the most potent analogues were shown to inhibit cell adhesion and suppress T-cell response in vitro. A number of other examples of using β-hairpin mimicry for PPI disruption have been reported as well40.

Despite the many successes outlined above, predicting hot spots in undercharacterized PPIs, and therefore rational design of PPI modulators, is still a key challenge. It is not always clear which helix, sheet, or loop in a PPI is most important for binding prior to any mutational studies. Moreover, alanine scanning across an entire PPI interface, and subsequent biophysical studies to determine binding affinities of every analogue, can be laborious and time-consuming. To address this challenge and further facilitate rational design approaches, many computational methods have been developed for in silico prediction of hot spot residues at PPI interfaces. The FOLDEF algorithm45, for instance, and Rosetta’s computational alanine scanning engine46, systematically replace each residue at a PPI interface with alanine, and calculate changes in binding energy based on a number of different energetic terms to predict hot spots. A different approach, called ANCHOR, identifies “anchor” residues at protein interfaces by determining how much the solvent accessible surface area of side chains changes upon binding47.

More recently, databases such as HippDB48, SippDB49, and LoopFinder50 have been assembled using structures from the Protein Data Bank (PDB). These resources computationally characterize in detail PPIs mediated by α-helices, β-sheets, and non-helical/non-sheet loop structures, respectively, providing valuable information for structure-guided design. With the aid of LoopFinder, for instance, the Kritzer lab developed cyclic inhibitors of the Eps15/stonin2 interaction, which helps facilitate clathrin-mediated endocytosis51. Peptides derived from a loop on stonin2, identified as important for binding by LoopFinder, were cyclized via thiol bis-alkylation, and the most potent analogue bound the relevant Eps15 domain with a KD of 0.33 µM.

Despite all of the progress made in structure-guided, peptide-based PPI inhibitor design, it is not always the case that the native interacting epitope represents the optimal sequence for tight binding. Indeed, in many cases, peptides derived from the native sequence may simply lack the requisite affinity for useful PPI modulation. Moreover, proteins involved in PPIs can exhibit promiscuity, interacting with several different proteins through the same region. Such contact surfaces have been shown to be adaptable, enabling some proteins to recognize and bind a number of structurally diverse partners52. Therefore, a variety of chemical solutions—including solutions not explored by nature—can exist to facilitate binding to a protein surface, some with higher affinity than others.

To exploit this phenomenon, a number of different peptide library-based approaches have been developed over the past three decades to sample huge swaths of chemical space for high-affinity protein binding. Such approaches often employ libraries based on unbiased, fully randomized scaffolds (i.e. not based on a known peptide ligand), and involve screens or selections to identify functional variants. Phage display has become perhaps the most widely recognized of these techniques53, but many others have been developed as well, and have become common tools in academic labs and the pharmaceutical industry. Broadly, these methods can be categorized into molecular biology-based and chemical approaches, and a perspective on how these techniques have been leveraged for discovery of novel PPI modulators is provided in the next section.

1.3. Discovery of new peptide-based binders at protein-protein interaction interfaces

Many methods have been developed for panning randomized peptide libraries against protein targets of interest. These libraries, which can range from 10⁴–10¹⁴ members, are accessed through some form of either genetic encoding or chemical synthesis54. Genetic encoding approaches use the biological transcription and translation machinery to produce peptides that are directly or indirectly linked to their encoding nucleotide sequence (DNA or RNA), a feature that greatly facilitates downstream hit deconvolution. These approaches include the split-intein circular ligation of peptides and proteins (SICLOPPS), as well as a variety of display techniques, such as phage display, yeast display, and mRNA display. In some embodiments, genetically encoded libraries can sample enormous diversity, up to 10¹⁴ members, but generally are more limited with respect to the non-natural chemical diversity that can be routinely incorporated.

Chemical approaches for library generation, on the other hand, rely on direct chemical synthesis of library members, and often employ some type of split-and-pool procedure to generate diversity. These methods, which include one-bead-one-compound libraries and affinity selection-mass spectrometry, can readily incorporate a wide variety of non-canonical amino acids, and therefore sample diverse swaths of non-natural chemical space, but tend to be much more limited with respect to the library sizes that can be routinely investigated, historically limited to about 10⁶. (DNA-encoded libraries55,56, which can sample up to ~10⁹ members, represent another chemical approach for peptide library generation, but due to inefficiencies in coupling chemistries and design challenges tend to only comprise 3–4 building blocks, and therefore more commonly sample small molecule-like space57. For this reason, they are excluded from this discussion.)

A brief description of genetically encoded and chemical approaches for assaying randomized peptide libraries is provided here, with particular emphasis on how these approaches have been used to discover novel PPI modulators.

1.3.1. Genetically encoded approaches

One of the most widely used molecular biology-based approaches for identification of novel peptide ligands is phage display58. In this approach, peptides are displayed on the surface of filamentous bacteriophages, such as M13, providing a link between the genome and peptide. Initially, a library of plasmids is transformed into E. coli, and peptide-tagged phages are expressed, assembled, and released. The resulting library of phages, typically on the order of 10⁹ members (library size here is limited by transformation efficiency), is then panned against an immobilized target protein, and binders are separated from non-binders through washing. This process can be repeated iteratively through amplification of isolated phages, enabling high enrichment of binders over non-binders54.

First reported in 1990, phage display has been widely adopted for discovery of novel protein-protein interaction modulators. An early example of such modulation was the discovery of a 20-residue peptide that binds and agonizes the erythropoietin (EPO) receptor, despite having no sequence similarity to the native hormone59. This peptide binds as a dimer and induces dimerization, and therefore activation, of the EPO receptor. Other early examples include identification of peptides that inhibit insulin-like growth factor 1/IGFBP-160, vascular endothelial growth factor/kinase insert domain receptor61, and IgG Fc/streptococcal Protein A interactions62,63.

Common to all of these examples is the presence of multiple cysteines, fixed in at least a subset of the library designs employed, to enable disulfide bridge-mediated macrocyclization. This design feature was incorporated because libraries of linear, unstructured peptides have come to be regarded as poor sources of high-affinity ligands, due to the high entropic penalties paid by linear peptides upon binding their target. Moreover, cyclic peptides are generally more stable towards proteases relative to their linear analogues. Therefore, much effort has been devoted to panning libraries based on constrained scaffolds, such as the disulfide-bridged designs employed in the examples above and various bicyclic designs64.

In 2009, Heinis and coworkers reported a method for generating phage display libraries based on a bicyclic scaffold using three cysteines and a tris-(bromomethyl)benzene linker65. Similar linkers for this purpose include 1,3,5-triacryloyl-1,3,5-triazinane and N,N’,N’’-(benzene-1,3,5-triyl)-tris(2-bromoacetamide). All demonstrate robust reactivity and selectivity, are compatible with aqueous solvents, and are symmetric such that a single product is formed64. Using this approach, binders to β-catenin66 (mid-micromolar affinity), the Notch receptor (150 nM KD)67, and Her2 receptor (304 nM KD)68, among other targets64, have been identified. More recently, the Heinis lab developed methods for screening phage libraries cyclized via two chemical bridges (as opposed to one), enabling sampling of many diverse scaffolds69. This approach has yielded potent binders to plasma kallikrein (0.5 nM Ki) and the cytokine interleukin-17 (36 nM KD)69.

Despite the many successes of phage display, a key challenge remains the ability to examine libraries comprising a high degree of non-canonical amino acid content. Single phenylalanine derivatives were first introduced in phage libraries by Schultz and coworkers in 2004. More recently, Chen and coworkers reported a method to incorporate two non-canonical amino acids with high efficiency using an evolved orthogonal ribosome70, and Liu and coworkers disclosed a method to increase efficiency of non-canonical amino acid incorporation using a technique based on superinfection immunity of phages71. However, despite these advances, phage libraries are still typically limited to only a single site of non-canonical amino acid incorporation, with efficiency of incorporation varying depending on the amino acid. For these reasons, phage libraries generally sample peptides comprising natural amino acids.

An alternative genetically encoded approach for sampling peptide libraries is mRNA display72. In mRNA display, libraries of mRNA are translated in vitro, bypassing the transformation step and enabling access to libraries comprising 10¹³–10¹⁴ members54. A link is maintained between the mRNA and peptide through the covalent attachment of puromycin to each mRNA template, which reacts with the peptide at the stop codon position and enables mRNA barcoding of each library member. During a selection, libraries are typically panned against an immobilized target (for instance, on magnetic beads), the mRNA strands encoding bound members are reverse transcribed and amplified, and the whole process is repeated. After multiple rounds of selection, sequences of bound members are identified via DNA sequencing54.

An important advantage of the mRNA display technology relative to phage display is that multiple non-canonical amino acids can be readily incorporated. While non-canonical amino acid incorporation in phage display relies necessarily on the in cellulo efficiency of engineered tRNA synthetases, in in vitro translation, pre-aminoacylated tRNA can be supplied directly. A key advance in this area came with the development of flexizymes73 by Suga and coworkers, aminoacylating ribozymes that recognize activated carboxylates and can charge tRNAs with a wide variety of non-canonical amino acids. This technology, known as flexible in vitro translation (FIT), therefore allows for a much greater degree of genetic code reprogramming relative to other molecular biology-based approaches. Examples of non-canonical amino acids that can be incorporated include phenylalanine analogs, β-amino acids, D-amino acids, N-methyl amino acids, and residues with azoline backbones, though the incorporation efficiency of each class varies. Interfacing FIT with mRNA display resulted in the random non-standard peptide integrated discovery (RaPID) system, which today finds broad utility both in academia and the pharmaceutical industry.

mRNA display technology has been used extensively for the discovery of potent protein-protein interaction inhibitors. A few recent examples will be provided here. In 2016, Suga and coworkers used the RaPID system to identify a 16-mer cyclic peptide that potently binds plexin B1 (3.5 nM KD) and inhibits its interaction with semaphorin 4D, an axonal guidance factor implicated in immune responses, cancer cell proliferation, and organogenesis74. The same system was used to identify cyclic peptide inhibitors of the interaction between Ebola virus protein 24 (eVP24) and KPNA (the most potent demonstrating a 9 µM IC50), involved in immune suppression75. This peptide represents the first example of a chemical probe capable of disrupting eVP24/KPNA. In 2018, an extensively reprogrammed genetic code, comprising 12 proteinogenic and 11 non-canonical amino acids, was used to generate a ~10¹²-member library via FIT, sampling a large swath of non-natural space76. Selections from this library yielded several interleukin-6 receptor (IL6R) binders (44 nM and 357 nM KD) comprising 50–60% non-canonical amino acid content. Finally, earlier this year, the RaPID system once again provided the first chemical probes to inhibit a particular PPI, in this instance the Wnt3a/human afamin complex, implicated in colorectal cancer. One peptide, a macrocyclic 13-mer, bound Wnt3a with a KD of 171 nM, and was shown in a cell culture model to inhibit Wnt3a-mediated signaling77.

The RaPID system in particular therefore represents a powerful tool for early stage drug discovery. It has been shown to routinely provide moderate to high affinity ligands to a wide variety of protein targets, and in many cases these ligands contain non-canonical amino acids. Moreover, it provides access to larger library sizes than can be investigated by any other approach.

However, despite these advantages, RaPID still faces key challenges moving forward. Efficient generation of libraries wherein consecutive non-canonical amino acids of certain types are employed, such as consecutive D-amino acids or β-amino acids, is impeded by low incorporation efficiencies78,79. It would be impossible, therefore, to discover a peptide binder composed entirely of D-amino acids with this approach (a feature that would be attractive for protease stability). Moreover, any post-translational chemical modifications that are made must be compatible with aqueous conditions, as well as with the presence of mRNA and unprotected peptides. Introduction of an all-hydrocarbon staple via ring-closing metathesis, for example, to preprogram a library with a propensity for helicity or cell penetration, would not be possible in mRNA display. These limitations together restrict the scope of non-natural chemical space that RaPID can investigate, though progress continues to be made in this area.

1.3.2. Chemical approaches

Chemically accessed libraries are an alternative to genetically encoded libraries for new ligand discovery. Solid-phase organic synthesis in particular offers a powerful technique for generating peptide libraries that sample expanded chemical diversity. There are two primary approaches for mining peptide libraries generated from solid-phase synthesis: on-bead screening, commonly referred to as the one-bead-one-compound (OBOC) approach, and in-solution affinity selections. An overview of each of these approaches, along with a discussion of the key challenges facing each technique, will be provided here.

Beginning with the first report of the one-bead-one-compound approach in 1991 by Lam and coworkers, on-bead screening has held much potential for early stage drug discovery. In this approach, libraries are synthesized via the split-and-pool method80, wherein aliquots of solid support resin are split out, coupled with unique amino acid-based building blocks, and pooled back together. The solid support used is often TentaGel (a crosslinked polystyrene matrix onto which polyethylene glycol (PEG) is grafted), which has favorable swelling properties in aqueous media. Following appropriate N-terminal deprotection, the resin is split out again, and the process repeated, enabling rapid generation of diversity with each coupling cycle. In the resulting library, each bead of the solid support contains a single library member. Following side chain deprotection, the resin-bound library is incubated with a labeled target protein (in its original implementation, a protein bearing a fluorescein or alkaline phosphatase label), washed several times, and stained beads are isolated either manually or with a particle sorter (though many variations on this basic screening setup now exist). Finally, following cleavage from the isolated bead, the sequence of the peptide is determined, typically via either Edman degradation or tandem mass spectrometry (MS/MS). Due to technical challenges associated with handling large quantities of resin during screening, OBOC libraries are typically limited to ~10⁶ members.
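The split-and-pool procedure described above can be illustrated with a small simulation. The Python sketch below is a toy model under stated assumptions: beads are represented as strings, each "coupling" appends one single-letter building block, and the bead count, building-block set, and function names are hypothetical choices made for illustration only.

```python
import random


def theoretical_diversity(n_blocks, n_positions):
    """Number of distinct sequences accessible with n_blocks choices at each varied position."""
    return n_blocks ** n_positions


def split_and_pool(n_beads, building_blocks, n_positions, seed=0):
    """Simulate split-and-pool synthesis; returns one sequence per bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(n_positions):
        rng.shuffle(beads)  # pool: mix all beads together
        for i in range(n_beads):
            # split: deal beads into one aliquot per building block,
            # then couple that aliquot's building block to each bead in it
            beads[i] += building_blocks[i % len(building_blocks)]
    return beads


if __name__ == "__main__":
    print(theoretical_diversity(20, 5))  # 20 monomers at 5 varied positions
    library = split_and_pool(n_beads=1000, building_blocks="ACDE", n_positions=3)
    print(len(set(library)), "distinct sequences on", len(library), "beads")
```

Because every bead passes through exactly one aliquot per cycle, each bead ends up carrying a single sequence, and the accessible diversity grows exponentially with the number of varied positions. Twenty building blocks at five positions already give 3.2 million sequences, on the order of the practical OBOC limit noted above.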

A clear advantage of the OBOC approach relative to genetic-encoding techniques is the ability to readily incorporate non-natural amino acid functionality, as well as access a variety of cyclic and bicyclic architectures. An example of exploiting both of these features for ligand discovery is the identification of bicyclic peptide inhibitors of tumor necrosis factor-α (TNF-α) by Pei and coworkers in 201381. Using a combination of magnetic bead sorting (wherein biotin-labeled TNF-α is incubated with library and captured with streptavidin-coated magnetic beads), and multiple rounds of colorimetric and fluorescence-based bead picking, the authors report the discovery of two bicyclic peptides, anticachexin C1 and C2, that bind TNF-α with a KD of 0.45 and 1.6 µM, respectively. Weak or no binding is reported for the linear and monocyclic variants. Using similar techniques, Pei and coworkers have reported cell permeable, bicyclic binders to the peptidyl-prolyl isomerase Pin182, nuclear factor-κB essential modulator (NEMO)83, and KRAS84.

Decoding hits from non-linear scaffolds presents a challenge in the OBOC approach, as the chemistry of Edman degradation relies on linear peptides, and MS/MS cannot routinely provide fragmentation spectra from cyclic scaffolds of sufficient quality to enable high-confidence sequencing. One way to circumvent this issue is with the use of a linear encoding tag, as was employed for the discovery of anticachexin C1 and C2. In this approach, the TentaGel resin is first spatially segregated through either enzymatic or chemical methods. In theory, these procedures result in orthogonally protected amines in the outer shell and inner core of the resin. The monomers needed to facilitate later cyclization can then be selectively coupled to the amines on the shell, and the protecting groups in the core can then be removed and the varied positions of the library added simultaneously to the inner and outer portions. Cyclization can then be carried out, leaving the encoding tag unaffected.

Another strategy to decode cyclic peptide architectures is with the incorporation of cleavable amino acid residues or cleavable linkers within the cycle. In 2016, Biron and coworkers demonstrated the use of the photocleavable β-amino acid 3-amino-3-(2-nitrophenyl)propionic acid (ANP) moiety for facilitating robust linearization and MS/MS sequencing85. The authors validated this strategy on individually synthesized cyclic peptides comprising five to nine residues, as well as on cyclic heptapeptides derived from a library. In 2018, Xue and coworkers reported a linearization strategy to enable decoding of bicyclic peptides86. Linearization was carried out using Pd-catalyzed deallylation and Edman degradation-type chemistry, and peptides were sequenced via MS/MS. Using this strategy applied to an OBOC library, the authors identified a bicyclic peptide that recognizes the c-Myc transcription factor with an EC50 of 16 µM.

Recently, progress has also been made in the direct MS/MS-based sequencing of cyclic peptides. In 2018, Lokey and coworkers developed an automated sequencing program, CycLS, to facilitate identification of cyclic peptide sequences without the need for encoding tags or cleavable linkers87. Briefly, a virtual fragment library is first created based on the input library design, and the program implements pre-processing of secondary mass spectra to amplify signal and remove noise. Each combined secondary spectrum is next compared to a subset of the virtual fragment library, derived from candidate sequences based on matches to the parent mass. Matches to each fragment are tracked, and scores are assigned to each sequence based on total matches and total intensity of matches, reflecting the confidence of each assignment. The authors validated this program by decoding compounds from an 1800-member cyclic hexapeptide library with ~80% fidelity.
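The general matching-and-scoring idea described above can be sketched in a few lines of Python. This is not the CycLS implementation: it is a simplified illustration that assumes head-to-tail cyclic peptides, singly protonated fragments formed from contiguous stretches of the ring, no spectral pre-processing, and a fixed mass tolerance. The function names, the five-residue alphabet, and the synthetic spectrum are all hypothetical.

```python
# Monoisotopic residue masses (Da) for a small illustrative alphabet.
MONO = {"G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406, "F": 147.06841}
PROTON = 1.00728


def ring_mass(seq):
    """Neutral mass of a head-to-tail cyclic peptide (sum of residue masses)."""
    return sum(MONO[r] for r in seq)


def fragment_mzs(seq):
    """Virtual fragments: every contiguous stretch around the ring, singly protonated."""
    n = len(seq)
    doubled = seq + seq  # lets slices wrap around the ring
    mzs = set()
    for start in range(n):
        for length in range(1, n):
            mzs.add(round(ring_mass(doubled[start:start + length]) + PROTON, 4))
    return sorted(mzs)


def score(seq, peaks, tol=0.01):
    """Return (number of matched fragments, summed intensity of matched peaks)."""
    matches, intensity = 0, 0.0
    for mz in fragment_mzs(seq):
        hits = [inten for peak_mz, inten in peaks if abs(peak_mz - mz) <= tol]
        if hits:
            matches += 1
            intensity += max(hits)
    return matches, intensity


def rank_candidates(candidates, parent_mass, peaks, tol=0.01):
    """Keep candidates matching the parent mass, then sort by score (best first)."""
    shortlist = [c for c in candidates if abs(ring_mass(c) - parent_mass) <= tol]
    return sorted(shortlist, key=lambda c: score(c, peaks, tol), reverse=True)


if __name__ == "__main__":
    true_seq = "GAVLF"
    # Synthetic, noise-free spectrum generated from the true sequence.
    spectrum = [(mz, 100.0) for mz in fragment_mzs(true_seq)]
    for cand in rank_candidates(["GVALF", "GAVLF", "GGVLF"], ring_mass(true_seq), spectrum):
        print(cand, score(cand, spectrum))
```

In this toy setting, candidates that share the parent mass but differ in residue order match fewer fragments and rank lower. Rotations and reversals of a ring produce the same set of fragment masses in this simplified model, so they cannot be distinguished without additional information.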

Perhaps the biggest challenge facing OBOC library screening is the limited enrichments typically achieved. That is, isolation of true binders often comes at the expense of isolating many library members that do not actually exhibit activity upon resynthesis and validation (in the literature, these binders are often referred to as false positives or non-specific binders). Enrichments of true binders from background can be hampered by 1) binding of the peptide or bead itself to other components of the screen, such as the fluorophore on a target protein, and 2) high ligand loading on the resins used for synthesis, resulting in high avidity of ligands on the bead surface. A number of strategies have been employed to address these challenges. In 2014, Lee and coworkers found that using zwitterionic dyes, as opposed to dyes with net negative charges, could significantly reduce non-specific background interactions. Kodadek et al. reported the use of redundant OBOC libraries, wherein members of the library are represented more than one time, to improve enrichments, with the hypothesis being that peptides that are identified reproducibly are more likely to be true binders88. This strategy compensates for inhomogeneity in the bead population with respect to surface density, making it less likely to identify a false positive caused by aberrantly high avidity on a given bead. Topological segregation89 of the beads has also been used to address the problem of high avidity, wherein the outer shell contains a reduced ligand density but a high loading is maintained in the bead interior. To avoid time-consuming and unnecessary resynthesis of false positives, Auer and coworkers devised a protocol to install a fluorescent handle in the library and, following hit isolation and cleavage from the bead, perform direct affinity measurements by fluorescence polarization90.

Despite these advances, however, limited enrichment of true binders is still a key challenge in the on-bead screening approach. The discovery campaign that gave rise to anticachexin C1 and C2, which employed topological segregation of the resin and multiple orthogonal screening techniques, isolated 44 total hits, only 6 of which showed measurable binding activity in solution. The validity of a KRAS/Raf-disrupting peptide identified from OBOC screening in 2015 has recently been challenged, a cautionary tale of the time and effort spent pursuing a compound that turned out to be non-specific [35]. Ultimately, the resynthesis and validation efforts of hits from on-bead screening, many of which show no activity in solution, remain a bottleneck in the OBOC approach. Its potential as a powerful, high-fidelity tool for drug discovery therefore has yet to be realized.

Affinity selection-mass spectrometry (AS-MS) is an alternative approach for mining chemically accessed peptide libraries. In this technique, peptide mixtures are investigated in solution, as opposed to on beads, for binding to a target molecule. A key advantage of the solution-based approach is that it is inherently label-free and eliminates the avidity effects associated with bead-based screening. One of the earliest examples of AS-MS applied to peptide libraries, with direct identification of the active component, was the 1997 report by Huebner and coworkers of a selection from a 600-member library, using size-exclusion chromatography (SEC) to partition bound from unbound members [91]. The protein fraction with bound ligands was submitted for MS/MS analysis, and two peptides were identified that presumably bound the target protein. These compounds were not identified in the presence of an excess of known ligand, indicating they bound at the same site.

Aside from SEC, several other partitioning techniques have been used in AS-MS workflows, albeit with small molecules. In a proof-of-concept study, Venton and coworkers demonstrated the use of pulsed ultrafiltration mass spectrometry to identify ligands of human serum albumin and calf intestine adenosine deaminase [92]. In 2008, van Breemen and Choi reported a magnetic bead-based selection from botanical extracts to identify ligands of estrogen receptor-α (ER-α) and ER-β [93]. An automated, higher-throughput version of this selection format, termed MagMASS, was later developed and used to identify ligands to the retinoid X receptor-α, again from botanical extracts [94].

An advance in affinity selection from peptide libraries came in 2016, with a report by Maaty and Weis describing the use of hydrogen exchange mass spectrometry to discern bound from unbound members [95]. In this approach, a peptide library derived from the Escherichia coli proteome, spiked with known peptidic ligands of calmodulin and bovine ribonuclease S (RNase S), was predeuterated by incubation in 60% D₂O/40% H₂O buffer. The deuterated peptides were then diluted 1:10 into H₂O in the presence and absence of the target protein. In this setup, unbound peptides undergo rapid back-exchange to the protiated form, while bound peptides remain in an at least partially deuterated form. Screening software can then be applied to identify mass spectral features that exist in only one of the two conditions (± protein target), indicative of a bound peptide. Applying this technique as proof-of-concept, the authors identified the spiked-in peptides as the only true binders of either calmodulin or RNase S, validating the workflow.
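The comparison step described above, finding spectral features present in only one of the two conditions, can be sketched as follows. This is a simplified illustration and not the software used by the authors: features are represented as hypothetical `(m/z, retention time)` tuples, and the matching tolerances are arbitrary placeholder values.

```python
def unique_features(cond_a, cond_b, mz_tol=0.01, rt_tol=0.2):
    """Return features in cond_a that have no counterpart in cond_b.

    Each feature is an (m/z, retention time in minutes) tuple. Two features
    are treated as the same species if both values agree within tolerance.
    """
    def same(f, g):
        return abs(f[0] - g[0]) <= mz_tol and abs(f[1] - g[1]) <= rt_tol

    return [f for f in cond_a if not any(same(f, g) for g in cond_b)]


def differential_features(plus_protein, minus_protein, mz_tol=0.01, rt_tol=0.2):
    """Split features into those seen only with, or only without, the target."""
    return {
        "only_plus": unique_features(plus_protein, minus_protein, mz_tol, rt_tol),
        "only_minus": unique_features(minus_protein, plus_protein, mz_tol, rt_tol),
    }
```

Features shared by both runs drop out, leaving only the condition-specific signals as candidate binders for follow-up.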

Despite these advances, until recently the state of the art did not permit the direct, one-pot interrogation of large (>10⁴-member) synthetic peptide libraries in solution. This limitation is in part due to the challenge of sequencing complex mixtures of synthetic peptides that would result from such selections. The library used by Maaty and Weis in their hydrogen exchange mass spectrometry approach was a natural library derived from an E. coli proteome, enabling database matching to aid peptide identification. Moreover, the authors note that this technique would not be suitable for larger libraries, due to the need to resolve every member of the library by LC-MS. Therefore, although AS-MS provides a label-free, high-fidelity approach for ligand identification, and has been widely adopted in the pharmaceutical industry for small molecule discovery, its utility for the discovery of novel peptide ligands has been historically limited.

Recently, our lab sought to leverage modern mass spectrometry techniques to enable affinity selections from large libraries (≥10⁶ members) of synthetic peptides. In 2017, we reported protocols for the high-throughput, LC-MS/MS-based sequencing of synthetic peptide mixtures, which could reliably identify ~1,000 peptides per hour [96]. Applying these techniques, our lab developed an SEC-based affinity selection platform to assay small, focused libraries (10³–10⁶ members) designed around known ligands [24]. By varying the hot spot positions with a suite of non-canonical amino acids, this approach was used to affinity mature previously reported peptide binders to MDM2 and the C-terminal domain of the HIV-1 capsid protein (C-CA). The most potent variants demonstrated gains in affinity of 30–100-fold and were validated as functional PPI inhibitors, disrupting either the p53/MDM2 interaction or C-CA dimerization. Importantly, this study demonstrated the utility of non-canonical amino acids to enhance molecular recognition at a peptide-protein interface.

Given the success of the SEC-based AS-MS approach, we sought to apply this method for de novo discovery of PPI inhibitors, wherein the entire peptide chain (as opposed to only a few residues) is randomized. We anticipated that for this approach to be generally successful, we would need to investigate libraries with much greater diversities, such as those routinely employed in genetically encoded approaches (≥10⁸ members). Because small quantities of individual library members are lost to the column during SEC, we speculated that this technique would not be well-suited for assaying high-diversity libraries. We chose instead to investigate the utility of magnetic beads, which have been used in small molecule AS-MS, OBOC library screening, and a variety of proteomics applications, for facilitating high-diversity, peptide-based AS-MS, with the goal of investigating libraries on the order of 10⁸ members or greater [97]. To our knowledge, the use of these reagents with large synthetic peptide libraries had not been reported prior to this work. The focus of this thesis will therefore be on (1) developing protocols for applying magnetic bead-based AS-MS to large, synthetic peptide libraries; (2) asking basic questions about the effect of library size on selection performance; (3) applying these methods for discovery of new peptide-based reagents that bind at PPI epitopes; and (4) using machine learning to classify binders identified from this approach as specific or non-specific, which we anticipate will streamline validation efforts and improve the efficiency of the discovery process.
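To make the scale of these libraries concrete, the sketch below computes three simple quantities: the theoretical diversity of a fully randomized peptide, the expected number of distinct sequences obtained when a given number of members is drawn at random, and the average amount of each member when a fixed total quantity of library is used. The monomer count, peptide length, and total amount in the example are hypothetical values chosen for illustration, not parameters taken from this work, and the calculation assumes uniform, independent sampling of sequences.

```python
import math


def theoretical_diversity(n_monomers: int, n_positions: int) -> int:
    """Number of possible sequences when every position is fully randomized."""
    return n_monomers ** n_positions


def expected_distinct(space_size: float, n_drawn: float) -> float:
    """Expected number of distinct sequences among n_drawn uniform random draws.

    Uses S * (1 - exp(-N / S)), a close approximation when S is large.
    """
    return -space_size * math.expm1(-n_drawn / space_size)


def amount_per_member(total_mol: float, n_members: float) -> float:
    """Average amount (mol) of each member if the total is shared equally."""
    return total_mol / n_members


if __name__ == "__main__":
    space = theoretical_diversity(18, 9)  # hypothetical: 18 monomers, 9 positions
    library = 1e8                         # library size on the order discussed above
    print(f"sequence space: {space:.3e}")
    print(f"expected distinct members: {expected_distinct(space, library):.3e}")
    print(f"per-member amount at 1 umol total: {amount_per_member(1e-6, library):.1e} mol")
```

Under these assumptions, the per-member amount falls in inverse proportion to library size, which is one reason why material losses during partitioning matter more as diversity grows.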

1.4. Thesis overview

Chapter 1 has provided a primer on the importance of PPIs, the challenges associated with drugging them, and the utility of rational design and combinatorial approaches for engineering peptides that can modulate these interactions. Chapter 2 will focus on the use of a model selection target, an anti-hemagglutinin monoclonal antibody, to develop a magnetic bead-based AS-MS workflow capable of identifying binders from fully randomized, synthetic peptide libraries comprising up to 10⁸ members. Chapter 3 describes the application of these methods for discovery of peptides that bind at PPI epitopes in the oncogenic ubiquitin ligase MDM2 and signaling hub 14-3-3. Finally, Chapter 4 details the development of machine learning methods to readily distinguish specific and non-specific binders and further streamline AS-MS-based discovery efforts.

1.5. References

(1) Kuzmanov, U.; Emili, A. Protein-Protein Interaction Networks: Probing Disease Mechanisms Using Model Systems. Genome Medicine 2013, 5 (4), 37. https://doi.org/10.1186/gm441.
(2) Milroy, L.-G.; Grossmann, T. N.; Hennig, S.; Brunsveld, L.; Ottmann, C. Modulators of Protein–Protein Interactions. Chemical Reviews 2014, 114 (9), 4695–4748. https://doi.org/10.1021/cr400698c.
(3) Jaeger, S.; Aloy, P. From Protein Interaction Networks to Novel Therapeutic Strategies. IUBMB Life 2012, 64 (6), 529–537. https://doi.org/10.1002/iub.1040.
(4) Barabási, A.-L.; Gulbahce, N.; Loscalzo, J. Network Medicine: A Network-Based Approach to Human Disease. Nat Rev Genet 2011, 12 (1), 56–68. https://doi.org/10.1038/nrg2918.
(5) Stumpf, M. P. H.; Thorne, T.; de Silva, E.; Stewart, R.; An, H. J.; Lappe, M.; Wiuf, C. Estimating the Size of the Human Interactome. Proceedings of the National Academy of Sciences 2008, 105 (19), 6959–6964. https://doi.org/10.1073/pnas.0708078105.
(6) Ran, X.; Gestwicki, J. E. Inhibitors of Protein–Protein Interactions (PPIs): An Analysis of Scaffold Choices and Buried Surface Area. Current Opinion in Chemical Biology 2018, 44, 75–86. https://doi.org/10.1016/j.cbpa.2018.06.004.
(7) Jones, S.; Thornton, J. M. Principles of Protein-Protein Interactions. PNAS 1996, 93 (1), 13–20. https://doi.org/10.1073/pnas.93.1.13.
(8) Smith, M. C.; Gestwicki, J. E. Features of Protein-Protein Interactions That Translate into Potent Inhibitors: Topology, Surface Area and Affinity. Expert Rev Mol Med 2012, 14, e16. https://doi.org/10.1017/erm.2012.10.
(9) Shultz, M. D. Two Decades under the Influence of the Rule of Five and the Changing Properties of Approved Oral Drugs: Miniperspective. Journal of Medicinal Chemistry 2019, 62 (4), 1701–1714. https://doi.org/10.1021/acs.jmedchem.8b00686.
(10) McDermott, J.; Jimeno, A. Pembrolizumab: PD-1 Inhibition as a Therapeutic Strategy in Cancer. Drugs Today 2015, 51 (1), 7–20. https://doi.org/10.1358/dot.2015.51.1.2250387.
(11) Kazazi-Hyseni, F.; Beijnen, J. H.; Schellens, J. H. M. Bevacizumab. Oncologist 2010, 15 (8), 819–825. https://doi.org/10.1634/theoncologist.2009-0317.
(12) Miersch, S.; Sidhu, S. S. Intracellular Targeting with Engineered Proteins. F1000Res 2016, 5. https://doi.org/10.12688/f1000research.8915.1.
(13) Kardani, K.; Milani, A.; Shabani, S. H.; Bolhassani, A. Cell Penetrating Peptides: The Potent Multi-Cargo Intracellular Carriers. Expert Opinion on Drug Delivery 2019, 16 (11), 1227–1258. https://doi.org/10.1080/17425247.2019.1676720.
(14) Mura, S.; Nicolas, J.; Couvreur, P. Stimuli-Responsive Nanocarriers for Drug Delivery. Nature Materials 2013, 12 (11), 991–1003. https://doi.org/10.1038/nmat3776.
(15) Futami, J.; Yamada, H. Design of Cytotoxic Ribonucleases by Cationization to Enhance Intracellular Protein Delivery. Curr. Pharm. Biotechnol. 2008, 9 (3), 180–184. https://doi.org/10.2174/138920108784567326.
(16) Mix, K. A.; Lomax, J. E.; Raines, R. T. Cytosolic Delivery of Proteins by Bioreversible Esterification. J. Am. Chem. Soc. 2017, 139 (41), 14396–14398. https://doi.org/10.1021/jacs.7b06597.

(17) Bockus, A. T.; McEwen, C. M.; Lokey, R. S. Form and Function in Cyclic Peptide Natural Products: A Pharmacokinetic Perspective http://www.eurekaselect.com/109783/article (accessed Apr 5, 2020).
(18) Schnölzer, M.; Alewood, P.; Jones, A.; Alewood, D.; Kent, S. B. In Situ Neutralization in Boc-Chemistry Solid Phase Peptide Synthesis. Rapid, High Yield Assembly of Difficult Sequences. Int. J. Pept. Protein Res. 1992, 40 (3–4), 180–193. https://doi.org/10.1111/j.1399-3011.1992.tb00291.x.
(19) Simon, M. D.; Heider, P. L.; Adamo, A.; Vinogradov, A. A.; Mong, S. K.; Li, X.; Berger, T.; Policarpo, R. L.; Zhang, C.; Zou, Y.; Liao, X.; Spokoyny, A. M.; Jensen, K. F.; Pentelute, B. L. Rapid Flow-Based Peptide Synthesis. ChemBioChem 2014, 15 (5), 713–720. https://doi.org/10.1002/cbic.201300796.
(20) Mijalis, A. J.; Thomas, D. A., III; Simon, M. D.; Adamo, A.; Beaumont, R.; Jensen, K. F.; Pentelute, B. L. A Fully Automated Flow-Based Approach for Accelerated Peptide Synthesis. Nature Chemical Biology 2017, 13 (5), 464–466. https://doi.org/10.1038/nchembio.2318.
(21) Gates, Z. P.; Vinogradov, A. A.; Quartararo, A. J.; Bandyopadhyay, A.; Choo, Z.-N.; Evans, E. D.; Halloran, K. H.; Mijalis, A. J.; Mong, S. K.; Simon, M. D.; Standley, E. A.; Styduhar, E. D.; Tasker, S. Z.; Touti, F.; Weber, J. M.; Wilson, J. L.; Jamison, T. F.; Pentelute, B. L. Xenoprotein Engineering via Synthetic Libraries. PNAS 2018, 201722633. https://doi.org/10.1073/pnas.1722633115.
(22) Spokoyny, A. M.; Zou, Y.; Ling, J. J.; Yu, H.; Lin, Y.-S.; Pentelute, B. L. A Perfluoroaryl-Cysteine SNAr Chemistry Approach to Unprotected Peptide Stapling. J. Am. Chem. Soc. 2013, 135 (16), 5946–5949. https://doi.org/10.1021/ja400119t.
(23) Walensky, L. D.; Bird, G. H. Hydrocarbon-Stapled Peptides: Principles, Practice, and Progress. J. Med. Chem. 2014, 57 (15), 6275–6288. https://doi.org/10.1021/jm4011675.
(24) Touti, F.; Gates, Z. P.; Bandyopadhyay, A.; Lautrette, G.; Pentelute, B. L. In-Solution Enrichment Identifies Peptide Inhibitors of Protein–Protein Interactions. Nat Chem Biol 2019, 15 (4), 410–418. https://doi.org/10.1038/s41589-019-0245-2.
(25) Rogers, J. M.; Passioura, T.; Suga, H. Nonproteinogenic Deep Mutational Scanning of Linear and Cyclic Peptides. PNAS 2018, 115 (43), 10959–10964. https://doi.org/10.1073/pnas.1809901115.
(26) Clackson, T.; Wells, J. A Hot Spot of Binding Energy in a Hormone-Receptor Interface. Science 1995, 267 (5196), 383–386. https://doi.org/10.1126/science.7529940.
(27) Pelay‐Gimeno, M.; Glas, A.; Koch, O.; Grossmann, T. N. Structure-Based Design of Inhibitors of Protein–Protein Interactions: Mimicking Peptide Binding Epitopes. Angewandte Chemie International Edition 2015, 54 (31), 8896–8927. https://doi.org/10.1002/anie.201412070.
(28) Gurrath, M.; Kessler, H. Strong and Selective Inhibitors of 1 Adhesion to Vitronectin and Laminin. FEBS LETTERS 1991, 291 (1), 5.
(29) Vega, M. J. P. de; Gonzalez-Muniz, M. M.-M. and R. Modulation of Protein-Protein Interactions by Stabilizing/Mimicking Protein Secondary Structure Elements http://www.eurekaselect.com/77548/article (accessed Mar 30, 2020).
(30) Schafmeister, C. E.; Po, J.; Verdine, G. L. An All-Hydrocarbon Cross-Linking System for Enhancing the Helicity and Metabolic Stability of Peptides. J. Am. Chem. Soc. 2000, 122 (24), 5891–5892. https://doi.org/10.1021/ja000563a.

(31) Blackwell, H. E.; Grubbs, R. H. Highly Efficient Synthesis of Covalently Cross-Linked Peptide Helices by Ring-Closing Metathesis. Angewandte Chemie International Edition 1998, 37 (23), 3281–3284. https://doi.org/10.1002/(SICI)1521-3773(19981217)37:23<3281::AID-ANIE3281>3.0.CO;2-V.
(32) Grossmann, T. N.; Yeh, J. T.-H.; Bowman, B. R.; Chu, Q.; Moellering, R. E.; Verdine, G. L. Inhibition of Oncogenic Wnt Signaling through Direct Targeting of β-Catenin. PNAS 2012, 109 (44), 17942–17947. https://doi.org/10.1073/pnas.1208396109.
(33) Patgiri, A.; Yadav, K. K.; Arora, P. S.; Bar-Sagi, D. An Orthosteric Inhibitor of the Ras-Sos Interaction. Nat Chem Biol 2011, 7 (9), 585–587. https://doi.org/10.1038/nchembio.612.
(34) Leshchiner, E. S.; Parkhitko, A.; Bird, G. H.; Luccarelli, J.; Bellairs, J. A.; Escudero, S.; Opoku-Nsiah, K.; Godes, M.; Perrimon, N.; Walensky, L. D. Direct Inhibition of Oncogenic KRAS by Hydrocarbon-Stapled SOS1 Helices. Proceedings of the National Academy of Sciences 2015, 112 (6), 1761–1766. https://doi.org/10.1073/pnas.1413185112.
(35) Ng, S.; Juang, Y.-C.; Chandramohan, A.; Kaan, H. Y. K.; Sadruddin, A.; Yuen, T. Y.; Ferrer-Gago, F. J.; Lee, X. C.; Liew, X.; Johannes, C. W.; Brown, C. J.; Kannan, S.; Aronica, P. G.; Berglund, N. A.; Verma, C. S.; Liu, L.; Stoeck, A.; Sawyer, T. K.; Partridge, A. W.; Lane, D. P. De-Risking Drug Discovery of Intracellular Targeting Peptides: Screening Strategies to Eliminate False-Positive Hits. ACS Med. Chem. Lett. 2020. https://doi.org/10.1021/acsmedchemlett.0c00022.
(36) Walensky, L. D. Activation of Apoptosis in Vivo by a Hydrocarbon-Stapled BH3 Helix. Science 2004, 305 (5689), 1466–1470. https://doi.org/10.1126/science.1099191.
(37) Araghi, R. R.; Bird, G. H.; Ryan, J. A.; Jenson, J. M.; Godes, M.; Pritz, J. R.; Grant, R. A.; Letai, A.; Walensky, L. D.; Keating, A. E. Iterative Optimization Yields Mcl-1–Targeting Stapled Peptides with Selective Cytotoxicity to Mcl-1–Dependent Cancer Cells. PNAS 2018, 115 (5), E886–E895. https://doi.org/10.1073/pnas.1712952115.
(38) Chang, Y. S.; Graves, B.; Guerlavais, V.; Tovar, C.; Packman, K.; To, K.-H.; Olson, K. A.; Kesavan, K.; Gangurde, P.; Mukherjee, A.; Baker, T.; Darlak, K.; Elkin, C.; Filipovic, Z.; Qureshi, F. Z.; Cai, H.; Berry, P.; Feyfant, E.; Shi, X. E.; Horstick, J.; Annis, D. A.; Manning, A. M.; Fotouhi, N.; Nash, H.; Vassilev, L. T.; Sawyer, T. K. Stapled α-Helical Peptide Drug Development: A Potent Dual Inhibitor of MDM2 and MDMX for p53-Dependent Cancer Therapy. PNAS 2013, 110 (36), E3445–E3454. https://doi.org/10.1073/pnas.1303002110.
(39) Bond, G.; Hu, W.; Levine, A. MDM2 Is a Central Node in the p53 Pathway: 12 Years and Counting. CCDT 2005, 5 (1), 3–8. https://doi.org/10.2174/1568009053332627.
(40) Peptides and peptidomimetics as inhibitors of protein–protein interactions involving β-sheet secondary structures - ScienceDirect https://www-sciencedirect-com.libproxy.mit.edu/science/article/pii/S1367593119300262 (accessed Mar 29, 2020).
(41) Banerjee, V.; Shani, T.; Katzman, B.; Vyazmensky, M.; Papo, N.; Israelson, A.; Engel, S. Superoxide Dismutase 1 (SOD1)-Derived Peptide Inhibits Amyloid Aggregation of Familial Amyotrophic Lateral Sclerosis SOD1 Mutants. ACS Chemical Neuroscience 2016, 7 (11), 1595–1606. https://doi.org/10.1021/acschemneuro.6b00227.
(42) Ryan, P.; Patel, B.; Makwana, V.; Jadhav, H. R.; Kiefel, M.; Davey, A.; Reekie, T. A.; Rudrawar, S.; Kassiou, M. Peptides, Peptidomimetics, and Carbohydrate–Peptide Conjugates as Amyloidogenic Aggregation Inhibitors for Alzheimer’s Disease. ACS Chem. Neurosci. 2018, 9 (7), 1530–1551. https://doi.org/10.1021/acschemneuro.8b00185.

(43) Anderson, M. E.; Tejo, B. A.; Yakovleva, T.; Siahaan, T. J. Characterization of Binding Properties of ICAM-1 Peptides to LFA-1: Inhibitors of T-Cell Adhesion. Chemical Biology & Drug Design 2006, 68 (1), 20–28. https://doi.org/10.1111/j.1747-0285.2006.00407.x.
(44) Sable, R.; Durek, T.; Taneja, V.; Craik, D. J.; Pallerla, S.; Gauthier, T.; Jois, S. Constrained Cyclic Peptides as Immunomodulatory Inhibitors of the CD2:CD58 Protein–Protein Interaction. ACS Chem. Biol. 2016, 11 (8), 2366–2374. https://doi.org/10.1021/acschembio.6b00486.
(45) Schymkowitz, J.; Borg, J.; Stricher, F.; Nys, R.; Rousseau, F.; Serrano, L. The FoldX Web Server: An Online Force Field. Nucleic Acids Res 2005, 33 (suppl_2), W382–W388. https://doi.org/10.1093/nar/gki387.
(46) Kortemme, T.; Kim, D. E.; Baker, D. Computational Alanine Scanning of Protein-Protein Interfaces. Sci. STKE 2004, 2004 (219), pl2–pl2. https://doi.org/10.1126/stke.2192004pl2.
(47) Rajamani, D.; Thiel, S.; Vajda, S.; Camacho, C. J. Anchor Residues in Protein–Protein Interactions. PNAS 2004, 101 (31), 11287–11292. https://doi.org/10.1073/pnas.0401942101.
(48) Bergey, C. M.; Watkins, A. M.; Arora, P. S. HippDB: A Database of Readily Targeted Helical Protein–Protein Interactions. Bioinformatics 2013, 29 (21), 2806–2807. https://doi.org/10.1093/bioinformatics/btt483.
(49) Watkins, A. M.; Arora, P. S. Anatomy of β-Strands at Protein–Protein Interfaces. ACS Chem. Biol. 2014, 9 (8), 1747–1754. https://doi.org/10.1021/cb500241y.
(50) Gavenonis, J.; Sheneman, B. A.; Siegert, T. R.; Eshelman, M. R.; Kritzer, J. A. Comprehensive Analysis of Loops at Protein-Protein Interfaces for Macrocycle Design. Nat. Chem. Biol. 2014, 10 (9), 716–722. https://doi.org/10.1038/nchembio.1580.
(51) Siegert, T. R.; Bird, M. J.; Makwana, K. M.; Kritzer, J. A. Analysis of Loops That Mediate Protein–Protein Interactions and Translation into Submicromolar Inhibitors. J. Am. Chem. Soc. 2016, 138 (39), 12876–12884. https://doi.org/10.1021/jacs.6b05656.
(52) Wells, J. A.; McClendon, C. L. Reaching for High-Hanging Fruit in Drug Discovery at Protein–Protein Interfaces. Nature 2007, 450 (7172), 1001–1009. https://doi.org/10.1038/nature06526.
(53) Barderas, R.; Benito-Peña, E. The 2018 Nobel Prize in Chemistry: Phage Display of Peptides and Antibodies. Anal Bioanal Chem 2019, 411 (12), 2475–2479. https://doi.org/10.1007/s00216-019-01714-4.
(54) Obexer, R.; Walport, L. J.; Suga, H. Exploring Sequence Space: Harnessing Chemical and Biological Diversity towards New Peptide Leads. Curr Opin Chem Biol 2017, 38, 52–61. https://doi.org/10.1016/j.cbpa.2017.02.020.
(55) Franzini, R. M.; Neri, D.; Scheuermann, J. DNA-Encoded Chemical Libraries: Advancing beyond Conventional Small-Molecule Libraries. Accounts of Chemical Research 2014, 47 (4), 1247–1255. https://doi.org/10.1021/ar400284t.
(56) Goodnow Jr, R. A.; Dumelin, C. E.; Keefe, A. D. DNA-Encoded Chemistry: Enabling the Deeper Sampling of Chemical Space. Nature Reviews Drug Discovery 2017, 16 (2), 131–147. https://doi.org/10.1038/nrd.2016.213.
(57) Favalli, N.; Bassi, G.; Scheuermann, J.; Neri, D. DNA-Encoded Chemical Libraries – Achievements and Remaining Challenges. FEBS Letters 2018, 592 (12), 2168–2180. https://doi.org/10.1002/1873-3468.13068.

(58) Cwirla, S. E.; Peters, E. A.; Barrett, R. W.; Dower, W. J. Peptides on Phage: A Vast Library of Peptides for Identifying Ligands. PNAS 1990, 87 (16), 6378–6382. https://doi.org/10.1073/pnas.87.16.6378.
(59) Wrighton, N. C.; Farrell, F. X.; Chang, R.; Kashyap, A. K.; Barbone, F. P.; Mulcahy, L. S.; Johnson, D. L.; Barrett, R. W.; Jolliffe, L. K.; Dower, W. J. Small Peptides as Potent Mimetics of the Protein Hormone Erythropoietin. Science 1996, 273 (5274), 458–463. https://doi.org/10.1126/science.273.5274.458.
(60) Lowman, H. B.; Chen, Y. M.; Skelton, N. J.; Mortensen, D. L.; Tomlinson, E. E.; Sadick, M. D.; Robinson, I. C. A. F.; Clark, R. G. Molecular Mimics of Insulin-like Growth Factor 1 (IGF-1) for Inhibiting IGF-1: IGF-Binding Protein Interactions. Biochemistry 1998, 37 (25), 8870–8878. https://doi.org/10.1021/bi980426e.
(61) Fairbrother, W. J.; Christinger, H. W.; Cochran, A. G.; Fuh, G.; Keenan, C. J.; Quan, C.; Shriver, S. K.; Tom, J. Y. K.; Wells, J. A.; Cunningham, B. C. Novel Peptides Selected to Bind Vascular Endothelial Growth Factor Target the Receptor-Binding Site. Biochemistry 1998, 37 (51), 17754–17764. https://doi.org/10.1021/bi981931e.
(62) DeLano, W. L.; Ultsch, M. H.; de Vos, A. M.; Wells, J. A. Convergent Solutions to Binding at a Protein-Protein Interface. Science 2000, 287 (5456), 1279–1283. https://doi.org/10.1126/science.287.5456.1279.
(63) Cochran, A. G. Antagonists of Protein–Protein Interactions. Chemistry & Biology 2000, 7 (4), R85–R94. https://doi.org/10.1016/S1074-5521(00)00106-X.
(64) Deyle, K.; Kong, X.-D.; Heinis, C. Phage Selection of Cyclic Peptides for Application in Research and Drug Development. Accounts of Chemical Research 2017, 50 (8), 1866–1874. https://doi.org/10.1021/acs.accounts.7b00184.
(65) Heinis, C.; Rutherford, T.; Freund, S.; Winter, G. Phage-Encoded Combinatorial Chemical Libraries Based on Bicyclic Peptides. Nat Chem Biol 2009, 5 (7), 502–507. https://doi.org/10.1038/nchembio.184.
(66) Bertoldo, D.; Khan, M. M. G.; Dessen, P.; Held, W.; Huelsken, J.; Heinis, C. Phage Selection of Peptide Macrocycles against β-Catenin To Interfere with Wnt Signaling. ChemMedChem 2016, 11 (8), 834–839. https://doi.org/10.1002/cmdc.201500557.
(67) Urech-Varenne, C.; Radtke, F.; Heinis, C. Phage Selection of Bicyclic Peptide Ligands of the Notch1 Receptor. ChemMedChem 2015, 10 (10), 1754–1761. https://doi.org/10.1002/cmdc.201500261.
(68) Diderich, P.; Heinis, C. Phage Selection of Bicyclic Peptides Binding Her2. Tetrahedron 2014, 70 (42), 7733–7739. https://doi.org/10.1016/j.tet.2014.05.106.
(69) Kale, S. S.; Villequey, C.; Kong, X.-D.; Zorzi, A.; Deyle, K.; Heinis, C. Cyclization of Peptides with Two Chemical Bridges Affords Large Scaffold Diversities. Nature Chemistry 2018, 10 (7), 715–723. https://doi.org/10.1038/s41557-018-0042-7.
(70) Oller‐Salvia, B.; Chin, J. W. Efficient Phage Display with Multiple Distinct Non-Canonical Amino Acids Using Orthogonal Ribosome-Mediated Genetic Code Expansion. Angewandte Chemie International Edition 2019, 58 (32), 10844–10848. https://doi.org/10.1002/anie.201902658.
(71) Tharp, J. M.; Hampton, J. T.; Reed, C. A.; Ehnbom, A.; Chen, P.-H. C.; Morse, J. S.; Kurra, Y.; Pérez, L. M.; Xu, S.; Liu, W. R. An Amber Obligate Active Site-Directed Ligand Evolution Technique for Phage Display. Nat Commun 2020, 11 (1), 1–14. https://doi.org/10.1038/s41467-020-15057-7.

(72) Wilson, D. S.; Keefe, A. D.; Szostak, J. W. The Use of mRNA Display to Select High-Affinity Protein-Binding Peptides. Proc. Natl. Acad. Sci. U.S.A. 2001, 98 (7), 3750–3755. https://doi.org/10.1073/pnas.061028198.
(73) Murakami, H.; Saito, H.; Suga, H. A Versatile tRNA Aminoacylation Catalyst Based on RNA. Chemistry & Biology 2003, 10 (7), 655–662. https://doi.org/10.1016/S1074-5521(03)00145-5.
(74) Matsunaga, Y.; Bashiruddin, N. K.; Kitago, Y.; Takagi, J.; Suga, H. Allosteric Inhibition of a Semaphorin 4D Receptor Plexin B1 by a High-Affinity Macrocyclic Peptide. Cell Chemical Biology 2016, 23 (11), 1341–1350. https://doi.org/10.1016/j.chembiol.2016.09.015.
(75) Song, X.; Lu, L.; Passioura, T.; Suga, H. Macrocyclic Peptide Inhibitors for the Protein–Protein Interaction of Zaire Ebola Protein 24 and Karyopherin Alpha 5. Org. Biomol. Chem. 2017, 15 (24), 5155–5160. https://doi.org/10.1039/C7OB00012J.
(76) Passioura, T.; Liu, W.; Dunkelmann, D.; Higuchi, T.; Suga, H. Display Selection of Exotic Macrocyclic Peptides Expressed under a Radically Reprogrammed 23 Amino Acid Genetic Code. J. Am. Chem. Soc. 2018, 140 (37), 11551–11555. https://doi.org/10.1021/jacs.8b03367.
(77) Otero-Ramirez, M. E.; Matoba, K.; Mihara, E.; Passioura, T.; Takagi, J.; Suga, H. Macrocyclic Peptides That Inhibit Wnt Signalling via Interaction with Wnt3a. RSC Chemical Biology 2020, 1 (1), 26–34. https://doi.org/10.1039/D0CB00016G.
(78) Katoh, T.; Tajima, K.; Suga, H. Consecutive Elongation of D-Amino Acids in Translation. Cell Chemical Biology 2017, 24 (1), 46–54. https://doi.org/10.1016/j.chembiol.2016.11.012.
(79) Katoh, T.; Suga, H. Ribosomal Incorporation of Consecutive β-Amino Acids. J. Am. Chem. Soc. 2018, 140 (38), 12159–12167. https://doi.org/10.1021/jacs.8b07247.
(80) Furka, A.; Sebestyén, F.; Asgedom, M.; Dibó, G. General Method for Rapid Synthesis of Multicomponent Peptide Mixtures. Int. J. Pept. Protein Res. 1991, 37 (6), 487–493. https://doi.org/10.1111/j.1399-3011.1991.tb00765.x.
(81) Lian, W.; Upadhyaya, P.; Rhodes, C. A.; Liu, Y.; Pei, D. Screening Bicyclic Peptide Libraries for Protein–Protein Interaction Inhibitors: Discovery of a Tumor Necrosis Factor-α Antagonist. Journal of the American Chemical Society 2013, 135 (32), 11990–11995. https://doi.org/10.1021/ja405106u.
(82) Jiang, B.; Pei, D. A Selective, Cell-Permeable Nonphosphorylated Bicyclic Peptidyl Inhibitor against Peptidyl–Prolyl Isomerase Pin1. J. Med. Chem. 2015, 58 (15), 6306–6312. https://doi.org/10.1021/acs.jmedchem.5b00411.
(83) Rhodes, C. A.; Dougherty, P. G.; Cooper, J. K.; Qian, Z.; Lindert, S.; Wang, Q.-E.; Pei, D. Cell-Permeable Bicyclic Peptidyl Inhibitors against NEMO-IκB Kinase Interaction Directly from a Combinatorial Library. J. Am. Chem. Soc. 2018, 140 (38), 12102–12110. https://doi.org/10.1021/jacs.8b06738.
(84) Upadhyaya, P.; Qian, Z.; Selner, N. G.; Clippinger, S. R.; Wu, Z.; Briesewitz, R.; Pei, D. Inhibition of Ras Signaling by Blocking Ras–Effector Interactions with Cyclic Peptides. Angewandte Chemie International Edition 2015, 54 (26), 7602–7606. https://doi.org/10.1002/anie.201502763.
(85) Liang, X.; Vézina-Dawod, S.; Bédard, F.; Porte, K.; Biron, E. One-Pot Photochemical Ring-Opening/Cleavage Approach for the Synthesis and Decoding of Cyclic Peptide Libraries. Org. Lett. 2016, 18 (5), 1174–1177. https://doi.org/10.1021/acs.orglett.6b00296.

(86) Li, Z.; Shao, S.; Ren, X.; Sun, J.; Guo, Z.; Wang, S.; Song, M. M.; Chang, C. A.; Xue, M. Construction of a Sequenceable Protein Mimetic Peptide Library with a True 3D Diversifiable Chemical Space. J. Am. Chem. Soc. 2018, 140 (44), 14552–14556. https://doi.org/10.1021/jacs.8b08338.
(87) Townsend, C.; Furukawa, A.; Schwochert, J.; Pye, C. R.; Edmondson, Q.; Lokey, R. S. CycLS: Accurate, Whole-Library Sequencing of Cyclic Peptides Using Tandem Mass Spectrometry. Bioorganic & Medicinal Chemistry 2018, 26 (6), 1232–1238. https://doi.org/10.1016/j.bmc.2018.01.027.
(88) Doran, T. M.; Gao, Y.; Mendes, K.; Dean, S.; Simanski, S.; Kodadek, T. Utility of Redundant Combinatorial Libraries in Distinguishing High and Low Quality Screening Hits. ACS Comb Sci 2014, 16 (6), 259–270. https://doi.org/10.1021/co500030f.
(89) Wang, X.; Peng, L.; Liu, R.; Xu, B.; Lam, K. S. Applications of Topologically Segregated Bilayer Beads in ‘One-Bead One-Compound’ Combinatorial Libraries. The Journal of Peptide Research 2005, 65 (1), 130–138. https://doi.org/10.1111/j.1399-3011.2005.00192.x.
(90) Single Bead Labeling Method for Combining Confocal Fluorescence On-Bead Screening and Solution Validation of Tagged One-Bead One-Compound Libraries. Chemistry & Biology 2009, 16 (7), 724–735. https://doi.org/10.1016/j.chembiol.2009.06.011.
(91) Kaur, S.; McGuire, L.; Tang, D.; Dollinger, G.; Huebner, V. Affinity Selection and Mass Spectrometry-Based Strategies to Identify Lead Compounds in Combinatorial Libraries. J Protein Chem 1997, 16 (5), 505–511. https://doi.org/10.1023/A:1026369729393.
(92) van Breemen, R. B.; Huang, C.-R.; Nikolic, D.; Woodbury, C. P.; Zhao, Y.-Z.; Venton, D. L. Pulsed Ultrafiltration Mass Spectrometry: A New Method for Screening Combinatorial Libraries. Anal. Chem. 1997, 69 (11), 2159–2164. https://doi.org/10.1021/ac970132j.
(93) Choi, Y.; van Breemen, R. B. Development of a Screening Assay for Ligands to the Estrogen Receptor Based on Magnetic Microparticles and LC-MS. Comb. Chem. High Throughput Screen. 2008, 11 (1), 1–6. https://doi.org/10.2174/138620708783398340.
(94) Rush, M. D.; Walker, E. M.; Prehna, G.; Burton, T.; van Breemen, R. B. Development of a Magnetic Microbead Affinity Selection Screen (MagMASS) Using Mass Spectrometry for Ligands to the Retinoid X Receptor-α. J. Am. Soc. Mass Spectrom. 2017, 28 (3), 479–485. https://doi.org/10.1007/s13361-016-1564-0.
(95) Maaty, W. S.; Weis, D. D. Label-Free, In-Solution Screening of Peptide Libraries for Binding to Protein Targets Using Hydrogen Exchange Mass Spectrometry. J. Am. Chem. Soc. 2016, 138 (4), 1335–1343. https://doi.org/10.1021/jacs.5b11742.
(96) Vinogradov, A. A.; Gates, Z. P.; Zhang, C.; Quartararo, A. J.; Halloran, K. H.; Pentelute, B. L. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries. ACS Comb. Sci. 2017, 19 (11), 694–701. https://doi.org/10.1021/acscombsci.7b00109.
(97) Quartararo, A. J.; Gates, Z. P.; Somsen, B. A.; Hartrampf, N.; Ye, X.; Shimada, A.; Kajihara, Y.; Ottmann, C.; Pentelute, B. L. Ultra-Large Chemical Libraries for the Discovery of High-Affinity Peptide Binders. Nature Communications 2020, 11 (1), 3183. https://doi.org/10.1038/s41467-020-16920-3.

Chapter 2. Affinity Selection-Mass Spectrometry of High Diversity Synthetic Peptide Libraries

2.1. Introduction

Drug discovery benefits from the ability to assay a large number of compounds for activity against a biomolecular target of interest. Of the 113 first-in-class drugs approved by the FDA between 1999 and 2013, an estimated 71% were discovered using some type of target-based modality [1]. Some of the most common strategies for this purpose include high-throughput screening (HTS) and fragment-based drug discovery (FBDD) [2] for identification of small molecule binders, and protein engineering strategies [3] for discovery of novel biologics.

In the past decade, much research attention has been devoted to discovering and engineering peptide-based binders as an emerging therapeutic modality [4,5]. Given the unique niche of chemical space they occupy, possessing molecular weights in between those of small molecules (< 500 Da) and biologics (up to ~150,000 Da), peptides offer a distinct profile of chemically attractive features. Peptides are synthetically accessible, amenable to chemical tailoring, and have the potential to bind the typically shallow surfaces seen in therapeutically relevant (and historically intractable) protein-protein interactions (PPIs) [6–8]. Importantly, chemical modifications, such as non-canonical amino acid incorporation and chemical stapling, can render peptides more proteolytically stable and more cell-penetrant, and in some cases can even increase binding affinity relative to their natural, linear counterparts, which on their own tend to exhibit poor pharmacological properties [9–13].

Molecular biology-based selection techniques, such as phage display [14,15] and mRNA display [16], are powerful tools for target-based discovery of novel peptide binders, thanks in part to their ability to examine enormous libraries (10⁸–10¹³ members) [17]. However, incorporation of a large number and variety of non-canonical amino acids by these methods remains challenging (Fig. 2.2.1a) [18,19]. Chemical combinatorial methods, such as DNA-encoded libraries (DELs) [20,21] and one-bead-one-compound (OBOC) libraries [22,23], can easily overcome this hurdle; however, each of these techniques faces its own limitations. DELs are typically limited to three or four varied positions, due to inefficiencies in the chemistry used for their assembly, and as such are more often categorized as libraries of small molecules. OBOC libraries can easily incorporate more varied positions, but OBOC screening is technically challenging and typically examines only ~10⁶ compounds [24].

Affinity selection-mass spectrometry (AS-MS) [25–29] represents an alternative strategy for target-based discovery of chemically accessed peptide binders. Work in our lab recently leveraged LC-MS/MS to sequence individual synthetic peptides present in complex mixtures [30] and to increase the diversity of synthetic peptide libraries amenable to AS-MS [12], from ~10 to ~10⁶. This advance was used to select improved variants of known binders from small focused libraries (10³ members) [12]. Discovery of binders from fully randomized libraries is a much greater challenge, which may require library diversities considerably higher than are typically examined by chemical screening. In principle, commercial mass spectrometers are sufficiently sensitive to detect and sequence peptides from mixtures as complex as ~10⁹ (Table 2.2.1) [31,32]. However, it is not obvious whether single-pass affinity selections from such libraries could provide sufficient enrichment (i.e., reduce the number of peptides sufficiently) to identify binders by LC-MS/MS-based sequencing, which is applicable to mixtures of up to several thousand peptides [30].

Here, we show that magnetic bead capture-based AS-MS [33,34] is capable of identifying binders from libraries of up to ~10⁸ random synthetic peptides (Fig. 2.2.1b). Starting with an anti-hemagglutinin monoclonal antibody (anti-HA mAb) selection target, we demonstrate that high-affinity binders can be captured with near-quantitative recovery from relevant ligand concentrations. These results translate into a selection context, where high-affinity binders containing the HA epitope are identified in proportion to library diversity over the range of 10⁶–10⁸. The fidelity of this AS-MS approach is illustrated by the high enrichments obtained for true binders over non-binders in each selection context examined. Finally, parallel selections against different proteins enable assessment of the target-dependent enrichment of each peptide identified, providing a means of readily distinguishing specific from non-specific binders.

Figure 2.2.1. AS-MS enables discovery from randomized, high-diversity chemical libraries. a) Among existing techniques for peptide binder discovery, increased chemical control over library synthesis and screening typically comes at the cost of limited library diversity. Affinity selection-mass spectrometry (AS-MS; this work), which relies on chemically-accessed libraries and direct identification of active members, can investigate library diversities on the order of 10⁸–10⁹ while maintaining a high degree of chemical control over the synthesis and selection process. b) A typical AS-MS workflow, which uses magnetic beads as a partitioning reagent to discriminate bound from unbound library members, and nano-liquid chromatography-tandem mass spectrometry to identify sequences of active peptides.

Table 2.2.1. Mixtures of up to 10⁹ peptides contain sufficient material for MS-based sequencing.

Library size    Member concentration (pM)    Moles per member (fmol)
10⁶             1000                         1000
10⁷             100                          100
10⁸             10                           10
10⁹             1                            1

At a given scale (1 mL), maximum diversity is a tradeoff between the amount of individual peptide required for nLC-MS/MS sequencing, and the solubility limit of the total peptide mixture (~1 mM). Even more peptides might be examined concurrently at a fixed total concentration by increasing the selection volume.
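The entries in Table 2.2.1 follow from dividing a fixed total peptide concentration equally among all library members. A minimal sketch of that arithmetic is given below, assuming the ~1 mM solubility limit and 1 mL scale stated above; the function name and its defaults are illustrative and are not part of the original workflow.

```python
def per_member_amounts(diversity, total_conc_molar=1e-3, volume_liters=1e-3):
    """Return (concentration in pM, amount in fmol) for one library member.

    Assumes every member is present in equal amount, which is an idealization.
    """
    conc_molar = total_conc_molar / diversity
    conc_pm = conc_molar * 1e12                       # mol/L -> pmol/L
    amount_fmol = conc_molar * volume_liters * 1e15   # mol -> fmol
    return conc_pm, amount_fmol


for exponent in (6, 7, 8, 9):
    conc_pm, amount_fmol = per_member_amounts(10 ** exponent)
    print(f"10^{exponent} members: {conc_pm:g} pM, {amount_fmol:g} fmol per member")
```

Increasing `volume_liters` at fixed total concentration raises the amount per member without changing its concentration, which is the scaling noted beneath the table.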

2.2. Results

2.2.1. AS-MS recovers high-affinity ligands from high dilution

Identification of binders from high-diversity libraries would necessitate their efficient recovery at high dilution. To assess the utility of magnetic bead-based affinity capture for this purpose, we investigated the recovery of a number of peptide binders with varying affinity for anti-HA mAb clone 12ca5 (12ca5), which recognizes the linear motif DXXDYA (Figs. 2.6.1–2.6.6; Table 2.6.1). In these experiments, streptavidin-coated magnetic beads (1 mg; 0.13 nmol IgG binding capacity) functionalized with biotinylated 12ca5 were incubated with mixtures of 12ca5-binding peptides (either 1 nM/peptide, 100 pM/peptide, or 10 pM/peptide; 1 mL scale). The magnetic beads were isolated, washed with buffer, and treated with chemical denaturant to elute bound peptides. A portion of this eluate (~2%) was subjected to nLC-MS analysis, and recovery was quantified using normalized MS response. To calculate recovery, a separate dose-response curve was generated for each peptide, since their MS responses varied by ~5-fold (Fig. 2.6.7).

Under the conditions examined, only the two highest affinity binders were significantly retained, with recoveries of 75% and 33% obtained for ~4 and ~25 nM-affinity binders, respectively (Fig. 2.2.2a; Figs. 2.6.8–2.6.9). This result is consistent with the work of Sannino and coworkers, who observed low recoveries for micromolar affinity ligands in the context of DELs [35]. We tested the use of increased magnetic bead concentration to recover lower affinity ligands; however, this parameter had little effect on recovery, perhaps suggesting that these ligands are lost during the washing step (Fig. 2.6.10). Consistent with this view, recovery of the 25 nM-affinity ligand (33%) is in quantitative agreement with that expected based on its measured dissociation rate and the total wash time of ~6 min (34%, based on k_off = 3.0 x 10⁻³ s⁻¹; Fig. 2.6.6). Therefore, dissociation rate is an important factor in controlling the recovery of binders by AS-MS. For the retained binders, no loss in recovery was observed as their initial concentration was decreased from 1 nM to 10 pM, suggesting that selections from libraries as diverse as 10⁸ would be feasible (Figs. 2.6.8–2.6.9; Table 2.2.1). Below 10 pM, low MS detector response precluded convenient quantitation.
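The expected 34% retention quoted above can be reproduced with a simple first-order decay estimate. The sketch below assumes that bound ligand dissociates with rate constant k_off during the washes and does not rebind; this is an illustrative model chosen here, and the function name is hypothetical.

```python
import math


def fraction_retained(k_off_per_s, wash_time_s):
    """Fraction of initially bound ligand remaining after washing.

    Assumes single-exponential dissociation with no rebinding (a simplification).
    """
    return math.exp(-k_off_per_s * wash_time_s)


# k_off = 3.0 x 10^-3 s^-1 and a ~6 min total wash, as stated in the text.
print(round(fraction_retained(3.0e-3, 6 * 60), 2))  # 0.34
```

Under the same assumption, a ligand with a tenfold faster off-rate would be almost entirely lost over the same wash time, in line with the poor recovery of the weaker binders.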

We examined whether recovery of high-affinity binders could be maintained at concentrations below those conveniently quantified. Affinity captures were performed at 10 pM and 1 pM ligand concentration (10 fmol or 1 fmol/peptide, respectively), and the eluates were concentrated by solid phase extraction such that the majority of the bound fraction could be analyzed (~80%). Although recoveries could be maintained down to 1 pM ligand, the resulting MS signals were observed near the intensity threshold that would be used for MS/MS precursor isolation during an actual selection, to facilitate collection of high quality spectra (5 x 10⁴ counts; Fig. 2.6.11). The precursor selection threshold defines a limit of detection for sequenced peptides, which corresponds to ~50% or ~5% recovery of peptides initially present at 1 pM (1 fmol/mL) or 10 pM (10 fmol/mL), respectively. Therefore, 10⁹ is an upper bound on the library diversity amenable to this AS-MS approach at the examined scale (1 mL), to retain sufficient material for nLC-MS/MS sequencing (Table 2.2.1).

Figure 2.2.2. Magnetic bead capture recovers high-affinity binders from both a) simple mixtures and b) random libraries. a) High-affinity 12ca5-binding peptides were recovered effectively from a concentration of 10 pM, which corresponds to the concentration of an individual library member in a 10⁸-member library at 1 mM total peptide. Recoveries were determined by MS detector response, normalizing to an internal standard. b) The number of identified sequences bearing the 12ca5-binding motif, DXXDY(A/S), increased in proportion to library diversity, from two million to two hundred million-member libraries. On average, motif-containing sequences represented 71% of all sequences identified. Error bars correspond to one standard deviation among three technical replicates.

2.2.2. AS-MS enriches 12ca5 binders from a 10⁶-member library

To assess whether magnetic bead-based pulldowns could recover binders from complex mixtures and reduce the mixture complexity sufficiently for nLC-MS/MS sequencing, we synthesized a library of design (X)9K, where X = all L-amino acids except for cysteine and isoleucine (theoretical diversity = 2 x 10¹¹), by split-and-pool synthesis [22,36] on 30 µm TentaGel resin (2.9 g; 2 x 10⁸ beads), for use in subsequent selections (Fig. 2.6.12). Portions of this library comprising 2 x 10⁷ and 2 x 10⁶ members were taken prior to cleavage from resin to give three libraries of increasing size from the same batch of split-and-pool synthesis (Fig. 2.2.2b).

Selections against 12ca5 (0.13 nmol) were then performed with the 2 x 10⁶-member library, at 10 pM/member concentration (1 mL scale; 10 fmol/peptide, 20 nmol total library), with the goal of identifying sequences similar to the HA epitope (selections were performed in triplicate).* Selection eluates were concentrated by solid phase extraction, and analyzed by nLC-MS/MS (theoretical loading of a retained peptide: 8 fmol). From these selections, just one peptide that matched the library design was identified with a sequencing score (average local confidence (ALC) score [37]) ≥ 80 (MNDLVDYADK; Table 2.6.2). This sequence had five residues in common with the HA epitope, DXVDYA, including the hot spot residues Asp4, Asp7, Tyr8, and Ala9 required for high-affinity 12ca5-binding (Figs. 2.6.13–2.6.21) [38].

*Footnote 1: Given the large theoretical diversity of this library relative to the diversity sampled (limited by the number of beads used in split-and-pool synthesis), it is unlikely the full HA epitope would be rediscovered. There is a 0.001% chance of finding a given 9-mer sequence in a library comprising 2 x 10⁶ random 9-mers, with 18 possible monomers at each position (theoretical diversity = 2 x 10¹¹).
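The 0.001% figure in Footnote 1 is the ratio of sampled to theoretical diversity. A short sketch of the calculation follows, assuming that each sampled library member is an independent, uniformly random sequence and that duplicates are negligible at this coverage; the function name is illustrative.

```python
def chance_of_given_sequence(sampled_members, n_monomers=18, n_positions=9):
    """Approximate probability that one specific sequence is present in the sample.

    Valid when sampled_members is much smaller than the theoretical diversity.
    """
    theoretical_diversity = n_monomers ** n_positions
    return sampled_members / theoretical_diversity


print(f"theoretical diversity: {18 ** 9:.1e}")
print(f"chance for a 2 x 10^6-member library: {chance_of_given_sequence(2e6):.4%}")
```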

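We endeavored to understand the enrichment achieved by the selection, where enrichment is defined as:

\[ \text{enrichment} = \frac{(\text{sequenced binders}/\text{total sequenced peptides})_{\text{isolated}}}{(\text{binders}/\text{total peptides})_{\text{assayed}}} \tag{1} \]

or equivalently

\[ \text{enrichment} = \frac{(\text{sequenced binders})_{\text{isolated}}}{(\text{binders})_{\text{assayed}}} \times \frac{(\text{total peptides})_{\text{assayed}}}{(\text{total sequenced peptides})_{\text{isolated}}} \tag{2} \]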

Achieving high enrichment in a single selection step would be essential for discovering rare binding molecules by AS-MS, since synthetic libraries cannot be propagated to facilitate sequential rounds of selection. Since only a single peptide was sequenced from the selection eluate, a maximum enrichment of 2 x 10⁶ was achieved. The actual enrichment may be significantly lower, if additional binders were present in the library, but not recovered and sequenced. For example, the expected number of DXXDY(A/S)-containing sequences in the 2 x 10⁶-member library is 152** (although not all of these are necessarily high-affinity binders), suggesting the actual enrichment could be as low as ~1.3 x 10⁴.

**Footnote 2: The probability of any given sequence exhibiting the DXXDY(A/S) motif in a particular frame is equal to (1/18)³ x (2/18), or 1.9 x 10⁻⁵. With 9 varied positions, this motif can exist in four distinct frames; therefore the probability of finding this motif in any frame is equal to 7.6 x 10⁻⁵. In a library of 2 x 10⁶ members, there should then exist approximately 152 sequences containing DXXDY(A/S).
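The estimate in Footnote 2 can be written out as follows. This is a sketch of the same approximation, which sums the per-frame probability over all frames and therefore ignores the rare sequences that carry the motif in more than one frame; the function and parameter names are illustrative.

```python
def motif_probability(n_varied=9, n_monomers=18, motif_length=6):
    """Approximate probability that a random sequence contains DXXDY(A/S).

    Three positions are fixed (D, D, Y), one accepts two residues (A or S),
    and the two X positions are unconstrained.
    """
    per_frame = (1 / n_monomers) ** 3 * (2 / n_monomers)
    n_frames = n_varied - motif_length + 1   # 4 frames for 9 varied positions
    return n_frames * per_frame


library_size = 2e6
expected = motif_probability() * library_size
print(f"P(motif) = {motif_probability():.2e}; expected sequences = {expected:.0f}")
```

Scaling `library_size` by 10 or 100 gives the expected motif counts for the larger libraries examined later in this chapter.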

To check whether additional peptides were recovered by the selection at lower abundance, we performed a selection at 200 pM/member (Fig. 2.6.22), such that less-abundant peptides could cross the precursor selection threshold. Increasing amounts of the selection eluate were analyzed by nLC-MS/MS, with sample loadings varied from 8 fmol/member (selection threshold: ~5% recovery) to 75 fmol/member (selection threshold: ~0.5% recovery). At the lowest sample loading, just one peptide was identified (MNDLVDYADK), identical to that observed in the original selection. As sample loading was increased to 75 fmol/member, two additional epitope-containing peptides were identified: PDVHDYTWGK and ENDWQDYSHK. However, recovery of these sequences came at the cost of identifying additional non-motif-containing peptides: a total of 32 from 3 replicate selections. These results suggest that with respect to recovery of binders, little benefit is conferred by higher library concentration, and that 8 fmol/member sample loading avoids detection of peptides present in the selection eluate at lower abundance. Accounting for these additional sequences yields an enrichment of 1.1 x 10³, assuming 152 DXXDY(A/S)-containing peptides (Equation 2), which is on the order of enrichments reported for individual rounds of mRNA (~10³) [39], phage (~10³) [14], or cell surface display (~10⁴) [40,41].
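The three enrichment values quoted in this section can be checked against the definition in Equation 1. In the sketch below, the total of 35 sequenced peptides at high sample loading is taken as 3 motif-containing plus 32 non-motif-containing sequences, which is our reading of the counts given above; the function name is illustrative.

```python
def enrichment(sequenced_binders, total_sequenced, binders_assayed, total_assayed):
    """Enrichment per Equation 1: binder fraction isolated / binder fraction assayed."""
    fraction_isolated = sequenced_binders / total_sequenced
    fraction_assayed = binders_assayed / total_assayed
    return fraction_isolated / fraction_assayed


LIBRARY_SIZE = 2e6

# Upper bound: the single sequenced peptide is the only binder in the library.
print(f"{enrichment(1, 1, 1, LIBRARY_SIZE):.1e}")
# Lower estimate: 152 motif-containing sequences assumed present, one sequenced.
print(f"{enrichment(1, 1, 152, LIBRARY_SIZE):.1e}")
# High sample loading: 3 motif-containing peptides among 35 sequenced.
print(f"{enrichment(3, 35, 152, LIBRARY_SIZE):.1e}")
```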

Taken together, these results suggest that magnetic bead affinity capture can facilitate selections from complex mixtures of synthetic peptides at a concentration of 10 pM/member. In conjunction with nLC-MS/MS peptide sequencing, very high enrichment for sequenced peptides is enabled by the use of a precursor selection threshold, to detect only those peptides that are significantly recovered by the selection (here, a DXXDY(A/S)-containing peptide, present in highest abundance). Peptides recovered in lower abundance (here, mostly non-motif-containing peptides) are present in the selection eluate, but not sequenced (Table 2.6.4). As described below (‘Parallel selections distinguish non-specific binders’), the majority of these peptides are non-specific binders.

2.2.3. Enrichment is maintained as diversity increases from 10⁶ to 10⁸

To investigate whether comparable selection performance could be achieved using higher-diversity libraries, we assayed 2 x 10⁷- and 2 x 10⁸-member libraries in selections for 12ca5 binding. In theory, these libraries should contain 10- and 100-fold more DXXDY(A/S) peptides compared to the 2 x 10⁶-member library. If selections from the higher-diversity libraries performed comparably, then they should recover all of these additional DXXDY(A/S)-containing peptides, with no increase in the proportion of non-binding peptides. For each library, selections were performed in triplicate, with library concentration maintained at 10 pM/member and using 0.13 nmol of selection target, as above.

From the 2 x 10⁷-member library, a representative selection identified 23 peptides that matched the library design (ALC ≥ 80), 14 of which (61%) contained DXXDYA or DXXDYS (Fig. 2.2.2b; Table 2.6.2). An additional two sequences closely resembled 12ca5 binders (KVLDYDYAWK and YDDRYADTFK) and may correspond to inaccurate sequence assignments. Selections from the 2 x 10⁸-member library identified 156 total peptides (ALC ≥ 80), 109 of which (70%) contained either DXXDYA or DXXDYS (Fig. 2.2.2b; Tables 2.6.2–2.6.3). These results illustrate that selections identified binders from the higher-diversity libraries without loss in recovery, since the expected ~10 and ~100 DXXDY(A/S)-containing peptides were identified, and without loss in enrichment, since DXXDY(A/S)-containing peptides comprised the majority of selected peptides in each case (100%, 61%, and 70% for 2 x 10⁶, 2 x 10⁷, and 2 x 10⁸-member libraries, respectively). Therefore, we concluded that single-pass AS-MS could be applied to libraries of at least 10⁸ random peptides without loss in performance.

2.2.4. Recovery of binders drops from libraries beyond 10⁸ members

We next set out to understand whether libraries of diversity beyond 10⁸ would also be amenable to AS-MS. To access 10⁹ random synthetic peptides on a convenient lab scale, we used 20 µm TentaGel resin beads (vs. 30 µm beads, above) to prepare a library of design (X)9K (X = all L-amino acids except for cysteine and isoleucine; 5.4 g; 1.3 x 10⁹ beads) (Fig. 2.6.23). As above, a portion of the beads was set aside prior to cleavage from resin to give a 10⁸-member library for side-by-side comparison.

The 10⁸-member library performed comparably to the 10⁸-member library prepared on 30 µm TentaGel beads (above), yielding 257 total peptides (ALC ≥ 80), 183 of which (71%) contained either DXXDYA or DXXDYS. In contrast, selections from the 10⁹-member library identified only 34 peptides (ALC ≥ 80), 21 of which (62%) contained DXXDYA or DXXDYS (Table 2.6.5). For the 10⁹-member library, selections were performed at 2 pM/member to maintain solubility; however, the lower sequence identification rate cannot be attributed to material losses alone, since selection from a 10⁸-member library performed at 2 pM/member identified 131 DXXDY(A/S) sequences (vs. 183 from the selection at 10 pM/member) (Fig. 2.2.3a; Table 2.6.5). Analysis of pooled eluates from replicate selections from the 10⁹-member library yielded no additional sequences, providing further evidence that material losses were not responsible for the reduction in sequence identification rate (since similar populations of DXXDY(A/S)-containing sequences are recovered by replicate selections) (Tables 2.6.3 & 2.6.6).

Since affinity selections involve a large number of potential binders competing for a limited number of binding sites, it is possible that many weaker binders present in the library reduce the individual recoveries of all ligands. Two experiments were performed to address whether competition was responsible for the lower recovery of DXXDY(A/S) peptides from the 10⁹-member library. First, we performed a selection using 10-fold higher stoichiometry of 12ca5 (1.3 nmol) relative to library (2 fmol/member). If competition were limiting recovery of binders, this experiment should have recovered additional DXXDY(A/S)-containing peptides. Instead, their recovery was abrogated entirely (Table 2.6.7). Second, to determine the frequency of binders that would be required for competition to become significant, we studied the effect of exogenous competitors on the recovery of DXXDY(A/S) peptides from the 10⁸-member library. Competitors with 4 nM or 3 µM affinity required concentrations of 100 nM or 100 µM, respectively, to attenuate the recovery of DXXDY(A/S) peptides (Fig. 2.2.3b; Table 2.6.8). These concentrations correspond to 5 x 10⁴ or 5 x 10⁷ peptides (present at 2 pM/peptide), suggesting that: 1) the expected ~10³ DXXDY(A/S) peptides in the 10⁹-member library would not compete for selection target; and 2) in order for µM-affinity binders to compete with DXXDY(A/S) peptides, they would need to comprise 5% of the 10⁹-member library.
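The conversion from competitor concentration to an equivalent number of library members is a simple ratio. The sketch below reproduces the figures above, assuming each library member is present at 2 pM; the function name is illustrative.

```python
def equivalent_members(competitor_conc_molar, member_conc_molar=2e-12):
    """Number of library members whose summed concentration equals the competitor's."""
    return competitor_conc_molar / member_conc_molar


nm_affinity_equiv = equivalent_members(100e-9)   # 4 nM-affinity competitor at 100 nM
um_affinity_equiv = equivalent_members(100e-6)   # 3 uM-affinity competitor at 100 uM
fraction_of_library = um_affinity_equiv / 1e9    # share of a 10^9-member library

print(f"{nm_affinity_equiv:.0e} members, {um_affinity_equiv:.0e} members, "
      f"{fraction_of_library:.0%} of the library")
```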

The combined results are consistent with the interpretation that selections from the 10⁹-member library yield more peptides than are compatible with nLC-MS/MS sequencing. As the number of peptides in the selection eluate (the sample complexity) increases, the proportion of sequenced peptides decreases [30]. This drop in sequencing coverage was observed previously to be particularly significant beyond 10³ peptides [30], which is consistent with the approximate number of total peptides (binders and non-binders) likely isolated from the 10⁹-member library (>3.5 x 10³, based on a maximum of 35 peptides isolated from the 10⁶-member library). Additionally, the competition experiments provide strong evidence that selections took place under conditions of equilibrium binding, since the relative strength of the two competitors was in proportion to their binding constants. Therefore, predictable competition effects should be expected when AS-MS is conducted in the presence of added competitor, or from focused libraries containing a higher proportion of binders.

Figure 2.2.3. Loss of binder identification from a 10⁹-member library cannot be attributed to material losses or competition. a) An approximate 9-fold drop in the number of 12ca5-binding sequences is observed from one-pot selections of a 10⁹-member library (2 pM/member) relative to a 10⁸-member library. Since recovery from the 10⁸-member library was mostly retained at 2 pM/member, loss in starting material alone cannot explain the drop in sequence identification. b) Exogenous HA epitope (KD = 4 nM) could impede recovery of DXXDY(A/S)-containing sequences from a 10⁸-member library when included at 1:1 stoichiometry relative to 12ca5, while the peptide ‘Gypyeydwe’ (KD = 3 µM) inhibited recovery only when present at 1000:1 stoichiometry relative to 12ca5. Error bars correspond to standard deviations from two experimental replicates.

2.2.5. Parallel selections distinguish non-specific binders

Selections from identical portions of synthetic peptide mixtures obtained by split-and-pool synthesis are readily conducted in parallel. For example, a 2 x 10⁸-member library synthesized on 30 μm TentaGel (~3.7 pmol/bead) provides sufficient material for ~370 selections performed at 10 fmol/member scale. We leveraged this capability to understand what proportion of non-HA epitope-containing peptides were common to an unrelated selection target, and therefore attributable to non-specific binding [42]. Side-by-side selections from a 2 x 10⁸-member library were conducted in triplicate against both 12ca5 (mouse IgG2bκ) and a polyclonal human IgG1, with the goal of quantifying the degree of overlap among selected peptides (Fig. 2.2.4a).
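The estimate of ~370 selections per library follows from the peptide yield of a single bead. The sketch below assumes one library member per bead and that the full per-bead yield is available after cleavage, with no handling losses; the function name is illustrative.

```python
def selections_per_library(pmol_per_bead=3.7, fmol_per_member_per_selection=10.0):
    """Number of selections supported by one bead's worth of each library member."""
    fmol_per_bead = pmol_per_bead * 1000.0   # pmol -> fmol
    return fmol_per_bead / fmol_per_member_per_selection


print(round(selections_per_library()))
```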

Three replicate 12ca5 selections yielded a total of 133 DXXDY(A/S)-containing peptides, all of which were specific for 12ca5, along with 65 non-motif-containing peptides (Fig. 2.2.4b; Table 2.6.9). Of the non-motif-containing peptides, 18 contained motifs that differed from the HA epitope by one position (for example, EXXDYA). Of the remaining 47 peptides, closer inspection revealed that 13 may contain mis-sequenced HA epitopes. For example, VFQDWEDFSK and YMDTVDFSEK contain the FS dipeptide fragment, which is isobaric to YA (Fig. 2.6.24). A total of 34 peptides had no obvious sequence similarity to the DXXDY(A/S) motif. Of these, 17 were also sequenced from the IgG1 selections. Examination of the LC-MS data revealed that many or all of the remaining 17 peptides were also present in the IgG1 selection eluates, but not sequenced (9 of 9 selected peptides examined; Fig. 2.6.25). Therefore, essentially all of the non-motif-containing peptides sequenced from 12ca5 selection can be attributed to non-specific binding. Presumably, many additional non-motif-containing peptides were recovered at lower abundance and not sequenced, as for the 10⁶-member library.
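The FS/YA ambiguity noted above arises because the two dipeptide fragments share the same elemental composition and therefore the same mass. The sketch below confirms this from standard residue formulas and monoisotopic atomic masses; the helper names are illustrative and do not correspond to the sequencing software used in this work.

```python
from collections import Counter

# Monoisotopic atomic masses in Da (standard values).
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental compositions of amino acid residues (free amino acid minus H2O).
RESIDUE_FORMULA = {
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
}


def composition(fragment):
    """Summed elemental composition of a string of one-letter residue codes."""
    total = Counter()
    for residue in fragment:
        total.update(RESIDUE_FORMULA[residue])
    return total


def monoisotopic_mass(fragment):
    """Monoisotopic mass (Da) of the residues in the fragment."""
    return sum(ATOM_MASS[atom] * count for atom, count in composition(fragment).items())


print(composition("FS") == composition("YA"))
print(f"FS: {monoisotopic_mass('FS'):.4f} Da, YA: {monoisotopic_mass('YA'):.4f} Da")
```

Because the compositions are identical, no gain in mass accuracy can separate the two assignments; they can only be distinguished by fragment ions that cleave between the two residues.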


Figure 2.2.4. Parallel selections enable differentiation of specific and non-specific binders. a) Using AS-MS, the same population of compounds can be used in selections against more than one target in parallel, enabling assessment of target-dependent enrichment of peptides. Here, selections using a 2 x 10⁸-member library were performed against 12ca5 and a polyclonal human IgG1 in parallel, followed by sequential nLC-MS/MS analysis. b) Extracted ion chromatograms of a DXXDY(A/S)-containing peptide (left) and a non-motif-containing peptide (right) from a selection against 12ca5 (blue) and polyclonal human IgG1 (purple). The peptide containing the 12ca5-binding motif is selectively enriched towards 12ca5, while the non-motif-containing peptide is promiscuously pulled down in both conditions.

2.3. Discussion

In this work, we demonstrate that affinity selection-mass spectrometry, using magnetic bead reagents, provides sufficient enrichment to identify high-affinity binders from randomized libraries of 10⁸ synthetic peptides. With respect to accessible diversity, this advance brings synthetic libraries up to the level of molecular biology-based combinatorial libraries. Diversity is a key determinant of selection outcome, as illustrated here in the context of 12ca5 binding, and in the field of antibody engineering [41,43]. Therefore, the results described here can be expected to considerably extend the utility of synthetic libraries for discovering novel binding molecules.

The practical limit to library diversity amenable to single-pass AS-MS is 10⁸, beyond which the number of binders identified decreases. Our combined results are consistent with non-specific binding as the origin of this limit, which results in the recovery of more peptides from >10⁸-member libraries than can be sequenced by nLC-MS/MS with high coverage. Since diversity is limited by selection performance and nLC-MS/MS sequencing coverage, rather than peptide solubility, future work should focus on these areas. For example, multi-stage selections might be employed to improve enrichment further, and to reduce the number of peptides in a >10⁸-member library sufficiently for nLC-MS/MS sequencing. Sequencing coverage might also be improved by the use of specialized nLC columns and extended analysis times [44].

The primary benefit of synthetic peptide libraries is the chemical control gained over the library design. Given the comparatively low diversities examined by AS-MS relative to the upper bounds of genetically-encoded techniques (10⁸ vs. 10¹³), we anticipate that taking advantage of the chemical capabilities AS-MS affords, such as straightforward non-canonical amino acid incorporation, may prove critical for more intractable targets. For example, AS-MS may be particularly suited to engineering peptide and peptoid foldamers [45,46]. Interfacing non-canonical amino acid incorporation with the macrocyclic architectures that have been rendered accessible to phage [47] and mRNA display [48] would further expand the breadth of chemical space amenable to exploration by AS-MS. In our case, performing selections on libraries of macrocycles would require an additional, post-enrichment linearization step for reliable MS/MS-based sequencing [49,50]. We envision that progress in these areas, along with improved mass spectral methods to enable investigation of libraries of even greater diversities, may ultimately facilitate discovery of fully non-natural peptide binders to historically undruggable targets.

2.4. Experimental

2.4.1. Materials

H-Rink Amide-ChemMatrix resin was purchased from PCAS BioMatrix Inc. (St-Jean-sur-Richelieu, Quebec, Canada). 30 μm TentaGel M NH2 microspheres (M30352; 0.20 to 0.25 mmol/g amine loading) were purchased from Rapp Polymere (Tübingen, Germany). 20 μm TentaGel S NH2 microspheres (TMN-9909-PI; 0.2 to 0.3 mmol/g amine loading) were purchased from Peptides International (Louisville, KY). Fmoc-Ala-OH, Fmoc-Arg(Pbf)-OH, Fmoc-Asn(Trt)-OH, Fmoc-Asp(tBu)-OH, Fmoc-Gln(Trt)-OH, Fmoc-Glu(tBu)-OH, Fmoc-Gly-OH, Fmoc-His(Trt)-OH, Fmoc-Leu-OH, Fmoc-Lys(Boc)-OH, Fmoc-Met-OH, Fmoc-Phe-OH, Fmoc-Pro-OH, Fmoc-Ser(tBu)-OH, Fmoc-Thr(tBu)-OH, Fmoc-Trp(Boc)-OH, Fmoc-Tyr(tBu)-OH, and Fmoc-Val-OH were purchased from Advanced ChemTech (Louisville, KY). Fmoc-D-Asp(tBu)-OH, Fmoc-D-Gln(Trt)-OH, Fmoc-D-Leu-OH, and Fmoc-D-Lys(Boc)-OH were also purchased from Advanced ChemTech (Louisville, KY). 1-[Bis(dimethylamino)methylene]-1H-1,2,3-triazolo[4,5-b]pyridinium-3-oxid-hexafluorophosphate (HATU) was purchased from P3 BioSystems (Louisville, KY). 4-[(R,S)-α-[1-(9H-Fluoren-9-yl)-methoxyformamido]-2,4-dimethoxybenzyl]-phenoxyacetic acid (Fmoc-Rink amide linker), Fmoc-L-His(Boc)-OH, Fmoc-β-Ala-OH, Fmoc-L-Lys(Alloc)-OH, di-tert-butyl dicarbonate, and fluorescein isothiocyanate isomer I were purchased from Chem-Impex International (Wood Dale, IL). Biotin-(PEG)4-NHS ester and biotin-(PEG)4-propionic acid were purchased from ChemPep Inc. (Wellington, FL). Peptide synthesis-grade N,N-dimethylformamide (DMF), dichloromethane (DCM), diethyl ether, HPLC-grade acetonitrile (MeCN), and HPLC-grade methanol (MeOH) were purchased from VWR International (Philadelphia, PA). Trifluoroacetic acid (TFA; for HPLC, ≥99%), piperidine (ReagentPlus; 99%), triisopropylsilane (98%), 1,2-ethanedithiol (≥98%), phenylsilane (97%), tetrakis(triphenylphosphine)palladium(0) (99%), and N-α-Fmoc-O-benzyl-L-phosphoserine were purchased from MilliporeSigma (St. Louis, MO). Diisopropylethylamine (99.5%; biotech. grade; DIEA) was also purchased from MilliporeSigma, and purified by passage through an activated alumina column (Pure Process Technology solvent purification system; Nashua, NH). Water was deionized using a Milli-Q Reference water purification system (Millipore).

Mouse anti-hemagglutinin (HA) monoclonal antibody clone 12ca5 (anti-HA mAb 12ca5) and polyclonal human IgG1 were purchased from Columbia Biosciences (Frederick, MD). HyClone™ Fetal Bovine Serum (SH30071.03HI, heat inactivated) was purchased from GE Healthcare Life Sciences (Logan, UT). Bovine serum albumin (BSA; RIA grade) and Tween 20 (reagent grade) were purchased from Amresco (Solon, OH). Dynabeads MyOne Streptavidin T1 magnetic microparticles were purchased from Invitrogen (Carlsbad, CA).

Phosphate buffered saline (10x, Molecular biology grade) was purchased from Corning. Tris(hydroxymethyl)aminomethane (Tris) was purchased from J.T. Baker. 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES; ≥99.5%), sodium bicarbonate (ACS grade, ≥99.7%), and magnesium chloride (≥98%) were purchased from MilliporeSigma. Tris(2-carboxyethyl)phosphine hydrochloride (TCEP) was purchased from Hampton Research (Aliso Viejo, CA). Sodium chloride (ACS grade) was purchased from Avantor. Guanidine hydrochloride (Technical grade) and sodium phosphate monobasic monohydrate (ACS grade) were purchased from Amresco.

2.4.2. SPPS of anti-hemagglutinin (HA) epitope and analogues

Peptide-α-carboxamides were synthesized on a 0.1 mmol scale on H-Rink amide-ChemMatrix resin (0.45 mmol/g), using either fully automated [51] or manual [52] “fast flow” Fmoc-SPPS. For automated syntheses: syntheses were carried out at 90 °C. Amide bond formation was effected in 8 s, and Fmoc removal was carried out in 8 s with 20% (v/v) piperidine in DMF. Individual cycle times were each about 40 s. For manual flow-based syntheses: reagents and solvents were delivered to a stainless steel reactor, which contained the resin, by either an HPLC pump (DMF or 20% (v/v) piperidine in DMF) or a syringe pump (active esters of Fmoc-amino acids). The reactor was submerged in a water bath for the duration of the synthesis and the temperature was maintained at 70 °C. The procedure for each coupling cycle included: a 30 second coupling with a mixture of Fmoc-protected amino acid (1 mmol), HBTU (0.95 mmol), and diisopropylethylamine (DIEA; 2.9 mmol, 500 μL) in 2.5 mL of DMF, at a flow rate of 6 mL/min (for the coupling of tryptophan and histidine, 190 μL of DIEA was used to minimize racemization); 1 min DMF wash, at a flow rate of 20 mL/min; 20 second deprotection with 20% (v/v) piperidine in DMF, at a flow rate of 20 mL/min; and 1 minute DMF wash, at a flow rate of 20 mL/min. After each synthesis was complete, resins were washed with DCM (5x) and dried under reduced pressure.

Global side chain deprotection and cleavage from solid support were carried out by treatment of dry resin with a solution of 94% (v/v) TFA, 2.5% (v/v) ethanedithiol, 2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 2 h at ambient temperature (~1.5 mL of deprotection solution/50 mg of resin). TFA was then evaporated under a stream of nitrogen, and crude peptide was precipitated by addition of cold diethyl ether. Precipitated peptide was triturated (3x) with cold diethyl ether, dissolved in 50/50 water/acetonitrile (0.1% TFA), passed through a 0.2 μm PTFE syringe filter, and lyophilized.

Crude peptides were purified by semipreparative reverse phase HPLC, using an Agilent mass directed purification system (1260 Infinity LC and 6130 single quad MS). For a typical purification, peptides were dissolved in 95/5 water/acetonitrile (0.1% TFA) and passed through a 0.2 μm PTFE syringe filter. The resulting peptide solution was then loaded onto a 9.4 x 250 mm column (Agilent Zorbax 300SB-C3; 5 μm particle size; 300 Å pore size) and purified using a linear gradient of 1 to 61% acetonitrile (0.1% TFA) over 60 min (4 mL/min flow rate). Fractions containing the desired product were pooled and an aliquot taken for LC-MS analysis. The remainder was lyophilized.

For LC-MS characterization: LC-MS data were acquired using an Agilent 6550 quadrupole time-of-flight LC-MS. Samples were run on an Agilent Zorbax 200SB-C3 column (2.1 x 150 mm, 5 μm particle size, 300 Å pore size). Total ion current (TIC) chromatograms were plotted, and mass spectra were integrated over the principal TIC peak.

2.4.3. Competition fluorescence polarization of HA epitope and analogues

Solutions of unlabeled peptides (~1 mg/mL each; Table 2.6.1) in 1x PBS were prepared in the presence of 100 nM 12ca5, 1 mg/mL BSA, and 0.02% Tween 20 (120 μL). Fluorescent competitor was added (YPYDVPDYAK(FITC)α-CONH2; 28 nM). The resulting solution was diluted serially (20 μL) into 100 nM 12ca5, 28 nM YPYDVPDYAK(FITC)α-CONH2, 1 mg/mL BSA, 0.02% Tween 20 in 1x PBS (80 μL; 5-fold dilutions). The resulting solutions were transferred to a 96 well plate (Greiner Bio-One; Kremsmünster, Austria; polypropylene, flat-bottom, chimney well), kept under foil (RT), and read after 1 h on a Tecan Infinite M1000 plate reader (470 nm excitation; 517 nm detection; 5 nm bandwidth). The concentration of fluorescent 12CA5 was determined based on absorbance at 490 nm, using ε = 76,900 M⁻¹cm⁻¹.
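
The 5-fold dilution scheme in this section (20 μL carried into 80 μL of diluent) can be sketched as a short calculation. This is only an illustration and not part of the original protocol: the function name `serial_dilution`, the starting concentration of 1 mg/mL, and the choice of eight points are assumptions made for the example.

```python
def serial_dilution(start_conc, transfer_ul, diluent_ul, n_points):
    """Concentrations along a serial dilution series (same units as start_conc)."""
    # Each step carries transfer_ul into diluent_ul, e.g. 20 into 80 is 5-fold.
    factor = (transfer_ul + diluent_ul) / transfer_ul
    return [start_conc / factor ** i for i in range(n_points)]


# Hypothetical series: ~1 mg/mL unlabeled peptide, eight points (assumed).
series = serial_dilution(1.0, 20.0, 80.0, 8)
for i, conc in enumerate(series):
    print(f"point {i}: {conc:.3g} mg/mL")
```

Because the diluent already contains 12ca5 and the fluorescent competitor at their final concentrations, only the unlabeled peptide is diluted along the series.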

2.4.4. BioLayer Interferometry of 25 nM-affinity 12ca5 ligand

Lyophilized peptide FDYEDYAEWKK(biotin) (biotinylated on C-term lysine) was dissolved to 1 mg/mL in 1x PBS and diluted 50-fold into 1 mg/mL BSA, 0.02% Tween-20, 1x PBS (‘kinetic buffer’) for immobilization onto streptavidin Octet biosensors (ForteBio; Menlo Park, CA). Biolayer interferometry (BLI) assays were performed in 96 well plates (Greiner Bio-One; Kremsmünster, Austria; polypropylene, flat-bottom, chimney well) using an Octet Red96 System (ForteBio; Menlo Park, CA). Wells were filled with 200 µL of kinetic buffer, peptide solution, or 12ca5 solution (prepared in kinetic buffer). Biotinylated peptide was immobilized onto the streptavidin tip for 120 s. Sensors were then dipped into kinetic buffer for 60 s, 12ca5 solution (1.5 µM, 370 nM, or 90 nM) for 300 s, and finally into kinetic buffer for 600 s. Measurements were carried out at 30 °C.

2.4.5. Preparation of biotinylated 12ca5

Biotin-(PEG)4-NHS ester (2.0 mg, 3.3 μmol) was weighed into a plastic tube and dissolved in 1.06 mL of DMF ([Biotin-(PEG)4-NHS] = 3.3 mM). Anti-HA antibody (4.71 mg/mL in 1x PBS, 1.06 mL, 33 nmol) was transferred to a plastic tube and to this was added 123 μL of 1 M sodium bicarbonate, pH = 8.0. Biotin-(PEG)4-NHS ester (3.3 mM in DMF, 53 μL, 170 nmol) was added dropwise to the solution of anti-HA antibody, and the reaction was placed on a nutating mixer for 2 h at ambient temperature. The reaction was quenched by addition of 20 mM Tris, 150 mM NaCl, pH = 7.5 (4 mL). The mixture was then filtered through a 0.2 μm PTFE syringe filter, and purified by FPLC (ÄKTA Prime Plus Liquid Chromatography System, GE Healthcare). Concentration of biotinylated 12ca5 was measured by absorption at 280 nm, using a determined extinction coefficient of 2.0 x 10⁵ M⁻¹cm⁻¹. Protein was stored at 4 °C and not subjected to freeze-thaw cycles.
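
The reagent stoichiometry in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the antibody molecular weight of 150 kDa is an assumed, typical IgG value that is not stated in the text.

```python
def nmol_from_mass(mass_mg, mw_g_per_mol):
    """Convert a mass in mg to nmol for a given molecular weight."""
    return mass_mg / mw_g_per_mol * 1e6  # mg / (g/mol) = mmol; 1 mmol = 1e6 nmol


def nmol_from_solution(conc_mM, volume_ul):
    """Amount in nmol delivered by volume_ul of a conc_mM solution."""
    return conc_mM * volume_ul  # mM x uL = nmol


IGG_MW = 150_000  # g/mol; assumed typical IgG mass, not stated in the text

antibody_nmol = nmol_from_mass(4.71 * 1.06, IGG_MW)  # 4.71 mg/mL x 1.06 mL
reagent_nmol = nmol_from_solution(3.3, 53)           # 3.3 mM x 53 uL
equivalents = reagent_nmol / antibody_nmol

print(f"antibody: {antibody_nmol:.0f} nmol")
print(f"reagent: {reagent_nmol:.0f} nmol ({equivalents:.1f} equiv.)")
```

Under that assumed molecular weight, the amounts agree with the stated 33 nmol of antibody and roughly 170 nmol of biotinylation reagent, i.e. about a five-fold molar excess.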

2.4.6. Affinity capture of 12ca5-binding peptides: effect of ligand concentration

Preparation of 12ca5-functionalized magnetic beads:

100 μL portions of MyOne Streptavidin T1 Dynabeads (10 mg/mL; 1 mg; 0.13 nmol IgG binding capacity) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack (New England Biolabs, cat# S1506S). The beads were washed 3 times with 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and then treated with 100 μL portions of biotinylated 12ca5 (1.5 μM; 0.15 nmol). The resulting suspensions were transferred to a rotating vertical mixer, and kept for 15 min at ambient temperature. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 4 times with 1 mL each of 1 mg/mL BSA, 0.02% Tween 20, 1x PBS.

Affinity capture:

1 mL solutions containing 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and either 1 nM/peptide (1 pmol) or 10 pM/peptide (10 fmol) 12ca5 control binders (Supplementary Table 1) were prepared in 1.7 mL plastic centrifuge tubes, and chilled on ice for 10 min (the 12ca5 binders were added from mixtures containing 1 µM/peptide or 10 nM/peptide in 6 M guanidine hydrochloride, 200 mM phosphate, pH 7 buffer). The resulting chilled solutions were then added to 1 mg portions of 12ca5-functionalized magnetic beads, and the resulting suspensions were kept on a rotating vertical mixer (1 h, in 4 °C cold room).

Elution:

The centrifuge tubes containing the bead suspensions were transferred to the magnetic separation rack. The beads were isolated, and washed 3 times with 1 mL each of chilled 1x PBS (beads were exposed to buffer for a total of ~6 min). Then, each drained bead aliquot was treated with 2 x 150 µL of ‘elution buffer’ (6 M guanidine hydrochloride, 200 mM phosphate, pH 7.0 buffer containing 1 fmol/µL Peptide Retention Time Calibration Standard (PRTC; Pierce, cat# 88320; for use as an internal reference in MS-based quantitation)).

Preparation of reference samples:

1 pmol and 10 fmol/peptide ‘reference’ samples were prepared by dilution of 1 µM/peptide or 10 nM/peptide 12ca5 binder mixture stock solutions (in 6 M guanidine hydrochloride, 200 mM phosphate, pH 7 buffer) into 300 µL of ‘elution buffer’. These samples contained the amount of peptide that would be present in elution, if 100% of the peptide were retained by affinity capture.

NanoLC-MS:

5 µL portions of the combined eluates were analyzed by nanoLC-MS, alongside 5 µL portions of ‘reference’ samples (16.7 fmol/peptide or 167 amol/peptide loading for ‘1 nM’ or ‘10 pM’ conditions, respectively). Analysis was performed on an EASY-nLC 1200 (Thermo Fisher Scientific) nano-liquid chromatography handling system connected to an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Fisher Scientific). Samples were run on a PepMap RSLC C18 column (2 μm particle size, 15 cm x 50 μm ID; Thermo Fisher Scientific, P/N ES801). A nanoViper Trap Column (C18, 3 μm particle size, 100 Å pore size, 20 mm x 75 μm ID; Thermo Fisher Scientific, P/N 164946) was used for desalting. The standard nano-LC method was run at 40 °C and a flow rate of 300 nL/min with the following gradient: 1% solvent B in solvent A ramping linearly to 61% B in A over 40 or 60 min, where solvent A = water (0.1% FA), and solvent B = 80% acetonitrile, 20% water (0.1% FA). Positive ion spray voltage was set to 2200 V. Orbitrap detection was used for primary MS, with the following parameters: resolution = 120,000; quadrupole isolation; scan range = 200-1400 m/z; RF lens = 30%; AGC target = 1 x 10⁶; maximum injection time = 100 ms; 1 microscan.

Generation of dose-response curve:

A dose-response curve was generated by analyzing 5 µL portions of the ‘reference samples’, containing 16.7 fmol/peptide, 1.67 fmol/peptide, or 167 amol/peptide. MS detector counts for each peptide were determined from the apex of extracted ion current chromatograms, and plotted vs. sample loading to verify the linearity of response over the sample loading range of interest.
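
One simple way to check linearity of response is to fit the slope of log(counts) against log(loading); a slope near 1 is consistent with a proportional response. The sketch below is an illustration under assumptions, not the analysis used in this work: the function name `loglog_slope` and the detector counts are hypothetical, and only the three loadings come from the text.

```python
import math


def loglog_slope(loadings, counts):
    """Least-squares slope of log10(counts) vs. log10(loadings)."""
    xs = [math.log10(v) for v in loadings]
    ys = [math.log10(v) for v in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


loadings_fmol = [0.167, 1.67, 16.7]   # loadings from the text
apex_counts = [2.1e4, 1.9e5, 2.0e6]   # hypothetical detector counts
slope = loglog_slope(loadings_fmol, apex_counts)
print(f"log-log slope: {slope:.2f} (a value near 1 suggests a linear response)")
```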

Quantitation of sample recovery:

MS detector counts for each peptide were determined from the apex of extracted ion current chromatograms. Recoveries were taken as the ratio of counts for samples obtained by affinity selection vs. the ‘reference’ samples. To account for run-to-run variability, these ratios were adjusted based on the counts obtained for internal standard (Peptide Retention Time Calibration Standard).
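
The recovery calculation described here can be sketched as follows. The text states only that ratios were adjusted using internal standard counts; dividing by the sample-to-reference ratio of the internal standard is one plausible form of that adjustment and is an assumption of this example, as are the function name and the numerical values.

```python
def recovery(sample_counts, reference_counts, sample_is_counts, reference_is_counts):
    """Fractional recovery of a peptide, normalized to an internal standard (IS)."""
    raw_ratio = sample_counts / reference_counts       # affinity selection vs. reference
    is_ratio = sample_is_counts / reference_is_counts  # run-to-run response drift
    return raw_ratio / is_ratio


# Hypothetical apex counts for one peptide and one PRTC ion.
fraction = recovery(5.0e5, 1.0e6, 2.0e5, 2.5e5)
print(f"recovery: {fraction:.1%}")
```

If the internal standard gives the same response in both runs, the correction factor is 1 and the recovery reduces to the raw ratio of counts.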

2.4.7. Affinity capture of 12ca5-binding peptides: effect of capture protocol

Direct capture with 12ca5-functionalized magnetic beads:

Preparation of 12ca5-functionalized magnetic beads was carried out as in 2.4.6. Affinity capture treatments were performed as in 2.4.6, from 1 mL volumes of 1 nM, 100 pM, or 10 pM/peptide mixtures of 12ca5-binding peptides (Table 2.6.1).

Indirect capture by treatment with 12ca5-biotin:

1 mL solutions containing 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and either 1 nM, 100 pM, or 10 pM/peptide 12ca5 control binders (Table 2.6.1) were prepared in 1.7 mL plastic centrifuge tubes. The solutions were chilled on ice for 10 min, 12ca5-biotin was added (100 nM; 100 pmol), and the resulting solutions were kept on a rotating vertical mixer (in 4 °C cold room). After 1 h, 1 mg portions of MyOne Streptavidin T1 Dynabeads (0.13 nmol IgG binding capacity) were added in 100 µL each of 1 mg/mL BSA, 0.02% Tween 20, 1x PBS. The resulting solutions were kept for 15 min (rotating vertical mixer, in 4 °C cold room).

Elution, nanoLC-MS analyses of elution and ‘reference’ samples, and quantitation were performed as in 2.4.6.

2.4.8. Affinity capture of 12ca5-binding peptides: effect of magnetic bead concentration

Direct capture with 12ca5-functionalized magnetic beads:

100 μL (1 mg; 0.13 nmol IgG binding capacity) or 1 mL (10 mg; 1.3 nmol IgG binding capacity) portions of MyOne Streptavidin T1 Dynabeads (10 mg/mL) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. The beads were washed 3 times with 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and then treated with 100 μL or 1 mL portions of biotinylated 12ca5 (1.5 μM; 0.15 nmol or 1.5 nmol). The resulting suspensions were transferred to a rotating vertical mixer, and kept for 15 min at ambient temperature. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 4 x 1 mL each with 1 mg/mL BSA, 0.02% Tween 20, 1x PBS.

Affinity capture was performed from 1 mL solutions containing 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and 1 nM/peptide (1 pmol) 12ca5 control binders (Table 2.6.1). Elution, nanoLC-MS analyses of elution and ‘reference’ samples, and quantitation were performed as in 2.4.6. (Note: raw ion counts were not normalized, as PRTC was absent from the ‘reference’ samples. Counts for a PRTC ion are shown in Supplementary Fig. 10 to illustrate the degree of run-to-run variability in MS response.)

Indirect capture:

1 mL solutions containing 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and 1 nM/peptide 12ca5 control binders (Table 2.6.1) were prepared in 1.7 mL plastic centrifuge tubes. The solutions were chilled on ice for 10 min, 12ca5-biotin was added to either 100 nM (100 pmol) or 1 µM (1 nmol), and the resulting solutions were kept on a rotating vertical mixer (in 4 °C cold room). After 1 h, 1 mg (0.13 nmol IgG binding capacity) or 10 mg (1.3 nmol IgG binding capacity) portions of MyOne Streptavidin T1 Dynabeads were added in 100 µL each of 1 mg/mL BSA, 0.02% Tween 20, 1x PBS. The resulting solutions were kept for 15 min (rotating vertical mixer, in 4 °C cold room).

Elution, nanoLC-MS analyses of elution and ‘reference’ samples, and quantitation were performed as in 2.4.6. (Note the caveat above, under ‘direct capture’.)

2.4.9. Affinity capture of 12ca5-binding peptides: effect of ligand concentration, with concentration of eluate

‘Direct’ affinity capture was performed as in 2.4.6, from 1 mL solutions containing 1 mg/mL BSA, 0.02% Tween 20, 1x PBS, and either 10 pM/peptide (10 fmol) or 1 pM/peptide (1 fmol) of 12ca5 control binders (Table 2.6.1). The resulting elutions (300 µL each) were concentrated by solid phase extraction using C18 ZipTip® cartridges (0.6 μL, MilliporeSigma, P/N ZTC18S096). This involved: 1) wetting the bonded phase with 80 μL of acetonitrile (0.08% trifluoroacetic acid); 2) equilibration with 3 x 80 μL of water (0.1% trifluoroacetic acid); 3) loading the affinity capture eluate (300 µL); 4) de-salting with 3 x 80 μL of water (0.1% trifluoroacetic acid); and 5) elution with 50 μL of 50/50 water (0.1% trifluoroacetic acid)/acetonitrile (0.08% trifluoroacetic acid) (containing 100 mM guanidine hydrochloride, 3.3 mM phosphate). Elutions were collected in 1.7 mL plastic tubes, and lyophilized to give white pellets that were reconstituted in 6 μL each of water (0.1% formic acid). nLC-MS analysis was performed using 5 μL injections of each sample.

2.4.10. Preparation of 2 x 10⁶, 2 x 10⁷, and 2 x 10⁸-member (X)9K-CONH2 libraries

Library design: (X)9K-CONH2

SPPS:

2.9 g of 30 μm TentaGel resin (0.26 mmol/g, 0.74 mmol, 2 x 10⁸ beads) was transferred to a 100 mL peptide synthesis vessel, swollen in DMF, and then washed with DMF (3x). Fmoc-Rink amide linker (2.0 g, 3.71 mmol, 5 eq) was dissolved in HATU solution (0.38 M in DMF, 8.8 mL, 3.4 mmol), activated with DIEA (1.86 mL, 10.7 mmol) immediately prior to coupling, and added to resin bed. Coupling was performed for 20 min; after this time, resin was washed with DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments). Resin was then washed with DMF (150 mL). Coupling of Fmoc-Lys(Boc)-OH, subsequent Fmoc removal, and DMF washes were performed in the same manner.

At this stage, resin was suspended in DMF (50 mL), and divided evenly among 18 x 10 mL fritted plastic syringes using a 5 mL Eppendorf pipette. Couplings were performed as follows: Fmoc-protected amino acids (0.4 mmol) in HATU solution (0.38 M, 980 μL, 0.37 mmol) were activated with DIEA (206 μL, 1.2 mmol). Each of the following amino acid derivatives was added to a single portion of resin (theory: 180 mg resin, 40 μmol): Fmoc-Ala-OH, Fmoc-Asp(OtBu)-OH, Fmoc-Glu(OtBu)-OH, Fmoc-Phe-OH, Fmoc-Gly-OH, Fmoc-His(Boc)-OH, Fmoc-Lys(Boc)-OH, Fmoc-Leu-OH, Fmoc-Met-OH, Fmoc-Asn(Trt)-OH, Fmoc-Pro-OH, Fmoc-Gln(Trt)-OH, Fmoc-Arg(Pbf)-OH, Fmoc-Ser(tBu)-OH, Fmoc-Thr(tBu)-OH, Fmoc-Val-OH, Fmoc-Trp(Boc)-OH, and Fmoc-Tyr(tBu)-OH. After coupling for 20 min, resins were washed with DMF (~10 mL ea.), poured back into a 100 mL synthesis vessel, and washed with DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments), and resin was washed with DMF (150 mL). Nine cycles of split-and-pool synthesis were performed using this procedure.
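
The relationship between the theoretical sequence space and the number of beads can be illustrated with a short calculation. This is a sketch under an idealized assumption that is not stated in the text: each bead carries a single sequence drawn uniformly and independently from all 18⁹ possibilities. The constant and function names are hypothetical.

```python
import math

N_MONOMERS = 18   # amino acids used at each randomized position
N_POSITIONS = 9   # randomized positions in (X)9K
N_BEADS = 2e8     # theoretical bead count for 2.9 g of 30 um resin

diversity = N_MONOMERS ** N_POSITIONS  # number of possible sequences


def expected_unique(n_beads, n_sequences):
    """Approximate expected number of distinct sequences among n_beads draws."""
    # n_sequences * (1 - exp(-n_beads / n_sequences)); expm1 keeps precision
    # when n_beads is much smaller than n_sequences.
    return -n_sequences * math.expm1(-n_beads / n_sequences)


unique = expected_unique(N_BEADS, diversity)
print(f"theoretical diversity: {diversity:.3g}")
print(f"expected unique members: {unique:.3g} ({unique / N_BEADS:.2%} of beads)")
```

Under this model the sequence space is roughly a thousand times larger than the bead count, so nearly every bead is expected to carry a distinct sequence, which is why the bead count is a reasonable estimate of library size.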

Following removal of the N-terminal Fmoc group, the resin was washed with DMF (150 mL), and a small portion was transferred to a plastic fritted syringe, washed with DCM (~10 mL), and dried under reduced pressure. 1.0 mg of resin was weighed into a plastic tube (theory: 4.7 x 10⁴ beads) and set aside for later characterization (described in 2.4.11). The remainder was resuspended in DMF and pooled back with the bulk of the library.

Portioning:

Resin was suspended in DMF (~50 mL) and divided evenly among 11 x 10 mL fritted plastic syringes. One of these portions of resin was held aside, and the remainder pooled back together (theory: 1.8 x 10⁸ beads). The portion that was held aside was further divided evenly among 11 x 10 mL fritted plastic syringes. One of these portions was in turn held aside (theory: 1.7 x 10⁶ beads), and the remainder was pooled back together (theory: 1.7 x 10⁷ beads). These three portions of resin represent approximate 2 x 10⁸, 2 x 10⁷, and 2 x 10⁶-member libraries. Resins were each washed with DCM (~50 mL) in fritted plastic syringes and dried under reduced pressure.
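
The theoretical bead counts for the three portions follow from two successive 1-in-11 splits. The sketch below reproduces that arithmetic; the function name `hold_aside` is hypothetical, and the starting count of 2 x 10⁸ beads is the theoretical value given for the resin.

```python
def hold_aside(total_beads, n_portions=11):
    """Split evenly into n_portions; return (one held-aside portion, pooled rest)."""
    held = total_beads / n_portions
    return held, total_beads - held


held_1, large = hold_aside(2e8)      # first split of the full library
small, medium = hold_aside(held_1)   # second split of the held-aside portion

print(f"large:  {large:.2g} beads")
print(f"medium: {medium:.2g} beads")
print(f"small:  {small:.2g} beads")
```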

Cleavage from resin:

Libraries were globally deprotected and cleaved from resin by treatment of dry resin with a solution of 94% (v/v) TFA, 2.5% (v/v) ethanedithiol, 2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 3 h at ambient temperature. TFA was then evaporated under a stream of nitrogen, and crude peptide was precipitated by addition of cold diethyl ether. Precipitated peptide was triturated (3x) with cold diethyl ether, dissolved in 30/70 water/acetonitrile (0.1% TFA), passed through a 0.2 μm PTFE syringe filter, and lyophilized.

Solid phase extraction:

The 2 x 10⁸-member library (200 mg) was dissolved in 20 mL of 95/5 water/acetonitrile (0.1% TFA); the entirety of the isolated 2 x 10⁷ and 2 x 10⁶-member libraries were each dissolved in 5 mL of 95/5 water/acetonitrile (0.1% TFA). Libraries were purified over Bond Elut C18 cartridges (1 g bed mass, 40 µm particle size, 6 mL volume; Agilent, P/N 12256130) as follows: cartridges were first conditioned with methanol (~20 mL), and then equilibrated with 99/1 water/acetonitrile (0.1% TFA) (~20 mL). Sample was then loaded, and cartridges were washed with 99/1 water/acetonitrile (0.1% TFA) (~20 mL). Sample was eluted with 30/70 water/acetonitrile (0.1% TFA) (~20 mL). Elutions from each library were collected and lyophilized.

Preparation of stock solutions:

Lyophilized powder of 2 x 10⁸-member library (106 mg) was dissolved in DMF (1.08 mL) and then diluted with 1x PBS (9.76 mL) to bring the final concentration to ~8 mM total peptide (~40 pM/member). Lyophilized powder of 2 x 10⁷-member library (39 mg) was dissolved in DMF (3.96 mL) and diluted with 1x PBS (35.6 mL) to bring the final concentration to ~0.8 mM total peptide (~40 pM/member). Lyophilized powder of 2 x 10⁶-member library was dissolved in DMF (1.09 mL) and diluted with 1x PBS (9.83 mL) to bring the final concentration to ~0.8 mM total peptide (~400 pM/member). Stock solutions were aliquoted and stored at -80 °C. Aliquots were thawed on ice prior to use.
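
The per-member concentrations quoted for the stock solutions follow from dividing the total peptide concentration by the nominal number of library members. A minimal sketch of that conversion is given below; the function name is hypothetical, and equal representation of all members is assumed.

```python
def per_member_pM(total_mM, n_members):
    """Average concentration of each library member, in pM."""
    # 1 mM = 1e9 pM; assumes every member is equally represented.
    return total_mM * 1e9 / n_members


stocks = {
    "2 x 10^8-member": (8.0, 2e8),
    "2 x 10^7-member": (0.8, 2e7),
    "2 x 10^6-member": (0.8, 2e6),
}
for name, (total_mM, n_members) in stocks.items():
    print(f"{name}: ~{per_member_pM(total_mM, n_members):.0f} pM/member")
```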

2.4.11. Characterization of 2 x 10⁶, 2 x 10⁷, and 2 x 10⁸-member (X)9K-CONH2 libraries

Sample preparation:

A 1.0 mg aliquot of library resin (from 2.4.10) was suspended in 1.0 mL of Milli-Q water and sonicated to achieve a homogeneous suspension (theory: 4.7 x 10⁴ beads/mL). A 20 μL aliquot (theory: 940 beads; 4 pmol/peptide) was transferred to a plastic tube, spun down, and supernatant removed. Beads were then subjected to treatment with 100 μL of 94% (v/v) TFA, 2.5% (v/v) ethanedithiol, 2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 10 min in a 60 °C water bath. TFA was then evaporated under a stream of nitrogen, and cleaved peptide was resuspended in Milli-Q water (0.1% TFA). Sample was purified over a C18 ZipTip® (0.6 μL, MilliporeSigma, P/N ZTC18S096), eluted in 30/70 water/acetonitrile (0.1% TFA), and lyophilized. Powder was resuspended in 20 μL of Milli-Q water (0.1% FA), and 0.5 μL (~100 fmol/peptide) was submitted for nLC-MS/MS analysis.

NanoLC-MS/MS analysis:

Details of the columns and instruments used for analysis are provided in 2.4.6. The standard nano-LC method was run at 40 °C and a flow rate of 300 nL/min with the following gradient: 1% solvent B in solvent A ramping linearly to 41% B in A over 120 min, where solvent A = water (0.1% FA), and solvent B = 80% acetonitrile, 20% water (0.1% FA). Positive ion spray voltage was set to 2200 V. Orbitrap detection was used for primary MS with the following parameters: resolution = 120,000; quadrupole isolation; scan range = 200-1400 m/z; RF lens = 30%; AGC target = 1 x 10⁶; maximum injection time = 100 ms; 1 microscan.

Acquisition of secondary MS spectra was done in a data-dependent manner: dynamic exclusion was employed such that a precursor was excluded for 30 s if it was detected four or more times within 30 s (mass tolerance: 10.00 ppm); monoisotopic precursor selection was used to select for peptides; intensity threshold was set to 5 x 10⁴; charge states 2-10 were selected; and precursor selection range was set to 200-1400 m/z. The top 15 most intense precursors that met the preceding criteria were subjected to subsequent fragmentation.

Three fragmentation modes – collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), and electron-transfer/higher-energy collisional dissociation (EThcD) – were used for acquisition of secondary MS spectra. Only precursors with charge states 3 and above were subjected to all three fragmentation modes; precursors with charge states of 2 were subjected to CID and HCD only. For all three modes, detection was performed in the Orbitrap (resolution = 30,000; quadrupole isolation; isolation window = 1.3 m/z; AGC target = 2 x 10⁴; maximum injection time = 100 ms; 1 microscan). For CID, a collision energy of 30% was used. For HCD, a collision energy of 25% was used. For EThcD, a supplemental activation collision energy of 25% was used.
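
The data-dependent selection rules described in this section can be summarized in a short sketch. It is a simplified illustration and not the instrument's acquisition logic: the `Precursor` class and function names are hypothetical, dynamic exclusion and monoisotopic precursor selection are omitted, and the example precursors are invented.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float
    charge: int
    intensity: float


def select_precursors(precursors, top_n=15, min_intensity=5e4,
                      charges=range(2, 11), mz_range=(200.0, 1400.0)):
    """Keep precursors passing the intensity, charge, and m/z filters; return the top_n."""
    eligible = [
        p for p in precursors
        if p.intensity >= min_intensity
        and p.charge in charges
        and mz_range[0] <= p.mz <= mz_range[1]
    ]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return eligible[:top_n]


def fragmentation_modes(charge):
    """Charge states 3 and above get all three modes; charge state 2 gets CID and HCD."""
    return ("CID", "HCD", "EThcD") if charge >= 3 else ("CID", "HCD")


# Hypothetical survey scan.
scan = [
    Precursor(500.0, 2, 1e6),    # passes all filters
    Precursor(600.0, 1, 5e6),    # rejected: charge state 1
    Precursor(700.0, 3, 4e4),    # rejected: below intensity threshold
    Precursor(1500.0, 2, 1e7),   # rejected: outside m/z range
    Precursor(450.0, 3, 2e5),    # passes all filters
]
for p in select_precursors(scan):
    print(p.mz, p.charge, fragmentation_modes(p.charge))
```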

De novo peptide sequencing:

De novo peptide sequencing was performed by processing .raw files obtained from Orbitrap analysis using PEAKS Studio (version 8.5) from Bioinformatics Solutions Inc. (ON, Canada). HCD and CID scans were merged within a 0.2 minute and 0.02 Da window, mass precursor correction was used, and primary mass filtration was employed as appropriate. Auto de novo sequencing was performed using a 15 ppm precursor mass error and 0.02 Da fragment mass error, and with the following modifications: fixed C-term amidation (-0.98 Da) on lysine, and variable oxidation on methionine (+15.99 Da). 15 candidate sequences were obtained for each preprocessed scan. Post-de novo data analysis was performed as described in Vinogradov, A. V. et al. Library design-facilitated high-throughput sequencing of synthetic peptide libraries. ACS Comb. Sci. 19, 694-701 (2017).30
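
For readers less familiar with ppm-based tolerances, the 15 ppm precursor mass error can be expressed as a simple check. This sketch is generic arithmetic and does not represent how PEAKS Studio applies the tolerance internally; the function names and example m/z values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=15.0):
    """True if the observed value lies within tol_ppm of the theoretical value."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical precursor: 0.005 m/z away from a theoretical value of 500.000.
print(f"{ppm_error(500.005, 500.0):.1f} ppm")
print(within_tolerance(500.005, 500.0))
```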

2.4.12. Affinity selections of 2 x 10⁶, 2 x 10⁷, and 2 x 10⁸-member libraries against 12ca5

Preparation of 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (300 μL of 10 mg/mL stock) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL w/ 10% FBS, 0.02% Tween 20, 1x PBS, and then treated with 300 μL of biotinylated 12ca5 (1.5 μM; 0.45 nmol). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 3 x 1 mL w/ 10% FBS, 0.02% Tween 20, 1x PBS. Beads were resuspended in 300 μL of 10% FBS, 0.02% Tween 20, 1x PBS.

Affinity capture:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.

Elution:

The centrifuge tubes containing the bead suspensions were transferred to the magnetic separation rack. The beads were washed 3 x 1 mL w/ 1x PBS. Bound peptides were eluted with 2 x 100 μL 6M guanidine hydrochloride, 200 mM phosphate, pH 6.8. Eluates were concentrated via C18 ZipTip® pipette tips (as described in 2.4.9) and lyophilized.

NanoLC-MS/MS:

Powders were resuspended in 6 μL water (0.1% formic acid), and 5 μL submitted for nLC-MS/MS analysis. Analysis was performed as described in 2.4.11.

2.4.13. SPPS of HA epitope Ala scan mutants

Peptides were synthesized on a fully automated fast-flow peptide synthesizer as described in 2.4.2. Concomitant side chain deprotection and cleavage from resin, as well as HPLC purification and LC-MS analysis, were also carried out as described in 2.4.2.

2.4.14. Competition fluorescence polarization of HA epitope Ala scan mutants

Competition fluorescence polarization experiments were carried out as described in 2.4.3.

2.4.15. Investigation of enrichment vs. sample loading

MyOne Streptavidin T1 Dynabeads (3 mg) were washed 3 x 1 mL w/ 10% FBS/1x PBS. Beads were then incubated with biotinylated 12ca5 (0.13 nmol of protein per 1 mg of magnetic beads) for 1 h at 4 °C, and washed 3 x 1 mL w/ 10% FBS/1x PBS. For each replicate (3 total): library (200 fmol/member) was incubated with 1 mg protein-immobilized magnetic beads in the presence of 10% FBS/1x PBS (final volume: 1 mL) for 1 h at 4 °C. Final screening conditions: 100 nM protein, 1 mg/mL magnetic beads, 200 pM/member library. Beads were washed 3 x 1 mL w/ 1x PBS. Bound peptides were eluted with 2 x 100 μL 6 M Guan·HCl, 200 mM phosphate, pH 6.8. Eluates were concentrated via ZipTip and lyophilized. Powders were resuspended in 13.3 μL of water (0.1% FA), and 6.3 μL was removed for later analysis (theory: 15 fmol library member/μL). The remaining 7 μL was diluted into 14 μL of water (0.1% FA), and 10 μL was removed for later analysis (theory: 5 fmol library member/μL). Finally, the remaining 11 μL was diluted into 22 μL of water (0.1% FA), and 10 μL was removed for later analysis (theory: 1.7 fmol library member/μL). 5 μL of each dilution was submitted for nLC-MS/MS analysis, giving theoretical injection amounts (assuming 100% recovery of peptide) of 75 fmol, 25 fmol, and 8.3 fmol library member, respectively.
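
The theoretical concentrations and injection amounts in this section can be reproduced with a few lines of arithmetic, assuming 100% recovery of each library member as stated. The helper name `dilute` is hypothetical. Values are listed in the order the dilutions were prepared (most concentrated first).

```python
def dilute(conc, volume_ul, added_ul):
    """Concentration after adding added_ul of solvent to volume_ul of solution."""
    return conc * volume_ul / (volume_ul + added_ul)


INJECTION_UL = 5.0

c1 = 200.0 / 13.3          # 200 fmol/member resuspended in 13.3 uL
c2 = dilute(c1, 7.0, 14.0)   # remaining 7 uL diluted into 14 uL
c3 = dilute(c2, 11.0, 22.0)  # remaining 11 uL diluted into 22 uL

concentrations = [c1, c2, c3]                          # fmol/uL per member
injections = [INJECTION_UL * c for c in concentrations]  # fmol per member

for conc, amount in zip(concentrations, injections):
    print(f"{conc:.1f} fmol/uL -> {amount:.1f} fmol injected")
```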

2.4.16. Preparation of a 1 x 10⁹-member (X)9K-CONH2 library

Library design: (X)9K-CONH2

SPPS:

5.4 g of 20 μm TentaGel resin (0.26 mmol/g, 1.4 mmol, 1.3 x 10⁹ beads) was transferred to a 100 mL peptide synthesis vessel, swollen in DMF, and then washed with DMF (3x). Fmoc-Rink amide linker (3.8 g, 7.0 mmol, 5 eq) was dissolved in HATU solution (0.38 M in DMF, 16.7 mL, 6.4 mmol), activated with DIEA (3.5 mL, 20 mmol) immediately prior to coupling, and added to resin bed. Coupling was performed for 20 min; after this time, resin was washed with DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments). Resin was then washed with DMF (150 mL). Coupling of Fmoc-Lys(Boc)-OH, subsequent Fmoc removal, and DMF washes were performed in the same manner.

At this stage, resin was suspended in DMF (50 mL), and divided evenly among 18 x 10 mL fritted plastic syringes using a 5 mL Eppendorf pipette. Couplings were performed as follows: Fmoc-protected amino acids (0.8 mmol) in HATU solution (0.38M, 1.86 mL, 0.71 mmol) were activated with DIEA (391 μL, 2.3 mmol). Each of the amino acid derivatives listed in 2.4.10 was added to a single portion of resin (theory: ~330 mg resin, 80 μmol). Couplings were performed for 20 min. Remainder of split-and-pool synthesis (nine rounds total) was completed according to the procedure outlined in 2.4.10.

Following removal of the N-terminal Fmoc group, the resin was washed with DMF (150 mL), and a small portion was transferred to a plastic fritted syringe, washed with DCM (~10 mL), and dried under reduced pressure. 1.0 mg of resin was weighed into a plastic tube (theory: 1.6 x 10⁵ beads) and set aside for later characterization (described in 2.4.11). The remainder was resuspended in DMF and pooled back with the bulk of the library.

Portioning:

Resin was suspended in DMF (~50 mL) and divided evenly among 11 x 10 mL fritted plastic syringes. One of these portions of resin was held aside (theory: 1.2 x 10⁸ beads), and the remainder pooled back together (theory: 1.2 x 10⁹ beads). These two portions of resin represent approximate 1 x 10⁹ and 1 x 10⁸-member libraries. Resins were each washed with DCM (~50 mL) in fritted plastic syringes and dried under reduced pressure.

Cleavage from resin and solid phase extraction:

Libraries were globally deprotected and cleaved from resin as described in 2.4.10. Crude, lyophilized powders were resuspended in 95/5 water/acetonitrile (0.1% TFA), and purified over Supelclean™ LC-18 SPE cartridges (2 g bed mass, 45 μm particle size, 12 mL; MilliporeSigma, P/N 57117). The procedure is described in 2.4.10.

Preparation of stock solutions:

Lyophilized powder of 10⁹-member library (127 mg) was dissolved in DMF (1.3 mL) and then diluted with 1x PBS (11.7 mL) to a final concentration of 8 mM total peptide (~6 pM/member). Lyophilized powder of 10⁸-member library (110 mg) was dissolved in DMF (1.2 mL) and diluted with 1x PBS (10.1 mL) to a final concentration of 8 mM total peptide (~60 pM/member). Stock solutions were aliquoted and stored at -80 °C. Aliquots were thawed on ice prior to use.

2.4.17. Characterization of a 1 x 10⁹-member (X)9K-CONH2 library

Sample preparation:

A 1.0 mg aliquot of library resin (from 2.4.16) was suspended in 1.0 mL of Milli-Q water and sonicated to achieve a homogeneous suspension (theory: 1.6 x 10⁵ beads/mL). A 5 μL aliquot (theory: 805 beads; 1 pmol/peptide) was transferred to a plastic tube, spun down, and supernatant removed. Beads were then subjected to treatment with 94% (v/v) TFA, 2.5% (v/v) ethanedithiol, 2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 10 min in a 60 °C water bath. TFA was then evaporated under a stream of nitrogen, and cleaved peptide was resuspended in Milli-Q water (0.1% TFA). Sample was purified over a C18 ZipTip® (0.6 μL, MilliporeSigma, P/N ZTC18S096), eluted in 30/70 water/acetonitrile (0.1% TFA), and lyophilized. Powder was resuspended in 13 μL of Milli-Q water (0.1% FA), and 1 μL (~100 fmol/peptide) was submitted for nLC-MS/MS analysis.

NanoLC-MS/MS analysis and de novo peptide sequencing:

Analysis and de novo sequencing were performed as described in 2.4.11.

2.4.18. Affinity selections of 10⁸ and 10⁹-member libraries against 12ca5: effect of library diversity

Preparation of 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads were functionalized with biotinylated 12ca5 as described in 2.4.12.

Affinity capture:

10⁹-member library (2 fmol/member), 10⁸-member portion (10 fmol/member), or 10⁸-member library from 2.4.10 (2 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 2 pM/member or 10 pM/member library.
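
The volume of library stock needed to deliver a given amount per member follows directly from the approximate per-member stock concentrations reported in 2.4.10 and 2.4.16. The sketch below shows that conversion; the function name is hypothetical, and because the stock concentrations are approximate, the resulting volumes are estimates rather than values taken from the text.

```python
def stock_volume_ul(amount_fmol, stock_pM):
    """Volume of stock (uL) that contains amount_fmol of each library member."""
    # fmol / pM = 1e-15 mol / (1e-12 mol/L) = 1e-3 L, i.e. mL; x1000 gives uL.
    return amount_fmol / stock_pM * 1000.0


conditions = [
    ("10^9-member library, 2 fmol/member", 2.0, 6.0),
    ("10^8-member portion, 10 fmol/member", 10.0, 60.0),
    ("10^8-member library (2.4.10), 2 fmol/member", 2.0, 40.0),
]
for name, amount_fmol, stock_pM in conditions:
    print(f"{name}: ~{stock_volume_ul(amount_fmol, stock_pM):.0f} uL of stock")
```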

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

2.4.19. Affinity selections of 10⁸ and 10⁹-member libraries against 12ca5: effect of increased starting material of library

Preparation of 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads were functionalized with biotinylated 12ca5 as described in 2.4.12.

Affinity capture:

10⁹-member library (1 fmol/member, 5 fmol/member, or 3 x 4 fmol/member (= 12 fmol/member total)) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads; variable concentration of library members.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. For the 12 fmol/member condition, eluates from each 4 fmol/member selection were combined and concentrated as described in 2.4.9. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.


2.4.20. Affinity selections of 10^8- and 10^9-member libraries against 12ca5: effect of increased starting material of selection target

Preparation of 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (1.1 mL of 10 mg/mL stock) were transferred to a 1.7 mL plastic centrifuge tube, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL w/ 10% FBS, 1x PBS, and then treated with 1.0 mL of biotinylated 12ca5 (1.7 μM; 1.7 nmol). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4°C. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 3 x 1 mL w/ 10% FBS, 1x PBS. Beads were resuspended in 300 μL of 10% FBS, 1x PBS.

Affinity capture:

10^9-member library (2 fmol/member) was incubated with either 100 μL (1 mg; 0.13 nmol IgG binding capacity) of 12ca5-immobilized magnetic beads, or 1 mL (10 mg; 1.3 nmol IgG binding capacity) of 12ca5-immobilized magnetic beads, in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads; 2 pM/member library.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

2.4.21. Affinity selections of 10^8- and 10^9-member libraries against 12ca5: effect of exogenous competitor on selections from a 10^8-member library

Preparation of 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads were functionalized with biotinylated 12ca5 as described in 2.4.12.

Affinity capture:

10^8-member library (10 fmol/member) was incubated with 100 μL (1 mg; 0.13 nmol IgG binding capacity) of 12ca5-immobilized magnetic beads, in the presence of either 1 nM, 10 nM, or 100 nM HA epitope, or 100 nM, 1 µM, or 10 µM Gypyeydwe peptide, and 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads; 10 pM/member library.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

2.4.22. Side-by-side selections against 12ca5 and human polyclonal IgG1

Preparation of 12ca5-functionalized and human IgG1-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (2 x 300 μL of 10 mg/mL stock) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL w/ 10% FBS, 0.02% Tween 20, 1x PBS, and then treated with 300 μL of biotinylated 12ca5 (1.5 μM; 0.45 nmol) or 120 μL of biotinylated human polyclonal IgG1 (3.8 μM; 0.45 nmol; diluted to 300 μL with 10% FBS, 0.02% Tween 20, 1x PBS). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 3 x 1 mL w/ 10% FBS, 0.02% Tween 20, 1x PBS. Beads were resuspended in 300 μL of 10% FBS, 0.02% Tween 20, 1x PBS.

Affinity capture:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

2.5. Acknowledgements

This work was supported by the NIH/NIGMS Interdepartmental Biotechnology Training Program (T32 GM008334 to A.J.Q.), the Defense Advanced Research Projects Agency (DARPA; Award 023504-001 to B.L.P.), and Calico (to B.L.P.). We gratefully acknowledge Anne Fischer and Tyler Stukenbroeker (DARPA) for their support and guidance; Earl Moore, Mark Paul, Louis Abruzzese, and David Sarracino for their technical assistance with nanoLC and Orbitrap mass spectrometry; and Eric Spooner, Marko Jovanovic, and Dan Maloney for their discussions regarding MS-based analysis and sequencing. We also thank Zak Gates for assistance in pulldown experiments, Suan Tuang for assistance with automated peptide synthesis, and Faycal Touti, Ethan Evans, and Alex Vinogradov for many fruitful scientific discussions.

2.6. Appendix

[Figure: LC-MS trace. Amino acid sequence: YPYDVPDYA-CONH2. Labeled m/z peaks: 1101.49, 551.25. Axes: m/z; retention time (0-16 min).]

Figure 2.6.1. LC-MS characterization of the native HA epitope. Monoisotopic mass: 1100.47 Da; found: 1100.48 Da.

[Figure: LC-MS trace. Amino acid sequence: FDYEDYAEWK-CONH2. Labeled m/z peaks: 682.79, 455.53. Axes: m/z; retention time (0-16 min).]

Figure 2.6.2. LC-MS characterization of FDYEDYAEWK. Monoisotopic mass: 1363.56 Da; found: 1363.57 Da. This peptide was identified from early affinity selection experiments.

[Figure: LC-MS trace. Amino acid sequence: YPYDVPDYG-CONH2. Labeled m/z peaks: 544.24, 1087.47. Axes: m/z; retention time (0-16 min).]

Figure 2.6.3. LC-MS characterization of HA epitope mutant Ala9Gly. Monoisotopic mass: 1086.46 Da; found: 1087.47 Da.


[Figure: LC-MS trace. Amino acid sequence: Gypyeydwe-CONH2. Labeled m/z peaks: 610.75, 1220.49. Axes: m/z; retention time (0-16 min).]

Figure 2.6.4. LC-MS characterization of Gypyeydwe. Monoisotopic mass: 1219.47 Da; found: 1219.48 Da.

[Figure: LC-MS trace. Amino acid sequence: ypyefdeph-CONH2. Labeled m/z peaks: 598.26, 1195.51. Axes: m/z; retention time (0-16 min).]

Figure 2.6.5. LC-MS characterization of ypyefdeph. Monoisotopic mass: 1194.49 Da; found: 1194.50 Da.


[Figure: BioLayer Interferometry traces at 1.5 µM, 370 nM, and 90 nM 12ca5. kon = 4.1 x 10^4 M^-1 s^-1; koff = 3.0 x 10^-3 s^-1; KD = 74 nM.]

Figure 2.6.6. Binding characterization of FDYEDYAEWK. FDYEDYAEWKK(biotin) binds to 12ca5 with a kon of 4.1 x 10^4 M^-1 s^-1 and a koff of 3.0 x 10^-3 s^-1, as measured by BioLayer Interferometry.
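For a simple 1:1 binding model, the equilibrium dissociation constant follows from the two kinetic rate constants as KD = koff/kon. The short Python sketch below applies that relationship to the rounded rate constants reported for FDYEDYAEWKK(biotin); the function name is a hypothetical helper, and the 1:1 model is an assumption, since the fitting model is not stated here.

```python
# K_D from kinetic rate constants, assuming a simple 1:1 binding model.

def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D in molar units from k_on (M^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on


k_on = 4.1e4    # association rate constant, M^-1 s^-1 (Figure 2.6.6)
k_off = 3.0e-3  # dissociation rate constant, s^-1 (Figure 2.6.6)

kd_nM = dissociation_constant(k_on, k_off) * 1e9
print(f"K_D = {kd_nM:.1f} nM")
```

The rounded rate constants give a value of about 73 nM, in line with the reported KD of 74 nM; the small difference presumably reflects rounding of the fitted constants.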


Figure 2.6.7. MS calibration curves for 12ca5-binding peptides. A plot of MS signal as a function of sample loading illustrates the ~5-fold variation in signal magnitude between 5 different anti-HA binders. For each peptide, a 10-fold change in sample loading corresponded to a 10-fold change in signal. Therefore, for a given peptide, the ratio of signals in two samples corresponds to their relative concentrations.


Figure 2.6.8. Magnetic bead capture efficiently recovers high-affinity 12ca5 ligands. Recoveries of 12ca5-binding peptides obtained by affinity selection from solutions containing either 1 nM or 10 pM/peptide of starting mixture. Of the 5 peptides examined (Table 2.6.1), only the 2 highest-affinity binders were significantly retained. Error bars correspond to the standard deviations in recovery obtained by 3 technical replicates.

Figure 2.6.9. High-affinity 12ca5-binding peptides are efficiently recovered with ‘direct’ capture at high dilution. Pulldowns were performed using 100 nM 12ca5 and the indicated ligand concentrations. Recoveries were determined by nLC-MS analysis of the peptide mixtures obtained by ‘direct’ or ‘indirect’ capture, relative to a reference analysis (the amount of material corresponding to 100% retention). Raw ion counts were normalized to an internal standard to account for run-to-run variability in MS response.
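The recovery calculation described in this caption (raw ion counts normalized to an internal standard, then compared with a reference analysis representing 100% retention) can be sketched as follows. This is a minimal illustration only: the function name and the ion counts are hypothetical and are not measured values from this work.

```python
# Percent recovery with internal-standard normalization (illustrative sketch).

def percent_recovery(sample_counts: float, sample_standard_counts: float,
                     reference_counts: float, reference_standard_counts: float) -> float:
    """Recovery (%) of a peptide relative to a reference analysis.

    Each raw ion count is divided by the internal-standard count from the
    same run, which corrects for run-to-run variability in MS response.
    """
    normalized_sample = sample_counts / sample_standard_counts
    normalized_reference = reference_counts / reference_standard_counts
    return 100.0 * normalized_sample / normalized_reference


# Hypothetical ion counts, for illustration only.
recovery = percent_recovery(4.0e5, 1.0e6, 1.0e6, 2.0e6)
print(f"Recovery: {recovery:.0f}%")
```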


Figure 2.6.10. Increasing magnetic bead concentration does not significantly improve recovery of high-affinity binders. For lower-affinity binders, recovery is only slightly improved. Counts of the reference PRTC ion (m/z = 422.74) are indicated below each condition.

Figure 2.6.11. Detection of recovered 12ca5 binders from 1 pM concentration is enabled by post-pulldown concentration of the eluate. Efficient recovery and robust MS signals were obtained from as little as 1 pM of 12ca5 binders in a 1 mL volume.


Figure 2.6.12. nLC-MS/MS characterization of a (X)9K-CONH2 library. Analysis of this library, synthesized on 2 x 10^8 beads of 30 μm TentaGel resin, identifies 406 individual peptide sequences with an average local confidence (ALC) score ≥ 80 from a theory of 940 beads cleaved. A positional frequency plot based on these 406 sequences is shown.

[Figure: LC-MS trace. Sequence: APYDVPDYA-CONH2. Labeled m/z peaks: 505.24, 1009.46. Axes: m/z; retention time (0-16 min).]

Figure 2.6.13. LC-MS characterization of HA epitope Tyr1Ala mutant. Monoisotopic mass: 1008.44 Da; found: 1008.46 Da.

[Figure: LC-MS trace. Sequence: YAYDVPDYA-CONH2. Labeled m/z peaks: 538.24, 1075.47. Axes: m/z; retention time (0-16 min).]

Figure 2.6.14. LC-MS characterization of HA epitope Pro2Ala mutant. Monoisotopic mass: 1074.46 Da; found: 1075.47 Da.

[Figure: LC-MS trace. Sequence: YPADVPDYA-CONH2. Labeled m/z peaks: 1009.46, 505.24. Axes: m/z; retention time (0-16 min).]

Figure 2.6.15. LC-MS characterization of HA epitope Tyr3Ala mutant. Monoisotopic mass: 1008.44 Da; found: 1008.46 Da.

[Figure: LC-MS trace. Sequence: YPYAVPDYA-CONH2. Labeled m/z peaks: 529.25, 1057.50. Axes: m/z; retention time (0-16 min).]

Figure 2.6.16. LC-MS characterization of HA epitope Asp4Ala mutant. Monoisotopic mass: 1056.48 Da; found: 1056.49 Da.

[Figure: LC-MS trace. Sequence: YPYDAPDYA-CONH2. Labeled m/z peaks: 1073.46, 537.23. Axes: m/z; retention time (0-16 min).]

Figure 2.6.17. LC-MS characterization of HA epitope Val5Ala mutant. Monoisotopic mass: 1072.44 Da; found: 1072.45 Da.

[Figure: LC-MS trace. Sequence: YPYDVADYA-CONH2. Labeled m/z peaks: 538.24, 1075.47. Axes: m/z; retention time (0-16 min).]

Figure 2.6.18. LC-MS characterization of HA epitope Pro6Ala mutant. Monoisotopic mass: 1074.46 Da; found: 1074.47 Da.

[Figure: LC-MS trace. Sequence: YPYDVPAYA-CONH2. Labeled m/z peaks: 1057.50, 529.25. Axes: m/z; retention time (0-16 min).]

Figure 2.6.19. LC-MS characterization of HA epitope Asp7Ala mutant. Monoisotopic mass: 1056.48 Da; found: 1056.49 Da.

[Figure: LC-MS trace. Sequence: YPYDVPDAA-CONH2. Labeled m/z peaks: 1009.46, 505.24. Axes: m/z; retention time (0-16 min).]

Figure 2.6.20. LC-MS characterization of HA epitope Tyr8Ala mutant. Monoisotopic mass: 1008.44 Da; found: 1008.46 Da.

LC-MS characterization of HA epitope mutant Ala9Gly is provided in Figure 2.6.3.

[Figure: Competition fluorescence polarization of HA tag alanine scan mutants. Curves: APYDVPDYA, YAYDVPDYA, YPADVPDYA, YPYAVPDYA, YPYDAPDYA, YPYDVADYA, YPYDVPAYA, YPYDVPDAA, YPYDVPDYG, YPYDVPDYA. Axes: polarization (mP) vs. competitor concentration (M), 10^-9 to 10^-3.]

Figure 2.6.21. Competition fluorescence polarization of HA epitope Ala scan mutants confirms Asp4, Asp7, Tyr8, and Ala9 as ‘hot spot’ residues. Mutations to Asp7 and Tyr8 completely abrogated affinity towards 12ca5, while mutations to Asp4 and Ala9 were deleterious but did not completely abolish binding. Mutations elsewhere had a lesser effect on affinity.


[Figure: Number of sequences (total identified and motif-containing) vs. theoretical injection (fmol).]

Figure 2.6.22. Enrichment decreases at increased sample loadings in a selection for 12ca5-binding performed at 200 pM/member. At the highest injection amount analyzed, two additional DXXDY(A/S)-containing sequences were identified, along with an average of 18 additional background sequences. Error bars correspond to one standard deviation of three replicate experiments.

Figure 2.6.23. nLC-MS/MS characterization of a (X)9K-CONH2 library. Analysis of this library, synthesized on 1.3 x 10^9 beads of 20 μm TentaGel resin, identifies 1471 individual peptide sequences with an ALC score ≥ 80 from a theory of 805 beads cleaved. A positional frequency plot based on these 1470 sequences is shown.


[Figure: Extracted ion chromatograms (intensity vs. RT, min) from 12ca5 and human IgG1 selections for DYHDSFHWYK (m/z 466.20-466.21), VFQDWEDFSK (m/z 650.29-650.31), and YMDTVDFSEK (m/z 617.27-617.28).]

Figure 2.6.24. Parallel selections enable distinction of specific binders. Shown are extracted ion chromatograms (EICs) for a subset of peptides sequenced in 12ca5 selections but not IgG1 selections. These peptides, all of which contain either DXXDFS or DXXDSF, may be 12ca5 binders but were possibly mis-sequenced. Because the dipeptide masses of ‘FS’ and ‘YA’ are identical, incomplete fragmentation could have led to erroneous sequence assignments in these cases. The EICs indicate that these peptides were enriched in 12ca5 selections, suggesting that they could in fact be 12ca5-binding peptides.
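The statement that the ‘FS’ and ‘YA’ dipeptide masses are identical can be checked directly from standard monoisotopic residue masses: Tyr differs from Phe by one oxygen, and Ser differs from Ala by one oxygen, so the two pairs share the same elemental composition. The Python sketch below performs this check; the helper function name is illustrative.

```python
# Monoisotopic residue masses (Da) for the four amino acids involved.
RESIDUE_MASS = {
    "F": 147.06841,  # phenylalanine
    "S": 87.03203,   # serine
    "Y": 163.06333,  # tyrosine
    "A": 71.03711,   # alanine
}


def dipeptide_mass(pair: str) -> float:
    """Summed residue mass of a two-residue stretch within a peptide."""
    return sum(RESIDUE_MASS[residue] for residue in pair)


fs = dipeptide_mass("FS")
ya = dipeptide_mass("YA")
print(f"FS: {fs:.4f} Da, YA: {ya:.4f} Da, difference: {abs(fs - ya):.4f} Da")
```

Because the two sums agree, fragment ions that bracket the dipeptide without cleaving between its two residues cannot distinguish DXXDFS from DXXDYA.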

[Figure: Extracted ion chromatograms (intensity vs. RT, min) from 12ca5 and human IgG1 selections for ETDWELLVFK (m/z 639.83-639.84), WYGLYFTYFK (m/z 693.84-693.85), FGFGQATTLK (m/z 534.79-534.80), VWDWWLYFRK (m/z 499.92-499.93), WSEQTTNLPK (m/z 601.80-601.81), FHWVYFYAYK (m/z 711.85-711.86), FLPFLTWAWK (m/z 654.36-654.37), FPFWYWMLFK (m/z 732.36-732.37), and WFWVFPYHFK (m/z 728.36-728.37).]

Figure 2.6.25. Parallel selections identify non-specific binders. Shown are EICs for peptides that were sequenced in 12ca5 selections, but not IgG1 selections, and which do not bear resemblance to the HA epitope. The presence of these peptides is detected in both 12ca5 and IgG1 selections, indicating that these binders are non-specific, despite only having been sequenced from 12ca5 selections.

Table 2.6.1. Characterization of 12ca5-binding peptides by competition fluorescence polarization.

Binder      Source              IC50           KD
YPYDVPDYA   HA epitope          85 ± 7 nM      3.9 ± 1 nM
FDYEDYAEWK  Affinity selection  590 ± 190 nM   25 ± 13 nM
YPYDVPDYG   Ala9Gly mutant      3.6 ± 1 μM     170 ± 70 nM
Gypyeydwe   On-bead screen      62 ± 8 μM      3.0 ± 0.85 μM
ypyefdeph   On-bead screen      380 ± 25 μM    18 ± 4 μM

Model binders include the native HA epitope, a peptide identified from early affinity selection experiments, an Ala9Gly mutant of the native HA epitope, and sequences derived from the loop of 12ca5 xenoprotein binders identified in Gates, Z. P. et al. Xenoprotein engineering via synthetic libraries. Proc. Natl. Acad. Sci. U.S.A. 115, E5298-E5306 (2018) (ref 24).

Table 2.6.2. Affinity-capture mass spectrometry identifies 12ca5-binding sequences in proportion with library size.

Peptide     ALC (%)  m/z       z  RT      Mass       ppm

Library size: 2 x 10^6
MNDLVDYADK  99       599.7725  2  43.53   1197.5337  -2.7

Library size: 2 x 10^7
YGMGFVVRMK  99       406.8743  3  27.04   1217.605   -3.2
SFDKHDYAFK  99       419.539   3  27.38   1255.5986  -2.7
PVDLKDYAEK  99       392.8788  3  37.51   1175.6187  -3.6
PDLMDYAHFK  99       417.8641  3  48.87   1250.5754  -4.0
EDLQDYAEFK  99       628.7915  2  57.68   1255.572   -2.8
FDLADYATAK  99       557.2808  2  58.35   1112.5503  -3.0
SFAMFSFFMK  99       637.289   2  74.14   1272.5671  -2.9
PGMFWMWLWK  99       706.8334  2  95.67   1411.657   -3.3
WLSFYTWLMK  99       695.3535  2  96.36   1388.6951  -2.0
WPWGYLYFFK  99       703.3566  2  102.96  1404.7019  -2.3
FYSFPWYWWK  98       754.8597  2  102.38  1507.7078  -1.9
FFLDVMDFWK  97       681.8289  2  100.71  1361.6477  -3.2
FDLEDYANEK  96       621.7835  2  57.45   1241.5564  -3.2
WYPFFFYMWK  96       765.3535  2  101.75  1528.7002  -5.1
TGDLADYAVK  94       526.275   2  42.13   1050.5347  0.8
WWFYYLLNNK  94       723.3702  2  97.58   1444.7292  -2.3
YFQPLMLWWK  93       713.8692  2  97.4    1425.7268  -2.1
FFNVWWFRLK  92       481.2654  3  92.39   1440.782   -5.3
FSDLEDYSTK  91       602.278   2  44.9    1202.5454  -3.4
MGNWYFYWLK  89       711.8342  2  89.89   1421.6592  -3.8
WMMFYWNLFK  89       748.8447  2  96      1495.678   -2.1
PVDWPDYAAK  88       580.7887  2  52.13   1159.5662  -2.9
PVDWDPAYAK  88       580.7891  2  52.75   1159.5662  -2.2
WWFQYLNMFK  87       739.3563  2  95.78   1476.7012  -2.1
NLWVFYSYWK  86       702.8549  2  94.76   1403.7026  -5.2
MMDWPDYAVK  84       643.7786  2  51.32   1285.5471  -3.4
SDLQDYSFMK  83       624.7806  2  46.82   1247.5493  -2.2
FWQWWLQQFK  83       748.381   2  96.67   1494.7561  -5.7
DYDRYADTFK  82       431.5335  3  51.22   1291.5833  -3.5
WFLLWNDWWK  82       498.2538  3  88.27   1491.7451  -3.8
DHWTFADLWK  81       439.8813  3  45.57   1316.6301  -6.1

Library size: 2 x 10^8
MDMEDYAAMK  99       626.2372  2  21.55   1250.4617  -1.5
MADLMDYAGK  99       573.2504  2  33.26   1144.4893  -2.6
GMDVVDYAAK  99       542.2585  2  35.34   1082.5066  -3.7
PDLEDYAVKK  99       392.8791  3  35.98   1175.6187  -2.8
LEDVGDYAAK  99       540.2714  2  36.97   1078.5295  -1.3
PADLPDYAQK  99       558.7861  2  37.23   1115.561   -3.0
FDMQDYAYMK  99       671.7743  2  38.65   1341.5369  -2.1
TSDLEDYAMK  99       594.2643  2  40.24   1186.5176  -3.0
ATDMVDYAFK  99       588.2723  2  41.38   1174.533   -2.4
LGDLMDYSEK  99       593.2747  2  41.38   1184.5383  -3.0
ADLQDYAAVK  99       546.7863  2  43.23   1091.561   -2.8
VFDLHDYASK  99       398.5351  3  45.44   1192.5876  -3.4
WSDVEDYAAK  99       591.7742  2  46.92   1181.5352  -1.2
MEDFVDYADK  99       624.2651  2  49      1246.5176  -1.6
YEDLEDYAAK  99       608.279   2  49.59   1214.5454  -1.6
SPDLEDYAYK  99       600.2814  2  50.67   1198.5505  -1.9
ALQDLQDYAK  99       582.3048  2  51.28   1162.5981  -2.6
FMMDLEDYAK  99       647.2767  2  52.3    1292.5417  -2.2
LDLPDYAFHK  99       406.5477  3  53.29   1216.624   -2.3
ADWRDYADLK  99       417.8743  3  54.23   1250.6042  -2.6
WFDTTDYADK  99       630.7792  2  54.68   1259.5459  -1.7
LTDLQDYSFK  99       614.8128  2  55.26   1227.6135  -2.0
YEDVLDYADK  99       615.2884  2  56.05   1228.561   1.0
LNDLPDYSFK  99       605.8074  2  56.54   1209.603   -2.2
YFDVEDYAAK  99       610.2838  2  57.97   1218.5557  -2.1
PWDVQDYAYK  99       642.3043  2  58.06   1282.5981  -3.2
FADWEDYANK  99       629.2789  2  58.03   1256.5461  -2.4
LMDLEDYAFK  99       630.3009  2  62.3    1258.5903  -2.4
VDLEDYANFK  99       606.7968  2  62.97   1211.5823  -2.6
FFDVQDYAYK  99       647.8066  2  65.85   1293.603   -3.3
FFDLEDYSVK  99       631.3069  2  71.36   1260.6025  -2.5
YFWWQYMMYK  99       788.84    2  75.7    1575.668   -1.6
WFDAQLLVFK  99       633.3539  2  87.77   1264.6968  -2.8
YADAHDYAFK  98       400.5195  3  32.45   1198.5408  -3.4
PLVDVHDYAK  98       385.8754  3  40.45   1154.6084  -3.6
LDKVDYADWK  98       417.8819  3  45.44   1250.6294  -4.3
EFDLHDYATK  98       413.1983  3  45.91   1236.5774  -3.4
WDLEDYSEHK  98       440.8651  3  47.13   1319.5781  -3.5
QADVWDYAAK  98       583.2838  2  52.01   1164.5564  -3.0
YFDVADYASK  98       589.2786  2  56.58   1176.5452  -2.2
QDFPDYADFK  98       622.7811  2  61.08   1243.551   -2.7
EVDLVDYAAK  98       561.2945  2  65.07   1120.5764  -1.8
EFDLADYSWK  98       636.7969  2  74.66   1271.5823  -2.4
KGDLHDYAFK  97       398.2073  3  25.99   1191.6035  -2.9
LGDMPDYAAK  97       548.26    2  27.34   1094.5066  -1.0
MDEDYAAFQK  97       616.764   2  41.17   1231.5178  -3.6
ALDVPDYAGK  97       524.2769  2  44.54   1046.5396  -0.3
DVHDYAHFWK  97       439.5427  3  50.09   1315.6099  -2.7
VDVTDYAEFK  97       593.2922  2  56.81   1184.5713  -1.2
LFDVEDYADK  97       607.2896  2  60.86   1212.5662  -1.3
YTDLWDYAEK  97       651.8024  2  67.33   1301.5928  -1.9
YWPFLWYAMK  97       710.3473  2  90.39   1418.6846  -3.2
LMDFQDYAHK  96       428.1992  3  37.77   1281.5813  -4.3
LDKPDYAQFK  96       408.5505  3  38.33   1222.6345  -3.9
LTDVPDYASK  96       554.2863  2  38.55   1106.5608  -2.6
EVEDYASTMK  96       594.2637  2  39.58   1186.5176  -3.9
ALDLKDAYAK  96       553.8118  2  40.57   1105.613   -3.6
WTDLPDYSHK  96       420.8703  3  40.62   1259.5935  -3.4
KLENDYAELK  96       407.8857  3  42.41   1220.6401  -4.0
MEDAADYAWK  96       607.759   2  43.56   1213.5073  -3.2
WDDFKDYADK  96       434.5298  3  48.05   1300.5723  -3.6
EDLPDYAFPK  96       597.2949  2  58.3    1192.5764  -1.0
WFWYTHPQFK  96       719.8552  2  63.54   1437.6982  -1.7
DLHDYAHTLK  95       404.5423  3  32.42   1210.6094  -3.5
EVDVKDYAYK  95       410.2103  3  37.81   1227.6135  -3.6
LGDMFDYAAK  95       573.2719  2  44.15   1144.5222  6.1
WQDMDPYAFK  95       658.2911  2  52.13   1314.5703  -2.0
DGVMDYAAPK  94       541.2529  2  27.33   1080.491   0.2
LVDMDMYAMK  94       632.2737  2  28.45   1262.5344  -1.2
TMLDMDPYAK  94       608.2714  2  32.6    1214.5312  -2.5
YVDKVDYAFK  94       416.2185  3  40.85   1245.6394  -4.5
EDMPDYAVWK  94       634.7835  2  52.89   1267.5542  -1.4
FDVWDYAQVK  94       635.3154  2  73.66   1268.6189  -2.1
VMLDKVDYAK  93       399.5481  3  30.86   1195.6272  -4.0
VEGDLPDYAK  93       553.2777  2  40.46   1104.5452  -3.9
FDYEDYARAK  93       426.2018  3  41.61   1275.5884  -3.8
EVADYADTVK  93       555.2762  2  53.65   1108.54    -1.9
MLDVYDSFWK  93       659.8082  2  67.18   1317.6064  -3.5
ELDMHDYAVK  92       412.5279  3  25.75   1234.5652  -2.6
DLADYADKAK  92       554.784   2  37.8    1107.5559  -2.2
WDVMDYAKAK  92       414.5361  3  39.32   1240.5911  -3.7
WSDLKDYAAK  92       399.2063  3  40.91   1194.6033  -5.3
YMDLVDYESK  92       639.2874  2  47.14   1276.5645  -3.3
EGVQDYASWK  92       591.2805  2  49.56   1180.5513  -4.2
VVDLADYSYK  92       586.3027  2  53.58   1170.592   -1.0
FDVPDYAYYK  92       640.3033  2  64.01   1278.592   0.1
MPDVDMYAAK  91       586.2582  2  33.01   1170.5049  -2.6
HVDWKDYAEK  91       430.5456  3  36.34   1288.6201  -3.9
FADADPYAYK  91       580.2725  2  44.86   1158.5344  -3.5
DKQDYAALFK  91       599.3159  2  49.57   1196.6189  -1.3
VYDVEDYSYK  91       640.2941  2  49.97   1278.5769  -2.6
WDQHDYAFLK  91       441.2174  3  55.99   1320.625   4.1
EVADYADMFK  91       602.2693  2  60.99   1202.5278  -3.2
WFWFGWWMPK  91       743.3563  2  93.08   1484.6853  8.5
YHDLPDYAHK  90       419.8707  3  26.02   1256.5938  -2.7
VDKEDYAMLK  90       409.5404  3  26.83   1225.6013  -1.6
LDDMENYAVK  90       606.7806  2  30.88   1211.5493  -2.1
MNDLPDYATK  90       591.7748  2  36.49   1181.5386  -2.9
YPYDVDPYAK  90       615.2946  2  53.21   1228.5764  -1.4
YYDLPDYAFK  90       647.3102  2  68.1    1292.6077  -1.3
PVDEMDYAEK  89       606.2648  2  24.97   1210.5176  -2.1
LMRDVPDYSK  89       413.5437  3  26.25   1237.6125  -2.7
FVQDMDEYAK  89       630.7797  2  32.16   1259.5493  -3.5
YFDKVDYAEK  89       426.2108  3  36.71   1275.6135  -2.4
GPDMDEYAFK  89       594.255   2  36.97   1186.4963  -0.7
HDLPDYAYYK  89       642.3045  2  44.78   1282.5981  -2.9
DYHDSFHWYK  89       466.2057  3  48.2    1395.5996  -3.1
MFDLRDYAAK  89       415.5401  3  49.53   1243.6018  -2.6
FWDLMDYANK  89       659.2991  2  66.1    1316.5859  -1.7
MLKDVDYSYK  88       426.2114  3  32.74   1275.6169  -3.6
GPDVEDYAVK  88       546.2699  2  40.24   1090.5295  -3.9
FDFDLPDYSK  88       623.2931  2  73.14   1244.5713  0.3
YDVVWYSESK  87       425.5369  3  36.22   1273.5979  -7.1
TMTDLQDYAK  87       600.7798  2  38.25   1199.5493  -3.5
YLDTHDAYFK  87       424.5388  3  42.88   1270.5981  -2.8
WMDLPDAYTK  87       627.7938  2  54.45   1253.575   -1.5
YFMDWPDYAK  87       675.7944  2  63.97   1349.575   -0.5
FADLHDYESK  86       408.5265  3  35.5    1222.5618  -3.3
NDLHDYSYLK  86       422.8735  3  40.98   1265.604   -4.2
LPDHHDYFSK  85       419.8711  3  27.22   1256.5938  -1.8
PTDLQDFSNK  85       582.2871  2  35.6    1162.5618  -1.8
FFDMKDYAEK  85       436.8674  3  36.41   1307.5857  -4.0
TQDLKDYAFK  85       409.8821  3  40.3    1226.6294  -4.1
LQDVEDVYSK  85       597.8018  2  40.86   1193.5928  -3.2
PVDWKDYEAK  85       417.2105  3  46.16   1248.6138  -3.2
TTWDVEDAYK  85       613.7865  2  56.7    1225.5615  -2.5
EVWDYADPDK  85       618.7792  2  69      1235.5459  -1.7
FFWYYYGPWK  85       728.3517  2  89.06   1454.6812  5.3
WLMAFWPTWK  85       690.8472  2  91.59   1379.6848  -3.6
KDVHDYAYLK  84       417.5548  3  27.41   1249.6455  -2.3
YPDVHDAYVK  84       402.5348  3  40      1204.5876  -4.2
PWKDNEYSFK  84       438.2135  3  40.62   1311.6248  -4.6
KAQNLQDYAK  84       393.2144  3  43.78   1176.625   -3.2
KSWDPADYAK  84       393.863   3  45.06   1178.572   -4.0
WPDKADYAWK  84       426.8791  3  50.01   1277.6194  -3.2
FWNDAEDYAK  84       629.2795  2  53.37   1256.5461  -1.3
FWYYTPWWSK  84       731.8497  2  88.74   1461.687   -1.5
KDVYDRMQQK  83       442.558   3  21.9    1324.6558  -2.7
MMDVADYSSK  83       589.2455  2  25.05   1176.479   -2.1
PVPDMDPYAK  83       574.2748  2  31.35   1146.5378  -2.4
DLPDYAVSRK  83       581.8119  2  37.9    1161.6143  -4.3
DMDVDRYAWK  83       438.5354  3  44.33   1312.5869  -1.9
VVDVNGYEYK  83       592.7996  2  44.74   1183.5874  -2.4
MDLHDYADFK  83       423.8558  3  45.75   1268.5496  -3.1
WNDFKDYADK  83       434.2006  3  47.94   1299.5884  -6.5
YLDWADFSEK  83       636.7968  2  66.25   1271.5823  -2.6
WMWNAWYWPK  83       741.8414  2  91.77   1481.6704  -1.4
DLHDYADLSK  82       588.2863  2  43.31   1174.5618  -3.2
LMDMPDYANK  81       614.7694  2  28.9    1227.5264  -1.7
HDWNEYAAAK  81       602.2785  2  36.11   1202.5469  -3.7
VFDVHDEYSK  81       413.1984  3  37.4    1236.5774  -3.3
SMDLEDYAMK  81       617.257   2  40.8    1232.5054  -4.9
ADHSEYSFFK  81       410.523   3  43.39   1228.5513  -3.4

List of all sequences (ALC score ≥ 80) identified from selections performed with 2 x 10^6-, 2 x 10^7-, and 2 x 10^8-member libraries against 12ca5 (one replicate shown). Sequences are separated by the library they were discovered from and listed by decreasing ALC score.
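The m/z, z, Mass, and ppm columns of this table are mutually related. The sketch below assumes that "Mass" is the theoretical monoisotopic mass of the assigned sequence and that "ppm" is the error of the observed neutral mass relative to it; this interpretation is an assumption, although it reproduces the tabulated ppm values for the rows checked. The function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz: float, z: int) -> float:
    """Observed neutral mass from the m/z of an [M + zH]z+ ion."""
    return (mz - PROTON_MASS) * z


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error of the observed value relative to theory, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Two rows from Table 2.6.2: (peptide, m/z, z, theoretical mass)
rows = [
    ("MNDLVDYADK", 599.7725, 2, 1197.5337),
    ("YGMGFVVRMK", 406.8743, 3, 1217.605),
]
for peptide, mz, z, mass in rows:
    observed = neutral_mass(mz, z)
    print(peptide, round(observed, 4), round(ppm_error(observed, mass), 1))
```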

Table 2.6.3. Replicate selections from a 2 x 10^8-member library identify similar populations of 12ca5-binding peptides.

Number of replicates in which a DXXDY(A/S)-containing sequence was identified   Count   Percent
3                                                                              46      30.7
2                                                                              43      28.7
1                                                                              61      40.7

Out of three replicate selections from a 2 x 10^8-member library for 12ca5 binding, approximately 60% of DXXDY(A/S)-containing sequences are identified in multiple replicates, suggesting that similar (although not identical) populations of 12ca5-binding peptides are reproducibly identified. In total, 150 DXXDY(A/S)-containing sequences were obtained.
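A tally of this kind can be produced by counting, for each motif-containing sequence, the number of replicates in which it appears. The following Python sketch illustrates the idea; the regular expression encodes the DXXDY(A/S) motif, the function name is hypothetical, and the toy replicate lists are assembled from sequences in Table 2.6.2 purely for illustration (their grouping into replicates is invented).

```python
import re
from collections import Counter

MOTIF = re.compile(r"D..DY[AS]")  # DXXDY(A/S), where X is any residue


def replicate_overlap(replicates):
    """Map 'number of replicates' to 'number of motif-containing sequences'."""
    times_seen = Counter()
    for sequences in replicates:
        for sequence in set(sequences):  # count each sequence once per replicate
            if MOTIF.search(sequence):
                times_seen[sequence] += 1
    return Counter(times_seen.values())


# Toy replicates (grouping is hypothetical, for illustration only).
toy_replicates = [
    ["MDMEDYAAMK", "LGDLMDYSEK", "YFWWQYMMYK"],
    ["MDMEDYAAMK", "LGDLMDYSEK", "WFDAQLLVFK"],
    ["MDMEDYAAMK", "ADLQDYAAVK"],
]
print(replicate_overlap(toy_replicates))

# Percentages from the counts reported in Table 2.6.3.
reported = {3: 46, 2: 43, 1: 61}
total = sum(reported.values())
multi = 100 * (reported[3] + reported[2]) / total
print(total, round(multi, 1))
```

With the reported counts, 89 of 150 sequences (59.3%) appear in two or more replicates, which matches the "approximately 60%" stated in the caption.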

Table 2.6.4. Use of a precursor selection threshold modulates enrichment based on sample loading.

Peptide     ALC (%)  RT     Mass       ppm   MS signal (8 fmol)  Max NL (75 fmol)
MNDLVDYADK  99       41.83  1197.5337  -7.7  2.40E+05            2.45E+06
WDFWYFYGYK  99       92.25  1472.6553  -8.3  2.02E+04            2.07E+05
WFNFWGWEMK  99       85.67  1444.6387  -6.4  2.50E+04            1.61E+05
PDVHDYTWGK  99       35.24  1215.5671  -7.5  1.67E+04            9.83E+04
YLMYPWWWLK  99       88.72  1499.7424  -6    1.34E+04            9.65E+04
FYQWYYSWFK  99       85.74  1515.6975  -5.9  1.46E+04            8.16E+04
FLWYDYWFYK  99       94.22  1528.718   -6.9  1.44E+04            7.82E+04
WYWMYYWPSK  99       79.77  1523.6697  -7.2  1.33E+04            6.10E+04
YTGFTWFWWK  99       91.77  1419.6765  -6.4  9.30E+03            5.75E+04
NFQWWMMWFK  98       93.56  1533.6687  -7.6  1.89E+04            7.75E+04
FFWWNMLWPK  98       91.98  1468.7114  -5.8  7.54E+03            6.84E+04
FWNFFYSWYK  98       94.75  1485.687   -5.2  6.71E+03            6.46E+04
WDFWYFYYGK  97       89.85  1472.6553  -6.5  2.54E+04            1.11E+05
DMWPWWWMWK  96       92.51  1581.6687  -6.3  9.28E+03            7.25E+04
WWEMWPWDYK  96       86.16  1540.6599  -6.5  1.10E+04            5.98E+04
FTPWWLWGFK  95       94.93  1365.7021  -6.5  1.47E+04            7.94E+04
AFQGWWWYYK  95       79.68  1432.6716  -6.5  9.50E+03            6.03E+04
PYYYPVWWWK  94       84.9   1485.7234  -6.4  1.21E+04            6.32E+04
MWAYWFSWPK  94       84.61  1415.6484  -6.2  1.04E+04            5.87E+04
WWFDWAYLAK  94       97.23  1383.6765  -6.5  5.26E+03            5.62E+04
YFTWPAYWWK  91       87.5   1445.6921  -8    1.26E+04            6.34E+04
HPWFWWLTYK  90       77.3   1461.7346  -5.7  1.53E+04            1.01E+05
FWWQYATMWK  90       81.90  1460.6699  -9.0  1.42E+04            5.45E+04
WPYPFWWLHK  89       74.77  1457.7397  -6.5  8.48E+03            8.07E+04
WWQWWMAGYK  89       77.07  1455.6545  -6.5  7.60E+03            5.27E+04
FWWLMYPWGK  88       92.44  1427.6848  -8.3  1.77E+04            5.60E+04
FWYFPWAWQK  88       91.3   1456.708   -7.1  8.46E+03            5.54E+04
NYWWFYMFPK  87       87.32  1495.6748  -7.3  9.70E+03            8.02E+04
ENDWQDYSHK  87       28.73  1319.553   -7.5  1.10E+04            6.48E+04
WMQYWMWEFK  86       81.78  1564.6631  -8.3  8.94E+03            5.27E+04
YYWVWFWNGK  85       88.53  1446.6873  -4.2  1.05E+04            6.64E+04
PYFWMYYDWK  85       80.32  1512.6536  -6.4  9.56E+03            6.03E+04
WGWWFEEMFK  84       86.53  1459.6384  -7.4  8.70E+03            5.11E+04
FYWPMWFFPK  81       94.77  1462.6897  -6.5  8.76E+03            7.39E+04
LVVFAWMYNK  81       63.84  1284.6689  -6.7  1.02E+04            5.86E+04

At low sample loading, use of a precursor selection threshold of 5 x 10^4 yielded only one identified sequence (MNDLVDYADK). At high sample loading, two additional DXXDY-containing sequences were identified, along with 32 non-motif-containing sequences. MS signal is reported as the apex of extracted ion chromatograms.
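The effect of the threshold can be illustrated with a few rows of Table 2.6.4. The Python sketch below is a simplified model that compares the tabulated apex signals directly against the 5 x 10^4 threshold; in an actual acquisition the instrument applies the threshold to precursor intensity at the time of each scan, so this comparison is an approximation. The function name is hypothetical.

```python
import re

MOTIF = re.compile(r"D..DY")  # DXXDY, where X is any residue
THRESHOLD = 5e4               # precursor selection threshold

# (peptide, signal at 8 fmol, signal at 75 fmol): a subset of Table 2.6.4
rows = [
    ("MNDLVDYADK", 2.40e5, 2.45e6),
    ("WDFWYFYGYK", 2.02e4, 2.07e5),
    ("PDVHDYTWGK", 1.67e4, 9.83e4),
    ("ENDWQDYSHK", 1.10e4, 6.48e4),
    ("LVVFAWMYNK", 1.02e4, 5.86e4),
]


def above_threshold(rows, column):
    """Peptides whose signal in the given column meets the threshold."""
    return [row[0] for row in rows if row[column] >= THRESHOLD]


low_loading = above_threshold(rows, 1)
high_loading = above_threshold(rows, 2)
motif_hits = [p for p in high_loading if MOTIF.search(p)]

print("8 fmol:", low_loading)
print("75 fmol:", high_loading)
print("motif-containing at 75 fmol:", motif_hits)
```

In this subset, only MNDLVDYADK exceeds the threshold at 8 fmol, whereas all five peptides do at 75 fmol, including two background sequences that lack the motif.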

Table 2.6.5. Selections against 12ca5 identify a decreasing number of motif-containing sequences as library size is increased from 10^8 to 10^9.

Library size  [Library member]  ALC score cutoff  Total sequences identified  Motif-containing sequences  Percent
10^8          10 pM             ≥ 80              257                         183                         71.2
10^8          2 pM              ≥ 80              242                         131                         54.1
10^9          2 pM              ≥ 80              34                          21                          61.8

An approximate 9-fold drop in the number of 12ca5-binding sequences is observed from one-pot selections of a 10^9-member library relative to a 10^8-member library. Selections were performed near the solubility limit of the libraries (1-2 mM) and at variable per-member concentrations.

Table 2.6.6. Increasing scale of 10^9-member library selections does not restore recovery of 12ca5-binding peptides.

Library size  Starting material per member  Total sequences identified  Motif-containing sequences  Percent
10^9          1 fmol                        7                           4                           57.1
10^9          5 fmol                        15                          10                          66.7
10^9          12 fmol                       8                           7                           87.5

An increase in the amount of each library member present in the selection did not yield an increase in the number of sequences bearing the characteristic DXXDY(A/S) motif, suggesting that the decreased number of 12ca5-binding sequences identified is not due to material limitation.

Table 2.6.7. Ten-fold increase in 12ca5 in selections from a 10^9-member library does not restore recovery of 12ca5-binding peptides.

                                   0.1 µM 12ca5  1 µM 12ca5
Amount of 12ca5 (nmol):            0.13          1.3
DXXDY(A/S)-containing sequences:   26            0
Total sequences:                   52            18
Percent motif-containing:          50%           0%

Increasing the amount and concentration of 12ca5 ten-fold (from approximately 0.1 nmol/0.1 µM to approximately 1 nmol/1 µM) abrogated identification of DXXDY(A/S)-containing peptides.

Table 2.6.8. Peptides identified in presence of high exogenous competitor exhibit generally stronger signal intensities than those identified only in the presence of low exogenous competitor.

Peptide     ALC (%)  RT     Mass (Da)  ppm   MS signal (1 nM HA)

Identified in presence of 100 nM HA: Yes. Average Max NL (± Std. Dev.): 1.17E+06 ± 4.6E+05
MDMEDYAAMK  99       20.93  1250.4617  -4.5  7.25E+05
PVDMEDYAEK  99       23.17  1210.5176  -3.1  1.17E+06
TWDRPDYADK  99       27.89  1264.5837  -4.9  2.17E+06
YFDVEDYAAK  99       47.82  1218.5557  -4.8  9.64E+05
LMDLEDYAFK  99       51.71  1258.5903  -5.5  1.30E+06
YEDLEDYAAK  98       40.25  1214.5454  -5.3  6.85E+05
LTDVPDYASK  93       32.75  1106.5608  -2.9  1.20E+06

Identified in presence of 100 nM HA: No. Average Max NL (± Std. Dev.): 5.69E+05 ± 2.6E+05
PTDVMDYASK  99       25.9   1140.5122  -2.7  2.94E+05
HDWQDYAAAK  99       31.7   1202.5469  -4.1  6.90E+05
TMTDLQDYAK  99       33.39  1199.5493  -3.9  3.70E+05
MSDLEDYAMK  99       34.75  1232.5054  -2.8  2.52E+05
HDLPDYAYYK  99       37.85  1282.5981  -5    8.43E+05
SPDLEDYAYK  99       42.37  1198.5505  -4.7  7.16E+05
YVDWQDYADK  98       48.38  1300.5725  -6    5.42E+05
LGDYADYAAK  96       35.36  1084.5188  -5.5  1.96E+05
GPDVEDYAVK  95       34.96  1090.5295  -6.1  9.65E+05
LNDLPDYAYK  91       48     1209.603   -5.7  7.00E+05
HADMWDYADK  88       33.55  1265.5134  -4.6  3.76E+05
TDVRDYADYK  81       32     1243.5833  -5.2  8.85E+05

Seven DXXDYA-containing peptides were identified in the presence of 100 nM exogenous HA epitope, compared to 81 identified in the presence of 1 nM exogenous HA epitope (12 were randomly selected for analysis here). On average, peptides identified in the presence of greater exogenous competitor exhibited higher signal intensities than those only identified under less stringent conditions. MS signal is reported as the apex of extracted ion chromatograms.
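The group averages in Table 2.6.8 can be recomputed from the tabulated signals, as sketched below in Python. Whether a population or a sample standard deviation was used is not stated in the text; `pstdev` (population) is used here as an assumption, and the function name is illustrative.

```python
from statistics import mean, pstdev

# MS signals (1 nM HA) from Table 2.6.8, grouped by whether the peptide
# was also identified in the presence of 100 nM HA.
identified_at_100nM = [7.25e5, 1.17e6, 2.17e6, 9.64e5, 1.30e6, 6.85e5, 1.20e6]
identified_at_1nM_only = [2.94e5, 6.90e5, 3.70e5, 2.52e5, 8.43e5, 7.16e5,
                          5.42e5, 1.96e5, 9.65e5, 7.00e5, 3.76e5, 8.85e5]


def summarize(signals):
    """Return (mean, population standard deviation) of a list of signals."""
    return mean(signals), pstdev(signals)


for label, signals in (("100 nM HA", identified_at_100nM),
                       ("1 nM HA only", identified_at_1nM_only)):
    average, spread = summarize(signals)
    print(f"{label}: {average:.2e} +/- {spread:.1e}")
```

The recomputed means agree with the tabulated averages (1.17E+06 and 5.69E+05); small differences in the spread are expected because the tabulated signals are rounded.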

Table 2.6.9. Sequence subtraction from side-by-side selections yields modestly improved enrichments.

Processing method              Total sequences identified (12ca5)  Motif-containing sequences  Percent
ALC ≥ 80 (no subtraction)      198                                 133                         67.2
ALC ≥ 80 (with subtraction)    181                                 133                         73.5

Shown are the number of DXXDY(A/S)-containing sequences and total sequences identified from side-by-side selections of a 2 x 10^8-member library against 12ca5 and polyclonal human IgG1. Selections were performed in triplicate, and unique sequences from the sum of these technical replicates are indicated. Subtracting out non-specific sequences (those identified in both conditions) improves identification of motif-containing sequences as a fraction of the total.
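Sequence subtraction amounts to a set difference between the sequences identified against the target and those identified against the control. The Python sketch below shows one way to express it; the function name is hypothetical, and the toy sequence sets are drawn from Table 2.6.2 for illustration only (their assignment to 12ca5 or IgG1 selections here is invented).

```python
import re

MOTIF = re.compile(r"D..DY[AS]")  # DXXDY(A/S), where X is any residue


def subtract_and_score(target_sequences, control_sequences):
    """Remove sequences also found in the control; report motif enrichment.

    Returns (total remaining, motif-containing, percent motif-containing).
    """
    specific = set(target_sequences) - set(control_sequences)
    motif = {s for s in specific if MOTIF.search(s)}
    percent = 100 * len(motif) / len(specific) if specific else 0.0
    return len(specific), len(motif), percent


# Toy example (set membership is hypothetical).
toy_12ca5 = {"MDMEDYAAMK", "LGDLMDYSEK", "WFWFGWWMPK", "FFWYYYGPWK"}
toy_igg1 = {"WFWFGWWMPK"}
print(subtract_and_score(toy_12ca5, toy_igg1))

# Percentages from the counts reported in Table 2.6.9.
print(round(100 * 133 / 198, 1), round(100 * 133 / 181, 1))
```

Because subtraction removes only sequences shared with the control, the motif-containing count is unchanged (133) while the total drops from 198 to 181, which raises the motif-containing fraction from 67.2% to 73.5%.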

2.7. References

(1) Mignani, S.; Huber, S.; Tomás, H.; Rodrigues, J.; Majoral, J.-P. Why and How Have Drug Discovery Strategies in Pharma Changed? What Are the New Mindsets? Drug Discov. Today 2016, 21 (2), 239–249. https://doi.org/10.1016/j.drudis.2015.09.007.
(2) Erlanson, D. A.; McDowell, R. S.; O’Brien, T. Fragment-Based Drug Discovery. J. Med. Chem. 2004, 47 (14), 3463–3482. https://doi.org/10.1021/jm040031v.
(3) Gebauer, M.; Skerra, A. Engineered Protein Scaffolds as Next-Generation Antibody Therapeutics. Curr. Opin. Chem. Biol. 2009, 13 (3), 245–255. https://doi.org/10.1016/j.cbpa.2009.04.627.
(4) Tsomaia, N. Peptide Therapeutics: Targeting the Undruggable Space. Eur. J. Med. Chem. 2015, 94, 459–470. https://doi.org/10.1016/j.ejmech.2015.01.014.
(5) Vinogradov, A. A.; Yin, Y.; Suga, H. Macrocyclic Peptides as Drug Candidates: Recent Progress and Remaining Challenges. J. Am. Chem. Soc. 2019, 141 (10), 4167–4181. https://doi.org/10.1021/jacs.8b13178.
(6) Grossmann, T. N.; Yeh, J. T.-H.; Bowman, B. R.; Chu, Q.; Moellering, R. E.; Verdine, G. L. Inhibition of Oncogenic Wnt Signaling through Direct Targeting of β-Catenin. Proc. Natl. Acad. Sci. 2012, 109 (44), 17942–17947. https://doi.org/10.1073/pnas.1208396109.
(7) Chang, Y. S.; Graves, B.; Guerlavais, V.; Tovar, C.; Packman, K.; To, K.-H.; Olson, K. A.; Kesavan, K.; Gangurde, P.; Mukherjee, A.; Baker, T.; Darlak, K.; Elkin, C.; Filipovic, Z.; Qureshi, F. Z.; Cai, H.; Berry, P.; Feyfant, E.; Shi, X. E.; Horstick, J.; Annis, D. A.; Manning, A. M.; Fotouhi, N.; Nash, H.; Vassilev, L. T.; Sawyer, T. K. Stapled α-Helical Peptide Drug Development: A Potent Dual Inhibitor of MDM2 and MDMX for p53-Dependent Cancer Therapy. Proc. Natl. Acad. Sci. 2013, 110 (36), E3445–E3454. https://doi.org/10.1073/pnas.1303002110.
(8) Leshchiner, E. S.; Parkhitko, A.; Bird, G. H.; Luccarelli, J.; Bellairs, J. A.; Escudero, S.; Opoku-Nsiah, K.; Godes, M.; Perrimon, N.; Walensky, L. D. Direct Inhibition of Oncogenic KRAS by Hydrocarbon-Stapled SOS1 Helices. Proc. Natl. Acad. Sci. 2015, 112 (6), 1761–1766. https://doi.org/10.1073/pnas.1413185112.
(9) Rezai, T.; Yu, B.; Millhauser, G. L.; Jacobson, M. P.; Lokey, R. S. Testing the Conformational Hypothesis of Passive Membrane Permeability Using Synthetic Cyclic Peptide Diastereomers. J. Am. Chem. Soc. 2006, 128 (8), 2510–2511. https://doi.org/10.1021/ja0563455.
(10) Walensky, L. D.; Bird, G. H. Hydrocarbon-Stapled Peptides: Principles, Practice, and Progress. J. Med. Chem. 2014, 57 (15), 6275–6288. https://doi.org/10.1021/jm4011675.
(11) Bird, G. H.; Mazzola, E.; Opoku-Nsiah, K.; Lammert, M. A.; Godes, M.; Neuberg, D. S.; Walensky, L. D. Biophysical Determinants for Cellular Uptake of Hydrocarbon-Stapled Peptide Helices. Nat. Chem. Biol. 2016, 12 (10), 845–852. https://doi.org/10.1038/nchembio.2153.
(12) Touti, F.; Gates, Z. P.; Bandyopadhyay, A.; Lautrette, G.; Pentelute, B. L. In-Solution Enrichment Identifies Peptide Inhibitors of Protein–Protein Interactions. Nat. Chem. Biol. 2019, 15 (4), 410–418. https://doi.org/10.1038/s41589-019-0245-2.
(13) Rogers, J. M.; Passioura, T.; Suga, H. Nonproteinogenic Deep Mutational Scanning of Linear and Cyclic Peptides. Proc. Natl. Acad. Sci. 2018, 115 (43), 10959–10964. https://doi.org/10.1073/pnas.1809901115.

(14) Clackson, T.; Wells, J. A. In Vitro Selection from Protein and Peptide Libraries. Trends Biotechnol. 1994, 12 (5), 173–184. https://doi.org/10.1016/0167-7799(94)90079-5.
(15) Kay, B. K.; Kurakin, A. V.; Hyde-DeRuyscher, R. From Peptides to Drugs via Phage Display. Drug Discov. Today 1998, 3 (8), 370–378. https://doi.org/10.1016/S1359-6446(98)01220-3.
(16) Wilson, D. S.; Keefe, A. D.; Szostak, J. W. The Use of mRNA Display to Select High-Affinity Protein-Binding Peptides. Proc. Natl. Acad. Sci. U. S. A. 2001, 98 (7), 3750–3755. https://doi.org/10.1073/pnas.061028198.
(17) Obexer, R.; Walport, L. J.; Suga, H. Exploring Sequence Space: Harnessing Chemical and Biological Diversity towards New Peptide Leads. Curr. Opin. Chem. Biol. 2017, 38, 52–61. https://doi.org/10.1016/j.cbpa.2017.02.020.
(18) Katoh, T.; Tajima, K.; Suga, H. Consecutive Elongation of D-Amino Acids in Translation. Cell Chem. Biol. 2017, 24 (1), 46–54. https://doi.org/10.1016/j.chembiol.2016.11.012.
(19) Katoh, T.; Suga, H. Ribosomal Incorporation of Consecutive β-Amino Acids. J. Am. Chem. Soc. 2018, 140 (38), 12159–12167. https://doi.org/10.1021/jacs.8b07247.
(20) Eidam, O.; Satz, A. L. Analysis of the Productivity of DNA Encoded Libraries. MedChemComm 2016, 7 (7), 1323–1331. https://doi.org/10.1039/C6MD00221H.
(21) Zhao, G.; Huang, Y.; Zhou, Y.; Li, Y.; Li, X. Future Challenges with DNA-Encoded Chemical Libraries in the Drug Discovery Domain. Expert Opin. Drug Discov. 2019, 14 (8), 735–753. https://doi.org/10.1080/17460441.2019.1614559.
(22) Lam, K. S.; Salmon, S. E.; Hersh, E. M.; Hruby, V. J.; Kazmierski, W. M.; Knapp, R. J. A New Type of Synthetic Peptide Library for Identifying Ligand-Binding Activity. Nature 1991, 354 (6348), 82–84. https://doi.org/10.1038/354082a0.
(23) Lam, K. S.; Lebl, M.; Krchňák, V. The “One-Bead-One-Compound” Combinatorial Library Method. Chem. Rev. 1997, 97 (2), 411–448. https://doi.org/10.1021/cr9600114.
(24) Gates, Z. P.; Vinogradov, A. A.; Quartararo, A. J.; Bandyopadhyay, A.; Choo, Z.-N.; Evans, E. D.; Halloran, K. H.; Mijalis, A. J.; Mong, S. K.; Simon, M. D.; Standley, E. A.; Styduhar, E. D.; Tasker, S. Z.; Touti, F.; Weber, J. M.; Wilson, J. L.; Jamison, T. F.; Pentelute, B. L. Xenoprotein Engineering via Synthetic Libraries. Proc. Natl. Acad. Sci. 2018, 115 (23), E5298–E5306. https://doi.org/10.1073/pnas.1722633115.
(25) Zuckermann, R. N.; Kerr, J. M.; Siani, M. A.; Banville, S. C.; Santi, D. V. Identification of Highest-Affinity Ligands by Affinity Selection from Equimolar Peptide Mixtures Generated by Robotic Synthesis. Proc. Natl. Acad. Sci. U. S. A. 1992, 89 (10), 4505–4509. https://doi.org/10.1073/pnas.89.10.4505.
(26) Dunayevskiy, Y. M.; Lai, J.-J.; Quinn, C.; Talley, F.; Vouros, P. Mass Spectrometric Identification of Ligands Selected from Combinatorial Libraries Using Gel Filtration. Rapid Commun. Mass Spectrom. 1997, 11 (11), 1178–1184. https://doi.org/10.1002/(SICI)1097-0231(199707)11:11<1178::AID-RCM991>3.0.CO;2-H.
(27) Kaur, S.; McGuire, L.; Tang, D.; Dollinger, G.; Huebner, V. Affinity Selection and Mass Spectrometry-Based Strategies to Identify Lead Compounds in Combinatorial Libraries. J. Protein Chem. 1997, 16 (5), 505–511. https://doi.org/10.1023/A:1026369729393.
(28) van Breemen, R. B.; Huang, C.-R.; Nikolic, D.; Woodbury, C. P.; Zhao, Y.-Z.; Venton, D. L. Pulsed Ultrafiltration Mass Spectrometry: A New Method for Screening Combinatorial Libraries. Anal. Chem. 1997, 69 (11), 2159–2164. https://doi.org/10.1021/ac970132j.

(29) Maaty, W. S.; Weis, D. D. Label-Free, In-Solution Screening of Peptide Libraries for Binding to Protein Targets Using Hydrogen Exchange Mass Spectrometry. J. Am. Chem. Soc. 2016, 138 (4), 1335–1343. https://doi.org/10.1021/jacs.5b11742.
(30) Vinogradov, A. A.; Gates, Z. P.; Zhang, C.; Quartararo, A. J.; Halloran, K. H.; Pentelute, B. L. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries. ACS Comb. Sci. 2017, 19 (11), 694–701. https://doi.org/10.1021/acscombsci.7b00109.
(31) Jiang, J.; Parker, C. E.; Hoadley, K. A.; Perou, C. M.; Boysen, G.; Borchers, C. H. Development of an Immuno Tandem Mass Spectrometry (iMALDI) Assay for EGFR Diagnosis. PROTEOMICS – Clin. Appl. 2007, 1 (12), 1651–1659. https://doi.org/10.1002/prca.200700009.
(32) Li, H.; Popp, R.; Borchers, C. H. Affinity-Mass Spectrometric Technologies for Quantitative Proteomics in Biological Fluids. TrAC Trends Anal. Chem. 2017, 90, 80–88. https://doi.org/10.1016/j.trac.2017.02.011.
(33) Choi, Y.; van Breemen, R. B. Development of a Screening Assay for Ligands to the Estrogen Receptor Based on Magnetic Microparticles and LC-MS. Comb. Chem. High Throughput Screen. 2008, 11 (1), 1–6. https://doi.org/10.2174/138620708783398340.
(34) Rush, M. D.; Walker, E. M.; Prehna, G.; Burton, T.; van Breemen, R. B. Development of a Magnetic Microbead Affinity Selection Screen (MagMASS) Using Mass Spectrometry for Ligands to the Retinoid X Receptor-α. J. Am. Soc. Mass Spectrom. 2017, 28 (3), 479–485. https://doi.org/10.1007/s13361-016-1564-0.
(35) Sannino, A.; Gabriele, E.; Bigatti, M.; Mulatto, S.; Piazzi, J.; Scheuermann, J.; Neri, D.; Donckele, E. J.; Samain, F. Quantitative Assessment of Affinity Selection Performance by Using DNA-Encoded Chemical Libraries. ChemBioChem 2019, 20 (7), 955–962. https://doi.org/10.1002/cbic.201800766.
(36) Furka, A.; Sebestyén, F.; Asgedom, M.; Dibó, G. General Method for Rapid Synthesis of Multicomponent Peptide Mixtures. Int. J. Pept. Protein Res. 1991, 37 (6), 487–493. https://doi.org/10.1111/j.1399-3011.1991.tb00765.x.
(37) Ma, B.; Zhang, K.; Hendrie, C.; Liang, C.; Li, M.; Doherty-Kirby, A.; Lajoie, G. PEAKS: Powerful Software for Peptide de Novo Sequencing by Tandem Mass Spectrometry. Rapid Commun. Mass Spectrom. 2003, 17 (20), 2337–2342. https://doi.org/10.1002/rcm.1196.
(38) Churchill, M. E.; Stura, E. A.; Pinilla, C.; Appel, J. R.; Houghten, R. A.; Kono, D. H.; Balderas, R. S.; Fieser, G. G.; Schulze-Gahmen, U.; Wilson, I. A. Crystal Structure of a Peptide Complex of Anti-Peptide Antibody Fab 26/9. Comparison of Two Different Antibodies Bound to the Same Peptide Antigen. J. Mol. Biol. 1994, 241 (4), 534–556. https://doi.org/10.1006/jmbi.1994.1530.
(39) Olson, C. A.; Nie, J.; Diep, J.; Al-Shyoukh, I.; Takahashi, T. T.; Al-Mawsawi, L. Q.; Bolin, J. M.; Elwell, A. L.; Swanson, S.; Stewart, R.; Thomson, J. A.; Soh, H. T.; Roberts, R. W.; Sun, R. Single-Round, Multiplexed Antibody Mimetic Design through mRNA Display. Angew. Chem. Int. Ed. Engl. 2012, 51 (50), 12449–12453. https://doi.org/10.1002/anie.201207005.
(40) Georgiou, G.; Stathopoulos, C.; Daugherty, P. S.; Nayak, A. R.; Iverson, B. L.; Curtiss, R., III. Display of Heterologous Proteins on the Surface of Microorganisms: From the Screening of Combinatorial Libraries to Live Recombinant Vaccines. Nat. Biotechnol. 1997, 15 (1), 29–34. https://doi.org/10.1038/nbt0197-29.

(41) Feldhaus, M. J.; Siegel, R. W.; Opresko, L. K.; Coleman, J. R.; Feldhaus, J. M. W.; Yeung, Y. A.; Cochran, J. R.; Heinzelman, P.; Colby, D.; Swers, J.; Graff, C.; Wiley, H. S.; Wittrup, K. D. Flow-Cytometric Isolation of Human Antibodies from a Nonimmune Saccharomyces cerevisiae Surface Display Library. Nat. Biotechnol. 2003, 21 (2), 163–170. https://doi.org/10.1038/nbt785.
(42) Alluri, P. G.; Reddy, M. M.; Bachhawat-Sikder, K.; Olivos, H. J.; Kodadek, T. Isolation of Protein Ligands from Large Peptoid Libraries. J. Am. Chem. Soc. 2003, 125 (46), 13995–14004. https://doi.org/10.1021/ja036417x.
(43) Griffiths, A. D.; Duncan, A. R. Strategies for Selection of Antibodies by Phage Display. Curr. Opin. Biotechnol. 1998, 9 (1), 102–108. https://doi.org/10.1016/S0958-1669(98)80092-X.
(44) Zhang, Y.; Fonslow, B. R.; Shan, B.; Baek, M.-C.; Yates, J. R. Protein Analysis by Shotgun/Bottom-up Proteomics. Chem. Rev. 2013, 113 (4), 2343–2394. https://doi.org/10.1021/cr3003533.
(45) Mándity, I. M.; Fülöp, F. An Overview of Peptide and Peptoid Foldamers in Medicinal Chemistry. Expert Opin. Drug Discov. 2015, 10 (11), 1163–1177. https://doi.org/10.1517/17460441.2015.1076790.
(46) Nizami, B.; Bereczki-Szakál, D.; Varró, N.; el Battioui, K.; Nagaraj, V. U.; Szigyártó, I. C.; Mándity, I.; Beke-Somfai, T. FoldamerDB: A Database of Peptidic Foldamers. Nucleic Acids Res. 2020, 48 (D1), D1122–D1128. https://doi.org/10.1093/nar/gkz993.
(47) Heinis, C.; Rutherford, T.; Freund, S.; Winter, G. Phage-Encoded Combinatorial Chemical Libraries Based on Bicyclic Peptides. Nat. Chem. Biol. 2009, 5 (7), 502–507. https://doi.org/10.1038/nchembio.184.
(48) Goto, Y.; Ohta, A.; Sako, Y.; Yamagishi, Y.; Murakami, H.; Suga, H. Reprogramming the Translation Initiation for the Synthesis of Physiologically Stable Cyclic Peptides. ACS Chem. Biol. 2008, 3 (2), 120–129. https://doi.org/10.1021/cb700233t.
(49) Lee, J. H.; Meyer, A. M.; Lim, H.-S. A Simple Strategy for the Construction of Combinatorial Cyclic Peptoid Libraries. Chem. Commun. 2010, 46 (45), 8615–8617. https://doi.org/10.1039/C0CC03272G.
(50) Liang, X.; Vézina-Dawod, S.; Bédard, F.; Porte, K.; Biron, E. One-Pot Photochemical Ring-Opening/Cleavage Approach for the Synthesis and Decoding of Cyclic Peptide Libraries. Org. Lett. 2016, 18 (5), 1174–1177. https://doi.org/10.1021/acs.orglett.6b00296.
(51) Mijalis, A. J.; Thomas, D. A., III; Simon, M. D.; Adamo, A.; Beaumont, R.; Jensen, K. F.; Pentelute, B. L. A Fully Automated Flow-Based Approach for Accelerated Peptide Synthesis. Nat. Chem. Biol. 2017, 13 (5), 464–466. https://doi.org/10.1038/nchembio.2318.
(52) Simon, M. D.; Heider, P. L.; Adamo, A.; Vinogradov, A. A.; Mong, S. K.; Li, X.; Berger, T.; Policarpo, R. L.; Zhang, C.; Zou, Y.; Liao, X.; Spokoyny, A. M.; Jensen, K. F.; Pentelute, B. L. Rapid Flow-Based Peptide Synthesis. ChemBioChem 2014, 15 (5), 713–720. https://doi.org/10.1002/cbic.201300796.

Chapter 3. De novo discovery of synthetic peptide binders to MDM2/p53 and 14-3-3/phosphoprotein interfaces

3.1. Introduction

Modulation of protein-protein interactions (PPIs) is a challenging but therapeutically promising approach for treating a variety of human diseases. The interactions mediated by MDM2 and 14-3-3 in particular have garnered much attention as drug targets, due to their involvement in a number of biomedically relevant signaling pathways. MDM2 is an E3 ubiquitin ligase for the tumor-suppressing transcription factor p53, and is overexpressed in ~30% of human sarcomas¹. Because overexpression or deregulation of MDM2 and its homologue, MDMX, primarily occurs in tumors that retain wild-type p53, these proteins have been pursued extensively as targets for therapeutic intervention²,³. 14-3-3 is a hub protein that interacts with many different protein partners, including both MDMX and p53⁴,⁵. The inhibition or stabilization of such interactions has been pursued with a variety of therapeutic applications in mind, including diverse cancers and neurodegenerative diseases⁵–⁷.

Chemically modified peptides are well suited to modulating PPIs such as those mediated by MDM2 and 14-3-3. They can be engineered to exhibit the high affinities and specificities usually observed with biologics, and through a variety of design strategies can in some cases be rendered cell-penetrant⁸,⁹. However, reliable, high-throughput mining of this chemical space (that is, non-canonical peptides and peptidomimetics) for PPI inhibitors has hitherto been limited. Long-established chemical methods for peptide library synthesis and screening cannot match the throughput of molecular biology-based approaches, which can access enormous diversities (10⁸–10¹³) but are themselves restricted to a high proportion of natural amino acids¹⁰.

Recent work in our lab improved the throughput of affinity selection-mass spectrometry (AS-MS) to investigate synthetic peptide libraries with diversities approaching those of molecular biology-based approaches¹¹. The use of magnetic bead reagents, in conjunction with nano-liquid chromatography-tandem mass spectrometry (nLC-MS/MS), facilitated one-pot selections from 10⁶–10⁸-member libraries for binding to a model selection target, a monoclonal anti-hemagglutinin antibody (anti-HA mAb). These selections identified the characteristic anti-HA-binding motif, DXXDY(A/S), with high enrichments and in proportion to library diversity.

Here, we sought to apply magnetic bead-based AS-MS for de novo discovery of peptides that engage therapeutically relevant targets (Fig. 3.2.1). First, we benchmarked this approach against molecular biology-based discovery by performing selections against MDM2, a target of previous phage display campaigns. We show that magnetic bead-based AS-MS can recapitulate phage display and map the binding determinants of a PPI, identifying the ‘FWL’ triad characteristic of MDM2 binding. The importance of library diversity is underscored by identifying such peptides from 10⁸-member libraries only, and not from a 10⁷-member library.

We then leveraged the synthetic capability AS-MS affords by panning a library comprising non-canonical amino acids against 14-3-3, identifying a conserved motif characterized by β-amino acids. Resynthesized peptides from this motif family exhibited low nanomolar affinity for 14-3-3. Finally, an X-ray structure of one of these peptides in complex with 14-3-3σ was determined, illustrating the role of β-amino acids in facilitating a key binding contact.

Figure 3.2.1. Magnetic bead-based affinity selection-mass spectrometry (AS-MS) enables de novo discovery of peptide-based binders to various protein targets. Libraries comprising 10⁸ members can be easily investigated in one pot, and diversities of 10⁹ can be accessed on a practical lab scale through parallel selections. Here, magnetic bead-based AS-MS was used to discover novel binders to MDM2 (shown) and 14-3-3.

3.2. Results

3.2.1. High-diversity libraries enable discovery of MDM2-binding peptides

Having previously established magnetic bead-based AS-MS as a selection protocol applicable to high-diversity libraries of random synthetic peptides, we sought to benchmark its performance relative to affinity selection from genetically encoded libraries. As a model target for this purpose, we selected MDM2, an oncogenic ubiquitin ligase that binds its substrate p53 through an FXXXWXXL motif. Phage display has identified a number of well-characterized, high-affinity MDM2 binders based on this motif¹²–¹⁵, and we sought to determine whether AS-MS could recapitulate these results by identifying similar sequences from synthetic libraries of comparable size and design.

To begin this line of investigation, the N-terminal domain of MDM2 (residues 25-109) was first accessed synthetically in biotinylated form, to enable its use as a selection target in conjunction with streptavidin-coated magnetic beads (Fig. 3.2.2; Fig. 3.6.1). To mimic previous phage display libraries, a library of design (X)₁₂K, where X = any of the 20 canonical L-amino acids except cysteine and isoleucine (theoretical diversity = 1.2 × 10¹⁵), was synthesized on 20 µm TentaGel resin (4.2 g; 1.3 × 10⁹ beads). Prior to cleavage from resin, this library was split to yield five distinct 2 × 10⁸-member libraries, as well as a 2 × 10⁷-member library, to investigate the importance of library size in the context of selections for MDM2 binding (Fig. 3.2.3a; Fig. 3.6.2).
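The quoted theoretical diversity follows from the library design: 18 permitted monomers at each of 12 varied positions, with the C-terminal lysine fixed. A minimal sketch of that arithmetic, and of the fraction of sequence space that a 2 × 10⁸-member library can sample, is given below. The function names are illustrative, and the sampled fraction assumes every library member is a distinct sequence, which is an idealization.

```python
def theoretical_diversity(n_monomers: int, n_varied_positions: int) -> int:
    """Number of distinct sequences for a fully randomized design."""
    return n_monomers ** n_varied_positions


def sampled_fraction(library_size: float, diversity: int) -> float:
    """Upper bound on the fraction of sequence space covered, assuming
    every library member is unique (an idealization)."""
    return min(1.0, library_size / diversity)


# (X)12K: 20 canonical L-amino acids minus Cys and Ile leaves 18 monomers.
x12k = theoretical_diversity(18, 12)
print(f"{x12k:.2e}")                         # 1.16e+15
print(f"{sampled_fraction(2e8, x12k):.1e}")  # 1.7e-07
```

Under these assumptions, each 2 × 10⁸-member library covers on the order of one part in ten million of the available sequence space.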

Figure 3.2.2. Biotinylated (25-109)MDM2 is readily accessed for use in selections via automated fast flow synthesis. a) Sequence of (25-109)MDM2. A biotin was site-selectively installed on Lys36, indicated in red. b) LC-MS characterization and c) analytical HPLC characterization of 1x HPLC purified (25-109)MDM2 K36(biotin). The mass corresponding to the expected product was found upon deconvolution (expected monoisotopic mass: 10502.5 Da; found: 10503.8 Da).

Selections from three of the five 2 × 10⁸-member libraries against (25-109)MDM2 (0.13 nmol) yielded sequences containing the FXXXWXX(L/V) motif characteristic of MDM2 binding (Fig. 3.2.3b). In total, 16 sequences from these selections were identified (average local confidence (ALC) ≥ 80), five of which (31%) contained FXXXWXX(L/V) (Table 3.6.1). An additional two sequences appeared to be MDM2-binding peptides, containing the minimal FXXXW motif, but were potentially mis-sequenced due to poor-quality fragmentation spectra (Fig. 3.6.3). Selections from the 2 × 10⁷-member library did not yield motif-containing sequences, consistent with the frequency of binders identified from 2 × 10⁸-member libraries.

Selections were also performed from a 10⁹-member library, obtained by pooling the individual 2 × 10⁸-member libraries. These selections failed to identify motif-containing sequences, consistent with the poor recovery of motif-containing sequences from 10⁹-member libraries in selections against 12ca5. In summary, library diversity correlates with the number of binders identified, provided the diversity is within the technical limits of AS-MS for decoding single-pass selections.
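The motif readout used in these selections can be written as a pair of regular expressions. The sketch below is illustrative only: the function names are hypothetical, and apart from the p53(17-28) epitope the example sequences are invented placeholders rather than hits from Table 3.6.1.

```python
import re

FULL_MOTIF = re.compile(r"F...W..[LV]")   # FXXXWXX(L/V)
MINIMAL_MOTIF = re.compile(r"F...W")      # FXXXW


def classify(sequence: str) -> str:
    """Label a sequence by the most complete motif it contains."""
    if FULL_MOTIF.search(sequence):
        return "full"
    if MINIMAL_MOTIF.search(sequence):
        return "minimal"
    return "none"


def motif_summary(sequences):
    """Count sequences in each motif class."""
    counts = {"full": 0, "minimal": 0, "none": 0}
    for seq in sequences:
        counts[classify(seq)] += 1
    return counts


examples = [
    "ETFSDLWKLLPE",   # p53 residues 17-28, the native MDM2-binding epitope
    "AGFHDQWSAGTK",   # hypothetical: minimal FXXXW only
    "GSPNAHTRQMEK",   # hypothetical: no motif
]
summary = motif_summary(examples)
print(summary)
```

Applied to the counts reported above, five full-motif sequences out of 16 identified corresponds to 5/16 ≈ 31%.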

A closer examination of the FXXXWXX(L/V)-containing sequences identified here alongside known MDM2-binding peptides revealed sequence similarity outside of the FWL triad (Fig. 3.2.3c). Specifically, a motif with six conserved positions was observed among the majority of sequences: (S/T)FX(D/E)YWXXL. Each of the conserved positions corresponds to a hot spot of binding energy, as determined by mutational analysis¹⁶. The ability to identify not only the FWL triad but also other significant determinants of binding affinity supports our interpretation that AS-MS is capable of matching the performance of phage display, in the context of selections against MDM2. While others have demonstrated the utility of synthetic libraries for identifying MDM2 binders¹⁷,¹⁸, our results illustrate their utility for mapping the determinants of a protein-protein interaction.

Figure 3.2.3. High-diversity libraries facilitate discovery of p53-like peptides. a) Design of libraries used in selections against MDM2. Each library was derived from the same batch of split-and-pool SPPS. The readout for selections was the number of sequences bearing the MDM2-binding motif, FXXXWXX(L/V). b) Motif-containing sequences could be identified by selection from individual 2 × 10⁸-member libraries (one-pot), but not from 2 × 10⁷-member or 1 × 10⁹-member libraries (the latter obtained by pooling the five individual 2 × 10⁸-member libraries; one-pot). Selections from the five individual 2 × 10⁸-member libraries (multi-pot; 1 × 10⁹ members total) identified five motif-containing sequences, suggesting that library size was enabling for identification of p53-like peptides as long as the mixture was not overly complex. Error bars correspond to one standard deviation among three technical replicates. c) Sequences of known MDM2-binding peptides, including the native MDM2-binding epitope of p53 (residues 17-28) and three peptides identified from phage display, aligned with sequences identified from (b). Known hot spot residues (F, W, L/V) are indicated in blue; aligned hot spot positions are indicated in bold and underline; and additional conserved residues are indicated in purple.

3.2.2. AS-MS identifies non-canonical 14-3-3γ-binding peptides

To illustrate a key advantage of the synthetic library approach, we investigated whether AS-MS could achieve comparable selection performance from a library based on non-canonical amino acids. As a selection target, we chose the hub protein 14-3-3, which interacts with a range of disease-relevant proteins including p53, Raf kinases, and estrogen receptor α. Considerable effort has been devoted to developing modulators of these interactions⁵, which are generally mediated by phosphorylation¹⁹.

For use in selections against 14-3-3, we designed a library based on a fixed phosphoserine, flanked by eight varied positions (Fig. 3.2.4a). At each varied position, we incorporated one of 18 non-canonical amino acids, including β- and D-amino acids, encompassing a variety of polar, non-polar, charged, and aromatic side-chain functionalities (Fig. 3.2.4a; theoretical diversity = 1.1 × 10¹⁰). This library was synthesized on 30 µm TentaGel resin (2.9 g; 2 × 10⁸ beads), yielding 2 × 10⁸ members (Fig. 3.6.4).

Side-by-side selections were performed against the 14-3-3γ isoform and 12ca5 (negative control, to identify non-specific binders; 0.13 nmol each). These selections yielded a total of 19 sequences that matched the library design (ALC ≥ 80), 17 of which were unique to 14-3-3γ, and 2 of which were unique to 12ca5 (Table 3.6.2; Fig. 3.6.5). Extracted ion chromatograms were used to verify the absence of 14-3-3γ-unique sequences from the 12ca5 selections (Figs. 3.6.6–3.6.7). Among the 14-3-3γ-unique sequences, seven contained a C-terminal motif: (β-homoserine)-(β-alanine/β-homoserine)-(4-nitrophenylalanine). In general, D-amino acids were not present within the sequences identified. β-amino acids were enriched at positions 3, 7, and 8, but were otherwise largely absent. A positional frequency analysis revealed additional preferences for cyclohexylalanine at the N-terminus, and a β-homoserine at position three (Fig. 3.6.5).
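A positional frequency analysis of this kind amounts to counting, at each position, how often each monomer appears among the hits. The sketch below shows one way to do this. The three sequences are invented placeholders (not hits from Table 3.6.2), monomers are represented as lists of abbreviations because non-canonical residues have no one-letter codes, and the function name is hypothetical.

```python
from collections import Counter


def positional_frequencies(sequences):
    """For each position, count how often each monomer occurs.

    `sequences` is a list of equal-length lists of monomer names.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return [Counter(seq[i] for seq in sequences) for i in range(length)]


# Hypothetical 9-residue sequences (eight varied positions around a fixed
# pSer at position 5); placeholders only, not data from the selections.
hits = [
    ["Cha", "Nva", "bSer", "Thz", "pSer", "Nph", "bSer", "bAla", "Nph"],
    ["Cha", "Orn", "bSer", "Fph", "pSer", "Nph", "bSer", "bSer", "Nph"],
    ["Aad", "Hyp", "bThr", "Thz", "pSer", "Fph", "bSer", "bAla", "Nph"],
]

freqs = positional_frequencies(hits)
for position, counts in enumerate(freqs, start=1):
    monomer, n = counts.most_common(1)[0]
    print(position, monomer, round(n / len(hits), 2))
```

Positions where one monomer dominates the counts are the candidates for a conserved motif; with real data, the fixed phosphoserine position serves as a sanity check because it should appear in every sequence.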

Figure 3.2.4. Synthetic libraries identify a 14-3-3γ-binding consensus based on β-amino acids. a) Design of a non-canonical library used in selections against 14-3-3γ. A phosphoserine was fixed in the middle, flanked by four varied positions on either side. A suite of non-canonical amino acids was incorporated at each varied position. b) Sequences of putative 14-3-3γ-binding peptides (14-3-3.1, 14-3-3.6, and 14-3-3.12) identified from affinity selections, as well as a negative control peptide (non-binder or NB.1, identified as an artifact in selections against 12ca5) chosen for resynthesis and binding validation studies. Residues comprising the conserved C-terminal motif among 14-3-3γ binders, Nph-β-Ser-(β-Ala/β-Ser)-Nph, are indicated in blue. c) Identified binders exhibited nanomolar affinities for 14-3-3γ, as determined by fluorescence anisotropy of FITC-labeled 14-3-3γ-binding peptides, NB.1, and a known 14-3-3γ-binding peptide (BiExoS; positive control). These affinities were approximately 10,000-fold higher relative to the negative control. KD values are given in (b). Uncertainties correspond to 95% confidence intervals derived from nonlinear regression. d) Identified 14-3-3γ binders could compete off bound BiExoS in a fluorescence anisotropy competition assay, suggesting they bind 14-3-3γ in the canonical, amphipathic binding groove. IC50 values are given in (b). e) Molecular structure of 14-3-3σΔc (white surface) in complex with 14-3-3.12 (cyan sticks), based on a 1.80 Å crystal structure. The 2Fo-Fc electron density map corresponding to 14-3-3.12 is shown (blue mesh), contoured at 1σ. Abbreviations: β-Ala = β-alanine; β-Ser = β-homoserine; β-Thr = β-homothreonine; Aad = aminoadipic acid; Aph = 4-aminophenylalanine; Cha = cyclohexylalanine; Cpa = cyclopropylalanine; Dba = diaminobutyric acid; Fph = 4-fluorophenylalanine; Hyp = hydroxyproline; Nph = 4-nitrophenylalanine; Nva = norvaline; Orn = ornithine; pSer = phosphoserine; Thz = thiazolylalanine. Error bars correspond to standard error among three technical replicates.

3.2.3. Discovered peptides bind 14-3-3γ with low nanomolar affinity

To test the binding affinity of putative 14-3-3γ-binding peptides that contained the C-terminal (β-homoserine)-(β-alanine/β-homoserine)-(4-nitrophenylalanine) motif, we synthesized fluorophore-labeled forms of three selected peptides, for use in a fluorescence anisotropy binding assay (Fig. 3.2.4b; Figs. 3.6.8–3.6.11). As a negative control, we employed a sequence identified from a 12ca5 selection. Each of the putative 14-3-3γ-binding peptides examined exhibited low nanomolar affinity for 14-3-3γ, with KD values ranging from 3 to 19 nM (Fig. 3.2.4c). By contrast, the negative control peptide exhibited approximately 10,000-fold weaker binding, suggesting that the specific amino acid sequences of the 14-3-3γ-binding peptides, and not the phosphoserine alone, were required for their identification.
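For readers unfamiliar with how a KD is extracted from anisotropy titrations, the sketch below uses a common one-site binding model that accounts for depletion of free protein. This model, the grid-search fit, the tracer concentration, and the free and bound anisotropy values are all assumptions made for illustration; the fitting procedure actually used in this work is not specified in this section. The data are synthetic, generated from the model itself.

```python
import math


def fraction_bound(protein_nM, tracer_nM, kd_nM):
    """Fraction of labeled peptide bound for 1:1 binding, accounting for
    depletion of free protein (quadratic solution)."""
    b = protein_nM + tracer_nM + kd_nM
    return (b - math.sqrt(b * b - 4.0 * protein_nM * tracer_nM)) / (2.0 * tracer_nM)


def anisotropy(protein_nM, tracer_nM, kd_nM, r_free, r_bound):
    """Observed anisotropy as a weighted average of free and bound tracer."""
    return r_free + (r_bound - r_free) * fraction_bound(protein_nM, tracer_nM, kd_nM)


def fit_kd(protein_concs, observed, tracer_nM, r_free, r_bound):
    """Crude least-squares estimate of KD by scanning a log-spaced grid."""
    best_kd, best_sse = None, float("inf")
    for step in range(0, 601):
        kd = 10 ** (-1 + step / 100)      # 0.1 nM to 100 µM
        sse = sum(
            (anisotropy(p, tracer_nM, kd, r_free, r_bound) - r) ** 2
            for p, r in zip(protein_concs, observed)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free data generated from the model (KD = 10 nM,
# 5 nM tracer, anisotropy 0.05 free and 0.25 bound; all hypothetical).
concs = [0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
data = [anisotropy(p, 5.0, 10.0, 0.05, 0.25) for p in concs]
fitted = fit_kd(concs, data, 5.0, 0.05, 0.25)
print(round(fitted, 2))
```

A grid search is used here only to keep the example dependency-free; in practice, a nonlinear regression routine would also fit the free and bound anisotropy values and report confidence intervals.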

Unlabeled forms of the non-canonical 14-3-3γ binders were assayed for their ability to compete with BiExoS, a peptide ligand derived from the Pseudomonas aeruginosa cytotoxin Exoenzyme S²⁰, in a fluorescence anisotropy competition assay (Figs. 3.6.12–3.6.15). This experiment tests whether the peptides bind to the amphipathic 14-3-3γ binding groove, or elsewhere on the protein. The non-canonical 14-3-3γ binders were found to compete off BiExoS with IC50 values ranging from 78 to 530 nM, suggesting that they indeed bind in the canonical, phosphopeptide-accepting binding channel on 14-3-3γ (Fig. 3.2.4d)⁵. By contrast, the negative control peptide showed no inhibitory activity.

3.2.4. β-amino acids facilitate a key binding contact with 14-3-3

As an additional means of characterizing the binding interaction of non-canonical peptides with 14-3-3, we crystallized 14-3-3.12 in complex with 14-3-3σ. 14-3-3σ was used in place of 14-3-3γ to facilitate crystallization, and retained most of the binding activity for 14-3-3.12 (Fig. 3.6.16). Diffraction data were collected to a resolution of 1.8 Å, and the structure was solved by molecular replacement (Table 3.6.3).

The 14-3-3.12 backbone adopts an extended conformation in the 14-3-3σ binding groove, flanked by two half-turns²¹ defined by thiazolylalanine⁴ and β-alanine⁸ (Fig. 3.2.4e). 4-Nitrophenylalanine⁶, which was selected along with thiazolylalanine and 4-fluorophenylalanine at this position, makes hydrophobic contacts with Leu218, Ile219, and Leu222 of 14-3-3σ (Fig. 3.6.17). 4-Nitrophenylalanine⁹, the residue most conserved by the selection, participates in an electrostatic interaction and/or H-bond with the ε-NH₃ group of Lys122 (N–O distance = 3.2 Å), and makes a hydrophobic contact with Ile168 (Fig. 3.6.17). We speculate that the β-residues conserved at positions 7 and 8 of 14-3-3.12 provide the backbone flexibility necessary to accommodate these energetically important interactions, which were not identified by selection from peptide libraries based on canonical amino acids¹⁹.

3.3. Discussion

In this work, we demonstrate that magnetic bead-based AS-MS enables de novo discovery of high-affinity peptide binders to PPI epitopes. In the context of selections for MDM2 binding, this method performs on par with phage display and can identify p53-like peptides from fully randomized synthetic libraries. This work also demonstrates the utility of AS-MS for mapping the primary binding determinants of a PPI (here, p53-MDM2).

As seen with previous selections for anti-hemagglutinin antibody binding, the ideal library size for MDM2 binding was shown to be 10⁸. This library size represents the practical upper limit of diversities amenable to AS-MS. Beyond 10⁸, binder identification was abrogated, presumably because the complexity of the isolated sample precluded high-confidence MS/MS-based sequencing. The importance of achieving a diversity of at least 10⁸ was underscored by the identification of p53-like peptides from 10⁸-member libraries only, and not from a 10⁷-member library derived from the same synthetic batch. Parallel selections of five 2 × 10⁸-member libraries were readily performed to extend the achievable diversity even further, enabling sampling and analysis of a billion synthetic compounds on a reasonably short time scale (~1 week).

Relative to molecular biology-based selection approaches, the chief advantage of a chemical approach is the direct access to non-natural chemical space it affords. The power of the synthetic approach was demonstrated here by the discovery of α/β-peptide-based binders to 14-3-3, in which β-amino acids were found to facilitate a key binding contact. All non-natural amino acids included in the library were easily decoded by standard proteomics software, and the sequences of isolated hits could be determined with high confidence. The inherent selectivity of the AS-MS approach for high-affinity binders was also evident in the case of 14-3-3 binding, as all peptides examined exhibited dissociation constants between 3 and 19 nM.

Because magnetic bead-based AS-MS still cannot reach the library diversities that have made molecular biology-based display techniques so successful, we anticipate that careful library design, with regard to monomer composition and scaffold choice, will be critical for discovery campaigns against more intractable targets. Interfacing AS-MS with stapled or bicyclic libraries, which would require robust post-enrichment linearization chemistry, is the subject of current investigation. We anticipate that with continued progress, sampling of completely non-natural chemical space with ever-greater throughput will enable discovery of modulators of historically undruggable PPIs.

3.4. Experimental

3.4.1. Materials

H-Rink Amide-ChemMatrix resin was purchased from PCAS BioMatrix Inc. (St-Jean-sur-Richelieu, Quebec, Canada). 30 μm TentaGel M NH₂ microspheres (M30352; 0.20 to 0.25 mmol/g amine loading) were purchased from Rapp Polymere (Tübingen, Germany). 20 μm TentaGel S NH₂ microspheres (TMN-9909-PI; 0.2 to 0.3 mmol/g amine loading) were purchased from Peptides International (Louisville, KY). Fmoc-Ala-OH, Fmoc-Arg(Pbf)-OH, Fmoc-Asn(Trt)-OH, Fmoc-Asp(tBu)-OH, Fmoc-Gln(Trt)-OH, Fmoc-Glu(tBu)-OH, Fmoc-Gly-OH, Fmoc-His(Trt)-OH, Fmoc-Leu-OH, Fmoc-Lys(Boc)-OH, Fmoc-Met-OH, Fmoc-Phe-OH, Fmoc-Pro-OH, Fmoc-Ser(tBu)-OH, Fmoc-Thr(tBu)-OH, Fmoc-Trp(Boc)-OH, Fmoc-Tyr(tBu)-OH, and Fmoc-Val-OH were purchased from Advanced ChemTech (Louisville, KY). Fmoc-D-Asp(tBu)-OH, Fmoc-D-Gln(Trt)-OH, Fmoc-D-Leu-OH, and Fmoc-D-Lys(Boc)-OH were also purchased from Advanced ChemTech (Louisville, KY). 1-[Bis(dimethylamino)methylene]-1H-1,2,3-triazolo[4,5-b]pyridinium-3-oxid-hexafluorophosphate (HATU) was purchased from P3 BioSystems (Louisville, KY). 4-[(R,S)-α-[1-(9H-Fluoren-9-yl)-methoxyformamido]-2,4-dimethoxybenzyl]-phenoxyacetic acid (Fmoc-Rink amide linker), Fmoc-L-His(Boc)-OH, Fmoc-β-Ala-OH, Fmoc-L-Lys(Alloc)-OH, and di-tert-butyl dicarbonate were purchased from Chem-Impex International (Wood Dale, IL). Fmoc-β-cyclopropyl-L-alanine, Fmoc-β-cyclohexyl-L-alanine, Fmoc-L-norvaline, Fmoc-O-tert-butyl-L-β-homoserine, Fmoc-O-tert-butyl-L-β-homothreonine, Fmoc-L-α-aminoadipic acid δ-tert-butyl ester, Nα-Fmoc-Nγ-Boc-L-2,4-diaminobutyric acid, Nα-Fmoc-Nδ-Boc-L-ornithine, Fmoc-3-(4-thiazolyl)-L-alanine, Fmoc-O-tert-butyl-L-trans-4-hydroxyproline, Fmoc-4-(Boc-amino)-L-phenylalanine, Fmoc-4-fluoro-L-phenylalanine, Fmoc-4-nitro-L-phenylalanine, and fluorescein isothiocyanate isomer I were also purchased from Chem-Impex International (Wood Dale, IL). Biotin-(PEG)₄-NHS ester and biotin-(PEG)₄-propionic acid were purchased from ChemPep Inc. (Wellington, FL). Peptide synthesis-grade N,N-dimethylformamide (DMF), dichloromethane (DCM), diethyl ether, HPLC-grade acetonitrile (MeCN), and HPLC-grade methanol (MeOH) were purchased from VWR International (Philadelphia, PA). Trifluoroacetic acid (TFA; for HPLC, ≥99%), piperidine (ReagentPlus; 99%), triisopropylsilane (98%), 1,2-ethanedithiol (≥98%), phenylsilane (97%), tetrakis(triphenylphosphine)palladium(0) (99%), and N-α-Fmoc-O-benzyl-L-phosphoserine were purchased from MilliporeSigma (St. Louis, MO). Diisopropylethylamine (99.5%; biotech. grade; DIEA) was also purchased from MilliporeSigma, and purified by passage through an activated alumina column (Pure Process Technology solvent purification system; Nashua, NH).

Water was deionized using a Milli-Q Reference water purification system (Millipore).

Peptone from casein, granulated yeast extract, glycerol, and imidazole were purchased from Merck. LB Broth (Miller) powder, ampicillin sodium salt, potassium phosphate dibasic

(≥99.0%), potassium phosphate monobasic, magnesium chloride hexahydrate (≥99.0%), β-mercaptoethanol (BME; ≥99.0%), and Trizma base (≥99.0%) were purchased from MilliporeSigma. Isopropyl β-D-thiogalactopyranoside (IPTG) was purchased from PanReac

AppliChem.

HyClone™ Fetal Bovine Serum (SH30071.03HI, heat inactivated) was purchased from

GE Healthcare Life Sciences (Logan, UT). Bovine serum albumin (BSA; RIA grade) and Tween

20 (reagent grade) were purchased from Amresco (Solon, OH). Dynabeads MyOne Streptavidin

T1 magnetic microparticles were purchased from Invitrogen (Carlsbad, CA).

Phosphate buffered saline (10x, Molecular biology grade) was purchased from Corning.

Tris(hydroxymethyl)aminomethane (Tris) was purchased from J.T. Baker. 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES; ≥99.5%), sodium bicarbonate (ACS grade, ≥99.7%), and magnesium chloride (≥98%) were purchased from MilliporeSigma. Tris(2-carboxyethyl)phosphine hydrochloride (TCEP) was purchased from Hampton Research (Aliso

Viejo, CA). Sodium chloride (ACS grade) was purchased from Avantor. Guanidine hydrochloride (Technical grade) and sodium phosphate monobasic monohydrate (ACS grade) were purchased from Amresco.

3.4.2. Preparation of a 1 x 10⁹-member (X)12K-CONH2 library

Library design: (X)12K-CONH2

SPPS:

4.2 g of 20 μm TentaGel resin (0.26 mmol/g, 1.1 mmol, 1.0 x 10⁹ beads) was transferred to a 100 mL peptide synthesis vessel, swollen in DMF, and then washed with DMF (3x). Fmoc-

Rink amide linker (2.9 g, 5.4 mmol, 5 eq) was dissolved in HATU solution (0.38 M in DMF,

12.9 mL, 4.9 mmol), activated with DIEA (2.7 mL, 16 mmol) immediately prior to coupling, and added to resin bed. Coupling was performed for 20 min; after this time, resin was washed with

DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in

DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments). Resin was then washed with

DMF (150 mL). Coupling of Fmoc-Lys(Boc)-OH, subsequent Fmoc removal, and DMF washes were performed in the same manner.
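The reagent quantities for the linker coupling follow from the resin mass and loading. The short sketch below reproduces that arithmetic from the values quoted above; the variable names are illustrative only, and the text reports rounded values (for example 1.1 mmol of resin amine and 4.9 mmol of HATU).

```python
# Sketch: scale of the Fmoc-Rink amide linker coupling, using values quoted in the text.
resin_mass_g = 4.2          # 20 um TentaGel resin
loading_mmol_per_g = 0.26   # amine loading
hatu_conc_M = 0.38          # HATU stock solution in DMF
hatu_volume_mL = 12.9

resin_mmol = resin_mass_g * loading_mmol_per_g   # reactive amine on resin
linker_mmol = 5 * resin_mmol                     # 5 eq relative to resin amine
hatu_mmol = hatu_conc_M * hatu_volume_mL         # M x mL = mmol
hatu_eq_vs_linker = hatu_mmol / linker_mmol      # activator relative to the linker

print(f"resin amine: {resin_mmol:.2f} mmol; linker at 5 eq: {linker_mmol:.2f} mmol")
print(f"HATU: {hatu_mmol:.2f} mmol ({hatu_eq_vs_linker:.2f} eq relative to linker)")
```

The activator comes out at roughly 0.9 eq relative to the linker, the same ratio stated explicitly for the HATU coupling in 3.4.12.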

At this stage, resin was suspended in DMF (50 mL), and divided evenly among 18 x 10 mL fritted plastic syringes using a 5 mL Eppendorf pipette. Couplings were performed as follows: Fmoc-protected amino acids (0.6 mmol) in HATU solution (0.38 M, 1.4 mL, 0.54 mmol) were activated with DIEA (300 μL, 1.7 mmol). Each of the amino acid derivatives listed in

2.4.10 was added to a single portion of resin (theory: ~260 mg resin, 60 μmol). Couplings were performed for 20 min. Remainder of split-and-pool synthesis (twelve rounds total) was completed according to the procedure outlined in 2.4.10.
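For context, the sequence space of this design is far larger than the number of beads. The sketch below assumes 18 monomers (one per syringe) at each of the 12 randomized positions and treats every bead as an independent, uniformly random draw. That is an idealization of split-and-pool synthesis, not a measured property of the library.

```python
# Sketch: theoretical diversity of (X)12 with 18 monomers versus the number of beads.
# Assumption: each bead carries an independent, uniformly random sequence.
monomers = 18
randomized_positions = 12
beads = 1.0e9

theoretical_diversity = monomers ** randomized_positions
sampled_fraction = beads / theoretical_diversity

# Birthday-problem estimate of bead pairs that share a sequence: n(n-1)/(2N)
expected_duplicate_pairs = beads * (beads - 1) / (2 * theoretical_diversity)

print(f"theoretical diversity: {theoretical_diversity:.2e}")
print(f"fraction of sequence space sampled: {sampled_fraction:.1e}")
print(f"expected duplicate pairs: {expected_duplicate_pairs:.0f}")
```

Under these assumptions only a few hundred of the 10⁹ beads are expected to share a sequence with another bead, so the number of library members is effectively set by the bead count.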

Portioning:

Following removal of the N-terminal Fmoc group, the resin was washed with DMF (150 mL), then suspended in DMF (~50 mL) and divided evenly among 5 x 20 mL fritted plastic syringes. Four of these portions (theory: 2 x 10⁸ beads each) were washed with DCM (3x) and dried under reduced pressure. The fifth portion was further divided among 11 x 10 mL fritted plastic syringes. Ten of these portions were recombined. The recombined beads, along with the remaining portion (theory: 1.8 x 10⁷ beads), were washed with DCM (3x) and dried under reduced pressure. 1.0 mg of dried resin was weighed into a plastic tube (theory: 1.4 x 10⁵ beads) and set aside for later characterization (described in 3.4.3).
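The theoretical bead counts for each portion can be checked with a few lines of arithmetic. This sketch assumes perfectly even division at each step and ignores the 1.0 mg characterization aliquot.

```python
# Sketch: theoretical bead counts after portioning (assumes perfectly even splits).
total_beads = 1.0e9

fifth = total_beads / 5          # each of the five 20 mL syringes
eleventh = fifth / 11            # each of the eleven 10 mL syringes
recombined = 10 * eleventh       # ten of the eleven portions pooled again

print(f"each large portion: {fifth:.1e} beads")
print(f"recombined portion: {recombined:.2e} beads")
print(f"small portion:      {eleventh:.2e} beads")
```

The four large portions plus the recombined portion account for the five libraries of roughly 2 x 10⁸ members, and the single small portion for the roughly 2 x 10⁷-member library.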

Cleavage from resin and solid phase extraction:

Libraries were globally deprotected and cleaved from resin as described in 2.4.10. Crude, lyophilized powders were resuspended in 95/5 water/acetonitrile (0.1% TFA), and purified over

Supelclean™ LC-18 SPE cartridges (2 g bed mass, 45 μm particle size, 12 mL; Millipore Sigma,

P/N 57117). Procedure is described in 2.4.10.

Preparation of stock solutions:

Lyophilized powders of 2 x 10⁸-member libraries were each dissolved first in DMF and then diluted with 1x PBS to a final library concentration of 8 mM (~40 pM/member), and a final DMF concentration of 10% (v/v). Lyophilized powder of 2 x 10⁷-member library was similarly first dissolved in DMF, and then diluted with 1x PBS to a final library concentration of 7 mM (~400 pM/member), and a final DMF concentration of 10% (v/v). Stock solutions were aliquoted and stored at -80 °C. Aliquots were thawed on ice prior to use.
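The per-member concentrations quoted for the stock solutions are the total library concentration divided by the number of members. A minimal sketch, assuming every member is present in equal amount (the helper name is illustrative):

```python
# Sketch: per-member concentration of a library stock, assuming equal representation.
def per_member_pM(total_mM: float, members: float) -> float:
    """Convert a total library concentration (mM) into pM per library member."""
    return total_mM * 1e-3 / members * 1e12

large_library_pM = per_member_pM(8, 2e8)   # 2 x 10^8-member library at 8 mM
small_library_pM = per_member_pM(7, 2e7)   # 2 x 10^7-member library at 7 mM

print(f"{large_library_pM:.0f} pM/member, {small_library_pM:.0f} pM/member")
```

The second value is 350 pM/member, which the text rounds to ~400 pM/member.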

3.4.3. Characterization of a 1 x 10⁹-member (X)12K-CONH2 library

Sample preparation:

A 1.0 mg aliquot of library resin (from 3.4.2) was suspended in 1.0 mL of Milli-Q water and sonicated to achieve a homogeneous suspension (theory: 1.4 x 10⁵ beads/mL). A 4 μL aliquot

(theory: 559 beads; 1 pmol/peptide) was transferred to a plastic tube, spun down, and supernatant removed. Beads were then subjected to treatment with 94% (v/v) TFA, 2.5% (v/v) ethanedithiol,

2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 10 min in a 60 °C water bath. TFA was then evaporated under a stream of nitrogen, and cleaved peptide was resuspended in Milli-Q water (0.1% TFA). Sample was purified over a C18 ZipTip® (0.6 μL, MilliporeSigma, P/N

ZTC18S096), eluted in 30/70 water/acetonitrile (0.1% TFA), and lyophilized. Powder was resuspended in 34 μL of Milli-Q water (0.1% FA), and 1 μL (~30 fmol/peptide) was submitted for nLC-MS/MS analysis.
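The bead count in the aliquot and the amount of each peptide injected follow from the dilutions described above. The sketch below uses the rounded bead density from the text, so it gives 560 rather than 559 beads. It also assumes full recovery through cleavage and ZipTip cleanup, which the source does not state.

```python
# Sketch: beads per aliquot and per-peptide amount injected (assumes full recovery).
beads_per_mL = 1.4e5        # theoretical density of the bead suspension
aliquot_uL = 4
pmol_per_peptide = 1.0      # per-bead peptide amount quoted in the text
resuspension_uL = 34
injection_uL = 1

beads_in_aliquot = beads_per_mL * aliquot_uL / 1000
fmol_injected = pmol_per_peptide * 1000 / resuspension_uL * injection_uL

print(f"beads in aliquot: {beads_in_aliquot:.0f}")
print(f"injected per peptide: {fmol_injected:.1f} fmol")
```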

NanoLC-MS/MS analysis and de novo peptide sequencing:

Analysis and de novo sequencing were performed as described in 2.4.11.

3.4.4. Preparation of synthetic (25-109)MDM2 K36(biotin)

The N-terminal domain of MDM2 (residues 25-109; sequence shown in Supplementary Fig. 28) was synthesized on a 0.03 mmol scale on H-Rink amide-ChemMatrix resin (0.18 mmol/g) via automated fast flow synthesis¹³.

A biotin was site-specifically incorporated as follows: Fmoc-L-Lys(alloc)-OH was used for coupling of Lys36 during SPPS. Following main chain elaboration, the N-terminal amino group was Boc-protected by first washing the resin

3 times with DCM, then adding to the resin a solution of di-tert-butyl dicarbonate (40 eq, 400 mM) and DIEA (40 eq) in DCM. Coupling was allowed to proceed for 30 min. At this time, resin was washed 5 times with DCM and coupling was repeated as described. To remove the

Alloc group on Lys36, resin was treated with a solution of tetrakis(triphenylphosphine)palladium(0) (2 eq, 20 mM) and phenylsilane (80 eq) in DCM. After

30 min, reaction mixture was drained and deprotection was repeated as described. At this time, resin was washed 5 times with DCM, then 5 times with DMF. To a solution of biotin-(PEG)4-propionic acid (15 eq, 0.42 M) and HATU (13.5 eq, 0.38 M) in DMF was added DIEA (45 eq), and solution was then added to the resin bed and allowed to react for 3 h. At this time, resin was washed 5 times with DMF, 5 times with DCM, and dried under reduced pressure.

LC-MS characterization of HPLC-purified (25-109)MDM2 K36(biotin) was performed as described in 2.4.2.

3.4.5. Affinity selections against MDM2: multi-pot selections of five (2 x 10⁸)-member libraries

Procedure for each selection (five in total), conducted side by side against 12ca5 as a control, is outlined below:

Preparation of MDM2-functionalized and 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (2 x 300 μL of 10 mg/mL stock) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS, and then treated with 115 μL of refolded, biotinylated (25-109)MDM2 (10.8 μM; 1.2 nmol; diluted to 300 μL with 10% FBS, 0.02% Tween 20, 1x PBS) or 300 μL of biotinylated 12ca5 (1.5 μM; 0.45 nmol). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C.

After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS. Beads were resuspended in 300 μL of 10% FBS, 0.02% Tween 20, 1x PBS.
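The protein amounts and bead masses in this step can be verified from the stated volumes and concentrations. The sketch below does only that arithmetic; it makes no assumption about how much protein is actually captured on the beads.

```python
# Sketch: amounts of protein offered to the beads and bead mass per 100 uL portion.
def nmol(volume_uL: float, conc_uM: float) -> float:
    """uL x uM = pmol; divide by 1000 to get nmol."""
    return volume_uL * conc_uM / 1000

mdm2_nmol = nmol(115, 10.8)    # biotinylated (25-109)MDM2
ab_nmol = nmol(300, 1.5)       # biotinylated 12ca5

bead_mg = 300 / 1000 * 10                 # 300 uL of 10 mg/mL Dynabead stock
bead_mg_per_100uL = bead_mg / 300 * 100   # after resuspension in 300 uL

print(f"MDM2: {mdm2_nmol:.2f} nmol; 12ca5: {ab_nmol:.2f} nmol")
print(f"beads: {bead_mg:.1f} mg total, {bead_mg_per_100uL:.1f} mg per 100 uL")
```

Each tube therefore supplies three 1 mg portions of functionalized beads for the affinity capture step.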

Affinity capture:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

3.4.6. Affinity selections against MDM2: one-pot selections of 2 x 10⁷- and 1 x 10⁹-member libraries

Preparation of MDM2-functionalized and 12ca5-functionalized magnetic beads:

Magnetic beads were prepared as described in 3.4.5.

Affinity capture:

For 2 x 10⁷-member library:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.

For 1 x 10⁹-member library:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads in the presence of 10% FBS, 1x PBS (final volume: 1.5 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 0.7 mg/mL magnetic beads, 7 pM/member library.
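The final conditions quoted for each selection follow directly from the per-member amount, the bead mass, and the final volume. The sketch below also estimates the volume of a 40 pM/member stock (from 3.4.2) needed to deliver 10 fmol/member. The function name is illustrative.

```python
# Sketch: final selection conditions from amount, bead mass, and volume.
def final_conditions(fmol_per_member: float, bead_mg: float, volume_mL: float):
    """Return (pM per member, mg/mL beads); 1 fmol/mL equals 1 pM."""
    return fmol_per_member / volume_mL, bead_mg / volume_mL

pM_1mL, beads_1mL = final_conditions(10, 1, 1.0)     # 1 mL selections
pM_1p5mL, beads_1p5mL = final_conditions(10, 1, 1.5) # 1.5 mL selection

# Volume of a 40 pM/member stock (40 fmol/mL) that contains 10 fmol/member
stock_volume_uL = 10 / 40 * 1000

print(f"1 mL:   {pM_1mL:.0f} pM/member, {beads_1mL:.1f} mg/mL beads")
print(f"1.5 mL: {pM_1p5mL:.1f} pM/member, {beads_1p5mL:.2f} mg/mL beads")
print(f"stock needed per 2 x 10^8-member library: {stock_volume_uL:.0f} uL")
```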

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

3.4.7. Expression of 14-3-3γ

Full-length human 14-3-3γ was expressed by transforming pROEX HTb plasmid, containing a His6-tagged 14-3-3γFL gene and ampicillin resistance gene, into Rosetta(DE3)

Escherichia coli cells (Novagen). Cells were grown at 37 °C, with 0.1 mg/mL ampicillin, and protein expression was induced using 0.4 μM IPTG and 1 mM MgCl2 and left overnight at 18

°C. Cells were harvested and resuspended in 200 mL of wash buffer (50 mM Tris, 300 mM

NaCl, 12.5 mM imidazole, 2 mM β-mercaptoethanol (BME), pH = 8.0). The proteins were isolated by homogenizing the cell pellets at a pressure of 40 psi using an Emulsiflex-C3 homogenizer. The homogenized mixture was centrifuged at 40,000 x g for 30 min at 4 °C and the supernatant was loaded onto a nickel-nitrilotriacetic acid affinity column (Qiagen) pre-equilibrated with wash buffer. After washing the column with wash buffer containing 12.5 mM imidazole, the bound protein was eluted with 250 mM imidazole. Fractions containing protein were verified using SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis). The protein-containing fractions were dialyzed into dialysis buffer (50 mM Tris pH 8, 300 mM NaCl and 2 mM BME) and subsequently into ITC buffer (25 mM HEPES pH 7.5, 100 mM NaCl, 10 mM MgCl2 and 0.5 mM TCEP). The protein was concentrated (concentration measured using a Nanodrop-1000), aliquoted and stored at -80 °C.

Purity and exact mass of the 14-3-3γ protein were determined using a High Resolution

LC-MS system consisting of a Waters ACQUITY UPLC I-Class system coupled to a Xevo G2

Quadrupole Time of Flight (Q-ToF). The system comprised a Binary Solvent Manager and a Sample Manager with Fixed-Loop (SM-FL). The protein was separated (0.3 mL/min) by the column (Polaris C18A reverse phase column 2.0 x 100 mm, Agilent) using a 15% to 75% acetonitrile gradient in water (0.1% v/v formic acid) before analysis in positive mode in the mass spectrometer. Deconvolution was performed using the MaxEnt1 algorithm in the MassLynx v4.1

(SCN862) software.

3.4.8. Biotinylation of 14-3-3γ

To 900 μL of a 0.35 mM solution of 14-3-3γ (10 mg, 0.32 μmol) in reaction buffer (25 mM HEPES, 100 mM NaCl, 10 mM MgCl2, 0.5 mM TCEP, pH = 7.5) was added 100 μL of a

6.3 mM solution (in DMF) of NHS-PEG4-biotin (2 eq, 0.63 μmol). Reaction was allowed to proceed for 1 h at ambient temperature. At this time, reaction was quenched with the addition of

5 mL of quenching buffer (25 mM Tris, 100 mM NaCl, 10 mM MgCl2, 0.5 mM TCEP, pH =

7.5). Sample was then spin washed 5x (10k MW cutoff, 15 mL spin filter), from quenching buffer back into reaction buffer, to remove excess biotin reagent. Protein concentration, measured by absorbance at 280 nm (ε = 37,945 M⁻¹ cm⁻¹), was determined to be 85 μM. Protein was aliquoted and stored at -80 °C.
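The stoichiometry of the biotinylation and the absorbance-based concentration can be cross-checked as follows. The sketch uses only the amounts quoted in this section. The 1 cm path length in the Beer-Lambert step is an assumption, since the measurement geometry is not specified.

```python
# Sketch: biotinylation stoichiometry and Beer-Lambert check.
protein_volume_mL = 0.900
protein_conc_mM = 0.35
reagent_equivalents = 2
reagent_volume_mL = 0.100

protein_umol = protein_volume_mL * protein_conc_mM     # mL x mM = umol
reagent_umol = reagent_equivalents * protein_umol
reagent_stock_mM = reagent_umol / reagent_volume_mL    # stock implied by 0.63 umol in 100 uL

# Beer-Lambert: A = epsilon * c * l (path length assumed to be 1 cm)
epsilon_M_cm = 37945
final_conc_M = 85e-6
a280_at_1cm = epsilon_M_cm * final_conc_M * 1.0

print(f"protein: {protein_umol:.3f} umol; reagent: {reagent_umol:.2f} umol")
print(f"implied reagent stock: {reagent_stock_mM:.1f} mM")
print(f"A280 at 85 uM over 1 cm: {a280_at_1cm:.2f}")
```

Delivering 0.63 μmol in 100 μL corresponds to a millimolar reagent stock.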

3.4.9. Preparation of a non-canonical (X)4(pS)(X)4K-CONH2 library

Library design: (X)4(pS)(X)4K-CONH2

SPPS:

2.9 g of 30 μm TentaGel resin (0.26 mmol/g, 0.74 mmol, 2.0 x 10⁸ beads) was transferred to a 100 mL peptide synthesis vessel, swollen in DMF, and then washed with DMF (3x). Fmoc-

Rink amide linker (2.0 g, 3.7 mmol, 5 eq) was dissolved in HATU solution (0.38 M in DMF, 8.8 mL, 3.3 mmol), activated with DIEA (1.9 mL, 11 mmol) immediately prior to coupling, and added to resin bed. Coupling was performed for 20 min; after this time, resin was washed with

DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in

DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments). Resin was then washed with

DMF (150 mL). Coupling of Fmoc-Lys(Boc)-OH, subsequent Fmoc removal, and DMF washes were performed in the same manner.

At this stage, resin was suspended in DMF (50 mL), and divided evenly among 18 x 10 mL fritted plastic syringes using a 5 mL Eppendorf pipette. Couplings were performed as follows: Fmoc-protected amino acids (0.29 mmol) in HATU solution (0.38 M, 683 μL, 0.26 mmol) were activated with DIEA (150 μL, 0.86 mmol). Each of the following amino acid derivatives was added to a single portion of resin (theory: ~190 mg resin, 40 μmol): Fmoc-D-Leu-OH, Fmoc-D-Lys(Boc)-OH, Fmoc-D-Asp(OtBu)-OH, Fmoc-D-Gln(Trt)-OH, Fmoc-β-Ala-OH, Fmoc-L-β-HomoSer(tBu)-OH, Fmoc-β-HomoThr(tBu)-OH, Fmoc-Ala(β-cyclopropyl)-OH, Fmoc-L-Cha-OH, Fmoc-Nva-OH, Fmoc-L-Aad(OtBu)-OH, Fmoc-Dab(Boc)-OH, Fmoc-L-Orn(Boc)-OH, Fmoc-Hyp(tBu)-OH, Fmoc-L-Ala(4-thiazolyl)-OH, Fmoc-L-Phe(4-NHBoc)-OH,

Fmoc-Phe(4-F)-OH, and Fmoc-Phe(4-NO2)-OH. Couplings were performed for 20 min.

Following four cycles of split-and-pool synthesis, Fmoc-Ser(PO(OBzl)OH)-OH (1.9 g, 3.7 mmol, 5 eq) was dissolved in HATU solution (0.38 M in DMF, 8.8 mL, 3.3 mmol), activated with DIEA (1.9 mL, 11 mmol) immediately prior to coupling, and added to resin bed. Coupling was performed for 2.5 h; after this time, resin was washed with DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments). Resin was then washed with DMF (150 mL). Four more cycles of split-and-pool synthesis were then performed as described above.

Following removal of N-terminal Fmoc group, resin was washed with DMF (150 mL) and transferred to fritted plastic syringes (20 mL). Resin was then washed with DCM (3x) and dried under reduced pressure. 1.0 mg of dried resin (theory: 4.8 x 10⁴ beads) was weighed into a plastic tube and set aside for later characterization (3.4.10).

Cleavage from resin and solid phase extraction:

Libraries were globally deprotected and cleaved from resin as described in 2.4.10. Crude, lyophilized powders were resuspended in 95/5 water/acetonitrile (0.1% TFA), and purified over

Supelclean™ LC-18 SPE cartridges (2 g bed mass, 45 μm particle size, 12 mL; Millipore Sigma,

P/N 57117). Procedure is described in 2.4.10.

Preparation of stock solution:

Lyophilized powder of library (123 mg) was dissolved in DMF (1.16 mL) and diluted with 1x PBS (10.5 mL) to a final library concentration of 8 mM (~40 pM/member) and final

DMF concentration of 10% (v/v). Stock solutions were aliquoted and stored at -80 °C.

Aliquots were thawed on ice prior to use.
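The stock solution parameters can be back-calculated from the quantities given. In the sketch below, the DMF fraction and the total amount of library follow directly from the text. The implied average molar mass is an inference, not a reported value, and would include any counterions or residual salts in the lyophilized powder.

```python
# Sketch: composition of the (X)4(pS)(X)4K library stock solution.
powder_mg = 123
dmf_mL = 1.16
pbs_mL = 10.5
library_conc_mM = 8
members = 2.0e8            # one member per bead (theory)

total_mL = dmf_mL + pbs_mL
dmf_percent = 100 * dmf_mL / total_mL
total_umol = library_conc_mM * total_mL                   # mM x mL = umol
implied_avg_molar_mass = powder_mg / total_umol * 1000    # g/mol, inferred only
per_member_pM = library_conc_mM * 1e-3 / members * 1e12

print(f"DMF: {dmf_percent:.1f}% (v/v); library: {total_umol:.1f} umol")
print(f"implied average molar mass: {implied_avg_molar_mass:.0f} g/mol")
print(f"per-member concentration: {per_member_pM:.0f} pM")
```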

3.4.10. Characterization of a non-canonical (X)4(pS)(X)4K-CONH2 library

Sample preparation:

A 1.0 mg aliquot of library resin (from 3.4.9) was suspended in 1.0 mL of Milli-Q water and sonicated to achieve a homogeneous suspension (theory: 4.8 x 10⁴ beads/mL). A 10 μL aliquot (theory: 475 beads; 1 pmol/peptide) was transferred to a plastic tube, spun down, and supernatant removed. Beads were then subjected to treatment with 94% (v/v) TFA, 2.5% (v/v) ethanedithiol, 2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 10 min in a 60 °C water bath. TFA was then evaporated under a stream of nitrogen, and cleaved peptide was resuspended in Milli-Q water (0.1% TFA). Sample was purified over a C18 ZipTip® (0.6 μL, MilliporeSigma,

P/N ZTC18S096), eluted in 30/70 water/acetonitrile (0.1% TFA), and lyophilized. Powder was resuspended in 90 μL of Milli-Q water (0.1% FA), and 1 μL (~40 fmol/peptide) was submitted for nLC-MS/MS analysis.

NanoLC-MS/MS analysis and de novo peptide sequencing:

Analysis and de novo sequencing were performed as described in 2.4.11. Non-canonical amino acids with masses that differ from natural amino acids were sequenced as fixed modifications on residues that had been excluded from the monomer set. Specifically, β-homothreonine was identified as a fixed modification on Asn (+1.0204), aminoadipic acid as a fixed modification on Glu (+14.0156), diaminobutyric acid as a fixed modification on Gly (+43.0421), ornithine as a fixed modification on Cys (+11.0701), hydroxyproline as a fixed modification on Pro (+15.9948), cyclopropyl alanine as a fixed modification on Met (-19.9721), cyclohexyl alanine as a fixed modification on Phe (+6.0469), 4-amino phenylalanine as a fixed modification on Arg (+5.9782), 4-fluoro phenylalanine as a fixed modification on Tyr (+1.9957), 4-nitro phenylalanine as a fixed modification on Trp (+5.9742), thiazolyl alanine as a fixed modification on His (+16.9611), and phosphoserine as a fixed modification on Ser (+79.9663).

β-alanine, β-homoserine, and norvaline were identified as Ala, Thr, and Val, respectively. D-

Leu, D-Lys, D-Asp, and D-Gln were identified as Leu, Lys, Asp, and Gln, respectively.
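The fixed-modification offsets listed above are the monoisotopic mass differences between each non-canonical residue and the canonical residue standing in for it. The sketch below recomputes them from residue elemental compositions (amino acid minus water). The compositions and atomic masses are supplied here for illustration and are not taken from the source, and the sequencing software itself is not modeled.

```python
# Sketch: recompute fixed-modification mass offsets from residue compositions.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
        "S": 31.9720707, "F": 18.9984032, "P": 30.9737620}

def mass(formula: dict) -> float:
    """Monoisotopic mass of an elemental composition."""
    return sum(MONO[element] * count for element, count in formula.items())

# (non-canonical residue, its formula, placeholder residue, its formula, reported offset)
TABLE = [
    ("beta-homothreonine", dict(C=5, H=9, N=1, O=2), "Asn", dict(C=4, H=6, N=2, O=2), 1.0204),
    ("aminoadipic acid", dict(C=6, H=9, N=1, O=3), "Glu", dict(C=5, H=7, N=1, O=3), 14.0156),
    ("diaminobutyric acid", dict(C=4, H=8, N=2, O=1), "Gly", dict(C=2, H=3, N=1, O=1), 43.0421),
    ("ornithine", dict(C=5, H=10, N=2, O=1), "Cys", dict(C=3, H=5, N=1, O=1, S=1), 11.0701),
    ("hydroxyproline", dict(C=5, H=7, N=1, O=2), "Pro", dict(C=5, H=7, N=1, O=1), 15.9948),
    ("cyclopropyl alanine", dict(C=6, H=9, N=1, O=1), "Met", dict(C=5, H=9, N=1, O=1, S=1), -19.9721),
    ("cyclohexyl alanine", dict(C=9, H=15, N=1, O=1), "Phe", dict(C=9, H=9, N=1, O=1), 6.0469),
    ("4-amino phenylalanine", dict(C=9, H=10, N=2, O=1), "Arg", dict(C=6, H=12, N=4, O=1), 5.9782),
    ("4-fluoro phenylalanine", dict(C=9, H=8, N=1, O=1, F=1), "Tyr", dict(C=9, H=9, N=1, O=2), 1.9957),
    ("4-nitro phenylalanine", dict(C=9, H=8, N=2, O=3), "Trp", dict(C=11, H=10, N=2, O=1), 5.9742),
    ("thiazolyl alanine", dict(C=6, H=6, N=2, O=1, S=1), "His", dict(C=6, H=7, N=3, O=1), 16.9611),
    ("phosphoserine", dict(C=3, H=6, N=1, O=5, P=1), "Ser", dict(C=3, H=5, N=1, O=2), 79.9663),
]

deltas = {name: mass(ncaa) - mass(canonical) for name, ncaa, _, canonical, _ in TABLE}
for name, _, placeholder, _, reported in TABLE:
    print(f"{name:24s} on {placeholder}: {deltas[name]:+.4f} (reported {reported:+.4f})")
```

Each computed offset agrees with the reported value to within about 0.0001 Da, which is consistent with rounding.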

3.4.11. Affinity selections against 14-3-3γ

Preparation of 14-3-3γ-functionalized and 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (2 x 300 μL of 10 mg/mL stock) were transferred to

1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL with blocking buffer (10% FBS, 100 μM tri-tryptophan additive, 1x PBS), and then treated with 9 μL of biotinylated 14-3-3γ (85 μM; 0.79 nmol; diluted to 300 μL with blocking buffer) or 300 μL of biotinylated 12ca5 (1.5 μM; 0.45 nmol). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 3 x 1 mL with blocking buffer. Beads were then resuspended in 300 μL of blocking buffer.

Affinity capture:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of blocking buffer (final volume:

1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

3.4.12. SPPS of FITC-labeled putative 14-3-3γ-binding peptides

Preparation of Lys(boc)–β-Ala–Lys(alloc)-Rink amide peptidyl resin:

Rink amide ChemMatrix resin (1.0 g, 0.45 mmol/g) was transferred to a 20 mL plastic fritted syringe, washed 3 x 20 mL with DMF, and swollen in 20 mL of DMF for

1 h. Fmoc-L-Lys(alloc)-OH (2.25 mmol, 5 eq, 1.0 g) was weighed into a glass vial and dissolved in 0.38 M HATU in DMF (5.34 mL, 2.0 mmol, 0.9 eq HATU). To this solution was added DIEA

(1.13 mL, 6.5 mmol, 2.9 eq), and activated amino acid solution was added to the resin bed.

Coupling was allowed to proceed for 1 h. At this time, the reaction mixture was drained and the resin was washed 3 x 20 mL with DMF. Fmoc removal was carried out by treatment of the resin with 20% piperidine in DMF (2 x 20 mL, 5 min batch treatments). Resin was then washed 3 x 20 mL with DMF. Couplings of Fmoc-β-Ala-OH and Fmoc-Lys(boc)-OH were performed in the same manner. After removal of the N-terminal Fmoc group, the resin was suspended in DMF

(~20 mL) and split out 10 ways into 6 mL plastic fritted syringes. (Note: β-Ala was incorporated as a spacer between the sequences obtained from selection, which all bear a C-terminal lysine, and Lys(alloc), to which a FITC fluorophore will be coupled for fluorescence anisotropy studies.)

Main chain elaboration of select sequences derived from affinity selection:

One portion of peptidyl resin prepared above was used for every construct prepared (four in total). Couplings were performed as follows: Fmoc-protected amino acids (0.23 mmol) in

HATU solution (0.38 M, 534 μL, 0.2 mmol) were activated with DIEA (113 μL, 0.65 mmol) and added to the resin bed. Couplings were allowed to proceed for 20 min. At this time, reaction mixtures were drained and resins were washed 3 x 5 mL with DMF. Fmoc removal was carried out by treatment of the resin with 20% piperidine in DMF (1 x 5 mL flow wash; 2 x 5 mL, 5 min batch treatments). Following removal of N-terminal Fmoc group, resins were washed 3 x 5 mL with DMF, then 3 x 5 mL with DCM.

Incorporation of FITC:

The free amine on the N-terminus was Boc-protected as follows: to a solution of di-tert-butyl dicarbonate (0.45 mmol, 10 eq, 400 mM) in DCM was added DIEA (10 eq), and solution was added to each portion of resin. Coupling was allowed to proceed for 1 h. At this time, resin was washed 3 x 5 mL with DCM and coupling was repeated as described. Resin was washed 5 x

5 mL with DCM. Alloc removal was achieved as follows: each portion of resin was treated with a solution of tetrakis(triphenylphosphine)palladium(0) (0.5 eq, 20 mM) and phenylsilane (20 eq) in DCM, 2 x 45 min. Resins were then washed 3 x 5 mL with DCM, then 3 x 5 mL with DMF.

FITC was installed on the free amine on each C-terminal lysine by treating each portion of resin with fluorescein isothiocyanate isomer I (10 eq, 400 mM in 4:1 DMF:DCM) and DIEA (15 eq) for 1.5 h. Reactions were kept under aluminum foil for the duration of the coupling. Reaction mixtures were then drained, and resins were washed 3 x 5 mL with DMF, 3 x 5 mL with DCM, and dried under reduced pressure.

Cleavage from resin, HPLC purification, and LC-MS characterization:

Detailed procedures can be found in 2.4.2. Sequences, structures, and LC-MS traces are shown below.

3.4.13. Fluorescence anisotropy binding assay of 14-3-3γ-binding peptides

All fluorescence anisotropy affinity measurements were conducted in FA buffer (10 mM HEPES pH 7.4, 150 mM NaCl, 0.1% (v/v) Tween 20, 0.1% (w/v) BSA). During each assay a 1:1 dilution series (starting at 100 μM) of 14-3-3γ was made in wells containing a fixed concentration of FITC-labeled peptide (10 nM or 50 nM). This was done in polystyrene non-binding low-volume Corning Black Round Bottom 384-well plates (Corning 4514).

Measurements were performed at ambient temperature using a Tecan Infinite F500 plate reader with the following parameters: λex: 485 (20) nm; λem: 535 (25) nm; mirror: Dichroic 510; flashes: 20; integration time: 50 μs; settle time: 0 μs; gain: manual 90; Z-position: calculated from well.

The G-factor was set at 35 mP based on wells containing only the FITC-labeled peptide.
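The protein titration can be laid out programmatically. The sketch below interprets the 1:1 dilution series as serial two-fold dilutions from 100 μM. The number of points (12 here) is a hypothetical choice, since the source does not state how many wells were used.

```python
# Sketch: concentrations in a serial two-fold (1:1) dilution series.
# n_points is hypothetical; the source does not specify the number of wells.
def twofold_series(start_uM: float, n_points: int) -> list:
    """Return concentrations obtained by repeatedly mixing 1:1 with buffer."""
    return [start_uM / 2 ** i for i in range(n_points)]

series_uM = twofold_series(100.0, 12)

for well, conc in enumerate(series_uM, start=1):
    print(f"well {well:2d}: {conc:9.4f} uM")
```

With 12 points, the series spans 100 μM down to roughly 49 nM, which brackets the 10 nM and 50 nM tracer concentrations only from above. A longer series would be needed to reach well below the tracer concentration.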

3.4.14. SPPS of unlabeled 14-3-3γ-binding peptides

SPPS was carried out on Rink amide ChemMatrix resin (0.45 mmol/g). Couplings for main chain elaboration were carried out as described in 3.4.12. Cleavage from resin, HPLC purification, and LC-MS characterization were performed as described in 2.4.2. Sequences, structures, and LC-MS traces are shown below.

3.4.15. Competition fluorescence anisotropy binding assay of 14-3-3γ-binding peptides

All fluorescence anisotropy affinity measurements were conducted in FA buffer (10 mM

HEPES pH 7.4, 150 mM NaCl, 0.1% (v/v) Tween-20, 0.1% (w/v) BSA). During each assay a 1:1 dilution series (starting at 10 μM) of unlabeled peptide was made in wells containing a fixed concentration of FITC-labeled biExoS (10 nM; FITC-O1Pen-QGLLDALDLAS(GGSGGGGSGG)QGLLDALDLAS-CONH2)²⁰ and 14-3-3γ (20 nM). This was done in polystyrene non-binding low-volume Corning Black Round Bottom 384-well plates (Corning 4514). Measurements were performed at ambient temperature using a Tecan Infinite F500 plate reader with the following parameters: λex: 485 (20) nm; λem: 535 (25) nm; mirror: Dichroic 510; flashes: 20; integration time: 50 μs; settle time: 0 μs; gain: manual 90; Z-position: calculated from well. The G-factor was set at 35 mP based on wells containing only the FITC-labeled peptide.

3.4.16. Expression of 14-3-3σΔc

The 14-3-3σ isoform with a truncated C-terminus after T321 (ΔC, to enhance crystallization) was expressed by transforming pROEX HTb plasmid, containing a His6-tagged

14-3-3σΔc gene and ampicillin resistance gene, into BL21(DE3) Escherichia coli cells (Novagen).

Cells were grown at 37 °C, with 0.1 mg/mL ampicillin, and protein expression was induced using 0.4 μM IPTG and 1 mM MgCl2 and left overnight at 18 °C. Cells were harvested and resuspended in 200 mL of wash buffer (50 mM Tris, 300 mM NaCl, 12.5 mM imidazole, 2 mM

β-mercaptoethanol (BME), pH = 8.0). The proteins were isolated by homogenizing the cell pellets at a pressure of 40 psi using an Emulsiflex-C3 homogenizer. The homogenized mixture was centrifuged at 40,000 x g for 30 min at 4 °C and the supernatant was loaded onto a nickel-nitrilotriacetic acid affinity column (Qiagen) pre-equilibrated with wash buffer. After washing the column with wash buffer containing 12.5 mM imidazole, the bound protein was eluted with 250 mM imidazole. Fractions containing protein were verified using SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis). The protein-containing fractions were dialyzed overnight at 4 °C in dialysis buffer (50 mM Tris pH 8, 300 mM NaCl and 2 mM BME) containing TEV protease for His-tag cleavage. Non-cleaved His-tagged 14-3-3σΔc was then captured using a nickel-nitrilotriacetic acid affinity column (Qiagen) pre-equilibrated with wash buffer, after which the flow-through was dialyzed into ITC buffer (25 mM HEPES pH 7.5, 100 mM NaCl, 10 mM MgCl2 and 0.5 mM TCEP) at 4 °C. The protein was concentrated (concentration measured using a Nanodrop-1000), aliquoted and stored at -80 °C. Purity and exact mass of the 14-3-3σΔc protein were determined as described in 3.4.7.

3.4.17. Binding validation of 14-3-3.12 and 14-3-3σΔc

Fluorescence anisotropy measurements were carried out according to the general protocol described in 3.4.13.

3.4.18. Single crystal X-Ray diffraction analysis of 14-3-3σΔc in complex with synthetic peptide binder 14-3-3.12

Unlabeled 14-3-3.12 was soaked into preformed crystals of 14-3-3σΔc, which grew in

25% PEG400, 5% glycerol, 0.2 M CaCl2, 0.1 M HEPES pH 7.5 plus 2 mM BME within two weeks. The soaked crystal was fished after 15 days of incubation and flash-frozen in liquid nitrogen. Diffraction data were collected at 100 K on an in-house Rigaku Micromax-003 (Rigaku

Europe, Kemsing Sevenoaks, UK) sealed tube X-ray source and Dectris Pilatus 200K detector

(DECTRIS Ltd., Baden-Daettwil, Switzerland).

Integration, scaling and merging of data were performed using DIALS (CCP4i2), after which molecular replacement was performed with MOLREP (CCP4i2) using PDB 4JC3 as the search model. A three-dimensional structure of 14-3-3.12 was generated using eLBOW (Phenix)²³, after which it was built into the model based on visual inspection of Fo-Fc and 2Fo-Fc electron density maps in Coot²⁴. Several rounds of model building and refinement (based on isotropic B-factors and a standard set of stereochemical restraints: covalent bonds, angles, dihedrals, planarities, chiralities, non-bonded) were performed using Coot and Phenix.refine²⁵,²⁶. See Table 3.6.3 for data collection and refinement statistics.

3.5. Acknowledgements

This work was supported by the NIH/NIGMS Interdepartmental Biotechnology Training

Program (T32 GM008334 to A.J.Q.), the Defense Advanced Research Projects Agency

(DARPA; Award 023504-001 to B.L.P.), and Calico (to B.L.P.). We thank Bente Somsen for expression of 14-3-3, assistance with library synthesis, and X-ray structure determination of the

14-3-3.12/14-3-3σΔc complex. We also thank Joseph Brown for assistance with library synthesis, Nina Hartrampf for assistance with automated fast flow synthesis of (25-109)MDM2

K36(biotin), and Zak Gates for many fruitful scientific discussions.

3.6. Appendix

Figure 3.6.1. Fully automated fast flow synthesis enables rapid access to biotinylated MDM2 ring domain. a) UV trace from automated fast flow synthesizer acquired during synthesis, monitoring absorption of the dibenzofulvene-piperidine adduct from Fmoc deprotection. b) Analytical HPLC and c) LC-MS characterization of crude (25-109)MDM2 K36 biotin. d) MS data and e) deconvolution of LC-MS trace in (c). f) Analytical HPLC and g) LC-MS characterization of 1x HPLC purified (25-109)MDM2 K36 biotin. h) MS data and i) deconvolution of LC-MS trace in (g).


Figure 3.6.2. nLC-MS/MS characterization of a (X)12K-CONH2 library. Analysis of this library, synthesized on 1.0 x 10⁹ beads of 20 μm TentaGel resin, identifies 208 individual peptide sequences with an ALC score ≥ 80 from a theory of 559 beads cleaved. A positional frequency plot based on these 208 sequences is shown.


[Figure 3.6.3 panels: annotated MS/MS spectra. a) ALC 99%; b) ALC 84%]

Figure 3.6.3. Incomplete peptide backbone fragmentation during MS/MS results in potentially inaccurate sequence assignments. a) A complete ladder of y and b ions from MS/MS analysis enables the high-confidence sequence assignment of FTFLDYWQLLTGK, which contains the MDM2-binding, FXXXWXXL motif. b) Missing y4 and b9 ions prevent a high-confidence assignment of ‘LQ’ vs. ‘QL’ in the sequence FTFWDYWTLQNYK, which would contain the FXXXWXXL motif if the relative positions of Leu and Gln were inverted.
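The ambiguity described in panel (b) can be illustrated by computing which singly charged b and y fragment ions differ between the assigned sequence and its Leu/Gln-swapped alternative. The sketch below uses standard monoisotopic residue masses and assumes a C-terminal amide. It is not the de novo sequencing algorithm used in this work.

```python
# Sketch: which b/y ions distinguish 'LQ' from 'QL' in FTFWDYWTLQNYK-CONH2?
RESIDUE = {"F": 147.06841, "T": 101.04768, "W": 186.07931, "D": 115.02694,
           "Y": 163.06333, "L": 113.08406, "Q": 128.05858, "N": 114.04293,
           "K": 128.09496}
PROTON = 1.007276
AMMONIA = 17.026549   # y ions of a C-terminal amide carry NH3 rather than H2O

def fragment_ions(seq: str) -> dict:
    """Singly charged b and y ion m/z values for a C-terminally amidated peptide."""
    ions = {}
    n = len(seq)
    for i in range(1, n):
        ions[f"b{i}"] = sum(RESIDUE[a] for a in seq[:i]) + PROTON
        ions[f"y{i}"] = sum(RESIDUE[a] for a in seq[n - i:]) + AMMONIA + PROTON
    return ions

def distinguishing_ions(seq_a: str, seq_b: str, tol: float = 0.01) -> list:
    """Names of fragment ions whose m/z differs between two equal-length sequences."""
    a, b = fragment_ions(seq_a), fragment_ions(seq_b)
    return sorted(name for name in a if abs(a[name] - b[name]) > tol)

assigned = "FTFWDYWTLQNYK"
swapped = "FTFWDYWTQLNYK"    # would contain the FXXXWXXL motif
diagnostic = distinguishing_ions(assigned, swapped)
print(diagnostic)  # ['b9', 'y4']
```

Only b9 and y4 differ between the two candidates, so when both are missing from the spectrum the two sequences cannot be told apart from fragment masses alone.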


Figure 3.6.4. nLC-MS/MS characterization of a (X)4(pS)(X)4K-CONH2 library. Analysis of this library, synthesized on 2 x 10⁸ beads of 30 μm TentaGel resin, identifies 617 individual peptide sequences with an average local confidence (ALC) score ≥ 80 from a theory of 475 beads cleaved. A positional frequency plot based on these 617 sequences is shown. Abbreviations: A = β-alanine; C = ornithine; D = D-aspartate; E = aminoadipic acid; F = cyclohexyl alanine; G = diaminobutyric acid; H = thiazolyl alanine; K = D-lysine (positions 1-9) or L-lysine (position 10); L = D-leucine; M = cyclopropylalanine; N = β-homothreonine; P = hydroxyproline; Q = D-glutamine; R = 4-aminophenylalanine; S = phosphoserine; T = β-homoserine; V = norvaline; W = 4-nitrophenylalanine; Y = 4-fluorophenylalanine.

Figure 3.6.5. Selections of a library comprised of non-canonical amino acids against 14-3-3γ identify sequences with prominent N-term and C-term motifs. A positional frequency analysis of all identified 14-3-3γ-unique peptides (17 in total) reveals a somewhat conserved FXT motif at the N-term, and a more prominent T(A/T)W motif near the C-term. Abbreviations: A = β-alanine; C = ornithine; D = D-aspartate; E = aminoadipic acid; F = cyclohexyl alanine; G = diaminobutyric acid; H = thiazolyl alanine; K = D-lysine (positions 1-9) or L-lysine (position 10); L = D-leucine; M = cyclopropylalanine; N = β-homothreonine; P = hydroxyproline; Q = D-glutamine; R = 4-aminophenylalanine; S = phosphoserine; T = β-homoserine; V = norvaline; W = 4-nitrophenylalanine; Y = 4-fluorophenylalanine.

[Figure 3.6.6 panels: extracted ion chromatograms (intensity vs. RT, min) for three 14-3-3 and three 12ca5 selection replicates each. a) 14-3-3.1 (EIC: m/z 734.28–734.29); b) 14-3-3.6 (EIC: m/z 710.84–710.85); c) 14-3-3.12 (EIC: m/z 721.78–721.79).]

Figure 3.6.6. Select putative 14-3-3γ binders are reproducibly pulled down in the presence of 14-3-3. Extracted ion chromatograms (EICs) of a subset of 14-3-3γ-unique sequences reveal these peptides were retained in each selection replicate against 14-3-3γ (blue traces). Signals for these peptides are absent in the chromatograms from 12ca5 selections (orange traces), suggesting they were pulled down due to the presence of 14-3-3. Three peptides are examined here: a) 14-3-3.1, b) 14-3-3.6, and c) 14-3-3.12 (Supplementary Table 11).

a)
Construct   Sequence                                                 ALC (%)
NB.1        D-Lys β-Ser D-Gln Thz pSer Aad Thz Nph β-Thr Lys         92
NB.2        Hyp Nph Thz D-Asp pSer β-Ala Cpa Fph Thz Lys             86

[Figure 3.6.7 panels b and c: extracted ion chromatograms (intensity vs. RT, min) for three 14-3-3 and three 12ca5 selection replicates each. b) NB.1 (EIC: m/z 476.85–476.86); c) NB.2 (EIC: m/z 694.73–694.74).]

Figure 3.6.7. 12ca5-unique sequences are not reproducibly pulled down and are likely selection artifacts. a) Peptide sequences identified from selections against 12ca5. b) EICs of putative 12ca5-binding peptide NB.1 and c) NB.2 from each selection replicate. In both cases, the putative 12ca5-binding peptide was only identified in one of three replicates, indicating it is likely an artifact from the selection rather than a true 12ca5 binder.

Construct: FITC-labeled 14-3-3.1

Sequence: D-Leu–Nph–β-Ser–Nph–pSer–Nph–β-Ser–β-Ala–Nph–Lys–β-Ala–Lys(FITC)

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–20 min retention time; labeled m/z values: 685.92, 1028.37]

Figure 3.6.8. LC-MS characterization of FITC-labeled 14-3-3.1. Monoisotopic mass: 2054.72 Da; found: 2054.72 Da. β-Ala spacer and FITC label are indicated in the sequence in blue.
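
As a consistency check on the labeled peaks, the expected m/z of a protonated ion follows from the monoisotopic mass M and the charge z as (M + z × 1.007276)/z. The following is a minimal sketch, assuming that the two labeled peaks correspond to the [M+2H]2+ and [M+3H]3+ ions; that charge-state assignment is an assumption and is not stated in the caption.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz(monoisotopic_mass, charge):
    """Expected m/z of an [M + zH]z+ ion."""
    return (monoisotopic_mass + charge * PROTON_MASS) / charge

mass = 2054.72  # Da, FITC-labeled 14-3-3.1 (value from the caption)
for z in (2, 3):
    # Assumed charge states; compare with the labeled values 1028.37 and 685.92
    print(z, round(mz(mass, z), 2))
```

Under this assumption the computed values agree with the labeled peaks to within about 0.01 m/z units.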

Construct: FITC-labeled 14-3-3.6

Sequence: Cha–Cha–β-Ser–Orn–pSer–Nph–β-Ser–β-Ser–Nph–Lys–β-Ala–Lys(FITC)

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–20 min retention time; labeled m/z values: 502.97, 670.29, 1004.93]

Figure 3.6.9. LC-MS characterization of FITC-labeled 14-3-3.6. Monoisotopic mass: 2007.85 Da; found: 2007.85 Da. β-Ala spacer and FITC label are indicated in the sequence in blue.

Construct: FITC-labeled 14-3-3.12

Sequence: D-Lys–Nva–Nph–Thz–pSer–Nph–β-Ser–β-Ala–Nph–Lys–β-Ala–Lys(FITC)

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–20 min retention time; labeled m/z values: 508.44, 677.58, 1015.87]

Figure 3.6.10. LC-MS characterization of FITC-labeled 14-3-3.12. Monoisotopic mass: 2029.72 Da; found: 2029.72 Da. β-Ala spacer and FITC label are indicated in the sequence in blue.

Construct: FITC-labeled NB.1

Sequence: D-Lys–β-Ser–D-Gln–Thz–pSer–Aad–Thz–Nph–β-Thr–Lys–β-Ala–Lys(FITC)

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–20 min retention time; labeled m/z values: 404.15, 504.94, 672.91, 1008.86]

Figure 3.6.11. LC-MS characterization of FITC-labeled NB.1. Monoisotopic mass: 2015.70 Da; found: 2015.71 Da. β-Ala spacer and FITC label are indicated in the sequence in blue.

Construct: 14-3-3.1

Sequence: D-Leu–Nph–β-Ser–Nph–pSer–Nph–β-Ser–β-Ala–Nph–Lys

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–12 min retention time]

Figure 3.6.12. LC-MS characterization of unlabeled 14-3-3.1. Monoisotopic mass: 1466.55 Da; found: 1466.61 Da.

Construct: 14-3-3.6

Sequence: Cha–Cha–β-Ser–Orn–pSer–Nph–β-Ser–β-Ser–Nph–Lys

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–12 min retention time]

Figure 3.6.13. LC-MS characterization of unlabeled 14-3-3.6. Monoisotopic mass: 1419.68 Da; found: 1419.74 Da.

Construct: 14-3-3.12

Sequence: D-Lys–Nva–Nph–Thz–pSer–Nph–β-Ser–β-Ala–Nph–Lys

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–12 min retention time]

Figure 3.6.14. LC-MS characterization of unlabeled 14-3-3.12. Monoisotopic mass: 1441.55 Da; found: 1441.62 Da.

Construct: NB.1

Sequence: D-Lys–β-Ser–D-Gln–Thz–pSer–Aad–Thz–Nph–β-Thr–Lys

Structure: [chemical structure not reproduced]

LC-MS trace: [chromatogram over 0–12 min retention time]

Figure 3.6.15. LC-MS characterization of unlabeled NB.1. Monoisotopic mass: 1427.54 Da; found: 1427.60 Da.

[Figure 3.6.16 plot: anisotropy vs. concentration of 14-3-3 (µM, logarithmic axis) for 14-3-3γ and 14-3-3σΔC at 0 h, 4 h, and 20 h.]

Time (h)   Isoform    Kd (nM)
0          14-3-3γ    19.5
0          14-3-3σ    228.9
4          14-3-3γ    19.4
4          14-3-3σ    169.4
20         14-3-3γ    21.9
20         14-3-3σ    101.5

Figure 3.6.16. FITC-labeled 14-3-3.12 retains binding activity for 14-3-3σ as measured by fluorescence anisotropy. Affinity is reduced 5- to 12-fold relative to that for 14-3-3γ. Measurements were taken either immediately after, 4 h after, or 20 h after incubation of 14-3-3.12 with 14-3-3. Error bars correspond to standard error among three technical replicates.
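
The 5- to 12-fold range quoted in the caption can be reproduced from the tabulated Kd values. A minimal sketch of that arithmetic, with the values copied from the figure's table and the dictionary names chosen here for illustration:

```python
# Kd values (nM) from the table in Figure 3.6.16, keyed by incubation time (h)
kd_gamma = {0: 19.5, 4: 19.4, 20: 21.9}
kd_sigma = {0: 228.9, 4: 169.4, 20: 101.5}

# Fold reduction in affinity for 14-3-3σ relative to 14-3-3γ at each time point
fold = {t: kd_sigma[t] / kd_gamma[t] for t in kd_gamma}
for t, f in fold.items():
    print(f"{t} h: {f:.1f}-fold")
```

The ratios fall between roughly 4.6 (20 h) and 11.7 (0 h), consistent with the stated range.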


Figure 3.6.17. 4-Nitrophenylalanine9 makes key contacts with 14-3-3σ. a) 4-Nitrophenylalanine9 engages in an electrostatic interaction and/or H-bond with the ε-NH3+ group of Lys122 (N–O distance = 3.2 Å), and makes a hydrophobic contact with Ile168. b) 4-Nitrophenylalanine6 interacts with the hydrophobic roof of the 14-3-3 binding groove, making hydrophobic contacts with Leu218, Ile219, and Leu222 of 14-3-3σ.

Table 3.6.1. Multi-pot selections against MDM2 identify p53-like peptides from a 10⁹-member library.

Name   Scan    Peptide         ALC (%)   m/z        z   RT       Mass        ppm
1.1    17209   PSFHAVMWLKSFK   99        394.9648   4   53.04    1575.8384   -5.1
1.2    38696   KADPWWLETFWWK   80        597.9682   3   85.91    1790.8933   -5.9
2.1    43700   LLFDFTYAKWLYK   95        569.6469   3   89.84    1705.9231   -2.5
2.2    18425   HRPEYAVSVLAAK   86        480.6077   3   45.79    1438.8044   -2.3
2.3    31504   YSFHYWWTQLNEK   80        900.9277   2   68.67    1799.842    -0.6
3.1    25254   VSHYWTAYVQWYK   92        865.4241   2   65.52    1728.8413   -4.5
3.2    41789   PSFLDYWKVVFYK   90        564.3014   3   94.44    1689.8918   -5.6
3.3    46448   WWLWAYSYFQPWK   88        930.4522   2   99.85    1858.8984   -4.6
3.4    47164   FAWQPFQWWYNWK   87        943.4470   2   101.15   1884.8889   -5.0
3.5    26639   KTFVEYWNELRVK   80        570.9803   3   67.94    1709.9253   -3.7
4.1    49230   FTFLDYWQLLTGK   99        815.9340   2   110.13   1629.8555   -1.2
4.2    37222   LTTFFDYWNELYK   97        869.9229   2   101.54   1737.8403   -5.2
4.3    43167   FYALEYWQNFTDK   82        862.4064   2   97.87    1722.8042   -3.4
4.4    8578    ESMMYSHPQLFDK   80        548.5768   3   45.54    1642.7119   -2.0
5.1    28399   FGGYWWSQLYTYK   95        849.4065   2   93.05    1696.8037   -3.1
5.2    32707   FTFWDYWTLQNYK   84        905.9298   2   103.55   1809.8516   -3.6

List of all 16 peptides uniquely identified in the presence of MDM2, from five 2 × 10⁸-member libraries. Selections were performed side-by-side with 12ca5 as a negative control. Sequences bearing the FXXXWXX(L/V) motif characteristic of MDM2-binding are indicated in purple. Sequences that bear the FXXXW motif, but may be mis-sequenced due to incomplete backbone fragmentation during MS/MS, are indicated in blue. Hot spot residues are underlined for clarity.
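
The m/z, z, Mass, and ppm columns are related by simple arithmetic. The sketch below assumes that the Mass column is the theoretical neutral monoisotopic mass and that the ppm column is the observed-minus-theoretical mass error; both readings are assumptions about the sequencing software's output, and the function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da

def observed_mass(mz, z):
    """Neutral monoisotopic mass implied by an observed m/z and charge z."""
    return z * (mz - PROTON_MASS)

def ppm_error(mz, z, theoretical_mass):
    """Mass error in parts per million (observed minus theoretical)."""
    return (observed_mass(mz, z) - theoretical_mass) / theoretical_mass * 1e6

# Peptide 4.1 (FTFLDYWQLLTGK) from Table 3.6.1; the table lists -1.2 ppm
err = ppm_error(815.9340, 2, 1629.8555)
print(round(err, 2))
```

Small differences from the tabulated ppm values are expected, because the m/z entries are rounded to four decimal places.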

Table 3.6.2. Side-by-side selections against 14-3-3γ and 12ca5 identify 17 sequences pulled down uniquely in the presence of 14-3-3.

Construct    Sequence                                               ALC (%)
14-3-3.1     D-Leu Nph β-Ser Nph pSer Nph β-Ser β-Ala Nph Lys       99
14-3-3.2     Fph Nva Nph Fph pSer Cpa D-Gln Fph Fph Lys             99
14-3-3.3     Fph Fph Hyp Fph pSer Thz β-Ser β-Ala Nph Lys           97
14-3-3.4     Cha Cpa β-Ser Thz pSer Cha β-Ser Nph β-Ala Lys         91
14-3-3.5     Cha Cha β-Ser Fph pSer Thz β-Ser β-Ala Hyp Lys         91
14-3-3.6     Cha Cha β-Ser Orn pSer Nph β-Ser β-Ser Nph Lys         90
14-3-3.7     Cha Nph Hyp Fph pSer Thz β-Thr β-Ala Nph Lys           85
14-3-3.8     Cha Fph β-Ser Cpa pSer β-Thr Nph β-Ala Nph Lys         85
14-3-3.9     Thz Thz Hyp Thz pSer Nph β-Ser β-Ser Nph Lys           85
14-3-3.10    Cha β-Ala Nph Aph pSer Nva Aad D-Gln Nph Lys           85
14-3-3.11    D-Gln Cpa Nph Cpa pSer Fph β-Ser β-Ala Nph Lys         84
14-3-3.12    D-Lys Nva Nph Thz pSer Nph β-Ser β-Ala Nph Lys         84
14-3-3.13    Fph β-Ser Hyp Thz pSer Nph β-Ser β-Ser Nph Lys         83
14-3-3.14    Cha Cpa β-Ser Orn pSer Fph Cpa β-Ser Nph Lys           82
14-3-3.15    Cha β-Ala β-Ser Cpa pSer Fph β-Ala β-Ala Nph Lys       81
14-3-3.16    Cpa Fph Cha Thz pSer Thz Hyp Aph Hyp Lys               81
14-3-3.17    Cha Fph β-Ser Cpa pSer Aph β-Ser Aad Fph Lys           81

List of all sequences identified in selections against 14-3-3γ that matched the library design with ALC ≥ 80. Residues held constant in the library are indicated in blue. Prominent amino acid motifs were identified, including a Cha-X-β-Ser motif at the N-term, and a β-Ser-(β-Ala/β-Ser)-Nph motif at the C-term. Abbreviations: β-Ala = β-alanine; β-Ser = β-homoserine; β-Thr = β-homothreonine; Aad = aminoadipic acid; Aph = 4-aminophenylalanine; Cha = cyclohexylalanine; Cpa = cyclopropylalanine; Fph = 4-fluorophenylalanine; Hyp = hydroxyproline; Nph = 4-nitrophenylalanine; Orn = ornithine; pSer = phosphoserine; Thz = thiazolylalanine.
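
Motifs such as Cha-X-β-Ser can be read off a positional frequency analysis. The following is a minimal sketch of that calculation, using only four of the 17 sequences from Table 3.6.2 as an illustrative subset; it is not the analysis pipeline used to produce the figures, and the function name is hypothetical.

```python
from collections import Counter

# Four of the 17 sequences from Table 3.6.2, tokenized by residue
sequences = [
    "Cha Cpa β-Ser Thz pSer Cha β-Ser Nph β-Ala Lys".split(),   # 14-3-3.4
    "Cha Cha β-Ser Fph pSer Thz β-Ser β-Ala Hyp Lys".split(),   # 14-3-3.5
    "Cha Cha β-Ser Orn pSer Nph β-Ser β-Ser Nph Lys".split(),   # 14-3-3.6
    "Cha Fph β-Ser Cpa pSer β-Thr Nph β-Ala Nph Lys".split(),   # 14-3-3.8
]

def positional_frequencies(seqs):
    """Return one dict per position mapping residue -> fraction of sequences."""
    n = len(seqs)
    length = len(seqs[0])
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({res: c / n for res, c in counts.items()})
    return freqs

freqs = positional_frequencies(sequences)
for pos, f in enumerate(freqs, start=1):
    print(pos, f)
```

In this subset, positions 1 and 3 are fully conserved (Cha and β-Ser), as are the fixed pSer at position 5 and Lys at position 10.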

Table 3.6.3. Data collection and refinement statistics for the 14-3-3σΔC/14-3-3.12 complex.

PDB Code                          6TCH

Data Collection
Wavelength                        1.54
Resolution range                  33.24 - 1.80 (1.87 - 1.80)
Space group                       C 2 2 21
Unit cell a, b, c (Å)             82.4751 112.282 62.7582
Unit cell α, β, γ (°)             90 90 90
Total reflections                 167691 (6320)
Unique reflections                27297 (2662)
Multiplicity                      6.1 (4.8)
Completeness (%)                  99.89 (99.22)
Mean I/sigma(I)                   18.7 (5.5)
Wilson B-factor                   11.26
R-merge                           0.069 (0.258)
R-meas                            0.075 (0.289)
R-pim                             0.030 (0.129)
CC1/2                             0.998 (0.952)

Refinement
Reflections used in refinement    27296 (2662)
Reflections used for R-free       1377 (133)
R-work                            0.1480 (0.1675)
R-free                            0.1838 (0.2365)
Number of non-hydrogen atoms      2375
  macromolecules                  1930
  ligands                         94
  solvent                         351
Protein residues                  236
RMS(bonds)                        0.009
RMS(angles)                       1.11
Ramachandran favored (%)          98.29
Ramachandran allowed (%)          1.71
Ramachandran outliers (%)         0.00
Rotamer outliers (%)              0.50
Clashscore                        1.28
Average B-factor                  15.84
  macromolecules                  13.00
  ligands                         33.40
  solvent                         26.79

3.7. References

(1) Bond, G.; Hu, W.; Levine, A. MDM2 Is a Central Node in the P53 Pathway: 12 Years and Counting. Curr. Cancer Drug Targets 2005, 5 (1), 3–8. https://doi.org/10.2174/1568009053332627.
(2) Momand, J.; Jung, D.; Wilczynski, S.; Niland, J. The MDM2 Gene Amplification Database. Nucleic Acids Res. 1998, 26 (15), 3453–3459. https://doi.org/10.1093/nar/26.15.3453.
(3) Burgess, A.; Chia, K. M.; Haupt, S.; Thomas, D.; Haupt, Y.; Lim, E. Clinical Overview of MDM2/X-Targeted Therapies. Front. Oncol. 2016, 6. https://doi.org/10.3389/fonc.2016.00007.
(4) Okamoto, K.; Kashima, K.; Pereg, Y.; Ishida, M.; Yamazaki, S.; Nota, A.; Teunisse, A.; Migliorini, D.; Kitabayashi, I.; Marine, J.-C.; Prives, C.; Shiloh, Y.; Jochemsen, A. G.; Taya, Y. DNA Damage-Induced Phosphorylation of MdmX at Serine 367 Activates P53 by Targeting MdmX for Mdm2-Dependent Degradation. Mol. Cell. Biol. 2005, 25 (21), 9608–9620. https://doi.org/10.1128/MCB.25.21.9608-9620.2005.
(5) Stevers, L. M.; Sijbesma, E.; Botta, M.; MacKintosh, C.; Obsil, T.; Landrieu, I.; Cau, Y.; Wilson, A. J.; Karawajczyk, A.; Eickhoff, J.; Davis, J.; Hann, M.; O’Mahony, G.; Doveston, R. G.; Brunsveld, L.; Ottmann, C. Modulators of 14-3-3 Protein–Protein Interactions. J. Med. Chem. 2018, 61 (9), 3755–3778. https://doi.org/10.1021/acs.jmedchem.7b00574.
(6) Hermeking, H.; Benzinger, A. 14-3-3 Proteins in Cell Cycle Regulation. Semin. Cancer Biol. 2006, 16 (3), 183–192. https://doi.org/10.1016/j.semcancer.2006.03.002.
(7) Berg, D.; Holzmann, C.; Riess, O. 14-3-3 Proteins in the Nervous System. Nat. Rev. Neurosci. 2003, 4 (9), 752–762. https://doi.org/10.1038/nrn1197.
(8) Bockus, A. T.; Lexa, K. W.; Pye, C. R.; Kalgutkar, A. S.; Gardner, J. W.; Hund, K. C. R.; Hewitt, W. M.; Schwochert, J. A.; Glassey, E.; Price, D. A.; Mathiowetz, A. M.; Liras, S.; Jacobson, M. P.; Lokey, R. S. Probing the Physicochemical Boundaries of Cell Permeability and Oral Bioavailability in Lipophilic Macrocycles Inspired by Natural Products. J. Med. Chem. 2015, 58 (11), 4581–4589. https://doi.org/10.1021/acs.jmedchem.5b00128.
(9) Bird, G. H.; Mazzola, E.; Opoku-Nsiah, K.; Lammert, M. A.; Godes, M.; Neuberg, D. S.; Walensky, L. D. Biophysical Determinants for Cellular Uptake of Hydrocarbon-Stapled Peptide Helices. Nat. Chem. Biol. 2016, 12 (10), 845–852. https://doi.org/10.1038/nchembio.2153.
(10) Obexer, R.; Walport, L. J.; Suga, H. Exploring Sequence Space: Harnessing Chemical and Biological Diversity towards New Peptide Leads. Curr. Opin. Chem. Biol. 2017, 38, 52–61. https://doi.org/10.1016/j.cbpa.2017.02.020.
(11) Quartararo, A. J.; Gates, Z. P.; Somsen, B. A.; Hartrampf, N.; Ye, X.; Shimada, A.; Kajihara, Y.; Ottmann, C.; Pentelute, B. L. Ultra-Large Chemical Libraries for the Discovery of High-Affinity Peptide Binders. Nat. Commun. 2020, 11 (1), 3183. https://doi.org/10.1038/s41467-020-16920-3.
(12) Kay, B. K.; Kurakin, A. V.; Hyde-DeRuyscher, R. From Peptides to Drugs via Phage Display. Drug Discov. Today 1998, 3 (8), 370–378. https://doi.org/10.1016/S1359-6446(98)01220-3.
(13) Böttger, V.; Böttger, A.; Howard, S. F.; Picksley, S. M.; Chène, P.; Garcia-Echeverria, C.; Hochkeppel, H. K.; Lane, D. P. Identification of Novel Mdm2 Binding Peptides by Phage Display. Oncogene 1996, 13 (10), 2141–2147.
(14) Hu, B.; Gilkes, D. M.; Chen, J. Efficient P53 Activation and Apoptosis by Simultaneous Disruption of Binding to MDM2 and MDMX. Cancer Res. 2007, 67 (18), 8810–8817. https://doi.org/10.1158/0008-5472.CAN-07-1140.
(15) Pazgier, M.; Liu, M.; Zou, G.; Yuan, W.; Li, C.; Li, C.; Li, J.; Monbo, J.; Zella, D.; Tarasov, S. G.; Lu, W. Structural Basis for High-Affinity Peptide Inhibition of P53 Interactions with MDM2 and MDMX. Proc. Natl. Acad. Sci. 2009, 106 (12), 4665–4670. https://doi.org/10.1073/pnas.0900947106.
(16) Li, C.; Pazgier, M.; Li, C.; Yuan, W.; Liu, M.; Wei, G.; Lu, W.-Y.; Lu, W. Systematic Mutational Analysis of Peptide Inhibition of the P53–MDM2/MDMX Interactions. J. Mol. Biol. 2010, 398 (2), 200–213. https://doi.org/10.1016/j.jmb.2010.03.005.
(17) Alluri, P. G.; Reddy, M. M.; Bachhawat-Sikder, K.; Olivos, H. J.; Kodadek, T. Isolation of Protein Ligands from Large Peptoid Libraries. J. Am. Chem. Soc. 2003, 125 (46), 13995–14004. https://doi.org/10.1021/ja036417x.
(18) Kritzer, J. A.; Luedtke, N. W.; Harker, E. A.; Schepartz, A. A Rapid Library Screen for Tailoring β-Peptide Structure and Function. J. Am. Chem. Soc. 2005, 127 (42), 14584–14585. https://doi.org/10.1021/ja055050o.
(19) Yaffe, M. B.; Rittinger, K.; Volinia, S.; Caron, P. R.; Aitken, A.; Leffers, H.; Gamblin, S. J.; Smerdon, S. J.; Cantley, L. C. The Structural Basis for 14-3-3:Phosphopeptide Binding Specificity. Cell 1997, 91 (7), 961–971. https://doi.org/10.1016/S0092-8674(00)80487-0.
(20) de Vink, P. J.; Briels, J. M.; Schrader, T.; Milroy, L.; Brunsveld, L.; Ottmann, C. A Binary Bivalent Supramolecular Assembly Platform Based on Cucurbit[8]Uril and Dimeric Adapter Protein 14‐3‐3. Angew. Chem. Int. Ed Engl. 2017, 56 (31), 8998–9002. https://doi.org/10.1002/anie.201701807.
(21) Efimov, A. V. Standard Structures in Proteins. Prog. Biophys. Mol. Biol. 1993, 60 (3), 201–239. https://doi.org/10.1016/0079-6107(93)90015-C.
(22) Hartrampf, N.; Saebi, A.; Poskus, M.; Gates, Z. P.; Callahan, A. J.; Cowfer, A. E.; Hanna, S.; Antilla, S.; Schissel, C. K.; Quartararo, A. J.; Ye, X.; Mijalis, A. J.; Simon, M. D.; Loas, A.; Liu, S.; Jessen, C.; Nielsen, T. E.; Pentelute, B. L. Synthesis of Proteins by Automated Flow Chemistry. Science 2020, 368 (6494), 980–987. https://doi.org/10.1126/science.abb2491.
(23) Moriarty, N. W.; Grosse-Kunstleve, R. W.; Adams, P. D. Electronic Ligand Builder and Optimization Workbench (ELBOW): A Tool for Ligand Coordinate and Restraint Generation. Acta Crystallogr. D Biol. Crystallogr. 2009, 65 (Pt 10), 1074–1080. https://doi.org/10.1107/S0907444909029436.
(24) Emsley, P.; Cowtan, K. Coot: Model-Building Tools for Molecular Graphics. Acta Crystallogr. D Biol. Crystallogr. 2004, 60 (Pt 12 Pt 1), 2126–2132. https://doi.org/10.1107/S0907444904019158.
(25) Adams, P. D.; Afonine, P. V.; Bunkóczi, G.; Chen, V. B.; Davis, I. W.; Echols, N.; Headd, J. J.; Hung, L.-W.; Kapral, G. J.; Grosse-Kunstleve, R. W.; McCoy, A. J.; Moriarty, N. W.; Oeffner, R.; Read, R. J.; Richardson, D. C.; Richardson, J. S.; Terwilliger, T. C.; Zwart, P. H. PHENIX: A Comprehensive Python-Based System for Macromolecular Structure Solution. Acta Crystallogr. D Biol. Crystallogr. 2010, 66 (Pt 2), 213–221. https://doi.org/10.1107/S0907444909052925.
(26) Afonine, P. V.; Grosse-Kunstleve, R. W.; Echols, N.; Headd, J. J.; Moriarty, N. W.; Mustyakimov, M.; Terwilliger, T. C.; Urzhumtsev, A.; Zwart, P. H.; Adams, P. D. Towards Automated Crystallographic Structure Refinement with Phenix.Refine. Acta Crystallogr. D Biol. Crystallogr. 2012, 68 (Pt 4), 352–367. https://doi.org/10.1107/S0907444912001308.

Chapter 4. Machine learning facilitates classification of specific and non-specific peptide binders

4.1. Introduction

Panning compound libraries against a target of interest is a staple of many drug discovery efforts1. Conceptually, these efforts rely on the ability to reliably identify and validate active compounds from a large starting collection. A major bottleneck of discovery campaigns, however, is the identification of artifacts (sometimes referred to as false positives or non-specific binders): compounds that do not actually bind the target of interest, but exhibit activity based on either the screening or selection criteria2,3. Artifactual hits can arise from the greasy nature of some compounds, which results in target-agnostic binding activity, or from off-target binding to a component used in the screening/selection assay (e.g., plastic surfaces or agarose resin). Because it is not immediately clear whether a compound is a true binder or an artifact, each hit must be carefully validated experimentally for both activity and specificity, a time- and labor-intensive process.

Over the past decade, peptides have garnered much research attention as a therapeutic class of compound4. This interest has in part been spurred by the potential of peptides to disrupt biomedically relevant protein-protein interactions (PPIs), in addition to their synthetic accessibility5, amenability to chemical tailoring6, and potential for cell penetration7. The discovery of peptide-based binders typically relies on panning large libraries against a target of interest, which can be accomplished through a variety of chemical or genetic approaches. Phage display8, mRNA display9, DNA-encoded and one-bead-one-compound10 library screening, and affinity selection-mass spectrometry (AS-MS)11 are some techniques used for this purpose.

However, inherent to all of these methods is the concomitant discovery of true binders and artifacts, despite a number of strategies aimed at addressing this challenge.

Machine learning has greatly impacted the drug discovery field, as in silico screening and the development of various predictor tools have guided efforts in hit identification and advancement12,13. For peptides in particular, machine learning models have enabled prediction of variants (for instance, variants of antimicrobial or anticancer peptides) that exhibit higher activity than peptides from the training set7,13–15. However, current models used to predict peptide-protein interactions face certain limitations. Affinities of peptide binding domain-peptide interactions can be accurately predicted across multiple protein families, for example, but such approaches require the three-dimensional structure of peptides and associated proteins for prediction16,17.

Another set of models has been developed to predict the binding affinity or binding probability of neoantigens for MHC Class I or Class II alleles18–20. However, these models represent amino acids using either one-hot encodings or physicochemical descriptors, which lose the atomistic information of the structure, and therefore cannot readily evaluate the chemical space occupied by non-canonical amino acids. Moreover, because these models are trained on sequences of specific binders alone, they cannot be used to predict non-specific binders (artifacts), which tend to be identified in a target-independent fashion. This lack of predictive ability is problematic when trying to identify specific binders for a new target, for which known ligands may not exist.

To address some of these limitations in the context of peptide binder discovery, we report here the development of an unsupervised-supervised machine learning model, based on topological representation of amino acids, that could accurately discern between specific and non-specific binders. For the purposes of model development and evaluation, data obtained from affinity selections against the monoclonal anti-hemagglutinin antibody, clone 12ca5, was used in this study21. First, unsupervised learning was used to readily distinguish 12ca5-specific binders and non-specific binders on the basis of chemical similarity. Supervised learning was then used to classify unknown sequences as 12ca5-specific binders, non-specific binders, or non-binders (Fig. 4.1.1). From an initial set of 2,704 binding sequences, and a generated set of 2,704 non-binding sequences, the model could classify a set of 1,081 sequences not used during training with 99% accuracy.

Figure 4.1.1. Machine learning streamlines peptide binder discovery to a model selection target. Binders to the anti-hemagglutinin monoclonal antibody, clone 12ca5, are identified from affinity selection-mass spectrometry, and distinguished from non-specific binders based on the presence of the 12ca5-binding motif, DXXDY(A/S). Unsupervised and supervised machine learning methods are then applied to cluster these sequences and learn their chemical similarities. Finally, the model is applied to a set of unknown sequences (also derived from affinity selection), and classifies them as either 12ca5-specific, non-specific, or non-binding peptides with high accuracy.

4.2. Results

4.2.1. AS-MS identifies both 12ca5-specific binders and non-specific binders

Affinity selections were performed in triplicate using 2 × 10⁶, 2 × 10⁷, or 2 × 10⁸-member libraries of design (X)9K, or a 10⁹-member library of design (X)12K, where X = all natural amino acids except for cysteine and isoleucine. These libraries were incubated at varying concentrations with 12ca5-immobilized streptavidin-coated magnetic beads, which were in turn washed, and bound peptides eluted under denaturing conditions. The bound fractions were concentrated and analyzed by nano-liquid chromatography-tandem mass spectrometry (nLC-MS/MS). Where possible, library concentrations up to 1 nM/member were used during selections to increase the number of non-specific binders identified (to expand the data set for subsequent model training).
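
To put the library sizes in context, the number of possible sequences for each design can be counted directly: 18 monomers (20 natural amino acids minus cysteine and isoleucine) at each variable position. The sketch below is a back-of-the-envelope calculation with illustrative names; it treats every library member as a distinct sequence, which is an upper bound on coverage.

```python
N_MONOMERS = 20 - 2  # natural amino acids minus cysteine and isoleucine

def sequence_space(n_variable_positions):
    """Number of distinct sequences for a fully randomized (X)nK design."""
    return N_MONOMERS ** n_variable_positions

# (variable positions, largest library size used for that design)
for n_var, library_size in [(9, 2e8), (12, 1e9)]:
    space = sequence_space(n_var)
    print(f"(X){n_var}K: {space:.2e} possible sequences; "
          f"library covers at most {library_size / space:.1e} of them")
```

The (X)9K design spans about 2 × 10¹¹ sequences and the (X)12K design about 1.2 × 10¹⁵, so even the largest libraries sample only a small fraction of the available sequence space.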

All identified peptides with a sequencing score (average local confidence (ALC) score) ≥ 80 (2,704 in total) were considered “binders”. Binders could either be 12ca5-specific binders or non-specific binders, where a non-specific binder is defined as a peptide that binds any component of the selection assay in a target-independent fashion. For labeling, these peptides were hand-sorted according to the presence or absence of the characteristic 12ca5-binding motif, DXXDY(A/S). Peptides containing this motif, or that differed by one amino acid from it (for example, EXXDYA or DXXDYT), were labeled 12ca5-specific binders (1,055 total). All other identified peptides were labeled non-specific binders (1,649 total). All peptides not identified, whose sequence composition is inherently unknown given the undersampled sequence space, were considered non-binders (in theory, ~10⁶–10⁸ in total, depending on the size of the library used).
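
The hand-sorting rule can be written down as a simple motif matcher. The following is a sketch of one possible reading of that rule (a sliding six-residue window with at most one deviation at the four fixed motif positions); the function name and the example sequences are hypothetical, and the actual labeling was done by hand.

```python
# Motif DXXDY(A/S): window offset -> allowed residues; X positions are unconstrained
MOTIF = {0: "D", 3: "D", 4: "Y", 5: "AS"}
MOTIF_LEN = 6

def label_binder(seq, max_mismatches=1):
    """Label a peptide '12ca5-specific' if any 6-residue window matches
    DXXDY(A/S) with at most `max_mismatches` deviations at the fixed positions."""
    for start in range(len(seq) - MOTIF_LEN + 1):
        window = seq[start:start + MOTIF_LEN]
        mismatches = sum(window[i] not in allowed for i, allowed in MOTIF.items())
        if mismatches <= max_mismatches:
            return "12ca5-specific"
    return "non-specific"

# Hypothetical example sequences
for peptide in ("GDLHDYAVNK", "GELHDYAVNK", "GDLHDYTVNK", "MGPAAFTQEK"):
    print(peptide, label_binder(peptide))
```

The first three examples carry DXXDYA, EXXDYA, and DXXDYT windows respectively and are labeled 12ca5-specific; the last has no window within one mismatch of the motif and is labeled non-specific.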

4.2.2. Unsupervised learning distinguishes 12ca5-specific and non-specific binders

To encapsulate the inherent chemistry of the peptide sequence, each amino acid was represented as a bit-vector, based on the residue’s molecular graph (Fig. 4.2.1a). Briefly, in this representation, each atom was treated as a node, and each bond as an edge (every amino acid is also flanked by amide bonds, to represent its form in the peptide chain). Each bit in the vector corresponds to a particular substructure, and is active if that substructure is present, or inactive if it is absent. The full bit vector was established by an indexed list of all unique atomistic features for a particular monomer, wherein each index position contains either a 0 or 1, depending on the monomer’s features. Finally, a matrix of bit-vectors was used to represent the full peptide sequence (Fig. 4.2.1b).
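
The idea of a substructure bit-vector can be illustrated with a toy, hand-rolled scheme. The sketch below is not the fingerprint used in this work: it hashes only two kinds of atom environment (the atom alone, and the atom with its directly bonded neighbours) into a 256-bit vector, and the vector length, hashing function, and graph encoding are all assumptions made for illustration. The example molecule is alanine capped with an N-terminal acetyl group and a C-terminal N-methyl amide, with hydrogens omitted.

```python
import zlib

N_BITS = 256  # illustrative vector length (an assumption)

def fingerprint(atoms, bonds, n_bits=N_BITS):
    """Toy substructure fingerprint: set one bit per hashed atom environment
    (the atom alone, and the atom plus its bonded neighbours)."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbours[a].append((order, atoms[b]))
        neighbours[b].append((order, atoms[a]))
    bits = [0] * n_bits
    for i, element in enumerate(atoms):
        env0 = element
        env1 = element + "|" + ",".join(f"{o}{e}" for o, e in sorted(neighbours[i]))
        for env in (env0, env1):
            # crc32 is used because it is deterministic across runs
            bits[zlib.crc32(env.encode()) % n_bits] = 1
    return bits

# Heavy atoms of Ac-Ala-NHMe: CH3-C(=O)-NH-CH(CH3)-C(=O)-NH-CH3
atoms = ["C", "C", "O", "N", "C", "C", "C", "O", "N", "C"]
# (atom index, atom index, bond order)
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 1),
         (4, 6, 1), (6, 7, 2), (6, 8, 1), (8, 9, 1)]

ala = fingerprint(atoms, bonds)
print(sum(ala), "bits set out of", len(ala))
```

A peptide would then be represented as a list of such vectors, one per residue, which corresponds to the row matrix described above.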

To quantify chemical similarity between peptide sequences, an alignment metric was developed using amino acid bit vectors (Fig. 4.2.1c). First, pairwise global sequence alignment was performed using the Smith-Waterman algorithm22. Tanimoto distances23 over the bit vectors were used to generate a chemical similarity matrix, which was used as the substitution matrix for scoring. The alignment was then scored based on a match or mismatch of a residue at a specific position. An n-dimensional similarity vector was generated for each peptide sequence, wherein the sequence was aligned and scored for chemical similarity with every other sequence within the n-sized sequence pool (n = 2,704).
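
A minimal sketch of such an alignment metric follows. Several choices here are assumptions rather than details taken from the text: the alignment is implemented as a Needleman-Wunsch-style global dynamic program (the text describes a global alignment but names Smith-Waterman, which is usually a local-alignment algorithm), residue pairs are scored by Tanimoto similarity (one minus the Tanimoto distance) so that higher scores mean more similar, the gap penalty of -0.5 is arbitrary, and the 4-bit fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0

def global_alignment_score(seq1, seq2, fingerprints, gap=-0.5):
    """Global alignment score in which each aligned residue pair is scored
    by the Tanimoto similarity of the two residues' fingerprints."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = tanimoto(fingerprints[seq1[i - 1]], fingerprints[seq2[j - 1]])
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align the two residues
                              score[i - 1][j] + gap,       # gap in seq2
                              score[i][j - 1] + gap)       # gap in seq1
    return score[-1][-1]

# Hypothetical 4-bit fingerprints, for illustration only
fps = {"D": [1, 1, 0, 0], "E": [1, 1, 1, 0], "Y": [0, 0, 1, 1], "A": [1, 0, 0, 0]}
pool = ["DYA", "EYA", "AAY"]

# One similarity vector per sequence: its score against every sequence in the pool
similarity_vectors = {s: [global_alignment_score(s, t, fps) for t in pool] for s in pool}
print(similarity_vectors)
```

With a pool of n sequences, each similarity vector has n entries, matching the 2,704-dimensional vectors described above.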

Decomposition of sequence similarity vectors and subsequent clustering could differentiate 12ca5-specific and non-specific binders, as well as identify unique sub-families within the 12ca5-specific binder population. First, dimensionality reduction was performed using t-Stochastic Neighbor Embedding (t-SNE), decomposing the 2,704-dimensional similarity vector for each sequence into two components. Next, k-means clustering was applied to t-SNE components, revealing two clusters corresponding to 12ca5-specific binders (characterized by the presence of the DXXDY(A/S) motif) and non-specific binders (Fig. 4.2.1d–f). Their distinct clustering indicates their chemical dissimilarity. Additional clustering of the 12ca5-specific binder population yielded three sub-families, which differed based on sequence length (10 vs. 13-mers) or specific amino acid content (for instance, the enrichment of proline in subfamily 2) (Fig. 4.2.1e–g).
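
The clustering step can be sketched with a plain implementation of k-means (Lloyd's algorithm) on two-dimensional points. The t-SNE step is not reproduced here; the coordinates below are hypothetical stand-ins for a precomputed embedding, and the function name, iteration count, and seed are illustrative choices.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 +
                                              (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids

# Hypothetical 2-D coordinates standing in for t-SNE components 1 and 2
embedding = [(-100.0, 40.0), (-95.0, 35.0), (-105.0, 45.0),
             (90.0, -30.0), (95.0, -35.0), (85.0, -25.0)]
labels, centroids = kmeans(embedding, k=2)
print(labels)
```

Running the same function with k = 3 on only the 12ca5-specific points would correspond to the sub-family clustering described above.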

[Figure 4.2.1 panels. A: alanine with capping groups, its fingerprint bit vector, and example substructure indices (41, 132, 200). B: sequence matrix of amino acid fingerprints. C: pairwise alignment of two example sequences, with a chemical similarity score (4.59) contributing to the similarity vector. D: t-SNE component 1 vs. t-SNE component 2, showing 12ca5-S/B and Non-S/B clusters. E: t-SNE plot of 12ca5-specific binders, clusters 1–3. F: example non-specific binders (MGPAAFTQE, FSNGFTMVP, FVMFQVGTY, KWMQMYWFT) from ~1650 peptide sequences. G: positional frequency plots (bits vs. amino acid position; WebLogo 3.7.4) for clusters 1, 2, and 3.]

Figure 4.2.1. Unsupervised learning over chemical similarities distinguishes 12ca5-specific and non-specific binders. a) Amino acids are represented as fingerprint bit-vectors, with N-terminal acetyl groups and C-terminal methylated amide groups to mimic the peptide chain. The bit-vector for alanine is shown as an example, with three representative substructures and corresponding fingerprint bit-vector indices. In the full molecule, the alanine scaffold is shown in black and the capping groups in blue. For the substructures, the relevant topology is shown in black and the rest of the structure in grey. b) Sequences are represented by row matrices of amino acid fingerprints. Each fingerprint corresponds to the amino acid at that position in the sequence. c) The chemical similarity vector for each sequence is obtained by pairwise global sequence alignment with the remainder of the library. Scoring is performed using a chemical similarity matrix, obtained from Tanimoto distances over the bit-vectors. d) Unsupervised learning over chemical similarity vectors distinguishes 12ca5-specific (12ca5-S/B) and non-specific (Non-S/B) binders. Dimensionality reduction is performed using t-Stochastic Neighbor Embedding (t-SNE), and clustering by the k-means algorithm. e) 12ca5-specific binders from (d) are clustered into three sub-families. f) A subset of non-specific binders from (d) is shown, with one, two, three, or four aromatic amino acids (C-term lysine is omitted for all sequences). g) Positional frequency plots were generated from multiple sequence alignments of clusters 1, 2, and 3 from (e). Clusters 1 and 2 each contain 13-mers, and cluster 3 contains 10-mers. The characteristic 12ca5-binding motif, DXXDY(A/S), is evident in each, while cluster 2 shows a distinct enrichment in proline. (C-term lysine is omitted here.)

4.2.3. Supervised learning classifies unknown sequences with 99% accuracy

Supervised learning was carried out in two steps. First, one-class classification, together with generation of a data set of non-binding sequences, was used to classify unknown sequences as either binders or non-binders. Next, multi-class classification was used to classify unknown sequences as 12ca5-specific binders, non-specific binders, or non-binders (Fig. 4.2.2a).

To begin, a sequence generator/one-class classifier loop was used to a) learn the chemical similarities of binding sequences and then b) use this information to generate a set of non-binding sequences. This step was necessary because the sequence information of non-binding peptides is lost during affinity selection (non-binding peptides are not isolated, and the pool of ~10^6–10^8 non-binding peptides is too complex for nLC-MS/MS analysis). To carry out this procedure, an isolation forest one-class classifier was trained using similarity vectors of binding sequences, to learn their chemical similarity (Fig. 4.2.2b). A sequence generator was then used to sample sequences of the same length and design as the binding peptides. The generator was therefore constrained as follows: sequences could contain either 10 or 13 residues, comprise all natural amino acids except cysteine and isoleucine, and must contain a C-terminal lysine. A similarity vector for each generated sequence was calculated by comparison with the pool of binding sequences, and was then used as an input for one-class classification. A set of 2,704 non-binding sequences was generated to create a balanced set (n = 2,704 for both binders and non-binders) for ensuing multi-class classification.
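The generator constraints listed above can be sketched in Python as follows. This is a minimal illustration only: uniform sampling of residues and lengths is an assumption, the names `generate_sequence` and `generate_candidates` are hypothetical, and the subsequent filtering of candidates by the isolation forest classifier is not shown.

```python
import random

# All natural amino acids except cysteine (C) and isoleucine (I).
ALPHABET = "ADEFGHKLMNPQRSTVWY"
ALLOWED_LENGTHS = (10, 13)  # total length, including the C-terminal lysine


def generate_sequence(rng):
    """Sample one random sequence that satisfies the library design."""
    length = rng.choice(ALLOWED_LENGTHS)
    variable_region = "".join(rng.choice(ALPHABET) for _ in range(length - 1))
    return variable_region + "K"  # fixed C-terminal lysine


def generate_candidates(n, seed=0):
    """Return n unique candidate sequences (seeded for reproducibility)."""
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n:
        seen.add(generate_sequence(rng))
    return sorted(seen)


candidates = generate_candidates(5)
for seq in candidates:
    print(seq)
```

In a full workflow, each candidate would be converted to a similarity vector against the pool of binders and retained as a non-binder only if the one-class classifier scores it as an outlier.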

In order to classify 12ca5-specific, non-specific, and non-binders, a convolutional neural network (CNN) model was trained. Bit-vector matrices of sequences and their respective class labels were used for this purpose. The model demonstrated 99% accuracy and a 0.002 categorical cross-entropy loss when evaluated against a randomly held-out 20% of the data, and 99% accuracy when evaluated against an additional 20% of the data that was not used during training (Fig. 4.2.2c).

[Figure 4.2.2, panels a–c. Panel titles: a) Classes of sequences; b) Performance of one-class classifier; c) Performance of multi-class classifier.]

Panel b (binding test sequences): predicted binding, 528; predicted non-binding, 12.

Panel c (rows: predicted by model; columns: actual):

                         12ca5 binders   Non-specific binders   Non-binders
12ca5 binders                 216                  0                  0
Non-specific binders            1                330                  0
Non-binders                     0                  0                534

Figure 4.2.2. Supervised learning classifies 12ca5-specific, non-specific, and non-binders. a) The different classes of peptides (specific, non-specific, and non-binding) are represented in the Venn diagram. The classification tasks are as follows: Task 1, one-class classification between binding (12ca5-specific and non-specific) and non-binding (~10^8) peptides; and Task 2, multi-class classification of 12ca5-specific, non-specific, and non-binding peptides. b) The one-class classifier demonstrated 98% accuracy against a set of test sequences. The model was trained using 80% of all binding sequences (specific and non-specific), and then tested on the remaining 20%. c) The convolutional neural network classification model demonstrated 99% accuracy when tested against a set of sequences (20% of the total) not used during training. The model was trained initially on 60% of all sequences, and then validated during training with an additional 20%.
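The reported accuracies can be checked against the counts shown in Figure 4.2.2b and 4.2.2c with a short Python sketch. The orientation of the matrix (rows as predictions, columns as actual classes) follows our reading of the figure and is an assumption; the function name `accuracy` is illustrative.

```python
def accuracy(confusion):
    """Overall accuracy = correctly classified / total, for a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows: predicted by model; columns: actual (assumed orientation).
# Class order: 12ca5-specific binders, non-specific binders, non-binders.
multi_class = [
    [216, 0, 0],
    [1, 330, 0],
    [0, 0, 534],
]

# One-class classifier on the binding test set:
# 528 sequences predicted as binding, 12 predicted as non-binding.
one_class_accuracy = 528 / (528 + 12)

print(f"multi-class accuracy: {accuracy(multi_class):.4f}")
print(f"one-class accuracy:   {one_class_accuracy:.4f}")
```

Under these assumptions, the multi-class counts give 1,080 correct out of 1,081 test sequences and the one-class counts give 528 out of 540, in line with the stated 99% and 98% figures.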

4.3. Discussion

We report here a method to streamline affinity selection of large peptide libraries using machine learning. The key features of the method include: 1) representation of the sequence using chemical fingerprints; 2) chemical similarity metrics; and 3) unsupervised and supervised machine learning for classification. In the context of selections against a model monoclonal antibody target (12ca5), this method enables distinction of specific, non-specific, and non-binders with 99% accuracy.

The topological representation of amino acids used here encodes the covalent bonding network of each monomer (and by extension each sequence), and implicitly accounts for their unique physicochemical properties. This representation therefore enables facile incorporation of non-canonical amino acids. In this way, the model is an advancement over models based on one-hot encodings or physicochemical descriptors, which, in contrast, would require experimental determination of such properties for every new monomer introduced for featurization.

Using pairwise comparison of sequences based on chemical similarity facilitates precise mapping of sequence diversity among peptides from synthetic libraries. In contrast to conventional pairwise alignment scoring matrices, such as BLOSUM62 and PAM250, which are based on evolutionary statistics, the method used here is based on similarity of substructures of individual amino acids. This approach is advantageous in the context of synthetic peptide libraries, which are generated combinatorially, and therefore do not face the constraints of the amino acid-dependent frequencies found in nature. For example, whereas the frequencies of glycine and tryptophan in proteins are quite different, in a synthetic library they should be more or less the same. Moreover, the chemical similarity-based approach used here could potentially be used to advance data debiasing, for example in the prediction of protein-ligand or protein-protein interactions24.
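To make the contrast with evolutionary scoring matrices concrete, the following Python sketch scores a global (Needleman-Wunsch) alignment with a residue-level similarity function in place of a substitution matrix. The linear gap penalty, the toy similarity values, and all function names are assumptions made for illustration; they are not the parameters used in this work.

```python
def global_alignment_score(seq_a, seq_b, similarity, gap_penalty=-0.5):
    """Needleman-Wunsch global alignment score with a residue-level similarity function."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap_penalty
    for j in range(1, cols):
        score[0][j] = j * gap_penalty
    for i in range(1, rows):
        for j in range(1, cols):
            aligned = score[i - 1][j - 1] + similarity(seq_a[i - 1], seq_b[j - 1])
            gap_in_b = score[i - 1][j] + gap_penalty
            gap_in_a = score[i][j - 1] + gap_penalty
            score[i][j] = max(aligned, gap_in_b, gap_in_a)
    return score[-1][-1]


# Hypothetical residue similarities in [0, 1]; identical residues score 1.0.
TOY_SIMILARITY = {frozenset("AS"): 0.75, frozenset("DE"): 0.8}


def toy_similarity(x, y):
    if x == y:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((x, y)), 0.1)


def similarity_vector(query, library):
    """Alignment scores of one sequence against the remainder of the library."""
    return [
        global_alignment_score(query, other, toy_similarity)
        for other in library
        if other != query
    ]


library = ["DVPDYA", "DVPDYS", "EVPDYA", "WWFWWF"]
print(similarity_vector("DVPDYA", library))
```

In this toy example the two motif-like sequences score far higher against the query than the tryptophan-rich sequence, which is the kind of separation the similarity vectors are meant to expose.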

The machine learning methods employed here enable highly accurate distinction and classification of peptides obtained from selections for 12ca5-binding. First, unsupervised learning is able to distinguish specific from non-specific binders based on chemical similarities. This approach alone can be applied to selections against other, less-characterized targets, in order to identify unique clusters of chemically similar sequences. Next, the supervised learning model developed here can classify peptide sequences as 12ca5-specific, non-specific, or non-binding with high accuracy. This model can in theory also be applied to other targets, for which binding information is scarce or not available, to facilitate prediction of non-specific binders, and by subtraction, identification of target-specific binders. We anticipate that continued progress in this area, such as integration of similarly large data sets of binders for other proteins aside from 12ca5, will aid predictions in the context of selections against unknown targets, and further streamline discovery efforts.

4.4. Experimental

4.4.1. Preparation and characterization of libraries

Library preparation is described in detail in sections 2.4.10 and 3.4.2. Library characterization is described in sections 2.4.11 and 3.4.3.

4.4.2. Affinity selection of libraries against 12ca5

Preparation of 12ca5-functionalized magnetic beads, per library:

MyOne Streptavidin T1 Dynabeads (300 μL of 10 mg/mL stock) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS, and then treated with 300 μL of biotinylated 12ca5 (1.5 μM; 0.45 nmol). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C. After this time, the beads were returned to the separation rack, the supernatant was removed, and the beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS. Beads were resuspended in 300 μL of 10% FBS, 0.02% Tween 20, 1x PBS.

Affinity capture:

Libraries (1000, 100, or 10 fmol/member for 2 x 10^6, 2 x 10^7, and 2 x 10^8-member libraries, respectively) were incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads; 1000, 100, or 10 pM/member library.
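The final conditions stated above follow directly from the amounts and volumes used; a short Python sketch of the arithmetic is given here for reference. Variable and function names are illustrative, and the total peptide amount is a derived quantity not stated in the text.

```python
FINAL_VOLUME_ML = 1.0
BEAD_STOCK_MG_PER_ML = 10.0
BEAD_VOLUME_ML = 0.100  # 100 uL of bead suspension per selection

# Library size (number of members) -> amount of each member in fmol.
LIBRARIES = {2e6: 1000.0, 2e7: 100.0, 2e8: 10.0}


def per_member_concentration_pM(amount_fmol, volume_mL):
    """Concentration of each library member in pM (1 fmol/mL = 1 pM)."""
    return amount_fmol / volume_mL


bead_conc_mg_per_mL = BEAD_STOCK_MG_PER_ML * BEAD_VOLUME_ML / FINAL_VOLUME_ML
print(f"beads: {bead_conc_mg_per_mL:.1f} mg/mL")

for size, fmol in LIBRARIES.items():
    conc = per_member_concentration_pM(fmol, FINAL_VOLUME_ML)
    total_nmol = size * fmol * 1e-6  # fmol -> nmol
    print(f"{size:.0e} members: {conc:.0f} pM/member, {total_nmol:.0f} nmol total peptide")
```

One consequence of this design is that the total amount of peptide per selection is held constant across library sizes, with the per-member amount decreasing tenfold for each tenfold increase in diversity.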

Elution:

The centrifuge tubes containing the bead suspensions were transferred to the magnetic separation rack. The beads were washed 3 x 1 mL with 1x PBS. Bound peptides were eluted with 2 x 100 μL 6 M guanidine hydrochloride, 200 mM phosphate, pH 6.8. Eluates from 2 x 10^7 and 2 x 10^8-member libraries were concentrated via C18 ZipTip® pipette tips (as described in 2.4.9) and lyophilized.

NanoLC-MS/MS:

5 μL of eluate from the 2 x 10^6-member library was submitted for nLC-MS/MS analysis without prior concentration (theoretical loading: 25 fmol/peptide). Powders from 2 x 10^7-member libraries were resuspended in 20 μL water (0.1% formic acid), and 5 μL submitted for nLC-MS/MS analysis (theoretical loading: 25 fmol/peptide). Powders from 2 x 10^8-member libraries were resuspended in 6 μL water (0.1% formic acid), and 5 μL submitted for nLC-MS/MS analysis (theoretical loading: 8.3 fmol/peptide). Analysis was performed as described in 2.4.11.
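The theoretical loadings quoted above can be reproduced with the following Python sketch, which assumes complete capture and recovery of each peptide (the same idealization implied by "theoretical"). The function and variable names are illustrative.

```python
def theoretical_loading_fmol(amount_fmol, sample_volume_uL, injected_uL=5.0):
    """fmol of each peptide injected, assuming complete capture and recovery."""
    return amount_fmol * injected_uL / sample_volume_uL


# Library -> (fmol per member in the selection, sample volume in uL before injection).
SAMPLES = {
    "2 x 10^6": (1000.0, 200.0),  # eluate (2 x 100 uL), injected without concentration
    "2 x 10^7": (100.0, 20.0),    # lyophilized powder resuspended in 20 uL
    "2 x 10^8": (10.0, 6.0),      # lyophilized powder resuspended in 6 uL
}

loadings = {
    name: theoretical_loading_fmol(fmol, volume)
    for name, (fmol, volume) in SAMPLES.items()
}
for name, loading in loadings.items():
    print(f"{name}-member library: {loading:.1f} fmol/peptide")
```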

4.5. Acknowledgements

This work was supported by the NIH/NIGMS Interdepartmental Biotechnology Training Program (T32 GM008334 to A.J.Q.), the Defense Advanced Research Projects Agency (DARPA; Award 023504-001 to B.L.P.), and Calico (to B.L.P.). Affinity selections, mass spectrometry, and sequencing analysis were performed by A.J.Q. The conceptual framework for how to apply machine learning methods to aid peptide binder discovery came out of conversations with Somesh Mohapatra (in the lab of Rafael Gomez-Bombarelli), A.J.Q., Joseph Brown, Genwei Zhang, and Andrei Laos. Model conceptualization and development were performed by Somesh Mohapatra. Figures 4.1.1, 4.2.1, and 4.2.2, along with text in parts of the introduction, results, and discussion, were adapted from figures and text originally generated by Somesh Mohapatra. Text in other parts was generated by A.J.Q., and in some places revised through discussions with Somesh Mohapatra and Joseph Brown.

4.6. Appendix

Table 4.6.1. Affinity selections of 10^6–10^8-member (X)9K and 10^9-member (X)12K libraries against 12ca5 identify 1055 12ca5-binding peptides.

AAVDLQDYAK FQATFDMEDYAMK MADLFDYAAK SVFQEDVEDYAFK ADLLDYADLK FQDLLDYADFNGK MADPYDYAGK SVNYADTLDYEGK ADLMDYRTAK FQDVQDYSAK MADVPDYSDK SVYPFDVVDYASK ADLPDYARVK FQELDLEDYAFDK MDAADYADVK TAFFWDDQPDYAK ADLQDYAAVK FRDLHDYAYTRVK MDDDVEDYAK TAFLDDGQDYADK ADMDMQDYATAFK FSDDLQDYAK MDLEDYAAAENGK TDDEDWPDYSTGK ADMEDYAQNYPLK FSDEQTDLADYAK MDLEDYAAANDAK TDDLEDYAVTNPK ADQHDYAFTK FSDGPDYASFADK MDLHDYADFK TDFHDYAFMK ADVEDYASQK FSDLVDYADWTGK MDLHDYAFYK TDLADYASAK ADVKDYAAYK FSKTSDHVDYAWK MDLHDYAYHK TDLVDYAAAK ADVQDYASWK FSLLODEPDYAAK MDLPDYADKK TDQDLADYAAGNK ADWRDYADLK FSTDEDHVDYSFK MDLQDYAFQK TEEDVEDYAMTFK AEGDDLADYAGNK FSVTNDEPDYAWK MDMEDYAAMK TEGFLDQQDYAEK AELPDVEDYSWNK FSWFADMQDYDSK MDMEDYAMSK TEPQEDLSDYAAK AEPMMDTQDYADK FTDQADYAGK MDMEDYSAVQAPK TFDMVDYAEK AGDLTDYAAK FTDVYDYAANFWK MDSYNDFPDYAQK TGEVEDLPDYAAK AGGDLPDYADQPK FTVSPDAQDYASK MDVADYAQSK TGLDVPDYASOOK ALDLKDYAAK FVADYKDYADSTK MDVEDYADFK TLDAQDYAFK ALDVPDYAGK FVAQWDQADYAEK MDVPDYASAK TLDFVDYAAK ALEDHTDYAHLNK FVDLPDYSSK MEADMEDYAEAFK TLDVKDYADK ALHDVEDYAKGPK FVDVKDYADK MESEWDVPDYSEK TMEKDELDYAGGK ALNPVDAEDYANK FVMDQDVADYSEK MGDDVEDYATWGK TMEKDLEDYAGGK ALQDLQDYAK FVMDQDVADYTDK MGDLPDYAAK TMYEDVQDYADQK AMVWMDQLDYADK FVODQDVADYDTK MGTDFQDYADNYK TNDLMDYSHK ANYDLPDYSDQEK FVQDMEDYAK MHTFDLPDYASEK TODLDDFPDYAQK APDMPDYAYK FWFEDWQDYAQNK MLADMEDYAK TPDFQDYAAK APGDLPDYAVTNK FYDQVDYADYENK MLDMPDYANK TPDLTDYSEK APWLFDMPDYSGK FYDYEDYADK MLDWVDYAHK TPDLVDYSAK AQDMLDYADK FYENYDLHDYSFK MMDHQDYAFK TPDVTDYAAK AQYSSDVEDYAVK FYLQADDEEDYAK MMMVDSPDYAANK TPQDHVDYAFVQK ASAMLDAQDYAEK FYVPMDLQDYAEK MMYFNEVTDYADK TQDLVDYAAK ASDLHDYAEVHGK FYWWDYPDYAAMK MNDLPDYATK TSDWPDYAQK ASDLPDYADTLAK GAEDTDDWEDYAK MNMETDMPDYASK TSLLDKHDYAYSK ASYLDVYDYADPK GAPMQDVVDYDSK MPEPDFADYAALK TSVAEDLEDYANK

181 ATDFQDYAYK GAPMQDVVDYSDK MPFSDAPDYADNK TSYGWDQQDYADK ATFLDDQGDYADK GDAADYADQSVPK MSENDVSDYAAFK TTKWTADOEDYAK ATVDQKDYADQVK GDLADYARHK MSNDVVDLPDYAK TTQQEDTPDYAFK ATVNDMPDYAEOK GDLEDYADPTTHK MSTNPDVGDYAAK TTSWADVEDYAGK AVNDMMDYADQGK GDLEDYAWNK MSVWMENLDYADK TTWDVEDYAK AVTEEDLEDYAGK GDLMDYAAQNGDK MTQFDVADYAYDK TVWWDDVEDYDTK AVWDFTDWPDYAK GDPFVDLPDYSNK MTVDVEDYASVAK TWDRPDYADK AWDLVDYSHK GDVMDYADKHLGK MVADFEDYASYPK TWSDLKDYADKGK DADSFDVNDYADK GDYEFLDVEDYAK MVADVTDYAKGDK TYDWEDYSAK DANFQDVTDYSDK GEHSFDTPDYADK MVDLEDYAGK TYMPDTEDYAEGK DANVDLHDYAFYK GFDLEDYVSK MWDLTDYADVMSK VADKEDYAQLLDK DAYDYEDYAAQSK GGMDFDMQDYAGK MYDVHDYSHK VADVQDYAGK DAYNEDLADYASK GGQQHYDLEDYAK MYGFSSDLEDYAK VDLEDYANFK DDASQDVPDYAQK GHVEADLWDYSDK NADLLDYADK VDLGDYAERK DDDLWDYAQK GMDVVDYAAK NAGQHYDLEDYAK VDLHDYADWK DDEAWDMEDYAQK GNDLEDYASK NDDLADYAEK VDLKDYTDVTPNK DDFQLDKPDYVDK GNHDWPDYAAVRK NDLADYAVSK VDLQDYATAK DDFYDVTDYADPK GNMNLGDLEDYAK NDMQDYAFWATYK VDMPDYAEGTQAK DDLADYADPELSK GOASDOADYAAGK NDNVTDVPDYAYK VDVYDYADAK DDLKDYADAMYPK GPDVEDYAVK NDOEDYAHSAPPK VDWDDLQDYAVEK DDLVDYAFAK GPVDVEDYASDNK NDVQDYMAKK VDWDLEDYAK DDLVDYAQPK GPVWADKADYAHK NDWPDYAGFK VDWQDYARYK DDLVGDQEDYAEK GQDSFDVNDYADK NEADLPDYAWFAK VDYADYAAFK DDLYDYAANK GQFDWDDVEDYSK NENFODGPDYADK VEDHVEGEDYAGK DDQKDYAEVELYK GSPEKDLEDYAFK NGAEGDVFDYAEK VEDQQDYAATAWK DDVADYADWWSDK GTEDVEDYAEEAK NGAVDQPDYAAYK VEGDLPDYAK DDVPDYASASPPK GVDDTADYAAFPK NGDASDOVDYAHK VELYDTQDYAEAK DDWQDYAAFK HADWMDYADK NGDVQDYAFK VEMGDLNDYADPK DEDLKDYDSLLAK HAWDLPDYAASHK NGHDWPDYAARVK VETDVMDYADFEK DEDYHDYAYQEVK HDHLDYASDNVFK NGLTMDWVDYADK VFDLHDYASK DEQYKDDLEDYAK HDLADYAAWK NGNDDLEDYAFEK VFEENDAQDYSDK DFQQGDLEDYSDK HDLADYASWYAQK NGNYGYDLQDYAK VFGDLPDYSSTSK DGNDDLEDYAEFK HDLEDYANVK NHEWDVQDYAALK VFGPPDWGDYADK DHFVDMSDYAAFK HDLPDYAAGK NLDQQDYAFK VFLSDAADYAANK DHYHWDQVDYAAK HDLPDYAYYK NLDVPDYSSK VFYYDLPDYASFK DKLTFDTPDYSDK HDNLDYAQQLGHK NLVDWDGPDYADK VGDLEDYASNVYK DLALDMVDYSEPK HDVHDYAFGK NMDLQDYSMK VGDVFDYANDSPK DLEDYADFTNQRK HDVPDYAAHDWAK NMGDLPDYAEYSK VGDWPDYESK 
DLHDYAFHLRLSK HDVVDYAAGK NNEKDLEDYAHQK VGPQKEVTDYADK

182 DLHDYAHTLK HDWQDYAAAK NPAGDLADYSANK VGQPKDLTDYADK DLHDYAVSHK HEDLHDYALK NPHHWDLTDYAEK VHDLEDYAQK DLHDYSAVRK HFDFNDLEDYADK NQGFPDEEDYANK VLDQEDYAEK DLPDYASRVK HFQYEDQTDYSYK NQYLTDTQDYAWK VLHDLEDYASHEK DLQDYADQGK HGRFEDAPDYADK NSDVADYADDYSK VLSFMEVADYDSK DMHDYAYYWK HHELDLEDYFADK NSFEGDAPDYAGK VMSDDVPDYSASK DNGNMDLWDYAEK HHFDMADYAEWPK NSMEQTDLADYAK VNAEMETGDYAEK DNLFLDVPDYSSK HHQMDLQDYAQPK NVSADGPDYADWK VNDVVDYSEK DNNVDLEDYAQYK HLLMGDEEDYAAK NVVYDVEDYADQK VNDWADYDSK DNSDMQDYASYFK HLNDDEDLPDYSK NWDNPDYADK VPAYDDYPDYSDK DNTYDVEDYAQSK HLQNDMEDYAADK NYPDMPDYAVDPK VPDDPDLSDYSAK DNVDVEDYANDFK HLWFTDKDDYANK NYPYNDLDDYANK VPDMEDYAGK DNVLYDVEDYATK HPFFVDKQDYSWK NYSLQFDEMDYAK VPSDEADELDYAK DOLDADLADYAHK HPHFPDAADYAAK OAAFQDLADYAVK VPWDLGDYAADYK DPDVWDYADTDPK HPHPFDAADYAAK OESDLEDYATEFK VPYDLEDYAK DPGDLPDYAAYAK HSDPDVEDYAGFK OQGQLODLQDYAK VQDDDVQDYANGK DQDLGDYADVQSK HTDVADYAWK OSFLDDWPDYAEK VQDTPDYAATQFK DSDLHDYANVEFK HVAQQDELDYSDK OTLQDLQDYARSK VQEMVDWQDYSDK DSFDDWEDYAAQK HVAWTDVNDYAEK PADLPDYAQK VQNDEDWQDYAVK DSVFFDHPDYASK HVDFADYAAK PADLQDYSMK VSDDWEDYAAQAK DSVMMYDVEDYAK HVDVVDYADK PADVADYAQK VSDLGDYAEK DSWVDDTADYAAK HVDWKDYAEK PAPMYDAEDYSAK VSDLPDYAEK DSYGADVPDYANK HVLLGDWSDYAAK PAVENDLPDYAVK VSDLPDYAHK DTNYVEDQEDYAK HVVDELDYAK PAWDLQDYAK VSDLQDYAHK DVEDYSQWPK KAAMADWQDYAFK PAWDWEDYAAYQK VSQEYDYQDYAWK DVHDYAHFWK KAASDVQDYATWK PDANLDQADYAAK VTEFDFPDYASYK DVMDYADELK KALETWDKEDYAK PDDLEDYSWK VTNKTDLLDYADK DVRDYVDHRLNYK KALQEDWQDYAAK PDLEDYAQQK VTTHDDPEDYAGK DWPDYADVHK KANPDELVDYAAK PDLEDYAVKK VVASDVQDYATWK DYDLHDYASLGFK KAWWDDVEDYDTK PDLHDYAEEK VVDLADYSYK DYKDYADRWK KDLDWEDYAANQK PDLPDYSSTYDPK VVDVNDYAYK DYPVDKTDYARAK KDLKDYAEMK PDMMDYAAVK VVLAODQADYAAK DYQDVKDYAALNK KDLQDYAELK PDMQDYAAAYMDK VWDFPDYAAK DYTYADLPDYAEK KDMSTDVADYADK PDVEDYAYLK VWGDQADYAAVNK EAPMMDTQDYADK KDTMSDVADYADK PDVHDYAEKAFHK VWNDNDEPDYASK EAYTHDOPDYAAK KDVDLEDYAALHK PDVPDYAVASGDK VYDLMDYSAWGNK EDDVADYAYK KDVHDYADFK PDWEDYAVVK VYDVEDYSYK EDDWEDYANK KDVHDYAYLK PEDWGDAVDYADK VYEFNDEEDYAEK EDHRDYAYLGHPK 
KDYNPDLSDYAAK PEMSPDLTDYADK VYRPGDHVDYADK

183 EDLADYADPWSYK KDYPPDEPDYADK PEQDLPDYAK WADMPDYAWK EDLADYSSMK KEEPELPDYAEGK PEWDWEDYAK WAEEDLVDYSENK EDLMDYAADK KENHDVEDYSSFK PFADVDQEDYSDK WAEEGDLTDYADK EDLPDYAPPK KFADADMHDYADK PFDFEDYANK WAEGLLDLPDYAK EDLQDYAAWSWFK KFDLKDYAAK PFDLSDYADSNLK WAEPWDMQDYANK EDLVDYAAWLHDK KFDLPDYAAGPWK PFDTEDYAAK WASWHDQTDYADK EDMEDYAHVTDMK KFDLQDYATK PFLDEDKPDYAWK WDAMDYAAHK EDVEDYSAFGYPK KFRNFDEHDYAWK PFSEDQDLPDYAK WDANWDQADYAAK EDVQDYSAWQQPK KFSPTDSPDYAAK PFTDQQDYAYSDK WDDDEMDYAEMDK EESDVPDYALAVK KGDLHDYAFK PGDLMDYAEK WDDFKDYADK EESEDMEDYADPK KGFTNDEPDYAWK PGDLQDYAFYNAK WDDQQDYAYK EFDLADYSWK KGNPVDAEDYQGK PGEEEDVPDYSSK WDDVEDYAYK EFDLHDYATK KHLLDMEDYAQPK PHNPQDVMDYADK WDDWPDYTDK EFEDVPDYAFNGK KLDWRDYSDK PHSLQDLEDYSDK WDKDDYSEHK EFESMVDTPDYAK KLENDMEDYARAK PKDLLDYADK WDLEDYSEHK EFNADDFPDYSAK KLQDLDAQDYAHK PLDDLPDYEVQGK WDLMDYAQGK EGADTDDWEDYAK KMAYDEDEDYAMK PLDVEDYAFK WDMPDYAALQANK EGAFGDLHDYSAK KMLHDVEDYSSFK PLDVVDYADK WDMPDYSDRK EGANDAEDYATFK KNDMPDYSEAEAK PLEDLEDYSNTDK WDQPDYAAAK EGFNLMDYEDYAK KNDNDLEDYANVK PLFDLADYAK WDQPDYSDNYMYK EGLDVQDYAFESK KQDVEMEDYAARK PLRDVPDYAHPFK WDTADYAANK EGQFTDVSDYAAK KQEGDVEDYAEQK PLSSNDFADYADK WDVEDYAADK EGSAADVVDYAAK KSDLQDYSADEYK PLTDQPDYAK WDVHDYAGGK EGTLQDLEDYTSK KSVGDWEDYAHGK PLVDVHDYAK WDVKDYAENK ELAYYDTYDYADK KTAFMDLADYDSK PLWDWPDYASDKK WDVQDYADYETFK ELDAVDYASK KTLPFDEYDYAEK PMDYPDYAQK WDYDYDQVDYADK ELDFHDYNSK KTRMDLHDYAVLK PMEPDAFDYAALK WEDAEDYAAAGPK ELDVHDYSNK KVDWPDYSAK PMGVPDTPDYSSK WEDFTDDAPDYAK ELDVQADQPDYAK KVGNWDQTDYADK PNDLHDYAFK WEEDTEDYAQALK ELDWTYDVPDYAK KVHEEGEDYAGVK PNWETDAPDYAVK WFADKQDYAK ELTDLEDYAMEGK KVLWNDVQDYAEK PQDLLDYADK WFDTTDYADK ELVDVEDYALYPK KVPPVDLQDYASK PQLMQDKADYADK WFEDMEDYAK EMDLQDYATK KVQWDFADYAQVK PQPDVWDYADDSK WFLTQDLQDYADK EMELMDYASK KWDYADYADK PSAMNDLADYAWK WGDLPDYSWK EMMNFDYPDYAGK KYEDDDMMDYAYK PSASDLPDYSAYK WGDLQDYAHK ENDWADYAAKNFK KYFDNDFPDYASK PSYLEDYEDYAYK WGDLTDYESK ENFSDAPDYADNK LADDWEDYAWTGK PTDLQDYANK WGDMEDYSGK ENPDYAHFPK LADMEDYSSK PTDVMDYASK WGDVKDYSDK ENPGMDLEDYANK LADQEDYAAEGLK 
PTVSDVEDYADTK WGDVMDYAAK

184 ENVNNDLPDYSSK LADVEDYAHK PTWDVQDYSAPFK WGNVGDMTDYAEK EPDVQDYSFK LALSYDVVDYSDK PVADVEDYSDHPK WKVDVHDYAK EPFDTDVEDYOSK LAWDQPDYAAETK PVDEHDYAWK WLADTEDYAAQEK EPSNNDLPDYAWK LAYGDTPDYSDNK PVDMEDYAEK WLDERDYAFK EPVQNDVADYSAK LDAEDYAAWK PVDVEDYAVK WLDVPDYASK EPYDVEDYANAVK LDAPDYADMK PVFVADFEDYSAK WMFEDDFPDYAYK EQDWPDYAYK LDDMQDYAVK PVFVHDMPDYAAK WNAYLDLPDYSDK EQDWQDYADK LDFDMQDYAK PVLFDOADYAHSK WNSHWDLEDYATK ERDLEDYSVK LDGDADWQDYAFK PVSDVPDYSAQWK WONLLDKHDYANK ESALQDLEDYSTK LDGDLVDYAATSK PWDKQDYSFK WPEGYDTTDYADK ESDAETDVEDYAK LDKPDYALGK PWDLKDYASK WQANDVPDYAYGK ESENVDYEDYAEK LDKPDYAQFK PWDVQDYAYK WRDWHDYAYK ESFQAVDLEDYAK LDLADYADGK PWLDQEDYAEQDK WSDAQDYAEK ESKDYADAWK LDLPDYAGHK PYDLVDYAEK WSDLEDYSYK ESLAPDVSDYAAK LDLQDYAESNDNK PYDSNDLVDYAGK WSDLKDYAAK ESPDYAAADYNPK LDLQDYARHK PYFQADLADYSEK WSDLVDYDSK ESSNDLFDYADLK LDLSDYASLNHSK PYFQADLEGTDYK WSDVEDYAAK ETDLMDYSDK LDTPDYAENASPK QADVWDYAAK WSLDLPDYAK ETDPYDYSDK LDTPDYAEQGPSK QAEFDFPDYASYK WSNMQDYEDYAAK ETDVHDYSWK LDTQDYAEWK QAGPQDMADYADK WSQMDLQDYAQPK ETDVQDYAVK LDWQDYSAYK QAGTLDLEDYAQK WSVDVPDYAK ETDWQDYADK LEADMQDYAEHNK QANEDLQDYAEQK WSVWDQEDYASGK ETTWMDQPDYASK LEDLQDYAWVLHK QAQYDVPDYATGK WTDLPDYSHK EVADYADLSK LEDVGDYAAK QAWMPDQADYADK WTDWYDYADK EVDLVDYAAK LEDVQDYAVK QDDLODYALAHPK WVDKKDYASK EVDMPDYESK LEDVTDYADNESK QDQPDLQDYAVTK WVEPDWPDYSSGK EVDVEDYARK LEGDLQDYSYNNK QEDWVDYAWK WVQDTVDYADDGK EVDVKDYAYK LEYSDDVTDYSDK QEDWYDYADK WVRDVQDYAK EVEDYADYGK LEYYSDAVDYAAK QGATTDVEDYAYK WVTPDDYEDYANK EVEDYATTTK LFADOPDYAYWGK QGDLPDYSSK WWDYEDYAEK EVFDVPDYAK LFDMQDYAQK QGELKDVQDYAEK WYDLVDYESK EVNDVEDYAPFNK LFDVEDYADK QGGFDDQEDYAYK WYDMHDYASK EVPDYAASFGYGK LFNWDLPDYATNK QGQLPDLQDYAWK YAAQEDMEDYSQK EVPDYADLEGVNK LFQDDLEDYAMTK QGQLPEQVDYAWK YADAHDYAFK EVPDYASVRK LFQDDLEDYAOTK QGSADLPDYAANK YADFHDYAHK EVPDYAVRSK LFVDSEDYAAASK QHLQDLQDYASRK YAMWDVQDYAALK EVWVSDQNDYADK LFWQDVQDYSAYK QLDFHDYANK YASDQQDYAEQLK EWAYGPDVPDYAK LFYENDLEDYSHK QLDMVDYANGEYK YAWFADMQDYSDK EWDMTDWQDYANK LGAPDWEDYASNK QLDNDLEDYANVK YAYDTQDYAYMPK

185 EWFTDDTLDYADK LGDFPDYAFK QMDDSDWQDYAFK YDDFEDYAYK EWFVQDVGDYAEK LGDLMDYSEK QMDNSDWQDYAFK YDLEDYALGK EWYQTDSPDYAAK LGDMPDYAAK QMWANDLPDYAVK YDLQDYSELK EYGMVDAEDYSAK LGDPYDYESK QNENWDYPDYAFK YDQRDYSDPWSSK FADLHDYSEK LGDYADYAAK QOGQLMDLQDYAK YDVHDYAEMK FADOPDYAGNVGK LGDYEDYAHK QPDAEDYAAAEVK YDVQDYADEYYGK FADWEDYANK LGDYHDYSYK QPLPNDLEDYSAK YDVYDWPDYSDDK FAHLDQPDYAADK LGDYPDYSEK QPMGADVVDYSDK YEDLEDYAAK FAHRDVEDYAYAK LGFTDWPDYASLK QPPEADLTDYADK YEDLKDYAFK FAYAADMADYDSK LHDDEWDYAK QQOQFDLQDYANK YEDVLDLEDYAGK FDAFADLEDYAAK LHWOADWADYADK QQQLYDQPDYAAK YEDWQDYSSK FDAWNDQPDYSAK LKDQLDAQDYAHK QQSDFEDYAAQQK YENDSPDYDSLSK FDEPDYAAWEQGK LLDFQDYAAK QQTLDLEDYAAGK YENDSPDYSLDSK FDGAGDLQDYAWK LLDQADYADK QSFDQPDYADANK YENFDAEDYAAYK FDKPDYSWNK LLDTQDYATK QSFEDVEDYAQAK YFAQRDQODYAEK FDLEDYAHTK LLDTTDYADK QSLKPDMEDYAGK YFDKVDYAEK FDLEDYSARK LLDWADYASTLNK QSTPEDVQDYAYK YFDNDVQDYAFAK FDLHDYRALK LLFDVQDYSYNGK QTDLWDYAEK YFDVADYASK FDLKDYSADK LLGVGDTPDYAEK QTHDLADYSAWRK YFDVEDYAAK FDLSDYADRK LLMLQDMVDYAQK QTPPLEDYEDYAK YFDVEDYAGGDKK FDMHDYAEGK LLQDMADYAEEQK QTTLDWQDYASGK YFDVKDYAEK FDVEDYAEDK LLQTWDWPDYSEK QTVEMDLEDYADK YFFFTDVGDYANK FDVPDYAYYK LLWDYPDYAEASK QTVEMDLEDYDAK YFNQGDLTDYSEK FDYEDYAEWK LMDFQDYAHK QTYHWDFEDYASK YFTNDEPDYAEPK FDYEDYARAK LMDLEDYAFK QVAADLEDYAVQK YGDTDVTDYADAK FDYPDYADYTFGK LMDWNDYAEK QVALQDLGDYDTK YGLPVDEEDYANK FEDEEDYAAK LMDYEDYSHK QVDLVDYADK YGLQWDVEDYAAK FEGDYEDYASTSK LNDGDYPDYADAK QVLEADLVDYDSK YGOLDQPDYAAYK FEMEDMQDYAAYK LNDLPDYSFK QVWDVQDYSSMHK YGSQGDYADYADK FESDLWDYAQEGK LNDLVDYADVQSK QWAYGPDVPDYAK YGYDTPDYAGWWK FEVDAPDYAK LNNGDVQDYANVK QYAYDLQDYATSK YGYEDLEDYSVNK FFDEDEEDYAQVK LODLDDYASVEOK QYDEPDYAEVMTK YHDKVDYADVNRK FFDLEDYSVK LPDWQDYSADPFK QYDLMDYASK YHDLPDYAHK FFDMKDYEAK LPEEDLDDYSAGK QYDLPDYSYDPFK YLAVYDLPDYSDK FFDMSDYAHK LPERNDYAWKFGK QYDYQDYPDYSAK YLDLADYAHK FFDTHDYAHK LPGWWQDMPDYAK QYDYRDYAEK YLDTEDYASAFGK FFDVQDYAYK LPNFDVEDYADQK RGFYEDTHDYANK YLDTHDYAFK FFLADVPDYAASK LPNLDDVEDYAWK ROGTMEQEDYADK YLDVWDYAAK FFMDQEDYADTNK LQAAYDQADYADK 
RPVLTDAHDYAFK YLDWADYAEK

186 FFQDRHDYAK LQDLVDWPDYSSK SAAMLDAQDYAEK YLDYDLPDYSYFK FGDTADYADK LQDMPDYSEAEAK SADTPDYAAFNSK YLSDLQDYDSLGK FGEDDLKDYAAPK LQDVEDYSVK SADVFDYADK YMGFSVESEDYAK FGEGLDYQDFSAK LQLGDAEDYAEAK SALDVDMMDYSAK YMYSFDQEDYADK FGETHNDVEDYAK LQTMDVPDYSQVK SDDWSDYAEK YNFFDMEDYSANK FGFNHDVEDYSHK LQVQMDYPDYSAK SDLMDYAARK YNYYADQPDYAQK FGSTLDVADYSAK LSDAEDYAAKGTK SDLPDYAEFK YPDLEDYAFK FHDHVDYASSDDK LSDAPDYAMFMGK SDLQDYAELDSKK YPDMPDYANK FHHDDVEDYAFRK LSEYLDMQDYAGK SDVEDYAALK YPELDVPDYANVK FHNDVPDYASSDK LTANSDLFDYSDK SDVVDAKDYADSK YPFLNYDELDYAK FKDLQDYANK LTDDVADYAK SEMTFDEEDYAAK YPNAYDLQDYSGK FKNLLDEEDYAEK LTDLADYSHK SFANDTPDYADFK YPNGLDLEDYSGK FLDLDLPDYANSK LTDLQDYSADEYK SFWDKADYAAEPK YPYDVPDYAK FLLDKEDYAK LTDLQDYSFK SGETNHDVEDYAK YSDEEDTPDYAAK FLNGDVHDYAAVK LTDMWDYAAK SHAWFDHKDYAFK YSDVPDYDTVAVK FLQDDLEDYAMTK LTDVPDYASK SLDLNDYAHK YSDVPDYESVGLK FLYENDLEDYHSK LTGYDQEDYAESK SLDLPDYSSK YSPDLPDYSNDSK FMALDLQDYANTK LTNESDLPDYSWK SLODLPDYADENK YTDLQDYSDYENK FMDKADYAGK LTPDDYPDYADNK SMDLMDYAGK YTDLVDYAEVEPK FMDMQDYAWTFEK LTTGDAPDYADGK SMDNHDLEDYSAK YTDLWDYAEK FMMYSDFPDYSAK LTVPDDYPDYSGK SMDYPDYAGK YTEDQPDYAALTK FMSLQFDMEDYAK LTWLYDMYDYAAK SMEEDLQDYAWEK YTEKMDKEDYAAK FNDFHDYAEK LVAEHDMQDYADK SPDLEDYAYK YTFDADAEDYAAK FNDFPDYAGK LVALADQPDYAYK SPDLVDYAEK YTTHDLEDYADFK FNDVDQEDYAALK LVDFFDYADK SPDVSDTQDYAYK YTWMDSDVPDYAK FNLDEQDYADTPK LVFSGDVEDYSEK SPNEDWQDYADGK YTYPQDKEDYAAK FNLFGDQEDYESK LVGDHRDYSFLNK SPQGDLQDYAHVK YVDLSDYAEK FNNQDQHDYAFQK LVSFMDLADYSDK SPQPDLEDYDTPK YVDWQDYADK FNSDQPDYADYEK LVWTQDDVMDYAK SQSKVLDLPDYAK YVFDGDSMDYADK FNTDKMDYAK LWVDKHDYAASMK SQTLNDAEDYASK YVPDMADYAK FPAADAEDYASEK LYDLADYAQK SSPFSDQLDYADK YVQDKADYSAVAK FPDLVDYASK LYDLEDYSQSPTK SSVGDDVQDYAHK YVTMGDLPDYSNK FPDVEDYAADKGK LYFDTEDYAEQNK SSVVGDDQDYAHK YVVYDVHDYAYGK FPDVEDYAAKGDK LYMEEDYQDYSAK STDDLEDYSVNFK YYDLPDYAFK

FPDVEDYAAQATK LYSDLQDYSDLGK STDLEDYSMK

Shown are all 12ca5-binding peptides identified (with an average local confidence score ≥ 80) from selections against 12ca5 (1055 total), where a 12ca5-binding peptide is defined as a sequence containing the DXXDY(A/S) motif, or one that differs from it by one residue (e.g. EXXDYA or DXXDYG). The DXXDY(A/S) motif accounts for 93.4% of sequences (985/1055).
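The motif definition used for Table 4.6.1 can be expressed as a simple pattern search. The Python sketch below is one possible reading of that definition, not the script used to build the table: it treats "differs by one residue" as at most one mismatch among the four fixed positions of a six-residue window, and the function names are hypothetical.

```python
import re

# Exact motif: D, any two residues, D, Y, then A or S.
MOTIF = re.compile(r"D..DY[AS]")

# Fixed positions within a six-residue window and the residues allowed at each.
FIXED_POSITIONS = {0: "D", 3: "D", 4: "Y", 5: "AS"}


def has_motif(seq):
    """True if the sequence contains the exact DXXDY(A/S) motif."""
    return MOTIF.search(seq) is not None


def has_near_motif(seq, max_mismatches=1):
    """True if any six-residue window deviates from the motif at no more
    than max_mismatches fixed positions (exact matches included)."""
    for start in range(len(seq) - 5):
        window = seq[start:start + 6]
        mismatches = sum(
            window[i] not in allowed for i, allowed in FIXED_POSITIONS.items()
        )
        if mismatches <= max_mismatches:
            return True
    return False


for peptide in ("YPYDVPDYAK", "GFDLEDYVSK", "WWWWKWTNYK"):
    print(peptide, has_motif(peptide), has_near_motif(peptide))
```

Here YPYDVPDYAK contains the exact motif, GFDLEDYVSK matches only under the one-mismatch rule (valine in place of A/S), and WWWWKWTNYK matches neither.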

Table 4.6.2. Affinity selections of 10^6–10^8-member (X)9K and 10^9-member (X)12K libraries against 12ca5 identify 1649 non-specific binders.

AAEWAWTTHK FYLWYMWPWK WEWMFFFPYK WWFFMPWFHK AAEWAWTTHK FYLWYMWWPK WEWSPYWWYK WWFFWEPSWK AFFWGWEWFK FYMWDWAWLK WEWWEPYFWK WWFFWESPWK AGFFTYFWYK FYNEWMWWWK WEWWFHPFWK WWFHLDAWWK ALFLFHWLMK FYNWFYYWQK WEWWPEYFWK WWFMAPWAFK APWFLFQLFK FYNWGWDWFK WEWWYQPWFK WWFMPAWAFK APWWYYWYYK FYPAMWWFFK WEWYPFYWFK WWFMYFPWQK ASWWPYWYWK FYPFFYQWLK WFANWFAWPK WWFMYPFWQK ATMLMLTFFK FYPMWWYWWK WFANYWWMYK WWFNFFAPFK AWTKFFLMFK FYPVWFWAWK WFAWMPPYWK WWFNFFPAFK DFSPFLLFVK FYPWWGYYFK WFAWMPYPWK WWFNFLPYWK DFVAKFVLHK FYQWWWVHFK WFAYFWWSPK WWFPANWWFK DFWHWHWFFK FYSLFPWWLK WFAYYFFPFK WWFPWTFVLK DLLHFLFELK FYSWVPWFLK WFAYYFPFFK WWFPWTVFLK DPYWWLKYWK FYTWWPWLWK WFDLMNFFLK WWFQWPWMEK EFEWLYWFHK FYTYFVVFHK WFDVFWWHYK WWFTWFYSPK EFFWHFWPFK FYWAWMDYWK WFDWATWFWK WWFTWWPVYK EFHFHWVFYK FYWDPFWWSK WFDWFLAHWK WWFTWYNWPK EFHHFWVFYK FYWEFHYFWK WFEDFFQFWK WWFWEAYAWK EFWLVPWWWK FYWEFNFWWK WFEFLEVTFK WWFWTNYLYK EFWYLQWWPK FYWFMFNFPK WFEPFGWWWK WWFWTNYYLK EQFLPYWWYK FYWFMFNPFK WFEWAVPWPK WWFWVYPLWK EQWPFAWHFK FYWFWQDWYK WFFFWLPHWK WWGWDNFWFK ERWWFPWMHK FYWFWTHPWK WFFFWNFQPK WWHLMFWGWK ERWYHFRWWK FYWHFLPYWK WFFGGPWWYK WWHPFFWGWK ETDWELLVFK FYWTWWNMYK WFFGGWPWYK WWHVWNAFWK EVFNLAFLLK FYWWAFLPWK WFFGWFQPFK WWHWTYYMYK EVLHFLFELK FYWWDFWTAK WFFMRYFHRK WWHWWPYLQK EWWEHVWWFK FYWWLQHLWK WFFPALNWWK WWHYYWMGLK FAFFPWWMYK FYWWLQHWLK WFFPLANWWK WWKDPWRFWK FAFWFLMFLK FYWWNMNMWK WFFPWFGAFK WWKLYNVWWK FAFWFPFFLK FYWWNMNWMK WFFPWFGFAK WWKTYYDWWK FAFWGWEWFK FYWWQLHWLK WFFPWWYMSK WWKTYYWDWK FAWHFYPWFK FYWWTFWLPK WFFSLWDWAK WWKVNYLWWK FAWWTYFGWK FYWWTFWPLK WFFWNFSQFK WWLGNPWWYK FAWWYTHWWK FYWWWDVSYK WFFWVDKLWK WWLNFMQFWK

189 FAYWLPWWFK FYWWWFNFTK WFGDWWWMFK WWLNWFEFTK FAYWPLWWFK FYWWWYHTMK WFGFFLNLMK WWLNWFETFK FDPFWLWAYK FYWWWYPSYK WFGLTPWFWK WWLNWVYFDK FDPFWLWSFK FYWYFLYQPK WFGWFNPPFK WWLPFAWAWK FDPWWFYFGK FYWYSWYWNK WFHFFDLWAK WWLPHHYWWK FDSPFLLFVK FYYMNLAWWK WFHPWFYWVK WWLVWPLFGK FDTMLVDQPK FYYMQVAWWK WFHQTWWYWK WWLWDMPFLK FDWRWFDFWK FYYYPYFWYK WFHWYWQYTK WWLWDPMFLK FDWRWFDWFK FYYYPYFWYK WFHYWMDFFK WWLWPLGPFK FEEWLYWFHK GAFFTYFWYK WFKFFTWWPK WWLWQVFSPK FEFWHFWPFK GFPAWTWFWK WFLFFPFYTK WWLYFVNPWK FEFYWWPWFK HFWSWFPYFK WFLFGYWLPK WWLYHFFPLK FEHFHWVFYK HFWWYSFDWK WFLFGYWPLK WWLYHFLMLK FFDFFSWFNK HGWWWLWHYK WFLVQWWPFK WWLYHFPFLK FFDFLWLLHK HVWTDMYQPK WFLVVPWYWK WWMFNPWWYK FFDHFFWWYK HWLYWPYPYK WFLWVWPFYK WWMFWPMFKK FFDLLKWFLK HWLYWYPPYK WFMFFPDFWK WWMLPHFLFK FFDLLKWLFK HWMWHNWWWK WFMPVAWFFK WWMNYWDLWK FFDPFFLFTK HWWFYANPWK WFMQYWWPWK WWMWDPWLWK FFDVFEWLVK HWWMHNWWWK WFMSWMLWMK WWMWMYPMLK FFDYWFKGFK HWWYGWPWYK WFMSWPFMWK WWMYPLGFFK FFETVWWLLK HWWYGWWPYK WFNAKYMWWK WWNFWPTWPK FFEWGWWNWK HWYFWWYMPK WFNFWDFWSK WWNNLFWGWK FFEWQYFFFK HYLFYVWFPK WFNFYTDWWK WWNPWQFWVK FFFDSLANWWTWK KDYMLWWPWK WFNNLWYFYK WWNQFWDWYK FFFLSVELSK KEEEHYWWLK WFNWFQSFYK WWNWASWLFK FFFPFLLHLK KFLWFWWGFK WFNWFQSYFK WWNWHEFYWK FFFPWAVVFK KFWKFLVFYK WFNWFQYSFK WWPLEFFWFK FFFRWPFFWK KFWNFSFWFK WFNWFWGFSK WWPNYWWHLK FFFSWFPWLK KGELLLLHVK WFNWHPFYWK WWPVWFMNFK FFFSWFWPLK KGELLLLVHK WFNWHWPFYK WWPWAGWPWK FFFYDWSWWK KGSYLFWFHK WFNWMWWFKK WWPWFYGNFK FFFYLMTPWK KRFLNMEQWK WFNWMWWNAK WWPWMPPFWK FFFYLMTWPK KWHFPWRWWK WFNWTHYFPK WWQFWFEPFK FFGGWLQFWK KWHYMYWAWK WFNWTNYLFK WWQFWFPEFK FFGKWYYWWK KWLNFWQWWK WFNWVEFFYK WWQNVFQWWK FFGWYWYSFK KWMQMYWFTK WFNWVWQWNK WWQTFYQFWK FFGWYWYYAK KWNTVFFYWK WFNWWWNAYK WWQTWWQLVK FFGYWGWAFK KWVPFMDLWK WFNYGWYFPK WWQYWPWWGK

190 FFHFWHHYWK KWWEHWYWWK WFNYWLQYWK WWQYYAGWWK FFHFYGYRFK KWWWAHSWWK WFPFYWWNWK WWQYYWWLSK FFHHFFLHLK KWYWHWWLEK WFPPFYVWWK WWSDTWYFWK FFLFLWPFTK LAEDAWTEGK WFPTFWMWHK WWSDTWYWFK FFLHFMLFPK LASYLFWFHK WFPVLWFNWK WWSHYFPFWK FFLLTWSQQK LASYLFWHFK WFPVLWNFWK WWSHYFPFWK FFLNLHFWPK LAWFFDYFLK WFPWFFVGHK WWSLNYYWWK FFLPWWDHWK LAWLTFHFWK WFPWMNMWFK WWSNLWPWFK FFLPWWDWHK LAWLTFHWFK WFPWNFNFWK WWSPWWYLFK FFLTYLLTFK LAYSLFHWFK WFPYMFWPWK WWSVPLFYWK FFMFPWFYAK LAYSLFWFHK WFPYWLYYPK WWSVQWPWFK FFMFSLFHPK LDWLFDLFTK WFQFWNFLLK WWTFAQWQWK FFMVFGFMLK LFAFNFYFPK WFQLAFHGWK WWTFWDYWTK FFMWFFFPLK LFAQLLLMWK WFQYWFFYPK WWTLTWFPWK FFMWFFPFLK LFDFLYELMK WFQYWFPFYK WWTLWNLMMK FFMWLMPFWK LFDFVWDFVK WFRLFDFLLK WWTMFNNFWK FFNLFFNYYK LFDFVWSVLK WFRLSWDWLK WWTMTFWQTK FFNLLLDFPK LFEFLRYFLK WFRTVWDWLK WWTWDHVFWK FFNLWFDFWK LFEFWNYWWK WFRWWSTLWK WWTWFDWVHK FFNMVLNFFK LFFFPYWYFK WFSFPLWLLK WWTYNFPWWK FFNQVFFFLK LFFPFWWFDK WFSWLQGLFK WWTYNPFWWK FFNVWDYLFK LFFTLMQWLK WFSWLWLPTK WWVFFPFFHK FFNWFWPLWK LFHLMFELFK WFSWLWPLTK WWVHLWTWPK FFNWLNFMFK LFLGVLAMLK WFSWWWSEYK WWVYFFQYPK FFPFLWPYWK LFLLQVFMPK WFTMLFNMFK WWWDLMQYFK FFPMWWWFPK LFLNVWAMLK WFTWGFVWPK WWWDPFQYFK FFPWLVLPWK LFLWWLFPYK WFVLAFPWWK WWWDYGYLLK FFPWLVPWLK LFLWWLPFYK WFVLNWNMFK WWWDYVEWFK FFPWLVWPLK LFMLFPWFFK WFVRTWDWLK WWWFPVWMYK FFPWTFMWWK LFMLLNVLLK WFVSTYWWWK WWWFQTGPWK FFPWYFVNWK LFNFLGFWLK WFVWWEPMWK WWWFSPWPQK FFPWYRWMYK LFNVFFWLSK WFWDFWFNHK WWWFTPGWPK FFPYWSWWGK LFPFHWFYLK WFWDTHWFWK WWWFTPWGPK FFQHMSFFFK LFPFWGWWLK WFWEQWAFWK WWWFTWGPPK FFQHWWWNFK LFPYVWLALK WFWFFFPDNK WWWGWMPFYK FFQMGFFFPK LFRWLSYLLK WFWFFFPNDK WWWGWMPYFK FFQWYNWWMK LFRWLSYLLK WFWFFQTPWK WWWHPFFYKK FFSFLSSWFK LFTYLTLFFK WFWFFQWPTK WWWMSPWAFK FFSLLTLMMK LFVLLTYLPK WFWFKSYYPK WWWNGWYLFK

191 FFSLYTWQDK LFWLNFLFAK WFWFPFQVFK WWWNLQSWWK FFTELLAFFK LFWLWWDWSK WFWFQHSWPK WWWNPHWWPK FFVGFFFPWK LFWRLSYLLK WFWFQQNWYK WWWNWPMWHK FFVGFPFFWK LFWSWWDLWK WFWFTYSPWK WWWNWPMWHK FFWDWKYWFK LFWWESWWFK WFWFTYSWPK WWWPDAYLFK FFWFQWPWFK LFWWMNLNFK WFWFWDFPFK WWWPDSFLFK FFWFWGDFLK LFWYLAQLFK WFWFWDPFFK WWWPLFAWAK FFWFWGDLFK LFWYVLKWLK WFWFWDWMPK WWWPLWWGEK FFWFWNEYWK LGFYALTFWK WFWFWLANSK WWWPWAWTHK FFWFWQDYWK LGVLLFSYYK WFWFYAWSPK WWWPWAWTHK FFWLQYNAFK LGWWFPAFWK WFWGYFTTPK WWWPWFYAGK FFWLWAPYFK LLDLWLLLAK WFWLEYHWWK WWWPWLGRFK FFWLWAYPFK LLELVTWLLK WFWLGWLYPK WWWPWLWADK FFWNSFVWWK LLEVFWNLFK WFWLMGGYPK WWWPWMVPYK FFWPLEMFWK LLFDLLLHFK WFWLNSWFMK WWWPWMVYPK FFWWFHPQYK LLFYFLPFPK WFWLPWYVGK WWWPWWGALK FFWWHFWHSK LLGWALNFWK WFWLWPYVGK WWWPWWHYDK FFWWQWPDFK LLRMPWFFLK WFWLWSVMPK WWWPYFWHQK FFWWVPTMWK LLSVAWWPFK WFWLYEHWWK WWWQANAWWK FFWWYGWTPK LLTTLLWFFK WFWMFPWVFK WWWQWGQGWK FFWYWMHNWK LLTWFNFWWK WFWMFVWPFK WWWQWLTYSK FFWYYQWNMK LLVWFPYWWK WFWMTDWWQK WWWQWMFPTK FFYMTWYPWK LLYLFSFYGK WFWNMWWFKK WWWQYLKVFK FFYMTWYPWK LMFWFFPFFK WFWNSDWWYK WWWQYLKVFK FFYMYWNYYK LMGWWFFPYK WFWNTWWFQK WWWSLFPEFK FFYPWWETWK LMNFLMALFK WFWNTWWQFK WWWSLNSWWK FFYPWWTEWK LMNWWWPYFK WFWNVYDWWK WWWSNLSWWK FFYWLLLESK LNFLVLYLLK WFWNWLQNYK WWWSVAPWLK FFYWQLFPWK LNLWFFWQTK WFWNYWYFEK WWWTDYWGYK FFYWWLPLFK LPFFLEVFFK WFWPFFYTYK WWWTNFWAYK FFYWWPLLFK LPFWWHFFLK WFWPFWWQTK WWWTWENAWK FGFGQATTLK LPFYLAAFWK WFWPLWDWYK WWWTWENWAK FGFNFGHYFK LPLFLAFFYK WFWPLWDYWK WWWTWQDAWK FGGWPWWLFK LPMTWWFWFK WFWPMAWLVK WWWTWWEYPK FGPAWTWFWK LPPWYWFLFK WFWPVYMYWK WWWVFPTAWK FGWFDFFHWK LPWFWFFPFK WFWPWMFLSK WWWWFQALRK FGWHFWHWVK LPYFWFLNWK WFWPWNFYFK WWWWFQARLK FGWHWHFWVK LQLMLLTWLK WFWQGWQYFK WWWWKEYMWK FGWLNFLLFK LTLVFNWHFK WFWTFLFGHK WWWWKWTNYK

192 FGWLNLFLFK LVHMHLYFMK WFWTFLYPYK WWWWLSPDWK FGWLTPWFWK LWDFFLNFVK WFWTWSQPFK WWWWTLMRSK FGWNWFQFLK LWDFFQVFVK WFWTWWTWSK WWWWTSLWGK FGWWLPWQWK LWDWVTWVLK WFWVFPYHFK WWWWTSWLGK FGWWMLPWFK LWEWWPWVWK WFWWDYSGWK WWWWTVPYWK FHFPHFWWWK LWEWWPWWVK WFWWGDPFWK WWWWTWEPVK FHHWWDWVWK LWFLNWLRFK WFWWGFPQYK WWWWYSPRPK FHNWWVHWWK LWFWLPSWLK WFWWGFQPYK WWWYAWHMHK FHNWWVHWWK LWFWPLSWLK WFWWLDLTLK WWWYFESQYK FHWLLMTWHK LWNLWKYFFK WFWWLDTLLK WWWYFTQDYK FHWVYFYAYK LWPFFWFFDK WFWWLPVPYK WWWYLPADWK FHWWWHYYWK LWPWYFLWAK WFWWNFDSWK WWWYPLVFMK FLEFWNYWWK LWTAWWYPWK WFWWNPFLFK WWYDYWTFWK FLEFWYNWWK LWWFLNLLHK WFWWNYPGWK WWYFFNTFLK FLFDFTWLLK LWWFLQVLHK WFWWPNFLFK WWYFPVYDWK FLFFDWWSWK LWWTWNGYWK WFWWPWTQLK WWYFPYVDWK FLFFDWWWSK LWWVWDLFHK WFWWSFLLPK WWYFQNSYWK FLFPFVLWLK LWWWPLMYWK WFWWSLFLPK WWYGGWQWYK FLFPFWWFDK LWWYDWFTWK WFWWSLPLFK WWYGGWQWYK FLFTLMQWLK LWWYQTMFWK WFWWTHWYTK WWYGWGQWYK FLGFLLGLVK LWYFFYFGPK WFWWTYEFPK WWYGWWFAPK FLHLFMDLWK LWYFFYGFPK WFWWTYEPFK WWYNSYWFHK FLHWLWWNLK LWYFFYPGFK WFWWWYPETK WWYPFLPYWK FLHYWWFYPK LWYMTYWFPK WFWWYFVTPK WWYPMSWWYK FLKFHVWWGK LWYPYWWAFK WFWYLPWDWK WWYPQDFWWK FLKFVHWGWK LYFPFEFWFK WFWYNWFQRK WWYPSMWWYK FLKFVHWWGK LYFQWLLPWK WFWYPLWDWK WWYQPWWWLK FLLFNLNWFK LYMHFWYLHK WFWYSADWWK WWYTNTWFWK FLLGPWMWWK LYNWWQYFLK WFWYSEGWWK WWYWDYWLMK FLLLQVFMPK LYQWWSWLLK WFWYTHPQFK WWYWFYPHTK FLLNFFHFLK LYTFLQFFLK WFWYTHPQFK WWYWGYYLMK FLLWWLFPYK LYVFYLWPWK WFWYTYYWRK WWYWGYYPFK FLMAWPWWLK LYVFYLWWPK WFWYYTFWPK WWYWPVLGWK FLMQFLDWFK LYWFFLWNPK WFWYYWFPHK WWYWQFPLYK FLNAYVWLFK LYWWPWFYPK WFYDWYMWPK WWYWQPFLYK FLNDLLEFFK LYYLLFGWPK WFYHSPWWWK WWYWWNYVPK FLNFFWFLAK LYYLLHMHFK WFYHWDWYNK WWYWYPYYSK FLNFVFHFLK MAFFWDWLFK WFYHWLGFPK WWYWYYPYSK FLNFVFHLFK MAWYWFFPWK WFYHYYWFLK WWYWYYPYSK

193 FLNFWFPTGK MFFDWFEWWK WFYNTWHWWK WWYYDNYWWK FLNFWFSAPK MFGVQMAWLK WFYNWWYEYK WWYYEPWFLK FLNFWFSAPK MFHWFLNFLK WFYNYFPQWK WWYYNDYWWK FLNLNWWWAK MFHWQFWWPK WFYQWWYWMK WYEWFWNQWK FLNLWAWFLK MFLHFFMPWK WFYVYLFPHK WYEWSWFWGK FLNLWLLLAK MFLHFMFPWK WFYWFFFPTK WYFFLQFWPK FLNVVLDFLK MFMLPWFFWK WFYWFFPFTK WYFFLQPWFK FLPFLTWAWK MFWNWWWWSK WFYWLPDWWK WYFFSFPWWK FLPFMDWYFK MFWPWWLTWK WFYWWYDYPK WYFFSMLWWK FLPLVFVFMK MFYHWMWWQK WFYWYDWYPK WYFFSPFWWK FLPVLMAFLK MFYWLQYLLK WFYYPYMWYK WYFFYLPLWK FLREFFVTWK MGPAAFTQEK WGALFSWFFK WYFGQYWFYK FLRLLLWALK MHPWTYWWLK WGFWWPFQYK WYFHWFNYWK FLSNWFAWLK MLFFSPWWWK WGLPWWWFMK WYFNPWYQWK FLTMFWDYLK MLFFSWPWWK WGLYYWWPYK WYFPWEWWQK FLVELLEFFK MLFFVFYPFK WGPLFWVLLK WYFVFPQSWK FLVGFPWFFK MLFLHLYRMK WGPWWFAQWK WYFVFWGYPK FLVGFWPFFK MLPFFWWYTK WGVMWFPWWK WYFWKWWDTK FLVMEFMWPK MLQVWWSFFK WGWFQWDFWK WYGLYFTYFK FLVYYWPWWK MLSLFWWVLK WGWLPLWFLK WYGWWNARWK FLWFQLAMLK MLVHFSFYPK WGWWWWNAPK WYHFFWSFGK FLWMYPYWWK MLWWYPWFFK WGWWYGWGFK WYLFFMQPWK FLWNFLLFGK MMYWVFVVRK WGYWWALPWK WYLGSWFPWK FLWNFLLGFK MNLVFYHWWK WGYYWWWEPK WYLHWWDYFK FLWSFFPYFK MNVWWNWFFK WHAFFPWWLK WYLPFWWGSK FLWVVTWVYK MPWFFLNQWK WHALPWWFWK WYLSWWEMWK FLYFLWDGFK MPWFFQVQWK WHAWLWEYHK WYMFWNNFWK FLYLWFYNPK MQLLFFKTYK WHHPFFWWFK WYMNFWYTFK FMDLVFYFLK MQLLFFTKYK WHNYFYYWWK WYMPWPFFWK FMEWYVSFWK MVLPWFFFFK WHPHFFWWFK WYMWAFYPFK FMEWYVSWFK MVWHLFNFWK WHWHWTPWWK WYMWWQYWQK FMFPLLWFYK MVWHQFWMWK WHWWWFPYMK WYNFWFWYPK FMFSFFQMFK MVWWQLLTFK WHWWWPFYMK WYNFWQFYYK FMFWWLPYMK MWDWFVFSWK WKELLELLWK WYNGWWWYLK FMGFFWNMFK MWFNMLYWFK WLAWFVPYWK WYNPLWWGWK FMHFWYWFRK MWFNPWWFTK WLAWVFPWYK WYNWAFWNYK FMHMYWHWWK MWFPNFYWFK WLAWVFPYWK WYNWSSWWWK FMPAWWFFYK MWFWFLKWPK WLDFWPWWWK WYNYLLQWLK FMPWFELFLK MWLQWPFWFK WLDFWWPWWK WYNYLNKWLK

194 FMPWFFAYLK MWSAWWPWYK WLDLSVWFWK WYNYWFFWNK FMQYWAQWWK MWWEWANFWK WLFFLFHWPK WYPFLWYMSK FMWPWYLRWK MWWQFPFFYK WLFFLPWFHK WYPHVWWHWK FMWWNFYMYK MWWSTFWFPK WLFFNWLNWK WYPHVWWHWK FMWWYWPEYK MWWYHWLPFK WLFFPMDLFK WYPTYWHWWK FNFWHPWFWK MWWYHWPLFK WLFHLLNLLK WYPTYWHWWK FNGWWWTFFK MWYFWPWHWK WLFLQFSAFK WYPWFFQAWK FNLTEAEGPK MWYWFWSPHK WLFPYWVHWK WYPWWMWHYK FNWFEWWQYK MYLWWGNWWK WLFTYLTLFK WYQFFTWGWK FNWGFWFVTK MYLWWNGWWK WLFWANDWWK WYQQWWAFWK FNWHFPWYWK MYNLWWEYFK WLFWHWWASK WYSWWMPWGK FNWLKWFHGK MYPWFDWWWK WLFWLFQHLK WYTSPWYFWK FNWLKWFHGK MYWFWYTPWK WLFWWPWMYK WYTWWQMAWK FNWWHYAWYK MYYWLLWLPK WLFWWWPMYK WYVWWPMWFK FNWYYWHVWK NMFSWWYWPK WLGLHWTWWK WYVYWFPWNK FPAWWMWWPK NNPWWFTWWK WLHFMMFPWK WYWDWLAAWK FPFFWYWYPK NPYFWEFWWK WLHWLAPYHK WYWEWANFWK FPFNFLFHFK NQWWFYWWPK WLHWPWFFFK WYWFFFNRPK FPFNFLFHFK NSLFLGLFWK WLLDFLNFVK WYWFPWWKYK FPFSLFLWGK NTPWVWWWWK WLLDSVWFWK WYWFWNQWSK FPFWFWLMGK NVDWWYWWLK WLLGHWWTWK WYWFWQNWSK FPFWWFLMGK NVLSVWWVLK WLLPWLWRWK WYWGWNARWK FPFWYMPFWK NWFPWYWLFK WLLWFPFNFK WYWHDHWWFK FPFWYWMLFK NWKWWMWWHK WLNFWWHSFK WYWHWNNWWK FPFWYWPFFK NWKYWFQYWK WLNLWKYFFK WYWHWNNWWK FPLSFAFWFK NWLWNWWFHK WLNTWWYFFK WYWKYWYFYK FPMFGWWFFK NWPHEFFWWK WLPFFWFFDK WYWLGMEWFK FPMLWLLQFK NWSWWPWYWK WLPFFYYTWK WYWMWYNWQK FPPFFWWYTK NWTSFWHWWK WLPFFYYWTK WYWMYYMFPK FPPYWQWFFK NWWYWFHQWK WLPLFDLMFK WYWNFFNFLK FPWWWMWQMK NYWTWWTWFK WLPLLFYLFK WYWNYFWHYK FPWYWVTWFK NYWWWWHMYK WLPWFYPWFK WYWNYFWHYK FPYMWLWPWK NYWWWWHMYK WLPWFYWPFK WYWPWWVTGK FPYQFMWFVK NYYHWPWWWK WLPWFYYMFK WYWVALWFPK FPYWLWPFPK NYYHWPWWWK WLPWVFELFK WYWVLAWFPK FQEFWTWLFK PDWELLWVLK WLQLMWFLGK WYWWFPQMPK FQHWHWWWYK PFFFFDFYLK WLQQLFAYFK WYWWNGWLSK FQHWWHWWYK PFFHWLWGWK WLQQLFSFFK WYWWNGWSLK FQHWWHWWYK PFFHWLWWGK WLQVFDWWWK WYWWNVYAFK

195 FQWWWFPWSK PFGVHELVAK WLSFFDLFLK WYWWPWMNYK FQYWHWEWWK PFKTWFAWWK WLSQLFDFLK WYWWWPEFHK FSAPWFWFFK PFLWWWNWYK WLSVDLLWFK WYWWWSSPFK FSMMNWWWFK PFQWWLPWFK WLTAWWYPWK WYWWYTAYPK FSNGFTMVPK PFTLLGWLLK WLTFLFVESK WYWYDFFLNK FSPVYWWWWK PFTLLWGLLK WLTWHFWLLK WYWYEWWNNK FSTWWFYVWK PFVWWAWQSK WLTYYDYWWK WYWYYDYYQK FSTWWFYWVK PFWMLLDLVK WLVQWWPWWK WYYFWNLRWK FTFAQYWWLK PFWMLLDVLK WLWEWYNPWK WYYFWNLRWK FTWWDFWMGK PFWWWWSPMK WLWFLMGWPK WYYKFFWPFK FTYVFQWLFK PFWYLAQFLK WLWFWYHPWK WYYMNTWWWK FVANAVVTTK PFYFNWFPWK WLWMPWFRWK WYYMSQWWWK FVFFDYWFWK PHHPMPFGVK WLWMPWFWRK WYYQSFSWWK FVFFNFELFK PLLFVQVQFK WLWNVPFWLK WYYSQFSWWK FVFWPTWWFK PLVNFLWFLK WLWPYYYFLK WYYYPWWWDK FVLWAHFFPK PMLPWFWWLK WLWRTWVWRK YAMWWWWFPK FVLWHAFFPK PTVFDLWLFK WLWWPVQYWK YAWFWPFYMK FVMFQVGTYK PTVWLMWHTK WLWWRWYDFK YDWWWHHWYK FVPFWYYLWK PTWWHEWWWK WLWWSHWLSK YFAEWWEWWK FVVWFLMFFK PTWWHWEWWK WLWWSHWLSK YFANYWWFFK FVVWFPFFFK PWFGFFVRSK WLWWSYPFYK YFFFPYFAFK FVVWLFMFFK PWFLMPLLVK WLWWVPQYWK YFFHWNWWLK FVVWPFFFFK PWFMWVYPWK WLWWYFPFFK YFFLPWGFFK FVWFLQDFFK PWFPWWEYWK WLWWYLMFFK YFFPVWAWWK FVWWETYWWK PWFWALFPFK WLWWYPFFFK YFFWLFPLLK FVYSWFVHFK PWHPWYFWWK WLWYDWFTWK YFFWLLFPLK FVYWTFHMVK PWLWFYPWFK WLWYQTMFWK YFHWFNWWLK FWAYYLNWFK PWLWPWEWFK WLWYTWYYPK YFHWQWFLAK FWDAYWWGFK PWMWVLKLWK WLWYWAPWPK YFHYWNFWYK FWDLLLKLYK PWMWVLKWLK WLYEWPFWFK YFLFLPWYPK FWDRWWYVWK PWWDELVWWK WLYFAFMFPK YFMWWTWVPK FWDRWWYVWK PWWDWWAYYK WLYPYWWAFK YFNEWMWWWK FWDVFHYWWK PWWWDGWFLK WLYWLLNLVK YFPAMWWFFK FWELWLHAYK PWWWFYPVFK WMEFFFLPWK YFPFYGWWSK FWEWAVWPPK PWWWFYVPFK WMGPWFWWYK YFTWWPWLWK FWFAMLFMFK PWWWHFSYWK WMGWWNEFWK YFWDWLSLYK FWFFLPFHMK PWWWYNWYPK WMMGWYHFLK YFWFWQDWYK FWFFWLPHWK PYFWYWKWLK WMPFFWWMPK YFWFWTHPWK FWFFWPLHWK PYLWDWWYWK WMPFWYWFSK YFWLFFLPHK

196 FWFHGFYYMK PYNWFLEFFK WMPHWWFWFK YFWNPWWKWK FWFLHLWSPK PYPWWWMHYK WMPSWFTFWK YFWNPWWWKK FWFTHHEFFK PYWLLWQLMK WMPSWFTWFK YFWNPWWWKK FWFTWSEFFK PYWWWAPFWK WMPWVFDWFK YFWQWFYSHK FWGNYPWWWK PYWWWAPWFK WMSAWWYPWK YFWTWWNMYK FWGNYWPWWK QFHWWHWWYK WMSFFWPVWK YFWWHWLLMK FWGYANWFWK QHHYWYWWWK WMSFFWVPWK YFWWQYMMYK FWHQFYWAWK QPVVQTLDVK WMSFFWWVPK YFWYSWYWNK FWHQFYWAWK QTYWWWYPWK WMWFPLFNWK YFYPFLWFPK FWHVMYLHFK SAWWPYWYWK WMWLWQSMWK YFYWWYQGFK FWHYAEWWWK SHFWWYWRFK WMWNFMDFWK YGGFWWWNWK FWHYYEGWWK SHYWWWWHWK WMWNWWPWNK YGLLFFLPWK FWKLFNVLFK SHYWWWWHWK WMWQFFQYFK YGWWFWVATK FWLFGYWLPK SHYWWYFWHK WMWWLQDFYK YHFPWYWLFK FWLYLAQLFK SMSWYFVLAK WMWWLQDYFK YHPFWWMHWK FWMHWYDMWK SPWWMPWWFK WMWWSWFSPK YHPMVWYWWK FWMLFDWPWK SPYFWYWFYK WMYFWWLSPK YHWMYWPMWK FWMPLGWFFK SPYWWLWPFK WNFPWLLFWK YHWWDFGWFK FWMQWYWPWK SVLDWFYWFK WNFPWYWLFK YLFWDWWNWK FWMSWPFWMK SYYWWHWWHK WNFWWTNYWK YLNWWQYFLK FWNFLGMLWK TAWWWFFPFK WNLWNWWFHK YLTFLQFFLK FWNFYTDWWK TPFWELLVFK WNPWWLGYWK YLWYMHSFWK FWNHWFEWWK TPLLWFWLFK WNQLFWDWFK YLWYMHSWFK FWNLLMAVLK TQYWWWYPWK WNSFWYYFWK YLYLVWDLFK FWNLVFLTLK TTWFDMLFWK WNSWWPWYWK YMFLNLWFQK FWNLVFTLLK TVHHFFFWVK WNTLWWTWWK YMFLNLWQFK FWNLVLTFLK TYDFWWWTWK WNWAYWWMGK YMHHFWDWWK FWNSLFFFLK VFDWVFTLFK WNWGWYYFPK YMPLLWLMLK FWNVFWWHYK VFEWLFWPFK WNWGWYYPFK YMTFFLELFK FWNWFYMLGK VFFWPTWWFK WNWSSWWFYK YMTPFWWFFK FWNWVEFFYK VFHYYFWHWK WNWSSWWYFK YMWDPWWWYK FWNWVWQWNK VFKLNSHHPK WNWWHWWQLK YNVYWHYWWK FWPFYWWNWK VFNFLLYLLK WNWWLPYVWK YNWGWYSWWK FWPNFWAWFK VFPFFLYALK WNWWLWPYVK YNWHWWPYWK FWPYMFWPWK VFPMFWWWPK WNWYWFHQWK YNWWWYHLYK FWPYWLYYPK VFQMHFHWFK WPFAWPFWFK YPFFKWWFWK FWPYYLFQWK VFQQWELEYK WPFPVWWFWK YPFFPFWFYK FWRFLQDLLK VFTWLYAWLK WPLPWFVFWK YPFVFWDFWK FWRLFQDLLK VFWHLWGFFK WPMWFFLPWK YPMWWHWWGK

197 FWRLMFMWYK VFWHWFFGLK WPMWFFPLWK YPWAWWYFPK FWRPFFMWYK VFWHWFLPWK WPPFVGWWWK YPWWLFLWPK FWVDWFYFAK VFWWETYWWK WPSFWVFPWK YPWWLLFWPK FWVMFPWHFK VFYPWWFTWK WPWDELVWWK YQFNWFHWWK FWVWLFPHWK VGWFFWPWYK WPWFWQSMWK YQWWHLWPWK FWWDFSLPWK VGWFFWYWPK WPWFWWGEPK YQWWHWLPWK FWWDFWFNHK VGWMWFPWWK WPWFWWGPEK YSPWAWWTYK FWWDHFEYWK VLFDWFAWWK WPWKYFYYWK YSWWEQWWWK FWWFEDWFTK VLGWLMMFSK WPWWWPGWPK YSYWEWWTWK FWWFQHSWPK VLHVHFWLGK WSEQTTNLPK YSYWWPMFWK FWWFQQNWYK VLNWLMDLFK WSFFLFFPLK YSYWWPWFMK FWWFWDFPFK VLPWWFQWFK WSFFLFPFLK YTWFYHWWYK FWWFWDPFFK VLWYTMFNFK WSFLWDMLFK YVQWYHWWWK FWWFWDWMPK VMPWWMWWMK WSFWVFGKPK YVWFFPLWMK FWWHVWMPFK VMQWLWELLK WSFWVFGKPK YWAWYSWFNK FWWHVWPMFK VPFWLFNWLK WSFWVFKGPK YWEKWYWFFK FWWKFYWHYK VPWWEYFLFK WSHFFDWWFK YWEWFWNQWK FWWKYFWHYK VPWWFFFKFK WSNWFWWYHK YWFFLQFPWK FWWKYFWHYK VPWWYEFLFK WSPAFLWFWK YWFPWEWWQK FWWLWPYVGK VVWFHLFFPK WSPFWFWQFK YWFWWQWKHK FWWLYEHWWK VWDFVWYFLK WSPQWWWQWK YWFWYTQWMK FWWMDQMWWK VWDWWLYFRK WSPWFDWWFK YWHFWWDWPK FWWMPWLSLK VWFHFGVWHK WSPWVWWFTK YWHWFGLGFK FWWNYWYFEK VWKFLNLYLK WSWFFFWNPK YWHWFLGGFK FWWNYYYYAK VWKFLNYLLK WSWFFFWPNK YWHWSPWNWK FWWPFFYTYK VWLFNVEPFGWLK WSWNWYSYWK YWKEWYWFFK FWWPFWLPLK VWPSWFLFWK WSWNWYYSWK YWNWFYYYRK FWWTDLWTFK VWPYFLYFWK WSWWEYFTYK YWNWSSWWWK FWWTWSQPFK VWQHWWHYWK WSWYNYFWYK YWNWWKSWWK FWWTWYNLMK VWQHWWWNWK WSWYWLPYYK YWPWFFQAWK FWWWNFDSWK VWQHWWYHWK WSWYWLYYPK YWPWFFQWAK FWWWNPYGWK VWWFPWLPLK WTEYYWHWWK YWQWHWAFWK FWWWPWTQLK VWWHVWNLWK WTEYYWHWWK YWQWHWFAWK FWWWSLPLFK VWWHWAYWPK WTNFWFWYPK YWSWWMPWGK FWWWVWTQNK VWWYFPFWPK WTSWLAWYHK YWVFFPPYWK FWWWWPYRSK VWWYPFFWPK WTTESQQAFK YWWEWANFWK FWWWWPYRSK VWYWFANFFK WTTESQQAFK YWWFPWWKYK FWWWWPYSRK VYAPFWFWWK WTWNWPFLWK YWWHFYHDWK FWWYTYYWRK VYWHEFWWWK WTWPWFWYPK YWWMTMPFWK

198 FWWYYWFPHK WAAWYYNLFK WTWWPYWMWK YWWMWPGVWK FWWYYWFPHK WAFLPYWWYK WVDFVWYFLK YWWNFFNFLK FWYHDYSWWK WAFLPYWYWK WVDFWQYWLK YWWTLQWHWK FWYHLWYPFK WAFLYPWWYK WVFFNFYYPK YWWTLQWWHK FWYHYYTWYK WAFWWPWFGK WVFPFAWFSK YWWWHLMRFK FWYTWWYTHK WAFWWWPFGK WVFWWPNLWK YWWWHYWMPK FWYWFFFPTK WAFYWYPYWK WVGFFFPYFK YWWWHYWMPK FWYWFFPFTK WAHPLWWFWK WVGLFWWMLK YWWYYDYYQK FWYWLPDWWK WAHWLWEYHK WVLQLFKLWK YWYHWGWWFK FWYWLSLLHK WANFWFFNWK WVMFYKNTWK YWYKFFWFPK FWYYPYMWYK WANFWFPYWK WVMVFLNMLK YWYKFFWPFK FWYYQWWGNK WANFWFYPWK WVWWFPWMHK YYFMWPFWWK FYANYWWFFK WANFWYWAWK WVWWLFYPLK YYFNYYEWWK FYDHWAWWWK WANFWYWWAK WVWWMVTNWK YYLSWWNYWK FYDWWSVFWK WAPLWFWWPK WWADFLMLFK YYLSWWNYWK FYDWWYSYFK WAWFMMPFWK WWAFPWFFYK YYTWWWPWPK FYDWWYYSFK WAWFMPMFWK WWAWLWDHWK YYWLFNPWFK FYFDWWYWPK WAWNFFWRFK WWDFAFSVFK YYWLHWFFPK FYFDYWYWWK WAWPFFFSWK WWDWQYWMFK YYWLHWFPFK FYFFNWFLPK WAWWAFPWYK WWDYLNSWWK YYWMTAFWWK FYFPVWAWWK WAWWAFYPWK WWEEPWWWHK YYWWVMWHPK FYFSFWMPWK WAYAWWLWPK WWETAFWWPK YYWWWHLFPK FYFYKFWWFK WDWWPLWHFK WWEWWKWFPK YYYPWYFWFK

FYGFSYWYVK WDYYWMNWFK WWEWWWKFPK

FYHFWNWWLK WEAPWYWYWK WWFEPYWFFK

FYHWWNWGYK WEWFNYPWWK WWFEYPWFFK

Shown are all non-specific binders identified (with an average local confidence score ≥ 80) from selections against 12ca5 (1649 total), where a non-specific binder is defined as any sequence not containing a DXXDY or EXXDY motif.
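
The definition above can be expressed as a short pattern match. The following is a minimal sketch, not the scripts used in this work: the function name is hypothetical, and the third test sequence is the well-known HA epitope tag (YPYDVPDYA), included only to show a motif-containing case.

```python
import re

# DXXDY or EXXDY, where X is any residue (definition from the caption above).
MOTIF = re.compile(r"[DE]..DY")

def is_nonspecific(sequence: str) -> bool:
    """Hypothetical helper: True if the sequence lacks a DXXDY/EXXDY motif."""
    return MOTIF.search(sequence) is None

examples = [
    "FGWLNLFLFK",  # from the table above
    "WKELLELLWK",  # from the table above
    "YPYDVPDYA",   # HA epitope tag; contains DVPDY
]
for seq in examples:
    label = "non-specific" if is_nonspecific(seq) else "motif-containing"
    print(seq, label)
```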

4.7. References

(1) Mignani, S.; Huber, S.; Tomás, H.; Rodrigues, J.; Majoral, J.-P. Why and How Have Drug Discovery Strategies in Pharma Changed? What Are the New Mindsets? Drug Discov. Today 2016, 21 (2), 239–249. https://doi.org/10.1016/j.drudis.2015.09.007. (2) Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53 (7), 2719–2740. https://doi.org/10.1021/jm901137j. (3) Payne, D. J.; Gwynn, M. N.; Holmes, D. J.; Pompliano, D. L. Drugs for Bad Bugs: Confronting the Challenges of Antibacterial Discovery. Nat. Rev. Drug Discov. 2007, 6 (1), 29–40. https://doi.org/10.1038/nrd2201. (4) Tsomaia, N. Peptide Therapeutics: Targeting the Undruggable Space. Eur. J. Med. Chem. 2015, 94, 459–470. https://doi.org/10.1016/j.ejmech.2015.01.014. (5) Mijalis, A. J.; Thomas Iii, D. A.; Simon, M. D.; Adamo, A.; Beaumont, R.; Jensen, K. F.; Pentelute, B. L. A Fully Automated Flow-Based Approach for Accelerated Peptide Synthesis. Nat. Chem. Biol. 2017, 13 (5), 464–466. https://doi.org/10.1038/nchembio.2318. (6) Gates, Z. P.; Vinogradov, A. A.; Quartararo, A. J.; Bandyopadhyay, A.; Choo, Z.-N.; Evans, E. D.; Halloran, K. H.; Mijalis, A. J.; Mong, S. K.; Simon, M. D.; Standley, E. A.; Styduhar, E. D.; Tasker, S. Z.; Touti, F.; Weber, J. M.; Wilson, J. L.; Jamison, T. F.; Pentelute, B. L. Xenoprotein Engineering via Synthetic Libraries. Proc. Natl. Acad. Sci. 2018, 115 (23), E5298–E5306. https://doi.org/10.1073/pnas.1722633115. (7) Schissel, C. K.; Mohapatra, S.; Wolfe, J. M.; Fadzen, C. M.; Bellovoda, K.; Wu, C.-L.; Wood, J. A.; Malmberg, A. B.; Loas, A.; Gómez-Bombarelli, R.; Pentelute, B. L. Interpretable Deep Learning for De Novo Design of Cell-Penetrating Abiotic Polymers. bioRxiv 2020, 2020.04.10.036566. https://doi.org/10.1101/2020.04.10.036566. (8) Krumpe, L. R.; Mori, T. 
Potential of Phage-Displayed Peptide Library Technology to Identify Functional Targeting Peptides. Expert Opin. Drug Discov. 2007, 2 (4), 525. (9) Roberts, R. W.; Szostak, J. W. RNA-Peptide Fusions for the in Vitro Selection of Peptides and Proteins. Proc. Natl. Acad. Sci. 1997, 94 (23), 12297–12302. https://doi.org/10.1073/pnas.94.23.12297. (10) Lam, K. S.; Kazmierskit, W. M. A New Type of Synthetic Peptide Library for Identifying Ligand-Binding Activity. 1991, 354, 3. (11) Zuckermann, R. N.; Kerr, J. M.; Siani, M. A.; Banville, S. C.; Santi, D. V. Identification of Highest-Affinity Ligands by Affinity Selection from Equimolar Peptide Mixtures Generated by Robotic Synthesis. Proc. Natl. Acad. Sci. U. S. A. 1992, 89 (10), 4505–4509. https://doi.org/10.1073/pnas.89.10.4505. (12) Stokes, J. M.; Yang, K.; Swanson, K.; Jin, W.; Cubillos-Ruiz, A.; Donghia, N. M.; MacNair, C. R.; French, S.; Carfrae, L. A.; Bloom-Ackermann, Z.; Tran, V. M.; Chiappino-Pepe, A.; Badran, A. H.; Andrews, I. W.; Chory, E. J.; Church, G. M.; Brown, E. D.; Jaakkola, T. S.; Barzilay, R.; Collins, J. J. A Deep Learning Approach to Antibiotic Discovery. Cell 2020, 180 (4), 688-702.e13. https://doi.org/10.1016/j.cell.2020.01.021. (13) Yoshida, M.; Hinkley, T.; Tsuda, S.; Abul-Haija, Y. M.; McBurney, R. T.; Kulikov, V.; Mathieson, J. S.; Galiñanes Reyes, S.; Castro, M. D.; Cronin, L. Using Evolutionary Algorithms and Machine Learning to Explore Sequence Space for the Discovery of

Antimicrobial Peptides. Chem 2018, 4 (3), 533–543. https://doi.org/10.1016/j.chempr.2018.01.005. (14) Basith, S.; Manavalan, B.; Hwan Shin, T.; Lee, G. Machine Intelligence in Peptide Therapeutics: A Next‐generation Tool for Rapid Disease Screening. Med. Res. Rev. 2020, 40 (4), 1276–1314. https://doi.org/10.1002/med.21658. (15) Giguère, S.; Laviolette, F.; Marchand, M.; Tremblay, D.; Moineau, S.; Liang, X.; Biron, É.; Corbeil, J. Machine Learning Assisted Design of Highly Active Peptides for Drug Discovery. PLOS Comput. Biol. 2015, 11 (4), e1004074. https://doi.org/10.1371/journal.pcbi.1004074. (16) Cunningham, J. M.; Koytiger, G.; Sorger, P. K.; AlQuraishi, M. Biophysical Prediction of Protein–Peptide Interactions and Signaling Networks Using Machine Learning. Nat. Methods 2020, 17 (2), 175–183. https://doi.org/10.1038/s41592-019-0687-1. (17) Townshend, R.; Bedi, R.; Suriana, P.; Dror, R. End-to-End Learning on 3D Protein Structure for Interface Prediction. 10. (18) Shao, X. M.; Bhattacharya, R.; Huang, J.; Sivakumar, I. K. A.; Tokheim, C.; Zheng, L.; Hirsch, D.; Kaminow, B.; Omdahl, A.; Bonsack, M.; Riemer, A. B.; Velculescu, V. E.; Anagnostou, V.; Pagel, K. A.; Karchin, R. High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets. Cancer Immunol. Res. 2020, 8 (3), 396–408. https://doi.org/10.1158/2326-6066.CIR-19-0464. (19) Chen, B.; Khodadoust, M. S.; Olsson, N.; Wagar, L. E.; Fast, E.; Liu, C. L.; Muftuoglu, Y.; Sworder, B. J.; Diehn, M.; Levy, R.; Davis, M. M.; Elias, J. E.; Altman, R. B.; Alizadeh, A. A. Predicting HLA Class II Antigen Presentation through Integrated Deep Learning. Nat. Biotechnol. 2019, 37 (11), 1332–1343. https://doi.org/10.1038/s41587-019-0280-2. (20) Liu, Z.; Cui, Y.; Xiong, Z.; Nasiri, A.; Zhang, A.; Hu, J. DeepSeqPan, a Novel Deep Convolutional Neural Network Model for Pan-Specific Class I HLA-Peptide Binding Affinity Prediction. Sci. Rep. 2019, 9 (1), 794. https://doi.org/10.1038/s41598-018-37214-1.
(21) Quartararo, A. J.; Gates, Z. P.; Somsen, B. A.; Hartrampf, N.; Ye, X.; Shimada, A.; Kajihara, Y.; Ottmann, C.; Pentelute, B. L. Ultra-Large Chemical Libraries for the Discovery of High-Affinity Peptide Binders. Nat. Commun. 2020, 11 (1), 3183. https://doi.org/10.1038/s41467-020-16920-3. (22) Smith, T. F.; Waterman, M. S. Identification of Common Molecular Subsequences. J. Mol. Biol. 1981, 147 (1), 195–197. https://doi.org/10.1016/0022-2836(81)90087-5. (23) Rogers, D. J.; Tanimoto, T. T. A Computer Program for Classifying Plants. Science 1960, 132 (3434), 1115–1118. https://doi.org/10.1126/science.132.3434.1115. (24) Sundar, V.; Colwell, L. The Effect of Debiasing Protein–Ligand Binding Data on Generalization. J. Chem. Inf. Model. 2020, 60 (1), 56–62. https://doi.org/10.1021/acs.jcim.9b00415.

Appendix: Discovery of peptide-based binders to the SARS-CoV-2 spike protein receptor binding domain and ACE2

A.1 Introduction

Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus SARS-CoV-2, is an urgent global health crisis. Since December 2019, the virus has caused over 15 million infections and 600,000 deaths, according to the WHO situation report of July 23, 2020 (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports). Much like the SARS-CoV outbreak in November 2002, SARS-CoV-2 can cause severe respiratory problems, with coughing, fever, and difficulty breathing presenting as the most common symptoms; however, SARS-CoV-2 has spread far more widely and remains uncontained in most parts of the world. It has been recognized since the onset of the pandemic that the virus can spread rapidly through droplets; however, growing evidence now suggests it can also spread via aerosols1. Elderly patients (65+ years of age) with pre-existing medical conditions remain the most at-risk demographic, with mortality rates exceeding 10% (though estimates vary widely)2. Currently there is no vaccine available for this virus, and existing treatments are not universally effective. Therefore, there remains an urgent need for the development of new anti-SARS-CoV-2 therapies.

Like SARS-CoV, SARS-CoV-2 invades host cells by engaging the angiotensin-converting enzyme 2 (ACE2) receptor on the host cell surface3. This interaction is mediated by the receptor binding domain (RBD) of the SARS-CoV-2 spike protein (S)4. Because a subset of SARS-CoV neutralizing antibodies block SARS spike RBD binding to ACE25, and anti-SARS-CoV-2 antibodies have been found to bind the S protein in an ACE2-competitive fashion6, we hypothesized that low molecular weight, targeted moieties that could disrupt this interaction could prevent virus entry into human cells, and provide a starting point for therapeutic and diagnostic development.

While small molecules may not be well-suited to disrupt the S/ACE2 interaction, as protein-protein interactions (PPIs) generally tend to be intractable to small molecule inhibition7, peptides offer a synthetically accessible alternative, able to contact multiple “hot spots” on a protein surface8. Previously, helical mimetics derived from ACE2 have been shown to bind the SARS-CoV S protein and exhibit neutralizing activity9. Similarly, peptides derived from the heptad repeat (HR) regions of the S2 subunit were shown to inhibit SARS-CoV fusion with the cell membrane10. With respect to SARS-CoV-2, work from our lab has shown that a 23-mer peptide derived from ACE2 could engage SARS-CoV-2 RBD with a KD of approximately 1 µM; however, its binding activity was dependent on the source of RBD used11,12. An 85-mer N-terminal truncate of ACE2 was shown to bind SARS-CoV-2 RBD with nanomolar affinity13; however, lower molecular weight peptides that maintain this affinity have yet to be reported.

We endeavored here to discover novel, low molecular weight peptide binders to the SARS-CoV-2 S RBD, as well as to ACE2, for disruption of viral fusion. For this purpose, we employed magnetic bead-based affinity selection-mass spectrometry (AS-MS)14, sampling peptide libraries comprising diverse amino acid content. We report the discovery of three 13-mer binders to the S RBD comprising canonical amino acids, and a family of 13-mer binders to ACE2 comprising a mixture of canonical and non-canonical amino acids. Further characterization of these binders is currently underway.

A.2 Results

To initiate selections against SARS-CoV-2 S RBD, four 2 × 10^8-member libraries (8 × 10^8 total) of design (X)12K, where X = all natural amino acids excluding cysteine and isoleucine, were incubated with biotinylated HEK-expressed RBD immobilized on streptavidin magnetic beads. Selections against an anti-hemagglutinin monoclonal antibody, clone 12ca5, were performed as a negative control, to identify and exclude non-specific binders. Unbound peptides were washed away, and bound peptides were eluted under denaturing conditions. Eluates were concentrated and submitted for nano-liquid chromatography-tandem mass spectrometry (nLC-MS/MS) analysis. De novo sequencing was performed using PEAKS software, and Python scripts were used to filter sequences based on the library design criteria15.
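
As an illustration of what filtering on library design criteria can look like, the sketch below keeps only 13-mers of design (X)12K whose variable positions exclude cysteine and isoleucine, together with the ALC cutoff of 80 used in this work. It is not the published script; the function names, the record format, and the rejected example records are assumptions made for illustration.

```python
# The 20 canonical residues minus Cys and Ile (18 allowed at variable positions).
ALLOWED = set("ADEFGHKLMNPQRSTVWY")
MIN_ALC = 80  # average local confidence cutoff

def matches_library_design(seq: str) -> bool:
    """True for a 13-mer of design (X)12K built from the allowed residues."""
    return len(seq) == 13 and seq.endswith("K") and set(seq[:12]) <= ALLOWED

def filter_candidates(records):
    """records: iterable of (sequence, ALC) pairs from de novo sequencing."""
    return [(s, a) for s, a in records if a >= MIN_ALC and matches_library_design(s)]

records = [
    ("TVFGLNVWKRYSK", 81),  # RBD.1 (Table A.2.1)
    ("LVMGLNAWNMWYK", 99),  # RBD.2
    ("LVMGLHVYLRQGK", 89),  # RBD.3
    ("LVMGLHVYLRQGK", 60),  # hypothetical low-confidence read
    ("LVMGCHVYLRQGK", 95),  # hypothetical read containing Cys
    ("LVMGLHVYLRQG", 95),   # hypothetical read of the wrong length
]
for seq, alc in filter_candidates(records):
    print(seq, alc)
```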

Selections from these libraries yielded three peptides that appeared selective towards RBD (based on analysis of extracted ion chromatograms) with an average local confidence (ALC) score ≥ 80: TVFGLNVWKRYSK (RBD.1), LVMGLNAWNMWYK (RBD.2), and LVMGLHVYLRQGK (RBD.3) (Table A.2.1; Fig. A.6.1–A.6.3). All sequences contain a conserved XVXGL motif at the N-terminus. Additionally, sequences RBD.2 and RBD.3 share the same five N-terminal residues (LVMGL). Overall, each of the three peptides shares seven of its twelve variable residues with at least one of the other two.
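
The conservation pattern described above can be checked position by position. The sketch below is illustrative only: the helper names are hypothetical, and the majority-rule consensus (with X marking positions where all three peptides differ) is a simplification rather than the procedure used to design the consensus peptide.

```python
from collections import Counter

hits = {
    "RBD.1": "TVFGLNVWKRYSK",
    "RBD.2": "LVMGLNAWNMWYK",
    "RBD.3": "LVMGLHVYLRQGK",
}
VARIABLE = 12  # the C-terminal Lys is fixed by the library design

def shared_with_any_other(name: str) -> int:
    """Count variable positions where a peptide matches at least one other hit."""
    seq = hits[name]
    others = [s for n, s in hits.items() if n != name]
    return sum(any(seq[i] == o[i] for o in others) for i in range(VARIABLE))

def majority_consensus(seqs) -> str:
    """Residue present in at least two sequences at each position, else X."""
    out = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count >= 2 else "X")
    return "".join(out)

# 1-based variable positions that are identical in all three hits.
fully_conserved = [
    i + 1
    for i, col in enumerate(zip(*hits.values()))
    if i < VARIABLE and len(set(col)) == 1
]

print({name: shared_with_any_other(name) for name in hits})
print(fully_conserved)
print(majority_consensus(list(hits.values())))
```

Positions 2, 4, and 5 are identical across the three hits, which corresponds to the XVXGL motif. The consensus peptide RBD.4 listed in Table A.2.1 agrees with this majority-rule consensus at every position that is not X.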

Table A.2.1. AS-MS identifies three selective RBD-binding peptides.

Name    Sequence        ALC   Notes
RBD.1   TVFGLNVWKRYSK   81    From AS-MS
RBD.2   LVMGLNAWNMWYK   99    From AS-MS
RBD.3   LVMGLHVYLRQGK   89    From AS-MS
RBD.4   LVMGLNVWLRYSK   N/A   Consensus
Scr1    GSVKRWLTYVKNF   N/A   Scrambled sequence of RBD.1
Scr2    RFYVTKGWSNKVL   N/A   Scrambled sequence of RBD.1

From affinity selections of an 8 × 10^8-member library, three putative RBD-binding peptides (ALC score ≥ 80) were identified that demonstrated selectivity based on analysis of extracted ion chromatograms. Residues conserved across all three peptides are indicated in red, and residues conserved in two out of three in blue.

To validate that these peptides bound SARS-CoV-2 S RBD, BioLayer Interferometry (BLI) experiments were performed. RBD.1 and RBD.2 were resynthesized, along with a consensus sequence RBD.4 that incorporated residues conserved across two of the three sequences (Table A.2.1; Fig. A.6.4–A.6.6). All peptides were synthesized with a C-terminal Lys(biotin) and β-Ala spacer. RBD.1 and RBD.2 bound SARS-CoV-2 S RBD with dissociation constants of 250 nM and 290 nM, respectively (Fig. A.2.1a). The consensus peptide RBD.4 exhibited a KD of 80 nM, a roughly three-fold improvement over the identified hits. As a test of selectivity, two scrambled variants of RBD.1 (Scr1 and Scr2) were synthesized and tested as well (with a C-terminal Lys(biotin) and β-Ala spacer, as above) (Table A.2.1; Fig. A.6.7–A.6.8). No association was observed for either peptide (Fig. A.2.1b). Moreover, RBD.1 did not exhibit appreciable association to 12ca5, suggesting these peptides were not isolated due to non-specific binding (Fig. A.6.9).

We sought to determine if this family of peptides could compete for the ACE2 binding site on SARS-CoV-2 S RBD. A competition BLI assay was performed, in which biotinylated ACE2 was immobilized on streptavidin sensor tips. Tips were then dipped into a fixed concentration of RBD, pre-incubated with increasing concentrations of RBD.1. In this assay format, no decrease in association was observed; in fact, the association increased slightly with increasing peptide concentration, suggesting RBD.1 binds at a different site on RBD, and may associate with ACE2 as a complex (Fig. A.6.10). In contrast, in a self-competition assay, in which RBD.1 was immobilized in place of ACE2, a clear concentration-dependent decrease in response was observed (Fig. A.2.1c). Together, these results suggest that RBD.1 binds at a defined site on RBD, but a site distinct from the ACE2 binding site. Further work characterizing the selectivity of these peptides, their determinants of binding energy, and the site to which they bind RBD is currently underway.


Figure A.2.1. Identified peptides bind SARS-CoV-2 spike RBD with nanomolar affinity. a) Putative RBD-binding peptides RBD.1, RBD.2, and RBD.4 were each resynthesized with a C-terminal β-Ala spacer and Lys(biotin). All peptides exhibited mid-nanomolar affinity for SARS-CoV-2 spike RBD as measured by BioLayer Interferometry (BLI). b) Peptides comprising scrambled amino acid sequences of RBD.1 exhibited no binding activity towards RBD by BLI. c) RBD.1, pre-incubated with RBD at increasing concentrations, could dose-dependently inhibit binding of immobilized RBD.1 to RBD in a BLI-based competition assay, indicating specific binding at a defined site on RBD as opposed to non-specific binding.

To initiate selections against ACE2, a 2 × 10^8-member library of design (X)12K, where X = a suite of canonical and non-canonical amino acids, was incubated with biotinylated ACE2 immobilized on streptavidin magnetic beads (Fig. A.2.2a). Selections against RBD were performed in parallel, both as a control for selectivity and for discovery of additional RBD-binding peptides. As above, unbound peptides were washed away, and bound peptides were eluted under denaturing conditions. Eluates were concentrated and submitted for nLC-MS/MS analysis.

A total of 54 sequences with ALC ≥ 80 were identified from selections against ACE2, while no selective binders were identified towards RBD (Table A.6.1). Among these 54, 21 contained a cyclopropylalanine at the N-terminus, and 6 contained an N-terminal (cyclopropylalanine)-(thiazolylalanine)-(3,4-difluorophenylalanine) motif (Fig. A.2.2b). Among peptides that did not have an N-terminal cyclopropylalanine, and did not bear a similar N-terminal motif, four were chosen for further investigation by analysis of extracted ion chromatograms. Four out of four peptides examined appeared selective for ACE2 (Fig. A.6.11–A.6.14). Further work validating the binding of these peptides, as well as their ability to disrupt the ACE2/RBD interaction, is the subject of ongoing investigation.


Figure A.2.2. A consensus motif based on non-canonical amino acids is identified from selections against ACE2. a) A library comprising 2 × 10^8 members of design (X)12K, where X = a suite of 4 canonical and 17 non-canonical amino acids, was used in selections for ACE2 binding. b) Six peptides were identified with a conserved N-terminal motif comprising (cyclopropylalanine)-(thiazolylalanine)-(3,4-difluorophenylalanine) (indicated in red). Additional residues common to at least two sequences are indicated in blue. Abbreviations: 4Af = 4-aminophenylalanine; 4Py = 3-(4'-pyridyl)-L-alanine; β-Ala = β-alanine; β-Ser = β-homoserine; Aad = aminoadipic acid; Amb = 4-(aminomethyl)benzoic acid; Cha = cyclohexylalanine; Cpa = cyclopropylalanine; Dff = 3,4-difluorophenylalanine; Dmf = 3,4-dimethoxyphenylalanine; hAr = homoarginine; Hyp = hydroxyproline; Msn = methionine sulfone; Orn = ornithine; Php = 4-phenylpiperidine-4-carboxylic acid; Tha = thiazolylalanine.


A.3 Discussion

Here we demonstrate the use of AS-MS for the discovery of peptides that bind one of two proteins related to SARS-CoV-2 infection: the SARS-CoV-2 spike RBD and its host receptor, ACE2. From selections of 8 × 10^8 peptides, a family of three selective RBD-binding peptides was identified, each containing a conserved N-terminal motif. From an additional 2 × 10^8-member library, comprising a high degree of non-canonical amino acid content, 54 putative ACE2-binding peptides were identified, a subset of which likewise contained a conserved N-terminal motif.

Diversity is a key determinant of selection outcome, as demonstrated here for the discovery of RBD-binding peptides, and previously for discovery of MDM2-binding peptides as well as in the field of antibody engineering16. From four 2 × 10^8-member libraries, one library yielded one hit, another library two hits, and two libraries no hits. These results underscore the importance of reaching a library diversity of ~10^9 to achieve consistently successful selections against unknown targets. As previously demonstrated, smaller, focused libraries can achieve success when prior knowledge can be applied to the design17.
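
The link between library size and the chance of a successful selection can be illustrated with a back-of-the-envelope estimate. The sketch below assumes that hits are rare, independent events (a Poisson model) and takes the hit rate from the three hits found among 8 × 10^8 members; both are assumptions made for illustration, and a rate based on three events is highly uncertain.

```python
import math

# Observed in this appendix: 3 hits across four 2e8-member libraries.
hits_observed = 3
members_screened = 4 * 2e8
rate_per_member = hits_observed / members_screened

def p_at_least_one_hit(library_size: float) -> float:
    """Poisson probability of finding one or more hits in a library."""
    return 1.0 - math.exp(-rate_per_member * library_size)

for size in (2e8, 8e8, 1e9):
    print(f"{size:.0e} members: P(at least one hit) = {p_at_least_one_hit(size):.2f}")
```

Under these assumptions a single 2 × 10^8-member library succeeds only about half the time, in line with two of the four libraries yielding hits, whereas a 10^9-member library would be expected to succeed in the large majority of selections.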

The RBD-binding peptides tested here exhibited mid-nanomolar affinity for SARS-CoV-2 spike RBD, but RBD.1, the peptide examined in competition experiments, did not disrupt the interaction between RBD and ACE2. This result suggests that the peptides bind RBD at a site distinct from the ACE2-binding site; indeed, there was no selection pressure applied to isolate binders only to the ACE2-binding site, in part because of the inherent stringency of the assay. However, we anticipate that these reagents could be used for targeted delivery of viricidal payloads, or for diagnostic applications. Alternatively, these peptides could potentially be conjugated to weaker affinity binders to the ACE2-binding site, to improve their potency.

The ACE2-binding peptides identified in this study remain uncharacterized, and both their affinity for ACE2 and their ability to disrupt the RBD/ACE2 interaction need to be assessed. Additionally, given the role ACE2 plays in the renin-angiotensin-aldosterone system18, it will need to be validated that these peptides do not inhibit native ACE2 function. Given the diversity in the sequences identified, however, we speculate that these peptides potentially bind multiple distinct sites on ACE2. Characterizing these putative binders with respect to all of these considerations is the subject of ongoing work.

A.4 Experimental

A.4.1. Materials

H-Rink Amide-ChemMatrix resin was purchased from PCAS BioMatrix Inc. (St-Jean-sur-Richelieu, Quebec, Canada). 20 μm TentaGel S NH2 microspheres (TMN-9909-PI; 0.2 to 0.3 mmol/g amine loading) were purchased from Peptides International (Louisville, KY). Fmoc-Ala-OH, Fmoc-Arg(Pbf)-OH, Fmoc-Asn(Trt)-OH, Fmoc-Asp(tBu)-OH, Fmoc-Gln(Trt)-OH, Fmoc-Glu(tBu)-OH, Fmoc-Gly-OH, Fmoc-His(Trt)-OH, Fmoc-Ile-OH, Fmoc-Leu-OH, Fmoc-Lys(Boc)-OH, Fmoc-Met-OH, Fmoc-Phe-OH, Fmoc-Pro-OH, Fmoc-Ser(tBu)-OH, Fmoc-Thr(tBu)-OH, Fmoc-Trp(Boc)-OH, Fmoc-Tyr(tBu)-OH, and Fmoc-Val-OH were purchased from Advanced ChemTech (Louisville, KY). Fmoc-D-Pro-OH was also purchased from Advanced ChemTech (Louisville, KY). 1-[Bis(dimethylamino)methylene]-1H-1,2,3-triazolo[4,5-b]pyridinium-3-oxid-hexafluorophosphate (HATU) was purchased from P3 BioSystems (Louisville, KY). 4-[(R,S)-α-[1-(9H-Fluoren-9-yl)-methoxyformamido]-2,4-dimethoxybenzyl]-phenoxyacetic acid (Fmoc-Rink amide linker), Fmoc-L-His(Boc)-OH, Fmoc-β-Ala-OH, Fmoc-(4-aminomethyl) benzoic acid, Fmoc-β-cyclopropyl-L-alanine-OH, Fmoc-β-cyclohexyl-L-alanine, Fmoc-L-hydroxyproline(tBu)-OH, Fmoc-β-homoser(tBu)-OH, Fmoc-L-methionine sulfone, Fmoc-L-α-aminoadipic acid δ-tert-butyl ester, Fmoc-L-ornithine(Boc)-OH, Fmoc-Nω-(Pbf)-L-homoarginine, Fmoc-3-(4-thiazolyl)-L-alanine, Fmoc-3-(4'-pyridyl)-L-alanine, Fmoc-4-(Boc-amino)-L-phenylalanine, Fmoc-3,4-difluoro-L-phenylalanine, Fmoc-3,4-dimethoxy-L-phenylalanine, Fmoc-4-phenylpiperidine-4-carboxylic acid, and Nα-Fmoc-Nε-biotinyl-L-lysine were purchased from Chem-Impex International (Wood Dale, IL). Peptide synthesis-grade N,N-dimethylformamide (DMF), dichloromethane (DCM), diethyl ether, HPLC-grade acetonitrile (MeCN), and HPLC-grade methanol (MeOH) were purchased from VWR International (Philadelphia, PA). Trifluoroacetic acid (TFA; for HPLC, ≥99%), piperidine (ReagentPlus; 99%), triisopropylsilane (98%), 1,2-ethanedithiol (≥98%), and phenylsilane (97%) were purchased from MilliporeSigma (St. Louis, MO). Diisopropylethylamine (99.5%; biotech. grade; DIEA) was also purchased from MilliporeSigma, and purified by passage through an activated alumina column (Pure Process Technology solvent purification system; Nashua, NH). Water was deionized using a Milli-Q Reference water purification system (Millipore).

Mouse anti-hemagglutinin (HA) monoclonal antibody clone 12ca5 (anti-HA mAb 12ca5) was purchased from Columbia Biosciences (Frederick, MD). HEK-derived SARS-CoV-2-Spike-RBD was purchased from Sino Biological (40592-V08H) and biotinylated in house. Biotinylated human ACE2 was purchased from ACROBiosystems. HyClone™ Fetal Bovine Serum (SH30071.03HI, heat inactivated) was purchased from GE Healthcare Life Sciences (Logan, UT). Bovine serum albumin (BSA; RIA grade) and Tween 20 (reagent grade) were purchased from Amresco (Solon, OH). Dynabeads MyOne Streptavidin T1 magnetic microparticles were purchased from Invitrogen (Carlsbad, CA).

A.4.2. Synthesis and characterization of a 10^9 (X)12K-CONH2 library

Details of the synthesis and characterization of this library are given in 3.4.2 and 3.4.3.

A.4.3. Biotinylation of SARS-CoV-2 spike RBD

EZ-Link-Sulfo-NHS-LC-LC-biotin (10 mM in H2O, 15 µL, 150 nmol) was added to SARS-CoV-2-Spike-RBD (16 µM in PBS, pH = 7.5, 2 mL, 33 nmol) at 0 °C, and the mixture was then placed on a nutating mixer for 2 h at ambient temperature. The reaction was quenched by addition of 1 M Tris (50 µL). The excess biotin was removed from the mixture by size exclusion centrifugation through 10 kDa Amicon tubes (3 x 5 mL PBS). The concentration of biotinylated SARS-CoV-2-Spike-RBD was measured by absorption at 280 nm.
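
The stoichiometry of the labeling reaction follows from the stated concentrations and volumes. The short sketch below is a worked check only; the helper name is hypothetical. The protein amount computed from the rounded 16 µM stock concentration is 32 nmol, close to the 33 nmol stated above, which presumably reflects an unrounded stock concentration.

```python
def nmol(conc_molar: float, volume_litre: float) -> float:
    """Amount of substance in nanomoles from molar concentration and volume."""
    return conc_molar * volume_litre * 1e9

biotin_nmol = nmol(10e-3, 15e-6)  # 10 mM reagent, 15 µL
rbd_nmol = nmol(16e-6, 2e-3)      # 16 µM protein, 2 mL
molar_excess = biotin_nmol / rbd_nmol

print(f"biotin reagent: {biotin_nmol:.0f} nmol")
print(f"RBD:            {rbd_nmol:.0f} nmol")
print(f"molar excess:   {molar_excess:.1f} equivalents")
```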

A.4.4. Affinity selections for SARS-CoV-2 spike RBD binding

Procedure for each selection (four in total), conducted side by side against 12ca5 as a control, is outlined below:

Preparation of RBD-functionalized and 12ca5-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (300 μL of 10 mg/mL stock for RBD, 200 μL of 10 mg/mL stock for 12ca5) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS, and then treated with 44 μL of biotinylated SARS-CoV-2 spike RBD (18 μM; diluted to 300 μL with 10% FBS, 0.02% Tween 20, 1x PBS) or 347 μL of biotinylated 12ca5 (1.5 μM; 0.45 nmol). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C. After this time, the beads were returned to the separation rack, the supernatant was removed, and the beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS. Beads were resuspended in 300 μL or 200 μL of 10% FBS, 0.02% Tween 20, 1x PBS for RBD and 12ca5 samples, respectively.

Affinity capture:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.
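The final selection conditions follow directly from the quantities given for bead preparation and affinity capture. The sketch below reproduces that arithmetic; the function names are hypothetical, and the per-milligram protein figure assumes quantitative capture of the biotinylated RBD, so it should be read as an upper bound rather than a measured loading.

```python
BEAD_STOCK_MG_PER_ML = 10.0  # Dynabead stock concentration given in the text

def bead_mass_mg(volume_uL: float) -> float:
    """Mass of beads (mg) contained in a given volume of bead stock."""
    return BEAD_STOCK_MG_PER_ML * volume_uL / 1000.0

def amount_pmol(conc_uM: float, volume_uL: float) -> float:
    """Amount in pmol; µM x µL = pmol."""
    return conc_uM * volume_uL

# Bead preparation for the RBD sample
rbd_beads_mg = bead_mass_mg(300)                 # 300 µL of bead stock
rbd_pmol = amount_pmol(18, 44)                   # 44 µL of 18 µM RBD
max_loading_pmol_per_mg = rbd_pmol / rbd_beads_mg  # assumes complete capture

# Affinity capture in a 1 mL final volume
final_volume_mL = 1.0
capture_beads_mg = bead_mass_mg(100)             # one 100 µL portion
bead_conc_mg_per_mL = capture_beads_mg / final_volume_mL
library_pM_per_member = 10.0 / final_volume_mL   # fmol per mL equals pM

print(f"beads per capture:   {capture_beads_mg:.1f} mg ({bead_conc_mg_per_mL:.1f} mg/mL)")
print(f"library:             {library_pM_per_member:.0f} pM/member")
print(f"RBD offered to beads: {max_loading_pmol_per_mg:.0f} pmol/mg (upper bound)")
```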

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

A.4.5. Solid phase synthesis of putative RBD-binding peptides and controls

Biotinylated variants of RBD.1, RBD.2, RBD.4, Scr1, and Scr2 were each synthesized via fully automated fast-flow peptide synthesis as previously described.19,20 At the C-terminus of each sequence, a β-Ala residue and a C-terminal Lys(biotin) were incorporated to facilitate subsequent binding studies.

A.4.6. BioLayer Interferometry of putative RBD-binding peptides and controls

Lyophilized peptides were dissolved to 2 mg/mL in 1x PBS and diluted 500-fold into 0.1% BSA, 0.02% Tween-20, 1x PBS (‘kinetic buffer’) for immobilization onto streptavidin Octet biosensors (ForteBio; Menlo Park, CA). Biolayer interferometry (BLI) assays were performed in 96 well plates (GreinerBio-One; Kremsmünster, Austria; polypropylene, flat-bottom, chimney well) using an Octet Red96 System (ForteBio; Menlo Park, CA). Wells were filled with 200 µL of kinetic buffer, peptide solution, or target protein solution (prepared in kinetic buffer at variable concentration). Biotinylated peptide was immobilized onto the streptavidin tip for 120 s. Sensors were then dipped into kinetic buffer for 60 s, into protein solutions for 300 s, and finally into kinetic buffer for 300 s. Measurements were carried out at 30 °C.
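The dip sequence described above can be written down as a small schedule, which makes the total run time per sensor and the peptide loading concentration explicit. This is a sketch only: the step labels (loading, baseline, association, dissociation) are conventional BLI terms supplied here for illustration and are not taken from the text, and the `Step` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str        # illustrative label, not from the text
    solution: str    # what the sensor is dipped into
    duration_s: int  # duration stated in the text

schedule = [
    Step("loading", "biotinylated peptide in kinetic buffer", 120),
    Step("baseline", "kinetic buffer", 60),
    Step("association", "target protein in kinetic buffer", 300),
    Step("dissociation", "kinetic buffer", 300),
]

total_s = sum(step.duration_s for step in schedule)

# 2 mg/mL stock diluted 500-fold, expressed in µg/mL
peptide_loading_ug_per_mL = 2.0 * 1000 / 500

for step in schedule:
    print(f"{step.name:<13}{step.duration_s:>4} s  {step.solution}")
print(f"total: {total_s} s per sensor; peptide at {peptide_loading_ug_per_mL:.0f} µg/mL")
```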

A.4.7. BioLayer Interferometry competition assay of RBD-binding peptides

A BLI competition binding assay was set up as previously described.17 First, a calibration curve was constructed by dipping immobilized biotinylated ACE2 into serially diluted HEK-derived SARS-CoV-2 spike RBD (Sino Biological). In brief, streptavidin sensors were soaked in kinetic buffer (PBS supplemented with 0.02% Tween-20 and 0.1% BSA) for 10 min at 30 °C, and loaded with biotinylated ACE2 for 4 min. Then, serial dilutions of SARS-CoV-2-RBD in kinetic buffer were analyzed for binding, typically at 30 °C and 1,000 rpm. Second, with the binding information from the calibration curve, a concentration of SARS-CoV-2 spike RBD of 500 nM was chosen to premix with various concentrations of RBD.1 to study the competition effects. Briefly, variable concentrations of RBD.1 were incubated with 500 nM SARS-CoV-2 spike RBD at room temperature for 15 min. Meanwhile, streptavidin sensors were soaked in kinetic buffer for 10 min at 30 °C. ACE2 was immobilized on the streptavidin sensor surface, and the association and dissociation curves of SARS-CoV-2 spike RBD in the preincubated samples were then analyzed at 30 °C and 1,000 rpm.

A.4.8. Synthesis of a non-canonical (X)12K-CONH2 library

SPPS:

4.17 g of 20 μm TentaGel resin (0.26 mmol/g, 1.1 mmol, 1.0 x 10^9 beads) was transferred to a 100 mL peptide synthesis vessel, swollen in DMF, and then washed with DMF (3x). Fmoc-Rink amide linker (2.9 g, 5.4 mmol, 5 eq) was dissolved in HATU solution (0.38 M in DMF, 12.86 mL, 4.89 mmol), activated with DIEA (2.71 mL, 15.1 mmol) immediately prior to coupling, and added to resin bed. Coupling was performed for 20 min; after this time, resin was washed with DMF (100 mL). Fmoc removal was carried out by treatment of resin with 20% piperidine in DMF (1 x 50 mL flow wash; 2 x 50 mL, 5 min batch treatments). Resin was then washed with DMF (150 mL). Coupling of Fmoc-Lys(Boc)-OH, subsequent Fmoc removal, and DMF washes were performed in the same manner.

At this stage, resin was suspended in DMF (50 mL), and divided evenly among 21 x 10 mL fritted plastic syringes using a 5 mL Eppendorf pipette. Couplings were performed as follows: Fmoc-protected amino acids (0.36 mmol) in HATU solution (0.38 M, 857 μL, 0.33 mmol) were activated with DIEA (181 μL, 1.0 mmol). Each of the following amino acid derivatives was added to a single portion of resin (theory: 199 mg resin, 52 μmol): Fmoc-Gly-OH, Fmoc-β-Ala-OH, Fmoc-(4-aminomethyl)benzoic acid, Fmoc-L-Ile-OH, Fmoc-β-cyclopropyl-L-alanine-OH, Fmoc-β-cyclohexyl-L-alanine, Fmoc-D-Pro-OH, Fmoc-L-hydroxyproline(tBu)-OH, Fmoc-Gln(Trt)-OH, Fmoc-β-homoser(tBu)-OH, Fmoc-L-methionine sulfone, Fmoc-L-Asp(OtBu)-OH, Fmoc-L-α-aminoadipic acid δ-tert-butyl ester, Fmoc-L-ornithine(Boc)-OH, Fmoc-Nω-(Pbf)-L-homoarginine, Fmoc-3-(4-thiazolyl)-L-alanine, Fmoc-3-(4'-pyridyl)-L-alanine, Fmoc-4-(Boc-amino)-L-phenylalanine, Fmoc-3,4-difluoro-L-phenylalanine, Fmoc-3,4-dimethoxy-L-phenylalanine, and Fmoc-4-phenylpiperidine-4-carboxylic acid.
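The per-portion quantities quoted for the split step (199 mg resin, 52 μmol) and the implied reagent excess can be recomputed from the batch-level numbers. The following is a minimal sketch of that arithmetic; variable names are hypothetical, and the equivalents are relative to the theoretical resin loading rather than any measured value.

```python
RESIN_G = 4.17             # TentaGel resin mass
LOADING_MMOL_PER_G = 0.26  # stated resin loading
PORTIONS = 21              # one portion per amino acid monomer

total_mmol = RESIN_G * LOADING_MMOL_PER_G
resin_mg_per_portion = RESIN_G * 1000 / PORTIONS
umol_per_portion = total_mmol * 1000 / PORTIONS

# Reagents delivered to each portion, as stated in the text
aa_mmol = 0.36
hatu_mmol = 0.38 * 857 / 1000   # 0.38 M x 857 µL
diea_mmol = 1.0

site_mmol = umol_per_portion / 1000
aa_equiv = aa_mmol / site_mmol
hatu_equiv = hatu_mmol / site_mmol
diea_equiv = diea_mmol / site_mmol

print(f"per portion: {resin_mg_per_portion:.0f} mg resin, {umol_per_portion:.0f} µmol sites")
print(f"amino acid {aa_equiv:.1f} eq, HATU {hatu_equiv:.1f} eq, DIEA {diea_equiv:.1f} eq")
```

With these inputs the amino acid is used in roughly 7-fold excess, and HATU is kept slightly substoichiometric relative to the amino acid.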

Following nine cycles of split-and-pool synthesis, resin was split into two. One aliquot was split out three ways, yielding three 1.7 x 10^8-member (X)9K-CONH2 libraries. The other aliquot was elaborated with three more cycles of split-and-pool, and then likewise split out three ways to give three 1.7 x 10^8-member (X)12K-CONH2 libraries. Following removal of N-terminal Fmoc group, resin was washed with DMF (150 mL) and transferred to fritted plastic syringes (20 mL). Resin was then washed with DCM (3x) and dried under reduced pressure. 1.0 mg of dried resin (theory: 1.4 x 10^5 beads) was weighed into a plastic tube and set aside for later characterization (A.4.9).
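The library sizes quoted above follow from the bead count if each bead is taken to carry one sequence (a one-bead-one-compound assumption that is implied but not stated in this section). The sketch below reproduces the 1.7 x 10^8 figure and, for context, compares it with the number of sequences that 21 monomers could in principle generate; the names used are hypothetical.

```python
TOTAL_BEADS = 1.0e9   # beads in the 4.17 g resin batch (from the text)
N_MONOMERS = 21       # amino acids used at each randomized position

def split(beads: float, ways: int) -> float:
    """Number of beads in each aliquot after an even split."""
    return beads / ways

half_batch = split(TOTAL_BEADS, 2)          # (X)9K arm vs. (X)12K arm
beads_per_library = split(half_batch, 3)    # three libraries per arm

print(f"members per library: {beads_per_library:.2e}")

for length in (9, 12):
    sequence_space = N_MONOMERS ** length
    fraction = beads_per_library / sequence_space
    print(f"(X){length}K: {sequence_space:.2e} possible sequences, "
          f"fraction sampled <= {fraction:.1e}")
```

In both designs the bead count, not the combinatorial sequence space, limits the number of members.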

Cleavage from resin and solid phase extraction:

Libraries were globally deprotected and cleaved from resin as described in 2.4.10. Crude, lyophilized powders were resuspended in 95/5 water/acetonitrile (0.1% TFA), and purified over Supelclean™ LC-18 SPE cartridges (2 g bed mass, 45 μm particle size, 12 mL; Millipore Sigma, P/N 57117). Procedure is described in 2.4.10.

Preparation of stock solution:

Lyophilized powder of each library (6 in total; theoretical members = 1.7 x 10^8 each) was dissolved first in DMF and diluted with 1x PBS to a final library concentration of 6.7 mM (~40 pM/member) and final DMF concentration of 10% (v/v). Stock solutions were aliquoted and stored at -80 °C. Aliquots were thawed on ice prior to use.
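The per-member concentration of the stock follows from dividing the total peptide concentration by the theoretical number of members. The sketch below checks the ~40 pM/member figure and, as an illustration only, estimates the stock volume that would deliver 10 fmol/member; the text does not state the volume actually used, so that second number is a derived assumption.

```python
TOTAL_CONC_M = 6.7e-3   # total library concentration of the stock (6.7 mM)
MEMBERS = 1.7e8         # theoretical members per library

per_member_M = TOTAL_CONC_M / MEMBERS
per_member_pM = per_member_M * 1e12

def stock_volume_uL(amount_fmol_per_member: float) -> float:
    """Stock volume (µL) that contains the requested amount of each member."""
    moles = amount_fmol_per_member * 1e-15
    return moles / per_member_M * 1e6   # L -> µL

volume_for_10_fmol = stock_volume_uL(10.0)

print(f"per-member concentration: {per_member_pM:.1f} pM")
print(f"stock volume for 10 fmol/member: {volume_for_10_fmol:.0f} µL")
```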

A.4.9. Characterization of a non-canonical (X)12K-CONH2 library

Sample preparation:

A 1.0 mg aliquot of library resin (from A.4.8) was suspended in 1.0 mL of Milli-Q water and sonicated to achieve a homogeneous suspension (theory: 1.4 x 10^5 beads/mL). A 4 μL aliquot (theory: 571 beads; 1 pmol/peptide) was transferred to a plastic tube, spun down, and the supernatant was removed. Beads were then subjected to treatment with 94% (v/v) TFA, 2.5% (v/v) ethanedithiol, 2.5% (v/v) water, and 1.0% (v/v) triisopropylsilane, for 10 min in a 60 °C water bath. TFA was then evaporated under a stream of nitrogen, and cleaved peptide was resuspended in Milli-Q water (0.1% TFA). Sample was purified over a C18 ZipTip® (0.6 μL, MilliporeSigma, P/N ZTC18S096), eluted in 30/70 water/acetonitrile (0.1% TFA), and lyophilized. Powder was resuspended in 50 μL of Milli-Q water (0.1% FA), and 1 μL (~20 fmol/peptide) was submitted for nLC-MS/MS analysis.
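The quantities in the sample preparation can be traced back to the resin parameters given in A.4.8. The sketch below estimates the peptide carried by a single bead and the amount injected, assuming one sequence per bead, quantitative cleavage, and no losses during ZipTip purification (none of which is stated in the text). With the rounded bead density of 1.4 x 10^5 beads/mL the aliquot works out to 560 beads; the quoted 571 presumably comes from an unrounded density.

```python
RESIN_G = 4.17
TOTAL_BEADS = 1.0e9
LOADING_MMOL_PER_G = 0.26

# g per bead x mmol per g = mmol per bead; 1 mmol = 1e9 pmol
pmol_per_bead = RESIN_G / TOTAL_BEADS * LOADING_MMOL_PER_G * 1e9

BEADS_PER_ML = 1.4e5    # theoretical density of the 1 mg/mL suspension
ALIQUOT_UL = 4.0
beads_in_aliquot = BEADS_PER_ML * ALIQUOT_UL / 1000

# Nominal 1 pmol/peptide, resuspended in 50 µL, 1 µL injected
nominal_pmol_per_peptide = 1.0
injected_fmol = nominal_pmol_per_peptide * 1000 * (1.0 / 50.0)

print(f"peptide per bead:  {pmol_per_bead:.2f} pmol")
print(f"beads in aliquot:  {beads_in_aliquot:.0f}")
print(f"injected per peptide: {injected_fmol:.0f} fmol")
```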

NanoLC-MS/MS analysis:

NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

De novo peptide sequencing:

De novo sequencing was performed as described in 2.4.11. Non-canonical amino acids with masses that differ from natural amino acids were sequenced as either fixed or variable modifications as follows: 4-aminophenylalanine was sequenced as a variable modification on phenylalanine (+15.01); 3-(4'-pyridyl)-L-alanine as a variable modification on phenylalanine (+1.00); aminoadipic acid as a fixed modification on glutamic acid (+14.02); 4-(aminomethyl)benzoic acid as a variable modification on glycine (+76.03); cyclohexylalanine as a variable modification on phenylalanine (+6.05); cyclopropylalanine as a fixed modification on valine (+12.00); 3,4-difluorophenylalanine as a variable modification on phenylalanine (+35.98); 3,4-dimethoxyphenylalanine as a variable modification on phenylalanine (+60.02); homoarginine as a fixed modification on arginine (+14.02); hydroxyproline as a variable modification on proline (+15.99); methionine sulfone as a fixed modification on methionine (+31.99); ornithine as a fixed modification on cysteine (+11.07); 4-phenylpiperidine-4-carboxylic acid as a variable modification on proline (+90.05); and thiazolylalanine as a fixed modification on histidine (+16.96).
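The modification scheme above amounts to a lookup table that maps each non-canonical residue onto a canonical residue plus a mass offset. The sketch below encodes that table and derives approximate residue masses; the dictionary layout and function name are hypothetical, the offsets are those quoted in the text, and the canonical monoisotopic residue masses are standard values supplied here rather than taken from the thesis. It is not a description of how the sequencing software itself is configured.

```python
# Standard monoisotopic residue masses (Da) of the canonical carriers
BASE_RESIDUE = {
    "F": 147.06841, "E": 129.04259, "G": 57.02146,
    "V": 99.06841, "R": 156.10111, "P": 97.05276,
    "M": 131.04049, "C": 103.00919, "H": 137.05891,
}

# name -> (carrier residue, mass offset in Da, modification type), as in the text
MODS = {
    "4-aminophenylalanine": ("F", 15.01, "variable"),
    "3-(4'-pyridyl)-L-alanine": ("F", 1.00, "variable"),
    "aminoadipic acid": ("E", 14.02, "fixed"),
    "4-(aminomethyl)benzoic acid": ("G", 76.03, "variable"),
    "cyclohexylalanine": ("F", 6.05, "variable"),
    "cyclopropylalanine": ("V", 12.00, "fixed"),
    "3,4-difluorophenylalanine": ("F", 35.98, "variable"),
    "3,4-dimethoxyphenylalanine": ("F", 60.02, "variable"),
    "homoarginine": ("R", 14.02, "fixed"),
    "hydroxyproline": ("P", 15.99, "variable"),
    "methionine sulfone": ("M", 31.99, "fixed"),
    "ornithine": ("C", 11.07, "fixed"),
    "4-phenylpiperidine-4-carboxylic acid": ("P", 90.05, "variable"),
    "thiazolylalanine": ("H", 16.96, "fixed"),
}

def residue_mass(name: str) -> float:
    """Approximate residue mass: carrier residue mass plus the quoted offset."""
    carrier, delta, _kind = MODS[name]
    return BASE_RESIDUE[carrier] + delta

fixed = sorted(n for n, (_, _, kind) in MODS.items() if kind == "fixed")
for name in MODS:
    carrier, delta, kind = MODS[name]
    print(f"{name:<38}{carrier} {delta:+6.2f}  {kind:<9}{residue_mass(name):9.3f}")
```

Fixed modifications replace the carrier residue entirely (for example, every "valine" is read as cyclopropylalanine), whereas variable modifications leave several residues sharing one carrier, which is why five phenylalanine-based analogs can coexist in the same search.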

A.4.10. Affinity selections for ACE2 binding

Preparation of ACE2-functionalized and RBD-functionalized magnetic beads:

MyOne Streptavidin T1 Dynabeads (2 x 300 μL of 10 mg/mL stock) were transferred to 1.7 mL plastic centrifuge tubes, and placed in a magnetic separation rack. Beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS, and then treated with 125 μL of biotinylated ACE2 (3.6 μM; diluted to 500 μL with 10% FBS, 0.02% Tween 20, 1x PBS) or 44 μL of biotinylated SARS-CoV-2 spike RBD (18 μM; diluted to 500 μL with 10% FBS, 0.02% Tween 20, 1x PBS). The resulting suspensions were transferred to a rotating vertical mixer and allowed to incubate for 1 h at 4 °C. After this time, the beads were returned to the separating rack, the supernatant was removed, and the beads were washed 3 x 1 mL with 10% FBS, 0.02% Tween 20, 1x PBS. Beads were resuspended in 300 μL of 10% FBS, 0.02% Tween 20, 1x PBS.

Affinity capture:

Library (10 fmol/member) was incubated with 100 μL (1 mg) portions of protein-immobilized magnetic beads (prepared above) in the presence of 10% FBS, 1x PBS (final volume: 1 mL) on a rotating mixer for 1 h at 4 °C. Final conditions: 1 mg/mL magnetic beads, 10 pM/member library.

Elution and nanoLC-MS/MS:

Bound peptides were eluted as described in 2.4.12. NanoLC-MS/MS analysis was performed as described in 2.4.11 and 2.4.12.

De novo peptide sequencing:

De novo sequencing was performed as described in 2.4.11 and A.4.9.

A.5 Acknowledgements

Library synthesis was performed by A.J.Q. Biotinylation of SARS-CoV-2 spike RBD was performed by Sebastian Pomplun. Selections for RBD binding were performed by A.J.Q., S.P., Xiyun Ye, and Yenchun Lee. Data analysis was performed by A.J.Q. and S.P. Selections for ACE2 binding were performed by A.J.Q. and Joseph Brown. BioLayer Interferometry was performed by S.P.

A.6 Appendix

RBD.1: TVFGLNVWKRYSK ALC = 81% m/z = 399.9814 z = +4 RT = 60.31 Mass = 1595.894

Figure A.6.1. Analysis of selection hit RBD.1. a) Electron-transfer/higher-energy collision dissociation (EThcD) MS/MS spectrum of peptide RBD.1, enriched from affinity selection. b) Extracted ion chromatograms (EICs) for precursor ion of RBD.1 (m/z = 399.98–399.99) for SARS-CoV-2 spike RBD samples (blue) and 12ca5 samples (orange).

RBD.2: LVOGLNAWNOWYK ALC = 99% m/z = 828.9074 z = +2 RT = 88.21 Mass = 1655.795

Figure A.6.2. Analysis of selection hit RBD.2. a) Higher-energy collisional dissociation (HCD) MS/MS spectrum of peptide RBD.2, enriched by affinity selection. b) EICs for precursor ion of RBD.2 (m/z = 828.90–828.91) for SARS-CoV-2 spike RBD samples (blue) and 12ca5 samples (orange). O = oxidized methionine (+15.99 Da).

RBD.3: LVOGLHVYLRQGK ALC = 89% m/z = 382.9759 z = +4 RT = 42.54 Mass = 1527.871

Figure A.6.3. Analysis of selection hit RBD.3. a) HCD MS/MS spectrum of peptide RBD.3, enriched from affinity selection. b) EICs for precursor ion of RBD.3 (m/z = 382.97–382.98) for SARS-CoV-2 spike RBD samples (blue) and 12ca5 samples (orange). O = oxidized methionine (+15.99 Da).

RBD.1-Biotin: TVFGLNVWKRYSK-βA-K(Biotin)-CONH2

Figure A.6.4. Structure and LC-MS characterization of RBD.1-biotin. Calculated [M+H]+: 2022.11; found: 2022.11.

RBD.2-Biotin: LVMGLNAWNMWYK-βA-K(Biotin)-CONH2

Figure A.6.5. Structure and LC-MS characterization of RBD.2-biotin. Calculated [M+H]+: 2050.02; found: 2050.03.

RBD.4-Biotin: LVMGLNVWLRYSK-βA-K(Biotin)-CONH2

Figure A.6.6. Structure and LC-MS characterization of RBD.4-biotin. Calculated [M+3H]3+: 668.37; found: 668.37.

Scr1: GSVKRWLTYVKNF-βA-K(Biotin)-CONH2

Figure A.6.7. LC-MS characterization of Scr1, a sequence-scrambled variant of RBD.1. Calculated [M+2H]2+: 1011.56; found: 1011.56.

Scr2: RFYVTKGWSNKVL-βA-K(Biotin)-CONH2

Figure A.6.8. LC-MS characterization of Scr2, a sequence-scrambled variant of RBD.1. Calculated [M+2H]2+: 1011.56; found: 1011.56.
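Because Scr1 and Scr2 are scrambles of RBD.1, they share its elemental composition, so their calculated [M+2H]2+ values should agree with the [M+H]+ value reported for RBD.1-biotin. The sketch below performs that consistency check by converting between charge states; the function names are hypothetical and the proton mass is a standard constant supplied here.

```python
PROTON_MASS = 1.007276  # Da

def neutral_mass(mz: float, z: int) -> float:
    """Neutral monoisotopic mass from an m/z value of an [M+zH]z+ ion."""
    return z * (mz - PROTON_MASS)

def mz_at_charge(mass: float, z: int) -> float:
    """m/z of the [M+zH]z+ ion for a given neutral mass."""
    return mass / z + PROTON_MASS

rbd1_biotin_mass = neutral_mass(2022.11, 1)        # from calculated [M+H]+
scrambled_mz = mz_at_charge(rbd1_biotin_mass, 2)   # expected [M+2H]2+

print(f"neutral mass of RBD.1-biotin: {rbd1_biotin_mass:.2f} Da")
print(f"expected [M+2H]2+ for its scrambles: {scrambled_mz:.2f}")
```

The result rounds to 1011.56, matching the values listed for Scr1 and Scr2.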


Figure A.6.9. Identified RBD-binding peptide shows no binding activity towards control protein 12ca5. RBD-binding peptide RBD.1-biotin (blue trace) does not bind anti-hemagglutinin antibody 12ca5, as measured by BioLayer Interferometry (BLI). The binding of a previously identified 12ca5-binding peptide, used as a positive control here, is shown in orange.

Figure A.6.10. RBD.1 does not compete with ACE2 for RBD binding. SARS-CoV-2 spike RBD (500 nM), pre-incubated with RBD.1 (variable concentration), retains affinity for ACE2 as measured by BLI. The increase in signal observed upon increasing RBD.1 concentration suggests that RBD.1 may bind RBD at a distinct site from the ACE2 binding site, and is bound to RBD upon RBD/ACE2 engagement.


Figure A.6.11. Analysis of putative ACE2-binding peptide Asp-DPro-Msn-4Af-Amb-Amb-Amb-Dff-Ile-Gly-Gln-Php-Lys. EICs of putative ACE2-binding peptide Asp-DPro-Msn-4Af-Amb-Amb-Amb-Dff-Ile-Gly-Gln-Php-Lys (m/z 584.26–584.27) in ACE2 (blue) and RBD (orange) samples. Abbreviations: 4Af = 4-aminophenylalanine; 4Py = 3-(4'-pyridyl)-L-alanine; β-Ala = β-alanine; β-Ser = β-homoserine; Aad = aminoadipic acid; Amb = 4-(aminomethyl)benzoic acid; Cha = cyclohexylalanine; Cpa = cyclopropylalanine; Dff = 3,4-difluorophenylalanine; Dmf = 3,4-dimethoxyphenylalanine; hAr = homoarginine; Hyp = hydroxyproline; Msn = methionine sulfone; Orn = ornithine; Php = 4-phenylpiperidine-4-carboxylic acid; Tha = thiazolylalanine.

Figure A.6.12. Analysis of putative ACE2-binding peptide Orn-Dff-DPro-Asp-Aad-Amb-Dff-Gly-Gly-Amb-hAr-Php-Lys. EICs of putative ACE2-binding peptide Orn-Dff-DPro-Asp-Aad-Amb-Dff-Gly-Gly-Amb-hAr-Php-Lys (m/z 573.61–573.62) in ACE2 (blue) and RBD (orange) samples.


Figure A.6.13. Analysis of putative ACE2-binding peptide Tha-Dmf-Aad-Aad-Aad-Asp-Cha-Dff-DPro-Cpa-4Py-Php-Lys. EICs of putative ACE2-binding peptide Tha-Dmf-Aad-Aad-Aad-Asp-Cha-Dff-DPro-Cpa-4Py-Php-Lys (m/z 573.61–573.62) in ACE2 (blue) and RBD (orange) samples.

Figure A.6.14. Analysis of putative ACE2-binding peptide Php-Aad-Aad-Tha-4Af-Ile-Gly-Gln-Php-Msn-Asp-Amb-Lys. EICs of putative ACE2-binding peptide Php-Aad-Aad-Tha-4Af-Ile-Gly-Gln-Php-Msn-Asp-Amb-Lys (m/z 573.61–573.62) in ACE2 (blue) and RBD (orange) samples.

Table A.6.1. Selections for ACE2 binding identify 54 putative binders.

Peptide          ALC (%)   m/z        z   RT       Mass       ppm
OHRZERAIYYMGK    99        381.9974   5   28.66    1904.954   -1.9
VHWXQXTHJMGYK    99        435.9548   4   28.71    1739.791   -0.3
VQWTGHXIOQGYK    99        554.6065   3   38.63    1660.799   -1
JHRBWGHRZBBRK    99        389.598    5   50.41    1942.956   -1.5
HWRBPHDMZAZGK    99        606.5896   3   57.14    1816.749   -0.9
ZASZXEBWDGMZK    99        600.6144   3   58.37    1798.824   -1.3
VHWMOTWRWRQIK    99        503.9825   4   59.16    2011.902   -0.6
VHVGPPSWOHABK    99        545.5735   3   60.13    1633.702   -1.9
IYEEBWBBZGTBK    99        589.9507   3   63.59    1766.831   -0.1
VHWGQPMJQHAHK    99        567.2306   3   63.72    1698.674   -2.1
VXWMSHPQWJJAK    99        590.6221   3   73.09    1768.845   -0.2
VHWMHXWQBRBZK    98        490.7112   4   58.36    1958.819   -1.8
VHRPGAZDJMWMK    98        590.9379   3   60.73    1769.79    1.1
DPMYBBBWIGQZK    98        584.2676   3   66.02    1749.783   -0.9
VHIBHIGWAOQWK    98        585.2582   3   83.57    1752.756   -2.1
JHIBJMAWTVHWK    98        606.945    3   99.7     1817.812   0.9
VHWSVXWWAVABK    97        562.2561   3   65.29    1683.749   -1.4
VQWXIBGBGJIRK    96        403.739    4   56.59    1610.926   0.7
VQWXAEJPQWGVK    96        542.618    3   58.03    1624.838   -3.7
EHIBBVHGYJTAK    96        544.5992   3   62.27    1630.779   -1.7
XWPDEBWGGBRZK    96        573.6104   3   64.25    1717.803   3.9
VQGNVHPJDRVJK    95        552.312    3   59.31    1653.917   -1.9
IHWGABYHWTOMK    95        609.9064   3   63.5     1826.703   -2.9
VEWWBGWSQRGWK    95        597.5945   3   65.38    1789.765   -1.7
VHNVIXMPVAIHK    92        536.2731   3   38.76    1605.798   0
HOEEEDJWPVNZK    92        644.3018   3   70.22    1929.882   1
JWAIEDNGZJIMK    92        582.6428   3   80.24    1744.907   -0.2
IYHHSBBWIGQZK    91        584.2677   3   65.97    1749.776   3.1
ZEEHYIGQZMDBK    91        916.4117   2   68.15    1830.811   -1
IHXGWPBBMSXZK    90        569.9362   3   62.39    1706.791   -2.5
IHIIPBPWTOXZK    90        586.9777   3   76.82    1757.918   -3.7
HPQVVBGEHHBZK    89        570.2443   3   61.2     1707.714   -1.9
AMEJWBDABMZBK    89        897.8879   2   78.22    1793.764   -1.6
JHWAMSHYZNHYK    88        479.9476   4   51.13    1915.763   -0.7
HHIGJPDYAYWZK    87        585.9377   3   65.28    1754.791   0.1
VHOTTXITDJXWK    87        571.9666   3   70.5     1712.877   0.7
VHIGPJRTRRSSK    87        539.6533   3   73.33    1615.946   -4.6
VHWQVXITBYXZK    86        440.2345   4   46.31    1756.908   0.3
PWVBDEOJGDNHK    86        588.2662   3   68.14    1761.767   5.7
HHAROZSZWIGZK    86        643.9747   3   72.22    1928.907   -2.4
PBHJSHHZNDWZK    86        630.9296   3   74.4     1889.769   -1
ZIBBWHRGZEDRK    85        473.7377   4   62.67    1890.92    0.8
IHIAOGHTRIWAK    84        551.945    3   54.18    1652.813   0.1
ZNBMTJEDEMGBK    83        584.5899   3   58.26    1750.758   -5.8
OBTBMHEOBDOBK    83        659.2678   3   69.58    1974.784   -1.4
VNRZGSWBHYRNK    83        629.6575   3   73.07    1885.935   8.4
XGMPHJGVOHHBK    82        567.5792   3   44.9     1699.713   1.9
HWRBPHDMPYGZK    82        606.9235   3   57.37    1817.744   2.6
VHHMIGAOQROZK    81        623.6255   3   54.64    1867.857   -1.2
ZGXDSBHYZHVZK    81        455.9645   4   68.49    1819.836   -3.9
ZXBJZMDRNNBZK    81        488.5087   4   69.99    1950.001   2.2
OVHBJSJOQGZZK    81        646.3395   3   100.69   1935.999   -1.4
VHWVQXIYAYXZK    80        440.2346   4   46.25    1756.908   0.5
ZGNEMJEDEMGBK    80        584.5888   3   58.48    1750.758   -7.6

Selections from a 2 x 10^8-member library of design (X)12K, where X = a suite of canonical and non-canonical amino acids, identify 54 putative ACE2-binding peptides. Cyclopropylalanine is prominent at the N-terminus, appearing in 21/54 sequences. Abbreviations: A = β-alanine; B = 4-(aminomethyl)benzoic acid; D = aspartic acid; E = aminoadipic acid; G = glycine; H = thiazolylalanine; I = isoleucine; J = cyclohexylalanine; K = lysine; M = methionine sulfone; N = 3-(4'-pyridyl)-L-alanine; O = 3,4-dimethoxyphenylalanine; P = D-proline; Q = glutamine; R = homoarginine; S = hydroxyproline; T = β-homoserine; V = cyclopropylalanine; W = 3,4-difluorophenylalanine; X = ornithine; Y = 4-aminophenylalanine; Z = 4-phenylpiperidine-4-carboxylic acid.
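The m/z, z, Mass, and ppm columns of Table A.6.1 can be related to one another with a short calculation. The sketch below assumes that Mass is the theoretical neutral monoisotopic mass of the assigned sequence and that ppm is the relative deviation of the observed neutral mass from it; the table does not state these definitions, so both are assumptions, and the function names are hypothetical. Three rows from the table are used as examples.

```python
PROTON_MASS = 1.007276  # Da

def neutral_mass(mz: float, z: int) -> float:
    """Observed neutral mass from the m/z of an [M+zH]z+ precursor."""
    return z * (mz - PROTON_MASS)

def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# (peptide, m/z, z, Mass, reported ppm) copied from Table A.6.1
rows = [
    ("OHRZERAIYYMGK", 381.9974, 5, 1904.954, -1.9),
    ("DPMYBBBWIGQZK", 584.2676, 3, 1749.783, -0.9),
    ("XWPDEBWGGBRZK", 573.6104, 3, 1717.803, 3.9),
]

computed = []
for peptide, mz, z, mass, reported in rows:
    value = ppm_error(neutral_mass(mz, z), mass)
    computed.append(value)
    print(f"{peptide}: computed {value:+.1f} ppm, reported {reported:+.1f} ppm")
```

Under these assumptions the recomputed values agree with the reported ones to within a few tenths of a ppm, which is about what rounding the Mass column to three decimals would allow.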

A.7. References

(1) Lewis, D. Mounting Evidence Suggests Coronavirus Is Airborne — but Health Advice Has Not Caught Up. Nature 2020, 583 (7817), 510–513. https://doi.org/10.1038/d41586-020-02058-1.
(2) Secon, H. The coronavirus death rate in the US is almost 50 times higher than that of the flu. See how they compare by age bracket. https://www.businessinsider.com/coronavirus-death-rate-us-compared-to-flu-by-age-2020-6 (accessed Jul 25, 2020).
(3) Wang, Q.; Zhang, Y.; Wu, L.; Niu, S.; Song, C.; Zhang, Z.; Lu, G.; Qiao, C.; Hu, Y.; Yuen, K.-Y.; Wang, Q.; Zhou, H.; Yan, J.; Qi, J. Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2. Cell 2020, 181 (4), 894-904.e9. https://doi.org/10.1016/j.cell.2020.03.045.
(4) Lan, J.; Ge, J.; Yu, J.; Shan, S.; Zhou, H.; Fan, S.; Zhang, Q.; Shi, X.; Wang, Q.; Zhang, L.; Wang, X. Structure of the SARS-CoV-2 Spike Receptor-Binding Domain Bound to the ACE2 Receptor. Nature 2020, 581 (7807), 215–220. https://doi.org/10.1038/s41586-020-2180-5.
(5) Yuan, M.; Wu, N. C.; Zhu, X.; Lee, C.-C. D.; So, R. T. Y.; Lv, H.; Mok, C. K. P.; Wilson, I. A. A Highly Conserved Cryptic Epitope in the Receptor Binding Domains of SARS-CoV-2 and SARS-CoV. Science 2020, 368 (6491), 630–633. https://doi.org/10.1126/science.abb7269.
(6) Ju, B.; Zhang, Q.; Ge, J.; Wang, R.; Sun, J.; Ge, X.; Yu, J.; Shan, S.; Zhou, B.; Song, S.; Tang, X.; Yu, J.; Lan, J.; Yuan, J.; Wang, H.; Zhao, J.; Zhang, S.; Wang, Y.; Shi, X.; Liu, L.; Zhao, J.; Wang, X.; Zhang, Z.; Zhang, L. Human Neutralizing Antibodies Elicited by SARS-CoV-2 Infection. Nature 2020. https://doi.org/10.1038/s41586-020-2380-z.
(7) Laraia, L.; McKenzie, G.; Spring, D. R.; Venkitaraman, A. R.; Huggins, D. J. Overcoming Chemical, Biological, and Computational Challenges in the Development of Inhibitors Targeting Protein-Protein Interactions. Chem. Biol. 2015, 22 (6), 689–703. https://doi.org/10.1016/j.chembiol.2015.04.019.
(8) Nevola, L.; Giralt, E. Modulating Protein–Protein Interactions: The Potential of Peptides. Chem. Commun. 2015, 51 (16), 3302–3315. https://doi.org/10.1039/C4CC08565E.
(9) Han, D. P.; Penn-Nicholson, A.; Cho, M. W. Identification of Critical Determinants on ACE2 for SARS-CoV Entry and Development of a Potent Entry Inhibitor. Virology 2006, 350 (1), 15–25. https://doi.org/10.1016/j.virol.2006.01.029.
(10) Xia, S.; Yan, L.; Xu, W.; Agrawal, A. S.; Algaissi, A.; Tseng, C.-T. K.; Wang, Q.; Du, L.; Tan, W.; Wilson, I. A.; Jiang, S.; Yang, B.; Lu, L. A Pan-Coronavirus Fusion Inhibitor Targeting the HR1 Domain of Human Coronavirus Spike. Sci. Adv. 2019, 5 (4), eaav4580. https://doi.org/10.1126/sciadv.aav4580.
(11) Zhang, G.; Pomplun, S.; Loftis, A. R.; Tan, X.; Loas, A.; Pentelute, B. L. Investigation of ACE2 N-Terminal Fragments Binding to SARS-CoV-2 Spike RBD. bioRxiv 2020, 2020.03.19.999318. https://doi.org/10.1101/2020.03.19.999318.
(12) Xia, S.; Liu, M.; Wang, C.; Xu, W.; Lan, Q.; Feng, S.; Qi, F.; Bao, L.; Du, L.; Liu, S.; Qin, C.; Sun, F.; Shi, Z.; Zhu, Y.; Jiang, S.; Lu, L. Inhibition of SARS-CoV-2 (Previously 2019-NCoV) Infection by a Highly Potent Pan-Coronavirus Fusion Inhibitor Targeting Its Spike Protein That Harbors a High Capacity to Mediate Membrane Fusion. Cell Res. 2020, 30 (4), 343–355. https://doi.org/10.1038/s41422-020-0305-x.
(13) Romano, M.; Ruggiero, A.; Squeglia, F.; Berisio, R. An Engineered Stable Mini-Protein to Plug SARS-Cov-2 Spikes. bioRxiv 2020, 2020.04.29.067728. https://doi.org/10.1101/2020.04.29.067728.
(14) Quartararo, A. J.; Gates, Z. P.; Somsen, B. A.; Hartrampf, N.; Ye, X.; Shimada, A.; Kajihara, Y.; Ottmann, C.; Pentelute, B. L. Ultra-Large Chemical Libraries for the Discovery of High-Affinity Peptide Binders. Nat. Commun. 2020, 11 (1), 3183. https://doi.org/10.1038/s41467-020-16920-3.
(15) Vinogradov, A. A.; Gates, Z. P.; Zhang, C.; Quartararo, A. J.; Halloran, K. H.; Pentelute, B. L. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries. ACS Comb. Sci. 2017, 19 (11), 694–701. https://doi.org/10.1021/acscombsci.7b00109.
(16) Griffiths, A. D.; Duncan, A. R. Strategies for Selection of Antibodies by Phage Display. Curr. Opin. Biotechnol. 1998, 9 (1), 102–108. https://doi.org/10.1016/S0958-1669(98)80092-X.
(17) Touti, F.; Gates, Z. P.; Bandyopadhyay, A.; Lautrette, G.; Pentelute, B. L. In-Solution Enrichment Identifies Peptide Inhibitors of Protein–Protein Interactions. Nat. Chem. Biol. 2019, 15 (4), 410–418. https://doi.org/10.1038/s41589-019-0245-2.
(18) Oudit, G. Y.; Crackower, M. A.; Backx, P. H.; Penninger, J. M. The Role of ACE2 in Cardiovascular Physiology. Trends Cardiovasc. Med. 2003, 13 (3), 93–101. https://doi.org/10.1016/S1050-1738(02)00233-5.
(19) Mijalis, A. J.; Thomas III, D. A.; Simon, M. D.; Adamo, A.; Beaumont, R.; Jensen, K. F.; Pentelute, B. L. A Fully Automated Flow-Based Approach for Accelerated Peptide Synthesis. Nat. Chem. Biol. 2017, 13 (5), 464–466. https://doi.org/10.1038/nchembio.2318.
(20) Hartrampf, N.; Saebi, A.; Poskus, M.; Gates, Z. P.; Callahan, A. J.; Cowfer, A. E.; Hanna, S.; Antilla, S.; Schissel, C. K.; Quartararo, A. J.; Ye, X.; Mijalis, A. J.; Simon, M. D.; Loas, A.; Liu, S.; Jessen, C.; Nielsen, T. E.; Pentelute, B. L. Synthesis of Proteins by Automated Flow Chemistry. Science 2020, 368 (6494), 980–987. https://doi.org/10.1126/science.abb2491.
