Towards Structure Determination of Disulfide-Rich Peptides Using Chemical Shift-Based Methods

Conan K. Wang1,*, David J. Craik1

1Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia.

*Address correspondence to: Dr Conan Wang ([email protected])


ABSTRACT: Disulfide-rich peptides are a class of molecules for which NMR spectroscopy has been the primary tool for structural characterization. Here, we explore whether the process can be achieved by using structural information encoded in chemical shifts. We examine: i) a representative set of five cyclic disulfide-rich peptides that have high-resolution NMR and X-ray structures, and ii) a larger set of 100 disulfide-rich peptides from the PDB. Accuracy of the calculated structures was dependent on the methods used for searching through conformational space and for identifying native conformations. Although Hα chemical shifts could be predicted reasonably well using SHIFTX, agreement between predicted and experimental chemical shifts was sufficient for identifying native conformations for only some peptides in the representative set. Combining chemical shift data with secondary structure information and potential energy calculations improved the ability to identify native conformations. Additional use of sparse distance restraints or homology information to restrict the search space also improved resolution of calculated structures. This study demonstrates that abbreviated methods have potential for elucidation of peptide structures to high-resolution and further optimization of these methods, e.g. improvement in chemical shift prediction accuracy, will likely help transition these methods into the mainstream of disulfide-rich peptide structural biology.


INTRODUCTION

Peptides are found throughout nature and have a wide range of functions essential for life, including metabolic regulation, signaling and host-defense. There is currently widespread interest in peptides with covalent constraints such as disulfide bonds because constrained peptides typically exhibit increased stability and conformational rigidity over linear peptides, and also adopt well-defined structural elements more typical of proteins.1 These characteristics underpin their potential as potent and selective inhibitors of protein-protein interactions,2 as illustrated by several constrained peptides either with potent preclinical activity,3 or reaching late-stage clinical trials and/or regulatory approval.4 It is expected that advances in technologies for high-throughput design and discovery of bioactive peptides will continue to push peptides into the mainstream of biotechnological applications.

Knowledge of the structures of natural and designed peptides, which can be obtained by X-ray crystallography and/or NMR spectroscopy, is essential to harness their potential as bioactive modalities. Of the two techniques, NMR is preferred because it does not require crystallization, which is a major bottleneck in the X-ray approach, notwithstanding that recent advances in racemic crystallography5-8 address this issue for disulfide-rich peptides.9,10 However, a concern with the use of NMR spectroscopy is that full structure determination is resource-intensive, requiring analysis of multiple experiments to obtain structural restraints and iterative rounds of data evaluation during structure refinement.

Approaches to streamline NMR structure determination have been proposed, including automated data analysis11-14 and NMR chemical shift-based methods.15-21 It is promising that in some cases, structures of proteins or nucleic acids have been determined solely from chemical shift data, circumventing the need to gather other restraints, such as distance restraints provided by the nuclear Overhauser effect and dihedral angle restraints provided by coupling constants.15-18 However, recent studies suggest that high-resolution structures of proteins are more reliably obtained when chemical shift data are supplemented with some additional restraints.21-26 Nevertheless, more research is needed to understand how broadly chemical shift-based methods can be applied, particularly for peptides, to develop more efficient approaches for NMR structure determination.

Here we examine de novo structure determination of disulfide-rich peptides solely from chemical shift data, as well as the effect of including additional sparse restraints and of using homology modeling. We developed computational methods to extract structural information from chemical shifts, using them to examine two sets of peptides: i) a representative set of peptides constrained by a macrocyclic backbone and disulfide bonds and ii) a larger non-overlapping set of 100 disulfide-rich peptides from the PDB.

EXPERIMENTAL SECTION

Peptide synthesis and purification

The assembly, synthesis, purification, cyclization and oxidation of ribifolin, SFTI-1, cVc1.1, BTD-2, and kB1 have been described previously.27,28 More details are provided in the Supporting Information.

NMR spectroscopy

Peptides were dissolved in H2O/D2O (9:1, v/v) at a concentration of ~1 mM. NMR spectra were recorded on a Bruker Avance-600 MHz NMR spectrometer at 298 K. One- and two-dimensional NMR spectra (1H, 1H TOCSY, NOESY, and 1H, 13C HSQC, and 1H, 15N HSQC) were acquired. Spectra were processed using Topspin 1.2 (Bruker) and analyzed using CCPNMR 2.2.2. Spectra were internally referenced to 2,2-dimethyl-2-silapentane-5-sulfonic acid (DSS) at 0.00 ppm.

A protocol for structure determination using chemical shifts

The Supporting Information provides details of the algorithm used for structure determination, describing the representation chosen for any candidate solution (each solution, known as an individual, has a set of genes), the method for initialization of the population of individuals, the method for calculation of scores (to assign fitness), and routines for generating new individuals (called genetic operators). We note that a genetic algorithm has previously been shown to be effective in calculation of protein structures.29

Molecular dynamics simulations

Molecular dynamics simulations were performed as previously described.30 More details are provided in the Supporting Information.

Homology modelling

Peptides with sequence similarity to the query peptide (i.e. the peptide to be modeled) were identified manually or by a BLAST search of the PDB. Matching peptides that were identical to the query peptide were rejected to prevent bias in the results. A sequence alignment of the query peptide to the remaining matching peptides was generated by optimizing amino acid and chemical shift similarity. The alignments, along with coordinates of the subject peptides, were used to construct homology models using MODELLER v9.16.31


RESULTS

Accuracy of chemical shift predictions for disulfide-rich peptides

Prediction accuracies of three popular programs that calculate chemical shifts, i.e. SHIFTX,32 SHIFTX+,33 and SPARTA+,34 were examined. These programs have been used successfully for de novo determination of protein structures15-18 and are reported to be fast and accurate. They were tested on a set of 100 disulfide-rich structures (i.e. containing >1 disulfide bond) that exhibited good MolProbity scores35 (used as an indication of structural quality, Table S1 and Figure S1). Prediction accuracies of the programs were comparable, although SPARTA+ performed slightly better than SHIFTX and SHIFTX+, having standard deviations of 0.30, 0.56, 3.04, and 1.37 ppm compared to 0.35, 0.67, 3.34, and 1.54 ppm for SHIFTX and 0.35, 0.65, 3.30, and 1.52 ppm for SHIFTX+ for the set of nuclei tested (Figure 1a). Nevertheless, the overall trend in accuracy for all three programs with respect to nuclei type was the same, with predictions of the backbone Hα chemical shift having the lowest deviations from experimental values, followed by HN then Cα and finally N.
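The per-nucleus comparison described above can be illustrated with a minimal Python sketch. It assumes that predicted and experimental shifts are already available as parallel lists keyed by nucleus name; the function name `error_spread` and the input layout are hypothetical and are not taken from SHIFTX, SHIFTX+, or SPARTA+.

```python
from statistics import pstdev


def error_spread(predicted, experimental):
    """Standard deviation of (predicted - experimental) shifts for each nucleus.

    Both arguments map a nucleus label (e.g. "HA") to a list of shifts in ppm,
    one entry per residue; residues missing either value are given as None.
    """
    spread = {}
    for nucleus, pred_values in predicted.items():
        exp_values = experimental[nucleus]
        # Keep only residues for which both a prediction and a measurement exist.
        diffs = [p - e for p, e in zip(pred_values, exp_values)
                 if p is not None and e is not None]
        spread[nucleus] = pstdev(diffs)
    return spread
```

A lower value for a given nucleus indicates that predictions for that nucleus track the measurements more closely, which is the sense in which Hα is described as the best-predicted nucleus.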

To examine prediction accuracy in a more detailed and standardized manner, a smaller representative set of disulfide-rich peptides was generated. Specifically, we re-measured chemical shifts for five well-characterized backbone-cyclic peptides that have demonstrated therapeutic potential: ribifolin, SFTI-1, cVc1.1, BTD-2, and kB1 (Figure 1b). Aside from being structurally diverse (i.e. varying in size, topology, disulfide content and connectivity), these peptides have high-resolution X-ray structures (1.25–1.85 Å)9,10,36 that agree well with their NMR structures (Figure 1b). As shown in Figure 1c and Figure S2, predictions generally tracked well with the residue-to-residue variation in experimental chemical shifts. In some cases, there were discrepancies between predicted and experimental Hα chemical shifts, as observed for ribifolin, cVc1.1 and kB1, but predictions for SFTI-1 and BTD-2 were very accurate by comparison (Figure 1c; using SHIFTX). Overall, the analysis of chemical shift predictions for disulfide-rich peptides suggests that current chemical shift calculation programs might have the potential to guide the de novo determination of structures of disulfide-rich peptides to high resolution.

Scoring of structures by chemical shift-based parameters

We examined whether chemical shifts, alone or in combination with other structural parameters, could be used to distinguish between different conformations of disulfide-rich peptides (Figure 2 and Figure S3). In a preliminary analysis, a range of parameters was tested to build a weighted chemical shift-based score (Supporting Information Methods and Figure S3). These parameters included examples based on chemical shift data, secondary structure, potential energy, hydrogen bonding, Ramachandran statistics, and ³JHαHN coupling constants, all of which can be easily measured. We focus on chemical shifts, secondary structure, and potential energy parameters here.

We found that chemical shift data (i.e. agreement between predicted and experimental chemical shifts) alone could identify some correct conformations from a randomly generated population of structures but was not sufficiently powerful to resolve all correct from incorrect conformations (Figures 2 and S3). For example, some incorrect conformations of ribifolin, cVc1.1, or kB1 had predicted shifts that were similar to experimental shifts of the true conformation (Figure 2, left column). This result is probably due to ambiguous chemical shifts or inaccuracies in their prediction, and suggests that additional parameters might increase the reliability of identifying correct conformations.

Agreement between expected secondary structure (based on secondary Hα chemical shifts) and assigned secondary structure of a candidate conformation was found to be indicative of a correct fold (Figure 2, middle column). For example, when secondary structure data was used, the lowest-scoring conformations of kB1 were more likely to resemble the experimental structure compared to when only Hα chemical shift data was used. However, use of secondary shift data was not useful in all cases, particularly when the peptide lacked well-defined secondary structure (e.g. ribifolin).
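As an illustration of how secondary Hα shifts can be turned into an expected secondary structure, the following Python sketch subtracts a random-coil reference from each observed shift and labels runs of consistently positive values as strand and consistently negative values as helix. The ±0.1 ppm threshold, the minimum run length of three, and the function names are assumptions made for this example; the random-coil values must be supplied by the caller from a published table, and the scheme actually used in this work is described in the Supporting Information.

```python
def secondary_shifts(sequence, observed_ha, random_coil_ha):
    """Secondary shift = observed Halpha shift minus random-coil Halpha shift (ppm)."""
    return [obs - random_coil_ha[aa] for aa, obs in zip(sequence, observed_ha)]


def expected_secondary_structure(deltas, threshold=0.1, min_run=3):
    """Label residues 'E' (strand), 'H' (helix) or '-' from secondary shifts.

    Downfield (positive) secondary shifts suggest strand and upfield (negative)
    shifts suggest helix; isolated residues are ignored by requiring a run of at
    least `min_run` consecutive residues with the same label.
    """
    raw = ["E" if d > threshold else "H" if d < -threshold else "-" for d in deltas]
    labels = ["-"] * len(raw)
    start = 0
    while start < len(raw):
        end = start
        while end < len(raw) and raw[end] == raw[start]:
            end += 1
        if raw[start] != "-" and end - start >= min_run:
            labels[start:end] = raw[start:end]
        start = end
    return "".join(labels)


def agreement(expected, assigned):
    """Fraction of residues whose expected and assigned labels match."""
    return sum(a == b for a, b in zip(expected, assigned)) / len(expected)
```

The `agreement` value can then be compared across candidate conformations, with the assigned labels coming from any secondary structure assignment program applied to each candidate.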

Aside from using chemical shift information, identification of correct conformations can be done based on calculations of potential energy. Previous studies on chemical shift-based structure determination have relied on some measure of potential energy, which can be obtained from either physical- or knowledge-based potentials. Knowledge-based statistical potentials were tested here because of their ease of use. Of the three programs tested (DFIRE,37 OPUS,38 and RW39) for the calculation of statistical potentials, DFIRE had the best performance (Figure 2, right column, and Figure S3).
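A weighted score that combines the three terms discussed above could take a form like the Python sketch below. This is only an illustration: the weights, the use of a root-mean-square shift deviation, and the assumption that the energy term has already been rescaled to a comparable range are all hypothetical, and the scoring function actually used is defined in the Supporting Information. Lower scores are taken to be better, consistent with the description of the lowest-scoring conformations.

```python
from math import sqrt


def shift_rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental shifts (ppm)."""
    pairs = list(zip(predicted, experimental))
    return sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))


def combined_score(predicted, experimental, ss_expected, ss_assigned,
                   scaled_energy, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of shift, secondary structure and energy terms (lower is better).

    `scaled_energy` stands in for a statistical potential value that has been
    rescaled beforehand; the default weights are placeholders, not fitted values.
    """
    w_shift, w_ss, w_energy = weights
    mismatch = sum(a != b for a, b in zip(ss_expected, ss_assigned)) / len(ss_expected)
    return (w_shift * shift_rmsd(predicted, experimental)
            + w_ss * mismatch
            + w_energy * scaled_energy)
```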

Use of chemical shifts for structure determination: testing the protocol

A protocol for structure elucidation by chemical shifts requires a function to score conformations and a method to navigate the scoring landscape. Here, we used a genetic algorithm for structural optimisation because of its ease of implementation. The protocol is illustrated in Figure 3a. It takes as input experimental chemical shifts, the peptide sequence, and structural features such as the disulfide connectivity (obtained from mass spectrometry-based experiments, for example40-42). The protocol involves several stages, from generating backbone dihedral angle restraints to calculating candidate conformations based on those restraints, which are subsequently scored.
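Because backbone dihedral angle restraints are central to the protocol, it may help to show how a dihedral angle is computed from four atomic positions and checked against a restraint. The Python sketch below uses the standard vector formula; the function names and the tolerance handling are illustrative assumptions and do not describe the internal calculate structure routine.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For the backbone angle phi the points are C(i-1), N(i), CA(i), C(i);
    for psi they are N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return degrees(atan2(y, x))


def within_restraint(angle, target, tolerance):
    """True if `angle` lies within `tolerance` degrees of `target`, allowing wrap-around."""
    difference = (angle - target + 180.0) % 360.0 - 180.0
    return abs(difference) <= tolerance
```

The wrap-around in `within_restraint` matters because angles such as 179° and -179° are only 2° apart.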


Initially, we focused on using Hα chemical shift data without chemical shifts of other nuclei because our earlier analysis indicated that Hα shifts were the most consistently well predicted. The benefit of using Hα chemical shift data only is that such data are routinely acquired for peptides and the additional resources needed for acquisition and analysis of other chemical shifts are saved. In our implementation of the protocol, Hα chemical shifts were used to search a database of chemical shifts and peptide fragments to predict possible dihedral angles and to guide the genetic algorithm in the way it evolves the population of candidate solutions from generation to generation.
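The general shape of a genetic algorithm operating on dihedral angle genes is sketched below in Python. It is a toy example under stated assumptions: the scoring function is a stand-in that measures angular distance to a known target, the population is initialized randomly rather than from Hα-derived dihedral predictions, and the operator choices (tournament selection, one-point crossover, Gaussian mutation, elitism) and all parameter values are hypothetical. The operators actually used are described in the Supporting Information.

```python
import random


def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def toy_score(genes, target):
    """Placeholder fitness: mean angular distance to target angles (lower is better)."""
    return sum(abs(wrap(g - t)) for g, t in zip(genes, target)) / len(genes)


def evolve(score_fn, n_genes, pop_size=40, generations=60,
           mutation_rate=0.2, mutation_sd=15.0, seed=0):
    """Evolve a population of dihedral-angle gene lists; requires n_genes >= 2."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best score at the start of each generation
    for _ in range(generations):
        population.sort(key=score_fn)
        history.append(score_fn(population[0]))
        next_generation = [population[0][:]]  # elitism: keep the current best
        while len(next_generation) < pop_size:
            # Tournament selection of two parents.
            parent_a = min(rng.sample(population, 3), key=score_fn)
            parent_b = min(rng.sample(population, 3), key=score_fn)
            # One-point crossover followed by per-gene Gaussian mutation.
            cut = rng.randrange(1, n_genes)
            child = parent_a[:cut] + parent_b[cut:]
            child = [wrap(g + rng.gauss(0.0, mutation_sd))
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_generation.append(child)
        population = next_generation
    best = min(population, key=score_fn)
    history.append(score_fn(best))
    return best, history
```

In a realistic setting `score_fn` would build a conformation from the genes and evaluate a chemical shift-based score, which is far more expensive than this toy function.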

We applied the protocol to determine structures of the peptides within the representative set and obtained final best-scoring solutions that have very good agreement with their experimental structures (< 2 Å) for ribifolin, SFTI-1, and BTD-2 (Figure 3b and c). For cVc1.1, a cyclic peptide engineered by introduction of a C- to N-terminal 6-residue bridging linker, very good agreement between its solutions and its experimental structure was obtained when the flexible linker30 was ignored during structural alignment. Varying the composition of the scoring function showed that the combined use of Hα chemical shifts, secondary structure information, and potential energy approximations resulted in improved overall accuracy compared with the use of each component individually (Figure 3b and S4). Use of Hα chemical shifts to initialize the starting population of candidate solutions was also found to be important for prediction accuracy (Figure 3b), as random initialization resulted in lower performance.

Although the protocol could find good solutions for kB1, not all solutions showed reasonable structural similarity to the crystal structure. Extending the search time in the calculate structures routine resulted in a modest improvement in accuracy (Figure S5a) but the main issue encountered was the sampling of non-native conformations that scored well but aligned poorly with the crystal structure (Figure 3d) because of a more complex scoring landscape than expected (Figure 2). These non-native conformations had poor structural geometry, which could be resolved by subsequent use of short (50 ns) molecular dynamics simulations that resulted in improved classification of correct from incorrect conformations (Figure 3d) and best-scoring solutions that were more similar to the experimental structure (RMSD of 1.03 Å for kB1, Figure 3e; and RMSDs of <1 Å for SFTI-1, cVc1.1, and BTD-2, Figure S6). This result suggests that provided near-optimal solutions can be found by the structure determination protocol, a post-refinement step involving molecular dynamics could help identify correct conformations. Indeed, such an approach could pick out the native disulfide connectivity of kB1 from 14 other non-native configurations (Figure S5b), a task that has been challenging experimentally.43 We note that further improvements to the protocol could be made (e.g. optimizing parameters used for scoring and searching of conformations, as exemplified in Figure S5a); however, the important conclusion so far is that structure determination of disulfide-rich peptides is possible using chemical shift data without the need for structural restraints used in classical methods.

To further test the protocol, we applied it to 70 other disulfide-rich peptides that have NMR structures in the PDB and associated chemical shifts in BMRB (Table S1); unlike the representative set though, not all of these peptides have crystal structures for comparison. Again, the protocol found structures of peptides within this expanded test set that aligned well with their experimental structures, but there were also examples where only poor solutions were found (Figure 4). Table 1 shows an analysis of the scores for the examples presented in Figure 4, suggesting that the difficulty in finding good solutions is partly related to there being non-native conformations having scores similar to or better than the native structure, as was observed for kB1 (Figure 3d). For example, non-native solutions were found with secondary structure content in better agreement with experimental secondary shifts (e.g. Midi peptide), or with predicted chemical shifts more similar to experimental values, or with potential energy scores better than that of the native structure (e.g. Tk-Amp-X2 peptide). This result suggests that optimization of the scoring approach, e.g. through improvement in chemical shift prediction accuracy, might be one way towards improving the overall accuracy of structural calculation by chemical shifts. In other cases, poor performance of the protocol might be due to the search methodology having difficulty finding optimal solutions. For example, the calculate structure routine might not be able to find correct structures based on dihedral restraints only, which are used to represent candidate solutions in the genetic algorithm, because of force-field biases. To address the difficulties in scoring native-like conformations and searching conformational space, additional restraints could be applied to direct the search towards correct conformations.

Additional sparse restraints for improved resolution

Addition of sparse distance restraints (i.e. backbone HN-HN) significantly improved performance of the internal calculate structure routine. The structures of 100 disulfide-rich peptides from the PDB were re-calculated based on optimal restraints derived from their deposited structures (Figure 5a, left, depicts this process for Agitoxin 2). Two types of restraints were tested: i) backbone ϕ and ψ dihedral angles because they are generated based on chemical shifts; and ii) backbone HN-HN distances (<5 Å) because they provide complementary information to dihedral angle restraints and are easily identified in 1H NOESY spectra (already available during chemical shift assignment). Variations of the search time as well as the tolerance range of the dihedral angle restraints were also tested (Figure S8a). Figures 5a and S8 show that a combination of dihedral angle with sparse distance restraints produced re-calculated structures with the highest structural similarity to those deposited (compared to using solely dihedral angle restraints or backbone HN-HN distance restraints, or even increasing search time or tightening of the tolerance range of dihedral angle restraints).
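Deriving backbone HN-HN distance restraints from a deposited structure amounts to listing all pairs of amide protons closer than the 5 Å cutoff. The Python sketch below shows one way to do this, assuming the amide proton coordinates have already been extracted into a dictionary keyed by residue number; the function name and input layout are hypothetical.

```python
from itertools import combinations
from math import dist


def hn_hn_restraints(hn_coords, cutoff=5.0):
    """List (residue_i, residue_j, distance) for amide proton pairs closer than `cutoff`.

    `hn_coords` maps residue number to the (x, y, z) position of its backbone
    amide proton in angstroms; residues without an amide proton (e.g. proline)
    are simply absent from the mapping.
    """
    restraints = []
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(sorted(hn_coords.items()), 2):
        separation = dist(xyz_i, xyz_j)
        if separation < cutoff:
            restraints.append((res_i, res_j, round(separation, 2)))
    return restraints
```

For example, `hn_hn_restraints({1: (0, 0, 0), 2: (3, 0, 0), 3: (9, 0, 0)})` keeps only the pair of residues 1 and 2, because the other two pairs are 6 Å and 9 Å apart.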

Our evaluation of the internal calculate structure routine suggested that combining sparse distance restraints with chemical shift input should also translate into improved performance of the full structure determination protocol (Figure 5b). Indeed, this was the case, as both the convergence and the quality of solutions found by the protocol improved (Figure S8b). For example, both the number of best-scoring and best-fitting solutions within 2 Å RMSD of the experimental structure improved by >300% after addition of sparse distance restraints (Figure 5b). Figure 5b and c indicate that structural convergence of best-scoring solutions from multiple runs of the protocol also improved. However, there were still disulfide-rich peptides for which only poor solutions were found. One factor that affected performance of the protocol was structural disorder (as measured by molecular dynamics simulations), which would result in poorly defined distance restraints, poor alignment to the experimental structure, and low structural convergence (Figure S8d), as well as lower chemical shift prediction accuracies.44,45

Combining homology information with chemical shifts for improved resolution

Homology to known structures of disulfide-rich peptides could be used to improve performance of the protocol. After identification of potential homologues of a given peptide, chemical shifts were used to guide alignment of all amino acid sequences, following recent suggestions,46 resulting in more meaningful and informative sequence/structure comparisons. A model generated from the resulting multiple sequence alignment was then used to seed the structure determination protocol, effectively directing the search towards structures that are more likely. Figure 6 presents three examples of disulfide-rich peptides for which the use of homology information improved performance of the protocol. Figure 6a represents a scenario in which a close homologue was identified and so a sequence alignment was also easily generated, whereas Figure 6b represents one in which more distant homologues were found in the PDB, requiring comparison of chemical shift data to accurately determine the placement of gaps in the alignment. Figure 6c represents a case in which the structure of a mutant peptide was desired and so a template sequence was easily identified because it was simply that of the wild-type form. In all three examples, use of homology information significantly improved both the convergence and quality of solutions found (Figure 6, panels on the right). Overall, this result suggests that combining chemical shifts with homology information is a powerful strategy for elucidation of disulfide-rich peptide structures.
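One way to let chemical shifts influence gap placement is to add a shift-similarity term to the match score of a standard global alignment. The Python sketch below implements a pairwise Needleman-Wunsch alignment with such a score. It is an illustration only: the scoring terms, weights, and gap penalty are hypothetical, the inputs are assumed to be per-residue Hα secondary shifts, and the alignment procedure actually used (including its extension to multiple sequences) may differ.

```python
def pair_score(aa_q, aa_t, shift_q, shift_t, w_seq=1.0, w_shift=1.0):
    """Match score from residue identity plus similarity of secondary shifts."""
    seq_term = 1.0 if aa_q == aa_t else -1.0
    # Ranges from +1 (identical shifts) to -1 (shifts differ by 1 ppm or more).
    shift_term = 1.0 - 2.0 * min(abs(shift_q - shift_t), 1.0)
    return w_seq * seq_term + w_shift * shift_term


def align(query, q_shifts, template, t_shifts, gap=-1.0):
    """Global alignment of two sequences using sequence and shift similarity."""
    n, m = len(query), len(template)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + pair_score(
                query[i - 1], template[j - 1], q_shifts[i - 1], t_shifts[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(
                query[i - 1], template[j - 1], q_shifts[i - 1], t_shifts[j - 1]):
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_t))
```

When two residues are equally plausible partners on sequence grounds alone, the shift term favors pairing residues that sit in similar local environments, which is the intuition behind using shift data to place gaps.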


DISCUSSION

The overall aim of this study was to explore the potential of chemical shift data to determine structures of disulfide-rich and, more broadly, constrained peptides. The study was motivated by recent interest in constrained peptides as therapeutics and agrochemicals, and the need for rapid structure characterization techniques to match advanced methods for peptide lead discovery in which millions of peptides can be screened efficiently for activity. We demonstrated that chemical shifts can indeed be used to determine structures of disulfide-rich peptides, with examples of high-resolution structures calculated solely from Hα chemical shifts. For example, high-resolution structures of SFTI-1, a single disulfide-bond containing peptide, and BTD-2, a three disulfide-bond containing peptide, were obtained from chemical shift data. Currently, the structure determination protocol used performs well for peptides with rigid structures and whose chemical shifts can be accurately predicted, and is thus well-suited for constrained peptides such as those with disulfide bonds and well-defined secondary structure. Generally though, supplementing chemical shift data with sparse distance restraints and/or homology information should generate more reliable and accurate results.

Several limitations of relying on chemical shifts for structure determination were encountered. One is that agreement between predicted and experimental shifts is not always sufficient for identification of the experimental structure. This might be due to a number of reasons, such as ambiguity of chemical shifts, inaccurate predictions of chemical shifts, or inconsistencies in the deposited shifts and structures, which could partly explain why chemical shifts alone are insufficient to define a consistent scoring function, requiring additional input information to guide the search towards correct solutions. Approaches used to generate this additional information come with their own limitations and biases, such as DFIRE having a preference for more compact structures. Nevertheless, it is promising that excellent solutions could still be obtained for a broad range of disulfide-rich peptides even though empirical/approximate methods for chemical shift and potential energy calculation were used.

In summary, we implemented a simple chemical shift-based method with the main goal of exploring the potential of chemical shifts in structure determination of peptides. There is scope for future improvements to the performance of the method, for example by (i) optimization of search parameters, (ii) use of alternative algorithms, scoring, and energy functions,47-54 or (iii) use of more sophisticated approaches for chemical shift prediction and interpretation.19,20,55 These improvements would be expected to greatly improve the utility of chemical shift-based methods for studies of disulfide-rich peptide structure, but could also find use in characterization of interactions and dynamics.


SUPPORTING INFORMATION

Detailed materials and methods. List of peptides studied. Relationship between DFIRE score and sequence length. Detailed comparison of predicted and experimental chemical shifts for peptides in the representative set. Analysis of structural parameters for scoring. Analysis of scoring terms and structure calculation protocol variations. Best-scoring solutions after molecular dynamics simulations. Analysis of algorithm variations and disulfide connectivity on accuracy of calculated kB1 structures. Effect of additional restraints on structure calculation accuracy of selected peptides from the PDB. Analysis of structures calculated from backbone torsion angles only. Correlation between backbone dihedral angles and chemical shifts and secondary structure.


ACKNOWLEDGEMENTS

DJC is an Australian Research Council (ARC) Australian Laureate (FL150100146). Work in our laboratory on peptide scaffolds is supported by grants from the ARC (DP150100443, LE160100218) and the National Health and Medical Research Council (APP1107403).


FIGURE CAPTIONS

Figure 1. Prediction of NMR chemical shifts of disulfide-rich peptides. a) Difference between predicted and observed backbone Hα, HN, Cα, and N chemical shifts (Δδ) of disulfide-rich peptides (selected for quality based on MolProbity scores) using three popular programs: SHIFTX, SHIFTX+ and SPARTA+. b) Structures of backbone-cyclic disulfide-rich peptides forming a representative test set from NMR spectroscopy (colored) and X-Ray crystallography (grey) for ribifolin, SFTI-1 (sunflower trypsin inhibitor-1), cVc1.1 (cyclic Vc1.1), BTD-2 (baboon theta defensin-2), and kB1 (kalata B1). c) Comparison of predicted and experimental Hα chemical shifts for each peptide in the representative test set. The sequences and disulfide connectivity of each peptide are shown under their respective chemical shift plot, as are elements of secondary structure (α-helix and β-strand).

Figure 2. Use of chemical shifts to score conformations of peptides in the representative test set. Each plot shows the relationship between a weighted chemical shift-based score (details in Supporting Information) of a randomly generated conformation and the RMSD of its backbone heavy atoms with the X-ray structure of the peptide. Each column involves a different composition of the scoring function: the first column uses only Hα chemical shifts; the second column uses Hα chemical shifts and secondary structure assignments; and the third column uses Hα chemical shifts, secondary structure information and the DFIRE statistical potential.

Figure 3. Structure elucidation based on chemical shift data. a) Flow-chart of the protocol used for structure elucidation. Sequence information and Hα chemical shifts are provided as input data and used to generate initial candidate conformations, which are improved over iterative cycles to produce final solutions. b) Effect of algorithm variations on the RMSD of backbone heavy atoms between final candidate solutions and the X-ray structure as applied to peptides of the representative set. Results for the full protocol are shown at the top of the panel, results from varying the scoring function to use Hα chemical shift data only are shown in the middle, and results from changing the initialization step to use random values are shown at the bottom. RMSDs for the best-scoring solutions from ten runs of the algorithm for each peptide are shown. RMSDs for cVc1.1 were calculated over the core scaffold (i.e. residues 1–16). c) Structural superimposition of the best-scoring solutions found using the protocol for ribifolin, SFTI-1, cVc1.1, and BTD-2 and their respective X-ray structures. The RMSDs are shown. d) Use of molecular dynamics to score final candidate solutions. Each circle represents a candidate solution sampled during ten runs of the algorithm (with extended search time in the calculate structures routine) to find structures of kB1, showing the relationship between their scores and RMSD to the X-ray structure. The ten best-scoring solutions are colored in dark purple. Molecular dynamics simulations of these solutions can resolve poor geometry and result in new solutions (light purple) that exhibit a better correlation between score and RMSD. e) Structural superimposition of the best-scoring solution found during analysis of molecular dynamics trajectories for kB1 with its X-ray structure.

Figure 4. Elucidation of structures using Hα chemical shift data for disulfide-rich peptides extracted from the PDB. Superimposition of the best-scoring (left structures) and best-fitting (right structures) solutions for a selection of peptides with their respective NMR structures. The peptide names and PDB IDs (in bold) and RMSD values over the backbone heavy atoms (in italics; Å) are shown. Solutions with very good agreement (RMSD < 2 Å; colored green) with their experimental structures were obtained, but solutions with mediocre agreement (2 < RMSD < 4 Å; colored orange) and poor alignment (RMSD > 4 Å; colored red) were also generated.

Figure 5. Testing additional backbone distance restraints to improve resolution of structures. a) Effect of restraint type on structure calculation accuracy. Native dihedral angles (ϕ, ψ) and backbone HN-HN distance restraints (within 5 Å only) were calculated from deposited PDB structures and used to test the calculate structure routine. The procedure is illustrated for Agitoxin 2 (PDB ID: 1AGT; left). The distribution of RMSDs (right) is shown for the analyzed structures using different restraint combinations: dihedral angles (ϕ, ψ) only; backbone HN-HN distance restraints within 5 Å only (HN-HN); and dihedral angles in combination with backbone HN-HN distance restraints (ϕ, ψ + HN-HN). b) Effect of adding backbone HN-HN distance restraints on performance of the structure determination protocol (same as that used in Figure 3b, top; Hα, SS, DFIRE scoring function with Hα-based initialization). The number of times solutions of a particular RMSD (separated into bin sizes of 1 Å) were obtained with or without backbone distance restraints was counted. c) A selected example (Agitoxin 2) showing structures obtained from using HN-HN restraints only in the calculate structure routine (best scoring); the full structure calculation protocol (i.e. Hα, SS, DFIRE scoring function with Hα-based initialization; ten best scoring from ten runs); and the full protocol with HN-HN restraints (ten best scoring from ten runs). The solutions are superimposed with their corresponding NMR structures. Average RMSDs (in italics; Å) of the backbone heavy atoms are shown.

Figure 6. Use of homology information to improve resolution of calculated structures. Each of the three panels (a–c) comprises three sub-panels: an alignment of a subject peptide (i.e. identified by the PDB IDs 2MD6, 2ME7, and 1N1U) with homologues that was generated based on sequence and Hα secondary shifts (Δδ); a superimposition of the best-scoring solution with the experimental structure and its RMSD; and a comparison of the structural similarity of solutions with the experimental structures obtained without (-) and with (+) homology-derived information. Panel (a) shows an example of an α-conotoxin, (b) shows an example of a scorpion toxin mutant, and (c) shows an example of a mutant of kB1.


REFERENCES (1) Wang, C. K.; Craik, D. J. Designing Macrocyclic Disulfide-Rich Peptides for Biotechnological Applications. Nat Chem Biol 2018, 14, 417-427.

(2) Craik, D. J.; Fairlie, D. P.; Liras, S.; Price, D. The Future of Peptide-Based Drugs. Chem Biol Drug Des 2013, 81, 136-147.

(3) Ji, Y.; Majumder, S.; Millard, M.; Borra, R.; Bi, T.; Elnagar, A. Y.; Neamati, N.; Shekhtman, A.; Camarero, J. A. In Vivo Activation of the P53 Tumor Suppressor Pathway by an Engineered Cyclotide. J Am Chem Soc 2013, 135, 11623-11633.

(4) Morrison, C. Constrained Peptides' Time to Shine? Nat Rev Drug Discov 2018, 17, 531-533.

(5) Kent, S. B. Racemic & Quasi-Racemic Protein Crystallography Enabled by Chemical Protein Synthesis. Curr Opin Chem Biol 2018, 46, 1-9.

(6) Avital-Shmilovici, M.; Mandal, K.; Gates, Z. P.; Phillips, N. B.; Weiss, M. A.; Kent, S. B. Fully Convergent Chemical Synthesis of Ester Insulin: Determination of the High Resolution X-Ray Structure by Racemic Protein Crystallography. J Am Chem Soc 2013, 135, 3173-3185.

(7) Dang, B.; Kubota, T.; Mandal, K.; Bezanilla, F.; Kent, S. B. Native Chemical Ligation at Asx-Cys, Glx-Cys: Chemical Synthesis and High-Resolution X-Ray Structure of Shk Toxin by Racemic Protein Crystallography. J Am Chem Soc 2013, 135, 11911-11919.

(8) Gao, S.; Pan, M.; Zheng, Y.; Huang, Y.; Zheng, Q.; Sun, D.; Lu, L.; Tan, X.; Tan, X.; Lan, H. et al. Monomer/Oligomer Quasi-Racemic Protein Crystallography. J Am Chem Soc 2016, 138, 14497-14502.

(9) Wang, C. K.; King, G. J.; Conibear, A. C.; Ramos, M. C.; Chaousis, S.; Henriques, S. T.; Craik, D. J. Mirror Images of Antimicrobial Peptides Provide Reflections on Their Functions and Amyloidogenic Properties. J Am Chem Soc 2016, 138, 5706-5713.

(10) Wang, C. K.; King, G. J.; Northfield, S. E.; Ojeda, P. G.; Craik, D. J. Racemic and Quasi-Racemic X-Ray Structures of Cyclic Disulfide-Rich Peptide Drug Scaffolds. Angew Chem Int Ed Engl 2014, 53, 11236-11241.

(11) Buchner, L.; Guntert, P. Systematic Evaluation of Combined Automated Noe Assignment and Structure Calculation with Cyana. J Biomol NMR 2015, 62, 81-95.

(12) Lee, W.; Tonelli, M.; Markley, J. L. Nmrfam-Sparky: Enhanced Software for Biomolecular Nmr Spectroscopy. Bioinformatics 2015, 31, 1325-1327.

(13) Lee, W.; Petit, C. M.; Cornilescu, G.; Stark, J. L.; Markley, J. L. The Audana Algorithm for Automated Protein 3d Structure Determination from Nmr Noe Data. J Biomol NMR 2016, 65, 51-57.

(14) Wurz, J. M.; Kazemi, S.; Schmidt, E.; Bagaria, A.; Guntert, P. Nmr-Based Automated Protein Structure Determination. Arch Biochem Biophys 2017, 628, 24-32.

(15) Kontaxis, G.; Delaglio, F.; Bax, A. Molecular Fragment Replacement Approach to Protein Structure Determination by Chemical Shift and Dipolar Homology Database Mining. Methods Enzymol 2005, 394, 42-78.

(16) Cavalli, A.; Salvatella, X.; Dobson, C. M.; Vendruscolo, M. Protein Structure Determination from Nmr Chemical Shifts. Proc Natl Acad Sci U S A 2007, 104, 9615-9620.

(17) Shen, Y.; Lange, O.; Delaglio, F.; Rossi, P.; Aramini, J. M.; Liu, G.; Eletsky, A.; Wu, Y.; Singarapu, K. K.; Lemak, A. et al. Consistent Blind Protein Structure Generation from Nmr Chemical Shift Data. Proc Natl Acad Sci U S A 2008, 105, 4685-4690.

(18) Wishart, D. S.; Arndt, D.; Berjanskii, M.; Tang, P.; Zhou, J.; Lin, G. Cs23d: A Web Server for Rapid Protein Structure Generation Using Nmr Chemical Shifts and Sequence Data. Nucleic Acids Res 2008, 36, W496-502.

(19) Vila, J. A.; Aramini, J. M.; Rossi, P.; Kuzin, A.; Su, M.; Seetharaman, J.; Xiao, R.; Tong, L.; Montelione, G. T.; Scheraga, H. A. Quantum Chemical 13c(Alpha) Chemical Shift Calculations for Protein Nmr Structure Determination, Refinement, and Validation. Proc Natl Acad Sci U S A 2008, 105, 14389-14394.

(20) Wylie, B. J.; Sperling, L. J.; Nieuwkoop, A. J.; Franks, W. T.; Oldfield, E.; Rienstra, C. M. Ultrahigh Resolution Protein Structures Using Nmr Chemical Shift Tensors. Proc Natl Acad Sci U S A 2011, 108, 16974-16979.

(21) Nerli, S.; McShan, A. C.; Sgourakis, N. G. Chemical Shift-Based Methods in Nmr Structure Determination. Prog Nucl Magn Reson Spectrosc 2018, 106-107, 1-25.

(22) Raman, S.; Lange, O. F.; Rossi, P.; Tyka, M.; Wang, X.; Aramini, J.; Liu, G.; Ramelot, T. A.; Eletsky, A.; Szyperski, T. et al. Nmr Structure Determination for Larger Proteins Using Backbone-Only Data. Science 2010, 327, 1014-1018.

(23) Thompson, J. M.; Sgourakis, N. G.; Liu, G.; Rossi, P.; Tang, Y.; Mills, J. L.; Szyperski, T.; Montelione, G. T.; Baker, D. Accurate Protein Structure Modeling Using Sparse Nmr Data and Homologous Structure Information. Proc Natl Acad Sci U S A 2012, 109, 9875-9880.

(24) Schmitz, C.; Vernon, R.; Otting, G.; Baker, D.; Huber, T. Protein Structure Determination from Pseudocontact Shifts Using Rosetta. J Mol Biol 2012, 416, 668-677.

(25) Hartlmuller, C.; Gobl, C.; Madl, T. Prediction of Protein Structure Using Surface Accessibility Data. Angew Chem Int Ed Engl 2016, 55, 11970-11974.

(26) Ovchinnikov, S.; Park, H.; Kim, D. E.; Liu, Y.; Wang, R. Y.; Baker, D. Structure Prediction Using Sparse Simulated Noe Restraints with Rosetta in Casp11. Proteins 2016, 84 Suppl 1, 181-188.

(27) Wang, C. K.; Swedberg, J. E.; Northfield, S. E.; Craik, D. J. Effects of Cyclization on Peptide Backbone Dynamics. J Phys Chem B 2015, 119, 15821-15830.


(28) Conibear, A. C.; Wang, C. K.; Bi, T.; Rosengren, K. J.; Camarero, J. A.; Craik, D. J. Insights into the Molecular Flexibility of Theta-Defensins by Nmr Relaxation Analysis. J Phys Chem B 2014, 118, 14257-14266.

(29) Bayley, M. J.; Jones, G.; Willett, P.; Williamson, M. P. Genfold: A Genetic Algorithm for Folding Protein Structures Using Nmr Restraints. Protein Sci 1998, 7, 491-499.

(30) Wang, C. K.; Swedberg, J. E.; Harvey, P. J.; Kaas, Q.; Craik, D. J. Conformational Flexibility Is a Determinant of Permeability for Cyclosporin. J Phys Chem B 2018, 122, 2261-2276.

(31) Webb, B.; Sali, A. Protein Structure Modeling with Modeller. Methods Mol Biol 2017, 1654, 39-54.

(32) Neal, S.; Nip, A. M.; Zhang, H.; Wishart, D. S. Rapid and Accurate Calculation of Protein 1h, 13c and 15n Chemical Shifts. J Biomol NMR 2003, 26, 215-240.

(33) Han, B.; Liu, Y.; Ginzinger, S. W.; Wishart, D. S. Shiftx2: Significantly Improved Protein Chemical Shift Prediction. J Biomol NMR 2011, 50, 43-57.

(34) Shen, Y.; Bax, A. Sparta+: A Modest Improvement in Empirical Nmr Chemical Shift Prediction by Means of an Artificial Neural Network. J Biomol NMR 2010, 48, 13-22.

(35) Williams, C. J.; Headd, J. J.; Moriarty, N. W.; Prisant, M. G.; Videau, L. L.; Deis, L. N.; Verma, V.; Keedy, D. A.; Hintze, B. J.; Chen, V. B. et al. Molprobity: More and Better Reference Data for Improved All-Atom Structure Validation. Protein Sci 2018, 27, 293-315.

(36) Ramalho, S. D.; Wang, C. K.; King, G. J.; Byriel, K. A.; Huang, Y. H.; Bolzani, V. S.; Craik, D. J. Synthesis, Racemic X-Ray Crystallographic, and Permeability Studies of Bioactive Orbitides from Jatropha Species. J Nat Prod 2018, 81, 2436-2445.

(37) Yang, Y.; Zhou, Y. Ab Initio Folding of Terminal Segments with Secondary Structures Reveals the Fine Difference between Two Closely Related All-Atom Statistical Energy Functions. Protein Sci 2008, 17, 1212-1219.

(38) Lu, M.; Dousis, A. D.; Ma, J. Opus-Psp: An Orientation-Dependent Statistical All-Atom Potential Derived from Side-Chain Packing. J Mol Biol 2008, 376, 288-301.

(39) Zhang, J.; Zhang, Y. A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction. PLoS One 2010, 5, e15386.

(40) Goransson, U.; Craik, D. J. Disulfide Mapping of the Cyclotide Kalata B1. Chemical Proof of the Cystic Cystine Knot Motif. J Biol Chem 2003, 278, 48188-48196.

(41) Albert, A.; Eksteen, J. J.; Isaksson, J.; Sengee, M.; Hansen, T.; Vasskog, T. General Approach to Determine Disulfide Connectivity in Cysteine-Rich Peptides by Sequential Alkylation on Solid Phase and Mass Spectrometry. Anal Chem 2016, 88, 9539-9546.


(42) Quick, M. M.; Crittenden, C. M.; Rosenberg, J. A.; Brodbelt, J. S. Characterization of Disulfide Linkages in Proteins by 193 Nm Ultraviolet Photodissociation (Uvpd) Mass Spectrometry. Anal Chem 2018, 90, 8523-8530.

(43) Rosengren, K. J.; Daly, N. L.; Plan, M. R.; Waine, C.; Craik, D. J. Twists, Knots, and Rings in Proteins. Structural Definition of the Cyclotide Framework. J Biol Chem 2003, 278, 8606-8616.

(44) Robustelli, P.; Stafford, K. A.; Palmer, A. G., 3rd. Interpreting Protein Structural Dynamics from Nmr Chemical Shifts. J Am Chem Soc 2012, 134, 6365-6374.

(45) Karp, J. M.; Eryilmaz, E.; Cowburn, D. Correlation of Chemical Shifts Predicted by Molecular Dynamics Simulations for Partially Disordered Proteins. J Biomol NMR 2015, 61, 35-45.

(46) Shen, Y.; Bax, A. Homology Modeling of Larger Proteins Guided by Chemical Shifts. Nat Methods 2015, 12, 747-750.

(47) Vernon, R.; Shen, Y.; Baker, D.; Lange, O. F. Improved Chemical Shift Based Fragment Selection for Cs-Rosetta Using Rosetta3 Fragment Picker. J Biomol NMR 2013, 57, 117-127.

(48) Huang, Y. J.; Mao, B.; Xu, F.; Montelione, G. T. Guiding Automated Nmr Structure Determination Using a Global Optimization Metric, the Nmr Dp Score. J Biomol NMR 2015, 62, 439-451.

(49) Boomsma, W.; Tian, P.; Frellsen, J.; Ferkinghoff-Borg, J.; Hamelryck, T.; Lindorff-Larsen, K.; Vendruscolo, M. Equilibrium Simulations of Proteins Using Molecular Fragment Replacement and Nmr Chemical Shifts. Proc Natl Acad Sci U S A 2014, 111, 13852-13857.

(50) Hafsa, N. E.; Berjanskii, M. V.; Arndt, D.; Wishart, D. S. Rapid and Reliable Protein Structure Determination Via Chemical Shift Threading. J Biomol NMR 2018, 70, 33-51.

(51) Bhardwaj, G.; Mulligan, V. K.; Bahl, C. D.; Gilmore, J. M.; Harvey, P. J.; Cheneval, O.; Buchko, G. W.; Pulavarti, S. V.; Kaas, Q.; Eletsky, A. et al. Accurate De Novo Design of Hyperstable Constrained Peptides. Nature 2016, 538, 329-335.

(52) Park, H.; Ovchinnikov, S.; Kim, D. E.; DiMaio, F.; Baker, D. Protein Homology Model Refinement by Large-Scale Energy Optimization. Proc Natl Acad Sci U S A 2018, 115, 3054-3059.

(53) Jiang, F.; Wu, Y. D. Folding of Fourteen Small Proteins with a Residue-Specific Force Field and Replica-Exchange Molecular Dynamics. J Am Chem Soc 2014, 136, 9536-9539.

(54) Wu, H. N.; Jiang, F.; Wu, Y. D. Significantly Improved Protein Folding Thermodynamics Using a Dispersion-Corrected Water Model and a New Residue-Specific Force Field. J Phys Chem Lett 2017, 8, 3199-3205.

(55) Hafsa, N. E.; Arndt, D.; Wishart, D. S. Accessible Surface Area from Nmr Chemical Shifts. J Biomol NMR 2015, 62, 387-401.


Table 1: Analysis of RMSD and Chemical Shift-Based Scores for Selected Examples

Peptide^a                                   RMSD (Å)^b  Weighted Chemical Shift-Based Score^c
                                                        Total     Hα      SS       DFIRE
Lo1a (2MD6)                 Experimental    -           -962.2    205.3   -143.0   -1024.5
                            All Predicted^d 2.31        -1236.1   66.9    -143.0   -1146.0
                            Best Scoring    1.99        -1268.5   99.5    -157.0   -1211.0
                            Best Fit        1.77        -1256.6   64.9    -157.0   -1164.5
Midi (2LU6)                 Experimental    -           -258.5    451.0   0.0      -709.5
                            All Predicted^d 3.37        -1349.7   243.1   -693.7   -709.5
                            Best Scoring    3.48        -1453.2   191.3   -476.0   -898.5
                            Best Fit        3.01        -1251.6   239.4   -628.0   -863.0
Tk-Amp-X2 (2M6A)            Experimental    -           -2561.0   174.5   -362.0   -2373.5
                            All Predicted^d 2.61        -2850.6   109.4   -387.7   -2572.3
                            Best Scoring    3.10        -2960.8   88.2    -406.0   -2643.0
                            Best Fit        1.91        -2823.1   92.9    -362.0   -2554.0
GS-Tamapin (2ME7)           Experimental    -           -2875.5   270.0   -574.0   -2571.5
                            All Predicted^d 5.49        -2348.3   164.4   -418.7   -2094.0
                            Best Scoring    6.59        -2622.6   178.4   -491.0   -2310.0
                            Best Fit        2.67        -2456.5   185.0   -544.0   -2097.5
Agitoxin 2 (1AGT)           Experimental    -           -3112.4   278.2   -678.0   -2712.5
                            All Predicted^d 6.30        -2598.0   205.7   -409.6   -2394.1
                            Best Scoring    4.81        -2838.1   181.9   -483.0   -2537.0
                            Best Fit        4.81        -2838.1   181.9   -483.0   -2537.0
Cellulose Binding (1AZJ)    Experimental    -           -2092.4   348.7   -308.0   -2133.0
                            All Predicted^d 8.64        -2076.9   189.7   -260.8   -2005.2
                            Best Scoring    8.52        -2208.7   201.0   -387.0   -2023.0
                            Best Fit        7.20        -1964.5   213.5   -110.0   -2068.0

a The PDB ID is shown in parentheses.
b RMSD is the root mean squared deviation of backbone heavy atoms between predicted and experimental structures.
c The total, and Hα, secondary structure (SS), and DFIRE terms of the weighted score are shown.
d Average values are shown.
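For readers working with the tabulated scores, the following minimal sketch checks whether each total matches the sum of its Hα, SS, and DFIRE terms. That the total is a plain sum of the three weighted terms is an assumption inferred from the tabulated values rather than something stated in the footnotes, and the function name and tolerance are hypothetical. The rows are copied from the Lo1a (2MD6) entries of Table 1.

```python
# Rows copied from Table 1 for Lo1a (2MD6): (label, total, h_alpha, ss, dfire).
LO1A_ROWS = [
    ("Experimental", -962.2, 205.3, -143.0, -1024.5),
    ("All Predicted", -1236.1, 66.9, -143.0, -1146.0),
    ("Best Scoring", -1268.5, 99.5, -157.0, -1211.0),
    ("Best Fit", -1256.6, 64.9, -157.0, -1164.5),
]


def inconsistent_rows(rows, tol=0.5):
    """Return labels of rows whose total differs from the term sum by more than tol.

    Assumes total = h_alpha + ss + dfire (an assumption, see the lead-in).
    The tolerance absorbs rounding of the tabulated values.
    """
    flagged = []
    for label, total, h_alpha, ss, dfire in rows:
        if abs(total - (h_alpha + ss + dfire)) > tol:
            flagged.append(label)
    return flagged


flagged = inconsistent_rows(LO1A_ROWS)
print(flagged)
```

A flagged row does not by itself indicate an error in the analysis; it may reflect rounding, averaging over the set of predicted structures, or a transcription issue in this copy of the table, and is best checked against the published version.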

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

TOC Graphic