Computational design of a leucine-rich repeat with a predefined geometry

Sebastian Rämischa, Ulrich Weiningerb, Jonas Martinssona, Mikael Akkeb, and Ingemar Andréa,1

Departments of aBiochemistry and Structural Biology and bBiophysical Chemistry, Center for Molecular Protein Science, Lund University, SE-221 00 Lund, Sweden

Edited by David Baker, University of Washington, Seattle, WA, and approved October 30, 2014 (received for review July 17, 2014) Structure-based offers a possibility of optimizing multiple functional sites in a single protein, binding site targeting, or the overall shape of engineered binding scaffolds to match their specific recognition of protein oligomers. targets better. We developed a computational approach for the Leucine-rich repeat (LRR) display a significant varia- structure-based design of repeat proteins that allows for adjustment tion in shape (13, 14). The repeats in this protein class are typically of geometrical features like length, curvature, and helical twist. By composed of 20–30 residues, and they form helically twisted, combining sequence optimization of existing repeats and de novo solenoid-like structures with a continuous parallel β-sheet on design of capping structures, we designed leucine-rich repeats (LRRs) the concave side; they can be elongated or highly curved (2). It has from the ribonuclease inhibitor (RI) family that assemble into been shown that N- and C-terminal capping structures are crucial structures with a predefined geometry. The repeat proteins were for folding and stability of LRR proteins (15, 16). The combination built from self-compatible LRRs that are designed to interact to form of a stable core of framework positions and variability in overall highly curved and planar assemblies. We validated the geometrical assembly structure makes LRRs ideal building blocks for the de- design approach by engineering a ring structure constructed from 10 velopment of repeat proteins with rationally designed shapes. The self-compatible repeats. Protein designcanalsobeusedtoincrease feasibility of engineering LRR proteins has been demonstrated with our structural understanding of repeat proteins. We use our design consensus design applied to repeats from the ribonuclease inhibitor constructs to demonstrate that buried Cys play a central role for (RI) family (17), the nucleotide-binding oligomerization domain stability and folding cooperativity in RI-type LRR proteins. The family (18), and the variable lymphocyte receptors (VLRs). The computational procedure presented here may be used to develop latter yielded a protein-binding scaffold named “repebodies” (19). BIOPHYSICS AND repeat proteins with various geometrical shapes for applications In this work, we developed a computational approach for the COMPUTATIONAL BIOLOGY where greater control of the interface geometry is desired. structure-based design of repeat proteins that allows for adjust- ment of geometrical features like length, curvature, and helical binding scaffold | Rosetta | buried cysteines | twist. We used the method to design a self-compatible LRR that computational protein design | geometrical design assembles into highly curved and planar repeat proteins. The repeats were designed to allow assembly into a closed-ring ngineered protein-binding scaffolds are increasingly used as structure. Variants with five double repeats and added capping Etherapeutics, diagnostic probes, intracellular reporter mole- structures produced stable proteins. In the absence of caps, the cules, or fusion domains in protein crystallization (1). Nature same repeat variants can dimerize to form complete circles (full- provides a large variety of protein recognition scaffolds from ring structures), thereby verifying the correctly designed geom- which engineered systems could be built. Repeat proteins are etry. The results demonstrate that stable proteins with pre- used in a wide range of biological processes, including the im- defined shapes can be built with limited sequence redesign if the mune response and regulatory cascades (2–5). They consist of conformation of the self-compatible repeat is selected carefully. simple, structurally similar building blocks, called repeats, that Additionally, the results highlight the stabilizing effect of buried assemble into elongated tandem arrays (6). Their extended shapes result in proteins with extraordinarily large binding sur- Significance faces, which makes them ideal scaffolds for protein binding. Analogous to , repeat proteins can be divided into Repeat proteins are used in nature to bind to proteins and pep- framework residues, which encode stability and structure, and tides. The shape of their binding surfaces can vary substantially, variable positions, which are responsible for protein recognition even for proteins within the same family. This variability likely (7). A striking difference from antibodies is that the global arose because they evolved to match the proteins they interact structure can vary considerably between repeat proteins, even with geometrically. Repeat proteins are often engineered to de- within a family. This structural variability suggests that not only velop binders specific to new target proteins. It would be highly the directly interacting residues but also the overall shapes of beneficial to design repeat proteins with predefined geometrical these proteins are optimized for binding target molecules. shapes because such a method would enable development of Engineered repeat proteins have typically been developed by engineered repeat proteins that are shape-optimized to their consensus sequence design, a method where highly conserved targets. Here, we demonstrate that repeat proteins with a pre- sequence positions are identified and scaffolds are built from defined shape can be designed using a computational design identical repeats containing the most common residues at those method. The approach is exemplified by the design of a protein positions (8). Consensus design has been successfully applied to that forms a ring structure not seen in nature. create stable scaffolds from several repeat protein classes (9–12). However, this approach does not enable the design of binders with Author contributions: S.R. and I.A. designed research; S.R., U.W., J.M., M.A., and I.A. performed predefined shapes. Because the geometrical shape of an assembly is research; S.R., U.W., M.A., and I.A. analyzed data; and S.R., U.W., M.A., and I.A. wrote the paper. encoded by subtle structural differences between repeats and The authors declare no conflict of interest. interrepeat interfaces, a structure-based design approach is required This article is a PNAS Direct Submission. to design the assembly shape rationally. Optimizing the shape Freely available online through the PNAS open access option. complementarity to a target molecule would enable development of 1To whom correspondence should be addressed. Email: [email protected]. scaffolds that are custom-made for their target proteins and could This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. yield enhanced binding properties, such as simultaneous binding to 1073/pnas.1413638111/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1413638111 PNAS Early Edition | 1of6 Downloaded by guest on September 24, 2021 Cys in the core of RI-type LRR proteins and address the role of AB capping motifs in the folding of repeat proteins. Results To develop an approach for the design of repeat proteins with a defined geometry, we started from the following consid- erations: Repeat proteins can be described as arrays of repeating structural units. However, there are subtle but important dif- ferences between the conformations of individual repeats that encode the compatibility with neighboring repeats as well as the overall shape of the protein. Features like buried hydrogen bonds (e.g., in Asn and Cys ladders, in structural water mole- cules, in contacts formed between interacting loop segments) serve as specificity elements that define the relative orientation of neighboring repeats. Thus, the central question when de- signing shape-optimized repeat proteins is how to engineer these specificity elements accurately. Because nature has already de- veloped a variety of specific interaction geometries, we attempt to adopt these detailed features from repeats with a known structure. In structures of known LRR proteins, the curvature- defining angles between neighboring repeats range from −0.5° to 37.2° and helical twist angles are between −11.2° and 10.5°. The space of available conformations for repeats within a certain family is quite limited and can be sampled from the structures of intact repeat proteins. Because each protein contains many repeats, a large number of possible backbone conformations can be assembled from a small set of protein structures. We developed a computational design method where repeat proteins with predefined shapes are assembled from structur- ally compatible building blocks taken from crystal structures. Subsequently, sequence redesign is used to optimize self- compatibility. A summary of the procedure is shown in Fig. 1A; it consists of the following steps: i) The desired geometry of the protein is defined. ii) A library of structures of individual repeats is compiled from crystal structures of selected repeat proteins. Fig. 1. Overview of the design procedure. (A) Illustration of the general iii workflow for designing repeat proteins (gray) with a predefined shape, ) Repeats that are most likely to assemble with the predefined optimized for a target molecule (rose). (B) Summary of the design as done in geometry are selected from the structural library. In this this study: Native A + B double repeats were extracted from the RI crystal study, we limit ourselves to symmetrical assembly of single structure. Symmetrical docking of each unit (units 1–7) was performed using self-compatible repeats. four different symmetries (C8–C12). The thick black line indicates the best iv) Cycles of rigid body docking and computational sequence de- combination. Sequences of both surface residues (light blue) and core resi- sign are used to optimize the interface between the repeats. dues (green) were optimized; charged residues are shown in red (negative) v) The repeats are covalently connected via loops between and blue (positive). Symmetrical docking verified the planar assembly of 10 consecutive repeats. identically redesigned double repeats. vi) In most cases, capping repeats have to be added to the N- and C-terminal sides of the repeat protein, either by adopt- curvature would dimerize to form a full ring. To the best of our ing existing caps or using de novo design. knowledge, a donut-shaped structure has not been observed As a proof of principle, we applied the design protocol to among LRR proteins. create a protein built from a single type of self-compatible LRR that assembles to form a planar structure with high curvature. Selection of an Optimal Building Block. For creating a small struc- We set a challenging design goal by requiring that the interaction tural library of repeats, we focused on the RI family because the geometry between consecutive repeats would allow for assembly members of this family possess a relatively high curvature and into a closed circular molecule. Once self-compatible repeats are comparably little twist (21). Proteins from the RI family are as- designed that meet those criteria, planar repeat proteins with sembled from two alternating 28-aa and 29-aa repeats, referred various number of repeats could be made. These strict geo- to as A-type and B-type repeats. The different sequence lengths metrical constraints make it possible to address a number of result in subtle conformational differences. A-type and B-type issues. First, the ring form is not dependent on capping repeats. repeats always come in pairs, which suggests that they are not Thus, the question of whether caps are strictly required for self-compatible. We initially attempted to design repeat proteins folding can be addressed. Second, a structure composed of exclusively from either A or B repeats, but the resulting models identical repeating units that does not require caps would be an were poorly packed and had little shape complementarity be- excellent system to deconvolute intra- vs. interrepeat energies tween repeats. Consequently, we used A + B double repeats as in LRR proteins, as previously done using capped consensus structural building blocks and compiled a library of seven repeat protein (20). Third, formation of a ring is a members from the porcine RI (22) (Fig. 1B). stringent test of our ability to control the interaction geometry Using symmetrical docking simulations (23, 24), we then identi- between repeats precisely. If we could first design a stable cap- fied the best candidate double repeat and the optimal unit number ped half-ring, then only a fully planar half-ring with the designed for assembling a ring-shaped structure. By applying different

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111 Rämisch et al. Downloaded by guest on September 24, 2021 Fig. 2. Alignment of internal repeat sequences. Highlighted positions indicate differences between variants; core positions are highlighted in orange, and surface positions are highlighted in green. Positions that were left at their native identity are marked with an asterisk.

rotational symmetries (C8, C9, C10, C11, and C12), closed planar the molecule are required (15, 16, 18). We analyzed the com- ring-shaped structures with varying numbers of repeating units, plementarity of native caps from RI-type LRR proteins to the and hence curvatures, were generated (Fig. 1B). The quality of an designed LRRs by rigid-body docking in Rosetta. The C-terminal assembly was evaluated by the degree of shape complementarity, cap of porcine RI can dock with an almost identical interface as the strength of hydrogen bonds between β-strands, the amount of the one found in native RI, and with similar association energy. void volume, and the binding energy between repeats, as well as Thus, we chose to add the C-terminal 32 aa of porcine RI the distance between the N and C termini of consecutive repeats, without further modifications. On the N-terminal side, we found which needed to be covalently connected. little compatibility between the designed proteins and N-caps Best results were obtained for 10 repeating units (C10) with from all RI-type LRR proteins of known structure, something double repeat number 6 (residues 310–366) as the building block. that could not be corrected by redesigning the interface residues. The crystal structure of the horseshoe-shaped porcine RI has an We chose to design N-caps de novo. Using a combination of accumulated helical twist of 16° and a curvature of 238° over seven interface design using Protein Data Bank ID code 1IO0, and si- double repeats. If extended to a complete ring, the two end multaneously optimizing interface residues within both the cap and repeats would sterically overlap by more than 13 Å and the rise the first internal repeat (Fig. 3A), we obtained a model with in- along the helical axis would be 11 Å. Thus, the geometry of the terface energies similar to those interface energies found in native design model, with a helical twist of 0° and a total curvature of LRRs. Because the first repeat could be either an A-type or B-type BIOPHYSICS AND 360° (252° over seven double repeats), differs considerably from repeat, two N-caps were designed: one coupled to an A-type repeat COMPUTATIONAL BIOLOGY the native RI protein from which the repeat was adopted. and one coupled to a B-type repeat. Ab initio folding simulations of the designed sequences attached to internal repeats revealed Sequence Optimization to Increase Self-Complementarity. To mini- a strong convergence toward the designed conformations, as shown mize the energy of the assembly, we performed cycles of sequence in Fig. 3B. For experimental characterization, we combined five ’ optimization using Rosetta s fixed-backbone design algorithm (25) self-compatible double repeats, based on gdLRR-1 or gdLRR-2, and symmetrical docking. To minimize the risk of aggregation, we with the different caps (Fig. 3D); the four constructs are referred − B enforced a net charge of 3 per double repeat (Fig. 1 ). At the to as gdLRR-1-A-cap, gdLRR-2-A-cap, gdLRR-1-B-cap, and concave side, a repeating electrostatic interaction pattern was gdLRR-2-B-cap. For proteins with B-caps, we inserted an addi- β designed by selecting Lys, Asn, and Ser in the -strand of repeat A tional redesigned N-terminal B-type repeat (Fig. 3D). A struc- and Glu, Ser, and Ser in repeat B. Because the model with the tural model for one of them, gdLRR-1-A-cap, is shown in Fig. 3C. native core sequence already looked promising, and to avoid the risk of removing critical interactions, we tried to minimize se- Biophysical Characterization. All four variants could be overex- quence changes in the core. Fixed-backbone design calculations pressed in Escherichia coli with yields of 10–40 mg of pure pro- suggested that introduction of a Cys residue at position 56 would tein from the soluble fraction of 1 L of culture volume (Fig. 4A). be greatly stabilizing. Cys is highly conserved at position 56 in RI- type proteins, yet it is missing from the native sequence of the sixth RI double repeat. In the RI protein, five of the seven double repeats have a Cys at this position. In our most conservative de- A BC sign, called geometrically designed LRR-1 (gdLRR-1), we adopted the core residues from the sixth RI double repeat, except for position 56, where Thr was replaced by Cys. In both the crystal structure of RI and our design model, Cys56 mediates buried polar contacts. In the final models, the N and C termini of consecutive repeats ended up favorably positioned, so that consecutive repeats could be covalently connected by reintroducing Thr, which had been removed for docking of unconnected repeats. We constructed a second self-compatible repeat, gdLRR-2, based on the core sequence of the previously published RI con- D sensus design (17). This core resulted in good packing with low energy. The consensus sequence lacks Cys56, as well as an addi- tional Cys at position 9, which had been removed for experimental Fig. 3. De novo design of N-terminal caps. (A) Designed interface between reasons. Cys9 is highly conserved in RI repeats, more so than the first internal A-repeat (gray) and the designed N-terminal cap (red). (B) Cys56. Based on the high degree of conservation and favorable Rosetta energy landscape from ab initio structure prediction of the designed N-terminal sequence. The y axis shows energy of 50,000 models, and the x energetics, we introduced Cys9 and Cys56 into the consensus core axis indicates the Cα rmsd to the designed backbone conformation. (C) of gdLRR-2. All designed repeat sequences are shown in Fig. 2. Model of the complete gdLRR-1-A-cap (blue), docked C-terminal cap from Protein Data Bank ID code 1IO0 (red), and de novo designed N-terminal Addition of Capping Motifs. To be able to produce stable and capping motif. (D) Architecture of the final constructs; residues in the first soluble repeat proteins with fewer repeats than needed for ring internal repeat were redesigned together with the N-terminal capping formation, capping structures at the N- and C-terminal ends of motif, as indicated by red dots. Gray boxes represent A + B double repeats.

Rämisch et al. PNAS Early Edition | 3of6 Downloaded by guest on September 24, 2021 The purified proteins are highly soluble, can be kept in solution which probe conformational exchange on the millisecond time at room temperature for several weeks, and can be lyophilized scale. The results reveal that only a few methyl groups display without triggering aggregation. CD spectroscopy showed a broad a significant dispersion in their relaxation profiles, whereas the negative absorption band with a minimum at 221 nm, indicating large majority of residues have flat profiles (Fig. S3). The absence mixed α/β secondary structure content (Fig. 4C and Fig. S1). of relaxation dispersion implies that the particular residue does not Temperature denaturation followed by CD indicates higher exchange between alternative conformations, or that the chemical degrees of cooperativity for the variants with A-caps (Fig. 4C and shifts are identical in the different states. Conformational exchange Fig. S1). The temperature denaturation midpoints (Tm) were is seen for two Leu residues and two of four Tyr residues in the 41 °C for gdLRR-1-A-cap, 45 °C for gdLRR-1-B-cap, 56 °C for protein (Fig. S3).TwoTyrresiduesarelocatedineachofthecaps; gdLRR-2-A-cap, and 58 °C for gdLRR-2-B-cap. thus, at least one of the caps samples an alternative, high-energy More detailed analysis was carried out for one of the constructs, state with a relative population of ∼3%. For most methyl groups gdLRR-2-A-cap. FTIR spectroscopy was used to study secondary that do not show exchange, the chemical shift is clearly indicative of structure formation. The second-derivative FTIR spectrum showed a well-structured environment. Therefore, we conclude that the characteristic bands in the amide-I region that indicate the presence core of the protein is not in exchange with an unfolded species on − − of both β-strands (1,621 cm 1 and 1,631 cm 1)andtheα-helix the millisecond time scale. − (1,655 cm 1). As a reference system, we used a highly soluble but unfolded design variant, gdLRR-3-A-cap, which is described in more Dimerization of gdLRR-2. After developing a stable assembly of self- detail below. The second-derivative FTIR spectrum of this protein compatible repeats with caps, we wanted to determine whether the − showed a strong signal at 1,648 cm 1, which is indicative of a random interaction geometry of the repeat was correctly encoded. In the coil (28) (Fig. 4B). Thus, FTIR analysis demonstrates that gdLRR-2- absence of both caps, only a structure with no twist and the correct A-cap is a well-folded protein of mixed α/β secondary structure. curvature is expected to form a dimeric complex of two five- To characterize the tertiary structure of the folded state further, double repeat half-rings. We created a construct consisting ex- we studied the protein with heteronuclear 1H–13C NMR spec- clusively of five gdLRR-2 double repeats for experimental testing. troscopy. The dispersion of methyl groups in NMR spectra is an CD and FTIR spectra of this protein were similar to the capped variants (gdLRR-2-A-cap and gdLRR-2-B-cap) (Fig. 5 A and B). excellent probe for the presence of tertiary structure. Without Dynamic light scattering (DLS) experiments suggested a diameter structure formation, all methyl groups of the same type cluster at of the purified protein of ∼80 Å, which is very close to the longest chemical shifts that are characteristic of a random coil. In a folded interatomic distance within the model of the ring (77 Å). Ana- protein, a distinct chemical environment of individual methyl lytical ultracentrifugation showed dimers, but no trace of mono- groups leads to different chemical shifts resulting in a dispersion of mers in solution (Fig. S4). To validate the structural model signals. We expressed gdLRR-2-A-cap and, as a reference, the 13 further, we studied the dimers with negative-stain EM. As shown unfolded gdLRR-3-A-cap (see below) with [1- C]-glucose as the in Fig. 5C, the micrographs demonstrate that the dimers form sole carbon source. Expression results in specific labeling of iso- donut-shaped structures with an outer diameter of ∼80 Å and an lated sites, including methyl groups (29). In agreement with the – 1 –13 inner diameter of 20 25 Å. These dimensions are in agreement FTIR results, the 2D H C NMR spectrum of gdLRR-3-A-cap with the results from DLS and the structural model (Fig. 5D). is characteristic of a random coil, with no more than 5% folded D Temperature denaturation followed by CD revealed an apparent conformation (Fig. 4 , blue contours), whereas gdLRR-2-A-cap two-state behavior with a high degree of unfolding cooperativity shows methyl signals that are well dispersed, indicating an intact (Fig. 5A). The dimer refolds upon cooling (Fig. S5) and has a T of D m tertiary structure for at least 99% of the ensemble (Fig. 4 ,green 86 °C, which is significantly higher than the T of the capped var- 1 –13 m contours). We further compared the experimental H Cspec- iants. This result suggests that disassembly and unfolding happen trum of gdLRR-2-A-cap with a theoretical spectrum, calculated simultaneously; dissociation into stable monomers below 86 °C is for the Rosetta model using SHIFTX2 (30) (Fig. S2). In general, unlikely because the capped monomer starts to unfold above 60 °C the experimental chemical shift dispersion compares favorably and uncapped monomers are expected to be even less stable. The with the predicted spectrum, although a limited number of well- successful formation of a planar ring structure with the designed dispersed predicted signals from residues in, or near, the cap curvature does not exclude that the capped variants adopt a differ- regions are missing in the experimental spectrum. By contrast, the ent geometry. However, because the capped monomers have a well- spectrum predicted for an intrinsically disordered protein of the defined structure and the closed-ring structure of the noncapped same sequence, calculated using the ncIDP program (31), shows dimer is very stable, we believe that energetically costly large-scale no similarity to the experimental spectrum (Fig. S2 B–D). Taken rearrangements upon dimerization are unlikely. together, these results firmly support the conclusion that gdLRR- 2-A-cap forms a well-defined structure in solution. Cys56 Stabilizes the Folded State of a Designed LRR. The successful To investigate the extent of structural flexibility of gdLRR-2- designs were created by introduction of Cys into a conservatively A-cap, we carried out NMR relaxation dispersion experiments, redesigned core (Fig. 6A). However, we wanted to investigate

A CD Fig. 4. Biophysical characterization of gdLRR-2-A- cap. (A) Overexpression of gdLRR-4-A-cap. P, pellet of insoluble fraction; S, soluble fraction. (B) Second- derivative FTIR spectra of gdLRR-2-A-cap (green) and gdLRR-3-A-cap (blue). Greek letters indicate the cor- B responding secondary structure. (C, Top)Far-UVCD spectrum of gdLRR-2-A-cap. (C, Bottom) Temperature denaturation monitored by ellipticity at 222 nm. Dots indicate measured values, and the solid line shows a fit to a two-state model. (D) Methyl regions of the 1H–13C heteronuclear single quantum coherence spectra of gdLRR-2-A-cap (green) and gdLRR-3-A-cap (blue).

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111 Rämisch et al. Downloaded by guest on September 24, 2021 AB conformations. With limited redesign of the core, the backbone conformation of the designed repeat is most likely close to the template, and hence the design model. The experimental analysis demonstrates that changing a single in the core se- quence of the template repeat was sufficient to obtain a folded and stable protein, gdLRR-1. Moreover, limited redesign signif- icantly increased stability and folding cooperativity (gdLRR-2). Both results highlight that the crucial step in designing RI repeat proteins was the selection of an optimal building block, rather than introducing new interactions between building blocks by CD extensive redesign of the core. At some positions, there is more tolerance for introducing amino acids that are rare within a particular LRR family, like Val20 and Ile44 in gdLRR-1 or Ala11 in gdLRR-3-C56. How- ever, highly conserved positions are likely to be more sensitive to substitutions, for instance, Cys9. The structural role of Cys at position 9 of LRR proteins (C10 in the most common numbering scheme) has been extensively described (32, 33). It is part of an Asn/Cys ladder, which is located immediately following the β-strand, where it connects neighboring repeats via side-chain– to–main-chain hydrogen bonds. When comparing gdLRR-3-A- cap-C56, which lacks Cys9, and gdLRR-2-A-cap, which does Fig. 5. Structure analysis of the cap-deficient gdLRR-2. (A) Thermal have Cys9, there is a considerable difference in unfolding unfolding monitored by ellipticity at 222 nm. Dots represent the data, and cooperativity between these constructs. The increased coopera- the solid line shows the fitted function. (Inset) Far-UV CD spectra during the thermal melting experiment; colors indicate the temperature from dark blue tivity of gdLRR-2-A-cap gives some experimental evidence that (15 °C) to dark red (95 °C). (B) Second-derivative FTIR spectrum. Greek letters the Asn/Cys ladder may be important for encoding folding indicate secondary structure. (C) Negative-stain EM images at a magnifica- cooperativity in RI-type LRR proteins. tion of 55,000×.(D) Model of the assembled gdLRR-2 dimer; one monomer is Less attention has been paid to Cys on the descending loops of BIOPHYSICS AND

colored in blue, and one monomer is colored in red. A-type repeats, where a cleft to the preceding B-type repeat is COMPUTATIONAL BIOLOGY found. In RI, as well as in our design models, Cys56 appears to stabilize the conformation of the descending loop by forming hy- whether a stable and folded repeat protein could be designed drogen bonds either to the backbone nitrogen at position i + 2or without buried Cys. To this end, we carried out fixed-backbone to carbonyl oxygen in the neighboring repeat’s helix cap (Fig. 6A). design of the core, while disallowing Cys. Because the core of Polar interactions at this site are likely to be important, because the RI protein is poorly packed, we attempted to stabilize the Cys56 is substituted by Thr or Ser in some repeats. We observed designed repeats by adding more hydrophobic residues. We a significant increase in secondary structure content and unfolding constructed two new sequence variants (gdLRR-3 and gdLRR-4; cooperativity by mutating Ile56 to Cys in gdLRR-3. This obser- Fig. 2) with more favorable energies compared with gdLRR-1 vation is an experimental indication that polar interactions at this and gdLRR-2. In both constructs, the core Cys9 and Cys56 were site are essential for stability. It has been suggested that de- replaced by Thr and Ile, respectively. The hydrophobic Ile fits stabilization of single repeats within an assembly is essential for very well into the cleft between the descending loops of two the function of repeat proteins by introducing flexibility into the adjacent A-type and B-type repeats and forms favorable contacts protein (34–36). Perhaps this need for selective instability explains to other core residues. For experimental testing, we constructed the absence of highly conserved Cys in some native repeats like four capped constructs with five self-compatible double repeats, the template repeat used in this study. Selective mutations of core named gdLRR-3-A-cap, gdLRR-3-B-cap, gdLRR-4-A-cap, and Cys may provide a means for selectively destabilizing specific gdLRR-4-B-cap. repeats in designed binder proteins. All four constructs could be expressed and purified to high Two previously reported consensus sequence designs of LRR yields. Analysis by CD spectroscopy indicated that all constructs proteins were based on VLR (19) and RI-type repeats (17). were unfolded (Fig. S6). FTIR and NMR spectra of gdLRR-3-A- VLRs consist only of 24 residues, which does not allow for for- B cap confirmed a lack of secondary and tertiary structure (Fig. 3 mation of canonical α-helices on the convex side of the and D). The gdLRR-2 and gdLRR-3 repeats differ only at three core positions, two of which are positions 9 and 56. We spec- ulated that the constructs based on gdLRR-3 were unfolded due to the absence of either one or both core Cys. To test this AB hypothesis, we mutated Ile56 to Cys in gdLRR-3-A-cap. The resulting protein, named gdLRR-3-A-cap-I56C, has a CD spec- trum similar to the CD spectra of the folded variants (Fig. 6B). Thus, a single mutation in each repeat seems to induce sec- ondary structure formation, which demonstrates the stabilizing effect of Cys56. Discussion A key for designing shape-optimized repeat proteins is to control accurately the subtle structural features that encode the overall Fig. 6. Roles of buried Cys at positions 9 and 56. (A) Structural represen- tation of Cys9 (Top) and Cys56 (Bottom) in the model of gdLRR-2. Yellow ...... shape of these molecules. Our approach is to adopt these de- dashed lines indicate N HO and N H-N H-bonds, pink dashed lines indicate tailed features from known repeat structures. The benefit of this Cys H-bonds. (B) Comparison of CD spectra (Left) and temperature de- conservative approach to shape-controlled engineering of repeat naturation curves followed by ellipticity at 222 nm (Right) of gdLRR-3-A-cap proteins is that it limits the need for precise design of backbone (blue) and gdLRR-3-A-cap-I56C (green).

Rämisch et al. PNAS Early Edition | 5of6 Downloaded by guest on September 24, 2021 horseshoe. The shorter repeat sequence results in a protein with blocks for cases where single self-compatible repeats are not less void volume, which may explain the higher thermal stability sufficient to produce a desired shape. However, not all shapes of the designed VLR compared with RI repeats. The RI con- could be designed by this approach because of the finite vari- sensus design unfolds with low cooperativity (17). There is ability in repeat conformations. Furthermore, repeats with a low a strong similarity between the published CD spectra of the five- amount of well-defined secondary structure elements are not repeat consensus design and the nonfolded gdLRR-3 and amenable to fixed-backbone design, because sequence changes gdLRR-4 variants. The folded gdLRR-2 variants differ from the are likely to cause structural changes in loop regions. Never- consensus sequence design at two core positions, Cys9 and theless, there should be a sufficient number of structured repeats Cys56. Our results suggest that introduction of these residues to enable the design of a wide range of shape-optimized protein into the consensus sequence design may improve stability and binders and biomaterials. folding cooperativity. However, the different capping repeats may also play a role in explaining the different biophysical Materials and Methods properties compared to our constructs. All genes were codon-optimized for expression in E. coli and synthesized by It has been shown that the LRR protein internalin-B has a po- Genscript. Proteins were expressed in LB for 4 h at 37 °C and purified by His- larized folding pathway, starting from the N-terminal capping tag affinity chromatography, His-tag removal by tobacco etch virus protease motif (15). A capless ring could be used to investigate whether cleavage, and gel filtration. Isotopically labeled proteins were expressed in caps are crucial for initiating folding or merely for stabilizing the M9 minimal medium supplemented with [1-13C]-glucose. Fixed-backbone folded state. Here, we show that a dimeric full-ring construct is design and symmetrical docking simulations were performed using ++ stably folded without caps. Although shielding the hydrophobic Rosetta and Rosetta3. N-caps were designed using RosettaScripts and core is almost certainly crucial for stability, specific capping motifs RosettaRemodel, and convergence of the designed sequence toward the de- sign model was tested by ab initio structure prediction in Rosetta3. Curvatures may not be required for initiating folding of LRR proteins. and helical twists were calculated using Angulator (37). Detailed descriptions Here, we present a general design approach for development of all methods and all sequences are given in SI Materials and Methods. of geometrically optimized repeat proteins built from self- compatible repeats. The proteins developed in this study were ACKNOWLEDGMENTS. We thank Andreas Barth (Stockholm University) for designed to assemble with a repeat-repeat interface geometry help with FTIR, the Lund Protein Production Platform (Lund University, defined by cyclical symmetry. However, a wide variety of shapes www.lu.se/lp3) for protein production, Eva-Christina Ahlgren and Gunnel Karlsson for help with EM, and Sarel Fleischman for valuable comments on can be designed by applying helical symmetry, which can readily the manuscript. The work was supported by the Swedish Research Council be implemented into the design process. The design method (Vetenskapsrådet), Crafoord Foundation, and Defense Advanced Research could also be extended to include multiple different building Projects Agency (Subcontract 747458).

1. Boersma YL, Plückthun A (2011) and other repeat protein scaffolds: Ad- 20. Aksel T, Majumdar A, Barrick D (2011) The contribution of entropy, enthalpy, and vances in engineering and applications. Curr Opin Biotechnol 22(6):849–857. hydrophobic desolvation to cooperativity in repeat-. Structure 19(3): 2. Kobe B, Kajava AV (2001) The leucine-rich repeat as a protein recognition motif. Curr 349–360. Opin Struct Biol 11(6):725–732. 21. Kobe B, Deisenhofer J (1993) Crystal structure of porcine ribonuclease inhibitor, ’ 3. D Andrea LD, Regan L (2003) TPR proteins: The versatile helix. Trends Biochem Sci a protein with leucine-rich repeats. Nature 366(6457):751–756. – 28(12):655 662. 22. Kobe B, Deisenhofer J (1996) Mechanism of ribonuclease inhibition by ribonuclease 4. Li J, Mahajan A, Tsai M-D (2006) Ankyrin repeat: A unique motif mediating protein- inhibitor protein based on the crystal structure of its complex with ribonuclease A. protein interactions. Biochemistry 45(51):15168–15178. J Mol Biol 264(5):1028–1043. 5. Al-Khodor S, Price CT, Kalia A, Abu Kwaik Y (2010) Functional diversity of ankyrin 23. André I, Bradley P, Wang C, Baker D (2007) Prediction of the structure of symmetrical repeats in microbial proteins. Trends Microbiol 18(3):132–139. protein assemblies. Proc Natl Acad Sci USA 104(45):17656–17661. 6. Kajava AV (2012) Tandem repeats in proteins: From sequence to structure. J Struct 24. DiMaio F, Leaver-Fay A, Bradley P, Baker D, André I (2011) Modeling symmetric Biol 179(3):279–288. macromolecular structures in Rosetta3. PLoS ONE 6(6):e20450. 7. Main ERG, Phillips JJ, Millership C (2013) Repeat protein engineering: Creating 25. Das R, Baker D (2008) Macromolecular modeling with rosetta. Annu Rev Biochem 77: functional nanostructures/biomaterials from modular building blocks. Biochem Soc – Trans 41(5):1152–1158. 363 382. 8. Forrer P, Stumpp MT, Binz HK, Plückthun A (2003) A novel strategy to design binding 26. Fleishman SJ, et al. (2011) RosettaScripts: A scripting language interface to the Ro- molecules harnessing the modular nature of repeat proteins. FEBS Lett 539(1-3):2–6. setta macromolecular modeling suite. PLoS ONE 6(6):e20161. 9. Main ERG, Xiong Y, Cocco MJ, D’Andrea L, Regan L (2003) Design of stable α-helical 27. Huang P-S, et al. (2011) RosettaRemodel: A generalized framework for flexible arrays from an idealized TPR motif. Structure 11(5):497–508. backbone protein design. PLoS ONE 6(8):e24109. 10. Kohl A, et al. (2003) Designed to be stable: Crystal structure of a consensus ankyrin 28. Kong J, Yu S (2007) Fourier transform infrared spectroscopic analysis of protein sec- repeat protein. Proc Natl Acad Sci USA 100(4):1700–1705. ondary structures. Acta Biochim Biophys Sin (Shanghai) 39(8):549–559. 11. Binz HK, Stumpp MT, Forrer P, Amstutz P, Plückthun A (2003) Designing repeat 29. Lundström P, et al. (2007) Fractional 13C enrichment of isolated carbons using [1-13C]- proteins: Well-expressed, soluble and stable proteins from combinatorial libraries of or [2-13C]-glucose facilitates the accurate measurement of dynamics at backbone consensus ankyrin repeat proteins. J Mol Biol 332(2):489–503. Calpha and side-chain methyl positions in proteins. J Biomol NMR 38(3):199–212. 12. Urvoas A, et al. (2010) Design, production and molecular structure of a new family of 30. Han B, Liu Y, Ginzinger SW, Wishart DS (2011) SHIFTX2: Significantly improved pro- artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like tein chemical shift prediction. J Biomol NMR 50(1):43–57. repeats. J Mol Biol 404(2):307–327. 31. Tamiola K, Acar B, Mulder FAA (2010) Sequence-specific random coil chemical shifts 13. Enkhbayar P, Kamiya M, Osaki M, Matsumoto T, Matsushima N (2004) Structural of intrinsically disordered proteins. J Am Chem Soc 132(51):18000–18003. – principles of leucine-rich repeat (LRR) proteins. Proteins 54(3):394 403. 32. Kobe B, Deisenhofer J (1995) Proteins with leucine-rich repeats. Curr Opin Struct Biol 14. Kajava AV (1998) Structural diversity of leucine-rich repeat proteins. J Mol Biol 277(3): 5(3):409–416. – 519 527. 33. Buchanan SGSC, Gay NJ (1996) Structural and functional diversity in the leucine-rich 15. Courtemanche N, Barrick D (2008) The leucine-rich repeat domain of Internalin B repeat family of proteins. Prog Biophys Mol Biol 65(1-2):1–44. folds along a polarized N-terminal pathway. Structure 16(5):705–714. 34. Truhlar SME, Torpey JW, Komives EA (2006) Regions of IkappaBalpha that are critical 16. Dao TP, Majumdar A, Barrick D (2014) Capping motifs stabilize the leucine-rich repeat for its inhibition of NF-kappaB.DNA interaction fold upon binding to NF-kappaB. Proc protein PP32 and rigidify adjacent repeats. Protein Sci 23(6):801–11. Natl Acad Sci USA 103(50):18951–18956. 17. Stumpp MT, Forrer P, Binz HK, Plückthun A (2003) Designing repeat proteins: Mod- 35. Croy CH, Bergqvist S, Huxford T, Ghosh G, Komives EA (2004) Biophysical character- ular leucine-rich repeat protein libraries based on the mammalian ribonuclease in- hibitor family. J Mol Biol 332(2):471–487. ization of the free IkappaBalpha ankyrin repeat domain in solution. Protein Sci 13(7): – 18. Parker R, Mercedes-Camacho A, Grove TZ (2014) Consensus design of a NOD receptor 1767 1777. leucine rich repeat domain with binding affinity for a muramyl dipeptide, a bacterial 36. Kloss E, Courtemanche N, Barrick D (2008) Repeat-protein folding: New insights into cell wall fragment. Protein Sci 23(6):790–800. origins of cooperativity, stability, and topology. Arch Biochem Biophys 469(1):83–99. 19. Lee S-C, et al. (2012) Design of a binding scaffold based on variable lymphocyte re- 37. Bublitz M, et al. (2008) Crystal structure and standardized geometric analysis of InlJ, ceptors of jawless vertebrates by module engineering. Proc Natl Acad Sci USA 109(9): a listerial virulence factor and leucine-rich repeat protein with a novel cysteine ladder. 3299–3304. J Mol Biol 378(1):87–96.

6of6 | www.pnas.org/cgi/doi/10.1073/pnas.1413638111 Rämisch et al. Downloaded by guest on September 24, 2021