
"SynRFR_PNAS_FINAL_revision" | 2016/6/16 | 16:47 | page 1 | #1

A new class of synthetic beta-solenoid proteins with the fragment-free computational design of a beta-hairpin extension

James T. MacDonald*†, Burak V. Kabasakal‡, David Godding‡§, Sebastian Kraatz‡¶, Louie Henderson‡, James Barber‡, Paul S. Freemont*† and James W. Murray‡

*Centre for Synthetic Biology and Innovation, Imperial College London, London, SW7 2AZ, UK; †Department of Medicine, Imperial College London, London, SW7 2AZ, UK; ‡Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK; §Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK; and ¶Laboratory of Biomolecular Research, Paul Scherrer Institut, CH-5232 Villigen PSI, Switzerland

Submitted to Proceedings of the National Academy of Sciences of the United States of America

The ability to design and construct structures with atomic-level precision is one of the key goals of nanotechnology. Proteins offer an attractive target for atomic design, as they can be synthesized chemically or biologically and can self-assemble. However, the generalized protein folding and design problem is unsolved. One approach to simplifying the problem is to use a repetitive protein as a scaffold. Repeat proteins are intrinsically modular, and their folding and structures are better understood than those of large globular domains. Here, we have developed a new class of synthetic repeat protein based on the pentapeptide repeat family of beta-solenoid proteins. We have constructed length variants of the basic scaffold and computationally designed de novo loops projecting from the scaffold core. The experimentally solved 3.56 Å resolution crystal structure of one designed loop closely matches the designed hairpin structure, demonstrating the computational design of a backbone extension onto a synthetic protein core without the use of backbone fragments from known structures. Two other loop designs were not clearly resolved in the crystal structures, and one loop appeared to be in an incorrect conformation. We have also shown that the repeat unit can accommodate whole-domain insertions by inserting a domain into one of the designed loops.

computational protein design | synthetic repeat proteins | de novo backbone design

Abbreviations: RFR, repeat five residues; RMSD, root-mean-square deviation

Significance

The development of algorithms to design new proteins with backbone plasticity is a key challenge in computational protein design. In this paper, we describe a novel class of extensible synthetic repeat protein scaffolds with computationally designed variable loops projecting from the central core. We have developed new methods to computationally sample backbone conformations using a coarse-grained potential energy function without using backbone fragments from known protein structures. This was combined with existing methods for sequence design to successfully design a loop at atomic-level precision. Given the inherent modular and composable nature of repeat proteins, this approach allows the iterative atomic-resolution design of complex structures with potential applications in novel nanomaterials and molecular recognition.

During the course of evolution, natural proteins may be recruited to new, unrelated functions that confer a selective advantage on the organism [1,2]. This accretion of new features and functions is likely to have left behind complex, interlocking amino acid dependencies that can make reengineering natural proteins difficult and unpredictable [3]. For this reason, we and others hypothesize that it is more desirable to design de novo proteins, as these provide a biologically neutral platform onto which functional elements can be grafted [4]. Artificial proteins have been designed by decoding simple residue patterning rules that govern the packing of secondary structural elements, and this has been particularly successful for α-helical bundle proteins [5–7]. An alternative approach is to assemble de novo folds from backbone fragments of known structures or from idealized secondary structural elements, and then use computational protein design methods to design the sequence [4,8–10]. Both the computational and the simpler rules-based design approaches have concentrated on designing proteins consisting of canonical secondary structure linked by loops of minimal length.

A class of proteins that has attracted considerable interest is artificial proteins based on repeating structural motifs, owing to their intrinsic modularity and designability [11]. Repeat proteins have applications including their use as novel nanomaterials [12–14] and as scaffolds for molecular recognition [15, 16]. These proteins may be designed using either sequence consensus-based rules [17] or computational protein design methods [18, 19]. There are a number of families of beta-helical repeat proteins [20], from which we chose the pentapeptide repeat family, forming the RFR-fold (repeat five residues), which has a square cross-sectional profile, as the basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21].

The RFR-fold has a number of properties that make it attractive as a substrate for design. The structure is unusually regular, yet it tolerates a wide range of residues on the outside of the solenoid barrel. The solenoids in natural RFR-fold proteins are nearly straight, in contrast to several other forms of repeat protein, such as the leucine-rich repeat (LRR), which are highly curved. There are examples of natural RFR-fold proteins with loop extensions projecting from the barrel, making this class of protein particularly suitable for functionalization. The protein is similar in diameter to DNA, and some RFR-fold proteins are thought to act as DNA mimics [22]. Here, we have designed and solved the structures of a number of artificial RFR-fold proteins of different lengths.

Previously, computationally designed enzymes have reused backbone scaffolds from known natural proteins [23–25], although artificial helical bundle proteins have been functionalized using an intuitive manual design process [26–28]. As the field of enzyme design becomes more ambitious, consideration of backbone plasticity is likely to become increasingly important [29]. Backbone conformations from solved protein structures are guaranteed to be designable, as there is at least one sequence known to fold into each such structure. However, this is unlikely to be true for an arbitrary backbone conformation. The incorporation of backbone flexibility in protein design has been recognized as a key challenge in computational protein design [30], with current methods typically reusing backbone fragments from other known protein structures [31, 32]. Recently, we have developed algorithms to rapidly sample loop conformations using a coarse-grained Cα model [33] and to accurately reconstruct protein backbones [34], as part of an approach that often gave sub-Å RMSD loop predictions [35]. In this paper, we have applied these techniques to de novo backbone design without using fragments from known protein structures, while also explicitly considering alternative conformational states. We were able to solve the structures of four loop design proteins using X-ray crystallography and show that one of these structures matched the design with atomic-level accuracy.

Results

Design of synthetic RFR-fold proteins of variable length. Residue frequency tables were derived from known RFR-fold proteins for each of the five positions in the repeat, giving the consensus sequence ADLSG (Fig. 1C and SI Appendix, Table S2). A 120-residue stochastic repeat sequence (24 repeats, or six superhelical turns) was drawn […]

[…] grained Cα potential energy function and then reconstructs the other backbone atoms using a structural alphabet-based algorithm [34]. The Cα potential energy function includes pseudo-bond length, bond angle, and dihedral terms to ensure good local structure, together with soft steric repulsive and pseudo-hydrogen bonding terms. Loop conformations were sampled by successive simulated annealing Monte Carlo runs followed by full backbone reconstruction. Previously, this method was successfully applied to loop prediction, giving results comparable to fragment replacement-based methods despite the sequence independence of the initial backbone conformational sampling [35]. Coarse-grained loop sampling was followed by sequence design using Rosetta [36] on each of the conformations to generate full-atom models.

Selection of designed loops. A significant proportion of the 4000 conformations were likely not designable, so we developed an approach that explicitly considered alternative low-energy conformational states in order to filter out bad designs. Each of the 4000 designed sequences was threaded onto each of the 4000 loop conformations and then gradient-minimized in the Rosetta force field, with the resulting energy and RMSD to the designed structure recorded (Fig. 2 B and C). Assuming that we had sampled the important low-energy states, we filtered the designs based on the probability that a design is in a folded state, $P_i > 0.9$ (equation [1]; Fig. 2D), calculated using the Boltzmann distribution, and on other criteria (see Methods). The criterion $P_i > 0.9$ alone removed 97.9% of designs.