Designed Armadillo Repeat Proteins: Library Generation, Characterization and Selection of Peptide Binders with High Specificity
Total Page:16
File Type:pdf, Size:1020Kb
Designed Armadillo Repeat Proteins: Library Generation, Characterization and Selection of Peptide Binders with High Specificity Gautham Varadamsetty, Dirk Tremmel, Simon Hansen, Fabio Parmeggiani and Andreas Plückthun Department of Biochemistry, University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland Correspondence to Andreas Plückthun: [email protected] http://dx.doi.org/10.1016/j.jmb.2012.08.029 Edited by S. Koide Abstract Designed Armadillo repeat proteins (ArmRPs) are a novel class of binding proteins intended for general modular peptide binding and have very favorable expression and stability properties. Using a combination of sequence and structural consensus analyses, we generated a 42‐amino‐acid designed Armadillo repeat module with six randomized positions, having a theoretical diversity of 9.9×106 per repeat. Structural considerations were used to replace cysteine residues, to define less conserved positions and to decide where to introduce randomized amino acid residues for potential interactions with the target peptide. Based on these concepts, combinatorial libraries of designed ArmRPs were assembled. The most stable version of designed ArmRP in library format was the N5C format, with three randomized library repeat modules flanked by full consensus repeat modules on either side and, in turn, flanked by N- and C-terminal capping repeats. Unselected members of this library were well expressed in the Escherichia coli cytoplasm, monomeric and showed the expected CD spectra and cooperative unfolding. N5C libraries were used in ribosome display selections against the peptide neurotensin. Highly specific peptide binders were enriched after four rounds of selections using ribosome display. Four peptide side chains were shown to contribute most of the interaction energy, and single alanine mutants could be discriminated. Thus, designed ArmRP libraries can become valuable sources for peptide binding molecules because of their favorable biophysical properties and with a potential for application in general modular peptide recognition. © 2012 Elsevier Ltd. All rights reserved. Introduction identification by protein chip technology or affinity chromatography.4,5 Moreover, many posttransla- In the past two decades, numerous protein tional modifications (e.g., phosphorylation, acetyla- scaffolds1–3 were explored for the generation of tion and methylation) are within extended regions of designed binding molecules using both rational and proteins. combinatorial approaches. However, no generic The binding of a flexible peptide chain is accom- peptide-binding scaffold with high specificity and panied by a loss in entropy. This energetic cost must affinity has yet been reported. Most importantly, no be compensated by the formation of specific in- attempt has been made to exploit the modular teractions. The ability to form hydrogen bonds or structure of extended peptides, which contrasts other specific interactions (salt bridges or hydropho- with the idiosyncratic surface of a folded protein. bic contacts) between the protein and the peptide The recognition of extended regions of proteins, side chains is therefore crucial to achieve significant unfolded proteins or peptides from a protein digest binding affinities and selectivity. would be particularly useful for applications in A number of protein-binding scaffolds exist in proteomics, enabling, for example, protein or mutant nature. Antibodies are widely used to bind peptides 0022-2836/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. J. Mol. Biol. (2012) 424,68–87 Designed Armadillo Repeat Protein Libraries 69 and have been well characterized.6–8 Nonetheless, specificities. The peptide binding in these small anti-peptide antibodies made by classical immuni- adaptor domains usually depends on a particular zation or by recombinant technologies are not sequence feature, for example, a phosphorylated always of high specificity and affinity. Although amino acid, a stretch of proline residues or a free peptide-binding antibodies have certain structural C-terminus. These domains thus represent solutions features in common, the orientation of the peptide is to specific recognition problems but seem to lack the not conserved. Thus, information gained from possibility to be general peptide‐binding scaffolds. structures of antibody–peptide complexes cannot Since they recognize only short amino acid stretches, be easily extended to generate new peptide-binding generally characterized by a low affinity, it would be antibodies or peptide-binding proteins. difficult to extend the sequence specificity. While Natural peptide-binding protein domains (e.g., several such domains could be fused together by SH2,9 SH3,10,11 WW12 and PDZ13–15) have been flexible linkers generating some avidity, the entropy mutagenized to derive species with novel binding loss upon binding of such flexibly linked constructs Fig. 1. Models of designed ArmRP libraries. (a) The model of an N5C molecule contains N- and C-capping repeats (in cyan), two consensus repeats (in green) and three central library modules (in blue). Residues which are randomized using all amino acids, except cysteine, glycine and proline, are indicated by orange spheres and labeled with the position within the repeat. Position 4 has a restricted randomization (EHIKQRT), and it is shown as a yellow sphere. Position 37, depicted in magenta, hosts a conserved asparagine, which is responsible for binding the target peptide backbone in available structures. Gray spheres indicate positions 26 and 29, where lysine and glutamine were allowed (variants KK, KQ, QK and QQ). The model was based on the crystal structure of a consensus‐designed ArmRP28 (Protein Data Bank ID 4DB6) and realized with PyMOL (Schrödinger, LLC). (b) Schematic representation of modules present in N3C and N5C libraries: randomized library modules (L, in blue), terminal capping repeats (N and C in cyan) and constant consensus M‐type modules (M, in green). (c) Sequences of the capping repeats and the internal module M are indicated with the randomized position labeled by the red X underneath, symbolizing all 20 amino acids except Cys, Pro and Gly. The red Z in position 4 symbolizes a randomization only to Glu, His, Lys, Arg, Ile, Gln or Thr. The helix number is indicated. 70 Designed Armadillo Repeat Protein Libraries would not necessarily lead to the high affinity that a same time, keep the stable core intact and minimize larger rigid binder can provide. repulsive interactions within the protein surface. We The major histocompatibility complexes (MHC I then need to apply in vitro selections to test the and MHC II)16 possess sufficient intrinsic variability performance of the system. and the ability to recognize a broad range of peptides, The work described here represents only the but difficulties in their functional expression and slow first step of a long-term project, the generation of equilibration with peptide reduce their attractiveness sets of specific binders for modular peptide as a scaffold candidate. recognition. Here, we describe the construction, Natural repeat proteins have evolved to mediate a analysis and selections from a first set of such wide range of protein–protein interactions across all libraries by using ribosome display and the charac- cellular compartments and across all kingdoms.17,18 terization of a binder for residue-specific recognition Designed repeat proteins have been developed over and affinity. the past few years to facilitate the development of specific binding proteins and thus greatly expand the Results and Discussion range of applications beyond what is possible with classical monoclonal antibodies and recombinant antibody fragments (reviewed in Ref. 2). General considerations for generation of a Several repeat proteins bind peptides, such as designed Armadillo library HEAT repeats,19 Armadillo repeats (ArmRs)20–23 or TPR repeats.24 We found Armadillo repeat proteins (ArmRPs) of particular interest, since a series of Asn Positions responsible for peptide binding in every side chains is involved in binding and thus con- repeat of ArmRPs have been identified from the straining the peptide's extended main chain through crystal structures of complexes of ArmRPs with their extensive hydrogen bond formation.25–27 Every targets21,25,29 (Supplementary Data Fig. S1) and ArmR is composed of three α-helices, named H1, mapped onto a consensus‐designed ArmR named H2 and H3, arranged similarly in a spiral staircase, M‐type (Fig. 1). All the positions involved, except and several repeats stack to form the compact position 41, are located on helices. Position 4 is domain (Fig. 1a). Specialized repeats are present at located on helix H1 and the others are on helix H3. the N- and C-termini of the protein, protecting the Recognition of the target by residues situated in hydrophobic core from solvent exposure. ArmRPs secondary structure elements is characteristic of consist of different families, including importins and solenoid-like repeat proteins, in contrast to anti- β-catenins,25–27 and the binding seems to be bodies and some other alternative scaffolds,1–3 conserved along the surface generated by adjacent where the binding site is mainly formed by loops. In helices 3, almost perpendicular to the axis of helix 3. the Armadillo proteins, the rigidity of these elements The peptide is bound to the ArmRP in an antiparallel and the regular spacing allow the formation of the arrangement, and protein and peptide together form desired