Contents

Abbreviations XV

Prologue XVII Jane S. Richardson

1 An Introduction to 1

Thomas J. Graddis, Dale L. Oxender

1.1 Protein Engineering is a New Addition to the Biotechnology Revolution 1

1.2 Consist of Several Structural Elements 2

1.3 Technologies that Enable the Development of Protein Engineering 3 Recombinant DNA and Technologies 4 — Protein Purification Plays a Role in the Cycle of Protein Engineering 8 — Functional Analysis is Used to Evaluate Structural Changes of Engineered Proteins 11 - Structural Analysis of Proteins is Essential 12 - The Use of X-Ray Crystallography in the Structural Analysis of Proteins 12 — Nuclear Magnetic Resonance Spectroscopy is an Emerging Technology 13 — The Use of Design and Prediction in Protein Engineering 14 — De novo Design of Proteins is an Important Long-Term Goal 15 — Knowledge-Based Structural Prediction is Used to Model Proteins of Unknown Structure 16

1.4 There are Currently Several Bottlenecks to the Field of Protein Engineering 17 The Rules Governing are Complex 18 — A Few Recognized Steps in Protein Folding are Beginning to Emerge 19 — Chaperones are Proteins that Assist the Folding of Other Proteins 19 - Understanding the Structure-Function Relationship is Central to Protein Engineering 20 - Computational Chemistry is Finding Increasing Utility 20 — Data Base Management and Computer Graphics are Useful Tools 21 — Choosing a Protein Expression System is Often an Empirical Science 22 — Advantages and Disadvantages of Protein Expression Systems 23

1.5 A Progress Report in Protein Engineering Includes Many Exciting Topics 24 Attempts to Increase Protein Stability have Met with Outstanding Success 25 - Altering Catalytic Activity has been Achieved in a Number of Systems 27 - Enzyme Specificity and Molecular Recognition are Fundamental to Enzyme Mechanism 28 — Peptide Hormones are Promising Targets for Protein Engineering 30 — Significant Effort has been Applied to the Engineering of 32 — Humanized Antibodies are Providing Important Therapeutic Agents 32 - Catalytic Antibodies May Become Important Tools for Synthetic Chemists 33 — Random Libraries Represent a Semi-Rational Approach to Protein Engineering 33 — Synthetic Peptide Libraries Provide a Systematic Alteration of Peptide Sequence 34 — Combinatorial Libraries Generate Numerous Antibodies with Varied Binding Activity 35 — Libraries Permit the Functional Screening of Vast Numbers of Protein Sequences 35 — The Future of may Utilize DNA 37

1.6 The Application of Protein Engineering to Agricultural, Industrial, and Medical Arenas has Produced Useful Products 38 Agricultural Applications 38 — Industrial Applications 40 — Medical Applications 41 — Mass Screening of Natural Products will Utilize the Results of Engineered Proteins 43 — Growth Factors are a Promising Area for Protein Engineers 44 - The Marriage of Electronics and Biochemistry will Provide Many New Products 44

2 Analysis and Characterization of Proteins 47

Brigitte Wittmann-Liebold, Peter Jungblut

2.1 Introduction: How to Approach the Structure and Function of Proteins 47 By the Reductionistic Approach Individual Proteins are Selected for Investigation 49 — By the Global Approach Single Proteins from Thousands of Polypeptides in the Cell are Selected 50

2.2 The Various Preparative and Analytical Protein Purification Techniques 50 How to Consider an Appropriate Strategy for Protein Isolation 50 - Prefractionation is Necessary for Reducing the Number of Proteins in a Complex Protein Mixture 53 — Strategies to Obtain Pure Proteins in a Few Steps 54 — Conventional Column Chromatography for Preparative Isolation of Proteins 54 - Gel Filtration is Based on Differences in Molecular Mass 54 - Ion Exchange Chromatography Allows the Isolation of Native Proteins 55 — Hydrophobic Interaction Chromatography may Assist Isolation of Membrane Proteins 56 - Covalent Chromatography Binds Proteins to Supports for Selective Separation 56 — Immobilized Metal Affinity Chromatography Uses Metal Chelate Formation 57 — Affinity Chromatography is Frequently Applied in Immunological Investigations 58 — HPLC-Separations and 2-DE-Polyacrylamide Gel Electrophoresis Resolve Complex Protein Mixtures 60 — Electrophoresis is Applied for Separating Highly Complex Protein Mixtures 60 — Electrophoresis in Combination with Blotting are Fast Means of Screening Proteins for Microsequencing and Immunostaining 65 — One Band in SDS Gel Electrophoresis Does Not Guarantee Purity of a Protein 67 — Desalting and Concentrating of Protein Fractions 69 — Spectroscopic Methods Can be Used for the Preliminary Quantitative Determination of Proteins 71 — Amino Acid Analysis is a First Step towards Characterizing a Protein 72

2.3 Investigation of Proteins and Protein Complexes 72 Identification of Subunits 73 - Extraction of Proteins from Hetero-Complexes 73 — End-Group Determinations Yield Information about Subunit Structures in Multi-Component Complexes 74 - Modern Mass Spectrometry Makes Possible Analysis of Proteins and Peptides 75 — Reduction and Alkylation versus Treatment by Detergents 75 — Separation of Subunits 75

2.4 Strategies for Primary Structure Analysis of Proteins and Peptides 76 Generation of Peptide Fragments 76 — Cyanogen Bromide Cleavage Releases Large Fragments for Sequence Alignment 76 — Mild Acid Treatment Generates Peptides from Insoluble Proteins 77 - Enzymatic Protein Digestions Yield Suitable Peptides for Internal Sequence Analysis 77 - Assignment of Cysteine Residues and Cystine Bridges in Proteins 79 — Mass Spectrometry of Cysteine and Cystine Peptides 80 — Crosslinking of Neighboring Constituents 81 — Protein-Protein Crosslinking Yields Important Information about Distances in the Complex 82 — Investigation of the Crosslinked Proteins on the Amino Acid Level Yields a Valuable Fine Structure Analysis of the Complex 84 - Crosslinks between RNA (DNA) and Protein are More Difficult to Analyze than Protein-Protein Crosslinks 86 — Localization of Structural and Functional Domains 87 — Surface Peptides in Ribosomes 87 - Immunological Studies 88 — Anti-Protein Studies 88 - Synthetic Peptides are Useful for Detecting Sequence-Specific Antigenic Sites 89

2.5 Protein Microsequence Analysis 90 The Chemistry of the Stepwise Edman Degradation Technique 91 — Advantages of the Automatic Microsequence Procedure 93 - Sample Preparation for Automated Sequence Analysis 94 - Manual Methods are Applied to Screen Many Peptide Samples Simultaneously 94 - Dansyl-Edman Degradation Yields Highly Sensitive Fluorescent Amino Acid Derivatives 95 — The Manual DABITC/PITC-Double Coupling Method is Used for Visual Detection of the Released Amino Acids in the Picomole Range 95 — Efforts to Develop Fluorescent Isothiocyanates for a Sensitive Degradation Failed 96 — Manual Methods Allow the Simultaneous Degradation of Many Peptide Samples 96 — DNA-Sequencing of the Protein's Gene 97 - Amino Acid Sequencing is Necessary for Confirming the Deduced Gene Sequence 98 — Mass Spectrometric Methods 98 — Mass Ion Determinations have Become Possible for Big Proteins Now 99 — Sequencing by Mass Spectrometry is a Complementary Technique to the Edman Degradation 100 — Determination of the C-Terminal Amino Acids is Difficult and of Limited Relevance 100

2.6 Conclusions 102

3 Structure Determination, Modeling and Site-directed Mutagenesis Studies 109

Ulrich Hahn, Udo Heinemann

3.1 Introduction 109

3.2 A Model System: Ribonuclease T1 110 Ribonuclease T1 is Small, Stable and Water-Soluble 112 — Single-Stranded RNA is Cleaved by Ribonuclease T1 Specifically after Guanine 112

3.3 Methods of Structure Determination, Modeling and Mutagenesis 114 The Three-Dimensional Structure of a Protein Molecule can be Determined with High Accuracy 115 — X-Ray Crystallography Yields High-Resolution Structures in the "Solid" State 117 — Nuclear Magnetic Resonance Spectroscopy Yields an Average Solution Conformation 122 - Databases Archive Structural Information 125 - Modeling and Force-Field Calculations Can Help in the Planning and Understanding of Mutagenesis Experiments 127 — Recombinant DNA Technology Permits the Creation of Protein Molecules at Will 128 - Foreign DNA Fragments can be Biologically Amplified in Bacteria such as Escherichia coli 134 - High-Level Expression Provides the Amounts of Proteins Required for Many Experiments 138 — Site-Directed Mutagenesis Changes Protein Sequences according to Prespecified Goals 138

3.4 The Structures of Natural and Mutated Ribonuclease T1 145 The Structure of Wild-Type Ribonuclease T1 has been Determined by X-Ray Crystallography 146 — Ribonuclease T1-Inhibitor Complexes Provide Information about Enzyme Function 147 — Single Amino Acid Mutations may have Drastic Effect on Protein Function and Conformation 155 - NMR Spectroscopy Shows and Flexibility 156

3.5 Catalysis and Specificity 157 Active Site Mutations Help Us to Understand the Mechanism of RNA Hydrolysis 158 - The Prediction of Protein Variants with Altered Substrate Specificity Remains a Problem 159

3.6 Folding Pathways 160 Ribonuclease T1 may Unfold and Refold with Complete Restoration of Activity 160 — Cis-trans Isomerization of two Prolines is the Rate-Limiting Step in Ribonuclease T1 Folding 160

3.7 Protein Stability 161 The Stability of Ribonuclease T1 and Other Proteins is Very Low 161 — Hydrophobic Interactions and Hydrogen Bonds Contribute Equally to the Stability of Ribonuclease T1 161

3.8 Internal Motions in Ribonuclease T1 162

3.9 Modeling of Homologous Ribonucleases 163

3.10 Conclusions 164

4 Rational Design of Proteins with New Properties 169

Dietmar Schomburg

4.1 Overview - The Current Status of Rational 169 Interdisciplinary Work is Essential for the Protein Design Cycle 172 — Step 1: Screening, Purification and Characterization 172 — Step 2: Cloning, Expression and Genetic Engineering of the Wild-Type Enzyme 173 — Step 3: Molecular Modeling 173 — Step 4: Site-Directed Mutagenesis and Evaluation of the Variant 174

4.2 Knowledge of the Protein 3D-Structure is the Essential Prerequisite 174 X-Ray Crystallography and NMR are Used for Experimental Structure Determination 176 — Protein Crystallography 176 - NMR-Methods 177 — Structure Prediction is an Alternative Sequence-Oriented Approach 178 — Correct Sequence Alignment is Necessary for Correct Structure Predictions by Homology 180 — Similarity Matrices Improve Sequence Alignment 181 — Predicting the Main-Chain Folding of Insertions and Deletions 184 - Exchange of Amino Acid Side-Chains 185 - Check of the Model 187

4.3 The Game with Large Numbers: Design of New Variants 188 Molecular Graphics are Helpful for Planning Experiments 190 — Experimental Background is Essential 192

4.4 The Necessary Check: Calculations 192 Energy Minimization is Usually the First Step of the Force Field Treatment 194 - Protein Dynamics Simulations Help to Find the Global Energy Minimum of a Protein Structure 194

4.5 Successful Protein Design Examples 195 Successfully Designed 197 — An Important Goal: Redesign of Protein/Protein Interactions 197 — An Instructive Example: Design of a Highly Effective and Selective Inhibitor for Human Granulocyte Elastase 197 — Protein Stability can be Improved by Design 199 — Summary 200

5 Structural Design of Proteins 209

Chris Sander

5.1 Introduction: From Natural to Protein Design 209

5.2 Topological Redesign of Natural Proteins 209 The Core Hypothesis: Side Chain Interactions in the Protein Interior Determine the Specific Fold 209 — Protein Chains can be Cyclically Permuted but Fold as Before 210 — Reengineering of Loop Connections in a Four-Helix Bundle Left the Fold Intact 210 — Point of Departure: A Natural Four-Helix Bundle 210 — Steps in Redesign Involved Cutting and Pasting 211 — Experimental Verification Proved Successful Design 213 - The Protein Engineer is Free to Redesign Loop Connections 213

5.3 De novo Design of Protein Structures 214 Define the Structure, Invent a Sequence 214 — Construct a Backbone Model 215 — Design an Amino Acid Sequence 216 — Avoid Alternate Folds 216 - Optimize the Model 217 - Express, Synthesize, Purify - and Determine the Structure 217 — More to Come 218

5.4 Design of α-Helices to Form Coiled Coils 218 Different Approaches to the Design of Four-Helix Bundles 219

5.5 Attempts at Designing Larger Proteins: (βα)8 Barrels 221 Sandwiches of β-Sheets 223 — Mixed αβ Structures 223

5.6 Design Exercises on Computers Help Formulate New Design Projects 224

5.7 De novo Design of Protein Function 224 First Attempts to Engineer Binding Sites: DDT and Metal Ions 224 — Design of Enzymatic Function by a Hybrid Approach 226 — Catalysis Near a Porphyrin Ring 226 — An Attempt to Create an Esterase 227 — Membrane Ion Channels Formed by Simple Helical Peptides 228

5.8 Computer Design and Molecular Selection 229

5.9 Looking Ahead 230

5.10 Protein Databases: Structures and Sequences 230 Database of Protein Structures 230 - Database of Protein Sequences 231 — Derived Databases Useful in Protein Design 231

6 Antibody Catalysis 237

Theodore Tarasow, Donald Hilvert

6.1 Introduction 237

6.2 Exploiting Antibodies as Catalysts 239

6.3 Utilization of Entropy to Speed Chemical Reactions 244

6.4 Catalysis Through Substrate Destabilization 250

6.5 Catalytic Groups and Cofactors can be Used to Accelerate Reactions 253

6.6 Future Prospects for Antibody Catalysis 257

7 Design of Protein Targeting Signals and Membrane Protein Engineering 263

Gunnar von Heijne

7.1 Introduction 263

7.2 Protein Targeting: Pathways and Signals 264 Sorting in the Secretory Pathway Depends on Multiple Signals 264 — Mitochondrial Targeting Peptides Form Amphiphilic α-Helices 266 - Chloroplast Transit Peptides Contain Many Serines and Threonines 268 — Nuclear Localization Sequences are Composed of Two Closely Spaced Clusters of Basic Residues 269 — There are Two Kinds of Peroxisomal Targeting Signals 270

7.3 Membrane Proteins: Principles of Assembly and Engineering 270 Membrane Proteins are Adapted to a Non-Polar Environment 270 — The Topology of Membrane Proteins is Controlled by Positively Charged Residues 271 — Transmembrane α-Helices Pack Together in Much the Same Way as α-Helices in Globular Proteins 273 - The Topology of Membrane Proteins can be Predicted from Their Amino Acid Sequence 274 - Membrane Proteins have Sorting Signals not Found in Soluble Proteins 275

7.4 Conclusions 276

8 The Rational Design of Amino Acid Sequences 281

Gisbert Schneider, Reinhard Lohmann, Paul Wrede

8.1 Introduction 281

8.2 Understanding the Sequence-Function Relationship is a Prerequisite for Rational Protein Design 282 Appropriate Sequence Representations Allow the Extraction of Features Responsible for a Certain Protein Function 283 — Heuristics Serve as a Guide through the Feature Space 292

8.3 Artificial Neural Networks and Machine Learning are Methods of Choice for Feature Extraction 294 Reliable Prediction Systems are Necessary for Sequence Design: The Problem of Generalization Ability 296 — Artificial Neural Networks Provide Flexible Systems for Pattern Recognition in Amino Acid Sequences and Sequence Classification 298 — Development of Artificial Neural Networks is an Optimization Task 304

8.4 Simulated Molecular Evolution is a Potential Method for Rational Sequence-Oriented Protein Design 305 Amino Acid Distance Maps can be Used as a Guide for Sequence Variation 307

9 Structural Control and Engineering of Nucleic Acids 319

Nadrian C. Seeman

9.1 Introduction 319

9.2 The Assignment of Sequences to DNA Objects 326

9.3 Forming DNA Objects Requires Structural, Environmental and Synthetic Considerations 328

9.4 Structures Built from Branched DNA Molecules 329

9.5 Potential Applications of Branched DNA Objects and Lattices 335

9.6 Concluding Remarks 339

Epilogue 345 by Alexander Rich

Authors 349

Glossary 353

Index 361