CRYSTAL STRUCTURE AND REGULATORY MECHANISM OF

Insights into Calcium-Dependent Proteolysis

CHRISTOPHER MARK HOSFIELD

A thesis submitted to the Department of Biochemistry

in conformity with the requirements for

the degree of Doctor of Philosophy

Queen's University

Kingston, Ontario, Canada

April, 2001

Copyright © Christopher Mark Hosfield, 2001


The ubiquitous calpains (µ- and m-calpain) are heterodimeric enzymes that combine cysteine protease activity with Ca2+-binding EF-hand motifs in one molecule. This feature is unique to the calpains and has linked them to signaling pathways stimulated by Ca2+. Excessive activation of calpains due to deregulation of intracellular Ca2+ homeostasis is implicated in neurodegeneration (as in Alzheimer's and Parkinson's diseases) and ischemic tissue damage (following heart attack or stroke). Consequently, calpains are considered important therapeutic targets, and there is much interest in developing effective calpain-specific inhibitors. The enzymatic activity of calpain is tightly regulated by several factors, including heterodimer formation, limited autolysis, phospholipid membranes, activator proteins and calpastatin. Even regulation through Ca2+, the only absolute requirement for activity, is complicated by the fact that calpain requires significantly more Ca2+ for activation in vitro than is available in vivo. The precise mechanism of calpain activation by Ca2+ is a fundamental biochemical question that has remained poorly understood, largely owing to the lack of available structural information. To better understand the functional regulation of this intriguing enzyme, we have determined the three-dimensional structure of rat m-calpain by X-ray crystallography. m-Calpain was crystallized in the absence of Ca2+ in two crystal forms, and a selenomethionine derivative was used to determine the structure by the method of multiwavelength anomalous dispersion.
Refined to 2.6 Å resolution, this structure is the first reported for any calpain heterodimer. Structural analysis reveals that the calpain heterodimer is an elongated molecule with several discrete domains. The cysteine protease component (domains I and II) and the EF-hand domains (IV and VI) are situated at opposite poles of the enzyme, linked covalently through an extended linker and a β-sandwich domain similar to a C2 motif (domain-III), and non-covalently by a unique, pro-segment-like N-terminal α-helix. This inactive form of calpain exhibits a novel mechanism of cysteine protease zymogen inactivation: protease domains I and II are held apart, and consequently the active site and substrate-binding cleft are not formed. The α-helical pro-segment does not occupy a pre-formed active site, as in other cysteine proteases, but instead inhibits its assembly by serving as a conformational restraint on protease domain-I, anchoring it to the regulatory subunit (domain-VI). It is likely, therefore, that autolytic removal of the pro-segment reduces the Ca2+ requirement of calpain by removing this restraint. A set of highly conserved inter-domain salt bridges between lysine residues in protease domain-II and acidic residues in the C2-like domain-III serves as a conformational restraint on protease domain-II. Disruption of these key interactions through site-directed mutagenesis verified their inhibitory role and significantly reduced the [Ca2+] required for activation in vitro. The finding of a C2-like domain in the m-calpain structure has provided a potentially novel insight, suggesting that the EF-hand domains are not the sole regulators of the Ca2+ response.
Indeed, the mutagenesis studies have indicated that the affinity of m-calpain for Ca2+ is influenced by domain-III, which is far removed from the EF-hands. Although a structure of an activated, Ca2+-bound enzyme has not yet been determined, it is clear that Ca2+ binding facilitates calpain activation through conformational changes that re-orient the protease domains and allow the formation of the substrate-binding cleft and a functional catalytic triad.

ACKNOWLEDGEMENTS

Above all, I would like to thank my supervisor Dr. Zongchao Jia for allowing me the opportunity to study such an interesting topic in an exciting and extremely important field of biochemistry. Most important to me were the excellent guidance and support I received from Dr. Jia throughout this entire project. His open-mindedness allowed me to investigate many independent questions, and he always made himself available to hear my findings and concerns. On a personal note, I enjoyed many humorous and non-scientific discussions with Dr. Jia and his entire family. In particular, I enjoyed the practical conversations we had about crystallography, science and careers during our many late nights while away at the synchrotron.

A special thank you is also reserved for my brother David, who was a Ph.D. candidate in the laboratory of Dr. John Tainer at The Scripps Research Institute in San Diego over the same time period. In the initial stages of my research, I benefited greatly from his help, as he was second only to Dr. Jia in offering me advice to overcome the problems encountered in this project.

Many members of Dr. Jia's laboratory contributed to my success as a graduate student at Queen's. Dr. Qilu Ye, Daniel Lim, Ante Tocilj, Eeva Leinala, Brent Wathen, Steffen Graether and Dr. Gour Pal have all provided me with help, be it teaching me the fundamentals of crystallization or computer skills, or assisting me on several trips to the synchrotron. Special recognition should also be given to Yih-Cherng Liou and Leslie Magtanong, students who also assisted with data collection at the synchrotron.

Additionally, I must thank undergraduate students Alex Mok for assistance with molecular modeling and Michael Sung for technical help with purification and mutagenesis.

The help from our primary collaborators Dr. John Elce and Dr. Peter Davies and their laboratories is also gratefully acknowledged. Both Dr. Elce and Dr. Davies provided me with insights into the world of biochemistry that are not typically found in the X-ray lab. I must also thank Dr. Elce for his patience and his uncanny ability to instill motivation in me. The excellent technical support of Carol Hegadorn and Sherry Gauthier is also recognized as a significant contribution to my project.

A very special thank you is saved for Tudor Moldoveanu, a doctoral student in Dr. Davies' laboratory. Tudor provided me with both unparalleled friendship and academic challenges throughout the entire duration of this thesis. Tudor sees the world of science much differently than I do, which made for an endless number of conversations that ended with very insightful conclusions. It is very appropriate that our combined efforts will go hand-in-hand in contributing a new and unforeseen understanding to the field of calpain research.

It goes without saying that my most heartfelt thank you is reserved for my brothers David and Scott, and in particular my parents Robert and Dome. It is amazing how much I have continued to learn from them, in spite of the fact that they have no idea what a protein is.

Finally, I would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR) for providing me with scholarships. Grants from the CIHR, the Network of Centers of Excellence, and Warner-Lambert Canada funded this research.

LIST OF ABBREVIATIONS

APS Advanced Photon Source

BNL Brookhaven National Laboratory

CCD charge coupled device

CHESS Cornell High Energy Synchrotron Source

DEAE diethylaminoethyl

DMSO dimethyl sulfoxide

DTT dithiothreitol

EDTA ethylene-diamine tetraacetic acid

FPLC fast protein liquid chromatography

IPTG isopropyl β-D-thiogalactoside

keV kiloelectron volts

LB Luria Bertani medium

MAD multiwavelength anomalous dispersion

MALDI matrix-assisted laser desorption ionization

MES 2-morpholinoethanesulfonic acid

MIR multiple isomorphous replacement

MIRAS multiple isomorphous replacement with anomalous scattering

MOI multiplicity of infection

MPD 2-methyl-2,4-pentanediol

NSLS National Synchrotron Light Source

PAGE polyacrylamide gel electrophoresis

PEG polyethylene glycol

PMSF phenylmethylsulfonyl fluoride

r.m.s.d. root mean square deviation

SDS sodium dodecyl sulfate

ssDNA single-stranded DNA

SSRL Stanford Synchrotron Radiation Laboratory

TCA trichloroacetic acid

UV ultraviolet

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF ABBREVIATIONS

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1: INTRODUCTION

1.3.1 Ubiquitous Calpain Isoforms: µ- and m-calpain
1.3.2 Primary Structure
1.3.3 Physiological Functions
1.3.4 Pathological Roles
1.3.5 Catalytic Mechanism
1.3.6 Regulation of Calpain Activity
1.3.6.1 Ca2+
1.3.6.2 Autolysis
1.3.6.3 Subunit Dissociation
1.3.6.4 Phospholipids and Membrane Association
1.3.6.5 Calpain Activator Proteins
1.3.6.6 Calpastatin
1.3.7 Crystal Structure of the Regulatory Subunit
1.3.8 Tissue-Specific and Non-Mammalian Calpains
1.3.8.1 Large Subunit Homologues
1.3.8.2 Small Subunit Homologues
1.4 PROTEASES AND ZYMOGENS
1.4.1 Structure and Regulation of Cysteine Proteases
1.5 METHODS FOR PROTEIN STRUCTURE DETERMINATION
1.6 OBJECTIVES AND APPROACH

2.1 SOURCES AND DESCRIPTION OF RAT M-CALPAIN
2.2 RECOMBINANT PROTEIN EXPRESSION AND PURIFICATION
2.3 PRODUCTION OF SELENOMETHIONINE-LABELED CALPAIN
2.4 ASSAYING CALPAIN ACTIVITY AND CA2+-DEPENDENCE
2.4.1 Standard Casein Assay and Ca2+-Titration
2.4.2 Casein Zymography
2.5 CRYSTALLIZATION
2.6 HARVESTING AND STORAGE OF CRYSTALS

4.3.1 Domain Structure
4.3.2 Structural Basis for Ca2+-Dependent Activity of Calpain
4.3.3 The N-Terminal Anchor: A Very Unusual Pro-Segment
4.3.4 Structure and Arrangement of Domain-III
4.3.5 Structural Comparison of Calpain and -Like Protease Domains
4.4 STRUCTURAL FEATURES CONTRIBUTING TO CALPAIN INACTIVATION IN THE ABSENCE OF CA2+
4.4.1 Anchor-Regulatory Subunit Interactions Restrict Protease Domain-I
4.4.2 Domain-III Restricts Protease Domain-II
4.4.3 The Linker
4.5.4 The Effects of Disrupting Multiple "Mobility Restraints"
4.6 PROPOSED REGULATORY MECHANISM OF CALPAIN BY CA2+
4.7 CONCLUSIONS AND FUTURE STUDIES

APPENDIX A: PRINCIPLES OF PROTEIN CRYSTALLOGRAPHY
A.1 ELECTRON DENSITY AND THE CRYSTALLOGRAPHIC PHASE PROBLEM
A.2 PRINCIPLES OF THE MIR METHOD
A.2.1 Overview of MIR
A.2.2 The Structure Factor
A.2.3 Location of Heavy Atom Positions: The Isomorphous Difference Patterson
A.2.4 MIR and the Solution of the Phase Problem
A.3 MULTIWAVELENGTH ANOMALOUS DISPERSION (MAD)
A.3.1 The Fundamental Concept of MAD: Anomalous Scattering
A.3.2 Advantages of MAD
A.3.3 Historical Limitations of MAD
REFERENCES FOR APPENDIX

CURRICULUM VITAE

LIST OF TABLES

Table 3.1. Native X-ray diffraction and data statistics

Table 3.2. MAD data collected at SSRL

Table 3.3. MAD phasing statistics for the RI crystal form of SeMet m-calpain

Table 3.4. MAD phasing statistics for the P1 crystal form of SeMet m-calpain

Table 3.5. Refinement statistics

Table 3.6. Influence of amino acid sequence on enzyme-substrate stability

Table 3.7. Analysis of various calpain mutants

Table 3.8. Effects of mutations on the Ca2+ requirement and specific activity of m-calpain

LIST OF FIGURES

Figure 1.1. Ca2+ binding to EF-hand proteins

Figure 1.2. Primary amino acid sequence comparison between µ- and m-calpain

Figure 1.3. Schematic representation of the traditional domain structure of calpain

Figure 1.4. Catalytic mechanism of cysteine proteases

Figure 1.5. Role of EF-hands in the regulation of calpain by Ca2+

Figure 1.6. Crystal structure of domain VI of the calpain regulatory subunit

Figure 1.7. Coordination geometry of Ca2+ by the EF-hands in domain VI of the calpain regulatory subunit

Figure 1.8. Ca2+-induced conformational changes in domain VI

Figure 1.9. Crystal structures of cysteine proteases

Figure 1.10. Zymogen inactivation in cysteine proteases involves the N-terminal pro-segment

Figure 3.1. SDS-polyacrylamide electrophoretic analysis of the purification of recombinant rat m-calpain

Figure 3.2. Crystals of m-calpain obtained in the early stages of the crystallization process

Figure 3.3. Diffraction-quality crystals of m-calpain

Figure 3.4. Typical X-ray diffraction pattern from a P21 crystal of m-calpain

Figure 3.5. Calpain crystals obtained from co-crystallization with heavy atoms

Figure 3.6. Isomorphous difference Patterson map from TEPGC-soaked native Ri crystals

Figure 3.7. X-ray fluorescence scan from a crystal of SeMet m-calpain

Figure 3.8. Anomalous difference Patterson map in P21

Figure 3.9. Solution of selenium positions in P21 using direct methods

Figure 3.10. Electron density maps

Figure 3.11. Ramachandran plot

Figure 3.12. Crystal structure of the m-calpain heterodimer

Figure 3.13. Domain structure of m-calpain

Figure 3.14. Spatial arrangement of the protease and EF-hand domains

Figure 3.15. An electrostatic representation of the van der Waals surface

Figure 3.16. The cysteine-protease component

Figure 3.17. Domain-III

Figure 3.18. The EF-hand domains

Figure 3.19. Calpain has a unique N-terminal anchor

Figure 3.20. The linker

Figure 3.21. Structural basis for Ca2+-dependent protease activity

Figure 3.22. Molecular model of the active site of Ca2+-activated m-calpain

Figure 3.23. Molecular model of a calpain-peptide complex

Figure 3.24. Expression and purification of m-calpain mutants

Figure 3.25. Glu504Ser/Glu517Pro mutation causes subunit dissociation

Figure 3.26. Ca2+ titrations of m-calpain variants

Figure 3.27. Ca2+ titrations of chimeric Nm-calpains

Figure 3.28. Effect of autolysis on Glu504Ser and wild-type m-calpain

Figure 4.1. Domain-III of calpain resembles a C2 domain

Figure 4.2. Comparison of calpain to conventional cysteine proteases

Figure 4.3. Comparison of the active site of calpain and papain

Figure 4.4. The anchor restricts the mobility of D-I

Figure 4.5. D-III restricts the mobility of D-II

Figure 4.6. Ca2+-induced conformational changes in D-VI should release the N-terminal anchor

Figure 4.7. Proposed activation mechanism of calpain by Ca2+

Chapter 1: INTRODUCTION

1.1 The Role of Structure in Biology

Biological macromolecules represent a complex group of compounds that carry out many of the cellular functions vital to life. The specific biochemical properties of these molecules can be attributed almost entirely to their three-dimensional structures. Thus, structure is of great interest to study, since it allows many inferences to be made about the nature of a given function. One very famous example, and arguably one of the most remarkable scientific achievements to date, was the deduction of the structure of DNA from X-ray diffraction patterns in 1953 (1, 2). That discovery allowed James Watson and Francis Crick to deduce the mode in which the chemical information contained within the double helix could be stored and replicated, thereby allowing DNA to act as the central molecule of life.

Since that time, a tremendous amount of emphasis has been placed on the elucidation of three-dimensional (3-D) structures of macromolecules. Proteins, which constitute both the most structurally and functionally diverse class of macromolecules, have been studied most extensively. Since the first 3-D structures of myoglobin and hemoglobin were determined using X-ray crystallography in 1960 (3, 4), the structures of well over 10,000 proteins have been determined and their coordinates deposited in the Protein Data Bank (5).

The wealth of information made available from protein structures has revolutionized the manner in which protein biochemistry is currently studied. Structural information is used in conjunction with site-directed mutagenesis to examine the contribution of individual amino acids within proteins. Structure-guided drug design has proven beneficial, as in the development of specific HIV-1 protease inhibitors (6). More recently, structures are being used as starting points to discern functional characteristics of newly discovered proteins with previously unknown functions (7). The maturation of X-ray crystallography as a technique for structure determination has conveniently coincided with the recent announcement of the human genome sequence (8). The nature of biochemical research is arguably on the verge of a "revolution", and 3-D structure determination, particularly by X-ray crystallography, will undoubtedly play a central role.

In this dissertation, X-ray crystallographic analysis has offered novel insights into how the proteolytic activity of the enzyme calpain is regulated by calcium.

1.2 Calcium and Calcium-Binding Proteins

In the resting state of a cell, the intracellular [Ca2+] is ~0.1 µM (9). This is in stark contrast to the extracellular space, where, at ~2 mM, the [Ca2+] is approximately four orders of magnitude higher. This immense concentration gradient is maintained by active transport of Ca2+ out of the cytosol, both at the direct expense of ATP by the Ca2+-ATPase and by the Na+/Ca2+ exchanger (9). In addition to the extracellular space, Ca2+ is pumped into intracellular compartments such as the endoplasmic reticulum and mitochondria, which serve as additional Ca2+ stores. By maintaining the cytosolic [Ca2+] at extremely low levels, nature has selected for an extremely powerful signaling molecule, since transient increases in the cellular [Ca2+] result in the activation of several pathways, including muscle contraction, glycogen metabolism, neurotransmitter release and many others (9).
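The "four orders of magnitude" figure can be checked directly from the two concentrations quoted above. The following is only a back-of-the-envelope sketch, not part of the original analysis: it assumes a temperature of 310 K and considers only the concentration term of the free energy, RT ln(c_out/c_in), ignoring the membrane potential (which, for a cation leaving a cell with a negative interior, would add further to the cost of export).

```python
import math

R = 8.314   # gas constant, J mol^-1 K^-1
T = 310.0   # assumed physiological temperature, K (37 C)

ca_cytosol_m = 0.1e-6   # resting cytosolic [Ca2+], ~0.1 uM (value from the text)
ca_outside_m = 2e-3     # extracellular [Ca2+], ~2 mM (value from the text)

ratio = ca_outside_m / ca_cytosol_m
orders_of_magnitude = math.log10(ratio)

# Chemical (concentration) term only: RT ln(c_out / c_in).
# The membrane-potential term (zF * delta-psi) is deliberately ignored here.
delta_g_chem_kj = R * T * math.log(ratio) / 1000.0

print(f"gradient: {ratio:.0f}-fold, {orders_of_magnitude:.1f} orders of magnitude")
print(f"chemical work to export 1 mol Ca2+: {delta_g_chem_kj:.1f} kJ/mol")
```

With these inputs the ratio is about 2 × 10^4 (log10 ≈ 4.3), consistent with the "approximately four orders of magnitude" stated above, and the concentration term alone corresponds to roughly 25 kJ per mole of Ca2+ exported.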

Ca2+ is often called a second messenger, since the transient increase in the intracellular [Ca2+] is dependent upon an external chemical stimulus (the first messenger), such as the binding of a hormone to a receptor. The second message (the transient increase in Ca2+) lasts only as long as the hormone remains bound to its receptor and dissipates quickly thereafter, restoring the cell to the resting state (9).

Although several families of calcium-binding proteins exist, the effects of Ca2+ are largely exerted through a family of proteins that contain high-affinity Ca2+-binding sites, the so-called EF-hand proteins. The EF-hand proteins make up a large superfamily of over 200 proteins, of which calmodulin (CaM) is considered a prototypical member (10). CaM is composed of two globular domains separated by a long α-helix (Figure 1.1a) (11). Each globular domain contains two Ca2+-binding sites, or EF-hand motifs.

The EF-hand, originally identified in the crystal structure of parvalbumin (12), is a helix-loop-helix structure (Figure 1.1b) that coordinates a Ca2+ ion specifically in the loop region. EF-hands almost always occur in pairs that fold into stable Ca2+-binding domains (10). In CaM, the N-terminal domain contains a pair of low-affinity EF-hands, and the C-terminal domain has a pair of high-affinity EF-hands (11). Binding of Ca2+ to these EF-hands is highly cooperative and results in a large conformational change. Upon Ca2+ binding, CaM can interact with and alter the conformation of dozens of cellular proteins (Figure 1.1c), thereby modulating the activity of several pathways in response to Ca2+ (13).

Figure 1.1. Ca2+ binding to EF-hand proteins. a) The crystal structure of calmodulin (CaM) (blue) illustrates how each globular domain binds two Ca2+ ions (gold) through a pair of EF-hands. Ca2+ binding is cooperative, such that the binding of two Ca2+ ions to the C-terminal EF-hands enhances the binding of two additional Ca2+ ions at the N-terminal EF-hands. b) The EF-hand motif is a helix-loop-helix structure that coordinates Ca2+, often in a pentagonal bipyramidal geometry, as shown here for EF-hand one in CaM. c) In the Ca2+-bound state, CaM (blue) undergoes conformational changes and interacts with target proteins (red) within the cell. (PDB codes: 1CLL for the Ca2+-bound X-ray structure of CaM; 1CDL for CaM complexed with a peptide from myosin light-chain kinase.)
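One way to see why cooperative binding turns CaM into a sharp Ca2+ switch is the Hill equation, θ = [Ca2+]^n / (K^n + [Ca2+]^n), which is a phenomenological description of cooperativity rather than a mechanistic model of calmodulin. In the sketch below the Hill coefficients n are illustrative assumptions, not measured CaM values. It computes the fold-increase in [Ca2+] needed to move from 10% to 90% occupancy, which works out analytically to 81^(1/n).

```python
def hill_fraction(ligand: float, k_half: float, n: float) -> float:
    """Fractional saturation from the Hill equation: L^n / (K^n + L^n)."""
    return ligand**n / (k_half**n + ligand**n)


def ligand_for_fraction(theta: float, k_half: float, n: float) -> float:
    """Invert the Hill equation: L = K * (theta / (1 - theta)) ** (1 / n)."""
    return k_half * (theta / (1.0 - theta)) ** (1.0 / n)


def ratio_10_to_90(n: float) -> float:
    """Fold-increase in ligand needed to go from 10% to 90% saturation."""
    # K cancels out of the ratio, so any positive value (here 1.0) works.
    return ligand_for_fraction(0.9, 1.0, n) / ligand_for_fraction(0.1, 1.0, n)


# Illustrative Hill coefficients only (1 = no cooperativity).
for n in (1, 2, 4):
    print(f"n = {n}: {ratio_10_to_90(n):.1f}-fold change in [Ca2+]")
```

Without cooperativity (n = 1) an 81-fold change in [Ca2+] is needed to go from 10% to 90% occupancy, whereas n = 2 and n = 4 require only 9-fold and 3-fold changes, respectively. This is why cooperative EF-hand pairs can respond to the modest, transient Ca2+ increases described above.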

1.3 The Calpain Superfamily

At the most basic level, the term calpain describes a family of enzymes (EC 3.4.22.17) possessing a Ca2+-dependent cysteine proteinase activity (14). Several excellent review articles have recently described the calpains in great detail and provide a useful source of background information on this superfamily of enzymes (14-18). The name calpain was adopted shortly after the first cDNA sequence showed that this Ca2+-dependent protease evolved from a gene fusion event involving ancestral forms of papain and calmodulin (19, 20). Traditionally, the study of calpain has focused on the so-called "classical" isoforms, which are the ubiquitously expressed, mammalian heterodimeric calpains. Recently, a number of tissue-specific and atypical calpain isoforms have been discovered that vary in both primary structure and the organism from which they were isolated (14, 21-26). In this dissertation, the simple term "calpain" may often be used to describe a general feature (or features) of the ubiquitous calpains.

1.3.1 Ubiquitous Calpain Isoforms: µ- and m-calpain

The two ubiquitous calpains are often called calpain-I and calpain-II, although they are more commonly referred to as µ-calpain and m-calpain, respectively (14). Both isoforms have been found in essentially every mammalian tissue studied to date, although their absolute and relative expression levels are known to vary somewhat, depending on the tissue (14-18). These heterodimeric cysteine proteinases consist of a distinct but highly homologous catalytic subunit of ~80 kDa and a common small or regulatory subunit of ~28 kDa. Although the large subunits are encoded by separate genes (capn1 and capn2 encode µ-calpain and m-calpain, respectively), they are ~60% identical at the amino acid level and therefore share many functional characteristics (14).

The best-characterized difference between the µ- and m-isoforms is that they differ greatly in their response to Ca2+. Generally speaking, the half-maximal Ca2+ requirements ([Ca2+]0.5) for activation in vitro are ~5-50 µM and ~250-1000 µM for µ- and m-calpain, respectively (14). Although µ-calpain is activated at much lower Ca2+ concentrations, it is interesting to note that both enzymes require a higher Ca2+ concentration than is normally available in vivo (~0.1 µM).
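Values such as [Ca2+]0.5 are typically read off a Ca2+ titration of enzyme activity. The sketch below shows one simple way to estimate it: linear interpolation on a log10 [Ca2+] axis between the two titration points that bracket 50% of maximal activity. The titration values are synthetic, generated from an assumed Hill-type curve (K = 300 µM, n = 2), and are not calpain measurements; fitting a full binding model to the data would be the more rigorous alternative.

```python
import math


def hill_activity(ca_um: float, k_half_um: float, n: float) -> float:
    """Fraction of maximal activity for an assumed Hill-type Ca2+ response."""
    return ca_um**n / (k_half_um**n + ca_um**n)


def estimate_half_max(ca_um: list[float], activity: list[float]) -> float:
    """Estimate [Ca2+] at 50% of maximal activity by log-linear interpolation.

    Assumes ca_um is sorted in ascending order and that activity is
    normalized to the maximum (0-1) and rises monotonically with [Ca2+].
    """
    points = list(zip(ca_um, activity))
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= 0.5 <= a1:
            if a1 == a0:
                return c0  # flat segment sitting exactly at 50%
            t = (0.5 - a0) / (a1 - a0)
            log_c = math.log10(c0) + t * (math.log10(c1) - math.log10(c0))
            return 10.0**log_c
    raise ValueError("50% activity is not bracketed by the titration points")


RESTING_CA_UM = 0.1  # resting cytosolic [Ca2+] quoted in the text

# Synthetic titration from an assumed curve (K = 300 uM, n = 2); not measured data.
ca_points_um = [50.0, 100.0, 200.0, 400.0, 800.0, 1600.0]
activity = [hill_activity(c, 300.0, 2.0) for c in ca_points_um]

k_est = estimate_half_max(ca_points_um, activity)
print(f"estimated [Ca2+]0.5 ~ {k_est:.0f} uM")
print(f"that is ~{k_est / RESTING_CA_UM:.0f}-fold above resting [Ca2+]")
```

Because the synthetic curve has its midpoint at 300 µM, the interpolated estimate lands close to, but not exactly at, 300 µM; the small offset comes from interpolating between discrete points. Any value in the m-calpain range quoted above is thousands of times higher than the ~0.1 µM resting level.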

1.3.2 Primary Structure

The primary amino acid sequences of both ubiquitous isoforms have been determined from several species and are available in various databases (19, 27-30). The sequences of human and rat µ- and m-calpain are shown in Figure 1.2. The inter-species identity within each isoform is typically ~90% or greater, whereas the identity between the isoforms is ~60% across the entire catalytic subunit.
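Percent identity values such as ~90% and ~60% depend on how aligned positions are counted. A minimal sketch, assuming the two sequences are already aligned (equal length, '-' for gaps) and that identity is taken over gap-free columns only; other common conventions divide by the full alignment length or by the length of the shorter sequence and give somewhat different numbers. The strings in the tests are toy examples, not calpain sequences.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences over gap-free columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        identical += a == b  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


print(percent_identity("ACDEFG", "ACDQFG"))  # toy sequences, one mismatch
```

Because the denominator here excludes gapped columns, two sequences with long insertions relative to each other can appear more similar than they would under a full-length convention.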

On the basis of primary sequence alone, the catalytic subunit was initially divided into four functional domains (19), as described below (Figure 1.3). Domain I (residues ~1-80, using m-calpain numbering) is not similar in sequence to any known protein. Domain II (residues ~80-320) was identified as the cysteine protease domain and was found to be similar to papain (EC 3.4.22.2). Although primary sequence comparison between papain and m-calpain shows only ~25% overall identity in this region, the catalytic triad residues (Cys, His, Asn) are completely conserved, and residues in the vicinity of the catalytic triad are highly conserved. Domain III (residues ~320-560) is not homologous to any protein sequence discovered to date. Domain IV (residues ~560-700), at the C-terminus of the catalytic subunit, shares sequence similarity with calcium-binding proteins such as calmodulin and was predicted to have four EF-hand motifs for binding Ca2+ (19).

The regulatory subunit is composed of two distinct domains (31, 32). The N-terminal region is composed of domain V, a glycine-rich segment of ~85 residues. Domain VI (residues ~86-268), at the C-terminus, is a second calcium-binding domain that is very similar (~50% identical) to domain IV of the catalytic subunit and was also predicted to contain four EF-hand motifs (Figure 1.3).

Figure 1.2. Primary amino acid sequence comparison between µ- and m-calpain. The amino acid sequences of µ- and m-calpain from both rat and human are shown to illustrate the high degree of conservation across species and across isoforms. A (*) indicates identical residues, (:) indicates highly homologous residues and (.) indicates distantly homologous residues. Catalytic triad residues are indicated in red, bold-face type.

[Alignment rows: human-m, rat-m, human-mu, rat-mu]

Figure 1.3. Schematic representation of the traditional domain structure of calpain. The catalytic subunit (top) is roughly divided into four domains based on the primary amino acid sequence (19). Domain II is the cysteine protease domain containing the catalytic triad residues (Cys105, His262, Asn286). Domain IV is similar in sequence to calmodulin and was suggested to have four EF-hand motifs for binding Ca2+. Domains I and III are not similar in sequence to any known protein. The regulatory subunit (bottom) is divided into two domains based on the primary sequence (31, 32). The N-terminal domain V is hydrophobic and rich in glycine residues, while the C-terminal domain VI is very similar in sequence to domain IV.
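The *, : and . symbols used in Figure 1.2 follow the convention popularized by the Clustal family of alignment programs. The sketch below reproduces that kind of annotation line for an already-aligned set of sequences. The residue groups are those commonly listed in Clustal documentation for "strong" and "weak" conservation; treat them as an assumption rather than the exact settings used to prepare Figure 1.2. The sequences in the tests are made-up toy strings.

```python
# Residue groups assumed to match the Clustal "strong" and "weak" groups.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def _fits_one_group(residues: set[str], groups: list[str]) -> bool:
    """True if every residue in the column falls inside a single group."""
    return any(residues <= set(group) for group in groups)


def conservation_line(aligned: list[str]) -> str:
    """Return a Clustal-style '*', ':', '.' annotation for aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    symbols = []
    for column in zip(*aligned):
        residues = {r.upper() for r in column}
        if "-" in residues:
            symbols.append(" ")  # gapped columns are left unmarked
        elif len(residues) == 1:
            symbols.append("*")  # identical in every sequence
        elif _fits_one_group(residues, STRONG_GROUPS):
            symbols.append(":")  # conservative substitutions
        elif _fits_one_group(residues, WEAK_GROUPS):
            symbols.append(".")  # semi-conservative substitutions
        else:
            symbols.append(" ")
    return "".join(symbols)


print(conservation_line(["FPAIP", "FPALP", "FPPVP", "FPPVS"]))
```

A column is marked only if all its residues fit inside one group, so a single outlier residue is enough to downgrade a column from ':' to '.' or to a blank.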

1.3.3 Physiological Functions

While the exact physiological roles of the ubiquitous calpains remain to be elucidated, their functional characteristics and wide distribution suggest that they have important cellular roles (14-18). This is exemplified by the recent demonstration that transgenic mice lacking the classical calpain isoforms die during embryonic development (33). In this experiment, the capn4 gene, which encodes the regulatory subunit, was disrupted. In the absence of the regulatory subunit, neither µ- nor m-calpain had detectable activity in capn4-/- cells. As a result of capn4 disruption, homozygous capn4-/- mice died at mid-gestation. In contrast, heterozygous capn4+/- mice were born and had no obvious phenotypic differences, suggesting that one allele is sufficient to allow normal calpain function (33).

The best-documented effects of calpains are on the reorganization of the cellular cytoskeleton, where they have been implicated in the processes of cell motility, adhesion and fusion (34-36). Additionally, their dependence on Ca2+ has linked them to important functions, including Ca2+ signal transduction pathways and apoptosis (37-39).

Although a lack of calpain-specific, cell-permeable inhibitors has largely prevented the identification of unambiguous functions for these enzymes, experiments involving the overexpression (34) or injection (35) of calpain's endogenous inhibitor, calpastatin, have provided key insights into calpain function.

Calpains have been called biomodulators because their physiological activity involves cleavage of proteins at inter-domain boundaries. Rather than causing simple digestion, calpain cleavage often serves to modulate the function of substrates, thereby affecting pathways further downstream (18). For example, it was recently shown that calpain-mediated cleavage of the proapoptotic protein Bax into an 18 kDa fragment commits cells to the apoptotic pathway, since the 18 kDa fragment retains apoptotic activity and, unlike full-length Bax, is not inhibited by the anti-apoptotic protein Bcl-2 (39). This feature of limited digestion, coupled with the dependence on Ca2+, has long intrigued researchers and has made calpain an extensively studied enzyme over the last two decades.

Although a rather large list of calpain substrates has been identified both in vitro and in vivo, calpain is considered to be selective in its choice of substrates, yet it does not have a strong consensus sequence at the cleavage site (14, 18). Some of the more notable calpain substrates identified include p53, protein kinase C, spectrin, Ca2+-ATPase, talin and fibronectin (14-18, 40-42).

1.3.4 Pathological Roles

Much interest in calpain stems from its involvement in several pathological conditions. These pathological states are generally attributed to conditions of altered Ca2+ homeostasis, which can result in disrupted regulation and excessive activation of calpain (43). Excessive proteolysis by calpain has been observed in several neuropathological states, including Huntington's disease, Parkinson's disease and Alzheimer's disease (14, 44). Perhaps best characterized is the hyper-activation of calpain that seems to contribute to the extensive tissue damage accompanying cerebral and cardiac ischemia (43). In vivo studies have illustrated that administration of calpain inhibitors can significantly reduce infarct size in mammals such as gerbils and rats (45, 46). Consequently, calpain is considered an important therapeutic target, since specific inhibitors could be valuable in the treatment of these diseases.

1.3.5 Catalytic Mechanism

Sequence alignments have shown that calpain possesses the same catalytic triad residues (Cys105, His262 and Asn286) as conventional cysteine proteases such as papain (Cys25, His159 and Asn175) (19), and thus the catalytic mechanism is expected to be similar, if not identical. Supporting this are studies showing that the m-calpain mutants Cys105Ser, His262Ala and Asn286Ala had no catalytic activity (47).

Catalysis by cysteine proteases obeys Michaelis-Menten kinetics and proceeds through a multi-step process as illustrated in Figure 1.4 (48). In the initial step, the catalytic Cys and His form an ion pair as the imidazole (Im) side chain acts as a general base, accepting a proton from the thiol group. Upon substrate binding, the thiolate anion serves as the nucleophile, attacking the carbonyl carbon of the scissile amide bond. The reaction proceeds through a negatively charged tetrahedral intermediate (THI1), which is stabilized by the side chain of a key Gln residue as well as by the backbone amide of the catalytic Cys, to a covalent acyl-enzyme intermediate, releasing the C-terminal peptide fragment. A water molecule then initiates the deacylation step, which also proceeds through a tetrahedral intermediate (THI2), liberating the N-terminal peptide fragment and restoring the enzyme to its original state. Although these major events accurately describe the general catalysis, other factors can influence the rate of the reaction through effects on substrate binding and catalytic efficiency (48, 49).

Figure 1.4. Catalytic mechanism of cysteine proteases. A schematic representation of the mechanism of substrate cleavage by cysteine proteases, showing the binding, acylation, deacylation and dissociation steps. See section 1.3.5 in the text for details and abbreviations. The TS labels represent postulated transition states. (Reproduced from reference 48.)

1.3.6 Regulation of Calpain Activity

Although the physiological functions of calpain remain somewhat obscure, its importance is hinted at by the extensive array of mechanisms that regulate calpain activity in vivo.

Ca2+ is the primary regulator of calpain activity. Since enzymatic activity is absolutely dependent on Ca2+, it is assumed that the binding of this divalent cation causes a conformational change that somehow results in activation of the enzyme. Given that calpain possesses two domains of calmodulin-like origin, it has been the convention to assume that the EF-hands are the primary factors involved in the Ca2+ response. This line of thinking has been supported by mutagenesis studies that have targeted the EF-hand motifs in both domain-IV and domain-VI (50). As seen in Figure 1.5, disrupting the EF-hand motifs results in a significant increase in the [Ca2+] required for activity of the enzyme. As in other EF-hand proteins, binding of Ca2+ to calpain occurs in a cooperative fashion, suggestive of multiple-site binding.
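Cooperative Ca2+ activation of the kind shown by the sigmoidal curves in Figure 1.5 is commonly summarized with the Hill equation. The Python sketch below is illustrative only: the half-maximal concentration `k_half_um` and the Hill coefficient `n_hill` are hypothetical parameters, not values fitted to calpain data.

```python
def fraction_active(ca_um: float, k_half_um: float, n_hill: float) -> float:
    """Hill equation: fractional activity at a given free Ca2+ concentration.

    ca_um     -- free Ca2+ concentration (uM)
    k_half_um -- Ca2+ concentration giving half-maximal activity (uM); hypothetical
    n_hill    -- Hill coefficient; n > 1 indicates positive cooperativity
    """
    if ca_um <= 0:
        return 0.0
    return ca_um**n_hill / (k_half_um**n_hill + ca_um**n_hill)


# Illustrative comparison with hypothetical parameters: the same half-maximal
# point (300 uM) with non-cooperative (n = 1) versus cooperative (n = 3) binding.
for n in (1.0, 3.0):
    low = fraction_active(30.0, k_half_um=300.0, n_hill=n)
    high = fraction_active(3000.0, k_half_um=300.0, n_hill=n)
    print(f"n={n}: {low:.3f} at 30 uM, {high:.3f} at 3000 uM")
```

With n = 3, a tenfold change in [Ca2+] on either side of the half-maximal point moves activity from roughly 0.1% to roughly 99.9%, whereas with n = 1 the same span covers only about 9% to 91%. That steepness is what a sigmoidal activation curve reflects.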

How calpain is activated by Ca2+ in vivo is much more difficult to understand, since the concentration of Ca2+ required for calpain activity in vitro is considerably higher than the physiological [Ca2+]. Multiple mechanisms have been suggested as a means to circumvent this apparent paradox, including transient increases in the intracellular [Ca2+], as well as several additional factors which will be described below (14-18).

Figure 1.5. Role of EF-hands in the regulation of calpain by Ca2+. Disruption of Ca2+ binding at various EF-hands in calpain dramatically increases the [Ca2+] required for half-maximal activation in vitro. Mutations in EF3 in both the catalytic (a) and regulatory (b) subunits have the most dramatic effect, suggesting that EF3 makes the most significant contribution to the activation of calpain by Ca2+ (50). (Refer to section 1.3.7 for more detailed information on the EF-hand arrangement in calpain.) The sigmoidal appearance of these curves is indicative of cooperative Ca2+ binding, suggesting that multiple Ca2+ ions contribute to the activation of calpain. Legend: a) wild-type m-calpain (21k/m80k); 21k/m80k(EF l/W); 21k/m80k(EF1/2); 21k/m80k(EF1/3); 21k/m80k(EFU3). b) wild-type m-calpain (21k/m80k); 21k(EF1/2/3/4)/m80k; 21k(EF1/2/3)/m80k; 21k(EF1/3/4)/m80k; 21k(EF2/3/4)/m80k. As an example, 21k/m80k(EF1/3) indicates that EF-hands 1 and 3 were disrupted in the catalytic subunit. Both panels plot activity against calcium concentration (μM).

The N-terminal regions of both the catalytic and regulatory subunits of μ- and m-calpain are autolytically cleaved very rapidly after addition of Ca2+ (51-53). Although the physiological relevance of autolysis has not been conclusively demonstrated, it is well established that cleavage of a short N-terminal peptide in the catalytic subunit reduces the in vitro Ca2+ requirement of μ-calpain from ~10-50 μM to ~1 μM and of m-calpain from ~300 μM to ~100 μM (53). This finding has led to the speculation that autolysis of μ-calpain might be sufficient to activate the enzyme in vivo. Autolytic cleavage of the N-terminal 86 residues of the regulatory subunit, which constitute the glycine-rich domain-V, does not affect the Ca2+ requirement, and its significance is not well understood (54).
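The autolysis figures quoted above can be restated as fold reductions in the in vitro Ca2+ requirement. The Python sketch below simply divides the approximate before and after values from the text (53); splitting the reported 10-50 μM range for μ-calpain into its two ends is an illustrative choice.

```python
# Approximate in vitro Ca2+ requirements (uM) before and after autolysis of the
# catalytic-subunit N-terminus, as quoted in the text (53). For mu-calpain the
# unautolysed value is reported as a range, so both ends are kept.
CA_REQUIREMENT_UM = {
    "mu-calpain (low end)": (10.0, 1.0),
    "mu-calpain (high end)": (50.0, 1.0),
    "m-calpain": (300.0, 100.0),
}


def fold_reduction(before_um: float, after_um: float) -> float:
    """Return how many times lower the Ca2+ requirement is after autolysis."""
    return before_um / after_um


for form, (before, after) in CA_REQUIREMENT_UM.items():
    print(f"{form}: {before:g} -> {after:g} uM, "
          f"{fold_reduction(before, after):.0f}-fold reduction")
```

The comparison makes plain why the speculation centres on μ-calpain: its requirement falls 10- to 50-fold, whereas that of m-calpain falls only about threefold.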

1.3.6.3 Subunit Dissociation

Suzuki's laboratory has demonstrated that under certain conditions, Ca2+ can cause the regulatory subunit to dissociate in vitro, liberating a catalytic subunit that retains full catalytic activity (55, 56). The isolated catalytic subunit has a reduced Ca2+ requirement and a capacity for membrane association, and it has been suggested that this may be an important regulatory mechanism in vivo (18). There remains some uncertainty about this model, since it was not supported by studies involving immunoprecipitation and affinity chromatography of natural and mutant calpains in the presence of Ca2+ (57, 58).

1.3.6.4 Phospholipids and Membrane Association

Although calpain is typically found in the cytosol, fractionation and immunolocalization experiments have demonstrated that it translocates to the plasma membrane in a Ca2+-dependent manner (15, 59, 60). This observation is consistent with the assumption that the primary function of calpain is cytoskeletal remodeling, and that most preferred calpain substrates are associated with the cytoskeletal matrix (14-18). Membrane phospholipids have also been shown to be key factors affecting the sensitivity of calpain to Ca2+. In particular, L-α-phosphatidylinositol has been reported to reduce the [Ca2+] required for autolysis in vitro (61), which in turn reduces the [Ca2+] needed for substrate proteolysis (53). Neomycin, a polyphosphoinositide-binding antibiotic, can inhibit the effects of phosphatidylinositol, suggesting that a specific mechanism for phospholipid binding does exist (62).

1.3.6.5 Calpain Activator Proteins

Although most research devoted to calpain activation has focused on autolytic conversion or membrane association, recent experiments have reported the isolation of calpain-specific activator proteins (63, 64). A protein known as UK114, isolated from bovine and rat brain, is completely specific for μ-calpain, and reduces the [Ca2+]0.5 for substrate proteolysis from ~25 μM to ~500 nM in vitro (63). Similarly, the acyl-CoA-binding protein, isolated from rat skeletal muscle, specifically activates m-calpain, reducing the [Ca2+]0.5 for activity in vitro from ~300 μM to ~10 μM (64). Much work remains to be done to determine their precise mechanisms of calpain activation and to establish their potential physiological relevance.

Like many other proteases, calpain is susceptible to inactivation by a specific inhibitor protein, known as calpastatin (15, 53). Various isoforms of calpastatin exist, but the most commonly found form is a ~110 kDa protein that inhibits calpain at a stoichiometry of 4 calpain heterodimers per molecule of calpastatin (15). Calpastatin is thought to exert its inhibitory effect by binding both to the active site cleft of calpain and to the EF-hand domains; binding at the latter might inhibit the Ca2+-induced conformational change. Since calpastatin has not been observed at the membrane, it is often suggested that calpain migrates to the cell membrane in response to Ca2+ in order to escape inhibition by calpastatin (15, 65).
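Because one calpastatin molecule inhibits up to four calpain heterodimers (15), the amount of inhibitor needed for complete inhibition follows from simple molar arithmetic. In the Python sketch below, the default molecular masses (~101 kDa for a heterodimer of 80 kDa and 21 kDa subunits, ~110 kDa for calpastatin) are approximate, and occupancy of all four inhibitory sites is assumed; the function name is hypothetical.

```python
CALPAINS_PER_CALPASTATIN = 4  # stoichiometry quoted in the text (15)


def calpastatin_needed_mg(calpain_mg: float,
                          calpain_kda: float = 101.0,
                          calpastatin_kda: float = 110.0) -> float:
    """Mass of calpastatin (mg) needed to saturate a given mass of calpain.

    Assumes every calpastatin molecule binds four heterodimers. The default
    molecular masses are approximate and chosen for illustration only.
    """
    calpain_umol = calpain_mg / calpain_kda          # 1 kDa = 1 mg per umol
    calpastatin_umol = calpain_umol / CALPAINS_PER_CALPASTATIN
    return calpastatin_umol * calpastatin_kda        # back to mg


print(f"{calpastatin_needed_mg(1.0):.3f} mg calpastatin per mg calpain")
```

Since the two proteins have similar masses, the 4:1 stoichiometry means that roughly a quarter of the calpain mass in calpastatin suffices, under these assumptions.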

1.3.7 Crystal Structure of the Regulatory Subunit

A major step forward in the structure-function analysis of calpain was made with the recent crystal structure determination of domain-VI of the regulatory subunit, in both the Ca2+-free (66) and Ca2+-bound forms (66, 67). Surprisingly, the structures revealed that domain-VI contained 5 EF-hand structures, rather than the 4 predicted from the primary sequence (19) (Figure 1.6). The most N-terminal EF-hand (EF1) differs from the consensus EF-hand sequence and thus was not previously predicted to bind Ca2+.

From the structure, EF1 to EF3 were shown to bind Ca2+ in the presence of 1 mM Ca2+, while EF4 bound a Ca2+ ion only at 200 mM Ca2+ (66). From this finding, it was suggested that EF1 to EF3 represent the physiologically relevant EF-hands as far as regulation of activation by Ca2+ is concerned. EF2 and EF3 possess typical CaM-like consensus EF-hand sequences, and were shown to coordinate Ca2+ in the expected pentagonal bipyramidal geometry. EF1, which does not contain a consensus EF-hand motif, also coordinates Ca2+ in a pentagonal bipyramidal geometry, but does so in an atypical manner. EF4, which probably does not bind Ca2+ under physiological conditions, coordinates the Ca2+ ion in an unusual manner, with 8 ligands instead of the usual 7 (Figure 1.7) (66).

EF5, which is unpaired, does not bind Ca2+, and instead interacts with the corresponding EF-hand in a second molecule, resulting in homodimer formation, a feature not previously observed in any EF-hand protein (68). Given the high sequence identity (~50%) between domain-VI and domain-IV of the catalytic subunit, this finding suggested that heterodimer formation occurs in the same fashion in μ- and m-calpain.

Another interesting finding from the regulatory subunit structure was the relatively small conformational change induced by Ca2+. The r.m.s.d. between the Ca2+-bound and Ca2+-free states (66) is only 1.77 Å for all Cα atoms (Figure 1.8). The largest structural changes were observed in the vicinity of EF1. In this region, 18 Cα atoms have an r.m.s.d. of 4.34 Å, and the orientation of EF2 shifts by 18° with respect to EF1 (66). While somewhat unexpected, these small conformational changes at EF1 in the Ca2+-binding domains may be amplified in the intact heterodimer through domain-III or domain* (66, 67).

Figure 1.6. Crystal structure of domain-VI of the calpain regulatory subunit. Domain-VI (yellow) forms a homodimer both in the absence (not shown) and presence of Ca2+ ions (purple) (66, 67). Calpain was the first member of the EF-hand superfamily identified to have an odd number of EF-hands, as each monomer of domain-VI has five. EF1 and EF2 (blue) form one pair, while EF3 and EF4 (green) form a second pair. A novel function for an EF-hand was discovered, as EF5 (red) from each monomer interacts to form an intermolecular pair, forming the structural basis of homodimer formation. Given that domain-VI is ~50% identical to domain-IV in the catalytic subunit, it was suggested that heterodimerization occurs through a similar mechanism involving the EF5 "embrace". (Coordinates from PDB code 1DVI.)

Figure 1.7. Coordination geometry of Ca2+ by the EF-hands in domain-VI of calpain. a) EF2 (not shown) and EF3 (shown) have a typical pentagonal bipyramidal arrangement of ligands for chelation of a Ca2+ ion (yellow spheres). b) EF1 has a somewhat unusual arrangement of ligands compared to CaM, but still coordinates Ca2+ in a pentagonal bipyramidal geometry. c) EF4 was found to bind Ca2+ only at very high (~300 mM) concentrations, and presumably does not bind Ca2+ in vivo. The Ca2+-coordination geometry at EF4 is atypical in that it has 8 ligands, rather than the usual 7. EF5 (not shown) does not bind Ca2+ even at a [Ca2+] of 200 mM (66, 67). All images in Figure 1.7 are stereodiagrams. (Adapted from Blanchard et al., 1997) (66).

Figure 1.8. Ca2+-induced conformational changes in domain-VI. The binding of Ca2+ to domain-VI of calpain induces a relatively small conformational change compared to other EF-hand proteins like CaM or troponin C. A stereodiagram of the overlap of Ca2+-free domain-VI (red) with Ca2+-bound domain-VI (green) is shown. Ca2+ ions are indicated as yellow spheres at EF-hands 1 to 3. In calpain, the major change upon Ca2+ binding is localized to the region involving EF1 (at α1), the most N-terminal EF-hand. It was speculated that these rather small conformational changes may be amplified around the EF1 "hinge" in the heterodimer, resulting in activation of the proteolytic activity (66, 67). (Figure adapted from Blanchard et al., 1997) (66)
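The r.m.s.d. values quoted for domain-VI (1.77 Å over all Cα atoms, 4.34 Å over 18 Cα atoms near EF1) are root-mean-square distances between equivalent atoms of the two states. The Python sketch below computes this quantity for two matched lists of Cα coordinates. It assumes the structures have already been optimally superimposed (for example by a least-squares fit), a step it does not perform itself, and the coordinates in the example are made up.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation, in the units of the input (e.g. angstroms).

    a and b must list equivalent atoms (e.g. matching C-alpha atoms) in the
    same order, in coordinates that are already superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    sq_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(sq_sum / len(a))


# Made-up example: three atoms, one displaced by 3 A, two unchanged.
ca_free = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_bound = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
print(f"r.m.s.d. = {rmsd(ca_free, ca_bound):.2f} A")
```

Because the deviation is averaged over every atom included, a large shift confined to a few residues (as around EF1) can coexist with a modest global value, which is why both a global and a local r.m.s.d. are reported.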

1.3.8 Tissue-Specific and Non-Mammalian Calpains

1.3.8.1 Large Subunit Homologues

In recent years, many tissue-specific calpain isoforms have been discovered, in most cases only as cDNA sequences and not yet as proteins (14, 21-26). Among these isoforms, calpain-3, or p94, has drawn the most attention, since defects in the human p94 gene that abolish its function result in limb-girdle muscular dystrophy 2A (22, 24).

Furthermore, gene-knockout experiments illustrated that transgenic mice lacking p94 develop a muscular dystrophy phenotype similar to that in humans (69).

p94 is specifically expressed in skeletal muscle, where its mRNA is found at high levels, ~10 times in excess of the conventional calpains (70). At the primary sequence level, p94 is similar to the conventional μ- and m-calpains, although it has a unique N-terminal region and two large insertion sequences, one of which contains a putative nuclear localization sequence (70). p94 has a unique biochemical profile: it is rapidly auto-digested, it is not inhibited by calpastatin or other typical calpain inhibitors, and it has not been found to associate with the regulatory subunit (14).

Furthermore, it has been suggested that p94 is a Ca2+-independent protease, in spite of the fact that it also contains 5 EF-hand motifs (14). However, this observation has been contradicted (71) and remains unclear, since in vitro studies with p94 are complicated by its extremely rapid autolytic breakdown. Physiologically, the function of p94 has not yet been described, although it is known to bind to the gigantic protein titin (connectin) specifically through its C-terminal insertion sequence (72).

Several atypical isoforms lacking the Ca2+-binding EF-hand domain, such as TRA-3, have recently been identified (14). The function of these proteins remains largely uncertain, although TRA-3 is required for sex determination in C. elegans (73, 74). Interestingly, evidence suggests that TRA-3 requires Ca2+ for activation (73), in spite of having no putative EF-hand motifs. The primary sequence of TRA-3 suggests it may contain a C2 domain (a domain known to bind Ca2+), and it has been suggested that Ca2+ binding to this region could be responsible for activation (73). Another calpain isoform lacking the EF-hand domain (calpain-10) has recently been linked to type 2 diabetes through positional cloning (75). For information regarding several additional large subunit homologues, the reader is directed to several reviews (14-18).

1.3.8.2 Small Subunit Homologues

Although the regulatory subunit of calpain is similar to "typical" EF-hand proteins such as calmodulin and troponin C, proteins displaying much higher levels of homology have been identified, including sorcin, grancalcin and ALG-2 (14). Like the isolated regulatory subunit of calpain, these proteins are known to form homodimers, presumably through a fifth unpaired EF-hand. This mode of homodimer formation was recently confirmed in grancalcin by its crystal structure (76). Sorcin and grancalcin also have glycine-rich domains immediately N-terminal to the EF-hand domain, suggesting an additional conservation of function with the calpain regulatory subunit (14).

1.4 Proteases and Zymogens

In order to protect themselves from excessive or untimely proteolysis, cells have evolved a mechanism that involves synthesizing cellular proteases in catalytically inactive forms, or zymogens (77). Upon stimulation by an appropriate cellular signal, a zymogen is converted to an active protease, typically by a pathway involving proteolytic processing of the zymogen, either autolytically or through specific maturases (77). X-ray crystallography has been instrumental both in determining the structural requirements for inactivity in zymogens and in revealing the mechanism of activation.

Structures have now been determined for both active and inactive forms of proteases within most major protease families, including cysteine, serine, aspartic and metallo-proteases (77). While the folds of proteases within these families range from nearly identical to entirely different, they all have a common catalytic requirement: a specific geometry of active site residues. Interestingly, structural analysis has revealed that in almost all cases, the active sites of zymogens and active proteases are virtually indistinguishable. In other words, the catalytic machinery is "primed" for use upon synthesis of the enzyme. Catalytic activity is inhibited in these enzymes by the zymogen pro-segment, an extension of the mature enzyme that physically obstructs the active site and prevents access of substrates. Production of a catalytically active species absolutely requires the removal of the pro-segment from the active site.

One example has recently been described in which a significant degree of active site assembly contributes to activation. In proplasmepin II, the pro-segment does not block the active site, which is "immature", and removal of the pro-segment promotes active site formation (78). The activation of profactor D, which is otherwise a normal serine protease, may follow a unique pathway, involving re-orientation of the catalytic residues to form a self-inhibited mature enzyme, which is activated only by the binding of a substrate (79).

1.4.1 Structure and Regulation of Cysteine Proteases

Cysteine proteases of the papain family have a fold consisting of an N-terminal domain of predominantly α-helix and a C-terminal domain of mostly β-sheet (Figure 1.9) (77, 80-81). The active site cleft resides at the interface of these two domains, with the catalytic cysteine contributed by the N-terminal domain and the histidine and asparagine contributed by the C-terminal domain. The catalytic mechanism of cysteine proteases, as described in section 1.3.5, requires a specific geometry of the catalytic triad so that the Cys-His ion pair can form and the thiolate anion can subsequently initiate the nucleophilic attack on a substrate (48). This specific active site geometry has been observed and confirmed in numerous crystal structures of cysteine proteases (77, 80-85).

The crystal structures of all cysteine protease zymogens determined to date have illustrated that the catalytic triad is always in the correct geometry for catalysis, but the protease is inactive because the active site is occupied by the N-terminal pro-segment (82-85). The pro-segment binds across the active site cleft in a reverse orientation to that of native substrates, thereby preventing their access to the active site and rendering the protease inactive (Figure 1.10). Activation of the protease therefore must involve autolytic removal of the pro-peptide to allow native substrates to bind.

Figure 1.9. Crystal structures of cysteine proteases. Cysteine proteases in the papain family consist of a mainly α-helical domain and a mainly β-sheet domain, with the active site residues (Cys, His and Asn) forming a catalytic triad at the interface of these domains. Papain (red), cathepsin B (green) and other cysteine proteases often have very similar structures. (PDB codes: Papain, 9PAP; Cathepsin B, 1HUC)

Figure 1.10. Zymogen inactivation in cysteine proteases involves the N-terminal pro-segment. The X-ray crystal structures of cathepsin B (blue) (81) and procathepsin B (purple) are virtually identical. The pro-segment (red) of procathepsin B occupies the active site, preventing access of native substrates, thereby inactivating the protease. The catalytic triad residues, as shown, are in the identical conformation in both the inactive (blue) and active (red) forms of the enzyme. Activation absolutely requires removal of the pro-segment from the active site cleft. This generally occurs due to a decrease in the pH, which disrupts the interactions between the pro-segment and the active site cleft. (PDB codes: Cathepsin B, 1HUC; Procathepsin B, 1MIR)

1.5 Methods for Protein Structure Determination

Three-dimensional analysis of protein structures is generally accomplished using the techniques of either nuclear magnetic resonance (NMR) spectroscopy or X-ray crystallography. Since protein crystallization is often a long and tedious process guided by little rationale or scientific basis, NMR has a significant advantage over crystallography in that protein crystals are not required for structure determination. An additional advantage of NMR is that it is better suited to studying the dynamic processes associated with proteins, such as conformational changes. Crystal structures, on the other hand, typically represent a static or single "lowest energy" conformation. Despite these limitations, X-ray crystallography is generally the preferred method for three-dimensional structure determination. Three main factors have continued to contribute to the success of crystallography compared to NMR. First, NMR is unable to determine structures of molecular weight greater than ~30,000 Daltons, while X-ray crystallography has been successful in the determination of structures with molecular weights greater than 1 million Daltons (86). Second, X-ray crystallography can achieve a much higher level of detail than NMR, often to atomic resolution. This latter point is significant since it is frequently the specific conformation of side-chain residues in proteins that contributes to function. In most cases, NMR is unable to provide enough detail for determination of side-chain orientation, which can severely limit the biological interpretation of the structure. Third, the time required for the structure determination of a typical protein is significantly less using X-ray crystallography, in spite of the difficulties often accompanying crystal growth. This is largely due to several technological advances that have made the multiwavelength anomalous dispersion (MAD) method a practical tool for solving crystal structures. The MAD method and the recent technological achievements contributing to its success will be discussed in more detail in Appendix 1.

1.6 Objectives and Approach

X-ray crystallography is a powerful tool for comprehensive analysis of the structure-function relationships of proteins. This technique has been used by biochemists to gain insights into the complex mechanisms of enzyme regulation. Understanding of protease regulation in particular has benefited significantly from the many crystal structures that have been determined in both the active and zymogen states (77).

Although calpain has several zymogen-like characteristics, its Ca2+-dependent regulatory mechanism is far more complex than those of other cysteine proteases. For instance, the N-terminal region of calpain is not similar to those of other cysteine proteases, and its removal is apparently not a strict requirement for activity (87). Even after autolytic removal of the N-terminal peptide, Ca2+ is still required for activation, suggesting that additional mechanisms of inhibition exist (88). The precise mechanism of calpain regulation by Ca2+ is a fundamental biochemical question that has remained poorly understood, largely owing to the lack of available structural information. Thus, the goal of this project was to subject calpain to a comprehensive structure-function analysis, so that we might better understand the complex mechanisms regulating its activity. Specific biochemical questions that a structure might address include the nature of Ca2+-dependent regulation, the mechanism of zymogen inactivation and, subsequently, potential activation mechanisms. Further, the ways in which autolysis, subunit dissociation and phospholipid binding affect the sensitivity of calpain to Ca2+ might be addressed. A structure might also provide insights into the origin of the high Ca2+ requirement of calpain, as well as the reasons for the difference in Ca2+ affinity between μ- and m-calpain. Finally, a crystal structure of calpain might eventually assist in the development of therapeutic drugs used to treat pathologies including heart attack, stroke and Alzheimer's disease.

Chapter 2: Materials and Methods

2.1 Sources and Description of Rat m-Calpain

For the purposes of structure determination, this project employed a recombinant rat m-calpain that differs slightly from the natural enzyme. For the catalytic large subunit, the active site mutant Cys105Ser was largely used in place of wild-type, since this mutation completely abolishes activity of the enzyme (47) and therefore ensured that autodegradation could not occur during purification and crystallization. The large subunit also had a 14-residue C-terminal extension ending in a 6× histidine tag, which facilitated purification and increased the expression yield (89). The small subunit consisted of domain-VI only, containing residues 87-268 of the wild-type sequence. This latter construct was chosen since calpains containing domain-V are poorly expressed and prone to degradation in E. coli, thereby giving rise to heterogeneous populations of enzyme.

The absence of domain-V does not affect the specific activity or Ca2+ dependence of the enzyme (49, 88).

pET vectors (Novagen) and modified pET vectors (pACpET) were previously used in the laboratory of Dr. John Elce of this Department as vehicles for cloning the calpain subunits (89). The plasmid pET-24-m-80k-CHis6 encodes the m-calpain 80 kDa subunit terminating with a C-terminal histidine tag. The plasmid pACpET-21k encodes the C-terminal 184 residues of the regulatory subunit (referred to as domain-VI or the 21 kDa subunit). The plasmid pET-24-C105S-m-80k-CHis6 encodes the same protein as pET-24-m-80k-CHis6 except that the active site mutation Cys105Ser was introduced. These plasmids were made available from Dr. Elce's laboratory as part of our extensive collaboration.

2.2 Recombinant Protein Expression and Purification

Purified calpain was obtained using a modified version of the protocol previously published by Elce et al. (89). In order to minimize calpain breakdown, all procedures were carried out at 4°C as efficiently and rapidly as possible. All buffers were freshly prepared, filter-sterilized through 0.2 μm or 0.4 μm filters, stored at 4°C and supplemented with a trace amount (0.01%) of sodium azide to prevent fungal contamination.

Co-expression of the compatible plasmids pET-24-m-80k-CHis6 and pACpET-21k within E. coli strain BL21(DE3) gave rise to calpain subunits that associate in the cytosol of E. coli to yield an active heterodimer. Bacterially expressed calpain did not appear to differ in any significant manner from calpain isolated from natural sources (89, 90). The antibiotics kanamycin (50 μg/mL) and ampicillin (100 μg/mL) were added to the E. coli growth medium to ensure selection of the 80 kDa and 21 kDa subunits, respectively. Stock solutions of the antibiotics in water were prepared by filter sterilization and stored frozen at -20°C. Frozen permanents of BL21(DE3) strains containing the aforementioned plasmids were prepared by adding 70 μL of the cryoprotectant DMSO to 930 μL of an overnight culture grown in Luria-Bertani (LB) medium and storing the cells at -70°C.

Frozen permanent cells were streaked onto petri dishes containing LB-agar supplemented with both kanamycin (50 μg/mL) and ampicillin (100 μg/mL) and incubated overnight at 37°C. The next day, a single, well-formed colony was transferred to 10 mL LB, placed at 37°C and shaken vigorously. After the culture had reached significant turbidity, indicative of log-phase growth, the 10 mL culture was divided in two in order to initiate two 300 mL cultures containing fresh antibiotics. The 300 mL cultures, which originated from the same colony, were shaken vigorously at 30°C overnight. The 300 mL cultures were transferred the following day to two 6 L flasks containing 4 L of LB that had been pre-warmed to 37°C. Just prior to addition of the 300 mL cultures, fresh antibiotics and antifoam-289 (Sigma) were added to the pre-warmed LB. Cultures were agitated at 37°C by aeration to an OD600 of approximately 1.2 to 1.5, at which point the temperature was reduced to 22°C and 500 mg (~0.5 mM) of IPTG was added to induce overexpression of calpain. Cells were induced overnight (typically 8-10 hours) in order to obtain the maximal yield of recombinant calpain.
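The approximate IPTG concentration given above follows from the mass added and the culture volume. A short Python check, assuming an IPTG molar mass of about 238.3 g/mol and that 500 mg was added to each 4 L culture:

```python
IPTG_MOLAR_MASS_G_PER_MOL = 238.3  # isopropyl beta-D-1-thiogalactopyranoside


def molar_conc_mm(mass_mg: float, volume_l: float, molar_mass: float) -> float:
    """Concentration in mM from mass (mg), volume (L) and molar mass (g/mol)."""
    mmol = mass_mg / molar_mass   # mg divided by g/mol gives mmol
    return mmol / volume_l        # mmol per litre is mM


iptg_mm = molar_conc_mm(500.0, 4.0, IPTG_MOLAR_MASS_G_PER_MOL)
print(f"IPTG: {iptg_mm:.2f} mM")
```

The result, about 0.52 mM, is consistent with the ~0.5 mM stated for induction.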

Bacterial cells were harvested by low-speed centrifugation at 4°C and resuspended in a lysis buffer consisting of 25 mM Tris-HCl pH 7.6, 5 mM EDTA, 5% glycerol, 10 mM β-mercaptoethanol and approximately 0.01 mg/mL of the serine protease inhibitor PMSF. Resuspended cells were frozen at -20°C, thawed, then sonicated in several short intervals using a Branson sonifier. Cytosolic proteins were separated from cellular debris by centrifugation at 20000 rpm for 1 hour at 4°C.

To obtain a calpain sample of sufficient purity for crystallization, several successive chromatographic steps were required. The sonicated cell-lysate supernatant was added to ~150 mL DEAE-Sephacel (Pharmacia) anion-exchange resin in DEAE buffer (25 mM Tris-HCl pH 7.6, 5 mM EDTA, 10 mM β-mercaptoethanol) supplemented with 0.2 M NaCl, in a total volume of ~600 mL. After thorough mixing, the resin containing the bound protein was allowed to settle, and the supernatant containing the unbound protein was decanted and discarded. The slurry was poured into a column and washed with a further 200 mL of DEAE buffer plus 0.2 M NaCl, and a linear gradient of NaCl from 0.2 M to 1.0 M (in DEAE buffer) was applied using a pump at 1.0 mL/min. Fractions were examined for the presence of calpain by a variety of methods such as the Bradford assay (91), SDS-PAGE, casein zymography (92) or the standard casein assay (88). When the inactive Cys105Ser mutant was being purified, the activity assays (casein assay or casein zymography) were not used. These methods will be discussed more fully below. The calpain heterodimer eluted from the DEAE column at roughly 0.3 to 0.4 M NaCl. Fractions determined to contain a significant amount of calpain were pooled and prepared for the next chromatographic purification step, which was typically a Ni2+-NTA-agarose (Qiagen) affinity column.
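In a linear salt gradient, the NaCl concentration at any point is a linear interpolation between the start and end concentrations, so a reported elution window can be converted into a position along the gradient. The Python sketch below does this for the DEAE step (0.2 to 1.0 M NaCl, elution at roughly 0.3 to 0.4 M). It ignores column dead volume and gradient-mixer delay, both of which shift a real elution profile, and the function names are hypothetical.

```python
def gradient_conc(start_m: float, end_m: float, fraction: float) -> float:
    """Salt concentration (M) after a given fraction (0-1) of a linear gradient."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return start_m + (end_m - start_m) * fraction


def gradient_position(start_m: float, end_m: float, conc_m: float) -> float:
    """Fraction (0-1) of a linear gradient at which conc_m is reached."""
    return (conc_m - start_m) / (end_m - start_m)


# DEAE step: calpain eluted at roughly 0.3-0.4 M NaCl in a 0.2-1.0 M gradient.
for c in (0.3, 0.4):
    pos = gradient_position(0.2, 1.0, c)
    print(f"{c} M NaCl is reached {pos:.1%} of the way through the gradient")
```

On this idealized view, calpain elutes within the first quarter of the DEAE gradient. The same functions apply to the Q-Sepharose step described later (0.3 to 0.7 M, elution near 0.55 M, about 62% of the way through).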

Since the pooled fractions contained 5 mM EDTA, it was necessary to remove or sequester the EDTA before proceeding to the Ni2+-NTA step. In most cases, an excess of MgCl2 (typically 10 mM) was added, along with an additional 200 mM Tris-HCl pH 7.6, which prevented the pH from dropping following the formation of the EDTA-Mg complex (which releases protons). As an alternative to MgCl2, in some cases ammonium sulfate was added (0.36 g/mL) to precipitate the calpain sample, which was then recovered by centrifugation and resuspension in N-buffer (50 mM Tris-HCl pH 7.6, 5 mM imidazole, 2% glycerol). The pooled protein was then applied directly to the Ni2+-NTA-agarose affinity resin (~10 mL) in a column format. The column was washed with ~100 mL of N-buffer supplemented with 0.5 M NaCl to remove non-specifically bound protein. To elute the specifically bound protein from the column matrix, a linear gradient of 10-350 mM imidazole in N-buffer plus 0.2 M NaCl was applied at 1.0 mL/min. Fractions of 5 mL were collected into tubes that already contained a Tris-HCl/EDTA/β-mercaptoethanol mixture, so that the final concentrations within the fractions were 50 mM, 2 mM and 10 mM, respectively. Fractions containing calpain were identified as described above, pooled, and concentrated to approximately 0.5 mL for gel-filtration chromatography by centrifugation in a BioMax 30-kDa Mr exclusion device (Millipore).

For gel-filtration chromatography, two column resins were employed with comparable success. Initial experiments were performed using Ultrogel AcA44 resin, while experiments in the later stages of this project used Sephacryl S200-HR (Pharmacia). In both cases, the column was pre-equilibrated for at least 12 hours prior to use in Gel-Filtration buffer (50 mM Tris-HCl pH 7.6, 2% glycerol, 0.2 M NaCl, 10 mM β-mercaptoethanol, 2 mM EDTA, 0.01% sodium azide). Glycerol was added to the ~0.5 mL concentrated sample to a final concentration of 20%. The sample was then pumped directly onto the top of the gel-filtration resin at a flow rate of 0.5 mL/min. Gel-filtration buffer was run through the column overnight to elute the proteins from the matrix. As in the previous chromatographic steps, fractions containing calpain were identified and pooled.

The final purification step used an FPLC Q-Sepharose Fast Flow 16/10 anion-exchange column (Pharmacia). The pooled fractions from the gel-filtration step (typically a total of ~30-40 mL) were loaded into a Superloop (Pharmacia). They were injected onto the Q-Sepharose column at 2.5 mL/min in 0.3 M NaCl, 50 mM Tris-HCl pH 7.6, 2 mM EDTA and 10 mM β-mercaptoethanol. The column was washed with a further 100 mL of this buffer at 6.0 mL/min. To elute purified calpain from the column, a linear gradient of 0.3 to 0.7 M NaCl (in 50 mM Tris-HCl pH 7.6, 2 mM EDTA and 10 mM β-mercaptoethanol) was applied at a flow rate of 5.0 mL/min. Under these conditions, purified m-calpain typically eluted from Q-Sepharose at ~0.55 M NaCl. Purified calpain was then concentrated by centrifugation in a BioMax 30-kDa Mr exclusion device to approximately 10 mg/mL. The final buffer contained 50 mM Tris-HCl pH 7.6, 100 mM NaCl, 200 µM EDTA and usually 10 mM DTT. The precise concentration was determined by UV absorbance at 280 nm. The concentrated protein sample was either stored at 4°C or divided into several small aliquots (typically 50 µL), flash-frozen in liquid nitrogen, and stored at -70°C.
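Protein concentration from absorbance at 280 nm follows the Beer-Lambert law. The sketch below shows that arithmetic under a few assumptions. It uses a 1 cm path length and estimates the extinction coefficient from Trp, Tyr and cystine counts with commonly tabulated residue values. The residue counts, molecular weight and absorbance in the example are hypothetical placeholders, not values for m-calpain.

```python
# Sketch: convert an A280 reading to mg/mL with the Beer-Lambert law.
# Residue counts, molecular weight and absorbance used below are
# HYPOTHETICAL placeholders, not values reported for m-calpain.

# Commonly tabulated molar absorptivities at 280 nm (M^-1 cm^-1).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide-bonded pair; ~0 for a fully reduced protein


def extinction_coefficient(n_trp: int, n_tyr: int, n_cystine: int = 0) -> float:
    """Estimate the molar extinction coefficient at 280 nm from composition."""
    return EPS_TRP * n_trp + EPS_TYR * n_tyr + EPS_CYSTINE * n_cystine


def concentration_mg_per_ml(a280: float, epsilon: float, mw_da: float,
                            path_cm: float = 1.0) -> float:
    """Beer-Lambert: c (mol/L) = A / (epsilon * l); multiply by Mr to get mg/mL."""
    molar = a280 / (epsilon * path_cm)
    return molar * mw_da  # g/L is numerically equal to mg/mL


if __name__ == "__main__":
    eps = extinction_coefficient(n_trp=10, n_tyr=20)      # hypothetical counts
    print(eps)                                            # 84800
    print(concentration_mg_per_ml(0.848, eps, 100_000))   # ~1.0 mg/mL
```

For a sample kept in 10 mM DTT, the cystine term would normally be set to zero.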

2.3 Production of Selenomethionine-labeled Calpain

To specifically incorporate heavy atoms, we decided to express and purify a selenomethionine (SeMet) derivative of calpain. For this purpose, we obtained the E. coli strain B834(DE3) as a generous gift from Dr. Wayne Hendrickson (Howard Hughes Medical Institute, Columbia University). This auxotrophic E. coli strain is deficient in the methionine biosynthesis pathway and therefore depends on a growth medium supplemented with methionine. Following the method of Hendrickson et al. (93), a defined medium was prepared. It contained growth factors, vitamins, energy sources and all essential amino acids except methionine. In place of methionine, the medium was supplemented with 50 mg/L of D/L-selenomethionine (Sigma), so that essentially every methionine would be replaced by selenomethionine.

B834(DE3) cells were co-transformed with the compatible plasmids pET-24-m-80k-CHis6 (or pET-24-Cys105Ser-m-80k-CHis6) and pACpET-21k, and selected on LB-agar plates supplemented with kanamycin and ampicillin. Frozen permanents were prepared as described above for BL21(DE3) cells.

For large-scale production of SeMet-labeled calpain, the expression and purification techniques were varied slightly. Frozen permanent cells were first plated on LB-agar dishes. Colonies were used to start a 10 mL culture and then two 300 mL cultures in LeMaster's medium (93), which were shaken vigorously at 37°C for several hours. These cells grew much more slowly than BL21(DE3) cells, and often more than 24 hours of growth were required before significant turbidity was observed. The 300 mL cultures were used to initiate growth in two 4 L cultures, which were aerated at 37°C until an OD600 of approximately 0.6-0.8 was reached. At that point the temperature was reduced to 30°C, and 500 mg of IPTG (~0.5 mM) was added to induce overexpression of SeMet-labeled calpain. Induction was limited to approximately 4 hours in order to obtain the maximal yield of the desired protein. SeMet-derivatized calpain was purified in essentially the same manner as described above for wild-type m-calpain. The only difference was that additional reducing agents (20 mM β-mercaptoethanol and usually 10 mM L-methionine) were added to prevent oxidation.

2.4 Assaying Calpain Activity and Ca²⁺-Dependence

2.4.1 Standard Casein Assay and Ca²⁺ Titration

Casein is the protein substrate most commonly used to measure calpain activity (88). In the presence of Ca²⁺, calpain cleaves casein into TCA-soluble fragments, which can be detected by absorbance at 280 nm after centrifugation. This standard assay is also the most common way to measure the Ca²⁺-dependence of calpain activity: the assay is simply performed at a range of Ca²⁺ concentrations, as described below. The duplicate assays contained 4 mg/mL casein, 0.2 M NaCl, 10 mM β-mercaptoethanol and 50 mM Tris-HCl pH 7.6 in a final volume of 100 µL. Net final CaCl₂ concentrations ranged from 0 µM to 5.0 mM. The reaction was initiated with 4 µL of enzyme sample (see below). The mixtures were incubated at 25°C for 30 min, and the reaction was then terminated by adding 70 µL of ice-cold 10% TCA. The resultant mixture was placed on ice for 10 min and centrifuged at 15,000 rpm for 15 min. The absorbance of the supernatant was then recorded at 280 nm. Immediately prior to the Ca²⁺ titration, enzyme aliquots were freshly thawed from -70°C. They were diluted in 50 mM Tris-HCl pH 7.6 to a final enzyme concentration of 0.2 mg/mL to 2 mg/mL, depending on the specific activity of the enzyme sample under investigation. The Ca²⁺ concentration required for half-maximal activity with casein as substrate is given as [Ca²⁺]0.5. This value was calculated by fitting the normalized activity data to the Hill equation, $y = x^{n}/(k^{n} + x^{n})$. Here $y$ is the fraction of maximum activity, $k = [\mathrm{Ca}^{2+}]_{0.5}$, $n$ is the Hill constant, and $x$ is $[\mathrm{Ca}^{2+}]$.
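The [Ca²⁺]0.5 determination can be sketched as a least-squares search over k and n. This is a minimal illustration under stated assumptions:

- A brute-force grid search stands in for the nonlinear fitting software actually used, which is not specified here.
- The activity values are assumed to be already normalized to the maximum.
- The Ca²⁺ concentrations, grids and synthetic data are hypothetical.

```python
# Sketch: estimate [Ca2+]0.5 (k) and the Hill constant (n) by least squares.
# A brute-force grid search stands in for the fitting software actually used;
# the concentrations, grids and synthetic data are HYPOTHETICAL.

def hill(x: float, k: float, n: float) -> float:
    """Fraction of maximal activity, y = x^n / (k^n + x^n)."""
    return x ** n / (k ** n + x ** n)


def normalize(activities):
    """Scale raw casein-assay readings so that the largest equals 1."""
    top = max(activities)
    return [a / top for a in activities]


def fit_hill(xs, ys, k_grid, n_grid):
    """Return the (k, n) grid point with the smallest sum of squared residuals."""
    best_sse, best_k, best_n = float("inf"), None, None
    for k in k_grid:
        for n in n_grid:
            sse = sum((y - hill(x, k, n)) ** 2 for x, y in zip(xs, ys))
            if sse < best_sse:
                best_sse, best_k, best_n = sse, k, n
    return best_k, best_n


if __name__ == "__main__":
    ca_mm = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]          # hypothetical [Ca2+], mM
    frac = [hill(x, 0.4, 1.5) for x in ca_mm]             # synthetic normalized data
    k_grid = [round(0.01 * i, 2) for i in range(1, 201)]  # 0.01-2.00 mM
    n_grid = [round(0.1 * i, 1) for i in range(5, 31)]    # 0.5-3.0
    print(fit_hill(ca_mm, frac, k_grid, n_grid))          # (0.4, 1.5)
```

A finer grid or a gradient-based optimizer would be the natural refinement. The grid search is used here only because its result is easy to verify.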

In some cases, limited autolytic digestion of calpain samples was performed to test its effect on the Ca²⁺ requirement. In these experiments, the calpain sample (final concentration 0.4 mg/mL) in 50 mM Tris-HCl pH 7.6 and 10 mM DTT was incubated with 2 mM CaCl₂ for either 1 or 3 minutes at room temperature. The reaction was stopped by adding EDTA to a net final concentration of 2 mM. For the zero-time-point control, EDTA was added before the 2 mM CaCl₂. These autolyzed calpain samples were immediately titrated for their Ca²⁺-dependence using the casein assay described above.

Casein zymography (92) was used as a rapid and sensitive method to assess the activity of wild-type calpain and several mutants, and also to determine where calpain had eluted during chromatographic purification. This procedure required the incorporation of casein (~2 mM) into a non-denaturing 10% polyacrylamide gel. Calpain samples were loaded into the gel in a standard loading buffer lacking SDS, to avoid denaturation. The gel apparatus (Bio-Rad) was surrounded with ice, and the buffer was constantly mixed to prevent the formation of heat and/or pH gradients. The gel was run at 125 V for two hours.

The casein zymogram was developed by removing the gel from the apparatus and shaking it overnight at room temperature in a solution containing 10 mM CaCl₂, 50 mM Tris-HCl pH 7.6 and 10 mM β-mercaptoethanol. Gels were then stained with Coomassie blue dye and destained. Regions devoid of Coomassie stain marked places where calpain activity had degraded the casein within the gel during the overnight incubation with CaCl₂.

All solutions used for crystallization were of high purity. Polyethylene glycols (PEG) were purchased from Fluka. Buffers such as MES, and salts such as NaCl or ammonium sulfate, were purchased from Sigma or Fisher Scientific. Since microbial contamination can often interfere with crystallization, all crystallization solutions were sterilized by filtration through a 0.22 µm filter and stored in the dark.

Most attempts to crystallize m-calpain used the hanging-drop vapour-diffusion method with 24-well VDX crystallization plates (Hampton Research). Each of the 24 wells was filled with 1 mL of a potential crystallization solution. To set up a crystallization trial, typically 2-5 µL of purified calpain solution was placed onto a siliconized circular glass cover slide and mixed with an equal volume of a potential crystallization solution. After mixing, the cover slide was inverted and sealed over the greased well that held the same crystallization solution used in the drop. In early attempts to improve the crystallization, sitting-drop vapour diffusion and capillary-based batch crystallization methods were also employed.

Initial crystallization attempts used the commercially available sparse-matrix (94) screening kits "Crystal Screen" and "Crystal Screen II" from Hampton Research. For both kits, trials were performed at room temperature, 4°C and 28°C, and the protein concentration was varied from 5 mg/mL to 20 mg/mL. After initial optimization of crystallization conditions, standard two-dimensional screening techniques were employed. In these experiments, the two variables were typically the precipitant concentration and the buffer pH.

Although most crystallization efforts were directed towards Ca²⁺-free conditions, several attempts were made to obtain crystals in a Ca²⁺-bound state. To this end, both co-crystallization in the presence of Ca²⁺ and soaking of native crystals in Ca²⁺-containing solutions were attempted. In most cases, the final Ca²⁺ concentrations ranged from 100 µM to 100 mM.

2.6 Harvesting and Storage of Crystals

Three procedures were tried for preserving and storing calpain crystals that had reached optimal size for diffraction. In the first method, crystals were removed from the original crystallization drop and placed in a micro-bridge (Hampton Research). The micro-bridge held a solution identical to the original mother liquor except for a slightly higher precipitant (PEG-6000) concentration (generally 3.6% higher). The second method consisted simply of moving the crystallization trays from room temperature to 4°C. The third method, which is discussed in more detail below, was a cryoprotection strategy. Crystals were transferred through various cryoprotectant solutions and then flash-frozen in a vial containing liquid propane. These vials were transferred to liquid-nitrogen dewars, where the crystals could be stored indefinitely.

2.7 X-ray Diffraction and Data Processing

During the course of this project, over 100 data sets were collected on calpain crystals using a variety of X-ray sources. In all cases, data were collected using the oscillation method, in which the crystal is rotated about a defined axis (the spindle axis) perpendicular to the X-ray beam. Depending on the resolution required and the crystal mosaicity, the crystal was rotated in steps ranging from 0.3° to 1.0°. During each rotation, the crystal was exposed to a dose of X-rays sufficient to yield high-quality diffraction patterns. The intensities of the diffracted spots were recorded on a variety of area detectors.

Most of the initial data were collected on the in-house facility. It consisted of a Rigaku RU-200 rotating-anode X-ray generator (operated at 50 kV, 100 mA) and an Osmic mirror system that focused the X-rays into a tightly collimated, intense beam. Diffraction intensities were measured on a 300 mm MAR Research image-plate detector. Most early data sets were collected at room temperature. The crystal was placed in a quartz capillary, which was sealed at both ends to prevent dehydration. Capillaries were mounted onto the goniometer head using plasticine, and crystals were aligned in the center of the X-ray beam prior to data collection.

After a Cryostream Cooler from Oxford Cryosystems was installed, all subsequent data sets were collected at 100 K to minimize X-ray-induced radiation damage. Cryogenic temperatures expose crystals to extreme cold shock and ice formation, so a variety of solutions were tested for their ability to prevent crystal damage. Such "cryoprotectant" solutions generally contained the original crystallization solution supplemented with compounds such as glycerol, MPD, ethylene glycol, sucrose or glucose, or with oils such as Paratone-N. Candidate cryoprotectants were tested by transferring the crystal through solutions containing progressively increasing concentrations of the cryoprotectant. Crystals that did not appear to be perturbed by the cryoprotectant solution were picked up in mounted CryoLoops (Hampton Research) and placed directly into a vial of liquefied propane (~80 K). The vials were then placed in a liquid-nitrogen dewar so that the propane solidified for long-term storage, effectively forming a protective "propane sheath" around the crystal. For data collection, frozen crystals were mounted onto the X-ray diffractometer under cryogenic conditions. The "propane sheath" then slowly melted away, and the crystal was kept at 100 K in the cryostream.

Synchrotron sources were used for single-wavelength data collection on a variety of calpain crystals, in order to obtain data of the highest possible quality and resolution. Native data sets were collected at beamlines A-1 and F-1 of the Cornell High Energy Synchrotron Source (CHESS). Native data were also collected at beamline X8C of the National Synchrotron Light Source (NSLS) at Brookhaven National Laboratory (BNL). All three beamlines were equipped with an Area Detector Systems Corporation (ADSC) Quantum-IV CCD detector (188 mm width). High-resolution data collection was also attempted at the IMCA-CAT beamline at the Advanced Photon Source (APS) using a MAR-CCD detector. All synchrotron data sets were collected at 100 K from crystals that had previously been frozen in propane.

Two multiwavelength anomalous dispersion (MAD) (95) experiments were performed at CHESS beamline F-2, which was equipped with a 1k × 1k CCD (Sol Gruner) detector (55 mm width). A third MAD data set was collected at NSLS beamline X4A, which was equipped with an R-Axis IV image-plate detector (300 mm width). Two more MAD data sets, on two different crystal forms, were collected on beamline 1-5 at the Stanford Synchrotron Radiation Laboratory (SSRL), which was equipped with an ADSC Quantum-IV CCD (188 mm width). All MAD experiments were performed near the selenium K-absorption edge (~0.98 Å). Data were collected in inverse-beam mode, which ensured that Bijvoet reflection pairs were measured relatively close together in time. The first three MAD data sets were collected at three wavelengths. Of the two MAD data sets collected at SSRL, one was a complete four-wavelength experiment. The second was conducted at only two wavelengths, because a malfunction of the cryocooling device destroyed the crystal.

The precise MAD data-collection wavelengths were determined from an X-ray fluorescence scan on a SeMet-calpain crystal. The scan revealed the precise energies of the Se absorption edge and of the white line (peak), which are used for MAD data collection. In all cases, the white-line (peak) wavelength was collected first, the inflection point (edge) second, and a higher-energy remote wavelength third. In one case, a low-energy remote wavelength was also collected.
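The fluorescence scan reports energies, while data-collection wavelengths are quoted in ångströms. The two are related by λ = hc/E. The sketch below assumes hc ≈ 12.39842 keV·Å. It uses the tabulated free-atom Se K-edge (about 12.658 keV) purely as an example, since the edge and white-line values measured on the SeMet-calpain crystals are not given here.

```python
# Sketch: convert fluorescence-scan energies to MAD wavelengths, lambda = hc / E.
# 12.658 keV is the tabulated free-atom Se K-edge; the edge measured on a real
# SeMet crystal shifts with chemical environment, which is why a scan is run.

HC_KEV_ANGSTROM = 12.39842  # Planck constant x speed of light, in keV*Angstrom


def energy_to_wavelength(energy_kev: float) -> float:
    """Photon energy (keV) -> wavelength (Angstrom)."""
    return HC_KEV_ANGSTROM / energy_kev


def wavelength_to_energy(wavelength_a: float) -> float:
    """Wavelength (Angstrom) -> photon energy (keV)."""
    return HC_KEV_ANGSTROM / wavelength_a


if __name__ == "__main__":
    print(f"{energy_to_wavelength(12.658):.4f} A")  # 0.9795 A
```

Because λ is inversely proportional to E, the high-energy remote lies at a shorter wavelength than the peak and edge, and the low-energy remote at a longer one.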

X-ray diffraction data were processed with the HKL program suite (96, 97) and the CCP4 program suite (98). The auto-indexing routine in DENZO (96) was used to obtain the unit-cell dimensions and the crystal orientation relative to the detector. STRATEGY (99) was used to determine the optimal starting orientation of the crystal, so that data collection was as efficient as possible. DENZO was used to index the diffraction spots and determine their intensities. The scaling algorithm implemented in SCALEPACK (96) put reflections from different images onto a common scale. All data could then be combined into a final "data set", consisting of a set of reflections and their observed intensities. For MAD data sets, the "scale anomalous" option in SCALEPACK was used (except at the low-energy remote wavelength) so that Bijvoet reflection pairs were scaled separately and not merged. The program TRUNCATE (98) within the CCP4 suite was used to convert the diffraction intensities into structure-factor amplitudes.
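Keeping Bijvoet mates unmerged preserves the small intensity differences that carry the anomalous signal. The sketch below forms anomalous differences from an unmerged list of amplitudes. It assumes space group P1, where the Bijvoet mate of (h, k, l) is simply its Friedel mate (-h, -k, -l). In the P2₁ form, symmetry-equivalent indices would also have to be mapped. The function names and amplitudes are hypothetical.

```python
# Sketch: keep Bijvoet mates separate and form anomalous differences.
# Simplifying assumption: space group P1, where the Bijvoet mate of (h, k, l)
# is its Friedel mate (-h, -k, -l). Names and amplitudes are HYPOTHETICAL.

def friedel_mate(hkl):
    """Return the Friedel-related index (-h, -k, -l)."""
    h, k, l = hkl
    return (-h, -k, -l)


def anomalous_differences(amplitudes):
    """Map each 'plus' index to dF = |F(h,k,l)| - |F(-h,-k,-l)|.

    amplitudes: dict {(h, k, l): |F|} with Bijvoet mates stored unmerged.
    Reflections whose mate was not measured are skipped.
    """
    diffs = {}
    for hkl, f_plus in amplitudes.items():
        mate = friedel_mate(hkl)
        # Visit each pair once by keeping the member that sorts higher.
        if hkl > mate and mate in amplitudes:
            diffs[hkl] = f_plus - amplitudes[mate]
    return diffs


if __name__ == "__main__":
    unmerged = {(1, 2, 3): 105.0, (-1, -2, -3): 100.0, (2, 0, 1): 50.0}
    print(anomalous_differences(unmerged))  # {(1, 2, 3): 5.0}
```

Merging each pair into a single mean amplitude would discard exactly the ΔF values computed here.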

The number of molecules in the unit cell was determined by the method of Matthews (100), using the formula $V_M = V_{\mathrm{cell}}/(n\,M_r)$. Here $V_{\mathrm{cell}}$ is the unit-cell volume, $V_M$ is the Matthews coefficient, $n$ is an integer (the number of molecules in the unit cell), and $M_r$ is the molecular weight of the protein. Measurements from a large number of previously characterized protein crystals indicate that the Matthews coefficient generally lies between 1.68 and 3.53 Å³/Da. The solvent content of the crystals was calculated with the formula $V_{\mathrm{solv}} = 1 - 1.23/V_M$, which assumes a partial specific volume of 0.74 cm³/g, typical of most proteins.

2.8 Screening for Heavy Atom Derivatives

To solve the crystal structure by multiple isomorphous replacement (MIR), we attempted to obtain heavy-atom-derivatized crystals of calpain using two main experimental techniques: soaking and co-crystallization. In a soaking experiment, a pre-formed calpain crystal was taken from its original crystallization solution and placed in a solution containing a heavy atom such as mercury, gold, platinum or lead. Between 75 and 100 different heavy-atom solutions were tested at a variety of concentrations and in different buffers. Potential heavy-atom-derivatized crystals were harvested, frozen in propane and stored until X-ray diffraction data could be collected. More than 50 data sets were collected using the in-house facility, beamlines A-1 and F-1 at CHESS, and beamlines X4A and X8C at the NSLS.

A large number of co-crystallization experiments were also carried out to obtain potential heavy-atom derivatives of m-calpain crystals. In these experiments, a small volume of a heavy-atom-containing solution (typically 0.2 µL) was added to the hanging-drop crystallization experiment. Crystallization conditions for wild-type m-calpain had already been established. Even so, the buffer, pH and precipitant concentration had to be varied in trials where heavy atoms had been added. Co-crystals were subjected to X-ray diffraction both at the in-house facility and at CHESS beamline A-1.

Data sets from potential heavy-atom derivatives were processed in the same way as wild-type data sets, except that Bijvoet reflection pairs were not merged when an anomalous signal was present.

2.9 Determination of Heavy Atom Positions

To solve a novel crystal structure by either MIR or MAD, the positions of the heavy atoms must first be located. For conventional heavy-atom screening (i.e., crystals soaked in heavy-atom solutions), data from potential heavy-atom derivatives were scaled to the appropriate native data set using SCALEIT (98). Isomorphous difference Patterson maps were then calculated using FFT (98). The v = 1/2 Harker section was examined for strong peaks that would reveal the positions of any bound heavy atoms.
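For the P2₁ crystal form, the reason for examining the v = 1/2 section can be made explicit. Assuming b is the unique axis, the equivalent positions are (x, y, z) and (-x, y + 1/2, -z). A single heavy-atom site therefore produces a Patterson peak at (2x, 1/2, 2z). The hypothetical helpers below compute that Harker vector and invert a peak back to candidate x and z coordinates.

```python
# Sketch: Harker vectors for space group P2_1 (b unique, ASSUMED here because
# the v = 1/2 section is where its Harker peaks fall). Equivalent positions
# are (x, y, z) and (-x, y + 1/2, -z). Function names are HYPOTHETICAL.

def harker_vector_p21(site):
    """Fractional Patterson vector between a site and its 2_1-related copy."""
    x, y, z = site
    # (x, y, z) - (-x, y + 1/2, -z) = (2x, -1/2, 2z); -1/2 equals 1/2 modulo 1
    return ((2 * x) % 1.0, 0.5, (2 * z) % 1.0)


def sites_from_harker_peak(u, w):
    """Invert a Harker peak (u, 1/2, w) into candidate (x, z) pairs.

    Halving leaves a 1/2 ambiguity in x and z; the four choices correspond
    to the alternative origins permitted in P2_1.
    """
    return [(x, z) for x in (u / 2, u / 2 + 0.5) for z in (w / 2, w / 2 + 0.5)]


if __name__ == "__main__":
    print(harker_vector_p21((0.25, 0.1, 0.125)))  # (0.5, 0.5, 0.25)
    print(sites_from_harker_peak(0.5, 0.25))
```

This section does not determine the y coordinate of a single site, because the origin along b is arbitrary in P2₁.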

The MAD data collected from the SeMet derivative were analyzed using both Patterson methods and direct methods. SCALEIT was used to scale the individual wavelengths to one another. Dispersive difference Patterson maps were calculated between the remote wavelength and either the peak or the edge wavelength. Anomalous difference Patterson maps were calculated from the anomalous differences at the peak wavelength.

The direct-methods program Shake-and-Bake (SnB) (101) was used to locate the selenium atoms. The peak-wavelength data were first processed with the DREAR program suite (102), which converted them to normalized structure-factor amplitudes and applied a local scaling algorithm to the Bijvoet pairs.

2.10 Heavy Atom Refinement and Phasing

The crystal structure of calpain was solved using MAD data collected at SSRL from selenomethionine-containing crystals. To refine the heavy-atom positions and determine the initial phases, the selenium positions found by SnB were input into the program SHARP (103). SHARP treats MAD data sets as a special case of MIR including anomalous scattering (MIRAS). Three stages of maximum-likelihood refinement were performed, each with a maximum of 10 mini-cycles of parameter refinement. In the first stage, most parameters were held constant while the occupancies and the error and scaling parameters were refined. After convergence, the second stage also refined the positions of the heavy atoms. In the third stage, isotropic B-factors and the f′ and f″ values were added to the refinement. The f″ value at each wavelength was allowed to vary. The f′ value of the low-energy remote wavelength, however, was kept fixed, since it served as a reference point for the other wavelengths.

After the final stage of refinement converged, SHARP calculated the initial centroid phases from the probability distribution of the native structure factor. These phases were the starting point for phase improvement by two density-modification techniques, solvent flattening and solvent flipping. These were carried out with SOLOMON (104), which is interfaced directly with SHARP. Solvent flattening used an automatically determined solvent mask. It started at 4.0 Å resolution and was gradually extended to 2.6 Å over the course of 130 cycles.
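Solvent flattening and solvent flipping differ only in how density inside the solvent mask is rescaled about the solvent mean. The toy sketch below applies both to a one-dimensional list of density values with a supplied mask. It is not SOLOMON's implementation. A real run works on a three-dimensional map, derives the mask automatically, and alternates with Fourier transforms and phase combination, none of which is shown. The `scale` parameter and the example values are hypothetical.

```python
# Toy sketch of solvent flattening vs. solvent flipping on a 1-D density list.
# NOT SOLOMON's implementation: no 3-D map, automatic mask, Fourier cycling or
# phase combination. The `scale` values are HYPOTHETICAL illustrations.

def modify_solvent(density, solvent_mask, scale=0.0):
    """Rescale density deviations inside the solvent region about their mean.

    scale = 0.0 -> solvent flattening (solvent set to its constant mean)
    scale < 0.0 -> solvent flipping (solvent deviations inverted)
    Values in the protein region are returned unchanged.
    """
    solvent = [rho for rho, is_solv in zip(density, solvent_mask) if is_solv]
    if not solvent:
        raise ValueError("solvent mask selects no grid points")
    mean = sum(solvent) / len(solvent)
    return [mean + scale * (rho - mean) if is_solv else rho
            for rho, is_solv in zip(density, solvent_mask)]


if __name__ == "__main__":
    rho = [1.0, 3.0, 0.5, 1.5, 2.0]
    mask = [False, False, True, True, False]
    print(modify_solvent(rho, mask))              # [1.0, 3.0, 1.0, 1.0, 2.0]
    print(modify_solvent(rho, mask, scale=-1.0))  # [1.0, 3.0, 1.5, 0.5, 2.0]
```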

2.11 Building the Crystal Structure Model

Initial electron-density maps and solvent-flattened maps were calculated from the SeMet MAD data using XFFT and examined using XFIT, both programs within the XTALVIEW suite (105). The maps were checked for a clear protein-solvent boundary and for elements of connectivity and secondary structure. On that basis, they were judged suitable for building the atomic model.

The initial Cα trace was built in XFIT using solvent-flattened maps from the MAD data set on the P2₁ crystal form. At this stage, several regions of electron density were unconnected, so there was a large degree of uncertainty about the identity of the individual Cα markers. Using a library of defined pentamer structures, XFIT was used to build a polyalanine model. It contained roughly 550 alanine residues in the large subunit (out of 700) and 140 in the small subunit (out of 184). The known positions of the selenium atoms made the methionine positions easy to deduce, and this in turn identified the residues near the methionines. Eventually, enough residues had been identified that most of the remaining side chains could be added using the "set sequence" feature in XFIT (105). Most of these side chains, however, were improperly placed in the density and had to be re-fit manually by rotating about side-chain torsion angles.

At this stage, several regions totaling well over 150 residues were still missing. The model was therefore placed into the solvent-flattened electron-density maps from the P1 crystal form, using the positions of the selenium atoms as a guide. Several regions of electron density were much better defined in these maps, and several ambiguities were quickly resolved. In addition, electron density was clearly visible in some regions where it was entirely absent in the P2₁ form. These missing segments were built into the structure, and the adjusted model was then introduced into the refinement process.

2.12 Refinement

The initial model was refined against native X-ray data in both space groups using the program CNS (106). A maximum-likelihood function (107) was chosen as the refinement target. The atomic positions were refined so as to minimize the crystallographic R-factor, $R = \sum \big| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \big| \,/\, \sum |F_{\mathrm{obs}}|$. Here $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure-factor amplitudes, respectively, and the sums run over all reflections. As the atomic positions were refined, $F_{\mathrm{calc}}$ was recalculated and brought into closer agreement with $F_{\mathrm{obs}}$, lowering the R-factor. During refinement, geometrical restraints were imposed based on the high-resolution structures of numerous small molecules (108). A bulk-solvent correction was applied to the low-resolution (50-6.0 Å) reflections. To prevent over-refinement, 10% of the reflections were excluded from refinement and used to calculate the free R-factor (109, 110). Otherwise, all of the data were used in the refinement and no sigma cutoff was applied.
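The sketch below computes the R-factor defined above, together with the free R-factor on the 10% of reflections withheld from refinement. The random selection of the test set, and all names and numbers, are assumptions for illustration. CNS performs this bookkeeping internally.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| over the supplied reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def choose_free_set(n_reflections, fraction=0.10, seed=0):
    """Randomly flag a fraction of reflection indices for the free-R test set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    return set(indices[: round(fraction * n_reflections)])


def r_work_and_free(f_obs, f_calc, free_set):
    """R over the working reflections and over the held-out (free) reflections."""
    work = [i for i in range(len(f_obs)) if i not in free_set]
    test = sorted(free_set)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free


if __name__ == "__main__":
    print(f"{r_factor([100.0, 200.0], [90.0, 210.0]):.3f}")  # 0.067
```

The free-set reflections never enter refinement. A falling R_work paired with a stalled or rising R_free is therefore the usual sign of over-refinement.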

After manual model adjustments in XFIT, 20 cycles of rigid-body refinement were applied, treating each subunit as a distinct "rigid body", to improve the orientation of the molecule within the unit cell. The model was then subjected to 200 cycles of positional refinement, in which the positions of the individual atoms were refined. Positional refinement is a very powerful tool, but it proceeds in one direction only, downhill towards the nearest low-energy conformation. As a result, it often ends up in a local minimum rather than the global minimum. For this reason, torsion-angle molecular dynamics with a simulated-annealing temperature gradient (2500 K to 300 K in steps of 25 K) was used to correct the model after positional refinement. The high simulated temperature gives the atoms enough kinetic energy to overcome local energy barriers that would otherwise prevent them from reaching their desired low-energy conformation. After simulated-annealing refinement, the model was positionally refined for a further 100 cycles, and individual isotropic B-factors were then refined through 30 cycles. Water molecules were initially added by an automatic search for peaks of significant electron density (>3σ) in the $F_o - F_c$ map. This first round of refinement used data from the P2₁ crystal form over the resolution range 50 to 2.15 Å.
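The sketch below makes two numbers in this protocol concrete. The first is the count of temperature stages in the 2500 K to 300 K schedule. The second is how strongly temperature changes the chance of crossing an energy barrier. The 5 kcal/mol barrier is a hypothetical value. The Boltzmann factor only illustrates, qualitatively, why high simulated temperatures help; it does not describe CNS's torsion-angle dynamics.

```python
import math

# Boltzmann constant in kcal/(mol*K)
K_B = 0.0019872


def annealing_schedule(t_start=2500.0, t_end=300.0, step=25.0):
    """Temperatures of a linear slow-cooling protocol, including both ends."""
    n_steps = round((t_start - t_end) / step)
    return [t_start - i * step for i in range(n_steps + 1)]


def barrier_crossing_factor(barrier_kcal, temperature_k):
    """Boltzmann factor exp(-dE / kT): relative likelihood of surmounting a barrier."""
    return math.exp(-barrier_kcal / (K_B * temperature_k))


if __name__ == "__main__":
    temps = annealing_schedule()
    print(len(temps), temps[0], temps[-1])  # 89 2500.0 300.0
    for t in (2500.0, 300.0):
        # hypothetical 5 kcal/mol barrier
        print(t, barrier_crossing_factor(5.0, t))
```

For the hypothetical 5 kcal/mol barrier, the factor is roughly 0.37 at 2500 K but only about 2 × 10⁻⁴ at 300 K. This is why slow cooling starts hot.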

Although the model was now largely complete, several segments were still missing. After each round of refinement, the updated model was used to calculate sigma-A-weighted (111) $2m|F_o| - D|F_c|$ and $m|F_o| - D|F_c|$ maps, which were then analyzed in XFIT. The improved model phases gave slightly better electron density, which allowed some of the missing residues to be added. Several rounds of manual rebuilding in XFIT followed by refinement in CNS were needed to reach the best possible model, judged by the lowest free R-factor. Once refinement in space group P2₁ had converged, the model was placed into the P1 cell and refined against X-ray diffraction data from 50 to 2.6 Å.
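Each reflection contributes one Fourier coefficient to each sigma-A-weighted map. The amplitude is 2m|Fo| - D|Fc| or m|Fo| - D|Fc|, and the phase is that of the calculated structure factor. The sketch below computes both coefficients for a single reflection. It assumes m (the figure of merit) and D have already been estimated; the function name and values are hypothetical.

```python
import cmath


def sigmaa_map_coefficients(f_obs, f_calc, m, d):
    """Return complex coefficients for the 2m|Fo|-D|Fc| and m|Fo|-D|Fc| maps.

    f_obs : observed amplitude |Fo| (float)
    f_calc: calculated structure factor Fc (complex); its phase is used
    m     : figure of merit for this reflection (assumed already estimated)
    d     : sigma-A D factor for this reflection (assumed already estimated)
    A negative amplitude is equivalent to shifting the phase by 180 degrees.
    """
    phi_c = cmath.phase(f_calc)
    amp_2fo_fc = 2.0 * m * f_obs - d * abs(f_calc)
    amp_fo_fc = m * f_obs - d * abs(f_calc)
    return cmath.rect(amp_2fo_fc, phi_c), cmath.rect(amp_fo_fc, phi_c)


if __name__ == "__main__":
    fc = cmath.rect(100.0, 0.5)  # hypothetical Fc: |Fc| = 100, phase 0.5 rad
    a, b = sigmaa_map_coefficients(120.0, fc, m=0.8, d=0.9)
    print(f"{abs(a):.1f} {abs(b):.1f}")  # 102.0 6.0
```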

The $2F_o - F_c$ maps showed several regions of electron density that had not been visible in any of the previous maps. Several missing residues, and even some entirely missing segments, were therefore built into the model. The water molecules from the P2₁ cell were mostly different from those in the P1 cell. They were therefore discarded in the P1 refinement in favor of the appropriate P1 water molecules. At the end of model refinement, the most complete model came from the P1 unit cell. These are the coordinates submitted to the RCSB Protein Data Bank (5) (PDB code 1DF0).

2.13 Structure Analysis and Validation

The refined crystal structure was analyzed graphically using a variety of programs, including TURBO-FRODO (112) and SETOR (113). Secondary structure was assigned with the algorithm implemented in the program STRIDE (114). In addition to CNS (106), the program PROCHECK (115) was used to check the geometry and stereochemistry of the model and to generate Ramachandran plots (116). The program ALIGN (117) was used to compare calpain with structurally similar proteins. It superimposes structures by minimizing the root-mean-square deviation (r.m.s.d.) between structurally equivalent atoms. Figures of the structure were produced with MOLSCRIPT (118) and RASTER3D (119). Figures of the electron-density maps were produced in XTALVIEW (105) and rendered with RASTER3D. Electrostatic and van der Waals surface diagrams were created with GRASP (120).
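The r.m.s.d. that ALIGN minimizes can be written down directly. The sketch below computes it for two lists of structurally equivalent atoms that are assumed to be already superimposed. Finding the optimal rotation and translation (for example with the Kabsch algorithm) is not shown. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired atoms, with no further fitting.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples in Angstrom,
    ASSUMED already superimposed and listed in structurally equivalent order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # hypothetical C-alpha pairs displaced by 1 Angstrom along z
    print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # 1.0
```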

The three-dimensional coordinates of the m-calpain structure were used to search databases of known structures for structural homologs of the individual calpain domains. The coordinate file was split into smaller files, one per domain, which were uploaded to internet-based servers for database searching. The DALI (http://www.ebi.ac.uk/dali) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/server/) servers were the primary databases used for this purpose. Potential structural homologs were graded on their similarity to the given calpain domain, based on the positions of their α-carbon atoms. Promising leads were downloaded from the server and analyzed with TURBO-FRODO (112) and SETOR (113).

2.15 Molecular Modeling

To produce a molecular model of the µ-calpain isoform, the Swiss-Model program (http://www.expasy.ch/swissmod/SWISS-MODE...) was used. The 3-D coordinates of m-calpain and the primary amino acid sequence of µ-calpain were uploaded and submitted to the server. The Swiss-Model algorithm (121) first assessed the primary sequence homology. It then "threaded" the corresponding residues of the target molecule (µ-calpain) onto the known coordinates of m-calpain. The resulting model was energy-minimized with the GROMOS96 algorithm (122).

2.15.2 Active Calpain and Inhibitor/Substrate Design

To obtain a model of what the protease domains of m-calpain may look like in the Ca2+-bound, active form, a combination of manual and computational adjustment was performed. The coordinates of the protease component of m-calpain were used as a reference onto which the structure of the cysteine protease papain (PDB code 9PAP) was overlapped. The active-site residues of papain then served as a reference point to guide manual re-orientation of the calpain protease component. One domain of this component was re-oriented by a manual rotation (and a very small translation of about 2 Å) using TURBO-FRODO (112), such that the active-site residues of the two proteases adopted near-identical conformations. Two key side-chain residues in calpain (Trp288 and Gln99) were also manually re-oriented. This model was then subjected to standard energy minimization in SYBYL (123) using the TRIPOS force field and Marsili-Gasteiger electrostatic parameters.
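The manual re-orientation was done interactively in TURBO-FRODO. As an illustration only, the same kind of rigid-body move can be written numerically: rotate one domain's coordinates about an axis through its centroid, then apply a small translation (about 2 Å in the text). The axis, angle and function names below are hypothetical placeholders, not the values or tools used in this work.

```python
import numpy as np


def axis_angle_matrix(axis, angle_deg):
    """Right-handed rotation matrix for angle_deg about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    theta = np.deg2rad(angle_deg)
    # Cross-product (skew-symmetric) matrix of the unit axis.
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * kx + (1.0 - np.cos(theta)) * (kx @ kx)


def move_domain(coords, axis, angle_deg, shift):
    """Rotate (N, 3) domain coordinates about an axis through their centroid,
    then translate them by `shift` (in Angstroms)."""
    xyz = np.asarray(coords, dtype=float)
    centroid = xyz.mean(axis=0)
    r = axis_angle_matrix(axis, angle_deg)
    return (xyz - centroid) @ r.T + centroid + np.asarray(shift, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    domain = rng.normal(scale=12.0, size=(200, 3))     # stand-in domain coordinates
    moved = move_domain(domain, axis=[0.0, 0.0, 1.0], angle_deg=10.0, shift=[2.0, 0.0, 0.0])
    centroid_shift = np.linalg.norm(moved.mean(axis=0) - domain.mean(axis=0))
    print(f"centroid moved by {centroid_shift:.2f} A")
```

Rotating about the centroid keeps the domain in place while changing its orientation, so the subsequent translation alone determines how far the domain's centre moves; internal distances are preserved, as expected for a rigid-body adjustment.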

Using SYBYL (123), this active model was subsequently used to design a hypothetical peptide that could bind to the active site of m-calpain in the presence of Ca2+. A hepta-alanine peptide was manually placed along the length of the substrate-binding pocket and energy minimized to achieve the most stable backbone orientation. Based on calpain's limited substrate preferences, the P2 position of this peptide was fixed as leucine, while the remaining residues were varied. The resulting peptides, together with the active-site residues, were energy minimized, and the most favorable (lowest-energy) interactions were examined for possible insights into how calpain interacts with its substrates. The energy minimization included only the peptide atoms and those m-calpain atoms within a 10 Å radius of the peptide; the remainder of the D-I and D-II coordinates were omitted from the minimization to reduce computational time.
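Restricting the minimization to atoms within 10 Å of the peptide amounts to a distance-based selection. SYBYL performs this with its own selection tools, which are not reproduced here; the NumPy function below is a hypothetical, generic equivalent that returns the indices of protein atoms lying within the cutoff of any peptide atom.

```python
import numpy as np


def shell_atoms(peptide_xyz, protein_xyz, cutoff=10.0):
    """Indices of protein atoms lying within `cutoff` Angstroms of any peptide atom."""
    pep = np.asarray(peptide_xyz, dtype=float).reshape(-1, 3)
    prot = np.asarray(protein_xyz, dtype=float).reshape(-1, 3)
    # Distance matrix of shape (n_protein_atoms, n_peptide_atoms).
    dist = np.linalg.norm(prot[:, None, :] - pep[None, :, :], axis=-1)
    # Keep a protein atom if its closest peptide atom is within the cutoff.
    return np.flatnonzero(dist.min(axis=1) <= cutoff).tolist()


if __name__ == "__main__":
    peptide = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]        # two stand-in peptide atoms
    protein = [[12.0, 0.0, 0.0], [15.5, 0.0, 0.0], [0.0, 10.0, 0.0]]
    print(shell_atoms(peptide, protein))
```

Atoms not returned would be the ones omitted from the minimization. For a heptapeptide the full distance matrix is small; for larger selections a spatial index (for example a k-d tree) would scale better.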

2.16 Site-Directed Mutagenesis

2.16.1 Preparation of single-stranded DNA

Single-stranded DNA (ssDNA) for site-directed mutagenesis was prepared using E. coli strain CJ236 and the helper phage R408. A fresh stock of R408 phage was prepared by amplification in E. coli strain NM522. A previous stock of R408 (maintained at 4 °C) was streaked onto an LB-agar plate, after which a mixture of 3 mL of soft agar (at 50 °C) and 400 μL of an overnight culture of NM522 was poured uniformly over the top. After incubation at 37 °C for 12 hours, plaques were picked from the agar plate and transferred to 100 mL of LB for growth at 37 °C overnight. The cell suspension was centrifuged at 4000 rpm for 30 minutes at 4 °C, after which the R408 phage were collected in the supernatant. Serial dilutions of the newly prepared R408 helper phage were plated on lawns of E. coli to determine the titre.
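The titre follows from simple arithmetic on the plaque counts of the serial dilutions. The function below is a hypothetical helper, and the plaque count, dilution and plated volume in the example are illustrative values, not data from this work.

```python
def phage_titre(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Titre of the undiluted stock in plaque-forming units (pfu) per mL.

    dilution: fraction of stock in the plated sample (e.g. 1e-8 for a 10^-8 dilution).
    """
    if plaques < 0 or not 0 < dilution <= 1 or volume_plated_ml <= 0:
        raise ValueError("invalid plaque count, dilution or volume")
    return plaques / (dilution * volume_plated_ml)


if __name__ == "__main__":
    # Illustrative numbers: 150 plaques from 0.1 mL of a 10^-8 dilution.
    print(f"{phage_titre(150, 1e-8, 0.1):.2e} pfu/mL")
```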

After the phage were prepared, plasmids to be used for mutagenesis were transformed into CJ236 competent cells and selected by growth on LB-agar plates supplemented with kanamycin (for large subunit plasmids) plus chloramphenicol (45 μg/mL). A single colony was picked and grown in 10 mL of culture to log phase (OD600 of 0.4) at 37 °C. From this culture, 2 mL was infected with R408 at a multiplicity of infection (MOI) of 10 (using 100 μL of the titred R408 stock). Cells were infected for one hour, transferred to 50 mL of LB and grown at 37 °C overnight. This suspension was centrifuged to remove the cells, and the culture supernatant containing the ssDNA was recovered. The ssDNA was isolated by adding 5 mL of 25% PEG in 3 M NaCl, mixing for 30 minutes and then centrifuging at 10,000 rpm for 30 minutes. The ssDNA pellet was dissolved in 400 μL of TE buffer (10 mM Tris-HCl pH 8.0, 1 mM EDTA) and purified by phenol and phenol/chloroform extractions. The ssDNA was again pelleted using 2 volumes of ethanol and 0.3 M sodium acetate pH 5.2, and dissolved in 200 μL of TE plus 10 μg/mL RNase A. The quality of the ssDNA was assessed by agarose gel electrophoresis and ethidium bromide staining, using the corresponding double-stranded plasmid DNA as a control.
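The phage volume needed for a given multiplicity of infection (MOI, phage per cell) can be estimated from the culture density. The sketch below assumes a conversion of about 8 × 10^8 E. coli cells per mL per OD600 unit, which is only a common rule of thumb (it varies with strain and instrument); the helper name and the titre used in the example are hypothetical.

```python
def phage_volume_ml(od600: float, culture_ml: float, moi: float,
                    titre_pfu_per_ml: float,
                    cells_per_ml_per_od: float = 8e8) -> float:
    """Volume of phage stock (mL) that delivers `moi` pfu per cell.

    cells_per_ml_per_od is an assumed OD600-to-cell-count conversion.
    """
    cells = od600 * cells_per_ml_per_od * culture_ml   # total cells to be infected
    pfu_needed = moi * cells                            # phage required for the target MOI
    return pfu_needed / titre_pfu_per_ml


if __name__ == "__main__":
    # 2 mL of culture at OD600 0.4, MOI 10, with a hypothetical titre of 6.4e10 pfu/mL.
    print(f"{phage_volume_ml(0.4, 2.0, 10.0, 6.4e10) * 1000:.0f} uL of phage stock")
```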

2.16.2 Site-Directed Mutagenesis

Site-directed mutagenesis of calpain was performed according to the method of Kunkel (124). In most cases, mutations were made in the catalytic subunit of the wild-type m-calpain construct, using ssDNA produced as above from the plasmid pET-24-m-80k-CHis. Some mutations were also made in the regulatory subunit through a slightly more complicated procedure, since the pACpET-21k plasmid (used for co-expression with the large subunit) does not contain an f1 site, so ssDNA could not be prepared from this vector. In this case, the pET-20b-21k plasmid (which does contain an f1 site) was used to prepare the ssDNA for the mutagenesis protocol.

Antisense primers for mutagenesis were synthesized by the Cortec DNA Service Laboratories at Queen's University. The following antisense primers were used for Kunkel mutagenesis in the catalytic subunit:

5'-P-CGGTC~GGCTAGCTTTCATGCAGATGCCAGCCAT-3' (Ala5Cys)
5'-P-GCCTCGCGGTCCGTGGCCAGTTTCAT-3' (Lys10Thr)
5'-P-CCCAGCCCCTCTGCAGCCTCGGAGTCTTTGGCCAG-3' (Arg12Ser)
5'-P-TCGGCCGCCTCAGAGTCTGTGGCCAGTTTCAT-3' (Lys10Thr/Arg12Ser)
5'-P-CACCCATTCGCAATATTGCCAGAACT-3' (Gly147Cys)
5'-P-CAAGCAGAGAACCTGCCTCGAGAGCAGCCTGGATGATAGCGAACAAATTGGGAG-3' (Lys226Ala/Lys230Ala/Lys234Ala)
5'-P-CTTCTGGATGATCGAGAAGAGATTGGGAGGAGG-3' (Lys226Ser)
5'-P-GAGAACCTTTCTCGAGAGCCGACTGGATGATCTTG-3' (Lys230Ser)
5'-P-GAGAACCTTTCTCGAGAGCCTCCTGGATGATCTTG-3' (Lys230Glu)
5'-P-AAGCAGAGAACCTGACTCGAGAGCCTTCTGGATG-3' (Lys234Ser)
5'-P-AGCAGAGAACCTTCCTCGAGAGCCTTCTGGAT-3' (Lys234Glu)
5'-P-AGCAGAGAACCTTCCTCGAGAGCCTCCTGGATGATCTTG-3' (Lys230Glu/Lys234Glu)
5'-P-GGGCAGCTTGAACGCGTTGAGGACCTC-3' (Arg474Ala)
5'-P-TCTTTCTCTGAGAACACGTGGATGCAGAAATC-3' (Arg500His)
5'-P-AGTCAGCCTTCTTGCTCGAGAAGACTCGG-3' (Glu504Ser)
5'-P-AATGTTGGCCTCCTCAATTGGATCATCGACAGTTT-3' (Glu515Pro)
5'-P-TTCAATGTTGGCCTGGATTTGATCGTCGATCGTCGACAGGGTA-3' (Glu515Gln/Glu517Gln)
5'-P-TTCAATGTTGGCCGGGATTTCATCATCGA-3' (Glu517Pro)
5'-P-GTGCGGCCGCTAGCTTACCTAGGACACAAAAACTCAGCCAC-3' (Ser698Cys)

The following antisense primers were used for Kunkel mutagenesis in the regulatory subunit:

5'-P-...TGGC-3' (Ala158Cys)
5'-P-TGAAAGCACGGAAACATGCATCCAGCCT-3' (Met155Cys)

Additionally, one mutation was made in a chimeric μ/m-calpain construct that had previously been generated in Dr. John Elce's laboratory. This chimeric enzyme consisted of the first ~550 amino acids of μ-calpain at the N-terminus followed by the C-terminal ~170 residues of m-calpain. The plasmid encoding this construct, pET-24-m-80k-Fsp-mIV, was used to generate ssDNA as described above. An antisense primer of sequence 5'-P-CCAGCCTTCTTCGAAGAAAAGAAGCGCA-3' was used to introduce a Glu515Ser mutation, analogous to the Glu504Ser mutation made in wild-type m-calpain.

The mutagenesis reaction was performed using 40-80 pmol of the appropriate antisense primer and 0.1-1.0 μg of ssDNA template in a solution containing 20 mM Tris pH 7.5, 50 mM NaCl and 2 mM MgCl2. The mixture was heated to 100 °C and cooled slowly to room temperature to allow the mutagenic primer to anneal to the template. Primer extension was carried out at 37 °C for 2 hours after adding gene-32 protein, T4 DNA polymerase, T4 DNA ligase and 10 mM each of dATP, dCTP, dGTP, dTTP and ATP. After extension, the mutagenesis reaction mixtures were transformed into E. coli strain JM83 and plated on LB-agar plates with the appropriate antibiotic. The next day, colonies were picked and grown overnight at 37 °C in 3 mL of LB plus antibiotic.

Plasmid DNA containing possible mutations was isolated using alkaline lysis mini-prep techniques (125). Potential mutants from the Kunkel procedure were screened by diagnostic restriction-enzyme digestion, since in each case the mutagenic primer introduced both the desired mutation and a silent mutation that created a novel restriction site. When a mutant was detected by digestion, the nucleotide sequence around the mutated site was confirmed by automated sequencing of purified plasmid by the Cortec DNA Service Laboratories at Queen's University.
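The screening strategy relies on each mutagenic primer also creating a new restriction site. The enzyme used for each mutant is not stated here; as an illustration, the hypothetical helpers below check sequences against a small, deliberately incomplete table of common recognition sites. Several of the primers targeting Lys230 and Lys234, and the Glu504Ser primer, contain CTCGAG, the XhoI recognition sequence, although whether XhoI was the diagnostic enzyme is an assumption. Because all sites in this table are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences of a few common type II restriction enzymes.
# All of these are palindromic, so scanning one strand finds sites on both.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "NheI": "GCTAGC",
    "PstI": "CTGCAG",
    "XbaI": "TCTAGA",
    "XhoI": "CTCGAG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to convert an antisense primer to the sense strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sites_present(seq: str) -> set:
    """Names of enzymes from RECOGNITION_SITES whose site occurs in seq."""
    s = seq.upper()
    return {name for name, site in RECOGNITION_SITES.items() if site in s}


def diagnostic_sites(wild_type: str, mutant: str) -> set:
    """Sites gained by the mutant relative to wild type (candidate screening enzymes)."""
    return sites_present(mutant) - sites_present(wild_type)


if __name__ == "__main__":
    primer = "AGCAGAGAACCTTCCTCGAGAGCCTTCTGGAT"  # Lys234Glu primer from the list above
    print(sorted(sites_present(primer)))
```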

2.16.3 Expression and Purification of Mutants

To express large subunit mutants, the mutated large subunit plasmids were co-transformed with the plasmid pACpET-21k (which encodes the small subunit) into E. coli strain BL21(DE3).

Since small subunit mutants could not be obtained directly in pACpET, the mutant insert was subcloned from the pET-20b construct back into the pACpET backbone. HindIII and XbaI were used to cleave both pACpET-21k and the mutant pET-20b plasmid. Agarose gel electrophoresis was used to isolate the mutant 21k insert and the pACpET backbone DNA fragments, which were then purified using a gel-extraction kit (Qiagen). These DNA fragments were ligated with T4 DNA ligase at 12 °C overnight and transformed into E. coli strain JM83. Small subunit mutants were made in two cases, and both were designed to be co-expressed with a large subunit mutant. In these cases, both the large and small subunit mutant plasmids were purified from JM83 cells and co-transformed into BL21(DE3) cells for expression. All mutants were tested for expression levels and activity using small amounts of culture (10 mL). Mutants to be fully characterized were then subjected to the normal large-scale expression and purification protocols described in section 2.2.

Subsequently, several mutants were analyzed for their [Ca2+]0.5 (the Ca2+ concentration required for half-maximal activity) and specific activity. Some mutants had been designed to form disulfide bonds. These mutants were oxidized by the slow addition (by dialysis) of either oxidized glutathione or CuCl2, and their ability to form disulfide bonds was tested using non-reducing SDS-PAGE.
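How [Ca2+]0.5 was derived is not described in this section. A common approach is to fit the activity-versus-Ca2+ curve (for example with a Hill equation); the simpler sketch below instead interpolates, on a logarithmic concentration axis, the Ca2+ level at which activity reaches half of the highest value measured. The function name, and the assumption that activity rises monotonically to a plateau within the measured range, are hypothetical.

```python
import math


def half_max_concentration(conc_mm, activity):
    """Estimate [Ca2+]0.5 (same units as conc_mm) from paired measurements.

    Half-maximal activity is taken as half the largest measured activity, and the
    concentration is interpolated linearly in log10(concentration) between the two
    measurements that bracket it. Concentrations must be > 0.
    """
    points = sorted(zip(conc_mm, activity))
    half = max(a for _, a in points) / 2.0
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 <= half <= a2 and a2 > a1:
            frac = (half - a1) / (a2 - a1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10.0 ** log_c
    raise ValueError("half-maximal activity is not bracketed by the data")


if __name__ == "__main__":
    # Synthetic data from a Hill curve (K = 0.4 mM, n = 2), for illustration only.
    ca = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 1.6, 3.2, 10.0]
    act = [100 * c**2 / (c**2 + 0.4**2) for c in ca]
    print(f"[Ca2+]0.5 ~ {half_max_concentration(ca, act):.2f} mM")
```

Using the highest measured activity as the maximum slightly underestimates the true plateau if saturation is not reached, which biases the estimate low; a curve fit avoids this at the cost of assuming a model.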

2.17 Crystallographic Analysis of m-Calpain Mutants

Several mutants were subjected to initial crystallization attempts under conditions similar to those used for wild-type and Cys105Ser calpain. Mutants that did not crystallize under these conditions were screened using the Hampton Research sparse-matrix kits Crystal Screen and Crystal Screen II.

The structures of several mutant enzymes were determined by X-ray crystallography. Data were collected and processed as described above, and the mutant structures were compared with the wild-type structure using ALIGN (117) and TURBO-FRODO (112).

Chapter 3: Results

3.1 Protein Purification

Following four successive chromatographic steps, recombinant rat m-calpain (wild-type or Cys105Ser) was purified to homogeneity as judged by SDS-PAGE followed by Coomassie blue staining (Figure 3.1). Typically, 20 mg of purified protein was obtained from 8 L of culture. Several months into this project, it was discovered that calpain is not sufficiently stable at 4 °C for crystallization purposes: SDS-PAGE analysis of aged protein samples indicated that calpain breakdown products accumulate over time (data not shown). Therefore, if not used immediately (within 1-2 days), the concentrated protein sample was divided into several small aliquots (typically 50 μL), flash-frozen in liquid nitrogen and stored at -70 °C. On thawing, the enzyme retained protease activity, demonstrating that snap freezing had not impaired the protein.

3.2 Purification of Selenomethionine-labeled calpain

The protein expression and purification protocols used to obtain the SeMet derivative of calpain were slightly modified from those normally used. Bacterial cell growth was much slower in the defined medium, and the culture could not be taken beyond an OD600 of ~0.8; beyond that point the medium became contaminated with a thick white precipitate, apparently due to cell lysis, and negligible amounts of purified protein could be obtained. Even after optimization of expression conditions, the yield of purified SeMet-calpain following the four-step column purification was generally less than 5 mg per 8 L of culture. Since SeMet-calpain crystallized under essentially the same conditions as wild-type and Cys105Ser calpain, this amount of protein was sufficient for structure determination. Amino acid analysis and MALDI mass spectrometry confirmed the incorporation of selenomethionine in place of methionine (not shown).

3.3 Calpain Activity Assays

Following purification, the calpain enzymes were tested for their ability to hydrolyze casein. Wild-type calpain cleaved casein, as indicated by the appearance of TCA-soluble digestion fragments in the supernatant of the assay medium. Cys105Ser m-calpain had no detectable activity and was therefore used in all initial crystallization experiments to avoid potential complications resulting from autoproteolysis. Several additional mutant calpain enzymes were also purified and assayed for activity, as discussed in more detail in sections 3.11 to 3.13.

Figure 3.1. SDS-polyacrylamide gel electrophoretic analysis of the purification of recombinant rat m-calpain. The fractions containing the peak amounts of calpain eluted from each chromatographic step are shown after SDS-PAGE and Coomassie blue staining: a) DEAE-Sephacel; b) Ni2+-NTA affinity; c) Sephacryl S-100HR gel filtration; d) Q-Sepharose FPLC. The right-most lane in all gels is purified calpain heterodimer with 80 kDa and 21 kDa subunits. The purified calpain heterodimer in (d) is ~95-99% pure based on Coomassie blue staining.

Initial crystallization efforts using the commercially available Hampton screening kits yielded several PEG-containing conditions that produced small crystals of Cys105Ser m-calpain. After extensive optimization, the best crystallization condition was determined to be 100 mM MES, pH 6.0-6.5, 10-12% PEG 6000, 50 mM NaCl, using ~10 mg/mL purified calpain. Crystals were grown at room temperature and generally became visible under the microscope within one to two days. In early work, most crystals were small and tended to grow as needle-like multi-crystals (Fig. 3.2), which were not suitable for collecting high-quality X-ray diffraction data. In general, crystals could not be obtained from calpain samples that had been stored at 4 °C for more than two days.

In addition, reproducibility of crystallization was a major problem, since fewer than 1% of drops contained even moderately sized crystals suitable for diffraction. A few crystals did diffract X-rays and were determined to belong to the triclinic space group P1.

In the early stages of crystallization, little attention was paid to the "age" of the purified protein. After it was discovered that aging appeared to significantly reduce the ability of the enzyme to crystallize, subsequent crystallization experiments were made more successful by using only fresh protein samples, or samples that had been flash-frozen immediately after the final purification and concentration step. Reducing the total time required for purification of the enzyme, and including fresh DTT (10 mM) in the final concentration buffer, dramatically affected crystal formation. After these modifications were made, a new crystal form, P2₁, was unexpectedly obtained (Fig. 3.3a). The crystallization conditions were essentially identical to those that produced the original P1 crystals, except for the presence of fresh 10 mM DTT in the protein solution. The P2₁ crystals were of much better quality, as judged by the absence of noticeable surface defects and the greater number of diffraction-quality single crystals.

The quality of the P1 crystals, and the reproducibility of their crystallization, were improved through a highly unexpected observation. During one purification, a protein sample that had undergone the first three chromatographic steps (DEAE, Ni2+-NTA and gel filtration) was kept at 4 °C for several days. The final purification step (Q-Sepharose) was then performed, and the enzyme was concentrated in a buffer containing only 0.1 mM DTT. (This enzyme preparation was intended for co-crystallization experiments with mercury, with which high [DTT] can interfere.) Initial crystallization experiments (without mercury) produced several single P1 crystals that grew to a large size without any surface defects (Fig. 3.3b). This modified purification procedure was repeated on several occasions, and in each instance excellent P1 crystals were obtained. It was also noticed that adding 10 mM DTT to the crystallization drops of this protein sample resulted in P2₁ crystals instead of P1. With these procedures, it was possible to reproducibly produce crystals of both native and selenomethionine-derivative m-calpain in both the P1 and P2₁ forms, although P2₁ crystals were easier to obtain.

Figure 3.2. Crystals of m-calpain obtained in the early stages of crystallization. Most crystals were needle-shaped multi-crystals and were unsuitable for X-ray diffraction experiments.

Figure 3.3. Diffraction-quality crystals of m-calpain. High-quality crystals were obtained in two crystal forms. a) A P2₁ crystal with dimensions of approximately 1.0 x 0.3 x 0.25 mm. b) A P1 crystal with dimensions of approximately 0.3 x 0.3 x 0.3 mm.

Calpain crystals grew to maximal size in approximately three to five days and remained stable for several weeks. After this period, the crystals began to display jagged surfaces, which was soon followed by complete dissolution. It was therefore essential to stabilize any diffraction-quality crystals before their inevitable breakdown. Although reducing the temperature or increasing the PEG concentration did prevent crystals from dissolving, both approaches were often accompanied by unwanted side effects. Crystals moved to 4 °C often grew new crystals on their surfaces, rendering them useless for data collection.

Crystals that were transferred to solutions with increased PEG concentrations sometimes cracked and were less useful for data collection. Additionally, neither method was adequate for transporting crystals to synchrotron facilities. Calpain crystals were readily disturbed by the stresses of transport, such as temperature fluctuations and unpredictable, sometimes vigorous, agitation. Only cryogenic storage proved practical for avoiding this problem: flash freezing the crystals in liquid propane circumvented all of the problems above and made crystal transport to the synchrotron virtually hassle-free.

The procedures described above applied strictly to crystals grown in the absence of Ca2+. Crystallization of calpain in the presence of Ca2+ was attempted solely with the inactive Cys105Ser mutant, since wild-type calpain undergoes autolytic digestion in the presence of Ca2+. Unfortunately, all crystallization attempts involving Ca2+ were unsuccessful owing to the massive aggregation that occurred upon Ca2+ addition. To avoid the aggregation encountered in co-crystallization trials, pre-formed native m-calpain crystals were also soaked in solutions containing Ca2+. These crystals cracked immediately and were completely shattered within several minutes, even at relatively low (~500 μM) Ca2+ concentrations.

3.5 X-Ray Diffraction Analysis

Several X-ray diffraction data sets were collected from native calpain crystals of both the P1 and P2₁ forms. A typical diffraction pattern (from a P2₁ crystal) is illustrated in Figure 3.4. The best data in both space groups were collected at CHESS beamline F-1, and the statistics are summarized in Table 3.1. Both crystal forms contained one m-calpain molecule per asymmetric unit. The triclinic P1 form has a higher solvent content and lower diffraction resolution than the monoclinic form.
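Solvent content is usually estimated from the Matthews coefficient, V_M = V_cell / (Z × M), with solvent fraction ≈ 1 - 1.23/V_M. The sketch below applies this relation to the SeMet P1 cell given in the notes to Table 3.2, assuming Z = 1 (one heterodimer in the P1 cell, as stated above) and a heterodimer mass of roughly 101 kDa taken from the nominal 80 kDa and 21 kDa subunit sizes. The exact construct mass is an assumption, so the result is only approximate, and the function names are hypothetical.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume (A^3) from edges in A and angles in degrees (general triclinic formula)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


def matthews(volume_a3, mass_da, molecules_per_cell):
    """Return (V_M in A^3/Da, approximate solvent fraction).

    The factor 1.23 corresponds to a protein partial specific volume of ~0.74 cm^3/g.
    """
    vm = volume_a3 / (mass_da * molecules_per_cell)
    return vm, 1.0 - 1.23 / vm


if __name__ == "__main__":
    # SeMet P1 cell from the notes to Table 3.2; Z = 1 for P1 with one molecule
    # per asymmetric unit; ~101 kDa is an assumed heterodimer mass.
    v = cell_volume(65.17, 79.91, 81.59, 108.50, 103.37, 112.95)
    vm, solvent = matthews(v, 101_000, 1)
    print(f"V = {v:,.0f} A^3, V_M = {vm:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

With these assumed inputs the estimate comes out a little above 60% solvent for the P1 form.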

Upon exposure to X-rays, both crystal forms decayed very quickly at room temperature, making it essential to collect data at cryogenic temperatures. Many cryoprotectants, including glycerol, MPD, fructose, sucrose, trehalose and mineral oils, were screened and deemed unsuitable because they either cracked the crystals or increased their mosaicity. For P1 crystals, serial transfer to crystallization solutions supplemented with 10%, 20% and finally 30% PEG 400 (or glucose), for 2 min at each concentration, followed by immediate flash cooling in liquid propane gave the best conditions for data collection. For P2₁ crystals, serial transfer through solutions supplemented with 5%, 10%, 20%, 30% and finally 40% ethylene glycol, for ~5 min at each concentration, followed by flash cooling in liquid propane was found to be optimal.

Liquid propane was by far the most effective medium for freezing crystals. Crystals that were frozen by placement in the cryostream or directly in liquid nitrogen suffered severe increases in disorder and mosaicity. Most importantly, no observable radiation damage was detected when data sets were collected at cryogenic temperatures.

Table 3.1. Native X-ray diffraction data statistics

Crystal form                              P1            P2₁
Beamline                                  CHESS-F1      CHESS-F1
Wavelength (Å)                            0.910         0.910
Cell dimensions a, b, c (Å)
Cell angles α, β, γ (°)
Resolution (Å)
Rsym (%)
Total no. of reflections
No. of unique reflections
Completeness (%)
I/σ(I)
Mosaicity (°)
No. of molecules per asymmetric unit      1             1
Solvent content (%)

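The precision of the data is given by $R_{sym} = \sum |I(k) - \langle I \rangle| / \sum I(k)$, where I(k) is the intensity of an individual measurement and ⟨I⟩ is the mean intensity of that reflection over all of its measurements.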
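The R_sym definition can be made concrete with a few lines of code. This is an illustrative sketch, not SCALEPACK's implementation: it assumes the measurements have already been reduced to the asymmetric unit, so that symmetry-equivalent observations share the same (h, k, l) key, and the function name is hypothetical.

```python
from collections import defaultdict


def r_sym(observations):
    """R_sym = sum |I(k) - <I>| / sum I(k), summed over all measurements k of all
    unique reflections, where <I> is the mean intensity of each unique reflection.

    observations: iterable of ((h, k, l), intensity) pairs.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = denominator = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


if __name__ == "__main__":
    obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((1, 0, 0), 90.0),
           ((0, 2, 1), 50.0), ((0, 2, 1), 50.0)]
    print(f"R_sym = {r_sym(obs):.3f}")
```

In this sketch, a reflection measured only once adds to the denominator but contributes nothing to the numerator.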

3.6 Conventional Heavy Atom Derivative Screening

Co-crystallization of calpain was attempted with a large number of heavy atom compounds, including K2PtCl4, KAu(CN)2, KAuCl4, NaAuCl4, another platinum salt, Pb(CH3COO)2, (CH3)3Pb(CH3COO), p-chloromercuribenzoate (PCMB), Hg(CH3COO)2, HgCl2, SmCl3, mercurochrome and many others. In almost all cases, and in particular for lanthanides (Ca2+ analogs) such as samarium and holmium, addition of the heavy atom resulted in massive protein aggregation in the crystallization drop (similar to that observed upon Ca2+ addition).

The addition of two heavy atom compounds, triethylphosphine gold chloride (TEPGC) and ethyl mercurithiosalicylate (thimerosal), eventually gave rise to crystals using PEG 6000 as the precipitant. Both conditions produced crystals of similar hexagonal morphology (Figure 3.5), with dimensions reaching 0.6 x 0.6 x 0.25 mm or larger. Unfortunately, these crystals did not diffract beyond 5.5 Å resolution, even at the synchrotron. From the limited data collected (just a few degrees on each of approximately 10 to 15 crystals screened), these crystals were determined to belong to a primitive hexagonal space group with unit cell dimensions of approximately 300 x 600 x 300 Å. Data collected from these crystals were insufficient for locating heavy atom positions and could not be used for structure determination.

Figure 3.5. Calpain crystals obtained from co-crystallization with heavy atoms. a) Co-crystals grown in the presence of triethylphosphine gold chloride (TEPGC) were typically 0.6 x 0.6 x 0.25 mm in size. b) Co-crystals grown with ethyl mercurithiosalicylate (thimerosal) were often very similar to those in (a), although they sometimes grew slightly differently. The co-crystal in (b) has dimensions of approximately 0.3 x 0.3 x 0.3 mm.

Essentially the same heavy atom compounds used in the co-crystallization trials were used in attempts to generate heavy atom derivatives by soaking native crystals. P2₁ native crystals were used in approximately 90% of the soaking trials, since they were more abundant than P1 crystals. Most heavy atom compounds caused immediate cracking or shattering of the crystals within seconds, even at very low (0.01 to 0.1 mM) concentrations. Several compounds, including K2PtCl4, Pd(CH3COO)2, mercurochrome, thimerosal and TEPGC, either did not visibly affect the crystals or caused cracking only at high heavy atom concentrations (10 mM or higher). Some compounds, such as Pd(CH3COO)2, K2PtCl4, mercurochrome, TEPGC and a few others, appeared to be promising leads because the crystals readily absorbed the heavy atom, as indicated by a change in crystal color after several hours of soaking. Such crystals were subsequently cryoprotected, flash frozen in liquid propane and subjected to X-ray diffraction analysis. Some crystals that did not appear damaged by soaking were unable to diffract X-rays, indicating internal disorder that was not visible under the microscope.

Of the heavy atom-soaked crystals that retained the ability to diffract X-rays, approximately 50 data sets were collected using both the home X-ray source and several synchrotron sources. In most cases, isomorphous difference Patterson maps showed that no heavy atoms were specifically bound. Some weak peaks were found on the Harker section (v = 1/2 for P2₁) of maps calculated from crystals soaked with either TEPGC or thimerosal (Figure 3.6), suggesting that a small fraction of these heavy atoms had bound specifically. Heavy atom refinement and phasing using the difference Patterson-derived TEPGC or thimerosal sites gave relatively poor statistics. Electron density maps and solvent-flattened maps were also poor, as judged by the lack of a clear protein-solvent boundary and very little connectivity in the electron density. Although the phase information was not sufficient to solve the structure, a cross-difference Fourier analysis of the Bijvoet differences from the peak wavelength of the SeMet MAD data, calculated with these phases, correctly located the selenium positions (as later verified by direct methods).

Figure 3.6. Isomorphous difference Patterson map from TEPGC-soaked native P2₁ crystals. A weak ~4σ peak was located on the Harker section v = 1/2 at u = 0.288, v = 0.5, w = 0.126, corresponding to a gold atom at x = 0.144, y = y, z = 0.063. The map was calculated using 10 to 3.8 Å data from the isomorphous differences between a native and a TEPGC-soaked crystal. This heavy atom site was later verified by cross-difference Fourier analysis using MAD phases (a 22σ peak), and its y-coordinate was determined to be y = 0.235 (relative to the y-origin defined by the selenium atoms). All coordinates are fractional coordinates in the P2₁ unit cell. The phase information obtained from this derivative was insufficient to solve the crystal structure. Contours are drawn at increments of 0.5σ, beginning at 2.0σ.
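The conversion from a Harker peak to an atomic position, used in the captions of Figures 3.6 and 3.8, follows from the P2₁ symmetry operator (-x, y + 1/2, -z) with b as the unique axis: the vector between an atom and its symmetry mate is (2x, 1/2, 2z), so a peak at (u, 1/2, w) gives x = u/2 and z = w/2, while y is left undetermined. The small helper below (name hypothetical) reproduces that arithmetic and also lists the equivalent choices x + 1/2 and z + 1/2, which correspond to the alternative permissible origins in P2₁.

```python
def p21_sites_from_harker(u: float, w: float):
    """Candidate (x, z) fractional coordinates from a P2_1 Harker peak at (u, 1/2, w).

    Symmetry mate of (x, y, z) is (-x, y + 1/2, -z), so the Harker vector is
    (2x, 1/2, 2z). The Harker section defines x and z only modulo 1/2, giving
    four equivalent choices; y is undetermined.
    """
    x, z = (u / 2.0) % 1.0, (w / 2.0) % 1.0
    return [((x + dx) % 1.0, (z + dz) % 1.0) for dx in (0.0, 0.5) for dz in (0.0, 0.5)]


if __name__ == "__main__":
    # Peaks reported in the captions of Figures 3.6 (gold) and 3.8 (selenium).
    for label, (u, w) in {"Au (Fig. 3.6)": (0.288, 0.126), "Se (Fig. 3.8)": (0.72, 0.09)}.items():
        x, z = p21_sites_from_harker(u, w)[0]
        print(f"{label}: x = {x:.3f}, z = {z:.3f}")
```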

3.7 MAD Data

In total, five MAD data sets were collected on the SeMet derivative of calpain during the course of this project. In each case, an X-ray fluorescence scan was measured on a large SeMet-containing crystal around the selenium K-absorption edge (12.620 to 12.720 keV, or 0.9839 to 0.9747 Å, respectively) (Figure 3.7). Inspection of these plots clearly indicated the positions of the white line and the inflection point, which were used to select the peak and edge wavelengths for data collection, respectively. The first data set, collected on the P2₁ crystal form at CHESS beamline F-2, was of low quality as judged by the data processing statistics from SCALEPACK (96). The crystals had suffered increases in mosaicity during transport to the synchrotron (this was before propane was adopted for freezing and storage). The second MAD data set, also collected on a P2₁ crystal at CHESS beamline F-2 several months later, was of much higher quality but still insufficient for structure determination. Although approximately 10 of the 18 selenium positions were eventually determined by direct methods, the major problem with these data was that the wavelengths were not stable during data collection because of major energy shifts at the beamline.

The third MAD data set, also on a P2₁ crystal, was collected at beamline X4A at the NSLS and appeared to be of higher quality than both CHESS data sets. The beam energy appeared stable throughout data collection, and the anomalous differences at the peak wavelength readily located 17 of the possible 18 selenium positions by direct methods. Again, it was not possible to interpret the resulting electron density maps from these data, for reasons we have not been able to resolve despite a tremendous amount of help from the crystallographic community.

Figure 3.7. X-ray fluorescence scan from a crystal of SeMet m-calpain. The fluorescence scan from a large SeMet crystal clearly shows the presence of selenium. The maximum fluorescence was observed at the white line (or peak) at a wavelength of 0.9790 Å (12664 eV), while the inflection point (or edge) was observed at 0.9793 Å (12660 eV). This scan was collected on NSLS beamline X4A at Brookhaven National Laboratory. Similar scans were obtained at CHESS beamline F-2 and SSRL beamline 1-5.
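Because the fluorescence scans and data-collection settings are quoted both as photon energies and as wavelengths, the conversion λ (Å) = hc/E ≈ 12.398/E (keV) is handy for checking them. A minimal sketch (function names are hypothetical):

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant x speed of light, in keV*Angstrom


def kev_to_angstrom(energy_kev: float) -> float:
    """X-ray wavelength (Angstrom) for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def angstrom_to_kev(wavelength_a: float) -> float:
    """Photon energy (keV) for an X-ray wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_a


if __name__ == "__main__":
    # White-line (peak) and inflection energies quoted in the Figure 3.7 caption.
    for label, ev in (("peak", 12664.0), ("inflection", 12660.0)):
        print(f"{label}: {ev:.0f} eV -> {kev_to_angstrom(ev / 1000.0):.4f} A")
```

For the two energies in the Figure 3.7 caption, this reproduces the quoted wavelengths of 0.9790 Å and 0.9793 Å.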

Table 3.2. MAD data collected at SSRL

Crystal                     λ (Å)    dmin (Å)  Observed      Unique        Completeness   I/σ(I)   Rsym (%)
                                               reflections   reflections   (%)
SeMet P2₁ (a)
  low energy remote         1.069    2.6       80,764        28,841        92.7 (66.1)    25.3     2.7 (15.6)
  anomalous peak            0.9795   2.6       155,288       57,860        93.4 (88.7)    21.5     3.1 (21.9)
  inflection point          0.9799   2.6       136,982       54,126        88.3 (81.1)    19.7     3.2 (26.0)
  high energy remote        0.9252   2.6       153,100       57,581        93.2 (89.0)    21.6     3.2 (21.5)
SeMet P1 (b)
  anomalous peak            0.9795   2.6       128,734       74,656        93.7 (92.1)    16.9     2.7 (37.4)
  inflection point          0.9799   2.6       67,174        43,860        57.2 (55.1)    18.7     2.0 (23.4)

Values in parentheses refer to reflections in the outer resolution shell, 2.7-2.6 Å. Unit cell dimensions: (a) P2₁: a = 51.85, b = 156.7, c = 44.43 Å, β = 95.32°; (b) P1: a = 65.17, b = 79.91, c = 81.59 Å, α = 108.50°, β = 103.37°, γ = 112.95°. All MAD data were collected in inverse-beam mode, with the exception of the low energy remote of the P2₁ form, which served as the "native" for phasing.

Figure 3.8. Anomalous difference Patterson map in P2₁. Bijvoet differences from the peak wavelength of a SeMet-calpain crystal (collected at SSRL) were used to calculate an anomalous difference Patterson map using 10 to 3.5 Å data. The v = 1/2 Harker section is shown; contours are drawn at increments of 0.5σ, beginning at 2.0σ. Only one relatively strong peak (~6σ) was found, at u = 0.72, v = 0.5, w = 0.09, indicating a selenium atom at x = 0.36, y = y, z = 0.045 in the P2₁ cell. Since there were 18 selenium atoms in the crystal, direct methods were required to determine the remaining selenium positions (see Figure 3.9). Direct methods confirmed this as a true site, with a y-coordinate of 0.475 relative to the y-origin defined by direct methods.

Figure 3.9. Solution of selenium positions in P2₁ using direct methods. The program Shake-and-Bake (SnB) (101) was used to identify the selenium positions from the Bijvoet differences collected at the peak wavelength. 1000 trials were run, each seeded with random coordinates for 18 potential heavy atom sites. The success of the direct methods algorithm in each trial was indicated by a minimal value of R_min. Most trials were unsuccessful, giving an R_min value of ~0.55-0.6; a potentially correct solution was indicated when a trial converged to a significantly lower value of R_min. a) Using peak data from CHESS, approximately 9 trials converged to an R_min of ~0.43; in these 9 solutions, 10 of the 18 output coordinates represented correct selenium positions. b) Using peak data from SSRL (shown here), approximately 100 of the 1000 trials converged to an R_min of ~0.35. Analysis of these solutions revealed that in each of the ~100 solutions 17 coordinates were identical, indicating that 17 selenium positions had been correctly identified. These sites were used for the phasing process that led to the structure determination.

Table 3.3. MAD phasing statistics for the P2₁ crystal form of SeMet m-calpain (a)

Data sets: λ1; λ2 (peak); λ3 (edge); λ4. Statistics for λ2, λ3 and λ4 are given as dispersive/anomalous.
Rows: R_Cullisᵇ (centric, acentric); phasing power (centric, acentric); refined heavy atom parametersᵈ (f′, f″); figure-of-merit from SHARPᶜ, 50-2.6 Å (centric, acentric); figure-of-merit after solvent flattening, 50-2.6 Å.

ᵃ MAD phasing was calculated as a special case of multiple isomorphous replacement with anomalous scattering (MIRAS), where λ1 was used as a native. ᵇ $R_{\mathrm{Cullis}} = \sum \lvert E \rvert / \sum \big\lvert |F_{PH}| - |F_P| \big\rvert$, where E is the lack of closure. Phasing power = $\sqrt{\sum |F_{H(\mathrm{calc})}|^2 / \sum E^2}$, where $F_{H(\mathrm{calc})}$ is the calculated anomalous difference and E is the lack of closure. ᶜ Reference 103. ᵈ f′ and f″ for λ1 were not refined.

Table 3.4. MAD phasing statistics for the P1 crystal form of SeMet m-calpain.ᵃ

Data sets: λ1 (edge), anomalous; λ2 (peak), dispersive/anomalous.
Rows: R_Cullisᵇ (acentric); phasing power (acentric); refined heavy atom parameters (f′, f″); figure-of-merit from SHARPᶜ, 50-2.6 Å (acentric); figure-of-merit after solvent flattening, 50-2.6 Å.

ᵃ MAD phasing was calculated as a special case of multiple isomorphous replacement with anomalous scattering (MIRAS), where λ1 was used as a native. ᵇ $R_{\mathrm{Cullis}} = \sum \lvert E \rvert / \sum \big\lvert |F_{PH}| - |F_P| \big\rvert$, where E is the lack of closure. Phasing power = $\sqrt{\sum |F_{H(\mathrm{calc})}|^2 / \sum E^2}$, where $F_{H(\mathrm{calc})}$ is the calculated anomalous difference and E is the lack of closure. ᶜ Reference 103. ᵈ f′ and f″ for λ1 were not refined.

Figure 3.10. Electron density maps. a) Experimental MAD electron density map obtained from SHARP (103) for the P1 crystal form. b) The same map after solvent flattening with SOLOMON (104). Solvent flattening significantly improved the map, allowing the model to be traced into the density. In both (a) and (b), the atoms illustrated represent the final model obtained after refinement was completed.
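The two phasing statistics defined in the table footnotes can be computed directly once the lack of closure for each reflection is known. The sketch below assumes the per-reflection values E, |F_H(calc)| and the observed differences have already been produced by a phasing program such as SHARP. It uses the common rms form of the phasing power. The function names are hypothetical.

```python
import math
from typing import Sequence


def r_cullis(lack_of_closure: Sequence[float], observed_diff: Sequence[float]) -> float:
    """R_Cullis = sum|E| / sum|observed difference| (e.g. ||F_PH| - |F_P||)."""
    return sum(abs(e) for e in lack_of_closure) / sum(abs(d) for d in observed_diff)


def phasing_power(f_h_calc: Sequence[float], lack_of_closure: Sequence[float]) -> float:
    """rms(|F_H(calc)|) / rms(E), the common definition assumed here."""
    rms_fh = math.sqrt(sum(f * f for f in f_h_calc) / len(f_h_calc))
    rms_e = math.sqrt(sum(e * e for e in lack_of_closure) / len(lack_of_closure))
    return rms_fh / rms_e


if __name__ == "__main__":
    # Hypothetical values for two reflections, for illustration only.
    print(r_cullis([1.0, 2.0], [4.0, 4.0]))
    print(phasing_power([3.0, 4.0], [1.0, 1.0]))
```

Centric and acentric reflections would be passed in separately to obtain the two columns reported in Tables 3.3 and 3.4. An R_Cullis below 1 and a phasing power above 1 both indicate useful phase information.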

3.8 Model Building and Refinement

The structure was built starting from an initial α-carbon trace into solvent-flattened electron density maps calculated from MAD data of the P2₁ crystal form. A poly-alanine backbone was then fitted into the density using the library of pentamer structures available in the program XFIT (105). Starting from the methionine positions (which were deduced from the selenium coordinates), side chains were added using the automatic fitting procedure in XFIT. Approximately 50% of these side chains were improperly positioned, so they were fitted manually into the electron density by applying the required torsional rotations about the side-chain bonds.

Electron density was either poorly defined or absent in several regions of the molecule; therefore, only approximately 700 of a total of 884 residues could be resolved initially. This model was then placed into the P1 unit cell (using the coordinates of the selenium atoms in both space groups as a reference). This allowed the correction of a few minor tracing mistakes and the addition of ~100 more residues. The model was then subjected to several iterative cycles of CNS refinement and manual rebuilding in XFIT in both crystal forms.
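Placing a model from one crystal form into another using matched selenium sites amounts to finding the rigid-body rotation and translation that best superimpose the two sets of sites. The sketch below uses the Kabsch (SVD) least-squares superposition. It assumes the Se positions in both crystal forms have already been converted to Cartesian coordinates and paired site-by-site. The fractional-to-Cartesian conversion and any symmetry or origin bookkeeping are not shown. The function names are hypothetical.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation r and translation t minimising sum ||r @ mobile_i + t - target_i||^2.

    Both arrays are (N, 3) Cartesian coordinates with rows in corresponding order.
    """
    mobile_c = mobile.mean(axis=0)
    target_c = target.mean(axis=0)
    h = (mobile - mobile_c).T @ (target - target_c)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_c - r @ mobile_c
    return r, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N, 3) coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def transfer_model(model_xyz: np.ndarray, se_source: np.ndarray, se_target: np.ndarray) -> np.ndarray:
    """Move model coordinates with the transform that superposes the matched Se sites."""
    r, t = kabsch(se_source, se_target)
    return model_xyz @ r.T + t
```

The same superposition, applied to main-chain atoms, underlies r.m.s.d. comparisons such as those quoted later for D-IV, D-VI and the modeled μ-calpain structure.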

Although native data for the P2₁ crystal form were collected to a much higher resolution (2.15 Å), the final structure is reported in the P1 space group at 2.6 Å resolution because a more complete model was resolved in this form. The model is well defined, with the exception of the following disordered regions, which were omitted: 245-260, 273-278, 292-321, 437-459 and 565-566. Including a total of 6428 protein and 361 water atoms, over the resolution range 25-2.6 Å, the crystallographic R-factor was 22.3% and R-free was 29.3% (Table 3.5). An analysis of the individual isotropic B-factors of the m-calpain structure showed the conventional trend: residues in the internal core of the structure are more rigidly defined than those at the surface. Several loops have very high B-factors, suggesting that they are highly flexible. The structure as a whole is also moderately flexible, as indicated by relatively high B-values across the entire molecule. The quality of the model was assessed with PROCHECK (115), and the model displays good stereochemistry: 83.2% of residues lie in the most favorable regions of the Ramachandran plot, and only glycine residues fall in the disallowed regions (Figure 3.11).
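The R-factor and R-free defined in the footnotes to Table 3.5 differ only in which reflections are summed. R-free uses the test set T that is excluded from refinement. The sketch below assumes |Fc| has already been scaled to |Fo| and that each reflection carries a boolean test-set flag. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum ||Fo| - |Fc|| / sum |Fo|, with Fc assumed already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    in_test_set: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by test-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, in_test_set) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, in_test_set) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    # Four hypothetical reflections; the last two form the test set.
    print(r_work_and_free([100.0, 80.0, 60.0, 40.0], [90.0, 84.0, 60.0, 50.0],
                          [False, False, True, True]))
```

Because the test reflections never influence refinement, a large gap between R-work and R-free is the usual warning sign of over-fitting.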

Table 3.5. Refinement statistics.

Space group                                        P1
Resolution (Å)                                     25-2.6
Total # reflections used
Total # reflections (working set)
Total # reflections (free set)
Overall R-factorᵃ                                  22.3%
Overall free R-factorᵇ                             29.3%
# of non-hydrogen atoms (protein / water)          6428 / 361
Average B-factor (Å²)
R.m.s.d. bond lengths (Å) / angles (°)

ᵃ Crystallographic R-factor, $R = \sum_{hkl} \big\lvert |F_o(hkl)| - |F_c(hkl)| \big\rvert / \sum_{hkl} |F_o(hkl)|$, where (h,k,l) are the reflection indices. ᵇ Free R-factor, $R_{\mathrm{free}} = \sum_{hkl \in T} \big\lvert |F_o(hkl)| - |F_c(hkl)| \big\rvert / \sum_{hkl \in T} |F_o(hkl)|$, where (h,k,l) are the reflection indices from the test set T. ᶜ No sigma cutoff was applied to the data.

Figure 3.11. Ramachandran plot. The crystal structure of m-calpain displays good stereochemistry, as ~83% of the residues are in the most favorable region (red) of the Ramachandran plot. No residues other than glycine are in the disallowed (white) regions.

3.9 Overall Structure of m-Calpain

Since the crystal structure was solved in the absence of Ca²⁺, all observations refer to the enzyme in its inactive conformation. The m-calpain heterodimer is an elongated, multi-domain assembly with dimensions of approximately 100 x 60 x 50 Å, as seen in the ribbon diagram in Figure 3.12. The large subunit can readily be divided into four independently folded domains (domain-I to domain-IV, or D-I to D-IV), in addition to an α-helical N-terminal anchor of 19 residues (Figure 3.13a). A structure-based domain classification is illustrated for simplicity in Figure 3.13b. This classification scheme is slightly different from the one derived on the basis of primary sequence in 1984 (19) (see Figure 1.3), and is justified by the structure. Calpain is effectively circularized: the N-terminal region of the large subunit interacts with D-VI of the regulatory subunit, which in turn dimerizes with D-IV at the C-terminus of the large subunit. Notably, the catalytic component of this Ca²⁺-dependent enzyme (the protease) is situated quite far from the EF-hand domains (Figure 3.14). The electrostatic representation of the protein surface in Figure 3.15 shows that m-calpain is highly acidic. In the absence of Ca²⁺, rat m-calpain also contains several disordered regions, most of which are located in D-II.

Figure 3.12. Crystal structure of the m-calpain heterodimer. The catalytic subunit (blue) and domain-VI of the regulatory subunit (orange) form the m-calpain heterodimer as it exists in the Ca²⁺-free conformation. The N- and C-termini are indicated for each subunit. Dotted lines represent disordered regions that could not be identified.

Figure 3.13. Domain structure of m-calpain. a) Ribbon diagram of m-calpain. b) Schematic diagram illustrating the domain organization of m-calpain. c) Stereodiagram of (a). The 80 kDa subunit is composed of a 19-residue anchor (red) at the N-terminus, protease domains I and II (blue and cyan, respectively), domain-III (green), a ~15-residue linker (magenta) and domain-IV (yellow). The regulatory subunit contains only domain-VI (orange) in this structure, since the glycine-rich domain-V (white) was not present in the recombinant construct. Catalytic triad residues are indicated in red, and approximate domain boundaries are indicated by residue number. This color scheme is followed in most of the subsequent figures.

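[Figure 3.13b schematic. Large subunit: Anchor | Protease D-I | Protease D-II | D-III | Linker | D-IV (EF-hands), with boundaries at residues 19, 210, 355, 514, 530 and 700; catalytic residues Cys105, His262 and Asn286. Regulatory subunit: D-V | D-VI (EF-hands).]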
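The approximate boundaries in Figure 3.13b can be used to assign any large-subunit residue to a domain. For example, this supports the statement that most of the disordered residues lie in D-II. The sketch below is illustrative only. It treats the schematic's residue numbers as inclusive upper bounds, covers the catalytic subunit only (D-VI is numbered separately), and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable, Tuple

# Inclusive upper boundaries of large-subunit regions, read from the approximate
# residue numbers in Figure 3.13b (the boundaries are approximate in the original).
REGIONS = [(19, "anchor"), (210, "D-I"), (355, "D-II"), (514, "D-III"),
           (530, "linker"), (700, "D-IV")]
UPPER_BOUNDS = [upper for upper, _ in REGIONS]


def region_of(residue: int) -> str:
    """Return the large-subunit region containing a residue number (1-700)."""
    if not 1 <= residue <= UPPER_BOUNDS[-1]:
        raise ValueError(f"residue {residue} is outside the large subunit")
    return REGIONS[bisect_left(UPPER_BOUNDS, residue)][1]


def count_by_region(segments: Iterable[Tuple[int, int]]) -> Dict[str, int]:
    """Count residues of inclusive (start, end) segments per region."""
    counts = Counter(region_of(r) for start, end in segments for r in range(start, end + 1))
    return dict(counts)


if __name__ == "__main__":
    # Disordered segments omitted from the final model (section 3.8).
    disordered = [(245, 260), (273, 278), (292, 321), (437, 459), (565, 566)]
    print(count_by_region(disordered))
```

With these boundaries, 52 of the 77 omitted residues fall in D-II, 23 in D-III and 2 in D-IV.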

Figure 3.14. Spatial arrangement of the protease and EF-hand domains. Since calpain activation is entirely dependent upon Ca²⁺ binding, it is notable that the catalytic component of calpain (the cysteine protease, blue) is separated by a significant distance from the EF-hand domains (red). This arrangement has significant implications for Ca²⁺-dependent regulation.

Figure 3.15. An electrostatic representation of the van der Waals surface. Regions of red and blue indicate negatively and positively charged environments, respectively. Overall, calpain is acidic, which may help to attract Ca²⁺ ions for activation.

3.9.1 The Cysteine Protease (Domains I and II)

As expected, the protease component of calpain shares several similarities with other members of the papain-like cysteine proteases. The protease region is contained within the catalytic subunit and is subdivided into two distinct domains (D-I and D-II).

The catalytic triad residues (Cys105 in D-I, His262 and Asn286 in D-II) are located at the interface between D-I and D-II (Figure 3.16). The "backbone" of D-I (residues 20-210) is a central helix (residues 176-189) that is flanked on three faces by a cluster of α-helices and two anti-parallel β-sheets. This α/β domain is very well structured owing to an extensive core of hydrophobic residues that stabilize the fold. Domain-I has an entirely novel fold, as no structural homologs have been detected to date in database searches.

Domain-I is non-covalently associated with the regulatory subunit through the preceding 19 N-terminal residues, which form a single α-helix (see below for more detail).

Domain-I also makes contacts with D-III (see below) and has several regions that contribute to the extensive interface with D-II. Apart from the α-helix containing Cys105 (residues 105 to 114), which is slightly shorter than the corresponding helix in other members of the papain family, D-I is almost entirely unrelated to the corresponding domain in typical thiol proteases. This conserved helix orients the catalytic residue Cys105 towards the active-site cleft at the interface with D-II. A hinge region composed of consecutive glycine residues (Gly209 and Gly210) covalently links the two protease domains.

In contrast, the overall fold of D-II (residues 211 to ~355) is quite similar to that of the corresponding domain in other cysteine proteases, as its core consists of two three-stranded anti-parallel β-sheets. These strands serve an important role, namely to orient His262 and Asn286 of the catalytic triad towards the interface with D-I. A second conserved structural feature in D-II is a three-turn α-helix (residues 224 to 235) that appears to have a distinct role in calpain, since it makes several inter-domain electrostatic contacts with D-III. Although these conserved elements are structurally well defined, D-II appears to be the most flexible domain in calpain, with three regions (residues 245-260, 273-278 and 292-321) that were not visible in the electron density maps.

Figure 3.16. The cysteine-protease component. a) The protease module in calpain consists of two independently folded domains (D-I, blue; D-II, cyan). The residues of the catalytic triad lie at the interface of these two domains, as in other thiol proteases. b) Stereodiagram of (a).

3.9.2 Domain-III

Domain-III consists of residues ~356-513 and forms an eight-stranded anti-parallel β-sandwich. The β-sandwich is a fairly common fold, and 3D homology searching revealed several structurally similar folds, notably the C2 domain, the immunoglobulin fold and viral coat proteins. In calpain, D-III resembles a battery: a core component (the β-strands) separates two oppositely charged poles (the loops at either end of the sandwich) (Figure 3.17). The "negative pole" contains several acidic side chains contributed by two loops (residues 392-404 and 502-505) and interacts primarily with D-II. In contrast, the "positive pole" of D-III has several basic residues on one large loop (residues 415-426). These residues interact with D-I, with the hinge region between D-I and D-II, and with acidic residues in D-VI. Residues 367-371 in D-III also form a loop that lies close to residues 624-625 and to an α-helix (residues 640-651) in D-IV. The β-sandwich scaffold therefore appears to give D-III an important structural role, since it positions several of its loops at critical locations where they form a network of inter-domain interactions in the central region of the molecule.

Figure 3.17. Domain-III. a) Domain-III in m-calpain is an eight-stranded anti-parallel β-sandwich with a highly acidic loop region at one end and a highly basic loop region at the other. By interacting with every domain in calpain, D-III likely serves an important role in inter-domain "communication". Colors are as in Figure 3.13, with acidic side chains shown in red and basic side chains shown in blue. b) Stereodiagram of (a).

3.9.3 The EF-hand Domains (D-IV and D-VI)

D-IV and D-VI, the calmodulin-like EF-hand domains of each subunit, are ~50% identical to each other in sequence and are predominantly α-helical, each containing five EF-hand motifs (Figure 3.18). Their structures are very similar (r.m.s.d. of 1.7 Å on all main-chain atoms) and are related by pseudo-twofold symmetry. Heterodimerization of the catalytic and regulatory subunits occurs primarily through hydrophobic interactions in the C-terminal regions of D-IV and D-VI. The structure of D-VI in the heterodimer described here is virtually identical to its structure in the homodimer (r.m.s.d. of 1.29 Å on main-chain atoms). As in the homodimer structure, EF1 and EF2 associate to form one pair, while EF3 and EF4 form a second pair. EF5 of D-IV and EF5 of D-VI form an intermolecular pair, confirming the prediction based on the crystal structure of the D-VI homodimer (66, 67).

Figure 3.18. The EF-hand domains. a) Domain-IV and domain-VI constitute the calmodulin-like EF-hand domains in m-calpain. Both D-IV (yellow) and D-VI (orange) have five EF-hands. As shown for D-IV, EF-hands 1 through 4 (blue) are paired within each domain. EF5 in D-IV (red) and EF5 in D-VI (light blue) contribute to heterodimer formation through intermolecular association, or "EF-hand embrace". This mode of dimer formation is unique to the calpain superfamily. b) Stereodiagram of (a).

3.9.4 The N-terminal Anchor

The N-terminus of the catalytic subunit is a 19-residue α-helix that anchors D-I to D-VI of the regulatory subunit. A pocket in D-VI provides a hydrophobic environment for several hydrophobic residues near the N-terminus of the α-helix (Figure 3.19). A helix-helix dipole interaction may also stabilize this contact. Towards the C-terminal end of the anchoring helix, a series of electrostatic interactions is formed with D-VI. In this Ca²⁺-free structure, the anchor interacts exclusively with D-I and D-VI (Figure 3.19), suggesting that the functions of D-I and D-VI may be interdependent. Unlike the N-terminal "extension" of other cysteine proteases such as the cathepsins, the N-terminal "extension" in calpain is not in the vicinity of the active-site cleft.

3.9.5 The Linker

D-III leads into a ~15-residue extended linker that connects D-III to the Ca²⁺-binding D-IV. The linker lacks secondary structure, with the exception of three residues (516-518) that form a short anti-parallel β-sheet with three residues (636-638) from D-IV. This short β-sheet is strengthened by a conserved electrostatic interaction between the side chains of Glu517 and Lys637 (Figure 3.20).

Figure 3.19. Calpain has a unique N-terminal anchor. a) The helical anchor (residues 2-16 are shown) interacts exclusively with D-I and D-VI (colors as in Figure 3.13). b) A view down the helical axis highlights electrostatic interactions between residues in the anchor (magenta type) and D-VI (black type), represented as an electrostatic GRASP (120) surface (red, acidic; blue, basic). The hydrophobic pocket in D-VI interacts with hydrophobic residues near the N-terminus of the anchor, including Gly3, Ile4, Ala5, Leu8 and Ala9. Unlike the N-terminal extensions of other cysteine proteases, this N-terminal α-helix is not in the vicinity of the active site.

Figure 3.20. The linker. a) A short peptide (residues ~515 to 530) links D-III to the EF-hand D-IV in the catalytic subunit of m-calpain. This linker is largely devoid of secondary structure, with the exception of a short β-strand from residues 516 to 518, which interacts with residues 636-638 in D-IV to form a short anti-parallel β-sheet. This interaction is further stabilized by a salt bridge between E517 (red) and K637 (blue), residues conserved in all known species of m-calpain. b) A refined 2Fo-Fc map illustrates that this β-sheet is structurally well defined. The residues in the C-terminal region of the linker (~525-530), which connect to EF1 in D-IV, are somewhat unstructured.

3.9.6 The Active Site

A fundamental question in calpain regulation is how the enzyme is maintained in an inactive conformation prior to Ca²⁺ binding. This structure has revealed a regulatory mechanism that is highly unusual among proteases and unprecedented within the cysteine protease family. No pro-segment is bound across the active site; instead, the active site is simply not assembled. Although the α-helix containing Cys105 in D-I and the fold of D-II are each similar to those of other thiol proteases, the required geometry of the catalytic residues is not observed in m-calpain in the absence of Ca²⁺. Numerous crystal structures of cysteine proteases have clearly shown that the catalytic Cys and His residues form an ion pair in the active site. Specifically, the interatomic distance between the S atom of Cys and the Nδ atom of His is approximately 3.7 Å (77, 80-85). In this orientation, the His-Nδ atom is at an appropriate distance to coordinate the hydrogen atom bonded to the Cys-S. This significantly decreases the pKa of the sulfur, increases its negative charge and renders it nucleophilic. In the absence of Ca²⁺, this structure reveals that the catalytic Cys105-S in D-I is ~10.5 Å from His262-Nδ. It is therefore too remote to form a competent catalytic triad with its counterparts His262 and Asn286 in D-II (Figure 3.21). A conformational change (caused by Ca²⁺ binding) must reduce this distance to ~3.7 Å in order to assemble the triad and form an active protease.
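The "~7 Å too far" figure quoted for Figure 3.21 is simply the observed Cys-S to His-Nδ distance minus the ~3.7 Å ion-pair distance. The sketch below makes that arithmetic explicit. It assumes the two atoms' Cartesian coordinates would be taken from the model, for example the PDB atoms named SG and ND1. The coordinates used here are hypothetical values chosen to give a 10.5 Å separation, and the function name is also hypothetical.

```python
import math
from typing import Sequence

# Approximate Cys S - His N(delta) distance in active papain-like proteases (from the text).
ION_PAIR_DISTANCE = 3.7


def catalytic_gap(cys_s: Sequence[float], his_nd: Sequence[float]) -> float:
    """How far (in Angstrom) the Cys S - His N(delta) distance exceeds ~3.7 A."""
    return math.dist(cys_s, his_nd) - ION_PAIR_DISTANCE


if __name__ == "__main__":
    # Hypothetical coordinates 10.5 A apart, not taken from the deposited model.
    cys_sg = (0.0, 0.0, 0.0)
    his_nd1 = (6.3, 8.4, 0.0)
    print(round(math.dist(cys_sg, his_nd1), 2), round(catalytic_gap(cys_sg, his_nd1), 2))
```

For the Ca²⁺-free structure this gives 10.5 - 3.7 = 6.8 Å, which the figure legend rounds to ~7 Å.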

Additional differences between the active-site residues of m-calpain and papain-like proteases were observed in the structure. Trp288 and Gln99 are conserved in other cysteine proteases (48) and are known to be important for catalysis in m-calpain (47). Both residues adopt significantly different conformations here.

Figure 3.21. Structural basis for Ca²⁺-dependent protease activity. a) The active site of Ca²⁺-free calpain (colors as in Figure 3.13) superimposed on the active site of papain (red, PDB accession code 9PAP). His262 and Asn286 of calpain are arranged in an orientation similar to that in papain. Calpain is inactive in the absence of Ca²⁺ because the catalytic Cys105 is displaced ~7 Å too far from His262. The distance between His-Nδ and Cys-S is ~10.5 Å in calpain and ~3.7 Å in active cysteine proteases; hence the catalytic triad is not formed. b) Stereodiagram of (a). (See text for additional detail.)

3.10 Molecular Modeling

3.10.1 Modeling of Active Calpain and Inhibitor/Substrate Design

Since the structure was determined in the Ca²⁺-free, inactive conformation, molecular modeling was used to generate a model of what the protease domains might resemble in the active conformation. The conserved active-site residues of m-calpain were brought into register with the conformation observed in papain and other cysteine protease structures (Figure 3.22). This was done using a combination of manual adjustment in TURBO-FRODO (112) and computational optimization in SYBYL (123).

The modeled active site of m-calpain was subsequently used to design a stable peptide that could potentially act as a calpain substrate or inhibitor. Starting from a hepta-alanine peptide, the side chain at each position was mutated in turn, and the complex was energy minimized to determine which amino acid was most stable at that position. An alternative procedure would have been to compare the energies of each peptide and the active site at infinite separation (not interacting) with those of the energy-minimized complex. With the procedure applied in this research, however, the most stable peptide obtained was N-Ala-Ala-Leu-Leu-Lys-Arg-Phe-C (Table 3.6). This peptide was synthesized and tested for its ability to inhibit m-calpain hydrolysis of a fluorescent peptide substrate. N-Ala-Ala-Leu-Leu-Lys-Arg-Phe-C did display calpain inhibitory activity (data not shown). However, this activity was extremely weak compared with well-characterized calpain inhibitors such as leupeptin, E64 and calpastatin, and kinetic parameters could not be determined.

Figure 3.22. Molecular model of the active site of Ca²⁺-activated m-calpain. a) Overlap of the active sites of Ca²⁺-free (inactive) m-calpain (colors as in Figure 3.13) and papain (red). b) Overlap of the active sites of "active" calpain and papain. Following the molecular modeling procedure, the calpain active site is very similar to that of papain. Both panels are stereodiagrams.
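The peptide design described above is a greedy, one-position-at-a-time search. Each position is mutated in turn, and the lowest-energy residue is kept before the next position is scanned. The sketch below shows only that search logic. The energy function is a toy stand-in, not SYBYL energy minimization. Its scores and the candidate residue set are hypothetical and were chosen only so that the scan reproduces the peptide reported in Table 3.6.

```python
from typing import Callable, Dict, List

POSITIONS = ["P5", "P4", "P3", "P2", "P1", "P1'", "P2'"]  # N- to C-terminus
SCAN_ORDER = ["P2", "P3", "P1", "P1'", "P2'"]  # order described for Table 3.6
CANDIDATES = ["Ala", "Leu", "Lys", "Arg", "Phe"]  # hypothetical residue set

# Toy per-position scores standing in for a minimised complex energy.
# These are NOT values from this work.
TOY_SCORES = {("P2", "Leu"): -3.0, ("P3", "Leu"): -1.5, ("P1", "Lys"): -1.0,
              ("P1", "Arg"): -0.5, ("P1'", "Arg"): -2.0, ("P2'", "Leu"): -1.0,
              ("P2'", "Phe"): -2.5}


def toy_energy(peptide: Dict[str, str]) -> float:
    """Sum the toy scores of the residues currently at each position."""
    return sum(TOY_SCORES.get((pos, res), 0.0) for pos, res in peptide.items())


def greedy_scan(start: Dict[str, str], order: List[str], candidates: List[str],
                energy: Callable[[Dict[str, str]], float]) -> Dict[str, str]:
    """Optimise one position at a time, keeping the lowest-energy residue."""
    current = dict(start)
    for pos in order:
        best_res, best_e = current[pos], energy(current)
        for res in candidates:
            trial = dict(current)
            trial[pos] = res
            e = energy(trial)
            if e < best_e:
                best_res, best_e = res, e
        current[pos] = best_res
    return current


if __name__ == "__main__":
    start = {pos: "Ala" for pos in POSITIONS}
    best = greedy_scan(start, SCAN_ORDER, CANDIDATES, toy_energy)
    print("N-" + "-".join(best[p] for p in POSITIONS) + "-C")
```

A greedy scan evaluates far fewer peptides than an exhaustive search, but it can miss combinations in which residues at neighboring positions stabilize one another. The weak inhibition observed experimentally is a reminder of that limitation.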

Table 3.6. Influence of amino acid sequence on enzyme-substrate stability.ᵃ

Amino acid at peptide position                        Energy
P2'    P1'    P1     P2     P3     P4     P5         (kcal/mol)
Ala    Ala    Ala    Ala    Ala    Ala    Ala        -973
Ala    Ala    Ala    Leu    Ala    Ala    Ala        -1060
Ala    Ala    Ala    Leu    Leu    Ala    Ala        -1100
Ala    Ala    Lys    Leu    Leu    Ala    Ala        -1120
Ala    Arg    Lys    Leu    Leu    Ala    Ala        -1170
Leu    Arg    Lys    Leu    Leu    Ala    Ala        -1200
Phe    Arg    Lys    Leu    Leu    Ala    Ala        -1250

ᵃ Starting from the modeled active form of the m-calpain active site, SYBYL (123) was used to generate a hypothetical calpain substrate/inhibitor through "mutagenesis" of a hepta-peptide followed by energy minimization. Each successive row incorporates the most stable residue obtained after a given round of "mutagenesis" and refinement. The P2 position was tested first, followed by the P3, P1 and P1' positions and finally the P2' position. The P4 and P5 positions were kept as alanine residues, since any other residue tested at these positions decreased the stability of the system. The energy minimization included only the atoms of the peptide and the atoms of m-calpain within a 10 Å radius of the peptide. The remainder of the D-I and D-II coordinates were omitted from the minimization to reduce computational time.

Figure 3.23. Molecular model of a calpain-peptide complex. The modeled calpain structure is shown as a GRASP (120) electrostatic surface in a) top and b) side views. The peptide N-Ala-Ala-Leu-Leu-Lys-Arg-Phe-C is shown (in rod representation) interacting with the substrate-binding cleft. The peptide fits well in the active-site groove, with its two basic side chains interacting with acidic regions (red) on the calpain surface.

The amino acid sequences of the μ- and m-calpain catalytic subunits are ~60% identical. A model of the μ-calpain structure was therefore relatively straightforward to obtain using the SWISS-MODEL server (http://www.expasy.ch/swissmod/SWISS-MODEL.html). As expected from such high sequence identity and from the nature of homology modeling, the modeled μ-calpain structure was very similar overall to that of m-calpain (r.m.s.d. of 0.26 Å on all Cα atoms).
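Sequence identity, which underlies the choice of homology modeling above, depends on how aligned columns are counted. The sketch below takes one common convention: identical residues divided by the number of aligned columns in which neither sequence has a gap. Other conventions divide by the length of the shorter sequence. The function name and the short example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


if __name__ == "__main__":
    # Hypothetical pre-aligned fragments: 6 identities over 7 ungapped columns.
    print(round(percent_identity("ACDEFGH-K", "ACDQFG-IK"), 1))
```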

3.11 Production, Expression and Purification of m-Calpain Mutants

Following the determination of the m-calpain structure, several site-directed mutants were generated to assess the effects of specific features on calpain activation.

Of the ~25 mutants that were generated and analyzed, only one (Lys226Ala/Lys230Ala/Lys234Ala) was not expressed in E. coli, as indicated by immunoblot analysis (data not shown). Expressed mutants had phenotypes ranging from completely inactive to essentially wild-type activity, as judged by casein zymography (Figure 3.24a). Only those mutants that displayed at least a trace of activity were expressed and purified on a large scale, as described in section 2.2. Of these, several were poorly expressed and could not be purified to homogeneity. Table 3.7 summarizes the expression and purification of the various mutants, and the SDS-polyacrylamide gel in Figure 3.24b illustrates the purity of several of them.

Table 3.7. Analysis of various calpain mutants.ᵃ

Mutant                        Subunitᵇ  Expressed in E. coli?  Activity in zymogram?  Expressed and purified?
K10T                          L         Yᶜ                     Y
K10T/R12S                     L         -ᵈ                     -                      -
R12S                          L         -                      -                      -
K226A/K230A/K234A             L         Nᵉ                     -                      -
K226S                         L         Y                      Y                      Y
K230S                         L         Y                      Y                      Y
K230E                         L         Y                      Y                      Y
K234S                         L         Y                      Y                      Y
K234E                         L         Y                      Y                      Y
K230E/K234E                   L         Y                      N                      N
R474A                         L         Y                      N                      N
R500H                         L         Y                      N                      N
E504S                         L         Y                      Y                      Y
E515S (μI-III-Fsp-mIV)ᶠ       L         Y                      Y                      Y
E515Q/E517Q                   L         -                      -                      -
E515P                         L         Y                      Y                      Y
E517P                         L         Y                      Y                      Y
K10T/E504S                    L         Y                      Y                      Y
K10T/E504S/E517P              L         -                      -                      -
E504S/E517P                   L         Y                      Y                      Y
S698C/M155C                   L/S       Y                      Y                      Y
A5C/A158C                     L/S       Y                      Y                      Y

ᵃ Several of the calpain mutants were expressed and analyzed to different extents, depending on their observed properties. The single-letter amino acid code is used for clarity. ᵇ L, mutation in the large subunit; L/S, double mutant in the large and small subunits. ᶜ Y, yes. ᵈ -, no data. ᵉ N, no. ᶠ This mutant was made in a chimeric μ/m-calpain construct (see section 2.16.2 for details).

Figure 3.24. Expression and purification of m-calpain mutants. a) A casein zymogram illustrates the varying degree of activity shown by wild-type m-calpain (lane 4) and various mutant calpains. Lane 1, Lys226Ser; lane 2, Lys230Ser; lane 3, Lys230Glu/Lys234Glu. A white band in a casein zymogram indicates activity: the casein has been digested and released from the gel and is therefore not stained with Coomassie blue. The Lys226Ser and Lys230Ser mutants have activities comparable to wild-type m-calpain, whereas the Lys230Glu/Lys234Glu mutant has very weak activity. b) SDS-PAGE analysis of wild-type m-calpain and several mutants. Lane 1, molecular weight marker; lane 2, wild-type; lane 3, Glu504Ser; lane 4, Lys226Ser; lane 5, Lys230Ser; lane 6, Lys234Ser; lane 7, Lys230Glu; lane 8, Lys234Glu. Two distinct bands (80 kDa and 21 kDa) indicate the presence of highly purified calpain heterodimers.

An interesting feature was observed in the purification of the double mutant Glu504Ser/Glu517Pro: the regulatory subunit appeared to dissociate from the catalytic subunit during the purification procedure (Figure 3.25). This phenomenon occurred in four successive purification attempts, showing that some feature of this mutation causes heterodimer "instability". After dissociation of the small subunit, the remaining catalytic subunit was considerably less soluble, as indicated by a white precipitate that accumulated over time in the protein solution. Interestingly, neither of the single mutants Glu504Ser or Glu517Pro showed any tendency towards subunit dissociation.

3.12 Disulfide Bond Formation Through Mutagenesis

Guided by the program SSBOND (126), two double mutants (S698C-80k/M155C-21k and A5C-80k/A158C-21k) were generated in m-calpain in an attempt to introduce specific disulfide bonds (m-calpain has no endogenous disulfide bonds). Since the putative disulfide bonds would covalently link the catalytic (80 kDa) and regulatory (21 kDa) subunits, a ~101 kDa band would be visible on non-reducing SDS-PAGE if a disulfide bond had formed. Non-reducing SDS-PAGE analysis indicated that disulfide bonds had not formed following purification of the enzymes.

To stimulate disulfide bond formation, oxidized glutathione or CuCl₂ was introduced slowly by dialysis. SDS-PAGE analysis showed that CuCl₂, but not glutathione, promoted the formation of a ~100 kDa band for both double mutants. However, wild-type m-calpain also oxidized to the same 100 kDa band, suggesting that disulfide bond formation occurred non-specifically (data not shown).

Figure 3.25. The Glu504Ser/Glu517Pro mutation causes subunit dissociation. Repeated attempts to purify a heterodimeric Glu504Ser/Glu517Pro double mutant of calpain were unsuccessful. In each of four separate attempts at expression and purification of this mutant, the regulatory subunit appeared to dissociate and was lost during the purification protocol. SDS-PAGE analysis of the purified enzyme is shown: column fractions eluted from FPLC Q-Sepharose are in the left lanes and wild-type m-calpain is in the right-most lane. Compared with wild-type, the mutant fractions contain a disproportionate amount of the 80 kDa subunit relative to the 21 kDa regulatory subunit. Regulatory subunit dissociation was not observed in either the Glu504Ser or the Glu517Pro single mutant.

3.13 Effect of Mutations on Ca²⁺-Requirement and Specific Activity

Several mutant enzymes were assayed for their ability to hydrolyze casein at varying concentrations of Ca²⁺. The [Ca²⁺]0.5 (the Ca²⁺ concentration giving half-maximal activity), calculated from the Hill equation, served as the basis for comparing the Ca²⁺-requirement of wild-type and mutant enzymes. Under the conditions employed in these experiments, the [Ca²⁺]0.5 for wild-type m-calpain was 242 ± 6 μM (Figure 3.26), which is in general agreement with previous reports (14-18, 88).
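The [Ca²⁺]0.5 values come from fitting activity data to the Hill equation, y = xⁿ/(kⁿ + xⁿ). The sketch below estimates k and n with the classical Hill-plot linearization, ln(y/(1 - y)) = n ln x - n ln k. This is a swapped-in, simpler technique than a direct nonlinear fit and weights the data differently, so it is not necessarily how the thesis values were obtained. The function names are hypothetical. The demonstration uses noise-free data generated with k = 242 μM and an assumed Hill constant of 2.

```python
import math
from typing import Sequence, Tuple


def hill(x: float, k: float, n: float) -> float:
    """Fraction of maximal activity: y = x^n / (k^n + x^n)."""
    return x ** n / (k ** n + x ** n)


def fit_hill_linearized(conc: Sequence[float], frac_activity: Sequence[float]) -> Tuple[float, float]:
    """Estimate (k, n) from a Hill plot: ln(y / (1 - y)) = n ln x - n ln k.

    Only points with x > 0 and 0 < y < 1 can be used. Returns k (the [Ca2+]0.5,
    in the units of conc) and the Hill constant n.
    """
    pts = [(math.log(x), math.log(y / (1.0 - y)))
           for x, y in zip(conc, frac_activity) if x > 0 and 0 < y < 1]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < y < 1")
    mean_x = sum(px for px, _ in pts) / len(pts)
    mean_y = sum(py for _, py in pts) / len(pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    n = sxy / sxx  # slope of the Hill plot
    intercept = mean_y - n * mean_x  # equals -n ln k
    return math.exp(-intercept / n), n


def percent_reduction(reference: float, value: float) -> float:
    """Percent decrease of `value` relative to `reference`."""
    return 100.0 * (reference - value) / reference


if __name__ == "__main__":
    calcium_um = [50.0, 100.0, 200.0, 400.0, 800.0, 1600.0]
    activity = [hill(x, 242.0, 2.0) for x in calcium_um]  # synthetic data
    print(fit_hill_linearized(calcium_um, activity))
    print(round(percent_reduction(242.0, 129.0)))
```

The last line reproduces the arithmetic behind the Glu504Ser comparison below: (242 - 129)/242 ≈ 0.47, a 47% reduction in [Ca²⁺]0.5.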

Based on the crystal structure, mutations were targeted to three main regions of calpain: the N-terminal anchor, the interface of D-II and D-III, and the linker.

Introduction of the mutation Glu504Ser in domain III had the most dramatic effect on Ca²⁺-sensitivity. It reduced the [Ca²⁺]0.5 to 129 ± 1 μM, a 47% reduction compared with wild-type m-calpain (that is, an increased sensitivity to Ca²⁺). The effects of this and other mutations on the Ca²⁺-requirement and specific activity of m-calpain are illustrated in Figure 3.26 and summarized in Table 3.8. One mutation was also introduced into a chimeric μ/m-calpain hybrid construct previously produced in the laboratory of Dr. John Elce (unpublished data). This hybrid enzyme consists of the ~550 N-terminal residues of μ-calpain and the ~170 C-terminal residues of m-calpain. Its [Ca²⁺]0.5 (~102 μM) lies between those of the two isoforms. Based on sequence alignments (Figure 1.2), this mutant, Glu515Ser, was designed to mimic the Glu504Ser mutation that had significantly increased the Ca²⁺ sensitivity of m-calpain. Although the effect was not quite as dramatic as in m-calpain, the Glu515Ser mutation clearly reduced the [Ca²⁺]0.5 of the hybrid enzyme, as seen in Figure 3.27 and Table 3.8.

Figure 3.26. Ca²⁺ titrations of m-calpain variants. Data are shown for wild-type m-calpain, Glu504Ser, Lys226Ser, Lys230Ser, Lys234Ser, Lys230Glu and Lys234Glu. The normalized means of duplicate data points are plotted. The lines were drawn by fitting the data to the equation $y = x^n/(k^n + x^n)$, where y is the fraction of maximum activity, k is [Ca²⁺]0.5, n is the Hill constant, and x is [Ca²⁺]. The calculated values of [Ca²⁺]0.5 were 242 ± 6, 129 ± 1, 226 ± 2, 261 ± 14, 183 ± 1, 256 ± 1 and 159 ± 3 μM, respectively.

Figure 3.27. Ca²⁺ titrations of chimeric μ/m-calpains. The chimeric enzyme μI-III-Fsp-mIV has a [Ca²⁺]0.5 (~102 μM) that lies between the typical values for μ-calpain (~25-50 μM) and m-calpain (~250-1000 μM). Introduction of the mutation Glu515Ser increases the Ca²⁺ sensitivity of this hybrid enzyme, as indicated by a reduction in [Ca²⁺]0.5 to ~80 μM.

Table 3.8. Effects of mutations on the Ca²⁺-requirement and specific activity of m-calpain.

Form of calpainᵇ    Region of mutationᵃ    Approximate [Ca²⁺]0.5 (μM)    Specific activity (% of wild-type)
Wild-type           none                   242 ± 6                       100
K226S               D-II/D-III             226 ± 2
K230S               D-II/D-III             261 ± 14
K234S               D-II/D-III             183 ± 1
K230E               D-II/D-III             256 ± 1
K234E               D-II/D-III             159 ± 3
                    Linker                 358 ± 7
                    Linker                 140 ± 5
K10T/E504S          Double                 ~90-110
E504S/E517P         Double                 ~50-70

Several mutants were designeci to disrupt electrostatic interactions between PI1and D- III. The dornain in bold-type represents the domain in which a particular mutated residue resides. For example, Lys226 is in D-II, so the nomenclature is D-IYD-III. ' This enzyme is a chimeric plm-calpain (see text for details) It has been well-established that upon limited exposure to ~a",autolytic cleavage and rmoval of the N-terminal region of the catdytic subunit significantly increases the sensitivity of calpain to ~a",as indicated by a &op in the observed [ca2'los (14-1 8, 88).
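The curve-fitting equation in Figure 3.26 and the percentage quoted for Glu504Ser can be made concrete with a short sketch. This is an illustration only, not the fitting procedure used for the figure: it evaluates the Hill equation, computes the percent drop in [Ca2+]0.5 relative to wild-type, and recovers k and n with the classical Hill-plot linearization (a regression of log10(y/(1 - y)) on log10(x)), which is swapped in here for a nonlinear least-squares fit. The function names and the Hill coefficient n = 2 used for the synthetic round-trip data are hypothetical.

```python
import math


def hill_fraction(ca_um: float, k_half_um: float, n: float) -> float:
    """Fraction of maximum activity: y = x^n / (k^n + x^n)."""
    xn = ca_um ** n
    return xn / (k_half_um ** n + xn)


def percent_reduction(k_wild_type: float, k_mutant: float) -> float:
    """Percent drop in [Ca2+]0.5 of a mutant relative to wild-type."""
    return 100.0 * (k_wild_type - k_mutant) / k_wild_type


def fit_hill_plot(ca_values, fractions):
    """Estimate (k, n) from a Hill plot.

    Linear least squares of log10(y / (1 - y)) against log10(x): the slope
    is n and the intercept is -n * log10(k). Only points with 0 < y < 1
    can be used. This linearization stands in for a nonlinear fit.
    """
    pts = [(math.log10(x), math.log10(y / (1.0 - y)))
           for x, y in zip(ca_values, fractions) if 0.0 < y < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with 0 < y < 1")
    mean_u = sum(u for u, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    n = s_uv / s_uu                  # Hill coefficient (slope)
    intercept = mean_v - n * mean_u  # equals -n * log10(k)
    k = 10.0 ** (-intercept / n)     # recovered [Ca2+]0.5
    return k, n


if __name__ == "__main__":
    # Glu504Ser versus wild-type values from Figure 3.26 (µM)
    print(f"{percent_reduction(242.0, 129.0):.1f}% lower [Ca2+]0.5")  # prints: 46.7% lower [Ca2+]0.5
    # Round trip on noise-free synthetic data; n = 2 is hypothetical
    xs = [50.0, 100.0, 200.0, 400.0, 800.0]
    ys = [hill_fraction(x, 242.0, 2.0) for x in xs]
    print(fit_hill_plot(xs, ys))
```

With real, noisy titration data the linearization overweights points near y = 0 or y = 1, which is why a direct nonlinear fit of the equation in Figure 3.26 is generally preferred.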

Autolytic digestion of Glu504Ser m-calpain, which already has a reduced [Ca2+]0.5 compared to wild-type m-calpain, showed that this mutant is also susceptible to a further increase in Ca2+-sensitivity. After one minute of limited autolysis, both wild-type m-calpain and Glu504Ser required a significantly lower [Ca2+] for activation. This effect was even more pronounced after three minutes of limited autolysis, after which point the autolytic reaction was essentially complete (as determined by the absence of further decreases in the Ca2+-requirement). Interestingly, the [Ca2+]0.5 of the Glu504Ser mutant remained significantly lower than that of wild-type both prior to and after limited autolysis (Figure 3.28). An additional intriguing effect was observed after autolytic digestion: the relative activity of the autolyzed enzyme started to decrease at Ca2+ concentrations higher than ~600 µM (Figure 3.28). This effect is generally not observed in calpain samples that have not been previously autolyzed.

Figure 3.28 Effect of autolysis on Glu504Ser and wild-type m-calpain. Autolysis of the catalytic subunit of calpain is known to cause a reduction in [Ca2+]0.5. This effect was also observed in the Glu504Ser mutant. Increasing the extent of autolysis (from 1 to 3 minutes) produced a more pronounced effect. The following [Ca2+]0.5 values were obtained from this experiment: wild-type m-calpain control with no autolysis (251.4 µM); wild-type after 1 minute of autolysis (72.3 µM); wild-type after 3 minutes of autolysis (67.7 µM); Glu504Ser control with no autolysis (135.8 µM); Glu504Ser after 1 minute of autolysis (54.8 µM); Glu504Ser after 3 minutes of autolysis (47.9 µM).

3.15 X-ray Crystallography of Mutants

To ascertain whether the mutations caused unexpected structural changes that might influence the activity of calpain, the structures of several mutants were determined by X-ray crystallography. Crystals have been grown for the Lys226Ser, Lys230Glu, Lys234Ser, Glu504Ser and Glu515Pro mutants in the absence of Ca2+, under conditions very similar to those for wild-type and Cys105Ser m-calpain. It was not possible to obtain crystals of the linker mutant Glu517Pro under the standard conditions, and sparse-matrix screening did not produce any crystals. To date, the structures of the Lys230Glu, Glu504Ser and Lys10Thr mutants have been solved and refined at ~2.4 to 2.6 Å resolution. The R-factor/R-free values of these mutants ranged from 0.26/0.31 to 0.27/0.32, indicating that the structures were of good quality. The refined structures of these mutants were virtually indistinguishable from wild-type m-calpain (the r.m.s.d. on all Cα atoms ranged from 0.47 Å to 0.49 Å), even in the vicinity of the mutations.

Chapter 4: DISCUSSION

4.1 Crystallization of m-Calpain

Recombinant rat m-calpain was crystallized in two crystal forms using PEG 6000 as the primary precipitating agent. The ability of m-calpain to crystallize was heavily influenced by the purity and age of the protein sample. The oxidation state of the protein was also a critical factor determining whether the enzyme crystallized in space group P21 or P1. P1 crystals tended to grow in "oxidized" samples that had been aged for two to three days, whereas P21 crystals were generally obtained only with freshly prepared protein sample in a reducing environment. Furthermore, addition of DTT to aged protein sample served as an "oxidative switch", since it favored the formation of P21 crystals under conditions otherwise expected to give P1 crystals. The exact mechanism of the oxidative switch is not well understood, since calpain does not have disulfide bonds in the structure of either crystal form. Furthermore, there is no evidence of cysteine residues at crystal contact points, suggesting that intermolecular disulfide bond formation is not responsible for P1 crystal formation.

Crystals of calpain could only be obtained in the absence of Ca2+. Although there is a great need to determine the structure of the Ca2+-bound form, practical limitations (largely the aggregation that accompanies Ca2+-binding) have prevented crystallization with Ca2+. Furthermore, soaking native crystals in solutions containing Ca2+ has been unsuccessful, since the Ca2+-induced conformational change results in immediate crystal cracking.

4.2 Structure Determination

The structure determination of m-calpain was a difficult process that entailed approximately two and a half years of work. Native crystals of both space groups were damaged by X-ray radiation, which prevented collection of high-quality data until cryogenic apparatus was obtained. Standard protocols for freezing crystals in liquid nitrogen or in the nitrogen cold-stream were not successful for calpain, and flash-freezing in propane proved to be the only suitable method. Heavy atom derivatives were extremely difficult to produce, either by soaking or by co-crystallization. Only thimerosal and TEPGC gave any indication of heavy atom binding through isomorphous difference Patterson methods, but they were insufficient for structure determination. Co-crystallization of m-calpain with either thimerosal or TEPGC gave rise to beautiful hexagonal crystals, but these did not diffract X-rays well enough for useful data to be measured.

SeMet-derivatization ultimately proved to be the only viable route for heavy atom derivatization. Even so, with 18 methionine residues present in calpain, deduction of the heavy atom (selenium) positions through conventional difference Patterson methods (particularly in a P1 space group) was not possible. Using direct methods to determine selenium substructures, as was done in this project with SnB (101), has only been practical since ~1998. For reasons we do not fully understand, it was not possible to determine the protein phase angles from MAD data collected at CHESS and BNL (although this may have been due to wavelength instability during data collection). The structure was solved using MAD data collected from both crystal forms at SSRL beamline 1-5. Electron density maps calculated with SHARP (103) in either the P1 or the P21 crystal form (particularly after solvent flattening) were of excellent quality.

4.3 Major Findings from the m-Calpain Crystal Structure

4.3.1 Domain Structure

This structure is the first reported for the heterodimeric form of calpain, and thus has offered the first opportunity to examine the overall assembly of the enzyme as well as the individual domain structures. From the primary sequence, the catalytic subunit of calpain was initially divided into domains I through IV (19), while the regulatory subunit was divided into domains V and VI (31, 32). The crystal structure has now accurately defined the domain boundaries in m-calpain, and clearly shows the need for a small modification to the standard domain nomenclature. Historically, domain-I was considered to constitute residues ~1-80, although the structure shows that the first 19 residues form only a single α-helix, while residues ~20-80 are part of a totally separate domain (see below). According to general principles of protein structure, this lone α-helix should not be considered a "domain". Given its suspected role in the regulation of m-calpain activity (see below), this α-helix has been tentatively named the N-terminal "anchor".

The cysteine protease component of calpain was found to consist of residues ~20-355 and, like other cysteine proteases, is formed from two independently folded domains. Domain-I, so named since it is the first "proper" domain of the large subunit, is the larger of the two protease domains, spanning residues ~20-210. The slightly smaller domain-II, encompassing residues ~210-355, makes up the second "half" of the cysteine protease. Domain-III was identified as an eight-stranded anti-parallel β-sandwich domain, and consists of residues ~355-514. A short peptide of ~15 residues (residues ~515-530) covalently links D-III to D-IV and has been tentatively called the linker (or transducer) to emphasize its unique structural role. Domain-IV contains the EF-hands, and is the most C-terminal domain identified in the catalytic subunit of m-calpain, consisting of residues ~531-700.
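The structure-based domain boundaries above can be captured as a small lookup table, which also reproduces the "D-II/D-III" style of nomenclature used for the mutants in Table 3.8. This is a sketch under stated assumptions: the text gives only approximate ("~") limits, and adjacent domains share endpoints (~210, ~355), so the exact integer cut-points chosen here are hypothetical, as are the names `DOMAINS`, `domain_of` and `interface_label`.

```python
# Approximate m-calpain large-subunit domain boundaries (residue numbers)
# from the structure-based classification described above. The text only
# gives "~" limits with shared endpoints; the exact cut-points here are an
# assumption made for illustration.
DOMAINS = [
    ("anchor", 1, 19),
    ("D-I", 20, 209),
    ("D-II", 210, 354),
    ("D-III", 355, 514),
    ("linker", 515, 530),
    ("D-IV", 531, 700),
]


def domain_of(residue: int) -> str:
    """Return the (approximate) domain containing a residue number."""
    for name, start, end in DOMAINS:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} lies outside the modelled range 1-700")


def interface_label(res_a: int, res_b: int) -> str:
    """Label an interacting residue pair, e.g. Lys234-Glu504 -> 'D-II/D-III'."""
    return f"{domain_of(res_a)}/{domain_of(res_b)}"


if __name__ == "__main__":
    # Catalytic Cys105 and His262 fall in different protease domains,
    # which is why the active site depends on how D-I and D-II are arranged.
    print(domain_of(105), domain_of(262))
    print(interface_label(234, 504))
```

Residues near a boundary (for example 209-211 or 354-356) should not be assigned with this table alone, since the boundaries themselves are approximate.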

Since the protease domain of calpain has historically been referred to as domain II, the proposed structure-based domain classification (Figure 3.13b) differs slightly from that previously suggested (19) (Figure 1.3). However, given that papain-like cysteine proteases are commonly referred to as two-domain proteins (77), and that the N-terminal α-helical anchor does not constitute a true domain, the structure-based classification scheme is both appropriate and justified. No changes to the existing domain nomenclature for the regulatory subunit are proposed, since only the C-terminal D-VI was present in the construct used for structure determination.

4.3.2 Structural Basis for Ca2+-Dependent Activity of Calpain

The fundamental question of why the protease component of calpain is inactive in the absence of Ca2+ has been answered unequivocally by this crystal structure, which was determined in the Ca2+-free, inactive conformation. In the absence of Ca2+, the active site of calpain, encompassing both the catalytic triad and the substrate-binding cleft, is not formed (Figure 3.21). More specifically, D-I and D-II appear to be rotated apart from one another such that the specific geometric requirements for catalysis at the active site are not met. Evidence for this comes from the fact that in every cysteine protease structure determined to date, the residues of the catalytic triad and the general assembly of the substrate-binding cleft are identical (77). The rotation of D-I and D-II apart from each other has several significant consequences. Most importantly, the interatomic distance between the Cys105 Sγ atom and the His262 Nδ1 atom is ~10.5 Å, nearly 7 Å longer than the observed distance in papain and other papain-like proteases (which are, of course, not dependent on Ca2+ for activity). The observed interatomic distance between the corresponding Cys Sγ and His Nδ1 atoms in these proteases is ~3.7 Å. At a distance of ~10.5 Å, the requisite Cys-His ion pair cannot form, and thus the thiolate anion that normally initiates nucleophilic attack on a substrate is not present.
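The Cys-His distance argument rests on a simple geometric measurement between two atoms of the deposited model. The sketch below shows how such a distance can be read from fixed-column PDB ATOM records and compared with an ion-pair cut-off. The coordinates, chain ID and the 4.0 Å cut-off are hypothetical (chosen only so that the separation comes out at ~10.5 Å and so that the ~3.7 Å papain value counts as an ion pair); they are not taken from the calpain or papain coordinate files. The helper names `atom_line`, `parse_atom` and `can_form_ion_pair` are likewise illustrative.

```python
import math

# Hypothetical cut-off for a Cys-His ion pair; papain-like proteases show ~3.7 Å.
ION_PAIR_CUTOFF_A = 4.0


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build the first 54 columns of a PDB ATOM record.

    Assumes a one-letter element (S, N, C, O), so the atom name starts in
    column 14 of the 4-character name field (columns 13-16).
    """
    name_field = f" {name:<3}"
    return (f"ATOM  {serial:5d} {name_field} {res_name:>3} {chain:1}"
            f"{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Read atom name, residue and coordinates from fixed PDB columns."""
    return {
        "name": line[12:16].strip(),
        "res_name": line[17:20],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def can_form_ion_pair(distance_a):
    """True if the Sγ-Nδ1 separation is within the assumed cut-off."""
    return distance_a <= ION_PAIR_CUTOFF_A


if __name__ == "__main__":
    # Hypothetical coordinates chosen only to reproduce a ~10.5 Å separation.
    sg = parse_atom(atom_line(1, "SG", "CYS", "A", 105, 10.0, 20.0, 30.0))
    nd1 = parse_atom(atom_line(2, "ND1", "HIS", "A", 262, 16.3, 28.4, 30.0))
    d = math.dist(sg["xyz"], nd1["xyz"])
    print(f"Cys105 SG - His262 ND1: {d:.1f} Å, ion pair possible: {can_form_ion_pair(d)}")
```

In practice the same two records would be read from the deposited coordinate file rather than constructed by hand; only columns 13-54 of each record are needed for this measurement.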

This mechanism of protease inhibition, or zymogen inactivation, has not been previously observed in a cysteine protease. In all other cysteine proteases that have been characterized, zymogen inactivation proceeds through an N-terminal extension (or pro-segment) of the protease that binds across the active site and prevents access of native substrates (77). In these proteases, the substrate-binding cleft and the catalytic residues are in virtually identical conformations in both the active and the zymogen-inactivated states (82-85). Hence calpain regulation is almost entirely different, since it does not make use of a conventional pro-segment. Given that the active site of calpain is not assembled in the absence of Ca2+, a conventional pro-segment would be redundant, and its existence would therefore be impractical from an evolutionary standpoint.

4.3.3 The N-Terminal Anchor: A Very Unusual Pro-Segment

As mentioned above, the traditional role of the N-terminal extension (or pro-segment) in cysteine proteases is to inhibit enzyme activity by binding to the active site cleft and preventing substrates from binding. Indeed, this is a common mechanism of zymogen regulation found in most protease families (77). Although it is clear from the crystal structure that the N-terminus of calpain is not located in the active site, the anchor shares several characteristics of more conventional pro-segments. First, as in other proteases (77), the N-terminal anchor of calpain is autolytically cleaved upon enzyme activation, although in calpain this cleavage is not a requirement for activation (87). Given that the anchor does not occupy the active site of calpain, this discrepancy is easily understood. Furthermore, autolytic removal of this segment does not directly result in calpain activation, as it does in all other proteases, since Ca2+ is still required for activity (88).

Second, the anchor appears to have an important role in the folding pathway of calpain, a function that has been well documented for the pro-segments of several proteases (127). Traditionally it has been assumed that the regulatory subunit is important for proper folding of calpain, since the catalytic subunit forms inclusion bodies when expressed in its absence (128). Furthermore, it has been shown that the regulatory subunit has chaperone-like effects on the refolding of denatured catalytic subunit (56).

However, recent experiments have shown that the N-terminus of the catalytic subunit is also important in the process of enzyme assembly, since expression of constructs lacking the N-terminal residues yielded calpains that lacked enzymatic activity, presumably due to misfolding (88). Given these data, and the fact that the crystal structure shows an exclusive interaction between the anchor and D-VI, it seems likely that one function of the anchor and the regulatory subunit is to act as co-chaperones, promoting productive folding of the catalytic subunit.

Finally, while the anchor does not directly inhibit the catalytic center of calpain, it does appear to provide a truly unique mechanism of protease inhibition by inhibiting the assembly of the active site, as will be discussed in more detail below.

4.3.4 Structure and Arrangement of Domain-III

The function of D-III has remained unclear, partly owing to its lack of sequence similarity with any known protein. The crystal structure has revealed that D-III is an eight-stranded anti-parallel β-sandwich that is centrally located within the structure and interacts with each domain of the enzyme (Figure 3.17). 3D homology searching revealed that D-III of m-calpain is structurally similar to a C2 domain. C2 domains have been identified in numerous proteins, particularly those regulated by Ca2+, and are often described as Ca2+-dependent phospholipid-binding domains (129). Hence, proteins containing a C2 domain are often involved in signal transduction pathways or membrane trafficking. In these Ca2+-dependent C2 domains, several loops at one end of the β-sandwich contribute numerous acidic residues that form a binding cradle for Ca2+ ions (129, 130). Although D-III of calpain has a different topology, the overall dimensions of the β-sandwich are very similar, and calpain also possesses a highly acidic loop region, formed from a stretch of 10 negatively charged amino acids (residues 392 to 400) and Glu504. This negatively charged loop region in D-III lies in the region spatially analogous to the Ca2+-binding loops of the C2 domain, and could potentially serve as a site for Ca2+-binding in calpain (Figure 4.1).

Figure 4.1. Domain-III of calpain resembles a C2 domain. A typical C2 domain exists as an anti-parallel β-sandwich with several acidic residues at one end that form a binding cradle for Ca2+. The first C2 domain from synaptotagmin (cyan, PDB accession code 1RSY) (130) and D-III (green) have roughly the same overall dimensions, though slightly differing topologies. Numerous acidic residues (red) result in a highly negative potential, which is partially stabilized by adjacent basic residues (blue). The finding that D-III of calpain resembles a C2 domain has implications for Ca2+-binding and for the mechanism of calpain translocation to the membrane in vivo.

The finding that D-III resembles a C2 domain is particularly attractive since calpain activation in vivo is thought to depend on Ca2+-dependent translocation to the plasma membrane (15, 59-60). Additionally, it has been demonstrated in vitro that the Ca2+-sensitivity of calpain is affected by certain phospholipids (61). Further support for this putative role of D-III comes from the primary sequence of Tra-3, a calpain homologue found in C. elegans, which indicates that a C2 domain exists in its C-terminal region (74). Finally, preliminary evidence reported very recently indicates that recombinantly expressed D-III (whose boundaries were defined based on the crystal structure) does indeed bind both Ca2+ and phospholipid vesicles (131).

4.3.5 Structural Comparison of Calpain and Papain-Like Protease Domains

Compared to other members of the papain-like cysteine protease family, the calpain structure is much more complex (Figure 4.2), possessing "extra" structural features that confer additional levels of regulation, such as Ca2+-dependence. The protease module of calpain is also quite distinct from other cysteine proteases. One notable structural difference is the greater size and lack of similarity of D-I compared to the corresponding domain in typical cysteine proteases. This domain (D-I, ~20 kDa) is nearly as large as the entire papain molecule and is similar to papain only in the regions defining the active site and substrate-binding cleft. D-II, while possessing a fold similar to that of papain, is also larger and appears to be more flexible, at least in the inactive conformation.

Figure 4.2. Comparison of calpain to conventional cysteine proteases. As a cysteine protease, calpain clearly has a more complex structure and regulation than related proteins such as papain (red) and cathepsin B (green). The analogous cysteine-protease region in calpain (dark blue and cyan) is accompanied by a host of other structural features that add a significant degree of complexity to the regulatory mechanism. (Colors of calpain domains as in Figure 3.13.)

In order to draw comparisons between the active site of calpain and those of other cysteine proteases, we have generated a model structure of the active form of calpain using the structure of papain as a guide (Figure 3.22). Since the active site conformation in known cysteine protease structures is highly conserved, the modeled structure is likely a relatively accurate representation of Ca2+-activated calpain, at least in the immediate vicinity of the active site.

Despite a similar catalytic mechanism (Figure 1.4), the substrate specificity of calpain differs from that of conventional papain-like proteases. Calpain shows a weak preference for Leu in the P2 position, but does not otherwise possess a strong sequence specificity for substrate cleavage, and cleaves predominantly at exposed peptide regions between domains in its substrates (14-18). Comparison of the active site and substrate-binding cleft of conventional cysteine proteases and their inhibitor-bound complexes with the modeled active conformation of calpain has revealed some features that may contribute to the observed differences in substrate specificity. The protease residues that interact with the substrate backbone are partially conserved, including a Gly-Gly repeat commonly found in cysteine proteases (residues 197 and 198 in calpain, residues 65 and 66 in papain) that stabilizes S2-P2 backbone interactions through hydrogen bonding (132). In calpain, however, interactions with substrate side chains may offer less specificity, because most of the subsite-forming side chains in calpain are considerably smaller than those in papain or the cathepsins (132). For example, well-defined S2 and S3 substrate-binding residues in papain include Asp158, Tyr67, Pro68 and Val133, while the corresponding residues in calpain are Gly261, Ala199, Thr200 and Gly239. The substrate-binding cleft in calpain therefore appears to be wider and less strict in its binding specificity than in most other thiol proteases (Figure 4.3).
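The claim that the calpain subsite side chains are "considerably smaller" can be made semi-quantitative with a crude proxy: the number of side-chain heavy atoms (atoms beyond the backbone N, Cα, C and O) in each residue pair listed above. This is not an analysis performed in this work; heavy-atom counts ignore shape, charge and packing, so the sketch only illustrates the direction of the difference. The names `SIDE_CHAIN_HEAVY_ATOMS`, `SUBSITE_PAIRS` and `subsite_bulk` are illustrative.

```python
# Heavy atoms in each side chain (atoms beyond backbone N, CA, C, O).
SIDE_CHAIN_HEAVY_ATOMS = {
    "Gly": 0, "Ala": 1, "Thr": 3, "Val": 3, "Pro": 3, "Asp": 4, "Tyr": 8,
}

# S2/S3 subsite residues named in the text: (papain, calpain counterpart)
SUBSITE_PAIRS = [
    ("Asp158", "Gly261"),
    ("Tyr67", "Ala199"),
    ("Pro68", "Thr200"),
    ("Val133", "Gly239"),
]


def side_chain_atoms(label):
    """Heavy-atom count for a label such as 'Tyr67' (first three letters)."""
    return SIDE_CHAIN_HEAVY_ATOMS[label[:3]]


def subsite_bulk(labels):
    """Total side-chain heavy atoms over a set of subsite residues."""
    return sum(side_chain_atoms(label) for label in labels)


papain_bulk = subsite_bulk(p for p, _ in SUBSITE_PAIRS)    # 4 + 8 + 3 + 3
calpain_bulk = subsite_bulk(c for _, c in SUBSITE_PAIRS)   # 0 + 1 + 3 + 0

if __name__ == "__main__":
    for papain_res, calpain_res in SUBSITE_PAIRS:
        print(papain_res, side_chain_atoms(papain_res), "->",
              calpain_res, side_chain_atoms(calpain_res))
    print("total:", papain_bulk, "vs", calpain_bulk)
```

By this measure the four papain residues contribute 18 side-chain heavy atoms against 4 for their calpain counterparts, consistent with a wider, less restrictive cleft.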

Another striking difference between the active sites of calpain and papain is their relative electrostatic charge. As seen in Figure 4.3, the calpain active site region (including the substrate-binding cleft) is significantly more acidic than that of papain, a feature that could have significant biological consequences. Molecular modeling and design of a putative substrate/inhibitor peptide indicated a preference for basic residues in the P1' and P2' positions, which could help to stabilize substrate binding. Although it has not been conclusively established whether such a preference for basic residues exists in vivo, these results may offer insights for future analysis of substrate binding and inhibitor design. Given that the active site of calpain is so acidic, the question arises whether counter-ions such as Na+, Mg2+, or even Ca2+ might bind to this region and affect the stability or activity of the enzyme.

Figure 4.3. Comparison of the active sites of calpain and papain. The observation that calpain has a vastly different substrate specificity from papain can be readily visualized upon inspection of their active sites. GRASP (120) electrostatic surface representations of the active sites of a) the modeled active form of calpain and b) papain show significant differences that may explain their differing specificities. The active site of calpain is clearly much wider and much more acidic, as indicated by the predominantly red color. Papain, on the other hand, has a relatively compact and hydrophobic substrate cleft. (Yellow arrows indicate the position of the substrate-binding cleft.)

4.4 Structural Features Contributing to Calpain Inactivation in the Absence of Ca2+

As mentioned above, calpain is a Ca2+-dependent protease because, in the absence of Ca2+, the active site is not assembled. Compared to the known active site arrangement of papain, D-I and D-II are rotated apart from each other, which "pulls apart" the active site and inactivates the protease. It is apparent from the structure that specific structural elements maintain this inactive conformation by restricting the mobility of both D-I and D-II.

4.4.1 Anchor-Regulatory Subunit Interactions Restrict Protease Domain-I

The N-terminal anchor and the regulatory subunit appear to have key roles in the inactivation mechanism by restricting the conformational freedom of protease D-I.

Direct experimental evidence in support of this suggestion was provided by studies showing that autolytic cleavage at Ala9-Lys10 in the anchor significantly reduced the [Ca2+] required for activation (88). Moreover, dissociation of the regulatory subunit from calpain was shown to cause a virtually identical reduction in the Ca2+-requirement (55).

Our mutagenesis studies have lent support to these findings, as disruption of a key "restraining" interaction between the anchor and D-I produced an effect similar to (although not quite as dramatic as) that of autolysis and subunit dissociation. Substitution of Thr for Lys10, which normally interacts with Glu148 in D-I (Figure 4.4), reduced the [Ca2+]0.5 from ~240 µM to ~170 µM (Table 3.8). The importance of this interaction is stressed by the absolute conservation of Lys10 and Glu148 in all known m-calpain sequences. The crystal structure of the Lys10Thr mutant was determined by X-ray crystallography and showed that the interaction with Glu148 had been disrupted, but otherwise no structural changes were observed.
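The statement that the mutant structures show no change apart from the mutated side chain rests on comparisons such as the Cα r.m.s.d. values (0.47-0.49 Å) reported for the refined mutant structures. A minimal sketch of that calculation follows. It assumes the two models have already been superposed and that the coordinate lists are matched atom-for-atom (for example, the Cα of each residue in both models); the superposition step itself (for example, a Kabsch fit) is deliberately omitted, and the function name `rmsd` is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z).

    Assumes the two models are already superposed and that coords_a[i]
    and coords_b[i] are the same atom (e.g. the Cα of residue i).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy three-atom example: every atom shifted by 0.48 Å along y.
    wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    mutant = [(x, y + 0.48, z) for x, y, z in wild_type]
    print(f"{rmsd(wild_type, mutant):.2f} Å")
```

Because the value is averaged over all matched atoms, a local rearrangement confined to a single side chain (such as the lost Lys10-Glu148 contact) barely changes a Cα r.m.s.d., which is why the text also inspects the vicinity of each mutation directly.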

An obvious structural interpretation of these results is that disruption of the interaction between the anchor and D-VI is responsible for the observed reduction in the Ca2+-requirement. Proteolytic removal of the anchor is not necessary for this effect, since both the subunit dissociation (55) and mutagenesis studies have shown that the Ca2+-requirement can be influenced without autolysis. In this capacity, the N-terminal anchor has a unique pro-segment-like role, namely, to inhibit calpain activity by restricting the mobility of D-I and elevating the concentration of Ca2+ required for activation.

Figure 4.4 The anchor restricts the mobility of D-I. In the absence of Ca2+, the anchor (red) serves to inhibit the assembly of the active site by interacting with the regulatory subunit (orange) and restricting the conformational freedom of D-I (blue). Alleviation of this inhibition, either by autolytic removal of the anchor, regulatory subunit dissociation, or site-directed mutagenesis, results in a reduction in the amount of Ca2+ required for activation. In this study, Lys10 (red) in the anchor was mutated to Thr to disrupt its interaction with Glu148 (blue) in D-I. This mutation reduced the [Ca2+]0.5 from ~240 µM to ~170 µM. See text for additional details. The dotted line in this stereodiagram indicates the interaction between the side chains of Lys10 and Glu148.

4.4.2 Domain-III Restricts Protease Domain-II

The fact that calpain activity remains Ca2+-dependent following autolysis of the anchor (88) indicates that "releasing" the conformational restrictions on D-I alone is not sufficient to promote formation of the active site. Assuming that D-I is free to move upon disruption of anchor-regulatory subunit interactions, it follows that movement of D-II towards D-I is also required for protease activation.

As highlighted in Figure 3.17, D-III occupies a central position in the calpain structure, and is therefore optimally situated to act as a "control center", influencing aspects of domain assembly and perhaps Ca2+-induced conformational changes. With respect to exerting an inhibitory role on active site formation, D-III makes a set of extensive electrostatic interactions with D-II that could effectively "hold" D-II apart from D-I (Figure 4.5). At this interface, there is a series of electrostatic interactions involving a remarkably acidic loop region composed of Glu392-Asp400 and Glu504 in domain III, and three basic residues, Lys226, Lys230 and Lys234, on a three-turn α-helix in domain II. All these residues are highly conserved within the known m-calpain sequences. It therefore seemed attractive to propose that this salt-bridge region exerts a conformational constraint on the mobility of D-II and thus on the assembly of the active site. To test this hypothesis, we conducted a series of mutagenesis experiments designed to disrupt these interactions between D-II and D-III (Figure 3.26, Table 3.8).

The most prominent interdomain interaction in this region exists between Lys234 and Glu504. The Lys234Ser and Glu504Ser mutations both influenced calpain activation, as indicated by their dramatic effects on the Ca2+-requirement of the enzyme. The increase in Ca2+-sensitivity caused by the Glu504Ser mutation was greater than that caused by the Lys234Ser mutation, suggesting that Glu504 may make electrostatic contacts to residues other than Lys234. As with the Lys10Thr mutant, the crystal structure of Glu504Ser is virtually identical to that of the wild-type enzyme in the inactive state, demonstrating that the reduction in Ca2+-requirement is not due to disruption of the inactive conformation, but must involve the facilitation of domain movement during Ca2+ activation. Another interesting result was observed with the Lys234Glu mutation, expected to exert a repulsive interaction with Glu504, which reduced the Ca2+-requirement of the enzyme even further than the Lys234Ser mutation. This mutation also reduced the specific activity of the enzyme, perhaps due to a negative influence on the stability of this electrostatically sensitive region.

Surprisingly, mutagenesis showed that the electrostatic interactions of Lys226 and Lys230 with the acidic Glu392-Asp400 loop do not appear to be critical to the activation of calpain by Ca2+, since the Lys226Ser and Lys230Ser mutations did not affect either the Ca2+-requirement of the enzyme or its specific activity. The Lys230Glu mutation, which was expected to introduce a strong repulsion between this position and the acidic loop, did not affect the Ca2+-requirement of the enzyme. It did, however, greatly reduce the specific activity of the enzyme, similar to the Lys234Glu mutant. The crystal structure of Lys230Glu in the absence of Ca2+ was identical to that of wild-type m-calpain, showing that the mutation did not affect the domain II-domain III geometry in the resting state of the enzyme. In the absence of a Ca2+-bound structure, it remains unresolved how mutation of Lys230 to Glu, but not to Ser, affects the specific activity of the enzyme, particularly since the structure is identical to that of wild-type m-calpain in the absence of Ca2+.

While these studies were performed on m-calpain, it was appealing to suggest that a similar mechanism of "electrostatic restraint" exists in µ-calpain, given the high degree of similarity of the two isoforms. Unfortunately, difficulties in recombinant expression and purification of µ-calpain have largely prevented in-depth structure-function studies on this isoform. However, a hybrid enzyme consisting predominantly of µ-calpain has been expressed, purified and characterized in the laboratory of our collaborator Dr. John Elce (unpublished results). This enzyme has a Ca2+-dependence that more closely resembles that of µ-calpain, with a [Ca2+]0.5 of ~100-110 µM. Since the Glu504Ser mutation was observed to have the most significant effect on the Ca2+-requirement of m-calpain, the corresponding mutation (Glu515Ser) was subsequently introduced into the hybrid enzyme. Analysis of the Ca2+-response of this mutant showed a similar effect, namely a reduction in [Ca2+]0.5 from ~102 µM to ~80 µM (Figure 3.27, Table 3.8). These results suggest that the inter-domain electrostatic interactions between D-III and D-II in both m- and µ-calpain inhibit active site assembly by restricting the mobility of D-II, and that this restraint is a fundamental inhibitory mechanism in both of the ubiquitous calpain isoforms.

Figure 4.5 D-III restricts the mobility of D-II. a) Overall view of the m-calpain structure in the absence of Ca2+ (colors as in Figure 3.13). The region contained within the black circle is enlarged for clarity in (b). b) Conserved electrostatic interactions between D-II and D-III inhibit active site formation in m-calpain by restricting the mobility of D-II. Lysine residues (dark blue) Lys226, Lys230 and Lys234 on a domain II α-helix form several inter-domain salt bridges with Glu504 and several other acidic residues (red) on one loop (residues 392-400) of domain III. Dotted lines indicate possible interactions with Glu504. In this study, Lys226, Lys230, Lys234 and Glu504 were mutated to determine the effects that disruption of these inter-domain interactions might have on the activation of m-calpain by Ca2+. The interaction between Lys234 and Glu504 is the most critical, as mutating either of these residues resulted in an enzyme that was active at a significantly lower [Ca2+]. This particular interaction is likely conserved in µ-calpain (the corresponding residues are Arg244 and Glu515), and mutagenesis of Glu515 to Ser in a hybrid µ/m-calpain resulted in a similar reduction in the Ca2+-requirement.

4.4.3 The Linker

The above findings indicate that disruption of specific "restraining interactions" through autolysis, subunit dissociation or mutagenesis results in a significant decrease in the amount of Ca2+ required for activation. While these experiments can be interpreted as affecting the mobility of either D-I or D-II, the question was raised whether additional "restraints" located elsewhere in the molecule might affect active site assembly indirectly, perhaps through D-III. An ideal candidate for such an effect was suggested by the unique structure of the linker, since it directly connects D-III to EF1 of domain IV, the only region in the EF-hand domain suspected to undergo a conformational change upon Ca2+-binding (Figure 1.8) (66). The linker is nearly devoid of secondary structure, with the exception of a three-residue β-strand (residues 516-518) that forms a short anti-parallel β-sheet with a three-residue β-strand in D-IV (residues 636-638) (Figure 3.20). It was suspected that the linker might modulate the function of D-III in response to Ca2+-binding in D-IV. To test this hypothesis, two mutants were designed in the linker.

Glu517, which occupies the central position in the β-strand, was mutated to Pro specifically to disrupt the short β-sheet interaction, and perhaps to emulate the effect of a conformational change at EF1 due to Ca2+-binding in D-IV. The mutant Glu515Pro was made essentially to serve as a control, since Glu515 is outside of the β-strand and thus was not expected to exert such a dramatic influence. As seen in Table 3.8, the Glu515Pro mutant had essentially no effect on the enzyme, while mutation of Glu517 to Pro reduced the [Ca2+]0.5 from ~240 µM to ~140 µM, nearly a two-fold reduction in the Ca2+-requirement. This mutation also reduced the specific activity to ~10% of that of wild-type m-calpain, indicative of its important structural role. Consistent with these findings, crystallization experiments showed that Glu515Pro crystallized under wild-type conditions whereas Glu517Pro did not. It is difficult to interpret precisely how disruption of the short β-sheet in the linker affects the Ca2+-requirement of the enzyme. However, it is not difficult to envision that a Ca2+-induced conformational change in EF1 of D-IV could disrupt the β-segment, which, given its proximity to D-III, could affect active site assembly indirectly through subsequent changes in D-III. This hypothesis seems reasonable since it is in general agreement with the experiments mentioned above, namely that direct disruption of conformational restraints on D-I and D-II is reflected in a reduction of the Ca2+-requirement.
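The size of the Glu517Pro effect can be checked directly from the approximate values quoted above (wild type ~240 µM, Glu517Pro ~140 µM). The following is a minimal Python sketch; the function names `fold_reduction` and `percent_decrease` are hypothetical, and the inputs are the rounded values as quoted in the text, so the outputs are only as precise as those approximations.

```python
def fold_reduction(wt_half_max_uM: float, mutant_half_max_uM: float) -> float:
    """Return how many times less Ca2+ the mutant needs for half-maximal activity."""
    if wt_half_max_uM <= 0 or mutant_half_max_uM <= 0:
        raise ValueError("[Ca2+]0.5 values must be positive")
    return wt_half_max_uM / mutant_half_max_uM


def percent_decrease(wt_half_max_uM: float, mutant_half_max_uM: float) -> float:
    """Return the drop in [Ca2+]0.5 as a percentage of the wild-type value."""
    return 100.0 * (wt_half_max_uM - mutant_half_max_uM) / wt_half_max_uM


# Approximate [Ca2+]0.5 values quoted in the text (µM).
WILD_TYPE = 240.0
GLU517PRO = 140.0

fold = fold_reduction(WILD_TYPE, GLU517PRO)
drop = percent_decrease(WILD_TYPE, GLU517PRO)
print(f"Glu517Pro: {fold:.2f}-fold lower [Ca2+]0.5 ({drop:.0f}% decrease)")
```

By this measure the effect is about 1.7-fold (a ~42% decrease), which the text rounds to "nearly two-fold".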

4.5.4 The Effects of Disrupting Multiple "Mobility Restraints"

To test whether each of the three identified inhibitory regions acts independently or collectively, three experiments involving either double mutants or limited autolysis were designed to disrupt multiple interactions in combination. The Glu504Ser mutant, which disrupts interactions between D-III and D-II, was the common element in these studies, since it had the greatest effect on the Ca2+-requirement and did not affect the specific activity of the enzyme. With a [Ca2+]0.5 of ~130 µM, this mutant also served as the baseline for data interpretation. It was hypothesized that if the effects of active site restraint were additive, then disruption of restraints in either the N-terminal anchor or the linker in the Glu504Ser mutant would result in a further lowering of the [Ca2+]0.5.

Introduction of the double mutation Lys10Thr/Glu504Ser did indeed result in a further reduction of the Ca2+-requirement of the enzyme, as the [Ca2+]0.5 fell to ~100 µM.

Similarly, after limited autolytic digestion (which is known to reduce the Ca2+-requirement of calpain), the [Ca2+]0.5 of the Glu504Ser mutant (~48 µM) remained significantly lower than that of similarly autolysed wild-type m-calpain (~68 µM) (Figure 3.28). It can be concluded from these two observations that the mechanisms that inhibit the mobility of D-I and D-II are distinct. In other words, removal or reduction of the N-terminal anchor restraints on D-I (by autolysis or by Lys10 mutation, respectively), combined with disruption of the electrostatic interactions that restrain D-II (by Glu504 mutation), produces additive effects.
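The argument in this subsection rests on a simple criterion: if two restraints act independently, removing both should lower [Ca2+]0.5 below the value obtained by removing either one alone. The Python sketch below encodes that criterion with the approximate values quoted above. The function and variable names are hypothetical, and "lower" is taken as strictly lower without any allowance for experimental error, which is not reported in this section.

```python
def further_lowered(baseline_uM: float, combined_uM: float) -> bool:
    """True if adding a second disruption to the baseline lowers [Ca2+]0.5 further."""
    return combined_uM < baseline_uM


# Approximate [Ca2+]0.5 values (µM) quoted in this subsection.
comparisons = {
    # comparison: (baseline value, value with the additional disruption)
    "Glu504Ser -> Lys10Thr/Glu504Ser": (130.0, 100.0),
    "autolysed wild type -> autolysed Glu504Ser": (68.0, 48.0),
}

for name, (baseline, combined) in comparisons.items():
    verdict = "further lowered" if further_lowered(baseline, combined) else "not lowered"
    print(f"{name}: {baseline:.0f} -> {combined:.0f} uM, {verdict}")
```

Both comparisons satisfy the criterion, which is the sense in which the effects are described as additive.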

A similar effect was observed in the mutant Glu504Ser/Glu517Pro, which was expected to reduce the restraining capacity of both the linker and the D-II/D-III interactions. A titration of the Ca2+-requirement of this mutant revealed a very dramatic reduction in the [Ca2+]0.5 to ~60 µM. This value is significantly lower than that observed for either of the single mutations Glu504Ser or Glu517Pro, and represents an approximately four-fold lowering of the [Ca2+]0.5 compared to wild-type m-calpain. The Ca2+-sensitivity of this mutant more closely resembles that of µ-calpain than that of m-calpain, and provides additional evidence that each of these inhibitory regions acts independently.
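The [Ca2+]0.5 values in this section are read from Ca2+ titrations, in which activity is measured across a range of Ca2+ concentrations. The procedure used to extract [Ca2+]0.5 is not restated here, so the following is only a minimal Python sketch of one common approach, under stated assumptions: activity is normalized to its maximum, and the half-maximal point is located by linear interpolation in log[Ca2+] between the two measurements that bracket 50%. The function name and the example titration are hypothetical and are not data from this thesis.

```python
import math
from typing import Sequence


def half_max_concentration(ca_uM: Sequence[float], activity: Sequence[float]) -> float:
    """Estimate [Ca2+]0.5 by log-linear interpolation of normalized activity.

    ca_uM must be positive and strictly increasing; activity is any
    non-negative signal (e.g. initial rate) measured at each concentration.
    """
    if len(ca_uM) != len(activity) or len(ca_uM) < 2:
        raise ValueError("need matching concentration and activity lists")
    top = max(activity)
    if top <= 0:
        raise ValueError("no activity detected")
    frac = [a / top for a in activity]  # normalize to the maximal activity
    for i in range(1, len(frac)):
        lo, hi = frac[i - 1], frac[i]
        if lo < 0.5 <= hi:
            # Fraction of the way from point i-1 to point i where 50% is reached.
            t = (0.5 - lo) / (hi - lo)
            log_lo, log_hi = math.log10(ca_uM[i - 1]), math.log10(ca_uM[i])
            return 10 ** (log_lo + t * (log_hi - log_lo))
    raise ValueError("titration does not cross half-maximal activity")


# Synthetic, illustrative titration (not data from this thesis).
example = half_max_concentration([10.0, 100.0, 1000.0], [0.0, 50.0, 100.0])
print(f"[Ca2+]0.5 ~ {example:.0f} uM")  # 100
```

Interpolating in log concentration suits titrations spaced on a logarithmic scale; fitting a full dose-response curve would be a more rigorous alternative when enough points are available.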

It is interesting to note that a very unusual effect was observed with the Glu504Ser/Glu517Pro mutant. Attempts to purify the intact heterodimeric calpain were unsuccessful, and in each case only the catalytic (80 kDa) subunit could be recovered (Figure 3.25). The observed dissociation of D-VI was highly unexpected, but was reproduced on four separate occasions. It was readily apparent in these mutant samples that the isolated catalytic subunit was quite unstable, since it tended to aggregate and precipitate from solution. This resulted in an enzyme that had weak specific activity (less than 1% of that of wild-type m-calpain) but still retained a clear, albeit much reduced, dependence on Ca2+. Subunit dissociation has been previously reported in calpain (55, 56) and has been observed in our laboratory (unpublished results), but in both cases dissociation was found to occur only in the presence of Ca2+. The question therefore arises how the introduction of two point mutations in an enzyme can result in a phenomenon that is supposedly Ca2+-dependent. One interpretation of the reduction in the Ca2+-requirement in these studies is that the role of Ca2+ has essentially been circumvented (at least partially) by the particular modification. Along this line of thinking, it could be suggested that by disrupting both the electrostatic interactions between D-III and D-II and the β-sheet interaction in the linker, calpain has undergone a "Ca2+-like" conformational change, but in the absence of Ca2+. Given that neither Glu504 nor Glu517 resides near the heterodimer interface, and neither could directly influence subunit dissociation, an alternative interpretation is not immediately obvious.

4.6 Proposed Regulatory Mechanism of Calpain by Ca2+

The crystal structure of m-calpain determined in the absence of Ca2+ has offered several novel insights into the regulation of calpain activity. The most conclusive findings relate specifically to the mode of zymogen inactivation, and illustrate that calpain is regulated by a mechanism significantly more complex than that of most proteases, particularly the papain-like enzymes. The fundamental question of why the proteolytic activity of calpain is Ca2+-dependent has been addressed by the finding that in the absence of Ca2+, D-I and D-II, which together constitute the protease module of calpain, are rotated apart from each other. As a consequence, the catalytic triad and the substrate-binding cleft are not intact, and substrate cleavage cannot occur.

Upon further analysis, it appeared as though both D-I and D-II were physically "held" apart through interactions with specific regions that effectively inhibit their mobility. A series of biochemical studies designed to test this hypothesis was performed, and these revealed considerable insights into the mechanism of Ca2+-induced calpain activation. In accordance with the current literature, and based on the crystal structure and our experimental findings, the following comprehensive mechanism describing the structural and functional aspects of calpain regulation by Ca2+ is proposed.

As indicated by the structure, the regulatory subunit facilitates the formation of the calpain heterodimer through interactions with the N-terminal anchor, as well as through the D-IV/D-VI interactions. The assembled heterodimer remains inactive in the absence of Ca2+ owing to a network of conformational restraints that effectively holds the active site apart. Upon exposure to sufficiently high levels of Ca2+, calpain is activated by a complex mechanism likely to involve several key steps. Intuitively, the first step in the activation pathway is likely to involve Ca2+-binding to the EF-hand domains.

The crucial role of the EF-hands in the activation process has been clearly demonstrated by recent mutagenesis studies (50). Disruption of the EF-hands in either D-IV or D-VI severely hampers the activation pathway, as the concentration of Ca2+ required for activation increases dramatically. Given that the affinities of the EF-hands in D-IV and D-VI are comparable (50), Ca2+ should not have a preference for binding to either domain.

If the conformational change observed upon Ca2+-binding in the D-VI homodimer (66, 67) can be extended to the heterodimeric calpain, Ca2+-binding to the EF-hands would likely have two effects. First, in D-IV, the movement of EF1 is likely to disrupt the short anti-parallel β-sheet that exists between the adjacent linker and D-IV. Second, Ca2+-binding to D-VI causes a small change in the hydrophobic pocket in which the N-terminal anchor is bound. As seen in Figure 4.6, this small change would cause a steric clash that should result in "release" of the anchor. Thus, binding of Ca2+ to the EF-hand domains should disrupt two of the three "inhibitory restraints" identified by the structure-based mutagenesis studies, resulting in a freely mobile D-I and a less tightly restrained D-III. The third inhibitory region identified by mutagenesis exists at the interface of D-II and D-III, and it is difficult to imagine how Ca2+-binding at the EF-hand domains could affect this region, particularly since the suspected conformational change is restricted to a small movement around EF1. Recalling the resemblance of D-III to C2 domains, a much more plausible mechanism would be Ca2+-binding to the highly acidic loop region in D-III. Coordination of Ca2+ in this region by the side chains of these acidic residues could break most or even all of the electrostatic interactions with the basic residues in D-II. Ca2+-binding at this loop region could also be influenced by phospholipids, a suggestion supported both by the known function of C2 domains (129, 130) and by the fact that calpain activation in vitro and in vivo is affected by membranes (15, 59-62). However, phospholipid-binding is not a requirement for calpain activation, and binding of Ca2+ alone to this region should be sufficient to release D-II from the restraining effect of D-III.

Figure 4.6. Ca2+-induced conformational changes in D-VI should release the N-terminal anchor. The crystal structures of a) Ca2+-free and b) Ca2+-bound D-VI in the homodimer structure (66) were overlapped onto D-VI as it exists in the heterodimeric m-calpain. Both of these forms are shown as GRASP (120) electrostatic surfaces, and their putative interactions with the N-terminal anchor (rods) are shown. The conformational changes observed upon Ca2+-binding show that a significant steric clash (indicated in the yellow box) would be introduced with the side chain of conserved Il& in the N-terminal anchor. This clash should cause the release of the N-terminal anchor from this pocket.

Following the events mentioned above, it is proposed that the inhibitory mechanisms of "active site restraint" would be lifted, and consequently D-I and D-II would gain the mobility needed to rotate together to form the active site. However, one major steric barrier to active site formation would remain under these circumstances: the aromatic side chain of the conserved Trp288 in D-II, which would clash with D-I upon formation of the active site. In all characterized cysteine proteases, the side chain of this conserved Trp residue performs an important role, shielding the Asn-His hydrogen bond of the catalytic triad from the solvent and thereby promoting efficient catalysis (48). In the absence of Ca2+, it appears that Trp288 is forced into a non-papain-like conformation by Pro287 (a position occupied by a conserved Ser in other cysteine proteases), so that Trp288 may act as a "wedge" that helps to prevent the cleft between D-I and D-II from closing. In the presence of Ca2+, Trp288 must rotate to the conformation normally seen in other thiol proteases in order to allow active site formation. The mechanism of such a rotation has not been ascertained from these studies, but given the strongly negative charge of the active site region, it is attractive to speculate that Ca2+ might also cause this conformational change. Rotation of Trp288 out of the active site cleft should eliminate the final barrier to active site formation, so that D-I and D-II can rotate towards each other, assembling the substrate-binding cleft and the catalytic triad (Figure 4.7).

Figure 4.7. Proposed activation mechanism of calpain by Ca2+ (colors as in Figure 3.13). Ca2+ is likely to activate calpain through a complex mechanism involving several steps. Ca2+-binding initially to the EF-hand domains and subsequently at the C2-like D-III should relieve all the restraints that hold D-I and D-II apart. Prior to activation, Trp288 resides in the active site cleft (brown) and must rotate outwards into the orientation normally observed in cysteine proteases (red). Finally, D-I and D-II rotate towards each other, activating the protease through formation of the functional substrate-binding cleft and catalytic triad.

[Figure 4.7 panel labels: 2a. Rel. Linker; N-terminal; 3. Ca2+ binds to C2-like D-III; 4. Trp288 rotates out of active site cleft; 5b. D-I moves back towards D-II]

The complexity of calpain regulation is not restricted to its intramolecular activation mechanism. In vivo, a host of factors can potentially regulate the function of this enzyme, including autolysis, subunit dissociation, membrane association, activator proteins and calpastatin (14-18). Even the issue of calpain activation by Ca2+ in vivo is puzzling, since both µ- and m-calpain have in vitro requirements for Ca2+ that are orders of magnitude higher than typical Ca2+ concentrations found in the cell (14-18). While not necessarily addressing the physiological significance of this apparent paradox, the mutagenesis studies reported here suggest that the high Ca2+-requirement of calpain may be a by-product (or a necessity) of calpain's unique zymogen inactivation mechanism. The mutated residues in this study that caused a reduction in the Ca2+-requirement were all highly conserved, again reinforcing the suggestion that the intrinsically high Ca2+-requirement of calpains is important to the regulatory mechanism.
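The proposed sequence of events can be condensed into a qualitative rule: the active site assembles only once every restraint holding D-I and D-II apart has been lifted. The Python sketch below encodes that rule under simplifying assumptions of its own: each restraint discussed in this chapter is treated as either present or absent, each Ca2+-binding event removes the restraints attributed to it above, and the Lys10Thr, Glu504Ser and Glu517Pro mutations each remove one restraint outright. All class, event and restraint names are hypothetical labels, and the model deliberately ignores the graded shifts in [Ca2+]0.5 that were actually measured.

```python
from dataclasses import dataclass, field
from typing import Set

# Restraints that, in the proposed mechanism, hold D-I and D-II apart.
RESTRAINTS = ("anchor", "linker", "d2_d3_salt_bridges", "trp288_wedge")

# Restraints each event is proposed to relieve (hypothetical event labels).
EVENT_EFFECTS = {
    "ca_binds_ef_hands": {"anchor", "linker"},          # D-VI releases anchor; EF1 disrupts linker sheet
    "ca_binds_d3_acidic_loop": {"d2_d3_salt_bridges"},  # C2-like D-III releases D-II
    "trp288_rotates": {"trp288_wedge"},                 # mechanism of rotation not established
    "lys10thr": {"anchor"},
    "glu504ser": {"d2_d3_salt_bridges"},
    "glu517pro": {"linker"},
}


@dataclass
class CalpainModel:
    """Qualitative state of the heterodimer: which restraints are still in place."""
    remaining: Set[str] = field(default_factory=lambda: set(RESTRAINTS))

    def apply(self, event: str) -> None:
        self.remaining -= EVENT_EFFECTS[event]

    def active_site_assembled(self) -> bool:
        # D-I and D-II can rotate together only when no restraint is left.
        return not self.remaining


enzyme = CalpainModel()
for step in ("ca_binds_ef_hands", "ca_binds_d3_acidic_loop"):
    enzyme.apply(step)
print(enzyme.active_site_assembled(), sorted(enzyme.remaining))  # False ['trp288_wedge']
enzyme.apply("trp288_rotates")
print(enzyme.active_site_assembled())  # True

mutant = CalpainModel()
mutant.apply("glu504ser")
mutant.apply("glu517pro")
print(sorted(mutant.remaining))  # ['anchor', 'trp288_wedge']
```

In this picture the Glu504Ser/Glu517Pro double mutant starts with fewer restraints for Ca2+ to remove, which is qualitatively consistent with its lower [Ca2+]0.5; a boolean model of this kind cannot reproduce the size of that shift.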

4.7 Conclusions and Future Studies

In this project, the crystal structure of rat m-calpain has been determined to 2.6 Å resolution by X-ray crystallography. The structure, determined in the absence of Ca2+, revealed several novel features that regulate the activity of this Ca2+-dependent protease. Combined with a comprehensive structure-function analysis, a theory has been proposed to explain both the mode of zymogen inactivation and the mechanism of Ca2+-induced activation. Furthermore, we have for the first time conclusively shown that regions other than the EF-hand domains affect the Ca2+-requirement and activation of µ- and m-calpain. These regions have been clearly identified by mutagenesis studies. The finding that D-III of calpain resembles a C2 domain also has important physiological implications and may explain how calpain translocates to the membrane in a Ca2+-dependent manner.

Without question, it is essential to determine the crystal structure of calpain in the Ca2+-bound, active form. However, this task is expected to present significant technical challenges, and it is quite conceivable that a Ca2+-bound structure of the calpain heterodimer may not be available in the foreseeable future. Future avenues to explore the calpain regulatory mechanism must therefore include the expression and characterization of individual domains, since constructs can now be rationally designed, the structure having accurately defined the domain boundaries. Indeed, in collaboration with Dr. Peter Davies' laboratory, we have begun to understand some of the functional properties of these isolated domains.

References

1. Watson, J.D. and Crick, F.H.C. (1953) A structure for deoxyribose nucleic acid. Nature 171, 737-738.

2. Watson, J.D. and Crick, F.H.C. (1953) Genetic implications of the structure of deoxyribonucleic acid. Nature 171, 964-967.

3. Perutz, M.F., Rossmann, M.G., Cullis, A.F., Muirhead, H. and Will, G. (1960) Structure of haemoglobin: A three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature 185, 416-422.

4. Kendrew, J.C., Dickerson, R.E., Strandberg, B.E., Hart, R.G., Davies, D.R., Phillips, D.C. and Shore, V.C. (1960) Structure of myoglobin: A three-dimensional Fourier synthesis at 2 Å resolution. Nature 185, 422-427.

5. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Research 28, 235-242.

6. Wlodawer, A. and Vondrasek, J. (1998) Inhibitors of HIV-1 protease: a major success of structure-assisted . Annu. Rev. Biophys. Biomol. Struct. 27, 249-284.

7. Russell, R.B. and Eggleston, D.S. (2000) New roles for structure in biology and drug discovery. Nature Struct. Biol. 7, 928-930.

8. The Genome International Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409, 860-921.

9. Lodish, H., Baltimore, D., Berk, A., Zipursky, S.L., Matsudaira, P. and Darnell, J. in Molecular Cell Biology, 3rd edition, Scientific American Books, New York (1995).

10. Yap, K.L., Ames, J.B., Swindells, M.B. and Ikura, M. (1999) Diversity of conformational states and changes within the EF-hand . Proteins 15, 499-507.

11. Babu, Y.S., Bugg, C.E., and Cook, W.J. (1988) Structure of calmodulin refined at 2.2 Å resolution. J. Mol. Biol. 204, 191-204.

12. Kretsinger, R.H. and Nockolds, C.E. (1973) Carp muscle calcium-binding protein II. Structure determination and general description. J. Biol. Chem. 248, 3313-3326.

13. Crivici, A. and Ikura, M. (1995) Molecular and structural basis of target recognition by calmodulin. Annu. Rev. Biophys. Biomol. Struct. 24, 85-116.

14. Sorimachi, H., Ishiura, S., and Suzuki, K. (1997) Structure and physiological function of calpains. Biochem. J. 328, 721-732.

15. Molinari, M. and Carafoli, E. (1997) Calpain: A cytosolic proteinase active at the membranes. J. Membrane Biol. 156, 1-8.

16. Ono, Y., Sorimachi, H., and Suzuki, K. (1998) Structure and physiology of calpain, an enigmatic protease. Biochem. Biophys. Res. Commun. 245, 289-294.

17. Carafoli, E., and Molinari, M. (1998) Calpain: a protease in search of a function? Biochem. Biophys. Res. Commun. 247, 193-203.

18. Suzuki, K. and Sorimachi, H. (1998) A novel aspect of calpain activation. FEBS Lett. 433, 1-4.

19. Ohno, S., Emori, Y., Imajoh, S., Kawasaki, H., Kisaragi, M., and Suzuki, K. (1984) Evolutionary origin of a calcium-dependent protease by fusion of genes for a thiol protease and a calcium-binding protein? Nature 312, 566-570.

20. Suzuki, K. (1991) Biomed. Biochim. Acta 50, 483-484.

21. Franz, T., Vingron, M., Boehm, T., and Dear, T.N. (1999) Capn7: a highly divergent vertebrate calpain with a novel C-terminal domain. Mamm. Genome 10, 318-321.

22. Fougerousse, F., Durand, M., Suel, L., Pourquie, O., Delezoide, A.L., Romero, N.B., Abitbol, M., and Beckmann, J.S. (1998) Expression of genes (CAPN3, SGCA, SGCB, and TM) involved in progressive muscular dystrophies during early human development. Genomics 48, 145-156.

23. Matena, K., Boehm, T., and Dear, T.N. (1998) Genomic organisation of mouse Capn5 and Capn6 genes confirms that they are a distinct calpain subfamily. Genomics 48, 117-120.

24. Richard, I., Broux, O., Allamand, V., Fougerousse, F., Chiannilkulchai, N., Bourg, N., Brenguier, L., Devaud, C., Pasturaud, P., Roudaut, C., Hillaire, D., Passos-Bueno, M.-R., Zatz, M., Tischfield, J.A., Fardeau, M., and Beckmann, J.S. (1995) Mutations in the proteolytic enzyme calpain 3 cause limb-girdle muscular dystrophy type 2A. Cell 81, 27-40.

25. Sorimachi, H., Ishiura, S. and Suzuki, K. (1993) A novel tissue-specific calpain species expressed predominantly in the stomach comprises two alternative splicing products with and without Ca2+-binding domain. J. Biol. Chem. 268, 19476-19482.

26. Jekely, G. and Friedrich, P. (1999) Characterization of two recombinant Drosophila calpains. CALPA and a novel homolog, CALPB. J. Biol. Chem. 274, 23893-23900.

27. Aoki, K., Imajoh, S., Ohno, S., Emori, Y., Koike, M., Kosaki, G. and Suzuki, K. (1986) Complete amino acid sequence of the large subunit of the low-Ca2+-requiring form of human Ca2+-activated neutral protease (muCANP) deduced from its cDNA sequence. FEBS Lett. 205, 313-317.

28. Imajoh, S., Aoki, K., Ohno, S., Emori, Y., Kawasaki, H., Sugihara, H. and Suzuki, K. (1988) Molecular cloning of the cDNA for the large subunit of the high-Ca2+-requiring form of human Ca2+-activated neutral protease. Biochemistry 27, 8122-8128.

29. DeLuca, C.I., Davies, P.L., Samis, J.A. and Elce, J.S. (1993) Molecular cloning and bacterial expression of cDNA for rat calpain II 80 kDa subunit. Biochim. Biophys. Acta 1216, 81-93.

30. Sorimachi, H. and Suzuki, K. (1992) Sequence comparison among muscle-specific calpain, p94, and calpain subunits. Biochim. Biophys. Acta 1160, 55-62.

31. Ohno, S., Emori, Y. and Suzuki, K. (1986) Nucleotide sequence of a cDNA coding for the small subunit of human calcium-dependent protease. Nucleic Acids Res. 14, 5559.

32. Miyake, S., Emori, Y. and Suzuki, K. (1986) Gene organization of the small subunit of human calcium-activated neutral protease. Nucleic Acids Res. 14, 8805-8817.

33. Arthur, J.S., Elce, J.S., Hegadorn, C., Williams, K. and Greer, P.A. (2000) Disruption of the murine calpain small subunit gene, Capn4: calpain is essential for embryonic development but not for cell growth and division. Mol. Cell. Biol. 20, 4474-4481.

34. Potter, D.A., Tirnauer, J.S., Janssen, R., Croall, D.E., Hughes, C.N., Fiacco, K.A., Mier, J.W., Maki, M. and Herman, I.M. (1998) Calpain regulates actin remodeling during cell spreading. J. Cell Biol. 141, 647-662.

35. Temm-Grove, C.J., Wert, D., Thompson, V.F., Allen, R.E. and Goll, D.E. (1999) Microinjection of calpastatin inhibits fusion in myoblasts. Exp. Cell Res. 247, 293-303.

36. Carragher, N.O., Levkau, B., Ross, R. and Raines, E.W. (1999) Degraded collagen fragments promote rapid disassembly of smooth muscle focal adhesions that correlates with cleavage of pp125(FAK), paxillin, and talin. J. Cell Biol. 147, 619-630.

37. Kulkarni, S., Saido, T.C., Suzuki, K. and Fox, J.E. (1999) Calpain mediates integrin-induced signaling at a point upstream of Rho family members. J. Biol. Chem. 274, 21265-21275.

38. Meredith, J., Mu, Z., Saido, T. and Du, X. (1998) Cleavage of the cytoplasmic domain of the integrin beta3 subunit during endothelial cell apoptosis. J. Biol. Chem. 273, 19525-19531.

39. Gao, G. and Dou, Q.P. (2000) N-terminal cleavage of bax by calpain generates a potent proapoptotic 18-kDa fragment that promotes bcl-2-independent cytochrome C release and apoptotic cell death. J. Cell. Biochem. 80, 53-72.

40. Dourdin, N., Balcerzak, D., Brustis, J.J., Poussard, S., Cottin, P. and Ducastaing, A. (1999) Potential m-calpain substrates during myoblast fusion. Exp. Cell Res. 246, 433-442.

41. Harada, K., Maekawa, T., Abu Shama, K.M., Yamashima, T. and Yoshida, K. (1999) Translocation and down-regulation of -alpha, -beta, and -gamma isoforms during ischemia-reperfusion in rat brain. J. Neurochem. 72, 2556-2564.

42. Kubbutat, M.H. and Vousden, K.H. (1997) Proteolytic cleavage of human p53 by calpain: a potential regulator of protein stability. Mol. Cell. Biol. 17, 460-468.

43. Wang, K.K.W. and Yuen, P.-W. (1997) Development and therapeutic potential of calpain inhibitors. Adv. Pharmacol. 37, 117-152.

44. Lee, M.S., Kwon, Y.T., Li, M., Peng, J., Friedlander, R.M., and Tsai, L.H. (2000) Neurotoxicity induces cleavage of p35 to p25 by calpain. Nature 405, 360-364.

45. Iwamoto, K., Miura, T., Okamura, T., Shirakawa, K., Iwatate, M., Kawamura, S., Tatsuno, H., Ikeda, Y. and Matsuzaki, M. (1999) Calpain inhibitor-1 reduces infarct size and DNA fragmentation of myocardium in ischemic/reperfused rat heart. J. Cardiovasc. Pharmacol. 33, 580-586.

46. Markgraf, C.G., Velayo, N.L., Johnson, M.P., McCarty, D.R., Medhi, S., Koehl, J.R., Chmielewski, P.A. and Linnik, M.D. (1998) Six-hour window of opportunity for calpain inhibition in focal cerebral ischemia in rats. Stroke 29, 152-158.

47. Arthur, J.S., Gauthier, S. and Elce, J.S. (1995) Active site residues in m-calpain: identification by site-directed mutagenesis. FEBS Lett. 368, 397-400.

48. Storer, A.C. and Menard, R. (1994) Catalytic mechanism in papain family of cysteine peptidases. Methods Enzymol. 244, 486-500.

49. Arthur, J.S. and Elce, J.S. (1996) Interaction of aspartic acid-104 and proline-287 with the active site of m-calpain. Biochem. J. 319, 535-541.

50. Dutt, P., Arthur, J.S., Grochulski, P., Cygler, M. and Elce, J.S. (2000) Roles of individual EF-hands in the activation of m-calpain by calcium. Biochem. J. 348, 37-43.

51. Suzuki, K., Tsuji, S., Kubota, S., Kimura, Y. and Imahori, K. (1981) Limited autolysis of Ca2+-activated neutral protease (CANP) changes its sensitivity to Ca2+ ions. J. Biochem. (Tokyo) 90, 275-278.

52. Hathaway, D.R., Werth, D.K. and Haeberle, J.R. (1982) Limited autolysis reduces the Ca2+-requirement of a smooth muscle Ca2+-activated protease. J. Biol. Chem. 257, 9072-9077.

53. Goll, D.E., Thompson, V.F., Taylor, R.G., and Zalewska, T. (1992) Is calpain activity regulated by membranes and autolysis or by calcium and calpastatin? BioEssays 14, 549-556.

54. Elce, J.S., Davies, P.L., Hegadorn, C., Maurice, D.H., and Arthur, J.S.C. (1997) The effects of truncations of the small subunit on m-calpain activity and heterodimer formation. Biochem. J. 326, 31-38.

55. Yoshizawa, T., Sorimachi, H., Tomioka, S., Ishiura, S., and Suzuki, K. (1995) Calpain dissociates into subunits in the presence of calcium ions. Biochem. Biophys. Res. Commun. 208, 376-383.

56. Yoshizawa, T., Sorimachi, H., Tomioka, S., Ishiura, S. and Suzuki, K. (1995) A catalytic subunit of calpain possesses full proteolytic activity. FEBS Lett. 358, 101-103.

57. Zhang, W. and Mellgren, R.L. (1996) Calpain subunits remain associated during catalysis. Biochem. Biophys. Res. Commun. 227, 890-896.

58. Dutt, P., Arthur, J.S.C., Croall, D.E., and Elce, J.S. (1998) m-Calpain subunits remain associated in the presence of calcium. FEBS Lett. 436, 367-371.

59. Molinari, M., Anagli, J. and Carafoli, E. (1994) Ca2+-activated neutral protease is active in the erythrocyte membrane in its nonautolyzed 80-kDa form. J. Biol. Chem. 269, 27992-27995.

60. Pontremoli, S., Melloni, E., Sparatore, B., Salamino, F., Michetti, M., Sacco, O. and Horecker, B.L. (1985) Binding to erythrocyte membrane is the physiological mechanism for activation of Ca2+-dependent neutral proteinase. Biochem. Biophys. Res. Commun. 128, 331-338.

61. Coolican, S.A. and Hathaway, D.R. (1984) Effect of L-α-phosphatidylinositol on a vascular smooth muscle Ca2+-dependent protease. Reduction of the Ca2+-requirement for autolysis. J. Biol. Chem. 259, 11627-11630.

62. Saido, T.C., Nagao, S., Shiramine, M., Tsukaguchi, M., Yoshizawa, T., Sorimachi, H., Ito, H., Tsuchiya, T., Kawashima, S. and Suzuki, K. (1994) Distinct kinetics of subunit autolysis in mammalian m-calpain activation. FEBS Lett. 346, 263-267.

63. Melloni, E., Michetti, M., Salamino, F. and Pontremoli, S. (1998) Molecular and functional properties of a calpain activator protein specific for mu-isoforms. J. Biol. Chem. 273, 12827-12831.

64. Melloni, E., Averna, M., Salamino, F., Sparatore, B., Minafra, R. and Pontremoli, S. (2000) Acyl-CoA-binding protein is a potent m-calpain activator. J. Biol. Chem. 275, 82-86.

65. Mellgren, R.L. (1987) Calcium-dependent proteases: an enzyme system active at cellular membranes? FASEB Journal 1, 110-115.

66. Blanchard, H., Grochulski, P., Li, Y., Arthur, J.S.C., Davies, P.L., Elce, J.S., and Cygler, M. (1997) Structure of a calpain Ca2+-binding domain reveals a novel EF-hand and Ca2+-induced conformational changes. Nature Struct. Biol. 4, 532-538.

67. Lin, G.D., Chattopadhyay, D., Maki, M., Wang, K.K., Carson, M., Jin, L., Yuen, P.W., Takano, E., Hatanaka, M., DeLucas, L.J., and Narayana, S.V. (1997) Crystal structure of calcium bound domain VI of calpain at 1.9 Å resolution and its role in enzyme assembly, regulation, and inhibitor binding. Nature Struct. Biol. 4, 538-547.

68. Kretsinger, R.H. (1997) EF-hands embrace. Nature Struct. Biol. 4, 514-516.

69. Richard, I., Roudaut, C., Marchand, S., Baghdiguian, S., Herasse, M., et al. (2000) Loss of calpain 3 proteolytic activity leads to muscular dystrophy and to apoptosis-associated IkappaBalpha/nuclear factor kappaB pathway perturbation in mice. J. Cell Biol. 151, 1583-1590.

70. Sorimachi, H., Imajoh-Ohmi, S., Emori, Y., Kawasaki, H., Ohno, S., Minami, Y. and Suzuki, K. (1989) Molecular cloning of a novel mammalian calcium-dependent protease distinct from both m- and mu-types. Specific expression of the mRNA in skeletal muscle. J. Biol. Chem. 264, 20106-20111.

71. Branca, D., Gugliucci, A., Bano, D., Brini, M. and Carafoli, E. (1999) Expression, partial purification and functional properties of the muscle-specific calpain isoform p94. Eur. J. Biochem. 265, 839-846.

72. Sorimachi, H., Kinbara, K., Kimura, S., Takahashi, M., Ishiura, S., et al. (1995) Muscle-specific calpain, p94, responsible for limb girdle muscular dystrophy type 2A, associates with connectin through IS2, a p94-specific sequence. J. Biol. Chem. 270, 31158-31162.

73. Sokol, S.B., and Kuwabara, P.E. (2000) Proteolysis in Caenorhabditis elegans sex determination: cleavage of TRA-2A by TRA-3. Genes Dev. 14, 901-906.

74. Barnes, T.M. and Hodgkin, J. (1996) The tra-3 sex determination gene of Caenorhabditis elegans encodes a member of the calpain regulatory protease family. EMBO J. 15, 4477-4484.

75. Horikawa, Y. et al. (2000) Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genetics 26, 163-175.

76. Jia, J., Han, Q., Borregaard, N., Lollike, K. and Cygler, M. (2000) Crystal structure of human grancalcin, a member of the penta-EF-hand . J. Mol. Biol. 300, 1271-1281.

77. Khan, A.R. and James, M.N.G. (1998) Molecular mechanisms for the conversion of zymogens to active proteolytic enzymes. Protein Sci. 7, 815-836.

78. Bernstein, N.K., Cherney, M.M., Loetscher, H., Ridley, R.G. and James, M.N.G. (1999) Crystal structure of the novel aspartic proteinase zymogen proplasmepsin II from Plasmodium falciparum. Nature Struct. Biol. 6, 32-37.

79. Jing, H., Macon, K.J., Moore, D., DeLucas, L.J., Volanakis, J.E., and Narayana, S.V.L. (1999) Structural basis of profactor D activation: from a highly flexible zymogen to a novel self-inhibited serine protease, complement . EMBO J. 18, 804-814.

80. Kamphuis, I.G., Kalk, K.H., Swarte, M.B.A., and Drenth, J. (1984) Structure of papain refined at 1.65 Å resolution. J. Mol. Biol. 179, 233-256.

Musil, D., Zucic, D., Turk, D., Engh, R.A., May, I., Huber, R. et. al. (1991) The refined 2.15 A X-ray crystal structure of human Liver cathepsin B: the structural basis for its specificity. EMBO J. 10,232 1-2330.

Coulombe, R., Grochulski, P., Sivaraman, J., Menard, R., Mort, J.S., and Cygler, M. (1996) Structure of human procathepsin L reveals the molecular basis of inhibition by the prosegment EMBO J., 15,5492-5503.

Cygler, M., Sivaraman, J., Grochulski, P., Coulombe, R., Storer, AC., and Mort, J.S. ( 1996) Structure of rat procathepsin 8: mode1 for inhibition of cysteine protease activity by the proregion. Stnicture, 4,405-4 16.

84. Groves, M.R., Taylor, M.A., Scott, M., Cummings, N.J., Pickersgill, R.W. and Jenkins, J.A. (1996) The prosequence of procaricain forms an alpha-helical domain that prevents access to the substrate-binding cleft. Structure 4, 1193-1203.

85. Sivaraman, J., Lalumière, M., Menard, R. and Cygler, M. (1999) Crystal structure of wild-type human procathepsin K. Protein Sci. 8, 283-290.

86. Ban, N., Nissen, P., Hansen, J., Moore, P.B. and Steitz, T.A. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289, 905-920.

87. Molinari, M., Anagli, J. and Carafoli, E. (1994) Ca2+-activated neutral protease is active in the erythrocyte membrane in its nonautolyzed 80-kDa form. J. Biol. Chem. 269, 27992-27995.

88. Elce, J.S., Hegadorn, C. and Arthur, J.S.C. (1997) Autolysis, Ca2+ requirement, and heterodimer stability in m-calpain. J. Biol. Chem. 272, 11268-11275.

89. Elce, J.S., Hegadorn, C., Gauthier, S., Vince, J.W. and Davies, P.L. (1995) Recombinant calpain II: improved expression systems and production of a C105A active-site mutant for crystallography. Protein Eng. 8, 843-848.

90. Graham-Siegenthaler, K., Gauthier, S., Davies, P.L. and Elce, J.S. (1994) Active recombinant rat calpain II. Bacterially produced large and small subunits associate both in vivo and in vitro. J. Biol. Chem. 269, 30457-30460.

91. Bradford, M.M. (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248-254.

92. Raser, K.J., Posner, A. and Wang, K.K. (1995) Casein zymography: a method to study mu-calpain, m-calpain, and their inhibitory agents. Arch. Biochem. Biophys. 319, 211-216.

93. Hendrickson, W.A., Horton, J.R. and LeMaster, D.M. (1990) Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 9, 1665-1672.

94. Jancarik, J. and Kim, S.H. (1991) Sparse matrix sampling: a screening method for crystallization of proteins. J. Appl. Crystallogr. 24, 409-411.

95. Hendrickson, W.A. (1991) Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254, 51-58.

96. Otwinowski, Z. (1993) Oscillation data reduction program. In Proceedings of the CCP4 Study Weekend: Data Collection and Processing (eds Sawyer, L., Isaacs, N. and Bailey, S.) 56-62 (Daresbury Laboratory, Warrington).

97. Minor, W. (1993) XdisplayF Program. Purdue University, West Lafayette, USA.

98. Collaborative Computational Project, Number 4 (1994) Acta Crystallogr. D50, 760-763.

99. Ravelli, R.B.G., Sweet, R.M., Skinner, J.M., Duisenberg, A.J.M. and Kroon, J. (1997) STRATEGY: a program to optimize the starting spindle angle and scan range for X-ray data collection. J. Appl. Crystallogr. 30, 551-554.

100. Matthews, B.W. (1968) Solvent content of protein crystals. J. Mol. Biol. 33, 491-497.

101. Weeks, C.M. and Miller, R. (1999) Optimizing Shake-and-Bake for proteins. Acta Crystallogr. D55, 492-500.

102. Howell, P.L., Blessing, R.H., Smith, G.D. and Weeks, C.M. (2000) Optimizing DREAR and SnB parameters for determining Se-atom substructures. Acta Crystallogr. D56, 604-617.

103. de La Fortelle, E. and Bricogne, G. (1997) Maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement and multiwavelength anomalous diffraction methods. In Methods in Enzymology, Macromolecular Crystallography (eds Sweet, R.M. and Carter, C.W., Jr.) 276, 472-494 (Academic Press, New York).

104. Abrahams, J.P. and Leslie, A.G.W. (1996) Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallogr. D52, 30-42.

105. McRee, D.E. (1992) A visual protein crystallographic software system for X11/Xview. J. Mol. Graphics 10, 44-47.

106. Brünger, A.T. et al. (1998) Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D54, 905-921.

107. Pannu, N.S. and Read, R.J. (1996) Improved structure refinement through maximum likelihood. Acta Crystallogr. A52, 659-668.

108. Engh, R.A. and Huber, R. (1991) Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr. A47, 392-400.

109. Brünger, A.T. (1992) The Free R Value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355, 472-474.

110. Adams, P.D., Pannu, N.S., Read, R.J. and Brünger, A.T. (1997) Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement. Proc. Natl. Acad. Sci. USA 94, 5018-5023.

111. Read, R.J. (1986) Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. A42, 140-149.

112. Roussel, A. and Cambillau, C. (1989) TURBO FRODO (Centre National de la Recherche Scientifique/Université Aix-Marseille, Marseille, France), Version OpenGL.1.

113. Evans, S.V. (1990) Hardware lighted three-dimensional solid model representations of macromolecules. J. Mol. Graph. 11, 134-138.

114. Frishman, D. and Argos, P. (1995) Knowledge-based protein secondary structure assignment. Proteins 23, 566-579.

115. Laskowski, R.A., MacArthur, M.W., Moss, D.S. and Thornton, J.M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283-291.

116. Ramakrishnan, C. and Ramachandran, G.N. (1965) Stereochemical criteria for polypeptide and protein chain conformations. II. Allowed conformations for a pair of peptide units. Biophys. J. 5, 909-933.

117. Cohen, G.H. (1997) ALIGN: a program to superimpose protein coordinates, accounting for insertions and deletions. J. Appl. Crystallogr. 30, 1160-1161.

118. Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946-950.

119. Merritt, E.A. and Bacon, D.J. (1997) Raster3D: photorealistic molecular graphics. Methods Enzymol. 277, 505-524.

120. Nicholls, A., Sharp, K. and Honig, B. (1991) Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 11, 281-296.

121. Peitsch, M.C. (1996) ProMod and Swiss-Model: Internet-based tools for automated comparative protein modeling. Biochem. Soc. Trans. 24, 274-279.

122. van Gunsteren, W.F., Billeter, S.R., Eising, A.A., Hünenberger, P.H., Krüger, P., Mark, A.E., Scott, W.R.P. and Tironi, I.G. (1996) Biomolecular Simulation: The GROMOS96 Manual and User Guide. (Vdf Hochschulverlag AG an der ETH Zürich, Zürich, Switzerland), pp. 1-1042.

123. SYBYL (version 6.5) Manual (1991-1998). Tripos, Inc., St. Louis.

124. Kunkel, T.A., Shearman, C.W. and Loeb, L.A. (1981) Mutagenesis in vitro by depurination of phiX174 DNA. Nature 291, 349-351.

125. Birnboim, H.C. (1983) A rapid alkaline extraction method for the isolation of plasmid DNA. Methods Enzymol. 100, 243-255.

126. Hazes, B. (1988) SSBOND is a "jiffy" program written by Bart Hazes while at the University of Alberta.

127. Baker, D., Shiau, A.K. and Agard, D.A. (1993) The role of pro regions in protein folding. Curr. Opin. Cell Biol. 5, 966-970.

128. Vilei, E.M., Calderara, S., Anagli, J., Berardi, S., Hitomi, K., Maki, M. and Carafoli, E. (1997) Functional properties of recombinant calpain I and of mutants lacking domains III and IV of the catalytic subunit. J. Biol. Chem. 272, 25802-25808.

129. Rizo, J. and Südhof, T.C. (1998) C2-domains, structure and function of a universal Ca2+-binding domain. J. Biol. Chem. 273, 15879-15882.

130. Sutton, R.B., Davletov, B.A., Berghuis, A.M., Südhof, T.C. and Sprang, S.R. (1995) Structure of the first C2 domain of synaptotagmin I: a novel Ca2+/phospholipid-binding fold. Cell 80, 929-938.

131. Tompa, P., Emori, Y., Sorimachi, H., Suzuki, K. and Friedrich, P. (2001) Domain III of calpain is a Ca2+-regulated phospholipid-binding domain. Biochem. Biophys. Res. Commun. 280, 1333-1339.

132. McGrath, M.E. (1999) The lysosomal cysteine proteases. Annu. Rev. Biophys. Biomol. Struct. 28, 181-204.

Appendix A: Principles of Protein Crystallography

A.1 Electron Density and the Crystallographic Phase Problem

The mathematical principles of crystallography are complex, and are discussed in great detail in several excellent sources (1-4). The basic concepts of the methods employed in this project are outlined below, and the reader is directed to these sources (1-4) for further details.

At the fundamental level, X-ray crystallography depends on the diffraction of X-rays by the ordered electrons within a crystal. Hence, a solved crystal structure actually represents a three-dimensional cloud of electron density that originates from the molecules that make up the crystal. This electron density is then interpreted visually on a graphics workstation or by automated computational approaches, so that the atoms of the molecule are correctly positioned within the electron density. In this way the atomic coordinates of the molecule are produced, and can be used to examine secondary and tertiary structure. Although most interpretation of protein structures is done at this level, it is important to remember that the actual result obtained from crystallography is the electron density. Thus, solving a crystal structure essentially comes down to solving the fundamental equation of crystallography, the equation that defines the electron density:

$$\rho(x\,y\,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l}\,|F(h\,k\,l)|\,\exp\!\left[-2\pi i\,(hx + ky + lz) + i\,\alpha(h\,k\,l)\right] \qquad \text{(A.1)}$$

In this equation, ρ(x y z) is the electron density at any point (x y z) in the unit cell, V is the volume of the unit cell, |F(h k l)| is the structure factor amplitude for a reflection from the Miller plane (h k l) and α(h k l) is the corresponding phase angle. When X-rays are scattered by a crystal, the positions and intensities (I) of the diffracted beams are recorded on a detector to generate a diffraction pattern. From the geometrical arrangement of spots, it is possible to determine the unit cell constants and thus calculate the volume of the cell. Likewise, the structure factor amplitudes |F(h k l)| can be obtained directly from the diffraction experiment, as the square roots of the (scaled) intensities, I(h k l). However, current crystallographic procedures only count photons on a detector, and record no information about the phase angles of the reflections. Thus equation A.1 cannot be solved directly from the diffraction pattern of a given crystal. This is the fundamental problem encountered in X-ray crystallography, and is known as the crystallographic phase problem. Inspection of equation A.1 indicates the presence of an imaginary term.

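However, the electron density is a purely real function, as shown by equation A.2. This can be proven mathematically by expanding the exponential function in equation A.1 and incorporating Friedel's Law, |F(h k l)| = |F(-h -k -l)| and α(-h -k -l) = -α(h k l) (discussed below), so that the imaginary (sine) terms of each Friedel pair cancel:

$$\rho(x\,y\,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l}\,|F(h\,k\,l)|\cos\!\left[2\pi\,(hx + ky + lz) - \alpha(h\,k\,l)\right] \qquad \text{(A.2)}$$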
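The role of the phases in equations A.1 and A.2 can be illustrated with a one-dimensional toy calculation in Python (NumPy). This is a minimal sketch under stated assumptions, not a procedure used in this work: the "crystal" is a hypothetical one-dimensional cell of unit length (so V = 1) holding three point atoms with invented positions and scattering powers, and the Fourier series is truncated at |h| ≤ 20.

```python
import numpy as np

# Toy 1-D "crystal": unit cell of length 1 (so V = 1) holding three point
# atoms at hypothetical fractional positions with hypothetical weights.
positions = np.array([0.15, 0.40, 0.70])
weights = np.array([6.0, 8.0, 34.0])

h = np.arange(-20, 21)  # indices -20..20, so every Friedel pair (h, -h) is present

# F(h) = sum_j f_j exp(+2*pi*i*h*x_j), the sign convention matching equation A.1
F = (weights * np.exp(2j * np.pi * np.outer(h, positions))).sum(axis=1)
amp, phase = np.abs(F), np.angle(F)

x = np.linspace(0.0, 1.0, 1000, endpoint=False)

def density(amplitudes, phases):
    """1-D form of equation A.2: rho(x) = sum_h |F(h)| cos(2*pi*h*x - alpha(h))."""
    arg = 2 * np.pi * np.outer(x, h) - phases
    return (amplitudes * np.cos(arg)).sum(axis=1)

rho_true = density(amp, phase)                     # correct phases
rho_no_phase = density(amp, np.zeros_like(phase))  # phases discarded

print("highest peak with phases:   ", x[np.argmax(rho_true)])
print("highest peak without phases:", x[np.argmax(rho_no_phase)])
```

With the correct phases, the highest peak falls at x ≈ 0.70, the position of the heaviest atom. With all phases set to zero, every cosine term equals 1 at x = 0, so the map peaks at the origin and the atomic positions are lost, even though the amplitudes are identical.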

There are four methods most commonly used for determining the protein phase angles: direct methods, molecular replacement (MR), multiple isomorphous replacement (MIR) and multiwavelength anomalous dispersion (MAD). Since MIR and MAD are the most commonly used techniques for novel macromolecular structure determination, their key principles will be discussed in greater detail below.

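Direct methods rely on the fact that, because the electron density is never negative and is concentrated in discrete atoms, the measured intensities impose statistical relationships on the phases of related reflections (for example, the sum of the phases of three strong reflections whose indices add to zero tends to be close to zero). This method is mainly applicable to the solution of small molecule structures, because these relationships weaken rapidly as the number of atoms increases and as the diffraction resolution decreases, as is generally the case for protein crystals.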

MR is a powerful method for determining the structure of a molecule when the structure of a closely related molecule is known. To solve a structure using MR, the known structure acts as a search model, which is first rotated, then translated into the unit cell of the unknown structure using Patterson methods. Once the model is appropriately positioned within the unit cell, the phases calculated from it are combined with the observed structure factor amplitudes as an initial approximation, and the model is then refined against the observed diffraction data of the unknown structure.

A.2 Principles of the MIR Method

A.2.1 Overview of MIR

Prior to the large-scale application of the MAD method in the late 1990s, most macromolecular structures were solved using the MIR method. To determine the protein phase angles using MIR, heavy atoms must be bound specifically into the native crystal. Ideally, heavy atom derivatives should not cause any major changes in the conformation of the protein or the crystal lattice. Such derivatives are termed "isomorphous", and are best suited for facilitating structure determination. Both native and heavy atom-derivatized crystals are subjected to X-ray diffraction experiments. Introduction of electron-dense heavy atoms into the crystal causes intensity differences between the corresponding reflections of native and heavy atom-derivatized crystals. Using difference Patterson methods, these intensity differences can be exploited to determine the positions of the heavy atoms. In typical cases, two different heavy atom derivatives are required to determine the protein phases unambiguously and subsequently calculate the electron density map. To clarify how the crystallographic phase problem can be solved using MIR, the structure factor and the Harker construction are described in more detail below.

A.2.2 The Structure Factor

The structure factor, F, of a particular reflection arising from X-ray diffraction by a crystal is a complex number that is represented by both a magnitude, |F|, and a phase, α. Although |F| (sometimes written simply as F) can be obtained directly from a diffraction experiment by taking the square root of the measured intensity, α cannot be measured. Thus, α may be any angle between 0 and 360 degrees, and so the structure factor is often represented as a circle in a vector diagram, as illustrated in Figure A.1. For structure determination by MIR, it is common to represent the structure factor of the native protein as FP, that of the heavy atom derivative as FPH, and that of the heavy atom alone as FH.

Figure A.1. Representation of the protein structure factor. The structure factor of a reflection can be represented as a vector having both an amplitude and a phase. In this figure, the measured amplitude is FP (often shown as |FP|), and the unknown phase, αP, can take any value between 0 and 360 degrees. Hence the structure factor FP is represented as a circle.
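The vector relationship between FP, FH and FPH can be checked numerically. The Python (NumPy) sketch below computes structure factors for a hypothetical toy structure: three light atoms plus one heavy atom, with invented fractional coordinates, scattering factors treated as constants and no temperature factors. The derivative is assumed to be perfectly isomorphous, so FPH is exactly the vector sum FP + FH.

```python
import numpy as np

def structure_factor(hkl, frac_coords, f):
    """F(hkl) = sum_j f_j exp[2*pi*i (h x_j + k y_j + l z_j)].

    Scattering factors are treated as constants (no angle dependence, no
    temperature factors), which is a simplifying assumption.
    """
    phases = 2 * np.pi * (np.asarray(frac_coords) @ np.asarray(hkl))
    return np.sum(np.asarray(f) * np.exp(1j * phases))

# Hypothetical toy structure: three light "protein" atoms and one heavy atom
protein_xyz = np.array([[0.10, 0.20, 0.30], [0.45, 0.15, 0.60], [0.70, 0.80, 0.25]])
protein_f = np.array([6.0, 7.0, 8.0])
heavy_xyz = np.array([[0.33, 0.52, 0.11]])
heavy_f = np.array([78.0])  # the atomic number of Pt, used as a crude stand-in

hkl = (1, 2, 3)
F_P = structure_factor(hkl, protein_xyz, protein_f)
F_H = structure_factor(hkl, heavy_xyz, heavy_f)
# A perfectly isomorphous derivative: the same protein atoms plus the heavy atom
F_PH = structure_factor(hkl, np.vstack([protein_xyz, heavy_xyz]),
                        np.concatenate([protein_f, heavy_f]))

for name, F in (("F_P", F_P), ("F_H", F_H), ("F_PH", F_PH)):
    print(f"{name:4s} |F| = {abs(F):6.2f}  alpha = {np.degrees(np.angle(F)) % 360:6.1f} deg")
```

In a real experiment only |FP| and |FPH| would be measured; FH and its phase αH are the quantities that can be calculated once the heavy atom positions are known.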

A.2.3 Location of Heavy Atom Positions: The Isomorphous Difference Patterson

To solve a structure using the MIR method, structure factor amplitudes are measured separately from crystals of the native protein (|FP|) and from crystals of the heavy atom derivative (|FPH|). If the heavy atom is incorporated into the crystal "isomorphously", it is assumed that the conformation of the native protein, and consequently the unit cell, have not been perturbed. In this case, the intensity differences between the two data sets are due solely to the contribution of the heavy atom. These intensity differences can be used to determine the positions of the heavy atoms within the unit cell using isomorphous difference Patterson methods.

Like the electron density function, the Patterson function is described by a Fourier summation, but it uses the squares of the structure factor amplitudes as coefficients and does not rely on the phase angles (equation A.3):

$$P(u\,v\,w) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l}\,|F(h\,k\,l)|^{2}\cos\!\left[2\pi\,(hu + kv + lw)\right] \qquad \text{(A.3)}$$

A Patterson map calculated from this equation is a vector map in which peaks represent the interatomic vectors between atoms. Thus, a peak at position (u, v, w) in the Patterson cell means that atoms exist at positions (x, y, z) and either (x + u, y + v, z + w) or (x - u, y - v, z - w) in the real unit cell. Although Patterson maps calculated from protein crystals are extremely complex, isomorphous difference Patterson maps (which use (|FPH| - |FP|)² as the coefficient in the Fourier summation)* are significantly simpler, and can be used to deduce the arrangement of the heavy atoms. Symmetry elements present in certain space groups can often be used to determine the exact positions of the heavy atoms in the unit cell. These elements give rise to so-called Harker sections, planes (or lines) in the Patterson map on which the vectors between symmetry-related heavy atoms fall. These vectors produce the highest (non-origin) peaks in the map and therefore simplify the interpretation of the heavy atom interatomic vectors.
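A one-dimensional sketch in Python (NumPy) shows why a Patterson map reveals interatomic vectors without any phase information. It assumes a hypothetical cell of unit length containing two equal heavy atoms at invented positions and uses their structure factors directly, which is what an ideal isomorphous difference Patterson approximates; the only non-origin peaks then fall at u = ±(x2 - x1).

```python
import numpy as np

# Hypothetical 1-D cell (length 1) with two equal heavy atoms
heavy_x = np.array([0.12, 0.31])
heavy_f = np.array([50.0, 50.0])
h = np.arange(-30, 31)

F = (heavy_f * np.exp(2j * np.pi * np.outer(h, heavy_x))).sum(axis=1)
u = np.linspace(0.0, 1.0, 1000, endpoint=False)

# 1-D form of equation A.3 (V = 1): only |F|^2 is used, never the phases
P = ((np.abs(F) ** 2) * np.cos(2 * np.pi * np.outer(u, h))).sum(axis=1)

# Skip the large origin peak (u near 0 or 1) and find the strongest remaining peak
away_from_origin = (u > 0.05) & (u < 0.95)
u_peak = u[away_from_origin][np.argmax(P[away_from_origin])]
print(f"strongest non-origin peak at u = {u_peak:.3f}")  # near 0.19 or its mate 0.81
```

The peak at 0.19 and its centrosymmetric mate at 0.81 (that is, -0.19) have equal height, which illustrates the inherent ambiguity of Patterson interpretation: the map gives the vector between the atoms, not their absolute positions.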

A.2.4 MIR and the Solution of the Phase Problem

Once the heavy atom positions have been identified (either through difference Patterson maps or direct methods), it is possible to calculate the structure factor (amplitude and phase) of the heavy atom. This information can be used to determine the protein phase angles, as illustrated in the Harker diagrams in Figure A.2.
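The Harker construction can also be carried out algebraically. Because FPH = FP + FH, the cosine law gives |FPH|² = |FP|² + |FH|² + 2|FP||FH|cos(αP - αH), which a single derivative satisfies for two values of αP. The Python sketch below (function names are hypothetical; the data are noise-free and synthetic, with an assumed true phase of 40°) returns both single-derivative candidates and then uses a second derivative to select between them. Practical MIR programs combine phase probabilities from all derivatives rather than picking one candidate; this shows only the geometric idea.

```python
import numpy as np

def sir_phase_candidates(F_P, F_PH, F_H):
    """Both protein phases (radians) allowed by one isomorphous derivative.

    F_P and F_PH are measured amplitudes; F_H is the complex heavy atom
    structure factor calculated from the heavy atom positions.
    """
    amp_H, alpha_H = abs(F_H), np.angle(F_H)
    cos_term = (F_PH**2 - F_P**2 - amp_H**2) / (2 * F_P * amp_H)
    delta = np.arccos(np.clip(cos_term, -1.0, 1.0))  # clip guards against noisy data
    return np.mod([alpha_H + delta, alpha_H - delta], 2 * np.pi)

def mir_phase(F_P, derivatives):
    """Choose the candidate from the first derivative that best fits the others.

    derivatives: list of (F_PH amplitude, complex F_H) pairs.
    """
    candidates = sir_phase_candidates(F_P, *derivatives[0])

    def misfit(alpha_P):
        F_P_vec = F_P * np.exp(1j * alpha_P)
        return sum((abs(F_P_vec + F_H) - F_PH) ** 2 for F_PH, F_H in derivatives[1:])

    return min(candidates, key=misfit)

# Noise-free synthetic test with a hypothetical true protein phase of 40 degrees
true_F_P = 100.0 * np.exp(1j * np.radians(40.0))
F_H1 = 30.0 * np.exp(1j * np.radians(110.0))
F_H2 = 25.0 * np.exp(1j * np.radians(250.0))
derivs = [(abs(true_F_P + F_H1), F_H1), (abs(true_F_P + F_H2), F_H2)]

print(np.degrees(sir_phase_candidates(100.0, *derivs[0])))  # ambiguous pair (SIR)
print(np.degrees(mir_phase(100.0, derivs)))                 # resolved by derivative 2 (MIR)
```

Here the first derivative alone allows either 40° or 180°; only the first candidate is also consistent with the second derivative, exactly as the common intersection of three circles picks out a single phase in the MIR diagram.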

* |FPH| - |FP| approximates |FH| (strictly, |FH|cos(αPH - αH) when |FH| is small compared with |FP|), which represents the contribution of the heavy atom alone. Ideally, therefore, a Patterson map calculated with (|FPH| - |FP|)² as the coefficient gives information solely about the heavy atom arrangement.

Figure A.2. Solution of the phase problem by multiple isomorphous replacement. a) In single isomorphous replacement (SIR), when only a single heavy atom derivative has been produced, a unique solution to the phase problem cannot be obtained. b) The unique solution of the phase angle can be obtained when multiple heavy atom derivatives are available, as in multiple isomorphous replacement (MIR). In this figure, the structure factor amplitudes are given as FP for the native protein, FPH1 for the first heavy atom derivative and FPH2 for the second heavy atom derivative. -FH1 and -FH2 represent the structure factors of the heavy atoms in derivatives one and two, respectively. From experimentation, FP, FPH1 and FPH2 are measured, while FH1 and FH2 (and the corresponding phases of the heavy atoms, αH1 and αH2) are calculated from the positions of the heavy atoms. As in Figure A.1, the unknown phase angle is represented by a circle, and may be any value between 0 and 360 degrees. Based on the heavy atom positions, a precise vector with amplitude -FH1 and direction αH1 is drawn, and serves as the origin of a new phase circle for the corresponding heavy atom derivative. This results in the formation of a triangle with vectors satisfying the equation FPH - FH = FP. The obvious consequence of such an arrangement is that there are two locations where the circles intersect, representing possible solutions for αP (the quantity that phasing seeks). To resolve this phase ambiguity and deduce a unique value for αP, a second heavy atom derivative is required, as in (b). This solution is indicated as the common intersection point of the three circles.

This graphical solution can be proven mathematically using the cosine law. Because FPH = FP + FH, the resultant formula is given by equation A.4:

$$\alpha_P = \alpha_H \pm \cos^{-1}\!\left[\frac{|F_{PH}|^{2} - |F_P|^{2} - |F_H|^{2}}{2\,|F_P|\,|F_H|}\right] \qquad \text{(A.4)}$$

This equation has two possible solutions because of the inverse cosine function. Hence, two derivatives are required to obtain a unique solution for αP. In this equation, |FP| and |FPH| are measured directly from the diffraction experiment, while |FH| and αH are calculated from the heavy atom positions.

A.3 Multiwavelength Anomalous Dispersion (MAD)

A.3.1 The Fundamental Concept of MAD: Anomalous Scattering

In the classical treatment of X-ray scattering, electrons within the scattering atoms are considered to be free, and the scattered beam differs by exactly 180° in phase from the incident beam. An important distinction arises when the wavelength of the incident X-rays approaches an absorption edge of a particular atom. For certain elements, the absorption characteristics change dramatically as a function of wavelength, as illustrated in Figure A.3.


Figure A.3. Absorption characteristics of elements change as a function of X-ray wavelength.

The sharp change in the curve is referred to as an absorption edge, and it corresponds to the energy at which an inner shell electron is ejected by the incident X-ray photon. Inner shell electrons are tightly bound to the nucleus as a result of the nuclear charge, which is especially strong in heavier elements. As a result, the scattering of the X-rays is perturbed, and the diffracted beam no longer differs by exactly 180° in phase from the incident beam. In these cases, we say that anomalous scattering is observed. Since lighter atoms such as carbon, nitrogen and oxygen have no absorption edges near 1.0 Å, anomalous scattering is typically observed only when heavier atoms are present. In practice, sulfur is the lightest element in proteins that makes a measurable contribution to anomalous X-ray scattering, although its effect is quite weak.

Anomalous scattering can be compared with scattering from free electrons, as illustrated in Figure A.4.

Figure A.4. a) Scattering from a free electron. b) Anomalous scattering from an inner shell electron. The anomalous contribution consists of two parts, f', which is real, and if'', which is imaginary.

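Hence, the scattering factor of an anomalously scattering atom can be written as:

$$f = f^{0} + f' + i f''$$

where f0 is the normal scattering factor and i indicates that f'' is phase-shifted by 90° relative to f0 and f'. From this equation, we see that (f' + if'') is the correction for anomalous scattering. For an atom j at fractional position (xj yj zj), this corrected scattering factor enters the Friedel mates F(hkl) and F(-h-k-l) in different ways:

$$f^{+} = \left(f^{0} + f' + i f''\right)\exp\!\left[2\pi i\,(hx_j + ky_j + lz_j)\right]$$

and

$$f^{-} = \left(f^{0} + f' + i f''\right)\exp\!\left[-2\pi i\,(hx_j + ky_j + lz_j)\right]$$

It can be seen from the Argand diagram in Figure A.5 that the normal components of f- and f+ are symmetric about the real axis, while the f'' components of f- and f+ are symmetric about the imaginary axis. As a result, f+ and f- are no longer complex conjugates of one another, so that, whenever the anomalous scatterers are accompanied by other atoms (as in any protein), |F(hkl)| no longer equals |F(-h-k-l)|.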

This is the basis of macromolecular structure determination by MAD.

Figure A.5. Effects of anomalous dispersion on the atomic scattering factors. Note that the real components are symmetric about the real axis, while the imaginary components are symmetric about the imaginary axis. As a result of these correction factors, f+ and f- are different, and Friedel's Law breaks down.

The values of both f' and f'' are highly wavelength-dependent, so the presence of an anomalous scatterer introduces two effects that can be used to obtain phase information. The anomalous signal arises at a single wavelength: the intensity of the reflection (hkl) does not equal that of the (-h-k-l) reflection. The dispersive signal arises from intensity differences between the same reflection (hkl) measured at different wavelengths. These intensity differences can be exploited to determine the location of the anomalous scatterers, because the differences are due entirely to the anomalous scatterer (heavy atom).
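Both signals can be illustrated with a toy calculation. The Python (NumPy) sketch below uses a hypothetical structure (three light atoms and one Se atom at invented coordinates, zero-angle scattering factors, no temperature factors) and assumed, merely illustrative values of f' and f'' for Se at the peak, inflection and remote wavelengths; they are not measured values. It shows that a nonzero f'' makes |F(hkl)| differ from |F(-h-k-l)| (the anomalous difference), that the same reflection changes amplitude between wavelengths as f' changes (the dispersive difference), and that with f'' = 0 Friedel's law holds even when f' is nonzero.

```python
import numpy as np

# Hypothetical toy structure: three light atoms and one Se (invented coordinates)
coords = np.array([[0.10, 0.20, 0.30], [0.45, 0.15, 0.60],
                   [0.70, 0.80, 0.25], [0.33, 0.52, 0.11]])
f0 = np.array([6.0, 7.0, 8.0, 34.0])  # zero-angle values (f0 = Z), no temperature factors
SE = 3                                 # row index of the Se atom
hkl = np.array([1, 2, 3])

# Illustrative, assumed Se corrections (f', f'') at three wavelengths; not measured values
corrections = {"peak": (-8.0, 5.0), "inflection": (-10.0, 3.0), "remote": (-2.0, 3.5)}

def amplitude(indices, f_prime, f_dprime):
    """|F| with f = f0 + f' + i f'' applied to the Se atom only."""
    f = f0.astype(complex)
    f[SE] += f_prime + 1j * f_dprime
    return abs(np.sum(f * np.exp(2j * np.pi * (coords @ indices))))

F_plus = {w: amplitude(hkl, *c) for w, c in corrections.items()}
F_minus = {w: amplitude(-hkl, *c) for w, c in corrections.items()}

for w in corrections:  # anomalous (Bijvoet) difference at each single wavelength
    print(f"{w:>10}: |F(hkl)| - |F(-h-k-l)| = {F_plus[w] - F_minus[w]:+.2f}")

# Dispersive difference: the same reflection measured at two wavelengths
print(f"dispersive, inflection - remote: {F_plus['inflection'] - F_plus['remote']:+.2f}")

# With f'' = 0, Friedel's law holds again even though f' is nonzero
print(f"f'' = 0: {amplitude(hkl, -8.0, 0.0) - amplitude(-hkl, -8.0, 0.0):+.2e}")
```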

Determination of the protein phase angles by MAD can be described as a special case of MIR with anomalous scattering (MIRAS), and is conceptually the same in terms of representation through Argand diagrams (Figure A.2). In MAD, a wavelength remote from the absorption edge is considered a "native" data set, and all other wavelengths are considered heavy atom derivatives. Thus, if three wavelengths were collected, with one serving as a "native" data set, then we could say that we have two dispersive (or isomorphous, in MIR terminology) derivatives and up to three anomalous derivatives, depending on the particular heavy atom and wavelengths that are used. If anomalous scattering is present, then a minimum of two wavelengths is required to determine the phase angles unambiguously (one dispersive plus one anomalous signal).

A.3.2 Advantages of MAD

The power of the MAD method resides in the fact that all the data required to solve the crystal structure can be collected from a single crystal. Since MIR requires data collected from multiple crystals, several systematic errors are introduced by non-isomorphism between the native and derivative unit cells. These errors can have marked effects on the accuracy of the phase angle determination. In the MAD method, since all the data are collected from one crystal with intrinsically perfect isomorphism, systematic errors are significantly reduced. As a result, MAD often yields extremely high quality electron density maps.

A.3.3 Historical Limitations of MAD

Although MAD is an excellent method for determining macromolecular crystal structures, its power has been exploited only recently. In fact, the theory for structure determination by MAD has existed for around fifty years, while the first structure actually solved by MAD was that of a basic blue copper protein in 1988 (5). Several technological advances in recent years have made the MAD method a practical alternative to MIR, and have revolutionized the field of protein crystallography.

First, MAD requires a source of X-rays whose wavelength can be changed. Laboratory X-ray sources produce X-rays at fixed characteristic wavelengths that cannot be tuned, so the ability to change wavelengths is limited to synchrotron sources. Synchrotron sources generate a continuous spectrum of X-rays, from which a particular wavelength can be selected with a monochromator. Further, only recently has wavelength selection at synchrotron sources become reliable enough for the very subtle differences that are required for MAD (see Table 3.3 and section A.3.4 below).

The second major innovation was the near universal adoption of cryogenic procedures for data collection. X-rays induce free radical reactions within most macromolecular crystals that damage or even destroy them. Since MAD requires a large amount of data to be collected from one crystal using intense synchrotron radiation, most protein crystals would decay before data collection was complete. Within the last few years, instrumentation has been developed to allow data collection at temperatures in the vicinity of 100 K. At these temperatures, free radical reactions are significantly slowed, and most protein crystals survive a complete data collection, even when exposed to high-intensity synchrotron radiation. A third, related technical advance is the use of CCD detectors to measure the diffracted X-rays. CCD detectors are more sensitive and have faster data readout than conventional X-ray detectors such as phosphor-storage image plates, and therefore speed up data collection significantly. A fourth innovation that has made MAD a practical method concerns the incorporation of heavy atoms (anomalous scatterers) into crystals. The recent application of molecular biology techniques to incorporate selenium as a heavy atom has had a dramatic impact on the use of MAD by crystallographers. Selenium may be incorporated into recombinantly expressed proteins by substituting selenomethionine (SeMet) for methionine. In the case of expression in E. coli, an auxotrophic mutant cell line that is incapable of synthesizing methionine, such as B834(DE3), is used. The cells are then grown in a defined medium lacking methionine, but supplemented with SeMet.

Under these conditions of protein expression, SeMet replaces methionine almost completely, which can be shown by amino acid analysis or mass spectrometry. Although the final yields of purified protein are usually lower, and the purified protein is slightly more prone to oxidation and aggregation, this is an excellent method for specifically introducing heavy atoms. Fortunately, in most cases SeMet-derivatized proteins crystallize in conditions identical or very similar to those for the native protein. Additionally, selenium has an ideally suited absorption edge at 0.98 Å (12,638 eV), and is heavy enough to provide useful intensity differences resulting from anomalous scattering.
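The two figures quoted for the selenium edge are related through λ = hc/E, with hc ≈ 12.398 keV·Å; as a quick arithmetic check:

$$\lambda = \frac{hc}{E} \approx \frac{12.398\ \text{keV}\,\text{\AA}}{12.638\ \text{keV}} \approx 0.981\ \text{\AA}$$

which rounds to the 0.98 Å given above.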

A.3.4 Practical Aspects of MAD Data Collection

In the MAD method, a single crystal containing anomalously scattering atoms is subjected to data collection at multiple wavelengths. The particular wavelengths are chosen by first performing an X-ray fluorescence scan on the crystal in the vicinity of the absorption edge of the heavy atom that is present. An X-ray fluorescence spectrum for SeMet-derivatized calpain is illustrated in Figure 3.7. In most MAD experiments, data are collected at the peak (maximum f''), the inflection point (minimum f') and a remote wavelength.
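Choosing the peak and inflection-point energies from a fluorescence scan can be sketched in a few lines of Python (NumPy). The scan below is synthetic: an absorption step with an assumed edge position matching the 12,638 eV quoted for selenium, plus a "white line" peak, with invented widths and heights; it is not the calpain scan of Figure 3.7. The peak is taken where the fluorescence, and hence f'', is largest, and the inflection point where the edge rises most steeply, which is approximately where f' reaches its minimum. Edge-analysis programs derive f' and f'' from the scan through a Kramers-Kronig transform before wavelengths are chosen; this is only a simplified stand-in for that step, and the remote wavelength, chosen well away from the edge, is not computed.

```python
import numpy as np

# Synthetic scan: an absorption step plus a "white line" just above the edge.
# Edge position, widths and heights are assumptions chosen for illustration.
energy = np.linspace(12580.0, 12700.0, 601)   # eV, 0.2 eV steps
edge, width = 12638.0, 2.0
fluorescence = 1.0 / (1.0 + np.exp(-(energy - edge) / width))
fluorescence += 0.6 * np.exp(-(((energy - (edge + 3.0)) / 2.0) ** 2))

# Peak: f'' is largest where absorption (tracked by fluorescence) is largest
peak_E = energy[np.argmax(fluorescence)]
# Inflection: steepest rise of the edge, approximately where f' reaches its minimum
inflection_E = energy[np.argmax(np.gradient(fluorescence, energy))]

print(f"peak       {peak_E:.1f} eV")
print(f"inflection {inflection_E:.1f} eV")
```

Because the two chosen energies lie only a few eV apart, this also illustrates why reliable, finely controlled wavelength selection at the beamline is essential for MAD.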

Since the signal resulting from anomalous dispersion is relatively small (usually less than 5% of the scattering intensity), great care must be taken to ensure that all reflections are measured as accurately as possible (there is always a small error associated with the measured intensities). Even under cryogenic conditions, the diffraction characteristics of a crystal can change slightly over prolonged periods of X-ray exposure. Thus, it is important to measure the Friedel mates (hkl) and (-h-k-l) close together in time. Two data collection strategies are therefore routinely employed. When data are collected in inverse-beam geometry, data are collected in wedges. If the reflection (hkl) is recorded on the detector at a given crystal orientation, the (-h-k-l) reflection is recorded when the crystal has been rotated by exactly 180°. Thus, a wedge of data (typically 10°) is collected (say from 0° to 10°), and then the Friedel mates of these reflections are recorded (from 180° to 190°). This ensures that all (hkl) and (-h-k-l) reflections are recorded close together in time, and thus errors due to X-ray-induced crystal damage are reduced. When data are collected in mirror geometry, the crystal is carefully aligned with respect to the detector such that a mirror image exists on the diffraction pattern. As a result, the (hkl) and (-h-k-l) reflections are recorded on the same diffraction image. This has the advantage of collecting the Friedel mates at exactly the same time, while also minimizing the systematic errors that would otherwise be introduced by the relative scaling of diffraction intensities on different images.

A.3.5 Summarizing the MAD Method

MAD is an extremely powerful method for solving the crystallographic phase problem. The strength of MAD stems from the fact that systematic errors are reduced, as all required data are collected from one crystal, so the resultant electron density maps are of very high quality. MAD relies on the fact that heavy atoms scatter X-rays anomalously at characteristic wavelengths, and the resulting intensity differences allow the anomalous scatterers to be located. Recent technological advances, including recombinantly expressed SeMet-labeled proteins, cryogenic (and more rapid) data collection and more convenient access to wavelength-tunable synchrotron sources, have allowed practical implementation of this technique, whose power has been known in theory for quite some time.

Anomalous scattering effects are relatively small, so great care must be taken to ensure accurate collection of MAD data.

References for Appendix A:

1. Drenth, J. (1994) Principles of Protein X-ray Crystallography. Springer-Verlag, New York.

2. Carter, C.W. and Sweet, R.M. (eds) (1997) Methods in Enzymology, Volumes 276 and 277: Macromolecular Crystallography, Parts A and B. Academic Press.

3. Stout, G.H. and Jensen, L.H. (1989) X-ray Structure Determination: A Practical Guide. John Wiley & Sons, New York.

4. Blundell, T.L. and Johnson, L.N. (1976) Protein Crystallography. Academic Press, London.

5. Guss, J.M., Merritt, E.A., Phizackerley, R.P., Hedman, B., Murata, M., Hodgson, K.O. and Freeman, H.C. (1988) Phase determination by multiple-wavelength X-ray diffraction: crystal structure of a basic "blue" copper protein from cucumbers. Science 241, 806-811.