Structural Studies of the CLIC

Dene R. Littler

A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy

April 2005

School of Physics The University of New South Wales Sydney 2052 Australia

Abstract

This thesis presents the structures of several CLIC proteins, as determined by X-ray crystallography. Conclusions are drawn relating to the CLIC hypothesis and evolutionary relationships within the CLIC family.

The CLICs are a group of proteins first discovered when the founding family member bound to a diuretic compound that blocks chloride ion channels in bovine kidney. It has been hypothesised that the CLIC proteins themselves may be these chloride ion channels. However, the CLICs have a distinct soluble state that is known to adopt a glutathione transferase-like fold. If the CLICs form ion channels, they must be able to transit from this soluble state to an alternative membrane-spanning state, which would require them to be structurally dynamic. Previously, the glutathione transferase fold had mainly been known from proteins with enzymatic activities and, as such, was not thought to be an exceptionally dynamic fold. This thesis demonstrates that oxidation of CLIC1 results in dimerisation in vitro; the crystal structure of this dimeric form was solved at 1.8 Å resolution. In this structure a large-scale structural change occurs within the N-terminal domain of the glutathione transferase fold, which involves the restructuring and loss of the monomer’s β-sheet. Residues thus exposed form the new dimer interface. An intramolecular disulfide bond is formed during dimerisation, and mutational studies show that this bond is required for the process. One of the cysteines involved in this intramolecular disulfide bond is unique to CLIC1. The structural rearrangement that occurs during CLIC1 dimerisation demonstrates that the N-terminal domain of the CLIC fold is capable of being structurally dynamic. The inability of other CLIC family members to form a similar structure limits the conclusions that can currently be drawn about channel formation.

The soluble monomeric structures of human CLIC4 and the CLIC homologues from Drosophila melanogaster and Caenorhabditis elegans are also presented (solved at 1.8 Å, 1.9 Å and 1.9 Å resolution, respectively). Like CLIC1, these all adopt a glutathione transferase fold, albeit with some minor differences. Based on these structures and on an analysis of sequence similarities, evolutionary relationships within the CLIC family are discussed.

Originality statement

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person or substantial portions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgment is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.

Papers resulting from this thesis

The work presented in Chapter 5 of this thesis has been published previously in the paper:

Littler D. R., Harrop S. J., Fairlie D. W., Brown L. J., Pankhurst G. J., Pankhurst S., DeMaere M. A., Campbell T. J., Bauskin A. R., Tonini R., Mazzanti M., Breit S. N. and Curmi P. M. G. “The Intracellular Chloride Ion Channel Protein CLIC1 Undergoes a Redox-controlled Structural Transition” The Journal of Biological Chemistry Vol. 279, No. 10, Issue of March 5, pp 9298-9305, 2004

The work presented in Chapter 4 has been published in the paper:

Littler D. R., Assaad N. N., Harrop S. J., Brown L. J., Pankhurst G. J., Luciani P., Aguilar M. I., Mazzanti M., Berryman M. A., Breit S. N., Curmi P. M. G. “Crystal structure of the soluble form of the redox-regulated chloride ion channel protein CLIC4” FEBS Journal Vol. 272, No. 19, Issue of Oct, pp 4996-5007, 2005

Acknowledgements

The great scientist Isaac Newton is credited with the saying “If I have seen further it is by standing on the shoulders of Giants”. I’m working on the basis that stacking up a large number of normal people should also work. I’m grateful to all of the load bearers included in this pile, who helped in the creation of this thesis.

In particular I thank my supervisor Paul Curmi and co-supervisor Sam Breit for the opportunity to work on the project. I thank Paul for the many proof-reads, encouragement, ideas and infectious sense of humour. Sam, the founder of the project, has been invaluable for his organisational skills and intellectual input.

Thanks also go to all the members of the crystallography lab for making my stay so enjoyable: Stephen Harrop for his help with data processing and modelling, as well as the beneficial formal* and informal† discussions regarding the CLICs; Louise Brown for the many hours spent performing her much loved efflux experiments or mulling ideas over during coffee; Krystyna Wilk, the unofficial manager of the lab’s mental health, a trying task at the best of times; Jason Whittaker for general advice on science and life, before he moved on to bigger and better things; Gary Keenan and Tamara Reztsova for fixing all that breaks; Matt DeMaere, who performed much of the early crystallographic work on CLIC1; and Andrew Mynott and Mika Jormakka, who will provide it in the future.

Gratitude is also due to the members of the Breit lab who were vital to the project. I crossed paths with Doug Fairlie only briefly but utilised much of his work; many a Friday morning meeting was spent with Greg Pankhurst, to whom thanks are also due for his Biacore experiments and Westerns; Dave Smith cloned a number of the CLICs into expression vectors; and more recently Lele Jiang contributed her in-cell work.

To the overseas collaborators: Grazie to Antonella, Ivan and Michele, who made my stay in the Mazzanti lab in Rome so great; also to Mark Berryman from Ohio for his work on dmCLIC, and to Katie Berry and Oliver Hobert in New York for their work on EXC-4.

On a personal note, I’d like to thank my family for all their support. To my mother Gill, it’s hard to sum up twenty-odd years’ worth of appreciation in the space available. The same applies to family now in Scotland: Nyree, Cass, Liz and Craig. To Stu, thanks for the 10 years of flatting together. But most of all, I’d like to thank Simone for her support, love and understanding during the last few years. I couldn’t have done it without any of you.

* Discussions involving hallowed halls and coffee.
† Discussions involving beer and music.

Table of Contents

LIST OF TABLES

LIST OF FIGURES

ABBREVIATIONS USED IN THE THESIS

SPECIES NAMES USED IN THIS THESIS

CHAPTER 1 INTRODUCTION

1.1 Ion channels

1.2 The CLIC family
    1.2.1 p64
    1.2.2 CLIC1
    1.2.3 CLIC2
    1.2.4 CLIC3
    1.2.5 CLIC4
    1.2.6 CLIC5
    1.2.7 CLIC6
    1.2.8 CLIC protein interaction partners

1.3 Evidence for the CLICs as chloride channels

1.4 The Glutathione Transferase family
    1.4.1 Introduction
    1.4.2 The GST fold family
    1.4.3 The canonical GSTs
    1.4.4 The GST fold
    1.4.5 The GST dimer
    1.4.6 GST substrate binding sites

1.5 Conclusions

CHAPTER 2 REVIEW OF THE CRYSTAL STRUCTURE OF REDUCED CLIC1

2.1 Introduction

2.2 Conclusions

CHAPTER 3 CLIC STRUCTURE AND EVOLUTIONARY RELATIONSHIPS

3.1 Introduction

3.2 The clic gene structure
    3.2.1 The vertebrate clic gene structure
    3.2.2 The invertebrate clic gene structure

3.3 CLIC homologues
    3.3.1 Introduction
    3.3.2 Deuterostome CLICs
    3.3.3 CLICs from other Metazoan phyla
    3.3.4 Cnidarian CLICs

3.4 Sequence alignment

3.5 Conclusions

CHAPTER 4 CRYSTAL STRUCTURE OF REDUCED CLIC4

4.1 Introduction

4.2 Materials and Methods
    4.2.1 Cloning
    4.2.2 Protein expression and purification
    4.2.3 Crystallisation
    4.2.4 Data collection
    4.2.5 Structure determination and refinement

4.3 Results

4.4 Discussion
    4.4.1 Comparison of CLIC4(ext) and CLIC1 structures

4.5 Conclusions

CHAPTER 5 CRYSTAL STRUCTURE OF OXIDISED CLIC1

5.1 Introduction
    5.1.1 Reversible redox-induced dimerisation of CLIC1

5.2 Materials and Methods
    5.2.1 Expression constructs
    5.2.2 Protein expression and purification
    5.2.3 Oxidation and dimer formation
    5.2.4 Crystallisation
    5.2.5 Data collection
    5.2.6 Structure determination and refinement
    5.2.7 Ramachandran distance

5.3 Results
    5.3.1 Structure of the oxidised dimer of CLIC1
    5.3.2 Comparison of the CLIC1 monomer and dimer
    5.3.3 The effects of single amino acid mutants on dimer formation
    5.3.4 The presence of Cys-59 is necessary but not sufficient for dimer formation
    5.3.5 Reduction of CLIC1 prevents channel formation

5.4 Discussion
    5.4.1 The CLIC1 dimer
    5.4.2 Redox induced conformational change
    5.4.3 Dimer formation and the GSTs

CHAPTER 6 THE INVERTEBRATE CLIC-LIKE PROTEINS

6.1 Introduction

6.2 Materials and Methods
    6.2.1 Cloning
    6.2.2 Protein purification
    6.2.3 Crystallisation
    6.2.4 Data collection
    6.2.5 Structure solution and refinement

6.3 Results
    6.3.1 Protein purification and oxidation
    6.3.2 Crystal structure of dmCLIC
    6.3.3 Crystal structure of EXC-4
    6.3.4 Comparison of the invertebrate and vertebrate CLIC structures

6.4 Discussion
    6.4.1 Overview of the invertebrate CLIC structures
    6.4.2 The putative CLIC glutathione binding sites
    6.4.3 Helix 6 N-capping motif and hydrophobic staple
    6.4.4 Cis-proline loop
    6.4.5 N- and C-terminal extensions
    6.4.6 Crystallographic dimer
    6.4.7 Helix 4
    6.4.8 DmCLIC and EXC-4 hydropathy
    6.4.9 Conclusions

CHAPTER 7 DISCUSSION

7.1 Introduction

7.2 Conclusions from this thesis

7.3 The evidence for the CLICs as ion channels

7.4 Future work

REFERENCES

List of Tables

Table 1-1 CLIC family name aliases
Table 1-2 anti-CLIC antibodies in the literature
Table 1-3 Reported intracellular distributions of CLIC1
Table 1-4 Reported intracellular distributions of CLIC4
Table 1-5 Reported intracellular distributions of CLIC5
Table 1-6 Reported intracellular distributions of CLIC6
Table 1-7 Miscellaneous CLIC interaction partners
Table 1-8 Chloride conductances reported to be associated with the CLIC family
Table 3-1 Known chordate CLIC sequences
Table 3-2 Known CLIC-like sequences within the Arthropods
Table 3-3 Known nematode CLIC-like sequences
Table 4-1 Data reduction and refinement statistics for CLIC4(ext)
Table 5-1 Data reduction and refinement statistics for oxidised CLIC1
Table 5-2 activity of CLIC1
Table 6-1 Data reduction and refinement statistics for dmCLIC and EXC-4
Table 7-1 Summary of the arguments for and against the CLICs acting as ion channels

List of Figures

Figure 1-1 Chemical structure of indanyloxyacetic acid 94(+) [16]
Figure 1-2 Sequence alignment of the CLIC3 ESTs and the cloning constructs used by Qian et al. [37]
Figure 1-3 Chemical structures of the inhibitors discussed in the text
Figure 1-4 Phylogram of proteins with the GST fold containing an active site cysteine
Figure 1-5 Simplified overview of the GST fold superfamily
Figure 1-6 Domain and motif hierarchy of the GST fold
Figure 1-7 Structure of human thioredoxin
Figure 1-8 Structure of the GST C-terminal domain
Figure 1-9 Subunit structure in the GST family
Figure 1-10 Cartoon representation of a pi class GST dimer
Figure 1-11 The G-site and H-site in a human pi class GST
Figure 2-1 Features of the CLIC1 structure
Figure 2-2 The three conserved cysteines in the CLICs
Figure 3-1 Clic gene structure
Figure 3-2 Phylogram of the human CLICs
Figure 3-3 CLIC1 pseudogenes
Figure 3-4 Exon structure of the insect and nematode CLIC homologues
Figure 3-5 Structure of the C. intestinalis CLIC gene
Figure 3-6 Proposed metazoan molecular phylogenies
Figure 3-7 Phylogram of CLIC and CLIC-like proteins
Figure 3-8 Overview of Runt and CLIC family members
Figure 4-1 CLIC4(ext) crystals
Figure 4-2 Diffraction image from CLIC4(ext) crystal
Figure 4-3 CLIC4(ext) structure
Figure 4-4 Features of the CLIC4(ext) structure (overleaf)
Figure 5-1 Biochemical analysis of the oxidised CLIC1 dimer
Figure 5-2 Dimeric CLIC1(wt) crystals
Figure 5-3 Diffraction image from a dimeric CLIC1 crystal
Figure 5-4 Structure of the oxidised CLIC1 dimer
Figure 5-5 Transition between the monomeric and dimeric CLIC1 forms
Figure 5-6 Surface complementarity of the dimer interface
Figure 5-7 Stereo view of the dimer interface
Figure 5-8 Hydrophobic pocket formation at the dimer interface
Figure 5-9 ClustalW [120] alignment of the CLIC family
Figure 5-10 Ramachandran distance plots
Figure 5-11 Cys-24 and Cys-59 are essential for dimer formation
Figure 5-12 Oxidation of several CLIC1 single amino acid mutants
Figure 5-13 The CLIC1 interdomain loop
Figure 5-14 Oxidation of CLIC1(T52P) and CLIC4(A70C) mutants
Figure 5-15 Electrophysiological characterisation of the CLIC1 monomer and oxidised dimer
Figure 5-16 Chloride efflux from liposomes exposed to monomer or dimer
Figure 5-17 Redox switch in the transcription regulator protein OxyR
Figure 6-1 The C. elegans exc phenotype
Figure 6-2 Crystals of DmCLIC
Figure 6-3 Oxidation of the invertebrate CLICs
Figure 6-4 Cartoon representation of dmCLIC
Figure 6-5 The dimer within the asymmetric unit in the P2₁ EXC-4 crystal form
Figure 6-6 Comparison of CLIC4 and the invertebrate CLICs
Figure 6-7 Sequence alignment of CLIC1, dmCLIC and EXC-4
Figure 6-8 Differences in loop regions between dmCLIC and CLIC1
Figure 6-9 Differences between EXC-4, dmCLIC and CLIC1
Figure 6-10 Cartoon representation of the dmCLIC C-terminal extension
Figure 6-11 Cartoon representation of the EXC-4 C-terminal extension
Figure 6-12 The CLIC redox active site
Figure 6-13 Conformation of the CxxS, CxxC and DxxC motifs in CLIC1, dmCLIC and EXC-4
Figure 6-14 Helix 6 N-terminal capping motif in dmCLIC
Figure 6-15 DmCLIC cis-proline loop prior to β-strand 3
Figure 6-16 Residues interacting with GSH in the CLICs and various classes of the GSTs
Figure 6-17 Structure of helix 4 in dmCLIC and EXC-4
Figure 6-18 Kyte-Doolittle hydropathy plot of various CLICs around their putative transmembrane regions

Abbreviations used in the thesis

2YT         Two yeast tryptone
A9C         Anthracene-9-carboxylic acid
aa          Amino acid(s)
AKAP350     Protein kinase A-anchoring protein 350
Amp         Ampicillin
bp          Base pair(s)
Cam         Chloramphenicol
cAMP        Cyclic adenosine 3’,5’-monophosphate
cDNA        Complementary deoxyribonucleic acid
CDNB        1-chloro-2,4-dinitro-benzene
C-domain    Carboxy terminal domain
CLIC        See text, Section 1.2
C-terminal  Carboxy terminal
Da          Daltons
DCPIB       4-(2-Butyl-6,7-dichloro-2-cyclopentyl-indan-1-on-5-yl) oxybutyric acid
DIDS        4,4’-diisothiocyanatostilbene-2,2’-disulphonic acid
DIOA        [(dihydroindenyl) oxy] alkanoic acid
DNDS        4,4’-dinitrostilbene-2,2’-disulphonate
DHAR        Dehydroascorbate reductase
DR          Dopamine receptor
DsbA        Disulphide bond isomerase A
DTT         Dithiothreitol
EA          Ethacrynic acid
ERK7        Extracellular signal-regulated kinase 7
EST         Expressed sequence tag
GDAP1       Ganglioside induced differentiation associated protein 1
GFP         Green fluorescent protein
Grx         Glutaredoxin
GSH         Reduced glutathione (γ-glutamyl-cysteinyl-glycine)
GSSG        Oxidised glutathione
GST         Glutathione S-transferase
HA          Hemagglutinin
HEPES       4-(2-hydroxyethyl)-1-piperazineethanesulphonic acid
IAA         Indanyloxyacetic acid
IPTG        Isopropyl-β-D-thiogalactopyranoside
IQGAP1      IQ motif containing GTPase activating protein 1
LB          Luria-Bertani
MBP         Maltose-binding protein
MQAE        [N-(ethoxycarbonylmethyl)-6-methoxyquinolinium bromide]
MUPP1       Multi-PDZ domain protein 1
MW          Molecular weight
N-domain    Amino terminal domain
NLS         Nuclear localisation sequence
NPPB        [5-nitro-2-(3-phenylpropylamino)-benzoic acid]
nt          Nucleotide(s)
N-terminal  Amino terminal
ORF         Open reading frame
Pa          Pascal
PBS         Phosphate buffered saline
PCR         Polymerase chain reaction
PDB         Protein Data Bank
PKA         Protein kinase A
PKC         Protein kinase C
PP1         Protein phosphatase 1
r.m.s.d.    Root mean square deviation
S.D.        Standard deviation
SDS-PAGE    Sodium dodecyl sulphate polyacrylamide gel electrophoresis
SspA        Stringent starvation protein A
TGF         Transforming growth factor
TNF         Tumour necrosis factor
TRAPP       Transport protein particle
TRIS        Tris(hydroxymethyl) aminomethane
TS-TM       5, 11, 17, 23 tetrasulphonato-25, 26, 27, 28 tetramethoxy
VLR         Variable lymphocyte receptors

Species names used in this thesis

Scientific name                 Common name
Acyrthosiphon pisum             pea aphid (insect)
Aedes aegypti                   yellow fever mosquito (insect)
Amblyomma variegatum            tick (arachnid)
Ambystoma mexicanum             Mexican walking fish (amphibian)
Anopheles gambiae               malarial mosquito (insect)
Apis mellifera                  honey bee (insect)
Ascaris suum                    nematode (clade III, vertebrate parasite)
Bombyx mori                     domestic silkworm (insect)
Bos taurus                      domestic cattle (even-toed ungulate)
Brugia malayi                   nematode (clade III, vertebrate parasite)
Caenorhabditis briggsae         nematode (clade V, bacteriovore)
Caenorhabditis elegans          nematode (clade V, bacteriovore)
Ciona intestinalis              tunicate (urochordate)
Ciona savignyi                  tunicate (urochordate)
Crassostrea gigas               Pacific oyster (mollusc)
Crassostrea virginica           eastern oyster (mollusc)
Ctenocephalides felis           common cat flea (insect)
Cyprinus carpio                 common carp (teleost fish)
Drosophila melanogaster         fruit fly (insect)
Drosophila pseudoobscura        fruit fly (insect)
Dugesia japonica                flatworm (platyhelminth)
Echinococcus granulosus         tapeworm (platyhelminth)
Fugu rubripes                   puffer fish (teleost fish)
Gallus gallus                   red junglefowl (bird)
Gasterosteus aculeatus          three-spined stickleback (teleost fish)
Glossina morsitans morsitans    tsetse fly (insect)
Haemonchus contortus            nematode (clade V, vertebrate parasite)
Helicoverpa armigera            cotton bollworm (insect)
Heterodera glycines             nematode (clade IV, plant parasite)
Homalodisca coagulata           glassy-winged sharpshooter (insect)
Homarus americanus              American lobster (crustacean)
Homo sapiens                    human (primate)
Hydra magnipapillata            hydra (cnidarian)
Ictalurus furcatus              catfish (teleost fish)
Litopenaeus vannamei            white prawn (crustacean)
Leucoraja erinacea              little skate (cartilaginous fish)
Macaca mulatta                  rhesus monkey (primate)
Marsupenaeus japonicus          penaeid prawn (crustacean)
Meloidogyne chitwoodi           nematode (clade IV, plant parasite)
Meloidogyne hapla               nematode (clade IV, plant parasite)
Meloidogyne incognita           nematode (clade IV, plant parasite)
Meloidogyne paranaensis         nematode (clade IV, plant parasite)
Monodelphis domestica           opossum (marsupial)
Mus musculus                    mouse (rodent)
Onchocerca volvulus             nematode (clade III, vertebrate parasite)
Oncorhynchus mykiss             rainbow trout (teleost fish)
Oryctolagus cuniculus           rabbit (lagomorph)
Oryzias latipes                 Japanese medaka (teleost fish)
Ostertagia ostertagi            nematode (clade V, vertebrate parasite)
Parastrongyloides trichosuri    nematode (clade V, bacteriovore)
Petromyzon marinus              sea lamprey (Hyperoartian)
Rattus norvegicus               rat (rodent)
Rhipicephalus appendiculatus    tick (arachnid)
Salmo salar                     Atlantic salmon (teleost fish)
Schistosoma japonicum           Japanese blood fluke (platyhelminth)
Schistosoma mansoni             blood fluke (platyhelminth)
Spermophilus lateralis          ground squirrel (rodent)
Strongylocentrotus purpuratus   sea urchin (echinoderm)
Strongyloides ratti             nematode (clade V, vertebrate parasite)
Strongyloides stercoralis       nematode (clade V, vertebrate parasite)
Sus scrofa                      wild boar (even-toed ungulate)
Tetraodon nigroviridis          puffer fish (teleost fish)
Trichinella spiralis            nematode (clade I, vertebrate parasite)
Trichuris vulpis                nematode (clade I, vertebrate parasite)
Xenopus laevis                  African clawed frog (amphibian)
Xenopus tropicalis              pipid frog (amphibian)

Attribution of Credit

The contributions of the various authors to the following paper, which describes the results presented in Chapter 5, are listed below:

“The Intracellular Chloride Ion Channel Protein CLIC1 Undergoes a Redox-controlled Structural Transition” The Journal of Biological Chemistry Vol. 279, No. 10, Issue of March 5, pp 9298-9305, 2004

Littler D. R. - Protein purification, crystal growth and manuscript writing.
Harrop S. J. - X-ray data collection, integration, structure solution and model building.
Fairlie D. W. - Size-exclusion chromatography and oxidation of the CLIC1 dimer.
Brown L. J. - Chloride efflux experiments.
Pankhurst G. J. - Biacore experiments.
Pankhurst S. - Initial preparation of the C24S, C59S and C89S CLIC1 mutants.
DeMaere M. A. - Initial identification of the CLIC1 dimer.
Campbell T. J. - Electrophysiology experiments.
Bauskin A. R. - Initial cloning of CLIC1.
Tonini R. - Electrophysiology experiments.
Mazzanti M. - Electrophysiology experiments.
Breit S. N. - Experimental design, manuscript writing and preparation.
Curmi P. M. G. - Experimental design, manuscript writing and preparation.

For the structures presented in Chapters 4 and 6, contributions include:

Assaad N. N. - CLIC4 purification and crystal growth.
Berry K. L. - Provision of the EXC-4 cDNA.
Berryman M. A. - Provision of pCLIC4, pCLIC4(ext) and pdmCLIC-L vectors.
Breit S. N. - Experimental design, manuscript writing and preparation.
Curmi P. M. G. - Experimental design, manuscript writing and preparation.
Harrop S. J. - X-ray data collection, integration, structure solution and model building of CLIC4(ext), EXC-4 and dmCLIC.
Smith D. - Cloning of the EXC-4 cDNA into the pGEX expression vector.

Chapter 1 Introduction

The CLICs are a family of proteins present in most, if not all, chordates, with CLIC-like homologues probably present in all animals. They are part of the glutathione transferase (GST) fold superfamily [3,4]. As the name of this superfamily suggests, the GST fold was first identified in the glutathione transferases, but the glutathione transferases comprise only a subset of the proteins with the GST fold. The glutathione transferases conjugate various hydrophobic substrates to the cysteine-containing tripeptide glutathione [5,6]. Instead of such an enzymatic function, it has been hypothesised that the CLIC proteins act as chloride-conducting ion channels in intracellular membranes [7-9]. If this turns out to be true, the CLICs are unusual transmembrane proteins, as they have a soluble state and no obvious stretches of hydrophobic residues from which to form a membrane-spanning state. While such characteristics are unusual, they are not unique, as some other proteins behave in a similar manner [10]. Several independent groups have shown that the CLIC proteins by themselves are sufficient to induce chloride conduction in in vitro membrane systems [11-13]. However, whether this is also their primary function in vivo remains to be conclusively proven.

Chapter 1 is an introductory chapter that summarises much of the current literature on the CLICs. After a brief introduction to ion channels in Section 1.1, Section 1.2 introduces the CLIC family as a whole and then each of the human CLIC family members individually, detailing how they were discovered, their cell and tissue profiles, and what is known about their biological function. Section 1.3 then introduces and analyses the arguments for and against the CLIC proteins behaving as ion channels. Finally, because of the relationship between the CLICs and the GSTs, Section 1.4 gives a brief overview of the glutathione transferase family, looking particularly at the GST fold. The conclusions to the chapter are presented in Section 1.5.

1.1 Ion channels

A eukaryotic cell contains multiple internal and external membranous systems. Through the selective control of which molecules are transported across these membranes, the cell is able to maintain different chemical environments on either side of a membrane. As chloride is one of the most physiologically important inorganic anions, ion channels selective for chloride play a vital role in such transport processes.

There have been many comprehensive reviews on chloride channels. The following summary is based on a recent review by Jentsch et al. [8]. Chloride channels are present in both plasma and intracellular membranes. They function in such processes as ion homeostasis, cell volume regulation, transepithelial transport and regulation of electrical excitability [8]. There are several molecularly distinct chloride channel families: the chloride channel (CLC) family; the cystic fibrosis transmembrane conductance regulator (CFTR); and the ligand-gated GABA and glycine receptors. As well as these three main chloride channel families, the less well characterised CLIC and CLCA gene families also exist [8]. The first of these poorly characterised families is the subject of this thesis.

All eukaryotic cells contain membrane-enclosed sub-cellular organelles that enable different cellular functions to be compartmentalised. The chemical environment inside these organelles differs from that of the cytoplasm and these differences are often critical for organelle function. Specific intracellular ion channels, transporters and pumps are required to maintain these intra-organellar environments. Intracellular anion channels are vital for such processes as organellar volume regulation and maintaining electroneutrality [8]. Maintaining electroneutrality is required as calcium and hydrogen ion fluxes across organelle membranes can lead to the build-up of positive charge, which hampers further uptake [8].

1.2 The CLIC family

The CLIC proteins are a family of proteins displaying both broad tissue and cellular distribution. There are currently six known human family members (CLICs 1-6), with CLIC2, CLIC5 and CLIC6 each possessing at least two isoforms. The family is highly conserved across species, with direct analogues to human family members observed in amphibians, birds, fish and rodents, while possible alternate family members have been found in sea squirts, nematodes and insects.

Each CLIC protein consists of a highly conserved ~230 amino acid CLIC module, with CLIC5 and CLIC6 also possessing additional unrelated hydrophilic N-terminal extensions. The conserved CLIC module shows between 40% and 80% sequence identity across the family within humans, while individual family members are highly conserved across species (e.g., human CLIC4 shows 98.8% identity to an orthologue in mice, 97.6% in rats, 84.6% in African clawed frogs and 85.8% in zebrafish). In contrast, the N-terminal extensions of CLIC5 and CLIC6 are less conserved across species than their respective CLIC modules (e.g., the N-terminus of human CLIC6 shows 38.0% identity to the N-terminus of CLIC6 in mice, while the CLIC module retains 95.7% identity).
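As an illustration of how pairwise identity figures of this kind can be obtained, the following minimal Python sketch counts identical residues over the ungapped columns of a pairwise alignment. The function name, the choice of denominator (ungapped columns only) and the toy sequences are assumptions made here for illustration; they are not the alignment program or convention used for the values quoted above, and other conventions (for example, dividing by the length of the shorter sequence) give slightly different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, NOT real CLIC sequence data.
toy_a = "MAEEQ-PLKV"
toy_b = "MSEEQAPLRV"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

Applied to a real alignment of two orthologous CLIC modules, a calculation of this form would yield a single percentage comparable in kind to those listed above, subject to the alignment and gap conventions chosen.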

Sections 1.2.1 to 1.2.7 of this chapter introduce the various CLIC family members individually, discussing each family member’s discovery, expression profile and possible biological functions. The one exception to this treatment is that, for historical reasons, the founding family member p64 is discussed separately from its human orthologue CLIC5.

Table 1-1 CLIC family name aliases.

Family member   Alias
CLIC1           CLIC-1, NCC27, p64CLCP, hRNCC
CLIC2           CLIC-2, XAP121
CLIC3           CLIC-3
CLIC4           CLIC-4, CLIC0, DKFZP566G223, CLIC4L, P64H1, H1, huH1, p64H1, mtCLIC, mc3s5
CLIC5           CLIC-5
CLIC6           CLIC1L, CLIC-6, CLIC5, parchorin

Before introducing the various CLIC family members, a brief discussion of the CLIC family nomenclature is in order. Landry et al. identified the founding CLIC family member, p64, at the molecular level in 1993 [14]. Over the course of the next few years, other CLIC family members were discovered with sequence similarity to the C-terminus of p64. The authors reporting the discovery of each of these homologues chose their own, often quite obscure, names (see Table 1-1). It subsequently became apparent that these p64 homologues were part of a conserved protein family with multiple members. When they discovered the third human p64 homologue, Heiss and Poustka suggested a much needed standardisation of the naming criteria for the family [34]. They proposed the usage of the acronym CLIC (Chloride Intracellular Channel) followed by a number designating order of discovery (e.g., CLIC2). This is now the generally accepted nomenclature for the CLIC family.

However, it is this author’s view that the acronym used within this naming scheme is not ideal, as it implies too much about each family member based upon sequence homology alone. For example, many, if not all, of the CLICs have yet to be proven conclusively to be chloride channels in their own right, as opposed to chloride channel regulators or proteins fulfilling other related transport or enzymatic roles. The assumption that all of the CLICs are likely to be confined to intracellular membranes also seems flawed. Indeed, when originally discovered in rabbits, CLIC6 partially localised to the plasma membrane [15]. Because of this localisation, Nishizawa et al. originally named the protein “Parchorin” in order to differentiate it from the CLIC acronym. However, when the human and mouse orthologues of this protein were identified, maintaining a naming scheme that reflected the protein’s homology was seen to be more important and they took the name CLIC6. Until either a change in the accepted nomenclature or a detailed characterisation of each family member occurs, the author would kindly ask the reader to parse “CLIC” as a single word within this thesis; the expanded acronym will not be referred to again in this text.

1.2.1 p64

Because few pharmacological ligands were then available for the characterisation of epithelial chloride channels, Landry et al., in a seminal paper, performed a series of experiments aimed at developing new chloride channel inhibitors [16]. These experiments centred on the measurement of a chloride conductance observed in membrane vesicles originating from bovine kidney cortex and trachea epithelium. Several moderately potent inhibitors were identified; the strongest inhibitor of the kidney cortex vesicles’ chloride conductance was indanyloxyacetic acid 94 (IAA-94), see Figure 1-1. While IAA-94 also inhibited the conductance observed in the apical membrane vesicle extracts from bovine trachea, it did so to a lesser extent [16].


Figure 1-1 Chemical structure of indanyloxyacetic acid 94(+) [16].

Having determined that the reconstituted chloride channel activity in their bovine kidney membrane extracts was sensitive to IAA-94, the same group then set about characterising this channel activity. The founding CLIC family member, p64, was identified as part of this process, through the affinity purification of bovine kidney extracts using an IAA-94 homologue [14,17].

Four proteins were found to bind to this IAA affinity column with apparent molecular weights, as determined by SDS-PAGE, of 27, 40, 64 and 97 kDa. The 27 kDa protein corresponded to a pi class GST, while the 97 kDa protein corresponded to the α subunit of Na+, K+-ATPase. Both proteins were known to bind ethacrynic acid, which has structural similarities to IAA-94, and were thus discounted from further study18. In retrospect this decision appears somewhat arbitrary, as p64 and the pi class GSTs were later found to share strong structural similarities (discussed in Section 1.4), most likely including similar IAA-94 binding sites19. Nonetheless, antibodies were raised against the remaining 40 and 64 kDa proteins; only those raised against p64 reached a titre sufficient for further experiments18. Before being reconstituted into phospholipid vesicles, solubilised bovine kidney extracts were immunodepleted with these anti-p64 antibodies. Analysis of the rate of potential-driven 36Cl- uptake by these vesicles showed that p64 immunodepletion reduced this chloride uptake. Furthermore, fractionation of the bovine kidney extracts using gel filtration showed that channel activity co-eluted with the 64 kDa protein18.

Landry et al.14 were then able to ascertain the molecular identity of this 64 kDa protein. A cDNA library constructed from bovine kidney cortex was screened with the anti-p64 sera and a positive clone obtained. By comparing a series of smaller clones a full-length ~ 6 kbp cDNA was determined. This contained a 1311 bp ORF encoding a 437 amino acid protein as well as an extremely long 3’ UTR. The predicted molecular weight for the 437 amino acid protein was 49 kDa, yet when translated in vitro the unmodified protein ran anomalously slowly on SDS-PAGE, with an apparent MW of 64 kDa, and reacted with the anti-p64 sera. In addition, antiserum raised against this in vitro product mimicked the behaviour of the anti-p64 sera, in that it was able to immunodeplete chloride conductance in the bovine kidney vesicle preparations14. These facts strongly indicated that this cDNA clone corresponded to p64, and that this protein was involved in chloride conductance. p64 consists of a distinct ~230 residue C-terminal module later defined as belonging to the CLIC family, and an approximately 200 residue hydrophilic N-terminal extension, which was found to be unique to p64.
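As a rough cross-check of the figures quoted for p64, the sketch below converts an ORF length into a residue count and a crude molecular weight estimate. It assumes that the quoted 1311 bp excludes the stop codon and uses an average residue mass of about 110 Da, a common rule of thumb that is not taken from the cited papers; the function names are illustrative only.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass


def residues_from_orf(orf_bp: int) -> int:
    """Number of codons in an ORF, assuming the length excludes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


def approx_mw_kda(n_residues: int) -> float:
    """Crude molecular weight estimate in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


n = residues_from_orf(1311)  # ORF length quoted for p64
print(n, round(approx_mw_kda(n), 1))
```

Under these assumptions the estimate lands close to the predicted 49 kDa and well below the apparent 64 kDa observed on SDS-PAGE, which is the discrepancy the paragraph describes.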

Northern blots using a probe based on the ORF of p64 identified a 6.4 kb transcript in bovine kidney cortex, skeletal muscle, heart, kidney medulla and adrenal gland, as well as a 7 kb transcript in the brain and adrenal gland14.

For various CLIC family members many of the early studies detailing their immunolocalisation at the tissue and cellular level suffer from the potential of cross-reactivity of the antibodies used with other CLIC family members, which at the time were not known to exist (see Table 1-2). As will be discussed in Chapter 3, nearly identical orthologues exist for all 6 human CLIC family members throughout most if not all of the vertebrates, while related CLIC-like proteins with lower sequence identities are seen in the invertebrates. In the papers by Redhead et al. and Landry et al. detailing the production of their anti-p64 sera and purification of the respective polyclonals, the final antibodies are seen to react with proteins in a variety of cell lines including the insect cell-line SF914,18. Although it is dependent on what part of the molecule these antibodies are directed against, given the vertebrate CLICs show a higher degree of conservation between family members than any one family member does to the currently known insect CLIC-like proteins, these results suggest that the anti-p64 sera used in these studies are unlikely to be specific to p64 and quite possibly react with at least some of the other CLIC family members. Less ambiguous p64 localisation studies have been performed for the human orthologue using antibodies raised against a peptide within the family member specific N-terminal extension20 (see Section 1.2.6 and Table 1-2).

Another possible problem with the production of anti-CLIC antibodies is that, across species for individual orthologues the almost invariant level of sequence conservation within the CLIC module severely limits the number of available epitopes, presumably exacerbating the potential for cross-reactivity. Indeed, when antibodies were raised against CLIC6 that contains an N-terminal extension that does not show the same level of inter-species conservation as the CLIC module, the epitope was located within this extension21. The repetitive nature, unusual amino acid composition and lack of conservation presumably makes these N-terminal extensions, in those CLIC family members that contain them, more antigenic than their respective CLIC modules. If this is the case, it is all the more confusing that the anti-p64 antibodies of Redhead et al. and Landry et al. are cross-reactive with a protein within an insect cell line, as it seems likely that these antibodies should be directed towards the N-terminal extension of p64, which would eliminate the possibility of cross-reactivity between CLIC family members, yet no equivalent sequence exists within the known insect genome or EST databases.

With these anti-p64 antibodies, p64 or related proteins were shown to localise to intracellular membranes in various cell types. In the human colon cancer cell line T84, endogenous protein was seen to localise to centralised intracellular vesicles22. These vesicles are carbohydrate rich and migrate to the periphery of the cell on treatment with phorbol 12-myristate 13-acetate (PMA: a protein kinase C activator) or Ca2+ ionophores, thus they resemble regulated excretory vesicles22. Additionally when p64 was transfected into the human pancreatic cell line PancI, a cell-line in which it does not appear to be endogenous, a prominent peripheral reticular vesicular pattern was observed22. These reticular vesicles appeared to be regulated in a manner similar to those observed with endogenous p64 staining in the T84 cell-line22.

Redhead et al. also transfected PancI cells with p64 constructs truncated at the N- or C-terminus. From this they were able to conclude that the cellular localisation of p64 is dependent on signals at both termini. The C-terminal signal sequence(s) was seen to be responsible for the suppression of protein translocation to the plasma membrane, while the N-terminal sequence(s) ensured targeting to distinct intracellular vesicles22.

Table 1-2 anti-CLIC antibodies in the literature
A description of various antibodies raised against CLIC family members reported in the literature.

| Antibody name | Study first reported in | Raised against | Animal of origin | Monoclonal/ polyclonal | Purification | Epitope | Possible cross-reactivity | Measured cross-reactivity |
|---|---|---|---|---|---|---|---|---|
| Anti-p64 | Redhead et al. 1992 | IAA-23 and SDS-PAGE purified bCLIC5B | Guinea pig | Polyclonal | Affinity purified using β-Gal-bCLIC5B fusion | ? | CLICs 4-6 are highly homologous | Reacts with a ~64 kDa band in insect cells |
| AP95/ anti-H2B | Landry et al. 1993 | SDS-PAGE purified β-Gal-bCLIC5B fusion protein | Guinea pig | Polyclonal | Affinity-purified using β-Gal-bCLIC5B fusion protein | ? | CLICs 4-6 are highly homologous, CLIC5A and CLIC5B isoforms share C-terminal 230aa | Reactive with a ~64 kDa band in insect cells. Not reactive against hCLIC1 or hCLIC4, no obvious bands on Western blot corresponding to size of CLIC5A |
| Ab990 | Howell et al. 1996 | IgG-binding domains of Staphylococcal protein A fused to residues 42-184 of rCLIC4 | Rabbit | Polyclonal | Not reported | Within CLIC4 residues 42-184 | CLICs 4-6 are highly homologous within this region | - |
| Pan-CLIC/ 656 | Schlesinger et al. 1997 | Peptide with gCLIC5B 18 C-terminal residues | Rabbit | Polyclonal | Affinity purified using peptide | C-terminal peptide | CLICs 1,2,4 and 6 share 12, 10, 12 and 13 of the 16 amino acids in epitope | Cross reactive with CLIC1 and CLIC4 and other unidentified bands |
| Anti-NCC27 | Valenzuela et al. 1997 | bacterially expressed GST-hCLIC1 | Rabbit | Polyclonal | Not reported | ? | ? | - |
| AP823 | Tulk et al. 1998 | Peptide with hCLIC1 C-terminal residues 228-241 | Rabbit | Polyclonal | Affinity-purified using His-hCLIC1 | C-terminal peptide | CLICs 4, 5 and 6 share 9, 11 and 10 respectively of the 13 amino acids within the epitope | Not reactive against bCLIC5B, mildly cross-reactive against hCLIC4. |
| CUMC29 | Chuang et al. 1999 | GST-hCLIC4 fusion consisting of the 158 C-terminal residues | Rabbit | Polyclonal | Preabsorbed with GST and MBP then affinity purified against MBP-hCLIC4 fusion | Within 158 C-terminal residues of CLIC4 | CLICs 4-6 are highly homologous | Cross-reactive with bCLIC5B and hCLIC1 |
| p64H1 antibody | Chuang et al. 1999 | GST-hCLIC4 fusion consisting of the 158 C-terminal residues | Rabbit | Polyclonal | Immunodepletion of CUMC29 against GST-p64 and MBP-CLIC1 | Within 158 C-terminal residues of CLIC4 | CLICs 4-6 are highly homologous | Does not react with bCLIC5B or hCLIC1 |
| AP1058 | Edwards et al. 1999 | Affinity purified GST-hCLIC4 fusion | Rabbit | Polyclonal | Immunodepletion with GST then affinity-purified | ? | CLICs 4-6 are highly homologous | Not reactive against hCLIC1 or bCLIC5B |
| Anti-N terminus CLIC4 | Fernandez-Salas et al. 1999 | Peptide with unique mCLIC4 N-terminal residues 1-17 | Rabbit | Polyclonal | Affinity purified with peptide | CLIC4 residues 1-17 | None | - |
| Anti-C terminus CLIC4 | Fernandez-Salas et al. 1999 | Peptide with mCLIC4 C-terminal residues 238-253 | Rabbit | Polyclonal | Affinity purified with peptide | CLIC4 residues 238-253 | CLIC5 and CLIC6 share 10 and 9 of the 16 residues in the epitope | See pan-CLIC which is raised against highly similar peptide |
| Polyclonal anti-parchorin | Urushidani et al. 1999 | HPLC purified oCLIC6 | Rat | Polyclonal | Not reported | ? | CLICs 4-6 are highly homologous | - |
| Monoclonal anti-parchorin | Urushidani et al. 1999 | SDS-PAGE purified oCLIC6 | Mouse | Monoclonal | 4 isolates | Within the CLIC6-unique N-terminal residues 159-262? | CLIC6 A and B isoforms have identical N-termini | - |
| B-132/ APB132 | Berryman and Bretscher 2000 | SDS-PAGE purified His-hCLIC5A | Rabbit | Polyclonal | CLICs 1 and 4 immunodepleted then affinity-purified | ? | CLICs 4-6 are highly homologous, CLIC5A and CLIC5B isoforms share C-terminal 230aa | - |
| B-134/ APB134 | Berryman and Bretscher 2000 | SDS-PAGE purified His-hCLIC4 | Rabbit | Polyclonal | Preabsorbed with CLICs 1 and 5A then affinity-purified | ? | CLICs 4-6 are highly homologous | - |
| B-121/ APB121 | Berryman and Bretscher 2000 | SDS-PAGE purified His-hCLIC1 | Rabbit | Polyclonal | Preabsorbed with CLICs 4 and 5A then affinity-purified | ? | CLIC1 and CLIC2,4-6 are slightly homologous | - |
| Anti-CLIC5B | Shanks et al. 2002 | Peptide with unique hCLIC5B N-terminal residues 14-26 | Goat | Polyclonal | Affinity-purified with epitope | N-terminal peptide | None | - |
| Anti-CLIC2 | Board et al. 2004 | purified His-hCLIC2 | Rabbit | Polyclonal | Not reported | ? | - | - |

Abbreviations are bCLIC: bovine CLIC, gCLIC: chicken CLIC, hCLIC: human CLIC, mCLIC: mouse CLIC, rCLIC: rat CLIC, oCLIC: rabbit CLIC.


The p64-related chloride channel activity has been measured by several means. In a paper by Edwards et al., p64 was expressed in HeLa cells and the cell membranes extracted. Planar lipid bilayers composed of cell membranes from the p64-transfected cells, but not controls, showed a 42 pS single channel conductance that was outwardly rectifying and anion selective23. In a larger population-based study, membrane extracts were reconstituted into lipid vesicles and chloride efflux measured using a chloride sensitive electrode. Vesicles containing reconstituted HeLa cell extracts from p64-transfected cells showed a 28% increase in chloride efflux, compared to controls23.

The p64-related chloride channel activity is likely to be regulated by multiple phosphorylation pathways. The epithelial chloride conductance used for the development of the inhibitory ligand IAA-94 responded to various ATP homologues in a manner that suggested a co-purified kinase was capable of closing the channels responsible for this conductance16. Edwards et al. have shown that a tyrosine residue within the N-terminus of p64 is phosphorylated when co-expressed with the Src tyrosine kinase family member p59fyn, with this tyrosine subsequently becoming part of a ligand-binding site for the SH2 domain of the kinase24. In HeLa cells co-expressing both p64 and p59fyn chloride channel activity was seen to be greater than those cells expressing either alone24.

The studies on bovine p64 have stagnated somewhat in the last few years, presumably due to the elucidation of orthologous and homologous proteins in better-characterised organisms with more refined experimental systems. Although the focus may have moved, the discovery of p64 as founding member, and its initial characterisation as a chloride channel have greatly influenced the development of our knowledge of the CLIC family. If for example, the similarity between the CLICs and the glutathione transferase family had been elucidated earlier, it seems likely that studies would have concentrated on their possible interaction with glutathione and on attempts to find substrates at the expense of their potential chloride channel activity.

1.2.2 CLIC1

Although not the first p64 homologue to be discovered (see Section 1.2.5), CLIC1 was the first to be found in humans. It was initially identified from a cDNA library that was designed to be enriched for genes associated with the activation of monocytoid cells25. The cell line used to create this cDNA library, U937, exhibits a monocytic lineage and is able to differentiate in vitro into cells with macrophage characteristics, including the ability to be activated through exposure to PMA. As such, U937 cells can serve as a model for macrophage activation. Based on this model, Valenzuela et al. created a subtracted cDNA library between activated and non-activated differentiated U937 cells, in an attempt to locate genes upregulated during macrophage activation. CLIC1 (initially called NCC27) was one of the clones isolated from this library.

CLIC1 was subsequently identified independently from a cDNA library created from the human pancreatic cell line PancI, when screened at low stringency using a probe for p6426. Both this cDNA clone obtained by Tulk et al. and a later genomic analysis of the gene sequence27 revealed that the original CLIC1 cDNA clone obtained by Valenzuela et al.25 contained a guanine to cytosine mutation at nucleotide position 405. This led to an amino acid mutation at position 63 from glutamine to glutamic acid. Therefore the initial characterisation experiments presented in the 1997 paper by Valenzuela et al. used the CLIC1 mutant Q63E.
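To illustrate why a single base change is enough to interchange glutamine and glutamic acid, the sketch below uses the four relevant entries of the standard genetic code, in which the codons for the two residues differ only at the first position (C versus G). The codon actually present at residue 63 of CLIC1 is not stated in the text, so both glutamine codons are shown; the helper name is illustrative.

```python
# Standard genetic code entries for the two residues involved (DNA codons).
CODONS = {"CAA": "Q", "CAG": "Q", "GAA": "E", "GAG": "E"}


def swap_first_base(codon: str, new_base: str) -> str:
    """Return the codon with its first base replaced by new_base."""
    return new_base + codon[1:]


for codon in ("CAA", "CAG"):
    variant = swap_first_base(codon, "G")
    print(codon, CODONS[codon], "<->", variant, CODONS[variant])
```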

CLIC1 is highly conserved across species with fully sequenced orthologues found in mammals (98.3% identity in M. musculus), amphibians (74.3% identity in X. laevis) and fish (75.9% identity in D. rerio). Valenzuela et al. have also reported that a Southern blot using a probe based on CLIC1 cDNA showed hybridisation in all species tested including monkey, rat, mouse, dog, cow, rabbit and yeast28. However, in yeast, despite the completed S. cerevisiae29 and S. pombe30 genomes it is unclear which sequence the probe hybridised with.
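Identity figures such as those quoted above are computed from a pairwise alignment. The following is a minimal sketch of one common definition, counting identical residues over aligned columns that contain no gap; other definitions use a different denominator, and the text does not say which was used. The two aligned fragments are hypothetical and are not CLIC sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
print(round(percent_identity("ACDEFGHIKL", "ACDQFGH-KL"), 1))
```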

At the tissue level, CLIC1 appears to be one of the more widely expressed CLIC family members. In several papers Northern blots using various CLIC1 probes have identified transcripts in almost all human tissues screened25,26,28,31. CLIC1 mRNA shows particularly high levels of expression in skeletal muscle, heart, liver and lung26,28,31. The only tissue with relatively low expression levels appears to be the brain26,28, with one report not identifying any transcript within the brain at all31.

While these reports agree on the tissue distribution of CLIC1 mRNA, the exact size of the transcript differs in various papers from the 1.2 kb and 1.0 kb transcripts initially observed by Valenzuela et al 25. Using a cDNA fragment encoding the C-terminal half of CLIC1, Tulk et al. observed a ~1.7 kb transcript in Northern blots of human tissues including the heart, brain, placenta, lung, liver, skeletal muscle, kidney and pancreas as well as PancI cells26. Ribas et al. report a ~ 1.5 kb and 1.4 kb transcript when using a genomic fragment to probe the cell lines U937, HepG2 and Molt4, as well as 1.5 kb and 1.45 kb transcripts in the Raji cell line27. Berryman and Bretscher observe a ~ 1.4 kb and 1.2 kb transcript in human pancreas, kidney, skeletal muscle, liver, lung, placenta and heart tissue when using a probe corresponding to the entire ORF31. Chuang et al. report a 1.7 kb transcript in all bovine tissues tested32. It is not clear whether these variations are due to alternate mRNA splicing in different tissues or differences in the experimental methods used.

It is now established that CLIC1 is encoded by a 723 bp ORF encoding a 241 amino acid protein that shows 74% similarity to the C-terminal 236 residues of p64. The protein has a predicted MW of 26.9 kDa and a theoretical pI of 5.1. Like other CLIC family members, CLIC1 displays anomalous migration on SDS-PAGE gels, running at around 34 kDa, and this has been shown not to be due to any post-translational modification26.

The intracellular localisation reported for CLIC1 varies in different reports within the literature, possibly reflecting a tissue specific distribution profile (see Table 1-3). N-terminal FLAG-tagged CLIC1 transfected into CHO-K1 cells has been reported to localise within the nucleoplasm and nuclear membrane with faint staining also observed in the cytoplasm and plasma membrane25. Using an antibody raised against the 13 C-terminal amino acids of CLIC1, Tulk et al. report predominant staining of small vesicular structures throughout the cytoplasm of the cell lines CHO-K1, HeLa and PancI, as well as in macrophages. The same study also reported prominent nuclear staining in the CHO-K1 cells26. It should be noted, however, that while not identified at the time of publication of this study, CLIC4 and CLIC5 both share 10 of the 13 CLIC1 C-terminal residues. Thus cross-reactivity between CLIC family members for the antibody used by Tulk et al. in this study may be an issue, although HeLa cells expressing CLIC4 or p64 have been reported to stain at the level of controls with this antibody23. In a later paper by Berryman and Bretscher using homologue-specific antibodies, CLIC1 was distributed throughout the cytoplasm and excluded from the nuclei of trophoblast epithelium31. In bovine spermatozoa CLIC1 has been seen to localise to the acrosomal region of the sperm head33.

Table 1-3 Reported intracellular distributions of CLIC1

| Species | Study | Visualisation | Antibody used/ Marker | Tissue & cell type/ cell line | Protein stained | Intracellular localisation |
|---|---|---|---|---|---|---|
| Hamster | Valenzuela et al. 1997 | Immunofluorescence | Anti-FLAG | Ovary cell line (CHO-K1) | Exogenous hCLIC1 | Nucleus, nuclear membrane, cytoplasm and plasma membrane |
| Hamster | Tulk et al. 1998 | Immunofluorescence | AP823 | Ovary cell line (CHO-K1) | Endogenous | Cytoplasmic vesicular structures, nucleus |
| Human | Tulk et al. 1998 | Immunofluorescence | AP823 | Pancreatic cancer cell line (Panc-I) | Endogenous | Cytoplasmic vesicular structures, large peripheral structures |
| Human | Tulk et al. 1998 | Immunofluorescence | AP823 | Cervical cancer cell line (HeLa) | Endogenous | Cytoplasmic vesicular structures |
| Human | Tulk et al. 1998 | Immunofluorescence | AP823 | Macrophage | Endogenous | Cytoplasmic vesicular structures |
| Human | Tulk et al. 1998 | Immunofluorescence | AP823 | Kidney | Endogenous | Apical domains of proximal tubules in deep cortex, diffuse staining of cytoplasm in glomeruli |
| Mouse | Tulk et al. 1998 | Immunofluorescence | AP823 | Kidney | Endogenous | Apical domains of cortical proximal tubules and glomeruli. |
| Human | Berryman and Bretscher 2000 | Immunofluorescence | APB121 | Placental trophoblast epithelium | Endogenous | Punctate staining throughout the cytoplasm particularly near the basal surface |
| Bovine | Myers et al. 2004 | Immunofluorescence | AP823 | Spermatozoa | Endogenous | Diffuse staining in the acrosomal region of the sperm head |

Species abbreviations as used in Table 1-2

1.2.3 CLIC2

CLIC2 is currently one of the least characterised CLIC family members, with little known about its tissue expression profile or intracellular location. At least two CLIC2 isoforms are likely to exist. CLIC2A was the first isoform to be identified; this was done from sequence similarity to other CLIC family members in a transcript map of the telomeric region of Xq2834. CLIC2B was later identified and cloned by Fan et al. from a cDNA library created from human placental tissue, although only a partial sequence has been reported35. CLIC2B contains an extra exon, exon 1b, inserted between exons 1 and 2 of the CLIC2A sequence. This extra exon encodes an 18 residue sequence (MCLSPSSMIVRPSQPRGT‡) inserted immediately after β-strand 1 in the CLIC GST-like structure. Apart from this insertion, CLIC2B appears to be identical to CLIC2A.

Northern blots by Board et al. using the CLIC2A ORF as a probe stained 2.37 kb and 1.45 kb transcripts in all tissues tested except the brain. Particularly high expression levels were observed in the lung and spleen36.

CLIC2A recombinantly expressed in E. coli runs as a monomer on gel filtration chromatography, and with an apparent MW of 34.9 kDa on SDS-PAGE gels36. Like other CLIC family members this is an anomalous gel migration as the calculated MW for the protein is 28.2 kDa.
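The anomalous gel migration noted for several family members can be put on a common footing by dividing the apparent SDS-PAGE molecular weight by the predicted one. The short sketch below does this for the three proteins whose values are quoted in this chapter (p64, CLIC1 and CLIC2A); the dictionary and function names are illustrative, and the CLIC1 apparent weight is the approximate figure given in the text.

```python
# (predicted MW, apparent MW on SDS-PAGE) in kDa, as quoted in the text.
MIGRATION_KDA = {
    "p64": (49.0, 64.0),
    "CLIC1": (26.9, 34.0),
    "CLIC2A": (28.2, 34.9),
}


def apparent_to_predicted(predicted_kda: float, apparent_kda: float) -> float:
    """Ratio of apparent to predicted molecular weight."""
    return apparent_kda / predicted_kda


for name, (predicted, apparent) in MIGRATION_KDA.items():
    print(f"{name}: {apparent_to_predicted(predicted, apparent):.2f}x")
```

With these figures each protein runs roughly a quarter to a third heavier than predicted, which suggests the effect is a shared property of the CLIC module rather than a peculiarity of one family member.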

1.2.4 CLIC3

Using the C-terminus of ERK7 as bait, Qian et al. identified CLIC3 in a yeast two-hybrid screen37. Apart from this 1999 paper by Qian et al. little exists in the literature concerning the role of CLIC3 within the cell. As discussed later, the CLIC3 construct used by the authors in this paper appears to be an N-terminal truncation, thus throwing into doubt some of the experimental results reported for CLIC3. These results should therefore be carefully scrutinised with this in mind.

‡ In the paper by Fan et al. the amino acid sequence encoded by exon 1b is incorrectly translated as FCKDVPFTFFHDCEAFPA. Personal communication with Prof. Xueliang Zhu, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.

Extracellular signal-regulated kinase 7 (ERK7) is a member of the mitogen-activated protein (MAP) kinase family, a family of highly conserved proline-directed serine/threonine kinases. ERK7 was originally identified in rat brain tissue and subsequently found to possess constitutive kinase activity in serum starved cells, conditions in which other ERK family members are significantly less active38. Upstream kinases required for activation of the ERK1 and ERK2 family members do not contribute to ERK7 kinase activity. Instead both the constitutive kinase activity and the nuclear localisation of ERK7 have been found to be dependent on its unusual C-terminal domain38.

In an attempt to locate interaction partners to the unusual C-terminal domain of ERK7, Qian et al. used a yeast two-hybrid screen constructed from a human foetal brain library and the C-terminal domain of ERK7 as bait37. As reported within this paper they obtained two cDNA clones (S9 and S46) that interacted strongly with the tail of ERK7 and not with unrelated control proteins. Their sequence analysis of these clones indicated that they both contained the same partial-length cDNA containing the ORF of a protein similar to other CLIC family members, however this cDNA was truncated at the 5’ end. Their subsequent search of the human EST database located a cDNA clone (N41765) created from 8 and 9 week human placentae that extended by a further 100 bp from the 5’ end than their clone S9, but was truncated at the 3’ end.

At the time of publication (January 1999) of the paper by Qian et al. this was the only sequence information available for CLIC3. By overlapping the sequence data from EST clone N41765 and the clone S9, Qian et al. were able to hypothesise an ORF for CLIC3 that was similar to CLIC1 and the C-terminus of p6437. The protein encoded by this ORF is truncated by approximately 30 amino acids at the N-terminus compared to CLIC1 such that the initiating methionine aligns to the second methionine (Met-32) of CLIC1 and contains an in frame stop codon at position –16. With this hypothesised ORF, the proposed initiating methionine was truncated by only 23 bp in the clone S9 at the 5’ end. Qian et al. thus cloned CLIC3 by PCR extension from the clone S9 utilising a 5’ primer (Primer A) containing the truncated residues as sequenced in the EST clone N41765 (Figure 1-2A).

Since this original cloning by Qian et al., several EST sequences containing CLIC3 have been reported. The sequences of some of these ESTs§ extend further in the 5’ direction than N41765, for example BM043659 created from a human prostate cDNA library. The sequences of these ESTs disagree with N41765 in a crucial position 6 bases upstream of the initiating methionine hypothesised by Qian et al., which suggests that N41765 lacks a cytosine in this position. This cytosine is conserved in all other human CLIC family members (shown in mauve in Figure 1-2B), as well as in cDNA clones for CLIC3 derived from mouse (AK009020 and BC060967) and chicken (AI980111) and in the genomic sequences from human chromosome 9 (NT_024000) and mouse chromosome 2 (NM_027085).

The stop codon 16 residues prior to the initiating methionine in the hypothesised ORF described by Qian et al. is thus a result of the frame-shift due to the missing cytosine in N41765. Instead the CLIC3 sequence contains an alternate possible initiating methionine in a position analogous to those for CLICs 1-4 and 5A, 29 residues before the starting position proposed by Qian et al.
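The effect of a single missing base on the reading frame can be illustrated with a toy example. The sketch below translates a short sequence with the standard genetic code, then removes one cytosine and translates again; the sequence is hypothetical and is not the CLIC3 or N41765 sequence, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in frame 0; '*' marks a stop, trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_base(dna: str, index: int) -> str:
    """Return the sequence with the base at the given index removed."""
    return dna[:index] + dna[index + 1:]


toy = "ATGGCTCTGACAGGC"              # hypothetical sequence with an intact frame
print(translate(toy))                 # no stop codon in frame
print(translate(delete_base(toy, 4))) # the cytosine at index 4 removed
```

In the toy sequence the deletion brings a TGA triplet that was previously out of frame into the reading frame, which is the same kind of artefact proposed above for the stop codon in N41765.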

The alternative N-terminal CLIC3 sequence encoded by BM043659 contains several highly conserved regions of amino acids corresponding to β-strand 1 and the N-terminus of α-helix 1 in the CLIC1 structure, both integral parts of the N-terminal thioredoxin domain present in the GST-like CLIC structure. BM043659 encodes an in-frame stop codon 108 bp upstream from this new start site with no intervening methionines.

The CLIC3 construct used by Qian et al. thus appears to be an N-terminal deletion mutant, CLIC3(Δ1-29), lacking β-strand 1 and half of α-helix 1. The loss of the N-terminal residues containing these structural elements is likely to result in CLIC3(Δ1-29) having, at the very least, a partially misfolded N-terminal domain as well as possibly altering the intracellular localisation of the protein.

§ Including BG480428, BI092272, BI090345, CB998146, CB991330 and BM043659.

Figure 1-2 Sequence alignment of the CLIC3 ESTs and the cloning constructs used by Qian et al.37. A: The cDNA clone BM043659 originates from human prostate, N41765 from human placentae, S9 is the cDNA clone located from the yeast two-hybrid screen of Qian et al. and Primer A is that used by Qian et al. to PCR extend S9 based on the sequence of N41765. The missing cytosine in the sequence of N41765 is shown in mauve, stop codons in red and proposed initiating methionines in blue. B: cDNA alignment of the human CLICs with consensus amino acid sequence below; base pairs are shaded according to conservation and the cytosine missing in N41765 is coloured mauve.

Using an N-terminally FLAG tagged CLIC3(Δ1-29) construct Qian et al. transfected the monkey kidney fibroblast cell line CV-1. Immunostaining with anti-FLAG antibodies showed a predominantly nuclear distribution and a faint background staining of the cytoplasm was also observed. Due to the possible presence of cellular localisation signals at the extreme N-terminus of CLIC3, this intracellular profile requires verification using the full length protein.

Northern blots using antisense cDNA corresponding to the ORF of CLIC3(Δ1-29) located a 1.15 kb transcript that was highly expressed in placenta, heart and lung as well as low levels of expression in skeletal muscle, kidney and the pancreas37.

Qian et al. also measured the current-voltage properties of the plasma membrane in LTK cells transfected with FLAG-CLIC3(Δ1-29) and green fluorescent protein (GFP) or GFP alone. Cells transfected with GFP alone showed a linear current-voltage relation between –100 and +80 mV with a slope of 5 GΩ. Half of the cells transfected with CLIC3 and GFP showed a reduction in the input resistance to 646 ± 242 MΩ as well as an altered current-voltage profile37. Given the preliminary nature of these results and the fact they were performed using the Δ1-29 deletion construct of CLIC3, their physiological relevance is questionable and requires verification using a full length CLIC3 construct.
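To put the two resistance values on a common scale, the sketch below converts them to conductances using G = 1/R. It treats the quoted 5 GΩ slope as the input resistance of the control cells and uses the mean value of 646 MΩ for the CLIC3-transfected cells, ignoring the reported spread; the function name is illustrative.

```python
def conductance_ns(resistance_ohm: float) -> float:
    """Conductance in nanosiemens from a resistance in ohms (G = 1/R)."""
    if resistance_ohm <= 0:
        raise ValueError("resistance must be positive")
    return 1e9 / resistance_ohm


control = conductance_ns(5e9)    # GFP alone: 5 GΩ
clic3 = conductance_ns(646e6)    # CLIC3(Δ1-29) and GFP: mean of 646 MΩ
print(round(control, 2), round(clic3, 2), round(clic3 / control, 1))
```

On these figures the membrane conductance of the responding cells is several-fold higher than that of the controls, although this says nothing about which ions carry the additional current.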

1.2.5 CLIC4

CLIC4 is one of the best studied CLIC family members, primarily because, like CLIC1, it is ubiquitously expressed in almost all types of cell and tissue. CLIC4 was the first p64 homologue to be identified despite later obtaining the name “CLIC4” under the naming conventions of Heiss and Poustka. This homologue was originally identified in rat tissue homogenates as a band that was cross-reactive to an anti-p64 antibody39. Subsequently a partial cDNA was amplified and cloned from rat brain tissue using degenerate primers based on the cDNA sequence of p64. A full-length cDNA clone was obtained, encoding a 253 amino acid protein that was initially called p64H140.

The human CLIC4 orthologue was later identified from a cDNA library from the pancreatic cell line PancI23, while the mouse orthologue was identified from keratinocytes41. In all these species CLIC4 is a 253 amino acid protein with a molecular weight of between 28 and 29 kDa.

The CLIC4 mRNA transcript has an unusually long 3’ UTR, making up around 3.7 kb of the human 4.5 kb transcript23. Like CLIC1, CLIC4 mRNA is widely expressed, with Northern blots of human tissues identifying a 4.5 kb transcript in all tissues tested with particularly high levels in the heart, placenta and skeletal muscle23. In rat a ~ 5 kb transcript was also observed in all tissues tested with particularly high levels in the heart, brain and retina32, while in mice, transcripts were observed in all tissues tested with the highest levels in heart, lung and kidneys41.

The intracellular localisation of CLIC4 is quite diverse and possibly tissue specific (see Table 1-4). Recombinant rat CLIC4 transiently transfected into rat hippocampal HT-4 cells is directed primarily to the endoplasmic reticulum40. In neurosecretory cells CLIC4 is seen to localise to large dense core vesicles32. Endogenous CLIC4 partly co-localises with caveolae in the pancreatic cell line PancI23 and in HEK-293 cells42. In mouse keratinocytes, CLIC4 localises partially but not exclusively to the mitochondria41. Berryman and Goldenring report a similar staining profile within multiple cell-types. They found CLIC4 subcellular immunolocalisation at the nuclear matrix, cytoplasm, mitochondria, cell-cortex, centrosomes and midbody of dividing cells43. In human placenta, CLIC4 is most highly expressed within the trophoblast epithelium where it is localised to the apical surface of the cell, and to a lesser extent the apical cytoplasm31. Finally, in bovine spermatozoa CLIC4 is seen to localise primarily to the anterior perimeter of the sperm head with faint staining also observed in the principal piece of the flagellum33.

While the exact role of CLIC4 within the cell is unknown, CLIC4 has been reported to be involved in cellular apoptosis. CLIC4 mRNA is upregulated in p53+/+ primary mouse keratinocytes compared to p53-/- keratinocytes41. Suppressing the upregulation of CLIC4 in response to p53 overexpression prevents p53 mediated apoptosis44, while overexpression of CLIC4 alone in mouse keratinocytes induces apoptosis directly. This is associated with a loss of mitochondrial membrane potential and cytochrome c release, along with caspase activation. This suggests that CLIC4 mediated apoptosis proceeds via mitochondrial dysfunction44.

In resting mouse keratinocytes CLIC4 is distributed between the perinuclear area (60%) and the mitochondria (40%). The addition of cytotoxic agents or Tumour Necrosis Factor (TNF) -α, results in the translocation of CLIC4 from the cytoplasm to the nucleus preceding the appearance of apoptotic markers45. The nuclear translocation of CLIC4 occurs in association with the Nuclear Transport Factor (NTF) -2 nuclear transport complex45. This CLIC4 mediated apoptosis is independent of the Apoptotic protease activating factor (Apaf) pathway.

CLIC4 is also upregulated in response to treating human myofibroblasts with Transforming Growth Factor (TGF) -β1 46. The overexpression of CLIC4 within these cells results in inhibition of migratory activity. CLIC4 and CLIC1 expression is also seen to be upregulated following photo-oxidative stress in mouse retina along with a variety of antioxidants and transcription factors47.

The possibility that CLIC4 may be associated with Cl- channel activity has only been briefly investigated. Overexpression of CLIC4 in HEK-293 cells resulted in small non-rectifying whole-cell currents with a current density of 120 ± 14 pA/pF 48. The channels reported within these cells showed a selectivity series of $P_{\mathrm{NO_3^-}} \geq P_{\mathrm{I^-}} > P_{\mathrm{Cl^-}} > P_{\mathrm{gluconate}}$. This channel activity was inhibited by ~ 50% through the addition of 100 μM IAA-94 48. Macroscopic channel recordings from the same cells displayed a mild outward rectification. Channel activity was blocked by the addition of antibodies raised against the C-terminal domain of CLIC4 when added to the cytoplasmic, but not the extracellular face of the membrane48. The same result was observed on addition of anti-FLAG antibody to cells transfected with a C-terminally FLAG-tagged CLIC4 construct48.

Table 1-4 Reported intracellular distributions of CLIC4

| Organism | Study | Visualisation | Antibody used/ Marker | Tissue & cell type/ cell line | Protein stained | Intracellular localisation |
|---|---|---|---|---|---|---|
| Human | Duncan et al. 1997 | Immunofluorescence | Ab990 | Kidney cell line (HEK-293) | Exogenous rCLIC4 | ER, outer nuclear membrane and membrane-bound organelles |
| Mouse | Duncan et al. 1997 | Immunofluorescence | Ab990 | Neuronal cell line (HT-4) | Exogenous rCLIC4 | ER, outer nuclear membrane and membrane-bound organelles |
| Rat | Chuang et al. 1999 | Immunoelectron microscopy | anti-p64H1 | Dentate gyrus | Endogenous | Large dense core vesicles, tubular membranes |
| Rat | Chuang et al. 1999 | Immunoelectron microscopy | anti-p64H1 | Axons | Endogenous | Large dense core vesicles, microtubular bundles |
| Human | Edwards et al. 1999 | Immunofluorescence | AP1058 | Pancreatic cancer cell line (Panc-I) | Endogenous | Caveolae or trans-Golgi network vesicles and other unidentified vesicle structures distinct from the ER |
| Human | Edwards et al. 1999 | Immunofluorescence | AP1058 | Kidney sections | Endogenous | Diffuse apical-subapical staining within proximal tubules, punctate intracellular vesicle staining in distal nephron and glomeruli |
| Mouse | Fernandez-Salas et al. 1999 | Fluorescence | GFP-CLIC4 and CLIC4-GFP | Primary keratinocytes and keratinocyte cell line (SP1) | Exogenous mCLIC4 | Perinuclear intracellular organelles, cytoplasm and sites of intracellular junctions on the plasma membrane |
| Mouse | Fernandez-Salas et al. 1999 | Immunofluorescence | anti-C-terminal CLIC4 | Primary keratinocytes and keratinocyte cell line (SP1) | Endogenous | Mitochondria and perinuclear intracellular organelles |
| Human | Berryman and Bretscher 2000 | Immunofluorescence | APB134 | Placenta, trophoblast epithelium | Endogenous | Apical surface, apical cytoplasm |
| Human | Suginta et al. 2001 | Immunofluorescence | Anti-CLIC4 | Kidney cell line (HEK-293) | Exogenous rCLIC4 | Partial co-localisation with caveolae markers, widely distributed in membrane and soluble compartments |
| Human | Fernandez-Salas et al. 2002 | Immunoelectron microscopy | Anti-C-terminal CLIC4 | Keratinocyte cell line (HaCaT) | Endogenous | Cristae and peripheral inner membrane of the mitochondria, cytoplasm |
| Human | Berryman and Goldenring 2003 | Immunofluorescence | APB134 | Cervical carcinoma cell line (HeLa) | Endogenous | Cell surface, cytoplasm, centrosome, midbody, mitochondria, nuclear matrix |
| Human | Berryman and Goldenring 2003 | Immunofluorescence | APB134 | Polarised placental epithelial cell line (JEG-3) | Endogenous | Cell surface, cytoplasm, centrosome, midbody, mitochondria, nuclear matrix |
| African green monkey | Berryman and Goldenring 2003 | Immunofluorescence | APB134 | Kidney cell line (COS-1) | Endogenous | Cell surface, cytoplasm, centrosome, midbody, mitochondria, nuclear matrix |
| Canine | Berryman and Goldenring 2003 | Immunofluorescence | APB134 | Kidney cell line (MDCKII) | Endogenous | Cell surface, cytoplasm, centrosome, midbody, nuclear matrix |
| Mouse | Suh et al. 2003 | Immunofluorescence/ Fluorescence | Anti-C-terminal CLIC4/ Anti-HA/ GFP-CLIC4 | Primary keratinocytes | Endogenous/ Exogenous mCLIC4 | Mitochondria and cytoplasm, cytoplasmic CLIC4 translocates to nucleus following cell-stress |
| Mouse | Suh et al. 2003 | Immunoelectron microscopy | Anti-C-terminal CLIC4 | Primary keratinocytes | Endogenous | Nuclear membrane and nucleoplasm, nuclear pore in TNF-α-treated cells |
| Human | Suh et al. 2003 | Immunofluorescence | Anti-C-terminal CLIC4 | Osteosarcoma cell line (Saos-2) | Endogenous | Mitochondria and cytoplasm, cytoplasmic CLIC4 translocates to nucleus following cell-stress |
| Bovine | Myers et al. 2004 | Immunofluorescence | APB134 | Spermatozoa | Endogenous | Anterior perimeter of the sperm head, principal piece of the flagellum |

Species abbreviations as used in Table 1-2


1.2.6 CLIC5

In humans, and possibly all tetrapods, the CLIC5 family member possesses at least two isoforms, CLIC5A and CLIC5B. These result from the incorporation of alternatively spliced variants of exon 1 during mRNA maturation20. CLIC5B is the human orthologue of the bovine protein p64. The two alternative versions of exon 1, 1a and 1b, possess alternative start sites encoding different CLIC5 N-termini. Exon 1a encodes a short 21 amino acid N-terminal region, the last 7 amino acids of which encompass β-strand 1 of the GST-like CLIC structure. In contrast, exon 1b encodes a 180 amino acid N-terminus, which shows sequence similarity to the N-terminus encoded by exon 1a in only the last 7 amino acids contributing to β-strand 1.

CLIC5A was the first human CLIC5 isoform to be discovered, being isolated from a pull-down assay using a construct containing the 30 C-terminal residues of ezrin31. In this study by Berryman and Bretscher, detergent-soluble microvillus extract and immobilised GST-ezrin(556-586) were mixed with the expectation that the endogenous actin present within the extract would bind to the affinity matrix along with proteins of interest. CLIC5A was one of 7 proteins that bound, either individually or as part of a complex to GST-ezrin fusion protein constructs containing an intact F-actin binding site. These proteins did not bind to GST-ezrin constructs with masked C-termini or missing the F-actin binding site31.

Berryman and Bretscher cloned CLIC5A from a human placental cDNA library giving a 251 amino acid protein with a predicted molecular weight of 28.1 kDa and a theoretical pI of 5.5 31. Like other family members, CLIC5A migrates anomalously slowly on SDS-PAGE with an apparent molecular weight of 32 kDa31. The amino acid sequence of CLIC5A is 91% identical to residues 197-437 of the bovine protein p64.

Later, Shanks et al. identified CLIC5B from the lysate of the human colonic cell line HCA-7. This was done with a Western blot using an antibody known to cross-react against various CLIC family members20. Along with CLIC1 and CLIC4, novel 46 and 85 kDa bands were recognised by the cross-reactive anti-CLIC antibody. Further analysis of this 46 kDa band revealed that it was also recognised by antibodies specific for CLIC5A but not those specific for CLIC1 or CLIC4 20. Shanks et al. hypothesised that, due to the high sequence identity between CLIC5A and the C-terminus of p64, the 46 kDa band might represent an alternative larger version of CLIC5 with similarity to the entire bovine p64 protein20. Such a protein was seen to exist when they identified, within the genomic sequence of human chromosome 6, a second alternate version of exon 1, 65 kbp upstream of that for CLIC5A. The 180 residues encoded by this new exon, exon 1b, share 45.9% identity and 58.5% similarity with the 201 N-terminal residues of the bovine protein p64.

A noteworthy aspect of this paper by Shanks et al. is the 46 kDa apparent molecular weight seen on the Western blots. Most CLIC family members appear to show aberrantly low mobility on SDS-PAGE, resulting in apparent molecular weights significantly larger than those theoretically predicted from sequence alone. Studies through production in vitro14,40 or recombinant bacterial expression23,31,36 have shown that in most cases this aberrant mobility is not dependent on post-translational modification but instead is an inherent property of the CLICs, and interestingly also the evolutionarily related omega class GSTs36,49. In the smaller family members containing only the CLIC module, the apparent MW determined by SDS-PAGE is typically overestimated by only 10-20%; however, in those CLICs containing additional highly acidic N-terminal extensions, over-estimates of as much as 30-80% of the predicted MW have been reported50.

When expressed in vivo or in vitro and measured by SDS-PAGE the bovine protein p64 has an apparent MW ~30% larger than its predicted one of 48.9 kDa14. This is important to note as p64 and CLIC5B, the human orthologue cloned by Shanks et al., have a sequence identity of over 70%. The main difference between the two is confined to an ~20 amino acid insertion within the N-terminus of p64, which is not present in CLIC5B. Despite this level of conservation, antibodies raised against a peptide sequence within the N-terminus of CLIC5B stained only a 46 kDa band on Western blots, yet the predicted MW for the sequence contained within the cDNA clone of Shanks et al. is 46.4 kDa. Thus if these are one and the same protein, the human CLIC5B orthologue appears unique in that its mobility is not retarded on SDS-PAGE. If however this is not the case, as CLIC5B has not yet been recombinantly expressed, the 46 kDa band observed in these Western blots may represent a third, related but smaller isoform of CLIC5, while the protein encoded by the cDNA sequence of CLIC5B may run at a higher MW. In support of this, Western blots using antibodies raised against bovine p64 have stained bands of ~ 64, 85 and 120 kDa in several human cell lines14. It is however noted that for the same CLIC family member the apparent MW, and hence the degree of overestimation, can vary significantly in different papers. A possible reason for such variation can be seen in a paper by Urushidani et al., in which the apparent MW of rabbit CLIC6 on SDS-PAGE was highly dependent on the acrylamide concentration of the gel used50. For CLIC6, which has a predicted MW of 65 kDa, a 6% gel gave a MW of 120 kDa while a 12% gel gave a MW of 96 kDa for the same sample50. However the reported gel composition and running conditions used to obtain the apparent MWs of bovine p64 and CLIC5B do not differ markedly14,20.
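The degree of overestimation discussed above can be made concrete with a short calculation. The following is a minimal sketch only: the function name and the list of examples are illustrative assumptions, and the predicted and apparent MWs are simply the values quoted in this section for CLIC5A and for rabbit CLIC6 on 6% and 12% gels.

```python
def overestimate_percent(predicted_kda: float, apparent_kda: float) -> float:
    """Percentage by which the SDS-PAGE apparent MW exceeds the predicted MW."""
    return (apparent_kda - predicted_kda) / predicted_kda * 100.0


# (label, predicted MW in kDa, apparent MW in kDa); values as quoted in the text.
# The selection of examples is illustrative, not an exhaustive survey.
examples = [
    ("CLIC5A", 28.1, 32.0),
    ("CLIC6, 6% gel", 65.0, 120.0),
    ("CLIC6, 12% gel", 65.0, 96.0),
]

for label, predicted, apparent in examples:
    excess = overestimate_percent(predicted, apparent)
    print(f"{label}: apparent MW is {excess:.1f}% above predicted")
```

On these figures CLIC5A falls within the 10-20% range typical of the smaller family members, whereas the estimate for CLIC6 changes substantially with the acrylamide concentration of the gel.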

Northern blots against human tissue cDNA using the ORF of CLIC5A showed light staining of a 6.2 kb transcript in the placenta, kidney and lung, with heavy staining in skeletal muscle and the heart31. However the probe used in this study would be expected to hybridise to both CLIC5A and CLIC5B mRNA transcripts. In a more specific Northern blot using 5’ sequence unique to CLIC5A, a 6.4 kb transcript was located in the same tissues, although the heaviest staining was now observed in the lung. In the heart, kidney and skeletal muscle the level of staining was slightly lower than that obtained with the full-length probe, while the colon and placenta required extremely long exposure times for visualisation of the transcript20. Both probes also identified 3.8 and 2.3 kb transcripts in some tissues20,31. In contrast to CLIC5A, a CLIC5B isoform-specific probe based on exon 1b detected a 6.6 kb transcript only in the small intestine20. Interestingly in this study, despite CLIC5B originally being cloned from a cell-line derived from the colon, no CLIC5B transcript was detected in colonic mRNA20. No transcript was identified in human kidney mRNA either20. This is worth mentioning as the bovine CLIC5B orthologue, p64, was originally cloned from kidney extracts14.

As discussed in Section 1.2.1, a distinct concern in the interpretation of immunolocalisation results is the possibility that the anti-CLIC antibodies used in some of the earlier studies cross-react between family members. As this is particularly the case for some of the work on p64, the localisation studies on the orthologous human CLIC5 protein are of particular interest. It should be noted however that while the issue of antibody cross-reactivity between family members is largely addressed in more recent studies, few address the possibility of isoform cross-reactivity. In only one report are antibodies raised against isoform specific sequence20 and no antibodies specific for the CLIC5A isoform are currently known. Given the large degree of common sequence between the two, antibodies raised against the CLIC5A isoform31,33 would also be expected to react with CLIC5B. Thus any staining profiles resulting from these antibodies may represent a composite of the profiles for the two isoforms.

In a paper using one such antibody the cellular and intracellular localisation of CLIC5 in human placenta was seen to overlap largely with that of CLIC4 31. For both proteins the staining pattern was most intense at the apical surface of the trophoblast epithelium. In contrast, in the same study CLIC1 was seen to be located mainly within the cytoplasm of the trophoblast31.

Using similarly derived antibodies in bovine spermatozoa, Myers et al. have shown that CLIC5 is localised to the post-acrosomal region of the sperm head, and the principal piece of the flagellum33.

In a more specific study by Berryman and Goldenring, human placental choriocarcinoma cells were transfected with an epitope tagged version of CLIC5A. Endogenous CLIC5A was not detected in these cells. Using antibodies specific to the epitope tag the intracellular distribution of the exogenous CLIC5A was seen to be concentrated in the apical region of the cells where it co-localised with ezrin in microvilli51. CLIC5A was not seen to be associated with the plasma membrane51.

In the only study detailing an isoform specific endogenous localisation profile, Shanks et al. raised antibodies to sequence unique to the CLIC5B isoform N-terminal extension. Using these antibodies in the human colonic adenocarcinoma cell line, HCA-7, CLIC5B localisation was seen to occur partially within the cytosol but predominantly to the Golgi20. To clarify this localisation a GFP fusion with the CLIC5 sequence common to both isoforms was made. This construct also localised to the Golgi, suggesting that this localisation was not specific to CLIC5B, or if so, that CLIC5A has N-terminal signal sequence(s) that inhibit Golgi targeting20. Table 1-5 summarises the studies detailing CLIC5 localisation for a variety of species and cell types, including those for p64.

Table 1-5 Reported intracellular distributions of CLIC5

| Species and isoform | Study | Visualisation | Antibody used/ Marker | Tissue & cell type/ cell line | Protein stained | Intracellular localisation |
|---|---|---|---|---|---|---|
| Human CLIC5B | Redhead et al. 1992 | Immunofluorescence | Anti-p64 | Pancreatic carcinoma cell line (CFPAC-1) | Endogenous | Apical surface of monolayer, vesicles around nucleus and towards the periphery of the cell |
| Sheep CLIC5B | Tamir et al. 1994 | Immunoelectron microscopy | Anti-p64 | Thyroid parafollicular cells | Endogenous | Rough ER and periphery of secretory vesicles |
| Rabbit CLIC5B | Redhead et al. 1997 | Immunofluorescence | Anti-p64 | Kidney tissue | Endogenous | Intracellular punctate pattern in proximal tubular cells, particularly near perinuclear region |
| Human CLIC5B | Redhead et al. 1997 | Immunofluorescence/ Immunoelectron microscopy | Anti-p64 | Colon cancer cell line (T84) | Endogenous | Central intracellular secretory vesicles concentrated at basal pole of cell, vesicles translocate to cell periphery on PKC activation |
| Human CLIC5B | Redhead et al. 1997 | Immunofluorescence | AP95/ monoclonal anti-CD4 | Pancreatic cell line (PancI) | Exogenous bCLIC5B/ CD4-bCLIC5B | Prominent peripheral reticular vesicular staining |
| Chicken CLIC5B | Schlesinger et al. 1997 | Immunoelectron microscopy | 656 | Osteoclasts | Endogenous | Associated with ruffled border membrane |
| Human CLIC5A/B | Berryman and Bretscher 2000 | Immunofluorescence | APB132 | Placental trophoblast epithelium | Endogenous | Apical surface |
| Human CLIC5B | Shanks et al. 2002 | Immunofluorescence | Anti-CLIC5B | Colon adenocarcinoma cell line (HCA-7) | Endogenous | Cytosol, Golgi apparatus and centrosomes |
| Human CLIC5B | Shanks et al. 2002 | Fluorescence | GFP | Colon adenocarcinoma cell line (HCA-7) | Exogenous GFP-hCLIC5B (178-410) | Golgi apparatus |
| Bovine CLIC5A/B | Myers et al. 2004 | Immunofluorescence | APXB5/6-N | Spermatozoa | Endogenous | Post acrosomal region of sperm head, principal piece of the flagellum |
| Human CLIC5A | Berryman and Goldenring 2004 | Immunofluorescence | Anti-Xpress | Polarised placental epithelial cell line (JEG-3) | Exogenous Xpress-hCLIC5A | Ezrin rich apical microvilli and distinct cytosolic puncta located near these microvilli |

Species abbreviations as used in Table 1-2

Like other CLIC family members CLIC5 appears to exist in both soluble and membrane associated forms; however, unlike other family members and in accordance with its original identification through its association with ezrin, a significant fraction of CLIC5 also appears to be associated with the cytoskeleton. Western blots used to determine the subcellular distribution of CLIC1, CLIC4 and CLIC5A showed that CLIC4 and CLIC5A were enriched in isolated microvilli extracts over the whole placental tissue, whereas CLIC1 was enriched within soluble fractions. Treatment of cytoskeletal and membrane fractions of microvilli with non-ionic detergent indicated that the insoluble cytoskeletal residue retained around 70% of CLIC5A while CLIC1 and CLIC4 were completely solubilised31.

Like CLIC1 12,13,52, CLIC5A has been shown in vitro to be associated with chloride conductance through measuring the chloride efflux from artificial vesicles. The addition of recombinant, bacterially expressed and purified CLIC5A elicits a concentration dependent increase in chloride efflux51. If CLIC5A was denatured prior to addition to the vesicles, or if measurements were carried out in the presence of IAA-94, the increased efflux compared to control vesicles was not observed51. Despite these results, in the same study an experiment designed to measure increased plasma membrane anion permeability showed no difference in the rate of iodide efflux between two transfected cell-lines, one expressing CLIC5A, the other not51.

1.2.7 CLIC6

Originally isolated in rabbits and called parchorin, CLIC6 contains a large hydrophilic N-terminal extension unrelated to those of other CLIC family members. In humans this extra N-terminal domain is around 450 amino acids long, is highly acidic and rich in small amino acids. To date CLIC6 orthologues have been observed in mice53, rabbits15 and rats54 as well as 2 isoforms in humans53,55,56.

Across these species the CLIC6 C-terminal CLIC module is similar (92 – 96% sequence identity), whereas the additional N-terminal extension unique to this family member is less conserved, retaining only 37 to 78% identity. The CLIC6 N-terminal extensions are distinct from the CLIC5B/p64 extension53 but like these are also hydrophilic and in some species contain short repetitive sequence elements. The CLIC6 N-terminal extensions are highly acidic, with theoretical pI values of around 3.9 due to a high percentage of glutamic acid residues (between 16 and 20%). They are also rich in small amino acids, such that taken together the three amino acids alanine, glycine and glutamic acid account for over 50% of the residues within the extensions.

In only some species do the CLIC6 N-terminal extensions contain repetitive sequence elements. In rabbits variations on the 6 residue sequence GGS[V/I]DA is repeated 15 times15, while in humans variations on the decapeptide EGPAGES[V/I][D/E]A is usually repeated 14 times53. It should be noted that out of the last 6 residues in the consensus sequence for the human CLIC6 repeat, 5 of the 6 are shared by the consensus repeat in rabbit. The N-terminal extensions of mouse and rat CLIC6 do not contain repetitive elements in their primary structure comparable to the human and rabbit orthologues.
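The comparison between the two consensus repeats can be checked position by position. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that a position counts as shared whenever the sets of residues allowed at that position in the two motifs overlap (so that [D/E] in the human repeat matches D in the rabbit repeat).

```python
import re


def parse_motif(motif: str) -> list[set[str]]:
    """Split a motif such as 'GGS[V/I]DA' into one set of allowed residues per position."""
    positions = []
    # A token is either a bracketed group of alternatives or a single residue letter.
    for token in re.findall(r"\[[^\]]+\]|[A-Z]", motif):
        positions.append(set(token.strip("[]").split("/")))
    return positions


def shared_positions(a: list[set[str]], b: list[set[str]]) -> int:
    """Count aligned positions at which the two motifs allow a common residue."""
    return sum(1 for x, y in zip(a, b) if x & y)


human = parse_motif("EGPAGES[V/I][D/E]A")  # human decapeptide consensus
rabbit = parse_motif("GGS[V/I]DA")         # rabbit hexapeptide consensus

# Align the rabbit hexapeptide against the last six positions of the human repeat.
human_tail = human[-len(rabbit):]
print(shared_positions(human_tail, rabbit), "of", len(rabbit), "positions shared")
```

Under this assumption the only mismatch is the second position of the aligned hexapeptide, where the human repeat has glutamic acid and the rabbit repeat has glycine.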

CLIC6 was first characterised at the molecular level in an attempt to elucidate proteins involved in the activation of gastric acid secretion in rabbits15. When parietal cells are in the resting state, the H+ pump involved in gastric hydrochloric acid secretion, H+,K+-ATPase, is sequestered in an inactive form within cytoplasmic vesicles (tubulovesicles) away from the gastric lumen57. Within these vesicles the H+ pump is kept inactive because of the low permeability of the tubulovesicular membrane to KCl, rather than the absence of ATP57. Gastric acid secretion occurs through the translocation of these tubulovesicles to the apical plasma membrane while at the same time the plasma membrane acquires a passive KCl permeability57. These two processes appear to be activated via secondary messengers such as cAMP, possibly via the activation of cAMP-dependent protein kinase(s) that subsequently phosphorylate effector protein(s)57.

In order to identify such phosphoproteins, Urushidani et al. determined which proteins were radioactively phosphorylated on histamine activation of 32P-loaded rabbit gastric glands58. A 120 kDa phosphoprotein, as determined by SDS-PAGE, was seen to redistribute to apical membrane rich fractions concomitant with H+,K+-ATPase and ezrin50. This 120 kDa protein was present as a soluble cytosolic component in resting parietal cells. Upon activation of these cells a small fraction of the 120 kDa protein translocated to the apical membrane50.

In a subsequent paper by Nishizawa et al., the same group digested the purified 120 kDa phosphoprotein and sequenced the fragments. From this they were able to obtain partial protein sequences that showed high sequence similarity to the bovine p64 protein15. Based on these peptide sequences a cDNA probe was obtained and used to screen a rabbit brain cDNA library; a partial clone was identified and a full length clone subsequently isolated. The full-length cDNA for rabbit CLIC6 encodes a 637 amino acid protein with a predicted MW of 64.9 kDa and predicted pI of 4.2. The unmodified protein was seen to run anomalously on SDS-PAGE gels at 120 kDa15.

Northern blots identified a 4.1 kb transcript in choroid plexus and gastric mucosa as well as faint staining in the kidney15. Western blots of various tissue homogenates using an anti-rabbit CLIC6 antibody stained a 120 kDa band in brain, chorioretinal epithelia, lacrimal glands, submandibular glands, airway epithelium, kidney and gastric mucosa. An 80 kDa band was also stained in the chorioretinal epithelium and lacrimal gland, possibly representing an alternatively spliced version of rabbit CLIC6 15. Many of these cell types are involved in the movement of fluid within the body. In a later paper by Mizukawa et al., the epitope for the antibody used in these studies was localised to the repetitive sequence within the unique N-terminal extension of the rabbit CLIC6 sequence21. Cross-reactivity between other CLIC family members is therefore not expected. However with this epitope, the antibody would be expected to react with both CLIC6A and CLIC6B isoforms if both also exist in rabbit.

Using immunohistochemistry Mizukawa et al. studied the cellular and intracellular distribution of CLIC6 in a variety of rabbit tissues. CLIC6 was seen to stain predominantly within cell types responsible for the creation of ion gradients for the purpose of fluid movement. This led the authors to suggest that CLIC6 may play a role in water transport, possibly by affecting chloride permeability of the plasma membrane21. CLIC6 expression was also seen to coincide with the induction of fluid transport in the gastric mucosa (upregulation after weaning) and in mammary glands (upregulated during lactation)21.

A partial sequence of the C-terminal CLIC module for the human orthologue of CLIC6 was first identified in the genomic sequence of human chromosome 21 56. In a later paper by Strippoli et al. looking at paralogy in the human genome, a more complete CLIC6 ORF was predicted from the genomic sequence, including most of exon 1, which encodes the unique CLIC6 N-terminal extension55. Later, Friedli et al. demonstrated that the identification of exon 1 by Strippoli et al. had been incomplete53. The full gene comprises 7 exons: exon 1 encodes the hydrophilic N-terminal extension similar to that found in rabbit CLIC6; exon 1b encodes a small 18 residue sequence that is only incorporated within one of the human CLIC6 isoforms; and the remainder of the gene, encompassing exons 2-6, encodes the majority of the conserved C-terminal CLIC module.

In humans two alternatively spliced CLIC6 isoforms exist53: CLIC6B contains all 7 exons while in CLIC6A exon 1b is absent. Exon 1 while predominantly encoding the large hydrophilic N-terminal extension unique to CLIC6, also contains at its extreme 3’ end residues corresponding to β-strand 1 of the conserved CLIC module53. The inclusion of exon 1b in CLIC6B results in an 18 residue insertion between β-strand 1 and α-helix 1 within this module. Although not displaying any sequence similarity this insertion in CLIC6B occurs in a position analogous to the 18 residue insertion seen in CLIC2B35.

As described previously, both human CLIC6 isoforms contain a large hydrophilic N-terminal extension. This extension contains 14 decapeptide repeats related to the hexapeptide repeats seen in rabbit, which are encoded within an extremely GC rich segment of exon 1. To determine whether these repeats affected cellular localisation, Friedli et al. transfected a kidney cell line with GFP or hemagglutinin (HA) fusion constructs containing either 8 or 0 of the 14 repeats. Both transfectants displayed identical diffuse cytoplasmic and perinuclear staining, which suggests that the repeats are not involved in cellular localisation53.

CLIC6 has also been identified in rodents, with orthologues known in mice and rats53,54. Both these rodent CLIC6s contain hydrophilic N-terminal extensions; however these do not contain the repetitive primary structure observed in the human and rabbit orthologues. The CLIC6 mouse orthologue was identified from the mouse genome due to its similarity to human CLIC6 53. The rat orthologue of CLIC6 was identified in a yeast two-hybrid screen using the C-terminus of the dopamine D2 receptor as bait, while screening a rat brain cDNA library54.

Northern blots probing for CLIC6 in human tissue cDNA libraries have identified a 5 kb transcript in lung and stomach and a 6 kb transcript in heart and muscle53. In mice, transcripts were identified in brain, stomach, lung, kidney, testis and the eyes53, while in rats a 3 kb transcript has been seen in stomach, pituitary gland, brain and the choroid plexus54. However the best characterised tissue expression profile is that of the rabbit CLIC6 orthologue, due predominantly to a paper by Mizukawa et al. that utilises their CLIC6-specific monoclonal antibody to analyse a variety of tissues21. In this paper, rabbit CLIC6 was seen to be primarily located within the epithelium of the ducts of the lacrimal, parotid, submandibular and mammary glands and the pancreas, prostate and testis21. CLIC6 was also found in the trachea and lung, kidney, eye, cochlea and gastric mucosa21. It is noteworthy that the tissue distributions derived from the Northern blots for the other CLIC6 orthologues primarily represent different subsets of the tissue distribution identified in this more detailed immunohistology study. It seems most likely that this represents a cautionary reminder as to the limitations of assessing tissue distribution from mRNA transcript profiles, as opposed to real species-specific differences in CLIC6 tissue distribution.

Most CLIC family members are almost exclusively intracellular unless overexpressed. CLIC6 is thus of particular interest, as several reports have identified endogenous CLIC6 on the plasma membrane. In rabbit parietal cells Urushidani et al. showed that CLIC6 exists mainly in the cytosol, but that a small fraction of the total protein binds to the apical membrane when the cells are stimulated50. The same group confirmed this profile in a later paper when Nishizawa et al. looked at the localisation of CLIC6 in the water secreting gastric, lacrimal and salivary glands15. An epitope-tagged version of rat CLIC6 transfected into a human kidney cell line was primarily localised to the plasma membrane but also to the cytoplasm, and removal of extracellular chloride did not alter this distribution54. Despite this, using GFP or epitope tagged human CLIC6, Friedli et al. reported only a diffuse cytoplasmic staining in various kidney cell lines53.

Table 1-6 Reported intracellular distributions of CLIC6

| Species | Study | Visualisation | Antibody used/ Marker | Tissue & cell type/ cell line | Protein stained | Intracellular localisation |
|---|---|---|---|---|---|---|
| Rabbit | Urushidani et al. 1999 | Immunofluorescence | Polyclonal anti-parchorin | Gastric gland | Endogenous | Distributed throughout the cytosol of parietal cells, when stimulated some aggregation |
| Rabbit | Nishizawa et al. 2000 | Immunofluorescence | Monoclonal anti-parchorin | Gastric glands | Endogenous | Diffuse staining of parietal cells, translocation to membranous structures after stimulation |
| Rabbit | Nishizawa et al. 2000 | Immunofluorescence | Monoclonal anti-parchorin | Lacrimal glands | Endogenous | Diffuse staining of acinar cells, membranous structures visible |
| Rabbit | Nishizawa et al. 2000 | Immunofluorescence | Monoclonal anti-parchorin | Salivary glands | Endogenous | Staining of apical surface of intercalated ductal cells not acinar cells |
| Pig | Nishizawa et al. 2000 | Fluorescence | GFP-parchorin | Kidney cell line (LLC-PK1) | Exogenous GFP-oCLIC6A | Located throughout cytosol, when cells transferred to Cl- free solution translocates to plasma membrane |
| Human | Griffon et al. 2003 | Immunofluorescence | anti-His | Kidney cell line (HEK-293) | Exogenous His-rCLIC6A | Plasma membrane, diffuse cytoplasmic staining with small aggregates |
| African green monkey | Friedli et al. 2003 | Immunofluorescence/ Fluorescence | Anti-HA/ GFP-CLIC6 | Kidney cell line (Cos7) | Exogenous HA-hCLIC6A/ GFP-hCLIC6A | Located throughout cytoplasm and in perinuclear structures |
| Canine | Friedli et al. 2003 | Immunofluorescence/ Fluorescence | Anti-HA/ GFP-CLIC6 | Kidney cell line (MDCK) | Exogenous HA-hCLIC6A/ GFP-hCLIC6A | Located throughout cytoplasm and in perinuclear structures |

Species abbreviations as used in Table 1-2.

Although Mizukawa et al. hypothesised that rabbit CLIC6 may be involved in water transport, possibly through regulation of the chloride permeability of the plasma membrane21, independent experiments with the human and rat CLIC6 orthologues have not confirmed this. Friedli et al. performed voltage clamp studies on Xenopus oocytes injected with an expression plasmid containing human CLIC6. No difference was seen between injected oocytes and controls53. In similar experiments performed using the rat CLIC6 orthologue, Griffon et al. suffused transfected CHO cells with the Cl- sensitive fluorophore MQAE. Again no significant difference in Cl- efflux was observed between those cells stably expressing rat CLIC6 and controls54.

1.2.8 CLIC protein interaction partners

The exact role or roles of the various CLIC family members within the cell are yet to be fully understood. If, as several lines of evidence indicate, they function as chloride channel regulators or possibly as chloride channels themselves, it is not yet known what cellular function such chloride conductance facilitates. When hypothesising or analysing proposed cellular roles for the CLICs it must be kept in mind that the chloride channel activity associated with them has predominantly been measured in in vitro experimental systems, and this may not reflect their true in vivo function.

Some ideas as to the in vivo cellular function of the CLICs may be obtained by looking at the interaction partners that have been identified for various family members. These interaction partners can presently be grouped into two main classes: cytoskeletal or scaffold proteins, and phosphatases or kinases. These two groups are discussed below in Sections 1.2.8.1 and 1.2.8.2. Proteins that do not easily fit into one of these two categories are discussed separately in Section 1.2.8.3.

1.2.8.1 Cytoskeletal and scaffold proteins

Several CLIC proteins have been observed to interact either directly or indirectly with cytoskeletal and scaffold proteins. The cytoskeletal proteins are a diverse group of proteins, some of which are capable of polymerising. They perform a wide range of functions centred on the network of intracellular protein filaments that makes up the cytoskeleton. The cytoskeleton provides a cellular level scaffold structure upon which the subcellular and grosser anatomical features of a cell are organised. The cytoskeleton must provide the cell with a static framework capable of giving large-scale structures mechanical strength, yet it must also be dynamic enough to accommodate cell movement and help co-ordinate such processes as cellular differentiation, mitosis and meiosis, cytokinesis, membrane trafficking and signalling pathways.

At a lower level of organisation, scaffolding proteins frequently mediate the interface between various cellular processes and the cytoskeleton. As such, scaffold proteins are responsible for the recruitment of multiple signalling pathway elements at a single subcellular location in order to provide a local degree of specificity to the regulatory enzymes involved59.

The most in-depth studies detailing interactions between the CLICs and cytoskeletal or scaffold proteins involve the CLIC4 and CLIC5 orthologues. As discussed in Section 1.2.6, CLIC5A was first identified from placental microvilli extracts with a pull-down assay using the C-terminal tail of the cytoskeletal protein ezrin as bait31. Ezrin is a major membrane-cytoskeleton linking protein that contains an F-actin binding site in its extreme C-terminus31. CLIC5A was one component of a cytoskeletal complex, each member of which bound directly or indirectly to the ezrin C-terminus; this complex also contained the cytoskeletal proteins actin, α-actinin, full-length ezrin, gelsolin and IQGAP1 31. With the same assay system CLIC4 was also seen to bind directly, or as part of a complex, to the C-terminal tail of ezrin, albeit at levels significantly lower than those of CLIC5A31.

A second paper detailed this interaction more clearly by comparing which proteins bound during pull-down assays using either the C-terminal tail of ezrin or CLIC5A as bait. From this it was shown that CLIC5A did not associate directly with gelsolin or itself43. Based on evidence that suggested these cytoskeletal complexes were being produced de novo as opposed to being pre-formed within the microvilli extracts, the experiments were repeated in the presence of a drug that inhibits actin polymerisation. The results obtained suggested that CLIC5A bound directly to full-length ezrin and an unknown 70 kDa protein but not to G-actin or the C-terminal tail of ezrin43.

Interestingly, CLIC5 may not be the only family member that associates with ezrin or ezrin-like proteins. Ezrin and CLIC6 were both identified as proteins which were phosphorylated when rabbit gastric glands were stimulated, although whether they interact directly is currently unknown50. Additionally, in a yeast two-hybrid screen, rat CLIC6 was observed to interact with radixin, another membrane-cytoskeleton-linking protein related to ezrin54. In this paper CLIC6 was also seen to associate with MUPP1, a scaffold protein containing multiple PDZ peptide binding domains54. Proteins containing such PDZ domains are important for the subcellular localisation of membrane proteins. They frequently contain multiple substrate binding sites and enable the assembly of multi-protein complexes60. The substrates of PDZ domain containing proteins are often ion channels or transmembrane receptor proteins60. However, it must be noted that the C-terminus of CLIC6 does not appear to contain a classical substrate motif for a PDZ domain. Therefore the interaction between CLIC6 and MUPP1 is probably either indirect, or not mediated through one of the MUPP1 PDZ domains.

With a pull-down assay system using the C-terminal domain of CLIC4 as bait, Suginta et al. probed cytosolic rat brain extracts and identified several cytoskeletal and scaffolding proteins that interact either directly or as part of a complex with CLIC4 42. The proteins that bound to CLIC4 in this assay included the cytoskeletal proteins α-actin, β-tubulin and dynamin I, the scaffolding proteins 14-3-3ζ and 14-3-3ε, and the creatine kinase β-chain42. Using gel overlay and reverse pull-down assays, dynamin I and 14-3-3ζ were shown to bind CLIC4 directly, whereas direct binding was not detected for α-actin, β-tubulin, 14-3-3ε and creatine kinase42.

The dynamins form characteristic ring and spiral homo-oligomers and are involved in membrane remodelling; they appear to be force-generating mechanoenzymes capable of constricting a membrane within the ring or spiral61. Dynamins have been implicated in endocytosis, caveolae internalisation and membrane trafficking from late endosomes and Golgi61. The 14-3-3 proteins are a family of small 27-32 kDa scaffold proteins that bind primarily to sequences containing phosphorylated serine/threonine residues in a wide variety of ligands62. As opposed to typical scaffolding proteins that are thought to simply mediate interactions between various ligands, in some cases the 14-3-3 proteins directly regulate their targets by altering their function when bound62. They regulate a range of physiological processes including intracellular signalling, cell cycling, apoptosis and transcription regulation62.

CLIC5A was discovered due to its interaction with cytoskeletal proteins. CLIC5B was first identified as interacting with the scaffolding protein AKAP350. Shanks et al. located a clone containing CLIC5B in a yeast two-hybrid screen, performed using a segment of Protein kinase A-anchoring protein 350 (AKAP350)20. The AKAPs bind to the regulatory subunits of the type II cAMP-dependent protein kinase A (PKA), thus tethering the inactive holoenzyme at discrete locations within the cell59. This prepares the kinase to phosphorylate local substrates in response to activation by local cAMP generation or release from intracellular stores. There is also evidence that such scaffolding proteins are able to coordinate multiple regulatory enzymes and their substrates into multivalent signalling complexes59.

AKAP350 is one of several protein isoforms derived from the same gene. It has a broad tissue distribution and is localised within the cell to the centrosome and Golgi apparatus59. AKAP350 is able to bind multiple signalling proteins. It has two putative binding sites for the type II PKA regulatory domain as well as putative binding sites for protein phosphatase 1, protein kinase N, protein phosphatase 2A, protein kinase Cε and phosphodiesterase 4D3 20. CLIC5B, CLIC1, CLIC4, CLIC5A and rabbit CLIC6 all interacted in yeast two-hybrid binary assays with the same segment of AKAP350 20, which suggests that the CLIC-binding domain of AKAP350 is not specific to any one family member. The last 120 amino acids of CLIC5 were seen to be necessary for interactions with AKAP350 in these binary assays20. CLIC4 has been shown to co-localise with AKAP350 at the centrosome and midbody of cultured cells43, while CLIC5 co-localises with AKAP350 at the Golgi20. These results suggest different CLIC-family members may associate with AKAP350 at different cellular locations.

CLICs 1, 4, 5 and 6 are thus known to associate with various cytoskeletal and related proteins. Whether CLICs 2 and 3 are unique within the family in not doing so is unknown. They are currently the two least studied CLIC family members, thus absence of data in this regard may represent a real lack of interaction or simply be due to current experimental biases.

What could the significance of the interaction between the CLICs and these cytoskeletal and scaffold proteins be? It is noteworthy that several of the interactions discussed above involve proteins directly associated with membranes. This evidence is not contrary to the idea of the CLICs being associated with a channel activity. CLICs 4, 5 and 6 were seen to interact with the membrane-cytoskeleton linking proteins ezrin or radixin. CLIC4 was associated with the membrane-active dynamins, while CLIC6 possibly interacts with a scaffolding protein containing domains known to be involved in the formation of membrane-associated protein complexes.

1.2.8.2 Phosphatases and kinases

Many cellular proteins are phosphorylated as a means of reversible regulation, signal transduction or to alter their function. As such, they will interact with both kinases and phosphatases during their lifetime. The CLIC proteins have multiple potential phosphorylation sites and several family members are known to be phosphorylated in vivo or in vitro.

Duncan et al. have shown in vitro that rat CLIC4 is phosphorylated by protein kinase C (PKC) at up to 4 different sites40. CLIC3, as discussed in Section 1.2.4, was first identified due to its association with a mitogen-activated kinase family member, ERK7, which inhibits DNA synthesis upon activation37. CLIC6 was first discovered as a protein that was phosphorylated in response to stimulation of the gastric glands50, although the identity of the kinase involved is not known.

Edwards et al. measured chloride efflux from vesicles prepared from the cell membranes of cells expressing bovine p64; chloride channel activity was seen to increase when the membranes were treated with alkaline phosphatase63. In a later paper the same group showed that bovine p64 was phosphorylated within its N-terminus by p59fyn, a Src family tyrosine kinase24. Chloride efflux rates were higher in vesicles formed from membrane preparations of cells which co-expressed both p64 and p59fyn than in those from cells expressing either alone. This increased chloride efflux could be reversed by treatment with alkaline phosphatase24. Reconciliation of the results of these two papers suggests that p64 has multiple phosphorylation sites, which can either enhance or suppress channel activity.


Scaffold proteins recruit multiple signalling elements at the same site of action. Given the ubiquitous use of phosphorylation as a means of signal transduction within the cell, many scaffold proteins have multiple ligand binding sites for various kinases and phosphatases as well as their substrates. This includes those scaffolding proteins that bind the CLICs discussed above in Section 1.2.8.1. Based on the relationship between 14-3-3 proteins and a testis specific isoform of protein phosphatase 1, PP1γ2, Myers et al. demonstrated that CLICs 1, 4 and 5 were all able to interact with PP1γ2 in pull-down assays33.

From the currently available data, it is difficult to draw any real conclusions from the phosphorylation-related CLIC interaction partners. If the CLICs are associated with an ion channel activity, they would be expected to be under tight regulatory control mechanisms perhaps including phosphorylation at various sites. However the ubiquitous use of protein phosphorylation in cellular regulatory mechanisms makes it impossible to guess at the function of a protein without a more detailed knowledge of the specific kinases and phosphatases involved.

1.2.8.3 Unknown and miscellaneous proteins

As well as the CLIC interaction partners easily classified into one of the two groups already discussed, several reports detail interactions between the CLICs and a variety of proteins whose functions currently appear unrelated. These proteins are mainly identified in reports where the interaction concerned is not the main thrust of the article. With further experimentation these interactions may be seen to be experimental artefacts; specific to one CLIC family member; indicative of the function of the CLICs; or alternatively a precursor to the discovery of a more fundamental set of interactions between the CLIC family and that of the respective interaction partner. These proteins are discussed briefly below and summarised in Table 1-7.

In yeast two-hybrid systems Fan et al. showed that CLIC1 and CLIC2 both interact with Sedlin35. The role of Sedlin within the cell is largely unknown; its homologue in yeast is a subunit of the transport protein particle (TRAPP) complex that is involved in vesicle docking and fusion processes during ER-to-Golgi transport35. Griffon et al. used yeast two-hybrid screens to show that CLIC6 was able to interact with the C-terminal tail of the dopamine receptors D2R, D3R and D4R; the interaction with D3R was confirmed using a pull-down assay54. These dopamine receptors are all integral membrane proteins. If they associate in vivo with the CLICs this implies that these CLIC proteins must also localise to common membrane systems.

Table 1-7 Miscellaneous CLIC interaction partners. CLIC interaction partners not classified as either cytoskeletal/scaffold proteins or kinases/phosphatases.

CLIC family member | Interaction partner | Reported by | Interaction determined by | Function | Possible reason for interaction
hCLIC1 | Sedlin | Fan et al. 2003 | Yeast two-hybrid | Part of TRAPP complex | ?
hCLIC2 | Sedlin | Fan et al. 2003 | Yeast two-hybrid | Part of TRAPP complex | ?
hCLIC6 | D2R, D3R and D4R | Griffon et al. 2003 | Yeast two-hybrid and immunoprecipitation for D3R | Dopamine receptors | ?
mCLIC4 | Importin-α | Suh et al. 2003 | Immunoprecipitation | Part of nuclear import complex | CLIC4 translocates to the nucleus after cellular stress
mCLIC4 | Ran | Suh et al. 2003 | Immunoprecipitation | GTPase involved in importin cargo release during nuclear import | CLIC4 translocates to the nucleus after cellular stress
mCLIC4 | NTF-2 | Suh et al. 2003 | Immunoprecipitation | Nuclear transport factor | CLIC4 translocates to the nucleus after cellular stress
hCLIC1 | Lsm1 | Lehner et al. 2004 | Yeast two-hybrid | mRNA decapping and degradation | ?
hCLIC1 | Chromosome 9 ORF 75 | Lehner et al. 2004 | Yeast two-hybrid | ? | ?

Abbreviations as used in Table 1-2.

In a paper by Suh et al. detailing the translocation of cytoplasmic CLIC4 to the nucleus in response to cellular stress, CLIC4 was seen to interact with members of the importin-α nuclear import complex including Ran, NTF-2 and Importin-α itself45.

Finally, in a large-scale yeast two-hybrid screen designed to find interaction partners of proteins encoded by the MHC class III region, CLIC1 was seen to interact with Lsm1, an mRNA decapping and degradation protein, and with a protein of unknown function (chromosome 9 ORF 75)64.

1.3 Evidence for the CLICs as chloride channels

It has been proposed that the CLIC family of proteins themselves act as, or regulate, chloride channels. There are several reviews that summarise in detail the current scientific evidence supporting this hypothesis7-9,65. The following section gives a brief overview.

The initial hypothesis that the CLIC proteins may act as, or regulate, chloride channels first arose because p64 bound to an IAA-23 affinity column14,17. IAA-23 is a homologue of indanyloxyacetic acid 94 (see Figure 1-3), a compound that had previously been shown to be a moderate affinity inhibitor of a chloride conductance present in membrane extracts from bovine kidney cortex or trachea16. The involvement of p64 in this chloride conductance was further supported when it was shown that anti-p64 antibodies were able to deplete the bovine membrane extracts of this channel activity18. Since these first studies, several reports have attempted to show that various CLIC family members are directly associated with chloride conduction. A few different experimental systems have been used; the results of these are summarised in Table 1-8.

In patch clamp studies, cells expressing recombinant CLIC1 or CLIC4 show increased channel activity at the plasma membrane compared to controls11,25,28,48,66. In several of these reports48,66 a direct link between the CLIC protein and the increased conductance was demonstrated, as channel activity was blocked by immunodepletion with antibodies raised against the CLIC protein being studied or an added epitope. In other reports, the endogenous channel activity measured in different cell lines was thought to be CLIC-related, as it displayed channel characteristics and inhibitor profiles similar to those observed in cells expressing recombinant CLIC1 67.

In such whole-cell based channel measurements it is not possible to control all the experimental parameters. Therefore more extensive channel characterisation studies have been performed for some CLICs using simplified experimental systems. These use either membrane extracts derived from cells expressing recombinant protein, or alternatively, use highly purified bacterially expressed protein. The membrane extract or purified protein is then incorporated into planar lipid bilayers11,63 or artificial vesicles in which chloride efflux12,13,51 or uptake14,18 is measured. The majority of these reports detail the behaviour of the human CLIC1 family member, thus it is the best understood within the family. From these experiments it has been shown that recombinantly expressed and purified CLIC1 appears to be sufficient for the formation of IAA-94 inhibitable chloride channels in vitro11-13. The channels thus formed have properties similar to those measured on the plasma membrane of cells expressing recombinant CLIC1 66.

Despite the many papers looking at the possibility that the CLICs act as chloride channels, it remains highly controversial whether this is actually the case. Doubt arises because many aspects of the CLIC proteins appear to be unconventional for ion channels. Several criteria are required to be satisfied when proving a protein is an ion channel; a few of these have yet to be conclusively proved for the CLICs. This is succinctly presented in a paper reviewing the CLIC family by Ashley7, who stated that proteins forming “a transmembrane pore from a single subunit or homo-oligomers must:

1. be able to adopt a structure compatible with a transmembrane, channel-forming protein;
2. form ion channels when expressed in cells (especially cells that normally lack the protein);
3. form identical channels when incorporated as a pure protein into liposomes or planar bilayers;
4. show altered function (e.g., differences in ionic selectivity) when critical pore-forming regions are modified (e.g., by mutagenesis).”7

The CLICs have been shown to satisfy the second and third criteria but are yet to pass the first and fourth.

Several independent groups have shown, to different extents, that CLIC family members are able to satisfy the second criterion. However, the conclusiveness of these experiments is difficult to judge. Most, if not all, cells appear to express multiple CLIC family members endogenously, thus in such experiments channel activity is usually analysed as an increase compared to controls, as opposed to the more ideal case where a cell-line lacking endogenous CLICs would allow for controls without any such endogenous channel activity. Attributing the observed increase in channel activity directly to the CLICs is also difficult. The most convincing cell-based experiments utilise antibody immunodepletion to show a direct relationship between the CLIC protein being recombinantly expressed and the increase in channel activity. However, such experiments are in the minority, and others typically rely on the inhibition of channel activity by IAA-94 to draw a link to the founding CLIC family member p64. As discussed below, IAA-94 is a highly non-specific blocker and, alone, it is not suitable for defining the molecular identity of channels in in vivo systems. One other minor concern in establishing this criterion for the CLIC family is the current lack of agreement in the CLIC-related channel characteristics produced by independent groups. The conductance, ion specificities, opening/closing times and probabilities as well as the channel’s inhibitor profile differ considerably in observations by independent groups. It has been suggested that this may be due to differences in the experimental conditions used by different authors, or alternatively, in the oligomerisation state of the channel being measured66. It is important to resolve this problem by defining a base set of conditions used by all groups that would allow for direct comparisons to be made between the behaviour of the same protein in different laboratories.

Two independent groups have results applicable to the third criterion. These show that purified CLIC1 is capable of forming channels when incorporated into artificial lipid bilayers11 or liposomes12,13. Both groups confirm that CLIC1 alone is sufficient to form ion channels in vitro, and the channels thus formed display many common characteristics with those observed in cells expressing CLIC1.

Table 1-8 Chloride conductances reported to be associated with the CLIC family.

Report | Experimental systems | CLIC | in vitro/in vivo | Rectifying? | Conductivity | Po/τo/τc | Selectivity profile or other notes | Inhibitor profile
Landry et al. 1987 | 36Cl uptake in vesicles prepared from bovine kidney or trachea membrane extracts | bCLIC5B? | Cell extract | N/A | N/A | N/A | Uptake rate of tracer: TcO4- > I- > SCN- > Br- > Cl-. Inhibition of 36Cl uptake: ClO4- = SCN- = I- > Br- > Cl- > NO3- > F- > formate = acetate > gluconate | IAA-94 > 130B = NPPB > DIDS > EA > Bumetanide
Landry et al. 1989 | IAA-23 affinity purified protein incorporated into planar lipid bilayers | bCLIC5B | Cell extract | - | 26, 100 and 400 pS | - | Cl-:K+ selectivity of: 26 pS at 20:1, 100 pS at 13:1, 400 pS at 2-5:1 | IAA-94 insensitive
Schlesinger et al. 1997 | DIDS affinity purified chicken osteoclast membrane extracts incorporated into planar bilayers | gCLIC5B? | Cell extract | Outward | 15-25 pS (140 mM KCl) | - | - | DNDS
Edwards et al. 1998 | Transfected HeLa cell extracts incorporated into planar bilayers | bCLIC5B? | in vivo | Outward | 42 pS (140 mM KCl:140 mM CsCl) | - | Cl-:K+ selectivity is greater than 20:1 | DNDS, TS-TM calix(4)arene
Proutski et al. 2002 | Patch clamp of transfected HEK-293 cells | hCLIC4? | in vivo | Outward (mild) | 1 pS | - | - | IAA-94 (IC50 ~ 200 μM)
Tonini et al. 2000 | Patch clamp of transfected CHO-K1 cells, single channel conductance | hCLIC1? | in vivo | Outward | 9 pS (20 mM KCl), 11 (50 mM KCl), 17 (140 mM KCl) | At 0 mV potential τo = 8.1 ms, τc = 4.3 ms | - | IAA-94, A9C, not DIDS
Tulk et al. 2000 | Chloride permeability of vesicles after addition of purified protein | hCLIC1 | in vitro | N/A | 200 mM KCl:400 mM sucrose | - | - | IAA-94 (IC50 ~ 8.6 μM)
Tulk et al. 2000 | Protein added directly to planar lipid bilayers | hCLIC1 | in vitro | Inward and Outward | 161 pS (300 mM KCl), 68 pS (150 mM KCl), reversal 24 mV (300-50 mM) | Po = 0.699 | Br- ~ Cl- > I-, Cl-:K+ is 4.2:1 (300-50 mM KCl) and 5.8:1 (150-50 mM KCl) | IAA-94 (IC50 ~ 86 μM) at 50 mV
Tulk et al. 2002 | Chloride permeability of vesicles after addition of purified protein | hCLIC1 | in vitro | N/A | - | - | Highest activity at pH 5 or 9, lowest at pH 7 | IAA-94, NEM, GSH, GSSG
Tulk et al. 2002 | Protein added directly to planar lipid bilayers | hCLIC1 | in vitro | - | 150 pS (300 mM KCl) | - | - | IAA-94
Warton et al. 2002 | Patch clamp of transfected CHO-K1 cells, single channel conductance | hCLIC1? | in vivo | - | 30 pS | Po = 0.532 | - | -
Warton et al. 2002 | Protein added directly to lipid bilayers | hCLIC1 | in vitro | slight | SCSK: 8, 14, 21 and 30 pS (140 mM KCl); HCFK: 31 pS | Po = 0.478, τo = 250 ms, τc = 160 ms | Highest probability of observing channels at low pH (<6.0) | IAA-94 (IC50 ~ 25 μM)
Valenzuela et al. 1997 | Patch clamp of transfected CHO-K1 cells, single channel conductance | hCLIC1? | in vivo | - | 22 pS | Po = 0.19 | Permeability sequence SCN- > F- > Cl- > NO3- > I- > HCO3- > acetate | -
Valenzuela et al. 2000 | Patch clamp of transfected CHO-K1 cells, single channel conductance | hCLIC1? | in vivo | - | 7.8 pS | Po = 0.478, τo = 250 ms, τc = 160 ms | - | IAA-94, A9C, not DIDS
Norvarino et al. 2004 | Patch clamp of rat microglial cells, single channel conductance | hCLIC? | in vivo | - | 7.1 pS, reversal potential -81 mV | Po = 0.24, τo = 4.2 ms | - | IAA-94
Norvarino et al. 2004 | Patch clamp of BV-2 cells, single channel conductance | hCLIC? | in vivo | - | 6.4 pS, reversal potential -85 mV | Po = 0.27, τo = 4.5 ms | - | IAA-94
Berryman et al. 2004 | Chloride permeability of vesicles after addition of purified protein | hCLIC5A | in vitro | - | - | - | - | IAA-94

Abbreviations as used in Table 1-2.


The controversy over the behaviour of the CLICs as ion channels centres on the current lack of data to address the first and fourth criteria. On the face of things, it seems as if the nature of the CLICs is incompatible with that of a transmembrane, channel-forming protein. The CLICs appear to exist in both soluble and membrane-associated forms. Thus if the CLICs themselves are ion channels, they require a membrane-spanning form as opposed to being a peripheral membrane protein. Such a transformation between a soluble form and a transmembrane form would require a protein to undergo a large structural change. It is now known that the soluble form of the CLICs adopts a GST fold (discussed further in the following section), a common fold that has not previously been known for large-scale structural changes. As well as this, there are no obvious stretches of hydrophobic residues in the CLICs predicted to reside within the membrane. Despite these facts several hypotheses on how the CLICs may insert into the membrane have been proposed, but with our current level of knowledge it remains difficult to reconcile the experimental ion channel data with the amino acid sequence and soluble crystal structure of the CLICs. Until it can be determined what sort of membrane-spanning structure the CLICs form, our understanding of how they could function as ion channels will be hampered. The lack of experiments addressing the fourth criterion arises from the lack of understanding of the CLICs' transmembrane form. With no obvious models from which to work, few mutants designed to alter ion channel characteristics have been made or tested. Experiments on such mutants are necessary; however, it has been suggested that studies further characterising the native channel may be a first priority7.

The ion specificity and inhibitor profile of a channel are important defining characteristics. As there are only a few reports detailing the ion permeability sequence of the CLIC-related channels13,16,25,63 this does not yet provide a reliable means of identifying them. In contrast, there are many reports detailing the behaviour of CLIC-related channel activity in response to a variety of inhibitors. Due to the pivotal role it played in identifying p64 this usually includes IAA-94, but has also included A9C28,66, NPPB68, DNDS63,69 and TS-TM-calix[4]arene63. Some reports have observed inhibition with DIDS17,69 while others have not28,66.


Figure 1-3 Chemical structures of the inhibitors discussed in the text. From top left to bottom right: the chemical structures of the indanyloxyacetic acid (IAA) inhibitors IAA-94, indacrinone (IAA-91/92), IAA-23, DCPIB and DIOA, as well as the related compound ethacrynic acid (EA); the inhibitors anthracene-9-carboxylic acid (A9C), [5-nitro-2-(3-phenylpropylamino)-benzoic acid] (NPPB) and the sulphonated calixarene TS-TM-calix[4]arene; and the related disulphonic stilbenes DNDS and DIDS.

There is an obvious temptation to treat IAA-94 as a specific CLIC-related channel inhibitor, because it was originally used in the identification of the first CLIC family member and there are few reports of cross-reactivity with other anion-channels. It is extremely important that this is not done. A full profile of well-understood inhibitors is required before the CLIC-related channels can be reliably identified through such means. The behaviour of IAA-94 is not well understood, and the lack of reported cross-reactivity with other anion channels is most likely simply due to a paucity of such studies. Indeed the few studies that have looked at the behaviour of IAA-94 seem to indicate it is anything but specific. The indanyloxyacetic acid inhibitors were originally studied because they were cyclised versions of the (acryloylphenoxy) acetic acid class of loop diuretics and were unreactive against free sulphydryl groups70,71. The most well known of the (acryloylphenoxy) acetic acid class of loop diuretics is ethacrynic acid (see Figure 1-3). Ethacrynic acid is known to inhibit a wide variety of proteins including GSTs72-74, Na+/K+/2Cl- cotransporters75 and organic anion transporters76. The broad specificity of EA requires a cautious interpretation be made of experimental results attained using the related indanyloxyacetic acid inhibitors, although it is noted that the reduced activity of this group of compounds towards free sulphydryl groups may limit their cross-reactivity.

Several compounds structurally similar to IAA-94, including DIOA77 and DCPIB78 (see Figure 1-3), are known to inhibit swelling-induced chloride currents, possibly through blocking various cotransporter proteins. In erythrocytes, IAA-94 itself has been shown to inhibit the activity of the Na+/K+/2Cl- cotransporters (NKCC) and the Band 3 anion exchangers (AE)79. Both these transporter families are involved in cell volume homeostasis, the NKCC in the response to hypo-osmotic solutions80 and the AE in the response to hyper-osmotic solutions and the maintenance of intracellular pH81. Additionally, as described in Section 1.2.1, the affinity column used to identify p64 bound the α subunit of the Na+/K+-ATPase, a π class GST and an unknown 40 kDa protein18. If IAA-94 inhibits these proteins as well, it may affect the Na+ and K+ gradients across the plasma membrane that are maintained by the Na+/K+ pump. It may also affect the redox state of the cell by inhibiting some GSTs. Treating cells with IAA-94, especially in non-isotonic solutions, is thus likely to lead to changes in cell volume, and hence membrane curvature, intracellular osmolarity and pH, and possibly intracellular redox state. In whole-cell recordings these are relatively large-scale secondary effects that could affect many endogenous proteins, including ion channels. As such, IAA-94 is likely to be a poorly specific inhibitor and, by itself, not suitable for the molecular identification of a channel conductance. It is noted that such secondary effects are likely to be reduced in inside-out and outside-out patch clamp set-ups where the solutions on the cis and trans side of the membrane are controlled by the experimenter. These secondary effects would presumably not be a problem in artificial lipid systems using purified protein. Such experiments could therefore be useful in determining an inhibitor profile of the CLICs, which could then be extended back to the whole cell channel recordings to elucidate what is occurring in vivo.

1.4 The Glutathione Transferase family

1.4.1 Introduction

A structural relationship between the CLIC proteins and the canonical Glutathione S-Transferase (GST) fold family was first proposed by Dulhunty et al.4. This was subsequently confirmed when the crystal structure of CLIC1 was solved3 (discussed further in Chapter 2). The GST fold family is an extensive, evolutionarily ancient protein superfamily with members that perform a wide range of functions. To date its members have been found in all fungi, plants and animals as well as some bacteria. Despite their functional versatility and a relatively low sequence identity, the canonical GSTs all share a common structural fold. Due to their relationship to the CLIC proteins and for the purpose of structural comparisons, Section 1.4 provides an overview of the GST superfamily and a brief description of the canonical GST fold, which has relevance for analysing the structure of the CLICs.

Organisms are exposed to numerous endogenous and exogenous hydrophobic compounds that are non-nutritional or perhaps toxic; these must be rendered hydrophilic prior to being eliminated. At the cellular level the enzymes responsible for metabolising such compounds are usually classified as either phase I (activation), phase II (detoxification) or phase III (elimination) enzymes. The glutathione transferases are primarily known as a major group of phase II detoxification enzymes responsible for the conjugation of reduced glutathione (GSH) to electrophilic centres in hydrophobic molecules5. The glutathione conjugates thus formed are more hydrophilic, aiding in their elimination from the cell via phase III enzymes6.

There appear to be three evolutionarily distinct groups of glutathione transferases. The majority of these are soluble cytosolic enzymes sharing a common fold; these are defined as the canonical GSTs. There also exists a separate microsomal membrane-associated enzyme family, while sequence analysis82 and a recent crystal structure83 showed that the soluble mitochondrial kappa-class GSTs are distinct from the canonical GST family and instead appear to be related to Disulphide bond isomerase A (DsbA), thus meriting their own subgroup. The CLICs are members of the canonical GST fold family, so all further discussions will be limited to this group.

1.4.2 The GST fold family

The GST fold is evolutionarily ancient. It was first identified in the glutathione transferases, primarily because of their role in detoxification pathways and thus activity towards experimentally added exogenous compounds. However, it is increasingly becoming apparent that these glutathione transferases are only a subset of a larger family of proteins containing the canonical GST fold, many of which utilise glutathione as a substrate or co-factor for a multitude of functions. From sequence conservation alone it is difficult to define exactly how large this GST-fold superfamily is. Even the glutathione transferases themselves display a relatively low level of sequence conservation. Among members of the larger superfamily, where functions also diverge, this level of conservation will be lower still.

As well as a role in detoxification pathways, the canonical GST fold superfamily includes proteins that are responsible for a variety of other functions. An article by Sheehan et al. summarised some of the following alternatives to glutathione transferase activity: 1. playing a role in the cellular response to oxidative stress via the removal of reactive oxygen species and regeneration of S-thiolated proteins; 2. catalysing glutathione-dependent reactions in metabolic pathways not associated with detoxification84; 3. binding hydrophobic ligands as substrates in non-enzymatic roles; 4. performing various structural functions; 5. finally, several GST-like proteins are thought to play regulatory roles5,85,86.

Compared to the GSTs, which are dimeric, the CLICs are monomeric with an active site cysteine and a low affinity for glutathione. These factors are more commonly found among other members of the larger GST fold superfamily than in the glutathione transferases. Indeed it has been noted that the CLICs share more similarity to the plant GST-like glutathione-dependent dehydroascorbate reductases (DHARs) than to the human omega class GSTs87 (see Figure 1-4).

Through the recent expansion in whole-genome biology, an abundance of proteins with sequence similarity to various GST classes have been identified. Many of these proteins have unknown functions, or do not display classical glutathione transferase activity. Also, as evidenced by the structure of E. coli Glutaredoxin 2 (Grx2)88, proteins retaining little to no sequence identity with the classical GST classes may nonetheless still possess a GST fold. It is possible that an ancestral version of this fold evolved around the time of the advent of the glutathione redox system. The progeny of this ancestral protein may have radiated in many different directions, resulting in a variety of protein families that perform a wide range of redox-related roles. Depending on the age and extent of this radiation, there may be little to no similarity between some branches of the extant progeny of this ancestral protein.

Figure 1-4 Phylogram of proteins with the GST fold containing an active site cysteine. The phylogram shows protein families related to the CLIC proteins. Most of the proteins in the phylogram contain an active site cysteine residue. Proteins are labelled ‘monomeric’ or ‘dimeric’ based on data reported in published characterisations. PSI-BLAST was used to collect protein sequences that were more closely related to the CLICs than the GSTs, with only the omega class GSTs included (note that some of the sequences are labelled as ‘GST’ even though their activity may not be known). The sequences were aligned with Clustal W120 and a neighbour-joining phylogenetic tree constructed.

It is necessary to note that identification through sequence similarity alone is inherently biased towards finding proteins that have maintained similar functions or diverged only recently from a common ancestor. These biases may have confined our view of the GST fold, and it is possible that the number of proteins containing this fold is presently underestimated. This is shown schematically in Figure 1-5. The majority of characterised proteins with the GST fold are glutathione transferases; these are soluble dimeric proteins typically containing a conserved serine or tyrosine within their active site5. However, the bacterial β and ω class GSTs instead contain an active site cysteine residue that can form a mixed disulphide with glutathione5. It is noted, however, that neither of these protein families appears to function as classical glutathione transferases49,89. The plant DHARs87, some bacterial glutaredoxins88 and the CLIC-like proteins contain similar cysteine residues within their active sites but, unlike these and all other GSTs, they are predominantly monomeric. It seems likely that this subgroup of cysteine-containing monomeric GST-fold proteins is much larger than is presently known. The functions of the protein families known to belong to this subgroup are also poorly understood. If the vertebrate CLIC proteins are shown conclusively to be involved in chloride conduction, it would be interesting to know how many proteins with a GST fold have similar functions. Does such activity extend to the non-vertebrate CLIC-family members, to other monomeric cysteine-containing protein families, or even to some of the dimeric cysteine-containing proteins currently classified as GSTs?

Figure 1-5 outlines the subgroups within the GST fold superfamily, but has been simplified for the sake of clarity. In reality the borders between the subdivisions are less distinct; some of the monomeric protein families presumably oligomerise in response to various signals, and vice versa for the dimeric proteins. Stringent starvation protein A (SspA), for example, ran as a monomer in gel filtration chromatography91 but crystallised as a GST-like dimer92. The organism identifiers are also not exclusive, but merely represent the lineage in which each protein family is best understood. Some of the proteins within various GST classes are not necessarily glutathione transferases. Also, some glutathione transferases may have more than one function, acting also in non-conjugating roles5. Members of the same protein family can sometimes perform alternative functions in different organisms. For example, in insects members of the σ class GSTs show glutathione transferase activity towards lipid peroxidation products94, while in cephalopods they act as eye crystallins, and in a variety of animals as prostaglandin isomerases5.


Figure 1-5 Simplified overview of the GST fold superfamily. As depicted, proteins with glutathione transferase activity are only a subset of the proteins containing the GST fold. The small circles shown above represent the individual protein families with which they are labelled. The large black circle represents the proteins with the canonical GST fold. The red oval encloses the subset of these proteins that are predominantly monomeric, the magenta oval the subset that are predominantly dimeric. Those proteins within the green oval have an active site cysteine. The blue oval represents the subset of proteins with glutathione transferase activity. Oligomerisation or activity data for the various protein families is from: Grx288, CLICs3, DHAR87,90, SspA91,92, Ure2p93, λ class GSTs87, data for other GST classes reviewed by Sheehan et al.5.

Due to the historic focus on the glutathione transferases much more is known about their structure and physiological function than for other canonical GST fold superfamily members. Drawing on this larger knowledge base, the remainder of the review presented here focuses on the GSTs.

1.4.3 The canonical GSTs

The canonical GSTs were originally identified in mammals due to their ability to conjugate the tripeptide glutathione (γ-glutamyl-cysteinyl-glycine) to electrophilic centres in various hydrophobic compounds, particularly the model coloured substrate 1-chloro-2,4-dinitrobenzene (CDNB). In mammals they are classified into various evolutionary classes, now primarily based on sequence identity. The various classes have different, but often overlapping, substrate affinities towards a wide range of hydrophobic compounds. Each enzyme can be active towards a variety of molecules. Our knowledge of such substrates is surely incomplete, and the activity of some GST classes is completely unknown.

The canonical GSTs are predominantly dimeric soluble cytosolic proteins with molecular weights of 24-28 kDa per subunit. Both homodimers and heterodimers can be formed, but subunits are only capable of hybridising within the same class.

For the mammalian GSTs the naming scheme devised by Mannervik et al. is used95. Each evolutionary class is given a Greek letter and the isoforms within each class a number. A GST heterodimer with subunits consisting of the π class isoforms 1 and 2 is thus referred to as GST P1-2, a homodimer of ζ class isoform 1 subunits as GST Z1-1, and so on. Enzymes within each GST class share high sequence identities of around 60-80%, while a separate class is normally defined when the identity is less than 30%95. This naming scheme can also be extended to the non-mammalian GSTs5. Mannervik originally organised mammalian GSTs into the α, μ, and π classes96. Additional GST classes were subsequently identified: some had originally been overlooked due to their low affinity towards GSH or low activity towards the classical substrate CDNB; others were identified in non-mammalian organisms, with homologues only recently identified in mammals; and several GST classes are specific to plants, insects or bacteria5.
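The naming convention described above can be illustrated with a short sketch. This is not part of the published nomenclature; the function names are hypothetical, and the Latin letters assigned to classes other than π (P) and ζ (Z) are assumptions based on common usage. The identity thresholds simply restate the approximate figures quoted in the text and leave the intermediate range undecided.

```python
# Illustrative sketch of the GST naming convention described in the text.
# Only the pi -> P and zeta -> Z letters are taken from the text; the other
# letters are assumed from common usage.
CLASS_LETTER = {
    "alpha": "A", "mu": "M", "pi": "P", "theta": "T",
    "sigma": "S", "omega": "O", "zeta": "Z",
}


def gst_dimer_name(gst_class, isoform_a, isoform_b):
    """Return a name such as 'GST P1-2' for a dimer within one class."""
    letter = CLASS_LETTER[gst_class]
    first, second = sorted((isoform_a, isoform_b))
    return f"GST {letter}{first}-{second}"


def same_class_by_identity(percent_identity):
    """Rough rule of thumb: True at or above ~60% identity, False below
    30%, and None (undecided) in between."""
    if percent_identity >= 60:
        return True
    if percent_identity < 30:
        return False
    return None


print(gst_dimer_name("pi", 1, 2))    # heterodimer of pi isoforms 1 and 2
print(gst_dimer_name("zeta", 1, 1))  # homodimer of zeta isoform 1
```

Because subunits hybridise only within a class, the sketch takes a single class argument and two isoform numbers rather than two independent subunit names.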

There are currently seven canonical GST classes in mammals (α, μ, π, θ, σ, ω and ζ), with additional classes found exclusively in plants (ϕ, τ and λ), insects (δ) or bacteria (β)5. The functions of the proteins assigned to the various mammalian GST classes, and thus the classes themselves, arose at different points in evolution. Some idea as to when this occurred can be gained by examining the clades of extant organisms in which homologues are found. The GST θ class is one of the most ancient, predating the evolution of multicellular animals; homologues are found in insects, plants and yeast as well as some bacteria5. The ζ class GSTs are found in both animals and plants, while the σ and ω classes arose early in the evolution of animals and are known in both vertebrates and insects5. The α, μ and π classes appear to be a recent innovation, and are probably confined to either the mammalian or vertebrate lineages5.

As will be discussed later in the chapter, when talking about the different GST classes it must be kept in mind that sequence or structural similarity is not always indicative of common function. It is known that a number of proteins within some GST classes do not necessarily display glutathione transferase activity5. Furthermore, the activity of some entire GST classes is also unknown and, while these are predominantly recently discovered classes, it may yet be proven that some of these are not glutathione transferase classes at all. Such “GSTs” awaiting further characterisation include the bacterial beta class89, the human omega class49 and the plant lambda class87.

1.4.4 The GST fold

Figure 1-6 Domain and motif hierarchy of the GST fold. Structural hierarchy of human alpha-class GST A1-1, pdb code 1GSE74. The secondary structure referred to at each tree branch encompasses the structural elements with solid shading. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

Despite low sequence similarity between the canonical GST classes, all adopt a similar fold. Each GST subunit consists of two distinct domains (Figure 1-6), an N-terminal domain similar to the thioredoxin fold and an all α-helical C-terminal domain connected via a short loop. The N-terminal thioredoxin fold consists of four β-strands with three flanking α-helices arranged as βαβαββα (reviewed by Martin97). The thioredoxin fold occurs in several enzyme families that have evolved to bind cysteine or cysteine containing polypeptides such as GSH. As well as the GSTs, this includes DsbA, thioredoxin, glutaredoxin and glutathione peroxidase5,6.

A cartoon version of human thioredoxin is shown in Figure 1-7, in which the typical elements of the thioredoxin fold are evident. This fold consists of approximately 80 amino acids, with distinct N-terminal and C-terminal motifs connected via a short linking α-helix (Figure 1-6).

Figure 1-7 Structure of human thioredoxin. Cartoon representation of reduced human thioredoxin (PDB code 1ERT)98. The first β-strand and α-helix are unlabelled and shown as semi-transparent as they are ancillary to the components of the thioredoxin fold; they are not contained in the GST fold. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

Together the four β-strands form a roughly planar mixed β-sheet, with two helices (h1 and h3) running parallel to the β-strands on one side of the plane plus a motif-linking helix (h2) running perpendicular to the β-strands on the alternative side of the plane. The loop that connects the linker helix (h2) to β-strand 3 contains a characteristic proline residue that is in the less-favoured cis conformation. This cis-proline residue is highly conserved in the thioredoxin fold as well as in all known canonical GSTs and other members of the GST fold family including the CLICs.

The N-terminal motif begins with a β-strand (s1), followed by an α-helix (h1) and a second β-strand (s2) parallel to the first forming a βαβ arrangement. This leads into a short loop region and then a second α-helix whose position with respect to the other elements within the fold can vary. The C-terminal motif consists of two sequential antiparallel β-strands (s3 and s4) connected via a β-hairpin turn, which are followed by a final α-helix at the C-terminus resulting in a ββα arrangement5.

Figure 1-8 Structure of the GST C-terminal domain. Labelled structure of the C-terminal domain of the human alpha-class GST A1-1, pdb code 1GSE74. The C-terminal domain is shown with solid colouring, secondary structural elements that are within the N-terminal domain or unique to the alpha-class GSTs are shown as partially transparent. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

The all α-helical C-terminal domain (domain II, see Figure 1-8) varies more considerably among the GST classes than the N-terminal domain (see Figure 1-9). The minimal C-terminal domain contains 5 α-helices, spanning approximately 120 amino acids. Several GST classes (α, θ, τ and ω) also have an extra one or two helices at the C-terminus, while in others one of the 5 common α-helices may be split into two. The C-terminal domain begins with a short class-specific interdomain loop linking to the N-terminal domain, which then leads into the two longest helices, h4 and h5; these run parallel down one face of the molecule and form a large portion of the GST-dimer interface. Following h5, a highly conserved loop structure and N-terminal capping box lead into h6, a helix that forms the core of the C-terminal domain and runs approximately parallel to h4 and h5. Following an extended loop leading into the helix, h7 runs almost perpendicular to h4 and h5 and is amphipathic. The final common helix, h8, varies in length considerably among the various GST classes but is usually only several turns long and runs parallel to h1. Unlike the N-terminal domain, the C-terminal GST domain is not structurally related to any other known proteins; its evolutionary origins thus remain unclear19.


Figure 1-9 Subunit structure in the GST family. Cartoon representations of a single subunit from the various classes of the glutathione transferase family compared to CLIC1. The co-ordinates of the 9 GST structures were superimposed onto CLIC1 and then translated. Secondary structure was assigned according to the criteria of Kabsch and Sander99. References and pdb codes for the GST structures used are: alpha, 1GSE74; beta, 2PMT89; delta, 1PN9100; theta, 1LJR101; mu, 1HNA102; pi, 1GLP103; sigma, 1GSQ104; phi, 1GNW105; omega, 1EEM49; tau, 1GWC106 and CLIC1, 1K0M3. Based on a similar figure on page 6 of Board et al.49 and made using the programs MOLSCRIPT1 and RASTER3D2.

1.4.5 The GST dimer

Most of the canonical GSTs exist as a dimer under physiological conditions and are inactive as a monomer5. The dimer interface is relatively class specific, and has been proposed to be the basis for the tendency of GST heterodimers to form only between isoforms within the same class5. The dimer two-fold axis is enclosed by helices h3 and h4 from both subunits (see Figure 1-10). In most GST classes the interface arises from predominantly hydrophobic interactions and can also include several intersubunit salt bridges5. However, in a few classes, for example the bacterial β class89 and some σ class GSTs5, the interface can be relatively polar. Interactions across the interface occur mainly between residues from the N-terminal domain of one subunit and the C-terminal domain from the other5.

Figure 1-10 Cartoon representation of a pi class GST dimer. Cartoon representation of the dimeric form of human GST P1-1 (pdb code 5GSS72); helices are coloured red in the A subunit and blue in the B subunit. The molecule is shown viewed A: down and B: along the dimer two-fold axis. In the A subunit GSH, and in the B subunit Tyr-49 and Asp-98, are shown in ball and stick representation. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

In the α, μ, π, ζ and some σ class GSTs the dimer interface is characterised by an aromatic “lock and key” interaction5. Here a tyrosine, phenylalanine or methionine “key” residue located in the loop region between h2 and s3 “locks” into a hydrophobic pocket formed between h4 and h5 in the other subunit5 (see Tyr-49 in Figure 1-10A). A disruption of this “lock and key” can lead to a loss of enzymatic activity107.

Residues within the two longest helices in the GST fold, h4 and h5, are involved in a significant portion of the intersubunit interaction. These helices subtend an acute angle with the interface that varies between classes, from nearly parallel in the β class GSTs to 20-30° in other classes, where it results in a solvent-accessible V-shaped wedge (see Figure 1-10B). This trait can be further amplified in GST classes where helices h4 and h5 diverge significantly from a simple straight α-helix5.

1.4.6 GST substrate binding sites

There are two substrate binding sites per GST monomer: the G-site, which is specific for GSH, and the H-site, which in each GST class can bind a variety of electrophilic compounds (see Figure 1-11)5. The G-site is almost exclusively comprised of residues from the N-terminal domain, whereas the H-site is comprised of residues from both domains with the majority arising from the C-terminal one. Across the various evolutionary classes, the GSH binding site shares several common features. Those parts of the protein binding the GSH cysteinyl and glutamyl moieties are more conserved than those interacting with the glycine residue.

The N-terminus of helix 3 is moderately conserved within the GSTs and often contains the consensus sequence [E/Q][S/T]xAIL85, where the Glx residue within the sequence is in the N-cap position and adopts a strained conformation6. In those GSTs containing this sequence the residues within it comprise a large part of their G-site, being involved in binding the γ-glutamyl moiety of GSH (magenta in Figure 1-11A). When these GSTs bind GSH there are several interactions made between residues within the conserved sequence and the amino acid groups of the GSH γ-glutamyl moiety; the amide hydrogen and hydroxyl groups of the serine or threonine residue (Ser-65 in Figure 1-11A) form hydrogen bonds to the two GSH γ-glutamyl carboxyl groups while the Glx residue (Gln-64 in Figure 1-11A) interacts with the GSH γ-glutamyl amino group. Additionally in most GSTs an acidic residue from helix 4 in the other GST dimer subunit also forms a salt bridge to this GSH amino group (see Figure 1-10)5. One of the reasons the glutathione transferases are inactive as monomers is likely to be the loss of this residue from the active site.
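As a rough illustration of how the helix 3 consensus [E/Q][S/T]xAIL might be located in a sequence, the pattern can be written as a regular expression. This is a minimal sketch only: the function name and the test sequence are hypothetical, and a strict pattern match will miss GSTs whose helix 3 deviates from the consensus.

```python
import re

# [E/Q][S/T]xAIL written as a regular expression; "." stands for x (any residue).
G_SITE_CONSENSUS = re.compile(r"[EQ][ST].AIL")


def find_g_site_motifs(sequence):
    """Return (1-based start position, matched residues) for each
    non-overlapping occurrence of the consensus in a one-letter sequence."""
    return [(m.start() + 1, m.group())
            for m in G_SITE_CONSENSUS.finditer(sequence.upper())]


# Hypothetical fragment used purely for illustration, not a real GST sequence.
print(find_g_site_motifs("MKLVQSRAILGG"))
```

In this convention the first residue of each match corresponds to the Glx residue at the N-cap position of helix 3.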

In the G-site of most GSTs** the GSH cysteinyl moiety (orange in Figure 1-11A) forms an anti-parallel β-sheet like interaction with a residue in the loop leading into β-strand 3 (Leu-52 in Figure 1-11A)108. The backbone conformation of this residue is likely to be influenced by the fact that it precedes the cis-proline conserved in proteins containing the thioredoxin fold.

Figure 1-11 The G-site and H-site in a human pi class GST. Cartoon representation of human GST P1-1 complexed with GSH (pdb code 5GSS72) highlighting the active sites within a single subunit. A: Residues within the GST monomer contributing to the GSH binding site in GST P1-1 are shown and labelled. Secondary structural elements in the N-terminal domain are solid, those in the C-terminal domain partially transparent. B: Some of the residues contributing to the hydrophobic substrate binding site are shown in ball-and-stick representation. The C-terminal domain is shown in solid shading, the N-terminal domain as partially transparent. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

The residues in the G-site that bind the GSH glycine residue (cyan in Figure 1-11A) vary throughout the various evolutionary classes. They are usually tryptophan or basic residues from helix 2 that interact with one or both of the glycine C-terminal carboxyl groups.

** Excluding the μ class GSTs.

The sum of these interactions in the G-site ensures an orientation, position and local environment for the sulphydryl group of the GSH cysteinyl moiety that facilitates catalysis. This is thought to be partially achieved through the activation of GSH by lowering the pKa of this sulphydryl group. This occurs through various interactions with the protein that stabilise the GSH thiolate anion form109. These protein interactions are thought to include proximity to the N-terminal helix dipole moment of helix 1 110, and also usually the formation of a hydrogen bond with a class-specific, catalytically essential tyrosine residue in s1 (Tyr-7 in Figure 1-11A) or serine residue in h1 5,6. In the ω49 and bacterial β89 class GSTs, as in the CLIC proteins, a cysteine residue at or near the N-terminus of helix 1 takes the place of this tyrosine or serine; it lies at the centre of the G-site and is capable of forming a mixed disulphide with glutathione. The ζ class GSTs also contain a conserved cysteine residue within their active site; however, this appears unable to form a disulphide with glutathione111.

In contrast to the G-site the hydrophobic substrate binding site within the GSTs is highly varied and rather ill-defined. This reflects the fact that the various classes of GSTs while using GSH as a common substrate, each have enzymatic activity towards a different range of hydrophobic compounds. The H-site thus varies between classes and as each GST is active towards a selection of compounds, the interactions made with the substrate upon binding are relatively non-specific. Indeed in some GST crystal structures hydrophobic substrates or inhibitors can be seen to bind in more than one mode to the same protein73. In general the residues that make up the H-site arise from helix 1 and the preceding loop in the N-domain, and from helix 4 and the C-terminus in the C-domain (see Figure 1-11B)109.

1.5 Conclusions

The CLICs are a family of highly conserved proteins. The founding CLIC family member, p64, co-purified from bovine kidney membrane extracts with a chloride channel conductance. This suggests that it may regulate, or itself be, a chloride channel. Six human CLIC family members are now known, all of which retain high levels of sequence identity to the C-terminal ~230 residues of p64. This conserved region, defined as the CLIC module, adopts a fold similar to that of the canonical glutathione transferases. Most members of the CLIC family consist of only this minimal CLIC module, but CLIC6 and CLIC5B also contain large unrelated N-terminal extensions. Experiments on CLIC family members containing only the minimal CLIC module suggest that the CLIC-associated chloride channel conductance is contained in, or regulated by, this domain and is thus likely to be a common feature throughout the family. If the CLICs do act as chloride channels, they are highly unusual as the GST fold has previously been known mainly from proteins performing enzymatic functions.

In conclusion, there is some evidence the CLIC proteins are themselves ion channels. As they possess the GST fold and no obvious transmembrane segments, at first glance it is difficult to see how this could occur, but our knowledge of the GST fold superfamily is surely incomplete. This thesis will present the soluble reduced monomeric structures of several CLIC family members allowing an analysis of the features that are conserved. A soluble oxidised dimeric CLIC1 structure is also presented. In forming this structure, CLIC1 undergoes a large scale conformational change, demonstrating that this protein family is structurally dynamic. This is the first member of the GST fold superfamily that has been shown to have two such drastically different states. Being structurally dynamic is a criterion that the CLICs must satisfy if they are to act as ion channels as they must insert into the membrane from a soluble state. This thesis therefore removes one of the stumbling blocks against the argument that the CLICs act as chloride channels. However other stumbling blocks remain, particularly in relation to the lack of knowledge about the final transmembrane state and a relatively poor understanding of the CLIC-associated channel characteristics. These points will need to be addressed by future experiments before the CLIC family can be conclusively seen as ion channel proteins.

Chapter 2 Review of the crystal structure of reduced CLIC1

2.1 Introduction

Dulhunty et al. were the first to propose a structural relationship between the CLIC proteins and the glutathione transferases4. Their analysis was based upon a low sequence identity between the CLICs and a GST enzyme (~15% with an omega class GST) as well as the presence of several invariant residues found in all proteins with the canonical GST fold. The four invariant residues conserved in the GSTs that are also present in the CLICs are: 1. a glycine residue forming the C-terminal cap of helix 1 (Gly-38 in CLIC1) that leads into the loop connecting to β-strand 2; 2. a cis-proline residue (Pro-65) at the N-terminus of β-strand 3; 3. a glycine residue (Gly-170) that is part of a highly conserved loop structure between helices 5 and 6; 4. an aspartic acid (Asp-177) at position N3 in helix 6 whose side chain stabilises the N-terminal helix cap and a loop structure preceding the helix4,19,49.

Harrop et al. subsequently solved the X-ray structure of the CLIC1 mutant (Q63E, E151G) at 1.4 Å resolution, confirming that the soluble form of the protein adopts the canonical GST fold3. In the same paper a 1.8 Å resolution structure of CLIC1 treated with oxidised glutathione (GSSG) also demonstrated that CLIC1 is able to resolve the disulphide bond of GSSG, consequently forming a mixed disulphide between glutathione and Cys-24 located at the N-terminus of helix 1. Cys-24 is a cysteine conserved in the CLICs and is located within the glutathione-binding site of the GST fold. CLIC1 was not seen to bind glutathione under reducing conditions.

The CLICs show the closest structural similarity to the beta89 and omega class49 GSTs (Cα RMS deviation of 1.6 Å)19 and the bacterial SspA proteins92. Like the GSTs, the CLIC1 molecule is thinner in one dimension than in the other two (55 x 52 x 23 Å) and consists of two discrete domains. The N-terminal domain (residues 1-90) has a thioredoxin-like fold (see Section 1.4) consisting of a four-stranded mixed β-sheet with two α-helices running parallel to the sheet on one face (helices h1 and h3) and one α-helix (helix 2) running orthogonal to the strands within the sheet on the other. The C-domain is all helical and closely resembles that of the omega class GSTs with two exceptions: the insertion of a negatively-charged loop region (Pro-147 to Gln-164) at the “foot” of the molecule and the position of the carboxyl-terminal helix (helix 9).
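For readers unfamiliar with the Cα RMS deviation quoted above, the quantity is the root of the mean squared distance between equivalent Cα atoms. The following is a minimal sketch under the assumption that the two structures have already been superimposed and their equivalent residues paired; it does not perform the superposition itself, and the function name and coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMS deviation (same units as the input, e.g. Å) between two equally
    long lists of (x, y, z) C-alpha coordinates that are already superimposed
    and paired residue by residue."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two made-up atom pairs: one identical, one displaced by 2 Å along z.
print(ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```

The value obtained in practice depends on which residues are treated as equivalent and on how the superposition is performed.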

Figure 2-1 Features of the CLIC1 structure. Left panel: CLIC1 sequence with secondary structure shown above; helices are in red, and β-strands in yellow. The two putative transmembrane domains are highlighted in green, the three cysteines conserved in the CLICs in yellow. The residues within several of the loop regions are boxed: the glycine-rich loop after β-strand 1 in cyan, the cis-proline loop in orange, the interdomain loop in magenta and the negatively charged footloop in blue. Right panel: cartoon representation of CLIC1 with the secondary structural elements labelled. The loop regions boxed in the left panel are also labelled. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

The footloop between helices 5 and 6 in CLIC1 is a distinctive feature of the chordate CLIC family, and is not present in GSTs (see Figure 2-1). However, the yeast regulator of nitrogen catabolism, Ure2p, contains an unrelated insertion at the topologically equivalent position within its GST-like regulatory domain93. Within the CLIC family there is a relatively low level of sequence conservation in this footloop; however, all sequences carry a net negative charge (-7 in CLIC1, -6 in CLIC4, -5 in CLIC2 and CLIC6, -3 in CLIC3). The conformation of this footloop differs in the various crystal forms where, as it extends away from the body of the molecule, its structure is dominated by packing interactions. In solution, barring potential binding partners or post-translational modifications, this loop region is likely to be highly flexible3.
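A net charge of the kind quoted for the footloops can be estimated directly from sequence. The sketch below assumes a simple counting convention (Lys and Arg as +1, Asp and Glu as -1, histidine and the chain termini ignored); the thesis does not state which convention was used for the quoted values, and the function name and example fragment are hypothetical.

```python
def net_charge(sequence):
    """Approximate net charge of a one-letter peptide sequence, counting
    Lys/Arg as +1 and Asp/Glu as -1 and ignoring His and the termini
    (an assumed convention)."""
    seq = sequence.upper()
    positive = sum(seq.count(residue) for residue in "KR")
    negative = sum(seq.count(residue) for residue in "DE")
    return positive - negative


# Made-up acidic fragment for illustration only, not the CLIC1 footloop.
print(net_charge("PEEDDKGLSE"))
```

Applied to aligned footloop sequences, a function like this would give one way of comparing the charge across family members.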

The N- and C-domains of the CLIC proteins are linked by a proline-rich loop joining helices 3 and 4a (see Figure 2-1). In CLIC1 this loop (Cys-89 to Asn-100) forms a sharp turn between Pro-90 and Pro-94 due to the cis conformation of Pro-91. In one subunit of the CLIC1 1.4 Å resolution structure the electron density map showed the presence of two conformations for this interdomain loop3. In the minor conformation Pro-91 instead adopts the trans conformation, resulting in a displacement of helices 1 and 3 with respect to the C-domain. In a model of this alternative conformation these helices showed a 1.0 - 1.5 Å displacement compared to the major conformer and helices 1 and 3 were tilted by 4.4° and 1.9° respectively3. These two helices form the interface between the N-terminal and C-terminal CLIC domains. The rotation of these helices demonstrates a degree of plasticity within the domain interface that is linked to the conformation of the proline-rich domain-connecting loop3. In all human CLICs except CLIC5 this loop contains the WW domain-binding consensus sequence PPxY112, and thus may be a site of regulation in vivo.

At the N-terminus of helix 1 in the CLICs is the conserved sequence CP[F/S][S/C], which corresponds to the CP[F/Y][C/S] motif at the centre of the active site found within the topologically equivalent region of the glutaredoxins113. While most CLICs contain only the first cysteine in this sequence, CLIC2 and CLIC3 possess the thioredoxin-like sequence CxxC. The presence of the second cysteine possibly allows the reduction of a substrate at the expense of forming a disulphide bond between these two cysteines, which could then be resolved by glutathione19.

Cromer et al.19 have performed a full comparison between the active sites and regions of interest in the GSTs and topologically equivalent regions in the CLICs. In this paper it is noted that the H-site in CLIC1 is partially buried by the C-terminal helix (helix 9) in a manner similar to that of the human alpha and theta GST classes. In the G-site, most residues are similar to those of the omega or beta class GSTs (see Figure 2-2A), with the exception that no protein contact is made to the N-terminal amino group of the γ-glutamyl moiety or the carbonyl group of the cysteinyl moiety of GSH3,19. In many GSTs the acidic residue that binds this terminal amino group is provided by the other dimer subunit (see Figure 1-10). Thus Cromer et al. suggested that formation of a GST-like dimer could allow the G-site of CLIC1 to make use of an aspartic acid residue provided by helix 4b of the alternate subunit, possibly facilitating GSH binding19.

Figure 2-2 The three conserved cysteines in the CLICs. In CLIC1 the immediate environments surrounding the three conserved cysteines within the CLICs are shown (PDB code 1K0M)3. A: Cys-24 is at the centre of the G-site in the CLIC1 structure complexed with GSH. B: Cys-178 is part of the hydrophobic staple, which is part of a conserved motif within the GSTs located at the N-terminus of h6. C: Cys-223 is at the C-terminus of h8. B and C are created from the 1.4 Å apo CLIC1 structure. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

In contrast to the GSTs the CLIC proteins predominantly exist in a monomeric form3, although recently Berryman et al. have reported observing a dimeric species of CLIC5A51. Whether the CLICs are able to form GST-like dimers in vivo, and if so in response to what signals, is currently unknown.

The behaviour of CLIC1 may be dependent upon redox state. This is because proteins with a GST or thioredoxin fold are often seen to have redox-related roles5,97; additionally, CLIC1 has been shown to dimerise in vitro in response to hydrogen peroxide (discussed in Chapter 5). Cysteine residues can be important elements in proteins with redox-related roles.

There are three highly conserved cysteines throughout the CLIC family. The first, Cys-24, is, as discussed previously, at the centre of the G-site and can form a mixed disulphide with glutathione (see Figure 2-2A)3. An equivalent cysteine is contained within the active sites of the omega class GSTs49, and in many of the known proteins adopting the GST fold that are also monomeric (see Figure 1-5).

The next conserved cysteine, Cys-178, is part of a moderately conserved substructure within the GSTs at the N-terminus of helix 6 that is known as the GST motif II85. This motif consists of two components. The first is a short loop structure and N-terminal helix capping box involving an interaction between a serine/threonine N-cap and an aspartic acid residue in the N3 position (see Thr-174 and Asp-177 in Figure 2-2B)114,115. The second is a hydrophobic staple motif consisting of an interaction between hydrophobic residues in the positions N’ and N4; this motif is thought to be important in defining the N-terminal limit of the α-helix116,117. In CLIC1 Cys-178 is the hydrophobic residue in the N4 position of helix 6. However, despite Leu-173 being in the N’ position, the distance between this residue and Cys-178 indicates that, unlike in the GSTs, a hydrophobic staple interaction does not appear to occur (see Figure 2-2B). Although not forming a hydrophobic staple motif, Cys-178 and Leu-173 are part of a more general hydrophobic interaction at the core of the molecule that also involves residues from helices 1 and 6. In the GSTs a large hydrophobic residue contributing to the hydrophobic staple normally occupies the N4 position of helix 6. In the majority of GSTs this is a Leu, Ile, Phe or Met residue. However, as in the CLICs, a cysteine residue is found in the bacterial stringent starvation protein A (SspA)91,92, which also adopts a GST-like fold. As well as the lack of a hydrophobic staple motif making the CLICs’ GST motif II unusual, the presence of Pro-182 causes a break within the helix, such that in CLIC1, Cys-178 and the following residue Asn-179 do not form hydrogen bonds to the residues C-terminal to them in the helix.

The last conserved cysteine in the CLICs is Cys-223 in CLIC1; this follows the C-terminal capping serine (Ser-221) of helix 8 such that its amide backbone crosses over the middle of the helix (see Figure 2-2C). This region of the canonical GST fold is more varied among GST classes than those occupied by the other two conserved cysteines. Thus it is difficult to determine if other GSTs possess an equivalent residue from sequence similarity alone.

In mammalian GSTs helix 2, which is an important component of the GST dimer interface, is usually within a relatively flexible region of the protein. However, in non-mammalian GSTs helix 2 is secured by the interaction of an aromatic residue within the helix with a hydrophobic pocket contributed by the β-sheet19. In CLIC1, as in the non-mammalian GSTs, Val-55 (Leu in most CLICs) and Leu-58 anchor helix 2 to a hydrophobic pocket centred around Phe-11 and Phe-66 from the β-sheet19.

Two putative transmembrane regions have been proposed for the CLICs (highlighted in green in Figure 2-1). The first encompasses residues from helix 1 and β-strand 2, and the second, residues in helix 6 and the loop region at its N-terminus.

2.2 Conclusions

In conclusion, although monomeric, the crystal structure of the soluble reduced form of CLIC1 adopts a canonical GST fold similar to the omega class GSTs. Like the omega and bacterial beta class GSTs, the CLICs contain a cysteine residue at the centre of their G-site that is able to form a mixed disulphide with glutathione. Unlike these proteins, the CLIC1 G-site lacks many of the interactions to glutathione normally associated with the G-sites of canonical GSTs. The main differences between the structure of CLIC1 and other GST family members are: a glycine-rich loop following s1; a cis-proline containing interdomain loop; the presence of a negatively charged flexible footloop between h5 and h6; and the presence of h9, which forms a lid over a relatively open putative H-site. The canonical GST-like structure of the soluble form of CLIC1 offers few hints as to how, or whether, this protein could possibly form an ion channel.

Chapter 3 CLIC gene structure and evolutionary relationships

3.1 Introduction

In this chapter the evolutionary relationships of the CLIC family are explored and classified. The highly conserved vertebrate clic gene structure is described in Section 3.2.1. The vertebrate CLIC family members are thought to be direct paralogues, resulting from a series of gene duplications in early vertebrate evolution, because they have identical gene structures and high sequence identities. The CLIC-like proteins in the arthropods and nematodes have a less conserved gene structure. They are briefly described in Section 3.2.2; however, the lack of conservation between the genes in each of these phyla, or with the vertebrate genes, makes it difficult to draw any conclusions from this group. Section 3.3 describes in detail the currently known members of the CLIC family throughout the various branches of evolution. Although the number of sequences from which to draw conclusions is limited, it is concluded that the 6 or 7 vertebrate CLIC family members arose from a series of duplication events from a single ancestral chordate protein after the divergence of the vertebrates and the tunicates. CLIC-like proteins that may be related to this ancestral chordate CLIC can also be seen in many other non-chordate animals, including the arthropods, echinoderms, molluscs, nematodes, platyhelminths and possibly the cnidarians. Although CLIC-like proteins are present in each of these phyla, it is not known to what extent they perform similar cellular functions.

3.2 The clic gene structure

3.2.1 The vertebrate clic gene structure

In humans the clic genes all share the same basic structure, with conservation of the position of intron/exon junctions with respect to protein structural features. The ~230 residue CLIC module is encoded by six exons within the various clic genes (see Figure 3-1). Exon 1 encodes any N-terminal extensions to the CLIC module and β-strand 1 from within it; exon 2, the following α-helix and β-strand; exon 3, the rest of the N-terminal domain; exon 4, the domain linking loop, h4 and the start of h5; exon 5, the remainder of h5, the chordate footloop and start of h6; while exon 6 encodes the remainder of the C-terminal domain. The first and last of these exons are the most variable, coding for the less conserved 10-20 N- and C-terminal residues, as well as any 5’ and 3’ UTR sequence. It should be noted that the clic exon numbering system currently used will presumably require standardisation at a later date. As discussed in Chapter 1, papers reporting CLIC mRNA sizes frequently identify multiple transcripts20,25,27,31,36,53, the sizes of which are sometimes larger than the number of cDNA nucleotides that have been sequenced. It seems likely that in some of the clic genes, non-coding exons exist that are yet to be found. In this thesis, exon 1 thus refers to the first coding exon and not necessarily the first within the mRNA transcript. The designation of exon 1b is also somewhat confusing. In the clic5 gene exon 1b is upstream of exon 1a and encodes the same structural residues; exon 1b is spliced in place of exon 1a in the CLIC5B isoform. In the clic2 and clic6 genes, by contrast, exon 1b is downstream of exon 1; it encodes residues superfluous to the CLIC module and is additionally spliced into the B isoforms of these family members.

Figure 3-1 Clic gene structure. Gene structure of the human CLIC family members. Exons are shown in red and labelled, introns are in black. Insert: a cartoon representation of CLIC1 (PDB code 1K0M3) is coloured according to which exon the sequence is encoded by. Insert made using the programs MOLSCRIPT1 and RASTER3D2.

Despite maintaining the same basic structure, the human clic genes vary considerably in size, from ~2 kbp for clic3 to the extremely large ~180 kbp clic5 gene (see Figure 3-1). This is primarily due to variations in the size of the introns. Many of these, in the larger clic2, 4, 5 and 6 genes, are much longer (up to 5 standard deviations) than the mean intron length determined in a study of human genes, although an under-representation of large introns was expected in this study118. As well as the distribution in the sizes of the clic introns, the last exon, exon 6, also varies considerably in size. It ranges from ~230 bp in clic3 to around 3500 bp in clic4, due primarily to varying amounts of 3’ UTR sequence. This results in a clic4 mRNA transcript with an abnormally large amount of 3’ untranslated sequence118. This suggests that multiple regulatory elements affecting CLIC4 translation or mRNA turnover may be contained within the 3’ UTR. The unusually large introns contained within the clic2, clic4, clic5 and clic6 genes may also signify the presence of multiple transcriptional regulatory elements in these regions of the respective genes.

Although a highly speculative conclusion to draw from intron size alone, the large sizes of the introns in the clic2, 4, 5 and 6 genes suggest they are not involved in the cellular recovery response to stress. This is because many forms of cellular stress are likely to lead to a drop in both the processivity of the cell’s transcriptional machinery and the ability of the spliceosome to identify and process potential splicing sites in large immature transcripts. Thus a cell relying for recovery on proteins whose production requires both these elements to be functioning optimally would be at an evolutionary disadvantage. This is in contrast to the structurally related glutathione transferases that aid in the response to stress through metabolising a range of cytotoxic compounds. Multiple glutathione transferases are known to be upregulated in conditions of cell stress119, and in contrast to the large clic genes, the average size of the human GST genes is only around 10 kbp. Interestingly, the only abnormally large (18 kbp) GST intron is present in the GSTO2 gene, which is also the largest human GST gene overall, and incidentally appears to be one of the closest relatives to the CLICs4. The significance of this, however, remains unknown.

The CLICs with large introns are therefore unlikely to be involved in stress recovery; however, they may still be involved in some form of stress response. In keratinocytes, CLIC4 was seen to be upregulated in response to DNA damaging agents that result in apoptosis44. As would be expected given the size of the clic4 gene, this upregulation was seen to be posttranscriptional, as mRNA levels reduced under the treatment44. Following exposure to these agents, endogenous CLIC4 translocates from the cytoplasm to the nucleus prior to the appearance of several apoptotic markers45. In addition, exogenous overexpression44 or artificial nuclear targeting of CLIC4 45 led directly to apoptosis, suggesting CLIC4 plays an early causative apoptotic role. If such a role is confirmed, the large gene and transcript sizes of CLIC4 are understandable. Such a protein would require multiple tight transcriptional and translational regulatory control mechanisms to ensure proper control in healthy cells, while still enabling co-ordinated apoptosis to occur in response to cell damage without having to rely upon functional transcriptional or splicing machinery.

In all known vertebrate CLICs the extreme N-terminal residues are encoded within exon 1; this includes the ~6 conserved residues that comprise β-strand 1 (purple in the insert of Figure 3-1). The nucleotides encoding these six residues are often the only region of exon 1 displaying similarity to the same exon in other CLIC family members, making this one of the more variable exons within the gene. Intron 1, which separates exon 1 from the remaining exons, is also often the largest within the clic genes. This large separation and variability can make this exon the most difficult to predict from genomic sequence alone. In the clic5 and clic6 genes the less-conserved additional N-terminal extensions encoded within this exon can also be difficult to predict from genomic sequences.

The clic1, clic3 and clic4 genes consist of only this basic six-exon gene structure; however, the clic2, clic5 and clic6 genes all contain at least one additional exon. In the clic5 gene, two alternately spliced variants of exon 1 exist, exon 1b and exon 1a. The alternative incorporation of one of these exons in the final mRNA results in the production of either the CLIC5A or CLIC5B isoform, which possess different N-terminal sequences. The clic2 and clic6 genes instead contain a small, unrelated 54 bp exon (exon 1b) between the normal clic module exons 1 and 2. The alternate incorporation of this exon results in an 18 residue insertion that forms the difference between the B and A isoforms of CLIC2 and CLIC6. Approximately 2 kbp upstream of exon 1b in the clic2 gene is a second 54 bp region that is 90% identical to exon 1b, although no splice sites are predicted at the boundaries of this sequence. Due to the lack of potential splice sites this is presumably a non-functional copy of exon 1b and is labelled 1b` in Figure 3-1.
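The relationship between the 54 bp exon and the 18 residue insertion is simple codon arithmetic: an exon inserted between two exons that are otherwise spliced in frame must have a length divisible by three, and each codon contributes one residue. The following Python sketch illustrates this check. The function name is hypothetical and is not taken from any software used in this work.

```python
def inserted_residues(exon_length_bp: int) -> int:
    """Return the number of residues added by an exon spliced in frame.

    Raises ValueError if the exon length is not a multiple of three, since
    inserting such an exon would shift the downstream reading frame.
    """
    if exon_length_bp <= 0:
        raise ValueError("exon length must be positive")
    if exon_length_bp % 3 != 0:
        raise ValueError("exon would shift the reading frame")
    return exon_length_bp // 3


# The 54 bp exon 1b of clic2 and clic6 corresponds to an 18 residue insertion.
print(inserted_residues(54))
```

An exon of, for example, 55 bp would raise an error in this sketch, reflecting the frameshift it would cause in the remainder of the CLIC module.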

Figure 3-2 Phylogram of the human CLICs. A: phylogram showing the relationship between the amino acid sequences of the CLIC modules of the human CLIC family members and B: a phylogram showing the relationship of the respective nucleotide coding regions for the CLIC module. Sequence alignment and tree construction were performed using the program ClustalW120. Neighbour-joining (N-J) bootstrap values were calculated using 1000 alignments.
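Neighbour-joining trees such as those in Figure 3-2 are built from a matrix of pairwise distances between aligned sequences. As an illustration only, the sketch below computes the simplest such measure, the proportion of differing positions (p-distance), for a set of pre-aligned sequences. The toy sequences and function names are hypothetical, and the distance calculation used by ClustalW may differ (for example in its treatment of gaps or its use of corrections), so this is not a reproduction of the analysis performed here.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions that differ, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {
        (name_a, name_b): p_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment, not real CLIC sequence.
toy = {"seqA": "MAEL-KAG", "seqB": "MAELQKAG", "seqC": "MSELQRAG"}
for pair, dist in distance_matrix(toy).items():
    print(pair, round(dist, 3))
```

Skipping gapped columns pair by pair is one of several possible conventions; it is chosen here only to keep the example short.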

The various isozymes within the distinct evolutionary classes of the glutathione transferases are usually located within a single chromosomal locus, which sometimes also contains multiple related pseudogenes121,122. In contrast to this, the various human clic genes lie in disparate positions on the genome123,124, with only clic1 and clic5 lying on the same chromosome. Despite the disparate localisation of the clic genes, their sequence similarity, which is on par with that retained by members of a GST class, and the conservation of the clic gene structure suggest a close evolutionary relationship between the family members. Indeed it has been reported that the clic4, clic5 and clic6 genes are embedded within a ~500 kbp region of segmental paralogy that is triplicated in humans on chromosomes 1, 6 and 21 55. All three versions of this large paralogous segment, called the ACD cluster, contain an Acute Myeloid Leukaemia/Runt (AML/Runt) family member, a CLIC family member and a Down Syndrome Candidate Region 1-like (DSCR1-like) family member. The Runt genes encode transcription factors involved in development and oncogenesis125, while the DSCR1-like proteins are a group of inhibitors of calcineurin, a Ca2+/calmodulin-dependent protein phosphatase55.

Although there are outlying clic genes, all three human members of the Runt and DSCR1-like protein families are contained within these ACD clusters. It is thus tempting to speculate that the maintenance of these gene linkages during the course of evolution reflects a common functional or regulatory relationship for these proteins55. The three human ACD clusters and their constituent protein family members are: ACD21 containing RUNX1, CLIC6 and DSCR1; ACD6 containing RUNX2, CLIC5 and DSCR1L1; and ACD1 with RUNX3, CLIC4 and DSCR1L2 55. The biological roles of the Runt family members are the most extensively studied of the three ACD cluster protein families. RUNX1 is involved in hematopoiesis and its dysfunction is associated with acute leukaemia125. RUNX2 appears to play a key role in osteogenesis125. RUNX3 is the most ubiquitously expressed human Runt family member and is thought to have multiple biological roles. These include roles in neurogenesis and T-cell differentiation; in addition, RUNX3 dysfunction appears to be associated with gastric carcinogenesis through the suppression of apoptosis and TGF-β1 sensitivity125. Although there are currently no studies directly aimed at analysing possible interrelationships between the biological functions of the various proteins within each ACD cluster, there may be some suggestion of common function. CLIC5, like RUNX2, may be involved in osteogenesis69. CLIC4, like RUNX3, is ubiquitously expressed, may be involved in apoptosis45,126 and is regulated by TGF-β1 46. While these correlations are weak, they do suggest further experiments in this area may lead to interesting results.

The ACD cluster gene linkage and subsequent duplication events are thought to have occurred after the divergence of the deuterostomes and protostomes, possibly during a round of chromosomal rearrangements just prior to the divergence of the tetrapods and teleosts55. Supporting this timeframe for the formation of the ACD cluster is the fact that in the draft genome sequence of the tunicate C. intestinalis127 the single CLIC, Runt and DSCR1 family members do not appear to be associated, and are contained on different scaffolds in version 1.0 of the genome. Studies of the two puffer fish genomes (F. rubripes and T. nigroviridis) show that the teleosts also contain a second copy of the CLIC5 ACD cluster, although it is noteworthy that the DSCR1 genes have not been maintained within any of the 4 teleost ACD clusters but instead lie elsewhere on the genome128. Based on sequence similarity, it was proposed that after the divergence from the tunicates, with their single CLIC, DSCR1 and Runt family members, several rounds of duplication gave rise to 4 ACD clusters in the vertebrates, but after the divergence of the tetrapods and teleosts the second CLIC5 ACD cluster was later lost in the tetrapod lineage128.

Thus it seems apparent that CLICs 4, 5 and 6 arose from two separate large-scale duplication events from a single ancestral CLIC protein. Interestingly, although the mRNA sequence supports an earlier disjunction of CLIC5 from a CLIC4/CLIC6 ancestor55, at the protein level CLICs 4 and 5 retain a higher degree of similarity than CLICs 4 and 6 (see Figure 3-2A and B). This could be due to functional differences or may reflect the fact that both isoforms of CLIC6 contain a large N-terminal extension and thus some of the functions of the CLIC module may have migrated to, or become dependent upon, this domain. This could not occur in CLIC5 as one of its isoforms consists of only the minimal CLIC module.

Figure 3-3 CLIC1 pseudogenes. A diagram of the pseudogene copies of CLIC1 and CLIC4 present within the human genome. The two CLIC mRNA transcripts are shown at top coloured by exon. Exons 1-6 are coloured mauve, red, green, blue, yellow and cyan respectively; the poly A tail is shown in orange. For each locus containing a CLIC pseudogene those sections of the mRNA transcript present are shown in colour, with the horizontal position with respect to the sequence of the full transcript maintained. In black and white, a localised map of each locus is shown displaying the orientation and respective positions of the CLIC mRNA segments.

These three ACD cluster CLIC family members retain a closer level of similarity with each other than they do to CLICs 1-3. Those CLICs contained within the ACD clusters form a tight, highly similar group, while CLICs 1, 2 and 3 form a second more divergent sub-class. The division into two separate sub-classes can be seen in the phylograms shown in Figure 3-2A and B. In this second sub-class CLICs 1 and 3 are more closely related to each other than they are to CLIC2. In this sub-class CLIC2 is closest to the ACD cluster CLICs and CLIC3 the most divergent.

As well as the functional clic genes, the human genome contains several clic pseudogenes. A lack of intronic sequence within the pseudogenes suggests that they are most likely a result of the reverse transcription and genomic incorporation of mRNA transcripts. There are 3 relatively complete copies of the CLIC1 mRNA transcript within the human genome and six different loci contain partial copies of CLIC4 mRNA (see Figure 3-3). The CLIC4 pseudogenes are more fragmented than those of CLIC1 and are predominantly composed of the 3’ UTR encoded in exon 6 of the clic4 gene. As the fragment sizes are similar in the clic1 and clic4 pseudogenes, this is likely to be due to the fact that the CLIC4 mRNA transcript is ~3.5 times as large as that of CLIC1, with exon 6 making up around 81% of the entire transcript. No pseudogenes are identifiable for the other CLIC family members in build 34.2 of the human genome. It is currently unclear why CLIC1 and CLIC4 are unique within the family in having pseudogenes. While they are the two most widely expressed CLIC family members, they are not the only CLICs expressed in gametes, as CLIC5 has recently been shown to be present in bovine spermatozoa33.
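The relative transcript sizes quoted above can be cross-checked with a rough calculation. The sketch below takes the ~3500 bp upper size quoted for exon 6 in Section 3.2.1 (assumed here to refer to clic4), the statement that exon 6 makes up around 81% of the CLIC4 transcript, and the ~3.5-fold size ratio between the CLIC4 and CLIC1 transcripts. The variable names are illustrative, and the results are only order-of-magnitude estimates derived from the rounded values in the text, not measured transcript lengths.

```python
EXON6_BP = 3500         # approximate size of the largest exon 6 (assumed clic4)
EXON6_FRACTION = 0.81   # exon 6 as a fraction of the CLIC4 transcript
SIZE_RATIO = 3.5        # CLIC4 transcript size relative to CLIC1


def estimate_transcripts(exon6_bp, exon6_fraction, size_ratio):
    """Return rough (CLIC4, CLIC1) transcript lengths in bp."""
    clic4_bp = exon6_bp / exon6_fraction
    clic1_bp = clic4_bp / size_ratio
    return clic4_bp, clic1_bp


clic4_bp, clic1_bp = estimate_transcripts(EXON6_BP, EXON6_FRACTION, SIZE_RATIO)
print(f"CLIC4 transcript: ~{clic4_bp:.0f} bp")
print(f"CLIC1 transcript: ~{clic1_bp:.0f} bp")
```

Under these assumptions the estimates come to roughly 4.3 kbp for CLIC4 and 1.2 kbp for CLIC1, which is consistent with retrotransposed fragments of similar length covering most of the CLIC1 transcript but only part of the CLIC4 3’ UTR.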

3.2.2 The invertebrate clic gene structure

The same six-exon gene structure of the human CLICs is relatively well conserved across the vertebrates and differs only in the fusion of two of these exons in the tunicate C. intestinalis127. In contrast, the gene structure for CLIC homologues from non-chordate phyla is largely unknown. This is because the majority of currently known invertebrate CLIC sequences have been derived from EST sequencing projects.

Figure 3-4 Exon structure of the insect and nematode CLIC homologues. A: cartoon representation of the D. melanogaster CLIC homologues coloured according to the exon structure of the orthologous B. mori protein. B: cartoon representation of the C. elegans CLIC homologue coloured according to the exon structure of the orthologous B. malayi protein. Figure made using the programs MOLSCRIPT1 and RASTER3D2

Genomic studies have, however, elucidated the gene structure for the CLIC homologues in several nematodes and insects. For the nematodes these include B. malayi (to be published, TIGR, www.tigr.org), C. briggsae129 and C. elegans130, while in the insects, gene structures are known for the honeybee A. mellifera (to be published, NHGR Institute, www.genome.gov), the fruit flies D. melanogaster131,132 and D. pseudoobscura133, the mosquito A. gambiae134 and the silkworm B. mori135. In contrast to the chordates, the number of exons and position of exon/intron junctions within the CLIC module show only a low degree of conservation in these CLIC-like genes, both with each other and with the chordate genes. The exceptions are the related species C. elegans and C. briggsae, which have identical exl-1 genes, and the two known Drosophila CLIC homologues, whose gene structures are also identical.

Although no relationship can be seen across phyla, the positions of a core number of exon/intron junctions are conserved among the 5 known insect genes and among the 3 known nematode genes (see Figure 3-4). However, in both cases the total number of exons varies. The CLIC module of the B. mori homologue has the most exons (8) in the known insect genes and B. malayi the most (6) in the nematodes (see Figure 3-4). The only exon/intron junction shared by the CLIC or CLIC-like proteins in the arthropods, chordates and nematodes is that which follows β-strand 1, though it is difficult to draw any conclusions from this.

3.3 CLIC homologues

3.3.1 Introduction

It is not easy to distinguish the relationship between the CLICs and similar proteins in different organisms. This is because the CLIC proteins are distantly related to the omega class glutathione transferases4. As little is known about the function of either the CLICs or the omega class GSTs49,121, it is unclear which particular residues help define their respective, presumably distinct, roles.

During the evolution of animals, an ancestral protein from which all present-day CLIC-like proteins are derived presumably diverged in sequence and then in function from a common GST-like ancestor. The timing of this divergence, and whether and when this ancestral protein subsequently acquired a fundamental cellular role, are both likely to have influenced the role(s) now played by the progeny of this ancestral “CLIC” in the various branches of evolution. It must be kept in mind that evolution may not have occurred in this way, and therefore drawing parallels between related proteins with different functions may lead to misconceptions.

Despite these concerns the CLIC proteins are moderately conserved across species: orthologues of all 6 human family members appear to exist throughout the vertebrates136, while proteins similar to the CLICs are known to exist in arthropods136, nematodes136,137, molluscs and platyhelminths. They may also exist in cnidarians, suggesting that a CLIC-like protein may have been present prior to the radiata-bilateria split. Before these sequences are considered in more detail, several possible experimental biases that could affect the interpretation of these results are discussed below.

The majority of known CLIC-like sequences are derived from various genome or EST sequencing projects. As the number of these projects is necessarily limited, the currently available sequences reflect the biases in the choice of organisms to be sequenced. As described by Hedges138, these organisms fall into three, not necessarily mutually exclusive, categories:

1. Traditional model scientific organisms about which there was a large body of pre-existing knowledge, usually originally chosen for study because of their small size and short generation times (for example D. melanogaster, M. musculus, R. norvegicus and C. elegans).
2. Organisms chosen for unique aspects of their genomes or evolutionary relationships (for example F. rubripes or C. intestinalis).
3. Economically important organisms that are agriculturally important or related to human health (for example human parasites and disease vectors such as S. japonicum or A. gambiae, parasites/pests of agriculturally important plants and animals such as A. pisum or A. variegatum, or the agriculturally important animals themselves such as A. mellifera or S. scrofa).

These sequencing efforts require a large expenditure of resources, which has limited the number of projects undertaken purely to explore evolutionary relationships. Although a large number of CLIC-like sequences are known, the organisms from which these sequences originate are related in a manner that makes it difficult to comprehensively trace the evolutionary origins of the CLICs.

Many of these sequencing projects are currently still in progress. Therefore some of the known CLIC-like sequences are incomplete and only include a portion of the protein. These partial sequences disproportionately cover the CLIC module of the protein, particularly those parts of it encoded by exons 2-6. This occurs for a variety of reasons: in those family members containing N-terminal extensions the CLIC module is more conserved than the respective N-terminus and is thus more easily identified using computational searches; also, many of these CLIC sequences are derived from EST sequencing projects, which may be based on cDNA libraries with experimental over-representations of sequence derived from the 3’ end of mRNA transcripts.

Obtaining EST sequences is also dependent on the mRNA profiles of the tissue under study and thus the sequences of highly expressed transcripts are often over-represented. Although tissue specific, the mRNA levels of the individual CLIC family members approximately follow the distribution CLIC4 ~ CLIC1 > CLIC5 ~ CLIC6 >> CLIC3 ~ CLIC2. So if only a small number of ESTs are known for a particular organism, the CLIC1 and CLIC4 family members may be the only CLICs represented.

3.3.2 Deuterostome CLICs

This section describes in detail the deuterostome CLICs. The information presented and the sources used are summarised in Table 3-1.

3.3.2.1 Mammalian CLICs

The duplication events leading to the 6 human CLIC family members are likely to have occurred early in vertebrate evolution. In the eutherian mammals, CLICs are known in the carnivore, ungulate, lagomorph, rodent and primate lineages, but all family members are likely to be ubiquitous. For the CLIC module of various orthologues, the amino acid identity between these lineages is >95%, and genomic sequence from dogs, humans123,124, mice139 and rats140 indicates that the gene structure is also highly conserved. Although only a few sequences are known, metatherian and prototherian mammalian CLICs also exist and retain ~93% identity with their eutherian orthologues.

Across mammals, the CLIC5 and CLIC6 N-terminal extensions are present but show a much lower degree of conservation (sometimes as low as 30% identity) than their respective CLIC modules. In mammalian genomic sequence these extensions are usually identifiable from sequence similarity alone, but their full extent can sometimes be difficult to define without supporting EST data. These extensions are most likely a later evolutionary addition, occurring sometime after the duplication events that led to the formation of the 6 different CLIC modules. While the CLIC5 and CLIC6 N-terminal extensions diverge significantly in the mammals, they nonetheless remain identifiable. This does not tend to be the case for the short isoform-specific exon that encodes the small insertion after β-strand 1 present in the B isoforms of human CLICs 2 and 6. Although absent from the mouse, rat and cat genes, an orthologous exon 1b identical to that in humans is present in the chimpanzee clic6 gene, possibly indicating that such insertions are conserved only in closely related species.

3.3.2.2 Avian CLICs

The genome sequencing of the red junglefowl, Gallus gallus, is almost complete141. G. gallus is thought to have last shared a common ancestor with the mammals ~310 Myr BP, and it is presently the only organism at around this evolutionary distance from the mammals whose genome has been extensively studied141. In the genomic construct 1.1 141, the gene sequences of CLICs 2-6 can be found, although the clic3 gene is presently incomplete. Within their CLIC modules, these chicken CLICs typically display between 85-95% identity to their mammalian counterparts. CLIC1, however, appears absent from both the current genomic and EST G. gallus sequencing projects. It is unclear whether this implies the loss of this family member in the birds or whether the clic1 gene is located within gaps in the current genome assembly. If the latter, the lack of CLIC1 sequences from the relatively extensive G. gallus EST libraries suggests, at the very least, that CLIC1 is not as ubiquitously expressed in the birds as in mammals.

Two observations are consistent with a genuine loss of CLIC1. Gene deletions, with respect to the mammals and teleosts, may be a common feature of the chicken genome141, and immune response genes are among the least conserved between chickens and humans141. If CLIC1 is indeed missing in the birds, its presence within the MHC III gene region in humans27 and its identification via macrophage activation25, both of which suggest an immune-related role, may indicate why this particular family member was lost in the avian lineage. In support of CLIC1 simply being unaccounted for due to the incompleteness of the current genome assembly, one of the genes neighbouring CLIC1 in the mammalian genomes (mutS homologue 5) has been sequenced but not placed in version 1.1 of the chicken genome.

From genomic sequence alone, it is difficult to predict whether the CLIC5 and CLIC6 N-terminal extensions are present in the chicken. A 62 kDa chicken protein, immunologically related to bovine p64, has previously been reported69, suggesting the existence of chicken CLICs with some form of extension. In G. gallus chromosome 1, 18 kbp upstream of exon 2 in the clic6 gene, is an ORF that encodes 452 residues. Although it displays no sequence similarity to any of the mammalian CLIC6 extensions, this ORF has a similar amino acid composition and pI, and similar residues directly bordering the 3’ splice site. A large N-terminal extension, likewise without similarity to the mammalian CLIC5B extensions, is also predicted for the chicken clic5 gene (XM_420060). These results suggest that the CLIC5 and CLIC6 family members gained their N-terminal extensions before the synapsid-diapsid divergence, but that the selection pressures operating on these extensions are relatively weak. Alternatively, the functions of the CLIC5 and CLIC6 family members may be uniquely suited to regulation or augmentation by such extensions, which may thus have evolved independently in the avian and mammalian lineages, possibly explaining the lack of sequence similarity.

3.3.2.3 Amphibian CLICs

The amphibians last shared a common ancestor with the amniotes ~390 Myr BP, and the extant amphibians represent a major early divergence in the tetrapod lineage. Several amphibian CLIC sequences are known. These primarily originate from the EST sequencing projects of the African clawed frog (X. laevis), as well as EST and genomic sequencing projects for the closely related pipid frog (X. tropicalis)142. The sequences of CLICs 1, 4 and 5A are known for both these frogs, as are the CLIC3 and CLIC6 sequences for X. tropicalis. No real conclusions can be drawn from the lack of information for amphibian CLIC2 due to the incompleteness of these sequencing projects. Between the two closely related Xenopus species, sequence conservation is around 92-96% identity. Compared with their counterparts in the birds, mammals and fish, the ACD cluster CLICs are more highly conserved, sharing around 85% identity, while the CLIC1 and CLIC3 orthologues have lower identities of around 60-75%. The gene structure of these amphibian CLICs is identical to that of the mammalian CLIC genes. As well as these frog CLIC sequences, CLIC1 EST sequences are available for two salamanders, A. mexicanum and A. tigrinum tigrinum. These show ~75% identity to the mammalian and teleost CLIC1 orthologues and, despite all being amphibian sequences, a similar level of identity to the frog CLIC1s.

As in the chicken, the additional N-terminal sequences of the amphibian CLIC5 and CLIC6 orthologues are not easy to identify without supporting EST data, which are presently unavailable. The clic6 gene is contained in the genomic scaffold 1078 in version 3.0 of the X. tropicalis genome (to be published, JGI, www.jgi.doe.gov). A large 592 residue ORF lies directly upstream of the clic6 exon 2, suggesting that the Xenopus CLIC6 does contain an N-terminal extension. The amino acid composition, pI and residues immediately bordering the 3’ splice site of this ORF are similar to those in other known mammalian CLIC6 N-terminal extensions. However, in contrast to the chicken ORF, this extension displays low but detectable similarity (~20% identity) to those found in the human, rabbit and rat CLIC6 orthologues, possibly suggesting that this region has undergone accelerated evolution and divergence in the birds. Like the human and rabbit extensions, this ORF also contains repetitive sequences; EEENVQDVAG or a related sequence occurs 11 times in a row. A smaller CLIC5A-like isoform is seen in the frogs; however, no exons encoding sequence obviously similar to the longer CLIC5B isoforms are currently detected in the X. tropicalis genome.
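The way in which a run of consecutive repeats such as this might be counted can be sketched as follows. This is a minimal illustration only, not the procedure used in this work: the function name, the per-unit mismatch tolerance used to stand in for "a related sequence", and the toy test sequence are all assumptions introduced here. Only the repeat unit EEENVQDVAG is taken from the text above.

```python
def longest_tandem_run(seq: str, unit: str, max_mismatches: int = 0) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`.

    A window counts as a copy if it differs from `unit` at no more than
    `max_mismatches` positions (an assumed, simplistic definition of a
    "related" repeat; insertions and deletions are not handled).
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    n = len(unit)
    best = 0
    # Try every possible starting offset for the first copy.
    for start in range(len(seq) - n + 1):
        run = 0
        pos = start
        # Step through the sequence one unit length at a time.
        while pos + n <= len(seq):
            window = seq[pos:pos + n]
            mismatches = sum(1 for a, b in zip(window, unit) if a != b)
            if mismatches > max_mismatches:
                break
            run += 1
            pos += n
        best = max(best, run)
    return best


# Hypothetical toy sequence: three exact copies, then one copy with a
# single substitution, flanked by arbitrary residues.
toy = "MK" + "EEENVQDVAG" * 3 + "EEENVQDAAG" + "PLT"
exact_copies = longest_tandem_run(toy, "EEENVQDVAG")
related_copies = longest_tandem_run(toy, "EEENVQDVAG", max_mismatches=1)
```

With the toy sequence above, tolerating one mismatch per unit extends the run from three copies to four, which is the sense in which related copies of a repeat can be counted alongside exact ones.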

3.3.2.4 Teleost CLICs

The teleost (bony ray-finned) fish are thought to have diverged from the tetrapods ~450 Myr BP143. In both the puffer fish genome sequences (F. rubripes143 and T. nigroviridis144), all 6 vertebrate CLIC orthologues, plus the additional CLIC5-like (CLIC5L) ACD cluster family member lost in the tetrapod lineage128, are present and have highly similar gene structures. This CLIC5L orthologue consists of only the minimal CLIC module with no obvious extensions. As well as these genomic sequences, a variety of full and partial CLIC EST sequences are known for at least 9 species of teleost fish from 6 different orders. These typically display between 60-85% identity with their corresponding mammalian counterparts depending on the orthologue, with the ACD cluster CLICs again more highly conserved than CLICs 1-3.

As in mammals, the teleost CLIC5 orthologue has at least two alternate isoforms, one of which contains a large N-terminal extension that appears unrelated to the avian or mammalian CLIC5 extensions. In contrast to the mammals the teleost CLIC6 orthologue contains an isoform with only the minimal CLIC module and no N-terminal extensions are currently known for this orthologue. The presence of these extensions in the teleosts supports the hypothesis that the CLIC5 orthologue attained its N-terminal extensions early in vertebrate evolution and has diverged considerably since then, as opposed to the later independent evolution of these extensions in the various vertebrate lineages. If such a scenario is also true for the CLIC6 orthologue, this would indicate that its N-terminal extension became associated with its CLIC module in the tetrapod lineage after the divergence from the teleosts but before the amniote-amphibian split.


3.3.2.5 Chondrichthyes CLICs

The Chondrichthyes (cartilaginous vertebrates) diverged from the Teleostomi (bony fish and tetrapods) early in the evolution of the Gnathostomes (jawed vertebrates). There is thought to be only ~45 Myr between the cartilaginous/bony vertebrate split and the fish/tetrapod split145. Due to a paucity of data, only one partial EST CLIC sequence exists from a cartilaginous fish, the little skate Leucoraja erinacea. This sequence covers the first ~90 residues of the L. erinacea CLIC1 orthologue, and shows ~70% identity to the same region in the mammalian and teleost CLIC1s.

3.3.2.6 Agnathan CLICs

The Gnathostomes are thought to have diverged from the Agnathans (jawless vertebrates) around 440 Myr BP145. Only a single agnathan, the sea lamprey Petromyzon marinus, is known to contain CLICs. However, again, there is a lack of data from species diverging from the Gnathostoma lineage at around this time. From this sea lamprey, several partial EST sequences146 exist that cover the C-terminal domain of CLIC1 and most of “CLIC5”. These sequences are ~75% identical to the corresponding amphibian, avian, fish and mammalian CLIC orthologues, although the P. marinus “CLIC5” is almost equally identical to both the CLIC4 and CLIC5 orthologues. The ACD cluster duplication events may thus have occurred at around this time. In these proteins the chordate-specific footloops are present but ~5 residues shorter than those in the gnathostomes, possibly suggesting that these insertions into the GST-fold occurred via a series of small steps early in vertebrate evolution. Like the gnathostome CLIC2 and CLIC3 orthologues and, as discussed later, the arthropod CLICs, the P. marinus “CLIC5” orthologue has a two-cysteine CxxC active site motif.

These CLIC sequences were derived from EST libraries created from larval P. marinus lymphocyte-like cells146, with the “CLIC5” sequence coming from a library enriched for sequences upregulated following immune stimulation. This is noteworthy as the immune system of the sea lamprey is distinct from that of the gnathostomes. The extant agnathans have evolved an alternative adaptive immune response based around a cassette system of leucine-rich repeats. These form the variable lymphocyte receptors (VLRs) in the lamprey, in place of the immunoglobulin-based receptors that evolved in the jawed vertebrates147. There is also evidence that CLIC proteins are upregulated in the gnathostome immune response25. If both these results are confirmed, the involvement of the CLICs in at least two highly distinct immune systems suggests they may play a fundamental role in the vertebrate immune response.

3.3.2.7 Non-vertebrate chordate CLICs

There are three subphyla within the chordates: the vertebrates, many of which have been discussed above, the cephalochordates (lancelets) and the urochordates127. Unfortunately, there are currently no known cephalochordate CLIC sequences with which to make comparisons, although several urochordate CLIC sequences are known. The urochordates are thought to have diverged from the vertebrate/cephalochordate subphyla ~540 Myr BP145. EST sequencing projects have found CLIC proteins in three urochordate ascidians: Halocynthia roretzi, Ciona intestinalis and Ciona savignyi.

As well as EST data, genomic sequence is also available for the two Ciona species. In version 1.0 of the C. intestinalis genome127 and version 4 of the C. savignyi genome (Whitehead Institute), only one CLIC-like protein is seen to be present. The EST sequences for all three species are also consistent with there being only a single CLIC gene in the urochordates. Within the genes for these Ciona proteins, the intron/exon junctions are identical to those within the vertebrate clic genes, the only difference being that the sequences encoded by the vertebrate exons 3 and 4 are fused into a single exon.


Figure 3-5 Structure of the C. intestinalis CLIC gene. Above: the C. intestinalis CLIC gene is shown; exons are labelled with the number(s) of the homologous vertebrate exon. The alternatively spliced N-terminal exon 1s are given Roman numeral subscripts from I to V. Middle: at least 4 different C. intestinalis CLIC isoforms are known; a graphical representation of the exons each incorporates is shown. Bottom: table giving the length of each exon; the size of the following intron; the presence of start codons or of sequence encoding β-strand 1; and whether the full extent of the exon is known (if not, whether it is the 5’ or 3’ end of the exon that is unknown).

While these urochordates contain only a single CLIC gene, the lack of functional versatility compared to the vertebrates that this would presumably entail is, at least partially, compensated for by having multiple alternatively spliced isoforms. EST and gene sequences show that at least four such isoforms exist in C. intestinalis, and sequence conservation within its genome suggests that most of these are also present in C. savignyi. These isoforms differ through the incorporation of different 5’ exons that encode various N-terminal extensions to the CLIC module. Like the vertebrate CLIC5A and CLIC5B orthologues, these extensions result from alternatively spliced versions of exon 1, the 3’ sequence of which encodes β-strand 1. There are at least 6 different exons that can be included to form these N-terminal extensions (exons 1i, iia, iib, iii, iv and v). Exons 1iii, iv and v each encode different versions of β-strand 1, while different initiation codons are encoded by exons 1i, iib and v (see Figure 3-5). None of these extensions appear similar, either in sequence or amino acid composition, to those found in the vertebrate CLIC5B or CLIC6 orthologues.

Within their CLIC modules, the two closely related Ciona CLICs share ~75% identity with each other, but only around 50% identity with the CLIC from the other urochordate, H. roretzi. The urochordate CLICs retain a similar degree of conservation to the vertebrate CLICs, displaying between 45-55% identity with various vertebrate CLIC family members, with the highest levels of identity typically shared with the vertebrate ACD cluster family members. Apart from their larger N-terminal sequences, the urochordate CLICs seem to contain few obvious insertions or deletions compared to the vertebrate CLICs. Although sequence identity within this portion of the protein is low, the chordate footloop between h5 and h6 appears to be present and of a similar length to that found in the gnathostome CLICs. This implies either that this footloop was present in a common ancestor of the urochordates and vertebrates but was subsequently shortened in the agnathan CLICs or, perhaps less likely, that it evolved independently in both these lineages. Within the CLIC module itself, the most highly conserved region (identity levels >90%) encompasses the secondary structural elements s1, h1 and s2. These include the active site CxxS motif and first putative transmembrane region.

It is interesting to note that the degree of conservation of the CLICs within the urochordate lineage is much lower than among the vertebrates. This may be because these proteins are less functionally constrained in the urochordates compared to the vertebrates, possibly due to the lower number of cell types, and hence have undergone more rapid evolution. Alternatively, and perhaps more likely, anthropocentric biases caused by the lower number of defining features in the urochordates and the sparse nature of their fossil record may mean that the current classification schemes within this lineage do not reflect comparable degrees of evolutionary relationships to those used for the vertebrates.

The presence of a single Ciona CLIC is not wholly unexpected; the urochordates are thought to have diverged prior to multiple rounds of whole or segmental genomic duplication that occurred early in vertebrate evolution, and the expansion of many protein families is thought to have occurred during this time145. The two other protein families contained in the triplicated genome ACD clusters, the Runt and DSCR1 proteins, also appear to have only a single family member within the Ciona genome148,149. Additionally, the single Runt, CLIC and DSCR1 genes all appear to be in distant portions of the Ciona genome. This indicates that the gene linkage that must have occurred prior to the ACD cluster duplication events was not present in the common ancestor of the urochordates and vertebrates. It has been reported by Stricker et al. that there is only a single Runt protein in the lancelet Branchiostoma floridae149, suggesting that the ACD cluster expansions also postdate the divergence of the vertebrates and cephalochordates.

Table 3-1 Known chordate CLIC sequences (shown overleaf). Table 3-1 describes which chordate CLIC sequences are known and their origin. The first three columns (from left) describe the subphylum, class and order of each species respectively. Names in bold in these columns are higher order divisions, these refer to all species below them until the next bold entry within this column or a new entry in a column to the left. The fourth and fifth columns give the common and scientific names of each species. The sixth column describes which CLIC family members are known from each species, when multiple isoforms are known this is designated with a capital letter, when only a partial sequence is known a lower case “p” follows the number of the orthologue. The seventh column describes how the CLICs are known in each species, whether via EST or genomic sequence. References to the origin of each sequence are given in the last column, when these are based on unpublished work only the author or institution is given.

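Table 3-1 Known chordate CLIC sequences. Higher order divisions are given in parentheses.

Subphylum | Class | Order | Common name | Scientific name | Family members known | Identified via | Citations
Urochordata | Ascidiacea | Phlebobranchia | Sea squirt | Ciona intestinalis | a, b, c, d | Genomic + EST | 127, Genoscope 2002, Satoh et al. 2002
Urochordata | Ascidiacea | Phlebobranchia | Sea squirt | Ciona savignyi | b, c | Genomic + EST | Broad institute, Satou et al. 2004
Urochordata | Ascidiacea | Stolidobranchia | Sea squirt | Halocynthia roretzi | 1p | EST | Makabe et al. 1999
Vertebrata (Agnatha) | Cephalaspidomorphi | Petromyzontiformes | Sea lamprey | Petromyzon marinus | 1p, 5p | EST | 146
Vertebrata (Gnathostomata) | Chondrichthyes | Rajiformes | Little skate | Leucoraja erinacea | 1p | EST | Towle et al. 2004
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Tetraodontiformes (Acanthopterygii) | Japanese puffer fish | Fugu rubripes | 1, 2, 3, 4, 5A, 5B, 5L, 6 | Genomic | 143
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Gasterosteiformes (Acanthopterygii) | Three spined stickleback | Gasterosteus aculeatus | 1, 2, 5A | EST | Kingsley et al. 2003
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Beloniformes (Acanthopterygii) | Japanese medaka | Oryzias latipes | 4p | EST | Kohara et al. 2001
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Tetraodontiformes (Acanthopterygii) | Brown spotted puffer fish | Tetraodon nigroviridis | 1, 2, 3, 4, 5A, 5B, 5L, 6 | Genomic | 144
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Cypriniformes (Ostariophysi) | Common carp | Cyprinus carpio | 1p | EST | 150
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Cypriniformes (Ostariophysi) | Zebra fish | Danio rerio | 1, 2, 3, 4, 5A, 5B | Genomic + EST | Sanger institute, NIH-MGC 1999
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Siluriformes (Ostariophysi) | Channel catfish | Ictalurus furcatus | 2, 5p | EST | 151, Liu et al. 2004
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Salmoniformes (Protacanthopterygii) | Rainbow trout | Oncorhynchus mykiss | 1p, 2p, 3p, 4p, 5p, 6p | EST | Govoroun et al. 2003, 152
Vertebrata (Gnathostomata) | Actinopterygii (Teleostei) | Salmoniformes (Protacanthopterygii) | Atlantic salmon | Salmo salar | 2, 4p, 5A, 5Bp | EST | GRASP Consortium 2002
Vertebrata (Gnathostomata) | Aves (Sarcopterygii, Amniota, Diapsida) | Galliformes | Red jungle fowl | Gallus gallus | 2, 3p, 4, 5A, 6 | Genomic + EST | 141, Fitzsimmons et al. 2004, Cogburn et al. 2002
Vertebrata (Gnathostomata) | Lepidosauria (Sarcopterygii, Amniota, Diapsida) | Squamata | Japanese gecko | Gekko japonicus | 1p | EST | Gu et al. 2004
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Artiodactyla (Eutheria) | Common cow | Bos taurus | 1, 2, 3p, 4, 5B, 6p | Genomic + EST | NHGR institute, 14, 153, Sonstegard et al. 2004
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Artiodactyla (Eutheria) | Common sheep | Ovis aries | 3p | EST | Gray et al. 2003
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Artiodactyla (Eutheria) | Common pig | Sus scrofa | 1, 2p, 4p | Genomic + EST | Sanger institute, 154
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Carnivora (Eutheria) | Domestic dog | Canis familiaris | 1, 2p, 4p, 5p, 6p | Genomic | Lindblad-Toh et al. 2004
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Lagomorpha (Eutheria) | Common rabbit | Oryctolagus cuniculus | 1, 6 | EST | 15, 20
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Primates (Eutheria) | Human | Homo sapiens | 1, 2A, 2B, 3, 4, 5A, 5B, 6A, 6B | Genomic + EST | 20, 23, 25, 31, 34, 35, 53, 55, 123, 124
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Primates (Eutheria) | Rhesus monkey | Macaca mulatta | 1, 2, 3, 4, 5A | Genomic + EST | Daza-Vamenta et al. 2004, Katze et al. 2003
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Primates (Eutheria) | Chimpanzee | Pan troglodytes | 1p, 5p, 6A, 6B | Genomic | Whitehead institute, 155
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Primates (Eutheria) | Orangutan | Pongo pygmaeus | 4 | EST | Ansorge et al. 2004
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Rodentia (Eutheria) | Common mouse | Mus musculus | 1, 3, 4, 5A, 5B, 6 | Genomic + EST | 139, NIH-MGC 1999
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Rodentia (Eutheria) | Norwegian rat | Rattus norvegicus | 1, 2, 3, 4, 5A, 5b, 6 | Genomic + EST | 39, 54, 140, 156, 157, Amgen EST Program 2003
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Rodentia (Eutheria) | Ground squirrel | Spermophilus lateralis | 2p | EST | Williams et al. 2004
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Didelphimorphia (Metatheria) | Short-tailed possum | Monodelphis domestica | 1, 4p, 5p | Genomic | Broad institute
Vertebrata (Gnathostomata) | Mammalia (Sarcopterygii, Amniota, Synapsida) | Monotremata (Prototheria) | Australian duckbilled platypus | Ornithorhynchus anatinus | 4p | Genomic | NHGR institute
Vertebrata (Gnathostomata) | Amphibia (Sarcopterygii) | Anura | African clawed frog | Xenopus laevis | 1, 4, 5A | EST | 136, 142
Vertebrata (Gnathostomata) | Amphibia (Sarcopterygii) | Anura | Pipid frog | Xenopus tropicalis | 1, 3, 4, 5A, 6 | Genomic + EST | JGI, 142
Vertebrata (Gnathostomata) | Amphibia (Sarcopterygii) | Caudata | Mexican walking fish | Ambystoma mexicanum | 1p | EST | 155, 158
Vertebrata (Gnathostomata) | Amphibia (Sarcopterygii) | Caudata | Eastern tiger salamander | Ambystoma tigrinum tigrinum | 1p | EST | 158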


3.3.3 CLICs from other Metazoan phyla

The early steps in evolution leading to the present diversity of metazoan animals remain controversial, despite the completed genomes of several model organisms within various branches of the evolutionary tree. The metazoan animals are grouped, relatively easily, into distinct phyla based on shared body forms; however, determining the relationships between these various phyla is problematic. As reviewed by Adoutte et al., two competing hypotheses detailing possible phylogenetic relationships of the metazoans have been proposed159. These are shown schematically in Figure 3-6 and discussed in the following two paragraphs, which are directly paraphrased from this review:

Figure 3-6 Proposed metazoan molecular phylogenies. A: example of a “traditional” metazoan phylogeny that shows a gradual increase in complexity of the body plan throughout evolution. B: depiction of the phylogenetic relationships proposed under the Ecdysozoa hypothesis as determined from rRNA phylogenies. Figure, with minor modifications, is originally from Adoutte et al.159.

Starting with the “traditional” phylogenetic relationship, shown in Figure 3-6A above, the metazoans are essentially organised in a hierarchical manner according to the complexity of their body-plans, based on morphological characteristics and fossil data. In this view, the Porifera or sponges are thought to have been one of the first extant metazoan phyla to diverge from other metazoans, occurring early in evolution during the late Precambrian. The Cnidarians (hydra, jellyfish and corals) and Ctenophores (comb-jellies), with only two germ layers (diploblastic) and radial symmetry, are thought to have been the next phyla to diverge from the other animals, the majority of which contain a third germ layer (triploblastic) and are primarily bilaterally symmetric. The bilaterians are then, depending on whether they contain a coelom, further subdivided into the acoelomates and coelomates, the main acoelomate phyla being the flatworms or platyhelminths. A number of small organisms, including the nematodes, do not easily fit into one of these two groups. While not containing a true coelom, these organisms display some internal cavities and are classified as the pseudocoelomates. Usually, in the traditional phylogenetic view, the pseudocoelomates are shown as diverging at an intermediary stage between the coelomates and acoelomates. The coelomates can be further subdivided into the protostomes (mouth first) and deuterostomes (mouth second), according to how the coelom develops during embryogenesis. Some of the major phyla within the protostome clade are the arthropods, annelids and molluscs, while the deuterostome clade comprises the chordate, hemichordate and echinoderm phyla. A small group of phyla, consisting mainly of the brachiopods, contain both deuterostome and protostome traits. These are called the lophophorates, and in traditional phylogenetic trees are usually shown as being more closely related to the deuterostomes than the protostomes.

A still highly controversial competing phylogenetic relationship is shown in Figure 3-6B. This hypothesis proposes three fundamental bilaterian clades, which replace the previous distinctions based upon the presence of a coelom. This phylogenetic relationship, called the Ecdysozoa hypothesis, is based on recent, preliminary molecular phylogeny studies. Within the coelomates, instead of a loose association with the deuterostomes, rRNA analysis suggests that the lophophorates are firmly associated with the protostomes, in particular the molluscs and annelids. A clade of segmented animals grouping the arthropods, annelids and others had previously been suggested, but was not supported by rRNA phylogenies160. The same analysis also suggested that the pseudocoelomates are an artificial grouping capable of being split into independent clades. Also, the basal relationship of the pseudocoelomates and the acoelomates with respect to the coelomates was not supported. Several of the pseudocoelomate phyla, including the nematodes, nematomorphs and priapulids, were uplifted into the protostomes and together with the arthropods appeared to form a separate clade of moulting animals named the Ecdysozoa. The acoelomate nemertine and platyhelminth phyla were also uplifted into the protostomes and grouped with the molluscs, annelids and the lophophorate subgroup into a sister clade named the Lophotrochozoans. In rRNA phylogenies, the relationship between the bilaterians and the three diploblastic groups is more ambiguous than would be assumed based on physiology alone; all three diploblastic groups appeared to be equidistant to the triploblasts.

The Ecdysozoa hypothesis remains highly controversial, particularly because of the small size of the rRNA datasets used to construct the original molecular phylogenies on which it is based. Since its proposal, molecular phylogeny relationships constructed from larger sequence datasets have been reported that both support161 and contradict162 the hypothesis. The issue is unlikely to be resolved until a larger database of sequences allows more conclusive molecular phylogeny studies to be performed. The differences in the relationships between phyla predicted by these two proposed phylogenies should be kept in mind when attempting to draw conclusions from residue conservation within protein families. This is particularly true when making comparisons between the vertebrates and the two model organisms D. melanogaster and C. elegans. Under traditional phylogenies the arthropods are predicted to be more closely related to the vertebrates than they are to the nematodes, while under the Ecdysozoa hypothesis the reverse is true.

3.3.3.1 Non-chordate Deuterostome CLICs

All the CLICs previously discussed are from organisms within the phylum Chordata. While keeping in mind the concerns discussed in Section 3.3.1, several CLIC-like sequences are known from organisms in other phyla. The three phyla ascribed to the clade Deuterostomia are thought to have evolved from a common ancestor more than 550 Myr BP127. The only known echinoderm CLIC sequence is derived from an EST sequencing project for the sea urchin Strongylocentrotus purpuratus163. This sequence covers around half of the CLIC module and retains ~35% identity with the chordate CLICs. Unlike the urochordate sequences, there appear to be multiple insertions and deletions compared to the vertebrate CLICs: the chordate footloop appears absent; the interdomain loop is shortened; and small insertions, with respect to the vertebrate CLICs, occur near h2.

3.3.3.2 Arthropod CLICs

The arthropods are one of the most successful metazoan phyla, with many arthropods affecting human well-being. Several arthropod species are important model organisms, while others act as human and animal disease vectors or parasites, or as agricultural benefactors or pests, and some arthropods are themselves farmed. Thus multiple arthropod genome131,133-135 and EST sequencing projects164-170 exist, from which CLIC-like sequences can be identified. These sequences are summarised in Table 3-2 and described in detail below.

3.3.3.2.1 Arachnid CLICs

Salivary gland EST sequencing projects for the ixodid tick (A. variegatum)165 and ear tick (R. appendiculatus)164 have located CLIC-like sequences in these two important agricultural parasites. These sequences are highly similar to each other (94% identity) and share around 75% identity with other arthropod CLICs.

3.3.3.2.2 Crustacean CLICs

Most arthropods consumed by humans are crustaceans, with several species also intensively farmed. EST sequencing projects for three important aquaculture species, the penaeid prawn (M. japonicus)166, white prawn (L. vannamei) and American lobster (H. americanus), have all located partial CLIC-like sequences. These crustacean sequences are incomplete, but display around 80% identity to the other arthropod CLIC-like sequences.

Table 3-2 Known CLIC-like sequences within the Arthropods. Table 3-2, shown overleaf, describes the CLIC homologues known in the arthropods. Columns 1 to 8 (counting from left) are similar to those in Table 3-1. Column 9 shows the percentage amino acid identity with the D. melanogaster CLIC homologue for each protein. As some sequences are only partial, the number of amino acids over which this identity is calculated is also shown.

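Subphylum | Class | Order | Common name | Scientific name | Partial/complete | Identified via | Citation | Identity with dmCLIC
Chelicerata (Spiders and scorpions) | Arachnids | Parasitiformes | Hard tick | Amblyomma variegatum | Partial | EST | 165 | 76% (193aa)
Chelicerata (Spiders and scorpions) | Arachnids | Parasitiformes | Ear tick | Rhipicephalus appendiculatus | Complete | EST | 164 | 78% (241aa)
Crustacea (Crustaceans) | Malacostraca | Decapoda | American lobster | Homarus americanus | Partial | EST | Towle and Smith 2004 | 43% (58aa)
Crustacea (Crustaceans) | Malacostraca | Decapoda | White prawn | Litopenaeus vannamei | Partial | EST | Bartlett et al. 2004 | 81% (192aa)
Crustacea (Crustaceans) | Malacostraca | Decapoda | Penaeid prawn | Marsupenaeus japonicus | Partial | EST | 166 | 81% (103aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Diptera (flies) | Yellow fever mosquito | Aedes aegypti | Partial | EST | 167 | 94% (125aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Diptera (flies) | Malarial mosquito | Anopheles gambiae | Complete | Genomic + EST | 134, Lobo et al. 2003 | 89% (255aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Diptera (flies) | Fruit fly | Drosophila melanogaster | Complete | Genomic + EST | 131, Harvey et al. 2001 | 260aa total
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Diptera (flies) | Fruit fly | Drosophila pseudoobscura | Complete | Genomic | 133 | 91% (260aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Diptera (flies) | Savannah tsetse fly | Glossina morsitans morsitans | Partial | EST | 168 | 87% (151aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Lepidoptera (butterflies and moths) | Silkworm | Bombyx mori | Complete | Genomic | 135 | 87% (248aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Lepidoptera (butterflies and moths) | Cotton bollworm | Helicoverpa armigera | Partial | EST | Gruber et al. 2002 | 86% (178aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Hymenoptera (ants, bees and wasps) | Honey bee | Apis mellifera | Complete | Genomic + EST | HGSC center 2004, 170 | 78% (252aa)
Hexapoda (Insects) | Pterygota (Division: Endopterygota) | Siphonaptera (fleas) | Cat flea | Ctenocephalides felis | Partial | EST | 169 | 87% (128aa)
Hexapoda (Insects) | Pterygota (Division: Paraneoptera) | Hemiptera (bugs) | Pea aphid | Acyrthosiphon pisum | Complete | EST | Hunter et al. 2004 | 77% (258aa)
Hexapoda (Insects) | Pterygota (Division: Paraneoptera) | Hemiptera (bugs) | Glassy-winged sharpshooter | Homalodisca coagulata | Partial | EST | Hunter et al. 2003 | 84% (200aa)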
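An identity value of the kind reported in the final column of Table 3-2, a percentage together with the number of residues over which it was calculated, could be obtained from a pairwise alignment along the following lines. This is a minimal sketch under stated assumptions: the text does not specify how gapped positions were treated, so excluding them from the comparison is an assumption made here, and the function name and the toy aligned strings are hypothetical rather than real CLIC sequences.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (percent identity, number of residue pairs compared).

    Both inputs must already be aligned and therefore of equal length.
    Columns containing a gap in either sequence are skipped (an assumed
    convention; other conventions count gaps in the denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared, compared


# Hypothetical toy alignment: five comparable columns, four identical.
identity, n_compared = percent_identity("ACDE-F", "ACDQGF")
```

Reporting the number of compared residues alongside the percentage matters for partial sequences, since a value computed over a short fragment is a much weaker estimate than one computed over the full CLIC module.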


3.3.3.2.3 Insect CLICs

The insects are one of the largest subgroups within the arthropods. Because of their roles in agriculture and disease, multiple genomic and EST sequencing projects have identified many insect CLIC-like proteins. As discussed in Section 3.2.2, the relatively complete genomes of the insects A. gambiae134, A. mellifera (HGSC 2004), B. mori135, D. melanogaster131,132 and D. pseudoobscura133 all appear to have only a single CLIC-like sequence. The various arthropod EST projects are also consistent with there being only a single CLIC-like protein within most, if not all, arthropod species.

As well as these genomic sequences, EST CLIC-like sequences are known for a variety of other insects. All of these sequences are from within the superorder Neoptera, with the majority coming from the division Endopterygota and, within it, the order Diptera (flies). Sequences are also known from the orders Hemiptera (bugs), Hymenoptera (ants, bees and wasps), Lepidoptera (butterflies and moths) and Siphonaptera (fleas). Most of these insect sequences show sequence identities of 85-90% with the D. melanogaster CLIC-like protein (dmCLIC). Only two of the insect proteins show lower identities, of around 75%, to dmCLIC: the honeybee sequence, which is thought to have a basal relationship within the division Endopterygota, and the pea aphid sequence, which lies within the sister division Paraneoptera. These levels of identity are on par with those shared between dmCLIC and the non-insect arthropod CLIC-like proteins.
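The identity figures quoted here and in the accompanying tables pair a percentage with the number of aligned residues, for example 76% (193aa). As a minimal sketch of how such a figure can be computed from a pairwise alignment, the Python function below counts matching residues over the columns in which neither sequence has a gap. The function name, the gap convention and the toy sequences are illustrative assumptions, and the denominator used for the values in this thesis may have been defined differently.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (percent identity, number of compared columns) for two aligned sequences.

    Assumption: both strings come from the same alignment, so they have equal
    length, and columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, compared


# Toy alignment (hypothetical residues, not real CLIC sequences).
identity, length = percent_identity("ACDE-FGHIK", "ACDQGFGH-K")
print(f"{identity:.1f}% ({length}aa)")
```

Reporting the aligned length alongside the percentage matters for the partial EST sequences, where a high identity over a short fragment carries less weight than the same identity over a full-length protein.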

Overall, the arthropod CLIC-like proteins are highly similar to the vertebrate CLICs. Unlike the urochordates they lack the chordate footloop between h5 and h6. Instead, they have other insertions relative to the chordates: all appear to possess 20 to 50 residue C-terminal extensions following h9 and a ~4 residue insertion at the C-terminus of h1, and, compared to the smaller vertebrate CLICs, they typically have ~10-20 more residues N-terminal to s1. Also, instead of the active site CxxS motif present in the chordates, the arthropods contain a two-cysteine CxxC motif.

3.3.3.3 Mollusc CLICs

Although less studied than the arthropods, the molluscs represent another highly successful phylum within the animal kingdom. Some mollusc species are important agricultural pests, while others are major fishing and aquaculture species. CLIC-like proteins are known for two closely related molluscs, the Pacific oyster (Crassostrea gigas) and the Eastern oyster (Crassostrea virginica). Both are partial sequences arising from EST sequencing projects. Although these mollusc sequences are incomplete, they appear to be most closely related to the arthropod and platyhelminth CLIC-like proteins (30-35% identity), while remaining clearly related to the vertebrate CLICs (25-30% identity). Compared to the arthropod CLICs, the mollusc proteins appear to have a ~10 residue insertion near the interdomain loop; it is unknown whether they also possess a C-terminal extension.

Interestingly the CLIC-like sequence from C. gigas was derived from an EST library looking at genes upregulated following bacterial challenge171, possibly suggesting these proteins play an immune-related role in the molluscs. CLICs or CLIC-like proteins thus appear to be implicated in immune responses within the vertebrates, agnathans and molluscs. If true, these proteins must be involved in some function common to all three immune systems, the diversity of which suggests a fundamental role evolving early in metazoan evolution.

3.3.3.4 Nematode CLICs

The nematodes or roundworms are a large metazoan phylum. However, the size of the nematodes and their lack of easily identifiable distinguishing features have made it somewhat difficult to classify them or to determine the total number of species. Nematodes can be either free-living or parasitic, with many parasitic species directly affecting humans or their domesticated plants and animals. Their simple anatomy and the ease with which they can be cultured have also meant that several nematode species have played an important role as experimental organisms, the archetype being C. elegans. For these reasons, the nematodes are a highly studied phylum, with EST and genomic sequencing projects underway or completed for many species. From such projects multiple nematode CLIC-like sequences are known (see Table 3-3).

There appear to be two CLIC-like proteins within the nematodes, EXC-4 and EXL-1, which were originally identified in C. elegans by Berry et al.137. EST and gene sequences are known for both proteins from several nematode species (see Table 3-3). The majority of these sequences are from parasitic nematodes or from relatives of the model species C. elegans.

Within the nematodes, EXC-4 is most closely related to the CLIC-like proteins from other phyla. This orthologue retains ~40% identity to the arthropod proteins, and identities of around 25% with the chordate CLICs. Although many of the sequences are partial, within the nematodes themselves EXC-4 shows sequence identities of 65-75% between distantly related orthologues, and greater than 95% when comparing sequences from within the same genus.

While the sequence identity between them is low (~30%), EXL-1 is presumably a paralogue of EXC-4 resulting from a gene duplication event in an ancestor of the nematodes. If this is the case, the selection pressures on EXL-1 appear to have been much weaker than those on EXC-4. Between distantly related nematode species the identity between EXL-1 orthologues is less than 50%, and even between species from a single genus it can be as low as 75%. This may indicate that, following the presumed duplication event, one copy of the parental protein retained most of its original function, diverging only slowly with time to become EXC-4. The second copy, freed of some selection constraints, could diverge more significantly and evolve a related cellular role. Even so, their functions have not diverged too far, because EXL-1 expressed from an EXC-4 promoter remains capable of rescuing the EXC-4 knockout phenotype137. If EXC-4 and EXL-1 are direct paralogues, it is also notable that, in addition to the significant divergence in their amino acid sequences, there appears to be little if any relationship between their gene structures.

The nematode CLIC-like proteins are most closely related to those found in the arthropods. They contain N- and C-terminal extensions to the CLIC module of similar sizes, and most of the insertions within the CLIC module seen in the arthropods with respect to the chordate CLICs are also present in the nematodes. However, there are several minor differences unique to the nematodes: EXC-4 contains a short insertion between s3 and s4, and a much larger insertion between h4 and h5.

3.3.3.5 Platyhelminth CLICs

The platyhelminths are like the nematodes in that their relationship to the other phyla is uncertain and differs considerably between the two proposed phylogenetic relationships discussed in Section 3.3.3. The platyhelminths have a significant impact on humans through the many parasitic species that adversely affect the health of both humans and their domesticated animals.

Platyhelminth CLIC-like sequences have been identified in data from multiple EST sequencing projects as well as in the genomic sequence of Schistosoma mansoni (Sanger Institute and TIGR, www.sanger.ac.uk/Projects/S_mansoni/). The EST-derived CLIC-like sequences come from the following platyhelminths: the bloodflukes S. japonicum172 and S. mansoni173; the tapeworm Echinococcus granulosus (Fernandez and Maizels 2001); and the model organism Dugesia japonica174. Interestingly, when comparing EST data with the S. mansoni genomic sequence, it appears that, unlike the genes of other known CLIC-like proteins, the ORF is not interrupted by introns.

These four sequences, which cover three of the four platyhelminth classes, show relatively low sequence identities within the phylum. The two known Schistosoma species retain identities of only around 75%, while identities between the platyhelminth classes are as low as ~35%. This is almost on par with the level of conservation observed with sequences from other phyla. This low level of conservation may be the result of rapid evolution or early radiation of these CLIC-like proteins within the platyhelminths or, as with the tunicates and nematodes, may be indicative of classification problems arising from a lack of distinctive features within such organisms.

Table 3-3 Known nematode CLIC-like sequences. Table 3-3 describes the number and origin of known nematode CLIC homologue sequences. Column 1 (counting from left) identifies the protein orthologue, column 2 the nematode species clade and column 3 its order. Columns 4-9 are the same as Table 3-2, where column 9 displays identity with the respective C. elegans EXC-4 or EXL-1 orthologue.

Orthologue | Clade | Order | Common name | Scientific name | Complete/partial | Identified via | Citations | Identity with C. elegans
EXC-4 | I | Trichurida | Trichinosis | Trichinella spiralis | Partial | EST | McCarter et al. 1999 | 66% (114aa)
EXC-4 | I | Trichurida | Canine whipworm | Trichuris vulpis | Partial | EST | McCarter et al. 1999 | 68% (94aa)
EXC-4 | III | Filariata | elephantiasis | Brugia malayi | Complete | Genomic + EST | TIGR, Williams, S. A. 1995, Blaxter et al. 1996 | 76% (285aa)
EXC-4 | III | Spirurida | African riverblindness nematodes | Onchocerca volvulus | Partial | EST | Williams et al. 1997 | 75% (182aa)
EXC-4 | IVa | Rhabditida | none | Strongyloides ratti | Partial | EST | McCarter et al. 1999 | 70% (130aa)
EXC-4 | IVa | Rhabditida | Threadworm | Strongyloides stercoralis | Partial | EST | McCarter et al. 1999 | 69% (219aa)
EXC-4 | IVb | Tylenchida | Soybean cyst nematode | Heterodera glycines | Partial | EST | McCarter et al. 1999 | 68% (266aa)
EXC-4 | IVb | Tylenchida | Root knot nematode | Meloidogyne chitwoodi | Partial | EST | McCarter et al. 1999 | 70% (185aa)
EXC-4 | IVb | Tylenchida | Root knot nematode | Meloidogyne hapla | Partial | EST | McCarter et al. 1999 | 77% (87aa)
EXC-4 | IVb | Tylenchida | Root knot nematode | Meloidogyne incognita | Partial | EST | McCarter et al. 1999 | 73% (216aa)
EXC-4 | IVb | Tylenchida | Coffee root-knot nematode | Meloidogyne paranaensis | Partial | EST | McCarter et al. 1999 | 70% (145aa)
EXC-4 | V | Rhabditida | none | Caenorhabditis briggsae | Complete | Genomic | 129 | 97% (291aa)
EXC-4 | V | Rhabditida | The elegant worm | Caenorhabditis elegans | Complete | Genomic + EST | 130,137 | 290aa total
EXC-4 | V | Rhabditida | none | Caenorhabditis remanei | Partial | Genomic | WUGSC | 96% (128aa)
EXC-4 | V | | Barberpole worm | Haemonchus contortus | Partial | EST | McCarter et al. 1999 | 81% (98aa)
EXL-1 | III | Ascarididae | Large pig roundworm | Ascaris suum | Partial | EST | Blaxter et al. 2000 | 43% (237aa)
EXL-1 | III | Filariata | elephantiasis | Brugia malayi | Partial | EST | TIGR, Williams, S. A. 1995 | 34% (134aa)
EXL-1 | IVa | Rhabditida | none | Parastrongyloides trichosuri | Partial | EST | McCarter et al. 1999 | 41% (199aa)
EXL-1 | V | Rhabditida | none | Caenorhabditis briggsae | Complete | Genomic | 129 | 75% (240aa)
EXL-1 | V | Rhabditida | The elegant worm | Caenorhabditis elegans | Complete | Genomic + EST | 130,137 | 238aa total
EXL-1 | V | Rhabditida | none | Caenorhabditis remanei | Genomic | EST | WUGSC | 84% (125aa)
EXL-1 | V | Strongylida | Brown stomach worm of cattle | Ostertagia ostertagi | Partial | EST | McCarter et al. 1999 | 46% (177aa)


Like the vertebrate CLICs, the platyhelminth CLIC-like proteins have relatively short N-termini prior to s1 and an active site CxxS motif. Like the arthropod and nematode CLICs, they have large C-terminal extensions following h9. Unique to the platyhelminth sequences are a ~10 residue insertion between h3 and s4 and a small deletion within h5.

3.3.4 Cnidarian CLICs

If CLIC-like proteins predate the origin of the bilaterians, it would indicate that they might fulfil a fundamental function that arose early in the evolution of the metazoans. One of the more extensively studied diploblastic organisms is the cnidarian Hydra magnipapillata. A recent Hydra EST project by Bode et al. 2002 identified two CLIC-like proteins within H. magnipapillata. A full sequence is available for one of these (defined here as HmCLICL-A) but only a partial N-terminal sequence for the other (defined here as HmCLICL-B). The currently known sequence for HmCLICL-B covers only the N-terminal 170 residues, up to the beginning of h6. Over this portion of the protein HmCLICL-B is 33% identical to the same region in HmCLICL-A.

The HmCLICL proteins share sequence identities of 25-30% within the CLIC module of the known CLIC-like proteins in the various bilaterian phyla. Compared to the vertebrate CLICs, in the HmCLICL proteins the footloop between h5 and h6 is absent and the interdomain loop is slightly shorter, but the vertebrate active site CxxS motif is present. Unlike the arthropod CLIC-like proteins, there are no obvious N- or C-terminal extensions to the CLIC module. The presence of at least two CLIC-like proteins in Hydra implies that during the course of evolution there has not been a simple increase in the number of CLIC family members with increasing “complexity” of body plan. Instead, this implies that the CLIC family experienced multiple rounds of gene duplication and loss in different evolutionary lineages.

3.4 Sequence alignment

An alignment of the CLIC homologues from the organisms discussed in Section 3.3 is shown in Insert 1. The alignment begins just prior to the first proline at the N-terminus of β-strand 1. The majority of these sequences are chordate CLICs, but there are also a significant number of arthropod and nematode CLIC-like sequences, and a small number of echinoderm, mollusc, platyhelminth and cnidarian sequences. A few cautions should be noted with regard to this alignment, as it will be referred back to throughout the thesis. Firstly, it should be kept in mind that in some cases sequences may be in error. Sequence reliability is obviously related to the number of times each homologue has been independently sequenced; in the worst cases only a single partial EST is known (see Table 3-1, Table 3-2, Table 3-3 and Section 3.3 for details). Secondly, it should be noted that in regions with significant structural divergence, sequence comparisons become less meaningful. For example, residues at the extreme N- or C-termini or those immediately adjacent to species-specific insertions or deletions may adopt alternate conformations; it should therefore be less surprising, but still informative, if cross-species comparisons in these regions identify differences. Finally, in some regions the current alignment is likely to simply be incorrect. The known structures of the CLIC homologues in the chordates (Chapter 2 and Chapter 4), arthropods (Chapter 6) and nematodes (Chapter 6) give a structural basis for the alignment of the sequences from these phyla. However, for sequences in other phyla, in regions where there is only low sequence similarity to a homologue of known structure, the alignment is less clear. This is particularly the case for the echinoderm, cnidarian, mollusc and platyhelminth sequences around helices 3 and 4, and the connecting interdomain loop.

There are few invariant residues within the CLIC homologues seen in the alignment in Insert 1. The residues that are highly conserved throughout the GSTs, which Dulhunty et al.4 first used to classify the CLICs, are present in only some sequences:

1. The glycine residue at the C-terminus of helix 1 is present only in the chordate members; there is an insertion within this region in the arthropods and nematodes.
2. The conserved cis-proline at the N-terminus of β-strand 1 is one of the few invariant residues.
3. Within a region known as the GST motif II sub-domain, an aspartic acid is conserved in all but one sequence, HmCLICL-B (which may be a sequence error).
4. A second glycine residue within this sub-domain is highly conserved in the GSTs. It is present in most of the CLIC homologues but is replaced by a serine residue in some nematode CLIC-like sequences.

Because the CLICs are members of the GST-fold superfamily, redox state is likely to be an important aspect of their function. The three conserved cysteines in the CLIC family can be seen within the alignment in Insert 1. The protein microenvironments of these residues were discussed previously in Section 2.1. The first of these conserved cysteines (Cys-24 in CLIC1) occurs at the N-terminus of helix 1; it is within the active site CxxC motif and can form a mixed disulfide bond with GSH. It is the least conserved of the three cysteines, as it is replaced by an aspartic acid in some nematode EXC-4 or EXL-1 proteins. The second cysteine (Cys-178 in CLIC1) is invariant throughout the CLIC homologues; it follows the conserved aspartic acid in the GST motif II sub-domain discussed above. The third and final cysteine (Cys-223 in CLIC1) is at the C-terminus of helix 8 and is conserved in all but two sequences (CLIC1 in C. carpio and O. latipes), both of which are known from a single EST whose sequence deteriorates near this region.

While much could be said about this alignment, the majority of sequence relationships await the introduction of the relevant structures presented in later chapters. Instead, the relationships between the various CLIC homologues can be summarised in the phylogram shown in Figure 3-7. Only full-length sequences were used in the alignment to construct this phylogram so some of the sequences shown in Insert 1 are not present.

In the phylogram in Figure 3-7 each of the vertebrate CLIC orthologues can be seen to cluster into clearly defined branches. These branches cluster into two subgroups: the highly related ACD cluster CLICs associated with the Runt family (in green, cyan and black) and the more loosely related CLICs 1-3 (in red, blue and magenta). Within the branches of each CLIC orthologue, a clear division can be seen between the teleost and tetrapod sequences.

The global structure of the phylogram shown in Figure 3-7 also appears reasonable. Because the cnidarians are among the most evolutionarily distant animals, the root of the tree is presumably near the sequence of the cnidarian H. magnipapillata. This sequence, shown in tan, roughly divides the tree into two halves: on one side are the protostome sequences, on the other the chordates. The chordate sequence nearest to H. magnipapillata is that of C. intestinalis, a urochordate which has a basal relationship to the other chordates. The remaining vertebrate sequences then sub-divide into their separate orthologue-specific clades. With fewer sequences and more divergent relationships, less can be said about the protostome sequences.

Figure 3-7 Phylogram of CLIC and CLIC-like proteins. Alignments and distance matrix calculated using the program Clustal W120, tree drawn using the program TREEVIEW175.

Within the invertebrates the nematode EXC-4 proteins appear to be more closely related to the CLIC-like proteins in the arthropods than to those in the platyhelminths, possibly indicative of a common Ecdysozoa lineage. Further sequences will be needed to resolve these relationships. As discussed in Section 3.3.3.4, the nematode EXL-1 protein is more divergent than EXC-4. This can be seen in the phylogram, where the EXL-1 proteins form an independent branch. Also, in the sequence alignment (Insert 1) many of the residues conserved in other CLIC homologues are not maintained in the EXL-1 proteins. The platyhelminth and cnidarian sequences form their own independent branches in the phylogram in Figure 3-7, and significant differences can be seen between them and the chordate CLICs in the alignment in Insert 1.

3.5 Conclusions

The CLICs appear to be part of an evolutionarily ancient protein family that may pre-date the evolution of the bilaterian animals. In the vertebrates this family has expanded, presumably via a series of duplication events, to contain 6 members in the tetrapods and 7 members in the teleosts. For 3 (tetrapods) or 4 (teleosts) of these CLIC family members, this duplication process occurred in tandem with members of the extensively studied Runt family (see Figure 3-8), possibly suggesting that the functions of these proteins may be related.

There is good evidence to suggest that urochordates, echinoderms, arthropods and platyhelminths all contain only a single CLIC-like protein. From this, it was hypothesised that only a single proto-CLIC was present in the common ancestor of the chordates and perhaps also that of the deuterostomes and protostomes. Although not as extensive as the vertebrate family, there are multiple CLIC family members in some invertebrate phyla, including the nematodes and cnidarians. Duplication events separate to those that led to the 6 or 7 vertebrate CLIC family members therefore most likely occurred in many different evolutionary lineages. These duplication events appear to have occurred independently of similar expansions in the invertebrate Runt family (see Figure 3-8).

Figure 3-8 Overview of Runt and CLIC family members (overleaf). A: Runt and B: CLIC gene number in the bilaterians mapped onto a simplified phylogeny based on the Ecdysozoa hypothesis. Panel A is copied from Stricker et al.149 but modified to include the recently discovered C. intestinalis protein and extra D. melanogaster and F. rubripes Runt proteins128,148. Panel B utilises the simplified phylogeny of Stricker et al.149 to present a similar representation of the evolution of the CLIC family. As described by Stricker et al., panel A gives an “Overview of runt-gene number and function in bilaterians mapped on a simplified phylogeny of bilaterians, showing a contentious position for hagfish161,176,177. Although the presented phylogeny is widely accepted, alternative chordate phylogenies have been proposed178,179. The step-wise evolution of cartilage180, bone and hematopoiesis and the most likely time intervals of gene-duplications (Gen-Dup) according to Holland181 are indicated. As the position of the fossil Ostracodermi and Placodermi is still unclear, the time of the evolution of desmal bone and the internal skeleton is also uncertain176”149.


Figure 3-8 Overview of Runt and CLIC family members.

Chapter 4 Crystal structure of reduced CLIC4

4.1 Introduction

Among all the human CLIC family members, sequence alignments suggest that CLICs 1 and 3 have diverged more from the remaining family members than they have from each other. In particular, as described in Chapter 3, CLICs 4, 5 and 6 are all closely related and appear to have arisen through the triplication of a gene cluster early in vertebrate evolution55. In order to assess the differences between CLIC1 and a more archetypal CLIC family member, the crystal structure of CLIC4 was solved.

CLIC4 (also called mtCLIC, HuH1 and p64H1) was the first p64 homologue to be identified39. It is expressed in a wide variety of tissues and is highly conserved across species23,31,32,40,41. Studies in Xenopus laevis show that CLIC4 is expressed early in embryogenesis and that it is developmentally regulated136. The intracellular localisation of CLIC4 varies considerably among different cultured cell lines and tissues, ranging from the plasma membrane23,31,137 to various intracellular organelles including the inner mitochondrial membrane44, the caveolae and trans-Golgi network23, the ER40 and large dense core vesicles32.

In mice, DNA damage results in the upregulation of CLIC4 protein levels, while upregulation of CLIC4 in transfected cells induces apoptosis44. On cellular stress, CLIC4 translocates to the cell nucleus in keratinocytes via a nuclear localisation signal sequence, 199-KVVAKKYR-206 45, and it is this form of CLIC4 that induces apoptosis45. Experiments with p53(-/-) keratinocytes indicate that CLIC4 is linked to apoptosis through both p53 dependent and independent pathways41. Initial transfection with antisense CLIC4 protects keratinocytes against p53-independent apoptosis; however, long term transfection with antisense CLIC4 is incompatible with cell survival, suggesting that CLIC4 performs an essential function in cells44.

The mitochondrial function of CLIC4 may be linked to its putative chloride ion channel activity. Studies of mitochondrial-DNA-depleted L929 cells indicate that mitochondrial CLIC4 may be important in maintaining the mitochondrial membrane potential68. In these cells, respiration is abolished and ATP does not appear to be translocated from the cytoplasm to the mitochondria, so as to allow the F1-ATPase to maintain the membrane potential. Instead, CLIC4 expression is upregulated due to increased cellular calcium and it appears to facilitate chloride uptake into the mitochondria, correlating with the maintenance of the membrane potential68. These cells show no sign of apoptosis. In contrast, overexpression of CLIC4 in keratinocytes results in a reduction of mitochondrial membrane potential44.

Like other CLIC proteins, CLIC4 appears to have both a soluble, GST-like form and an integral membrane form, which is resistant to alkali treatment23,40.

In Section 4.3 of this chapter the crystal structure of the soluble form of a CLIC4 mutant is presented. The structure is seen to be highly similar to the previously solved structure of CLIC1; a brief discussion of the small differences between these structures is given in Section 4.4.1. For the most part, however, this chapter serves as a reference point when analysing the larger structural changes observed on CLIC1 dimerisation presented in Chapter 5. It also allows family-member-specific differences to be distinguished from organism-specific differences when comparing the human and invertebrate CLIC structures presented in Chapter 6.

4.2 Materials and Methods

4.2.1 Cloning

Two CLIC4 expression constructs were a gift from Prof. M. A. Berryman⁶ and their cloning has been described previously31. These were pCLIC4(wt), containing the wild type protein, and pCLIC4(ext), which due to an incorrect antisense primer possesses a 14 residue C-terminal extension. For both constructs CLIC4 cDNA was amplified by PCR using primers that generated BamHI and either KpnI or HindIII restriction sites at the ends. For the wild type construct, pCLIC4(wt), the primers used were sense: CGCGGATCCATGGCGTTGTCGATGCCGC and antisense: GCAAGCTTTTACTTGGTGAGTCTTTTGGC. The construct pCLIC4(ext) was inadvertently made using instead the antisense primer AGGTACCTTACTTGGTAGTCTTTTGGC. The PCR products were TA-cloned (Invitrogen, Inc., Carlsbad, CA), then subsequently digested with BamHI and cloned into the pGEX-2T GST-fusion vector (Amersham Biosciences, Piscataway, NJ). Because of the incorrect primer in pCLIC4(ext), the original stop codon is out of frame, resulting in the last two residues, Thr-252 and Lys-253, being replaced by the 16 residue peptide PSKVPKGEFQHTGGRY.

⁶ Department of Biomedical Sciences, Molecular and Cellular Biology Program, Ohio University College of Osteopathic Medicine, Athens, OH 45701, USA.
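The effect of the incorrect antisense primer can be checked directly from the two primer sequences quoted in this section. The sketch below, offered as an illustration rather than as part of the original cloning work, reverse-complements each antisense primer and translates it from its first base. The function names are hypothetical and the choice of reading frame is an assumption; the codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def translate(dna):
    """Translate whole codons from position 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Antisense primers as quoted in Section 4.2.1.
ANTISENSE_WT = "GCAAGCTTTTACTTGGTGAGTCTTTTGGC"
ANTISENSE_EXT = "AGGTACCTTACTTGGTAGTCTTTTGGC"

for name, primer in (("pCLIC4(wt)", ANTISENSE_WT), ("pCLIC4(ext)", ANTISENSE_EXT)):
    print(name, translate(reverse_complement(primer)))
```

Under these assumptions the wild type primer encodes AKRLTK followed by a stop codon, whereas the pCLIC4(ext) primer is one base shorter within the coding region, which shifts the frame after the leucine and reads through as PSKVP, matching the start of the reported extension peptide.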

Both the pCLIC4(ext) and pCLIC4(wt) plasmids were transformed into E. coli BL21 (DE3) competent cells for expression.

4.2.2 Protein expression and purification

The two CLIC4 constructs were expressed and purified using the same protocol. E. coli BL21 (DE3) cells containing either pCLIC4(wt) or pCLIC4(ext) were cultured overnight in LB with 100 μg/mL Amp. The culture was used to inoculate 1.2 L of 2YT medium containing 100 μg/mL Amp at a ratio of 1:50 overnight culture to fresh medium, which was then grown at 37 °C. Expression of recombinant CLIC4 fusion protein was induced with 1 mM IPTG at an optical density of 1.0 cm-1 at 600 nm. Induction proceeded for 4 hrs before the cells were harvested by centrifugation at 6000 rpm for 10 min. at 4 °C. Cells were then resuspended in 30 mL ice-cold PBS containing 10 mM DTT and frozen at -80 °C until required.

Resuspended cells were thawed on ice and passed through a chilled French press twice at 110 MPa. Triton X-100 was added to the resulting homogenate to a final concentration of 2% v/v and incubated at room temperature for 30 min. with gentle agitation. The cell debris was then sedimented at 33,000 x g for 40 min. at 4 °C. The resulting supernatant was removed and incubated with 2 mL glutathione sepharose 4B beads (Amersham Biosciences) for 1 hr at room temperature. Unbound protein was then eluted from the column and the beads washed with ~400 mL PBS containing 0.5 mM DTT. The column was then equilibrated with 2 volumes of thrombin cleavage buffer (20 mM TRIS/HCl pH 8.4, 150 mM NaCl, 2.5 mM CaCl2, 1 mM NaN3 and 0.5 mM DTT). The bound fusion protein was cleaved from the glutathione sepharose beads by adding bovine plasma thrombin (Sigma-Aldrich) at a fusion protein:thrombin weight ratio of 50:1 and incubating at room temperature for 16 hrs.

After thrombin cleavage the CLIC4 protein was eluted from the column and DTT added to a final concentration of 10 mM. All further purification steps were carried out at 4 °C unless otherwise stated. The sample was then dialysed against buffer A (20 mM HEPES pH 7.0, 50 mM NaCl, 1 mM DTT, 1 mM NaN3) before being concentrated to ~3 mL using an Amicon YM-10 Centriprep centrifugal concentrator. This was then loaded on a Fractogel EMD DEAE 650(M) column and the bound protein eluted with a 300 mL gradient between buffer A and buffer B (20 mM HEPES pH 7.0, 1 M NaCl, 1 mM DTT, 1 mM NaN3).

CLIC4 eluted as a single peak at a buffer composition of approximately 270 mM NaCl, corresponding to a measured conductivity of 190 mS/cm. The fractions corresponding to this peak were pooled and dialysed against sizing column buffer (20 mM HEPES pH 7.0, 100 mM NaCl, 1 mM DTT, 1 mM NaN3) before being concentrated to 2.5 mL and loaded on a Superdex 75 prep grade HiLoad 26/60 column (Amersham Biosciences). The protein was eluted with 300 mL of the sizing column buffer at a flow rate of 0.8 mL/min while collecting 5 mL fractions.

4.2.3 Crystallisation

Crystals of CLIC4(ext) were grown at room temperature (~18 °C) using the hanging drop vapour diffusion method. The protein sample, at 14 mg/mL in sizing column buffer, was centrifuged at 16,000 x g for 10 min. immediately prior to setting up the crystallisation trials. 3 μL of protein was mixed with 3 μL of reservoir solution on siliconised glass slides, which were then inverted and sealed over 500 μL of reservoir solution. CLIC4(ext) crystals grew in condition #3 of the Hampton Research PEG ion screen, consisting of 0.2 M NH4F, 20% w/v monodisperse polyethylene glycol 3350. The crystals grew over a 2 week period, reaching a final size of ~400 μm x 400 μm x 200 μm (Figure 4-1).


Figure 4-1 CLIC4(ext) crystals.

Crystals grown in 0.2 M NH4F, 20% w/v PEG 3350, observed using polarising filters.

4.2.4 Data collection

The CLIC4(ext) crystals were progressively transferred into a cryoprotectant solution consisting of reservoir solution and D-glucose (final concentration 300 mg/mL). This was performed by adding 3 μL of 150 mg/mL D-glucose solution directly to the protein drop and waiting several minutes before adding 3 μL of 300 mg/mL D-glucose solution. After several more minutes the crystal was mounted in a cryo loop (Hampton Research) and washed directly in the 300 mg/mL D-glucose solution before being frozen in liquid nitrogen and mounted at 100 K in a nitrogen cryostream (Oxford Cryosystems). Diffraction data were obtained at 100 K on a Mar345 image plate mounted on a Nonius rotating anode generator using Cu Kα radiation and Osmic confocal mirror optics.

The crystals diffracted to 1.8 Å resolution in the space group P2₁2₁2₁ (a = 77.22 Å, b = 79.48 Å, c = 42.60 Å); see Figure 4-2. The crystals contain 1 molecule within the asymmetric unit and possess a Matthews coefficient (Vm) of 2.88 Å³/Da and a solvent content of 57%. The resolution of 1.8 Å was determined by following best practice as described by Phil Evans in his guide to using the CCP4 program SCALA (www.ccp4.ac.uk/courses/ECM2004/runningscala.pdf). Briefly, the SCALA statistic Mn(I)/sd(Mn(I)), an estimate of signal to noise, is used to determine the resolution cutoff. Data were included in the refinement if Mn(I)/sd(Mn(I)) was greater than 2.0. In the outer 1.9-1.8 Å shell this number is 2.4. Data were processed using the programs MOSFLM182 and SCALA183 (see Table 4-1 for data reduction statistics).
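The cell dimensions and solvent content quoted above can be related with a short calculation. The sketch below assumes an orthorhombic cell (so that the volume is a × b × c), four asymmetric units per cell for this space group, and the commonly used approximation that the protein fraction of the crystal is 1.23/Vm. The function names are illustrative, and because the molecular mass needed to derive Vm is not stated in this section, the reported Vm is taken as an input rather than recalculated.

```python
def orthorhombic_cell_volume(a, b, c):
    """Unit cell volume in cubic angstroms for a cell with all angles 90 degrees."""
    return a * b * c


def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient (A^3/Da).

    Assumption: a typical protein partial specific volume of about
    0.74 cm^3/g, which gives the usual relation solvent = 1 - 1.23 / Vm.
    """
    return 1.0 - 1.23 / vm


cell_volume = orthorhombic_cell_volume(77.22, 79.48, 42.60)
asu_volume = cell_volume / 4  # four asymmetric units per cell in this space group
print(f"cell volume: {cell_volume:.0f} A^3, per asymmetric unit: {asu_volume:.0f} A^3")
print(f"solvent content for Vm = 2.88: {100 * solvent_fraction(2.88):.0f}%")
```

With the reported Vm of 2.88 Å³/Da this approximation reproduces the stated solvent content of 57%.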


Table 4-1 Data reduction and refinement statistics for CLIC4(ext).

Reflections (unique) | 116,791 (25,137)
Completeness (1.9-1.8 Å shell) | 99.9% (99.9%)
I/σ (1.9-1.8 Å shell) | 8.4 (1.3)
Rmerge (1.9-1.8 Å shell) | 0.063 (0.58)
Boverall | 27.6 Å²
Protein (water) atoms | 1884 (158)
R factor (Rfree) | 0.195 (0.231)
r.m.s.d. bond lengths¹ | 0.016 Å
r.m.s.d. bond angles¹ | 1.46°
Ramachandran plot²: most favoured region | 93.5%
Ramachandran plot²: additionally allowed | 6.0%
Ramachandran plot²: generously allowed | 0.5% (Asp-87)
Ramachandran plot²: disallowed | 0%

¹ From REFMAC 5 184
² From PROCHECK182
r.m.s.d., root mean square deviation from standard bond lengths and angles determined by high resolution small molecule crystal structures184.

Figure 4-2 Diffraction image from CLIC4(ext) crystal. Collected on a Mar345 image plate (Δϕ = 1°, distance = 120 mm). Resolution is 1.66 Å at the edge of the plate.


4.2.5 Structure determination and refinement

The CLIC1 reduced monomer structure (1K0M)3 was used as a molecular replacement probe using the CCP4183 program AMoRe185. An initial phasing model consisting of residues 16-165 and 175-241 from the CLIC1 structure was used in the program wARP186 for phase refinement. The resulting electron density was clear and the CLIC4 sequence was built onto the original CLIC1 model in the program O187. This was refined using maximum likelihood methods (program REFMAC 5 184). The final model consists of residues 16-163 and 174-257 plus 158 water molecules. Two residues have cis peptide bonds: Pro-76 and Pro-102. The final R-factor is 0.195 with R-free 0.231 (R-free calculated with 5% of the data, 1,280 reflections). The data reduction and refinement statistics are summarised in Table 4-1. Water molecules were added using a Babinet model with mask184; the mask parameters used were a vdw probe radius of 1.40 Å, an ion probe radius of 0.80 Å and a shrinkage radius of 0.80 Å. Structures were validated using PROCHECK182 with default parameters.

4.3 Results

CLIC4(ext) appears monomeric in the crystal as there are no extensive (>500 Å²) contacts between any two molecules within the crystal. The CLIC4(ext) molecule has approximate dimensions 50 x 40 x 20 Å. The final model consists of residues 16-163 and 174-257, with the break in density corresponding to the flexible foot loop between helix 5 and helix 6. The soluble reduced structure of CLIC4(ext) (Figure 4-3A) closely resembles the structure of the same form of CLIC1 (Figure 4-3B). This is seen by a total RMS deviation of 0.77 Å between the CLIC1 and CLIC4(ext) Cα atoms over residues 17-159 and 175-251⁷ (Figure 4-3C). The backbone structures of the two CLICs overlay almost perfectly except for a small movement of helix 2, as well as slight differences between residues leading into the flexible foot loops. The conformation of the backbone in helix 2 is similar in both structures (RMS deviation of 0.27 Å for Cα over residues 64-71), despite a low level of sequence conservation for this region. The differences in the position of helix 2 as seen in Figure 4-3C are due to a small ~15 degree rigid-body rotation of the helix around the C-terminal proline cap.
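The RMS deviations quoted above are root mean square distances between equivalent Cα atoms after superposition. A minimal sketch of that final step is given below; it assumes the two coordinate sets have already been superposed and matched residue by residue by a separate program, and the coordinates in the example are hypothetical values chosen only to exercise the function.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched coordinate lists.

    Each list holds (x, y, z) tuples in angstroms for equivalent atoms,
    assumed to be already superposed in a common frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: every atom displaced by 1 A along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))
```

Restricting the input lists to a subset of residues, such as those of helix 2, gives a local deviation of the kind quoted for residues 64-71.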

7 CLIC4 numbering.

Figure 4-3 CLIC4(ext) structure. A: Cartoon representation of the crystal structure of CLIC4(ext). B: the structure of CLIC1 in the same orientation as CLIC4 in panel A. C: a stereogram showing a superposition of the Cα backbone traces of CLIC4 (green) and CLIC1 (mauve). Figure made using the programs MOLSCRIPT1 and RASTER3D2.

The foot loop in both CLIC1 and CLIC4(ext) structures appears to hinge at two residues that are conserved in all vertebrate CLIC sequences: Pro-158 and Arg-176. The side chain guanidinium group of Arg-176 forms a hydrogen-bonding network with backbone carbonyl oxygen groups from both sides of the foot loop (Figure 4-4A). An identical structure is observed for CLIC1.

The putative internal nuclear localisation sequence (NLS) of CLIC4 proposed by Suh et al.45 (residues 199 to 206: KVVAKKYR) is located at the C-terminus of helix 6 (Figure 4-4B). Three basic residues, Lys-199, Lys-203 and Lys-204, form the solvent exposed face of helix 6 near its C-terminus, while Arg-206 is on the top of the molecule, in the loop leaving helix 6, pointing in the opposite direction to the other basic residues (Figure 4-4B). The sequence of this putative NLS is identical in CLICs 2, 4, 5 and 6, while in CLIC1 the single exception is that the residue equivalent to Lys-199 in CLIC4 is Gln-188 in CLIC1. The structure of this putative NLS region in CLIC1 and CLIC4(ext) is nearly identical.

Figure 4-4 Features of the CLIC4(ext) structure (overleaf). A: Arg-176 locks the two ends of the foot loop into place via a network of hydrogen bonds centred on its side chain guanidinium group. B: The NLS of CLIC4 situated at the C-terminus of helix 6 and the subsequent loop. C: Many side chains in CLIC4 and CLIC1 adopt equivalent rotamers. Here Trp-216 (CLIC4) and His-208 (CLIC1) each stabilise the loop connecting helices 6 and 7 by forming equivalent hydrogen bonds to backbone carbonyl groups. D: An overlay of the loop connecting helix 2 to β-strand 3 from CLIC4 (green and red) and CLIC1 (atomic colours). All parts of this figure are in stereo. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

4.4 Discussion

4.4.1 Comparison of CLIC4(ext) and CLIC1 structures

Due to an initial cloning error some of our work on CLIC4 was based around the mutant CLIC4(ext), in which the last two amino acids are replaced by a 16 amino acid peptide coded for by extraneous sequence of the expression vector. The biophysical characterisation experiments have been performed with both the mutant and wild type CLIC4 constructs and show no salient differences. The structure solved, however, is that of CLIC4(ext), as crystals of CLIC4(wt) could not be obtained. As the extension is at the extreme C-terminus of the protein and at the boundary of the molecule, the effect this mutation has on the overall structure is likely to be minimal.

Human CLIC1 and CLIC4 share 67% sequence identity (79% similarity) and thus unsurprisingly display a high degree of structural similarity. The backbone structures of the two CLICs overlay well except for a small rigid-body movement of helix 2 and slight differences in the flexible foot loops.

As well as similar backbone structures, for the most part chemically similar side chains also adopt the equivalent rotamers in both structures. For example, Trp-218 on helix 7 is highly conserved in vertebrate CLICs with the exception of CLIC1, where it is replaced by a histidine and CLIC3, where it is replaced by either arginine or lysine (see Insert 1). In the structural overlay of this region (see Figure 4-4C), Trp-218 (CLIC4) and His-208 (CLIC1) stabilise the loop connecting helices 6 and 7 by forming equivalent side chain hydrogen bonds to the two carbonyl groups straddling Pro-211. This proline is highly conserved in all CLICs (including invertebrate CLIC-like proteins) except CLIC5 from X. laevis and CLIC3 from the puffer fish F. rubripes where this residue is a serine and CLIC3 from T. nigroviridis where the residue is a cysteine.

The rigid-body rotation of helix 2 between the two structures appears to be due to sequence differences in the loop regions connecting helix 2 to the body of the molecule. In CLIC4 the loop between helix 2 and β-strand 3 possesses the sequence 71-PGTHPP-76 while the equivalent region in CLIC1 has the sequence 60-PGGQLP-65. The last proline in both these sequences adopts the cis conformation at the N-terminus of β-strand 3. This proline is a conserved feature of most proteins containing the thioredoxin fold and is thought to be important for maintaining the stability of the βαβ and ββα subdomains by stabilising β-strand 3 188. In CLIC4 this cis-proline is preceded by a second proline, as opposed to a leucine in CLIC1. This additional proline is presumably a large part of the reason that this loop conformation differs in the CLIC4(ext) structure. Within the CLIC family, CLICs 2, 4, 5 and 6 contain similar sequences within this loop and are thus likely to adopt a conformation similar to CLIC4, while the sequence of CLIC3 within this region suggests a structure closer to that of CLIC1.
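The loop comparison above can be made explicit with a position-by-position check of the two hexapeptides. The sketch below assumes an ungapped, one-to-one alignment of the two loop sequences given in the text; the function names are illustrative and the residue numbering follows CLIC4 (starting at residue 71).

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differing_positions(seq_a, seq_b, start):
    """List (residue number, residue in a, residue in b) where the sequences differ."""
    return [
        (start + i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


clic4_loop = "PGTHPP"  # CLIC4 residues 71-76
clic1_loop = "PGGQLP"  # CLIC1 residues 60-65

print(percent_identity(clic4_loop, clic1_loop))         # 50.0
print(differing_positions(clic4_loop, clic1_loop, 71))  # [(73, 'T', 'G'), (74, 'H', 'Q'), (75, 'P', 'L')]
```

The last difference listed, Pro-75 in CLIC4 against leucine in CLIC1, is the residue immediately preceding the conserved cis-proline.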

In the glutathione complexes of most GSTs the peptide backbone of the residue preceding the cis-proline at the N-terminus of β-strand 3 forms a β-sheet-like interaction with the GSH cysteinyl moiety188. The CLIC proteins containing a CLIC4-like sequence within this loop region all possess a second proline residue prior to this cis-proline. As such they will be unable to form a hydrogen bond between the amide group of this residue and the carbonyl group of the GSH cysteinyl moiety. Despite containing a leucine residue in this position the CLIC1:GSH complex also fails to display such an interaction3. The implication this has for glutathione binding in the CLIC G-site is further discussed in relation to the dmCLIC structure presented in Chapter 6 in Section 6.4.4.

The GSTs also contain a conserved motif at the N-terminus of helix 3 that forms part of the GSH-binding site. This motif (the GST motif I) usually has the sequence SNAIL/TRAIL, where the isoleucine residue is the most highly conserved (see Insert 1). The serine or threonine residue within this sequence is the N-terminal residue of helix 3, the hydroxyl group and amide backbone of which form hydrogen bonds with the two oxygen atoms of the α-carboxyl group of the γ-glutamyl GSH moiety. This SNAIL/TRAIL motif is not present in the CLICs although the conserved isoleucine residue (Ile-80 in CLIC1 and Ile-91 in CLIC4) is present in all family members. Additionally, like the GST TRAIL sequences, CLIC1 and CLIC3 both contain an N-terminal threonine residue that is seen to bind GSH in the CLIC1:GSH complex (pdb code 1K0N)3. However, the equivalent residue in the remaining vertebrate family members is either a valine (CLICs 4-6) or phenylalanine (CLIC2), both of which lack a hydroxyl group with which to bind GSH. In the CLIC4(ext) structure this valine adopts an equivalent rotamer to Thr-77 in the CLIC1 structure, thus while a charged hydrogen bond between the CLIC4 valine nitrogen backbone and the GSH α-carboxyl is possible, unless two such interactions are made, the CLIC4 G-site appears to lack a direct protein interaction with the second α-carboxyl oxygen.

This is reminiscent of the bacterial beta class GSTs, which contain a glycine residue instead of a serine or threonine in the N1 position of helix 3 within the sequence GVAIV. In the two E. coli GST structures complexed with glutathione sulphonate (GTS) (1A0F and 1N2A, at 2.1 Å and 1.9 Å resolution respectively)189,190 one of the GTS α-carboxyl oxygens of the γ-glutamyl moiety forms the usual hydrogen bond with the nitrogen group of the amide backbone in the N1 glycine, while the second α-carboxyl oxygen interacts with an asparagine residue located within helix 4 of the other GST subunit as well as a water molecule also coordinated by the backbone nitrogen of the N2 valine and a tyrosine residue from helix 6. In the lower resolution 2.7 Å structure of the beta class GST from P. mirabilis complexed with glutathione89, both the asparagine and tyrosine residues seen in the E. coli structures are also present within the active site; however, it is unclear if they are involved in glutathione binding. This may be due either to slightly different binding modes of GSH and GTS in the beta class GSTs, or to the lower resolution of this structure. Within the CLIC4(ext) crystal structure the putative G-site lacks many of the elements normally associated with GSH binding in the GSTs. If the CLICs do form GST-like dimers, like the beta class GSTs, residues from both subunits may facilitate binding to the α-carboxyl group of the GSH γ-glutamyl moiety thus replacing some of the interactions that appear to be missing in the monomeric CLICs.

4.5 Conclusions

In conclusion, the structures of CLIC4(ext) and CLIC1 are almost identical except for slight differences centred around helix 2 and the residues leading into the flexible foot loop. As discussed for CLIC1 in Chapter 2 the foot loop is likely to be flexible in solution, thus the differences observed here may only reflect different crystal packing interactions and are unlikely to be significant. In contrast the position of helix 2 appears to be determined by the structure of the loop region connecting it to β-strand 3. Sequence alignments suggest that the majority of the CLIC family, including CLICs 2, 5 and 6, share the structure adopted by CLIC4 within this region. The structure and sequence of this loop region leading into β-strand 3 may affect GSH binding and will be discussed in further detail in Section 6.4.4.

Chapter 5 Crystal structure of oxidised CLIC1

5.1 Introduction

In this chapter the crystal structure of a soluble dimeric form of CLIC1 is presented. In Section 5.1.1 it is shown that CLIC1 dimerises in vitro in response to oxidation; concomitant with dimerisation is the formation of an intramolecular disulphide bond. Section 5.3.1 presents the crystal structure of this oxidised dimeric form of CLIC1, which is then analysed in detail in Section 5.3.2. The oxidative dimerisation of CLIC1 results in the refolding of the N-terminal thioredoxin domain, the new structure of which forms the dimer interface. Intramolecular disulphide bond formation is seen to be a necessary requirement for this dimerisation process (Section 5.3.4). In Section 5.3.5 it is shown that in vitro channel formation of CLIC1 is prevented by either a reducing environment or absence of the cysteines involved in this intramolecular bond. These facts both suggest that the structure adopted upon oxidative dimerisation may be related to ion channel formation. However, one of the cysteines involved in this intramolecular disulphide bond is unique to mammalian CLIC1s, so generalisations to the CLIC family as a whole cannot be made. Section 5.4 discusses common themes between the CLIC1 redox change and redox changes that have been observed in other proteins.

5.1.1 Reversible redox-induced dimerisation of CLIC1

Most proteins known to adopt the canonical GST fold exist as dimers. However, the CLIC proteins are predominantly monomeric in solution3,36. Although the monomeric nature of the GST-like CLIC proteins is unusual, a few studies have previously reported the existence of monomeric GST fold family members (see Figure 1-5). These include: purified rat liver GST Z1-1, which appeared to elute as a monomer in gel filtration chromatography191; an unidentified GST from maize (Z. mays)192; two lambda class GSTs and two DHARs from A. thaliana87; as well as some bacterial glutaredoxins88 and stringent starvation proteins91.

In some of our initial purification trials of CLIC1 a small proportion of protein (~ 1%) occasionally eluted from size exclusion chromatography at a molecular weight of ~ 60 kDa, suggesting the possibility of a CLIC1 dimer. As the CLIC proteins contain three highly conserved cysteine residues and a glutaredoxin-like active site, Dr Douglas Fairlie working in the laboratory of Dr Samuel Breit tested whether dimerisation was effected by reactive oxygen species.

The results of these experiments as reported in Littler et al.52 are shown in Figure 5-1. On a Superose 12 (HR 10/30) sizing column (Amersham Biosciences), reduced CLIC1 elutes as a monomeric species at approximately 30 kDa (Figure 5-1A). Addition of H2O2 resulted in the appearance of a CLIC1 dimer comprising approximately two-thirds of the total protein (Figure 5-1B). The proportion of dimer formed is unchanged between 5 min. and 6 h. of incubation with H2O2. Dimerisation is completely reversed by reduction with 50 mM DTT (Figure 5-1C). Electrophoresis showed that the observed dimer is not primarily due to an interchain disulphide bond (Figure 5-1D), although a minor amount of such material is always present in oxidised CLIC1 samples (Figure 5-1D, non-reducing SDS-PAGE).

Figure 5-1 Biochemical analysis of the oxidised CLIC1 dimer. Gel filtration chromatography of CLIC1 using a Superose 12 (HR 10/30) column is shown A: before oxidation, B: after oxidation by 2 mM H2O2 for 1 hour at room temperature and C: after subsequent reduction by 50 mM DTT for 1 hour at room temperature. The ordinates give the absorbance at 280 nm. D: electrophoretic analysis of CLIC1 where lane 1 shows untreated CLIC1, and lanes 2 and 3 show the monomer and dimer peak fractions after oxidation. The left panels show 10% native gels, the right panels show 15% SDS-PAGE gels run under both non-reducing (upper panels) and reducing (lower panels) conditions. Figure based on figure 1 of Littler et al.52.

In an attempt to elucidate the effects of oxidation on CLIC1, the dimeric oxidised form was purified and its X-ray crystal structure solved. The in vitro channel activity of the oxidised dimer was assayed, and finally, some mutations affecting the dimerisation process were characterised. The results of these experiments are discussed in this chapter.

5.2 Materials and Methods

5.2.1 Expression constructs

The expression construct pCLIC1(wt) and related mutants containing the cDNA for human CLIC1 as a GST fusion protein in the expression vector pGEX-4T-1 were made in the laboratory of Dr Samuel Breit8 as described previously25. The vectors were transformed into the E. coli BL21(DE3) pLysS cell line.

5.2.2 Protein expression and purification

E. coli BL21(DE3) pLysS cells containing pCLIC1(wt) or a mutant construct were cultured overnight in LB with 100 μg/mL Amp, 34 μg/mL Cam. The culture was used to inoculate 1.2 L of 2YT media (100 μg/mL Amp, 34 μg/mL Cam) at a ratio of 1:50 overnight culture to fresh media, which was then grown at 37 °C. Expression of recombinant CLIC1 fusion protein was induced at an optical density of 1.0 cm⁻¹ at 600 nm with 1 mM IPTG. Induction proceeded for 4 hrs before harvesting the cells by centrifugation at 6000 rpm for 10 min. at 4 °C. Cells were then resuspended in 30 mL ice cold PBS containing 10 mM DTT before freezing at -80 °C until required.

Resuspended cells were thawed on ice and passed through a chilled French press twice at 110 MPa. Triton X-100 was added to the resulting homogenate to a final concentration of 2% v/v and incubated at room temperature for 30 min. with gentle agitation. The cell debris was then sedimented at 33,000 x g for 40 min. at 4 °C. The resulting supernatant was removed and incubated with 2 mL glutathione sepharose 4B beads (Amersham Biosciences) for 1 hr at room temperature, after which unbound protein was eluted from the column and the beads washed with ~ 400 mL PBS containing 0.5 mM DTT. The column was then equilibrated with 2 volumes of thrombin cleavage buffer (20 mM TRIS/HCl pH 8.4, 150 mM NaCl, 2.5 mM CaCl2, 1 mM NaN3 and 0.5 mM DTT). The bound fusion protein was cleaved from the glutathione beads by adding bovine plasma thrombin (Sigma-Aldrich) at a fusion protein:thrombin weight ratio of 50:1, and incubated at room temperature for 16 hrs.

8 Centre for Immunology, St. Vincent’s Hospital, and University of New South Wales, Sydney 2010, Australia.

After thrombin cleavage the CLIC1 protein was eluted from the column and DTT added to a final concentration of 10 mM. All further purification steps were carried out at 4 °C unless otherwise stated. The sample was then dialysed against column buffer (20 mM HEPES pH 7.0, 100 mM KCl, 1 mM DTT, 1 mM NaN3) before being concentrated to 2.5 mL and loaded on a Superdex 75 prep grade HiLoad 26/60 column (Amersham Biosciences). The protein was eluted with 300 mL of the sizing column buffer at a flow rate of 0.8 mL/min while collecting 5 mL fractions.

5.2.3 Oxidation and dimer formation

The oxidation of wild-type CLIC1 and the various mutants discussed within this chapter was performed using essentially the same protocol unless stated otherwise.

After the first sizing column, fractions corresponding to CLIC1 reduced monomer were pooled and concentrated before dialysing against phosphate buffered saline solution not containing DTT. To this sample H2O2 was added to a final concentration of 2 mM. The protein was incubated under these oxidising conditions for 5 min. at 18 °C before dialysing against 20 mM HEPES, 100 mM KCl, 1 mM NaN3 at pH 7.0 for 2 h. at 4 °C. The sample was then loaded on a Superdex 75 prep grade HiLoad 26/60 sizing column (Amersham Biosciences), pre-equilibrated in the same buffer, before eluting at 0.8 mL/min while collecting 5 mL fractions. The protein eluted as two peaks corresponding to dimeric (60 kDa) and monomeric (30 kDa) CLIC1. The dimeric fraction was passed over the sizing column a second time, and the resulting single 60 kDa peak concentrated, flash-frozen in liquid nitrogen, and stored at -80 °C.

5.2.4 Crystallisation

Crystals of dimeric CLIC1 were grown at 4 °C using the sitting drop vapour diffusion method. Protein sample at 7.6 mg/mL in sizing column buffer was centrifuged at 16,000 x g for 10 min. at 4 °C prior to setting crystallisation trials. 5 μL of protein sample was mixed with 5 μL of reservoir solution in a sitting bridge before sealing within a Linbro well containing 1 mL of reservoir solution. The dimeric CLIC1 crystals grew in reservoir solution consisting of 0.1 M CH3COONa pH 4.5 and 14-16% w/v polyethylene glycol monomethyl ether 5000 (Hampton Research). Crystals grew over a 2 day period to a size of ~ 200 μm x 200 μm x 300 μm, see Figure 5-2.

Figure 5-2 Dimeric CLIC1(wt) crystals. Crystals grown in 0.1 M CH3COONa pH 4.5, 14% w/v PEG 4k.

5.2.5 Data collection

Crystals of dimeric CLIC1 were slowly brought into a cryoprotecting solution via a serial transfer performed at 4 °C. This was done with two solutions: the first containing reservoir solution plus 5% v/v polyethylene glycol 400 (Hampton Research) and 150 mg/mL D-glucose (ICN Biomedicals), the second reservoir solution plus 10% v/v polyethylene glycol 400 and 300 mg/mL D-glucose. The transfer consisted of adding 3 μL of the first solution to the protein drop and waiting several minutes before adding 3 μL of the second solution. After several more minutes the crystal was mounted in a cryo loop (Hampton Research) and washed directly in solution two before flash-freezing in liquid nitrogen and mounting at 100 K in a nitrogen cryostream (Oxford Cryosystems). Diffraction data were thus obtained on a DIP2030 imaging plate mounted on a Nonius rotating anode generator using Cu Kα radiation and focusing mirrors.

The crystals diffracted to 1.8 Å resolution in the space group P2₁2₁2₁ (a = 59.91 Å, b = 69.23 Å, c = 107.51 Å). There are two molecules within the asymmetric unit with a Vm of 2.76 Å³/Da and a solvent content of 55.4%. Data were processed using the programs MOSFLM182 and SCALA183 (see Table 5-1 for data reduction statistics). This crystal diffracted in an anisotropic manner; thus during integration the outer shell was restricted to 1.8 Å despite an I/σ of 2.8.

Figure 5-3 Diffraction image from a dimeric CLIC1 crystal. Collected on a DIP2030 image plate (Δϕ = 1°, distance = 120 mm). Resolution is 1.66 Å at the edge of the plate.

Table 5-1 Data reduction and refinement statistics for oxidised CLIC1.

Reflections (unique)               163,232 (40,808)
Completeness (1.86-1.8 Å shell)    95.3% (85.1%)
I/σ (1.86-1.8 Å shell)             10.7 (2.8)
Rmerge (1.86-1.8 Å shell)          0.058 (0.26)
Protein (water) atoms              3346 (268)
R factor (Rfree)                   0.196 (0.228)
r.m.s.d. bond lengths¹             0.011 Å
r.m.s.d. bond angles¹              1.44°
Ramachandran plot²
  Most favoured region             94.1%
  Additionally allowed             5.6%
  Generously allowed               0%
  Disallowed                       0.3% (Ala-125, A subunit)

¹ From REFMAC 5 184
² From PROCHECK182
r.m.s.d., root mean square deviation from standard bond lengths and angles determined by high resolution small molecule crystal structures184.

5.2.6 Structure determination and refinement

The CLIC1 reduced monomer structure (1K0M)3 was used as a molecular replacement probe using the CCP4183 program AMoRe185. An initial phasing model consisting of two C-terminal domains (residues 92-241) was built using the program wARP186 for phase refinement. The resulting map was clear, and a model of residues 23-92 was built using the program O187. This was refined using maximum likelihood methods (using the program REFMAC 5 184). The final model consists of residues 23-234 in the A subunit and residues 23-154 and 158-234 in the B subunit plus 268 water molecules. In both the A and B subunits Pro-91 adopts a cis peptide bond. The final R-factor is 0.196 with R-free at 0.228. Water molecules were added using a Babinet model with mask184; the mask parameters used were a vdw probe radius of 1.40 Å, an ion probe radius of 0.80 Å and a shrinkage radius of 0.80 Å. Structures were validated using PROCHECK182 with default parameters. The data reduction and refinement statistics are summarised in Table 5-1. The structure is deposited within the protein databank with pdb code 1RK4.

5.2.7 Ramachandran distance

The Ramachandran plots of CLIC1 in the monomeric and dimeric states were compared. A Ramachandran distance (degrees), D, was computed for residues 24-234 as:

$$D = \sqrt{(\varphi_d - \varphi_m)^2 + (\psi_d - \psi_m)^2}$$

where subscripts d and m refer to the dimer and monomer structure.
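The Ramachandran distance defined above can be sketched in a few lines. The function below implements the formula as written; the optional `wrap` argument, which maps each angular difference into the range -180° to 180° before squaring, is an assumption added here for illustration and is not stated in the text. The torsion angles in the example are hypothetical, chosen to be typical of β-strand and α-helical conformations.

```python
import math


def ramachandran_distance(phi_d, psi_d, phi_m, psi_m, wrap=False):
    """Distance in degrees between two points in (phi, psi) space.

    Subscripts d and m refer to the dimer and monomer structures.
    With wrap=True each difference is first reduced to [-180, 180);
    this periodic treatment is an assumption, not part of the
    definition given in the text.
    """
    dphi = phi_d - phi_m
    dpsi = psi_d - psi_m
    if wrap:
        dphi = (dphi + 180.0) % 360.0 - 180.0
        dpsi = (dpsi + 180.0) % 360.0 - 180.0
    return math.hypot(dphi, dpsi)


# Hypothetical residue moving from a beta-like conformation in the
# monomer (-120, 130) to a helix-like conformation in the dimer (-60, -45).
print(ramachandran_distance(-60.0, -45.0, -120.0, 130.0))  # 185.0
```

Evaluating this for each residue common to both models gives a per-residue profile of the kind plotted against sequence position.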

5.3 Results

5.3.1 Structure of the oxidised dimer of CLIC1

The structure of the oxidised dimeric form of CLIC1 was determined at 1.8 Å resolution (Figure 5-4A, Table 5-1). The crystal contains one dimer per asymmetric unit, with the two subunits nearly identical (RMS deviation of 0.34 Å excluding residues 147-164). The dimer is ~ 75 Å long, 25 Å wide and tapers in height from 45 Å at each end to 20 Å in the middle (Figure 5-4D). The structure is all helical, and the dimer interface occurs between the two newly configured N-terminal domains. The arrangement of the subunits bears no relationship to that seen in GST dimers6.

Figure 5-4 Structure of the oxidised CLIC1 dimer. A: stereo backbone of the CLIC1 dimer; the A subunit is shown in green, the B subunit in red. In the A subunit every tenth residue is labelled. B: electron density of the intramolecular disulphide bond between Cys-24 and Cys-59 contoured at 1σ. Cartoon representation of the CLIC1 dimer viewed C: along and D: perpendicular to the pseudo 2-fold axis. In the A subunit helices are shown in red, the B subunit in green, disulphide bonds are in yellow. Figures were made with MOLSCRIPT1, RASTER3D2 and CONSCRIPT193.


Figure 5-5 Transition between the monomeric and dimeric CLIC1 forms. Representations of A: the reduced monomeric form of CLIC1 and B: the A subunit of the oxidised dimeric form. C: backbone superposition of CLIC1 for the reduced monomeric (green) and the oxidised dimeric (purple) states. The figures were made with MOLSCRIPT1 and RASTER3D2.

5.3.2 Comparison of the CLIC1 monomer and dimer

The formation of the oxidised dimer has produced a dramatic structural change in the conformation of CLIC1 (Figure 5-5, A-C). The two C-domains (residues 92-241) show only minor alterations. In contrast, the N-domain has undergone a radical structural rearrangement, including the formation of an intramolecular disulphide bond between Cys-24 and Cys-59 (Figure 5-4, B and C). In the monomer the sulphydryl groups of these two residues are separated by 13.1 Å. Cys-24 is conserved in all CLICs, where it is at the centre of the glutathione-binding site in the monomeric form, whereas Cys-59 is unique to CLIC1, corresponding to a conserved alanine residue in all other human CLICs (see Insert 1).

The most apparent structural change between the monomer and the dimer is the disappearance of the β-sheet (Figure 5-5, A-C). In the dimer there is no electron density for the first 22 residues (β-strand 1 in the monomer), and helix 2 is extended by an extra 2 turns to include residues starting at Thr-44 (originally β-strand 2; Figure 5-9). Residues 60-77 between helices 2 and 3 (β-strands 3 and 4 of the monomer) form an extended loop loosely packed against the C-domain in the dimer (Figure 5-4D and Figure 5-5B). This loop covers a tunnel filled with ordered water molecules.

The position of helix 2 has altered significantly (44.2° rotation and 11.5 Å centre of mass translation, calculated using residues 53-58) in order to facilitate the involvement of Cys-59 in the disulphide bond and the formation of the dimer interface. The positions of helices 1 and 3 have remained unchanged relative to the C-domain, although both helices have been shortened from their N-termini to allow the formation of the disulphide bond and the loop packed against the C-domain (Figure 5-5B and Figure 5-9). This contrasts with the observation in the 1.4 Å structure of the reduced CLIC1 monomer, where in one of the subunits, helices 1 and 3 were observed in two conformations, which differed with respect to their orientation relative to the C-domain (see Section 2.1)3.
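A rigid-body movement of this kind is usually summarised by a rotation angle and a shift of the centre of mass. The sketch below shows one common way to obtain both numbers; it is not necessarily the procedure used for the values above. It assumes that a 3x3 rotation matrix relating the two helix positions has already been obtained from a separate superposition step, and it weights all atoms equally when computing the centre of mass. The matrix and coordinates in the example are hypothetical.

```python
import math


def rotation_angle_deg(rotation):
    """Rotation angle in degrees of a 3x3 rotation matrix, from its trace."""
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def centroid(coords):
    """Unweighted centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_shift(coords_a, coords_b):
    """Distance between the centroids of two coordinate sets."""
    return math.dist(centroid(coords_a), centroid(coords_b))


# Hypothetical example: a 90 degree rotation about the z axis.
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_angle_deg(rot_z_90))

# Hypothetical example: two atoms translated by (0, 3, 4) angstroms.
before = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
after = [(0.0, 3.0, 4.0), (2.0, 3.0, 4.0)]
print(centroid_shift(before, after))
```

Applied to the atoms of residues 53-58 in the two structures, after superposition on the unchanged C-domain, functions of this form would yield a rotation angle and centre of mass translation of the kind quoted.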

Figure 5-6 Surface complementarity of the dimer interface. The molecular surface of the CLIC1 dimer A subunit is shown, coloured by the Lawrence shape complementarity, S(x), of the interface194. A backbone representation of the B subunit is displayed in blue, residues involved in the aromatic stacking interactions between the subunits are labelled. The overall value for S(x) of 0.720 is comparable with oligomeric protein interfaces. The tightly packed aromatic region of the interface (yellow) corresponds to S(x) values close to unity, which encloses a loosely packed inner hydrophobic region with low S(x) values (pink). The figure was made using GRASP195.

The dimer interface buries ~2200 Å² of accessible surface (per dimer). It is predominantly hydrophobic, consisting of two flat sheets of 4 α-helices, where each sheet comprises helices 1, 3 and 8 from one subunit and helix 2 from the other subunit (Figure 5-4C). The order of the helices is 8, 1, 3 and 2 with the dipole moments of helices 8 and 2 antiparallel to those of helices 1 and 3. The interface is formed between these helix sheets, and although the two sheets are approximately parallel, they are rotated with respect to each other by around 60° (Figure 5-4C).

The interface can be broken into three roughly concentric areas where the majority of residues within each section are dominated by a particular class of amino acids (Figure 5-6 and Figure 5-7). The outermost section is lined with charged/polar residues contributed by helices 2 and 3, forming several intermolecular salt bridges. This region acts as a boundary between solvent and the dimer interface.

The next ring-like section consists of a tightly packed hydrophobic region dominated by two symmetry-related columns of stacked aromatic residues (Phe-83 from helix 3 slots into a hydrophobic pocket formed by Phe-26 and Phe-31 of helix 1 plus Val-55 and Leu-58 of helix 2; Figure 5-7).

Inside this tightly packed ring there is a central hydrophobic patch composed of Leu, Ile and Val residues contributed by helices 1 and 3. This patch is not entirely complementary across the dimer symmetry axis, resulting in an occluded cavity that is devoid of ordered water molecules (Figure 5-6 and Figure 5-7). Most of the hydrophobic residues at the interface are conserved within the vertebrate CLIC family members, except Phe-26 and Trp-35 (Ser and Leu, respectively, in CLIC3).

In order to compare the complementarity of the dimer interface in CLIC1 with other protein-protein interactions, the Lawrence shape complementarity, S(x), of the interface was calculated194. The CLIC1 dimer interface gives an overall value for S(x) of 0.720, indicating an interaction surface comparable with most standard oligomeric proteins. A plot of the S(x) value mapped onto the surface of the A subunit is shown in Figure 5-6. The tightly packed aromatic region of the interface can be seen in the ring of yellow corresponding to S(x) values close to unity, which encloses the loosely packed inner hydrophobic region with low S(x) values coloured pink.

In the CLIC1 monomer, helices 1 and 3 form the surface against which the β-sheet is packed, whereas in the dimer they are central to the dimer interface and form interactions with their symmetry related partners in the other subunit. Figure 5-8 shows a comparison of the environment of Phe-83, which is located within helix 3, in the monomeric (Figure 5-8A) and dimeric (Figure 5-8B) states. In the CLIC1 monomer, Phe-83, along with several Ile, Leu and Val residues all contributed by helix 3, forms a hydrophobic surface against which residues within β-strands 3 and 4 interact.

Figure 5-7 Stereo view of the dimer interface. At the top and bottom of the figure are the tightly packed columns of aromatic residues, whereas there is an aliphatic cavity in the centre. Helices are shown as per Figure 5-4C and D. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

Figure 5-8 Hydrophobic pocket formation at the dimer interface. A: The residues surrounding Phe-83 in the CLIC1 monomer. Residues involved are labelled. Helices are shown in red, β-strands in yellow. B: Residues forming the hydrophobic pocket into which Phe-83 from the B subunit binds at the dimer interface. Helices are shown as per Figure 5-4C and D. For clarity only those residues contributed by the B subunit are labelled (blue bonds), bonds shown in grey are the same as those from helix 3 labelled in panel A. In panel B, helix 3 from the A subunit is behind helix 2 from the B subunit. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

An equivalent surface exists on helix 1 that interacts with β-strands 1 and 2. The loss of the β-sheet on dimer formation exposes these hydrophobic surfaces; the interaction of these two surfaces between subunits forms the dimer interface.

Figure 5-9 ClustalW120 alignment of the CLIC family (shown overleaf). The secondary structure is shown for both monomeric (red, helices; yellow, β-strands) and dimeric (blue, helices) forms. Conserved regions are shaded: green, putative transmembrane regions; yellow, Cys; cream, Gly. Features unique to CLIC1 are in blue. Ramachandran distances (see Section 5.2.7) for the monomer to dimer transition are plotted above its sequence. Figures made with SETOR196.

To understand the structural transition, the Ramachandran maps of the two structures were compared. A Ramachandran distance was computed (see Section 5.2.7) and plotted as a function of residue number (see Figure 5-9). In the C-domain there are only two regions showing large Ramachandran distances: the loop joining helices 4b and 5 (residues 123-4), and the flexible foot loop (residues 146-165). In contrast, most of the N-domain has undergone significant shifts in Ramachandran space with the exception of the residues in helices 1, 2 and 3 that are common to both structures. The major changes in the N-domain are primarily due to the transition of residues from the β-sheet region of Ramachandran space into less constrained loop structures with α-helical characteristics (Figure 5-10C).
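As a rough illustration of the per-residue comparison described above, the following Python sketch computes an angular distance between two (φ, ψ) pairs. It assumes a Euclidean distance in φ-ψ space, with each angular difference wrapped into the range -180° to 180° to respect the periodicity of the map; the definition actually used in this work is given in Section 5.2.7 and may differ. The function names and the example angles are hypothetical.

```python
import math

def wrap_angle(delta):
    """Wrap an angular difference (degrees) into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def ramachandran_distance(phi_psi_a, phi_psi_b):
    """Assumed definition: Euclidean distance between two (phi, psi)
    points on the periodic Ramachandran map, in degrees."""
    d_phi = wrap_angle(phi_psi_b[0] - phi_psi_a[0])
    d_psi = wrap_angle(phi_psi_b[1] - phi_psi_a[1])
    return math.hypot(d_phi, d_psi)

# Hypothetical angles: a residue moving from the beta-sheet region
# to the alpha-helical region of the map.
beta_like = (-120.0, 120.0)
helix_like = (-60.0, -40.0)
print(round(ramachandran_distance(beta_like, helix_like), 1))
```

Applied residue by residue to two aligned structures, a function of this kind would give a profile like the one plotted against residue number in Figure 5-9.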

The Ramachandran distance plot can also be used to identify hinge regions that facilitate structural transitions. The Ramachandran distances are mapped onto backbone representations of CLIC1 monomer and dimer structures in Figure 5-10A and B. Residues not observed in the electron density of the dimer are coloured gold on the monomer backbone trace (Figure 5-10A). In the monomer structure most of the hinge regions reside in the N-domain, near the glutathione-binding site (pink regions, Figure 5-10A).

The first hinge occurs at Ser-27, which is at the N-terminus of helix 1 (see Figure 5-9 and Figure 5-10, A and B). The transition disrupts the first turn of the helix in the dimer. It flips the side chain of Phe-26 from the glutathione binding site (monomer structure) to the dimer interface. The side chain of Ser-27 mimics the backbone, capping helix 1 in the dimer structure.

The next peak in Ramachandran distance, although relatively small, occurs at Val-39 (see Figure 5-9). Val-39 follows Gly-38, a highly conserved residue in the canonical GST fold, which caps the C-terminus of helix 1 in both structures. This is a hinge point that allows the movement of helix 2 on dimer formation. Surprisingly, the residues that hinge at this point and form the loop connecting helices 1 and 2 in the dimer (β-strand 2 in the monomer) partially retain their β-sheet characteristics due to the formation of a binding cleft at the dimer interface. Residues 45-47 and 50-52 all show large changes in Ramachandran space as they become incorporated into helix 2 upon dimer formation.

Figure 5-10 Ramachandran distance plots. Ramachandran distances for residues 23-234 are mapped onto the backbone structure of the A: monomeric and B: dimeric forms. The colour gradient, from grey to pink represents Ramachandran distances from 0° to 180°. Residues not observed in the dimer are coloured gold. C: Ramachandran plot of residues within the N-domain with Ramachandran distances greater than 35° between the two structures. Monomer φ-ψ co-ordinates are plotted as orange squares, the dimer co-ordinates as black squares with a connecting line9. The figures were made with GRASP195.

9 The monomer to dimer transition of Asp-76 seen in panel C is incorrectly labelled as Thr-77 in the identical panel of figure 3F in Littler et al. (52).

In the N-domain, β-strands 3 and 4 have flipped away from helices 1 and 3 and now overlay the C-domain. The two hinge regions involved are located at the C-terminus of helix 2 and N-terminus of helix 3 (pink in Figure 5-10, B and C). Gln-63 (part of the cis-proline loop in the monomer) acts as the hinge for the transition of β-strand 3 (monomer) to form the extending arm of the loop (dimer), aided by the cis (monomer) to trans (dimer) isomerisation of Pro-65. Asn-78 forms the hinge for the transition of β-strand 4 (monomer) to the returning arm of the loop (dimer). In the monomer, Asn-78 is at position N3 in helix 3, where its side chain carbonyl group hydrogen bonds to Arg-29. On dimer formation, it undergoes a significant backbone rotation and becomes the N-terminal capping residue of the helix, with the side chain carbonyl group interacting with the backbone amides of positions N2 and N3. Its side chain amide nitrogen also coordinates a water molecule that hydrogen bonds to the hydroxyl group of Ser-27, the new N-terminal capping residue of helix 1. The original N-terminal cap of helix 3, Asp-76, aids the unwinding of the first two residues of the helix by forming a hydrogen bond to the side chain of Asn-23 and a salt bridge with Lys-183. The backbone carbonyl group of Asp-76 also replaces the interaction between Asn-78 and Arg-29. During this transformation Asp-76 moves from a generously allowed region of Ramachandran space in the monomer into a less strained conformation in the dimer (see Figure 5-10C). Interestingly, the equivalent residue in most GSTs also adopts a similar constrained conformation and is usually a glutamine or glutamic acid residue involved in glutathione binding6,49.

The large Ramachandran distances observed for the C-terminal domain correspond to a peptide flip in the loop connecting helices 4b and 5 (residues 123-4). This peptide is involved in a crystal contact with a symmetry-related molecule; both conformations are visible within the electron density. The second peak corresponds to the flexible foot loop (residues 146-165). This region of the protein extends away from the body of the molecule and its conformation is dominated by crystal packing interactions (the A and B subunits have different conformations); it is thus expected to be relatively flexible in solution.

5.3.3 The effects of single amino acid mutants on dimer formation

In order to ascertain the role disulphide bond formation plays in dimerisation, the CLIC1 mutants C24S and C59S were made, purified and oxidised as per the wild type protein. These cysteines form the intramolecular disulphide bond in the crystal structure of the wild-type oxidised dimer (Figure 5-4B). The mutants in which either of these cysteines is replaced by a serine are not seen to dimerise or alter their oligomeric state after oxidation as assayed by gel filtration chromatography (Figure 5-11). Thus the formation of the disulphide bond is a necessary requirement for the oxidative dimerisation of CLIC1.

Dr Douglas Fairlie has also mutated several conserved or charged residues in the N-terminal domain of CLIC1 in order to monitor the effect this has on the formation of the oxidised dimer (personal communication). The residues Lys-13, Asp-17, Lys-20, Arg-29, Lys-37 and Pro-91 were each individually mutated to alanine and the oligomeric state of these point mutants assessed before and after oxidation (Figure 5-12).

Most of these residues are highly conserved and lie in proximity to the putative N-terminal transmembrane region. Lys-13 and Asp-17 are conserved in all CLICs; Arg-29 and Lys-37 in all vertebrate CLICs; Pro-91 in all vertebrate CLICs except mammalian CLIC5, where it is a leucine; and Lys-20 is unique to mammalian CLIC1s. The positions of these residues in the CLIC1 monomer structure are highlighted in Figure 5-12I.

Figure 5-11 Cys-24 and Cys-59 are essential for dimer formation. Gel filtration chromatography of CLIC1 mutant proteins using a Superdex 75 prep grade HiLoad 26/60 column (Amersham Biosciences). The series of chromatograms show wild-type (wt) CLIC1 and the C24S and C59S mutants under reducing conditions and after oxidation via the addition of 2 mM H2O2. The ordinates give the absorbance at 280 nm in arbitrary units.

The K13A, K20A and R29A mutants showed similar gel-filtration chromatography profiles to wild-type CLIC1 after oxidation (Figure 5-12B, D and E respectively), although a small portion of the K13A mutant ran as a dimer under reducing conditions. Similarly, CLIC1(D17A) also showed a small amount of dimer under reducing conditions and upon oxidation was almost exclusively converted to dimer (Figure 5-12C). Oxidation of the K37A and P91A mutants led to the formation of a mixture of indistinct oligomeric complexes running close to the column void volume (Figure 5-12F and G).

Figure 5-12 Oxidation of several CLIC1 single amino acid mutants. A to H: Gel filtration chromatography of CLIC1 mutants using a Superose 12 (HR 10/30) column. Left panel: before oxidation; right panel: after oxidation by 2 mM H2O2 for 5 min at room temperature. The ordinates give the absorbance at 280 nm. Peaks are numbered according to the elution positions of 1) wild type monomer, 2) wild type dimer, 3) higher MW species and 4) the column's void volume. I: A cartoon representation of the CLIC1 monomer highlighting the position of the mutated residues within its structure. Panel made using the programs MOLSCRIPT1 and RASTER3D2.

As seen in Figure 5-12I, Lys-13 and Lys-20 are solvent exposed in the CLIC1 monomer and, while not observed at all in the electron density of the dimer, are presumably also solvent exposed in this structure. If these residues do retain relatively similar microenvironments in both states this would presumably explain why their mutation has little effect on dimerisation.

Figure 5-13 The CLIC1 interdomain loop. Cartoon diagram of the loop connecting the N-terminal and C-terminal domains in the CLIC1 monomer (pdb code 1K0M3). The backbone of residues Leu-88 to Leu-96 is shown as partially transparent. The residues involved in stabilising the conformation of the loop are shown in bold and labelled. Hydrogen bonds are shown as dashed purple lines; the salt bridge between Lys-37 and Glu-85 as a dashed green line. Figure made using the programs MOLSCRIPT1 and RASTER3D2.

In the structure of the CLIC1 reduced monomer, Asp-17 stabilises the glycine-rich loop between β-strand 1 and α-helix 1 by hydrogen bonding to residues within the loop. The glycine-rich loop of the monomer becomes unstructured in the dimer, so the greater affinity of the D17A mutant for the dimeric state may be explained by the mutation promoting a dimer-like conformation within this region.

Arg-29 occupies a topologically equivalent position between the N- and C-domains in both the monomer and the dimer. However, the residues with which it forms bonds differ between the two states, so it is surprising that its mutation has little effect on CLIC1 dimerisation.

Lys-37 is at the extreme C-terminus of α-helix 1 in both the monomer and dimer structures where it stabilises the conformation of the domain-connecting loop by forming a salt bridge and several hydrogen bonds (see Figure 5-13). Pro-91, as mentioned previously (see Section 2.1), is also important for defining the conformation of this loop region as it adopts the cis conformer. Why the mutation of these residues leads to the formation of higher MW species on oxidation is unclear.

5.3.4 The presence of Cys-59 is necessary but not sufficient for dimer formation

CLIC4 remains monomeric after oxidation using the same conditions as those used to produce the CLIC1 dimer. As noted in Section 5.3.3, Cys-59 is essential for the oxidative dimerisation of CLIC1, yet this residue is a conserved Ala throughout the rest of the CLIC family (see Insert 1). It was hypothesised that other CLICs might be capable of adopting a similar transient structure to that seen in the CLIC1 dimer but perhaps were not offered the stability effected by the formation of a disulphide bond. To determine if this was the case for CLIC4, the alanine residue corresponding to Cys-59 in CLIC1 was mutated to a cysteine (CLIC4(A70C) mutant).

Figure 5-14 Oxidation of CLIC1(T52P) and CLIC4(A70C) mutants. Size exclusion chromatography elution profiles for CLIC1 amino FLAG (AF) T52P mutant under reducing conditions (black) and after oxidation (red). CLIC4 A70C mutant under reducing conditions (green) and after oxidation (blue).

Like wild-type CLIC4, both the reduced and the H2O2-treated CLIC4(A70C) mutant elute as monomeric species on gel-filtration chromatography (see Figure 5-14). This indicates that Cys-59, while necessary for CLIC1 dimerisation, is not the only difference between CLIC1 and CLIC4 that determines whether oxidative dimerisation occurs.

In helix 2 there is another salient difference between CLIC1 and the other human CLIC family members. In CLIC1, helix 2 has a proline C-terminal capping residue and a threonine N-cap residue, whereas all other CLIC family members have prolines at both the N-cap and C-cap positions of helix 2. Due to the failure of the CLIC4(A70C) mutant to dimerise under oxidising conditions, it was hypothesised that the presence of this proline N-cap may inhibit the N-terminal extension of helix 2 during dimerisation. To test this hypothesis a proline N-capping residue was introduced into CLIC1 (the T52P mutant). This mutation was not seen to affect dimerisation (see Figure 5-14).

5.3.5 Reduction of CLIC1 prevents channel formation

Dr Raffaella Tonini and Prof. Michele Mazzanti10 have performed preliminary tests on the ability of the CLIC1 dimer to form Cl- channels using tip dip electrophysiology52. These experiments showed that both the reduced monomer and the oxidised dimeric forms of CLIC1 were able to form ion channels in artificial bilayers (Figure 5-15) that are indistinguishable from the channels seen in inside-out patches from Chinese hamster ovary cells transfected with CLIC166.

The channel characteristics for the monomer, dimer, and inside-out patches from transfected CHO cells11 are as follows: conductance, 28 ± 9, 28 ± 9 and 30 ± 2 pS (Figure 5-15D); mean open time, 28.8, 30.4 and 26.9 ms (Figure 5-15B); open probability, 0.53 ± 0.03, 0.50 ± 0.06 and 0.53 ± 0.03 (Figure 5-15D), respectively. Tip dip electrophysiology experiments were repeated in the presence of DTT. The probability of observing channel activity was reduced for both monomer and dimer in the presence of 5 mM DTT (Figure 5-15E). Presumably the presence of DTT converts CLIC1 to the monomer form, which is then non-functional under reducing conditions. As a corollary of this, the channel activity observed for CLIC1 monomer may be due to the effect of oxidation occurring in the absence of reducing agents.

10 Department of Cellular and Developmental Biology, University of Rome “La Sapienza”, 00185 Rome, Italy.

Figure 5-15 Electrophysiological characterisation of the CLIC1 monomer and oxidised dimer. A: current traces from artificial bilayers, all recorded at the same potential in the pipette of +30 mV. The top trace is a control to which no protein has been added. A similar lack of channel activity was observed when either CLIC1 monomer or oxidised CLIC1 dimer were added in the presence of 5 mM DTT. The lower panels show typical traces (two examples each) obtained when either CLIC1 monomer or oxidised CLIC1 dimer are added under non-reducing conditions. B: the mean open time of the two preparations (dimer on the left and monomer on the right) obtained from 3 s of data each, showing similar values to those obtained previously11. C: the current amplitude histograms showing the distribution of closed (0) and open (1) states obtained from 3 s of data for the dimer (left panel) and monomer (right panel). D: the i/V curves (left panel), in which the peak amplitude histogram current values at different membrane potentials are plotted for the dimer (squares) and for the monomer (triangles). The combined data can be fit to a single channel conductance of 28 ± 9 picosiemens. The right panel shows the open probability plot (squares for dimer and triangles for monomer), which is comparable with previous analysis11. E: percentage of electrophysiological experiments (tip dip patches) where CLIC1 channel activity was observed after the addition of CLIC1 monomer or dimer ± 5 mM DTT. Three different protein preparations of monomer and of dimer were used to test the effect of DTT. A null result indicates that no chloride ion channels were observed within 20 min of protein addition to a patch with a gigaohm seal. Error bars represent 1 S.D., computed by assuming Poisson statistics (σ = √n). Numbers in parentheses indicate total number of trials for each condition.
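The counting-statistics error bars of panel E can be illustrated with a minimal Python sketch. It assumes that n is the number of trials in which channel activity was observed and that the error is scaled to a percentage of the total number of trials; neither assumption is spelled out in the caption. The function name is hypothetical, and the counts are the non-reducing values listed in Table 5-2.

```python
import math

def percent_active_with_poisson_error(n_active, n_trials):
    """Return (percentage of trials showing channels, 1 S.D. error),
    assuming sigma = sqrt(n_active) on the raw count."""
    percent = 100.0 * n_active / n_trials
    sigma = 100.0 * math.sqrt(n_active) / n_trials
    return percent, sigma

# Counts taken from Table 5-2 (channels observed / total trials).
for label, n_active, n_trials in [("monomer", 38, 56), ("dimer", 52, 72)]:
    pct, err = percent_active_with_poisson_error(n_active, n_trials)
    print(f"{label}: {pct:.1f} +/- {err:.1f} %")
```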

Dr Louise Brown11 has measured the ability of CLIC1 to elicit bulk chloride efflux from artificial liposomes52. In these experiments addition of oxidised dimeric CLIC1 resulted in a larger chloride flux compared to an equivalent concentration of CLIC1 monomer (Figure 5-16 and Table 5-2).

Figure 5-16 Chloride efflux from liposomes exposed to monomer or dimer. Valinomycin-dependent chloride efflux from liposomes: black line, no protein control; red line, CLIC1 monomer; green line, dimer. After 120 s, 0.1% Triton X-100 was added to release remaining intravesicular chloride.

To investigate whether disulphide bond formation is required for channel activity, the efflux rates and the probability of forming channels in tip-dip experiments were measured for the C24S and C59S CLIC1 mutants52. As a control, the behaviour of the non-conserved cysteine mutant C89S, which is capable of oxidative dimerisation (data not shown), was also analysed. Both efflux and tip-dip experiments indicate that under the conditions tested the C24S and C59S mutants are inactive as channels, whereas C89S is indistinguishable from wild-type CLIC1 monomer (see Table 5-2).

One cautionary note as to the interpretation of these results should briefly be made. In the single channel recordings presented here, and to a smaller extent even the efflux assays, only a tiny amount of protein is required for activity. The vast majority of the dimeric protein used to elicit these channel recordings is assuredly of the form seen in the crystal structure. However, even if only a fraction of a percent of the total protein added is not of this form, and instead adopts an intermolecular disulphide bonded state that co-purifies with the predominant oxidised dimer, this hypothetical alternate form of CLIC1 may be present in sufficient quantity to account for channel activity.

11 Initiative for Biomolecular Structure, School of Physics, University of New South Wales, Sydney, New South Wales 2052, Australia.

Table 5-2 Chloride channel activity of CLIC1.

Protein            Efflux1             Bilayer electrophysiology2
No protein         3.7 ± 1.3% (25)
Reduced monomer    7.5 ± 1.5% (7)      38/56
Oxidised dimer     14.0 ± 2.7% (6)     52/72
C24S               3.8 ± 2.0% (5)      0/10 (0.001%)
C59S               3.4 ± 1.6% (5)      0/6 (0.1%)
C89S               6.7 ± 0.7% (6)      5/5 (14.5%)

1 Chloride efflux is measured as a percentage of chloride released from 400-nm unilamellar liposomes that occurs within 120 s after the addition of valinomycin (±S.D.). Numbers in parentheses refer to the number of trials. Each set of experiments was done with protein from at least two independent preparations.

2 The first numeral indicates the number of tip dip bilayer experiments where chloride ion channels were observed within 20 min of protein addition, and the second numeral is the total number of trials. Where no channels were observed within the experimental time, a second sample of native CLIC1 monomer was added as a positive control to ensure the integrity of the system. For the mutant proteins the number in parentheses indicates the probability (as a percentage) that the result observed would be obtained if the protein had the same properties as the wild type. The calculated probabilities assume a binomial distribution.
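The probabilities quoted in parentheses for the mutants can be approximated with a short binomial calculation. The Python sketch below assumes that the wild-type success rate is taken from the reduced monomer (38 of 56 trials); this choice of reference rate and the function name are assumptions, as the table footnote does not state which wild-type value was used.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

# Assumed reference rate: reduced wild-type monomer, 38 of 56 trials.
p_wild_type = 38 / 56

# (mutant, trials with channels, total trials) from Table 5-2.
for name, k, n in [("C24S", 0, 10), ("C59S", 0, 6), ("C89S", 5, 5)]:
    prob = binomial_probability(k, n, p_wild_type)
    print(f"{name}: {100 * prob:.3f} %")
```

Under this assumption the calculated values come out at roughly 0.001%, 0.1% and 14%, in line with the tabulated figures.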

5.4 Discussion

5.4.1 The CLIC1 dimer

In vitro oxidation of CLIC1 has led to the discovery of a new, stable, soluble form of CLIC1, which reverts to the monomeric state upon reduction. Oxidation results in the formation of an intramolecular disulphide bond between Cys-24 and Cys-59. Concomitant with disulphide bond formation, the protein undergoes a major structural change, which exposes a large hydrophobic surface. This surface, which interacts with hydrophobic residues contributed by the β-sheet in the monomer, instead forms the dimer interface in the oxidised dimer.

The loss of this β-sheet on dimerisation results in a restructuring of the N-terminal thioredoxin-like domain of CLIC1. Residues forming the G-site in the CLIC1 monomer become widely dispersed in the dimeric structure, which is therefore unlikely to bind GSH in a similar manner to the GSTs, if at all.

This structural editing of the N-terminal domain upon dimer formation demonstrates the ability of CLIC1 to adopt multiple stable conformations. Such conformational plasticity is an obvious requirement for the CLIC proteins if they are to attain both soluble and membrane spanning states. Despite the altered structure observed in the dimer, the nature of a possible CLIC membrane spanning state still remains unclear. However, the comparison of the two forms of CLIC1 performed in Section 5.3.2 does serve to highlight positions within the protein capable of acting as hinges, as well as regions of the primary structure able to be incorporated within different forms of secondary structure.

Cys-24 is one of the three cysteine residues conserved within all CLIC family members (Cys-24, Cys-178 and Cys-223) and is the reactive cysteine at the centre of the glutathione-binding site in the monomeric form. In contrast Cys-59, located at the extreme C-terminus of helix 2, is unique to CLIC1 and is not conserved in the CLIC1 of amphibians (X. laevis and X. tropicalis) or fish (D. rerio, T. nigroviridis and F. rubripes). The disulphide bond formed upon the oxidative dimerisation of CLIC1 is thus unlikely to be a required general feature of CLIC function. The question then arises, is Cys-59 and consequently the CLIC1 dimer structure as a whole, relevant to the function of the CLICs in vivo?

Several alternative hypotheses can be made as to the relevance of the CLIC1 dimer:
1) the dimer is an artefact of the non-physiological conditions used to create it and does not relate to the in vivo function of CLIC1;
2) the dimer is an artefact, but regions of flexibility utilised during its formation are in common with those used during membrane insertion;
3) the dimer is a physiologically relevant state unique to CLIC1, for example it may be a metastable state between the soluble and membrane inserted forms of CLIC1 or a means of negative regulation possibly formed during redox-stress;
4) all CLICs are capable of adopting a physiologically relevant dimer-like state in response to various signals. For CLIC1 in vivo this signal is redox-related, whereas for the other CLICs the equivalent signal may require binding partners or post-translational modification.

No conclusive evidence currently favours one of these hypotheses over the others. However, the preliminary electrophysiology and efflux experiments described in Section 5.3.5 showed that the in vitro channel activity of CLIC1 required the presence of Cys-24 and Cys-59. Both residues are also essential for dimer formation (see Section 5.3.3). Moreover, the presence of reducing agents in the tip-dip experiments decreased the probability of observing channel activity. This suggests, at least in part, that for CLIC1 the formation of a dimer-like structure may be an intermediate step in channel insertion.

This then leads to the question, could the other CLICs possibly form a structure related to that of the CLIC1 dimer? Unlike CLIC1, CLIC4 does not dimerise on oxidation. Additionally the introduction of a cysteine residue in an equivalent position to that of Cys-59 in CLIC1 is insufficient to allow dimerisation in CLIC4. Thus CLIC4 is either incapable of forming a CLIC1 dimer-like structure or the intramolecular disulphide bond is not the only factor involved in the formation of the dimer under the conditions examined.

An analysis of regions of flexibility in CLIC1 corresponding to hinge points about which large conformational changes occur was made in Section 5.3.2. Most of the residues within the hinge regions between the CLIC1 monomer and dimer structures are conserved throughout the CLICs. If other CLICs are to adopt a CLIC1 dimer-like structure, the new interactions with the remainder of the molecule formed by elements with altered secondary structure or changes in topography should occur between conserved residues. An analysis of these interactions follows.

β-Strand 1 is not visible in the electron density of the dimer, presumably due to the loss of the β-sheet and the flexibility allowed by the three conserved glycine residues at its C-terminus. Therefore in the dimer only three of the four β-strands form new interactions with topologically distant parts of the protein.

In the monomer, β-strand 3 is a highly conserved internal strand within the sheet, with hydrophobic residues on one face that pack against helices 1 and 3, and on the other face that pack against helix 2. In the dimer these residues have moved considerably and form direct interactions with residues from helices 4b and 9, except Tyr-69, which remains exposed to solvent. The two leucine residues within β-strand 3 bind to a hydrophobic region centred around the conserved residues Phe-111 and Phe-114 from helix 4b. This hydrophobic patch is in a region topologically equivalent to the H-site in many GSTs. Pro-65 is a cis-proline at the N-terminus of β-strand 3 in the monomer. In the dimer it is in the trans conformer and its backbone carbonyl oxygen forms a hydrogen bond with the hydroxyl group of the invariant residue Tyr-233 from helix 9. This is in part accommodated by a 4° rotation and 0.8 Å translation of helix 9 toward helix 4b. This closing of the C-terminal helix over the H-site, albeit a small movement in this case, is reminiscent of the alpha class GSTs, where helix 9 forms a lid over the H-site when an electrophilic substrate is bound74.

In the monomer, β-strand 2 is on one edge of the β-sheet, so the side chains of several residues within the strand are exposed to solvent. Despite this, these residues are highly conserved. In the dimer, residues within this strand (except Thr-45) form extensive direct interactions with conserved residues at the interface between helices 1 and 2 of one subunit and helix 3 of the other. At the C-terminus of this strand the side chains of the conserved residues Thr-44 and Asp-47 do not interact with the remainder of the molecule in the monomer, whereas in the dimer they form the N-cap motif of the extended helix 2.

On the other solvent exposed edge of the β-sheet in the monomer, the residues within β-strand 4 are moderately conserved. In the dimer these residues show relatively few interactions with the rest of the molecule and are predominantly solvent exposed or participating in crystal contacts.

While not moving with respect to the C-terminal domain, helices 1 and 3 both have large portions unmasked on dimerisation that then form hydrophobic contacts with each other across the interface. As mentioned in Section 5.3.2, these residues are also highly conserved.

The last region that undergoes major conformational changes on dimerisation is helix 2 and the loops connecting it to the β-sheet. The loop connecting the N-terminus of helix 2 to β-strand 2 is relatively well conserved; in the CLIC1 dimer this loop becomes incorporated in the N-terminal extension of helix 2. Apart from Asp-47 discussed above most of the residues within this loop are charged and do not interact with the rest of the molecule in either the monomer or dimer forms.

While obviously important for CLIC1 dimer formation, the residues within the cis-proline loop and helix 2 itself (see Figure 2-1 and Figure 4-4D) are only moderately conserved within the CLICs (see Insert 1). CLICs 4, 5 and 6 all have similar sequences for both these regions, while those of CLIC1 differ considerably. Helix 2 in CLIC2 is different yet again, but its cis-proline loop is the same as that of CLICs 4, 5 and 6. CLIC3 has the same cis-proline loop as CLIC1 but a helix 2 with a sequence similar to CLIC2. Within these two regions there are three invariant residues (Pro-60, Gly-61 and Pro-65), all contained within the cis-proline loop. In both CLIC1 structures and the monomeric CLIC4 structure, Pro-60 or its equivalent is in the C’ position following the C-cap residue of helix 2 (Cys-59 in CLIC1, Ala in the other CLICs), with Gly-61 in the following C” position. Pro-65 is the cis-proline prior to β-strand 3; its position in both states has been discussed previously.

There are two moderately conserved hydrophobic residues in helix 2 that anchor the helix to the β-sheet in the monomer. These are Val-55 (Leu in other CLICs) and Leu-58 (Phe in CLIC3). These two residues form part of the conserved interface in the dimer. The other residues in helix 2 are primarily exposed to solvent in the monomer and are not highly conserved; they do not interact much with the body of the molecule in either form of CLIC1 and retain the same helical structure in both forms. It is noteworthy that the N- and C-terminal capping residues of helix 2 are unique to CLIC1. The C-cap residue, Cys-59, has been discussed previously. The N-cap residue, Thr-52, is also unique to CLIC1 and corresponds to a conserved proline in other CLICs. It was hypothesised in Section 5.3.4 that in the other CLICs the presence of a helix 2 proline N-cap may prevent the N-terminal extension of this helix, thus inhibiting dimerisation. However, using the CLIC1 knock-in mutant T52P this was shown not to be the case.

As mentioned above there are two versions of the cis-proline loop; CLIC1 and CLIC3 contain the sequences 60-PG[G/S]QLP-65 12, while the corresponding sequence in CLICs 2, 4, 5 and 6 is PGT[N/H]PP. The structure of this loop in the reduced monomeric states of CLIC1 and CLIC4 was seen to be quite different (Figure 4-4D). In the CLIC1 monomer to dimer transition, this loop acts as a hinge point switching between the cis-proline conformation of the monomer and the conformation stabilised by the disulphide bond involving Cys-59 in the dimer (see Section 5.3.2). This is concomitant with large movements in Ramachandran space for both glycines in the sequence (Figure 5-10). In both forms of CLIC1 the two prolines bounding this loop (and partially Leu-58) form the majority of the interactions with the remainder of the molecule. The role of this region thus appears to be that of a hinge rather than a stabilising agent. Whether the alternate sequences of other CLICs can perform a similar hinge-like role is uncertain.

In conclusion, many of the differences between the CLIC1 monomeric and dimeric states involve regions of the protein whose sequences are highly conserved within the CLIC family. However, some aspects of this transformation are obviously unique to CLIC1, in particular those changes centred around helix 2 and the cis-proline loop. On a more global scale, no fundamental differences seem to suggest that the dimer state is necessarily unique to CLIC1. It is noted, however, that some of the conserved residues identified as being involved in the dimerisation process are within regions of the CLIC1 GST-like fold that are highly conserved between GST classes. These include: the glycine hinge at the C-terminus of helix 1; the cis-proline hinge at the N-terminus of β-strand 3; the interacting regions between the hydrophobic β-strand 3 and the H-site of helices 4b and 9; and the conserved hydrophobic residues in helices 1 and 3 that face the β-sheet. This fact does not necessarily prevent involvement of these residues in membrane insertion.

12 CLIC1 numbering.

5.4.2 Redox induced conformational change

The structural change observed in the monomer-to-dimer transition of CLIC1 is quite radical. Yet similar large-scale conformational editing through redox switching has previously been observed in the E. coli hydrogen peroxide transcription factor, OxyR197.

OxyR is a transcription regulator that is activated by exposure to H2O2 and induces the transcription of genes associated with the response to oxidative stress. Activation occurs as a result of the formation of an intramolecular disulphide bond between two cysteines198. In reduced OxyR the sulphydryl groups of these two cysteines are ~ 19 Å apart; activation through H2O2 exposure introduces large-scale conformational changes in the OxyR regulatory domain concomitant with disulphide bond formation197. Oxidation results in the loss of some secondary structural elements (α-helix C and β-strand 8) and the formation or extension of others (β-strand 7 and β-strand 8') (see Figure 5-17)197.

There are several parallels between the redox induced conformational changes observed in CLIC1 and OxyR. Firstly, both proteins appear to undergo a change in oligomeric state in response to oxidation: from monomer to dimer in the case of CLIC1, while OxyR is thought to change from a dimer to a dimer of dimers (i.e., tetramer)197. The reduced “monomeric” states of both proteins do not possess exposed hydrophobic surfaces; regions unmasked on oxidation form new hydrophobic interactions in the final state197. Also in both proteins, the thiolate anion of one of the bonding cysteines is stabilised in the reduced state, presumably lowering the pKa of the sulphydryl group, thus rendering this cysteine more nucleophilic and activating it towards disulphide bond formation. For OxyR this occurs through interaction with a positively charged arginine residue and two N-terminal helix dipoles197, while in CLIC1 the N-terminal dipole moment of helix 1 is likely to activate Cys-24 in a manner similar to that predicted for the topologically equivalent active site cysteine in the thioredoxins199,200.
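The link between a lowered pKa and a more reactive cysteine can be illustrated with the Henderson-Hasselbalch relation. The short Python sketch below computes the fraction of a thiol present as the thiolate anion at a given pH. The pKa values used are illustrative assumptions only (roughly 8.5 for an unperturbed cysteine side chain, and a hypothetical lowered value of 6.5); they are not measured values for Cys-24 of CLIC1 or Cys-199 of OxyR.

```python
def thiolate_fraction(pka, ph):
    """Fraction of a thiol in the deprotonated (thiolate) form.

    Follows from Henderson-Hasselbalch: [S-]/[SH] = 10**(pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative (assumed) pKa values, not measurements for CLIC1 or OxyR.
TYPICAL_CYS_PKA = 8.5   # assumption: approximate value for a free cysteine side chain
LOWERED_CYS_PKA = 6.5   # assumption: hypothetical pKa in a thiolate-stabilising site

for label, pka in (("typical", TYPICAL_CYS_PKA), ("lowered", LOWERED_CYS_PKA)):
    frac = thiolate_fraction(pka, ph=7.0)
    print(f"{label} pKa {pka}: {100 * frac:.1f}% thiolate at pH 7.0")
```

Under these assumed values, a shift of two pKa units moves the cysteine from being a few percent thiolate to being mostly thiolate at neutral pH, which is the sense in which a stabilising environment activates the residue.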


Figure 5-17 Redox switch in the transcription regulator protein OxyR. A: Cartoon representation of subdomain II in the regulator domain of the reduced OxyR C199S mutant (pdb code: 1I69)197 and B: a similar representation of the active oxidised form (pdb code: 1I6A)197. In both panels helices are shown in red, β-strands in yellow and loop regions in green. The side chains of Cys-199 (Ser-199 in panel A and C) and Cys-208 are shown. C: Alignment of the reduced (purple) and oxidised (green) structures of OxyR where the regions participating in the conformational change are in bold. This figure is based on Figure 1 of Choi et al.197 and produced using the programs MOLSCRIPT1 and RASTER3D2.

In the oxidised state of both proteins the newly formed disulphide bond is removed from this thiolate stabilising environment, therefore the disulphide bonds are not as easily reduced. This occurs in the oxidised form of OxyR where the disulphide bond is in a position distant from the original putative activating arginine residue197. In CLIC1, oxidation results in the unwinding of the first turn of helix 1, removing Cys-24 and concomitantly the disulphide bond from the immediate environment of the helix’s N-terminus. Interestingly, in both states of CLIC1 Cys-59 is at the C-terminus of helix 2, so this helix dipole may render Cys-59 more electrophilic and thus a good binding partner for an activated form of Cys-24. In the reduced monomeric state the partial negative charge of this dipole is tempered by the presence of Arg-51, whose side chain spans helix 2. The elongation of the N-terminus of helix 2 on oxidation reorientates this residue away from Cys-59, such that in the oxidised state the disulphide bond remains in close proximity to and unshielded from the partial charge of the helix’s C-terminus. However, it has been noted using a helical peptide model that the dipole moment at the C-terminus of a helix has less of an effect on the pKa of a capping cysteine than that at the N-terminus199. Additionally, in the A subunit of the CLIC1 dimer structure Lys-79 from the other subunit is in close proximity to the disulphide bond, perhaps negating any stabilising influences resulting from this helix dipole.

Zheng et al. have shown that the two cysteines forming the disulphide bond of the activated oxidised form of OxyR are both important for function, although only one (Cys-199) is essential for transcription198. As a result of mass spectrometry data from the oxidation of a mutant not containing the second cysteine, the authors have proposed that OxyR activation may proceed through a sulphenic acid (-SOH) intermediate of this essential cysteine198. In the reduced form of OxyR Cys-199 is directed towards a hydrophobic pocket in the protein, and it was proposed by Choi et al. that the formation of a polar sulphenic acid intermediate on H2O2 oxidation might aid activation by destabilising this hydrophobic interaction. This would then lead to Cys-199 flipping towards the solution where it must then find its bonding partner, Cys-208197.

If a sulphenic acid intermediate is also involved in the disulphide bond formation of CLIC1 on H2O2 oxidation, an alternate impetus for conformational change is required. This is because in CLIC1 the activated nucleophilic cysteine, Cys-24, is relatively solvent exposed.

In conclusion, there are several commonalities between OxyR and CLIC1. Both undergo large-scale structural changes on oxidation. Concomitant with this structural change the wild type versions of both proteins form an intramolecular disulphide bond. In their reduced state, both proteins activate one of the cysteines involved in this bond through a local environment that stabilises its thiolate anion. On oxidation, this cysteine and hence the newly formed disulphide bond, is removed from this thiolate stabilising environment. This prevents the catalysis of the reverse reduction reaction and thus stabilises the oxidised state.

5.4.3 Dimer formation and the GSTs

While the evolutionary relationship between the N-terminal domain of the GST fold family and the thioredoxin superfamily is well understood5, the origin of the GST fold family C-terminal domain is unknown19. As the thioredoxin superfamily has not previously been associated with channel activity it was hypothesised by Cromer et al. that given the known origins of the N-domain of CLIC1 it was less likely to unfold readily during membrane insertion than the C-domain, which correspondingly may have evolved from a pore forming ancestor19. The formation of the oxidised CLIC1 dimer, while not elucidating a mechanism for membrane insertion, does demonstrate that the primary structure of a stable thioredoxin fold can also be compatible with an alternative tertiary structure. Thus, structural homology with the thioredoxin superfamily does not necessarily indicate an immutable fold.

In line with this, the unfolding of GST P1-1 due to thermal201 or chaotropic agents202 is a multistage process indicating the presence of partially folded intermediates during denaturation. The first stage, which leads to the loss of enzymatic activity but with minimal structural changes, is likely to be a breakdown of the GST homodimer into monomers. For thermal denaturation it was demonstrated that during the breakdown of this monomeric unit, an increase in the solvent accessibility of N-terminal tryptophan residues preceded the loss of α-helical content. As most of the α-helices are present within the C-domain of the GST fold, the existence of an unfolding/folding intermediate consisting of a predominantly structured C-domain and a partially unstructured N-domain was hypothesised201. The structure of the CLIC1 oxidised dimer appears similar to this hypothetical folding intermediate. Thus in the CLICs, lowering the activation energy barrier through the stabilisation of a GST folding intermediate may have allowed the evolution of a dynamic protein class capable of trafficking between two distinct folded states.

Chapter 6 The invertebrate CLIC-like proteins

6.1 Introduction

The CLIC family is highly conserved among vertebrates, with orthologues of all 6 human family members present with around 60% identity at least as distantly as the teleost fish and, as indicated by partial sequences from the sea lamprey P. marinus, probably throughout the entire subphylum Vertebrata. This suggests that the CLICs have a conserved and vital role within the function of vertebrate organisms. The question that then arises is: what is this role and when did it evolve?

There are several proteins or hypothetical sequences with moderate identity to the CLIC family in organisms more evolutionarily distant than the fish. These include a single hypothetical CLIC protein in the chordate Ciona intestinalis that has 47% identity (63% similarity) to human CLICs 4, 5 and 6. The sea squirt and humans last shared a common ancestor approximately 540 Myr BP145.

As discussed in Chapter 3, there are presently few known intervening sequences from organisms that diverged immediately before or after this date. This makes it difficult to trace completely the evolutionary history of the CLICs. However, sequences of CLIC-like proteins with low identities to the human family members also appear to be present in the arthropods, nematodes, molluscs, platyhelminths and possibly the cnidarians. In this Chapter the structures of these proteins from the model invertebrate organisms D. melanogaster and C. elegans are presented.

The CLIC-like sequence in the model fruit fly D. melanogaster, dmCLIC, is a 260 amino acid protein with a predicted MW of 30.2 kDa. It contains the core CLIC module with a 4 residue insertion at the C-terminus of helix 1 and deletions corresponding to the interdomain proline rich loop and negatively charged footloop. In addition to the core CLIC module it possesses ~20 amino acid N-terminal and C-terminal extensions. Excluding these two extensions dmCLIC shares ~ 30% amino acid identity (48% similarity, determined using an EMBOSS needle alignment217) with human CLICs 4, 5 and 6.

There are also two CLIC-like sequences in the model nematode C. elegans. EXC-4 is a 290 amino acid, 33.7 kDa protein that is similar to dmCLIC (38% identity, 51% similarity), while EXL-1 is more divergent: it is a 238 amino acid, 27.5 kDa protein with low similarity to EXC-4 (~ 25% identity). Like dmCLIC, EXC-4 contains both N- and C-terminal extensions to the basic CLIC module (17 aa and 34 aa respectively).
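Percent identity figures such as those quoted above depend on how the denominator is chosen, so values are only comparable when computed the same way. The following Python sketch shows the arithmetic for a pair of already-aligned sequences, reporting identity both over all alignment columns (gaps included) and over ungapped columns only. The function name and the toy sequences are hypothetical and purely illustrative; they are not CLIC sequences, and this is not the EMBOSS needle implementation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identity over all columns, identity over ungapped columns), in percent.

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both rows hold a gap; they carry no information.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    ungapped = [(x, y) for x, y in columns if x != gap and y != gap]
    if not ungapped:
        raise ValueError("no aligned residue pairs")
    matches = sum(1 for x, y in ungapped if x == y)
    return 100.0 * matches / len(columns), 100.0 * matches / len(ungapped)


# Hypothetical toy alignment, for illustration only.
row_a = "MKV-LTAGC"
row_b = "MRVQLT-GC"
over_all, over_ungapped = percent_identity(row_a, row_b)
print(f"identity over all columns: {over_all:.1f}%")
print(f"identity over ungapped columns: {over_ungapped:.1f}%")
```

The two denominators can differ noticeably when one sequence carries long insertions or terminal extensions, which is why the extensions are excluded before the dmCLIC comparison is made.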

Figure 6-1 The C. elegans exc phenotype. Left panels: differential interference contrast micrographs taken from Buechner et al.204. The upper-left panel shows the wild-type excretory canal, highlighted using arrows. The lower-left panel shows the series of cysts formed in the exc-4 null mutant allele. Right panels: fluorescence micrographs taken from Berry et al.137, figures are of nematodes expressing the non-functional construct EXC-4(R202Stop)::GFP. The upper-right panel shows the localisation of the GFP construct in the wild-type nematode. The lower-right panel shows the localisation of the same GFP construct in the exc-4 null mutant allele.

The C. elegans CLIC-like proteins are the best characterised of the invertebrate CLIC-like proteins. C. elegans has a single excretory cell that forms the major tubule network of its excretory system (reviewed by Buechner203). In the exc phenotype mutants the lumen of this excretory cell develops abnormally (the exc-4 mutant is shown in Figure 6-1). In some of these mutants the apical surface of the canal appears to lose the ability to maintain narrow tubule structures and swells into a series of fluid-filled cysts203. In a study by Berry et al. the locus of one of these phenotype mutants, exc-4, was mapped to a region of chromosome I containing a CLIC-like gene. All three exc-4 phenotype mutant alleles contained molecular lesions within this gene. Additionally, transformation with a genomic fragment encompassing this gene was able to rescue the mutant phenotype137. The authors subsequently named this CLIC-like gene EXC-4.

An EXC-4 C-terminal GFP construct capable of rescuing the mutant phenotype was seen to be widely expressed, and predominantly localised to membranes that undergo substantial remodelling, including the luminal membrane of the excretory cell137. The authors also identified a second CLIC-like protein in the C. elegans genome that they termed EXL-1 (for EXC-4-like). EXL-1 was seen to localise to intracellular membranes distinct from the expression profile of EXC-4137. In the same study the C. elegans omega class GST was seen to display a solely cytoplasmic expression in contrast to the membrane associated expression of EXC-4 and EXL-1.

Berry et al. used a series of EXC-4 deletion constructs to map the regions of the protein responsible for the luminal membrane localisation within the excretory cell. Truncation of EXC-4 in the middle of helix 6 did not affect membrane localisation; this mutation would be expected to disrupt the second putative CLIC transmembrane domain. In contrast, the first 66 residues, which contain the first putative transmembrane domain, were seen to be both necessary and sufficient for membrane targeting, and this can be abolished with a single point mutant L46P (a leucine conserved in all CLIC family members)137. The introduction of a proline at this position could potentially disrupt α-helix 1, and the first putative transmembrane domain.

It is difficult to categorise the invertebrate CLIC-like proteins definitively as members of the CLIC family on the basis of their amino acid sequence alone. This results from the low levels of sequence identity between the CLICs and various GST classes, most likely resulting from a shared evolutionary relationship between the two families4. In order to determine whether these proteins share structural characteristics that are particular to the CLIC family or alternatively to one of the GST classes, their crystal structures were solved. The results of these experiments are presented in this chapter in Section 6.3. Section 6.4 discusses common structural features and the possible relevance this has to the evolution and function of the CLIC family.

6.2 Materials and Methods

6.2.1 Cloning

The clone pdmCLIC containing the cDNA for the D. melanogaster gene CG10997 (NM 132700) cloned within the pGEX-2T GST-fusion vector (Amersham Biosciences, Piscataway, NJ) was a gift from Prof. Mark A. Berryman13.

A clone containing the cDNA for the C. elegans gene exc-4 (AY308063) was a gift from Prof. Oliver Hobert 14. Dave Smith15 subsequently cloned the exc-4 gene into the pGEX-4T-1 GST-fusion vector (Amersham Biosciences, Piscataway, NJ) using a protocol similar to that used for CLIC1 25; this construct is called pEXC-4 (personal communications).

6.2.2 Protein purification

6.2.2.1 Purification of the reduced monomer The products of the pdmCLIC and pEXC-4 constructs, dmCLIC and EXC-4 respectively, were purified using essentially the same protocol as that used for CLIC1 (see Section 5.2.2).

Under reducing conditions dmCLIC eluted from the gel filtration Superdex 75 prep grade HiLoad 26/60 column (Amersham Biosciences) as a monomer and was subsequently concentrated to 25.4 mg/mL before aliquoting, flash freezing in liquid nitrogen and storing at -80°C.

EXC-4 also eluted as a monomer in gel filtration chromatography, but appears less stable than the other CLICs in this column buffer (20 mM HEPES, 100 mM KCl, 1 mM DTT, 1 mM NaN3, pH 7.0). In this buffer it starts to show precipitation if concentrated above ~ 8 mg/mL, thus stock solutions were concentrated to only 7 mg/mL before aliquoting, flash freezing in liquid nitrogen and storing at -80°C.

13 Department of Biomedical Sciences, Molecular and Cellular Biology Program, Ohio University College of Osteopathic Medicine, Athens, OH 45701, USA.

14 Department of Biochemistry and Molecular Biophysics, Center for Neurobiology and Behavior, Columbia University, College of Physicians and Surgeons, New York, NY 10032, USA.

15 Centre for Immunology, St. Vincent’s Hospital, and University of New South Wales, Sydney 2010, Australia.

6.2.2.2 Purification of the oxidised monomer

DmCLIC and EXC-4 were oxidised with 2 mM H2O2 essentially as performed for CLIC1 and analysed using a similar protocol (see Section 5.2.3).

6.2.3 Crystallisation

6.2.3.1 DmCLIC

DmCLIC was crystallised at 4 °C using the hanging drop vapour diffusion method over a 500 μL reservoir consisting of 20% w/v monodisperse polyethylene glycol 3350 and 0.2 M KI. Drops consisted of 3 μL protein at 12.7 mg/mL and 3 μL of reservoir. Crystals grew over a 2 day period, reaching a final size of ~ 400 μm x 200 μm x 200 μm (see Figure 6-2), however after growing they deteriorated over 3-4 days.

Figure 6-2 Crystals of DmCLIC. DmCLIC crystals grown in 20% w/v monodisperse polyethylene glycol 3350 and 0.2 M KI at 4 °C.

6.2.3.2 EXC-4

EXC-4 was initially crystallised at 20 °C using the hanging drop vapour diffusion method over a 500 μL reservoir consisting of 16% w/v monodisperse polyethylene glycol 3350 and 0.15 M HCOONa. Drops consisted of 3 μL protein at 4.8 mg/mL and 3 μL of reservoir. Crystals grew over a 2 day period, with two morphologically distinct crystal forms observed under these conditions. The predominant crystal form grew as hexagonal prisms and displayed anisotropic diffraction in the spacegroup P32 with cell dimensions a = b = 49.38 Å and c = 160.75 Å. Occasionally small rectangular block-like crystals were also observed; these diffracted isotropically in the spacegroup P21 with cell dimensions a = 54.93 Å, b = 91.92 Å, c = 63.83 Å and β = 99.3°. The P32 EXC-4 crystals were not suitable for diffraction studies.


Using a glass fibre the P21 crystal form was streak seeded into drops grown over a reservoir consisting of 14% w/v monodisperse polyethylene glycol 3350 and 0.15 M HCOONa. Crystals only grew in those drops that were seeded; these formed large rectangular prisms over a 2 day period that reached a final size of ~ 300 μm x 300 μm x 400 μm.

6.2.4 Data collection

6.2.4.1 DmCLIC

Before freezing in liquid nitrogen, crystals were progressively transferred into a cryoprotectant consisting of the initial reservoir solution plus 250 mg/mL D-glucose. This was performed by adding 3 μL of 125 mg/mL D-glucose solution directly to the protein drop and waiting several minutes before adding 3 μL of 250 mg/mL D-glucose solution. After several more minutes the crystal was mounted in a cryo loop (Hampton Research) and washed directly in the 250 mg/mL D-glucose solution before freezing in liquid nitrogen. After freezing, crystals were mounted at 100 K in a nitrogen cryostream (Oxford Cryosystems). Diffraction data were thus obtained at 100 K on a Mar345 image plate mounted on a Nonius rotating anode generator using Cu Kα radiation and Osmic confocal mirror optics.

The dmCLIC crystals diffracted to 1.90 Å in the spacegroup P212121 with cell dimensions a = 39.24 Å, b = 62.90 Å and c = 114.12 Å. The crystals contain 1 dmCLIC molecule within the asymmetric unit, with a Vm of 3.11 Å3/Da and a solvent content of 57.2%.

6.2.4.2 EXC-4

Before freezing in liquid nitrogen, the P21 EXC-4 crystals were equilibrated for 10-20 min. over 500 μL of the initial reservoir solution plus 30% v/v isopropanol.

After freezing, EXC-4 crystals were mounted at 100 K in a nitrogen cryostream (Oxford Cryosystems). Diffraction data were thus obtained at 100 K on a Mar345 image plate mounted on a Nonius rotating anode generator using Cu Kα radiation and Osmic confocal mirror optics. The P21 crystal form of EXC-4 diffracted to 1.90 Å. This crystal form contains 2 EXC-4 molecules within the asymmetric unit with a Vm of 2.34 Å3/Da and a solvent content of 46.4%.
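The Vm (Matthews coefficient) quoted above follows from the unit cell volume and the total protein mass it contains. A minimal Python sketch of that arithmetic is given here under stated assumptions: the cell parameters are the EXC-4 values from Table 6-1, the molecular mass is the predicted 33.7 kDa quoted in Section 6.1 (the crystallised construct may differ slightly), P21 is taken to have 2 symmetry operations, and the solvent fraction uses the common approximation 1 - 1.23/Vm. The function names are illustrative and do not come from any crystallographic package.

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms; cell angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews(volume, mass_da, n_symops, n_per_asu):
    """Return (Vm in A^3/Da, approximate solvent fraction).

    The 1.23 constant is the usual approximation for a typical protein density;
    it is an assumption, so the solvent fraction is only an estimate.
    """
    vm = volume / (mass_da * n_symops * n_per_asu)
    return vm, 1.0 - 1.23 / vm


# EXC-4, P21 form: cell from Table 6-1, predicted mass 33.7 kDa (assumed here).
volume = cell_volume(54.73, 91.69, 63.65, beta=99.05)
vm, solvent = matthews(volume, mass_da=33700.0, n_symops=2, n_per_asu=2)
print(f"Vm = {vm:.2f} A^3/Da, solvent fraction ~ {solvent:.2f}")
```

With these inputs Vm comes out close to the reported 2.34 Å3/Da. The solvent fraction depends on the assumed protein density, so small differences from the quoted value are to be expected.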

6.2.5 Structure solution and refinement

6.2.5.1 dmCLIC

The CLIC1 reduced monomer structure (1K0M)3 without the flexible footloop (residues 6-146 and 166-241) was used as a molecular replacement probe using the CCP4 183 program AMoRe185. An initial model was built using the program wARP186; further model building was performed using the program O187. This was refined using maximum likelihood methods (using the program REFMAC 5 184). The final dmCLIC model consists of a single molecule within the asymmetric unit and contains residues 14-72, 74-232 and 234-258 plus 143 water molecules, one iodide ion and one calcium ion. Pro-85 adopts a cis peptide bond. The final R-factor is 0.207 with R-free at 0.257. Water molecules were added using a Babinet model with mask184; the mask parameters used were a vdw probe radius of 1.40 Å, an ion probe radius of 0.80 Å and a shrinkage radius of 0.80 Å. Structures were validated using PROCHECK182 with default parameters. The data reduction and refinement statistics are summarised in Table 6-1.

The calcium ion within the structure was initially modelled as a water molecule. During refinement in REFMAC 5 184, the B-factors for this water dropped to the lower limit (5 Å2), indicating that the electron density was not being properly modelled. Once the atom was set to Ca2+ with an occupancy of 1, the B-factors refined to values that were consistent with the B-factors of the oxygen ligands. The fact that all ligands are oxygen is consistent with a hard metal such as Ca2+ or Mg2+, where only the former is sufficiently electron dense to account for the density observed. The coordination is consistent with the assignment of the metal ion as Ca2+. The iodide ion was modelled based on a large electron density and a large anomalous peak consistent with an I- ion.

6.2.5.2 EXC-4

The dmCLIC structure was used as a molecular replacement probe using the CCP4 183 program AMoRe185. An initial model was built using the program wARP186; further model building was performed using the program O187. This was refined using maximum likelihood methods (using the program REFMAC 5 184). The final EXC-4 model consists of residues 15-284 in the A subunit and residues 17-141 and 148-285 in the B subunit plus 402 water molecules and two calcium atoms. In both the A and B subunits Pro-79 adopts a cis peptide bond. The final R-factor is 0.194 with R-free 0.240. Water molecules were added using a Babinet model with mask184; the mask parameters used were a vdw probe radius of 1.40 Å, an ion probe radius of 0.80 Å and a shrinkage radius of 0.80 Å. Structures were validated using PROCHECK182 with default parameters. The data reduction and refinement statistics are summarised in Table 6-1.
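For readers less familiar with the R-factor and R-free values reported for the two models, the Python sketch below shows the conventional definition, R = Σ| |Fobs| - k|Fcalc| | / Σ|Fobs|, evaluated separately over the working reflections and over a test set that is excluded from refinement. This is a simplified illustration: a single linear scale k taken from the working set is an assumption made here, the amplitudes are hypothetical toy numbers, and refinement programs such as REFMAC 5 use more elaborate scaling.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R-factor for paired structure factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Return (R-work, R-free) using one linear scale derived from the working set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    # Simple linear scale k = sum(Fobs) / sum(Fcalc) over working reflections (assumption).
    scale = sum(fo for fo, _ in work) / sum(fc for _, fc in work)
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free


# Hypothetical toy amplitudes; the last reflection is flagged as test-set ("free").
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 50.0]
f_calc = [95.0, 85.0, 55.0, 45.0, 20.0, 40.0]
free_flags = [False, False, False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

Because the test reflections are never used to adjust the model, R-free is normally somewhat higher than the working R-factor, as is the case for both structures here.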

6.3 Results

6.3.1 Protein purification and oxidation

Figure 6-3 Oxidation of the invertebrate CLICs. Gel filtration chromatography of dmCLIC and EXC-4 using a Superdex 75 prep grade HiLoad 26/60 sizing column. Both proteins are seen to be monomeric under reducing and oxidising conditions.

DmCLIC and EXC-4 were expressed and purified using the same protocol as that used for CLIC1. Using gel filtration chromatography both proteins elute as a monomer under reducing conditions and do not change their oligomeric state after oxidation with 2 mM H2O2 (see Figure 6-3). Thus, unlike CLIC1, these two invertebrate CLIC-like proteins do not form an oxidised dimeric state on exposure to hydrogen peroxide.

6.3.2 Crystal structure of dmCLIC

Figure 6-4 Cartoon representation of dmCLIC. A: Cartoon representation of dmCLIC with helices shown in red, β-strands in yellow and loop regions in green. Secondary structural elements are labelled. B: Cartoon representation of CLIC4(ext) as presented in Figure 4-3A. C: Backbone alignment of the dmCLIC (purple) and CLIC4(ext) (green) crystal structures. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

The crystal structure of the reduced monomeric form of dmCLIC was determined at 1.9 Å resolution (see Figure 6-4A and Table 6-1). The final model consists of residues 14-72, 74-232 and 234-258. The breaks in electron density occur near the N-terminus of helix 2 and the C-terminus of helix 9 respectively. DmCLIC adopts the canonical GST fold and, despite the low sequence conservation, for the most part possesses a Cα backbone structure identical to that of CLIC1 and CLIC4 (Figure 6-4C). The root mean square deviation of Cα atoms is ~ 1.4 Å with CLIC4 over 202 atoms, calculated with O187 (using dmCLIC residues 22-109, 113-116, 118-138, 140-161 and 212-235).

Table 6-1 Data reduction and refinement statistics for dmCLIC and EXC-4.

Protein crystal                dmCLIC                 EXC-4

Spacegroup                     P212121                P21
Unit cell a                    39.24 Å                54.73 Å
Unit cell b                    62.90 Å                91.69 Å
Unit cell c                    114.12 Å               63.65 Å
Angles other than 90º          -                      β = 99.05º
Resolution (outer shell)       1.9 Å (2.00-1.90 Å)    1.9 Å (2.00-1.90 Å)
Reflections (unique)           79165 (22232)          220217 (16544)
Completeness (outer shell)     97.0% (95.1%)          92.7% (78.9%)
I/σ (outer shell)              6.4 (1.0)              15.7 (1.6)
Rmerge (outer shell)           0.092 (0.913)          0.069 (0.631)
Protein (water) atoms          1998 (145)             4434 (390)
R factor (Rfree)               0.207 (0.257)          0.194 (0.240)
r.m.s.d. bond lengths1         0.021 Å                0.019 Å
r.m.s.d. bond angles1          1.70º                  1.59º
Ramachandran plot2
  Most favoured region         90.1%                  95.3%
  Additionally allowed         9.0%                   4.7%
  Generously allowed           0.4% (Asn-90)          0.0%
  Disallowed                   0.4% (Gln-69)          0.0%

1 From REFMAC 5 184
2 From PROCHECK182
r.m.s.d., root mean square deviation from standard bond lengths and angles determined by high resolution small molecule crystal structures184.

6.3.3 Crystal structure of EXC-4

The crystal structure of the reduced soluble monomeric form of EXC-4 was solved in the crystal form P21 at 1.90 Å resolution (see Table 6-1 for refinement statistics). In the P21 crystal form there are 2 monomers in the asymmetric unit. The final model consists of residues 15-284 in the A subunit and residues 17-141 and 148-285 in the B subunit. The break in electron density occurs in the nematode-specific loop between helix 4 and helix 5. This loop region, which is called the “h4-h5 loop”, also displays higher than average temperature factors in the A subunit, possibly indicating a certain degree of flexibility in solution. The two monomers within the asymmetric unit are related via a non-crystallographic two-fold axis and display nearly identical structures (Cα RMS deviation of 0.41 Å) (see Figure 6-5). Minor differences are confined to residues leading into the h4-h5 loop, the C-terminus of helix 2 and the side chains of various surface residues.

Figure 6-5 The dimer within the asymmetric unit in the P21 EXC-4 crystal form.

Cartoon representation of the asymmetric unit in the P21 EXC-4 crystal structure. In the A subunit helices are in red, β-strands in yellow and loop regions in green. In the B subunit helices are in blue, β-strands in yellow and loop regions in purple. The molecules are viewed A: down and B: along the non-crystallographic 2-fold axis. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.


As shown in Section 6.3.1 EXC-4 was monomeric in solution prior to crystallisation. However the arrangement of molecules within the EXC-4 crystallographic dimer seen in the asymmetric unit of the P21 crystal form (see Figure 6-5) partially resembles that seen in GST dimers (see Figure 1-10). The EXC-4 interface is relatively polar but buries ~ 1200 Å2 of accessible surface area per molecule (calculated using GRASP195).

6.3.4 Comparison of the invertebrate and vertebrate CLIC structures

Figure 6-6 Comparison of CLIC4 and the invertebrate CLICs.

Cartoon representations of A: the A subunit of EXC-4 in the P21 crystal form, B: dmCLIC and C: human CLIC4(ext) are shown. Helices are in red and labelled, β-strands are in yellow, loop regions are in green. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

As expected given their sequence similarities, the structures of dmCLIC and EXC-4 are also similar. A superposition of the Cα backbone traces of the EXC-4 A subunit and dmCLIC was calculated using MOLMAN as implemented in the program O187 using default parameters. EXC-4 and dmCLIC have a RMS deviation of 1.4 Å over 225 atoms. The two invertebrate CLIC-like proteins are also structurally similar to the vertebrate CLIC structures, with dmCLIC more closely related than EXC-4 (see Figure 6-6). The Cα backbone alignments of dmCLIC with CLIC1 and EXC-4 with CLIC1 show a RMS deviation of 1.4 Å over 202 atoms and 1.6 Å over 193 atoms respectively, calculated using the program O187.
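The RMS deviations quoted above are computed over pairs of structurally equivalent Cα atoms once the two structures have been superposed. The Python sketch below shows only the final step, the RMS deviation itself, and assumes the coordinates have already been superposed and paired; it does not perform the least-squares fit or the choice of equivalent residues carried out by the program O. The coordinates are hypothetical toy values.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMS deviation (same units as input) between paired, pre-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates: the second set is the first shifted by 1 A along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(f"RMSD = {ca_rmsd(set_a, set_b):.2f} A")
```

Because the value depends on which atom pairs are included, an RMS deviation should always be read together with the number of atoms it was calculated over, as is done in the text.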

The more closely related dmCLIC structure was compared to that of CLIC1 and CLIC4(ext). This showed that the length and position of secondary structural elements are mostly conserved; the majority of differences are confined to connecting loop regions (see Figure 6-7). The minor exceptions to this include: a 3 residue extension at the C-terminus of helix 1, and a slight contraction of helix 5 at its N-terminus and extension from its C-terminus with respect to CLIC1 and CLIC4(ext).

Figure 6-7 Sequence alignment of CLIC1, dmCLIC and EXC-4. Sequence alignment of human CLIC1, dmCLIC and EXC-4. Secondary structural elements are shown and labelled above the CLIC1 sequence and below the dmCLIC and EXC-4 sequences. Helices are shown in red for CLIC1, blue for dmCLIC and green for EXC-4; β-strands are in yellow. In the sequences conserved cysteines are highlighted in yellow, and the putative transmembrane regions of CLIC1 are highlighted in green. Residues seen in the most complete molecule within the asymmetric unit of the respective crystal structures are capitalised; lower case residues are not observed in the electron density of the respective structures. Amino acid conservation between CLIC1 and dmCLIC as well as between the dmCLIC and EXC-4 sequences is indicated for residues occupying structurally equivalent positions within the various structures.

While the secondary structural elements and the majority of loop regions are conserved between dmCLIC and the two vertebrate CLIC structures, several loop regions adopt different conformations. These include an elongated loop leading into β-strand 2 due to the extra turn at the C-terminus of helix 1 (see Figure 6-8A). The interdomain loop in dmCLIC is also distinct from that of the two vertebrate CLICs and contains a single α-helical turn (see Figure 6-7 and Figure 6-8B). The 4 residue deletion at the N-terminus of helix 5 results in a shorter connection between helix 4b and helix 5 (see Figure 6-8C). Finally, dmCLIC does not contain a flexible footloop between helices 5 and 6 and instead possesses an abridged GST-like loop within this region (Figure 6-8D).

Figure 6-8 Differences in loop regions between dmCLIC and CLIC1. Expanded backbone alignments of loop regions highlighting the different conformations in the dmCLIC and CLIC1 structures. DmCLIC is in green, CLIC1 in purple. The panels show A: the C-terminus of helix 1 and β-strand 1, B: the interdomain loop region, C: the loop between helix 4 (on right) and helix 5 and D: the CLIC1 footloop region and loop structure leading into helix 6. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

In comparison to the two human CLIC structures, EXC-4 has a slight C-terminal extension to helix 1 and lacks the flexible footloop present in the chordate CLICs. In these regions EXC-4 adopts conformations identical to dmCLIC (see Figure 6-8A and D respectively). For the most part the secondary structural elements in EXC-4 are also identical to those in dmCLIC (see Figure 6-7), with the majority of differences occurring in connecting loop regions. Differences between the two invertebrate CLIC-like protein structures include the following. Helix 3 is extended by 4 residues from its C-terminus and is rotated by ~15 degrees with respect to the remainder of the molecule (see Figure 6-9A). In EXC-4, unlike the two human CLIC structures and dmCLIC, the hydrogen bonding in helix 4 is not disrupted and therefore the helix is not broken into two separate parts (see Figure 6-9C and D). Helix 4 also extends an extra 2 turns at its C-terminus, leading into the nematode-specific elongated loop between helices 4 and 5 (Figure 6-9B). In dmCLIC helix 5 is ~5 residues shorter at the N-terminus than the same helix in the vertebrate CLIC structures (see Figure 6-8C). In EXC-4 this is not the case; however, there is a sharp bend leading into the helix from the h4-h5 loop occurring at approximately the same position (see Figure 6-9B). EXC-4 also contains an extra tenth helix within its C-terminal extension that is not observed in dmCLIC (see Figure 6-6A).

Figure 6-9 Differences between EXC-4, dmCLIC and CLIC1. Expanded backbone alignments of regions highlighting the different loop conformations in the EXC-4, dmCLIC and CLIC1 structures. EXC-4 is in red, dmCLIC is in green, CLIC1 in purple. A: Alignment of EXC-4 and dmCLIC encompassing β-strands 3 and 4, helix 3 and the interdomain loop. B: Alignment of EXC-4 and dmCLIC showing the structure of the EXC-4 h4-h5 loop, h4 is on the left. Comparison between helix 4 of EXC-4 and helices 4a and 4b in C: dmCLIC and D: CLIC1. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

In addition to the ~230 residue CLIC module, the invertebrate CLIC-like proteins all contain 15-20 residue N-terminal extensions and 30-40 residue C-terminal extensions. In the crystal structure of dmCLIC this C-terminal extension packs against the body of the molecule away from the glutathione binding site and interacts with residues from helices 6, 7 and 9 (see Figure 6-10A).

A conserved Tyr residue is present at or near the C-terminus of helix 9 in all of the CLICs and CLIC-like proteins (Tyr-233 in CLIC1, Tyr-231 in dmCLIC and Tyr-241 in EXC-4, see Insert 1). This tyrosine residue marks the point where the dmCLIC and EXC-4 C-terminal extensions appear to deviate from the structure of the C-termini in CLICs 1 and 4. Following Tyr-231 in the dmCLIC structure, there are few interactions made between the C-terminal extension and the body of the molecule until Glu-246 (ball and stick representation in Figure 6-10A). Consequently this region appears to be relatively flexible, displaying higher B-factors as well as several breaks in the electron density.

Figure 6-10 Cartoon representation of the dmCLIC C-terminal extension. A: cartoon representation of the C-terminal extension of dmCLIC. The CLIC module is represented using the same colours as Figure 6-4B, while residues 246-258 are shown in ball-and-stick representation. B: Residues involved in the interaction between the C-terminal extension and the body of the molecule. The C-terminal extension is in orange highlight. The side chains of residues in helix 6 and 9 making hydrogen bonds (purple dashed lines) to the C-terminal extension are shown and labelled. C: Hydrophobic side chain interactions made between the extreme C-terminal residues and those in helix 7. Figure and colouring are as shown in panel B. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

Figure 6-11 Cartoon representation of the EXC-4 C-terminal extension. Molecule orientation and colouring are the same as those used in Figure 6-10B and C. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

After this deviation away from the body of the molecule the C-terminal extension then passes between helix 8 and helix 9, forming a series of hydrogen bonds with Asp-224 and Gln-225 from the C-terminus of helix 9 (Figure 6-10B). The extension then runs along helix 7, forming hydrogen bonds between C-terminal extension backbone atoms and Gln-183 and Arg-186. Finally the extreme C-terminal residues bend across the N-terminus of helix 7, where a hydrophobic isoleucine and proline residue within the extension interact with the conserved tryptophan (see Section 4.4 and Figure 4-4B) that forms part of the turn leading into helix 7 (Figure 6-10C).

In EXC-4 the ~40 residue C-terminal extension to the CLIC module adopts a conformation similar to the analogous extension seen in dmCLIC (see Figure 6-10 and Figure 6-11). Within this extension, the ~15 residues immediately following helix 9 protrude away from the body of the molecule resulting in a small flexible loop region similar to that seen in dmCLIC. The peptide chain then forms an extended loop structure bound between helices 8 and 9 (Figure 6-11A) that continues along the side of the molecule before finally forming a small 3-turn C-terminal helix (h10) unique to EXC-4 that packs against the N-terminus of helix 7 (Figure 6-11B).

At the centre of the glutathione binding site within the CLICs is the redox active cysteine that occupies the N-cap position of helix 1 (Cys-24 in CLIC1). The vertebrate CLICs 1, 4, 5 and 6 all possess the residues CPFS within this region, a sequence also observed in some omega class GSTs and glutaredoxins, which typically contain the residues CP[F/Y][A/S] and CP[F/Y][C/S] within their respective active sites (Figure 6-12). In contrast CLICs 2 and 3 both contain a second cysteine residue instead of the C-terminal serine (CPFC and CPSC respectively), thus containing a CxxC thioredoxin/glutaredoxin-like motif (Figure 6-12). Due to the presence of this motif Cromer et al. have suggested the possibility that the CLICs may function as either thioredoxins or glutaredoxins19.

Like CLIC2 and CLIC3, dmCLIC contains a thioredoxin/glutaredoxin-like CxxC motif (CLFC) at the N-terminus of helix 1 (Figure 6-12). The electron density within this region is clear, with both cysteines appearing to be reduced. The residues within this motif adopt similar side chain and backbone conformations to the equivalent residues in the CLIC1 and CLIC4(ext) structures (Figure 6-13). In dmCLIC Cys-40 occupies a similar position to the GSH binding residue, Cys-24 in CLIC1. EXC-4 is unique within the CLICs in that it does not contain a conserved cysteine residue within the N-cap position of helix 1 and instead contains an aspartic acid within a DxxC motif that is otherwise identical to that of dmCLIC (see Figure 6-12). In the EXC-4 crystal structure the electron density of this region is clear; Cys-39 appears to be reduced and all residues adopt similar conformations to those in the other CLIC structures despite the presence of the aspartic acid (see Figure 6-13C).

Figure 6-12 The CLIC redox active site. Sequence comparison of the redox active CxxC, and related motifs, at the N-terminus of helix 1 within the CLICs and other proteins. The secondary structural elements topologically equivalent to β-strand 1 and α-helix 1 in CLIC1 are marked under each sequence grouping. Cysteines within each motif are highlighted in yellow, serines in pink. The active aspartic acid residue in the motif of EXC-4 is highlighted in magenta. The sequences shown are respectively: human CLICs 1-6, human and fruit fly omega class GSTs (P78417, Q9H4Y5, NP_648234), human CLICs 2 and 3, human and fruit fly glutaredoxins (NP_002055, AAH28113, AAL16098), dmCLIC and EXC-4, human and fruit fly theta class GSTs (NP_000844, AAC13317, AAK66764) and the human thioredoxins (AAF86466 and NP_036605).

Figure 6-13 Conformation of the CxxS, CxxC and DxxC motifs in CLIC1, dmCLIC and EXC-4. Ball and stick representation of the redox active motifs at the N-terminus of α-helix 1 in A: the reduced CLIC1 monomer, B: the reduced dmCLIC monomer and C: the reduced EXC-4 structure (see Section 6.3.3). The side chains of residues within the CxxS, CxxC and DxxC motifs are shown. Hydrogen bonds within helix 1 are represented with dashed purple lines. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

Figure 6-14 Helix 6 N-terminal capping motif in dmCLIC. In the GST fold, helix 6 contains a conserved N-terminal helix capping motif and preceding loop structure. The side chains of residues within this motif are shown in ball and stick representation and labelled. The structural water molecule at the centre of the loop is shown in cyan (see text). Hydrogen bonds are represented with purple dashed lines. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

In the primary structure of dmCLIC a similar CxxC sequence is also present at the N-terminus of helix 6 (see Figure 6-14). However, in the crystal structure these residues are not orientated towards each other like those of the redox couple within the glutathione-binding site. Instead they are involved in a structural motif conserved in all GST-like folds. This is discussed further in Section 6.4.3.

Figure 6-15 DmCLIC cis-proline loop prior to β-strand 3. The structure in dmCLIC around the cis-proline conserved in proteins containing the GST fold is shown. The side chains of residues 83-87, 95 and 97 are shown and labelled, the side chains of Cys-40 and Cys-43 are shown but unlabelled (upper right). Secondary structural elements are coloured and labelled as for Figure 6-4A. The bound metal ion is shown in purple, the hydrogen bonds it forms are represented as purple dashed lines. Figure produced using the programs MOLSCRIPT1 and RASTER3D2.

The majority of structures containing the thioredoxin fold contain a conserved cis-proline residue at the N-terminus of β-strand 3 188. This proline is within the N-terminal domain at the end of the loop connecting helix 2 to β-strand 3. As discussed in Sections 4.4 and 5.4.2, there are two versions of this loop structure in the vertebrate CLICs, PG[G/S]QLP in CLICs 1 and 3 and PGT[H/N]PP in CLICs 2, 4, 5 and 6. The first proline in each of these sequences is the C-cap residue of helix 2, while the last proline is the conserved cis-proline that is the residue N-terminal to β-strand 3. In dmCLIC the sequence of this loop region is 79-FEATHPP-85, while EXC-4 has the similar sequence 73-FLGAQPP-79. Thus while the residues leading out of helix 2 differ in the invertebrates, those immediately prior to the cis-proline are similar to the double proline containing CLIC4 sequence. In the invertebrate crystal structures these residues adopt a conformation similar to the equivalent region within the CLIC4(ext) structure (see Figure 4-4D).

This loop has a similar conformation in the invertebrate CLIC-like proteins and the CLIC4(ext) structure. Despite this, in both the dmCLIC and EXC-4 structures the backbone carbonyl groups of the two residues prior to the cis-proline from within this sequence (His-83 and Pro-84 in dmCLIC, Gln-77 and Pro-78 in EXC-4) bind an atom, which when modelled as a water molecule possesses temperature factors half that of the surrounding molecule. This atom also contains a weak anomalous signal (approximately 4σ) suggesting that it may be Mg2+ or more likely a Ca2+ ion.

In both structures this metal ion is also ligated by the side chain carbonyl group of an asparagine residue at the N-terminus of helix 3 (Asn-97 in dmCLIC and Asn-93 in EXC-4) and the backbone carbonyl group of the N’ residue (Leu-95 in dmCLIC and Thr-91 in EXC-4). One or two water molecules are also seen to interact with the metal ions in both structures (see Figure 6-15). In the structure of dmCLIC and the EXC-4 B subunit this putative Ca2+ ion is co-ordinated by 4 protein oxygen atoms and one water molecule, the A subunit of EXC-4 also co-ordinates a second water molecule.

As well as this Ca2+ ion, the crystal structure of dmCLIC contains an ion with a large anomalous signal that presumably corresponds to an iodide ion, which is present at a 200 mM concentration within the crystallisation buffer. This ion is located at a crystal-packing interface within a pocket formed between helix 5 and the N-terminus of helix 7 of one molecule and the C-terminus of helix 6 from a symmetry-related partner.

6.4 Discussion

6.4.1 Overview of the invertebrate CLIC structures

Although there are several salient differences between the invertebrate and vertebrate CLIC proteins, dmCLIC and EXC-4 are structurally closer to the CLICs than to any currently available GST fold family structures. Indeed, the Cα backbone traces are nearly identical (see Figure 6-4C). This high degree of similarity between the dmCLIC and EXC-4 structures and those of human CLIC1 and CLIC4(ext) suggests that these proteins share a common evolutionary ancestor. Whether this consequently means they also share a common cellular role is yet to be determined.

When comparing the invertebrate CLIC-like protein structures with the various GST classes with known structures (see Figure 1-9), it is seen that they are most closely related to the human omega class49, bacterial beta class19 and plant tau class GSTs106 and the bacterial SspA proteins92. These are the same relationships as those seen for the chordate CLICs.

6.4.2 The putative CLIC glutathione binding sites

In proteins containing the thioredoxin fold, the presence of an active site CxxC or related motif can confer a wide range of reducing potentials, allowing the motif-containing protein to act as either a reducing or oxidising agent205. This tunability of the redox potential of active site residues in the thioredoxin fold allows proteins within this family to fulfil a wide range of redox associated roles. The redox potential of the active site residues is partially dependent on the sequence of the CxxC or related motif. Proteins fulfilling similar redox functions will thus often maintain similar residues within this motif.

The CxxC or related motif within the vertebrate CLICs contains either one (CLIC1 and 4-6) or two (CLIC2 and 3) cysteines. The vertebrate CLICs predominantly contain either the sequence CPFS or CPFC (CPSC in CLIC3). The CxxC motifs of CLIC2 and CLIC3 are similar to the active site motifs found in the glutaredoxins (CPYC and CSYC in human Grx1 and Grx2 respectively)206, while the CxxS motif found in the remaining CLICs is similar to the evolutionarily related omega class GSTs (CPFA in human GSTO1 and CPYS in human GSTO2)121. These similarities originally led to the proposal that some CLIC family members may possess a thioltransferase activity, a characteristic of the glutaredoxins and to a lesser extent the omega class GSTs19. However in a recent paper by Board et al. no thioltransferase activity was detected for CLIC2. Instead CLIC2 displayed weak peroxidase activity to the two organic peroxides tested (t-butyl hydroperoxide and cumene hydroperoxide)36.
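The grouping of active-site motifs described above can be illustrated with a short sketch. It simply reduces each four-residue motif quoted in the text to its generic class (first and last residue, with x in between) and groups the proteins accordingly. The function names are illustrative only, and the EXC-4 entry (DLFC) is an assumption inferred from the statement that its DxxC motif is otherwise identical to the CLFC motif of dmCLIC.

```python
# Active-site tetrapeptides quoted in the text. The EXC-4 entry is inferred
# from the statement that its DxxC motif is otherwise identical to dmCLIC's
# CLFC motif; treat it as an assumption rather than a reported sequence.
MOTIFS = {
    "CLIC1": "CPFS",
    "CLIC2": "CPFC",
    "CLIC3": "CPSC",
    "dmCLIC": "CLFC",
    "EXC-4": "DLFC",  # assumed, see note above
    "Grx1": "CPYC",
    "Grx2": "CSYC",
    "GSTO1": "CPFA",
    "GSTO2": "CPYS",
}


def motif_class(tetrapeptide: str) -> str:
    """Reduce a four-residue motif to its generic class, e.g. CPFS -> CxxS."""
    if len(tetrapeptide) != 4:
        raise ValueError("expected a four-residue motif")
    return f"{tetrapeptide[0]}xx{tetrapeptide[3]}"


def group_by_class(motifs: dict[str, str]) -> dict[str, list[str]]:
    """Group protein names by the generic class of their active-site motif."""
    groups: dict[str, list[str]] = {}
    for protein, motif in motifs.items():
        groups.setdefault(motif_class(motif), []).append(protein)
    return groups


if __name__ == "__main__":
    for cls, proteins in sorted(group_by_class(MOTIFS).items()):
        print(cls, ", ".join(proteins))
```

Such a grouping only reflects the first and last positions of the motif; it says nothing about the redox potential, which also depends on the intervening residues and the surrounding microenvironment.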

Interestingly CLIC-like CPYS motifs are also found in the bacterial organic hydroperoxide resistance protein (Ohr)207 and the structurally related osmotically induced protein C (OsmC)205. While these proteins do not contain a thioredoxin-like fold, the reactive cysteine within the CxxS motif is, as in the CLICs, located at the N-terminus of an activating α-helix208,209. Ohr is a peroxidase expressed in response to organic peroxides and is partially responsible for bacterial hydroperoxide resistance207. OsmC from E. coli also functions as a peroxidase displaying higher turnover rates for organic over inorganic hydroperoxides; however, unlike Ohr, OsmC is induced in response to osmotic stress as opposed to exposure of the cell to external oxidants208. While the microenvironment around each CPYS motif is likely to fine-tune the redox potential of the reactive cysteine(s) in both protein folds, the similarity of the redox sites of the CLIC family and the Ohr/OsmC family may suggest a common affinity towards peroxides.

Like the vertebrate CLIC2 and CLIC3 family members, dmCLIC contains a CxxC motif within its putative glutathione binding site (see Figure 6-12). However in both the invertebrate structures a leucine residue replaces the proline seen in the vertebrate CxxC-like motifs that follows the first cysteine. Using a model peptide system Kortemme et al. have shown that, for a cysteine residue at the N-terminus of a helix, a following proline residue can act concurrently with the helix dipole moment to lower the cysteine’s pKa199. Structurally, in dmCLIC the proline to leucine replacement does not seem to greatly alter the conformation of the residues within the motif (see Figure 6-13). Future experiments will have to address how these differences between the active sites of the invertebrate CLIC-like proteins and the vertebrate CLICs alter the redox potential of this site.

EXC-4 is unique within the CLICs in that instead of the CxxC motif seen in dmCLIC the first cysteine is replaced by an aspartic acid (see Figure 6-12 and Figure 6-13C). If EXC-4, like CLIC1, is able to resolve the disulphide bond in oxidised glutathione, the question arises: does the N-terminal cysteine within this motif always serve as the nucleophilic attacking group? Fomenko et al. have raised this possibility previously, in a study of CxxC derived motifs where the N-terminal cysteine is replaced by a serine or threonine in proteins that nonetheless retain redox activity210. In the currently known nematode sequences the Caenorhabditis species appear to be unique in possessing this DxxC motif within EXC-4, while other known nematode EXC-4 proteins possess a standard CxxC motif similar to dmCLIC (see Insert 1). However, all currently known nematode EXL-1 sequences contain a DxxC-like motif similar to that of EXC-4 in C. elegans.

6.4.3 Helix 6 N-capping motif and hydrophobic staple

In addition to the CxxC motif at the N-terminus of helix 1, dmCLIC also contains the sequence 172-CCFDC-176 where Cys-172 is the N-terminal capping residue of helix 6 (see Figure 6-14). Cys-173 and Cys-176 are located at the N-terminus of helix 6 and possess a CxxC like sequence, however in the dmCLIC structure their sulphydryl groups are orientated in different directions making them an unlikely redox couple. Cys-176 is one of the highly conserved cysteines in the CLICs (Cys-178 in CLIC1).

GST fold family structures contain a conserved N-terminal helical capping motif in helix 6 as well as a tight bend in the loop preceding this helix. These two elements form a small substructure (the GST motif II), which is conserved in all GSTs and related proteins. This motif is buried within the core of the protein and typically has the sequence Gxxh[T/S]xxDh, where x is any residue, h is hydrophobic and the underlined residues are at the N-terminus of helix 6 117. This conserved element plays an important role in the α-helical nucleation and stability of helix 6, a structural element whose helical content is purported to be an early requirement of the GST folding process211. In the CLICs the hydrophobic residue following the conserved aspartic acid within this sequence is one of the conserved cysteines (Cys-178 in CLIC1). However the hydrophobic residue in the N’ position is a leucine or phenylalanine (Leu-175 in CLIC1) in CLICs 1-3 and 5-6 and a methionine in CLIC4, dmCLIC and EXC-4. This conserved cysteine and N’ residue form part of the hydrophobic core of the protein, and in some CLICs also a hydrophobic staple interaction that stabilises the helix.

In the non-arthropod CLICs the helix 6 cap and preceding loop that form the GST motif II, adopt the classical GST-like structure115. In this structure the conserved N3 aspartic acid in helix 6 hydrogen bonds with the backbone nitrogen and hydroxyl group of the N-cap serine or threonine residue and the backbone nitrogen of one of the residues within the loop. The aspartic acid also co-ordinates a structural water molecule at the centre of the loop that forms hydrogen bonds with backbone atoms from two other loop residues and the hydroxyl group of the N-cap serine or threonine. The conserved glycine residue within the Gxxh[T/S]xxDh sequence forms a hydrogen bond across the loop structure, while the two hydrophobic residues form the hydrophobic staple interaction discussed above. Mutation of the conserved residues within the [T/S]xxD capping box114, pre-helix loop structure115 and hydrophobic cluster (h)117 have all been shown to have deleterious effects on the stability of the GST fold.

In the dmCLIC structure, and by homology presumably all currently known arthropod CLICs, the conserved helix 6 N-cap serine or threonine residue is replaced by a cysteine (Cys-172 in dmCLIC) whose sulphydryl group does not hydrogen bond with the structural water molecule at the centre of the pre-helix loop. Instead a threonine residue (Thr-167) on the other side of the loop, preceding the conserved glycine residue fulfils this role (see Figure 6-14). It is interesting to note that the reverse, a threonine N-cap and a cysteine residue prior to the conserved glycine, is observed in the more standard GST motif II sequences found in the crystal structures of the bacterial beta class GSTs from P. mirabilis89 and E. coli189,190.

In the dmCLIC structure the sulphydryl group of this N-cap cysteine is in close proximity to several groups involved in interactions stabilising the conserved pre-helix loop substructure. The integrity of the helix 6 N-cap and preceding loop has previously been shown to influence the stability of the GST fold114,115,117. If the arthropod CLICs and the bacterial beta class GSTs are regulated through redox signals, oxidation of cysteine residues within this sub-structure may represent a means of transducing these signals into structural changes within their respective GST-like folds.

In line with this, in one subunit of the dimer in the E. coli beta class GST structure (PDB code 1N2A)189, Cys-147 has undergone a peptide flip with respect to the other subunit that adopts the more classical conformation for this residue in the GST motif II. This results in a rearrangement of the hydrogen bond network around the structural water at the centre of the pre-helix loop. In addition, in this subunit the three residues preceding the loop are disordered and the usual hydrogen bond made by the conserved glycine appears not to be formed. The authors report that the differences between the two subunits do not appear to be due to lattice interactions but instead may reflect inherent conformational heterogeneity189. While the changes seen are small, this demonstrates that the local conformation of a cysteine residue within this conserved motif may be associated with more global changes within the GST fold.

Cocco et al. have previously noted that the standard GST motif II subdomain sequence Gxxh[T/S]xxDh is present in most proteins with a GST-like fold117. A search for proteins containing the alternate dmCLIC-like sequence [S/T]GxxhCxxDh using ScanProsite212 did not reveal any other proteins known to be related to the GSTs displaying this consensus sequence. However a search for CGxxh[T/S]xxDh found several GST-like proteins containing this sequence including, as well as those discussed above, two dichloromethane dehalogenases from methylotrophic bacteria (P21161 and P43387) and all currently known ganglioside-induced differentiation-associated protein 1 (GDAP1) and GDAP1-like 1 proteins. The function of the GDAP1 proteins is currently unknown, but mutations within their genes have been shown to generate autosomal-recessive Charcot-Marie-Tooth type 4A neuropathy. They have been predicted to form a GST-like fold and, like the CLICs, contain several putative transmembrane regions (see Marco et al. and references therein)213.
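The consensus sequences used in these searches can be made concrete with a minimal sketch that translates the shorthand of the text into regular expressions. This is a local illustration only and does not reproduce the ScanProsite interface or its pattern syntax. The set of residues counted as hydrophobic (h) is an assumption, since the text does not define it, and the two test fragments are hypothetical: only the CCFDC stretch corresponds to the dmCLIC sequence quoted in Section 6.4.3.

```python
import re

# Assumed residue set for the "h" (hydrophobic) positions; the text does not
# define it, so this choice is illustrative only.
HYDROPHOBIC = "AVLIMFWC"


def to_regex(pattern: str) -> re.Pattern:
    """Translate the shorthand used in the text into a regular expression.

    'x' matches any residue, 'h' matches one residue from HYDROPHOBIC,
    '[S/T]' matches S or T, and any other letter matches itself.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            parts.append("[" + pattern[i + 1:end].replace("/", "") + "]")
            i = end + 1
            continue
        if ch == "x":
            parts.append(".")
        elif ch == "h":
            parts.append(f"[{HYDROPHOBIC}]")
        else:
            parts.append(ch)
        i += 1
    return re.compile("".join(parts))


STANDARD = to_regex("Gxxh[T/S]xxDh")        # standard GST motif II
DMCLIC_LIKE = to_regex("[S/T]GxxhCxxDh")    # alternate dmCLIC-like form
CG_VARIANT = to_regex("CGxxh[T/S]xxDh")     # cysteine preceding the glycine

# Hypothetical toy fragments, not real protein sequences.
TOY_DMCLIC_LIKE = "AATGAAMCCFDCAA"
TOY_STANDARD = "AAAGAALSAADLAA"

if __name__ == "__main__":
    for name, rx in [("standard", STANDARD), ("dmCLIC-like", DMCLIC_LIKE)]:
        for seq in (TOY_DMCLIC_LIKE, TOY_STANDARD):
            hit = rx.search(seq)
            print(name, seq, hit.group() if hit else None)
```

A cysteine at the N-cap position fails the [T/S] position of the standard pattern, which is why the dmCLIC-like sequence requires its own consensus.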

In all currently known nematode EXL-1, and some nematode EXC-4 proteins (but not C. elegans) the conserved glycine residue within the pre-helix loop also appears able to be replaced by either a serine or an alanine. The substitution of this glycine with alanine or valine has previously been shown to affect the structure of this loop region and overall stability of the hGST P1-1 protein115. Thus the CLIC proteins, which are unusual but nonetheless clearly part of the GST fold family, demonstrate that the residues tolerated within these conserved features of the canonical GST fold are likely to be more diverse than previously reported.

6.4.4 Cis-proline loop

Compared to the GSTs the CLIC proteins appear to show a low affinity for glutathione3,36. This is reflected in the crystal structure of CLIC1 complexed with glutathione under oxidising conditions (PDB code 1K0N), where relatively few interactions are made between the protein and the GSH molecule (see Figure 6-16). In this complex, several glutathione groups that usually form interactions with the protein within the G-site of most GSTs do not do so in CLIC1. These include the carbonyl group of the cysteinyl moiety and the backbone amino and carbonyl groups of the γ-glutamyl moiety3 (see Figure 6-16).

Figure 6-16 Residues interacting with GSH in the CLICs and various classes of the GSTs. The residues that interact with each glutathione group are shown using single letter amino acid code. The proteins these residues are from are shown at top left. Residue assignments are based on the apo crystal structures of dmCLIC, EXC-4 and CLIC4 and the GSH or analogue complexes of: CLIC1 3, human GST O1-1 49, P. mirabilis GST B1-1 89, human GST P1-1 72, human GST T2-2 101 and the A. dirus GST 1-3 214. Figure based on Figure 4 p361 of Cromer et al.19.

In the thioredoxin-like N-terminal domain of the GST fold the loop region connecting helix 2 to β-strand 3 contains a conserved cis-proline residue. This residue has been shown to be important for the structural stability and catalytic activity of GST A1-1 188. In the G-site of most GSTs this cis-proline ensures a conformation that allows the backbone carbonyl and nitrogen amide groups of the preceding residue to form β-sheet-like hydrogen bonds with the nitrogen amide and carbonyl groups of the cysteinyl moiety of GSH (see Figure 6-16).

However in the CLIC1:GSH complex, no interaction is seen to occur between the protein backbone nitrogen amide group and the carbonyl group of the GSH cysteinyl moiety. It is unlikely that such an interaction occurs in any of the CLICs. The cis-proline loop of CLIC3 is similar to that of CLIC1 and presumably adopts the same structure, while the remaining vertebrate CLIC family members possess a second proline residue prior to the cis-proline. While not present in CLIC1, in the CLIC4(ext) and dmCLIC structures a histidine residue precedes this second proline in the sequence THPP leading into β-strand 3. An alignment of these structures with the CLIC1:GSH complex suggests that the pros nitrogen of this histidine's imidazole ring may replace the GST nitrogen amide backbone interaction to the carbonyl group of the GSH cysteinyl moiety (see Figure 6-15). In CLIC2 and CLIC6 an asparagine residue, and in CLIC1, CLIC3 and EXC-4 a glutamine residue, could also fulfil a similar role. Unfortunately, due to a PCR mutation during the original cDNA isolation of CLIC1 25, the structure of the CLIC1:GSH complex3 was solved using a mutant where this glutamine was a glutamate (i.e., CLIC1(Q63E)). This could explain the lack of protein interactions to the carbonyl group of the GSH cysteinyl moiety in this structure. To address this possibility the structure of the wild type CLIC1:GSH complex needs to be solved.

Both the EXC-4 and dmCLIC structures contain what is most likely a Ca2+ ion situated at the N-terminus of helix 3 between β-strands 3 and 4 (this is shown for dmCLIC in Figure 6-15). In both structures this ion is ligated by the side chain carbonyl oxygen of an N-terminal asparagine residue in helix 3 as well as the backbone carbonyl oxygens of the N’ residue of helix 3, the cis-proline at the N-terminus of s3 and the residue two amino acids prior to this proline. In this putative calcium-binding site, the asparagine residue at the N-terminus of helix 3 (Asn-97 in dmCLIC and Asn-93 in EXC-4) is equivalent to Thr-77 in CLIC1, the residue whose backbone amide nitrogen and hydroxyl groups bind the α-carboxyl group of the GSH γ-glutamyl moiety in the CLIC1:GSH complex. Additionally the two residues within the cis-proline loop ligating the calcium ion sandwich the residue that forms a backbone interaction with the GSH cysteinyl moiety and, as discussed in the previous paragraph, one of these residues (His-83 in dmCLIC, Gln-77 in EXC-4) may also interact directly with GSH. Thus several of the residues making up what is likely to be the G-site of the dmCLIC and EXC-4 structures are also involved in the formation of an ion binding site, perhaps indicating an allosteric interaction between calcium and glutathione binding.

In the vertebrate CLICs this cis-proline loop contains one of two sequences: PG[T/S]QLP in CLICs 1 and 3, and PGGT[H/N]PP in the remaining CLICs. The THPP sequence of dmCLIC is similar to the loop region leading into β-strand 3 in the latter set of sequences and these residues adopt identical conformations in the dmCLIC and CLIC4(ext) structures. EXC-4 also has a double-proline containing loop and, although displaying no other sequence similarity, it is also structurally similar to dmCLIC and CLIC4(ext) in this region. In this region of the protein dmCLIC and EXC-4 both bind a probable calcium ion, but despite the similarities in sequence and structure there is no evidence of a similar ion within the equivalent position in CLIC4(ext). It is noted, though, that the CLIC4(ext) crystals were grown in 0.2 M NH4F while the dmCLIC crystals and EXC-4 crystals were grown in NaI or HCOONa. While the calcium ion concentrations were not specifically maintained during any of the protein purifications, thrombin cleavage was carried out in 2.5 mM CaCl2 so all three proteins had equally sufficient opportunity to scavenge metal ions. The solubility product constant (Ksp) of CaF2 is 5.3 x 10^-9 and as such the [Ca2+] in the CLIC4(ext) crystallisation conditions must be less than 2.7 x 10^-8 M. The Ksp of CaI2 or Ca(HCOO)2 is ~2 x 10^9 times that of CaF2 and the [Ca2+] may therefore be significantly higher in the dmCLIC and EXC-4 crystallisation conditions. If the bound ion is instead magnesium, the magnesium fluoride salt is also significantly less soluble than its iodide or formate counterparts. The CLIC4(ext) crystal structure can therefore conceivably be considered a low calcium state of CLIC4, while the structures of dmCLIC and EXC-4 may have retained sufficient quantities of Ca2+ from purification to represent, at least partially, the calcium bound state. Further structures of dmCLIC, EXC-4 and CLIC4 obtained under calcium-supplemented and calcium-chelating conditions may be useful in elucidating how calcium ion concentrations affect these proteins.
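The solubility argument can be checked with a short calculation. The sketch below assumes that the 0.2 M NH4F is fully dissociated, that activities equal concentrations, and that dilution in the crystallisation drop is ignored; the function name is illustrative. With the stoichiometric expression Ksp = [Ca2+][F-]^2 the upper bound on free calcium is about 1.3 x 10^-7 M, whereas the figure of 2.7 x 10^-8 M quoted in the text corresponds to dividing Ksp by [F-] only once. Either way the bound is sub-micromolar, so the qualitative conclusion is unchanged.

```python
KSP_CAF2 = 5.3e-9   # solubility product of CaF2, as quoted in the text
FLUORIDE_M = 0.2    # 0.2 M NH4F, assumed fully dissociated (assumption)


def max_free_calcium(ksp: float, fluoride: float, exponent: int = 2) -> float:
    """Upper bound on [Ca2+] in mol/L at saturation: ksp / fluoride**exponent.

    exponent=2 is the stoichiometric form for CaF2, Ksp = [Ca2+][F-]^2.
    exponent=1 reproduces the figure quoted in the text.
    """
    return ksp / fluoride ** exponent


stoichiometric = max_free_calcium(KSP_CAF2, FLUORIDE_M)      # Ksp / [F-]^2
first_power = max_free_calcium(KSP_CAF2, FLUORIDE_M, 1)      # Ksp / [F-]

if __name__ == "__main__":
    print(f"Ksp / [F-]^2 = {stoichiometric:.2e} M")
    print(f"Ksp / [F-]   = {first_power:.2e} M")
```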

An analysis of the potential CLIC4(ext) Ca2+ binding site shows that the asparagine residue in helix 3 that co-ordinates the calcium ion in dmCLIC and EXC-4 corresponds to a conserved valine residue in the vertebrate CLIC sequences containing the PGGT[H/N]PP loop sequence. Also, in the CLIC4(ext) structure the carbonyl group of the residue equivalent to Leu-95 in dmCLIC is orientated away from the putative calcium ion site. Thus, if this calcium binding site is a general feature of the CLIC proteins containing the double-proline loop sequence, then the residues at the N-terminus of helix 3 in the CLIC4(ext) structure must undergo a degree of rearrangement to accommodate ion binding.

In support of a common calcium related role of the CLIC proteins containing the double proline loop sequence, three CLIC family members containing this sequence (CLICs 4, 5 and 6) have been maintained within the ACD gene cluster, which also includes members of a calcineurin inhibitor family55. Additionally CLIC2 has recently been shown to downregulate a Ca2+ release channel36.

6.4.5 N- and C-terminal extensions

The dmCLIC and EXC-4 structures both have additional N- and C-terminal extensions to their CLIC modules that are not found in the vertebrate CLICs. A proline residue at the start of β-strand 1 delimits the N-terminal side of the CLIC module. At the N-termini of the CLICs there appears to be little to no conservation of residues prior to this proline. These N-terminal regions are presumably relatively flexible, as in the CLIC1, CLIC4(ext) and EXC-4 structures the electron density begins near this point. The dmCLIC structure has the most residues visible prior to this proline; however, this is possibly due to their being stabilised by fortuitous packing interactions.

In contrast to the highly variable CLIC N-termini, the C-terminal extensions to the CLIC module show a greater degree of conservation. Following a highly conserved tyrosine at the C-terminus of helix 9 (see Insert 1), the C-termini of the two vertebrate CLIC structures extend towards the top of the C-terminal domain, passing between the loop connecting h4 and h5 on one side and the C-terminus of h6 on the other. Excluding the CLIC3 orthologues, following this conserved tyrosine most of the vertebrate CLICs have the sequence YxxVAK[x]₃₋₅-COOH. This sequence similarity suggests that the vertebrate CLICs all have very similar structures near their C-termini.

In contrast to the short vertebrate C-termini, the dmCLIC and EXC-4 structures both have much longer extensions; these are discussed in Section 6.3.4 and shown in Figure 6-10 and Figure 6-11. These extensions consist of a loosely structured extended loop region that, instead of extending towards the top of the C-terminal domain, reverses direction after h9, then crosses over the C-terminus of h8 before running along the length of h7 and finally turning perpendicular to h7 near its N-terminus. Within these extensions most of the interactions with the body of the molecule occur via backbone atoms, placing few constraints on the residues within this sequence. However, in most nematode EXC-4 proteins and arthropod CLICs, around 15 residues after the conserved h9 tyrosine is the sequence PxxxxxIP, where the undefined (x) residues between the prolines are often serine or threonine (see Insert 1). The first proline in this sequence occurs at the bend where the C-terminal extension crosses over h8; it is present in all but two of the known EXC-4 proteins and arthropod CLICs. The isoleucine and second proline are more highly conserved; these occur where the C-terminal extension turns perpendicular to h7, where they pack against a conserved tryptophan residue within the helix. From the presence of these residues, it appears that C-terminal extensions like that of EXC-4 or dmCLIC are present in all the arthropod CLICs; the nematode EXC-4, but not the EXL-1, proteins; and the platyhelminth CLIC-like proteins. It is not known what role these C-terminal extensions play.
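The two C-terminal sequence signatures described above can be written as simple search patterns. The following is a minimal sketch in Python, assuming sequences are given in one-letter amino acid code. The function name and both test sequences are hypothetical and were invented for illustration; they are not real CLIC sequences.

```python
import re

# YxxVAK[x]3-5COOH: conserved tyrosine, two arbitrary residues, VAK,
# then 3 to 5 arbitrary residues running to the C-terminus (end of string).
VERTEBRATE_C_TERMINUS = re.compile(r"Y..VAK.{3,5}$")

# PxxxxxIP: proline, five arbitrary residues, isoleucine, proline.
INVERTEBRATE_EXTENSION = re.compile(r"P.{5}IP")


def find_c_terminal_motifs(sequence):
    """Return the 0-based start index of each motif, or None if absent."""
    seq = sequence.upper()
    vertebrate = VERTEBRATE_C_TERMINUS.search(seq)
    invertebrate = INVERTEBRATE_EXTENSION.search(seq)
    return {
        "YxxVAK[x]3-5": vertebrate.start() if vertebrate else None,
        "PxxxxxIP": invertebrate.start() if invertebrate else None,
    }


# Hypothetical toy sequences, invented for illustration only.
toy_vertebrate_like = "AAAYQEVAKALKK"
toy_invertebrate_like = "GGPSTSSTIPGG"
```

A match of this kind indicates only that the signature is present in the sequence; whether the corresponding region adopts an EXC-4-like or dmCLIC-like conformation would still require structural confirmation.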

6.4.6 Crystallographic dimer

As shown in Figure 6-3, EXC-4, like the other CLICs, was purified as a soluble monomer under reducing conditions in size exclusion chromatography. However, in the P2₁ EXC-4 crystal form, there are two molecules in the asymmetric unit that produce a crystallographic dimer. While caution should be taken in inferring physiological significance from such interactions observed in protein crystals, there are several interesting aspects to this dimer. Firstly, the accessible surface area buried upon interaction of EXC-4 monomers is large, ~1200 Å² per monomer. Although highly polar, this buried surface area is larger than that observed in the oxidised CLIC1 dimer (~1100 Å² per monomer) and the omega class GSTs (~980 Å² per monomer). The arrangement of the EXC-4 crystallographic dimer also bears some resemblance to that seen in the GSTs in that it consists of a rough antiparallel arrangement of monomers, albeit further offset in EXC-4 compared to the GSTs (compare Figure 1-10 and Figure 6-5). This offset means that most of the interactions in the EXC-4 crystallographic dimer occur between the two copies of helix 3 and between helix 2 from one subunit and helix 4 and helix 9 of the other. In contrast, in the GSTs the dimer interface is typically built from interactions between residues in helix 3 and β-strand 4 from one subunit, and helix 4 from the other subunit, as well as between residues within the cis-proline loop of one subunit and helices 4 and 5 of the other.

6.4.7 Helix 4

In the structures of dmCLIC, CLIC1 and CLIC4(ext) helix h4 is split into two halves: h4a and h4b (see Figure 6-9C and D). In these structures a single turn of the helix appears highly extended forming an S-shaped loop structure between the two halves of the helix. Additionally the residues within helices h4a and h4b leading into this loop show a small degree of helix fraying (more so in the dmCLIC structure). Although the backbone structure is similar, the interactions stabilising this disruption in the dmCLIC and vertebrate CLIC structures differ.

In the vertebrates, the S-shaped loop is stabilised by an aspartate residue (Asp-109 in CLIC1) within the upper bulge of the loop that forms a hydrogen bond to its own peptide nitrogen.

In dmCLIC, instead of an aspartate stabilising the loop, His-150 from helix 5 inserts into the lower bulge of the S-shaped loop and forms hydrogen bonds to the backbone carbonyl groups of Leu-124 at the C-terminus of h4a and Glu-126 within the bulge (see Figure 6-17A). Also, as seen in the vertebrates, this helix-destabilising residue interacts with the side chain of a lysine (Lys-131) at the N-terminus of h4b.

In contrast to dmCLIC, as discussed in Section 6.3.4 and shown in Figure 6-9C and Figure 6-17B, in the EXC-4 crystal structure h4 exists as a single continuous helix. However, despite these differences in backbone conformation, the residues in helix 4 that are most highly conserved between the dmCLIC and EXC-4 sequences are those within the structurally divergent S-shaped loop region connecting the two halves of helix 4 in dmCLIC. Indeed, all the residues within this S-shaped loop, as well as the immediately adjacent C- and N-terminal residues of h4a and h4b, are identical (125-IENLY-128 in dmCLIC). It is unclear why these residues would remain conserved despite significant changes in secondary structure (see Figure 6-17C), especially while the remaining portion of h4 that is helical in both structures shows little to no conservation.

Figure 6-17 Structure of helix 4 in dmCLIC and EXC-4. A: The residues within, and those that stabilise, the S-shaped loop between helices 4a and 4b in the dmCLIC structure are shown. B: The equivalent region in the EXC-4 structure is shown. C: A Ramachandran plot for the conserved residues common to both structures is shown (see text); labelled residues are in dmCLIC only.


One hypothesis deserving of further experimental tests is the possibility that each protein can adopt both structures; that is, h4a and h4b in dmCLIC are able to condense into a single helix under certain conditions and vice versa for EXC-4. If this were true, the residues within the S-shaped loop would be under greater selection pressures, as they must be compatible with two conformations and retain the ability to switch between them. What could this hypothetical transformation affect? Several possibilities exist. The simplest of these is regulation of the orientation and position of the conserved tyrosine residue on the C-terminal side of the S-shaped loop; the hydroxyl group of this tyrosine lies near the putative GST-like active site. A second possibility is to influence the position of a conserved glutamate within the loop; it has been noted by others19 that this residue may play a role in GSH binding if the CLICs were to form classical GST dimers. The third and most speculative possibility is that the EXC-4 crystallographic dimer is physiologically relevant and the differences in h4 mediate changes at this dimer interface that affect oligomerisation.

If such a structural transformation does occur, many factors would presumably influence switching between the two states. An obvious candidate for one of these factors can be seen in dmCLIC. As discussed above, in this structure His-150 inserts between h4a and h4b, forming hydrogen bonds with two backbone carbonyl groups from within the S-shaped loop and interacting with a lysine side chain from h4b (see Figure 6-17A). If the pH of this histidine's local environment dropped below its pKa it would become positively charged. The repulsion between this positive charge and that of the interacting lysine may be sufficient to displace the histidine side chain from the S-shaped loop and prevent it from forming its stabilising interactions. This could perhaps lead to the two halves of h4 condensing into a single helix.
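The pH dependence proposed above can be made semi-quantitative with the Henderson-Hasselbalch relationship. The sketch below is illustrative only: the function name is hypothetical, and the pKa of 6.0 is an assumed, generic value for a histidine side chain, not a measured value for His-150, whose pKa within the protein environment is unknown.

```python
def fraction_protonated(ph, pka):
    """Fraction of a titratable group in its protonated form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed pKa for illustration only; not a measured value for His-150.
ASSUMED_HIS_PKA = 6.0

for ph in (5.0, 6.0, 7.0):
    charged = fraction_protonated(ph, ASSUMED_HIS_PKA)
    print(f"pH {ph:.1f}: fraction protonated = {charged:.2f}")
```

Under this assumption roughly 90% of the histidine population would carry a positive charge one pH unit below the pKa, against roughly 10% one unit above it, so a modest local acidification could in principle shift the balance between the two proposed h4 conformations.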

6.4.8 DmCLIC and EXC-4 hydropathy

One of the major stumbling blocks in accepting the CLIC proteins as ion channels has been the lack of any obvious primary structure compatible with a transmembrane spanning topology. Two mildly hydrophobic putative transmembrane regions have been proposed for the CLICs and the residues within both these regions are highly conserved throughout the chordate family members. The first of these putative TM regions comprises residues contained within h1 and s2 of the soluble CLIC GST-fold; the second includes residues from within h6 and the loop region preceding it. Covering 23 to 25 residues, these two putative transmembrane regions are sufficiently long to form membrane spanning helices215. However, neither is particularly hydrophobic, as a third to a half of their residues are charged or polar (see Figure 6-18). Any transmembrane state utilising these putative TM regions would thus be required to bury these charged and polar residues within the membrane. Most transmembrane prediction programs do not identify either of these regions as lying within a membrane. It has been noted by Cromer et al. that the putative CLIC TM regions are not excessively more hydrophobic than the structurally equivalent regions of most GSTs19. If CLIC1, CLIC4, dmCLIC and EXC-4 all form transmembrane states utilising these putative TM regions, their moderate hydrophobicity should, at the very least, be maintained throughout evolution.

Figure 6-18 Kyte-Doolittle hydropathy plot of various CLICs around their putative transmembrane regions. A Kyte-Doolittle hydropathy plot216 produced using the EMBOSS program suite217 is shown for CLIC1 (blue), CLIC4 (green), dmCLIC (magenta) and EXC-4 (black). Classical transmembrane domains typically produce peaks extending above the red line. The alignment was produced using Clustal W120 with manual editing; the hydropathy plot was produced with a window length of 9 residues.

A plot of the Kyte-Doolittle hydropathy for the residues within these putative TM regions is shown in Figure 6-18. The first putative TM region is most hydrophobic in CLIC4 (in green); CLIC1 (in blue) has a similar profile but three substitutions within this region make it less hydrophobic than CLIC4 overall. For both dmCLIC (in magenta) and EXC-4 (in black) this region is slightly less hydrophobic than in CLIC1. The invertebrate sequences also have insertions at the C-terminus of h1 with respect to the chordates; these insertions include several charged residues. The hydrophobic nature of the second putative TM domain centred around h6 is less conserved across species. The closely related vertebrate CLIC1 and CLIC4 proteins both show very similar peaks in hydrophobicity following their negatively charged footloops (see Figure 6-18); however, dmCLIC and especially EXC-4 are much more polar in this region. If the invertebrate CLIC homologues perform the same functions as the chordate CLICs, the non-conserved nature of this second putative TM region suggests it is unlikely to be a major part of any integral membrane structure. The degree of conservation for the first putative TM region is much higher and therefore does not rule out its involvement, although in the invertebrate sequences this region is slightly more polar than in the chordate CLICs.
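A hydropathy profile of the kind discussed above is a sliding-window average of per-residue hydropathy values. The following minimal Python sketch uses the published Kyte-Doolittle scale and the 9-residue window quoted for Figure 6-18. The function name is hypothetical, and this is not the EMBOSS implementation used to produce the figure, so the handling of sequence ends and gaps may differ.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle hydropathy for every full window along a sequence.

    Returns one value per window position (len(sequence) - window + 1 values).
    Non-standard residue codes raise KeyError rather than being guessed.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]
```

Applying such a function to the aligned regions of CLIC1, CLIC4, dmCLIC and EXC-4 should give profiles comparable in shape to those plotted, although the absolute values will depend on the window length chosen.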

In conclusion, in a protein capable of existing in both soluble and membrane bound forms, the most stringent selection constraints would be expected to be imposed on those portions of the protein that must transform from the soluble form into membrane spanning elements in the transmembrane form. Such conservation is not observed for the putative TM region near h6, but is to some extent near h1 and s2. However, alone, such conservation is not sufficient to definitively prove this is a TM region.

6.4.9 Conclusions

In this chapter, the crystal structures of the CLIC homologues from two model organisms were presented. These were dmCLIC from D. melanogaster and EXC-4 from the nematode C. elegans. Structurally, these proteins are highly similar to human CLIC1 and CLIC4, presented in Chapters 2 and 4 respectively. DmCLIC and EXC-4 appear to be more closely related to the CLICs than to any other currently known proteins with the GST-fold, suggesting a direct evolutionary relationship. Alone, this does not necessarily imply conservation of function. As discussed in Chapter 4, the structural differences between CLIC1 and CLIC4 are mainly confined to the loop region between h2 and s3. The invertebrate CLIC homologues have a CLIC4-like sequence within this region and thus appear to be more closely related to CLIC4 than CLIC1. However, similar C-terminal extensions and several common insertions and deletions within their respective CLIC modules mean EXC-4 and dmCLIC are more closely related to each other than to either of the chordate CLIC structures.

The soluble structures of EXC-4 and dmCLIC have several minor differences compared to the vertebrate CLICs. These include a C-terminal extension that runs away from the GSH binding site; this mainly forms an extended loop structure but in EXC-4 also contains a short α-helix. The two invertebrate CLIC structures also contain a short insertion at the C-terminus of h1; this is contained within one of the putative TM regions. Unlike the chordate CLICs, neither of the invertebrate CLICs possesses a flexible footloop between h5 and h6. However, in the nematode EXC-4 proteins a similar flexible loop is inserted between h4 and h5.

Unlike the GSTs, the CLICs display a relatively low affinity for glutathione. Perhaps explaining this, an analysis of the crystal structure of CLIC1 suggested that many of the interactions normally made between GSH and the protein's active site in the GSTs are unable to be formed in the CLICs. In the crystal structures of both dmCLIC and EXC-4 that were presented in this chapter, a bound Ca2+ ion is found close to the putative GSH-binding site. From this it was hypothesised that calcium binding may restore some of the missing protein:GSH interactions and thus may be an allosteric effector of GSH binding in the invertebrate CLIC-like proteins and possibly also in the vertebrate CLICs. It will be interesting to see if future experiments confirm or refute this hypothesis.

If the chordate CLICs are proven to be chloride ion channels, the question of whether the invertebrate homologues perform a similar function then arises. While their sequences are quite divergent, on a structural level there appear to be few major differences between the soluble forms of the vertebrate CLICs and dmCLIC or EXC-4. In the chordate CLICs, two putative transmembrane regions with weak hydrophobicity have been predicted, but only one of these regions is of a similar hydrophobicity in the invertebrate homologues. Even then, this region is more hydrophilic in the invertebrate sequences. Future experiments examining whether dmCLIC and EXC-4 can form ion channels in vivo and in vitro are an obvious and necessary extension to current work.

Chapter 7 Discussion

7.1 Introduction

In this chapter the final conclusions of the thesis are re-visited, a summary of what is, and what is not known about the CLIC family is presented, and finally, suggestions for the direction of future work are made.

7.2 Conclusions from this thesis

In this thesis the CLIC protein family was examined from a structural point of view. The function the CLICs play within the cell is unknown, but several hypotheses have been proposed. One hypothesis is that they may function as chloride ion channels within intracellular membranes. However, the CLICs are predominantly soluble proteins, and if they are to act as ion channels they must be able to transfer between a soluble and a membrane inserted state. The crystal structure of a soluble form of CLIC1 had previously been solved and this was seen to adopt a GST-fold. In this thesis four new crystal structures of soluble CLIC forms are reported. Conclusions are made about the ion channel hypothesis and future research directions are suggested.

Four new crystal structures were presented. Three of these were similar to the previously solved soluble CLIC1 structure, but the fourth was of a drastically different, albeit still soluble, dimeric state. By comparing these four structures, several insights into the nature of the CLICs were made but the question of how, or indeed if, the CLICs form ion channels is still to be resolved.

Chapter 1 summarised and interpreted the current scientific literature on the CLIC family. Chapter 3 then looked at the sequence conservation of the CLICs, showing that the family originated early during the evolution of animals. The six human family members were seen to have most likely originated during a series of gene duplications early in the evolution of the vertebrates. In this chapter it was shown that the vertebrate CLIC family members can be categorised into two smaller sub-groups. One sub-group consists of CLICs 4, 5 and 6. These are all highly related and contained within a conserved cluster of genes, and as such they are likely to have been formed as part of a large-scale segmental genome triplication. The other sub-group of CLICs is more internally diverse.

The soluble monomeric crystal structure of CLIC1 had been solved prior to this work3 and was analysed in Chapter 2. For the sake of comparison, in this thesis the equivalent structure for CLIC4 was presented in Chapter 4, and those of dmCLIC and EXC-4 in Chapter 6. These proteins all adopted highly similar soluble monomeric states that belong to the GST-fold superfamily. The structures of CLIC1, CLIC4, dmCLIC and EXC-4 show that these proteins are more closely related to each other than they are to other GST-fold family members and hence are likely to have a direct evolutionary relationship.

In contrast to the glutathione transferases, the CLICs are monomeric, possess a cysteine residue within their active site, and have low affinities for glutathione. These characteristics are shared by a subgroup of proteins with the GST-fold that do not have glutathione transferase activity.

In Chapter 5 the crystal structure of an oxidised dimeric form of CLIC1 was solved. The CLIC1 dimer was formed in vitro after oxidation with H2O2; this resulted in the formation of an intramolecular disulphide bond and the re-folding of the N-terminal thioredoxin domain. The structural re-arrangement that is concomitant with the oxidative dimerisation of CLIC1 results in the loss of the GST-fold's β-sheet; the hydrophobic surface thus exposed forms the new dimer interface. This new dimeric CLIC structure was interesting, as it showed that the CLIC module's GST-fold was capable of large-scale structural changes. The formation of the intramolecular disulphide bond was seen to be necessary for the oxidative dimerisation of CLIC1, but as one of the cysteine residues in this bond is unique to CLIC1 it appears unlikely that other CLIC family members are capable of dimerising in a similar manner. Indeed, CLIC4, dmCLIC and EXC-4 were not seen to dimerise under equivalent oxidising conditions. However, an analysis of the regions that differ between the two CLIC1 forms shows that the residues involved are largely conserved throughout the CLIC family. Thus it is conceivable that other CLIC family members may be capable of forming a structure similar to that of the CLIC1 dimer in response to alternative signals. Hypotheses about possible physiologically relevant consequences of the CLIC1 dimer were made, but await future work for confirmation. These are the major experimental conclusions from this thesis.

7.3 The evidence for the CLICs as ion channels

How does the sum of these conclusions affect the argument for, or against, the hypothesis that the CLICs act as ion channels? Table 7-1 summarises some of the factors and experiments relevant to this hypothesis.

Table 7-1 Summary of the arguments for and against the CLICs acting as ion channels. Each cell of the table gives a brief description of arguments for (green), against (red) or neutral to (blue) the hypothesis that the CLICs act as ion channels. Also shown are any mitigating factors to, or definitive conclusions that can be drawn from each argument.

| Evidence for or against the CLICs as ion channel proteins | Definitive conclusions able to be drawn from this experiment | Mitigating factors |
| --- | --- | --- |
| CLICs first identified as p64 bound to a compound known to inhibit a chloride conductance in bovine membrane extracts14. | p64 binds IAA-23. | A soluble π class GST also bound the inhibitor in the same experiment14. |
| Antibodies raised against p64 immunodepleted this chloride conduction in bovine membrane extracts14,18. | In bovine membrane extracts at least one CLIC family member is in some way involved in chloride conduction. | The antibodies used most likely bound to multiple CLIC family members. |
| In vivo overexpression of CLIC1 11,25,28,66 and CLIC4 48 in a variety of cell-lines shows increased chloride conduction at the plasma membrane. | Overexpression of some CLIC family members increases chloride conduction at the plasma membrane. | Most cell-lines express multiple CLIC family members endogenously. |
| In most cases this increased conduction is IAA-94 sensitive11,25,28,48,66. | None | IAA-94 is likely to be a highly non-selective inhibitor with many secondary effects. The effect of IAA-94 on CLIC-related chloride conductance varies in different reports. |
| Purified CLIC1 when added to artificial lipid bilayers elicits chloride conduction11-13. | In in vitro experimental systems CLIC1 alone is sufficient to produce chloride conduction. | The current characteristics measured by different laboratories are not in agreement. |
| Purified CLIC1 or CLIC5 can elicit chloride efflux from artificial lipid vesicles12,13,51,52. | In in vitro experimental systems CLIC1 or CLIC5 alone are sufficient to produce chloride conduction. | Efflux is dependent on the nature of the lipid used to make the vesicles. The highest levels of efflux are measured with the crudest lipid extracts. |
| Cells expressing FLAG-tagged CLIC1 show increased chloride conduction at the plasma membrane. Anti-FLAG antibodies block this current in a manner that is dependent upon which face of the membrane the antibodies are added to and whether the cells are expressing N- or C-terminal FLAG constructs66. | CLIC1 appears capable of forming a transmembrane state. | Yet to be reproduced by independent laboratories, although identical results were seen in similar experiments performed with CLIC4 48. |
| The CLICs are mainly soluble proteins; if they form ion channels they must transform between soluble and membrane inserted states. | None | Other proteins such as bacterial toxins, bcl-2 and the annexins do this. |
| The soluble form of the CLIC proteins contains a GST-like fold3,4; this fold was not previously known to be dynamic. | In at least CLIC1, the CLIC GST-like fold has the capability of being structurally dynamic, albeit the physiological relevance of this is unknown. | Large-scale structural changes occur when forming the CLIC1 oxidised dimer52. |
| In vitro oxidation of purified CLIC1 results in dimerisation concomitant with large-scale structural change52. | CLIC1 can dimerise on oxidation in vitro. | The concentrations of H2O2 used for dimerisation are higher than is likely to be found under physiological conditions. The structural transition is probably unique to CLIC1. The dimer structure does not elucidate possible mechanisms for ion channel formation. |
| The CLICs have no obvious highly hydrophobic segments. Two putative TM regions have been proposed but these are both relatively hydrophilic. | Any membrane spanning structure of the CLICs is not easily identifiable from their amino acid sequence alone. | The second putative CLIC TM segment is equally hydrophobic in the GSTs19. |
| The GST-fold has not previously been associated with ion channel proteins. | None | Our knowledge of the function of proteins containing the GST fold is surely incomplete. |
| The channel characteristics of CLIC-related channels are poorly understood. The conductance, voltage response, ion specificity and inhibitor profiles of CLIC-related channels are either unknown or differ when measured by independent laboratories (reviewed by7). | None | In most cases the relevant experiments simply haven't been done. |
| No CLIC mutants are known which affect the ion selectivity of the CLIC-related chloride channels. Such experiments are vital in proving a protein is an ion channel7. | None | Relevant experiments haven't been done. |
| It is difficult to see what sort of transmembrane structure is compatible with the CLICs' hydrophilic nature. | None | Biology is not constrained by human understanding. |

One of the major points against the CLICs acting as ion channels is the fact that such a hypothesis would require that the CLICs transform from a soluble state to a membrane spanning state. While some proteins are known to do this, including a few ion channels, the soluble form of the CLICs contains a globular GST-fold, a fold not previously known to be structurally dynamic. In this thesis, by solving the structure of the oxidised CLIC1 dimer, it was shown that the CLIC GST-fold is actually capable of being structurally dynamic, thus countering this particular argument against the ion channel hypothesis of the CLICs. The physiological relevance of the oxidised CLIC1 dimer remains unclear, which limits the ability to draw definitive conclusions in support of the ion channel hypothesis from the CLIC1 dimer structure. However, it can be noted that the majority of changes occurring during CLIC1 dimer formation are in the N-terminal thioredoxin-like domain of the monomer's GST fold. If the CLICs do form ion channels, it is possible that some of the changes that occur during the oxidative dimerisation of CLIC1 may parallel those occurring during membrane insertion.

7.4 Future work

Future lines of experimental inquiry directly relating to the work presented in this thesis have largely been addressed in the conclusions of each of the individual chapters. Therefore only a brief, final overview is required here.

The key question about the CLIC family is still to be resolved. That is, in vivo is their primary role to function as chloride ion channels? Section 7.3 summarised much of the current experimental evidence for and against this hypothesis, but as can be seen from Table 7-1 conclusions cannot yet be drawn with any confidence. Some of the remaining ambiguity that needs to be resolved arises from the differences in observations made of the CLIC-related chloride conductance in independent laboratories. It has previously been suggested that this is possibly caused by minor variations in experimental design7,11. While not wishing to constrain the techniques used to probe the nature of the CLICs, to overcome this possible source of ambiguity it is vital for the purposes of comparison that a standardised set of conditions becomes incorporated within the design of CLIC-related channel experiments. Ideally this would include, where possible, the response of any measured channel activity to a common series of buffer and voltage conditions, inhibitors and ion specificity measurements. The ion-specificity of the CLIC-related channel and its behaviour in response to various inhibitors is particularly poorly understood and needs to be comprehensively addressed in further work.

Further structural work on the CLICs needs to occur hand in hand with channel characterisation and cell-based experimental studies. Experiments need to be designed to test the validity of the two putative transmembrane regions of the CLICs. This will presumably occur by examining how site-directed mutations within each of these regions affect channel characteristics or membrane binding.

Hypotheses arising from the structure of the oxidised CLIC1 dimer also require testing. Firstly, the physiological relevance of the CLIC1 dimer needs to be examined in vivo. CLIC1 mutations that either inhibit or promote dimer formation should be characterised. Eliminating either of the two cysteines involved in the intramolecular disulphide bond prevents dimerisation. In Chapter 5 it was shown that these cysteine knockout mutants do not form channels in vitro, and studies of their behaviour in vivo are now needed. As one of these cysteine residues is within the putative glutathione-binding site, its mutation may also affect any glutathione-dependent enzymatic or transferase activity, so other mutations away from this site should also be sought. It is likely to be more difficult to discern the effect of promoting CLIC1 dimer formation in vivo as changes to a cell’s redox status are likely to have large-scale secondary effects. Mutants that destabilise the monomer structure and stabilise that of the dimer may separate the structural transition from the cell’s redox state. A CLIC1 truncation mutant lacking residues within the first β-strand may prevent the formation of the soluble monomer but still be capable of dimer formation. However it would have to be ensured that such a mutant folds correctly.

The question of whether other CLIC family members can form a structure analogous to that of the oxidised CLIC1 dimer also needs to be answered. Perhaps changes in pH, phosphorylation state, metal ion or salt concentration may result in the formation of a similar structure by the other CLIC family members. Alternatively all the CLICs may respond to different redox signals with CLIC1 exquisitely suited to regulation via peroxide.

Finally, experiments should perhaps not be so stringently constrained by the CLIC ion channel hypothesis, it may be beneficial to examine other possible cellular roles for the CLICs. Within the cell a large portion of each CLIC family member is in the soluble state, so only a small percentage is membrane associated. This skewed distribution may 222 indicate that the soluble form of the CLICs may also have some form of glutathione transferase activity or glutathione dependent enzymatic activity. Any such activity may be independent of the putative ion channel conductance, or alternatively may in some way regulate it. Determining any such activity is not an easy task as there are a large number of potential substrates. As discussed in Section 3.2.1, unlike the glutathione transferases that are largely active against exogenous compounds, the large sizes of the vertebrate clic2, 4, 5 and 6 genes would seem to preclude their involvement in a cellular detoxification role and are more indicative of involvement in a regulatory pathway. It has been suggested that, like the omega class GSTs49, in the GST-fold of the soluble monomeric CLIC1 the active site is more open and may be able to bind a peptide chain3. The active site of the CLICs may perhaps therefore be a protein-protein binding site used to bind S-thiolated proteins. Alternatively the similarity between the CLICs and the plant DHARs87 mentioned in Section 1.5, may be indicative of a CLIC redox- associated activity with ascorbic acid or a related compound. A final possibility is that the CLICs membrane association is due to their having a glutathione-dependent enzymatic activity towards a particular lipid. Some of the enzymes involved in the formation of various regulatory lipids are known glutathione transferases or GST-fold family members. This includes enzymes responsible for the synthesis or isomerisation of some of the prostaglandins218 and cysteinyl leukotrienes219. 
If the CLICs are lipid active, upregulation may result in their activity escaping normal controls. If the CLICs do enzymatically alter some lipids, the channel activity associated with their overexpression may be an unintended consequence of unregulated membrane-destabilising effects. At the very least, it may be useful to test the activity of the CLICs towards different lipids simply to exclude this possibility.

In conclusion, many more experiments designed to systematically characterise the CLIC-related channel activity are required. Once such a characterisation has been performed, mutations designed to alter the behaviour of the channel could be used to probe the models proposed for membrane insertion. However, in parallel with such channel-related experiments, possible substrates for a CLIC-related glutathione transferase activity or glutathione-dependent enzymatic activity should also be pursued.

DRL

References

1. Kraulis, P. J. MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24, 946-950 (1991).
2. Merritt, E. A. & Murphy, M. E. P. Raster3D version 2.0--a program for photorealistic molecular graphics. Acta Crystallogr. D50, 869-873 (1994).
3. Harrop, S. J. et al. Crystal structure of a soluble form of the intracellular chloride ion channel CLIC1 (NCC27) at 1.4-A resolution. J Biol Chem 276, 44993-5000 (2001).
4. Dulhunty, A., Gage, P., Curtis, S., Chelvanayagam, G. & Board, P. The glutathione transferase structural family includes a nuclear chloride channel and a ryanodine receptor calcium release channel modulator. J Biol Chem 276, 3319-23 (2001).
5. Sheehan, D., Meade, G., Foley, V. M. & Dowd, C. A. Structure, function and evolution of glutathione transferases: implications for classification of non-mammalian members of an ancient enzyme superfamily. Biochem J 360, 1-16 (2001).
6. Wilce, M. C. & Parker, M. W. Structure and function of glutathione S-transferases. Biochim Biophys Acta 1205, 1-18 (1994).
7. Ashley, R. H. Challenging accepted ion channel biology: p64 and the CLIC family of putative intracellular anion channel proteins (Review). Mol Membr Biol 20, 1-11 (2003).
8. Jentsch, T. J., Stein, V., Weinreich, F. & Zdebik, A. A. Molecular structure and physiological function of chloride channels. Physiol Rev 82, 503-68 (2002).
9. Debska, G., Kicinska, A., Skalska, J. & Szewczyk, A. Intracellular potassium and chloride channels: an update. Acta Biochim Pol 48, 137-44 (2001).
10. Roosild, T. P. et al. NMR structure of Mistic, a membrane-integrating protein for membrane protein expression. Science 307, 1317-21 (2005).
11. Warton, K. et al. Recombinant CLIC1 (NCC27) assembles in lipid bilayers via a pH-dependent two-state process to form chloride ion channels with identical characteristics to those observed in Chinese hamster ovary cells expressing CLIC1. J Biol Chem 277, 26003-11 (2002).

12. Tulk, B. M., Kapadia, S. & Edwards, J. C. CLIC1 inserts from the aqueous phase into phospholipid membranes, where it functions as an anion channel. Am J Physiol Cell Physiol 282, C1103-12 (2002).
13. Tulk, B. M., Schlesinger, P. H., Kapadia, S. A. & Edwards, J. C. CLIC-1 functions as a chloride channel when expressed and purified from bacteria. J Biol Chem 275, 26986-93 (2000).
14. Landry, D. et al. Molecular cloning and characterization of p64, a chloride channel protein from kidney microsomes. J Biol Chem 268, 14948-55 (1993).
15. Nishizawa, T., Nagao, T., Iwatsubo, T., Forte, J. G. & Urushidani, T. Molecular cloning and characterization of a novel chloride intracellular channel-related protein, parchorin, expressed in water-secreting cells. J Biol Chem 275, 11164-73 (2000).
16. Landry, D. W., Reitman, M., Cragoe, E. J., Jr. & Al-Awqati, Q. Epithelial chloride channel. Development of inhibitory ligands. J Gen Physiol 90, 779-98 (1987).
17. Landry, D. W. et al. Purification and reconstitution of chloride channels from kidney and trachea. Science 244, 1469-72 (1989).
18. Redhead, C. R., Edelman, A. E., Brown, D., Landry, D. W. & al-Awqati, Q. A ubiquitous 64-kDa protein is a component of a chloride channel of plasma and intracellular membranes. Proc Natl Acad Sci U S A 89, 3716-20 (1992).
19. Cromer, B. A., Morton, C. J., Board, P. G. & Parker, M. W. From glutathione transferase to pore in a CLIC. Eur Biophys J 31, 356-64 (2002).
20. Shanks, R. A. et al. AKAP350 at the Golgi apparatus. II. Association of AKAP350 with a novel chloride intracellular channel (CLIC) family member. J Biol Chem 277, 40973-80 (2002).
21. Mizukawa, Y., Nishizawa, T., Nagao, T., Kitamura, K. & Urushidani, T. Cellular distribution of parchorin, a chloride intracellular channel-related protein, in various tissues. Am J Physiol Cell Physiol 282, C786-95 (2002).
22. Redhead, C., Sullivan, S. K., Koseki, C., Fujiwara, K. & Edwards, J. C. Subcellular distribution and targeting of the intracellular chloride channel p64. Mol Biol Cell 8, 691-704 (1997).
23. Edwards, J. C. A novel p64-related Cl- channel: subcellular distribution and nephron segment-specific expression. Am J Physiol 276, F398-408 (1999).

24. Edwards, J. C. & Kapadia, S. Regulation of the bovine kidney microsomal chloride channel p64 by p59fyn, a Src family tyrosine kinase. J Biol Chem 275, 31826-32 (2000).
25. Valenzuela, S. M. et al. Molecular cloning and expression of a chloride ion channel of cell nuclei. J Biol Chem 272, 12575-82 (1997).
26. Tulk, B. M. & Edwards, J. C. NCC27, a homolog of intracellular Cl- channel p64, is expressed in brush border of renal proximal tubule. Am J Physiol 274, F1140-9 (1998).
27. Ribas, G., Neville, M., Wixon, J. L., Cheng, J. & Campbell, R. D. Genes encoding three new members of the leukocyte antigen 6 superfamily and a novel member of Ig superfamily, together with genes encoding the regulatory nuclear chloride ion channel protein (hRNCC) and an N omega-N omega-dimethylarginine dimethylaminohydrolase homologue, are found in a 30-kb segment of the MHC class III region. J Immunol 163, 278-87 (1999).
28. Valenzuela, S. M. et al. The nuclear chloride ion channel NCC27 is involved in regulation of the cell cycle. J Physiol 529 Pt 3, 541-52 (2000).
29. The yeast genome directory. Nature 387, 5 (1997).
30. Wood, V. et al. The genome sequence of Schizosaccharomyces pombe. Nature 415, 871-80 (2002).
31. Berryman, M. & Bretscher, A. Identification of a novel member of the chloride intracellular channel gene family (CLIC5) that associates with the actin cytoskeleton of placental microvilli. Mol Biol Cell 11, 1509-21 (2000).
32. Chuang, J. Z., Milner, T. A., Zhu, M. & Sung, C. H. A 29 kDa intracellular chloride channel p64H1 is associated with large dense-core vesicles in rat hippocampal neurons. J Neurosci 19, 2919-28 (1999).
33. Myers, K., Somanath, P. R., Berryman, M. & Vijayaraghavan, S. Identification of chloride intracellular channel proteins in spermatozoa. FEBS Lett 566, 136-40 (2004).
34. Heiss, N. S. & Poustka, A. Genomic structure of a novel chloride channel gene, CLIC2, in Xq28. Genomics 45, 224-8 (1997).
35. Fan, L., Yu, W. & Zhu, X. Interaction of Sedlin with chloride intracellular channel proteins. FEBS Lett 540, 77-80 (2003).

36. Board, P. G., Coggan, M., Watson, S., Gage, P. W. & Dulhunty, A. F. CLIC-2 modulates cardiac ryanodine receptor Ca(2+) release channels. Int J Biochem Cell Biol 36, 1599-612 (2004).
37. Qian, Z., Okuhara, D., Abe, M. K. & Rosner, M. R. Molecular cloning and characterization of a mitogen-activated protein kinase-associated intracellular chloride channel. J Biol Chem 274, 1621-7 (1999).
38. Abe, M. K., Kuo, W. L., Hershenson, M. B. & Rosner, M. R. Extracellular signal-regulated kinase 7 (ERK7), a novel ERK with a C-terminal domain that regulates its activity, its cellular localization, and cell growth. Mol Cell Biol 19, 1301-12 (1999).
39. Howell, S., Duncan, R. R. & Ashley, R. H. Identification and characterisation of a homologue of p64 in rat tissues. FEBS Lett 390, 207-10 (1996).
40. Duncan, R. R., Westwood, P. K., Boyd, A. & Ashley, R. H. Rat brain p64H1, expression of a new member of the p64 chloride channel protein family in endoplasmic reticulum. J Biol Chem 272, 23880-6 (1997).
41. Fernandez-Salas, E., Sagar, M., Cheng, C., Yuspa, S. H. & Weinberg, W. C. p53 and tumor necrosis factor alpha regulate the expression of a mitochondrial chloride channel protein. J Biol Chem 274, 36488-97 (1999).
42. Suginta, W., Karoulias, N., Aitken, A. & Ashley, R. H. Chloride intracellular channel protein CLIC4 (p64H1) binds directly to brain dynamin I in a complex containing actin, tubulin and 14-3-3 isoforms. Biochem J 359, 55-64 (2001).
43. Berryman, M. A. & Goldenring, J. R. CLIC4 is enriched at cell-cell junctions and colocalizes with AKAP350 at the centrosome and midbody of cultured mammalian cells. Cell Motil Cytoskeleton 56, 159-72 (2003).
44. Fernandez-Salas, E. et al. mtCLIC/CLIC4, an organellular chloride channel protein, is increased by DNA damage and participates in the apoptotic response to p53. Mol Cell Biol 22, 3610-20 (2002).
45. Suh, K. S. et al. The organellular chloride channel protein CLIC4/mtCLIC translocates to the nucleus in response to cellular stress and accelerates apoptosis. J Biol Chem 279, 4632-41 (2004).
46. Ronnov-Jessen, L., Villadsen, R., Edwards, J. C. & Petersen, O. W. Differential expression of a chloride intracellular channel gene, CLIC4, in transforming growth factor-beta1-mediated conversion of fibroblasts to myofibroblasts. Am J Pathol 161, 471-80 (2002).

47. Chen, L. et al. Light damage induced changes in mouse retinal gene expression. Exp Eye Res 79, 239-47 (2004).
48. Proutski, I., Karoulias, N. & Ashley, R. H. Overexpressed chloride intracellular channel protein CLIC4 (p64H1) is an essential component of novel plasma membrane anion channels. Biochem Biophys Res Commun 297, 317-22 (2002).
49. Board, P. G. et al. Identification, characterization, and crystal structure of the Omega class glutathione transferases. J Biol Chem 275, 24798-806 (2000).
50. Urushidani, T., Chow, D. & Forte, J. G. Redistribution of a 120 kDa phosphoprotein in the parietal cell associated with stimulation. J Membr Biol 168, 209-20 (1999).
51. Berryman, M., Bruno, J., Price, J. & Edwards, J. C. CLIC-5A functions as a chloride channel in vitro and associates with the cortical actin cytoskeleton in vitro and in vivo. J Biol Chem 279, 34794-34801 (2004).
52. Littler, D. R. et al. The intracellular chloride ion channel protein CLIC1 undergoes a redox-controlled structural transition. J Biol Chem 279, 9298-9305 (2004).
53. Friedli, M. et al. Identification of a novel member of the CLIC family, CLIC6, mapping to 21q22.12. Gene 320, 31-40 (2003).
54. Griffon, N., Jeanneteau, F., Prieur, F., Diaz, J. & Sokoloff, P. CLIC6, a member of the intracellular chloride channel family, interacts with dopamine D(2)-like receptors. Brain Res Mol Brain Res 117, 47-57 (2003).
55. Strippoli, P. et al. Segmental paralogy in the human genome: a large-scale triplication on 1p, 6p, and 21q. Mamm Genome 13, 456-62 (2002).
56. Hattori, M. et al. The DNA sequence of human chromosome 21. Nature 405, 311-9 (2000).
57. Urushidani, T. & Forte, J. G. Signal transduction and activation of acid secretion in the parietal cell. J Membr Biol 159, 99-111 (1997).
58. Urushidani, T., Hanzel, D. K. & Forte, J. G. Protein phosphorylation associated with stimulation of rabbit gastric glands. Biochim Biophys Acta 930, 209-19 (1987).
59. Michel, J. J. & Scott, J. D. AKAP mediated signal transduction. Annu Rev Pharmacol Toxicol 42, 235-57 (2002).

60. Fanning, A. S. & Anderson, J. M. PDZ domains: fundamental building blocks in the organization of protein complexes at the plasma membrane. J Clin Invest 103, 767-72 (1999).
61. Danino, D. & Hinshaw, J. E. Dynamin family of mechanoenzymes. Curr Opin Cell Biol 13, 454-60 (2001).
62. Tzivion, G., Shen, Y. H. & Zhu, J. 14-3-3 proteins; bringing new definitions to scaffolding. Oncogene 20, 6331-8 (2001).
63. Edwards, J. C., Tulk, B. & Schlesinger, P. H. Functional expression of p64, an intracellular chloride channel protein. J Membr Biol 163, 119-27 (1998).
64. Lehner, B. et al. Analysis of a high-throughput yeast two-hybrid system and its use to predict the function of intracellular proteins encoded within the human MHC class III region. Genomics 83, 153-67 (2004).
65. Li, X. & Weinman, S. A. Chloride channels and hepatocellular function: prospects for molecular identification. Annu Rev Physiol 64, 609-33 (2002).
66. Tonini, R. et al. Functional characterization of the NCC27 nuclear protein in stable transfected CHO-K1 cells. Faseb J 14, 1171-8 (2000).
67. Novarino, G. et al. Involvement of the intracellular ion channel CLIC1 in microglia-mediated beta-amyloid-induced neurotoxicity. J Neurosci 24, 5322-30 (2004).
68. Arnould, T. et al. mtCLIC is up-regulated and maintains a mitochondrial membrane potential in mtDNA-depleted L929 cells. Faseb J 17, 2145-7 (2003).
69. Schlesinger, P. H., Blair, H. C., Teitelbaum, S. L. & Edwards, J. C. Characterization of the osteoclast ruffled border chloride channel and its role in bone resorption. J Biol Chem 272, 18636-43 (1997).
70. Cragoe, E. J., Jr. et al. (1-oxo-2-substituted-5-indanyloxy)acetic acids, a new class of potent renal agents possessing both uricosuric and saluretic activity. A reexamination of the role of sulfhydryl binding in the mode of action of acylphenoxyacetic acid saluretics. J Med Chem 18, 225-8 (1975).
71. Woltersdorf, O. W., Jr., deSolms, S. J., Schultz, E. M. & Cragoe, E. J., Jr. (Acylaryloxy)acetic acid diuretics. 1. (2-Alkyl- and 2,2-dialkyl-1-oxo-5-indanyloxy)acetic acids. J Med Chem 20, 1400-8 (1977).
72. Oakley, A. J. et al. The structures of human glutathione transferase P1-1 in complex with glutathione and various inhibitors at high resolution. J Mol Biol 274, 84-100 (1997).

73. Oakley, A. J., Lo Bello, M., Mazzetti, A. P., Federici, G. & Parker, M. W. The glutathione conjugate of ethacrynic acid can bind to human pi class glutathione transferase P1-1 in two different modes. FEBS Lett 419, 32-6 (1997).
74. Cameron, A. D. et al. Structural analysis of human alpha-class glutathione transferase A1-1 in the apo-form and in complexes with ethacrynic acid and its glutathione conjugate. Structure 3, 717-27 (1995).
75. Ives, H. E. in Basic & Clinical Pharmacology (ed. Katzung, B. G.) 241-258 (McGraw-Hill Companies, New York, 2004).
76. Hasannejad, H. et al. Interactions of human organic anion transporters with diuretics. J Pharmacol Exp Ther 308, 1021-9 (2004).
77. Brauer, M., Frei, E., Claes, L., Grissmer, S. & Jager, H. Influence of K-Cl cotransporter activity on activation of volume-sensitive Cl- channels in human osteoblasts. Am J Physiol Cell Physiol 285, C22-30 (2003).
78. Decher, N. et al. DCPIB is a novel selective blocker of I(Cl,swell) and prevents swelling-induced shortening of guinea-pig atrial action potential duration. Br J Pharmacol 134, 1467-79 (2001).
79. Culliford, S. et al. Specificity of classical and putative Cl(-) transport inhibitors on membrane transport pathways in human erythrocytes. Cell Physiol Biochem 13, 181-8 (2003).
80. Lauf, P. K. & Adragna, N. C. K-Cl cotransport: properties and molecular mechanism. Cell Physiol Biochem 10, 341-54 (2000).
81. Alper, S. L., Darman, R. B., Chernova, M. N. & Dahl, N. K. The AE gene family of Cl/HCO3- exchangers. J Nephrol 15 Suppl 5, S41-53 (2002).
82. Robinson, A., Huttley, G. A., Booth, H. S. & Board, P. G. Modelling and bioinformatics studies of the human Kappa-class glutathione transferase predict a novel third glutathione transferase family with similarity to prokaryotic 2-hydroxychromene-2-carboxylate isomerases. Biochem J 379, 541-52 (2004).
83. Ladner, J. E., Parsons, J. F., Rife, C. L., Gilliland, G. L. & Armstrong, R. N. Parallel evolutionary pathways for glutathione transferases: structure and mechanism of the mitochondrial class kappa enzyme rGSTK1-1. Biochemistry 43, 352-61 (2004).
84. Jowsey, I. R. et al. Mammalian class Sigma glutathione S-transferases: catalytic properties and tissue-specific expression of human and rat GSH-dependent prostaglandin D2 synthases. Biochem J 359, 507-16 (2001).

85. Koonin, E. V. et al. Eukaryotic translation elongation factor 1 gamma contains a glutathione transferase domain--study of a diverse, ancient protein superfamily using motif search and structural modeling. Protein Sci 3, 2045-54 (1994).
86. Umland, T. C., Taylor, K. L., Rhee, S., Wickner, R. B. & Davies, D. R. The crystal structure of the nitrogen regulation fragment of the yeast prion protein Ure2p. Proc Natl Acad Sci U S A 98, 1459-64 (2001).
87. Dixon, D. P., Davis, B. G. & Edwards, R. Functional divergence in the glutathione transferase superfamily in plants. Identification of two classes with putative functions in redox homeostasis in Arabidopsis thaliana. J Biol Chem 277, 30859-69 (2002).
88. Xia, B., Vlamis-Gardikas, A., Holmgren, A., Wright, P. E. & Dyson, H. J. Solution structure of Escherichia coli glutaredoxin-2 shows similarity to mammalian glutathione-S-transferases. J Mol Biol 310, 907-18 (2001).
89. Rossjohn, J. et al. A mixed disulfide bond in bacterial glutathione transferase: functional and evolutionary implications. Structure 6, 721-34 (1998).
90. Shimaoka, T., Yokota, A. & Miyake, C. Purification and characterization of chloroplast dehydroascorbate reductase from spinach leaves. Plant Cell Physiol 41, 1110-8 (2000).
91. Andrykovitch, M. et al. Characterization of four orthologs of stringent starvation protein A. Acta Crystallogr D Biol Crystallogr 59, 881-6 (2003).
92. Hansen, A. M. et al. Structural basis for the function of stringent starvation protein A as a transcription factor. J Biol Chem 280, 17380-91 (2005).
93. Bousset, L., Belrhali, H., Melki, R. & Morera, S. Crystal structures of the yeast prion Ure2p functional region in complex with glutathione and related compounds. Biochemistry 40, 13564-73 (2001).
94. Agianian, B. et al. Structure of a Drosophila sigma class glutathione S-transferase reveals a novel active site topography suited for lipid peroxidation products. J Mol Biol 326, 151-65 (2003).
95. Mannervik, B. et al. Nomenclature for human glutathione transferases. Biochem J 282 (Pt 1), 305-6 (1992).
96. Mannervik, B. et al. Identification of three classes of cytosolic glutathione transferase common to several mammalian species: correlation between structural data and enzymatic properties. Proc Natl Acad Sci U S A 82, 7202-6 (1985).

97. Martin, J. L. Thioredoxin--a fold for all reasons. Structure 3, 245-50 (1995).
98. Weichsel, A., Gasdaska, J. R., Powis, G. & Montfort, W. R. Crystal structures of reduced, oxidized, and mutated human thioredoxins: evidence for a regulatory homodimer. Structure 4, 735-51 (1996).
99. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-637 (1983).
100. Chen, L. et al. Structure of an insect delta-class glutathione S-transferase from a DDT-resistant strain of the malaria vector Anopheles gambiae. Acta Crystallogr D Biol Crystallogr 59, 2211-7 (2003).
101. Rossjohn, J. et al. Human theta class glutathione transferase: the crystal structure reveals a sulfate-binding pocket within a buried active site. Structure 6, 309-22 (1998).
102. Raghunathan, S. et al. Crystal structure of human class mu glutathione transferase GSTM2-2. Effects of lattice packing on conformational heterogeneity. J Mol Biol 238, 815-32 (1994).
103. Garcia-Saez, I., Parraga, A., Phillips, M. F., Mantle, T. J. & Coll, M. Molecular structure at 1.8 A of mouse liver class pi glutathione S-transferase complexed with S-(p-nitrobenzyl)glutathione and other inhibitors. J Mol Biol 237, 298-314 (1994).
104. Ji, X. et al. Three-dimensional structure, catalytic properties, and evolution of a sigma class glutathione transferase from squid, a progenitor of the lens S-crystallins of cephalopods. Biochemistry 34, 5317-28 (1995).
105. Reinemer, P. et al. Three-dimensional structure of glutathione S-transferase from Arabidopsis thaliana at 2.2 A resolution: structural characterization of herbicide-conjugating plant glutathione S-transferases and a novel active site architecture. J Mol Biol 255, 289-309 (1996).
106. Thom, R. et al. Structure of a tau class glutathione S-transferase from wheat active in herbicide detoxification. Biochemistry 41, 7008-20 (2002).
107. Hegazy, U. M., Mannervik, B. & Stenberg, G. Functional role of the lock and key motif at the subunit interface of glutathione transferase p1-1. J Biol Chem 279, 9586-96 (2004).

108. Oakley, A. J. et al. The three-dimensional structure of the human Pi class glutathione transferase P1-1 in complex with the inhibitor ethacrynic acid and its glutathione conjugate. Biochemistry 36, 576-85 (1997).
109. Armstrong, R. N. Structure, catalytic mechanism, and evolution of the glutathione transferases. Chem Res Toxicol 10, 2-18 (1997).
110. Karshikoff, A., Reinemer, P., Huber, R. & Ladenstein, R. Electrostatic evidence for the activation of the glutathione thiol by Tyr7 in pi-class glutathione transferases. Eur J Biochem 215, 663-70 (1993).
111. Polekhina, G., Board, P. G., Blackburn, A. C. & Parker, M. W. Crystal structure of maleylacetoacetate isomerase/glutathione transferase zeta reveals the molecular basis for its remarkable catalytic promiscuity. Biochemistry 40, 1567-76 (2001).
112. Chen, H. I. & Sudol, M. The WW domain of Yes-associated protein binds a proline-rich ligand that differs from the consensus established for Src homology 3-binding modules. Proc Natl Acad Sci U S A 92, 7819-23 (1995).
113. Wells, W. W., Yang, Y., Deits, T. L. & Gan, Z. R. Thioltransferases. Adv Enzymol Relat Areas Mol Biol 66, 149-201 (1993).
114. Rossjohn, J. et al. Structures of thermolabile mutants of human glutathione transferase P1-1. J Mol Biol 302, 295-302 (2000).
115. Kong, G. K. et al. Contribution of glycine 146 to a conserved folding module affecting stability and refolding of human glutathione transferase p1-1. J Biol Chem 278, 1291-302 (2003).
116. Munoz, V., Blanco, F. J. & Serrano, L. The hydrophobic-staple motif and a role for loop-residues in alpha-helix stability and protein folding. Nat Struct Biol 2, 380-5 (1995).
117. Cocco, R. et al. The folding and stability of human alpha class glutathione transferase A1-1 depend on distinct roles of a conserved N-capping box and hydrophobic staple motif. J Biol Chem 276, 32177-83 (2001).
118. Deutsch, M. & Long, M. Intron-exon structures of eukaryotic model organisms. Nucleic Acids Res 27, 3219-28 (1999).
119. Hayes, J. D. & Pulford, D. J. The glutathione S-transferase supergene family: regulation of GST and the contribution of the isoenzymes to cancer chemoprotection and drug resistance. Crit Rev Biochem Mol Biol 30, 445-600 (1995).

120. Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673-80 (1994).
121. Whitbread, A. K., Tetlow, N., Eyre, H. J., Sutherland, G. R. & Board, P. G. Characterization of the human Omega class glutathione transferase genes and associated polymorphisms. Pharmacogenetics 13, 131-44 (2003).
122. Morel, F., Rauch, C., Coles, B., Le Ferrec, E. & Guillouzo, A. The human glutathione transferase alpha locus: genomic organization of the gene cluster and functional characterization of the genetic polymorphism in the hGSTA1 promoter. Pharmacogenetics 12, 277-86 (2002).
123. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
124. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304-51 (2001).
125. Ito, Y. Oncogenic potential of the RUNX gene family: 'overview'. Oncogene 23, 4198-208 (2004).
126. Suh, K. S. et al. Antisense suppression of the chloride intracellular channel family induces apoptosis, enhances tumor necrosis factor alpha-induced apoptosis, and inhibits tumor growth. Cancer Res 65, 562-71 (2005).
127. Dehal, P. et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157-67 (2002).
128. Glusman, G., Kaur, A., Hood, L. & Rowen, L. An enigmatic fourth runt domain gene in the fugu genome: ancestral gene loss versus accelerated evolution. BMC Evol Biol 4, 43 (2004).
129. Stein, L. D. et al. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 1, E45 (2003).
130. Genome sequence of the nematode C. elegans: a platform for investigating biology. The C. elegans Sequencing Consortium. Science 282, 2012-8 (1998).
131. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185-95 (2000).
132. Celniker, S. E. et al. Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol 3, RESEARCH0079 (2002).

133. Richards, S. et al. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution. Genome Res. 15, 1-18 (2005).
134. Holt, R. A. et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298, 129-49 (2002).
135. Xia, Q. et al. A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science 306, 1937-40 (2004).
136. Shorning, B. Y., Wilson, D. B., Meehan, R. R. & Ashley, R. H. Molecular cloning and developmental expression of two Chloride Intracellular Channel (CLIC) genes in Xenopus laevis. Dev Genes Evol 213, 514-8 (2003).
137. Berry, K. L., Bulow, H. E., Hall, D. H. & Hobert, O. A C. elegans CLIC-like protein required for intracellular tube formation and maintenance. Science 302, 2134-7 (2003).
138. Hedges, S. B. The origin and evolution of model organisms. Nat Rev Genet 3, 838-49 (2002).
139. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-62 (2002).
140. Gibbs, R. A. et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493-521 (2004).
141. Hillier, L. W. et al. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695-716 (2004).
142. Klein, S. L. et al. Genetic and genomic tools for Xenopus research: The NIH Xenopus initiative. Dev Dyn 225, 384-91 (2002).
143. Aparicio, S. et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301-10 (2002).
144. Jaillon, O. et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431, 946-57 (2004).
145. Robinson-Rechavi, M., Boussau, B. & Laudet, V. Phylogenetic dating and characterization of gene duplications in vertebrates: the cartilaginous fish reference. Mol Biol Evol 21, 580-6 (2004).
146. Pancer, Z., Mayer, W. E., Klein, J. & Cooper, M. D. Prototypic T cell receptor and CD4-like coreceptor are expressed by lymphocytes in the agnathan sea lamprey. Proc Natl Acad Sci U S A 101, 13273-8 (2004).

147. Pancer, Z. et al. Somatic diversification of variable lymphocyte receptors in the agnathan sea lamprey. Nature 430, 174-80 (2004).
148. Rennert, J., Coffman, J. A., Mushegian, A. R. & Robertson, A. J. The evolution of Runx genes I. A comparative study of sequences from phylogenetically diverse model organisms. BMC Evol Biol 3, 4 (2003).
149. Stricker, S. et al. A single amphioxus and sea urchin runt-gene suggests that runt-gene duplications occurred in early chordate evolution. Dev Comp Immunol 27, 673-84 (2003).
150. Gracey, A. Y. et al. Coping with cold: An integrative, multitissue analysis of the transcriptome of a poikilothermic vertebrate. Proc Natl Acad Sci U S A 101, 16970-5 (2004).
151. Karsi, A. et al. Transcriptome analysis of channel catfish (Ictalurus punctatus): initial analysis of gene expression and microsatellite-containing cDNAs in the skin. Gene 285, 157-68 (2002).
152. Rexroad, C. E., 3rd et al. Sequence analysis of a rainbow trout cDNA library and creation of a gene index. Cytogenet Genome Res 102, 347-54 (2003).
153. Smith, T. P. et al. Sequence evaluation of four pooled-tissue normalized bovine cDNA libraries and construction of a gene index for cattle. Genome Res 11, 626-30 (2001).
154. Uenishi, H. et al. PEDE (Pig EST Data Explorer): construction of a database for ESTs derived from porcine full-length cDNA libraries. Nucleic Acids Res 32 Database issue, D484-8 (2004).
155. Habermann, B. et al. An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries. Genome Biol 5, R67 (2004).
156. Hurt, P. et al. The genomic sequence and comparative analysis of the rat major histocompatibility complex. Genome Res 14, 631-9 (2004).
157. Bonaldo, M. F., Lennon, G. & Soares, M. B. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res 6, 791-806 (1996).
158. Putta, S. et al. From biomedicine to natural history research: EST resources for ambystomatid salamanders. BMC Genomics 5, 54 (2004).
159. Adoutte, A., Balavoine, G., Lartillot, N. & de Rosa, R. Animal evolution. The end of the intermediate taxa? Trends Genet 15, 104-8 (1999).

160. Aguinaldo, A. M. et al. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387, 489-93 (1997).
161. Mallatt, J. & Winchell, C. J. Testing the new animal phylogeny: first use of combined large-subunit and small-subunit rRNA gene sequences to classify the protostomes. Mol Biol Evol 19, 289-301 (2002).
162. Dopazo, H., Santoyo, J. & Dopazo, J. Phylogenomics and the number of characters required for obtaining an accurate phylogeny of eukaryote model species. Bioinformatics 20 Suppl 1, I116-I121 (2004).
163. Poustka, A. J. et al. Generation, annotation, evolutionary analysis, and database integration of 20,000 unique sea urchin EST clusters. Genome Res 13, 2736-46 (2003).
164. Nene, V. et al. Genes transcribed in the salivary glands of female Rhipicephalus appendiculatus ticks infected with Theileria parva. Insect Biochem Mol Biol 34, 1117-28 (2004).
165. Nene, V. et al. AvGI, an index of genes transcribed in the salivary glands of the ixodid tick Amblyomma variegatum. Int J Parasitol 32, 1447-56 (2002).
166. Rojtinnakorn, J., Hirono, I., Itami, T., Takahashi, Y. & Aoki, T. Gene expression in haemocytes of kuruma prawn, Penaeus japonicus, in response to infection with WSSV by EST approach. Fish Shellfish Immunol 13, 69-83 (2002).
167. Valenzuela, J. G., Pham, V. M., Garfield, M. K., Francischetti, I. M. & Ribeiro, J. M. Toward a description of the sialome of the adult female mosquito Aedes aegypti. Insect Biochem Mol Biol 32, 1101-22 (2002).
168. Lehane, M. J. et al. Adult midgut expressed sequence tags from the tsetse fly Glossina morsitans morsitans and expression analysis of putative immune response genes. Genome Biol 4, R63 (2003).
169. Gaines, P. J. et al. Analysis of expressed sequence tags from subtracted and unsubtracted Ctenocephalides felis hindgut and Malpighian tubule cDNA libraries. Insect Mol Biol 11, 299-306 (2002).
170. Whitfield, C. W. et al. Annotated expressed sequence tags and cDNA microarrays for studies of brain and behavior in the honey bee. Genome Res 12, 555-66 (2002).
171. Gueguen, Y. et al. Immune gene discovery by expressed sequence tags generated from hemocytes of the bacteria-challenged oyster, Crassostrea gigas. Gene 303, 139-45 (2003).

172. Hu, W. et al. Evolutionary and biomedical implications of a Schistosoma japonicum complementary DNA resource. Nat Genet 35, 139-47 (2003).
173. Verjovski-Almeida, S. et al. Transcriptome analysis of the acoelomate human parasite Schistosoma mansoni. Nat Genet 35, 148-57 (2003).
174. Mineta, K. et al. Origin and evolutionary process of the CNS elucidated by comparative genomics analysis of planarian ESTs. Proc Natl Acad Sci U S A 100, 7666-71 (2003).
175. Page, R. D. TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 12, 357-8 (1996).
176. Pough, F. H., Janis, C. M. & Heiser, J. B. Vertebrate life (Prentice Hall, New York, 1999).
177. Stock, D. W. & Whitt, G. S. Evidence from 18S ribosomal RNA sequences that lampreys and hagfishes form a natural group. Science 257, 787-9 (1992).
178. Brusca, R. C. & Brusca, G. J. Invertebrates (Sinauer Associates, Inc., Sunderland, MA, 1990).
179. Oda, H. et al. A novel amphioxus cadherin that localizes to epithelial adherens junctions has an unusual domain organization with implications for chordate phylogeny. Evol Dev 4, 426-34 (2002).
180. Wright, G. M., Keeley, F. W. & Robson, P. The unusual cartilaginous tissues of jawless craniates, cephalochordates and invertebrates. Cell Tissue Res 304, 165-74 (2001).
181. Holland, P. W., Garcia-Fernandez, J., Williams, N. A. & Sidow, A. Gene duplications and the origins of vertebrate development. Dev Suppl, 125-33 (1994).
182. Leslie, A. G. W. in joint CCP4 + ESF-EAMCB Newsletter on Protein Crystallography, No. 26. (1992).
183. Collaborative Computational Project, Number 4. The CCP4 suite: programs for protein crystallography. Acta Cryst. D 50, 760-763 (1994).
184. Murshudov, G. N., Vagin, A. A. & Dodson, E. J. Refinement of macromolecular structures by the maximum likelihood method. Acta Crystallogr D 53, 240-255 (1997).
185. Navaza, J. AMoRe: an automated package for molecular replacement. Acta Cryst. D 50, 157-163 (1994).

186. Lamzin, V. S. & Wilson, K. S. Automated refinement of protein models. Acta Cryst. D 49, 129-149 (1993).
187. Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A 47, 110-9 (1991).
188. Nathaniel, C., Wallace, L. A., Burke, J. & Dirr, H. W. The role of an evolutionarily conserved cis-proline in the thioredoxin-like domain of human class Alpha glutathione transferase A1-1. Biochem J 372, 241-6 (2003).
189. Rife, C. L., Parsons, J. F., Xiao, G., Gilliland, G. L. & Armstrong, R. N. Conserved structural elements in glutathione transferase homologues encoded in the genome of Escherichia coli. Proteins 53, 777-82 (2003).
190. Nishida, M. et al. Three-dimensional structure of Escherichia coli glutathione S-transferase complexed with glutathione sulfonate: catalytic roles of Cys10 and His106. J Mol Biol 281, 135-47 (1998).
191. Tong, Z., Board, P. G. & Anders, M. W. Glutathione transferase zeta catalyses the oxygenation of the carcinogen dichloroacetic acid to glyoxylic acid. Biochem J 331 (Pt 2), 371-4 (1998).
192. Dean, J. V., Devarenne, T. P., Lee, I. S. & Orlofsky, L. E. Properties of a Maize Glutathione S-Transferase That Conjugates Coumaric Acid and Other Phenylpropanoids. Plant Physiol 108, 985-994 (1995).
193. Lawrence, M. C. & Bourke, P. CONSCRIPT: A program for generating electron density isosurfaces from Fourier syntheses in protein crystallography. J. Appl. Cryst. 33, 990-991 (2000).
194. Lawrence, M. C. & Colman, P. M. Shape complementarity at protein/protein interfaces. J Mol Biol 234, 946-50 (1993).
195. Nicholls, A., Sharp, K. A. & Honig, B. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 11, 281-96 (1991).
196. Evans, S. V. SETOR: hardware enlightened three-dimensional solid model representations of macromolecules. J. Mol. Graph. 11, 134-138 (1993).
197. Choi, H. et al. Structural basis of the redox switch in the OxyR transcription factor. Cell 105, 103-13 (2001).
198. Zheng, M., Aslund, F. & Storz, G. Activation of the OxyR transcription factor by reversible disulfide bond formation. Science 279, 1718-21 (1998).

199. Kortemme, T. & Creighton, T. E. Ionisation of cysteine residues at the termini of model alpha-helical peptides. Relevance to unusual thiol pKa values in proteins of the thioredoxin family. J Mol Biol 253, 799-812 (1995).
200. Miranda, J. J. Position-dependent interactions between cysteine residues and the helix dipole. Protein Sci 12, 73-81 (2003).
201. Dragani, B. et al. Irreversible thermal denaturation of glutathione transferase P1-1. Evidence for varying structural stability of different domains. Int J Biochem Cell Biol 30, 155-163 (1998).
202. Aceto, A. et al. Dissociation and unfolding of Pi-class glutathione transferase. Evidence for a monomeric inactive intermediate. Biochem J 285 (Pt 1), 241-5 (1992).
203. Buechner, M. Tubes and the single C. elegans excretory cell. Trends Cell Biol 12, 479-84 (2002).
204. Buechner, M., Hall, D. H., Bhatt, H. & Hedgecock, E. M. Cystic canal mutants in Caenorhabditis elegans are defective in the apical membrane domain of the renal (excretory) cell. Dev Biol 214, 227-41 (1999).
205. Fomenko, D. E. & Gladyshev, V. N. CxxS: fold-independent redox motif revealed by genome-wide searches for thiol/disulfide oxidoreductase function. Protein Sci 11, 2285-96 (2002).
206. Gladyshev, V. N. et al. Identification and characterization of a new mammalian glutaredoxin (thioltransferase), Grx2. J Biol Chem 276, 30374-80 (2001).
207. Atichartpongkul, S. et al. Bacterial Ohr and OsmC paralogues define two protein families with distinct functions and patterns of expression. Microbiology 147, 1775-82 (2001).
208. Lesniak, J., Barton, W. A. & Nikolov, D. B. Structural and functional features of the Escherichia coli hydroperoxide resistance protein OsmC. Protein Sci 12, 2838-43 (2003).
209. Lesniak, J., Barton, W. A. & Nikolov, D. B. Structural and functional characterization of the Pseudomonas hydroperoxide resistance protein Ohr. Embo J 21, 6649-59 (2002).
210. Fomenko, D. E. & Gladyshev, V. N. Identity and functions of CxxC-derived motifs. Biochemistry 42, 11214-25 (2003).

211. Dragani, B., Cocco, R., Principe, D. R., Paludi, D. & Aceto, A. Conformational properties of five peptides corresponding to the entire sequence of glutathione transferase domain II. Arch Biochem Biophys 389, 15-21 (2001).
212. Gattiker, A., Gasteiger, E. & Bairoch, A. ScanProsite: a reference implementation of a PROSITE scanning tool. Appl Bioinformatics 1, 107-8 (2002).
213. Marco, A., Cuesta, A., Pedrola, L., Palau, F. & Marin, I. Evolutionary and structural analyses of GDAP1, involved in Charcot-Marie-Tooth disease, characterize a novel class of glutathione transferase-related genes. Mol Biol Evol 21, 176-87 (2004).
214. Oakley, A. J. et al. The crystal structures of glutathione S-transferases isozymes 1-3 and 1-4 from Anopheles dirus species B. Protein Sci 10, 2176-85 (2001).
215. Arkin, I. T. & Brunger, A. T. Statistical analysis of predicted transmembrane alpha-helices. Biochim Biophys Acta 1429, 113-28 (1998).
216. Kyte, J. & Doolittle, R. F. A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105-32 (1982).
217. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16, 276-7 (2000).
218. Kanaoka, Y. et al. Cloning and crystal structure of hematopoietic prostaglandin D synthase. Cell 90, 1085-95 (1997).
219. Lam, B. K. Leukotriene C(4) synthase. Prostaglandins Leukot Essent Fatty Acids 69, 111-6 (2003).
