
CLONING AND SEQUENCING OF THE GENE

FOR VALYL-tRNA SYNTHETASE FROM

BACILLUS STEAROTHERMOPHILUS

by

Nigel John Brand

A dissertation submitted to the Imperial College of Science and Technology in candidature for the Diploma of Imperial College, and to the University of London in candidature for the degree of Doctor of Philosophy.

Department of Chemistry

Imperial College

London SW7 2AZ

September 1986

ABSTRACT

This thesis describes the determination of the DNA sequence of the valS gene from the bacterium Bacillus stearothermophilus, which encodes the enzyme valyl-tRNA synthetase (ValRS). The gene was cloned from a genomic library of B. stearothermophilus DNA in the plasmid vector pAT153 by complementation of an E. coli mutant containing a temperature-sensitive lesion in the chromosomal copy of valS. Cloning of the gene was confirmed by comparing the kinetic properties of the cloned gene product with those of native ValRS purified from B. stearothermophilus. Fragments of the gene and its adjoining regions were ultimately subcloned into the bacteriophage vector M13 and their DNA sequences were determined by the dideoxy chain-termination method. The individual DNA sequences were amalgamated into one contiguous sequence spanning 5 kilobases (kb). The DNA sequence was analysed and an open reading frame, sufficient to code for an 880-residue protein of Mr = 102,036, was identified. This was assigned to ValRS by a number of criteria. First, the pattern of codon usage for the frame was shown to correlate with the codon preferences of other aminoacyl-tRNA synthetases. Second, the amino acid composition, predicted from a translation of the DNA sequence, agreed with that determined for a sample of purified, cloned ValRS. A spectrophotometric determination of the number of tryptophan residues in the enzyme also agreed with the predicted value. The primary sequence of the ValRS was compared with those of a number of other synthetases. Some significant homologies were found, notably the conservation of two sequences that have been implicated in binding ATP and the 3’ end of tRNA in other synthetases. A striking degree of homology (25-32%) with the isoleucyl-tRNA synthetase (IleRS) of Escherichia coli was found, representing the most extensive homology yet reported between two aminoacyl-tRNA synthetases from different bacterial species. This argues that the two enzymes may have evolved from a common progenitor.

ACKNOWLEDGEMENTS

This work would not have been possible without contributions from a number of people. I should like to thank my colleagues Borgford and Tammy Gray for the hard work, enthusiasm and patience that it took to sequence "our gene". I am especially grateful to Robin Leatherbarrow and Wally Ward for their critical reading of the manuscript and for their many helpful comments. My thanks also to Mick Jones for introducing me to molecular biology in the first place, and to Jack Knill-Jones for synthesising all the sequencing primers and for his unstinting interest in the project. Lastly, thanks to Lesia Gojda for some excellent illustrations, to Glyn Millhouse for taking the photographs and to Gail Craigie for her help with the typing and layout of this tome.

There are too many people (past and present) to mention, but special thanks to the following for their company, coffee and chat: Sue Cotterill, Tony Wilkinson, Jian-Ping Shi, Hugh Jones, Alison Campbell, Tim Wells, Katy Brown, Steve Delaney, Carlos Flores, Ishtiag Qadri, Paula Zard and Tom Purcell.

Finally, my sincere thanks to my supervisor, Professor Alan Fersht, for his advice and continual interest at all stages in the project.

The work presented in this thesis was funded by the Medical Research Council.

CONTENTS

Abstract
Acknowledgements
Contents
List of Figures
List of Tables

CHAPTER 1. INTRODUCTION
1.1 Reasons for Studying the Aminoacyl-tRNA Synthetases
1.2 The Aminoacyl-tRNA Synthetases: a Historical Background
1.3 General Properties of the Aminoacyl-tRNA Synthetases
1.3.1 The Synthetases Catalyse a Fundamental Reaction
1.3.2 Structural Properties of the Synthetases
1.3.3 Implications for the Evolution of the Synthetases
1.3.4 Internal Repetitive Elements
1.3.5 Interactions between the Synthetases and tRNAs
1.4 The Fidelity of Aminoacylation by Aminoacyl-tRNA Synthetases
1.4.1 General Introduction
1.4.2 Simple Discrimination Against Dissimilar Amino Acids
1.4.3 Some Synthetases have Editing Mechanisms
1.5 Cloning and Expression of Aminoacyl-tRNA Synthetase Genes
1.5.1 The Impact of Genetic Engineering
1.5.2 Sequencing of Synthetase Genes
1.5.3 Diversity in the Structure of the Promoters of Synthetase Genes
1.5.4 The Expression of Synthetase Genes is Regulated in a Variety of Ways
1.5.5 Attenuation as a Means of Controlling Expression of the pheS, T Operon
1.6 Structural and Functional Homologies Between Synthetases
1.6.1 Homologies at the Level of the Primary Sequence
1.6.2 Dissection of Synthetase Structure with Respect to Function
1.7 Towards a Study of the Functional Arrangement of the Valyl-tRNA Synthetase

CHAPTER 2. EXPERIMENTAL: GENERAL RECOMBINANT DNA METHODS
2.1 Media
2.2 Bacterial Strains
2.3 Enzymes
2.4 Cloning Vectors
2.5 Restriction Enzyme Digestions
2.5.1 Conditions for Digestion of Plasmid DNA
2.5.2 Separation of Restriction Fragments through Agarose Gels
2.5.3 Purification of DNA Fragments
2.6 Preparation of Phosphatased Vector DNA
2.7 B. stearothermophilus Gene Library
2.7.1 Construction of the Gene Library
2.7.2 Amplification of the Gene Library
2.8 Purification of Plasmid DNA
2.8.1 Large-scale Plasmid Preparation
2.8.2 Small-scale Plasmid Preparation
2.9 Ligations
2.10 Repairing Cohesive Ends with T4 DNA Polymerase
2.11 Digestion of Linearised DNA with Nuclease Bal31
2.12 Transformation of Competent E. coli
2.12.1 Preparation of Competent Cells
2.12.2 Transformation of E. coli DH5
2.12.3 Transformation of E. coli 236c

CHAPTER 3. EXPERIMENTAL: KINETIC ASSAYS AND PROTEIN PURIFICATION
3.1 Enzyme Assays
3.1.1 Materials
3.1.2 Active-Site Titration
3.1.3 Aminoacylation (Charging) Assay
3.2 SDS-Polyacrylamide Gel Electrophoresis
3.3 Purification of Valyl-tRNA Synthetase
3.3.1 A Modified Active-Site Titration for ValRS
3.3.2 Cell Culture and Preliminary Purification of ValRS
3.3.3 Precipitation with Ammonium Sulphate
3.3.4 DEAE-Sephacel Chromatography
3.3.5 FPLC-Chromatography
3.3.6 A Second Heat Step, Concentration and Storage of the Purified Enzyme
3.3.7 Discussion
3.4 Amino Acid Sequencing
3.5 Amino Acid Composition Determination
3.6 Determination of the Number of Tryptophan Residues in ValRS

CHAPTER 4. CLONING OF THE B. stearothermophilus valS GENE
4.1 Introduction
4.2 Cloning of the valS Gene from B. stearothermophilus by Complementation of a Temperature-Sensitive E. coli Strain
4.2.1 Amplification of the B. stearothermophilus Library in the Mutant Strain
4.2.2 Selection for valS Plasmids by Complementation
4.2.3 Characterisation and Sizing of pNB Plasmids
4.2.4 Discussion
4.3 Subcloning of valS from pNB1
4.3.1 Introduction
4.3.2 Subcloning Strategy
4.3.3 Selection of valS Subclones by Complementation
4.3.4 Discussion
4.4 Restriction Analysis of pNB2.1 and pNB1
4.4.1 Introduction
4.4.2 Restriction Map of pNB2.1
4.4.3 A Restriction Map for pNB1
4.4.4 Discussion
4.5 Subcloning of pNB1 - Selection of pTB8
4.5.1 Introduction
4.5.2 Subcloning of a 3.6 kb PstI Fragment Containing valS into M13
4.5.3 Discussion

CHAPTER 5. ANALYSIS OF THE CLONED valS GENE PRODUCT
5.1 Introduction
5.2 Thermostability of the Cloned B. stearothermophilus ValRS
5.2.1 Introduction
5.2.2 Results and Discussion
5.3 Analysis of Cloned ValRS by SDS-Polyacrylamide Gel Electrophoresis
5.4 Kinetic Measurements for Purified B. stearothermophilus ValRS
5.4.1 Introduction and Methods
5.4.2 Results and Discussion
5.5 Expression of valS Cloned into M13
5.5.1 Introduction
5.5.2 Preliminary Aminoacylation Results
5.5.3 Orientation of M13 valS Clones
5.5.4 The Class II valS Clones are Induced by IPTG
5.5.5 Discussion
5.6 Conclusions

CHAPTER 6. EXPERIMENTAL: DNA SEQUENCING
6.1 Introduction
Dideoxy Chain-termination Sequencing
The Use of M13 Vectors in Chain-Termination Sequencing
Shotgun Cloning and Sequencing in M13
Sequencing of valS by the Chain-termination Method
Introduction
Materials
Bacterial Strains
Phage Vectors
Enzymes
Radiochemicals
Sequencing Primers
Nucleoside Triphosphates
Cloning of Sonicated DNA Fragments
Preparation of DNA Fragments
Sonication
Repair of Sheared Termini
Fractionation of 300-700 bp Fragments
Ligations
Preparation of Competent Cells
Transformation of Competent Cells
Preparation of Template DNA
Dideoxy Sequencing with [α-32P] dATP
Separation of Radiolabelled Polynucleotides by Electrophoresis
Fixing and Autoradiography
Modifications for Sequencing with [α-35S] dATP
Modifications for Sequencing with dITP
Double-stranded Sequencing
Computer Software

CHAPTER 7. ANALYSIS OF THE B. stearothermophilus valS DNA SEQUENCE
Introduction
Analysis of the valS Transcription Unit
Nucleotide Sequence of valS
7.2.2 Transcription Promoter and Termination Signals for valS
7.2.3 Ribosome Binding Site for valS
7.3 Codon Usage
7.3.1 Introduction
7.3.2 Codon Preferences of valS and Comparison to other Bacillus spp. Aminoacyl-tRNA Synthetase Genes
7.3.3 Bias Towards G or C in the Wobble Position of the Codon in Thermophilic Synthetase Genes
7.3.4 Comparison Between the Codon Usage of Synthetases from B. stearothermophilus and E. coli
7.3.5 The B. stearothermophilus valS Shows Codon Preferences that Resemble those of a Strongly Expressed E. coli Gene
7.3.6 Discussion
7.4 Corroboration of the valS Sequence by Protein Chemistry
7.4.1 N-terminal Sequencing and Amino Acid Analysis of the Cloned ValRS
7.4.2 Spectrophotometric Determination of the Number of Tryptophan Residues in ValRS

CHAPTER 8. PROTEIN SEQUENCE HOMOLOGIES BETWEEN ValRS AND OTHER AMINOACYL-tRNA SYNTHETASES
8.1 Introduction
8.2 The B. stearothermophilus ValRS Does Not Contain Internal Repeats
8.3 Matrix Comparison of ValRS with E. coli IleRS and other Synthetases
8.3.1 Introduction
8.3.2 Homologies Between IleRS and ValRS
8.3.3 Homologies Between Different Synthetases
8.3.4 Discussion
8.4 Extensive Localised Homologies Between ValRS and IleRS
8.4.1 Introduction
8.4.2 Alignment of IleRS and ValRS Primary Sequences
8.4.3 Discussion
8.5 The HIGH Region
8.6 The KMSKS Region
8.7 Conclusions on Common Elements of Structure Between Synthetases

CHAPTER 9. FUTURE PROSPECTS FOR PROTEIN ENGINEERING OF ValRS
9.1 Site-Directed Mutagenesis
9.2 Deletion Mutagenesis and Other Studies

APPENDICES
1 Abbreviations
2 Complete DNA Sequencing Database

REFERENCES

LIST OF FIGURES

3-Dimensional structure of Yeast tRNAPhe
Schematic representation of the binding sites of IleRS and ValRS
Preferential binding of the non-cognate aminoacyl-tRNA in an editing site
Alternative secondary structures for the attenuator of the E. coli trp operon
Comparison of homologies in the amino-termini of four aminoacyl-tRNA synthetases
Deletion of the C-terminal 100 residues of B. stearothermophilus TyrRS
DEAE-Sephacel chromatography of ValRS
FPLC-purification of ValRS
SDS-Polyacrylamide gel showing purification of ValRS
A second round of transformation allows complementation to be distinguished from chromosomal reversion
Sizing pNB plasmids
Preliminary restriction analysis of pNB1
Subcloning strategy for pNB1
Restriction analysis of valS subclones
Restriction maps of pNB2.1 and pAT153
Comparison of the restriction maps of pNB1 and pNB2.1
Comparison of the restriction maps of pNB1 and pTB8
Thermostability of ValRS from different sources
Analysis of crude cell-lysates of E. coli 236c carrying the cloned B. stearothermophilus valS gene
Determination of KM (ATP)
Determination of KM (val) and kcat
Comparison of the orientation of the valS gene in pTB8 with that in M13
5.6 Analysis of induction of M13 valS clones by IPTG by SDS-PAGE
5.7 DNA sequence of a Class II M13 clone
6.1 Polylinker sequences of M13 vectors mp8 and K8.2
7.1 Extent of sequencing of valS
7.2 DNA sequence of the B. stearothermophilus valS gene and adjacent regions
7.3 Comparison of restriction map for the 3.6 kb PstI fragment of pTB8 with that predicted from the DNA sequence
8.1 2-Dimensional matrix comparison of ValRS protein sequence against itself
8.2 Matrix comparison of ValRS versus IleRS
8.3 Homology plots for ValRS against itself and IleRS at a different program stringency
8.4 Matrix homology comparisons of ValRS with other synthetases
8.5 Comparison of ValRS with Bst or E. coli TyrRS
8.6 Comparison of E. coli and yeast MetRS versus ValRS
8.7 Example of effect of changing stringency of the matrix upon homologies between two enzymes
8.8 Hydropathicity plots for ValRS and IleRS
8.9 Comparison of the primary sequences of ValRS and IleRS
8.10 Alignment of 12 aminoacyl-tRNA synthetases around the HIGH sequence
8.11 Schematic view of the transition state for the formation of tyrosyl adenylate by B. stearothermophilus TyrRS
8.12 Alignment of the α-carbon backbones of B. stearothermophilus TyrRS and E. coli MetRS in the Rossmann fold
8.13 Alignment of 11 aminoacyl-tRNA synthetases in the KMSKS region
8.14 Alignment of lysine residues implicated in binding the 3' acceptor stem of tRNA

LIST OF TABLES

1.1 The Genetic Code
1.2 Distances Between the Structural Gene and Startpoint of Transcription for a Number of Synthetase Genes
2.1 Summary of Strains
2.1 Composition of Buffers
3.1 Summary of a Typical ValRS Purification
4.1 Growth of E. coli 236c Cells Transformed with B. stearothermophilus Gene Library
5.1 Comparison of Aminoacylation Kinetics for ValRS Isolated from B. stearothermophilus and E. coli
5.2 Rates of Aminoacylation for M13 valS Clones
5.3 Comparative Aminoacylation Activities for IPTG-Induced or Non-induced valS Clones
6.1 Oligonucleotide Primers for Sequencing the Entire valS Gene from One Strand
7.1 Codon Usage of 4 Bacillus spp. Aminoacyl-tRNA Synthetases
7.2 Comparison of the G+C Content of the B. stearothermophilus valS with other Thermophilic and Mesophilic Genes
7.3 Comparison of the Average Codon Usage for 3 B. stearothermophilus and 5 E. coli Aminoacyl-tRNA Synthetases
7.4 Comparison of Codon Usage for Strongly and Weakly Expressed Genes in E. coli
7.5 Cellular Levels of Aminoacyl-tRNA Synthetases in E. coli
7.6 Amino Acid Analysis of B. stearothermophilus ValRS
7.7 Tryptophan Contents of ValRS, TyrRS and Ribonuclease A

CHAPTER 1

INTRODUCTION

1.1. Reasons for Studying the Aminoacyl-tRNA Synthetases

The aminoacyl-tRNA synthetases occupy a central position in the process of protein synthesis, catalysing the esterification of a transfer RNA molecule (tRNA) with its cognate amino acid. This is a critical step with regard to the overall fidelity of protein synthesis. In the bacterial cell, tRNA is found predominantly in the aminoacylated form and is complexed to elongation factor-Tu (EF-Tu) (Mulvey and Fersht, 1977b). This complex, which is also bound to guanosine triphosphate (GTP), diffuses into the entry or A-site of the ribosome and, assuming that the permissible Watson-Crick hydrogen bonds are made between the messenger RNA codon and the anticodon present on the tRNA (Crick, 1966), the GTP is cleaved and EF-Tu-GDP is released. A peptide bond is formed between the amino acid in the A-site and the carboxy (C) terminal residue of the polypeptidyl-tRNA (located in the donor (P) site on the ribosome), i.e., the polypeptide is transferred onto the incoming aminoacyl-tRNA. This event is followed by translocation, in which the ribosome advances to the next codon in the message, so that the polypeptidyl-tRNA is once more positioned in the P-site and the A-site becomes available for the next incoming aminoacyl-tRNA.

One of the reasons for studying aminoacyl-tRNA synthetases (aaRSs) is that the synthetase interacts with three substrates, one of which (tRNA) is a macromolecule. The interactions between tRNAs and synthetases have been studied exhaustively (Section 1.3.5). The enzyme also has to select its cognate amino acid from a number of often structurally related amino acids. The ways in which aaRSs discriminate in favour of the correct amino acid will be dealt with in Section 1.4.

The tyrosyl-tRNA synthetase (TyrRS) from the thermophilic bacterium Bacillus stearothermophilus has proved to be an ideal model for answering some fundamental problems of enzyme kinetics and illustrates how molecular biology has revolutionised the study of the synthetases in general. The enzyme has been crystallised and its 3-dimensional structure was solved at 3.0 Å resolution (Bhat et al, 1982). The structure of the enzyme-tyrosyl adenylate complex was also solved (Rubin and Blow, 1981). The primary sequence of TyrRS was derived from the DNA sequence of the cloned tyrS gene encoding the enzyme and was confirmed, in part, by protein sequencing (Winter et al, 1983). The tyrS gene was originally cloned from a plasmid library containing fragments that were representative of the entire B. stearothermophilus genome. This was achieved by complementing a temperature-sensitive lesion in the tyrS gene of an Escherichia coli mutant strain (Barker, 1982). Subsequently, a number of synthetase genes have been cloned by this approach (Section 1.5.2). The TyrRS has become a prototype for protein engineering studies (reviewed by Fersht et al, 1986), allowing individual residues around the active site of the molecule to be changed by using oligonucleotides to introduce precise mutations into the cloned tyrS gene (Winter et al, 1982). The enzyme has been shown to catalyse the formation of tyrosyl adenylate by utilising binding energy alone (Wells and Fersht, 1986). This is an example of Pauling’s "strain" theory of catalysis (Pauling, 1946), where the reaction is brought about via a lowering in activation energy due to an increase in binding energy. Other studies on TyrRS by protein engineering have emphasised the biological significance of the hydrogen bond, allowing a quantitative assessment of the contribution of different amino acid side-chains as hydrogen bond donors or acceptors (Fersht et al, 1985a).

Finally, a major point of interest is that this group of enzymes, each of which catalyses the same basic reaction, shows a surprising diversity with respect to size and structure (reviewed by Schimmel and Söll, 1979 and Joachimiak and Barciszewski, 1980). This topic, and a review of opinions relating to the evolutionary origins of the synthetases, will be dealt with in Section 1.3.2.

1.2. The Aminoacyl-tRNA Synthetases: a Historical Background

The principle that underlies the synthesis of all proteins is that "DNA makes RNA makes Protein" - the central dogma proposed by Crick in 1955 that requires the presence of an RNA molecule (subsequently called messenger or mRNA) to carry information from the DNA to the protein (Crick, 1958). All the information pertaining to a particular protein is contained within DNA as a gene and is transcribed into mRNA by RNA polymerase. The RNA then directs the synthesis of the protein on the ribosomes (originally identified as ribonucleoprotein complexes found in the soluble protein fraction of cells). The direction of flow of information is irreversible and applies to all organisms, both prokaryotic and eukaryotic, with the exception of the retroviruses. This group of viruses possesses a single-stranded RNA genome, but uses a viral-encoded polymerase (reverse transcriptase) to make a double-stranded DNA copy of the genome which is subsequently integrated into the host genome (Temin and Baltimore, 1972). The late 1950s and early 1960s were a tremendously exciting period as regards the elucidation of the fundamental processes governing protein synthesis. It is worthwhile considering some of the events that surrounded the discovery of the aminoacyl-tRNA synthetases.

DNA is constructed from four deoxyribonucleotides (the monophosphate esters of the deoxyribonucleosides), which differ from each other in the type of nitrogenous base attached to the 1’ carbon of the deoxyribose ring; these are either adenine (A), guanine (G), cytosine (C) or thymine (T). The information pertaining to the primary structure of the product of a gene is coded by the sequence of the four bases. This code was shown to be transcribed into an RNA copy (messenger RNA), with the replacement of thymine by uracil (U), and decoded by a ribonucleoprotein complex in the soluble cell fraction as a set of non-overlapping triplets. These results were obtained in an elegant series of experiments in the early sixties. Nirenberg and Matthaei (1961) demonstrated that a synthetic messenger comprised of only uracil residues - poly(U) - directed the synthesis of polyphenylalanine in vitro. The studies were extended to show that the trinucleotide UGU and poly-UG directed the synthesis of polyvaline, but that dinucleotides could not, demonstrating the existence of a non-overlapping triplet code (Leder and Nirenberg, 1964). Such a code had been proposed originally by Crick et al (1961), on the basis of the effect of frameshift mutations that were introduced by acridines into the rII gene of coliphage T4. With an alphabet of four bases and a selection process based upon reading any three consecutive bases, this meant that the 20 amino acids found in the cell could be coded for by 4^3 or 64 separate triplets. A ribosome binding assay was developed that allowed for the assignment of amino acids to their triplet codes or "codons" (Nirenberg and Leder, 1964). This was achieved by directing the synthesis of a radioactively labelled polypeptide from a messenger of known sequence and trapping the polypeptide whilst still bound to the ribosome, as the latter could be irreversibly bound to nitrocellulose. This work showed that a number of codons could code for the same amino acid.
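The counting argument above (four bases read three at a time give 64 triplets for 20 amino acids) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name read_triplets is hypothetical and is not taken from the work cited.

```python
from itertools import product

BASES = "UCAG"  # the four bases of messenger RNA

# Every ordered triplet of bases is a possible codon: 4**3 = 64 of them.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def read_triplets(message):
    """Split a messenger sequence into non-overlapping triplets.

    Hypothetical helper for illustration; trailing bases that do not
    fill a complete triplet are ignored.
    """
    usable = len(message) - len(message) % 3
    return [message[i:i + 3] for i in range(0, usable, 3)]


print(len(codons))                 # 64
print(read_triplets("UUUUUUUUU"))  # ['UUU', 'UUU', 'UUU'], as in poly(U)
```

Since 64 triplets are available for only 20 amino acids, some amino acids must be specified by more than one codon, which is the degeneracy revealed by the ribosome binding assay.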

In 1955, Crick had presented his "adaptor" hypothesis - that a specific molecule must exist to adapt the information contained in the RNA sequence of the messenger to the protein (Crick, 1958). The existence of this adaptor molecule - the transfer RNA (tRNA) - was shown in a number of experiments that developed in parallel with the discovery of specific enzymes that transfer an amino acid onto a tRNA molecule. Hoagland and co-workers had demonstrated that L-amino acids could be "activated" to aminoacyl adenylates that were associated with specific proteins in the soluble protein fraction of rat liver cells (Hoagland et al, 1956). This activation required the presence of ATP. At the same time, de Moss and Novelli showed that a synthetic aminoacyl adenylate, when incubated with inorganic pyrophosphate and a soluble protein cell fraction, gave rise to the formation of ATP, indicating the reversibility of the activation process (de Moss and Novelli, 1956). Hoagland’s group went on to demonstrate the existence of the adaptor molecule (transfer or tRNA) in 1958. The microsomal fraction of rat liver cells contained a ribonucleoprotein complex (the ribosome) that contained a high molecular weight RNA component (Zamecnik et al, 1958). The cell cytosol also contained a low molecular weight RNA, which was named soluble or sRNA. When the extract was incubated with [14C] leucine, the label was found to be associated with the sRNA. This complex could be extracted by phenol and precipitated, then introduced into a mouse ascites microsomal fraction where the label was incorporated into polypeptides (Hoagland et al, 1958b). Subsequently, the sRNA (tRNA) was shown to possess certain characteristics that marked it as an intermediate, notably that it was labelled transiently with [14C] leucine before the label could be found to be associated with the microsomal ribonucleoprotein (Hoagland et al, 1958a).

Meanwhile, Berg and Offengand quantitated the ratio of activation of valine to its transfer to tRNA (charging) by using a sonicated cell-free extract of E. coli. They concluded that activation and charging were catalysed by one enzyme, that the reaction was dependent on the presence of ATP and that it was stimulated by magnesium ions (Berg and Offengand, 1958). A further line of evidence for the function of sRNA as the adaptor came in 1962. Cysteinyl-tRNACys could be desulphydrated in the presence of Raney nickel, producing alanyl-tRNACys (Chapeville et al, 1962). When added to a cellular protein fraction containing a synthetic message that coded for the synthesis of polycysteine, the alanyl-tRNACys directed the synthesis of polyalanine instead. These results constituted the proof that each amino acid is recognised by a cognate transfer RNA molecule and that a specific ligase (now known as the aminoacyl-tRNA synthetase) is responsible for linking the two molecules together.

1.3. General Properties of the Aminoacyl-tRNA Synthetases

1.3.1. The Synthetases Catalyse a Fundamental Reaction

The aminoacylation of a given tRNA is mediated by the cognate aminoacyl-tRNA synthetase in a two-step reaction (equations 1.1 and 1.2). The first step, known as activation, is the adenylation of the amino acid to form an enzyme-bound mixed anhydride (Berg, 1958). The reaction proceeds by the nucleophilic attack of the aminoacyl carboxylate group on the 5' side of the ATP α-phosphate. This is an SN2 reaction, proceeding via a trigonal bipyramidal pentacoordinate intermediate (Leatherbarrow et al, 1985) with the resulting expulsion of pyrophosphate and the formation of aminoacyl adenylate. This may be written as

\[ \mathrm{E + ATP + AA \longrightarrow E{\cdot}AA\text{-}AMP + PP_i} \qquad (1.1) \]

where AA represents a particular amino acid, E represents the cognate aaRS, PPi is the pyrophosphate leaving group and E·AA-AMP is the enzyme-bound aminoacyl adenylate (Fersht and Jakes, 1975). The reaction is independent of the presence of tRNA for most synthetases.

Exceptions to this rule are arginyl-tRNA synthetase (ArgRS), glutamyl-tRNA synthetase (GluRS) and glutaminyl-tRNA synthetase (GlnRS), which will only activate the cognate amino acid once the correct tRNA has bound to the synthetase (Söll and Schimmel, 1974). For most synthetases, the adenylate is formed in the absence of tRNA and the enzyme-adenylate complex may be separated from the starting reagents by molecular sieve chromatography (Norris and Berg, 1964). The reaction in equation 1.1 is reversible, allowing many aaRSs to be assayed by the pyrophosphate exchange method (Calendar and Berg, 1966). The aaRS is mixed with ATP, the cognate amino acid and [32P]-labelled sodium pyrophosphate. Aminoacyl adenylate is formed and unlabelled pyrophosphate is released, whereupon it may be replaced by the labelled pyrophosphate and the backward reaction produces [32P] ATP. The ATP is sequestered on activated charcoal in perchloric acid and trapped by filtration, and the level of radioactivity is determined by scintillation counting. Activation requires the presence of magnesium ions, one mol of Mg2+ binding per mol of enzyme and interacting with the β- and γ-phosphate groups (Fersht, 1985). Some synthetases, notably those isolated from the thermophilic bacterium Thermus thermophilus, appear to bind zinc ions (Kohda et al, 1984). The methionyl-tRNA synthetase (MetRS) from E. coli has also been reported to bind two mols of Zn2+ per mol of enzyme, in addition to Mg2+ (Posorske et al, 1979).

The second part of the aminoacylation reaction is the transfer of the amino acid to its cognate tRNA. This reaction (Fersht and Jakes, 1975) may be summarised as follows

\[ \mathrm{E{\cdot}AA\text{-}AMP + tRNA \longrightarrow AA\text{-}tRNA + AMP + E} \qquad (1.2) \]

The acceptor site for the amino acid can be either the 2’ or 3’ hydroxyl of the ribose ring of the 3’ terminal adenosine residue of tRNA (Hecht, 1979).
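As a bookkeeping aid, the two steps can be added together. The enzyme-bound aminoacyl adenylate and the free enzyme cancel, leaving the overall stoichiometry of aminoacylation. The summed line below is derived here from equations 1.1 and 1.2 and carries no equation number of its own.

```latex
\begin{align*}
\mathrm{E + ATP + AA} &\longrightarrow \mathrm{E{\cdot}AA\text{-}AMP + PP_i} && (1.1)\\
\mathrm{E{\cdot}AA\text{-}AMP + tRNA} &\longrightarrow \mathrm{AA\text{-}tRNA + AMP + E} && (1.2)\\
\mathrm{AA + ATP + tRNA} &\longrightarrow \mathrm{AA\text{-}tRNA + AMP + PP_i} && \text{(sum)}
\end{align*}
```

One ATP is therefore split to AMP and pyrophosphate for each amino acid esterified to tRNA.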

Most prokaryotic organisms contain one aaRS specific for each amino acid. An exception is GluRS from Bacillus subtilis, which charges both tRNAGlu and tRNAGln with glutamate. Glu-tRNAGln is subsequently converted to Gln-tRNAGln by a specific transamidation (Wilcox, 1969).

The kinetics of the synthetases have been studied intensively for a number of years. Yarus and Berg (1970) found that the amount of active E. coli IleRS could be quantified in vitro in the presence of inorganic pyrophosphatase, which prevented the back reaction of equation 1.1. In the absence of tRNA, the stoichiometric build-up of enzyme-bound [14C] isoleucyl adenylate can be followed by isolating the enzyme-adenylate complex on nitrocellulose filters. The method has since been adopted as a general method for determining the active concentration of a particular aaRS. The kinetics of the charging reaction (equation 1.2) may be determined by incubating the enzyme in the presence of ATP, tRNA, inorganic pyrophosphatase and the radiolabelled cognate amino acid (Fersht and Kaethner, 1976). The formation of aminoacyl-tRNA is measured by removing aliquots of the reaction mix at regular intervals and quenching them with trichloroacetic acid. The acid-insoluble aminoacyl-tRNA is subsequently collected by filtration.

These assays apply to the study of steady state kinetics. The various methods that are available for determining the kinetics of synthetases in the pre-steady state will not be discussed here.
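The arithmetic behind the two steady-state assays can be sketched as follows. This is a minimal illustration, not the procedure of the cited papers: the function names, the background correction and all numerical values are assumptions made for the example. It assumes one mol of adenylate trapped per mol of active enzyme, and a linear early time course for the charging assay.

```python
def active_enzyme_conc_uM(filter_cpm, background_cpm, cpm_per_pmol, sample_volume_ul):
    """Active-site titration: active enzyme concentration in micromolar.

    Assumes 1 mol of radiolabelled adenylate is retained on the filter per
    mol of active enzyme; cpm_per_pmol is the specific activity of the
    labelled amino acid. All argument names are hypothetical.
    """
    pmol_adenylate = (filter_cpm - background_cpm) / cpm_per_pmol
    return pmol_adenylate / sample_volume_ul  # pmol per microlitre equals micromolar


def initial_rate(times_s, product_pmol):
    """Charging assay: least-squares slope of product formed against time."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_p = sum(product_pmol) / n
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_s, product_pmol))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx  # pmol of aminoacyl-tRNA per second


# Invented numbers, for illustration only.
print(active_enzyme_conc_uM(5200, 200, 100.0, 50.0))       # 50 pmol in 50 microlitres
print(initial_rate([0, 30, 60, 90], [0.0, 3.0, 6.0, 9.0]))  # slope of a straight line
```

Dividing the initial rate by the amount of active enzyme determined in the titration would give a turnover number, which is why the two assays are normally used together.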

One kinetic phenomenon that applies to the dimeric TyrRS and MetRS of B. stearothermophilus is known as "half-of-the-sites" reactivity. The enzymes possess two active sites (one per subunit), but only use one of the sites productively. In the case of TyrRS, the enzyme binds 1 mol of tyrosine per mol of enzyme dimer and adenylates it with a rate constant (kcat) of 18 s-1 (Fersht et al, 1975). A second mol of tyrosyl adenylate is formed on the other subunit, but with a rate constant that is a factor of 10^ times slower than that for formation of the first mol of adenylate. Similarly, the rate constants for the formation of the first and second mols of methionyl adenylate by E. coli MetRS are 29 and 0.06 s-1 respectively (Mulvey and Fersht, 1976). The significance of these results with respect to a possible conformational change in the structure of the enzyme remains unclear. One possibility is that the slow formation of the "second" mol of adenylate precedes the rapid turnover of the "first" mol of adenylate bound to the other subunit, mediated by some structural change (Fersht et al, 1975).
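To give a feel for the scale of half-of-the-sites reactivity, the two rate constants quoted above for methionyl adenylate formation (29 and 0.06 s-1) can be compared directly. The sketch assumes simple first-order kinetics for each site, which is an assumption of this example rather than a statement from the cited work.

```python
import math

k_first = 29.0    # s^-1, formation of the first mol of methionyl adenylate
k_second = 0.06   # s^-1, formation of the second mol


def half_time(k):
    """Half-time in seconds of a first-order process with rate constant k."""
    return math.log(2) / k


ratio = k_first / k_second
print(f"first site is about {ratio:.0f} times faster")
print(f"half-time, first site:  {half_time(k_first):.3f} s")
print(f"half-time, second site: {half_time(k_second):.1f} s")
```

On these figures the first site fills within tens of milliseconds, whereas the second takes of the order of ten seconds.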

1.3.2. Structural Properties of the Synthetases

The data concerning the structure of aminoacyl-tRNA synthetases from a variety of organisms have been well documented by Schimmel and Söll (1979) and Joachimiak and Barciszewski (1980). Most of the collated information concerns prokaryotic aaRSs, but a number of studies have been undertaken on eukaryotic aaRSs, both from animals and plants. The eukaryotic enzymes tend to exist only as multi-enzyme complexes with Mr values in excess of 300,000 (Dang and Dang, 1983). The significance of these aggregates is unclear, but it appears that a multi-enzyme complex may contribute to the thermostability of the aaRS (Dang, 1982). The prokaryotic synthetases, and those from the lower eukaryote Saccharomyces cerevisiae (yeast), are easier to purify and characterise and only these will be considered henceforth.

The sizes of the prokaryotic aminoacyl-tRNA synthetases vary considerably. In E. coli, values for Mr range from 54,000 for GluRS to 380,000 for alanyl-tRNA synthetase (AlaRS) (Breton et al, 1986; Putney et al, 1981 respectively). This is due to the synthetases possessing four basic tertiary or quaternary structures. These are classified as single chain monomers (α), dimers of identical subunits (α2), tetramers of identical subunits (α4) or tetramers of non-identical subunits (α2β2). E. coli AlaRS is an example of the third group, whilst phenylalanyl-tRNA synthetase (PheRS) from the same bacterium is an example of the α2β2 class with subunit sizes of Mr = 38,000 (α) and 96,000 (β) (Joachimiak and Barciszewski, 1980). The dimeric tyrosyl-tRNA synthetases from E. coli and B. stearothermophilus have subunits of about Mr 47,000 (Calendar and Berg, 1966; Koch et al, 1974 respectively). There are considerable differences in the sizes of the protomers within each class, a fact witnessed by the disparity in the sizes of the monomeric α class. One member of this group, E. coli GluRS, represents the smallest synthetase yet known in terms of quaternary structure, with a protomer comprising 471 amino acids (Breton et al, 1986). However, this is not the smallest synthetase protomer - the tryptophanyl-tRNA synthetases (TrpRS) are small dimeric enzymes and the protomer of the E. coli enzyme (subunit Mr = 37,000) contains only 334 residues (Hall et al, 1982). The largest α synthetases are those whose amino acid substrates have branched aliphatic side-chains - the isoleucyl-, leucyl- and valyl-tRNA synthetases (IleRS, LeuRS and ValRS, respectively). The bacterial forms of these enzymes vary between Mr = 100,000-115,000 in size (Schimmel and Söll, 1979).

1.3.3. Implications for the Evolution of Synthetases

One of the fundamental questions concerning the structural diversity of the synthetases is whether they evolved from a common progenitor or not. These enzymes are so closely linked to the flow of genetic information from the messenger RNA to the protein that they must represent one of the oldest classes of enzymes. Crick proposed that the primitive pool of amino acids must have been smaller than it is today (Crick, 1968). The genetic code has probably always been based on a triplet code with 64 possible codons (a change in the number of bases in a codon at some point would have had dramatic evolutionary consequences), so there would have been a wider pattern of codon recognition for fewer amino acids by their tRNAs. As examples, Crick cited the observation that tryptophan and methionine are coded by one codon each (Table 1.1), implying that those amino acids may have arisen at a later stage than the earliest amino acids and that their tRNAs had evolved through the mutation of existing tRNA species. Also, it is apparent from Table 1.1 that chemically similar amino acids occupy specific positions with respect to their codons. The hydrophobic amino acids with branched aliphatic side chains - leu, met, ile and val - are found exclusively on the extreme left of the Table, with a consensus sequence for the codon of XUY (where X and Y are the first and third bases in the codon, respectively). Those residues with aromatic side chains - phe, tyr and trp - are coded by triplets beginning with U, serine and threonine are coded by XPyY (where Py represents a pyrimidine) and so on. Crick’s "central dogma" must necessarily have begun as a rudimentary association between polynucleotides and amino acids, predating the existence of any complex proteins such as enzymes (Orgel, 1968). As new amino acids arose, new tRNA species would have evolved to recognise them, possibly, as mentioned earlier, by mutation of existing species. In parallel, new tRNA-activating enzymes developed to recognise the new tRNAs. Crick postulated that tRNAs may have originally fulfilled some of the functions of the synthetase, not least in the recognition of the amino acid. Certainly, as adaptor molecules they are highly likely to be older than enzymes. Dayhoff and McLaughlin (1972) speculated that the present set of tRNAs arose from a common ancestral "proto-tRNA". The evolution of a set of simple ligases to recognise specific amino acids and link them to a cognate tRNA would surely have been a necessity if the fidelity of protein synthesis was to be improved so as to allow the evolution of more advanced organisms with a larger repertoire of proteins.

Table 1.1

The Genetic Code

1st     2nd base                          3rd
base    U       C       A       G         base

U       PHE     SER     TYR     CYS       U
        PHE     SER     TYR     CYS       C
        LEU     SER     Ochre   ?         A
        LEU     SER     Amber   TRP       G

C       LEU     PRO     HIS     ARG       U
        LEU     PRO     HIS     ARG       C
        LEU     PRO     GLN     ARG       A
        LEU     PRO     GLN     ARG       G

A       ILE     THR     ASN     SER       U
        ILE     THR     ASN     SER       C
        ILE     THR     LYS     ARG       A
        MET     THR     LYS     ARG       G

G       VAL     ALA     ASP     GLY       U
        VAL     ALA     ASP     GLY       C
        VAL     ALA     GLU     GLY       A
        VAL     ALA     GLU     GLY       G

The Table, taken from Crick (1968), denotes the genetic code. The first base in a codon is read from the left-hand vertical column, the second base from the top row and the third base from the right-hand column. The amino acid corresponding to each codon is shown to the left of the codon. The standard three-letter code for amino acids is used: Ala, alanine; Arg, arginine; Asn, asparagine; Asp, aspartic acid; Cys, cysteine; Gln, glutamine; Glu, glutamic acid; Gly, glycine; His, histidine; Ile, isoleucine; Leu, leucine; Lys, lysine; Met, methionine; Phe, phenylalanine; Pro, proline; Ser, serine; Thr, threonine; Trp, tryptophan; Tyr, tyrosine, and Val, valine. Ochre, amber and ? represent the translational termination codons UAA, UAG and UGA respectively.
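The positional patterns noted in the text can be checked directly against the code in Table 1.1. The following is a minimal Python sketch written for illustration only; the names BASES, AMINO_ACIDS, CODON_TABLE, amino_acids_with_second_base and codon_count are assumptions of this example, and the three termination codons are all written as "Stop" rather than as ochre, amber and ?.

```python
from itertools import product

# Base order used in Table 1.1 for all three codon positions.
BASES = "UCAG"

# Amino acids for the 64 codons in the order UUU, UUC, UUA, UUG, UCU, ...
# (first base varies slowest, third base fastest).
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

# Map each codon (e.g. "AUG") to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acids_with_second_base(base):
    """Return the set of amino acids encoded by codons of the form X<base>Y."""
    return {aa for codon, aa in CODON_TABLE.items() if codon[1] == base}


def codon_count(amino_acid):
    """Return the number of codons assigned to the given amino acid."""
    return sum(1 for aa in CODON_TABLE.values() if aa == amino_acid)


if __name__ == "__main__":
    print(sorted(amino_acids_with_second_base("U")))
    print(codon_count("Trp"), codon_count("Met"))
```

Run as a script, this lists the amino acids of the XUY column (Phe, Leu, Ile, Met and Val) and confirms that tryptophan and methionine have one codon each.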

Waterson and Konigsberg (1974) believed that there were grounds for assuming that all the aminoacyl-tRNA synthetases had arisen from a common ancestor. As evidence, they cited the heterologous charging of tRNAs from other bacteria and blue-green algae by E. coli aaRSs. The sequences of over 300 tRNAs imply that the tertiary structure of all tRNAs is practically the same (Sprinzl et al, 1985), so the cross-charging, readily observed for bacterial and yeast tRNA species, is not so surprising as originally thought. Another example cited was the observation of Myers et al (1971) that the values of KM for tRNA binding were similar for seryl-tRNA synthetase (SerRS), LeuRS and ValRS of E. coli, each enzyme having been assayed with respect to the binding of several of its isoaccepting tRNA species. Discrimination against misacylation appears to depend upon the rate of charging, rather than the strength of the binding of a tRNA to a particular synthetase (Ebel et al, 1973; Section 1.3.5). Waterson and Konigsberg’s own work on the reason for the structural diversity of aaRSs centred around the proposal that a value of Mr = 50,000 could be assigned to most synthetase protomers and that those enzymes with protomers nearly double this size (i.e., LeuRS, IleRS and ValRS) had arisen by gene duplication (Waterson and Konigsberg, 1974). This was suggested by the apparent occurrence of extensive repeats within the primary sequences of synthetases with large protomers.

1.3.4. Internal Repetitive Sequences

The presence of repeating elements within aminoacyl-tRNA synthetases had originally been proposed for the following enzymes: E. coli IleRS (Mr = 112,000: Kula, 1973); E. coli LeuRS (Mr = 105,000: Waterson and Konigsberg, 1974), and LeuRS, MetRS and ValRS from B. stearothermophilus (Mr = 110,000, 66,000 and 110,000 respectively: Koch et al, 1974). All these synthetases exist in vivo as large single chain monomers, with the exception of the MetRS, which is dimeric.

Koch and colleagues cleaved the ValRS at a site that is hypersensitive to trypsin, creating two unequal fragments of 40 and 67 kDa respectively (Koch et al, 1974). Each fragment was purified, cleaved with trypsin and the peptides were subjected to 2-dimensional mapping. The holoenzyme was also treated in the same manner. After staining, certain peptide spots were found to occupy equivalent positions in all three maps. Furthermore, the recovery of certain peptides (based on the amino acid composition after acid-hydrolysis) was half of the expected value, implying that these peptides were present at a molar ratio of roughly 2:1 with respect to the holoenzyme. The inference that was drawn from these results was that such spots represented peptides that were duplicated in the primary structure and, as they were present in the two major fragments, were widely separated throughout the holoenzyme. Similar results were obtained for MetRS. As a control, the dimeric TyrRS from B. stearothermophilus (Mr = 47,000) was also analysed. No repeating sequences were found in this enzyme, implying that only the large monomeric synthetases (or dimeric enzymes with large subunits, such as MetRS) had evolved by extensive gene duplication or fusion. Such a view was also taken by Waterson and Konigsberg, as mentioned in the previous section. They analysed the peptides of SerRS, a dimer with a subunit of Mr = 50,000, and LeuRS (Mr = 100,000; monomeric) from E. coli. LeuRS appeared to contain half of the number of expected peptides, whereas SerRS produced the expected number of peptides. Like the results from Koch’s laboratory, these results indicated that small synthetase protomers (around Mr = 45-50,000) did not contain repeating elements but that synthetase protomers of approximately twice this size did. Consequently, Waterson and Konigsberg suggested that all aaRSs could be constructed from units of approximately 450-500 residues, those synthetases with around 900 residues in the protomer containing two copies of this hypothetical unit and existing in vivo as monomers. Those with 450 residues per protomer existed as dimers or tetramers in vivo (Waterson and Konigsberg, 1974).

The ileS gene was recently sequenced, and much of the predicted protein sequence was confirmed by sequencing tryptic peptides, revealing that there are no internal repeats in this enzyme (Webster et al, 1984).

The genes encoding MetRS from E. coli (dimeric: Mr = 76,000) and yeast (monomeric: Mr = 85,000) have also been sequenced and provide incontrovertible evidence that these synthetases do not contain duplications of peptide sequences (Dardel et al, 1984; Walter et al, 1983).
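Once a primary sequence is available, the presence or absence of internal repeats can be tested directly rather than inferred from peptide maps. The following Python sketch shows one simple way of doing so, by recording every short peptide that occurs more than once in a sequence. It is an illustration only and is not the method used in the studies cited above; the function name find_internal_repeats, the window size k and the two toy sequences are assumptions of this example, and only exact repeats are detected.

```python
from collections import defaultdict


def find_internal_repeats(sequence, k=6):
    """Return {peptide: [start positions]} for every k-residue peptide
    that occurs more than once in the sequence (0-based positions)."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {pep: starts for pep, starts in positions.items() if len(starts) > 1}


# Toy sequences (hypothetical; not taken from any real synthetase).
# The first contains a 15-residue unit twice, joined by a 3-residue linker.
duplicated = "MKTAYIAKQRGSWLE" + "PPG" + "MKTAYIAKQRGSWLE"
unique = "MKTAYIAKQRGSWLEPPGHDVCNF"

if __name__ == "__main__":
    print(find_internal_repeats(duplicated))
    print(find_internal_repeats(unique))
```

A sequence that arose by recent duplication would give many repeated peptides at a constant spacing, as in the first toy sequence. Diverged repeats would require a scoring scheme that tolerates substitutions, which this sketch does not attempt.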

1.3.5. Interactions Between the Synthetases and tRNAs

The tRNAs are a family of small RNA molecules that are structurally and chemically related (Sprinzl et al, 1985). Yeast PheRS charges a number of different tRNA species in addition to its cognate one. The binding constant for the interaction between the synthetase and a non-cognate tRNA may only differ by an order of magnitude from the value for the cognate binding; in contrast, the rate constants for the aminoacylation of the cognate versus non-cognate tRNAs may differ by four or five orders of magnitude (Ebel et al, 1973). This suggested a system of recognition working on two levels, the first being based on gross differences in the strength of binding of the tRNA to the synthetase, the second operating at the level of catalysis in order to determine the effectiveness of the interaction. The positioning of the 3’ acceptor end of the tRNA in relation to the site of reaction with the enzyme-bound aminoacyl adenylate may depend, therefore, upon the symmetry of binding of the tRNA to the synthetase (Kim, 1975). A significant difference in the binding of cognate and non-cognate tRNAs became apparent from cross-linking tRNAIle from E. coli to either E. coli IleRS or ValRS from yeast using ultra-violet light (Budzik et al, 1975). Both enzymes were cross-linked to the tRNA in the region of the 3’ acceptor stem and the D-loop (Fig. 1.1). However, the cognate complex was also cross-linked at the 3’ end of the tRNA, whilst the non-cognate complex was linked on the 3’ side of the anticodon (i.e., at diametrically opposing ends of the tRNA). Thus, it appears that the anticodon is not necessary for cognate tRNA recognition in the case of E. coli IleRS. A similar situation applies to yeast tRNAPhe, as PheRS will still aminoacylate a modified tRNAPhe that has had the anticodon removed by limited treatment with a ribonuclease (Thiebe et al, 1972). Other synthetases, for example, E. coli MetRS, would appear to require the presence of the correct anticodon for aminoacylation. The anticodon of tRNAfMet (CAU) has been excised by using RNase A and replaced with one of several trinucleotides that differ in the identity of the base in the wobble position of the anticodon (Schulman et al, 1983). The rates of aminoacylation for each of these mutant tRNAs (AAU, GAU or UAU) are 4-5 orders of magnitude below that of native tRNAfMet. Recently, a suppressor tRNA that recognises the amber codon UAG, inserting leucine into a polypeptide, was mutated at 12 key positions (Normanly et al, 1986). The tRNA was no longer recognised by LeuRS but was recognised by SerRS instead, with the consequent insertion of serine whenever an amber codon was read. The mutations were chosen according to an alignment of conserved bases in the sequences of 6 tRNASer species. These bases were located in the D-loop, acceptor stem and anticodon loop of the tRNA molecule, in accordance with the results obtained from the UV cross-linking experiments (Budzik et al, 1975). Such results agree well with Kim’s proposals (Kim, 1975). All tRNAs that have been sequenced possess a number of conserved bases that stabilise the tertiary structure of the molecule (Normanly et al, 1986). As tRNA tertiary structure appears to be conserved, the process of recognising the correct tRNA might depend upon subtle differences in base composition between tRNAs.

Fig. 1.1. 3-Dimensional structure of yeast tRNAPhe. The sequence of yeast tRNAPhe is depicted on the left of the figure and shows the Watson-Crick base-pairing that allows the molecule to assume the cloverleaf structure. On the right is shown a schematic representation of the 3-dimensional structure of the tRNA, with the base pairing indicated by the crossbars. The thin lines represent tertiary interactions between certain bases. The major functional and structural features of the tRNA, including the amino acid attachment site, are denoted. (Taken from Schimmel, 1979).

1.4. The Fidelity of Aminoacylation by Aminoacyl-tRNA Synthetases

1.4.1. General Introduction

Protein synthesis operates with a remarkably low error rate. The error rate of E. coli RNA polymerase has been estimated at between 4 x 10⁻⁴ and 2 x 10⁻⁵ (Springgate and Loeb, 1975). The overall error rate for protein synthesis in E. coli has been estimated to be 3 x 10⁻⁴ in vitro (Loftfield, 1963; Loftfield and Vanderjagt, 1972). This figure represents the sum of all the errors possible in the process, from transcription of the gene by RNA polymerase through to making the correct codon:anticodon interactions between the message and tRNA, so the error rate of each individual step cannot exceed this overall figure. The fidelity of tRNA discrimination is actually very high, with values ranging from 4 x 10⁻⁶ to 5 x 10⁻⁸ for E. coli and yeast aaRSs (reviewed in Yarus, 1979). The weakest link in this chain, as witnessed by the experiments of Loftfield and Vanderjagt in which the misincorporation of radiolabelled valine for isoleucine in rabbit haemoglobin was followed, occurs at the level of amino acid recognition by the aminoacyl-tRNA synthetases.
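The consequence of a per-residue error rate for a whole polypeptide can be illustrated with a short calculation. The Python sketch below assumes, for simplicity, that errors are independent and occur at a constant rate at every position, which is an assumption of this example rather than a claim from the studies cited. The function names are illustrative. The chain length of 880 residues is that predicted for ValRS in this thesis, and 3 x 10⁻⁴ is the overall error rate quoted above.

```python
def fraction_error_free(error_rate, n_residues):
    """Probability that a chain of n_residues contains no misincorporation,
    assuming independent errors at a constant per-residue rate."""
    return (1.0 - error_rate) ** n_residues


def combined_error_rate(step_error_rates):
    """Overall error rate when every step in a series must succeed.
    For small rates this is very close to the sum of the individual rates."""
    p_correct = 1.0
    for rate in step_error_rates:
        p_correct *= 1.0 - rate
    return 1.0 - p_correct


if __name__ == "__main__":
    # Fraction of 880-residue chains made without any error at 3 x 10^-4.
    print(round(fraction_error_free(3e-4, 880), 3))
    # Two hypothetical steps with error rates of 1 x 10^-4 and 2 x 10^-4.
    print(combined_error_rate([1e-4, 2e-4]))
```

Under these assumptions roughly three-quarters of the chains of this length would be free of errors, which shows why an error rate much higher than the observed figure could not be tolerated for large proteins.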

1.4.2. Simple Discrimination Against Dissimilar Amino Acids

At the most basic level, some amino acids are so dissimilar from others that selection operates at the level of steric tolerance, e.g. tryptophan would be excluded from the binding site of glycyl-tRNA synthetase (GlyRS). Some amino acids may be distinguished from one another on the basis of the binding energy between the synthetase and a particular amino acid side-chain, as in the discrimination of TyrRS against binding phenylalanine. An error rate of 5 x 10⁻⁴ has been calculated for the misactivation of phenylalanine by the tyrosyl-tRNA synthetases of E. coli and B. stearothermophilus (Fersht et al, 1980). Tyrosine is bound more tightly to the enzyme; recently, this has been shown to be mediated through hydrogen bonding of the phenolic hydroxyl of the substrate to residues Asp-176 and Tyr-34 at the bottom of the tyrosine binding pocket of the B. stearothermophilus TyrRS (Fersht et al, 1985a).

Such methods of discrimination cannot apply to two amino acids that are similar in structure (isosteric) if the overall fidelity of protein synthesis is to be maintained. Consequently, some synthetases have evolved systems for correcting errors of misactivation or misacylation. A prime example is the case of the discrimination between isoleucine and valine by IleRS (the amino acids only differ by a single methylene group: Fig. 1.2). The first evidence of such an "editing mechanism" came from the demonstration that IleRS possesses a unique hydrolytic activity for valyl adenylate (Baldwin and Berg, 1966).

Fig. 1.2. Schematic representation of the binding sites of IleRS and ValRS. (a) The binding site of IleRS is large enough to accommodate its isostere valine, which is smaller by one methylene group. (b) Similarly, ValRS can accommodate threonine in the valine binding pocket. (Taken from Fersht, 1985).

1.4.3. Some Synthetases have Editing Mechanisms

Norris and Berg (1964) showed that IleRS from E. coli activated valine at a rate that was about 200 times lower than that for the activation of isoleucine. When tRNAIle was present, the level of mischarging of valyl adenylate was in keeping with the results of Loftfield (1963). The overall reaction (taken from Fersht, 1986) may be written as

$$\mathrm{E^{Ile}} + \mathrm{ATP} + \mathrm{Val} \longrightarrow \mathrm{E^{Ile}{\cdot}Val\text{-}AMP} + \mathrm{PP_i} \qquad (1.3)$$

$$\mathrm{E^{Ile}{\cdot}Val\text{-}AMP} + \mathrm{tRNA^{Ile}} \longrightarrow \mathrm{E^{Ile}} + \mathrm{tRNA^{Ile}} + \mathrm{Val} + \mathrm{AMP} \qquad (1.4)$$

Subsequent experiments confirmed that IleRS could act as an ATP-pyrophosphatase in the presence of valyl adenylate and tRNAIle and that hydrolysis did not depend upon whether the amino acid had been transferred to the tRNA (Baldwin and Berg, 1966). Corroborating evidence for a post-transfer hydrolytic route came from the observations that aaRSs acted as very weak esterases towards their cognate aminoacyl-tRNA, but that the rate of hydrolysis was far greater for mischarged tRNAs (for example, Yarus, 1972).

Similar editing mechanisms have been found for the rejection of threonine by ValRS from B. stearothermophilus (Fersht and Kaethner, 1976) and the rejection of homocysteine by the MetRS from B. stearothermophilus (Fersht and Dingwall, 1979a). These cases emphasise the requirement for an editing mechanism to prevent the mischarging of an isosteric amino acid. Fig. 1.2 depicts how the binding site of IleRS can accommodate the smaller valine (a) and how valine and threonine compete for the binding site of ValRS (b). The non-cognate amino acids are bound more weakly, the values of KM being 150 and 200 times higher for the binding of valine to IleRS (Loftfield and Eigner, 1966) and threonine to ValRS (Fersht and Kaethner, 1976) respectively. Evidence for the post-transfer hydrolytic pathway (i.e. after the non-cognate amino acid had been transferred to the tRNA) was produced by Fersht and Kaethner in 1976. A rapid quenched-flow system was used to show that B. stearothermophilus ValRS mischarges threonine, forming Thr-tRNAVal, and then specifically hydrolyses the complex. The method involved mixing the purified enzyme-threonyl adenylate complex with tRNAVal in the quenched-flow apparatus. Thr-tRNAVal accumulated quickly to a level where 62% of tRNAVal was mischarged, but was rapidly hydrolysed with a rate constant of 40 s⁻¹ at 25 °C. In contrast, 84% of valine is transferred to tRNAVal, with a rate constant of 0.015 s⁻¹ for hydrolysis (Fersht and Kaethner, 1976). Moreover, the enzyme continues to hydrolyse mischarged tRNAVal whilst Val-tRNAVal is bound to the enzyme. This indicates a two-site system, one for activation and a second site specifically for hydrolysis of the non-cognate complex (Fig. 1.3). The transfer between sites may be mediated by a translocation of the aminoacyl moiety between the 2’ and 3’ hydroxyls of the terminal adenosine of the tRNA (Fersht and Kaethner, 1976).
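The rapid accumulation and subsequent loss of the mischarged tRNA is the behaviour expected of an intermediate formed and consumed in two consecutive steps (transfer, then hydrolysis). The Python sketch below uses the standard solution for an A → B → C scheme with first-order rate constants. It is an illustration only and is not the analysis carried out by Fersht and Kaethner. Only the hydrolysis rate constant of 40 s⁻¹ comes from the text; the transfer rate constant K_TRANSFER is a hypothetical value chosen so that the peak is of the same order as the reported level, and the model ignores any partitioning of the adenylate before transfer.

```python
import math

K_HYDROLYSIS = 40.0  # s^-1, hydrolysis of the mischarged tRNA at 25 C (from the text)
K_TRANSFER = 160.0   # s^-1, HYPOTHETICAL transfer rate constant (assumption)


def intermediate_fraction(t, k1=K_TRANSFER, k2=K_HYDROLYSIS):
    """Fraction of material present as the intermediate B at time t (seconds)
    for the scheme A -> B -> C with rate constants k1 and k2 (k1 != k2)."""
    return k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))


def time_of_peak(k1=K_TRANSFER, k2=K_HYDROLYSIS):
    """Time (seconds) at which the intermediate reaches its maximum."""
    return math.log(k1 / k2) / (k1 - k2)


if __name__ == "__main__":
    t_max = time_of_peak()
    print(f"peak at {t_max * 1000:.1f} ms, "
          f"fraction mischarged {intermediate_fraction(t_max):.2f}")
```

With these illustrative values the intermediate peaks within about 12 ms and has almost disappeared within a few tenths of a second, which is why a rapid-mixing method is needed to observe it at all.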

When similar studies were carried out on the rejection of valine by E. coli IleRS, only 0.8% of the valine present was transferred to tRNAIle (Fersht, 1977). The study indicated that most of the misadenylated valine was removed by pre-transfer editing, but that a low post-transfer activity was present to remove any Val-tRNAIle that had escaped the first round of editing.

Fig. 1.3. Preferential binding of non-cognate aminoacyl-tRNA in an editing site. The diagram, taken from Fersht and Kaethner (1976), depicts a possible two-site selection for the ValRS against threonine. When ValRS binds valyl-adenylate, the valine side-group is bound in a hydrophobic (acylation) pocket (a). When threonine is mischarged with tRNAVal by the enzyme, the side-chain of the amino acid is preferentially bound in a second (hydrophilic) site (b), with the resulting hydrolysis of Thr-tRNAVal. This may be mediated by hydrogen-bonding the hydroxyl of threonine. A 2’- to 3’-hydroxyl transfer on the ribose ring may be involved to swing the misacylated tRNA into the editing site.

A model to explain editing has been proposed by Fersht that operates at the levels of steric hindrance and kinetic editing (Fersht and Kaethner, 1976; Fersht and Dingwall, 1979b). This model, the "double sieve", argues that all amino acids which are larger than the cognate amino acid for a particular aaRS are excluded from the active site solely as a consequence of their size. Isosteric and smaller amino acids will be activated, albeit at slower rates than for the cognate amino acid because they bind poorly (Fersht and Dingwall, 1979a). These are subsequently bound by the hydrolytic site in the second part of the sieve, to the exclusion of the cognate complex. The second sieve most likely discriminates against the cognate amino acid by specifically binding some part of the non-cognate side chain, for example, in the case of ValRS (Fig. 1.3) by forming a hydrogen bond to the hydroxyl of the threonine side-chain. In the absence of a 3-dimensional structure, one can only speculate as to the nature of such a site. It is conceivable that the hydrolytic site may represent a residue or group of residues that are positioned in the tertiary structure so as to bind the isostere preferentially (e.g. the presence of a hydrophilic residue close to the binding site for valine that binds the hydroxyl of threonine), as opposed to a distinct second binding site.

The cost of editing, according to equations 1.3 and 1.4, is the hydrolysis of one mol of ATP per mol of synthetase. The low-level hydrolysis of the cognate complex has been referred to earlier (Yarus, 1972). The double sieve system becomes cost-efficient by using different forms of discrimination at each step of the process: (1) by preventing the activation of amino acids with larger side groups through steric hindrance; (2) by binding the cognate amino acid more tightly than the competing smaller species or isosteres, and (3) by using a distinct hydrolytic site in order to ensure that the misacylated species are bound more tightly than (and to the exclusion of) the cognate aminoacyl-tRNA, with subsequent hydrolysis of the misacylated species.
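The logic of the double sieve can be expressed as a two-stage filter. The Python sketch below is a deliberately crude illustration and not a model of any real active site: the number of non-hydrogen atoms in the side chain is used as a stand-in for steric size (an assumption of this example), and the names SIDE_CHAIN_HEAVY_ATOMS and double_sieve are hypothetical. Real discrimination also depends on the shape of the side chain and on binding energy, which this sketch ignores.

```python
# Number of non-hydrogen atoms in each side chain, used here only as a
# crude proxy for steric size (an assumption made for illustration).
SIDE_CHAIN_HEAVY_ATOMS = {
    "Gly": 0, "Ala": 1, "Ser": 2, "Cys": 2, "Val": 3, "Thr": 3,
    "Ile": 4, "Leu": 4, "Met": 4, "Phe": 7,
}


def double_sieve(candidate, cognate):
    """Classify a candidate amino acid for a synthetase with the given cognate.

    Sieve 1: anything larger than the cognate is excluded from the active site.
    Sieve 2: anything that is activated but is not the cognate is hydrolysed.
    """
    if SIDE_CHAIN_HEAVY_ATOMS[candidate] > SIDE_CHAIN_HEAVY_ATOMS[cognate]:
        return "excluded"
    if candidate != cognate:
        return "edited"
    return "accepted"


if __name__ == "__main__":
    for amino_acid in ("Ile", "Thr", "Ala", "Val"):
        print(amino_acid, "->", double_sieve(amino_acid, cognate="Val"))
```

For a synthetase with valine as its cognate amino acid, this gives isoleucine as excluded, threonine and alanine as edited, and valine as accepted, which mirrors the three forms of discrimination listed above.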

1.5. Cloning and Expression of Aminoacyl-tRNA Synthetase Genes

1.5.1. The Impact of Genetic Engineering

The previous section has described some of the key results concerning the mechanisms by which synthetases function in terms of enzyme kinetics. Technical advances in molecular biology over the last ten or fifteen years, such as the cloning of bacterial genes by complementation of bacterial mutants possessing a selectable genotypic marker (Carbon and Clark, 1975), or methods for determining the sequence of DNA (Sanger et al, 1977b; Sanger et al, 1980; Maxam and Gilbert, 1977), have had a considerable impact on research into the relationship of aaRS structure to function. Protein engineering of the B. stearothermophilus TyrRS (reviewed by Fersht et al, 1986) perhaps represents the state of the art at the moment, drawing together molecular biology, X-ray crystallography and kinetics to present a coherent picture of the molecular processes underlying the catalytic mechanism. The rate-limiting step in such research is the determination of a 3-dimensional structure: to date, the only reported structures are those for TyrRS (Bhat et al, 1982) and the MetRS from E. coli (Zelwer et al, 1982). Nevertheless, a considerable body of information concerning the expression of synthetase genes and the structural organisation of their products has been obtained by a variety of means.

1.5.2. Sequencing of Synthetase Genes

The following cloned synthetase genes have been sequenced (all are from E. coli except where indicated): AlaRS (Putney et al, 1981a); GlnRS (Yamao et al, 1982); GluRS (Breton et al, 1986); GlyRS (Webster et al, 1983); histidyl-tRNA synthetase (HisRS) (Freedman et al, 1985); IleRS (Webster et al, 1984); MetRS (partial sequence - Barker et al, 1982b; complete sequence - Dardel et al, 1984); yeast cytoplasmic MetRS (Walter et al, 1983); PheRS (Fayat et al, 1983); TrpRS (Hall et al, 1982); B. stearothermophilus TrpRS (Barstow et al, 1986, in press); yeast mitochondrial TrpRS (Myers and Tzagoloff, 1985); TyrRS (Barker et al, 1982a); B. stearothermophilus TyrRS (Winter et al, 1983) and Bacillus caldotenax TyrRS (Jones et al, 1986). Additionally, the cloning of the genes encoding ValRS from B. stearothermophilus and E. coli has been reported (Brand and Fersht, 1986; Skogman and Nilsson, 1984). In most cases, the primary sequence of each synthetase has been determined from the DNA sequence and compared with other aaRSs in order to identify common regions that may indicate conserved catalytic regions or ancestral trends.

1.5.3. Diversity in the Structure of the Promoters of Synthetase Genes

The sequences corresponding to the transcriptional promoters of a number of aaRS genes have been determined. The distances of the startpoint of transcription from the beginning of the structural gene for a number of synthetases are given in Table 1.2. Some synthetase genes have multiple promoters. The trpS gene of B. stearothermophilus appears to have two promoters (Barstow et al, 1986, in press) and the hisS gene of E. coli has three potential promoters situated between 10 and 103 bp upstream of the initiation codon (Freedman et al, 1985). E. coli alaS has a single promoter, the startpoint being 79 bp away from the initiator AUG (Putney and Schimmel, 1981), whereas the startpoint of transcription for the E. coli pheS,T operon is located 368 bp upstream of the structural gene (Fayat et al, 1983). This diversity in promoter structures may be a result of different methods of controlling gene expression.

Table 1.2

Distances between the Structural Gene and Startpoint of Transcription for a Number of Synthetase Genes

Synthetase Gene      Transcription Startpoint (a)      Reference

trpS (B. stear)      -40 (P1), -74 (P2)                Barstow et al (1986), in press
tyrS (B. stear)      -283                              Waye and Winter (1986)
valS (B. stear)      -434                              Brand, this thesis, Chapter 7
alaS (E. coli)       -79                               Putney and Schimmel (1981)
glnS (E. coli)       -30                               Yamao et al (1982)
hisS (E. coli)       -10 (P1), -64 (P2), -103 (P3)     Freedman et al (1985)
ileS (E. coli)       -30 (?), -1111                    Kamio et al (1985)
pheS,T (E. coli)     -368                              Fayat et al (1983)
thrS (E. coli)       -162                              Fayat and Mayaux (unpublished), cited in Springer et al, 1985

(a) The figures shown represent the distance of the startpoint of transcription from the first base of the initiation codon of the structural gene. Multiple promoters are denoted by P1, P2 and P3, as appropriate.

One striking feature of the 5’ non-coding regions of various synthetase genes is the presence of inverted repeat sequences. Regions of dyad symmetry are quite common in the sequences around bacterial promoters (Rosenberg and Court, 1979) and are found for all three aminoacyl-tRNA synthetase genes from B. stearothermophilus that have been sequenced (trpS - Barstow et al, 1986, in press; tyrS - Waye and Winter, 1986; and valS - Brand, this thesis, Chapter 7). These inverted repeats, which are G/C-rich, could form hairpin loops that resemble rho-independent terminator signals (Platt, 1986). Terminator and promoter sequences have been shown to overlap in a number of cases. The E. coli ribosomal protein genes spc and S10 overlap such that the termination signal of spc precedes the promoter of S10 (Post et al, 1978). In contrast, some overlapping bacteriophage genes such as those of φX174 have the -10 region of the promoter (Pribnow, 1975) located within the termination sequence of the preceding gene (Sanger et al, 1977a). In the latter case, transcription can be initiated because the sequences corresponding to the termination loop are unable to assume a secondary structure. The significance of overlapping transcription and termination signals for synthetase genes is unclear.
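Regions of dyad symmetry of the kind described above can be located in a DNA sequence by searching for a short stretch that is followed, after a few bases, by its own reverse complement. The following Python sketch illustrates the idea. It is not the method used in this thesis; the function names, the default stem and loop lengths and the toy sequence are all assumptions of this example, and only perfectly matched stems are reported.

```python
# Translation table for complementing a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGT)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, stem_sequence, loop_length) for each perfect inverted
    repeat: a stem followed, after a loop, by its reverse complement."""
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate right-hand arm
            if seq[j:j + stem] == target:
                hits.append((i, left, loop))
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical): a G/C-rich 6 bp stem around a 4-base loop.
    toy = "TTGCCCGCTTTTGCGGGCTT"
    print(find_inverted_repeats(toy))
```

A search of this kind only identifies candidate hairpins. Whether a given repeat acts as a terminator would also depend on features that the sketch does not examine, such as the stability of the stem and the sequence that follows it.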

1.5.4. The Expression of Synthetase Genes is Regulated in a Variety of Ways

For some time, it has been apparent that the expression of the genes corresponding to aaRSs in E. coli can be controlled in two ways: general metabolic regulation and specific regulation that is dependent upon the concentration of the cognate amino acid. A number of synthetases from all structural classes, such as IleRS, TrpRS, GlyRS and the α subunit of PheRS, exhibit an increase in their intracellular concentrations as the growth rate of the cell increases (Neidhardt et al, 1977; Pedersen et al, 1978). The levels of the synthetases were measured by growing E. coli through several cell generations in a minimal medium containing one of five different [14C]-labelled carbon sources. The growth rate was altered according to the point at which the carbon source fed into the various E. coli catabolic pathways (carbon sources tested included glucose, acetate and glycerol). Subsequently, the labelled proteins were purified and identified by 2D-polyacrylamide gel electrophoresis, followed by autoradiography (O’Farrell, 1975). The radioactive proteins corresponding to 140 spots on the film were cut out of the gel, and the level of radioactivity in each was determined by scintillation counting. The amount of each protein, and, ultimately, the number of molecules of each protein per cell, was estimated by comparing the level of radioactivity in each spot to the total radioactivity for all the proteins (Pedersen et al, 1978). The identity of many of the proteins was unknown, but a certain number, including ribosomal proteins, RNA polymerase and all the synthetases that could be identified, were assigned to a group of proteins that increased in concentration as the growth rate increased. Such observations are in keeping with the role of the synthetases in protein synthesis.

A number of E. coli synthetases, including ValRS, LeuRS and ThrRS, show a transient rise in expression (i.e., derepression) in response to starvation of the cognate amino acid (Neidhardt et al, 1977). In contrast, ArgRS, IleRS and PheRS respond to cognate amino acid starvation with a more long-term derepression of expression. Some recent studies have taken advantage of the cloning of synthetase genes in an attempt to address this puzzle (Springer et al, 1985). A bacteriophage λ vector containing the promoter and part of the thrS structural gene fused to the lacZ structural gene was introduced into an E. coli strain that possessed a chromosomal mutation in thrS. The level of β-galactosidase was measured and found to be 3-5 times higher than when the vector was introduced into a wild-type E. coli strain containing a functional copy of thrS. The level of mRNA corresponding to the fusion protein increased by only a factor of 0.5. Moreover, the level of β-galactosidase in the mutant strain fell 20-fold if a plasmid that overproduced ThrRS from a cloned copy of thrS was introduced together with the λ construct. Together, these results imply that ThrRS represses its own expression at the level of translation.

The E. coli AlaRS appears to regulate its own synthesis at the

level of transcription. Analysis of radiolabelled alaS transcripts,

synthesised in vitro, showed that by increasing the concentration of

AlaRS present, the level of transcription of alaS was reduced (Putney

and Schimmel, 1981). Increasing levels of AlaRS did not affect the

transcription of a control promoter, that of the E. coli his operon. The

authors went on to show that the synthetase bound to a palindromic

sequence close to the transcription startpoint of the alaS promoter by using DNase I footprinting methods. Furthermore, AlaRS shows enhanced repression of its own synthesis if alanine is present and this repression is stimulated by increasing concentrations of the cognate amino acid.

The IleRS from E. coli presents another variation on promoter structure and the control of expression. The ileS gene appears to be transcribed as part of a polycistronic messenger RNA, contained between the upstream protein x gene (encoding an unidentified protein)

and the lspA gene for lipoprotein signal peptidase (lsp) downstream

(Kamio et al, 1985). There are sequences located immediately upstream

of ileS that could represent the -10 and -35 promoter hexanucleotides

(Siebenlist et al, 1980), placing the transcription startpoint at

approximately 30 bp upstream of the start of the structural gene.

However, there are a number of facts that suggest that the ileS gene is

transcribed as part of a polycistronic mRNA from the x gene promoter

that is situated 1.1 kb upstream of the ileS structural gene (Tokunaga

et al, 1985). There are no termination signals between the x and ileS

genes and the initiation codon for the lspA gene overlaps with the

termination codons for ileS (Yu et al, 1984). Tokunaga and colleagues

also cloned a section of DNA comprising the ileS, lspA and the 3’ end

of the x gene such that transcription of the operon was driven by a

plasmid TcR promoter instead of the x promoter, with the order of the

genes (x - ileS - lspA) maintained. The TcR promoter functioned as

the major promoter for both ileS and lspA.

E. coli PheRS also appears to regulate its own synthesis.

Mutations in the pheS and T structural genes are associated with a

reduced level of transcription for both genes (Plumbridge and Springer, 1982). This synthetase is a prime example of how the expression of its gene may be regulated in different ways: the pheS and T genes are also controlled by attenuation.

1.5.5. Attenuation as a Means of Controlling Expression of the pheS, T

Operon

The control of prokaryotic gene expression by attenuation has been demonstrated for several genes, including the E. coli trp and Salmonella typhimurium his operons (reviewed by Yanofsky, 1981). Generally, attenuation involves the synthesis of a short transcript (100-200 bp long) that is able to form two alternative secondary structures by complementary base pairing. These may be referred to as the terminator and anti-terminator forms of the attenuator. A characteristic of attenuation sequences controlling biosynthetic operons is the presence of a short open reading frame (ORF), coding for a peptide that is particularly rich in certain amino acid residues, in front of the genes that are controlled by attenuation. In the case of the trp operon

(which encodes enzymes involved in the biosynthesis of tryptophan), the short ORF codes for a leader transcript that contains tandem trp codons. Under normal growth conditions, transcription terminates ahead of the trp operon at a hairpin structure that is followed by a short poly(U) tract on its 3’ side and is recognised by RNA polymerase as a rho-independent terminator (the terminator hairpin; Fig. 1.4a). When the bacteria are starved of tryptophan, the ribosome stalls at the tandem trp codons, probably as a result of the fall in the level of charged Trp-tRNATrp (Das and Yanofsky, 1984). This allows the attenuator to assume an alternative secondary structure (the anti-terminator) which no longer resembles a rho-independent terminator (Fig. 1.4b). As prokaryotic transcription and translation are closely linked, the RNA polymerase, which is ahead of the ribosome, is able to continue transcription through the trp operon, with the resultant expression of those genes that are necessary for tryptophan synthesis. Accordingly, the trp operon is able to modulate its synthesis by up to eight-fold by attenuation.

Fig. 1.4. Alternative secondary structures for the attenuator of the E. coli trp operon. The figure, taken from Yanofsky (1981), depicts two sets of hairpin loops formed by the trp leader transcript. (a), the terminator form, is characterised by a hairpin (loops 3:4 of the schematic in the centre of the figure) that resembles a rho-independent terminator. (b), the antiterminator, forms a 2:3 base-paired hairpin. The mechanism of attenuation is explained in Section 1.5.5.
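The defining features of the terminator form (a base-paired stem followed on its 3’ side by a run of U residues) can be expressed as a simple sequence scan. The sketch below is illustrative only: it looks for perfect Watson-Crick stems of fixed length and ignores G-U pairs, mismatches and folding energies, so it is not a substitute for a proper secondary-structure prediction. The function names, the default parameters and the test sequence are hypothetical; the sequence is not the trp leader.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def terminator_like_hairpins(rna, stem=7, min_loop=3, max_loop=8, min_u=4):
    """List (start, loop_length) for perfect stems followed directly by a U run."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for start in range(len(rna)):
        left = rna[start:start + stem]
        if len(left) < stem:
            break
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = rna[right_start:right_start + stem]
            tail = rna[right_start + stem:right_start + stem + min_u]
            if len(right) < stem or len(tail) < min_u:
                break  # longer loops would run off the end as well
            if right == reverse_complement(left) and tail == "U" * min_u:
                hits.append((start, loop))
    return hits


# Hypothetical sequence: a 7 bp G+C-rich stem, a 4-base loop, then eight U residues.
example = "AAAA" + "GCCCGCC" + "UAAU" + "GGCGGGC" + "UUUUUUUU" + "AC"
print(terminator_like_hairpins(example))
```

Flanking bases that happen to pair (here the A residues upstream with the U tract) are reported as additional overlapping hits, which is why the result is a list rather than a single position.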

The E. coli PheRS has an α2β2 structure, the subunits being coded by the pheS and pheT genes respectively (Springer et al, 1983;

Fayat et al, 1983). The genes are arranged in an operon which contains elements that are typical of a classic attenuator. The promoter is sited 368 bp upstream of the pheS structural gene. The intervening sequences code for a 14-residue leader peptide that contains five phenylalanine residues. This is coded from a 150 base transcript that is able to form three alternative hairpin structures, including one that strongly resembles a rho-independent terminator and which terminates

90% of transcription. The remaining transcripts extend into the pheS structural gene and, as pheT was demonstrated to be transcribed from the pheS promoter (Plumbridge and Springer, 1980), may be assumed to represent the full-length transcript, covering the entire pheS,T operon.

Further, the expression of the pheS, T operon appears to be regulated by the availability of charged tRNAPhe. A mutation (trpX) adjacent to the anticodon of certain tRNA molecules, notably tRNATrp and tRNAPhe, derepresses the trp and pheS, T operons respectively.

Presumably, this mutation (the non-modification of an adenosine adjacent to the anticodon) affects the tRNA such that the ribosome stalls over the tandem amino acid repeats, allowing RNA polymerase to read through into the structural gene of the operon (Eisenberg et al,

1979). Derepression caused a six-fold increase in transcription of the pheS, T operon (Springer et al, 1983).

The examples mentioned in this and the previous section serve to show that the expression of aminoacyl-tRNA synthetases is as diverse as their structural organisation. It appears from the example of the

E. coli pheS, T operon that the expression of a particular gene may be controlled on several different levels.

1.6. Structural and Functional Homologies between Synthetases

1.6.1. Homologies at the Level of the Primary Sequence

Computer programs for the alignment of protein sequences have been used to search for regions of homology between one particular aaRS from different species or between different types of synthetases.

Homologies between one kind of synthetase from different species can be quite extensive. For example, TyrRS from B. stearothermophilus is

56% homologous with E. coli TyrRS with regard to perfect amino acid matches (Winter et al, 1983) but 99% homologous to the more closely related TyrRS from B. caldotenax (Jones et al, 1986). The homologies are distributed throughout the length of the enzymes. In another comparison, the Met-tRNA synthetases from yeast and E. coli (751 and

642 residues respectively) were found to be particularly homologous over a region of 400 residues only (Walter et al, 1983).
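The percentage figures quoted above count perfect matches between aligned residues. A minimal sketch of that calculation is given below. It assumes that the two sequences have already been aligned and padded to equal length with a gap character, and that positions containing a gap are excluded from the count; other conventions for the denominator exist and give different percentages. The function name and the short example strings are illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of perfect matches over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Two short illustrative peptides differing at one position out of ten.
print(percent_identity("YANGSIHIGH", "YANGSIHLGH"))  # 90.0
```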

An important amino acid homology concerns a peptide found in the amino (N) terminus of a number of different synthetases. This peptide, called the "HIGH" sequence on account of it often (but not always) having the sequence his-ile-gly-his, was recently discovered at positions 64-67 in the IleRS from E. coli (Webster et al, 1984; Fig. 1.5).

The report was significant on three counts. First, the HIGH sequence may be aligned with those found in the N-termini of E. coli MetRS,

GlnRS and B. stearothermophilus TyrRS. The MetRS contains a conservative replacement of leu instead of ile in the second position of the peptide. Secondly, the HIGH sequence is part of a stretch of 11 consecutive residues in IleRS that are homologous with MetRS. This represents the most extensive homology yet reported between different synthetases. Thirdly, it is known from the 3-dimensional structure of

MetRS that this region forms part of an N-terminal domain that strongly resembles a nucleotide binding fold (Risler et al, 1981). A comparison of the MetRS structure with that of the B. stearothermophilus

TyrRS, which also has an N-terminal nucleotide binding fold that is known to be the site of ATP binding (Monteilhet and Blow, 1978), revealed that the α-carbon backbones of the two proteins were superimposable over the region encompassing the nucleotide binding fold

(Blow et al, 1983). This suggests that the immediate region containing the HIGH sequence may be structurally conserved between different synthetases, even if there is considerable variation in primary structures.

Fig. 1.5. Comparison of homologies in the amino-termini of four aminoacyl-tRNA synthetases. MetRS, IleRS and GlnRS are E. coli enzymes, and TyrRS is from B. stearothermophilus. The standard one-letter code for amino acids is used: A, alanine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; and Y, tyrosine. Regions of perfect homology are boxed with solid lines, whereas conservative matches are drawn within dashed lines. (Taken from Webster et al, 1984).

A second conservative amino acid substitution may occur in the

HIGH sequence, with the second histidine replaced by an asparagine, as in the case of B. caldotenax TyrRS (Jones et al, 1986). Protein engineering of B. stearothermophilus TyrRS confirms that such a substitution does not significantly alter the kinetic properties of the enzyme (Lowe et al, 1985).
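The variants of the HIGH sequence described in this section (ile or leu in the second position, his or asn in the fourth) can be captured in a single pattern. The following sketch searches a protein sequence, given in the one-letter code, for such a motif. The pattern is only a restatement of the substitutions mentioned in the text and is not a validated signature for synthetases; it will also match unrelated proteins by chance. The function name and the test peptides are hypothetical.

```python
import re

# his, then ile or leu, then gly, then his or asn.
HIGH_LIKE = re.compile(r"H[IL]G[HN]")


def find_high_like(protein):
    """Return (1-based position, matched tetrapeptide) for each HIGH-like motif."""
    return [(m.start() + 1, m.group()) for m in HIGH_LIKE.finditer(protein.upper())]


# Illustrative peptides, not full-length synthetase sequences.
print(find_high_like("ADSLHIGHLAT"))  # [(5, 'HIGH')]
print(find_high_like("SIHLGHML"))     # [(3, 'HLGH')]
```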

A second region of homology has been reported recently for IleRS,

MetRS and TrpRS from E. coli (Hountondji et al, 1986). The conserved sequence is rich in lysines (lys-met-ser-lys-ser) and has been implicated in binding the 3’ terminal adenosine of tRNAMet by MetRS

(Hountondji et al, 1985). Though the E. coli TyrRS differs in primary sequence around this region, it contains a similar arrangement of lysines which could be reacted with aldehyde groups present on the 3’

adenosine of periodate-treated tRNATyr (Hountondji et al, 1986).

No extensive homologies have been found between different types

of synthetase, whether from the same organism or from different

species. The only major homologies reported so far exist between the

same type of synthetase from different species, as exemplified by the

tyrosyl-tRNA synthetases.

1.6.2. Dissection of Synthetase Structure with Respect to Function

One of the earliest observations on the structure of the dimeric

MetRS of E. coli (subunit Mr = 76,000) was that mild proteolysis

produced a catalytically active monomeric fragment that lacked the

carboxy (C) terminal 200 residues (Cassio and Waller, 1971). This

implies that at least some of the contact points for dimerisation are

contained towards the C-terminus of the protein. The full DNA

sequence for E. coli MetRS and the sequence of yeast cytoplasmic

MetRS have been published recently (Dardel et al, 1984; Fasiolo et al,

1985). Alignment of areas of common homology in the primary

sequences of the two enzymes reveals that the E. coli enzyme possesses

roughly 100 more residues at the C-terminus than the yeast counterpart.

As yeast MetRS exists in vivo as a monomer, this suggests that the

residues essential for dimerisation of the E. coli enzyme are found in

the C-terminal 100 residues. Deletion mutagenesis of the alaS gene

coding for the tetrameric AlaRS from E. coli (α4; subunit Mr = 95,000)

has shown that residues between 699-808 of the 875-residue protomer

are necessary for oligomerisation (Jasin et al, 1983). Such studies have

shed some light on the organisation of tertiary and quaternary

structure amongst various synthetases. Jasin and colleagues cloned the alaS gene into a plasmid vector, linearised the plasmid at a unique restriction site downstream of the structural gene and created deletions of various lengths extending in from the 3’ end of the structural gene using the double-stranded exonuclease Bal31. The plasmids were subsequently recircularised and used to transform E. coli. Mutagenesis created a selection of plasmids coding for truncated AlaRS, all of the proteins containing the N-terminal region, but varying as to the number of residues that had been deleted from the C-terminus. The truncated alaS plasmids were assayed for adenylation by a modification of the pyrophosphate exchange method (Putney et al, 1981b) and for tRNA charging by complementation of a temperature-sensitive alaS strain of

E. coli at the non-permissive temperature (the charging ability of this mutant was virtually abolished at this temperature, though the adenylation ability remained unimpaired). The results indicated that

only the N-terminal 375 residues were needed for activation. The

N-terminal 461 residues represented the minimum information that was

necessary for the aminoacylation of alanine, implicating residues

between 376 and 461 as being involved in binding tRNA.

The 100 residues that comprise the C-terminal domain of the

B. stearothermophilus TyrRS (419 residues per protomer) are disordered in

the 3-dimensional structure of the enzyme. The functional contribution

of this region is unclear. These residues were deleted from the tyrS

gene in an attempt to ascertain the function of the C-terminal domain

by using a novel selection method (Waye et al, 1983). Two

complementary recombinant M13 clones, one copy carrying the whole

tyrS gene, the other carrying a 517 bp fragment from the 3’ end of

the gene, were annealed together to form a partial heteroduplex

(Fig. 1.6). The latter M13 clone contained two amber mutations in essential M13 genes. The double-stranded portion of the heteroduplex was restricted with restriction endonuclease HinfI, deleting a fragment of the gene corresponding to the C-terminal 100 amino acids. The

DNA was subsequently re-ligated and transfected into an E. coli non-amber suppressor strain in order to select against the viral strand that contained the 517 bp fragment of tyrS. The viral DNA from white recombinant plaques was isolated and sequenced using the dideoxy chain termination method of Sanger et al (1980). A number of clones were found to contain the desired deletion and the kinetics of the truncated TyrRS were studied. The activation of tyrosine by the truncated enzyme was indistinguishable from that by the full-length enzyme.

However, the truncated TyrRS was unable to charge tRNA, implying that this region bound tRNA or contained some important points of contact for tRNA binding. As mentioned earlier, TyrRS shows half-of-the-sites reactivity, binding one mol of each of its substrates per dimer. Recent studies in which 40 surface lysine and arginine residues of TyrRS were changed by site-directed mutagenesis indicate that the tRNA forms multiple contacts with the dimer across both subunits (Bedouelle and Winter, 1986). Computer-aided model building studies suggest that the 3’ acceptor adenosine of the tRNA could be bound close to the active site of one of the subunits.

Fig. 1.6. Deletion of the C-terminal 100 residues of B. stearothermophilus TyrRS. Two M13 complementary clones, one carrying a full-length copy of tyrS, the other containing a portion of the 3’ end of the structural gene, were allowed to form a partial duplex. The sequences in the double-stranded region that correspond to the C-terminal 100 codons were deleted by cutting with restriction enzyme HinfI. (Figure taken from Waye et al, 1983).

1.7. Towards a Study of the Functional Arrangement of the Valyl-tRNA

Synthetase

The kinetics of B. stearothermophilus ValRS have been well

characterised, especially with regard to the editing mechanism (Fersht and Kaethner, 1976; Fersht and Dingwall, 1979). The work presented in this thesis describes the cloning and sequencing of the valS gene

(Brand and Fersht, 1986; Borgford et al, submitted for publication) as a prelude to studying the organisation of the primary structure with respect to catalytic functions. Such studies would be achieved by deletion and site-directed mutagenesis. The ValRS protein sequence was derived from the DNA sequence and compared with those of other synthetases, particularly that of E. coli IleRS (Webster et al, 1984). It was of considerable interest to see if there were any homologies between the two enzymes that might represent the site of editing

(Section 1.4.3). Deletion or site-directed mutagenesis techniques could then be used in order to probe the functions of the enzyme. Deletion mutagenesis of AlaRS and TyrRS suggests that various functions are distributed throughout the protomer, possibly as discrete domains for some functions such as oligomerisation or adenylation. Such a view is reinforced by the localisation of the HIGH sequence to the N-terminus of a number of synthetases (Section 1.6) and the conservation of the tertiary structure of the nucleotide-binding folds of MetRS and TyrRS.

The dimeric E. coli TrpRS contains only 334 residues per protomer

(Hall et al, 1982) and represents the minimum information needed to catalyse the adenylation and charging of tRNATrp, the enzyme binding one mol of each substrate per subunit (Muench et al, 1976). This implies that a much larger synthetase such as ValRS or IleRS might contain extensive structural redundancy with respect to function.

Alternatively, the size of ValRS and IleRS might be due to their having editing mechanisms, reflecting another structural domain, or the necessity to have a large surface area for the binding of tRNA. The latter could be provided by having a tertiary or quaternary structure based on one large polypeptide chain or from the association of a number of smaller ones.

CHAPTER 2

EXPERIMENTAL: GENERAL RECOMBINANT DNA METHODS

2.1. Media

The following media were used. Luria Broth (L-broth) is 1%

Bacto Tryptone, 0.5% Yeast Extract and 1% NaCl, pH 7.5; 2 x TY is

1.6% Bacto Tryptone, 1% Yeast Extract and 0.5% NaCl, pH 7.5. Both media were used for general culture purposes, agar being added to a final concentration of 2% for solid plates. Where appropriate, filter-sterilised ampicillin or tetracycline was added to autoclaved media to give final concentrations of 50 µg/ml or 15 µg/ml respectively. All bacterial strains were maintained on L-broth plates with the exception of those carrying the F’ episome, such as E. coli JM101 (Messing, 1979).

These were grown on M9 minimal medium plates, supplemented with glucose and vitamin B1, in order that the strains maintain the F’ episome which is essential for infection of the cells by M13 phages

(Messing, 1983). The basic M9 recipe is 1.05% K2HPO4, 0.45%

KH2PO4, 0.1% (NH4)2SO4, 0.05% sodium citrate, 0.02% MgSO4, plus 1.5% agar. Sterile 20% glucose, previously filtered through a 0.22 µm filter

(Millipore Corporation, USA), was added to a final concentration of

0.2%, and filter-sterilised vitamin B1 was added to a final concentration of 0.0005%. All Bacto-media were purchased from Difco, AnalaR-grade chemicals from BDH Chemicals Ltd and antibiotics from the Sigma

Chemical Co. Ltd.
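The percentage recipes above translate into weights and volumes as follows. This is a small worked sketch, assuming that all percentages are weight per volume (g per 100 ml), which the text does not state explicitly; the helper names and the one-litre batch size are illustrative only.

```python
# M9 components as listed above, in % (assumed to be w/v).
M9_PERCENT_WV = {
    "K2HPO4": 1.05,
    "KH2PO4": 0.45,
    "(NH4)2SO4": 0.1,
    "sodium citrate": 0.05,
    "MgSO4": 0.02,
    "agar": 1.5,
}


def grams_for_percent_wv(percent, volume_ml):
    """Grams of solid needed for a given % (w/v) in volume_ml of medium."""
    return percent / 100.0 * volume_ml


def stock_volume_ml(stock_percent, final_percent, final_volume_ml):
    """Volume of a concentrated stock needed to reach final_percent (C1V1 = C2V2)."""
    return final_percent / stock_percent * final_volume_ml


# Weights for a hypothetical one-litre batch of M9 plates.
for component, percent in M9_PERCENT_WV.items():
    print(component, round(grams_for_percent_wv(percent, 1000), 2), "g")

# 20% glucose stock diluted to 0.2% in one litre.
print(round(stock_volume_ml(20, 0.2, 1000), 2), "ml of 20% glucose")
```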

2.2. Bacterial Strains

A summary of the bacterial strains that were used, together with their genotypes and key references, is presented in Table 2.1. The temperature-sensitive E. coli mutant 236c is one of a number of aminoacyl-tRNA synthetase mutants that have been isolated by Dr K.

Bohman (Department of Molecular Biology, Biomedicum, Uppsala,

Sweden). It is a K-12 derivative with a reversion frequency of 2 x 10⁻⁹ at 42 °C on L-broth plates. E. coli JM101 and TG2 were provided by

Drs Mick Jones and Greg Winter, respectively.

2.3. Enzymes

Restriction enzymes, E. coli DNA polymerase I (Klenow fragment) and T4 DNA ligase were supplied by either Cambridge Biotechnology

Ltd or Pharmacia Ltd, U.K. T4 DNA polymerase was a gift from Dr

Mick Jones. Calf intestinal (alkaline) phosphatase (molecular biology grade) was purchased from Boehringer Mannheim (BCL) Ltd and nuclease Bal31 from New England Biolabs.

2.4. Cloning Vectors

The plasmid vectors pAT153 (Twigg and Sherratt, 1980), pUC8 and pUC9 (Vieira and Messing, 1982) and M13 vectors mp8 and mp9 (Messing and Vieira, 1982) were gifts from Dr Mick Jones.

Table 2.1

Summary of Strains

Strain    Genotype    Reference

236c    K12, F-, relA+, strA+, argG, lac, [illegible], (λcI857 h80 st68 dlac+)    Isaksson et al (1977)

DH1    F-, relA1, endA1, gyrA96, thi-1, hsdR17 (rK-, mK+), supE44, recA1, λ-    Hanahan (1983)

DH5    A variant of DH1 with a high transformation efficiency    Hanahan (1985)

JM101    K12, Δ(lac-pro), supE, thi, F’ traD36, proA+B+, lacIq, lacZΔM15    Messing (1979)

TG2    As JM101 but recA-, srl::Tn10 TcR, hsdΔ5    Gibson (1984)

2.5. Restriction Enzyme Digestions

2.5.1. Conditions for Digestion of Plasmid DNA

A typical restriction digest consisted of 100-300 ng of plasmid

DNA in a 15 µl volume containing 1-5 units of a restriction enzyme or enzymes in an appropriate buffer (Maniatis et al, 1982). Different restriction enzymes are active under specific conditions of ionic strength, so four separate restriction buffers were prepared as ten-fold concentrates. These are referred to as low, medium, high and Sma buffers (the composition of these buffers is given in Table 2.2). The enzyme SmaI will not cleave DNA in any of the first three buffers as it requires KCl, hence the necessity for a separate buffer. All digestions were carried out at 37 °C for 2 h. Where a digest required using two endonucleases with different buffer requirements, the DNA was digested by the enzyme that was active at the lower ionic strength before adding the second enzyme, plus the appropriate volume of 1 M

NaCl to increase the ionic strength. Double digests that included SmaI as one of the two enzymes were carried out by adding SmaI as the first enzyme. In all cases, the volume of restriction enzyme(s) added did not exceed 10% of the total volume of the digest.
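The volumes involved in setting up a digest, and in raising the salt concentration for the second enzyme of a double digest, follow from simple dilution arithmetic. The sketch below is illustrative: the function names are hypothetical, and the working NaCl concentrations (0, 50 and 100 mM for low, medium and high salt) are derived here by diluting the ten-fold concentrates of Table 2.2 rather than quoted from the text. The NaCl calculation allows for the increase in volume caused by the addition itself.

```python
def digest_recipe(total_ul=15.0, buffer_fold=10.0, enzyme_ul=1.0):
    """Volumes for a single digest; enzyme must not exceed 10% of the total."""
    if enzyme_ul > 0.1 * total_ul:
        raise ValueError("enzyme volume exceeds 10% of the digest")
    buffer_ul = total_ul / buffer_fold
    return {
        "buffer_ul": buffer_ul,
        "enzyme_ul": enzyme_ul,
        "dna_plus_water_ul": total_ul - buffer_ul - enzyme_ul,
    }


def nacl_to_add_ul(volume_ul, current_mm, target_mm, stock_mm=1000.0):
    """Volume of NaCl stock that raises a digest from current_mm to target_mm.

    Solves (V*c1 + v*S) / (V + v) = c2 for v.
    """
    if not current_mm < target_mm < stock_mm:
        raise ValueError("need current < target < stock concentration")
    return volume_ul * (target_mm - current_mm) / (stock_mm - target_mm)


print(digest_recipe())
# Low salt (0 mM NaCl) to medium salt (50 mM) in a 15 ul digest.
print(round(nacl_to_add_ul(15.0, 0.0, 50.0), 2), "ul of 1 M NaCl")
```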

2.5.2. Separation of Restriction Fragments through Agarose Gels

The various methods that were used for resolving and purifying restriction fragments were modified from Maniatis et al (1982). The sizes of restriction fragments were determined by electrophoresis through a 250 mm x 200 mm x 6 mm 1% agarose/TBE slab gel containing 0.2 µg EtBr per ml of agarose. TBE buffer (Table 2.2) was used as electrophoresis buffer throughout. Occasionally, fragments were resolved on a 1% agarose/TBE mini-gel of dimensions 105 mm x

70 mm x 6 mm. Agarose (electrophoresis grade) was purchased from either Bio-Rad or Pharmacia Fine Chemicals Ltd. The DNA sample was mixed with 1/5 volume of loading buffer (0.25% bromophenol blue,

40% (w/v) sucrose in H2O) prior to loading on to the gel. The DNA fragments were separated by electrophoresis overnight at 5-10 mA. The

DNA was visualized by placing the gel on a UVP Ltd UV transilluminator.

Table 2.2

Composition of Buffers

Restriction buffers

10 x low salt    100 mM Tris-HCl, pH 8, 100 mM MgCl2, 10 mM dithiothreitol (DTT)

10 x medium salt    as 10 x low salt but contains 500 mM NaCl

10 x high salt    as 10 x low salt but contains 1 M NaCl

10 x SmaI    200 mM KCl, 100 mM Tris-HCl, pH 8, 10 mM MgCl2, 10 mM DTT

Other buffers

2 x Bal31    1.2 M NaCl, 24 mM CaCl2, 24 mM MgCl2, 40 mM Tris-HCl, pH 8, 2 mM EDTA

10 x LB    67 mM Tris-HCl, pH 7.5, 6.7 mM MgCl2, 10 mM DTT

1 x STET    5% Triton X-100, 50 mM EDTA, 40 mM Tris-HCl, pH 8, 5% sucrose

10 x TBE    89 mM Tris-borate, 50 mM EDTA, pH 8

1 x TE    10 mM Tris-HCl, pH 8, 0.1 mM EDTA

10 x TM    100 mM Tris-HCl, pH 8, 100 mM MgCl2

10 x TNE    100 mM Tris-HCl, pH 8, 1 M NaCl, 10 mM EDTA

Tfb I    30 mM KOAc, 100 mM RbCl, 10 mM CaCl2, 50 mM MnCl2, pH 5.8, 15% glycerol

Tfb II    10 mM MOPS, 75 mM CaCl2, 10 mM RbCl, pH 6.5, 15% glycerol

The concentration of plasmid DNA in a given sample was determined empirically by electrophoresis through a mini-gel against a marker DNA of known concentration. In general, the lowest amount that one could visualise on a transilluminator was roughly 25 ng.

Molecular weight markers of known size, typically bacteriophage λ chromosomal DNA restricted with either AvaI, ClaI or

HindIII, were run alongside the digests in all instances.

2.5.3. Purification of DNA Fragments

Between 0.5 and 10 µg of plasmid DNA were digested by 5-20 units of restriction enzyme, in a volume not exceeding 50 µl, as described in Section 2.5.1. The DNA fragments were resolved by electrophoresis at 50 mA through a 0.8-1.2% low-gelling temperature agarose (LGT-agarose)/TBE mini-gel containing EtBr in TBE buffer.

LGT-agarose was purchased from the Sigma Chemical Co. Ltd.

After electrophoresis, the required restriction fragment was cut out of the gel and placed in a 1.5 ml capped Eppendorf tube. An equal volume of TE buffer (Table 2.2) was added and the gel slice was melted at 68 °C. The agarose and proteins were extracted by adding an equal volume of TE buffer-saturated phenol, vortexing briefly and separating the organic and aqueous phases by centrifugation in a bench-top centrifuge for 2 min. The top (aqueous) phase was transferred to a fresh tube and the extraction was repeated with fresh phenol. The aqueous phase was washed with an equal volume of 24:1 chloroform:isoamyl alcohol and the phases were separated as before.

Finally, the aqueous phase was transferred to a clean tube and the

DNA was precipitated by adding 0.1 volume of 3 M NaOAc, pH 5, and

2 volumes of cold ethanol and placing the tube at -20 °C overnight or on dry ice for 30 min. The DNA was collected by centrifugation in a bench-top centrifuge (approximately 5,000 rpm) for 10 min at 4 °C, and the pellet was washed with cold 96% ethanol, dried under vacuum and resuspended in a small volume of TE buffer.

2.6. Preparation of Phosphatased Vector DNA

The terminal 5’ phosphate groups of linear, restriction enzyme-cut, vector DNA molecules were removed by the action of calf intestinal

(alkaline) phosphatase (CIP), using a modification of Maniatis et al

(1982). This was done in order to reduce the level of intramolecular religation or concatemerisation of the vector.

The DNA was digested with the appropriate restriction enzyme as described in Section 2.5.1. CIP was added to the digest (2 µl of a

0.01 unit/µl stock) and the reaction was incubated at 37 °C for 30 min. 0.01 U of CIP is sufficient to remove the 5’ phosphates from

1 pmol of linear DNA, approximately 1.6 µg of a 4 kb DNA molecule

(Maniatis et al, 1982). The volume of the reaction was adjusted to 80 µl with TE buffer, and 10 µl of 10 x TNE buffer (Table 2.2) plus 10 µl of 10% SDS were added. The CIP was inactivated by heating to

68 °C for 15 min and the DNA was extracted and precipitated as described in Section 2.5.3. Finally, the DNA was resuspended in a small volume of TE buffer, and the concentration estimated by running an aliquot on an agarose/EtBr gel. The phosphatased vector was stored at -20 °C as a 20 ng/µl stock.
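The amount of phosphatase required scales with the molar quantity of DNA rather than with its mass, so a conversion from micrograms to picomoles is needed. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA, which is an assumption introduced here and not a figure from the text; with it, 1.6 µg of a 4 kb fragment works out at roughly 0.6 pmol of molecules, or 1.2 pmol of 5’ ends. Whether the quoted CIP ratio is applied per molecule or per 5’ end is left to the caller, and the function names are hypothetical.

```python
BP_MASS = 660.0  # g/mol per base pair of duplex DNA (approximation, assumed)


def pmol_dna(mass_ug, length_bp):
    """Picomoles of double-stranded DNA molecules in mass_ug micrograms."""
    return mass_ug * 1e6 / (length_bp * BP_MASS)


def pmol_five_prime_ends(mass_ug, length_bp):
    """Picomoles of 5' ends, counting two per linear duplex molecule."""
    return 2.0 * pmol_dna(mass_ug, length_bp)


def cip_stock_ul(pmol, units_per_pmol=0.01, stock_units_per_ul=0.01):
    """Volume of CIP stock for a given pmol count at the quoted ratio."""
    return pmol * units_per_pmol / stock_units_per_ul


molecules = pmol_dna(1.6, 4000)
ends = pmol_five_prime_ends(1.6, 4000)
print(round(molecules, 2), "pmol molecules;", round(ends, 2), "pmol 5' ends")
print(round(cip_stock_ul(ends), 2), "ul of 0.01 unit/ul CIP")
```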

2.7. B. stearothermophilus Gene Library

2.7.1. Construction of the Gene Library

A library that had been constructed in the plasmid pAT153, derived from B. stearothermophilus strain NCA 1503 chromosomal DNA, was provided by Prof. Tony Atkinson (PHLS Centre for Applied

Microbiology and Research, Porton Down, Salisbury, Wiltshire, U.K.).

The library was constructed as follows. B. stearothermophilus chromosomal DNA was digested partially with the restriction enzyme

Sau3A and 9-15 kb fragments were purified following separation by electrophoresis through an agarose slab gel. Subsequently, the fragments were ligated into the unique BamHI site that is contained within the tetracycline-resistance gene (Tc) of the plasmid pAT153 (Twigg and

Sherratt, 1980). The products of ligation were used to transform competent E. coli W5445 (pro-, leu-, thi-, thr-, lacY-, tonA, supE44, r-, m-,

StrR). The cells were plated on L-plates containing either ampicillin or tetracycline and the number of colonies that were resistant to ampicillin

(AmpR) were scored against those that were both AmpR, TcR (i.e. the latter colonies contained intact pAT153 as opposed to a recombinant plasmid). Approximately 80% of the transformants were sensitive to tetracycline (TcS) and AmpR. Of those, greater than 90% contained inserts of the expected size (determined by purifying plasmids from a cross-section of colonies). Each of 3,600 AmpR, TcS colonies was picked and re-plated on L-plates containing ampicillin as nine groups of

900 colonies apiece. Each group of colonies was scraped from the plates into liquid media, grown in liquid culture and aliquots were mixed with glycerol and stored under liquid nitrogen.
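Whether a library of this size is likely to contain a given gene can be estimated with the standard formula P = 1 - (1 - f)^N, where f is the ratio of insert size to genome size and N is the number of independent recombinants. The sketch below applies it under stated assumptions: the genome size of 4,000 kb is a hypothetical round figure, not a value from the text, and the 12 kb insert is simply the midpoint of the 9-15 kb range. The formula also assumes that all fragments are cloned with equal efficiency.

```python
import math


def coverage_probability(n_clones, insert_kb, genome_kb):
    """Probability that a given sequence is present in at least one clone."""
    f = insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(probability, insert_kb, genome_kb):
    """Smallest number of clones giving at least the requested probability."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


GENOME_KB = 4000.0  # hypothetical genome size, assumed for illustration
INSERT_KB = 12.0    # midpoint of the 9-15 kb size range

print(coverage_probability(3600, INSERT_KB, GENOME_KB))
print(clones_needed(0.99, INSERT_KB, GENOME_KB))
```

Under these assumptions a few thousand recombinants give a probability very close to 1, so the conclusion is not sensitive to modest errors in the assumed genome size.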

2.7.2. Amplification of the Gene Library

Aliquots (100 µl) of each of the nine batches of transformed

E. coli that represent the gene library (see preceding section) were spread on L-broth + Amp plates and incubated overnight at 37 °C.

Next day, the colonies were scraped off into sterile medium and used to seed a 1 l liquid culture of L-broth containing ampicillin. The culture was grown at 37 °C, with shaking, for 16 h. A two-step selection was employed in order to reduce the loss of plasmids containing large inserts. In liquid culture, a population of small plasmids may come to predominate over large plasmids over several generations. The smaller plasmids replicate and segregate quickly, within the doubling time for the host. Since the proportion of larger plasmids could be lowered if grown in liquid culture, it is advantageous to culture the cells on solid media first and then amplify quickly in liquid media. Hence, the solid media step should ensure that all plasmids, whether large or small, are represented equally and the broth culture step allows for a quick amplification of the yield of each plasmid.

Next day, the cells were harvested by centrifugation. Plasmid

DNA was purified as described in Section 2.8.1. The purified DNA was resuspended in TE buffer and stored at -20 °C.

2.8. Purification of Plasmid DNA

2.8.1. Large-scale Plasmid Preparation

A modification of the alkaline-SDS method of Birnboim and Doly (1979) was used for large-scale (i.e. greater than 500 ml) preparations of plasmid DNA. The starting material was an overnight culture of E. coli DH5 carrying a plasmid or plasmid library. This is a vigorous strain of E. coli that allows for a purification yield of better than 1 mg of plasmid per litre of bacterial culture. The DNA/RNA pellet, precipitated from propan-2-ol, was resuspended in 7 ml of TE buffer, and solid caesium chloride (7 g) was added, together with 200 µl of 10 mg/ml ethidium bromide (EtBr). The mixture was transferred to two polyallomer centrifuge tubes and loaded into a Beckman type 70.1 Ti rotor. The CsCl gradients were formed in a Beckman L8-55 at 48,000 rpm at 20 °C for a minimum of 16 h. Next day, the covalently-closed circular DNA was visualised under UV light, the DNA was removed and the EtBr was extracted by standard methods (Maniatis et al, 1982).

2.8.2. Small-scale Plasmid Preparation

A protocol involving rapid boiling was used for purifying plasmids from cell cultures of 2 ml volume. This was especially useful where it was necessary to screen a large number of transformants for the desired recombinant plasmid. The method is modified from Holmes and Quigley (1981).

The cells were grown overnight at 37 °C, with shaking, and 1.5 ml of the culture were transferred to an Eppendorf tube. The cells were pelleted by centrifugation in a bench-top centrifuge for 5 min at room temperature, then resuspended in 50 µl of 1 x STET buffer (Table 2.2). Lysozyme (5 µl of a fresh 10 mg/ml solution dissolved in 1 x STET buffer) was added, the mixture was vortexed gently and placed in a boiling water bath for 1 minute. The tube was centrifuged for 10 min at room temperature in order to pellet the cell debris. The plasmid DNA remains in the supernatant, whereas the denatured cellular material, together with chromosomal DNA, forms a mucoid pellet which was removed from the tube with a sterile toothpick. The supernatant (between 30 and 50 µl) was mixed with 100 µl of 0.3 M NaOAc, pH 8, and 200 µl of cold propan-2-ol and the DNA was precipitated by placing the tube on dry ice for 10 min.

The DNA was collected by centrifugation (10 min in a bench-top centrifuge at 4 °C), washed with cold ethanol, dried under vacuum and resuspended in 20 µl of TE buffer. The method produced plasmid DNA (predominantly closed-circular (form I) DNA) that was free from traces of chromosomal DNA, as judged by the appearance on agarose gels containing EtBr. The yield of plasmid DNA was sufficient for several restriction enzyme digestions.

2.9. Ligations

T4 DNA ligase was used throughout to catalyse the ligation of DNA molecules with compatible termini. DNA molecules with protruding (sticky) ends were ligated together in a 10 µl volume containing the following: 20-50 ng of restriction enzyme-cleaved, phosphatased vector DNA (see Section 2.6), a 1- to 3-fold excess of foreign DNA (restricted with the same enzyme or enzymes), 1 µl of 10 x LB (ligation buffer: Table 2.2), 1 mM rATP and 2 units of ligase.

The appropriate controls contained vector DNA only. One control contained ligase in order to ascertain the extent of re-ligation and, by analogy, the efficiency of the phosphatase treatment. A second control did not contain ligase, allowing a check on the amount of uncut vector. The ligations were incubated for at least 6 h at 15 °C.

The ligation of blunt-ended DNA fragments is less efficient as the Km for the activity of the enzyme on blunt-ended DNA molecules is two orders of magnitude higher than the Km for cohesive ends (Maniatis et al, 1982). Consequently, the ligation protocol was modified by increasing the concentration of foreign DNA to a 2- to 6-fold excess over vector and adding 5 units of ligase to the ligation mix. The ligations were incubated at 15 °C overnight with appropriate controls.

Aliquots of the ligation mixes were used to transform competent E. coli strains as described in Section 2.12.

2.10. Repairing Cohesive Ends with T4 DNA Polymerase

The DNA polymerase from bacteriophage T4 possesses a highly active 3' → 5' exonuclease in addition to the 5' → 3' polymerase activity, so is employed to make flush the cohesive ends generated by restriction enzymes that leave either protruding or recessed 3' hydroxyl groups (O'Farrell, 1981).

The target DNA was digested with the appropriate restriction enzyme(s) in a 15 µl volume at 37 °C as described earlier (Section 2.5.1) and the termini were made flush according to Bankier and Barrell (1983). The volume was adjusted to 22 µl with water and the following were added: 3 µl of 10 x TM buffer (Table 2.2), 3 µl of a mixture of dATP, dGTP, dCTP and dTTP, all at a concentration of 250 mM in TE buffer, and 2 µl of T4 DNA polymerase (5 units/µl).

The reaction was incubated at 15 °C for 3 h.

Where appropriate, the DNA was phosphatased (Section 2.6).

2.11. Digestion of Linearised DNA with Nuclease Bal31

The enzyme removes nucleotides from both 5' and 3' termini at an equivalent rate (Maniatis et al, 1982). The DNA to be digested was resuspended in 25 µl of TE buffer, to which was added an equal volume of 2 x Bal31 buffer (Table 2.2). The digestion was initiated by adding 5 units of the enzyme and the reaction was incubated at 37 °C. Aliquots (10 µl) were removed at 5 min intervals and the reaction was terminated by adding each aliquot to 5 µl of 50 mM EDTA on ice (the final concentration of EDTA was 20 mM). The DNA fragments were resolved on an agarose gel and fragments in the desired size range were purified as detailed in Section 2.5.2.

2.12. Transformation of Competent E. coli

2.12.1. Preparation of Competent Cells

The methods were modified from Maniatis et al (1982) using transformation buffer recipes of Dr David Glover.

A 50 ml culture of L-broth was inoculated with 0.5 ml of an overnight culture of either E. coli DH5 or 236c. The culture was incubated at 37 °C for 2-3 h until the cells had grown to A550 = 0.3.

The culture was chilled on ice, and the cells were pelleted by centrifugation at 3,000 rpm for 5 min in an IEC Centra-4X centrifuge.

The cell pellet was resuspended in transformation buffer I (Table 2.2) in half of the original volume and left on ice for at least 30 min.

The cells were pelleted for a second time and resuspended in 1/25 volume of transformation buffer II (Table 2.2). The cells were competent for up to 2 days from this point and were kept at 4 °C.

2.12.2. Transformation of E. coli DH5

Routinely, 3x 10 ** cells were transformed with either a ligation mix

(Section 2.9) or 10 ng of closed-circular DNA in a 200 µl volume and left on ice for 45 min. The cells were subjected to a heat-shock step by incubating the transformed cells at 42 °C for 2 min. Next, 0.8 ml of L-broth was added and the cells were incubated for 1 h at 37 °C (in order to allow the cells to resume growth and express the plasmid-borne β-lactamase) prior to selection on selective media. All the recombinant constructs, which were derivatives of pAT153 or pUC9, were screened for expression of the β-lactamase gene by spreading 50 µl aliquots of cells on L-broth plates containing ampicillin. The plates were incubated in an inverted position at 37 °C overnight.

Where cloning involved the insertional inactivation of the tetracycline-resistance (TcR) gene of pAT153, 50 µl aliquots were plated on L-broth plates containing tetracycline.

2.12.3. Transformation of E. coli 236c

The valS temperature-sensitive strain 236c was transformed with a ligation mix as described in the preceding section for DH5. In contrast to DH5, however, the cells were heat-shocked more gently by incubating the transformed cells for 10 min at 37 °C. A 0.8 ml aliquot of L-broth was added and the cells were incubated at 30 °C for 1 h prior to plating. Aliquots (50 µl) were spread on duplicate L-broth plates containing ampicillin and the plates were incubated in an inverted position at either 32 °C or 42 °C overnight.

CHAPTER 3

KINETIC ASSAYS AND PROTEIN PURIFICATION

3.1. Enzyme Assays

3.1.1. Materials

B. stearothermophilus tRNAVal was enriched by chromatography on BD-cellulose from a mixed tRNA pool by J-P Shi. The charging acceptance for valine is 190 pmol/A260 unit. L-[14C]valine (285 mCi/mmol) was purchased from Amersham International plc.

Yeast inorganic pyrophosphatase was obtained from Sigma Chemicals Ltd and stored as a 1 unit/µl stock solution in water at -20 °C.

ValRS, Mg.ATP, tRNAVal and cocktails containing any of these reagents were stored under liquid nitrogen. On the day of assay, the concentration of active enzyme was determined by active-site titration (Section 3.1.2).

Two types of scintillant were used for radioactivity counting:

BBOT (0.45% 2,5-bis(5'-tert-butylbenzoxazolyl-(2'))thiophene made up in 3:1 toluene:2-methoxyethanol), and PPO/POPOP (0.5% 2,5-diphenyloxazole plus 0.03% 1,4-di-(2-(5-phenyloxazolyl))-benzene in toluene). All solvents used in these scintillants were purchased from BDH Chemicals Ltd and were of scintillation grade. Radioactive samples were counted in a 3000, model 6891, Liquid Scintillation System manufactured by Searle Analytic Inc.

3.1.2. Active-Site Titration

This assay, which is modified from Yarus and Berg (1970), measures the concentration of active enzyme by sequestering enzyme-bound valyl adenylate. Briefly, ValRS was mixed with [14C]valine and ATP, in the presence of inorganic pyrophosphatase, hence favouring the formation of valyl adenylate. The enzyme-bound adenylate is isolated by filtering an aliquot of the reaction mixture through a nitrocellulose filter, the protein adsorbing to the filter whilst the free ATP and valine are washed through. The amount of catalytically active enzyme is derived from the amount of radioactivity retained on the filter, assuming a stoichiometric binding of one mol of valyl adenylate per mol of enzyme.

The assay was conducted in a final volume of 100 µl that contained the following components: 144 mM Tris-HCl, pH 7.78; 10 mM MgCl2; 5 mM 2-mercaptoethanol (2-ME); 0.1 mM PMSF; 2 mM Mg.ATP; 10.5 µM [14C]valine (285 mCi/mmol); 1 unit/ml yeast inorganic pyrophosphatase, and an unknown concentration of ValRS. The reaction was initiated by dispensing a 10 µl aliquot of the enzyme from a 10 µl Hamilton syringe and then incubated at 25 °C for up to 30 min.

Aliquots (25 µl) were removed at regular intervals. Each aliquot was spotted on to a 2.5 cm, 0.45 µm nitrocellulose filter (Schleicher and Schuell BA85) that previously had been soaked in ice-cold 144 mM Tris-HCl, pH 7.78. The filter was washed with 3 ml of the same ice-cold buffer under gentle suction, then dried under a heat lamp for five minutes, transferred to a scintillation vial containing PPO/POPOP and counted. The total number of counts in the reaction mixture was determined by spotting a 20 µl aliquot directly onto a filter, drying and counting as before.

Treatment of Data

As equilibrium was rapidly attained, the three aliquots gave similar values upon counting. The mean of the three readings, corrected for a blank that contained no enzyme, was then divided by the specific activity of [14C]valine to give the amount of enzyme-bound adenylate in the 25 µl aliquot and, hence, the amount of enzyme.
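The data treatment above can be sketched as a short calculation. This is a minimal illustration and not the procedure actually used in the thesis: the function name, the counting-efficiency parameter and the example readings are assumptions made for the sketch. Only the 285 mCi/mmol specific activity, the 25 µl aliquot and the one-adenylate-per-enzyme stoichiometry come from the text.

```python
DPM_PER_CURIE = 2.22e12  # disintegrations per minute in one curie


def active_site_concentration(cpm_readings, blank_cpm,
                              specific_activity_mci_per_mmol,
                              aliquot_ul, counting_efficiency=1.0):
    """Concentration of active sites (µM) in the assay mixture.

    counting_efficiency is an assumed parameter (cpm per dpm); the
    thesis does not state one, so it defaults to 1.0 here.
    """
    mean_cpm = sum(cpm_readings) / len(cpm_readings)
    net_dpm = (mean_cpm - blank_cpm) / counting_efficiency
    # mCi/mmol -> dpm/pmol (1 mCi = 2.22e9 dpm; 1 mmol = 1e9 pmol)
    dpm_per_pmol = specific_activity_mci_per_mmol * 1e-3 * DPM_PER_CURIE / 1e9
    pmol_adenylate = net_dpm / dpm_per_pmol
    # One valyl adenylate is assumed bound per enzyme; pmol per µl equals µM.
    return pmol_adenylate / aliquot_ul


# Hypothetical readings for three 25 µl aliquots and a no-enzyme blank.
conc_um = active_site_concentration([16000, 15800, 16200], 182.5, 285, 25)
print(f"active sites in assay: {conc_um:.2f} µM")
```

Because 10 µl of enzyme is diluted into a 100 µl assay, the concentration in the enzyme stock would be ten times the value returned here.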

3.1.3. Aminoacylation (Charging) Assay

The Michaelis constants for aminoacylation of [14C]valyl adenylate with tRNAVal were measured for variations in the concentrations of ATP or valine. All assays were carried out at 25 °C in a final volume of 100 µl. The basic reaction cocktail contained 144 mM Tris-HCl, pH 7.78, 10 mM MgCl2, 5 mM 2-ME, 0.1 mM PMSF, 1 unit/ml yeast inorganic pyrophosphatase and 9-22 µM tRNAVal (190 pmol/A260 acceptance). The reaction was initiated by adding a 10 µl aliquot of enzyme whose concentration had previously been determined by active site titration (Section 3.1.2). Between ten and twelve concentrations of the varied substrate were assayed. The initial reaction velocities were measured by removing 25 µl aliquots at three different time points and quenching the reaction by mixing the aliquot rapidly with 3 ml of ice-cold 5% trichloroacetic acid (TCA) containing 1% D,L-valine. TCA-precipitated [14C]valyl-tRNAVal was isolated by filtering the sample through a 2.5 cm Whatman GF/C filter under gentle suction. The filters were washed three times with 3 ml volumes of 5% TCA and, finally, with 3 ml of ethanol. The filters were dried under a heat lamp and inserted into a scintillation vial, to which 6 ml of BBOT was added. The radioactivity bound to the filters was measured by scintillation counting, initially for 30 sec in order to gauge the speed of the reaction, then the filters were recounted for 10 min. To ensure that the reaction was measured during its initial linear phase, the amount of radioactivity precipitated was kept below 10% of the level needed to charge all the tRNA (estimated from the known concentration of tRNAVal and the specific activity of the [14C]valine).

In order to determine the value of Km for ATP, the concentration of valine was held constant at 45 µM (64 mCi/mmol), whilst the concentration of Mg.ATP was varied between 25 µM and 5 mM. The concentration of ValRS (purified from a culture of E. coli DH5 transformed with pNB1) was determined by active site titration to be 2.2 nM in the final reaction. The concentration of tRNAVal, in this instance, was 9 µM.

The value of Km for valine was derived by varying the concentration of [14C]valine between 5 µM (8 mCi/mmol) and 88 µM (143 mCi/mmol). The concentration of Mg.ATP was held at 2 mM; the concentrations of ValRS and tRNAVal were 7.3 nM (by active site) and 22 µM respectively.

Treatment of Data

The initial reaction velocities were determined by plotting the amount of TCA-precipitable counts per minute (cpm) for each concentration of the appropriate variable versus the time of quenching, measured from the point at which the enzyme was added. These values were then converted from cpm s-1 to pmol of [14C]valine charged per second by dividing through by the specific activity for the reaction. The velocities V were then plotted against V divided by [S], where [S] was the concentration of the substrate under study (Eadie, 1942; Hofstee, 1959). This gave a linear plot, the slope of which is -Km, the negative of the Michaelis constant. The turnover number for the enzyme, kcat, may be derived from Vmax, the intercept on the y-axis, which represents the initial velocity of reaction at infinite substrate concentration.
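The Eadie-Hofstee treatment can be illustrated with a small least-squares fit. The sketch below is an assumption-laden illustration rather than the calculation used in the thesis: the function names and the synthetic data set are hypothetical, and the thesis does not say whether the line was fitted numerically or drawn by eye. It rests only on the relation V = Vmax - Km(V/[S]).

```python
def eadie_hofstee_fit(substrate_conc, velocities):
    """Fit V = Vmax - Km * (V/[S]) by ordinary least squares.

    Returns (Km, Vmax): Km in the units of substrate_conc,
    Vmax in the units of velocities.
    """
    xs = [v / s for v, s in zip(velocities, substrate_conc)]
    ys = list(velocities)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept  # slope = -Km, y-intercept = Vmax


def turnover_number(vmax_pmol_per_s, enzyme_pmol):
    """kcat (s^-1) from Vmax and the amount of active enzyme in the assay."""
    return vmax_pmol_per_s / enzyme_pmol


# Synthetic, noise-free data generated with Km = 50 µM and Vmax = 2.0 pmol/s.
substrate = [5, 10, 20, 40, 80, 160, 320]
velocity = [2.0 * s / (50 + s) for s in substrate]
km, vmax = eadie_hofstee_fit(substrate, velocity)
print(f"Km = {km:.1f} µM, Vmax = {vmax:.2f} pmol/s")
```

With real data the enzyme amount passed to `turnover_number` would come from the active-site titration, for example 0.22 pmol for 2.2 nM enzyme in a 100 µl assay.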

3.2. SDS-Polyacrylamide Gel Electrophoresis

A modification of the method of Laemmli (1970) was used. 10% polyacrylamide denaturing gels (1 mm thick) were cast in 83 x 40 mm moulds from 10% acrylamide, 0.3% N,N'-bis-methylene acrylamide in 0.1 M Tris-Glycine buffer, pH 8.3, containing 0.1% sodium dodecyl sulphate (SDS). Polymerisation was initiated by the addition of tetramethyl-ethylene-diamine (TEMED) to 0.25% (v/v) and solid ammonium persulphate to 0.02% (w/v). The gels were run in 0.1 M Tris-Glycine buffer, pH 8.3 (containing 0.1% SDS) without a stacking gel. The protein sample, 0.2-10 µg of protein in a volume not exceeding 20 µl, was mixed with 10 µl of sample buffer (10 mM Tris-Glycine, pH 8.3, 0.02% bromophenol blue, 2% SDS, 10% 2-ME and 25% glycerol) and boiled for 3 min prior to loading. Once the samples had been loaded, the gel was subjected to electrophoresis at a constant voltage of 37 V until the samples had entered the gel, then thereafter at 70 V until the bromophenol blue dye had just run off the bottom of the gel (approximately 40 min).

The protein bands were visualized by staining the gel in freshly prepared 0.3% Coomassie Brilliant Blue R in rapid-destain solution (50% methanol, 10% glacial acetic acid in water) for 15 min. The gel was washed for 10 min in rapid-destain solution and then left to destain slowly overnight in 10% glacial acetic acid in water.

Crude cell-lysates were routinely prepared from 5 ml cultures of cells that had grown into late-log phase. The cells were pelleted by centrifugation, resuspended in 1 ml of 144 mM Tris-HCl, pH 7.78,

10 mM MgCl2 containing 14 mM 2-ME plus 0.1 mM PMSF, then frozen and thawed from liquid nitrogen three times to weaken the cell walls.

Finally, the cell suspension was sonicated twice for 5 second bursts with a V i g inch Micro-tip probe coupled to a Heat Systems Inc. model W-375 sonicator. The lysate was centrifuged for 5 min in order to pellet the cell debris and the supernatant was retained. The crude cell-lysates were stored under liquid nitrogen.

3.3. Purification of Valyl-tRNA Synthetase

3.3.1. A Modified Active-Site Titration for ValRS

The activity of ValRS was monitored throughout the purification by active-site titration (Section 3.1.2). Routinely, 10 µl of a sample of ValRS was mixed with 50 µl of active-site cocktail and 40 µl of water. The concentrations of the various components of the cocktail were as follows: 144 mM Tris-HCl, pH 7.78; 10 mM MgCl2; 5 mM 2-ME; 0.1 mM PMSF; 2 mM Mg.ATP; 5.8 µM [14C]valine (285 mCi/mmol) and 1 unit/ml inorganic pyrophosphatase. The reaction was incubated at 25 °C and 25 µl aliquots were removed at three time points. The enzyme-bound [14C]valyl adenylate was trapped as described before.

The assay was modified slightly for quantifying the amount of active ValRS during the enzyme's purification by DEAE- and FPLC-chromatographies (see Sections 3.3.4 and 3.3.5). Each column fraction was assayed by removing a 15 µl aliquot and mixing it with 10 µl of the active-site cocktail. The reaction was incubated at 25 °C for 5 min and 20 µl of the reaction mixture was spotted onto a filter and the adenylate was isolated as described in Section 3.1.2.

3.3.2. Cell Culture and Preliminary Purification of ValRS

A 5 ml volume of L-broth, containing 50 µg/ml ampicillin, was inoculated with E. coli strain DH5 carrying a plasmid that contains the cloned B. stearothermophilus valS gene (either the original clone, pNB1, or the pUC9 subclone pTB8). The culture was grown to an A550 of 0.3 at 37 °C with shaking (typically, this took 3-4 hours). 2.5 ml of this culture was used to inoculate each of two 2 l flasks containing 500 ml of fresh L-broth plus ampicillin (50 µg/ml). Next day, the cells were pelleted by centrifugation at 5,000 rpm for 10 min at 4 °C and were resuspended in 60 ml of TE buffer, pH 7.78 (50 mM Tris-HCl, pH 7.78, 1 mM EDTA containing 10 mM 2-ME and 0.1 mM

PMSF). This cell suspension was frozen and thawed three times from liquid nitrogen and sonicated with a £ inch probe attached to a Heat

Systems - Ultrasonics Inc (USA) model W-375 sonicator for five bursts of 40 sec apiece. Throughout sonication, the suspension was kept on ice to minimise heating. Subsequently, the crude lysate was clarified by centrifugation at 30,000 rpm for 20 min at 4 °C in a Beckman 45 Ti rotor. The supernatant was retained.

The supernatant was heated to 56 °C for 30 min in order to denature E. coli proteins, those from B. stearothermophilus being stable at this temperature. The denatured proteins were pelleted by centrifugation at 8,000 rpm for 15 min at 4 °C. The volume of the lysate was measured and a small aliquot was retained for analysis by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and active-site titration.

3.3.3. Precipitation with Ammonium Sulphate

It had been shown from earlier work that the ValRS from B. stearothermophilus could be precipitated from a cell lysate with ammonium sulphate at concentrations above 30% saturation (J-P Shi, unpublished), 55% saturation being the optimal concentration.

17.28 g (0.326 g/ml) of solid ammonium sulphate was added to the lysate, giving a final concentration of 55% saturation, and the protein was left to precipitate over 1 h at 4 °C with gentle stirring. The precipitate was collected by centrifugation at 20,000 rpm for 20 min at 4 °C in a Beckman 45 Ti rotor. A small aliquot of the supernatant was retained for analysis, but the bulk was discarded. The precipitate was retained and resuspended in a small volume of 50 mM potassium phosphate (KP) buffer, pH 6.5, containing 0.1 mM EDTA, 5 mM 2-ME and 0.1 mM PMSF (this buffer is referred to as 50 mM KP, pH 6.5, in subsequent sections). The suspension was dialysed twice at 4 °C for 16 hours against 2 l of the same buffer containing 10 µM tetrasodium pyrophosphate (added to the buffer in order to displace any enzyme-bound valyl adenylate). Finally, the dialysate was centrifuged in the 45 Ti rotor at 20,000 rpm for 20 min at 4 °C in order to pellet any solid material. The volume of the supernatant was adjusted to 45 ml with 50 mM KP buffer, pH 6.5, and retained for chromatography on DEAE-Sephacel.
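As a quick check on the quantities quoted for the ammonium sulphate cut, the mass of salt scales linearly with lysate volume at the stated 0.326 g/ml. The sketch below is illustrative only: the function name is hypothetical, and the 53 ml lysate volume is an inference from 17.28 g divided by 0.326 g/ml rather than a figure stated in this section.

```python
def ammonium_sulphate_mass(volume_ml, grams_per_ml=0.326):
    """Grams of solid ammonium sulphate for a 55% saturation cut.

    grams_per_ml is the figure quoted in the text for 55% saturation.
    """
    return volume_ml * grams_per_ml


# Assumed lysate volume of 53 ml (inferred, not stated in this section).
mass_g = ammonium_sulphate_mass(53.0)
print(f"{mass_g:.2f} g of ammonium sulphate")
```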

An aliquot of the 55% ammonium sulphate supernatant was retained and dialysed against the same buffer for future analysis by SDS-PAGE and active site titration, in addition to a sample of the re-dissolved ammonium sulphate precipitate. Active-site titration revealed that a 55% ammonium sulphate cut was sufficient to precipitate greater than 90% of the ValRS (J-P. Shi, unpublished).

3.3.4. DEAE-Sephacel Chromatography

A 5 cm diameter column was packed with 100 ml of DEAE-Sephacel ion-exchange resin (Pharmacia) suspended in 2 M KP, pH 6.5, and the column was equilibrated with 50 mM KP buffer, pH 6.5. The column was loaded with the ammonium sulphate-precipitated material (see preceding section) and the column was washed with 50 mM KP buffer, pH 6.5. A flow rate of 1 ml/min was maintained throughout the purification and all manipulations took place at 4 °C. A sample of the eluent was tested by active-site titration to check that the ValRS had bound to the column. This being so, the enzyme was eluted with a 1 l gradient of 50-350 mM KP buffer, pH 6.5. Fractions (10 ml) were collected using a Pharmacia FRAC-300 fraction collector and the fractions were assayed by the modified active-site titration (see Section 3.3.1). The number of cpm retained on the filters was plotted against fraction number and a typical plot is shown in Fig. 3.1. Those fractions encompassing the peak of activity were pooled, in this case giving a final volume of 60 ml. Routinely, a yield of 30-50% was obtained for this step.

The peak fractions from the DEAE-Sephacel chromatography were concentrated to a volume of 10 ml by pressure membrane filtration at a pressure of 30 p.s.i. using an Amicon PM30 filter. The concentrate was then dialysed against 2 l of 20 mM Tris-HCl, pH 7.8, containing 0.1 mM EDTA, 5 mM 2-ME and 0.1 mM PMSF, at 4 °C for 16 h. The concentrate was frozen rapidly in liquid nitrogen and stored at -70 °C until required.

3.3.5. FPLC-Chromatography

The Pharmacia Fast Protein Liquid Chromatography (FPLC) system allowed for a rapid purification at room temperature of the ValRS that is present in the concentrate from the DEAE-Sephacel column. The enzyme was resolved on a 100 mm x 10 mm Pharmacia Mono Q pre-packed HR 10/10 ion-exchange column with a bed volume of 8 ml and a maximum protein capacity of 200 mg. The column resin is an anion-exchange material consisting of a hydrophilic beaded support with trimethyl-amino methyl exchange groups. The column, previously equilibrated with the loading buffer (20 mM Tris-HCl, pH 7.8, 5 mM 2-ME, 0.1 mM PMSF and 0.1 mM EDTA) was loaded with half of the dialysed concentrate, then washed with starting buffer for 5 min. A flow rate of 5 ml/min was maintained throughout all steps of the purification. A gradient of 100-300 mM NaCl in loading buffer was applied to the column to elute the bound ValRS, the gradient being applied from a dual syringe pump P-500 system over 40 min and controlled from a GP-250 Gradient Programmer. The ValRS was eluted at approximately 200 mM NaCl. The absorbance of the column eluant was monitored at 280 nm through a flow cell and UV-1 single path monitor attached to a chart recorder and 2.5 ml fractions were collected by a Frac-100 fraction collector. All FPLC components were purchased from Pharmacia Fine Chemicals AB, Sweden. Fractions were assayed for ValRS activity by the modified active-site titration (see Section 3.3.1). The results of the active-site titration, expressed as counts per minute (cpm), and the absorbance at 280 nm (A280) were plotted against fraction number. A typical profile is shown in Fig. 3.2. In this particular example, the peak of ValRS activity was judged to comprise fractions 37 to 40 inclusive and these fractions were pooled accordingly, giving a total volume of 10 ml. Fraction 41 was also retained, but was not pooled with the peak fractions. Samples of the FPLC-purified enzyme were retained for SDS-PAGE and active-site titration.

Fig. 3.1. (Facing) DEAE-Sephacel chromatography of ValRS. Protein precipitated by a 55% ammonium sulphate cut was dialysed against 50 mM KP buffer, pH 6.5, and eluted from a DEAE-Sephacel ion-exchange column by a 50-350 mM KP buffer gradient (dashed line) at 4 °C. The A280 was monitored and is shown as a solid line. Fractions were assayed for ValRS by the modified active-site titration (Section 3.3.1) and the activity, in counts per minute (cpm) per 15 µl aliquot, is depicted by filled circles.

Fig. 3.2. (Facing) FPLC-purification of ValRS. The peak fractions from the DEAE-Sephacel chromatography (Fig. 3.1) were pooled and dialysed against 20 mM Tris-HCl buffer, pH 7.8. The dialysate was loaded on to a Pharmacia Mono Q anion-exchange column and resolved by applying a 100-300 mM NaCl gradient in loading buffer (dashed line). The absorbance at 280 nm was monitored throughout and the progress of elution is shown as an unbroken line. Fractions were assayed for ValRS by active-site titration (Section 3.3.1). ValRS activity (cpm per 15 µl sample) is plotted as filled circles.

3.3.6. A Second Heat Step, Concentration and Storage of the Purified

Enzyme

The pooled fractions were subjected to a second heat treatment by incubating at 56 °C for 30 min in order to remove any remaining

E. coli ValRS. Precipitated proteins were removed by centrifugation.

The pool, which contained 200 mM NaCl, was concentrated to 1 ml by pressure membrane filtration through an Amicon PM30 filter as described in Section 3.3.4. The NaCl was removed by solvent exchange by adding 10 ml of 20 mM Tris-HCl buffer, pH 7.8, directly to the concentrate in the Amicon chamber and concentrating to 1 ml. This procedure was repeated and the concentrate was washed out with

20 mM Tris buffer, the final volume being 2 ml. An aliquot was removed for active-site titration and the remainder of the concentrated enzyme was stored under liquid nitrogen.

3.3.7. Discussion

The results of a typical purification are presented in Table 3.1. The overall yield of purified ValRS was 39% (relative to the amount in the crude lysate as determined by active-site titration). This represents roughly 15 mg of ValRS from a 1 l broth culture.

An SDS-polyacrylamide gel representing the course of purification is shown in Fig. 3.3. The 55% ammonium sulphate cut precipitates the bulk of the 110,000 Da material that is present in the crude lysate (compare lanes b and d). The side-fraction No. 41 (lane g) from the FPLC-chromatography apparently contained no more of the smaller contaminants than did the pooled peak fractions (lane f).

Table 3.1

Summary of a Typical ValRS Purification

Purification step            Volume   Active sites(1)   A280                 Specific activity(2)    % Purification
                             (ml)     concentration     (absorbance units)   (µM/absorbance unit)
                                      (µM)

Sonicated crude lysate       53       6.5               2.84                 2.29                    100
55% ammonium sulphate ppte   20       16.2              3.1                  5.23                    94.2
DEAE-Sephacel pool           20       9.2               0.73                 12.6                    53.5
FPLC-MonoQ(3) pool           4.2      32.2              1.22                 26.4                    39.3

(1) Active site titrations were carried out as described in Section 3.1.2.
(2) The specific activity was calculated by dividing the concentration of active sites by the absorbance at 280 nm.
(3) FPLC anion-exchange chromatography; see Section 3.3.5.
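The derived columns of Table 3.1 can be recomputed from the measured ones as a consistency check. The sketch below is illustrative, with hypothetical function and field names. It assumes that the final percentage column is the amount of active sites (volume multiplied by concentration) relative to the crude lysate, which reproduces the tabulated figures to within rounding.

```python
# (step, volume in ml, active-site concentration in µM, A280) from Table 3.1
STEPS = [
    ("Sonicated crude lysate", 53.0, 6.5, 2.84),
    ("55% ammonium sulphate ppte", 20.0, 16.2, 3.1),
    ("DEAE-Sephacel pool", 20.0, 9.2, 0.73),
    ("FPLC-MonoQ pool", 4.2, 32.2, 1.22),
]


def summarise(steps):
    """Return per-step nmol of active sites, specific activity and % recovery."""
    start_nmol = steps[0][1] * steps[0][2]  # ml x µM = nmol
    rows = []
    for name, volume_ml, conc_um, a280 in steps:
        nmol = volume_ml * conc_um
        rows.append({
            "step": name,
            "nmol": nmol,
            "specific_activity": conc_um / a280,  # µM per absorbance unit
            "percent": 100.0 * nmol / start_nmol,
        })
    return rows


for row in summarise(STEPS):
    print(f"{row['step']:<28} {row['nmol']:7.1f} nmol  "
          f"{row['specific_activity']:6.2f}  {row['percent']:6.1f}%")
```

The recomputed percentages differ from the tabulated ones by at most about 0.2, which is consistent with the rounding of the tabulated volumes and concentrations.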

Fig. 3.3. SDS-Polyacrylamide gel showing purification of ValRS. ValRS was purified as described in Section 3.3. Lanes a and i, 2 µg of purified native ValRS (Mulvey and Fersht, 1977a), Mr = 103,000; lane b, crude cell lysate, sonicated and heated to 56 °C; lane c, 55% ammonium sulphate supernatant; lane d, 55% ammonium sulphate precipitate; lane e, pool of peak fractions from DEAE-chromatography; lane f, pool of FPLC-chromatography peak fractions; lane g, FPLC side-fraction No. 41 (see Section 3.3.5); lane h, FPLC pooled fractions following heat-step at 56 °C.

E. coli proteins are heat-labile under conditions where B. stearothermophilus proteins are stable. For example, an 8 min incubation of a crude E. coli cell-lysate at 60 °C denatures 55% of the proteins, compared to 3% of the thermophilic proteins in a comparable study (Koffler and Gale, 1957, cited in Stanier et al, 1976). This presents a convenient method of removing contaminating E. coli ValRS and has been used routinely during the purification of batches of cloned B. stearothermophilus TyrRS (A.J. Wilkinson, University of London Ph.D. thesis, 1984). The second heat step (i.e. post-FPLC purification) appears to have a slight effect in removing E. coli proteins (compare lanes f and h). Undoubtedly, most proteins are removed during the first heat step.

3.4. Amino acid Sequencing

N-terminal analysis of purified B. stearothermophilus ValRS was carried out by Mr. Ian Blench. Approximately 1 nmol of the enzyme in a 50 µl volume was dialysed against 0.1% trifluoroacetic acid and analysed on the Imperial College gas-phase protein sequencer by modified gas-phase Edman degradation (Gros and Labouesse, 1969).

Briefly, the enzyme was subjected to a double-couple/double-cleavage cycle consisting of two coupling reactions to a disc support (15 min each), followed by two rounds of cleavage by trifluoroacetic acid (in argon) to cleave off the phenylthiocarbamyl amino acid derivatives.

These were analysed by high-pressure liquid chromatography on a Varian LC-5000 liquid chromatograph and the samples were monitored at 254, 269, 280 and 313 nm simultaneously using a Waters 490 detector.

3.5. Amino Acid Composition Determination

Purified B. stearothermophilus ValRS (60 pmol) was hydrolysed in vacuo at 110 °C in a 50 µl volume of 6 N HCl. Each hydrolysis contained 2 nmol of nor-leucine as an internal standard. Reaction samples were terminated at the following times: 16, 24, 68, 88 and 111 h. The samples were analysed on a Beckman 121MB amino acid analyser, using the method of Moore and Stein (1963). I am grateful to Mr. Dave Featherby for carrying out the analysis.

The numbers of each amino acid were calculated from the area under each peak of the chromatograph trace, relative to a standard containing the 20 amino acids in equimolar ratios.

3.6. Determination of the Number of Tryptophan Residues in ValRS

The number of tryptophan residues in the cloned ValRS that had been predicted from a translation of the valS DNA sequence was confirmed by a modification of the spectrophotometric method of Edelhoch (1967). The method (Mulvey et al, 1974) involves determining the number of tryptophan residues in a completely unfolded protein (in 6 M guanidine hydrochloride) from the differences in absorbance at 280 and 288 nm using the formula

$$N_{trp} = \frac{\varepsilon_{288}}{3103} - \frac{\varepsilon_{280}}{10318} \qquad (1)$$

where Ntrp is the number of tryptophan residues in the protein, and ε288 and ε280 are extinction coefficients for that protein at 288 and 280 nm, respectively. Samples of purified B. stearothermophilus TyrRS, which contains 6 tryptophan residues, B. stearothermophilus ValRS and ribonuclease A (contains no tryptophans) were diluted separately to A280 = 0.4 in 6 M guanidine hydrochloride/35 mM KP buffer, pH 6.5, containing 5 mM 2-ME (modified from Mulvey et al, 1974). The proteins were left to denature at 4 °C for 6 hr. A blank sample, containing no protein, was treated in the same fashion. The absorbances at 280 and 288 nm for each sample were then determined and corrected for the blank values. The values of ε288 and ε280 were derived by dividing the corresponding absorbance value by the concentration of the protein.
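Equation (1) can be applied as in the following minimal sketch. The function names are illustrative. The coefficients used in the consistency check (4815 and 385 at 288 nm, 5690 and 1280 at 280 nm, for tryptophan and tyrosine respectively in 6 M guanidine hydrochloride) are assumed from Edelhoch (1967) and are not quoted in the text above.

```python
def molar_extinction(absorbance, molar_conc, path_cm=1.0):
    """Blank-corrected absorbance divided by molar concentration (M^-1 cm^-1)."""
    return absorbance / (molar_conc * path_cm)


def tryptophan_count(eps_288, eps_280):
    """Equation (1): tryptophan residues in the fully unfolded protein."""
    return eps_288 / 3103.0 - eps_280 / 10318.0


# Consistency check: build the two extinction coefficients expected for a
# protein with 27 tryptophans and 34 tyrosines (assumed Edelhoch values),
# then recover the tryptophan count with equation (1).
n_trp, n_tyr = 27, 34
eps_288 = 4815 * n_trp + 385 * n_tyr
eps_280 = 5690 * n_trp + 1280 * n_tyr
estimate = tryptophan_count(eps_288, eps_280)
print(round(estimate))
```

The tyrosine contribution cancels between the two wavelengths, which is why only the tryptophan count survives in equation (1).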

The extinction coefficient (εM) of the purified cloned ValRS was determined on the basis that the values for tryptophan, tyrosine and cysteine at 279-281 nm are 5480, 1180 and 120 respectively (Mulvey et al, 1974) using the formula

$$\varepsilon_M = \frac{5480\,N_{trp} + 1180\,N_{tyr} + 120\,N_{cys}}{102{,}030} \qquad (2)$$

where Ntrp, Ntyr and Ncys are the predicted numbers of tryptophans (27), tyrosines (34) and cysteines (4) and 102,030 represents the predicted molecular weight of the ValRS (Section 7.2.1). This gives a value for εM (ValRS) of 1.85 mg-1 ml-1 cm-1.

The molar concentration of the ValRS was determined by dividing the absorbance at 280 nm (A280) by εM and its molecular weight.

This value was confirmed by determining the protein content using the method of Lowry et al (1951). A calibration curve of absorbance versus protein content was constructed for standard solutions of 0-100 µg/ml bovine serum albumin (BSA). The concentration of ValRS was derived from this curve after its absorbance had been measured and was found to be 1.428 mg/ml. The A280 of this ValRS sample was found to be 2.619, giving an extinction coefficient of 1.835 mg-1 ml-1 cm-1, which agrees well with the aforementioned value.
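The arithmetic behind equation (2) and the comparison with the Lowry determination can be reproduced with the short sketch below. The variable and function names are illustrative; all of the numbers are those quoted in this section.

```python
MR_VALRS = 102_030  # predicted molecular weight of ValRS


def predicted_extinction(n_trp, n_tyr, n_cys, mol_weight):
    """Equation (2): extinction coefficient per mg/ml of protein per cm."""
    molar = 5480 * n_trp + 1180 * n_tyr + 120 * n_cys
    return molar / mol_weight


# Value predicted from the translated valS sequence (27 Trp, 34 Tyr, 4 Cys).
eps_predicted = predicted_extinction(27, 34, 4, MR_VALRS)

# Value measured from the A280 and the Lowry protein concentration (mg/ml).
eps_measured = 2.619 / 1.428

# Molar concentration of the sample: A280 divided by the predicted
# coefficient (giving mg/ml, i.e. g/l) and then by the molecular weight.
molar_conc = 2.619 / (eps_predicted * MR_VALRS)

print(round(eps_predicted, 2), round(eps_measured, 2))
```

The two coefficients differ by less than 1%, which is the agreement referred to in the text.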

CHAPTER 4

CLONING OF THE B. stearothermophilus valS GENE

4.1. Introduction

The valyl-tRNA synthetase belongs to a distinct sub-group of aminoacyl-tRNA synthetases that also includes the isoleucyl- and leucyl-tRNA synthetases (Schimmel and Söll, 1979). The characteristics of these enzymes have been discussed elsewhere (Chapter 1). The large size of these enzymes (typically Mr = 110,000) suggested that parts of the enzymes might be redundant with respect to function. Several synthetase protomers, for example those of E. coli TrpRS (a dimer of subunit Mr 37,000) and HisRS (dimeric; subunit Mr 45,000), are almost three times smaller than the large monomeric synthetases, suggesting that the latter enzymes contain regions that are non-essential for substrate binding and catalysis. A popular view of the mid-1970s was that the larger synthetases had evolved through gene duplication and fusion, giving rise to excessively large enzymes containing extensive internal repeats (Section 1.3.4). The evidence to support such conclusions has fallen into disrepute of late with the cloning and sequencing of a number of synthetase genes. The DNA sequences for E. coli AlaRS (Putney et al, 1981a), the MetRS from E. coli (Barker et al, 1982; Dardel et al, 1984) and yeast cytoplasm (Walter et al, 1983), and the E. coli IleRS (Webster et al, 1984) have been determined. The predicted protein sequences, confirmed in some cases by protein sequencing of proteolytic fragments or by mass spectroscopy, were shown not to contain any significant internal repeats.

The first objective was to clone and express the valS gene from a library of B. stearothermophilus chromosomal DNA. Clarke and Carbon established the precedent for the isolation of bacterial genes in 1975. Basically, this is achieved by selecting for the ability of a cloned gene (present in a collection of DNA fragments derived from total chromosomal DNA and ligated into a plasmid vector) to complement a lesion in the corresponding chromosomal gene of an E. coli mutant.

The method was used to clone the entire E. coli arabinose, leucine and tryptophan operons by complementation of the appropriate auxotrophic mutant (Clarke and Carbon, 1975) and, later, extended to cloning yeast genes in E. coli (Ratzkin and Carbon, 1977; Struhl et al, 1976). Since then, numerous genes have been cloned using this approach, including several aminoacyl-tRNA synthetase genes, among them E. coli valS (Skogman and Nilsson, 1984) and the tyrS genes from both E. coli and B. stearothermophilus (Barker, 1982), cloned by complementation of the appropriate temperature-sensitive E. coli strain. Competent mutant cells were transformed with a gene library that had been constructed in a plasmid containing a selectable marker, usually an antibiotic-resistance gene. Cells that had taken up a plasmid were selected on solid media containing the antibiotic at the selection temperature that is non-permissive for that strain. Colonies growing at that temperature may be of two types (Fig. 4.1). First, a colony may have been transformed by a plasmid encoding the cloned gene, the protein product of which is able to complement the temperature-sensitive lesion in the mutant gene (Fig. 4.1, scheme (a)). Second, the colony might possess a chromosomal reversion back to wild-type phenotype, able to grow in the presence of the antibiotic because it carries a plasmid (scheme (b), same figure). The possibilities may be distinguished by purifying plasmid from a number of colonies, introducing them back into the mutant strain and plating aliquots of transformed cells on selective media at both the permissive and non-permissive temperatures. A plasmid that contains the desired gene will give equivalent numbers of colonies at both selection temperatures after the second round of selection (a). If plasmid DNA had been isolated from a revertant colony, the plasmid will be unable to complement the mutation during the second round of selection and transformed colonies will grow only at the permissive temperature.
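The decision made after the second round of selection can be expressed as a simple rule, sketched below. The function name, the colony counts and the threshold of 0.5 for "equivalent numbers" are assumptions made for illustration; no numerical criterion is given in the text.

```python
def classify_plasmid(colonies_32, colonies_42, min_ratio=0.5):
    """Interpret second-round colony counts for one purified plasmid.

    colonies_32 and colonies_42 are the transformants obtained at the
    permissive and non-permissive temperatures respectively.
    """
    if colonies_32 == 0:
        return "no transformants"
    # Scheme (a): roughly equal numbers at both temperatures.
    if colonies_42 / colonies_32 >= min_ratio:
        return "complementing"
    # Scheme (b): growth at the permissive temperature only.
    return "from revertant"


# Hypothetical counts for two plasmid preparations.
print(classify_plasmid(200, 190))
print(classify_plasmid(200, 0))
```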

Fig. 4.1. A second round of transformation allows complementation to be distinguished from chromosomal reversion. The E. coli chromosomal valS gene (open box, top of each figure) contains a thermolabile lesion which is depicted as an open vertical bar at 32 °C and as a filled vertical bar at 42 °C (the non-permissive temperature). Plasmids are depicted as circles carrying either a valS gene (open box) or a non-specific piece of DNA (filled box), in addition to the plasmid-encoded β-lactamase gene (AmpR: hatched box). A colony of transformed E. coli 236c growing at 42 °C on L-Amp media has a valS+ phenotype for one of two reasons. Scheme (a): a plasmid carrying a cloned valS gene is isolated from a mutant cell and transformed back into 236c. Transformed valS+ colonies growing at 32 °C in the second selection contain functional chromosomal and plasmid-borne valS genes, whereas colonies grown at 42 °C possess a functional plasmid gene only, conferring a valS+ phenotype on the colony. Scheme (b): a plasmid carrying a random DNA insert is isolated from a valS+ chromosomal revertant of 236c at 42 °C. When the plasmid is purified and transformed back into 236c (selection 2), colonies grow at 32 °C as the chromosomal valS gene is functional, but not at 42 °C as this temperature is non-permissive for the chromosomal gene.

4.2. Cloning of the valS Gene from B. stearothermophilus by Complementation of a Temperature-Sensitive E. coli Strain

4.2.1. Amplification of the B. stearothermophilus Library in the Mutant Strain

The E. coli 236c mutant strain contains a temperature-sensitive lesion for ValRS, enabling the strain to grow at 32 °C (the permissive temperature) but not at the non-permissive temperature of 42 °C (Dr K. Bohman, personal communication). This was verified, and the reversion frequency for the mutation ascertained, by preparing serial dilutions of a mid-log phase culture in fresh broth and spreading aliquots on duplicate L-broth plates. One of each pair of plates was incubated overnight at 32 °C, the other was incubated at 42 °C for the same period of time. Next day, the numbers of colonies were counted. No colonies were observed at the non-permissive temperature and, from the numbers of colonies that grew at 32 °C at highest dilutions, a reversion frequency of better than 3.3 x 10^-[?] was calculated. This compares well with the quoted frequency of 2 x 10^-9 (Dr D.G. Barker, personal communication).

Competent 236c cells were prepared as detailed in Section 2.12.1. Two 200 µl aliquots of cells were transformed with 100 ng of the B. stearothermophilus DNA library (see Section 2.7). Following a 45 min incubation on ice, the cells were heat-shocked, then incubated at 32 °C with 0.8 ml of fresh L-broth for 1 h. Next, 100 µl aliquots were spread on L-broth plates containing ampicillin and the plates were incubated overnight at 32 °C.

Next day, the AmpR colonies were counted; approximately 25,000 colonies in total were obtained. These were harvested by adding 3 ml of sterile L-broth to the plates and scraping the cells from the surface of the agar with a sterile glass rod. The cell suspensions from all the L-broth + Amp plates were pooled.

Taking into account that about 20% of the colonies contain religated or uncut vector only (i.e. those that are AmpR, TcR), and that the library was based on a total of 3,600 individual colonies, this selection should cover the entire library roughly five times. This step allows for an amplification of the library in the mutant strain, ensuring that the entire library is present in the next, and critical, selection at the non-permissive temperature.
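The coverage estimate follows from the figures quoted above, as the following sketch shows. The function name is illustrative; the three input values are those given in the text.

```python
def library_coverage(colonies, vector_only_fraction, library_size):
    """Number of times the original library is represented among the colonies."""
    # Discount colonies carrying religated or uncut vector only.
    recombinants = colonies * (1 - vector_only_fraction)
    return recombinants / library_size


coverage = library_coverage(colonies=25_000, vector_only_fraction=0.20,
                            library_size=3_600)
print(round(coverage, 1))
```

About 20,000 of the 25,000 colonies are expected to carry inserts, giving between five and six copies of each of the 3,600 original clones.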

4.2.2. Selection for valS Plasmids by Complementation

A series of ten-fold dilutions was made from the pool of colonies that grew at the permissive temperature and aliquots were plated on duplicate L-broth plates containing ampicillin. The plates were divided into two sets, one set being incubated overnight at 32 °C, the other at 42 °C.

The numbers of colonies on both sets of plates were counted next day and the results are given in Table 4.1. By calculating the number of colonies relative to the dilution factor, the ratio of numbers of colonies growing at 42 °C to those at 32 °C was determined to be 1 in 12,000. This figure is well above the reversion frequency for E. coli 236c. Consequently, a number of colonies that grew at the non-permissive temperature were individually picked into 2 ml of L-broth containing ampicillin and grown overnight at 42 °C with shaking. Plasmid DNA was purified from each culture as described in Section 2.8.2 and resuspended in a 25 µl volume of TE buffer. Competent 236c (200 µl aliquots) were transformed with 50-100 ng of each plasmid preparation and selected by growing on L-plates containing ampicillin at both selection temperatures. All of the plasmids (from a sample of 15) transformed 236c to equivalence at both temperatures. On the basis of these results, the plasmids were considered to carry the B. stearothermophilus valS gene. Such plasmids were numbered and prefixed pNB (e.g. pNB1, pNB2 etc).

Table 4.1

Growth of E. coli 236c Cells Transformed with B. stearothermophilus Gene Library

                      Selection Temperature
Dilution factor       32 °C       42 °C
10^-4                 ND          150
10^-5                 ND          20
10^-6                 lawn        3
10^-7                 ~1000       ND
10^-8                 300         ND

The temperature-sensitive E. coli strain 236c (Isaksson et al, 1977) was transformed with a gene library of B. stearothermophilus DNA fragments cloned into the BamHI site of pAT153 as described in Section 2.7. 25,000 ampicillin-resistant colonies that grew at 32 °C (the permissive temperature for the strain) were harvested and serially diluted in L-broth. 50 µl aliquots were re-plated on duplicate L-Amp plates and incubated at 32 °C or 42 °C (the non-permissive temperature). The table shows the number of colonies that grew at both temperatures for a range of ten-fold dilutions. The ratio of colonies growing at 42 °C to those growing at 32 °C is approximately 1:12,000. ND = not determined.
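The way in which a ratio of this kind is obtained from Table 4.1 can be sketched as follows. The averaging scheme is an assumption, since the text does not state how the counts at the different dilutions were combined, and the count of about 1000 at the 10^-7 dilution is itself approximate. With a simple mean of the titres the sketch gives roughly 1 in 9,000, the same order of magnitude as the quoted figure of 1 in 12,000.

```python
# Colony counts from Table 4.1, keyed by dilution factor.
counts_42 = {1e-4: 150, 1e-5: 20, 1e-6: 3}
counts_32 = {1e-7: 1000, 1e-8: 300}


def mean_titre(counts):
    """Average number of colony-forming cells per plated volume, undiluted."""
    titres = [n / dilution for dilution, n in counts.items()]
    return sum(titres) / len(titres)


# Colonies growing at 32 degrees for every colony growing at 42 degrees.
ratio = mean_titre(counts_32) / mean_titre(counts_42)
print(f"about 1 in {ratio:,.0f}")
```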

4.2.3. Characterisation and Sizing of pNB Plasmids

Approximately 200 ng of each plasmid was restricted with either EcoRI or PstI. The restriction fragments were resolved on an agarose gel against molecular weight markers; Fig. 4.2 shows a typical set of digests. The molecular weight markers (lane a) are derived from an AvaI digest of bacteriophage λ DNA. The plasmids were estimated to be between 13 and 14 kb in size (representing a 9.5-10.5 kb insert of B. stearothermophilus DNA) from the rough sizes assigned to the fragments generated by EcoRI or PstI. All of the plasmids were identical. Plasmid pNB1 was singled out for further study and was restricted with a number of enzymes (Fig. 4.3). A more accurate size of 13.6 kb was assigned to the plasmid on the basis of these data.
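Fragment sizes are commonly read from a gel by interpolating between marker bands on a logarithmic size scale. The sketch below illustrates that general approach and is not necessarily the procedure used here: the function name is hypothetical and the migration distances are invented, while the marker sizes are those of the AvaI digest of phage λ DNA shown in Fig. 4.2.

```python
import math


def estimate_size_kb(distance_mm, markers):
    """Interpolate a fragment size from its migration distance.

    markers is a list of (distance_mm, size_kb) pairs; log10(size) is
    interpolated linearly between the two flanking marker bands.
    """
    points = sorted(markers)  # shortest migration (largest fragment) first
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the markers")


# Marker sizes in kb from Fig. 4.2; distances in mm are invented.
markers = [(10.0, 14.7), (16.0, 8.6), (19.0, 6.9), (24.0, 4.7), (28.0, 3.7)]
band = estimate_size_kb(17.5, markers)
print(round(band, 1))
```

Summing the sizes estimated for all of the fragments in one digest then gives the size of the plasmid.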

The BstI digestion (lane c) produced a smear of DNA fragments, probably due to the reported "star" activity for this enzyme, whereby the enzyme recognises Sau3A sites (5’ N^GATCN 3’) as well as the intended target sequence (5’ G^GATCC 3’). This occurs when the concentration of glycerol in the digestion mix exceeds 5% (Pharmacia Molecular Biologicals catalogue, 1984).

Fig. 4.2. Sizing pNB plasmids. A number of pNB plasmids (labelled pNB3, pNB7, pNB4 and pNB1 on the gel) were restricted with either EcoRI (lanes c, e, g and i) or PstI (lanes b, d, f and h). Molecular weight markers (an AvaI digest of phage λ DNA) were run in lane a and the sizes of the fragments, in kilobases (kb), are shown alongside (14.7, 8.6, 6.9, 4.7, 3.7, 1.9, 1.7 and 1.6 kb).

Fig. 4.3. Preliminary restriction analysis of pNB1. Plasmid DNA (200 ng) was digested with one of the following restriction enzymes: EcoRI (lane b); BstI (c); HindIII (d); PstI (e), and KpnI (f). A ClaI digest of λ DNA was run alongside as a marker (lanes a and g). The sizes of the marker fragments, in kb, are indicated (11.4, 10.5, 6.2, 4.4/4.2, 3.7, 2.6, 2.0, 1.9, 1.8 and 1.7 kb; fragment x is an artefactual band, migrating with an apparent size of 3.3 kb, that is common to all preparations of the marker).

4.2.4. Discussion

The size of the valS gene had earlier been estimated to be 3 kb long, based upon an assumed average molecular weight for an amino acid of 110 Da and an enzyme of Mr = 110,000 (Koch, Boulanger and Hartley, 1974; Mulvey and Fersht, 1977a). Plasmid pNB1 is roughly 14 kb in size, so it was logical to attempt to sub-clone the gene in order to reduce the amount of DNA that would have to be sequenced. The chosen approach was to re-clone fragments of pNB1 back into pAT153 and select for the valS+ phenotype in transformed E. coli 236c growing at 42 °C (next section).
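The estimate of 3 kb follows from three nucleotides per codon, as in the short sketch below. The function name is illustrative. The second figure uses the 880-residue open reading frame reported in the Abstract and is shown only for comparison with the earlier estimate.

```python
def gene_length_bp(protein_mr, mean_residue_mr=110.0):
    """Coding length implied by a protein of the given molecular weight."""
    residues = protein_mr / mean_residue_mr
    return 3 * residues  # three base pairs per codon


estimate_bp = gene_length_bp(110_000)  # from the apparent Mr of the enzyme
orf_bp = 3 * 880                       # from the 880-residue open reading frame
print(estimate_bp, orf_bp)
```

Either figure is a small fraction of the roughly 14 kb carried by pNB1, which is the motivation for subcloning.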

The cloning of the B. stearothermophilus valS gene was confirmed by analysing the size of the protein by polyacrylamide gel electrophoresis of crude cell lysates under denaturing conditions (Laemmli, 1970) and by detailed analysis of the kinetic parameters of purified enzyme. These results are discussed in greater detail in Chapter 5.

4.3. Subcloning of valS from pNB1

4.3.1. Introduction

An attempt was made to subclone the gene so as to reduce the amount of DNA that would have to be sequenced. The B. stearothermophilus DNA library was constructed by ligating Sau3A fragments (Sau3A recognition sequence 5’ N^GATCN 3’) into the BamHI site of pAT153 (recognition sequence 5’ G^GATCC 3’), so the chances of having recreated the BamHI sites, allowing for the inserted DNA to be excised from the plasmid with BamHI in subsequent manipulations, are low. Rather than map the plasmid with restriction enzymes and subclone a smaller fragment carrying the valS gene, an attempt was made to subclone fragments of pNB1 of between 3 and 5 kb by a shotgun approach, similar to that taken to clone the gene in the first place.
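Why the chances are low can be illustrated by enumerating the possible junctions. In the sketch below the vector contributes the 5’ G of its BamHI site, the shared cohesive end is GATC, and the next base comes from the insert. Equal frequencies of the four bases at that position are an assumption made for illustration.

```python
BAMHI_SITE = "GGATCC"


def junction(next_insert_base):
    """Sequence formed where a Sau3A end is ligated to a BamHI-cut vector."""
    # Vector G + shared GATC overhang + first insert base after the overhang.
    return "G" + "GATC" + next_insert_base


# Only an insert base of C restores the full BamHI recognition sequence.
regenerating = [base for base in "ACGT" if junction(base) == BAMHI_SITE]
p_one_junction = len(regenerating) / 4
p_both_junctions = p_one_junction ** 2

print(regenerating, p_one_junction, p_both_junctions)
```

On this assumption only about one insert in sixteen would be flanked by BamHI sites at both ends.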

Restriction fragments from a partial digest of pNB1 were cloned back into the BamHI-cut pAT153 and the constructs were screened for those containing the intact gene by selecting for transformed E. coli 236c growing with a valS+ phenotype at 42 °C. A strategy was chosen that would re-create the BamHI sites, allowing for the excision and purification of the cloned B. stearothermophilus DNA by digesting the subcloned plasmid with BamHI and separating the insert and vector restriction fragments by electrophoresis through an agarose gel.

4.3.2. Subcloning Strategy

The strategy behind subcloning is outlined in Fig. 4.4. The method exploits two characteristics of the restriction enzymes AluI and HaeIII. First, the enzymes have recognition sequences of only four base pairs, therefore, on average, the enzymes should encounter the appropriate restriction site every 256 (i.e. 4^4) base pairs. Secondly, AluI (recognition sequence: 5’ AG^CT 3’) and HaeIII (recognition sequence: 5’ GG^CC 3’) generate molecules that have blunt ends. If the restriction digest is carried out at 25 °C for a limited period of time, only a small percentage of the total number of AluI and HaeIII sites in the target DNA will be cut, generating a set of partially-digested fragments from the target DNA and, the earlier the reaction is terminated, the longer the fragments will be. Consequently, if such fragments are cloned into blunt-ended BamHI-cut pAT153 (i.e. the 5’ overhangs had been filled in by T4 DNA polymerase or Klenow polymerase in the presence of all four dNTPs), BamHI sites will be recreated on either side of the insert. The insert DNA may be subsequently purified from the vector by electrophoresis of a BamHI digest of the recombinant plasmid through a low-gelling temperature agarose gel.
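The expected spacing of sites for a four-base recognition sequence, and the number of sites this implies for pNB1, can be worked through as follows. The function name is illustrative, equal frequencies of the four bases are assumed, and the plasmid size of 13.6 kb is taken from Section 4.2.3.

```python
def mean_site_spacing(site_length, p_base=0.25):
    """Average distance in bp between occurrences of a recognition sequence."""
    return (1 / p_base) ** site_length


spacing = mean_site_spacing(4)        # four-base recognition sequence
expected_sites = 13_600 / spacing     # per enzyme, in a 13.6 kb plasmid

print(spacing, round(expected_sites))
```

With some fifty sites expected for each enzyme, a fragment of 4-6 kb can only survive if the great majority of sites within it remain uncut, which is why the digest has to be limited.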

Fig. 4.4. Sub-cloning strategy for pNB1. Plasmid pNB1 (~14 kb), containing the B. stearothermophilus valS gene, was partially digested with the restriction enzymes AluI and HaeIII; 4-6 kb fragments were separated on and subsequently purified from a low-gelling temperature agarose gel (Section 2.5). The plasmid vector pAT153 was cut with BamHI [1], made blunt-ended with dATP, dCTP, dGTP and dTTP in the presence of T4 DNA polymerase [2] and phosphatased with calf intestinal (alkaline) phosphatase (CIP) to prevent self-ligation [3]. The pNB1 fragments were cloned into the blunt-ended vector with T4 DNA ligase and the ligation mixes were used to transform the E. coli valS mutant 236c. Ampicillin-resistant colonies were selected as described in Section 4.2 and plasmid DNA was isolated from them (Section 2.8.2). Recombinant plasmids containing valS were able to confer a valS+ phenotype upon the transformed mutant strain when grown at the non-permissive temperature. Key to vector restriction sites: E and P represent the EcoRI and PstI sites of pAT153 at positions 1 and 2900 respectively on the conventional map (Dr W.H.J. Ward, personal communication). [B] (top left of figure) denotes the former BamHI sites of pNB1 that flank the B. stearothermophilus chromosomal DNA insert. B (bottom of figure) represents the regenerated BamHI site in an ideal sub-clone.

Initially, a time course experiment was conducted to monitor the rate of digestion of pNB1 by AluI and HaeIII. 2 µg of pNB1 was digested in a 25 µl volume at 25 °C with 3 units of each of the two enzymes. Aliquots (5 µl) were removed at 5, 10, 20, 30 and 40 min and separated by electrophoresis on a 1% agarose gel. Phage λ chromosomal DNA cut with AvaI was run alongside as a marker. A 20 min digest at 25 °C was judged to be a suitable length of time to generate a high proportion of fragments of 4-6 kb in length, large enough to carry the valS gene. Next, 5 µg of pNB1 was digested with 7.5 units of each enzyme (in a 75 µl volume) for 20 min at 25 °C. The digestion was terminated by adding 15 µl of loading dye and the fragments were separated on a low-gelling temperature agarose gel (Section 2.5.2), with AvaI-cut phage λ DNA fragments being run alongside as size markers. Those partial fragments of between 4 and 6 kb in size were purified for subcloning according to Section 2.5.3. The DNA fragments were resuspended in TE buffer at a concentration of approximately 30 ng/µl.

The vector pAT153 was restricted with BamHI (Fig. 4.4 [1]) and the 5’ cohesive ends were made flush by T4 DNA polymerase in the presence of dATP, dGTP, dCTP and dTTP [2], as described in Section 2.10. The terminal 5’ phosphate groups were then removed with calf intestinal (alkaline) phosphatase (CIP) [3], in order to reduce religation or concatemerisation of the vector molecules (see Section 2.6) and the DNA was purified and resuspended in TE buffer. Various amounts of the AluI/HaeIII fragments were incubated with 20 ng of vector DNA for 16 hr at 15 °C in the presence of 5 units of T4 DNA ligase (see Section 2.9). The ligation mixes were used to transform competent E. coli 236c to ampicillin resistance.

4.3.3. Selection of valS Subclones by Complementation

Plasmids that contained the entire valS gene were selected by their ability to confer a valS+ phenotype when transformed into competent E. coli 236c and grown at 42 °C on L-plates containing ampicillin. The procedure described previously in Section 4.2 for the cloning and selection of the 14 kb plasmid pNB1 was followed. Colonies that grew at 42 °C were used to seed liquid cultures from which plasmid DNA was purified by the method of Holmes and Quigley (1981). The plasmids were introduced back into 236c and the transformed cells were plated out on selective media at both the permissive and non-permissive temperatures. Plasmids that transformed 236c to equivalence at both temperatures were judged to carry the valS gene. Subsequently, a number of those plasmids were digested with HindIII and the restriction fragments were separated by electrophoresis through a 1% agarose gel (Fig. 4.5a). Two types of plasmid were observed. Those typified by pNB2.3 (lane d) contain three fragments of approximately 4.5 kb, 3.3 kb and 1.2 kb in size. The others, of which pNB2.1 is representative (same figure, lane b), contain the 4.5 kb and 3.3 kb bands only. Transformation of E. coli 236c with both types of plasmid produced equal numbers of AmpR colonies at both selection temperatures. As pNB2.1 therefore represents the minimum information necessary to code for the valS gene, it was chosen for further study. Digestion of pNB2.1 with BamHI produced two bands of 4.7 kb and 3.6 kb (Fig. 4.5b, lane d).

4.3.4. Discussion

The results presented above indicated that pNB2.1 contained two BamHI sites and, as pAT153 is approximately 3.6 kb long, it appeared that the subcloning had been successful and that the BamHI sites flanking the inserted DNA had been re-created. The plasmid was able to confer a valS+ phenotype on transformed E. coli 236c at the non-permissive temperature and produced a protein of apparent Mr = 110,000, comparable with the protein expressed from cells transformed with pNB1 (see Section 5.3).

Fig. 4.5. Restriction analysis of valS subclones. a) Putative subclones were picked and grown in L-broth, plasmid DNA was prepared (Section 2.8.2) and the DNA was digested with HindIII. Molecular weight markers (λ DNA cut with ClaI) were run in parallel in lanes a, g and h; the marker sizes indicated are 11.4/10.5, 6.2, 4.4/4.2, 3.7, 2.6, 1.7-2.0 and 1.1 kb. b) Plasmid pNB2.1 was cut with various restriction enzymes and the products were resolved on an agarose gel. A BamHI digest produces two fragments of 3.6 and 4.7 kb (lane d). The other digests shown are KpnI (b), PstI (c), SalI (e), ClaI (f) and EcoRI (g). The DNA markers (a) were derived from a HaeIII digest of λ DNA; the marker sizes indicated are 23.0, 9.4, 6.6, 4.3, 2.3 and 2.0 kb.

The 4.7 kb BamHI fragment was purified for shotgun DNA cloning and sequencing (Chapter 6). A simple restriction map of pNB2.1 was constructed in parallel with this work. Once the DNA sequence had been determined, a more thorough restriction map could be derived by using appropriate computer programs, such as those contained on ANALYSEQ (Staden, 1984). A suitable fragment carrying valS could subsequently be excised from pNB2.1 and cloned into an M13 vector for protein engineering studies.

4.4. Restriction Analysis of pNB2.1 and pNB1

4.4.1. Introduction

Preliminary analysis of the structure of pNB2.1 by cleavage with restriction enzymes revealed that it was difficult to assign restriction sites using a map of pAT153 (Dr W.H.J. Ward, personal communication) as a guide for the assignment of sites in the vector. For example, the distance from the unique EcoRI site of pAT153 through to the BamHI site in an anticlockwise direction on the standard map is approximately 3.3 kb. Therefore, an EcoRI digest of the plasmid pNB2.1, containing DNA inserted at the BamHI site, should produce one fragment which contains at least 3.3 kb of vector. In contradiction, the largest fragment visible in an EcoRI digest of pNB2.1 appeared to be 2.8 kb in size (Fig. 4.5b, lane g). Consequently, a restriction map was derived without assigning individual fragments to sections of the pAT153 vector as an aid to construction.

4.4.2. Restriction Map of pNB2.1

The positions of restriction sites were determined by comparing the sizes of fragments that had been generated from single digests with those from a double digest with an additional enzyme present. Fig. 4.6 shows the restriction map of pNB2.1, as constructed from a number of digests (a), compared to a map of pAT153 (b). The maps are aligned at a common ClaI/HindIII junction. The data revealed that the original subcloning scheme had been ineffective and that, presumably, pNB2.1 had been derived from pNB1 by deletion of a portion of B. stearothermophilus DNA linked to 1.4 kb of pAT153 through a substantial rearrangement in vivo.

4.4.3. A Restriction Map for pNB1

In the light of these results, a restriction map of pNB1 was constructed. Fig. 4.7 shows that the map is co-linear with pNB2.1 for approximately 6.2 kb.

4.4.4. Discussion

A number of points can be drawn from the restriction maps.

The 3.6 kb BamHI fragment, previously assumed to have been the vector pAT153, has a completely different restriction pattern throughout its length compared to pAT153 and must be assumed to be a 3.6 kb fragment of B. stearothermophilus DNA. Both pAT153 and pNB2.1 share a common region, shown at the extreme left of each figure; a PstI site is located approximately 0.8 kb to the right of a unique ClaI site and a HindIII site in close proximity to each other (Fig. 4.6). In addition, both plasmids have a BamHI site located approximately 0.4 kb from the other side of the ClaI/HindIII sites. This suggests that a 1.2 kb BamHI-PstI fragment of pNB2.1 is derived from pAT153. However, this is the only similarity between the two plasmids. The alignment of the maps in Fig. 4.6 shows that there is an AccI site in a position equivalent to the unique AvaI site of pAT153. This AccI site lies 0.4 kb to the left of a unique NruI site and is present also in pNB1 (see Fig. 4.7). The AvaI site at position 1425 in pAT153 is not present in pNB2.1, thus, this region probably represents the junction between the vector and sequences derived from pNB1. The region to the immediate left of the AccI site in pNB2.1 (and pNB1) has not been well characterised by restriction mapping, so it is impossible to pinpoint the junction between the vector and B. stearothermophilus DNA more precisely. Nevertheless, the portion of vector must contain the β-lactamase gene and, presumably, the plasmid origin of replication. Fig. 4.6b, redrawn from Thompson (1982), shows that the pAT153 origin of replication (ori) lies about 0.3 kb to the left of the AvaI site at position 1424. Backman et al (1978) have shown that a 580 bp fragment from pMB1 (a parent plasmid of pBR322 and pAT153) represents the minimum size for plasmid replication in E. coli. This segment is depicted as rep in Fig. 4.6b.

Fig. 4.6. Restriction maps of pNB2.1 and pAT153. (a) The restriction map of pNB2.1 was constructed from single and double digests with a variety of restriction enzymes. (b) The restriction map of the plasmid vector pAT153 (Twigg and Sherratt, 1980), shown aligned with a common region of pNB2.1. Key to restriction enzymes: Ac-AccI; Av-AvaI; B-BamHI; C-ClaI; H-HindIII; K-KpnI; P-PstI, and S-SalI. AmpR and TcR represent the ampicillin and tetracycline resistance genes respectively and the arrows show the direction of transcription. The region essential for replication and the direction of replication from the origin are depicted by rep and ori-> respectively (redrawn from Thompson, 1982).

Fig. 4.7. Comparison of the restriction maps of pNB1 and pNB2.1. (Reproduced from Brand and Fersht, 1986). Plasmid pNB1, containing 10.2 kb of B. stearothermophilus DNA (thin line) inserted into the BamHI site of pAT153 (thick line), is drawn above its 8.3 kb derivative, pNB2.1. The maps are aligned around a common region of approximately 6.2 kb. Restriction sites are denoted as follows: Ac-AccI; Av-AvaI; B-BamHI; C-ClaI; H-HindIII; K-KpnI; N-NruI; P-PstI; S-SalI, and Sc-SacI. The open box below the maps represents a 3.6 kb region that contains the valS gene and has been isolated on the plasmid pTB8 by subcloning a Sau3A fragment of pNB1 into the BamHI site of pUC9 (Section 4.5).

The remainder of pNB2.1, approximately 6.2 kb in size, is derived from, and co-linear with, pNB1. This size represents the upper limit to those HaeIII/AluI fragments that were gel-purified and, subsequently, ligated in the presence of BamHI-cut pAT153. Presumably, pNB2.1 arose from a complex ligation between several fragments of pNB1 and/or the BamHI-cut vector. The construction of pBR322 itself is enigmatic. The determination of the full DNA sequence for pBR322 revealed that two junctions between different DNA segments did not contain the expected restriction sites at those junctions (Sutcliffe, 1978). Thus, recombination may have occurred in vivo during the plasmid's construction. The selection pressure that was employed to isolate transformed E. coli 236c valS+ colonies, growing at the non-permissive temperature, was such that pNB2.1 possessed the three components necessary for successful transformation: a functional origin of replication, a complete AmpR gene and the valS gene, able to complement the mutation in 236c when grown at 42 °C.

4.5. Subcloning of pNB1 - Selection of pTB8

4.5.1. Introduction

A second sub-cloning strategy was undertaken in conjunction with Dr Thor Borgford and Tammy Gray. Plasmid pNB1 was digested with Sau3A at 25 °C for various periods of time, producing a partial digest at each time point, and fragments longer than 3 kb were purified. Subsequently, these fragments were ligated into BamHI-cut pUC9 (Vieira and Messing, 1982). The ligation mixes were used to transform E. coli 236c and valS+ colonies that grew at 42 °C were selected as before. Plasmid DNA was extracted from such colonies and cut with restriction enzymes. One plasmid (pTB8) contained a 4.4 kb insert and produced a protein of about Mr = 110,000 that was able to charge [14C]valyl adenylate with tRNAVal (results presented in next section). A restriction map of pTB8 confirmed that it contained a region that is also common to pNB1 and pNB2.1 (Fig. 4.8a).

4.5.2. Subcloning of a 3.6 kb PstI Fragment containing valS into M13

A number of other plasmids were constructed by cloning selected restriction fragments of pNB1 into pUC vectors. These constructs were assayed for their ability to complement the temperature-sensitive lesion in 236c and the results are presented in Fig. 4.8. The restriction maps of pNB1 and pTB8 are compared; both complement the mutant, as does pCla5.3, a plasmid containing a 5.3 kb fragment that was derived from limited nuclease Bal31 digestion of the 7.3 kb ClaI fragment of pNB1.

In contrast, the 3.6 kb BamHI fragment of B. stearothermophilus DNA from pNB1, cloned into BamHI-cut pUC8 (pBam3.6), is unable to complement the mutant, implying that sequences to the left of the BamHI site in pTB8 are essential to the integrity of the valS transcription unit. A 4.2 kb SalI fragment of pNB1 (pSal4.2) did not complement either. Taken together, these results point to the gene occupying the middle of the inserts in pTB8 and pCla5.3 as opposed to one end or the other. A number of M13 clones containing the 3.6 kb PstI fragment from pTB8 (of which vTB3[M13], Fig. 4.8b, is representative) showed considerable ValRS activity when the phage were grown in E. coli TG2 and crude cell lysates were assayed for aminoacylation activity (see Section 5.5.2). This result suggested that the valS gene is contained within this 3.6 kb fragment. It was not possible to assay the M13 clones for complementation as E. coli 236c does not contain the F' episome that is required for pilus formation in E. coli (Pratt, 1969).

[Figure 4.8: (a) restriction maps of pNB1 (top) and pTB8 (bottom); (b) subclones pCla5.3, vTB3(M13), pBam3.6 and pSal4.2, scored for complementation of 236c.]

Fig. 4.8. Comparison of the restriction maps of pNB1 and pTB8. a) The restriction map of pNB1 (top) is compared with that of pTB8 (bottom). Plasmid pTB8, constructed by cloning a Sau3A fragment of pNB1 into the BamHI site of pUC9 (Vieira and Messing, 1982), was digested with the following restriction enzymes: Av-AvaI; B-BamHI; K-KpnI; P-PstI; S-SalI; Sc-SacI; Sm-SmaI. The map was co-linear with pNB1 over approximately 4.5 kb. Other restriction sites in pNB1 are described in the legend to Fig. 4.7. b) Restriction fragments of pNB1 were subcloned into appropriately cut pUC8 or pUC9 vectors (pCla5.3, pBam3.6 and pSal4.2) or into M13 (vTB3[M13]). The pUC clones were assayed for complementation in the mutant E. coli strain 236c. The M13 clone was shown to charge valyl adenylate with tRNAVal by aminoacylation assays that were carried out on crude cell lysates of E. coli TG2 harbouring the phage (Section 5.5.2). ND = not determined.

4.5.3. Discussion

The 3.6 kb PstI fragment was cloned in both orientations into M13mp8 (confirmed by dideoxy sequencing). The fragment was not cloned back into pUC9 and assayed for complementation, but crude lysates of E. coli TG2 harbouring M13 valS clones were assayed for their ability to charge [14C] valyl adenylate with tRNAVal from an enriched B. stearothermophilus tRNA pool. Seven clones were obtained, three of which showed a minimal charging activity and probably reflect the background for the assay (see Section 5.5 for a more thorough discussion of these results). One such clone, vTB2, turned out to contain a duplication of M13 sequences as opposed to the PstI fragment, whereas clones vTB1 and vTB8 are orientated in the same direction as pTB8 (as drawn in Fig. 4.8). The other four clones exhibited a higher level of charging and, significantly, were orientated in the opposite direction to the other set of clones. Subsequent work implied that this latter set of clones was transcribed from the M13 lacZ promoter, which is positioned about 0.45 kb upstream of the valS initiation codon. The 3.6 kb fragment was isolated and fragmented by sonication. The fragments were subsequently cloned into M13 vectors and the complete DNA sequence of the fragment was determined by dideoxy sequencing (Chapter 6). The results confirmed that the PstI fragment contains the valS structural gene (Chapter 7).

CHAPTER 5

ANALYSIS OF THE CLONED valS GENE PRODUCT.

5.1. Introduction

The valS gene of B. stearothermophilus has been isolated and subcloned (see Chapter 4), and its DNA sequence has been determined (Chapter 7). The gene, which is about 2.7 kb long, was cloned initially from a B. stearothermophilus gene library on a 10 kb insert in the plasmid vector pAT153 (Twigg and Sherratt, 1980). The plasmid, denoted as pNB1, conferred a valS+ phenotype on the E. coli temperature-sensitive strain 236c when grown at the non-permissive temperature. An 8.3 kb derivative of pNB1 (pNB2.1) was also isolated. It contained approximately 6.2 kb of B. stearothermophilus DNA and 2.1 kb of pAT153 (see Section 5.4). Finally, the gene was sub-cloned into the vector pUC9 (Vieira and Messing, 1982) and pTB8, a plasmid containing a 4.4 kb insert, was isolated. The gene was located tentatively on a 3.6 kb PstI insert of pTB8 in a region that is common to all three constructs (Section 4.5).

The B. stearothermophilus valS gene was shown to have been cloned by a number of criteria (described in the following sections). The gene product has a thermostability profile and aminoacylation kinetics that are comparable to those of native ValRS. The cloned enzyme is over-expressed, relative to the E. coli ValRS, in cells harbouring the plasmid. The cloned protein co-migrates with purified native ValRS on a polyacrylamide gel (under denaturing conditions) with an apparent Mr of 110,000.

5.2. Thermostability of the Cloned B. stearothermophilus ValRS

5.2.1. Introduction

B. stearothermophilus is a thermophilic gram-positive bacillus with an optimal growth temperature of about 56 °C (Donk, 1920). The cellular proteins are highly stable, only 3% of the protein in a sonicated lysate being denatured after incubating at 60 °C for 8 min compared with 55% of E. coli proteins (Koffler and Gale, 1957, cited in Stanier et al, 1976). The stability of the B. stearothermophilus ValRS, cloned on pNB2.1 and expressed in E. coli DH1, was determined at various temperatures. This was compared with data for untransformed DH1, B. stearothermophilus native ValRS and the E. coli valS mutant 236c.

Cultures of E. coli 236c, DH1 and DH1 carrying pNB2.1 (denoted as DH1[pNB2.1]) were grown in L-broth at 37 °C with shaking overnight. B. stearothermophilus NCA 1503 was grown in the same medium, but was cultured overnight at 56 °C with shaking. Crude cell-lysates were prepared from all cultures (Section 3.2) and aliquots were incubated for 30 min at various temperatures between 25 and 70 °C. The lysates were centrifuged to pellet any denatured protein and the supernatant was re-incubated at the appropriate temperature for 5 min. Charging cocktail (Section 3.1.3) was heated to the same temperature over 5 min, then 10 µl of lysate was mixed with 90 µl of cocktail and the reaction was allowed to proceed at the given temperature. Aliquots of the mixture were removed for counting at 1.5 and 3 min, as described in Section 3.1.3.

5.2.2. Results and Discussion

The thermostability profiles for E. coli and B. stearothermophilus ValRS were determined from the level of aminoacylation activity present in crude cell-lysates that had been pre-incubated, then assayed, at a temperature between 25 and 70 °C. Lysates of E. coli DH1 carrying the cloned valS gene on plasmid pNB2.1 and B. stearothermophilus NCA 1503 both exhibit maximal activity at 60-65 °C (Fig. 5.1, filled and open circles respectively). In contrast, untransformed DH1 or the mutant strain 236c show maximum charging at 37 °C, the level of charging being negligible when the lysates had been incubated above 50 °C (same figure, filled and open triangles respectively). The results are consistent with pNB2.1 encoding a thermophilic ValRS. However, these results are difficult to analyse quantitatively for a number of reasons. First, the results are derived from crude cell lysates and the amount of enzyme obtained is dependent upon the efficiency of the various steps of the lysis. Second, although the activity measured at each temperature reflects the amount of active enzyme remaining in the lysate following the pre-incubation, this activity will also be affected by the length of the actual assay. The sample and cocktail were incubated for 5 min, then mixed, and aliquots were taken at 1.5 min and 3 min for counting.

The assay also reflects the inherent activity of ValRS. An increase in temperature will have an effect on the net enzyme activity which will be the sum of various effects (Dixon and Webb, 1964). For example, the initial reaction velocity will increase as the result of a direct effect on the entropies and free energies of the various steps along the reaction pathway, including the stability of the transition state intermediates. The rate of reaction is limited largely by the stability of the enzyme itself, as well as by potential pH changes which affect the enzyme's state of ionization. Thus, E. coli ValRS will be most active at the optimal physiological temperature for the organism (i.e. at 37 °C) but at higher temperatures there will be a sharp decline in the observed activity.

The site of the valS mutation in 236c is unknown, but it is reasonable to assume that the mutation involves an amino acid substitution(s) in the ValRS that affects the enzyme's thermostability. The valS mutant 236c grows slowly in liquid culture at 37 °C, compared with 32 °C (NJB, unpublished observations). When the aminoacylation profiles for E. coli DH1 and 236c in Fig. 5.1 are compared, the activities of both lysates at 30 °C appear to be equivalent, but the activity of the DH1 strain at 37 °C is about three times greater than that for the mutant strain at the same temperature. (Obviously, the activity of E. coli DH1[pNB2.1] relative to that of B. stearothermophilus 1503 cannot be compared in this manner as the copy number of the plasmid-borne valS will be higher than that of the chromosomal equivalent.)

[Figure 5.1: plot of mean charging rate against assay temperature; abscissa, Temperature/°C, 25 to 75.]

Fig. 5.1. Thermostability of ValRS from various sources. Crude cell lysates were incubated and assayed for aminoacylation of [14C] valyl adenylate at various temperatures (Section 5.1.1). Aliquots (25 µl) were removed at 1.5 and 3 min, quenched and counted. The mean counts per minute value for the two time-points was divided by the specific activity for valyl adenylate and plotted as the mean charging rate (pmol min-1) versus the assay temperature for that sample. Key: Filled circles, E. coli DH1[pNB2.1]; open circles, B. stearothermophilus NCA 1503; filled triangles, DH1; open triangles, E. coli valS mutant 236c. (Figure redrawn from Brand and Fersht, 1986).

5.3. Analysis of Cloned ValRS by SDS-Polyacrylamide Gel Electrophoresis

A crude lysate (1 ml) was prepared from a 5 ml overnight culture of cells harbouring a plasmid that complemented the ts lesion of E. coli 236c (see Section 3.2). Fig. 5.2 shows an SDS-PAGE gel of lysates of 236c containing either pNB1, pNB2.1 or pTB8 (lanes b, c and d respectively) compared to 236c transformed with pAT153 (lane e) or untransformed cells (lane f). A standard of ValRS that had been purified from B. stearothermophilus is also shown (lanes a and g). The results show that all three constructs, pNB1 (13.8 kb), pNB2.1 (8.3 kb) and pTB8 (7.1 kb), direct the overproduction of a protein with the same molecular weight as purified native ValRS. The greatest level of overproduction is seen in cells harbouring pTB8, the recombinant that contains the smallest insert. Maniatis et al (1982) have noted that, in general, the smaller the size of a plasmid, the higher the copy number for that plasmid. Effectively, this has a gene dosage effect, generating a higher number of templates for transcription which, ultimately, is reflected in an increased amount of protein produced from a given gene. Whilst being a rather general observation, this is a plausible explanation for the overproduction of the cloned ValRS from the smallest valS construct. Such estimates of overproduction are merely qualitative but serve as an indication of yield of the cloned ValRS from a cell culture (see Section 5.6).

Fig. 5.2. Analysis of crude cell-lysates of E. coli 236c carrying the cloned B. stearothermophilus valS gene. SDS-PAGE (modified from Laemmli, 1970) of crude cell-lysates of E. coli 236c was carried out as described in Section 4.1.4. Lanes a and g, 2 µg of native ValRS, prepared according to Mulvey and Fersht, 1977a; lane b, 236c[pNB1]; lane c, 236c[pNB2.1]; lane d, 236c[pTB8]; lane e, 236c[pAT153]; lane f, untransformed 236c.

5.4. Kinetic Measurements for Purified B. stearothermophilus ValRS

5.4.1. Introduction and Methods

The steady-state kinetics of purified cloned ValRS were measured and compared with those of native ValRS, purified from a culture of B. stearothermophilus. The Michaelis constant (KM) for both ATP and valine and kcat, the first-order rate constant, were derived from aminoacylation assays (Section 3.1.3). Native ValRS was purified by R.S. Mulvey according to Mulvey and Fersht (1977a) and stored at -20 °C in 50% glycerol. Cloned ValRS was purified from a culture of E. coli 236c carrying pNB1 and stored under liquid nitrogen (Section 4.3).

The values of KM for ATP and valine, and kcat, had been measured for the B. stearothermophilus ValRS (Mulvey and Fersht, unpublished data). Accordingly, when the KM for one substrate was being measured (the substrate concentrations ranging from roughly one fifth of, to five times above, the known KM), the concentration of the other substrate was kept above its measured KM value.

5.4.2. Results and Discussion

The Michaelis constant for ATP, KM (ATP), was measured by varying the concentration of ATP between 25 µM and 5 mM. Aliquots (25 µl) were removed at three time points, counted and the data were handled as described in Section 3.1.3. Fig. 5.3 shows a plot of the reaction velocity, V, against the velocity divided by the substrate concentration. The data were analysed by non-linear regression weighting, using a program written by Dr. R. Leatherbarrow and run on an IBM-XT computer. The plot is biphasic, giving two values for KM as noted previously (Mulvey and Fersht, unpublished). Both values are given in Table 5.1. The lower value for KM was derived by fitting the line to those points representing the lower substrate concentrations and then extrapolating back to infinite substrate concentration (i.e. to the y-axis). This intercept is Vmax. The slope of the graph gave a value for KM of 102 µM.

The value of KM for valine was measured in a similar way, the concentration of valine being varied between 5 and 88 µM. Non-linear regression weighting was used to fit the line to the points. KM (val) was deduced to be 23 µM from an Eadie-Hofstee plot (Fig. 5.4). Both of these KM values are in good agreement with those measured by Mulvey and Fersht (Table 5.1).
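The way in which KM and Vmax are read from an Eadie-Hofstee plot can be illustrated with a short calculation. The sketch below is not the program used in this work: it applies an unweighted linear least-squares fit to the relation V = Vmax - KM x (V/[S]), whereas the data in this chapter were analysed by weighted non-linear regression. The substrate concentrations and velocities are synthetic, noise-free values generated from the Michaelis-Menten equation with an assumed KM of 23 µM, and the function name is hypothetical.

```python
def eadie_hofstee_fit(substrate, velocity):
    """Fit V against V/[S] by unweighted least squares; return (KM, Vmax).

    On an Eadie-Hofstee plot the slope is -KM and the intercept with the
    ordinate (V/[S] = 0, i.e. infinite substrate) is Vmax.
    """
    x = [v / s for s, v in zip(substrate, velocity)]  # V/[S]
    y = list(velocity)                                # V
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept


# Synthetic data (assumption): Michaelis-Menten velocities for KM = 23 µM
# and Vmax = 1.0 in arbitrary units, over a 5-88 µM valine range.
KM_TRUE = 23.0
VMAX_TRUE = 1.0
valine = [5.0, 10.0, 20.0, 40.0, 88.0]
rates = [VMAX_TRUE * s / (KM_TRUE + s) for s in valine]

km, vmax = eadie_hofstee_fit(valine, rates)
print(round(km, 3), round(vmax, 3))
```

With real, noisy data the unweighted fit distorts the error structure because V appears on both axes, which is one reason for preferring a weighted non-linear fit of the untransformed data.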

[Figure 5.3: Eadie-Hofstee plot; abscissa, V/[S] x 10-6.]

Fig. 5.3. Determination of KM (ATP). Eadie-Hofstee plot of the reaction velocity for aminoacylation, V (µM s-1), against the velocity divided by the concentration of ATP, V/[S] (Eadie, 1942; Hofstee, 1959). The plot exhibits biphasic kinetics. The gradient of the plot, KM, was determined by extrapolating back to the ordinate from data points corresponding to low concentrations of ATP.

[Figure 5.4: Eadie-Hofstee plot; abscissa, V/[S] x 10-4 (mM s-1 µM-1).]

Fig. 5.4. Determination of KM (val) and kcat. Eadie-Hofstee plot for aminoacylation (see preceding figure legend). KM (val) is obtained from the slope of the plot. The turnover number for the enzyme, kcat, is derived by dividing the intercept of the plot with the ordinate (Vmax, the reaction velocity at infinite substrate concentration) by the concentration of ValRS, expressed as the number of moles of active sites.

The intercept with the ordinate on the KM (val) plot gives a value for kcat of 5.65 s-1 at 2 mM ATP (Table 5.1). The value of kcat is twice that obtained by Mulvey and Fersht. This discrepancy may reflect the difference between the two purification procedures employed. The cloned ValRS was prepared relatively easily, all steps taking less than a week, and the enzyme was purified by two rounds of ion-exchange chromatography. The purification of native B. stearothermophilus ValRS was lengthy and laborious in comparison. Purification involved at least four chromatographic steps: DEAE-Sephadex and phosphocellulose ion-exchange chromatographies, separated by a hydroxylapatite column, and gel filtration through Sephadex G-150 (Mulvey and Fersht, 1977a). This protracted method, combined with the lengthy storage of the enzyme in 50% glycerol at -20 °C, may have affected its stability and/or activity. The activation and charging steps are distinct and structural evidence implies that, for a number of synthetases, the functions are associated with distinct structural domains (Jasin et al, 1983; Waye et al, 1983). Active-site titration (Yarus and Berg, 1970) is used to quantitate the amount of active enzyme from the stoichiometric formation of valyl adenylate (i.e. this measures the productivity of the activation step only). The value of kcat for aminoacylation was determined by dividing Vmax by the concentration of active enzyme (determined by active-site titration). It is not inconceivable that the concentration of active sites used by Mulvey and Fersht is for an enzyme preparation that catalysed activation efficiently, but was functionally defective for the charging step, possibly due to a structural perturbation caused by purification or storage in 50% glycerol. Such storage conditions have been shown to affect the "half-of-the-sites" reactivity of the dimeric TyrRS from B. stearothermophilus (Jones et al, 1985). The enzyme normally forms one mol of tyrosyl adenylate per dimer in a few seconds, but if the enzyme has been stored at -20 °C in glycerol, the stoichiometry of tyrosyl adenylate formation changes such that a second mol of tyrosyl adenylate forms per dimer over a period of minutes. In contrast, the cloned enzyme was prepared rapidly and stored under liquid nitrogen and should be less susceptible to partial degradation. Ideally, the value of kcat for activation should be measured by pyrophosphate exchange and compared with earlier results (Mulvey and Fersht, unpublished).

Table 5.1

Comparison of Aminoacylation Kinetics for ValRS Isolated from B. stearothermophilus and E. coli a

Enzyme                                          kcat (s-1)    KM (val) (µM)    KM (ATP) (µM)

Purified ValRS from B. stearothermophilus b     2.9           24               102
ValRS isolated from E. coli 236c[pNB1]          5.65          23               102, 380 c

a All experiments were conducted at 25 °C in 144 mM Tris-HCl (pH 7.78) containing 10 mM MgCl2, 14 mM 2-mercaptoethanol and 0.1 mM PMSF. Conditions for derivation of KM (val) and KM (ATP) are given in Section 4.1.3. ValRS from B. stearothermophilus was purified according to Mulvey and Fersht (1977a). The cloned ValRS from E. coli 236c containing plasmid pNB1 was purified as described in Section 4.2.
b From unpublished data, R.S. Mulvey and A.R.F.
c The plot for ATP is biphasic. Two lines were drawn, from which separate values (upper and lower) for KM were calculated.

Significantly, the apparent KM values agree with those measured by Mulvey and Fersht and the value of kcat (aminoacylation) that was obtained for the cloned enzyme is higher than the previous measurement. The KM values, which are independent of the degree of enzyme activity, are a more reliable guide to the identity of the enzyme.
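The suggestion that a partially defective preparation would lower the apparent kcat can be put in numerical terms. The sketch below assumes a simple model that is not tested in this work: active-site titration counts every site that can form valyl adenylate, but only a fraction of those sites is competent for the charging step, so that the apparent kcat equals the true kcat multiplied by that fraction. The kcat values are taken from Table 5.1; the enzyme concentration and the function name are hypothetical.

```python
def kcat_from_vmax(vmax_uM_per_s, active_sites_uM):
    """kcat (s-1) = Vmax divided by the titrated active-site concentration."""
    return vmax_uM_per_s / active_sites_uM


KCAT_CLONED = 5.65  # s-1, cloned ValRS (Table 5.1)
KCAT_NATIVE = 2.9   # s-1, native ValRS (Table 5.1)

# Fraction of titrated sites that would have to be charging-competent for
# the native preparation to show its lower apparent kcat, if the true kcat
# were that of the cloned enzyme (assumed model).
fraction_needed = KCAT_NATIVE / KCAT_CLONED

# Hypothetical assay: 0.010 µM titrated sites, of which only
# `fraction_needed` contribute to the observed Vmax.
titrated_sites = 0.010
vmax = KCAT_CLONED * titrated_sites * fraction_needed

print(round(fraction_needed, 2))
print(round(kcat_from_vmax(vmax, titrated_sites), 2))
```

Under this assumption roughly half of the titrated sites in the native preparation would need to be inactive in charging, which is the order of effect that a change in half-of-the-sites behaviour might produce.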

5.5. Expression of valS Cloned into M13

5.5.1. Introduction

The results presented in Section 4.5 implied that the valS structural gene was contained on the 3.6 kb PstI fragment of pTB8. Consequently, this fragment was cloned into PstI-cut M13mp8. Dideoxy DNA sequencing confirmed that the fragment had been cloned in both orientations with respect to the M13 lacZ promoter.

5.5.2. Preliminary Aminoacylation Results

E. coli 236c does not carry an F' episome, so it was not possible to infect the strain with M13 recombinant phage and determine whether the valS gene had been cloned or not by complementation of the ts lesion. The M13 valS clones were propagated as plaques grown on an infected lawn of E. coli TG2 cells. The plaques were picked into individual 1.5 ml cultures of TG2 and grown at 37 °C for 6 h with shaking. Crude cell-lysates were prepared by standard methods (Section 3.2). The lysates were assayed by SDS-PAGE; none of them appeared to overproduce a protein of Mr = 110,000 when compared with a lysate of TG2 carrying the plasmid pTB8 (NJB, results not shown). The lysates were assayed next by aminoacylation and these results are presented in Table 5.2. The M13 clones may be divided for convenience into three groups on the basis of their respective charging activities: those that have negligible or no activity, such as vTB2 and 6, those that show a significant rate of charging above background (clones vTB3, 4, 7 and 9) and those with intermediate activity (vTB1, 5 and 8). Significantly, this division can be related to the orientation of the PstI fragment with respect to the M13 lacZ promoter (next section).
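The conversion of scintillation counts into the aminoacylation rates of Table 5.2 (slope of counts per minute against time, divided by the specific activity) can be sketched as follows. The time points, counts and specific activity used here are hypothetical values chosen only to illustrate the arithmetic, and the function names are not taken from any program used in this work.

```python
def slope(times, counts):
    """Least-squares slope of counts (cpm) against time (min)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def charging_rate_pmol_per_min(times_min, counts_cpm, cpm_per_pmol):
    """Rate in cpm min-1 divided by the specific activity (cpm pmol-1)."""
    return slope(times_min, counts_cpm) / cpm_per_pmol


# Hypothetical assay: three time points and a specific activity of
# 500 cpm per pmol of valine.
times = [1.0, 2.0, 3.0]
cpm = [600.0, 1100.0, 1600.0]
SPECIFIC_ACTIVITY = 500.0

rate = charging_rate_pmol_per_min(times, cpm, SPECIFIC_ACTIVITY)
print(rate)
```

Using the slope rather than a single time point means that a constant background count does not contribute to the calculated rate.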

5.5.3. Orientation of M13 valS Clones

The DNA sequence of valS is presented in Chapter 7 and a full explanation of the methods used is given in Chapter 6. The terminal sequences of the inserts of a number of the M13 valS clones were determined using a "universal" primer that is complementary to a vector sequence adjacent to the insert DNA (Section 6.2.1). The M13 clones could be split into two groups, depending upon which way the PstI fragment had been cloned into the vector. The sequences were assigned to one end or other of the PstI fragment by comparison with the contiguous sequence for that fragment which had been derived by shotgun sequencing (Chapter 7.2). Sequencing confirmed that the DNA inserts of clones vTB1 and 8 were orientated in the same direction as the insert in pTB8 (compare b with a, Fig. 5.5), i.e. the beginning of the structural gene is positioned away from the M13 lacZ promoter. These clones are denoted as Class I clones in the figure. Clone vTB2 was found to be a deletant of wild-type M13 and did not contain an insert. This serves as a valid background control for the aminoacylation results. Clones vTB3, 4, 7 and 9 (comprising Class II clones, Fig. 5.5c) are orientated in the opposite direction, with the N-terminal end of the structural gene positioned within 450 bp of the start of the lacZ structural gene. Clones vTB5 and 6 were not sequenced.

Table 5.2

Rates of Aminoacylation for M13 valS Clones

Clone     Rate of Aminoacylation (pmol min-1) a

vTB1      0.37
vTB2      0.03
vTB3      1.71
vTB4      1.07
vTB5      0.81
vTB6      0.11
vTB7      1.11
vTB8      0.55
vTB9      1.92

a Aminoacylation (charging) assays were carried out as described in Section 4.1.3. The rate of aminoacylation of tRNAVal for each clone was determined from the slope of a plot of counts per minute against time (min). The specific activity of the reaction was determined and the aminoacylation rate, in cpm min-1, was divided by the specific activity to give the no. of pmol of valyl adenylate that were charged with tRNAVal per minute in a 25 µl reaction volume.

[Figure 5.5: schematic maps of (a) pTB8, (b) a class I and (c) a class II M13 valS clone, showing the P and X restriction sites, the universal primer site (U) and the directions of valS and lacZ transcription.]

Fig. 5.5. Comparison of the orientation of the valS gene in pTB8 with that in M13. (a) valS (2.7 kb) is located on a 3.6 kb PstI fragment of pTB8 (Section 4.5). This fragment was subsequently subcloned into PstI-cut M13mp8 in either the same orientation as pTB8 (b) or in the reverse orientation (c). Key: Restriction enzyme sites are denoted by P (PstI) or X (XmaI). valS--> and lacZ--> indicate the directions of transcription of valS and the N-terminal portion of the M13 lacZ gene respectively. The filled lines represent vector sequences, the thin lines B. stearothermophilus DNA and the open box represents the valS coding region. U--> represents the priming site for a universal sequencing primer and the direction of 5' -> 3' synthesis by Klenow polymerase during chain-termination sequencing (Section 6.2.1).

5.5.4. The class II valS Genes are Induced by IPTG

The expression of ValRS from the Class II clones implied that the E. coli lacZ promoter was driving transcription through the valS structural gene. To test this hypothesis, 3 ml cultures of E. coli TG2, each harbouring one of the M13 valS clones, were grown to A550 = 0.2. The cultures were split in two, one half being treated with IPTG (3 mM final concentration), and both cultures were grown for 6 h at 37 °C. If the valS gene was being transcribed from the lacZ promoter (possibly as a fusion protein), it should be possible to increase the level of transcription by inducing the lacZ promoter with IPTG. Crude lysates were prepared and assayed by SDS-PAGE. The results indicate clearly that Class I clones do not show a significant increase in the level of 110,000 Da proteins when grown in the presence of IPTG (Fig. 5.6, gel I). Neither does vTB2, the clone that does not contain the valS gene. In contrast, Class II clones contain increased levels of a protein that co-migrates with the ValRS standard when the cultures are induced with IPTG (compare lanes p and q for vTB7, Fig. 5.6II).

Proteins of Mr = 110,000 and 124,000 should be resolved on this gel system, so the results suggest that the ValRS is not expressed as a fusion protein (the valS initiation codon is located approximately 140 codons from the lacZ initiation codon, sufficient to add about 14 kDa to the size of ValRS if it was synthesised as part of a fusion protein).
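The estimate of the size of a putative fusion protein follows from simple arithmetic, sketched below. The average residue mass of 100 Da is an assumption, chosen because it is the value implied by the figures quoted above (140 codons adding about 14 kDa); a commonly used average of about 110 Da per residue would give a slightly larger extension. The function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 100.0  # assumption implied by "140 codons ~ 14 kDa"
VALRS_APPARENT_MR = 110_000      # apparent Mr of ValRS on SDS-PAGE


def fusion_extension_mass_da(n_codons, residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Approximate mass added by n_codons of N-terminal extension."""
    return n_codons * residue_mass


extension = fusion_extension_mass_da(140)
fusion_mr = VALRS_APPARENT_MR + extension
percent_shift = 100.0 * extension / VALRS_APPARENT_MR

print(extension, fusion_mr, round(percent_shift, 1))
```

A difference of this size, somewhat over one tenth of the mass of ValRS, is the shift that the gel system would need to resolve.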

Unfortunately, two of the induced Class I cultures, vTB1+ and vTB8+ (Fig. 5.6I, lanes e and i respectively) are under-loaded with respect to their respective uninduced lysates (same figure, lanes d and h). However, it appears unlikely that the induced clones are producing more ValRS. To confirm these observations, charging assays were carried out on a variety of lysates. A selection of the crude lysates that had been assayed by SDS-polyacrylamide gel electrophoresis was used. Uninfected E. coli TG2, Class I clone vTB1 and Class II clones vTB3, 7 and 9, from cultures grown with or without IPTG, were tested. Lysates prepared from IPTG-treated or untreated cultures of TG2 carrying pTB8, the best overproducing valS plasmid, were also assayed. The results are presented in Table 5.3.

The lysate from vTB1-infected cells has a charging activity (measured as the number of pmol of [14C] valyl adenylate that are charged with tRNAVal per second in a 25 µl volume) that is comparable with that of a lysate of uninfected TG2 and, apparently, is uninduced by IPTG. The aminoacylation rates of TG2 or vTB1 lysates, given in Table 5.3 as 0.039-0.043 pmol s-1, correspond to roughly 3,000 cpm, whereas the background activity was roughly 100-200 cpm. These results were reproducible for a number of assays. If the vTB1 or TG2 lysates were incubated at 56 °C for 30 min, the level of activity in each lysate was reduced to an equivalent level that is close to background for the assay (for example, vTB1 (56 °C) ± IPTG, Table 5.3). These results suggest that the activity in the TG2[vTB1] lysate is due solely to the E. coli ValRS.

[Figure 5.6: SDS-PAGE gels I and II; alternate lanes contain lysates of cultures grown with (+) or without (-) IPTG.]

Fig. 5.6. Analysis of induction of M13 valS clones by IPTG by SDS-PAGE. Crude lysates of E. coli TG2 were prepared, as described in Section 3.2, from cultures that had been grown in the presence or absence of IPTG. (I) PAGE gel of uninfected TG2 (lanes b and c), class I M13 clones vTB1 (d and e) and vTB8 (h and i). Clone vTB2 (f and g) does not contain an insert. (II) Uninfected TG2 (lanes l and m) are run alongside class II M13 clones vTB3 (n and o), vTB7 (p and q) and vTB9 (r and s). A molecular weight marker of purified native ValRS (Mulvey and Fersht, 1977a) was run in parallel (lanes a and j, gel I; and lanes k and t, gel II).

Table 5.3

[...] or Non-induced M13 valS Clones

M13 clone        Induction by IPTG a    Aminoacylation [...] b (pmol [...])

TG2              +                      0.043
TG2 (56 °C)      +                      0.005
vTB1             +                      0.039
                 -                      0.043
vTB1 (56 °C)     +                      0.005
vTB1 (56 °C)     -                      0.005
vTB3             +                      0.6
                 -                      0.22
vTB7             +                      1.32
                 -                      0.25
vTB9             +                      0.8
                 -                      0.29
pTB8             +                      = 1.1
                 -                      = 1.29

a Cell cultures were induced with 3 mM IPTG at A550 = 0.2 and grown to saturation overnight.
b Aminoacylation assays were carried out as described in Section 3.1.3. The figures refer to the no. of pmol of [14C] valyl adenylate that are charged with tRNAVal per minute at 25 °C in a 25 µl volume.

The aminoacylation activities of lysates from uninduced class II cultures are about 5-7 times higher than the activity of the uninfected TG2 lysate and there is a substantial increase in the charging level where IPTG has been added to the cultures. After subtracting the contribution that the E. coli ValRS makes to the overall charging rate (about 0.043 pmol s-1), IPTG induces production of the cloned ValRS by a further 200% for vTB3 and vTB9 and by 510% for the vTB7 culture. The greater charging activity of the induced vTB7 lysate (vTB7+ in Table 5.3) correlates with the intensity of the 110,000 Da band that is seen on the SDS-PAGE gel (Fig. 5.6II, lane q). The B. stearothermophilus ValRS predominates in such lysates, as judged by the high level of charging activity that remains after a 30 min incubation at 56 °C (results not shown).
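The induction figures quoted above can be reproduced from the rates in Table 5.3. The sketch below assumes that the percentage increase is calculated as the background-corrected induced rate divided by the background-corrected uninduced rate, minus one; the exact calculation is not spelled out in the text, so this is an interpretation. The variable and function names are hypothetical.

```python
E_COLI_BACKGROUND = 0.043  # charging rate of the uninfected TG2 lysate (Table 5.3)

# (induced, uninduced) charging rates for the class II clones (Table 5.3)
CLASS_II_RATES = {
    "vTB3": (0.6, 0.22),
    "vTB7": (1.32, 0.25),
    "vTB9": (0.8, 0.29),
}


def percent_increase(induced, uninduced, background=E_COLI_BACKGROUND):
    """Percentage increase in cloned ValRS activity after IPTG induction,
    with the host E. coli contribution subtracted from both rates."""
    cloned_induced = induced - background
    cloned_uninduced = uninduced - background
    return 100.0 * (cloned_induced / cloned_uninduced - 1.0)


increases = {clone: percent_increase(*rates)
             for clone, rates in CLASS_II_RATES.items()}
for clone, pct in increases.items():
    print(clone, round(pct))
```

On this reading the increases come to roughly 215% (vTB3), 517% (vTB7) and 206% (vTB9), in line with the approximate values of 200% and 510% given in the text.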

The activity of the cloned ValRS, expressed in TG2 from the overproducing plasmid pTB8, is at least as high as that from the IPTG-induced culture of TG2[vTB7]. The activity appears unaffected by adding IPTG to the cell culture.

5.5.5. Discussion

The results that have been presented in the previous sections imply that the Class I M13 valS clones are weakly expressed or, most likely, not expressed at all in E. coli TG2. DNA sequencing has shown that the coding region of the gene in such clones is orientated towards the M13 lacZ promoter region, as is the valS gene in the pUC9 derivative pTB8 (Fig. 5.5). This plasmid, however, overproduces the B. stearothermophilus ValRS. A plausible explanation for this difference in expression is that B. stearothermophilus valS transcription is initiated from a region located on the 0.8 kb SmaI-PstI fragment in pTB8 that lies upstream of the 3.6 kb PstI fragment that contains the structural gene (Fig. 5.5a). None of these constructs is affected by adding IPTG to the cell culture, which induces transcription from the lacZ promoter present on recombinant M13. This was judged by SDS-PAGE and by measuring the aminoacylation activities of crude cell-lysates. Class II M13 valS clones, where the structural gene is orientated in the same direction as the promoter and beginning of the lacZ structural gene (Fig. 5.5c), possess charging activities that are above the basal level of E. coli ValRS and increase when cells harbouring those phages are grown in the presence of IPTG (Table 5.3). The inference here is that the cloned valS gene is being transcribed from the lacZ promoter. The DNA sequence for the start of the lacZ structural gene and the B. stearothermophilus valS upstream sequences of a Class II construct is shown in Fig. 5.7.

Translation of a transcript that had been initiated at the lacZ promoter could produce a 54-residue fusion protein, assuming that the ribosome recognises the lacZ ribosome binding site and that translation terminates at a stop codon that is situated 130 bp downstream of the PstI site.
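The length of the predicted peptide and the position of the stop codon can be checked by translating the sequence of Fig. 5.7 directly. The sketch below uses the standard genetic code and reads the sequence in the frame printed in the figure, starting from its first nucleotide; the sequence is copied from the figure as reproduced here and the function and variable names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# DNA sequence of the class II clone as printed in Fig. 5.7 (lines joined).
FIG_5_7_DNA = (
    "ACCATGATTACGAATTCCCGGGGATCCGTCGACCTGCAGAGAGGGAAATCAGAAGCTGCGAGA"
    "TTTCCTAGGATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGGCGTGCTGC"
    "CGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCTTAAGCTTCTAGAAGGCCG"
    "GTCAACTAAACTGATTGTGGGCTC"
)


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(FIG_5_7_DNA)
peptide = protein.split("*")[0]  # residues before the first stop codon
stop_codons = [i + 1 for i, aa in enumerate(protein) if aa == "*"]

# Distance from the start of the PstI site (CTGCAG) to the first stop codon.
pst_site = FIG_5_7_DNA.find("CTGCAG")
first_stop_offset = 3 * (stop_codons[0] - 1) - pst_site

print(len(peptide), stop_codons, first_stop_offset)
```

Read in this frame, the sequence gives 54 residues before the first of three in-frame stop codons, and that stop codon begins 129 bp beyond the start of the PstI site.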

Vectors containing the E. coli lacZ promoter are often used to drive the transcription of cloned genes. For example, Ptashne and co-workers developed a plasmid system by which prokaryotic or eukaryotic genes could be cloned downstream of, and expressed from, the lacZ promoter and ribosome binding site (Backman and Ptashne, 1978; Guarente et al, 1980). The results that have been presented in this section imply that the valS gene is transcribed from the lacZ promoter when cloned into M13mp8 in the correct orientation (Class II clones). The addition of IPTG, an inducer that derepresses the lac operon, to a culture of E. coli TG2 harbouring such an M13 recombinant produces an increase in the amount of B. stearothermophilus ValRS. Presumably, this increase (2-5 times the level of an uninduced culture) reflects a similar rise in the abundance of lacZ transcripts. An inspection of the sequences upstream of the valS gene in a Class II M13 clone (Fig. 5.7) shows that there are three termination codons, marked by asterisks, in frame with the beginning of the lacZ α-peptide. Translation of such a transcript from the lacZ initiation codon should yield a 54-residue fusion peptide, incorporating the first 13 residues of the lacZ α-peptide. There are no palindromic sequences that could form potential termination signals in this region. However, there is a potential rho-independent termination signal (ΔG = -15.6 kcal at 25 °C) situated about 140 bp downstream of the first termination codon (see Section 7.2.2). It is unlikely that this forms an efficient termination signal (possibly due to the distance from the termination codons) as the results, both from aminoacylation assays and SDS-PAGE, clearly indicate that transcription of valS is inducible by IPTG, so a number of full-length transcripts must be synthesised. Confirmation of these observations, if time had permitted, could have been obtained by Northern blot analysis of mRNA from IPTG-induced or non-induced cultures of TG2 harbouring Class I or II M13 clones or pTB8.

--->
ACCATGATTACGAATTCCCGGGGATCCGTCGACCTGCAGAGAGGGAAATCAGAAGCTGCGAGA
T  M  I  T  N  S  R  G  S  V  D  L  Q  R  G  K  S  E  A  A  R

TTTCCTAGGATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGGCGTGCTGC
F  P  R  M  T  A  A  G  R  S  P  L  S  C  F  F  E  R  R  C  C

CGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCTTAAGCTTCTAGAAGGCCG
R  V  E  E  A  G  I  M  P  L  C  Q  *  S  A  *  A  S  R  R  P

GTCAACTAAACTGATTGTGGGCTC
V  N  *  T  D  C  G  L

Fig. 5.7. DNA sequence of a Class II M13 clone. The figure depicts the DNA sequence of a Class II valS clone and shows the predicted protein sequence of a 54-residue fusion peptide, formed from the 13 N-terminal residues of β-galactosidase and the upstream non-coding region of valS. The peptide would be encoded by a transcript originating at the M13 lacZ promoter. ---> denotes the direction of transcription. The termination codons for translation are denoted by asterisks. The standard one-letter code for amino acids is used (see legend to Fig. 1.5).
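The length of the predicted fusion peptide can be checked with a short script. The following is a minimal sketch only: the DNA is transcribed from Fig. 5.7 as printed, the codon table is the standard genetic code, and the names `translate`, `FIG_5_7_DNA` and `fusion_peptide` are hypothetical. Translation starts at the first codon shown in the figure (ACC) and the peptide is taken to end at the first in-frame termination codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# TCAG order (first base varies slowest, third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# DNA sequence of Fig. 5.7, transcribed line by line from the figure.
FIG_5_7_DNA = (
    "ACCATGATTACGAATTCCCGGGGATCCGTCGACCTGCAGAGAGGGAAATCAGAAGCTGCGAGA"
    "TTTCCTAGGATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGGCGTGCTGC"
    "CGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCTTAAGCTTCTAGAAGGCCG"
    "GTCAACTAAACTGATTGTGGGCTC"
)


def translate(dna):
    """Translate complete codons in frame 1; '*' marks a termination codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(FIG_5_7_DNA)
fusion_peptide = protein.split("*")[0]  # residues before the first stop codon

print(fusion_peptide)
print("residues before first stop:", len(fusion_peptide))
print("in-frame termination codons:", protein.count("*"))
```

Under these assumptions the script counts the residues preceding the first termination codon and the number of in-frame termination codons in the region shown. Note that the codon GCG near the end of the second line translates as A under the standard code, whereas the figure as transcribed shows R at that position; this does not affect the residue count.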

A lysate of E. coli TG2 carrying pTB8 appeared to have the same level of aminoacylation activity as the induced lysate of TG2[vTB7] (Table 5.3). In comparison, a lysate of DH5 carrying the same plasmid but analysed by SDS-PAGE contained a very intense ValRS band (Fig. 5.2, lane d). As both lysates were prepared by the same method (Section 3.2), it is likely that the overproduction of ValRS from pTB8 valS exceeds that of the IPTG-induced TG2[vTB7] clone.

5.6. Conclusions

The ValRS from B. stearothermophilus has been expressed from a series of plasmids, presumably from its own promoter, and can be over-expressed from the lacZ promoter of M13 by inducing transcription with IPTG. Aminoacylation assays, conducted on crude lysates prepared from cell cultures that had been allowed to grow into stationary phase, indicated that the cloned ValRS is overproduced with respect to the E. coli ValRS by more than 30-fold when expressed in E. coli TG2 (e.g. for the IPTG-induced M13 clone vTB7 or for the plasmid pTB8; Table 5.3). SDS-polyacrylamide gel electrophoresis of crude cell-lysates of E. coli DH1 containing one of the three valS plasmids suggested that the level of gene transcription was regulated by a gene dosage effect, the smallest construct, pTB8, producing the most protein (Fig. 5.2). The intensity of the ValRS band in the pTB8 lysate (same figure, lane d) suggests that the level of overproduction in a DH1 culture may be even higher.

Small plasmids tend to replicate to a higher copy number than larger ones (Maniatis et al, 1982), but it is questionable whether a plasmid-borne valS gene will be present in more copies than if the gene were carried on an M13 vector. The life cycle of M13 is such that 100-200 copies of the double-stranded replicative form of the genome accumulate in the infected cell prior to the cessation of replication (Messing, 1983). Overproduction of a cloned protein is an advantage when studying protein structure and function. For example, both X-ray crystallography and protein engineering require that substantial amounts of the cloned protein be purified for subsequent experiments. An overproducing M13 valS clone would be ideal, combining a single-stranded template for site-directed mutagenesis with a system that produces abundant amounts of the mutant protein. For example, B. stearothermophilus TyrRS is overproduced from an M13 clone to account for up to 50% of the total cell protein without any adverse consequences for the host cell (Winter et al, 1982). A major drawback to working with M13 recombinants is their instability. M13 phages containing large inserts are generally reported to be unstable (Dente et al, 1983), even though there appear to be no packaging constraints associated with the size of the genome: a larger genome is simply packaged in a longer viral capsid (Herrmann et al, 1980; Messing, 1983). In practice, it appears that inserts exceeding 5 kb in size are unstable (Thompson, 1982). Cordell et al (1979) were unable to clone the rat insulin gene on 5.4 kb or 9.4 kb inserts, though a 1.7 kb fragment common to the larger fragments was stable when cloned into M13mp5. This may be an inherent limitation of working with filamentous phage vectors such as M13, as the closely related phage fd also exhibits instability towards large inserts (Herrmann et al, 1980). Instability may manifest itself as a deletion of part or the whole of an insert. We have detected apparently complete loss of insert DNA when deleting parts of the valS gene (cloned into M13 vectors) using the exonuclease III/S1 method of Henikoff (1984). Dideoxy sequencing (T-reaction only) of digested and religated clones revealed that more than 60% of recombinant phage had lost both the insert and M13 sequences distal to the universal priming site (NJB, unpublished observations). This problem occurred even if the constructs were transfected into a recA⁻ host such as TG2. Exo III/S1 deletion mutagenesis, carried out on pUC clones, is not prone to spontaneous deletions (Dr. Jonathon Knowles, personal communication).

The E. coli valS gene has been isolated and subcloned so that the gene product may be overproduced eight-fold over the level of the native enzyme (Skogman and Nilsson, 1984). The B. stearothermophilus ValRS, by similar criteria, is overproduced by at least 30-fold though, strictly, the two cannot be compared as differential codon usage is likely to have an effect upon the expression of Bacillus genes in E. coli. The expression of the cloned E. coli ValRS appeared to be dependent upon the size of the plasmid carrying the gene, similar to the expression of the B. stearothermophilus ValRS. The E. coli valS gene was originally isolated on a 28 kb plasmid and produced 2.5 times as much ValRS as a standard E. coli strain. A subclone carrying a 9.4 kb fragment overproduced ValRS approximately eight-fold and a second subclone with a 4.2 kb insert expressed ValRS to seven-fold over the standard. Skogman and Nilsson speculated that the different levels of overproduction for certain cloned E. coli aminoacyl-tRNA synthetases, varying between 7 and 100-fold, reflect more than one level of gene control (Skogman and Nilsson, 1984). Neidhardt et al (1977) noted that the aminoacyl-tRNA synthetases from E. coli could be regulated in more than one way (Section 1.5.4). A number of synthetases appear to regulate their own expression, either at the level of transcription or translation (Putney and Schimmel, 1981; Springer et al, 1985). AlaRS is an example of a synthetase that binds to its own promoter to repress its own expression. At least one synthetase gene, that encoding IleRS in E. coli, is transcribed as part of a polycistronic message, the major promoter being over 1 kb from the beginning of the ileS coding region (Kamio et al, 1985). Thus, subcloning may delete sequences that are involved in repressing expression of the cloned gene. As yet, too little is known about the control of aaRS expression in E. coli and nothing is known about expression in other bacterial species such as B. stearothermophilus.

CHAPTER 6

EXPERIMENTAL: DNA SEQUENCING

6.1. Introduction

One of the major contributions to molecular biology has been the development of methods for determining the sequence of DNA. Chemical sequencing (Maxam and Gilbert, 1977) and chain-termination sequencing with dideoxyribonucleotide triphosphates (Sanger et al, 1977b; Sanger et al, 1980) have become standard practices in many laboratories. In particular, dideoxy chain-termination sequencing has benefited from the development of a number of increasingly versatile cloning and sequencing vectors such as the M13 family of single-stranded bacteriophage vectors (Gronenborn and Messing, 1978; Messing et al, 1981) which allow quick and simple purification of template DNA for dideoxy sequencing. A variety of computer programs have been developed for manipulating the data that are accumulated, such as those of Staden (1982b; 1986) or Queen and Korn (1984). Such programs allow short DNA sequences to be arranged relative to one another so that, ultimately, they are merged into one contiguous DNA sequence.

6.2. Dideoxy Chain-termination Sequencing

The dideoxy chain-termination method involves the interruption of synthesis of a labelled second strand that is synthesised from a single-stranded DNA template. This is achieved by the incorporation of dideoxynucleotide triphosphates (ddNTPs) which lack a hydroxyl group at the 3’ position of the ribose ring, thus preventing further extension by the large (Klenow) fragment of DNA polymerase I. By using four dNTP mixtures, each containing [α-32P] labelled dATP plus one of the four dideoxy derivatives of dATP, dCTP, dGTP and dTTP, a set of radioactive polynucleotides of varying lengths, each terminating at the appropriate dideoxy insertion, is obtained. The products of elongation are resolved by electrophoresis on a polyacrylamide gel and the sequence is derived from the pattern of labelled bands after autoradiography of the gel. Up to 300 nucleotides can be read from a single dideoxy sequencing reaction by splitting the DNA between two 6% polyacrylamide gels and running one for two hours, the other for four hours (Bankier and Barrell, 1983). A recent improvement has been the development of "buffer gradient" gels (Biggin, Gibson and Hong, 1983), allowing up to 300 nucleotides to be read from a single gel when used in conjunction with [α-35S] dATP instead of [α-32P] dATP (see Section 6.3.7).
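The principle of reading a sequence from four termination reactions can be illustrated with an idealised sketch. It assumes that every possible termination product is present and resolved, and it counts fragment lengths from the first nucleotide added to the primer; the example sequence and the function names are arbitrary and hypothetical.

```python
def termination_lengths(new_strand):
    """For each base, list the lengths of the chains terminated by its ddNTP.

    new_strand is the sequence of the newly synthesised strand, written 5' to 3'
    starting from the first nucleotide added to the primer.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the ladder from the shortest band upwards, noting the lane of each band."""
    band_to_lane = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(band_to_lane[length] for length in sorted(band_to_lane))


synthesised = "GGATCCGTCGACCTGCAG"  # arbitrary illustrative sequence
lanes = termination_lengths(synthesised)
for base in "AGCT":  # loading order used on the gels: A, G, C, T
    print(base, lanes[base])
print(read_gel(lanes))
```

Each lane contains only the chain lengths at which the corresponding dideoxynucleotide was incorporated, so reading the bands in order of increasing length across the four lanes recovers the sequence of the synthesised strand.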

6.2.1. The Use of M13 Vectors in Chain-Termination Sequencing

The M13mp series of cloning vectors was derived from the filamentous coliphage M13 by inserting a region of DNA from the E. coli lac operon (containing the promoter and first 145 codons of the lacZ gene product, β-galactosidase) into non-essential viral DNA sequences (Gronenborn and Messing, 1978). Mutations were introduced into the vector DNA in order to create a polylinker of unique restriction endonuclease recognition sites within the first few codons of the β-galactosidase structural region. Subsequent manipulations have produced an extensive family of M13 cloning vehicles possessing different polylinkers containing a variety of unique restriction sites (Messing and Vieira, 1982; Norrander et al, 1983; Yanisch-Perron et al, 1985). The polylinker maintains the reading frame for the β-galactosidase fragment and recombinant molecules may be distinguished from a background of uncut or re-ligated vector by means of a simple histochemical reaction. The M13 vector is transfected into a suitable E. coli host such as JM101 (Messing, 1979) in the presence of the inducer IPTG and 5-bromo-4-chloro-3-indolyl β-D-galactopyranoside (X-gal), a substrate for β-galactosidase. Wild-type vectors, i.e. those that do not contain any recombinant DNA, produce the amino-terminal fragment of β-galactosidase which complements a mutation in the lacZ gene product from the host cell. Together, the two polypeptides form a functional complex (α-complementation) which hydrolyses the substrate, producing galactose and releasing a dye which gives the plaque its characteristic blue colour. In contrast, foreign DNA that has been inserted into the M13 polylinker produces a fusion protein that is unable (if the reading frame for the β-galactosidase fragment is destroyed) or poorly able (if the lacZ reading frame is maintained) to complement the deficiency in the host enzyme, thereby producing a white plaque.

The life-cycle of single-stranded DNA phages like M13 is such that the infected cell is not lysed by the phage, but remains intact whilst progeny virions are extruded into the culture medium, allowing a rapid biological purification by simply pelleting the cells by centrifugation and retaining the culture supernatant. Subsequently, the DNA may be sequenced using a "universal" sequencing primer that hybridises to a vector sequence adjacent to the cloning site.

6.2.2. Shotgun Cloning and Sequencing in M13

It is possible to sequence a piece of DNA by the chain-termination method by using a "shotgun" or random strategy, without recourse to a restriction map for that DNA (Deininger, 1983).

Briefly, the DNA is cleaved at random, either with restriction enzymes that have a four base pair recognition sequence and thus cleave, on average, every 4^4 or 256 base pairs, or by sonicating the DNA and purifying sheared fragments of 300-1000 base pairs. Such fragments may then be cloned into M13 and introduced into competent cells. Each white recombinant M13 plaque is picked and used to infect a small E. coli culture, the cells are grown and processed within the same day, and the DNA is purified and sequenced. The sequences are then overlapped by using computer programs such as DBAUTO (Staden, 1982b). This generates a large amount of data and, as more of the sequence is compiled, the level of redundancy (in terms of sections of DNA that have been sequenced several times) increases. At such a point it is advantageous to switch to a strategy of sequencing defined regions of the DNA in order to obtain the missing sequence(s). This may involve synthesising oligonucleotide primers that are complementary to sections of the full-length target DNA and using them to sequence through that DNA until the new sequence overlaps with some existing sequences.

Alternatively, specific restriction fragments may be cloned into M13 vectors and sequenced, assuming that a sufficiently detailed restriction map had been obtained beforehand.
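Two of the quantities mentioned above can be illustrated numerically. The sketch below is not part of the method used in this work: it assumes a random sequence with equal base frequencies for the cutting frequency, and uniformly placed reads of fixed length for the redundancy estimate. The target length (3,600 bp), read length (250 nucleotides) and function names are illustrative assumptions.

```python
import random


def expected_site_spacing(recognition_length):
    """Average spacing of an n-bp recognition site in random DNA: 4**n."""
    return 4 ** recognition_length


def simulate_coverage(target_length, read_length, n_reads, seed=0):
    """Place reads at random and return (fraction covered, mean depth)."""
    rng = random.Random(seed)
    depth = [0] * target_length
    for _ in range(n_reads):
        start = rng.randrange(target_length - read_length + 1)
        for position in range(start, start + read_length):
            depth[position] += 1
    covered = sum(1 for d in depth if d > 0) / target_length
    mean_depth = sum(depth) / target_length
    return covered, mean_depth


print(expected_site_spacing(4))  # four base pair recognition sequence
for n_reads in (10, 25, 50, 100):
    covered, mean_depth = simulate_coverage(3600, 250, n_reads)
    print(n_reads, round(covered, 2), round(mean_depth, 1))
```

In such a simulation the mean depth rises in proportion to the number of clones sequenced, whereas the fraction of the target that is covered approaches completion only slowly, which is the reason for switching to a directed strategy once redundancy becomes high.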

De novo sequencing (i.e. determining the DNA sequence of a gene for which the protein sequence is unknown) demands that the sequence be derived from both strands for greatest accuracy. A length of single-stranded DNA that is rich in G and C residues may form a stable secondary structure, especially when the G/C-rich region is located close to the 3’ end of the polynucleotide. This accounts for a particular sequencing anomaly known as a compression (Kramer and Mills, 1978). In such a case, the polynucleotide migrates through a polyacrylamide gel with an abnormal mobility, producing a group of bands on the autoradiograph for which it is impossible to determine the precise order of the sequence. The stability of regions of secondary structure may be overcome in a variety of ways. For example, the amount of formamide that is present in the gel loading dye mixture may be increased (Bankier and Barrell, 1983), or the dGTP in the nucleotide mixes may be replaced by deoxyinosine triphosphate (dITP) which forms only two Watson-Crick hydrogen bonds with dCTP as opposed to three in a dG:dC base pair (Mills and Kramer, 1979). A third approach is to sequence the complementary strand of the DNA insert. The compression is often associated with a particular side of the secondary structure and occurs on opposite sides of the secondary structure in the two strands. Thus, the correct sequence may be surmised by comparing the sequences obtained from both strands for that region.

6.3. Sequencing of valS by the Chain-termination Method

6.3.1. Introduction

The valS gene from B. stearothermophilus was sequenced by using a combination of two strategies: a random shotgun approach, based upon cloning sonicated fragments of the gene into M13 vectors, determining the DNA sequence of each insert and overlapping the sequences with computer programs (Staden, 1982b), and an ordered approach which involved cloning specific restriction fragments. The methods that were used were adapted from Bankier and Barrell (1983).

6.3.2. Materials

6.3.2.A. Bacterial Strains

E. coli JM101 (Messing, 1979) or TG2, a recA⁻ derivative (Gibson, 1984), was used throughout. The genotypes are given in Table 2.1.

6.3.2.B. Phage Vectors

A number of derivatives of the M13mp series of vectors were used for cloning. The vectors mp8, 18 and 19 were gifts from Dr Mick Jones. The latter two vectors (Yanisch-Perron et al, 1985) were used to clone specific restriction fragments, using methods that have been described in Chapter 2. M13K8.2 (Waye et al, 1985), a gift from Dr Mary Waye, is derived from M13mp8 and contains two copies of a cassette of an EcoK restriction sequence containing an internal SmaI site. The polylinker sequences of mp8 and K8.2 are shown in Fig. 6.1.

The replicative form (RF) of the vector DNA was grown in an EcoK⁻ host such as TG2 and cut with SmaI. Blunt-ended sonicated DNA fragments were ligated with the vector and the ligation mixture was used to transform JM101. The transformed cells were then plated in H-agar overlays in the presence of IPTG and X-gal. The majority of plaques were white recombinants. Uncut vector or religated vector molecules possess an intact EcoK restriction site and are recognised by the host restriction and modification enzymes, whereas those containing an insert at the SmaI site remain intact. The few blue plaques that occurred were possibly in-frame recombinant molecules. The method renders it unnecessary to dephosphorylate the vector prior to ligation; it is also easier to toothpick recombinant plaques without a substantial background of blues.

M13mp8

lacZ --->  T M I T
ATGACCATGATTACGGAATTCCCGGGGATCCGTCGACCTGCAGCCAAGCTTGGCACT
Restriction sites, in order: EcoRI; SmaI, XmaI; BamHI; SalI, AccI, HincII; PstI; HindIII

M13K8.2

lacZ --->  T M I T
ATGACCATGATTACGAATTCCCAACCCGGGAGTGCTAAACCCGGGAGTGCTAGG
GGATCCGTCGACCTGCAGCCAAGCTTGGCACTGGCC
Restriction sites, in order: EcoRI; SmaI (within EcoK); SmaI (within EcoK); BamHI; SalI; PstI; HindIII

Fig. 6.1. Polylinker sequences of M13 vectors mp8 and K8.2. lacZ ---> refers to the direction of transcription of the N-terminal fragment of β-galactosidase that is encoded by M13 vectors. The first four codons of the structural gene are shown.

6.3.2.C. Enzymes

Restriction enzymes were purchased from the Boehringer Corporation Ltd, Pharmacia Fine Chemicals or Cambridge Biotechnology Laboratories Ltd. DNA Polymerase I (Klenow fragment), T4 DNA Polymerase and T4 DNA Ligase were purchased from Cambridge Biotechnology Ltd.

6.3.2.D. Radiochemicals

[α-32P] deoxyadenosine 5’ triphosphate (410 Ci/mmol) and [α-35S] deoxyadenosine 5’ triphosphate (600 Ci/mmol) were purchased from Amersham Radiochemicals plc.

6.3.2.E. Sequencing Primers

The universal sequencing primer is complementary to an M13 (+) sense DNA sequence immediately downstream of the polylinker (i.e. to the right of the HindIII site of both vectors shown in Fig. 6.1). The sequence of this primer is 5’ GTAAAACGACGGCCAGT 3’. The M13 reverse sequencing primer, sequence 5’ AACAGCTATGACCATG 3’, is complementary to M13 (-) sense DNA and anneals upstream of the polylinker. Both primers were synthesised using phosphoramidite chemistry on an Applied Biosystems DNA Synthesiser, model 380B.

The sequence of the structural gene was confirmed (after the shotgun sequencing data had been amalgamated) by resequencing with a set of oligonucleotide primers that were complementary to the minus strand of an M13mp8 clone containing the 3.6 kb PstI fragment of pTB8 (Fig. 4.8a). All of the primers were made on the ABI DNA synthesiser using phosphoramidite chemistry. The sequences and lengths of each of these primers are shown in Table 6.1.

Two other primers were made in order to fill in gaps that were left after the shotgun data had been assembled. NIG1 (CGGCTTGCTCGGCGGC) was used to fill in a gap on the minus strand. NIG2 (TGCCGGCTTCCTCTA) was used to link the 3.6 kb PstI fragment of pTB8 to an adjacent 0.8 kb XmaI/PstI fragment.
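A primer anneals to the strand that carries its reverse complement. As an illustration only, the sketch below searches for the reverse complement of NIG2 in the third line of the sequence printed in Fig. 5.7 (the region upstream of valS in a Class II clone). The sequences are transcribed from this document; the variable and function names are hypothetical, and the match is a consistency check between the printed sequences rather than a statement of how the primer was designed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


NIG2 = "TGCCGGCTTCCTCTA"
# Third line of the DNA sequence shown in Fig. 5.7.
FIG_5_7_LINE_3 = "CGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCTTAAGCTTCTAGAAGGCCG"

site = reverse_complement(NIG2)
offset = FIG_5_7_LINE_3.find(site)  # -1 would mean that no match was found
print(site, offset)
```

A non-negative offset indicates that the primer is complementary to the strand shown in the figure and would therefore be extended in the opposite direction, i.e. towards the PstI site.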

I am indebted to Jack Knill-Jones for synthesising all of the above primers.

6.3.2.F. Nucleoside Triphosphates

Deoxynucleotide triphosphates (dNTPs) and dideoxynucleotide triphosphates (ddNTPs) were purchased from Sigma Chemicals Ltd, U.K. dATP, dGTP, dCTP and dTTP were each made up as 10 mM stock solutions in TE buffer. Each dideoxy derivative of the above was made up as a 10 mM solution in water. A stock of 10 mM dITP in TE buffer was also prepared. All stock solutions were stored at -20 °C.

Table 6.1

Oligonucleotide Primers for Sequencing the Entire valS Gene from One Strand (a)

Primer (b)    Sequence              Length

VNTERM        GTCCATCATGTTGAAGG     17
VAL2          GGGAATGGATCACGCCGG    18
VAL3          TACATCATCAACTGGG      16
VAL5          GACTTTGAAATCGGCAAT    18
VAL6          CGCAATGGTTTGTGA       15
VAL7          ACATGGTTCAGCTCGG      16
VAL8          GATGGATGTGATCGAC      16
VAL9          CGAAACGATCGAGACG      16
VAL10         CGCATGAAGGCGAAT       15
VAL11         AGCTCTTGATTGATACA     17
VAL12         GCTCATGTCGTCGAAGA     17

(a) The primers were synthesised on an Applied Biosystems DNA Synthesiser, model 380B, using phosphoramidite chemistry.
(b) A primer VAL4 was originally synthesised but proved to be difficult to purify and unsuitable for sequencing. The primer was re-synthesised and named VAL5.
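The lengths listed in Table 6.1 can be verified against the sequences with a few lines of code. This is a minimal sketch: the primer names and sequences are transcribed from the table (with spaces removed and the initial letter of each name read as V), and the G+C fraction is an additional, illustrative statistic that is not reported in the table.

```python
# (name, sequence, length as listed in Table 6.1)
PRIMERS = [
    ("VNTERM", "GTCCATCATGTTGAAGG", 17),
    ("VAL2", "GGGAATGGATCACGCCGG", 18),
    ("VAL3", "TACATCATCAACTGGG", 16),
    ("VAL5", "GACTTTGAAATCGGCAAT", 18),
    ("VAL6", "CGCAATGGTTTGTGA", 15),
    ("VAL7", "ACATGGTTCAGCTCGG", 16),
    ("VAL8", "GATGGATGTGATCGAC", 16),
    ("VAL9", "CGAAACGATCGAGACG", 16),
    ("VAL10", "CGCATGAAGGCGAAT", 15),
    ("VAL11", "AGCTCTTGATTGATACA", 17),
    ("VAL12", "GCTCATGTCGTCGAAGA", 17),
]


def gc_fraction(seq):
    """Fraction of G and C residues in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Names of primers whose sequence length disagrees with the listed length.
mismatches = [name for name, seq, listed in PRIMERS if len(seq) != listed]

for name, seq, listed in PRIMERS:
    print(f"{name:7s} {len(seq):2d} nt  G+C = {gc_fraction(seq):.2f}")
print("length mismatches:", mismatches)
```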

6.3.3. Cloning of Sonicated DNA Fragments

6.3.3.A. Preparation of DNA Fragments

5 µg of plasmid pTB8 (Fig. 4.8) was digested to completion with 5 units each of restriction enzymes PstI and SmaI in a total volume of 50 µl (see Section 2.5.1). Subsequently, the digested DNA was loaded onto a 0.7% low-gelling temperature agarose/TBE gel, and the 3.6 kb PstI fragment was resolved by electrophoresis and purified by methods described in Section 2.5. Some sequence data were derived from the 4.7 kb BamHI fragment of pNB2.1 (Fig. 4.7).

Prior to sonication, the DNA fragment was concatemerised for at least 4 h at 18 °C in a 20 µl volume containing 1 mM rATP and 5 units of T4 DNA ligase in 1 x ligation buffer (LB: Table 2.2). This ensured that sonication produced an even distribution of fragments along the length of the DNA, with minimal bias towards fragments derived from one or other end.

6.3.3.B. Sonication

The volume of the concatemerised DNA mixture was adjusted to 250 µl with sonication buffer (0.5 M NaCl, 0.1 M Tris-HCl, pH 8 and 1 mM EDTA) and chilled in a capped Eppendorf tube. The tube was immersed in iced water in a cup horn sonicator (Heat Systems-Ultrasonics, Inc., Model W-375), the bottom of the tube being positioned a few millimetres above the sonicator probe. The DNA solution was sonicated three times in 40 s bursts with 40 s breaks between sonications.

Following sonication, 2 volumes of cold ethanol and 25 µl of 3 M NaOAc (final concentration 0.1 M) were added to the solution and the sheared DNA was extracted and precipitated by standard methods (Section 2.5.3).
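The stated final concentration of sodium acetate follows from a simple dilution calculation, sketched below. It is assumed here that the "2 volumes" of ethanol refer to the 250 µl sonicated sample and that volumes are additive; the function name is hypothetical.

```python
def final_concentration(stock_molar, stock_volume_ul, other_volumes_ul):
    """Concentration of a stock after dilution into the combined volume."""
    total_volume_ul = stock_volume_ul + sum(other_volumes_ul)
    return stock_molar * stock_volume_ul / total_volume_ul


sample_ul = 250.0            # sonicated DNA in sonication buffer
ethanol_ul = 2 * sample_ul   # 2 volumes of cold ethanol (assumed relative to the sample)
naoac_molar = final_concentration(3.0, 25.0, [sample_ul, ethanol_ul])

print(f"final NaOAc concentration: {naoac_molar:.3f} M")
```

The result, 75/775 M, rounds to the 0.1 M quoted in the text.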

6.3.3.C. Repair of Sheared Termini

It was necessary to repair the ends of the sonicated DNA fragments in order to make them blunt-ended for shotgun cloning into SmaI-cut M13 vectors. The DNA fragments, precipitated from ethanol and collected by centrifugation, were dried under vacuum and resuspended in 22 µl of water. The ends were repaired by using T4 polymerase (Section 2.10).

6.3.3.D. Fractionation of 300-700 bp Fragments

Sonication produces a wide range of fragments, from large fragments down to extremely short pieces of DNA. The conditions for sonication that were used in Section 6.3.3.B had been determined empirically by sonicating an identical sample of DNA and withdrawing aliquots of the sheared DNA at different times so that a set of DNA fragments, sonicated for between 30 s and 200 s, was obtained. These were subjected to electrophoresis on an agarose gel and the period of sonication that produced fragments of 300-1000 bp in size (3 x 40 s) was chosen for the main sonication. One can read approximately 250-300 bp from the average sequencing gel; hence, it is desirable to clone fragments of DNA of at least 300 bp in length into M13 vectors. On the other hand, larger fragments tend to clone into M13 less efficiently than smaller fragments (Thompson, 1982). Therefore, DNA fragments of about 300-700 bp in length were purified following electrophoresis through a low-gelling temperature agarose gel (Section 2.5.3). The fractionated DNA was washed with equal volumes of phenol and chloroform, then precipitated from ethanol and resuspended in 20 µl of TE buffer.

6.3.3.E. Ligations

RF M13mp8 was cut with SmaI and phosphatased, as described in Section 2.6. 10 ng of SmaI-cut mp8 or M13K8.2 was ligated with 5, 10, 25 or 50 ng of blunt-ended sonicated DNA fragments in a total volume of 10 µl containing 1 x LB and 1 mM rATP. T4 DNA ligase (10 units) was added and the ligations were incubated overnight at 15 °C.
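The masses of insert used in these ligations correspond to a range of insert:vector molar ratios, which can be estimated as sketched below. The calculation is illustrative only: it assumes a vector of about 7.2 kb for the replicative form of M13mp8 and a mean insert size of 500 bp (the midpoint of the 300-700 bp fraction), neither of which is stated in this section, and the function name is hypothetical.

```python
def insert_to_vector_molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar ratio of insert to vector; for double-stranded DNA, moles scale as mass/length."""
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


VECTOR_BP = 7200   # assumed approximate size of RF M13mp8
INSERT_BP = 500    # assumed mean size of the fractionated fragments
VECTOR_NG = 10

ratios = {
    insert_ng: insert_to_vector_molar_ratio(insert_ng, INSERT_BP, VECTOR_NG, VECTOR_BP)
    for insert_ng in (5, 10, 25, 50)
}
for insert_ng, ratio in ratios.items():
    print(f"{insert_ng:2d} ng insert: about {ratio:.0f}:1 insert:vector")
```

Under these assumptions, even the smallest amount of insert is in several-fold molar excess over the vector.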

6.3.3.F. Preparation of Competent Cells

A single colony of E. coli JM101 was picked into 5 ml of 2xTY broth and grown overnight at 37 °C with shaking. An aliquot of the overnight culture (200 µl) was added to 15 ml of fresh medium and this culture was grown for 2 h at 37 °C with shaking to roughly A550 = 0.3. The cells were chilled on ice, then pelleted by centrifugation at 3,000 rpm for 5 min. The cells were resuspended in 7.5 ml of cold 50 mM CaCl2, 10 mM Tris-HCl, pH 7.5, and left on ice for at least 30 min. The cells were then pelleted by centrifugation as before and resuspended in a final volume of 1.5 ml of cold CaCl2/Tris buffer. The competent cells remained viable for two days if kept at 4 °C.

6.3.3.G. Transformation of Competent Cells

A 200 µl aliquot of competent cells was mixed with 5 µl of each ligation mixture and incubated on ice for at least 30 min. The cells were heat-shocked for 2 min at 45 °C. Aliquots (30 µl) of IPTG (25 mg/ml in water) and X-gal (25 mg/ml in DMF), together with 50 µl of an overnight JM101 culture, were mixed with the transformed cells. The mixture was rapidly mixed with 3 ml of H-top (0.7%) agar, previously kept at 45 °C, and poured on H-agar plates. Once the H-top overlay had set, the plates were incubated at 37 °C overnight.

6.3.3.H. Preparation of Template DNA

White recombinant plaques were picked into 1.5 ml of 2xTY medium containing 50 µl of an overnight culture of E. coli JM101. The cultures were shaken vigorously for 6 h at 37 °C, then transferred to Eppendorf tubes and the cells were pelleted by centrifugation in a bench-top microcentrifuge at about 10,000 rpm for 5 min at room temperature. The supernatant was carefully poured off into a fresh Eppendorf tube, taking care not to disturb the cell pellet (RNA and cellular DNA fragments, whether viral or chromosomal, may hybridise to the ssM13 template and act as primers). The single-stranded viral DNA was extracted and precipitated according to Bankier and Barrell (1983). The DNA was resuspended in 30 µl of TE buffer. The yield, on average, was 3-5 µg (as determined from electrophoresis through an agarose gel in comparison to standards of known concentration). Template DNA was stored at -20 °C.

6.3.4. Dideoxy Sequencing with [α-32P] dATP

Single-stranded template DNA, prepared as detailed in the preceding section, has a concentration of about 0.05 pmol per µl. 5 µl of template DNA was mixed with 1 µl of 10 x TM buffer (Table 2.2), 1 µl of sequencing primer (0.2 pmol) and 3 µl of water in a capped Eppendorf tube. The primer and template were allowed to anneal together by boiling the tube for 2 min in a shallow water bath, then removing from the heat and allowing the tube to cool down slowly to room temperature in the water bath.
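The quoted template concentration is consistent with the yield given in the preceding section, as the following rough calculation suggests. It is a sketch under stated assumptions: an average mass of 330 g/mol per nucleotide of single-stranded DNA (a common approximation) and a recombinant genome of about 7,700 nucleotides (vector plus a 0.5 kb insert), neither of which is given in the text; the function name is hypothetical.

```python
AVERAGE_NT_MASS = 330.0   # g/mol per nucleotide of ssDNA (approximation)
TEMPLATE_NT = 7700        # assumed length of recombinant M13 ssDNA
RESUSPENSION_UL = 30.0    # volume in which the template was resuspended


def ssdna_pmol(mass_ug, length_nt):
    """Convert a mass of single-stranded DNA (µg) to picomoles of molecules."""
    return mass_ug * 1e6 / (length_nt * AVERAGE_NT_MASS)


low = ssdna_pmol(3.0, TEMPLATE_NT) / RESUSPENSION_UL    # pmol/µl for a 3 µg yield
high = ssdna_pmol(5.0, TEMPLATE_NT) / RESUSPENSION_UL   # pmol/µl for a 5 µg yield
template_pmol = 5 * 0.05   # 5 µl of template at 0.05 pmol/µl
primer_pmol = 0.2          # primer added to each annealing reaction

print(f"template concentration: {low:.3f}-{high:.3f} pmol/µl")
print(f"template per annealing: {template_pmol:.2f} pmol; primer: {primer_pmol} pmol")
```

On these assumptions a yield of 3-5 µg in 30 µl brackets the quoted figure of about 0.05 pmol per µl, and each annealing reaction contains primer and template in roughly equimolar amounts.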

2 µl of primed template was added to each of four capless Eppendorf tubes, which were arranged in a row in an Eppendorf centrifuge rack that can accommodate ten tubes. Four racks (i.e. 8 separate clones) can be loaded into the centrifuge at any one time.

The dideoxynucleotides (ddNTPs) were each diluted to 0.1 mM in water for sequencing, with the exception of ddTTP (0.4 mM), and stored at -20 °C. The four dNTP mixes were prepared from 0.5 mM stocks of dATP, dCTP, dGTP and dTTP in TE buffer and stored at -20 °C. The concentrations of these mixes are: A-mix, 154 µM dGTP, 154 µM dCTP and 154 µM dTTP; G-mix, 11 µM dGTP, 217 µM dCTP and 217 µM dTTP; C-mix, 217 µM dGTP, 11 µM dCTP and 217 µM dTTP; T-mix, 217 µM dGTP, 217 µM dCTP and 11 µM dTTP.

An equal volume of each dNTP mix was mixed with its corresponding ddNTP. 2 µl of ddATP/A-mix was added to the first tube in each row of tubes containing primed template DNA, 2 µl of ddGTP/G-mix to the second tube and so on. The nucleotides were mixed with the primed template DNA by briefly spinning the tubes.
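The ratio of each dideoxynucleotide to its competing deoxynucleotide, which governs the distribution of chain lengths, can be tabulated from the concentrations given above. The sketch below simply halves each concentration to account for mixing equal volumes; the dictionary layout and function names are illustrative. No ratio is calculated for the A reaction, because the A-mix contains no unlabelled dATP.

```python
# dNTP mixes (µM) as listed in the text; the A-mix contains no unlabelled dATP.
DNTP_MIX_UM = {
    "A": {"dGTP": 154, "dCTP": 154, "dTTP": 154},
    "G": {"dGTP": 11, "dCTP": 217, "dTTP": 217},
    "C": {"dGTP": 217, "dCTP": 11, "dTTP": 217},
    "T": {"dGTP": 217, "dCTP": 217, "dTTP": 11},
}
# Working ddNTP solutions (µM): 0.1 mM each, except ddTTP at 0.4 mM.
DDNTP_UM = {"A": 100, "G": 100, "C": 100, "T": 400}


def after_equal_volume_mixing(base):
    """Concentrations (µM) after mixing equal volumes of a dNTP mix and its ddNTP."""
    mixture = {name: conc / 2 for name, conc in DNTP_MIX_UM[base].items()}
    mixture[f"dd{base}TP"] = DDNTP_UM[base] / 2
    return mixture


def dd_to_d_ratio(base):
    """Ratio of ddNTP to the competing dNTP, or None if that dNTP is absent from the mix."""
    mixture = after_equal_volume_mixing(base)
    competing = mixture.get(f"d{base}TP")
    return None if competing is None else mixture[f"dd{base}TP"] / competing


for base in "AGCT":
    print(base, after_equal_volume_mixing(base), dd_to_d_ratio(base))
```

Because both components are diluted equally on mixing, and again on addition to the primed template, the ratios are unaffected by the subsequent dilution steps.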

Finally, 2 µl of a mixture containing 2.5 µCi of [α-32P] dATP and 0.25 unit of the Klenow fragment of DNA polymerase I in 1 x TM buffer was added to each tube. The reaction was allowed to proceed for 5 min at room temperature, as described by Bankier and Barrell (1983). The tubes were subsequently incubated at 37 °C where the reaction was allowed to continue for a further 10 min. An elevated temperature had been shown previously to prevent "pile-ups", i.e. sequencing anomalies due to premature termination of second strand synthesis that occur when a G/C-rich region of the template forms a secondary structure, preventing the DNA polymerase from traversing that region (N.J.B., unpublished observations). Next, 2 µl of a cold nucleotide chase mixture (0.5 mM for each dNTP) containing 0.25 unit of Klenow polymerase was added to each tube. The reactions were re-incubated at 37 °C for 15 min and then terminated by adding 2 µl of formamide dye (0.1% (w/v) bromophenol blue, 0.1% xylene cyanol FF and 10 mM EDTA in formamide). The mixtures were boiled for 3 min and immediately loaded onto a 6% polyacrylamide, 7 M urea sequencing gel (see next section).

6.3.5. Separation of Radiolabelled Polynucleotides by Electrophoresis

Polyacrylamide gels of size 360 x 180 x 0.4 mm containing 6 % polyacrylamide and 7 M urea were cast between glass plates (one of which was treated with dimethyldichlorosilane (BDH Chemicals Ltd) so that the polymerised gel would not stick to it). The acrylamide solution contained 6 % de-ionised acrylamide (38:2 acrylamide:

N,N’-methylene-bisacrylamide), 7 M urea, 0.05% ammonium persulphate and 0.1% TEMED in TBE buffer, pH 8.3. The gel was poured immediately after adding the ammonium persulphate and TEMED. The concentration of the TBE buffer varied according to the sequencing 163 protocol that was followed. When it was necessary to run long and short gels (i.e. the sequencing rection was split between two gels, the first being electrophoresed for 1.5 hr, the second being run for 4 hr in order to resolve the larger polynucleotides), the final concentration of

TBE buffer in the gels was single strength. In many cases, polynucleotides were resolved on a single buffer gradient gel (Biggin,

Gibson and Hong, 1982). This type of gel was cast from two separate gel mixtures (one containing 0.5 x TBE, the other containing 5 x TBE buffer and 50 µg/ml bromophenol blue) as follows. A 4 ml volume of

0.5 x TBE mixture was drawn up into a graduated 10 ml pipette.

This was followed by 7 ml of 5 x TBE mix, a few air bubbles being drawn up with the solution so as to mix the two solutions in the pipette and to form a rough buffer gradient. This mixture was pipetted between and down one side of a pair of gel plates, followed by a further 10 ml of 0.5 x TBE mixture which was pipetted across the plates in order to prevent the first volume from being pushed across the bottom of the plates and up the other side (this is readily visible as the 5 x TBE mixture is blue in colour). Finally, the gel was topped up with a further 10 ml of 0.5 x TBE gel mixture and a comb was inserted to form the slots.

The buffer that was used for electrophoresis was 1 x TBE buffer.

The comb was removed from the gel and the slots were flushed out with buffer several times prior to loading the samples in order to remove local high concentrations of urea that diffuses out of the gel.

The DNA samples, previously boiled for 2-3 min, were loaded onto the gel in the order A, G, C, and T. Approximately 1 µl was layered across the bottom of each slot. Care was taken not to overload the slot: because of the high energy of the decay of [32P]-labelled compounds, there is a certain amount of scatter during emission. This scatter is more pronounced when the gel slot has been overloaded, producing indistinct bands on the autoradiograph and making the sequence difficult to read, especially at the top of the gel. Buffer gradient gels were electrophoresed for 1.5 h (or until the bromophenol blue marker band had run off the bottom of the gel) at a constant power setting of 38 W. The 1 x TBE gels were run at a constant current of 30 mA. The short gel was stopped after the bromophenol blue had just run off the bottom of the gel (1.5 h); the long gel was stopped after the xylene cyanol had run off the bottom (approximately

4 h).

6.3.6. Fixing and Autoradiography

Following electrophoresis, the gel (attached to the non-siliconised back plate) was immersed in 10% acetic acid for 15 min in order to fix the polynucleotides and to remove most of the urea. The gel, still attached to the back plate, was removed from the acid and blotted gently with tissues. A sheet of Whatman 3MM chromatography paper was carefully dropped onto the gel, then firmed down so that the gel stuck to the paper and the gel was subsequently peeled away from the glass plate.

The gel was dried for 1 hr at 80 °C under vacuum on a Bio-Rad

Slab Gel Dryer. The gel was exposed to Fuji RX Medical film for 16 h and developed using a Kodak Industrial "X-Omat" Processor, model 3.

6.3.7. Modifications for Sequencing with [α-35S] dATP

A number of modifications were necessary in order to adapt the protocol for sequencing with [α-35S] dATP. The isotope is a weaker emitter than [α-32P] dATP, so it was necessary to use 12 µCi of radionucleotide per clone and to autoradiograph the gel for 24 h. The advantages of this isotope are, firstly, that it is safer to handle than

32P. Secondly, because the emission energy is about ten times lower, there is less scatter and the bands on the autoradiograph are sharper than those produced from a gel that has been used to separate [α-32P] labelled polynucleotides. Consequently, the gel slots can be filled with up to 5 µl of sequencing reaction and the gel autoradiographed for longer, allowing the weakest sequences to be developed without fear of rendering the bands unreadable. Over 250 nucleotides can be read from such a gel with ease, especially if the polynucleotides were separated on a buffer gradient gel. The half-life of the isotope is also

longer than that of [α-32P] dATP (87 days as opposed to 14 days).

It is also necessary to alter the ddNTP:N°A ratios. The

concentration of ddATP has to be reduced to 0.33 mM, probably due to

a lower rate of incorporation for [α-35S] dATP (Bankier and Barrell,

1983). The ddNTP:N°A ratios for both the G and C reactions were

adjusted from 1:1 to 1:1.2. Finally, the incubation times at

37 °C were increased from 10 and 15 min to 15 and 20

min respectively, and the dried gel was autoradiographed for at least

24 h. Both these modifications are made so as to counter the lower

incorporation of the isotope.

6.3.8. Modifications for Sequencing with dITP

The working dilutions of both the dNTP mixes (containing dITP

instead of dGTP) were the same as described in Section 6.3.4.

However, the ratios of N-mix to ddNTP were as follows: for A and I,

1:1; for C and T, 2:1.

6.3.9. Double-stranded Sequencing

Some sequence data was derived by double-stranded sequencing of plasmid DNA (Chen and Seeburg, 1985). Briefly, plasmid DNA was denatured with 0.2 M NaOH, neutralised with ammonium acetate, pH 4.5, precipitated with ethanol, dried in vacuo and resuspended in a small volume of 1 x TM buffer. Primer (5 pmol) was added, left to anneal at 50 °C for 15 min and the DNA was sequenced as described earlier.

6.4. Computer Software

The DNA sequences were assembled into a contiguous sequence using the DBUTIL and DBAUTO programs of Staden (1980; 1982b).

The programs were stored on a VAX 11/750 mainframe computer in the

Biophysics Department, Imperial College. I am grateful to Dr Peter

Brick for his help and continual interest in the progress of this work.

The completed sequence was analysed using ANALYSEQ programs for the analysis of DNA sequences (Staden, 1984; 1986) and

ANALYSEP, a program for analysing protein sequences (R. Staden, unpublished). The former program contains several sub-routines that were used to examine the DNA sequence (for example, searching for restriction sites, calculating codon usage, and translating the DNA sequence in all three reading frames for both strands). The

ANALYSEP programs were used to examine the protein sequence of

ValRS, once derived from the DNA sequence. ANALYSEP contains an improved version of the DIAGON program (Staden, 1982a) that was used to search for homologies between the primary sequences of a number of aaRSs. The program compares two protein sequences and produces a 2-dimensional graphical display of the homologies between them. The program searches for similarities between chemically or structurally similar residues (e.g. for conservative amino acid substitutions) using a scoring matrix (Dayhoff, 1969) and the sequences are compared by moving a "window" of variable length along the sequences in increments of one residue at a time. The size of the window ("span length") is set at an odd number by the operator and the program scans either side of the central position. Two other parameters may be set by the operator. These are the number of perfect amino acid matches within the span length ("identities") and the degree of stringency of the Dayhoff matrix ("scores proportional"). As the window moves along the sequences that are being compared, a dot is plotted each time that the parameters are satisfied. The final graphical output (see Sections 8.2 and 8.3) shows discrete groups of dots representing regions of homology. For example, if a protein is compared to itself, such as when searching for internal repeating sequences, then a diagonal line will be plotted, representing 100% homology. If a low stringency is used, then a background of minor homologies between different parts of the protein will be plotted as discrete regions of dots that are parallel to the diagonal. A more rigorous determination of what constitutes a match is found as the stringency of the algorithm is increased.
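The windowed comparison described above can be illustrated with a minimal sketch. It is not the DIAGON code: the residue groups below are a hypothetical stand-in for the Dayhoff matrix, and the names `pair_score`, `dot_plot`, `span` and `min_score` are assumptions introduced only for this illustration.

```python
# Hypothetical groups of chemically similar residues; a crude stand-in
# for the Dayhoff scoring matrix used by the real program.
GROUPS = ["AGST", "ILVM", "FWY", "DE", "NQ", "KRH", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def pair_score(a, b):
    """Score 2 for an identity, 1 for a conservative substitution, else 0."""
    if a == b:
        return 2
    if a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
        return 1
    return 0


def dot_plot(seq1, seq2, span=11, min_score=14):
    """Return (i, j) positions whose centred windows score >= min_score."""
    if span % 2 == 0:
        raise ValueError("span must be an odd number")
    half = span // 2
    dots = []
    for i in range(half, len(seq1) - half):
        for j in range(half, len(seq2) - half):
            # Sum the pair scores across the window centred on (i, j).
            score = sum(pair_score(seq1[i + k], seq2[j + k])
                        for k in range(-half, half + 1))
            if score >= min_score:
                dots.append((i, j))
    return dots


# First 40 residues of the predicted ValRS sequence (Fig. 7.2).
valrs_nterm = "MAQHEVSMPPKYDHRAVEAGRYEWWLKGKFFEATGDPNKR"
self_dots = dot_plot(valrs_nterm, valrs_nterm)
```

Comparing a sequence with itself places a dot at every position on the main diagonal, which corresponds to the 100% homology line mentioned in the text; raising `min_score` plays the role of increasing the stringency.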

Some synthetase sequences were checked using the Micro-Genie software of Queen and Korn (1984), run on an IBM-XT microcomputer.

The program is less sophisticated than DIAGON, but nonetheless versatile. The operator can vary the number of matches within the window ("minmatch"), the amount of sequence that can be allowed to

"loop out" of one sequence to get it to match to another one ("looplen")

and the degree of stringency of the overall match for the window,

typically set at 50% ("minper"). Under the appropriate conditions, no

significant difference was found between the outputs of the two

programs for a given pair of enzymes.

HYDROPLOT, a sub-routine of ANALYSEP, was used to plot the

hydropathicity within the ValRS protein sequence. Both the DIAGON

and HYDROPLOT analyses were run on a VAX 11/780 mainframe at

the MRC Laboratory of Molecular Biology, Cambridge, with the help of

Dr John Walker.
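A hydropathy profile of the kind produced by HYDROPLOT can be sketched as a sliding-window average. The text does not state which hydropathy scale or window length the program used; the Kyte and Doolittle scale and the nine-residue window below are assumptions, as is the function name `hydropathy_profile`.

```python
# Kyte and Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of each window along the protein."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# First 20 residues of the predicted ValRS sequence (Fig. 7.2).
profile = hydropathy_profile("MAQHEVSMPPKYDHRAVEAG", window=9)
```

Positive values indicate hydrophobic stretches and negative values hydrophilic ones; plotting `profile` against residue number gives the hydropathy plot.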

CHAPTER 7

ANALYSIS OF THE B. stearothermophilus valS DNA SEQUENCE.

7.1. Introduction.

The cloning of the valS gene from B. stearothermophilus, its

localisation on three separate plasmids and details concerning the kinetic

and physical properties of the gene product have been described in

Chapters 4 and 5. The DNA sequence of the gene has been

determined by the dideoxy chain-termination method (Sanger et al,

1977b; Sanger et al, 1980). The results reveal that there is an open

reading frame (ORF), predicted to encode a protein of Mr ~ 103,000,

that is bounded by initiation and termination signals resembling those signals that are recognised by E. coli RNA polymerase (Section 7.2).

The codon usage for this reading frame agrees well with the codon

usage of other Bacillus synthetases and suggests that such genes are

expressed in E. coli with few difficulties (Section 7.3). Amino acid

analysis (Moore and Stein, 1963) of purified B. stearothermophilus ValRS,

coupled with a spectrophotometric determination of the number of

tryptophan residues in the protein (Edelhoch, 1967; Mulvey et al, 1974)

implies that the correct reading frame was maintained throughout the

length of the ORF that had been assigned to valS (Section 7.4).

The sequencing was carried out in conjunction with Tammy Gray

and Dr Thor Borgford.

7.2. Analysis of the valS Transcription Unit.

7.2.1. Nucleotide Sequence of valS.

The gene was sequenced by cloning fragments of DNA into the single-stranded DNA phage vectors M13mp8 or M13K.2 (Messing and

Vieira, 1982; Waye et al, 1985). The fragments spanning the valS structural gene (2.65 kb) were generated from the 3.6 kb PstI fragment of pTB8 by sonication (Deininger, 1983) or by cloning specific restriction fragments that had been derived from the same plasmid.

The construction of simple restriction maps for all three plasmids that carry the gene was useful for orientating small contiguous stretches of shotgun DNA sequence data relative to each other in the early stages of the project. Approximately 1.6 kb of sequence upstream of the structural gene was also determined, largely by "shotgun" cloning of sonicated fragments that were derived from the 4.7 kb BamHI fragment of pNB2.1 (Fig. 4.7). This region contains sequences that agree well with the canonical hexanucleotide sequences of a prokaryotic promoter.

These sequences appear to be well conserved between E. coli and

Bacillus spp. and lie immediately upstream of the start site of transcription (Rosenberg and Court, 1979; Siebenlist et al, 1980).

The elucidation of a DNA sequence demands, for completeness, that the DNA be sequenced from both strands. The reason for this is that certain DNA sequences exhibit strand-specific secondary structures that are recognised on sequencing gels as "compressions" (Section 6.2.2).

These may be overcome in several ways, such as sequencing the complementary DNA strand or by substituting dITP for dGTP in the dNTP mixes (Section 6.3.8; Mills and Kramer, 1979). The dITP:dCTP interaction forms two hydrogen bonds, compared with three bonds in a dGTP:dCTP pairing; accordingly, a polynucleotide containing inosine nucleosides is less likely to form the secondary structures that cause compressions. Sequencing reactions containing the standard dNTP mixes

(dA, dG, dC, dT) were run in parallel with reactions that contained dITP. Much of the target DNA sequence, including practically all of the structural gene, was determined from each strand at least twice by the shotgun method. The sequences upstream of the structural gene were not so rigorously determined; nonetheless, any part of the sequence was often determined from two or more separate shotgun clones. A schematic representation of the extent of sequencing, for the entire

3.6 kb PstI fragment of pTB8 and 1.4 kb of DNA further upstream of this fragment, is shown in Fig. 7.1.

A third approach to determine the sequence with accuracy was to sequence the entire structural gene in one direction only, using a set of synthetic oligonucleotides that were spaced roughly 250 bp apart. The primers were annealed in turn to an M13 template that contained the

3.6 kb PstI fragment from pTB8; the priming sites are shown schematically in Fig. 7.1. The arrows in the figure depict the individual sequences that comprise the database. Further, the data obtained by overlapping sequences from shotgun clones confirmed that there had been no deletions in the structural gene in the M13 clone carrying the 3.6 kb PstI fragment. As mentioned in Section 5.6, large inserts cloned into M13 vectors tend to be relatively unstable, even when propagated in a recA⁻ strain of E. coli (Thompson, 1982; N.J.B., unpublished observations).

Fig. 7.1. (Facing) Extent of sequencing of valS. The B. stearothermophilus valS gene was sequenced using methods that have been described in Chapter 6. The region of DNA is drawn as a thin line, with the valS coding region depicted as a thick line. Key restriction sites are marked: B, BamHI; H, HindIII; P, PstI; and S, SalI. The arrows represent individual gel readings: sequences representing the upper (+) strand are shown at the top of the figure, with the (-) strand sequences at the bottom. A set of oligonucleotide primers (dashed arrows), complementary to the (+) strand of an M13 clone containing the 3.6 kb PstI fragment, were synthesised in order to sequence through the entire valS coding region (Table 6.1). The arrows represent schematically the approximate positions of priming, as opposed to the amount of sequence that was obtained from each one (about 300 bp). N.B. Primer VAL4 did not work and was resynthesised as VAL5 (Table 6.1).

The complete consensus sequence of the valS gene is shown in

Fig. 7.2. The consensus sequence was derived from a large body of overlapping sequence data by using the DBUTIL and DBAUTO programs of Staden (1980; 1982b), run on a VAX 11/750 mainframe.

The complete database is presented in Appendix 2. The longest open reading frame (ORF) is 2640 bp long, including the initiation AUG codon, and the frame is immediately followed by a TAG termination

Fig. 7.2. (Following four pages) DNA sequence of the B. stearothermophilus valS gene and adjacent regions. The DNA sequence was determined by the dideoxy chain-termination method (Sanger et al, 1977b; Sanger et al, 1980). The numbering refers to the start of the valS coding region, with +1 representing the first base of the initiation codon. The positions of the PstI sites (see Fig. 7.1) are shown underlined and are 3.65 kb apart. The sequences corresponding to a set of oligonucleotide primers, complementary to the plus strand of an M13 clone containing the entire gene (Section 7.2.1), are shown in bold and are labelled according to Fig. 7.1. Features upstream of the gene. The ribosome binding site (rbs) and the hexanucleotide promoter elements (-10 and -35) are underlined. The sequence GAAAAATGGT, repeated twice at -657 and -678, is marked by dotted underlines. T1 represents the centre of a region of secondary structure that could base-pair to form the stem of a hairpin loop (underlined). This closely resembles a rho-independent terminator (Platt, 1986). Features downstream of the gene. A second hairpin structure, T2, is denoted with underbars representing the stem of a potential rho-independent terminator. Termination codons (TAG or TGA) are denoted by an asterisk. The predicted protein sequence of the valS structural gene is shown below the DNA sequence, denoted by the standard one-letter code (see legend to Fig. 1.5).

-700 -670 CAGGCAATGAAACGAGATGGCCAATAAATGACATTCAAACGTCGTTGGCAGAAAAATGGT -640 -610 GCGGATCGCTTGAAAAATGGTAGCGGGATGATCACTTGGGCGCAGACGGCGCGCACTTGC -580 -550 CGAAGCCAAGGCGGCCATGAATACGGGTTGAATCGCCGCATCGGTGTTCCCTCCTTTTCC -520 -490 GGTTTACTATACAATATGAAACGTGCGGTGGTTTGTCACCAAATGCACACGGATGGAATA

(-35) -460 (-10) -430 AATAAATAAAAACTTGGCAAAGATGATGGACAATGGTAGTATAATAATCGAATCATGACG

-400 PstI -370 ACAAAAGCGAAGACGGGGAGGAGTACAGCGGTCCGCGCCTGCAGAGAGGGAAATC......

T1 -90 -60 GCAGCTCCTTTCGTCCTTTTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCAT

-30 vNterm rbs -1 TGCTTAGTGGGAAGCCGCCGTCGCCGCATTGTCCATCATGTTGAAGGAGGGAAATGAAAC

30 60 ATGGCACAGCACGAAGTGTCGATGCCACCCAAATACGATCATCGCGCCGTTGAAGCCGGG MAQHEVSMPPKYDHRAVEAG

90 120 CGCTACGAATGGTGGCTGAAAGGAAAATTTTTTGAAGCGACCGGTGATCCGAACAAACGA RYEWWLKGKFFEATGDPNKR

150 180 CCGTTTACGATCGTCATCCCGCCGCCCAACGTCACCGGCAAATTGCACTTAGGGCATGCG PFTI VIPPPNVTGKLHLGHA

210 240 TGGGATACGACGCTGCAAGACATCATTACGCGCATGAAGCGGATGCAAGGGTATGACGTT WDTTLQDI ITRMKRMQGYDV

val2 270 300 TTGTGGCTTCCGGGAATGGATCACGCCGGCATCGCCACCCAGGCGAAAGTCGAGGAAAAA LWLPGMDHAG I ATQAKVEEK

330 360 TTGCGCCAGCAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATTTTTAGAAGAAACG LRQQGLSRYDLGREKFLEET

390 420 TGGAAGTGGAAGGAAGAATACGCCGGCCATATTCGCAGTCAATGGGCGAAGTTAGGGCTT WKWKEEYAGHIRSQWAKLGL

450 480 GGGCTTGATTACACGCGCGAGCGGTTTACGCTTGACGAAGGGCTTTCCAAAGCGGTGCGC GLDYTRERFTLDEGLSKAVR

510 val3 540 GAAGTGTTCGTCTCGCTCTACCGGAAAGGGCTCATTTACCGCGGCGAATACATCATCAAC EVFVSLYR K G L I Y R G E Y I I N

570 600 TGGGATCCGGTGACGAAAACCGCGTTGTCGGACATTGAGGTTGTTTATAAAGAAGTGAAA WDPVTKTA LSD I E V V Y KEV K

630 660 GGCGCGCTCTACCATATGCGCTATCCGCTCGCCGACGGCTCCGGCTTCATCGAAGTGGCG GALYHMRY P L ADG S G F I EV A

690 720 ACGACCCGTCCGGAGACGATGCTCGGTGACACGGCCGTTGCCGTGCATCCGGATGACGAG TTRPETML G D TAV A V H PDD E

750 780 CGGTACAAGCACTTGATCGGCAAAATGGTGAAACTGCCGATCGTCGGCCGCGAAATTCCG R Y K H L I G K M V K L P I V G R E I P

810 840 ATCATCGCCGATGAGTACGTCGATATGGAATTCGGCTCTGGGGCGGTCAAAATTACGCCG I I A D E Y V D M E F G S G A V K I T P

val5 870 900 GCGCACGACCCGAACGACTTTGAAATCGGCAATCGCCACAACTTGCCGCGCATTCTCGTC AHDPNDFE I G NRH N L P RI L V

930 960 ATGAACGAAGACGGCACGATGAATGAAAACGCCATGCAATACCAGGGGCTTGACCGGTTT MNEDGTMN ENA MQ Y Q G L DR F

990 1020 GAATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAACAAGGCGTCCTCTTTAAAATCGAG E C R K Q I V R D L Q E Q G V L F K I E

1050 1080 GAGCACGTGCACTCAGTCGGTCATAGCGAACGAAGCGGCGCAGTTATTGAACCGTATTTG EHVHSVGH S E RSG A V I EPY L

val6 1110 1140 TCGACGCAATGGTTTGTGAAAATGAAGCCGCTCGCCGAAGCCGCCATCAAGCTCCAGCAA STQWFVKM K P L AE A A I KL Q Q

1170 1200 ACCGACGGCAAAGTGCAGTTCGTGCCGGAACGGTTTGAAAAAACGTATTTGCATTGGCTT TDGKVQFV PER F E K T Y L HW L

1230 1260 GAAAACATCCGCGACTGGTGCATTTCACGCCAGCTTTGGTGGGGGCATCGCATCCCGGCA E N I R D W C I S R Q L W W G H R I P A

1290 1320 TGGTACCATAAAGAGACGGGTGAAATTTACGTCGACCATGAACCGCCGAAAGACATCGAA WYHKETGE I Y V DH E P P KDI E

1350 val7 1380 AACTGGGAACAAGACCCAGATGTGCTCGACACATGGTTCAGCTCGGCGCTCTGGCCGTTC N W E Q D P D V L D T W F S S A L W P F

1410 1440 TCGACAATGGGCTGGCCGGATACCGACTCGCCGGATTACAAGCGCTACTACCCGACCGAT S T M G W P D TDSPD Y K R Y Y P T D

1470 1500 GTGCTGGTCACCGGTTATGACATCATTTTCTTCTGGGTGTCGCGCATGATTTTTCAAGGG V L V T G Y D I I F F W V S R M I F Q G

1530 1560 CTTGAATTCACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGGCCTCGTCCGCGAC L E F T G K R P F K D V L I H G L V R D

1590 val8 1620 GCCCAAGGGCGGAAAATGAGCAAGTCGCTCGGCAACGGCGTCGATCCGATGGATGTGATC A Q G R K M S K S L G N G V D P M D V I

1650 1680 GACCAGTACGGCGCCGATGCGCTCCGTTACTTCTTGGCGACCGGCAGCTCGCCGGGGCAA D Q Y G A D A L R Y F L A T G S S P G Q

1710 1740 GACTTGCGCTTCAGCACAGAAAAAGTCGAAGCGACGTGGAATTTTGCTAACAAAATTTGG D L R F S T E K V E A T W N F A N K I W

1770 1800 AACGCCTCGCGCTTTGCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGATTTG N A S R F A L M N M G G M T Y E E L D L

1830 val9 1860 AGCGGCGAAAAAACGGTCGCCGACCATTGGATTTTAACGCGTCTCAACGAAACGATCGAG S G E K T V A D H W I L T R L N E T I E

1890 1920 ACGGTGACGAAGCTCGCTGAGAAATACGAATTCGGCGAACGGGGGCGTACGCTGTACAAC T V T K L A E K Y E F G E R G R T L Y N

1950 1980 TTTATTTGGGACGACTTGTGCGACTGGTACATTGAAATGGCGAAATTGCCGCTTTACGGT F I W D D L C D W Y I E M A K L P L Y G

2010 2040 GACGACGAAGCGGCGAAAAAGACGACGCGCTCCGTGTTGGCGTATGTGCTCGACAACACG DDEAAKKTTRSVLAYVLDNT

2070 2100 ATGCGCCTGCTTCACCCGTTTATGCCGTTCATTACCGAGGAAATTTGGCAAAACTTGCCG MRLLHPFMPFITEEIWQNLP

val10 2130 2160 CATGAAGGCGAATCGATCACCGTCGCTCCGTGGCCGCAAGTGCGCCCTGAGCTGTCGAAC H E G E S I T V A P W P Q V R P E L S N

2190 2220 GAAGAAGCCGCGGAAGAAATGCGGATGCTTGTGGACATCATCCGCGCCGTCCGCAACGTC E E A A E E M R M L V D I I R A V R N V

2250 2280 CGCGCCGAAGTAAACACGCCGCCGAGCAAGCCGATTGCGCTCTATATTAAGACAAAAGAC R A E V N T P P S K P I A L Y I K T K D

2310 2340 GAGCACGTGCGGGCCGCGCTTTTGAAAAACCGCGCCTATCTTGAGCGGTTCTGCAACCCG EHVRAALLKNRAYLERFCNP

val11 2370 2400 AGCGAGCTCTTGATTGATACAAACGTTCCCGCGCCGGACAAAGCGATGACGGCGGTCGTC SELLI DTNVPAPDKAMTAVV

2430 2460 ACCGGCGCCGAGCTCATCATGCCTCTTGAAGGATTGATCAATATTGAGGAAGAAATTAAG TGAELIMPLEGLINIEEEIK

2490 2520 CGGCTTGAAAAAGAGCTTGACAAATGGAACAAAGAAGTCGAGCGCGTCGAAAAGAAACTG RLEKELDKWNKEVERVEKKL

2550 val12 2580 GCGAATGAAGGCTTTTTGGCGAAAGCGCCGGCTCATGTCGTCGAAGAAGAGCGGCGCAAG A N E G F L A K A P A H V V E E E R R K

2610 2640 CGGCAAGATTACATCGAAAAACGCGAAGCCGTCAAGGCGCGCCTCGCCGAGCTCAAACGG R Q D Y I E K R E A V K A R L A E L K R

2670 T2 2700 TAGACAAACGATCTGGCGGTTGATTATGGTTGATTATGATGAAGACGAATCCGCTTTCCT * * * * 2730 2760 GTGGATTCGTCTTTTTCGATGGATCATGATGGAAGGTTGGCATATTCTGAGAAGAAGGTT 2790 2820 TGTTGTCAACATATACTCGCTTTCGCCACGCTTCTCGATGTGTGGCCGCATCTGCCGCCA 2850 PstI 3170 TGAAGCCGGCAGGGGGCGGCAAGACACCAA___TTCCGATCCGTCGACCTGCAGCCA

codon. Three more termination codons are found in frame with the

ORF, 10, 12 and 13 codons respectively further downstream from the first termination codon. The occurrence of a group of termination codons at the end of a gene is a common characteristic of prokaryotic genes (Dr W.H.J. Ward, personal communication). The predicted amino acid sequence for this ORF, written in the conventional one-letter code, is shown underneath the DNA sequence. This ORF codes for a protein of predicted Mr = 102,030. This is in good agreement with previous estimates of Mr = 110,000 for the size of the ValRS from both

B. stearothermophilus and E. coli (Yaniv and Gros, 1970; Koch et al,

1974) and a more recent estimate for the E. coli ValRS of Mr =

103,000 (Skogman and Nilsson, 1984).
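The search for the longest open reading frame can be sketched as a scan of all six reading frames. This is an illustration only, not the ANALYSEQ routine; the function names and the choice to count the ORF length from the ATG up to, but not including, the stop codon are assumptions. On that convention the 2640 bp ORF reported above corresponds to 880 codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return (length, strand, start, stop_index) of the longest ATG...stop ORF.

    Coordinates refer to the strand that was scanned; the length excludes
    the termination codon.
    """
    seq = seq.upper()
    best = (0, None, None, None)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    if i - start > best[0]:
                        best = (i - start, strand, start, i)
                    start = None
    return best


# Toy example: the first nine valS codons followed by TAG, with arbitrary
# flanking bases (the flanks are invented for the illustration).
toy = "CC" + "ATGGCACAGCACGAAGTGTCGATGCCA" + "TAG" + "ACAAAC"
orf = longest_orf(toy)
```

Applied to the full consensus sequence, a scan of this kind would be expected to pick out the valS frame, though the numbering would then be relative to the start of the database sequence rather than to +1.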

The distribution of restriction sites within this region was analysed using the program ANALYSEQ (Staden, 1984; 1986) and found to agree well with the maps that were constructed for pNB1, pNB2.1 and pTB8 (Fig. 7.3). The main difference between the two maps is the presence of three SacI sites (Sc) to the right of the predicted map, in contrast to the one site found by restriction mapping. However, restriction with SacI would have generated two fragments of less than

200 bp in size and these would not have been detected by electrophoresis through agarose gels (Section 2.5.2).
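Predicting a restriction map from a sequence amounts to locating each recognition site and taking the distances between cuts. The sketch below is a simplified illustration rather than the ANALYSEQ routine: it records the first base of each recognition sequence instead of the exact cleavage position, and the names `map_sites` and `fragment_sizes` are assumptions. The recognition sequences listed are palindromic, so only one strand needs to be searched.

```python
# Recognition sequences of some of the enzymes used for mapping.
SITES = {
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "SalI": "GTCGAC",
}


def map_sites(seq, sites=SITES):
    """Return {enzyme: [0-based positions of each recognition site]}."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i)
            i = seq.find(site, i + 1)
        found[enzyme] = positions
    return found


def fragment_sizes(length, cuts):
    """Sizes of the fragments from cutting a linear molecule at 'cuts'."""
    edges = [0] + sorted(cuts) + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


# Last 24 bases shown in Fig. 7.2, which carry a SalI and a PstI site.
tail = "TTCCGATCCGTCGACCTGCAGCCA"
tail_map = map_sites(tail)
```

Fragments below the resolution of the agarose gel, such as the two small SacI fragments discussed above, would appear in the output of `fragment_sizes` but not on the experimentally derived map.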

7.2.2. Transcription Promoter and Termination Signals for valS.

Fig. 7.3. Comparison of the restriction map for the 3.6 kb PstI fragment of pTB8 with that predicted from the DNA sequence. The restriction map (Section 4.5.1) was compared to a map that was predicted from the DNA sequence using the program ANALYSEQ (Staden, 1984). The identities of the restriction sites are given in the legend to Fig. 4.7.

The promoter of a typical E. coli transcription unit may be regarded as containing three characteristic sequence elements. The startpoint from which RNA polymerase initiates transcription is usually a purine (A more common than G), often within the sequence CPuT (Lewin, 1983). This sequence is preceded by two highly conserved regions, the so-called "-10" region (Pribnow box) and the "-35" region

(Pribnow, 1975; Schaller et al, 1975) that are centred roughly 10 and 35 bp upstream from the startpoint and are separated by a 16-19 bp

AT-rich stretch of DNA (Siebenlist et al, 1980). These regions are recognised by RNA polymerase and, presumably, stabilise the interaction between the RNA polymerase and its DNA template. The effect of point mutations in these regions on the initiation of transcription in vivo implies that the integrity of their sequences contributes to the establishment of productive transcription (Rosenberg and Court, 1979). The consensus sequences for these two regions, based on an analysis of over 150 known E. coli promoters, are TATAAT and

TTGACA, respectively (Hawley and McClure, 1983). The -10 and -35 regions appear to be present in Bacillus promoters and are recognised by the major form of RNA polymerase in B. subtilis (Wong and Doi,

1984). Moran et al (1982) examined the promoters from nine individual

B. subtilis or B. subtilis phage genes and concluded that the E. coli consensus sequences for the -10 and -35 regions applied to the Bacillus promoters also.

The putative site for the initiation of transcription of the B. stearothermophilus valS is an adenosine in the sequence CAT (Fig. 7.2, underlined), 434 bp upstream of the initiation AUG. A hexanucleotide sequence, TATAAT, that agrees exactly with the canonical Pribnow box sequence, lies 10 bp upstream of CAT (same figure, underlined). The Pribnow box is separated by 20 bp from a further upstream sequence, TTGGCA, that is a good candidate for the -35 region (same figure, underlined). These three elements are the best, and only, match that could be found with the E. coli promoter consensus sequences and must represent the B. stearothermophilus valS promoter. This promoter lies within the 0.8 kb XmaI-PstI fragment in the plasmid pTB8, upstream of the 3.6 kb PstI fragment of pTB8 that carries the structural gene (Fig. 7.1). Significantly, the Class I M13 clones (Fig. 5.5b), which contain only the PstI fragment cloned in the same orientation as is depicted in Fig. 7.1, do not appear to produce B. stearothermophilus ValRS in vivo (Table 5.3). When the same fragment is cloned in the opposite orientation (Class II clones), valS is transcribed from the M13 lacZ promoter (Table 5.3). These observations are consistent with the valS promoter being contained on the XmaI-PstI fragment.
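The comparison with the E. coli consensus hexamers can be illustrated by a simple scan for a -35 and a -10 element separated by a permitted spacer. The sketch is not the procedure that was actually followed: the mismatch limit, the spacer range of 15-21 bp (chosen here so that the observed 20 bp spacing is included) and the function names are all assumptions.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, minus35="TTGACA", minus10="TATAAT",
                   spacers=range(15, 22), max_mismatch=1):
    """Return (pos35, pos10, spacer, total_mismatches) for candidate promoters."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], minus35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # first base of the candidate -10 hexamer
            if j + 6 > len(seq):
                break
            m10 = mismatches(seq[j:j + 6], minus10)
            if m10 <= max_mismatch:
                hits.append((i, j, spacer, m35 + m10))
    return hits


# The 60 bases of Fig. 7.2 that carry the proposed -35 and -10 elements.
region = "AATAAATAAAAACTTGGCAAAGATGATGGACAATGGTAGTATAATAATCGAATCATGACG"
candidates = find_promoters(region)
```

On this stretch the scan reports TTGGCA (one mismatch from TTGACA) followed after 20 bp by an exact TATAAT, in agreement with the assignment made in the text.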

The valS promoter closely resembles that of the tyrS gene from

B. stearothermophilus. The conserved sequences for this promoter are

AATAAT (-10 region) and TTGACA (-35 region) and the startpoint of transcription is sited 284 bp upstream of the structural gene (Waye and

Winter, 1986). However, these two promoters differ considerably from the sequences that have been proposed for the two promoters of the

B. stearothermophilus trpS gene (Barstow et al, 1986, in press).

The entire sequence was screened for inverted repeating sequences that could base-pair to form stable hairpin loops, using ANALYSEQ (Staden, 1984). The program parameters were set to search for hairpins with at least 8 bp in the stem and no limit on the size of the loop. The free energies of potential hairpin structures at 25 °C were calculated according to the rules of Tinoco et al (1973). There are only two significant hairpin structures in the valS non-coding region with ΔG values more negative than -10 kcal; the inverted repeat stems of these hairpins are depicted with horizontal underbars in Fig. 7.2. One hairpin (T2, same figure) is located 38 bp downstream from the first termination codon and has a predicted free energy of ΔG = -14 kcal at 25 °C. This structure closely resembles that of a rho-independent transcription terminator, found in many E. coli genes (Platt, 1986), in that the inverted repeat sequences are followed by a run of thymidines on the 3’ side of the hairpin stem. There is a run of 5 T’s, two of which are base-paired at the foot of the hairpin. Such structures have been located downstream of the genes encoding the E. coli GlnRS (Breton et al, 1986), E. coli GlyRS (Webster et al, 1983), and the TrpRS from B. stearothermophilus (Barstow et al, 1986, in press).

There is a second hairpin, again resembling a rho-independent terminator, located 87 bp upstream from the valS structural gene (T1, Fig. 7.2). The stem has a predicted ΔG value of -15.4 kcal. Potential rho-independent terminators have been found upstream of the trpS structural genes for both E. coli and B. stearothermophilus (Hall and Yanofsky, 1981; Barstow et al, 1986, in press), as well as upstream of the B. stearothermophilus tyrS gene (Waye and Winter, 1986). The function of these structures remains unclear. Both of the trp upstream terminator structures overlap with the trp promoters. In contrast, the valS upstream hairpin is sited roughly 300 bp downstream of the promoter, whilst two major tyrS hairpins are centred approximately 190 and 230 bp downstream of the promoter. The latter structure is centred around -50, relative to the tyrS initiator AUG, and was shown to be an efficient terminator of transcription in vitro (Waye and Winter, 1986). This was achieved by linking the tyrS upstream sequences to the galK structural gene, then deleting the hairpin loop by oligonucleotide-directed mutagenesis and analysing the lengths of transcripts that were initiated at the tyrS promoter by a run-off transcription assay (Travers et al, 1983), i.e., using a template that had been linearised by a restriction enzyme (in this particular case at an AvaI site downstream of the supposed terminator hairpin). The transcripts were resolved by electrophoresis on a 6% polyacrylamide/8 M urea gel. A 257 nucleotide (nt) transcript was observed; moreover, a transcript of the same length was observed when a full-length template was used, indicating that transcription was terminated at a specific point on the template. When the -50 hairpin was deleted, the 257 nt transcript disappeared and was replaced by a 522 nt read-through transcript. Deletion of this region causes a rise in the concentration of TyrRS by at least 2-fold. The second hairpin is centred at -98 but its role is uncertain. This hairpin may be deleted without affecting the termination of the 257 nt transcript at the -50 terminator, though the level of TyrRS increases by 50%. The tyrS gene does not appear to be regulated by its gene product, like the E. coli alaS (Putney and Schimmel, 1981) or trpS genes (Das and Yanofsky, 1984): in vitro studies indicate that adding TyrRS to the transcription run-off system does not alter the comparative ratio of the 257 and 522 nt transcripts (Waye and Winter, 1986).
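The inverted-repeat screen described at the start of this discussion can be sketched as follows. The code only finds perfectly Watson-Crick paired stems of a minimum length; it does not calculate free energies by the rules of Tinoco et al, and it caps the loop size for speed, whereas the search described above placed no limit on the loop. The function name and parameters are assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_hairpins(seq, min_stem=8, min_loop=3, max_loop=30):
    """Return (stem_start, stem_length, loop_length) for perfect hairpins."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for j in range(n):  # j: innermost base of the 5' arm of the stem
        for loop in range(min_loop, max_loop + 1):
            k = j + loop + 1  # k: innermost base of the 3' arm
            if k >= n:
                break
            if COMPLEMENT.get(seq[j]) != seq[k]:
                continue
            # Skip pairs that are not innermost, so each stem is reported once.
            if loop - 2 >= min_loop and COMPLEMENT.get(seq[j + 1]) == seq[k - 1]:
                continue
            stem = 0
            while (j - stem >= 0 and k + stem < n
                   and COMPLEMENT.get(seq[j - stem]) == seq[k + stem]):
                stem += 1  # extend the stem outwards from the loop
            if stem >= min_stem:
                hits.append((j - stem + 1, stem, loop))
    return hits


# Region around T2, taken from the sequence downstream of valS (Fig. 7.2).
t2_region = "GATGAAGACGAATCCGCTTTCCTGTGGATTCGTCTTTTTCG"
t2_hits = find_hairpins(t2_region)
```

For the T2 region the scan finds an 11 bp stem (AAGACGAATCC pairing with GGATTCGTCTT) around a 10-base loop. The last two bases of the 3' arm are the first two of the run of five T's, consistent with the description of T2 given above.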

The sequences surrounding the upstream hairpin structure of the cloned valS gene would not appear to form any other significant hairpins, essential to the formation of alternative secondary structures in a transcription attenuator such as the one found for the E. coli pheS,T operon (Springer et al, 1983). Neither is there any evidence for a valine-rich leader peptide preceding the structural region (see

Section 1.5.5 for a full description of the elements associated with an attenuator).

A 10 bp region with the sequence GAAAAATGGT occurs twice within the valS 5’ non-coding region. The repeats start at positions -678 and -657 respectively and are separated by 11 bp. The significance of these repeats is unknown, but the spacing between them is roughly equivalent to one complete turn of the double helix, such that the repeats would lie on the same face of the helix. A possible explanation for this repeat is that these sequences might form contact points for a protein that interacts with upstream valS elements. A recent example of the importance of helical periodicity came from a study of repeating sequences in the early promoter of simian virus 40 (SV40) (Takahashi et al, 1986). The repeating units (three of 21 bp) bind a protein factor that is necessary for transcription, Sp1 (Dynan and Tjian, 1983). Insertion or deletion of multiples of 5 bp between the repeats and the transcription startpoint had an adverse effect upon the efficiency of transcription, indicating a requirement for a topological alignment between the startpoint of transcription and the sequences that bind the protein factor.
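The helical phasing argument for the two GAAAAATGGT repeats can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helical repeat of 10.5 bp per turn is an assumed value for B-form DNA that is not stated in the text, and the function name is hypothetical. The 11 bp gap approximates one turn, while the start-to-start distance of 21 bp corresponds to two turns, so equivalent bases of each repeat would face the same side of the helix.

```python
# Assumed helical repeat of B-form DNA (bp per turn); not given in the text.
BP_PER_TURN = 10.5

def helical_phase(offset_bp, bp_per_turn=BP_PER_TURN):
    """Return (number of turns, angular offset in degrees, folded into 0-180)."""
    turns = offset_bp / bp_per_turn
    angle = (turns % 1.0) * 360.0
    if angle > 180.0:
        angle = 360.0 - angle
    return turns, angle

# Positions of the two repeats as quoted in the text.
first_start, second_start = -678, -657
repeat_len = len("GAAAAATGGT")

start_to_start = second_start - first_start   # distance between equivalent bases
gap = start_to_start - repeat_len             # bases lying between the two repeats

turns, angle = helical_phase(start_to_start)
print(f"gap = {gap} bp, start-to-start = {start_to_start} bp")
print(f"{turns:.2f} turns, angular offset {angle:.1f} degrees")
```

An angular offset near 0 degrees places the two sites on the same face; an offset near 180 degrees (for example a 5 bp displacement) would place them on opposite faces, which is consistent with the adverse effect of 5 bp insertions described for the SV40 promoter.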

7.2.3. Ribosome Binding Site for valS

A potential ribosome binding site (RBS) lies immediately upstream of the valS structural gene and agrees with the canonical E. coli sequence 5’ AGGAGG 3’ (Shine and Dalgarno, 1974). The RBS and the initiator AUG codon are separated by 10 bp. The RBS is part of a sequence that would base-pair to the 3’ end of B. subtilis 16S ribosomal RNA with a free energy (ΔG) of -15.8 kcal/mol at 25 °C, predicted using the rules of Tinoco et al (1973). This is in good agreement with previous estimates for the free energy of the RBS/rRNA interaction in Bacillus species of between -14 and -23 kcal/mol (Moran et al, 1982). The free energy of the interaction of the B. stearothermophilus tyrS RBS with B. subtilis 16S rRNA has been estimated recently to be -17.1 kcal/mol (Waye and Winter, 1986). The spacing of the RBS from the initiation codon agrees exactly with the average spacing that has been observed for a number of Bacillus genes (McLaughlin et al, 1981). These two factors, as well as the broad specificity of E. coli RNA polymerase for both E. coli and Bacillus promoter sequences (Wong and Doi, 1984), may be why Bacillus genes are often expressed well in E. coli. In contrast, E. coli genes are usually poorly expressed in B. subtilis (Band and Henner, 1984).

The 3’ end sequences of the 16S rRNA from both E. coli and B. subtilis are almost identical (Gold et al, 1981) and contain sequences that are complementary to AGGAGG. It has been suggested that B. subtilis requires a higher level of stringency in the pairing of its rRNA to an RBS in order to form a tight association between the ribosome and its transcript (McLaughlin et al, 1981): this is reflected partly by the difference in the free energies for the RBS/rRNA interactions between Bacillus species (-14 to -23 kcal/mol; Moran et al, 1982) and the analogous interaction in E. coli (average ΔG = -11 kcal/mol; Gold et al, 1981). The spacing between the RBS and the initiator AUG in B. subtilis appears to have a profound effect upon the level of gene expression from that transcript, whereas the spacing in an E. coli transcript appears to be less critical (Band and Henner, 1984). Indeed, the RBS/AUG spacing in E. coli varies from 3 to 12 bp (Harris, 1983). Therefore, the E. coli ribosome should form an efficient initiation complex with the cloned valS transcript.

The level of expression of valS is probably lower from the induced Class II M13 clones that contain valS under the control of the lacZ promoter than it is from the plasmid pTB8, even though the phage-borne gene is undoubtedly present in the infected cell at a higher copy number (Hohn et al, 1971). The high level of expression from pTB8 is possibly due to the fact that the gene is transcribed from its own promoter, which agrees well with the ideal E. coli and Bacillus promoters with regard to the -10 and -35 consensus sequences, as well as to the strong RBS and its spacing from the start of the valS structural gene. The lacZ promoter sequences have poor homologies to the canonical -10 and -35 sequences by comparison (Reznikoff and Abelson, 1980).
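The RBS/AUG spacing discussed above can be measured directly from a transcript sequence. The following is a minimal sketch rather than the procedure used in this work: the function name and the example transcript are hypothetical, and the spacing is assumed to be the number of nucleotides lying between the last base of the AGGAGG motif and the A of the initiator AUG.

```python
SD_MOTIF = "AGGAGG"  # canonical Shine-Dalgarno sequence quoted in the text

def rbs_spacing(mrna, motif=SD_MOTIF, start="AUG"):
    """Nucleotides between the end of the first motif and the next start codon.

    Returns None if either element is absent.
    """
    i = mrna.find(motif)
    if i < 0:
        return None
    motif_end = i + len(motif)
    j = mrna.find(start, motif_end)
    if j < 0:
        return None
    return j - motif_end

# Hypothetical transcript fragment (not the valS sequence): a leader, the
# motif, a 10 nt spacer and the first three codons of an ORF.
example = "UUAAC" + "AGGAGG" + "UGAUUUAACC" + "AUGGCGCAG"

spacing = rbs_spacing(example)
print(spacing)
# Range of spacings reported for E. coli in the text (Harris, 1983).
print(3 <= spacing <= 12)
```

A fuller treatment would also score the pairing of the flanking bases with the 3’ end of the 16S rRNA, which requires a set of nearest-neighbour free energy parameters and is not attempted here.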

7.3. Codon Usage.

7.3.1. Introduction

The codon usage of the major ORF was analysed using the

ANALYSEQ program of Staden (1984). Firstly, the codon usage was compared with that of the TyrRS and TrpRS from B. stearothermophilus (Winter et al, 1983; Barstow et al, 1986, in press) and the TyrRS from Bacillus caldotenax (Jones et al, 1986). The use of G or C in the third or "wobble" position of the codon was examined, as thermophilic bacteria appear to have a higher G+C content than mesophilic bacteria (Kagawa et al, 1984). Additionally, the use of C or U in the wobble position has been correlated with the level of expression of E. coli genes (Ikemura, 1981; Grosjean and Fiers, 1982; Gouy and Gautier,

1982). The summed codon usage for the aforementioned synthetases from B. stearothermophilus, expressed as the number of codons inserted per thousand (after Gouy and Gautier, 1982), was next compared with the summed codon usage for five E. coli aminoacyl-tRNA synthetases, which were themselves compared with consensus values for strongly or weakly expressed genes of E. coli (Grosjean and Fiers, 1982).

Table 7.1

Comparison of Codon Usage in Thermophilic Aminoacyl-tRNA Synthetases

         U                      C                      A                      G
         Bst Bst Bca Bst        Bst Bst Bca Bst        Bst Bst Bca Bst        Bst Bst Bca Bst
         Trp Tyr Tyr Val        Trp Tyr Tyr Val        Trp Tyr Tyr Val        Trp Tyr Tyr Val

U   F      5  12  12  16   S      0   3   2   1   Y      8   4   3   9   C      0   0   0   0   U
    F      3   7   7  15   S      3   0   0   3   Y      6  10  11  25   C      3   2   2   4   C
    L      0   3   2   5   S      1   2   3   2   Term                   Term                   A
    L      7  12  13  21   S      7   6   5  14   Term                   W      3   6   6  27   G

C   L      6  10  10  18   P      1   0   0   2   H      3   3   3  13   R      0   3   3   5   U
    L      5  10  10  23   P      1   0   0   3   H      4   4   3  11   R      7  19  19  33   C
    L      1   1   1   0   P      3   2   2   2   Q     12  12  12  16   R      2   1   0   2   A
    L      9   7   8   9   P      9   9   9  39   Q      6   5   5   9   R     11   5   6  17   G

A   I     11  13  13  23   T      0   0   0   0   N      2   4   4   5   S      0   1   1   1   U
    I     16  14  14  28   T      6   4   3  14   N      7   8   9  24   S      5   7   7   9   C
    I      0   1   1   0   T      1   2   3   5   K     17  18  17  40   R      0   0   0   0   A
    M     10   6   5  27   T     10  23  21  29   K      5   8   9  19   R      0   1   1   0   G

G   V      4   4   3   7   A      4   3   3   4   D      3   8   9  21   G      1   0   1   6   U
    V     12  10   9  27   A     10   9   9  26   D     13  15  14  34   G     11  19  19  29   C
    V      0   1   0   1   A      4   3   3   3   E     15  17  18  60   G      3   4   3   4   A
    V     13   5   8  23   A     10  19  21  28   E     14  22  21  24   G      5  13  13  15   G

Table 7.1. Codon Usage of 4 Bacillus spp. Aminoacyl-tRNA Synthetases. TrpRS (Barstow et al, 1986, in press), 327 codons; TyrRS (Winter et al, 1983), 419 codons and ValRS (Brand, this chapter), 880 codons, are all enzymes from B. stearothermophilus. Bca Tyr is the TyrRS from the extreme thermophile Bacillus caldotenax (Jones et al, 1986), 419 codons. The code is read as follows: left-hand column (first base), horizontal row (second base) and right-hand column (third base). The amino acid represented by a particular codon is shown to the immediate left of the values for that codon; the standard one-letter code is used (see legend to Fig. 1.5). "Term" denotes a termination codon. Tables 7.3 and 7.4 should be read in the same manner.

7.3.2. Codon Preferences of valS and Comparison to other Bacillus spp. Aminoacyl-tRNA Synthetase Genes

The codon preference of the B. stearothermophilus large ORF (which will be referred to from now on as valS) is compared with the codon usage of a number of other thermophilic aminoacyl-tRNA synthetases (Table 7.1). The synthetases do not differ significantly from one another in terms of preference for certain codons. However, there appear to be certain trends that may be tentatively regarded as peculiar to thermophilic Bacilli or to this group of synthetases. For example, all four synthetases show an absolute preference for UGC over UGU for cysteine and avoid using ACU (thr). The codon GUA (val) is also avoided, GUC or GUG being used preferentially. This suggests that there is a preference for placing C or G in the third position of the codon. Such preferences also apply to the following codon quartets or sextets: serine (AGC or UCG over the other four ser codons); threonine (ACG > ACC > ACA); alanine (GCC and GCG > GCU, GCA); arginine (CGC and CGG over the other four codons), and glycine (GGC and GGG > GGU, GGA). Similarly, for the tyrosine and aspartate doublets, UAC > UAU (tyr) for the two tyrS genes and valS (but not the trpS gene) and GAC > GAU (asp).
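The "codons per thousand" normalisation used for the codon usage tables in this section can be illustrated with a short sketch. It is not the ANALYSEQ implementation; the function names and the short test ORF are hypothetical. The worked figure at the end pools the CGC counts for TrpRS, TyrRS and ValRS, taken here as 7, 19 and 33 from Table 7.1, over the stated total of 1626 codons.

```python
from collections import Counter

def codon_counts(orf):
    """Count in-frame codons of an ORF (DNA or RNA), reported as RNA codons."""
    orf = orf.upper().replace("T", "U")
    usable = len(orf) - len(orf) % 3          # ignore any trailing partial codon
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))

def per_thousand(counts):
    """Express codon counts as occurrences per thousand codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}

def summed_usage(*count_tables):
    """Pool the counts from several genes before normalising."""
    pooled = Counter()
    for table in count_tables:
        pooled.update(table)
    return per_thousand(pooled)

# Hypothetical four-codon ORF, used only to exercise the functions.
toy = codon_counts("ATGGCGCAGTAA")
print(per_thousand(toy))

# Pooled CGC usage for the three B. stearothermophilus synthetases.
cgc_per_thousand = 1000.0 * (7 + 19 + 33) / 1626
print(round(cgc_per_thousand, 1))
```

Pooling the raw counts before normalising weights each gene by its length, so the 880-codon valS contributes more to the summed figure than the shorter trpS and tyrS genes.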

7.3.3. Bias Towards G or C in the Wobble Position of the Codon in

Thermophilic Synthetase Genes

The frequency of use of G or C in the third position of the codon may be compared for a number of synthetases, including those

E. coli synthetases that have been sequenced. Barstow et al (1986, in press) concluded that the B. stearothermophilus trpS gene shows a definite bias towards G or C in the third position, in common with a selection of thermophilic genes from various Bacillus species, even though the G+C content of B. stearothermophilus DNA (49-53%) is not significantly different from the E. coli content of 48-52% (Donk, 1920).

The frequency of G+C at the third base in the codon for valS, together with the overall percentage of G+C in the structural gene, is shown in Table 7.2. Similar data for a selection of other thermophilic genes (including the B. stearothermophilus trpS and tyrS genes), E. coli synthetases and an average of 64 E. coli genes (Grosjean and Fiers, 1982) are included for comparison. The total G+C content of the four B. stearothermophilus genes appears to be slightly higher than that of the E. coli synthetases and the E. coli average figure. The major difference appears to lie in the use of G or C in the third position of the codon. The enzymes from B. stearothermophilus tend, in general, to have roughly 15-30% more G or C in the third position than the E. coli enzymes in this study. Other Bacillus species such as B. licheniformis (included here as Winter et al (1983) compared the codon usage of the penicillinase from this bacterium with that of B. stearothermophilus tyrS) appear to resemble E. coli in their use of G or C in the third position.

Table 7.2

Comparison of the G+C Content of the B. stearothermophilus valS with other Thermophilic and Mesophilic Genes

Gene                                %G+C      %G+C           Reference
                                    (total)   (third base)

trpS (Bst)                          54        69             Barstow et al (1986)
tyrS (Bst)                          54        68             Winter et al (1983)
valS (Bst)                          55        69             This thesis
neutral protease (Bst)              58        72             Takagi et al (1985)
lactate deH (Bst)                   55        60             Barstow et al (1986, in press)
penicillinase (B. licheniformis)    44        64             Neugebauer et al (1981)
leuB (T. thermophilus)              70        89             Kagawa et al (1984)

trpS (E. coli)                      52        60             Hall et al (1982)
tyrS (E. coli)                      52        58             Barker et al (1982a)
alaS (E. coli)                      53        56             Putney et al (1981a)
average of 64 E. coli genes         53        55             Grosjean and Fiers (1982)

Comparison of G+C content (total) and percentage of G+C inserted in the third position of the codon for a selection of Bacillus sp. genes, including the three cloned B. stearothermophilus aminoacyl-tRNA synthetases (Bst). The G+C values for an extreme thermophile, Thermus thermophilus, are also shown. The G+C contents of selected E. coli synthetases are shown, together with an average of 64 E. coli genes (taken from Grosjean and Fiers, 1982).

The sample of thermophiles used by Barstow and co-workers reflects a sample of Bacillus species, rather than thermophiles. The genes that are presented in Table 7.2 are, potentially, a more representative selection of genes, contrasting the synthetases from E. coli with those from B. stearothermophilus, as well as including values for an extreme thermophile, the gene encoding the 3-isopropylmalate dehydrogenase (leuB) from Thermus thermophilus (Kagawa et al, 1984). This bacterium is capable of growing at

temperatures exceeding 80 °C and clearly shows a preference for G or

C in the wobble position, as well as having a higher total G+C

content. If taken together with the B. stearothermophilus data (including

the neutral protease), this suggests that thermophilic organisms maximise

the stability of their genetic material by increasing the G+C content,

exploiting the degeneracy of the genetic code so as

to cause the minimum change in the primary structure of the protein.

The amino acid variations that are seen amongst the same enzyme from

different organisms (both thermophiles and mesophiles) often correlate

with the increased thermostability of the thermophilic enzyme.

Evidence to support such trends can be seen for the enzyme

glyceraldehyde-3-phosphate dehydrogenase. The enzymes from

B. stearothermophilus and the extreme thermophile Thermus aquaticus

(optimal growth temperature, 70-75 °C) resemble their mesophilic

equivalents in function (the two proteins share 61% homology), but

differ in structure by having extra salt bridges which appear to hold

the subunits together (Walker et al, 1980). The higher proportion of

charged residues on the surface of the T. aquaticus enzyme implies that

surface salt bridges play a major role in contributing to the greater

thermostability of that enzyme. Such a phenomenon has also been

reported for thermophilic ferredoxins (Perutz and Raidt, 1975). The cellular RNA molecules of an extreme thermophile may also be stabilised by increasing the G+C content; T. thermophilus tRNAs have a higher proportion of G, C and modified bases than their E. coli equivalents (Kagawa et al, 1984).
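The two quantities compared in Table 7.2, the overall G+C content of a structural gene and the G+C content of the third codon position, can be computed as in the sketch below. The function names and the short test sequence are hypothetical and are not taken from valS.

```python
def gc_content(seq):
    """Percentage of G and C residues in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def gc3_content(orf):
    """Percentage of codons in an ORF whose third base is G or C."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # ignore any trailing partial codon
    third_bases = orf[2:usable:3]
    return 100.0 * sum(base in "GC" for base in third_bases) / len(third_bases)

# Hypothetical four-codon ORF: ATG GCG CAG GAA
toy_orf = "ATGGCGCAGGAA"
print(round(gc_content(toy_orf), 1), gc3_content(toy_orf))
```

Because most amino acids can be encoded with any of several third bases, a gene can raise its third-position G+C well above its overall G+C without altering the protein sequence, which is the pattern Table 7.2 shows for the B. stearothermophilus genes.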

7.3.4. Comparison Between the Codon Usage of Synthetases from

B. stearothermophilus and E. coli.

Table 7.3 shows a comparison of the sum of the B. stearothermophilus synthetase codon preferences (ΣBst), expressed as the frequency of insertion of each amino acid during translation in codons inserted per thousand (Gouy and Gautier, 1982), versus the sum for five E. coli synthetases, ΣEc. The five E. coli synthetase sequences (references given in Table 7.3 legend), like the four thermophilic synthetases, do not show any major deviations in the use of a particular codon from one enzyme to another (data not shown). The total sample sizes for the B. stearothermophilus and E. coli genes were 1626 and 3463 codons, respectively.

Table 7.3

Comparison of Average Codon Usage for 3 B. stearothermophilus (a) and 5 E. coli (b) Aminoacyl-tRNA Synthetases

         U                C                A                G
         ΣBst  ΣEc        ΣBst  ΣEc        ΣBst  ΣEc        ΣBst  ΣEc

U   F      20   17   S       3   12   Y      12   14   C       0    4   U
    F      16   32   S       4   10   Y      25   18   C       6    8   C
    L       5    5   S       3    3   Term    1    1   Term             A
    L      25    9   S      17    4   Term             W      24   12   G

C   L      21    4   P       2    4   H      12    7   R       5   30   U
    L      23    6   P       3    3   H      12   13   R      36   24   C
    L       1    1   P       4    6   Q      25    9   R       3    1   A
    L      16   68   P      35   28   Q      12   36   R      20    1   G

A   I      29   22   T       0   14   N       7   11   S       1    4   U
    I      36   26   T      15   25   N      24   28   S      13   16   C
    I       1    2   T       5    2   K      46   41   R       0    0   A
    M      26   25   T      38   10   K      20   18   R       1    1   G

G   V       9   21   A       6   21   D      20   35   G       4   31   U
    V      30   10   A      28   21   D      38   29   G      36   34   C
    V       1   11   A       6   20   E      57   48   G       6    2   A
    V      25   25   A      35   36   E      37   21   G      20    4   G

(a) ΣBst. Sum of codons, expressed as codons per thousand (Gouy and Gautier, 1982), corresponding to the mRNAs of the following B. stearothermophilus aminoacyl-tRNA synthetases: TrpRS (Barstow et al, 1986, in press); TyrRS (Winter et al, 1983), and ValRS (Brand, this thesis). The total number of codons was 1626.

(b) ΣEc. Sum of codons for five E. coli aminoacyl-tRNA synthetases: AlaRS (Putney et al, 1981); GlyRS (Webster et al, 1983); MetRS (Barker et al); TrpRS (Hall et al, 1982) and TyrRS (Barker et al, 1982). The total number of codons was 3463. The values shown are expressed as codons per thousand.

Barstow et al (1986, in press) have noted that the codon preference of the trpS gene from B. stearothermophilus bears a stronger resemblance to that of the tyrS gene from the same organism than to the E. coli trpS gene. This observation can be extended by comparing the summed codon preferences for the two groups of synthetases. Certain codon preferences agree with the observations of Barstow and co-workers for the B. stearothermophilus trpS gene. For example, E. coli synthetases prefer CAG over CAA (gln), whereas the reverse situation appears to apply to the B. stearothermophilus synthetases. The E. coli trpS gene prefers GUG over the other val codons but the summed codon preference for E. coli synthetases appears to be GUU and GUG with equal frequency over the other two val codons. Both groups of synthetases show a definite bias towards using CCG (pro) above the other pro codons, a fact that does not correlate with whether a particular gene is strongly or weakly expressed (Section 7.3.5; Grantham et al, 1981; Grosjean and Fiers, 1982), and a bias for AAA (lys) over AAG, again independent of the level of gene expression. These two cases probably result from the preferred codons being recognised by the major E. coli tRNAPro and tRNALys species, respectively. Other significant differences are as follows, with the E. coli preference given first: leucine, CUG > all others (UUG, CUU and CUC are preferred by B. stearothermophilus); cysteine, UGU and UGC are both used, with a slight bias towards the latter codon (UGC only in B. stearothermophilus); serine, UCU, UCC, AGC > all others (UCG and AGC over the other ser codons for B. stearothermophilus); alanine, all codons used, with preference for GCG (definite bias towards GCC and GCG in B. stearothermophilus synthetases), and glycine, GGU, GGC > others (GGC, GGG in B. stearothermophilus).

The most pronounced distinction concerns the arginine codon family. E. coli synthetases avoid those codons that correspond to the rarer arg isoaccepting species, i.e. those tRNAs that recognise CGA, CGG, AGA and AGG (Ikemura, 1981). However, the synthetases of B. stearothermophilus appear to use CGC and CGG predominantly. This probably reflects the preference of a thermophilic bacterium to increase the amount of G or C in the third position of the codon; most of the B. stearothermophilus codon preferences that are given above reflect this trend. It appears unlikely that the high use of the tRNA species decoding CGG affects the expression of B. stearothermophilus synthetase genes that have been cloned and expressed in E. coli, as both the B. stearothermophilus tyrS and valS genes are expressed at high levels from plasmids in E. coli (Barker, 1982; Section 5.3, respectively). The tyrS gene is also expressed at a high level (50% total cell protein) when cloned into M13 and expressed in E. coli (Winter et al, 1982).

7.3.5. The B. stearothermophilus valS Shows Codon Preferences that

Resemble those of a Strongly Expressed E. coli Gene

Table 7.4, adapted from Grosjean and Fiers (1982), compares the codon usage of 24 strongly expressed (e.g. those corresponding to abundant proteins such as RNA polymerase and ribosomal proteins) and

18 weakly expressed genes (such as repressor proteins) from E. coli.

The total number of codons in each group was roughly 5200, and the codon preference is represented as the number of times that a particular codon occurs in every thousand codons. A clear correlation between third base degeneracy and whether a gene is expressed strongly or weakly is seen (boxed codons). Weakly expressed genes (WE-genes) tend to use the minor tRNA species more than strongly expressed genes

(SE-genes). Post et al (1979) showed that the genes of ribosomal proteins of E. coli (SE-genes) used those codons that corresponded to the major tRNA species. Later, Ikemura was able to determine the levels of most of the tRNA species in E. coli by 2-dimensional polyacrylamide gel electrophoresis and demonstrated that there was a direct relationship between the intracellular tRNA concentrations and the codon usage in

E. coli, the major isoaccepting tRNA families being used almost exclusively over the minor species (Ikemura, 1981; Post and Nomura,

1980). Fiers and co-workers, studying the codon preferences of the coliphage MS2 (Min Jou et al, 1972), proposed that the minor tRNA species may regulate protein synthesis and reduce the chance of misinsertion or frameshift mutations (Grosjean and Fiers, 1982).

Table 7.4

Comparison of Codon Usage for Strongly and Weakly Expressed Genes in E. coli

         U                  C                  A                  G
         Strong Weak        Strong Weak        Strong Weak        Strong Weak

U   F         8   29   S        18    7   Y         7   18   C         3    7   U
    F        22   20   S        17    9   Y        19   13   C         5    8   C
    L         2   14   S         1    7   Term               Term               A
    L         3   12   S         2   12   Term               W         5        G

C   L         5   14   P         4    6   H         4   18   R        43   19   U
    L         6   13   P         0    9   H        14   11   R        19   26   C
    L         1    4   P         5    9   Q         7   17   R         1    5   A
    L        66   56   P        31   19   Q        32   32   R         0    8   G

A   I        12   30   T        20    9   N         3   19   S         2   11   U
    I        50   23   T        26   23   N        30   19   S         9   12   C
    I         0    5   T         3    6   K        49   31   R         1    5   A
    M        27   25   T         5   15   K        20    9   R         0    3   G

G   V        37   21   A        33   16   D        22   35   G        43    2   U
    V         8   13   A         9   34   D        39   20   G        33    2   C
    V        23    9   A        23   21   E        63   40   G         1    8   A
    V        16   24   A        25   29   E        20   19   G         3   13   G

The data are adapted from Grosjean and Fiers (1982) and are expressed as codons per thousand. Σ strong: the sample size of abundant (strongly expressed) mRNAs was 24. The total number of codons was 5253. Σ weak: the sample size for mRNAs corresponding to non-abundant proteins (i.e. weakly expressed genes) was 18. Total number of codons was 5231. The boxed codons are those that are implicated as being characteristic of strong or weak expression.

Gouy and Gautier (1982) described a plausible system of E. coli gene expression that is based on both the availability of tRNA species and the strength of the codon:anticodon interaction. Briefly, the speed and efficiency of translation depend upon two factors. Firstly, the various aminoacylated tRNAs are bound up with GTP and EF-Tu to form quaternary complexes that interact with the ribosome, which is temporarily stalled over a codon in the entry (A) site of the ribosome. The probability that the correct codon:anticodon match will be made depends upon the relative concentration of the tRNA molecules. Thus, genes containing codons that are recognised by the major tRNA species should sort through the intracellular pool of aminoacyl-tRNAs with fewer non-productive encounters before making the correct codon:anticodon pairing, resulting in translocation of the cognate amino acid onto the nascent polypeptide chain and the ribosome progressing to the next codon. Secondly, it appears that the choice of nucleotide in the wobble position is biased towards a codon:anticodon interaction of intermediate strength (Ikemura, 1981; Grosjean and Fiers, 1982; Gouy and Gautier, 1982). The reasons for this are not clear but it could affect protein synthesis in two ways: by assuring that the frequency of low-strength interactions is minimised, so as to prevent misincorporation, and by reducing the number of excessively strong interactions. This latter point implies that a stronger interaction may increase the duration of the translocation step and, hence, reduce the overall rate of translation. This is debatable, but there does appear to be a correlation between third base degeneracy and expression, as SE-genes tend to form moderately strong codon:anticodon interactions.

The boxed codons in Table 7.4 may be split into two categories: those in which the codon:anticodon interaction for the first two bases of the codon would involve four hydrogen bonds (i.e. AAN, AUN, UAN and UUN) and those which involve six hydrogen bonds (GGN, GCN, CGN and CCN). Clearly, SE-genes prefer to place a C in the wobble position of the first group and a U for the third base in the second group. The WE-genes show the opposite preferences. These data, from Grosjean and Fiers (1982), were based on a sample of 24 SE- and 18 WE-genes. Gouy and Gautier (1982) corroborate the codon usage for the SE-genes using a larger mRNA pool, derived from 64 E. coli SE-genes. This particular preference for either C or U in the third position occurs despite the fact that each pair of codons is recognised by the same tRNA. For example, the ile codons AUU and AUC are recognised by the same isoaccepting species (anticodon GAU), as a G in the wobble position of the anticodon recognises both C and U in the corresponding position in the codon. In summary, strongly expressed genes appear to optimise their expression by using a selective system of codon preference, based upon choosing codon:anticodon pairings of medium strength and by using major isoaccepting species of tRNAs where possible.
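The rule of intermediate pairing strength described above can be written out explicitly. The sketch below is illustrative: the function names are hypothetical, and the hydrogen bond counts simply assume two bonds for an A:U pair and three for a G:C pair. It classifies a codon by its first two bases and returns the pyrimidine that SE-genes are described as favouring in the wobble position.

```python
# Watson-Crick hydrogen bonds contributed by each codon base (assumed values).
H_BONDS = {"A": 2, "U": 2, "G": 3, "C": 3}

def doublet_h_bonds(codon):
    """Hydrogen bonds formed by the first two bases of a codon."""
    codon = codon.upper().replace("T", "U")
    return H_BONDS[codon[0]] + H_BONDS[codon[1]]

def favoured_wobble(codon):
    """Wobble pyrimidine favoured in SE-genes under the rule in the text.

    Returns "C" for weak doublets (4 bonds), "U" for strong doublets
    (6 bonds) and None for mixed doublets (5 bonds), which the rule
    does not cover.
    """
    bonds = doublet_h_bonds(codon)
    if bonds == 4:
        return "C"   # weak doublet compensated by a stronger third base
    if bonds == 6:
        return "U"   # strong doublet compensated by a weaker third base
    return None

for codon in ("AUU", "UUU", "GGC", "CGC", "GAU"):
    print(codon, doublet_h_bonds(codon), favoured_wobble(codon))
```

The ile example in the text falls in the first class, so AUC is expected to be favoured over AUU in SE-genes, in line with the values of 50 and 12 per thousand in Table 7.4.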

If one compares the data for WE- and SE-genes with the summed codon usage for the E. coli synthetases, it is apparent that, with few exceptions, the synthetases are expressed in a manner similar to SE-genes. Accordingly, their expression could be viewed as moderate to strong. This is not surprising, as the aminoacyl-tRNA synthetase genes are "housekeeping" genes, i.e. genes whose products play a central role in replication, DNA repair, transcription or protein synthesis, such as ribosomal proteins, recA or RNA polymerase (all abundant and strongly expressed genes). On the whole, expression is closer to SE-genes, especially in the case of the codons for his, gln, asn, asp and pro. This conclusion may also be drawn for the B. stearothermophilus synthetases, taking into account that one must superimpose the B. stearothermophilus bias towards G or C in the wobble position upon any comparisons that are drawn with E. coli gene expression.

An important guide to expression is that SE-genes have a dramatically biased preference for certain codons in a family. If one compares the differences between the two levels of expression in Table 7.4, it is apparent that in the case of SE-genes, the distinction between the numbers of preferred codons used in relation to the minor codons in a particular quartet or sextet is more pronounced than in the case of the WE-genes. For example, the major arginine codons CGU and CGC are used to the almost complete exclusion of the four minor arg codons in SE-genes. By contrast, though CGU and CGC are the preferred arg codons for WE-genes, the minor codons are used more extensively (almost 1 in 3 arginine residues will be transferred to the ribosome A-site by a minor tRNAArg species). This correlates with the views of Post et al (1979) and Ikemura (1981), mentioned earlier, that the minor tRNA species are used as one mechanism of modulating gene expression.
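The figure of "almost 1 in 3" for the use of minor arginine codons in WE-genes can be reproduced from the arginine entries of Table 7.4. The sketch below takes those entries as read from the table (the strong CGU value is assumed to be 43); the dictionary and function names are hypothetical.

```python
# Arginine codon usage per thousand codons, as (strong, weak), from Table 7.4.
ARG_USAGE = {
    "CGU": (43, 19),
    "CGC": (19, 26),
    "CGA": (1, 5),
    "CGG": (0, 8),
    "AGA": (1, 5),
    "AGG": (0, 3),
}
MAJOR_ARG_CODONS = {"CGU", "CGC"}

def minor_fraction(column):
    """Fraction of arginine codons that are minor codons.

    column: 0 for strongly expressed genes, 1 for weakly expressed genes.
    """
    total = sum(values[column] for values in ARG_USAGE.values())
    minor = sum(values[column] for codon, values in ARG_USAGE.items()
                if codon not in MAJOR_ARG_CODONS)
    return minor / total

strong_minor = minor_fraction(0)
weak_minor = minor_fraction(1)
print(f"SE-genes: {strong_minor:.3f}  WE-genes: {weak_minor:.3f}")
```

On these figures the minor codons account for 2 of 64 arginine codons per thousand in SE-genes and 21 of 66 in WE-genes, i.e. roughly 3% against 32%.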

7.3.6. Discussion

The level of expression for weakly or strongly expressed genes often correlates directly with the number of molecules of gene product in the cell, assuming that the rates of breakdown for the proteins from these genes are similar. Data on the numbers of E. coli proteins encoded by various SE- or WE-genes, reviewed in Gouy and Gautier

(1982), show that SE-genes such as outer membrane lipoprotein (lpp),

EF-Tu, recA and various ribosomal proteins are present at between

330,000 (lpp) and 9,000 (most ribosomal proteins) molecules per genome.

WE-gene products, such as the repressors of the lac and ara operons, are present at 10 and 100 copies per genome, respectively. The aminoacyl-tRNA synthetases tend to be present at a modest level in the cell. Pedersen et al (1978) quantitated the numbers of molecules per cell (expressed as the number per genome equivalent) of over 100

E. coli proteins, including a number of synthetases (described in

Section 1.5.4). The results for the synthetases are shown in Table 7.5, together with the amount of each synthetase expressed as a fraction of the total cell protein. Subsequently, the synthetases were grouped with a number of proteins (such as those encoded by the SE-genes for

ribosomal proteins, elongation factors and RNA polymerase) that increase

in concentration with the growth rate of the cells.

The conclusions that have been presented in the preceding sections,

that the B. stearothermophilus aminoacyl-tRNA synthetases show codon

preferences that are typical of E. coli strongly expressed genes, disagree

with the conclusions that Barstow et al (1986, in press) drew for

expression of the B. stearothermophilus trpS gene. Those authors

concluded that the TrpRS (and also TyrRS from the same organism)

were regulated weakly, based on a comparison of the codon preferences

for the trpS genes from both B. stearothermophilus and E. coli and the

tyrS of B. stearothermophilus. They suggested that a pattern of poor

gene expression, based on observed codon preferences, might account for

the apparent low level (2 x 10^-3 with respect to total cell proteins) of

the B. stearothermophilus TrpRS in vivo (Atkinson et al, 1979). Hall

et al (1982), reporting the sequence of the E. coli trpS gene, noted that

the estimated number of TrpRS molecules in the cell did not correlate

with the gene's codon usage, which suggested a strongly expressed gene

(i.e. using major tRNA species and, in theory, able to be translated at

a maximal rate). The number of TrpRS molecules in the E. coli cell

had been estimated to be 500-1100 per cell (Calendar and Berg, 1966).

Table 7.5

Cellular Levels of Aminoacyl-tRNA Synthetases of E. coli (a)

Aminoacyl-tRNA     No. of molecules     Weight fraction of
Synthetase         per genome (b)       total cell protein (x 10^3)

PheRS(α)           1310                 1.07
IleRS              1010                 2.45
PheRS(β)            960                 2.05
GlyS(β?)            940                 1.65
GluRS(β)            880                 0.96
GlnRS               820                 1.14
LysRS               800                 1.07
ArgRS               620                 0.82
ThrRS               580                 0.86
ValRS               580                 1.39
LeuRS               520                 1.18

(a) Data taken from Pedersen et al (1978).
(b) The figures shown were obtained by 2-dimensional PAGE of radiolabelled cell extracts, as described in Section 1.5.4.

The estimate of 500-1100 TrpRS molecules per cell is equivalent to a weight fraction (with respect to total cell protein) of roughly 0.4-0.9 x 10^-3. Calendar and Berg did not state the conditions under which the cells were cultured, and the constitution of the growth medium has a direct effect on the levels of cellular proteins (Neidhardt et al, 1977; Pedersen et al, 1978). Nevertheless, this estimate agrees well with the numbers for the synthetases that are shown in Table 7.5 (Pedersen et al, 1978), though TrpRS does have one of the lowest intracellular synthetase concentrations. If B. stearothermophilus TrpRS does represent 0.2 x 10^-3 as a fraction of the total cell proteins, and the purification of this enzyme may be regarded as being particularly efficient (Atkinson et al, 1979), then the intracellular level is not significantly different to that of its E. coli counterpart. Neidhardt realised that many synthetases were under metabolic regulation, various synthetase genes showing a transient or long-term derepression of their synthesis when E. coli was starved of the particular amino acid (Neidhardt et al, 1975). The expression of several synthetases has subsequently been shown to be controlled by attenuation mechanisms or by the synthetase binding to and repressing its own promoter (Section 1.5.5). Such observations suggest that the promoter may have a more important role in controlling the expression of aminoacyl-tRNA synthetase genes than codon usage.

If codon usage is not used to maximise the expression of synthetase genes, then why do these genes have codon preferences that are typical of strongly expressed genes? Orgel (1963) stated that the ability of a cell to produce proteins that were free of mutations depended on the template and the fidelity of the transcription and translation machineries. Errors occurring in the production of "housekeeping" proteins, especially those involved in transcription or translation themselves, would be cumulative, an effect that Orgel called "error catastrophe". The expression of E. coli synthetases and, by extension, those from B. stearothermophilus, suggests that codon preference may be used to ensure the fidelity of translation of such fundamentally important enzymes. However, a comparison of regions of homology between different synthetases indicates that conservative substitutions are allowed for catalytically functional residues, such as the substitution of asparagine for the second histidine in the "HIGH" sequence. This sequence is found close to the amino-terminus of a number of synthetases (Section 1.6.1) and, in the case of the E. coli AlaRS (Jasin et al, 1984) and the TyrRS from B. stearothermophilus (Leatherbarrow et al, 1985), has been implicated in the reaction with ATP. These homologies will be discussed in greater detail in the next section. Presumably, a number of phenotypically silent point mutations are also tolerable.

Recent site-directed mutagenesis studies that have been undertaken on the TyrRS from B. stearothermophilus imply that amino acid substitutions affecting residues implicated in catalysis may not have disastrous consequences for the enzyme. The threonine at position 51 (Thr-51) in the enzyme binds the ribose ring oxygen of tyrosyl adenylate in the transition state (Fersht et al, 1985b; Ho and Fersht, 1986). Changing Thr-51 to an alanine, cysteine or proline by oligonucleotide-directed mutagenesis stabilizes the transition state for the activation reaction. The last of these mutants (Pro-51) has the most marked effect upon transition state stabilization, increasing the forward rate constant k3 for tyrosyl adenylate formation, which is equivalent to the overall rate constant kcat, by 20-fold. This improved value is now close to the value of k3 for the E. coli TyrRS, an enzyme which shows considerable homology with the B. stearothermophilus TyrRS in the ATP binding site and has a proline at position 51 (see Fig. 8.10). However, the increased stability of the enzyme-bound tyrosyl adenylate is accompanied by a reduced rate of transfer to tRNATyr. In contrast, "natural variants" of the B. stearothermophilus TyrRS, i.e. the equivalent enzymes from E. coli (Pro-51) and B. caldotenax (Ala-51), bind the transition state without stabilizing the adenylate or limiting the transfer to tRNATyr (Ho and Fersht, 1986). Nonetheless, a minority of point mutations, such as those affecting Thr-40 or His-45, residues that bind the pyrophosphate leaving group of ATP during the activation transition state (Leatherbarrow et al, 1985), have radical effects upon the activity of the B. stearothermophilus TyrRS. Hence, maximising the use of the major tRNA species through codon usage might still reflect an important contribution to minimising the mutation rate of the synthetases.

7.4. Corroboration of the valS Sequence by Protein Chemistry

7.4.1. N-terminal Sequencing and Amino Acid Analysis of the Cloned ValRS

Purified B. stearothermophilus ValRS, prepared from cultures of E. coli DH1 carrying either pNB1 or pTB8, was sequenced by modified Edman degradation in the gas phase (Section 3.4). The protein prepared from a DH1[pNB1] culture yielded a nine-residue sequence that differed from the sequence predicted from DNA sequencing in the second and fourth positions (relative to the initiator methionine).

The sequences may be aligned as follows:

Predicted sequence                          M A Q H E V S M P P
Sequence determined by Edman degradation      S Q A E V S M P P
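The comparison of the two N-terminal sequences can be checked mechanically. The following is a minimal sketch, assuming that the nine-residue Edman sequence is placed against the predicted sequence starting at the residue after the initiator methionine; the function name `mismatches` is illustrative only and is not part of any program used in this work.

```python
predicted = "MAQHEVSMPP"   # N-terminus predicted from the valS DNA sequence
determined = "SQAEVSMPP"   # nine residues read by Edman degradation


def mismatches(pred, obs, offset=1):
    """Return 1-based positions in `pred` at which `obs` differs.

    `offset` is the number of leading residues of `pred` that are skipped
    before `obs` is laid alongside it (assumed here to be 1, i.e. the
    initiator methionine is absent from the determined sequence).
    """
    return [
        i + offset + 1
        for i, (p, o) in enumerate(zip(pred[offset:], obs))
        if p != o
    ]


print(mismatches(predicted, determined))
```

With this placement the only differences fall at the second and fourth positions, the remaining seven residues being identical.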

The protein was re-sequenced and the same result was obtained. A batch of ValRS that had been purified from DH1[pTB8] was prepared for Edman degradation but it proved impossible to determine the N-terminal sequence of this preparation. This suggested that the amino terminus of the B. stearothermophilus ValRS was "blocked", a feature often found with prokaryotic proteins. For example, the N-terminus of E. coli MetRS was also blocked (Fasiolo et al, 1985). A closer inspection of the protein sequencing data, obtained from the protein derived from the DH1[pNB1] culture, revealed that the yield of the sequenced protein represented less than 5% of the original amount of protein (approximately 1 nmol) that was subjected to Edman degradation. A plausible explanation for these results is that the N-terminus of the B. stearothermophilus ValRS is blocked and remains inaccessible to sequencing by conventional methods. The chances of two point mutations having occurred in such close proximity at the N-terminus of the protein, possibly introduced through the manipulations leading to the creation of the plasmid pTB8, are negligible. Thus, it is highly likely that the sequence that was obtained corresponds to the N-terminal sequence of a small amount of contaminating E. coli ValRS, despite the heat treatment for 30 min at 56 °C that was intended to denature the mesophilic proteins (Section 3.3). A given aminoacyl-tRNA synthetase from E. coli often shows a high level of homology at the amino acid level with its counterpart from B. stearothermophilus. For example, the primary sequences of the TyrRS from these bacteria are 56% homologous, whilst the TrpRS exhibit 60% homology (Winter et al, 1983; Hall et al, 1982).

The ValRS from both these bacteria appear to be the same size and, assuming that there is a similarly high level of homology between these enzymes, the physical characteristics of the enzymes may be sufficiently similar that the two enzymes cannot be resolved by the purification methods that are currently used (Section 3.3). The purification procedures employed relied on two separate chromatographies which probably do not discriminate between the two enzymes. The only direct selection that is employed against the E. coli

ValRS relies on the enzyme being heat-labile at 56 °C, a temperature at which the B. stearothermophilus enzyme is highly stable. An improved method of purification might be to separate the two proteins by any difference in their isoelectric points.

If the N-terminal sequence that had been obtained from the pNB1-encoded protein preparation did correspond to contaminating E. coli ValRS, then the protein that had been purified from a DH1[pTB8] culture (from which it was impossible to derive an N-terminal sequence) might be expected to be more pure. Consequently, 60 pmol of this preparation was hydrolysed in 6 M HCl at 110 °C in order to ascertain the amino acid composition (Section 3.5). The results are compared with the theoretical composition, predicted from the DNA sequence, in Table 7.6. The results of the analysis are reasonable, being close to the predicted values for most of the amino acids. Cysteine, predicted to occur 4 times in the protein, was probably present at such a low level in the hydrolysates that it did not give a peak on the analysis trace.

Table 7.6

Amino Acid Analysis of B. stearothermophilus ValRS

                 No. of residues per molecule
Amino acid    Protein hydrolysate a    Predicted b

Asp/Asn              89                    84
Thr                  45                    48
Ser                  25                    30
Glu/Gln             121                   109
Pro                  53                    46
Gly                  57                    54
Ala                  66                    61
Cys                   0                     4
Val                  63                    58
Met                  20                    27
Ile                  38                    51
Leu                  67                    76
Tyr                  34                    34
Phe                  32                    31
Lys                  64                    59
His                  25                    24
Arg                  52                    57
Trp                  27 c                  27

a The numbers of amino acids were determined as described in Section 3.5.
b Numbers predicted from translation of the valS DNA sequence.
c The number of tryptophan residues was determined spectrophotometrically (Section 3.6; Mulvey et al, 1974).
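The agreement between the two columns of Table 7.6 can be summarised numerically. The sketch below simply totals each column and lists the difference for each amino acid; the values are those of the table, whereas the function name `summarise` and the layout of the dictionary are illustrative assumptions.

```python
# Residues per molecule from Table 7.6: (protein hydrolysate, predicted from DNA)
composition = {
    "Asp/Asn": (89, 84), "Thr": (45, 48), "Ser": (25, 30),
    "Glu/Gln": (121, 109), "Pro": (53, 46), "Gly": (57, 54),
    "Ala": (66, 61), "Cys": (0, 4), "Val": (63, 58),
    "Met": (20, 27), "Ile": (38, 51), "Leu": (67, 76),
    "Tyr": (34, 34), "Phe": (32, 31), "Lys": (64, 59),
    "His": (25, 24), "Arg": (52, 57), "Trp": (27, 27),
}


def summarise(table):
    """Return column totals and the measured-minus-predicted difference per residue."""
    measured_total = sum(measured for measured, _ in table.values())
    predicted_total = sum(predicted for _, predicted in table.values())
    differences = {aa: m - p for aa, (m, p) in table.items()}
    return measured_total, predicted_total, differences


measured_total, predicted_total, differences = summarise(composition)
print(measured_total, predicted_total)

# List the residues in order of decreasing absolute discrepancy.
for aa, diff in sorted(differences.items(), key=lambda item: -abs(item[1])):
    print(f"{aa:8s} {diff:+d}")
```

The predicted column sums to 880, the length of the open reading frame, and the hydrolysate column to 878. The largest individual discrepancies are for isoleucine (13 fewer than predicted) and Glu/Gln (12 more).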

7.4.2. Spectrophotometric Determination of the Number of Tryptophan Residues in the ValRS

The value for tryptophan was determined by using the spectrophotometric method of Mulvey et al (1974), based upon the difference in the absorbance at 280 nm and 288 nm of the aromatic rings of tryptophan in a fully denatured protein (Edelhoch, 1967). The number of tryptophan residues was calculated as described in Section 3.6. The numbers of tryptophans in B. stearothermophilus TyrRS and bovine ribonuclease A were also determined by this method and compared with the known values. The results are presented in Table 7.7. The measured values show excellent agreement with the numbers obtained from translating the DNA sequences of each of the proteins. The value for valS implies that the correct open reading frame is maintained throughout the length of the 2.65 kb ORF, especially as the tryptophan codons are fairly evenly distributed along the length of the gene. The reading frame of valS was moved by 1 or 2 bases in a 5’ → 3’ direction using a subroutine of the program ANALYSEQ (Staden, 1984). The numbers of tryptophans in the new reading frames were 15 (+1) and 2 (+2) respectively. This is further evidence that the correct reading frame has been selected.

Table 7.7

Tryptophan Contents of ValRS, TyrRS and Ribonuclease A

Enzyme                      Ntrp a        Ntrp b         Extinction
                            (measured)    (predicted)    coefficient c

ValRS (B. stear)            26.7          27             1.85
TyrRS (B. stear)             5.5           6             1.049
Ribonuclease A (bovine)      0.43          0             0.587

a Ntrp, the number of tryptophan residues in the protein, was determined according to Mulvey et al (1974). See Section 3.6.
b The numbers of tryptophans were obtained from a translation of the valS gene (ValRS) or, in the case of TyrRS and RNase, from the protein databank contained in the Micro-Genie sequencing software (Queen and Korn, 1984).
c The extinction coefficients for each protein were calculated from the absorbance of tryptophan, tyrosine and cysteine at 279-281 nm (Section 3.6, equation 2).

CHAPTER 8

PROTEIN SEQUENCE HOMOLOGIES BETWEEN ValRS AND OTHER AMINOACYL-tRNA SYNTHETASES

8.1. Introduction

The protein sequence of the B. stearothermophilus ValRS was derived from the DNA sequence using the ANALYSEQ program of Staden (1984). The protein sequence was compared to those of other aminoacyl-tRNA synthetases, in particular the E. coli IleRS, and homologies between the enzymes were analysed and displayed graphically using an updated version of DIAGON (Staden, 1982a; R. Staden, unpublished). The results presented in the following sections confirm that the ValRS from B. stearothermophilus possesses an amino-terminal "HIGH" sequence (Fig. 1.5; Webster et al, 1984) that is also present in the TyrRS from the same bacterium. This peptide has been shown to form key contacts with ATP, and one of the histidines has been shown to bind pyrophosphate in the transition state of the adenylation reaction (Leatherbarrow et al, 1985). A second significant homology has been found with the E. coli IleRS and MetRS, involving residues that have been implicated in the binding of tRNA by the MetRS (Hountondji et al, 1986). Other regions of extensive homology have been found which are unique to ValRS and IleRS, representing the strongest homology yet found between two different synthetases and suggesting that these enzymes may have diverged from a common ancestor. The ValRS sequence was also compared to itself and shown not to contain the internal repeated sequences originally proposed by Koch et al (1974).

The hydropathicities of ValRS and IleRS were determined using HYDROPLOT, a program contained within a new protein analysis package called ANALYSEP (R. Staden, unpublished).

8.2. The B. stearothermophilus ValRS Does Not Contain Internal Repeats

The reasoning behind the theory that those synthetases with large subunits, such as the monomeric ValRS and LeuRS from B. stearothermophilus (Mr = 110,000; Koch et al, 1974), had evolved through a process of gene duplication or fusion has been described in Section 1.3.4. Recent evidence, obtained by determining the DNA sequences of various aaRS genes, indicates that synthetases with a large protomer such as E. coli IleRS or MetRS (Fasiolo et al, 1985) do not contain internal repeats. In several cases, the primary sequence, predicted from the sequence of the gene, was confirmed by protein sequencing of purified peptides.

The protein sequence of the B. stearothermophilus ValRS, derived from the DNA sequence of the valS gene, was compared against itself using an updated version of the DIAGON matrix program (Staden, 1982; Fig. 8.1). The program plots a dot every time that the criteria governing a proportional scoring algorithm are satisfied, using "double matching probability" (McLachlan, 1971). The algorithm looks for perfect matches ("identities") within a window of defined length ("span length") and also applies a scoring matrix (Dayhoff, 1969) to the comparison to assess the quality of semi-conservative matches ("scores proportional"). In this example, a window of 25 residues was used to screen the sequence, searching for at least 8 perfect matches, at a scores proportional setting of 275. The results prove conclusively that the B. stearothermophilus ValRS does not contain any significant areas of internal sequence redundancy. An identical plot was obtained using a similar matrix algorithm program contained in the Micro-Genie software (Queen and Korn, 1984). The background of dots represents non-significant matches. For example, the two small groups of dots that are close to the diagonal at the top right-hand corner of the figure correspond to two sequences centred around position 830 in the protein. An inspection of the sequence for this region suggests that the sequences of these peptides are leu-glu-lys-glu-leu-asp and trp-asn-lys-glu-val-glu. Certain matches are allowed according to the Dayhoff matrix; for example, val for leu and asp for glu are fairly conservative substitutions and, at a low stringency, would constitute a match for DIAGON. Clearly, these regions are unlikely to represent a sequence duplication. Using a pool of 20 amino acids, there are 20 x 20 possible dipeptides, so that any given pair of residues may be expected to occupy adjacent positions in a sequence, on average, once every 400 residues. Hoben et al (1982) found that four tripeptides and a tetrapeptide, each repeated once in the primary sequence of E. coli GlnRS (Mr = 64,500), constituted the only repeating elements in this synthetase. Similarly, Putney and co-workers were unable to find anything longer than a five amino acid repeat in the 875 residue protomer of E. coli AlaRS (Putney et al, 1981a). Such repeats are insignificant and are not consistent with the idea of synthetases having arisen by gene duplication.

Fig. 8.1. 2-Dimensional matrix comparison of ValRS protein sequence against itself. The comparison was made using a version of DIAGON, as described in Section 8.2. The scale on each axis is marked in increments of 100 amino acid residues. 0,0 represents the N-terminus of each sequence (i.e. the sequence is scanned from N- to C-terminus by following the diagonal from the bottom left-hand corner to the top right-hand corner). The program parameters used in this comparison (see Section 8.2) were: span length - 25; identities - 8, and scores proportional - 275.
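The principle of the window comparison can be illustrated with a much simplified sketch. The code below scores perfect identities only, within a window of a given span, and records a dot wherever the threshold is met; it does not implement the Dayhoff weighting or the "double matching probability" calculation used by DIAGON, and the function names, toy sequences and reduced parameter values are assumptions chosen purely for illustration.

```python
def dot_matrix(seq_a, seq_b, span=25, identities=8):
    """Return (i, j) window centres at which at least `identities` residues match.

    Simplified identity-only version of a dot-matrix comparison.
    """
    half = span // 2
    dots = []
    for i in range(half, len(seq_a) - half):
        window_a = seq_a[i - half:i + half + 1]
        for j in range(half, len(seq_b) - half):
            window_b = seq_b[j - half:j + half + 1]
            matches = sum(a == b for a, b in zip(window_a, window_b))
            if matches >= identities:
                dots.append((i, j))
    return dots


def off_diagonal(dots):
    """Keep only the dots that do not lie on the main diagonal."""
    return [(i, j) for i, j in dots if i != j]


# Toy sequences (hypothetical): one without and one with an internal repeat.
unique = "ACDEFGHIKLMNPQRSTVWY"
repeated = "ACDEFGHIKL" + "ACDEFGHIKL"

print(off_diagonal(dot_matrix(unique, unique, span=5, identities=4)))
print(off_diagonal(dot_matrix(repeated, repeated, span=5, identities=4)))
```

A sequence compared against itself always gives the main diagonal. An internal duplication would show up as additional lines of dots parallel to the diagonal, as with the second toy sequence, whereas a sequence lacking repeats gives no off-diagonal dots at a suitable stringency.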

8.3. Matrix Comparison of ValRS with E. coli IleRS and other Synthetases

8.3.1. Introduction

As the IleRS from E. coli represents the only large monomeric synthetase for which the gene has been sequenced and the protein sequence published (Webster et al, 1984), it was logical to look for any homologies between IleRS and the ValRS. Significant, but limited, homologies have been found previously between different synthetases. A particularly well-conserved region of homology has been found between a number of different synthetases from E. coli, Bacillus caldotenax, B. stearothermophilus and yeast. This region is referred to as the "HIGH" sequence, so-called because the sequence his-ile-gly-his was found in the N-terminus of both E. coli IleRS and the TyrRS from B. stearothermophilus (Webster et al, 1984; Winter et al, 1983). Other enzymes have been found to contain this sequence or a variation of it in the N-terminus (often with the substitution of leu for ile and/or asn for his); these sequences are compared in Fig. 8.10. The IleRS and ValRS are functionally similar enzymes (Section 1.4.2); both are large monomeric enzymes that aminoacylate tRNA with amino acids bearing hydrophobic side chains and possess editing mechanisms (Baldwin and Berg, 1966; Fersht and Kaethner, 1976). An alignment of the two protein sequences was made with DIAGON (Staden, 1982a) and significant homologies were found throughout the length of the proteins. The ValRS protein sequence was also compared against the protein sequences of various aminoacyl-tRNA synthetases contained in the EMBL protein database held on a VAX 11/780 at the MRC Laboratory of Molecular Biology, Cambridge.

8.3.2. Homologies Between IleRS and ValRS

Fig. 8.2 shows a DIAGON plot of the ValRS sequence (vertical axis) compared against IleRS (horizontal axis). The stringency of the comparison is governed by a span length of 25, searching for 8 identities with a scores proportional value of 289. The plot reveals a

25% homology between the enzymes which is distributed throughout the length of the proteins. It is notable that there is minimal deviation from the diagonal, suggesting that the enzymes may have evolved from a common ancestor by a series of minor insertions and deletions and/or point mutations. In a separate set of analyses, ValRS was compared to

itself or with IleRS, using a setting of 281 for scores proportional

(Fig. 8.3a and b respectively). Note that the IleRS is depicted on the

vertical axis of Fig. 8.3b in this comparison. At this level of stringency, the ValRS shows an even lower level of off-centre matches

when compared to itself than the plot in Fig. 8.1 (scores proportional =

275). The IleRS/ValRS comparison is less stringent than the one shown in Fig. 8.2, but, nonetheless, indicates that there are a number of regions of homology between the two synthetases that are evenly spaced from one another along the length of the two enzymes. If the two proteins are linked by a common ancestor, and if they were to contain internal repeats, it is probable that some repeats would be apparent in both DIAGON plots. This does not appear to be the case and implies that neither protein has evolved by gene duplication or gene fusion.

Fig. 8.2. Matrix comparison of ValRS versus IleRS. ValRS and E. coli IleRS are depicted on the vertical and horizontal axes respectively. Details of the scale and plot are as described in the legend to Fig. 8.1, except that a scores proportional setting of 289 was used.

Fig. 8.3. Homology plots for ValRS against itself and IleRS at a different program stringency. (a) ValRS compared to itself. (b) IleRS (vertical axis) compared to ValRS (horizontal axis). For details of scale, see Fig. 8.1 legend. Each bar on both scales represents an increment of 100 amino acid residues. The program parameters used were: span length - 25; identities - 8, and scores proportional - 281.

8.3.3. Homologies between Different Synthetases

A number of amino acid homologies have been reported between the same aminoacyl-tRNA synthetase from two different species. The TyrRS from E. coli and B. stearothermophilus exhibit 56% perfect homology at the level of the primary sequence (Winter et al, 1983), whilst the mitochondrial form of yeast TrpRS shows 37% and 39% homologies, respectively, to the corresponding enzymes from E. coli and B. stearothermophilus (Myers and Tzagoloff, 1985). The 400 N-terminal residues of E. coli MetRS are particularly homologous (44%) to residues 192-594 of the yeast MetRS (Walter et al, 1983). The best fit between pairs of sequences was achieved by introducing a limited number of single insertions into both sequences (where necessary) and by allowing for semi-conservative substitutions. Localised regions of especially high homology are common, both between the same enzyme from two species (for example, a block of 11 residues is conserved in the C-terminal portion of the three Trp-tRNA synthetases (Myers and Tzagoloff, 1985) with only one substitution in the mitochondrial enzyme), and between different enzymes. The latter point is well illustrated by the areas around the HIGH sequence in the IleRS and MetRS from E. coli. A stretch of 11 identical amino acids (phe-tyr-ala-asn-gly-ser-ile-his-ile-gly-his) is found in the N-termini of both synthetases (Webster et al, 1984).
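Percentage homologies of the kind quoted above are obtained by counting identical and conservatively substituted positions in a pair of aligned sequences. The following is a minimal sketch of such a count, assuming that padding characters are written as "-" and that conservative substitutions are defined by a few simple residue groupings; these groupings are an illustrative assumption and are not the Dayhoff matrix, and the function name is hypothetical.

```python
# Assumed groupings of chemically similar residues (illustrative only).
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("DE"), set("KR"), set("ST"), set("FYW"), set("NQ"),
]


def score_alignment(a, b, gap="-"):
    """Return (identical, conservative, compared) for two aligned sequences.

    Positions at which either sequence carries a padding character are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must be of equal length")
    identical = conservative = compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
        elif any(x in group and y in group for group in CONSERVATIVE_GROUPS):
            conservative += 1
    return identical, conservative, compared


# The 11-residue stretch shared by E. coli IleRS and MetRS (one-letter code).
print(score_alignment("FYANGSIHIGH", "FYANGSIHIGH"))

# A hypothetical HIGH variant with leu for ile and one padded position.
identical, conservative, compared = score_alignment("HI-GH", "HLAGH")
print(100 * identical / compared, 100 * conservative / compared)
```

In the second example three of the four compared positions are identical and one is a conservative substitution, so that the overall homology, counting conservative matches, is 100% although only 75% of the residues are identical.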

The B. stearothermophilus ValRS was compared to a number of synthetases, notably the E. coli and yeast Met-tRNA synthetases, the E. coli and B. stearothermophilus Tyr-tRNA synthetases and the AlaRS from E. coli, using the matrix comparison program DIAGON. The comparisons are shown in Figs. 8.4-8.6 and are dealt with individually in the following pages.

ValRS versus E. coli AlaRS. The AlaRS from E. coli is a tetramer

made up of identical subunits with an 875 residue protomer. No

HIGH region has been reported for this enzyme. A DIAGON plot of

AlaRS (vertical axis) against ValRS (horizontal axis) is shown in

Fig. 8.4a. Clearly, there is no homology between the two enzymes. A span length of 25 and identities of 8 were used as usual, with scores proportional set at 281. These parameters were maintained for all the following comparisons, except where indicated.

ValRS versus B. stearothermophilus TyrRS. The TyrRS is dimeric, with the subunit consisting of 419 residues. The plot shown in Fig. 8.5a, with the TyrRS on the vertical axis, does not show any significant homology to the ValRS, except for a minor region near the C-termini

of both enzymes. The program does not bring out the HIGH sequence

match at this stringency. Similar results were obtained for comparison

between ValRS and E. coli TyrRS (plot not shown).

Comparison of the Tyr-tRNA synthetases from B. stearothermophilus

and E. coli confirmed the significant identity between the two

Tyr-tRNA synthetases for most of their length (Fig. 8.5b). Winter et al

(1983) aligned the two sequences by making one minor insertion in

each sequence such that a match of 56% perfect identities was

obtained. The application of the Dayhoff similarity matrix to the

comparison suggests that the degree of overall homology, allowing for

conservative substitutions, is substantially higher.

A scores proportional setting of 289 was used in these two comparisons.

Fig. 8.4. Matrix homology comparisons of ValRS with other synthetases. (a) E. coli AlaRS (vertical axis) is compared to ValRS (horizontal axis). (b) B. stearothermophilus (Bst) TyrRS (vertical axis) compared to ValRS (horizontal axis). (c) ValRS (vertical axis) compared to Bst TrpRS (horizontal axis). The scale is marked in increments of 100 amino acid residues and 0,0 represents the N-termini of the two proteins in each comparison. The program parameters are as given in Fig. 8.3 legend.

Fig. 8.5. Comparison of ValRS with Bst or E. coli TyrRS. (a) ValRS (vertical axis) was compared to Bst TyrRS. (b) Bst TyrRS (vertical axis) and E. coli TyrRS (horizontal axis) compared to each other. Program parameters used were: span length - 25; identities - 8, and scores proportional - 289. Scale markings as for Fig. 8.1 legend.

ValRS versus B. stearothermophilus TrpRS. TrpRS (327 residues) was plotted on the horizontal axis. No homologies were found between these two enzymes (Fig. 8.4c).

ValRS versus Met-tRNA Synthetases. The Met-tRNA synthetases from yeast cytoplasm (vertical axis: 751 residues) and E. coli (horizontal axis: 676 residues) were compared to each other using DIAGON (Fig. 8.6a). A striking degree of homology is apparent. As mentioned earlier, Walter et al (1983) noted that the two sequences were 44% homologous (30% identical residues, 14% conservative substitutions) for a region that spans residues 192-594 in the yeast enzyme and residues 1-401 in the bacterial enzyme. Walter et al aligned the sequences by making minor

insertions or deletions in one or other of the sequences. DIAGON does

not use such a method of alignment, but any significant matches over

a small region that are disrupted by a minor insertion or deletion will

be plotted as a discrete group of dots that are parallel to the diagonal

(Staden, 1982). The plot confirms that there is extensive homology over

the regions described above. The scores proportional (281) is probably

not stringent enough for the comparison (the same can be taken as

applying to the IleRS/ValRS plot in Fig. 8.3b), but this particular

setting was chosen as an intermediate value in order to reveal any

homologies between ValRS and dissimilar synthetases such as AlaRS and

TrpRS.

The startpoint of the major homology between the two

Met-tRNA synthetases implies that these two enzymes have diverged

from a common ancestral protein, but that at a certain point, either

the E. coli enzyme lost roughly 200 amino acids from the N-terminus

or the yeast enzyme gained an additional 200 residues at its N-terminus as a result of gene fusion. The latter possibility is more likely to be correct. It will be shown in Section 8.5 that a number of synthetases have a common homology at the amino-termini with respect to the HIGH sequence. Most of the enzymes depicted in Table 8.1 have the HIGH sequence located between 15 and 65 residues in from the N-terminus. The sole exception is the yeast MetRS where the HIGH sequence occupies residues 212-215 (the E. coli HIGH sequence occupies positions 21-24).

Fig. 8.6. Comparison of E. coli and yeast MetRS versus ValRS. (a) Yeast MetRS (vertical axis) is compared to the E. coli MetRS (horizontal axis). ValRS (horizontal axes) was subsequently compared to E. coli MetRS (b) or yeast MetRS (c). Parameter settings and scales as described in Fig. 8.3 legend.

When ValRS is compared to the Met-tRNA synthetases from

E. coli and yeast, certain homologies are found that correspond to the

HIGH regions (Fig. 8.6, panels b and c respectively). The ValRS is depicted on the horizontal axis in both instances. E. coli MetRS shows two areas of homology that are close to the diagonal. The HIGH region is denoted by (I) in the figure, whilst the other region (II) represents a new region of high homology that is conserved among

different synthetases which will be referred to in subsequent sections as the KMSKS sequence, a lysine-rich region that has been implicated in binding the 3’ end of tRNA for the E. coli TyrRS and MetRS (Hountondji et al, 1986). This sequence is centred around position 525

in ValRS and 330 in MetRS. The yeast MetRS (Fig. 8.6c) shows a

homology with ValRS around the HIGH sequence (III). The stringency

of the comparison is such that a match in the KMSKS sequence is

excluded (the yeast sequence is KFSKS, compared to KMSKS in the

ValRS or E. coli MetRS - see Table 8.2). Regions I and II in the

E. coli MetRS align with ValRS along the diagonal, but there is no

evidence of more extensive homologies.

IleRS versus MetRS. These two enzymes share little homology apart

from the HIGH region (data not shown). The extent of the HIGH

homology is discussed in more detail in Section 8.5.

8.3.4. Discussion

The stringency of the DIAGON program can be increased by

reducing the size of the window or by increasing the scores

proportional value, removing practically all of the background. An example is shown in Fig. 8.3a where ValRS was compared to itself with the scores proportional setting increased to 281. The selection of the parameters is entirely arbitrary and it is left to the operator to decide whether the stringency of the comparison is high enough by examining the pattern of dots. For example, the Trp-tRNA synthetases from B. stearothermophilus and E. coli were compared at a scores proportional setting of 275. This produced a diagonal line, broken every few residues but with few off-diagonal matches (i.e. no large deletions or insertions in either sequence). The scores proportional value was then increased to 325, refining the match to 4 major blocks of homology that were distributed along the length of the enzymes

(Fig. 8.7, a and b respectively). Nonetheless, the versatility of this

program, coupled with the Dayhoff weighting matrix, makes DIAGON an invaluable aid for assessing the relatedness of protein sequences.

Regions of homology can be pinpointed and the sequences that

constitute a match examined further.

The E. coli IleRS and the ValRS from B. stearothermophilus are

remarkably homologous (25%) considering that two different enzymes,

one from a Gram-negative and the other from a Gram-positive

bacterium, are being compared. The two enzymes are functionally

similar (Section 1.4.3) and, on a superficial level, are structurally

similar, being monomeric enzymes of 939 and 880 residues respectively.

This suggests that they may have evolved from a common ancestor.

This possibility is supported by the distribution of the homologies throughout the full lengths of the enzymes and by the fact that the alignment fits the diagonal. The effects of small insertions, deletions or mismatches would be apparent on the 2-dimensional plot as a set of broken co-linear lines (Staden, 1982a). Such homology is unique amongst those synthetases that have been sequenced to date.

Fig. 8.7. Example of the effect of changing the stringency of the matrix upon homologies between two enzymes. The Trp-tRNA synthetases from E. coli (vertical axis) and Bst (horizontal axis) were compared with a window of span length 25, identities 8, and a scores proportional setting of 275 (a) or 325 (b).

The IleRS and ValRS were also compared with respect to similarities

between the hydropathicity profiles for the two enzymes. Fig. 8.8

shows a comparison of ValRS (a) against IleRS (b) that was made

using HYDROPLOT (R. Staden, unpublished). The regions of

hydrophobicity are plotted above the central horizontal line and

hydrophilicity plotted below it, the N-terminus of each protein being

represented on the left of the plot (N). The scale on the vertical axis

represents arbitrary units of hydropathicity, whilst the horizontal axis is

scaled in units of 100 amino acid residues. The sequences were

screened with a window of 11 residues, moving through the sequence

with a step size of 5 residues. Certain regions appear to be present in

both enzymes, such as blocks A and F (ValRS and IleRS respectively)

or B and G. The latter alignment, with block B being centred around

position 240 in ValRS and block G centred roughly at 290 in IleRS,

agrees with an independent assessment of the degree of relatedness

between the two synthetases in which the protein sequences were

aligned by introducing padding characters into one sequence or the other (Section 8.4; Fig. 8.9). The sequences corresponding to blocks B and G may be aligned by introducing two main insertions into the

ValRS sequence at positions 110 and 252. In addition, blocks B and G

are a good example of a general difference throughout the lengths of

the two enzymes; ValRS tends to be more hydrophilic than IleRS.

Further, the ValRS shows greater extremes with respect to the extent of the hydrophobicity and hydrophilicity in certain regions (for example, compare blocks E and J at the C-terminus of each protein).

Fig. 8.8. Hydropathicity plots for ValRS and IleRS. The hydropathicities of ValRS (a) and IleRS (b) were compared using HYDROPLOT (Section 8.3.5). Hydrophobicity is plotted above the central horizontal line in each plot and hydrophilicity below it. N and C represent the amino- and carboxy-termini of the proteins respectively. The horizontal scale is marked in increments of 100 amino acid residues.

There is no evidence of either enzyme containing extensive domains that are particularly hydrophilic or hydrophobic, with the possible exception of block G in IleRS. This region is particularly rich in asp, glu, and arg residues (Fig. 8.9).

A comparison of ValRS with AlaRS, an enzyme whose subunit size is comparable to that of ValRS (875 residues), was made using the hydropathicity function of MicroGenie (Queen and Korn, 1984). No significant relatedness was found between the two enzymes (data not shown).
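The sliding-window calculation that underlies a hydropathicity plot can be sketched as follows. The hydropathy scale used by HYDROPLOT is not stated in the text, so the widely used Kyte and Doolittle values are assumed here; the window of 11 residues and step size of 5 follow the settings quoted above, while the function name and the toy sequence are illustrative.

```python
# Kyte and Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=11, step=5):
    """Return (centre position, mean hydropathy) for each window along `seq`.

    Positions are 1-based and refer to the middle residue of each window.
    """
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[residue] for residue in segment) / window
        profile.append((start + window // 2 + 1, mean))
    return profile


# Toy sequence (hypothetical): a hydrophobic stretch followed by acidic residues.
toy = "I" * 11 + "D" * 5
for centre, mean in hydropathy_profile(toy):
    print(centre, round(mean, 2))
```

Values above zero would be plotted above the central line (hydrophobic) and values below zero beneath it (hydrophilic), so that profiles for two proteins can be laid side by side and compared block by block.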

8.4. Extensive Localised Homologies Between ValRS and IleRS

8.4.1. Introduction

A comparison of ValRS with IleRS, made using DIAGON, revealed a number of regions of homology that were distributed throughout the length of the proteins (Fig. 8.2). A closer inspection of these homologies may be made by an arbitrary alignment of the protein sequences, allowing for certain insertions or deletions, where

Fig. 8.9. Comparison of the primary sequences of ValRS and IleRS (following three pages). The sequences were aligned to give the best overall homology, allowing for conservative amino acid substitutions according to the matrix of Dayhoff (1969). The ValRS and IleRS sequences are shown on the top and bottom lines respectively. Perfect matches are boxed, whilst conservative matches are indicated by an asterisk. The sequences are numbered from the N-termini and the standard one-letter code for amino acids is used (see Fig. 1.5 legend).

[Fig. 8.9: aligned primary sequences of ValRS (top line) and IleRS (bottom line).]

appropriate, to produce a sensible match as the IleRS is 59 amino acids longer than ValRS. Such an alignment is shown in Fig. 8.9. Identical matches between pairs of residues are boxed; conservative substitutions are marked with an asterisk and were made according to the Dayhoff matrix (Dayhoff, 1969). The substitutions allowed were: asp/glu; asn/gln; ser/thr; val/ile; val/ala; val/leu; met/leu; lys/arg/his; phe/tyr and cys/ser. The substitution of alanine for glycine was not included because glycine increases the flexibility of a polypeptide chain by virtue of having only a hydrogen atom for a side chain (Schulz and Schirmer, 1979). Additionally, a type II reverse turn ("β-bend"), which for consecutive residues i, i+2 and i+3 has the oxygen of i hydrogen-bonded to the nitrogen of i+3, requires that residue i+2 is a glycine (Venkatachalam, 1968). The alignment was made by using a homology program contained in MicroGenie (Queen and Korn, 1984) that uses Dayhoff’s similarity matrix (Dayhoff, 1969). The proportion of perfect matches made through the entire comparison was calculated to be 22.8%. The program made many minor insertions in both sequences, so the comparison was subsequently re-aligned by eye. The best alignment was made by positioning the N-terminus of the ValRS (top sequence in the figure) relative to residue 12 of the IleRS and by introducing a number of groups of padding characters. For ValRS, the major insertions were made at positions 111 (27 characters), 214 (3 characters), 253 (17 characters) and 446 (26 characters). A few minor insertions (typically, no more than 5 characters) were made elsewhere. A limited number of small insertions were also made in the IleRS sequence; these did not exceed 5 amino acids on any occasion. Insertions of 1 or 2 residues were avoided where possible.
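The substitution rules listed above can be expressed as a small scoring routine. The following Python sketch is illustrative only and is not the MicroGenie program: the function name `score_alignment` and the choice of gap characters are assumptions, and only the residue pairs listed in the text are treated as conservative (lys/arg/his is read here as three mutually allowed pairs). The example pair is the aligned TyrRS segment from B. stearothermophilus and E. coli given in Fig. 8.14.

```python
from itertools import combinations

# Substitution groups as listed in the text, in one-letter code.
# Assumption: "lys/arg/his" means any pair within the group is allowed.
ALLOWED_GROUPS = ["DE", "NQ", "ST", "VI", "VA", "VL", "ML", "KRH", "FY", "CS"]

# Expand each group into unordered residue pairs.
CONSERVATIVE_PAIRS = {
    frozenset(pair)
    for group in ALLOWED_GROUPS
    for pair in combinations(group, 2)
}

GAP_CHARS = "-_"  # padding characters (assumed representation)


def score_alignment(top, bottom):
    """Count identical and conservative pairs in two aligned sequences.

    Columns containing a padding character in either sequence are skipped.
    Returns a tuple (identical, conservative).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    conservative = 0
    for a, b in zip(top.upper(), bottom.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        if a == b:
            identical += 1
        elif frozenset((a, b)) in CONSERVATIVE_PAIRS:
            conservative += 1
    return identical, conservative


# TyrRS (B.st.) 222-233 against TyrRS (E.c.) 226-237, from Fig. 8.14.
print(score_alignment("LVTKADGTKFGK", "LITKADGTKFGK"))  # (11, 1)
```

Because the pairs are taken literally from the list, a replacement such as ile/leu is not counted as conservative by this sketch unless a further group is added to `ALLOWED_GROUPS`.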

8.4.2. Alignment of IleRS and ValRS Primary Sequences

A total of 212 perfect matches were made, with a further 87 conservative substitutions (Fig. 8.9). This alignment represents roughly 32% identities throughout the length of the alignment. The comparison was divided into a number of blocks of amino acids according to the pattern of homology; these regions are described below.
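The text does not state which sequence length was used as the denominator for these percentages. As a rough check only, the sketch below expresses the match counts as a percentage of the ValRS length (880 residues, from the Abstract) and of the IleRS length (taken here as 880 + 59 = 939 residues, from the statement that the IleRS is 59 amino acids longer). Both denominators are assumptions; the padded length of the alignment would give slightly lower values.

```python
def percent(count, length):
    """Return count as a percentage of length."""
    return 100.0 * count / length


PERFECT = 212                      # perfect matches quoted in the text
CONSERVATIVE = 87                  # conservative substitutions quoted in the text
VALRS_LENGTH = 880                 # residues, from the Abstract
ILERS_LENGTH = VALRS_LENGTH + 59   # assumption: "59 amino acids longer"

for label, length in (("ValRS", VALRS_LENGTH), ("IleRS", ILERS_LENGTH)):
    identical = percent(PERFECT, length)
    combined = percent(PERFECT + CONSERVATIVE, length)
    print(f"{label}: {identical:.1f}% identical, "
          f"{combined:.1f}% identical or conservative")
```

Under these assumptions perfect matches alone come to about 23-24%, and perfect matches plus conservative substitutions to about 32-34%.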

(A) ValRS (19-105)/IleRS (30-114). This region contains a number of single homologies or matches between 2 or 3 consecutive residues and includes a sequence his-leu-gly-his at positions 56-59 of ValRS that corresponds to the HIGH sequence (Winter et al, 1983; Webster et al, 1984). This homology is compared to the HIGH sequences of a number of other synthetases in Section 8.5.

(B) ValRS (111-247)/IleRS (147-286). This block contains a number of isolated homologies of usually 1 or 2 residues which are distributed evenly throughout the region.

(C) ValRS (253-429)/IleRS (309-485). Several longer homologies are found in this region, including a block of 7 residues of sequence asp-trp-cys-ile-ser-arg-gln (positions 405-411 in the ValRS sequence).

There is a high proportion of conserved hydrophobic residues such as ala, met, ile, leu and val over this area (17%). This compares with 9% for block (A) and 11% for (B).

(D) ValRS (446-455)/IleRS (528-537). This small section is 70% homologous between the two enzymes and is in the centre of a very poorly conserved region that required a run of 26 consecutive padding characters to be placed in ValRS at position 446.

(E) ValRS (480-591)/IleRS (557-668). This is a region of particularly high homology that is centred around a group of 11 residues which are a perfect match apart from one conservative substitution (leu for ile). The degree of homology over the region between residues 505 and 549 with respect to ValRS is 66% and represents the most extensive homology that has been found between different synthetases to date. The region contains a limited number of conservative substitutions. An 11-residue block at the centre of this region contains the KMSKS sequence that appears to be conserved among several other synthetases, notably MetRS and TrpRS from different organisms (Hountondji et al, 1986). The implications of these homologies are described more fully in Section 8.6.

(F) ValRS (665-704)/IleRS (744-783). A region of isolated perfect matches and a high degree of conservative matches, in contrast to (E).

(G) ValRS (750-end)/IleRS (821-end). The C-terminal region contains mainly isolated substitutions or perfect matches and represents the area with the lowest overall homology in this comparison. Lysines are well conserved (18% of the conserved residues). The C-terminus of the ValRS contains more basic amino acids than that of IleRS, with lysine, histidine and arginine comprising 22% of the residues in this portion of ValRS (compared to 16% for IleRS). There is a very basic region located between residues 857 and 868 in ValRS (50%).

8.4.3. Discussion

The comparison of IleRS with ValRS suggests that these two enzymes are closely related, both in function and structure, despite belonging to different prokaryotic genera. The evidence may be summarised as follows:

(1) The enzymes are similar in size and quaternary structure, both activate amino acids with branched aliphatic side-chains and charge one mol of aminoacyl adenylate per mol of enzyme, and have evolved editing mechanisms to reduce mischarging (Fersht, 1985). IleRS can activate valine but hydrolyses Val-tRNAile rapidly, releasing tRNAile, valine, AMP and pyrophosphate (Baldwin and Berg, 1966). Similarly, ValRS edits against threonine (Fersht and Kaethner, 1976).

(2) A matrix comparison of the enzymes, portrayed as a 2-dimensional plot (Fig. 8.2), indicates that there is widespread homology throughout the length of the two proteins. A level of 25-32% homology (both perfect matches and conservative substitutions) may be assigned to the comparison. The minimal deviation from the diagonal suggests that the two enzymes have evolved by a number of small-scale mutagenic events (e.g. point mutations, minor insertions or deletions).

(3) Certain sequences are conserved more strongly than others. Both enzymes possess a HIGH sequence, implicated in binding ATP, close to the N-terminus. Such a region also occurs in TyrRS, MetRS, GlnRS, GluRS and TrpRS from different sources (Section 8.5). A region in the C-terminal half of each protein (the KMSKS sequence) contains the longest stretch of identity between the two synthetases (11 residues with only one conservative substitution) contained within a region of better than average homology. The extent of the homology around the KMSKS sequence of IleRS and ValRS over roughly 45 amino acids (66%) identifies this region as the best conserved homology between these synthetases and, at the level of the primary structure, is good evidence for a link between the two enzymes.

[Fig. 8.10: sequences aligned and their sources:
TyrRS (B.cal.)              Jones et al, 1986
TyrRS (B.st.)               Winter et al, 1983
TyrRS (E.c.)                Barker et al, 1982a
MetRS (S.c. cytoplasmic)    Walter et al, 1983
MetRS (E.c.)                Barker et al, 1982b
IleRS (E.c.)                Webster et al, 1984
ValRS (B.st.)               Brand, this thesis
TrpRS (S.c. mitochondrial)  Myers and Tzagoloff, 1985
TrpRS (E.c.)                Hall et al, 1982
TrpRS (B.st.)               Winter and Hartley, 1977
GluRS (E.c.)                Breton et al, 1986 (in press)
GlnRS (E.c.)                Yamao et al, 1982]

8.5. The HIGH Region

Fig. 8.10 depicts an arbitrary alignment of the N-terminal sequences of a number of different synthetases that contain a variation on the sequence his-ile-gly-his (HIGH region). Earlier observations on the importance of this region and evidence for it being part of the ATP binding site of a number of different synthetases have been mentioned elsewhere (Section 1.6.1). Regions of perfect homology that are present in most of the sequences depicted are shown within solid boxes. The substitutions of ser for thr, ser for cys or his for asn were considered to be particularly good matches, calculated from the Dayhoff similarity matrix. Other substitutions, such as conservative replacements of residues with hydrophobic branched side chains, for example, are drawn within dotted boxes. Identities that are common for a particular synthetase from several species (such as the TyrRSs from E. coli, B. stearothermophilus and B. caldotenax) and are not representative of a general trend between a range of different synthetases are omitted for clarity. The use of single insertions (padding characters) to align regions of homology relative to one another has been limited to where necessary. For example, all the synthetases shown have at least one proline located 5-7 residues upstream of the HIGH region and these have been aligned by padding. No characters were inserted in the IleRS or ValRS sequences, in agreement with the alignment for this region shown in Fig. 8.9.

Fig. 8.10. (Facing) Alignment of 12 aminoacyl-tRNA synthetases around the HIGH sequence. The primary sequences of a number of synthetases from E. coli (E.c.), B. stearothermophilus (B.st.), B. caldotenax (B.cal.) and yeast (S.c.) are compared. The sequences were aligned around the sequence his-ile-gly-his, or an acceptable variation of this sequence made according to the amino acid similarity matrix of Dayhoff (1969). Identical homologies are boxed within unbroken lines; regions of conservative homology are drawn within dashed lines. Padding characters (-) have been inserted only where necessary in order to align regions of probable homology. The numbering represents the numbers of the first and last amino acids respectively, relative to the N-terminus of each enzyme. The references for each sequence are shown to the right of the figure.

The HIGH sequence itself is found close to the N-termini of all the enzymes in this comparison, with the exception of MetRS from yeast. The plot of the extent of homology between E. coli and yeast MetRS shown in Fig. 8.6a implies that the yeast MetRS has gained 200 or so amino acids at its N-terminus, accounting for the location of the sequence HLGN at residues 212-215. The glycine is the only invariant residue in this sequence and ile, val and leu appear to be fairly interchangeable in the second position of the HIGH quartet. The second histidine (His-48 in the TyrRS from B. stearothermophilus) is occasionally replaced by asn, most notably in all three Trp-tRNA synthetases in this study. The first histidine of the sequence (His-45 in the TyrRS) is well conserved, with the exception of the Trp-tRNA synthetases from E. coli and B. stearothermophilus where it is replaced by threonine. Protein engineering studies that have been undertaken on the TyrRS of B. stearothermophilus have thrown some light on the role of these residues. Consequently, this enzyme may be viewed as a good example of the function of the HIGH residues.
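The variation described in this paragraph can be summarised as a simple four-residue pattern. The sketch below is an illustration only: the pattern `[HT][ILV]G[HN]` is an assumed summary of the text (his or thr, then ile/leu/val, then the invariant gly, then his or asn), not a rule taken from Fig. 8.10, and the function name is hypothetical. The two test segments are the 11-residue sequence PYANGSIH(I/L)GH quoted in this section for IleRS (residues 57-67) and E. coli MetRS (residues 14-24).

```python
import re

# Assumed pattern: H or T, then I/L/V, then the invariant G, then H or N.
HIGH_LIKE = re.compile(r"[HT][ILV]G[HN]")


def find_high_like(segment, first_residue):
    """Locate HIGH-like quartets in an ungapped sequence segment.

    first_residue is the residue number of the first character of segment.
    Returns a list of (start, end, matched_sequence) with 1-based,
    inclusive residue numbers.
    """
    return [
        (first_residue + m.start(), first_residue + m.end() - 1, m.group())
        for m in HIGH_LIKE.finditer(segment.upper())
    ]


print(find_high_like("PYANGSIHIGH", 57))  # IleRS:  [(64, 67, 'HIGH')]
print(find_high_like("PYANGSIHLGH", 14))  # MetRS:  [(21, 24, 'HLGH')]
```

A pattern this short will also match by chance in unrelated proteins, so a hit is only a starting point for the kind of alignment and structural comparison described in this section.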

The HIGH sequence is contained on a region of α-helix that is part of the ATP binding domain of the protein (Bhat et al, 1982). The domain, which strongly resembles the so-called "Rossmann" fold that is found in many nucleotide-binding proteins (Rossmann et al, 1976), consists of a group of five parallel β-strands and one short antiparallel β-strand that are linked together by α-helices. His-45 has been shown to be involved in the binding of ATP in the transition state (Leatherbarrow et al, 1985). By superimposing the structure of the transition state pentacoordinate intermediate upon the known crystal structure of the TyrRS, it was proposed that the leaving group (pyrophosphate) would be pushed away from the active site in the transition state and up towards Thr-40 and His-45 so as to make favourable hydrogen bonds between these residues and the γ-phosphate (Fig. 8.11). The removal of either of the residues by site-directed mutagenesis (Thr→Ala-40; His→Gly-45) reduced the forward rate constant for the formation of tyrosyl adenylate, k3 (equivalent to kcat), by two orders of magnitude or more. k3 for the double mutant (i.e. both residues were mutated at the same time) was over 10^5 times lower than that for the wild-type enzyme. The attack of pyrophosphate upon tyrosyl adenylate to form ATP and tyrosine was also affected by these mutations, indicating that Thr-40 and His-45 provide binding sites for pyrophosphate in the backward reaction.
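A fold-change in a rate constant can be converted into an apparent change in activation free energy using the standard transition-state relation ΔΔG = RT ln(k_wt / k_mut). The sketch below is illustrative only and is not a calculation taken from the studies cited: the temperature of 298.15 K is an assumption, the fold-changes are the round figures quoted above, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def apparent_ddg(fold_reduction, temperature_k=298.15):
    """Apparent loss of transition-state stabilisation, in kcal/mol.

    fold_reduction is k_wt / k_mut; temperature_k is an assumed temperature.
    """
    if fold_reduction <= 0:
        raise ValueError("fold_reduction must be positive")
    return R_KCAL * temperature_k * math.log(fold_reduction)


for fold in (1e2, 1e5):
    print(f"{fold:.0e}-fold slower: {apparent_ddg(fold):.2f} kcal/mol")
```

With these assumptions a 100-fold reduction in k3 corresponds to roughly 2.7 kcal/mol, and a 10^5-fold reduction to roughly 6.8 kcal/mol.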

The glycine at position 47 relative to the B. stearothermophilus TyrRS is conserved in all synthetases possessing a HIGH region. This glycine could possibly be conserved because it places some structural constraint on this part of the enzyme (for example, for the reasons mentioned in Section 8.4.1). Recent evidence for the role of Gly-47 in the TyrRS suggests that it is well conserved on account of the size of its side chain. If gly is replaced with ala by site-directed mutagenesis (Gly→Ala-47), the rate of adenylation is reduced by a factor of at least 80 (Paul Carter, unpublished). X-ray crystallographic studies suggest that the introduction of the methyl group (replacing the hydrogen of Gly-47) may be sterically unfavourable, displacing the adenine ring of ATP (Katy Brown, personal communication). The adenine ring is bordered by a number of hydrophobic residues such as Leu-44, Ile-46, Leu-49 and Ala-50. (A general point that can be drawn from Fig. 8.10 is that the HIGH sequence is often followed by 1 or 2 hydrophobic residues such as ala, leu, ile, met or val.) Alternatively, alanine may distort the top of the helix and cause a slight shift in the position of His-45. The mutation has a minor but significant effect on adenylation.

Fig. 8.11. Schematic view of the transition state for the formation of tyrosyl adenylate by B. stearothermophilus TyrRS. The figure, taken from Leatherbarrow et al (1985), is the result of a model building study in which the transition state of tyrosyl adenylate was superimposed upon the crystallographic structure of the TyrRS.

His-48 in TyrRS is replaced by asn in the corresponding enzyme from B. caldotenax, the two enzymes being 99% homologous at the protein level (Jones et al, 1986). His-48 binds the ribose-oxygen of ATP (Fersht et al, 1984) and protein engineering has been used to change His-48 to asn. The mutation produces an enzyme that is kinetically almost indistinguishable from the wild-type enzyme (Lowe et al, 1985).

These results for the B. stearothermophilus TyrRS may have wider significance for some of the other synthetases shown in Fig. 8.10. It has already been shown that the ValRS from B. stearothermophilus and the E. coli IleRS are fairly homologous (25-32%) over much of their lengths (Section 8.4.2). Webster and co-workers, reporting the protein sequence of IleRS, noted an 11-residue homology between IleRS and E. coli MetRS that includes the HIGH sequence (Webster et al, 1984). The sequence, PYANGSIH(I/L)GH, occupies residues 57-67 in IleRS and 14-24 in MetRS with one conservative substitution (ile for leu, in parentheses in the above sequence). This oligopeptide, together with the sequence surrounding the KMSKS sequence that is conserved between IleRS and ValRS (Section 8.4.2 and below), represent the longest stretches of identity yet found between any two different synthetases. The amino acid sequence of this region in the B. stearothermophilus TyrRS is PTAD-SLHIGH, allowing the one insertion, and virtually identical sequences are found in the Tyr-tRNA synthetases from E. coli and B. caldotenax (Fig. 8.10). The importance of the MetRS/IleRS sequence homology is that this region of the MetRS has been shown to be almost identical in 3-dimensional structure to the corresponding region in the B. stearothermophilus TyrRS (Blow et al, 1983). The MetRS also possesses a Rossmann fold in the N-terminal portion of the enzyme (Risler et al, 1981) and a region of good structural homology between the TyrRS and MetRS is maintained over about 200 residues (Blow et al, 1983). Fig. 8.12 shows that the two regions may be superimposed with little deviation between them, especially over that section which contains the two HIGH sequence histidines and a conserved cysteine (Cys-35 in TyrRS; Cys-11 in MetRS). Taken together, the structural homologies between E. coli MetRS and B. stearothermophilus TyrRS, the sequence homologies between MetRS and IleRS and those between the three Tyr-tRNA synthetases strongly suggest that these synthetases are related to one another structurally with respect to the conformation of the ATP binding site.

Fig. 8.12. Alignment of the α-carbon backbones of B. stearothermophilus TyrRS and E. coli MetRS in the Rossmann fold. Stereoscopic view of two β-strands and one α-helix of the Rossmann folds of MetRS (large atoms) and TyrRS (small atoms). N and C represent the N- and C-termini of this part of the fold. The arrowed residues denote the conserved cysteine and the two histidines of the HIGH sequence (see Fig. 8.10). Figure taken from Blow et al (1983).

Sequence homologies with the other synthetases in the Table are less clear. The HIGH sequence is nearly always preceded by a hydrophobic residue, usually ile or leu. ValRS, IleRS, GluRS, GlnRS and the three Trp-tRNA synthetases possess an invariant glycine three residues before HIGH, despite considerable variation in sequence around the glycine. An arrangement that had been noted by Barker and Winter (1982), mentioned above, was a cysteine that is conserved in an identical position in the sequences and 3-dimensional structures of TyrRS and MetRS. This cysteine is also conserved between all three Tyr-tRNA synthetases (Cys-35) and is replaced by serine in the yeast MetRS (Walter et al, 1983), a conservative substitution according to the Dayhoff similarity matrix. The significance of this alignment is debatable. Cysteine is one of the rarer amino acids (Doolittle, 1981), and occurs 8 times in MetRS (1% of the total residues) and only twice in TyrRS (0.5%). Given that its side-chain is a potent nucleophile, it might be expected to be particularly well-conserved. Protein engineering of the B. stearothermophilus TyrRS reveals that Cys-35 is one of a number of residues, sited away from the centre of the reaction, that contribute collectively to lowering the activation threshold for adenylation by stabilizing the transition state (Fersht et al, 1986).

Hence, the evolutionary pressure to conserve a residue whose function may be duplicated by other amino acids may not be as high as expected. An example of the difficulty of assigning residues to specific functions on the basis of their alignment to other sequences comes from the E. coli AlaRS. Barker and Winter (1982) noted that two histidines and a cysteine in a region covering positions 175-195 in the AlaRS sequence could be aligned perfectly with the conserved residues in the N-terminus of MetRS and TyrRS. However, a close inspection of the AlaRS sequence, and a comparison to the tertiary structures of MetRS and TyrRS, revealed that this particular region of the AlaRS was rich in proline and glycine residues, suggesting little tertiary homology with MetRS and TyrRS.

The AlaRS from E. coli (Putney et al, 1981a) does not contain a HIGH sequence, nor have HIGH sequences been reported for GlyRS (Webster et al, 1983) and HisRS (Freedman et al, 1985), both from E. coli. An inspection of the sequence of both subunits of E. coli PheRS (Fayat et al, 1983) does not reveal a HIGH region either.

8.6. The KMSKS Region

The sequence lys-met-ser-lys-ser is located at the centre of the region of greatest homology between ValRS and IleRS (Section 8.4.2). The area around this sequence will be referred to henceforth as the KMSKS region. Hountondji et al (1985) have implicated the second lysine in the sequence as one that interacts with the 3’ end of tRNA for E. coli MetRS. Similarly, the Tyr-tRNA synthetases from B. stearothermophilus and E. coli contain lysine residues within a chemically similar variation on the KMSKS sequence that react with the 3’ end of their cognate tRNA (Bosshard et al, 1978; Hountondji et al, 1986). Consequently, it was of interest to see how many aminoacyl-tRNA synthetases appeared to contain this sequence and whether any inferences could be made for relationships between different synthetases as appear to apply to the HIGH region.

A comparison between 11 aminoacyl-tRNA synthetases is shown in Fig. 8.13. Identical residues are drawn within solid boxes and conservative substitutions, made in accordance with the amino acid similarity matrix of Dayhoff (1969), are depicted by dashed boxes. The sequences have been aligned around the sequence KMSKS that is present in the following enzymes: MetRS, IleRS and TrpRS from E. coli; yeast mitochondrial TrpRS, and TrpRS and ValRS from B. stearothermophilus. Yeast cytoplasmic MetRS and E. coli GluRS possess simple variations on this sequence (KFSKS and KLSKR respectively). The second lysine in the sequence is the only invariant residue for all the enzymes in this comparison. The three Tyr-tRNA synthetases have a different sequence altogether, but maintain two chemically equivalent residues in front of the invariant lysine. Other identities are less significant, for example the conserved glycine in all but the Trp-tRNA synthetases. No real significance should be attached to the fact that the latter enzymes contain a proline in that position as the three Trp-tRNA synthetases share considerable homologies with each other at the level of the primary sequence (Myers and Tzagoloff, 1985).

[Fig. 8.13: sequences aligned, with the numbers of their first and last residues and their sources:
MetRS (S.c. cytoplasmic)    522-534  Walter et al, 1983
MetRS (E.c.)                329-341  Barker et al, 1982b
ValRS (B.st.)               522-534  Brand, this thesis
IleRS (E.c.)                599-611  Webster et al, 1984
TrpRS (E.c.)                192-204  Hall et al, 1982
TrpRS (B.st.)               190-202  Winter and Hartley, 1977
TrpRS (S.c. mitochondrial)  223-235  Myers and Tzagoloff, 1985
GluRS (E.c.)                234-246  Breton et al, 1986 (in press)
TyrRS (E.c.)                223-235  Barker et al, 1982a
TyrRS (B.st.)               219-231  Winter et al, 1983
TyrRS (B.cal.)              219-231  Jones et al, 1986]

The 11-residue homology between IleRS and ValRS in this region has already been mentioned (Section 8.4.2.E). Certain homologies in the sequences for the region shown in the figure appear to link MetRS, ValRS and IleRS together, similar to the pattern that exists for the HIGH region (see previous section). By allowing a single insertion in the yeast MetRS sequence between residues 524 and 525, a glycine may be aligned with E. coli MetRS, IleRS and ValRS, two residues in front of the first lysine in the sequence KMSKS. This glycine is preceded by either asn (E. coli and yeast MetRS) or gln (ValRS and IleRS). The conservation of a glycine three amino acids after the second lysine has been mentioned already; this is followed by any two residues and then val or ile.

Fig. 8.13. (Facing) Alignment of 11 aminoacyl-tRNA synthetases in the KMSKS region. The sequences are aligned about the sequence lys-met-ser-lys-ser or a permissible variation, made according to Dayhoff (1969). The key to the species from which the sequences are derived, and the references shown on the right of the Figure, are given in the legend to Fig. 8.10. Perfect amino acid matches are boxed within unbroken lines; conservative matches are depicted within dashed lines.

One similarity of these alignments is that, despite the disparity in size of the various synthetase protomers, the KMSKS region is located in the C-terminal half of each protomer, close to the mid-point of the protein sequence. The 3-dimensional structures of B. stearothermophilus TyrRS and E. coli MetRS remain the only ones to have been solved so far (Bhat et al, 1982; Zelwer et al, 1982), but results indicate that the structures of the N-terminal halves of each enzyme subunit are similar, both assuming a folding pattern that is characteristic of a Rossmann nucleotide binding fold (Blow et al, 1983). The KMSKS region is located just outside of this region on a loop in both enzymes and has been implicated in binding the terminal adenosine of the 3’ acceptor stem of tRNA (see below).

A lysine immediately preceding the sequence KMSKS is conserved between the Trp-tRNA synthetases and GluRS in this comparison, and an arginine is also present in this position in ValRS and IleRS. The significance of lysine residues in the KMSKS sequence was first noted by Hountondji and co-workers in an attempt to map the binding site for tRNAmet in E. coli MetRS (Hountondji et al, 1985). Early studies had revealed that if tRNAmet was treated with periodate (creating aldehyde groups at the 3’ end of the molecule) and mixed with MetRS, a stoichiometric covalent complex was formed as a result of Schiff base formation between the 3’ aldehyde groups of the modified tRNA and a lysine residue that is positioned near the active site of the enzyme (Fayat et al, 1979). This complex could be stabilised by reduction with sodium cyanoborohydride and isolated (Hountondji et al, 1979). The rate of covalent modification of MetRS was mirrored by the rate of the loss of both charging and pyrophosphate exchange activities for the enzyme (Fayat et al, 1979). Using [14C]-labelled, periodate-treated tRNAmet, the enzyme-tRNA complex was isolated and cleaved with trypsin under controlled conditions (Hountondji et al, 1985). Unlabelled and labelled peptides were separated by molecular sieve chromatography and the labelled peptides were subsequently mapped by 2-dimensional chromatography. The major labelled peptides were isolated and their positions relative to the known protein sequence and structure of MetRS were determined by amino acid analysis and by N-terminal sequencing. Modified lysine residues were identified from the amino acid analyses by the absence of a peak that corresponded to lysine. The study showed that Lys-61 and Lys-335 were labelled preferentially. Lys-61 is found in the N-terminal nucleotide binding fold (presumed to be the ATP binding site and the seat of reaction for aminoacylation) whereas Lys-335 is located in the C-terminal domain (Zelwer et al, 1982) and represents the second lysine in the sequence KMSKS. The abolition of adenylation (as assayed by pyrophosphate exchange) possibly reflects steric hindrance of entry to the active site by the covalently bound tRNA.

This work was recently extended for the MetRS and applied to the E. coli TyrRS (Hountondji et al, 1986). Refinement of the MetRS crystal structure provisionally locates the fifth parallel β-strand of the nucleotide binding fold on one face of the active site. This would place Lys-335, which is located at one end of the β-strand, in close proximity to Lys-61 which is on the other side of the active site (S. Brunie (unpublished), cited in Hountondji et al, 1986). The major labelled peptide of the MetRS-tRNAmet complex comprised residues Ser-334 to Phe-340, but two other lysines are in close proximity to Lys-335; one at position 342 and another at position 332 (the first conserved lysine in the KMSKS sequence). Lys-332 and Lys-342 did not appear to be labelled in the earlier set of experiments (Hountondji et al, 1985), but some weakly labelled peptides were obtained from peptide mapping and it is conceivable that some of them correspond to peptides containing those lysines. This cluster of lysines is reminiscent of a group of three lysines (Lys-225, 230 and 233) in the TyrRS from B. stearothermophilus which may be bound to the 3’ end of tRNAtyr (Bosshard et al, 1978). These residues were shielded from acetylation by tritiated acetic anhydride in the TyrRS-tRNAtyr complex, but not in the enzyme alone. Studies on the interaction of E. coli TyrRS with tRNAtyr identified a group of lysine residues at positions 229, 234 and 237 that were labelled by periodate-modified [14C] tRNAtyr, Lys-234 being the most reactive of the three (Hountondji et al, 1986). These lysines are comparable to those residues of the B. stearothermophilus TyrRS that were protected from acetylation by tRNAtyr in the experiments of Bosshard and colleagues.

The two Tyr-tRNA synthetases share 56% homology with respect to identical residues (Winter et al, 1983) and may be assumed to be structurally homologous as well. Fig. 8.14, redrawn from Hountondji et al (1986) to include the sequence of TyrRS from B. stearothermophilus, shows how the lysines may be aligned on the basis of amino acid homologies. Hountondji and co-workers noted that a threonine and a phenylalanine adjacent to Lys-234 in E. coli TyrRS could be aligned with Thr-339 and Phe-340 in MetRS as shown. The significance of such an alignment is questionable in the light of the comparison of homologies between 11 synthetases shown in Fig. 8.13. However, the observation that the lysine which occupies positions 229 or 335 in the E. coli TyrRS and MetRS respectively, is preceded by two chemically conserved residues (met/ile/leu/val; thr/ser) is valid.

                          *         *     *
                         225       230   233
TyrRS (B.st.)  222  L V T K A D G T K F G K  233

                          *         *     *
                         229       234   237
TyrRS (E.c.)   226  L I T K A D G T K F G K  237

                          *
                         335
MetRS (E.c.)   332  K M S K S R G T _ F I K  342

Fig. 8.14. Alignment of lysine residues implicated in binding the 3’ acceptor stem of tRNA. The Figure, adapted from Hountondji et al (1986), depicts an alignment around the KMSKS sequences for TyrRS from B. stearothermophilus (B.st.) and E. coli (E.c.) in comparison to E. coli MetRS. Lysine residues that are identified as binding the 3’ end of the cognate tRNA (Section 8.6) are marked by an asterisk and are numbered according to their position in the primary sequence. References for the sequences are given in the legend to Fig. 8.10.
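The degree of identity across the short alignment of Fig. 8.14 can be tallied with a few lines of Python. The following is only an illustrative sketch: the function name percent_identity and the convention of skipping gapped columns are assumptions made here, not part of the original analysis, and the three 12-column strings are transcribed from the figure.

```python
def percent_identity(seq_a, seq_b, gap_chars="-_"):
    """Count identical residues between two pre-aligned sequences.

    Columns in which either sequence carries a gap character are skipped.
    Returns (identical, compared, percent).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / compared if compared else 0.0
    return identical, compared, percent


# Aligned segments as drawn in Fig. 8.14 ("_" marks the single-residue gap).
TYRRS_BST = "LVTKADGTKFGK"  # B. stearothermophilus TyrRS, residues 222-233
TYRRS_ECO = "LITKADGTKFGK"  # E. coli TyrRS, residues 226-237
METRS_ECO = "KMSKSRGT_FIK"  # E. coli MetRS, residues 332-342

for label, a, b in [
    ("TyrRS (B.st.) vs TyrRS (E.c.)", TYRRS_BST, TYRRS_ECO),
    ("TyrRS (B.st.) vs MetRS (E.c.)", TYRRS_BST, METRS_ECO),
    ("TyrRS (E.c.)  vs MetRS (E.c.)", TYRRS_ECO, METRS_ECO),
]:
    same, total, pct = percent_identity(a, b)
    print(f"{label}: {same}/{total} identical ({pct:.0f}%)")
```

Over so short a window the figures are only indicative. The two TyrRS segments differ at a single position, whereas each matches MetRS at 5 of the 11 ungapped columns, three of which are lysines.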

The lysine residues are numbered in the figure and an asterisk represents those that are implicated in tRNA binding. Assuming that the refinement of the 3-dimensional structure of MetRS is correct, then such an alignment of lysines would agree with the structure of TyrRS in which the three lysines are located on a loop on the C-terminal side of the fifth parallel β-sheet (Blow et al, 1983; Winter et al, 1983).

The lack of α-helical or β-strand structure in these regions would probably accommodate a single residue insertion or deletion in the MetRS primary sequence. Note that though MetRS does not have a lysine in a position equivalent to residues 230 and 234 in the Tyr-tRNA synthetases (B. stearothermophilus and E. coli respectively), there is a third lysine at position 332. Similarly, though ValRS, IleRS and TrpRS do not possess a third lysine on the C-terminal side of the KMSKS box, there is a third lysine (or a conservative replacement by arginine) immediately adjacent to the N-terminal side of the box (Fig. 8.13).

The finding that E. coli TyrRS (and by analogy, B. stearothermophilus TyrRS) and MetRS possess clustered reactive lysines in an area of considerable structural identity, despite considerable sequence variation, suggests that the KMSKS region may represent a site for binding the 3’ end of tRNA that is common to all of the synthetases shown in Fig. 8.13.

A recent protein engineering study of the B. stearothermophilus TyrRS has indicated that certain lysine and arginine residues on the surface of both subunits of the dimer interact with different parts of tRNAtyr, apparently holding the 3’ acceptor arm of the tRNA in a fixed position relative to the active site (Bedouelle and Winter, 1986). Site-directed mutagenesis of the tyrS gene, cloned in M13, was used to mutate 40 surface acidic groups and the mutant phage were identified by DNA sequencing. They were then used to infect a male (i.e., possessing pili and thus amenable to infection by M13) E. coli strain (HB2111) that contains a temperature-sensitive lesion in tyrS, and screened for the inability to complement the tyrS mutation at the non-permissive temperature. Seven mutants were poorly able or unable to complement the temperature-sensitive strain. Two of the latter mutants contained substitutions of Asn for Lys at positions 230 and 233 respectively. The positions of these lysine residues in the 3-dimensional structure lie close to the presumed "docking" site for the 3’ end of the tRNA. Lys-225, which aligns with Lys-335 of E. coli MetRS in Fig. 8.14 and was protected from acetylation by tRNA (Bosshard et al, 1978), may be mutated to alanine without impairing the ability of the mutant synthetase to complement HB2111. However, Hountondji et al (1986) demonstrated that Lys-234 was the most reactive lysine in E. coli TyrRS; this aligns with Lys-230 in the B. stearothermophilus enzyme (Fig. 8.14). Given that the two Tyr-tRNA synthetases share exceptional primary sequence homology, these two lysines presumably lie in equivalent positions, i.e., in the case of the B. stearothermophilus TyrRS, on one face of the active site cleft. This would also be equivalent to the position of Lys-335 in E. coli MetRS, both residues being located at the end of the 5 strand. Though the Lys -> Asn mutations at positions 230 and 233 in B. stearothermophilus TyrRS showed diminished charging rates compared to wild-type TyrRS (Bedouelle and Winter, 1986), the interpretation of kinetic results for tRNA-binding mutants is not clear-cut and throws doubt on the validity of the complementation assay in the first place. Recent studies on these particular mutations indicate that they have a debilitating effect on the rate of adenylation as well (Jack Knill-Jones, unpublished results). Such a mutant would probably not be able to complement HB2111.

8.7. Conclusions on Common Elements of Structure Between Synthetases.

The analysis of various sequence homologies presented in this chapter suggests that certain structural elements may be common to some of the synthetases that have been compared, but not to all synthetases. Most notably, E. coli AlaRS, the first synthetase for which the primary structure was determined (Putney et al, 1981a), does not show any significant homology to any other synthetase. Nor can any significant homologies be found by comparing any of the synthetases in Fig. 8.10 or 8.13 with E. coli HisRS (Freedman et al, 1985), PheRS (Fayat et al, 1983) or GlyRS (Webster et al, 1983). The latter authors made an alignment between the α-subunit of GlyRS and AlaRS, TyrRS, GlnRS and TrpRS (all from E. coli). However, the homology was extremely dubious, since the alignment required a number of single amino acid insertions and matched disparate regions of the different synthetases.

The most recent published comparison of aminoacyl-tRNA synthetase primary sequences is by Breton et al (1986). The authors compared homologies between GluRS and GlnRS from E. coli with all other available E. coli aminoacyl-tRNA synthetase sequences. The flaws in this comparison, which aimed to discover any evolutionary links between GluRS and GlnRS, were to compare GlnRS to GluRS first and then compare any homologies that were found to other synthetases, and to ignore B. stearothermophilus and yeast sequences. As a consequence, the match between GluRS and the KMSKS region was overlooked (Fig. 8.13). GluRS is the smallest of the E. coli synthetases yet reported (471 residues) and may be tentatively assigned to a sub-family of synthetases that also contains GlnRS and ArgRS. All three enzymes from E. coli exist as small monomers that have to bind the appropriate tRNA before the enzyme can adenylate the cognate amino acid (Soll and Schimmel, 1979). Four main segments of homology were found between GlnRS and GluRS. The first two were the areas of strongest homology and were co-linear over about 100 residues in the N-terminus of each enzyme. The two sequences may be compared over this region and about 40% identity found, without introducing any insertions to align the sequences (NJB, unpublished observations). Additionally, the region closest to the N-terminus, which contains the HIGH sequence, is 75% homologous over a 17-residue span.

This implies that GlnRS and GluRS are closely related. Breton and colleagues suggested that this homology might represent structural constraints placed upon conserving a binding site for ATP as the 100-residue region would be sufficient to form a single folding domain. Bacterial TyrRS and MetRS, which are considerably larger than GluRS or GlnRS and exist as dimers in vivo, contain about 200 residues in the ATP binding site, so the smaller size proposed for the ATP binding site in the latter two enzymes is quite likely. A further claim in Breton’s paper, that a similarity between TrpRS and TyrRS in the N-terminal halves of the two proteins may constitute the specificity site for discriminating between phenylalanine and tyrosine, is unlikely. The region of the TyrRS that was proposed (residues 8-26 in the E. coli sequence) probably forms part of the first two α-helices and the short antiparallel β-strand that constitute part of the ATP binding site in the B. stearothermophilus TyrRS. Protein engineering studies that have been carried out on the latter enzyme indicate that TyrRS discriminates against phenylalanine through hydrogen bonding of the tyrosine side-chain hydroxyl to Tyr-34 and Asp-176 (Fersht et al, 1985a).

Fig. 8.11 indicates how residues that are distributed throughout a polypeptide may be brought together to form part of an active site when the protein has folded into its tertiary configuration. This is a common trend amongst proteins, for example, the asp-ser-his catalytic triad in the serine proteases (Blow et al, 1969).

The similarities between the B. stearothermophilus ValRS, E. coli IleRS and MetRS, both at the levels of primary sequence (IleRS and ValRS; IleRS and MetRS) and tertiary structure (conservation of the mononucleotide binding fold in E. coli MetRS compared to B. stearothermophilus TyrRS and homologous location of HIGH residues; conservation of the HIGH region towards the amino-terminus of all the synthetases that have been compared) imply that IleRS, ValRS and MetRS may be considered as members of the same aminoacyl-tRNA synthetase sub-family. All these enzymes possess editing mechanisms. The IleRS (Mr = 105,000; Webster et al, 1984) and ValRS (Mr = 102,000; Section 7.2) have a monomeric tertiary structure, whereas E. coli MetRS is dimeric (Mr = 76,000; Dardel et al, 1984). The cytoplasmic MetRS from yeast is a monomer of Mr = 85,000 in contrast to the E. coli enzyme. The yeast enzyme contains an extra 200 residues at its N-terminus whilst lacking about 120 amino acids from the C-terminus which, in E. coli, contain residues that are responsible for dimerisation (Fasiolo et al, 1985). Mild proteolysis with trypsin cleaves about 200 residues from the C-terminus of E. coli MetRS, generating a truncated enzyme which exists as a monomer in vivo (Cassio and Waller, 1971). Walter et al (1983) proposed that the two MetRSs are related with regard to the topology of the N-termini, as the N-terminal 300 residues of the E. coli MetRS form a nucleotide binding (Rossmann) fold and yeast MetRS is particularly homologous to its E. coli counterpart in this region. Supporting evidence came from the identification of a conserved glycine at position 214 in the yeast primary sequence that corresponded to Gly-23 in the bacterial enzyme. This glycine occupies similar positions in the 3-dimensional structures of E. coli MetRS and liver alcohol dehydrogenase (ADH). The position of the glycine was judged to be significant as ADH has a Rossmann fold and all enzymes that are structurally equivalent to it have this particular glycine located between the first parallel β-strand and the adjacent α-helix on the C-terminal side (Rossmann and Argos, 1977). This glycine is the invariant HIGH sequence glycine in all of the synthetases that have been compared in Table 8.10. However, despite the strong evidence for structural homology between the two MetRSs, the two proteins are not immunologically cross-reactive (Fasiolo et al, 1985), indicating that if the enzymes are structurally related, those elements are unlikely to be on the surface. Perhaps the structural diversity between different synthetases reflects a tendency to conserve certain domains or binding residues (for example, a mononucleotide binding fold or surface contact points for tRNA) on a heterologous framework.

The extent of homology for a particular enzyme from different sources has already been alluded to in an earlier section. For example, MetRS from E. coli and yeast share 44% homology over a 400 residue alignment (Walter et al, 1983), and the homology plot shown in Fig. 8.6a, coupled with the location of the HIGH sequence (Fig. 8.10), implies that the yeast enzyme gained 200 residues at the N-terminus in the course of its evolution. One of the more striking examples of conservation between different species was reported from our laboratory (Jones et al, 1986). The Tyr-tRNA synthetases from B. stearothermophilus (a moderate thermophile) and B. caldotenax (an extreme thermophile) are homologous to a degree of 99%. Such findings argue in favour of divergent evolution from a common ancestor for at least some synthetases. The conclusions that have been drawn in previous sections imply that the IleRS and ValRS may have evolved from a common ancestor in such a way, despite the dissimilarities between the two bacteria. One of the striking facts about the alignment of the ValRS and IleRS amino acid sequences (Fig. 8.9) is that some regions are highly conserved, such as that surrounding the KMSKS sequence in the C-terminal half of each protein. This particular sequence is at the centre of a region that represents the most extensive region of homology between two different synthetases so far reported: an 8-residue oligopeptide that includes KMSKS is surrounded by small blocks of perfect matches, the extent of homology diminishing the further one looks either side of the KMSKS sequence. The findings of Hountondji and colleagues indicate that this region, rich in lysine and arginine residues, binds tRNA in MetRS and TyrRS from E. coli (Section 8.6). The extent of the homologies implies that certain synthetases possess a high degree of similarity with regard to particular sequences. The evolutionary constraints on an enzyme must be such that a number of point mutations are tolerable, but only up to a level where the catalytic function of the enzyme becomes impaired. The selective pressure to minimise mutations affecting residues that participate directly in catalysis (e.g. those involved in binding the substrate(s) or in binding the transition state intermediates) must be greater than for less important residues. Thus, certain areas in an enzyme would be likely to be more rigorously conserved than others. The KMSKS peptide could well be one such region.

CHAPTER 9

FUTURE PROSPECTS FOR PROTEIN ENGINEERING OF ValRS

9.1. Site-directed Mutagenesis

The B. stearothermophilus valS gene has already been cloned as a 3.6 kb PstI fragment into the single-stranded phage vector M13 (Section 5.5). A set of oligonucleotide primers was synthesised so that the entire DNA sequence of valS could be deduced from the coding strand (Table 6.1). These primers, spaced roughly 250 bp apart, would also serve for sequencing any valS mutants that are created by site-directed mutagenesis. Unfortunately, the full-length clones amenable to such sequencing belong to the Class I grouping (Section 5.5) which lack the valS promoter (see Section 7.2). The first valS mutants have been constructed in a typical Class I clone. These mutants are currently being expressed by subcloning the fragment back into an agarose gel-purified vector portion of PstI-cut pTB8 (i.e., pTB8 lacking the 3.6 kb PstI fragment). This plasmid expresses wild-type ValRS particularly well (Fig. 5.2).

The following mutations have been made and await kinetic analysis: His -> Asn 56 and Thr -> Ala 52. These are located in the HIGH region (Fig. 8.10) and, we presume, are analogous to Thr-40 and His-45 in the B. stearothermophilus TyrRS, residues that bind the pyrophosphate leaving group of ATP in the transition state for tyrosyl adenylate formation (Leatherbarrow et al, 1985). Potentially, His-56 and Thr-52 could provide the same hydrogen bonding groups for binding the pyrophosphate, in which case their removal would implicate the HIGH region of ValRS in ATP binding.

9.2. Deletion Mutagenesis and Other Studies

Though less refined than a site-directed mutagenesis strategy for dissecting the function of ValRS, deletion mutagenesis could still yield some important information with regard to the functional organisation of the enzyme. There are suitable unique restriction sites (for example, see Fig. 4.8) so that sections of varying length could be removed from the C-terminus. Alternatively, the exonuclease III/nuclease S1 method of Henikoff (1984) could be used to generate deletions precisely. Deletion mutagenesis of E. coli AlaRS has yielded some relevant information pertaining to the C-terminal boundary of the adenylation domain (Jasin et al, 1983).

Such an approach is not really applicable to locating residues involved in tRNA binding, as the residues for binding tRNA may well be distributed throughout the length of the polypeptide, being brought together as the polypeptide folds into the tertiary structure. Surface lysines and arginines implicated in binding tRNAtyr to the TyrRS from B. stearothermophilus offer such an example (Bedouelle and Winter, 1986).

Jasin and colleagues did pinpoint a block of residues at positions 385-461 in the AlaRS protomer, essential for charging, by deletion mutagenesis. This was achieved by assaying for loss of complementation of a temperature-sensitive E. coli alaS mutant (Jasin et al, 1983). A better method may be to use "tab-linker" mutagenesis. This technique, originally developed by Barany (1985), involves inserting small oligonucleotide cassettes, containing a restriction enzyme recognition site for selection and coding for two amino acids, into appropriate restriction sites in the cloned gene in question. Thus, two amino acid residues can be introduced into a protein sequence in order to try to affect a particular activity without causing major structural perturbations. This would be a good way of locating residues that are involved in the editing mechanism of ValRS. At present, the best candidates for tRNA binding and editing would be areas of the synthetase that are particularly homologous to the IleRS.
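The principle of a two-codon linker insertion can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not a reproduction of Barany's linker design: the coding fragment, the 6-bp linker (here carrying a GGATCC recognition site) and the insertion point are all hypothetical, and the helper names translate and insert_linker are introduced only for this example. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def insert_linker(dna, position, linker):
    """Insert a linker whose length is a multiple of 3, so that the
    downstream reading frame is preserved."""
    if len(linker) % 3 != 0:
        raise ValueError("linker length must be a multiple of 3")
    return dna[:position] + linker + dna[position:]


gene = "ATGGCACAGCACGAA"   # hypothetical coding fragment
linker = "GGATCC"          # hypothetical 6-bp cassette (two codons)
mutant = insert_linker(gene, 6, linker)  # insert at a codon boundary

print(translate(gene))    # MAQHE
print(translate(mutant))  # MAGSQHE
```

Because the cassette is six nucleotides long, the downstream sequence stays in frame and the product simply gains two residues. An insertion that does not fall on a codon boundary would also alter the flanking codon, which is one reason the choice of restriction site matters.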

Ultimately, it would be hoped that the ValRS could be persuaded to crystallize, allowing a 3-dimensional structure to be determined for future protein engineering studies.

APPENDIX 1

ABBREVIATIONS

aaRS        aminoacyl-tRNA synthetase
ATP         adenosine 5’-triphosphate
BBOT        4.5% 2,5-bis(5’-tert-butyl-benz-oxazolyl-(2))thiophene in 3:1 toluene:2-methoxy-ethanol
Bis-Tris    Bis[2-hydroxyethyl]amino-tris-[hydroxymethyl]methane
bp          base pair
cpm         counts per minute
dATP        deoxyadenosine 5’-triphosphate
dCTP        deoxycytidine 5’-triphosphate
ddATP       dideoxyadenosine 5’-triphosphate
ddCTP       dideoxycytidine 5’-triphosphate
ddGTP       dideoxyguanosine 5’-triphosphate
ddTTP       dideoxythymidine 5’-triphosphate
DEAE        diethylaminoethyl
dGTP        deoxyguanosine 5’-triphosphate
DMF         N,N-dimethyl-formamide
dNTP        deoxynucleotide 5’-triphosphate
DTT         dithiothreitol
dTTP        deoxythymidine 5’-triphosphate
EDTA        ethylene diamine tetra-acetic acid
FPLC        fast protein liquid chromatography
IPTG        β-isopropyl-thio-galactoside
kb          kilobase
KP          potassium phosphate
LB          Luria broth
2-ME        2-mercaptoethanol
Mr          relative molecular mass
nt          nucleotide
ORF         open reading frame
PAGE        polyacrylamide gel electrophoresis
PMSF        phenylmethane sulphonyl fluoride
PPO/POPOP   0.5% 2,5-diphenyloxazole and 0.03% 1,4-di-(2-(5-phenyl oxazolyl))-benzene in toluene
RBS         ribosome binding site
SDS         sodium dodecyl sulphate
SE-gene     strongly expressed gene
TCA         trichloroacetic acid
TEMED       N,N,N’,N’-tetramethyl-ethylenediamine
Tris        2-amino-2-(hydroxymethyl)-propane-1,3-diol
TY          tryptone, yeast extract
WE-gene     weakly expressed gene
X-gal       5’-bromo,4’-chloro,3’-indolyl β-D-galactoside

APPENDIX 2

LIST OF THE COMPLETE DNA SEQUENCE DATABASE

The following section contains the complete database for the valS sequencing project and is depicted in the 5’ —> 3’ direction. The sequences of each individual gel reading are shown, together with a consensus sequence derived from regions of overlap between contiguous gel readings (bottom row in each group of overlapping sequences). The sequence was assembled using the programs DBUTIL (Staden, 1980) and DBAUTO (Staden, 1982b), run on a Vax 11/750 mainframe in the Department of Biophysics, Imperial College, London. The sequence is numbered with 60 characters per row. The first 1 kb of sequence, which is not relevant to the work that has been presented in this Thesis, is not shown. The left-hand columns list the identification name for each individual gel reading. A minus sign next to the gel name indicates that the gel reading had to be reversed in order to align it with the consensus sequence (i.e., the sequence of that reading was from the opposite strand to the one denoted by the consensus). Certain ambiguity codes of Staden (1979) were followed:

Symbol    Meaning

1         possibly C
2         possibly T
3         possibly A
4         possibly G
B         A or AA
D         C or CC
H         G or GG
V         T or TT
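How a consensus row can arise from overlapping gel readings that contain these uncertainty codes may be sketched in Python. This is a simple majority vote written for illustration only. It is an assumption, not the rule actually applied by DBUTIL or DBAUTO, and the names PROBABLE_BASE and consensus are introduced here. The three readings are those listed for positions 1000-1049 of the database.

```python
from collections import Counter

# Uncertainty codes from the table above, each mapped to the single base
# it most probably represents (an assumption made for this sketch).
PROBABLE_BASE = {
    "1": "C", "2": "T", "3": "A", "4": "G",
    "B": "A", "D": "C", "H": "G", "V": "T",
}


def consensus(readings):
    """Majority-vote consensus of gel readings that share a common start.

    Spaces, padding characters (*) and dashes cast no vote; a column with
    no votes is written as '-'. Ties go to the base seen first.
    """
    width = max(len(reading) for reading in readings)
    result = []
    for col in range(width):
        votes = Counter()
        for reading in readings:
            if col >= len(reading):
                continue  # this reading has ended
            symbol = reading[col]
            if symbol in " *-":
                continue
            votes[PROBABLE_BASE.get(symbol, symbol)] += 1
        result.append(votes.most_common(1)[0][0] if votes else "-")
    return "".join(result)


readings = [
    "GCGA1GCCGCGDAGAAGGAAGA",                              # V19A2
    "GCGACGCCGCGCAGCAGGAAGAGGCGATGGATAATGAACAGCCGGCGGAT",  # V19S10
    "GCGACGCCGCGCAGCAGGAAGAGGCGATGGATAATGAADAGCCGGCGGAT",  # V19S10RE
]
print(consensus(readings))
```

For these three readings the vote reproduces the consensus row printed in the database, including the position where V19A2 reads A against C in the other two gels.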

Padding characters (*) were inserted into the individual gel readings by DBAUTO in the course of aligning some sequences (for example, a region AAB would be entered as AAB* in order to align it with AAAA from another gel reading). An unidentified base in the consensus sequence is denoted by a dash (-).

PROJECT NAME =VALBST COPY NUMBER

FOR 100Z DISPLAY CONSENSUS TYPE Y

SELECT OPTION BY NUMBER
STOP=0,ENTER=1,PRINT=2,DISPLAY=3,JOIN=4,
COMPLEMENT=5,EDIT=6,SEARCH=7,
FIX=8,COPY=9,CHECK=10,SCAN=11,CONSENSUS=12,
DISK OUTPUT=13
OPTION NUMBER = 3
NUMBER OF LEFT GEL THIS CONTIG =2
DEFINE REGION
RELATIVE POSITION OF LEFT END(DEF= 1) =1000
RELATIVE POSITION OF RIGHT END(DEF= 5021) =

                      1009      1019      1029      1039      1049
  19 V19A2      GCGA1GCCGCGDAGAAGGAAGA
  40 V19S10     GCGACGCCGCGCAGCAGGAAGAGGCGATGGATAATGAACAGCCGGCGGAT
  78 V19S10RE   GCGACGCCGCGCAGCAGGAAGAGGCGATGGATAATGAADAGCCGGCGGAT
                GCGACGCCGCGCAGCAGGAAGAGGCGATGGATAATGAACAGCCGGCGGAT

1059 1069 1079 1089 1099 40 V19S10 GAAAGCAGCAATGGTTAAATGGATTTTGAATDAAGCCCAAACCGGCGDGC 78 V19S10RE GAAAGCAGCAATGGTTAAATGGATTTVGAATCCAGCCCCAACCGGDHCAC -160 VCS68RE AGCCCCAADAHACGCGC GAAAGCAGCAATGGTTAAATGGATTTTGAATCCAGCCCCAACCGGCGCGC

1109 1119 1129 1139 1149 40 V19S10 ATAAATCCATTCDAAATGGAGCGCDAGGCAATGAAACGAGATGGCCAATA 78 V19S10RE ATAAATCCATT -160 VCS68RE ATAAATCCATACCAAATGGAGCGCCAGGCAATGAAACGAGATGGCCAATA ATAAATCCATTCCAAATGGAGCGCCAGGCAATGAAACGAGATGGCCAATA

1159 1169 1179 1189 1199 40 V19S10 AATGACAT -160 VCS63RE AATGACATTCAAACGTCGTTGGCAGAAAAATGGTGCGGATCGCTTGAAAA -162 V336RE TCAAACGTCGTVGGDAHAAAAATGAT4CGGATCGCTTGAAAA -163 VS43RE CGGATCGCTTGAAAA AATGACATTCAAACGTCGTTGGCAGAAAAATGGTGCGGATCGCTTGAAAA

1209 1219 1229 1239 1249 -160 VCS63RE ATGGTAGCGGGATGATCACTAGGGCGCAGACGGCGCGCACVTGCCGAAGC -162 VS36RE ATGGTAGCAHGATGATCACTTGGACGCAGACAHCGCGCACTTGCCGAAGC -163 VS43RE ATGGTAHCAGHATGATCACTTGGGCGCAGACGGCGCGCACTTGCCGAAGC -161 VS25RE ATGGTA4CGGGATGATCACTTGGGCGCAGACGGCGCGCACTTGCC4AAGC -164 SMA4RE GCGCACTTGCCGAAGC ATGGTAGCGGGATGATCACTTGGGCGCAGACGGCGCGCACTTGCCGAAGC

1259 1269 1279 1289 1299 -160 VCS68RE CAAGGCGGCCA -162 VS36RE CAAGGCGGCCATGAATACGGGTTGAATCGCCGCATCG GTGTTCCCTCCTT -163 VS43RE CAAGGCGGCCATGAATACGGGTTGAATCGCCGCATCG GTGTTCCCTCCTT -161 VS25RE CAAGGCGGCCATGAATACGGGVGTABTCGCCGCATCG GTGTTCCCTCCTT -164 SMA4RE CAAGGCGGADTVAAATACGHATTGAATCGCCGCATCG GTGTTCCCTCCTT CAAGGCGGCCATGAATACGGGTTGAATCGCCGCATCG GTGTTCCCTCCTT

1309 1319 1329 1339339 1349 -162 VS36PE CTCCGGTTTACTATACAATAT4AAACGTGCGGTGG -163 VS43RE CTCCGGTTTACTATACAATA -161 VS25RE CTCCGGVTTACTATACAATAT -164 SMA4RE CTCCGGTTTACTATACAATATGAAACGT4CGGTGGTTTGTCACCAAATCC -207 CM A1 ACGTGCGGTGGTTTGTCACCAAArGC CTCCGGTTTACTATACAATATGAAACGTGCGGTGGTTTGTCACCAAATCC

1359 1369 1379 1339 1399 -164 SMA4RE A C A C G G A T G G A A T A A A T A A A T A A A A A C T T G G C A A A G A T G A T G G A C A A T G G -207 SMA1 A C A C G G A T G G A A T A A A T A A A T A A A A A C T T G G C A A A G A T G A T G G A C A A T G G A C A C G G A T G G A A T A A A T A A A T A A A A A CfT T G G C ^ jA A G A T G A T G G A C A A T G G

1409 1419 1429 1439 1449 -164 SMA4RE TAGTATAATAATCGAATCATGACGACAAAAGCGAAGACGGGGAGGAGTAC -207 SMA 1 TAGTATAArAArCGAATCATGACGACAAAAGCGAAGACGGGGAGGAGTAC -210 VN IG2CS ^__AATAATCGAATCATGACGACAAAAGCGAAGACGGGGAGGAGTAC TAG(TArAArhATCGAATCATGACGACAAAAGCGAAGACGGGGAGGAGTAC 267

1459 1469 1479 1489 1499 SMA4RE AGCGGTCCGCGACTGCAG 207 SMA1 AGCGG 210 VNIG2CS AGCGGTCCGCGCCTGCAGAGAGGGAAATCAGAAGCTGCGAGATTTCCTAG 97 VTB1TAM TGCAGAGAGGGAAATCAGAAGCTGCGAGATTTCCTAG 184 VTAM29END GCAGAGAGGGAAAl'CAGAAGCTGCGAGATTTCCTAG 154 VTAM91 CAGAGAGGGAAATCAGAAGCTGCGAGATTTCCTAG 201 VB3 AGAGAGGGAAATCAGAAGCTGCGAGATTTCCTAG 94 0MTB8 GGGAAATCAGAAGCTGCGAGATTTCCTAG 185 VTAM100 AGAAGCTGCGAGATTTCCTAG 194 VBS7 TTTCCTAG AGCGGTCCGCGCCTGCAGAGAGGGAAATCAGAAGCTGCGAGATTTCCTAG

1509 1519 1529 1539 1549 210 VNIG2CS GATGACGGCTGCTGGAAGGTCG 97 VTB1TAM GATGACGGCTGCTGGAAGGTCGCCCTT4AGCTGCTTCTTTGAAAGGCGCT 184 VTAM29END GATGACGGCTGCTGG 154 VTAM91 GATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGCGCT 201 VB3 GATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGCGCT 94 VMTB3 GATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGCGCT 185 VTAM100 GATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTAGAAAGGCGCT 194 0 BS7 GATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGCGCT 29 0CS46 TTGAGCTGCTT1TTTGAAAGGCHCT GATGACGGCTGCTGGAAGGTCGCCCTTGAGCTGCTTCTTTGAAAGGCGCT

1559 1569 1579 1539 1599 97 VTB1TAM GCTGCCGAGTAGAGGAAGCCGGCAT1ATGCCGTTATGCCAATGAAGCGCT 154 VTAM91 GCTGCCGAGTAGAGGAAGCCGGCATCATGCCGITATGCCAATGAAGCGCT 201 VB3 GCTGCCGAGTAGAGGAAGCCGGCATCATGCCGTTATGDAAATGAAGCGAT 94 VMTB8 GCTGCCGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCT 185 VT AM100 GDTGCCGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCT 194 VBS7 GCTGCCGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCT 29 VCS46 GCTGCCGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCT 115 TAM1 CCGGCATCATGCCGTTATGCCAATGAAGCGCT -62 VCS168A TGAAACGCT GCTGCCGAGTAGAGGAAGCCGGCATCATGCCGTTATGCCAATGAAGCGCT

1609 1619 1G29 1639 1649 97 0TB1TAM TAAGCTTCTAGAAGGCCGGTCAACTAAACTGATTGTGGGCTCGCCAGCCG 154 0TAM91 TAAGCTTCTAGAAGGCCGGTCAACTAAACTGATTGTGGGCTCGCCAGCCG •201 VB3 TAGGC 94 0MTB8 TAAGCTTCTAGAAGGCCGGT1AA1TAAA1TGATTGTGGGCTCGCCAGCCG 185 VTAM100 TAAGCTTCTAGAAGGCCGGTCAACTAAAATGATTGTGHACTAGCAAGCAG 194 VBS7 TAAGCTTCTAGAAGGCCGGTCAACTAAACTGATTGTGGGCTCGCCAGCCG 29 9CS46 TAAGCTTCTAGAAGGCCGATCA 115 TAM1 TAAGCTTCTAGAAGGCCGGTCAACTAAACTGATTGTGGGCTCGCCAGCCG -62 VCS168A TAAHCTTCTA4AAHACCGGTCAACTAAACT4ATT4TGGGCTCGCCAGCCG -149 0TAM81 AGAAGGCCGGTCAACTAAACTGAAVGTAGGCTCGCCAGDCG TAAGCTTCTAGAAGGCCGGTCAACTAAACTGATTGTGGGCTCGCCAGCCG

1659 1669 1679 1689 1699 97 VTB1TAM ATGTTTCACCGGTGGGTTGGCCGTGTTGCGGTT1TCTTCATCACCAGCCT 154 VTAM91 ATGTTTCACCGGTGGGTTGGCCGTGTTGCGGTTCTCTTCATCACCAACCT 94 VMTB8 ATGTTTCACCGGTGGGTTGGCCGTGTTGCGGTT1TATT 185 VTAM100 AAG 194 VBS7 ATGTTTCACCGGTGGGTTGGCCGTGTTGCGGTTCTCTTCATCACDAGCCT 115 TAM1 ATGTTACACC -62 VCS168A ATGTTTCACCGGTGGGTTGGCCGTGTTGCGGTTCTCTTCATCACCAGCCT -149 VTAM81 ATGTTTCACCHATAHGTTGGCCGTGTTGCGGTTCTCTTCATCACCAGCCT -71 VCS168 GTTGGCCGTGTTGCGGTTCTCTTCATCACCAGCCT -174 VKB14RE GDCGTGTTGCGGTTCTCTTCATCACCAGCCT -171 VTAM81RE TGTTGCGGTTCTCTTCATCACCAGCCT 7 9S30.DAT AATTCTCTTCATCACCAGCCT 6 VS3.DAT TCTTCATCACCAGCCT ATGTTTCACCGGTGGGTTGGCCGTGTTGCGGTTCTCTTCATCACCAGCCT 268

1709 1719 1729 1739 1749 97 VTB1TAM TCVAGGCTTAAGAAAAAAGGTHATACCGCGAAAGCAG 154 0TAM91 TCTAGGDTTAAGAAAABAGGTGGTACCGCGAAAG1AGCTCCTTTCGTCCT 194 VBS7 TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAG -62 VCS168A ACTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAGCA 149 0TAM81 TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAGCAGCTCCTTTCGTCCT -71 0CS168 TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAA4CAG 174 0KB14RE TCTAGGCTTAAGAAAAAAHATGGTACCGCGAAAGCAGCT-D-OACA2CCT 171 0TAM81RE TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAGCAGCTCCTTTCGTCCT 7 0S30.DAT TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAGAAGCTCCTTTCGTCCT 6 0S3.DAT TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAGCAGDTCCTTTCGTCCT 41 0CS128 GGTGGTACCGCGAAAGCAGCTCCTTTCGTCCT 152 0TAM44 GGTGGTACCGCGABAAAACCTCCTTTCGTCCT 93 OKI 400.2A CGAAAGCAGCTCCTTTCGTCCT TCTAGGCTTAAGAAAAAAGGTGGTACCGCGAAAGCAGCTCCTTTCGTCCT

1759 1769 1779 1789 1799 154 0TAM91 TTOCGGATGAAAGGGACTTTTOAGCTTGCCGGCTGCGCGGCATTGCTTAG 149 0TAM81 TTTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCATTGCTTAG 174 0KB14RE TTTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCATTGCTTAG 171 0TAM81RE TTACGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCATTGCTTAG 7 0S30.DAT TTTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCATTGCTTAG 6 0S3.DAT TTTCGGATGAAAGGGGCTTTTTTGDTTGCCGGCTGGGCGGCATTGCTTAG 41 0CS128 TTTCGGATGAAAGGGGCTITTTTGCTTGCCGGCTGCGCGGA 152 0TAM44 TTTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCATTGCTTAG 93 0K1400.2A TTTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCG1GGDATTGCTTAG 51 0CS28E GGGCTTTTTTGCTTGCCGGCTGCGAGGCATTGCTTAG TXTCGGATGAAAGGGGCTTTTTTGCTTGCCGGCTGCGCGGCATTGCTTAG

1809 1319 1829 1339 1849 154 0TAM91 TGGABAG 149 0TAM31 TGGGAAGCCGCCGTCGCCGCATTGTCCATCATGTTGAAGGAGGGAAATGA 174 0KB14RE TGGGAAGCCGCCGTCGCCGCATTGTCCATCATGTTGAAGGAGGGAAATGA 171 0TAM81RE TGGGAAGCCGCCGTCGCCGCATTGTCCATCATGTTGAAGGAGGGAAATGA 7 0S30.DAT TGGGAAGDAGCCGOAACCGA 152 0TAM44 TGGGAAGCCGCCGTCGCCGCATIGTCCATCATGTTGAAGGAGGGAAATGA 93 OK1400.2A TGGGAAGCCGCCGTCGCCGCATTGTCCATCATGTTGABGGAGGGAAATGA 51 0CS28E TGGGAAGCACGCGTC 148 0KB14 BA4DA4CC4TCGCCGCA0TGTCCATCATGTTGAAGGAGGGAAATGA TGGGAAGCCGCCGTCGCCGCATTGTCCATCATGTTGAAGGAGGGAAATGA

1859 1869 1879 1889 1899 149 OT AMS 1 AACATGGCACAGCACGAAGTHTCGATGCCACCCAAATACGATCATCGCGC 174 0KB14RE AACATGGCACAGCACGAAGTGTCGATGCCACCCAAATACGATCATCGCGC 171 0TAM81RE AACATGGCACAGCACGAAGTGTCGATGCCACCCAAATACGATCATCGCGG 152 0TAM44 AACATGGCACAGCACGAAGTGTCGATGCCACCCAAATA1GATCATCGCGC 93 OK1400.2A AACATGGCA1AGCACGAAGTGTCGATGCCADAAAAATA1GATCATCGCGC 148 OKB14 AACATGGCA1AGCACGAAGTGTCGATGCCACCCAAATACGATCATCGC 187 ONTERM GGCACAGCACG— GT— CAATGCCACCCAAATACGATCATCGCGC -26 0UNCN1 ATAGAAACCACCCAAATACGATCATCGCGC AACATGGCACAGCACGAAGTGTCGATGCCACCCAAATACGATCATCGCGC

1909 1919 1929 1939 1949 149 OT AMS 1 CGTTGAAGCCGGGCGCTACGAATGGTGGCTGAAA •174 0KB14RE AGT •171 0TAM31RE CGTTGAAGCCGGGCGCTACGAA 152 0TAM44 CGTTGAAGCCGGGCGCTACGAATGGTGGCTGAAAGGAAAATTTTTTGAAG 93 OK 1400.2A CGTTGAAGCCGGGCGiTACGAATGGTAHCTGAAAGGAAAATTTTT 187 ONTERM CGTTGAAGCCGGGCAATACGAATGGTGGCTGAAAGGAAAATTTTTTGAAG -26 0UNCN1 CGTTGAAGCCGGGCGCTACGAATGGTGGCTGAAAGGAAAATTTTTTGAAG CGTTGAAGCCGGGCGCTACGAATGGTGGCTGAAAGGAAAATTTTTTGAAG

1959 1969 1979 1989 1999 152 OT AM 4 4 CGACCGGTGATCCGAACABACGACCGTTTACGATCGTCATCCCGCCGCCC 1S7 ONTERM CGACCGGTGATCCGAACAAACGACCGTTTACGATCGTCATCCCGCCGCCC -26 0UNCN1 CGACCGGTGATCCGAACAAACGACCGTTTACGATC -173 TAM7SRE TCATCC-AC-AACC CGACCGGTGATCCGAACAAACGACCGTTTACGATCGTCATCCCGCCGCCC 269

2009 2019 2029 2039 2049 152 VTAM44 AACGTCACCGGCAAATTGCACTVAGGGCATGCGTGGGA 187 VNTERM AACGTCACCGGCAAATTGCACTTAGGGCATGCGTGGGATACGACGCTGCA 173 TAM78RE AACATCACCGACAAATAACACVAA6GGCAT4C4TAHGATACGACGCT4CA 107 VRES5B GGCATGCGTAGGATACGACGCTGCA 129 VEX04A TACGACGCTGCA 112 EXQ2PART TOGA AACGTCACCGGCAAATTGCACTTAGGGCATGCGTGGGATACGACGCTGCA

2059 2069 2079 2089 2099 187 VNTERM AGACATCATTACGCGCATGAAGCGGATGCAAGGGTATGACGTTTTGTGGC 173 TAM78RE AGACATCATTACGCGCATAAAACAGATGCAAAHGTATGAC4TTTTATGGC 107 VRES5B AGACATCATTACGCGCATGAAGCGGATGCAAGGGTATGACGTTTTGTGGC 129 VEX04A AGACATCATTACGCGCATGAAGCGGATGCAAGGGTATGACGTTTTGTGGC 112 EX02PART AGACATCATTACGCGCATGAAGCGGATGCAAGGGTATGACGTTTTGTGGD 132 VTAM7S AGACATCATTACGCGCATGAAGCHGATGCAAGGGTATGAC4TTTTGTGGC 170 VTAM140 GGTATGACGAVTTGTGGC AGACATCATTACGCGCATGAAGCGGATGCAAGGGTATGACGTTTTGTGGC

2109 2119 2129 2139 2149 187 VNTERM TTCCGGGAATGGATCACGCCGGCATCGCAACCCAGGACAAAGTCGAGGAA 173 TAM78RE TTCCGGGAATGGATCACGCCHACATCGCCACCCAGGCGAAAGTCGAGGAA 107 VRES5B TTCCGGGAATGGATCACGCCGGCATCGCCACCCAGGCGAAAGTCGAGGAA 129 VEX04A TTCCGGGAATGGATCACGCCGGCATCGCCACCCAGGCGAAAGTCGAGGAA 112 EX02PART TTCCGGGAATHAATCAC4CCHACA 132 VTAM78 TTCCGGGAATGGATCACGCCGGCATCGDCACCCAGGCGAAAGTCGAGGAA 170 VTAM140 TTCCAAHAATGGATCACGCCGGCATCGDCACCCAGGCGAAAGTCGAGGAA TTCCGGGAATGGATCACGCCGGCATCGCCACCCAGGCGAAAGTCGAGGAA

2159 2169 2179 2189 2199 187 VNTERM AAATTGCGCCAGAAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT 173 TAM78RE AAATTACGCCAGCAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT 107 VRES5B AAATTGDGCCAGCAAGGGCVGTCGCGCATCGATVAAGGCCGHAAAAAATT 129 V E X 0 4 A AAATTGDGCCAGCAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT 132 VTAM78 AAATTCCGCCAGCAAGHGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT 170 VTAM140 AAATTACGCCAGCAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT 196 VTB8VAL2 ATTGCGCCAGCAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT 49 VSH23 CACGCTACGATTTAGGCCGGGAAAAATT 14 V8A1B AAAAATT AAATTGCGCCAGCAAGGGCTGTCGCGCTACGATTTAGGCCGGGAAAAATT

2209 2219 2229 2239 2249 187 VNTERM TTTAGAAGAAACGTCGAAGTAGAAGH 173 TAM78RE TTTAGAAGAAACGTGGAAGTGGAAGGAAGAATACGCCGGCCATATTCGCA 107 VRES5B TVAAGAAGAAACGTAHAAGTAHAAGGAAGAATACGCCGGCCATAATCG 129 VEX04A TTTAGAAGAAACGTGGAAGTGGAAGGAAGAATACGCCGGCCATATTCGCA 132 VTAM78 TTTAGAAGAAACGTGGAAGTGGAAGGAAGAATACGCCGGCCATATTCGCA 170 VTAM140 TTTAGAAGAAACGTGGAAGTGGAAGGAA4AATACGCCGGCCATATTCGCA 196 VTB8VAL2 TTTAGAAGAAACGTGGAAGTHAAAGGAAGAATACGCCGGCCATATTCGCA 49 VSH23 TTTAGAAGAAACGTGGAAGTGGAAGGAAGAATACGCCGGCCATATTCGCA 14 V8A1B TTTAGAAGAAACGTGGAAGTGGAAGGAAGAATACGC 105 VRES1BB CATHGAAATHGAAGGAAGAATACGCCGGCCATATTCGCA -1 VS1.DAT TGAAAGAA3HAATAC4CCGGCCATATTCGCA -45 VS40RI CGCCGGCCATATTCGCA 111 X3A CCGGCCATATTCGCA TTTAGAAGAAACGTGGAAGTGGAAGGAAGAATACGCCGGCCATATTCGCA

2259 2269 2279 2289 2299 173 TAM78RE GTCA A T G G G C G A A G T T A G G GCTTGG GCTT GATTACACGCGCGAGCGG 129 VEX04A GTCA A T G G G C G A A G T T A G G GCTTGG GCTT GATTACACGCGCGAACGHTTT 132 VTAM78 GTCA atgggcgaagttaggg CTTGG GCTT 4ATTACACGCGCGAGCGGTTT 170 VTAM140 GTCA A T 6 G G C G A A G T T A G 6 GC T T G GG C T TGATTACACGCGCGAGCGGT1T 196 VTB8VAL2 GTCA atgggcgaagttaggg CTTGG GCTT GATTACACGCGCGAACGATTT 49 VSH23 GTCA ATGGGCGAAGTTAGGG CTTGG GCTT GATTACACGCGCGAG 105 VRES1BB G T C AATGGG DGAE.g TTAGGG CTTGG GCTT GATTACACGCGCGAGCGGTTT -1 VS1.DAT 4TCA A T A G G D G A E' G T T A G G GCTTGG GCTT GA T T A C A C G C G C G A G C G G T T T -45 VS40RI GTCA A T A l-l G C - A A - T T A G G GCTTGG GCA VGATTACACGCGCGAGCGGTTT 111 X3A GTCA ATGGGCGAAGTTAGGG CTTGG GCT rGA T T A C A C G D G C G A H C A A T T T GTCA ATGGGCGAAGTTAGGG CTTGG GCTT GATTACACGCGCGAGCGGTTT

2309 2319 2329 2339 2349 129 VEX04A ACGCTTGACGAAGGGCATTCCAAAACGGTACGCGA 132 VTAM78 ACGCXTGACGAAGGGCTTTCCA 170 VTAM140 ACGCTTGACGAAGGGCTTTCCAAAGCGGTGCGCGAAGTGTTCGTCTCGCT 196 VTBSVAL2 ACGCTTGACGAAAGGCITTCCAAACHGAIGCGAA1AGTGIICGXCICGCT 105 VRES1BB ACGCTTGACGAAGGGCTTTCCAAAGCGGTGCGCGAAGTGTTCGTCTCGCT -1 VS1.DAT ACGCTTGACGAAGGGCTTTCCAAAGCGGIGCGCGAAGTGTTCGTCTCGCT -45 VS40RI ACGCTTGACGAAGGGCTTTCCAAAGCGGTGCGCGAAATGTTCGTCTCGCT 111 X3A ACGCTTGACGAAGGGCTTTCCAAAGCGGTGCGCGAAGTGTTCGTCTCGCT 188 VTAM148 CGAAGGGCTTTCCAAAACGGT-AAAGAGGTGTTCGTCTCGCT ACGCTTGACGAAGGGCTTTCCAAAGCGGTGCGCGAAGTGTTCGTCTCGCT

2359 2369 2379 2389 2399 170 VTAM140 CTACCGGAAAGGGCTCATTTACCGCGGCGAATACATCATCAACTGGGATC 196 VTB8VAL2 CTACCGAAAAGGACTCATTTACCGCGGCGAATA1AT1AT1AACT1GGATC 105 VRES1BB CTACCGGAAAGGGCTCATTTACCGCGGCGAATACATCATCAACTGGGATC -1 VS1.DAT CTACCGGAAAGGGCTCAITTACCGCGGCGAATACATCATCAACTGGGATC -45 VS40RI CTACCGGAAAGGGCTCATTTACC4CGGCGAATACATCATC 111 X3A CTACCGGAAAGGGCTCATTTACCGCGGCGAATACATCATC 188 VTAM148 CTACCGGAAAGGACTCATTTACCGCGGCGAATACATCATCAACTGGGATC 165 VX3AEXT ATTTACCGCGGCGAATACATCATCAACTGGGATC 197 TH0RVAL2RE GGGATC 200 VBAMSAL3 TC CTACCGGAAAGGGCTCATTTACCGCGGCGAATACATCATCAACTGGGATC

2409 2419 2429 2439 2449 170 VTAM140 CGGTGACGAAAACCGCGTTGTCGGACATTGAGGTTATT 196 VTB8VAL2 CGGTGACGAAAACAGCGTTGTCHAACATTGAGGTTGTTTATAAAGAAGTG 188 VTAM148 CHATGACGAAAACCGCGTTGTCGAACATTGAGGTTGTTTATAAAGAAGTG 165 VX3AEXT CGGTGACGAAAACCGCGTTGTCGGACATTGAGGTTGTTTATAAAGAAGCG 197 TH0RVAL2RE CGGTGACGAAAACCGCGTTGTCGGACATTGAGGTTGTTTATAAAGAAGTG 200 VBAMSAL3 CGATGACGAAAACCGCGTTGTCGGACATTGAGGTTGTTTATAAAGAAGTG 204 VIAMVAL4 CCGCGTTGTCGGACATTGAGGTTGTTTATAAAGAAGTG 190 VTAM156EXT GAAGTG CGGTGACGAAAACCGCGTTGTCGGACATTGAGGTTGTTTATAAAGAAGTG

2459 2469 2479 2489 2499 196 VTB8VAL2 ABAGGCGCGCTCTACCATATGCG1TATC1GATAGAAGA1GACTCCGGCTT 188 VTAM148 AAAGGCGCGCTCTACCATATGCGATATCCACTCGCCGA1GACTCCGACTT 165 VX3AEXT AAAGGCGCGCTATACCATATGCGCTATCCGCTAGDAGAAGAATCCGGCTT 197 TH0RVAL2RE AAAGGCGCGCTCTACCATATGCGDTATCCGCTCGCCGACHACTCCGGCTT 200 VBAMSAL3 AAAGGCGCGCTCTACCATATGCHCTATCCGCTCGCCGACGGCTCCGGCTT 204 VTAMVAL4 AAAGGCGCGCTCTACCATATGCGCTATCCGCTCGCCGACGGCTCCGGCTT •190 VTAM156EXT AAAGGCGCACAATACCATAVGCGCAAVCAGCTCGCCGACGGCTCCGGCTT AAAGGCGCGCTCTACCATATGCGCTATCCGCTCGCCGACGGCTCCGGCTT

2509 2519 2529 2539 2549 196 VTB8VAL2 CATCGBAGTGGCGACGACCCGTCAGGAGAGAATGCTAGGTGACAAGGCCG 188 VTAM148 CATCGAAGTGGCAACGAGCCCTCCGGAGAGAC 165 VX3AEXT CATCGAAGTGG 197 TH0RVAL2RE ACTCGAAGTGGCGACGACCCGTCCGGAGACGATGCTCGGTGACADGGCCG 200 VBAMSAL3 CATCGAAGTGGCGACGACCCGTCCGGAGACGATGCTCGGTGACACGGCCG 204 VTAMVAL4 CATCGAAGTGGCGACGACCCGTCCGGAGACGATGCTCGGTGACACGGCCG •190 VTAM156EXT CATCGBAGTGGCGACGAAACGTCCGGBGACGATGCTCGGTGACACGGCCG •189 VTAM66LNG CGAAGTGGCGACGAAAC4TCCGGBGACGATGCTCGGTGACACGGCCG 146 VZA1 GGAGACGATGCTCGGTGACA1GGCCG 145 VX3C3 GGAGACGATGCTCGGTGACACGGCCG CATCGAAGTGGCGACGACCCGTCCGGAGACGATGCTCGGTGACACGGCCG

2559 2569 2579 2589 2599 196 VTB8VAL2 T T G C A G T - A A T C A G G A T G A C G A A C G G T A C A A G C A A T T G A T C G 6 C A B A A TI -I 197 TH0RVAL2RE TTG 200 VBAMSAL3 T T G C C G T G C A T C C G G A T G A C G A G C G C T A C A A G C A C T T G A T C G G C A A A A T G 204 VTAMVAL4 TTGCCTGGCATCCGGATGACGAGCGGXACAAGCACTTGATCGGCAAAATG -190 VTAM156EXT V A H C C G T G C A T C C G G A T G A C G A G C G G T A G A A G C A C V A G A T C G G C A A A AIG -189 VTAM66LNG T T G C C G T G C A T C C G G A T G A C G A G C G G T A C A A G G A C T T G A X C G G C A A A A T G 146 VZA1 XXGCCXGGCAXGCGGAXGACGAGGGGXAGAAGGA1XXGAXCGGGAAAAXG 145 VX3C3 XXGCCGXGCAXCCGGAXGACGAGCGGXACAAGCACXXGAXCGGGAAAAXG 198 TH0RVAL2EX TTGDAGXGAAXCCGGAXGACGCGCGGXACACGCAATTAATCGGCAAA XG TXGCCGXGCATCCGGATGACGAGCGGXACAAGCACXXGAXCGGCAAAAXG

2609 2619 2629 2639 2649 196 VTB8VAL2 ATAAAAAT 200 VBAMSAL3 GTGAAAATGCCGATCGTCGGCCGCGAAATTCCGATCATCGCCGATHAGTA 204 VTAMVAL4 GTGAAACTGCCGATCGTCGGCCGCGAAATTCCGATCATCGCCGATGAGTA -190 VTAM156EXT GTGAAACTGCCGATCGTCGGCCGCGAAATTCCGATCATCGCCGATGAGTA -189 VTAM66LNG GTGAAACTGCCGATCGTCGGCCGCGAAATTCCGATCATCGCCGATGAGTA 146 VZA1 GTGAABAT4CCGATCGTCGGCCG1GAAATTCCGATCATCGDAGATGAATA 145 VX3C3 GIGAAACTGCCGATCGTCGGCCGCGAAATTCCGATCATCGCCGATGAGTA 198 TH0RVAL2EX GTGABAATGCCGATCGTCHACCGCGAAATTCCGATCATGGCCGATAAGTA -181 VT26EXT GATC-TCGGCCGCGAAATTCCGAT1ATCGCCGATGA4TA GTGAAACTGCCGATCGTCGGCCGCGAAATTCCGATCATCGCCGATGAGTA

2659 2669 2679 2689 2699 200 0BAMSAL3 CGTCGATATGGAAOAAGGCT1TGGGACGGTCAAAATAACGCCGGCGCACG 204 0TAM0AL4 CGTCGATATGGAATTCGGCTCTGGGGCGGTCAAAATTACGCCGHCGCAC -190 0TAH156EXT CGTCGATATGGAATTCGGCTCTGGGGCGGTCAAAATTACGCCGG -189 0TAh66LNG CGTCGATATG 146 VZ A1 CGTCGATATGGAATT 145 0X3C3 CGTCGATATGGAAXTCGGCTCTGGGGCGGTCAAAATTACGCCGGCGCACG 198 TH0R0AL2EX 1GTCGATATGGAATTCGGCTCTGGHACGGTCA -181 0T26EXT. C4TCGATATGGAATTCGGCTCTGGGGCHCTCAAAATTAAGCCGGCGCACG -192 0TAM66 CAATATGGAATTCGGCTCTHGAACGGTCAAAATTACGCCGGCGCACG 140 0TAM97 CGCCGGCGCACG 124 VX2FPART CGCACG -202 0156AEXT CGCGCG CGTCGATATGGAATTCGGCTCTGGGGCGGTCAAAATTACGCCGGCGCACG

2709 2719 2729 2739 2749 200 0BAMSAL3 ACDAGAACGACTCTG 145 0X3C3 ACCCGAACGACTTTGAAATCGGCAATCGCCACAAATTGCCGCGCATTCTC -181 0T26EXT ACCCGAACGACTTTGAAATCGGCAATCGCCAC -192 0TAM66 ACCCGAACGACTTTGAAATAHCCAATCGCCACAACTTGCCGCGCATTCTC 140 0TAM97 ACCCGAACGACTTTGAAATCGGCAATCGCCACAACTTGCCGCGCATTCTC 124 0X2FPART ACCCGAACGACTTTGAAATCGGCAATCGCCACAACTTGCCGCGCATTCTC -202 0156AEXT ACCCGAACGACTTT4AAATCGGCAATCGCCACAACTTGCCGCGCATTCTC -183 0TAM161BEG GGCAATCGCCACAACTGTCCGCGCATTCTC -191 VTAM156 CAATCGCCACAACTTHCCGCGCATTCTC ACCCGAACGACTTTGAAATCGGCAATCGCCACAACTTGCCGCGCATTCTC

2759 2769 2779 2789 2799 145 0X3C3 GTCATGAACGAAGA1HACBAGATGAATGAAAACGDAA -192 0TAM66 GTCATGAACGAAGACGGCACGATGAATGAAAACGCCATGCAATACCAGGG 140 0TAM97 GTCATGAACGAAGACGGCAAGATGAATGAAAACGCCATGCAATACCAGGG 124 0X2FPART GTCATGAACGAAGAAGGCACGATGAATGAAAACGCCATGCAATACCAGGG -202 0156AEXT GTCATGAACGAAGACGGCACGATGAATGAAAACGCCATGCAATACCAGGG -183 0TAM161BEG GTCATGAACGAAGACGGCACGATGAATGAAAACGCCATGCAATACCAGGG -191 0TAM156 GTCATGAACGAAGACGGCACAATAAATGAAAACGDCATGCAATADAAGGG -178 0TAM26RE AAGACGGCACGATGAATGAAAACGDAATGCAATADAAGGG 122 0X2B CACGATGAATGAAAACGCCATGCAATACCAGGG GTCATGAACGAAGACGGCACGATGAATGAAAACGCCATGCAATACCAGGG

2809 2819 2829 2839 2849 -192 0TAM66 GCTTGACAGGTTT4AATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAAC 140 0TAM97 GCTTGACCGGTTTGAATGCCGGAAGCAGATCGTCCG1GATTTGCAAGAAC 124 0X2FPART GCTTGACCGGTTTGAATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAAC -202 V156AEXT GCTTGACCGGTTTGAATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAAC -133 0TAM161BEG GCTTGACCGGTTTGAATGCCGGAAGCAGA -191 0TAM156 GCTTGACAGGTTTGAATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAAC -178 0TAM26RE GCTAGACCGGTTTGAATGCCGGAAGAAGATCGTCCGCGATTTGCAAGAAC 122 0X2D GCTTGACCGGTTTGAATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAAC -92 OK 1400.8A TCGTADGCGATTTGCAAGAAC GCTTGACCGGTTTGAATGCCGGAAGCAGATCGTCCGCGATTTGCAAGAAC

2859 2869 2879 2889 2899 -192 VTAM66 AAGGCGTCCTCTTTAAAATCGAGGAGCACGTGCACTAAGTCGGTCATAGC 140 VTAM97 AAGGCGTCCTCTTTABAATCGh GGAGCACGTCCACI'CAGICGGTCATAGC 124 VX2FPART A A G G C G T C C T C T T T A A A A r C G A G i j A G C A -202 V156AEXT AAGGCGTCCTCTTTAAAATCGAGGAGCACCTGCACTCAGTCGGTCATAGC -191 VTAM156 A A G G C G T C C T C T T T A A A A T C G A G G A G C A C G T G C A 8 T C A G T C G G T C A T A G C -178 VTAM26RE A A G G C GIC C T C T T T A A A A T C G A G G A G A A C G T G C A C T C A G T C G GIC A T A G C 122 VX2D AAGGCGTCCTCTTTAAAArCGAGGAGCACCTGCACTCAGTCGGTCATAGC -92 OK1400.8A AAHGCGTCCTCAOTAAAATCAAAHAGCACTGGCACTCAGTCGGTCATAGC -102 OKI AGC AAGGCGTCCTCTTTAAAATCGAGGAGCACGTGCACTCAGTCGGTCATAGC

2909 2919 2929 2939 2949 192 VTAM66 GAACGAAACGGCGCA 140 VTAM97 GAACGAAGCGGCGCAGTTATTGAACCGTATTTGTCGACGCAATGGTTTGT 202 V156AEXT GAACGAAGCGGCGCAGTTATTGAACCGTATTTGTCGACGCAATGGTTTGT 191 VTAM156 GAACGAAACGGCGCAGTTATTGAACCGTATTTGTCGACGCAA 178 VTAM26RE GAACGAAGCGGCGCAGTTATTGAACCGTATTTGTCGACGCAATGGTTTGT 122 VX2D GAACGAAGCGGCGCAGTTATTGAACCGTATTTGTCGACGCAATGGTTTGT -92 0K1400.8A GAACGAA4CGGCGCAGTTATTGAACCGTAAT0GTCGACHCAATGGTTTGT 102 OKI GBACGAA4CGGCGCAGTTATTGAACCGTATTTGTCCACGCAATGGA0TGT 101 0C1 ATAAAOGTCGACGCAATGGAATGT 95 0ACCF2.5A CGCAATGGTTTGT GAACGAAGCGGCGCAGTTATTGAACCGTATTTGTCGACGCAATGGTTTGT

2959 2969 2979 2989 2999 140 0TAM97 CAABATGAAGCCGCTCGCCGAAGCCGCCATC 202 0156AEXT GAAAATGAAG 178 0TAM26RE GAAAATGAAGCCGGTCGCCGAAGCAGCCATC 122 0X2D GAAAATGAAGCCGCTCGCCGAAGCCGCC -92 OK1400.8A GAAAATGAAHCCGCTCGCCGAAHCCGCCATCAAGCTCCAGCAAACCGACG 102 OKI GAAAATGAAGCCGCTCGCCGAAGCCGCCATCAAGCTCCAGCAAACCGACG 101 0C1 GABAATGAAGACGCTCGADGAAGACGCCATCAAGCTCCAGCAAACCGACG 95 0ACCF2.5A GAAAATGAAGCCGCTCGCCGAAGCCGCCATCAAGCTCCAGCAAACCGACG -98 UNC01 CGCTCGCCGAAGADGCCATCAAGCTCCAGCAAACCGACG 28 0CS44 ATCAAGCTCCAGCAGACCGACG 106 OSON8 ATCAAGCTCCAGCAAACCGACG GAAAATGAAGCCGCTCGCCGAAGCCGCCATCAAGCTCCAGCAAACCGACG

3009 3019 3029 3039 3049 -92 OK1400.8A GCAAAGTGCAGTTCGTGCCGGAACGGTTTGAAAAAACGTATTTGCATTGG 102 OKI GCAAAGTGCAGTTCGTHCCGGAACGGATTGAAAAAACGTATTTGCATTGG 101 0C1 GCAAA4TGCAGTTCATGACGGAACGGA0TGAAAAAACGTATTTGCATTGG 95 0ACCF2.5A GCAAAGTGCAGTTCGTGCCGGAACGGTTTGAAAABACGTATTTGCATTGG -98 UNC01 GCAAAGTHCAGTTC4TGCCGGAACGGTTTGAAAAAACGTATTTGCATTGG 28 0CS44 GCAAAGTGCAGTTCGTGCCGG3CCGGTTTGAAAABADGTATTTGCAT 106 OSONS GCAAAGTGCAGTTCGTGCCGGAACAGTTTGAAAAAAAGTATTTGCATTGG -96 ACCF2.4 GTGCA4TTC4TGCCAGAACAHTTT4AAAAAADATATTTGCATTGG GCAAAGTGCAGTTCGTGCCGGAACGGTTTGAAAAAACGTATTTGCATTGG

3059 3069 3079 3089 3099 -92 0K1400.8A CTTGAAAACATCCGCGACTGGTGCATTTCACGCCAGCTTTGGTGGGGGCA ■102 OKI CATGAAAACATCCGCGACTGGTGCATTTCACGCC ■101 0C1 CAOGAAAACATCCGCGACTGGTGCATTTCACHCCAGCTTTGGTGGGGGCA 95 0ACCF2.5A CTTGAAAACATCCGCGACTGGTGCATTTCACACCAGCTTTGGTGGGHACA -98 UNCO 1 CTTGAAAACATCCGCGACTGGTGCATTTCACGCCAGCTTTGGTGGGGGCA 106 OSON8 CT -96 ACCF2.4 CTTGAAAACATCCGCGACTAHTGCATTTCACGCCAACTTTAGTGGGGGCA 123 0X2E CATCCGCG3CTGGTGCATTTCACGCCAGCTTTGGTGGGGGCA 156 0TAM116 CATTTCACGCCAGCTTTGGTGGGGGCA CTTGAAAACATCCGCGACTGGTGCATTTCACGCCAGCTTTGGTGGGGGCA

3109 3119 3129 3139 3149 -92 OK1400.8A TCGCATCCCGGCAT ■101 0C1 TCGCATACCGGCATGGTA 95 0ACCF2.5A TCGCATCCCGGCATGGTACCATAAAGAGACGGGTGAAATTTACGTCGACD -98 UNC01 TCGCATACCGGCATGGTACC -96 ACCF2.4 TCGCATCCCGGCATAHTACCATAAAGAGACCGGTGA 123 0X2E TCGCATCCCGGCATGGTACCATAAAGAGACGGGTGAAATTTACGTCGACC 156 0TAM116 TCGCATCCCGGCATGGTACCATABAGAGACGGGTGAAATTTACGTCGACC -104 0RES4C CCCAHCATHA0ACCATBAAGAGACHAGTGAAATTT3CGTCGACC TCGCATCCCGGCATGGTACCATAAAGAGACGGGTGAAATTTACGTCGACC

3159 3169 3179 3189 3199 95 0ACCF2.5A ATG 123 0X2E ATGAACCGCCGAAAGACATCGAAAB1TGGGAACAAGACCCAGATGTGCTC 156 0TAM116 ATGAACCGCCGAAAGACATCGAAAACTGGGAACAAGACCCAGATGTGCTC -104 0RES4C ATGAACCGCCGAAAGACATCGAAAACTGGGAACAAGACCCAGATGTGCTC ATGAACCGCCGAAAGACATCGAAAACTGGGAACAAGACCCAGATGTGCTC

3209 3219 3229 3239 3249 123 VX2E GACACATGGTTCAGCTCGGCGCTCTAMCCGTTCTCGACAATHGADOAGCC 156 VTAM116 GACACATGGTTCAGCTCGGCGCTCTGGCCGTTCTCGACAATGGGCTGGCC -104 VRES4C GACACATGGTTCAGCTCGGCGCTCTGGCCGTTCTCGACAATGGGCTGGCC 158 VTAM131 A C A T G G T T C A G C T C G G C G C T C T G G C C G T T C T C G A C A A T G G G C T G G C C GACACATGGTTCAGCTCGGCGCTCTGGCCGTTCTCGACAATGGGCTGGCC

3259 3269 3279 3289 3299 123 0X2E GGATACCGB1TCGCCGGATTACAAGCGCTACT 156 0TAM116 GGATACCGACTCGCCGGATTAAAAGCGCT -104 0RES4C GGATACCGACTCGCCGGATTACAAGCGCTACTACCCGACCGATGTGCTGG 158 0TAM131 GGATACCGACTCGCCGGATTACAAGCGCTACTACCCGACCGATGTGCTGG -182 0T154EXT TCGCCHGATTACAAACGCTCAT4CCCGACCGAT4TGCTGG 121 0X2A ACTACCCGACCGATGTGCTGG 113 S0NK3 CCGATGTGCTGG GGATACCGACTCGCCGGATTACAAGCGCTACTACCCGACCGATGTGCTGG

3309 3319 3329 3339 3349 -104 0RES4C TCACCGGTTATGACATCATTTTCTTCTGGGTGTCGCGCATGATTTTTCAA 158 0TAM131 TCACCGGTTATGACATCATTTTCTTCTGGGTGTCGCGCATGATTTTTCAA -182 0T154EXT TCACCHAA0ATGACATCATTTTCTTCTGGGT4TC4CGCATGATTTTTCAA 121 0X2A TCACCGGTTATGACATCATTTTCTTCTGGGTGTCGCGCATGATTTTTTAA 113 S0NK3 TCACCGGTTATGACATCATTTTCTTCTGGGTGTCGCGCATGATTTTTCAA -36 OB12 CCGGTTATGACATCATTTTCTTCTAGGTGTCGCGCAT-AOTTTCTAA 136 0TAMS7 ATCATTTTCTTCTGGGTGTCGCGCATGATTTTTCAA -193 04D TTCTTCTAHGTGTCGCGCATGATTTTTCAA TCACCGGTTATGACATCATTTTCTTCTGGGTGTCGCGCATGATTTTTCAA

3359 3369 3379 3389 3399 -104 0RES4C GG 158 VTAM131 GGGCTTGAATTCACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGG -182 VT154EXT GGGCTT4AATTCACCGGAAAACGTCCGTTCAAAGACATGCTCATCCACGG 121 VX2A GGGCTTGAATTCACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGG 113 S0NK3 GGGCTTGAATTCACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGG 136 VTAM87 GGGCTTGAATTCACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGG -193 04 D GGGCTTGAATTCACCGGAAAGCGTCCGTTCABAGACGTGCTCATCCACGG 73 VCS32 CACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGG GGGCTTGAATTCACCGGAAAGCGTCCGTTCAAAGACGTGCTCATCCACGG

3409 3419 3429 3439 3449 158 VTAM131 CCTCGTCCGCGACGCCCAAGGGGCAAAAATGAGCAAGTCCCTCGGCAACG -182 VT154EXT CCTC 121 0X2 A CCTCGTCCGC 113 S0NK3 CCTCGTCCGCGBCGCCCAAGGGCGGAAAATGAGCAAGTDGCTCGGCAACG 136 0TAHS7 CCTCGTCCGCGACGCCCAAGGGCGGAAAATGAGCAAGTCGCTCGGCAACG -193 04 D CCTCGTCCGCGACGCCCAAGGGCGGAAAATGAGCAAGTCGCTCGGCAACG 73 0CS32 CCT1HTCCGCGA1HDACAAGGGCGGAAAATGAG1BAGTCGCTAGGCAAAG 125 0X2B CCTCGTCCGCG3CGCCCAAAHGCHAAAAATGAGCAAGTCGCTCGGCAACG -179 0TAM39RE CAAAHGCGGAAAATAAGCAAGTCGCTCGGCAACG -180 0TAM154RE AAGGGCHAAAAATAAHCAAGTCGCTCGGCAACG 169 0NEU13 GGAAAATGAGCAAGTCGCTCGGCAACG CCTCGTCCGCGACGCCCAAGGGCGGAAAATGAGCAAGTCGCTCGGCAACG

3459 3469 3479 3489 3499 158 0TAM131 GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGCCGATGCGATCCGT 113 S0NK3 GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGABGATG1GATADGT 136 0TAM87 GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGCCGATGCGATCCGT -193 04D GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGCCGATGCGCTCCGT 73 0CS32 GAATCGATCAGATGGAT 125 0X2B GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGCCGATGCGATCCGT -179 0TAM39RE GCGTCGATCAGATGGATGTGATCGADAAGTGAGGDACCAATGCGCTCCGT -180 0TAM154RE GCGTCGATDAGA2GGATGTAATCGACAAGGATGGAABCGATGCHCTCCGA 169 0NEU3 GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGCTGATGGCATCCGT 147 0KB4 GGATGTGATCGACCAGTACGGCGCCGATGCGATCCGT 177 0KB4RE GGATGTGATCGACCAGTACGGCGCCGATCGGATCCGT GCGTCGATCCGATGGATGTGATCGACCAGTACGGCGCCGATGCGCTCCGT

3509 3519 3529 3539 3549 158 VTAM131 TACTTCTTGGCGACCGGCAGCTCGCCGGGACAAGACTTGDGC 113 S0NK3 TB1TTCA0GGCGAADGGCAGCT 136 VTAM87 TACTTCTTGGCGACCGGCAGCTCGCCGGGACAAGACTTGDGCTTCAGCAC -193 V4D TACTTCTTGGCGACGGAAAGCTCGCCGGGACAAGACTTGCGCTTCAGCAC 125 VX2B TACTTCTTGGCGACCGGCAGCTCGCCHGGGCAAGACTTGCGCTT1AGCAC -179 VTAM39RE TACTTCTTGCHGACCGGCAGCTCGCCGGGGCAAGACTTGCGCTTCAGCAC -180 VTAM154RE TTCTTCTTGGCGAAAGGCAGCTCGCCGGGGCABGACTTGCGCTTCAGCAC 169 VNEU3 TACTTCTTGGCGACCGGCAGCTCGCAGGGACAAGACTTGCGCTTCAGCAC 147 VKB4 TACTTCTTGGCGACCGGCAGCTCGCCGGGGCAAGACTTGCGCTTCAGCAC 177 VKB4RE TACTTCTTGGCGACCGGCAGCTCGCCGGGGCAAGACTTGCGCTTCAGCAC TACTTCTTGGCGACCGGCAGCTCGCCGGGGCAAGACTTGCGCTTCAGCAC

3559 3569 3579 3589 3599 136 VTAM87 AGAAAAAHTCGAAGCGACGTGGAATTTTGDTAACAAAATTTGHAACGCCT 193 V4D AGAAAAAGTCGAAGCGACGTGGAATTTTGCTAACAAAATTTGGAACGCCX 125 VX2B AGAAAAAGTCGAA-CGACGTGGAATTTTGCTAACA 179 VTAM39RE AGAAAAAGTCGAAGCGACGTGGAATTTTGCTAACAAAATTTGGAAAGDCT 180 VTAM154RE AGAAAAAGTCGAAGCGACGTGGAATTTTGCTAACAAAATTTGGAAGGCCT 169 VNEW3 A 147 VKB4 AGAAAAAGTCGAAGCGA1GTGGAATTT 177 VKB4RE AGAAAAAGTCGAAGCGACGTGGAATTTTGCTAACAAAATTTGGAACGCCT 168 VTAM141 TTGGAACGCCT AGAAAAAGTCGAAGCGACGTGGAATTTTGCTAACAAAATTTGGAACGCCT

3609 3619 3629 3639 3649 136 UTAM87 CGCGCTTT 193 V4D CGCGCTTTGCCTTGATGAACATGGGCGGCATGAC 179 UTAM39RE CGCGCTTTGCCTTGATGAACATGGGCGGCATGACGTACGAGGA 180 VTAM154RE CGCGCTTTGCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT 177 UKB4RE CGCGXTTTGCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT 168 UTAM141 CGGGCTTTGCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAI 133 UTAMSOA GCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT 155 UTAM80 GCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT 186 US0N1RE TTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT 103 US0N1 GATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT CGCGCTTTGCCTTGATGAACATGGGCGGCATGACGTACGAGGAGCTTGAT

3659 3669 3679 3689 3699 180 VTAM154RE TTGAGCGGC 177 UKB4RE TTGAGCGGCGAAAAAAAGGTCGCCGACCATTGGATTTTAACGCGTCUCAA 168 VTAM141 TTGAGCGGCGAAAAAACGGTCGCCGACCATTGGATTTTAADGCGTCTCAA 133 UTAM80A TTGAGCGGCGAAAAAACGGTCGCCGA 155 UTAM80 TTGAGCGGCGAAAAAACGGTCGCCGACCATTGGATTTTAACGCGTCTCAA 186 US0N1RE TTGAGCGGCGAAAAAACGGTCGCCGACCATTGGATTTTAACGCGTCTCAA 103 US0N1 TTGAGCGGCGAAAAAACGGTCGCCGACCATTGGATTTTAACGCGTCTCAA 134 UTAM80B ACGCGTCTCAA TTGAGCGGCGAAAA'AACGGTCGCCGACCATTGGATTTTAACGCGTCTCAA

3709 3719 3729 3739 3749 177 UKB4RE CGAAACGATCGAGACGGTGACGAAGCTCG1TGAGAAATA1GAATTDGGCG 168 UTAM141 CGAAACGATCGAGACGGTGACGAAGCTCGCTGAGAAATACGAATTCGGCG 155 VTAM80 CGAAACGATCGAGACGGTGACGAAGCTCGCTGAGAAATACGAATTCGGCG 186 US0N1RE CGAAACGATCGAGACGGTGACGAAGCTCGCTGAGAAATACGAATTCGGCG 103 US0N1 CGAAACGATCGAGACGGTGACGAAGCTCGCTGAGAAATACGAATTCGGDG 134 UTAM80B CGAAACGATCGAGACGGTGACGAAGCTCGCTGAGAAATACGAATICGGCG •209 UNIG1SEQ AATTCGGCG CGAAACGATCGAGACGGTGACGAAGCTCGCTGAGAAATACGAATTCGGCG

3759 3769 3779 3789 3799 177 UKB4RE AAAGGGGGCGTACG 168 UTAM141 AACGGGGGCGTACGCTGTACAADTTTATTTGGGACGACTTGTGCGACTGG 155 VTAM80 AACAHGGGCGTACGCTGTACAACTTTATTTGGGACGACTTGTGCGACTGG 186 VS0N1RE AACGGGHACGTACGCTGTACA-CTTTATAAG 103 US0N1 AACAHGGGDGTACGCTGTACAACTTT 134 UTAM80B AACAHGGGCGTACGCTGTACAACTTTATTTGGGACGACTTGTGCGACTGG ■209 VNIG1SEQ AAAGGAHGC4TACGCTGTACAACTTTATTTGGGACGACTTGTGCGACTGG 176 VTAM121 GCGTACCGTGTACAACTTTATTTGGGACGACTTGTGCGACTGG ■205 UNIG1 TACAACTUAATTAGGGACGACTAGTGCGACTGG AACGGGGGCGTACGCTGTACAACTTTATITGGGACGACTTGTGCGACTGG

3809 3819 3829 3839 3849 168 UTAM141 TACATT4AAATGGCGAAATTGCCGCTTTACGGTGACGACGAAGCGGCGAA 155 UTAM80 TACATTGAAATGGCGAAATTGCCGCTTTACGGTGACGACGAAGCGGCGAA 134 UTAM80B TACATTGAAATHAC4AAATT4CCGCTTTACGGTGBDGADGAAG ■209 UNIG1SEQ TACATTGAAATGGCGAAATTGCCGCTTTACGGTGACGA1GAAGCGGCGAA 176 UTAM121 TACATTGAAATGGCGAAATTGCCGCTTTACGGTGACGACGAAGCGGCGAA -205 UNIG1 TACAUAGAAATGGCGAAATAGDAGCTTTACGGTGACGACGAAGCGGCGAA 143 UTAM69RE GGTGACGACGAAGCGGCGAA TACATTGAAATGGCGAAAITGCCGCTTTACGGTGACGACGAAGCGGCGAA

3859 3869 3879 3889 3899 168 VTAM141 AAAGACGACGCGCTCCGTCTTGGCGTATGTGCTCGACBACACGATGCGCC 155 VTAM80 AAAGACGACGCGCTCCGTGTTGGCGTATGTGCTCGACAACACGATGCGCC -209 VNIG1SEQ AAAGACGACGCGCTCCGTGTTGGCGTATGTGCTCGACAACACGATGCGCC 176 VTAM121 AAAGACGACGCGCTCCGTGTTGGDGTATGTGCTCGACAACACGATGCGCC -205 VNIG1 AAAGACGACGCGCTCCGTGTTGGCGTATGTGCTCGACAACACGATGCGCC 143 VTAM69RE AAAGACGAGGCGCTCCGTGTTGGCGTATGTQCTCGACAACAIGATGCGCC AAAGACGACGCGCTCCGTGTTGGCGTATGTGCTCGACAACACGATGCGCC

3909 3919 3929 3939 3949 168 VTAM141 TGCTTCACCC4TTTATGCCGTTCATTACCGAGGAA 155 VTAM80 TGCTTCACCCGTTTATGCDGTTCATTADA 209 VNIG1SEQ TGCTTCACCCGTTTATGCCGTTCATTACCGAGGAAATTTGGCAAAACTTG 176 VTAM121 TGCTTCACCCGTTTATGCCGTTCATTACCGAGGAAATTTGGCAAAACTTG 205 VNIG1 TGCTTCACCCGTTTATGCCGTTCATTACCGAGGAAATTTGGCAAAACTTG 143 VTAM69RE TGCTTCACCCGTTTATGCCGTT1ATTACCGAGGAAATTTGGCAAAA1TTG TGCTTCACCCGXTTATGCCGTTCATTACCGAGGAAATTTGGCAAAACTTG

3959 3969 3979 3989 3999 209 VNIG1SEG CCGCATGAAGGCGAATCGATCACCGTCGCTCCGTHACCGCGAATGCGCCC 176 VTAM121 CCGCATGAAGGCGAATCGATCACCGTCGCTCCGTGGCDGCAAGTGCGCCC 205 VNIG1 CCGCATGAAGGCGAATCGATCACCGTCGCTCCGTGGCCGCAAGTGCGCCC 143 VTAM69RE CCGCATGAAGGCGAATCGATCACAGTCGCTCCGTGGCCGCAAGTGCGCCC 195 VBS25 CCC CCGCATGAAGGCGAATCGATCACCGTCGCTCCGTGGCCGCAAGTGCGCCC

4009 4019 4029 4039 4049 209 VNIG1SEQ TGAGCTGTCGAA 176 VTAM121 TAAGCTATCGAACGAAGAAGCCGCGGAAHAAATACGGAGTCTT ■205 VNIG1 . TGAGCTGTCGAACGAAGAAGCCGCGGAAGAAATGCGGATGCTTGTGGA 143 VTAM69RE TAAGCTGTCGAACGAAGAAGCCGCGGAAGAAATGCGGATGCTTGTGGACA 195 VBS25 TAAGCTGTCGAACGAAGAAGCCGCGGAAGAAATGCGGATGCTTGTGGACA 157 VTAM125 GCCGCGGAAGAAATGCGGATGCTTGTGGACA 144 VTAM79 GGACA TGAGCTGTCGAACGAAGAAGCCGCGGAAGAAATGCGGATGCTTGTGGACA

4059 4069 4079 4089 4099 143 VTAM69RE TCATCCGCGCCGTCCG1AACGTCCGCGCCGAAGTAAACA1GCCGCCGAGC 195 VBS25 TCATCCGCGCCGTCCGCAACGTCCGCGCCGAAGTAAACACGCCGCCGAGC 157 VT AM125 TCATCCGCGCCGTCCGCAACGTCCGCGCCGAAGTAAACACGCCGCCGAGC 144 VTAM79 TCATCCGCGCCGTCCGDA3CGTCCGCGCCGAAGTAAACACGCCGCCGAGC •142 0TAM75RE AAGTAAACACGCCGCCGAHC TCATCCGCGCCGTCCGCAACGTCCGCGCCGAAGTAAACACGCCGCCGAGC

4109 4119 4129 4139 4149 143 VTAM69RE AAGCCGATTGCGCTCT 195 VBS25 AAGCCGATTGCGCTCTATATTAAGACAAAAGACGAGCACGTGCGGGCCGC 157 0TAM125 AAGCCGATTGCGCTCTATATTAAGACAAAAGACGAGCACGTGCGGGCCGC 144 VTAM79 AAGCCGATTGCGCTCTATATVAAGACAAAAGACGAGCACGTGCGGGCCGC -142 VTAM75RE AAGCCGATTCCGCTCTATATTAAGACAAAAGACGAGCACGTGCGGGCCGC AAGCCGATTGCGCTCTATATTAAGACAAAAGACGAGCACGTGCGGGCCGC

4159 4169 4179 4189 4199 195 0BS25 GCTTTTGAAAAAADGCGCCTATCTTGAGCGGTTCTGCAACCCGAGCGAGC 157 VTAM125 GCTTTTGAAAAACCGCGCCTATCTTGAGCGGTTCTGCAACCCGAGCGAGC 144 VTAM79 GCTTTTGAAAAACCGCGCCTATCTTGAGCGGTTCTGCAACCCGAGCGAGD ■142 0TAM75RE HCTTTTGAAAAACCGCGCCTATCTIGAGCGGTTCTGCAACCCGAGCGAGC -199 VTAM45LNG CC4A4CGAGC GCTTTTGAAAAACCGCGCCTATCTTGAGCGGTTCTGCAACCCGAGCGAGC

4209 4219 4229 4239 4249 195 VBS25 T1TTGATTGATA1ABACGTTCDAGCGCCGGA1ABAGCGATGACGGCGGTC 157 VTAM125 TCTTGATTGATACAAACGTTCCCGCGCCGGACAAAGCGATGACGGCGGTC 144 VTAM79 TCTTGATTGATACAAACGTTCCCGCGCCGGACAAAGDGATGACGGCHATC -142 VTAM75RE TCTTGATTGATACAAACGTTCCCGCGCCGGACAAAHCGATGACGGCGGTC -199 0TAM45LNG TCTT4ATTGATACAAACGTTCCCGCGCCGGACAAAGCGATGACGGCGGTC TCTTGATTGATACAAACGTTCCCGCGCCGGACAAAGCGATGACGGCGGTC

4259 4269 4279 4289 4299 195 VBS25 GTCACAGGCGDAGAG1T1AT1ATGCCT1TTGA 157 OT AM 125 GTCACCGGCGCCGAGCTCATCATGCCTCTTGAAGGATAGATCAATATTGA 144 VTAM79 GTCACDGGCGCCGAGCTCATCATGCCTCTTGAAGGA -142 0TAM75RE GTCACDGGCGCCGAGCTCATCATGCCTCTTGAAGGATAGATC -199 VTAM45LNG GTCACCGGCGCCGAGCTCATCATGCCTCTT-1AAGGATTGATCAATATTGA -159 UTAM70RE AGGATAGATCAATATTAA -153 VTAM45 TT A A GTCACCGGCGCCGAGCTCATCATGCCTCTTGAAGGATTGATCAATATTGA

4309 4319 4329 4339 4349 157 VTAM125 GGAAGAAATTABGCGGDTTGAAABAGAGCTTGACABATGGAAC -199 VTAM45LNG GGAAGAAATTAA -159 VTAM70RE GGAA4AAATTAAGCGGC-T4AAAAAGAGCTTGACAAATGGAACAAAGAAG -153 VTAM45 HGAA4AAATTAAGCAHCAT4AAAAAGAGCTTGACAAATGGAACAAA4AAG GGAAGAAATTAAGCGGCTTGAAAAAGAGCTTGACAAATGGAACAAAGAAG

4359 4369 4379 4389 4399 159 VTAM70RE TCGAGCGCGTCGAAAAGAAACTGGCGAATGAAGGCTTTTTGGCGAAAGCG 153 VTAM45 TC4A4CGCGTCGAAAAGAAACTGGCGAATGAAGGCTTTTTGGCGAAAGCG 131 VTAM70 GGCTTTTTGGCGAAAGCG TCGAGCGCGTCGAAAAGAAACTGGCGAATGAAGGCTTTTTGGCGAAAGCG

4409 4419 4429 4439 4449 159 VTAM70RE CCGGCTCATGTCGTCGAAGAAGAGCGGCGCAAGCGGCAAGATTACATCGA 153 VTAM45 CCGGCTCATGTCG1CGAAGAAGAGCGGCGCAAGCGGCAAGATTACATCGA 131 VTAM70 CCGGCTC-TGTCGTCGAAGAAGAGCGGCGCAAGCGGCAAGATTACATCGA 208 VAL12SEQ AGATTACATCGA CCGGCTCATGTCGTCGAAGAAGAGCGGCGCAAGCGGCAAGATTACATCGA

4459 4469 4479 4489 4499 159 VTAM70RE AAAACGCGAAGCCGTCAAGGCGCGCCTCGCCGAGCTCAAACGGTAGACAA 153 VTAM45 AAAACGCGAAGCCGTCAAGGCGCGCCXCGCCGAGCTCAAACGGTAGACAA 131 VTAM70 AAAACGCGAAG-CCGAAABGGCGCGCCTCGCCGAGCTCAAACGGTAGACAA 208 VAL12SEQ AAAACGCGAAGCCGTCAAGGCGCGCCTCGCCGAGCTCAAACGGTAGACAA 206 TB80AL12 GCCGTCAAGGCGCGCCXCGCCGAGCTCAAACGGXAGACAA 33 VCS43 XCAAACGGIAGACAA AAAACGCGAAGCCGXCAAGGCGCGCCXCGCCGAGCXCAAACGGXAGACAA

4509 4519 4529 4539 4549 159 VTAM70RE ACGATCTGGCGGVAGATTATGGTTGATTATGATGAAGACGAA ■153 VTAM45 ACGATCTGGCGGTAGAITATGGTAGATTATGATGAAGACGAATCCGCTTT 131 VTAM70 ACGATCTGGCGGTTGATTATGGTTGATTATGATGAAGACGAA-CCGCAAA 208 VAL12SEQ ACGATCXGGCGGXXGAXXATGGXXGATXAXGAXGAAGACGAAXCCGCXXX 206 TB80AL12 ACGAXCXGGCGGTXGAXXAXGGTXGAXXAXGAXGAAGACGAAXCCGCTXX 33 VCS43 ACGAICXGGCGGXXGAXXAXGGXXGAIXAIGAXGAAGACGAAXCCGCXXX ACGATCTGGCGGTTGAITATGGTTGATTATGATGAAGACGAATCCGCTTT

4559 4569 4579 4589 4599 ■153 VTAM45 CCXGXGG ■131 VTAM70 CCTGXGGA 208 VAL12SEQ CCXGXGGAXXCGXCXXIXXCGAXGGAXCAXGAXGGAAGGXXGG 206 TB8VAL12 CCXGXGGAXXCGXCXXXXX 33 VCS43 CCTGXGGAXXCGICXXXXXCGAXGGAXCAXGAXGGAAGGXXGGCAXAXXC •137 VTB3EXT TGATAHAAGATTGGCATATTC 167 VS0N6 GATGGAAGGTTGGCATATTC CCTGTGGATTCGTCTTTTTCGATGGATCATGATGGAAGGTTGGCATATTC

4609 4619 4629 4639 4649 33 VCS43 TGAGAAGAGGTTTGTTGTCAACATATACTCGCTTTCGCCACGCTTCTCGA -137 VTB3EXT TGAGAAGAGGTTTGTAGTCAACATATACTCACTTTCGCCACGDTTCTCAA 167 VS0N6 TGAGAAnAGGTTTGTTGTCAACATATACTCGCTTTCGCCACGCTTCTCGA TGAGAAGAGGTTTGTTGTCAACATATACTCGCTTTCGCCACGCTTCTCGA

4659 4669 4679 4689 4699 33 VCS43 T4T4TGGCCGCATCTGCCGCATGAAGCCGGCAGGGGA1GGAAA -137 VTB3EXT T4T4TGGCAGCATCTGCCGCATAAAGCCGGCAGGGGGCGGCAAGACACCA 167 VS0N6 IGTGTGGDAGCATCTGCCGCATGAAGCCGGCAGGGHACGGCAAGACACCA -166 VTAM120EX GCCGGCAHAGGGCGGCAAGACACCA 135 VTAM83 GCAAGACACCA TGTGTGGCCGCATCTGCCGCATGAAGCCGGCAGGGGGCGGCAAGACACCA

4709 4719 4729 4739 4749 -137 VTP3EXT AXHAAAAAAGGAGGACGATGAACATGGTTC4AACGTATGAAGAAGCAGTC 167 VS0N6 ATGGAAAAAGGAGGACGATGAACATGGTTCGAACGTATGAAGAAGCAGTC -166 VTAM120EX ATGGPAAAAGGAGGACGATGAACATGGTTCGAACGTATGAAGAAGCAGTC 135 VTAMB3 ATGGAAAAAGGAGGACGATGAACATGGTTCGAACGTATGAAGAAGCAGTC -114 0TB3TAM TGGTTCGAACGTATGAAGAAGCAGTC ATGGAAAAAGGAGGACGATGAACATGGITCGAACGTATGAAGAAGCAGTC

4759 4769 4779 4789 4799 -137 VTB3EXT GCTXGGATTCACGGGCGGCTGCGGCTCGGCAT4AAACCGGGAT4AAACGG 167 VS0N6 GCTTGGATTCACGGGCGGDTACGGCTCGGCATAAABCDHAAATGABACGG -166 VTAM120EX GCTTGGATTCACGGGCGGCTGCGGCXCGGCATGAAACCGGGATGAAACGG 135 VTAM83 GCTTGGATTCACGGGCGGCTGCGGCTCGGCATGAAACCGGGATGAAACGG -114 VTB3TAM GCAAGGAAOCACGGGCGGCTGCGGCTCGGCATGAAACCGGGATGAAACGG -109 VE20SM GAATCACAHGCAHDT4CGGCTCHGCAT4AAACCAHGATGAAACAM -108 VRES3D AHGAVGAAACAG GCTTGGATTCACGGGCGGCTGCGGCTCGGCATGAAACCGGGATGAAACGG

4809 4819 4829 4839 4849 -137 VTB3EXT AXAHAAIGGAXGAIAGAACAACXCGGCCACCCGGAACGACCAAGXCCGCG 167 VS0N6 AXGAAAXGAAXG -166 VTAM120EX ATGGAATGGATGATGGAA 135 VTAM83 AXGGAAXGGAXGAIGGAACAACXCGACCACCCGGAACGACCGCGXCCGCG -114 VTB3TAM AXGGAAXGGAXGAXGGAACAACTCGGCCACCCGGAACGCCCAAGXCCGCG -109 VE20SM ATAHAATGGAXGAXGGAACAACXCGGCCACCCGGAACHCCAAAGXCCGCG -108 VRES3D AXGGAAIGGAIGAXGGAACAACXCHGCCACCCGGAACGCCCAGGXCC4CG -127 BEGIN3A GAXGGAACAACXCGGCCACCCGGAACGCCCAAAXCCGCG -120 TH0RBEG3 GGAACAACXCGGCCACCCGGAACGCCC8GGXCCGCG -116 TAM3 AACAACXCGGCCADCCGGAACGCCCAAGXCC-CG AXGGAAIGGAXGAXGGAACAACXCGGCCACCCGGAACGCCCG-GICCGCG

4859 4869 4879 4889 4899 -137 VTB3EXT CCGXCCAXAXCGGGG 135 VTAM83 CCGICCAXAICGGGGGAACGAACGGCAAAGGGXCAACGGXCGCCXAXXXG -114 VTB3TAM CCGTCCATAXCGGGGGAACGAACGGCAAAGGGICAACGGXCGCCTAXXXG -109 VE20SM CCGXCCAXAXCGGGGGAACGAACGGCAAAGGGTCAACGGXCGCCXATXXG -108 VRES3D CCGXCCAXAXCGGGGGAACGAACGGCAAAGGGICAACGGXCGCCXAXXXG -127 BEGIN3A CCGXCCATAXCGGGGGAACGAACGGCAAAGGGXCAACGGXCGCCXAXTIG -120 TH0RBEG3 CCGXCCATAXCGGGGGAACGAACGGCAAAGGGXCAACGGXCGCCXATXTG -116 TAM3 CCGTCCAXAXCGGGGGAAC4AACGGCAAAGGGXCAACGGXCGCCTATXTG -117 TB8REV CAXCCAXAXCGGGGGAAC4AACGGCAAAGGGXCAACGGXCGCCTATIXG -138 VB15 XCCAXATCGGHAAAADAAACGGCAAAGGGXCAACGGTCGCCTATTTG CCGICCAXAXCGGGGGAACGAACGGCAAAGGGICAACGGXCGCCTATTIG

4909 4919 4929 4939 4949 135 VXAM83 CGCXCGAXXXXGCAGGCGGCGGGCXAXXCGGXCGGDACGIXCACCXCGCC -114 VXB3XAM CGCXCGAXTXXGCAGGCGGCGGGCXAXXCGGXCGGCACGXXCACCXCGCC -109 VE20SM CGCICGATTXIGCAGGCGGCGGGCXAXTCGGXCGGCACGTXCACCXCHCC -108 VRES3D CGCXCGAXIXTGCAGGCGGCGGGCXAXICGGICGGCACGIXCACCXCGCC -127 BEGIN3A CGCTCGAITXXGCAG -120 TH0RBEG3 CGCTCGAXTXTGCAGGCGGCGGGCXAXTCGGXCGGCACGXXCACCXCGCC -116 TAM3 CGCICGAIXXIG -117 XB8REV CGCXCGATTXXGCAGGCGGCGGGCXA -138 VB 15 CGCTCGATIXXGCAGGCGGCGGGCXAXTCGGXCGGCACGXXCACCXCGCC CGCXCGAXTXXGCAGGCGGCGGGCXAXICGGXCGGCACGXXCACCICGCC

4959 4969 4979 4989 4999 135 VTAM83 GXAXGXCGAGCAGIXXAACGAACGAAXCAGCAICAACGGCGAACCGAXXT -114 VTB3TAM GXATGXCGAGCAGXXXAACGAACGAAXCAGCAXCAACGGCGAACCGAXXI -109 VE20SM GXATGXCGAGCAGXXTAACGAACGAAXCAGCATCAACGGCGAACCGAIXX -108 VRES3D GXAXGXCGAHCAGXXXAACGAACGAAXCAGCAXCAACGGCGAACCGAXXX -120 TH0RBEG3 GXATGXCGAGCAGXXIAACGAACGAAXCAGCAICAACGGCGAACCGAXXI -138 VB15 GXAXGXCGAGGAGTTXAACGAACGAAXCAGCAXCAACGGCGAACCGATXX GXAIGXCGAGCAGXTXAACGAACGAAXCAGCATCAACGGCGAACCGAXXX

5009 5019 5029 5039 5049 135 VTAM83 CCGAXCCGICGACDXGCAGCCA -114 VTB3TAM CCGAXCCGXCGACCXGCAG -109 VE20SM CCGAXCCGXCGACC -108 VRES3D CCGAXCCG -120 TH0RBEG3 CCGAXCCGICGACCXG -138 VB15 CCG CCGAXCCGXCGACCXGCAGCCA

REFERENCES

Atkinson, T., Banks, G.T., Bruton, C.J., Comer, M., Jakes, R., Kamalagharan, T., Whitaker, A.R. and Winter, G.P. (1979) J. Appl. Biochem. 1, pp 247-258.

Backman, K. and Ptashne, M. (1978) Cell 13, pp 65-71.

Backman, K., Betlach, M., Boyer, H.W. and Yanofsky, S. (1978) Cold Spring Harbor Symp. Quant. Biol. 43, pp 69-76.

Baldwin, A.N. and Berg, P. (1966) J. Biol. Chem. 241, pp 839-845.

Band, L. and Henner, D.J. (1984) DNA 3, pp 17-21.

Bankier, A.T. and Barrell, B.G. (1983) in Techniques in the Life Sciences B5, Nucleic Acids Biochemistry, B508, Elsevier Scientific Publishers Ltd, Ireland.

Barany, F. (1985) Proc. Natl. Acad. Sci. USA 82, pp 4202-4206.

Barker, D.G. (1982) Eur. J. Biochem. 125, pp 357-360.

Barker, D.G. and Winter, G. (1982) FEBS Letts 145, pp 191-193.

Barker, D.G., Bruton, C.J. and Winter, G. (1982a) FEBS Letts 150, pp 419-423.

Barker, D.G., Ebel, J-P., Jakes, R. and Bruton, C.J. (1982b) Eur. J. Biochem. 127, pp 449-457.

Barstow, D.A., Sharman, A.F., Atkinson, T. and Minton, N.P. (1986) Gene, in press.

Bedouelle, H. and Winter, G. (1986) Nature 320, pp 371-373.

Berg, P. (1958) J. Biol. Chem. 233, pp 601-607.

Berg, P. and Offengand, E.J. (1958) Proc. Natl. Acad. Sci. USA 44, pp 78-85.

Bhat, T.N., Blow, D.M., Brick, P. and Nyborg, J. (1982) J. Mol. Biol. 158, pp 699-709.

Biggin, M.D., Gibson, T.J. and Hong, G-F. (1983) Proc. Natl. Acad. Sci. USA 80, pp 3963-3965.

Birnboim, H.C. and Doly, J. (1979) Nucleic Acids Res. 7, pp 1513-1523.

Blow, D.M., Birktoft, J.J. and Hartley, B.S. (1969) Nature 221, pp 337-340.

Blow, D.M., Bhat, T.N., Metcalfe, A., Risler, J.L., Brunie, S. and Zelwer, C. (1983) J. Mol. Biol. 171, pp 571-576.

Borgford, T., Brand, N.J., Gray, T.E. and Fersht, A.R. (1986) Submitted for publication.

Bosshard, H.F., Koch, G.L.E. and Hartley, B.S. (1978) J. Mol. Biol. 119, pp 377-389.

Brand, N.J. and Fersht, A.R. (1986) Gene 44, pp 139-142.

Breton, R., Sanfacon, H., Papayannopoulos, I., Biemann, K. and Lapointe, J. (1986) J. Biol. Chem. 261, pp 10610-10617.

Budzik, G.P., Lam, S.S.M., Schoemaker, H.J.P. and Schimmel, P.R. (1975) J. Biol. Chem. 250, pp 4433-4439.

Calendar, R. and Berg, P. (1966) Biochemistry 5, pp 1681-1690.

Carter, P., Bedouelle, H. and Winter, G. (1986) Proc. Natl. Acad. Sci. USA 83, pp 1189-1192.

Cassio, D. and Waller, J-P. (1971) Eur. J. Biochem. 20, pp 283-300.

Chapeville, F., Lipmann, F., von Ehrenstein, G., Weisblum, B., Ray, W.J. and Benzer, S. (1962) Proc. Natl. Acad. Sci. USA 48, pp 1086-1092.

Chen, E.Y. and Seeburg, P.H. (1985) DNA 4, pp 165-170.

Clarke, L. and Carbon, J. (1975) Proc. Natl. Acad. Sci. USA 72, pp 4361-4365.

Cordell, B., Bell, G., Tisher, E., DeNoto, F.M., Ullrich, A., Pictet, R., Rutter, W.J. and Goodman, H.M. (1979) Cell 18, pp 533-543.

Crick, F.H.C. (1958) Symp. Soc. Expl. Biol. 12, pp 138-163.

Crick, F.H.C. (1966) J. Mol. Biol. 19, pp 548-555.

Crick, F.H.C. (1968) J. Mol. Biol. 38, pp 367-379.

Crick, F.H.C., Barnett, L., Brenner, S. and Watts-Tobin, R.J. (1961) Nature 192, pp 1227-1232.

Dang, C.V. (1982) Biochem. Biophys. Res. Commun. 106, pp 49-53.

Dang, C.V. and Dang, C.V. (1983) Bioscience Reports 3, pp 527-538.

Dardell, F., Fayat, G. and Blanquet, S. (1984) J. Bact. 160, pp 1115-1122.

Das, A. and Yanofsky, C. (1984) J. Bact. 160, pp 805-807.

Dayhoff, M.O. (1969) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington, D.C.

Dayhoff, M.O. and McLaughlin, P.J. (1972) in Atlas of Protein Sequence and Structure 1972 (volume 5), National Biomedical Research Foundation, Washington, D.C.

de Moss, J.A. and Novelli, G.D. (1956) Biochim. Biophys. Acta 22, pp 49-61.

Deininger, P.L. (1983) Anal. Biochem. 129, pp 216-223.

Dente, L., Cesareni, G. and Cortese, R. (1983) Nucleic Acids Res. 11, pp 1645-1655.

Dixon, M. and Webb, E.C. (1964) Enzymes (2nd Edition), Longmans, U.K.

Doolittle, R.F. (1981) Science 214, pp 149-159.

Donk, P.J. (1920) in Bergey's Manual of Determinative Bacteriology, 8th Edn, (1974), (Buchanan, R.E. and Gibbons, N.E., Eds), Williams and Wilkins, Baltimore.

Dynan, W.S. and Tjian, R. (1983) Cell 32, pp 669-680.

Eadie, G.S. (1942) J. Biol. Chem. 146, pp 85-93.

Ebel, J-P., Giege, R., Bonnet, J., Kern, D., Befort, N., Bollack, C., Fasiolo, F., Gangloff, J. and Dirheimer, G. (1973) Biochimie 55, pp 547-557.

Edelhoch, H. (1967) Biochemistry 6, pp 1948-1954.

Eisenberg, S.P., Yarus, M. and Soll, L. (1979) J. Mol. Biol. 135, pp 111-126.

Fasiolo, F., Gibson, B.W., Walter, P., Chatton, B., Biemann, K. and Boulanger, Y. (1985) J. Biol. C h e m . 260, pp 15571-15576.

Fayat, G., Hountondji, C. and Blanquet, S. (1979) Eur. J. Biochem. 96, pp 87-92.

Fayat, G., Mayaux, J-F., Sacerdot, C., Fromant, M., Springer, M., Grunberg-Manago, M. and Blanquet, S. (1983) J. Mol. Biol. 171, pp 239-261.

Fersht, A.R. (1977) Biochemistry 16, pp 1025-1030.

Fersht, A.R. (1985) Enzyme Structure and Mechanism (2nd edn), W.H. Freeman & Co., New York.

Fersht, A.R. (1986) in Accuracy in Molecular Processes (Kirkwood, T.B.L., Rosenberger, R.F. and Galas, D.J., Eds), Chapman and Hall, New York.

Fersht, A.R. and Dingwall, C. (1979a) Biochemistry 18, pp 1250-1256.

Fersht, A.R. and Dingwall, C. (1979b) Biochemistry 18, pp 2627-2631.

Fersht, A.R. and Jakes, R. (1975) Biochemistry 14, pp 3350-3356.

Fersht, A.R. and Kaethner, M. (1976) Biochemistry 15, pp 3342-3346.

Fersht, A.R., Mulvey, R.S. and Koch, G.L.E. (1975) Biochemistry 14, pp 13-18.

Fersht, A.R., Schindler, J.S. and Tsui, W-C. (1980) Biochemistry 19, pp 5520-5524.

Fersht, A.R., Shi, J-P., Wilkinson, A.J., Blow, D.M., Carter, P., Waye, M.M.Y. and Winter, G.P. (1984) Angew. Chem. (Int. Ed. Eng.) 23, pp 467-473.

Fersht, A.R., Shi, J-P., Knill-Jones, J., Lowe, D.M., Wilkinson, A.J., Blow, D.M., Brick, P., Carter, P., Waye, M.M.Y. and Winter, G. (1985a) Nature 314, pp 235-238.

Fersht, A.R., Wilkinson, A.J., Carter, P. and Winter, G.P. (1985b) Biochemistry 24, pp 5858-5861.

Fersht, A.R., Leatherbarrow, R.J. and Wells, T.N.C. (1986) TIBS 11, pp 321-325.

Freedman, R., Gibson, B., Donovan, D., Biemann, K., Eisenberg, S., Parker, J. and Schimmel, P. (1985) J. Biol. Chem. 260, pp 10063-10068.

Gibson, T.J. (1984) PhD Thesis, University of Cambridge.

Gold, L., Pribnow, D., Schneider, T., Shinedling, S., Singer, B.S. and Stormo, G. (1981) Ann. Rev. Microbiol. 35, pp 365-403.

Gouy, M. and Gautier, C. (1982) Nucleic Acids Res. 10, pp 7055-7072.

Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. and Mercier, R. (1981) Nucleic Acids Res. 9, pp r43-r74.

Gronenborn, B. and Messing, J. (1978) Nature 272, pp 373-377.

Gros, C. and Labouesse, B. (1969) Eur. J. Biochem. 7, pp 463-470.

Grosjean, H. and Fiers, W. (1982) Gene 18, pp 199-209.

Guarente, L., Lauer, G., Roberts, T.M. and Ptashne, M. (1980) Cell 20, pp 543-553.

Hall, C.V. and Yanofsky, C. (1981) J. Bact. 148, pp 941-949.

Hall, C.V., van Cleemput, M., Muench, K.H. and Yanofsky, C. (1982) J. Biol. Chem. 257, pp 6132-6136.

Hanahan, D. (1983) J. Mol. Biol. 166, pp 557-580.

Hanahan, D. (1985) in DNA Cloning, vol 1 (Glover, D.M., Ed.), IRL Press Ltd, U.K.

Harris, T.J.R. (1983) in Genetic Engineering 4 (Williamson, D., Ed.), Academic Press, London.

Hawley, D.K. and McClure, W.R. (1983) Nucleic Acids Res. 11, pp 2237-2255.

Hecht, S.M. (1979) in Transfer RNA: Structure, Properties and Recognition, (Schimmel, P.R., Soll, D. and Abelson, J.N., Eds), Cold Spring Harbor Laboratory, N.Y.

Henikoff, S. (1984) Gene 28, pp 351-359.

Herrmann, R., Neugebauer, K., Pirkl, E., Zentgraf, H. and Schaller, H. (1980) Molec. Gen. Genet. 177, pp 231-242.

Ho, C.K. and Fersht, A.R. (1986) Biochemistry 25, pp 1891-1897.

Hoagland, M.B., Keller, E.B. and Zamecnick, P.C. (1956) J. Biol. Chem. 218, pp 345-358.

Hoagland, M.B., Stephenson, M.L., Scott, J.F., Hecht, L.I. and Zamecnick, P.C. (1958a) J. Biol. Chem. 231, pp 241-257.

Hoagland, M.B., Zamecnick, P.C. and Stephenson, M.L. (1958b) Biochim. Biophys. Acta 24, pp 215-216.

Hoben, P., Royal, N., Cheung, A., Yamao, F., Biemann, K. and Soll, D. (1982) J. Biol. Chem. 257, pp 11644-11650.

Hofstee, B.H.J. (1959) Nature 184, pp 1296-1298.

Hohn, B., Lechner, H. and Marvin, D.A. (1971) J. Mol. Biol. 56, pp 143-154.

Holmes, D.S. and Quigley, M. (1981) Anal. Biochem. 114, pp 193-197.

Hountondji, C., Fayat, G. and Blanquet, S. (1979) Eur. J. Biochem. 102, pp 247-250.

Hountondji, C., Blanquet, S. and Lederer, F. (1985) Biochemistry 24, pp 1175-1180.

Hountondji, C., Lederer, F., Dessen, P. and Blanquet, S. (1986) Biochemistry 25, pp 16-21.

Ikemura, T. (1981) J. Mol. Biol. 146, pp 1-21.

Isaksson, L.A., Skold, S-E., Skjoldebrand, J. and Takata, R. (1977) Molec. Gen. Genet. 156, pp 233-237.

Jasin, M., Regan, L. and Schimmel, P. (1983) Nature 306, pp 441-447.

Joachimiak, A. and Barciszewski, J. (1980) FEBS Lett. 119, pp 201-211.

Jones, D.H., McMillan, A.J., Fersht, A.R. and Winter, G. (1985) Biochemistry 24, pp 5852-5857.

Jones, M.D., Lowe, D.M., Borgford, T. and Fersht, A.R. (1986) Biochemistry 25, pp 1887-1891.

Joseph, D.R. and Muench, K.H. (1971) J. Biol. Chem. 246, pp 7602-7609.

Kagawa, Y., Nojima, H., Nukiwa, N., Ishizuka, M., Nakajima, T., Yasuhara, T., Tanaka, T. and Oshima, T. (1984) J. Biol. Chem. 259, pp 2956-2960.

Kamio, Y., Lin, C-K., Regue, M. and Wu, H.C. (1985) J. Biol. Chem. 260, pp 5616-5620.

Kim, S.H. (1975) Nature 256, pp 679-681.

Koch, G.L.E., Boulanger, Y. and Hartley, B.S. (1974) Nature 249, pp 316-320.

Kohda, D., Yokoyama, S. and Miyazawa, T. (1984) FEBS Lett. 174, pp 20-23.

Kramer, F.R. and Mills, D.R. (1978) Proc. Natl. Acad. Sci. USA 75, pp 5334-5338.

Kula, M-R. (1973) FEBS Lett. 35, pp 299-302.

Laemmli, U.K. (1970) Nature 227, pp 680-685.

Leatherbarrow, R.J., Fersht, A.R. and Winter, G. (1985) Proc. Natl. Acad. Sci. USA 82, pp 7840-7844.

Leder, P. and Nirenberg, M.W. (1964) Proc. Natl. Acad. Sci. USA 52, pp 420-427.

Lee, F., Bertrand, K., Bennett, G. and Yanofsky, C. (1978) J. Mol. Biol. 121, pp 193-217.

Lewin, B. (1983) Genes, John Wiley, New York.

Loftfield, R.B. (1963) Biochem. J. 89, pp 82-92.

Loftfield, R.B. and Eigner, E.A. (1966) Biochim. Biophys. Acta 130, pp 426-448.

Loftfield, R.B. and Vanderjagt, D. (1972) Biochem. J. 128, pp 1353-1356.

Lowe, D.M., Fersht, A.R., Wilkinson, A.J., Carter, P. and Winter, G. (1985) Biochemistry 24, pp 5106-5109.

Lowry, O.H., Rosebrough, N.J., Farr, A.L. and Randall, R.J. (1951) J. Biol. Chem. 193, pp 265-275.

Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning. A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

Maxam, A.M. and Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, pp 560-564.

McLachlan, A.D. (1971) J. Mol. Biol. 61, pp 409-424.

McLaughlin, J.R., Murray, C.L. and Rabinowitz, J.C. (1981) J. Biol. Chem. 256, pp 11283-11291.

Messing, J. (1979) Recomb. DNA Tech. Bull. 2, p 43.

Messing, J. (1983) Meths. Enzymology 101, pp 20-78.

Messing, J. and Vieira, J. (1982) Gene 19, pp 269-276.

Messing, J., Crea, R. and Seeburg, P.H. (1981) Nucleic Acids Res. 9, pp 309-321.

Mills, D.R. and Kramer, F.R. (1979) Proc. Natl. Acad. Sci. USA 76, pp 2232-2235.

Min Jou, W., Haegeman, M., Ysebaert, M. and Fiers, W. (1972) Nature 237, pp 82-88.

Moran Jr., C.P., Lang, N., LeGrice, S.F.J., Lee, G., Stephens, M., Sonenshein, A.L., Pero, J. and Losick, R. (1982) Mol. Gen. Genet. 186, pp 339-346.

Monteilhet, C. and Blow, D.M. (1978) J. Mol. Biol. 122, pp 407-417.

Moore, S. and Stein, W.H. (1963) Meths. Enzymol. 6, pp 819-831.

Muench, K.H. (1976) J. Biol. Chem. 251, pp 5195-5199.

Mulvey, R.S. and Fersht, A.R. (1976) Biochemistry 15, pp 243-249.

Mulvey, R.S. and Fersht, A.R. (1977a) Biochemistry 16, pp 4005-4013.

Mulvey, R.S. and Fersht, A.R. (1977b) Biochemistry 16, pp 4731-4737.

Mulvey, R.S., Gualtieri, R.J. and Beychok, S. (1974) Biochemistry 13, pp 782-787.

Myers, A.M. and Tzagoloff, A. (1985) J. Biol. Chem. 260, pp 15371-15377.

Myers, G., Blank, H.U. and Soll, D. (1971) J. Biol. Chem. 246, pp 4955-4964.

Neidhardt, F.C., Bloch, P.L., Pedersen, S. and Reeh, S. (1977) J. Bact. 129, pp 378-387.

Neugebauer, K., Sprengel, R. and Schaller, H. (1981) Nucleic Acids Res. 9, pp 2577-2588.

Nirenberg, M.W. and Leder, P. (1964) Science 145, pp 1399-1407.

Nirenberg, M.W. and Matthaei, J.H. (1961) Proc. Natl. Acad. Sci. USA 47, pp 1588-1602.

Normanly, J., Ogden, R.C., Horvath, S.J. and Abelson, J. (1986) Nature 321, pp 213-219.

Norrander, J., Kempe, T. and Messing, J. (1983) Gene 26, pp 101-106.

Norris, A.T. and Berg, P. (1964) Proc. Natl. Acad. Sci. USA 52, pp 330-337.

O’Farrell, P. (1981) BRL Focus 3, pp 1-3.

O’Farrell, P.H. (1975) J. Biol. Chem. 250, pp 4007-4021.

Orgel, L.E. (1963) Proc. Natl. Acad. Sci. USA 49, pp 517-521.

Orgel, L.E. (1968) J. Mol. Biol. 38, pp 381-393.

Pauling, L. (1946) Chem. Eng. News 24, pp 1375-1377.

Pedersen, S., Bloch, P.L., Reeh, S. and Neidhardt, F.C. (1978) Cell 14, pp 179-190.

Perutz, M.F. and Raidt, H. (1975) Nature 255, pp 256-258.

Pharmacia Ltd, Molecular Biologicals catalogue (1984).

Platt, T. (1986) Ann. Rev. Biochem. 55, pp 339-372.

Plumbridge, J.A. and Springer, M. (1980) J. Mol. Biol. 144, pp 595-600.

Plumbridge, S.D. and Springer, M. (1982) J. Bact. 152, pp 661-668.

Pororske, L.H., Cohn, M., Yanagiasawa, N. and Auld, D.S. (1979) Biochim. Biophys. Acta 576, pp 128-133.

Post, L.E. and Nomura, M. (1980) J. Biol. Chem. 225, pp 1064-1066.

Post, L.E., Arfsten, A.E., Reusser, F. and Nomura, M. (1978) Cell 15, pp 215-229.

Post, L.E., Strycharz, G.D., Nomura, M., Lewis, H. and Dennis, P.P. (1979) Proc. Natl. Acad. Sci. USA 76, pp 1697-1701.

Pratt, D. (1969) Ann. Rev. Genet. 3, pp 343-367.

Pribnow, D. (1975) Proc. Natl. Acad. Sci. USA 72, pp 784-788.

Putney, S.D. and Schimmel, P.R. (1981) Nature 291, pp 632-635.

Putney, S.D., Royal, N.J., Neuman de Vegvar, H.N., Herlihy, W.C., Biemann, K. and Schimmel, P. (1981a) Science 213, pp 1497-1500.

Putney, S.D., Sauer, R.T. and Schimmel, P.R. (1981b) J. Biol. Chem. 256, pp 198-204.

Queen, C. and Korn, L.J. (1984) Nucleic Acids Res. 12, pp 581-599.

Ratzin, B. and Carbon, J. (1977) Proc. Natl. Acad. Sci. USA 74, pp 487-491.

Risler, J.L., Zelwer, C. and Brunie, S. (1981) Nature 292, pp 384-386.

Rosenberg, M. and Court, D. (1979) Ann. Rev. Genet. 13, pp 319-353.

Rossmann, M.G. and Argos, P. (1977) J. Mol. Biol. 109, pp 99-129.

Rossmann, M.G., Liljas, A., Branden, C.I. and Banaszak, L.J. (1975) in The Enzymes vol. XI (Boyer, P.D., Ed.), Academic Press, New York.

Rubin, J. and Blow, D. (1981) J. Mol. Biol. 145, pp 489-500.

Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L., Coulson, A.R., Fiddes, J.C., Hutchinson III, C., Slocombe, P.M. and Smith, M. (1977a) Nature 265, pp 687-695.

Sanger, F., Nicklen, A.R. and Coulson, A.R. (1977b) Proc. Natl. Acad. Sci. USA 74, pp 5463-5467.

Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J.H. and Roe, B.A. (1980) J. Mol. Biol. 143, pp 161-178.

Schaller, H., Gray, C. and Herrmann, K. (1975) Proc. Natl. Acad. Sci. USA 72, pp 737-741.

Schimmel, P.R. (1979) in Transfer RNA: Structure, Properties and Recognition, (Schimmel, P.R., Soll, D. and Abelson, J.N., Eds), Cold Spring Harbor Laboratory, N.Y.

Schimmel, P.R. and Soll, D. (1979) Ann. Rev. Biochem. 48, pp 601-648.

Schulman, L.H., Pelka, H. and Susani, M. (1983) Nucleic Acids Res. 11, pp 1439-1455.

Schulz, G.E. and Schirmer, R.H. (1979) Principles of Protein Structure, Springer-Verlag, New York.

Shepard, H.M., Yelverton, E. and Goeddel, D.V. (1982) DNA 1, pp 121-131.

Shine, J. and Dalgarno, L. (1974) Proc. Natl. Acad. Sci. USA 71, pp 1342-1346.

Siebenlist, U., Simpson, R.B. and Gilbert, W. (1980) Cell 20, pp 269-281.

Skogman, S.G. and Nilsson, J. (1984) Gene 30, pp 219-226.

Soll, D. and Schimmel, P.R. (1974) in The Enzymes vol. X (Boyer, P.D., Ed.), Academic Press, New York.

Springer, M., Trudel, M., Graffe, M., Plumbridge, J.A., Fayat, G., Mayaux, J.F., Sacerdot, C., Blanquet, S. and Grunberg-Manago, M. (1983) J. Mol. Biol. 171, pp 263-279.

Springer, M., Plumbridge, J.A., Butler, J.S., Graffe, M., Dondon, J., Mayaux, J.F., Fayat, G., Lestienne, P., Blanquet, S. and Grunberg-Manago, M. (1985) J. Mol. Biol. 185, pp 93-104.

Springgate, C.F. and Loeb, L.A. (1975) J. Mol. Biol. 97, pp 577-591.

Sprinzl, M., Moll, J., Meissner, F. and Hartmann, T. (1985) Nucleic Acids Res. 13, pp r1-r49.

Staden, R. (1979) Nucleic Acids Res. 6, pp 2601-2610.

Staden, R. (1980) Nucleic Acids Res. 8, pp 3673-3694.

Staden, R. (1982a) Nucleic Acids Res. 10, pp 2951-2961.

Staden, R. (1982b) Nucleic Acids Res. 10, pp 4731-4751.

Staden, R. (1984) Nucleic Acids Res. 12, pp 521-538.

Staden, R. (1986) Nucleic Acids Res. 14, pp 217-231.

Stanier, R.Y., Adelberg, E.A. and Ingraham, J.L. (1976) General Microbiology, 4th edition, Macmillan Press Ltd, London.

Struhl, K., Cameron, J.R. and Davis, R.W. (1976) Proc. Natl. Acad. Sci. USA 73, pp 1471-1475.

Sutcliffe, J.G. (1978) Cold Spring Harbor Symp. Quant. Biol. 43, pp 77-90.

Takahashi, K., Vigneron, M., Matthes, H., Wildeman, A., Zenke, M. and Chambon, P. (1986) Nature 319, pp 121-126.

Temin, H.M. and Baltimore, D. (1972) Adv. Virus Res. 17, pp 129-186.

Thiebe, R., Harbers, K. and Zachau, H.G. (1972) Eur. J. Biochem. 26, pp 144-152.

Thompson, R. (1982) in Genetic Engineering 3 (Williamson, R., Ed.), Academic Press, London.

Tinoco Jr., I., Borer, P.N., Dengler, B., Levine, M.D., Uhlenbeck, O.C., Crothers, D.M. and Gralla, J. (1973) Nature New Biol. 246, pp 40-41.

Tokunaga, M., Loranger, J.M., Chang, S-Y., Regue, M., Chang, S. and Wu, H.C. (1985) J. Biol. Chem. 260, pp 5610-5615.

Travers, A.A., Lamond, A.I., Mace, H.A.F. and Berman, M.L. (1983) Cell 35, pp 265-273.

Twigg, A.J. and Sherratt, D. (1980) Nature 283, pp 216-218.

Venkatachalam, C.M. (1968) Biopolymers 6, pp 1425-1436.

Vieira, J. and Messing, J. (1982) Gene 19, pp 259-268.

Walker, J.E., Wonacott, A.J. and Harris, J.I. (1980) Eur. J. Biochem. 108, pp 581-586.

Walter, P., Gangloff, J., Bonnet, J., Boulanger, Y., Ebel, J-P. and Fasiolo, F. (1983) Proc. Natl. Acad. Sci. USA 80, pp 2437-2441.

Waterson, R.M. and Konigsberg, W.H. (1974) Proc. Natl. Acad. Sci. USA 71, pp 376-380.

Waye, M.M.Y. and Winter, G. (1986) Eur. J. Biochem. 158, pp 505-510.

Waye, M.M.Y., Winter, G., Wilkinson, A.J. and Fersht, A.R. (1983) EMBO J. 2, pp 1827-1829.

Waye, M.M.Y., Verhoeyen, M.E., Jones, P.T. and Winter, G. (1985) Nucleic Acids Res. 13, pp 8561-8573.

Webster, T.A., Gibson, B.W., Keng, T., Biemann, K. and Schimmel, P. (1983) J. Biol. Chem. 258, pp 10637-10641.

Webster, T., Tsai, H., Kula, M., Mackie, G.A. and Schimmel, P. (1984) Science 226, pp 1315-1317.

Wells, T.N.C. and Fersht, A.R. (1986) Biochemistry 25, pp 1881-1886.

Wilcox, M. (1969) Eur. J. Biochem. 11, pp 405-412.

Wilkinson, A.J. (1984) PhD Thesis, University of London.

Winter, G., Fersht, A.R., Wilkinson, A.J., Zoller, M. and Smith, M. (1982) Nature 299, pp 756-758.

Winter, G., Koch, G.L.E., Hartley, B.S. and Barker, D.G. (1983) Eur. J. Biochem. 132, pp 383-387.

Wong, S-L. and Doi, R.H. (1984) J. Biol. Chem. 259, pp 9762-9767.

Yamao, F., Inokuchi, H., Cheung, A., Ozeki, H. and Soll, D. (1982) J. Biol. Chem. 257, pp 11639-11643.

Yanisch-Perron, C., Vieira, J. and Messing, J. (1985) Gene 33, pp 103-119.

Yaniv, M. and Gros, F. (1970) J. Mol. Biol. 44, pp 1-15.

Yanofsky, C. (1981) Nature 289, pp 751-758.

Yarus, M. (1972) Biochemistry 11, pp 2352-2361.

Yarus, M. (1979) in Transfer RNA: Structure, Properties and Recognition, (Schimmel, P.R., Soll, D. and Abelson, J.N., Eds), Cold Spring Harbor Laboratory, N.Y.

Yarus, M. and Berg, P. (1970) Anal. Biochem. 35, pp 450-465.

Yu, F., Yamada, H., Daishima, K. and Mizushima, S. (1984) FEBS Lett. 173, pp 264-268.

Zamecnick, P.C., Stephenson, M.L. and Hecht, L.I. (1958) Proc. Natl. Acad. Sci. USA 44, pp 73-78.

Zelwer, C., Risler, J.L. and Brunie, S. (1982) J. Mol. Biol. 155, pp 63-81.