
THE MOLECULAR BIOLOGY OF THE Streptomyces snp LOCUS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the Graduate

School of The Ohio State University

By

Charles Louis DeSanti, B.S.

*****

The Ohio State University 2000

Dissertation Committee:

Dr. Tina Henkin, Adviser

Dr. Charles Daniels

Dr. John Reeve

Dr. William Strohl

Approved by

Adviser, Department of Microbiology

UMI Number 9982547

UMI Microform 9982547. Copyright 2000 by Bell & Howell Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ABSTRACT

The Streptomyces are producers of numerous extracellular proteinases. Among these, the small neutral metalloproteinases, typified by SnpA from Streptomyces sp. strain C5, are unique among metalloproteinases because of their small size and atypical aspartyl zinc-binding residue. The snp locus of Streptomyces sp. strain C5, which encodes SnpA, is comprised of the snpA proteinase gene and the divergently oriented snpR gene. SnpR, a LysR-like transcriptional regulator, activates snpA transcription by about 35-fold compared to non-activated levels. The transcriptional start points of the two genes were determined, and one of the interesting features found is that snpA is transcribed as a leaderless mRNA species. The transcriptional properties of the Streptomyces sp. strain C5 snp locus were harnessed to generate a family of expression and expression-secretion vectors for heterologous gene expression in Streptomyces, and these vectors were validated by expressing and secreting recombinant human endostatin from S. lividans such that it remained soluble in the extracellular milieu.

A panel of Streptomyces was also screened using PCR for snp-like DNA, and 33 of the 96 strains tested gave positive results. Twelve amplification products were cloned, sequenced, and analyzed. Comparison of the new and published sequences revealed that the Streptomyces snp loci from diverse species have significant similarity in both the 5’ snpR region and the snpA region encoding the mature proteinase, but substantially lower similarity in the intergenic region and in the 5’ end of snpA encoding the signal peptide and propeptide. Phylogenetic analysis of the loci suggested that, although distributed widely among the Streptomyces taxa, monophyletic relationships exist between the snp loci of: i) S. fradiae and S. fellus; ii) S. coelicolor and Streptomyces ATCC 11862; and iii) S. rochei, S. griseoruber and S. lividans.

For Christine

ACKNOWLEDGMENTS

I wish to thank Dr. William Strohl for his invaluable guidance, motivation, support, and unending enthusiasm throughout my scientific training, both in the academic and the industrial settings. I am also indebted to Dr. Tina Henkin for her patience and support in the latter stages of my graduate career, as well as Dr. John Reeve and Dr. Charles Daniels for their thoughtful input and fresh perspectives.

I owe a debt of gratitude to past members of the Strohl lab from whom I received my initial training in microbiology: Richard Plater, Yun Li, Michael Dickens, and Gary Kleman. Other members, Anton Woo, Vineet Rajgahria, and Rob Walczak, provided the framework of moral support and sense of camaraderie that benefited us all. I would also like to thank David Eeles and Peggy Milliman-Wing for their unquestioning accommodation of my myriad requests while finishing my degree.

The fine personnel at Merck Research Labs are also deserving of many thanks for their scientific input and gracious support of my thesis research during my tenure there, especially Dr. Steve Gould, without whom I would not have had the opportunity to study at MRL, and Dr. Neal Connors, Dr. Liesa Anderson, and Desmond Clark for their support and encouragement.

Lastly, I am deeply indebted to my family, especially my wife Christine, for their unwavering support and encouragement throughout my graduate career.

VITA

December 12, 1969...... Born, Fairview Park, Ohio

1992...... B.S. Microbiology, The Ohio State University

1992-1997...... Graduate Teaching and Research Associate, The Ohio State University

1997-2000...... Visiting Scientist, Merck Research Labs, Merck & Co., Inc, Rahway, NJ

2000-present ...... Research Associate, Case Western Reserve University

PATENTS AND PATENT APPLICATIONS

Strohl, W. R., Dickens, M. L. and DeSanti, C. L. (1999). Methods of Producing Doxorubicin. U.S. Patent 5,962,293. Assignee: The Ohio State University Research Foundation.

Strohl, W. R., Dickens, M. L. and DeSanti, C. L. (1999). Methods of Producing Doxorubicin. U.S. Patent 5,976,830. Assignee: The Ohio State University Research Foundation.

DeSanti, C. L. and W. R. Strohl. (2000). Soluble recombinant human endostatin and method of making same from Streptomyces sp. International patent application PCT/US00/09747.

FIELDS OF STUDY

Major Field: Microbiology

TABLE OF CONTENTS

Page

Dedication ...... iv

Acknowledgments ...... v

Vita...... vi

List of Tables ...... ix

List of Figures ...... x

Abbreviations ...... xv

Chapters:

1. Introduction ...... 1
Streptomyces ...... 1
Proteinases ...... 6
LysR-like Transcriptional Regulators ...... 12
The Streptomyces snp Loci ...... 15
Recombinant Gene Expression in Streptomyces ...... 24
Goals of this study ...... 32

2. Genetic characterization of the Streptomyces sp. strain C5 snp locus, and development of an snp-derived expression vector family ...... 33

Introduction ...... 33

Method and Materials ...... 34
Bacterial strains and plasmids ...... 34
Media and growth conditions ...... 35
DNA and RNA manipulation ...... 36
DNA sequencing and analysis ...... 37
Cloning procedures and plasmid construction ...... 39
Reporter quantification ...... 47
Primer extension analyses ...... 48
Protein electrophoresis and blotting ...... 49
Immunodetection procedures ...... 50
Semi-quantitative western blot analysis ...... 51

Results ...... 51
Sequence and analysis of snpR ...... 51
Construction of general purpose cloning vectors ...... 56
Reporter gene constructs ...... 56
Comparison of different reporter constructs ...... 59
Expression of episomal snpA as a function of host growth ...... 64
Mapping of the snpA and snpR transcriptional start sites ...... 65
Construction and utilization of Streptomyces expression vectors ...... 71
Construction and utilization of Streptomyces secretion vectors ...... 77

Discussion ...... 91

Summary...... 103

3. Isolation, characterization and analysis of twelve Streptomyces loci homologous to Streptomyces sp. strain C5 snp...... 104

Introduction ...... 104

Methods and Materials ...... 105
Bacterial strains and plasmids ...... 105
Media and growth conditions ...... 107
Nucleic acid purification ...... 107
PCR methodology ...... 107
DNA sequencing ...... 108
Sequence management and analysis ...... 108
Phylogenetic analysis ...... 109

Results ...... 110
Design of primers for amplification of homologous loci ...... 110
Evaluation of primer sets ...... 110
Homolog search ...... 110
Cloning, sequencing, and properties of the snp-homologous loci ...... 115
Comparative analysis of the snp loci ...... 115
snpR ...... 119
Intergenic region ...... 121
SnpA proteinase signal peptide ...... 131
The SnpA propeptide region ...... 133
The SnpA mature proteinase ...... 134
Phylogenetic analysis of the snp locus ...... 139

Discussion ...... 147

Summary...... 163

4. Conclusions ...... 164

List of References ...... 167

Appendices ...... 191
Appendix A. Table of plasmids used in the Streptomyces sp. strain C5 snpA/R study presented in CHAPTER 2 ...... 191
Appendix B. Streptomyces screened for snp DNA in CHAPTER 3 ...... 194
Appendix C. Nucleotide Sequences of Cloned snp Loci ...... 198
Appendix D. Maps of Plasmids Used ...... 217

LIST OF TABLES

Table Page

1.1 Properties of selected LysR family members ...... 19

1.2 Selected examples of heterologous proteins produced and secreted in S. lividans ...... 31

2.1 Bacterial strains used in this study (Chapter 2) ...... 35

2.2 DNA oligonucleotides used for linker preparation ...... 42

2.3 Leader peptides of the snp-based secretion vectors ...... 78

3.1 Plasmids used in this study (Chapter 3) ...... 106

3.2 PCR product properties for snp-positive Streptomyces ...... 116

3.3 Pairwise sequence differences between snpR gene fragments ...... 140

3.4 Pairwise sequence differences between snpR mature gene regions ...... 144

A.1 Plasmids used in CHAPTER 2 ...... 192

B.1 Streptomyces species screened for snp DNA ...... 195

LIST OF FIGURES

Figure Page

1.1 Diagrammatic representation of Streptomyces life cycle ...... 3

1.2 Families of metalloproteinases ...... 11

1.3 Domain organization of LysR-like regulatory proteins ...... 17

1.4 Milk plate showing various recombinant and non-recombinant Streptomyces colonies ...... 21

1.5 Ribbon representation of ScNP ...... 26

2.01 Gene snpR nucleotide and deduced amino-acid sequence ...... 52

2.02 CLUSTALW comparison of Streptomyces C5 SnpR, S. lividans SlpR and S. coelicolor MprR amino acid sequences ...... 55

2.03 Nucleotide sequences of E. coli cloning vector multiple cloning sites ...... 58

2.04 Promoter probe constructs pANT849, pANT852, pANT853, and pANT856.... 61

2.05 Bar chart of NptII production conferred by pANT849, pANT852, pANT853, pANT855 and pANT856 constructs ...... 63

2.06 Comparison of total protein to optical density for growth curve preparation ...... 66

2.07 S. lividans TK24 (pANT852) NptII production profile ...... 68

2.08 Mutant ermE promoter construct and comparison of PsnpA- and PermE-regulated NptII production ...... 70

2.09 Primer extension results for the snpA transcript ...... 73

2.10 Primer extension results for the snpR transcript ...... 75

2.11 The pANT1200 series expression vector plasmids and multiple cloning site ...... 80

2.12 Nucleotide sequence of pANT3042 signal peptide-endostatin junction ...... 83

2.13 Western blot analysis of culture broth from streptomycetes producing and secreting recombinant human endostatin ...... 86

2.14 Semi-quantitative western blot of S. lividans (pANT3052) broth samples ...... 88

2.15 Growth curve and recombinant endostatin production profile for S. lividans TK24 (pANT3052) ...... 90

2.16 Nucleotide sequence and features of the Streptomyces C5 snpA-R intergenic region ...... 96

3.01 CLUSTALW alignment of snpR, slpR and mprR nucleotide sequences with regions chosen as primer sequences indicated ...... 112

3.02 CLUSTALW alignment of snpA, slpA, and mprA nucleotide sequences with regions chosen as primer sequences indicated ...... 114

3.03 Electrophoretic gel image showing PCR reactions conducted with genomic DNA from Merck culture collection streptomycetes ...... 118

3.04 Alignment of LysR-like protein sequences from Streptomyces and E. coli ...... 120

3.05 Consensus amino acid sequence of SnpR amino-terminus ...... 119

3.06 Full alignment of snpA-R intergenic DNA sequences ...... 123

3.07 Section of snpA-R intergenic region alignment showing sense strand with respect to snpR...... 127

3.08 Enlarged view of the T-N11-A inverted repeat motif ...... 128

3.09 Two separate alignments of sequences upstream of the snpA start codon ...... 130

3.10 Alignment of SnpA homolog signal peptide sequences ...... 132

3.11 Alignment of SnpA homolog pro-peptide amino acid sequences ...... 135

3.12 Alignment of SnpA homolog mature proteinase amino acid sequences ...... 138

3.13 Consensus sequence of the mature SnpA conserved region ...... 139

3.14 Unrooted neighbor joining phylogenetic tree of snpR gene fragments ...... 141

3.15 Unrooted maximum parsimony phylogenetic tree of snpR gene fragments ...... 142

3.16 Unrooted neighbor joining phylogenetic tree of snpA gene segments ...... 145

3.17 Unrooted maximum parsimony phylogenetic tree of snpA gene segments ...... 146

3.18 Distribution of snp loci among Streptomyces taxa ...... 150

3.19 Structure of S. caespitosus small neutral proteinase showing region of amino-acid gapping among the homologous SnpA proteinases ...... 161

C.1 Nucleotide sequence of S. avermitilis snp locus fragment ...... 199

C.2 Nucleotide sequence of S. spectabilis snp locus fragment ...... 200

C.3 Nucleotide sequence of S. griseoruber snp locus fragment ...... 201

C.4 Nucleotide sequence of Streptomyces ATCC 11862 snp locus fragment ...... 203

C.5 Nucleotide sequence of S. thermotolerans snp locus fragment ...... 205

C.6 Nucleotide sequence of S. rochei snp locus fragment ...... 206

C.7 Nucleotide sequence of S. flocculus snp locus fragment ...... 208

C.8 Nucleotide sequence of S. spadicis snp locus fragment ...... 210

C.9 Nucleotide sequence of Streptomyces ATCC 21021 snp locus fragment ...... 211

C.10 Nucleotide sequence of S. yra

C.11 Nucleotide sequence of S. namwaensis snp locus fragment ...... 214

C. 12 Nucleotide sequence of S. fellus snp locus fragment ...... 216

D.1 Map of plasmid pCR2.1 TOPO ...... 218

D.2 Map of plasmid pUC19 ...... 219

D.3 Map of plasmid pIJ101 ...... 220

D.4 Map of plasmid pIJ303 ...... 221

D.5 Map of plasmid pIJ486 ...... 222

D.6 Map of plasmid pIJ702 ...... 223

D.7 Map of plasmid pKK840 ...... 224

D.8 Map of plasmid pANT826 ...... 225

D.9 Map of plasmid pANT827 ...... 226

D.10 Map of plasmid pANT840 ...... 227

D.11 Map of plasmid pANT841 ...... 228

D.12 Map of plasmid pANT842 ...... 229

D.13 Map of plasmid pANT846 ...... 230

D.14 Map of plasmid pANT849 ...... 231

D.15 Map of plasmid pANT852 ...... 232

D.16 Map of plasmid pANT853 ...... 233

D.17 Map of plasmid pANT855 ...... 234

D.18 Map of plasmid pANT856 ...... 235

D.19 Map of plasmid pANT857 ...... 236

D.20 Map of plasmid pANT859 ...... 237

D.21 Map of plasmid pANT882 ...... 238

D.22 Map of plasmid pANT883 ...... 239

D.23 Map of plasmid pANT886 ...... 240

D.24 Map of plasmid pANT887 ...... 241

D.25 Map of plasmid pANT889 ...... 242

D.26 Map of plasmid pANT890 ...... 243

D.27 Map of plasmid pANT891 ...... 244

D.28 Map of plasmid pANT892 ...... 245

D.29 Map of plasmid pANT893 ...... 246

D.30 Map of plasmid pANT894 ...... 247

D.31 Map of plasmid pANT895 ...... 248

D.32 Map of plasmid pANT1200 ...... 249

D.33 Map of plasmid pANT1201 ...... 250

D.34 Map of plasmid pANT1202 ...... 251

D.35 Map of plasmid pANT3021 ...... 252

D.36 Map of plasmid pANT3022 ...... 253

D.37 Map of plasmid pANT3023 ...... 254

D.38 Map of plasmid pANT3024 ...... 255

D.39 Map of plasmid pANT3025 ...... 256

D.40 Map of plasmid pANT3026 ...... 257

D.41 Map of plasmid pANT3032 ...... 258


D.42 Map of plasmid pANT3035 ...... 259

D.43 Map of plasmid pANT3042 ...... 260

D.44 Map of plasmid pANT3045 ...... 261

D.45 Map of plasmid pANT3052 ...... 262

ABBREVIATIONS

bp      base pairs
cDNA    complementary deoxyribonucleic acid
Da      Daltons
DNA     deoxyribonucleic acid
dNTP    deoxynucleoside-5’-triphosphate
dsDNA   double-stranded deoxyribonucleic acid
EtBr    ethidium bromide
FITC    fluorescein isothiocyanate
IPTG    isopropyl-β-D-thiogalactoside
kbp     kilobase pairs
kDa     kilodaltons
MCS     multiple cloning site
mg      milligram
MOPS    morpholinopropanesulfonic acid
mRNA    messenger ribonucleic acid
NFDM    non-fat dairy milk
ng      nanogram
pg      picogram
pmol    picomole
PMSF    phenylmethylsulfonyl fluoride
PVDF    polyvinylidene difluoride
RNA     ribonucleic acid
rRNA    ribosomal RNA
SDS-PAGE  sodium dodecylsulfate polyacrylamide gel electrophoresis
ssDNA   single-stranded deoxyribonucleic acid
tRNA    transfer RNA
µg      microgram
µM      micromolar
X-Gal   5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside

CHAPTER 1

INTRODUCTION

Streptomyces

Bacteria of the genus Streptomyces are Gram-positive, aerobic, spore-forming soil microorganisms distributed widely in nature (Lechevalier, 1981; Williams, 1978). The streptomycetes, and other members of the order Actinomycetales, are well known producers of numerous extracellular hydrolases, and play an important role in the turnover of organic matter in soil (Kutzner et al., 1981; Goodfellow et al., 1984). Having evolved the ability to degrade and subsist on the high molecular weight organic polymers frequently found in their growth environments (Korn-Wendisch and Kutzner, 1991), different Streptomyces species have demonstrated the ability to utilize cellulose (Ishaque and Kluepfel, 1980), lignocellulose (Barder and Crawford, 1981), chitin (Carpentier and Percheron, 1983), and pectin (Sreeneth and Joseph, 1982) as growth substrates. Beyond polysaccharide substrates, streptomycetes assimilate extracellular proteinaceous nitrogen sources through the action of secreted proteolytic enzymes (Shapiro, 1989).

Streptomyces exhibit a developmental life cycle with complex morphological and physiological changes (Figure 1.1). When presented with a suitable growth surface, such as decaying plant material or solid agar medium, Streptomyces spores undergo germination and grow in a vegetative fashion.

Figure 1.1. Diagrammatic representation of Streptomyces life cycle. On suitable solid growth substrates, streptomycete spores will germinate and proceed to a vegetative growth phase in which mycelia traverse and penetrate the substratum, aided by secreted hydrolases. When the nutrients begin to disappear, reproductive growth ensues, culminating in the generation of aerial mycelia, which septate to form spore chains. These mature spores allow the bacterium to survive extended periods of nutrient deprivation, and facilitate transport to fresh growth surfaces. (Adapted from Chater, 1998)

[Figure 1.1 diagram labels: vegetative growth; assimilation of nutrients; first stationary phase; secondary metabolites; reproductive growth; second stationary phase; sporulation septation; spore maturation; desiccation-resistant spores]

During this growth phase, branched mycelia traverse and penetrate the growth substrate, secreting hydrolytic enzymes to liberate nutrients from the medium (Chater, 1998).

Growth of the mycelia occurs predominantly at the hyphal tips (Brana et al., 1982), and the mycelial cells lack internal septa. Thus, the coenocytic mycelia possess large multinucleate cytoplasmic spaces, with the genetic material positioned centrally within the cells and evenly spaced along the length of the hyphae (Chen, 1966). This cellular structure also facilitates transfer of nutrients within the developing colony. When the nutritive properties of the growth medium begin to dissipate, a developmental switch occurs, triggering the sporulation pathway (Hopwood, 1988). The mycelia begin to grow upward and away from the growth surface, partly utilizing the original vegetative mycelium as a source of carbon and energy (Chater, 1998). Often termed substrate mycelium, the original growth is auto-processed by secondary nutritional hydrolases (Ginther, 1978; Gibb and Strohl, 1988). These hyphae grow upward in the form of aerial mycelium, and begin to septate. By the time the nutritional resources of the substrate mycelium are completely consumed, the aerial mycelia have completed the transition to spore chains. Metabolically dormant, desiccation resistant, and easily dispersed, these spores allow the streptomycete to survive prolonged periods of nutrient deprivation, and also allow transport of the bacterium (e.g., as wind-borne spores) to fresh growth environments (Reponen et al., 1998).

The onset of the sporulation pathway is accompanied by the production of numerous secondary metabolites. Most frequently low molecular weight compounds, these substances are thought primarily to function in the protection of the substrate mycelium from consumption by other more rapidly growing soil microbes. Many of the secondary metabolites produced by streptomycetes have potent biological properties that make them effective human and animal health therapeutics. As a result, streptomycetes produce approximately 70% of all known antibiotics, and members of the Order Actinomycetales are well known as rich sources of clinically important natural products.

In addition to the production of biologically active small molecules, the numerous hydrolytic enzymes produced by Streptomyces make them particularly well suited as producers of commercial and industrial enzymes (Gilbert et al., 1995). One example of a commercial enzyme product of streptomycete origin is Pronase, a preparation of S. griseus culture supernatants comprised of four proteinases: the chymotrypsin-like proteinases A, B and E, and S. griseus trypsin (Narahashi et al., 1967; Sidhu et al., 1994). From an industrial perspective, some enzymes such as xylanases have been used to selectively hydrolyze the hemicellulosic components of wood pulp, and the actinomycetes are good sources of such xylanolytic enzymes (Holtz et al., 1991). Microbial xylanases have been used successfully as biobleaching agents in the paper industry (Viikari et al., 1991), and they stand to significantly reduce the use of toxic chlorine-derived bleaching chemicals in the paper production process (Senior et al., 1992). Moreover, several Streptomyces species, including S. viridosporus (Ramachandra et al., 1988), exhibit lignocellulolytic activity, generating acid precipitable polymeric lignin (APPL) from ball-milled wheat straw in commercially competitive quantities (Pettey and Crawford, 1985; Trigo and Ball, 1994).

The typical Streptomyces chromosome is about 8 × 10^6 base pairs (8 Mb) in size, nearly double the ca. 4 Mb chromosomes of E. coli and B. subtilis (Redenbach et al., 1996). The DNA of Streptomyces is typically between 68 and 78 mol% guanosine plus cytosine (G+C) (Goodfellow and Cross, 1984). Until recently, the genetics of the Streptomyces were enigmatic. With the advent of automated sequencing technology and pulsed-field gel electrophoresis, however, a more complete understanding of Streptomyces genetics has emerged. Originally assumed to be circular, the Streptomyces chromosome is now known to be linear, containing long terminal inverted repeats of tens to hundreds of kilobases with covalently bound terminal proteins (Wang et al., 1999).

Moreover, the genomic DNA sequence of S. coelicolor A3(2), the best characterized streptomycete, is nearly complete. Preliminary analyses of the data show that the gene density is about one gene per 1100 base pairs, which suggests that approximately 7000 genes are encoded by the chromosome, 20% more than in the eukaryotic yeast Saccharomyces cerevisiae (Hopwood, 1999). Several hundred such ‘extra’ genes encode proteins involved in secondary metabolite production and morphological differentiation, but far more are involved in detecting and responding to environmental changes (Hopwood, 1999). Thus, the streptomycetes are now understood to be far more sensitive to minute variations in their environment than earlier thought.
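The gene-count estimate quoted above follows from simple arithmetic on the chromosome size and gene density given in the text. The short Python sketch below reproduces that calculation and, as an aside, shows how mol% G+C can be computed for a sequence. The function names and the example sequence are illustrative assumptions, not material from this study.

```python
# Back-of-the-envelope figures taken from the text above.
CHROMOSOME_BP = 8_000_000   # ~8 Mb Streptomyces chromosome
BP_PER_GENE = 1_100         # ~one gene per 1100 base pairs


def estimate_gene_count(chromosome_bp: int, bp_per_gene: int) -> int:
    """Approximate number of genes, rounded to the nearest integer."""
    return round(chromosome_bp / bp_per_gene)


def gc_percent(sequence: str) -> float:
    """mol% G+C of a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    # The result is on the order of the ~7000 genes cited in the text.
    print(estimate_gene_count(CHROMOSOME_BP, BP_PER_GENE))
    # Hypothetical, made-up sequence used only to exercise the helper.
    print(gc_percent("GCCGGCATCGGC"))
```

The rounded quotient is somewhat above 7000, consistent with the text's "approximately 7000 genes", since both input figures are themselves approximations.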

Proteinases

Proteolysis is a metabolic activity essential to nearly all living cells. The effectors of proteolysis, proteolytic enzymes, catalyze cleavage of the peptide bonds in proteins (Neurath, 1984). Presumed to have arisen from the earliest life forms, mammalian digestive proteinases share common ancestry with microbial proteinases, the predecessors of which are considered to have arisen some two billion years ago (James, 1980). During this evolutionary process, proteinases of broad substrate specificity, which served only primitive digestive purposes, adapted to become components of more elaborate physiological mechanisms requiring limited, rather than complete, proteolysis. These proteinases developed restricted substrate preferences, cleaving only specific peptide bonds in specific physiological substrates. Many biological processes, such as the transport of proteins across membranes, the activation of hormones from inactive precursors, blood coagulation and fibrinolysis, and fertilization are critically dependent on limited proteolysis (Neurath et al., 1976).

Under physiological conditions, the basic mechanism by which peptide bonds are cleaved by proteolytic enzymes is the same. This mechanism involves: i) an electrophile to polarize the carbonyl carbon-oxygen bond of the scissile bond; ii) some form of nucleophile to form a tetrahedral intermediate with the carbonyl carbon atom; and iii) a proton donor which adds a proton to the nitrogen of the leaving group (James, 1980).

Beyond these similarities, the amino acids either directly or indirectly responsible for the specific roles above vary among the known proteinases. Grouping proteinases by their active site chemistries, however, reveals four major classes, each employing different active site amino acids (or cations) to fulfill the electrophile, nucleophile, and proton donor roles.

The serine proteinases are perhaps the most thoroughly characterized class of proteolytic enzymes. Characterized by a catalytic triad of aspartic acid, serine and histidine residues, the trypsin superfamily exhibits a cleavage reaction sequence comprised of a Michaelis complex, followed by a tetrahedral intermediate, an acyl-enzyme, another tetrahedral intermediate and finally an enzyme-product complex (Hofmann, 1985). The serine residue acts as the nucleophile, with the histidine residue receiving the proton from the serine and subsequently donating it to the leaving group; the aspartic acid residue acts to position the histidine, and also provides the correct electrostatic environment for the proton transfer (James, 1980; Steitz et al., 1982). The backbone amino groups of a serine and glycine donate two hydrogen bonds to provide the electrophilic component, in the form of the so-called oxyanion hole (Henderson, 1970). The mammalian digestive enzymes trypsin and chymotrypsin, as well as subtilisin from Bacillus subtilis (Smith et al., 1966), are well known examples of serine proteinases.

Cysteine proteinases, also known as sulfhydryl or thiol proteinases, are comparable to serine proteinases with the exception of a cysteine residue in place of the serine; the catalytic triad is similarly completed by histidine and aspartic acid residues (Hunkapiller et al., 1973). The mechanism of action proceeds via a similar acyl intermediate, but the acidic nature of the active site thiol group eliminates the need for the strong negative charge of the aspartic acid for proton transfer (Hofmann, 1985). Found in plants, mammals and bacteria, examples of cysteine proteinases include papain from papaya (Drenth et al., 1971), actinidin from kiwi fruit (Baker, 1977), mammalian cathepsins A and B (Fersht, 1977), and the clostridial proteinase clostripain (Mitchell and Harrington, 1968).

Aspartyl, or acid, proteinases employ a pair of active site aspartate residues to effect peptide bond cleavage via a non-covalent tetrahedral intermediate (Hofmann, 1985). The best characterized aspartyl proteinase is pepsin, and mechanistic analyses suggest that a proton shared by the aspartate residues provides the electrophilic component, and that the nucleophile is a water molecule activated by one of the aspartyl residues (James et al., 1977, 1981, 1983). The proteinase penicillopepsin from the mold Penicillium janthinellum (Sodek and Hofmann, 1970) and the mammalian proteinases pepsin and renin (Blundell et al., 1983) are examples of aspartyl proteinases.

Metalloproteinases, or neutral proteinases, constitute the fourth major class of proteinases. The involvement of a metal ion, usually zinc, in the active sites of these enzymes sets them apart from the other classes (Fersht, 1977). The amino acids that comprise the active sites of the metalloproteinases typically consist of a pair of histidines, which serve as metal ion ligands, and a glutamic acid, which acts as the nucleophile during cleavage. The primary mechanistic difference between metalloproteinases and other proteinases is the electrophilic role of the bound metal ion (Hofmann, 1985). The proton donor is often a tyrosine or histidine residue. Two of the most thoroughly characterized metalloproteinases are bovine carboxypeptidase A (Hartsuck et al., 1973) and thermolysin (Latt et al., 1969).

In 1995, Rawlings and Barrett published an evolutionary analysis of metalloproteinases based on comparisons of the amino acid sequences of enzymes known at the time (Rawlings and Barrett, 1995). Combined with the earlier metalloproteinase classification schemes proposed by Jiang and Bond (1992) and Hooper (1994), a robust classification system based on the primary structure of the metalloproteinase active site emerged (Figure 1.2). The majority of metalloproteinases known today fall into the large zincin superfamily, which is characterized by the his-glu-xaa-xaa-his (HEXXH) active site sequence. First observed in the three dimensional model of thermolysin (Holmes and Matthews, 1982), the histidine residues of the HEXXH motif are zinc ligands, while the glutamate promotes the nucleophilic attack of a buried water molecule on the carbonyl carbon of the scissile peptide bond (Weaver et al., 1977). Beyond the core HEXXH motif of the zincins, however, the extended active site sequences, as well as the third amino acid residue binding the catalytically essential zinc, are more variable. Two major clans (nomenclature of Rawlings and Barrett), MA and MB, are present within the zincin superfamily. Also known as gluzincins, because of the third zinc-liganding glutamate residue, metalloproteinases of clan MA include the thermolysin and aminopeptidase families of enzymes (Hooper, 1994). Clan MB proteinases, in contrast, are distinguished by two structural features. First, the third amino-acid residue binding the zinc ion is almost always a histidine (with the only exception being the Streptomyces small neutral proteinases, discussed below). Second, a completely conserved methionine residue serves a vital function in the active site structures of these enzymes by maintaining a so-called ‘met-turn’; hence, the proteinases of clan MB have been termed metzincins (Bode et al., 1996). One interesting class of non-zincin metalloproteinases is the insulinases, which have been termed the inverzincin family because of their HXXEH active site motif (Becker and Roth, 1993).

Figure 1.2. Families of metalloproteinases. This diagram represents a classification scheme for metalloproteinases based on the sequences around zinc-binding residues. Italicized letters represent zinc liganding amino acids, while bold letters represent amino acids participating in catalysis. B stands for a bulky, apolar residue, and X for any amino acid. The boxed sequences in tier A are the first two zinc binding amino acids, tier B contains the third zinc-binding residue, and tier C shows the methionine containing region of the metzincins. Numbers on the bars connecting boxed sequences represent typical distances between the motifs. The clan and family designations in parentheses are from the system of Rawlings and Barrett (1992). Family M7 (not shown) falls within clan MB. (Adapted from NM Hooper, 1994)

[Figure 1.2 diagram: zinc metalloproteases divided into zincins (HEXXH) and inverzincins (HXXEH); gluzincins (clan MA) include the thermolysin (M4), neprilysin (M13), ACE, and aminopeptidase (M1) families; metzincins (clan MB) include the astacin (M12), serralysin (M10), reprolysin, and matrixin (M10) families; the insulinase, DD-carboxypeptidase, and carboxypeptidase families are also shown.]

LysR-like Transcriptional Regulators

Proteinases, as well as other enzymes and proteins encoded by microbial genomes, are frequently subject to regulation. Although some gene products are essential for cellular metabolism and are constitutively expressed, most are turned on only when the products they encode are required by the cell. Such regulation of transcription is usually mediated by binding proteins that interact with the DNA to either promote or prohibit transcription of the gene by the RNA polymerase (Adhya, 1990).

Among these proteins, several discrete motifs have been identified which facilitate their specific interaction with DNA sequences. Two of the most common are the zinc-finger domain, first observed in Xenopus transcription factor IIIA (Klug et al., 1987), and the helix-turn-helix motif, originally observed in λ Cro protein (Pabo et al., 1984). In work published in 1988, sequence analysis of a number of transcriptional regulators of prokaryotic origin revealed the presence of a subset sharing common ancestry with the LysR protein of Escherichia coli (Henikoff et al., 1988). Originally comprising only nine proteins, this LysR class has since grown into one of the largest groups of transcriptional regulatory proteins known in bacteria.

The LysR family now contains at least 100 different proteins from diverse prokaryotic genera, and new members are discovered on a regular basis. Acting on diverse genes, the LysR-like proteins usually activate transcription of linked genes between 6- and 200-fold, and compared to the two-component regulatory systems (Stock et al., 1989), the LysR-like transcriptional regulators are likely the most common positive transcriptional control mechanisms in bacteria (Schell, 1993). Many LysR-like proteins require the presence of a small-molecule coinducer to activate transcription of their target genes, but can usually bind to target DNA sequences in the absence of coinducer. It is also common for LysR-like regulator genes to be transcribed divergently from their target gene(s), and often the promoters overlap (Schell, 1993). Moreover, LysR-like regulators often negatively autoregulate their own transcription between 3- and 10-fold, perhaps to maintain constant levels of the regulator protein. Purification of LysR-like regulators revealed that most were active only as multimers. Escherichia coli NhaR and CysB are active only as tetramers (Rahav-Manor et al., 1992; Ostrowski et al., 1987), while E. coli IlvY and P. putida CatR were isolated as dimers (Wek and Hatfield, 1986; Rothmel et al., 1990).

Sequence comparisons of LysR-like regulators reveal several regions of similarity. The most pronounced of these regions is the amino terminal helix-turn-helix domain. Approximately 66 amino acids in length, this domain contains a core sequence between residues 23 and 42 predicted to contain an α-helix-β-turn-α-helix DNA binding motif (Garnier et al., 1978; Dodd and Egan, 1990). Along with approximately 15 flanking residues on either side, this helix-turn-helix motif has been demonstrated by mutagenesis studies to confer the DNA binding properties of the LysR-like proteins (Burn et al., 1989; Schell and Sukordhaman, 1989; Schell et al., 1990).

Beyond the amino terminal region, residues within the 95-173, 196-206, and 227-253 ranges of several different LysR-like proteins exhibit relatedness, albeit at a lower level than the helix-turn-helix region. Numerous studies on LysR-like regulators have shown that these secondary regions of homology are involved in coinducer recognition (McIver et al., 1989; Bartowsky and Normark, 1991; Huang and Schell, 1991). The low level of relatedness is exemplified by the divergence of the coinducer-binding domains of closely related NodD proteins from different Rhizobium species (Appelbaum et al., 1988). In 1997, Tyrrell and coworkers reported the crystallization of a carboxyl-terminal fragment of CysB containing the coinducer (Tyrrell et al., 1997). The structure showed a coinducer binding cleft similar to the -binding domain of Lac repressor, suggesting a structural relationship between the Lac repressor and LysR-like regulatory proteins. Figure 1.3 shows a schematic of LysR-like protein domain organization and the consensus sequence for the amino-terminal DNA-binding region, and Table 1.1 provides a sampling of LysR-like regulators from different bacterial species.

The Streptomyces snp Loci

As part of a study of streptomycete proteinases, Streptomyces sp. strain C5, a mutagenized strain which produces anthracycline antibiotics, was found to produce significant proteolysis of the proteins in Carnation dry milk, but very little hydrolytic activity against azocasein (Gibb et al., 1987; Gibb et al., 1989). Utilizing the low proteinase producer S. lividans 1326 as a host, genomic libraries of Streptomyces sp. strain C5 DNA were screened for milk protein hydrolysis on milk-agar plates, and several clones with large zones of hydrolysis were chosen for further analysis (Figure 1.4). Characterization of the secreted proteolytic activity, as well as of the recombinant DNA harbored by the S. lividans hosts, ensued.

The plasmid conferring the hyperproteolytic phenotype on S. lividans was designated pANT21. Comprising a 6.56 kilobase pair (kbp) SstI fragment inserted into the streptomycete cloning vector pIJ702 (Katz et al., 1983), pANT21 was found to be slightly unstable. Subcloning of the insert eliminated the stability problem and localized the DNA responsible for the phenotype to a 3.31 kbp BamHI-SstI fragment, the insert of plasmid pANT42. The locus contained within the BamHI-SstI fragment was designated snp (for small neutral proteinase) and was partially sequenced. Analysis of the DNA sequence revealed two divergently oriented open reading frames, designated snpA and snpR.

Figure 1.3. Domain organization of LysR-like regulatory proteins. This schematic shows the conserved LysR-like regulatory protein amino-acid regions identified by M. A. Schell (1993). The boxed domains are described in the text, and the numbers denote approximate domain locations. In the expanded consensus amino-terminal segment sequence below the diagram, x is any amino acid, h is a hydrophobic residue, and p is a hydrophilic residue. Positions where amino acid alterations eliminated DNA binding properties are shown in bold.

H2N- [α-helix-β-turn-α-helix DNA binding region: residues 6-66] [coinducer binding and response domains: residues 95-173, 196-206 and 227-253] -CO2H

Consensus amino-terminal segment (contains the helix-turn-helix motif):

hpLRpLRxFxxhxpppphSxAApxLphSQPAhSxQhppLEpxLGxxLFxRxpRxhxxxTxA
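The consensus notation used in Figure 1.3 (x, any residue; h, hydrophobic; p, hydrophilic; capital letters, conserved residues) can be turned into a searchable pattern. The following is a minimal sketch only: the source does not list which residues count as hydrophobic or hydrophilic, so the two residue classes below are assumptions chosen for illustration, and the function name `consensus_to_regex` is hypothetical.

```python
import re

# ASSUMPTION: these residue classes are illustrative; the source defines
# "h" and "p" only as hydrophobic and hydrophilic, without listing residues.
HYDROPHOBIC = "AVLIMFWC"
HYDROPHILIC = "DEKRNQSTHY"


def consensus_to_regex(consensus):
    """Translate a consensus string (x / h / p / literal residues) into a regex."""
    parts = []
    for symbol in consensus:
        if symbol == "x":
            parts.append(".")                   # any amino acid
        elif symbol == "h":
            parts.append(f"[{HYDROPHOBIC}]")    # assumed hydrophobic class
        elif symbol == "p":
            parts.append(f"[{HYDROPHILIC}]")    # assumed hydrophilic class
        else:
            parts.append(re.escape(symbol))     # conserved residue, matched literally
    return re.compile("".join(parts))


# First nine positions of the consensus shown in Figure 1.3.
pattern = consensus_to_regex("hpLRpLRxF")
print(pattern.pattern)
print(bool(pattern.search("MELRHLRYF")))  # made-up peptide, not a real protein
```

A real LysR-like protein will rarely match every position of a consensus, so in practice such a pattern is better used to flag candidates for alignment than as a strict classifier.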

Table 1.1: Properties of selected LysR family members. The protein, source organism, regulated pathway and coinducer (if known) are shown here for selected LysR-like regulatory proteins. Uncharacterized coinducers are indicated by a dash (-). References: a, Renna et al., 1993; b, Campbell et al., 1989; c, Urabe and Ogawara, 1992; d, Neidle et al., 1989; e, Rothmel et al., 1990; f, Sung and Fuchs, 1992; g, Ostrowski et al., 1987; h, Keller, 1990; i, Wek and Hatfield, 1986; j, Stragier et al., 1983; k, Plamann and Stauffer, 1987; l, Renault, 1989; m, Bender, 1991; n, Schell, 1989; o, Mackie, 1986; p, Rahav-Manor et al., 1992; q, von Lintig et al., 1991; r, Appelbaum et al., 1988; s, Rushig et al., 1991; t, Habeeb et al., 1991; u, Parke, 1996; v, Brumbley et al., 1993; w, Kusano and Sugawara, 1993; x, Davison et al., 1992; y, Fellay, 1998; z, van der Meer et al., 1991; aa, Chang et al., 1989.

Member | Origin | Function of regulated target gene | Coinducer | Ref
AlsR | Bacillus subtilis | Acetoin synthesis | acetate or pH | a
AmpR | Rhodobacter capsulatus | β-lactamase | β-lactam or derivative | b
BlaA | Streptomyces spp. | β-lactamase | - | c
CatM | Acinetobacter calcoaceticus | Catechol catabolism | cis,cis-muconate | d
CatR | Pseudomonas putida | Catechol catabolism | cis,cis-muconate | e
CynR | Escherichia coli | Cyanate detoxification | cyanate | f
CysB | Salmonella typhimurium | Cysteine biosynthesis | O- or N-acetylserine | g
DgdR | Pseudomonas cepacia | Dialkylglycine decarboxylase | 2-methylalanine | h
IlvY | Escherichia coli | Isoleucine/valine biosynthesis | acetolactate or acetohydroxybutyrate | i
LysR | Escherichia coli | Lysine biosynthesis | diaminopimelate | j
MetR | S. typhimurium, E. coli | Methionine biosynthesis | homocysteine | k
MleR | Lactococcus lactis | Malolactic enzyme | L-malate | l
Nac | Klebsiella aerogenes | Poor nitrogen source use | none | m
NahR | P. putida NAH7 plasmid | Naphthalene/salicylate catabolism | salicylate | n
NhaR | S. enteritidis, E. coli | Na+-H+ antiporter | Na+ or Li+ | o, p
NocR | A. tumefaciens Ti plasmid | Nopaline catabolism | nopaline | q
NodD | Rhizobium and related species | Nitrogen fixation symbiosis | flavonoids | r, s
OccR | A. tumefaciens Ti plasmid | Octopine catabolism | octopine | t
PcaQ | A. tumefaciens Ti plasmid | Catabolism of phenolics | cis,cis-muconate | u
PhcA | Pseudomonas solanacearum | Virulence factor regulation | - | v
RbcR | C. vinosum, T. ferrooxidans | CO2 fixation | photoautotrophy | w
SdsB | Pseudomonas spp. | Alkyl sulfatase | sodium dodecylsulfate | x
SyrM | Rhizobium meliloti | Exopolysaccharide synthesis and nodulation | - | y
TcbR | Pseudomonas plasmid J51 | Chlorocatechol metabolism | 3-chlorobenzoate | z
TrpI | Pseudomonas species | Tryptophan biosynthesis | indoleglycerol phosphate | aa

Figure 1.4. Milk plate showing various recombinant and non-recombinant Streptomyces colonies. A typical milk plate showing hydrolysis of the milk proteins by secreted proteinases, both recombinant and native. The natural profile of proteolysis by Streptomyces sp. strain C5 (C5) can be compared to that of two non-proteolytic library clones (c1 & c2), a weakly proteolytic clone (pANT11), the strongly proteolytic snpA/R clone (pANT21), and the background proteolysis of S. lividans 1326 (SL1326), the host of the C5 genomic library. Note the darker anthracycline pigment produced by the C5 colony.

The deduced amino acid sequence of the snpA gene contained a histidine-glutamate-x-x-histidine (HEXXH) motif, which is the conserved zinc binding site found in many metalloenzymes (Jongeneel et al., 1989).
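A search for the HEXXH motif in a deduced amino acid sequence can be sketched with a regular expression. This is an illustration under stated assumptions, not the analysis performed in this work: the function name `find_motif` is hypothetical and the test peptide is made up (it is not the SnpA sequence). The same function also accepts the inverted HXXEH motif of the inverzincins.

```python
import re

ZINCIN_MOTIF = "HE..H"       # HEXXH, where X is any residue
INVERZINCIN_MOTIF = "H..EH"  # HXXEH, the inverted motif of the insulinases


def find_motif(protein, motif_regex):
    """Return (1-based position, matched residues) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    lookahead = re.compile(f"(?=({motif_regex}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(protein.upper())]


# Made-up peptide for illustration only.
toy_peptide = "MKTAHEAGHALGLSHSSD"
print(find_motif(toy_peptide, ZINCIN_MOTIF))
print(find_motif(toy_peptide, INVERZINCIN_MOTIF))
```

Positions are reported 1-based to match the residue numbering conventionally used for proteins.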

Purification of the proteolytic activity from culture supernatants revealed a monomeric protein of Mr 15,500 with optimum activity at pH 7.0 and 55°C. The enzyme was not inhibited by traditional serine, aspartyl and cysteine proteinase inhibitors such as diisopropylfluorophosphate (DIFP), N-tosyl-L-phenylalanine chloromethylketone (TPCK), aprotinin or bestatin, but was completely inhibited by 1,10-phenanthroline, indicating the requirement of metal ions for activity (Lampel et al., 1992). Subsequent energy-dispersive X-ray fluorescence spectrometry indicated the presence of about 1 mol of zinc per mol of pure enzyme. Taken together, the properties of the purified enzyme indicated that it was a neutral metalloproteinase. In contrast to the other members of the metalloproteinase family, however, the SnpA proteinase is unusually small. The N-terminal amino acid sequence was determined by Edman degradation on the purified enzyme, yielding the sequence NH2-AAVTVVYNASGAPS, which matches exactly a segment of the deduced primary structure from the nucleotide sequence (Lampel et al., 1992).

Divergently oriented to snpA is the putative regulatory gene snpR. Subcloning experiments demonstrated that the hyperproteolytic phenotype is dependent on the presence of the upstream DNA, and the nucleotide sequence of a 5' portion of the open reading frame (ORF) revealed a deduced amino acid sequence with homology to members of the LysR family of transcriptional regulator proteins (Schell, 1993).

Coincident with the publication of the Streptomyces sp. strain C5 snp locus, two other research groups reported the isolation of a similar locus from S. lividans 66. Designated slpA (Butler et al., 1992) or prt (Lichenstein et al., 1992), the proteinase locus of S. lividans has more than 70% homology to the snp locus of Streptomyces sp. strain C5. With an Mr of 22,000, however, the secreted S. lividans proteinase is significantly larger than the 15.5 kDa mature SnpA (Lampel et al., 1992). Additionally, researchers studying the proteinases of S. coelicolor 'Mueller' reported on their characterization of a DNA fragment conferring a proteolytic phenotype on recombinant hosts, designating it the mpr locus (Dammann et al., 1992). Similar to the S. lividans proteinase, S. coelicolor MprA, produced as recombinant protein in S. lividans, has an approximate molecular mass of 20,000. All three loci share the same divergent orientation of proteinase and putative regulator genes, with between 60 and 70% DNA sequence homology in the coding regions.

Two decades before these reports, researchers at Kyowa Hakko Kogyo Co., Ltd. crystallized a proteinase from crude fermentation broth of S. caespitosus (Yokote and Noguchi, 1969). With an estimated size of around 15,000 Da, the enzyme was determined biochemically to be a neutral metalloproteinase and was designated ScNP. A three-dimensional model of the metalloproteinase α-carbon chain, based on a 2.5 angstrom resolution electron density map, was reported by Harada and coworkers (Harada et al., 1991). Several years later, the complete amino acid sequence was determined by Edman degradation (Harada et al., 1995), and it became obvious that ScNP was a close relative of SnpA, SlpA, and MprA. The primary amino acid sequence of ScNP exhibits significant homology to the deduced mature regions of SnpA, MprA and SlpA. Finally, in 1997 a 1.6 angstrom resolution structure of the crystallized ScNP proteinase was reported (Figure 1.5), providing insight into the structure and function of the proteinase at the atomic level (Kurisu et al., 1997).

Recombinant Gene Expression in Streptomyces

Due to the capacity of Streptomyces species to produce industrially important natural products and enzymes, considerable effort has been devoted to developing systems to allow genetic manipulation of the bacterium. Under the direction of Sir David A. Hopwood, study of streptomycete genetics started at the John Innes Institute (Norwich, UK) in the 1970s. The isolation of the first extrachromosomal elements from Streptomyces coelicolor in 1975 (Schrempf et al., 1975) signaled the beginning of an era that continues today, with the complete genome sequence of the same strain nearly finished (Hopwood, 1999).

There are two general types of cloning vectors used to generate recombinant Streptomyces strains: plasmid based and actinophage based. Almost all streptomycete actinophage vectors are based on the φC31 phage originally isolated from S. coelicolor A3(2) (Lomovskaya et al., 1972). Derivatives of φC31 containing antibiotic resistance genes and promoterless reporter genes (Rodicio et al., 1985; King and Chater, 1986) have seen limited use in Streptomyces cloning. However, the lack of additional actinophage vector development since this first round of derivatives reflects the pervasive use of plasmid based systems for cloning in Streptomyces.

The first Streptomyces plasmid described was a 31 kb covalently closed circular DNA species isolated from S. coelicolor A3(2), designated SCP2 (Schrempf et al., 1975).

Figure 1.5. Ribbon representation of ScNP. S. caespitosus ScNP consists of a highly twisted five-stranded β-sheet and four α-helices. This ribbon model presents a view of the α-helix elements, labeled A through D, and shows the active site residues His83, His87 and Asp93 forming three of the four bonds with a zinc ion; the catalytic residue Glu84 is shown as well. Note the central location of the active site histidine and glutamate residues within the long α-helix C. Although the amino-terminus is blocked in this view, the carboxyl-terminus is visible just below α-helix C.

In crosses with SCP2- strains, SCP2*, a high fertility variant of SCP2, was discovered which gave rise to recombinants at a high frequency (Bibb and Hopwood, 1981). With a low copy number of 1 to 2 per chromosome (Bibb et al., 1977), SCP2* is very stably inherited, and was the vector of choice for early cloning experiments (Bibb et al., 1980; Thompson et al., 1982). The primary limitations of the early SCP2* vectors, however, were the lack of detailed restriction maps and the unavailability of derivatives with selectable genetic markers (Lydiate et al., 1985).

A parallel genetic study of S. lividans 66, a close relative of S. coelicolor, revealed the existence of an autonomous self-transmissible plasmid, termed SLP1, which exists as an integrated element in the chromosome of S. coelicolor A3(2) (Thompson et al., 1982b). Matings of the two Streptomyces species result in excision of the integrated S. coelicolor SLP1 and its transfer to S. lividans, where it replicates as a stable extrachromosomal element with a copy number of 3 to 5 (Thompson et al., 1982b). Since SLP1 undergoes imprecise excision from the S. coelicolor chromosome, a number of different SLP1 variants were recovered from S. lividans (Bibb et al., 1981b). One of the largest, SLP1.2, was further developed into the cloning vectors pIJ41 and pIJ61 by the addition of the aph and tsr antibiotic resistance markers (Thompson et al., 1982). These vectors, along with the SCP2*-based plasmids pIJ922 and pIJ940 (Lydiate et al., 1985) and their derivatives, are the most common low-copy number Streptomyces vectors in use today.

Although the low copy number vectors are useful, two of the most widely used Streptomyces vectors, pIJ486 (Ward et al., 1986) and pIJ702 (Katz et al., 1983), are both derived from the high-copy number plasmid pIJ101 (Kieser et al., 1982). Isolated from S. lividans ISP 5434 as an autonomous and self-transmissible conjugative plasmid, pIJ101 was found to have a copy number of 40 to 300 copies per chromosome (Katz et al., 1983). The prospect of greater gene dosage and easier use led to the generation of numerous pIJ101 derivatives; perhaps one of the most important is pIJ350, which contains the tsr thiostrepton resistance gene of S. azureus (Thompson et al., 1980, 1982a). Plasmid pIJ350 was further modified by the addition of the promoterless aphII neomycin resistance gene of transposon Tn5 (Beck et al., 1982) downstream of a multiple cloning site, generating the promoter probe plasmid pIJ486 (Ward et al., 1986). Separately, the genes melC1 and melC2 were introduced into pIJ350, yielding the differential cloning vector pIJ702 (Katz et al., 1983). Whereas pIJ486 is best suited for study of streptomycete DNA transcriptional properties, pIJ702 is more suitable for shotgun cloning experiments because hosts such as S. lividans will not produce melanin if the mel genes of the plasmid are disrupted with insert DNA. Due to the utility and success of pIJ486 and pIJ702, numerous improved derivatives with additional useful properties have been prepared from these plasmids.

Several noteworthy derivatives of the original SCP2*, SLP1.2 and pIJ101 vectors have been developed. Plasmid pWHM3 combines the pIJ486 replicon and tsr gene with the ColE1 replication origin, β-lactamase gene, and lacZα fragment of pUC19 to make an E. coli-Streptomyces shuttle vector with intact α-complementation (Vara et al., 1989). In contrast, pKC505, a cosmid shuttle vector, is a derivative of SCP2* containing cos sites for packaging by bacteriophage λ proteins, a ColE1 origin, and an apramycin resistance gene with an E. coli-Streptomyces bifunctional promoter (Richardson et al., 1987). Lastly, Mazodier and coworkers have described pIJ101-ColE1 based shuttle vectors with the RK2 (IncP) origin of transfer, which, when propagated in an E. coli host supplying the RP4 (IncP) functions in trans, will undergo cross-genus conjugal transfer to S. lividans (Mazodier et al., 1989).

The discovery and development of Streptomyces genetic tools has led to considerable interest in heterologous gene expression using streptomycete hosts.

Heterologous genes are defined as originating from different Streptomyces species, different Actinomycetales genera, or entirely different kingdoms of life, and there are numerous examples of successful expression studies using all three definitions.

Most heterologous gene expression experiments fall into one of two classes. The first class involves attempts to adjust the cytoplasmic metabolism to desired parameters, such as increased or altered antibiotic production (Hopwood, 1989), through addition of specific heterologous genes (often termed 'metabolic engineering'). An example of such an effort was the production of functional hybrid polyketide synthase proteins by co-expression of type I polyketide synthase subunits from different Streptomyces species in the same recombinant host (Tang et al., 2000). The second class of experiments involves attempts at harnessing the natural protein secretion capacity of Streptomyces to produce and transport recombinant proteins into the fermentation medium (Chang, 1987; Anne and Van Mellaert, 1993; Binnie et al., 1997). A notable example of such an effort was the production and secretion of soluble human T cell receptor CD4 in S. lividans as a fusion construct with the S. longisporus STI-II proteinase inhibitor promoter, ribosome binding site and leader peptide (Fornwald et al., 1993). Table 1.2 provides additional selected examples of heterologous proteins expressed and secreted in S. lividans.

Table 1.2. Selected examples of heterologous proteins produced and secreted in S. lividans. The proteins, their organism of origin, and the promoter and signal peptide sequences utilized for expression and secretion, respectively, are shown along with the reported level of secreted product. NR: not reported. References: a, Paradis et al., 1996; b, Anne et al., 1995; c, Schmitt-John and Engels, 1992; d, Lichenstein et al., 1988; e, Fornwald et al., 1993; f, Chang and Chang, 1988.

Protein | Origin | Promoter | Signal peptide | Production | Reference
β-glucuronidase | E. coli | xlnA | XlnA | 30 mg/l | a
Pertussis toxin S1 | B. pertussis | xlnA | XlnA | 1 mg/l | a
mTNF-α | mouse | S. venezuelae α-amylase | S. venezuelae α-amylase | NR | b
tendamistat | S. tendae | ermE* | native | 500 mg/l | c
IL1-β | human | Streptomyces β-galactosidase | Streptomyces β-galactosidase | 1-5 mg/l | d
CD4 receptor | human | STI-II | STI-II | 300 mg/l | e
TNF-α | human | ermE* | STI-II | 20 mg/l | f

Goals of this study

Four primary research goals formed the basis of the experimental agenda pursued in this work. First, determination of the regulatory relationship between Streptomyces sp. strain C5 snpR and snpA was essential to understanding the role of SnpR in SnpA proteinase expression. Second, precise definition of the transcriptional start sites within the Streptomyces sp. strain C5 snpA-R intergenic region was necessary to develop a model explaining the above-established regulatory relationship. Third, the regulatory model developed for Streptomyces sp. strain C5 snp was to be compared to other published homologous snp loci by comparative sequence analyses, so as to identify functionally important nucleotide sequences conserved among the homologs, if any. Exploration of the potential for development of snp-based heterologous gene expression tools, their development, and their application was the last, and most applied, objective of this research.

CHAPTER 2

GENETIC CHARACTERIZATION OF THE Streptomyces sp. strain C5 snp LOCUS, AND DEVELOPMENT OF AN snp-DERIVED EXPRESSION VECTOR FAMILY

Introduction

The prospect of using Streptomyces as a host for the expression of recombinant proteins has led to significant interest in the regulation and genetics of the many proteinases produced by these bacteria. Considerable effort has been devoted to cloning, sequencing and, in numerous cases, disrupting the genes encoding secreted streptomycete proteinases. Streptomyces lividans 66, a common laboratory strain considered to be the host of choice for heterologous gene expression, is known to secrete a number of proteinases, any one of which might interfere with recombinant protein production and secretion. A combination of biochemical and genetic techniques has led to the characterization of four of these extracellular proteinases. They include a chymotrypsin-like serine proteinase (Binnie et al., 1996), a tripeptidyl aminopeptidase (Krieger et al., 1994; Binnie et al., 1995), a subtilisin BPN'-like proteinase (Butler et al., 1996), and a neutral proteinase (Butler et al., 1992; Lichenstein et al., 1992). Isolation of the genes encoding these and other proteinases enabled the development of a S. lividans strain devoid of endogenous secreted proteinases, engineered specifically for use as a recombinant protein expression host (Cangenus™, Cangene, Inc; U.S. Patent 5,712,127).

The applied benefits stemming from an increased understanding of streptomycete proteolysis have spurred significant research interest in the field. Three similar streptomycete loci, the snp locus of Streptomyces sp. strain C5 (Lampel et al., 1992), the slp (also known as prt) locus of S. lividans 66 (Butler et al., 1992; Lichenstein et al., 1992), and the mpr locus of S. coelicolor 'Müller' (Dammann & Wohlleben, 1992), have all been shown to direct the production of similar extracellular metalloproteinases of unusually small size. The snp, slp and mpr loci are composed of two divergently oriented genes, one encoding the proteinase and the other a transcriptional regulator protein belonging to the LysR family (Henikoff et al., 1988). The snp locus was isolated from a plasmid library of Streptomyces sp. strain C5 genomic DNA by virtue of its ability to confer a hyperproteolytic phenotype on S. lividans. The nucleotide sequence of snpA and the snpA-R intergenic region, as well as the biochemical characterization of the SnpA proteinase, have been published previously (Lampel et al., 1992). This report describes the nucleotide sequence and regulatory role of snpR, as well as the transcriptional properties of the intergenic region. In addition, the development of the locus into a family of expression and secretion vectors, as well as application of the system to the production of extracellular soluble recombinant human endostatin, will be described.

Methods and Materials

Bacterial strains and plasmids. Bacterial strains used in this study are summarized in Table 2.1. Streptomyces lividans TK24 and S. lividans 1326 (Hopwood et al., 1985), host strains for recombinant DNA, were obtained from D. A. Hopwood. In some experiments an alternative host, S. lividans 66 HLP-6 (Butler et al., 1992), a mutant strain with a deleted slp locus, was used as a genetically clean background for regulatory studies. S. lividans HLP-6 was obtained from Cangene, Inc. Escherichia coli DH5α (Life Technologies, Inc., Gaithersburg, MD) and E. coli TOP10 (Invitrogen, San Diego, CA) were used for propagation of E. coli plasmids for restriction analysis, cloning, and sequencing. A table of the plasmids used in this study is provided in Appendix 1.

Strain | Relevant Characteristics/Genotype | Source/Reference
E. coli DH5αF' | F' φ80dlacZΔM15 Δ(lacZYA-argF)U169 endA1 recA1 hsdR17(rK- mK+) deoR thi-1 supE44 λ- gyrA96 relA1 | Life Technologies
E. coli TOP10 | F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG | Invitrogen
E. coli ET12567 | dam, dcm | MacNeil, 1988
S. lividans 1326 | minor milk-protein hydrolysis conferred by slpA locus | D. A. Hopwood
S. lividans TK24 | plasmid-free derivative of S. lividans 1326 | D. A. Hopwood
S. lividans 66 HLP-6 | slpA mutant of S. lividans 1326 | Butler, 1992

Table 2.1. Bacterial strains used in this study

Media and growth conditions. E. coli cultures were prepared using liquid or solid Luria-Bertani medium as described by Maniatis (1982), containing 50 µg/ml ampicillin, 20 µg/ml neomycin, or 50 µg/ml apramycin when necessary. IPTG, X-Gal, and all antibiotics used were purchased from Sigma Chemical Company (St. Louis, MO).

The cultivation and preservation of streptomycetes were conducted as described by Hopwood et al. (1985). Liquid cultures were routinely prepared in YEME medium (Hopwood et al., 1985), or in TSBP-S with the following components (in g per L): Tryptic Soy Broth, 30; yeast extract, 1; dextrose, 5; sucrose, 150. R2YE medium, used as the primary solid growth medium for streptomycete cultures, was prepared as described by Hopwood et al. (1985). An alternative growth medium, MSEM, was also used for cultivation of recombinant streptomycetes. MSEM contains (in g per L): maltose, 20; dextrose, 20; yeast extract, 5; MgSO4·7H2O, 0.2; K2HPO4·3H2O, 0.5; MOPS, 4; 10X trace elements (Hopwood et al., 1985), 0.8 ml. Streptomyces cultures harboring plasmids based on the pIJ101 replicon, such as pIJ486, pIJ702 or pANT849, were grown on solid media containing 50 µg/ml of thiostrepton. Recombinant cultures harboring plasmids with the aphII or aac(3)-IV resistance markers were maintained on media containing 20 µg/ml neomycin or 50 µg/ml apramycin, respectively.
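Because the media above are specified per litre, the amounts needed for a smaller batch follow by simple proportion. The sketch below is illustrative only: it uses the TSBP-S composition stated in the text, while the function name `scale_recipe` and the 250 ml batch volume are assumptions introduced for the example.

```python
# TSBP-S composition in g per litre, as given in the text.
TSBP_S_G_PER_L = {
    "Tryptic Soy Broth": 30,
    "yeast extract": 1,
    "dextrose": 5,
    "sucrose": 150,
}


def scale_recipe(recipe_g_per_l, volume_ml):
    """Return grams of each component needed for a batch of volume_ml millilitres."""
    return {
        component: round(grams * volume_ml / 1000, 3)
        for component, grams in recipe_g_per_l.items()
    }


# Hypothetical 250 ml batch.
for component, grams in scale_recipe(TSBP_S_G_PER_L, 250).items():
    print(f"{component}: {grams} g")
```

The same function applies to the solid components of MSEM; the trace element solution is given in ml rather than g and would be scaled separately.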

DNA and RNA manipulations.

Plasmid purification. Small scale preparations of plasmid DNA were purified from E. coli and Streptomyces by an alkaline lysis/silica gel binding procedure adapted from the method described by Carter and Milton (1993). Briefly, biomass from 1.5 ml of dense liquid culture was resuspended in a buffer of 25 mM Tris-HCl, 10 mM EDTA, pH 8.0. With streptomycete preparations, lysozyme was added to a final concentration of 2.5 mg/ml and the mixture incubated for 30 minutes at 37°C to weaken the cell walls. The cells were then lysed by the addition of a 0.2 N NaOH / 1% SDS solution followed by agitation. The lysate was cleared of most of the cellular debris by precipitation with a 3 M potassium acetate solution (pH 4.8) and centrifugal separation of the solids. An aliquot of silica resin solution composed of 1% (w/v) diatomaceous earth in 4 M guanidine thiocyanate (50 mM Tris-Cl, 50 mM EDTA) was added to the cleared lysate, and the mixture was incubated at room temperature for 2 min, during which time plasmid DNA molecules adsorbed to the silica particles. Passage of the slurry through a syringe fitted with a microfuge tube adaptable mini-column (Promega, Madison, WI) allowed for capture of the plasmid-loaded silica particles, which then were washed with several syringe volumes of column wash solution (200 mM NaCl, 20 mM Trizma base, 5 mM EDTA in 50% ethanol). Finally, the plasmid DNA was eluted from the silica particles by addition of a small volume of TE buffer to the mini-column, followed by centrifugal separation.

Large-scale preparations of plasmid DNA from E. coli were prepared from 500 ml cultures by a scaled up version of the alkaline lysis method of Birnboim and Doly (1979), followed by CsCl/EtBr density gradient centrifugation. Water-saturated butanol was used to extract residual EtBr from plasmid preparations, and ethanol precipitation permitted concentration of the plasmid DNA.

RNA purification. TRIZOL (Life Technologies, Inc.) was used for all streptomycete RNA preparations as per the manufacturer's instructions, with the following modifications. Lysozyme pre-treatment of the mycelia greatly enhanced RNA recovery, and phenol:CHCl3 extraction after the final precipitation of the TRIZOL protocol was found to be necessary if further enzymatic manipulation of the RNA was planned.

DNA sequencing and analysis.

Radioisotopic sequencing. Manual DNA sequence determination was performed using the dideoxynucleoside termination procedure of Sanger et al. (1977). Sequenase,

purchased from U.S. Biochemical Corp. (Cleveland, Ohio), and [α-thio-35S]dATP (>1,000 Ci/mmol; Amersham Corp., Arlington Heights, Ill.) were used according to the manufacturer’s directions in reactions typically containing 1.0 µg of mini-prep plasmid

DNA as the template. Vertical gel electrophoresis, gel handling, and autoradiographic visualization methods were performed essentially as described by Maniatis (1982), and the DNA sequence information was manually entered into a PC workstation for digital storage and manipulation.

Automated Sequencing. Automated DNA sequence determination was performed with an ABI 377 automated DNA sequencing system (Perkin Elmer Applied Biosystems, Foster City, CA) using the enzyme and reagents provided by the same manufacturer. The sequencing reactions were set up and performed in a PE9600 thermal cycler (Perkin Elmer) as indicated in the manufacturer’s instructions, and the reactions were loaded onto the ABI 377 vertical gel system for separation. The differential fluorescence of the four terminator nucleosides was recorded by the instrument and processed by the software into an electropherogram representing the sequence of the template. Based on the quality of the electropherogram, the reliable region (usually between 400 and 600 nucleotides) of the associated sequence was excised and stored for later digital manipulation.

Sequence management and analysis. All bioinformatic and network interface programs were operated on a Pentium processor-based IBM PC compatible workstation running

Microsoft Windows operating systems. Clone Manager 5 (Scientific & Educational

Software, Inc., State Line, PA) was used for routine management of DNA sequence files, including such manipulations as open reading frame (ORF) searches, the generation of predicted restriction profiles, planning cloning experiments and preparing printouts. The

DNA and deduced amino acid sequences were analyzed with software from Genetics Computer Group, University of Wisconsin, Madison, WI (Devereux et al., 1984) and the suite of analysis tools available on the NCBI web site (http://www.ncbi.nlm.nih.gov/).

Multiple alignments were generated using the CLUSTALW program running on the Pôle

Bio-Informatique Lyonnais Network Protein Sequence Analysis server (PBIL-NPSA,

Lyon, France; http://pbil.ibcp.fr/NPSA) and were viewed and manipulated with BioEdit

4.74 (Hall, 1999). BOXSHADE output was generated using the European Molecular

Biology network (EMBnet, Swiss node; http://www.ch.embnet.org/). Helix-turn-helix analyses based on the method of Dodd and Egan (1990) were conducted using the PBIL-

NPSA server.

Cloning procedures and plasmid construction. Protocols for the preparation of chemically competent E. coli, and their transformation and regeneration, were carried out as described by Maniatis (1982), or with pre-made competent E. coli preparations purchased from Life Technologies, Inc. (Gaithersburg, MD). The procedures used for the preparation, transformation and regeneration of Streptomyces protoplasts were performed as described by Hopwood (1985).

Restriction endonucleases, DNA modifying enzymes and T4 DNA ligase were purchased from Life Technologies, Inc. or New England Biolabs, Inc. (Beverly, MA).

DNA fragments for ligations were purified from gel slices either by the phenol freeze-fracture technique or by an adapted version of the above-described silica-binding mini-prep procedure. The phenol freeze-fracture method involved mincing of the gel slice containing the band of interest, the addition of an aliquot of buffered phenol, three freeze-thaw cycles at -80°C and 37°C (5 minutes at each temperature), removal of the aqueous phase, and ethanol precipitation of the DNA. The yield from this procedure typically ranged between 70 and 80%. The alternative silica-binding technique involved incubating the minced gel slice in 6 M sodium iodide until completely dissolved, followed by addition of silica resin and processing as described above for the plasmid mini-preparation procedure.

When dsDNA oligonucleotide linkers were used in plasmid construction, complementary oligonucleotides (Life Technologies, Inc.) were designed to yield small dsDNA fragments with the desired 5’ or 3’ overhangs. The separate ssDNA linkers were mixed in equimolar quantities in a buffer containing 10 mM Tris chloride pH 8.0, 10 mM

MgCl2, heated to 99°C for 10 minutes, and allowed to cool slowly to room temperature before addition to the linearized vector fragment for ligation.
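The linker design described above can be checked computationally before the oligonucleotides are ordered. The following is a minimal sketch, not the procedure used in this work: the function names are hypothetical, and it assumes that both single-stranded ends of the annealed linker are 5' overhangs (as for the RESTO1/2 pair in Table 2.2). Linkers carrying 3' overhangs would need the mirror-image comparison.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def annealed_linker(top, bottom):
    """Describe the duplex formed by two oligonucleotides written 5'->3'.

    Assumption: both unpaired ends are 5' overhangs, so the 3' end of the
    top strand pairs with the 3' end of the bottom strand.
    """
    bottom_rc = reverse_complement(bottom)
    # Look for the longest suffix of the top strand that matches the start
    # of the reverse-complemented bottom strand.
    for size in range(min(len(top), len(bottom_rc)), 0, -1):
        if top[-size:] == bottom_rc[:size]:
            return {
                "duplex": top[-size:],
                "top_5_overhang": top[:-size],
                "bottom_5_overhang": bottom[:len(bottom) - size],
            }
    raise ValueError("no complementary region found")


# RESTO1 and RESTO2 as listed in Table 2.2
linker = annealed_linker("CTAGTCGCGAA", "CGCGTTCGCGA")
print(linker)
```

For this pair the unpaired ends are CTAG and CGCG, which are the overhangs expected for ligation into SpeI- and MluI-cut DNA.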

Appendix 1 lists the plasmids used in this study and indicates how they were constructed. A number of plasmids were generated using dsDNA linkers. Table 2.2 shows the individual oligonucleotides synthesized for this purpose.

Construction of general purpose cloning vectors. The three modified versions of pUC19 generated in this study were constructed as follows. Plasmid pANT840 was prepared by ligating the XbaI-cohesive dsDNA linker CLDAVJ1/2 into XbaI-digested pUC19 and screening for white colonies on plates containing IPTG and X-Gal. Plasmid pANT841, with restored alpha complementation, was prepared by ligating the RESTO1/2 linker into MluI-SpeI-digested pANT840 and screening for blue colonies on plates containing

IPTG and X-Gal, as described by Maniatis (1982).

Table 2.2. DNA oligonucleotides used for linker preparation. Sequentially numbered oligonucleotides were combined to form dsDNA linkers, generating the plasmids indicated. Cohesive end sites in parentheses indicate that the annealed dsDNA linker contained compatible ends, but destroys the receiving site. Plasmid pANT846 was generated in two steps (see text).

| Oligonucleotide | Sequence (5' to 3') | Relevant plasmids | Cohesive ends |
|---|---|---|---|
| CLDAVJ1 | CTAGATCTCGAGGCCTGATCATCGATGGGCCCATGGCTAGCGGCCGCACGCGTCGCGACTAGT | pANT840 | XbaI |
| CLDAVJ2 | CTAGACTAGTCGCGACGCGTGCGGCCGCTAGCCATGGGCCCATCGATGATCAGGCCTCGAGAT | pANT840 | XbaI |
| RESTO1 | CTAGTCGCGAA | pANT841 | MluI-SpeI |
| RESTO2 | CGCGTTCGCGA | pANT841 | MluI-SpeI |
| RESTO7 | TCGACATTAATTAATTTAAATCCGTTTAAACGGCGCGCCGGCCGGCCGCGATCGCGCACTAGTCGCGAA | pANT846 | SalI-MluI |
| RESTO8 | CGCGTTCGCGACTAGTGCGCGATCGCGGCCGGCCGGCGCGCCGTTTAAACGGATTTAAATTAATTAATG | pANT846 | SalI-MluI |
| RESTO7p3 | GATCCTTGCGCGCCTAGGCGGCCGTGGCCAGCCCGGGCCTTGGTAC | pANT846 | BamHI-KpnI |
| RESTO8p4 | CAAGGCCCGGGCTGGCCACGGCCGCCTAGGCGCGCAAG | pANT846 | BamHI-KpnI |
| CLDAVJ3 | CGCGTTAACTAGTTTAAAGCTTGAGCTCTAGATCTGAATTCGCATG | pANT849 | SphI-MluI |
| CLDAVJ4 | CGAATTCAGATCTAGAGCTCAAGCTTTAAACTAGTTAA | pANT849 | SphI-MluI |
| CLDTERM1 | CTAGAGCTCTGCAGGATCCAAGCTTACTAGTGGTACCACGCGTGAATAGATAGAGTGCAGGGGCCCCGACCGTGTCGGGGCCCCTGCACGGGTCCATGG | pANT1200, pANT1201, pANT1202 | XbaI-(HindIII) |
| CLDTERM2 | AGCTCCATGGACCCGTGCAGGGGCCCCGACACGGTCGGGGCCCCTGCACTCTATCTATTCACGCGTGGTACCACTAGTAAGCTTGGATCCTGCAGAGCT | pANT1200, pANT1201, pANT1202 | XbaI-(HindIII) |
| VAAUP1 | CGATGCGCAGAAAGACCGTGGCAGCTGCACTCGCCCTTGTGGCGGGAGCCGCTGTGGCCGTCACGGGCAACGCCCCGGCGCAGGCCGTCCCGCCCG | pANT3021 | ClaI-BamHI |
| VAAUP2 | GATCCGGGCGGGACGGCCTGCGCCGGGGCGTTGCCCGTGACGGCCACAGCGGCTCCCGCCACAAGGGCGAGTGCAGCTGCCACGGTCTTTCTGCGCAT | pANT3021 | ClaI-BamHI |
| VAA1 | CGATGGCCAGAAAGACCGTGGCAGCTGCACTCGCCCTTGTGGCGGGAGCCGCTGTGGCCGTCACGGGCAACGCCCCGGCGCAGGCCGTCCCGCCCG | pANT3022 | ClaI-BamHI |
| VAA2 | GATCCGGGCGGGACGGCCTGCGCCGGGGCGTTGCCCGTGACGGCCACAGCGGCTCCCGCCACAAGGGCGAGTGCAGCTGCCACGGTCTTTCTGGCCAT | pANT3022 | ClaI-BamHI |
| VSIUP1 | CGATGCGTCGCACCCTCCAGGCCGTGGGAGCAGCCGCGGCGGCGGCCACCTGCGTCCTCGCCGCGACGGCAGGCACCGCGCAGGCCGAGGCCCCCG | pANT3023 | ClaI-BamHI |
| VSIUP2 | GATCCGGGGGCCTCGGCCTGCGCGGTGCCTGCCGTCGCGGCGAGGACGCAGGTGGCCGCCGCCGCGGCTGCTCCCACGGCCTGGAGGGTGCGACGCAT | pANT3023 | ClaI-BamHI |
| VSI1 | CGATGCGTCGCACCCTCAAGGCCGTGGGAGCAGCCGCGGCGGCGGCCACCTGCGTCCTCGCCGCGACGGCAGGCACCGCGCAGGCCGAGGCCCCCG | pANT3024 | ClaI-BamHI |
| VSI2 | GATCCGGGGGCCTCGGCCTGCGCGGTGCCTGCCGTCGCGGCGAGGACGCAGGTGGCCGCCGCCGCGGCTGCTCCCACGGCCTTGAGGGTGCGACGCAT | pANT3024 | ClaI-BamHI |
| SNP1 | CGATGCGCATGCCCCTGTCCGTTCTCACCGCCGCCGGACTGAGCCTGGCGACCCTCGGTCTCGGCACCGCCGGTCCGGCCTCGGCGACCCCCACCG | pANT3025 | ClaI-BamHI |
| SNP2 | GATCCGGTGGGGGTCGCCGAGGCCGGACCGGCGGTGCCGAGACCGAGGGTCGCCAGGCTCAGTCCGGCGGCGGTGAGAACGGACAGGGGCATGCGCAT | pANT3025 | ClaI-BamHI |
| PEL1 | CGATGATTCCACAACGCACAGCACTGATTTCCGCCGCCGTGCTGGTCCTCGGCGCGGTGTCGATTCCCCAGGCGACCGCCGCACCGTTCG | pANT3026 | ClaI-BamHI |
| PEL2 | GATCCGAACGGTGCGGCGGTCGCCTGGGGAATCGACACCGCGCCGAGGACCAGCACGGCGGCGGAAATCAGTGCTGTGCGTTGTGGAATCAT | pANT3026 | ClaI-BamHI |

Plasmid pANT846 was generated in a similar fashion, except two separate linkers, RESTO7/8 and RESTO7p3/8p4, were ligated to pANT840 in a two-step process. The structures of the multiple cloning site regions of plasmids pANT840, pANT841, and pANT846 were verified by DNA sequencing.

Construction of Streptomyces expression vectors. The first streptomycete expression vector generated in this work, pANT849, was prepared by ligating the CLDAVJ3/4 linker into SphI-MluI-digested pANT842. Addition of the pUC19 replicon and bla marker generated the E. coli-Streptomyces shuttle vector pANT857, and further addition of the XbaI-HindIII linker TERM1/2 containing the bidirectional mmr terminator (Neal and Chater, 1991) and additional restriction enzyme sites gave the thiostrepton/ampicillin-selectable plasmid pANT1200. In order to eliminate the BamHI site in the pIJ101 rep gene, the KpnI-NotI fragment of pANT849 containing the rep gene of pIJ101 was replaced with a mutagenized version of the gene from pIJ486, devoid of the BamHI site, generating pANT866. The neomycin and apramycin selectable vectors described below were generated from pANT866.

The acc(3)-IV gene (Brau et al., 1984), conferring resistance to apramycin, was isolated from pKC505 (Richardson et al., 1987) as an SstI fragment and ligated with similarly cut pANT841, giving pANT880. The bifunctional E. coli-Streptomyces Pacc(3)-IV promoter and gene, contained in pANT883, was used in the construction of both the neomycin and apramycin selectable vectors. The aphII gene of pIJ486 was excised from pANT806 and inserted into pANT883, replacing the apramycin marker and placing the gene under P

pANT886 were sequentially destroyed by restriction and blunting with Klenow polymerase, yielding pANT889 and pANT890, respectively. Plasmid pANT890 was then used as the donor of the E. coli replicon and bifunctional aphII marker to pANT866, giving pANT894, to which the TERM1/2 linker was then added, yielding the neomycin-selectable shuttle vector pANT1201. Generation of the apramycin-selectable vector involved destruction of the pANT883 EcoRI, HindIII, SphI, and PstI restriction sites, yielding pANT893, addition of the pANT894 replicon, giving plasmid pANT895, and finally the addition of the TERM1/2 linker to generate the apramycin-selectable shuttle vector pANT1202. The structures of the MCS regions of pANT849, pANT1200, pANT1201 and pANT1202 were confirmed by DNA sequencing.

Construction of reporter gene plasmids. The aphII gene, originally isolated from transposon Tn5 (Beck et al., 1982), was excised from pIJ486 as an EcoRI-HindIII fragment and ligated into pUC19, generating pANT806, which was subsequently used as a donor of the reporter gene to pANT849 as an EcoRI-HindIII fragment, yielding pANT852. Plasmid pANT853, which contains a deletion of the snpR coding region between the SauI and StuI sites in the snpR-aphII cassette, provided an snpR-minus construct. Ligation of the CLDAVJ3/4 dsDNA linker into SphI-MluI-digested pIJ702, positioning the polylinker downstream of the melC1 promoter (Bernan et al., 1985), generated plasmid pANT855. Plasmid pANT856 was prepared by ligating the aphII gene from pANT806 into EcoRI-HindIII-digested pANT855. To compare PsnpA with an established highly active promoter, the endogenous EcoRI, SacI, and KpnI restriction sites of plasmid pIJ4070, which contains the mutant ermE promoter (termed PermE*; Bibb and Janssen, 1986) on a BglII fragment, were deleted, generating pANT824, and joined

with the SphI-PstI fragment of pIJ702 to give the shuttle plasmid pANT825. Addition of the CLDAVJ3/4 linker to pANT825 generated pANT826, with the multiple cloning site downstream of PermE*. The aphII cassette of pANT806 was then added to pANT826 as an EcoRI-HindIII fragment, placing the reporter gene under transcriptional control of the mutant promoter. For a shuttle plasmid variant of pANT852, plasmid pANT868 was generated by ligation of the pANT806 EcoRI-HindIII aphII fragment into pANT857.

Construction of Streptomyces expression-secretion vectors. Six synthetic dsDNA linkers were prepared in such a way that ligation of the linkers into ClaI-BamHI-digested pANT1201 would generate a set of plasmids with open reading frames encoding different streptomycete signal peptides fused to the ATG start codon of the snpA gene. The signal peptides chosen were from the S. venezuelae α-amylase gene (VAA; Virolle et al., 1988) and a mutant thereof (VAAUP; Lammertyn et al., 1998), the S. venezuelae subtilisin inhibitor (VSI; Van Mellaert, 1998) and a mutant thereof (VSIUP; Lammertyn, 1997), the Amycolata sp. pel pectate lyase gene (PEL; Bruhlmann and Keen, 1997) and the native snpA gene of Streptomyces sp. strain C5 (SNP; Lampel et al., 1992). Each completed vector contained a unique in-frame BamHI site just downstream of the encoded cleavage site of the leader peptide. A glycine codon (GGA), followed by a serine codon (TCC), comprised this BamHI endonuclease recognition site.

The signal peptide fusions were verified by DNA sequencing.

Construction of human endostatin secretion plasmids. The human endostatin cDNA construct pMALc-H#15, with a BamHI site engineered just upstream of the encoded functional amino-terminus of endostatin (Saarela et al., 1998), was provided by Dr. Judy Boice at Merck & Co., Inc. for in-frame ligation into the glycine-serine-encoding BamHI site (GS-BamHI) of the secretion vectors. Neomycin-selectable shuttle plasmids pANT3032 and pANT3035 were generated by ligating the BamHI-HindIII fragment of pMALc-H#15 into similarly cut pANT3022 and pANT3025, respectively. Thiostrepton/ampicillin-selectable analogs of pANT3032 and pANT3035 were generated by addition of the snpR-leader-endostatin cassettes to KpnI-cut pANT826, giving plasmids pANT3042 and pANT3045, respectively. Lastly, to allow comparison of different vector replicons, the pANT3035 KpnI fragment containing the snpR-VAA-endostatin cassette was ligated into the Streptomyces vector pIJ303, which contains the second strand origin (sti) from pIJ101 (Kieser et al., 1982), generating pANT3052.

Reporter protein quantification. The reporter protein produced by the various recombinant S. lividans cultures was measured using an NptII ELISA kit from 5 Prime 3 Prime, Inc. (Boulder, CO). This sensitive sandwich immunoassay system employs a polyclonal anti-NptII coating antibody to capture NptII protein in the sample, a monoclonal anti-NptII primary antibody, and a biotinylated secondary antibody reagent for detection of the bound reporter protein. Incubation with streptavidin-conjugated alkaline phosphatase and addition of the chromogenic substrate p-nitrophenyl phosphate permitted spectrophotometric quantification of bound NptII by measurement of released p-nitrophenol (λ = 405 nm). A standard curve generated on each plate with pure NptII indicated the linear range of the assay.

Cell lysates were prepared by resuspending 2 ml of culture biomass in 350 µl of TEP buffer (10 mM Tris-Cl, 1 mM EDTA, 1.0 mM PMSF), followed by addition of one equivalent (w/v) of 0.1 mm glass beads. The samples were disrupted in a Mini-Bead Beater (Biospec Products, Bartlesville, OK) with four 30 second pulses at 5000 rpm, with cooling of the samples on ice for 2 minutes between pulses. After centrifugal separation of the solids at 14,000 x g, cell lysate protein concentrations were determined using a dye-binding kit from Bio-Rad (Hercules, CA), and equivalent amounts of protein were loaded into the ELISA plates, allowing the quantified reporter protein to be normalized to total lysate protein. For each sample assayed, a series of dilutions was tested to ensure results within the linear range of the ELISA. Lysates prepared from cultures expected to be low producers were typically diluted 1:2 through eight wells from a starting concentration of 200 µg/ml, while those lysates from suspected high producers of NptII were diluted 1:3 through eight wells.
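The two dilution schemes can be laid out numerically as follows. This is an illustrative sketch only: the function name is hypothetical, the first well is assumed to hold the undiluted starting concentration, and the 200 µg/ml starting value is assumed to apply to the 1:3 series as well, which the text does not state.

```python
def dilution_series(start_ug_per_ml, fold, wells=8):
    """Total lysate protein concentration in each well of a serial dilution.

    Assumption: well 1 holds the starting concentration and every later
    well is a `fold`-fold dilution of the well before it.
    """
    return [start_ug_per_ml / fold ** well for well in range(wells)]


low_producers = dilution_series(200, 2)   # 1:2 series
high_producers = dilution_series(200, 3)  # 1:3 series, start value assumed

for label, series in (("1:2", low_producers), ("1:3", high_producers)):
    print(label, [round(conc, 3) for conc in series])
```

Under these assumptions the 1:2 series spans 200 to about 1.6 µg/ml and the 1:3 series spans 200 to about 0.09 µg/ml, so the steeper series covers a much wider range of reporter concentrations in the same eight wells.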

Primer extension analyses. The transcriptional start points of the snpA and snpR genes

were determined by a non-radioactive primer-extension method exploiting the

capabilities of the ABI 377 automated sequencer. The procedure involved the use of a 5’

fluorescein modified oligonucleotide corresponding to the antisense strand of the gene

being studied, approximately 50-100 bp from the expected start site. This primer was

used in a reverse transcriptase reaction yielding a 5’ fluoresceinylated ssDNA fragment

with a 3' terminus corresponding to the 5’ terminus of the cognate mRNA molecule.

Using a size standard of dideoxy-terminator sequencing reactions generated with the same primer, the extension reactions were separated by electrophoresis with an ABI 377 instrument. The data acquisition program coupled to the ABI 377 generated a gel image similar in appearance to autoradiographs generated using traditional methods. Using

Adobe Photoshop 5.0 (Adobe Systems Inc., San José, CA), the raw image files generated by the instrument (in TIFF format) were cropped to remove extraneous content, converted to grayscale, inverted to give dark bands on a light background, and finally adjusted for brightness and contrast.

For both transcripts, RNA was prepared from late logarithmic phase cultures of S. lividans TK24 (pANT842) grown in TSBP-S using the Trizol method described above, and evaluated for quality both by spectrophotometry and electrophoresis. The fluoresceinylated primer 5’FGACGACCGGCGCGCCCTCAGCG3’ was used to map the snpA transcription start point, and 5’FGTGTTCGATGCGCCGCAGCTGCGTG3’ was used to map the snpR transcription start point. Each primer extension reaction contained 1.0 µg of total RNA, 200 µM of each dNTP, 1.0 mM MnCl2, and 1.0 pmol of the fluorescein-labeled primer in 1X rTth buffer, and was started with the addition of 5 U of rTth thermostable reverse transcriptase (Perkin Elmer, Foster City, CA). The reactions were incubated at 60°C for

70 min, after which they were ethanol precipitated, resuspended in automated sequencer template suppression reagent, heat denatured, and held on ice until loaded. Dideoxy- sequencing ladders were generated using the AmpliCycle sequencing kit (Perkin Elmer) as described by the manufacturer, the template plasmid DNA was desalted using Centri-

Sep spin columns (Princeton Separations, Princeton, NJ), and the fluoresceinylated oligonucleotide from the accompanying extension reaction was used to prime DNA synthesis.

Protein Electrophoresis and Blotting. Utilizing a Mini-Protean II vertical electrophoresis apparatus (Bio-Rad), protein samples were separated by SDS-PAGE with 3% (w/v) stacking and 12% (w/v) resolving gels as described by Laemmli (1970), or with pre-cast 12% (w/v) Tris-Cl gels (Bio-Rad), running in buffer containing SDS. For western blots, proteins were transferred to PVDF membranes using a Mini Trans-Blot cell purchased with the Mini-Protean tank system (Bio-Rad). The transfer parameters were 100 V constant voltage for 1 hour at 4°C.

Immunodetection Procedures. Membranes containing transferred SDS-PAGE profiles of protein samples were first blocked with a solution of 0.01% (w/v) NFDM in Tris-buffered saline (TBS) for 1 hour at room temperature, followed by three washes with TBS-0.1% (v/v) Tween-20 (TTBS). The primary antibody used for detection of recombinant endostatin was provided by Dr. Judy Boice at Merck & Co., Inc., and was used at a 1:20,000 dilution for western blots. Primary antibody, in a solution of TTBS-0.05% NFDM, was incubated with the membrane overnight at 4°C with gentle agitation.

Two-hour incubations at room temperature were also used, but gave weaker signal intensities. The secondary antibody was alkaline phosphatase-conjugated goat anti-rabbit

IgG (Bio-Rad) and was diluted to 1:6000 in TTBS-0.05% NFDM before use. The membrane was washed three times with TTBS, and incubated with the secondary antibody solution for 1 hour at room temperature, washed again, and then developed.

The membrane was soaked in Immun-Star™ chemiluminescent alkaline-phosphatase substrate (Bio-Rad) for 5 minutes, after which it was heat-sealed in a plastic pouch and placed atop a sheet of X-OMAT X-ray film (Eastman Kodak Company, Rochester, NY).

After exposure for periods of 0.5 to 2 min, the film was removed and developed with a

Picker-processor automatic film developer (Marconi Medical Systems, Cleveland, OH).

Purified and insoluble recombinant human endostatin, provided by Dr. Judy Boice at Merck & Co., Inc., was also used as a standard in SDS-PAGE and blots when necessary.

Semi-quantitative western blot analysis. To obtain an approximate measurement of recombinant endostatin produced by S. lividans (pANT3052), a western blot was performed with a set of lanes containing known amounts of E. coli endostatin standard next to lanes containing the recombinant Streptomyces fermentation broth samples to be quantified. After development of the blot, the resulting film was converted into a digital

(TIFF format) image file using a flatbed scanner and Adobe Photoshop 5 LE.

Densitometric analysis was conducted on the resulting image file using Image Pro Plus

4.1.0.0 (Media Cybernetics, L.P., Silver Spring, MD). To generate a standard curve, the area and mean pixel density values were determined for the relevant bands, and the products of these values were plotted as a function of the endostatin concentration in each lane. The approximate concentration of endostatin in the Streptomyces fermentation broth sample was calculated with a regression formula, generated from the standard curve, and the density product of the darkest Streptomyces broth sample band.
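The standard-curve calculation can be sketched as below. The text does not state the form of the regression, so an ordinary least-squares straight line is assumed here; the function names are hypothetical and the band measurements are invented placeholder values chosen only to show the arithmetic, not data from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def density_product(area, mean_density):
    """Band area multiplied by mean pixel density, as used for the curve."""
    return area * mean_density


# Placeholder standards: endostatin per lane (ng) and (area, mean density)
standard_ng = [25, 50, 100, 200]
standard_bands = [(100, 30), (100, 55), (100, 105), (100, 205)]
products = [density_product(a, d) for a, d in standard_bands]

slope, intercept = fit_line(standard_ng, products)

# Invert the fitted line to estimate the amount in an unknown band
unknown = density_product(100, 130)
estimated_ng = (unknown - intercept) / slope
print(round(estimated_ng, 1))
```

Because the estimate is read off an inverted calibration line, it is only meaningful for bands whose density product falls inside the range spanned by the standards.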

Results

Sequence and analysis of snpR. With the sequence of the extreme 5’ region of snpR serving as a point of reference (Lampel et al., 1992), the remaining insert DNA of the

Streptomyces sp. strain C5 snpA/R subclone plasmid pANT54 was sequenced in both directions, using both radioisotopic and non-radioactive techniques. The nucleotide sequence of snpR is provided in Figure 2.01.

  1  ATGGAGCTTG AGGTCAGGCA CCTCAGGGCG CTGTGCGCCA TCGCCGACAC CGGCAGCCTG
     melevrhlralcaiadtgsl
 61  CACCGCGCGG CACGCCAACT GGGAGTGACA CAGCCCTCGT TGAGCACGCA GCTGCGGCGC
     hraarqlgvtqpslstqlrr
121  ATCGAACACG AGCTGGGCGG TGCCCTGTTC GTCCGGGCCC GCACCGGCTG CCGCCCCACA
     iehelggalfvrartgcrpt
181  CCGCTGGGCC GGCTGGTGCT CAGTCGTGCC CGCCCCCTGG TGGCCGAATT GCGCTCCCTC
     plgrlvlsrarplvaelrsl
241  GTCAGCGAGG CCCGCGCCGC CGCCGTCGGC GGACGCCAGC TGCGCGTCGG CTCCACGGCC
     vsearaaavggrqlrvgsta
301  AGCCGGGCCC TGGCGGGCTG GCTGCGCCGG CTCCGCCGGC ACTGGCAGGA ACCCACCCTG
     sralagwlrrlrrhwqeptl
361  CACATGGACG TCTCCGCCAA CGCCCTGCTG CGCATGGTGG CCGACGGCCA CCTCGACGTC
     hmdvsanallrmvadghldv
421  GCCTTCGTGC ACGAGGTCGA GGGCAGCCCG CTGCGCGTCC CCGAAGGGCT CCGGGTCCGC
     afvhevegsplrvpeglrvr
481  GTACTGGTCC AGCGGGAACC GCAGTTCGTC TGCCTGCCGG CCGACCACCC GGCCGCGGCG
     vlvqrepqfvclpadhpaaa
541  AAGCCGTCGT ACGCGTCGCG GACCTGGCCC ACGACCCGCT GGATGATCGA CCCCACCGTC
     kpsyasrtwpttrwmidptv
601  GACGGCGAGT GGGACGCGGT GCGCCGGGTC CTGCGCGCCG AGGGACTCGA CCCGCGCATC
     dgewdavrrvlraegldpri
661  CTGCACGGGG ACTACCACAC CGCCGCGTCC CTGGTCGCCA CCGGCGAGGT CGTCACCGTG
     lhgdyhtaaslvatgevvtv
721  GTCCAGCCGA CCTCGCCCTC CCGCGCCGAG ACGGCCGTCC GCCGGCTGCA CGGCGACCCG
     vqptspsraetavrrlhgdp
781  CTCGGCGTAC GGCTGGTGCT GGCGGCCCGC ACGGACACGG AACTGGAGGG CGTCTACCCC
     lgvrlvlaartdtelegvyp
841  GACCTCGCGG AGGCCTACGG GGAGGTCGCC CGGCAGGCCC CGGCGTACCG GGAGTGGCTG
     dlaeaygevarqapayrewl
901  GAACGCAGTG GGTCGGGGGC ACTCGTCCCA GCCCTCCCGT GA
     ersgsgalvpalp

Features marked in the original figure: two PvuII sites (CAGCTG), one SacII site (CCGCGG), and the helix-turn-helix motif.

Figure 2.01. Nucleotide and deduced amino-acid sequence of the Streptomyces sp. strain C5 snpR gene.

As expected, the short partial open reading frame determined earlier in fact encodes the amino terminus of a deduced protein with properties consistent with the LysR class of transcriptional regulatory proteins. The complete snpR gene is 942 bp in length, and has an overall G+C content of 75 mol%. With a calculated molecular mass of 34,258 Da, the encoded SnpR protein is 313 residues in length and has a calculated isoelectric point of 9.88. Analysis of the SnpR amino acid sequence with the algorithm of Dodd and Egan (1990) indicated a strong helix-turn-helix motif (score of 5.09) near the amino terminus, implicating SnpR as a DNA binding protein.
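The relationship between the gene length and the protein length quoted above, and the way a G+C figure is obtained, can be illustrated with a short sketch. The function names are hypothetical, and the example string is only the first 60 nucleotides of snpR as printed in Figure 2.01, so its G+C value is not the whole-gene figure.

```python
def encoded_residues(orf_length_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return orf_length_bp // 3 - 1  # the stop codon encodes no residue


def gc_percent(seq):
    """G+C content of a nucleotide sequence, in mol%."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


snpR_residues = encoded_residues(942)
first_60_nt = "ATGGAGCTTGAGGTCAGGCACCTCAGGGCGCTGTGCGCCATCGCCGACACCGGCAGCCTG"
print(snpR_residues, round(gc_percent(first_60_nt), 1))
```

A 942 bp open reading frame holds 314 codons, one of which is the stop codon, which agrees with the 313 residues reported for SnpR.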

Comparison of Streptomyces snpR with the homologous genes from S. lividans and S. coelicolor revealed 80% nucleotide sequence identity between snpR and S. lividans slpR, and 70% identity between snpR and S. coelicolor mprR. When the encoded proteins were compared, SnpR showed 63% identity with S. lividans SlpR, and

59% identity with S. coelicolor MprR, while showing only 17% identity with E. coli

LysR. Aligned as a group, the three streptomycete proteins showed 44% identity. Figure

2.02 shows the alignment of SnpR, SlpR, MprR and LysR.

Additionally, analysis of the SnpR, SlpR, and MprR primary sequences by the method of Dodd and Egan (1990) indicates that all three are very likely to be DNA binding proteins, with scores of 5.09, 4.59, and 4.51, respectively. For all three sequences, the likely DNA binding region begins at residue 18 (glycine) and continues to residue 39 (arginine/threonine). The greatest degree of identity between the three proteins lies in the amino-terminal region of the sequences, which contains the helix-turn-helix motif implicated in DNA binding.

Figure 2.02. CLUSTALW alignment of Streptomyces sp. strain C5 SnpR and published homologs. The three streptomycete LysR-like proteins, Streptomyces sp. strain C5 SnpR (CSSnpR), S. lividans SlpR (SlSlpR), and S. coelicolor MprR (ScoMprR), together with E. coli LysR (EcoLysR), were aligned using the CLUSTALW program with the PAM250 weight matrix, a gap opening penalty of 10.0 and a gap extension penalty of 0.1. The amino-terminal segments containing the helix-turn-helix motif implicated in DNA binding are boxed. Residues shaded in black are identical in more than one sequence, while those shaded in gray are biochemically similar to residues in the other sequences.

This similarity is consistent with the observations of Henikoff et al. (1988), who found

the original members of the LysR family shared their strongest identity in a region

aligning with LysR residues 21-40.

Construction of general purpose cloning vectors. To enable greater flexibility in cloning procedures with high G+C mol% streptomycete DNA, a series of improved E. coli alpha-complementation vectors was prepared by modifying the lacZ-α fragment of pUC19 to contain additional restriction endonuclease sites. The first vector generated, the lacZ-α disruptant pANT840, was of limited immediate value as a cloning plasmid, but facilitated construction of the more advanced pANT841 and pANT846 vectors by allowing restoration of alpha-complementation. Plasmid pANT840 incorporated 12 new unique restriction endonuclease sites into the pUC19 MCS, but also incorporated an extra XbaI site. Plasmid pANT841 contained a restored lacZ-α open reading frame and a unique XbaI site. The most versatile vector, pANT846, contained 22 unique endonuclease recognition sites not found in pUC19, including sites for eight meganucleases which recognize octameric sequences. Figure 2.03 shows the MCS sequences of these vectors.

Reporter gene constructs. In order to determine and quantify the role of the putative

transcriptional regulator snpR on snpA transcription, a series of reporter-gene plasmids

was designed and constructed. The reporter gene selected, aphII, encoding neomycin phosphotransferase (NptII), was obtained from the streptomycete promoter-probe vector pIJ486 (Ward et al., 1986). This reporter gene was chosen because it has proven functionality in Streptomyces, and the encoded NptII enzyme can be precisely quantified using a commercially available ELISA assay. Since earlier subcloning experiments determined the snpR gene was apparently essential for the hyperproteolytic phenotype of pANT42 (Lampel et al., 1992), and many LysR-like proteins serve as activators of transcription, the initial premise was that SnpR protein was an activator of snpA transcription. To test this hypothesis in an extrachromosomal context, four reporter construct plasmids were designed and prepared (see Figure 2.04). The aphII gene was isolated from pIJ486 as an EcoRI-HindIII fragment, containing the open reading frame preceded by the native ribosome binding site, and cloned into pANT841 to generate pANT806, which served as the donor of the reporter gene for all subsequent ligations.

Figure 2.03. Multiple cloning site of pUC19, pANT840, pANT841 and pANT846. The entire pUC19 MCS (bold), spanning the 5’ HindIII and 3’ EcoRI sites, is provided for reference. Plasmid pANT840 and pANT841 polylinker DNA outside of the HincII and XbaI sites is identical to pUC19, and not shown. The stop codon in pANT840 is underlined, and DNA identical to pUC19 is in boldface for all MCS fragments.

pUC19
Sites: HindIII, SphI, PstI, HincII, XbaI, BamHI, SmaI, KpnI, SacI, EcoRI
5'AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCCCCGGGTACCGAGCTCGAATTC3'

pANT840
Sites: HincII, XbaI, SpeI, NruI, MluI, NotI, NheI, NcoI, ApaI, ClaI, BclI, StuI, XhoI, BglII
5'GTCGACTCTAGACTAGTCGCGACGCGTGCGGCCGCTAGCCATGGGCCCATCGATGATCAGGCCTCGAGATCTAGA3'

pANT841
Sites: HincII, SpeI, NruI, MluI, NotI, NheI, NcoI, ApaI, ClaI, BclI, StuI, XhoI, BglII
5'GTCGACAACTAGTCGCGAACGCGTGCGGCCGCTAGCCATGGGCCCATCGATGATCAGGCCTCGAGATCTAGA3'

pANT846
Sites: HincII, PacI, SwaI, PmeI, AscI, FseI, SgfI, SpeI, NruI, MluI, NotI, NheI, NcoI, ApaI, ClaI, BclI, StuI, XhoI, BglII, BamHI, BssHII, AvrII, EagI, SmaI, MscI, SrfI, KpnI, SacI, EcoRI
5'GTCGACATTAATTAATTTAAATCCGTTTAAACGGCGCGCCGGCCGGCCGCGATCGCGCACTAGTCGCGAACGCGT
GCGGCCGCTAGCCATGGGCCCATCGATGATCAGGCCTCGAGATCTAGAGGATCCTTGCGCGCCTAGGCGGCCG
TGGCCAGCCCGGGCCTTGGTACCGAGCTCGAATTC3'

The reporter gene was first inserted into pANT849, a snp-based expression plasmid (see following sections), generating pANT852, which contains the aphII gene positioned downstream of PsnpA, and an intact divergent snpR gene. A deletion of the snpR DNA between the SauI and StuI restriction sites gave pANT853, identical to pANT852 except for the lesion in snpR. To compare snpA promoter activity with another well-characterized streptomycete promoter, the aphII gene was also placed under the transcriptional control of the melC1 promoter from pIJ702, generating pANT856. In both the PsnpA and PmelC1 reporter constructs, the SphI site used as the 5’ end of the non-snp or melC DNA contains an ATG codon encoding an amino-terminal methionine of SnpA or MelC1, and hence the aphII reporter gene is positioned similarly downstream of the two promoters.

Comparison of different reporter constructs. To evaluate the role of snpR in snpA expression, and to compare PsnpA with Pmeici, MSEM-grown cultures of S. lividans HLP-

6, harboring pANT849, pANT852, pANT853, pANT855, or pANT856 were

59 Figure 2.04. Promoter probe constructs used for the evaluation of snpA promoter activity and snpR function. The aphll gene of transposon Tn5 was excised from pIJ486 and inserted downstream of the snpA promoter both in ^np/?-positive (pANT852) and snp/î-negative (pANT853) contexts. Plasmid pANT856 was constructed to compare the level of the snpA promoter to the well-characterized melC 1 promoter present in pIJ702. Not shown is plasmid pANT855, which contains the pANT849 MCS fragment inserted into the Sphl-Mlul sites of pIJ702. Plasmid pANT855 was a negative control analogous to pANT849 for the reporter assays.

[Figure 2.04 image: restriction maps of pANT849 (snpR, PsnpA), pANT852 (snpR, PsnpA, aphII), pANT853 (snpR deletion, aphII), and pANT856 (PmelC1, aphII).]

Figure 2.05. Bar chart of NptII production by recombinant Streptomyces cultures. Multiple cultures of recombinant S. lividans HLP6 strains harboring pANT849, pANT852, pANT853, pANT855 and pANT856 were grown in MSEM medium, processed and tested for cytoplasmic NptII as described in the text.

[Figure 2.05 image: bar chart titled "AphII Production by Recombinant S. lividans HLP6 (mean values from n cultures)"; x-axis, plasmids tested for reporter production (pANT849, pANT852, pANT853, pANT855, pANT856).]

harvested after 48 hours of growth, disrupted, and measured for NptII production by ELISA. Figure 2.05 shows the data generated by these assays. As expected, the negative controls, pANT849 and pANT855 (pIJ702 plus the pANT849 MCS downstream of PmelC1), did not produce any detectable NptII. The snpR-minus construct and the PmelC1 construct each produced around 50 pg NptII per µg soluble protein, and the snpR-positive construct produced around 1.7 ng NptII per µg soluble protein. On average, a 35-fold increase in NptII production was conferred by an intact snpR gene, confirming that the gene is involved in activation of snpA transcription. Of additional significance is the same ca. 35-fold difference between the melC1 promoter, which produces NptII reporter levels comparable to the snpR-minus construct, and the snpR-activated snpA promoter.
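The fold activation quoted above can be checked with a short calculation. The sketch below uses only the two rounded reporter levels stated in the text (about 1.7 ng and about 50 pg NptII per unit of soluble protein); the function name is illustrative and the result depends only on the ratio, not on the protein unit. With these rounded inputs the ratio is 34, which agrees with the reported average of roughly 35-fold.

```python
# Fold activation from the rounded reporter levels quoted in the text.
# Both values are expressed per the same amount of soluble protein,
# so only their ratio matters here.

PG_PER_NG = 1000.0  # picograms per nanogram


def fold_activation(activated_pg: float, basal_pg: float) -> float:
    """Return the ratio of activated to basal reporter level."""
    if basal_pg <= 0:
        raise ValueError("basal level must be positive")
    return activated_pg / basal_pg


snpr_plus_pg = 1.7 * PG_PER_NG   # snpR-positive construct (pANT852), ~1.7 ng
snpr_minus_pg = 50.0             # snpR-minus construct (pANT853), ~50 pg

fold = fold_activation(snpr_plus_pg, snpr_minus_pg)
print(f"Approximate fold activation by snpR: {fold:.0f}")
```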

Expression of episomal snpA as a function of host growth. In order to establish a production curve for PsnpA-driven NptII production, experiments were conducted to plot NptII production as a function of culture age. In order to generate meaningful growth curve data, medium conditions were optimized to allow good growth rates while maintaining a dispersed culture. Due to the mycelial nature of vegetative Streptomyces growing in liquid medium, clumping tends to occur, which can hamper generation of growth curves. This was overcome with the inclusion of 15% sucrose (w/v) in TSBP medium, and the use of coiled springs in baffled flasks; the combination of lower water activity and increased mechanical agitation gave the desired culture characteristics.

As an alternative to optical density measurements, the samples taken for reporter protein measurement were measured for total protein (after medium removal and disruption, but before centrifugal clarification) by the method of Bradford (1976). Figure 2.06 shows that plots of optical density and total protein measurements versus culture age for the same culture give similar profiles. Generating growth curves from total protein rather than optical density gave more consistent data. The ELISA method was used to measure the amount of NptII produced by three separate cultures of S. lividans TK24 (pANT852) at a number of timepoints over a period of several days. A plot of the SnpR-activated snpA promoter-driven NptII produced as a function of total culture protein is provided in Figure 2.07. The graph shows that expression of the aphII reporter gene occurs primarily after the logarithmic growth phase, and increases with culture age.

To directly compare the snpA promoter with the mutant ermE promoter (PermE*), an established over-expression promoter, two new constructs were generated: pANT867, which contained a snpR-PsnpA-aphII cassette; and pANT827, carrying a PermE*-aphII cassette. Both plasmids are E. coli-Streptomyces shuttle vectors. Figure 2.08 shows the results of ELISA assays conducted on soluble cellular protein from S. lividans TK24 (pANT867), S. lividans TK24 (pANT827), and the negative control S. lividans TK24 (pANT857). As expected, the negative control produced undetectable levels of NptII. The mean level of NptII produced by triplicate cultures containing the PsnpA construct, however, was about 1.3-fold greater than that produced by similar cultures containing the PermE* construct, indicating that under the conditions of the test, the snpA promoter slightly outperformed the highly active mutant ermE promoter.

Mapping of the snpA and snpR transcriptional start sites. The 5' mRNA termini of the two Streptomyces C5 snp transcripts were mapped using a non-radioactive technique optimized for the Applied Biosystems ABI 377 automated DNA sequencer.

[Figure 2.06 image: panel A, time (hrs) vs. optical density for cultures 1-3; panel B, time (hrs) vs. total soluble protein for cultures 1-3.]

Figure 2.06. Comparison of streptomycete growth curves generated from optical density readings (A600 readings, panel A) and total protein levels (Bradford assay A595 readings, panel B).

Figure 2.07. Profile of SnpR-activated PsnpA-driven reporter gene expression and culture growth. Samples from triplicate 0.5 L cultures of S. lividans TK24 (pANT852) grown in TSBP-S medium were processed for NptII determination as described in the text. With the mean total protein curve indicating the growth phase of the fermentation, NptII accumulation increased as the culture aged. This suggests that the snpA promoter is active in stationary phase.

[Figure 2.07 image: plot titled "S. lividans TK24 (pANT852) NptII Accumulation (mean values from three 500 ml cultures)"; x-axis, time (hrs); series, mean total protein and mean NptII.]

Figure 2.08. Reporter cassette structure and NptII production by control, PsnpA and PermE* constructs. Panel A, the aphII expression cassettes for comparison of ermE* and snpA promoters; Panel B, bar chart showing results of NptII ELISA analysis of triplicate TSBP-S cultures of S. lividans (pANT867) and S. lividans (pANT827), as well as the S. lividans (pANT857) negative control.

[Figure 2.08 image: panel A, cassette diagrams for pANT857 (snpR, PsnpA, MCS), pANT867 (snpR, PsnpA), and pANT827 (PermE*); panel B, bar chart titled "NptII Production by Recombinant S. lividans" for pANT857, pANT867, and pANT827.]

Based on the method described by Yamada et al. (1998), 5' FITC-labeled oligonucleotides were hybridized to total RNA prepared from early stationary phase S. lividans TK24 (pANT842) cultures, and the addition of reverse transcriptase initiated the extension of the DNA primer on the RNA template. When separated by electrophoresis adjacent to a dideoxy-sequencing ladder generated with the same primer, the exact size of the extension product indicated the transcriptional start point. Figure 2.09 and Figure 2.10 show the results of the primer extension experiments for the snpA and snpR transcripts, respectively. In the case of snpR, the transcriptional start site is 139 nucleotides upstream of the likely AUG start codon, and is positioned 28 nucleotides upstream of the putative SnpR binding site. The snpA transcript, on the other hand, is a leaderless mRNA species (Janssen, 1993), with transcription of the gene starting at the adenine of the ATG start codon most likely to encode the formyl-methionine of SnpA.

Construction and utilization of Streptomyces expression vectors. The desire to harness the productive capacity of the snp locus originated with the observation of the large hydrolytic halos surrounding colonies of S. lividans TK24 (pANT42) growing on milk-agar plates. An SphI site, conveniently located at the extreme 5' end of the snpA open reading frame, permitted the replacement of snpA with a synthetic multiple cloning site, generating plasmid pANT849, which contains 10 unique restriction endonuclease sites downstream of the SnpR-activated snpA promoter. Although the relatively small size (5343 bp) and stability of pANT849 make it easy to work with, the time-consuming nature of Streptomyces genetic manipulations (transformations, protoplast regeneration,

Figure 2.09. Primer extension results for the snpA transcript. Primer extension reactions using a snpA-specific primer, conducted in the absence (lane 1) and presence (lane 2) of reverse transcriptase, were electrophoretically separated next to a dideoxy terminator sequencing reaction prepared with the same fluorescein-labeled primer as described in the text.

[Figure 2.09 image: gel with lanes A, T, G, C, 1, and 2; sequence read alongside the ladder, with the asterisk marking the indicated position: CTCAGTAGCT*ACGCGTACGGGGACAGG.]

Figure 2.10. Primer extension results for the snpR transcript. Primer extension reactions using an snpR-specific primer, conducted in the presence (lane 1) and absence (lane 2) of reverse transcriptase, were electrophoretically separated next to a dideoxy terminator sequencing reaction prepared with the same fluorescein-labeled primer; see text for details.

[Figure 2.10 image: gel with the T-N11-A region labeled.]

and plasmid preparation) necessitated the construction of pANT849-based E. coli-Streptomyces shuttle plasmids. The first such vector prepared was plasmid pANT857, which contains the pUC19 replicon and bla gene inserted into pANT849. This 7970 bp shuttle vector is selectable in E. coli with ampicillin and in Streptomyces with thiostrepton.

The desire to conduct bipartite plasmid experiments involving multiple cistrons expressed by the snp promoter led to the generation of a family of non-thiostrepton-selectable pANT849-based shuttle plasmids. The alternative markers chosen were the aphII neomycin phosphotransferase gene and the aac(3)-IV apramycin resistance gene. A multi-step construction scheme was undertaken to optimize the properties of the finished plasmids; the elimination of common restriction sites, as well as reduction in size of the E. coli replicon donor plasmids, permitted preservation of many unique MCS restriction sites, and kept the overall size of the final plasmids reasonable. It has been observed that high-copy number plasmids exceeding approximately 10 kbp in size exhibit exaggerated instability in both E. coli and Streptomyces (personal observations; Warnes and Stephenson, 1986; Balbas et al., 1986). Of additional importance in reducing the final size of the expression vectors was the generation of bifunctional resistance markers.

The apramycin resistance gene from pKC505 is transcribed by a promoter, Paac(3)-IV, which functions in both streptomycetes and E. coli. This promoter was used to construct a bifunctional aphII marker for use in construction of a neomycin-selectable vector. Plasmids pANT894 and pANT895 are neomycin- and apramycin-selectable snp-based expression vectors, respectively, and together with pANT857 represent the second generation of snp-based expression vectors. These plasmids were further improved with the addition of an enhanced MCS containing additional unique restriction enzyme sites, stop codons in all three frames (useful for expression of gene fragments lacking their native stop codon), and the strong Streptomyces coelicolor mmr transcriptional terminator. This third generation of snp-based expression vectors, the pANT1200 series, and the improved MCS are shown in Figure 2.11. In support of streptomycete antibiotic biosynthesis studies, snp-based vector systems have been used to express numerous genes from the Streptomyces sp. strain C5 daunomycin biosynthesis gene cluster. The dps polyketide synthase (Rajgarhia and Strohl, 1997), dnm glycosyltransferase (Woo and Strohl, unpublished), and doxA monooxygenase (Dickens and Strohl, 1996) genes have all been successfully expressed using the snpR-activated snpA promoter.

Construction and utilization of Streptomyces secretion vectors. The next step in the development of the transcriptional-fusion plasmids was the generation of a translational fusion expression system harnessing the secretion machinery of the snp locus. In designing the new secretion plasmids, it was decided to mitigate the often-stochastic nature of heterologous protein secretion by constructing a series of plasmids incorporating secretion signals from several different extracellular streptomycete proteins. The signal sequences chosen were from the native Streptomyces sp. strain C5 SnpA (SNP), the Amycolata species pectate lyase (PEL), and four S. venezuelae sequences, two wild type (VAA, VSI) and two mutant (VAAUP, VSIUP) signal peptides with demonstrated efficacy in directing secretion of heterologous proteins. Table 2.3 shows the sequences and properties of the signal peptides used.

Figure 2.11. The pANT1200 series expression vector plasmids and multiple cloning site. The pANT1200 series of snp-based shuttle expression vectors comprises three plasmids: pANT1200, which is selectable in E. coli with ampicillin and in Streptomyces with thiostrepton; pANT1201, selectable in both hosts with neomycin; and pANT1202, selectable in both hosts with apramycin (Panel A). The MCS of these vectors (Panel B) contains 13 restriction endonuclease sites, of which different sets are unique in the three different vectors. Also included in the MCS are stop codons in three frames, as well as a bidirectional terminator after the polylinker. The snpA transcriptional start point (TSP) is indicated by the arrow above the sequence, the start and stop codons are in boldface, and the inverted repeat comprising the transcriptional terminator is indicated with convergent arrows superimposed on the sequence.

[Figure 2.11 image: panel A, maps of the pANT1200 series plasmids; panel B, MCS sequence with the restriction sites XhoI, ClaI, SphI, EcoRI, BglII, XbaI, SacI, PstI, BamHI, HindIII, SpeI, KpnI, NcoI, DraI, and MluI labeled, followed by stop codons in three frames.]

Code    Plasmid    Signal Sequence*
SNP     pANT3025   MRMPLSVLTAAGLSLATLGLGTAGPASA TAEG
VAA     pANT3022   MARKTVAAALALVAGAAVAVTGNAPAQA VPPG
VAAUP   pANT3021   M?RKTVAAALALVAGAAVAVTGNAPAQA VPPG
VSI     pANT3024   MRRTLKAVGAAAAAATCVLAATAGTAQA EAPK
VSIUP   pANT3023   MRRTLQAVGAAAAAATCVLAATAGTAQA EAPK
PEL     pANT3026     MIPQRTALISAAVLVLGAVSIPQATA APFG
                                             -1 +1

Table 2.3. Leader peptides of the snp-based secretion vectors. The hydrophobic core regions of the signal peptides are flanked by the cationic amino-terminal regions in boldface, and the underlined carboxyl-terminal charged regions. The putative signal peptide cleavage sites are indicated by the dashed line. For the VAAUP and VSIUP sequences, the single amino-acid alteration from the respective native sequence is double underlined.

The secretion vectors were constructed by ligating ClaI-BamHI ended dsDNA linkers containing the appropriate coding sequence into similarly digested pANT1201. The resulting plasmids, pANT3021-3026, each contain unique BamHI and HindIII sites allowing the insertion of DNA containing a gene to be expressed and secreted.

The BamHI site comprises two codons, GGA and TCC, which encode glycine and serine residues positioned three codons downstream of the putative signal peptide recognition sites of the signal sequences. Genes to be expressed and secreted must be engineered (by PCR) to incorporate an in-frame GS-encoding BamHI site (alternatively BclI or BglII, which generate BamHI-compatible cohesive ends, could be used), and a downstream HindIII, SpeI, KpnI, or MluI site. Insertion of the engineered fragment into the six different vectors would generate six chimeric genes under the transcriptional control of the snpA promoter, which would then be evaluated for secretion efficiency in recombinant Streptomyces hosts.

In order to test two of the secretion vectors, recombinant human endostatin was expressed and secreted by recombinant S. lividans. A version of the human endostatin cDNA clone, containing a 5' BamHI site encoding an amino-terminal glycine-serine extension, was inserted into pANT3022 and pANT3025 for comparison of the SNP and VAA signal sequences in directing secretion of the protein. The vector-insert joint of the VAA-endostatin construct, pANT3042, is shown in Figure 2.12.
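The in-frame requirement for the BamHI fusion can be illustrated by translating the junction. The following is a minimal sketch, assuming the top-strand sequence as transcribed from Figure 2.12 and the standard genetic code; the helper names are illustrative and are not part of the original work. It checks that the GGATCC site lies in frame with the ATG inside the ClaI site, that it encodes Gly-Ser, and that the Gly-Ser pair sits three codons after the Ala-Gln-Ala end of the VAA signal peptide listed in Table 2.3.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string, reading from position 0."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Top strand of the pANT3042 junction, as transcribed from Figure 2.12.
junction = (
    "ATCGATGGCCAGAAAGACCGTGGCAGCTGCACTCGCCCTTGTGGCGGGAGCCGCTGTGGCCGTCACG"
    "GGCAACGCCCCGGCGCAGGCCGTCCCGCCCGGATCCCACAGCCACCGCGACTTCCAGCCGGTGCTC"
)

start = junction.find("ATG")      # ATG overlapping the ClaI site (ATCGAT)
bamhi = junction.find("GGATCC")   # BamHI fusion point
protein = translate(junction[start:])

print("ORF start index:", start)
print("BamHI in frame:", (bamhi - start) % 3 == 0)
print("Translation:", protein)
```

Reading the printed translation against Table 2.3 shows the VAA leader followed by VPPGS and then the first residues of the endostatin insert.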

SDS-PAGE and western blot analysis of the culture broths of S. lividans TK24 harboring the different plasmids revealed the presence of immunoreactive bands when probed with anti-endostatin antibodies. The immunoreactive species from the S. lividans (pANT3042) broths migrated identically to pure recombinant endostatin, while those

Figure 2.12. Nucleotide sequence of pANT3042 signal peptide-endostatin junction. This figure shows several important features of the signal peptide-endostatin junction in the VAA-endostatin construct. The ClaI site is positioned over the superimposed transcriptional-translational start site, which is indicated with an asterisk. The arrow indicates the putative VAA signal peptidase cleavage site. Encoding glycine and serine residues, the BamHI site is the point of fusion between the vector DNA and the human endostatin cDNA; the amino-acid sequence of native human endostatin is capitalized.

   ClaI*
5'-ATCGATGGCCAGAAAGACCGTGGCAGCTGCACTCGCCCTTGTGGCGGGAGCCGCTGTGGCCGTCACG-3'
3'-TAGCTACCGGTCTTTCTGGCACCGTCGACGTGAGCGGGAACACCGCCCTCGGCGACACCGGCAGTGC-5'
       m  a  r  k  t  v  a  a  a  l  a  l  v  a  g  a  a  v  a  v  t

                                 BamHI
5'-GGCAACGCCCCGGCGCAGGCCGTCCCGCCCGGATCCCACAGCCACCGCGACTTCCAGCCGGTGCTC-3'
3'-CCGTTGCGGGGCCGCGTCCGGCAGGGCGGGCCTAGGGTGTCGGTGGCGCTGAAGGTCGGCCACGAG-5'
   g  n  a  p  a  q  a  v  p  p  g  s  H  S  H  R  D  F  Q  P  V  L
                      ↑

from the pANT3045 strain exhibited different migration. Whereas the endostatin used as a standard was purified in the form of insoluble inclusion bodies from recombinant E. coli, the material produced by the streptomycetes was completely soluble, as determined by centrifugal clarification of the broth samples before electrophoresis. Figure 2.13 shows western blots of various protein samples from S. lividans (pANT3042) and S. lividans (pANT3045) cultures.

A significant qualitative difference in endostatin secretion was observed between the pANT3042 and pANT3045 recombinant S. lividans strains. While the broth of S. lividans (pANT3042) contained properly sized product, very little was seen in the pANT3045-recombinant broth. Analysis of the total mycelial protein from the two strains revealed that most of the endostatin produced with the SNP leader remained cell-associated and exhibited significant degradation, while only a small amount of the VAA-endostatin product remained cell-associated, and most was in the broth fraction.

In an effort to enhance production of the recombinant protein, a KpnI fragment containing the VAA-endostatin cassette was excised from pANT3042 and inserted into the streptomycete vector pIJ303. Containing the second-strand origin of replication (sti), pIJ303 has a higher copy number and is a more stable vector than the sti-minus pIJ486 and pIJ702 vectors. It was hoped that the alternative vector would lead to enhanced production of the encoded protein. Figure 2.14 shows a growth curve of S. lividans (pANT3052) and the associated western blot of the broth samples. For comparative purposes, an aliquot of S. lividans (pANT3042) broth was analyzed in the western blot as well. The qualitative results indicate a significant increase in volumetric secretion of recombinant endostatin by the pANT3052 recombinant when compared to

Figure 2.13. Western blot analysis of culture broth from streptomycetes secreting recombinant human endostatin. Panel A, western blot showing Streptomyces-produced soluble endostatin versus E. coli-produced insoluble endostatin; Panel B, western blot analysis of 5 µl broth aliquots from S. lividans (pANT3042) TSBP-S (Lane 1) and defined medium (Lane 2) cultures, as well as S. lividans (pANT3045) TSBP-S (Lane 3) and defined medium (Lane 4) cultures. Also shown in lanes 5-8 of panel B are aliquots of disrupted mycelium corresponding to the broth samples of lanes 1-4, respectively.

[Figure 2.13 image: western blots, panels A and B.]

Figure 2.14. Growth curve and recombinant endostatin production profile for S. lividans TK24 (pANT3052). Panel A, optical density growth curve; Panel B, anti-human endostatin western blot of clarified broth samples from the same culture, with an added aliquot of S. lividans (pANT3042) culture broth. Arrows drawn between bands indicate comparable fermentation broth samples.

[Figure 2.14 image: panel A, plot titled "Streptomyces lividans TK24-pANT3052 Growth (500 ml TSBP-S)", optical density vs. time (hrs); panel B, western blot with lanes labeled "endostatin" and "pANT3042 culture, 46 hrs", annotated "Compare Directly".]

Figure 2.15. Semi-quantitative western blot and densitometric analysis of S. lividans TK24 (pANT3052) broth samples. Panel A, western blot of E. coli-produced endostatin standard and Streptomyces-produced endostatin. Lanes 1-4 contain 2.1, 10.6, 21.4, and 210 ng of E. coli endostatin, respectively; lanes 5-10 contain 5 µl broth aliquots from a TSBP-S culture of S. lividans TK24 (pANT3052) taken at 0, 17, 24, 44, 65 and 89 hours post-inoculation, respectively. Panel B, plot of the density products for the bands in lanes 1, 2, and 3 (for the standard curve, filled circles) as well as the band in lane 9 (unquantified Streptomyces endostatin, open inverted triangle).

[Figure 2.15 image: panel A, western blot; panel B, densitometry plot of band density vs. ng endostatin standard.]

the pANT3042 strain. To estimate the amount of the recombinant protein produced by the S. lividans (pANT3052) culture, a semi-quantitative western blot was performed by loading broth samples next to a series of lanes with increasing amounts of endostatin standard. Densitometric analysis of band intensities (Figure 2.15, Panel B) indicated that the streptomycete cultures produce extracellular endostatin at a concentration of approximately 2 mg per liter.
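The arithmetic behind such a semi-quantitative estimate can be sketched as a linear standard curve followed by a unit conversion. In the sketch below, the standard amounts (2.1, 10.6, and 21.4 ng) and the 5 µl aliquot volume come from the Figure 2.15 legend; the band densities are hypothetical placeholder values chosen only to illustrate the calculation, since the measured densities are not given in the text, and the function names are likewise illustrative.

```python
# Linear standard curve for a semi-quantitative western blot.
# Standard amounts and aliquot volume are from the Figure 2.15 legend.
# ALL density values below are HYPOTHETICAL placeholders, not measured data.


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ng_from_density(density, slope, intercept):
    """Invert the standard curve to estimate ng of protein in a band."""
    return (density - intercept) / slope


def mg_per_liter(ng, volume_ul):
    """Convert ng per loaded volume (µl) to mg/L; 1 ng/µl equals 1 mg/L."""
    return ng / volume_ul


standards_ng = [2.1, 10.6, 21.4]                 # lanes 1-3
standard_density = [6.3e5, 3.18e6, 6.42e6]       # hypothetical densities
sample_density = 3.0e6                           # hypothetical broth band
aliquot_ul = 5.0                                 # broth volume loaded

slope, intercept = fit_line(standards_ng, standard_density)
sample_ng = ng_from_density(sample_density, slope, intercept)
estimate = mg_per_liter(sample_ng, aliquot_ul)
print(f"Estimated endostatin: {estimate:.2f} mg/L")
```

Under these placeholder densities the band corresponds to about 10 ng in 5 µl, which is the amount implied by a broth concentration of roughly 2 mg per liter.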

Discussion

This section describes the genetic characteristics of the Streptomyces sp. strain C5 snp locus, and the development of a set of gene expression tools harnessing the transcriptional properties of the snpR-activated snpA promoter.

The snpR gene encodes a LysR-like protein with a strong amino-terminal helix-turn-helix motif, and significant homology to the two other SnpR-like proteins described from homologous streptomycete loci, SlpR (Butler et al., 1992; Lichenstein et al., 1992) and MprR (Dammann and Wohlleben, 1992). Since it was originally described as a distinct class of transcriptional regulators by Henikoff et al. (1988), the LysR-like family has grown considerably. Members of the family are involved in the control of diverse metabolic pathways, ranging from activation of antibiotic biosynthesis and resistance to xenobiotic degradation and the production of virulence factors. A number of Streptomyces LysR-like proteins unrelated to SnpR have been identified. For example, the abaB gene, isolated from S. antibioticus ATCC 11891, has been characterized as a global activator of secondary metabolite biosynthesis, and has close homologs in many other Streptomyces species (Scheu et al., 1997). In contrast, the claR gene product of S. clavuligerus specifically activates transcription of clavaminate synthase, and thus controls the late stages of clavulanic acid biosynthesis (Paradkar et al., 1998). The bla4 gene of S. cacaoi encodes an activator of two β-lactamase genes, blaL and blaU (Magdalena et al., 1997), and the StgR protein of S. alboniger negatively regulates transcription of the putative aspartate aminotransferase gene stgA (Tercero et al., 1998).

Reporter-gene experiments conducted in this study indicate that SnpR serves as an activator of snpA transcription. On average, SnpR-activated levels of snpA promoter activity exceeded unactivated levels by 35-fold. The function of SnpR as an activator is consistent with the generally observed role of LysR-like transcriptional regulators in the loci they affect. Most serve to stimulate transcription of their target gene(s), usually between 6- and 200-fold (Schell, 1993). Of all the LysR-like proteins characterized to date, only a small number regulate hydrolytic processes. Of these, SnpR and its homologs appear to be the only members that directly activate the expression of a proteinase gene.

The HexA protein of Erwinia carotovora is somewhat similar in that it acts as a global regulator of multiple extracellular genes, but it functions as a repressor, controlling the transcription of genes encoding such enzymes as pectate lyase and cellulase, virulence factors involved in bacterium-plant interactions (Harris et al., 1998).

The LrhA protein of E. coli, on the other hand, indirectly controls proteolytic degradation of RpoS by modulating expression of the response regulator SprE, which in turn controls cleavage of RpoS by ClpX and ClpP (Gibson and Silhavy, 1999). The role of SnpR in direct activation of proteinase expression underscores the importance of this study.

Manual and computer-assisted evaluation of the snpA-R intergenic sequence for recognizable DNA motifs suggested a number of potentially important features. The most pronounced was an inverted repeat, centered on a T-N11-A sequence, within 100 nucleotides of the snpA start codon. This motif, originally described by Henikoff et al. (1988) and observed repeatedly since, is typical of many LysR-like protein binding sites. Thus, this snp intergenic feature is likely to be the SnpR binding site. Searches for sequences resembling promoters, however, were less fruitful. Neither computer-assisted screening with a neural network algorithm (Reese et al., 1996), nor manual screening for sequences resembling consensus streptomycete promoters (Strohl, 1991; Bourn and Babb, 1995) revealed any segments with similarity to general prokaryotic or specific streptomycete consensus promoters.
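A manual search for T-N11-A motifs of the kind described above can be approximated computationally. The sketch below is an illustration only and is not the procedure used in this study: it scans one strand for every 13-nucleotide window that begins with T and ends with A, then scores dyad symmetry by counting how many of the six outer position pairs are Watson-Crick complementary. The example sequence is hypothetical and was constructed to contain a single perfect site.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dyad_score(site: str) -> int:
    """Count complementary pairs among positions (0,12), (1,11), ..., (5,7)."""
    return sum(1 for i in range(6) if COMPLEMENT[site[i]] == site[12 - i])


def find_t_n11_a(seq: str):
    """Return (position, 13-mer, dyad score) for every T-N11-A window.

    A lookahead is used so that overlapping windows are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=(T[ACGT]{11}A))")
    return [(m.start(), m.group(1), dyad_score(m.group(1)))
            for m in pattern.finditer(seq)]


# Hypothetical test sequence containing one fully symmetric T-N11-A site.
example = "GGG" + "TGATAGCCTATCA" + "CCC"
hits = find_t_n11_a(example)
for position, site, score in hits:
    print(position, site, f"{score}/6 symmetric pairs")
```

Windows with high scores are candidates for an interrupted inverted repeat; any threshold for calling a site would be a choice made by the user.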

Primer extension experiments conducted on both snpA and snpR revealed the transcriptional properties of the Streptomyces sp. strain C5 locus. The snpR transcriptional start point was mapped to a position 138 nucleotides upstream of the predicted snpR start codon. Although this leaves a long untranslated stretch of snpR mRNA, this placement of the promoter positions the -35 element within the likely SnpR binding site of the snpA-R intergenic region. The overlapped orientation of PsnpR and the likely SnpR binding site suggests a negative autoregulatory role for the snpR gene product in snpR transcription. Such negative autoregulation is a feature of many LysR-like systems (Schell, 1993). In contrast, the transcriptional start site of snpA mapped to the adenine of the snpA ATG start codon, indicating the proteinase transcript is a leaderless mRNA species. The AGGA sequence immediately upstream of the snpA ATG, although optimally positioned and originally speculated to function as a Shine-Dalgarno sequence (Lampel et al., 1992), is clearly not involved in translation initiation.

Figure 2.16 shows the nucleotide sequence and transcriptional features of the snpA-R intergenic region. This promoter structure has significant functional implications for the various expression plasmids based on the snpA promoter. Since the snp locus has been successfully used as a platform for expression of both streptomycete and non-streptomycete genes (as transcriptional and translational fusion constructs), several conclusions can be made about translational events likely to be occurring. In circumstances where the DNA inserted into the expression system contains both the gene of interest and its native Shine-Dalgarno sequence, such as with the aphII reporter gene or doxA monooxygenase, conventional translation initiation is likely to occur. These situations probably involve the synthesis of a short polypeptide from the 5' AUG of the chimeric transcript. Inspection of the mRNA encoded by the aphII reporter constructs shows that a 21 amino-acid polypeptide would be synthesized by translation of the open reading frame starting with the 5' AUG before terminating with a UGA codon just 5' of the aphII AUG. Similarly, the doxA expression cassette of pANT195 contains an open reading frame with eight codons starting with the 5' AUG and terminating at a UGA codon overlapping the doxA AUG (U.S. Patent 5,962,293). In the cases where chimeric signal peptide-endostatin genes are expressed by the snp system, however, the AUG start codons of the new open reading frames (within the ClaI site) are superimposed on the transcriptional start points of the chimeric transcripts. Although this translational context is similar to wild type snpA, in replacing the native coding sequence with the chimeric open reading frame, any native translational signals downstream of the AUG are lost, and not suitably replaced, as with the aphII and doxA fusions. As evidenced especially well

Figure 2.16. Nucleotide sequence and transcriptional features of the Streptomyces C5 snpA-R intergenic region. Annotations include the snpA and snpR transcriptional start points (TSPs), and the optimal -10 and -35 regions for each. Spacing between these putative RNA polymerase binding sites is 19 nucleotides for the snpR TSP, and 18 nucleotides for the snpA TSP. The likely SnpR binding site is indicated, with the T-N11-A thymidine and adenine in boldface, and the inverted repeat indicated with arrows. The snpA and snpR start codons are indicated in boldface as well.

snpR
CATGTCCTGGGAGGGTAAGGCGGAAGTTCAGCTTTCACCAGACATACAAAATGGCGACCGATCAGGACCATCGGGCCTTCACGGCGCGA
GTACAGGACCCTCCCATTCCGCCTTCAAGTCGAAAGTGGTCTGTATGTTTTACCGCTGGCTAGTCCTGGTAGCCCGGAAGTGCCGCGCT

{putative SnpR binding site}
GGCGTCGGCCCGGATCGGCAGGGGCCCCGGCCGGGGCCGCCGGGCAGGGCGGGGCAGGTGGGGACGGAGGGdGATAGGGtaoCcCCTATCi

-10 region   -35 region   snpR TSP   snpA TSP   -35 region   -10 region
ggcggttgccatcatcacaacggccgtacgggcacggacactcacgatgtctgactcatccccccacctcgaggagtcatcgatgcgca
CCGCCAACGGTAGTAGTGTTGCCGGCATGCCCGTGCCTGTGAGTGCTACAGACTGAGTAGGGGGGTGGAGCTCCTCAGTAGCTACGCGT
snpA

in the case of the VAA-endostatin construct, the observed production indicates that the leaderless endostatin expression cassettes are recognized by the host ribosomes and translated properly.

Most bacterial mRNAs carry an untranslated leader region 5' of the translational start point (Ptashne et al., 1976; Klock and Hillen, 1986). Between 4 and 15 nucleotides upstream of the start codon lies the Shine-Dalgarno (SD) sequence, which is complementary to the 3' end of the 16S rRNA (Jacob et al., 1987). The widely accepted model of translation initiation involves the formation of a ternary complex between the mRNA, fMet-tRNAfMet, and the 30S subunit, aided by the three initiation factors, IF1, IF2 and IF3, and one molecule of GTP. On leadered transcripts, the specific interaction between the SD and anti-SD sequences has been shown to be a major determinant of translational efficiency, along with the initiation codon-anticodon interaction and mRNA secondary structure (reviewed by Gold, 1988). Not all mRNA molecules are synthesized with 5' untranslated leaders, though. Numerous transcripts lacking untranslated leader segments have been characterized from eubacteria, archaebacteria and eukaryotes (Janssen, 1993). The existence of leaderless transcript molecules challenges the notion that a 5' leader (and SD sequence) is essential for translation of mRNA molecules. The fact that many of these mRNAs are translated with high efficiency, despite the absence of SD sequences (Janssen, 1993), indicates that an alternative method of translation initiation is occurring in these cases. A search for sequences which might act analogously to the SD sequence of leadered transcripts revealed several leaderless mRNAs containing regions downstream of the initiation codon, termed "downstream boxes", with complementarity to the ribosomal 16S rRNA sequences 5' to the accepted anti-SD region (Sprengart et al., 1990). Experimental

97 alterations in the putative downstream boxes of several E. coli genes encoding leaderless transcripts that increased or decreased complementarity to the cognate 16S rRNA sequence enhanced and reduced translation, respectively (Faxen et al., 1991; Ito et al.,

1993; Nagai et al., 1991). These data supported the theory that the downstream boxes may act analogously to the SD sequences of leadered mRNAs in binding 30S ribosomal particles during ternary complex formation. Based on their observation that the leaderless À. cl transcript is translated more efficiently by ribosomal protein S2-deficient mutants of E. coli, Shean and Gottesman (1992) proposed a sub-population of ribosomes lacking the S2 protein was responsible for translation of cl mRNA in X infected E. coli.

Suggesting that this mechanism might apply to the translation of other leaderless transcripts as well, this work gave further credence to the downstream box theory.
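The contrast between leadered and leaderless initiation can be illustrated with a short sketch. The Python below scans the leader upstream of a start codon for the best ungapped match to an SD motif and reports its spacing from the start codon; a leaderless transcript has no leader to scan, so nothing is returned. The motif AGGAGG, the 20-nucleotide window, the four-match threshold, the function name and the toy sequences are all assumptions made for illustration and are not taken from the studies cited.

```python
# Illustrative sketch only: motif, window and threshold are assumed values.
SD_MOTIF = "AGGAGG"  # assumed E. coli-type SD consensus, written as DNA


def best_sd_match(mrna, start_codon_index, window=20, min_matches=4):
    """Return (matches, spacing, candidate) for the best SD-like hexamer
    in the leader, or None if no candidate reaches `min_matches`.

    `spacing` is the number of nucleotides between the 3' end of the
    candidate and the first base of the start codon.
    """
    leader = mrna[max(0, start_codon_index - window):start_codon_index]
    leader = leader.upper().replace("U", "T")
    k = len(SD_MOTIF)
    best = None
    for i in range(len(leader) - k + 1):
        candidate = leader[i:i + k]
        matches = sum(a == b for a, b in zip(candidate, SD_MOTIF))
        spacing = len(leader) - (i + k)
        if matches >= min_matches and (best is None or matches > best[0]):
            best = (matches, spacing, candidate)
    return best


# Toy leadered transcript: the ATG begins at index 17.
leadered = "TTTAAGGAGGTTTACATATGAAA"
print(best_sd_match(leadered, leadered.index("ATG")))  # (6, 7, 'AGGAGG')

# Toy leaderless transcript: the ATG is the 5' terminus, so there is no leader.
print(best_sd_match("ATGAAA", 0))  # None
```

A scan of this kind says nothing about translational efficiency; it only shows why an SD-based model has nothing to act on when the transcript starts at the initiation codon.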

Several years later, however, in the face of mounting evidence for the downstream box / anti-downstream box theory, Resch et al. (1996) reported an independent, detailed analysis of leaderless mRNA translation. Using E. coli λ cI, phage P2 gene V, and Tn1721 tetR mRNAs, they showed that downstream box / anti-downstream box interactions were neither essential for ternary complex formation in vitro, nor necessary for efficient translation in vivo (Resch et al., 1996). Moreover, the authors cast doubt on the original downstream box / anti-downstream box theory by suggesting alternative interpretations of the published data. More recently, the same group published a study of the role of the initiation factors in leaderless mRNA translation (Tedin et al., 1999). They showed that IF3 antagonizes translation initiation from leaderless transcripts and that, in vivo, IF3 levels are inversely proportional to the efficiency of leaderless mRNA translation (Tedin et al., 1999). The revised model suggests that a sub-population of IF3-deficient 30S ribosomes forms ternary complexes on leaderless transcripts during exponential growth, while 70S ribosomes (which show a high preference for terminal AUG codons, and accumulate after logarithmic growth has ceased) are responsible for these initiation events in stationary phase (Tedin et al., 1999).

As for expression of the VAA and SNP-endostatin cassettes in S. lividans, the mRNAs are translated at a level sufficient to allow immunochemical detection of the recombinant protein; whether this is indeed efficient translation remains to be determined, but the S. lividans protein synthesis machinery clearly recognizes and translates the transcripts.

The first snp-based expression vector generated, pANT849, enabled the aphII reporter gene promoter studies, but was also immediately useful in the efforts to clone and over-express enzymes of the daunomycin biosynthetic pathway of Streptomyces sp. strain C5. Of the numerous daunomycin genes expressed in pANT849, one of the more noteworthy is the doxA gene, encoding daunomycin-13-hydroxylase. Acting late in the biosynthetic pathway, DoxA is responsible for converting the anthracycline daunomycin to the more potent doxorubicin (Dickens and Strohl, 1996), and the snp promoter was successfully used to over-express the enzyme for biochemical characterization, as well as bioconversion experiments (U.S. Patent 5,962,293).

Accompanying the original expression plasmid is a set of vectors complementing and improving upon the strengths of the prototype. Generating pANT849-based E. coli-Streptomyces shuttle vectors with an improved MCS and different selectable markers simplified manipulation, improved the versatility and efficiency of the vectors, and enabled multipartite expression experiments. The general purpose E. coli cloning vectors pANT841 and pANT846, on the other hand, were constructed to contain a multitude of restriction enzyme sites not found in pUC19, in the form of a modified lacZα fragment. These plasmids are designed to act as intermediate vectors in cloning schemes where the source DNA does not contain restriction sites of the pANT849 (or derivative) MCS. For example, the dauB gene of Streptomyces sp. strain C5 was first cloned into pANT841 as an XhoI-SstI fragment, and then excised as a ClaI-SstI fragment and inserted into pANT857 (C. L. DeSanti, unpublished data).

Completing the expression vector set are the protein secretion variants. These plasmids were designed for translational fusion of a streptomycete signal peptide to a heterologous gene (either non-snp or non-streptomycete) chosen for expression and secretion. Since protein secretion is multivariate and still not completely understood, a panel of six streptomycete signal peptides was selected and incorporated into the snp expression locus. At the outset of a protein expression-secretion project, the gene to be expressed is engineered with the appropriate enzyme sites to allow the fragment to be inserted into the six pANT302x plasmids. The six chimeric genes, under control of the snpR-activated snpA promoter, can then be tested for successful expression and secretion of the desired product. Two of the expression-secretion vectors, pANT3021 containing the VAA signal peptide, and pANT3025 with the SNP signal peptide, were tested for the ability to confer expression and secretion of recombinant human endostatin on S. lividans. The endostatin gene was chosen because: i) it is human in origin, and therefore represents a more significant expression/secretion challenge than a streptomycete or bacterial protein; ii) the cDNA, standards and immunochemical reagents were readily accessible; iii) adequate systems for soluble recombinant human endostatin preparation were not available; and iv) as a novel angiogenesis inhibitor (O’Reilly et al., 1997; Saarela et al., 1998), endostatin is of obvious clinical interest.

After plasmid construction and fermentation of the recombinant S. lividans hosts, comparison of endostatin production conferred by the two expression-secretion cassettes showed significant qualitative differences. While the VAA-endostatin construct conferred production of abundant, properly sized, soluble extracellular recombinant human endostatin on S. lividans, the SNP-endostatin cassette led to a product which was largely mycelium-associated and only barely detectable in the cell-free, clarified culture broth. One possible explanation for this aberrant SNP-endostatin secretion is that the gene product was not efficiently recognized by the streptomycete signal peptidase(s), and remained membrane-associated after abortive processing. In contrast, the VAA-endostatin product was properly processed and released. Once the VAA-endostatin cassette was shown to confer expression and secretion of properly sized human endostatin on streptomycete hosts, an experiment was undertaken to evaluate the contribution of different vector replicons to the overall productivity of recombinant endostatin expression and secretion.

Many streptomycete plasmids, including pANT849, are derived from the broad host range S. lividans plasmid pIJ101 (Kieser et al., 1982). However, in the construction of the pIJ101 derivatives most commonly used for streptomycete cloning, pIJ486 and pIJ702, significant portions of the native pIJ101 replication machinery were eliminated in the name of reducing size and rendering restriction enzyme sites unique (Katz et al., 1983; Ward et al., 1986).

Later, after pIJ486 and pIJ702 were in common use, continued characterization of pIJ101 revealed that the initiation site for second strand replication (named sti; Deng et al., 1988) had been inadvertently deleted from the two plasmids. As a result, recombinant streptomycetes harboring pIJ486- or pIJ702-based plasmids accumulate a disproportionate amount of single-stranded plasmid DNA (Zaman et al., 1993). It has been suggested that this can lead to: i) undesirable inter- and intra-plasmid, as well as plasmid-chromosome, recombination events; ii) plasmid instability; and iii) plasmid copy numbers lower than seen with sti+ plasmids (Zaman et al., 1993).

To test the contribution of sti to overall recombinant endostatin productivity, the VAA-endostatin cassette was inserted into pIJ303, which contains all of the pIJ101 replication machinery, including most importantly the sti locus. Qualitative comparison of samples taken from equivalent fermentations shows that the pIJ303-based plasmid conferred greater endostatin productivity on S. lividans than did the pIJ702-based plasmid. This is most likely due to the increased copy number and stability of the pIJ303-based construct. It is also notable that the endostatin produced in these fermentations was stable in the culture broth, and continued to accumulate without noticeable degradation after 120 hours of culture incubation. Most likely, this results from a combination of vector stability, properly folded protein, and low water activity in the culture medium (at inoculation, the sucrose concentration was 15%; with evaporative water loss, the final concentration was certainly higher). Using a semi-quantitative western blot method, the level of endostatin produced by the S. lividans (pANT3052) strain is estimated to be 2 mg per liter of fermentation broth at the 65 hour timepoint, well into stationary growth phase. Even without the extensive optimization that is often applied in such projects (e.g. Fornwald et al., 1993), this level of production is similar to that of several other projects designed to produce and secrete mammalian proteins with Streptomyces (Binnie et al., 1997).

Summary

The Streptomyces sp. strain C5 snp locus is comprised of two divergently oriented genes, the snpA metalloproteinase gene, and the snpR gene, which encodes a LysR-like activator of snpA transcription. The transcriptional start point of snpR is immediately downstream of a strong T-N11-A inverted repeat motif likely to be the SnpR binding site, suggesting a negative autoregulatory effect of the SnpR protein on snpR transcription.

The snpA transcriptional start site is directly over the ATG of the snpA gene, making the transcript a leaderless mRNA species. The absence of any downstream box-like regions in the snpA sequence, or in the heterologous genes fused to the start codon and successfully expressed, supports the theory put forth by Resch et al. (1996) limiting the importance of non-Shine-Dalgarno sequences in the translation of leaderless mRNAs.

On extrachromosomal vectors, the snpR-activated snpA promoter was around 35 times more active than either the unactivated snpA promoter, or the well characterized melC1 promoter of pIJ702. When compared to the highly active ermE-up promoter, snpR-activated snpA transcription was about 30% higher using the aphII reporter gene of pIJ486. The highly active snpA promoter was tailored into a set of transcriptional and translational fusion expression vectors which have been employed in the intracellular expression of numerous daunomycin biosynthesis pathway genes from Streptomyces sp. strain C5, as well as the expression and secretion of soluble recombinant human endostatin.

CHAPTER 3

ISOLATION, CHARACTERIZATION AND ANALYSIS OF TWELVE Streptomyces

LOCI HOMOLOGOUS TO Streptomyces sp. strain C5 snp

Introduction

Bacteria of the genus Streptomyces are well known producers of extracellular hydrolases. Inhabiting largely terrestrial niches, streptomycetes rely on many of these hydrolases for nutritional purposes, as they degrade polymeric raw materials found in the environment into monomers and oligomers for transport and assimilation. Beyond nutritional roles, proteinases also have been demonstrated to function in the developmental life cycle, regulating key steps between growth stages, as well as in the invasion and pathogenesis processes of plant pathogenic streptomycetes.

Three similar Streptomyces genetic loci encoding unusually small neutral metalloproteinases from Streptomyces sp. strain C5 (snp; Lampel et al., 1992), S. lividans 66 (slp/prt; Butler et al., 1992; Lichenstein et al., 1992) and S. coelicolor Müller (mpr; Dammann and Wohlleben, 1992) have been reported in the literature. All three loci contain two divergently oriented genes encoding a metalloproteinase and a LysR-like transcriptional regulator. In addition, the primary structure of a similar small metalloproteinase crystallized from culture broth of S. caespitosus has been reported (Harada et al., 1995). Although genetically uncharacterized, the S. caespitosus enzyme possesses an amino acid sequence very similar to the deduced SnpA, SlpA and MprA sequences, and thus constitutes a fourth member of this proteinase class. Interestingly, the Streptomyces sp. strain C5 and S. caespitosus proteins were demonstrated both by SDS-PAGE and amino acid sequence analysis to be significantly smaller than the S. lividans and S. coelicolor enzymes, even though the homologous genes are exceedingly similar within the mature proteinase coding regions. The snpR genes exhibit a comparably high degree of similarity, but such relatedness between loci appears not to extend through the intergenic regions or the 5’ ends of the proteinase genes.

This contrast of conserved regions within the snp loci was intriguing, especially when considered in the context of the differential secretion and/or maturation characteristics exhibited by the homologous proteinases from different species. In order to gain more insight into the nature of the variable intergenic and 5’ proteinase regions, as well as to assess the distribution of snp loci among other streptomycetes, a PCR-based screen was used to evaluate diverse Streptomyces species for snp-like DNA. The results of this screening effort and comparative analyses of the newly expanded snp family are presented in this report. For the sake of consistency, all of the homologous loci will be referred to as snp herein, and differentiated instead by species of origin.

Methods and Materials

Bacterial strains and plasmids. The Streptomyces species screened for snp homologs are listed in Appendix 2. Escherichia coli Top 10 (Invitrogen, San Diego, CA) was used as the host for cloning of the PCR products generated in this study. Plasmids used in this study and harboring snp homolog inserts are summarized in Table 3.1.

Plasmid      Characteristics                                           Reference
pCRtopo2.1   3.908 kbp, TA cloning vector                              Invitrogen, Inc.
pANT1221     4.888 kbp, pCRtopo2.1 plus S. avermitilis snp locus       This study
pANT1222     4.988 kbp, pCRtopo2.1 plus S. spectabilis snp locus       This study
pANT1223     4.999 kbp, pCRtopo2.1 plus S. griseoruber snp locus       This study
pANT1224     5.011 kbp, pCRtopo2.1 plus S. ATCC 11862 snp locus        This study
pANT1225     4.827 kbp, pCRtopo2.1 plus S. thermotolerans snp locus    This study
pANT1232     5.053 kbp, pCRtopo2.1 plus S. rochei snp locus            This study
pANT1233     5.066 kbp, pCRtopo2.1 plus S. flocculus snp locus         This study
pANT1235     4.859 kbp, pCRtopo2.1 plus S. spadicis snp locus          This study
pANT1236     4.869 kbp, pCRtopo2.1 plus S. ATCC 21021 snp locus        This study
pANT1237     5.023 kbp, pCRtopo2.1 plus S. fradiae snp locus           This study
pANT1238     4.977 kbp, pCRtopo2.1 plus S. namwaensis snp locus        This study
pANT1239     4.953 kbp, pCRtopo2.1 plus S. fellus snp locus            This study

Table 3.1. Plasmids used in this study
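The plasmid sizes in Table 3.1 imply the size of each cloned insert. The sketch below assumes the insert is simply the plasmid size minus the 3.908 kbp vector, with no nucleotides gained or lost at the cloning junctions. The plasmid names are written as pANT12xx, the dictionary layout and function name are illustrative, and the sizes are copied from the table.

```python
# Sketch: insert size = plasmid size - vector size (an assumption that
# ignores any bases added or lost at the cloning junctions).
VECTOR_BP = 3908  # pCRtopo2.1, Table 3.1

PLASMID_BP = {
    "pANT1221": ("S. avermitilis", 4888),
    "pANT1222": ("S. spectabilis", 4988),
    "pANT1223": ("S. griseoruber", 4999),
    "pANT1224": ("S. ATCC 11862", 5011),
    "pANT1225": ("S. thermotolerans", 4827),
    "pANT1232": ("S. rochei", 5053),
    "pANT1233": ("S. flocculus", 5066),
    "pANT1235": ("S. spadicis", 4859),
    "pANT1236": ("S. ATCC 21021", 4869),
    "pANT1237": ("S. fradiae", 5023),
    "pANT1238": ("S. namwaensis", 4977),
    "pANT1239": ("S. fellus", 4953),
}


def insert_size_bp(plasmid_bp, vector_bp=VECTOR_BP):
    """Return the implied insert size in base pairs."""
    return plasmid_bp - vector_bp


INSERTS = {name: insert_size_bp(size) for name, (_, size) in PLASMID_BP.items()}

for name, (species, _) in PLASMID_BP.items():
    print(f"{name}  {species:<18} {INSERTS[name]} bp")
print("range:", min(INSERTS.values()), "to", max(INSERTS.values()), "bp")
```

Under this assumption the amplified fragments span roughly 0.9 to 1.2 kbp, which is the kind of size variation the later comparison of intergenic regions sets out to explain.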

Media and growth conditions. E. coli cultures were prepared using liquid or solid LB medium as described by Sambrook (1982), containing 50 µg/ml ampicillin when necessary. IPTG, X-Gal, and ampicillin were purchased from Sigma Chemical Company (St. Louis, MO). Streptomyces were grown in TSBP medium containing the following components (in g per L): Tryptic Soy Broth, 30; Yeast Extract, 1; dextrose, 5. R2YE medium, used as the solid growth medium for streptomycete cultures, was prepared as described by Hopwood et al. (1985).

Nucleic acid purification. For purposes of genomic DNA isolation, lyophilized preparations of the Streptomyces strains to be tested were rehydrated in 1 ml of sterile water, and 100 µl aliquots were used to inoculate 5 ml TSBP tubes containing four 4-mm glass beads (Fisher Scientific, Hanover Park, IL) for incubation at 29°C. When the culture became dense with mycelia, usually about 48 hours post inoculation, the biomass was harvested by centrifugation and processed for genomic DNA by the method of Hopwood et al. (procedure #4; 1985). Small aliquots of the genomic DNA were removed for PCR analysis, and the remainder stored at -20°C.

PCR methodology. The optimal oligonucleotide set chosen for the screening project consisted of primer R3 (5’-CGGCCCAGCGGGGTGGGGCGGCAGCCG-3’) and primer A1 (5’-GGCGTTGGTGCAGGACGGGCCGGG-3’). Each PCR reaction contained 1 µg of genomic DNA, 250 nmol of each primer, and 10% (v/v) DMSO in PCR Supermix (Catalog # 10572014, Life Technologies, Gaithersburg, MD). The denaturation, annealing and extension conditions were 94°C (45 seconds), 59°C (45 seconds), and 68°C (90 seconds), respectively, and after 25 cycles, a 10 minute hold at 68°C completed the reaction. The outcomes of the reactions were evaluated by standard horizontal gel electrophoresis of reaction aliquots in 0.8% or 1.0% agarose gels as described by Sambrook (1982).
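Two quantities follow directly from the primer sequences and cycling program given above: the length and GC content of each primer, and the total programmed hold time. The sketch below computes both. The function names are illustrative, and the time estimate assumes no initial denaturation step and ignores ramping between temperatures, neither of which is specified in the text.

```python
# Sketch: primer composition and programmed cycling time.
PRIMERS = {
    "R3": "CGGCCCAGCGGGGTGGGGCGGCAGCCG",
    "A1": "GGCGTTGGTGCAGGACGGGCCGGG",
}


def gc_fraction(primer):
    """Fraction of G and C bases in a primer written 5' to 3'."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def cycling_seconds(cycles, step_seconds, final_hold_seconds):
    """Sum of programmed holds; temperature ramping is not included."""
    return cycles * sum(step_seconds) + final_hold_seconds


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, {gc_fraction(seq):.0%} GC")

# 25 cycles of 45 s / 45 s / 90 s, then a 10 minute final hold.
TOTAL_SECONDS = cycling_seconds(25, (45, 45, 90), 10 * 60)
print(f"programmed hold time: {TOTAL_SECONDS / 60:.0f} min")
```

The very high GC content of both primers is consistent with the inclusion of 10% DMSO in the reactions, a common additive for GC-rich templates.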

DNA sequencing. Automated DNA sequence determination was performed with an ABI377 automated DNA sequencing system (Perkin Elmer Applied Biosystems, Foster City, CA) using the enzyme and reagents provided by the same manufacturer. The sequencing reactions were set up and performed in a PE9600 thermal cycler (Perkin Elmer) as indicated in the manufacturer’s instructions, and the reactions were loaded onto the ABI377 vertical gel system for separation. The differential fluorescence of the four terminator nucleosides was recorded by the instrument and processed by the software into an electropherogram representing the sequence of the template. Based on the quality of the electropherogram, the reliable region (usually between 400 and 600 nucleotides) of the associated sequence was excised and stored for later digital manipulation.

Sequence management and analysis. All bioinformatic and network interface programs were operated on a Pentium processor-based IBM PC compatible workstation running Microsoft Windows operating systems. Clone Manager 5 (Scientific & Educational Software, Inc., State Line, PA) was used for routine management of DNA sequence files, including such manipulations as open reading frame (ORF) searches, the generation of predicted restriction profiles, planning cloning experiments and preparing printouts. Simple analyses of the DNA and deduced amino acid sequences were carried out with utilities available on the Merck Research Labs (MRL) Bioinformatics Server (Merck & Co., Inc.) and the NCBI web site (http://www.ncbi.nlm.nih.gov/). Multiple alignments were generated using the CLUSTALW program running on the MRL Bioinformatics Server and the Pôle Bio-Informatique Lyonnais Network Protein Sequence Analysis server (PBIL-NPSA, Lyon, France; http://pbil.ibcp.fr/NPSA/) or as DOS executable programs. Alignments were viewed and manipulated with BioEdit 4.74 (Hall, 1999). Grayscale alignment printouts were obtained using the BioEdit rich text export function, or BOXSHADE running on the MRL bioinformatics server or EMBnet (European Molecular Biology network, Swiss node; http://www.ch.embnet.org/).

Phylogenetic analysis. Molecular phylogenetic analyses were conducted on sequence sets aligned using CLUSTALW and CLUSTALX, with additional manual adjustment using BioEdit. The PHYLIP package of programs (Felsenstein, 1993), running as DOS executable programs in a Windows operating environment, was used to construct phylogenetic trees by both parsimony and distance methods. Confidence values for internal lineages were determined by analyzing 100 bootstrap replicates of each aligned sequence set, generated with SEQBOOT. Distance matrices were prepared by the Kimura two-parameter method (Kimura, 1980) with DNADIST. Maximum parsimony analysis was performed with DNAPARS, and distance analyses were performed by the neighbor-joining method (Saitou and Nei, 1987) contained in the NEIGHBOR program of the PHYLIP package. Majority rule consensus trees were generated with CONSENSE, and output viewed with TreeView 1.5.2 (http://taxonomy.zoology.gla.ac.uk/rod/rod.html).
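For readers unfamiliar with the Kimura two-parameter correction, the sketch below implements its textbook closed form, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions between two aligned sequences. This is an illustration of the model only: it is not the DNADIST code, which may treat the transition/transversion ratio, gaps and ambiguity codes differently. The function name and the choice to skip any column containing a non-ACGT character are assumptions.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Columns containing anything other than A, C, G or T (e.g. gaps) are
    skipped; this is an assumption of the sketch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One transition (A->G) in ten sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

Because the correction grows faster than the raw proportion of differences, it matters most for the more divergent pairs in a data set such as the intergenic regions.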

Results

Design of primers for amplification of homologous loci. The published nucleotide sequences for snpR and homologs, as well as snpA and homologs, were aligned using CLUSTALW. After minor manual adjustments, three conserved regions between 18 and 20 nucleotides in length were selected from each alignment to serve as trial amplification primers. Figures 3.01 and 3.02 show the alignments, the regions selected, and the primers designed from the snpR and snpA sequences, respectively.

Evaluation of primer sets. In order to evaluate primer suitability for amplification of new snp loci, PCR reactions containing genomic DNA from Streptomyces sp. strain C5, S. lividans 1326, and several different S. coelicolor strains were conducted to test for amplification products of the predicted sizes. Nine different combinations of A and R primers (R3:A1, R3:A2, R3:A3, R2:A1, R2:A2, R2:A3, R1:A1, R1:A2, R1:A3) were tested for the ability to amplify the known loci using PCR conditions selected on the basis of the primer oligonucleotide melting points. As expected, the outcomes were variable (data not shown), but primer set R3:A1 proved to be the most effective pair of the nine tested, and was thus chosen for use in the screening project.
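The nine primer pairings are simply every R primer crossed with every A primer. A minimal sketch that enumerates them in the order listed in the text is shown below; the variable names are illustrative.

```python
from itertools import product

# Three snpR-derived and three snpA-derived trial primers.
R_PRIMERS = ["R3", "R2", "R1"]
A_PRIMERS = ["A1", "A2", "A3"]

# Every R primer paired with every A primer: 3 x 3 = 9 combinations.
PAIRS = [f"{r}:{a}" for r, a in product(R_PRIMERS, A_PRIMERS)]
print(len(PAIRS), PAIRS)
```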

Homolog search. For the genomic DNA screening effort, a total of 96 species of Streptomyces, uncharacterized for the presence of snp-like sequences, were selected from the Merck Natural Products Drug Discovery Culture Collection (tabulated in Appendix B). Genomic DNA from each streptomycete was prepared from freshly grown biomass and used as template in PCR reactions with primers R3 and A1.

Figure 3.01. Alignment of published snpR nucleotide sequences. This snpR sequence alignment was used to select regions suitable for PCR primer generation. The three regions selected, R1, R2 and R3, are denoted by solid bars below the sequence. Labels: S.C5, Streptomyces sp. strain C5; S. liv, S. lividans; S. co, S. coelicolor.

[Figure 3.01 alignment panel. Rows: S.C5 snpR, S.liv snpR, S.co snpR. The aligned sequence blocks are not legible in this reproduction.]

Figure 3.01. Alignment of published snpR nucleotide sequences.

Figure 3.02. Alignment of published snpA nucleotide sequences. This snpA sequence alignment was used to select regions suitable for PCR primer generation. The three regions selected, A1, A2 and A3, are denoted by solid bars below the sequence. Labels: S.C5, Streptomyces sp. strain C5; S. liv, S. lividans; S. co, S. coelicolor.

[Figure 3.02 alignment panel. Rows: S.C5 snpA, S.liv snpA, S.co snpA. The aligned sequence blocks are not legible in this reproduction.]

Figure 3.02. Alignment of published snpA nucleotide sequences.

Electrophoretic separation of reaction aliquots revealed numerous amplification products of varying sizes and intensities. Figure 3.03 shows a representative gel image from one set of reactions in which several different reaction outcomes are evident. Of the 96 strains tested, 33 showed amplification products, indicating that as many as 34% of the strains may contain snp-like DNA. Table 3.2 summarizes the PCR outcomes from the DNA samples giving positive results.

Cloning, sequencing, and properties of the snp-homologous loci. The PCR products from twelve positive reactions were chosen for isolation, cloning, and sequence determination on the basis of their resemblance (in size and intensity) to PCR products from the known protease loci. Although the other PCR products may have represented more divergent and therefore interesting homologs of the snp locus, they could also have represented products of aberrant amplification, and hence were not pursued in this study. The Taq-amplified products were cloned using the TA cloning vector pCRtopo2.1 (Invitrogen), and the resulting plasmids were purified for DNA sequence determination. Both forward and reverse strands of the cloned snp homolog inserts were sequenced using an ABI 377 automated sequencer. The nucleotide sequences of the twelve new snp loci are presented in Appendix C.

Comparative analysis of the snp loci. Primers R3 and A1 amplified segments of the new snp loci comprising a short 54 codon stretch of the 5’ end of the snpR gene, the snpA-R intergenic segment, and the snpA gene to a point approximately 40 nucleotides

115 Strong product of expected Weak product of Product not of expected Multiple PCR size expected size size products S. avermitilis S. platensis S. violaceoruber S. rutgersensis S. griseoruber S. flavogriseus S. platensis S. argenteolus S. chattanoogensis S. viridifaciens S. gnseolus ATCC14892 S. olivaceus S. graminofaciens S. parvus S. rochei S. prasinus S. melanogenes S. kanamyceticus S. tateyamensis S. fellus S. rimosus S. thermotolerans S. antibioticus S. violacoruber S. olivochromogenus S. flocculus S. spadicis S. namiwaensis S. halstedii ATCC 11862 ATCC21001 ATCC21021

...

Table 3.2. PCR product properties for snp-positive Streptomyces. See Appendix B for more detailed strain information.

Figure 3.03. Gel electrophoresis image showing PCR products amplified from a subset of the Streptomyces species tested. This gel image shows aliquots of PCR reactions using primers A1 and R3 on genomic DNA template from the following cultures: Lane 2, OSU-MA 23; Lane 3, OSU-MA 32; Lane 4, OSU-MA 33; Lane 5, OSU-MA 34; Lane 6, OSU-MA 35; Lane 7, OSU-MA 36; Lane 8, OSU-MA 37; Lane 9, OSU-MA 38; Lane 10, OSU-MA 39; Lane 11, OSU-MA 42; Lane 12, OSU-MA 43; Lane 13, OSU-MA 44; Lane 14, OSU-MA 45; Lane 15, OSU-MA 48; Lane 16, OSU-MA 49; Lane 17, OSU-MA 50; Lane 18, OSU-MA 51; Lane 19, OSU-MA 53; Lane 20, OSU-MA 54; Lane 21, OSU-MA 55. Lane S contains a 1 kbp size standard ladder, and Lane 1 an aliquot of a positive control PCR reaction containing primers A1 and R3 and plasmid pANT842 as template. See Appendix B for OSU-MA codes.

S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Figure 3.03. Gel electrophoresis image showing PCR products amplified from a subset of the Streptomyces species tested.

downstream of the sequence encoding the HEXXH active site motif of the SnpA proteinase. In this report, the homologous loci will be treated as being comprised of five different regions: i) the 5’ snpR region; ii) the snpA-R intergenic region; iii) the snpA signal peptide region; iv) the snpA propeptide region; and v) the mature protease region.

snpR. Figure 3.04 shows a protein sequence alignment of 16 deduced amino-terminal LysR-like fragments: the three published and twelve new SnpR sequences, as well as E. coli LysR. The Dodd and Egan scores, which indicate the likelihood of a particular protein sequence containing a helix-turn-helix DNA binding motif (Dodd and Egan, 1990), range between 5.09 for Streptomyces sp. strain C5 snpR (100% calculated probability of being a DNA binding sequence) and 3.87 for S. flocculus snpR (71% calculated probability of being a DNA binding sequence). With 57% identity shared across the group, the 15 streptomycete SnpR protein sequences show a high degree of relatedness. The consensus sequence for the amino-terminal 54 residues of SnpR is shown in Figure 3.05, with the helix-turn-helix motif underlined.

MELEVRHLRALCAIADXGSLHXAARXLGXXQPXLXTQLRRIEXXLGXXLFXRXR...

Figure 3.05. Consensus amino acid sequence of SnpR amino-terminus.
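A consensus of the kind shown in Figure 3.05 can be derived column by column from an alignment, writing the shared residue where a column is conserved and X where it is not. The sketch below is one way to do this and is not the procedure used to prepare the figure. The function name, the `min_fraction` threshold and the toy sequences are assumptions. A threshold below 1.0 lets a column count as conserved despite a rare substitution.

```python
from collections import Counter


def consensus(aligned, min_fraction=1.0, variable="X"):
    """Column-wise consensus of equal-length aligned sequences.

    A column is reported as its most common residue when that residue
    occurs in at least `min_fraction` of the sequences, else `variable`.
    """
    if not aligned:
        raise ValueError("no sequences given")
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count / len(column) >= min_fraction else variable)
    return "".join(out)


toy = ["MELA", "MEVA", "MELA"]  # hypothetical fragments, not SnpR data
print(consensus(toy))                    # strict identity
print(consensus(toy, min_fraction=0.6))  # majority residue tolerated
```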

In the cases of S. flocculus and S. namwaensis snpR, the ATG start codon is replaced with a GTG codon, which encodes valine at internal positions but can also serve as an initiation codon. Beyond this difference, the first 16 amino acids of the SnpR proteins are strictly conserved, with only one other exception, an alanine to valine substitution at position 10 of S. spadicis SnpR.

[Figure 3.04 alignment panel. Sequence labels: S.C5-R, S.lividans-R, S.avermitilis-R, S.griseoruber-R, S.rochei-R, S.thermotolerans-R, S.spadicis-R, S.spectabilis-R, S.ATCC21021-R, S.fradiae-R, S.fellus-R, S.namwaensis-R, S.flocculus-R, S.coelicolor-R, S.ATCC11862-R, E.coli-LysR. Secondary structure annotation: α-helix, β-turn, α-helix.]

Figure 3.04. Alignment of LysR-like proteins from Streptomyces and E. coli.

The helix-turn-helix regions of the SnpR proteins exhibited more sequence variability than the amino termini, paralleling the heterogeneity found in the snp intergenic regions.

Intergenic region. Figure 3.06 shows an alignment of the 15 intergenic regions. In contrast to the SnpR sequences, the snpA-R intergenic regions were markedly dissimilar. Ranging from 269 bp for Streptomyces sp. strain C5 to 178 bp for S. thermotolerans, the intergenic regions vary enough in size to present a significant gap-opening problem for alignment. The alignment shown was prepared manually, without the assistance of any computer programs, and was guided by the functional information elucidated in the biochemical study of the Streptomyces sp. strain C5 snp locus. The fifteen snpA and snpR start codons were held fixed relative to each other, as were the sequences immediately upstream of the snpA ATG, characterized as the promoter region of Streptomyces sp. strain C5 snpA. The numerous T-N11-A sequences in the intergenic regions were evaluated for the likelihood of function as SnpR binding sites by considering several factors. First, the presence and strength of any inverted repeat character around the T-N11-A was determined. The likely SnpR binding site of Streptomyces sp. strain C5 contains an eight bp inverted repeat centered on a T-N11-A sequence.

Next, the sequence adjacent to and within the homolog T-N11-A motifs was checked for nucleotide stretches aligning with Streptomyces sp. strain C5 snp and other members of the group. It was evident early in the alignment process that a block of snpA-proximal sequence adjacent to one T-N11-A from nearly every snp homolog showed significant similarity.
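The first of these criteria, locating T-N11-A sequences and gauging the inverted repeat character around them, lends itself to a simple computational sketch, shown below. The alignment in this study was prepared by hand, so this is not the method used. The scoring scheme (counting positions at which the left arm matches the reverse complement of the right arm, with arms measured outward from the central nucleotide of the 13-mer), the function names and the toy sequence are all assumptions.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_t_n11_a(seq):
    """0-based start positions of every T-N11-A match (overlaps allowed)."""
    return [m.start() for m in re.finditer(r"(?=T[ACGT]{11}A)", seq.upper())]


def inverted_repeat_score(seq, start, arm=6):
    """Number of positions (0..arm) at which the two arms around the
    center of the T-N11-A at `start` form an inverted repeat.

    With arm=6 the arms are exactly the two halves of the 13-mer;
    larger values extend into the flanking sequence.
    """
    seq = seq.upper()
    center = start + 6  # middle nucleotide of the 13-mer
    if center - arm < 0 or center + 1 + arm > len(seq):
        raise ValueError("window extends past the end of the sequence")
    left = seq[center - arm:center]
    right = seq[center + 1:center + 1 + arm]
    return sum(a == b for a, b in zip(left, reverse_complement(right)))


# Hypothetical sequence with one perfectly symmetric T-N11-A site.
toy = "CCTACGGCAGCCGTAGG"
for pos in find_t_n11_a(toy):
    print(pos, toy[pos:pos + 13], inverted_repeat_score(toy, pos, arm=8))
```

Ranking candidate sites by such a score would address only the inverted repeat criterion; the similarity of adjacent sequence between homologs, described above, would still need to be judged separately.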

Figure 3.06. Alignment of 15 Streptomyces species snpA-R intergenic regions. Shown as sense strands with respect to snpA, the full alignment of the snpA-R intergenic regions contains significant gapping, due to the range of sizes exhibited by the homologous loci. The snpR and snpA start codons, as well as the T-N11-A region likely to contain the SnpR binding sites, are indicated; the Streptomyces sp. strain C5 transcriptional start points (TSPs) for snpA and snpR are shown as well. Sequence codes: SC5, Streptomyces sp. strain C5; SLI, S. lividans; SAV, S. avermitilis; SGR, S. griseoruber; SRO, S. rochei; STH, S. thermotolerans; SSPA, S. spadicis; SSPE, S. spectabilis; 21021, Streptomyces ATCC 21021; SFR, S. fradiae; SFE, S. fellus; SNA, S. namwaensis; SFL, S. flocculus; SCO, S. coelicolor; 11862, Streptomyces ATCC 11862.

[Figure 3.06 alignment body not legible in this copy. Annotated features, in order from snpR to snpA: snpR start codon; Streptomyces sp. strain C5 snpR TSP; T-N11-A inverted repeat region; Streptomyces sp. strain C5 snpA TSP; snpA start codon.]

Lastly, the positions of the T-N11-A motifs relative to the snpR and snpA start codons were considered, as motifs of intermediate distance between the two open reading frames are more likely to serve as regulatory protein binding sites than those very close to the open reading frames. Because of the proximity of the Streptomyces sp. strain C5 snpR promoter to the likely SnpR binding site, the alignment was also prepared so as to juxtapose the snpR-proximal DNA adjacent to the T-N11-A motifs for direct comparison.

Figure 3.06 shows the complete nucleotide alignment of the snp intergenic regions.
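The first screening criterion, inverted repeat character around a T-N11-A site (a T, eleven arbitrary nucleotides, then an A), can be illustrated with a short script. This is only a sketch of the idea, not the manual procedure used for the alignment: the function names are hypothetical, the scoring rule (length of the contiguous complementary run that includes the T/A pair) is an assumption, and the demo sequence is invented, with GATAG and CTATC arms placed around an arbitrary seven-nucleotide spacer.

```python
# Watson-Crick partners used to test for inverted-repeat symmetry.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_t_n11_a(seq):
    """Return the start index of every T-N11-A site (T, any 11 nt, A)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 12)
            if seq[i] == "T" and seq[i + 12] == "A"]


def inverted_repeat_arm(seq, site):
    """Length of the contiguous inverted-repeat arm that includes the T/A pair.

    The site is centered on index site + 6. Position center - k is compared
    with position center + k; the T and the A themselves sit at k = 6.
    """
    seq = seq.upper()
    center = site + 6

    def paired(k):
        left, right = center - k, center + k
        return left >= 0 and right < len(seq) and PAIR.get(seq[left]) == seq[right]

    arm = 1          # the T/A pair at k = 6
    k = 7            # extend outward, away from the center
    while paired(k):
        arm += 1
        k += 1
    k = 5            # extend inward, toward the center
    while k >= 1 and paired(k):
        arm += 1
        k -= 1
    return arm


# Hypothetical toy sequence (assumption): flanks and spacer are invented.
demo = "AA" + "GATAG" + "C" * 7 + "CTATC" + "AA"
sites = find_t_n11_a(demo)
arms = {s: inverted_repeat_arm(demo, s) for s in sites}
print(sites, arms)
```

Run over a real intergenic sequence, the same two functions would list every candidate site together with its arm length, which is the kind of ranking the first criterion describes.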

Immediately upstream of the snpR start codon lie four regions of similarity, shown along with the T-N11-A region as the sense strand with respect to snpR in Figure 3.07. The first sequence, TANNCTC, starts between five and seven nucleotides upstream of the start codon. Two nucleotides further upstream lies the conserved sequence CCGC in eleven of the fifteen sequences. The hexamer CTGGTG, well conserved in all cases, lies thirteen to fourteen nucleotides upstream of the CCGC pattern. Lastly, a TTTG sequence between four and five nucleotides from the CTGGTG sequence is present in seven cases, and partly represented in five more (Figure 3.07).

The T-N11-A inverted repeat region, shown as part of the intergenic region alignment in Figure 3.06 and in greater detail in Figure 3.08, exhibits substantial similarity both within the inverted repeat area and in the snpA-proximal adjacent sequences. The sequence GATAG-N?-CTATC comprises the core of the motif, and is perfectly represented in eight of the fifteen cases, and partly present in six additional cases. A notable exception is S. namwaensis, the snpA-R intergenic DNA of which does not contain a T-N11-A sequence resembling the other members of the group. Excepting S. namwaensis snp, all sequences possess an inverted repeat of at least four nucleotides

Figure 3.07. Section of snpA-R intergenic region alignment showing sense strand with respect to snpR. A reverse complement of the snpR half of the alignment shown in Figure 3.06, the T-N11-A region, gapped region, and snpR start codon are indicated; the transcriptional start point of Streptomyces sp. strain C5 snpR is indicated with the arrow. The numbers within the alignment gap denote number of nucleotides comprising the individual gaps. The sequence codes are the same as in Figure 3.06.

[Figure 3.07 alignment body not legible in this copy. Annotated features: Streptomyces sp. strain C5 snpR transcriptional start point; snpR start codon; alignment gap; T-N11-A region.]

[Figure 3.08 alignment body not legible in this copy. Rows, in order: Streptomyces sp. strain C5, S. lividans, S. avermitilis, S. griseoruber, S. rochei, S. thermotolerans, S. spadicis, S. spectabilis, Streptomyces ATCC 21021, S. fradiae, S. fellus, S. namwaensis, S. flocculus, S. coelicolor, Streptomyces ATCC 11862.]

Figure 3.08. The T-N11-A inverted repeat region of the snpA-R intergenic region alignment. Identical nucleotides are shaded gray, the thymidine and adenine of the T-N11-A motif are in bold, and the inverted repeats are underlined.

(such as for Streptomyces ATCC 11862, S. coelicolor, and S. avermitilis snp), with a mean of six nucleotides and a maximum of eight (such as for Streptomyces sp. strain C5, S. lividans, S. rochei, and S. spadicis snp). The adjacent sequences snpA-proximal to the motif also show a high degree of relatedness, with the sequence TGCCATCAT perfectly represented in nine cases and partly represented in another five. Interestingly, the sequences snpR-proximal to the likely SnpR binding sites show very little similarity to Streptomyces sp. strain C5 snp DNA. A higher degree of relatedness was expected given that this region contains the Streptomyces sp. strain C5 snpR promoter.

The last area analyzed within the intergenic region, the DNA immediately upstream of the snpA start codon, can be divided into two subgroups of similar sequences. Figure 3.09 shows an expanded view of the regions proximal to the snpA start codon, with the members aligned as two distinct groups. Group A contains Streptomyces sp. strain C5, S. lividans, S. avermitilis, S. griseoruber, S. rochei, and S. thermotolerans snp elements, and exhibits three areas of sequence identity. The first and most strongly conserved region lies immediately upstream of the snpA start codon, and is characterized by the following pattern: CAAGGAGTCATCGATG (the terminal ATG, in boldface in the original, is the start codon). This sequence is perfectly represented in four of the six members of group A, and partially represented in the other two (eleven out of the thirteen nucleotides for S. thermotolerans and twelve out of thirteen for Streptomyces sp. strain C5). Second, three nucleotides 5’ to this motif is a stretch of four or five cytosine residues partially represented in all six cases. Lastly, between four and ten nucleotides further upstream is the sequence TGACGACTTC, perfectly represented in four of the six group A regions, and partly represented in a fifth (Streptomyces sp. strain C5, with seven out of ten nucleotides

[Figure 3.09 alignment body not legible in this copy. Panel A: group A (Streptomyces sp. strain C5, S. lividans, S. avermitilis, S. griseoruber, S. rochei, S. thermotolerans). Panel B: group B (S. spectabilis, Streptomyces ATCC 21021, S. fradiae, S. fellus, S. namwaensis, S. flocculus, S. coelicolor, Streptomyces ATCC 11862, S. spadicis). The snpA start codon is marked in both panels.]

Figure 3.09. Two dissimilar groups of snpA upstream regions.

conserved). Group B contains S. spectabilis, Streptomyces ATCC 21021, S. fradiae, S. fellus, S. namwaensis, S. flocculus, S. coelicolor, Streptomyces ATCC 11862, and S. spadicis snp elements, and is characterized by the pattern AGGAG(C)6-7 immediately upstream of the snpA start codon. Unlike group A, these sequences do not exhibit any significantly conserved regions upstream of the AGGAG(C)6-7 motif. The S. spadicis sequence is notable in that it is the only case containing a thymidine residue in the center of the cytosine stretch.

SnpA proteinase signal peptide. Figure 3.10 shows an amino acid alignment of fifteen deduced SnpA signal peptides. The positions of likely signal peptidase cleavage (and therefore the carboxyl-termini of the signal peptides) were determined in the following manner. First, the likely processing sites posited in the published reports characterizing SnpA, SlpA and MprA (Lampel et al., 1992; Butler et al., 1992; Dammann and Wohlleben, 1992) were used as indicators of the SnpA signal peptide terminus. Second, all fifteen amino acid sequences were aligned, with most of the new homologs exhibiting clear homology to the published sequences in the signal peptidase cleavage region. Lastly, where an obvious cleavage site was not apparent, the position of other signature patterns was used to determine the likely cleavage position.

The signal peptide sequences varied in length from the 26-residue Streptomyces ATCC 21021 peptide to the 38-residue S. spadicis peptide, and had a mean length of 29.2 residues. The obvious outlier is the S. spadicis signal sequence, at least eight residues longer than any of the other signal peptides (see discussion). Three features common to the SnpA signal peptides are apparent. First, within the cationic amino terminal region of the signal sequences, twelve of the fifteen proteins contain a positively

[Figure 3.10 alignment body not legible in this copy. Rows, in order: Streptomyces sp. strain C5, S. lividans, S. avermitilis, S. griseoruber, S. rochei, S. thermotolerans, S. spadicis, S. spectabilis, Streptomyces ATCC 21021, S. fradiae, S. fellus, S. namwaensis, S. flocculus, S. coelicolor, Streptomyces ATCC 11862.]

Figure 3.10. Alignment of snp signal peptide sequences. Similar residues are shaded gray, and identical residues shaded black.

charged arginine residue in the second position, making the most frequent amino-terminal dipeptide MR. Of the three exceptions, two contain substitutions of similar charge: histidine for S. spectabilis SnpA, and lysine for Streptomyces ATCC 21021 SnpA. The cationicity of these amino-terminal segments ranges from +1 (Streptomyces sp. strain C5, S. lividans, S. rochei, and S. avermitilis) to +5 (S. spadicis), with a mean charge of +2.25. A serine residue, conserved in eleven of the fifteen sequences, separates the amino-terminal cationic segments and the central hydrophobic cores of the signal peptides. Within the hydrophobic core, a signature pattern, GLGLXXA, is well represented in most peptides, and was used to guide the assignment of the likely S. spadicis signal-peptidase cleavage site. Lastly, the polar carboxyl-terminal region of the signal peptides contains a PAXA motif present in eleven of the fifteen cases, and partially in the remaining four.

The SnpA propeptide region. Between the PAXA motifs and the amino-termini of mature Streptomyces sp. strain C5 and S. caespitosus SnpA (Lampel et al., 1992; Harada et al., 1995) lies the SnpA propeptide. As inferred from studies of other well-characterized proteinases, the SnpA propeptide sequence most likely serves to hold the recently secreted protein in a proenzyme, or zymogen, state. However, propeptide processing clearly differs between the Streptomyces sp. strain C5 and S. caespitosus proteinases on the one hand and those from S. lividans and S. coelicolor on the other. The reports by Butler et al. (1992) and Dammann and Wohlleben (1992) clearly show an active proteinase of significantly greater size than those reported from Streptomyces sp. strain C5 and S. caespitosus. Potential explanations for these observations are discussed below. For the purposes of these analyses, all fifteen sequences were considered to contain a propeptide segment starting at the +1 residue after the signal peptidase cleavage site and ending at the VTV motif representing the mature proteinase amino-terminus (see mature proteinase alignment results, below).

The propeptide regions of the fifteen SnpA proteinases range in size from 55 amino acids for Streptomyces sp. strain C5 to 46 amino acids for S. fradiae, with a mean propeptide length of 49.4 residues. Upon alignment, two regions of differing relatedness were evident: an amino-terminal region exhibiting low sequence conservation, and a carboxyl-terminal region exhibiting high sequence conservation. Figure 3.11 shows an alignment of the propeptide sequences. Extensive manual and computer-assisted evaluation of the amino-terminal propeptide region yielded no definitive residues likely to be conserved between the homologous proteins. The carboxyl-terminal region, however, contains the conserved consensus sequence AFFEAVXKSVAEKRAANP, with fourteen of the fifteen propeptides containing a version of this motif. The one exception, the S. flocculus propeptide, has only the phenylalanyl, lysyl, and prolyl motif elements.

The SnpA mature proteinase. The amino-terminus of Streptomyces sp. strain C5 SnpA was determined to be AAVTV (Lampel et al., 1992), while the amino-terminus of S. caespitosus SnpA was reported to be TVTV (Harada et al., 1995). The VTV tripeptide, common to both, was found in all fifteen SnpA sequences; for the purposes of these analyses it is treated as the common amino-terminus of the mature proteinases. Although the different mature SnpA proteinases likely have heterogeneous amino-termini, this simplification does not adversely affect the quality of the conclusions drawn herein.

[Figure 3.11 alignment body not legible in this copy. Rows, in order: Streptomyces sp. strain C5, S. lividans, S. avermitilis, S. griseoruber, S. rochei, S. thermotolerans, S. spadicis, S. spectabilis, Streptomyces ATCC 21021, S. fradiae, S. fellus, S. namwaensis, S. flocculus, S. coelicolor, Streptomyces ATCC 11862.]

Figure 3.11. Amino-acid alignment of SnpA propeptide sequences. Similar residues are shaded gray, and identical residues shaded black. The amino-terminus experimentally determined for S. lividans SnpA is underlined.

The 3’ end of primer A1 anneals to Streptomyces sp. strain C5 snpA 571 nucleotides downstream of the ATG adenine, amplifying a fragment containing 108 of 146 snpA codons (ca. 75% of the gene, not counting the sequence contributed by the primer; see Figure 3.02). Thus, the newly isolated loci contain three quarters of the mature proteinase sequences. Figure 3.12 shows an alignment of sixteen deduced SnpA mature proteinase amino-acid sequences, as well as the S. caespitosus SnpA sequence, which was determined by Edman degradation and X-ray crystallographic analyses (Harada et al., 1991, 1995). Sharing 55% identity, these deduced mature proteinase amino-acid sequences show a very high degree of relatedness. Upon alignment, it is apparent that the first ca. 35% of the protein primary sequences exhibit a greater degree of variation than the remaining portions of the sequences. Also, comparison of the published SnpA sequences suggests that this high degree of relatedness likely decreases beyond the point of the A1 primer, and the different SnpA homologs probably possess carboxyl-termini heterogeneous both in length and sequence.

Ranging in length between 106 and 108 amino-acid residues, the mature sequences have only one region of gapping in the alignment, around residue 39 of the mature proteinases. This gapped region roughly delineates the amino-terminal variable region from the more strictly conserved remainder of the mature proteinases. Lying within the variable region are the consensus sequences APSF (with strictly conserved residues in boldface), followed three residues later by IAXSTQIWNSSVSNVXLQ. The consensus sequence for the remainder of the mature proteinase, carboxyl-terminal to the gapped region, is shown in Figure 3.13.

Figure 3.12. Alignment of mature SnpA primary sequences. Similar residues are shaded gray, and identical residues shaded black. The numbering scheme corresponds to the S. caespitosus proteinase.

[Figure 3.12 alignment body not legible in this copy; the active site is marked in the original. Only the following carboxyl-terminal segments survive.]

S.C5-A          PGPSCTNAYPNSAERSRVNQLWANGFAAAMDKALEKSAR*
S.lividans-A    PGPSCTNPYPNSTERSRVNQLWAYGFQAALDKALEKASQR*
S.coelicolor-A  PGPSCTNAQPDSAERSRVEQLWANGLAEAAAEVR*
S.caespitosus-A PGPSCTNPYPNAQERSRVNALWANG*

[Figure 3.13 consensus, residue scale marked from 50 to 100 in steps of 10 in the original.]

DFXYREGNDXRGSYASTDGHGXGYIFLDYXQNQQYXSTKVTAHETGHVLGLPDHYSGPCSEIHSGGG

Figure 3.13. Consensus SnpA sequence for the mature proteinase conserved region. Completely conserved residues are shown in boldface; numbering taken from Streptomyces sp. strain C5 SnpA. Active site zinc ligand residues are indicated by asterisks.

Between residues 80 and 100 (Streptomyces sp. strain C5 SnpA numbering, see Figure 3.13) lies the SnpA active site. It comprises the totally conserved HETGHVLGLPD motif, in which the two histidine residues and the aspartic acid residue serve as the three ligands for the catalytically essential zinc ion (Kurisu et al., 1997). Carboxyl-terminal to this motif lie sequences essential to the structure of the active site, which also show a high degree of relatedness.
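As a small check on the description above, the active-site motif can be located within the consensus string of Figure 3.13. The sketch below assumes the consensus as transcribed from this copy of the figure (the string may carry extraction errors), and the positions it reports are 0-based offsets within that fragment only, not SnpA residue numbers.

```python
# Consensus of the conserved region as transcribed from Figure 3.13
# (assumption: the transcription is faithful; X marks a variable position).
CONSENSUS = ("DFXYREGNDXRGSYASTDGHGXGYIFLDYXQNQQYXSTKVTA"
             "HETGHVLGLPDHYSGPCSEIHSGGG")
MOTIF = "HETGHVLGLPD"

# Offset of the motif within the fragment (-1 would mean "not found").
start = CONSENSUS.find(MOTIF)

# Candidate zinc ligands: the histidines and the aspartate inside the motif.
ligands = [(aa, start + i) for i, aa in enumerate(MOTIF) if aa in "HD"]

print(start, ligands)
```

The three residues returned are the two histidines and the aspartate of the motif, matching the three zinc ligands named in the text.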

Phylogenetic analysis of the snp locus. Table 3.3 provides the pairwise difference values between the various new and published snpR gene fragments. The data were generated with DNADIST from a nucleic acid alignment corresponding exactly to the amino acid alignment shown in Figure 3.04. The values range from 0.00 to 0.3768, with an average score of 0.2637, indicating all fifteen sequences are fairly closely related.

Figure 3.14 shows an unrooted neighbor-joining phylogenetic tree derived from 100 bootstrap replicates of the snpR alignment, and Figure 3.15 shows a maximum parsimony tree generated from the same data set. Since phylogenetic relationships with bootstrap

            1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
1. SC5R
2. SlivR    0.2066
3. SavR     0.1470  0.1733
4. SgriR    0.2066  0.0000  0.1733
5. SrocR    0.2301  0.0918  0.2096  0.0918
6. StheR    0.2352  0.2500  0.1884  0.2500  0.2076
7. SspadR   0.2732  0.3582  0.2808  0.3582  0.3231  0.2600
8. SspecR   0.2539  0.2426  0.1546  0.2426  0.2539  0.2526  0.2561
9. 21021R   0.2655  0.2808  0.1795  0.2808  0.2930  0.2438  0.2414  0.0918
10. SfradR  0.2900  0.3118  0.2464  0.3118  0.2962  0.2946  0.2056  0.1646  0.2003
11. SfelR   0.2979  0.3007  0.2655  0.3007  0.2761  0.2930  0.2046  0.1812  0.2181  0.0378
12. SnamR   0.3464  0.3564  0.3445  0.3564  0.3404  0.3213  0.3213  0.2914  0.2614  0.2614  0.2539
13. SflocR  0.3152  0.2294  0.2305  0.2294  0.2689  0.2883  0.2703  0.1708  0.1539  0.2364  0.2579  0.2243
14. ScoR    0.3170  0.3666  0.2991  0.3666  0.3404  0.3445  0.3644  0.2914  0.3195  0.2946  0.2899  0.3687  0.3524
15. 11862R  0.3152  0.3544  0.2975  0.3544  0.3526  0.3564  0.3768  0.2808  0.3084  0.2838  0.3007  0.3666  0.3406  0.0123

Table 3.3. Pairwise sequence differences between snpR gene fragments. Pairwise sequence differences calculated using DNADIST. Abbreviations: SC5R, Streptomyces C5 snpR; SlivR, S. lividans snpR; SavR, S. avermitilis snpR; SgriR, S. griseoruber snpR; SrocR, S. rochei snpR; StheR, S. thermotolerans snpR; SspadR, S. spadicis snpR; SspecR, S. spectabilis snpR; 21021R, Streptomyces ATCC 21021 snpR; SfradR, S. fradiae snpR; SfelR, S. fellus snpR; SnamR, S. namwaensis snpR; SflocR, S. flocculus snpR; ScoR, S. coelicolor snpR; 11862R, Streptomyces ATCC 11862 snpR.

[Figure 3.14 tree image not reproduced; branch topology and bootstrap values could not be recovered from this copy.]

Figure 3.14. Unrooted, neighbor-joining phylogenetic tree for homologous snpR gene fragments.

[Figure 3.15 tree image not reproduced; branch topology and bootstrap values could not be recovered from this copy.]

Figure 3.15. Unrooted maximum-parsimony phylogenetic tree for homologous snpR gene fragments

probability values greater than 95% are considered to be statistically significant, comparison of the neighbor-joining and maximum parsimony trees derived from the 5’ ends of the snpR genes suggests that monophyletic relationships exist between snpR homologs from the following groups: 1) S. lividans, S. griseoruber, and S. rochei; 2) S. coelicolor and Streptomyces ATCC 11862; and 3) S. fradiae and S. fellus.
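The summary figures quoted for Table 3.3 (range 0.00 to 0.3768, average 0.2637) can be re-derived from the lower-triangular matrix itself. The sketch below assumes the values as transcribed from this copy of the table; the variable names are illustrative only.

```python
NAMES = ["SC5R", "SlivR", "SavR", "SgriR", "SrocR", "StheR", "SspadR", "SspecR",
         "21021R", "SfradR", "SfelR", "SnamR", "SflocR", "ScoR", "11862R"]

# Lower triangle of Table 3.3: row i holds distances from NAMES[i]
# to NAMES[0] .. NAMES[i - 1] (values as transcribed from this copy).
LOWER = [
    [],
    [0.2066],
    [0.1470, 0.1733],
    [0.2066, 0.0000, 0.1733],
    [0.2301, 0.0918, 0.2096, 0.0918],
    [0.2352, 0.2500, 0.1884, 0.2500, 0.2076],
    [0.2732, 0.3582, 0.2808, 0.3582, 0.3231, 0.2600],
    [0.2539, 0.2426, 0.1546, 0.2426, 0.2539, 0.2526, 0.2561],
    [0.2655, 0.2808, 0.1795, 0.2808, 0.2930, 0.2438, 0.2414, 0.0918],
    [0.2900, 0.3118, 0.2464, 0.3118, 0.2962, 0.2946, 0.2056, 0.1646, 0.2003],
    [0.2979, 0.3007, 0.2655, 0.3007, 0.2761, 0.2930, 0.2046, 0.1812, 0.2181,
     0.0378],
    [0.3464, 0.3564, 0.3445, 0.3564, 0.3404, 0.3213, 0.3213, 0.2914, 0.2614,
     0.2614, 0.2539],
    [0.3152, 0.2294, 0.2305, 0.2294, 0.2689, 0.2883, 0.2703, 0.1708, 0.1539,
     0.2364, 0.2579, 0.2243],
    [0.3170, 0.3666, 0.2991, 0.3666, 0.3404, 0.3445, 0.3644, 0.2914, 0.3195,
     0.2946, 0.2899, 0.3687, 0.3524],
    [0.3152, 0.3544, 0.2975, 0.3544, 0.3526, 0.3564, 0.3768, 0.2808, 0.3084,
     0.2838, 0.3007, 0.3666, 0.3406, 0.0123],
]

# Map every unordered pair to its distance.
pairs = {(NAMES[i], NAMES[j]): d
         for i, row in enumerate(LOWER) for j, d in enumerate(row)}

values = list(pairs.values())
mean = sum(values) / len(values)
closest = min(pairs, key=pairs.get)
farthest = max(pairs, key=pairs.get)

print(len(values), min(values), max(values), round(mean, 4))
print("closest:", closest, "farthest:", farthest)
```

With fifteen sequences there are 105 unordered pairs; the single zero distance belongs to the S. lividans and S. griseoruber pair.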

Table 3.4 provides a pairwise difference distance matrix for the snpA gene segments encoding the mature SnpA proteinase. Again, the data were generated with DNADIST from a nucleic acid alignment corresponding to the amino acid alignment shown in Figure 3.12; however, positions containing gaps in the original alignment were stripped for the phylogenetic analyses. The values range from 0.00 to 0.3328, with an average score of 0.2380, again indicating that all fifteen sequences are closely related. In the case of the snpA distance matrix, the 0.000 value was returned for the S. lividans versus S. griseoruber pair because the sequences are nearly identical, and the dissimilar nucleotides fell within the stripped region of the alignment. In the case of the snpR distance matrix, the short stretches of S. lividans and S. griseoruber snpR DNA shared 100% identity, and hence returned a distance of 0.00 as well. Figure 3.16 shows an unrooted neighbor-joining phylogenetic tree derived from 100 bootstrap replicates of the snpA alignment, and Figure 3.17 shows a maximum parsimony tree generated from the same set of replicates. Comparison of the neighbor-joining and maximum parsimony trees derived from the mature SnpA-encoding regions suggests monophyletic relationships between snpA homologs from the following groups: 1) S. lividans and S. griseoruber; 2) S. coelicolor and Streptomyces ATCC 11862; 3) S. avermitilis and S. thermotolerans; and 4) S. fradiae and S. fellus.
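The effect of stripping gapped positions, and why it can drive a distance to zero, can be shown with a toy alignment. This is a sketch under stated assumptions: the three sequences are invented, the function names are hypothetical, and the simple proportion of differing sites used here is a stand-in for the model-based distance that DNADIST computes.

```python
def strip_gapped_columns(alignment, gap="-"):
    """Drop every column in which at least one sequence carries a gap."""
    rows = list(alignment.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must have equal length")
    keep = [c for c in range(len(rows[0])) if all(r[c] != gap for r in rows)]
    return {name: "".join(seq[c] for c in keep)
            for name, seq in alignment.items()}


def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Invented toy alignment: seq1 and seq2 differ only at column 3,
# which is also the column where seq3 carries a gap.
toy = {
    "seq1": "ACGTTGCA",
    "seq2": "ACGATGCA",
    "seq3": "ACG-TGCT",
}

before = p_distance(toy["seq1"], toy["seq2"])
stripped = strip_gapped_columns(toy)
after = p_distance(stripped["seq1"], stripped["seq2"])
print(before, after)
```

In the toy case the only site distinguishing seq1 from seq2 is removed with the gapped column, so their distance falls from 0.125 to zero, the same situation described for the S. lividans and S. griseoruber snpA pair.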

            1       2       3       4       5       6       7       8       9       10      11      12      13      14      15
1. SC5A
2. SlivA    0.2356
3. SavA     0.2034  0.2325
4. SgriA    0.2356  0.0000  0.2325
5. SrocA    0.2102  0.0998  0.2458  0.0998
6. StheA    0.1767  0.2349  0.1006  0.2349  0.2281
7. SspadA   0.2510  0.1947  0.2275  0.1947  0.2217  0.2177
8. SspecA   0.3053  0.3155  0.3240  0.3155  0.3287  0.3141  0.3178
9. 21021A   0.2522  0.2749  0.2381  0.2749  0.2781  0.2543  0.2634  0.2536
10. SfradA  0.2836  0.1574  0.2504  0.1574  0.1607  0.2200  0.2517  0.3178  0.3328
11. SfelA   0.2732  0.1459  0.2381  0.1459  0.1404  0.2151  0.2272  0.2907  0.3237  0.0603
12. SnamA   0.2649  0.2820  0.2781  0.2820  0.2995  0.2375  0.2536  0.2931  0.2337  0.2907  0.2547
13. SflocA  0.2956  0.2541  0.2666  0.2541  0.2501  0.2574  0.2805  0.3053  0.3070  0.2592  0.2209  0.2620
14. ScoA    0.2517  0.2400  0.2543  0.2400  0.2331  0.2497  0.2337  0.2287  0.1881  0.2710  0.2311  0.2452  0.2688
15. 11862A  0.2510  0.2394  0.2536  0.2394  0.2325  0.2490  0.2232  0.2183  0.1876  0.2702  0.2305  0.2445  0.2574  0.0127

Table 3.4. Pairwise sequence differences between snpA mature gene regions. Pairwise sequence differences calculated using DNADIST. Abbreviations: SC5A, Streptomyces C5 snpA; SlivA, S. lividans snpA; SavA, S. avermitilis snpA; SgriA, S. griseoruber snpA; SrocA, S. rochei snpA; StheA, S. thermotolerans snpA; SspadA, S. spadicis snpA; SspecA, S. spectabilis snpA; 21021A, Streptomyces ATCC 21021 snpA; SfradA, S. fradiae snpA; SfelA, S. fellus snpA; SnamA, S. namwaensis snpA; SflocA, S. flocculus snpA; ScoA, S. coelicolor snpA; 11862A, Streptomyces ATCC 11862 snpA.

[Figure 3.16 tree image not reproduced; branch topology and bootstrap values could not be recovered from this copy.]

Figure 3.16. Unrooted neighbor-joining phylogenetic tree for snpA sequences encoding the mature proteinase.

[Figure 3.17 tree image not reproduced; branch topology and bootstrap values could not be recovered from this copy.]

Figure 3.17. Unrooted maximum parsimony phylogenetic tree for the snpA gene regions encoding the mature proteinase.

Discussion

This report details the isolation and characterization of genetic loci encoding new members of the Streptomyces small neutral proteinase family. Originally undertaken to provide further insight into the structure and function of the snpA-R intergenic region, this effort has provided additional information about the LysR-like regulatory gene snpR, the SnpA signal peptide and propeptide, and the mature proteinase itself. Moreover, insight into the distribution and phylogeny of the Streptomyces snp locus is also now apparent. Reports by Lampel et al., Dammann and Wohlleben, Butler et al., and Lichenstein et al. detailing the genetic characterization of the first members of the Streptomyces small neutral proteinase family all appeared in 1992. Designated snp in Streptomyces sp. strain C5 (Lampel et al., 1992), slp or prt in S. lividans (Butler et al. and Lichenstein et al., respectively, both 1992), and mpr in S. coelicolor Müller (Dammann and Wohlleben, 1992), the loci are composed of two divergently oriented genes: the proteinase, and its associated LysR-like transcriptional activator. Although the three loci are clearly homologous, significant differences were apparent. Two of the more pronounced were: i) the difference in length and sequence of the intergenic regions; and ii) the different observed sizes of the mature secreted proteinases from S. lividans and S. coelicolor compared with Streptomyces sp. strain C5 SnpA. As part of this study, twelve additional snp loci have been cloned and sequenced. While clarifying some historical issues, the new data also pose fresh questions about the streptomycete snp locus.

As a group, the fifteen different snp loci share two regions of high sequence similarity: the 5’ end of the snpR gene, encoding the helix-turn-helix motif of the DNA binding protein, and the middle portion of the snpA gene, encoding the mature proteinase. Thus, in retrospect, it was not unreasonable to use primers designed from these regions to amplify homologous DNA fragments. Of the 96 Streptomyces isolates screened, 33 appeared to contain snp-like DNA, of which eighteen gave PCR products of size comparable to those from the characterized loci. From these, twelve were chosen for isolation, cloning, and sequence determination. Numerous instances of atypical amplification products were observed (Table 3.2), and could represent more diverse members of the small neutral proteinase family. S. rutgersensis, S. argenteolus and Streptomyces ATCC 14892 exhibited multiple amplification products of different size, suggesting paralogous proteinase loci may be present in these organisms. The orthologous snp loci, however, are the primary focus of this study. The snp-positive strains were fairly widely distributed among the various Streptomyces taxa (Figure 3.18).

For the comparative analyses, the locus was treated as a modular entity composed of five regions: the snpR ORF encoding the helix-turn-helix motif of SnpR, the snpA-R intergenic region, the snpA signal sequence, the snpA propeptide, and the snpA mature proteinase. Alignment of the snpR gene fragments revealed that the 5’ portion of the gene is highly conserved among the fifteen members. In fact, the LysR-like transcriptional regulator proteins as a class exhibit significant similarity in the amino-terminal regions of the proteins, but very little similarity in the more carboxyl-terminal portions of the proteins (Schell, 1993). The first sixteen amino acids are completely conserved, suggesting that snpR derives from a common ancestor and that the amino-terminus of the encoded protein is critical to proper function of the protein. In order to compare this level of observed SnpR similarity to an analogous set of LysR-like homologs, an

Figure 3.18. Distribution of snp loci among different Streptomyces taxa. The Streptomyces clusters containing species that tested positive for snp-like DNA are indicated with (+). Not all 33 snp-positive streptomycetes could be placed into this classification. Adapted from Williams et al., 1983.

[Figure 3.18 dendrogram not legible in this copy. The similarity scale runs from 0.63 to 0.99; cluster labels could not be recovered.]

150 alignment was prepared of 12 Rhizobium species nodD\ genes, which encode LysR-like activators of nodulation genes (Honma and Ausubel, 1987; Mulligan and Long, 1989).

The results showed that the deduced NodDl proteins exhibited a similarly high degree of relatedness in the first 60 amino-terminal amino acid residues (data not shown). Within the helix-tum-helix region (as defined by the method of Dodd and Egan), the amino acid sequence also was more variable. This may be a reflection of lower selective pressure, or perhaps parallel evolution of SnpR with sequences of the intergenic region, such that proper complementarity with the cognate binding site is maintained. Beyond the helix- tum-helix motif, some similarity was observed, but to a lesser degree than the amino- terminus. Again, this is consistent with the LysR-like proteins as a class (Schell, 1993).

The 3’ snpR sequences were not isolated in this study, which is unfortunate because the carboxyl-terminal regions of these transcriptional regulators often contain structural features tailored to the specific locus being regulated, such as co-inducer binding motifs (Schell, 1993). Additional structural information could shed light on SnpR function, as the nature of snp locus expression is not completely understood.

The sequences of the snpA-R intergenic regions, in contrast to the snpR genes, show substantially more variation. After several computer-based algorithms gave unsatisfactory results, an alignment that emphasizes the similarities of three key functional regions of the intergenic sequence (the snpR promoter region, the T-Nn-A putative SnpR binding region, and the snpA promoter) was arrived at manually. Immediately upstream from the snpR start codon lies a region with several well-conserved blocks of sequence. Continuing toward snpA, a stretch of between 10 and 100 nucleotides precedes the T-Nn-A regions implicated in SnpR binding. Another stretch of sequence ranging in size from ca. 45 to 115 nucleotides separates the T-Nn-A motif from several conserved regions just upstream of the snpA start codon. The conserved sequences adjacent to the snpR start codon are noteworthy, especially because the Streptomyces sp. strain C5 snpR promoter is located 138 bp upstream of the start codon. Since none of these elements has any significant complementarity to the 3’ end of S. lividans 16S rRNA, it is likely that these regions have been conserved through evolution because of functional importance, perhaps to serve as binding sites for unknown regulatory proteins. Translation initiation in the different snp-containing streptomycetes is likely primed by different sequences within the upstream untranslated region of the homologous snpR transcripts. The ability of Streptomyces to express foreign genes with non-streptomycete ribosome-binding sites, and the wide variety of ribosome-binding site sequences found in streptomycete genes, have led to the theory that extensive similarity between 5’ untranslated mRNA sequences and 3’ 16S rRNA termini is not required for translation initiation (Strohl, 1992).

The region designated as the likely SnpR binding site in these loci is signified by a conserved T-Nn-A motif within an inverted repeat, followed by a TGNCATCAT sequence (sense strand with respect to snpA). Although the fifteen snp sequences exhibit substantial similarity in this region, the location of the putative SnpR binding motif with respect to the snpR and snpA genes varies considerably. In the case of Streptomyces sp. strain C5 snp, the T-Nn-A motif overlaps the -35 region of the snpR promoter, suggesting that Streptomyces sp. strain C5 SnpR may negatively autoregulate its own transcription. Since the homologous loci do not exhibit conserved sequences within the region known to be the snpR promoter in Streptomyces sp. strain C5, and since the distances between the T-Nn-A motifs and their respective snpR start codons vary by as much as 100 nucleotides, it is plausible that snpR expression is regulated differently among the snp-positive Streptomyces species.
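The positional comparison described above can be illustrated computationally. The following is a minimal sketch and not the procedure used in this study: it scans a sequence for a T-N(n)-A motif whose flanking arms form an inverted repeat, so that the position of each hit can then be compared with the positions of the start codons. The spacer length of 11, the arm length of 3, the function names, and the example sequence are all assumptions introduced purely for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_t_n_a_sites(seq, spacer=11, arm=3, max_mismatch=0):
    """Return indices of the T in each T-N(spacer)-A motif whose arms
    (the `arm` bases ending at T and the `arm` bases starting at A)
    form an inverted repeat with at most `max_mismatch` mismatches.
    The default spacer and arm lengths are illustrative assumptions."""
    seq = seq.upper()
    hits = []
    for t_pos in range(arm - 1, len(seq) - spacer - arm):
        a_pos = t_pos + spacer + 1
        if seq[t_pos] != "T" or seq[a_pos] != "A":
            continue
        left = seq[t_pos - arm + 1 : t_pos + 1]  # arm ending at the T
        right = seq[a_pos : a_pos + arm]         # arm starting at the A
        mismatches = sum(
            1 for x, y in zip(left, reverse_complement(right)) if x != y
        )
        if mismatches <= max_mismatch:
            hits.append(t_pos)
    return hits


# Hypothetical sequence: ATT, an 11-nucleotide spacer, then AAT.
example = "GGATT" + "CGCGCGCGCGC" + "AATGG"
print(find_t_n_a_sites(example))
```

Subtracting each reported index from the index of a mapped start codon would give the kind of motif-to-start-codon distance compared among the loci in the text.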

Such a regulatory differential may also exist in snpA expression, since the distance between the putative SnpR binding sites and snpA start codons varies by as much as 70 nucleotides, and there appear to be two classes of snpA start codon-proximal regions. As shown in Figure 3.09, five of the sequences group with Streptomyces sp. strain C5 snp, and most likely have overlapping start points for snpA transcription and translation, while a second group of nine sequences is dissimilar to Streptomyces sp. strain C5 snp, and may represent loci with alternative snpA transcriptional start sites.

Taken as a whole, these observations complicate efforts at defining a model of snp locus regulation that can be applied to all the species tested. The experiments conducted on Streptomyces sp. strain C5 showed an snpR promoter that could potentially be occluded by SnpR binding, and overlapping snpA transcriptional and translational start sites. Comparison of Streptomyces sp. strain C5 snp to the other homologous loci, however, indicates differences in the sequences assigned these roles; thus, the only logical conclusion is that the details of the snp intergenic regions vary among the species tested.

In general, polypeptides destined to traverse the cytoplasmic membrane have signal peptides attached to either the amino- or carboxyl-terminal ends. Such signal sequences are most frequently found on the amino-termini of proteins destined for secretion, and are thus also known as prepeptides or leader peptides. These sequences function to direct the nascent polypeptide to the cell membrane, and enable the protein secretion machinery to recognize and transport the polypeptide through the membrane.

The fifteen deduced SnpA proteins contain sequences resembling signal peptides, and all but one, the S. spadicis sequence, retain the same approximate length and characteristics. The key properties defined by Von Heijne (1986), namely a positively charged amino terminus, a hydrophobic core, a polar carboxyl-terminal segment, and a helix-breaking proline in the -3 position (relative to the likely signal peptidase cleavage site), are maintained in the SnpA prepeptides (leader sequences) despite variation in the specific amino acid sequences. The outlying S. spadicis signal peptide retains most of the conserved residues of the other signal peptides, such as the GLGL motif within the hydrophobic core and the PAXA sequence at the likely cleavage site, but exhibits an extended deduced amino terminus. Since a wide variety of leader peptides have been described for Streptomyces secreted proteins (Rowland et al., 1992), and at least four signal peptidases are known to be encoded by the S. lividans chromosome (Parro et al., 1999), it is plausible that the streptomycete protein secretion machinery can recognize and process diverse leader peptide sequences. Thus it stands to reason that mutations within the snpA sequence encoding the leader peptide that changed specific amino acids without affecting the overall characteristics of the leader peptide were not detrimental to SnpA expression, and therefore were tolerated.
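The signal peptide features listed above lend themselves to a simple programmatic check. The sketch below is only an illustration under stated assumptions: the peptide is invented and is not any of the deduced SnpA sequences, the eight-residue amino-terminal window and the set of residues counted as apolar are arbitrary choices, and the function name is hypothetical.

```python
import re


def describe_signal_peptide(prepeptide, n_region_length=8):
    """Summarize coarse signal-peptide features of a prepeptide.
    The window size and the apolar residue set are illustrative assumptions."""
    pre = prepeptide.upper()
    n_region = pre[:n_region_length]
    # Net charge of the amino-terminal window: Lys/Arg minus Asp/Glu.
    net_charge = sum(n_region.count(r) for r in "KR") - sum(
        n_region.count(r) for r in "DE"
    )
    # Longest uninterrupted run of apolar residues, taken as the core.
    runs = re.findall(r"[AVLIMFWGC]+", pre)
    longest_core = max((len(run) for run in runs), default=0)
    return {
        "n_region_net_charge": net_charge,
        "longest_apolar_run": longest_core,
        "has_GLGL": "GLGL" in pre,
        "ends_with_PAXA": re.search(r"PA.A$", pre) is not None,
    }


# Hypothetical prepeptide, invented for this example only.
example = "MRKR" + "TLALGLGLAVAALL" + "TG" + "PAQA"
print(describe_signal_peptide(example))
```

Applied to a set of aligned prepeptides, a summary of this kind would show whether the overall characteristics are retained even where the individual residues differ.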

Compared with signal sequences, the functional role of propeptides is less well understood. Propeptide sequences are thought to maintain premature enzymes in the zymogen state until they are transported to their final destination, such as the extracellular milieu (Chang et al., 1994). Numerous studies have implicated propeptides in protein folding, protein secretion, and inhibition of enzymatic activity. The propeptide of the α-lytic proteinase, for example, was shown to lower the activation energy barrier between inactive and active conformations of the enzyme, indicating it functions as an intramolecular chaperonin (Baker et al., 1992a). During secretion of the α-lytic proteinase, the propeptide is autocatalytically cleaved from the zymogen and degraded, leaving the extracellular mature proteinase in the active conformation (Fujishige et al., 1992). Additionally, it was demonstrated that, in vitro, the α-lytic proteinase propeptide is a potent inhibitor of its parental enzyme (Baker et al., 1992b). Coupled with observation of the inhibitory properties of propeptides from subtilisin-E and carboxypeptidase Y (Ohta and Inouye, 1990; Winther and Sorensen, 1991), these reports suggest a potential biological role of propeptides as intracellular inhibitors of their parental enzymes, protecting the cytoplasmic components from proteolysis before the proteinase is secreted. It is more likely, though, that this phenomenon is ancillary to the primary propeptide function, at least in the aforementioned cases, of lowering the energy barrier between the inactive and active conformations of the enzyme (Silen and Agard, 1989).

Exhibiting as much as 20% variation in length, the SnpA propeptides show very little similarity within their amino-terminal segments, but the carboxyl-terminal sequences are quite strongly conserved. The consensus motifs AFFEAV and VAEKRAANP are present in all but the S. flocculus SnpA propeptide, suggesting a role for these residues in proper SnpA function, and hence selective pressure to retain these sequences through evolutionary time.

One interesting piece of evidence pertaining to the function of the SnpA propeptide can be found in the reports describing the Streptomyces sp. strain C5, S. caespitosus, S. lividans, and S. coelicolor SnpA enzymes.

The amino termini of the mature proteinases from Streptomyces sp. strain C5 and S. caespitosus were reported to be AAVTV and TVTV, respectively, giving them molecular masses of 15,741 and 14,377 Daltons, respectively. The amino-terminal sequence of the S. lividans proteinase, however, was determined to be YQPSA, giving the enzyme a calculated molecular mass of 19,757 Daltons, which agreed with the apparent molecular mass of 20,000 Daltons for purified S. lividans proteinase (Butler et al., 1992). Curiously, the YQPSA amino terminus of the S. lividans proteinase is 14 amino acid residues carboxyl-terminal of the predicted signal peptidase cleavage site (see Figures 3.10 and 3.11). This might be a manifestation of amino-terminal heterogeneity, which has been observed with over-expressed recombinant proteins (Chang and Chang, 1988), or, alternatively, could suggest incomplete stepwise processing of the propeptide. Additionally, although the amino-terminal sequence was not determined, S. coelicolor SnpA was also shown to have an apparent molecular mass of ca. 20,000 Daltons. If the Streptomyces sp. strain C5 and S. caespitosus proteinases represent fully mature enzymes, then perhaps the S. lividans and S. coelicolor proteinases are in fact active zymogens.

In 1999, Nirasawa and co-workers described an aminopeptidase from Aeromonas caviae which is active in its zymogen form (Nirasawa et al., 1999). Using a synthetic substrate, they demonstrated that the fully processed proteinase possessed a much greater kcat value than the zymogen, but that both forms of the enzyme had the same Km value. This suggests that the propeptide does not significantly affect the formation of the enzyme-substrate complex, and therefore does not influence the conformation of the active site. The low kcat value of the weakly active zymogen, on the other hand, indicates that a high activation energy barrier is present in the reaction coordinate of the zymogen-substrate complex. In a similar study, procarboxypeptidase A kcat values were shown to be significantly lower than those of the mature enzyme, while the Km values were similar (Uren and Neurath, 1974). Although both of these examples are exopeptidases, they set a precedent for the existence of active proteinase zymogens.
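The link drawn above between a lower kcat and a higher activation energy barrier can be made concrete with a short calculation. The sketch below is an illustration only: it assumes Michaelis-Menten kinetics and transition-state theory with an identical pre-exponential factor for both enzyme forms, so that the barrier difference is RT ln(kcat,mature / kcat,zymogen). The kinetic values are invented and are not taken from Nirasawa et al. (1999) or from Uren and Neurath (1974).

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def michaelis_menten_rate(kcat, km, enzyme, substrate):
    """Initial velocity v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def barrier_difference_kj(kcat_mature, kcat_zymogen, temperature_k=298.15):
    """Difference in activation free energy (kJ/mol) implied by two kcat
    values, assuming both forms share the same pre-exponential factor."""
    ratio = kcat_mature / kcat_zymogen
    return GAS_CONSTANT * temperature_k * math.log(ratio) / 1000.0


# Hypothetical values: identical Km, 100-fold difference in kcat.
kcat_mature, kcat_zymogen, km = 100.0, 1.0, 2.0
for substrate in (0.5, 2.0, 50.0):
    v_mature = michaelis_menten_rate(kcat_mature, km, 1.0, substrate)
    v_zymogen = michaelis_menten_rate(kcat_zymogen, km, 1.0, substrate)
    print(substrate, v_mature / v_zymogen)
print(round(barrier_difference_kj(kcat_mature, kcat_zymogen), 2), "kJ/mol")
```

With equal Km values, the ratio of the two velocities is the same at every substrate concentration, which is why the difference between the two enzyme forms can be attributed to the catalytic step rather than to substrate binding.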

Without further information about the kinetic properties of the S. lividans and S. coelicolor SnpA proteinases, it is difficult to determine if in fact the enzymes are fully active or only partially active. It is tempting to speculate about how such a processing divergence occurred. Perhaps the larger proteinases do not play as central a role in the proteolytic capability of these streptomycetes as do their smaller, likely fully-mature counterparts, and we are observing the manifestation of lower selective pressure.

Alternatively, perhaps the observed sizes of the S. lividans and S. coelicolor proteinases are an artifact of the recombinant context in which they were produced. Regardless of the speculative explanations, the SnpA propeptide system presents a unique opportunity to explore the biology of small neutral proteinase maturation.

As mentioned earlier, the analyses conducted on the mature snpA regions were based on the assumption that the VTV motif common to Streptomyces sp. strain C5 and S. caespitosus SnpA (as well as all other homolog sequences) represents the amino terminus of the fully mature proteinase. In 1997, Kurisu and co-workers published a high-resolution X-ray crystallographic model of the S. caespitosus SnpA proteinase. This report provides informative structural details that correspond to the primary sequence of S. caespitosus SnpA, and by extension, the entire homolog family. The mature SnpA proteinase is composed of a highly twisted five-stranded β-sheet, four α-helices, and one ion each of zinc and calcium. Kurisu et al. divided the proteinase into two domains. The N-domain, from threonine-1 to aspartate-76 (S. caespitosus numbering; see Figure 3.12), contains relatively regular secondary structural elements such as the β-sheets and several α-helices. The C-domain (from glycine-90 to glycine-132), however, contains more irregular structure. A long α-helix connects the two domains, and contains the HEXXH active site motif responsible in part for binding the catalytically essential zinc ion. The two domains form a cleft roughly parallel to the connecting α-helix, and this groove contains the active site. The SnpA proteinases are structurally unique among the metalloproteinases (Kurisu et al., 1997) because the third zinc ligand is an aspartic acid residue, rather than histidine, as found in the larger metalloproteinases (Rawlings and Barrett, 1995). The extended active site motif of the proteinases, HETGHVLGLPD, contains the three zinc-liganding residues (two histidines and one aspartic acid residue), which together with one water molecule complete the tetrahedral ligation of the catalytic zinc.

Before the crystal structure was available, there was some speculation that histidine-94 might act as the third zinc ligand, which would make the active site structure more consistent with those of other metalloproteinases. This proposal was discarded when the crystal structure of S. caespitosus ScNP (Kurisu et al., 1997) became available. In addition to the crystallographic evidence, the alignment of SnpA homologs shows that the strictly conserved extended active site motif ends at aspartic acid-93, and the adjacent histidine-94 is replaced in four of the 16 cases. Also of significance is methionine-103, which forms the methionine-turn characteristic of the metzincin family of metalloproteinases (Bode et al., 1996). The metzincins, also known as clan MB metalloproteinases by the nomenclature of Rawlings and Barrett (1995), form a superfamily of enzymes with common structural features. The SnpA proteinases, classified as clan MB family M7 by Rawlings and Barrett (1995), are very significant metzincins because they are the only family within clan MB with an aspartic acid serving as the third zinc ligand; all other members of clan MB have three zinc-liganding histidine residues.

An additional divergence is the absence in SnpA proteinases of the “cysteine-switch” propeptide mechanism. Many clan MB metalloproteinases have a conserved cysteine residue within the propeptide which forms a temporary fourth bond with the catalytically essential zinc ion, maintaining inactivity of the zymogen until cleavage of the propeptide (Springman et al., 1990).

Although the SnpA active site sequence is strongly conserved among the homologs, other regions, especially the amino-terminal third of the mature proteinases, show more variation. These sequences correspond to the N-domain of SnpA, which exhibits more regular structural elements, and this region is probably more tolerant of amino acid substitutions that do not alter the overall protein conformation. The area of gapping in the mature SnpA alignment corresponds to a surface-exposed loop between β-sheets in the N-domain on the side of the enzyme opposite the active site cleft, as shown in Figure 3.19.

Due to the high degree of relatedness among the snpR and mature snpA sequences, pairwise distance values generated from snpR and snpA alignments (Tables 3.3 and 3.4) are relatively small. A family of Streptomyces protease inhibitors characterized recently, by comparison, showed an average pairwise distance of 0.64, with a range of 0.1 to 1.10 (Taguchi et al., 1997). The data in Tables 3.3 and 3.4 were generated with the Kimura two-parameter method (Kimura, 1980), and are very similar to data

Figure 3.19. Structure of S. caespitosus small neutral proteinase showing region of amino-acid gapping among the homologous SnpA proteinases. This ribbon drawing shows the surface-exposed loop between β-sheets II (obscured) and III. This perspective also shows the five-stranded anti-parallel β-sheet composed of β-strands I-V. His83, His87 and Asp93 are shown coordinating a zinc ion. [Figure: ribbon drawing with the location of the gap in the SnpA homolog alignment marked.]

generated with the Jukes-Cantor method (Jukes and Cantor, 1969; data not shown). Both neighbor-joining and maximum parsimony tests were performed on bootstrapped replicates of the distance tables, and in general, the same potential evolutionary relationships are evident from trees prepared by both methods. Comparison of the four trees reveals three monophyletic groups which connect S. fradiae with S. fellus, S. coelicolor with Streptomyces ATCC 11862, and S. rochei with S. griseoruber and S. lividans. Additionally, inclusion of S. rochei snpA in the S. lividans-S. griseoruber snpA sister group is supported by the neighbor-joining tree, but not the maximum parsimony tree. Furthermore, the S. fradiae-S. fellus snpA group shows a tendency to cluster with the S. lividans-S. griseoruber-S. rochei snpA group. Lastly, both the neighbor-joining and maximum parsimony trees linked S. thermotolerans and S. avermitilis snpA, while the former tree suggested Streptomyces sp. strain C5 snpA was a sibling of these two as well. It would indeed be interesting to compare the molecular masses and biochemical properties of the proteinases apparently related to Streptomyces sp. strain C5 SnpA with those grouping with the S. lividans proteinase. Such information would further define the aforementioned differential propeptide processing as a characteristic of specific snpA gene lineages. Since the three strong relationships are evident in both the snpR and snpA trees, it is likely that, at least in these groups, the snp locus evolved as a cassette containing both genes.
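For readers unfamiliar with the two distance corrections named above, the following minimal sketch shows how each is computed for a single pair of aligned nucleotide sequences. It is not the software used to produce Tables 3.3 and 3.4; the function names and the example sequences are assumptions made for illustration. The Jukes-Cantor distance uses the overall proportion of differing sites p, while the Kimura two-parameter distance treats the proportions of transitions (P) and transversions (Q) separately.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned
    sequences, skipping columns with gaps or ambiguous bases."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    return sites, transitions, transversions


def jukes_cantor(seq1, seq2):
    """d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites."""
    sites, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_two_parameter(seq1, seq2):
    """d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)."""
    sites, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / sites, tv / sites
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


# Hypothetical 20-nucleotide pair with one transition and one transversion.
reference = "ACGTACGTACGTACGTACGT"
variant = "GAGTACGTACGTACGTACGT"
print(jukes_cantor(reference, variant), kimura_two_parameter(reference, variant))
```

When the sequences are closely related and transitions and transversions are similarly frequent, the two corrections give nearly the same value, which is consistent with the similarity between the two data sets noted in the text.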

Considering the heterogeneity of the intergenic and premature proteinase DNA, and the homogeneity of the SnpR amino-terminus and mature SnpA coding sequences, it appears likely that in the process of evolution, most of the selective pressure was exerted within the latter sequences. The fact that the S. coelicolor and S. lividans proteinases are apparently active as larger pro-enzymes suggests that the propeptide does not interfere with catalysis, and consequently strict conservation of the propeptide sequence is not a prerequisite for the evolutionary maintenance of SnpA activity. It would indeed be interesting to compare the activity profiles of these two ‘proenzymes’ with their properly processed counterparts, Streptomyces sp. strain C5 SnpA and S. caespitosus SnpA.

Summary

Twelve new members of the Streptomyces small neutral proteinase locus snp have been cloned and sequenced. Analysis of these loci along with the snp homologs already published revealed that the group shares significant nucleotide sequence homology in the 5’ snpR region, as well as in the snpA sequence encoding the mature SnpA proteinase. The sequences between these two flanking regions, however, show much less sequence conservation. The most pronounced region of sequence similarity within the snpA-R intergenic region surrounds a T-Nn-A motif likely to be the site of SnpR binding. Alignment of the deduced SnpA signal peptides reveals a conserved GLGL motif in the hydrophobic core, and PAXA at the likely signal peptidase recognition site. When the deduced propeptides were aligned, the carboxyl-terminal halves of the sequences showed significantly higher relatedness than the amino-terminal halves. Within the deduced amino acid sequences of the SnpA proteinases, very high sequence similarity was observed throughout the sequence. Although the snp loci are distributed widely among the Streptomyces taxa, phylogenetic analyses suggest monophyletic groupings of the snp loci from S. fradiae with S. fellus, S. coelicolor with Streptomyces ATCC 11862, and S. rochei with S. griseoruber and S. lividans.

CHAPTER 4

CONCLUSIONS

This study has contributed a number of significant observations to the understanding of the Streptomyces snp proteinase locus. SnpR, a LysR-like regulator of snpA transcription, was demonstrated to activate expression from the proteinase promoter approximately 35-fold, clearly showing that SnpR serves as a transcriptional activator of snpA. In experiments designed to evaluate the temporal expression of the snpA promoter, the levels of NptII reporter protein produced from a plasmid-borne transcriptional fusion of the aphII gene to the snpA promoter continued to increase well into stationary phase. This suggests that, in a high copy number plasmid, the promoter is still active after logarithmic growth phase, and perhaps may exhibit optimal expression in stationary phase. The heterologous endostatin expression-secretion project corroborates this observation, showing the most significant accumulation of product after logarithmic growth phase. Transcript mapping experiments identified the snpA transcriptional start point as positioned at the snpA start codon, making the snpA transcript a leaderless mRNA species. Conversely, the snpR transcript starts nearly 140 nucleotides upstream of the snpR start codon, generating a substantial 5’ untranslated region on the snpR mRNA.

A PCR-based screening experiment revealed that 33 out of 96 Streptomyces strains previously uncharacterized for small neutral proteinases contained snp-like DNA. Fragments comprising the 5’ end of snpR, the intergenic region, and ca. 75% of the snpA gene from 12 such homologous snp loci were isolated and characterized. Analysis of these novel snp loci in comparison with the published sequences showed that, as a group, the loci have significant similarity in the 5’ end of the snpR open reading frame as well as in the snpA region encoding the mature proteinase. In contrast to these regions, the snpA-R intergenic region and the 5’ end of the snpA gene exhibit lower levels of similarity. Significantly, the Streptomyces sp. strain C5 snpR promoter structure appears not to be conserved in the other homologous loci. The Streptomyces sp. strain C5 snpA promoter structure appears to be conserved in a subgroup of the loci, termed here group A, while absent in the remaining loci (group B).

Alongside the basic research agenda, an applied research project exploring the development and application of Streptomyces sp. strain C5 snp-based heterologous gene expression tools was underway. The outcome of this project was a set of transcriptional and translational fusion vectors utilizing the high-activity snpA promoter. The expression of Streptomyces sp. strain C5 doxA, and the expression and secretion of recombinant human endostatin, are examples of successfully expressed transcriptional and translational fusions, respectively.

The fact that SnpR appears to be the only LysR-like regulator directly activating a proteinase gene makes it an excellent model system to study the regulation of protease expression in bacteria. Of significant interest in this field is the nature of potential coinducer stimuli which may trigger expression of the SnpA proteinase (Wolfgang Wohlleben, personal communication). Future experimentation in this area might include extension of the cloned snp loci to include the 3’ regions of snpR, which could then be aligned to search for conserved coinducer binding domains.

Further study of the snp loci also could shed light on the nature of proteinase processing and maturation. The differential processing exhibited by the published SnpA proteinases from Streptomyces sp. strain C5, S. caespitosus, S. lividans, and S. coelicolor makes these enzymes good candidates for such a project. Comparison of the substrate specificity and kinetics of these proteinases could expand the understanding of this processing phenomenon, and lend further credence to the “active-zymogen” theory posited by Nirasawa et al. (1999). The apparently divergent transcriptional characteristics of the various snp homologs are also of interest. Future experimentation in this area might involve mapping the transcriptional start points for several loci falling into group B with respect to snpA transcription, as well as several from group A to confirm they indeed generate leaderless snpA transcripts.

From the applied perspective, a fourth generation of snp-based expression vectors might incorporate an inducible promoter, such as the tipA (Murakami et al., 1989) or merR (Rother et al., 1999) promoters of S. lividans, driving snpR expression to allow inducible expression of heterologous genes. Additionally, factorial-design fermentation optimization experiments, such as those described by Plackett and Burman (1946), would permit maximization of snpA promoter expression, and would lead to more productive fermentations and insight into the physiology of snp expression.
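As an illustration of the kind of screening design referred to above, the sketch below constructs a 12-run, two-level Plackett-Burman design by cyclically shifting a generator row and appending a run with every factor at its low level. It is a sketch under stated assumptions rather than a proposed experimental plan: the function names are hypothetical, and the eleven columns would correspond to whatever fermentation factors (medium components, temperature, and so on) an experimenter chose to screen, none of which are specified in this study.

```python
# Generator row for the 12-run design (+1 = high level, -1 = low level).
GENERATOR_12 = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def plackett_burman_12():
    """Return a 12-run x 11-factor design matrix as a list of rows."""
    rows = []
    row = list(GENERATOR_12)
    for _ in range(11):
        rows.append(list(row))
        row = [row[-1]] + row[:-1]  # cyclic shift by one position
    rows.append([-1] * 11)          # final run: all factors at the low level
    return rows


def main_effect(design, responses, factor):
    """Estimate a factor's main effect: mean response at the high level
    minus mean response at the low level."""
    high = [y for run, y in zip(design, responses) if run[factor] == +1]
    low = [y for run, y in zip(design, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


design = plackett_burman_12()
for run in design:
    print(" ".join("+" if level > 0 else "-" for level in run))
```

Each factor is set high in six runs and low in six, and every pair of columns is orthogonal, so up to eleven main effects can be estimated from twelve fermentations, provided interactions between factors are assumed to be negligible.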

LIST OF REFERENCES

Adhya, S. and Garges, S. (1990). Positive control. J Biol Chem 265, 10797-800.

Anne, J. and Van Mellaert, L. (1993). Streptomyces lividans as host for heterologous protein production. FEMS Microbiol Lett 114, 121-8.

Anne, J., Van Mellaert, L., Lammertyn, E., Joris, B., Sablon, E., Van Broekhoven, A. and Eyssen, H. (1995). Factors influencing secretion of heterologous proteins in Streptomyces lividans using mouse tumor necrosis factor alpha as a model protein. In Proceedings of the Ninth Symposium on the Actinomycetes, pp. 55-60.

Appelbaum, E. R., Thompson, D. V., Idler, K. and Chartrain, N. (1988). Rhizobium japonicum USDA 191 has two nodD genes that differ in primary structure and function. J Bacteriol 170, 12-20.

Baker, E. N. (1977). Structure of actinidin: details of the polypeptide chain conformation and active site from an electron density map at 2.8 angstrom resolution. J Mol Biol 115, 263-77.

Baker, D., Sohl, J. L. and Agard, D. A. (1992a). A protein-folding reaction under kinetic control. Nature 356, 263-5.

Baker, D., Silen, J. L. and Agard, D. A. (1992b). Protease pro region required for folding is a potent inhibitor of the mature enzyme. Proteins 12, 339-44.

Balbas, P., Soberon, X., Merino, E., Zurita, M., Lomeli, H., Valle, F., Flores, N. and Bolivar, F. (1986). Plasmid vector pBR322 and its special-purpose derivatives--a review. Gene 50, 3-40.

Barder, M. J. and Crawford, D. L. (1981). Effects of carbon and nitrogen supplementation on lignin and cellulose decomposition by a Streptomyces. Can J Microbiol 27, 859-63.

Bartowsky, E. and Normark, S. (1991). Purification and mutant analysis of Citrobacter freundii AmpR, the regulator for chromosomal AmpC beta-lactamase. Mol Microbiol 5, 1715-25.

Beck, E., Ludwig, G., Auerswald, E. A., Reiss, B. and Schaller, H. (1982). Nucleotide sequence and exact localization of the neomycin phosphotransferase gene from transposon Tn5. Gene 19, 327-36.

Becker, A. B. and Roth, R. A. (1993). Identification of glutamate-169 as the third zinc-binding residue in proteinase III, a member of the family of insulin-degrading enzymes. Biochem J 292, 137-42.

Bender, R. A. (1991). The role of the NAC protein in the nitrogen regulation of Klebsiella aerogenes. Mol Microbiol 5, 2575-80.

Bernan, V., Filpula, D., Herber, W., Bibb, M. and Katz, E. (1985). The nucleotide sequence of the tyrosinase gene from Streptomyces antibioticus and characterization of the gene product. Gene 37, 101-10.

Bibb, M. J., Freeman, R. F. and Hopwood, D. A. (1977). Physical and genetical characterization of a second sex factor, SCP2, for Streptomyces coelicolor A3(2). Mol Gen Genet 154, 155-66.

Bibb, M., Schottel, J. L. and Cohen, S. N. (1980). A DNA cloning system for interspecies gene transfer in antibiotic-producing Streptomyces. Nature 284, 526-31.

Bibb, M. J. and Hopwood, D. A. (1981). Genetic studies of the fertility plasmid SCP2 and its SCP2* variants in Streptomyces coelicolor A3(2). J Gen Microbiol 126, 427-442.

Bibb, M. J., Ward, J. M., Kieser, T., Cohen, S. N. and Hopwood, D. A. (1981). Excision of chromosomal DNA sequences from Streptomyces coelicolor forms a novel family of plasmids detectable in Streptomyces lividans. Mol Gen Genet 184, 230-40.

Bibb, M. J. and Janssen, G. R. (1986). Unusual features of transcription and translation on antibiotic resistance genes in antibiotic-producing Streptomyces. In Fifth International Symposium on the Genetics of Industrial Microorganisms, pp. 309-318. Edited by M. Alacevic, D. Hranueli and Z. Toman.

Binnie, C., Cossar, J. D. and Stewart, D. I. (1997). Heterologous biopharmaceutical protein expression in Streptomyces. Trends Biotechnol 15, 315-20.

Birnboim, H. C. and Doly, J. (1979). A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res 7, 1513-23.

Blundell, T., Sibanda, B. L. and Pearl, L. (1983). Three-dimensional structure, specificity and catalytic mechanism of renin. Nature 304, 273-5.

Bode, W., Grams, F., Reinemer, P., Gomis-Ruth, F. X., Baumann, U., McKay, D. B. and Stocker, W. (1996). The metzincin-superfamily of zinc-peptidases. Adv Exp Med Biol 389, 1-11.

Bohannon, D. E. and Sonenshein, A. L. (1989). Positive regulation of glutamate biosynthesis in Bacillus subtilis. J Bacteriol 171, 4718-27.

Bourn, W. R. and Babb, B. (1995). Computer assisted identification and classification of streptomycete promoters. Nucleic Acids Res 23, 3696-703.

Bradford, M. M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72, 248-54.

Brana, A. F., Manzanal, M. B. and Hardisson, C. (1982). Characterization of intracellular polysaccharides of Streptomyces. Can J Microbiol 28, 1320-3.

Brau, B., Pilz, U. and Piepersberg, W. (1984). Genes for gentamicin-(3)-N-acetyltransferases III and IV: I. Nucleotide sequence of the AAC(3)-IV gene and possible involvement of an IS140 element in its expression. Mol Gen Genet 193, 179-187.

169 Bruhlmann, F. and Keen, N. T. (1997).Cloning, sequence and expression of the pel gene from an Amycolata sp. Gene 202,45-51.

Brumbley, S. M., Carney, B. F. and Denny, T. P. (1993).Phenotype conversion in Pseudomonas solanacearum due to spontaneous inactivation of PhcA, a putative LysR transcriptional regulator. J Bacteriol 175, 5477-87.

Bum, J. E., Hamilton, W. D , Wootton, J. C. and Johnston, A. W. (1989).Single and multiple mutations affecting properties of the regulatory gene nodD of Rhizobium. Mol Microbiol 3, 1567-77.

Burnett, W. V., Brawner, M., Taylor, D. P., Fare, L. R., Henner, J. and Eckhardt, T. (1985). Cloning and analysis of an exported beta galactosidase and other proteins from Streptomyces lividans. In Microbiology 1985, pp. 441-444. Edited by L. Leive. Washington, D C.: American Society for Microbiology.

Butler, M. J., Davey, C. C., Krygsman, P., Walczyk, E. and Malek, L. T. (1992). Cloning of genetic loci involved in endoprotease activity in Streptomyces lividans 66: a novel neutral protease gene with an adjacent divergent putative regulatory gene. Can J Microbiol 38,912-20.

Caldwell, A. L. and Gulig, P. A. (1991).The Salmonella typhimurium virulence plasmid encodes a positive regulator of a plasmid-encoded virulence gene. J Bacteriol 173, 7176- 85.

Campbell, J. I., Scahill, S., Gibson, T. and Ambler, R. P. (1989).The phototrophic bacterium Rhodopseudomonas capsulata splOS encodes an indigenous class A beta- lactamase. Biochem J 260, 803-12.

Carter, M. J. and Milton, I. D. (1993). An inexpensive and simple method for DNA purifications on silica particles. Nucleic Acids Res 21, 1044.

Chang, S. (1987). Engineering for protein secretion in gram-positive bacteria. Methods Enzymol 153, 507-16.

Chang, S.-Y. and Chang, S. (1988). In Biology of actinomycetes '88: Proceedings of the Seventh International Symposium on the Biology of Actinomycetes., pp. 103-107. Edited by Y. Okami, T. Beppu and H. Ogawara. Tokyo: Japan Scientific Societies Press.

Chang, M., Hadero, A. and Crawford, I. P. (1989). Sequence of the Pseudomonas aeruginosa trpI activator gene and relatedness of trpI to other procaryotic regulatory genes. J Bacteriol 171, 172-83.

Chang, S. C., Chang, P. C. and Lee, V. H. (1994). The roles of propeptide in maturation and secretion of Npr protease from Streptomyces. J Biol Chem 269, 3548-54.

Charpentier, M. and Percheron, F. (1983). The chitin-degrading enzyme system of a Streptomyces species. Int J Biochem 15, 289-92.

Chater, K. F. (1998). Taking a genetic scalpel to the Streptomyces colony. Microbiology 144, 1465-1478.

Chen, P. L. (1966). Observation of the nuclear behavior in Streptomyces cinnamonensis. Amer J Bot 53, 291-295.

Dammann, T. and Wohlleben, W. (1992). A metalloprotease gene from Streptomyces coelicolor 'Muller' and its transcriptional activator, a member of the LysR family. Mol Microbiol 6, 2267-78.

Davison, J., Brunel, F., Phanopoulos, A., Prozzi, D. and Terpstra, P. (1992). Cloning and sequencing of Pseudomonas genes determining sodium dodecyl sulfate biodegradation. Gene 114, 19-24.

Deng, Z. X., Kieser, T. and Hopwood, D. A. (1988). "Strong incompatibility" between derivatives of the Streptomyces multi-copy plasmid pIJ101. Mol Gen Genet 214, 286-94.

Devereux, J., Haeberli, P. and Smithies, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 12, 387-95.

Dickens, M. L. and Strohl, W. R. (1996). Isolation and characterization of a gene from Streptomyces sp. strain C5 that confers the ability to convert daunomycin to doxorubicin on Streptomyces lividans TK24. J Bacteriol 178, 3389-95.

Dodd, I. B. and Egan, J. B. (1990). Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucleic Acids Res 18, 5019-26.

Drenth, J., Jansonius, J. N., Koekoek, R. and Wolthers, B. G. (1971). The structure of papain. Adv Protein Chem 25, 79-115.

Faxen, M., Plumbridge, J. and Isaksson, L. A. (1991). Codon choice and potential complementarity between mRNA downstream of the initiation codon and bases 1471-1480 in 16S ribosomal RNA affects expression of glnS. Nucleic Acids Res 19, 5247-51.

Fellay, R., Hanin, M., Montorzi, G., Frey, J., Freiberg, C., Golinowski, W., Staehelin, C., Broughton, W. J. and Jabbouri, S. (1998). nodD2 of Rhizobium sp. NGR234 is involved in the repression of the nodABC operon. Mol Microbiol 27, 1039-50.

Felsenstein, J. (1993). PHYLIP. University of Washington, USA.

Fersht, A. (1977). Structure and mechanism of selected enzymes. In Enzyme structure and mechanism., pp. 288-354. San Francisco: W. H. Freeman.

Fornwald, J. A., Donovan, M. J., Gerber, R., Keller, J., Taylor, D. P., Arcuri, E. J. and Brawner, M. E. (1993). Soluble forms of the human T cell receptor CD4 are efficiently expressed by Streptomyces lividans. Biotechnology (NY) 11, 1031-6.

Fujishige, A., Smith, K. R., Silen, J. L. and Agard, D. A. (1992). Correct folding of α-lytic protease is required for its extracellular secretion from Escherichia coli. J Cell Biol 118, 33-42.

Garnier, J., Osguthorpe, D. J. and Robson, B. (1978). Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 120, 97-120.

Gibb, G. D., Dekleva, M. L., Lampel, J. S. and Strohl, W. R. (1987). Isolation and characterization of a Streptomyces C5 mutant strain hyperproductive of extracellular protease. Biotechnol Lett 9, 605-610.

Gibb, G. D. and Strohl, W. R. (1988). Physiological regulation of protease activity in Streptomyces peucetius. Can J Microbiol 34, 187-90.

Gibb, G. D., Ordaz, D. E. and Strohl, W. R. (1989). Overproduction of extracellular protease activity by Streptomyces C5-A13 in fed-batch fermentations. Appl Microbiol Biotechnol 31, 119-124.

Gibson, K. E. and Silhavy, T. J. (1999). The LysR homolog LrhA promotes RpoS degradation by modulating activity of the response regulator sprE. J Bacteriol 181, 563-71.

Gilbert, M., Morosoli, R., Shareck, F. and Kluepfel, D. (1995). Production and secretion of proteins by streptomycetes. Crit Rev Biotechnol 15, 13-39.

Ginther, C. L. (1978). Sporulation and the production of cephamycin C by Streptomyces lactamdurans. Antimicrob Agents Chemother 15, 522-526.

Gold, L. (1988). Posttranscriptional regulatory mechanisms in Escherichia coli. Annu Rev Biochem 57, 199-233.

Goldberg, M. B., Boyko, S. A. and Calderwood, S. B. (1991). Positive transcriptional regulation of an iron-regulated virulence gene in Vibrio cholerae. Proc Natl Acad Sci USA 88, 1125-9.

Goodfellow, M., Mordarski, M. and Williams, S. T. (1984). Introduction to and importance of actinomycetes. In Biology of Actinomycetes, pp. 1-6. Edited by M. Goodfellow, M. Mordarski and S. T. Williams. London: Academic Press.

Goodfellow, M. and Cross, T. (1984). Classification. In Biology of Actinomycetes, pp. 7-164. Edited by M. Goodfellow, M. Mordarski and S. T. Williams. London: Academic Press.

Goss, T. J. and Datta, P. (1985). Molecular cloning and expression of the biodegradative threonine dehydratase gene (tdc) of Escherichia coli K12. Mol Gen Genet 201, 308-14.

Habeeb, L. F., Wang, L. and Winans, S. C. (1991). Transcription of the octopine catabolism operon of the Agrobacterium tumor-inducing plasmid pTiA6 is activated by a LysR-type regulatory protein. Mol Plant Microbe Interact 4, 379-85.

Hall, T. A. (1999). BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41, 95-98.

Harada, S., Kitadokoro, K., Kinoshita, T., Kai, Y. and Kasai, N. (1991). Crystallization and main-chain structure of neutral protease from Streptomyces caespitosus. J Biochem (Tokyo) 110, 46-9.

Harada, S., Kinoshita, T., Kasai, N., Tsunasawa, S. and Sakiyama, F. (1995). Complete amino acid sequence of a zinc metalloendoprotease from Streptomyces caespitosus. Eur J Biochem 233, 683-6.

Harris, S. J., Shih, V. L., Bentley, S. D. and Salmond, G. P. (1998). The hexA gene of Erwinia carotovora encodes a LysR homologue and regulates motility and the expression of multiple virulence determinants. Mol Microbiol 28, 705-17.

Hartsuck, J. A. and Lipscomb, W. N. (1971). In The Enzymes., pp. 1. Edited by P. D. Boyer. New York: Academic Press.

Henderson, R. (1970). Structure of crystalline α-chymotrypsin. IV. The structure of indoleacryloyl-α-chymotrypsin and its relevance to the hydrolytic mechanism of the enzyme. J Mol Biol 54, 341-54.

Henikoff, S., Haughn, G. W., Calvo, J. M. and Wallace, J. C. (1988). A large family of bacterial activator proteins. Proc Natl Acad Sci USA 85, 6602-6.

Hofmann, T. (1985). Metalloproteinases. In Metalloproteins, part 2: metal proteins with non-redox roles., pp. 1-64. Edited by P. M. Harrison. Weinheim, Germany: Verlag Chemie GmbH.

Holmes, M. A. and Matthews, B. W. (1982). Structure of thermolysin refined at 1.6 angstrom resolution. J Mol Biol 160, 623-39.

Holtz, C., Kaspari, H. and Klemme, J. H. (1991). Production and properties of xylanases from thermophilic actinomycetes. Antonie Van Leeuwenhoek 59, 1-7.

Honma, M. A. and Ausubel, F. M. (1987). Rhizobium meliloti has three functional copies of the nodD symbiotic regulatory gene. Proc Natl Acad Sci USA 84, 8558-62.

Hooper, N. M. (1994). Families of zinc metalloproteases. FEBS Lett 354, 1-6.

Hopwood, D. A., Bibb, M. J., Chater, K. F., Kieser, T., Bruton, C. J., Kieser, H. M., Lydiate, D. J., Smith, C. P., Ward, J. M. and Schrempf, H. (1985). Genetic manipulation of Streptomyces: a laboratory manual. Norwich, UK: The John Innes Foundation.

Hopwood, D. A. (1988). The Leeuwenhoek lecture, 1987. Towards an understanding of gene switching in Streptomyces, the basis of sporulation and antibiotic production. Proc R Soc Lond B Biol Sci 235, 121-38.

Hopwood, D. A. (1989). Antibiotics: opportunities for genetic manipulation. Philos Trans R Soc Lond B Biol Sci 324, 549-62.

Hopwood, D. A. (1999). Forty years of genetics with Streptomyces: from in vivo through in vitro to in silico. Microbiology 145, 2183-202.

Huang, J. Z. and Schell, M. A. (1991). In vivo interactions of the NahR transcriptional activator with its target sequences. Inducer-mediated changes resulting in transcription activation. J Biol Chem 266, 10830-8.

Hunkapiller, M. W., Smallcombe, S. H., Whitaker, D. R. and Richards, J. H. (1973). Ionization behavior of the histidine residue in the catalytic triad of serine proteases. Mechanistic implications. J Biol Chem 248, 8306-8.

Ishaque, M. and Kluepfel, D. (1980). Cellulase complex of a mesophilic Streptomyces strain. Can J Microbiol 26, 183-9.

Ito, K., Kawakami, K. and Nakamura, Y. (1993). Multiple control of Escherichia coli lysyl-tRNA synthetase expression involves a transcriptional repressor and a translational enhancer element. Proc Natl Acad Sci USA 90, 302-6.

Jacob, W. F., Santer, M. and Dahlberg, A. E. (1987). A single base change in the Shine-Dalgarno region of 16S rRNA of Escherichia coli affects translation of many proteins. Proc Natl Acad Sci USA 84, 4757-61.

James, M. N., Hsu, I. N. and Delbaere, L. T. (1977). Mechanism of acid protease catalysis based on the crystal structure of penicillopepsin. Nature 267, 808-13.

James, M. N. (1980). An X-ray crystallographic approach to enzyme structure and function. Can J Biochem 58, 252-71.

James, M. N. G., Hsu, I. N., Hofmann, T. and Sielecki, A. R. (1981). In Structural Studies on Molecules of Biological Interest., pp. 350. Edited by G. Dodson, J. P. Glusker and D. Sayre. Oxford: Clarendon Press.

James, M. N. and Sielecki, A. R. (1983). Structure and refinement of penicillopepsin at 1.8 angstrom resolution. J Mol Biol 163, 299-361.

Janssen, G. R. (1993). Eubacterial, archaebacterial, and eukaryotic genes that encode leaderless mRNA. In Industrial Microorganisms: Basic and Applied Molecular Genetics, pp. 59-67. Edited by R. H. Baltz, G. D. Hegeman and P. L. Skatrud. Washington, DC: American Society for Microbiology.

Jaton-Ogay, K., Suter, M., Crameri, R., Falchetto, R., Fatih, A. and Monod, M. (1992). Nucleotide sequence of a genomic and a cDNA clone encoding an extracellular alkaline protease of Aspergillus fumigatus. FEMS Microbiol Lett 71, 163-8.

Jiang, W. and Bond, J. S. (1992). Families of metalloendopeptidases and their relationships. FEBS Lett 312, 110-4.

Jongeneel, C. V., Bouvier, J. and Bairoch, A. (1989). A unique signature identifies a family of zinc-dependent metallopeptidases. FEBS Lett 242, 211-4.

Jukes, T. H. and Cantor, C. R. (1969). Evolution of protein molecules. In Mammalian protein metabolism., pp. 21-132. Edited by H. N. Munro. New York: Academic Press.

Kaphammer, B. and Olsen, R. H. (1990). Cloning and characterization of tfdS, the repressor-activator gene of tfdB, from the 2,4-dichlorophenoxyacetic acid catabolic plasmid pJP4. J Bacteriol 172, 5856-62.

Katz, E., Thompson, C. J. and Hopwood, D. A. (1983). Cloning and expression of the tyrosinase gene from Streptomyces antibioticus in Streptomyces lividans. J Gen Microbiol 129, 2703-14.

Keller, J. W., Baurick, K. B., Rutt, G. C., O'Malley, M. V., Sonafrank, N. L., Reynolds, R. A., Ebbesson, L. O. and Vajdos, F. F. (1990). Pseudomonas cepacia 2,2-dialkylglycine decarboxylase. Sequence and expression in Escherichia coli of structural and repressor genes. J Biol Chem 265, 5531-9.

Kieser, T., Hopwood, D. A., Wright, H. M. and Thompson, C. J. (1982). pIJ101, a multi-copy broad host-range Streptomyces plasmid: functional analysis and development of DNA cloning vectors. Mol Gen Genet 185, 223-8.

Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16, 111-20.

King, A. A. and Chater, K. F. (1986). The expression of the Escherichia coli lacZ gene in Streptomyces. J Gen Microbiol 132, 1739-52.

Klock, G. and Hillen, W. (1986). Expression, purification and operator binding of the transposon Tn1721-encoded Tet repressor. J Mol Biol 189, 633-41.

Klug, A. and Rhodes, D. (1987). Zinc fingers: a novel protein fold for nucleic acid recognition. Cold Spring Harb Symp Quant Biol 52, 473-82.

Korn-Wendisch, F. and Kutzner, H. J. (1991). The family of streptomycetaceae. In The prokaryotes., pp. 921-995. Edited by A. Balows, H. G. Truper, M. Dworkin, W. Harder and K. H. Schleifer. New York: Springer Verlag.

Krause, M., Roudier, C., Fierer, J., Harwood, J. and Guiney, D. (1991). Molecular analysis of the virulence locus of the Salmonella dublin plasmid pSDL2. Mol Microbiol 5, 307-16.

Kurisu, G., Kinoshita, T., Sugimoto, A., Nagara, A., Kai, Y., Kasai, N. and Harada, S. (1997). Structure of the zinc endoprotease from Streptomyces caespitosus. J Biochem (Tokyo) 121, 304-8.

Kusano, T. and Sugawara, K. (1993). Specific binding of Thiobacillus ferrooxidans RbcR to the intergenic sequence between the rbc operon and the rbcR gene. J Bacteriol 175, 1019-25.

Kutzner, H. J. (1981). Streptomyces. In The prokaryotes: a handbook of habitats, isolation, and identification of bacteria., pp. 2028-2090. Edited by M. P. Starr, H. Stolp, H. G. Truper, A. Balows and H. G. Schlegel. Berlin: Springer Verlag.

Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680-5.

Lammertyn, E., Van Mellaert, L., Schacht, S., Dillen, C., Sablon, E., Van Broekhoven, A. and Anne, J. (1997). Evaluation of a novel subtilisin inhibitor gene and mutant derivatives for the expression and secretion of mouse tumor necrosis factor alpha by Streptomyces lividans. Appl Environ Microbiol 63, 1808-13.

Lammertyn, E., Desmyter, S., Schacht, S., Van Mellaert, L. and Anne, J. (1998). Influence of charge variation in the Streptomyces venezuelae α-amylase signal peptide on heterologous protein production by Streptomyces lividans. Appl Microbiol Biotechnol 49, 424-30.

Lampel, J. S., Aphale, J. S., Lampel, K. A. and Strohl, W. R. (1992). Cloning and sequencing of a gene encoding a novel extracellular neutral proteinase from Streptomyces sp. strain C5 and expression of the gene in Streptomyces lividans 1326. J Bacteriol 174, 2797-808.

Latt, S. A., Holmquist, B. and Vallee, B. L. (1969). Thermolysin: a zinc metalloenzyme. Biochem Biophys Res Commun 37, 333-9.

Lechevalier, M. P. (1981). Ecological associations involving actinomycetes. In Actinomyces, pp. 159-164. Edited by K. P. Schaal and E. Pulverer. New York: Gustav Fischer.

Lichenstein, H., Brawner, M. E., Miles, L. M., Meyers, C. A., Young, F. R., Simon, P. L. and Eckhardt, T. (1988). Secretion of interleukin-1β and Escherichia coli galactokinase by Streptomyces lividans. J Bacteriol 170, 3924-9.

Lichenstein, H. S., Busse, L. A., Smith, G. A., Narhi, L. O., McGinley, M. O., Rohde, M. F., Katzowitz, J. L. and Zukowski, M. M. (1992). Cloning and characterization of a gene encoding extracellular metalloprotease from Streptomyces lividans. Gene 111, 125-30.

Lomovskaya, N. D., Mkrtumian, N. M., Gostimskaya, N. L. and Danilenko, V. N. (1972). Characterization of temperate actinophage φC31 isolated from Streptomyces coelicolor A3(2). J Virol 9, 258-62.

Long, C. M., Virolle, M. J., Chang, S. Y., Chang, S. and Bibb, M. J. (1987). Alpha-amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate α-amylases. J Bacteriol 169, 5745-54.

Lydiate, D. J., Malpartida, F. and Hopwood, D. A. (1985). The Streptomyces plasmid SCP2*: its functional analysis and development into useful cloning vectors. Gene 35, 223-35.

Mackie, G. A. (1986). Structure of the DNA distal to the gene for ribosomal protein S20 in Escherichia coli K12: presence of a strong terminator and an IS1 element. Nucleic Acids Res 14, 6965-81.

MacNeil, D. J. (1988). Characterization of a unique methyl-specific restriction system in Streptomyces avermitilis. J Bacteriol 170, 5607-12.

Magdalena, J., Gerard, C., Joris, B., Forsman, M. and Dusart, J. (1997). The two β-lactamase genes of Streptomyces cacaoi, blaL and blaU, are under the control of the same regulatory system. Mol Gen Genet 255, 187-93.

Maniatis, T., Fritsch, E. F. and Sambrook, J. (1982). Molecular cloning: a laboratory manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.

Mazodier, P., Petter, R. and Thompson, C. (1989). Intergeneric conjugation between Escherichia coli and Streptomyces species. J Bacteriol 171, 3583-5.

McIver, J., Djordjevic, M. A., Weinman, J. J., Bender, G. L. and Rolfe, B. G. (1989). Extension of host range of Rhizobium leguminosarum bv. trifolii caused by point mutations in nodD that result in alterations in regulatory function and recognition of inducer molecules. Mol Plant Microbe Interact 2, 97-106.

Mitchell, W. M. and Harrington, W. F. (1968). Purification and properties of clostridiopeptidase B (Clostripain). J Biol Chem 243, 4683-92.

Mondou, F., Shareck, F., Morosoli, R. and Kluepfel, D. (1986). Cloning of the xylanase gene of Streptomyces lividans. Gene 49, 323-9.

Mulligan, J. T. and Long, S. R. (1989). A family of activator genes regulates expression of Rhizobium meliloti nodulation genes. Genetics 122, 7-18.

Murakami, T., Holt, T. G. and Thompson, C. J. (1989). Thiostrepton-induced gene expression in Streptomyces lividans. J Bacteriol 171, 1459-66.

Nagai, H., Yuzawa, H. and Yura, T. (1991). Interplay of two cis-acting mRNA regions in translational control of sigma 32 synthesis during the heat shock response of Escherichia coli. Proc Natl Acad Sci USA 88, 10515-9.

Narahashi, Y. and Yanagita, M. (1967). Studies on proteolytic enzymes (pronase) of Streptomyces griseus K-1. I. Nature and properties of the proteolytic enzyme system. J Biochem (Tokyo) 62, 633-41.

Neal, R. J. and Chater, K. F. (1991). Bidirectional promoter and terminator regions bracket mmr, a resistance gene embedded in the Streptomyces coelicolor A3(2) gene cluster encoding methylenomycin production. Gene 100, 75-83.

Neidle, E. L., Hartnett, C. and Ornston, L. N. (1989). Characterization of Acinetobacter calcoaceticus catM, a repressor gene homologous in sequence to transcriptional activator genes. J Bacteriol 171, 5410-21.

Neurath, H. and Walsh, K. A. (1976). Role of proteolytic enzymes in biological regulation (a review). Proc Natl Acad Sci USA 73, 3825-32.

Neurath, H. (1984). Evolution of proteolytic enzymes. Science 224, 350-7.

Nirasawa, S., Nakajima, Y., Zhang, Z. Z., Yoshida, M. and Hayashi, K. (1999). Intramolecular chaperone and inhibitor activities of a propeptide from a bacterial zinc aminopeptidase. Biochem J 341, 25-31.

O'Reilly, M. S., Boehm, T., Shing, Y., Fukai, N., Vasios, G., Lane, W. S., Flynn, E., Birkhead, J. R., Olsen, B. R. and Folkman, J. (1997). Endostatin: an endogenous inhibitor of angiogenesis and tumor growth. Cell 88, 277-85.

Ohta, Y. and Inouye, M. (1990). Pro-subtilisin E: purification and characterization of its autoprocessing to active subtilisin E in vitro. Mol Microbiol 4, 295-304.

Ostrowski, J., Jagura-Burdzy, G. and Kredich, N. M. (1987). DNA sequences of the cysB regions of Salmonella typhimurium and Escherichia coli. J Biol Chem 262, 5999-6005.

Pabo, C. O. and Sauer, R. T. (1984). Protein-DNA recognition. Annu Rev Biochem 53, 293-321.

Paradis, F. W., Shareck, F., Dupont, C., Kluepfel, D. and Morosoli, R. (1996). Expression and secretion of β-glucuronidase and Pertussis toxin S1 by Streptomyces lividans. Appl Microbiol Biotechnol 45, 646-51.

Paradkar, A. S., Aidoo, K. A. and Jensen, S. E. (1998). A pathway-specific transcriptional activator regulates late steps of clavulanic acid biosynthesis in Streptomyces clavuligerus. Mol Microbiol 27, 831-43.

Parke, D. (1996). Characterization of PcaQ, a LysR-type transcriptional activator required for catabolism of phenolic compounds, from Agrobacterium tumefaciens. J Bacteriol 178, 266-72.

Parro, V., Schacht, S., Anne, J. and Mellado, R. P. (1999). Four genes encoding different type I signal peptidases are organized in a cluster in Streptomyces lividans TK21. Microbiology 145, 2255-63.

Pettey, T. M. and Crawford, D. L. (1985). Characterization of acid-precipitable, polymeric lignin (APPL) produced by Streptomyces viridosporus and protoplast fusion of recombinant Streptomyces strains. Biotechnol Bioeng Symp 15, 179-190.

Plackett, R. L. and Burman, J. P. (1946). The design of optimum multifactorial experiments. Biometrika 33, 305-325.

Plamann, L. S. and Stauffer, G. V. (1987). Nucleotide sequence of the Salmonella typhimurium metR gene and the metR-metE control region. J Bacteriol 169, 3932-7.

Ptashne, M., Backman, K., Humayun, M. Z., Jeffrey, A., Maurer, R., Meyer, B. and Sauer, R. T. (1976). Autoregulation and function of a repressor in bacteriophage λ. Science 194, 156-61.

Rahav-Manor, O., Carmel, O., Karpel, R., Taglicht, D., Glaser, G., Schuldiner, S. and Padan, E. (1992). NhaR, a protein homologous to a family of bacterial regulatory proteins (LysR), regulates nhaA, the sodium proton antiporter gene in Escherichia coli. J Biol Chem 267, 10433-8.

Rajgarhia, V. B. and Strohl, W. R. (1997). Minimal Streptomyces sp. strain C5 daunorubicin polyketide biosynthesis genes required for aklanonic acid biosynthesis. J Bacteriol 179, 2690-6.

Ramachandra, M., Crawford, D. L. and Hertel, G. (1988). Characterization of an extracellular lignin peroxidase of the lignocellulolytic actinomycete Streptomyces viridosporus. Appl Environ Microbiol 54, 3057-63.

Rawlings, N. D. and Barrett, A. J. (1995). Evolutionary families of metallopeptidases. Methods Enzymol 248, 183-228.

Redenbach, M., Kieser, H. M., Denapaite, D., Eichner, A., Cullum, J., Kinashi, H. and Hopwood, D. A. (1996). A set of ordered cosmids and a detailed genetic and physical map for the 8 Mb Streptomyces coelicolor A3(2) chromosome. Mol Microbiol 21, 77-96.

Reese, M. G., Harris, N. L. and Eeckman, F. H. (1996). Large Scale Sequencing Specific Neural Networks for Promoter and Splice Site Recognition. In Biocomputing: Proceedings of the 1996 Pacific Symposium. Edited by L. Hunter and T. E. Klein. Singapore: World Scientific Publishing Company.

Renault, P., Gaillardin, C. and Heslot, H. (1989). Product of the Lactococcus lactis gene required for malolactic fermentation is homologous to a family of positive regulators. J Bacteriol 171, 3108-14.

Renna, M. C., Najimudin, N., Winik, L. R. and Zahler, S. A. (1993). Regulation of the Bacillus subtilis alsS, alsD, and alsR genes involved in post-exponential-phase production of acetoin. J Bacteriol 175, 3863-75.

Reponen, T. A., Gazenko, S. V., Grinshpun, S. A., Willeke, K. and Cole, E. C. (1998). Characteristics of airborne actinomycete spores. Appl Environ Microbiol 64, 3807-12.

Resch, A., Tedin, K., Grundling, A., Mundlein, A. and Blasi, U. (1996). Downstream box-anti-downstream box interactions are dispensable for translation initiation of leaderless mRNAs. EMBO J 15, 4740-8.

Richardson, M. A., Kuhstoss, S., Solenberg, P., Schaus, N. A. and Rao, R. N. (1987). A new shuttle cosmid vector, pKC505, for streptomycetes: its use in the cloning of three different spiramycin-resistance genes from a Streptomyces ambofaciens library. Gene 61, 231-41.

Rodicio, M. R., Bruton, C. J. and Chater, K. F. (1985). New derivatives of the Streptomyces temperate phage φC31 useful for the cloning and functional analysis of Streptomyces DNA. Gene 34, 283-92.

Rother, D., Mattes, R. and Altenbuchner, J. (1999). Purification and characterization of MerR, the regulator of the broad-spectrum mercury resistance genes in Streptomyces lividans 1326. Mol Gen Genet 262, 154-62.

Rothmel, R. K., Aldrich, T. L., Houghton, J. E., Coco, W. M., Ornston, L. N. and Chakrabarty, A. M. (1990). Nucleotide sequencing and characterization of Pseudomonas putida catR: a positive regulator of the catBC operon is a member of the LysR family. J Bacteriol 172, 922-31.

Rowland, S. S., Zulty, J. J., Sathyamoorthy, M., Pogell, B. M. and Speedie, M. K. (1992). The effect of signal sequences on the efficiency of secretion of a heterologous phosphotriesterase by Streptomyces lividans. Appl Microbiol Biotechnol 38, 94-100.

Rushing, B. G., Yelton, M. M. and Long, S. R. (1991). Genetic and physical analysis of the nodD3 region of Rhizobium meliloti. Nucleic Acids Res 19, 921-7.

Saarela, J., Ylikarppa, R., Rehn, M., Purmonen, S. and Pihlajaniemi, T. (1998). Complete primary structure of two variant forms of human type XVIII collagen and tissue-specific differences in the expression of the corresponding transcripts. Matrix Biology 16, 319-328.

Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406-25.

Sanger, F., Nicklen, S. and Coulson, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74, 5463-7.

Schell, M. A. and Sukordhaman, M. (1989). Evidence that the transcription activator encoded by the Pseudomonas putida nahR gene is evolutionarily related to the transcription activators encoded by the Rhizobium nodD genes. J Bacteriol 171, 1952-9.

Schell, M. A., Brown, P. H. and Raju, S. (1990). Use of saturation mutagenesis to localize probable functional domains in the NahR protein, a LysR-type transcription activator. J Biol Chem 265, 3844-50.

Schell, M. A. (1993). Molecular biology of the LysR family of transcriptional regulators. Annu Rev Microbiol 47, 597-626.

Scheu, A. K., Martinez, E., Soliveri, J. and Malpartida, F. (1997). abaB, a putative regulator for secondary metabolism in Streptomyces. FEMS Microbiol Lett 147, 29-36.

Schmitt-John, T. and Engels, J. W. (1992). Promoter constructions for efficient secretion expression in Streptomyces lividans. Appl Microbiol Biotechnol 36, 493-8.

Schrempf, H., Bujard, H., Hopwood, D. A. and Goebel, W. (1975). Isolation of covalently closed circular deoxyribonucleic acid from Streptomyces coelicolor A3(2). J Bacteriol 121, 416-21.

Schwacha, A. and Bender, R. A. (1993). The nac (nitrogen assimilation control) gene from Klebsiella aerogenes. J Bacteriol 175, 2107-15.

Senior, D. J., Hamilton, J., Bernier, R. L. and du Manoir, J. R. (1992). Reduction in chlorine use during bleaching of kraft pulp following xylanase pretreatment. Tappi Journal 75, 125.

Shapiro, S. (1989). Nitrogen assimilation in actinomycetes and the influence of nitrogen nutrition on actinomycete secondary metabolism. In Regulation of secondary metabolism in actinomycetes., pp. 149-153. Edited by S. Shapiro. Boca Raton: CRC Press.

Shean, C. S. and Gottesman, M. E. (1992). Translation of the prophage λ cI transcript. Cell 70, 513-22.

Sidhu, S. S., Kalmar, G. B. and Borgford, T. J. (1993). Characterization of the gene encoding the glutamic-acid-specific protease of Streptomyces griseus. Biochem Cell Biol 71, 454-61.

Sidhu, S. S., Kalmar, G. B., Willis, L. G. and Borgford, T. J. (1994). Streptomyces griseus protease C. A novel enzyme of the chymotrypsin superfamily. J Biol Chem 269, 20167-71.

Silen, J. L. and Agard, D. A. (1989). The α-lytic protease pro-region does not require a physical linkage to activate the protease domain in vivo. Nature 341, 462-4.

Smith, E. L., Markland, F. S., Kasper, C. B., DeLange, R. J., Landon, M. and Evans, W. H. (1966). The complete amino acid sequence of two types of subtilisin, BPN' and Carlsberg. J Biol Chem 241, 5974-6.

Sodek, J. and Hofmann, T. (1970). Large-scale preparation and some properties of penicillopepsin, the acid proteinase of Penicillium janthinellum. Can J Biochem 48, 425-31.

Sprengart, M. L., Fatscher, H. P. and Fuchs, E. (1990). The initiation of translation in E. coli: apparent base pairing between the 16S rRNA and downstream sequences of the mRNA. Nucleic Acids Res 18, 1719-23.

Springman, E. B., Angleton, E. L., Birkedal-Hansen, H. and Van Wart, H. E. (1990). Multiple modes of activation of latent human fibroblast collagenase: evidence for the role of a Cys73 active-site zinc complex in latency and a "cysteine switch" mechanism for activation. Proc Natl Acad Sci USA 87, 364-8.

Sreenath, H. K. and Joseph, R. (1982). Purification and properties of extracellular xylan hydrolases of Streptomyces exfoliatus. Folia Microbiol 27, 107-15.

Steitz, T. A. and Shulman, R. G. (1982). Crystallographic and NMR studies of the serine proteases. Annu Rev Biophys Bioeng 11, 419-444.

Stern, I. J. (1969). Biochemistry of chymopapain. Clin Orthop 67, 42-6.

Stock, J. B., Ninfa, A. J. and Stock, A. M. (1989). Protein phosphorylation and regulation of adaptive responses in bacteria. Microbiol Rev 53, 450-90.

Stragier, P., Danos, O. and Patte, J. C. (1983). Regulation of diaminopimelate decarboxylase synthesis in Escherichia coli. II. Nucleotide sequence of the lysA gene and its regulatory region. J Mol Biol 168, 321-31.

Strohl, W. R. (1992). Compilation and analysis of DNA sequences associated with apparent streptomycete promoters. Nucleic Acids Res 20, 961-74.

Strohl, W. R., Dickens, M. L. and DeSanti, C. L. (1999). Methods of Producing Doxorubicin. U.S. Patent 5,962,293. Assignee: The Ohio State University Research Foundation.

Sung, Y. C. and Fuchs, J. A. (1992). The Escherichia coli K-12 cyn operon is positively regulated by a member of the lysR family. J Bacteriol 174, 3645-50.

Taguchi, S., Kojima, S., Terabe, M., Kumazawa, Y., Kohriyama, H., Suzuki, M., Miura, K. and Momose, H. (1997). Molecular phylogenetic characterization of Streptomyces protease inhibitor family. J Mol Evol 44, 542-51.

Tang, L., Fu, H. and McDaniel, R. (2000). Formation of functional heterologous complexes using subunits from the picromycin, erythromycin and oleandomycin polyketide synthases. Chem Biol 7, 77-84.

Tedin, K., Moll, I., Grill, S., Resch, A., Graschopf, A., Gualerzi, C. O. and Blasi, U. (1999). Translation initiation factor 3 antagonizes authentic start codon selection on leaderless mRNAs. Mol Microbiol 31, 67-77.

Tercero, J. A., Espinosa, J. C. and Jimenez, A. (1998). StgR, a new Streptomyces alboniger member of the LysR family of transcriptional regulators. Mol Gen Genet 259, 475-83.

Thompson, C. J., Ward, J. M. and Hopwood, D. A. (1980). DNA cloning in Streptomyces: resistance genes from antibiotic-producing species. Nature 286, 525-7.

Thompson, C. J., Ward, J. M. and Hopwood, D. A. (1982a). Cloning of antibiotic resistance and nutritional genes in streptomycetes. J Bacteriol 151, 668-77.

Thompson, C. J., Kieser, T., Ward, J. M. and Hopwood, D. A. (1982b). Physical analysis of antibiotic-resistance genes from Streptomyces and their use in vector construction. Gene 20, 51-62.

Trigo, C. and Ball, A. S. (1994). Is the solubilized product from the degradation of lignocellulose by actinomycetes a precursor of humic substances? Microbiology 140, 3145-52.

Tyrrell, R., Verschueren, K. H., Dodson, E. J., Murshudov, G. N., Addy, C. and Wilkinson, A. J. (1997). The structure of the cofactor-binding fragment of the LysR family member, CysB: a familiar fold with a surprising subunit arrangement. Structure 5, 1017-32.

Urabe, H. and Ogawara, H. (1992). Nucleotide sequence and transcriptional analysis of activator-regulator proteins for β-lactamase in Streptomyces cacaoi. J Bacteriol 174, 2834-42.

Uren, J. R. and Neurath, H. (1974). Intrinsic enzymatic activity of bovine procarboxypeptidase A S5. Biochemistry 13, 3512-20.

van der Meer, J. R., Frijters, A. C., Leveau, J. H., Eggen, R. I., Zehnder, A. J. and de Vos, W. M. (1991). Characterization of the Pseudomonas sp. strain P51 gene tcbR, a LysR-type transcriptional activator of the tcbCDEF chlorocatechol oxidative operon, and analysis of the regulatory region. J Bacteriol 173, 3700-8.

Van Mellaert, L., Lammertyn, E., Schacht, S., Proost, P., Van Damme, J., Wroblowski, B., Anne, J., Scarcez, T., Sablon, E., Raeymaeckers, J. and Van Broekhoven, A. (1998). Molecular characterization of a novel subtilisin inhibitor protein produced by Streptomyces venezuelae CBS762.70. DNA Seq 9, 19-30.

Vara, J., Lewandowska-Skarbek, M., Wang, Y. G., Donadio, S. and Hutchinson, C. R. (1989). Cloning of genes governing the deoxysugar portion of the erythromycin biosynthesis pathway in Saccharopolyspora erythraea (Streptomyces erythreus). J Bacteriol 171, 5872-81.

Viale, A. M., Kobayashi, H., Akazawa, T. and Henikoff, S. (1991). rbcR, a gene coding for a member of the LysR family of transcriptional regulators, is located upstream of the expressed set of ribulose 1,5-bisphosphate carboxylase/oxygenase genes in the photosynthetic bacterium Chromatium vinosum. J Bacteriol 173, 5224-9.

Viikari, L., Kantelinen, A., Ratto, M. and Sundquist, J. (1991). Enzymes in pulp and paper processing. ACS Symposium Series 460, 12.

Virolle, M. J., Long, C. M., Chang, S. and Bibb, M. J. (1988). Cloning, characterisation and regulation of an α-amylase gene from Streptomyces venezuelae. Gene 74, 321-34.

von Heijne, G. (1986). A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 14, 4683-90.

von Lintig, J., Zanker, H. and Schroder, J. (1991). Positive regulators of opine-inducible promoters in the nopaline and octopine catabolism regions of Ti plasmids. Mol Plant Microbe Interact 4, 370-8.

Wang, S. J., Chang, H. M., Lin, Y. S., Huang, C. H. and Chen, C. W. (1999). Streptomyces genomes: circular genetic maps from the linear chromosomes. Microbiology 145, 2209-20.

Ward, J. M., Janssen, G. R., Kieser, T., Bibb, M. J. and Buttner, M. J. (1986). Construction and characterisation of a series of multi-copy promoter-probe plasmid vectors for Streptomyces using the aminoglycoside phosphotransferase gene from Tn5 as indicator. Mol Gen Genet 203, 468-78.

Warnes, A. and Stephenson, J. R. (1986). The insertion of large pieces of foreign genetic material reduces the stability of bacterial plasmids. Plasmid 16, 116-23.

Weaver, L. H., Kester, W. R. and Matthews, B. W. (1977). A crystallographic study of the complex of phosphoramidon with thermolysin. A model for the presumed catalytic transition state and for the binding of extended substances. J Mol Biol 114, 119-32.

Wek, R. C. and Hatfield, G. W. (1986). Nucleotide sequence and in vivo expression of the ilvY and ilvC genes in Escherichia coli K12. Transcription from divergent overlapping promoters. J Biol Chem 261, 2441-50.

Williams, S. T. (1978). Streptomyces in the soil ecosystem. In Nocardia and Streptomyces, pp. 137-144. Edited by M. Mordarski, W. Kurylowicz and N. Jeljaszewicz. New York: Gustav Fischer.

Williams, S. T., Goodfellow, M., Alderson, G., Wellington, E. M., Sneath, P. H. and Sackin, M. J. (1983). Numerical classification of Streptomyces and related genera. J Gen Microbiol 129, 1743-813.

Windhovel, U. and Bowien, B. (1991). Identification of cfxR, an activator gene of autotrophic CO2 fixation in Alcaligenes eutrophus. Mol Microbiol 5, 2695-705.

Winther, J. R. and Sorensen, P. (1991). Propeptide of carboxypeptidase Y provides a chaperone-like function as well as inhibition of the enzymatic activity. Proc Natl Acad Sci U S A 88, 9330-4.

Yamada, M., Izu, H., Nitta, T., Kurihara, K. and Sakurai, T. (1998). High-temperature, nonradioactive primer extension assay for determination of a transcription-initiation site. Biotechniques 25, 72-4, 76, 78.

Yen, K. M. and Gunsalus, I. C. (1982). Plasmid gene organization: naphthalene/salicylate oxidation. Proc Natl Acad Sci U S A 79, 874-8.

Yokote, Y. and Noguchi, V. (1969). Studies on enzymes produced by Streptomyces caespitosus part II. Crystallization and some properties of neutral protease. Agricult Chem 43, 132-138.

Zaman, S., Radnedge, L., Richards, H. and Ward, J. M. (1993). Analysis of the site for second-strand initiation during replication of the Streptomyces plasmid pIJ101. J Gen Microbiol 139, 669-76.

Appendix A

Table of plasmids used in the Streptomyces sp. strain C5 snpA/R study presented in CHAPTER 2

| Plasmid | Relevant Characteristics | Source/Reference |
|---|---|---|
| pIJ303 | 10.620 kbp, thiostrepton marker in pIJ101, Thio^r, sti+ | Kieser, 1982 |
| pIJ486 | 6.2 kbp, promoter-probe derivative of pIJ350 containing the aphII reporter gene of Tn5, HC, Thio^r | Ward, 1986 |
| pIJ702 | 5.65 kbp, derivative of pIJ350, HC, Thio^r, Mel+ | Katz, 1983 |
| pIJ4070 | 2.97 kbp, ermE* promoter fragment in pUC18, Amp^r | C. R. Hutchinson |
| pMALc-H#15 | 6.60 kbp, pMALc-H containing human endostatin cDNA | Merck & Co, Inc. |
| pUC19 | 2.686 kbp, Amp^r | Life Technologies |
| pKC505 | 18.7 kbp, cosmid derivative of SCP2*, SP, Apr^r | Richardson, 1987 |
| pANT42 | 8.76 kbp, S. sp. strain C5 snp locus in pIJ702, Thio^r | Lampel, 1992 |
| pANT54 | 3.9 kbp, snpR gene & intergenic region in pUC19, Amp^r | Lampel, 1992 |
| pANT806 | 3.937 kbp, aphII gene from pIJ486 in pUC19, Amp^r | This Study |
| pANT807 | 5.656 kbp, 3.0 kbp BamHI-HindIII fragment of pANT852 containing the snpR-aphII cassette in pUC19, Amp^r | This Study |
| pANT809 | 4.591 kbp, pANT807 minus SauI-StuI snpR DNA, Amp^r | This Study |
| pANT825 | 8.220 kbp, 5.293 kbp SphI-PstI fragment of pIJ702 in pIJ4070 with deleted EcoRI, SacI and KpnI sites, SP, Thio^r, Amp^r | This Study |
| pANT826 | 7.334 kbp, pANT825 with 39 bp synthetic SphI-MluI MCS fragment downstream of PermE*, SP, Thio^r, Amp^r | This Study |
| pANT827 | 8.617 kbp, 1.3 kbp EcoRI-HindIII fragment of pANT806 in pANT826, SP, Thio^r, Amp^r | This Study |
| pANT840 | 2.749 kbp, pUC19 plus linker containing modified MCS, disrupted lacZ-α, Amp^r | This Study |
| pANT841 | 2.746 kbp, pUC19 plus linker containing modified MCS, intact lacZ-α, Amp^r | This Study |
| pANT842 | 6.733 kbp, pANT42 minus 1.95 kbp KpnI fragment, Thio^r | This Study |
| pANT846 | 2.830 kbp, pANT841 plus linker containing modified MCS with meganuclease sites, intact lacZ-α, Amp^r | This Study |
| pANT849 | 5.343 kbp, pANT842 with 39 bp synthetic SphI-MluI MCS fragment inserted in place of the snpA gene, Thio^r | This Study |
| pANT852 | 6.626 kbp, 1.3 kbp EcoRI-HindIII fragment of pANT806 in pANT849, Thio^r, Neo^r | This Study |
| pANT853 | 5.561 kbp, 1.935 kbp BamHI-HindIII (snpR-aphII) fragment from pANT809 in pANT849, Thio^r, Neo^r | This Study |
| pANT855 | 4.799 kbp, pIJ702 with 39 bp synthetic SphI-MluI MCS fragment inserted in place of the mel genes, Thio^r | This Study |
| pANT856 | 6.082 kbp, 1.3 kbp EcoRI-HindIII fragment of pANT806 in pANT855, Thio^r, Neo^r | This Study |
| pANT857 | 7.97 kbp, 2.364 kbp PvuII fragment of pUC19 containing replicon and bla gene in pANT849, SP, Thio^r, Amp^r | This Study |
| pANT866 | 5.343 kbp, 1.527 kbp KpnI-NotI fragment of pIJ486 with altered rep gene (lacking BamHI site) in pANT849, Thio^r | This Study |
| pANT867 | 9.003 kbp, 1.3 kbp EcoRI-HindIII fragment of pANT806 in pANT857, Thio^r, Neo^r | This Study |
| pANT880 | 4.25 kbp, 1.6 kbp EcoRI-PstI fragment containing acc(3)-IV locus of pKC505 in pUC19, Amp^r, Apr^r | This Study |
| pANT881 | 3.498 kbp, 752 bp SstI fragment containing acc(3)-IV gene of pKC505 in pUC19, Amp^r | This Study |
| pANT883 | 3.638 kbp, 233 bp fragment containing the bifunctional E. coli-Streptomyces acc(3)-IV promoter in pANT881, Amp^r, Apr^r | This Study |
| pANT886 | 3.883 kbp, 1001 bp SmaI fragment of pANT806 in pANT883, downstream of acc(3)-IV promoter, Amp^r, Neo^r | This Study |
| pANT894 | 6.512 kbp, 3.297 kbp DraI-NdeI fragment of pANT866 in pANT886 variant with deleted EcoRI and HindIII sites, SP, Neo^r | This Study |
| pANT895 | 6.247 kbp, 3.297 kbp DraI-NdeI fragment of pANT866 in pANT883 variant with deleted EcoRI, HindIII, SphI and PstI sites, SP, Apr^r | This Study |
| pANT1200 | 8.052 kbp, pANT857 plus 104 bp synthetic XbaI-HindIII MCS-term fragment, SP, Thio^r, Amp^r | This Study |
| pANT1201 | 6.601 kbp, pANT894 plus 104 bp synthetic XbaI-HindIII MCS-term fragment, SP, Neo^r | This Study |
| pANT1202 | 6.336 kbp, pANT895 plus 104 bp synthetic XbaI-HindIII MCS-term fragment, SP, Apr^r | This Study |
| pANT3021 | 6.661 kbp, pANT1201 plus 103 bp synthetic ClaI-HindIII fragment comprising VAA-UP leader peptide, Neo^r, SP | This Study |
| pANT3022 | 6.661 kbp, pANT1201 plus 103 bp synthetic ClaI-HindIII fragment comprising VAA leader peptide, Neo^r, SP | This Study |
| pANT3023 | 6.661 kbp, pANT1201 plus 103 bp synthetic ClaI-HindIII fragment comprising VSI-UP leader peptide, Neo^r, SP | This Study |
| pANT3024 | 6.661 kbp, pANT1201 plus 103 bp synthetic ClaI-HindIII fragment comprising VSI leader peptide, Neo^r, SP | This Study |
| pANT3025 | 6.661 kbp, pANT1201 plus 103 bp synthetic ClaI-HindIII fragment comprising SNP leader peptide, Neo^r, SP | This Study |
| pANT3026 | 6.655 kbp, pANT1201 plus 97 bp synthetic ClaI-HindIII fragment comprising PEL leader peptide, Neo^r, SP | This Study |
| pANT3032 | 7.227 kbp, pANT3022 plus cDNA ORF from pMALc-H#15, Neo^r, SP | This Study |
| pANT3035 | 7.227 kbp, pANT3025 plus cDNA ORF from pMALc-H#15, Neo^r, SP | This Study |
| pANT3042 | 9.296 kbp, vaa-endo cassette from pANT3032 in pANT826, Thio^r, SP | This Study |
| pANT3045 | 9.296 kbp, snp-endo cassette from pANT3035 in pANT826, Thio^r, SP | This Study |
| pANT3052 | 12.582 kbp, vaa-endo cassette from pANT3032 in pIJ303, Thio^r | This Study |

Table A.1. Plasmids used in CHAPTER 2

*Thio, thiostrepton; Mel, melanin; Neo, neomycin; Apr, apramycin; Amp, ampicillin; SP, E. coli-Streptomyces shuttle plasmid.

Appendix B

Streptomyces screened for snp DNA in CHAPTER 3

| OSUMA# | Species | External Collection ID | MA Number |
|---|---|---|---|
|  | S. avermitilis |  | MA4680 |
| 1 | S. spectabilis | ATCC27465 | MA6231 |
| 2 | S. griseus | NRRL B8090 | MA6230 |
| 3 | S. thermotolerans | ATCC11416 | MA5961 |
| 4 | S. violaceoruber | ATCC14980 | MA5963 |
| 5 | S. griseoruber | ATCC23919 | MA5964 |
| 6 | S. platensis | ATCC13865 | MA5969 |
| 7 | S. viridogenes | ATCC3372 | MA5979 |
| 8 | S. rutgersensis | ATCC15191 | MA5987 |
| 9 | S. endus | ATCC23904 | MA5988 |
| 10 | unspeciated | ATCC11862 | MA5989 |
| 11 | S. flavogriseus | ATCC25452 | MA5992 |
| 12 | S. hygroscopicus | ATCC27438 | MA5993 |
| 13 | S. todomminensis | ATCC31489 | MA5994 |
| 14 | unspeciated | ATCC31358 | MA6066 |
| 15 | S. alboniger | ATCC12461 | MA6093 |
| 16 | S. antibioticus | ATCC11891 | MA6096 |
| 17 | S. carzinostaticus | ATCC15945 | MA6113 |
| 18 | S. gardnerii | ATCC9604 | MA6119 |
| 19 | S. noursei | ATCC11455 | MA6120 |
| 20 | S. chattanoogensis | ATCC13358 | MA6121 |
| 21 | S. nodosus | ATCC14899 | MA6122 |
| 22 | S. silvensis | ATCC53525 | MA6272 |
| 23 | S. roseochromogenes | NRRL B1233 | MA6187 |
| 24 | S. fulvissimus | NRRL B1456 | MA6188 |
| 25 | S. platensis | NRRL 2364 | MA6189 |
| 26 | S. akiyoshiensis | ATCC13480 | MA6145 |
| 27 | unspeciated | ATCC53527 | MA6287 |
| 28 | S. jamaicensis | NCIB10166 | MA6336 |
| 29 | S. virginiae | ATCC13161 | MA6341 |
| 30 | S. mitakaensis | ATCC15297 | MA6342 |
| 31 | unspeciated | ATCC53929 | MA6348 |
| 32 | S. violaceoruber | ATCC3355 | MA2921 |
| 33 | S. griseolus | ATCC3325 | MA413 |
| 34 | S. olivaceus | ATCC3335 | MA236 |
| 35 | S. olivochromogenus | ATCC3336 | MA494 |
| 36 | S. rochei | ATCC10739 | MA2923 |
| 37 | S. argenteolus | ATCC11009 | MA318 |
| 38 | S. viridifaciens | ATCC11989 | MA450 |
| 39 | S. parvus | ATCC12320 | MA428 |
| 40 | unspeciated | ATCC12463 | MA4749 |
| 41 | S. aureofaciens | ATCC12552 | MA601 |
| 42 | S. cellulosae | ATCC12625 | MA420 |
| 43 | S. graminofaciens | ATCC12705 | MA317 |
| 44 | S. melanogenes | ATCC12851 | MA4750 |
| 45 | S. flocculus | ATCC25453 | MA6528 |
| 48 | unspeciated | ATCC14892 | MA4087 |
| 49 | S. azureus | ATCC14921 | MA4507 |
| 50 | S. fimbriatus | ATCC15051 | MA2826 |
| 51 | S. caelestis | ATCC15084 | MA4359 |
| 53 | S. prasinus | ATCC15825 | MA2942 |
| 54 | S. spadicis | ATCC19017 | MA4128 |
| 55 | S. cacaoi | ATCC19093 | MA4855 |
| 56 | S. thermovulgaris | ATCC19284 | MA4069 |
| 57 | S. althioticus | ATCC19724 | MA4508 |
| 58 | S. canus | ATCC19737 | MA4751 |
| 59 | S. echinatus | ATCC19748 | MA4360 |
| 60 | S. eurythermus | ATCC19749 | MA4742 |
| 62 | S. nitrosporeus | ATCC19792 | MA4743 |
| 63 | S. resistomycificus | ATCC19804 | MA4744 |
| 64 | S. thermoviolaceus | ATCC19994 | MA4068 |
| 65 | unspeciated | ATCC21001 | MA4857 |
| 66 | unspeciated | ATCC21021 | MA1157 |
| 67 | S. fradiae | ATCC21096 | MA2898 |
| 68 | S. viridochromogenes | ATCC21240 | MA2916 |
| 69 | S. achromogenes | ATCC12767 | MA6518 |
| 70 | S. olivaceus | ATCC21379 | MA4342 |
| 71 | unspeciated | ATCC21386 | MA4381 |
| 72 | S. tateyamensis | ATCC21389 | MA4296 |
| 73 | S. rimosus | ATCC21484 | MA4469 |
| 74 | S. kanamyceticus | ATCC21486 | MA4473 |
| 75 | S. namiwaensis | ATCC21689 | MA4343 |
| 76 | S. albus | ATCC21838 | MA4766 |
| 77 | S. chartreusis | ATCC21999 | MA4716 |
| 79 | S. purpurascens | ATCC23871 | MA4510 |
| 80 | S. ambofaciens | ATCC23877 | MA5043 |
| 81 | S. filipinensis | ATCC23905 | MA4891 |
| 82 | S. gelaticus | ATCC23912 | MA4752 |
| 83 | S. griseinus | ATCC23915 | MA4753 |
| 84 | S. matensis | ATCC23935 | MA4511 |
| 85 | S. actuosus | ATCC25421 | MA4512 |
| 86 | S. conglobatus | ATCC31005 | MA4717 |
| 87 | S. diastatochromogenes | ATCC31013 | MA4726 |
| 88 | unspeciated | ATCC31358 | MA5251 |
| 90 | S. violaceusniger | NRRL B1356 | MA4616 |
| 91 | unspeciated | NRRL B1364 | MA607 |
| 92 | S. virginiae | NRRL B1446 | MA443 |
| 93 | S. chrysomallus | NRRL 2250 | MA4201 |
| 94 | S. fellus | NRRL 2251 | MA439 |
| 95 | S. venezuelae | NRRL 2277 | MA260 |
| 96 | unspeciated | NRRL 2294 | MA437 |
| 97 | S. halstedii | ATCC10897 | MA6768 |
| 98 | S. griseoviridis | ATCC23920 | MA4550 |
| 99 | S. garyphalus | NRRL 2448 | MA266 |
| 100 | S. spheroides | NRRL 2449 | MA319 |

Table B.1. Streptomyces species screened for snp DNA.

Appendix C

Nucleotide Sequences of Cloned snp Loci

1 cggcccagcg gggtggggcg gcagccggtg cgctcgcggg tgaacagctc gccgcccagc gccgggtcgc cccaccccgc cgtcggccac gcgagcgccc acttgtcgag cggcgggtcg

61 tcctgttcga tgcggcgcaa ctgcgtgctc aacgagggct gtgccacgcc gagttggcgg aggacaagct acgccgcgtt gacgcacgag ttgctcccga cacggtgcgg ctcaaccgcc

121 gcggcacggt gcaggctgcc ggtgtccgct atggcgcaca gcgcgcggag gtgcctgacc cgccgtgcca cgtccgacgg ccacaggcga taccgcgtgt cgcgcgcctc cacggactgg

181 tcaagctcca tgcctgcgga gcctaagccg gaacttaaag attcaccaga ctcacaaacc agttcgaggt acggacgcct cggattcggc cttgaatttc taagtggtct gagtgtttgg

241 ccatgcaaca caggcgagtc ggcgcacctc aggcccccac ctcggacccg gtggccggga ggtacgttgt gtccgctcag ccgcgtggag tccgggggtg gagcctgggc caccggccct

301 tacgtccgtc ctatcaccac ttgccatcat cacagccgcc gcacaggcat caacactcac atgcaggcag gatagtggtg aacggtagta gtgtcggcgg cgtgtccgta gttgtgagtg

361 cgacgacgac ttcacccccc acccacaagg agtcatcgat gcgcatgtcc acgtccgccc gctactgctg aagtgggggg tgggtgttcc tcagtagcta cgcgtacagg tgcaggcggg

421 tcgcggcggc tgcggtcggt ctgagtctcg cgaccgcctc gctgagcatg gccgtaccgg agcgccgccg acgccagcca gactcagagc gctggcggag cgactcgtac cggcatggcc

481 ccacggccgc gcccgtcgca ccggccgccg ccacggcgta ctcgggctac accggctcgg ggtgccggcg cgggcagcgt ggccggcggc ggtgccgcat gagcccgatg tggccgagcc

541 ccgtggaggc caaggccaac caggcgttct tcgaggccgt catcaagtcc gtcgccgaga ggcacctccg gttccggttg gtccgcaaga agctccggca gtagttcagg cagcggctct

601 agcgcgccgc ccagccgaga agcgccgcgg ccgtcaccgt cgtgtgcgac gcctcccgcg tcgcgcggcg ggtcggctct ccgcggcgcc ggcagtggca gcacacgctg cggagggcgc

661 cgccctcgtt cagctccgtg atagcccgca gcacccagat atggaacggc tcggtgtcga gcgggagcaa gtcgaggcac tatcgggcgt cgtgggtcta taccttgccg agccacagct

721 acgtgaagct ccagtcgggc tccgcgtcga atgccgactt cagctaccgc gagggcaacg tgcacttcga ggtcagcccg aggcgcagct tacggctgaa gtcgatggcg ctcccgttgc

781 actcccgtgg ctcgtacgcc tccacggacg gtcacggcag cggctatgtc ttcctcgact tgagggcacc gagcatgcgg aggtgcctgc cagtgccgtc gccgatacag aaggagctga

841 acgcgcagaa ccggcagtac gactccaccc gcgtgaccgc ccacgagacc gggcacgtgc tgcgcgtctt ggccgtcatg ctgaggtggg cgcactggcg ggtgctctgg cccgtgcacg

901 tcggtcttcc cgaccactac accgggccgt gcagtgagct gatgtcgggc ggcggccccg agccagaagg gctggtgatg tggcccggca cgtcactcga ctacagcccg ccgccggggc

961 gcccgtcctg caccaacgcc cgggcaggac gtggttgcgg

Figure C.1. Nucleotide sequence of S. avermitilis snp locus fragment.
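Each line of the listings in this appendix gives a position, a run of one strand in blocks of ten nucleotides, and then the complementary strand written in the same left-to-right order. The following is a minimal sketch, assuming that layout, of how one line could be checked for agreement between the two strands. The function names are hypothetical and are not part of the original work.

```python
# Hypothetical helpers; assumes each listing line has the form
# "<position> <top-strand blocks> <complement-strand blocks>".
COMPLEMENT = str.maketrans("acgt", "tgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement (not reversed) of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)


def split_listing_line(line: str):
    """Split one listing line into (position, top strand, bottom strand)."""
    fields = line.split()
    position = int(fields[0])
    blocks = fields[1:]
    half = len(blocks) // 2  # first half: top strand, second half: bottom strand
    return position, "".join(blocks[:half]), "".join(blocks[half:])


def mismatches(line: str):
    """Return 0-based offsets where the bottom strand is not the complement of the top."""
    _, top, bottom = split_listing_line(line)
    expected = complement(top)
    return [i for i, (e, b) in enumerate(zip(expected, bottom)) if e != b]


# Last line of the Figure C.1 listing (position 961).
line = "961 gcccgtcctg caccaacgcc cgggcaggac gtggttgcgg"
print(split_listing_line(line))
print(mismatches(line))
```

An empty list from `mismatches` means the two strands on that line agree; a non-empty list points at positions worth re-reading against the original print.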

1 aattgtaata cgactcacta tagggcgaat tgggccctct agatgcatgc tcgagcggcc ttaacattat gctgagtgat atcccgctta acccgggaga tctacgtacg agctcgccgg

61 gccagtgtga tggatatctg cagaattcgc ccttcggccc agcggggtgg ggcggcagcc cggtcacact acctatagac gtcttaagcg ggaagccggg tcgccccacc ccgccgtcgg

121 ggtgcgctcg cgggtgaaga gctggccgcc caccgcgtgc tcgatgcggc gcagttgggt ccacgcgagc gcccacttct cgaccggcgg gtggcgcaca agctacgccg cgtcaaccca

181 ggtcaacgag ggctggctca tgccgagttg gcgggcggcc ttgtgcaggc tgccgctgtc ccagttgctc ccgaccgagt acggctcaac cgcccgccgg aacacgtccg acggcgacag

241 agcgatggcg cagagtgcgc ggaggtgcct gacctcgagc tccatggccg tgagcgtaga tcgctaccgc gtctcacgcg cctccacgga ctggagctcg aggtaccggc actcgcatct

301 gcgggttgcg gcggcacacc agcctcccaa acacccatgg aaccgtagaa gtcgcccctc cgcccaacgc cgccgtgtgg tcggagggtt tgtgggtacc ttggcatctt caacggggag

361 cgtccggtga ccggatgtgc ggcgataggc ccgtgctatc ggcacctgcc atcatccgcg gcaggccact ggcctacacg ccgctatccg ggcacgatag ccgtggacgg tagtaggcgc

421 acgcgccagg ctgcccgaca ctcacggata ccgaaacgtt catctttcac cgaacgagta tgcgcggtcc gacgggctgt gagtgcctat ggctttgcaa gtagaaagtg gcttgctcat

481 gggagccccc cacatgcaca agcgttatgt gtccgtcgcc gccgcactcg gactcgcggt ccctcggggg gtgtacgtgt tcgcaataca caggcagcgg cggcgtgagc ctgagcgcca

541 cgccgcgctc ggtacctcct ccgtcgcctc ggccgccacg tcggccgact cggccaagtc gcggcgcgag ccatggagga ggcagcggag ccggcggtgc agccggctga gccggttcag

601 caccaaggcc accgtatcgg ccgcccggta cgcgggctcc gccgaagagg cggccgccaa gtggttccgg tggcatagcc ggcgggccat gcgcccgagg cggcttctcc gccggcggtt

661 caaggcgttc ttcgaggccg tcgtcaaggc ggtcgccgag aagcgtgccg cgaaccccgg gttccgcaag aagctccggc agcagttccg ccagcggctc ttcgcacggc gcttggggcc

721 cgtcaaggcc gtcacggtga cctacagcac ccagcgcgcc ccgtcgttcc ggcagcagat gcagtcccgg cagtgccact ggatgtcgtg ggtcgcgcgg ggcagcaagg ccgtcgtcta

781 cgccacgagc acctccatat ggaacgcggc cgtctccaac gtcaagctcc aggagggtac gcggtgctcg tggaggtata ccttgcgccg gcagaggttg cagttcgagg tcctcccatg

841 gagcggcacg agcttcgagt accgcgaggg caacgacccg cgcggctcgt acgcgagcac ctcgccgtgc tcgaagctca tggcgctccc gttgctgggc gcgccgagca tgcgctcgtg

901 caacggccac ggtcgcggct acatcttcct cgactaccgg cagaaccagc agtacaactc gttgccggtg ccagcgccga tgtagaagga gctgatggcc gtcttggtcg tcatgttgag

961 cacccgcgtg accgcccacg agaccgggca cgtgctcggc ctccccgaca actaccgcgg gtgggcgcac tggcgggtgc tctggcccgt gcacgagccg gaggggctgt tgatggcgcc

1021 cccgtgctcg gagctcatgt ccggcggcgg ctggatccga gctcggtacc aagcttggcg gggcacgagc ctcgagtaca ggccgccgcc gacctaggct cgagccatgg ttcgaaccgc

Figure C.2. Nucleotide sequence of S. spectabilis snp locus fragment.

1 aatacgactc actatagggc gaattgggcc ctctagatgc atgctcgagc ggccgccagt ttatgctgag tgatatcccg cttaacccgg gagatctacg tacgagctcg ccggcggtca

61 gtgatggata tctgcagaat tcgcccttcg gcccagcggg gtggggcggc agccggtgcg cactacctat agacgtctta agcgggaagc cgggtcgccc caccccgccg tcggccacgc

121 ctcccgggtg aacagcggtc cgcccagggc ctgttcgatg cgggtcaact gagtgctcaa gagggcccac ttgtcgccag gcgggtcccg gacaagctac gcccagttga ctcacgagtt

181 cgtgggctgt gcgacaccca gtcggcgtgc cgcccggtgc aggctgccgg cgtcggcgat gcacccgaca cgctgtgggt cagccgcacg gcgggccacg tccgacggcc gcagccgcta

241 ggcgcacagt gcgcgtaggt gcctcacctc gagctccatg cagggagcgt aaagcggaac ccgcgtgtca cgcgcatcca cggagtggag ctcgaggtac gcccctcgca tttcgccttg

301 agttggttgc gccaggtgaa caaaacgcgg cggatcaggg cgagttctgc actctggtca tcaaccaacg cggtccactt gttttgcgcc gcctagtccc gctcaagacg tgagaccagt

361 aagctggaac gagagtggcc gggcggtggg tgatagcccg gccctatcac ttgttgccat ttcgaccttg ctctcaccgg cccgccaccc actatcgggc cgggatagtg aacaacggta

421 caccacagcg ggctcatggg cgccccacac tcaccggtga cgacttctcc ccactccccc gtagtgtcgc ccgagtaccc gcggggtgtg agtggccact gctgaagagg ggtgaggggg

481 gctcaaggag tcatcgatgc gtatcaccct gccccttctt tccaccgcgg tcggtctcgg cgagttcctc agtagctacg catagtggga cggggaagaa aggtggcgcc agccagagcc

541 cctgacggcc gccgtgctcg gcaccggccc cgccgcgacg gccgcggcgc cccaggagcc ggactgccgg cggcacgagc cgtggccggg gcggcgctgc cggcgccgcg gggtcctcgg

601 ggtcagagcc gcccagctcg gctaccagcc ctcggccggc tcgggcgagg acgcggccgc ccagtctcgg cgggtcgagc cgatggtcgg gagccggccg agcccgctcc tgcgccggcg

661 caaccgcgcg tccttcgagg cggtcgtcaa gtccgtcgcc gagaagcgcg ccgccaaccc gttggcgcgc aagaagctcc gccagcagtt caggcagcgg ctcttcgcgc ggcggttggg

721 gtccgccgcc gcggccgtca ccgtctacta cagcgccacc aacgcgccga gcttccgttc caggcggcgg cgccggcagt ggcagatgat gtcgcggtgg ttgcgcggct cgaaggcaag

781 ccagatatcc cgctccgccc agatctggaa cagctcggtg tccaacgtac ggctcgcgga ggtctatagg gcgaggcggg tctagacctt gtcgagccac aggttgcatg ccgagcgcct

841 gtcgagttcc ggcgcggact tcgcgtacta cgagggcaac gactcgcgcg gctcgtacgc cagctcaagg ccgcgcctga agcgcatgat gctcccgttg ctgagcgcgc cgagcatgcg

901 gtccacggac gggcacggca gcggctacat cttcctcgac taccgccaga accagcagta caggtgcctg cccgtgccgt cgccgatgta gaaggagctg atggcggtct tggtcgtcat

961 cgactcgacc cgcgtgaccg cccacgagac cgggcacgtg ctcggcctgc ccgaccacta gctgagctgg gcgcactggc gggtgctctg gcccgtgcac gagccggacg ggctggtgat

1021 ctccgggccg tgcagcgagc tgatgtcggg cggcggctgg atccgagctc ggtaccaagc gaggcccggc acgtcgctcg actacagccc gccgccgacc taggctcgag ccatggttcg

1081 ttggcgtaat c aaccgcatta g

Figure C.3. Nucleotide sequence of S. griseoruber snp locus fragment.

1 tgtaatacga ctcactatag ggcgaattgg gccctctaga tgcatgctcg agcggccgcc acattatgct gagtgatatc ccgcttaacc cgggagatct acgtacgagc tcgccggcgg

61 agtgtgatgg atatctgcag aattcgccct tcggcccagc ggggtggggc ggcagccgtc tcacactacc tatagacgtc ttaagcggga agccgggtcg ccccaccccg ccgtcggcag

121 gcggccccgg tggaagagtt cggcgccgag gctctgttcg atgcggcgca gttgtgtggt cgccggggcc accttctcaa gccgcggctc cgagacaagc tacgccgcgt caacacacca

181 cagtgcgggc tggctcacgc cgagttcgcg tgcggcgcgg cgcacgcttc cggtgtcggc gtcacgcccg accgagtgcg gctcaagcgc acgccgcgcc gcgtgcgaag gccacagccg

241 gatggcgcag agcgcgcgaa ggtgacgcac ctcgagctcc atggaccgag gctagaagcg ctaccgcgtc tcgcgcgctt ccactgcgtg gagctcgagg tacctggctc cgatcttcgc

301 ccccgcccgc cccgccagtg gtcccaagca cgccaaatgc cctcaaccat ccggaagggt ggggcgggcg gggcggtcac cagggttcgt gcggtttacg ggagttggta ggccttccca

361 acgggacgtg gccaggaata taagaccggg ttatcgcagg ttggcatcat gtgcccgacg tgccctgcac cggtccttat attctggccc aatagcgtcc aaccgtagta cacgggctgc

421 tcggtgcgct ccagactcgt tcccggaccg aactcctcgg tccggtacgc ggcggacccc agccacgcga ggtctgagca agggcctggc ttgaggagcc aggccatgcg ccgcctgggg

481 actccgcccg tgccgcggac agtcaggaga cccccacatg cgtatgaccc ggaccggctc tgaggcgggc acggcgcctg tcagtcctct gggggtgtac gcatactggg cctggccgag

541 cgccctcgcc gggctcggcc tcgccgtcgc cgccgccctc ggctcggtcg cccccgcctc gcgggagcgg cccgagccgg agcggcagcg gcggcgggag ccgagccagc gggggcggag

601 ggccgccgcc gagacgtcga ccccgcgctc ggtcgccgcc tacgaggcgt ccaccgagaa ccggcggcgg ctctgcagct ggggcgcgag ccagcggcgg atgctccgca ggtggctctt

661 cgccgccgcc acccgcgcct tccaggaggc ggtcatgaag gcggtcgccg agaagcgcgc gcggcggcgg tgggcgcgga aggtcctccg ccagtacttc cgccagcggc tcttcgcgcg

721 cgccaacccg ggcgcgctcg ccgtcaccgt cacctacgac gcctcggccg cccccacctt gcggttgggc ccgcgcgagc ggcagtggca gtggatgctg cggagccggc gggggtggaa

781 ccgctcgcag atcgccagct ccacctcgat ctggaacggc gccgtctcca acgtccgcct ggcgagcgtc tagcggtcga ggtggagcta gaccttgccg cggcagaggt tgcaggcgga

841 ccaggaaggc tccaacgccg acttcacgta ccgcgagggc aacgacccgc gcggctcgca ggtccttccg aggttgcggc tgaagtgcat ggcgctcccg ttgctgggcg cgccgagcgt

901 cgccagcacc gacggccacg gccggggcta catcttcctc gactacgccc agaaccagca gcggtcgtgg ctgccggtgc cggccccgat gtagaaggag ctgatgcggg tcttggtcgt

961 gtacaactcc acccgggtga ccacccacga gaccggccac gtgctcggcc tgcccgacac catgttgagg tgggcccact ggtgggtgct ctggccggtg cacgagccgg acgggctgtg

1021 ctactccggc ccgtgcagcc agctgatgtc cggcggcggc tggatccgag cccggtacca gatgaggccg ggcacgtcgg tcgactacag gccgccgccg acctaggctc gagccatggt

1081 agcttggcgt aatcatggtc ata tcgaaccgca ttagtaccag tat

Figure C.4. Nucleotide sequence of Streptomyces ATCC 11862 snp locus fragment.

1 cggcccagcg gggtggggcg gcagccggtg tgggagcgca ggaagagcgg tccgcccagc gccgggtcgc cccaccccgc cgtcggccac accctcgcgt ccttctcgcc aggcgggtcg

61 tcctgctcga tccggcgcag ttgggtgctc aacgacggct gggcgacgcc cagctgccgg aggacgagct aggccgcgtc aacccacgag ttgctgccga cccgctgcgg gtcgacggcc

121 gcggcacggt gcagactgcc ggtgtcggct atggcgcaca gcgcgcggag gtgcctcacc cgccgtgcca cgtctgacgg ccacagccga taccgcgtgt cgcgcgcctc cacggagtgg

181 tcgagctcca tgctctgtga ggctaaagcg gaaccgcggc gttaaccagc cgcataatcc agctcgaggt acgagacact ccgatttcgc cttggcgccg caattggtcg gcgtattagg

241 atcgctaaac caggtgaact gggcagccga tagtgccgtc ctatccctgg ttgccatcat tagcgatttg gtccacttga cccgtcggct atcacggcag gatagggacc aacggtagta

301 cacaggcgcc gcacaggcat cgagactcac cggtgagact caaccccacc caaggagttc gtgtccgcgg cgtgtccgta gctctgagtg gccactctga gttggggtgg gttcctcaag

361 tcgatgcgaa agtccctgtc ttccctggcg gttctcggtc tcggcctcac gatggccgga agctacgctt tcagggacag aagggaccgc caagagccag agccggagtg ctaccggcct

421 ctcggaacgg cttcccccgc cgcggcgtcc ccggtcgcgg agccccacgc cgggttcgtc gagccttgcc gaagggggcg gcgccgcagg ggccagcgcc tcggggtgcg gcccaagcag

481 acccagtcgt ccgcctcggc gggagccgac gccgacgcga gcagagcctt cttccaggcg tgggtcagca ggcggagccg ccctcggctg cggctgcgct cgtctcggaa gaaggtccgc

541 gtcctgaagt ccgtcgccga gaagcgcgcc gcgaacccga gtgccaccgc ggccgtcacc caggacttca ggcagcggct cttcgcgcgg cgcttgggct cacggtggcg ccggcagtgg

601 gtcgtctacg acgcctcccg cgcgccgacg ttcagcgccc agatagcgcg cagcacccag cagcagatgc tgcggagggc gcgcggctgc aagtcgcggg tctatcgcgc gtcgtgggtc

661 atatggaaca gctcggtgtc gaacgtgaag ctccagtcgg gctccgcttc ggcggccgac tataccttgt cgagccacag cttgcacttc gaggtcagcc cgaggcgaag ccgccggctg

721 ttcagctacc gggagggcaa cgactcccgc ggctcgtacg cctccaccga cggtcacggc aagtcgatgg ccctcccgtt gctgagggcg ccgagcatgc ggaggtggct gccagtgccg

781 agcggctaca tcttcctgga ctaccggcag aaccagcagt acgactccac ccgggtcacc tcgccgatgt agaaggacct gatggccgtc ttggtcgtca tgctgaggtg ggcccagtgg

841 gcccacgaga cgggtcacgt cctcggtctt cccgaccact actcgggccc gtgcagtgag cgggtgctct gcccagtgca ggagccagaa gggctggtga tgagcccggg cacgtcactc

901 ctgatgtccg gtggcggct gactacaggc caccgccga

Figure C.5. Nucleotide sequence of S. thermotolerans snp locus fragment.

1 taatacgact cactataggg cgaattgggc cctctagatg catgctcgag cggccgccag attatgctga gtgatatccc gcttaacccg ggagatctac gtacgagctc gccggcggtc

61 tgtgatggat atctgcagaa ttcgcccttc ggcccagcgg ggtggggcgg cagccggtgc acactaccta tagacgtctt aagcgggaag ccgggtcgcc ccaccccgcc gtcggccacg

121 gttccctgat gaacagcggc ccgcccaggg cctgttcgat cctggtcagc tgggtgctca caagggacta cttgtcgccg ggcgggtccc ggacaagcta ggaccagtcg acccacgagt

181 aggtgggctg ggcgacgccc agttggcggg ccgcccggtg caggctgccg gcgtcggcga tccacccgac ccgctgcggg tcaaccgccc ggcgggccac gtccgacggc cgcagccgct

241 tggcgcacag cgcgcgtagg tgcctcacct cgagctccat gcagggagcg taaggcggaa accgcgtgtc gcgcgcatcc acggagtgga gctcgaggta cgtccctcgc attccgcctt

301 cagttggttg cgccaggtga acaaaaggcg gcggaacagg gcgagttccg cactccggtc gtcaaccaac gcggtccact tgttttccgc cgccttgtcc cgctcaagac gtgaggccag

361 gaagccgttc ccggccggtc gccggggtgg gccgaggccg tgcgggccgg ccggtgatag cttcggcaag ggccggccag cggccccacc cggctccggc acgcccggcc ggccactatc

421 ccccggccta tcacctgttg ccatcatcac agcgggctca tgggcgctcc acactcaccg ggggccggat agtggacaac ggtagtagtg tcgcccgagt acccgcgagg tgtgagtggc

481 gcgacgactt ctccccacac ccccacacaa ggagtcatcg atgcgcatca gcctgcccct cactgctgaa gaggggtgtg ggggtgtgtt cctcagtagc tacgcgtagt cggacgggga

541 gctctccacc gcggtcggac tcggcctgac cgcggccgtc ctcggcgccc cgaccgcgac cgagaggtgg cgccagcctg agccggactg gcgccggcag gagccgcggg gctggcgctg

601 cgtcgccgct ccccaggcgc cggccccggc cgcccagctc ggctaccagc ccgccgccgg gcagcggcga ggggtccgcg gccggggccg gcgggtcgag ccgatggtcg ggcggcggcc

661 ctcgggcgag gacgccgccg cgaaccgggc gttcttcgag gcggtcatcg agtccgtcgc gagcccgctc ctgcggcggc gcttggcccg caagaagctc cgccagtagc tcaggcagcg

721 cgagaagcgc gccgcgaacc cgtcctccac ggcggccgtc accgtctact acagcgccgc gctcttcgcg cggcgcttgg gcaggaggtg ccgccggcag tggcagacga tgtcgcggcg

781 caacgcgccc agcttccgca cgcagatagc ccgttccacc cagatctgga acagctcggt gttgcgcggg tcgaaggcgt gcgtctatcg ggcaaggtgg gtctagacct tgtcgagcca

841 ccccaacgtc agactcgccg agagcagctc gggagcggac tccgcgtact acgagggcaa gaggttgcag tctgagcggc tctcgtcgag ccctcgcctg aagcgcatga tgctcccgtt

901 cgactcgcgc ggttcgtacg cctccaccga cggacacggc aacggctaca ttttcctcga gctgagcgcg ccaagcatgc ggaggtggct gcctgtgccg ttgccgatgt aaaaggagct

961 ctaccggcag aaccagcagt acgactcgac ccgggtgacc gcgcacgaga cggggcacgt gatggccgtc ttggtcgtca tgctgagctg ggcccactgg cgcgtgctct gccccgtgca

1021 gctcggcctg cccgaccact actccgggcc gtgcagcgag ctgacgtcgg gcggcggttg cgagccggac gggctggtga tgaggcccgg cacgtcgctc gactacagcc cgccgccaac

1081 gatccgagct cggtaccaag cttgatgcat agcttgagta ttctatagtg tcacctaaat ctaggctcga gccatggttc gaactacgta tcgaactcat aagatatcac agtggattta

1141 agctt tcgaa

Figure C.6. Nucleotide sequence of S. rochei snp locus fragment.

1 ctcactatag ggcgaattgg gccctctaga tgcatgctcg agcggccgcc agtgtgatgg gagtgatatc ccgcttaacc cgggagatct acgtacgagc tcgccggcgg tcacactacc

61 atatctgcag aattcgccct tcggcccagc ggggtggggc ggcagccggt gcgctcccgg tatagacgtc ttaagcggga agccgggtcg ccccaccccg ccgtcggcca cgcgagggcc

121 gtgaagagct gtccgccgac ggcctgctcg atgcggcgca gctgggtcgt caacgacggc cacttctcga caggcggctg ccggacgagc cacgccgcgt cgacccagca gttgctgccg

181 tgtgtcatgc cgagttggcg ggccgccttg tgcacgctgc cggtgtcggc tatcgcgcac acacagtacg gctcaaccgc ccggcggaac acgtgcgacg gccacagccg atagcgcgtg

241 agtgcacgaa gatgcctcac ttcgagctcc acagggcgga gcataacggc ggtgtcacac tcacgtgctc ctacggagcg aagctcgagg tgtcccgcct cgtattgccg ccacagtgtg

301 ggcacaccag acagcgaccg gccggtgaac acgccgcgaa atcggcccgg agcgcccggt ccgtgtggtc tgtcgctggc cggccacttg tgcggcgctt tagccgggcc tcgcgggcca

361 gatgggcccg tgttatcgcc agatggcatc attccgcaac gcggtccgcc cgaacagact ctacccgggc acaatagcgg tctaccgtag taaggcgttg cgccaggcga gcttgtctga

421 ctcccgcagc aggaatccgt ccccagggtc cggcgacgcc cccacgccag cacgccccac gagggcgtcg cccttaggca ggggtcccag gccgctgcgg gggtgcggtc gtgcggggtg

481 gtcgcggcgc cggccctcgg agcctaagga gaccccccga tgcgataccg cagatcggct cagcgccgcg gccgggagcc tcggattcct ctggggggct acgctatggc gtctagccga

541 gcctcggcgg cgctcggcct cggtgtgacc gtggccctcg gactgggcgc cgtgcccgcg cggagccgcc gcgagccgga gccacactgg caccgggagc ctgacccgcg gcacgggcgc

601 accgcgtccg acgcggcagc cccggctccc cgccaccaga cggcgtctgc ccccgcggag tggcgcaggc tgcgccgtcg gggccgaggg gcggtggtct gccgcagacg ggggcgcctc

661 gccaccaccc agggctacac gggcgagagc gagcgcgcca acagggagtt cttcaagttc cggtggtggg tcccgatgtg cccgctctcg ctcgcgcggt tgtccctcaa gaagttcaag

721 atcgtgaagg agaccctccg caagcagggc gagaagcccg gcttccagca ggtcaccgtc tagcacttcc tctgggaggc gttcgtcccg ctcttcgggc cgaaggtcgt ccagtggcag

781 cggtacaacg cgagcagcgc gcccagcttc cgcagccaga tagccaacag cacccgggtg gccatgttgc gctcgtcgcg cgggtcgaag gcgtcggtct atcggttgtc gtgggcccac

841 tggaacgccg ccgtgcgcaa tgtccagctc gccgagggca gcggcgccag cttcagctac accttgcggc ggcacgcgtt acaggtcgag cggctcccgt cgccgcggtc gaagtcgatg

901 cgcgagggca acgacccgcg cggctcctac gcctacaccg acgggcacgg cggcggctac gcgctcccgt tgctgggcgc gccgaggatg cggatgtggc tgcccgtgcc gccgccgatg

961 atcttcctcg actacgcgca gaaccagcag tacaactcca accgcgtcgt cgcgcacgag tagaaggagc tgatgcgcgt cttggtcgtc atgttgaggt tggcgcagca gcgcgtgctc

1021 accgggcacg cgctcggcct gccggaccac tactcgggtc cgtgcagcga gctgatgtcc tggcccgtgc acgagccgga cggcctggtg atgagcccag gcacgtcgct cgactacagg

1081 ggtggcggct ggatccgagc tcggtaccaa gcttgatgca tagcttgagt attctatagt ccaccgccga cctaggctcg agccatggtt cgaactacgt atcgaactca taagatatca

1141 gtcacctaaa tagcttgg cagtggattt atcgaacc

Figure C.7. Nucleotide sequence of S. flocculus snp locus fragment.

1 cggcccagcg gggtggggcg gcagccggtg cgggcgcgca cgaagagctg ggcgccgagc gccgggtcgc cccaccccgc cgtcggccac gcccgcgcgt gcttctcgac ccgcggctcg

61 gcgcgctcga tccgccgtag ctgggtgctc aacgagggct gtgccaggcc gagttcccgg cgcgcgagct aggcggcatc gacccacgag ttgctcccga cacggtccgg ctcaagggcc

121 gccgctttgt gcaggctgcc ggtgtcgggc gatggcgcag agcacacgca gatgtctcac cggcgaaaca cgtccgacgg ccacagcccg ctaccgcgtc tcgtgtgcgt ctacagagtg

181 ctccaactcc atggggggag cgtagggcgg cacggaacac gacaccagat gttcaagtta gaggttgagg tacccccctc gcatcccgcc gtgccctgtg ctgtggtcta caagttcaat

241 caaggaaagc ggacgagtcg gccggtgata gcccggagct atccccaagt gccatcattc gttcctttcg cctgcccagc cggccactat cgggcctcga taggggttca cggtagtaag

301 cccgccgtgt tgggctgacc gagactcact ggcaaccgat cgttcaaccc cacaggacga gggcggcaca acccgactgg ctctgagtga ccgttggcta gcaagttggg gtgtcctgct

361 gaaggagccc tccccatggc gtcgttccga aacagcaccg ggtccaagag caccaggaga ctccctcggg aggggtaccg cagcaaggct ttgtcgtggc ccaggttctc gtggtcctct

421 ctcctgggcc tggcactcgg cctcggactg gcctccgccg cgctcggcac cgcgagcccc gaggacccgg accgtgagcc ggagcctgac cggaggcggc gcgagccgtg gcgctcgggg

481 gccggtgccc aggacaccac cccggcaccc gcccgcaccg ttgccgccgg ctacgtggcc cggccacggg tcctgtggtg gggccgtggg cgggcgtggc aacggcggcc gatgcaccgg

541 ggcgccgagg acgccggcac caaggcgttc ttcgacgcgg tactgaagtc ggtcgccaag ccgcggctcc tgcggccgtg gttccgcaag aagctgcgcc atgacttcag ccagcggttc

601 cggcaggccg aacagccctc cctacaggcg gtgaccgtct actacaacgc ctcacaggcc gccgtccggc ttgtcgggag ggatgtccgc cactggcaga tgatgttgcg gagtgtccgg

661 ccgagcttcc gcacccagat atcgagctcc gcctcgatat ggaacagctc cgtgtcgaac ggctcgaagg cgtgggtcta tagctcgagg cggagctata ccttgtcgag gcacagcttg

721 gtcaagctcc aggcgacctc cagcggcggc gacttcgcct actacgaggg caacgactcg cagttcgagg tccgctggag gtcgccgccg ctgaagcgga tgatgctccc gttgctgagc

781 cgcgggtcct acgcctccac cgacggccac ggccgcggct acatcttcct ggactacgcg gcgcccagga tgcggaggtg gctgccggtg ccggcgccga tgtagaagga cctgatgcgc

841 cagaaccggc agtacgactc catccgcgtc accgcccacg agaccggcca tgtgctgggc gtcttggccg tcatgctgag gtaggcgcag tggcgggtgc tctggccggt acacgacccg

901 ctgcccgacc actacagcgg gccgtgcagc gagctgatgt cgggcggcgg c gacgggctgg tgatgtcgcc cggcacgtcg ctcgactaca gcccgccgcc g

Figure C.8. Nucleotide sequence of S. spadicis snp locus fragment.
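
The sequence figures in this appendix appear to print each line as a start coordinate, then the top-strand blocks, then the same number of bottom-strand blocks, with the bottom strand written as the base-by-base complement (not the reverse complement) of the top strand. The following is a minimal sketch, under that assumed layout, of splitting one such line and checking that the two strands agree. The function names `split_strands` and `strands_agree` are hypothetical and are not part of the original work.

```python
# Complement map for lowercase DNA; 'n' (unknown base) pairs with 'n'.
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def split_strands(line):
    """Split one figure line into (position, top strand, bottom strand).

    Assumed layout: a start coordinate, then the top-strand blocks,
    then an equal number of bottom-strand blocks.
    """
    fields = line.split()
    position = int(fields[0])
    blocks = fields[1:]
    if len(blocks) % 2 != 0:
        raise ValueError("expected an even number of sequence blocks")
    half = len(blocks) // 2
    top = "".join(blocks[:half])
    bottom = "".join(blocks[half:])
    return position, top, bottom


def strands_agree(top, bottom):
    """Return True if bottom is the base-by-base complement of top."""
    return len(top) == len(bottom) and top.translate(COMPLEMENT) == bottom


# First line of the S. spadicis fragment (Figure C.8).
example = (
    "1 cggcccagcg gggtggggcg gcagccggtg cgggcgcgca cgaagagctg ggcgccgagc "
    "gccgggtcgc cccaccccgc cgtcggccac gcccgcgcgt gcttctcgac ccgcggctcg"
)
position, top, bottom = split_strands(example)
print(position, len(top), strands_agree(top, bottom))
```

A check of this kind can flag lines in which a scanning error has altered one strand but not the other, since the two strands then no longer pair.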

1 cggcccagcg gggtggggcg gcagccggtg cgctcgcggg cgaagagctg gccgccgagc gccgggtcgc cccaccccgc cgtcggccac gcgagcgccc gcttctcgac cggcggctcg

61 gcgtgctcga tgcggcgcaa ctgggtcgtc aacgacggct ggctcatgcc gagctggcgg cgcacgagct acgccgcgtt gacccagcag ttgctgccga ccgagtacgg ctcgaccgcc

121 gcggccttgt gcaggctgcc gctgtcggcg atggcgcaca gcgcacggag gtgccggacc cgccggaaca cgtccgacgg cgacagccgc taccgcgtgt cgcgtgcctc cacggcctgg

181 tcgagctcca tggtcgatga gcgtagaggc gctacggccg ccgcaccaga tacccaaaca agctcgaggt accagctact cgcatctccg cgatgccggc ggcgtggtct atgggtttgt

241 ccacgagaac gagccgtttc gggcagacgg cggcggggtg atagctccgg gctatcgggc ggtgctcttg ctcggcaaag cccgtctgcc gccgccccac tatcgaggcc cgatagcccg

301 attgccatca tccccgtgcg ccacatgctc cccgagactc actgccaccg aaccgttcac taacggtagt aggggcacgc ggtgtacgag gggctctgag tgacggtggc ttggcaagtg

361 ctttcaccga ccccccacgg tcgtggaacg agtagggagc cccccacaca tgaacaagcg gaaagtggct ggggggtgcc agcaccttgc tcatccctcg gggggtgtgt acttgttcgc

421 ttacctctcc gtcgccgccg tcctcggtct caccgtcgcc ggtctcggcg tgacctcgcc aatggagagg cagcggcggc aggagccaga gtggcagcgg ccagagccgc actggagcgg

481 gaccgccgcg cccgccaccc cgacggccgc cgcgcccgcc gcctcgtcct gggccgcgta ctggcggcgc gggcggtggg gctgccggcg gcgcgggcgg cggagcagga cccggcgcat

541 cgagggctcg aaggccgagg ccaagaacaa cgcggcgttc ttcgaggccg tcgtggaggc gctcccgagc ttccggctcc ggttcttgtt gcgccgcaag aagctccggc agcacctccg

601 ggtcgccgag aagcgcgcgg ccaaccccgg cgcgaagtcc gtgaccgtcg tctacgacgc ccagcggctc ttcgcgcgcc ggttggggcc gcgcttcagg cactggcagc agatgctgcg

661 ctccggagct cccaccttcg cctcgcagat agccagcagc gcgtccatct ggaacggcgc gaggcctcga gggtggaagc ggagcgtcta tcggtcgtcg cgcaggtaga ccttgccgcg

721 ggtctccaac gtgaagctcc aggcgggcac ctccggcgcc agcttcgagt accgcgaggg ccagaggttg cacttcgagg tccgcccgtg gaggccgcgg tcgaagctca tggcgctccc

781 caacgacccg cgcggctcgt acgcgagcag cgacggccac ggcaacggct acgtcttcct gttgctgggc gcgccgagca tgcgctcgtc gctgccggtg ccgttgccga tgcagaagga

841 ggactacgcg cagaacgagg agtacgactc gacgcgcgtc accacccacg agaccgggca cctgatgcgc gtcttgctcc tcatgctgag ctgcgcgcag tggtgggtgc tctggcccgt

901 cgtgctcggc ctgcccgaca cctacgacgg cccgtgcagc cagctcatgt cgggcggcgg gcacgagccg gacgggctgt ggatgctgcc gggcacgtcg gtcgagtaca gcccgccgcc

961 c g

Figure C.9. Nucleotide sequence of Streptomyces ATCC 21021 snp locus fragment.

1 tcactatagg gcgaattggg ccctctagat gcatgctcga gcggccgcca gtgtgatgga agtgatatcc cgcttaaccc gggagatcta cgtacgagct cgccggcggt cacactacct

61 tatctgcaga attcgccctt cggcccagcg gggtggggcg gcagccggtg cggccgcggg atagacgtct taagcgggaa gccgggtcgc cccaccccgc cgtcggccac gccggcgccc

121 agaagagctc ggcgcccagg gagttctcga tgcggcgcag ctgggtggtc aacgagggct tcttctcgag ccgcgggtcc ctcaagagct acgccgcgtc gacccaccag ttgctcccga

181 ggctcatgcc cagttgccgg gccgccttgt gcaggctgcc cgtgtccgcg atcgcgcaga ccgagtacgg gtcaacggcc cggcggaaca cgtccgacgg gcacaggcgc tagcgcgtct

241 gcgcgcgcag gtgtctcacc tccagctcca tgcccccgag ggtagggcgg ctccctgagt cgcgcgcgtc cacagagtgg aggtcgaggt acgggggctc ccatcccgcc gagggactca

301 cgcaccagac gcccggagtg gcgcaatcac cttccttctc aaggcagttg acggctctcg gcgtggtctg cgggcctcac cgcgttagtg gaaggaagag ttccgtcaac tgccgagagc

361 atgcggcgat gcgatggagt tatcggcggt tggcatcatc cccacgggcc ccggctcccc tacgccgcta cgctacctca atagccgcca accgtagtag gggtgcccgg ggccgagggg

421 cgagactctc tccagtcccc tagaggagga gccccccatg cgtcacccca aggtcctcac gctctgagag aggtcagggg atctcctcct cggggggtac gcagtggggt tccaggagtg

481 ctccgtgctg accgccaccc tcggtctcgg cctcgccgcc tccctcggag tcgccccggc gaggcacgac tggcggtggg agccagagcc ggagcggcgg agggagcctc agcggggccg

541 ggcggccgtg agcgccgacc ggacgggcgg cgccgctgtg gcgtacgccg gttcgaccga ccgccggcac tcgcggctgg cctgcccgcc gcggcgacac cgcatgcggc caagctggct

601 ggaggccaag gccaaccagg ccttcttcga ggccgtcgtg aagtcggtcg cgaagaagcg cctccggttc cggttggtcc ggaagaagct ccggcagcac ttcagccagc gcttcttcgc

661 ggccgccaat cccggcgcgc tcgcggtcac cgtcgtctac agcgccgcca acgcgccgag ccggcggtta gggccgcgcg agcgccagtg gcagcagatg tcgcggcggt tgcgcggctc

721 cttccgcacc cagatcgccc gctccacgca gatatggaac agctcggtgg tgaacgtccg gaaggcgtgg gtctagcggg cgaggtgcgt ctataccttg tcgagccacc acttgcaggc

781 gctcgtcgag ggaagcaacc cggacttccg ctactacgag ggcaacgact cgcgtggctc cgagcagctc ccttcgttgg gcctgaaggc gatgatgctc ccgttgctga gcgcaccgag

841 gtacgcgagc accgacgggc acggcagcgg ctacatcttc ctcgactacc ggcagaacca catgcgctcg tggctgcccg tgccgtcgcc gatgtagaag gagctgatgg ccgtcttggt

901 gcagtacaac tcgaccaggg tgaccgccca cgagacgggg cacgtcctgg gtcttccgga cgtcatgttg agctggtccc actggcgggt gctctgcccc gtgcaggacc cagaaggcct

961 ccactacagc ggtccgtgca gcgagctgat gtcgggcggc ggctggatcc gagctcggta ggtgatgtcg ccaggcacgt cgctcgacta cagcccgccg ccgacctagg ctcgagccat

(continued)

Figure C.10. Nucleotide sequence of S. fradiae snp locus fragment.

1021 ccaagctcga tgcatagctt gagtattcta tagtgtcacc taaatagctt ggcgtaatca ggttcgaact acgtatcgaa ctcataagat atcacagtgg atttatcgaa ccgcattagt

1081 tggtcatagc tgtttcctgt gtgaaattgt tatcc accagtatcg acaaaggaca cactttaaca atagg

Figure C.10 (continued).

1 tacgccaagc tatttaggtg acactataga atactcaagc tatgcatcaa gcttggtacc atgcggttcg ataaatccac tgtgatatct tatgagttcg atacgtagtt cgaaccatgg

61 gagctcggat ccactagtaa cggccgccag tgtgctggaa ttcgcccttc ggcccagcgg ctcgagccta ggtgatcatt gccggcggtc acacgacctt aagcgggaag ccgggtcgcc

121 ggtggggcgg cagccggtgg cggcgcggaa gaagagctgg ccgccgaggg ccttttcgat ccaccccgcc gtcggccacc gccgcgcctt cttctcgacc ggcggctccc ggaaaagcta

181 gcggtggagc tgcgtggtca gcgacggctg ggtcatgccc agctgacggg ccgccttccg cgccacctcg acgcaccagt cgctgccgac ccagtacggg tcgactgccc ggcggaaggc

241 tacgctgccg gagtcggcga tggcgcacag ggctcgaaga tgcctcacct ctagctccac atgcgacggc ctcagccgct accgcgtgtc ccgagcttct acggagtgga gatcgaggtg

301 tcggggagca tagccatgcg cgatggcgtc acaccagact cctgaaccgc catgaactgc agcccctcgt atcggtacgc gctaccgcag tgtggtctga ggacttggcg gtacttgacg

361 cctttcctcc tggggtattc agtcgtccta tcgcctcctg gcatctccgt ctcccgggcg ggaaaggagg accccataag tcagcaggat agcggaggac cgtagaggca gagggcccgc

421 aactgcgtcc agactcggcg gcaccaagaa atccggggcc ccggccgccg ccccacacgg ttgacgcagg tctgagccgc cgtggttctt taggccccgg ggccggcggc ggggtgtgcc

481 cggncgtagc cctcggaaat taaggagccc ccacatgaga tctcccaaga cggcgctgtc gccngcatcg ggagccttta attcctcggg ggtgtactct agagggttct gccgcgacag

541 ggcagcgctc ggcctgggcc ttgccgccgc actcaccgcc gcggtcccgg cgtccgcgac ccgtcgcgag ccggacccgg aacggcggcg tgagtggcgg cgccagggcc gcaggcgctg

601 gcccgcctcc cccgcctctc atcgctccac gcccgcctcg gtggccgcgt acaacggctc cgggcggagg gggcggagag tagcgaggtg cgggcggagc caccggcgca tgttgccgag

661 ggccgagcag aaggccgaca ccaaggcctt cttcgacgcc gtgctgaagt cggcggcgaa ccggctcgtc ttccggctgt ggttccggaa gaagctgcgg cacgacttca gccgccgctt

721 gaagctgaag gccaacccgc acctgcagtc ggtcaccgtg acctacgacg cgagcgccgc cttcgacttc cggttgggcg tggacgtcag ccagtggcac tggatgctgc gctcgcggcg

781 cccgacgttc gcgggccaga tatcgcaggc cgcgcagatc tggaacagcg cggtctccaa gggctgcaag cgcccggtct atagcgtccg gcgcgtctag accttgtcgc gccagaggtt

841 cgtcaagctc cagtcgggca gcggcggcga cttccagtac cgcgagggca acgacccgcg gcagttcgag gtcagcccgt cgccgccgct gaaggtcatg gcgctcccgt tgctgggcgc

901 cggttcgtac gccagcaccg acggccacgg cagcggctac gtcttcctgg actacgcgca gccaagcatg cggtcgtggc tgccggtgcc gtcgccgatg cagaaggacc tgatgcgcgt

(continued)

Figure C.11. Nucleotide sequence of S. namwaensis snp locus fragment.

961 gaaccagcag tacaactcga cccgcgtggt ggcccacgag accggccatg tgccgggtct cttggtcgtc atgttgagct gggcgcacca ccgggtgctc tggccggtac acgacccaga

1021 gccggaccac tacgagggcc cgtgcagcga gctgatgtcc ggcggcggc cggcctggtg atgctcccgg gcacgtcgct cgactacagg ccgccgccg

Figure C.11 (continued).

1 cactataggg cgaattgggc cctctagatg catgctcgag cggccgccag tgtgatggat gtgatatccc gcttaacccg ggagatctac gtacgagctc gccggcggtc acactaccta

61 atctgcagaa ttcgcccttc ggcccagcgg ggtggggcgg agccggtgcg gccgcgggag tagacgtctt aagcgggaag ccgggtcgcc ccaccccgcc tcggccacgc cggcgccctc

121 aagagctcgg cgcccaggga gttctcgatc cggcgcagct gggtggtcaa cgagggctgg ttctcgagcc gcgggtccct caagagctag gccgcgtcga cccaccagtt gctcccgacc

181 ctcatgccca gttgccgcgc cgccttgtgc aggctgcccg cgtccgcgat ggcgcagagc gagtacgggt caacggcgcg gcggaacacg tccgacgggc gcaggcgcta ccgcgtctcg

241 gcgcgcagat gcctcacctc cagctccatg ccccgagggt agggcggcga cctgcgtcgc cgcgcgtcta cggagtggag gtcgaggtac ggggctccca tcccgccgct ggacgcagcg

301 accagacagc cggtgcgcgt caaatacctt gctttgcaag gcagttggac ggaccgagca tggtctgtcg gccacgcgca gtttatggaa cgaaacgttc cgtcaacctg cctggctcgt

361 gggtgatgtg ggcgggctat cgccagttga catcatcccc acgcgccccg gcttccccga cccactacac ccgcccgata gcggtcaact gtagtagggg tgcgcggggc cgaaggggct

421 gactcgctcc gaccccctac aggaggagcc ccccatgcgt caccccaagg tcctcaactc ctgagcgagg ctgggggatg tcctcctcgg ggggtacgca gtggggttcc aggagttgag

481 cgtactgacc gccgccctcg gtctcggtct cgccgcctcg ctcggtacgg ctccggcggc gcatgactgg cggcgggagc cagagccaga gcggcggagc gagccatgcc gaggccgccg

541 ggccgtcggt acggctccgg cggccgctcc cgccgccgtc gcgtacgccg gttcggccga ccggcagcca tgccgaggcc gccggcgagg gcggcggcag cgcatgcggc caagccggct

601 ggaggccaag gccaacgagg cgttcttcga ggccgtcgtg aagtcggtgg cgaagaagcg cctccggttc cggttgctcc gcaagaagct ccggcagcac ttcagccacc gcttcttcgc

661 ggccgccaac ccgggcgcgg ccgccgtcac cgtcgtctac agcgcgagca acgcgccgag ccggcggttg ggcccgcgcc ggcggcagtg gcagcagatg tcgcgctcgt tgcgcggctc

721 cttccgcacc cagatcgccc gctccaccca gatatggaac agctcggtgg tgaacgtccg gaaggcgtgg gtctagcggg cgaggtgggt ctataccttg tcgagccacc acttgcaggc

781 gctcgtcgag ggcagcaacc cggacttccg gtactacgag ggcaacgact cccgcggctc cgagcagctc ccgtcgttgg gcctgaaggc catgatgctc ccgttgctga gggcgccgag

841 gtacgccagc accgacggcc acggcagcgg ctacatcttc ctcgactacc ggcagaacca catgcggtcg tggctgccgg tgccgtcgcc gatgtagaag gagctgatgg ccgtcttggt

901 gcagtacaac tccacccggg tgacggccca cgagaccggg cacgtcctcg gcctgccgga cgtcatgttg aggtgggccc actgccgggt gctctggccc gtgcaggagc cggacggcct

961 ccactacagc gggccgtgca gcgagctgat gtcgggcggc ggctggatcc gagctcggta ggtgatgtcg cccggcacgt cgctcgacta cagcccgccg ccgacctagg ctcgagccat

1021 ccaagcttga tgcatagctt gagta ggttcgaact acgtatcgaa ctcat

Figure C.12. Nucleotide sequence of S. fetlus snp locus fragment.

Appendix D

Maps of Plasmids Used

[Plasmid map: pCR2.1 TOPO, 3908 bp. Labeled features: lacZ, ori, kanR, ampR.]

Figure D.1. Map of plasmid pCR2.1 TOPO

[Plasmid map: pUC19, 2686 bp. Labeled features: lacZ-alpha.]

Figure D.2. Map of plasmid pUC19

[Plasmid map: pIJ101, 8830 bp. Labeled features: rep, ORF, korB, kilB, korA, spdB, spdA, tra.]

Figure D.3. Map of plasmid pIJ101

[Plasmid map: pIJ303, 10620 bp. Labeled features: rep, tsrR, kilB, ORF85, spdB, korB, spdA, korA, tra.]

Figure D.4. Map of plasmid pIJ303

[Plasmid map: pIJ486, 6217 bp. Labeled features: rep, orf56, tsrR.]

Figure D.5. Map of plasmid pIJ486

[Plasmid map: pIJ702, 5685 bp. Labeled features: rep, melC1, orf56, tsrR.]

Figure D.6. Map of plasmid pIJ702

[Plasmid map: pKK840, 4042 bp. Labeled features: aphI, bla.]

Figure D.7. Map of plasmid pKK840

[Plasmid map: pANT826, 7334 bp. Labeled features: PermE*, tsrR.]

Figure D.8. Map of plasmid pANT826

[Plasmid map: pANT827, 8617 bp. Labeled features: PermE*, orf56, tsrR.]

Figure D.9. Map of plasmid pANT827

[Plasmid map: pANT840, 2749 bp.]

Figure D.10. Map of plasmid pANT840

[Plasmid map: pANT841, 2746 bp. Labeled features: lacZ-alpha.]

Figure D.11. Map of plasmid pANT841

[Plasmid map: pANT842, 6763 bp. Labeled features: rep, orf56, snpA, tsrR.]

Figure D.12. Map of plasmid pANT842

[Plasmid map: pANT846, 2830 bp. Labeled features: lacZ-alpha.]

Figure D.13. Map of plasmid pANT846

[Plasmid map: pANT849, 5343 bp. Labeled features: rep, P-snpA, orf56, tsrR.]

Figure D.14. Map of plasmid pANT849

[Plasmid map: pANT852, 6626 bp. Labeled features: rep, orf56, aphII, tsrR.]

Figure D.15. Map of plasmid pANT852

[Plasmid map: pANT853, 5561 bp. Labeled features: DELsnpR, rep, P-snpA, aphII, orf56, tsrR.]

Figure D.16. Map of plasmid pANT853

[Plasmid map: pANT855, 4799 bp. Labeled features: orf56, tsrR.]

Figure D.17. Map of plasmid pANT855

[Plasmid map: pANT856, 6082 bp. Labeled features: rep, aphII, tsrR.]

Figure D.18. Map of plasmid pANT856

[Plasmid map: pANT857, 7963 bp. Labeled features: bla, ori, snpR, P-snpA, orf56, tsrR.]

Figure D.19. Map of plasmid pANT857

[Plasmid map: pANT859, 7709 bp. Labeled features: rep, snpA, aphII, ori.]

Figure D.20. Map of plasmid pANT859

[Plasmid map: pANT882, 3498 bp. Labeled features: ori, aprR, bla.]

Figure D.21. Map of plasmid pANT882

[Plasmid map: pANT883, 3638 bp. Labeled features: ori, fusion, bla.]

Figure D.22. Map of plasmid pANT883

[Plasmid map: pANT886, 3883 bp. Labeled features: ori, aprR-p, aphII, ampR.]

Figure D.23. Map of plasmid pANT886

[Plasmid map: pANT887, 2366 bp. Labeled features: ori, aphII.]

Figure D.24. Map of plasmid pANT887

[Plasmid map: pANT889, 3887 bp. Labeled features: ori, aprR-p, aphII, ampR.]

Figure D.25. Map of plasmid pANT889

[Plasmid map: pANT890, 3891 bp. Labeled features: aprR-p, aphII, bla.]

Figure D.26. Map of plasmid pANT890

[Plasmid map: pANT891, 3574 bp. Labeled features: aphI, bla.]

Figure D.27. Map of plasmid pANT891

[Plasmid map: pANT892, 2327 bp. Labeled features: ori.]

Figure D.28. Map of plasmid pANT892

[Plasmid map: pANT893, 3626 bp. Labeled features: P-aprR, aprR, bla.]

Figure D.29. Map of plasmid pANT893

[Plasmid map: pANT894, 6512 bp. Labeled features: rep, orf56, ori, aprR-p, aphII.]

Figure D.30. Map of plasmid pANT894

[Plasmid map: pANT895, 6247 bp. Labeled features: rep, P-snpA, ori, P-aprR.]

Figure D.31. Map of plasmid pANT895

[Plasmid map: pANT1200, 8058 bp. Labeled features: bla, ori, snpR, P-snpA, tsrR, orf56.]

Figure D.32. Map of plasmid pANT1200

[Plasmid map: pANT1201, 6607 bp. Labeled features: rep, P-snpA, ori, aphII.]

Figure D.33. Map of plasmid pANT1201

[Plasmid map: pANT1202, 6342 bp. Labeled features: rep, P-snpA, ori, aprR, aprR-p.]

Figure D.34. Map of plasmid pANT1202

[Plasmid map: pANT3021, 6661 bp. Labeled features: rep, VAA-UP, orf56, ori, aphII.]

Figure D.35. Map of plasmid pANT3021

[Plasmid map: pANT3022, 6661 bp. Labeled features: rep, orf56, ori, aphII.]

Figure D.36. Map of plasmid pANT3022

[Plasmid map: pANT3023, 6661 bp. Labeled features: rep, snpA, VSI-UP, orf56, ori, aphII, PaprR.]

Figure D.37. Map of plasmid pANT3023

[Plasmid map: pANT3024, 6661 bp. Labeled features: rep, orf56, ori, aphII, PaprR.]

Figure D.38. Map of plasmid pANT3024

[Plasmid map: pANT3025, 6661 bp. Labeled features: rep, aphII.]

Figure D.39. Map of plasmid pANT3025

[Plasmid map: pANT3026, 6655 bp. Labeled features: rep, orf56, ori, aphII, PaprR.]

Figure D.40. Map of plasmid pANT3026

[Plasmid map: pANT3032, 7227 bp. Labeled features: rep, snpA, VAA, endostatin, aphII, P-aprR.]

Figure D.41. Map of plasmid pANT3032

[Plasmid map: pANT3035, 7227 bp. Labeled features: rep, P-snpA, SNP, orf56, endostatin, aphII, ori, P-aprR.]

Figure D.42. Map of plasmid pANT3035

[Plasmid map: pANT3042, 9296 bp. Labeled features: P-snpA, VAA, endostatin, tsrR, orf56.]

Figure D.43. Map of plasmid pANT3042

[Plasmid map: pANT3045, 9296 bp. Labeled features: P-snpA, SNP, endostatin, orf56.]

Figure D.44. Map of plasmid pANT3045

[Plasmid map: pANT3052, 12582 bp. Labeled features: rep, snpR, VAA-endostatin, kilB, spdB, spdA, ORF85, tra, korA.]

Figure D.45. Map of plasmid pANT3052
