US007618799B2

(12) United States Patent (10) Patent No.: US 7,618,799 B2 Coleman et al. (45) Date of Patent: Nov. 17, 2009

(54) BACTERIAL LEADER SEQUENCES FOR NCBI Report for Accession No.YP 346180, Direct Submission on INCREASED EXPRESSION Aug. 8, 2005. Huber, D., “Use of Thioredoxin as a Reporter to Identify a Subset of Escherichia coli Signal Sequences That Promote Signal Recognition (75) Inventors: Russell J. Coleman, San Diego, CA Particle-Dependent Translocation.” Journal of Bacteriology, 2005, (US); Diane Retallack, Poway, CA pp. 2983-2991, vol. 187 (9). (US); Jane C. Schneider, San Diego, Miot, M. and Betton, J., “Protein Quality Control in the Bacterial CA (US); Thomas M. Ramseier, Periplasm.” Microbial Cell Factories, 2004, pp. 1-13. Newton, MA (US); Charles D. Ma, Q., et al., “Protein Secretion Systems of Hershberger, Poway, CA (US); Stacey aeruginosa and Pfluorescens.” Biochim. Biophys. Acta, Apr. 1, 2003, Lee, San Diego, CA (US): Sol M. pp. 223-233, vol. 1611, No. 1-2. Resnick, Encinitas, CA (US) Retallack, D.M., et al..."Transport of Heterologous Proteins to the Periplasmic Space of Using a Variety of (73) Assignee: Dow Global Technologies Inc, Midland, Native Signal Sequences, ’’ Biotechnol Lett, Oct. 2007, pp. 1483 MI (US) 1491, vol. 29, No. 10. Urban, A., et al., “DsbA and DsbC Affect Extracellular Enzyme Formation in Pseudomonas aeruginosa. J. Bacteriol. Jan. 2001, pp. (*) Notice: Subject to any disclaimer, the term of this 587-596, vol. 183, No. 2. patent is extended or adjusted under 35 Wang, H., et al., “High-level Expression of Human TFF3 in U.S.C. 154(b) by 0 days. Escherichia coli.” Peptides, Jul. 1, 2005, pp. 1213-1218, vol. 26, No. 7. (21) Appl. No.: 12/022,789 EMBL Database Accession No. AF057031, Submitted Apr. 2, 1998 (XP-002483318). (22) Filed: Jan. 30, 2008 NCBI Database Accession No. Q3KHI7. Direct Submission Aug. 2005 (SP-002483317). (65) Prior Publication Data * cited by examiner US 2008/O193974 A1 Aug. 14, 2008 Primary Examiner Maryam Monshipouri Related U.S. Application Data (74) Attorney, Agent, or Firm—Jarett Abramson; Alston & Bird LLP (60) Provisional application No. 60/887,476, filed on Jan. 31, 2007, provisional application No. 60/887,486, (57) ABSTRACT filed on Jan. 31, 2007. Compositions and methods for improving expression and/or (51) Int. Cl. secretion of protein or polypeptide of interestina host cell are CI2N 9/00 (2006.01) provided. Compositions comprising a coding sequence for a CI2N IS/00 (2006.01) bacterial Secretion signal peptide are provided. The coding CI2N L/04 (2006.01) sequences can be used in vector constructs or expression (52) U.S. Cl...... 435/183; 435/252.3:435/320.1; systems for transformation and expression of a protein or 536/23.1 polypeptide of interest in a host cell. The compositions of the (58) Field of Classification Search ...... 435/183, invention are useful for increasing accumulation of properly 435/252.3, 320.1; 536/23.2 processed proteins in the periplasmic space of a host cell, or See application file for complete search history. for increasing secretion of properly processed proteins from the host cell. In particular, isolated Secretion signal peptide (56) References Cited encoding nucleic acid molecules are provided. Additionally, U.S. PATENT DOCUMENTS amino acid sequences corresponding to the nucleic acid mol ecules are encompassed. In particular, the present invention 5,629, 172 A * 5/1997 Mascarenhas et al...... 435/69.7 provides for isolated nucleic acid molecules comprising 5,801,017 A * 9/1998 Werber et al...... 435/69.2 nucleotide sequences encoding the amino acid sequences FOREIGN PATENT DOCUMENTS shown in SEQID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, and 24, and the nucleotide sequences set forth in SEQID NO: 1. WO WO 2005/089093 A2 9, 2005 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, and 23, as well as variants and OTHER PUBLICATIONS fragments thereof. Paulson et al., Nature Biotechnology, 23, 873-878, 2005.* 68 Claims, 11 Drawing Sheets U.S. Patent Nov. 17, 2009 Sheet 1 of 11 US 7,618,799 B2

??eIduu???00£MOCIduuou,dys-OqspMod

U.S. Patent Nov. 17, 2009 Sheet 2 of 11 US 7,618,799 B2

U.S. Patent Nov. 17, 2009 Sheet 3 of 11 US 7,618,799 B2

U.S. Patent Nov. 17, 2009 Sheet 4 of 11 US 7,618,799 B2

U.S. Patent Nov. 17, 2009 Sheet 5 of 11 US 7,618,799 B2

U.S. Patent Nov. 17, 2009 Sheet 6 of 11 US 7,618,799 B2

zz00-89°d= U.S. Patent Nov. 17, 2009 Sheet 7 of 11 US 7,618,799 B2

|odK1pIJAA Á??Apevoud

O O

v IO/(n) AAIoe would U.S. Patent US 7,618,799 B2

U.S. Patent US 7,618,799 B2

8 U.S. Patent NOV, 17 9 2009 Sheet 10 of 11 US 7 9 618,799 B2

|S

QZ?Š?I, |S

9

S

308 IS

VYHN {S H

|S

2, a U.S. Patent Nov. 17, 2009 Sheet 11 of 11 US 7,618,799 B2

US 7,618,799 B2 1. 2 BACTERIAL LEADER SEQUENCES FOR branes (see Agarraberes and Dice (2001) Biochim Biophys INCREASED EXPRESSION Acta. 1513:1-24; Muller et al. (2001) Prog Nucleic Acid Res Mol. Biol. 66:107-157). CROSS REFERENCE TO RELATED Strategies have been developed to excrete proteins from the APPLICATION cell into the supernatant. For example, U.S. Pat. No. 5,348, 867; U.S. Pat. No. 6,329, 172: PCT Publication No. WO This application claims the benefit of U.S. Provisional 96/17943; PCT Publication No. WO 02/40696; and U.S. Application Ser. Nos. 60/887,476, filed Jan. 31, 2007 and Application Publication 2003/0013150. Other strategies for 607887,486, filed Jan. 31, 2007, the contents of which are increased expression are directed to targeting the protein to herein incorporated by reference in their entirety. 10 the periplasm. Some investigations focus on non-Sec type secretion (see for e.g. PCT Publication No. WO 03/079007: REFERENCE TO SEQUENCE LISTING U.S. Publication No. 2003/0180937; U.S. Publication No. SUBMITTED ELECTRONICALLY 2003/0064.435; and, PCT Publication No. WO 00/59537). However, the majority of research has focused on the secre The official copy of the sequence listing is Submitted elec 15 tion of exogenous proteins with a Sec-type secretion system. tronically via EFS-Web as an ASCII formatted sequence list A number of secretion signals have been described for use ing with a file named “339398 SequenceListing..txt, created in expressing recombinant polypeptides or proteins. See, for on Jan. 17, 2008, and having a size of 28,000 bytes and is filed example, U.S. Pat. No. 5,914,254; U.S. Pat. No. 4.963,495; concurrently with the specification. The sequence listing con European Patent No. 0177343; U.S. Pat. No. 5,082,783; PCT tained in this ASCII formatted document is part of the speci Publication No. WO 89/10971; U.S. Pat. No. 6,156,552; U.S. fication and is herein incorporated by reference in its entirety. Pat. Nos. 6,495,357; 6,509, 181; 6,524,827; 6,528,298; 6,558, 939; 6,608,018; 6,617,143: U.S. Pat. Nos. 5,595,898; 5,698, FIELD OF THE INVENTION 435; and 6,204,023; U.S. Pat. No. 6,258,560; PCT Publica tion Nos. WO 01/21662, WO 02/068660 and U.S. This invention is in the field of protein production, particu 25 Application Publication 2003/0044906: U.S. Pat. No. 5,641, larly to the use of targeting polypeptides for the production of 671; and European Patent No. EP 0121352. properly processed heterologous proteins. Strategies that rely on signal sequences for targeting pro teins out of the cytoplasm often produce improperly pro BACKGROUND OF THE INVENTION cessed protein. This is particularly true for amino-terminal 30 secretion signals such as those that lead to secretion through the Sec System. Proteins that are processed through this sys More than 150 recombinantly produced proteins and tem often either retain a portion of the secretion signal, polypeptides have been approved by the U.S. Food and Drug require a linking element which is often improperly cleaved, Administration (FDA) for use as biotechnology drugs and or are truncated at the terminus. vaccines, with another 370 in clinical trials. Unlike small 35 As is apparent from the above-described art, many strate molecule therapeutics that are produced through chemical gies have been developed to target proteins to the periplasm of synthesis, proteins and polypeptides are most efficiently pro a host cell. However, known strategies have not resulted in duced in living cells. However, current methods of production consistently high yield of properly processed, active recom of recombinant proteins in often produce improperly binant protein, which can be purified for therapeutic use. One folded, aggregated or inactive proteins, and many types of 40 major limitation in previous strategies has been the expres proteins require secondary modifications that are inefficiently sion of proteins with poor secretion signal sequences in inad achieved using known methods. equate cell systems. One primary problem with known methods lies in the As a result, there is still a need in the art for improved formation of inclusion bodies made of aggregated proteins in large-scale expression systems capable of secreting and prop the cytoplasm, which occur when an excess amount of protein 45 erly processing recombinant polypeptides to produce trans accumulates in the cell. Another problem in recombinant genic proteins in properly processed form. protein production is establishing the proper secondary and tertiary conformation for the expressed proteins. One barrier SUMMARY OF THE INVENTION is that bacterial cytoplasm actively resists disulfide bonds formation, which often underlies proper protein folding (Der 50 The present invention provides improved compositions man et al. (1993) Science 262:1744-7). As a result, many and processes for producing high levels of properly processed recombinant proteins, particularly those of eukaryotic origin, protein or polypeptide of interest in a cell expression system. are improperly folded and inactive when produced in bacte In particular, the invention provides novel amino acid and 18. nucleotide sequences for secretion signals derived from a Numerous attempts have been developed to increase pro 55 bacterial organism. In one embodiment, the secretion signals duction of properly folded proteins in recombinant systems. of the invention include an isolated polypeptide with a For example, investigators have changed fermentation con sequence that is, or is Substantially homologous to, a ditions (Schein (1989) Bio/Technology, 7:1141-1149), varied Pseudomonas fluorescens (P. fluorescens) secretion polypep promoter strength, or used overexpressed chaperone proteins tide selected from a mutant phosphate binding protein (pbp). (Hockney (1994) Trends Biotechnol. 12:456-463), which can 60 a protein disulfide isomerase A (dsbA), a protein disulfide help prevent the formation of inclusion bodies. isomerase C (dsbC), a Cup A2, a CupB2, a CupC2, a NikA, a An alternative approach to increase the harvest of properly FlgI, a tetratricopeptide repeat family protein (ORF5550), a folded proteins is to secrete the protein from the intracellular toluene tolerance protein (Ttg2C), or a methyl accepting environment. The most common form of Secretion of chemotaxis protein (ORF8124) secretion signal, as well as polypeptides with a signal sequence involves the Sec System. 65 biologically active variants, fragments, and derivatives The Sec system is responsible for export of proteins with the thereof. In another embodiment, the secretion signals of the N-terminal signal polypeptides across the cytoplasmic mem invention include an isolated polypeptide with a sequence US 7,618,799 B2 3 4 that is, or is substantially homologous to, a Bacillus coagul separated by SDS-PAGE, transferred to a filter and visualized lans Bce Secretion signal sequence. The nucleotide sequences with an antibody to insulin (Chicken polyclonal, Abcam cathi encoding the signal sequences of the invention are useful in ab14042). vectors and expression systems to promote targeting of an FIG. 8 shows an SDS-PAGE analysis of EP484-003 and expressed protein or polypeptide of interest to the periplasm EP484-004 fractions. Representative results of SDS-PAGE of Gram-negative bacteria or into the extracellular environ analyses are shown. Molecular weight markers (L) are shown ment. at the center. BSA standards (BSA Stds.) are indicated. The DNA constructs comprising the secretion signal sequences arrow indicates induced band. Below each lane is the fraction are useful in host cells to express recombinant proteins. type: soluble (Sol), insoluble (Ins), or cell-free broth (CFB). Nucleotide sequences for the proteins of interest are operably 10 Above each lane is the sample time at induction (IO), or 24 linked to a secretion signal as described herein. The cell may hours post induction (I24). The strain number is shown below express the protein in a periplasm compartment. In certain each grouping of samples. The large protein band in the I24 embodiments, the cell may also secrete expressed recombi soluble fraction of EP484-004 corresponds to enhanced gene nant protein extracellularly through an outer cell wall. Host expression facilitated by Bce leader sequence. cells include eukaryotic cells, including yeast cells, insect 15 FIG.9 demonstrates SDS-PAGE and Western analyses of cells, mammaliancells, plant cells, etc., and prokaryotic cells, Gal2 schv expression. Soluble (S) and Insoluble (I) fractions including bacterial cells such as Pfluorescens, E. coli, and the were analyzed. Above each pair of lanes is indicated the like. Any protein of interest may be expressed using the secretion leader fused to Gal2. Molecular weight markers are secretion polypeptide leader sequences of the invention, described to the left of each SDS-PAGE gel (top) or Western including therapeutic proteins, hormones, a growth factors, blot (bottom). Arrows indicate the migration of Gal2. extracellular receptors or ligands, proteases, kinases, blood FIG. 10 represents an SDS-PAGE Analysis of Thioredoxin proteins, chemokines, cytokines, antibodies and the like. (TrxA) expression. Soluble fractions were analyzed. Above each pair of lanes is indicated the secretion leader fused to BRIEF DESCRIPTION OF THE FIGURES TrxA. Molecular weight markers are described to the left of 25 the SDS-PAGE gel. Arrows indicate the migration of unproc FIG.1 depicts the expression construct for the dsbCSS-skip essed (upper arrow) and processed (lower arrow) TrXA. fusion protein. DETAILED DESCRIPTION FIG. 2 shows expression of the Skp protein around 17 kDa (arrows). Bands labeled 2 and 3 were consistent with the Skp 30 protein. Band 1 appears to have both DNA binding protein I. Overview (3691) and Skp. Compositions and methods for producing high levels of FIG. 3 is an analysis after expression of dsbC-skip in properly processed polypeptides in a host cell are provided. In Pseudomonas fluorescens after 0 and 24 hours in soluble (S) particular, novel secretion signals are provided which pro and insoluble (I) fractions for samples labeled 2B-2 (FIG.3A) 35 mote the targeting of an operably linked polypeptide of inter and 2B-4 (FIG. 3B). In FIG. 3A, bands 5, 7 and 9 were the est to the periplasm of Gram-negative bacteria or into the unprocessed disbC-skip protein in the insoluble fraction. extracellular environment. For the purposes of the present Bands 6, 8, and 10 were the processed disbC-skip in the invention, a 'secretion signal.” “secretion signal polypep insoluble fraction. Bands 1 and 3 were the processed disbC tide.” “signal peptide, or “leader sequence' is intended a skip in the soluble fraction. Bands 2 and 4 were an unknown 40 peptide sequence (or the polynucleotide encoding the peptide protein. In FIG. 3B, bands 15, 17, and 19 were the unproc sequence) that is useful for targeting an operably linked pro essed dsbC-skip protein in the insoluble fraction. Bands 16, tein or polypeptide of interest to the periplasm of Gram 18, and 20 were the processed disbC-skip in the insoluble negative bacteria or into the extracellular space. The secretion fraction. Bands 11 and 13 were the processed disbC-skip in the signal sequences of the invention include the secretion soluble fraction. Bands 12 and 14 were an unknown protein. 45 polypeptides selected from pbp, dsbA, dsbC, Bce, Cup A2. FIG. 4 shows a Western analysis of protein accumulation CupB2, CupC2, NikA, Flg.I, ORF5550, Ttg2C, and ORF8124 after expression of DC694 (dsbA-PA83). Accumulation of secretion signals, and fragments and variants thereof. The the soluble (S), insoluble (I), and cell free broth (B) at 0 and amino acid sequences for the secretion signals are set forth in 24 hours was assessed by Western analysis. SEQID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, and 24. The 50 corresponding nucleotide sequences are provided in SEQID FIG. 5 shows a Western analysis of protein accumulation NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, and 23, respectively. after expression of EP468-002.2 (dsbA). Accumulation of the The invention comprises these sequences as well as frag soluble (S) and insoluble (I) protein at 0 and 24 hours after ments and variants thereof. induction was assessed by Western analysis. The methods of the invention provide improvements of FIG. 6 demonstrates the alkaline phosphatase activity of 55 current methods of production of recombinant proteins in the plNS-008-3 (pbp) mutant compared to pINS-008-5 bacteria that often produce improperly folded, aggregated or (wildtypepbp) secretion signal. Cell cultures were adjusted to inactive proteins. Additionally, many types of proteins 1 ODoo unit, then PhoA activity was measured by adding require secondary modifications that are inefficiently 4-methylumbelliferone (MUP) and measuring fluorescent achieved using known methods. The methods herein increase product formation at 10 min. The negative control contains 60 the harvest of properly folded proteins by secreting the pro MUP but no cells. tein from the intracellular environment. In Gram-negative FIG. 7 shows a Western analysis of protein accumulation bacteria, a protein secreted from the cytoplasm can end up in after expression of plNS-008-3 (pbp) and plNS0008-5 the periplasmic space, attached to the outer membrane, or in (wildtype pbp). Accumulation of the proinsulin-phoA in the the extracellular broth. The methods also avoid inclusion soluble (Sol), insoluble (Insol), and extracellular fraction 65 bodies, which are made of aggregated proteins. Secretion into (Bro) at I0. I16, and I40 hour was assessed by Western analy the periplasmic space also has the well known effect of facili sis. Aliquots of the culture were adjusted to 20 ODoo units, tating proper disulfide bond formation (Bardwell et al. (1994) US 7,618,799 B2 5 6 Phosphate Microorg. 270-5; Manoil (2000) Methods in Enzy tides operative therein can be referred to as Sec dependent mol. 326: 35-47). Other benefits of secretion of recombinant secretion signals. One of the issues in developing an appro protein include more efficient isolation of the protein; proper priate secretion signal is to ensure that the signal is appropri folding and disulfide bond formation of the transgenic pro ately expressed and cleaved from the expressed protein. tein, leading to an increase in the percentage of the protein in Signal polypeptides for the sec pathway generally consist active form; reduced formation of inclusion bodies and of the following three domains: (i) a positively charged n-re reduced toxicity to the host cell; and increased percentage of gion, (ii) a hydrophobic h-region and (iii) an uncharged but the recombinant protein in soluble form. The potential for polar c-region. The cleavage site for the signal peptidase is excretion of the protein of interest into the culture medium located in the c-region. However, the degree of signal can also potentially promote continuous, rather than batch 10 sequence conservation and length, as well as the cleavage site culture for protein production. position, can vary between different proteins. Gram-negative bacteria have evolved numerous systems A signature of Sec-dependent protein export is the pres for the active export of proteins across their dual membranes. ence of a short (about 30 amino acids), mainly hydrophobic These routes of secretion include, e.g.: the ABC (Type I) amino-terminal signal sequence in the exported protein. The pathway, the Path/Fla (Type III) pathway, and the Path/Vir 15 signal sequence aids protein export and is cleaved off by a (Type IV) pathway for one-step translocation across both the periplasmic signal peptidase when the exported protein plasma and outer membrane; the Sec (Type II), Tat, MscL, reaches the periplasm. A typical N-terminal Sec signal and Holins pathways for translocation across the plasma polypeptide contains an N-domain with at least one arginine membrane; and the Sec-plus-fimbrial usher porin (FUP), Sec or lysine residue, followed by a domain that contains a stretch plus-autotransporter (AT), Sec-plus-two partner secretion of hydrophobic residues, and a C-domain containing the (TPS), Sec-plus-main terminal branch (MTB), and Tat-plus cleavage site for signal peptidases. MTB pathways for two-step translocation across the plasma Bacterial protein production systems have been developed and outer membranes. Notallbacteria have all of these secre in which transgenic protein constructs are engineered as tion pathways. fusion proteins containing both a protein of interest and a Three protein systems (types I, III and IV) secrete proteins 25 secretion signal in an attempt to target the protein out of the across both membranes in a single energy-coupled step. Four cytoplasm. systems (Sec, Tat, MscL and Holins) secrete only across the P. fluorescens has been demonstrated to be an improved inner membrane, and four other systems (MTB, FUP AT and platform for production of a variety of proteins and several TPS) secrete only across the outer membrane. efficient secretion signals have been identified from this In one embodiment, the signal sequences of the invention 30 organism (see, U.S. Application Publication Number utilize the Sec secretion system. The Sec system is respon 20060008877, herein incorporated by reference in its sible for export of proteins with the N-terminal signal entirety). P. fluorescens produces exogenous proteins in a polypeptides across the cytoplasmic membranes (see, Agar correctly processed form to a higher level than typically seen raberes and Dice (2001) Biochim Biophys Acta. 1513:1-24: in other bacterial expression systems, and transports these Muller et al. (2001) Prog Nucleic Acid Res Mol. Biol. 66: 35 proteins at a higher level to the periplasm of the cell, leading 107-157). Protein complexes of the Sec family are found to increased recovery of fully processed recombinant protein. universally in prokaryotes and eukaryotes. The bacterial Sec Therefore, in one embodiment, the invention provides a system consists of transport proteins, a chaperone protein method for producing exogenous protein in a Pfluorescens (SecB) or signal recognition particle (SRP) and signal pepti cell by expressing the target protein linked to a secretion dases (SPase I and SPase II). The Sec transport complex in E. 40 signal. coli consists of three integral inner membrane proteins, SecY, The secretion signal sequences of the invention are useful SecE and SecG, and the cytoplasmic ATPase, SecA. SecA in Pseudomonas. The Pseudomonads system offers advan recruits SecY/E/G complexes to form the active translocation tages for commercial expression of polypeptides and channel. The chaperone protein SecB binds to the nascent enzymes, in comparison with other bacterial expression sys polypeptide chain to prevent it from folding and targets it to 45 tems. In particular, Pfluorescens has been identified as an SecA. The linear polypeptide chain is Subsequently trans advantageous expression system. Pfluorescens encompasses ported through the SecYEG channel and, following cleavage a group of common, nonpathogenic saprophytes that colonize of the signal polypeptide, the protein is folded in the peri soil, water and plant Surface environments. Commercial plasm. Three auxiliary proteins (SecD, SecF and YajC) form enzymes derived from Pfluorescens have been used to reduce a complex that is not essential for secretion but stimulates 50 environmental contamination, as detergent additives, and for secretion up to ten-fold under many conditions, particularly at Stereoselective hydrolysis. Pfluorescens is also used agricul low temperatures. turally to control pathogens. U.S. Pat. No. 4,695,462 Proteins that are transported into the periplasm, i.e. through describes the expression of recombinant bacterial proteins in a type II Secretion system, can also be exported into the P. fluorescens. Between 1985 and 2004, many companies extracellular media in a further step. The mechanisms are 55 capitalized on the agricultural use of Pfluorescens for the generally through an autotransporter, a two partner Secretion production of pesticidal, insecticidal, and nematocidal toxins, system, a main terminal branch system or a fimbrial usher as well as on specific toxic sequences and genetic manipula porin. tion to enhance expression of these. See, for example, PCT Of the twelve known secretion systems in Gram-negative Application Nos. WO 03/068926 and WO 03/068948; PCT bacteria, eight are known to utilize targeting signal polypep 60 publication No. WO 03/089455; PCT Application No. WO tides found as part of the expressed protein. These signal 04/005221; and, U.S. Patent Publication Number polypeptides interact with the proteins of the secretion sys 2006OOO8877. tems so that the cell properly directs the protein to its appro priate destination. Five of these eight signal-polypeptide II. Compositions based secretion systems are those that involve the Sec system. 65 A. Isolated Polypeptides These five are referred to as involved in Sec-dependent cyto In one embodiment of the present invention, an isolated plasmic membrane translocation and their signal polypep polypeptide is provided, wherein the isolated polypeptide is a US 7,618,799 B2 7 8 novel secretion signal useful for targeting an operably linked acid residue is a residue that can be altered from the wild-type protein or polypeptide of interest to the periplasm of Gram sequence of a secretion signal polypeptide without altering negative bacteria or into the extracellular space. In one the biological activity, whereas an “essential amino acid embodiment, the polypeptide has an amino acid sequence that residue is required for biological activity. A “conservative is, or is substantially homologous to, a pbp, dsbA, dsbC, amino acid substitution' is one in which the amino acid Bce, Cup A2, Cup B2, CupC2, NikA, Flg. ORF5550, Ttg2C, residue is replaced with an amino acid residue having a simi or ORF8124 secretion signal, or fragments or variants lar side chain. Families of conservative and semi-conserva thereof. In another embodiment, this isolated polypeptide is a tive amino acid residues are listed in Table 1. fusion protein of the secretion signal and a protein or polypep tide of interest. 10 TABLE 1 In another embodiment, the polypeptide sequence is, or is Substantially homologous to, the secretion signal polypeptide Similar Amino Acid Substitution Groups set forth in SEQID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, or 24, or is encoded by the polynucleotide sequence set forth in Conservative Groups (8) Semi-Conservative Groups (7) 15 Arg, Lys Arg, Lys, His SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, or 23. In Asp, Gln ASn, Asp, Gly, Gln another embodiment, the polypeptide sequence comprises at ASn, Glu least amino acids 2-24 of SEQID NO:2, at least amino acids Ile, Leu, Val Ile, Leu, Val, Met, Phe 2-22 of SEQID NO:4, at least amino acids 2-21 of SEQID Ala, Gly Ala, Gly, Pro, Ser, Thr Ser, Thr Ser, Thr, Tyr NO:6, at least amino acids 2-33 of SEQ ID NO:8, at least Phe, Tyr Phe, Trp, Tyr amino acids 2-25 of SEQID NO:10, at least amino acids 2-24 CyS (non-cysteine), Ser Cys (non-cysteine), Ser, Thr of SEQ ID NO:12, at least amino acids 2-23 of SEQ ID NO:14, at least amino acids 2-21 of SEQID NO:16, at least amino acids 2-21 of SEQID NO:18, at least amino acids 2-21 Variant proteins encompassed by the present invention are of SEQ ID NO:20, at least amino acids 2-33 of SEQ ID biologically active, that is they continue to possess the desired NO:22, or at least amino acids 2-39 of SEQID NO:24. In yet 25 biological activity of the native protein; that is, retaining another embodiment, the polypeptide sequence comprises a secretion signal activity. By “retains activity” is intended that fragment of SEQID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, the variant will have at least about 30%, at least about 50%, at or 24, which is truncated by 1, 2,3,4,5,6,7,8,9, or 10 amino least about 70%, at least about 80%, about 90%, about 95%, acids from the amino terminal but retains biological activity, about 100%, about 110%, about 125%, about 150%, at least i.e., secretion signal activity. 30 about 200% or greater secretion signal activity of the native In one embodiment the amino acid sequence of the protein. homologous polypeptide is a variant of a given original B. Isolated Polynucleotides polypeptide, wherein the sequence of the variant is obtainable The invention also includes an isolated nucleic acid with a by replacing up to or about 30% of the original polypeptide's sequence that encodes a novel secretion signal useful for amino acid residues with otheramino acid residue(s), includ 35 targeting an operably linked protein or polypeptide of interest ing up to about 1%, 2%,3%, 4%, 5%, 6%, 7%, 8%.9%, 10%, to the periplasm of Gram-negative bacteria or into the extra 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, cellular space. In one embodiment, the isolated polynucle 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, or 30%, otide encodes a polypeptide sequence Substantially homolo provided that the variant retains the desired function of the gous to a pbp, dsbA, dsbC, Bce, Cup A2, CupB2, CupC2, original polypeptide. A variant amino acid with Substantial 40 NikA, Flg.I, ORF5550, Ttg2C, or ORF8124 secretion signal homology will be at least about 70%, at least about 75%, at polypeptide. In another embodiment, the present invention least about 80%, about 85%, about 90%, about 95%, about provides a nucleic acid that encodes a polypeptide sequence 96%, about 97%, about 98%, or at least about 99% homolo Substantially homologous to at least amino acids 2-24 of SEQ gous to the given polypeptide. A variant amino acid may be ID NO:2, at least amino acids 2-22 of SEQID NO:4, at least obtained in various ways including amino acid Substitutions, 45 amino acids 2-21 of SEQID NO:6, at least amino acids 2-33 deletions, truncations, and insertions of one or more amino of SEQID NO:8, at least amino acids 2-25 of SEQID NO:10, acids of SEQID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, or 24, at least amino acids 2-24 of SEQ ID NO:12, at least amino including up to about 1, about 2, about 3, about 4, about 5. acids 2-23 of SEQ ID NO:14, at least amino acids 2-21 of about 6, about 7, about 8, about 9, about 10, about 15, about SEQID NO:16, at least amino acids 2-21 of SEQID NO:18, 20, about 25, or more amino acid substitutions, deletions or 50 at least amino acids 2-21 of SEQ ID NO:20, at least amino insertions. acids 2-33 of SEQID NO:22, or at least amino acids 2-39 of By “substantially homologous' or “substantially similar SEQ ID NO:24, or provides a nucleic acid substantially is intended an amino acid or nucleotide sequence that has at homologous to SEQIDNO:1,3,5,7,9,11, 13, 15, 17, 19, 20, least about 60% or 65% sequence identity, about 70% or 75% 21, or 23, including biologically active variants and frag sequence identity, about 80% or 85% sequence identity, about 55 ments thereof. In another embodiment, the nucleic acid 90%, about 91%, about 92%, about 93%, about 94%, about sequence is at least about 60%, at least about 65%, at least 95%, about 96%, about 97%, about 98% or about 99% or about 70%, about 75%, about 80%, about 85%, about 90%, greater sequence identity compared to a reference sequence about 95%, about 96%, about 97%, about 98%, or at least using one of the alignment programs described herein using about 99% identical to the sequence of SEQID NO:1, 3, 5, 7, standard parameters. One of skill in the art will recognize that 60 9, 11, 13, 15, 17, 19, 20, 21, or 23. In another embodiment, the these values can be appropriately adjusted to determine cor nucleic acid encodes a polypeptide that is at least about 70%, responding identity of proteins encoded by two nucleotide at least about 75%, at least about 80%, about 85%, about 90%, sequences by taking into account codon degeneracy, amino about 95%, about 96%, about 97%, about 98%, or at least acid similarity, reading frame positioning, and the like. about 99% identical to the amino acid sequence of SEQ ID For example, preferably, conservative amino acid Substi 65 NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, or 24. tutions may be made at one or more predicted, preferably Preferred secretion signal polypeptides of the present nonessential amino acid residues. A “nonessential amino invention are encoded by a nucleotide sequence Substantially US 7,618,799 B2 9 10 homologous to the nucleotide sequences of SEQID NO:1 or These and other sequence alignment methods are well 3. Using methods such as PCR, hybridization, and the like, known in the art and may be conducted by manual alignment, corresponding secretion signal polypeptide sequences can be by visual inspection, or by manual or automatic application of identified, such sequences having Substantial identity to the a sequence alignment algorithm, Such as any of those embod sequences of the invention. See, for example, Sambrook J. ied by the above-described programs. Various useful algo and Russell, D.W. (2001) Molecular Cloning: A Laboratory rithms include, e.g.: the similarity search method described in Manual. (Cold Spring Harbor Laboratory Press, Cold Spring W. R. Pearson & D. J. Lipman, Proc. Natl. Acad. Sci. USA Harbor, N.Y.) and Innis, et al. (1990) PCR Protocols: A Guide 85:2444-48 (April 1988); the local homology method to Methods and Applications (Academic Press, NY). Variant described in T. F. Smith & M. S. Waterman, in Adv. Appl. nucleotide sequences also include synthetically derived 10 Math. 2:482-89 (1981) and in J. Molec. Biol. 147: 195-97 nucleotide sequences that have been generated, for example, (1981); the homology alignment method described in S. B. by using site-directed mutagenesis but which still encode the Needleman & C. D. Wunsch, J. Molec. Biol. 48(3):443-53 secretion signal polypeptides disclosed in the present inven (March 1970); and the various methods described, e.g., by W. tion as discussed infra. Variant Secretion signal polypeptides R. Pearson, in Genomics 11(3):635-50 (November 1991); by encompassed by the present invention are biologically active, 15 W. R. Pearson, in Methods Molec. Biol. 24:307-31 and that is, they continue to possess the desired biological activity 25:365-89 (1994); and by D. G. Higgins & P. M. Sharp, in of the native protein, that is, retaining secretion signaling Comp. Applins in Biosci. 5:151-53 (1989) and in Gene 73(1): activity. By “retains activity” is intended that the variant will 237-44 (15 Dec. 1988). have at least about 30%, at least about 50%, at least 70%, at Unless otherwise stated, GAP Version 10, which uses the least about 80%, at least about 85%, at least about 90%, at algorithm of Needleman and Wunsch (1970) supra, will be least about 95%, about 96%, about 97%, about 98%, at least used to determine sequence identity or similarity using the about 99% or greater of the activity of the native secretion following parameters:% identity and% similarity for a nucle signal polypeptide. Methods for measuring secretion signal otide sequence using GAPWeight of 50 and Length Weight of polypeptide activity are discussed elsewhere herein. 3, and the nwsgapdna.cmp scoring matrix; % identity or % The skilled artisan will further appreciate that changes can 25 similarity for an amino acid sequence using GAP weight of 8 be introduced by mutation into the nucleotide sequences of and length weight of 2, and the BLOSUM62 scoring pro the invention thereby leading to changes in the amino acid gram. Equivalent programs may also be used. By "equivalent sequence of the encoded secretion signal polypeptides, with program' is intended any sequence comparison program that, out altering the biological activity of the secretion signal for any two sequences in question, generates an alignment polypeptides. Thus, variant isolated nucleic acid molecules 30 having identical nucleotide residue matches and an identical can be created by introducing one or more nucleotide Substi percent sequence identity when compared to the correspond tutions, additions, or deletions into the corresponding nucle ing alignment generated by GAP Version 10. In various otide sequence disclosed herein, Such that one or more amino embodiments, the sequence comparison is performed across acid Substitutions, additions or deletions are introduced into the entirety of the query or the Subject sequence, or both. the encoded protein. Mutations can be introduced by standard 35 D. Hybridization Conditions techniques, such as site-directed mutagenesis and PCR-me In another aspect of the invention, a nucleic acid that diated mutagenesis. Such variant nucleotide sequences are hybridizes to an isolated nucleic acid with a sequence that also encompassed by the present invention. encodes a polypeptide with a sequence Substantially similar C. Nucleic Acid and Amino Acid Homology to a pbp, dsbA, dsbC, Bce, Cup A2, CupB2, CupC2, NikA, Nucleic acid and amino acid sequence homology is deter 40 FlgI, ORF5550, Ttg2C, or ORF8124 secretion signal mined according to any of various methods well known in the polypeptide is provided. In certain embodiments, the hybrid art. Examples of useful sequence alignment and homology izing nucleic acid will bind under high Stringency conditions. determination methodologies include those described below. In various embodiments, the hybridization occurs across Sub Alignments and searches for similar sequences can be per stantially the entire length of the nucleotide sequence encod formed using the U.S. National Center for Biotechnology 45 ing the secretion signal polypeptide, for example, across Sub Information (NCBI) program, MegaBLAST (currently avail stantially the entire length of one or more of SEQID NO:1, 3, able at www.ncbi.nlm.nih.gov/BLAST/). Use of this program 5, 7, 9, 11, 13, 15, 17, 19, 21, or 23. A nucleic acid molecule with options for percent identity set at, for example, 70% for hybridizes to “substantially the entire length' of a secretion amino acid sequences, or set at, for example, 90% for nucle signal-encoding nucleotide sequence disclosed herein when otide sequences, will identify those sequences with 70%, or 50 the nucleic acid molecule hybridizes over at least 80% of the 90%, or greater sequence identity to the query sequence. entirelength of one or more of SEQID NO:1,3,5,7,9,11, 13, Other software known in the art is also available for aligning 15, 17, 19, 21, or 23, at least 85%, at least 90%, or at least 95% and/or searching for similar sequences, e.g., sequences at of the entire length. Unless otherwise specified, “substan least 70% or 90% identical to an information string contain tially the entire length” refers to at least 80% of the entire ing a secretion signal sequence according to the present 55 length of the secretion signal-encoding nucleotide sequence invention. For example, sequence alignments for comparison where the length is measured in contiguous nucleotides (e.g., to identify sequences at least 70% or 90% identical to a query hybridizes to at least 53 contiguous nucleotides of SEQ ID sequence can be performed by use of, e.g., the GAP, BEST NO:3, at least 51 contiguous nucleotides of SEQID NO:5, at FIT, BLAST, FASTA, and TFASTA programs available in the least 80 contiguous nucleotides of SEQID NO:7, etc.). GCG Sequence Analysis Software Package (available from 60 In a hybridization method, all or part of the nucleotide the Genetics Computer Group, University of Wisconsin Bio sequence encoding the secretion signal polypeptide can be technology Center, 1710 University Avenue, Madison, Wis. used to screen cDNA or genomic libraries. Methods for con 53705), with the default parameters as specified therein, plus struction of Such cDNA and genomic libraries are generally a parameter for the extent of sequence identity set at the known in the art and are disclosed in Sambrook and Russell, desired percentage. Also, for example, the CLUSTAL pro 65 2001. The so-called hybridization probes may be genomic gram (available in the PC/Gene software package from Intel DNA fragments, cDNA fragments, RNA fragments, or other ligenetics, Mountain View, Calif.) may be used. oligonucleotides, and may be labeled with a detectable group US 7,618,799 B2 11 12 such as P. or any other detectable marker, such as other are detected (heterologous probing). Generally, a probe is less radioisotopes, a fluorescent compound, an enzyme, or an than about 1000 nucleotides in length, preferably less than enzyme co-factor. Probes for hybridization can be made by 500 nucleotides in length. labeling synthetic oligonucleotides based on the known Typically, stringent conditions will be those in which the secretion signal polypeptide-encoding nucleotide sequence 5 salt concentration is less than about 1.5 M Naion, typically disclosed herein. Degenerate primers designed on the basis of about 0.01 to 1.0 MNaion concentration (or other salts) at pH conserved nucleotides or amino acid residues in the nucle 7.0 to 8.3 and the temperature is at least about 60° C., pref otide sequence or encoded amino acid sequence can addition erably about 68° C. Stringent conditions may also be ally be used. The probe typically comprises a region of nucle achieved with the addition of destabilizing agents such as otide sequence that hybridizes understringent conditions to at 10 formamide. Exemplary low stringency conditions include least about 10, at least about 15, at least about 16, 17, 18, 19, hybridization with a buffer solution of 30 to 35% formamide, 20, or more consecutive nucleotides of a secretion signal 1 MNaCl, 1% SDS (sodium dodecyl sulfate) at 37° C., and a polypeptide-encoding nucleotide sequence of the invention wash in 1x to 2xSSC (20xSSC=3.0M NaC1/0.3 M trisodium or a fragment or variant thereof. Methods for the preparation 15 citrate) at 50 to 55° C. Exemplary moderate stringency con of probes for hybridization are generally known in the art and ditions include hybridization in 40 to 45% formamide, 1.0 M are disclosed in Sambrook and Russell, 2001, herein incor NaCl, 1% SDS at 37°C., and a wash in 0.5x to 1XSSC at 55 porated by reference. to 60° C. Exemplary high stringency conditions include In hybridization techniques, all or part of a known nucle hybridization in 50% formamide, 1 MNaCl, 1% SDS at 37° otide sequence is used as a probe that selectively hybridizes to C., and a wash in 0.1 xSSC at 60 to 68°C. Optionally, wash other corresponding nucleotide sequences present in a popu buffers may comprise about 0.1% to about 1% SDS. Duration lation of cloned genomic DNA fragments or cDNA fragments of hybridization is generally less than about 24 hours, usually (i.e., genomic or cDNA libraries) from a chosen organism. about 4 to about 12 hours. The hybridization probes may be genomic DNA fragments, Specificity is typically the function of post-hybridization cDNA fragments, RNA fragments, or other oligonucleotides, 25 washes, the critical factors being the ionic strength and tem and may be labeled with a detectable group such as P. or any perature of the final wash solution. For DNA-DNA hybrids, other detectable marker. Thus, for example, probes for the T can be approximated from the equation of Meinkoth hybridization can be made by labeling synthetic oligonucle and Wahl (1984) Anal. Biochem. 138:267-284: T-81.5° otides based on the Secretion signal polypeptide-encoding 30 C.+16.6 (log M)+0.41 (%GC)-0.61 (% form)-500/L: where nucleotide sequence of the invention. Methods for the prepa M is the molarity of monovalent cations, 96 GC is the per ration of probes for hybridization and for construction of centage of guanosine and cytosine nucleotides in the DNA,% cDNA and genomic libraries are generally known in the art form is the percentage of formamide in the hybridization and are disclosed in Sambrook et al. (1989) Molecular Clon solution, and L is the length of the hybrid in base pairs. The T. ing: A Laboratory Manual (2d ed., Cold Spring Harbor Labo 35 is the temperature (under defined ionic strength and pH) at ratory Press, Plainview, N.Y.). which 50% of a complementary target sequence hybridizes to For example, the entire Secretion signal polypeptide-en a perfectly matched probe. T is reduced by about 1° C. for coding nucleotide sequence disclosed herein, or one or more each 1% of mismatching; thus, T., hybridization, and/or portions thereof, may be used as a probe capable of specifi wash conditions can be adjusted to hybridize to sequences of cally hybridizing to corresponding nucleotide sequences and 40 the desired identity. For example, if sequences with 290% messenger RNAS encoding secretion signal polypeptides. To identity are sought, the T can be decreased 10°C. Generally, achieve specific hybridization under a variety of conditions, stringent conditions are selected to be about 5° C. lower than Such probes include sequences that are unique and are pref the thermal melting point (T) for the specific sequence and erably at least about 10 nucleotides in length, or at least about its complement at a defined ionic strength and pH. However, 15 nucleotides in length. Such probes may be used to amplify 45 severely stringent conditions can utilize a hybridization and/ corresponding secretion signal polypeptide-encoding nucle or wash at 1, 2, 3, or 4°C. lower than the thermal melting point otide sequences from a chosen organism by PCR. This tech (T); moderately stringent conditions can utilize a hybridiza nique may be used to isolate additional coding sequences tion and/or wash at 6, 7, 8, 9, or 10° C. lower than the thermal from a desired organism or as a diagnostic assay to determine melting point (T): low stringency conditions can utilize a the presence of coding sequences in an organism. Hybridiza 50 hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. tion techniques include hybridization screening of plated lower than the thermal melting point (T). Using the equa DNA libraries (either plaques or colonies; see, for example, tion, hybridization and wash compositions, and desired T. Sambrook et al. (1989) Molecular Cloning. A Laboratory those of ordinary skill will understand that variations in the Manual (2d ed., Cold Spring Harbor Laboratory Press, Pla stringency of hybridization and/or wash solutions are inher inview, N.Y.). 55 ently described. If the desired degree of mismatching results Hybridization of Such sequences may be carried out under in a T of less than 45° C. (aqueous solution) or 32° C. stringent conditions. By "stringent conditions' or 'stringent (formamide solution), it is preferred to increase the SSC hybridization conditions” is intended conditions under which concentration so that a higher temperature can be used. An a probe will hybridize to its target sequence to a detectably extensive guide to the hybridization of nucleic acids is found greater degree than to other sequences (e.g., at least 2-fold 60 in Tijssen (1993) Laboratory Techniques in Biochemistry and over background). Stringent conditions are sequence-depen Molecular Biology—Hybridization with Nucleic Acid dent and will be different in different circumstances. By con Probes, Part I, Chapter 2 (Elsevier, N.Y.); and Ausubel et al., trolling the Stringency of the hybridization and/or washing eds. (1995) Current Protocols in Molecular Biology, Chapter conditions, target sequences that are 100% complementary to 2 (Greene Publishing and Wiley-Interscience, New York). the probe can be identified (homologous probing). Alterna 65 See Sambrooketal. (1989) Molecular Cloning: A Laboratory tively, stringency conditions can be adjusted to allow some Manual (2d ed., Cold Spring Harbor Laboratory Press, Cold mismatching in sequences so that lower degrees of similarity Spring Harbor, N.Y.). US 7,618,799 B2 13 14 E Codon Usage both the signal sequence and/or the protein or polypeptide The nucleic acid sequences disclosed herein may be sequence. The gene(s) are constructed within or inserted into adjusted based on the codon usage of a host organism. Codon one or more vector(s), which can then be transformed into the usage or codon preference is well known in the art. The expression host cell. selected coding sequence may be modified by altering the Other regulatory elements may be included in a vector genetic code thereof to match that employed by the bacterial (also termed “expression construct”). Such elements include, host cell, and the codon sequence thereof may be enhanced to but are not limited to, for example, transcriptional enhancer better approximate that employed by the host. Genetic code sequences, translational enhancer sequences, other promot selection and codon frequency enhancement may be per ers, activators, translational start and stop signals, transcrip formed according to any of the various methods known to one 10 tion terminators, cistronic regulators, polycistronic regula of ordinary skill in the art, e.g., oligonucleotide-directed tors, tag sequences, such as nucleotide sequence “tags' and mutagenesis. Useful on-line InterNet resources to assist in “tag” polypeptide coding sequences, which facilitates iden this process include, e.g.: (1) the Codon Usage Database of tification, separation, purification, and/or isolation of an the Kazusa DNA Research Institute (2-6-7 Kazusa-kamatari, expressed polypeptide. Kisarazu, Chiba 292-0818 Japan) and available at www.ka 15 In another embodiment, the expression vector further com Zusa.orjp/codon; and (2) the Genetic Codes tables available prises a tag sequence adjacent to the coding sequence for the from the NCBI database at www.ncbi.nlm.nih secretion signal or to the coding sequence for the protein or .gov/-Taxonomy/Utils/wprintgc.cgi?mode c. For example, polypeptide of interest. In one embodiment, this tag sequence Pseudomonas species are reported as utilizing Genetic Code allows for purification of the protein. The tag sequence can be Translation Table 11 of the NCBI Taxonomy site, and at the an affinity tag. Such as a hexa-histidine affinity tag. In another Kazusa site as exhibiting the codon usage frequency of the embodiment, the affinity tag can be a glutathione-S-trans table shown at www.kazusa.or.ip/codon/cgibin. It is recog ferase molecule. The tag can also be a fluorescent molecule, nized that the coding sequence for either the secretion signal such as YFP or GFP, or analogs of such fluorescent proteins. polypeptide, the polypeptide of interest described elsewhere The tag can also be a portion of an antibody molecule, or a herein, or both, can be adjusted for codon usage. 25 known antigen or ligand for a known binding partner useful F. Expression Vectors for purification. Another embodiment of the present invention includes an A protein-encoding gene according to the present inven expression vector which includes a nucleic acid that encodes tion can include, in addition to the protein coding sequence, a novel secretion polypeptide useful for targeting an operably the following regulatory elements operably linked thereto: a linked protein or polypeptide of interest to the periplasm of 30 promoter, a ribosome binding site (RBS), a transcription ter Gram-negative bacteria or into the extracellular space. In one minator, translational start and stop signals. Useful RBSS can embodiment, the vector comprises a polynucleotide sequence be obtained from any of the species useful as host cells in that encodes a polypeptide that is substantially similar to a expression systems according to the present invention, pref secretion signal polypeptide disclosed herein, operably erably from the selected host cell. Many specific and a variety linked to a promoter. Expressible coding sequences will be 35 of consensus RBSS are known, e.g., those described in and operatively attached to a transcription promoter capable of referenced by D. Frishman et al., Starts of bacterial genes: functioning in the chosen host cell, as well as all other estimating the reliability of computer predictions, Gene 234 required transcription and translation regulatory elements. (2):257-65 (8 Jul. 1999); and B. E. Suzek et al. A probabi The term “operably linked’ refers to any configuration in listic method for identifying start codons in bacterial which the transcriptional and any translational regulatory 40 genomes, Bioinformatics 17(12): 1123-30 (December 2001). elements are covalently attached to the encoding sequence in In addition, either native or synthetic RBSs may be used, e.g., Such disposition(s), relative to the coding sequence, that in those described in: EP 0207459 (synthetic RBSs); O. Ikehata and by action of the host cell, the regulatory elements can et al., Primary structure of nitrile hydratase deduced from the direct the expression of the coding sequence. nucleotide sequence of a Rhodococcus species and its expres The vector will typically comprise one or more phenotypic 45 sion in Escherichia coli, Eur. J. Biochem. 181(3):563-70 selectable markers and an origin of replication to ensure (1989)(native RBS sequence of AAGGAAG). Further maintenance of the vector and to, if desirable, provide ampli examples of methods, vectors, and translation and transcrip fication within the host. Suitable hosts for transformation in tion elements, and other elements useful in the present inven accordance with the present disclosure include various spe tion are described in, e.g.: U.S. Pat. No. 5,055,294 to Gilroy cies within the genera Pseudomonas, and particularly pre 50 and U.S. Pat. No. 5,128,130 to Gilroy et al.: U.S. Pat. No. ferred is the host cell strain of Pfluorescens. 5,281,532 to Rammler et al.; U.S. Pat. Nos. 4,695,455 and In one embodiment, the vector further comprises a coding 4,861,595 to Barnes et al.; U.S. Pat. No. 4,755,465 to Gray et sequence for expression of a protein or polypeptide of inter al.; and U.S. Pat. No. 5,169,760 to Wilcox. est, operably linked to the secretion signal disclosed herein. Transcription of the DNA encoding the proteins of the The recombinant proteins and polypeptides can be expressed 55 present invention is increased by inserting an enhancer from polynucleotides in which the target polypeptide coding sequence into the vector or plasmid. Typical enhancers are sequence is operably linked to the leader sequence and tran cis-acting elements of DNA, usually about from 10 to 300 bp Scription and translation regulatory elements to form a func in size that act on the promoter to increase its transcription. tional gene from which the host cell can express the protein or Examples include various Pseudomonas enhancers. polypeptide. The coding sequence can be a native coding 60 Generally, the recombinant expression vectors will include sequence for the target polypeptide, if available, but will more origins of replication and selectable markers permitting trans preferably be a coding sequence that has been selected, formation of the host cell and a promoter derived from a improved, or optimized for use in the selected expression host highly-expressed gene to direct transcription of a downstream cell: for example, by Synthesizing the gene to reflect the structural sequence. Such promoters can be derived from codon use bias of a host species. In one embodiment of the 65 operons encoding the enzymes Such as 3-phosphoglycerate invention, the host species is a Pfluorescens, and the codon kinase (PGK), acid phosphatase, or heat shock proteins, bias of Pfluorescens is taken into account when designing among others. The heterologous structural sequence is US 7,618,799 B2 15 16 assembled in appropriate phase with translation initiation and replication and mobilization loci from the RSF 1010 plasmid. termination sequences, and preferably, the secretion Other exemplary useful vectors include those described in sequence capable of directing secretion of the translated U.S. Pat. No. 4,680,264 to Puhler et al. polypeptide. Optionally the heterologous sequence can In one embodiment, an expression plasmid is used as the encode a fusion polypeptide including an N-terminal identi 5 expression vector. In another embodiment, RSF 1010 or a fication polypeptide imparting desired characteristics, e.g., derivative thereof is used as the expression vector. In still stabilization or simplified purification of expressed recombi another embodiment, pMYC1050 or a derivative thereof, or nant product. pMYC4803 or a derivative thereof, is used as the expression Vectors are known in the art for expressing recombinant Vector. proteins in host cells, and any of these may be used for 10 expressing the genes according to the present invention. Such The plasmid can be maintained in the host cell by inclusion vectors include, e.g., plasmids, cosmids, and phage expres of a selection marker gene in the plasmid. This may be an sion vectors. Examples of useful plasmid vectors include, but antibiotic resistance gene(s), where the corresponding anti are not limited to, the expression plasmids pEBBR1MCS, biotic(s) is added to the fermentation medium, or any other pDSK519, pKT240, pML122, pPS10, RK2, RK6, pRO1600, 15 type of selection marker gene known in the art, e.g., a pro and RSF 1010. Other examples of such useful vectors include totrophy-restoring gene where the plasmid is used in a host those described by, e.g.: N. Hayase, in Appl. Envir. Microbiol. cell that is auxotrophic for the corresponding trait, e.g., a 60(9):3336-42 (September 1994); A. A. Lushnikov et al., in biocatalytic trait Such as an amino acid biosynthesis or a Basic Life Sci. 30:657-62 (1985): S. Graupner & W. Wack nucleotide biosynthesis trait, or a carbon Source utilization emagel, in Biomolec.Eng. 17(1): 11-16. (October 2000); H. P. trait. Schweizer, in Curr. Opin. Biotech. 12(5):439-45 (October The promoters used in accordance with the present inven 2001); M. Bagdasarian & K. N. Timmis, in Curr. Topics tion may be constitutive promoters or regulated promoters. Microbiol. Immunol.96:47-67 (1982); T. Ishiiet al., in FEMS Common examples of useful regulated promoters include Microbiol. Lett. 116(3):307-13 (Mar. 1, 1994); I. N. Olekh those of the family derived from the lac promoter (i.e. the lacz novich & Y. K. Fomichev, in Gene 140(1):63-65 (Mar. 11, 25 promoter), especially the tac and trc promoters described in 1994); M. Tsuda & T. Nakazawa, in Gene 136(1-2):257-62 U.S. Pat. No. 4,551,433 to DeBoer, as well as Ptac16, Ptac17, (Dec. 22, 1993); C. Nieto et al., in Gene 87(1): 145-49 (Mar. 1, PtacII, PlacUV5, and the T71ac promoter. In one embodi 1990); J. D. Jones & N. Gutterson, in Gene 61(3):299-306 ment, the promoter is not derived from the host cell organism. (1987); M. Bagdasarian et al., in Gene 16(1-3):237-47 (De In certain embodiments, the promoter is derived from an E. cember 1981); H. P. Schweizer et al., in Genet. Eng. (NY) 30 coli organism. 23:69-81 (2001); P. Mukhopadhyay et al., in J. Bact. 172(1): Common examples of non-lac-type promoters useful in 477-80 (January 1990); D.O. Wood et al., in J. Bact. 145(3): expression systems according to the present invention 1448-51 (March 1981); and R. Holtwicket al., in Microbiol include, e.g., those listed in Table 3. ogy 147(Pt 2):337-44 (February 2001). Further examples of expression vectors that can be useful 35 TABLE 3 in a host cell comprising the secretion signal constructs of the invention include those listed in Table 2 as derived from the Examples of non-lac Promoters indicated replicons. Promoter Inducer TABLE 2 40 PR High temperature P. High temperature Examples of Useful Expression Vectors Pn Alkyl- or halo-benzoates Pl Alkyl- or halo-toluenes Replicon Vector(s) Psal Salicylates PPS10 PCN39, PCN51 45 RSF1010 PKT261-3 See, e.g.: J. Sanchez-Romero & V. De Lorenzo (1999) PMMB66EH Genetic Engineering of Nonpathogenic Pseudomonas strains PEB8 PPLGN1 as Biocatalysts for Industrial and Environmental Processes, PMYC1050 in Manual of Industrial Microbiology and Biotechnology (A. RK2 RP1 PRK415 50 Demain & J. Davies, eds.) pp. 460-74 (ASM Press, Washing PJB653 ton, D.C.); H. Schweizer (2001) Vectors to express foreign PRO1600 PUCP genes and techniques to monitor gene expression for PBSP Pseudomonads, Current Opinion in Biotechnology, 12:439 445; and R. Slater & R. Williams (2000) The Expression of The expression plasmid, RSF1010, is described, e.g., by F. 55 Foreign DNA in Bacteria, in Molecular Biology and Biotech Heffron et al., in Proc. Natl Acad. Sci. USA 7209):3623-27 nology (J. Walker & R. Rapley, eds.) pp. 125-54 (The Royal (September 1975), and by K. Nagahari & K. Sakaguchi, in J. Society of Chemistry, Cambridge, UK)). A promoter having Bact. 133(3): 1527-29 (March 1978). Plasmid RSF110 and the nucleotide sequence of a promoter native to the selected derivatives thereof are particularly useful vectors in the bacterial host cell may also be used to control expression of present invention. Exemplary, useful derivatives of RSF1010, 60 the transgene encoding the target polypeptide, e.g., a which are known in the art, include, e.g., pKT212, pKT214, Pseudomonas anthranilate or benzoate operon promoter pKT231 and related plasmids, and pMYC1050 and related (Pant, Pben). Tandem promoters may also be used in which plasmids (see, e.g., U.S. Pat. Nos. 5,527,883 and 5,840,554 to more than one promoter is covalently attached to another, Thompson et al.), such as, e.g., pMYC1803. Plasmid whether the same or different in sequence, e.g., a Pant-Pben pMYC1803 is derived from the RSF110-based plasmid 65 tandem promoter (interpromoter hybrid) or a Plac-Plac tan pTJS260 (see U.S. Pat. No. 5,169,760 to Wilcox), which dem promoter, or whether derived from the same or different carries a regulated tetracycline resistance marker and the organisms. US 7,618,799 B2 17 18 Regulated promoters utilize promoter regulatory proteins tion signal selected from the group consisting of a pbp. in order to control transcription of the gene of which the dsbA, dsbC, Bce, Cup A2, CupB2, CupC2, NikA, Flg.I. promoter is a part. Where a regulated promoter is used herein, ORF5550. Ttg2C, and ORF8124 secretion signal sequence, a corresponding promoter regulatory protein will also be part or a sequence that is Substantially homologous to the secre of an expression system according to the present invention. tion signal sequence disclosed herein as SEQID NO:1, 3, 5, Examples of promoter regulatory proteins include: activator 7, 9, 11, 13, 15, 17, 19, 20, 21, or 23, or a nucleotide sequence proteins, e.g., E. coli catabolite activator protein, MalT pro encoding SEQID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, or tein; AraC family transcriptional activators; repressor pro 24. In some embodiments, no modifications are made teins, e.g., E. coli LacI proteins; and dual-function regulatory between the signal sequence and the protein or polypeptide of proteins, e.g., E. coli NagO protein. Many regulated-pro 10 interest. However, in certain embodiments, additional cleav moter/promoter-regulatory-protein pairs are known in the art. age signals are incorporated to promote proper processing of Promoter regulatory proteins interact with an effector com the amino terminal of the polypeptide. pound, i.e. a compound that reversibly or irreversibly associ The secretion system can also include a fermentation ates with the regulatory protein so as to enable the protein to medium, such as described below. In one embodiment, the either release or bind to at least one DNA transcription regu 15 system includes a mineral salts medium. In another embodi latory region of the gene that is under the control of the ment, the system includes a chemical inducer in the medium. promoter, thereby permitting or blocking the action of a tran The CHAMPIONTM pET expression system provides a Scriptase enzyme in initiating transcription of the gene. Effec high level of protein production. Expression is induced from tor compounds are classified as either inducers or co-repres the strong T7lac promoter. This system takes advantage of the sors, and these compounds include native effector high activity and specificity of the bacteriophage T7 RNA compounds and gratuitous inducer compounds. Many regu polymerase for high level transcription of the gene of interest. lated-promoter/promoter-regulatory-protein/effector-com The lac operator located in the promoter region provides pound trios are known in the art. Although an effector com tighter regulation than traditional T7-based vectors, improv pound can be used throughout the cell culture or ing plasmid stability and cell viability (Studier and Moffatt fermentation, in a preferred embodiment in which a regulated 25 (1986).J Molecular Biology 189(1): 113-30; Rosenberg, et al. promoter is used, after growth of a desired quantity or density (1987) Gene 56(1): 125-35). The T7 expression system uses of host cell biomass, an appropriate effector compound is the T7 promoter and T7 RNA polymerase (T7 RNAP) for added to the culture to directly or indirectly result in expres high-level transcription of the gene of interest. High-level sion of the desired gene(s) encoding the protein or polypep expression is achieved in T7 expression systems because the tide of interest. 30 T7 RNAP is more processive than native E. coli RNAP and is By way of example, where a lac family promoter is uti dedicated to the transcription of the gene of interest. Expres lized, a lacI gene can also be present in the system. The lacI sion of the identified gene is induced by providing a source of gene, which is (normally) a constitutively expressed gene, T7 RNAP in the host cell. This is accomplished by using a encodes the Lac repressor protein (LacD protein) which binds BL21 E. coli host containing a chromosomal copy of the T7 to the lac operator of these promoters. Thus, where a lac 35 RNAP gene. The T7 RNAP gene is under the control of the family promoter is utilized, the lacI gene can also be included lacUV5 promoter which can be induced by IPTG. T7 RNAP and expressed in the expression system. In the case of the lac is expressed upon induction and transcribes the gene of inter promoter family members, e.g., the tac promoter, the effector eSt. compound is an inducer, preferably a gratuitous inducer Such The pBAD expression system allows tightly controlled, as IPTG (isopropyl-D-1-thiogalactopyranoside, also called 40 titratable expression of protein or polypeptide of interest "isopropylthiogalactoside'). through the presence of specific carbon sources such as glu For expression of a protein or polypeptide of interest, any cose, glycerol and arabinose (Guzman, et al. (1995) J Bacte plant promoter may also be used. A promoter may be a plant riology 177(14): 4121-30). The p3AD vectors are uniquely RNA polymerase II promoter. Elements included in plant designed to give precise control over expression levels. Het promoters can be a TATA box or Goldberg-Hogness box, 45 erologous gene expression from the pBAD vectors is initiated typically positioned approximately 25 to 35 basepairs at the araBAD promoter. The promoter is both positively and upstream (5') of the transcription initiation site, and the negatively regulated by the product of the araC gene. AraC is CCAAT box, located between 70 and 100 basepairs a transcriptional regulator that forms a complex with L-ara upstream. In plants, the CCAAT box may have a different binose. In the absence of L-arabinose, the AraC dimer blocks consensus sequence than the functionally analogous 50 transcription. For maximum transcriptional activation two sequence of mammalian promoters (Messing et al. (1983) In: events are required: (i.) L-arabinose binds to AraC allowing Genetic Engineering of Plants, Kosuge et al., eds., pp. 211 transcription to begin. (ii.) The cAMP activator protein 227). In addition, virtually all promoters include additional (CAP)-cAMP complex binds to the DNA and stimulates upstream activating sequences or enhancers (Benoist and binding of AraC to the correct location of the promoter Chambon (1981) Nature 290:304-310; Gruss et al. (1981) 55 region. Proc. Nat. Acad. Sci. 78:943-947; and Khoury and Gruss The trc expression system allows high-level, regulated (1983) Cell 27:313-314) extending from around -100 bp to expression in E. coli from the trc promoter. The trc expression -1,000 bp or more upstream of the transcription initiation vectors have been optimized for expression of eukaryotic site. genes in E. coli. The trc promoter is a strong hybrid promoter G. Expression Systems 60 derived from the tryptophane (trp) and lactose (lac) promot The present invention further provides an improved ers. It is regulated by the lacO operator and the product of the expression system useful for targeting an operably linked lacIQ gene (Brosius, J. (1984) Gene 27(2):161-72). protein or polypeptide of interest to the periplasm of Gram Transformation of the host cells with the vector(s) dis negative bacteria or into the extracellular space. In one closed herein may be performed using any transformation embodiment, the system includes a host cell and a vector 65 methodology known in the art, and the bacterial host cells described above comprising a nucleotide sequence encoding may be transformed as intact cells or as protoplasts (i.e. a protein or polypeptide of interest operably linked to a secre including cytoplasts). Exemplary transformation methodolo US 7,618,799 B2 19 20 gies include poration methodologies, e.g., electroporation, PJ 372; 203 PJ 376; 204 IFO 15835; PJ 682): 205 PJ protoplast fusion, bacterial conjugation, and divalent cation 686: 206PJ 692; 207 PJ 693): 208PJ 722; 212. PJ 832); treatment, e.g., calcium chloride treatment or CaCl/Mg2+ 215 PJ 849: 216 PJ 885; 267 B-9: 271 B-1612; 401 treatment, or other well known methods in the art. See, e.g., C71A; IFO 15831; PJ 187; NRRL B-3 1784; IFO. 15841; Morrison, J. Bact., 132:349-351 (1977); Clark-Curtiss & KY 8521:3081: 30-21; IFO3081; N: PYR; PW; D946-B83 Curtiss, Methods in Enzymology, 101:347-362 (Wu et al., eds. BU 2183: FERM-P 3328; P-2563 FERM-P 2894; IFO 1983), Sambrook et al., Molecular Cloning, A Laboratory 13658); IAM-1 12643F: M-1: A506 A5-06); A505 A5 Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expres 05-1: A526 A5-26: B69; 72; NRRL B-4290; PMW6 Sion: A Laboratory Manual (1990); and Current Protocols in NCIB 11615: SC 12936; Al IFO 15839); F 1847 CDC Molecular Biology (Ausubel et al., eds., 1994)). 10 EB; F 1848 CDC93): NCIB 10586; P17; F-12: AmMS 257; H. Host Cell PRA25; 6133D02: 6519E01; Ni; SC15208; BNL-WVC: In one embodiment the invention provides an expression NCTC 2583 NCIB 81.94: H13; 1013 ATCC 11251; CCEB system useful for targeting an operably linked protein or 295]: IFO 3903: 1062; or Pf-5. polypeptide of interest to the periplasm of Gram-negative In one embodiment, the host cell can be any cell capable of bacteria or into the extracellular space. In one embodiment, 15 producing a protein or polypeptide of interest, including a P this system utilizes a secretion signal peptide. In another fluorescens cell as described above. The most commonly used embodiment, the expression system is a Pfluorescens expres systems to produce proteins or polypeptides of interest sion system for expression of a protein comprising a secretion include certain bacterial cells, particularly E. coli, because of signal disclosed herein. This aspect of the invention is their relatively inexpensive growth requirements and poten founded on the Surprising discovery that P. fluorescens is tial capacity to produce protein in large batch cultures. Yeasts capable of properly processing and targeting secretion signals are also used to express biologically relevant proteins and from both Pfluorescens and non-Pfluorescens systems. polypeptides, particularly for research purposes. Systems In this embodiment, the host cell can be selected from include Saccharomyces cerevisiae or Pichia pastoris. These “Gram-negative Subgroup 18.” “Gram-nega systems are well characterized, provide generally acceptable tive Proteobacteria Subgroup 18 is defined as the group of all 25 levels of total protein expression and are comparatively fast Subspecies, varieties, strains, and other Sub-special units of and inexpensive. Insect cell expression systems have also the species Pseudomonas fluorescens, including those emerged as an alternative for expressing recombinant pro belonging, e.g., to the following (with the ATCC or other teins in biologically active form. In some cases, correctly deposit numbers of exemplary strain(s) shown in parenthe folded proteins that are post-translationally modified can be sis): Pseudomonas fluorescens biotype A, also called biovar 1 30 produced. Mammalian cell expression systems, such as Chi or biovar I (ATCC 13525); Pseudomonas fluorescens biotype nese hamster ovary cells, have also been used for the expres B, also called biovar 2 or biovar II (ATCC 17816); Pseudomo sion of proteins or polypeptides of interest. On a small scale, nas fluorescens biotype C, also called biovar 3 or biovar III these expression systems are often effective. Certain biolog (ATCC 17400); Pseudomonas fluorescens biotype F, also ics can be derived from proteins, particularly in animal or called biovar 4 or biovar IV (ATCC 12983); Pseudomonas 35 human health applications. In another embodiment, the host fluorescens biotype G, also called biovar 5 or biovar V (ATCC cell is a plant cell, including, but not limited to, a tobacco cell, 17518); Pseudomonas fluorescens biovar VI: Pseudomonas corn, a cell from an Arabidopsis species, potato or rice cell. In fluorescens Pf)-1; Pseudomonas fluorescens Pf-5 (ATCC another embodiment, a multicellular organism is analyzed or BAA-477); Pseudomonas fluorescens SBW25; and is modified in the process, including but not limited to a Pseudomonas fluorescens subsp. cellulosa (NCIMB 10462). 40 transgenic organism. Techniques for analyzing and/or modi The host cell can be selected from “Gram-negative Proteo fying a multicellular organism are generally based on tech bacteria Subgroup 19.” “Gram-negative Proteobacteria Sub niques described for modifying cells described below. group 19 is defined as the group of all strains of Pseudomo In another embodiment, the host cell can be a prokaryote nas fluorescens biotype A. A particularly preferred strain of Such as a bacterial cell including, but not limited to an this biotype is P. fluorescens strain MB101 (see U.S. Pat. No. 45 Escherichia or a Pseudomonas species. Typical bacterial cells 5,169,760 to Wilcox), and derivatives thereof. An example of are described, for example, in “Biological Diversity: Bacteria a preferred derivative thereof is Pfluorescens strain MB214, and Archaeans”, a chapter of the On-Line Biology Book, constructed by inserting into the MB101 chromosomal asd provided by Dr M J Farabee of the Estrella Mountain Com (aspartate dehydrogenase gene) locus, a native E. coli PlacI munity College, Arizona, USA at the website www.emc lacI-laczYA construct (i.e. in which PlacZ was deleted). 50 maricotpa.edu/faculty/farabee/BIOBK/BioBookDiversity. Additional Pfluorescens strains that can be used in the In certain embodiments, the host cell can be a Pseudomonad present invention include Pseudomonas fluorescens Migula cell, and can typically be a P fluorescens cell. In other and Pseudomonas fluorescens Loitokitok, having the follow embodiments, the host cell can also be an E. coli cell. In ing ATCC designations: NCIB 8286; NRRL B-1244: NCIB another embodiment the host cell can be a eukaryotic cell, for 8865 strain CO1, NCIB 8866 strain CO; 1291 ATCC 55 example an insect cell, including but not limited to a cell from 17458; IFO 15837: NCIB 8917; LA; NRRL B-1864; pyrro a Spodoptera, Trichoplusia, Drosophila or an Estigmene spe lidine: PW2 ICMP 3966; NCPPB 967; NRRL B-899); cies, or a mammalian cell, including but not limited to a 13475; NCTC 10038; NRRL B-1603 6; IFO 15840:52-1C: murine cell, a hamster cell, a monkey, a primate or a human CCEB 488-A BU 140); CCEB 553 EM15/47; IAM 1008 cell. AHH-27; IAM 1055 AHH-23; 1 (IFO 15842; 12 ATCC 60 In one embodiment, the host cell can be a member of any of 25323; NIH 11; den Dooren de Jong 216; 18 IFO 15833; the bacterial taxa. The cell can, for example, be a member of WRRLP-7: 93 TR-10; 108 52-22; IFO 15832; 143 IFO any species of eubacteria. The host can be a member of any 15836; PL: 1492-40-40; IFO 15838; 182 IFO 3081; PJ one of the taxa: Acidobacteria, Actinobacteira, Aquificae, 73; 184IFO 15830; 185W2L-1:186IFO 15829; PJ 79; Bacteroidetes, Chlorobi, Chlamydiae, Choroflexi, Chrysio 187 NCPPB 263; 188 NCPPB 316): 189 PJ227; 1208; 65 genetes, Cyanobacteria, Deferribacteres, Deinococcus, Dic 191 (IFO 15834; PJ 236; 22/1; 194 Klinge R-60; PJ 253); tyoglomi, Fibrobacteres, Firmicutes, Fusobacteria, Gemma 196 PJ 288: 197 PJ 290; 198 PJ 302; 201 PJ 368; 202 timonadetes, Lentisphaerae, Nitrospirae, Planctomycetes, US 7,618,799 B2 21 22 Proteobacteria, Spirochaetes, Thermodesulfobacteria, Ther gomonas (and the genus Blastomonas, derived therefrom), momicrobia, Thermotogae, Thermus (Thermales), or Verru which was created by regrouping organisms belonging to comicrobia. In a embodiment of a eubacterial host cell, the (and previously called species of) the genus Xanthomonas, cell can be a member of any species of eubacteria, excluding the genus Acidomonas, which was created by regrouping Cyanobacteria. 5 organisms belonging to the genus Acetobacter as defined in The bacterial host can also be a member of any species of Bergey (1974). In addition hosts can include cells from the Proteobacteria. A proteobacterial host cell can be a member genus Pseudomonas, Pseudomonas enalia (ATCC 14393), of any one of the taxa Alphaproteobacteria, Betaproteobac Pseudomonas nigrifaciensi (ATCC 19375), and Pseudomo teria, , Deltaproteobacteria, or Epsi nas putrefaciens (ATCC 8071), which have been reclassified lonproteobacteria. In addition, the host can be a member of 10 respectively as Alteromonas haloplanktis, Alteromonas nigri any one of the taxa Alphaproteobacteria, Betaproteobacteria, faciens, and Alteromonas putrefaciens. Similarly, e.g., or Gammaproteobacteria, and a member of any species of Pseudomonas acidovorans (ATCC 15668) and Pseudomonas Gammaproteobacteria. testosteroni (ATCC 1 1996) have since been reclassified as In one embodiment of a Gamma Proteobacterial host, the Comamonas acidovorans and Comamonas testosteroni, host will be member of any one of the taxa Aeromonadales, 15 respectively; and Pseudomonas nigrifaciens (ATCC 19375) Alteromonadales, Enterobacteriales, , or and Pseudomonas piscicida (ATCC 15057) have been reclas Xanthomonadales; or a member of any species of the Entero sified respectively as Pseudoalteromonas nigrifaciens and bacteriales or Pseudomonadales. In one embodiment, the host Pseudoalteromonas piscicida. “Gram-negative Proteobacte cell can be of the order Enterobacteriales, the host cell will be ria Subgroup 1 also includes Proteobacteria classified as a member of the family Enterobacteriaceae, or may be a belonging to any of the families: , AZoto member of any one of the genera Erwinia, Escherichia, or bacteraceae (now often called by the synonym, the Azoto Serratia; or a member of the genus Escherichia. Where the bacter group' of Pseudomonadaceae), Rhizobiaceae, and host cell is of the order Pseudomonadales, the host cell may Methylomonadaceae (now often called by the synonym, be a member of the family Pseudomonadaceae, including the “Methylococcaceae). genus Pseudomonas. Gamma Proteobacterial hosts include 25 Consequently, in addition to those genera otherwise members of the species Escherichia coli and members of the described herein, further species Pseudomonasfluorescens. Proteobacterial genera falling within “Gram-negative Pro Other Pseudomonas organisms may also be useful. teobacteria Subgroup 1 include: 1) Azotobacter group bac Pseudomonads and closely related species include Gram teria of the genus Azorhizophilus; 2) negative Proteobacteria Subgroup 1, which include the group 30 Pseudomonadaceae family bacteria of the genera of Proteobacteria belonging to the families and/or genera Cellvibrio, Oligella, and Teredinibacter; 3) Rhizobiaceae described as “Gram-Negative Aerobic Rods and Cocci' by R. family bacteria of the genera Chelatobacter, Ensifer, Liberi E. Buchanan and N. E. Gibbons (eds.), Bergey’s Manual of bacter (also called “Candidatus Liberibacter'), and Determinative Bacteriology, pp. 217-289 (8th ed., 1974) (The Sinorhizobium; and 4) Methylococcaceae family bacteria of Williams & Wilkins Co., Baltimore, Md., USA)(hereinafter the genera Methylobacter, Methylocaldum, Methylomicro “Bergey (1974)). Table 4 presents these families and genera bium, Methylosarcina, and Methylosphaera. of organisms. In another embodiment, the host cell is selected from “Gram-negative Proteobacteria Subgroup 2.” “Gram-nega TABLE 4 tive Proteobacteria Subgroup 2 is defined as the group of 40 Proteobacteria of the following genera (with the total num Families and Genera Listed in the Part, bers of catalog-listed, publicly-available, deposited Strains “Gram-Negative Aerobic Rods and Cocci' (in Bergey (1974)) thereof indicated in parenthesis, all deposited at ATCC, Family I. Pseudomomonaceae Giuconobacter except as otherwise indicated): Acidomonas (2); Acetobacter Pseudomonas (93); Gluconobacter (37); Brevundimonas (23); Beverinckia Xanthomonas 45 (13); Derxia (2); Brucella (4); Agrobacterium (79); Chelato Zoogloea Family II. Azotobacteraceae Azomonas bacter (2); Ensifer (3); Rhizobium (144); Sinorhizobium (24); Azotobacter Blastomonas (1): Sphingomonas (27); Alcaligenes (88); Bor Beijerinckia detella (43); Burkholderia (73); Ralstonia (33); Acidovorax Dexia (20); Hydrogenophaga (9); Zoogloea (9); Methylobacter (2): Family III. Rhizobiaceae Agrobacterium 50 Rhizobium Methylocaldum (1 at NCIMB); Methylococcus (2); Methylo Family IV. Methylomonadaceae Methylococcus microbium (2); Methylomonas (9); Methylosarcina (1): Methylomonas Methylosphaera, Azomonas (9): Azorhizophilus (5): Azoto Family V. Halobacteriaceae Haiobacterium bacter (64); Cellvibrio (3); Oligella (5); Pseudomonas Haiococcus Other Genera Acetobacter (1139); Francisella (4); Xanthomonas (229); Stenotroph Alcaligenes 55 Omonas (50); and Oceanimonas (4). Bordeteia Exemplary host cell species of “Gram-negative Proteobac Bruceiia teria Subgroup 2 include, but are not limited to the following Franciselia bacteria (with the ATCC or other deposit numbers of exem Thermits plary strain(s) thereof shown in parenthesis): Acidomonas 60 methanolica (ATCC 43581); Acetobacter aceti (ATCC “Gram-negative Proteobacteria Subgroup 1 also includes 15973); Gluconobacter oxydans (ATCC 19357); Brevundi Proteobacteria that would be classified in this heading monas diminuta (ATCC 11568); Beijerinckia indica (ATCC according to the criteria used in the classification. The head 9039 and ATCC 19361); Dervia gummosa (ATCC 15994): ing also includes groups that were previously classified in this Brucella melitensis (ATCC 23456), Brucella abortus (ATCC section but are no longer, Such as the genera Acidovorax, 65 23448); Agrobacterium tumefaciens (ATCC 23308), Agro Brevundimonas, Burkholderia, Hydrogenophaga, Oceani bacterium radiobacter (ATCC 19358), Agrobacterium rhizo monas, Ralstonia, and Stenotrophomonas, the genus Sphin genes (ATCC 11325); Chelatobacter heintzii (ATCC 29600); US 7,618,799 B2 23 24 Ensifer adhaerens (ATCC 33212); Rhizobium leguminosa The host cell can be selected from “Gram-negative Proteo rum (ATCC 10004); Sinorhizobiumfredii (ATCC 35423); bacteria Subgroup 8.” “Gram-negative Proteobacteria Sub Blastomonas natatoria (ATCC 35951); Sphingomonas group 8' is defined as the group of Proteobacteria of the paucimobilis (ATCC 29837); Alcaligenes faecalis (ATCC following genera: Brevundimonas, Blastomonas, Sphin 8750); Bordetella pertussis (ATCC 9797); Burkholderia gomonas, Burkholderia, Ralstonia, Acidovorax, Hydro cepacia (ATCC 25416); Ralstonia pickettii (ATCC 27511); genophaga, Pseudomonas, Stenotrophomonas, Xanthomo Acidovorax facilis (ATCC 11228); Hydrogenophagafiava nas; and Oceanimonas. (ATCC 33.667); Zoogloea ramigera (ATCC 19544); Methy The host cell can be selected from “Gram-negative Proteo lobacter luteus (ATCC 49878); Methylocaldum gracile bacteria Subgroup 9.” “Gram-negative Proteobacteria Sub (NCIMB 11912); Methylococcus capsulatus (ATCC 19069); 10 group 9” is defined as the group of Proteobacteria of the Methylomicrobium agile (ATCC 35068); Methylomonas following genera: Brevundimonas, Burkholderia, Ralstonia, methanica (ATCC 35067); Methylosarcina fibrata (ATCC Acidovorax, Hydrogenophaga, Pseudomonas, Stenotroph 700909); Methylosphaera hansonii (ACAM 549): Azomonas Omonas; and Oceanimonas. agilis (ATCC 7494): Azorhizophilus paspali (ATCC 23833); The host cell can be selected from “Gram-negative Proteo Azotobacter chroococcum (ATCC 9043); Cellvibrio mixtus 15 bacteria Subgroup 10.” “Gram-negative Proteobacteria Sub (UQM2601); Oligella urethralis (ATCC 17960); Pseudomo group 10' is defined as the group of Proteobacteria of the nas aeruginosa (ATCC 10145), Pseudomonas fluorescens following genera: Burkholderia, Ralstonia, Pseudomonas, (ATCC 35858): Francisella tularensis (ATCC 6223): Stenotrophomonas; and Xanthomonas. Stenotrophomonas maltophilia (ATCC 13637); Xanthomo The host cell can be selected from “Gram-negative Proteo nas campestris (ATCC 33913); and Oceanimonas doudorofli bacteria Subgroup 11.” “Gram-negative Proteobacteria Sub (ATCC 27123). group 11 is defined as the group of Proteobacteria of the In another embodiment, the host cell is selected from genera: Pseudomonas, Stenotrophomonas; and Xanthomo “Gram-negative Proteobacteria Subgroup 3.” “Gram-nega nas. The host cell can be selected from “Gram-negative Pro tive Proteobacteria Subgroup 3 is defined as the group of teobacteria Subgroup 12.” “Gram-negative Proteobacteria Proteobacteria of the following genera: Brevundimonas, 25 Subgroup 12 is defined as the group of Proteobacteria of the Agrobacterium, Rhizobium, Sinorhizobium, Blastomonas, following genera: Burkholderia, Ralstonia, Pseudomonas. Sphingomonas, Alcaligenes, Burkholderia, Ralstonia, Aci The host cell can be selected from “Gram-negative Proteo dovorax, Hydrogenophaga, Methylobacter, Methylocal bacteria Subgroup 13.” “Gram-negative Proteobacteria Sub dum, Methylococcus, Methylomicrobium, Methylomonas, group 13 is defined as the group of Proteobacteria of the Methylosarcina, Methylosphaera, Azomonas, Azorhizophi 30 following genera: Burkholderia, Ralstonia, Pseudomonas; lus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Tere and Xanthomonas. The host cell can be selected from "Gram dinibacter, Francisella, Stenotrophomonas, Xanthomonas; negative Proteobacteria Subgroup 14.” “Gram-negative Pro and Oceanimonas. teobacteria Subgroup 14' is defined as the group of Proteo In another embodiment, the host cell is selected from bacteria of the following genera: Pseudomonas and “Gram-negative Proteobacteria Subgroup 4.” “Gram-nega 35 Xanthomonas. The host cell can be selected from "Gram tive Proteobacteria Subgroup 4” is defined as the group of negative Proteobacteria Subgroup 15.” “Gram-negative Pro Proteobacteria of the following genera: Brevundimonas, teobacteria Subgroup 15” is defined as the group of Proteo Blastomonas, Sphingomonas, Burkholderia, Ralstonia, Aci bacteria of the genus Pseudomonas. dovorax, Hydrogenophaga, Methylobacter, Methylocal The host cell can be selected from “Gram-negative Proteo dum, Methylococcus, Methylomicrobium, Methylomonas, 40 bacteria Subgroup 16.” “Gram-negative Proteobacteria Sub Methylosarcina, Methylosphaera, Azomonas, Azorhizophi group 16' is defined as the group of Proteobacteria of the lus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Tere following Pseudomonas species (with the ATCC or other dinibacter, Francisella, Stenotrophomonas, Xanthomonas; deposit numbers of exemplary strain(s) shown in parenthe and Oceanimonas. sis): Pseudomonas abietaniphila (ATCC 700689); In another embodiment, the host cell is selected from 45 Pseudomonas aeruginosa (ATCC 101.45); Pseudomonas “Gram-negative Proteobacteria Subgroup 5.” “Gram-nega alcaligenes (ATCC 14909); Pseudomonas anguilliseptica tive Proteobacteria Subgroup 5’ is defined as the group of (ATCC 33660); Pseudomonas citronellolis (ATCC 13674): Proteobacteria of the following genera: Methylobacter, Pseudomonas flavescens (ATCC 51555); Pseudomonas men Methylocaldum, Methylococcus, Methylomicrobium, docina (ATCC 25411); Pseudomonas nitroreducens (ATCC Methylomonas, Methylosarcina, Methylosphaera, Azomo 50 33634); Pseudomonas oleovorans (ATCC 8062); Pseudomo nas, Azorhizophilus, Azotobacter, Cellvibrio, Oligella, nas pseudoalcaligenes (ATCC 17440); Pseudomonas resino Pseudomonas, Teredinibacter, Francisella, Stenotrophomo vorans (ATCC 14235); Pseudomonas straminea (ATCC nas, Xanthomonas; and Oceanimonas. 33636); Pseudomonas agarici (ATCC 25941); Pseudomonas The host cell can be selected from “Gram-negative Proteo alcaliphila, Pseudomonas alginovora, Pseudomonas ander bacteria Subgroup 6.” “Gram-negative Proteobacteria Sub 55 sonii. Pseudomonas aspleni (ATCC 23835); Pseudomonas group 6' is defined as the group of Proteobacteria of the azelaica (ATCC 27162): Pseudomonas beverinckii (ATCC following genera: Brevundimonas, Blastomonas, Sphin 19372); Pseudomonas borealis, Pseudomonas boreopolis gomonas, Burkholderia, Ralstonia, Acidovorax, Hydro (ATCC 33662); Pseudomonas brassicacearum, Pseudomo genophaga, Azomonas, Azorhizophilus, Azotobacter, nas butanovora (ATCC 43655); Pseudomonas cellulosa Cellvibrio, Oligella, Pseudomonas, Teredinibacter, 60 (ATCC 55703); Pseudomonas aurantiaca (ATCC 33663); Stenotrophomonas, Xanthomonas; and Oceanimonas. Pseudomonas chlororaphis (ATCC 9446, ATCC 13985, The host cell can be selected from “Gram-negative Proteo ATCC 17418, ATCC 17461); Pseudomonas fragi (ATCC bacteria Subgroup 7..” “Gram-negative Proteobacteria Sub 4973); Pseudomonas lundensis (ATCC 49968); Pseudomo group 7” is defined as the group of Proteobacteria of the nas taetrolens (ATCC 4683); Pseudomonascissicola (ATCC following genera: Azomonas, Azorhizophilus, Azotobacter, 65 33616); Pseudomonas coronafaciens, Pseudomonas diter Cellvibrio, Oligella, Pseudomonas, Teredinibacter, peniphila, Pseudomonas elongata (ATCC 10144); Stenotrophomonas, Xanthomonas; and Oceanimonas. Pseudomonasflectens (ATCC 12775); Pseudomonas azoto US 7,618,799 B2 25 26 formans, Pseudomonas brenneri. Pseudomonas cedrella, diluted and inoculated into fresh rich or minimal medium in Pseudomonas corrugata (ATCC 29736); Pseudomonas either a shake flask or a fermentor and grown at 37° C. extremorientalis, Pseudomonas fluorescens (ATCC 35858); A host can also be of mammalian origin, Such as a cell Pseudomonas gessardii. Pseudomonas libanensis, derived from a mammal including any human or non-human Pseudomonas mandelii (ATCC 700871); Pseudomonas mar mammal. Mammals can include, but are not limited to pri ginalis (ATCC 10844); Pseudomonas migulae, Pseudomo mates, monkeys, porcine, ovine, bovine, rodents, ungulates, nas mucidolens (ATCC 4685); Pseudomonas orientalis, pigs, Swine, sheep, lambs, goats, cattle, deer, mules, horses, Pseudomonas rhodesiae, Pseudomonas synxantha (ATCC monkeys, apes, dogs, cats, rats, and mice. 9890); Pseudomonas tolaasii (ATCC 33618); Pseudomonas A host cell may also be of plant origin. Any plant can be 10 selected for the identification of genes and regulatory veronii (ATCC 700474); Pseudomonas federiksbergensis, sequences. Examples of suitable plant targets for the isolation Pseudomonas geniculata (ATCC 19374); Pseudomonas gin of genes and regulatory sequences would include but are not geri. Pseudomonas graminis, Pseudomonas grimontii, limited to alfalfa, apple, apricot, Arabidopsis, artichoke, aru Pseudomonas halodenitrificans, Pseudomonas halophila, gula, asparagus, avocado, banana, barley, beans, beet, black Pseudomonas hibiscicola (ATCC 19867); Pseudomonas hut 15 berry, blueberry, broccoli, brussels sprouts, cabbage, canola, tiensis (ATCC 14670); Pseudomonas hydrogenovora, cantaloupe, carrot, cassaya, castorbean, cauliflower, celery, Pseudomonas jessenii (ATCC 700870); Pseudomonas kilon cherry, chicory, cilantro, citrus, clementines, clover, coconut, ensis, Pseudomonas lanceolata (ATCC 14669); Pseudomo coffee, corn, cotton, cranberry, cucumber, Douglas fir, egg nas lini. Pseudomonas marginata (ATCC 25417); plant, endive, escarole, eucalyptus, fennel, figs, garlic, gourd, Pseudomonas mephitica (ATCC 33665); Pseudomonas deni grape, grapefruit, honey dew, jicama, kiwifruit, lettuce, leeks, trificans (ATCC 19244); Pseudomonas pertucinogena lemon, lime, Loblolly pine, linseed, mango, melon, mush (ATCC 190); Pseudomonas pictorum (ATCC 23328); room, nectarine, nut, oat, oil palm, oil seed rape, okra, olive, Pseudomonas psychrophila, Pseudomonasfilva (ATCC onion, orange, an ornamental plant, palm, papaya, parsley, 31418); Pseudomonas monteilii (ATCC 700476); Pseudomo parsnip, pea, peach, peanut, pear, pepper, persimmon, pine, nas mosselii, Pseudomonas oryzihabitans (ATCC 43272): 25 pineapple, plantain, plum, pomegranate, poplar, potato, Pseudomonas plecoglossicida (ATCC 700383); Pseudomo pumpkin, quince, radiata pine, radiscehio, radish, rapeseed, nas putida (ATCC 12633); Pseudomonas reactans, raspberry, rice, rye, Sorghum, Southern pine, soybean, spin Pseudomonas spinosa (ATCC 14606); Pseudomonas bale ach, squash, Strawberry, Sugarbeet, Sugarcane, Sunflower, arica, Pseudomonas luteola (ATCC 43273); Pseudomonas Stutzeri (ATCC 17588); (ATCC Sweet potato, Sweetgum, tangerine, tea, tobacco, tomato, triti 33614); Pseudomonas avellanae (ATCC 700331); 30 cale, turf, turnip, a vine, watermelon, wheat, yams, and Zuc Pseudomonas caricapapayae (ATCC 33615); Pseudomonas chini. In some embodiments, plants useful in the method are cichorii (ATCC 10857); Pseudomonas ficuserectae (ATCC Arabidopsis, corn, wheat, soybean, and cotton. 35104); Pseudomonas fiscovaginae, Pseudomonas meliae III. Methods (ATCC 33050); Pseudomonas syringae (ATCC 19310): 35 The methods of the invention provide the expression of Pseudomonas viridiflava (ATCC 13223); Pseudomonas ther fusion proteins comprising a secretion signal polypeptide mocarboxydovorans (ATCC 35961); Pseudomonas thermo selected from a pbp, dsbA, dsbC, Bce, Cup A2, CupB2, tolerans, Pseudomonas thivervalensis, Pseudomonas van CupC2, NikA, FlgI, ORF5550, Ttg2C, or ORF812 secretion couverensis (ATCC 700688); Pseudomonas wisconsinensis; signal. In one embodiment, the method includes a host cell and Pseudomonas Xiamenensis. 40 expressing a protein of interest linked to a secretion signal of The host cell can be selected from “Gram-negative Proteo the invention. The methods include providing a host cell, bacteria Subgroup 17.” “Gram-negative Proteobacteria Sub preferably a P fluorescens host cell, comprising a vector group 17 is defined as the group of Proteobacteria known in encoding a recombinant protein comprising the protein or the art as the “fluorescent Pseudomonads’ including those polypeptide of interest operably linked to a secretion signal belonging, e.g., to the following Pseudomonas species: 45 sequence disclosed herein, and growing the cell under con Pseudomonas azotoformans, Pseudomonas brenneri; ditions that result in expression of the protein or polypeptide. Pseudomonas cedrella, Pseudomonas COrrugata, Alternatively, the method of expressing proteins or polypep Pseudomonas extremorientalis, Pseudomonasfluorescens, tides using the identified secretion signals can be used in any Pseudomonas gessardii. Pseudomonas libanensis, given host system, including host cells of either eukaryotic or Pseudomonas mandelii, Pseudomonas marginalis, 50 prokaryotic origin. The vector can have any of the character Pseudomonas migulae, Pseudomonas mucidolens, istics described above. In one embodiment, the vector com Pseudomonas Orientalis, Pseudomonas rhodesiae, prises a nucleotide sequence encoding the secretion signal Pseudomonas synxantha, Pseudomonas tolaasii; and polypeptides disclosed hereinas SEQID NO:2, 4, 6, 8, 10, 12, Pseudomonas veronii. 14, 16, 18, 20, 22, or 24, or variants and fragments thereof. In Other suitable hosts include those classified in other parts 55 another embodiment, the vector comprises a nucleotide of the reference, such as Gram (+) Proteobacteria. In one sequence comprising SEQID NO:1, 3, 5, 7, 9, 11, 13, 15, 17. embodiment, the host cell is an E. coli. The genome sequence 19, 20, 21, or 23. for E. coli has been established for E. coli MG1655 (Blattner, In another embodiment, the host cell has a periplasm and et al. (1997) The complete genome sequence of Escherichia expression of the secretion signal polypeptide results in the coli K-12, Science 277(5331): 1453-74) and DNA microar 60 targeting of Substantially all of the protein or polypeptide of rays are available commercially for E. coli K12 (MWG Inc. interest to the periplasm of the cell. It is recognized that a High Point, N.C.). E. coli can be cultured in either a rich Small fraction of the protein expressed in the periplasm may medium such as Luria-Bertani (LB)(10 g/L tryptone, 5 g/L actually leakthrough the cell membrane into the extracellular NaCl, 5 g/L yeast extract) or a defined minimal medium such space; however, the majority of the targeted polypeptide as M9 (6g/L NaHPO3 g/L KHPO 1 g/L NHCl, 0.5g/L 65 would remain within the periplasmic space. NaCl, pH 7.4) with an appropriate carbon source such as 1% The expression may further lead to production of extracel glucose. Routinely, an over night culture of E. coli cells is lular protein. The method may also include the step of puri US 7,618,799 B2 27 28 fying the protein or polypeptide of interest from the periplasm g/L, about 20 g/L, at least about 25 g/L, or greater. In some or from extracellular media. The Secretion signal can be embodiments, the amount of periplasmic protein produced is expressed in a manner in which it is linked to the protein and at least about 5%, about 10%, about 15%, about 20%, about the signal-linked protein can be purified from the cell. There 25%, about 30%, about 40%, about 50%, about 60%, about fore, in one embodiment, this isolated polypeptide is a fusion 5 70%, about 80%, about 90%, about 95%, about 96%, about protein of the secretion signal and a protein or polypeptide of 97%, about 98%, about 99%, or more of total protein or interest. However, the secretion signal can also be cleaved polypeptide of interest produced. from the protein when the protein is targeted to the periplasm. In one embodiment, the method produces at least 0.1 g/L In one embodiment, the linkage between the secretion signal correctly processed protein. A correctly processed protein has and the protein or polypeptide is modified to increase cleav 10 an amino terminus of the native protein. In some embodi age of the secretion signal. ments, at least 50% of the protein or polypeptide of interest The methods of the invention may also lead to increased comprises a native amino terminus. In another embodiment, production of the protein or polypeptide of interest within the at least 60%, at least 70%, at least 80%, at least 90%, or more host cell. The increased production alternatively can be an of the protein has an amino terminus of the native protein. In increased level of properly processed protein or polypeptide 15 various embodiments, the method produces 0.1 to 10 g/L per gram of protein produced, or per gram of host protein. The correctly processed protein in the cell, including at least about increased production can also be an increased level of recov 0.2, about 0.3, about 0.4, about 0.5, about 0.6, about 0.7, about erable protein or polypeptide produced per gram of recombi 0.8, about 0.9 or at least about 1.0 g/L correctly processed nant or per gram of host cell protein. The increased produc protein. In another embodiment, the total correctly processed tion can also be any combination of an increased level of total protein or polypeptide of interest produced is at least 1.0 g/L. protein, increased level of properly processed protein, or at least about 2 g/L, at least about 3 g/L, about 4 g/L, about 5 increased level of active or soluble protein. In this embodi g/L, about 6 g/L, about 7 g/L, about 8 g/L, about 10 g/L, about ment, the term “increased' is relative to the level of protein or 15 g/L, about 20 g/L, about 25g/L, about 30 g/L, about 35 g/l. polypeptide that is produced, properly processed, soluble, about 40 g/l, about 45 g/1, at least about 50 g/L, or greater. In and/or recoverable when the protein or polypeptide of interest 25 Some embodiments, the amount of correctly processed pro is expressed in a cell without the secretion signal polypeptide tein produced is at least about 5%, about 10%, about 15%, of the invention. about 20%, about 25%, about 30%, about 40%, about 50%, An improved expression of a protein or polypeptide of about 60%, about 70%, about 80%, about 90%, about 95%, interest can also refer to an increase in the solubility of the about 96%, about 97%, about 98%, at least about 99%, or protein. The protein or polypeptide of interest can be pro 30 more of total recombinant protein in a correctly processed duced and recovered from the cytoplasm, periplasm or extra form. cellular medium of the host cell. The protein or polypeptide The methods of the invention can also lead to increased can be insoluble or soluble. The protein or polypeptide can yield of the protein or polypeptide of interest. In one embodi include one or more targeting sequences or sequences to ment, the method produces a protein or polypeptide of inter assist purification, as discussed Supra. 35 estas at least about 5%, at least about 10%, about 15%, about The term “soluble' as used herein means that the protein is 20%, about 25%, about 30%, about 40%, about 45%, about not precipitated by centrifugation at between approximately 50%, about 55%, about 60%, about 65%, about 70%, about 5,000 and 20,000x gravity when spun for 10-30 minutes in a 75%, or greater of total cell protein (tcp). “Percent total cell buffer under physiological conditions. Soluble proteins are protein’ is the amount of protein or polypeptide in the host not part of an inclusion body or other precipitated mass. 40 cell as a percentage of aggregate cellular protein. The deter Similarly, “insoluble'means that the protein or polypeptide mination of the percent total cell protein is well known in the can be precipitated by centrifugation at between 5,000 and art. 20,000xgravity when spun for 10-30 minutes in a buffer In a particular embodiment, the host cell can have a recom under physiological conditions. Insoluble proteins or binant polypeptide, polypeptide, protein, or fragment thereof polypeptides can be part of an inclusion body or other pre 45 expression level of at least 1% top and a cell density of at least cipitated mass. The term “inclusion body' is meant to include 40 g/L, when grown (i.e. within a temperature range of about any intracellular body contained within a cell wherein an 4°C. to about 55° C., including about 10° C., about 15° C., aggregate of proteins or polypeptides has been sequestered. about 20°C., about 25°C., about 30°C., about 35°C., about The methods of the invention can produce protein localized 40° C., about 45° C., and about 50° C.) in a mineral salts to the periplasm of the host cell. In one embodiment, the 50 medium. In a particularly preferred embodiment, the expres method produces properly processed proteins or polypep sion system will have a protein or polypeptide expression tides of interest in the cell. In another embodiment, the level of at least 5% top and a cell density of at least 40 g/L, expression of the secretion signal polypeptide may produce when grown (i.e. within a temperature range of about 4°C. to active proteins or polypeptides of interest in the cell. The about 55° C., inclusive) in a mineral salts medium at a fer method of the invention may also lead to an increased yield of 55 mentation scale of at least about 10 Liters. proteins or polypeptides of interest as compared to when the In practice, heterologous proteins targeted to the periplasm protein is expressed without the secretion signal of the inven are often found in the broth (see European Patent No. EPO tion. 288 451), possibly because of damage to oran increase in the In one embodiment, the method produces at least 0.1 g/L fluidity of the outer cell membrane. The rate of this “passive' protein in the periplasmic compartment. In another embodi 60 secretion may be increased by using a variety of mechanisms ment, the method produces 0.1 to 10 g/L periplasmic protein that permeabilize the outer cell membrane: colicin (Mikschet in the cell, or at least about 0.2, about 0.3, about 0.4, about 0.5, al. (1997) Arch. Microbiol. 167: 143-150); growth rate about 0.6, about 0.7, about 0.8, about 0.9 or at least about 1.0 (Shokri et al. (2002) App Miocrobiol Biotechnol g/L periplasmic protein. In one embodiment, the total protein 58:386-392); Toll II overexpression (Wan and Baneyx (1998) or polypeptide of interest produced is at least 1.0 g/L, at least 65 Protein Expression Purif: 14:13–22); bacteriocin release pro about 2 g/L, at least about 3 g/L, about 4 g/L, about 5 g/L. tein (Hsiung et al. (1989) Bio/Technology 7: 267-71), colicin about 6 g/L, about 7 g/L, about 8 g/L, about 10 g/L, about 15 A lysis protein (Lloubes et al. (1993) Biochimie 75: 451-8) US 7,618,799 B2 29 30 mutants that leak periplasmic proteins (Furlong and Sund protein or polypeptide of interest that allows for a compara strom (1989) Developments in Indus. Microbio. 30: 141-8); tive analysis to the native protein or polypeptide so long as fusion partners (Jeong and Lee (2002) Appl. Environ. Micro Such activity is assayable. Alternatively, the proteins or bio. 68: 4979-4985); recovery by osmotic shock (Taguchi et polypeptides produced in the present invention can be al. (1990) Biochimica Biophysica Acta 1049: 278-85). Trans 5 assayed for the ability to stimulate or inhibit interaction port of engineered proteins to the periplasmic space with between the protein or polypeptide and a molecule that nor Subsequent localization in the broth has been used to produce mally interacts with the protein or polypeptide, e.g. a Sub properly folded and active proteins in E. coli (Wan and strate or a component of the signal pathway that the native Baneyx (1998) Protein Expression Purif. 14: 13-22; Sim protein normally interacts. Such assays can typically include mons et al. (2002).J. Immun. Meth. 263: 133-147; Lundell et 10 the steps of combining the protein with a Substrate molecule al. (1990).J. Indust. Microbio. 5: 215-27). under conditions that allow the protein or polypeptide to A. Production of Active Protein interact with the target molecule, and detect the biochemical In some embodiments, the protein can also be produced in consequence of the interaction with the protein and the target an active form. The term “active” means the presence of molecule. biological activity, wherein the biological activity is compa 15 Assays that can be utilized to determine protein or rable or substantially corresponds to the biological activity of polypeptide activity are described, for example, in Ralph, P. a corresponding native protein or polypeptide. In the context J., et al. (1984).J. Immunol. 132:1858 or Saiki et al. (1981).J. of proteins this typically means that a polynucleotide or Immunol. 127:1044, Steward, W. E. II (1980) The Interferon polypeptide comprises a biological function or effect that has Systems. Springer-Verlag, Vienna and New York, Broxmeyer, at least about 20%, about 50%, preferably at least about H. E., et al. (1982) Blood 60:595, Molecular Cloning. A 60-80%, and most preferably at least about 90-95% activity Laboratory Manua', 2d ed., Cold Spring Harbor Laboratory compared to the corresponding native protein or polypeptide Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989, using standard parameters. The determination of protein or and Methods in Enzymology. Guide to Molecular Cloning polypeptide activity can be performed utilizing correspond Techniques, Academic Press, Berger, S. L. and A. R. Kimmel ing standard, targeted comparative biological assays for par 25 eds., 1987, AK Patra et al., Protein Expr Purif, 18(2): p.182 ticular proteins or polypeptides. One indication that a protein 92 (2000), Kodama et al., J. Biochem.99: 1465-1472 (1986); or polypeptide of interest maintains biological activity is that Stewart et al., Proc. Nat’l. Acad. Sci. USA 90: 5209-5213 the polypeptide is immunologically cross reactive with the (1993); (Lombillo et al., J. Cell Biol. 128:107-115 (1995); native polypeptide. (Vale et al., Cell 42:39-50 (1985). The invention can also improve recovery of active protein 30 or polypeptide of interest. Active proteins can have a specific B. Cell Growth Conditions activity of at least about 20%, at least about 30%, at least The cell growth conditions for the host cells described about 40%, about 50%, about 60%, at least about 70%, about herein can include that which facilitates expression of the 80%, about 90%, or at least about 95% that of the native protein of interest, and/or that which facilitates fermentation protein or polypeptide that the sequence is derived from. 35 of the expressed protein of interest. As used herein, the term Further, the Substrate specificity (k/K) is optionally Sub "fermentation' includes both embodiments in which literal stantially similar to the native protein or polypeptide. Typi fermentation is employed and embodiments in which other, cally, k/K, will be at least about 30%, about 40%, about non-fermentative culture modes are employed. Fermentation 50%, about 60%, about 70%, about 80%, at least about 90%, may be performed at any scale. In one embodiment, the at least about 95%, or greater. Methods of assaying and quan 40 fermentation medium may be selected from among rich tifying measures of protein and polypeptide activity and Sub media, minimal media, and mineral salts media; a rich strate specificity (k/K), are well known to those of skill in medium may be used, but is preferably avoided. In another the art. embodiment either a minimal medium or a mineral salts The activity of the protein or polypeptide of interest can be medium is selected. In still another embodiment, a minimal also compared with a previously established native protein or 45 medium is selected. In yet another embodiment, a mineral polypeptide standard activity. Alternatively, the activity of the salts medium is selected. Mineral salts media are particularly protein or polypeptide of interest can be determined in a preferred. simultaneous, or Substantially simultaneous, comparative Mineral salts media consists of mineral salts and a carbon assay with the native protein or polypeptide. For example, in Source Such as, e.g., glucose, Sucrose, or glycerol. Examples vitro assays can be used to determine any detectable interac 50 of mineral salts media include, e.g., M9 medium, Pseudomo tion between a protein or polypeptide of interest and a target, nas medium (ATCC 179), Davis and Mingioli medium (see, e.g. between an expressed enzyme and Substrate, between BD Davis & E S Mingioli (1950) in J. Bact. 60:17-28). The expressed hormone and hormone receptor, between mineral salts used to make mineral salts media include those expressed antibody and antigen, etc. Such detection can selected from among, e.g., potassium phosphates, ammo include the measurement of calorimetric changes, prolifera 55 nium Sulfate or chloride, magnesium Sulfate or chloride, and tion changes, cell death, cell repelling, changes in radioactiv trace minerals such as calcium chloride, borate, and Sulfates ity, changes in Solubility, changes in molecular weight as of iron, copper, manganese, and zinc. No organic nitrogen measured by gel electrophoresis and/or gel exclusion meth Source. Such as peptone, tryptone, amino acids, or a yeast ods, phosphorylation abilities, antibody specificity assays extract, is included in a mineral salts medium. Instead, an Such as ELISA assays, etc. In addition, in vivo assays include, 60 inorganic nitrogen source is used and this may be selected but are not limited to, assays to detect physiological effects of from among, e.g., ammonium salts, aqueous ammonia, and the Pseudomonas produced protein or polypeptide in com gaseous ammonia. A preferred mineral salts medium will parison to physiological effects of the native protein or contain glucose as the carbon Source. In comparison to min polypeptide, e.g. weight gain, change in electrolyte balance, eral salts media, minimal media can also contain mineral salts change in blood clotting time, changes in clot dissolution and 65 and a carbon Source, but can be supplemented with, e.g., low the induction of antigenic response. Generally, any in vitro or levels of amino acids, vitamins, peptones, or other ingredi in vivo assay can be used to determine the active nature of the ents, though these are added at very minimal levels. US 7,618,799 B2 31 32 In one embodiment, media can be prepared using the com Liters, 50 Liters, 75 Liters, 100 Liters, 200 Liters, 500 Liters, ponents listed in Table 5 below. The components can be added 1,000 Liters, 2,000 Liters, 5,000 Liters, 10,000 Liters or in the following order: first (NH4)HPO, KHPO and citric 50,000 Liters. acid can be dissolved in approximately 30 liters of distilled In the present invention, growth, culturing, and/or fermen water; then a solution of trace elements can be added, fol 5 tation of the transformed host cells is performed within a lowed by the addition of an antifoam agent, such as Ucolub N temperature range permitting Survival of the host cells, pref 115. Then, after heat sterilization (such as at approximately erably a temperature within the range of about 4°C. to about 121°C.), sterile solutions of glucose MgSO and thiamine 55°C., inclusive. Thus, e.g., the terms “growth' (and “grow.” HCL can be added. Control of pH at approximately 6.8 can be “growing”), “culturing (and “culture'), and “fermentation' 10 (and “ferment,” “fermenting'), as used herein in regard to the achieved using aqueous ammonia. Sterile distilled water can host cells of the present invention, inherently means then be added to adjust the initial volume to 371 minus the “growth.” “culturing and “fermentation.” within a tempera glycerol stock (123 mL). The chemicals are commercially ture range of about 4°C. to about 55° C., inclusive. In addi available from various suppliers, such as Merck. This media tion, “growth' is used to indicate both biological states of can allow for a high cell density cultivation (HCDC) for 15 active cell division and/or enlargement, as well as biological growth of Pseudomonas species and related bacteria. The states in which a non-dividing and/or non-enlarging cell is HCDC can start as a batch process which is followed by a being metabolically sustained, the latter use of the term two-phase fed-batch cultivation. After unlimited growth in “growth' being synonymous with the term “maintenance.” the batch part, growth can be controlled at a reduced specific An additional advantage in using Pseudomonas fluore growth rate over a period of 3 doubling times in which the scens in expressing secreted proteins includes the ability of biomass concentration can increased several fold. Further Pseudomonas fluorescens to be grown in high cell densities details of such cultivation procedures is described by Riesen compared to some other bacterial expression systems. To this berg, D.; Schulz, V. Knorre, W. A.; Pohl, H. D.: Korz, D.: end, Pseudomonas fluorescens expressions systems accord Sanders, E. A.; Ross, A.; Deckwer, W. D. (1991) “High cell ing to the present invention can provide a cell density of about density cultivation of Escherichia coli, at controlled specific 25 20 g/L or more. The Pseudomonasfluorescens expressions growth rate” J. Biotechnol. 2001) 17-27. systems according to the present invention can likewise pro vide a cell density of at least about 70 g/L, as stated in terms TABLE 5 of biomass per Volume, the biomass being measured as dry cell weight. Medium composition 30 In one embodiment, the cell density will be at least about Initial 20 g/L. In another embodiment, the cell density will be at least Component concentration about 25g/L, about 30g/L, about 35 g/L, about 40 g/L, about 45g/L, about 50 g/L, about 60 g/L, about 70 g/L, about 80 KH2PO 13.3 g (NH4)HPO. 4.0 g g/L, about 90 g/L., about 100 g/L, about 110 g/L, about 120 Citric Acid 1.7 g l' 35 g/L, about 130 g/L, about 140 g/L, about or at least about 150 MgSO 7H2O 1.2 g I' g/L. Trace metal solution 10 ml || Thiamin HCI 4.5 mg l' In another embodiments, the cell density at induction will Glucose-H2O 27.3 g || be between about 20 g/L and about 150 g/L.; between about 20 Antifoam Ucolub N115 0.1 ml | g/L and about 120 g/L: about 20 g/L and about 80 g/L: about Feeding solution 40 25 g/L and about 80 g/L: about 30g/L and about 80 g/L, about MgSO 7H2O 19.7 g || 35 g/L and about 80 g/L: about 40 g/L and about 80 g/L, about Glucose-H2O 770 g I 45 g/L and about 80 g/L: about 50 g/L and about 80 g/L, about NH 23 g 50 g/L and about 75 g/L; about 50 g/L and about 70 g/L, about Trace metal solution 40 g/L and about 80 g/L. 45 6 gl' Fe(III) citrate 1.5 gl' MnCl2 4H2O C. Isolation of Protein or Polypeptide of Interest 0.8g | ZmCHCOOI 2HO 0.3 g || To measure the yield, Solubility, conformation, and/or HBO activity of the protein of interest, it may be desirable to isolate 0.25 g I Na2MoC—2H2O 0.25 g I CoCl2 the protein from the host cell and/or extracellular medium. 6H2O 0.15g | CuCl2HO 0.84 g lethylene The isolation may be a crude, semi-crude, or pure isolation, Dinitrilo-tetracetic acid Nasah 2H2O 50 depending on the requirements of the assay used to make the (Tritriplex III, Merck) appropriate measurements. The protein may be produced in the cytoplasm, targeted to the periplasm, or may be secreted into the culture or fermentation media. To release targeted The expression system according to the present invention proteins from the periplasm, treatments involving chemicals can be cultured in any fermentation format. For example, 55 such as chloroform (Ames et al. (1984) J. Bacteriol., 160: batch, fed-batch, semi-continuous, and continuous fermenta 1181-1 183), guanidine-HCl, and Triton X-100 (Naglak and tion modes may be employed herein. Wherein the protein is Wang (1990) Enzyme Microb. Technol., 12: 603-611) have excreted into the extracellular medium, continuous fermen been used. However, these chemicals are not inert and may tation is preferred. have detrimental effects on many recombinant protein prod The expression systems according to the present invention 60 ucts or Subsequent purification procedures. Glycine treatment are useful for transgene expression at any scale (i.e. Volume) of E. coli cells, causing permeabilization of the outer mem of fermentation. Thus, e.g., microliter-scale, centiliter scale, brane, has also been reported to release the periplasmic con and deciliter scale fermentation Volumes may be used; and 1 tents (Ariga et al. (1989) J. Ferm. Bioeng. 68: 243-246). The Liter scale and larger fermentation Volumes can be used. In most widely used methods of periplasmic release of recom one embodiment, the fermentation volume will beat or above 65 binant protein are osmotic shock (Nosal and Heppel (1966).J. 1 Liter. In another embodiment, the fermentation volume will Biol. Chem., 241:3055-3062; Neu and Heppel (1965).J. Biol. be at or above 5 Liters, 10 Liters, 15 Liters, 20 Liters, 25 Chem., 240:3685-3692), hen eggwhite (HEW)-lysozyme? US 7,618,799 B2 33 34 ethylenediamine tetraacetic acid (EDTA) treatment (Neu and permissive conditions to express the gene product intracellu Heppel (1964).J. Biol. Chem., 239:3893-3900; Witholtet al. larly, and externalizing the product by raising the temperature (1976) Biochim. Biophys. Acta, 443: 534-544: Pierce et al. to induce phage-encoded functions. Asami et al. (1997) J. (1995) ICheme Research. Event, 2:995-997), and combined Ferment. and Bioeng., 83: 511-516 discloses synchronized HEW-lysozyme?osmotic shock treatment (French et al. 5 disruption of E. coli cells by T4 phage infection, and Tanji et (1996) Enzyme and Microb Tech., 19:332-338). The French al. (1998).J. Ferment. and Bioeng., 85: 74-78 discloses con method involves resuspension of the cells in a fractionation trolled expression of lysis genes encoded in T4 phage for the buffer followed by recovery of the periplasmic fraction, gentle disruption of E. coli cells. where osmotic shock immediately follows lysozyme treat Upon cell lysis, genomic DNA leaks out of the cytoplasm ment. 10 into the medium and results in significant increase in fluid Typically, these procedures include an initial disruption in Viscosity that can impede the sedimentation of Solids in a osmotically-stabilizing medium followed by selective release centrifugal field. In the absence of shear forces such as those in non-stabilizing medium. The composition of these media exerted during mechanical disruption to breakdown the DNA (pH, protective agent) and the disruption methods used (chlo polymers, the slower sedimentation rate of Solids through roform, HEW-lysozyme, EDTA, sonication) vary among spe 15 Viscous fluid results in poor separation of Solids and liquid cific procedures reported. A variation on the HEW-lysozyme/ during centrifugation. Other than mechanical shear force, EDTA treatment using a dipolar ionic detergent in place of there exist nucleolytic enzymes that degrade DNA polymer. EDTA is discussed by Stabel et al. (1994) Veterinary Micro In E. coli, the endogenous gene endA encodes for an endo biol., 38: 307-314. For a general review of use of intracellular nuclease (molecular weight of the mature protein is approx. lytic enzyme systems to disrupt E. coli, see Dabora and 24.5 kD) that is normally secreted to the periplasm and Cooney (1990) in Advances in Biochemical Engineering/ cleaves DNA into oligodeoxyribonucleotides in an endo Biotechnology, Vol. 43, A. Fiechter, ed. (Springer-Verlag: nucleolytic manner. It has been Suggested that endA is rela Berlin), pp. 11-30. tively weakly expressed by E. coli (Wackemagel et al. (1995) Conventional methods for the recovery of proteins or Gene 154: 55-59). polypeptides of interest from the cytoplasm, as soluble pro 25 In one embodiment, no additional disulfide-bond-promot tein or refractile particles, involved disintegration of the bac ing conditions or agents are required in order to recover terial cell by mechanical breakage. Mechanical disruption disulfide-bond-containing identified polypeptide in active, typically involves the generation of local cavitation in a liquid soluble form from the host cell. In one embodiment, the Suspension, rapid agitation with rigid beads, Sonication, or transgenic polypeptide, polypeptide, protein, or fragment grinding of cell Suspension (Bacterial Cell Surface Tech 30 thereofhas a folded intramolecular conformation in its active niques, Hancock and Poxton (John Wiley & Sons Ltd, 1988), state. In one embodiment, the transgenic polypeptide, Chapter 3, p. 55). polypeptide, protein, or fragment contains at least one HEW-lysozyme acts biochemically to hydrolyze the pep intramolecular disulfide bond in its active state; and perhaps tidoglycan backbone of the cell wall. The method was first up to 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 or more disulfide developed by Zinder and Arndt (1956) Proc. Natl. Acad. Sci. 35 bonds. USA, 42: 586-590, who treated E. coli with egg albumin The proteins of this invention may be isolated and purified (which contains HEW-lysozyme) to produce rounded cellular to Substantial purity by standard techniques well known in the spheres later known as spheroplasts. These structures art, including, but not limited to, ammonium sulfate or etha retained some cell-wall components but had large Surface nol precipitation, acid extraction, anion or cation exchange areas in which the cytoplasmic membrane was exposed. U.S. 40 chromatography, phosphocellulose chromatography, hydro Pat. No. 5,169,772 discloses a method for purifying hepari phobic interaction chromatography, affinity chromatography, nase from bacteria comprising disrupting the envelope of the nickel chromatography, hydroxylapatite chromatography, bacteria in an osmotically-stabilized medium, e.g., 20% reverse phase chromatography, lectin chromatography, pre Sucrose solution using, e.g., EDTA, lysozyme, or an organic parative electrophoresis, detergent solubilization, selective compound, releasing the non-heparinase-like proteins from 45 precipitation with Such substances as column chromatogra the periplasmic space of the disrupted bacteria by exposing phy, immunopurification methods, and others. For example, the bacteria to a low-ionic-strength buffer, and releasing the proteins having established molecular adhesion properties heparinase-like proteins by exposing the low-ionic-strength can be reversibly fused with a ligand. With the appropriate washed bacteria to a buffered salt solution. ligand, the protein can be selectively adsorbed to a purifica Many different modifications of these methods have been 50 tion column and then freed from the column in a relatively used on a wide range of expression systems with varying pure form. The fused protein is then removed by enzymatic degrees of success (Joseph-LiaZun et al. (1990) Gene, 86: activity. In addition, protein can be purified using immunoaf 291-295; Carter et al. (1992) Bio/Technology, 10: 163-167). finity columns or Ni-NTA columns. General techniques are Efforts to induce recombinant cell culture to produce further described in, for example, R. Scopes, Protein Purifi lysozyme have been reported. EPO 155 189 discloses a means 55 cation: Principles and Practice, Springer-Verlag. N.Y. (1982); for inducing a recombinant cell culture to produce lysozymes, Deutscher, Guide to Protein Purification, Academic Press which would ordinarily be expected to kill such host cells by (1990); U.S. Pat. No. 4,511,503; S. Roe, Protein Purification means of destroying or lysing the cell wall structure. Techniques. A Practical Approach (Practical Approach U.S. Pat. No. 4,595,658 discloses a method for facilitating Series), Oxford Press (2001); D. Bollag, et al., Protein Meth externalization of proteins transported to the periplasmic 60 ods, Wiley-Lisa, Inc. (1996); AK Patra et al., Protein Expr space of E. coli. This method allows selective isolation of Purif, 18(2): p.182-92 (2000); and R. Mukhija, et al., Gene proteins that locate in the periplasm without the need for 165(2): p. 303-6 (1995). See also, for example, Ausubel, et al. lysozyme treatment, mechanical grinding, or osmotic shock (1987 and periodic supplements): Deutscher (1990) “Guide treatment of cells. U.S. Pat. No. 4,637,980 discloses produc to Protein Purification.” Methods in Enzymology vol. 182, and ing a bacterial product by transforming a temperature-sensi 65 other volumes in this series; Coligan, etal. (1996 and periodic tive lysogen with a DNA molecule that codes, directly or Supplements) Current Protocols in Protein Science Wiley/ indirectly, for the product, culturing the transformant under Greene, NY; and manufacturer's literature on use of protein US 7,618,799 B2 35 36 purification products, e.g., Pharmacia, Piscataway, N.J., or The molecular weight of a protein or polypeptide of inter Bio-Rad, Richmond, Calif. Combination with recombinant est can be used to isolated it from proteins of greater and techniques allow fusion to appropriate segments, e.g., to a lesser size using ultrafiltration through membranes of differ FLAG sequence or an equivalent which can be fused via a ent pore size (for example, Amicon or Millipore membranes). protease-removable sequence. See also, for example. As a first step, the protein mixture can be ultrafiltered through Hochuli (1989) Chemische Industrie 12:69-70; Hochuli a membrane with a pore size that has a lower molecular (1990) “Purification of Recombinant Proteins with Metal weight cut-off than the molecular weight of the protein of Chelate Absorbent in Setlow (ed.) Genetic Engineering, interest. The retentate of the ultrafiltration can then be ultra Principle and Methods 12:87–98, Plenum Press, NY; and filtered against a membrane with a molecular cut off greater Crowe, et al. (1992) QIAexpress: The High Level Expression 10 than the molecular weight of the protein of interest. The & Protein Purification System QUIAGEN, Inc., Chatsworth, protein or polypeptide of interest will pass through the mem Calif. brane into the filtrate. The filtrate can then be chromato Detection of the expressed protein is achieved by methods graphed as described below. known in the art and include, for example, radioimmunoas The secreted proteins or polypeptides of interest can also says, Western blotting techniques or immunoprecipitation. 15 be separated from other proteins on the basis of its size, net Certain proteins expressed in this invention may form Surface charge, hydrophobicity, and affinity for ligands. In insoluble aggregates (inclusion bodies'). Several protocols addition, antibodies raised against proteins can be conjugated are suitable for purification of proteins from inclusion bodies. to column matrices and the proteins immunopurified. All of For example, purification of inclusion bodies typically these methods are well known in the art. It will be apparent to involves the extraction, separation and/or purification of one of skill that chromatographic techniques can be per inclusion bodies by disruption of the host cells, e.g., by incu formed at any scale and using equipment from many different bation in a buffer of 50 mM TRIS/HCL pH 7.5, 50 mMNaCl, manufacturers (e.g., Pharmacia Biotech). 5 mM MgCl, 1 mM DTT, 0.1 mM ATP, and 1 mM PMSF. D. Renaturation and Refolding The cell Suspension is typically lysed using 2-3 passages In some embodiments of the present invention, more than through a French Press. The cell suspension can also be 25 50% of the expressed, transgenic polypeptide, polypeptide, homogenized using a Polytron (Brinkman Instruments) or protein, or fragment thereof produced can be produced in a Sonicated on ice. Alternate methods of lysing bacteria are renaturable form in a host cell. In another embodiment about apparent to those of skill in the art (see, e.g., Sambrook et al., 60%, 70%, 75%, 80%, 85%, 90%, 95% of the expressed Supra; Ausubel et al., Supra). protein is obtained in or can be renatured into active form. If necessary, the inclusion bodies can be solubilized, and 30 Insoluble protein can be renatured or refolded to generate the lysed cell Suspension typically can be centrifuged to secondary and tertiary protein structure conformation. Pro remove unwanted insoluble matter. Proteins that formed the tein refolding steps can be used, as necessary, in completing inclusion bodies may be renatured by dilution or dialysis with configuration of the recombinant product. Refolding and a compatible buffer. Suitable solvents include, but are not renaturation can be accomplished using an agent that is limited to urea (from about 4M to about 8 M), formamide (at 35 known in the art to promote dissociation/association of pro least about 80%, volume/volume basis), and guanidine teins. For example, the protein can be incubated with dithio hydrochloride (from about 4 M to about 8 M). Although threitol followed by incubation with oxidized glutathione guanidine hydrochloride and similar agents are denaturants, disodium salt followed by incubation with a buffer containing this denaturation is not irreversible and renaturation may a refolding agent Such as urea. occur upon removal (by dialysis, for example) or dilution of 40 The protein or polypeptide of interest can also be rena the denaturant, allowing reformation of immunologically tured, for example, by dialyzing it against phosphate-buffered and/or biologically active protein. Other suitable buffers are saline (PBS) or 50 mMNa-acetate, pH 6 buffer plus 200 mM known to those skilled in the art. NaCl. Alternatively, the protein can be refolded while immo The heterologously-expressed proteins present in the bilized on a column, such as the NiNTA column by using a Supernatant can be separated from the host proteins by stan 45 linear 6M-1 Murea gradient in 500 mM NaCl, 20% glycerol, dard separation techniques well known to those of skill in the 20 mM Tris/HCl pH 7.4, containing protease inhibitors. The art. For example, an initial salt fractionation can separate renaturation can be performed over a period of 1.5 hours or many of the unwanted host cell proteins (or proteins derived more. After renaturation the proteins can be eluted by the from the cell culture media) from the protein or polypeptide addition of 250 mMimidazole. Imidazole can be removed by of interest. One Such example can be ammonium Sulfate. 50 a final dialyzing step against PBS or 50 mM sodium acetate Ammonium sulfate precipitates proteins by effectively reduc pH 6 buffer plus 200 mM. NaCl. The purified protein can be ing the amount of water in the protein mixture. Proteins then stored at 4°C. or frozen at -80° C. precipitate on the basis of their solubility. The more hydro Other methods include, for example, those that may be phobic a protein is, the more likely it is to precipitate at lower described in M H Lee et al., Protein Expr: Purif., 25(1): p. ammonium sulfate concentrations. A typical protocol 55 166-73 (2002), W. K. Cho et al., J. Biotechnology, 77(2-3): p. includes adding Saturated ammonium Sulfate to a protein 169-78 (2000), Ausubel, et al. (1987 and periodic supple Solution so that the resultant ammonium Sulfate concentration ments), Deutscher (1990) “Guide to Protein Purification.” is between 20-30%. This concentration will precipitate the Methods in Enzymology Vol. 182, and other volumes in this most hydrophobic of proteins. The precipitate is then dis series, Coligan, et al. (1996 and periodic Supplements) Cur carded (unless the protein of interest is hydrophobic) and 60 rent Protocols in Protein Science Wiley/Greene, NY. S. Roe, ammonium sulfate is added to the Supernatant to a concen Protein Purification Techniques. A Practical Approach (Prac tration known to precipitate the protein of interest. The pre tical Approach Series), Oxford Press (2001); D. Bollag, et al., cipitate is then solubilized in buffer and the excess salt Protein Methods, Wiley-Lisa, Inc. (1996) removed if necessary, either through dialysis or diafiltration. E. Proteins of Interest Other methods that rely on solubility of proteins, such as cold 65 The methods and compositions of the present invention are ethanol precipitation, are well known to those of skill in the useful for producing high levels of properly processed protein art and can be used to fractionate complex protein mixtures. or polypeptide of interest in a cell expression system. The US 7,618,799 B2 37 38 protein or polypeptide of interest (also referred to herein as an interferon Such as interferon-alpha, -beta, and -gamma; “target protein’ or “target polypeptide') can be of any species colony stimulating factors (CSFs), e.g., M-CSF, GM-CSF, and of any size. However, in certain embodiments, the protein and G-CSF; interleukins (ILS), e.g., IL-1 to IL-10; anti or polypeptide of interest is a therapeutically useful protein or HER-2 antibody; superoxide dismutase; T-cell receptors; sur polypeptide. In some embodiments, the protein can be a face membrane proteins; decay accelerating factor; viral anti mammalian protein, for example a human protein, and can be, gen Such as, for example, a portion of the AIDS envelope; for example, a growth factor, a cytokine, a chemokine or a transport proteins; homing receptors; addressins; regulatory blood protein. The protein or polypeptide of interest can be proteins; antibodies; and fragments of any of the above-listed processed in a similar manner to the native protein or polypeptides. polypeptide. In certain embodiments, the protein or polypep 10 tide does not include a secretion signal in the coding In certain embodiments, the protein or polypeptide can be sequence. In certain embodiments, the protein or polypeptide selected from IL-1, IL-1a, IL-1b, IL-2, IL-3, IL-4, IL-5, IL-6, of interest is less than 100 kD, less than 50 kD, or less than 30 IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-12elasti, IL-13, kD in size. In certain embodiments, the protein or polypeptide IL-15, IL-16, IL-18, IL-18BPa, IL-23, IL-24, VIP, erythro of interest is a polypeptide of at least about 5, 10, 15, 20, 30. 15 poietin, GM-CSF, G-CSF, M-CSF, platelet derived growth 40, 50 or 100 amino acids. factor (PDGF), MSF, FLT-3 ligand, EGF, fibroblast growth Extensive sequence information required for molecular factor (FGF; e.g., C-FGF (FGF-1), B-FGF (FGF-2), FGF-3, genetics and genetic engineering techniques is widely pub FGF-4, FGF-5, FGF-6, or FGF-7), insulin-like growth factors licly available. Access to complete nucleotide sequences of (e.g., IGF-1, IGF-2); tumor necrosis factors (e.g., TNF, Lym mammalian, as well as human, genes, cDNA sequences, photoxin), nerve growth factors (e.g., NGF), vascular endot amino acid sequences and genomes can be obtained from helial growth factor (VEGF); interferons (e.g., IFN-O, IFN-B, GenBank at the website www.ncbi.nlm.nih.gov/Entrez. IFN-Y); leukemia inhibitory factor (LIF); ciliary neurotrophic Additional information can also be obtained from Gen factor (CNTF); oncostatin M: stem cell factor (SCF); trans eCards, an electronic encyclopedia integrating information forming growth factors (e.g., TGF-C. TGF-B1, TGF-32, about genes and their products and biomedical applications 25 TGF-B3); TNF superfamily (e.g., LIGHT/TNFSF14, from the Weizmann Institute of Science Genome and Bioin STALL-1/TNFSF13B (BLyS, BAFF, THANK), TNFalpha/ formatics (bioinformatics. Weizmann.ac.il/cards), nucleotide TNFSF2 and TWEAK/TNFSF12); or chemokines (BCA-1/ sequence information can be also obtained from the EMBL BLC-1, BRAK/Kec, CXCL16, CXCR3. ENA-78/LIX, Nucleotide Sequence Database (www.ebi.ac.uk/embl/) or the Eotaxin-1, Eotaxin-2/MPIF-2, Exodus-2/SLC, Fractalkine? DNA Databanik of Japan (DDBJ, www.ddbi.nig.ac.ii/; addi 30 Neurotactin, GROalpha/MGSA, HCC-1, I-TAC, Lymphotac tional sites for information on amino acid sequences include tin/ATAC/SCM, MCP-11MCAF, MCP-3, MCP-4, MDC/ Georgetown’s protein information resource website (www STCP-1/ABCD-1, MIP-1.cquadrature. MIP-1.Quadrature. nbrf. Georgetown. edu/pirl) and Swiss-Prot (au.expasy.org/ MIP-2.quadrature./GRO.cquadrature. MIP-3.quadrature./ Sprot/sprot-top.html). Exodus/LARC, MIP-3/Exodus-3/ELC, MIP-4/PARC/DC Examples of proteins that can be expressed in this inven 35 CK1, PF-4. RANTES, SDF1, TARC, or TECK). tion include molecules Such as, e.g., renin, a growth hormone, In one embodiment of the present invention, the protein of including human growth hormone; bovine growth hormone; interest can be a multi-subunit protein or polypeptide. Mul growth hormone releasing factor, parathyroid hormone; thy tisubunit proteins that can be expressed include homomeric roid stimulating hormone; lipoproteins; C-1-antitrypsin; and heteromeric proteins. The multisubunit proteins may insulin A-chain; insulin B-chain; proinsulin; thrombopoietin: 40 include two or more subunits, that may be the same or differ follicle stimulating hormone; calcitonin; luteinizing hor ent. For example, the protein may be a homomeric protein mone; glucagon; clotting factors such as factor VIIIC, factor comprising 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more subunits. IX, tissue factor, and von Willebrands factor; anti-clotting The protein also may be a heteromeric protein including 2, 3, factors such as Protein C. atrial naturietic factor; lung Surfac 4, 5, 6, 7, 8, 9, 10, 11, 12, or more subunits. Exemplary tant; a plasminogen activator, Such as urokinase or human 45 multisubunit proteins include: receptors including ion chan urine or tissue-type plasminogen activator (t-PA); bombesin; nel receptors; extracellular matrix proteins including chon thrombin; hemopoietic growth factor, tumor necrosis factor droitin; collagen; immunomodulators including MHC pro alpha and -beta; enkephalinase; a serum albumin Such as teins, full chain antibodies, and antibody fragments; enzymes human serum albumin; mullerian-inhibiting Substance; including RNA polymerases, and DNA polymerases; and relaxin A-chain; relaxin B-chain; prorelaxin; mouse gonadot 50 membrane proteins. ropin-associated polypeptide; a microbial protein, such as In another embodiment, the protein of interest can be a beta-lactamase; Dnase; inhibin; activin; vascular endothelial blood protein. The blood proteins expressed in this embodi growth factor (VEGF); receptors for hormones or growth ment include but are not limited to carrier proteins, such as factors; integrin: protein A or D; rheumatoid factors; a neu albumin, including human and bovine albumin, transferrin, rotrophic factor such as brain-derived neurotrophic factor 55 recombinant transferrin half-molecules, haptoglobin, (BDNF), neurotrophin-3, -4, -5, or -6 (NT-3, NT-4, NT-5, or fibrinogen and other coagulation factors, complement com NT-6), or a nerve growth factor such as NGF-?3; cardiotro ponents, immunoglobulins, enzyme inhibitors, precursors of phins (cardiac hypertrophy factor) Such as cardiotrophin-1 Substances such as angiotensin and bradykinin, insulin, (CT-1); platelet-derived growth factor (PDGF): fibroblast endothelin, and globulin, including alpha, beta, and gamma growth factor such as aFGF and bFGF; epidermal growth 60 globulin, and other types of proteins, polypeptides, and frag factor (EGF); transforming growth factor (TGF) such as ments thereof found primarily in the blood of mammals. The TGF-alpha and TGF-B, including TGF-?31, TGF-B2, TGF amino acid sequences for numerous blood proteins have been B3, TGF-B4, or TGF-35; insulin-like growth factor-I and -II reported (see, S. S. Baldwin (1993) Comp. Biochem Physiol. (IGF-I and IGF-II): des(1-3)-IGF-I (brain IGF-I), insulin-like 106b:203-218), including the amino acid sequence for human growth factor binding proteins; CD proteins such as CD-3, 65 serum albumin (Lawn, L. M., et al. (1981) Nucleic Acids CD-4, CD-8, and CD-19; erythropoietin; osteoinductive fac Research, 9:6103–61 14.) and human serum transferrin (Yang, tors; immunotoxins; a bone morphogenetic protein (BMP); F. et al. (1984) Proc. Natl. Acad. Sci. USA 81:2752-2756). US 7,618,799 B2 39 40 In another embodiment, the protein of interest can be a ably linked to the carboxy-terminus of the protein. In another recombinant enzyme or co-factor. The enzymes and co-fac embodiment, the protein includes a secretion signal for an tors expressed in this embodiment include but are not limited autotransporter, a two partner secretion system, a mainter to aldolases, amine oxidases, amino acid oxidases, aspar minal branch system or a fimbrial usher porin. tases, B12 dependent enzymes, carboxypeptidases, carbox 5 The following examples are offered by way of illustration yesterases, carboxylyases, chemotrypsin, CoA requiring and not by way of limitation. enzymes, cyanohydrin synthetases, cystathione synthases, decarboxylases, dehydrogenases, alcohol dehydrogenases, EXPERIMENTAL, EXAMPLES dehydratases, diaphorases, dioxygenases, enoate reductases, epoxide hydrases, fumerases, galactose oxidases, glucose 10 Example 1 isomerases, glucose oxidases, glycosyltrasferases, methyl transferases, nitrile hydrases, nucleoside phosphorylases, Identification of dsbC Leader Sequence oxidoreductases, oxynitilases, peptidases, glycosyltras ferases, peroxidases, enzymes fused to a therapeutically I. Materials and Methods active polypeptide, tissue plasminogen activator, urokinase, 15 reptilase, streptokinase; catalase, Superoxide dismutase; A. Construction of pDOW2258 Expression Plasmid Dnase, amino acid hydrolases (e.g., asparaginase, amidohy Standard recombinant DNA techniques were used in the drolases); carboxypeptidases; proteases, trypsin, pepsin, chy construction of plasmid plOW2258 used for the expression motrypsin, papain, bromelain, collagenase; neuramimidase; of the DsbC leader peptide-Skp fusion protein (FIG. 1). lactase, maltase, Sucrase, and arabinofuranosidases. A PCR amplification reaction was performed using Her In another embodiment, the protein of interest can be a culase Master Mix (Stratagene, #600610-51), primers single chain, Fab fragment and/or full chain antibody or frag RC-322 (5'-AATTACTAGTAGGAGGTACATTAT ments or portions thereof. A single-chain antibody can GCGCTT-3', SEQ ID NO:25) and RC-323 (5'-TATACTC include the antigen-binding regions of antibodies on a single GAGTTATTTAACCTGTTTCAGTA-3', SEQ ID NO:26), stably-folded polypeptide chain. Fab fragments can be apiece 25 and template plasmid plOW3001 (already containing the of a particular antibody. The Fab fragment can contain the cloned disbC leader-skip coding sequence fusion generated by antigen binding site. The Fab fragment can contain 2 chains: SOE PCR) to amplify the 521 bp dsbC-skip coding sequence a light chainanda heavy chain fragment. These fragments can using the manufacturer's protocol. The PCR fragment was be linked via a linker or a disulfide bond. purified using the QIAQUICKR) Gel Extraction Kit (Qiagen, The coding sequence for the protein or polypeptide of 30 #28704), digested with Spel and XhoI restriction nucleases interest can be a native coding sequence for the target (New England Biolabs, RO133 and R0146), then ligated to the polypeptide, if available, but will more preferably be a coding expression plasmid plDOW1169 (already digested with Spel sequence that has been selected, improved, or optimized for and XhoI) using T4 DNA ligase (New England Biolabs, use in the selected expression host cell: for example, by M0202) according to the manufacturer's protocol. The liga synthesizing the gene to reflect the codon use bias of a 35 tion reaction was transformed into P. fluorescens DC454 Pseudomonas species such as P. fluorescens or other Suitable (Isc:lacI' ApyrF) by electroporation, recovered in SOC organism. The gene(s) that result will have been constructed with-Soy medium and plated on selective medium (M9 glu within or will be inserted into one or more vectors, which will cose agar). Colonies were analyzed by restriction digestion of then be transformed into the expression host cell. Nucleic plasmid DNA (Qiagen, cat.#27106). Ten clones containing acidor a polynucleotide said to be provided in an “expressible 40 inserts were sequenced to confirm the presence of error-free form' means nucleic acid or a polynucleotide that contains at dsbC-skip coding sequence. Plasmid from sequence con least one gene that can be expressed by the selected expres firmed isolates were designated pCOW2258. sion host cell. B. Growth and Expression Analysis in Shake Flasks P In certain embodiments, the protein of interest is, or is uorescens strain DC454 (Isc::lacI' ApvrF)py isolates con Substantially homologous to, a native protein, Such as a native 45 taining plOW2258 were analyzed by the standard Dow 1 mammalian or human protein. In these embodiments, the L-scale shake-flask expression protocol. Briefly, seed cul protein is not found in a concatameric form, but is linked only tures grown in M9 medium supplemented with 1% glucose to a secretion signal and optionally a tag sequence for purifi and trace elements were used to inoculate 200 mL of defined cation and/or recognition. minimal salts medium with 5% glycerol as the carbon Source. In other embodiments, the protein of interest is a protein 50 Following an initial 24-hour growth phase, expression via the that is active at a temperature from about 20 to about 42°C. In Ptac promoter was induced with 0.3 mM isopropyl-f-D-1- one embodiment, the protein is active at physiological tem thiogalactopyranoside (IPTG). peratures and is inactivated when heated to high or extreme Cultures were sampled at the time of induction (IO), at 24 temperatures, such as temperatures over 65° C. hours after induction (I24), and at 48 hours after induction In one embodiment, the protein of interest is a protein that 55 (I48). Cell density was measured by optical density at 600 nm is active at a temperature from about 20 to about 42°C. and/or (ODoo). The cell density was adjusted to ODoo 20, and is inactivated when heated to high or extreme temperatures, aliquots of 100 u, were centrifuged at 14,000xrpm for 5 such as temperatures over 65° C.; is, or is substantially minutes and the Supernatant was removed. homologous to, a native protein, such as a native mammalian Soluble and insoluble fractions from shake flask samples or human protein and not expressed from nucleic acids in 60 were generated using EASYLYSETM (Epicentre Technolo concatameric form; and the promoter is not a native promoter gies). The cell pellet was resuspended and diluted 1:4 in lysis in Pfluorescens but is derived from another organism, Such as buffer and incubated with shaking at room temperature for 30 E. coli. minutes. The lysate was centrifuged at 14,000 rpm for 20 In other embodiments, the protein when produced also minutes (4°C.) and the Supernatant removed. The Supernatant includes an additional targeting sequence, for example a 65 was saved as the soluble fraction. Samples were mixed 1:1 sequence that targets the protein to the extracellular medium. with 2x Laemmli sample buffer containing B-mercaptoetha In one embodiment, the additional targeting sequence is oper nol (BioRadicatil 161-0737) and boiled for 5 minutes prior to US 7,618,799 B2 41 42 loading 20 uL on a Bio-Rad Criterion 12% Bis-Tris gel (Bio B. Construction of Transposome to Screen for P. Fluore Rad cath 45-01 12 Lothicx090505C2) and electrophoresis in scens Signal Sequences 1xMES buffer (cat.# 161-0788 Loth 210001188). Gels were A transposome vector was engineered by inserting the stained with SIMPLYBLUETM SafeStain (Invitrogen cathi kanR gene (encoding resistance to kanamycin) and a phoA LC6060) according to the manufacturer's protocol and 5 reporter gene, which was missing the start codon and N-ter imaged using the Alpha Innotech Imaging system. minal signal sequence, between the vector-encoded trans C. N-terminal Sequencing Analysis posase binding sites (“mosaic ends’) in the transposome vec Soluble and insoluble fractions separated by SDS-PAGE torpMOD-2 (Epicentre Technologies). The 1.6 kB were transferred to Sequencing grade PVDF membrane (Bio- 10 kanR gene was purified from puC4-KIXX (Pharmacia) by Rad, cat.#162-0236) for 1.5 hours at 40 Vusing 10 mMCAPS restriction digestion with XhoI, then ligated into Sall-di (2.21 g/L), pH 11 (with NaOH), and 10% methanol as the gested pMOD2 to form plDOW1245. The signal-se transfer buffer. The blot was stained in the staining solution quence-less phoA gene was PCR-amplified from E. coli K12 (0.2% Commassie Brilliant Blue R-250, 40% methanol, 10% (ATCC) with BamHI and Xbal sites added by the primers. 5 After restriction digestion, the gene was ligated into BamHI acetic acid) for ten seconds then immediately destained three and Xbal-digested plDOW1245 to make pDOW1208. The times, ten seconds each. Proteinbands of interest were cutout linear transposome was prepared by restriction digestion of from the blot and sequenced using Edman degradation per pDOW 1208 with Psh A1 and gel purification of the 3.3 kb formed on a PROCISE(R) Protein Sequencer (model 494) mosaic-end-flanked fragment using Ultrafree DA (Amicon). from Applied Biosystems (Foster City, Calif.). 2O After passage over a MicroBioSpin 6 column (Biorad), 30 ng II. Results was mixed with 4 units of transposase (Epicentre) and ali quots were electroporated into Pfluorescens MB 101. SDS-PAGE analysis confirmed significant accumulation C. Identification of Improved pbp Signal Sequence of protein at the predicted molecular weight for Skp (~17 Apbp-proinsulin-phoA expression plasmid was designed kDa) at both 24 hours (I24) and 48 hours (148) post-induction as to fuse the pbp-proinsulin protein to a mature PhoA enzyme, in both the soluble and insoluble fractions (FIG. 2). so that accumulation of proinsulin-phoA in the periplasm N-terminal sequencing analysis confirmed that the induced could be measured and strains with improved accumulation soluble band of expected size for Skip protein at I24 produced could be selected by assaying for PhoA activity The fusion the first 5 amino acids of the predicted protein sequence for between the pbp signal sequence, human proinsulin, and the processed form of DsbC-Skip (ADKIA, SEQID NO:27). 30 phoA in plNS-008 was constructed by SOE PCR (Horton et The N-terminal analysis also showed that two bands that al. 1990), using primers that overlap the coding sequence for accumulated in the insoluble fraction at I24 produced both the the secretion leader and proinsulin, and for proinsulin and the processed and un-processed forms of DsbC-Skp. The higher mature form of PhoA (i.e. without the native secretion molecular weight band produced the first 10 amino acids of leader). The fusion was cloned under the control of the tac the predicted protein sequence for the unprocessed from of 35 promoter in pdOW 1169 (Schneider et al. 2005a; Schneider, DsbC-Skip (MRLTQIIAAA, SEQID NO:28) while the lower Jenings etal. 2005b) which was restriction digested with Spel molecular weight band produced the first 10 amino acids of and XhoI and treated with shrimp alkaline phosphatase, then the predicted protein sequence for the processed form of the ligated and electroporated into DC206, to form plNS-008. The proinsulin gene template was codon-optimized for DsbC-Skp protein (ADKIAIVNMG, SEQ ID NO:29). See O FIGS 3A and 3B. expression in Pfluorescens and synthesized (DNA 2.0). The phoA gene was amplified from E. coli MG 1655 genomic DNA. The colonies were screened on agar plates containing Example 2 BCIP, a calorimetric indicator of alkaline phosphatase activ ity, with IPTG to induce expression of the pbp-proinsulin Identification of pbp Secretion Signal 45 phoA gene. Of the colonies that exhibited BCIP hydrolysis, one grew much larger than the others. This isolate was found I. Materials and Methods to have a single C to Tmutation in the region encoding the secretion peptide, causing a change from alanine to valine at A. Strains amino acid 20 (A20V, SEQID NO:2; see Table 6). DC206 (ApyrF, 1sc::lacI?") was constructed by PCR 50 The expression of the two strains was assessed by the amplification of the E. coli lacI gene from pCN5 lacI standard shake flask protocol. The growth of both leveled off (Schneider et al. 2005b) using primers incorporating the shortly after addition of the IPTG inducer. Alkaline phos lacI promoter mutation (Calos et al. 1980), and recombin phatase activity in the mutant pINS-008-3 strain was 3-4 ing the gene into the lsc (levan sucrase) locus of MB101 times higher (FIG. 6), and accumulation of the (soluble) ApyrF (Schneider, Jenings et al. 2005b) by allele exchange. protein was higher (FIG. 7).

TABLE 6 Sec secretion signals identified in P. fluorescens Curated Predicted signal SEQ SEQ Protein sequence (signalP- ID ID Function Abbreviation HMM) NO: DNA sequence NO : pbp (signal pbp* MKLKRLMAAMTF 2 atgaaactgaaacgtttgatggcggcaa. 1. sequence WAAGWATWNAWA tgactitttgtc.gctgctggcgttgcgacc mutant) gt caacg.cggtggcc US 7,618,799 B2

TABLE 6-continued Sec secretion signals identified in P. fluorescens Curated Predicted signal SEQ SEQ Protein sequence (signalP- ID ID Function Abbreviation HMM) NO: DNA sequence NO : porin E1 PO MKKSTLAWAWTL 31 atgaagaagt ccaccittggctgtggctgt 3O precursor GAIAQQAGA aacgttgggcgcaatcgcc.ca.gcaa.gc aggcgct

Outer OP MKLKNTLGLAIGS 33 atgaaactgaaaaac accttgggcttgg 32 membrane LIAATSFGWLA ccattggttct cittattgcc.gct actitcttitc porin F ggcgttctggca

Periplasmic PB MKLKRLMAAMTF 35 atgaaactgaaacgtttgatggcggcaa. 34 phosphate WAAGWATANAWA tgactitttgtc.gctgctggcgttgcgacc binding gccaacgcggtggcc protein (pbp) azurin AZ MFAKLWAWSLLTL 37 atgtttgc caaactic gttgctgttitcc.ctg 36 ASGQLLA Ctgact ctggcgagcggc.cagttgcttg ct

ae L MIKRNLLWMGLA 39 atgatcaaacgcaatctgctggittatggg 38 lipoprotein WLISA Ccttgc.cgtgctgttgagcgct B precursor

Lysine- LAO MONYKKFLLAAA 41 atgcagaactataaaaaattic cttctggc 4 O arginine- WSMAFSATAMA cgcggc.cgt.ctcgatggcgttcagcgc ornithine- cacggc catggca binding protein

Iron (III) IB MIRDNRLKTSLLR 43 atgatcc.gtgacaac cqactcaagacat 42 binding GLTLTLISLTLLSP cc cttctg.cgcggcc tigaccct cacc ct protein AAHS acticagoctdaccctgct citcgc.ccg.cg gcc cattct

D. Genomic Sequencing sequence was amplified using pGal2 (Martineau et al. 1998) Genomic DNA was purified by the DNA Easy kit (Invitro as template. The 837 bp SOE-PCR product was cloned into gen) and 10 ug were used as template for sequencing with a the pCR BLUNT II TOPO vector and sequence was con transposon specific primer using 2x ABI PRISM BigDye 40 firmed. The Sclv gene fused to signal sequence was excised Terminators v3.0 Cycle Kit (Applied Biosystems). The reac from the TOPO vector with Xbaland SalI restriction enzymes tions were purified and loaded on the ABI PRISM 3100 and cloned into the Spel and XhoI sites of pMYC1803 to Genetic Analyzer (Applied Biosystems) according to manu produced plDOW1122 (oprF:gal2) and plDOW1123 (pbp: facturer's directions. gal2) using standard cloning techniques (Sambrook et al. E. Cloning of Signal Sequence Coding Regions 45 2001). The resulting plasmids were transformed into DC191 Signal sequences were determined by SPScan software or selected on LB agar Supplemented with 30 g/mL tetracy as in (De et al. 1995). The results of these experiments have cline and 50 ug/mL, kanamycin. been disclosed in copending U.S. Patent Application Number The porE signal sequence from pl)OW 1183 and fused by 20060008877, filed Nov. 22, 2004. The outer membrane 50 SOE-PCR to gal2amplified from p)OW1123 and PCR prod porin F (oprF) phosphate binding protein (pbp), porineI ucts were purified by gel extraction. The resulting PCR prod (porE), aZurin, lipoprotein B and iron binding protein secre uct was cloned into PCRII Blunt TOPO and subsequently tion leaders were amplified by polymerase chain reaction transformed into E. coli Top 10 cells according to manufac (PCR). The resulting PCR products were cloned into the turers instructions (Invitrogen). The resulting clones were pCRII Blunt TOPO vector and transformed into E. coli Top 10 55 sequenced and a positive clone (pDOW1185) selected for cells (Invitrogen) according to the manufacturer's protocol. subcloning. pl.)OW 1185 was restriction digested with Spel Resulting transformants were screened for correct insert by and Sal, and the porE-gal2 fragment was gel purified. The sequencing with M13 forward and M13 reverse primers. purified fragment was ligated to Spel-Sal digested Positive clones were named as follows: oprE (pDOW1112), pDOW1169 using T4 DNA ligase (NEB). The ligation mix pbp (pDOW1113), porinE1 (pDOW1 183), azurin 60 was transformed into electrocompetent DC454 and selected (pDOW1180), lipoprotein B (pDOW1182), iron binding pro on M91% glucose agar plates. Transformants were screened tein (pDOW 1181). by restriction digestion of plasmid DNA using Spel and SalI. F. Construction of gal2 schv Clones for Secretion in P A positive clone was isolated and stocked as pl)OW 1186. fluorescens. Signal sequences of aZurin, iron binding protein and lipB The OprF and pbp signal sequences were amplified to fuse 65 were amplified from clones plDOW1180, plDOW1181 and to the Gal2 coding sequence at the +2 position using pDOW 1182, respectively. The gal2 gene was amplified from pDOW1112 or pl)OW1113 as template. The gal2 coding pDOW1123 using appropriate primers to fuse to each secre US 7,618,799 B2 45 46 tion leader, and resultant PCR products were isolated and resuspended in an equal Volume of lysis buffer and resus fused by SOE-PCR as described above. The SOE-PCR prod pended by pipetting up and down. For selected clones, cell ucts were cloned into pCR-BLUNT II TOPO, the resultant free broth samples were thawed and analyzed without dilu clones were sequenced and positive clones for each fusion tion. were subcloned into plOW 1169 as described above. L. Expression and Analysis of Secretion of Proteins or G. Construction of a Pfluorescens Secretion Vector with Polypeptides of Interest C-Terminal Histidine Tag The seed cultures, grown in 1xM9 supplemented with 1% A clone containing an insert with the pbp Secretion leader, glucose (Teknova), Supplemented with trace element Solution MCS with C-terminal His tag, and rrn T1 T2 transcriptional were used to inoculate 50 mL of Dow defined minimal salts terminators was synthesized by DNA 2.0 (p.J5:G03478). The 10 medium at 2% inoculum, and incubated at 30°C. with shak 450 bp secretion cassette was isolated by restriction digestion ing. Cells were induced with 0.3 mM IPTG (isopropyl B-D- with Speland NdeI and gel purified. The fragment was ligated thiogalactopyranoside) ~24 hours elapsed fermentation time to pIDOW1219 (derived from pMYC 1803 (Shao et al. 2006)) (EFT). Samples were taken at time of induction (IO) and 16 digested with the same enzymes. The ligation products were (I16), 24 (I24), or 40 (I40) hours post induction for analyses. transformed into chemically competent E. coli JM109. Plas 15 Cell density was measured by optical density at 600 nm mid DNA was prepared and screened for insert by PCR using (ODoo). The cell density was adjusted to ODoo 20, and 1 vector specific primers. The resultant plasmid was sequence mL was centrifuged at 14000xg for five minutes. Superna confirmed and named pl)OW3718. Electrocompetent Pfluo tants (cell free broth) were pipetted into a new microfuge rescens DC454 was then transformed with the plasmid and tube, then cell pellets and cell free broth samples were frozen selected on LBagar Supplemented with 250 ug/mL uracil and at -80° C. for later processing. 30 ug/mL tetracycline. M. SDS-PAGE Analysis Open reading frames encoding human proteins were Soluble and insoluble fractions from shake flask samples amplified using templates from the human ORFeome collec were generated using EASY LYSETM Buffer (Epicentre tion (Invitrogen). PCR products were restriciton digested Technologies). The frozen pellet was resuspended in 1 mL of with Nhe and XhoI, and then ligated to NheI-XhoI digested 25 lysis buffer. Fifty microliters were added to an additional 150 pDOW3718. Ligation products were subsequently trans uL EASYLYSETM buffer and incubated with shaking at room formed into electrocompetent P. fluorescens DC454 and temperature for 30 minutes. The lysate was centrifuged at transformants selected on LB agar supplemented with 250 14,000 rpm for 20 minutes (4° C.) and the supernatant ug/mL uracil and 30 ug/mL tetracycline. Positive clones were removed. The supernatant was saved as the soluble fraction. sequenced to confirm insert sequence. 30 The pellet was then resuspended in an equal volume (200LL) H. Construction of E. coli Secretion Clones of lysis buffer and resuspended by pipetting up and down; this Human ORFs were amplified as above, except that primers was saved as the insoluble fraction. Cell free broth samples were designed with an NcoI site on the 5' primer and XhoI on were thawed and used at full strength. the 3'primer. PCR products were restriction digested with N. Western Analysis NcoI and XhoI (NEB), then purified using Qiaquick Extrac 35 Soluble and insoluble fractions prepared and separated by tion kit (Qiagen) The digested products were ligated to Nco SDS-PAGE were transferred to nitrocellulose (BioRad) using XhoI digested pBT22b (Novagen) using T4 DNA ligase 1x transfer buffer (Invitrogen) prepared according to manu (NEB), and the ligation products were transformed into facturer's protocol, for 1 hour at 100V. After transfer, the blot chemically competent E. coli Top 10 cells. Transformants was blocked with POLY-HRP diluent (Research Diagnostics, were selected in LB agar ampicillin plates (Teknova). Plas 40 Inc.) and probed with a 1:5,000 dilution of anti His tag anti mid DNA was prepared (Qiagen) and positive clones were body (Sigma or US Biologicals). The blot was washed with sequenced to confirm insert sequence. One confirmed cloned 1xPBS-Tween and subsequently developed using the Immu plasmid for each was subsequently transformed into BL21 nopure Metal Enhanced DAB Substrate Kit (Pierce). (DE3)(Invitrogen) for expression analysis. O. 20 L. Fermentation I. DNA Sequencing 45 The inocula for the fermentor cultures were each generated Sequencing reactions (Big dye version 3.1 (Applied Bio by inoculating a shake flask containing 600 mL of a chemi systems)) were purified using G-50 (Sigma) and loaded into cally defined medium Supplemented with yeast extract and the ABI3100 sequencer. glycerol with a frozen culture stock. After 16-24hr incubation J. High Throughput (HTP). Expression Analysis with shaking at 32° C., the shake flask culture was then The Pfluorescens strains were analyzed using the standard 50 aseptically transferred to a 20 L fermentor containing a Dow HTP expression protocol. Briefly, seed cultures grown medium designed to support a high biomass. Dissolved oxy in M9 medium supplemented with 1% glucose and trace gen was maintained at a positive level in the liquid culture by elements were used to inoculate 0.5 mL of defined minimal regulating the sparged airflow and the agitation rates. The pH salts medium with 5% glycerol as the carbon source in a 2.0 was controlled at the desired set-point through the addition of mL deep 96-well plate. Following an initial growth phase at 55 aqueous ammonia. The fed-batch high density fermentation 30°C., expression via the Ptac promoter was induced with 0.3 process was divided into an initial growth phase of approxi mM isopropyl-f-D-1-thiogalactopyranoside (IPTG). Cell mately 24 hr and gene expression phase in which IPTG was density was measured by optical density at 600 nm (ODoo). added to initiate recombinant gene expression. The expres K. Preparation of HTP Samples for SDS-PAGE Analysis sion phase of the fermentation was then allowed to proceed Soluble and insoluble fractions from culture samples were 60 for 24 hours. generated using EASY LYSETM (Epicentre Technologies P. N-Terminal Amino Acid Sequence Analysis cathRP03750). The 25 L whole broth sample was lysed by Samples were run as described in SDS-PAGE analysis adding 175 mL of EASY LYSETM buffer, incubating with above and transferred to a Criterion Sequi-Blot PVDF mem gentle rocking at room temperature for 30 minutes. The lysate brane (Biorad). The membrane was stained with GelCode was centrifuged at 14,000 rpm for 20 minutes (4°C.) and the 65 Blue stain reagent (Pierce) and subsequently destained with Supernatant removed. The Supernatant was saved as the 50% methanol 1% acetic acid, rinsed with 10% methanol soluble fraction. The pellet (insoluble fraction) was then followed by de-ionized water, then dried. Bands of interest US 7,618,799 B2 47 48 were sliced from the membrane, extracted and subjected to 8 shows a mixture of protein with processed and unprocessed cycles of Edman degradation on a Procise protein sequencing secretion signal. However, the signal sequence was observed system, model 494 (Applied Biosystems, Foster City, Calif.). to be fully processed from the IB-Gal2 fusion. P. Edman, Acta Chem. Scand. 4, 283 (1950); review R. A. Expression of Gal2 scFv fused to each of seven leaders was Laursen et al., Methods Biochem. Anal. 26, 201-284 (1980). evaluated at the 20 L fermentation scale using standard fer mentation conditions. All Strains grew as expected, reaching II. Results induction OD (~180 units) at 18-24 hours. The lipB-Gal2 A. Identification of Native Secretion Signal Sequences by strain grew slightly more slowly than other strains. This was Transposon Mutagenesis not wholly unexpected as the lipB-Gal2 strain did not grow To identify P. fluorescens signal sequences that would 10 following inoculation of shake flask medium at Small scale secrete a heterologous protein to the periplasm or broth, a fermentation. Expression and processing of the Gal2 schv secretion reporter gene was cloned into a transposome. The was assessed by SDS-PAGE and Western blot. SDS-PAGE secretion reporter gene used is an E. coli alkaline phosphatase analysis showed that high levels of Gal2 was expressed when gene (phoA) without a start codon or N-terminal signal fused to either the OP or PB secretion signals. However, only sequence. PhoA is active in the periplasm (but not the cytosol) 15 a portion (~50%) of the OP-Gal2 fusion protein appeared to due to the formation of intramolecular disulfide bonds that be secreted to the periplasm with the signal sequence cleaved. allow dimerization into the active form (Derman et al. 1991). As observed at Small scale, Gal2 was expressed predomi A similar method referred to as “genome scanning was used nantly in the insoluble fraction, although soluble protein was to find secreted proteins in E. coli (Bailey et al 2002). The detected by Western blot. A small amount of protein was also phoA gene has also been used to analyze secretion signals in detected in the culture Supernatant, indicating leakage from periplasmic, membrane, and exported proteins in E. coli the periplasm (FIG. 7). N-terminal sequence analysis con (Manoil et al. 1985) and in other bacteria (Gicquel et al. firmed that the ibp and azurin leaders were processed as 1996). After electroporation and plating on indicator media, expected, resulting in the N-terminal amino acid sequence eight blue colonies were isolated. The insertion site of the AQVQL (SEQ ID NO:44). Likewise, the PorE secretion transposome in the genome was sequenced and used to search 25 leader appeared to be processed by Western analysis and was a proprietary genome database of P. fluorescens MB101. confirmed by N-terminal analysis. The level of insoluble Eight gene fusions identified as able to express active PhoA PorE-Gal2 expression was slightly lower than that of are shown in Table 6. insoluble ibp-Gal2 and azurin-Gal2. LipB-Gal2 showed B. Cloning of Signal Sequence-Gal2 Fusions expression of processed Gal2 at levels similar to that of PorE The signal sequences of the secreted proteins identified 30 Gal2. The greatest amount of protein was observed from above, outer membrane porin F (OP), phosphate binding strains expressing pbp-Gal2 and pbpA20V-Gal2. The amount protein porE (PB), iron binding protein (IB), azurin (AZ), of Gal2 expressed from the pbpA20V-Gal2 strain appeared to lipoprotein B (L) and lysine-ornithine-arginine binding pro be even higher than that produced by the pbp-Gal2 strains tein (LOA) were predicted using the SignalP program (J. D. (FIG. 6). Soluble processed Gal2 was detected by Western Bendtsen 2004). Signal sequences for OP, PE and AZ have 35 analysis, as was a mixture of unprocessed and processed been previously identified in other systems Arvidsson, 1989 insoluble protein (FIG. 7). N-terminal sequence analysis of #25; De, 1995 #24; Yamano, 1993 #23. The activity of an the insoluble protein confirmed a mixture of unprocessed and additional secretion leader identified in another study, correctly processed Gal2. pbpA20V (Schneider et al. 2006), was also analyzed in par allel. In this study the coding region of six native Pfluore 40 Example 3 scens signal sequences, and a mutant of the Pfluorescens phosphate binding protein signal sequence (see Table 6) were Identification of Bce Leader Sequence each fused to the gal2 schv gene using splicing by overlap extension PCR (SOE-PCR) as described in Materials and I. Materials and Methods Methods such that the N-terminal 4 amino acids of Gal2 45 following cleavage of the signal peptide would be AQVO. BceL is a secretion leader that was identified to be encoded Repeated attempts to amplify the LAO signal sequence failed, by part of DNA insert containing a gene for a hydrolase from and this signal sequence was dropped from further analysis. Bacillus coagulans CMC 104017. This strain Bacillus coagul The gene fusions were cloned into the Pfluorescens expres lans is also known as NCIMB 8041, ATCC 10545 and DSMZ sion vector plOW1169 and transformed into DC454 host 50 2311 in various commercial culture collections, and has its strain (ApyrFlsc:lacI'). The resultant strains were subse origins as NRS784. NRS 784 is from the NR Smith collection quently assessed for Gal2 scFv expression and proper pro of Spore forming bacteria (Smith et al Aerobic spore forming cessing of the secretion leaders. bacteria US. Dep. Agr. Monogra. 16:1-148 (1952)). The other C. Expression of Secreted Gal2 schv original reference for this strain cited by NCIMB is Cambell, At the shake flask scale, fusions of PB, OP, PO, AZ, IB, and 55 L. L. and Sniff E. E. (1959. J. Bacteriol. 78:267 An investi L to gal2 schv achieved the expected ODoo, except for gation of Folic acid requirements of Bacillus coagulans). L-gal2 sclv, which failed to grow following subculture into Sequence and Bioinformatics Analysis production medium (data not shown). Western blot analysis A DNA insert of 4,127 bp from Bacillus coagulans CMC confirmed that the PB, OP, PO, AZ and IB signal sequences 104017 was sequenced and analyzed to localize coding were cleaved from the Gal2 scv fusion. However Western 60 sequences potentially encoding a hydrolase enzyme. One analysis showed the presence of unprocessed PB-Gal2 and coding sequence of 1.314 bp, designated CDS1, was identi OP-Gal2. Some soluble Gal2 scFv expressed from AZ and IB fied behind the lac promoter at the 5' end. The DNA and fusions was found in the cell-free-broth, indicating that predicted protein sequences for CDS1 are set forth in SEQID soluble protein was expressed and leaked from the periplas NO:45 and 46, respectively. CDS1 was determined most mic space. Amino terminal sequence analysis was performed 65 likely to encode a hydrolase based upon BLASTP analysis of to confirm the cleavage of the signal sequence. Insoluble Gal2 the predicted protein sequence. The CDS1 sequence showed protein expressed from the azurin (pDOW1191) fusions homology (E-value: 2e) to beta-lactamase from US 7,618,799 B2 49 50 Rhodopseudomonas palustris HaA2. SignalP 3.0 hidden minutes. The lysate was centrifuged at 14,000 rpm for 20 Markov model analysis (Bendtsen J. D. Nielson G. von Heijne minutes (4°C.) and the Supernatant removed. The Supernatant G. Brunak S: Improved prediction of signal peptides: Signal was saved as the soluble fraction. The pellet (insoluble frac 3.0. J. Mol. Biol. 2004, 340:783.) of CDS1 predicted the tion) was then resuspended in an equal Volume of lysis buffer presence of a signal sequence for the organism class Gram and resuspended by pipetting up and down. Cell free broth positive bacteria with a signal peptidase cleavage site samples were thawed and used at full strength. Samples were between residues 33/34 of SEQID NO:46. mixed 1:1 with 2x Laemmlisample buffer containing 3-mer Construction of Protein Expression Plasmids captoethanol (BioRadicatil 161-0737) and boiled for 5 min Standard cloning methods were used in the construction of utes prior to loading 20 L on a Bio-Rad Criterion 10% Crite expression plasmids (Sambrook J. Russell D: Molecular 10 rion XT gel (BioRad cathl 45-01 12) and separated by Cloning a Laboratory Manual, third edn. Cold Spring Harbor: electrophoresis in the recommended 1xMOPS buffer (cat...it Cold Spring Harbor Press; 2001). DNA sequence fusions 161-0788 Loth 210001188). Gels were stained with SIMP were performed using the SOE-PCR method (Horton, R. M., LYBLUETM SafeStain (Invitrogen cath. LC6060) according Z. Cai, S. N. Ho and L. R. Pease (1990). “Gene splicing by to the manufacturer's protocol and imaged using the Alpha overlap extension: tailor-made genes using the polymerase 15 Innotech Imaging system. The protein quantity of gel bands chain reaction.” BioTechniques 8(5): 528-30, 532, 534-5)). of interest were estimated by comparison to BSA protein Phusion DNA polymerase (New England Biolabs standards loaded to the same gel. cathF531S) was used for all PCR reactions. Plasmids were designed to express and localize an esterase II. Results protein from Bacillus coagulans CMC 104017 into either the A total of six shake flasks (3 flasks per strain) were used to cytoplasm or periplasmic space of P. fluorescens. The final evaluate hydrolase expression. Growth of the periplasmic and PCR products were digested with the Spel and XhoI restric cytoplasmic designed Strains were consistent with normal tion endonucleases (New England Biolabs cat.#R0133 and growth for P. fluorescens strains, reaching an ODoo of #R0146) then ligated into expression vector pl)OW1169, approximately 15 at twenty-four hours post induction. SDS also digested with Speland XhoI, using T4DNA ligase (New 25 PAGE analysis was performed to assess hydrolase (CDS1 England Biolabs cat. iiM0202S) to produce the cytoplasmic protein) expression at the time of induction and 24 hours CMC104641 CDS-1 expression vector p484-001 and the post-induction. Soluble, insoluble, and cell free broth frac native Bce leader CMC 104641 CDS-1 expression vector tions were analyzed by SDS-PAGE. For the cytoplasmic p484-002. The ligation reaction mixtures were then trans CDS-1 strain (p484-001), protein of the expected size for formed into Pfluorescens strain DC454 (ApyrF, lacI) by 30 cytoplasmic hydrolase (44.1 kDa) accumulated almost electroporation, recovered in SOC-with-soy medium entirely in the soluble fraction at 124 (24 hours following (Teknova catfi2S2699) and plated on selective medium (M9 IPTG induction) in all three isolates at an estimated yield of glucose agar, Teknovacati2M1200). Colonies were analyzed 0.1 mg/mL. FIG. 8 shows representative results for the cyto by restriction digestion of miniprep plasmid DNA (Qiagen, plasmic strain evaluated as EP484-003. A negligible band of cat.#27106). Ten clones from each transformation were 35 expected size was detectable in the insoluble fraction and no sequenced to confirm correct insert. CDS1 protein was detected in the cell-free broth. For the Expression Analysis periplasmic strain expressing the native Bce leader-CDS1 The Pfluorescens strain DC454 carrying each clone was (p484-002), protein of the expected size for native esterase examined in shake-flasks containing 200 mL of defined mini accumulated almost entirely in the soluble fraction at 124 in mal salts medium with 5% glycerol as the carbon source 40 all three isolates at an estimated yield of 0.8 mg/mL. FIG. 8 (“Dow Medium'). Following an initial growth phase, expres shows representative results for the periplasmic strain con sion via the tac promoter was induced with 0.3 mMisopropyl taining the Bce leader fusion evaluated as EP484-004. It was B-D-1-thiogalactopyranoside (IPTG). Cultures were sampled unclear if the expressed native esterase was entirely pro at the time of induction (IO), and at 24 hours post induction cessed since the gel loading used made it difficult to discern (I24). Cell density was measured by optical density at 600 nm 45 between the predicted unprocessed size of 47.6 kDa and (ODoo). A table showing the shake flask numbering scheme processed size of 44.1 kDa. Similar to results with the cyto is shown in Table 7. plasmic expression strain, a negligible band of expected size was detectable in the insoluble fraction and no CDS1 protein TABLE 7 was detected in the cell-free broth. The translated sequence of 50 the Bce Leader of interest is set forth in SEQID NO:8. Host Plasmid number Strain (leader-gene) Flask Number Example 4 DC4S4 P4.84-001 EP484-001 EP484-002 EP484-003 (cytoplasmic 484) Identification and Analysis of Pfluorescens DC4S4 P4.84-002 EP484-004 EP484-005 EP484-006 55 Secretion Leaders (native leader 484) 6,433 translated ORFs from the MB214 genome were ana At each sampling time, the cell density of samples was lyzed with the signal peptide prediction program, SignalP2.0 adjusted to ODoo 20 and 1 mL aliquots were centrifuged at (Nielsen, H., et al. Protein Eng, 1997. 10(1): p. 1-6). 1326 14000xg for five minutes. Supernatants (cell free broth) were 60 were predicted by the HMM model to contain a signal pep pipetted into a new microfuge tube then cell pellets and cell tide. These proteins were analyzed with PsortB 2.0 (Gardy, J. free broth samples were frozen at -20°C. L., et al. Bioinformatics, 2005.21 (5): p. 617-23) and all those Cell Lysis and SDS-PAGE Analysis with a PsortB final localization identified as cytoplasmic or Soluble and insoluble fractions from shake flask samples cytoplasmic membrane were removed leaving 891. 82 pro were generated using Easy Lyse (Epicentre Technologies). 65 teins for which the SignalPHMM probability of containing a The frozen pellet was resuspended and diluted 1:4 in lysis signal peptide was below 0.79 were removed yielding 809. buffer and incubated with shaking at room temperature for 30 The cutoff of 0.79 was chosen because that was the highest US 7,618,799 B2 51 52 value that did not exclude aprA (RXF04304, known to be an 8e7 and 5e. This method yielded 111 unique potential extracellular protein). The amino terminal sequences of these homologues, some of which overlapped with the 7 targets 809 translated ORFs containing the signal peptide as pre obtained above. dicted by the SignalP Neural Network algorithm plus the first The combined list of 18 unique proteins were analyzed 7 amino acids of the processed protein were aligned using using SignalP and 9 final targets which were predicted to have CLUSTALX 1.81 (Thompson, J. D., et al. Nucleic Acids Res, a single likely signal peptidase cut site were chosen for 1997. 25(24): p. 4876-82). expression studies. Huber et al. Suggest that highly hydrophobic signal sequences are more likely to be co-translationally secreted Isolation and Sequence Analysis of Secretion Leaders (Huber, D., et al. J. Bacteriol, 2005. 187(9): p. 2983-91). For 10 The identified Pfluorescens secretion leaders were ampli the purpose of identifying co-translationally secreted proteins fied from DC454 (descended from Pfluorescens MB 101) the amino acid indexes of Wertz-Scheraga (WS)(Wertz, D. H. genomic DNA and cloned into pCRBLUNTII-TOPO (Invit and H. A. Scheraga, Macromolecules, 1978. 11(1): p. 9-15), rogen) for DNA sequence verification. The DNA and were found to be the best. For this study, these indexes were deduced amino acid sequence of each Pfluorescens Secretion obtained from AAindex on the worldwide web at www 15 leader isolated is referenced in Table 9. ..genomeip/dbget-bin/www bget?aax 1:WERD780101. An algorithm reported by Boyd (Boyd, D., C. Schierle, and J. TABLE 9 Beckwith, Protein Sci, 1998. 7(1): p. 201-5), was modified and used to rank the 809 proteins based on hydrophobicity. A fluorescens Secretion leader Sequences The algorithm scans each sequence averaging the WS Scores LEADER DNA SEQID NO: AMINO ACID SEQ ID NO: within a window of 12. The most hydrophobic region is used Cup A2 9 10 to assign the WS score for the whole protein. This yielded 142 CupB2 11 12 signal sequences with WS scores greater than 0.69, the cutoff CupC2 13 14 defined in Huberet. al. This smaller list was cross-referenced ToB 49 50 25 NikA 15 16 with data from 2D-LC whole proteome experiments per Flg.I 17 18 formed by the Indiana Centers for Applied Protein Sciences ORF5550 19 2O (INCAPS). These experiments attempted to identify and Ttg2C 21 22 quantify all proteins expressed in MB214 (descended from P ORF8124 23 24 fluorescens MB101) under a variety of growth conditions. A 30 protein that appears in this list with high maximum expres Fusion of Secretion Leaders to Gal2 scv and E. Coli sion levels is likely to be highly expressed. In these data a Thioredoxin and Expression Analysis priority score of 1 or 3 indicates high confidence in the iden tification of the protein. The proteins from the list of 142 Each secretion leader (Table 9) was fused in frame to the which were identified in the INCAPS experiments with a Gal2 scFv sequence (Martineau, P. et al. 1998 J. Mol. Bio. 35 280:117) and/or the E. coli thioredoxin (TrxA) sequence priority of 1 or 3 are listed in Table 8 in order of their maxi (SEQ ID NO:46) using splicing by overlap extension PCR mum expression levels. (Horton R. M. et al. 1990 Biotechniques 8:528). The resulting TABLE 8 fragments were purified and Subsequently used as template for a second round of PCR to fuse NikA secretion leader 7 unique proteins from the list of 142 with priority 1 or 3 40 coding sequence to the trXA sequence. The fusions were then (indicating high confidence in the identification) listed in order of cloned into the Pfluorescens expression vectorp)OW1169 maximum expression levels found during the INCAPS experiments. under control of the tac promoter. Each construct was trans Priority Protein ID Curated Function Max formed into P. fluorescens DC454 and expression was assessed in high throughput format. Cultures were grown in a 1 RXFO5550.1 tetratricopeptide repeat family 377264.2 45 protein defined mineral salts medium supplemented with 5% glyc 1 RXFO8124.1 Methyl-accepting chemotaxis 134887.4 erol in 2 mL deep well plates at a culture volume of 0.5 mL. protein Following a 24 hour growth period, the recombinant protein 1 RXFO72S6.1 TolB protein 88429.16 was induced with 0.3 mMIPTG and allowed to express for 24 3 RXFO7256.1 a1 TolB protein 84O2O.S1 3 RXFO4046.2 a1 cytochrome c oxidase, 79275.3 hours. Cultures were fractionated by Sonication and protein monoheme subunit, 50 expression and secretion leader processing was assessed by membrane-bound (ec 1.9.3.1) SDS-CGE and Western blot (FIG. 9). Each of the leaders 3 RXFO3895.1 a1 asma SO164.08 tested, with the exception of the Bce leader, was found to be 3 RXFO7256.1 pn TolB protein 4921S.09 partially or fully processed from the Gal2 scFv protein 1 RXFO6792.1 Conserved Hypothetical Protein 47485-35 3 RXFO2291.1 toluene tolerance protein ttg2C 457.03.08 sequence. Each also greatly improved expression of Gal2 55 ScFv compared to an expression strain that encodes cytoplas mic Gal2 schv (none), indicating that in addition to directing Several co-translationally secreted proteins in E. coli have the Subcellular localization, these secretion leaders can also been identified. The sequences of several of these were used improve overall expression. Not unexpectedly, varying levels to search the MB214 genome for homologues. The E. coli of expression and solubility of Gal2 scFv were also observed. genes were: DsbA, TorT. SfmC, FocC, CcmH, Yral, TolB, 60 Western analysis confirmed that some soluble Gal2 was pro NikA, Flg.I. The BLASTP algorithm (Altschul, S. F., et al., J duced when fused to the Cup A2, CupC2, NikA, Flg.I and ORF Mol Biol, 1990. 215(3): p. 403-10) was used to search a 5550 (FIG. 9). Although expression of TolB leader fused to database of MB214 translated ORFs. The MB214 proteins Gal2 was lower than observed with the other leaders, Western were placed into two categories based on the degree of analysis showed that all protein expressed was soluble. N-ter homology they showed to their E. coli counterparts. High 65 minal analysis showed that the TolB, Cup A2, CupC2, Flg.I. homology proteins matched with expect scores of 2e or NikA and ORF5550 leaders were cleaved from Ga12 ScFv as better. Low homology proteins had expect scores between expected (data not shown). US 7,618,799 B2 53 54 Although not processed from Gal2 scFv, the Bce leader in the art to which this invention pertains. All publications and was found to be processed from TrxA (FIG. 10). Thioredoxin patent applications are herein incorporated by reference to the has been described as a model protein for identification of same extent as if each individual publication or patent appli co-translational secretion leaders as it folds rapidly in the cation was specifically and individually indicated to be incor cytoplasm (Huber et al. 2005 J. Bateriol. 187:2983). The porated by reference. successful secretion of soluble TrxA utilizing the Bce leader Although the foregoing invention has been described in may indicate that this leader acts in a co-translational manner Some detail by way of illustration and example for purposes to facilitate periplasmic secretion. of clarity of understanding, it will be obvious that certain All publications and patent applications mentioned in the changes and modifications may be practiced within the scope specification are indicative of the level of skill of those skilled of the appended claims.

SEQUENCE LISTING

<16 Oc NUMBER OF SEO ID NOS: 50

<210 SEQ ID NO 1 <211 LENGTH: 72 &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: mutant phosphate binding protein leader sequence (pbp) <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (72)

<4 OO SEQUENCE: 1 atgaaa citg aaa cqt ttg atg gcg gca atg act titt gtC gct gct ggc 48 Met Lys Lieu Lys Arg Lieu Met Ala Ala Met Thr Phe Val Ala Ala Gly 1. 5 1O 15

gtt gcg acc gtC aac gcg gtg gcc 72 Wall Ala Thr Wall Asn Ala Wall Ala 2O

<210 SEQ ID NO 2 <211 LENGTH: 24 &212> TYPE: PRT <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: mutant phosphate binding protein leader sequence (pbp)

<4 OO SEQUENCE: 2 Met Lys Lieu Lys Arg Lieu Met Ala Ala Met Thr Phe Val Ala Ala Gly 1. 5 1O 15

Wall Ala Thr Wall Asn Ala Wall Ala 2O

<210 SEQ ID NO 3 <211 LENGTH: 66 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (66

<4 OO SEQUENCE: 3

atg cqt aat ctd atc ct c agc gcc gct citc gtc act gcc agc ctic titc 48 Met Arg Asn Lieu. Ile Lieu. Ser Ala Ala Lieu Val Thir Ala Ser Lieu. Phe 1. 5 1O 15

ggc atg acc gca caa gCt 66 Gly Met Thr Ala Glin Ala 2O

<210 SEQ ID NO 4 <211 LENGTH: 22 &212> TYPE: PRT US 7,618,799 B2 55 56

- Continued

<213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 4 Met Arg Asn Lieu. Ile Lieu. Ser Ala Ala Lieu Val Thr Ala Ser Lieu. Phe 1. 5 10 15 Gly Met Thr Ala Glin Ala 2O

<210 SEQ ID NO 5 <211 LENGTH: 63 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (63)

<4 OO SEQUENCE: 5 atg cgc titg acc Cag att att gcc gcc gca gcc att gcg ttg gtt to C 48 Met Arg Lieu. Thr Glin Ile Ile Ala Ala Ala Ala Ile Ala Lieu Val Ser 1. 5 1O 15 acc titt gcg ct c goc 63 Thir Phe Ala Lieu. Ala 2O

<210 SEQ ID NO 6 <211 LENGTH: 21 &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 6 Met Arg Lieu. Thr Glin Ile Ile Ala Ala Ala Ala Ile Ala Lieu Val Ser 1. 5 10 15

Thir Phe Ala Lieu. Ala 2O

<210 SEQ ID NO 7 <211 LENGTH: 99 &212> TYPE: DNA <213> ORGANISM: Bacillus coagulans &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) ... (99)

<4 OO SEQUENCE: 7 atg agc aca ca at C C cc cqC ca caa tig citg aaa ggc gcc tog ggc 48 Met Ser Thr Arg Ile Pro Arg Arg Glin Trp Lieu Lys Gly Ala Ser Gly 1. 5 1O 15

Ctg Ctg gcc gcc gcg agc Ctg ggc cgg ttg gCC aac cqc gag gC9 cc 96 Lieu. Lieu Ala Ala Ala Ser Lieu. Gly Arg Lieu Ala Asn Arg Glu Ala Arg 2O 25 3 O gcc 99 Ala

<210 SEQ ID NO 8 <211 LENGTH: 33 &212> TYPE: PRT <213> ORGANISM: Bacillus coagulans <4 OO SEQUENCE: 8 Met Ser Thr Arg Ile Pro Arg Arg Glin Trp Lieu Lys Gly Ala Ser Gly 1. 5 10 15 Lieu. Lieu Ala Ala Ala Ser Lieu. Gly Arg Lieu Ala Asn Arg Glu Ala Arg 2O 25 3 O US 7,618,799 B2 57 58

- Continued

Ala

SEO ID NO 9 LENGTH: TYPE: DNA ORGANISM: Pseudomonas fluorescens FEATURE: NAME/KEY: CDS LOCATION: (1) . . . (75)

<4 OO SEQUENCE: atg tog tgc aca gca tto aaa cca citg ctg. Ctg atc ggc ctg gCC 48 Met Ser Cys Thir Ala Phe Llys Pro Lieu. Lieu. Lieu. Ile Gly Lieu Ala 1. 1O 15 a Ca Ctg atg tgt Cat gca ttic got Thir Luell Met Cys His Ala Phe Ala 2O 25

<210 SEQ ID NO 10 <211 LENGTH: 25 &212> TYPE : PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 10 Met Ser Cys Thr Arg Ala Phe Llys Pro Lieu. Lieu. Lieu. Ile Gly Lieu Ala 1. 5 10 15

Thr Lieu Met Cys Ser His Ala Phe Ala 25

SEQ ID NO 11 LENGTH: 72 TYPE: DNA ORGANISM: Pseudomonas fluorescens FEATURE: NAME/KEY: CDS LOCATION: (1) . . . (72)

<4 OO SEQUENCE: 11 atg citt titt cgc a Ca tta Ctg gcc agc citt acc titt gct gtc atc gcc 48 Met Luell Phe Arg Thir Lell Lell Ala Ser Lieu. Thir Phe Ala Wall Ile Ala 1. 1O 15

tta cc.g to c acg gcc CaC gcg 72 Luell Pro Ser Thir Ala His Ala

<210 SEQ ID NO 12 <211 LENGTH: 24 &212> TYPE : PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 12 Met Lieu. Phe Arg Thr Lieu. Lieu. Ala Ser Lieu. Thir Phe Ala Wall Ile Ala 1. 5 10 15

Gly Leu Pro Ser Thr Ala His Ala

<210 SEQ ID NO 13 <211 LENGTH: 69 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (69)

<4 OO SEQUENCE: 13

US 7,618,799 B2 61 62

- Continued

<4 OO SEQUENCE: 18 Met Llys Phe Lys Glin Lieu Met Ala Met Ala Lieu Lleu Lieu Ala Lieu. Ser 1. 5 10 15

Ala Wall Ala Glin Ala 2O

<210 SEQ ID NO 19 <211 LENGTH: 63 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (63)

<4 OO SEQUENCE: 19 atgaat aga tot toc gog titg ct c ctic got titt gtc titc ct c agc ggc 48 Met Asn Arg Ser Ser Ala Lieu. Lieu. Lieu Ala Phe Val Phe Lieu. Ser Gly 1. 5 1O 15 tgc cag gCC atg gcc 63 Cys Glin Ala Met Ala 2O

<210 SEQ ID NO 2 O <211 LENGTH: 21 &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 2O Met Asn Arg Ser Ser Ala Lieu. Lieu. Lieu Ala Phe Val Phe Lieu. Ser Gly 1. 5 10 15 Cys Glin Ala Met Ala 2O

<210 SEQ ID NO 21 <211 LENGTH: 99 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) ... (99)

<4 OO SEQUENCE: 21 atg caa aac cqc act gtg gaa at C ggt gtC ggc Ctt titc ttg ctg gct 48 Met Glin Asn Arg Thr Val Glu Ile Gly Val Gly Lieu. Phe Lieu. Lieu Ala 1. 5 1O 15 ggc at C Ctg gct tta Ctg ttgttg gcc ctg cga gtc agc ggc Ctt tog 96 Gly Ile Lieu Ala Lieu Lleu Lleu Lieu Ala Lieu. Arg Val Ser Gly Lieu. Ser 2O 25 3 O gcc 99 Ala

<210 SEQ ID NO 22 <211 LENGTH: 33 &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 22 Met Glin Asn Arg Thr Val Glu Ile Gly Val Gly Lieu. Phe Lieu. Lieu Ala 1. 5 10 15 Gly Ile Lieu Ala Lieu Lleu Lleu Lieu Ala Lieu. Arg Val Ser Gly Lieu. Ser 2O 25 3 O

Ala US 7,618,799 B2 63 64

- Continued

<210 SEQ ID NO 23 <211 LENGTH: 117 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (117)

<4 OO SEQUENCE: 23 atgtct citt cqt aat atgaat atc gcc ccg agg gcc titc ct c ggc titc 48 Met Ser Lieu. Arg Asn Met Asn. Ile Ala Pro Arg Ala Phe Lieu. Gly Phe 1. 5 1O 15 gcg titt att ggc gcc titg atgttg ttg Ct c ggt gtg titc gcg ctgaac 96 Ala Phe Ile Gly Ala Lieu Met Lieu. Lieu. Lieu. Gly Val Phe Ala Lieu. Asn 2O 25 3 O

Cag atg agc aaa att cqt gcg 117 Glin Met Ser Lys Ile Arg Ala 35

<210 SEQ ID NO 24 <211 LENGTH: 39 &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 24 Met Ser Lieu. Arg Asn Met Asn. Ile Ala Pro Arg Ala Phe Lieu. Gly Phe 1. 5 10 15 Ala Phe Ile Gly Ala Lieu. Met Lieu. Lieu. Lieu. Gly Val Phe Ala Lieu. ASn 2O 25 3 O Glin Met Ser Lys Ile Arg Ala 35

<210 SEQ ID NO 25 <211 LENGTH: 30 &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: oligonucleotide primer

<4 OO SEQUENCE: 25 aattactagt aggagg taca ttatgcgctt 3 O

<210 SEQ ID NO 26 <211 LENGTH: 30 &212> TYPE: DNA <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: oligonucleotide primer

<4 OO SEQUENCE: 26 tatact coag titatttaa.cc tdttt cagta 3 O

<210 SEQ ID NO 27 <211 LENGTH: 5 &212> TYPE: PRT <213> ORGANISM: Artificial Sequence &220s FEATURE: <223> OTHER INFORMATION: First 5 amino acids of the predicted protein sequence for the processed form of disbC-Skp

<4 OO SEQUENCE: 27 Ala Asp Llys Ile Ala 1. 5 US 7,618,799 B2 65 66

- Continued

SEQ ID NO 28 LENGTH: 10 TYPE PRT ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: First 10 amino acids of the predicted protein sequence for the unprocessed form of disbC-Skp

SEQUENCE: 28 Met Arg Lieu. Thr Glin Ile Ile Ala Ala Ala 1. 5 10

SEQ ID NO 29 LENGTH: 10 TYPE PRT ORGANISM: Artificial Sequence FEATURE: OTHER INFORMATION: First 10 amino acids of the predicted protein sequence for the processed form of disbC-Skp

SEQUENCE: 29 Ala Asp Llys Ile Ala Ile Val Asn Met Gly 1. 5 10

SEQ ID NO 3 O LENGTH: 63 TYPE: DNA ORGANISM: Pseudomonas fluorescens FEATURE: NAME/KEY: CDS LOCATION: (1) . . . (63)

SEQUENCE: 3 O atg aag aag to C acc ttg gct gtg gct gta acg ttg ggc gca at C gcc 48 Met Lys Llys Ser Thr Lieu Ala Val Ala Val Thir Lieu. Gly Ala Ile Ala 1. 5 1O 15

Cag caa gCaggc gct 63 Glin Glin Ala Gly Ala

SEQ ID NO 31 LENGTH: 21 TYPE PRT ORGANISM: Pseudomonas fluorescens

SEQUENCE: 31 Met Lys Llys Ser Thr Lieu Ala Val Ala Val Thir Lieu. Gly Ala Ile Ala 1. 5 10 15 Glin Glin Ala Gly Ala

SEQ ID NO 32 LENGTH: 72 TYPE: DNA ORGANISM: Pseudomonas fluorescens FEATURE: NAME/KEY: CDS LOCATION: (1) . . . (72)

SEQUENCE: 32 atgaaa citg aaa aac acc titg ggc titg goc att ggit tot citt att go c 48 Met Lys Lieu Lys Asn. Thir Lieu. Gly Lieu Ala Ile Gly Ser Lieu. Ile Ala 1. 5 1O 15 gct act tct titc ggc gtt c to go a 72 Ala Thr Ser Phe Gly Val Lieu Ala US 7,618,799 B2 67 68

- Continued

<210 SEQ ID NO 33 <211 LENGTH: 24 &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 33 Met Lys Lieu Lys Asn. Thir Lieu. Gly Lieu Ala Ile Gly Ser Lieu. Ile Ala 1. 5 10 15 Ala Thr Ser Phe Gly Val Lieu Ala 2O

<210 SEQ ID NO 34 <211 LENGTH: 72 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (72)

<4 OO SEQUENCE: 34 atg aaa citg aala C9t ttg atg gC9 goa atg act titt gtc gct gct ggc 48 Met Lys Lieu Lys Arg Lieu Met Ala Ala Met Thr Phe Val Ala Ala Gly 1. 5 1O 15 gtt gcg acc gcc aac gcg gtg gCC 72 Wall Ala Thir Ala Asn Ala Wall Ala 2O

<210 SEQ ID NO 35 <211 LENGTH: 24 &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 35 Met Lys Lieu Lys Arg Lieu Met Ala Ala Met Thr Phe Val Ala Ala Gly 1. 5 10 15

Wall Ala Thir Ala Asn Ala Wall Ala 2O

<210 SEQ ID NO 36 <211 LENGTH: 60 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (60)

<4 OO SEQUENCE: 36 atgttt gcc aaa ctic gtt gct gtt to c ctg. citg act Ctg gCd agc ggc 48 Met Phe Ala Lys Lieu Val Ala Val Ser Lieu. Lieu. Thir Lieu Ala Ser Gly 1. 5 1O 15 cag ttg citt got 6 O Gln Lieu. Lieu. Ala 2O

<210 SEQ ID NO 37 <211 LENGTH: 2O &212> TYPE: PRT <213> ORGANISM: Pseudomonas fluorescens

<4 OO SEQUENCE: 37 Met Phe Ala Lys Lieu Val Ala Val Ser Lieu. Lieu. Thir Lieu Ala Ser Gly 1. 5 10 15

Gln Lieu. Lieu. Ala

US 7,618,799 B2 75

- Continued <210 SEQ ID NO 46 <211 LENGTH: 438 &212> TYPE: PRT <213> ORGANISM: Bacillus coagulans <4 OO SEQUENCE: 46 Met Ser Thr Arg Ile Pro Arg Arg Glin Trp Lieu Lys Gly Ala Ser Gly 1. 5 10 15 Lieu. Lieu Ala Ala Ala Ser Lieu. Gly Arg Lieu Ala Asn Arg Glu Ala Arg 2O 25 3 O Ala Ala Glu Ala Ser Ala Ala Ala Pro Lieu. Asp Thr Gly Ser Lieu. Gly 35 4 O 45 Ala Ser Pro Arg Ala Thr Lieu. Asp Ala Cys Lieu Gln Lys Ala Val Asp SO 55 6 O Asp Gly. Thir Lieu Lys Ser Val Val Ala Met Ala Ala Thr Glu Arg Gly 65 70 7s 8O Lieu Ala Tyr Glin Gly Ala Arg Gly Pro Ala Asn Ala Ala Gly Glu Pro 85 9 O 95 le Gly Pro Asp Thr Val Phe Trp Met Leu Ser Met Thr Lys Ala Ile OO OS 1O Thir Ala Thr Ala Cys Met Gln Lieu. Ile Glu Glin Gly Arg Lieu. Gly Lieu. 15 2O 25 Asp Gln Pro Ala Ala Glu Ile Lieu Pro Glin Lieu Lys Ala Pro Glin Val 3O 35 4 O Lieu. Glu Gly Phe Asp Ala Ala Gly Glin Pro Arg Lieu. Arg Pro Ala Arg 45 SO 55 160 Arg Ala Ile Thr Val Arg His Leu Lieu. Thr His Thr Ser Gly Tyr Thr 65 70 7s yr Ser Ile Trp Ser Glu Ala Leu Gly Arg Tyr Glu Glin Val Thr Gly 8O 85 90 Met Pro Asp Ile Gly Tyr Ser Lieu. Asn Gly Ala Phe Ala Ala Pro Lieu. 95 2 OO 2O5 Glu Phe Glu Pro Gly Glu Arg Trp Glin Tyr Gly Ile Gly Met Asp Trp 210 215 22O Val Gly Lys Lieu Val Glu Ala Val Thir Asp Glin Ser Lieu. Glu Val Ala 225 23 O 235 24 O Phe Arg Glu Arg Ile Phe Ala Pro Leu Gly Met His Asp Thr Gly Phe 245 250 255 Lieu. Ile Gly Ser Ala Glin Lys Arg Arg Val Ala Thr Lieu. His Arg Arg 26 O 265 27 O Glin Ala Asp Gly Ser Lieu. Thr Pro Glu Pro Phe Glu Thr Asn Glin Arg 27s 28O 285 Pro Glu Phe Phe Met Gly Gly Gly Gly Leu Phe Ser Thr Pro Arg Asp 290 295 3 OO Tyr Lieu Ala Phe Lieu Gln Met Lieu. Lieu. Asn Gly Gly Ala Trp Arg Gly 3. OS 310 315 32O Glu Arg Lieu. Lieu. Arg Pro Asp Thr Val Ala Ser Met Phe Arg Asn Glin 3.25 330 335 Ile Gly Asp Lieu. Glin Val Arg Glu Met Lys Thr Ala Glin Pro Ala Trp 34 O 345 350 Ser Asn Ser Phe Asp Glin Phe Pro Gly Ala Thr His Lys Trp Gly Lieu. 355 360 365 Ser Phe Asp Lieu. Asn. Ser Glu Pro Gly Pro His Gly Arg Gly Ala Gly 37O 375 38O US 7,618,799 B2 77

- Continued Ser Gly Ser Trp Ala Gly Lieu. Lieu. Asn. Thr Tyr Phe Trp Ile Asp Pro 385 390 395 4 OO

Ala Lys Arg Val Thr Gly Ala Lieu. Phe Thr Glin Met Leu Pro Phe Tyr 4 OS 410 415

Asp Ala Arg Val Val Asp Lieu Tyr Gly Arg Phe Glu Arg Gly Lieu. Tyr 425 43 O Asp Gly Lieu. Gly Arg Ala 435

SEO ID NO 47 LENGTH: 324 TYPE: DNA ORGANISM: Escherichia coli FEATURE: NAME/KEY: CDS LOCATION: (1) ... (324.)

SEQUENCE: 47 agc gat aaa att att cac ctd act gac gac agt titt gac acg gat gta 48 Ser Asp Llys Ile Ile His Lieu. Thir Asp Asp Ser Phe Asp Thr Asp Val 1. 5 1O 15

Ct c aaa gC9 gaC ggg gcg atc ctic git c gat titc tgg gca gag tig tigc 96 Lieu Lys Ala Asp Gly Ala Ile Lieu Val Asp Phe Trp Ala Glu Trp Cys 25 3 O ggit cog to aaa atg atc gcc ccg att ctd gat gaa atc gct gac gala 144 Gly Pro Cys Llys Met Ile Ala Pro Ile Lieu. Asp Glu Ile Ala Asp Glu 35 4 O 45 tat cag ggc aaa citg acc gtt gca aaa citg aac atc gat caa aac cct 192 Tyr Glin Gly Lys Lieu. Thr Val Ala Lys Lieu. ASn Ile Asp Glin Asn. Pro SO 55 6 O ggc act gcg ccg aaa tat ggc at C cqt ggit atc. ccg act ctg ctg. Ctg 24 O Gly Thr Ala Pro Llys Tyr Gly Ile Arg Gly Ile Pro Thir Lieu. Lieu. Lieu. 65 70 7s 8O ttic aaa aac ggit gaa gtg gcg gca acc aaa gtg ggit gca citg tct aaa. 288 Phe Lys Asn Gly Glu Val Ala Ala Thir Lys Wall Gly Ala Lieu. Ser Lys 85 9 O 95 ggt cag titg aaa gag titc ctic gac got aac Ctg gcg 324 Gly Glin Lieu Lys Glu Phe Lieu. Asp Ala Asn Luell Ala 1 OO 105

SEQ ID NO 48 LENGTH: 108 TYPE PRT ORGANISM: Escherichia coli

SEQUENCE: 48

Ser Asp Llys Ile Ile His Lieu. Thir Asp Asp Ser Phe Asp Thr Asp Val 1. 5 10 15

Lieu Lys Ala Asp Gly Ala Ile Lieu Val Asp Phe Trp Ala Glu Trp Cys 25 3 O

Gly Pro Cys Llys Met Ile Ala Pro Ile Lieu. Asp Glu Ile Ala Asp Glu 35 4 O 45

Tyr Glin Gly Lys Lieu. Thr Val Ala Lys Lieu. ASn Ile Asp Glin Asn. Pro SO 55 6 O

Gly Thr Ala Pro Llys Tyr Gly Ile Arg Gly Ile Pro Thir Lieu. Lieu. Lieu. 65 70 7s 8O

Phe Lys Asn Gly Glu Val Ala Ala Thir Lys Wall Gly Ala Lieu. Ser Lys 85 9 O 95

Gly Glin Lieu Lys Glu Phe Lieu. Asp Ala Asn Luell Ala 1 OO 105 US 7,618,799 B2 79 80

- Continued

<210 SEQ ID NO 49 <211 LENGTH: 63 &212> TYPE: DNA <213> ORGANISM: Pseudomonas fluorescens &220s FEATURE: <221 NAME/KEY: CDS <222> LOCATION: (1) . . . (63)

<4 OO SEQUENCE: 49 atg aga aac Ctt Ctt Cagga atg Ctt gtC gtt att to tdt atg gca 48 Met Arg Asn Lieu. Lieu. Arg Gly Met Lieu Val Val Ile Cys Cys Met Ala 1. 5 1O 15 ggg at a gC9 gC9 gC9 63 Gly Ile Ala Ala Ala 2O

SEO ID NO 5 O LENGTH: 21 TYPE PRT ORGANISM: Pseudomonas fluorescens

<4 OO> SEQUENCE: 5 O Met Arg Asn Lieu. Lieu. Arg Gly Met Lieu Val Val Ile Cys Cys Met Ala 1. 5 10 15 Gly Ile Ala Ala Ala 2O

That which is claimed: 9. The vector of claim 5, wherein the protein or polypeptide 1. An isolated nucleic acid molecule comprising a secretion of interest is from an organism that is not a Pseudomonad. signal coding sequence for a secretion polypeptide selected 10. The vector of claim3, wherein the protein or polypep from the group consisting of 35 tide of interest is obtained from a eukaryotic organism. a) a nucleic acid molecule comprising the nucleotide 11. The vector of claim 10, wherein the protein or polypep sequence of SEQID NO:3; and tide of interest is obtained from a mammalian organism. b) a nucleic acid molecule which encodes a polypolypep 12. The vector of claim 5, further comprising a sequence encoding a linkage sequence between the signal coding tide comprising the amino acid sequence of SEQ ID 40 NO:4. sequence and a sequence encoding the protein or polypeptide 2. The nucleic acid molecule of claim 1, wherein said of interest. nucleic acid molecule has been adjusted to reflect the codon 13. The vector of claim 12, wherein the linkage sequence is preference of a host organism selected to express the nucleic cleavable by a signal peptidase. acid molecule. 14. The vector of claim 5, wherein the sequence encoding 3. A vector comprising a secretion signal coding sequence 45 the protein or polypeptide sequence of interest is operably selected from the group consisting of: linked to a sequence encoding a second signal sequence. a) a nucleic acid molecule comprising the nucleotide 15. The vector of claim 14, wherein the second signal sequence of SEQID NO:3; and sequence comprises a sequence targeted to an outer mem b) a nucleic acid molecule which encodes a polypolypep 50 brane secretion signal. tide comprising the amino acid sequence of SEQ ID 16. The vector of claim 3, wherein the vector further com NO:4. prises a promoter. 4. The vector of claim 3, wherein said nucleic acid mol 17. The vector of claim 16, wherein the promoter is native ecule has been adjusted to reflect the codon preference of a to a bacterial host cell. host organism selected to express the nucleic acid molecule. 55 18. The vector of claim 16, wherein the promoter is not 5. The vector of claim 3, wherein the secretion signal native to a bacterial host cell. coding sequence is operably linked to a sequence encoding a 19. The vector of claim 17, wherein the promoter is native protein or polypeptide of interest. to E. coli. 6. The vector of claim 5, wherein the protein or polypeptide 20. The vector of claim 16, wherein the promoter is an of interest is native to a host organism in which the protein or 60 inducible promoter. polypeptide of interest is expressed. 21. The vector of claim 16, wherein the promoter is a lac 7. The vector of claim3, wherein the protein or polypeptide promoter or a derivative of a lac promoter. of interest is native to Pfluorescens. 22. A recombinant cell comprising a secretion signal cod 8. The vector of claim 5, wherein the protein or polypeptide ing sequence for a secretion polypeptide selected from the of interest is obtained from a protein or polypeptide that is not 65 group consisting of: native to a host organism in which the protein or polypeptide a) a nucleic acid molecule comprising the nucleotide of interest is expressed. sequence of SEQID NO:3; and US 7,618,799 B2 81 82 b) a nucleic acid molecule which encodes a polypolypep 44. The method of claim 42, wherein the cell is grown at a tide comprising the amino acid sequence of SEQ ID high cell density. NO:4. 45. The method of claim 44, wherein the cell is grown at a 23. The cell of claim 22, wherein the secretion signal cell density of at least 20 g/L. coding sequence is in an expression vector. 46. The method of claim 42, further comprising purifying 24. The cell of claim 22, wherein the secretion signal the recombinant protein. coding sequence is operably linked to a sequence encoding a 47. The method of claim 46, wherein the recombinant protein or polypeptide of interest. protein is purified by affinity chromatography. 25. The cell of claim 24, wherein the cell expresses the 48. The method of claim 42, wherein the operable linkage protein or polypeptide of interest operably linked to the secre 10 of the protein or polypeptide of interest and the secretion tion signal polypeptide. signal polypeptide is cleavable by an enzyme native to the 26. The cell of claim 25, wherein the protein or polypeptide host cell. is expressed in a periplasmic compartment of the cell. 49. The method of claim 48, wherein the secretion signal 27. The cell of claim 25, wherein an enzyme in the cell polypeptide is cleaved from the protein or polypeptide of cleaves the secretion signal polypeptide from the protein or 15 interest during expression. polypeptide of interest. 50. The method of any claim 42, wherein the protein or 28. The cell of claim 22, wherein the cell is obtained from polypeptide of interest is native to the organism from which a bacterial host. the host cell is obtained. 29. The cell of claim 28, wherein the host is a 51. The method of claim 42, wherein the protein or Pseudomonad. polypeptide of interest is native to a Pfluorescens organism. 30. The cell of claim29, wherein the host is a Pfluorescens. 52. The method of claim 42, wherein the protein or 31. The cell of claim 28, wherein the host is an E. coli. polypeptide of interest is not native to the organism from 32. An expression system for expression of a protein or which the host cell is obtained. polypeptide of interest comprising: 53. The method of claim 42, wherein the protein or a) a host cell; and, 25 polypeptide of interest is obtained from an organism that is b) a vector comprising a nucleic acid molecule encoding not a Pseudomonad. the protein or polypeptide of interest operably linked to 54. The method of claim 42, wherein the protein or a secretion signal polypeptide selected from the group polypeptide of interest is obtained from a eukaryotic organ consisting of: 1S. i) a nucleic acid molecule comprising the nucleotide 30 55. The method of claim 42, wherein the recombinant sequence of SEQID NO:3; and protein comprises a sequence that includes at least two cys ii) a nucleic acid molecule which encodes a poly teine residues. polypeptide comprising the amino acid sequence of 56. The method of claim 42, wherein at least one disulfide SEQID NO:4. bond is formed in the recombinant protein in the cell. 33. The expression system of claim 32, wherein the host 35 57. The method of claim 42, further comprising a linkage cell expresses the protein or polypeptide of interest operably sequence between the signal polypeptide sequence and the linked to the Secretion signal polypeptide. sequence of the protein or polypeptide of interest. 34. The expression system of claim 33, wherein the protein 58. The method of claim 50, wherein at least 50% of the or polypeptide of interest is expressed in a periplasmic com protein or polypeptide of interest comprises a native amino partment of the cell. 40 terminus. 35. The expression system of claim 33, wherein an enzyme 59. The method of claim 58, wherein at least 80% of the in the cell cleaves the signal polypeptide from the protein or protein or polypeptide of interest comprises a native amino polypeptide of interest. terminus. 36. The expression system of claim 32, wherein the cell is 60. The method of claim 59, wherein at least 90% of the obtained from a bacterial host. 45 protein or polypeptide of interest comprises a native amino 37. The expression system of claim 32, wherein the host is terminus. a Pseudomonad. 61. The method of claim 42, wherein at least 50% of the 38. The expression system of claim 37, wherein the host is recombinant protein is active. Pfluorescens. 62. The method of claim 61, wherein at least 80% of the 39. The expression system of claim 36, wherein the host is 50 recombinant protein is active. E. coli. 63. The method of claim 42, wherein at least 50% of the 40. The expression system of claim 32, further comprising recombinant protein is expressed in a periplasmic compart a fermentation medium. ment. 41. The expression system of claim 40, wherein the fer 64. The method of claim 6, wherein at least 75% of the mentation medium comprises a chemical inducer. 55 recombinant protein is expressed in a periplasmic compart 42. A method for the expression of a recombinant protein in ment. a host cell comprising providing a host cell comprising a 65. The method of claim 64, wherein at least 90% of the vector encoding a protein or polypeptide of interest operably recombinant protein is expressed in a periplasmic compart linked to a secretion signal polypeptide selected from the ment. group consisting of 60 66. The method of claim 42, wherein the host cell is a i) a nucleic acid molecule comprising the nucleotide Pseudomonad cell. sequence of SEQID NO:3; and 67. The method of claim 66, wherein the cell is a Pfluo ii) a nucleic acid molecule which encodes a polypolypep rescens cell. tide comprising the amino acid sequence of SEQ ID 68. The method of claim 42, wherein the cell is an E. coli NO:4. 65 cell. 43. The method of claim 42, wherein the cell is grown in a mineral salts media. UNITED STATES PATENT AND TRADEMARK OFFICE CERTIFICATE OF CORRECTION PATENT NO. : 7,618,799 B2 Page 1 of 1 APPLICATIONNO. : 12/022789 DATED : November 17, 2009 INVENTOR(S) : Coleman et al. It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Title Page, Item (75) Inventor is corrected to read: -- Russell J. Coleman, San Diego (CA); Diane Retallack, Poway (CA); Thomas M. Ramseier, Newton (MA): Charles D. Hershberger, Poway (CA): Stacey Lee, San Diego (CA) --.

Signed and Sealed this Eighteenth Day of August, 2015 74-4-04- 2% 4 Michelle K. Lee Director of the United States Patent and Trademark Office