<<

US 20060110747A1

(19) United States
(12) Patent Application Publication
Ramseier et al.
(10) Pub. No.: US 2006/0110747 A1
(43) Pub. Date: May 25, 2006

(54) PROCESS FOR IMPROVED PROTEIN EXPRESSION BY STRAIN ENGINEERING

(75) Inventors: Thomas M. Ramseier, Poway, CA (US); Hongfan Jin, San Diego, CA (US); Charles H. Squires, Poway, CA (US)

Correspondence Address: KING & SPALDING LLP, 1180 PEACHTREE STREET, ATLANTA, GA 30309 (US)

(73) Assignee: Dow Global Technologies Inc., Midland, MI (US)

(21) Appl. No.: 11/189,375

(22) Filed: Jul. 26, 2005

Related U.S. Application Data

(60) Provisional application No. 60/591489, filed on Jul. 26, 2004.

Publication Classification

(51) Int. Cl.
C12Q 1/68 (2006.01)
G01N 33/53 (2006.01)
C12N 15/74 (2006.01)

(52) U.S. Cl. 435/6; 435/7.1; 435/471

(57) ABSTRACT

This invention is a process for improving the production levels of recombinant proteins or peptides or improving the level of active recombinant proteins or peptides expressed in host cells. The invention is a process of comparing two genetic profiles of a cell that expresses a recombinant protein and modifying the cell to change the expression of a gene product that is upregulated in response to the recombinant protein expression. The process can improve protein production or can improve protein quality, for example, by increasing solubility of a recombinant protein.

Patent Application Publication May 25, 2006 Sheet 1 of 15 US 2006/0110747 A1

Figure 1


Figure 2

[Figure 2 panel labels: RXF01961: HslV; RXF01957: HslU; RXF03987: CbpA; RXF05455: HtpG]

Figure 3

[Figure 3 panel labels: RXF05399: DnaK; RXF05406: DnaJ]

Figure 4

[Figure 4 panel titles: pbp:hGH vs. nitA Time Point Comparison; pbp:hGH vs. nitA Strain Comparison; GH Time Point Comparison]

Figure 5


Figure 6

Figure 7

[Figure 7 legend: DC369 (wt, hGH); DC372 (hslU, hGH); DC271 (wt, pbp:hGH); DC373 (hslU, pbp:hGH). X axis: Hours.]

Figure 8


Figure 9


Figure 10


Figure 11

[Figure 11 legend: DC369; DC206 hGH; hslU hGH. Y axis ticks: 0.00 to 6000.00. X axis: Hours.]

Figure 12

Figure 13

[Figure 13 labels: "Allelic exchange mutagenesis"; hslUV deletion plasmid pDOW2050 (TetR, pyrF+); region to be deleted (1917 bp); hslUV; genome; plasmid; 505 bp; 634 bp; (ΔhslUV); cross-in: Tet resistant; cross-out: 5-FOA tolerant. Panels: A. Plasmid cross-in; B. Tet resistant, FOA sensitive (plasmid cross-out); C. FOA tolerant.]

Figure 14

[Figure 14 legend: HJ104 (1); HJ104 (2); HJ117 (1); HJ117 (2). X axis: Hours after induction.]

Figure 15


PROCESS FOR IMPROVED PROTEIN EXPRESSION BY STRAIN ENGINEERING

CROSS REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to U.S. Provisional Application No. 60/591489, filed Jul. 26, 2004.

FIELD OF THE INVENTION

[0002] This invention is in the field of protein production, and in particular is a process for improving the production levels of recombinant proteins or peptides or improving the level of active recombinant proteins or peptides expressed in host cells.

BACKGROUND

[0003] More than 155 recombinantly produced proteins and peptides have been approved by the U.S. Food and Drug Administration (FDA) for use as biotechnology drugs and vaccines, with another 370 in clinical trials. Unlike small molecule therapeutics that are produced through chemical synthesis, proteins and peptides are most efficiently produced in living cells. In many cases, the cell or organism has been genetically modified to produce or increase the production of the protein.

[0004] When a cell is modified to produce large quantities of a target protein, the cell is placed under stress and often reacts by inducing or suppressing other proteins. The stress that a host cell undergoes during production of recombinant proteins can increase expression of, for example, specific proteins or cofactors to cause degradation of the overexpressed recombinant protein. The increased expression of compensatory proteins can be counterproductive to the goal of expressing high levels of active, full-length recombinant protein. Decreased expression or lack of adequate expression of other proteins can cause misfolding and aggregation of the recombinant protein. While it is known that a cell under stress will change its profile of protein expression, it is not known in any given example which specific proteins will be upregulated or downregulated.

Microarrays

[0005] Microarray technology can be used to identify the presence and level of expression of a large number of polynucleotides in a single assay. See, e.g., U.S. Pat. No. 6,040,138, filed Sep. 15, 1995, U.S. Pat. No. 6,344,316, filed Jun. 25, 1997, U.S. Pat. No. 6,261,776, filed Apr. 15, 1999, U.S. Pat. No. 6,403,957, filed Oct. 16, 2000, U.S. Pat. No. 6,451,536, filed Sep. 27, 2000, U.S. Pat. No. 6,532,462, filed Aug. 27, 2001, U.S. Pat. No. 6,551,784, filed May 9, 2001, U.S. Pat. No. 6,420,108, filed Feb. 9, 1998, U.S. Pat. No. 6,410,229, filed Dec. 14, 1998, U.S. Pat. No. 6,576,424, filed Jan. 25, 2001, U.S. Pat. No. 6,687,692, filed Nov. 2, 2000, U.S. Pat. No. 6,600,031, filed Apr. 21, 1998, and U.S. Pat. No. 6,567,540, filed Apr. 16, 2001, all assigned to Affymetrix, Inc.

[0006] U.S. Pat. No. 6,607,885 to E. I. duPont de Nemours and Co. describes methods to profile and identify gene expression changes after subjecting a bacterial cell to expression altering conditions by comparing a first and second microarray measurement. Wei et al. used a microarray analysis to investigate gene expression profiles of E. coli with lac gene induction (Wei Y., et al. (2001) High-density microarray-mediated gene expression profiling of Escherichia coli. J Bacteriol. 183(2):545-56). Other groups have also investigated transcriptional profiles regulated after mutation of endogenous genes or deletion of regulatory genes (Sabina, J. et al. (2003) Interfering with Different Steps of Protein Synthesis Explored by Transcriptional Profiling of Escherichia coli K-12. J Bacteriol. 185:6158-6170; Lee J H (2003) Global analyses of transcriptomes and proteomes of a parent strain and an L-threonine-overproducing mutant strain. J Bacteriol. 185(18):5442-51; Kabir M. M. et al. (2003) Gene expression patterns for metabolic pathway in pgi knockout Escherichia coli with and without phb genes based on RT-PCR. J Biotechnol. 105(1-2):11-31; Eymann C., et al. (2002) Bacillus subtilis functional genomics: global characterization of the stringent response by proteome and transcriptome analysis. J Bacteriol. 184(9):2500-20).

[0007] Gill et al. disclose the use of microarray technology to identify changes in the expression of stress related genes in E. coli after expression of recombinant chloramphenicol acetyltransferase fusion proteins (Gill et al. (2001) Genomic Analysis of High-Cell-Density Recombinant Escherichia coli Fermentation and "Cell Conditioning" for Improved Recombinant Protein Yield. Biotech. Bioengin. 72:85-95). The stress gene transcription profile, comprising only 16% of the total genome, at high cell density was used to evaluate "cell conditioning" strategies to alter the levels of chaperones, proteases, and other intracellular proteins prior to recombinant protein overexpression. The strategies for "conditioning" involved pharmacological manipulation of the cells, including through dithiothreitol and ethanol treatments.

[0008] Asai et al. described the use of microarray analysis to identify target genes activated by over-expression of certain sigma factors that are typically induced after cell stresses (Asai K., et al. (2003) DNA microarray analysis of Bacillus subtilis sigma factors of extracytoplasmic function family. FEMS Microbiol. Lett. 220(1):155-60). Cells overexpressing sigma factors as well as reporter genes linked to sigma factor promoters were used to show stress regulated gene induction.

[0009] Choi et al. described the analysis and up-regulation of metabolic genes that are down-regulated in high-density batch cultures of E. coli expressing human insulin-like growth factor fusion protein (IGF-I) (Choi et al. (2003) Enhanced Production of Insulin-Like Growth Factor I Fusion Protein in Escherichia coli by Coexpression of the Down-Regulated Genes Identified by Transcriptome Profiling. App. Envir. Microbio. 69:4737-4742). The focus of this work was on the metabolic changes that occur during high-density conditions after protein induction. Genes that were down-regulated after induction of recombinant protein production during high density growth conditions were identified, and specific metabolic genes that had been down-regulated were expressed in cells producing recombinant IGF-I. The work showed that increasing metabolic production of certain nucleotide bases and amino acids could increase protein production and that growth rates could be modified by increasing expression of a down-regulated metabolic transporter molecule. These strategies were designed to alter the cellular environment to reduce metabolic stresses associated with the protein production generally or with high density culture.
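As a rough illustration of the kind of profile comparison discussed in this background section, the following Python sketch compares two expression measurements of the same genes (for example, before and after induction of recombinant protein production) and flags genes whose log2 ratio passes a cutoff. It is a minimal sketch only: the function name, the gene names, the intensity values, and the twofold cutoff are hypothetical and are not taken from this application or from any of the cited references.

```python
import math


def classify_genes(baseline, induced, min_log2_ratio=1.0):
    """Compare two expression profiles and flag differentially expressed genes.

    baseline, induced: dicts mapping gene name -> signal intensity (> 0).
    min_log2_ratio: hypothetical cutoff; 1.0 corresponds to a twofold change.
    Returns (upregulated, downregulated) as lists sorted by gene name.
    """
    up, down = [], []
    # Only genes measured in both profiles can be compared.
    for gene in sorted(baseline.keys() & induced.keys()):
        ratio = math.log2(induced[gene] / baseline[gene])
        if ratio >= min_log2_ratio:
            up.append(gene)
        elif ratio <= -min_log2_ratio:
            down.append(gene)
    return up, down


# Hypothetical intensities, for illustration only.
baseline = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0, "geneD": 40.0}
induced = {"geneA": 420.0, "geneB": 240.0, "geneC": 20.0, "geneD": 85.0}

up, down = classify_genes(baseline, induced)
print("up-regulated:", up)
print("down-regulated:", down)
```

A symmetric log2 cutoff is used here so that a twofold increase and a twofold decrease are treated alike; real microarray analyses typically add normalization and replicate statistics, which this sketch omits.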

Protein Degradation the secondary and tertiary structure of the protein itself is of 0010 Unwanted degradation of recombinant protein pre critical importance. Any significant change in protein struc sents an obstacle to the efficient use of certain expression ture can yield a functionally inactive molecule, or a protein systems. The expression of exogenous proteins often with significantly reduced biological activity. In many cases, induces stress responses in host cells, which can be, for a host cell expresses folding modulators (FMs) that are example, natural defenses to a limited carbon source. All necessary for proper production of active recombinant pro cells contain a large number of genes capable of producing tein. However, at the high levels of expression generally degradative proteins. It is not possible to predict which required to produce usable, economically satisfactory bio proteases will be regulated by a given host in response to technology products, a cell often can not produce enough expression of a particular recombinant protein. For example, native folding modulator or modulators to process the the Pfluorescens contains up to 200 proteases and recombinant protein. related proteins. 0014. In certain expression systems, overproduction of exogenous proteins can be accompanied by their misfolding 0011. In the cytoplasm of E. coli, is generally and segregation into insoluble aggregates. In bacterial cells carried out by a group of proteases and molecules. these aggregates are known as inclusion bodies. In E. coli, Most early degradation steps are carried out by five ATP the network of folding modulators/chaperones includes the dependent Hsps: Lon/La FtsH/HflB, ClpAP ClpXP, and Hsp70 family. The major Hsp70 chaperone, DnaK, effi ClpYQ/HslUV (Gottesman S (1996) Proteases and their ciently prevents protein aggregation and Supports the refold targets in Escherichia coli. Annu. Rev. Genet. 30:465-506). ing of damaged proteins. 
The incorporation of heat shock Along with FtsH (an inner membrane-associated protease proteins into protein aggregates can facilitate disaggrega the of which faces the cytoplasm), ClpAP and tion. However, proteins processed to inclusion bodies can, in ClpXP are responsible for the degradation of proteins modi certain cases, be recovered through additional processing of fied at their carboxyl termini by addition of the non-polar the insoluble fraction. Proteins found in inclusion bodies destabilizing tail AANDENYALAA (Gottesman S, et al. typically have to be purified through multiple steps, includ (1998) The ClpXP and ClpAP proteases degrade proteins ing denaturation and renaturation. Typical renaturation pro with carboxyl-terminal tails added by the SSrA cesses for inclusion body targeted proteins involve attempts tagging system. Genes Dev. 12:1338-1347; Herman C, et al. to dissolve the aggregate in concentrated denaturant and (1998) Degradation of carboxy-terminal-tagged cytoplasmic Subsequent removal of the denaturant by dilution. Aggre proteins by the Escherichia coli protease HflB (FtsH). Genes gates are frequently formed again in this stage. The addi Dev. 12:1348-1355). tional processing adds cost, there is no guarantee that the in 0012 Several approaches have been taken to avoid deg vitro refolding will yield biologically active product, and the radation during recombinant protein production. One recovered proteins can include large amounts of fragment approach is to produce host strains bearing mutations in a impurities. protease gene. Baney X and Georgiou, for example, utilized a protease-deficient strain to improve the yield of a protein 0015. One approach to reduce protein aggregation is A-B-lactamase fusion protein (Baney X F. Georgiou G. 
through fermentation engineering, most commonly by (1991) Construction and characterization of Escherichia coli reducing the cultivation temperature (see Baneyx F (1999) strains deficient in multiple secreted proteases: protease III In vivo folding of recombinant proteins in Escherichia coli. degrades high-molecular-weight Substrates in vivo. J. Bac In Manual of Industrial Microbiology and Biotechnology, teriol 173: 2696-2703). Park et al. used a similar mutational Ed. Davies et al. Washington, DC: American Society for approach to improve recombinant protein activity 30% Microbiology ed. 2:551-565 and references therein). The compared with the parent strain of E. coli (Park S. et al. more recent realization that in vivo protein folding is (1999) Secretory production of recombinant protein by a assisted by molecular chaperones, which promote the proper high cell density culture of a protease negative mutant isomerization and cellular targeting of other polypeptides by Escherichia coli strain. Biotechnol. Progr. 15:164-167). U.S. transiently interacting with folding intermediates, and by Pat. Nos. 5,264,365 and 5,264,365 describe the construction foldases, which accelerate rate-limiting steps along the fold of protease-deficient E. coli, particularly multiply protease ing pathway, has provided additional approaches combat the deficient strains, to produce proteolytically sensitive problem of inclusion body formation (see for e.g. Thomas J polypeptides. PCT Publication No. WO 90/03438 describes Get al. (1997). Molecular chaperones, folding catalysts and the production of strains of E. coli that include protease the recovery of active recombinant proteins from E. coli: to deficient strains or strains including a protease inhibitor. fold or to refold. Appl Biochem Biotechnol, 66:197-238). Similarly, PCT Publication No. WO 02/48376 describes E. 0016. In certain cases, the overexpression of chaperones coli strains deficient in proteases DegP and Prc. 
has been found to increase the soluble yields of aggregation prone proteins (see Baneyx, F. (1999) Recombinant Protein Protein Folding Expression in E. coli Curr. Opin. Biotech. 10:411-421 and 0013 Another major obstacle in the production of recom references therein). The process does not appear to involve binant proteins in host cells is that the cell often is not dissolution of preformed recombinant inclusion bodies but is adequately equipped to produce either soluble or active related to improved folding of newly synthesized protein protein. While the primary structure of a protein is defined chains. For example, Nishihara et al. coexpressed groESL by its sequence, the secondary structure is and dna JK/grpE in the cytoplasm to improve the stability defined by the presence of alpha helixes or beta sheets, and and accumulation of recombinant Cry2 (an allergen of the ternary structure by covalent bonds between adjacent Japanese cedar pollen) (Nishihara K. Kanemori M. Kita protein stretches, such as disulfide bonds. When expressing gawa M. Yanagi H. Yura T. 1998. Chaperone coexpression recombinant proteins, particularly in large-scale production, plasmids: differential and synergistic roles of DnaK-DnaJ US 2006/01 10747 A1 May 25, 2006

GrpE and GroEL-GroES in assisting folding of an allergen decrease the expression of these particular proteases, while of Japanese cedar pollen, Cry2, in Escherichia coli. Appl. sparing other proteins that are useful or even necessary for Environ. Microbiol. 64:1694). Lee and Olins also coex cell homeostasis. pressed GroESL and DnaK and increased the accumulation of human procollagenase by tenfold (Lee S. Olins P. 1992. 0028. As another example, a cell may selectively upregu Effect of overproduction of heat shock chaperones GroESL late one or more folding modulators or cofactors to increase and DnaK on human procollagenase production in Escheri the folding capability or solubility of the recombinant pro chia coli. JBC 267:2849-2852). The beneficial effect asso tein. Again, it cannot be predicted in advance which folding ciated with an increase in the intracellular concentration of modulators or cofactors will be selected in a given system to these chaperones appears highly dependent on the nature of assist in the processing of a specific recombinant protein. the overproduced protein, and Success is by no means Analyzing the genetic profile by microarray or equivalent guaranteed. technology allows identification of the folding modulators or cofactors that have been upregulated. Based on this infor 0017. A need exists for processes for development of host mation, the cell is genetically modified to increase the strains that show improved recombinant protein or peptide expression of the selected folding modulators or cofactors production, activity or solubility in order to reduce manu preferred by the cell for the given recombinant protein. This facturing costs and increase the yield of active products. modification can increase the percent of active protein 0018. It is therefore an object of the invention to provide recovered, while minimizing the detrimental impact on cell processes for improving recombinant protein expression in homeostasis. a host. 0029. 
Therefore, the yield and/or activity and/or solubil ity of the recombinant protein can be increased by modify 0019. It is a further object of the invention to provide ing the host organism via either increasing or decreasing the processes that increase expression levels in host cells expression of a compensatory protein (i.e. a protein that is expressing recombinant proteins or peptides. upregulated in response to given cell stress) in a manner that 0020. It is another object of the invention to provide is selective and that leaves whole other beneficial mecha processes to increase the levels of Soluble protein made in nisms of the cell. recombinant expression systems. 0030 The process can be used iteratively until the 0021. It is yet another object of the invention to provide expression of active recombinant protein is optimized. For processes to increase the levels of active protein made in example, using the process described above, the host cell or recombinant expression systems. organism is genetically modified to upregulate, down regu late, knock-in or knock-out one or more identified compen SUMMARY satory proteins. The host cell or organism so modified can then be cultured to express the recombinant protein, or a 0022. A process is provided for improving the expression related protein or peptide, and additional compensatory of a recombinant protein or peptide comprising: proteins identified via microarray or equivalent analysis. 0023 i) expressing the recombinant protein or peptide in The modified host cell or organism is then again genetically a host cell; modified to upregulate, down regulate, knock-in or knock out the additional selected compensatory proteins. 
This 0024 ii) analyzing a genetic profile of the cell and process can be iterated until a host cell or organism is identifying one or more endogenous gene products that are obtained that exhibits maximum expression of active and/or up-regulated upon expression or overexpression of the soluble protein without undue weakening of the host organ recombinant protein or peptide; and ism or cell. These steps for example can be repeated for 0.025 iii) changing expression of one or more identified example, one, two, three, four, five, six, seven, eight, nine, endogenous gene products by genetically modifying the cell. or ten or more times. 0026. The process can provide improved expression as 0031. In another embodiment, the process further com measured by improved yields of protein, or can improve the prises: iv) expressing the recombinant protein or peptide in recovery of active protein, for example by increasing Solu a genetically modified cell. In yet another embodiment, the bility of the expressed recombinant protein, or a related process further comprises: V) analyzing a second genetic protein or peptide. profile of the genetically modified cell expressing recombi 0027. Using this process, it can be determined which of nant protein or peptide and identifying one or more addi the many cellular proteins are “chosen” by the cell to tional gene products that are differentially expressed in the compensate for the expression of the foreign recombinant modified cell expressing recombinant protein or peptide. In protein, and this information can lead to development of a further embodiment, the process additionally comprises: more effective protein expression systems. For example, it is vi) changing the expression of one or more identified known that, typically, a cell will selectively upregulate one additional gene products to provide a double modified cell. 
or more proteases to degrade an overexpressed recombinant Optionally, the recombinant protein or peptide, or a related protein. However, it cannot be predicted in advance which protein or peptide, can be expressed in the double modified protease(s) the cell will upregulate to compensate for the cell. The differentially regulated gene products identified in stress caused by any given recombinant protein. Analysis of the modified cell can be up- or down-regulated when com the cell's genetic profile by microarray or equivalent tech pared to the host cell or when compared to the modified cell nology can identify which proteases are upregulated in a not expressing recombinant protein or peptide. given cell in response to exogenous protein production. This 0032. In yet another embodiment, the process further information is then used to genetically modify the cell to comprises: iv) analyzing a second genetic profile of a US 2006/01 10747 A1 May 25, 2006 genetically modified cell expressing recombinant protein or tion products of genes from the genome. The process can peptide and identifying one or more additional gene prod include analyzing the transcriptome profile using a microar ucts that are differentially expressed in the modified cell that ray or equivalent technology. In this embodiment, the is not expressing recombinant protein or peptide. In a further microarray can include binding partners to at least a portion embodiment, the process additionally comprises: V) chang of the transcriptome of the host cell, and typically includes ing the expression of one or more additional identified gene samples from binding partners to gene products of at least products in the modified cell to provide a double modified 50% of the genome of the organism. More typically, the cell. The differentially regulated gene products identified in microarray includes samples from at least 80%, 90%. 
95%, the modified cell can be up- or down-regulated when com 98%, 99% or 100% of the binding partners to gene products pared to the host cell or organism or when compared to the in the genome of the host cell. modified cell not expressing recombinant protein or peptide. 0037. In a separate embodiment, the microarray can 0033. In one specific embodiment, a process is provided include a selected Subset of binding partners to genes or gene for improving the expression of a recombinant protein or products which represent classes of products that are peptide comprising: i) expressing the recombinant protein or affected by the recombinant protein expression. Nonlimiting peptide in a host cell; ii) analyzing a genetic profile of the examples include putative or known proteases, co-factors of cell and identifying at least one protease that is up-regulated proteases or protease-like proteins; folding modulators, co when the recombinant protein or peptide is expressed; and factors of folding modulators or proteins that may improve iii) changing expression of an identified protease by geneti protein folding or Solubility; transcription factors; proteins cally modifying the host cell or organism to reduce the involved in nucleic acid stability or translational initiation; expression of the upregulated protease. In a further embodi kinases; extracellular or intracellular receptors; metabolic ment, the process comprises changing the expression of at ; metabolic cofactors; envelope proteins; sigma least a second identified protease in the modified cell to factors; membrane bound proteins; transmembrane proteins; provide a double protease modified cell. In another embodi membrane associated proteins and housekeeping genes. 
The ment, the process further comprises: iv) expressing the genetic profile can be analyzed by measuring the binding of recombinant protein or peptide, or a related protein or the expressed genes of the host cell expressing the recom peptide, in a protease modified cell. In another embodiment, binant protein or peptide to the microarray. The transcrip the process further comprises analyzing a second genetic tome profile can also be analyzed using non-microarray profile of the protease modified cell to identify one or more assays such as blot assays, including northern blot assays, or additional gene products that are differentially expressed in columns coated with binding partners. the modified cell. 0038. In another embodiment, the genetic profile ana 0034. In another embodiment, a process is provided for lyzed can be a proteome profile, i.e. a profile of the proteins improving the expression of a recombinant protein or pep produced from genes in a given organism. The process can tide comprising: i) expressing the recombinant protein or include analyzing the proteome profile using, for example, peptide in a host cell; ii) analyzing a genetic profile of the two-dimensional electrophoresis. Techniques like mass cell and identifying at least one up-regulated folding modu spectrometry in combination with separation tools such as lator (FM) that is up-regulated after overexpression of the two-dimensional gel electrophoresis or multidimensional recombinant protein or peptide; and iii) changing expression liquid chromatography, can also be used in the process. In of at least one identified folding modulator by genetically two dimensional electrophoresis, the proteins separated can modifying the cell to provide a FM modified cell. In a further include proteins from at least 10% of the proteome of the embodiment, the process comprises changing the expression organism. 
More typically, proteins from at least 20%, 30%, of at least a second identified folding modulator in the 40%, 60%, 80% or 90% of the proteins in the proteome of modified cell to provide a double FM modified cell. In the host cell are separated and analysed by techniques such another embodiment, the process further comprises: iv) as staining of proteins and/or mass spectrometry. expressing the recombinant protein or peptide, or a related 0039. In additional embodiment, the proteome profile is protein or peptide, in a FM modified cell. In another embodi analyzed using mass spectrometry. There are several related ment, the process further comprises analyzing a second techniques that use liquid chromatography (LC) coupled to genetic profile of the FM modified cell to identify one or mass spectrometry (MS) and tandem mass spectrometry more additional gene products that are differentially (MS/MS) to identify proteins and measure their relative expressed in the modified cell. abundance. Often, one sample is labeled with a heavy 0035. The term “genetic profile' as used herein is meant isotope tag that allows for comparison to another sample to include an analysis of genes in a genome, mRNA tran without changing the chemical properties. For example, in scribed from genes in the genome (or the equivalent cDNA), one sample the amino acid cysteine can be labeled with a tag transcription products that have been modified by a cell such containing eight hydrogen atoms. The other sample is as splice variants of genes in eukaryotic systems, or proteins labeled with a tag that contains eight deuterium (“heavy') or peptides translated from genes in a genome, including atoms instead (+8 Daltons). MS data can be used to find proteins that are modified by the cell or translated from pairs of peptides 8 Daltons apart and quantitate the differ splice variants of mRNA translated from the genome. A ence. 
MS/MS data from the same peptides provides an genetic profile is meant to include more than one gene or approximation of primary sequence, and the protein ID. gene product, and typically includes a group of at least 5, 10. Other experiments label the proteins in vivo by growing 50, 100 or more genes or gene products that are analyzed. cells with “heavy” amino acids. These types of techniques can be used to identify thousands of proteins in a single 0036). In one embodiment, the genetic profile analyzed experiment and estimate relative abundance if present in can be a transcriptome profile, i.e. a profile of the transcrip both samples (see Goodlett D R and Aebersold R H (2001). US 2006/01 10747 A1 May 25, 2006

Mass Spectrometry in Proteomics. Chem Rev 101:269-295). ICAT, a type of MS/MS, stands for Isotope Coded Affinity Tags (see Gygi S P, Rist B, Gerber S A, Turecek F, Gelb M H, and Aebersold R H (1999). Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotech 17:994-999).

[0040] In another embodiment, the process can include analyzing the proteome profile using, for example, a microarray. In this embodiment, the array can include binding partners to at least a portion of the proteins expressed by the host cell under appropriate growth conditions, and typically includes binding partners to proteins from at least 10% of the proteome of the organism. More typically, the microarray includes binding partners to proteins from at least 20%, 30%, 40%, 60%, 80% or 90% of the proteins in the proteome of the host cell. The binding partners can be antibodies, which can be antibody fragments such as single chain antibody fragments. In a separate embodiment, the microarray can include binding partners for a selected subset of proteins from the proteome, including, for example, putative protease proteins or putative folding modulators. The microarray can typically also include a set of binding partners to proteins that are used as controls. The genetic profile can be analyzed by measuring the binding of the proteins of the host cell expressing the recombinant protein or peptide to the binding partners on the microarray. The proteome profile can also be analyzed in a standard assay format, such as an ELISA assay or a standard western blot assay.

[0041] The samples in the genetic profile can be analyzed individually or grouped into clusters. The clusters can typically be grouped by similarity in gene expression. In particular embodiments, the clusters can be grouped as genes that are upregulated to a similar extent or genes that are down-regulated to a similar extent.

[0042] The identified up-regulated gene is typically identified by comparing a genetic profile of the host cell expressing the recombinant protein or peptide to a genetic profile of the host cell not expressing the recombinant protein or peptide. In a further embodiment, a host cell expressing a protein homologous to the first recombinant protein is analyzed.

[0043] The genome of the host cell expressing the recombinant protein or peptide can be modified by recombination, for example homologous recombination or heterologous recombination. The genome can also be modified by mutation of one or more nucleotides in an open reading frame encoding a gene, particularly an identified protease. In another embodiment, the host cell is modified by including one or more vectors that encode an inhibitor of an identified gene or gene product, such as a protease inhibitor. In another embodiment, the host cell is modified by inhibition of a promoter, which can be a native promoter. In a separate embodiment, the host cell is modified by including one or more vectors that encode a gene, typically a folding modulator or a cofactor of a folding modulator. In another embodiment, the host cell is modified by enhancing a promoter for an identified folding modulator or a cofactor for a folding modulator, including by adding an exogenous promoter to the host cell genome.

[0044] The host cell can be any cell capable of producing recombinant protein or peptide. In one embodiment, the host cell is a prokaryote, such as a bacterial cell including, but not limited to an Escherichia or a Pseudomonas species. The host cell may be a Pseudomonad cell such as a P. fluorescens cell. In other embodiments, the host cell is an E. coli cell. In another embodiment the host cell is a eukaryotic cell, for example an insect cell, including but not limited to a cell from a Spodoptera, Trichoplusia, Drosophila or an Estigmene species, or a mammalian cell, including but not limited to a murine cell, a hamster cell, a monkey, a primate or a human cell. In another embodiment, the host cell is a plant cell, including, but not limited to, a tobacco cell, corn, a cell from an Arabidopsis species, potato or rice cell. In another embodiment, a whole organism is analyzed in the process, including but not limited to a transgenic organism.

[0045] In one embodiment, the identified upregulated compensatory genes or gene products are one or more proteases and/or one or more folding modulators. In certain embodiments, an identified gene or gene product can also be a subunit of a protease or a folding modulator or a cofactor of a protease or a cofactor of a folding modulator. In one embodiment, the identified gene can be selected from a serine, threonine, cysteine, aspartic or metallo peptidase. In certain other embodiments, the identified gene or gene product can be selected from hslV, hslU, clpA, clpB and clpX. The identified gene or gene product can also be a cofactor of a protease. In another embodiment, the identified gene or gene product is a folding modulator. In certain embodiments, the identified gene or gene product can be selected from a chaperone protein, a foldase, a peptidyl prolyl isomerase and a disulfide bond isomerase. In one embodiment, the identified gene or gene product can be selected from htpG, cbpA, dnaJ, dnaK and fkbP. In one embodiment, a gene or gene product homologous to the identified up-regulated gene is modified in the genome of the host.

[0046] The process can lead to increased production of recombinant protein or peptide in a host cell by, for example, increasing the amount of protein per gram of host protein (total cell protein) in a given amount of time, or increasing the length of time during which the cell or organism is producing the recombinant protein. The increased production may optimize the efficiency of the cell or organism by, for example, decreasing the energy expenditure, increasing the use of available resources, or decreasing the requirements for growth supplements in growth media. The increased production may also result in an increased level of recoverable protein or peptide, such as soluble protein, produced per gram of recombinant protein or per gram of host cell protein.

[0047] The invention also includes an improved recombinant host cell that is produced by the claimed process.

BRIEF DESCRIPTION OF THE DRAWINGS

[0048] FIG. 1 is a graph of a growth comparison (optical density over time) of different strains of P. fluorescens. The cells were induced with 0.3 M of IPTG at 24 hr after inoculation. The strains are: DC280 harboring the empty vector pDOW1339, DC240 that produces the soluble cytoplasmic nitrilase, and DC271 that produces the partially insoluble periplasmic hGH. DC206, the parental strain of DC280, DC240, and DC271, was included as a control. Samples were taken at 0 and 4 hrs post-IPTG induction for RNA isolation and gene expression profiling, as indicated by arrows.

[0049] FIG. 2 is a graph of hierarchical clustering of all genes from P. fluorescens strains DC280, DC240 and DC271 into 12 clusters at 4 hr after IPTG when compared to 0 hr IPTG (indicated at the bottom of the figure). Based on the value and trend, genes were clustered and grouped using the hierarchical clustering algorithm from Spotfire DecisionSite. Broken lines indicate data points that were filtered out due to poor spot quality or low level of expression. The x-axis represents the comparison of each strain; the y-axis represents the expression value 4 hrs after IPTG induction relative to the value before induction. All the identified FMs are highlighted. Cluster 7 shows 2 FM and 2 protease subunit genes that are highly expressed in strain DC271, which overproduces the periplasmic hGH protein. The remaining FM genes are grouped in cluster 6.

[0050] FIG. 3 is a hierarchical cluster analysis of cluster 6 from FIG. 2. In the new cluster 8, two folding modulators, DnaK and DnaJ, were identified, both of which showed higher expression levels for periplasmic recombinant protein production, similar to the previously identified HslVU, CbpA, and HtpG. Cluster 6 shows where the rest of the FMs are grouped.

[0051] FIG. 4 is a Venn diagram showing the up-regulated proteases and FMs from the three sets of experiments in Tables 5, 6 and 7. As summarized in Tables 5, 6 and 7, the lists of genes were organized in a Venn diagram to highlight the overlap of the gene lists among the three sets of experiments indicated at the corners. For each gene, the ratio for each experiment is shown, with 2 as the cut-off.

[0052] FIG. 5 is a graph of the sequence analysis of the hslV (RXF01961) and hslU (RXF01957) genes from P. fluorescens generated by Artemis. The codon usage plot (top panel) indicates that the gene boundaries are correct. This is corroborated by the best homologues of HslV and HslU protein sequences to P. aeruginosa as indicated beneath the genes of RXF01961 and RXF01957. The Phrap quality score plot shows that the sequence quality is good, i.e. the score line is above the horizontal line indicating a better quality than 1 error in 10 kb (middle panel). The open white pointed boxes below the genes show the location of the probes generated for use in the DNA microarray experiments.

[0053] FIG. 6 is a schematic illustration of an hslU mutant construction where an approximately 550 bp PCR product of hslU (light blue box) was ligated into TOPO TA2.1 cloning vector (circle). The resulting plasmid was transformed into competent P. fluorescens cells and kanamycin (kan)-resistant colonies were analyzed in diagnostic PCR to confirm the construction of an insertion mutation in the hslU gene.

[0054] FIG. 7 is a graph of growth curve assays comparing wild type with hslU mutant strain overproducing hGH or pbp::hGH in shake flask production medium. The arrows indicate time points where samples were taken.

[0055] FIG. 8 is an image of SDS-PAGE analysis of strains DC271 and DC373 expressing pbp::hGH. Samples were taken from DC271 (wild-type, W) and DC373 (hslU mutant, M) just before protein induction (0 hr) and then 4 hr, 8 hr, 24 hr, and 30 hr after IPTG addition. Soluble (S) and insoluble (I) fractions were prepared for each sample analyzed. The production of unprocessed and processed hGH is indicated by arrows. The molecular weight (MW) marker (Ma) is shown on the right hand side of the gels.

[0056] FIG. 9 is an image of the SDS-PAGE analysis of strains DC369 and DC372 expressing hGH in the cytoplasm. Samples were taken from DC369 (wild-type, W) and DC372 (hslU mutant, M) just before protein induction (0 hr) and then 4 hrs, 8 hrs, 24 hrs, 30 hrs, and 50 hrs after IPTG addition. Soluble (S) and insoluble (I) fractions were prepared for each sample analyzed. The production of hGH is indicated by an arrow. The molecular weight (MW) marker (Ma) is shown on the right hand side of the gels.

[0057] FIG. 10 is a graph of growth curves of strains expressing the hGH::COP fusion protein. The strains include: DC369 expressing hGH only (not fused to COP) as a negative control; HJ104, the wild type expressing hGH::COP; HJ105, the hslU mutant expressing hGH::COP.

[0058] FIG. 11 is a graph of the green fluorescence activity measurements for strains expressing the hGH::COP fusion protein using a fluorimeter. Five OD600 of cell culture were sampled for each strain harboring hGH or hGH::COP at different time points after IPTG induction. The strains tested include: DC369 expressing hGH only (not fused to COP) as a negative control; HJ104, the wild type expressing hGH::COP; HJ105, the hslU mutant expressing hGH::COP. The inserted table shows percent increase of relative fluorescence in the hslU mutant compared to the wild type at different time points after IPTG induction.

[0059] FIG. 12 is a pictorial representation of the process of measuring relative abundance of mRNA between two samples.

[0060] FIG. 13 is a representation of the construction of a chromosomal deletion of the hslUV gene in a pyrF-negative strain. A. Plasmid pDOW2050 contains 505 bp and 634 bp DNA fragments flanking the hslUV gene. Since suicide plasmid pDOW2050 cannot replicate in P. fluorescens, tetracycline-resistant cells will only be generated after a single recombination event at one of the homologous regions that integrates the entire plasmid into the genome. B. Tetracycline-resistant cells contain the entire plasmid integrated into the genome. These cells also contain the pyrF gene encoded from the plasmid. Cells in which the second recombination event has occurred were selected by plating cells on agar plates supplemented with FOA, which in pyrF-positive strains is converted into a toxic compound. C. The chromosomal deletion strain was confirmed by sequencing analysis.

[0061] FIG. 14 is a graph of relative fluorescence over time for green fluorescence activity measurements for the strains expressing the hGH::COP fusion protein using a fluorimeter. Duplicates were used for both the wild type (HJ104) and the hslUV deletion strain (HJ117).

[0062] FIG. 15 is images of SDS-PAGE gels of strains expressing hGH with or without folding modulators GrpE DnaKJ. Samples were removed at various times after induction by IPTG (0, 4, 8, 24 and 48 hr), normalized to OD600 of 20 and lysed using EasyLyse. The soluble (S) and insoluble (I) fractions were separated on a BioRad Criterion 15% Tris-HCl SDS-PAGE gel and stained with Coomassie.

DETAILED DESCRIPTION

[0063] A process is provided for improving the expression of a recombinant protein or peptide comprising i) expressing the recombinant protein or peptide in a host cell; ii) analyzing a genetic profile of the cell and identifying one or more endogenous up-regulated gene products, including one or more proteases or folding modulators that are up-regulated upon expression of the recombinant protein or peptide; and iii) changing expression of one or more identified gene products by genetically modifying the cell. In another embodiment, the process further comprises expressing the recombinant protein or peptide in a genetically modified cell. In another embodiment, the process further comprises analyzing a second genetic profile of the genetically modified cell to identify one or more additional gene products that are differentially expressed in the modified cell. In a further embodiment, the process comprises changing the expression of at least a second identified gene product in the modified cell to provide a double modified cell. The process can provide improved expression as measured by improved yields of protein, or can improve the recovery of active protein, for example by increasing solubility of the expressed recombinant protein.

[0064] More generally, the invention includes a process for improving the expression of a recombinant protein or peptide in a host cell or organism comprising:

[0065] i) expressing the recombinant protein or peptide in the recombinant host cell or organism;

[0066] ii) analyzing a genetic profile of the recombinant cell to identify a compensatory gene or gene product that is expressed at a higher level in the recombinant cell than in one of either a host cell that has not been modified to express the recombinant protein or a recombinant cell that is not expressing the recombinant protein; and

[0067] iii) changing expression of the identified compensatory gene or gene product in the recombinant cell by genetic modification to provide a modified recombinant cell that achieves an increase in recombinant protein expression, activity or solubility.

[0068] Throughout the specification, when a range is provided, it should be understood that the components are meant to be independent. For example, a range of 1-6 means independently 1, 2, 3, 4, 5 or 6.

[0069] The steps of the process are described in more detail below.

Step I: Genetic Modification of Host Cell or Organism to Express a Recombinant Protein or Peptide in a Host Cell

[0070] In the first step of the process, a host cell is modified to have the capacity to express a recombinant protein or peptide. The host cell can be modified using any techniques known in the art. For example, the recombinant protein can be expressed from an expression vector that is exogenous to the genome of the cell and that is transfected or transformed into the cell. The construction of expression vectors as well as techniques for transfection or transformation are described below. The host cell can also be modified to express a recombinant protein or peptide from a genomic insert as described below. A gene encoding the recombinant protein or peptide can be inserted into the genome of the host cell or organism by techniques such as homologous or heterologous recombination. These techniques are described below.

[0071] The recombinant protein or peptide can be expressed under the control of an element that requires further manipulation of the cell. For example, chemical treatment of the cell may be required to initiate or enhance protein or peptide expression. Promoter and repressor elements that govern the expression of recombinant proteins or peptides in host cells are described below and are well known in the art. These can include promoter elements based on the "tac" promoter, responsive to IPTG.

Selection of a Host Cell or Organism

[0072] The process of the invention can be used in any given host system, including those of either eukaryotic or prokaryotic origin. The process is generally limited only by the availability of enough genetic information for analysis of a genetic profile to identify an identified gene. Although it is generally typical that representative sequences from a large percentage of the genome are available, for example at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99% or 100% of the sequences expressed or found in the genome, transcriptome, or proteome, the invention can be practiced using only a portion of the sequences in the genome, transcriptome, or proteome. In particular, in instances when the information available includes information on a group of related sequences, such as a metabolically linked group, only a small portion of representative sequences from the genome can be used for the process of the invention. The process is also not limited to particular recombinant proteins being expressed, as a key aspect of the process is the capacity to rationally and iteratively design expression systems based on techniques for identifying cellular changes that occur in a host cell upon expression of recombinant proteins or peptides and modulating the host cell using procedures known in the art.

[0073] The host cell can be any cell capable of producing recombinant protein or peptide. In one embodiment, the host cell is a microbial cell, i.e. a cell from a bacterium, fungus, yeast, or other unicellular eukaryotes, prokaryotes and viruses. The most commonly used systems to produce recombinant proteins or peptides include certain bacterial cells, particularly E. coli, because of their relatively inexpensive growth requirements and potential capacity to produce protein in large batch cultures. Yeast are also used to express biologically relevant proteins and peptides, particularly for research purposes. Systems include Saccharomyces cerevisiae or Pichia pastoris. These systems are well characterized, provide generally acceptable levels of total protein expression and are comparatively fast and inexpensive. Insect cell expression systems have also emerged as an alternative for expressing recombinant proteins in biologically active form. In some cases, correctly folded proteins that are post-translationally modified can be produced. Mammalian cell expression systems, such as Chinese hamster ovary cells, have also been used for the expression of recombinant proteins. On a small scale, these expression systems are often effective. Certain biologics can be derived from mammalian proteins, particularly in animal or human health applications. In another embodiment, the host cell is a plant cell, including, but not limited to, a tobacco cell, corn, a cell from an Arabidopsis species, potato or rice cell. In another embodiment, a multicellular organism is analyzed or is modified in the process, including but not limited to a transgenic organism. Techniques for analyzing and/or modifying a multicellular organism are generally based on the techniques for modifying cells described below.
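As a rough illustration of steps i) to iii) of the process set out above, the comparison of two genetic profiles can be sketched in Python. This is only a minimal sketch under stated assumptions: the expression values, the annotation table and the function names are hypothetical and are not taken from the specification. The ratio cut-off of 2 follows the FIG. 4 description; treating the cut-off as inclusive is an assumption.

```python
from typing import Dict, List

# Ratio cut-off of 2, as mentioned in the FIG. 4 description.
# Treating it as "greater than or equal to" is an assumption.
FOLD_CUTOFF = 2.0


def upregulated_genes(
    expressing: Dict[str, float],
    not_expressing: Dict[str, float],
    cutoff: float = FOLD_CUTOFF,
) -> Dict[str, float]:
    """Return {gene: ratio} for genes whose level in the cell expressing the
    recombinant protein is at least `cutoff` times the reference level."""
    hits: Dict[str, float] = {}
    for gene, level in expressing.items():
        reference = not_expressing.get(gene)
        if reference is None or reference <= 0:
            continue  # no usable reference value, skip the gene
        ratio = level / reference
        if ratio >= cutoff:
            hits[gene] = ratio
    return hits


def candidate_targets(hits: Dict[str, float], annotations: Dict[str, str]) -> List[str]:
    """Keep only up-regulated genes annotated as proteases or folding modulators."""
    wanted = {"protease", "folding modulator"}
    return sorted(gene for gene in hits if annotations.get(gene) in wanted)


# Hypothetical, made-up expression values for illustration only.
not_expressing = {"hslV": 100.0, "hslU": 120.0, "dnaK": 200.0, "rpoD": 300.0}
expressing = {"hslV": 450.0, "hslU": 500.0, "dnaK": 380.0, "rpoD": 310.0}
annotations = {
    "hslV": "protease",
    "hslU": "protease",
    "dnaK": "folding modulator",
    "rpoD": "other",
}

hits = upregulated_genes(expressing, not_expressing)
targets = candidate_targets(hits, annotations)
print(targets)
```

With these made-up numbers only the two protease subunit genes pass the cut-off; the genes returned would then be candidates for the genetic modification of step iii).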

[0074] In one embodiment, the host cell can be a prokaryote such as a bacterial cell including, but not limited to an Escherichia or a Pseudomonas species. Typical bacterial cells are described, for example, in "Biological Diversity: Bacteria and Archaeans", a chapter of the On-Line Biology Book, provided by Dr M J Farabee of the Estrella Mountain Community College, Arizona, USA at URL: http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDiversity_2.html. In certain embodiments, the host cell can be a Pseudomonad cell, and can typically be a P. fluorescens cell. In other embodiments, the host cell can also be an E. coli cell. In another embodiment the host cell can be a eukaryotic cell, for example an insect cell, including but not limited to a cell from a Spodoptera, Trichoplusia, Drosophila or an Estigmene species, or a mammalian cell, including but not limited to a murine cell, a hamster cell, a monkey, a primate or a human cell.

[0075] In certain embodiments, the host cell is a Pseudomonad cell, and can be for example a P. fluorescens organism.

[0076] In one embodiment, the host cell can be a member of any of the bacterial taxa. The cell can, for example, be a member of any species of eubacteria. The host can be a member of any one of the taxa: Acidobacteria, Actinobacteria, Aquificae, Bacteroidetes, Chlorobi, Chlamydiae, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Deinococcus, Dictyoglomi, Fibrobacteres, Firmicutes, Fusobacteria, Gemmatimonadetes, Lentisphaerae, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes, Thermodesulfobacteria, Thermomicrobia, Thermotogae, Thermus (Thermales), or Verrucomicrobia. In one embodiment of a eubacterial host cell, the cell can be a member of any species of eubacteria, excluding Cyanobacteria.

[0077] The bacterial host can also be a member of any species of Proteobacteria. A proteobacterial host cell can be a member of any one of the taxa Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, or Epsilonproteobacteria. In addition, the host can be a member of any one of the taxa Alphaproteobacteria, Betaproteobacteria, or Gammaproteobacteria, and a member of any species of Gammaproteobacteria.

[0078] In one embodiment of a Gammaproteobacterial host, the host will be a member of any one of the taxa Aeromonadales, Alteromonadales, Enterobacteriales, Pseudomonadales, or Xanthomonadales, or a member of any species of the Enterobacteriales or Pseudomonadales. In one embodiment of a host cell of the order Enterobacteriales, the host cell will be a member of the family Enterobacteriaceae, or a member of any one of the genera Erwinia, Escherichia, or Serratia, or a member of the genus Escherichia. In one embodiment of a host cell of the order Pseudomonadales, the host cell will be a member of the family Pseudomonadaceae, even of the genus Pseudomonas. Gammaproteobacterial hosts include members of the species Escherichia coli and members of the species Pseudomonas fluorescens.

[0079] Other Pseudomonas organisms may also be useful. Pseudomonads and closely related species include Gram(-) Proteobacteria Subgroup 1, which includes the group of Proteobacteria belonging to the families and/or genera described as "Gram-Negative Aerobic Rods and Cocci" by R. E. Buchanan and N. E. Gibbons (eds.), Bergey's Manual of Determinative Bacteriology, pp. 217-289 (8th ed., 1974) (The Williams & Wilkins Co., Baltimore, Md., USA) (hereinafter "Bergey (1974)"). The following table presents these families and genera of organisms.

Families and Genera Listed in the Part, "Gram-Negative Aerobic Rods and Cocci" (in Bergey (1974))

Family I. Pseudomonadaceae: Gluconobacter, Pseudomonas, Xanthomonas, Zoogloea
Family II. Azotobacteraceae: Azomonas, Azotobacter, Beijerinckia, Derxia
Family III. Rhizobiaceae: Agrobacterium, Rhizobium
Family IV. Methylomonadaceae: Methylococcus, Methylomonas
Family V. Halobacteriaceae: Halobacterium, Halococcus
Other Genera: Acetobacter, Alcaligenes, Bordetella, Brucella, Francisella, Thermus

[0080] "Gram(-) Proteobacteria Subgroup 1" also includes Proteobacteria that would be classified in this heading according to the criteria used in the classification. The heading also includes groups that were previously classified in this section but are no longer, such as the genera Acidovorax, Brevundimonas, Burkholderia, Hydrogenophaga, Oceanimonas, Ralstonia, and Stenotrophomonas, the genus Sphingomonas (and the genus Blastomonas, derived therefrom), which was created by regrouping organisms belonging to (and previously called species of) the genus Xanthomonas, the genus Acidomonas, which was created by regrouping organisms belonging to the genus Acetobacter as defined in Bergey (1974). In addition hosts can include cells from the genus Pseudomonas, Pseudomonas enalia (ATCC 14393), Pseudomonas nigrifaciens (ATCC 19375), and Pseudomonas putrefaciens (ATCC 8071), which have been reclassified respectively as Alteromonas haloplanktis, Alteromonas nigrifaciens, and Alteromonas putrefaciens. Similarly, e.g., Pseudomonas acidovorans (ATCC 15668) and Pseudomonas testosteroni (ATCC 11996) have since been reclassified as Comamonas acidovorans and Comamonas testosteroni, respectively; and Pseudomonas nigrifaciens (ATCC 19375) and Pseudomonas piscicida (ATCC 15057) have been reclassified respectively as Pseudoalteromonas nigrifaciens and Pseudoalteromonas piscicida. "Gram(-) Proteobacteria Subgroup 1" also includes Proteobacteria classified as belonging to any of the families: Pseudomonadaceae, Azotobacteraceae (now often called by the synonym, the "Azotobacter group" of Pseudomonadaceae), Rhizobiaceae, and Methylomonadaceae (now often called by the synonym, "Methylococcaceae"). Consequently, in addition to those genera otherwise described herein, further Proteobacterial genera falling within "Gram(-) Proteobacteria Subgroup 1" include: 1) Azotobacter group bacteria of the genus Azorhizophilus; 2) Pseudomonadaceae family bacteria of the genera Cellvibrio, Oligella, and Teredinibacter; 3) Rhizobiaceae family bacteria of the genera Chelatobacter, Ensifer, Liberibacter (also called "Candidatus Liberibacter"), and Sinorhizobium; and 4) Methylococcaceae family bacteria of the genera Methylobacter, Methylocaldum, Methylomicrobium, Methylosarcina, and Methylosphaera.

[0081] In another embodiment, the host cell is selected from "Gram(-) Proteobacteria Subgroup 2." "Gram(-) Proteobacteria Subgroup 2" is defined as the group of Proteobacteria of the following genera (with the total numbers of catalog-listed, publicly-available, deposited strains thereof indicated in parenthesis, all deposited at ATCC, except as otherwise indicated): Acidomonas (2); Acetobacter (93); Gluconobacter (37); Brevundimonas (23); Beijerinckia (13); Derxia (2); Brucella (4); Agrobacterium (79); Chelatobacter (2); Ensifer (3); Rhizobium (144); Sinorhizobium (24); Blastomonas (1); Sphingomonas (27); Alcaligenes (88); Bordetella (43); Burkholderia (73); Ralstonia (33); Acidovorax (20); Hydrogenophaga (9); Zoogloea (9); Methylobacter (2); Methylocaldum (1 at NCIMB); Methylococcus (2); Methylomicrobium (2); Methylomonas (9); Methylosarcina (1); Methylosphaera; Azomonas (9); Azorhizophilus (5); Azotobacter (64); Cellvibrio (3); Oligella (5); Pseudomonas (1139); Francisella (4); Xanthomonas (229); Stenotrophomonas (50); and Oceanimonas (4).

[0082] Exemplary host cell species of "Gram(-) Proteobacteria Subgroup 2" include, but are not limited to the following bacteria (with the ATCC or other deposit numbers of exemplary strain(s) thereof shown in parenthesis): Acidomonas methanolica (ATCC 43581); Acetobacter aceti (ATCC 15973); Gluconobacter oxydans (ATCC 19357); Brevundimonas diminuta (ATCC 11568); Beijerinckia indica (ATCC 9039 and ATCC 19361); Derxia gummosa (ATCC 15994); Brucella melitensis (ATCC 23456), Brucella abortus (ATCC 23448); Agrobacterium tumefaciens (ATCC 23308), Agrobacterium radiobacter (ATCC 19358), Agrobacterium rhizogenes (ATCC 11325); Chelatobacter heintzii (ATCC 29600); Ensifer adhaerens (ATCC 33212); Rhizobium leguminosarum (ATCC 10004); Sinorhizobium fredii (ATCC 35423); Blastomonas natatoria (ATCC 35951); Sphingomonas paucimobilis (ATCC 29837); Alcaligenes faecalis (ATCC 8750); Bordetella pertussis (ATCC 9797); Burkholderia cepacia (ATCC 25416); Ralstonia pickettii (ATCC 27511); Acidovorax facilis (ATCC 11228); Hydrogenophaga flava (ATCC 33667); Zoogloea ramigera (ATCC 19544); Methylobacter luteus (ATCC 49878); Methylocaldum gracile (NCIMB 11912); Methylococcus capsulatus (ATCC 19069); Methylomicrobium agile (ATCC 35068); Methylomonas methanica (ATCC 35067); Methylosarcina fibrata (ATCC 700909); Methylosphaera hansonii (ACAM 549); Azomonas agilis (ATCC 7494); Azorhizophilus paspali (ATCC 23833); Azotobacter chroococcum (ATCC 9043); Cellvibrio mixtus (UQM 2601); Oligella urethralis (ATCC 17960); Pseudomonas aeruginosa (ATCC 10145), Pseudomonas fluorescens (ATCC 35858); Francisella tularensis (ATCC 6223); Stenotrophomonas maltophilia (ATCC 13637); Xanthomonas campestris (ATCC 33913); and Oceanimonas doudoroffii (ATCC 27123).

[0083] In another embodiment, the host cell is selected from "Gram(-) Proteobacteria Subgroup 3." "Gram(-) Proteobacteria Subgroup 3" is defined as the group of Proteobacteria of the following genera: Brevundimonas, Agrobacterium, Rhizobium, Sinorhizobium, Blastomonas, Sphingomonas, Alcaligenes, Burkholderia, Ralstonia, Acidovorax, Hydrogenophaga, Methylobacter, Methylocaldum, Methylococcus, Methylomicrobium, Methylomonas, Methylosarcina, Methylosphaera, Azomonas, Azorhizophilus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Teredinibacter, Francisella, Stenotrophomonas, Xanthomonas, and Oceanimonas.

[0084] In another embodiment, the host cell is selected from "Gram(-) Proteobacteria Subgroup 4." "Gram(-) Proteobacteria Subgroup 4" is defined as the group of Proteobacteria of the following genera: Brevundimonas, Blastomonas, Sphingomonas, Burkholderia, Ralstonia, Acidovorax, Hydrogenophaga, Methylobacter, Methylocaldum, Methylococcus, Methylomicrobium, Methylomonas, Methylosarcina, Methylosphaera, Azomonas, Azorhizophilus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Teredinibacter, Francisella, Stenotrophomonas, Xanthomonas, and Oceanimonas.

[0085] In an embodiment, the host cell is selected from "Gram(-) Proteobacteria Subgroup 5." "Gram(-) Proteobacteria Subgroup 5" is defined as the group of Proteobacteria of the following genera: Methylobacter, Methylocaldum, Methylococcus, Methylomicrobium, Methylomonas, Methylosarcina, Methylosphaera, Azomonas, Azorhizophilus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Teredinibacter, Francisella, Stenotrophomonas, Xanthomonas, and Oceanimonas.

[0086] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 6." "Gram(-) Proteobacteria Subgroup 6" is defined as the group of Proteobacteria of the following genera: Brevundimonas, Blastomonas, Sphingomonas, Burkholderia, Ralstonia, Acidovorax, Hydrogenophaga, Azomonas, Azorhizophilus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Teredinibacter, Stenotrophomonas, Xanthomonas, and Oceanimonas.

[0087] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 7." "Gram(-) Proteobacteria Subgroup 7" is defined as the group of Proteobacteria of the following genera: Azomonas, Azorhizophilus, Azotobacter, Cellvibrio, Oligella, Pseudomonas, Teredinibacter, Stenotrophomonas, Xanthomonas, and Oceanimonas. The host cell can be selected from "Gram(-) Proteobacteria Subgroup 8." "Gram(-) Proteobacteria Subgroup 8" is defined as the group of Proteobacteria of the following genera: Brevundimonas, Blastomonas, Sphingomonas, Burkholderia, Ralstonia, Acidovorax, Hydrogenophaga, Pseudomonas, Stenotrophomonas, Xanthomonas, and Oceanimonas.

[0088] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 9." "Gram(-) Proteobacteria Subgroup 9" is defined as the group of Proteobacteria of the following genera: Brevundimonas, Burkholderia, Ralstonia, Acidovorax, Hydrogenophaga, Pseudomonas, Stenotrophomonas, and Oceanimonas.

[0089] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 10." "Gram(-) Proteobacteria Subgroup 10" is defined as the group of Proteobacteria of the following genera: Burkholderia, Ralstonia, Pseudomonas, Stenotrophomonas, and Xanthomonas.

[0090] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 11." "Gram(-) Proteobacteria Subgroup 11" is defined as the group of Proteobacteria of the

genera: Pseudomonas, Stenotrophomonas, and Xanthomonas. The host cell can be selected from "Gram(-) Proteobacteria Subgroup 12." "Gram(-) Proteobacteria Subgroup 12" is defined as the group of Proteobacteria of the following genera: Burkholderia, Ralstonia, Pseudomonas. The host cell can be selected from "Gram(-) Proteobacteria Subgroup 13." "Gram(-) Proteobacteria Subgroup 13" is defined as the group of Proteobacteria of the following genera: Burkholderia, Ralstonia, Pseudomonas, and Xanthomonas. The host cell can be selected from "Gram(-) Proteobacteria Subgroup 14." "Gram(-) Proteobacteria Subgroup 14" is defined as the group of Proteobacteria of the following genera: Pseudomonas and Xanthomonas. The host cell can be selected from "Gram(-) Proteobacteria Subgroup 15." "Gram(-) Proteobacteria Subgroup 15" is defined as the group of Proteobacteria of the genus Pseudomonas.

[0091] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 16." "Gram(-) Proteobacteria Subgroup 16" is defined as the group of Proteobacteria of the following Pseudomonas species (with the ATCC or other deposit numbers of exemplary strain(s) shown in parenthesis): Pseudomonas abietaniphila (ATCC 700689); Pseudomonas aeruginosa (ATCC 10145); Pseudomonas alcaligenes (ATCC 14909); Pseudomonas anguilliseptica (ATCC 33660); Pseudomonas citronellolis (ATCC 13674); Pseudomonas flavescens (ATCC 51555); Pseudomonas mendocina (ATCC 25411); Pseudomonas nitroreducens (ATCC 33634); Pseudomonas oleovorans (ATCC 8062); Pseudomonas pseudoalcaligenes (ATCC 17440); Pseudomonas resinovorans (ATCC 14235); Pseudomonas straminea (ATCC 33636); Pseudomonas agarici (ATCC 25941); Pseudomonas alcaliphila; Pseudomonas alginovora; Pseudomonas andersonii; Pseudomonas asplenii (ATCC 23835); Pseudomonas azelaica (ATCC 27162); Pseudomonas beijerinckii (ATCC 19372); Pseudomonas borealis; Pseudomonas boreopolis (ATCC 33662); Pseudomonas brassicacearum; Pseudomonas butanovora (ATCC 43655); Pseudomonas cellulosa (ATCC 55703); Pseudomonas aurantiaca (ATCC 33663); Pseudomonas chlororaphis (ATCC 9446, ATCC 13985, ATCC 17418, ATCC 17461); Pseudomonas fragi (ATCC 4973); Pseudomonas lundensis (ATCC 49968); Pseudomonas taetrolens (ATCC 4683); Pseudomonas cissicola (ATCC 33616); Pseudomonas coronafaciens; Pseudomonas diterpeniphila; Pseudomonas elongata (ATCC 10144); Pseudomonas flectens (ATCC 12775); Pseudomonas azotoformans; Pseudomonas brenneri; Pseudomonas cedrella; Pseudomonas corrugata (ATCC 29736); Pseudomonas extremorientalis; Pseudomonas fluorescens (ATCC 35858); Pseudomonas gessardii; Pseudomonas libanensis; Pseudomonas mandelii (ATCC 700871); Pseudomonas marginalis (ATCC 10844); Pseudomonas migulae; Pseudomonas mucidolens (ATCC 4685); Pseudomonas orientalis; Pseudomonas rhodesiae; Pseudomonas synxantha (ATCC 9890); Pseudomonas tolaasii (ATCC 33618); Pseudomonas veronii (ATCC 700474); Pseudomonas frederiksbergensis; Pseudomonas geniculata (ATCC 19374); Pseudomonas gingeri; Pseudomonas graminis; Pseudomonas grimontii; Pseudomonas halodenitrificans; Pseudomonas halophila; Pseudomonas hibiscicola (ATCC 19867); Pseudomonas huttiensis (ATCC 14670); Pseudomonas hydrogenovora; Pseudomonas jessenii (ATCC 700870); Pseudomonas kilonensis; Pseudomonas lanceolata (ATCC 14669); Pseudomonas lini; Pseudomonas marginata (ATCC 25417); Pseudomonas mephitica (ATCC 33665); Pseudomonas denitrificans (ATCC 19244); Pseudomonas pertucinogena (ATCC 190); Pseudomonas pictorum (ATCC 23328); Pseudomonas psychrophila; Pseudomonas fulva (ATCC 31418); Pseudomonas monteilii (ATCC 700476); Pseudomonas mosselii; Pseudomonas oryzihabitans (ATCC 43272); Pseudomonas plecoglossicida (ATCC 700383); Pseudomonas putida (ATCC 12633); Pseudomonas reactans; Pseudomonas spinosa (ATCC 14606); Pseudomonas balearica; Pseudomonas luteola (ATCC 43273); Pseudomonas stutzeri (ATCC 17588); Pseudomonas amygdali (ATCC 33614); Pseudomonas avellanae (ATCC 700331); Pseudomonas caricapapayae (ATCC 33615); Pseudomonas cichorii (ATCC 10857); Pseudomonas ficuserectae (ATCC 35104); Pseudomonas fuscovaginae; Pseudomonas meliae (ATCC 33050); Pseudomonas syringae (ATCC 19310); Pseudomonas viridiflava (ATCC 13223); Pseudomonas thermocarboxydovorans (ATCC 35961); Pseudomonas thermotolerans; Pseudomonas thivervalensis; Pseudomonas vancouverensis (ATCC 700688); Pseudomonas wisconsinensis; and Pseudomonas xiamenensis.

[0092] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 17." "Gram(-) Proteobacteria Subgroup 17" is defined as the group of Proteobacteria known in the art as the "fluorescent Pseudomonads" including those belonging, e.g., to the following Pseudomonas species: Pseudomonas azotoformans; Pseudomonas brenneri; Pseudomonas cedrella; Pseudomonas corrugata; Pseudomonas extremorientalis; Pseudomonas fluorescens; Pseudomonas gessardii; Pseudomonas libanensis; Pseudomonas mandelii; Pseudomonas marginalis; Pseudomonas migulae; Pseudomonas mucidolens; Pseudomonas orientalis; Pseudomonas rhodesiae; Pseudomonas synxantha; Pseudomonas tolaasii; and Pseudomonas veronii.

[0093] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 18." "Gram(-) Proteobacteria Subgroup 18" is defined as the group of all subspecies, varieties, strains, and other sub-special units of the species Pseudomonas fluorescens, including those belonging, e.g., to the following (with the ATCC or other deposit numbers of exemplary strain(s) shown in parenthesis): Pseudomonas fluorescens biotype A, also called biovar 1 or biovar I (ATCC 13525); Pseudomonas fluorescens biotype B, also called biovar 2 or biovar II (ATCC 17816); Pseudomonas fluorescens biotype C, also called biovar 3 or biovar III (ATCC 17400); Pseudomonas fluorescens biotype F, also called biovar 4 or biovar IV (ATCC 12983); Pseudomonas fluorescens biotype G, also called biovar 5 or biovar V (ATCC 17518); Pseudomonas fluorescens biovar VI; Pseudomonas fluorescens Pf0-1; Pseudomonas fluorescens Pf-5 (ATCC BAA-477); Pseudomonas fluorescens SBW25; and Pseudomonas fluorescens subsp. cellulosa (NCIMB 10462).

[0094] The host cell can be selected from "Gram(-) Proteobacteria Subgroup 19." "Gram(-) Proteobacteria Subgroup 19" is defined as the group of all strains of Pseudomonas fluorescens biotype A. A typical strain of this biotype is P. fluorescens strain MB101 (see U.S. Pat. No. 5,169,760 to Wilcox), and derivatives thereof. An example of a derivative thereof is P. fluorescens strain MB214, constructed by inserting into the MB101 chromosomal asd (aspartate dehy

drogenase gene) locus, a native E. coli PlacI-lacI-lacZYA construct (i.e. in which PlacZ was deleted).

[0095] Additional P. fluorescens strains that can be used in the present invention include Pseudomonas fluorescens Migula and Pseudomonas fluorescens Loitokitok, having the following ATCC designations: NCIB 8286): NRRL B-1244: NCIB 8865 strain CO1, NCIB 8866 strain CO2: 1291 ATCC 17458; IFO 15837: NCIB 8917; LA; NRRL B-1864; pyrrolidine: PW2ICMP3966; NCPPB 967; NRRL B-899): 13475; NCTC 10038; NRRL B-1603 6; IFO 15840): 52-IC: CCEB 488-A BU 140); CCEB 553 IEM 15/47); IAM 1008 AHH-27). IAM 1055 AHH-23): 1 (IFO 15842; 12 ATCC 25323; NIH 11; den Dooren de Jong 216); 18 IFO 15833; WRRL P-793 TR-10): 10852-22; IFO 15832): 143 IFO 15836; PL: 1492-40-40; IFO 15838; 182 IFO 3081; PJ 73): 184IFO 15830; 185 W2 L-1): 186 IFO 15829; PJ 79); 187 NCPPB 263): 188 NCPPB 316): 189 PJ227; 1208): 191 (IFO 15834; PJ 236; 22/1); 194 Klinge R-60; PJ 253); 196 PJ 288); 197 PJ 290): 198 PJ 302; 201 PJ 368; 202 PJ 372); 203 PJ 376): 204 IFO 15835; PJ 682): 205 PJ 686; 206PJ 692); 207PJ 693; 208PJ 722); 212PJ 832); 215 PJ 849): 216 PJ 885); 267B-9: 271 B-1612): 401 (C71A; IFO 15831; PJ 187; NRRL B-3 1784; IFO 15841); KY 8521; 3081; 30-21; IFO 3081); N: PYR; PW; D946-B83 BU 2183; FERM-P3328); P-2563 FERM-P2894; IFO 13658); IAM 112643F): M-1: A506 A5-06); A505 A5-05-1: A526 A5-26); B69; 72; NRRL B-4290; PMW6 NCIB 11615): SC 12936; A1 (IFO 15839); F 1847 CDC-EB); F 1848 CDC 93): NCIB 10586; P17; F-12: AmMS 257; PRA25; 6133D02: 6519E01; N1; SC15208; BNL-WVC; NCTC 2583 NCIB 8194; H13; 1013 ATCC 11251; CCEB 295; IFO 3903: 1062; or Pf-5.

[0096] Other suitable hosts include those classified in other parts of the reference, such as Gram (+) Proteobacteria. In one embodiment, the host cell is an E. coli. The genome sequence for E. coli has been established for E. coli MG1655 (Blattner, et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277(5331): 1453-74) and DNA microarrays are available commercially for E. coli K12 (MWG Inc, High Point, N.C.). E. coli can be cultured in either a rich medium such as Luria-Bertani (LB) (10 g/L tryptone, 5 g/L NaCl, 5 g/L yeast extract) or a defined minimal medium such as M9 (6 g/L Na2HPO4, 3 g/L KH2PO4, 1 g/L NH4Cl, 0.5 g/L NaCl, pH 7.4) with an appropriate carbon source such as 1% glucose. Routinely, an overnight culture of E. coli cells is diluted and inoculated into fresh rich or minimal medium in either a shake flask or a fermentor and grown at 37° C.

[0097] A host can also be of mammalian origin, such as a cell derived from a mammal including any human or non-human mammal. Mammals can include, but are not limited to primates, monkeys, porcine, ovine, bovine, rodents, ungulates, pigs, swine, sheep, lambs, goats, cattle, deer, mules, horses, monkeys, apes, dogs, cats, rats, and mice.

[0098] A host cell may also be of plant origin. Any plant can be selected for the identification of genes and regulatory sequences. Examples of suitable plant targets for the isolation of genes and regulatory sequences would include but are not limited to alfalfa, apple, apricot, Arabidopsis, artichoke, arugula, asparagus, avocado, banana, barley, beans, beet, blackberry, blueberry, broccoli, brussels sprouts, cabbage, canola, cantaloupe, carrot, cassava, castorbean, cauliflower, celery, cherry, chicory, cilantro, citrus, clementines, clover, coconut, coffee, corn, cotton, cranberry, cucumber, Douglas fir, eggplant, endive, escarole, eucalyptus, fennel, figs, garlic, gourd, grape, grapefruit, honeydew, jicama, kiwifruit, lettuce, leeks, lemon, lime, Loblolly pine, linseed, mango, melon, mushroom, nectarine, nut, oat, oil palm, oil seed rape, okra, olive, onion, orange, an ornamental plant, palm, papaya, parsley, parsnip, pea, peach, peanut, pear, pepper, persimmon, pine, pineapple, plantain, plum, pomegranate, poplar, potato, pumpkin, quince, radiata pine, radicchio, radish, rapeseed, raspberry, rice, rye, sorghum, Southern pine, soybean, spinach, squash, strawberry, sugarbeet, sugarcane, sunflower, sweet potato, sweetgum, tangerine, tea, tobacco, tomato, triticale, turf, turnip, a vine, watermelon, wheat, yams, and zucchini. In some embodiments, plants useful in the process are Arabidopsis, corn, wheat, soybean, and cotton.

[0099] For expression of a recombinant protein or peptide, or for modulation of an identified compensatory gene, any plant promoter can be used. A promoter may be a plant RNA polymerase II promoter. Elements included in plant promoters can be a TATA box or Goldberg-Hogness box, typically positioned approximately 25 to 35 basepairs upstream (5') of the transcription initiation site, and the CCAAT box, located between 70 and 100 basepairs upstream. In plants, the CCAAT box may have a different consensus sequence than the functionally analogous sequence of mammalian promoters (Messing et al., In: Genetic Engineering of Plants, Kosuge et al., eds., pp. 211-227, 1983). In addition, virtually all promoters include additional upstream activating sequences or enhancers (Benoist and Chambon, Nature 290:304-310, 1981; Gruss et al., Proc. Nat. Acad. Sci. USA 78:943-947, 1981; and Khoury and Gruss, Cell 27:313-314, 1983) extending from around -100 bp to -1,000 bp or more upstream of the transcription initiation site.

Expression of Recombinant Protein or Peptide

[0100] As described below, a host cell or organism can be engineered to express recombinant protein or peptide using standard techniques. For example, recombinant protein can be expressed from a vector or from an exogenous gene inserted into the genome of the host. Vectors that can be used to express exogenous proteins are well known in the art and are described below. Genes for expressing recombinant protein or peptide can also be inserted into the genome using techniques such as homologous or heterologous recombination, as described below.

[0101] The recombinant protein or peptide can be expressed after induction with a chemical compound or upon expression of an endogenous gene or gene product. The recombinant protein can also be expressed when the host cell is placed in a particular environment. Specific promoter elements are described below. These include, but are not limited to, promoters that can be induced upon treatment of the cell with chemicals such as IPTG, benzoate or anthranilate.

Recombinant Proteins/Peptides

[0102] The host cell has been designed to express a recombinant protein or peptide. These can be of any species and of any size. However, in certain embodiments, the recombinant protein or peptide is a therapeutically useful protein or peptide. In some embodiments, the protein can be a mammalian protein, for example a human protein, and can be, for example, a growth factor, a cytokine, a chemokine or a blood protein. The recombinant protein or peptide can be expressed primarily in an inactive form in the host cell. In certain embodiments, the recombinant protein or peptide is less than 100 kD, less than 50 kD, or less than 30 kD in size. In certain embodiments, the recombinant protein or peptide is a peptide of at least 5, 10, 15, 20, 30, 40, 50 or 100 amino acids.

[0103] Expression vectors exist that enable recombinant protein production in E. coli. For all these protein expression systems routine cloning procedures as described earlier can be followed (Sambrook, et al. (2000) Molecular cloning. A laboratory manual, third edition. Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press).

[0104] The Champion™ pET expression system provides a high level of protein production. Expression is induced from the strong T7lac promoter. This system takes advantage of the high activity and specificity of the bacteriophage T7 RNA polymerase for high level transcription of the gene of interest. The lac operator located in the promoter region provides tighter regulation than traditional T7-based vectors, improving plasmid stability and cell viability (Studier, F. W. and B. A. Moffatt (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. Journal of Molecular Biology 189(1): 113-30; Rosenberg, et al. (1987) Vectors for selective expression of cloned DNAs by T7 RNA polymerase. Gene 56(1): 125-35). The T7 expression system uses the T7 promoter and T7 RNA polymerase (T7 RNAP) for high-level transcription of the gene of interest. High-level expression is achieved in T7 expression systems because the T7 RNAP is more processive than native E. coli RNAP and is dedicated to the transcription of the gene of interest. Expression of the identified gene is induced by providing a source of T7 RNAP in the host cell. This is accomplished by using a BL21 E. coli host containing a chromosomal copy of the T7 RNAP gene. The T7 RNAP gene is under the control of the lacUV5 promoter, which can be induced by IPTG. T7 RNAP is expressed upon induction and transcribes the gene of interest.

[0105] The pBAD expression system allows tightly controlled, titratable expression of recombinant protein through the presence of specific carbon sources such as glucose, glycerol and arabinose (Guzman, et al. (1995) Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. Journal of Bacteriology 177(14): 4121-30). The pBAD vectors are uniquely designed to give precise control over expression levels. Heterologous gene expression from the pBAD vectors is initiated at the araBAD promoter. The promoter is both positively and negatively regulated by the product of the araC gene. AraC is a transcriptional regulator that forms a complex with L-arabinose. In the absence of L-arabinose, the AraC dimer blocks transcription. For maximum transcriptional activation two events are required: (i) L-arabinose binds to AraC, allowing transcription to begin; (ii) the cAMP activator protein (CAP)-cAMP complex binds to the DNA and stimulates binding of AraC to the correct location of the promoter region.

[0106] The trc expression system allows high-level, regulated expression in E. coli from the trc promoter. The trc expression vectors have been optimized for expression of eukaryotic genes in E. coli. The trc promoter is a strong hybrid promoter derived from the tryptophan (trp) and lactose (lac) promoters. It is regulated by the lacO operator and the product of the lacIq gene (Brosius, J. (1984) Toxicity of an overproduced foreign gene product in Escherichia coli and its use in plasmid vectors for the selection of transcription terminators. Gene 27(2): 161-72).

[0107] The invention also includes the improved recombinant host cell that is produced by the claimed process. In one embodiment, the invention includes a cell produced by the described process. In another embodiment, the invention includes a host cell or organism that expresses a recombinant protein that has been genetically modified to reduce the expression of at least two proteases. In other embodiments, the invention includes a host cell or organism that expresses a recombinant protein that has been genetically modified to reduce the expression of at least one protease selected from the group consisting of products of hslV, hslU, clpX, clpA and clpB genes, and in certain subembodiments, the cell or organism has been modified to reduce the expression of HslV or HslU. In certain embodiments, the modified host cell or organism expresses a recombinant mammalian derived protein, and may express a recombinant human derived protein, which may be human growth hormone. The cell can be modified by any techniques known in the art, for example by a technique wherein at least one protease gene is knocked out of the genome, or by mutating at least one protease gene to reduce expression of a protease, or by altering at least one promoter of at least one protease gene to reduce expression of a protease.

[0108] In another embodiment, a host cell or organism that expresses a recombinant protein is presented that has been genetically modified to increase the expression of at least one, at least two, or at least three folding modulators. In certain subembodiments, the folding modulators are not folding modulator subunits. The folding modulator may be selected from the group consisting of products of cbpA, htpG, dnaK, dnaJ, fkbP2, groES and groEL genes, and, in certain subembodiments, can be htpG or cbpA. The host cell or organism can, in a non-limiting example, express a mammalian protein, such as a human protein. The protein may be human growth hormone. The folding modulator or modulators can be increased by, for example, including an expression vector as described herein in the cell. The folding modulator expression can also be increased by, for example, mutating a promoter of a folding modulator or folding modulator subunit. A host cell or organism that expresses a recombinant protein can also be genetically modified to increase the expression of at least one folding modulator and decrease the expression of at least one protease or protease protein. Organisms comprising one or more cells produced by the described process are also included in the invention.

Step II: Analyzing a Genetic Profile to Identify a Compensatory Gene or Gene Product that is Expressed at a Higher Level in the Recombinant Cell

[0109] The process of the invention includes analyzing a genetic profile of the recombinant cell to identify a compensatory gene or gene product that is expressed at a higher level in the recombinant cell than in either a host cell that has

not been modified to express the recombinant protein or a recombinant cell that is not expressing the recombinant protein.

[0110] A "genetic profile" as used herein can include genes in a genome, mRNA transcribed from genes in the genome or cDNA derived from mRNA transcribed from genes in the genome. A genetic profile can also include transcription products that have been modified by a cell such as splice variants of genes in eukaryotic systems, or proteins translated from genes in a genome, including proteins that are modified by the cell or translated from splice variants of mRNA transcribed from the genome. A genetic profile is meant to refer solely to the simultaneous analysis of multiple entities, such as in an array or other multiplex system, including multiple simultaneous blot analysis or column chromatography with multiple binding partners attached to the packing. According to the invention, at least 5, 10, 25, 50, 70, 80, 90 or 100 or more genes or gene products are analyzed simultaneously.

Transcriptome

[0111] In one embodiment, the genetic profile analyzed is a transcriptome profile. A complete transcriptome refers to the complete set of mRNA transcripts produced by the genome at any one time. Unlike the genome, the transcriptome is dynamic and varies considerably in differing circumstances due to different patterns of gene expression. Transcriptomics, the study of the transcriptome, is a comprehensive means of identifying gene expression patterns. The transcriptome analyzed can include the complete known set of genes transcribed, i.e. the mRNA content or corresponding cDNA of a host cell or host organism. The cDNA can be a chain of nucleotides, an isolated polynucleotide, nucleotide, nucleic acid molecule, or any fragment or complement thereof that originated recombinantly or synthetically and can be double-stranded or single-stranded, coding and/or noncoding, an exon or an intron of a genomic DNA molecule, or combined with carbohydrate, lipids, protein or inorganic elements or substances. The nucleotide chain can be at least 5, 10, 15, 30, 40, 50, 60, 70, 80, 90 or 100 nucleotides in length. The transcriptome can also include only a portion of the known set of genetic transcripts. For example, the transcriptome can include less than 98%, 95%, 90%, 85%, 80%, 70%, 60%, or 50% of the known transcripts in a host. The transcriptome can also be targeted to a specific set of genes.

[0112] In one embodiment, the screening process can include screening using an array or a microarray to identify a genetic profile. In another embodiment, the transcriptome profile can be analyzed by using known processes such as hybridization in blot assays such as northern blots. In another embodiment, the process can include PCR-based processes such as RT-PCR that can quantify expression of a particular set of genes. In one embodiment of the invention, an identified gene, for example a folding modulator protein (FM) or protease protein, i.e. a protease, peptidase, or associated polypeptide or cofactor, is identified by a high throughput screening process.

[0113] The process can include analyzing the transcriptome profile using a microarray or equivalent technique. In this embodiment, the microarray can include at least a portion of the transcribed genome of the host cell, and typically includes binding partners to samples from genes of at least 50% of the transcribed genes of the organism. More typically, the microarray or equivalent technique includes binding partners for samples from at least 80%, 90%, 95%, 98%, 99% or 100% of the transcribed genes in the genome of the host cell. However, in a separate embodiment, the microarray can include binding partners to a selected subset of genes from the genome, including but not limited to putative protease genes or putative folding modulator genes. A microarray or equivalent technique can typically also include binding partners to a set of genes that are used as controls, such as housekeeper genes. A microarray or equivalent technique can also include genes clustered into groups such as genes coding for degradative proteins, folding modulators and cofactors, metabolic proteins such as proteins involved in glucose or amino acid or nucleobase synthesis, transcription factors, nucleic acid stabilizing factors, extracellular signal regulated genes such as kinases and receptors or scaffolding proteins.

[0114] A microarray is generally formed by linking a large number of discrete binding partners, which can include polynucleotides, aptamers, chemicals, antibodies or other proteins or peptides, to a solid support such as a microchip, glass slide, or the like, in a defined pattern. By contacting the microarray with a sample obtained from a cell of interest and detecting binding of the binding partners expressed in the cell that hybridize to sequences on the chip, the pattern formed by the hybridizing polynucleotides allows the identification of genes or clusters of genes that are expressed in the cell. Furthermore, where each member linked to the solid support is known, the identity of the hybridizing partners from the nucleic acid sample can be identified. One strength of microarray technology is that it allows the identification of differential gene expression simply by comparing patterns of hybridization.

[0115] Examples of high throughput screening processes include hybridization of host cell mRNA or substantially corresponding cDNA, to a hybridizable array(s) or microarray(s). The array or microarray can be one or more array(s) of nucleic acid or nucleic acid analog oligomers or polymers. In one embodiment, the array(s) or microarray(s) will be independently or collectively a host-cell-genome-wide array(s) or microarray(s), containing a population of nucleic acid or nucleic acid analog oligomers or polymers whose nucleotide sequences are hybridizable to representative portions of all genes known to encode or predicted as encoding FMs in the host cell strain or all genes known to encode or predicted to encode proteases or protease proteins in the host cell strain. A genome-wide microarray includes sequences that bind to a representative portion of all of the known or predicted open reading frame (ORF) sequences, such as from mRNA or corresponding cDNA of the host.

[0116] The oligonucleotide sequences or analogs in the array typically hybridize to the mRNA or corresponding cDNA sequences from the host cell and typically comprise a nucleotide sequence complementary to at least a portion of a host mRNA or cDNA sequence, or a sequence homologous to the host mRNA or cDNA sequence. Single DNA strands with complementary sequences can pair with each other and form double-stranded molecules.

[0117] Microarrays generally apply the hybridization principle in a highly parallel format. Instead of one identified, thousands of different potential identifieds can be arrayed on

a miniature solid support. Instead of a unique labeled DNA probe, a complex mixture of labeled DNA molecules is used, prepared from the RNA of a particular cell type or tissue. The abundances of individual labeled DNA molecules in this complex probe typically reflect the expression levels of the corresponding genes. In a simplified process, when hybridized to the array, abundant sequences will generate strong signals and rare sequences will generate weak signals. The strength of the signal can represent the level of gene expression in the original sample.

[0118] In one embodiment, a genome-wide array or microarray will be used. In one embodiment, the array represents more than 50% of the open reading frames in the genome of the host, or more than 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% of the known open reading frames in the genome. The array can also represent at least a portion of at least 50% of the sequences known to encode protein in the host cell. In separate embodiments, the array represents more than 50% of the genes or putative genes of the host cell, or more than 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% of the known genes or putative genes. In one embodiment, more than one oligonucleotide or analog can be used for each gene or putative gene sequence or open reading frame. In one embodiment, these multiple oligonucleotides or analogs represent different portions of a known gene or putative gene sequence. For each gene or putative gene sequence, from about 1 to about 10000 or from 1 to about 100 or from 1 to about 50, 45, 40, 35, 30, 25, 20, 15, 10 or fewer oligonucleotides or analogs can be present on the array.

[0119] A microarray or a complete genome-wide array or microarray may be prepared according to any process known in the art, based on knowledge of the sequence(s) of the host cell genome, or the proposed coding sequences in the genome, or based on the knowledge of expressed mRNA sequences in the host cell or host organism.

[0120] For different types of host cells, the same type of microarray can be applied. The types of microarrays include complementary DNA (cDNA) microarrays (Schena, M. et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467-70) and oligonucleotide microarrays (Lockhart, et al. (1996) Expression monitoring by hybridization to high density oligonucleotide arrays. Nat Biotechnol 14:1675-80). For cDNA microarrays, the DNA fragment of a partial or entire open reading frame is printed on the slides. The hybridization characteristics can be different throughout the slide because different portions of the molecules can be printed in different locations. For the oligonucleotide arrays, 20-80-mer oligos can be synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; however, in general all probes are designed to be similar with regard to hybridization temperature and binding affinity (Butte, A. (2002) The use and analysis of microarray data. Nat Rev Drug Discov 1:951-60).

[0121] In analyzing the transcriptome profile, the nucleic acid or nucleic acid analog oligomers or polymers can be RNA, DNA, or an analog of RNA or DNA. Such nucleic acid analogs are known in the art and include, e.g.: peptide nucleic acids (PNA); arabinose nucleic acids; altritol nucleic acids; bridged nucleic acids (BNA), e.g. 2'-O,4'-C-ethylene bridged nucleic acids, and 2'-O,4'-C-methylene bridged nucleic acids; cyclohexenyl nucleic acids; 2',5'-linked nucleotide-based nucleic acids; morpholino nucleic acids (nucleobase-substituted morpholino units connected, e.g., by phosphorodiamidate linkages); backbone-substituted nucleic acid analogs, e.g., 2'-substituted nucleic acids, wherein at least one of the 2' carbon atoms of an oligo- or polysaccharide-type nucleic acid or analog is independently substituted with, e.g., any one of a halo, thio, amino, aliphatic, oxyaliphatic, thioaliphatic, or aminoaliphatic group (wherein aliphatic is typically C-C aliphatic).

[0122] Oligonucleotides or oligonucleotide analogs in the array can be of uniform size and, in one embodiment, can be about 10 to about 1000 nucleotides, about 20 to about 1000, 20 to about 500, 20 to about 100, about 20, about 25, about 30, about 40, about 50, about 60, about 70, about 80, about 90 or about 100 nucleotides long.

[0123] The array of oligonucleotide probes can be a high density array comprising greater than about 100, or greater than about 1,000 or more different oligonucleotide probes. Such high density arrays can comprise a probe density of greater than about 60, more generally greater than about 100, most generally greater than about 600, often greater than about 1000, more often greater than about 5,000, most often greater than about 10,000, typically greater than about 40,000, more typically greater than about 100,000, and in certain instances is greater than about 400,000 different oligonucleotide probes per cm² (where different oligonucleotides refers to oligonucleotides having different sequences). The oligonucleotide probes range from about 5 to about 500, or about 5 to 50, or from about 5 to about 45 nucleotides, or from about 10 to about 40 nucleotides and most typically from about 15 to about 40 nucleotides in length. Particular arrays contain probes ranging from about 20 to about 25 nucleotides in length. The array may comprise more than 10, or more than 50, or more than 100, and typically more than 1000 oligonucleotide probes specific for each identified gene. In one embodiment, the array comprises at least 10 different oligonucleotide probes for each gene. In another embodiment, the array has 20 or fewer oligonucleotides complementary to each gene. Although a planar array surface is typical, the array may be fabricated on a surface of virtually any shape or even on multiple surfaces.

[0124] The array may further comprise mismatch control probes. Where such mismatch controls are present, the quantifying step may comprise calculating the difference in hybridization signal intensity between each of the oligonucleotide probes and its corresponding mismatch control probe. The quantifying may further comprise calculating the average difference in hybridization signal intensity between each of the oligonucleotide probes and its corresponding mismatch control probe for each gene.

[0125] In some assay formats, the oligonucleotide probe can be tethered, i.e., by covalent attachment, to a solid support. Oligonucleotide arrays can be chemically synthesized by parallel immobilized polymer synthesis processes or by light directed polymer synthesis processes, for example on poly-L- substrates such as slides. Chemically synthesized arrays are advantageous in that probe preparation does not require cloning, a nucleic acid amplification step, or enzymatic synthesis. The array includes test probes which are oligonucleotide probes each of which has a sequence that is complementary to a subsequence of one of the genes (or the mRNA or the corresponding antisense cRNA) whose expression is to be detected. In addition, the array can contain normalization controls, mismatch controls and expression level controls as described herein.

[0126] An array may be designed to include one hybridizing oligonucleotide per known gene in a genome. The oligonucleotides or equivalent binding partners can be 5'-amino modified to support covalent binding to epoxy-coated slides. The oligonucleotides can be designed to reduce cross-hybridization, for example by reducing sequence identity to less than 25% between oligonucleotides. Generally, melting temperature of oligonucleotides is analyzed before design of the array to ensure consistent GC content and Tm, and secondary structure of oligonucleotide binding partners is optimized. For transcriptome profiling, secondary structure is typically minimized. In one embodiment, each oligonucleotide is printed at at least two different locations on the slide to increase accuracy. Control oligonucleotides can also be designed based on sequences from different species than the host cell or organism to show background binding.

[0127] The samples in the genetic profile can be analyzed individually or grouped into clusters. The clusters can typically be grouped by similarity in gene expression. In one embodiment, the clusters may be grouped individually as genes that are regulated to a similar extent in a host cell. The clusters may also include groups of genes that are regulated to a similar extent in a recombinant host cell, for example genes that are up-regulated or down-regulated to a similar extent compared to a host cell or a modified or an unmodified cell. The clusters can also include groups related by gene or protein structure, function or, in the case of a transcriptome array, by placement or grouping of binding partners to genes in the genome of the host. Groups of binding partners or groups of genes or proteins analyzed can include genes selected from, but not limited to: genes coding for putative or known proteases, co-factors of proteases or protease-like proteins; folding modulators, co-factors of

[0129] The process can include techniques that measure protein expression levels, protein-protein interactions, protein-small molecule interactions or enzymatic activities. In one embodiment, the proteome is analyzed using a screening process that includes measurement of size of certain proteins, typically using mass spectrometry. In one embodiment, the technique to analyze the proteome profile includes hybridization of an antibody to a protein of interest. For example, the process can include Western blot processes as known in the art or can include column chromatography. The process can also include standard processes such as ELISA screening known in the art. The process can also include binding of nucleic acid modified binding partners, which can be aptamers or can be protein or chemical binding partners for proteins or peptide fragments in the proteome, and a screening process can include amplification of the nucleic acids. The process can also include chemical compounds that bind to the proteins or fragments of proteins in a proteome and the process can include measurement of the binding by chemical means. The measurement can also include measurement of reaction products in a chemical reaction, or by activation of a fluorophore. Techniques like mass spectrometry in combination with separation tools such as two-dimensional gel electrophoresis or multidimensional liquid chromatography can also be used in the process. Typically, the process includes a high throughput screening technique.

[0130] The process of the invention can include analyzing the proteome profile using, for example, two-dimensional electrophoresis. This is a method for the separation and identification of proteins in a sample by displacement in two dimensions oriented at right angles to one another. This allows the sample to separate over a larger area, increasing the resolution of each component. The first dimension is typically based on the charge of a particular molecule while the second dimension may be based on the size of a molecule. In the first dimension, proteins are resolved according to their isoelectric points using immobilized pH gradient electrophoresis (IPGE), isoelectric focusing (IEF), or non-equilibrium pH gradient electrophoresis.
Under stan folding modulators or proteins that could improve protein dard conditions of temperature and urea concentration, the folding or solubility; transcription factors; proteins involved observed focusing points of the great majority of proteins in nucleic acid stability or translational initiation; kinases; closely approximate the predicted isoelectric points calcu extracellular or intracellular receptors; metabolic enzymes; lated from the proteins amino acid compositions. Generally, metabolic cofactors; envelope proteins; sigma factors; mem the first step after preparation of a host sample includes brane bound proteins; transmembrane proteins; membrane running the sample against a pH gradient, a process known associated proteins and housekeeping genes. as isoelectric focusing. The pH gradients can be generated by adding ampholytes to an acrylamide gel. These are a Proteome mixture of amphoteric species with a range of p values. The 0128. In another embodiment, the genetic profile ana pH gradients can also be generated by adding Immobilines, lyzed is a proteome profile. The proteome of a host is the which are similar to ampholytes but have been immobilised complete set of proteins produced by the genome at any one within the polyacrylamide gel producing an immobilised pH time. The proteome is generally much more complex than gradient that does not need to be pre-focused. either the genome or the transcriptome because each protein 0131 The second dimension in two-dimensional electro can be chemically modified after synthesis. Many proteins phoresis may be separation by size of proteins. Proteins may are cleaved during production, are phosphorylated, acety be separated according to their approximate molecular lated, methylated, or have carbohydrate groups added to weight using sodium dodecyl Sulfate poly-acrylamide-elec them, depending on the host cell. The proteome is also very trophoresis (SDS-PAGE). The technique is widely used and dynamic. 
Proteomics, the study of the proteome, can cover known in the art. The basic idea is to coat proteins with a a number of different aspects of protein structure, protein detergent (SDS), which coats all proteins in a sample and expression, and function. The techniques for proteome negatively charges them. The proteins are then Subjected to analysis are not as straightforward as those used in tran gel electrophoresis. The gels can typically be acrylamide Scriptomics. However, an advantage of proteomics is that the gels and can be in a gradient of density. The charge placed functional molecules of the cell are being studied. on the gel pushes the proteins through the gel based on size. US 2006/01 10747 A1 May 25, 2006
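As an illustration of the isoelectric point prediction mentioned in paragraph 0130, the following minimal Python sketch estimates the pI of a linear peptide from its amino acid composition. It is not part of the disclosed process. The pKa values, the function names `net_charge` and `isoelectric_point`, and the use of the Henderson-Hasselbalch relation with a bisection search are all assumptions chosen for illustration; published pKa tables differ slightly and would shift the result.

```python
# Illustrative pKa values (assumed for this sketch; published tables differ).
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        # One free amino terminus per chain; side chains are counted.
        count = 1 if group == "N_TERM" else seq.count(group)
        charge += count / (1.0 + 10.0 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        # One free carboxyl terminus per chain; side chains are counted.
        count = 1 if group == "C_TERM" else seq.count(group)
        charge -= count / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid  # negative or zero: the pI lies at a lower pH
    return (low + high) / 2.0
```

Under these assumed pKa values, a lysine-rich peptide yields a basic pI and an aspartate-rich peptide an acidic one, which is the property that positions a protein along the first-dimension pH gradient.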

In two dimensional electrophoresis, the proteins separated can include proteins from at least 10% of the proteome of the organism. More typically, proteins from at least 20%, 30%, 40%, 60%, 80% or 90% of the proteins in the proteome of the host cell are separated and analysed by techniques such as staining of proteins and/or mass spectrometry.

[0132] The process of the invention can also include analyzing the proteome profile using a microarray. In this embodiment, the microarray can include binding partners to at least a portion of the proteins expressed by the host cell under appropriate growth conditions, and typically includes binding partners to proteins from at least 5% of the proteome of the organism. More typically, the microarray includes binding partners to proteins from at least 10%, 20%, 30%, 40%, 60%, 80% or 90% of the proteins in the proteome of the host cell. The binding partners can be antibodies, which can be antibody fragments such as single chain antibody fragments. The binding partners can also include aptamers, which are molecules including nucleic acids that bind to specific proteins or portions of proteins. In a separate embodiment, the microarray can include binding partners for a selected subset of proteins from the proteome, including, for example, putative protease proteins or putative folding modulators. The microarray can typically also include a set of binding partners to proteins that are used as controls. The genetic profile can be analyzed by measuring the binding of the proteins of the host cell expressing the recombinant protein or peptide to the binding partners on the microarray.

[0133] The simplest protein array format generally consists of a large number of protein capture reagents bound to defined spots on a planar support material. This array is then exposed to a complex protein sample. The binding of the specific analyte proteins to the individual spots can then be monitored using different approaches. In cases where the analytes have been pre-labeled with a fluorescent dye, the binding can be monitored directly using a fluorescence scanner. Often the classical antibody sandwich type format is used, in which two protein binding reagents simultaneously bind to the same antigen: one antibody is immobilized onto the surface, and the other one is fluorescently labeled or conjugated to an enzyme that can produce a fluorescent, luminescent or colored product when supplied with the appropriate substrate.

[0134] Monoclonal antibodies or their antigen-binding fragments are currently one choice for capture agents due to their high specificity, affinity and stability. They have been used in a variety of classical single analyte protein profiling assays such as enzyme-linked immunosorbent assays (ELISA) since the seventies. Additionally, phage-display libraries of antibody fragments offer the potential for antibody production at proteomic scales. These libraries can be used to isolate high-affinity binding agents against protein identified in a significantly shorter time frame than is possible with immunization-based processes. Ribosome display and mRNA display are additional, completely in vitro, processes that rely on physically linking the library proteins to their encoding mRNA sequences. Such processes have successfully been used to select high-affinity binding reagents to identified proteins (Wilson, D S, et al. (2001) The use of mRNA display to select high-affinity protein-binding peptides Proc Natl Acad Sci USA 98:3750-3755). Several groups have taken a different approach to develop high-affinity protein capture reagents for protein biochips. For example, aptamers have been used, which are single stranded RNA or DNA molecules originating from in vitro selection experiments (termed SELEX: systematic evolution of ligands by exponential enrichment) with high affinities to proteins. A further development in aptamer technologies is so-called photoaptamers. These molecules have an additional attribute that enhances their utility as protein capture reagents. They carry the photoactivatible crosslinking group 5'-bromodeoxyuridine, which, when activated by UV light, can cause covalent crosslinking with bound identified proteins (Petach, H & Gold, L. (2002) Dimensionality is the issue: use of photoaptamers in protein microarrays Curr Opin Biotechnol 13:309-314). The photo-crosslinking event provides a second dimension of specificity similar to the binding of a secondary detection antibody in a sandwich immunoassay.

[0135] A wide variety of surface substrates and attachment chemistries have been evaluated for the immobilization of capture agents on protein microarrays. One way to immobilize proteins on a solid support relies on non-covalent interactions based on hydrophobic or van der Waals interactions, hydrogen bonding or electrostatic forces. Examples of electrostatic immobilization include the use of materials such as nitrocellulose and poly-lysine- or aminopropyl silane-coated glass slides. Protein microarrays were also fabricated by means of physical adsorption onto plastic surfaces of 96-well plates. An example of covalent attachment of proteins to the surface has been described by MacBeath and Schreiber (MacBeath, G & Schreiber, S L (2000) Printing proteins as microarrays for high-throughput function determination Science 289:1760-1763). Due to the very high affinity of streptavidin to biotin, the immobilization of biotinylated proteins onto streptavidin surfaces can be considered quasi covalent (Peluso, P et al. (2003) Optimizing antibody immobilization strategies for the construction of protein microarrays Anal Biochem 312:113-124). Further strategies have been described (Ruiz-Taylor, L A, et al. (2001) X-ray photoelectron spectroscopy and radiometry studies of biotin-derivatized poly(L-lysine)-grafted-poly(ethylene glycol) monolayers on metal oxides Langmuir 7313-7322; Ruiz-Taylor, L A et al. (2001) Monolayers of derivatized poly(L-lysine)-grafted poly(ethylene glycol) on metal oxides as a class of biomolecular interfaces Proc Natl Acad Sci USA 2001, 98:852-857; Espejo A, Bedford M T. (2004) Protein-domain microarrays Methods Mol Biol. 264:173-81; Zhu, H. et al. (2001) Global analysis of protein activities using proteome chips. Science Express).

[0136] The samples in the genetic profile can be analyzed individually or grouped into clusters. The clusters can typically be grouped by similarity in gene expression. In one embodiment, the clusters may be grouped individually as proteins that are regulated to a similar extent in a host cell. The clusters may also include groups of proteins that are regulated to a similar extent in a recombinant host cell, for example, that are up-regulated or down-regulated to a similar extent compared to a host cell or a modified or an unmodified cell. The clusters can also include groups related by protein structure, function, or processing. Groups of protein binding partners in an array, or groups of proteins analyzed in a different assay such as two-dimensional electrophoresis, can be selected from, but are not limited to: putative or known proteases, co-factors of proteases or protease-like proteins; folding modulators, co-factors of folding modulators or proteins that could improve protein folding or solubility; transcription factors; proteins involved in nucleic acid stability or translational initiation; kinases; extracellular or intracellular receptors; metabolic enzymes; metabolic cofactors; envelope proteins; and housekeeping genes.

Metabolome

[0137] Proteomic analysis processes allow the abundance and distribution of many proteins to be determined simultaneously. However, the functional consequences of changes to the proteome are reported only indirectly. Another approach is to measure the levels of these small molecules, or metabolites. A genetic profile analyzed in the process of the invention can thus include a metabolomic profile. Processes for analyzing the metabolome of a specific host include gas chromatography, high-pressure liquid chromatography and capillary electrophoresis to separate metabolites according to various chemical and physical properties. The molecules can then be identified using processes such as mass spectrometry.

Detection/Analysis

[0138] The process includes analyzing a genetic profile to identify a compensatory gene or gene product that is expressed at a higher level in the recombinant cell. In general, this step includes the monitoring of the expression (e.g. detecting and/or quantifying the expression) of a multitude of genes or gene products. The expression is generally monitored by detecting binding of host cell gene products to a transcriptome, proteome or metabolome profile as described above. The analysis of the binding may involve a comparison of binding between a recombinant host cell expressing recombinant protein or peptide and a naive host cell or a recombinant host cell not expressing the protein or peptide.

Detection

[0139] This step includes the monitoring of the expression (e.g. detecting and/or quantifying the expression) of a multitude of genes or gene products. The expression is generally monitored by detecting binding of host cell gene products to a transcriptome, proteome or metabolome profile as described above. Typically, at least about 10 genes, or at least about 100, or at least about 1000 or at least about 10,000 different genes can be assayed at one time. The process can involve providing a pool of identified nucleic acids comprising RNA transcripts of one or more of said genes, or nucleic acids derived from the RNA transcripts; hybridizing the pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface, where the array comprises more than 100 different oligonucleotides and each different oligonucleotide is localized in a predetermined region of said surface, each different oligonucleotide is attached to the surface through at least one covalent bond, and the oligonucleotide probes are complementary to the RNA transcripts or nucleic acids derived from the RNA transcripts; and quantifying the hybridized nucleic acids in the array. A pictorial representation of one technique for monitoring expression of a gene product between two samples is depicted in FIG. 12.

[0140] The process can also involve providing a pool of cellular proteins. These can be derived from cellular lysates that are made by lysing cells using detergents or surfactants; using osmotic lysis; using thermal changes, such as freeze-thaw cycles; using mechanical means or using pressure changes. Typically chemicals are included in the process of lysing a cell or cell system that inhibit certain proteins, such as proteases, particularly non-specific proteases, to limit degradation of proteins. In addition, cell lysates are typically kept at or below 4° C., and can be kept at or below 0° C. or at or below 20° C. during processing. Cell lysates can be separated before further processing, for example by size exclusion chromatography, ion exchange or affinity matrix chromatography such as by using HPLC.

[0141] Typically, the identified genetic product, mRNA, cDNA, protein or metabolite is labeled with a detectable marker or probe. The marker or probe can be one or more fluorescent molecules or fluorophores. These can include commercially available molecules such as Cy3 and Cy5 linked to, for example, particular nucleotides that can be incorporated into a reverse transcribed cDNA to provide detectable molecules for screening. In one embodiment, a first fluorophore is incorporated into a sample from the host and a second fluorophore is incorporated into a sample from a host expressing recombinant protein or peptide. In one embodiment, the first fluorophore and second fluorophore emit different wavelengths of light. In this embodiment, the binding of samples from the host and the host expressing recombinant protein can be monitored in the same assay. In another embodiment, the fluorophores are excited at different wavelengths of light. In another embodiment, the first and second fluorophore are excited or emit light at the same wavelength. In this embodiment, the samples from the host and from the host expressing recombinant protein are typically monitored in different assays.

[0142] The process can additionally include a step of quantifying the hybridization of the identified nucleic acids or proteins or chemical metabolites. The quantification can include measurement of the levels of transcription of one or more genes. Typically the pool of identified nucleic acids, for example, is one in which the concentration of the identified nucleic acids (pre-mRNA transcripts, mRNA transcripts or nucleic acids derived from the RNA transcripts) is proportional to the expression levels of genes encoding those identified nucleic acids.

[0143] For transcriptome analysis, the pool of nucleic acids may be labeled before, during, or after hybridization, although typically the nucleic acids are labeled before hybridization. Fluorescence labels are typically used, often with a single fluorophore, and, where fluorescence labeling is used, quantification of the hybridized nucleic acids can be by quantification of fluorescence from the hybridized fluorescently labeled nucleic acid. Such quantification is facilitated by the use of a confocal laser scanner or fluorescence microscope, such as a confocal fluorescence microscope, which can be equipped with an automated stage to permit automatic scanning of the array, and which can be equipped with a data acquisition system for the automated measurement, recording and subsequent processing of the fluorescence intensity information. Devices for reading such arrays include the CloneTracker™, ImaGene™, GeneSight™ modules and the GeneDirector™ database, available from Biodiscovery, Inc., El Segundo, Calif., or the GeneChip™ reader, available from Affymetrix, Inc. of Santa Clara, Calif. In one embodiment, hybridization occurs at low stringency (e.g. about 20° C. to about 50° C., or about 30° C. to about

40° C., or about 37° C.). Hybridization may include subsequent washes at progressively increasing stringency until a desired level of hybridization specificity is reached.

[0144] Quantification of the hybridization signal can be by any means known to one of skill in the art. However, in one embodiment, quantification is achieved by use of a confocal fluorescence scanner. Data is typically evaluated by calculating the difference in hybridization signal intensity between each oligonucleotide probe and its corresponding mismatch control probe. Typically, this difference can be calculated and evaluated for each gene. Certain analytical processes are provided herein.

[0145] Techniques have been developed to prepare appropriate bacterial hybridization probes (see, e.g., Choi et al. (2003) App. Envir. Microbio. 69:4737-4742). For example, cells can be stored in an RNA stabilizing agent such as RNAlater (Ambion, Austin, Tex.). RNA is generally purified in three steps: (1) isolation of the total RNA, (2) removal of contaminating DNA and (3) clean-up of the total RNA. Total RNA can be isolated and then mixed with random hexamer primers and reverse transcriptase to make cDNA. Typically at least one fluorescent probe is incorporated into the cDNA. In one embodiment, one fluorescent probe is incorporated; in another embodiment more than one probe, for example 2, 3, 4, 5 or more fluorescent probes, are incorporated into the same or different samples of cDNA. In a eukaryotic host, the pool of identified nucleic acids can also be the total polyA+ mRNA isolated from a biological sample, or cDNA made by reverse transcription of the RNA or second strand cDNA or RNA transcribed from the double stranded cDNA intermediate.

[0146] Fluorescent dyes are typically incorporated into cDNA molecules during the reverse transcription reaction. Due to the different mRNA structure between prokaryotes (bacteria) and eukaryotes (yeast, mammalian cells, etc.), different primers can be used; however, random primers can be used in both cases, and oligo-dT primers can be used in eukaryotes, which have polyA tails. An alternative process is amino-allyl labeling to increase the signal intensity. This process incorporates nucleotide analogs featuring a chemically reactive group to which a fluorescent dye may be attached after the reverse transcription reaction (Manduchi, E. et al. (2002) Comparison of different labeling processes for two-channel high-density microarray experiments. Physiol Genomics 10:169-79).

[0147] The pool of identified nucleic acids can be treated to reduce the complexity of the sample and thereby reduce the background signal obtained in hybridization. The terms “background” or “background signal” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled identified nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). In one approach, a pool of mRNAs, derived from a biological sample, is hybridized with a pool of oligonucleotides comprising the oligonucleotide probes present in the array. The pool of hybridized nucleic acids is then treated with RNase A which digests the single stranded regions. The remaining double stranded hybridization complexes are then denatured and the oligonucleotide probes are removed, leaving a pool of mRNAs enhanced for those mRNAs complementary to the oligonucleotide probes in the array.

[0148] In another approach to background reduction, a pool of mRNAs derived from a biological sample is hybridized with paired identified specific oligonucleotides where the paired identified specific oligonucleotides are complementary to regions flanking subsequences of the mRNAs complementary to the oligonucleotide probes in the array. The pool of hybridized nucleic acids is treated with RNase H which digests the hybridized (double stranded) nucleic acid sequences. The remaining single stranded nucleic acid sequences which have a length about equivalent to the region flanked by the paired identified specific oligonucleotides are then isolated (e.g. by electrophoresis) and used as the pool of nucleic acids for monitoring gene expression.

[0149] A third approach to background reduction involves eliminating or reducing the representation in the pool of particular preselected identified mRNA messages (e.g., messages that are characteristically overexpressed in the sample). This process involves hybridizing an oligonucleotide probe that is complementary to the preselected identified mRNA message to the pool of polyA+ mRNAs derived from a biological sample. The oligonucleotide probe hybridizes with the particular preselected polyA+ mRNA to which it is complementary. The pool of hybridized nucleic acids is treated with RNase H which digests the double stranded (hybridized) region, thereby separating the message from its polyA+ tail. Isolating or amplifying (e.g., using an oligo dT column) the polyA+ mRNA in the pool then provides a pool having a reduced or no representation of the preselected identified mRNA message.

Analysis

[0150] The identified gene is typically identified by comparing a genetic profile of the host cell expressing the recombinant protein or peptide to a genetic profile of the host cell not expressing the recombinant protein or peptide. In iterative embodiments, the identified gene to be modified is identified by comparing a genetic profile of the cell that is to be modified (the second cell) to the cell that it was modified from (the first cell). The identified gene is identified by comparing a genetic profile of the second cell to a genetic profile of the first cell and identifying one or more genes the expression of which is increased in the second cell.

[0151] cDNA microarrays measure the relative mRNA abundance between two samples. A series of post-induction time point samples can be compared to the pre-induction sample for the same strain (temporal expression profile), or post-induction samples can be compared with different strains at the same time point. The comparison can be through the use of a computer program, such as GeneSight™. For example, when using a microarray using a fluorescent tag, a spot intensity can be measured for each sample attached to the array (for example a DNA sequence). The spot intensity can then be corrected for background and the ratio of the intensity for samples from the host versus the host expressing the recombinant protein or peptide, or for the host expressing the recombinant protein or peptide compared to the modified host expressing the recombinant protein or peptide, can be measured. The ratio provides a measure to identify the genes that are up-regulated or the expression of which is increased upon expression of the recombinant protein or peptide, or upon modification of the host cell, to allow identification of an identified gene.

[0152] To identify whether a gene is up-regulated, a standard or “cut off” ratio is established. The cut off ratio may be designed to overcome the effects of background noise associated with a particular assay. In general, any ratio of greater than 1 between the measurements can designate an up-regulated gene. However, variation between assays can require a ratio higher than 1, for example 1.5, or more than 2, or more than 2.5, or more than 3, or more than 3.5 or more than 4 or more than 4.5, or more than 5 or more than 6, or more than 7, or more than 8, or more than 9 or more than 10. The standard may be established before the process, relying on standards known in the art, or may be established during measurements by comparing ratios of levels of control genes or gene products, such as housekeeper genes.

Step III: Changing Expression of the Identified Compensatory Gene or Gene Product by Genetically Modifying the Cell to Provide a Modified Recombinant Cell that Achieves an Increase in Recombinant Protein Expression, Activity or Solubility.

Identified Compensatory Genes

[0153] The compensatory genes or gene products that are identified in step ii), or homologous analogues, cofactors or subunits thereof, are used to design strategies to genetically modify the cell to either increase, decrease, knock in or knock out expression of one or more identified genes. The gene sequences identified in public databases can be used to design strategies, particularly to design constructs to modulate expression of a gene by techniques described above. Such techniques are well known.

[0154] In one embodiment, the identified gene or genes is at least one putative protease, a protease-like protein, a cofactor or subunit of a protease. In other embodiments, the identified gene or genes is at least one folding modulator, putative folding modulator, cofactor or subunit of a folding modulator. In certain embodiments, an identified gene is a subunit of a protease. In one embodiment, the identified gene or genes can be a serine, threonine, cysteine, aspartic or metallo peptidase. In one embodiment, the identified gene or genes can be selected from hslV, hslU, clpA, clpB and clpX. The identified gene can also be a cofactor of a protease. In another embodiment, the identified gene or genes is a folding modulator. In some embodiments, the identified gene or genes can be selected from a chaperone protein, a foldase, a peptidyl prolyl isomerase and a disulfide bond isomerase. In some embodiments, the identified gene or genes can be selected from htpG, cbpA, dnaJ, dnaK and fkbP.

[0155] Bacterial genes are organized into operons, which are gene clusters that encode the proteins necessary to perform coordinated function, such as biosynthesis of a given amino acid. Therefore, in one embodiment, the identified gene is part of an operon. In a particular embodiment, the identified gene is in an operon that encodes for one or more proteins with protease activity alone or in combination, or is an operon that encodes for one or more proteins with folding modulator activity, including foldases, chaperones, and isomerases.

Proteases

[0156] In one embodiment of the invention, the host cell is modified by reducing expression of, inhibiting or removing at least one protease from the genome. The modification can also be to more than one protease in some embodiments. In a related embodiment, the cell is modified by reducing expression of a protease cofactor or protease protein. In another embodiment, the host cell is modified by inhibition of a promoter for a protease or related protein, which can be a native promoter. The gene modification can be to modulate a protein homologous to the identified gene.

[0157] In the MEROPS database, peptidases are grouped into clans and families. The families are groups of closely related functionally similar peptidases. Families are grouped by their catalytic type: S, serine; T, threonine; C, cysteine; A, aspartic; M, metallo and U, unknown. Over 20 families (denoted S1-S27) of serine protease have been identified, these being grouped into 6 clans (SA, SB, SC, SE, SF and SG) on the basis of structural similarity and other functional evidence. Structures are known for four of the clans (SA, SB, SC and SE). Threonine peptidases are characterized by a threonine nucleophile at the N terminus of the mature enzyme. The type example for this clan is the archaean proteasome beta component of Thermoplasma acidophilum. Cysteine peptidases have characteristic molecular topologies and are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionary related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad:

[0158] Clan CA contains the families of papain (C1), calpain (C2), streptopain (C10) and the ubiquitin-specific peptidases (C12, C19), as well as many families of viral cysteine endopeptidases.

[0159] Clan CD contains the families of clostripain (C11), gingipain R (C25), legumain (C13), caspase-1 (C14) and separin (C50). These enzymes have specificities dominated by the interactions of the S1 subsite.

[0160] Clan CE contains the families of adenain (C5) from adenoviruses, the eukaryotic Ulp1 protease (C48) and the bacterial Yop proteases (C55).

[0161] Clan CF contains only pyroglutamyl peptidase I (C15).

[0162] Clan PA contains the picornains (C3), which have probably evolved from serine peptidases and which form the majority of enzymes in this clan.

[0163] Clans PB and CH contain the autolytic cysteine peptidases.

[0164] Aspartic endopeptidases of vertebrate, fungal and retroviral origin have been characterised. Aspartate peptidases are so named because Asp residues are the ligands of the activated water molecule in all examples where the catalytic residues have been identified, although at least one viral enzyme is believed to have an Asp and an Asn as its catalytic dyad. All or most aspartate peptidases are endopeptidases. These enzymes have been assigned into clans (proteins which are evolutionary related), and further sub-divided into families, largely on the basis of their tertiary structure.

[0165] Metalloproteases are the most diverse of the four main types of protease, with more than 30 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as abXHEbbHbc, where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases.

[0166] The peptidases associated with clan U- have an unknown catalytic mechanism as the protein fold of the active site domain and the active site residues have not been reported.

[0167] Certain proteases (e.g. OmpT) can adsorb to the surface of inclusion bodies and may degrade the desired protein while it is being refolded. Therefore, certain identified proteins can be proteases or protease proteins that adhere to inclusion bodies and these can be modified to, for example, reduce attachment.

[0168] Proteases or protease proteins can also be classified as Aminopeptidases; Dipeptidases; Dipeptidyl-peptidases and tripeptidyl peptidases; Peptidyl-dipeptidases; Serine-type carboxypeptidases; Metallocarboxypeptidases; Cysteine-type carboxypeptidases; Omegapeptidases; Serine proteinases; Cysteine proteinases; Aspartic proteinases; Metallo proteinases; or Proteinases of unknown mechanism.

[0169] Aminopeptidases include cytosol aminopeptidase (leucyl aminopeptidase), membrane alanyl aminopeptidase, cystinyl aminopeptidase, tripeptide aminopeptidase, prolyl aminopeptidase, arginyl aminopeptidase, glutamyl ami

, gamma-glu-X carboxypeptidase, acylmuramoyl-ala peptidase. Serine proteinases include chymotrypsin, c, metridin, , , coagulation factor Xa, , , , alpha lytic protease, glutamyl, , G, factor viia, coagulation factor ixa, cucumisin, prolyl , coagulation factor Xia, brachyurin, plasma , tissue kallikrein, pancreatic , leukocyte elastase, coagulation factor Xia, , complement component c1rS5, complement component c1 S55, classical-complement pathway c3/c5 convertase, complement factor I, complement , alternative-complement pathway c3/c5 convertase, cerevisin, hypodermin C, lysyl endopeptidase, endopeptidase La, gamma-renin, venombin ab, leucyl endopeptidase, , scutelarin, kexin, subtilisin, oryzin, endopeptidase k, thermomycolin, thermitase, endopeptidase So, T-plasminogen activator, protein C, pancreatic endopeptidase E, ii, IgA-specific serine endopeptidase, U-plasminogen activator, venombin A, , myeloblastin, semenogelase, granzyme A or cytotoxic T-lymphocyte proteinase 1, granzyme B or cytotoxic T-lymphocyte proteinase 2, streptogrisin A, streptogrisin B, glutamyl endopeptidase II, oligopeptidase B, limulus clotting factor c, limulus clotting factor, limulus clotting enzyme, omptin, repressor lexa, bacterial leader peptidase I, togavirin, flavirin. Cysteine proteinases include cathepsin B, papain, ficin, chymopapain, asclepain, clostripain, streptopain, actinide, cathepsin L, cathepsin H, calpain, cathepsin T, glycyl endopeptidase, cancer procoagulant, cathepsin S, picornain 3C, picornain 2A, caricain, ananain, stem bromelain, fruit bromelain, legumain, histolysain, interleukin 1-beta converting enzyme. Aspartic proteinases include pepsin A, B, gastricsin, , .
neopen nopeptidase, X-pro aminopeptidase, bacterial leucyl ami thesin, , retropepsin, pro-opiomelanocortin converting nopeptidase, thermophilic aminopeptidase, clostridial ami enzyme, aspergillopepsin I, aspergillopepsin II, penicil nopeptidase, cytosol alanyl aminopeptidase, lysyl lopepsin, rhizopuspepsin, endothiapepsin, mucoropepsin, aminopeptidase, X-trp aminopeptidase, tryptophanyl ami candidapepsin, Saccharopepsin, rhodotorulapepsin, physa nopeptidase, methionyl aminopeptidas, d-stereospecific ropepsin, acrocylindropepsin, polyporopepsin, pycnopo aminopeptidase, aminopeptidase ey. Dipeptidases include ropepsin, Scytalidopepsin a, Scytalidopepsin b, Xanth X-his , X-arg dipeptidase, X-methyl-his dipepti omonapepsin, , barrierpepsin, bacterial leader dase, cys-gly dipeptidase, glu-glu dipeptidase, pro-X dipep peptidase I. pseudomonapepsin, . Metallo pro tidase, X-pro dipeptidase, met-X dipeptidase, non-stereospe teinases include atrolysin a, microbial , leucol cific dipeptidase, cytosol non-specific dipeptidase, ysin, interstitial collagenase, neprilysin, envely sin, iga-spe , beta-ala-his dipeptidase. Dipeptidyl cific , procollagen N-endopeptidase, peptidases and tripeptidyl peptidases include dipeptidyl thimet oligopeptidase, neurolysin, stromelysin 1, meprin A, peptidase i, dipeptidyl-peptidase ii, iii. procollagen C-endopeptidase, peptidyl-lys metalloendopep dipeptidyl-peptidase iv, dipeptidyl-dipeptidase, tripeptidyl tidase, astacin, stromelysin, 2., matrilysin , aero peptidase I, tripeptidyl-peptidase II. Peptidyl-dipeptidases monolysin, pseudolysin, thermolysin, bacillolysin, aureol include peptidyl-dipeptidase a and peptidyl-dipeptidase b. ysin, coccolysin, mycolysin, beta-lytic Serine-type carboxypeptidases include lysosomal pro-X car metalloendopeptidase, peptidyl-asp metalloendopeptidase, boxypeptidase, serine-type D-ala-D-ala carboxypeptidase, neutrophil collagenase, gelatinase B, leishmanolysin, sac , . 
Metallocarbox charolysin, autolysin, deuterolysin, serralysin, atrolysin B, ypeptidases include , carboxypeptidase atrolysin C, atroXase, atrolysin E. atrolysin F, adamalysin, B. lysine() carboxypeptidase, gly-X carboxypepti horrilysin, ruberlysin, bothropasin, bothrolysin, ophiolysin, dase, , muramoylpentapeptide car trimerelysin I, trimerelysin II, mucrolysin, pitrilysin, insul boxypeptidase, carboxypeptidase h, glutamate carboxypep ysin, O-Syaloglycoprotein endopeptidase, russellysin, mito tidase, . muramoyltetrapeptide chondrial, intermediate, peptidase, dactylysin, nardilysin, carboxypeptidase, Zinc d-ala-d-ala carboxypeptidase, car magnolysin, meprin B, mitochondrial processing peptidase, boxypeptidase A2, membrane pro-X carboxypeptidase, tubu macrophage elastase, choriolysin, toxilysin. Proteinases of linyl-tyr carboxypeptidase, . Omegapep unknown mechanism include thermopsin and multicatalytic tidases include acylaminoacyl-peptidase, peptidyl endopeptidase complex. glycinamidase, pyroglutamyl-peptidase I, beta-aspartyl peptidase, pyroglutamyl-peptidase II, n-formylmethionyl 0170 Certain protease of Pfluorescens are listed in Table peptidase, pteroylpoly-gamma-glutamate A. US 2006/01 10747 A1 May 25, 2006 21
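The HEXXH motif and its stricter form abXHEbbHbc, described in paragraph 0165, lend themselves to a simple sequence scan. The following Python sketch is illustrative only and is not part of the disclosed process. The residue classes are assumptions made for the example: "uncharged" is taken to mean any residue other than D, E, K or R, "hydrophobic" is taken to be A, V, L, I, M, F or W, and position 'a' is left unrestricted because the text says only which residues are most common there. The function name `find_motifs` and the test sequences are hypothetical.

```python
import re

# Loose motif: H-E-X-X-H, where X is any residue.
# A lookahead is used so that overlapping occurrences are all reported.
HEXXH = re.compile(r"(?=(HE..H))")

# Assumed residue classes (not defined in the source text):
UNCHARGED = "[^DEKR]"       # 'b': anything except Asp, Glu, Lys, Arg
HYDROPHOBIC = "[AVLIMFW]"   # 'c': a common hydrophobic set

# Stricter motif a-b-X-H-E-b-b-H-b-c; 'a' and 'X' are left as any residue.
STRICT = re.compile(
    rf"(?=(.{UNCHARGED}.HE{UNCHARGED}{{2}}H{UNCHARGED}{HYDROPHOBIC}))"
)


def find_motifs(sequence, pattern):
    """Return (start_index, matched_text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical test sequences, chosen only to exercise both patterns.
    for seq in ("MKTVVAHELTHAVQQ", "DDKHEKRHDD"):
        print(seq, find_motifs(seq, HEXXH), find_motifs(seq, STRICT))
```

A hit on the loose pattern alone is weak evidence, since the text notes that HEXXH is relatively common; the stricter pattern narrows candidates but still depends on the assumed residue classes.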

TABLE A

Class Family Curated Function Gene Physiology MEROPS Homologs Aspartic A8 (signal peptidase II family) RXFOS383 Lipoprotein signal Processing of numerous bacterial Peptidases peptidase (ec Secreted lipoproteins. 3.4.23.36) A24 (type IV prepilin peptidase family) RXFO5379 type 4 prepilin This membrane-bound peptidase peptidase pild (ec cleaves a specialized leader 3.4.99.—) peptide from type 4 prepilin during its secretion from many bacterial species. Once Secreted, the processed proteins are required for functions including type 4 pilus formation, toxin and other enzyme secretion, gene transfer, and biofilm formation. Cysteine C15 (pyroglutamyl peptidase I family) RXFO2161 Pyrrollidone Removal of pyroglutamyl groups Peptidases carboxylate peptidase from peptides in protein (ec 3.4.19.3) catabolism. C40 RXFO 1968 invasion-associated protein, P60 RXFO4920 invasion-associated protein, P60 RXFO4923 phosphatase-associated protein papq C56 (PfpI endopeptidase family) RXFO1816 protease I (ec 3.4.—.—) Metallopeptidases M1 RXFO8773 Membrane (ec 3.4.11.2) M3 RXFOOS61 Oligopeptidase A (ec prlC Degradation of lipoprotein signal 3.4.24.70) peptides, and other Intracellular oligopeptides. Role in maturation of bacteriophage P22 gp7 precursor. RXFO4631 Zn-dependent M4 (thermolysin family) RXFOS113 Extracellular metalloprotease precursor (ec 3.4.24.—) M41 (FtsH endopeptidase family) RXFOS400 protein Proposed role in proteolytic ftsH (ec 3.4.24-) quality control of regulatory molecules and membrane proteins, in yeast. 
M10 FO4304 Serralysin (ec 3.4.24.40) Serralysin (ec 3.4.24.40) FO1590 Serralysin (ec 3.4.24.40) FO4495 Serralysin (ec 3.4.24.40) FO2796 Serralysin (ec 3.4.24.40) M14 (carboxypeptidase A family) FO9091 Zinc-carboxypeptidase precursor (ec 3.4.17.—) M16 (pitrilysin family) FO3441 Coenzyme paq synthesis protein F (ec 3.4.99.—) FO 1918 Zinc protease (ec 3.4.99.—) FO 1919 Zinc protease (ec 3.4.99.—) FO3699 processing peptidase (ec 3.4.24.64) M17 (leucyl aminopeptidase family) FOO285 Cytosol Contributes to bacterial nutrition. aminopeptidase (ec 3.4.11.1) M18 FO7879 Asparty aminopeptidase (ec 3.4.11.21) RXFOO811 Succinyl dapF. diaminopimelate deSuccinylase (ec 3.5.1.18) RXFO4052 Xaa-His dipeptidase US 2006/01 10747 A1 May 25, 2006 22

TABLE A-continued

Class Family RXF Curated Function Gene Physiology (ec 3.4.13.3) RXFO1822 Carboxypeptidase G2 precursor (ec 3.4.17.11) RXFO4892 N-acyl-L-amino acid amidohydrolase (ec 3.5.1.14) M28 (aminopeptidase Y family) RXFO3488 Alkaline phosphatase isozyme conversion protein precursor (ec 3.4.11.—) M42 (glutamylaminopeptidase family) RXFO5615 Deblocking aminopeptidase (ec 3.4.11.—) M22 RXFO5817 O-sialoglycoprotein endopeptidase (ec 3.4.24.57) RXFO3065 Glycoprotease M23 RXFO1291 endopeptidase, family M23, M37 RXFO3916 Membrane proteins related to RXFO9147 Cell wal endopeptidase, family M23, M37 M24 RXFO4693 Probable role in cotranslational aminopeptidase (ec removal of N-terminal 3.4.11.18) methionine. RXFO3364 Methionine Probable role in cotranslational aminopeptidase (ec removal of N-terminal 3.4.11.18) methionine. RXFO298O Xaa-Pro Involved in intracellular protein aminopeptidase (ec turnover, in bacteria. 3.4.11.9) RXFO6564 Xaa-Pro aminopeptidase (ec 3.4.11.9) M48 (Ste24 endopeptidase family) RXFO5137 HtpX RXF05081 Zinc metalloprotease (ec 3.4.24.—) M50 (S2P protease family) RXFO4692 Membrane metalloprotease Serine Peptidases S1 (chymotrypsin family) RXFO1250 protease do (ec 3.4.21.—) RXF07210 protease do (ec 3.4.21.—) S8 ( family) RXFO6755 serine protease (ec 3.4.21.—) RXFO8517 serine protease (ec 3.4.21.—) RXFO8627 extracellular serine protease (ec 3.4.21.—) RXFO6281 Extracellular serine protease precursor (ec 3.4.21.—) RXFO8978 extracellular serine protease (ec 3.4.21.—) RXFO6451 serine protease (ec 3.4.21.—) S9 (prolyl oligopeptidase family) RXFO2003 Protease ii (ec 3.4.21.83) RXFOO458 S11 (D-Ala-D-Ala carboxypeptidase A RXFO4657 D-alanyl-D-alanine family) endopeptidase (ec 3.4.99.—) RXFOO670 D-alanyl-D-alanine carboxypeptidase (ec 3.4.16.4) S13 (D-Ala-D-Ala peptidase C family) RXFOO133 D-alanyl-meso Acts in synthesis and remodelling diaminopimelate of bacterial cell walls. endopeptidase (ec 3.4.——) RXFO4960 D-alanyl-meso US 2006/01 10747 A1 May 25, 2006 23

TABLE A-continued

Class Family RXF Curated Function Gene Physiology diaminopimelate endopeptidase (ec 3.4.—.—) S14 (ClpP endopeptidase family) RXFO4567 atp-dependent Clp clpP Thought to contribute to protease proteolytic elimination of damaged proteins subunit (ec 3.4.21.92) in heat shock. RXFO4663 atp-dependent Clp clpP Thought to contribute to protease proteolytic elimination of damaged proteins subunit (ec 3.4.21.92) in heat shock. S16 (lon protease family) RXFO4653 atp-dependent protease Thought to contribute to La (ec 3.4.21.53) elimination of damaged proteins in heat shock. RXFO8653 atp-dependent protease La (ec 3.4.21.53) RXFO5943 atp-dependent protease La (ec 3.4.21.53) S24 (LexA family) RXFOO449 Lex A repressor (ec 3.4.21.88) RXFO3397 Lex A repressor (ec 3.4.21.88) S26 (signal peptidase I family) RXFO1181 Signal peptidase I (ec Cleaves signal peptides from 3.4.21.89) Secreted proteins. S33 RXFOS236 Proline iminopeptidase pip3 (ec 3.4.11.5) RXFO4802 Proline iminopeptidase pip1 (ec 3.4.11.5) RXFO4808 Proline iminopeptidase pip2 (ec 3.4.11.5) S41 (C-terminal processing peptidase RXFO6586 Tail-specific protease family) (ec 3.4.21.—) RXFO1037 Tail-specific protease (ec 3.4.21.—) S45 RXFO7170 Penicillin acylase (ec pacB2 3.5.1.11) RXFO6399 Penicillin acylase ii (ec pacB1 3.5.1.11) S49 (protease IV family) RXFO6993 possible protease sohb (ec 3.4.—.—) RXFO1418 protease iv (ec 3.4.——) S58 (Dmp A aminopeptidase family) RXFO6308 D-aminopeptidase (ec 3.4.11.19) Threonine T1 (proteasome family) RXFO 1961 atp-dependent protease Thought to contribute to Peptidases hslV (ec 3.4.25.—) elimination of damaged proteins in heat shock. 
T3 (gamma-glutamyltransferase family) RXFO2342 Gamma glutamyltranspeptidase (ec 2.3.2.2) RXF04424 Gamma glutamyltranspeptidase (ec 2.3.2.2) Unclassified U32 RXF00428 protease (ec 3.4——) Peptidases RXF02151 protease (ec 3.4——) RXFO4715 Muramoyltetrapeptide carboxypeptidase (ec 3.4.17.13) RXFO4971 PmbA protein pmbA The product of the PmbA gene ({Escherichia coli) facilitates the secretion of the peptide microcin B17, removing an N terminal, 26-amino acid leader peptide (Madison et al., 1997). RXFO4968 TldD protein Non MEROPS Proteases RXFOO325 Repressor protein C2 RXFO2689 Microsomal dipeptidase (ec 3.4.13.19) RXFO2739 membrane dipeptidase (3.4.13.19) RXFO3329 Hypothetical Cytosolic Protein RXF02492 Xaa-Pro dipeptidase US 2006/01 10747 A1 May 25, 2006 24

TABLE A-continued

| Class | Family | RXF | Curated Function | Gene | Physiology |
|---|---|---|---|---|---|
| | | | (ec 3.4.13.9) | | |
| | | RXF04047 | caax amino terminal protease family | | |
| | | RXF08136 | protease (transglutaminase-like protein) | | |
| | | RXF09487 | Zinc metalloprotease (ec 3.4.24.-) | | |
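One way to use Table A in the selection step of paragraph 0154 is as a lookup: given the identifiers of genes found to be up-regulated, keep those that Table A lists as proteases and group them by peptidase class. The Python sketch below is a minimal illustration under stated assumptions. It holds only a handful of Table A rows, the RXF identifiers are written with the letter "O" of the extracted text read as the digit zero, and the names `TABLE_A_SUBSET` and `protease_candidates` as well as the example input list are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class ProteaseEntry(NamedTuple):
    peptidase_class: str
    family: str
    function: str


# A few rows transcribed from Table A (identifiers normalized; an assumption).
TABLE_A_SUBSET = {
    "RXF01961": ProteaseEntry("Threonine", "T1", "atp-dependent protease hslV"),
    "RXF04567": ProteaseEntry("Serine", "S14", "atp-dependent Clp protease proteolytic subunit"),
    "RXF04653": ProteaseEntry("Serine", "S16", "atp-dependent protease La"),
    "RXF01250": ProteaseEntry("Serine", "S1", "protease do"),
    "RXF05400": ProteaseEntry("Metallo", "M41", "cell division protein ftsH"),
}


def protease_candidates(upregulated_ids):
    """Group up-regulated gene IDs that appear in the table by peptidase class."""
    grouped = defaultdict(list)
    for rxf_id in upregulated_ids:
        key = rxf_id.upper()
        entry = TABLE_A_SUBSET.get(key)
        if entry is not None:  # IDs not in the table are simply skipped
            grouped[entry.peptidase_class].append((key, entry.family))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical list of up-regulated genes; rxf05399 (dnaK) is not a protease.
    print(protease_candidates(["RXF01961", "rxf05399", "RXF04653"]))
```

The same structure could hold the folding modulators of Table D; keeping the two tables separate makes it explicit whether a candidate is a protease target or a folding modulator target.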

0171 Certain proteases of E. coli origin are listed in Table B.

TABLE B Class Family Code Peptidase or homologue (Subtype) Gene Aspartic A8 AO8.OO1 signal peptidase II lsp A Peptidases A24A A24.001 type IV prepilin peptidase 1 (EtpN etpN protein (plasmid p0157) A24.OO1 type IV prepilin peptidase 1 (CofF cof protein) A24.OO1 type IV prepilin peptidase 1 (Hof D hof)/hopD/hopO protein) A24.OO3 type IV prepilin peptidase 2 (HopD hopD/ECS4.188 protein) A24 amily A24A unassigned peptidases pppA(ORF F310 unassigne (ORF F310 protein) A24 amily A24A unassigned peptidases pill J unassigne (Pill J protein (plasmid R721)) A24 amily A24A unassigned peptidases bfpP/bfpG unassigne (BfpP protein (plasmid pMAR2)) A24 amily A24A unassigned peptidases PILU unassigne (Pill J protein) A26 A26.OO1 omptin ompTIECS1663/B0565 A26.OOS proteinase Sop A SopA Cysteine C26 C26 amily C26 unassigned peptidases YCLZ2490, ECS1875 Peptidases unassigne C40 C4O.OO)4 sprig.p. (Escherichia-type) (spr spr protein) C4O amily C40 unassigned peptidases nlpC/C2104/Z2737/ unassigne (NlpC protein) ECS241S C4O amily C40 unassigned peptidases Yaf unassigne (YafL protein) C4O amily C40 unassigned peptidases unassigne (chitinase 3) C4O amily C40 unassigned peptidases ydho unassigne (Yaho protein) C39 C39.OOS colicin V processing peptidase (CvaB cvaB protein) C39.OOS colicin V processing peptidase (MtfB mtfB protein) C39 amily C39 unassigned peptidases mchFMCLB unassigned (microcin H47 secretion protein MchF) C56 C56 amily C56 unassigned peptidases yhbo unassigned (YhbO protein) C56 amily C56 unassigned peptidases c4536 unassigned (c4536 protein) Metallopeptidases M1 MO1.OOS alanyl aminopeptidase pepN (proteobacteria) M3A MO3.004 oligopeptidase A prlCopdA MO3.OOS peptidyl-dipeptidase Dep dcp Z2160 ECS2147 MO3.OOS peptidyl-dipeptidase Dep dep M41 M41.001 FtsH endopeptidase BiftSEHFECS4057 M66 M66.001 SteE protease stcE M1 SD M1S subfamily M15D unassigned didpX/vanx/B1488/ unassigned peptidases (Van X protein) Z2222 ECS2092 M16A M16.OO1 pitrilysin ptr/ECs3678 M16B M16 subfamily 
M16B unassigned pqqL/yddC

TABLE B-continued

Class Family Code Peptidase or homologue (Subtype) Gene unassigned peptidases (PdqL protein) M17 M17.003 aminopeptidase A (bacteria) pepAixerB M17.004 PepB aminopeptidase pepB/Z3790/ECS3389 M24.001 1 map M24.003 X-Pro dipeptidase (bacteria) pepO/ECs4775 M24.004 aminopeptidase P (bacteria) pepP M24 Subfamily M24B unassigned yghTypdF/B2385/ unassigne peptidases (YChT protein) c2924 M2O.O10 DapF peptidase (Succinyl dapF/msgB/C2999 diaminopimelate desuccinylase) M2O Subfamily M20A unassigned ygey unassigne peptidases (YgeY protein) M2O.OO3 peptidase T pepTIZ1832/ECS1572 M2O.007 X-His dipeptidase pepD/pepH/ECs0264 M2O amily M20D unassigned peptidases ydaJ/ECs1922 unassigne (Ydal protein) M28 Subfamily M28A unassigned unassigne peptidases (YfbL protein) M28.OOS AP aminopeptidase iap M42 amily M42 unassigned peptidases yhC) unassigne (YhC) protein) M42 amily M42 unassigned peptidases frvX unassigne (FrvX protein) M42 amily M42 unassigned peptidases frvX/b2384/ypdE unassigne (FrvX protein) M38 M38.001 beta-aspartyl dipeptidase M22 M22.001 O-Sialoglycoprotein endopeptidase M22.002 yeaZ protein

M23B M23.006 YibP peptidase (YibP protein) M23 Subfamily M23B unassigned unassigned peptidases (YebA protein) M48.002 HtpX endopeptidase HtpX M48 Subfamily M48B unassigned YGGGiC3521 unassigned peptidases M48 Subfamily M48B unassigned unassigned peptidases M48 Subfamily M48B unassigned unassigned peptidases (YggG protein) M48 Subfamily M48B unassigned ycaL/C1047/Z1255/ unassigned peptidases (YcaL protein) ECSO992 MSOA MSO.OO4 YaeL protease (YAEL protein) ecife, YAELBO176, ZO187;ECSO178, CO213 MS2 MS2.001 HybD endopeptidase (HybD protein) hybD/ECS3878 MS2.002 HyalD endopeptidase (HyalD protein) hyalD MS2.003 HycI endopeptidase (HycI protein) hycI/C3277 Serine SO1.260 B1598 endopeptidase b1598 Peptidases SO1273 protease Do SO1.274 DegO

SO1275 DegS SO6.OO2 EspPg.p. (Escherichia coli) SO6.OO3 Tsh peptidase (Escherichia coli) (Tsh protein) SO6.OO3 Tsh peptidase (Escherichia coli) cO393 SO6.004 Pet endopeptidase Sat SO6.004 Pet endopeptidase SO6.OOS Pic endopeptidase (Shigella flexneri) S6 amily S6 unassigned peptidases unassigned (eatA protein) S6 amily S6 unassigned peptidases cO3SO unassigned (c0350 protein) S6 amily S6 unassigned peptidases espC unassigned (EspC protein) S6 amily S6 unassigned peptidases epeA unassigned (epeA protein) S6 amily S6 unassigned peptidases unassigned S8 Subfamily S8A unassigned peptidases unassigned SO9.010 oligopeptidase B ptrB SO9.010 oligopeptidase B ptrB/C2255 family S9 unassigned peptidases YFEHRC3060 b2S34, US 2006/01 10747 A1 May 25, 2006 26

TABLE B-continued

Class Family Code Peptidase or homologue (Subtype) Gene unassigned S11 S11.OO2 murein-DD-endopeptidase pbpG S11.003 penicillin-binding protein 6 dacCZ1066, ECSO919 S11.003 penicillin-binding protein 6 dacDphsE/ECs2812 (penicillin-binding protein pbp-6B) S11.003 penicillin-binding protein 6 dacA S12 S12 amily S12 unassigned peptidases c2452 unassigned (c2452 protein) S12 amily S12 unassigned peptidases yai H/CO480 unassigned (YaiH protein) S13 S13.OO1 D-Ala-D-Ala peptidase C dacBECS4061 S14 S14.OO1 endopeptidase Clip (type 1) clpP/lopP/ECSO491 S14 amily S14 unassigned peptidases ZO967 ECSO829 unassigned (ECs0829 protein) S14 amily S14 unassigned peptidases HOO22,Z2112 ECS2960. unassigned (ECs2960 protein) L34 S16 S16.OO1 on protease on deg ECs0493 S16 amily S16 unassigned pep OnB.Z1305, ECS1039 unassigned (ECS1039 protein) S16 amily S16 unassigned peptidases c1091 unassigned (c1091 protein) S24 S24.OO1 repressor Lex A (Lex A protein) exA exra S24.003 Umud protein S24.003 Umud protein umuDC1631 S26 S26A S26.001 signal peptidase I S26.014 traF plasmid-transfer protein (TraF traF protein) S33 S33 amily S33 unassigned peptidases bioIFC4189, Z4767; unassigned (BioH protein) ECS4255 S41A S41.OO1 C-terminal processing protease-1 pre?tsp/ECS2540/ Z2877, C2239 S45 penicillin G acylase precursor 80 S49 protease IV spp.A/ECs2472/C2170 sohB endopeptidase SOBJECS1844;Z2538, C1737 SS1.OO1 dipeptidase E pepE SS4 amily S54 unassigned peptidases cO741 unassigned (c0741 protein) SS4 amily S54 unassigned peptidases glpG/C4201//Z4784/ unassigned (glycerophosphate dehydrogenase) ECS4267 Threonine TO1.006 HsV component of HslUV peptidase hSIV Peptidases TO2.002 asparaginase TO3.001 gamma-glutamyltransferase 1 (bacterial) C-terminal processing protease-1 pre?tsp/ECS2540/ Z2877, C2239 Unclassified murein endopeptidase mepA/ECs3212/C2874 Peptidases U32 unassigned amily U32 unassigned peptidases (YolcP protein) U32 amily U32 unassigned peptidases yegO/C2611 unassigned (YegCR protein) U32 
amily U32 unassigned peptidases YHBUC3911 Z4519, unassigned (YhbU protein) ECS4039 U35 U35 unassigned amily U35 unassigned peptidases U35 amily U35 unassigned peptidases ECS4973 unassigned (ECs4973 protein) U49 U49.OO Lit protease (Escherichia coli) U61.OO muramoyl etrapeptide carboxypeptidase amily U61 unassigned peptidases mccF unassigned (MccF protein) U62 U62.OO microcin processing peptidase 1 microcin-processing peptidase 2 tldDECS4117 endopeptidase ECP 32 (Escherichia US 2006/01 10747 A1 May 25, 2006 27

TABLE B-continued Class Family Code Peptidase or homologue (Subtype) Gene coli)

0172 Certain proteases of S. cerevisiae origin are listed in Table C.

TABLE C Class Family Code Peptidase or homologue (Subtype) Gene Aspartic A1 A01.015 barrierpepsin bar1 Peptidases A01.018 Saccharopepsin pep4 pho9 AO1.030 yapsin 1 yap3 AO1.031 yapsin 2 mkc7 AO1.035 yapsin 3 YPS3 A01..UPW family A1 unassigned peptidases YPS7, D9476.8/ YDR349C A01..UPW family A1 unassigned peptidases YIRO39C YIRO39C protein) A2D A02.022 Ty3 transposon (Saccharomyces POL3, TY3-2 orfB, cerevisiae) endopeptidase TY3B (retrotransposon Ty3-1) A11B A11.003 Ty1 transposon (Saccharomyces Ty1B cerevisiae) endopeptidase (transposon Ty1-17 protein B) A11.003 Ty1 transposon (Saccharomyces Ty1B cerevisiae) endopeptidase (transposon Ty1 protein B) A11.003 Ty1 transposon (Saccharomyces Ty1B cerevisiae) endopeptidase (transposon Ty1 protein B) A11X A11.UPW family A11 unassigned peptidases (retrotransposon Ty4) A22B A22.008 YKL100c protein (Saccharomyces YKL100c cerevisiae) Cysteine C1B C01.085 bleomycin hydrolase (yeast) GAL6.YCP1, LAP3 Peptidases C2 CO2.008 calpain-7 YMR154C/Cp11/ Rim13 C12 C12.002 ubiquitinyl hydrolase YUH1 yuhl C13 C13.005 glycosylphosphatidylinositol:protein 9798.2 transamidase C19 C19.002 Ubp1 ubiquitin peptidase ubp1 C19.003 Ubp2 ubiquitin peptidase ubp2 C19.004 Ubp3 ubiquitin peptidase ubp3 C19.OOS Doa4 ubiquitin peptidase DOA4 C19.006 Ubp5 ubiquitin peptidase ubp5 C19.079 UBP6 () yfro1Ow (YFRO1OW protein) C19.UPW family C19 unassigned peptidases YNL186W (YNL186W protein) C19.UPW family C19 unassigned peptidases ubp9 (UBP9) C19.UPW family C19 unassigned peptidases YBLO67C (YBLO67C protein) C19.UPW family C19 unassigned peptidases UBP12YBRO58C (YBRO58C protein) C19.UPW family C19 unassigned peptidases UBP16YPLO72W (ubiquitin carboxy-terminal hydrolase LPF12W 6) C19.UPW family C19 unassigned peptidases YMR3O4W (YMR304W protein) ym9952.06 C19.UPW family C19 unassigned peptidases YMR223W (YMR223W protein) ym9959.05 C19.UPW family C19 unassigned peptidases ubp7 (UBP7) C19.UPW family C19 unassigned peptidases ubp13 (UBP13) C44 C44.971 
glucosamine-fructose-6-phosphate aminotransferase


TABLE C-continued

Class Family Code Peptidase or homologue (Subtype) Gene M67.002 Jab1/MPN domain metalloenzyme M67.973 26S proteasome non-ATPase regulatory Subunit 7 Serine Peptidases S1C SO1.434 Nma111 endopeptidase (Saccharomyces cerevisiae) (YNL123W protein) S8A SO8.052 cerevisin prb S08.UPA subfamily S8A unassigned peptidases YSP3 (YSP3 protein) S08.UPA subfamily S8A unassigned peptidases (YCR54C protein) S8B SO8.07.0 kexin kex2 S9B SO9.OOS dipeptidyl aminopeptidase A ste13/yci1 SO9.006 dipeptidyl (fungus) dap2 S9X SO9.UPW family S9 unassigned peptidases YNL32OW (Ynl320w protein) S10 S10.001 carboxypeptidase Y pro S10.007 kex carboxypeptidase kex1 S10.UPW family S10 unassigned peptidases (YBR139W protein) S16 S16.002 PIM1 endopeptidase lon pim1 S26A S26.002 mitochondrial inner membrane protease imp1 1 (1) S26.012 mitochondrial inner membrane protease imp2 2 (2) S2.6B S26.010 signalase () 21 kDa Sec11 component S33.UPW family S33 unassigned peptidases S33.UPW family S33 unassigned peptidases SS4 SS4.007 Pcp1 protein (Saccharomyces cereviseae) (YGR101W protein) S59 SS9.001 nucleoporin 145 Threonine T1A TO1.010 proteasome catalytic Subunit 1 Peptidases TO1.011 proteasome catalytic Subunit 2 pup1 TO1.012 proteasome catalytic Subunit 3 pre2.prg1 TO1983 proteasome subunit beta 3 pup3 TO1.984 proteasome subunit beta 2 pre1 TO1.986 proteasome subunit beta 1 pre7/prs3 TO1.987 proteasome subunit beta 4 brea. T1X TO1971 proteasome subunit alpha 6 prs2/pre2 TO1972 proteasome subunit alpha 2 pre3 prS4 TO1973 proteasome subunit alpha 4 preg/prs5 T01.974 proteasome subunit alpha 7 bre6 TO1975 proteasome subunit alpha 5 pup2 TO1976 proteasome subunit alpha 1 bres TO1977 proteasome subunit alpha 3 pre10/prs1 pre1 T3 TO3.012 gamma-glutamyltransferase L80O3.4 (Saccharomyces) (YLR299w protein) T5 T05.001 ornithine acetyltransferase precursor Unclassified U48 U48.001 prenyl protease 2 Peptidases

Folding Modulators

0173 The identified up-regulated genes or gene products can be one or more folding modulators. Folding modulators can, for example, be HSP70 proteins, HSP110/SSE proteins, HSP40 (DNAJ-related) proteins, GRPE-like proteins, HSP90 proteins, CPN60 and CPN10 proteins, cytosolic chaperonins, HSP100 proteins, small HSPs, calnexin and calreticulin, PDI and thioredoxin-related proteins, peptidyl-prolyl isomerases, cyclophilin PPIases, FK-506 binding proteins, parvulin PPIases, individual chaperonins, protein-specific chaperones, or intramolecular chaperones. Folding modulators are generally described in "Guidebook to Molecular Chaperones and Protein-Folding Catalysts" (1997) ed. M. Gething, Melbourne University, Australia.

0174 The best characterized molecular chaperones in the cytoplasm of E. coli are the ATP-dependent DnaK-DnaJ-GrpE and GroEL-GroES systems. Based on in vitro studies and homology considerations, a number of additional cytoplasmic proteins have been proposed to function as molecular chaperones in E. coli. These include ClpB, HtpG and IbpA/B, which, like DnaK-DnaJ-GrpE and GroEL-GroES, are heat-shock proteins (Hsps) belonging to the stress regulon. The trans conformation of X-Pro bonds is energetically favored in nascent protein chains; however, ~5% of all prolyl peptide bonds are found in a cis conformation in native proteins. The trans to cis isomerization of X-Pro bonds is rate limiting in the folding of many polypeptides and is catalyzed in vivo by peptidyl prolyl cis/trans isomerases (PPIases). Three cytoplasmic PPIases, SlyD, SlpA and trigger factor (TF), have been identified to date in E. coli. TF is a 48 kDa protein associated with 50S ribosomal subunits that has been postulated to cooperate with chaperones in E. coli to guarantee proper folding of newly synthesized proteins. At least five proteins (thioredoxins 1 and 2, and glutaredoxins 1, 2 and 3, the products of the trxA, trxC, grxA, grxB and grxC genes, respectively) are involved in the reduction of disulfide bridges that transiently arise in cytoplasmic enzymes. Thus, identified genes can be disulfide bond forming proteins or chaperones that allow proper disulfide bond formation.

0175 Certain folding modulators in P. fluorescens are listed in Table D.

TABLE D

| RXF | Gene | Function | Family |
|---|---|---|---|
| GroES/EL | | | |
| rxf02095 | groES | chaperone | Hsp10 |
| rxf06767::rxf02090 | groEL | chaperone | Hsp60 |
| RXF01748 | ibpA | Small heat-shock protein (sHSP) IbpA PA3126; Acts as a holder for GroESL folding | Hsp20 |
| RXF03385 | hscB | Chaperone protein hscB | Hsp20 |
| Hsp70 (DnaK/J) | | | |
| rxf05399 | dnaK | chaperone | Hsp70 |
| RXF06954 | dnaK | chaperone | Hsp70 |
| RXF03376 | hscA | chaperone | Hsp70 |
| RXF03987 | cbpA | Curved DNA-binding protein, dnaJ like activity | Hsp40 |
| RXF05406 | dnaJ | Chaperone protein dnaJ | Hsp40 |
| RXF03346 | dnaJ | Molecular chaperones (DnaJ family) | Hsp40 |
| Hsp100 (Clp/Hsl) | | | |
| RXF04587 | clpA | atp-dependent clp protease atp-binding subunit clpA | Hsp100 |
| RXF08347 | clpB | ClpB protein | Hsp100 |
| RXF04654 | clpX | atp-dependent clp protease atp-binding subunit clpX | Hsp100 |
| RXF01957 | hslU | atp-dependent hsl protease atp-binding subunit hslU | Hsp100 |
| RXF01961 | hslV | atp-dependent hsl protease atp-binding subunit hslV | Hsp100 |
| Hsp33 | | | |
| RXF04254 | yrfI | 33 kDa chaperonin (Heat shock protein 33 homolog) (HSP33). | Hsp33 |
| Hsp90 | | | |
| RXF05455 | htpG | Chaperone protein htpG | Hsp90 |
| SecB | | | |
| RXF02231 | secB | secretion specific chaperone SecB | SecB |
| Disulfide Bond Isomerases | | | |
| RXF07017 | dsbA | disulfide isomerase | DSBA |
| RXF08657 | dsbA/dsbC/dsbG/fernA | disulfide isomerase | DSBA oxidoreductase |
| rxf01002 | dsbA/dsbC | disulfide isomerase | DSBA oxidoreductase/Thioredoxin |
| rxf03307 | dsbC | disulfide isomerase | glutaredoxin/Thioredoxin |
| rxf04890 | dsbG | disulfide isomerase | glutaredoxin/Thioredoxin |
| Peptidyl-prolyl cis-trans isomerases | | | |
| RXF03768 | ppiA | Peptidyl-prolyl cis-trans isomerase A (ec 5.2.1.8) | PPIase: cyclophilin type |
| RXF05345 | ppiB | Peptidyl-prolyl cis-trans isomerase B. | PPIase: cyclophilin type |
| RXF06034 | fklB | Peptidyl-prolyl cis-trans isomerase FklB. | PPIase: FKBP type |
| RXF06591 | fklB/fkbP | fk506 binding protein Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) | PPIase: FKBP type |
| RXF05753 | fklB; fkbP | Peptidyl-prolyl cis-trans isomerase (ec 5.2.1.8) | PPIase: FKBP type |
| RXF01833 | slyD | Peptidyl-prolyl cis-trans isomerase SlyD. | PPIase: FKBP type |
| RXF04655 | tig | Trigger factor, ppiase (ec 5.2.1.8) | PPIase: FKBP type |
| RXF05385 | yaad | Probable FKBP-type 16 kDa peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPiase) (Rotamase). | PPIase: FKBP type |
| RXF00271 | | Peptidyl-prolyl cis-trans isomerase (ec 5.2.1.8) | PPIase: FKBP type |
| pili assembly chaperones (papD like) | | | |

TABLE D-continued

gene function Family

FO6068 Cl Chaperone protein cup pi i b 8. F05719 (C D Chaperone protein ecpD pi i b 8. FO3406 (C pl); Chaperone protein ecpD pi i b cSuC 8. FO4296 (C Chaperone protein ecpD pi i b Cl 8. FO4553 (C pl); Chaperone protein ecpD pi i b Cl 8. FO4554 (C pl); Chaperone protein ecpD pi i b Cl 8. F05310 (C pl); Chaperone protein ecpD pi i b Cl 8. FOS304 (C pl); Chaperone protein ecpD pi i b Cl 8. F05073 Gram-negative pili assembly chaperone periplasmic pi i b function 8.

0176 Certain folding modulators in E. coli are listed in Table E.

TABLE E Uniprot Accession Uniprot ID Annotation Family GroES.EL

P05380 CH10 ECOLI 10 kDa chaperonin PO6139 CH6O ECOLI 60 kDa chaperonin Hsp70 (DnaK/J) PO4475 DNAK EC OLI haperone protein dnaK 70 P77319 HSCC ECOLI haperone protein hscC 70 P36659 CBPA ECOLI urved DNA-binding protein chp.A 40 P31680 DJLA ECOLI na-like protein dIA, rScG 40 PO8622 DNAJ ECOLI haperone protein dna 40 P291.31 FTSN ECOLI ell division protein ftsN 40 PO9372 GRPE ECOLI rpE protein G P31658 HCHA ECOLI haperone protein hchA 31 Hsp100 (Clp/Hsl)

P15716 CLPA ECOLI ATP-dependent Clp protease ATP-binding Subunit clpA PO3815 CLPB ECOLI ClpB protein P33138 CLPX ECOLI ATP-dependent Clp protease ATP-binding subunit clpX P321.68 HSLU ECOLI ATP-dependent hsil protease ATP-binding subunit hslU, clpY Small Heat Shock Proteins P29209 IBPA ECO L 16 kDa heat shock protein A. P29210 IBPB ECO LI 16 kDa heat shock protein B. Not Part of a Larger Group P36.662 TORD ECOLI Chaperone protein torD Tor) P15040 SECB ECOLI Protein-export protein secB SecB P45803 HSLO ECOLI 33 kDa chaperonin Hsp33 P10413 HTPG ECOLI Chaperone protein httpG Hsp90 HSCAB

P36541 HSCA ECOLI Chaperone protein hscA Hsp66 P36540 HSCB ECOLI Co-chaperone protein hsch3 Hsp20 Lipoprotein Carrier Protein

P61316 LOLA ECOLI Outer-membrane lipoprotein carrier protein LolA precursor
P61320 LOLB ECOLI Outer-membrane lipoprotein lolB precursor LolB

TABLE E-continued
Uniprot Accession | Uniprot ID | Annotation | Family

Disulfide Bond Isomerases
P24991 | DSBA_ECOLI | Thiol:disulfide interchange protein dsbA precursor |
P30018 | DSBB_ECOLI | Disulfide bond formation protein B | Disulfide Bond Oxidoreductase
P21892 | DSBC_ECOLI | Thiol:disulfide interchange protein dsbC precursor |
P36655 | DSBD_ECOLI | Thiol:disulfide interchange protein dsbD precursor (EC 1.8.1.8) (Protein-disulfide reductase) |
P33926 | DSBE_ECOLI | Thiol:disulfide interchange protein dsbE (Cytochrome c biogenesis protein ccmG) |
 | DSBG_ECOLI | Thiol:disulfide interchange protein dsbG precursor | Disulfide Bond Oxidoreductase

Peptidyl-prolyl cis-trans isomerases
P22257 | TIG_ECOLI | Trigger factor | PPIase: FKBP type
P45523 | FKBA_ECOLI | FKBP-type peptidyl-prolyl cis-trans isomerase fkpA precursor | PPIase: FKBP type
P39311 | FKBB_ECOLI | FKBP-type 22 kDa peptidyl-prolyl cis-trans isomerase | PPIase: FKBP type
P22563 | FKBX_ECOLI | FKBP-type 16 kDa peptidyl-prolyl cis-trans isomerase | PPIase: FKBP type
P30856 | SLYD_ECOLI | FKBP-type peptidyl-prolyl cis-trans isomerase slyD | PPIase: FKBP type
P20752 | PPIA_ECOLI | Peptidyl-prolyl cis-trans isomerase A precursor | PPIase: Cyclophilin
P23869 | PPIB_ECOLI | Peptidyl-prolyl cis-trans isomerase B | PPIase: Cyclophilin
P39159 | PPIC_ECOLI | Peptidyl-prolyl cis-trans isomerase C | PPIase: PPIC type
P77241 | PPID_ECOLI | Peptidyl-prolyl cis-trans isomerase D | PPIase: PPIC type
P21202 | SURA_ECOLI | Survival protein surA precursor | PPIase: Parvulin type

Pili assembly chaperones (papD like)
P53516 | AFAB_ECOLI | Chaperone protein afaB precursor | Pili Assembly PapD
P33128 | ECPD_ECOLI | Chaperone protein ecpD precursor | Pili Assembly PapD
P31697 | FIMC_ECOLI | Chaperone protein fimC precursor | Pili Assembly PapD
P77249 | SFMC_ECOLI | Chaperone protein sfmC precursor | Pili Assembly PapD
P75749 | YBGP_ECOLI | Hypothetical fimbrial chaperone ybgP precursor | Pili Assembly PapD
P40876 | YCBF_ECOLI | Hypothetical fimbrial chaperone ycbF precursor | Pili Assembly PapD
P75856 | YCBR_ECOLI | Hypothetical fimbrial chaperone ycbR precursor | Pili Assembly PapD
P33342 | YEHC_ECOLI | Hypothetical fimbrial chaperone yehC precursor | Pili Assembly PapD
P77599 | YFCS_ECOLI | Hypothetical fimbrial chaperone yfcS precursor | Pili Assembly PapD
P28722 | YHCA_ECOLI | Hypothetical fimbrial chaperone yhcA precursor | Pili Assembly PapD
P77616 | YQIH_ECOLI | Hypothetical fimbrial chaperone yqiH precursor | Pili Assembly PapD
P42914 | YRAI_ECOLI | Hypothetical fimbrial chaperone yraI precursor | Pili Assembly PapD
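As a rough illustration of how the rows of Table E might be used once a set of up-regulated genes has been identified, the sketch below looks up a few entries and reports the family label printed for each. The function name `annotate_upregulated`, the dictionary layout, and the example input list are hypothetical and not part of the application; entry names are written in the `NAME_SPECIES` style, which is assumed to be how labels such as "FIMC ECOLI" were originally printed.

```python
# Illustrative subset of Table E: entry name -> (accession, family label).
# Only four rows are copied here; the full table would be loaded the same way.
FOLDING_MODULATORS = {
    "FKBB_ECOLI": ("P39311", "PPIase: FKBP type"),
    "FKBX_ECOLI": ("P22563", "PPIase: FKBP type"),
    "AFAB_ECOLI": ("P53516", "Pili Assembly PapD"),
    "FIMC_ECOLI": ("P31697", "Pili Assembly PapD"),
}


def annotate_upregulated(entry_names, table=FOLDING_MODULATORS):
    """Return {entry name: family} for the names that appear in the table.

    Names absent from the table are skipped rather than guessed.
    """
    return {name: table[name][1] for name in entry_names if name in table}


# Hypothetical list of up-regulated genes from a profile comparison.
hits = annotate_upregulated(["FIMC_ECOLI", "FKBX_ECOLI", "UNKNOWN_ECOLI"])
print(hits)
```

Keeping the lookup keyed on the entry name makes it easy to extend the dictionary with further rows; anything not listed is simply left unannotated.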

0177 Certain folding modulators of S. cerevisiae are shown in Table F.

TABLE F
Uniprot Accession | Uniprot ID | GO Source | Annotation | Family

GroES/EL
P19882 | HS60_YEAST | GOA: | Heat shock protein 60, mitochondrial precursor | Hsp60
P38228 | TC62_YEAST | GOA: interpro | Mitochondrial chaperone TCM62 | Hsp60
P38910 | CH10_YEAST | GOA: interpro | 10 kDa heat shock protein, mitochondrial | Hsp10

Hsp70 (DnaK/J)
P25491 | MAS5_YEAST | GOA: interpro | Mitochondrial protein import protein MAS5, Ydj1 | Hsp40
P10591 | HS71_YEAST | PMID: 9789005 | Heat shock protein SSA1 | Hsp70
P10592 | HS72_YEAST | PMID: 9448096 | Heat shock protein SSA2 | Hsp70

TABLE F-continued
Uniprot Accession | Uniprot ID | GO Source | Annotation | Family

P11484 | HS75_YEAST | | Heat shock protein SSB1 | Hsp70
P40150 | HS76_YEAST | | Heat shock protein SSB2 | Hsp70
P09435 | HS73_YEAST | PMID: 7867784 | Heat shock protein SSA3 | Hsp70
P22202 | HS74_YEAST | | Heat shock protein SSA4 | Hsp70
P25294 | SIS1_YEAST | GOA: interpro | SIS1 protein | Hsp40
P32527 | ZUO1_YEAST | GO: 0003754 | Zuotin | Hsp40
P35191 | MDJ1_YEAST | GOA: interpro | MDJ1 protein, mitochondrial precursor | Hsp40
P12398 | HS77_YEAST | PMID: 8654364 | Heat shock protein SSC1, mitochondrial precursor |
P38523 | GRPE_YEAST | GOA: interpro | GrpE protein homolog, mitochondrial precursor, MGE1 |
P14906 | SC63_YEAST | GOA: spkw | Translocation protein SEC63 | Hsp40
P16474 | GR78_YEAST | | GRP 78, BIP, Kar2 | Hsp70
P25303 | SCJ1_YEAST | GOA: interpro | DnaJ-related protein SCJ1 | Hsp40
P39101 | CAJ1_YEAST | GOA: interpro | CAJ1 protein | Hsp40
P48353 | HLJ1_YEAST | GOA: interpro | HLJ1 protein | Hsp40
P39102 | XDJ1_YEAST | GOA: interpro | XDJ1 protein | Hsp40
P52868 | YGM8_YEAST | GOA: interpro | Hypothetical 41.0 kDa protein in EG1-SOH1 intergenic region | Hsp40
P53940 | YNH7_YEAST | GOA: interpro | Hypothetical 58.9 kDa protein in PM1-MKS1 intergenic region | Hsp40
P38353 | SSH1_YEAST | | Sec sixty-one protein homolog |
P36016 | LHS1_YEAST | GOA: spkw | Heat shock protein 70 homolog LHS1, SSI1 |
P38788 | YHM4_YEAST | PMID: 11054575 | Heat shock protein 70 homolog YHR064C | Hsp70

Hsp110/Sse
P32589 | HS78_YEAST | PMID: 10480867 | Heat shock protein homolog SSE1 | SSE
P32590 | HS79_YEAST | | Heat shock protein homolog SSE2 | SSE

Hsp100 (Clp/Hsl)
P31539 | H104_YEAST | GOA: interpro | Heat shock protein 104 |
P33416 | HSP7_YEAST | GOA: spkw | Heat shock protein 78, mitochondrial precursor |
P38323 | MCX1_YEAST | GOA: interpro | Mitochondrial clpX-like chaperone MCX1 |

Small Heat Shock Proteins
P15992 | HS26_YEAST | PMID: 10581247 | Heat shock protein 26 | Small Hsp

Prefoldin
P48363 | PFD3_YEAST | GOA: interpro | Probable prefoldin subunit 3 | Prefoldin
Q04493 | PFD5_YEAST | GOA: interpro | Prefoldin subunit 5 | Prefoldin
P43573 | YFC3_YEAST | GOA: interpro | Hypothetical 91.4 kDa protein in STE2-FRS2 intergenic region | Prefoldin
P46988 | PFD1_YEAST | GOA: spkw | Prefoldin subunit 1 | KE2
P40005 | PFD2_YEAST | GOA: spkw | Prefoldin subunit 2 | KE2
P53900 | PFD4_YEAST | GOA: spkw | Prefoldin subunit 4 | KE2
P52553 | PFD6_YEAST | GOA: spkw | Prefoldin subunit 6 | KE2

P02829 | HS82_YEAST | GOA: interpro | Heat shock protein HSP82 | Hsp90
P15108 | HS83_YEAST | GOA: interpro | Heat shock cognate protein HSC82 | Hsp90
P06101 | CC37_YEAST | GOA: spkw | Hsp90 co-chaperone Cdc37 | Cdc37
P33313 | CNS1_YEAST | GOA: spkw | Cyclophilin seven suppressor 1 | CNS1
P15705 | STI1_YEAST | PMID: 8972212 | Heat shock protein STI1 |

Calnexin
P27825 | CALX_YEAST | GOA: spkw | Calnexin homolog precursor | Calnexin

Cytosolic Chaperonins T-complex
P12612 | TCPA_YEAST | GOA: interpro | T-complex protein 1, alpha subunit | TCP-1, Hsp60
P39076 | TCPB_YEAST | GOA: interpro | T-complex protein 1, beta subunit | TCP-1, Hsp60
P39078 | TCPD_YEAST | GOA: interpro | T-complex protein 1, delta subunit | TCP-1, Hsp60
P40413 | TCPE_YEAST | GOA: interpro | T-complex protein 1, epsilon subunit | TCP-1, Hsp60
P39077 | TCPG_YEAST | GOA: interpro | T-complex protein 1, gamma subunit | TCP-1, Hsp60

TABLE F-continued
Uniprot Accession | Uniprot ID | GO Source | Annotation | Family

P42943 | TCPH_YEAST | GOA: interpro | T-complex protein 1, eta subunit | TCP-1, Hsp60
P47079 | TCPQ_YEAST | GOA: interpro | T-complex protein 1, theta subunit | TCP-1, Hsp60
P39079 | TCPZ_YEAST | GOA: interpro | T-complex protein 1, zeta subunit | TCP-1, Hsp60

Protein Specific
P48606 | TBCA_YEAST | GOA: spkw | Tubulin-specific chaperone A | protein specific
P53904 | TBCB_YEAST | GOA: spkw | Tubulin-specific chaperone B | protein specific
P46670 | CIN2_YEAST | GOA: spkw | Tubulin-folding cofactor C Cin2 | protein specific
P40987 | CIN1_YEAST | | Tubulin-folding cofactor D Cin1 | protein specific
P39937 | PAC2_YEAST | GOA: spkw | Tubulin-folding cofactor E PAC2 | protein specific
P21560 | CBP3_YEAST | GOA: spkw | CBP3 protein, mitochondrial precursor | protein specific
Q12287 | COXS_YEAST | GOA: spkw | Cytochrome c oxidase copper chaperone | protein specific
P40202 | LYS7_YEAST | GOA: interpro | Superoxide dismutase 1 copper chaperone |
Q02774 | SHR3_YEAST | PMID: 105642SS | Secretory component protein SHR3 | protein specific
P38293 | UMP1_YEAST | GOA: spkw | Proteasome maturation factor UMP1 | protein specific
P38784 | VM22_YEAST | PMID: 7673216 | Vacuolar ATPase assembly protein VMA22 | protein specific
P38072 | SCO2_YEAST | GOA: spkw | SCO2 protein, mitochondrial precursor | protein specific
P53266 | SHY1_YEAST | PMID: 11389896 | SHY1 protein | protein specific
P40046 | VTC1_YEAST | GOA: spkw | Vacuolar transporter chaperone 1 | protein specific
P38958 | PT00_YEAST | PMID: 11498004 | PET100 protein, mitochondrial | protein specific

Isomerases
P17967 | PDI_YEAST | PMID: 11157982 | Protein disulfide isomerase precursor | Disulfide bond oxidoreductase
P32474 | EUG1_YEAST | PMID: 11157982 | Protein disulfide isomerase EUG1 precursor | Disulfide bond oxidoreductase
Q12404 | MPD1_YEAST | PMID: 11157982 | Disulfide isomerase MPD1 precursor | Disulfide bond oxidoreductase
Q99316 | MPD2_YEAST | PMID: 11157982 | Protein disulfide isomerase MPD2 precursor (EC 5.3.4.1) | Disulfide bond oxidoreductase
Q03103 | ERO1_YEAST | PMID: 9659913 | Endoplasmic oxidoreductin 1 precursor (EC 1.8.4.-) (Endoplasmic oxidoreductase protein 1) | Disulfide bond oxidoreductase
P38866 | FMO1_YEAST | PMID: 10077572 | Thiol-specific monooxygenase (EC 1.14.13.-) (Flavin-dependent monooxygenase) | Disulfide bond oxidoreductase

Peptidyl-prolyl cis-trans isomerases
P14832 | CYPH_YEAST | GOA: interpro | Peptidyl-prolyl cis-trans isomerase cyclophilin A/Cpr1/Cyp1/CPH1/Scc1 | PPIase: Cyclophilin Type
P23285 | CYPB_YEAST | GOA: interpro | Peptidyl-prolyl cis-trans isomerase | PPIase: Cyclophilin Type
P25719 | CYPC_YEAST | GOA: interpro | Peptidyl-prolyl cis-trans isomerase C/CYP3/CPR3 | PPIase: Cyclophilin Type
P25334 | CYPR_YEAST | GOA: interpro | Peptidyl-prolyl cis-trans isomerase CPR4, Sco | PPIase: Cyclophilin Type
P35176 | CYPD_YEAST | GOA: interpro | Peptidyl-prolyl cis-trans isomerase D CypD/Cpr5 | PPIase: Cyclophilin Type
P53691 | CYP6_YEAST | PMID: 10942767 | Peptidyl-prolyl cis-trans isomerase CPR6 | PPIase: Cyclophilin Type
P47103 | CYP7_YEAST | PMID: 10942767 | Peptidyl-prolyl cis-trans isomerase CYP7 | PPIase: Cyclophilin Type
P53728 | CYP8_YEAST | GOA: interpro | Peptidyl-prolyl cis-trans isomerase CYP8 | PPIase: Cyclophilin Type
Q02770 | Q02770 | GOA: interpro | Ypl064cp | PPIase: Cyclophilin Type
P20081 | FKBP_YEAST | GOA: interpro | FK506-binding protein 1 FKB1/RBP1 | PPIase: FKBP Type
P32472 | FKB2_YEAST | GOA: interpro | FK506-binding protein 2, FKBP-13/FKBP-15/FKB2, FPR2 | PPIase: FKBP Type
P38911 | FKB3_YEAST | GOA: interpro | FK506-binding nuclear protein FKBP-70/Npi46/Fpr3/ | PPIase: FKBP Type
Q06205 | FKB4_YEAST | GOA: interpro | FK506-binding protein 4 FPR4 | PPIase: FKBP Type
P22696 | ESS1_YEAST | GOA: spkw | ESS1 protein | PPIase: Parvulin Type

Miscellaneous poorly characterised
P27697 | ABC1_YEAST | GOA: spkw | ABC1 protein, mitochondrial precursor | ABC1
P53193 | YGB8_YEAST | GOA: interpro | Hypothetical 21.8 kDa protein in CKB1-ATE1 intergenic region | Hsp20
P28707 | YKL7_YEAST | PMID: 9632755 | 24.1 kDa protein in VMA12-APN1 intergenic region | p23/wos2
P38932 | VP45_YEAST | PMID: 11432826 | Vacuolar protein sorting-associated protein 45 | SEC1 like
Q12019 | MDN1_YEAST | GOA: spkw | Midasin |

Genetic Manipulation

0178 In step iii), the process includes changing expression of the identified compensatory gene or gene product in the recombinant cell by genetic modification to provide a modified recombinant cell. After identification of one or more up-regulated genes, proteins or metabolic processes, the genome of the host may be modified. Certain genes or gene products, although identified as up-regulated, may not be available for modulation because they are essential to the cell or are known to affect other processes that may be essential to the cell or organism.

0179 The genome may be modified by including an exogenous gene or promoter element in the genome or in the host with an expression vector, by enhancing the capacity of an identified gene to produce mRNA or protein, or by deleting or disrupting a gene or promoter element, or by reducing the capacity of a gene to produce mRNA or protein. The genetic code can be altered, thereby affecting transcription and/or translation of a gene, for example through substitution, deletion ("knock-out"), co-expression or insertion ("knock-in") techniques. Additional genes for a desired protein or regulatory sequence that modulate transcription of an existing sequence can also be inserted.

Recombination

0180 The genome of the host cell expressing recombinant protein or peptide can be modified via a genetic targeting event, which can be by insertion or recombination, for example homologous recombination. Homologous recombination refers to the process of DNA recombination based on sequence homology. Homologous recombination permits site-specific modifications in endogenous genes and thus novel alterations can be engineered into a genome. One step in homologous recombination is DNA strand exchange, which involves a pairing of a DNA duplex with at least one DNA strand containing a complementary sequence to form an intermediate recombination structure containing heteroduplex DNA (see, for example Radding, C. M. (1982) Ann. Rev. Genet. 16: 405; U.S. Pat. No. 4,888,274). The heteroduplex DNA can take several forms, including a three DNA strand containing triplex form wherein a single complementary strand invades the DNA duplex (Hsieh, et al., Genes and Development 4: 1951 (1990); Rao, et al., (1991) PNAS 88:2984)) and, when two complementary DNA strands pair with a DNA duplex, a classical Holliday recombination joint or chi structure (Holliday, R., Genet. Res. 5: 282 (1964)) can form, or a double-D loop ("Diagnostic Applications of Double-D Loop Formation" U.S. Ser. No. 07/755,462, filed Sep. 4, 1991). Once formed, a heteroduplex structure can be resolved by strand breakage and exchange, so that all or a portion of an invading DNA strand is spliced into a recipient DNA duplex, adding or replacing a segment of the recipient DNA duplex. Alternatively, a heteroduplex structure can result in gene conversion, wherein a sequence of an invading strand is transferred to a recipient DNA duplex by repair of mismatched bases using the invading strand as a template (Genes, 3rd Ed. (1987) Lewin, B., John Wiley, New York, N.Y.; Lopez, et al., Nucleic Acids Res. 15: 5643 (1987)). Whether by the mechanism of breakage and rejoining or by the mechanism(s) of gene conversion, formation of heteroduplex DNA at homologously paired joints can serve to transfer genetic sequence information from one DNA molecule to another.

0181 In homologous recombination, the incoming DNA interacts with and integrates into a site in the genome that contains a substantially homologous DNA sequence. In non-homologous ("random" or "illicit") integration, the incoming DNA integrates not at a homologous sequence in the genome but elsewhere, at one of a large number of potential locations. A number of papers describe the use of homologous recombination in mammalian cells.

0182 Various constructs can be prepared for homologous recombination at an identified locus. Usually, the construct can include at least 10 bp, 20 bp, 30 bp, 40 bp, 50 bp, 70 bp, 100 bp, 500 bp, 1 kbp, 2 kbp, 4 kbp, 5 kbp, 10 kbp, 15 kbp, 20 kbp, or 50 kbp of sequence homologous with the identified locus. Various considerations can be involved in determining the extent of homology of identified DNA sequences, such as, for example, the size of the identified locus, availability of sequences, relative efficiency of double cross-over events at the identified locus and the similarity of the identified sequence with other sequences.

0183 The targeting DNA can include a sequence in which DNA substantially isogenic flanks the desired sequence modifications with a corresponding identified sequence in the genome to be modified. The substantially isogenic sequence can be at least about 95%, 97-98%, 99.0-99.5%, 99.6-99.9%, or 100% identical to the corresponding identified sequence (except for the desired sequence modifications). The targeting DNA and the identified DNA can share stretches of DNA at least about 10, 20, 30, 50, 75, 150 or 500 base pairs that are 100% identical.

0184 The DNA constructs can be designed to modify the endogenous, identified gene product. The homologous sequence for targeting the construct can have one or more deletions, insertions, substitutions or combinations thereof designed to disrupt the function of the resultant gene product. In one embodiment, the alteration can be the insertion of a selectable marker gene fused in reading frame with the upstream sequence of the identified gene.

0185 The genome can also be modified using insertional deletion. In this embodiment, the genome is modified by recombining a sequence in the gene that inhibits gene product formation. This insertion can either disrupt the gene by inserting a separate element, or remove an essential portion of the gene. In one embodiment, the insertional deletion includes insertion of a gene coding for resistance to a particular stressor, such as an antibiotic, or for growth in a particular media, for example for production of an essential amino acid.

0186 The genome can also be modified by use of transposons, which are genetic elements capable of inserting at sites in prokaryote genomes by mechanisms independent of homologous recombination. Transposons can include, for example, Tn7 in E. coli, Tn554 in S. aureus, IS900 in M. paratuberculosis, IS492 from Pseudomonas atlantica, IS116 from Streptomyces and IS900 from M. paratuberculosis. Steps believed to be involved in transposition include cleavage of the end of the transposon to yield 3' OH; strand transfer, in which transposase brings together the 3' OH exposed end of transposon and the identified sequence; and a single step transesterification reaction to yield a covalent linkage of the transposon to the identified DNA. The key reaction performed by transposase is generally thought to be nicking or strand exchange; the rest of the process is done by host enzymes.

0187 In one embodiment, a process is provided to increase the level of an identified gene or homologue thereof by incorporating a genetic sequence encoding the gene or homologue into the genome by recombination. In another embodiment, a promoter is inserted into the genome to enhance the expression of the identified gene or homologue. In a separate embodiment, a process is provided for decreasing the expression of an identified gene or homologue thereof by recombination with an inactive gene. In another embodiment, a sequence that encodes a different gene, which can have a separate function in the cell or can be a reporter gene such as a resistance marker or an otherwise detectable marker gene can be inserted into a genome through recombination. In yet another embodiment, a copy of at least a portion of the identified gene that has been mutated at one or more locations is inserted into the genome through recombination. The mutated version of the identified gene can not encode a protein, or the protein encoded by the mutated gene can be rendered inactive, the activity can be modulated (either increased or decreased), or the mutant protein can have a different activity when compared to the native protein.

0188 There are strategies to knock out genes in bacteria, which have been generally exemplified in E. coli. One route is to clone a gene-internal DNA fragment into a vector containing an antibiotic resistance gene (e.g. ampicillin). Before cells are transformed via conjugative transfer, chemical transformation or electroporation (Puehler, et al. (1984) Advanced Molecular Genetics New York, Heidelberg, Berlin, Tokyo, Springer Verlag), an origin of replication, such as the vegetative plasmid replication (the oriV locus) is excised and the remaining DNA fragment is re-ligated and purified (Sambrook, et al. (2000) Molecular Cloning: A Laboratory Manual, third edition Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press). Alternatively, antibiotic-resistant plasmids that have a DNA replication origin can be used. After transformation, the cells are plated onto e.g. LB agar plates containing the appropriate antibiotics (e.g. 200 μg/mL ampicillin). Colonies that grow on the plates containing the antibiotics presumably have undergone a single recombination event (Snyder, L., W. Champness, et al. (1997) Molecular Genetics of Bacteria Washington DC, ASM Press) that leads to the integration of the entire DNA fragment into the genome at the homologous locus. Further analysis of the antibiotic-resistant cells to verify that the desired gene knock-out has occurred at the desired locus is e.g. by diagnostic PCR (McPherson, M. J., P. Quirke, et al. (1991) PCR: A Practical Approach New York, Oxford University Press). Here, at least two PCR primers are designed: one that hybridizes outside the DNA region that was used for the construction of the gene knock-out; and one that hybridizes within the remaining plasmid backbone. Successful PCR amplification of the DNA fragment with the correct size followed by DNA sequence analysis will verify that the gene knock-out has occurred at the correct location in the bacterial chromosome. The phenotype of the newly constructed mutant strain can then be analyzed by e.g. SDS polyacrylamide gel electrophoresis (Simpson, R. J. (2003) Proteins and Proteomics: A Laboratory Manual. Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory Press).

0189 An alternate route to generate a gene knock-out is by use of a temperature-sensitive replicon, such as the pSC101 replicon to facilitate gene replacement (Hamilton, et al. (1989) New method for generating deletions and gene replacements in Escherichia coli. Journal of Bacteriology 171(9): 4617-22). The process proceeds by homologous recombination between a gene on a chromosome and homologous sequences carried on a plasmid temperature sensitive for DNA replication. After transformation of the plasmid into the appropriate host, it is possible to select for integration of the plasmid into the chromosome at 44° C. Subsequent growth of these cointegrates at 30° C. leads to a second recombination event, resulting in their resolution. Depending on where the second recombination event takes place, the chromosome will either have undergone a gene replacement or retain the original copy of the gene.

0190 Other strategies have been developed to inhibit expression of particular gene products. For example, RNA interference (RNAi), particularly using small interfering RNA (siRNA), has been extensively developed to reduce or even eliminate expression of a particular gene product. siRNAs are short, double-stranded RNA molecules that can target complementary mRNAs for degradation. RNAi is the phenomenon in which introduction of a double-stranded RNA suppresses the expression of the homologous gene. dsRNA molecules are reduced in vivo to 21-23 nt siRNAs which are the mediators of the RNAi effect. Upon introduction, double stranded RNAs get processed into 20-25 nucleotide siRNAs by an RNase III-like enzyme called Dicer (initiation step). Then, the siRNAs assemble into endoribonuclease-containing complexes known as RNA-induced silencing complexes (RISCs), unwinding in the process. The siRNA strands subsequently guide the RISCs to complementary RNA molecules, where they cleave and destroy the cognate RNA (effector step). Cleavage of cognate RNA takes place near the middle of the region bound by the siRNA strand. RNAi has been successfully used to reduce gene expression in a variety of organisms including zebrafish, nematodes (C. elegans), insects (Drosophila melanogaster), planaria, cnidaria, trypanosomes, mice and mammalian cells.

Mutation

0191 The genome can also be modified by mutation of one or more nucleotides in an open reading frame encoding an identified gene, particularly an identified protease. Techniques for genetic mutation, for instance site directed mutagenesis, are well known in the art. Some approaches focus on the generation of random mutations in chromosomal DNA such as those induced by X-rays and chemicals. Mutagenesis targeted to a defined region of DNA includes many techniques, some more popular than others. In vitro approaches to site-directed mutagenesis can be grouped generally into three categories: i) processes that restructure fragments of DNA, such as cassette mutagenesis; ii) localized random mutagenesis; and iii) oligonucleotide-directed mutagenesis.

0192 Oligonucleotide-directed mutagenesis is based on the concept that an oligonucleotide encoding a desired mutation(s) is annealed to one strand of the DNA of interest and serves as a primer for initiation of DNA synthesis. In this manner, the mutagenic oligonucleotide is incorporated into the newly synthesized strand. Mutagenic oligonucleotides incorporate at least one base change but can be designed to generate multiple substitutions, insertions or deletions. Examples include PCR-based processes and practically all of the non-PCR-based processes in use today. These techniques include positive antibiotic selection (Lewis, M. K. and Thompson, D. V. (1990) Nucl. Acids Res. 18, 3439; Bohnsack, R. N. (1996) Meth. Mol. Biol. 57, 1; Vavra, S. and Brondyk, W. H. (1996) Promega Notes 58, 30; Altered Sites® II in vitro Mutagenesis Systems Technical Manual #TM001, Promega Corporation), unique restriction site selection (Deng, W. P. and Nickoloff, J. A. (1992) Anal. Biochem. 200, 81), uracil incorporation (Kunkel, T. A. (1985) Proc. Natl. Acad. Sci. USA 82, 488; Kunkel, T. A., Roberts, J. D. and Zakour, R. A. (1987) Meth. Enzymol. 154, 367), and phosphorothioate incorporation (Taylor, J. W., Ott, J. and Eckstein, F. (1985) Nucl. Acids Res. 13, 8764; Nakamaye, K. and Eckstein, F. (1986) Nucl. Acids Res. 14, 9679). Oligonucleotides can also encode a library of mutations by randomizing the base composition at sites during chemical synthesis resulting in degenerate or "doped" oligonucleotides. The ability to localize and specify mutations is greatly enhanced by the use of synthetic oligonucleotides hybridized to the DNA insert-containing plasmid vector.

0193 The general format for site-directed mutagenesis is: denaturation of plasmid DNA containing the template of interest (cDNA, promoter, etc.) to produce single-stranded regions; annealing of a synthetic mutant oligonucleotide to the identified strand; synthesis of a new complementary strand using, for example, T4 DNA Polymerase; and sealing the resulting nick between the end of the new strand and the oligonucleotide, for example using T4 DNA ligase. The resulting heteroduplex is propagated by transformation, for example in E. coli. Selection and enrichment processes have been incorporated into mutagenesis processes to greatly improve the efficiency of mutant strand recovery and rates approaching 80-90% are possible. Numerous processes exist to generate different types of mutations and to enhance for the selection of the mutant. Examples of processes to enhance for the selection of the mutant include positive antibiotic selection of the mutant strand, using a uracil-containing DNA strand which can be selectively degraded in vivo, and dNTP analog incorporation, which can render one strand of heteroduplex DNA impervious to digestion. Some approaches can be combined, such as cassette mutagenesis and the use of "doped" oligonucleotides to create a library of random mutations in a small, defined region.

0194 An extension of the so-called "standard" processes of site-directed mutagenesis includes those that rely on DNA amplification, specifically the polymerase chain reaction (PCR). The major commonality in site-directed mutagenesis is the use of a mutagenic oligonucleotide. The mutagenic oligonucleotide should hybridize efficiently to the template. For efficient hybridization, there can be, for example, 100% base pairing at either end of the identified sequence without secondary structure formation, but there can also be less than 100% identity, such as 98%, 95%, 92%, 90%, 85%, 80%, 70%, or only a portion of the sequence can be identical. For small substitutions, 10-15 bases hybridizing on either side of the mismatch are usually sufficient. The composition of the 3'-end of the primer is particularly important as polymerases do not typically extend from a mismatched or poorly hybridized 3'-end.

0195 The basis for site-directed mutagenesis by positive antibiotic selection is that a selection oligonucleotide or oligonucleotides are simultaneously annealed, with the mutagenic oligonucleotide, to repair an antibiotic resistance gene (10-13). Selection for the mutant strand is enabled by antibiotic resistance of the mutated DNA and sensitivity of the nonmutated strand. This approach offers a very efficient means to generate an indefinite number of the desired mutations with little hands-on time.

0196 Site-directed mutagenesis by the use of a unique restriction site is based on the processes of Deng and Nickoloff (Deng, W. P. and Nickoloff, J. A. (1992) Anal. Biochem. 200, 81). In this approach, a selection oligonucleotide containing a mutated sequence for a unique restriction site is annealed simultaneously with the mutagenic oligonucleotide. The selection oligonucleotide renders the nonessential site immune to restriction by the corresponding enzyme. Selection for the mutant strand is enhanced by digesting the resulting pool of plasmids with the unique restriction enzyme. The digestion linearizes the parental plasmid thereby effectively decreasing its ability to transform bacteria.

0197 Site-directed mutagenesis by deoxyuridine incorporation relies on the ability of a host strain to degrade template DNA that contains uracil (U) in place of thymidine (T). A small number of dUTPs are incorporated into the template strand in place of dTTP in a host that lacks dUTPase (dut-) and uracil N-deglycosidase (ung-) activities. (Uracil per se is not mutagenic and it base pairs with adenine.) Normally, dUTPase degrades deoxyuridine and uracil N-deglycosidase removes any incorporated uracil. Post-mutation replication in a dut+ ung+ strain is then used to degrade nonidentified strand DNA. This approach requires that single-stranded DNA be used so that only one strand contains the Us, which are susceptible to degradation.

0198 The phosphorothioate incorporation approach to site-directed mutagenesis rests on the ability of a dNTP analog containing a thiol group to render heteroduplex DNA resistant to restriction enzyme digestion. The mutant strand is extended from the mutagenic oligonucleotide and synthesized in the presence of dCTPalphaS. Unused template DNA is removed by digestion with an exonuclease. Theoretically, only circular, heteroduplex DNA remains. The heteroduplex is then nicked, but not cut, at the restriction site(s). Exonuclease III is used to digest the nicked strand and the remaining fragment then acts as a primer for repolymerization, creating a mutant homoduplex.

0199 In the polymerase chain reaction (PCR) based approach to generate a mutation in DNA, a template is amplified using a set of gene-specific oligonucleotide primers except that one oligonucleotide, or more in protocols that use multiple amplifications, contains the desired mutation. Variations include altering the hybridization site of the oligonucleotides to produce multiple, overlapping PCR fragments with the mutation in the overlap and the "megaprimer" approach, which uses three oligonucleotides and two rounds of amplification wherein a product strand from the first amplification serves as a primer in the second amplification.

0200 In the overlap extension approach, complementary oligodeoxyribonucleotide (oligo) primers and the polymerase chain reaction are used to generate two DNA fragments having overlapping ends. These fragments are combined in a subsequent fusion reaction in which the overlapping ends anneal, allowing the 3' overlap of each strand to serve as a primer for the 3' extension of the complementary strand. The resulting fusion product is amplified further by PCR. Specific alterations in the nucleotide (nt) sequence can be introduced by incorporating nucleotide changes into the overlapping oligo primers.

Vector Constructs

0201 In a separate embodiment, the host cell is modified by including one or more vectors that encode an identified gene, typically a folding modulator or a cofactor of a folding modulator. In another embodiment, the host cell is modified by enhancing a promoter for a folding modulator or a cofactor for a folding modulator, including by adding an exogenous promoter to the host cell genome.

0202 In another embodiment, the host cell is modified by including one or more vectors that encode an inhibitor of an identified compensatory gene, such as a protease inhibitor. Such an inhibitor can be an antisense molecule that limits the expression of the identified compensatory gene, a cofactor of the identified gene or a homologue of the identified gene. Antisense is generally used to refer to a nucleic acid molecule with a sequence complementary to at least a portion of the identified gene. In addition, the inhibitor can be an interfering RNA or a gene that encodes an interfering RNA. In eukaryotic organisms, such an interfering RNA can be a small interfering RNA or a ribozyme, as described, for example, in Fire, A. et al. (1998) Nature 391:806-11, Elbashir et al. (2001) Genes & Development 15(2):188-200, Elbashir et al. (2001) Nature 411(6836): 494-8, U.S. Pat. Nos. 6,506,559 to Carnegie Institute, 6,573,099 to Benitec, U.S. patent application Nos. 2003/0108923 to the Whitehead Inst., and 2003/0114409, PCT Publication Nos. WO03/006477, WO03/012052, WO03/023015, WO03/056022, WO03/064621 and WO03/070966. The inhibitor can also be another protein or peptide. The inhibitor can, for example, be a peptide with a consensus sequence for the protease or protease protein. The inhibitor can also be a protein or peptide that can produce a direct or indirect inhibitory molecule for the protease or protease protein in the host. Protease inhibitors can include E-64, Antipain, Elastatinal, APMSF, Leupeptin, Bestatin, Pepstatin, Benzamidine, 1,10-Phenanthroline, Chymostatin, Phosphoramidon, 3,4-dichloroisocoumarin, TLCK, DFP, TPCK. Over 100 naturally occurring protein protease inhibitors have been identified so far. They have been isolated in a variety of organisms from bacteria to animals and plants. They behave as tight-binding reversible or pseudo-irreversible inhibitors of proteases, preventing substrate access to the active site through steric hindrance. Their sizes are also extremely variable, from 50 residues (e.g. BPTI: Bovine Pancreatic Trypsin Inhibitor) to up to 400 residues (e.g. alpha-1PI: alpha-1 Proteinase Inhibitor). They are strictly class-specific except proteins of the alpha-macroglobulin family (e.g. alpha-2 macroglobulin), which bind and inhibit most proteases through a molecular trap mechanism.

0203 An exogenous vector or DNA construct can be transfected or transformed into the host cell. Techniques for transfecting and transforming eukaryotic and prokaryotic cells respectively with exogenous nucleic acids are well known in the art. These can include lipid vesicle mediated uptake, calcium phosphate mediated transfection (calcium phosphate/DNA co-precipitation), viral infection, particularly using modified viruses such as, for example, modified adenoviruses, microinjection and electroporation. For prokaryotic transformation, techniques can include heat shock mediated uptake, bacterial protoplast fusion with intact cells, microinjection and electroporation. Techniques for plant transformation include Agrobacterium mediated transfer, such as by A. tumefaciens, rapidly propelled tungsten or gold microprojectiles, electroporation, microinjection and polyethylene glycol mediated uptake. The DNA can be single or double stranded, linear or circular, relaxed or supercoiled DNA. For various techniques for transfecting mammalian cells, see, for example, Keown et al. (1990) Methods in Enzymology Vol. 185, pp. 527-537.

0204 For recombination events, the constructs can include one or more insertion sequences, which can insert or transpose one or more nucleic acid sequence into a different sequence. However, the construct can be designed for exogenous expression of an identified compensatory gene or homologue thereof without incorporation into the existing cellular DNA/genome.

0205 The constructs can contain one, or more than one, internal ribosome entry site (IRES). The construct can also contain a promoter operably linked to the nucleic acid sequence encoding at least a portion of the identified gene, or a cofactor of the identified gene, a mutant version of at least a portion of the identified compensatory gene, or in the case of proteases, an inhibitor of the identified gene. Alternatively, the construct can be promoterless. In cases in which the construct is not designed to incorporate into the cellular DNA/genome, the vector typically contains at least one promoter element. In addition to the nucleic acid sequences the expression vector can contain selectable marker sequences. The expression constructs can further contain sites for transcription initiation, termination, and/or ribosome binding sites. The identified constructs can be inserted into and can be expressed in any prokaryotic or eukaryotic cell, including, but not limited to bacterial cells, such as P. fluorescens or E. coli, yeast cells, mammalian cells, such as CHO cells, or plant cells.

0206 Cloning vectors can include e.g. plasmid pBR322 (Bolivar, Rodriguez et al. 1977), the pUC series of plasmids (Vieira and Messing 1982), pBluescript (Short, Fernandez et al. 1988), pACYC177 and pACYC184 (Chang and Cohen 1978). Exogenous promoters for use in such constructs include, but are not limited to, the phage lambda PL promoter, E. coli lac, E. coli trp, E. coli phoA, E. coli tac promoters, SV40 early, SV40 late, retroviral LTRs, PGK1, GAL1, GAL10 genes, CYC1, PHO5, TRP1, ADH1, ADH2, glyceraldehyde phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, triose phosphate isomerase, phosphoglucose isomerase, glucokinase, alpha-mating factor pheromone, PRB1, GUT2, GPD1 promoter, metallothionein promoter, and/or mammalian viral promoters, such as those derived from adenovirus and vaccinia virus. Other promoters will be known to one skilled in the art.

0207 Promoters for exogenous vectors, or exogenous promoters designed to be inserted into the genome can be

-continued
Promoter | Source | Regulation | Induction
lpp-lac (hybrid) | E. coli | lacI | IPTG
phoA | E. coli | phoB (positive), phoR (negative) | phosphate starvation
recA | E. coli | lexA | nalidixic acid
proU | E. coli | | osmolarity
cst-1 | E. coli | | glucose starvation
tetA | E. coli | | tetracycline
cadA | E. coli | cadR | pH
nar | E. coli | fnr | anaerobic conditions
PL | λ | cIts857 | thermal (shift to 42° C.)
cspA | E. coli | | thermal (shift to below 20° C.)
T7 | T7 | cIts857 | thermal
T7-lac operator | T7 | lacI | IPTG
T3-lac operator | T3 | lacIq | IPTG
T5-lac operator | T5 | lacI, lacI | IPTG
T4 gene 32 | T4 | | T4 infection
nprM-lac operator | Bacillus | lacI | IPTG
VHb | Vitreoscilla | | oxygen
Protein A | S. aureus | |

0208 Constructs can include selection markers to identify modified cells. Suitable selectable marker genes include, but are not limited to: genes conferring the ability to grow on certain media substrates, such as the tk gene (thymidine kinase) or the hprt gene (hypoxanthine phosphoribosyltransferase) which confer the ability to grow on HAT medium (hypoxanthine, aminopterin and thymidine); the bacterial gpt gene (guanine/xanthine phosphoribosyltransferase) which allows growth on MAX medium (mycophenolic acid, adenine, and xanthine). See, for example, Song, K-Y., et al. (1987) Proc. Nat'l Acad. Sci. U.S.A. 84:6820-6824; Sambrook, J., et al. 
(1989) Molecular Cloning A Laboratory based on specific response elements in a cell. For example, Manual, Cold Spring Harbor Laboratory, Cold Spring Har promoters can be responsive to chemical compounds, for bor, N.Y., Chapter 16. Other examples of selectable markers example to anthranilate or benzoate, as described in PCT include: genes conferring resistance to compounds Such as Publication No. WO 2004/005221. The constructs can antibiotics, genes conferring the ability to grow on selected include one or more promoters. These can be independent, Substrates, genes encoding proteins that produce detectable or can be in tandem. For example the promoters can be signals such as luminescence, such as green fluorescent designed so that a identified compensatory gene is up- or protein, enhanced green fluorescent protein (eGFP). A wide down-regulated in a particular time frame with the recom variety of Such markers are known and available, including, binant protein or peptide. For example, in a case in which the for example, antibiotic resistance genes such as the neomy identified gene is a folding modulator, the folding modulator cin resistance gene (neo) (Southern, P., and P. Berg, (1982) or cofactor can be induced shortly before induction of the J. Mol. Appl. Genet. 1:327-341); and the hygromycin resis recombinant protein or peptide. Promoters can include, but tance gene (hyg) (1983) Nucleic Acids Research 11:6895 6911, and Te Riele, H., et al. (1990) Nature 348:649-651). are not limited to the following: Other selectable marker genes include: acetohydroxy acid synthase (AHAS), alkaline phosphatase (AP), beta galac tosidase (LacZ), beta glucoronidase (GUS), chlorampheni Promoter SOUCC regulation induction col acetyltransferase (CAT), green fluorescent protein (GFP), red fluorescent protein (RFP), yellow fluorescent lac E. coi lacI, lacI IPTG acUWS E. coi lacI, lacI IPTG protein (YFP), cyan fluorescent protein (CFP), horseradish tac (hybrid) E. 
coi lacI, lacI IPTG peroxidase (HRP), luciferase (Luc), nopaline synthase trc (hybrid) E. coi lacI, lacI IPTG (NOS), octopine synthase (OCS), and derivatives thereof. P., (synthetic) E. coi lacI, lacI IPTG trp E. coi tryptophan Multiple selectable markers are available that confer resis starvation tance to amplicillin, bleomycin, chloramphenicol, gentamy araBAD E. coi araC 1-arabinose cin, hygromycin, kanamycin, lincomycin, methotrexate, lipp E. coi IPTG, lactose phosphinothricin, puromycin, and tetracycline. Additional selectable marker genes useful in this invention, for US 2006/01 10747 A1 May 25, 2006 40 example, are described in U.S. Pat. Nos: 6,319,669; 6,316, deletions or other sequences in the homologous sequence. 181; 6,303,373; 6,291,177; 6,284,519; 6,284,496; 6,280, After final manipulation, the construct can be introduced 934; 6,274,354; 6,270,958; 6,268,201: 6,265,548; 6,261, into the cell. 760; 6,255,558; 6.255,071; 6,251,677; 6,251,602; 6,251, 0213 The process can be iterative. In one embodiment, 582; 6,251,384; 6,248,558; 6,248,550; 6,248.543; 6,232, after modification of the host and expression of the recom 107; 6,228,639; 6,225,082; 6,221,612; 6,218,185; 6,214, binant protein in the modified host, a genetic profile of the 567; 6,214,563; 6,210,922: 6,210,910; 6,203,986; 6,197, modified host cell is analyzed to identify one or more further 928; 6,180,343; 6,172,188: 6,153,409; 6,150,176: 6,146, identified genes the expression of which is changed in the 826; 6,140,132: 6,136,539; 6,136,538; 6,133,429; 6,130, modified host cell. 
In particular, compensatory genes can be 313; 6,124,128; 6,110,711; 6,096,865; 6,096,717; 6,093, those that show increased expression in the modified host 808; 6,090,919; 6,083,690; 6,077,707; 6,066,476; 6,060, expressing recombinant protein when compared to a modi 247; 6,054,321; 6,037,133; 6,027,881; 6,025, 192: 6,020, fied host cell not expressing the recombinant protein or 192: 6,013,447; 6,001,557; 5,994,077; 5,994,071; 5,993, peptide, or when compared to an unmodified host cell. The 778; 5,989,808; 5,985,577; 5,968,773; 5,968,738; 5,958, process further includes changing the expression of the 713; 5,952,236; 5,948,889; 5,948,681; 5,942,387; 5,932, further identified gene or genes and expressing the protein or 435; 5,922,576; 5,919,445; and 5,914,233. peptide in the doubly modified cell. These steps can be iterated to improve protein expression and can be repeated 0209) Deletions can be at least about 5 bp, 10 bp, 20 bp, one, two, three, four, five, six, seven, eight, nine, or at least 30 bp, 40 bp or 50 bp, commonly at least about 100 bp, and ten times. generally not more than about 20 kbp, where the deletion can normally include at least a portion of the coding region Production of Protein including a portion of or one or more exons, a portion of or one or more introns, and can or can not include a portion of 0214) The process of the invention optimally leads to the flanking non-coding regions, particularly the 5'-non increased production of recombinant protein or peptide in a coding region (transcriptional regulatory region). Thus, the host cell. The increased production can include an increased homologous region can extend beyond the coding region amount of protein per gram of host protein in a given amount into the 5'-non-coding region or alternatively into the 3'-non of time, or can include an increase in the length of time the coding region. Insertions can generally not exceed 10 kbp. host cell is producing recombinant protein or peptide. 
The usually not exceed 5 kbp, generally being at least 50 bp, increased production can also include an improvement in the requirements for growth of the recombinant host cell. The more usually at least 200 bp. increased production can be an increased production of full 0210. The region(s) of homology can include mutations, length protein or peptide. If the improvement is in increased where mutations can further inactivate the identified gene, in levels of protein, the protein or peptide can be produced in providing for a frame shift, or changing a key amino acid, or one or more inclusion bodies in a host cell. the mutation can correct a dysfunctional allele, etc. Usually, the mutation can be a Subtle change, not exceeding about 5% 0215. The increased production alternatively can be an of the homologous flanking sequences. increased level of active protein or peptide per gram of protein produced, or per gram of host protein. The increased 0211 The construct can be prepared in accordance with production can also be an increased level of recoverable processes known in the art, various fragments can be protein or peptide. Such as soluble protein, produced per brought together, introduced into appropriate vectors, gram of recombinant or per gram of host cell protein. The cloned, analyzed and then manipulated further until the increased production can also be any combination of desired construct has been achieved (see, for example FIGS. increased total level and increased active or soluble level of 5-11). Various modifications can be made to the sequence, to protein. allow for restriction analysis, excision, identification of probes, etc. Silent mutations can be introduced, as desired. 
0216) Increased production is typically measured by At various stages, restriction analysis, sequencing, amplifi comparing the level of production after a certain period of cation with the polymerase chain reaction, primer repair, in induction in a modified cell to the same induction in the vitro mutagenesis, etc. can be employed. Processes for the unmodified cell. incorporation of antibiotic resistance genes and negative Soluble? Insoluble selection factors will be familiar to those of ordinary skill in the art (see, e.g., WO 99/15650; U.S. Pat. No. 6,080,576: 0217. The improved expression of recombinant protein U.S. Pat. No. 6,136,566; Niwa, et al., J. Biochem. 113:343 can be an increase in the solubility of the protein. The 349 (1993); and Yoshida, et al., Transgenic Research, 4:277 recombinant protein or peptide can be produced and recov 287 (1995)). ered from the cytoplasm, periplasm or extracellular medium of the host cell. The protein or peptide can be insoluble or 0212. The construct can be prepared using a bacterial soluble. The protein or peptide can include one or more vector, including a prokaryotic replication system, e.g. an targeting sequences or sequences to assist purification. origin recognizable by a prokaryotic cell Such as Pfluore Scens or E. coli. A marker, the same as or different from the 0218. In certain embodiments, the invention provides a marker to be used for insertion, can be employed, which can process for improving the Solubility of a recombinant pro be removed prior to introduction into the identified cell. tein or peptide in a host cell. 
The term “soluble' as used Once the vector containing the construct has been com herein means that the protein is not precipitated by centrifu pleted, it can be further manipulated, such as by deletion of gation at between approximately 5,000 and 20,000xgravity certain sequences, linearization, or introducing mutations, when spun for 10-30 minutes in a buffer under physiological US 2006/01 10747 A1 May 25, 2006

conditions. Soluble, active proteins are capable of exhibiting function, and are not part of an inclusion body or other precipitated mass.

[0219] The invention can also improve recovery of active recombinant proteins or peptides. For example, the interaction between an identified and a parent polypeptide, polypeptide variant, segment-substituted polypeptide and/or residue-substituted polypeptide can be measured by any convenient in vitro or in vivo assay. Thus, in vitro assays can be used to determine any detectable interaction between an identified and polypeptide, e.g. between enzyme and substrate, between hormone and hormone receptor, between antibody and antigen, etc. Such detection can include the measurement of colorimetric changes, changes in radioactivity, changes in solubility, changes in molecular weight as measured by gel electrophoresis and/or gel exclusion processes, etc. In vivo assays include, but are not limited to, assays to detect physiological effects, e.g. weight gain, change in electrolyte balance, change in blood clotting time, changes in clot dissolution and the induction of antigenic response. Generally, any in vivo assay can be used so long as a variable parameter exists so as to detect a change in the interaction between the identified and the polypeptide of interest. See, for example, U.S. Pat. No. 5,834,250.

Cytoplasmic/Periplasmic/Secreted

[0220] In certain embodiments, the protein can also be secreted into the periplasm if fused to an appropriate signal secretion sequence. In one embodiment, the signal sequence can be a phosphate binding protein, a Lys-Arg-Orn binding protein (LAObp or KRObp) secretion signal peptide, an Outer Membrane Porin E (OprE) secretion signal peptide, an azurin secretion signal peptide, an iron (III) binding protein [Fe(III)bp] secretion signal peptide, or a lipoprotein B (LprB) secretion signal peptide.

[0221] In one embodiment, no additional disulfide-bond-promoting conditions or agents are required in order to recover disulfide-bond-containing identified polypeptide in active, soluble form from the modified host cell or doubly or multiply modified cell. In one embodiment, the transgenic peptide, polypeptide, protein, or fragment thereof has a folded intramolecular conformation in its active state. It has been found that complex mammalian proteins soluble in the cytoplasm can configure appropriately with the proper positioning of the thiol groups for later disulfide bond formation in the periplasm. In one embodiment, the transgenic peptide, polypeptide, protein, or fragment contains at least one intramolecular disulfide bond in its active state; and perhaps up to 2, 4, 6, 8, 10, 12, 14, 16, 18, or 20 or more disulfide bonds.

[0222] In one embodiment, more than 50% of the expressed, transgenic peptide, polypeptide, protein, or fragment thereof produced will be produced as single, functional peptides, polypeptides, proteins, or fragments thereof in soluble, active form or insoluble easily renatured form in the cytoplasm or periplasm. In another embodiment about 60%, 70%, 75%, 80%, 85%, 90%, 95% of the expressed protein is obtained in or easily renatured into active form.

EXAMPLES

[0223] The bacterial strains used in the current study are listed in Table 1. Strains of P. fluorescens were grown in shake-flasks at 30° C. OD575 was recorded for each strain at various time points.

TABLE 1
Overview of bacterial strains

Strain   Relevant Strain Genotype   Plasmid              Recombinant Protein
MB214    P. fluorescens host strain
DC206    pyrF
DC240    pyrF                       pDOW2415             nitrilase
DC271    pyrF                       pDOW1323             pbp::hGH
DC280    pyrF                       pDOW1339             vector only plasmid
DC369    pyrF                       pDOW1426             hGH
DC462    pyrF                       pDOW3501             GrpE, DnaKJ
DC463    pyrF                       pDOW3501, pDOW1426   GrpE, DnaKJ, hGH
HJ104    pyrF                       pDOW1349             hGH-COP
DC370    pyrF, hslU
DC372    pyrF, hslU                 pDOW1426             hGH
DC373    pyrF, hslU                 pDOW1323             pbp::hGH
HJ105    pyrF, hslU                 pDOW1349             hGH-COP
DC417    pyrF, hslUV
HJ115    pyrF, hslUV                pDOW1426             hGH
HJ117    pyrF, hslUV                pDOW1349             hGH-COP

[0224] Plasmids used in the following experiments are listed in Table 2.

TABLE 2
Overview of plasmids

Plasmids     Relevance
pDOW2236     cloning vector
pDOW2240     Ptac grpE-dnaKJ, pyrF+
pDOW2247     Pmtl no recombinant gene; empty vector
pDOW3501     Pmtl grpE-dnaKJ, pyrF+
pDOW1349     pyrF+, hGH::COP
pDOW1426     pyrF+, hGH
pDOW1261-2   suicide vector, pyrF+
pDOW2050     used for construction of the hslUV deletion strains

Sample Collection and RNA Isolation

[0225] All samples were collected from 200 ml standard shake flask experiments. Samples were taken at different time points as indicated in the figures. At each time point, 10 ml of cell culture from the shake flasks was collected and mixed with 10 ml of RNAlater (Ambion, Austin, Tex.) reagent to stabilize RNA.

Microarray Hybridization and Data Analysis

[0226] For each RNA sample, the fluorescent nucleotides Cy3-dUTP or Cy5-dUTP (Amersham Pharmacia, Piscataway, N.J.) were incorporated into cDNA in a reverse transcription (RT) reaction using random hexamer primer (Amersham). The two labeled cDNA pools were combined and applied to a microarray slide. The microarray slides contain 50mer amino-modified oligodeoxyribonucleotides (oligos) representing each ORF of P. fluorescens. Each oligo was printed twice for duplicate spots at different locations using the SDDC-2 robot (Virtek, Toronto, Canada; now distributed through Bio-Rad Laboratories, Hercules, Calif.) and SMP3 pins (TeleChem International Inc., Sunnyvale, Calif.). The microscope slides used were coated with a positively charged epoxy resin for efficient DNA binding (MWG Inc, Alameda, Calif.). After printing, the slides were post-processed according to MWG's specifications. A software package from BioDiscovery Inc. (El Segundo, Calif.) was used to facilitate the data analysis. This package consists of CloneTracker™, ImaGene™, GeneSight™ modules and the GeneDirector™ database. Each hybridized slide was scanned using ScanArray 5000 (Packard BioScience, Billerica, Mass.) to measure fluorescence of the Cy3- and Cy5-labeled cDNA bound to the microarray. The acquired images were quantified in ImaGene™ and raw data was processed in GeneSight™. During the data preparation, the spot intensity for each gene was background-corrected; the signal for the Cy5 channel was normalized to the Cy3 channel using the total signal intensity for the entire array; the normalized ratio of Cy5 to Cy3 for each gene was log2 transformed, and replicates were combined.

Protein Expression Analysis by SDS-PAGE

[0227] Culture aliquots were harvested at various time points after IPTG induction, normalized to OD600 of 10. Cell lysates were separated into soluble and insoluble fractions by centrifugation at 11000 g for 5 min. Aliquots of 2.5 µl were combined with 5 µl 2x NuPAGE LDS sample buffer (Invitrogen, San Diego, Calif.), 50 µM DTT, and H2O to 10 µl, then heated at 95° C. for 5 min. The proteins were separated and visualized on 12% NuPAGE gels stained with Coomassie Blue using SimplyBlue SafeStain (Invitrogen, San Diego, Calif.).

Fluorescence Activity Measurement

[0228] Protein yield was also measured by fluorescence activity of the fusion of green fluorescence protein (COP) and human growth hormone (hGH). The hGH::COP fusion construct was transformed into wild-type or hslU mutant strains and selected on the M9 glucose agar plate without uracil. The IPTG-induced cell cultures were normalized to an OD of five. Relative fluorescence (RF) activity was measured using the Spectramax Gemini microplate spectrofluorimeter (Molecular Devices, Sunnyvale, Calif.) under the appropriate setting (Ex485, Em538, 530 bandpass filter).

Example 1

Gene Expression Analysis of Strains Producing Cytoplasmic and Periplasmic Proteins: Comparison of Different Time Points

[0229] To study the FMs and protease gene expression during the production of heterologous protein, P. fluorescens strains DC206, 280, 240 and 271 were used in the initial microarray experiments. DC206 is the host strain and was used as a control for cell growth; DC280 has a vector-only plasmid and was used as a control for the microarray experiments; DC240 is DC206 with a plasmid encoding cytoplasmic nitrilase enzyme that is soluble; DC271 is DC206 with a plasmid encoding the periplasmic human growth hormone (pbp::hGH) that is partly insoluble. Strains were grown in 200 ml of shake flask medium and cell growth was monitored by measuring OD575. IPTG induction was performed 24 hrs after inoculation. All strains grew similarly and culture samples were taken just before (0 hr) and 4 hrs after induction for RNA isolation and transcriptional profiling (TxP) using DNA microarrays (FIG. 1).

[0230] The genetic profiles, i.e. transcriptional profiles, were based on the comparison of the 4 hrs after induction time point sample with that of the 0 hr sample. The two samples were labeled with fluorescent dyes, either Cy3-dUTP or Cy5-dUTP, and co-hybridized to the same slide for each strain. Each hybridization was duplicated with dye-swap experiments (i.e., samples were labeled with either Cy3-dUTP or Cy5-dUTP) (Table 3, slides 1 to 6). The hybridized slides were scanned using a confocal laser scanner. Signal intensity for each gene was determined and processed using the microarray software package from BioDiscovery (El Segundo, Calif.). The expression ratio of the two time points for each gene was calculated and ratios for all the genes across the strains were clustered based on the ratio value and trend among the three strains (DC280, DC240 and DC271) (FIG. 2).

TABLE 3
Summary of microarray experiments performed in Examples 1-3

Experiment   Slide   Cy3           Cy5
DC280        1       4 hr sample   0 hr sample
             2       0 hr sample   4 hr sample
DC240        3       4 hr sample   0 hr sample
             4       0 hr sample   4 hr sample
DC271        5       4 hr sample   0 hr sample
             6       0 hr sample   4 hr sample
0 hr         7       DC240         DC271
             8       DC271         DC240
4 hr         9       DC240         DC271
             10      DC271         DC240
DC369        11      4 hr sample   0 hr sample
             12      0 hr sample   4 hr sample

[0231] To focus on FM and protease gene expression in P. fluorescens under the stress imposed by high level recombinant protein production, a list of FM and protease genes was compared to the cluster analysis. After hierarchical clustering analysis of all the genes from DC280, DC240 and DC271, FMs and proteases were identified in two clusters (lines in clusters 6 and 7; FIG. 2).

[0232] Four genes in cluster 7 show significantly higher expression in DC271 expressing mainly insoluble periplasmic human growth hormone as compared to DC240 producing soluble cytoplasmic nitrilase or DC280, which does not overproduce any protein. The four genes are rxf01961 encoding HslV, rxf01957 encoding HslU, rxf03987 encoding CbpA and rxf05455 encoding HtpG. The E. coli HslV (ClpQ) and HslU (ClpY) together form a cytoplasmic protease. The small subunit, HslV, is a peptidase related to the proteasomal α-subunits of eukaryotes. The large subunit, HslU, is an ATPase with homology to other Clp family ATPases such as ClpA and ClpX. CbpA of E. coli is an analogue of the well-characterized co-chaperone DnaJ as judged from not only its structure but also its function. The phenotype of lesions in DnaJ, such as temperature sensitivity for growth, is restored upon introduction of the cbpA gene on a multicopy plasmid. HtpG of E. coli functions as an ATP-independent molecular chaperone in vitro. It recognizes and transiently binds non-native folding intermediates, reducing their free concentration in solution and thus preventing unspecific aggregation.

[0233] The genes in cluster 6 of FIG. 2 were clustered again using hierarchical clustering to identify less pronounced effects. FIG. 3 shows that FMs and proteases were identified in two main clusters (lines in cluster 6 and 8). The two FMs in cluster 8 are DnaK and DnaJ, two main chaperones that are well known to work together to fold numerous proteins. Further analysis of expression values of genes from cluster 6 identified an additional FM, ClpX, that is higher expressed in DC271 producing pbp::hGH as compared to DC240 producing nitrilase or DC280, which does not overproduce any protein. The E. coli ClpX heat shock protein is homologous to members of the prokaryotic and eukaryotic HSP100/Clp ATPases family. ClpX of E. coli was isolated as a specific component of the ATP-dependent Clp proteases, which maintain certain polypeptides in a form competent for proteolysis by the ClpP protease subunit. ClpX can act as a molecular chaperone, in the absence of ClpP, by activating the initiation proteins involved in DNA replication. Identified FMs and proteases important for periplasmic hGH production are listed in Table 4.

Example 2

Gene Expression Analysis of Strains Producing Cytoplasmic and Periplasmic Proteins: Direct Comparison of Different Strains

[0234] In order to confirm the results obtained above, additional microarray experiments were performed by direct comparison of the two strains DC271 and DC240 (slides 7 to 10 in Table 3). The comparison of the two strains at the 4 hrs after induction time point confirmed that an almost identical set of FM and protease genes were up-regulated in cells expressing partially soluble pbp::hGH (Table 5). All genes listed in Table 5 are significantly (i.e. ≥2-fold) higher expressed in strains producing the partly insoluble hGH as compared to cells producing fully soluble nitrilase. In the direct comparison of DC271 to DC240, a few additional proteins were identified as compared to the time point comparison (see Table 4) that showed significantly higher gene expression values during partially insoluble hGH production. Those genes included rxf08347 encoding ClpB, rxf04587 encoding ClpA, and rxf05753 encoding FkbP. The E. coli ClpB homologue is involved in reactivation of inclusion bodies together with DnaKJ-GrpE. ClpA from E. coli has a chaperone function or, when together with ClpP, degrades proteins. In E. coli, FkbP functions as a peptidyl-prolyl isomerase.
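The ratio calculation described under "Microarray Hybridization and Data Analysis" (background correction, normalization of Cy5 to Cy3 by total array signal, log2 transformation, combination of replicates) can be sketched as follows. This is a minimal illustration and not the GeneSight implementation: the function names, the intensity floor, the sign-flip averaging of dye-swapped slides, and the example intensities are all assumptions introduced here.

```python
import math


def log2_ratios(cy3, cy5, bg3, bg5, floor=1.0):
    """Return per-spot log2(Cy5/Cy3) after background correction and
    total-intensity normalization of the Cy5 channel to the Cy3 channel."""
    # Background-correct each spot; the floor (an assumption) avoids
    # taking the log of zero or negative values.
    c3 = [max(s - b, floor) for s, b in zip(cy3, bg3)]
    c5 = [max(s - b, floor) for s, b in zip(cy5, bg5)]
    # Scale Cy5 so both channels have the same total signal on the array.
    scale = sum(c3) / sum(c5)
    return [math.log2(v5 * scale / v3) for v3, v5 in zip(c3, c5)]


def combine_dye_swap(forward, swapped):
    """Average a slide with its dye-swapped replicate (assumed scheme).
    The swapped slide measures the inverse ratio, so its sign is flipped."""
    return [(f - s) / 2.0 for f, s in zip(forward, swapped)]


# Hypothetical intensities for three spots (not data from the study).
forward = log2_ratios(cy3=[120, 220, 420], cy5=[420, 220, 120],
                      bg3=[20, 20, 20], bg5=[20, 20, 20])
swapped = log2_ratios(cy3=[420, 220, 120], cy5=[120, 220, 420],
                      bg3=[20, 20, 20], bg5=[20, 20, 20])
combined = combine_dye_swap(forward, swapped)
print(combined)
```

A combined log2 ratio of 1 corresponds to a 2-fold difference between the two samples on a slide pair.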

TABLE 4
List of FM and protease genes whose steady-state mRNA ratio levels are higher in DC271 as compared to DC240 and DC280. The values listed are the ratio of 4 hr after IPTG induction to 0 hr.

             DC280             DC240             DC271
Gene ID      (4 hr vs. 0 hr)   (4 hr vs. 0 hr)   (4 hr vs. 0 hr)   Gene    Function
RXF05455.1   0.8               0.6               5.3               htpG    Chaperone protein HtpG
RXF03987.1   1.0               0.5               5.2               cbpA    Curved DNA-binding protein
RXF01961.1   0.9               0.4               5.0               hslV    ATP-dependent protease HslV (EC 3.4.25.-)
RXF01957.1   1.0                                 4.8               hslU    ATP-dependent Hsl protease, ATP-binding subunit HslU
RXF05399     1.0               0.6               3.3               dnaK*   Chaperone protein DnaK
RXF05399.1   1.3               0.6               3.0               dnaK*   Chaperone protein DnaK
RXF05406.1   1.2               0.7               3.0               dnaJ    Chaperone protein DnaJ
RXF04654.1   1.1               0.9               2.0               clpX    ATP-dependent Clp protease, ATP-binding subunit ClpX

*For dnaK, two probes are present on the microarray chip and thus two gene expression values are provided.
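The selection logic behind Table 4 can be illustrated with a short sketch that keeps genes whose ratio in the target strain is at least 2-fold and exceeds both control strains. The ratios are as read from the extracted table; hslU is left out because one of its values is missing in the extraction. The function name, the dictionary layout, and the use of the 2-fold cutoff for this table (the text states it for Table 5) are assumptions made for illustration.

```python
# Ratios (4 hr vs. 0 hr) as read from Table 4: (DC280, DC240, DC271).
# hslU is omitted because one value is missing in the extracted table.
TABLE_4 = {
    "htpG": (0.8, 0.6, 5.3),
    "cbpA": (1.0, 0.5, 5.2),
    "hslV": (0.9, 0.4, 5.0),
    "dnaK_probe1": (1.0, 0.6, 3.3),
    "dnaK_probe2": (1.3, 0.6, 3.0),
    "dnaJ": (1.2, 0.7, 3.0),
    "clpX": (1.1, 0.9, 2.0),
}


def induced_in_target(ratios, min_fold=2.0):
    """Return genes whose target-strain ratio (third value) is at least
    min_fold and higher than the ratio in both control strains."""
    hits = []
    for gene, (dc280, dc240, dc271) in ratios.items():
        if dc271 >= min_fold and dc271 > max(dc280, dc240):
            hits.append(gene)
    return hits


hits = induced_in_target(TABLE_4)
print(hits)
```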

TABLE 5 List of FM and protease genes whose steady-state mRNA levels are higher in DC271 as compared to DC240. The values listed are the ratio of DC271 to DC240 at 4 hr after IPTG induction.

DC271 wS. DC271 wS. Gene ID DC240 at Ohr DC240 at 4 hr Gene Function RXFO3987 1 O.8 10.8 cbp.A Curved DNA-binding protein RXFO 1957 1 O.9 1O.O hSIU ATP-dependent Hsl protease, ATP binding subunit HslU RXFO 1961. 1 0.7 1O.O hSIV ATP-dependent protease HsV (ec 3.4.25.—) US 2006/01 10747 A1 May 25, 2006 44

TABLE 5-continued List of FM and protease genes whose steady-state mRNA levels are higher in DC271 as compared to DC240. The values listed are the ratio of DC271 to DC240 at 4 hr after IPTG induction.

Gene ID       DC271 vs. DC240   DC271 vs. DC240   Gene    Function
              at 0 hr           at 4 hr
RXF05455.1    0.7               7.8               htpG    Chaperone protein HtpG
RXF05406.1    1.0               4.7               dnaJ    Chaperone protein DnaJ
RXF08347.1    0.6               3.8               clpB    ClpB protein
RXF05399.1    1.0               3.7               dnaK*   Chaperone protein DnaK
RXF05399      0.9               2.9               dnaK*   Chaperone protein DnaK
RXF04587.1    0.9               2.8               clpA    ATP-dependent Clp protease, ATP-binding subunit ClpA
RXF05753.1    1.1               2.1               fkbP    Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8)
RXF04654.1    1.2               2.0               clpX    ATP-dependent Clp protease, ATP-binding subunit ClpX

*For dnaK, two probes are present on the microarray chip and thus two gene expression values are provided.

Example 3

Gene Expression Analysis of a Strain Producing an Insoluble Cytoplasmic Protein

[0237] Since DC271 expresses partially periplasmic human growth hormone (pbp::hGH), it was investigated whether similar or different FMs and protease genes were up-regulated in a strain expressing mainly insoluble cytoplasmic hGH. DC369 was used in this experiment. The sample taken 4 hrs after induction was compared with the 0 hr time point sample, and microarray experiments were performed as shown in Table 3 (slides 11 and 12). Again, similar FM and protease genes were found to be up-regulated, indicating that the identified genes are involved in cytoplasmic rather than periplasmic folding and protein degradation (Table 6).

[0238] A summary of which genes were identified in which experiment, along with the fold up-regulation, is shown in the Venn diagram of FIG. 4.

TABLE 6

List of FM and protease genes whose steady-state mRNA levels are higher in DC369 at 4 hrs after induction as compared to time zero. The values listed are the ratio of 4 hr after IPTG induction to 0 hr (just before induction).

Gene ID       DC369             Gene    Function
              (4 hr vs. 0 hr)
RXF01961.1    4.8               hslV    ATP-dependent protease HslV (EC 3.4.25.-)
RXF01957.1    4.3               hslU    ATP-dependent Hsl protease, ATP-binding subunit HslU
RXF03987.1    4.1               cbpA    Curved DNA-binding protein
RXF05455.1    3.3               htpG    Chaperone protein HtpG
RXF05406.1    2.3               dnaJ    Chaperone protein DnaJ
RXF08347.1    2.2               clpB    ClpB protein
RXF05399.1    2.1               dnaK*   Chaperone protein DnaK
RXF02095.1    2.0               groES   10 kDa Chaperonin GroES
RXF06767.1    2.0               groEL   10 kDa Chaperonin GroEL
RXF05399      1.8               dnaK*   Chaperone protein DnaK
RXF04587.1    1.7               clpA    ATP-dependent Clp protease, ATP-binding subunit ClpA

*For dnaK, two probes are present on the microarray chip and thus two gene expression values are provided.

Example 4

Generation of an hslU Mutant Strain in P. fluorescens DC206

[0239] The two genes hslVU were found to be among the most highly up-regulated identified genes. HslU is a cytoplasmic ATPase. The homologous protein in E. coli can act in combination with a second protein to promote energy-dependent protein degradation in E. coli. HslU interacts with HslV, a protein with homology to the α subunits of the proteasome. The E. coli HslVU homologues were reported to be involved in overall proteolysis of misfolded proteins in Missiakas, D., et al. (1996) Identification and characterization of HslV HslU (ClpQ ClpY) proteins involved in overall proteolysis of misfolded proteins in Escherichia coli. Embo J 15:6899-909. DNA sequence analysis suggested that the P. fluorescens hslVU genes are likely to be part of a bicistronic operon (FIG. 5).

[0240] In order to verify that HslVU are indeed involved in the degradation of hGH, an hslU knockout strain was constructed. Such a strain was generated by insertional inactivation of hslU (FIG. 6). An approximately 550 bp DNA fragment internal to hslU was cloned into the kanamycin-resistant pCR2.1-TOPO vector. Since this vector has an origin of replication (ColE1) that is functional in E. coli but not in P. fluorescens, the constructed plasmids will integrate into the chromosome of DC206 through homologous recombination in order to confer kanamycin resistance. The correct insertion site for the kanamycin-resistant colonies was confirmed by diagnostic colony PCR using primers that hybridize to the outside of the originally amplified region and within the plasmid backbone (Table 3). The constructed hslU mutant strain was designated DC370.

[0241] Primers were designed that would amplify a ~550 bp internal region of the hslU gene (Table 7). The internal fragment was amplified using Taq Polymerase (Promega), purified, and cloned into pCR2.1-TOPO vector (Invitrogen, San Diego, Calif.). The plasmids were transformed into competent P. fluorescens DC206 and selected on M9 glucose agar plates supplemented with 250 µg/ml uracil and 50 µg/ml kanamycin.
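The selection of up-regulated candidates from the Table 6 ratios can be sketched as follows. This is only an illustrative sketch: the ratios are transcribed from Table 6, while the function name `upregulated` and the 2.0-fold cutoff are assumptions made for the example and are not stated as a threshold in the text.

```python
# Ratios (4 hr after IPTG induction vs. 0 hr) transcribed from Table 6 for DC369.
# dnaK appears twice because two probes are present on the microarray chip.
table6_ratios = {
    "hslV": 4.8,
    "hslU": 4.3,
    "cbpA": 4.1,
    "htpG": 3.3,
    "dnaJ": 2.3,
    "clpB": 2.2,
    "dnaK (probe 1)": 2.1,
    "groES": 2.0,
    "groEL": 2.0,
    "dnaK (probe 2)": 1.8,
    "clpA": 1.7,
}


def upregulated(ratios, cutoff=2.0):
    """Return gene names whose ratio meets the cutoff, highest ratio first.

    The cutoff value is a hypothetical choice for illustration only.
    """
    hits = {gene: ratio for gene, ratio in ratios.items() if ratio >= cutoff}
    return sorted(hits, key=hits.get, reverse=True)


candidates = upregulated(table6_ratios)
print(candidates)
```

With this assumed cutoff, hslV and hslU rank first, which matches the text's choice of the hslVU genes as the first targets for modification.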

TABLE 7

Primers

Primers           Sequence                                     Purpose
hslU_sens         accgaagttcggctatotggg                        used in PCR amplification of
                  (SEQ ID NO: 1)                               the internal hslU fragment
hslU_antis        aatcgcgctgcacgccttcg
                  (SEQ ID NO: 2)

hsl_F2            ttcatcaaggtogaagcg                           used in diagnostic PCR
                  (SEQ ID NO: 3)
hsl_R2            toagtcttgaccatgcc
                  (SEQ ID NO: 4)
M13_R             caggaaacagctatgac
                  (SEQ ID NO: 5)
M13_F             taaaacgacggccag
                  (SEQ ID NO: 6)

hsl-Up            gtggcagocaccaaggctgc                         used in SOE PCR of the up- and
                  (SEQ ID NO: 7)                               down- DNA fragments of hslUV
hsl_middleUp      cccacattgagtgaggcttacaaggggagagtctccacg
                  (SEQ ID NO: 8)
hsl_middleDown    cgtggagactctccccttgtaagcctcactcaatgtggg
                  (SEQ ID NO: 9)
hsl_down          ggccaatggttggccacgcg
                  (SEQ ID NO: 10)

hsl_UpUp          tgccgacgccacaggtgc                           used in diagnostic PCR
                  (SEQ ID NO: 11)
hsl_DownDown      gcctggtactgcgactcg
                  (SEQ ID NO: 12)

RC199             atatactagtaggaggtaacttatggctgacgaacagacgca   used in cloning the grpE-DnaKJ
                  (SEQ ID NO: 13)
RC200             atattctagattacaggtcgccgaagaagc
                  (SEQ ID NO: 14)

Protein Expression Comparison by SDS-PAGE Analysis

[0242] To study the effect of the hslU gene knockout, the expression of two exogenous proteins was compared between the parent strain DC206 and the newly constructed mutant strain DC370. The plasmids harboring the genes encoding pbp::hGH (pDOW1323) and hGH (pDOW1426) were each transformed into competent DC370 cells, resulting in strains DC373 and DC372, respectively. Standard shake flask growth experiments were performed with the four strains. FIG. 7 shows that the wild-type and mutant strains have similar growth rates. Samples were run on SDS-PAGE gels (FIGS. 8 and 9). The results suggest that the mutant produced higher amounts of proteins due to the deletion of the protease subunit HslU.

Protein Expression Comparison by Fluorescence Activity

[0243] Since the observed effect of the lack of HslU on the yield of hGH is difficult to quantitate using SDS-PAGE analyses, the temporal profile of protein production was monitored by the fluorescence of a fusion protein between COP green fluorescent protein and hGH. A plasmid containing an hGH::COP fusion was constructed and transformed into the parent strain DC206 and the hslU gene deletion strain DC370, resulting in strains HJ104 and HJ105 (Table 1). Standard shake flask experiments were performed and samples were taken at various time points for fluorescence measurements (FIG. 10). The readings from the fluorimeter clearly showed that the hslU protease mutant strain had significantly higher protein expression levels than the parental strain (FIG. 11). This finding corroborates the results obtained by SDS-PAGE analysis. Compared to the wild-type strain, the hslU mutant showed a 33.05% increase in relative fluorescence at 24 hrs after induction (see insert in FIG. 11).

Example 5

Construction of an hslUV Clean Knockout Strain

[0244] The Hsl protease consists of two subunits: an ATP-binding subunit encoded by hslU, and a protease subunit encoded by hslV. The previously constructed Hsl protease knock-out strain is an insertional inactivation of the hslU gene. To remove the concern that HslV might still function as a protease by being able to couple with an ATP-binding subunit of another protease, a deletion strain was constructed that had both the hslU and hslV genes removed from the chromosome.

[0245] As shown in FIG. 13, plasmid pDOW2050 was constructed by PCR amplification of two DNA fragments flanking the hslUV region; the two fragments were subsequently fused using the Splicing by Overlap Extension (SOE) PCR method (see Ho, S. N. (1991) Method for gene splicing by overlap extension using the polymerase chain reaction. Application: U.S. patent 89-3920955023171). The fused DNA fragments were then ligated into the SrfI site of vector pDOW1261-2. The deletion plasmid was named pDOW2050 after the insert was confirmed by DNA sequencing.

[0246] Plasmid pDOW2050 was electroporated into DC206 and plated onto M9 agar plates supplemented with 1% glucose and 15 µg/ml tetracycline. Tetracycline resistance is due to an integration event that recombines the entire plasmid into the chromosome at one of the two homologous regions within the genome (FIG. 13). To select for cells that have a deletion of the hslUV genes resulting from a second homologous recombination between the integrated plasmid and the homologous DNA region in the chromosome, the tetracycline-resistant colonies were grown to stationary phase in LB medium supplemented with 250 µg/ml uracil. Cells were then plated onto LB agar plates supplemented with 500 µg/ml 5-fluoroorotic acid (5-FOA). Cells that lost the integrated plasmid by a second recombination event have also lost the pyrF gene and thus are resistant to 5-FOA, resulting in the desired chromosomal hslUV deletion strain, called DC417.

Phenotypic Analysis of hslUV Deletion Strain

[0247] SDS-PAGE analysis of the hslUV deletion strain expressing hGH protein (strain HJ115) showed much higher protein yield than the wild-type strain DC369, similar to what was observed earlier using the hslU insertional mutant strain DC372 (data not shown).

[0248] Protein yield was also measured by fluorescence activity of the hGH::COP fusion using the same method described earlier. Plasmid pDOW1349 containing the hGH::COP fusion was transformed into wild-type and mutant strains, resulting in strains HJ104 and HJ117, respectively. Standard shake flask experiments were performed and samples were taken at various time points for relative
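For the SOE PCR step described in Example 5, the two inner primers must overlap so that the upstream and downstream fragments can anneal and be fused. A minimal sketch of that check is given below. It assumes the hsl middle up and hsl middle down primer sequences as read from Table 7 and the sequence listing with OCR breaks removed (SEQ ID NOS: 8 and 9, 39 nt each); the helper name `reverse_complement` is hypothetical.

```python
# Translation table mapping each lowercase DNA base to its complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Inner SOE primers, transcribed from the listing (assumed free of OCR errors).
hsl_middle_up = "cccacattgagtgaggcttacaaggggagagtctccacg"
hsl_middle_down = "cgtggagactctccccttgtaagcctcactcaatgtggg"

# For fragments to fuse, one inner primer should be the reverse complement
# of the other, so the two PCR products share a complementary end.
overlap_ok = reverse_complement(hsl_middle_up) == hsl_middle_down
print(overlap_ok)
```

If the transcription is right, the two primers are exact reverse complements, which is the arrangement that lets the flanking fragments be joined without the intervening hslUV region.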


TABLE 8-continued

Protease genes whose steady-state mRNA levels are higher in the hslUV protease deletion strain (HJ115) as compared to the wild type strain (DC369), based on the ratio of 4 hr after IPTG induction to 0 hr (just before induction).

Curated ORF ID    Function    Sequence

tcgatgccgaccagatcgacgacatcatggcgggccgtacgccgcgtgagccgcgcgactg
ggaaggtggttcgggtacttcgggcactccgcctgtggtgcagaatgagcgccctgaaacgc
ctatcggcggcccggcagctgatcactaa

Example 7

Co-Overexpression of Folding Modulators Increases Solubility of Target Protein hGH

[0250] Based on the transcriptional profiling data shown in FIG. 4, expression of folding modulators (FMs) DnaK and DnaJ was increased in strains producing recombinant protein compared to control strains (see Tables 4 and 5). A strain that co-overproduced GrpE, DnaK and DnaJ along with hGH was produced and tested to determine whether this resulted in the accumulation of increased levels of soluble hGH.

Construction of grpE-dnaKJ-Containing Plasmid for Co-Overexpression with hGH

[0251] The P. fluorescens grpE-dnaKJ genes were amplified using chromosomal DNA isolated from MB214 (DNeasy; Qiagen, Valencia, Calif.) as a template, with RC199 (5'-ATATACTAGTAGGAGGTAACTTATGGCTGACGAACAGACGCA-3') and RC200 (5'-ATATTCTAGATTACAGGTCGCCGAAGAAGC-3') as primers. PfuTurbo (Stratagene, La Jolla, Calif.) was used following the manufacturer's recommendations. The resulting PCR product (4 kb) was digested with SpeI and XbaI (restriction sites underlined in the primers above) and ligated to pDOW2236 to create pDOW2240 containing the grpE-dnaKJ genes under the control of the tac promoter. Plasmid pDOW2240 was digested with SpeI and HindIII, and the resulting grpE-dnaKJ-containing 4.0 kb fragment was gel-purified using Qiaquick (Qiagen) and ligated to pDOW2247 also digested with SpeI and HindIII. The resulting plasmid, pDOW3501, containing grpE-dnaKJ under the control of the mannitol promoter, was transformed into DC388 by selecting on M9 glucose plates supplemented with 250 µg/ml uracil. Finally, pDOW1426 was electroporated into the above strain (DC462) and selected on M9 glucose plates, resulting in strain DC463 with two inducible plasmids: 1) pDOW1426 carrying hGH under the tac promoter and 2) pDOW3501 carrying grpE-dnaKJ under the mannitol promoter.

Shake Flask Fermentation, Sample Collection and Analysis

[0252] Duplicate cultures of DC463 were grown in shake flasks. Protein induction was accomplished by addition of 0.1 mM IPTG for hGH and 0.5% mannitol for GrpE-DnaKJ at 24 hrs after inoculation. Samples were collected at 0, 4, 8, 24 and 48 hours after induction. At each time point, 20 OD of cells, normalized in 1 mL, were harvested, lysed using EasyLyse(TM) (Epicentre, Madison, Wis.) and separated into soluble and insoluble fractions by centrifugation at 14000 rpm for 30 minutes. Equal volumes of samples were combined with BioRad (Hercules, Calif.) 2x Laemmli buffer and heated at 95° C. for 5 minutes, and 30 µL was loaded onto a BioRad 15% Tris HCl Criterion gel using 1x Tris SDS running buffer (BioRad). The proteins were visualized with Simply Blue Safestain (Invitrogen, Carlsbad, Calif.) as shown in FIG. 15. The resulting Coomassie-stained gels were scanned using a Molecular Devices Personal Densitometer (Molecular Devices, Sunnyvale, Calif.) with analyses performed using ImageQuant and Excel. As shown in FIG. 15, co-overexpression of GrpE and DnaKJ significantly increased the solubility of hGH, converting almost 100% of the target protein into the soluble fraction, albeit at a lower total protein yield. Additional experiments repeating growth and induction of DC463 using the simultaneous addition of IPTG and mannitol closely mimicked the results shown here, although with a varying degree of hGH solubility (between 50-100%; data not shown) when GrpE and DnaKJ were co-overproduced. These findings further demonstrate that targeted strain engineering based on transcriptional profiling can lead to a rational strain design to increase solubility and/or yield of a recombinant protein.

[0253] The invention has been described with reference to certain embodiments and non-limiting examples. It will be clear to one of skill in the art that other embodiments of the invention are also possible.
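The cloning step in Example 7 relies on the SpeI and XbaI sites carried by primers RC199 and RC200. A short sketch of how those sites can be located in the primer sequences follows. The primer sequences are taken from paragraph [0251]; the recognition sequences (SpeI: ACTAGT, XbaI: TCTAGA) are the standard ones for these enzymes, and the function name `find_sites` is an assumption made for illustration.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"SpeI": "ACTAGT", "XbaI": "TCTAGA"}

# Primer sequences as given in the text (5' to 3').
RC199 = "ATATACTAGTAGGAGGTAACTTATGGCTGACGAACAGACGCA"
RC200 = "ATATTCTAGATTACAGGTCGCCGAAGAAGC"


def find_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to its 0-based start index."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


for name, primer in (("RC199", RC199), ("RC200", RC200)):
    print(name, len(primer), find_sites(primer))
```

In both primers the site sits directly after a four-base 5' extension (ATAT), so the forward primer contributes the SpeI end and the reverse primer the XbaI end of the roughly 4 kb PCR product.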

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 19

<210> SEQ ID NO 1
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: Sense primer HslU gene
<400> SEQUENCE: 1
accgaagttcg gctatotggg 20

<210> SEQ ID NO 2
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: Antisense primer hslU gene
<400> SEQUENCE: 2
aatcgcgctg cacgccttcg 20

<210> SEQ ID NO 3
<211> LENGTH: 18
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer hslF2
<400> SEQUENCE: 3
ttcatcaagg togaagcg 18

<210> SEQ ID NO 4
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: reverse primer hslR2
<400> SEQUENCE: 4
tdagtottga ccatgcc 17

<210> SEQ ID NO 5
<211> LENGTH: 17
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: reverse primer M13R
<400> SEQUENCE: 5
caggaaacag ctatgac 17

<210> SEQ ID NO 6
<211> LENGTH: 15
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer M13F
<400> SEQUENCE: 6
taaaacgacg gccag 15

<210> SEQ ID NO 7
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: overlap extension primer hslUp
<400> SEQUENCE: 7
gtggcagoca ccaaggctgc 20

<210> SEQ ID NO 8
<211> LENGTH: 39
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: overlap extension primer hsl middle up
<400> SEQUENCE: 8
cccacattga gtgaggctta caaggggaga gtctccacg 39

<210> SEQ ID NO 9
<211> LENGTH: 39
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: overlap extension primer hsl middle down
<400> SEQUENCE: 9
cgtggagact ctccccttgt aagcctcact caatgtggg 39

<210> SEQ ID NO 10
<211> LENGTH: 20
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: overlap extension primer hsl down
<400> SEQUENCE: 10
ggccaatggt tggccacgcg 20

<210> SEQ ID NO 11
<211> LENGTH: 18
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: overlap extension primer hsl up up
<400> SEQUENCE: 11
tgccgacgcc acaggtgc 18

<210> SEQ ID NO 12
<211> LENGTH: 18
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: overlap extension primer hsl down down
<400> SEQUENCE: 12
gcctggtact gcgactcg 18

<210> SEQ ID NO 13
<211> LENGTH: 42
<212> TYPE: DNA
<213> ORGANISM: artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: forward primer RC199
<400> SEQUENCE: 13
atatactagt aggaggtaac ttatggctga cgaacagacg ca 42

<210> SEQ ID NO 14


-contin ued ttgaac gata togcaaagaa totgatcc to tdgttgatca togcggctgt cctggtgacg 60 gtgatgaaca acttct coag Coctaacgag cc.gcagacco toaactatto cg actitcatc 120 cagdaagtta aggatggcaa gotcgagcgc gtagcggttg atggctacgt. gattaccggit 18O aa.gc.gcaacg atggcgacag cittcaag acc attcgtocto coattcagga caacggtotc 240 atcggtgacc toggtggataa caaggtogtt gtggaaggca agcagoctoga acago: aaag.c 3OO atctgg acco agctoctogt ggc.ca.gctitc cc gatcc togg to attatc.gc cgtgttcatg 360 ttcttcatgc gcc agatgca aggcggtgcg ggaggcaagg gC9ggcc gat gagctitcggc 420 aaaagcaagg cqc.gc.ctgct citc.cgaagac caggtgaaga ccaccctggc tgacgtogca 480 ggttgcgacg aagccaagga agaagtcggit gagttggtog agttcctg.cg tgatc.cgggc 540 aagttc.ca.gc gcc toggtgg cc.gtattoct c goggtotgc tigatggtggg gcc to cqggit 600 accggtaaaa ccttgctggc caaggcgatt gcc gg.cgaag cca aggtgcc tittcttcacg 660 attitcc.ggitt citgactitcgt. c gagatgttt gtcgg.cgtog gogc.ca.gc.cg tgttcgc gat 720 atgttcgagc aggccaagaa goacgc.gc.ca to catcatct tcatcgacga aatcgatgcc 78O gttggtc.gcc atcgtggcgc ggg catgggg ggtgg to acg atgagcgtga gcagaccctic 840 alaccagttgc tiggtagagat ggatggtttc gagatgaatg acggcattat cgtcatcgc.c 9 OO gcaaccalacc gtc.ccgacgt totcg accot gcgttgctgc gtc.cggg.ccg. tittcg accqt 96.O Caggittgtgg toggcctgcc ggacattcgt g g togtogagc agatcctgaa agtacacatg O20 cgcaaggtgc caatgggtga C gacgtggct coggctdtga togc.ccgtgg tactccc.ggit O8O ttct coggtg citgatctggc galacctgg to aac gaggott cqctgttc.gc tg.ccc.gtact 14 O ggcaa.gc.gca togttgagat gaaagagttc gaattgg.cga aag acaagat catgatgggc 200 gcc gag cqca aatccatggit catgtc.cgag aaagaga agc agaac accgc ttatcac gag 260 gcc.ggtoacg ccattgtagg togcgttgtg cct gag catg accocgt.cta caaagtgtc.g 320 atcattcc to gtggtogggc actgggtgtg accatgttcc toccggalaga agatcgctac 38O agccitctoca agcgtgcgct gatcago cag atctgctcqc totatgg.cgg togtattgct 4 40 gaggaaatga cccittggctt cqacggtgtg accactdgtog cct coaatga catcatgcgt. 
5 OO gccago caga togcacgaaa catggtgacc aagtggggct totcggaaaa acticggcc.ca 560 ttgatgtacg cc.gaagagga aggcgaagtg titcCtggggc gtggcgg.cgg tgggcaaag C 62O gccagottct c gggtgagac agccaagctg atcgacitcc.g. aagttc.gcag catcattgac 680 cagtgctato goacggcc aa goagattittg actgaca acc gtgacaa.gct ggacgc.catg 740 gctgatgcgt tatgaagta C gaalaccatc gatgcc.gacc agatcgacga catcatggcg 800 ggc.cgitacgc cqCgtgagcc gcgcgactgg galaggtggitt C ggg tacttic ggg CactCcg 860 cctgtggtgc agaatgag cq coctogaaacg cctatoggcg gcc.cggcago tgatcactaa 920

1. A process for improving the expression of a recombinant protein or peptide in a host cell or organism comprising:
   i) expressing the recombinant protein or peptide in the recombinant host cell or organism;
   ii) analyzing a genetic profile of the recombinant cell to identify one or more compensatory gene(s) or gene product(s) that are expressed at a higher level in the recombinant cell than in either a host cell that has not been modified to express the recombinant protein or a recombinant cell that is not expressing the recombinant protein; and
   iii) changing expression of the identified compensatory gene or gene product in the recombinant cell by genetic modification to provide a modified recombinant cell that achieves an increase in recombinant protein expression, activity or solubility.

2. The process of claim 1 further comprising expressing the protein or peptide in the modified recombinant cell.

3. The process of claim 1 further comprising:
   (a) expressing the recombinant protein or peptide in the modified recombinant cell;
   (b) analyzing a genetic profile of the modified recombinant cell to identify at least one second gene(s) or gene product(s) that are differentially expressed in the modified recombinant cell;
   (c) changing expression of the second identified gene product in the modified recombinant cell to provide a doubly modified cell; and
   (d) expressing the protein or peptide in the doubly modified recombinant cell.

4. The process of claim 3 further comprising repeating steps a) to d).

5. The process of claim 4 comprising repeating steps a) to d) until cell viability is affected by changing the expression of the identified gene(s) or gene product(s).

6. The process of claim 4 comprising repeating steps a) to d) until expression of the recombinant protein or peptide reaches a targeted endpoint.

7. The process of claim 1 wherein a genetic profile is analyzed by comparing a genetic profile of the recombinant cell to a second genetic profile of the host cell.

8. The process of claim 1 wherein the genetic profile is a transcriptome profile.

9. The process of claim 8 wherein the transcriptome profile is determined through a microarray.

10. The process of claim 1 wherein the genetic profile is a proteome profile.

11. The process of claim 10 wherein the proteome profile is determined through two dimensional gel electrophoresis, ICAT or LC/MS.

12. The process of claim 10 wherein the proteome profile is determined through a peptide array.

13. The process of claim 12 wherein the peptide array is an antibody array.

14. The process of claim 1 wherein the identified gene product is a protease, a subunit of a protease, a cofactor of a protease, a cellular or a genetic modulator affecting expression of a protease.

15. The process of claim 14 wherein the identified gene product is a protease.

16. The process of claim 14 wherein the identified gene product is a subunit of a protease.

17. The process of claim 14 wherein the identified gene product is a cofactor of a protease.

18. The process of claim 14 wherein the identified gene product is a cellular or genetic modulator affecting expression of a protease.

19. The process of claim 14 wherein the identified gene product is selected from the group consisting of D-alanyl-meso-diaminopimelate endopeptidase, zinc protease, microsomal dipeptidase, extracellular metalloprotease precursor, cell division protein ftsH and gene products derived from genes hslV, hslU, clpX, clpA and clpB.

20. The process of claim 14 wherein identified gene product mRNA level is up-regulated when the recombinant protein or peptide is expressed in the host cell.

21. The process of claim 14 wherein the identified gene product is removed from a host cell genome.

22. The process of claim 21 wherein the identified gene product is removed by homologous recombination.

23. The process of claim 1 wherein the identified gene product is a folding modulator, a subunit of a folding modulator, a cofactor of a folding modulator, or a cellular or genetic modulator affecting the expression of a folding modulator.

24. The process of claim 23 wherein the identified gene product is a folding modulator.

25. The process of claim 23 wherein the identified gene product is a subunit of a folding modulator.

26. The process of claim 23 wherein the identified gene product is a cofactor of a folding modulator.

27. The process of claim 23 wherein the identified gene product is a cellular or genetic modulator affecting the expression of a folding modulator.

28. The process of claim 23 wherein the folding modulator is a chaperone protein.

29. The process of claim 23 wherein the folding modulator is selected from the group consisting of gene products of the genes cbpA, htpG, dnaK, dnaJ, fkbP2, groES and groEL.

30. The process of claim 23 wherein the expression of the identified gene product is changed by increasing expression of the identified gene, a cofactor of an identified gene, or a cellular or genetic modulator of the identified gene.

31. The process of claim 30 wherein the increased expression is by inclusion of a DNA encoding the identified gene product.

32. The process of claim 30 wherein the increased expression is by insertion of a promoter into a host cell genome.

33. The process of claim 30 wherein the increased expression is by inclusion of an exogenous vector into the host cell.

34. The process of claim 1 wherein the host cell is a microbial cell.

35. The process of claim 1 wherein the host cell is a Pseudomonad.

36. The process of claim 1 wherein the host cell is a P. fluorescens cell.

37. The process of claim 1 wherein the host cell is an E. coli cell.

38. The process of claim 1 wherein the host cell is selected from the group consisting of an insect cell, a mammalian cell, a yeast cell, a fungal cell and a plant cell.

39. The process of claim 9 wherein the microarray comprises samples of binding partners to at least 50% of a genome of the host cell.

40. The process of claim 9 wherein the microarray technique comprises samples of binding partners to at least 80% of a genome of the host cell.

41. The process of claim 9 wherein the microarray comprises samples of binding partners to at least 90% of a genome of the host cell.

42. The process of claim 9 wherein the microarray comprises samples of binding partners to at least 95% of a genome of the host cell.

43. The process of claim 1 wherein the improved expression is an increase in the amount of recombinant protein or peptide.

44. The process of claim 1 wherein the improved expression is an increased solubility of the recombinant protein or peptide.

45. The process of claim 1 wherein the improved expression is an increased activity of the recombinant protein or peptide.

46. The process of claim 1 wherein the genetic profile is a profile of genes in a gene family.

47. The process of claim 1 wherein the profile comprises proteases and folding modulators.

48. The process of claim 46 wherein the profile consists essentially of proteases.

49. A host cell or organism that expresses a recombinant protein that has been genetically modified to reduce the expression of at least two proteases.

50. A host cell or organism that expresses a recombinant protein that has been genetically modified to reduce the expression of at least one protease selected from the group consisting of D-alanyl-meso-diaminopimelate endopeptidase, zinc protease, microsomal dipeptidase, extracellular metalloprotease precursor, cell division protein ftsH and gene products derived from genes hslV, hslU, clpX, clpA and clpB.

51. A host cell or organism that expresses a recombinant mammalian derived protein that has been genetically modified to reduce the expression of at least one protease.

52. The host cell or organism of claims 52 wherein the recombinant protein is human growth hormone.

53. A host cell or organism that expresses a recombinant protein that has been genetically modified to increase the expression of at least two folding modulators that are not folding modulator subunits.