US007037894B2 (12) United States Patent (10) Patent No.: US 7,037,894 B2 Marshall et al. (45) Date of Patent: May 2, 2006

(54) STABILIZED PROTEINS Malencik and Anderson, 1994, "Dityrosine Formation in Calmodulin: Conditions for Intermolecular Cross-Linking, (75) Inventors: Christopher P. Marshall, Brooklyn, Biochemistry 33: 13363-13372. NY (US); Alexander Hoffman, Los Brown et al., 1995, “Highly Specific Oxidative Cross-Link Angeles, CA (US); Joseph P. Errico, ing of Proteins Mediated by a Nickel-Peptide Complex’, Far Hills, NJ (US); Paul B. Marshall, Biochemistry 34:4733–4739. Munich (DE) Fancy and Kodadek, 1999. “Chemistry for the analysis of protein-protein interactions: rapid and efficient cross-link (73) Assignee: Avatar Medical, LLC, Newark, NJ ing triggered by long wavelength light', PNAS USA (US) 96:6O2O 6O24. Campbell et al., 1998, “Protein Cross-Linking Mediated by (*) Notice: Subject to any disclaimer, the term of this Metalloporphyrins, Biorg. And Medicinal Chem. patent is extended or adjusted under 35 6:1301 7. U.S.C. 154(b) by 390 days. Galeazzi et al., 1999, “In vitro peroxidase oxidation induces stable dimmers of -amyloid (1–42) through dityrosine (21) Appl. No.: 09/837,235 bridge formation'. Amyloid: Int. J. Exp. Clin. Investig. 6:7 13. (22) Filed: Apr. 18, 2001 Baudry et al., 1996, “Dityrosine bridge formation and thy roid hormone synthesis are tightly linked and are both (65) Prior Publication Data dependent on N-glycans', FEBS Lett. 396:223–226. US 2002/0061549 A1 May 23, 2002 Briza et al., 1994, “The sporulation-specific enzymes encoded by the DIT1 and DIT2 genes catalyze a two-step Related U.S. Application Data reaction leading to a soluble LL-dityrosine-containing pre cursor of the yeast spore wall', Proc. Natl. Acad. Sci. USA (63) Continuation of application No. PCT/US00/28595, filed on 91:4524-4528. Oct. 16, 2000. Briza et al., 1990, “Characterization of a DL-dityrosine (60) Provisional application No. 60/159,763, filed on Oct. 15, 1999. containing macromolecule from yeast ascospore walls”. J. Biol. Chem. 265:15118-15123. (51) Int. Cl. Briza et al., 1986, "Dityrosine is a prominent component of A6 IK 38/00 (2006.01) the yeast ascospore wall'. J. Biol. Chem. 261:4288-4294. A6 IK 38/43 (2006.01) DeVore and Gruebel, 1978, “Dityrosine in adhesive formed C07K L/00 (2006.01) by the sea mussel, Mytilus edulis”, Biochem. Biophys. Res. C07K 6/00 (2006.01) Comm 80:993-999. CI2N 9/00 (2006.01) Downie et al. 1972, “An insoluble, dityrosine-containing protein from uterus'. Biochim. Biophys. Acta 263:604-609. (52) U.S. Cl...... 514/12: 424/94.1; 424/130.1; 424/94.3: 530/350; 530/387.1; 530/388.21; (Continued) 530/388.22; 530/388.24; 530/399; 435/183; Primary Examiner Tekchand Saidha 435/198: 514/2 (74) Attorney, Agent, or Firm Jane M. Love; Wilmer (58) Field of Classification Search ...... 514/2, Cutler Pickering Hale and Dorr LLP 514/12: 424/94.1, 94.3, 130. 1, 192.1; 530/350, 530/387.1, 388.21,388.22,388.24, 399, (57) ABSTRACT 530/300,387.9; 435/183, 198: 536/23.1, Isolated polypeptides or polypeptide chains are modified by 536/23.2 di-tyrosine cross-linking such that the retain at least one See application file for complete search history. functional activity. In one embodiment, the isolated polypeptide or polypeptide chains comprise at least one (56) References Cited di-tyrosine cross-link, wherein at least one tyrosine of the di-tyrosine cross-link originates from a point mutation to U.S. PATENT DOCUMENTS tyrosine, and wherein the di-tyrosine cross-linked protein 4,650,758 A 3, 1987 Shaked et al...... 435/105 retains at least one function displayed by the protein in the 4,904,592 A 2, 1990 Freeman et al...... 435/183 absence of di-tyrosine cross-linking. In another 5,538,876 A 7, 1996 Mutsaers et al...... 435/141 embodiment, the di-tyrosine cross-linked polypeptide or 5,747,654 A 5/1998 Pastan et al...... 530/3917 polypeptide chain has enhanced stability compared to the same polypeptide or polypeptide chain in the absence of FOREIGN PATENT DOCUMENTS di-tyrosine cross-linking. A method for stabilization of a WO WO O1/29247 4/2001 polypeptide or polypeptide complex, by the introduction of WO WO O1, 54486 8, 2001 intra-polypeptide and/or inter-polypeptide di-tyrosine bonds, which simultaneously maintains the structure and OTHER PUBLICATIONS function of the polypeptide or polypeptide complex is also Stachel et al., 1998, “Stabilization of a C7 Equatorial described. Gamma Turn in DMSO–d6 by a Ditryptophan Crosslink', Bioorganic and Medicinal Chemistry 6:1439–1446. 25 Claims, 26 Drawing Sheets US 7,037,894 B2 Page 2

OTHER PUBLICATIONS Nomura et al., 1990, “Pulcherosine, a novel tyrosine-d- erived, trivalent cross-linking from the fertili Fetterer et al., 1993, “Synthesis of tyrosine-derived zation envelope of sea urchin embryo', Biochem. cross-links in Ascaris suum cuticular proteins”. J. Parasitol. 29:45.25 4534. 79:160 166. Onorato et al., 1998, “Immunohistochemical and ELISA Fetterer and Rhoads, 1990, “Tyrosine-derived cross-linking assays for biomarkers of oxidative stress in aging and amino acids in the sheath of Haemonchus contortus infec disease', Annals NY Acad. Sci. 854:277–290. tive larvae', J. Parasitol. 76:619–624. Pandey and Aronson, 1979. “Properties of the Bacillus Frater et al., 1960, “A role for thiol and disulphide groups in subtilis spore coat, J. Bacteriol. 137: 1208–1218. determining the rheological properties of dough made from Raven et al., 1971, “Occurrence of dityrosine in Tussah silk wheaten flour', Nature 186:451–454. fibroin and keratin, Biochim. Biophys. Acta 251:96–99. Gmeiner and Seelos, 1989, “Phosphorylation of tyrosine Shewry et al., 1992, “High molecular weight subunits of prevents dityrosine formation in vitro, FEBS Lett. wheat glutenin', J. Cereal Sci. 15:105-120. 255:395. 397. Smail et al., 1995, “Candida albicans cell walls contain the Grune et al., 2001, "Age-related changes in protein oxida fluorescent cross-linking amino acid dityrosine'. Infect. tion and proteolysis in mammalian cells'. J. Gerontol. Immun. 63:4078 4083. 56A:B459 B467. Sobel and Ajie, 1992, “Modification in amino acids of Dead Sea scroll parchments', Free Radical Biol. Med. Guptasarma and Balasubramanian, 1992, “Dityrosine for 13: 701 702. mation in the proteins of the eye lens”. Current Eye Res. Souza et al., 2000, "Dityrosine cross-linking promotes 11:1121-1125. formation of a stable C-synuclein polymers'. J. Biol. Chem. Hensley et al., 1998, “Electrochemical analysis of protein 2.75:18344- 18349. nitrotyrosin and dityrosine in the Alzheimer brain indicates Spangler and Erman, 1986, “Cytochrome c peroxidase com region-specific accumulation”. J. Neurosci. 18:8126-8132. pound I: formation of covalent protein crosslinks during the Huggins et al., 1992, “Tyrosine and dityrosine concentra endogenous reduction of the active site'. Biochim. Biophys. tions in oxidized proteins and lens proteins with age'. Acta 872:155 157. Annals NY Acad. Sci. 663:436–437. Sullivan et al., 1940, “The action of oxidizing and reducing Kanwar and Balasubramanian, 2000, "Structural studies on agents on flour, Cereal Chem. 17:507–528. some dityrosine-cross-linked globular proteins: stability is Tilley et al., 2001, “Tyrosine cross-links: molecular basis of weakened, but activity is not abolished. Biochem. gluten structure and function.” J. Agric. Food Chem. 49: 39:14976- 14983. 2627 2632. Kanwar and Balasubramanian, 1999, "Structure and stabil Totsune et al., 1993, "Chemiluminescence from bamboo ity of the dityrosine-linked dimer of YB-crystallin'. Exp. shoot cut, Biochem. Biophys. Res. Comm. 194:1025-1029. Eye Res. 68:773–784. Waffenschmidt et al., 1993, “Isodityrosine cross-linking Kato et al., 1995, “Formation of protein-bound 3,4-dihy mediates insolubilization of cell walls in Chlamydomonas', droxyphenylalanine in collagen types I and IV exposed to Plant Cell 5:809–820. ultraviolet light', Photochem. Photobiol. 61:367–372. Waykole and Heidemann, 1976, "Dityrosine in collagen'. Kay and Shapiro, 1987. “Ovoperoxidase assembly into the Connective Tissue Res. 4:219–222. sea urchin fertilization envelope and dityrosine crosslink Wells-Knecht et al., 1993, “Oxidized amino acids in lens ing, Devel. Biol. 121:325–334. protein with age”, J. Biol. Chem. 268:12348–12352. Keeley et al., 1969, "Dityrosine in a non-hydroxyproline, Aeschbach et al., 1976, “Formation of Dityrosine alkali-soluble protein isolated from chick aorta and bovine Cross-Links in Proteins by Oxidation of Tyrosine Resi ligament, Biochem. Biophys. Res. Comm. 34:156–161. dues.” Biochim. Biophys. Acta 439, 292–301 ikas. Bodaness et al., 1984, “An Analysis of the H.O. Mediated LaBella et al., 1967, “Evidence for dityrosine in elastin', Crosslinking of Lens Crytallins Catalyzed by the Heme-Un Biochem. Biophys. Res. Comm. 26:748–753. decapeptide from Cytochrome c.” Arch. Biochem. Biophys. Leonardi et al., 1994, "Presence of dityrosine bridges in 231 (2), 461–469. thyroglobulin and their relationship with iodination, Bio Brown et al., 1998, “Determining Protein-Protein Interac chem. Biophys. Res. Comm. 202:38–43. tions by Oxidative Cross-Linking of a -Glycine Leeuwenburgh et al., 1999, “Oxidized amino acids in the Histidine Fusion Protein.” Biochemistry 37, 4397–4406. urine of aging rats: potential markers for assessing oxidative Campbell et al., 1998, “Protein Cross-Linking Mediated by stress in vivo”, Am. J. Physiol. 276:R128-R135. Metalloporphyrins,” Bioorg. Med. Chem. 6, 1301–1307. Li et al., 1996, “Involvement of peroxidase in chorion Fancy & Kodadek, 1997, “Site Directed Oxidative Protein hardening in Aedes aegypti. Insect Biochem. Molec. Biol. Crosslinking.” Tetrahedron. 53 (35), 11953–11960. 26:309. 3.17. Fancy & Kodadek, 1999. “Chemistry for the Analysis of Malanik and Ledvina, 1979, “The content of dityrosine in Protein-Protein Interactions: Rapid and Efficient chick and rabbit aorta proteins’. Connective Tissue Res. Cross-Linking Triggered by Long Wavelength Light.” Proc. 6:235 240. Natl. Acad. Sci. USA 96, 6020–6024. Malencik et al., 1996, “Dityrosine: preparation, isolation and Gmeiner & Seelos, 1989, “Phosphorylation of Tyrosine analysis”. Anal. Biochem. 242:202–213. Prevents Dityrosine Formation. In Vitro.” FEBS Lett. 255 Michon et al., 1999, “Wheat prolamine crosslinking through (2), 395-397. dityrosine formation catalyzed by peroxidases: improve Govardhan, 1999, “Crosslinking of Enzymes for Improved ment in the modification of a poorly accessible substrate by Stability and Performance. Curr. Opin. Biotech. 10, “indirect' catalysis’, Biotechnol. Bioeng. 63:449–458. 331 335. US 7,037,894 B2 Page 3

Helms et al., 1998, “Flexibility Involving the Intermolecular Malencik & Anderson, 1987. “Dityrosine Formation in Dityrosyl Cross-Links of Enzymatically Polymerized Calm Calmodulin,” Biochemistry 26, 695–704. odulin,” Biochemistry 37, 8378–8384. Malencik & Anderson, 1991, “Fluorometric Characteriza Kanwar & Balasubramanian, 1999, “Structure & Stability of tion of Dityrosine: Complex Formation with Boric Acid and the Dityrosine–Linked Dimer of gB Crystallin.” Exp. Eye Borate Ion. Biochem. Biophys. Res. Commun. 178 (1), Res. 698, 773–784. 60 67. Kanwar & Balasubramanian, 2000, "Structural Studies on Some Dityrosine Cross-Linked Globular Proteins: Stability Malencik & Anderson, 1994, "Dityrosine Formation in Is Weakened, But Activity Is Not Abolished.” Biochemistry Calmodulin: Conditions for Intermolecular Cross-Linking.” 39, 14976-14983. Biochemistry 33, 13363–13372. Lardinois et al., 1999, “Spin Trapping and Protein Malencik & Anderson, 1996, Dityrosine Formation in Calm Cross-Linking of the Lactoperoxidase Protein Radical. J. odulin: Cross-Linking and Polymerization Catalyzed by Biol. Chem. 274 (50), 35441–35448. Arthromyces Peroxidase, Biochemistry 35, 4375-4386. U.S. Patent May 2, 2006 Sheet 1 of 26 US 7,037,894 B2

CATALYST OXLDIZING REAGENT

U.S. Patent May 2, 2006 Sheet 2 of 26 US 7,037,894 B2

: U.S. Patent May 2, 2006 Sheet 3 of 26 US 7,037,894 B2 U.S. Patent May 2, 2006 Sheet 4 of 26 US 7,037,894 B2

3 COMPLEMENTARTY LIGHT DETERMINING REGIONS LIGHT

FAB FRAGMENT 4 FRAMEWORK COMPLEMENT BINDING REGIONS REGION

MACROPHAGE RECRUITING REGION

HEAVY CHANS F.G. 1D U.S. Patent May 2, 2006 Sheet S of 26 US 7,037,894 B2 1.54 le: is . OH

APHA CARBON

ROTATIONAL FREEDOM OH N BEA CARBON ROTATIONAL FREEDOMIN THE DITYROSINE BOND

POLYPEPTDE BACKBONE

FIG.2C U.S. Patent May 2, 2006 Sheet 6 of 26 US 7,037,894 B2

U.S. Patent May 2, 2006 Sheet 7 of 26 US 7,037,894 B2

U.S. Patent May 2, 2006 Sheet 8 of 26 US 7,037,894 B2

LIGHT CHAIN () CHAN K&W ATOM AMINO ACID x COORDINATE y COORDINATE 7 COORDINATE 1

1

1

2 2 2 2 2 2 2 2 3

FIG.5A U.S. Patent May 2, 2006 Sheet 9 of 26 US 7,037,894 B2

HEAVY CHAIN (H) CHAN K&W ATOM AMNO ACID x COORDINATE y COORDINATE 2 COORDINATE

1

2 2 2 2 2 2 2

3.

FIG.5B U.S. Patent May 2, 2006 Sheet 10 of 26 US 7.037,894 B2

FW FRAGMENT L. L. L. L. 2 3 4 5 At CO CO CO CO CO. AA Asp He 298 0.60 ; -3.7826.64-25.07. -1.24 Ch KW At AA Co Gu 1.43 -1.08 805 580 34.84 tle 11.56 -2.24 4.42 3.22 $1.42

Fy FRAGMENT 2

2 3 4 5 Cox Co. Cox CO CO. AA Gu Ser 3.94 8,89 56.85

Glu 10.23 61.09 64.74 Vol 13.63 62.72 65.19

FIG.6B U.S. Patent May 2, 2006 Sheet 11 of 26 US 7,037,894 B2

Fy FRAGMENT 3

L. L. L. L. 2 3 4 5 CO CO CO CO. Ser 1956 1909 . y -13.02-1506 . Z -1586 -12.67 . Ch KW At AA CO GN 26.7 9.76 1088 35.84 35,05 Co Vol 27.45 8,61 734 32.69 32.1 CO CO CO

FIG.6C U.S. Patent May 2, 2006 Sheet 12 of 26 US 7,037,894 B2

RESIDUE PARS H L H 2 H L5 H 4 H Los

FG.7A

RESIDUE PARS AMERAGE StDEV. MAX H L 35.09 1.56 37.37 H1 2 3400 1.87 57.6 H H L4 H Ltos 3226 1.57 31.32 .99

FIG.7B U.S. Patent May 2, 2006 Sheet 13 of 26 US 7,037,894 B2

L. L. L. L. 2 3 4 5 C3 C3 C3 C3 4.14 -0.21 -329 -1.52 . Z -27.53 -2378 . Ch KW At AA C8 GLU 1247 -0.12 8.66 37.27 34.85 C3 ILE 10.11 -2.68 3.62 31.73 29.30

FIG.8 U.S. Patent May 2, 2006 Sheet 14 of 26 US 7,037,894 B2

APHA DISTANCES L. L. L. 2 3 4 5 At CO CO CO CO CO. AA ASP LE 2.98 0.60 -3.78 -1.24 . z -26.64 -25.07 . KW At AA Co. CU 11.45 -1.08 805 5.80 3484 CO E 11.36 -2.24 4.42 221 3142 CC Co. CO FIG.9A

BEA DISTANCES L L. L L 1 2 3 4 5 C3 C3 C3 C3 C3 AA ASP E . a e 4.14 -0.21 -29 -1.52 . Z -27.53 -2.78 . Ch KW At AA CB GLU 1247 -0.12 8.66 37.27 34.85 LE 10.11 -2.68 3.62 3.73 29.30

FIG.9B U.S. Patent May 2, 2006 Sheet 15 of 26 US 7,037,894 B2

DFFERENCE

DFFERENCES BEWEEN RESIDUE PAR APHA- AND BETA CARBON DISTANCES

Ch KY AA -147 -0.01 . 0.48 2.10 .

FIG.9C U.S. Patent May 2, 2006 Sheet 16 of 26 US 7,037,894 B2

FV FRACMENT 1

Fv FRAGMENT 3 U.S. Patent May 2, 2006 Sheet 17 of 26 US 7,037,894 B2

RESIDUE PARS H H 2 H H1 4 H Lto6

FIG. 1 1A

RESIDUE PARS AVERACE Strid,DEV. H L1 -0.68 1.04 H 2 0.34 0.82 H1 L H 4 H Lios 0.74 0.69 1.78 0.50

FIG, 11B U.S. Patent May 2, 2006 Sheet 18 of 26 US 7,037,894 B2

AA 1 Glu- 58 Glu 24 Asp 5 Glu 2 Wol lie 2 Alo Glu 3 Gin 90 Thr 5 Glu 3 His 4 Leu 101 Vol 5 - - -

Amino Acid von der Wags Hydrophobicity volumes A

F AA F AA F AA F 91 3 48 2 67 1 99 124 2 67 109 1 124 1 109 5 18 2 124 2 F.G. 12C U.S. Patent May 2, 2006 Sheet 19 of 26 US 7,037,894 B2

89||'0||

SEWITTOASTWMXHQN\!\

U.S. Patent May 2, 2006 Sheet 22 of 26 US 7,037,894 B2

PCROligos for Candida antarctica Lipases

Ogos forpra?-CAB Prime. A:5'asaatcatcate catcatcacgcagoggectacote eggalogucces"(SEQID NO:3) PrimerB:5'etata gegoogetat caggagggeogataeggs"(SEQID NO; 4)

Oligos for PortAutations (aldolpa-CAA) Mi Ryf pinner MP5agga attacata catcatcatcaag casegg catacctic RgticegaceatgacAttag'(SEQID NO:5) M. WSY PinerM2F5 gaccguctACatec ected PinerM2R:SengagggggioTagteg tagsSEQDNO:7) (SEQID NO:6) M3 FITY PrimerM3F5egtriggestAotutocagtated PinerMSR:Spit ackaggsa stagicagucccs' (SEQID NO:9) (SEQID NO:8)

Olgas for pacAB PrimarC time? SegATagattite catc2a tit-3 5-stclagaaag giggee great-3(SEQID NO; 11) (SEQID NO:10)

Olgus for error-prone PCR Pine E. Piraer D; Sagtgateca Catcatc. 5-stage aggggggecirc-s(SEQID NO: 13) (SEQID NO:12) FIG. 15B

U.S. Patent May 2, 2006 Sheet 24 of 26 US 7,037,894 B2 Subtilisin Amino Actd Alignment 1 2 3 4 5 6 7 9 10 1 12 13 14 is 16 17 18 AIA GT 8E VAT PRO TR GT3 ERAGWA, GA AA PROAA AA SAS AA & SWA, PEO (E. S. GE BAA PROAA SDES 83 AAs CE8 WA SERTYB G VA, SER GY RISAA FROAA LSO EIS R 9 at 2, 22 23 24, 25 2S 27 29 29 30 3 33 34 35 36 A Girls G. S. GVA, VA. AAA, ABOAS R G 3 SR GE GE REGE SER is VA, IE8 VAC, AAYAT, SAS SERGEYER ASP G. G. R GY SSR AEN A IS WAAA WA, SASR 8ER GY: As 38 9 40 4 2 43 45 47 48 49 50 5.1 5 S3 5. TER IS PROASPICUASNILE ARG GIY GTY ATASER PIE VAL FRO GE GLO SR Sig S Rolls ags: WA, AR IS GAA SEWA. R. S. GAO BER SERS PEOASP J ASE WAT, ATA GE EY AYA SER REWA, PO Se GEO S5 SS 57 SS 59 60 62 63 6 65 66 67 68 69 70 Tl 72 -- - EOSRR GIW ASP G AS Y S St HIS WAAA Girl ERAS ROTER --- Git ASR GTY SER SEREIS GLY THE EIS WAY. ATA GE TER & ASN Red Psk --- Gís RSS ASEASE SER BS. Grit IS WAAT GER 5 76 77 79 79 80 8, 82 83 B. e5 86 8' 88 89 90 a AAAA LSUAS ASS Sai I G VA, EG GAY WA SAA GTO AAAA Ed AS AS SE 3G VA. O. GE WA R R SER A SE ss --- sor re-ser -r r- or- was YA, SOAA WA AA PRO SER ATA SKR 91 92 93 94 S5 96 97 99 99 00101102 10304 10506 107 108 IEUTY ATA VAt YS VAL LEU GLY ATASER GY SER CLY S3R YAL SER SER TIR IEC TYRAA VA, EYs WA, IEUAS SER FAR GY 88R GYGY TRSER SPIEB List RATA VAL, SWAT, E G AAASE Glysis (EY GER SERRY ILR Ogle 12 3 4 5 15 188 120 2 122 23 2d 125 25 ATA is G. S GO TRP AAG AS AS, GM E8A AAAS ERO SE, ASN S S GU R A E SER ASS ASHRAS I, EE ME SER EAN Cry 3 GO TRP AEA is AAASS ASS ASF WA, EASE SKR 27 28 129 30 13, 132 133 34 13S 136 137 138 139 40 1 2 3 4. gy (; S3 PRO SER PRO SERAAER GO GE GAAVAL ASS 8 AAER E. G. G. PROTER GY 8ERTER AA 4U SERVAL WAT, ASPIS AAWA LED G. G. RE SER at SERAA AAR IS AAAAA. ASE IS AAWAT 45 146 l 1489 50 15, 52 53 15455 15657 SB 159 160 S 62 SRRARGGE VAI, LSUVA?, VAL AIA ALASER GTY ASN SER GY --- A.AGE SER (SEQDN0:16) Sigg SER GY ris VAL, VA. AAAAAIA ALAGY ASW GE GEY SER SE, G SER (SEQDN0:17) ALASER GLY WAL VAL VAT, WA, ATA AIA ALA GYASS GL. GISTER SER CL SER (SEQID NO:18) F.G. 16B U.S. Patent May 2, 2006 Sheet 25 of 26 US 7,037,894 B2

Subtilisin Amino Acid Alignment (cond.)

163 164 1.65 66 i67 68 169 176 17 372 173 174 175 76 17778 17960 IL SER --- -r- -rr TYR PRO AA ARG frk AIA. ASN AA MET AIA WA, G. A.A. SR SER WA, G SR RRO ALA LYSTER PRO 928 E. LEAIAWA, YA SKR SER TER WA GT TYR PRO G. YSTR ERO SER WA, its AA YAL, SY ALA 182 83 84 e5 85 1878 89 90 19, 192 93 194, 195 S6 97 198 R ASP G ASN, ASN AS ARG AA S3R K S CN tag ev. A.A. G. A. ASF At ASN SER SER AS GAN ARG AA S3R RER SER SSR A. Y SER U R ASP WAT, ASP SER SER ASN GIN ARG ATA SSR 2 SER SER WA GLY PRO GU at ASR 199 200 20 22 203 204, 20S 206. 207 208. 209 210 21. 212 23. 2 25 26 EWA AA PRO G, WA, ASN WA GTN SER TRS PROGY SER REY AIA WA, MET AA PRO G, WA. SER is Gris SR U PRO G, G TRTR GY WA. M. A.A. R. G. W.A. SR 8 CS SER TER IS PRO (YASN IS TY, G 21, 28 29 220 21, 222 223 224, 225 226 22, 228 229 230 23 232 233 234 SRR at ASS GAY R SER ME AA R O S WA. AA Y AA AA AA is AAR AS C R SME ALA TER PRO ES WAI, AIA GIY ALA A. A. AA EU AA 13 S3R e ER SER ME AA S3R BROS WA AA GT AA AA AA TRU 235 236 as 238 239 240 24, 242 243 244 24s 246 247 248 249 250 25 2S2 WA, YS GIN SAN FRO SER TRP SER ASN WAT, GTN & ARG AS IS ISU YS 8 EU SE IS EIS PRO TR TRP R ASN AIA GN WA, ARG ASP AG R GLU IR RU SER IS HIS PRO ASN TRP R ASN THR GIN WA, ARG SER SER LEU GU 2.53 2.54. 255 2.56 257 258 259 260 26, 262 263 264. 255 266 26, 268 269 270 ASS THR ALA TER SERIEO GLY SER THRASELEUTR GY SER GY LEU VA, ASY See IHRAIA THRTYR LEO GEY ASN SER RERTYRTYR GTY INS GIY LSU TIRAS ASN RTR TER IS ISO (STY ASN SER HE TYR TYR GY IS GL. SO & AST 27 272 273 274 25 276 AA, GL AIA. AEA HR ARG SES ID NO: 19, VAL GIN FLA ALA ALA (STN (SEQID NO: 17 WA, GN AIA AA AA GIN (SEQIO NO: 18)

FG 16C U.S. Patent May 2, 2006 Sheet 26 of 26 US 7,037,894 B2

PCROgos for Suhsilsla E

Aprinter- Buinner 5taggigcatalgigaag' 5tage testaatgagg (SEQID NO:19) gaggfig ty aggetta gigats (SEQID NO:20)

1. KTY - G6Y F3ggctetacga(SEQID Taga NO:21) gotgitatic- Scast Gaggao TAegetacg3Asage (SEQDD NO:33NO:33) RisgaaaaagttacAtAtacgagee-(SEQID NO:22) R5 cogging act A glocggage (SEQSEOD NO: 34)

S. S98 Saltcett'TCatctgart 5-8 egctgat TAcaggage SEC ID NO:35 (SEQD NO:23) ages ) Sacca agtegggg. Agaaag attac' R5gregattico gaTAafe agtact3 (SEQID NO:36) (SEQID NO:24) 3- DS - 7 FS.gic agga attract citato at-3' F5.svgwggest Tactitage ta' (SEQID NO:37) (SEQID NO:25) R5-gatgagaagagh attegligo-3' R5-age tigagaga agagicagoga' (SEQID NO:38) (SEQID NO:26) - 20 2- 86 SeanageReactCAtggattic-3' 5-ciggpgtagotAtagsgrateata:3(SEQID NO:39) (SEQID NO:27) R5-taggettical Aargggg3 R3-taatgaggetATAgtaege cag(SEQID NO: 40) (SEQID NO:28) 4.- KTOY POY 95 getacctest at cottstaeta-3 F5-gatggagestTAtEgg!g tec alb3 (SEQID NO:41) (SEQID NO: 29) RS-agtags regata AlAtgagggagee-3 R5-gategacao becaTArecataoac-8(SEQID NO: 42) (SEQID NO:30) 4 E195Y F5-age agg, tcTaTctigates als (SEC ID NO:31) R5'-cat cacateaag AtAagace tec go-3' (SEQID NO:32) FIG. 16D US 7,037,894 B2 1. 2 STABILIZED PROTEINS tions; however, sequence similarities reveal their presence. Crystallographic studies have shown that related domains This application is a continuation-in-part of PCT/US00/ are even more conserved in secondary, tertiary and quater 28595 filed Oct. 16, 2000, which claims priority of U.S. nary structure than in primary amino acid sequence, Such Provisional Application No. 60/159,763 filed Oct. 15, 1999, that structural inferences can be made about a particular each of which is incorporated-by-reference herein in its domain if structural data is available on one or preferably entirety. multiple related domains (see e.g., Hofmann K., Cell Mol. Life Sci. vol. 55(8–9): pp. 1113–28, 1999; Chou J. J. et al., 1. FIELD OF THE INVENTION Cell vol. 94(2): pp. 171-80, 1998). The present invention relates to cross-linking methods to 10 2.2. Biocatalytic Enzymes stabilize polypeptides and polypeptide complexes for com There are numerous conceivable commercial applications mercial uses (pharmaceutical, therapeutic, and industrial), of stabilized proteins, protein complexes and protein-protein and to polypeptides and polypeptide complexes so cross interactions. As an example of a class of proteins for which linked. stabilization is desirable, enzymes and other proteins that 15 have been used as biocatalysts in industrial applications are 2. BACKGROUND OF THE INVENTION considered in this section. Valuation of the biocatalytic enzyme market is also considered. 2.1. Structure and Function of Polypeptides and Industrial biocatalytic processes have use in many indus Polypeptide Complexes trial Sectors, including the chemical, detergent, A protein molecule consists of a linear polypeptide chain pharmaceutical, agricultural, food, cosmetics, textile, of amino acids that is intricately folded in three dimensions materials-processing, and paper industries. Within these to form, e.g., interaction Surfaces, binding pockets and industries, biocatalysts have many applications, ranging active sites. A specific three-dimensional fold is generally from product synthesis (e.g., amino acid manufacturing), use required for protein function, wherein the fold itself is as active agents in certain products (e.g., biological washing specified by the linear sequence of amino acids (i.e., the 25 powders), use in diagnostic testing equipment, and use as primary structure of the protein). It is notable, however, that therapeutic agents. Total sales of industrial biocatalysts in dissimilar primary structures can have nearly identical three 1999 were roughly S1.4 billion. This figure is expected to dimensional folds. Evolution has conserved specific folds to grow significantly over the next decade as biocatalyst appli a greater extent than specific primary structures. The protein cations are enabled by novel technologies Such as the folding process remains an active field of study. It is known, 30 invention described herein. however, that secondary structure elements such as alpha Market sectors believed to have potential for growth and helices, beta sheets and beta turns contribute to assembly of technological innovation include engineered enzymes (e.g., the tertiary structure of a polypeptide. A biological protein for providing faster throughput, cheaper production, and/or entity made up of several polypeptides is said to have the capability to produce novel products), pollution-control quaternary structure. 35 systems (e.g., for bioremediation), and non-aqueous biocata Protein folding ultimately results from the interaction of lytic systems (e.g., for oil and fat bioprocessing and drug intra- and inter-molecular forces. As such, a folded protein manufacture) (see Business Intelligence Center, Explorer: has a finite stability that translates into a finite structural and “BIC Explorer”; Business Opportunities in Technology functional “half-life' in a given solvent environment. For Commercialization). example, in an aqueous environment, proteins attain stability 40 Historically, only a handful of fine chemical companies in part by clustering hydrophobic residues in the protein core such as DSM, Lonza and Avecia Ltd., have embraced and and hydrophilic residues at the protein-solvent interface. invested in biocatalytic processes. More recently, however, Accordingly, the activity half-life for a given protein is in there have been several significant corporate investments in part a function of Solvent properties. Additionally, chemical 45 the field of biocatalysis. One example of such an investment bonds such as disulfides occur in nature to fix the is Bayer's recent announcement that it will use 6-7% of fine co-ordination of non-neighboring side chains in close proX chemical sales to develop enzyme-based processes for cer imity in a folded protein, thereby stabilizing its structure and tain molecules. function. Major customers of fine chemical companies tend to favor In many biological systems, proteins associate with each 50 Suppliers with a broad range of process development. This other to form dimers or higher order multimers (i.e. quater consideration suggests that those with biocatalytic expertise nary structures), and only as such carry out their specific stand to gain a further competitive edge in the marketplace. functions. The formation of Such complexes is often an Some firms have recognized this and are trying quickly to important event in regulating the activity of proteins. Vari close the gap via acquisitions (e.g. Great Lakes's acquisition ous mechanisms have been found to regulate protein com 55 of NSC Technologies and Cambrex’s purchase of Celgene). plex formation, such as ligand binding, or post-translational Others acknowledge that they will lose out on further modification. The functions of protein complexes can range business opportunities if they don't do something to access from providing structure to the intra-cellular matrix, where, the basic skills required for biocatalysis (Joe Blanchard, for instance, actin forms a structural lattice, to transcription Altus Biologics Inc., 1999). factors. 60 Major enzyme manufacturers (e.g. Novo, Genencor, Proteins consist of discrete functional domains. Domains Roche, etc.) tend to focus on large-scale enzyme production of similar or analogous function in different proteins usually for the major industrial markets (such as detergents and show amino acid sequence similarities and are related in textiles) and not on the application of enzymes for fine evolution. “Domain shuffling has played a major role in the chemical development (Joe Blanchard, Altus Biologics Inc., evolution (as well as in the gene engineering) of proteins 65 1999). with highly diverse functionalities. Interaction domains, for The continued growth in interest in the commercial use of example, can be found in proteins of many different func biocatalysis and the fragmentation of the biocatalyst indus US 7,037,894 B2 3 4 try will allow both large and Small companies to exploit Algorithms that calculate intra-molecular forces within innovative biocatalysts and the products and processes that proteins are being used to design and/or evolve enzymes utilize them (BIC Explorer: Business Opportunities in Tech with greater thermostability in silico. This approach is still nology Commercialization, 1999). severely hampered by the limited understanding of the Bioremediation applications may, in the future, turn into 5 intra-molecular forces and the processes involved in protein one of the most economically important applications of folding. biocatalytic enzymes. For example, approximately 2.3 tril Addition of chemical modifications that can hold proteins lion gallons of municipal effluent and 4.9 billion gallons of in their correct conformation is often referred to as protein industrial waste are passed into U.S. waters each year, and engineering. Such protein engineering approaches include approximately 1 million gallons of hydrocarbons enter our 10 derivatization (e.g. PEGylation, addition of polymeric environment per day. Hydrocarbon cleansing is a routine Sucrose and/or dextran, methoxypolyethylene glycol, etc.) requirement for various commercial operations (e.g., oil and old methods of protein cross-linking (e.g. production of tankers, marine bilges, storage, fuel and truck tanks). cross-linked enzyme crystals or CLECs). Unfortunately, Currently, there are several processes in development that these approaches are often ineffectual or cause dramatic utilize biocatalysts for decontamination/decomposition of 15 losses in activity. both hydrocarbons and wastewater. Not only are these Strategies for the operational stabilization of biocatalysts processes commercially the most promising systems due to that have proven Successful in Some respects include (a) efficiency and low costs, but they are also the cleanest. catalyst immobilization and (b) the use of organic solvents Furthermore, biocatalytic desulfurization is an inexpen in the reaction medium (termed medium engineering). Ther sive and attractive technology to the crude oil production mal stability upon immobilization is the result of molecular market, where low-sulfur crude oil commands a premium rigidity and the creation of a protected microenvironment. price over high-sulfur crude oil. There is a growing need for Methods include multi-point covalent attachment and gel cost-effective Sulfur management and desulfurization world entrapment. Immobilization of biocatalysts is the most used wide due to an increased level of sulfur in fossil fuels and strategy as additional benefits are obtained, such as flexibil increasingly stringent regulations requiring lower Sulfur 25 ity of reactor design, and facilitated product recovery with emissions. Compliance with these regulations is expected to out catalyst contamination. However, despite its great tech cost the European refining industry alone more than $50 nological potential, few large-scale processes utilize billion in capital and S10 billion annually in operating immobilized enzymes. Severe restrictions often arise in scale-up because of additional costs, activity losses, and expenditures. 30 All catalyst manufacturing in 1997 represented a S10 issues regarding diffusion. billion-plus market in the U.S., a figure quoted by the The main purpose of medium engineering in biocatalysis American Chemical Society (see also, “Catalyst Industry was originally to utilize robust commercial hydrolytic Stresses Need for Partners as Key to Future Success.” C&E enzymes in organic synthesis. However, enhanced thermo News, Jul. 11, 1994: CatCon 96 presentations by T. Lud 35 stability in organic media has proven an additional and ermann of CONDEA Chemie GmbH, Paul Lamb of Engle significant bonus. It is hypothesized that partial or almost hard Corporation, and J. Ohmer and K. Herbert of Degussa total substitution of water is beneficial since water is Corporation). According to Maxigen, the total industrial involved in enzyme inactivation. Whatever the mechanism, enzymes market (a segment of the catalyst manufacturing numerous cases have recently been reported where remark market) is estimated at S1.4 billion today, growing at 40 able enzyme stability has been obtained in organic media roughly 10% annually. Such as polyglycols and glymes. Despite this advance, medium engineering is unlikely to solve all biocatalysis 2.3. Stabilization Strategies stability problems. Some of the most promising solutions to biocatalysis Several protein stabilization strategies are known in the problems have combined evolutionary approaches with art and have been previously described, as highlighted 45 operational stabilization techniques, such as using directed below. evolution to generate enzymes with higher reaction rates in 2.3.1. Stabilization of Biocatalytic Enzymes organic solvents. Such combined approaches may provide Several approaches have been taken to enhance the sta significant synergies which maximally improve upon and bility of biocatalysts. On the protein level, the most promi 50 enable commercially-relevant biocatalytic processes. In nent approaches include discovery of stable biocatalysts principle, the invention described herein below can be from investigation of thermophilic organisms, directed applied in combination with any of the above-mentioned evolution, and computational- and protein engineering, as known stabilization approaches. described below. 2.3.2. Stabilization of Other Proteins Thermophilic organisms, or 'extremophiles, are sought 55 Molecular biological techniques have made it possible to in extreme environments such as deep-sea vents and Yel stabilize Some proteins by, e.g., engineering fusion-proteins. lowstone geysers. Although enzymes of commercial rel Some fusion proteins have even displayed novel function evance have been identified from them, this discovery alities. To make a fusion-protein, a single nucleic acid approach is limited by what can be found in nature. This construct is created that directs the expression of modular approach has not yielded as many commercially-relevant, 60 domains derived from at least two proteins as one protein. thermostable biocatalysts as was initially hoped for and/or Due to fusion, two domains can be held in very close projected. proximity to each other, thereby making the local concen Directed evolution techniques are powerful approaches tration of each domain very high with respect to the other. capable of generating stabilized enzymes, often also with In this way, a functional complex is stabilized. For example, altered/improved functional specificities. However, the 65 homo- and heterodimers of the interleukin 8 family have approach is limited by the feasibility of the selection pro been stabilized in this way, maintaining functionality similar cedure. to wild type (Leong S. R. et al. Protein Sci., vol. 6(3): pp: US 7,037,894 B2 5 6 609–17, 1997) Another example of protein complexes sta 2.3.3. Fv Fragments bilized in this way is the method stabilizing immunoglobulin Immunoglobulin Fv fragments comprise another example FV fragments, consisting of the variable domains of immu of a class of proteins for which stabilization is desirable. noglobulin heavy and light chains, lacking the stabilizing Immunoglobulin Fv fragments are the Smallest fragments of effect of inter-chain disulfide bonds. It is necessary to immunoglobulin complexes shown to bind antigen. Fv frag stabilize the complex by another means to maintain the affinity of the immunoglobulin complex, and expression of ments consist of the variable regions of immunoglobulin both polypeptides as a single chain is one of the methods heavy and light chains and have broad applicability in used (Pluckthun and P. Pack. Immunotechnology; vol. 3(2): pharmaceutical and industrial settings. pp. 83-105, 1997). Value of Fv Fragment Market However, in the design of pharmacological reagents, it is 10 A recent analysis estimated that 20 to 40 percent of all often disadvantageous to create fusion proteins that require bio-technological therapeutics and diagnostics currently in a linker sequence to stabilize them. For example, Such development are based on immunoglobulin (Pharmaceutical linkers introduce non-self epitopes which are often recog Research and Manufacturers of America. New Medicines in nizes by the organism as foreign and elicit immune Development, Survey. 1998). Furthermore, a significant responses. This reduces the efficacy of Such therapeutics 15 portion, and the majority of current “state of the art and/or diagnostics because the reagents are then cleared by Ig-based therapeutics and diagnostics in development are FV the immune system (see, for example, Raag R. and Whitlow fragment-based (Price Waterhouse: Survey of Biopharma M. FASEB; vol. 9: pp. 73–80, 1995). In the case of single ceutical Industry, 1998). For reviews of the utility of immu chain Fv fragments, the linker, which is most frequently noglobulin as a pharmacological agent, see Penichet M. L. chosen to be a highly flexible structure, allows the complex to disassociate, since the affinity of the two polypeptides to et al., Hum Antibodies; vol. 8(3): pp. 106–18, 1997: Sensel each other is low. The single chain Fv fragments then M. G. et al. Chem. Immunol.; vol. 65: pp. 129–58, 1997: aggregate, or clump, and thereby loose their functionality Reiter Y. and Pastan I.TIBTECH; vol. 16(12): pp. 513–520, 1998; Reiter Y. et al. Nat Biotech.: vol. 14: pp. 1239–1245, (Webber K. O. et al. Mol. Immunol.; Vol. 32(4): pp. 1996: Pluckthun and P. Pack. Immunotechnology; vol. 3(2): 249–258, 1995). More rigid linkers that lend the complex 25 more stability, and would thereby decrease the level or speed pp. 83–105, 1997: Wright A. and Morrison S. L. Trends of aggregation and loss of functionality, are associated with Biotechnol.; vol. 15(1): pp. 26–32, 1997: Schwartz M. A. et increased immunogenicity (Raag R. and Whitlow M. al. Cancer Chemother. Biol. Response Modif.; Vol. 13:pp. FASEB; vol. 9: pp. 73–80, 1995). 156–74, 1992; Houghton A. N. and Scheinberg D. A. Semin Cross-linking the domains at close contact sites would 30 Oncol.; vol. 13(2): pp. 165–79, 1986; and Cao Y. and Suresh circumvent these problems, where it is possible to direct the M. R. Bioconjugate Chemistry; vol. 9(6): pp. 635-644, cross-link between two proteins to such surfaces of the 1998. proteins where after the reaction the cross-link is buried. Following the successful introduction of the first Ig-based One Such means is to stabilize complexes by introducing a biotech drug, ReoPro by Centocor, in 1994, six more disulfide bond between two polypeptides by introducing 35 Ig-based drugs were approved in 1997 and 1998 and six point mutations to cystine in both polypeptide chains. The more were in phase III clinical trials as of the end of 1998. mutations are introduced at positions that allow the forma Sales of a single, clinically successful, immunoglobulin tion of such bonds (see, for example, Reiter Y. et al. Nat based product can result in annual revenues on the order of Biotech.: vol. 14: pp. 1239–1245, 1996: Pastan et al. U.S. several hundreds of millions of dollars (Pharmaceutical Research and Manufacturers of America. New Medicines in Pat. No. 5,747,654, issued May 5, 1998). 40 Disulfide bonds are, however, unstable under many physi Development, Survey, 1998). Together, these facts give ological conditions (Klinman J. P. (ed). Methods in Enzy evidence of the commercial and clinical value of these types mology; vol. 258, 1995). Physiological conditions vary of products. widely, for instance with respect to redox potential The cost of developing, producing and clinically testing (oxidizing vs. reducing) and acidity (high VS. low pH) of the 45 Such products is, however, immense and the risk of failure various, physiological milieuS (intracellular, extracellular, is often great. Because of this, any technology that can either pinocytosis vesicles, gastro-intestinal lumen, etc.). increase the product’s effectiveness, broaden its range of Di-sulfide bonds are found in nature only in extracellular applications or increase its chances of Succeeding in clinical proteins, and they are known to fall apart in reducing trials will add enormously to the Net Present Value of a environments, such as the intracellular milieu. But even in 50 product in development (Boston Consulting Group: The the extracellular milieu, many engineered di-sulfide bonds Contribution of Pharmaceutical Companies: What's at stake are unstable. for America, 1993). Several other chemical cross-link methodologies allow Fv Fragment Stabilization Methods the formation of bonds that are stable under a broad range of To date, a variety of methodologies have been employed physiological and non-physiological pH and redox condi 55 to stabilize engineered antibodies. First, introduction of tions. However, in order to maintain the complex’s activity additional di-sulfide bonds has been performed through and specificity, it is necessary that the cross-link is specifi molecular biological manipulation of the antibody cally directed and controlled such that, first, the overall expressing construct (Reiter Y. and Pastan I.TIBTECH; Vol. structure of the protein is minimally disrupted, and second, 16(12): pp. 513–520, 1998). Second, introduction of a linker that the cross-link is buried in the protein complex so as not 60 has been employed that allows both fragments to be to be immunogenic. But with most cross-link expressed as a single chain (single chain Fv fragments) methodologies, the degree to which it is possible to direct (Pluckthun and P. Pack. Immunotechnology; vol. 3(2): pp. the bond to a specific site is too limited to allow them to be 83–105, 1997: Cao Y. and Suresh M. R. Bioconjugate used for most bio-pharmaceutical and/or diagnostic appli Chemistry; vol. 9(6): pp. 635-644, 1998). Finally, fusion of cations. Examples of Such cross-link methodologies include 65 an exogenous di- or oligomerization domain to each of the UV-cross-linking, and treatment of protein with formamide Fv fragment chains has been performed (Pluckthun and P. or glutaraldehyde. Pack. Immunotechnology; vol. 3(2): pp. 83-105, 1997: Cao US 7,037,894 B2 7 8 Y. and Suresh M. R. Bioconjugate Chemistry; vol. 9(6): pp. side-chain, followed by covalent coupling of the resultant 635-644, 1998; see also Antibody Engineering Page, IMT. tyrosyl radical with another tyrosyl side-chain that is in University of Marburg, FRG. Sufficient proximity. However, all of these technologies have significant draw This cross-link methodology was originally developed to backs. Disulfide bonds are a suitable bond in the context of 5 cross-link proteins that interact in cell lysates, as a proxy to Fab fragments (see FIG. 1D), and many other extra-cellular the in vivo situation, to enable the study of the functionality proteins, to stabilize protein complexes. Furthermore the of proteins by identifying other proteins they interact with. introduction of disulfide bonds avoids the need to introduce The reaction only occurs with tyrosine side-chains that are foreign peptides, and the resultant stabilized complexes are in very close proximity to each other. Furthermore, the bond minimally immunogenic. Nonetheless, the introduction of 10 formed between the tyrosyl side-chains is irreversible and disulfide bonds in Fv fragments by molecular biological stable under a very wide range of physiological conditions. means results in complexes that are insufficiently stable None of the above-cited references disclose or suggest under many commercially relevant, physiological methods using di-tyrosyl cross-linking for formation of conditions, such as the intracellular milieu and sometimes buried chemical cross-links for stabilizing a protein complex even serum. As such they have limited usefulness in the 15 while maintaining the complex’s activities and specificities. pharmaceutical context. Accordingly, a need exists for Such methods wherein the With single chain Fv fragments there is a trade-off product is functional under a wide range of physiological between the stability of the complex and its immunogenicity and non-physiological conditions, and wherein the structure, in a therapeutic or in vivo diagnostic context. Linkers that function, and specificity of the cross-linked protein complex result in stable conjugates that are more rigid structures, and is maintained. elicit immune responses, which in turn results in decreased Citation or identification of any reference in Section 2 or utility. Linkers that are not immunogenic are generally the any other section of this application shall not be construed more flexible linkers that provide insufficient stability (see as an admission that such reference is available as prior art above, Raag R. and Whitlow M. FASEB; vol. 9: pp. 73–80, to the present invention. 1995). 25 Fv fragments stabilized by fusion to multimerization 3. SUMMARY OF THE INVENTION domains are significantly immunogenic, and lack the most This invention provides a method for stabilization of a significant advantage of Fv fragments in the first place: polypeptide or polypeptide complex, by the introduction of reduced size and resultant increased tissue penetration. 30 intra-polypeptide and/or inter-polypeptide di-tyrosine Other currently available chemical cross-link methods, bonds, which simultaneously maintains the structure and such as UV cross-linking (see above), are severely limited in function of the polypeptide or polypeptide complex. Further, the degree to which it is possible to direct the bond to a this invention provides various methods for optimizing specific site. As bio-pharmaceutical and/or diagnostic appli protein stabilization. Such methods include Statistical analy cations require the maintenance of the polypeptide's 35 ses of the primary amino acid sequences of related proteins function, specificity in the cross-link reaction is paramount. (two-dimensional data analysis) and statistical analyses of the three-dimensional coordinates of proteins believed to be 2.4. The Tyrosyl-Tyrosyl Oxidative Cross-Link related in three-dimensional structure (three-dimensional Oxidative cross-link reactions between tyrosyl side data analysis). chains have been demonstrated to occur naturally. For 40 Further, this invention provides stabilized polypeptides example, cytochrome c peroxidase compound I has been and polypeptide complexes. To achieve Stabilization, the demonstrated to form di-tyrosine bonds during the endog cross-link reaction is carefully controlled Such that polypep enous reduction of its active site (Spangler B. D. and Erman tides and polypeptide complexes maintain their original J. E. Biochim. Biophys. Acta; Vol. 872(1-2): pp. 155–7, functionality. In one embodiment, the invention provides a 1986), and di-tyrosine-linked dimers of gammaB-crystallin 45 method for the identification of amino acid residues which, are reportedly associated with cataractogenesis of the eye when cross-linked, are least disruptive to the structure and lens. In vitro, di-tyrosine protein-protein links are readily function of the polypeptide or polypeptide complex. In formed photodynamically in the presence of sensitizers another embodiment, the invention provides a method for (Kanwar R. and Balasubramanian D. Exp. Eye Res., vol. mutagenesis of identified residues to further control the 68(6): pp. 773–84, 1999). Furthermore, protein cross 50 cross-link reaction. Polypeptides and polypeptide com linking through the formation of di-tyrosine bonds can be plexes so stabilized can be utilized under a wide variety of catalysed, for example, by peroxidase (Gmeiner B. and physiological and non-physiological conditions. Further, the Seelos C. FEBS Lett; vol. 255(2): pp. 395–7, 1989), or by cross-link methodology disclosed herein may preclude the metallo-ion complexes (Campbell et al. Bioorganic and need for addition of exogenous structures to engineered Medicinal Chemistry, vol. 6: pp. 1301–1037, 1998: Brown 55 proteins and complexes, such as peptide linkers. In another K. C. et al. Biochem.; vol.34(14): pp.4733–4739, 1995), and embodiment, the invention provides a method for statistical by light-triggered oxidants (Fancy D. A. and Kodadek T. analysis of databases of structural and/or sequence informa Proc. Natl. Acad. Sci., U.S.A.; vol. 96: pp. 6020-24, 1999). tion available for polypeptides and polypeptide complexes As described by Campbell et al., in the presence of an to be stabilized. The statistical analysis identifies suitable appropriate catalyst and an appropriate oxidizing reagent, an 60 residue pairs which are least likely to be disruptive of oxidative cross-link reaction can occur between tyrosyl structure and function when cross-linked. Further, in a side-chains of proteins that are properly spaced. In this polypeptide chain or chains to be cross-linked, potentially reaction, the hydroxyl groups of the tyrosyl side-chains react undesirable reactive side-chains may be altered using site with each other, an HO molecule is released, and the directed mutagenesis, e.g., to introduce a maximally con side-chains are linked by a covalent bond. This reaction is 65 servative point mutation that will not support the cross-link thought to proceed through a high-valent metallo-oxo com reaction. The cross-link reaction conditions may also be plex which abstracts an electron from an accessible tyrosyl adjusted to prevent undesired cross-links. At residues iden US 7,037,894 B2 10 tified as desirable positions for cross-linking, reactive side described by the alpha carbon-alpha carbon axis and (1) the chains may be introduced by site-directed mutagenesis, and alpha carbon-beta carbon bond of the first chain (A1-B1), the cross-link reaction is carried out using the conditions and (2) the alpha carbon-beta carbon bond of the second identified above. chain (A2-B2). FIG. 3. The angle (), indicated in FIG. 2B, is +120°. For 4. BRIEF DESCRIPTION OF THE FIGURES this configuration, the alpha carbon distances, angles up and The present invention may be understood more fully by (p, and the alpha-beta distance differences (see text) are reference to the following detailed description, illustrative represented geometrically for maximal and minimal con examples of specific embodiments and the appended figures. figurations (that fall into one plane), given this angle (). The FIG. 1 The dityrosyl cross-link and example proteins 10 angle b is 109.50, the tetrahedral angle of carbon atoms, which can be stabilized according to methods of the inven and complete rotational freedom of the alpha carbon around tion. A. Schematic representation of a dityrosyl cross-link. the around the beta carbon-gamma carbon axis is assumed. Addition of a cross-linking catalyst and an oxidizing reagent In A, the length c is the distance between the two carbon atoms of a carbon-carbon bond; the length v is cos(180° to a protein or protein complex preparation wherein at least O)/2)x c, the length his sin((180°-O.)/2)x c, length a is half two tyrosine residues occur in close proximity and in proper 15 of the square root of the sum of 7v squared and h squared, orientation results in a dityrosyl cross-link and one water and the length b is the square root of the Sum of the square molecule. B. Schematic representation of the canonical fold of (a+V) and h squared. In B, V is the cos(180°-(B-(180° of a/b hydrolases, a group of enzymes which includes O)/2+arctan(h/7v))x c, h is the sin(180°-(B-(180°-C.)/2+ lipases. The topological positions of the active site residues arctan(h/7V))X c, and, analogously, length a is half of the are indicated as solid circles. From K.-E. Jaeger et al., 1999, square root of the Sum of 7v squared and h squared, and the Ann. Rev. Microbiol. 53, 315–351. C. Schematic represen length b is the square root of the Sum of the square of (a+V) tation of secondary structure of Candida antarcticalipase B. and h squared. In the configuration depicted in A, at which The topological positions of the active site residues are the alpha carbon distance is maximal, the angles up and pare indicated as residues S105, D187, and H224. From J. (180°-C.)/2-arctan(h/7v); in the configuration in B, at which Uppenberg et al., 1994, Structure 2, 293–308. D. Schematic 25 the alpha carbon distance is minimal for an angle w of representation of an immunoglobulin molecule (IgG). The +120°, up and cp are B-(180°-C.)/2-arctan(h/7v). immunoglobulin hetero-tetramer comprises two identical FIG. 4. The angle (), indicated in FIG. 2B, is -120°. In light chains, and two identical heavy chains. The complex is FIG. 4, the alpha carbon distances, angles p and p, and the stabilized by inter-chain disulfide bonds; the disulfide bonds alpha-beta distance differences (see text) are represented are indicated by the “S-S' links in the schematic represen 30 geometrically for maximal and minimal configurations (that tation. Both antigen-binding domains, one at either end of fall into one plane), given this angle (). The angle B is kept the "fork, consist of a pair of heavy and light chain variable constant at 109.5°, the tetrahedral angle of carbon atoms, regions, and are referred to as the “Fv fragments'. The and complete rotational freedom of the alpha carbon around antigen-binding domain is the Fv fragment, consisting of the the around the beta carbon-gamma carbon axis is assumed. variable region of both the heavy and light chain consist of 35 In A, the length X is 4V, the lengthy is the square root of the four relatively conserved Framework Regions that provide Sum of h squared and 3v squared, the length Z is the the overall structure, and of three Complementarity Deter cos(180°-120°--arctan(h/3V))x y, the length a is half of the mining Regions that lend the Fv fragment its specificity for square root of the Sum of (X+Z) Squared and y Squared, the a specific antigen. The Fab fragment, which comprises both length v is the cos(120°-3)x c, and the length b is the sum the light and heavy chain variable regions (V1 & Vh), 40 of the lengths a and V. In B, the length v is the cos((B-2x constant region of light chain (Cl), and the first constant (180°-C.)/2)x c, and the length b is the difference of the region of the heavy chain (Ch1), is stabilized by an inter lengths a and V. In the configuration depicted in A, at which chain disulfide bond. In the Fv fragment none of the immu the alpha carbon distance is maximal for an angle () of noglobulin inter-chain disulfide bonds are present, as +120°, and pare C-B, in the configuration in B, at which indicated, resulting in the requirement for this protein com 45 the alpha carbon distance is minimal, up and (p are 180° plex to be stabilized artificially. (B-2x((180°-O.)/2). FIG. 2. A. Schematic representation of a tyrosyl side FIG. 5. Structural Coordinate Data, the primary (or chain, consisting of an alpha carbon (A) which is still part input-) data of a 3-D database. First two amino acid residues of the polypeptide back-bone, a beta carbon (B), the first of a representative Fv Fragment heavy (H) and light (L) atom in the side-chain not part of the back-bone, an aromatic 50 chain, in Angstroms; the data of each atom is represented in ring, which, in turn, consists of six carbon atoms, and a rows, the atoms are listed in columns. Coordinate data is hydroxyl group (OH). The angle B in the beta carbon represented for all residue atoms other than Hydrogen between the beta carbon-hydoxyl oxygen axis and the alpha atoms, including those involved in the polypeptide backbone carbon-beta carbon bond is indicated. B. Schematic repre and those in the amino acids side-chain. In the left-hand sentation of a tyrosyl-tyrosyl bond indicating in addition the 55 column, under the heading “Chain', the identity of the angle B, the angle (), which is the angle between the polypeptide chain is listed, with which an atoms coordi dityrosyl bond and the carbon-carbon bond in the aromatic nates are associated. An Fv fragment consists of two ring of the cross-linked tyrosyl side chain that is proximal to polypeptides: a heavy chain (H; below) and a light chain (L: the beta-carbon of the same side chain, projected into the above). The number under the heading “K&W indicates the two plane of the two aromatic rings. Also indicated are the 60 position of the atoms residue within the Kabat & Wu angle C, the angle between all carbon residues in the plane (K&W) alignment system. Under the heading “Atom', the of the aromatic rings (120), and the degrees of rotational identity of an atom of the specific amino acid present in the freedom (1) in the dityrosine bond itself, and (2), of the representative polypeptide at that particular residue are alpha carbon around the beta carbon-gamma carbon (most indicated (identified under the heading “Amino Acid in proximal carbon atom in the aromatic ring) axis. C. Three 65 three letter code). The x, y, and Z three-dimensional coor dimensional angles formed by the alpha carbon-alpha car dinates of each atom are represented in the right-hand bon axis, the beta carbons (up and (p), and the two planes (X) columns, as indicated. US 7,037,894 B2 11 12 FIG. 6. Schematic representation of 3 actual Fv fragment FIG. 12. Quantification of amino acid side-chain physical entries into a 3-D database. Arrays of alpha-carbon coordi properties, as an example of relevant derivative data, at (the nate data of heavy and light chain residues of the Fv first four, representative) residues of the Fv fragment heavy fragments, and, as an example of relevant derivative data, chain, based on FV fragment polypeptide sequence data, calculated inter-chain, inter-atomic distances. Heavy chain compiled in a 2-D database. A. Amino Acid Sequence Data. alpha-carbon data is represented in rows, as described in the Representation of primary data compiled in a 2-D database. description of FIG. 5, and light chain alpha-carbon data is Amino acids (AA) occurring at each residue are sorted by transposed, and the light chain data described in FIG. 5 is the frequency (F) of their occurrence at that specific residue. B. Amino Acid Side-chain Quantification Tables. Represen represented in columns. Derivative data describing the inter tation of numeric values used in a 2-D database to obtain chain, 3-D relationships of the atoms on both chains is 10 relevant derivative data by quantifying the physical proper represented at the intersection of each heavy chain row and ties of amino acids: e.g. van der Waals volume A light chain column. (Richards, F. M.) and numeric hydrophobicity values FIG. 7. Statistical measurements in a 3-D database of (Eisenberg, D.). C. Quantification of the physical properties, alpha carbon distances between of Fv fragment heavy and exemplified here by van der Waals volumes, of the amino light chain residue pairs, as an example of relevant deriva 15 acid side-chains present at each residue in the sample of FV tive data. A. Illustrative statistical measurements of the alpha fragment sequences in the 2-D database. carbon distances between residue pairs of the three repre FIG. 13. Statistical measurements in a 2-D database of sentative Fv Fragment heavy and light chains in the descrip side-chain physical properties at each residue of Fv fragment tion of FIG. 6 (i.e. data shown for n=3). B. Actual statistical heavy chains present in the 2-D database (sample), as an measurements of the alpha carbon distances between the example of relevant derivative data, quantified as described residue pairs of all Fv fragment heavy and light chains in the in the description of FIG. 12. In the third column from the sample of Fv fragments used for the selection (data shown left, under the heading “Cons', the consensus, or most for n=17). frequently occurring amino acid for each represented residue FIG. 8. Schematic representation of a Fv fragment entry is listed. As representative statistical measures, average and 25 standard deviations are shown, both weighted and (Fv Fragment 1 of FIG. 6) into a 3-D database. Arrays of un-weighted by the frequency of each amino acids occur beta-carbon coordinate data of heavy and light chain resi rence in the sample at each residue represented in this figure. dues of the Fv fragment, and, as an example of relevant A. Average and standard deviations are shown for residue derivative data, calculated inter-chain, inter-atomic dis van der Waals volumes, both weighted and un-weighted by tances. Heavy chain beta-carbon data is represented in rows, 30 the frequency of each amino acids occurrence in the sample and light chain beta-carbon data is transposed and repre at each residue represented in this figure. B. Average and sented in columns, as described in the description of FIG. 5. standard deviations are shown for residue Hydrophobicity Derivative data describing the inter-chain, 3-D relationships quantities, both weighted and un-weighted by the frequency of the atoms on both chains is represented at the intersection of each heavy chain row and light chain column. of each amino acids occurrence in the sample at each 35 residue represented in this figure. FIG. 9. Schematic Representation of the approach taken FIG. 14. Schematic illustration of a successive array and to calculate the differences between the inter-chain, inter a parallel array of filters designed for automation using a atomic residue pair alpha-carbon and beta-carbon distances computer system and Software for the residue pair selection (alpha-beta distance differences) for an individual Fv frag process. The filters shown are an illustrative set of filters ment in the 3-D database (Fv Fragment 1 of FIG. 6 and 8). 40 taken from the filters described above (see Identification of Heavy chain alpha-(top) and beta-carbon (middle) data is Suitable Residues for the Reaction). In this illustration, the represented in rows, and light chain alpha- and beta-carbon number of selected residues that “passed each filter, either data is transposed, and represented in columns, as described in Succession (left) or in parallel (right), is derived from an in the description of FIG. 5. Derivative data describing the analysis of the 106 amino acids of the Fv fragment light inter-chain, inter-atomic distances in the top and middle 45 chain, the 120 amino acids of the Fv fragment heavy chain, panels, and the alpha-beta distance differences in the bottom and the resultant 12720 possible residue pairs in a given Fv panel, is represented at the intersection of each heavy chain fragment. The percentages indicating the permissiveness of row and light chain column. each filter are also illustrative of the Fv fragment example. FIG. 10 Alpha-beta distance difference data, derived as See text for further discussion (Software for Selection describe in FIG. 9, of representative Fv fragments (FV 50 Process). fragments 1, 2, and 3 of FIG. 6) in a 3-D database. Heavy FIG. 15. A. Nucleotide and amino acid sequence of the C. and light chain residues are represented in arrays, where the antayclica Lipase B. Both sequences start where the 25 heavy chain residues are listed vertically, and the light chain amino acid pre-propeptide is cleaved. B. Sequences of residues are listed horizontally. Data correlated with heavy oligonucleotides used for cloning, site-directed mutagenesis, and light chain residues is represented at the intersection of 55 and error-prone PCR as indicated. The pPal-CALB vector is each heavy chain row and light chain column. based on the pPICZalpha A vector, whereby the insert is the FIG. 11. Statistical measurements in a 3-D database of N-terminally His-tagged reading frame of the CALB gene, alpha-beta distance differences of Fv fragment heavy and as represented in A, that is cloned into the EcoRI and Notl light chain residue pairs, as an example of relevant deriva sites in the multiple cloning site of the vector. The vector tive data. A. Illustrative statistical measurements of the 60 pYal-CALB is based on the pYES2.1 V5-His-TOPO vector, alpha-beta distance differences of the pairs between the three whereby the insert is the alpha factor CALB fusion, con representative Fv Fragment heavy and light chains in FIG. taining the N-terminal His-tag, EcoRI and NotI restriction 6 (i.e. data shown for n=3). B. Actual statistical measure sites, amplifed from the pPal-CALB vector. Primers for ments of the alpha-beta distance differences of the pairs error-prone PCR allow for directional cloning of the PCR between all Fv fragment heavy and light chains in the 65 product into the EcoRI and NotI sites in the pYal-CALB sample of Fv fragments used in the for selection (data shown vector. All of the construcis are generated by single amino for n=17). acid Substitutions. US 7,037,894 B2 13 14 FIG. 16. A. Nucleotide and amino acid sequence of thereof. Alternatively, the proteins and fragments and Subtilisin E from B. subtilis. B and C. Amino acid sequence derivatives thereof may be a component of a composition alignment of the functionally and structurally related Sub that comprises other components, for example, a diluent, tilisin enzymes: the middle row represents the sequence of Such as saline, a pharmaceutically acceptable carrier or subtilisin E. D. Oligonucleotides used for cloning and site excipient, a culture medium, etc. directed mutagenesis of Subtilisin E, as indicated. The A In specific embodiments, the invention provides frag Primer hybridizes with the 5' end of the gene, B-Primer ments of a stabilized polypeptide consisting of at least 3 hybridizes with the 3' end of the gene and further encodes a amino acids or of a stabilized polypeptide complex consist C-terminal his(6)-tag for use in affinity purification. The ing of at least 6 amino acids, 10 amino acids, 20 amino acids, forward and reverse primers indicated are for the constructs 10 50 amino acids, 100 amino acids, 200 amino acids, 500 1-7 containing single and double amino acid Substitutions. amino acids, 1000 amino acids, 2000 amino acids, or of at Constructs with double amino acid Substitutions are gener least 5000 amino acids. ated by making the first amino acid Substitution using the 5.1.1. Polypeptide Derivatives and Analogs Derivatives or analogs of proteins include those mol forward and reverse primers X.1, then generating the second ecules comprising regions that are Substantially homologous substitution using the forward and reverse primers X. 15 to a protein or fragment thereof (e.g., in various 5. DETAILED DESCRIPTION OF THE embodiments, at least 40% or 50% or 60% or 70% or 80% INVENTION or 90% or 95% identity over an amino acid or nucleic acid The invention described herein comprises methods for sequence of identical size or when compared to an aligned stabilizing polypeptides and polypeptide complexes. Also sequence in which the alignment is done, for example, by a provided are polypeptides and polypeptide complexes sta computer homology program known in the art) or whose bilized using the described methods. The stabilization reac encoding nucleic acid is capable of hybridizing to a coding tion is controlled Such that the polypeptides and polypeptide gene sequence, under high Stringency, moderate stringency, complexes maintain their original functionality by providing or low stringency conditions. specifically localized reactive side-chains. The stabilized 25 Further, one or more amino acid residues within the polypeptides and polypeptide complexes can be maintained sequence can be substituted by another amino acid of a and utilized under a wide variety of physiological and similar polarity that acts as a functional equivalent, resulting non-physiological conditions without exogenous chemical in a silent alteration. Substitutions for an amino acid within structures that could be immunogenic and/or significantly the sequence may be selected from other members of the decrease their efficacy. 30 class to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, By taking a statistical approach to analyzing databases of , , valine, proline, phenylalanine, tryp structural and sequence information for domains of proteins, tophane and methionine. The polar neutral amino acids suitable residue pairs may be identified at which the cross include glycine, serine, threonine, cysteine, tyrosine, link reaction is likely to be least disruptive of the overall asparagine, and glutamine. The positively charged (basic) Structure. 35 amino acids include arginine, lysine and histidine. The At these residues, reactive side-chains are placed via negatively charged (acidic) amino acids include aspartic site-directed point mutations. In the polypeptide chains that acid and glutamic acid. Such Substitutions are generally are to be cross-linked, the codons of potentially reactive understood to be conservative substitutions. side-chains at other positions are also altered to introduce a The derivatives and analogs of the polypeptides of the maximally conservative point mutation that will not support 40 complex to be stabilized by application of the instant inven the reaction. tion can be produced by various methods known in the art. 5.1. Polypeptide and Polypeptide Complexes The manipulations that result in their production can occur Suitable for Application of the Invention at the gene or protein level. For example, a cloned gene Polypeptides and polypeptide complexes that can be 45 sequence can be modified by any of numerous strategies stabilized by the methods described herein are single known in the art. polypeptides or complexes that consist of two or more Chimeric polypeptides can be made comprising one or polypeptides and that remain functionally active upon appli several of the polypeptides of a complex to be stabilized by cation of the instant invention. Nucleic acids encoding the the instant invention, or fragment, derivative, analog thereof foregoing polypeptides are also provided. The term “func 50 (preferably consisting of at least a domain of a protein tionally active' material, as used herein, refers to that complex to be stabilized, or at least 6, and preferably at least material displaying one or more functional activities or 10 amino acids of the protein) joined at its amino- or functionalities associated with one or more of the polypep carboxy-terminus via a peptide bond to an amino acid tides of the complex. Such activities or functionalities may sequence of a different protein. be the polypeptide complexes original, natural or wild-type 55 Such a chimeric polypeptide can be produced by any activities or functionalities, or they may be designed and/or known method, including: recombinant expression of a engineered. Such design and/or engineering may be nucleic acid encoding the polypeptide (comprising a achieved, for example, either by deleting amino acids, or polypeptide coding sequence joined in-frame to a coding adding amino acids to, parts of one, any, both, several, or all sequence for a different polypeptide); ligating the appropri of the polypeptides, by fusing polypeptides of different 60 ate nucleic acid sequences encoding the desired amino acid polypeptides or polypeptide complexes, by adding or delet sequences to each other in the proper coding frame, and ing post-translational modifications, by adding chemical expressing the chimeric product; and protein synthetic modifications or appendixes, or by introducing any other techniques, for example, by use of a peptide synthesizer. mutations by any methods known in the art to this end as set 5.1.2. Manipulations of a Protein Sequence at the Protein forth in detail below. 65 Level The compositions may consist essentially of the polypep Included within the scope of the invention are tides of a complex, and fragments, analogs, and derivatives polypeptides, polypeptide fragments, or other derivatives or US 7,037,894 B2 15 16 analogs, which are differentially modified during or after carboxylic acid, Tca. Furthermore, the amino acid can be D translation or synthesis, for example, by glycosylation, (dextrorotary) or L (levorotary). For a review of classical acetylation, phosphorylation, amidation, derivatization by and non-classical amino acids, see Sandberg et al. (Sandberg known protecting/blocking groups, proteolytic cleavage, M. et al. J. Med. Chem.; vol. 41 (14): pp. 2481-91, 1998). etc. 5.1.3. Molecular Biological Methods Any of numerous chemical modifications may be carried out by known techniques, including but not limited to, Nucleic acids encoding one or more polypeptides stabi specific chemical cleavage by cyanogen bromide, trypsin, lized by the methodology of instant invention are provided. chymotrypsin, papain, V8 protease, NaBH4, acetylation, The polypeptides, their derivatives, analogs, and/or chimers, formylation, oxidation, reduction, metabolic synthesis in the of the complex can be made by expressing the DNA presence of tunicamycin, etc. 10 sequences that encode them in vitro or in Vivo by any known method in the art. Nucleic acids encoding one, any, both, In addition, polypeptides, polypeptide fragments, or other several, or all of the derivatives, analogs, and/or chimers of derivatives or analogs that can be stabilized using the the complex to be stabilized by the methodology of the methods of the instant invention can be chemically synthe instant invention can be made by altering the nucleic acid sized. For example, a peptide corresponding to a portion of 15 a protein can be synthesized by use of a peptide synthesizer. sequence encoding the polypeptide or polypeptides by Substitutions, additions (e.g., insertions) or deletions that Furthermore, if desired, non-classical amino acids or chemi provide for functionally acitve molecules. The sequences cal amino acid analogs can be introduced as Substitutions can be cleaved at appropriate sites with restriction and/or additions into the sequence of one, any, both, several endonuclease(s), followed by further enzymatic modifica or all of the polypeptides of the complex. tion if desired, isolated, and ligated in vivo or in vitro. Non-classical amino acids include, but are not limited to, Additionally, a nucleic acid sequence can be mutated in vitro the D-isomers of the common amino acids, fluoro-amino or in vivo, to create and/or destroy translation, initiation, acids, designer amino acids such as B-methyl amino acids, and/or termination sequences, or to create variations in CY-methylamino acids, NY-methylamino acids, and amino coding regions and/or to form new, or destroy preexisting, acid analogs in general. 25 restriction endonuclease sites to facilitate further in vitro Examples of non-classical amino acids include: modification. C.-aminocaprylic acid, Acpa, (S)-2-aminoethyl-L- Due to the degeneracy of nucleotide coding sequences, cysteine.HCl, Aecys; aminophenylacetate, Afa, 6-amino many different nucleic acid sequences which encode Sub hexanoic acid, AhX, Y-amino isobutyric acid and stantially the same amino acid sequence as one, any, both, C.-aminoisobytyric acid, Aiba; alloisoleucine, Aile; 30 several, or all of the polypeptides of complex to be stabilized L-allylglycine, Alg, 2-amino butyric acid, 4-aminobutyric may be used in the practice of the present invention. These acid, and C.-aminobutyric acid, Aba; p-aminophenylalanine, can include nucleotide sequences comprising all or portions Aphe; b-alanine, Bal; p-bromophenylalaine, Brphe; of a domain which is altered by the substitution of different cyclohexylalanine, Cha; citruline, Cit; B-chloroalanine, codons that encode the same amino acid, or a functionally Clala, cycloleucine, Cle; p-cholorphenylalanine, Clphe; cys 35 teic acid, Cya; 2,4-diaminobutyric acid, Dab; 3-amino pro equivalent amino acid residue within the sequence, thus pionic acid and 2,3-diaminopropionic acid, Dap, 3,4- producing a “silent’ (functionally or phenotypically dehydroproline, Dhp; 3,4-dihydroxylphenylalanine, Dhphe; irrelevant) change. p-flurophenylalanine, Fphe; D-glucoseaminic acid, Gaa; Any technique for mutagenesis known in the art can be homoarginine, Hag; 8-hydroxylysine.HCl, Hlys: DL-8- 40 used, including but not limited to, chemical mutagenesis, in hydroxy norvaline, Hinvl; homoglutamine, Hog; vitro site-directed mutagenesis, using, for example, the homophenylalanine, Hoph; homo Serine, Hos; QuikChange Site-Directed Mutagenesis Kit (Stratagene), hydroxyproline. Hpr; p-iodophenylalanine, Iphe; isoserine, etc. Ise; C.-methyl leucine, Mle; DL-methionine-S- 5.2. Applications of the Stabilization Technology methylsulfoniumchloide, Msmet; 3-(1-naphthyl) alanine, 45 1Nala; 3-(2-naphthyl)alanine, 2Nala; , Nile; The polypeptide and polypeptide complex stabilization N-methylalanine, Nimala; Norvaline, Nva; O-benzylserine, methods of the invention have broad applicability. Some Obser; O-benzyltyrosine. Obtyr; O-ethyltyrosine, Oetyr; non-limiting examples are set forth below. O-methylserine, Omser; O-methyl threonine, Omthr: 5.2.1. General O-methyltyrosine, Omtyr; Ornithine, Orn; phenylglycine; 50 Polypeptide complexes which are held together in nature penicillamine, Pen; pyroglutamic acid, Pga; pipecolic acid, by domains that mediate protein-protein interactions may be Pip; sarcosine, Sar; t-butylglycine; t-butylalanine; 3,3,3- stabilized using the methods of the invention. Further, single trifluroalanine, Tfa, 6-hydroxydopa, Thphe, L-Vinylglycine, polypeptide chains may be stabilized using the methods of Vig, (-)-(2R)-2-amino-3-(2-aminoethylsulfonyl)propanoic the invention to engineer intra-chain di-tyrosine cross-links. acid dihydroxochloride, Aaspa, (2S)-2-amino-9-hydroxy-4, 55 For example, hormones (e.g. insulin, erythropoietin, human 7-dioxanonanoic acid, Ahdna; (2S)-2-amino-6-hydroxy-4- growth hormone or bovine growth hormone), other growth oxahexanoic acid, Ahoha, (-)-(2R)-2-amino-3-(2- factors (e.g. insulin-like growth factors, neurotrophic hydroxyethylsulfonyl)propanoic acid, Ahsopa, (-)-(2R)-2- factors), and enzymes and/or biosensors and biocatalysts can amino-3-(2-hydroxyethylsulfanyl)propanoic acid, Ahspa; be stabilized, either alone or together as a complex with a (2S)-2-amino-12-hydroxy-4,7,10-trioxadodecanoic acid, 60 receptor or other protein binding partner (McInnes C. and Ahtda; (2S)-2,9-diamino-4,7-dioxanonanoic acid, Dadna; Sykes B. D. Biopolymers; vol. 43(5): pp. 339–66, 1997). (2S)-2,12-diamino-4,7,10-trioxadodecanoic acid, Datda; Examples of protein-protein interaction domains which may (S)-5,5-difluoronorleucine, Dfnl: (S)-4,4-difluoronorvaline, be stabilized using the methods of the invention include, but Dfnv; (3R)-1-1-dioxo-1,4thiaziane-3-carboxylic acid, are not limited to, leucine-zipper domains (Alber T. Curr. Dtca: (S)-4,4,5,5,6,6,6-heptafluoronorleucine, Hfnl: (S)-5.5, 65 Opin. Genet. Dev.; vol. 2(2): pp. 205-10, 1992), SH2 and 6, 6,6-pentafluoronor leucine, Pfnl: (S)-4,4,5, 5,5- SH3 domains (Pawson T. Princess Takamatsu Symp.; vol. pentafluoronorvaline, Pfnv; and (3R)-1,4-thiazinane-3- 24: pp. 303–22, 1994), PTB and PDZ domains (Cowburn D. US 7,037,894 B2 17 18 Curr. Opin. Struct. Biol.; vol. 7(6): pp. 835–8, 1997: Boc research, or industry sectors, that include, for example, but kaert J. and Pin J. P. EMBO J.; vol. 18(7): pp. 1723–9, are not limited to, the chemical, detergent, pharmaceutical, 1999), WD40 domains (Royet J. et al. EMBO.J.; vol. 17(24): agricultural, food, cosmetics, textile, materials-processing, pp. 7351–60, 1998), death- and death effector domains and paper industries. Within Such industry sectors, enzymes, (Strasser A. and Newton K. Int. J. Biochem. Cell. Biol.; vol. or biocatalysts, may be applied in any way, or have any kind of utility, Such as, but not limited to, product synthesis, use 31(5): pp. 533–7, 1999), disintegrin domains (Black R. A. as active agents in products, use in diagnostic testing and White J. M. Curr Opin Cell Biol.; vol. 10(5): pp. 654–9, equipment, or any other applications that may include, but 1998), and CARD domains (Chou J.J. et al. Cell: vol. 94(2): are not limited to, wastewater and agricultural soil treatment, pp. 171-80, 1998). and crude oil refinement. Examples of synthetic applications Proteins which dimerize or multimerize to function may 10 include, but are not limited to, amino acid manufacturing be stabilized using the methods of the invention. Such and fine chemical synthesis. Examples of biocatalytic appli proteins include most immunoglobulin complexes, includ cations as active agents in products include, but are not ing the fragments that retain immunoglobulin functionality, limited to, such applications as biological washing powders. Such as, for example, Fab., F(ab), Fc, and Fv fragments Biocatalysts may be derived from enzymes of any class, (Penuche M. L. et al. Hum Antibodies; vol. 8(3): pp. 106–18, 15 family, or any other categorization of enzymes, including, 1997: Sensel M. G. et al. Chem. Immunol.; vol. 65: pp. 129-58, 1997). Most cell-surface receptors that transmit but not limited to, oxidoreductases, transferases, hydrolases, extracellular signals to intracellular signaling systems lyases, isomerases, ligases, polymerases, lipases, esterases, dimerize and contain some of the above mentioned domains proteases, glycosidases, glycosyltransferases, phosphatases, that mediate protein-protein interactions (McInnes C. and kinases, monooxygenases, dioxygenases, transaminases, Sykes B. D. Biopolymers; vol. 43(5): pp. 339–66, 1997: amidases, and acylases; they may comprise a single Guogiang J. et al.; Nature; vol. 401: pp.606–610, 1999). polypeptide chain, or two or more polypeptide chains of a Further examples are intracellular protein complexes. Such polypeptide complex. as, for example, the caspases (Chou J. J. et al. Cell: Vol. A biosensor is defined as a device that consists of a 25 biological recognition system, often called a bioreceptor, 94(2): pp. 171-80, 1998). and a transducer. The interaction of the analyte with the Growth factors which may be stabilized using the meth bioreceptor is designed to produce an effect measured by the ods of the invention include, but are not limited to, those that transducer, which converts the information into a measur dimerize to function, such as interleukin-8 (Leong S. R. et able effect, such as an electrical signal. A biochip consists of al. Protein Sci., vol. 6(3): pp: 609–17, 1997) and members 30 an array of individual biosensors that can be individually of the NGF/TGF family. These proteins are generally char monitored and generally are used for the analysis of multiple acterized as having 110–120 amino acid residues, up to 50% analytes. A bioreceptor can be a biological molecular species homology with each other, and are used for the treatment of (e.g., an antibody, an enzyme, or a protein) that utilizes a a variety of health disorders, such as cancer, osteoporosis, spinal cord injury and neuronal regeneration. Examples of biochemical mechanism for recognition. Common forms of 35 bioreceptors used in biosensing are based on antibody/ the NGF family include, but are not limited to, NGF, BDNF, antigen and enzymatic interactions. Biosensors are widely NT-3, NT-4/5, and NT-6, TRAIL, OPG, and FasL polypep applied in biological monitoring and environmental sensing. tides (Lotz M. et al. J. Leukoc. Biol.; vol. 60(1): pp. 1-7. Furthermore, significant advances are being made in their 1996; Casaccia-Bonnefil P. et al Microsc Res Tech., vol. use in the analysis of samples of biomedical interest. (Vo 45(4-5): pp. 217-24, 1999; Natoli G. et al. Biochem. 40 Dinh and Cullum. Fresenius J Anal Chem. Vol. 366: pp. 540 Pharmacol. vol. 56(8): pp. 915–20, 1998). TRAIL is cur 551, 2000). As described above, enzymes and rently in clinical trials, and may be useful to induce apop immunoglobulin-derived polypeptides and polypeptide tosis in cancer cells. OPG is also in clinical trials and may complexes can be stabilized by application of the instant be useful to strengthen bone tissue and prevent bone loss during menopause (Wickelgren I. Science; vol. 285 (5430): invention. The improvements that stabilization of these 45 molecules provides, as described above, is also of significant pp. 998-1001, 1999). relevance to their use in biosensors and biochips. Growth factors that do not dimerize to function, that may The technology described herein can be applied alone, or be stabilized using the methods of the invention include, but in combination with other technologies. In one embodiment, are not limited to, polypeptides that can be stabilized by the technology can be applied in combination with one or introducing intra-chain di-tyrosine bonds, such as, as 50 more alternative technologies that provide additional stabil examples, insulin, erythropoietin, any of the colony stimu ity for the protein or protein complex. In another lating factors (CSFs), PDGF. embodiment, the technology described herein can be applied Industrial biocatalytic processes are used in many indus in combination with one or more alternative technologies try Sectors, including the chemical, detergent, that provide additional beneficial attributes to the protein or pharmaceutical, agricultural, food, cosmetics, textile, 55 protein complex. In yet another embodiment, the technology materials-processing, and paper industries. Within these may be applied in combination with a single alternative industries, biocatalysts have many applications, ranging technology that both stabilizes and provides additional ben from product synthesis (e.g. amino acid manufacturing, and eficial attributes. In yet another embodiment, the technology fine chemical synthesis of Small-molecule pharmaceuticals) may be applied in combination with two or more through use as active agents in products (for example, in 60 technologies, at least one of which that provides additional biological washing powders) to use in diagnostic testing stability, and at least one of which that provides at least one equipment. Biocatalysts also have industrial applications additional attribute. that range from wastewater and agricultural soil treatment, Combinations of technologies often leads to synergistic to crude oil refinement. effects, i.e. the combination of technologies is more effective Enzymes that may be stabilized using the methods of the 65 than the sum of the effects of the individual technologies invention include, but are not limited to, enzymes with applied individually. Synergies may be observed with regard applications as catalysts in basic, applied, or industrial specifically to stabilization, as example, but not limited to, US 7,037,894 B2 19 20 by combining application of the instant invention with an in a single antibody. The utility of Such monoclonal antibodies vitro evolutionary approach or immobilization strategies (MAbs), whose unique binding specificity can be character (see below). ized in detail, is vast. From a monoclonal population of Alternative technologies that provide additional stability antibody-producing cells it is possible to isolate the genes when applied in combination with the instant technology encoding the polypeptide chains that make up the antibody. include, but are not limited to, generating fusion proteins, Efficient large-scale production of recombinant immunoglo Such as, for example, single chain Fv fragments (scFv's; see bulin in yeast or bacterial expression systems is an active Pluckthun and Pack, Immunotechnology; vol. 3(2): pp. interest of the biotechnology industry. More importantly, 83-105, 1997); protein derivatization, such as, for example, however, molecular biological techniques allow us to PEGylation (Wright and Morrison. Trends Biotechnol.; vol. 10 manipulate these genes and thereby produce antibody 15(1): pp. 26–32, 1997: DeSantis & Jones. Curr. Opin. derived proteins custom-tailored to individual applications, Biotech., vol. 10(4) pp. 324–330, 1999); disulfide cross such as those described below. linking, generating Such products as disulfide stabilized One of the major limitations to the clinical effectiveness biocatalysts (Illanes. Elec. J. Biotech., vol. 2(1): pp. 7–15, of antibodies is their size. Full-length immunoglobulin mol 1999) or FV fragments (dsEvs. Reiter and Pastan. 15 ecules are effective as humoral agents, but their size makes TIBTECH; vol. 16(12): pp. 513–520, 1998; Reiter et al. Nat it difficult for them to penetrate tissues such as solid tumors. Biotech.: Vol. 14: pp. 1239–1245, 1996); other cross-link As a result, Smaller, engineered versions of antibodies have methodologies, such as, for example, generating cross been designed. Such engineered antibodies are designed to linked enzyme crystals by glutaraldehyde cross-linking retain normal functional specificity with respect to antigen (CLECs; Govardhan. Curr. Opin. Biotech., vol. 10(4) pp. binding in a much smaller molecule, while at the same time 331–334, 1999; Haring and Schreier. Curr. Opin. Chem. uncoupling this binding function from the immunoglobulin Biol., vol. 3(1): pp.35–38, 1999; Illanes. Elec. J. Biotech., molecule's other biological effector functions (e.g. comple vol. 2(1): pp. 7–15, 1999); other immobilization strategies, ment activation or macrophage binding, FIG. 1D). Such as, for example, embedding biocatalysts in gels, such Fv fragments have been shown to be the smallest as polyacrylamide (Illanes. Elec. J. Biotech., vol. 2(1): pp. 25 Ig-derived fragments that retain full binding specificity 7–15, 1999), medium engineering, Such as, for example, use (FIG. 1D). The Fv fragment essentially comprises only those of a biocatalyst in organic or aqueous-organic solvents amino acid sequences of the antibody molecule that consti (Carrea G. and Riva S. Angew. Chem. Int. Ed. Engl; vol. tute the “variable domain responsible for antigen binding. 39(13): pp. 2226-2254, 2000), and any in vitro evolution Due to their minimal size. Fv fragments show significantly strategies, such as, for example, directed evolution by DNA 30 better tissue penetration and can therefore be used in a shuffling (Stemmer. Nature, vol. 370: pp. 389–391, 1994: broader range of contexts (e.g. Solid tumor therapy). As used Zhao and Arnold. Nucleic Acids Res. Vol. 25: pp. herein, Fv fragments shall include the variable region of 1307–1308, 1997; Zhao et al. Nat. Biotechnol., vol 16: pp. immunoglobulin molecules or the equivalent or homologous 258-261, 1998: Shao et al Nucleic Acids Res. Vol. 26: pp. region of a T cell receptor. 681–683.). 35 Amino acid sequence comparisons of the 110–120 residue Technologies that may provide additional beneficial long VH and VL regions reveal that each is made up of four attributes to a polypeptide or polypeptide complex when relatively conserved sequence segments, called the “Frame applied in combination with the instant technology include, work Regions’ (FRs), and three highly variable sequence but are not limited to, generating fusion proteins, such as, for segments, called “Complementarity Determining Regions' example, hetero specific diabodies or FV fragments fused to 40 (CDRI, II, & III), which largely determine the specificity of cytotoxins, protein derivatization, such as, for example, the antibody (FIG. 1D, “right arm’). PEGylation, medium engineering, such as, for example, use The heavy and light chain FV fragment polypeptides of a biocatalyst in an organic or aqueous-organic Solvent, and any in vitro evolution strategies, such as, for example, associate with each other largely at sites within the con 45 served FRS. Fv fragments, however, lack the structural directed evolution by DNA shuffling (see above). stabilizing inter-chain di-sulfide bonds present in the Ig Technologies can be applied simultaneously either by constant regions. In order to keep recombinant FV heavy and incorporating the process of the other technology or tech light chains associated and achieve functional stability and nologies in the process of applying the instant invention, or affinity, the two chains of the molecule must be “stabilized' Vice versa. This would be the case, as a non-limiting 50 by Some other means. example, when applying an in vitro evolutionary approach in combination with the instant technology, such as 5.3. Biocatalysts described in Example II, Chapter 7. Alternatively, technolo Biocatalysts are a preferred class of catalysts for industrial gies can be applied in any Succession that best meets the process development, due to their high specificity and pro requirements and circumstances of a specific application. 55 cess yields. Specifically, they allow for the use of less energy 5.2.2. Immunoglobulin Fv Fragments and less expensive feedstocks (starting materials), reduce Antibodies or immunoglobulin molecules (Ig) are among the number of individual steps leading to a product, and the most therapeutically useful molecules. Their utility reduce waste products. Their commercial use is, however, results from their ability to bind to given target molecules still limited by instability, curtailing key applications. This with extremely high specificity and affinity. Their function in 60 invention provides methods for stabilizing Such enzymes, the immune system is to bind to foreign molecules (such as improving their performance as industrial catalysts, and those present on the Surface of pathogens) and to trigger the prolonging their half-lives and shelf-lives. Application of the removal of these foreign molecules from the body using a instant invention also enables the industrial use of novel, variety of effector mechanisms. previously unstable, biocatalysts, and thereby also shortens With the advent of hybridoma technology, based on the 65 industrial process innovation cycle times. work of G. Kohler and C. Milstein in the early 1980s, it has Specifically, application of the instant invention stabilizes become possible to engineer pure clones of cells expressing biocatalysts, for example, by preventing the unfolding of the US 7,037,894 B2 21 22 protein. This increases their ability to catalyze chemical fragments can then be separated according to size by stan reactions under adverse reaction conditions, prolongs their dard techniques, such as agarose and polyacrylamide gel half- and shelf-lives, and maximizes their activity at milder, electrophoresis and column chromatography. actual process temperatures. Cloning Once the DNA fragments are generated, identification of 5.4. Obtaining Polypeptides to be Stabilized the specific DNA fragment containing the desired gene may Any method known to one skilled in the art may be used be accomplished in a number of ways. For example, clones to obtain a polypeptide or polypeptide complex to be sta can be isolated by using PCR techniques that may either use bilized according to the methods of the invention. two oligonucleotides specific for the desired sequence, or a 10 single oligonucleotide specific for the desired sequence, 5.4.1. Purification of Polypeptides using, for example, the 5' RACE system (Cale J. M. et al. A polypeptide or polypeptide complex to be stabilized Methods Mol. Biol.; vol. 105: pp. 351–71, 1998; Frohman using the methods of the instant invention may be obtained, M. A. PCR Methods Appl.: vol. 4(1): pp. for example, by any protein purification method known in S40–58, 1994). The oligonucleotides may or may not the art. Such methods include, but are not limited to, 15 chromatography (e.g. ion exchange, affinity, and/or sizing contain degenerate nucleotide residues. Alternatively, if a column chromatography), ammonium Sulfate precipitation, portion of a gene or its specific RNA or a fragment thereof centrifugation, differential solubility, or by any other stan is available and can be purified and labeled, the generated dard technique for the purification of proteins. A polypeptide DNA fragments may be screened by nucleic acid hybrid ization to the labeled probe (e.g. Benton and Davis. Science: may be purified from any source that produces it. For vol. 196(4286): pp. 180–2, 1977). Those DNA fragments example, polypeptides may be purified from sources with substantial homology to the probe will hybridize. It is including, prokaryotic, eukaryotic, mono-cellular, multi also possible to identify the appropriate fragment by restric cellular, animal, plant, fungus, vertebrate, mammalian, tion enzyme digestion(s) and comparison of fragment sizes human, porcine, bovine, feline, equine, canine, avian, tissue with those expected according to a known restriction map if culture cells, and any other natural, modified, engineered, or 25 any otherwise not naturally occurring source. The degree of such is available. Further selection can be carried out on the purity may vary, but in various embodiments, the purified basis of the properties of the gene. protein is greater than 50%, 75%, 85%, 95%, 99%, or 99.9% The presence of the desired gene may also be detected by of the total mg protein. Thus, a crude cell lysate would not assays based on the physical, chemical, or immunological comprise a purified protein. properties of its expressed product. For example, cDNA 30 clones, or DNA clones which hybrid-select the proper Where it is necessary to introduce one or more tyrosine mRNAs, can be selected and expressed to produce a protein residues to be cross-linked into a purified polypeptide or that has, for example, similar or identical electrophoretic polypeptide complex, the polypeptide(s) can be micro migration, isoelectric focusing behavior, proteolytic diges sequenced to determine a partial amino acid sequence. The tion maps, hormonal or other biological activity, binding partial amino acid sequence can then be used together with 35 library screening and recombinant nucleic acid methods well activity, or antigenic properties as known for a protein. known in the art to isolate the clones necessary to introduce Using an antibody to a known protein, other proteins may tyrosines. be identified by binding of the labeled antibody to expressed putative proteins, for example, in an ELISA (enzyme-linked 5.4.2. Expression of DNA Encoding a Polypeptide immunosorbent assay)-type procedure. Further, using a Source of DNA 40 binding protein specific to a known protein, other proteins Any prokaryotic or eukaryotic cell can serve as the may be identified by binding to such a protein either in vitro nucleic acid source for molecular cloning. A nucleic acid or a suitable cell system, Such as the yeast-two-hybrid sequence encoding a protein or domain to be cross-linked or system (see e.g. Clemmons D. R. Mol. Reprod. Dev.; vol. Stabilized may be isolated from sources including 45 35: pp. 368–374, 1993; Loddick S. A. et al. Proc. Natl. Acad. prokaryotic, eukaryotic, mono-cellular, multi-cellular, Sci., U.S.A.: vol. 95: pp. 1894–1898, 1998). animal, plant, fungus, vertebrate, mammalian, human, A gene can also be identified by mRNA selection using porcine, bovine, feline, equine, canine, avian, etc. nucleic acid hybridization followed by in vitro translation. The DNA may be obtained by standard procedures known In this procedure, fragments are used to isolate complemen in the art from cloned DNA (e.g., a DNA “library’), by 50 tary mRNAs by hybridization. Such DNA fragments may chemical synthesis, by cDNA cloning, by the cloning of represent available, purified DNA of another species (e.g., genomic DNA, or fragments thereof, purified from the Drosophila, mouse, human). Immunoprecipitation analysis desired cell (see e.g., Sambrook et al.; Glover (ed.). MRL or functional assays (e.g. aggregation ability in vitro, bind Press, Ltd., Oxford, U.K. vol. I, II, 1985). The DNA may ing to receptor, etc.) of the in vitro translation products of the also be obtained by reverse transcribing cellular RNA, 55 isolated products of the isolated mRNAs identifies the prepared by any of the methods known in the art, such as mRNA and, therefore, the complementary DNA fragments random- or poly A-primed reverse transcription. Such DNA that contain the desired sequences. may be amplified using any of the methods known in the art, In addition, specific mRNAs may be selected by adsorp including PCR and 5' RACE techniques (Weis J. H. et al. tion of polysomes isolated from cells to immobilized anti Trends Genet. 8(8): pp. 263-4, 1992: Frohman M. A. PCR 60 bodies specifically directed against protein. A radiolabeled Methods Appl. 4(1): pp. S40–58, 1994). cDNA can be synthesized using the selected mRNA (from Whatever the source, the gene should be molecularly the adsorbed polysomes) as a template. The radiolabeled cloned into a Suitable vector for propagation of the gene. mRNA or cDNA may then be used as a probe to identify the Additionally, the DNA may be cleaved at specific sites using DNA fragments from among other genomic DNA frag various restriction enzymes, DNAse may be used in the 65 mentS. presence of manganese, or the DNA can be physically Alternatives to isolating the genomic DNA include, sheared, as for example, by sonication. The linear DNA chemically synthesizing the gene sequence itself from a US 7,037,894 B2 23 24 known sequence or making cDNA to the mRNA which can also be accomplished using computer Software programs encodes the protein. For example, RNA for cDNA cloning of available in the art. Other methods of structural analysis the gene can be isolated from cells that express the gene. include X-ray crystallography, nuclear magnetic resonance Vectors spectroscopy and computer modeling. The identified and isolated gene can then be inserted into 5.5. Suitable Residues for a Cross-Linking Reaction an appropriate cloning or expression vector. A large number of vector-host systems known in the art may be used. The identification and/or engineering of Suitable residues Possible vectors include plasmids or modified viruses, but for a cross-linking reaction may involve one or more of the the vector system must be compatible with the host cell several steps set forth below. used. Such vectors include bacteriophages Such as lambda 10 5.5.1. Introduction of Point Mutations to Control the derivatives, or plasmids such as pBR322 or puC plasmid Cross-Link Reaction derivatives or the Bluescript vector (Stratagene). Engineering the overall structure and function of a stabi The insertion into a cloning vector can, for example, be lized polypeptide or polypeptide complex is achieved by accomplished by ligating the DNA fragment into a cloning controlling the availability of tyrosyl side-chains for the vector that has complementary cohesive termini. However, 15 cross-linking reaction, for example, but not limited to, via if the complementary restriction sites used to fragment the mutagenesis. Functionality of a polypeptide or polypeptide DNA are not present in the cloning vector, the ends of the complex may be compromised or altered by a tyrosine DNA molecules may be enzymatically modified. tyrosine cross-link reaction. In this case, an undesirable Alternatively, any site desired may be produced by ligating bydroxyl group of a tyrosyl side-chain may be removed by nucleotide sequences (linkers) onto the DNA termini; these mutating Such residues to phenylalanine, or masked to ligated linkers may comprise specific chemically synthe inhibit Its participation in Such a reaction. In this way, a sized oligonucleotides encoding restriction endonuclease tyrosyl residue available for the cross-linking reaction but recognition sequences. Furthermore, the gene and/or the that may lead to distortion of structure and compromise vector may be amplified using PCR techniques and oligo functionality and/or specificity of the polypeptide or nucleotides specific for the termini of the gene and/or the 25 polypeptide complex is removed. Moreover, point mutations vector that contain additional nucleotides that provide the to tyrosine may be introduced at positions where the tyrosyl desired complementary cohesive termini. In alternative side-chains will react with each other to form a bond that methods, the cleaved vector and a gene may be modified by causes the least distortion to structure arid function; these homopolymeric tailing (Cale J. M. et al. Methods Mol. Biol. positions are identified as described in detail below. Thereby, vol. 105: pp. 351–71, 1998). Recombinant molecules can be 30 the overall structure and functionality of the polypeptide or introduced into host cells via transformation, transfection, polypeptide complex is maintained. infection, electroporation, etc., so that many copies of the 5.5.2. Removing Undesirable Reactive Side-Chains gene sequence are generated. Reactive side-chains identified in a polypeptide chain or Preparation of DNA 35 in the polypeptide chains of a complex are identified that In specific embodiments, transformation of host cells with subjected to the conditions of the oxidative cross-link recombinant DNA molecules that incorporate an isolated described above would result in a bond that would distort the gene, cDNA, or synthesized DNA sequence enables genera structure of the complex. These residues are identified by tion of multiple copies of the gene. Thus, the gene may be comparison of the polypeptides amino acid sequences to obtained in large quantities by growing transformants, iso 40 available structural information on Such or similar com lating the recombinant DNA molecules from the transfor plexes (see below). Such a bond can be formed either mants and, when necessary, retrieving the inserted gene between two polypeptide chains of the complex (inter-chain from the isolated recombinant DNA. bond) or between two residues of one and the same polypep The sequences provided by the instant invention include tide chain (intra-chain bond). The effect of the formation of those nucleotide sequences encoding Substantially the same 45 a bond is determined by both of the reactive side-chains amino acid sequences as found in native proteins, and those involved in the formation of such a bond, and therefore these encoded amino acid sequences with functionally equivalent residues would be identified in pairs. amino acids, as well as those encoding other derivatives or To neutralize this damaging effect of the cross-link analogs, as described below for derivatives and analogs. reaction, masking reagents that protect aromatic side chains 50 (Politt S. and Schultz. P. Agnew. Chem. Int. Ed.; Vol. 37(15): Structure of Genes and Proteins pp. 2104-2107, 1998) may be use, or amino acid substitu The amino acid sequence of a protein can be derived by tions to phenylalanine, or any other amino acid, may be deduction from the DNA sequence, or alternatively, by introduced at least at one of the residues involved, for direct sequencing of the protein, for example, with an example, by introducing a point mutation in the cDNA of the automated amino acid sequencer. 55 gene directing the expression of the polypeptide. A protein sequence can be further characterized by a 5.5.3. Introducing Reactive Side-Chains hydrophilicity analysis (Hopp T. P. and Woods K. R. Proc. To achieve a stabilized polypeptide or polypeptide com Natl. Acad. Sci., U.S.A.: vol. 78: pp. 3824, 1981). A hydro plex without disrupting its structure and/or function, posi philicity profile can be used to identify the hydrophobic and tions within each polypeptide are identified at which a hydrophilic regions of the protein and the corresponding 60 reactive side-chain would be able to form a bond with a regions of the gene sequence which encode such regions. reactive side-chain on the, or one of the, other polypeptide Secondary, structural analysis (Chou P.Y. and Fasman G. chain(s). Such positions are selected both with respect D. Biochemistry; vol. 13(2): pp. 222 45, 1974) can also be toward maintaining the overall structure of the same done, to identify regions of a protein that assume specific polypeptide, and with respect toward the suitability of a secondary structures. Manipulation, translation, and second 65 position in the other polypeptide involved in the bond, and ary structure prediction, open reading frame prediction and the positions are therefore selected in pairs (see below for plotting, as well as determination of sequence homologies, detailed description of selection process). US 7,037,894 B2 25 26 When at a selected residue of either, or any, polypeptide and toward determining the Suitability of a residue or group (s) the reactive tyrosyl side-chain is not already present, a of residues for modifications that are intended not to disrupt point mutation may be introduced, for example, but not the fold, structure, and/or function of the protein or protein limited to, by using molecular biological methods to intro complex. A method of evaluating sets of data on related to duce such a point mutation into the cDNA of the gene 5 the amino acid sequence, the structure, and/or function/ directing its expression, Such that a reactive side-chain is functionality of related polypeptides statistically for the present and available for the reaction. purpose of identifying important residues, or Suitable resi 5.6. Structurally Conserved Domains dues for modification within a protein or protein complex of 5.6.1. Relationship Between Structure and Function interest, or a group of related proteins or protein complexes It is the three-dimensional, or the tertiary, structure of 10 of interest, is disclosed. every protein, and the quaternary structure of every protein Given the availability of relevant data, it is often possible complex that lends them the functionality that has allowed to assign quantitative values for certain characteristics of an them to be maintained and developed through the evolu amino acid side chain present at each residue of a domain, tionary process over time. A point mutation in the gene of a polypeptide, or polypeptide complex. Furthermore, given polypeptide or polypeptide complex that leads to an amino 15 the relevant data on domains, polypeptides, or polypeptide acid substitution at any given residue will alter the structure complexes, it is possible to give groups of amino acids of the polypeptide and/or of the overall complex to a greater values that describe their structural and/or functional rela or lesser extent. The extent of Such an amino acid substitu tionship. These values can be compared between individual tion’s effect on the structure of the polypeptide or polypep domains by aligning the data in Such a way that the sets of tide complex is dependent on the structural context of the values to be compared are structurally and functionally residue, and on the nature of the resultant amino acids related (see above). If there is a sufficient number of side-chain. individual domains, polypeptides, or polypeptide Protein domains that show extensive similarity in their complexes, for which such data is available, it is possible to amino acid sequences to domains in other proteins are analyze these sets of data statistically. referred to as “conserved domains’. Within conserved 25 domains individual residues are more conserved than others; Statistical analysis of sets of data provides information some can be 100% conserved, and others not at all. Most concerning the degree of structural conservation and/or conserved domains are not only similar in their amino acid variability of a residue or a group of residues in a sample, sequences, but also in their three-dimensional structures, and and an indication to what extent a residue or a group of also in their functions. In the absence of evolutionary residues are involved in providing the underlying pressures that require a residue of a domain to be conserved, 30 architecture, or the specificity, of a domain. This information it is thought that the amino acid present at a residue would is derived from statistical measurements that include, but are vary widely due to the rate of mutation that drives evolu not limited to, a given values average, variance, standard tionary diversification. Hence, the residues within a con deviation, range, maximum, and minimum. For example, served domain that are highly conserved are thought to be high variance or standard deviation measurements of a important contributors to the overall structure, or the 35 certain value implies high variability of a certain value of a architecture, of the domain. Among the residues that are less residue or a group of residues, and thus a low degree of conserved are those that contribute to the specificity of the conservation, and vice versa. individual domain of the group. From the measurements that are made on a set of data, it Conserved domains, however, can also show very little is possible to make predictions for the suitability of residues, sequence homology and yet have conserved structures, such 40 or groups of residues, in related domains, polypeptides of as, for examples, leucine Zippers (Alber T. Curr. Opin. polypeptide complexes that are, and that are not, present in Genet. Dev.; Vol. 2(2): pp. 205–10, 1992). Since a conserved the sample. A residue that is highly conserved in a sample of structure also yields structurally conserved residues, the related polypeptides with regard to one or more relevant sets distinction between the above described architectural and of data has a high likelihood of having similarity in all specificity determining residues can also be made in the 45 individual polypeptides including those not present in the absence of sequence conservation. For the purposes of the sample. Therefore, using statistical analyses to identify instant invention, a conserved domain is defined, depending important residues and/or to determine which residues are on the availability of data, either by sequence homology, Suitable for modification, lends this methodology a higher which can be as low as 5% identity or similarity, or by the degree of generally applicability. group of domains structure or functionally. 50 5.6.2. Alignment of Conserved Residues Potential applications of this methodology include, but Alignment of the two-dimensional sequences of con are not limited to, structure-function analyses of polypep served domains reveals further that between conserved tides or polypeptide complexes, that include, for example, residues there are frequently interspersed by chains of but are not limited to, determining the importance of one of varying lengths, i.e. there are varying numbers of amino acid 55 more side-chains of a residue or a group of residues in either residues between conserved residues important for the over the active site of an enzyme, the protein-protein interaction all structure of the domain. In order to be able to compare Surface of a polypeptide or polypeptide complex, the Sub the sequences of individual domains to determine where to strate binding pocket of an enzyme, and/or the binding direct the cross-link reaction to, it is essential that the pocket of an inhibitor. sequences are aligned in Such a way that amino acids that 60 Furthermore, as described below, this methodology can be correspond structurally to one another are compared. For applied to identify residues or groups of residues that are residues identified from amino acid and nucleotide sequence suitable for modifications that include, but are not limited to, analyses as highly conserved, this is easily accomplished. the Substitution of one or more amino acids (for example, by point-directed mutagenesis) and/or chemical modification. 5.7. Statistical Selection Method 65 Non-limiting examples of Such modifications include Sub Structural comparisons of proteins and protein complexes stitutions of amino acids to cysteines toward the formation can inform toward the identification of important residues, of disulfide bonds; substitution of amino acids to tyrosine US 7,037,894 B2 27 28 and Subsequent chemical treatment of the polypeptide Database of Sequences of Proteins of Immunological Inter toward the formation of dityrosine bonds, as disclosed in est Johnson, G. et al. Weir's Handbook of Experimental detail herein; one or more amino acid Substitutions and/or Immunology I. Immunochemistry and Molecular chemical modification toward generating a binding pocket Immunology, Fifth Edition, Ed. L. A. Herzenberg, W. M. for a small molecule (substrate or inhibitor), and/or the Weir, and C. Blackwell, Blackwell Science Inc., Cambridge, introduction of side-chain specific tags (e.g. to characterize Me... Chapter 6.1–6.21, 1996) that contains, among other molecular interactions or to capture protein-protein interac things, sequences of immunoglobulin molecules (see Sec tion partners). tions 6–8, Examples). Such sequence data is also available The selection of residues and/or residue pairs to which a from Genbank(R). modification can be directed to stabilize a polypeptide or 10 Structural Data polypeptide complex functionally is preferably carried out Three-dimensional structures, as described by atomic by analyzing data on several polypeptide or polypeptide coordinate data, of a polypeptide or complex of two or more complex structures of a group of conserved domains or polypeptides can be obtained in several ways. polypeptides statistically and selecting the residue pairs The first approach is to mine databases of existing struc based on selection criteria, such as those developed and 15 tural co-ordinates for the proteins of interest. The data of described below. solved structures is often available on databases that are easily accessed in the form of three-dimensional coordinates 5.8. Generation and use of Databases (x, y, and z) in Angström (10" m) units. Often this data is 5.8.1. Generating Data Relevant to the Selection Criteria also accessible through the internet (e.g., on-line protein The increasing availability of data concerning the genes, structure database of the National Brookhaven Laboratory). proteins, and other bio-molecules of many living species, The second utilizes diffraction patterns (by for example, make it possible to compile a significant amount of data on but not limited to X-rays or electrons) of regular 2- or several protein domains/modules for statistical analyses to 3-dimensional arrays of proteins as for example used in the make predictions, as described above. This data can be 25 field of X-ray crystallography. Computational methods are transformed into data that can be utilized for Such analyses used to transform such data into 3-dimensional atomic directly. co-ordinates in real space. Such transformations can, for instance, be done by con The third utilizes Nuclear Magnetic Resonance (NMR) to verting nucleotide data into amino acid sequence data, and determine inter-atomic distances of molecules in solution. further by converting amino acid sequence data into numeric 30 Multi-dimensional NMR methods combined with computa data concerning the physical properties of the amino acids tional methods have succeeded in determining the atomic side-chains of a given residue. Such properties, for instance, co-ordinates of polypeptides of increasing size. A fourth can be the charge or the degree of hydrophobicity of a approach consists entirely of computational modeling. Algo residue's side-chains (see below). rithms may be based on the known physio-chemical nature Furthermore, structural data of a polypeptide or of two or 35 of amino-acids and bonds found in proteins, or on iterative several polypeptides in a complex can be transformed into approaches that are experimentally constrained, or both. An numeric data that describes the structural relationship of the example of software is the CNS program developed by Axel individual residues with the other residues of the polypep Brunger and colleagues at the HHMI at Yale University tide or those of the other polypeptide(s) in the complex. An (Adams P. D. et al. Acta Crystallogr. D. Biol. Crystallogr., example for Such a transformation would be the calculation 40 vol. 55 (Pt 1): pp. 181–90, 1999). of the distances between the alpha carbons of a residue pair Functional Data using three-dimensional coordinate data derived from crys Functional data is not as easily used, as there is no tallographic resolution of a polypeptide's or a complex uniform way of standardizing and compiling it, such as structure using Pythagorean three-dimensional geometry. nucleotide or amino acid sequence data, or coordinates for It is possible to generate many different sets of data structural data. It is generated in many different ways, Such relevant for the stabilization according to the procedure of as genetic, biochemical, and mutational analyses, molecular this invention concerning many of the structural features of biological dissection and the construction of chimerical the residues and residues pairs of a domain or a complex. As domains. In many cases the data available is not always often more qualitative judgements are required to determine clearly interpretable and therefore its use becomes less the reliability of the selection inputs, it also becomes a more 50 clearly delineated. But when available, functional data pro qualitative decision how many different sets of data should vides valuable information concerning the specificity and be used in the identification or selection of residues or functionality of a domain/module, and where possible is groups of residues. The less reliable the inputs, the more preferably incorporated into the selection process. useful it is to implement additional information in the Functional data is preferably also generated after the selection. 55 cross-link reaction according to the present invention to 5.8.2. Data Sources ensure that the predictions made were accurate for the Sequence Data specific application, and that the polypeptide or polypeptide The most direct way of accumulating sequences is by complex actually retained its functionality and specificity. cloning and sequencing cDNAs of proteins that contain the 60 5.8.3. Construction of Databases domains/modules of interest. Sequence data is becoming 3-D Database more and more available through the efforts of the genome A database of structural information including the atomic projects. Much of the sequence data is available in databases coordinate data of crystallographically solved polypeptides that can be accessed through the internet, or otherwise, and and polypeptide complexes of a group of conserved furthermore there are several published sources that have 65 polypeptides or domains and their ligands, and derivative, accumulated sequences of specific domains/modules. One relevant data is compiled. Input data is derived from struc Such collection of specific sequence data is the Kabat tural coordinate data files. Data relevant to the selection US 7,037,894 B2 29 30 process in this database is derived from coordinate data by Database of Functional Data applying coordinate geometry in three dimensions. This Where it is possible to obtain functional data that indi database preferably contains, for example, in addition to the cates the importance of a residue/residue pair for the structural coordinate data, the following, relevant data polypeptide's or polypeptide complex overall structure together with statistical measurements (e.g. mean, median, 5 and/or specificity, it is preferably incorporated into the mode, standard deviation, maximum, and minimum) on selection process, as it enhances the accuracy of the statis each of the following features for each residue pair, whereby tical predictions made. Such data is preferably quantified, to the sample polypeptides or polypeptide complexes are whatever degree possible, with respect to individual residues aligned as described above. and/or residue pairs of a polypeptide or complex, or with 1. Inter-chain alpha carbon to alpha carbon distances of the polypeptide pair(s) of a polypeptide or complex, in order 10 respect to Sub-domains or domains that mediate protein to find residue pairs that are appropriately spaced for a folding or protein-protein interactions, and compiled in a tyrosyl-tyrosyl bond to be formed. These distances are suitable database. calculated by, for instance, but not limited to, applying 5.8.4. Required Sample Size (N) Pythagorean geometry to the 3D coordinates of the alpha Often the availability of data is limiting for this approach. carbons. For every residue pair statistical measurements are 15 However, to make statistical measurements on a sample of calculated, such as the average, standard deviation, range polypeptides or polypeptide complexes in order to identify and median of corresponding alpha carbon-alpha carbon residues or select residues or groups of residues for distances. modification, it is best to use a large sample, as it will yield 2. The three angles, p, up and X (FIG. 2C) in relation to more accurate predictions. But often it is very labor which the side-chains of each residue pair are oriented intensive accumulating and/or aligning the data in Such a toward each other relative to the inter-chain alpha carbon way that measurements become meaningful (see above). alpha carbon axes, are calculated from the coordinates of the Since there is always a limited range of values, and since alpha and beta carbons of each pair for each polypeptide or therefore their variability is also limited, accurate predic polypeptide complex in the sample. The angles are calcu tions can also be made from Smaller sets of data. A sample lated by defining two planes, each of which are defined by 25 with more than 15 individual structures, sequences or func both alpha carbon positions and one of the beta carbons tional units is preferable. positions. By applying analytical geometry, each of the However, previously methods have been used to position angles in the alpha carbons (Scalar products), and the angle other cross-links, such as di-sulfide bonds, by examining formed by the planes (vector products) are calculated. Sta only the one polypeptide or complex in which the point tistical measurements are also made from this set of data, as 30 mutations are to be made, and this has resulted in functional described for the alpha carbon spacing. complexes (Pastan et al., U.S. Pat. No. 5,747,654 issued The difference between the alpha carbon distance (i.e. the May 5, 1998). Therefore it is possible to make predictions backbone carbon distance) and the beta carbon distance (i.e. that can be accurate on a small sample. However, in order to the distance between the first carbons in each side chain) of make predictions based on statistics that include Such mea each residue pair can also be calculated as a proxy of the 35 Surements as standard deviations, it is not meaningful to use orientation of the side chains relative to each other (see a sample size less than three (a standard deviation on 2 below). points of data is not a meaningful measurement). Therefore 2-D Database the minimum of a sample size is three for any statistical A database of DNA or amino acid sequences of polypep analyses. tides or polypeptides involved in complexes of a kind, 40 including residue side-chain usage from sequence data and 5.9. Selection Process derivative, relevant data is compiled. Data relevant to the selection process in this database is derived from sequence 5.9.1. Selection Criteria for Amino Acid Substitutions data by applying a numeric value representing the physical Structural Suitability properties of every occurring amino acid side chain at each 45 The object of such analyses is to determine which resi residue, whereby the sample polypeptides or polypeptide dues pairs will be most suited for the cross-link reaction in complexes are aligned as described above. This database order to main the structure, function, and specificity of a contains, for example, in addition to sequence data, the polypeptide or polypeptide complex. Therefore, many of the following, relevant data together with statistical measure criteria the residue pairs are selected for relate to the pairs ments (e.g. mean, median, mode, standard deviation, 50 potential to accommodate two cross-linked reactive side maximum, and minimum) on each of the following features chains without distorting the peptide-bond backbone and for each residue pair. The statistical measurements can be altering the structure of the polypeptide or complex at made and stored on the occurring amino acids at each positions that enable and define its function and specificity. residue both weighted and un-weighted by the frequency at Measurements that can be made to attain information which the specific side chain occurs at this residue. 55 concerning this potential relate to the determinants of the 1. Numeric data concerning the bulk/volume of residues space available for the reactive side-chains and the bond. side chains, such as, but not limited to, chemical Such measurements include the distance between the residue composition, molecular weight and Van der Waals Volumes pairs alpha-carbons, which are the carbon atoms that are a (Xia X. and Li W. H.; Richards, F. M.). part of the “backbone' formed by the peptide bonds between 2. Numeric data concerning the polarity of the residues 60 all amino acids of the polypeptide. The selected residue pairs side-chains, such as, but not limited to, charge, isoelectric should have an average alpha-carbon distance dose to the point, and hydrophobicity (Xia X. and Li W. H.; Eisenberg, distance that the alpha-carbons of the cross-linked tyrosyl D.). side-chains would be from each other if point mutations Examples of other amino acid side chain property mea were introduced, and the cross-link reaction were directed to Surements that can be incorporated in Such a database are 65 that residue pair. The selected residue pairs should be so that can be analyzed are aromaticity, aliphaticity, close to the distance of the alpha-carbons of cross-linked hydrogenation, and hydroxythiolation (Xia X. and Li W. H.). tyrosyl side-chains to ensure that the functionality of the US 7,037,894 B2 31 32 polypeptide or polypeptide complex is maintained. The Variability criteria for this selection are described in detail below The specificity of each individual domain and its coun (Selection Process: Determination of the Alpha Carbon terpart in the same protein or in another protein of a complex Distance in the Tyrosyl-tyrosyl Bond, The Filters). Since the is generally determined by residues that are less, or not, variability of a residue pair's structural characteristics is also conserved. Therefore, considering the specificity of an indi an important criterion in the selection of suitable residue pairs for the cross-link reaction (see below), the required vidual domain, a residue with high variability can be a less proximity to the optimal distance is calculated for each desirable choice to which to direct the cross-link reaction. residue pair, dependent on the variability of its alpha-carbon However, considering the overall structure and architecture distances in the sample. The calculation of this requirement of a domain, the architecture of the domain can more likely is also described in detail below (Selection Process: The 10 accommodate a mutation at a residue that exhibits a high Filters). degree of variability. Thus, from this perspective, high Measurements can also be made to determine whether the variability indicates that a residue is a better candidate at protein will fold in such a way that the reactive side-chains which to introduce a point mutation, and place a reactive will be directed toward each other. Selection criteria can be side-chain. developed based on the angles of the reactive side-chains 15 Depending on the reliability and accuracy of these and of the cross-link, the rotational freedom of the reactive analyses, which, in turn, depends on the reliability of the side-chains, and measurements concerned with the three inputs into the analyses (see below), it is possible to vary the dimensional geometrical relationship between the alpha requirement for a positions, or a pair's variability (which carbons and the beta-carbons of each residue pair. The beta indicates a certain degree of flexibility and/or robustness). carbon is the first carbon atom of the amino acid side-chains Thus, if the inputs are highly accurate, and Sufficient data is not part of the backbone. Such selection criteria are present in the sample, it is possible to determine that a described in detail below (Selection Process: Calculations of residue pair is highly Suitable for the reaction although its Side-chain Angles in the Tyrosyl Bond, The Filters). The variability is low. However, in cases where there is insuffi Smallest amino acid, glycine, does not have a beta-carbon, cient data or insufficient accuracy in the inputs for the and therefore residue pairs of which one or both of the amino 25 analyses to allow for low variability, a residue that is acids is a conserved glycine cannot be analyzed in this way. important for the specificity, but not for the overall archi Since mutation of a conserved glycine would likely lead to tecture of the domain may be selected. In the absence of a significant structural distortion, residue pairs of which one functional data it is very difficult to determine a residues or both residues are a conserved glycine are eliminated. This contribution to the specificity of the domain. selection criterion is also described in detail below 30 (Selection Process: The Filters). Furthermore, the structural 5.9.2. Determination of the Alpha Carbon Distance in the context of the residue pair is preferably considered to Tyrosyl-Tyrosyl Bond ascertain the availability of three-dimensional space for the As stated above, selected residue pairs should have an reactive side-chains and the bond. The relevantamino acid average alpha-carbon distance close to the distance of the side-chain characteristics of proximal residues therefore are 35 alpha-carbons of cross-linked tyrosyl side-chains. The range preferably taken into account, to further substantiate that the of distances that is possible between the alpha carbons of reactive side-chains will be able to rotate such that the bond two cross-linked tyrosines is calculated for the epsilon can be formed without distorting the polypeptide backbone. epsilon bonded isoform of the cross-link by applying stan If the context is such that the reactive side-chains introduced dard geometry, Pythagorean geometry, and trigonometry. The calculations are based on all carbon-carbon bonds by point mutation will not be able to rotate freely into the 40 desired position, the bond will either not readily be formed, dityrosine bond forming 120 degree angles due to the planar or distortions will occur that could potentially impair or alter structure of the aromatic ring with the exception of the angle the function and/or specificity of the polypeptide or polypep in the beta carbon, which forms the tetrahedral angle of tide complex. Therefore, selection criteria are developed to 109.5 degrees (FIG. 2A). allow more conservative point mutations to be introduced 45 Furthermore, these calculations take into consideration that will be less likely to cause structural distortions. Such that the structure of the dityrosine has significant degrees of criteria are based on the amino acids present at, and rotational freedom, and that therefore the distance between Surrounding, the residues of a pair, and are quantified based the alpha carbons of the two tyrosines can be quite different on numeric values of the physical properties of those amino depending on its conformation. Specifically, the rotational acid side-chains. The calculation of Such requirements is 50 freedoms in the beta carbon-gamma carbon bonds, and the described in detail below (Selection Process: The Filters). rotational freedom in the bond linking the aromatic rings are If a suitable residue pair can be identified that is already considered. Other isoforms of the cross-link are, however, an appropriated reactive amino acid on both chains at Some possible, which would enable even closer distances between frequency in the sample, this pair would be an ideal selec the alpha-carbons of the dityrosine, which is further taken tion. However, reactive side-chains present in the polypep 55 into consideration in setting the possible ranges in the tides or polypeptides of the complex to be cross-linked that selection process of the residue pairs, as described below in would cause structural distortions by forming either inter- or the “Filters’. intra-chain bonds should be neutralized, either by a means of The angle X in FIG. 2C is the angle formed by the two masking/protecting them (Politt S. and Schultz, P. Agnew. planes, each defined by the alpha carbon-alpha carbon axis, Chem. Int. Ed.; vol. 37(15): pp. 2104–2107, 1998) or by 60 and individually by the positions of each of the beta carbons introducing maximally conservative point mutations. Such of the two tyrosyl side-chains involved in the bond. The reactive residue pairs are identified using the same criteria as angle (), determined by the rotational freedom in the dity for the positive selection of residue pairs suitable for cross rosine bond itself, is 120° in FIG. 3, and -120° in FIG. 4. linking. However, the presence of undesirable side-chains The schematic depictions of possible bond configurations can only be determined by analyzing the specific sequence 65 for an angle () of 120° in FIG.3 represent an angle X of 180°. of an individual domain, and by comparing it with the at which both the maximal and minimal angles are in the structural information used for the positive selection. projected plane. The schematic depictions of possible bond US 7,037,894 B2 33 34 configurations for an angle () of 120° in FIG. 4 represent an “filters' aimed at successively narrowing down the residue angle X of 0°, at which both the maximal and minimal angles pairs most likely to result in an inter-chain cross-linked are in the projected plane. tyrosine pair of a polypeptide or polypeptide complex that For an angle () of 120° and an angle X of 180°, and in the minimally alters the polypeptide's or polypeptide complex configuration at which the alpha carbon distance is at a structural characteristics. minimum (FIG.3A), the alpha carbon distance is 1174 A; in Where it is not possible or inconvenient to obtain the the configuration, in which the alpha carbon distance is at a required data for statistical analyses, residue pairs can also maximum (FIG. 3B), the alpha carbon distance is 9.56 A. be selected in any other way, including, for example, trial For an angle () of -120° and an angle X of 180°, and in the configuration at which the alpha carbon distance is at a and error. Such selection processes yield residue pairs to minimum (FIG. 4A), the alpha carbon distance is 10.73 A: 10 which the cross-link can be directed while maintaining the in the configuration, in which the alpha carbon distance is at functionality of the polypeptide or polypeptide complex. a maximum (FIG. 4B), the alpha carbon distance is 5.70 A. An example of a successive set of filters is the following: 5.9.3. Calculations of Side-chain Angles in the Tyrosyl 1. Selection based on residue pair alpha carbon spacing, Bond based on (1) the calculated maximal and minimal distances The angles (p and up (FIG. 2C) are the angles in each of the 15 in a cross-linked tyrosine pair (see above), and (2) the alpha carbon atoms between the alpha carbon-alpha carbon distances measured and compiled in a 3-D database. The axis and the alpha carbon-beta carbon bond. They are selection is carried out on the average, median, mode, or any calculated for the maximum and minimum distances other statistical value suitable to determine whether the pair between the alpha carbon atoms based on the rotational is likely to be spaced in such a way that the cross-link will flexibility of the carbon-carbon bonds in the beta carbon minimally distort the overall structure. The optimal range of atOm. residue pair alpha carbon distances to be selected is deter The schematic depictions of possible bond configurations mined by averaging first the minimal distances in a cross for an angle () of 120° in FIG.3 represent an angle X of 180°. linked tyrosine pair of the isoform depicted in FIG. 2B for at which both the maximal and minimal angles are in the () angles of 120° and -120°, and then, analogously, aver projected plane. The schematic depictions of possible bond 25 configurations for an angle () of 120° in FIG. 4 represent an aging the maximal distances, as calculated above. These angle X of 0°, at which both the maximal and minimal angles calculations result in the following optimal range: are in the projected plane. Min: 7.63 A, Max: 11.24 A. For an angle () of 120° and an angle X of 180°, and in the configuration at which the alpha carbon distance is at a 30 Since distances are possible in a larger range, and because minimum (FIG. 3A), the angles (p and up are maximal and other isoforms are also possible that would allow for con equal at approximately 77.1; in the configuration, in which figurations with Zero distance, the average between a Zero the alpha carbon distance is at a maximum (FIG. 3B), the distance and the minimal distance between alpha carbons for angles (p and up are minimal and equal, at approximately either angle () provides the lower limit and the maximal 34.5°. 35 distance between alpha carbons for either angle () provides For an angle () of -120° and an angle X of 0°, at which the upper limit of the preferred range. Therefore, the pre the alpha carbon distance is at a minimum (FIG. 4A), the ferred range is: angles (p and up are maximal and equal at 130.5°; in the configuration, in which the alpha carbon distance is at a Min: 2.85 A, Max: 11.74 A maximum (FIG. 3B), the angles (p and up are minimal and 40 equal, at 10. Furthermore, it has been demonstrated in several cases that a protein structure can often absorb a certain amount of Differences in the alpha-alpha and beta-beta distances structural changes, and that the specificity and functionality As a proxy to the orientation of the side-chains, the is nonetheless maintained. It is therefore also possible, difference in the alpha-alpha and beta-beta distances though less preferred, to introduce the reactive side-chains ("alpha-beta distance difference') and its range are calcu 45 into residue pairs that are spaced even beyond the preferred lated again based on the extremes of alpha carbon spacing range. Given this degree of structural flexibility the largest for angles () of 120° and -120° (FIGS. 3 and 4). The range possible is: maximum and minimum of the alpha-beta distance differ ence is calculated for both () angles at which the both Min: 0 A, Max: 13.74 A. aromatic rings of the tyrosyl side-chains are in the same 50 plane, and at which the alpha-beta distance difference is at 2. Selection based on positional flexibility is carried out, its extremes. This difference is calculated by subtracting as examples, on the measured/calculated Standard deviations twice the length a from twice the length b in FIGS. 3 and 4. or ranges of the alpha-carbon distances in the sample, or any For an angle () of 120° (FIG. 3), and in the configuration, other statistical measure that quantifies the variability of the at which the alpha carbon distance is maximal, the alpha 55 pairs distances measured/calculated and compiled in a 3-D beta distance difference is 2.37 A, in the configuration, at database. The range for this selection is preferably set in which the alpha carbon distance is minimal, the alpha-beta Such a way that the average measured alpha-carbon distance distance difference is 0.19 A. For an angle () of -120° (FIG. of the selected residue pairs is within less than one standard 4), and in the configuration, at which the alpha carbon deviation of the preferred range. However, 2 standard devia distance is maximal, the alpha-beta distance difference is 60 tions are also possible as a selection criterion. 3.03 A; in the configuration, at which the alpha carbon 3. Selection based on side-chain orientation, determined distance is minimal, the alpha-beta distance difference is either by calculating the three-dimensional angles relative to -2.00 A. the alpha-carbon-alpha carbon axis (up, p, and X angles, as described in FIG. 2C), or by calculating a proxy, e.g. an 5.10. The Filters 65 estimate of the orientation based on the alpha-beta distance In cases where sufficient data is available, the selection difference described above. The selection is carried out on process preferably consists of a series of statistical tests or the average, median, mode, or any other statistical value of US 7,037,894 B2 35 36 the angles, or the proxy, suitable to determine whether the 4. The flexibility of the side-chains orientation toward side-chains of the pair are likely to be oriented such that the each other is measured on the standard deviation or range of cross-link will minimally distort the overall structure. the sample, as examples, or any other statistical measure that The angle X can vary by 360°, and the bond is still quantifies the variability of the side-chains of the pairs possible without any distortion of the structure, so long as 5 measured and compiled in a 3-D database. The range for this the angles up and (p adjust correspondingly. Therefore, the selection is preferably set in Such a way that the average selection range based on the angle X should be set by a measured alpha-beta distance difference of the selected metric driven by the angles up, (p, and X with a degree of residue pairs is within less than one standard deviation of the flexibility similar to that for the angles up and p, or for the preferred range. However, 2 standard deviations are also alpha-beta distance difference, the range for which is 10 possible as a selection criterion. described below. 5. Pairs that contain one or both residues that are at least The range for the angles up, (p is, analogous to the optimal 95% or more, preferably 80% or more, possibly also 50% or range of alpha carbon distances in Filter 1, optimally more conserved among the domains in the sample are between the averages of the extreme values calculated for eliminated, as they are likely to be important for the overall the isoform of the dityrosine pair depicted in FIG. 2B, and 15 architecture of the domain, e.g. cysteines in the formation of for () angles of 120° and 120°. This optimal range is thus di-sulfide bonds, in the formation of leucine between: Zippers, etc. 6. Side-chain physical properties, e.g. charge, Min: 22.49, Max: 103.80°. hydrophobicity, van der Waals volumes, molecular weight, - - - - 2O etc. The selection is carried out on the average, median, Since these angles are possible in a larger range even mode, or any other statistical value of these properties, within this one isoform of the dityrosine bond, and since the individually or combined, suitable to determine whether the above optimal range is often too restrictive, the minimal mutations to tyrosine and the cross-link between a residue angle for either angle () provides the lower limit and the pair will minimally distort the overall structure. The degree, maximal angle for either angle () provides the upper limit of to which a residue is conserved, is measured by the standard the preferred range. Therefore, the preferred range is: 25 deviation or range, as examples, or any other statistical Min: 10.5°, Max: 130.5°. measure of the sample that quantifies the variability of the side-chains physical properties which are measured and Furthermore, it has been demonstrated in several cases compiled in a 2-D database. that a protein structure can often absorb a certain amount of 30 The range can be set, as an example, in the following structural changes, and that the specificity and functionality manner: the value of a physical property for a tyrosine pair is nonetheless maintained. It is therefore also possible, (2xvalue of tyrosine) is compared with the combined value though less preferred, to introduce the reactive side-chains of both residues of a pair, and the difference is obtained by into residue pairs that have angles up and (p even beyond the subtraction. The difference is then compared with the com preferred range. Given this degree of structural flexibility the 35 bined standard deviations of the residue pair. A multiple largest range possible is: smaller than 2 of the combined standard deviations should make up for the difference between the value of a tyrosine Min: O, Max: 140°. pair and the combined averages of the residue pair. The optimal range of residue pair alpha carbon distances However, more direct or intuitive measures, as well as more 40 Sophisticated and accurate measures, can also be used to to be selected is determined by averaging first the minimal score and select for physical properties of residue pairs. alpha-beta distance difference in a cross-linked tyrosine pair 7. Elimination of pairs of which one or both residues are of the isoform depicted in FIG. 2B, and for () angles of 120° at a minimum 90% or more, conserved , preferably and 120°, and then, analogously, averaging the maximal 60% or more. Glycine is the smallest of the amino acids and alpha-beta distance difference, as calculated above. This has no beta carbon. Glycine is often associated with turns in these calculations result in the following optimal range: protein structures, and Substitution of a glycine with one of Min: 0.90 A, Max: 2.70 A. the largest amino acids, tyrosine, would likely have too great an impact on the overall structure. Since distance differences are possible in a larger range, 8. The above structural and/or amino acid side-chain and since the above optimal range is often too restrictive, the 50 conservation and/or physical properties of residues/residue minimal alpha-beta distance difference for either angle () pairs proximal to each residue/residue pair. ProXimity can be provides the lower limit and the maximal alpha-beta dis determined with regard to both the polypeptide sequences tance difference for either angle () provides the upper limit (2-D) and the overall structure of the polypeptide or of the preferred range. Therefore, the preferred range is: polypeptide complex (3-D). 55 9. Functional properties concerning the effect of a residue/ Min: -2.00 A, Max: 3.03 A. residue pair on the functionality and/or specificity of the Furthermore, it has been demonstrated in several cases polypeptide or polypeptide complex. that a protein structure can often absorb a certain amount of 5.10.1. Incorporation of Data Derived from Modeling structural changes, and that the specificity and functionality Particularly in embodiments of the instant invention, in is nonetheless maintained. Furthermore, other isoforms of 60 which a single polypeptide is stabilized, such as, for the dityrosine bond are possible. It is therefore also possible, example, a peptide growth factor or a biocatalyst, any of the though less preferred, to introduce the reactive side-chains known methods in the art may be employed to calculate into residue pairs that have alpha-beta distance difference and/or compute the effects of the mutations and/or the even beyond the preferred range. Given this degree of cross-link on the structure, stability, activity, or specificity of structural flexibility the largest range possible is: 65 the resultant polypeptide. One example of such a software package is the above mentioned CNS (Adams P. D. et al. Min: -2.75 A, Max: 3.08 A. Acta Crystallogr. D. Biol. Crystallogr.: Vol. 55 (Pt 1): pp. US 7,037,894 B2 37 38 181–90, 1999) using the CHARM energy minimization and mast cells (Leder A. et al. Cell: Vol. 45: pp. 485-495, plug-in. Data derived from Such analyses may be used to 1986), albumin gene control region which is active in liver further narrow down the selection or residue pairs, and may (Pinkert C. A. et al. Genes Dev.; vol. 1: pp. 268–276, 1987), also be used to inform the settings of the selection alpha-fetoprotein gene control region which is active in liver parameters, such as, for example, the selection ranges. (Krumlauf R. et al. Mol. Cell. Biol.; vol. 5: pp. 1639–1648, 5.10.2. Minimally Required Filters for Selection 1985); alpha 1-antitrypsin gene control region which is Depending on the nature of the polypeptide or polypep active in the liver (Kelsey G. D. et al. Genes Dev.; Vol. 1: pp. tide complex, and on the availability of data, a subset of 161-171, 1987), beta-globin gene control region which is filters can, however, suffice to select a suitable pair for the active in myeloid cells (Magram J. et al. Nature; vol. 315: cross-link reaction. For instance, a filter based on the aver 10 pp. 338-340, 1985); myelin basic protein gene control age of residue alpha carbon spacing (Filter 1, above) can be region which is active in oligodendrocyte cells in the brain used alone. It is also possible to make a selection using the (Readhead C. et al. Cell: vol. 48: pp. 703–712, 1987); above filters 6 and 7, both based on the degree to which myosin light chain-2 gene control region which is active in residues are conserved, if structural data is available for at skeletal muscle (Shani M. Nature; vol. 314: pp. 283–286, least one structure of Such a polypeptide or polypeptide 15 1985), and gonadotropic releasing hormone gene control complex. Any one or more of the above filters, and any region which is active in the hypothalamus (Mason A. J. et combination thereof can be used for the selection. al. Science; vol. 234: pp. 1372–1378, 1986). The order of the filters is not of importance. Furthermore, In a specific embodiment, a vector is used that comprises where it would add to the quality of the selection, the above a promoter operably linked to a gene nucleic acid, one or filters can be split in to two or more filters to stress certain more origins of replication, and, optionally, one or more aspects of the filter. Filters can additionally be combined by selectable markers (e.g., an antibiotic resistance gene). In designing metrics that quantify several criteria simulta bacteria, the expression system may comprise the lac neously. Thereby, for instance, the selection can be refined response system for selection of bacteria that contain the further by selecting one criterion taking the value of another vector. Expression constructs can be made, for example, by criterion into account. 25 Subcloning a coding sequence into one the restriction sites of each or any of the pGEX vectors (Pharmacia, Smith D. B. 5.11. DNA Vector Constructs and Johnson K. S. Gene; vol. 67: pp. 31–40, 1988). This The nucleotide sequence coding for the polypeptide, or allows for the expression of the protein product. for one, any, both, several or all of the polypeptides of a Vectors containing gene inserts can be identified by three complex, or functionally active analogs or fragments or 30 general approaches: (a) identification of specific one or other derivatives thereof, can be inserted into an appropriate several attributes of the DNA itself, such as, for example, expansion or expression vectors, i.e., a vector which con fragment lengths yielded by restriction endonuclease tains the necessary elements for the transcription alone, or treatment, direct sequencing, PCR, or nucleic acid hybrid transcription and translation, of the inserted protein-coding ization; (b) presence or absence of “marker gene functions; sequence(s). The native genes and/or their flanking 35 and, where the vector is an expression vector, (c) expression sequences can also Supply the necessary transcriptional of inserted sequences. In the first approach, the presence of and/or translational signals. a gene inserted in a vector can be detected, for example, by Expression of a nucleic acid sequence encoding a sequencing, PCR or nucleic acid hybridization using probes polypeptide or peptide fragment may be regulated by a comprising sequences that are homologous to an inserted second nucleic acid sequence so that the polypeptide is 40 gene. In the second approach, the recombinant vector/host expressed in a host transformed with the recombinant DNA system can be identified and selected based upon the pres molecule. For example, expression of a polypeptide may be ence or absence of certain “marker gene functions (e.g., controlled by any promoter/enhancer element known in the thymidine kinase activity, resistance to antibiotics, transfor art. mation phenotype, occlusion body formation in baculovirus, Promoters which may be used to control gene expression 45 etc.) caused by the insertion of a gene in the vector. For include, as examples, the SV40 early promoter region, the example, if the gene is inserted within the marker gene promoter contained in the 3' long terminal repeat of Rous sequence of the vector, recombinants containing the insert sarcoma, the herpes thymidine kinase promoter, the regula an identified by the absence of the marker gene function. In tory sequences of the metallothionein gene; prokaryotic the third approach, recombinant expression vectors can be expression vectors such as the B-lactaminase promoter, or the 50 identified by assaying the product expressed by the recom lac promoter, plant expression vectors comprising the nopa binant expression vectors containing the inserted sequences. line synthetase promoter or the cauliflower mosaic virus 35S Such assays can be based, for example, on the physical or RNA promoter, and the promoter of the photosynthetic functional properties of the protein in in vitro assay systems, enzyme ribulose biphosphate carboxylase; promoter ele for example, binding with anti-protein antibody. ments from yeast or other fungi such as the Gal 4 promoter, 55 Once a particular recombinant DNA molecule is identified the alcohol dehydrogenase promoter, phosphoglycerol and isolated, several methods known in the art may be used kinase promoter, alkaline phosphatase promoter, and the to propagate it. Once a suitable host system and growth following animal transcriptional control regions, which conditions are established, recombinant expression vectors exhibit tissue specificity and have been utilized in transgenic can be propagated and prepared in quantity. Some of the animals: elastase I gene control region which is active in 60 expression vectors that can be used include human or animal pancreatic acinar cells (Swift et al. Cell: Vol. 38: pp. viruses such as vaccinia virus or adenovirus; insect viruses 639-646, 1984); a gene control region which is active in Such as baculovirus; yeast vectors; bacteriophage vectors pancreatic beta cells (Hanahan D. Nature; vol. 315: pp. (e.g., lambda phage), and plasmid and cosmid DNA vectors. 115-122, 1985), an immunoglobulin gene control region Once a recombinant vector that directs the expression of which is active in lymphoid cells (Grosschedl R. et al. Cell: 65 a desired sequence is identified, the gene product can be vol. 38: pp. 647–658, 1984), mouse mammary tumor virus analyzed. This is achieved by assays based on the physical control region which is active in testicular, breast, lymphoid or functional properties of the product, including radioactive US 7,037,894 B2 39 40 labeling of the product followed by analysis by gel of a different protein). Such a chimeric product can be made electrophoresis, immunoassay, etc. by ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences to each other by methods 5.12. Systems of Gene Expression and Protein known in the art, in the proper coding frame, and expressing Purification the chimeric product by methods commonly known in the A variety of host-vector systems may be utilized to art. Alternatively, such a chimeric product may be made by express the protein-coding sequences. These include, as protein synthetic techniques, for example, by use of a examples, mammalian cell systems infected with virus (e.g., peptide synthesizer. vaccinia virus, adenovirus, etc.); insect cell systems infected The polypeptides of a complex may be expressed together with virus (e.g., baculovirus); microorganisms such as yeast 10 in the same cells either on the same vector, driven by the containing yeast vectors, or bacteria transformed with same or independent transcriptional and/or translational bacteriophage, DNA, plasmid DNA, or cosmid DNA. The signals, or on separate expression vectors, for example by expression elements of vectors vary in their strengths and cotransfection or cotransformation and selection, for specificities. Depending on the host-vector system utilized, example, may be based on both vectors individual selection any one of a number of Suitable transcription and translation 15 markers. Alternatively, one, any, both, several or all of the elements may be used. polypeptides a complex may be expressed separately; they In a specific embodiment, the gene may be expressed in may be expressed in the same expression system, or in bacteria that are protease deficient, and that have low different expression systems, and may be expressed indi constitutive levels and high induced levels of expression vidually or collectively as fragments, derivatives or analogs where an expression vector is used that is inducible, for of the original polypeptide. example, by the addition of IPTG to the medium. 5.13. The Cross-Link Reaction In yet another specific embodiment, the polypeptide, or one, any, both, several or all of the polypeptides of a 5.13.1. Introduction of Point Mutations to Phenylalanine complex may be expressed with signal peptides, such as, for One of the codons of every tyrosine residue pair that may example, pelB bacterisl signal peptide, that directs the 25 react with each other and cause undesirable structural and/or protein to the bacterial periplasm (Lei et al J. BacteroL, Vol. functional distortions is preferably point mutated to codons 169: pp. 4379, 1987). Alternatively, protein may be allowed that direct the expression of phenlyalanine. to form inclusion bodies, and subsequently be resolubilized Point mutations can be introduced into the DNA encoding and refolded (Kim S. H. et al. Mol. Immunol. Vol. 34: pp. the polypeptide, or one, any, both, several or all of the 891, 1997). 30 polypeptides of a complex by any method known in the art, In yet another embodiment, a fragment of the polypeptide, such as oligonucleotide-mediated site-directed mutagenesis. or one, any, both, several or all of the polypeptides a Such methods may utilize oligonucleotides that are homolo complex comprising one or more domains of the protein is gous to the flanking sequences of Such codons, but that expressed. Any of the methods previously described for the 35 encode tytosine at the selected site or sites. With these insertion of DNA fragments into a vector may be used to oligonucleotides, DNA fragments containing the point muta construct expression vectors containing a chimeric gene tion or point mutations are amplified and inserted into the consisting of appropriate transcriptional/translational con gene or genes, for example, by Subcloning. One example of trol signals and the protein coding sequences. These meth such methods is the application of the QuikChangeTM Site ods may include in vitro recombinant DNA and synthetic 40 Directed Mutagenesis Kit (Stratagene Catalog #200518); techniques and in Vivo recombinants (genetic this kit uses the Pfu enzyme having non-strand-displacing recombination). action in any double stranded plasmid mutation in PCR In addition, a host cell Strain may be chosen that modu reactions. Other methods may utilize other enzymes Such as lates the expression of the inserted sequences, or modifies DNA polymerases, or fragments and/or analogs thereof. and processes the gene product in the specific fashion 45 The plasmid or plasmids containing the point mutation or desired. Expression from certain promoters can be elevated point mutations are, for example, transformed into bacteria in the presence of certain inducers; thus, expression of the for expansion, and the DNA is prepared as described above. genetically engineered polypeptides may be controlled. The isolated, expanded, and prepared DNA may be exam Furthermore, different host cells have characteristic and ined to verify that it encodes the polypeptide or polypeptides specific mechanisms for the translational and post 50 of the complex, and that the correct mutation or mutations translational processing and modification (e.g., were achieved. This may, for example, be verified by direct glycosylation, phosphorylation of proteins. Appropriate cell DNA sequencing, DNA hybridization techniques, or any lines or host systems can be chosen to ensure the desired other method known in the art. modification and processing of the foreign polypeptide(s) 5.13.2. Purification of Gene Products expressed. For example, expression in a bacterial system can 55 The gene product may be isolated and purified by stan be used to produce a non-glycosylated core protein product. dard methods including chromatography (e.g., ion Expression in yeast will produce a glycosylated product. exchange, affinity, and sizing column chromatography), Expression in mammalian cells can be used to ensure ammonium sulfate precipitation, centrifugation, differential “native’ glycosylation of a heterologous protein. solubility, or by any other standard technique for the puri Furthermore, different vector/host expression systems may 60 fication of proteins. effect processing reactions to different extents. The functional properties may be evaluated using any In other embodiments of the invention, the polypeptide, or Suitable assay. The amino acid sequence of the protein can one, any, both, several or all of the polypeptides a complex, be deduced from the nucleotide sequence of the chimeric and/or fragments, analogs, or derivative(s) thereof may be gene contained in the recombinant vector. As a result, the expressed as a fusion-, or chimeric, protein product 65 protein can be synthesized by Standard chemical methods (comprising the protein, fragment, analog, or derivative known in the art (e.g., see Hunkapiller M. et al. Nature; vol. joined via a peptide bond to a heterologous protein sequence 310(5973): pp. 105-11, 1984). US 7,037,894 B2 41 42 5.13.3. The Reaction concentration of the polypeptides, the catalyst, the oxidizing The cross-link reaction can utilize any chemical reaction agent, or any other reagents that are applied toward Such a or physical known in the art that specifically introduces reaction. The catalyst is a small molecule that diffuses easily, dityrosine cross-links, such as peroxidase catalysed cross and can be used at varying concentrations. Tightly packed linking, or photodynamically in the presence or absence of 5 polypeptide hydrophobic cores have a degree of solvent sensitizers (see Section II). Preferably, however, the reaction accessibility. This may be modulated by any known method is catalyzed by a metallo-ion complex, as described in detail in the art, including, but not limited to, by altering the below. reaction temperature, or by the addition of salts, detergents, Partially purified polypeptides containing appropriate deoxycholate, or guanidinium. tyrosine residues may be equilibrated by dialysis in a buffer, 10 such as phosphate buffered saline (PBS), together or sepa 5.14. Achieving a Stabilized Polypeptide or rately before mixing them. The catalyst is then added (on ice Complex or otherwise). The catalyst of the reaction is any compound 5.14.1. Point Mutation to Tyrosine and Gene Product that will result in the above cross-link reaction. The catalyst Purification should have the structural components that convey the 15 specificity of the reaction, generally provided by a structure The codons of the residues identified as a suitable pair to complexing a metal ion, and the ability to abstract an which the cross-link should be directed, as described above, electron from the Substrate in the presence of an oxidizing and selected for a particular embodiment of the instant reagent, generally provided by the metal ion. An active invention, are point mutated Such that the resultant residue metal is encased in a stable ligand that blocks non-specific pairs direct the expression of tyrosyl side-chains. Point binding to chelating sites on protein Surfaces. For example, mutations are introduced as described above. either a metalloporphyrin, Such as, but not limited to, The gene products are again purified as described above. 20-tetrakis (4-sulfonateophenyl)-21H,23H-porphine manga 5.14.2. Cross-Linking the Polypeptide or Complex nese (III) chloride (MnTPPS) or hemin iron (III) protopor The polypeptides now containing tyrosyl side-chains at phyrin IX chloride (Campbell L. A. et al. Bioorganic and 25 the residues to which the cross-link reaction should be Medicinal Chemistry, vol. 6: pp. 1301–1037, 1998), or a directed are subjected to the cross-link reaction under the metal ion-peptide complex, such as the tripeptide NH2-Gly conditions determined as described above and carried out, Gly-His-COOH complexing Ni---- can serve as the catalyst also as described above. The efficiency of the reaction may of the reaction. Metalloporphoryns are a class of oxidative be examined, for example, by Western blotting experiments, ligand-metal complexes for which there are few, if any, high 30 in which a cross-linked complex should run at approxi affinity sites in naturally occurring eukaryotic proteins. The mately the molecular weight of both or all polypeptides of reaction can also be catalyzed by intramolecular Ni---- the complex. If the bond is readily formed under the above peptide complexes, such as and C-terminal amino acids conditions, the strength of the reaction may still be further consisting either of 3 or more histidine residues (his-tag), or adjusted to the minimally required strength. of the above GGH tripeptide. The reaction is initiated by the 35 addition of the oxidizing reagent at room temperature or In embodiments of the invention wherein the cross-link is otherwise. Oxidizing reagents include, but are not limited to, directed to residue pairs that are buried and/or are not readily hydrogen peroxide, oxone, and magnesium monoperXyph accessible to the catalyst or oxidizing reagents, secondary and higher order polypeptide structure can be temporarily thalic acid hexahydrate (MMPP) (Brown K. C. et al. Bio dissociated to permit reagent access. For example, such an chem.; vol. 34(14): pp. 4733–4739, 1995). Higher specific 40 approach may be necessary when directing the cross-link to ity can be achieved by using a photogenerated oxidant, Such the hydrophobic core of a single polypeptide or to a buried as the oxidant used in the process described by Fancy D. and residue pair of polypeptide complex having very high affin Thomas Kodadek, which involves brief photolysis of tris ity among subunits. Any means known in the art may be bipyridylruthenium(II) dication with visible light in the used to reversibly denature polypeptide structure to permit presence of an electron acceptor, such as aminonium per 45 reagent access to buried residue pairs. Such means include, sulfate (Fancy D. A. and Kodadek T. Proc. Natl. Acad. Sci., but are not limited to, manipulating (increasing or U.S.A.; vol. 96: pp. 6020-24, 1999). The optimal reaction decreasing) salt concentration or reaction temperature, or period is preferably determined for each application; employing detergents, or Such agents as guanidine HC1. As however, In cases where an optimization process is not denaturing conditions are withdrawn (e.g., by dialysis) and possible, the reaction should preferably be stopped after one 50 minute. Using a photogenerated oxidant, such as above the polypeptide or complex begins to refold/reassociate, the described, the exposure to light can be less than one second. catalyst and oxidizing reagents may be added, as described The reaction is stopped by the addition of a sufficient amount above. of reducing agent, such as B-mercaptoethanol, to counteract 5.15. Purification of Cross-Linked Complexes and/or neutralize the oxidizing agent. 55 Alternatively, the reaction may be stopped by the addition The cross-linked polypeptide or complex may be isolated of a chelating reagent, such as, for example, EDTA or and purified from proteins in the reaction that failed to EGTA. The solution is again equilibrated by dialysis in a cross-link, or any other undesirable side-products, by stan buffer, such as phosphate buffered saline (PBS), to remove dard methods including chromatography (e.g., sizing col the reagents required for the cross-link reaction, Such as the 60 umn chromatography, glycerol gradients, affinity), oxidizing reagent, the catalyst, or the metal ion, reducing centrifugation, or by any other standard technique for the agents, chelating reagents, etc. The cross-link reaction con purification of proteins. In specific embodiments it may be ditions are preferably adjusted such that the polypeptides or necessary to separate polypeptides that were not cross polypeptides of a complex that have been mutated to remove linked, but that homo- or heterodimerize with other polypep undesirable tyrosyl side-chains no longer form a bond. 65 tides due to high affinity binding. Separation may be These conditions are adjusted by varying the reaction achieved by any means known in the art, including, for temperature, pH, or osmolarity conditions, or by varying the example, addition of detergent and/or reducing agents. US 7,037,894 B2 43 44 Yield of functionally cross-linked polypeptides or com by comparing the specificity of the cross-linked polypeptide plexes can be determined by any means known in the art, for or complex with that of the polypeptide or complex before example, by comparing the amount of Stabilized complex, stabilization, cross-linked or stabilized by another method, purified as described above, with the starting material. or naturally stabilized by a post-translational modification. Protein concentrations are determined by standard Assays for retained specificity can be based, for example, on procedures, such as, for example, Bradford or Lowrie pro enzymatic Substrate specificity, or ELISA-type procedures. tein assays. The Bradford assay is compatible with reducing For example, the retained or resultant specificity of a lipase agents and denaturing agents (Bradford, M. Anal. Biochem.; (carboxylesterase) may be determined by any method vol. 72: pp. 248, 1976), the Lowry assay is better compat known to one skilled in the art. Non-limiting examples of ibility with detergents and the reaction is more linear with 10 Such methods include using a number of fluorogenic alky respect to protein concentrations and read-out (Lowry, O. J. ldiacylglycerols as Substrates for an analysis of the biocata Biol. Chem.; vol. 193: pp. 265, 1951). lyst’s stereoselectivity. For a detailed description of such methods and of certain such compounds, see the article 5.16. Assay of a Cross-linked Polypeptide or “New fluorescent glycerolipids for a dual wavelength assay Complex 15 of lipase activity and stereoselectivity” (Zandonella G. et al., 5.16.1. Retained Function 1997, J. Mol. Catal. B: Enzymn. 3: pp. 127–30). Functionality 5.16.2. Stability Depending on the nature of the polypeptide or polypep In vitro tide complex, retained functionality can be tested, for Stability of the polypeptide or complex may be tested in example, by comparing the functionality of the cross-linked vitro in, for example but not limited to, time-course experi complex, cross-linked as described above, with that of the ments incubating the polypeptide or complex at varying polypeptide or complex before stabilization, cross-linked or concentrations and temperatures. Polypeptide or complex stabilized by another method, or naturally stabilized by a stability may also be tested at various pH levels and under post-translational modification that, for example, regulates 25 various redox conditions. For all of the above conditions, the the association of certain polypeptides. Assays for retained remaining levels of functional polypeptides or polypeptide functionality can be based, for example, on the biochemical complexes is determined by assaying as described above properties of the protein in in vitro assay systems. (Functionality). In the above example of a biocatalyst, Alternatively, the polypeptide or complex can be tested for improved or altered stability of a stabilized polypeptide or functionality by using biological assay systems. For 30 complex can be determined by any method known to one example, the activity of a kinase can be tested in in vitro skilled in the art. Such methods include, but are not limited kinase assays, and a growth factor, Such as a member of the to, calorimetric and/or structural analyses, thermodynamic IL-8 family, can be tested for activity in chemotactic cell calculations and analyses, and comparison of the activities migration assays or beta-glucuronidase release assays of the stabilized and unstabilized enzymes under their opti (Leong S. R. et al. Protein Sci., vol. 6(3): pp: 609–17, 1997). mal conditions and under Suboptimal, or adverse reaction As another example, retained enzymatic activity of a bio 35 conditions, such as higher or lower temperature, pressure, catalyst can be determined by any method known to one pH, salt concentration, inhibitory compound, or enzyme skilled in the art. The activity of an enzyme is preferably and/or Substrate concentration. Any of the above analyses measured directly by comparing the activity of the enzyme may also include time course experiments directed to the on a Substrate before and after stabilization, and quantitating determination of stabilized biocatalyst half-life and/or shelf the product of the reaction. As examples, such assays 40 life. Stabilization of a biocatalyst according to the invention include, but are not limited to, visualization upon chromato can also be evaluated in the context of other methods of graphic separation of the compounds in the reaction, spec biocatalyst stabilization. As non-limiting examples, the trophotometric and fluorometric analyses of reaction above enzymatic activities can be tested in immobilizing products, analysis of incorporated or released detectable 45 gels or other matrices, or in partial or pure organic solvents. markers, such as, for example, radioactive isotopes. Indirect Furthermore, a biocatalyst stabilized by any of the methods methods, that include, but are not limited to, computational, known in the art (such as directed evolution or designed structural, or other thermodynamic analyses, may also be mutagenesis, see Background) can also be subjected to the used for the determination of the activity of the stabilized methods of the instant invention to achieve further stabili Zation. biocatalyst. More specifically, as an example of a 50 biocatalyst, the activity of a lipase, or specifically the In vivo activity of carboxylesterases catalyzing the hydrolysis of Pharmaceutical and therapeutic applications are best long-chain acylglycerols, is determined by any method tested in vivo or under conditions that resemble physiologi known in the art, including, but not limited to the measure cal conditions (see also, below). The stability of the polypep ment of the hydrolysis of p-nitrophenylesters of fatty acids 55 tide or complex may be tested in, for example but not limited with various chain lengths (>=C-10) in solution by spectro to, serum, incubating the polypeptide or complex in time photometric detection of p-nitrophenol at 410 nm. Where it course experiments at various temperatures (e.g. 37, 38, 39, is necessary to distinguish between lipases and esterases, the 40, 42, and 45° C.), and at different serum concentrations, triglyceride derivative 1.2-O-dilauryl-rac-glycero-3-glutaric and assaying for the remaining levels of functional polypep acid resorufin ester (available from Boehringer Mannheim 60 tides or complexes. Furthermore, stability of a polypeptide Roche GmbH, Germany), may also be used as a substrate, or complex in the cytoplasm may be tested in time-course yielding resorufin, which can be determined spectrophoto experiments in cell-lysates, lysed under various conditions metrically at 572 nm, or fluorometrically at 583 nm (Jaeger (e.g. various concentrations of various detergents) at differ K-E et al. Annu. Rev. Microbiol. 1999. 53: pp. 315–51). ent temperatures (e.g. 37, 38, 39, 40, 42, and 45° C.), and Specificity 65 assaying for the remaining levels of functional polypeptides Depending on the nature of the polypeptide or polypep or complexes. More directly, stability in the cytoplasm may tide complex, retained specificity can be tested, as examples, be tested in time-course experiments by Scrape-loading US 7,037,894 B2 45 46 tissue culture cells with stabilized polypeptide or complex limited to, by adjusting certain conditions of the reaction, and assaying for the remaining levels of function. The Such as, but not limited to, salt, Tris, or protein stability of the polypeptide or complex may also be tested by concentration, or by adjusting the pH of the reaction. If injecting it into an experimental animal and assaying for thereby the strength of the polypeptides association is specific activity. Alternatively, the compound may be recov 5 increased, for example, as determined by non-reducing SDS ered from the animal at an appropriate time point, or several PAGE, the cross-link reaction should be tried again under time points, and assayed for activity and stability, as these conditions. described above. The opposite could also be the problem: the polypeptides 5.16.3. Biodistribution of a complex, or the polypeptide structures of a single To determine the utility of a stabilized polypeptide or 10 polypeptide, associate with each other too tightly, the tyrosyl polypeptide complex more directly, biodistribution and/or side-chains are not exposed to the catalyst or oxidizing other pharmacokinetic attributes may be determined. In a reagents, and the dityrosine bond does not form. In Such specific embodiment, a stabilized polypeptide or polypep cases, the protein Sub- or secondary structures or the tide complex may be injected into a model organism and polypeptides of a complex are first dissociated by any means assayed by tracing a marker, such as but not limited to, 'I 15 known in the art, as described above, by adjusting, for or 'F radio labels (Choi C. W. et al. Cancer Research, vol. example, but not limited to, the concentrations of salt, 55: pp. 5323–5329, 1995), and/or by tracing activity as detergent, guanidine HCl, and/or any other agents that cause described above (Colcher D. et al. Q. J. Nucl. Med. Vol. reversible denaturation, temperature, pressure, and/or reac 44(4): pp. 225–241, 1998). Relevant information may be tion time. It may also, for example, be possible to add the obtained, for example, by determining the amount of func oxidizing agent and catalyst at an earlier or later time-point, tional polypeptide or polypeptide complex that can be as the above conditions are reversed, as described above, expected to be pharmaceutically active due to its penetration and the polypeptide or polypeptide complex begins to of the specifically targeted tissue. Such as, for example, a refold/reassociate. tumor. Half-life in the circulation and at the specifically 25 Increase Strength of Reaction Conditions targeted tissue, renal clearance, immunogenicity, and speed Should the cross-link not form in spite of appropriate of penetration may also be determined in this context. polypeptide folding or good complex formation under the 5.16.4. Animal and Clinical Studies conditions of the reaction, the next solution could be to Utility of a stabilized polypeptide or complex can be increase the strength of the conditions of the reaction, e.g. by determined directly by measuring its pharmacological 30 increasing the concentration of the oxidizing reagent and/or activity, either in animal studies or clinically. In a specific of the catalyst. A preferred method would still use the embodiment, such measurements may indude, for example, minimal strength of the reaction required for the cross-link measurements with which tumor progression or regression is to form. monitored upon treatment of an animal model or one or Identify Second-site Mutation several patients with a stabilized polypeptide or complex 35 designed as an anti-cancer pharmacological agent. In It may be possible, by screening a library of mutants of the another embodiment. Such measurements may include, for polypeptide or polypeptide complex to be cross-linked, to example, measurements of bone mass, such as X-ray identify second-site mutations that alter the fold and/or measurements, upon treatment of an animal model or one or structure of the polypeptide or polypeptide complex in Such several patients with a stabilized polypeptide or complex 40 a way, that the cross-link can form. Such second-site muta designed as an anti-menopausal bone-loss pharmacological tions may be identified by any methods known in the art, agent. Such as, for example, but not limited to, any of the in vitro evolutionary approaches (see above). 5.17. Troubleshooting Direct Cross-linking Reaction to an Alternative Residue 5.17.1. Polypeptide or Complex not Cross-Linked 45 Pair If the polypeptide or polypeptides of a complex should The cross-link may be directed to a pair of tyrosines that not become cross-linked and stabilized by the above cannot be cross-linked due to structural elements not cap described reaction, as determined, for example, by non tured in the selection process. Should the above approaches reducing Sodium Dodecyl Sulphate Polyacrylamide Gel not cause the cross-link to form between the selected resi Electrophoresis (SDS PAGE), there may be several expla 50 dues of a pair encoding tyrosine under any conditions, nations and solutions to the problem. another residue pair may be selected, and the cross-link Adjust Polypeptide Concentrations Salt/Osmolarity and/ reaction tried again, where necessary adjusting the reaction or pH Conditions conditions, as described above. For the stabilization of a polypeptide complex, the least 55 Combined Approach problematic explanation may be that the polypeptides, as It may be necessary to employ one, two, any, several, or they are not yet stabilized, do not form a sufficiently stable all of the above approaches to trouble-shooting to achieve complex in solution for the cross-link to form under the the desired stabilizing dityrosine bond. present conditions of the reaction. This could, for example, 5.17.2. Compromise Functionality of Polypeptide or be determined by immunoprecipitating one of the polypep 60 Complex tides by any method known in the art, and assaying for the presence and relative quantity of the other polypeptide(s) of Decrease Strength of Reaction Conditions the complex in the precipitate, for example, by Western Reducing the strength of the reaction by adjusting, for blotting. example, but not limited to, the concentration of either the Should this be (one of) the problem(s), it may be possible 65 catalyst or the oxidizing reagent, the temperature, pressure, to increase the strength of the polypeptides association with and/or reaction time, may result in a stabilized polypeptide each other by any known means in the art, including, but not or polypeptide complex with better retained functionality. US 7,037,894 B2 47 48 Adjust Protein Concentrations, Salt/Osmolarity and/or pH niques. In addition, in vitro assays may optionally be Conditions employed to help identify optimal dosage ranges. The pre Non-specific cross-link reactions may compromise the cise dose to be employed will also depend on the route of functionality of the polypeptide or polypeptide complex, administration and the seriousness of the disease or disorder, that may occur under certain reaction conditions, such as, and should be decided according to the judgment of the but not limited to, high protein concentrations relative to the practitioner and each subjects circumstances. Effective optimum, certain pH levels, or salt, detergent, denaturing, doses may be extrapolated from dose-response curves and/or any other concentrations of the components in the derived from in vitro or animal model test systems. reaction. These conditions may be adjusted to minimize or Various delivery systems are known and can be used to eliminate the formation of non-specific, compromising dity 10 administer a pharmaceutical composition of the present rosine bonds. invention. Methods of introduction include but are not Identify Second-site Mutation limited to intradermal, intramuscular, intraperitoneal, It may be possible, by screening a library of mutants of the intravenous, Subcutaneous, intranasal, epidural, and oral polypeptide or polypeptide complex to be cross-linked, to routes. The compounds may be administered by any con identify second-site mutations that alter the fold and/or 15 venient route, for example by infusion or bolus injection, by structure of the polypeptide or polypeptide complex in Such absorption through epithelial or mucocutaneous linings a way, that the its functionality upon cross-linking is (e.g., oral mucosa, rectal and intestinal mucosa, etc.) and restored. Such second-site mutations may be identified by may be administered together with other biologically active any methods in the art, such as, for example, but not limited agents. Administration can be systemic or local. In addition, to, any of the in vitro evolutionary approaches (see above). it may be desirable to introduce the pharmaceutical compo sitions of the invention into the central nervous system by Direct Cross-linking Reaction to an Alternative Residue any Suitable route, including intraventricular and intrathecal Pair injection; intraventricular injection may be facilitated by an As often input data for the selection process is less than intraventricular catheter, for example, attached to a completely accurate, as or for any other reason, the selected 25 reservoir, such as an Ommaya reservoir. Pulmonary admin residue pair may yield residue pairs that distort the overall istration can also be employed, e.g., by use of an inhaler or structure of the polypeptide or polypeptide complex, and nebulizer, and formulation with an aerosolizing agent. thereby compromise or alter its functionality. Should this be In a specific embodiment, it may be desirable to admin the case, another pair that the selection process yielded ister the pharmaceutical compositions of the invention should be mutated such that both residues encode tyrosine, 30 locally to the area in need of treatment; this may be achieved and the cross-link reaction should be tried again, and by, for example, and not by way of limitation, local infusion retained functionality tested. during Surgery, by injection, by means of a catheter, or by Combined Approach means of an implant, said implant being of a porous, Of course, it may be necessary to employ one or more of non-porous, or gelatinous material, including membranes, the above approaches to trouble-shooting to achieve the 35 Such as Sialastic membranes, or fibers. In one embodiment, desired stabilizing dityrosine bond. administration can be by direct injection at the site (or former site) of a malignant tumor or neoplastic or pre 5.18. Software for Selection Process neoplastic tissue. This invention provides software that permits automated 40 In another embodiment, pharmaceutical compositions of selection of suitable residue pairs at which a di-tyrosine the invention can be delivered in a controlled release system. bond can be placed. Such software can be used in accor In one embodiment, a pump may be used (see Langer, Supra; dance with the geometrical, physical, and chemical criteria Sefton, CRC Crit. Ref. Biomed. Eng.: vol. 14: pp. 201, 1987: described above (see especially Identification of Suitable Buchwald et al., Surgery; vol. 88: pp. 507, 1980: Saudek et Residue Pairs for the Reaction), and a Residue Pair Selection 45 al., N. Engl. J. Med...: vol. 321: pp. 574, 1989). In another Flowchart such as is set forth in Section 6 below. As embodiment, polymeric materials can be used (see Medical described above, a successive array of Filters is imple Applications of Controlled Release, Langer and Wise (eds.), mented and residue pairs that “pass' through the filters CRC Pres. Boca Raton, Fla., 1974; Controlled Drug comprise the selected residue pairs (FIG. 14, left side). Bioavailability, Drug Product Design and Performance, Alternatively, filters can be implemented to process all 50 Smolen and Ball (eds.), Wiley, N.Y., 1984; Ranger and residue pairs in a parallel array (FIG. 14, right side). Residue Peppas, J. Macromol. Sci. Rev. Macromol. Chem.; vol. 23: pairs that “pass” through a filter define that filter's set of pp. 61, 1983; see also Levy et al. Science; vol. 228: pp. 190, passed pairs. In a preferred embodiment, residue pairs that 1985: During et al. Ann. Neurol.; vol. 25: pp. 351, 1989; are in all filters passed sets (i.e. residue pairs that form the Howard et al. J. Neurosurg; vol. 71: pp. 105, 1989). In yet intersection of all filter sets) are the selected pairs. The filter 55 another embodiment, a controlled release system can be requirements are as described above (Identification of Suit placed in proximity of the therapeutic target, i.e., the brain, able Residue Pairs for the Reaction). thus requiring only a fraction of the systemic dose (see, e.g., Goodson, in Medical Applications of Controlled Release, 5.19. Pharmaceutical Composition supra, vol. 2, pp. 115–138, 1984). In one embodiment, this invention provides a pharmaceu 60 Other controlled release systems are discussed in the tical composition comprising an effective amount of a review by Langer (Science; vol. 249: pp. 527–1533, 1990). stabilized polypeptide or polypeptide complex, and a phar In a preferred embodiment, the composition is formulated maceutically acceptable carrier. As used herein, “an effective in accordance with routine procedures as a pharmaceutical amount’ means an amount required to achieve a desired end composition adapted for intravenous administration to result. The amount required to achieve the desired end result 65 human beings. Typically, compositions for intravenous will depend on the nature of the disease or disorder being administration are solutions in sterile isotonic aqueous treated, and can be determined by standard clinical tech buffer. Where necessary, the composition may also include US 7,037,894 B2 49 50 a solubilizing agent and a local anesthetic Such as lidocaine many of these domains, the interaction sites with other to ease pain at the site of the injection. Generally, the proteins have also been mapped. ingredients are Supplied either separately or mixed together In the following section, methods of Stabilizing one such in unit dosage form, for example, as a dry lyophilized complex, an Fv fragment complex, for which an abundance powder or water free concentrate in a hermetically sealed of data is available, are described in detail. Specifically, container Such as an ampoule or Sachette indicating the described below are the assembly of relevant databases for quantity of active agent. Where the composition is to be the selection process, the selection process itself, the intro administered by infusion, it can be dispensed with an duction of point mutations, bacterial expression of the infusion bottle containing sterile pharmaceutical grade water polypeptides and their purification, adjustment of the cross or saline. Where the composition is administered by 10 link reaction conditions, the cross-link reaction itself, and injection, an ampoule of sterile water for injection or saline analysis of the resulting stabilized complex. can be provided so that the ingredients may be mixed prior The input data for the 2-D database is obtained from to administration. Weir's Handbook of Experimental Immunology I. Immu nochemistry and Molecular Immunology, Fifth Edition. The 5.20. Consideration for Pharmaceutical 15 input data for the 3-D database is obtained from the Composition Brookhaven National Laboratory Protein Database. The Stabilized polypeptides or polypeptide complexes of the derivative data relevant to the selection process in both invention should be administered in a carrier that is phar databases is calculated as described. The selection process is maceutically acceptable. The term “pharmaceutically carried out using a set of filters that is convenient and acceptable” means approved by a regulatory agency of the appropriate for this application of the instant invention. Federal or a state government or listed in the U.S. Pharma Point mutations to tyrosine (directing the cross-link copeia or other generally recognized pharmacopeia or reaction) are introduced according to the final selection of receiving specific or individual approval from one or more the selection process, and point mutations to phenylalanine generally recognized regulatory agencies for use in animals, (limiting the cross-link reaction) according to the specific and more particularly in humans. The term “carrier refers 25 sequence of each FV fragment and the corresponding and to a diluent, adjuvant, excipient, or vehicle with which the relevant structural information contained in the 3-D data therapeutic is administered. Such pharmaceutical carriers base. The polypeptides of the complex are expressed bac can be sterile liquids, such as water, organic solvents, such terially as GST fusion proteins, and purified over a as certain alcohols, and oils, including those of petroleum, GT-affinity column. The purified polypeptides of the com animal, vegetable or synthetic origin, such as peanut oil, 30 plex are proteolytically cleaved from the GST parts of the soybean oil, mineral oil, sesame oil and the like. Buffered fusion proteins, and the GST polypeptide is removed, again saline is a preferred carrier when the pharmaceutical com using a GT affinity column. position is administered intravenously. Saline solutions and The minimally required reaction conditions are adjusted aqueous dextrose and glycerol solutions can also be using a construct with the mutations to phenylalanine, but employed as liquid carriers, particularly for injectable solu 35 lacking the mutations to tyrosine, and the cross-link reaction tions. The composition, if desired, can also contain minor is then carried out with the constructs containing both sets of amounts of wetting or emulsifying agents, or pH buffering point mutations. The efficiency of the reaction is tested for, agents. These compositions can take the form of Solutions, and the resulting, stabilized FV fragments are then tested for Suspensions, emulsion and the like. Examples of Suitable 40 retained affinity, stability, immunogenicity, and biodistribu pharmaceutical carriers are described in “Remington's Phar tion characteristics. maceutical Sciences” by E. W. Martin. Such compositions will contain a therapeutically effective amount of the 6.2. Advantages of the Tyrosyl-Tyrosyl Cross-Link Therapeutic, preferably in purified form, together with a for FV Fragments suitable amount of carrier so as to provide the form for The underlying chemistry of the technology covered by proper administration to the patient. The formulation should 45 the present invention causes an oxidative cross-link to form suit the mode of administration. In a preferred embodiment, between reactive side-chains of proteins that form stable the composition is formulated in accordance with routine complexes. Because the cross-linking reaction is catalyzed, procedures as a pharmaceutical composition adapted for once established, the cross-link is stable in the absence of the intravenous administration to human beings. Typically, com catalyst under a broad range of pH and redox conditions. The positions for intravenous administration are solutions in 50 cross-link reaction requires very close proximity between sterile isotonic aqueous buffer. the molecules that will cross-link and therefore only occurs between molecules that normally interact and associate 6. EXAMPLE I closely in solution and is therefore limited to molecules that 55 have legitimate functional interactions. Stabilized FV Fragments Thus, the current invention describes a new technology The following example illustrates certain variations of the that will allow stabilization of immunoglobulin-derived con methods of the invention for protein and protein complex jugates and result in both a very high degree of stability and stabilization. This example is presented by way of illustra minimal immunogenicity in therapeutic contexts. This tech tion and not by way of limitation to the scope of the 60 nology is designed to improve on preceding, and comple invention. ment compatible, technologies. The resultant stabilized FV fragments will have the fol 6.1. Introduction lowing characteristics: Several polypeptides and polypeptide complexes with 1. The conjugates will be stable under a broad range of pH significant commercial value have been identified in recent 65 and redox conditions and at high protein concentrations. years, and furthermore, several modular domains have been 2. The resultant cross-linked complex will be minimally identified that mediate protein-protein interactions. For immunogenic since no exposed residues are altered. US 7,037,894 B2 51 52 This Fv fragment stabilization technology is well suited residue pairs most likely to result in a cross-linked heavy for the development of new products with novel chain-light chain tyrosine pair that minimally alter the Fv applications, the improvement of existing immunoglobulin fragment's structural characteristics. based products, and the complementation of existing tech nologies for the development of novel immunoglobulin 6.4.1. Data used for the Analysis applications. 5 Residue amino acid usage data is data compiled on amino acids encoded and expressed at each residue of known and 6.3. Fv Fragment Applications sequenced FV fragments. It is collected in, and obtained There is a wide spectrum of potential applications for from, the publication “Proteins of Immunological Interest'. immunoglobulin-based products, the limits of which are Kabat and Wu, Government Printing Office, NIH Publica determined by the following factors: 10 tion 91–3242, 1991 (“K&W). The amino acid sequences in The target must be in an environment that is accessible to this publication are ordered according to a standardized immunoglobulin-derived products, such as, for example, numbering system that takes into account the gene structure serum, the extracellular matrix, the brain, or the intracellular of the heavy and light chain variable regions. In the variable space by way of liposomes (Hoffman R. M. J. Drug Target.: regions of the heavy and light chains alike, four Framework vol. 5(2): pp. 67–74, 1998) or peptide induced cellular 15 uptake (Schwarze S. R. et al. Science; vol. 285: pp. 1565–72, Region segments (FRS)—which are relatively conserved— 1999). For intracellular applications of immunoglobulin, see are interspersed by three-highly variable— Bosilevac J. M. et al. J. Biol. Chem.: vol. 273 (27): pp. Complementarity Determining Regions (CDRs). The CDRs 16874–79, 1998; Graus-Porta D. et al. Mol. Cell Biol.; vol contain the amino acids that determine the antibody's 15: pp. 1182–91, 1995; Richardson J. H. et al. Proc. Nat. specificity, and that physically contad the antigen. Aligning Acad. Sci., USA; vol. 92: pp. 3137–41, 1995; Maciejewski ali sequences according to the K&W numbering system was J. P. et al. Nat. Med... vol. 1: pp. 667–73, 1995; Marasco W. very important for the purpose of performing a statistical A. et al. Proc. Nat. Acad. Sci., USA; vol. 90: pp. 7889–93, analysis as described in this example since the correspond 1993; Levy Mintz P. et al. J. Virol.; vol. 70: pp. 8821–32, ing residues of the FRS are thereby always aligned, regard 1996; Duan L. et al. Hum. Gene Ther.: Vol. 6(12): pp. 25 less of the varying sequence lengths of the interspersed 1561–73, 1995; and Kim S. H. et al. Mol. Immunol.; vol. CDRs. This ensured that statistical measurements were 34(12–13): pp. 891–906, 1997. A favorable environment is made with sets of data containing appropriate and compa present in all tissues and organs that are reached by the blood rable data points. Coordinate data for distance calculations Supply, and where the target molecule is present on the cell of all atoms other than hydrogens of 17 Fv fragments from surface or in the extra-cellular matrix. Since the function crystallographically solved immunoglobulin structures was ality of iminunoglobulin-derived FV fragments is primarily 30 downloaded from the protein structure database Brookhaven to bind to target molecules, binding to the target should preferably suffice to accomplish the desired therapeutic or National Laboratory (FIG. 5). These data provide the three diagnostic effect. Catalytic functionality is, however, also dimensional coordinates (x, y, and Z) for each atom in a known for immunoglobulin, and may therefore also be solved structure, expressed in metric units, i.e. Angströms achieved in pharmacological and/or industrial contexts 35 (10' m, A). With this data it was possible to calculate the (Pluckthun A. et al. Ciba Found. Symp.; vol. 159: pp. three-dimensional distances between any desired atoms 103–12; discussion 112–7, 1991; Kim S. H. et al. Mol. (e.g., amino alpha and beta carbon atoms) and to calculate Immunol, vol. 34: pp. 891–906, 1997). statistical measurements of the variability of Such distance There is a multitude of applications of potential between the different Fv fragments in the sample being immunoglobulin-based applications that meet these criteria, 40 analyzed (FIGS. 5, 6, and 7). and it is the purpose of the following paragraphs only to 6.4.2. Selection Methodology point out certain relevant applications, as examples. Optimal residues, to which the cross-link reaction is 6.3.1. Drug Delivery/Tissue Targeting directed, were selected by a series of filters based on the Many existing applications of immunoglobulin therapy statistical measurements of values in databases compiled for make use of antibody's ability to direct therapeutic agents to 45 the targeted tissues. Such therapeutic agents have thus far the purposes of this selection. These databases contain been toxins and radioisotopes targeted to tumors by linkage numeric measurements of (1) alpha carbon spacing, (2) beta to anti-tumor associated antigen or anti-tumor specific carbon spacing and the difference between the alpha and antibodies, on the one hand, and diagnostic agents, i.e. beta distances, and (3) residue amino acid usage (see below). antibodies linked to an imaging agent, on the other hand. 50 6.3.2. Modulation of Extra-Cellular Biochemical Pro 6.5. Filter 1 CSSS There are a multitude of biochemical processes that are of Elimination of Residue Pairs with Glycines therapeutic, and thus of commercial relevance that occur in extra-cellular milieus, such as blood serum. One example of Glycine is the Smallest of the amino acids and has no beta Such a process is the process of blood clotting. In this 55 carbon and is often associated with positional flexibility of example, the immunoglobulin binds to one of the proteins protein structures. Substitution of a glycine with one of the involved in the biochemical cascade of reactions that lead to largest amino acids, tyrosine, would likely have too great an the formation of blood clots, and interrupts this cascade, impact on the overall structure of the protein complex, and thereby blocking the formation of blood clots. The thera thereby on the antigen-binding characteristics of the cross peutic value of being able to inhibit the formation of blood 60 linked FV fragment. Therefore, as a first cut, from among all clots, indeed, spurred the development of one of the first candidate residue pairs of the Framework Regions, those immunoglobulin-based pharmaceutical to enter the market. pairs, of which one of the residues is most frequently a glycine (as determined by comparison with the K&W data) 6.4. Selection of Optimal Residues for Tyrosyl were eliminated a priori. For the purposes of this analysis Tyrosyl Cross-Link 65 most frequent occurrence of a particular amino acid at a The selection process consisted of a series of statistical given residue was defined as occurrence in more than 75% tests or filters’ aimed at Successively narrowing down the of the sample. US 7,037,894 B2 53 54 cant positional flexibility were selected. Therefore, residue TABLE 1. pairs were eliminated from among those in Table 1 in which the optimal distance, 8.72 A, does not fall within 2 times of Heavy chain-light chain candidate pairs with average alpha carbon distance measurements mx, within the range of 5.70A to 11.74A that specific residue pairs standard deviation from its aver (sorted by K&W numbering, first on the light chain, second on heavy age. In this example, 36 residue pairs met this criterion. chain positions). Furthermore, the relative positional flexibility of the remain ing 12 candidate residue pairs was rated according to the AVER- AVER Light Heavy AGE STDEV Light Heavy AGE STDEV following formula: 36 45 O.38 0.23 44 91 9.33 0.33 10 Rating I=a,/o. 36 103 O.99 O.31 44 92 O.91 O-40 37 45 149 0.36 44 93 9.74 0.29 C=Tu-2O, for all L2T 38 39 149 0.18 44 103 6.92 O.30 C=1+2O-T, for all L2T 38 45 O.17 O.43 44 105 8.95 0.55 38 103 1.26 O41 45 93 O43 O.41 T=optimal distance 40 41 1.27 1.50 45 103 740 0.41 15 40 43 1.68 1.34 45 105 O.95 0.45 L=the average distance for any given residue pair 42 39 1.04 O.84 46 93 O.78 0.40 42 89 O.28 O.99 46 94 19 O.25 O =standard deviation of the distance for any given resi 42 90 1.72 O.88 46 103 8.98 0.33 due pair 42 91 O.S O.66 85 43 O4 0.49 Thus, residues that scored highly under this metric are 42 103 O.13 O.34 85 45 O.93 0.37 those that (i) have an average spacing close to the optimal 42 105 7.14 O.40 86 45 O.63 0.35 42 107 1.18 O.82 87 43 64 0.32 distance, and/or (ii) have a large standard deviation. The 43 4 1...SO O.S6 87 45 8.19 O.25 remaining 12 residue pairs are listed, sorted by Rating I in 43 37 O.94 O.87 87 46 O.90 O.33 Table 2. 43 38 O.97 O.98 88 45 O.04 O.10 43 39 O.34 0.79 88 46 69 O.21 43 45 O.78 O.71 98 37 O.24 O.31 25 TABLE 2 43 89 9.95 O.71 98 38 25 0.25 43 90 O.23 O.72 98 39 17 O.2O Residue pairs of Table 1 selected' and rated by Rating I2. 43 91 8.04 O.71 98 43 6O 0.39 43 92 O.21 O.S9 98 45 6.49 0.18 Heavy Light Rating I AVG STDEV 43 93 O.14 O.65 98 46 6.66 0.29 44 105 1.35 8.95 0.55 43 103 6.74 0.51 98 48 7.65 0.57 30 43 91 O.76 8.04 O.71 43 105 5.74 O.44 98 49 37 O.S8 46 103 O49 8.98 O.33 43 107 O.66 0.62 100 39 42 0.29 1OO 43 O.33 8.27 0.41 44 37 O.S8 O.39 100 43 827 0.41 43 37 O.26 10.9 O.87 44 38 1.31 OSO 100 45 7.82 0.27 42 89 O.17 10.3 O.99 44 39 O.73 0.48 100 46 956 O.46 40 41 O.14 11.3 1...SO 44 45 9.43 0.48 102 43 47 O.36 35 44 45 O.13 9.43 O48 43 89 O.O6 9.95 O.71 1OO 46 O.O1 9.S6 O46 6.6. Filter 2 98 48 O.O1 7.56 0.57 Identification of Appropriately Spaced Residue 44 91 O.O1 9.33 O.33 Pairs 40 'Selection criterion: optimal distance (T) must fall within the range of the To find residue pairs spaced appropriately for a tyrosyl residue pair's specific distance average (1) +/- 2 times the residue pair's tyrosyl bond, the alpha carbon to alpha carbon distances Specific standard deviation (O). from every residue in the light chain to every residue in the ‘Rating I formula: Cf.?o, where T is the optimal distance, and C. = T – heavy chain in Fv fragments represented in the Brookhaven 1 + 20, for all 1, 2 T, and a = 1 + 20, - T, for all 1 is T. National Protein Structure Database were calculated in a 3D database. This calculation was performed by applying 45 6.8. Filter 4 Pythagorean geometry to the 3D coordinates of the alpha carbons (FIG. 6). For every combination of heavy and light Side-Chain Orientation chain residues, the average, standard deviation, range and In the space that the heavy and light chains occupy, the median of the alpha carbon-alpha carbon distance was tyrosine side chains should be oriented toward each other for calculated on the Fv fragments in the sample (FIG. 7). Based 50 a cross-link to form with minimal structural distortion. The on the calculations above, as a second cut, all residue pairs difference between the alpha carbon distance (i.e. the back were selected whose alpha carbons are spaced at an average, bone carbon distance: FIG. 6) and the beta carbon distance m, within the selection range. The range that was selected (i.e. the distance between the first carbons in each side chain; for was the following: FIG. 8) of each residue pair was calculated as a proxy, i.e. Min 5.70 A, Max 11.74 A. 55 an estimate of the orientation of the side chains relative to The optimal distance (T) was calculated by averaging the each other (FIG. 9). maximum and the minimum of the range. Therefore, The range that was selected for was the following: T-(5.70 A+11.74 A)/2=8.72 A. Min-0.5 A, Max 2.0 A. In this example, 64 residue pairs met this criterion, listed 60 in Table 1. The optimal distance difference (D) was calculated by averaging the maximum and the minimum of the range. 6.7. Filter 3 Therefore, Identification of Residue Pairs with Sufficient Positional Flexibility 65 In order to identify residue pairs at which substitution to Again, based on 3D coordinate geometry, for each residue tyrosine is minimally disruptive, residues pairs with signifi pair, the distance between the beta carbons was calculated US 7,037,894 B2 55 56 (FIG. 8). The beta distance was then subtracted from the O =standard deviation of the distance difference for any alpha distance of the residue pair (FIG. 9). This filter was given residue pair based on whether the average difference in the alpha and beta distances of a residue pair (FIGS. 10 and 11) falls within Of the set of potential residue pairs listed in Table 4, five the estimated optimal range. In this example, 12 residue 5 pairs met these criteria. This set of potential residue pairs is pairs met this criterion, listed in Table 3. listed in Table 5. TABLE 4 TABLE 3 10 Residue pairs of Table s selected' and rated according to Rating IIf Residue pairs of Table 2 selected by average alpha-beta distance difference. Difference between C Heavy Light Rating I AVG STDEV AVG STDEV alpha and C-beta distances Alpha Carbon distance

91 43 O.76 8.04 O.71 1.33 O.70 Heavy Light Rating II Average Stdev Rating I Average Stdev 45 43 O.S6 10.78 O.71 -0.04 O.31 15 92 43 O.10 -0.23 O.61 O.1S 10.21 O.S9 103 46 O49 8.98 O.33 O.81 O.18 39 43 O.17 O41 O.28 O.OO 10.34 0.79 39 42 O48 11.04 O.84 O.21 O.14 48 98 O.30 O.87 O.17 O.O1 7.65 0.57 91 42 O.30 1O.S O.66 -0.14 O.17 103 46 O49 O.81 O.18 O.49 8.98 0.33 37 43 O.26 10.94 O.87 O.81 O.S9 91 43 O.96 1.33 O.70 O.76 8.04 0.71 89 42 O.17 10.28 O.99 O.O1 O.O6 89 43 1.27 O.71 O.36 O.O6 9.95 0.71 92 43 O.15 10.21 O.S9 -0.23 O.61 2O 93 43 1.79 1.07 0.73 O.O2 10.14 O.65 89 43 O.O6 9.95 O.71 O.71 O.36 37 43 2.10 O.81 O.S9 O.26 10.94 O.87 93 43 O.O2 10.14 O.65 1.07 0.73 48 98 O.O1 7.65 0.57 O.87 O.17 'Selection criterion: Optimal difference in alpha and beta distances (D) 30 43 O.OO 10.34 0.79 O41 O.28 must fall within the range of the residue pairs average alpha-beta distance-difference (6.) 2 x the residue pair's specific standard deviation Furthermore, analogously to the selection based on alpha 25 O).Rating II formula: Cl?o, whereby D is the optimal distance difference, carbon distances, those pairs were eliminated for which the and C = D – 6 + 2o., for all 8, 2 D, and C = 8, + 20, - D, for all 8, optimal average distance difference, 0.75 A, does not fall is D. within 2 times that residue pair's specific standard deviation from its average. Note that optimal alpha-alpha distance and alpha-beta 30 Rating II=a./o. distance difference (Target) also falls comfortably within the range of actually measured values of most of the residue C=D-ul-H2O, for all 12D pairs selected, as shown in Table 5. This is important, C=u,+2O,-D, for all 12D because it further underscores the likelihood that the D=optimal distances difference 35 selected candidate pairs will result in cross-linked tyrosine L=the average distance difference for any given residue side chains that minimally disrupt the Fv fragment structure pair and function.

TABLE 5 Average, median, standard deviation, and range of actually measured alpha-alpha distances and alpha-beta distance differences. The remaining residue pairs are identified in the top two rows by their heavy and light chain K&W residue numbers. Heavy 37 39 89 91 92 93 103 48 Light 43 43 43 43 43 43 46 98 Average 10.94 10.34 9.95 8.04 10.21 10.14 8.98 7.65 Stdew O.87 0.79 0.71 O.71 O.S9 O.65 0.33 O.S7 Alpha Carbon Max 13.23 12.37 11.7S 9.82 11.81 11.81 9.63 8.68 Distance Min 9.94 9.63 9.OS 7.32 9.56 9.42 8.39 6.78 Median 10.81 10.10 9.80 7.92 9.99 9.95 8.95 7.89 Average 0.81 0.41 0.71 1.33 –0.23 1.07 0.81 0.87 Stdew O.S9 O.28 O.36 0.70 0.61 O.73 0.18 O.17 Ca-Cb Max 142 0.84 1.17 2.O2 (0.33 1.74 109 1.37 Difference Min -0.64 -0.10 -0.08 -0.25 -1.86 -0.69 O.4O O.63 Median 1.03 0.45 0.75 1.65 0.05 1.29 0.77 0.81

6.9. Filter 5 Amino Acid Side-Chain Usage 60 Since residue pairs are to be substituted with tyrosine such that the substitutions are minimally disruptive to the struc ture and function of the resulting cross-linked complex, residue pairs were selected from among those in Tables 4 and 5 such that the properties of the original amino acid 65 side-chains were as similar as possible to those of tyrosine. The principal side chain properties that were measured are (i) van der Waals volume and (ii) hydrophobicity. These US 7,037,894 B2 57 58 measurements were used as proxies for the size and charge pairs average van der Waals and hyrophobicity scores and of the amino acid side chains, respectively. their standard deviations with those of a tyrosine pair. At each residue, every occurring amino acid side chain was given a numeric value representing its van der Waals TABLE 6 volume and its hydrophobicity (FIG. 12). Based on amino acid usage data for these residues (Kabat & Wu), the average Numeric values of amino acid side chain van der Waals volumes and standard deviation of the residue's van der Waals (Richards, F. M. J. Mol. Biol. 82, 1–14, 1974) and hydrophobicity volume and hydrophobicity were calculated, both weighted, Eisenberg, D. Ann. Rev. Biochem, 53, 595-623, 1984). and un-weighted by the frequency at which the specific side Wander Walls chain occurs at this residue. A weighted Statistical measure Amino Acid volumes A Hydrophobicity ment is calculated on every value present in the sample 10 Ala 67 O.62 (n=number of sequences in 2-D database), and an Arg 48 -2.50 un-weighted Statistical measurement is calculated on the ASn 96 -O.78 value of each occurring amino acid (n=20 maximally) (FIG. Asp 91 -O.90 13). Cys 86 O.29 For example, given 10 sequences in a database, whereby 15 Gln 14 -O.85 Glu 09 -O.79 at a given residue alanine occurs 8 times, and leucine twice, Gly 48 O48 the weighted average of the van der Waals volumes would His 18 -0.40 be: Ile 24 140 (8Xala value + 2 x leu value)f 10 = (8 x 67 + 2 x 124)f 10 = 78.4. Leu 24 1.10 Lys 35 -150 Met 24 O.64 Phe 35 1.2O In the same example, the un-weighted average would be Pro 90 O.12 Ser 73 -0.18 (ala value + leu value)f 2 = (67 + 124) f2 = 95.5. Thr 93 -O.OS Trp 63 O.81 25 Tyr 41 O.26 Wall 05 1.10 The numeric values of all 20 amino acids of both van der Waals volume and hydrophobicity used for the selection are listed in Table 6. For each of the residues listed in Table 5, the average van Each of the 6 residue pairs identified in the structural 30 der Waals volumes and hydrophobicity values and their analysis was examined for its ability to be “conservatively standard deviations, weighted and unweighted, are listed in Substituted with two tyrosine residues, by comparing the Table 7 and 8, respectively.

TABLE 7 Van der Waals scores for residue pairs and comparison to a tyr-tyr pair. Heavy 37 39 89 91 92 93 103 48

Consensus VAL GLN VAL TYR CYS ALA TRP VAL Average 109 113 110 141 86 69 160 110 Stdew 8 12 12 1 9 11 9 unweighted Average 116 103 122 138 86 78 136 116 Stdew 10 51 18 4 26 27 10

Light 43 43 43 43 43 43 46 98

Consensus ALA ALA ALA ALA ALA ALA LEU PHE weighted Average 72 72 72 72 72 72 124 135 Stdew 14 14 14 14 14 14 3 2 unweighted Average 94 94 94 94 94 94 118 128 Stdew 24 24 24 24 24 24 11 6

Heavy 37 39 89 91 92 93 103 48 Light 43 43 43 43 43 43 46 98

2 x tyr value 282 282 282 282 282 282 282 282 Comb. value' 181 18S 182 213 158 141 283 245 weighted Difference? 101 97 1OO 69 124 141 1 38 Comb. Stdev. 22 26 26 15 14 23 14 11 Rating III O.21 0.27 O.26 O.21 O.11 O16 10.39 O.28 2 x tyr value 282 282 282 282 282 282 282 282 Comb. value' 210 197 216 232 18O 172 253 244 unweighted Difference’ 72 85 66 50 102 110 29 39 Comb. Stdev. 35 75 43 29 24 50 38 17 Rating IV O49 O.89 O.64 0.57 O.24 O46 1.32 O.43 'Sum of the residue pair's average van der Waals values Size of the di ference (square root of squared difference) between the sum of the value for two tyrosine residues (282) and the sum of the residue pairs’ average values () Sum of both residue's standard deviation Formula used: Stolev/Difference (/) US 7,037,894 B2 59 60

TABLE 8 Hydrophobicity scores for residue pairs and comparison to a tyr-tyr pair. Heavy 37 39 89 91 92 93 103 48

Consensus VAL, GLN VAL TYR CYS ALA TRP VAL Weighted Average 1.14 -0.86 O.90 O.30 O.29 O.58 0.79 1.14 Stdew O.14 O.35 0.66 0.20 - O.19 O3O O. 11 Unweighted Average 1.07 -O.96 O41 O.73 O.29 O.S4 O41 1.25 Stdew O.27 1.49 137 0.66 O.47 10S 0.17

Light 43 43 43 43 43 43 46 98

Consensus ALA ALA ALA ALA ALA ALA LEU PHE Weighted Average O.SO O.SO OSO O.SO O.SO OSO 108 120 Stdew O.33 O.33 O.33 O.33 O.33 O.33 O.O9 O.O3 Unweighted Average O.47 O.47 O.47 O.47 O.47 O.47 O.9S 1.23 Stdew O.S9 O.S9 O.S9 O.S9 O.S9 O.S9 O27 O.15

Heavy 37 39 89 91 92 93 103 48 Light 43 43 43 43 43 43 46 98

2 x tyr value O.S2 O.S2 O.S2 O.S2 O.S2 O.S2 OS2 2.34 Comb. value' 1.64 -0.36 14O O.80 O.79 108 1.87 1.82 Weighted Difference’ 1.12 0.88 O.88 O.28 O.27 O.S6 135 0.13 Comb. Stdev. 0.46 0.69 1.00 0.53 0.33 0.53 0.38 0.07 Rating V O42 (0.78 113 1.89 124 O.97 O28 0.06 2 x tyr value O.S2 O.S2 O.S2 O.S2 O.S2 O.S2 OS2 O.S2 Comb. value' 1.54 -0.49 0.88 1.20 O.76 1.01 1.35 2.48 Unweighted Difference’ 1.02 101 O.36 O.68 O.24 O49 O.83 1.96 Comb. Stdev. 0.87 2.09 1.97 1.26 0.59 1.07 1.32 0.33 Rating IV O.8S 2.07 S.44 1.86 2.49 2.20 158 0.17 'Sum of the residue pair's average hydrophobicity values °Size of the difference (square root of squared difference) between the sum of the value for two tyrosine residues (0.52) and the sum of the residue pairs’ average values () Sum of both residue's standard deviation Formula used: Stolev/Difference (/)

6.10. Filter 6 35 TABLE 9-continued Residue amino acid identity conservation Partial Elimination of Pairs with Highly Conserved Occurrence Sample No. AA identity Residues 40 of size, occurring COSE Consensus' consensus’ N3 AAs vation Light All residues under consideration are within the Frame Chain work Regions of either the heavy or the light chain of Fv 45 43 ALA 49 65 6 75% fragments, and can therefore be expected to be conserved. 46 LEU S4 57 3 95% Therefore, for the purpose of this analysis, residues that are 98 PHE 66 68 3 97% more than 80% conserved (see Table 9) are eliminated, with "Most frequently occurring amino acid the indicated residue the exception of pairs in which an aromatic amino acid is °Number of the consensus amino acid () occurrences at the indicated resi conserved (see below). due 50 Number of amino acids known for an Fv fragment at the indicated resi due TABLE 9 Number of different amino acids (AAS) occurring at the indicated residue Occurrence of the consensus amino acid () divided by the sample size, Residue amino acid identity conservation N(). Occurrence Sample No. AA identity 55 of size, occurring COSE Consensus' consensus’ N3 AAs vation Heavy Of the residues of the residue pairs of tables 4, 5, 6, 8, and Chain 9, four pairs either do not contain a conserved aromatic 37 VAL 31 40 4 78% 60 amino acid, or do contain a residue that is more than 80% 39 GLN 35 37 3 95% conserved, and are therefore eliminated. 48 VAL 30 42 4 71.9% 89 VAL 25 40 7 63% 91 TYR 42 44 2 95% 92 CYS 44 44 1 100% 93 ALA 37 42 4 88% 65 The remaining residue pairs, that are predicted to be the 103 TRP 30 33 3 91% optimal positions for the cross-link, are listed in Table 10 with all ratings described above. US 7,037,894 B2 61 62 3-D Database Import and Sorting of Data TABLE 10 Sorting by Sequence (2-D) Import of 3D-ordinate data of the polypeptides (from the Selected potential residue pairs for the tyr-tyr cross-link to be directed to. structure of the complex as a whole) Residue pairs 5 Define: (H/L) Rating I Rating II Rating III/IV Rating VVI m (subscript)=sample size (number) of different struc 103f46 O.49 O49 10.39,132 O.28,158 tures file imported (for both polypeptides of a complex) 89.43 O.O6 1.27 0.260.64 1.135.44 Alignment of data according to functional conservation (e.g. 37.43 O.26 2.10 0.210.49 O42.O.85 Kabat & Wu numbering system for Ig) 48.98 O.O1 O.30 0.280.43 O.06.0.17 10 Sorting by Atomic, 3-D Position Sorting of coordinate data by amino acid residue and atom position 6.11. Residue Pair Selection Flowchart for Select alpha and beta carbons Software 15 Define: Ca1=alpha carbon belonging to the first of two polypep Database Assembly tides Starting Material Ca2=alpha carbon belonging to the second of two 2-D Database Import and Sorting of Data polypeptides Sequence Data 2O Cb1=beta carbon belonging to the first of two polypep Import of 2D-polypeptide sequence data tides Cf32=beta carbon belonging to the second of two Define: polypeptides S=Sample size (number) of sequences of the individual polypeptide chains of the protein complex (preferably 25 Coordinates of Cali: X11, y1, Z11, in polypeptide pairs of a complexes) Coordinates of Ca2, X2, y12, Z 42, Alignment of data according to functional conservation (e.g. Coordinates of CB1, X1, y1, Zai, Kabat & Wu numbering system for Ig) Coordinates of CB2, X2: y 2, Z2, Define: Assembly of Residue Pairs i (subscript)=amino acid position within the alignment 30 Assembly of all possible inter-chain pairs of residues system to which any given atom belongs Define Compilation of identity (three letter code) and frequency of (subscript)=pair of amino acids as they fall within the amino acids occurring at each residue above alignment system of both polypeptide chains Define: Compilation of Relevant Measurements; Secondary, Deriva f=frequency of the occurrence of a particular amino acid 35 tive Data at a given residue, i 2-D Derivative Data n=number of amino acids occurring at a given residue, i Computation of Residue characteristics for each physical Define and mark residues of both polypeptides within the property conserved regions of both polypeptides (Framework Retrieval of numeric values of each side-chain physical Regions for FV fragments) 40 property for each amino acid occurring at each residue Assign: Match every amino acid identity at each residue in the con=conserved residues look-up table, and retrieve corresponding numeric val non=variable residues US Assignment of consensus 45 Calculation of weighted Statistical measurements for each Define: residue Define: The consensus is the most frequently occurring amino w=weighted average of the sample, s, of numeric acid at any given residue of either polypeptide. values of a physical property at each residue, i. Assign: weighted by each occurring amino acids frequency of For each residue, i, 50 occurrence, f, Assign the consensus using, for example, amino acid wo-weighted standard deviation of the sample, s, of single-letter code. For residues at which two or more numeric values of a physical property at any residue, i. amino acids occur most frequently, assign all most weighted by each occurring amino acids frequency of frequently occurring amino acids. 55 occurrence, f, Data On Physical Properties of Amino Acid Side-Chains Calculate: Compilation of look-up tables with amino acids and corre for the sample of sequences in the database, S, for each sponding numeric values Numeric values correspond to residue, h, and for each physical property, p the most relevant physical properties of amino acid side chains as they influence the overall structure of polypep tide complexes (e.g. side-chain volume, charge, 60 hydrophobicity, and degrees of rotational freedom, etc.) Define: Calculation of un-weighted Statistical measurements for p (Subscript): amino acid side-chain physical property each residue chosen for the selection process 65 Define: N=numeric value of a physical property corresponding up=un-weighted average of the sample, s, of numeric to an occurring amino acid at a given residue, i values of a physical property at any residue, i, not US 7,037,894 B2 63 64 weighted by each occurring amino acids frequency of |LA-Average of all A, occurrence, f, uOp=un-weighted standard deviation of the sample, s, of VA=Median of m A, the numeric values of a physical property at any OA=Standard deviation of all A, residue, i, not weighted by each occurring amino acids 5 Max=Maximum of all A, frequency of occurrence, f, Calculate: Mina-Minimum of all A, for the sample of sequences in the database, S, for each Calculation of 3D Angles, p, and up, residue, i, and for each physical property, p: Define: 10 t4=(X,n) n, (p=angle described by the atoms (points) CB1-Co-1- uo-SQR T(n,x'n'-X(n*N))/n, (n-1)) CO2, Calculation of Each Pair’s Combined Average and Stan p =angle described by the points CB2,-CO2-CC1, dard Deviation 15 For both residues of each pair the sum of both average and val-vector from Co.1, to Co.2, standard deviation values are calculated for each physi va2-vector from CO2, to Co.1. cal property. vb1=vector from Co.1, to CB1 Calculate: vb2-vector from Col.2, to CB2, For every residue pair, j: 2O Calculate: Wilf-Wulf+Will; vector coordinates, for every residue pair, j:

val Xval; XA2 XAI yvali y A2i. yAII Zvalj - ZA2; ZAII

title=ll;tall; Angle (p, (based on scalar products), for every residue pair, wo-wo-wo j

3-D Derivative Data (vali: Vibii + y vali + 8 yybijzybii 83 bii) Calculation of Residue Pari Inter-atomicalphacarbon spi = acco Distances, D. Application of Pythagorean geometry to the alpha carbon 40 coordinates of each residue pair, And for the sample of structures in the database, m Calculate: L=Average of all p, For every residue pair, j: v=Median of all (p, O=Standard deviation of all p, Max=Maximum of all p, Min=Minimum of all p, And for the sample of structures in the database, m Calculate: L1-Average of all D, Angle up, (based on scalar products), for every residue v=Median of all D, pa1r, O=Standard deviation of all D, 50 Max-Maximum of all D, Min =Minimum of all D, } = acco Calculation of Difference Between Residue Pair Alpha and Beta Carbon Distances, A, 55 Application of Pythagorean geometry to residue pair beta And for the sample of structures in the database, m carbon coordinates, and Subtraction Ll-Average of all p, Calculate: v=Median of all p, For every residue pair, j: O=Standard deviation of all p, D: formula as described for alpha-carbon distance mea 60 Max=Maximum of all p, Surement with beta carbon distance measurement with Min =Minimum of all p, beta carbon coordinates X, 2. Calculation of the Third 3D-angle y B1 and 2: AB1 and 2 Define: A=D-D, 65 Vector g1, (vg.1): A1,-B2, Plane E1 described by vectors val, and vbl, And for the sample of structures in the database, m Plane E2, described by vectors val, and vbl, US 7,037,894 B2 65 66 Vector n1,(vn1), perpendicular to E1, the vector product Calculate: of val, and vbl, For each residue pair, Vector n2,(Vn2), perpendicular to E2, the vector product of val, and vbl, uR=vuo, (abs(T-ulti-vuo) Calculate: 5 Residue Pair Ratings Based on 3-D Database Vg1 coordinates, for every residue pair, Alpha Carbon Spacing Define: V, allowable multiples of the standard deviation of 10 inter-chain alpha carbon distances, O, vMax... maximal value allowable for L1, in the selection process VMin: minimal value allowable for L1, in the selection process 15 T.: Target value for alpha carbon spacing Calculate: R. Rating based on inter-chain alpha carbon spacing, Vn1 and Vn2 coordinates (vector products), for every scores high for residue pairs, j, with Ll, values close to residue pair, the target value, T., and/or with high O. values vn1 =vector product of val, and va2, (flexibility) Vn2=vector product of Val, and Vg1, Calculate: T=average of vMax, and VMin, For all residue pairs, Vinli Vn2, 25 For all 1 < T.: For all 1 < T.: R = (T. - 1 + v * O.) 2 to R = (1 + Vr, Oi - T,) 2 to Calculate: Angle between Vn 1, and Vn2, angle X for every residue 30 d and Angles pair, j Define: V: allowable multiples of the standard deviation of p, and , angles, O, and O, X acco 35 vMax: maximal value allowable for 1, in the selection process (same value for both angles) vMax: minimal value allowable for u in the selection And for the sample of structures in the database, m process (same value for both angles) L=Average of all X, T: Target value of p and p angles (same value for both V=Average of all X, 40 angles) R. Rating based on the angles (p and p; scores high for O=Standard deviation of all X, residue pairs, j, with Li, and Ll values close to the Max=Maximum of ally, target value, T. and/or with high O and O, values Min =Minimum of all X, (flexibility) Compilation of Residue Pair Ratings; Tertiary, Derivative 45 r: Sub-rating based on the angle (p Data r: Sub-rating based on the angle p Residue Pair Ratings Based on 2-D Database Calculate: For each Physical Property Chosen for the Selection Process Tw-average of vMax, and VMin Define: 50 For every residue pair, T=sum of the numeric values of the physical properties of the amino acids to be substituted with in both polypeptide chains (2x value of tyrosine for the tyrosine oxidative cross-link) For all li < Th: For all led < T100 : ri (Tap - bi + Vrba :" O').2 Obi rj - (4 + VR. :"Oi - Tip) 2 (O. v=allowable multiples of the weighted and un-weighted 55 rij - (Tap - 'bi + VR Oyj)7Osiri - (-lpp + VR. " Opi - Tip). O: standard deviations of a physical property's values, R = average of ri and ri uO. Rating (R) based on numeric values of a physical property, p, corresponding to occurring amino acids, weighted by Difference Between Alpha- and Beta Carbon Spacing Define: the frequency of each amino acids occurrence. 60 Calculate: VA: allowable multiples of the standard deviation for each residue pair, j, of m differences between inter For each residue pair, chain alpha- and beta carbon distances, O.A. vMaxa: maximal value allowable for LLA in the selection 65 process Rating based numeric values of a physical property, p. VMina: minimal value allowable for L.A. in the selection corresponding to occurring amino acids. process US 7,037,894 B2 67 68 TA: Target value for the difference between alpha beta Calculate: carbon spacing IFMin,T=True, select Selection Processes IFIL, +V.*O->T=False, discard For all >T, The sequence of filters is of no significance IFL-V. O.TA=True, select Max... maximum limit for the selection of an amino 45 acid side-chain physical property, p, based on weighted IFLA+VA*OA->TA=False, discard statistical measurements For all LldTA Min: minimum limit for the selection of an amino acid IFLA-VA*OA->TAl=True, select side-chain physical property, p, based on weighted IFLA-VA*OA->TA=False, discard statistical measurements 50 Final Selection Calculate: Selected Amino Acid Pairs All residue pairs, j, that are selected in all Filters (I.1–4 and IF Min-wR90% complete. 65 retained affinity in ELISA-type procedures. Using 96 well Insoluble matter is removed by centrifugation at 20,000 g at plates, the inside surfaces of the ELISA-assay plate wells are 4°C. for 20min. The supernatant is then incubated with 2 ml coated with antigen, for example integrin C.5 (Example 1) US 7,037,894 B2 71 72 and integrin B1 (Example 2). The wells are washed, and with 3 Jug of roughly 4.5 MBq/ug of Fv fragment complex. respect to one another, half the concentration of the full Injected animals are sacrificed at 15, 45,90, 360 min. and 24 length antibody and an equal molar concentration of the h. and immediately exsanguinated by cardiac puncture. F(ab) fragment of the antibody (see below) as positive Tissues are separated, dried and weighed on an analytical controls, and the Fv fragment of the antibody, cross-linked 5 balance, and counted in a gamma-radiation counter using a as described above, are incubated in PBS for two hours at high energy setting (for 'F). Aliquots of blood are also dried 37° C. in serial dilutions in the wells coated with the and counted. Counts are corrected for decay. Tissue:blood respective antigen on one plate. F(ab) fragments are derived ratios, and the percentage of injected dose per gram tissue by pepsin digestion of the full length antibody and Subse are calculated for each tissue. quent purification first by removal of the Fc fragments by 10 Early-phase blood clearance studies are performed in running the antibody/protease solution through a Protein A mice injected with the same amount of above described 'F column, and second by fractionating the flow-through of the radio-labeled stabilized FV fragments. Serial tail-vein blood Protein A column by ion exchange FPLC to remove the samples are taken at 1, 2, 5, 10, 15, and 30 min. The samples protease. The wells are washed four times with 200 ul of are dried and counted as described above, and the half-life PBS and the anti-HA tag and alkaline phosphatase-coupled 15 of the Fv fragments in blood is calculated according to secondary antibody are sequentially incubated in PBS for an standard procedures (Choi C. W. et al. Cancer Research; vol. additional hour at 37° C. Wells are washed again four times 55: pp. 5323-5329, 1995). with 200 ul of PBS. The concentrations of bound IgG, F(ab) As controls for the above studies, single chain and dis fragment, and FV fragment are determined by standard ulfide Fv fragment constructs of the same original immu procedures with an ELISA assay reader. noglobulin molecule are compared. 6.15.3. Stability in Serum, Lysate, and the Cyytoplasm Stability of the complex in serum is tested in time-course 7. EXAMPLE II experiments by incubating the complex in human serum at Candida antarctica Lipase B (CALB) 37° C., 38° C., 39° C., 40° C., 42°C., and 45° C. for up to two weeks, and testing for the remaining levels of functional 25 The following example illustrates certain variations of the Fv fragment complexes. As controls, the stability of Fab, methods of the invention for protein and protein complex ScFv's and/or dsRv's are compared, all tagged with the same stabilization. This example is presented by way of illustra marker. tion and not by way of limitation to the scope of the Stability of the complex in the cytoplasm is tested, also in invention. time-course experiments, analogously to the incubation in 30 Introduction serum, by incubating the complex in cell-lysates. More Several polypeptides with significant commercial value directly, the stability of the complex in the cytoplasm is have been identified in recent years, and furthermore, for tested by scrape-loading tissue culture cells with stabilized many of these polypeptides structural data is available. In FV fragments and assaying for the remaining levels of the following section, methods of Stabilizing one functional complexes. As controls, the stability of scFv's 35 polypeptide, a biocatalyst, for which data is available only and dsFv's of the same original immunoglobulin molecule, for the polypeptide itself, but not for other, structurally both tagged with the same marker as the cross-linked FV related polypeptides. Specifically, described below are the fragment, are compared. residue pair selection process, introduction of point In all of these experiments, the remaining levels of 40 mutations, expression of the polypeptides and their purifi functional complexes will be determined in ELISA assays cation and deglycosylation, the cross-link reaction itself, and with the same secondary antibody, as described above. analysis of the resulting stabilized biocatalyst; for the 6.15.4. Immunogenicity description of the adjustment of the cross-link reaction conditions, refer to Chapter 6. Furthermore, a description of Mice are injected with various doses, ranging from 1 Jug the combination of the dityrosine stabilization technology to 10 mg. of stabilized complex. Stabilized complex is 45 with a complementary technology, a directed evolution injected in the presence and absence of Freunds (Complete) approach, is described. Adjuvant. Further injections are given to the mice as boosts The biocatalyst stabilized in the below example is the every five days (in the presence and absence of Incomplete lipase B of Candida antarctica (“CALB, FIGS. 1C, 15A), Adjuvant). The mice receive a total of three or four boost an enzyme for which multiple commercially relevant appli immunizations. 50 cations are possible due to its exquisite enantioselectivity, of Tail-vein blood samples are taken before each injection, which some are still uneconomic due to its lack of stability and one week after the final boost. Blood samples are spun under adverse reaction conditions. at 3000 g for 30 min. at 4° C. The structure file 1 LBS containing the three dimensional ELISA plates are coated with the stabilized complex and 55 atomic coordinates of the polypeptide's crystal structure is a mixture of the unstabilized Fv fragment heavy and light obtained from the Brookhaven National Laboratory Protein chains, and ELISA assays are performed according to stan Database. The derivative data relevant to the selection dard procedures, using a labeled anti-mouse secondary anti process is calculated as described. The selection process is body. carried out using a set of filters that is convenient and The immunogenicity of complexes stabilized by the meth 60 appropriate for this application of the instant invention. ods of the instant invention are compared to dsEvs and Point mutations to tyrosine (directing the cross-link ScFv's constructs of the same original immunoglobulin reaction) are introduced according to the final selection of molecule as controls. the selection process, as described. The polypeptide is 6.15.5. Biodistribution expressed in Pichia pastoris as a yeast alpha factor fusion 'F radiolabeled stabilized Fv fragments, labeled accord 65 protein, which directs the secretion of the fusion protein. The ing to the procedures published by Lang L. and Eckelmann protein is affinity purified by its C-terminal His(6) tag, using U., 1994, are injected into mice. Each mouse is injected with NTA column. US 7,037,894 B2 73 74 The minimally required reaction conditions are adjusted relevance, and promises to contribute significantly to the US as described in Chapter 6. The cross-link efficiency of the Scientific and technical knowledge base. reaction is tested, and the resulting, stabilized biocatalyst is Selection of Optimal Residues for Tyrosyl-Tyrosyl Cross then tested for retained activity and specificity, and for Link improved Stability in time, and under adverse conditions. The selection process consisted of a series of tests or Advantages of the Tyrosyl-Tyrosyl Cross-Link for Bio filters’ aimed at successively narrowing down the residue catalysts pairs most likely to result in a cross-linked tyrosine pair that The underlying chemistry of the technology covered by minimally alter the activity or specificity of the enzyme, the present invention causes an oxidative cross-link to form while lending maximal stability. between reactive side-chains of polypeptides that form 10 stable complexes. The dityrosine bond is stable under a Data used for the Analysis broad range of pH and redox conditions. The cross-link Coordinate data for distance calculations of all atoms reaction requires close proximity between the reactive side other than hydrogens of CALB was downloaded from the chains that will cross-link. protein structure database Brookhaven National Laboratory Thus, the current invention describes a new technology 15 (FIG. 5). These data provide the three-dimensional coordi that allows stabilization of biocatalysts and enables their use nates (x, y, and Z) for each atom in the solved structure, in a broader range of industrial applications. This technology expressed in metric units, i.e. Angströms (10" m, A). is designed to improve on preceding, and complement These data also contains the amino acid sequence of the compatible, technologies. polypeptide. With this data it was possible to calculate the The resultant stabilized biocatalysts will have the follow three-dimensional distances between any desired atoms ing characteristics: (e.g., alpha and beta carbon atoms). 1. The enzymes will be more stable under a broad range Seliction Methodology of reaction conditions, including, but not limited to, Optimal residues, to which the cross-link reaction is temperature, pH, pressure, salinity, or concentration of other directed, were selected by a series of filters based on the compounds in the reaction, such as a reducing agent, which 25 measurements of values in a database compiled for the is often a component of the chemical reaction for which the purposes of this selection. This database contains numeric catalyst is required. measurements of (1) alpha carbon spacing, (2) beta carbon 2. The resultant cross-linked and stabilized biocatalyst spacing and the difference between the alpha and beta will retain its activity and specificity due to the specificity of distances, and (3) residue amino acid usage (see below). the cross-link reaction and to the selection process. 30 FILTER 1: SELECTION OF SUFFICIENTLY-SPACED This stabilization technology is well suited for the devel AROMATIC RESIDUES opment of new products with novel applications, the Because there are a significant number of aromatic resi improvement of existing industrial biocatalysts, and the dues available in the sequence of CALB, and because complementation of existing technologies for the develop mutation of an aromatic residue (other than tyrosine, i.e. ment of novel biocatalysts. 35 tryptophane, phenylalanine, or histidine) to tyrosine would Biocatalyst Applications be maximally conservative, for the selection process of this Biocatalytic enzymes constitute the preferred class of example, only aromatic residue pairs were analyzed. catalysts for industrial processes due to their high specificity Furthermore, to maximize the degree to which application of the instant invention stabilizes the enzyme, only pairs that and turnover rates, and their low development costs and 40 cycle times. However, their utility is limited by the relative are spaced more than 40 amino acids apart in the two instability and limited shelf-life of protein molecules that is dimensional amino acid sequence are selected. exacerbated under adverse reaction and/or storage condi tions. The technology of this invention that can be applied to TABLE 11 stabilize biocatalysts, thereby enhancing their utility and 45 Aromatic residue pairs with alpha carbon distances within the range broadening their commercial application. of 5.70A to 9.74A, Space more than 20 residues apart. Application of the instant invention stabilizes enzymes with specifically placed internal cross-links, and thereby Alpha carbon CC-CB Distance increases the stability of enzymes without impairing their CALB residue pair distance Difference activity in the desired reaction conditions. The resulting 50 Phe Tyr82 9.29 -0.20 Phea.8 Trp104 8.85 1.53 increase in enzyme stability thus not only addresses shelf Trp52 Tyr234 8.71 O.O2 life limitations but also increases the enzymes reaction rates Phel31 Tyr183 6.19 -1.31 and process yields. Trp104 His224 9.33 O.33 Industrial biocatalytic processes are used in many indus Tyr135 Tyr2O3 7.58 O.10 Tyr183 His224 8.2O -1.09 try Sectors, including the chemical, detergent, 55 pharmaceutical, agricultural, food, cosmetics, textile, Phel17 Tyr300 7.7 2.07 materials-processing, and paper industries. Within these industries, biocatalysts have many applications, ranging FILTER 2: IDENTIFICATION OF APPROPRIATELY from product synthesis (e.g. amino acid manufacturing, and SPACED RESIDUE PAIRS fine chemical synthesis of Small-molecule pharmaceuticals) 60 To find residue pairs spaced appropriately for a tyrosyl through use as active agents in products (for example, in tyrosyl bond, the alpha carbon to alpha carbon distance biological washing powders) to use in diagnostic testing between every residue pair in the polypeptide was calculated equipment. Biocatalysts also have industrial applications in a 3D database. This calculation was performed by apply that range from wastewater and agricultural soil treatment, ing Pythagorean geometry to the 3D coordinates of the alpha to crude oil refinement (e.g. desulfurication). 65 carbons (FIG. 6). Based on the calculations above, as a Thus, the example of an application of the instant inven second cut, all residue pairs were selected whose alpha tion described below focuses on a problem of wide carbons are spaced within the selection range. US 7,037,894 B2 75 76 Because of the lack of statistical measurements that give point-mutation is required to generate a tyrosine pair, these insight to positional flexibility, the selection range was distances may be altered, and all of the pairs are generated reduced by 2 A, but only on the upper limit. and examined. The range that was selected for was the following: Generating Proteins Containing the Selected Point Muta tions Vector Construction of pPal-CALB Min 5.70 A, Max 9.74 A. The C. antarctica lipase B gene (plasmid pMT1335) is isolated by polymerase chain reaction (PCR) omitting the FILTER 3: SIDE-CHAIN ORIENTATION pre-propeptide sequence according to standard procedures In the space that the heavy and light chains occupy, the known in the art, using the plasmid pMT1335 (Patkar et al. tyrosine side chains should be oriented toward each other for Chem. & Phys. Of Lipids, 1998. Vol. 93, pp. 95-101) as a a cross-link to form with minimal structural distortion. The 10 template. The lipase gene is amplified using the primers A difference between the alpha carbon distance (i.e. the back and B (see FIG. 15B) for the introduction of an EcoRI (and bone carbon distance: FIG. 6) and the beta carbon distance a His(6)-tag) and a NotI site at the 5’- and 3'-end, respec (i.e. the distance between the first carbons in each side chain; tively. The PCR product and the vector pPICZalphaA FIG. 8) of each residue pair was calculated as a proxy, i.e. (Invitrogen) are digested with the restriction enzymes EcoRi an estimate of the orientation of the side chains relative to 15 and NotI, and gel purified, using the kit QiaexII Gel extrac each other (FIG. 9). tion Kit (Qiagen, 2001 catalog # 20021) according to the The range that was selected for was the following: manufacturer's protocol. The insert is ligated into the vector, resulting in a fusion between the yeast alpha-factor secretion Min -2 A, Max 3.0 A. signal peptide (sequence contained in pPICZalphaA) and Again, based on 3D coordinate geometry, for each residue CALB, and the resulting plasmid construct, pPal-CALB, is pair, the distance between the beta carbons was calculated transformed by standard methods known in the art into (FIG. 8). The beta distance was then subtracted from the competent HB101 cells (E. coli). The transformants are alpha distance of the residue pair (FIG. 9). This filter was selected on LB-Amp agar plates. The CALB gene is based on whether the difference in the alpha and beta sequenced by standard methods known in the art. distances of a residue pair falls within the estimated optimal 25 Point Mutagenesis range. In this example, all of the residue pairs in Table 11 At the residues of the pair selected, as described above, met this criterion. amino acid Substitutions are introduced by point mutation, FILTER: PARTIAL ELIMINATION OF PAIRS WITH So far as tyrosine is not already present at the selected RESIDUES IN PROXIMITY TO THE ACTIVE SITE OF residues, using forward primer for M1 together with Primer THE ENZYME 30 B, and forward and reverse primers for M2 and M3, as The functionality of an enzyme as a biocatalyst lies in its described in FIG. 15B. Point mutations are introduced by ability to catalyze chemical reaction. The activity and selec using the QuikChangeTM Site-Directed Mutagenesis Kit (see above). tivity of a catalyst is most sensitive at those sites where the Protein Expression and Purification catalyst and the reactants physically contact each other. Protein expression and purification are carried out accord Therefore, mutations and/or cross-links are least desirable in 35 ing to an adapted method published by Rotticci-Mulder et al. the active site, and residues in or proximal to the active site The yeast strain P. pastoris SMD1168 (his 4, pep4) are excluded. (Invitrogen) is used for the expression of CALB (Schmidt His224 is in the active site, and is therefore excluded. Dannert. Bioorg. & Med. Chem., 1999. Vol. 7, pp. Because Tyr183 is in close proximity to His224, the selected 2123-2130; Rotticci-Mulder et al. Prot. Expr. & Purif. 2001. residues below should be mutated to generate polypeptides 40 Vol. 21, pp. 386–392.). Cells are made competent and with tyrosine pairs, with and without the mutation of Tyr183 transformed by standard methods known in the art, and to Phel83. Furthermore, because His224 is also in close transformants are selected on RD His agar plates (186 g proximity to Trp 104, and because Trp 104 is in close prox Sorbitol. 20 gagar, 20g dextrose, 13.4 g yeast nitrogen base, imity to Phe-A8, residue pairs containing the above residues 0.2 mg biotin, 50 mg amino acid mix without histidine per are also excluded. The remaining residue pairs are list in 45 liter). P. pastoris is grown in YPD medium (10 g yeast Table 12 below. extract, 20 g peptone, 20g dextrose per liter) or BMGY medium (10 g yeast extract, 20 g peptone, 13.4 g yeast TABLE 12 nitrogen base, 0.4 mg biotin, 10 mL glycerol, and 100 mL 1 List of remaining residue pairs with relevant distance measurements. M. KHPO/KHPO, pH 6.0 per liter). Protein expression 50 under the control of the AOX1 methanol-inducible promoter Alpha carbon CC-CB Distance Epsilon carbon is induced by growing the culture in BMMY medium (10 g CALB residue pair distance Difference distance yeast extract, 20 g peptone, 13.4 g yeast nitrogen base, 0.4 Phel17 Tyr300 7.7 2.07 4.59 mg biotin, 5 mL methanol, and 100 mL of a 1 M KHPO/ Trp52 Tyr234 8.71 O.O2 7.00 KHPO solution, pH 6.0 per liter). Tyr135 Tyr2O3 7.58 O.10 9.08 55 Five-hundred milliliters of BMGY in a 5000-mL E-flask Phe Tyr82 9.29 -0.20 9.31 are inoculated with 1 mL of an overnight yeast culture in YPD and grown overnight at 28°C., 300 rpm. The medium *In Trp52, Epsilon Ni is used. is changed for 500 mL BMMY to induce for lipase expres Analysis of Epsilon Carbon Distances sion. Methanol is added to the culture medium to a final Because the most likely isomer of the di-tyrosine bond is 60 concentration of 0.5% (v/v) every 24h for the following 3 thought to be the epsilon-epsilon bond, and because coor days. The sample is collected by separating the culture dinate data for an epsilon position atom of all of the amino medium from the cells by centrifugation. acids selected is available, the distances between the epsilon Aliquots of the sample are taken and concentrated accord positions of the above selected residue pairs in Table 12 ing to standard procedures known in the art. The concen were analyzed. 65 trated sample is separated by SDS-PAGE on a 12% poly The pairs in Table 12 are ranked according to their epsilon acrylamide gel, and analyzed by Coomassie Blue and silver carbon distances. However, since in three of the four pairs a staining. US 7,037,894 B2 77 78 The protein is bound to NTA column (Qiagen) that binds Tris.C1 pH7.9 to 50 mM and 3-mercaptoethanol to 10 mM, the protein’s His-tag according to the manufacturers and the solution is again dialyzed against PBS to remove the protocol, and the beads are washed several times with catalyst, oxidizing and reducing agents. Phosphate Buffered Saline (PBS). Again the protein is The efficiency of the cross-link reaction is tested by analyzed by separation on a 12% polyacrylamide gel, and reducing and non-reducing PAGE and Coomassie blue stain analysis by Coomassie Blue and silver staining. 1ng. Deglycosylation Improved Stability and Retained Activity Endoglycosidase H and endoglycosidase F (Boehringer The retained hydrolytic activity of the lipase is tested by Mannheim, Mannheim, Germany) are used to cleave incubating equal amounts of the wild type and cross-linked N-linked carbohydrates from CALB produced in P. pastoris. 10 mutants of the enzyme in PBS at 55° C., 60° C., 65° C., and Digestion is performed according to the manufacturers 95°C. for 0, 1, 2, 5, 10, 15, 30, 60, and 90 min. Furthermore, instructions under reducing conditions on the NTA beads. the activity of the enzyme is assayed adding 0, 10 mM, 50 The deglycosylated protein is separated by SDS-PAGE on a mM, 150 mM, 0.5M, 1M, and 2M of NaCl and other salts, 12% polyacrylamide gel, and analyzed by staining, and by 0 1 mM, 10 mM, 50 mM, 150 mM, 0.5M, and 1M beta Western blot analysis using an antibody to the c-myc tag (see 15 mercaptoethanol. The remaining activities of the wild type above). and various mutants are then assayed hydrolyzing tributyrin, Active-Site Titration of Recombinant Lipase as described above. The enzymatic activity of the wild type Active-site titration of the purified lipase was performed and mutant enzymes in various pH conditions is determined using a methyl p-nitrophenyl n-hexylphospho-nateinhibitor spectrophotometrically by measuring the hydrolysis of in order to determine the concentration of active enzyme p-nitrophenyl esters (e.g. p-nitrophenyl palmitate and/or (Rotticci-Mulder et al. Prot. Expr. & Purif. 2001. Vol. 21, pp. p-nitrophenyl laurate), and the release of p-nitrophenol, at 386–392). The active-site concentration was determined by 410 nm. measuring the concentration of released p-nitrophenolate Dityrosine Stabilization and Directed Evolution spectrophotometrically at 25° C. and 400 nm. General Approach Lipase Activity Assay 25 The strategy for combining a directed evolution approach The hydrolytic activity of the lipase is tested by measur with the dityrosine technology described herein is based on ing hydrolysis of tributyrin. The substrate solution (0.2 M the concept that the cross-link conditions can be viewed as tributyrin, 2% gum arabicum, 0.2 M CaCl) is emulsified by a selection environment/selective pressure to which the gene sonication for 1 min. The reaction is initiated by the addition is adapted during the in vitro evolution of the enzyme. In the of enzyme to the Substrate emulsion. The enzymatic reaction 30 following, an approach is described that is an adaptation of is carried out at 25° C. and pH 7.5, and the level of the the approach described by Liebeton et al. (Liebeton et al. enzyme’s activity is measured by titration of the released “Directed Evolution of an Enantioselective Lipase'. Chem. fatty acid with 100 mM sodium hydroxide, using a pH-stat & Biol. 2000. Vol. 7 (9), pp. 709–718). Random mutations (Rotticci-Mulder et al. Prot. Expr. & Purif. 2001. Vol. 21, pp. are introduced to identify sites that enhance the cross-link 386–392: TIM900 Titration Manager Radiometer, 35 efficiency, the enzyme’s performance upon cross-linking, or Denmark). the stability of the protein in the presence of the cross-link. Stabilization of CALB These sites are then further examined by saturation Introduction of the Dityrosine Bond mutagenesis to identify the optimal mutation at the identified Introduction of the dityrosine bond is carried out both on site. and off the NTA beads. To cross-link the enzyme on the 40 Thus, first the mutations to tyrosine are introduced at the beads, the catalyst, metalloporphyrin 20-tetrakis selected residues, as described above. Second site mutations (4-sulfonateophenyl)-21H,23H-porphine manganese (III) are then randomly introduced by error-prone PCR using the chloride (MnTPPS) is then added to PBS to a concentration mutated gene as the template, and the resulting genes, of 1 uM, 5uM, 10 uM, 50 uM and 100 uM to the reaction. containing on average approximately 1-2 mutants per copy, The reaction is initiated by the addition of the oxidant 45 are ligated into the expression vector, pYES2. 1 V5-His potassium mono-persulfate to a concentration of 1-100 LM, TOPO (Invitrogen), and transformed into S. cerevisiae. at room temperature or otherwise, for each of the concen Secretion of the enzyme is directed by a S. cerevisiae trations of the catalyst. The beads are agitated, and after 45 signal-peptide. The secreted protein is cross-linked in the seconds, 60 seconds, and 2 minutes the reaction is quenched Supernatants of the cultures, and cross-linked and non-cross by the addition of Tris HCl pH7.9 to 50 mM and 50 linked protein is heat-treated at 60° C. The resulting B-mercaptoethanol to 10 mM, and the beads are washed enzymes are analyzed by adding a reaction buffer containing several times in PBS to remove the catalyst, oxidizing and substrate specific for lipases, in which the activity of the reducing agents. enzyme can easily be detected by spectrophotometric analy To cross-link the enzyme in solution, the protein is eluted sis. Clones identified as more readily cross-linked, more from the NTA column according to the manufacturers 55 active upon cross-linking, and/or more thermostable, are protocol, the eluate is equilibrated by dialysis in phosphate recovered from the original S. cerevisiae clone and buffered saline (PBS), and the protein concentration is sequenced. adjusted to several concentrations between 100 nM and 1 Second site mutations identified are further analyzed by mM. The catalyst, metalloporphyrin 20-tetrakis saturation mutagenesis. Once the optimal mutation for a site (4-sulfonateophenyl)-21H,23H-porphine manganese (III) 60 is identified, a construct containing this mutation is used as chloride (MnTPPS) is added on ice to a concentration of 1 the template for another round of random second site uM, 5uM, 10 uM, 50 uM and 100 uM to the reaction. The mutation screening, and Saturation mutagenic analysis. This reaction is then initiated by the addition of the oxidant process is iterated 10 to 15 times over. potassium mono-persulfate to a concentration of 1-100 LM, Vector Construction of pYal-CALB at room temperature or otherwise, for each of the concen 65 The DNA encoding the yeast alpha factor-CALB fusion trations of the catalyst, and at several protein concentrations. proteins is amplified from the pPal-CALB vectors contain After 45 seconds the reaction is quenched by the addition of ing the point mutations, as described above, using the US 7,037,894 B2 79 80 primers Primer C and D described in FIG. 15B. The PCR Approximately 1000–2000 transformants are each picked products are ligated into the pYES2.1/V5-His-TOPO vector with sterile toothpicks and resuspended in a well of a (Invitrogen) according to the manufacturer's protocol, and 96-deep-well microtiter plate filled with 1 ml of supSC-U. transformed into competent HB 101 cells (E. coli) according Cultures are incubated on a shaker overnight at 30° C. To to standard procedures known in the art. The transformants induce protein expression, the cultures are spun down (15 are selected on LB-Amp agar plates. Plasmid DNA is min. at 5000 g), the supernatants are removed, and 1 ml of isolated, and the CALB genes (wild type and mutants) are indSC-U is added to each well. The cultures are spun down, sequenced by standard methods known in the art. the Supernatants are distributed into 96 well plates for These constructs are isolated and purified using the Qiagen Plasmid Maxi Kit (Qiagen, 2001 catalog number analysis of the enzymes (see below), and the cells are 12162) according to the manufacturer's protocol. 10 resuspended and maintained in supSC-U to be able to Error Prone PCR Reactions recover the plasmid DNA. 10 ug of the pYal-CALB vectors are cut with the restric Cross-linking in Supernatants of the Cultures tion enzymes EcoRI and NotI, and the resulting linearized Cross-linked and uncross-linked enzymes are compared plasmid are gel purified using the QiaeX II Gel Extraction after heat-inactivation; because of the large number of Kin (see above) according to the manufacturer's protocol. 15 colonies to be screened for increased activity/stability, the A total volume of 50 ul of 67 mM Tria HCl pH 8.8, 16.6 protein in the 96well plates is cross-linked directly in the mM (NH)SO 6.1 mM MgCl, 6.7 mM EDTA, 0.2 mM Supernatants of the cultures. dNTPs, 10 mM beta-mercaptoethanol, 10% (v/v) DMSO, 35 ul of each supernatant is transferred to two 96-well 0.15 M each of the Primers E and D, as described in FIG. plates to which 5 ul each of 10x PBS, 1 mM MnTPPS 15B, contains 1 ng of template DNA and 2 units of Goldstar (catalyst, see above), and to the samples on one of the 96 Taq-polymerase (Eurogentec). Ten parallel samples overlaid well plates, 5ul of 1 mMKHSO4 (oxidant) are added. After with 70 ul paraffin are amplified using the following thermo 2 minutes, the cross-link reaction is quenched in the samples cycling protocol: of the plates to which the oxidant was added by the addition 1 cycle: 2 min. 95° C. of 2.5ul of 2.88M B-mercaptoethanol. To the samples on the 25 cylcles: 1 min. 94° C., 2 min. 64° C., 1 min. 64° C. 25 other plate, 7.5 ul of 1xPBS are added. 1 cycle: 7 min. 72° C. Lipase Stabilization/Activity Assay PCR products are gel purified with the Qiaex II Gel Lipase activity is measured both before and after heat Extraction Kit, cut with the restriction enzymes EcoRI and inactivation. The period for which the protein is best heat NotI, and again gel purified with the Qiaex II Gel Extraction treated at 60° C. is determined on the wild-type in a Kit (see above). 30 time-course experiment. A cross-linked and a non-cross In a total volume of 10 ul, 5 pmols each of insert and linked 96-well plate are each heat-inactivated at 60° C. for vector are ligated for two hrs. at room temperature according the determined period of time. Lipase activities are deter to standard procedures known in the art. Ligated DNA is mined by hydrolysis of p-nitrophenyl palmitate and spec transformed into competent HB101 cells according to stan trophotometric analysis at 410 nm, according to the methods dard procedures known in the art, and the cells are grown 35 published by Liebeton et al. and Winkler & Stuckmann overnight as a culture, selecting for amp. resistance. Plasmid (Liebeton et al. “Directed Evolution of an Enantioselective DNA is recovered using the Qiagen Plasmid Midi Kit Lipase'. Chem. & Biol. 2000. Vol. 7 (9), pp. 709–718: (Qiagen, 2001 catalog number 12143) according to the Winkler & Stuckmann. "Glycogen, Hyaluronate, and Some manufacturer's protocol. Other Polysaccharides Greatly Enhance the Formation of Transformation and Expression in S. cerevisiae 40 Exolipase by Serratia marcescens”. J. Bacteriol. 1979. Vol. The constructs are transformed into competent, uracil 138, pp. 663–670). auxotrophic S. cerevisiae using the S. C. EasyComp Trans Saturation Mutagenesis formation Kit (Invitrogen, 2001 catalog number kS050-01) Saturation mutagenesis is performed as described for site according to the manufacturer's protocol. Transformants are directed point mutagenesis, with mutagenic primers in isolated on selection plates. Because expression of the 45 which the codon under investigation is randomized by inserts in the pYal-CALB vectors is driven by a Gal mixing equal amounts of nucleoside phosphoamidates dur inducible promoter, the yeast Strains are grown in an SC-U ing synthesis. The optimal codon for that position is again medium with 2% glucose Suppressing protein expression identified by screening approximately 150–200 clones for (supSC-U) containing 0.67% yeast nitrogen base (without activity upon cross-linking with and without heat treatment, amino acids with ammonium sulfate, 2% glucose, 0.01% 50 as described above. each of adenine, arginine, cysteine, leucine, lysine, threonine, tryptophan, and uracil, 0.005% each of aspartic 8. EXAMPLE III acid, histidine, isoleucine, methionine, phenylalanine, Subtilisin E proline, serine, tyrosine, and valine. Protein expression is induced by changing the medium to an SC-U medium with 55 The following example illustrates certain variations of the 2% galactose (indSC-U) containing 0.67% yeast nitrogen methods of the invention for protein and protein complex base (without amino acids with ammonium Sulfate, 2% stabilization. This example is presented by way of illustra galactose, 0.01% each of adenine, arginine, cysteine, tion and not by way of limitation to the scope of the leucine, lysine, threonine, tryptophan, and uracil, 0.005% invention. each of aspartic acid, histidine, isoleucine, methionine, 60 Introduction phenylalanine, proline, serine, tyrosine, and valine. Upon In the following section, methods of Stabilizing one induction, the enzymes with and without the point mutations polypeptide, a biocatalyst, for which structural data is avail are secreted into the medium, and can easily be affinity able for several structurally or functionally related polypep purified by their His(6) tags over NTA columns. The optimal tides. Specifically, described below are the residue pair period of induction is determined by inducing for 1, 2, 8, and 65 selection process, the introduction of point mutations, bac 36 hours and measuring the activities in the cultures Super terial expression of the polypeptides and their purification, natantS. the cross-link reaction itself, and analysis of the resulting US 7,037,894 B2 81 82 stabilized biocatalyst. For the description of the cross-link the mutations using the automated, knowledge-based protein reaction and the adjustment of the cross-link reaction modeling server Swiss Model, and visualizing the resultant conditions, refer to Chapter 6. polypeptides structures, and with the program Swiss pdb The biocatalyst stabilized in the below example is the Viewer, both of which are available from the proteomics serine endopeptidase Subtilisin E (FIG. 16A), which is one server of the Swiss Institute of Bioinformatics (SIB). of the most commercially important biocatalysts. Subtilisin Additionally, residue pairs were selected that had previously E is a secreted protein of Bacillus subtilis, and it cleaves ester and amide bonds. It is used for the total hydrolysis of been mutated to cysteines and formed disulfide bonds, proteins and peptides at alkaline pH. It has been Successfully stabilizing the enzyme and maintaining its activity. applied toward the racemic resolution of amino acids, 10 amines, carboxylic acids and alcohols and in peptide FILTER 1: SELECTION OF RESIDUES BASED ON synthesis, e.g. D-terminal deprotection. AMINO ACID USAGE The structure files containing the three dimensional To minimize the distortions that point mutations to atomic coordinates of the polypeptides are obtained from the Brookhaven National Laboratory Protein Database. The 15 tyrosine will introduce into the structure of the enzyme, derivative data relevant to the selection process is calculated residues were selected that in every enzyme in the sample as described. In addition to the statistical selection process, have aromatic, or hydrophobic amino acids. Amino acids carried out using a set of convenient and appropriate filters, that were scored for included Trp, Tyr, Phe, His, Pro, Lys, data regarding improved stability of the protein upon intro Leu, and Arg, whereby Leu and Arg were only permitted in duction of disulfide bonds is used to select potential residue maximally /3 of the sample. Selected residues are listed in pairs to which the cross-link is directed. Table 13. Point mutations to tyrosine (directing the cross-link reaction) are introduced according to the final selection of TABLE 13 residue pairs (Tables 15 and 16, FIG.16D), and expressed in Bacillus subtilis. The polypeptide is affinity purified and 25 Selected residues based on their amino acid usage. cross-linked, and the resulting biocatalyst is evaluated, as Residue AA Consensus Residue Consensus described. 6 Tyr (W) 130 Pro Selection of Optimal Residues for Tyrosyl-Tyrosyl Cross 14 Pro 168 Tyr Link 30 17 His 169 Pro The selection process consisted of (1) a review of func 21 Tyr (K) 172 Tyr 27 Lys 190 Phe tional data on subtilisin enzymes with improved half-lives 39 His 2O2 Pro upon introduction of disulfide bonds, and (2) the statistical 40 Pro 211 Pro measurements on the alpha carbon distances within the 50 Phe 215 Tyr polypeptides of a series of tests or filters' aimed at Succes 52 Pro 218 (Leu, Tyr, Lys) 35 57 Pro 226 Pro sively narrowing down the residue pairs most likely to result 65 His 227 His in a cross-linked tyrosine pair that minimally alters the 68 His 238 Lys activity or specificity of the enzyme, while lending maximal 87 Pro 240 pro stability. Furthermore, residue pairs are further evaluated by 92 Tyr 242 Trp computationally modeling the mutations to tyrosine. 95 Lys 263 Tyr (L) Data used for the Analysis 40 114 Trp 284 Tyr Coordinate data for distance calculations of 3 related non-consensus amino acids occurring at a position are indicated in paren subtilisin proteins (subtilisin E and BPN, and subtilisin from theses. Bacillus lentus) from crystallographically solved structures was downloaded from the protein structure database at 45 FILTER 2: SELECTION OF RESIDUE PAIRS BASED Brookhaven National Laboratory These data provide the ONAVERAGE ALPHA CARBON DISTANCES three-dimensional coordinates (x, y, and Z) for each atom in the solved structure, expressed in metric units, i.e. To find residue pairs spaced appropriately for a tyrosyl Angströms (10' m, A). These data also contain the tyrosyl bond, the alpha carbon to alpha carbon distance sequence and/or amino acid usage of the polypeptide. With 50 between every residue pair and each of the polypeptides in this data, aligned as shown in FIGS. 16B and C, it was the set used for the statistical analysis was calculated in a 3D possible to calculate the three-dimensional distances between any desired atoms. Functional data regarding database. This calculation was performed by applying improved stability of the enzyme was taken from the litera Pythagorean geometry to the 3D coordinates of the alpha ture (see below). 55 carbons (FIG. 6). Analogously to the selection described in Selection Methodology Chapter 7, the range that was selected for was the following: Optimal residues, to which the cross-link reaction is directed, were selected first based on the amino acid usage Min 5.70 A, Max 9.74 A. within the set of structurally and functionally related polypeptides, selecting for residues that in all of the 60 Furthermore, because the dityrosine bond is intended to polypeptides of the set are either Trp, Tyr, Phe, Lys, Pro, or His residues. From this set of residues, residue pairs were stabilize a single polypeptide rather than cross-link two or selected based on their average alpha carbon distances more proteins of a complex, it was important to select for within the set of structurally and functionally related residues that were sufficiently spaced in the two-dimensional polypeptides. Finally residue pafrs were selected from the 65 polypeptide chain to maximize the stabilizing effect of the above set of residue pairs based on the proximity of the engineered dityrosine bond. Residue pairs were selected that modeled tyrosine side-chains. This was done by modeling are more than 40 residues apart. US 7,037,894 B2 83 84 Structural Comparison with a Thermophilic Serine Protease. TABLE 1.4 JBC 1990. Vol. 265(12); pages 6874-8), Mansfeld et al., 1997 (Extreme Stabilization of a Thermolysin-like Protease Aromatic residue pairs with alpha carbon distances within the by an Engineered Disulfide Bond. JBC 1997. Vol. 272(17): Selection range, each spaced more than 40 residues apart. pages 11152–56), Takagi et al., 2000 (Engineering Subtilisin Subtilisin E residue Alpha carbon Alpha carbon E for Enhanced Stability and Activity in Polar Organic pairs average distance distance St. dev. Solvents. J. Biochem. 2000. Vol. 127; pages 617–25), and Mitchinson and Wells (Protein engineering of disulfide Tyró Pro2O2 8.2 O.32 His17 Pro87 8.9 O.08 bonds in subtilisin BPN'. Biochemistry 1989. Vol. 28(11): Tyr21 Pro87 9.5 O16 10 pages 4807–15). Tyr21 Lys238 6.3 O.S1 In Table 16 below, these additionally-selected residues are Lys27 Tyr92 7.4 O.09 listed along with their most relevant functional data. His39 Pro211 6.8 O.22 PhesO Lys95 6 O.04 PhesO Trp114 9.6 O.O7 TABLE 16 HiséS Pro211 9.1 O.04 15 HiséS Tyr218 9.O O.O3 Additionally selected residue pairs based on disulfide bond data from Hisé8 Pro211 8.2 O.O6 the literature. Hisé8 Tyr215 8.1 O.O3 Hisé8 Tyr218 8.3 O.OO2 Mutations/Disulfide Secondary Hisé8 Pro226 9.5 O.O6 Enzyme positions Structures* Half-life Activity Pro130 Lys171 9.5 O.11 Subt. E & BPN G61CS98C & H3 BS3 2–3 x wit wit N61CA98C Subt. E K17OC.E195C BS6 BST 60% wit 46% Wit Based on these calculations, as a second cut, all residue BPN D36CP210C BS2- BS8 wit No pairs were selected from the set of residues identified based report on the residues amino acid usage that have average alpha carbon distances within the selection range, and that are 25 *Secondary structures cross-linked by the disulfide bond. H: alpha helix: sufficiently spaced, as listed in Table 13. BS: beta sheet. Residue Pair Selection Based on Structural Modeling and Introduction of the Point Mutations at the Selected Resi Visualization of the Mutations dues By modeling the mutations indicated in Table 14, the According to the final selection of residue pairs (Tables 15 likelihood was assessed that each residue pair would form a 30 and 16, FIG.16D), PCR is used to introduce point mutations ditryosine bond, stabilize the enzyme, and introduce mini to tyrosine, and nucleotides are added to the 3' end of the mal distortions into the structure of the protein, particularly wild type and mutant genes (FIG. 16D, Primers A and B) to in the active site of the enzyme, to maximize its retained introduce a poly-histidine tag to the polypeptide. Point activity and specificity. This was achieved by using the mutations are introduced by PCR using the QuickChangeTM automated knowledge-based protein modeling server Swiss 35 Site-Directed Mutagenesis Kit (Stratagene, 1998 Catalog it Model, and visualizing the resultant polypeptides structures 200518). The 5' primer (FIG. 16D, Primer A) creates an and with the program Swiss pdbViewer, as stated above. Ndel site, and the 3' primer (FIG. 16D, Primer B) creates a Taking the epsilon carbon distances, calculated in the Swiss Bamn 1 site. pdbViewer, between the modeled tyrosyl side chains into The PCR product is digested with NdeI and BamHI, consideration, and the residues proximity to the active site, 40 purified, and ligated into the multiple cloning site of a shuttle residues that looked the most promising were selected. The expression-vector that propagates both in bacillus and in E. remaining residue pairs are listed in Table 15. coli, and that directs expression of the polypeptide under the Bacillus subtilis subtilisin promoter (PBE3, Zhao and TABLE 1.5 Arnold, 1999). Ligated constructs are transformed into com 45 petent HB101 cells, grown, isolated, and analyzed by stan List of remaining residue pairs with relevant distance measurements. dard restriction enzyme digestion and sequencing. CC-CB Epsilon Expression and Purification of the Protein Alpha carbon Distance carbon To express the proteins, the plasmids described above are CALB residue pair distance Difference distance transformed into competent cells of a strain of subtilisin Tyró Pro2O2 8.2 O.32 4.30 50 negative bacillus subtilis (DB428; Zhao and Arnold, 1999). His17 Pro87 8.9 O.08 5.31 Cells are grown for 36 hours at 37°C., and protein is purified Tyr21 Lys238 6.3 O.S1 4.O2 from the Supernatants of the cultures. Lys27 Tyr92 7.4 O.09 S.69 The protein is bound to NTA column supplied by Invit *Epsilon carbon distances of the modeled tyrosine pairs. rogen that binds the proteins’ His-tags, by methods known 55 to one skilled in the art, and/or according to the manufac Selection of Additional Residue Pairs Based of Functional turer's protocol, and the beads are washed several times with Data Phosphate Buffered Saline (PBS). The cross-link reaction Functional data is available regarding positional Suitabil and the adjustment of the reaction conditions, as otherwise ity of residues at which engineered disulfide bonds improve described in Chapter 6, are carried out on the beads in PBS upon the stability of subtilisin enzymes. This information 60 containing the catalyst of the cross-link reaction, 20 tetrakis was taken into account, and residues were added to the (sulfonatophenyl)-21H,23H-porphorine manganese (III) selection of Table 15 that were able to confer significant chloride (MnTTP), and the oxidant, KHSOs, supplied by stability by forming a disulfide bond between engineered Fluka as 47% of a mixture containing KHSO and KSO. cystine side-chains while maintaining the enzymes’ activity. Analysis of the Resultant Cross-Linked Enzyme Articles containing such data include Takagi et al., 1990 65 The assay for the activities of the various mutants of the (Enhancement of the Thermostability of Subtilisin E by enzyme are carried out using 0.2 mM suc-AAPF-pNa as the Introduction of a Disulfide Bond Engineered on the Basis of substrate in a buffer containing 100 mM Tris 8.0 and 10 mM US 7,037,894 B2 85 86 CaCl. The activity is monitored spectrophotometrically by Pawson T. Tyrosine Kinase Signalling Pathways. Princess measuring absorbance of the reaction mixture at a wave Takamatsu Symp.; Vol.24: pp.303–22, 1994 length of 410 nm. Cowburn D. Peptide Recognition by PTB and PDZ The enzymes are analyzed, first to determine the mutants Domains. Curr. Opin. Struct. Biol.; Vol. 7(6): pp. 835–8, activity before cross-linking, relative to the wild-type 1997 enzyme. Enzymes purified from 100 ul of the cultures Bockaert J. and Pin J. P. Molecular Tinkering of G Supernatants are analyzed for their activity by letting the Protein-coupled Receptors: an Evolutionary Success. enzyme assay reaction run for 0, 30, 60, and 90 min. Furthermore, the enzymes are analyzed for activity before EMBO J.; vol. 18(7): pp. 1723–9, 1999 and after cross-linking, as described above. Finally, the 10 Royet J. et al. Notchless Encodes a Novel WD40-repeat stability of the enzymes is determined by time-course heat containing Protein that Modulates Notch Signaling Activity. inactivation experiments, where the enzymes are incubated EMBO J.; vol. 17(24): pp. 7351–60, 1998 for 0, 1, 2, 5, 15, and 60 minutes at 45° C., 55° C., 65° C., Chou J. J. et al. Solution Structure of the RAIDD CARD and 95° C. and Model for CARD/CARD Interaction in Caspase-2 and 15 Caspase-9 Recruitment. Cell: vol. 94(2): pp. 171–80, 1998 9. References Black R. A. and White J. M. ADAMs: Focus on the Campbell L. A. et al. Protein Cross-linking Mediated by Protease Domain. Curr Opin Cell Biol.; Vol. 10(5): pp. Metalloporphyrins. Bioorganic and Medicinal Chemistry, 6549, 1998 vol. 6: pp. 1301–1037, 1998 Strasser A. and Newton K. FADD/MORT1, a Signal Brown K. C. et al. Highly Specific Oxidative Cross-link Transducer that Can Promote Cell Death or Cell Growth. Int. of Proteins Mediated by a Nickel-peptide Complex. Bio J. Biochem. Cell. Biol.; vol. 31(5): pp. 533–7, 1999 chem.; vol. 34(14): pp. 4733–4739, 1995 McInnes C. and Sykes B. D. Growth Factor Receptors: Politt S. and Schultz, P. Agnew. Chem. Int. Ed.; Vol. Structure, Mechanism, and Drug Discovery. Biopolymers: 37(15): pp. 2104 2107, 1998 25 vol. 43(5): pp. 339–66, 1997 Spangler B. D. and Erman J. E. Cytochrome c Peroxidase Lotz M. et al. The Nerve Growth Factor/Tumor Necrosis Compound I: Formation of Covalent Protein Crosslinks Factor Receptor Family. J. Leukoc. Biol.; vol. 60(1): pp. During the Endogenous Reduction of the Active Site. Bio 17, 1996 chim. Biophys. Acta; vol. 872(1-2): pp. 155–7, 1986 Casaccia-Bonnefil P. et al. p75 Neurotrophin Receptor as Gmeiner B. and Seelos C. Phosphorylation of Tyrosine 30 a Modulator of Survival and Death Decisions. Microsc Res Prevents Dityrosine Formation in vitro. FEBS Lett; vol. Tech.: vol. 45(4-5): pp. 217-24, 1999 255(2): pp. 395 7, 1989 Natoli G. et al. Apoptotic, Non-apoptotic, and Anti Kanwar R. and Balasubramanian D. Structure and Sta apoptotic Pathways of Tumor Necrosis Factor Signalling. bility of the Dityrosine-linked Dimer of GammaB-crystallin. Biochem. Pharmacol.; vol. 56(8): pp. 915–20, 1998 Exp. Eye Res., vol. 68(6): pp. 773–84, 1999 35 Alber T. Structure of the Leucine Zipper. Curr. Opin. Fancy D. A. and Kodadek T. Chemistry for the Analysis Genet. Dev.; vol. 2(2): pp. 205-10, 1992 of Protein-protein Interactions: Rapid and Efficient Cross Griffith T. S. et al. Functional Analysis of TRAIL Recep linking Triggered by Long Wavelength Light. Proc. Natl. tors. Using Monoclonal Antibodies. J. Immunol.; Vol. 162 Acad. Sci., U.S.A.: vol. 96: pp. 6020-24, 1999 40 (5): pp. 2597–605, 1999 Klinman J. P. (ed.). Redox-active Amino Acids in Biol Yasuda H. et al. Identity of Osteoclastogenesis Inhibitory ogy. Methods in Enzymology; vol. 258, 1995 Factor (OCIF) and Osteoprotegerin (OPG): a Mechanism by Richards, F. M. The Interpretation of Protein Structures: which OPG/OCIF Inhibits Osteoclastogenesis in vitro. Total Volume, Group Volume Distributions and Packing Endocrinology; vol. 139(3): pp. 1329–37, 1998 Density. J. Mol. Biol.; vol. 82: pp. 1–14, 1974 45 Ortiz A. et al. New Kids in the Block: the Role of FasL Eisenberg, D. Three-dimensional Structure of Membrane and Fas in Kidney Damage. J. Nephrol.; Vol. 12(3): pp. and Surface Proteins. Ann. Rev. Biochem.; vol. 53: pp. 150 8, 1999 595-623, 1984 Price Waterhouse: Survey of Biopharmaceutical Industry, National Brookhaven Laboratory Protein Database 1998 Boston Consulting Group: The Contribution of Phar Pastan et al. Recombinant Disulfide Stabilized Polypep 50 maceutical Companies: What's at stake for America, 1993 tide Fragments Having Binding-specificity. U.S. Pat. No. Pharmaceutical Research and Manufacturers of America. 5,747,654, issued May 5, 1998 New Medicines in Develoment, Survey. Hofmann K. The Modular Nature of Apoptotic Signaling Penuche M. L. et al. Antibody-IL-2 Fusion Proteins: a Proteins. Cell Mol. Life Sci., vol. 55(8–9): pp. 1113–28, 55 Novel Strategy for Immune Protection. Hum Antibodies: 1999 vol. 8(3): pp. 106–18, 1997 Johnson, G. et al. Weir's Handbook of Experimental Sensel M. G. et al. Engineering Novel Antibody Mol Immunology I. Immunochemistry and Molecular ecules. Chem. Immunol. vol. 65: pp. 129–58, 1997 Immunology, Fifth Edition, Ed. L. A. Herzenberg, W. M. Reiter Y. and Pastan I. Recombinant Fv Immunotoxins Weir, and C. Blackwell, Blackwell Science Inc., Cambridge, 60 and Fv Fragments as Novel Agents for Cancer Therapy and Me., Chapter 6.1–6.21, 1996 Diagnosis. TIBTECH; vol. 16(12): pp. 513–520, 1998 Wickelgren I. Mining the genome for drugs. Science; Vol. Reiter Y. et al. Engineering Antibody FV Fragments for 285(5430): pp. 998-1001, 1999 Cancer Detection and Therapy: Disulfide-stabilized FV Leong S. R. et al. IL-8 single-chain homodimers and Fragments. Nat Biotech.: vol. 14: pp. 1239–1245, 1996 heterodimers: interactions with chemokine receptors 65 Pluckthun A. and P. Pack. New Protein Engineering CXCR1, CXCR2, and DARC. Protein Sci., vol. 6(3): pp: Approaches to Multi-valent and Bi-specific Antibody Frag 609 17, 1997 ments. Immunotechnology; vol. 3(2): pp. 83-105, 1997 US 7,037,894 B2 87 88 Wright A. and Morrison S. L. Effect of Glycosylation on Antibody Engineering Page, IMT, University of Marburg, Antibody Function: Implications for Genetic Engineering. FRG Trends Biotechnol.: vol. 15(1): pp. 26–32, 1997 Hunkapiller M. et al. A Microchemical Facility for the Schwartz M. A. et al. Monoclonal Antibody Therapy. Analysis and Synthesis of Genes and Proteins. Nature; vol. Cancer Chemother. Biol. Response Modif.; Vol. 13:pp. 5 310(5973): pp. 105-11, 1984 156 74, 1992 Xia X and Li W. H. What Amino Acid Properties Affect Houghton A. N. and Scheinberg D. A. Monoclonal Anti Protein Evolution, J. Mol. Evol.; vol. 47(5): pp. 557–64, bodies: Potential Applications to the Treatment of Cancer. 1998 Semin Oncol.; vol. 13(2): pp. 165–79, 1986 Sandberg M. et al. New Chemical Descriptors Relevant Cao Y. and Suresh M. R. Bi-specific Antibodies as Novel 10 for the Design of Biologically Active Peptides. A Multivari Bio-conjugates. Bioconjugate Chemistry; Vol. 9(6): pp. ate Characterization of 87 Amino Acids. J. Med. Chem.; Vol. 635-644, 1998 41(14): pp. 2481-91, 1998 Raag R. and Whitlow M. Single-chain Fvs. FASEB; vol. Hopp T. P. and Woods K. R. Prediction of Protein Anti 9: pp. 73–80, 1995 15 genic Determinants from Amino Acid Sequences. Proc. Webber K. O. et al. Preparation and Characterization of a Natl. Acad. Sci., U.S.A.: vol. 78: pp. 3824, 1981 Disulfide-stabilized Fv Fragment of the Anti-Tac Antibody: Bradford, M. A Rapid and Sensitive Method for the Comparison with its Single-chain Analog. Mol. Immunol. Quantitation of Microgram Quantities of Protein Utilizing vol. 32(4): pp. 249–258, 1995 the Principle of Protein-dye Binding. Anal. Biochem.; Vol. Klinman J. P. (ed.). Redox-active Amino Acids in Biol 72: pp. 248–54, 1976 ogy. Methods in Enzymology, vol. 258, 1995 Lowry, O. J. Biol. Chem.: vol. 193, pp. 265, 1951 Bosilevac J. M. et al. Inhibition of Activating Transcrip Lei S. P. et al. Characterization of the Erwinia Carotovora tion Factor 1- and cAMP-responsive Element-binding pelB Gene and its Product Pectate Lyase. J. Bacteril.; Vol. Protein-activated Transcription by an Intracellular Single 169: pp. 4379–83, 1987 chain Fv fragment. J. Biol. Chem.; vol. 273 (27): pp. 25 Chou P. Y. and Fasman G. D. Prediction of Protein 16874–16879, 1998 Conformation. Biochemistry; vol. 13(2): pp. 222–45, 1974 Graus-Porta D. et al. Single Chain Mediated Intracellular Lang L. and Eckelmann W. C. One-step Synthesis of 18F Retention of ErbB-2 Impairs Neu Differentiation Factor and labeled 18F-N-succinimidyl 4-(fluoromethyl)benzoate for Epidermal Growth Factor Signaling. Mol. Cell Biol.; Vol 15: Protein Labeling. Appl. Radiat. Isot.; Vol. 45: pp. 1155–63, pp. 1182–1191, 1995 30 1994 Richardson J. H. et al. Phenotypic Knockout of the Sambrook et al.; Glover (ed.). DNA Cloning: A Practical High-affinity Interleukin 2 Receptor by Intracellular Single Approach. MRL Press, Ltd., Oxford, U.K. vol. I, II, 1985 Chain Antibodies against the Alpha Subunit of the Receptor. Benton and Davis. Screening Lambdagt Recombinant Proc. Nat. Acad. Sci., USA; vol. 92: pp. 3137–3141, 1995 Clones by Hybridization to Single Plaques in situ. Science: Maciejewski J. P. et al. Intracellular Expression of Anti 35 body Fragments Directed against Human Immunodeficiency vol. 196(4286): pp. 1802, 1977 Virus Reverse Transcriptase Prevents HIV Infection in vitro. Clemmons D. R. IGF Binding Proteins and their Func Nat. Med... vol. 1: pp. 667-673, 1995 tions. Mol. Reprod. Dev.; vol. 35: pp. 368–374, 1993 Marasco W. A. et al. Design, Intracellular Expression, and Loddick S. A. et al. Displacement of Insulin-like Growth Activity of a Human Anti-human Immunodeficiency Virus 40 Factors from their Binding Proteins as a Potential Treatment Type I gp120 Single Chain Antibody. Proc. Nat. Acad. Sci., for Stroke. Proc. Natl. Acad. Sci., U.S.A.: vol. 95: pp. USA; vol. 90: pp. 7889 7893, 1993 1894-1898, 1998 Levy Mintz Pet al. Intracellular Expression of Single Swift G. H. et al. Tissue-specific expression of the rat Chain Variable Fragment to Inhibit Early Stages of the Virla pancreatic elastase I gene in transgenic mice. Cell: Vol. Life Cycle by Targeting Human Immunodeficiency Virus 45 38:pp. 639-646, 1984 Type 1 Integrase. J. Virol.; vol. 70: pp. 8821–8832, 1996 Hanahan D. Heritable formation of pancreatic beta-cell Duan L. et al. Intracellular Immunization Against Human tumours in transgenic mice expressing recombinant insulin/ Immunodeficiency Virus Type I Infection of Human T simian virus 40 oncogenes. Nature; vol. 315: pp. 115-122. Lymphocytes: Utility of Anti-rev Single Chain Variable 1985 Fragment. Hum. Gene Ther.: Vol. 6(12): pp. 1561–1573, 50 Grosschedl R. etal Introduction of a mu immunoglobulin 1995 gene into the mouse germ line: specific expression in Kim S. H. et al. Expression and Characterization of lymphoid cells and synthesis of functional antibody. Cell: Recombinant Single-chain Fv and Fv Fragments Derived vol. 38: pp. 647–658, 1984 from a Set of Catalytic Antibodies. Mol. Immunol.; vol. 55 Leder A et al. Consequences of widespread deregulation 34(12–13): pp. 891–906, 1997 of the c-myc gene in transgenic mice: multiple neoplasms Choi C. W. et al. Biodistribution of 18F- and 125I-labelled and normal development. Cell: Vol. 45: pp. 485-495, 1986 Anti-Tac Disulfide-stabilized Fv Fragments in Nude Mice Pinkert C. A. et al. An albumin enhancer located 10 kb with Interleukin 2 a Receptor-positive Tumor Xenografts. upstream functions along with its promoter to direct Cancer Research; vol. 55: pp. 5323–5329, 1995 60 efficient, liver-specific expression in transgenic mice. Genes Colcher D. et al. Pharmacokinetics and Biodistribution of Dev.; vol. 1: pp. 268–276, 1987 Genetically-engineered Antibodies. Q J Nucl Med.; Vol. Krumlauf R. et al. Developmental regulation of alpha 42(4): pp. 225-41, 1998 fetoprotein genes in transgenic mice. Mol. Cell. Biol.; Vol. Pavlinkova G. et al. Pharmacokinetics and Biodistribution 5: pp. 1639–1648, 1985 of Engineered Single-chain Antibody Constructs of MAb 65 Kelsey G. D. et al. Species- and tissue-specific expression CC49 in Colon Carcinoma Xenografts. J. Nucl. Med.; Vol. of human alpha 1-antitrypsin in transgenic mice. Genes 40(9): pp. 1536–46, 1999 Dev.; vol. 1: pp. 161-171, 1987 US 7,037,894 B2 89 90 Magram J. et al. Developmental regulation of a cloned Jaeger K-E. et al. Bacterial Biocatalysts: Molecular adult beta-globin gene in transgenic mice. Nature; Vol. 315: Biology, Three-Dimensional Structures, and Biotechnologi pp. 338-340, 1985 cal Applications of Lipases. Annu. Rev. Microbiol. vol. 53: Readhead C. et al. Expression of a myelin basic protein pp. 315-51, 1999 gene in transgenic shiverer mice: correction of the dysmy 5 Carrea G. and Riva S. Properties and Synthetic Applica elinating phenotype. Cell: Vol. 48: pp. 703–712, 1987 tions of Enzymes in Organic Solvents. Angew Chem Int Ed Shani M. Tissue-specific expression of rat myosin light Engl. Vol. 39(13): pp. 2226-2254, 2000 chain 2 gene in transgenic mice. Nature; Vol. 314: pp. Stemmer W. P. C. Rapid Evolution of a Protein in Vitro by 283 286, 1985 DNA Shuffling. Nature. Vol. 370: pp. 389–391, 1994 Mason A. J. et al. The hypogonadal mouse: reproductive 10 Zhao H. and Arnold F. H. Optimization of DNA Shuffling functions restored by gene therapy. Science; Vol. 234: pp. for High Fidelity Recombination. Nucleic Acids Res. Vol. 1372–1378, 1986 25: pp. 1307–1308, 1997 Smith D. B. and Johnson K. S. Single-step purification of Zhao H. et al. Molecular Evolution by Staggered Exten polypeptides expressed in Escherichia coli as fusions with 15 sion Process (StEP) in Vitro Recombination. Nat. Biotech glutathione S-transferase. Gene; vol. 67: pp. 31–40, 1988 nol. Vol 16: pp. 258-261, 1998 Lei S. P. et al. Characterization of the Erwinia carotovora Shao Z. et al. Random-priming in Vtro Recombination: an pelB gene and its product pectate lyase. J. Bacteril., vol. 169: Effective Tool for Directed Evolution. Nucleic Acids Res. pp. 4379, 1987 Vol. 26: pp. 681–683, 1998 Kim S. H. et al. Expression and characterization of Vo-Dinh T. and Cullum B. Biosensors and Biochips: recombinant single-chain Fv and Fv fragments derived from Advances in Biological and Medical Diagnostics. Fresenius a set of catalytic antibodies. Mol. Immunol, Vol. 34: pp. J Anal Chem. Vol. 366: pp. 540–551, 2000 891-906, 1997 Patkar et al. Effect of Mutations in Candida Antarctica B Cale J. M. et al. Optimization of a reverse transcription Lipase. Chem.& Phys. Of Lipids. Vol. 93, pp. 95–101, 1998 polymerase chain reaction (RT-PCR) mass assay for low 25 abundance mRNA. Methods Mol. Biol.; vol. 105: pp. Rotticci-Mulder et al. Expression in Pichia Pastoris of 351. 71, 1998 Candida Antarctica Lipase B and Lipase B Fused to a Weis J. H. et al. Detection of rare mRNAs via quantitative Cellulose Binding Domain. Prot. Expr. & Purif. Vol. 21, pp. RT-PCR. Trends Genet.: vol. 8(8): pp. 263-4, 1992 386–392, 2001 30 Winkler & Stuckmann. Glycogen, Hyaluronate, and Frohman M. A. On beyond classic RACE (rapid ampli Some Other Polysaccharides Greatly Enhance the Forma fication of cDNA ends). PCR Methods Appl.: vol.4(1): pp. tion of Exolipase by Serratia marcescens. J. Bacteriol. Vol. S40-58, 1994 138, pp. 663-670, 1979 Adams P. D. et al. Extending the limits of molecular replacement through combined simulated annealing and Liebeton et al. Directed Evolution of an Enantioselective maximum-likelihood refinement. Acta Crystallogr. D. Biol. Lipase. Chem. & Biol. 2000. Vol. 7 (9), pp. 709–718 Crystallogr.: Vol. 55 (Pt 1): pp. 181–90, 1999 Schmidt-Dannert. Recombinant Microbial Lipases for Schwarze S. R. et al. In Vivo Protein Transduction: Biotechnological Applications. Bioorg. & Med. Chem. Vol. Delivery of a Biologically Active Protein into the Mouse. 7, pp. 2123-2130, 1999 Science; vol. 285: pp. 1565–72, 1999 Takagi et al. Enhancement of the Thermostability of 40 Subtilisin E by Introduction of a Disulfide Bond Engineered Hoffman R. M. Topical liposome targeting of dyes, on the Basis of Structural Comparison with a Thermophilic melanins, genes, and proteins selectively to hair follicles. J. Serine Protease. JBC. Vol. 265(12); pages 6874–78, 1990 Drug Target.: vol. 5(2): pp. 67–74, 1998 Mansfeld et al. Extreme Stabilization of a Thermolysin Pluckthun A. et al. Catalytic antibodies: contributions like Protease by an Engineered Disulfide Bond. JBC. Vol. from engineering and expression in Escherichia coli. Ciba 45 272(17); pages 11152–56, 1997 Found. Symp.; Vol. 159: pp. 103–12; discussion 112 7, 1991 Takagi et al. Engineering Subtilisin E for Enhanced Guogiang J. et al. Dimerization Inhibits the Activity of Stability and Activity in Polar Organic Solvents. J. Biochem. Receptor-like Protein-tyrosine Phosphatase alpha. Nature: Vol. 127: pages 617–25, 2000 vol. 401: pp.606–610, 1999 Mitchinson and Wells. Protein Engineering of Disulfide BIC, Explorer, Business Opportunities in Technology 50 Bonds in Subtilisin BPN'. Biochemistry. Vol. 28(11); pages Commercialization. 4807–15, 1989 Illanes A. Stability of biocatalysts. Elec.J.Biotech., vol. Zhao and Arnold. Directed Evolution Converts Subtilisin 2(1): pp. 7–15, 1999 E into a Functional Equivalent of Thermitase. Protein Eng. DeSantis G. and Jones J. B. Chemical modification of Vol.12(1): pages 47–53, 1999 enzymes for enhanced functionality. Curr. Op. Biotech., vol. 55 The invention claimed and described herein is not to be 10(4): pp. 324-340, 1999 limited in Scope by the specific embodiments, including but not limited to the deposited microorganism embodiments, Govardhan C. P. Crosslinking of enzymes for improved herein disclosed since these embodiments are intended as stability and performance. Curr Opin Biotechnol. Aug; vol illustrations of several aspects of the invention. Indeed, 10(4):331-5, 1999 60 various modifications of the invention in addition to those Beguin P. Hybrid enzymes. Curr. Op. Biotech., vol. 10(4): shown and described herein will become apparent to those pp. 336-340, 1999 skilled in the art from the foregoing description. Such Haring D. and Schreier P. Cross-linked enzyme crystals. modifications are also intended to fall within the scope of the Curr Opin Chem Biol.; vol. 3(1): pp. 35–8, 1999 appended claims. Moreno-Hagelsieb G. and Soberon X. Protein engineer 65 A number of references are cited herein, the entire dis ing as a powerful tool for the chemical modification of closures of which are incorporated herein, in their entirety, enzymes. Biol Res., vol. 29(1): pp. 127–40, 1996 by reference.

US 7,037,894 B2 93

-continued Asp Ala Leu Ala Val Ser Ala Pro Ser Val Trp Gln Gln Thr Thr Gly 145 15 O 155 160

Ser Ala Lieu. Thir Thr Ala Lieu Arg Asn Ala Gly Gly Lieu. Thr Glin Ile 1.65 170 175

Val Pro Thr Thr Asn Leu Tyr Ser Ala Thr Asp Glu Ile Val Glin Pro 18O 185 19 O Glin Val Ser Asn. Ser Pro Leu Asp Ser Ser Tyr Lieu Phe Asin Gly Lys 195 200 2O5 Asn Val Glin Ala Glin Ala Val Cys Gly Pro Leu Phe Val Ile Asp His 210 215 220 Ala Gly Ser Leu Thir Ser Glin Phe Ser Tyr Val Val Gly Arg Ser Ala 225 230 235 240 Leu Arg Ser Thr Thr Gly Glin Ala Arg Ser Ala Asp Tyr Gly Ile Thr 245 250 255 Asp Cys Asn Pro Leu Pro Ala Asn Asp Lieu. Thr Pro Glu Glin Lys Val 260 265 27 O

Ala Ala Ala Ala Leu Lleu Ala Pro Ala Ala Ala Ala Ile Val Ala Gly 275 280 285 Pro Lys Glin Asn Cys Glu Pro Asp Leu Met Pro Tyr Ala Arg Pro Phe 29 O 295 3OO Ala Val Gly Lys Arg Thr Cys Ser Gly Ile Val Thr Pro 305 310 315

<210> SEQ ID NO 3 &2 11s LENGTH 57 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 3 atgg gaattic catcatcatc atcatcacag cagcggccta cctitc.cggitt cqg acco 57

<210> SEQ ID NO 4 &2 11s LENGTH 37 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 4 citcttggcgg cc.gc.ctatoa gggggtgacg atgcc.gg 37

<210 SEQ ID NO 5 &2 11s LENGTH: 68 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 5 atgg gaattic catcatcatc atcatcacag cagcggccta cctitc.cggitt cqg accotgc 60 citattogc 68

<210> SEQ ID NO 6 <211& LENGTH 24 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 6 cgacitcgaac tacatccc.cc totc 24 US 7,037,894 B2 95 96

-continued <210 SEQ ID NO 7 <211& LENGTH 24 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 7 gagaggggga totagttcga gtcg 24

<210 SEQ ID NO 8 &2 11s LENGTH 25 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 8 gggtotgacc tact tcc.cca gitatc 25

<210 SEQ ID NO 9 &2 11s LENGTH 25 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 9 gatactgggg aagtaggtoa gacco 25

<210> SEQ ID NO 10 <211& LENGTH 21 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 10 cgatgagatt toctitcaatt t 21

<210> SEQ ID NO 11 <211& LENGTH 21 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 11 totagaaagg togcggcc.gc c 21

<210> SEQ ID NO 12 <211& LENGTH 22 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 12 gaagctggat to catcatca to 22

<210> SEQ ID NO 13 <211& LENGTH 21 &212> TYPE DNA <213> ORGANISM: Candida antarctica

<400 SEQUENCE: 13 totagaaagg togcggcc.gc c 21

<210> SEQ ID NO 14 &2 11s LENGTH 1074 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis US 7,037,894 B2 97 98

-contin ued <400 SEQUENCE: 14 atgtctgtgc aggctg.ccgg aaaaag.cagt acagaaaaga aatacattgt cggatttaaa 60 cagacaatga gtgccatgag titcc.gc.caag aaaaaggatg ttatttctga aaaaggcgga 120 aaggttcaaa agcaatttaa gtatgttaac gCggcc.gcag caa.cattgga tgaaaaagct 18O gtaaaagaat tgaaaaaaga to C gag.cgtt gcatatgttgg aagaagatca tattgcacat 240 gaatatgcgc aatctgttcc titatgg catt totcaaatta aag.cgc.cggc tottcactict caaggctaca caggctictaa cgtaaaagta gctgttatcg acagoggaat tgacticttct 360 catcctgact taaacgtoag aggcggagca agctt.cgtac cittctgaaac aalacc catac 420

Caggacggca gttcto acgg tacgcatgta gcc.ggtacga. ttgcc.gctdt taataactica 480 atcggtgttc tggg.cgittag cc caag.cgca toattatatg cagtaaaagt gcttgattoa 540 acaggaag.cg gccaatatag citggattatt aacgg cattg agtgggc cat titccaacaat 600 atggatgtta totalacatgag ccittggcgga cctactggitt citacagogct gaaaa.ca.gto 660 gttgacaaag cc.gtttccag cggitatcgto gttgctg.ccg Cagc.cggaala c galaggttca 720 to cq galagca caag cacagt cggctaccct gcaaaatato cittctactat tgcagtaggit gc ggtaaa.ca gcagdalacca aagagctt.ca ttctocagog caggttctga gcttgatgtg 840 atggcticcitg gc gtgtcc at ccaaag caca cittcc to gag gcacttacgg cgcttataac 9 OO ggaacgto.ca tgg.cgacticc toacgttgcc ggagcagcag cgittaattct ttctaag cac 96.O cc.gacittgga caaacgc.gca agtcc.gtgat cgtttagaaa gcactgcaac atatottgga 1020 aactictittct actatogaaa agggittaatc aacgtacaag cagotgcaca atala 1074

<210 SEQ ID NO 15 &2 11s LENGTH 357 &212> TYPE PRT <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 15 Met Ser Wall Glin Ala Al a Gly Lys Ser Ser Thr Glu Lys Lys Tyr Ile 1 5 10 15

Val Gly Phe Lys Gln Thr Met Ser Ala Met Ser Ser Ala Lys 25 30 Asp Val Ile Ser Glu Lys Gly Gly Lys Val Glin Lys Glin Phe 35 40 45

Wall Asn Ala Ala Ala Al a Thr Leu Asp Glu Lys Ala Wall Lys Glu Lieu 50 55 60

Lys Lys Asp Pro Ser Val Ala Tyr Wall Glu Glu Asp His Ile Ala His 65 70 75 8O

Glu Tyr Ala Gln Ser Val Pro Tyr Gly Ile Ser Glin Ile Lys Ala Pro 85 90 95

Ala Lieu. His Ser Glin Gl y Tyr Thr Gly Ser Asn Val Lys Val Ala Wall 100 105 110

Ile Asp Ser Gly Ile As p Ser Ser His Pro Asp Lieu. Asn. Wall Arg Gly 115 120 125

Gly Ala Ser Phe Wall Pro Ser Glu Thr Asn Pro Tyr Glin Asp Gly Ser 130 135 1 4 0

Ser His Gly Thr His Val Ala Gly Thir Ile Ala Ala Lieu. Asn Asn. Ser 145 15 O 155 160

Ile Gly Val Leu Gly Val Ser Pro Ser Ala Ser Leu Tyr Ala Wall Lys 1.65 170 175 US 7,037,894 B2 99 100

-continued

Wall Telu Asp Ser Thr Gly Ser Gly Glin Tyr Ser Trp Ile Ile Asn Gly 18O 185 19 O

Ile Glu Trp Ala Ile Ser Asn Asn Met Asp Wall Ile Asn Met Ser Telu 195 200 2O5

Gly Gly Pro Thr Gly Ser Thr Ala Telu Lys Thr Wall Wall Asp Ala 210 215 220

Wall Ser Ser Gly Ile Wall Wall Ala Ala Ala Ala Gly Asn Glu Gly Ser 225 230 235 240

Ser Gly Ser Thr Ser Thr Wall Gly Tyr Pro Ala Pro Ser Thr 245 250 255

Ile Ala Wall Gly Ala Wall Asn Ser Ser Asn Glin Arg Ala Ser Phe Ser 260 265 27 O

Ser Ala Gly Ser Glu Teu Asp Wall Met Ala Pro Gly Wall Ser Ile Glin 275 280 285

Ser Thr Telu Pro Gly Gly Thr Tyr Gly Ala Asn Gly Thr Ser Met 29 O 295

Ala Thr Pro His Wall Ala Gly Ala Ala Ala Teu Ile Teu Ser His 305 310 315 320

Pro Thr Trp Thr Asn Ala Glin Wall Arg Asp Arg Teu Glu Ser Thr Ala 325 330 335

Thr Telu Gly Asn Ser Phe Tyr Tyr Gly Lys Gly Teu Ile Asn Wall 340 345 35 O

Glin Ala Ala Ala Glin 355

SEQ ID NO 16 LENGTH 269 TYPE PRT ORGANISM: Bacillus subtilis

<400 SEQUENCE: 16

Ala Glin Ser Wall Pro Trp Gly Ile Ser Wall Glin Ala Pro Ala Ala 1 5 10 15

His Asn Arg Gly Teu Thr Gly Ser Gly Wall Wall Ala Wall Telu Asp 25 30

Thr Gly Ile Ser Thr His Pro Asp Telu Asn Ile Arg Gly Gly Ala Ser 35 40 45

Phe Wall Pro Gly Glu Pro Ser Thr Glin Asp Gly Asn Gly His Gly Thr 50 55 60

His Wall Ala Gly Thr Ile Ala Ala Telu Asn Asn Ser Ile Gly Wall Telu 65 70 75

Gly Wall Ala Pro Asn Ala Glu Telu Tyr Ala Wall Wall Telu Gly Ala 85 90 95

Ser Gly Ser Gly Ser Wall Ser Ser Ile Ala Glin Gly Teu Glu Trp Ala 100 105 110

Gly Asn Asn Gly Met His Wall Ala Asn Telu Ser Teu Gly Ser Pro Ser 115 120 125

Pro Ser Ala Thr Teu Glu Glin Ala Wall Asn Ser Ala Thr Ser Arg Gly 130 135 1 4 0

Wall Telu Wall Wall Ala Ala Ser Gly Asn Ser Gly Ala Gly Ser Ile Ser 145 15 O 155 160

Pro Ala Arg Tyr Ala Asn Ala Met Ala Wall Gly Ala Thr Asp Glin 1.65 170 175

Asn Asn Asn Arg Ala Ser Phe Ser Glin Tyr Gly Ala Gly Telu Asp Ile 18O 185 19 O US 7,037,894 B2 101 102

-continued

Wall Ala Pro Gly Wall Asn Wall Glin Ser Thr Pro Gly Ser Thr 195 200 2O5

Ala Ser Telu Asn Gly Thr Ser Met Ala Thr Pro His Wall Ala Gly Ala 210 215 220

Ala Ala Telu Wall Lys Glin Lys Asn Pro Ser Trp Ser Asn Wall Glin Ile 225 230 235 240

Arg Asn His Telu Lys Asn Thr Ala Thr Ser Teu Gly Ser Thr Asn Telu 245 250 255

Gly Ser Gly Teu Wall Asn Ala Glu Ala Ala Thr Arg 260 265

SEQ ID NO 17 LENGTH 275 TYPE PRT ORGANISM: Bacillus subtilis

<400 SEQUENCE: 17

Ala Glin Ser Wall Pro Tyr Gly Ile Ser Glin Ile Ala Pro Ala Telu 1 5 10 15

His Ser Glin Gly Tyr Thr Gly Ser Asn Wall Wall Ala Wall Ile Asp 25 30

Ser Gly Ile Asp Ser Ser His Pro Asp Telu Asn Wall Arg Gly Gly Ala 35 40 45

Ser Phe Wall Pro Ser Glu Thr Asn Pro Glin Asp Gly Ser Ser His 50 55 60

Gly Thr His Wall Ala Gly Thr Ile Ala Ala Leu Asn Asn Ser Ile Gly 65 70 75

Wall Telu Gly Wall Ser Pro Ser Ala Ser Telu Ala Wall Wall Telu 85 90 95

Asp Ser Thr Gly Ser Gly Glin Tyr Ser Trp Ile Ile Asn Gly Ile Glu 100 105 110

Trp Ala Ile Ser Asn Asn Met Asp Wall Ile Asn Met Ser Telu Gly Gly 115 120 125

Pro Thr Gly Ser Thr Ala Teu Lys Thr Wall Wall Asp Ala Wall Ser 130 135 1 4 0

Ser Gly Ile Wall Wall Ala Ala Ala Ala Gly Asn Glu Gly Ser Ser Gly 145 15 O 155 160

Ser Thr Ser Thr Wall Gly Tyr Pro Ala Lys Pro Ser Thr Ile Ala 1.65 170 175

Wall Gly Ala Wall Asn Ser Ser Asn Glin Arg Ala Ser Phe Ser Ser Ala 18O 185 19 O

Gly Ser Glu Telu Asp Wall Met Ala Pro Gly Wall Ser Ile Glin Ser Thr 195 200 2O5

Teu Pro Gly Gly Thr Gly Ala Tyr Asn Gly Thr Met Ala Thr 210 215 220

Pro His Wall Ala Gly Ala Ala Ala Telu Ile Teu Ser His Pro Thr 225 230 235 240

Trp Thr Asn Ala Glin Wall Arg Asp Telu Glu Ser Thr Ala Thr 245 250 255

Teu Gly Asn Ser Phe Tyr Tyr Gly Lys Gly Teu Ile Asn Wall Glin Ala 260 265 27 O

Ala Ala Glin 275 US 7,037,894 B2 103 104

-continued SEQ ID NO 18 LENGTH 26 6 TYPE ORGANISM: Bacillus subtilis

<400 SEQUENCE: 18

Ala Lys Cys Val Ser Tyr Gly Wall Ser Glin Ile Ala Pro Ala Telu 1 5 10 15

His Ser Glin Gly Tyr Thr Gly Ser Asn Wall Wall Ala Wall Ile Asp 25 30

Ser Gly Ile Asp Ser Ser His Pro Asp Telu Asn Wall Ala Gly Gly Ala 35 40 45

Ser Phe Wall Pro Ser Glu Thr Asn Pro Phe Glin Asp Asn Asn Ser His 50 55 60

Gly Thr His Wall Ala Gly Thr Wall Telu Ala Wall Ala Pro Ser Ala Ser 65 70 75

Teu Ala Wall Lys Wall Teu Gly Ala Asp Gly Ser Gly Glin Tyr Ser 85 90 95

Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn Met Asp Wall 100 105 110

Ile Asn Met Ser Teu Gly Gly Pro Ser Gly Ser Ala Ala Telu Ala 115 120 125

Ala Wall Asp Ala Wall Ala Ser Gly Wall Wall Wall Wall Ala Ala Ala 130 135 1 4 0

Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Wall Gly Tyr Pro Gly 145 15 O 155 160

Pro Ser Wall Ile Ala Wall Gly Ala Wall Asp Ser Ser Asn Glin 1.65 170 175

Arg Ala Ser Phe Ser Ser Wall Gly Pro Glu Teu Asp Wall Met Ala Pro 18O 185 19 O

Gly Wall Ser Ile Cys Ser Thr Telu Pro Gly Asn Tyr Gly Ala 195 200

Ser Gly Thr Ser Met Ala Ser Pro His Wall Ala Gly Ala Ala Ala Telu 210 215 220

Ile Telu Ser His Pro Asn Trp Thr Asn Thr Glin Wall Arg Ser Ser 225 230 235 240

Teu Glu Asn Thr Thr Thr Lys Telu Gly Asn Ser Phe Gly 245 250 255

Gly Telu Ile Asn Wall Glin Ala Ala Ala Glin 260 265

SEQ ID NO 19 LENGTH 22 TYPE DNA ORGANISM: Bacillus subtilis

<400 SEQUENCE: 19 cc.gagc gttg catatgtgga ag

SEQ ID NO 20 LENGTH 54 TYPE ORGANISM: Bacillus subtilis

<400 SEQUENCE: 20 ttaggatcct taatgatgat gatgatgatg ttgttgcagot gcttgtacgt to at US 7,037,894 B2 105 106

-continued <210> SEQ ID NO 21 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 21 ggctictaacg tatatgtagc tigittatc 27

<210> SEQ ID NO 22 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 22 gata acagot acatatacgt tagagcc 27

<210> SEQ ID NO 23 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 23 ttaattctitt cittaccacco gacittgg 27

<210> SEQ ID NO 24 &2 11s LENGTH 2.8 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400> SEQUENCE: 24 ccaagtcggg togtaagaaa gaattaac 28

<210> SEQ ID NO 25 &2 11s LENGTH 25 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 25 gacagoggaa tttactictitc. tcatc 25

<210> SEQ ID NO 26 &2 11s LENGTH 25 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 26 gatgagaaga gtaaattic.cg citgtc 25

<210 SEQ ID NO 27 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 27 caaag cacac tittatggagg cacttac 27

<210> SEQ ID NO 28 &2 11s LENGTH 26 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 28 taagtgcc to cataaagtgt gctittg 26 US 7,037,894 B2 107 108

-continued <210 SEQ ID NO 29 &2 11s LENGTH 2.8 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 29 ggctaccotg catattatcc ttctacta 28

<210 SEQ ID NO 30 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 30 agtagaagga taatatgcag g g tagcc 27

<210> SEQ ID NO 31 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 31 agcgcaggitt cittatcttga tigtgatg 27

<210> SEQ ID NO 32 &2 11s LENGTH 27 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400> SEQUENCE: 32 catcacatca agataagaac citgcgct 27

<210 SEQ ID NO 33 &2 11s LENGTH 26 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 33 ccataccagg act acagttc. tcacgg 26

<210> SEQ ID NO 34 &2 11s LENGTH 26 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 34 cc.gtgagaac totagt cotg gtatgg 26

<210 SEQ ID NO 35 &2 11s LENGTH 26 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 35 aagtgcttga ttatacagga agcggc 26

<210 SEQ ID NO 36 &2 11s LENGTH 26 &212> TYPE DNA <213> ORGANISM: Bacillus subtilis

<400 SEQUENCE: 36 gcc.gctitcct gtataatcaa goactt 26

US 7,037,894 B2 111 112 (b) mutating at least one of the selected residues to nations thereof, wherein the protein is obtained from a tyrosine; and method comprising: (c) cross-linking the residue pairs in the presence of an (a) selecting one or more residue pairs in a protein for oxidant; di-tyrosine cross-linking, wherein the di-tyrosine cross-linked protein retains at least (b) mutating at least one of the selected residues to one functional activity displayed by the protein in the tyrosine; absence of di-tyrosine cross-linking, and (c) isolating the protein; and wherein at least one tyrosine of a di-tyrosine cross-link originates from a point mutation to tyrossine. (d) cross-linking tyrosine residue pairs in the presence of 6. The method of claim 5, wherein the di-tyrosine cross 10 an oxidant; link reaction occurs in the presence of one or more oxidants wherein the di-tyrosine cross-linked protein retains at least selected from the group consisting of hydrogen peroxide, one functional activity displayed by the protein in the oxone, magnesium monoperoxyphthalic acid hexahydrate absence of di-tyrosine cross-linking, and (MMPP), a photogenerated oxidant, ammonium persulfate, wherein at least one tyrosine of a di-tyrosine cross-link or any combination thereof. 15 originates from a point mutation to tyrosine. 7. The method of claim 6, wherein the di-tyrosine cross 17. The protein of claim 16, further comprising at least linking is catalyzed by a catalyst selected from the group one amino acid which originates from a point mutation from consisting of polyhistidine, Gly-Gly-His, a tyrosine Such that the amino acid is not cross-linked under metalloporphyrin, a peroxidase or any combination thereof. cross-linking conditions. 8. The protein of claim 5, wherein the protein is a 18. The protein of claim 5 or 16, wherein the protein has hormone, a receptor, a growth factor, an enzyme, an enhanced stability compared to the protein in the absence of antibody, or a fragment of a hormone, a receptor, a growth di-tyrosine cross-linking. factor, an enzyme or an antibody. 19. The protein of claim 16, wherein the protein is an 9. The protein of any of claims 1-4 or 8, wherein the enzyme, a hormone, a growth factor, a receptor, an antibody, protein is part of a pharmaceutical composition. 25 or a fragment of an enzyme, a hormone, a growth factor, a 10. The protein of claim 9, wherein the pharmaceutical receptor, or an antibody. composition comprises a pharmaceutically acceptable car 20. The protein of claim 16, wherein the di-tyrosine rier. cross-link reaction occurs in the presence of one or more 11. The protein of claim 9, wherein the pharmaceutical oxidants selected from the group consisting of hydrogen composition is suitable for in Vivo use in humans. 30 peroxide, oxone, magnesium monoperoxyphthalic acid 12. The protein of claim 1, wherein the protein is a hexahydrate (MMPP), a photogenerated oxidant, ammo chimeric polypeptide comprising a hormone, a receptor, a nium persulfate, or any combination thereof. growth factor, an enzyme, an antibody, or a fragment of an 21. The protein of claim 16, wherein the di-tyrosine enzyme, a hormone, a growth factor, a receptor, or an cross-linking is catalyzed by a catalyst selected from the antibody. 35 group consisting of polyhistidine, Gly-Gly-His, a 13. The protein of any of claims 1–4, 8 or 12, wherein the metalloporphyrin, a peroxidase or any combination thereof. protein is part of a kit. 22. The protein of claim 16, wherein the protein is a 14. A composition comprising a protein of any of claims chimeric polypeptide comprising a hormone, a receptor, a 1–4, 8 or 12. growth factor, an enzyme, an antibody, or a fragment of an 15. The composition of claim 14, wherein the composi 40 enzyme, a hormone, a growth factor, a receptor, or an tion is part of a kit. antibody. 16. An isolated stabilized protein having a functional 23. A composition comprising a protein of claim 16 or 22. activity selected from the group consisting of an enzymatic 24. A kit comprising the protein of claim 16 or 22. activity, an antigen-binding activity, a protein-protein inter 25. A kit comprising the composition of claim 23. action activity, a DNA binding activity, a hormone activity, 45 a receptor activity a growth factor activity, and any combi UNITED STATES PATENT AND TRADEMARK OFFICE CERTIFICATE OF CORRECTION

PATENT NO. 7,037,894 B2 Page 1 of 1 APPLICATION NO. : 09/837235 DATED : May 2, 2006 INVENTOR(S) : Christopher P. Marshall et al. It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Title Page, Item (75), Inventors Please delete Paul B. Marshall, Munich (DE)

Signed and Sealed this Twelfth Day of January, 2010

David J. Kappos Director of the United States Patent and Trademark Office