
(12) United States Patent
Alvarez et al.

(10) Patent No.: US 9,156,897 B2
(45) Date of Patent: *Oct. 13, 2015

(54) FUSION POLYPEPTIDES COMPRISING AN ACTIVE PROTEIN LINKED TO A MUCIN-DOMAIN POLYPEPTIDE

(71) Applicant: Alkermes, Inc., Waltham, MA (US)

(72) Inventors: Juan Alvarez, Chelmsford, MA (US); Jean Chamoun, Waltham, MA (US); Heather C. Losey, Lexington, MA (US)

(73) Assignee: Alkermes, Inc., Waltham, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 13/911,818

(22) Filed: Jun. 6, 2013

(65) Prior Publication Data
US 2013/0338067 A1 Dec. 19, 2013

Related U.S. Application Data

(60) Provisional application No. 61/657,264, filed on Jun. 8, 2012, provisional application No. 61/778,575, filed on Mar. 13, 2013, provisional application No. 61/657,378, filed on Jun. 8, 2012, provisional application No. 61/723,081, filed on Nov. 6, 2012, provisional application No. 61/657,285, filed on Jun. 8, 2012, provisional application No. 61/778,812, filed on Mar. 13, 2013.

(51) Int. Cl.
C07K 14/47 (2006.01)
C07K 14/545 (2006.01)
C07K 14/54 (2006.01)
C07K 14/55 (2006.01)
C07K 14/715 (2006.01)
C07K 14/50 (2006.01)
C07K 14/705 (2006.01)
C07K 14/71 (2006.01)

(52) U.S. Cl.
CPC: C07K 14/4713 (2013.01); C07K 14/4727 (2013.01); C07K 14/50 (2013.01); C07K 14/545 (2013.01); C07K 14/5412 (2013.01); C07K 14/5443 (2013.01); C07K 14/55 (2013.01); C07K 14/70503 (2013.01); C07K 14/71 (2013.01); C07K 14/7155 (2013.01); C07K 2319/00 (2013.01); C07K 2319/02 (2013.01)

(58) Field of Classification Search
CPC: C07K 14/4713; C07K 14/4727; C07K 14/5412; C07K 14/545; C07K 14/55; C07K 14/7155; C07K 14/50; C07K 14/5443; C07K 14/71; C07K 2319/00; C07K 2319/02
See application file for complete search history.

Primary Examiner: James H Alstrum Acevedo
Assistant Examiner: Erinne Dabkowski
(74) Attorney, Agent, or Firm: Elmore Patent Law Group, P.C.; Darlene A. Vanstone; Carolyn S. Elmore

(57) ABSTRACT

The present invention provides fusion proteins comprising a mucin-domain polypeptide covalently linked to an active protein that has improved properties (e.g. pharmacokinetic and/or physicochemical properties) compared to the same active protein not linked to mucin-domain polypeptide, as well as methods for making and using the fusion proteins of the invention.

1 Claim, 21 Drawing Sheets

[U.S. Patent, Oct. 13, 2015, Sheets 1 to 21 of 21, US 9,156,897 B2: drawings for FIGS. 1 to 21. Legible sheet text:]

FIG. 15 (Sheet 15 of 21): study timeline marking IV injection of the mAb cocktail, LPS, SC injection of treatment, scoring, and necropsy on Day 21, with a dosing table (dose, concentration, dose volume) for the saline and anakinra groups.

FIG. 16 (Sheet 16 of 21): "Effect of Single SC Injection of Biologic on Paw Edema".

FIG. 17 (Sheet 17 of 21): "Mean RDB1815 and 1816 After SC or IV Admin to SD Rats"; x-axis: Time (hr).

FIG. 19 (Sheet 19 of 21): chromatogram; legend: molecular weight standards.

FIG. 20 (Sheet 20 of 21): legend: exendin-4 and RDB2203.

FIG. 21 (Sheet 21 of 21): "LSC 12-261 (exendin-4-mucin) Rat PK: Mean"; legend: SC 0.65 mg/kg.

FUSION POLYPEPTIDES COMPRISING AN ACTIVE PROTEIN LINKED TO A MUCIN-DOMAIN POLYPEPTIDE

RELATED APPLICATION(S)

This application claims the benefit of U.S. Provisional Application Nos. 61/657,264, filed on Jun. 8, 2012; 61/778,575, filed Mar. 13, 2013; 61/657,378, filed Jun. 8, 2012; 61/723,081, filed Nov. 6, 2012; 61/657,285, filed Jun. 8, 2012 and 61/778,812, filed Mar. 13, 2013. The entire teachings of the above application(s) are incorporated herein by reference.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on May 31, 2013, is named 4000.3058WO SL.txt and is 27,431 bytes in size.

BACKGROUND OF THE INVENTION

The pharmacokinetics, pharmacodistribution, solubility, stability, enhancement of effector function and receptor binding of protein therapeutics can be significantly influenced by the carbohydrate moiety of glycosylated proteins. In addition, many biologically active peptides and proteins have limited solubility, or become aggregated during recombinant production, requiring complex solubilization and refolding procedures. Furthermore, protein and peptide therapeutics with molecular weights lower than 60 kilodaltons (kD) often suffer from short half-lives due to renal clearance.

Current strategies employed to extend serum half-life of protein therapeutics primarily fall within two general categories: 1) utilization of FcRn-mediated recycling and 2) increase of hydrodynamic volume. Specific approaches which have been described include conjugation, binding, or fusion to FcRn-binding proteins or domains (Fc, albumin) for the former strategy, and multimerization, chemical coupling to polymers or carbohydrates (such as PEG, colominic acid, or hydroxyethyl starch), or incorporation of N-glycosylation sites for the latter. However, the production of Fc-fusion proteins is a time-consuming, inefficient, and expensive process that requires additional manufacturing steps and often complex purification procedures. In addition, chemical coupling strategies, PEGylation being the most widely used, result in significant increases in production costs due to the addition of conjugation and purification steps and reduced overall yields. Recently, other recombinant PEG mimetics produced through fusion of a long, flexible polypeptide sequence, such as those described in U.S. 2010/0239554 A1, have also been described. Although this technology circumvents the additional conjugation step, the added peptide sequence, being non-endogenous, has the potential for immunogenicity.

Mucin proteins and mucin-domains of proteins contain a high degree of glycosylation which structurally allows mucin proteins and other polypeptides comprising mucin domains to behave as stiffened random coils. This stiffened random coil structure, in combination with the branched hydrophilic carbohydrates that make up the heavily glycosylated mucin domains, is particularly useful for increasing the hydrodynamic radius of the active protein beyond what would be expected based on the molecular weight of the expressed protein. Also, because of the high level of glycosylation, addition of a mucin domain has the potential to modify the physicochemical properties of a protein such as charge, solubility and viscoelastic properties of concentrated solutions of the active protein.

The fusion protein compositions and methods of the present invention improve the biological, pharmacological, safety, and/or pharmaceutical properties of an active protein.

SUMMARY OF THE INVENTION

The present invention provides fusion proteins comprising a mucin-domain polypeptide covalently linked to an active protein that has improved properties (e.g. pharmacokinetic and/or physicochemical properties) compared to the same active protein not linked to mucin-domain polypeptide, as well as methods for making and using the fusion proteins of the invention.

In one embodiment the invention provides a fusion protein comprising a mucin-domain polypeptide linked to an active protein wherein at least one pharmacokinetic or physicochemical property of the active protein is improved as compared to the corresponding active protein that is not fused to the mucin-domain polypeptide.

In one embodiment the invention provides nucleic acid sequences encoding the fusion protein of the invention as well as vectors and host cells for expressing the nucleic acids of the invention.

In one embodiment the invention provides methods for extending the serum half life of a therapeutic active protein.

In one embodiment the invention provides methods for improving the solubility of a therapeutic active protein.

In one embodiment the invention provides pharmaceutical compositions comprising the fusion proteins of the invention.

In one embodiment, the invention provides methods of treating diseases, conditions and disorders in a subject in need of treatment using the pharmaceutical compositions of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Coomassie Blue-stained SDS/polyacrylamide gel (A) and IEF gel (B) of IL1Ra mucin constructs. Arrows indicate the proteins of interest. Multiplicity of bands in the IEF gel indicates differentially charged species, most likely due to differences in N-glycosylation.

FIG. 2. Gel filtration chromatogram of RDB1813 (grey) and molecular size standards (black). Molecular weights of the standards and apparent molecular weight of RDB1813 are listed above each eluting peak.

FIG. 3. Gel filtration chromatogram of RDB1814 (grey) and molecular size standards (black). Molecular weights of the standards and apparent molecular weight of RDB1814 are listed above each eluting peak.

FIG. 4. Gel filtration chromatogram of RDB1826 (grey) and molecular size standards (black). Molecular weights of the standards and apparent molecular weight of RDB1826 are listed above each eluting peak.

FIG. 5. Gel filtration chromatogram of RDB1815 (grey) and molecular size standards (black). Molecular weights of the standards and apparent molecular weight of RDB1815 are listed above each eluting peak.

FIG. 6. Gel filtration chromatogram of RDB1816 (grey) and molecular size standards (black). Molecular weights of the standards and apparent molecular weight of RDB1816 are listed above each eluting peak.

FIG. 7. Inhibition of IL1β signaling by RDB1813 (2TR) and RDB1814 (4TR) in the HEK-blue assay. Activity of IL1β as a function of its concentration in the absence of inhibition. Inhibition by RDB1813, RDB1814 and IL1Ra (Anakinra) was measured in the presence of 15 pM of IL1β. All measurements were made in duplicate. Estimated values of IC50 are reported in the top right corner of the figure.

FIG. 8. Inhibition of IL1β signaling by RDB1826 (6TR) in the HEK-blue assay. Activity of IL1β as a function of its concentration in the absence of inhibition. Inhibition by RDB1826 and IL1Ra (Anakinra) was measured in the presence of 15 pM of IL1β. All measurements were made in duplicate. Estimated values of IC50 are reported in the top right corner of the figure.

FIG. 9. Inhibition of IL1β signaling by RDB1815 (8TR) and RDB1816 (12TR) in the HEK-blue assay. Activity of IL1β as a function of its concentration in the absence of inhibition. Inhibition by RDB1815, RDB1816 and IL1Ra (Anakinra) was measured in the presence of 15 pM of IL1β. All measurements were made in duplicate. Estimated values of IC50 are reported in the top right corner of the figure.

FIG. 10. Surface Plasmon Resonance (SPR) measurements of RDB1813 binding to immobilized mouse IL1RI receptor. Sensorgrams and fitted curves are in grey and black, respectively. The kinetic parameters for RDB1813 and Anakinra (data not shown) are in the inset.

FIG. 11. Surface Plasmon Resonance (SPR) measurements of RDB1814 binding to immobilized mouse IL1RI receptor. Sensorgrams and fitted curves are in grey and black, respectively. The kinetic parameters for RDB1814 and Anakinra (data not shown) are in the inset.

FIG. 12. Surface Plasmon Resonance (SPR) measurements of RDB1826 binding to immobilized mouse IL1RI receptor. Sensorgrams and fitted curves are in grey and black, respectively. The kinetic parameters for RDB1826 and Anakinra (data not shown) are in the inset.

FIG. 13. Surface Plasmon Resonance (SPR) measurements of RDB1815 binding to immobilized mouse IL1RI receptor. Sensorgrams and fitted curves are in grey and black, respectively. The kinetic parameters for RDB1815 and Anakinra (data not shown) are in the inset.

FIG. 14. Surface Plasmon Resonance (SPR) measurements of RDB1816 binding to immobilized mouse IL1RI receptor. Sensorgrams and fitted curves are in grey and black, respectively. The kinetic parameters for RDB1816 and Anakinra (data not shown) are in the inset.

FIG. 15. Experimental design for evaluation of RDB1816 in the mouse CAIA model of inflammation.

FIG. 16. The inhibitory effects of a single 20 mg/kg injection of RDB1816, IL1Ra (Anakinra), and saline control in the mouse CAIA model of inflammation. The black arrows indicate the days of injection with the monoclonal antibody cocktail (mAb) and with LPS and treatment molecule. A group of eight mice was used for each treatment and each time point represents the mean from each group.

FIG. 17. Pharmacokinetic profile of RDB1815 and RDB1816 in rat. The plasma concentration-time profiles are recorded for RDB1815 i.v. (single injection of 2.1 mg/kg [mpk]), SC injection (single injection of 5.6 mpk) and for RDB1816 i.v. (single injection of 2.4 mpk), SC injection (single injection of 6.4 mpk). Symbols represent the mean from three different rats per condition. Pharmacokinetic parameters for the SC groups are summarized in the table.

FIG. 18. Coomassie Blue-stained SDS/polyacrylamide gel of exendin-4 mucin construct RDB2203.

FIG. 19. Gel filtration chromatogram of RDB2203 (grey) and molecular size standards (grey). Molecular weights of the standards are listed above each eluting peak.

FIG. 20. GLP-1R activity assay for RDB2203 and exendin-4.

FIG. 21. Pharmacokinetic profile of RDB2203.

DETAILED DESCRIPTION OF THE INVENTION

A description of preferred embodiments of the invention follows.

Definitions

As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

As used in the specification and claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof.

The terms "polypeptide", "peptide", and "protein" are used interchangeably herein to refer to polymers of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The terms also encompass an amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, or any other manipulation, such as conjugation with a labeling component.

As used herein the term "amino acid" refers to either natural and/or unnatural or synthetic amino acids, including but not limited to glycine and both the D or L optical isomers, and amino acid analogs and peptidomimetics. Standard single or three letter codes are used to designate amino acids.

The term "non-naturally occurring," as applied to sequences and as used herein, means polypeptide or polynucleotide sequences that do not have a counterpart to, are not complementary to, or do not have a high degree of homology with a wild-type or naturally-occurring sequence found in a mammal. For example, a non-naturally occurring polypeptide may share no more than 99%, 98%, 95%, 90%, 80%, 70%, 60%, 50% or even less amino acid sequence identity as compared to a natural sequence when suitably aligned.

The terms "glycosylation" and "glycosylated" are used interchangeably herein to mean the carbohydrate portion of a protein or the process by which sugars are post-translationally attached to proteins during their production in cells to form glyco-proteins. Glycosylation of proteins is a post-translational event and refers to the attachment of glycans to serine and threonine and, to a lesser extent, to hydroxyproline and hydroxylysine in the case of O-linked glycosylation, or asparagine, in the case of N-linked glycosylation.

A "fragment" is a truncated form of a native active protein that retains at least a portion of the therapeutic and/or biological activity. A "variant" is a protein with sequence homology to the native active protein that retains at least a portion of the therapeutic and/or biological activity of the active protein. For example, a variant protein may share at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity with the reference active protein. As used herein, the term "active protein moiety" includes proteins modified deliberately, as for example, by site directed mutagenesis, insertions, or accidentally through mutations.

A "host cell" includes an individual cell or cell culture which can be or has been a recipient for the subject vectors. Host cells include progeny of a single host cell. The progeny may not necessarily be completely identical (in morphology or in genomic or total DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation. A host cell includes cells transfected in vivo with a vector of this invention.

"Isolated," when used to describe the various polypeptides disclosed herein, means polypeptide that has been identified and separated and/or recovered from a component of its natural environment. Contaminant components of its natural environment are materials that would typically interfere with diagnostic or therapeutic uses for the polypeptide, and may include enzymes, hormones, and other proteinaceous or non-proteinaceous solutes. As is apparent to those of skill in the art, a non-naturally occurring polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, does not require "isolation" to distinguish it from its naturally occurring counterpart. In addition, a "concentrated", "separated" or "diluted" polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, is distinguishable from its naturally occurring counterpart in that the concentration or number of molecules per volume is generally greater than that of its naturally occurring counterpart. In general, a polypeptide made by recombinant means and expressed in a host cell is considered to be "isolated."

An "isolated" polynucleotide or polypeptide-encoding nucleic acid or other polypeptide-encoding nucleic acid is a nucleic acid molecule that is identified and separated from at least one contaminant nucleic acid molecule with which it is ordinarily associated in the natural source of the polypeptide-encoding nucleic acid. An isolated polypeptide-encoding nucleic acid molecule is other than in the form or setting in which it is found in nature. Isolated polypeptide-encoding nucleic acid molecules therefore are distinguished from the specific polypeptide-encoding nucleic acid molecule as it exists in natural cells. However, an isolated polypeptide-encoding nucleic acid molecule includes polypeptide-encoding nucleic acid molecules contained in cells that ordinarily express the polypeptide where, for example, the nucleic acid molecule is in a chromosomal or extra-chromosomal location different from that of natural cells.

"Conjugated", "linked," "fused," and "fusion" are used interchangeably herein. These terms refer to the joining together of two or more chemical elements or components, by whatever means including chemical conjugation or recombinant means. For example, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence. Generally, "operably linked" means that the DNA sequences being linked are contiguous, and in reading phase or in-frame. An "in-frame fusion" refers to the joining of two or more open reading frames (ORFs) to form a continuous longer ORF, in a manner that maintains the correct reading frame of the original ORFs. Thus, the resulting recombinant fusion protein is a single protein containing two or more segments that correspond to polypeptides encoded by the original ORFs (which segments are not normally so joined in nature).

In the context of polypeptides, a "linear sequence" or a "sequence" is an order of amino acids in a polypeptide in an amino to carboxyl terminus direction in which residues that neighbor each other in the sequence are contiguous in the primary structure of the polypeptide. A "partial sequence" is a linear sequence of part of a polypeptide that is known to comprise additional residues in one or both directions.

"Heterologous" means derived from a genotypically distinct entity from the rest of the entity to which it is being compared. For example, a glycine rich sequence removed from its native coding sequence and operatively linked to a coding sequence other than the native sequence is a heterologous glycine rich sequence. The term "heterologous" as applied to a polynucleotide or a polypeptide means that the polynucleotide or polypeptide is derived from a genotypically distinct entity from that of the rest of the entity to which it is being compared.

The terms "polynucleotides", "nucleic acids", "nucleotides" and "oligonucleotides" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components.

"Recombinant" as applied to a polynucleotide means that the polynucleotide is the product of various combinations of in vitro cloning, restriction and/or ligation steps, and other procedures that result in a construct that can potentially be expressed in a host cell.

The terms "gene" or "gene fragment" are used interchangeably herein. They refer to a polynucleotide containing at least one open reading frame that is capable of encoding a particular protein after being transcribed and translated. A gene or gene fragment may be genomic or cDNA, as long as the polynucleotide contains at least one open reading frame, which may cover the entire coding region or a segment thereof. A "fusion gene" is a gene composed of at least two heterologous polynucleotides that are linked together.

"Homology" or "homologous" refers to sequence similarity or interchangeability between two or more polynucleotide sequences or two or more polypeptide sequences. When using a program such as BestFit to determine sequence identity, similarity or homology between two different amino acid sequences, the default settings may be used, or an appropriate scoring matrix, such as blosum45 or blosum80, may be selected to optimize identity, similarity or homology scores. Preferably, polynucleotides that are homologous are those which hybridize under stringent conditions as defined herein and have at least 70%, preferably at least 80%, more preferably at least 90%, more preferably 95%, more preferably 97%, more preferably 98%, and even more preferably 99% sequence identity to those sequences.

The terms "stringent conditions" or "stringent hybridization conditions" include reference to conditions under which a polynucleotide will hybridize to its target sequence, to a detectably greater degree than other sequences (e.g., at least 2-fold over background). Generally, stringency of hybridization is expressed, in part, with reference to the temperature and salt concentration under which the wash step is carried out. Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short polynucleotides (e.g., 10 to 50 nucleotides) and at least about 60°C for long polynucleotides (e.g., greater than 50 nucleotides). For example, "stringent conditions" can include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and three washes for 15 min each in 0.1×SSC/1% SDS at 60 to 65°C. Alternatively, temperatures of about 65°C, 60°C, 55°C, or 42°C may be used. SSC concentration may be varied from about 0.1 to 2×SSC, with SDS being present at about 0.1%. Such wash temperatures are typically selected to be about 5°C to 20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for nucleic acid hybridization are well known and can be found in Sambrook, J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Vol. 1-3, Cold Spring Harbor Press, Plainview N.Y.; specifically see Volume 2 and Chapter 9. Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents include, for instance, sheared and denatured salmon sperm DNA at about 100-200 µg/ml. Organic solvent, such as formamide at a concentration of about 35-50% v/v, may also be used under particular circumstances, such as for RNA:DNA hybridizations. Useful variations on these wash conditions will be readily apparent to those of ordinary skill in the art.

The terms "percent identity" and "% identity," as applied to polynucleotide sequences, refer to the percentage of residue matches between at least two polynucleotide sequences aligned using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible way, gaps in the sequences being compared in order to optimize alignment between two sequences, and therefore achieve a more meaningful comparison of the two sequences. Percent identity may be measured over the length of an entire defined polynucleotide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment taken from a larger, defined polynucleotide sequence, for instance, a fragment of at least 45, at least 60, at least 90, at least 120, at least 150, at least 210 or at least 450 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment length supported by the sequences shown herein, in the Tables, Figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.

"Percent (%) amino acid sequence identity," with respect to the polypeptide sequences identified herein, is defined as the percentage of amino acid residues in a query sequence that are identical with the amino acid residues of a second, reference polypeptide sequence or a portion thereof, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent amino acid sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, ALIGN or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. Percent identity may be measured over the length of an entire defined polypeptide sequence, for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for example, over the length of a fragment [...] shown herein, in the Tables, Figures or Sequence Listing, may be used to describe a length over which percentage identity may be measured.

A "vector" is a nucleic acid molecule, preferably self-replicating in an appropriate host, which transfers an inserted nucleic acid molecule into and/or between host cells. The term includes vectors that function primarily for insertion of DNA or RNA into a cell, replication of vectors that function primarily for the replication of DNA or RNA, and expression vectors that function for transcription and/or translation of the DNA or RNA. Also included are vectors that provide more than one of the above functions. An "expression vector" is a polynucleotide which, when introduced into an appropriate host cell, can be transcribed and translated into a polypeptide(s). An "expression system" usually connotes a suitable host cell comprised of an expression vector that can function to yield a desired expression product.

"Degradation resistance," as applied to a polypeptide, refers to the ability of the polypeptides to withstand degradation in blood or components thereof, which typically involves proteases in the serum or plasma, or within a formulation intended as a storage or delivery vehicle for a protein. The degradation resistance can be measured by combining the protein with human (or mouse, rat, monkey, as appropriate) blood, serum, plasma, or a formulation, typically for a range of days (e.g. 0.25, 0.5, 1, 2, 4, 8, 16 days), at specified temperatures such as -80°C, -20°C, 0°C, 4°C, 25°C, and 37°C. The intact protein in the samples is then measured using standard protein quantitation techniques. The time point where 50% of the protein is degraded is the "degradation half-life" of the protein.

The term "half-life" typically refers to the time required for the plasma concentration of a drug to be reduced by one-half. The terms "half-life", "t1/2", "elimination half-life" and "circulating half-life" are used interchangeably herein.

The "hydrodynamic radius" is the apparent radius (Rh in nm) of a molecule in a solution calculated from diffusional properties. The "hydrodynamic radius" of a protein affects its rate of diffusion in aqueous solution as well as its ability to migrate in gels of macromolecules. The hydrodynamic radius of a protein is influenced by its molecular weight as well as by its structure, including shape and compactness, and its hydration state. Methods for determining the hydrodynamic radius are well known in the art, such as by the use of DLS and size exclusion chromatography. Most proteins have globular structure, which is the most compact three-dimensional structure a protein can have with the smallest hydrodynamic radius. Some proteins adopt a random and open, unstructured, or linear conformation and as a result have a much larger hydrodynamic radius compared to typical globular proteins of similar molecular weight.

"Physiological conditions" refer to a set of conditions in a living host as well as in vitro conditions, including temperature, salt concentration, pH, that mimic those conditions of a living subject. A host of physiologically relevant conditions for use in in vitro assays have been established. Generally, a physiological buffer contains a physiological concentration of salt and is adjusted to a neutral pH ranging from about 6.5 to about 7.8, and preferably from about 7.0 to about 7.5. A variety of physiological buffers is listed in Sambrook et al. (1989). Physiologically relevant temperature ranges from about 25°C
to about 38°C., and preferably from about 35° C. taken from a larger, defined polypeptide sequence, for to about 37° C. instance, a fragment of at least 15, at least 20, at least 30, at “Controlled release agent”, “slow release agent”, “depot least 40, at least 50, at least 70 or at least 150 contiguous 65 formulation' or “sustained release agent” are used inter residues. Such lengths are exemplary only, and it is under changeably to refer to an agent capable of extending the stood that any fragment length Supported by the sequences duration of release of a polypeptide of the invention relative to US 9,156,897 B2 10 the duration of release when the polypeptide is administered The term “therapeutically effective dose regimen’, as used in the absence of agent. Different embodiments of the present herein, refers to a schedule for consecutively administered invention may have different release rates, resulting in differ doses of an active protein, either alone or as a part of a fusion ent therapeutic amounts. protein composition, wherein the doses are given in therapeu The term 'antagonist', as used herein, includes any mol tically effective amounts to result in sustained beneficial ecule that partially or fully blocks, inhibits, or neutralizes a effect on any symptom, aspect, measured parameter or char biological activity of a native polypeptide disclosed herein. acteristics of a disease state or condition. Methods for identifying antagonists of a polypeptide may Fusion Proteins comprise contacting a native polypeptide with a candidate In various aspects the invention provides fusion proteins antagonist molecule and measuring a detectable change in 10 one or more biological activities normally associated with the that comprise a mucin-domain polypeptide linked to an active native polypeptide. In the context of the present invention, protein. Such proteins are also referred to herein as “muciny antagonists may include proteins, nucleic acids, carbohy lated proteins. 
As used herein, a “fusion protein’ of the drates, antibodies or any other molecules that decrease the invention comprises a mucin-domain polypeptide linked to effect of an active protein. 15 an active protein. In one embodiment the mucin-domain The term “' is used in the broadest sense and polypeptide and the active protein normally exist in separate includes any molecule that mimics a biological activity of a proteins and are brought together in the fusion protein; or they native polypeptide disclosed herein. Suitable agonist mol may normally exist in the same protein but are placed in a new ecules specifically include agonist antibodies or antibody arrangement in the fusion protein. The compositions and fragments, fragments or amino acid sequence variants of methods of the invention are particularly useful for enhancing native polypeptides, peptides, Small organic molecules, etc. the pharmacokinetic properties, such as half-life of an active Methods for identifying of a native polypeptide may protein when fused with a mucin-domain polypeptide. In one comprise contacting a native polypeptide with a candidate embodiment the fusion proteins of the invention retain all or agonist molecule and measuring a detectable change in one or a portion of the biologic and/or therapeutic activity of the more biological activities normally associated with the native 25 corresponding active protein not linked to a mucin-domain polypeptide. polypeptide. In one embodiment the therapeutic/biological Activity” for the purposes herein refers to an action or activity of the active protein is improved when fused with a effect of a component of a fusion protein consistent with that mucin-domain polypeptide to form a fusion protein of the of the corresponding native active protein, wherein “biologi invention. 
cal activity” or “bioactivity” refers to an in vitro or in vivo 30 In one embodiment, the fusion protein in accordance with biological function or effect, including but not limited to the invention specifically excludes immunoglobulin mol receptor binding, antagonist activity, agonist activity, or a ecules or any molecules containing an Fc domain, or any cellular or physiologic response. fragment thereof. In one embodiment, the fusion protein or As used herein, “treatment' or “treating,” or “palliating or any portion thereof is not glycosylated by C-13, galactosyl “ameliorating is used interchangeably herein. These terms 35 transferase or B1,6-acetylglucosaminyltransferase. In one refer to an approach for obtaining beneficial or desired results embodiment, the fusion protein does not bind an antibody including but not limited to a therapeutic benefit and/or a specific for C.Gal. In one embodiment the fusion protein of the prophylactic benefit. By therapeutic benefit is meant eradica invention does not bind a Gal O.1, 3Gal specific antibody. tion or amelioration of the underlying disorder being treated. Mucin proteins and mucin-domains of proteins contain a Also, a therapeutic benefit is achieved with the eradication or 40 high degree of glycosylation which structurally allows mucin amelioration of one or more of the physiological symptoms proteins and other polypeptides comprising mucin domains associated with the underlying disorder Such that an improve to behave as stiffened random coils. This stiffened random ment is observed in the subject, notwithstanding that the coiled structure in combination with the hydrophilic subject may still be afflicted with the underlying disorder. 
For branched hydrophilic carbohydrates that make up the heavily prophylactic benefit, the compositions may be administered 45 glycosylated mucin domains is particularly useful in for to a Subject at risk of developing a particular disease, or to a increasing the hydrodynamic radius of the active protein Subject reporting one or more of the physiological symptoms beyond what would be expected based on the molecular of a disease, even though a diagnosis of this disease may not weight of the expressed protein. Also because of the high have been made. level of glycosylation, addition of a mucin domain also has A “therapeutic effect”, as used herein, refers to a physi 50 the potential to modify the physicochemical properties of a ologic effect, including but not limited to the cure, mitigation, protein such as charge, Solubility and viscoelastic properties amelioration, or prevention of disease in humans or other of concentrated solutions of the active protein. Mucinylated animals, or to otherwise enhance physical or mental well fusion proteins of the invention have several advantages over being of humans or animals, caused by a fusion protein of the the prior art strategies for extending the half life of proteins. invention other than the ability to induce the production of an 55 Fusion proteins of the invention may be produced via stan antibody againstan antigenic epitope possessed by the active dard expression means without the need for further conjuga protein. Determination of a therapeutically effective amount tion and purification steps. Mucin-domain polypeptides may is well within the capability of those skilled in the art, espe be linked to the active protein via either the N- or C-terminus cially in light of the detailed disclosure provided herein. of the active protein. 
Mucin-domain polypeptides are struc The terms “therapeutically effective amount” and “thera 60 turally less restrictive than other fusion partners in that they peutically effective dose”, as used herein, refers to an amount are monomeric, non-globular proteins having reduced bulk of a active protein, either alone or as a part of a fusion protein and a lowered risk of impact on bioactivity. The use of mucin composition, that is capable of having any detectable, benefi domains lowers the risk of endogenous bioactivities such as cial effect on any symptom, aspect, measured parameter or Fc effector functions. When used to prepare human therapeu characteristics of a disease state or condition when adminis 65 tics, the fusion proteins of the invention may comprise fully tered in one or repeated doses to a subject. Such effect need human sequences with high glycosylation to reduce the risk not be absolute to be beneficial. of immunogenicity. US 9,156,897 B2 11 12 The activity of the fusion protein compositions of the that may vary in length from about 8 amino acids to 150 invention, including functional characteristics or biologic and amino acids per each tandem repeat unit. The number of pharmacologic activity and parameters that result, may be tandem repeat units may vary between 1 and 25 in a mucin determined by any suitable screening assay known in the art domain polypeptide of the invention. for measuring the desired characteristic. The activity and 5 Mucin-domain polypeptides of the invention include, but structure of the fusion proteins may be measured by assays are not limited to, mucin proteins. A "portion thereof is described herein, assays of the Examples, or by methods meant that the mucin polypeptide linker comprises at least known in the art to ascertain the half life, degree of solubility, one mucin domain of a mucin protein. 
Mucin proteins include structure and retention of biologic activity of the composi any protein encoded for by a MUC gene (i.e., MUC1, MUC2, MUC3A, MUC3B, MUC4, MUC5AC, MUCSB, MUC6, tions of the invention as well as comparisons with active 10 MUC7, MUC8, MUC9, MUC11, MUC12, MUC13, MUC proteins that are not fusion proteins of the invention. 15, MUC16, MUC17, MUC19, MUC20, MUC21). The When referring to the fusion protein, the term “linked' or mucin domain of a mucin protein is typically flanked on either “fused’ or “fusion' is intended to indicate that the mucin side by non-repeating amino acid regions. A mucin-domain domain polypeptide and the active proteins are expressed as a polypeptide may comprise all or a portion of a mucin protein single polypeptide in cells in a manner that allows for 15 (e.g. MUC20) including the extracellular portion of the O-linked glycosylation of the mucin-domain polypeptide and mucin protein, the signal sequence portion of the mucin pro maintains the activity of the active protein. In one embodi tein, the transmembrane domain of the mucin protein, and/or ment the mucin-domain polypeptide may optionally be the cytoplasmic domain of the mucin protein. A mucin-do linked to the active protein via an amino acid linker. The main polypeptide may comprise all or a portion of a mucin amino acid linker may further optionally comprise a cleavage protein of a soluble mucin protein. Preferably the mucin sequence that may be designed to release the active protein domain polypeptide comprises the extracellular portion of a upon administration of the fusion protein to a subject. mucin protein. Optionally, the fusion protein comprising the active protein A mucin domain polypeptide may also comprise all or a fused to the mucin-domain polypeptide may be further fused portion of a protein comprising a mucin domain but that is not to one or more additional moieties intended to enhance the 25 encoded by a MUC gene. 
Such naturally occurring proteins activity or impart additional activities to the fusion protein. In that are not encoded by a MUC gene but that comprise mucin one embodiment, the fusion protein comprises the structure: domains include, but are not limited to, membrane-anchored A-M-B, wherein A is an N-terminal fusion partner, M is a proteins such as transmembrane immunoglobulin and mucin mucin domain, and B is a C-terminal fusion partner. A and B domain (TIM) family proteins, fractalkine (neurotactin), may comprise similar and dissimilar identities. In one aspect 30 P-selectinglycoprotein ligand 1 (PSGL-1, CD162), E-selec of this embodiment, A and Bare bioactive moieties which can tin, L-selectin, P-selectin, CD34, CD43 (leukosialin, sialo act independently or synergistically in a manner that includes, phorin), CD45, CD68, CD96, CD164, GlyCAM-1, MAd but is not limited to, agonism, antagonism, enzymatic activ CAM, red blood cell , glycocalicin, ity, targeting to specific proteins or cells, chemical reactivity, , LDL-R, ZP3, endosialin, decay accelerating or oligomerization. In another aspect, when A and B are the 35 factor (daf, CD55), podocalyxin, endoglycan, alpha-dystro same, enhancement of activity is driven through avidity. glycan, neurofascin, EMR1, EMR2, EMR3, EMR4, ETL and A fusion protein of the invention can be produced by stan epiglycanin. dard recombinant DNA techniques. For example, DNA frag A mucin-domain polypeptide may also comprise a non ments coding for the different polypeptide sequences are naturally occurring polypeptide having a mucin domain as ligated together in-frame in accordance with conventional 40 that term is defined herein. In one embodiment, the mucin techniques, e.g., by employing blunt-ended or stagger-ended domain polypeptide is designed de novo to comprise a mucin termini for ligation, restriction enzyme digestion to provide domain in accordance with the invention. 
for appropriate termini, filling-in of cohesive ends as appro In one embodiment a mucin domain polypeptide com priate, alkaline phosphatase treatment to avoid undesirable prises domains of tandem amino acid repeats that are rich in joining, and enzymatic ligation. In another embodiment, the 45 Pro, Ser and Thr. In one aspect of this embodiment, the fusion gene can be synthesized by conventional techniques number of tandem repeat units within a mucin domain including automated DNA synthesizers. Alternatively, PCR polypeptide of the invention is between 1 and 25. Preferably, amplification of gene fragments can be carried out using the number of tandem repeat units within a mucin domain anchor primers that give rise to complementary overhangs polypeptide is between 2 and 20. More preferably, the number between two consecutive gene fragments that can Subse 50 of tandem repeat units within a mucin domain polypeptide is quently be annealed and reamplified to generate a chimeric at least about 4. In a further aspect of this embodiment, the gene sequence (see, for example, Ausubeletal. (eds.) Current percentage of serine and/or threonine and proline residues Protocols in Molecular Biology, John Wiley & Sons, 1992). within a mucin domain polypeptide of the invention is at least Many expression vectors are commercially available to assist 10%. Preferably, the percentage of serine and/or threonine with fusion moieties and will be discussed in more detail 55 and proline residues within a mucin domain polypeptide of below. the invention is at least 20%. More preferably, the percentage Mucin-Domain Polypeptide of serine and/or threonine and proline residues within a mucin A "mucin-domain polypeptide' is defined herein as any domain polypeptide of the invention is greater than 30%. In a protein comprising a "mucin domain'. 
A mucin domain is final aspect of this embodiment, each tandem amino acid rich in potential glycosylation sites, and has a high content of 60 repeat unit within the mucin domain is comprised of at least 8 serine and/or threonine and proline, which can represent amino acids. Preferably, each unit is comprised of at least 16 greater than 40% of the amino acids within the mucin domain. amino acids. More preferably, each unit is comprised of at A mucin domain is heavily glycosylated with predominantly least 19 amino acids, and each unit may vary in length from O-linked glycans. A mucin-domain polypeptide has at least about 19 amino acids to 150 amino. about 60%, at least 70%. at least 80%, or at least 90% of its 65 In one embodiment the mucin-domain polypeptide com mass due to the glycans. Mucin domains may comprise tan prises at least 32 amino acids, comprising at least 40% Serine, dem amino acid repeat units (also referred to herein as TR) Threonine, and Proline. In one embodiment, a mucin-domain US 9,156,897 B2 13 14 polypeptide in accordance with the invention comprises at domain polypeptide, and/or nucleic acids encoding the least 2, 4, 8, 10 or 12 tandem amino acid repeating units of at mucin-domain polypeptide, may be constructed using mucin least 8 amino acids in length per tandem repeating unit. Pre domain encoding sequences of proteins that are known in the ferred amino acid sequences of a tandem repeating unit art and are publicly available through sources such as Gen include, but are not limited to those of Table I. The mucin Bank. TABL I

| Name | Tandem Repeat (TR) Amino Acid Sequence (# of aa's) | Number of TR/MUC* | Accession Number† | Notes |
|---|---|---|---|---|
| MUC1 | PAPGSTAPPAHGWTSAPDTR (20) SEQ ID NO: 11 | 21-125; 41 and 85 are most common | P15941 | Multiple variants of MUC1 exist |
| MUC2 | ITTTTTVTPTPTPTGTOTPTTTP (23) SEQ ID NO: 12 | 99 | Q02817 | Major TR; alternative TR sequences exist |
| MUC3 (A) | ITTTETTSHDTPSFTSS (17) SEQ ID NO: 13 | | Q02505 | Degenerate TR sequence; long serine-rich and threonine-rich sequences also exist |
| MUC4 | ATPLPVTDTSSASTGH (16) SEQ ID NO: 14 | 145-395 | Q99102 | Degenerate TR sequence; long serine-rich and threonine-rich sequences also exist |
| MUC5AC | TTSTTSAP (8) SEQ ID NO: 15 | (46, 17, 34, 58) | | Consensus sequence T-T-S-T-T-S-A-P (SEQ ID NO: 15) |
| MUC5B | ATGSTATPSSTPGTTHTPPVLTTTATTPT (29) SEQ ID NO: 16 | (11, 11, 17, 11, 23) | | Degenerate TR sequence |
| MUC6 | PTS | NA | NA | |
| MUC7 | TTAAPPTPSATTOAPPSSSAPPE (23) SEQ ID NO: 17 | | | Degenerate TR sequence |
| MUC11/12 | EESTTWHSSPGATGTALFP (19) SEQ ID NO: 18 | 28 | | Consensus sequence E-E-S-X-X-X-H-X-X-P-X-X-T-X-T-X-X-X-P (SEQ ID NO: 25) |
| MUC13 | PTS | NA | | |
| MUC14 | PTS | NA | | |
| MUC15 | PTS | NA | | |
| MUC16 | PTS | NA | | |
| MUC17 | SSSPTPAEGTSMPTSTYSEGRTPLTSMPVSTTLVATSAISTLSTTPWDTSTPVTNSTEA (60) SEQ ID NO: 19 | 59-60 | Q685J3 | Degenerate TR sequence |
| MUC19 | PTS | NA | Q7Z5P9 | Repeats of G-W-T-G-T-T-G-P-S-A (SEQ ID NO: 26) |
| MUC20 | SESSASSDGPHPVITPSRA (19) SEQ ID NO: 20 | 11-12 | Q8N307 | |
| MUC21 | ATNSESSTVSSGIST (15) SEQ ID NO: 21 | 28 | Q5SSG8 | Degenerate TR sequence |
| MUC22 | PTS | NA | E2RYF6 | |
| TIM-1 | VPTTTT (6) SEQ ID NO: 22 | 11 | Q96D42 | Degenerate TR sequence |
| TIM-4 | PTS | NA | Q96H15 | |
| Fractalkine | Mucin-like region (PTS) | NA | P78423 | |
| Macrosialin (CD68) | Mucin-like region (PTS) | NA | P34810 | |
| CD96 | PTS | NA | P40200 | |
| Endosialin | Pro-rich region | NA | Q9HCU0 | |
| DAF (CD55) | Pro/Thr-rich region | NA | P08174 | |
| Podocalyxin | Thr-rich region | NA | O00592 | |
| EMR1 | Ser/Thr-rich region | NA | Q14246 | |
| PSGL-1 | OTTOPAATEA (10) SEQ ID NO: 23 | 12 | Q14242 | Degenerate TR sequence |

MUC8 and MUC9 are omitted; no reliable data.
PTS: proline/serine/threonine rich sequence.
*Approximate; TR number is reported as a range in most cases.
†Uniprot number.
The number n of TR is different in specific regions.
NA: Not announced.
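The composition criteria stated for a mucin domain (a high serine, threonine and proline content, and the embodiment of at least 32 amino acids with at least 40% Ser/Thr/Pro) can be checked against the tandem repeat units of Table I with a short calculation. The following is a minimal sketch only: the function names `stp_fraction`, `meets_stp_threshold` and `build_mucin_domain` are hypothetical helpers introduced here for illustration, only a subset of the Table I sequences is included, and a simple concatenation of identical repeat units is assumed rather than any particular construct of the invention.

```python
# Tandem-repeat units copied from Table I (illustrative subset only).
TANDEM_REPEATS = {
    "MUC5AC": "TTSTTSAP",
    "MUC20": "SESSASSDGPHPVITPSRA",
    "MUC21": "ATNSESSTVSSGIST",
    "TIM-1": "VPTTTT",
}


def stp_fraction(sequence: str) -> float:
    """Return the fraction of residues that are Ser (S), Thr (T) or Pro (P)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return sum(seq.count(residue) for residue in "STP") / len(seq)


def meets_stp_threshold(sequence: str, threshold: float = 0.40) -> bool:
    """Check a sequence against a Ser/Thr/Pro content threshold (default 40%)."""
    return stp_fraction(sequence) >= threshold


def build_mucin_domain(repeat_unit: str, n_repeats: int) -> str:
    """Hypothetical helper: concatenate n identical tandem repeat units."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return repeat_unit * n_repeats


if __name__ == "__main__":
    for name, unit in TANDEM_REPEATS.items():
        print(f"{name}: {len(unit)} aa per unit, {stp_fraction(unit):.0%} S/T/P")

    # Four copies of the 8-residue MUC5AC consensus unit give a 32-residue domain.
    domain = build_mucin_domain(TANDEM_REPEATS["MUC5AC"], 4)
    print(len(domain), meets_stp_threshold(domain))
```

Because the repeat units are concatenated unchanged, the Ser/Thr/Pro fraction of the assembled domain equals that of a single unit, so under this assumption the number of repeats changes only the total length.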

In one embodiment a mucin-domain polypeptide total sequence length is from 32 to 200. As increased half life correlates with increasing hydrodynamic radius and, most importantly, an apparent molecular weight of greater than around 60 kD which allows circumvention of renal filtration, the length of the mucin-domain polypeptide is somewhat dependent on the size of the active moiety. For example, a peptide of molecular weight less than 5 kD may require a mucin-domain polypeptide of 200 amino acids to achieve the desired half-life extension. In contrast, a protein of molecular weight of 40 kD may only need a mucin-domain polypeptide of 32 amino acids to achieve the desired half-life. Furthermore, mucinylation allows for the half-life to be optimized by increasing or reducing the number of mucin tandem repeats in the mucin-domain peptide of the fusion protein.

Alternatively, the mucin-domain polypeptide moiety is provided as a variant mucin-domain polypeptide having a mutation in the naturally-occurring mucin-domain sequence of a wild type protein. For example, the variant mucin-domain polypeptide comprises additional O-linked glycosylation sites compared to the wild-type mucin-domain polypeptide. Alternatively, the variant mucin-domain polypeptide comprises amino acid sequence mutations that result in an increased number of serine, threonine or proline residues as compared to a wild type mucin-domain polypeptide. Alternatively, the variant mucin-domain polypeptide sequences comprise added or subtracted charged residues, including but not limited to aspartic acid, glutamic acid, lysine, histidine, and arginine, which change the pI or charge of the molecule at a particular pH.

Active Protein and Therapeutic Active Protein

As used herein an "active protein" for inclusion in the fusion protein of the invention means a protein of biologic, therapeutic, prophylactic, or diagnostic interest or function and/or is capable of mediating a biological activity. A "therapeutic active protein" as that term is used herein is a protein that is capable of preventing or ameliorating a disease, disorder or conditions when administered to a subject.

In one embodiment, an active protein or therapeutic active protein in accordance with the invention specifically excludes immunoglobulin molecules or any molecules containing an Fc domain, or any fragment thereof.

Of particular interest are active proteins and therapeutic active proteins for which an increase in a pharmacokinetic parameter, increased solubility, increased stability, or some other enhanced pharmaceutical property is sought, or those active proteins for which increasing the half-life would improve efficacy, safety, or result in reduced dosing frequency and/or improve patient compliance. Thus, the fusion proteins of the invention are prepared with various objectives in mind, including improving the therapeutic efficacy of the therapeutic active protein, for example, increasing the in vivo exposure or the length of time that the fusion protein of the invention remains within the therapeutic window when administered to a subject, compared to an active protein not linked to mucin-domain polypeptide.

In one embodiment, a fusion protein of the invention may comprise a single active protein linked to a mucin-domain polypeptide (as described more fully below). In another embodiment, the fusion protein of the invention can comprise a first active protein and a second molecule of the same active protein, resulting in a fusion protein comprising the two active proteins linked through one or more mucin-domain polypeptides. In another embodiment, the fusion protein of the invention can comprise a first active protein and a second distinct active protein, resulting in a fusion protein comprising the two active proteins with differing activities linked through one or more mucin-domain polypeptides.

In one embodiment, an active protein will exhibit a binding specificity to a given target or another desired biological characteristic when used in vivo or when utilized in an in vitro assay. For example, the active protein can be an agonist, a receptor, a ligand, an antagonist, an enzyme, or a hormone. Of particular interest are active proteins used or known to be useful for a disease or disorder wherein an extension in their half-life would permit less frequent dosing or an enhanced pharmacologic effect. Also of interest are active proteins that have a narrow therapeutic window between the minimum effective dose or blood concentration (Cmin) and the maximum tolerated dose or blood concentration (Cmax). In such cases, the linking of the active protein to a fusion protein comprising a mucin-domain polypeptide can result in an improvement in these properties, making them more useful as therapeutic or preventive agents compared to active protein not linked to a mucin-domain polypeptide.

The active proteins of the invention that are therapeutic active proteins can have utility in the treatment in various therapeutic or disease categories, including but not limited to: glucose and insulin disorders, metabolic disorders, cardiovascular diseases, coagulation/bleeding disorders, growth disorders or conditions, tumorigenic conditions, inflammatory conditions, autoimmune conditions, and other diseases and disease categories wherein a therapeutic protein or peptide not linked to a mucin-domain polypeptide exhibits a suboptimal half-life, or wherein a therapeutic protein or peptide does not exist.

An active protein of the invention can be a native, full length protein or can be a fragment or a sequence variant of an active protein that retains at least a portion of the therapeutic activity of the native active protein. In one embodiment, the active proteins in accordance with the invention can be a recombinant polypeptide with a sequence corresponding to a protein found in nature. In another embodiment, the active proteins can be sequence variants, fragments, homologs, and mimetics of a natural sequence that retain at least a portion of the biological activity of the native active protein.

In non-limiting examples, the active protein can be a sequence that exhibits at least about 80% sequence identity, or alternatively 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to the native active protein or a variant of a native active protein. Such proteins and peptides include but are not limited to the following: bioactive peptides (such as GLP-1, exendin-4, oxytocin, peptides), cytokines, growth factors, chemokines, lymphokines, ligands, receptors, hormones, enzymes, antibodies and antibody fragments, domain antibodies, nanobodies, single chain antibodies, engineered antibody alternative scaffolds such as DARPins, centyrins, adnectins, and growth factors. Examples of receptors include the extracellular domain of membrane associated receptors (such as TNFR2, VEGF receptors, IL-1R1, IL-1RAcP, IL-4 receptor, hGH receptor, CTLA-4, PD-1, IL-6Rα, FGF receptors), soluble receptors which have been cleaved from their transmembrane domains, "dummy" or decoy receptors (such as IL-1RII, TNFRSF11B, DcR3), and any chemically or genetically modified soluble receptors. Examples of enzymes include activated , factor VII, collagenase; agalsidase-beta; dornase-alpha; alteplase; pegylated-asparaginase; asparaginase; and imiglucerase. Examples of specific polypeptides or proteins include, but are not limited to granulocyte macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating factor (G-CSF), macrophage colony stimulating factor (M-CSF), colony stimulating factor (CSF), interferon beta (IFN-β), interferon gamma (IFNγ), interferon gamma inducing factor I (IGIF), transforming growth factor beta (TGF-β), RANTES (regulated upon activation, normal T-cell expressed and presumably secreted), macrophage inflammatory proteins (e.g., MIP-1-α and MIP-1-β), Leishmania elongation initiating factor (LEIF), platelet derived growth factor (PDGF), tumor necrosis factor (TNF), growth factors, e.g., epidermal growth factor (EGF), vascular endothelial growth factor (VEGF), fibroblast growth factor (FGF), nerve growth factor (NGF), brain derived neurotrophic factor (BDNF), neurotrophin-2 (NT-2), neurotrophin-3 (NT-3), neurotrophin-4 (NT-4), neurotrophin-5 (NT-5), glial cell line-derived neurotrophic factor (GDNF), ciliary neurotrophic factor (CNTF), TNF α type II receptor, erythropoietin (EPO), insulin and soluble glycoproteins, e.g., gp120 and gp160 glycoproteins. The gp120 glycoprotein is a human immunodeficiency virus (HIV) envelope protein, and the gp160 glycoprotein is a known precursor to the gp120 glycoprotein.

In one embodiment, the biologically active polypeptide is GLP-1. In another embodiment, the biologically active polypeptide is nesiritide, human B-type natriuretic peptide (hBNP). In yet another embodiment, the biologically active polypeptide is secretin, which is a peptide hormone composed of an amino acid sequence identical to the naturally occurring porcine secretin consisting of 27 amino acids. In one embodiment, the biologically active polypeptide is enfuvirtide, a linear 36-amino acid synthetic polypeptide which is an inhibitor of the fusion of HIV-1 with CD4+ cells. In one embodiment, the biologically active polypeptide is bivalirudin, a specific and reversible direct thrombin inhibitor. Antihemophilic Factor (AHF) may be selected as the active polypeptide. The mean in vivo half-life of HEMOFIL M™ AHF is known to be 14.7±5.1 hours (n=61). In another embodiment, erythropoietin is the biologically active polypeptide. Erythropoietin is a 165 amino acid glycoprotein manufactured by recombinant DNA technology and has the same biological effects as endogenous erythropoietin. In adult and pediatric patients with chronic renal failure, the elimination half-life of unmodified plasma erythropoietin after intravenous administration is known to range from 4 to 13 hours. In still another embodiment, the biologically active polypeptide is Reteplase. Reteplase is a non-glycosylated deletion mutein of tissue plasminogen activator (tPA), comprising the kringle 2 and the protease domains of human tPA. Based on the measurement of thrombolytic activity, the effective half-life of unmodified Reteplase is known to be approximately 15 minutes.

In one preferred embodiment, the active polypeptide is Anakinra, a recombinant, nonglycosylated form of the human interleukin-1 receptor antagonist or the glycosylated form expressed in mammalian cells (IL-1Ra). In one case, Anakinra consists of 153 amino acids and has a molecular weight of 17.3 kilodaltons. It may be produced by recombinant DNA technology using an E. coli bacterial expression system. The glycosylated version of IL-1Ra can be produced in mammalian expression systems. The in vivo half-life of unmodified Anakinra is known to range from 4 to 6 hours.

In another preferred embodiment, the active polypeptide is exendin-4. In one case, exendin-4 consists of 39 amino acids. It may be produced by recombinant DNA technology using an E. coli bacterial expression system. The in vivo half-life of unmodified exendin-4 is known to be 0.5 hours iv (Ai, G., et al.; Pharmacokinetics of exendin-4 in Wistar rats; Journal of Chinese Pharmaceutical Sciences; 17 (2008) 6-10).

Becaplermin may also be selected as the active polypeptide. Becaplermin is a recombinant human platelet-derived growth factor (rhPDGF-BB) for topical administration. Becaplermin may be produced by recombinant DNA technology by insertion of the gene for the B chain of platelet derived growth factor (PDGF) into the yeast strain Saccharomyces cerevisiae. One form of Becaplermin has a molecular weight of approximately 25 kD and is a homodimer composed of two identical polypeptide chains that are bound together by disulfide bonds. The active polypeptide may be oprelvekin, which is a recombinant form of interleukin eleven (IL-11) that is produced in Escherichia coli (E. coli) by recombinant DNA technology. In one embodiment, the selected biologically active polypeptide has a molecular mass of approximately 19,000 daltons, and is non-glycosylated. The polypeptide is 177 amino acids in length and differs from the 178 amino acid length of native IL-11 only in lacking the amino terminal proline residue, which is known not to result in measurable differences in bioactivity either in vitro or in vivo.

hepatitis C. This PEGylated protein requires weekly injection and slow release formulations with longer half-life are desirable.

Additional cellular proteins include, but are not limited to: VEGF, VEGF-R1, VEGF-R2, VEGF-R3, Her-1, Her-2, Her-3, EGF-1, EGF-2, EGF-3, Alpha3, cMet, ICOS, CD40L, LFA-1, c-Met, ICOS, LFA-1, IL-6, B7.1, B7.2, OX40, IL-1b, TACI, IgE, BAFF, or BLyS, TPO-R, CD19, CD20, CD22, CD33, CD28, IL-1-R1, TNFα, TRAIL-R1, Complement Receptor 1, FGFa, Osteopontin, , Ephrin A1-A5, Ephrin B1-B3, alpha-2-macroglobulin, CCL1, CCL2, CCL3, CCL4, CCL5, CCL6, CCL7, CXCL8, CXCL9, CXCL10, CXCL11, CXCL12, CCL13, CCL14, CCL15, CXCL16, CCL16, CCL17, CCL18, CCL19, CCL20, CCL21, CCL22, PDGF, TGFb, GMCSF, SCF, p40 (IL12/IL23), IL1b, IL1a, IL1ra, IL2, IL3, IL4, IL5, IL6s, IL8, IL10, IL2, IL15, IL23, Fas, FasL, Flt3 ligand, 4-1BB, ACE, ACE-2, KGF, FGF-7, SCF, Netrin1.2, IFNa,b,g, Caspase-2,3,7,8,10, ADAMS1.S5, 8,9,15.TS1, TS5 Adiponectin, ALCAM, ALK-1, APRIL, Annexin V, Angiogenin, Amphiregulin, Angiopoietin-1,2,4, B7-1/CD80, B7-2/CD86, B7-H1, B7-H2, B7-H3, Bcl-2, BACE-1, BAK, BCAM, BDNF, bNGF, bECGF, BMP2,3,4,5,6,7,8; CRP, Cadherin-6,8,11; Cathepsin A, B, C, D, E, L, S, V, X; CD11a/LFA-1, LFA-3, GP2b3a, GH receptor, RSV F protein, IL-23 (p40, p19), IL-12, CD80, CD86, CD28, CTLA-4, α4β1, α4β7, TNF/Lymphotoxin, IgE, CD3, CD20, IL-6, IL-6R, BLYS/BAFF, IL-2R, HER2, EGFR, CD33, CD52, Digoxin, Rho (D), Varicella, Hepatitis, CMV, Tetanus, Vaccinia, Antivenom, Botulinum, Trail-R1, Trail-R2, cMet, TNF-R family, such as LA NGF-R, CD27, CD30, CD40, CD95, Lymphotoxin α/β receptor, Wsl-1, TL1A/TNFSF15, BAFF, BAFF-R/TNFRSF13C, TRAIL R2/TNFRSF10B, TRAILR2F/TNFRSF10B, Fas/TNFRSF6, CD27/TNFRSF7, DR3/TNFRSF25, HVEM/TNFRSF14, TROY/TNFRSF19, CD40 Ligand/TNFSF5, BCMA/TNFRSF17, CD30/TNFRSF8, LIGHT/TNFSF14, 4-1BB/TNFRSF9, CD40/TNFRSF5, GITR/TNFRSF18, Osteoprotegerin/TNFRSF11B, RANK/TNFRSF11A, TRAIL R3/TNFRSF10C, TRAIL/TNFSF10, TRANCE/RANK L/TNFSF11, 4-1BB Ligand/TNFSF9, TWEAK/TNFSF12, CD40 Ligand/TNFSF5, Fas Ligand/TNFSF6, RELT/TNFRSF19L, APRIL/TNFSF13, DcR3/TNFRSF6B, TNF R1/TNFRSF1A, TRAIL R1/TNFRSF10A, TRAIL R4/TNFRSF10D, CD30 Ligand/TNFSF8, GITR Ligand/TNFSF18, TNFSF18, TACI/TNFRSF13B, NGF R/TNFRSF16, OX40 Ligand/TNFSF4, TRAIL R2/TNFRSF10B, TRAIL R3/TNFRSF10C,
TWEAK R/TNFRSF12, BAFF/BLyS/TNFSF13, DR6/TN The terminal half-life of unmodified Oprelvekin is known to FRSF21, TNF-alpha/NFSF1A, Pro-TNF-alpha/TNFSF1A, be approximately 7hrs. Yet another embodiment provides for Lymphotoxin beta R/TNFRSF3, Lymphotoxin beta R a biologically active polypeptide which is Glucagon, a 50 (LTbR)/Fc Chimera, TNF R1/TNFRSF1A, TNF-beta/ polypeptide hormone identical to human glucagon that TNFSF1B, PGRP-S, TNF R1/TNFRSF1A, TNF RII/ increases blood glucose and relaxes Smooth muscles of the TNFRSF1B, EDA-A2, TNF-alpha/TNFSF1A, EDAR, gastrointestinal tract. Glucagon may be synthesized in a spe XEDAR, TNFR1/TNFRSF1A 4EBP1, 14-3-3 Zeta, 53BP1, cial non-pathogenic laboratory strain of E. coli bacteria that 2B34/SLAMF4, CCL21/6Ckine, 4-1BB/TNFRSF9, 8D6A, have been genetically altered by the addition of the gene for 55 4-1BB Ligand/TNFSF9.8-oxo-dG, 4-Amino-1,8-naphthal glucagon. In a specific embodiment, glucagon is a single imide, A2B5, Aminopeptidase LRAP/ERAP2, A33, Ami chain polypeptide that contains 29 amino acid residues and nopeptidase N/ANPEP, Aag. Aminopeptidase P2/XPNPEP2, has a molecular weight of 3,483. The in vivo half-life is ABCG2, Aminopeptidase P1/XPNPEP1, ACE, Aminopepti known to be short, ranging from 8 to 18 minutes. dase PILS/ARTS1, ACE-2, Amnionless, Actin, Amphiregu G-CSF may also be chosen as the active polypeptide. 60 lin, beta-Actin, AMPK alpha 1/2, Activin A, AMPK alpha 1, Recombinant granulocyte-colony stimulating factor or Activin AB, AMPKalpha2, Activin B, AMPKbeta 1, Activin G-CSF is used following various chemotherapy treatments to C, AMPKbeta 2, Activin RIA/ALK-2, Androgen R/NR3C4, stimulate the recovery of white blood cells. The reported Activin RIB/ALK-4, Angiogenin, Activin RIIA, Angiopoi half-life of recombinant G-CSF is only 3.5 hours. 
etin-1, Activin RIIB, Angiopoietin-2, ADAMS, Angiopoi In one embodiment the biologically active polypeptide can 65 etin-3, ADAM9, Angiopoictin-4, ADAM10, Angiopoietin be interferon alpha (IFN alpha). Chemically PEG-modified like 1, ADAM12, Angiopoietin-like 2, ADAM15, interferon-alpha2a is clinically validated for the treatment of Angiopoietin-like 3, TACE/ADAM17, Angiopoietin-like 4,

KIM-1/HAVCR, Tryptase beta-1/MCPT-7, TIM-2, Tryptase beta-2/TPSB2, TIM-3, Tryptase epsilon/BSSP-4, TIM-4, Tryptase gamma-1/TPSG1, TIM-5, Tryptophan Hydroxylase, TIM-6, TSC22, TIMP-1, TSG, TIMP-2, TSG-6, TIMP-3, TSK, TIMP-4, TSLP, TL1A/TNFSF15, TSLP R, TLR1, TSP50, TLR2, beta-III Tubulin, TLR3, TWEAK/TNFSF12, TLR4, TWEAK R/TNFRSF12, TLR5, Tyk2, TLR6, Phospho-Tyrosine, TLR9, Tyrosine Hydroxylase, TLX/NR2E, Tyrosine Phosphatase Substrate I, Ubiquitin, UNC5H3, Ugi, UNC5H4, UGRP1, UNG, ULBP-1, uPA, ULBP-2, uPAR, ULBP-3, URB, UNC5H1, UVDE, UNC5H2, Vanilloid R1, VEGF R, VASA, VEGF R1/Flt-1, Vasohibin, VEGF R2/KDR/Flk-1, Vasorin, VEGF R3/Flt-4, Vasostatin, Versican, Vav-1, VGSQ, VCAM-1, VHR, VDR/NR1I1, Vimentin, VEGF, Vitronectin, VEGF-B, VLDLR, VEGF-C, vWF-A2, VEGF-D, Synuclein-alpha, Ku70, WASP, Wnt-7b, WIF-1, Wnt-8a, WISP-1/CCN4, Wnt-8b, WNK1, Wnt-9a, Wnt-1, Wnt-9b, Wnt-3a, Wnt-10a, Wnt-4, Wnt-10b, Wnt-5a, Wnt-11, Wnt-5b, winvNS3, Wnt7a, XCR1, XPE/DDB1, XEDAR, XPE/DDB2, Xg, XPF, XIAP, XPG, XPA, XPW, XPD, XRCC1, Yes, YY1, EphA4.

Other active polypeptides include: BOTOX, Myobloc, Neurobloc, Dysport (or other serotypes of botulinum neurotoxins), alglucosidase alfa, daptomycin, YH-16, choriogonadotropin alfa, filgrastim, cetrorelix, interleukin-2, aldesleukin, teceleukin, , interferon alfa-n3 (injection), interferon alfa-n1, DL-8234, interferon, Suntory (gamma-1a), interferon gamma, thymosin alpha 1, tasonermin, DigiFab, ViperaTAb, EchiTAb, CroFab, nesiritide, abatacept, alefacept, Rebif, eptotermin alfa, teriparatide (osteoporosis), calcitonin injectable (bone disease), calcitonin (nasal, osteoporosis), etanercept, hemoglobin glutamer 250 (bovine), drotrecogin alfa, collagenase, carperitide, recombinant human epidermal growth factor (topical gel, wound healing), DWP-401, darbepoetin alfa, epoetin omega, epoetin beta, epoetin alfa, desirudin, lepirudin, bivalirudin, nonacog alpha, Mononine, eptacog alfa (activated), recombinant Factor VIII+VWF, Recombinate, recombinant Factor VIII, Factor VIII (recombinant), Alphanate, octocog alfa, Factor VIII, palifermin, Indikinase, tenecteplase, alteplase, pamiteplase, reteplase, nateplase, monteplase, follitropin alfa, rFSH, hpFSH, micafungin, pegfilgrastim, lenograstim, nartograstim, sermorelin, glucagon, exenatide, pramlintide, imiglucerase, galsulfase, Leucotropin, molgramostim, triptorelin acetate, histrelin (subcutaneous implant, Hydron), deslorelin, histrelin, nafarelin, leuprolide sustained release depot (ATRIGEL), leuprolide implant (DUROS), goserelin, somatropin, Eutropin, KP-102 program, somatropin, somatropin, mecasermin (growth failure), enfuvirtide, Org-33408, insulin glargine, insulin glulisine, insulin (inhaled), insulin lispro, insulin detemir, insulin (buccal, Rapid Mist), mecasermin rinfabate, anakinra, celmoleukin, 99mTc-apcitide injection, myelopid, Betaseron, glatiramer acetate, Gepon, sargramostim, oprelvekin, human leukocyte-derived alpha interferons, Bilive, insulin (recombinant), recombinant human insulin, insulin aspart, mecasermin, Roferon-A, interferon-alpha 2, Alfaferone, interferon alfacon-1, interferon alpha, Avonex, recombinant human luteinizing hormone, dornase alfa, trafermin, ziconotide, taltirelin, dibotermin alfa, atosiban, becaplermin, eptifibatide, Zemaira, CTC-111, Shanvac-B, HPV vaccine (quadrivalent), NOV-002, octreotide, lanreotide, ancestim, agalsidase beta, agalsidase alfa, laronidase, prezatide copper acetate (topical gel), rasburicase, ranibizumab, Actimmune, PEG-Intron, Tricomin, recombinant house dust mite desensitization injection, recombinant human parathyroid hormone (PTH) 1-84 (sc, osteoporosis), epoetin delta, transgenic antithrombin III, Granditropin, Vitrase, recombinant insulin, interferon-alpha (oral lozenge), GEM-21S, vapreotide, idursulfase, omapatrilat, recombinant serum albumin, certolizumab pegol, glucarpidase, human recombinant C1 esterase inhibitor (angioedema), lanoteplase, recombinant human growth hormone, enfuvirtide (needle-free injection, Biojector 2000), VGV-1, interferon (alpha), lucinactant, aviptadil (inhaled, pulmonary disease), icatibant, ecallantide, omiganan, Aurograb, pexiganan acetate, ADI-PEG-20, LDI-200, degarelix, cintredekin besudotox, FavId, MDX-1379, ISAtX-247, liraglutide, teriparatide (osteoporosis), tifacogin, AA-4500, T4N5 liposome lotion, catumaxomab, DWP-413, ART-123, Chrysalin, desmoteplase, amediplase, corifolitropin alpha, TH-9507, teduglutide, Diamyd, DWP-412, growth hormone (sustained release injection), recombinant G-CSF, insulin (inhaled, AIR), insulin (inhaled, Technosphere), insulin (inhaled, AERX), RGN-303, DiaPep277, interferon beta (hepatitis C viral infection (HCV)), interferon alfa-n3 (oral), belatacept, transdermal insulin patches, AMG-531, MBP-8298, Xerecept, opebacan, AIDSVAX, GV-1001, LymphoScan, ranpirinase, Lipoxysan, lusupultide, MP52 (beta-tricalciumphosphate carrier, bone regeneration), melanoma vaccine, sipuleucel-T, CTP-37, Insegia, Vitespen, human thrombin (frozen, surgical bleeding), thrombin, TransMID, alfimeprase, Puricase, terlipressin (intravenous, hepatorenal syndrome), EUR-1008M, recombinant FGF-1 (injectable, vascular disease), BDM-E, rotigaptide, ETC-216, P-113, MBI-594AN, duramycin (inhaled, cystic fibrosis), SCV-07, OPI-45, Endostatin, Angiostatin, ABT-510, Bowman Birk Inhibitor Concentrate, XMP-629, 99mTc-Hynic-Annexin V, kahalalide F, CTCE-9908, teverelix (extended release), ozarelix, romidepsin, BAY-50-4798, interleukin-4, PRX-321, Pepscan, iboctadekin, rh , TRU-015, IL-21, ATN-161, cilengitide, Albuferon, Biphasix, IRX-2, omega interferon, PCK-3145, CAP-232, pasireotide, huN901-DM1, ovarian cancer immunotherapeutic vaccine, SB-249553, Oncovax-CL, OncoVax-P, BLP-25, CerVax-16, multi-epitope peptide melanoma vaccine (MART-1, gp100, tyrosinase), nemifitide, rAAT (inhaled), rAAT (dermatological), CGRP (inhaled, ), pegsunercept, thymosin beta-4, plitidepsin, GTP-200, ramoplanin, GRASPA, OBI-1, AC-100, salmon calcitonin (oral, eligen), calcitonin (oral, osteoporosis), examorelin, capromorelin, Cardeva, velafermin, 131I-TM-601, KK-220, TP-10, ularitide, depelestat, hematide, Chrysalin (topical), rNAPc2, recombinant Factor VIII (PEGylated liposomal), bFGF, PEGylated recombinant staphylokinase variant, V-10153, SonoLysis Prolyse, NeuroVax, CZEN-002, islet cell neogenesis therapy, rGLP-1, BIM-51077, LY-548806, exenatide (controlled release, Medisorb), AVE-0010, GA-GCB, avorelin, AOD-9604, linaclotide acetate, CETi-1, Hemospan, VAL (injectable), fast-acting insulin (injectable, Viadel), intranasal insulin, insulin (inhaled), insulin (oral, eligen), recombinant methionyl human leptin, pitrakinra (subcutaneous injection, eczema), pitrakinra (inhaled dry powder, asthma), Multikine, RG-1068, MM-093, NBI-6024, AT-001, PI-0824, Org-39141, Cpn10 (autoimmune diseases/inflammation), talactoferrin (topical), rEV-131 (ophthalmic), rEV-131 (respiratory disease), oral recombinant human insulin (diabetes), RPI-78M, oprelvekin (oral), CYT-99007 CTLA4-Ig, DTY-001, valategrast, interferon alfa-n3 (topical), IRX-3, RDP-58, Tauferon, bile salt stimulated lipase, Merispase, alkaline phosphatase, EP-2104R, Melanotan-II, , ATL-104, recombinant human microplasmin, AX-200, SEMAX, ACV-1, Xen-2174, CJC-1008, A, SI-6603, LAB GHRH, AER-002, BGC-728, malaria vaccine (virosomes, PeviPRO), ALTU-135, parvovirus B19 vaccine, influenza vaccine (recombinant neuraminidase), malaria/HBV vaccine, anthrax vaccine, Vacc-5q, Vacc-4x, HIV vaccine (oral), HPV vaccine, Tat Toxoid, YSPSL, CHS-13340, PTH(1-34) liposomal cream (Novasome), Ostabolin-C, PTH analog (topical, psoriasis), MBR1-93.02, MTB72F vaccine (tuberculosis), MVA-Ag85A vaccine (tuberculosis), FAR-404, BA-210, recombinant plague F1V vaccine, AG-702, OXSODrol, rBetV1, Der-p1/Der-p2/Der-p7 allergen-targeting vaccine (dust mite allergy), PR1 peptide antigen (leukemia), mutant ras vaccine, HPV-16 E7 lipopeptide vaccine, labyrinthin vaccine (adenocarcinoma), CML vaccine, WT1-peptide vaccine (cancer), IDD-5, CDX-110, Pentrys, Norelin, CytoFab, P-9808, VT-111, icrocaptide, telbermin (dermatological, diabetic foot ulcer), rupintrivir, reticulose, rGRF, PIA, alpha-galactosidase A, ACE-011, ALTU-140, CGX-1160, angiotensin therapeutic vaccine, D-4F, ETC-642, APP-018, rhMBL, SCV-07 (oral, tuberculosis), DRF-7295, ABT-828, ErbB2-specific immunotoxin (anticancer), DT388IL-3, TST-10088, PRO-1762, Combotox, cholecystokinin-B/gastrin-receptor binding peptides, 111In-hEGF, AE-37, trastuzumab-DM1, Antagonist G, IL-12 (recombinant), PM-02734, IMP-321, rhIGF-BP3, BLX-883, CUV-1647 (topical), L-19 based radioimmunotherapeutics (cancer), Re-188-P-2045, AMG-386, DC/1540/KLH vaccine (cancer), VX-001, AVE-9633, AC-9301, NY-ESO-1 vaccine (peptides), NA17.A2 peptides, melanoma vaccine (pulsed antigen therapeutic), prostate cancer vaccine, CBP-501, recombinant human lactoferrin (dry eye), FX-06, AP-214, WAP-8294A2 (injectable), ACP-HIP, SUN-11031, peptide YY 3-36 (obesity, intranasal), FGLL, atacicept, BR3-Fc, BN-003, BA-058, human parathyroid hormone 1-34 (nasal, osteoporosis), F-18-CCR1, AT-1001 (celiac disease/diabetes), JPD-003, PTH(7-34) liposomal cream (Novasome), duramycin (ophthalmic, dry eye), CAB-2, CTCE-0214, GlycoPEGylated erythropoietin, EPO-Fc, CNTO-528, AMG-114, JR-013, Factor XIII, aminocandin, PN-951, 716155, SUN-E7001, TH-0318, BAY-73-7977, teverelix (immediate release), EP-51216, hCGH (controlled release, Biosphere), OGP-I, sifuvirtide, TV-4710, ALG-889, Org-41259, rhCC10, F-991, thymopentin (pulmonary diseases), r(m)CRP, hepatoselective insulin, subalin, L19-IL-2 fusion protein, elafin, NMK-150, ALTU-139, EN-122004, rhTPO, receptor agonist (thrombocytopenic disorders), AL-108, AL-208, nerve growth factor antagonists (pain), SLV-317, CGX-1007, INNO-105, oral teriparatide (elligen), GEM-OS1, AC-162352, PRX-302, LFn-p24 fusion vaccine (Therapore), EP-1043, S. pneumoniae pediatric vaccine, malaria vaccine, Neisseria meningitidis Group B vaccine, neonatal group B streptococcal vaccine, anthrax vaccine, HCV vaccine (gpE1+gpE2+MF-59), otitis media therapy, HCV vaccine (core antigen+ISCOMATRIX), hPTH(1-34) (transdermal, VialDerm), 768974, SYN-101, PGN-0052, aviscumine, BIM-23190, tuberculosis vaccine, multi-epitope tyrosinase peptide, cancer vaccine, enkastim, APC-8024, G1-5005, ACC-001, TTS-CD3, vascular-targeted TNF (solid tumors), desmopressin (buccal controlled-release), onercept, TP-9201.

The nucleic acid and amino acid sequences of numerous active proteins are well known in the art and descriptions and sequences are available in public databases such as Chemical Abstracts Services Databases (e.g., the CAS Registry), GenBank, GenPept, Nucleotide, Entrez Protein, The Universal Protein Resource (UniProt) and subscription provided databases such as GenSeq (e.g., Derwent). Polynucleotide sequences may be a wild type polynucleotide sequence encoding a given active protein (e.g., either full length or mature), or in some instances the sequence may be a variant of the wild type polynucleotide sequence (e.g., a polynucleotide which encodes the wildtype active protein, wherein the DNA sequence of the polynucleotide has been optimized, for example, for expression in a particular species; or a polynucleotide encoding a variant of the wildtype protein, such as a site directed mutant or an allelic variant). It is well within the ability of the skilled artisan to use a wild-type or consensus cDNA sequence or a codon-optimized variant of an active protein to create fusion protein constructs contemplated by the invention using methods known in the art and/or in conjunction with the guidance and methods provided herein, and described more fully in the Examples.

Pharmacokinetic Properties of the Fusion Proteins

The invention provides fusion proteins of therapeutic active proteins with enhanced pharmacokinetics compared to the therapeutic active protein not linked to a mucin-polypeptide domain, that, when used at the optimal dose determined for the composition by the methods described herein, can achieve enhanced pharmacokinetics compared to a comparable dose of the therapeutic active protein not linked to a mucin-domain polypeptide in accordance with the invention. As used herein, a "comparable dose" means a dose with an equivalent moles/kg for the therapeutic active protein that is administered to a subject in a comparable fashion. It will be understood in the art that a "comparable dosage" of the fusion protein would represent a greater weight of agent but would have essentially the same mole-equivalents of the therapeutic active protein in the dose of the fusion protein and/or would have the same approximate molar concentration relative to the therapeutic active protein.

The pharmacokinetic (PK) properties of a therapeutic active protein that can be enhanced by linking a mucin polypeptide domain to the therapeutic active protein include half-life, area under the curve (AUC), Cmax, Tmax, peak-to-trough concentration ratio, and volume of distribution. The enhancements in PK properties can lead to improvements in efficacy due to increased exposure, reduction in adverse events due to reduction in dose and a dampening of Cmax, and a reduction in dosing frequency. As described more fully in the Examples, the invention provides fusion proteins comprising a mucin-domain polypeptide linked to a therapeutic active protein that increase the half-life for the administered fusion protein, compared to the corresponding therapeutic active protein not linked to the fusion protein, of at least about two-fold longer, or at least about three-fold, or at least about four-fold, or at least about five-fold, or at least about six-fold, or at least about seven-fold, or at least about eight-fold, or at least about nine-fold, or at least about ten-fold, or at least about 15-fold, or at least a 20-fold or greater increase in half-life compared to the therapeutic active protein not linked to the fusion protein.

Similarly, the fusion proteins of the invention can have an increase in AUC of at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90%, or at least about 100%, or at least about 150%, or at least about 200%, or at least about 300% increase in AUC compared to the corresponding therapeutic active protein not linked to the fusion protein. The pharmacokinetic parameters of half-life and AUC of a fusion protein of the invention can be determined by standard methods involving dosing, the taking of blood samples at time intervals, and the assaying of the protein using ELISA, HPLC, radioassay, or other methods known in the art or as described herein, followed by standard calculations of the data to derive the half-life and other PK parameters.

The increases in half-life result in a reduction of the peak-to-trough concentration ratio, smoothening the concentration vs. time profile when multiple doses are delivered. The more consistent exposure can result in improved efficacy as well as a reduction in adverse events, which are often driven by a high Cmax (supratherapeutic concentrations). The extended duration of individual doses also reduces the dose frequency, resulting in reduction of any delivery-related adverse events (such as injection site reactions), improved compliance, and added convenience for the patient.

Physicochemical and Pharmaceutical Properties

In addition to enhancing the PK properties of a therapeutic, fusion to a mucin-domain polypeptide may be useful for improving the pharmaceutical or physicochemical properties (such as the degree of aqueous solubility) of the therapeutic active peptide or protein. Solubility improvements can be mediated both through addition of the highly hydrophilic carbohydrates on the mucin as well as through selection of the proper mucin-polypeptide sequence, which may additionally contain ionizable residues such as aspartic acid, glutamic acid, histidine, lysine, and arginine. The ionizable residues result in the modulation of the pI of the fusion protein and thereby the total charge of the protein in a particular formulation.

The fusion proteins of the invention can be constructed and assayed, using methods described herein, to confirm the physicochemical properties of the fusion protein result in the desired properties. In one embodiment, the mucin-domain polypeptide is selected such that the fusion protein has an aqueous solubility that is at least about 25% greater compared to a therapeutic active protein not linked to the fusion protein, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 75%, or at least about 100%, or at least about 200%, or at least about 300%, or at least about 400%, or at least about 500%, or at least about 1000% greater than the corresponding therapeutic active protein not linked to the fusion protein. Preferred mucin-domain polypeptide sequences can have at least 80% sequence identity, or about 90%, or about 91%, or about 92%, or about 93%, or about 94%, or about 95%, or about 96%, or about 97%, or about 98%, or about 99%, to about 100% sequence identity to a mucin-domain polypeptide selected from Table I.

Uses of the Fusion Proteins

In another aspect, the invention provides a method for achieving a beneficial effect in a disease, disorder or condition mediated by a therapeutic active protein. The present invention addresses disadvantages and/or limitations of therapeutic active proteins that have a relatively short terminal half-life and/or a narrow therapeutic window between the minimum effective dose and the maximum tolerated dose.

In one embodiment, the invention provides a method for achieving a beneficial effect in a subject comprising the step of administering to the subject a therapeutically- or prophylactically-effective amount of a fusion protein. The effective amount can produce a beneficial effect in helping to treat a disease or disorder. In some cases, the method for achieving a beneficial effect can include administering a therapeutically effective amount of a fusion protein composition to treat a subject for diseases and disease categories wherein a therapeutic protein or peptide not linked to a mucin-domain polypeptide exhibits a suboptimal half-life. In other cases, the method for achieving a beneficial effect can include administering a therapeutically effective amount of a fusion protein composition to treat a subject for diseases and disease categories wherein a therapeutic protein or peptide does not exist. In still further cases, the method for achieving a beneficial effect can include administering a therapeutically effective amount of a fusion protein composition to treat a subject for diseases and disease categories wherein a therapeutic protein or peptide exhibits a suboptimal stimulatory or suboptimal inhibitory effect as an agonist or antagonist, respectively.

Diseases amenable to treatment by administration of the compositions of the invention include without limitation cancer, inflammatory diseases, arthritis, osteoporosis, infections in particular hepatitis, bacterial infections, viral infections, genetic diseases, pulmonary diseases, type 1 diabetes, type 2 diabetes, hormone-related disease, Alzheimer's disease, cardiac diseases, myocardial infarction, deep vein thrombosis, diseases of the circulatory system, hypertension, hypotension, , pain relief, dwarfism and other growth disorders, intoxications, blood clotting diseases, diseases of the innate immune system, embolism, wound healing, healing of burns, Crohn's disease, asthma, ulcer, sepsis, glaucoma, cerebrovascular ischemia, respiratory distress syndrome, corneal ulcers, renal disease, diabetic foot ulcer, anemia, factor IX deficiency, factor VIII deficiency, factor VII deficiency, mucositis, dysphagia, thrombocyte disorder, lung embolism, infertility, hypogonadism, leucopenia, neutropenia, endometriosis, Gaucher disease, obesity, lysosome storage disease, AIDS, premenstrual syndrome, Turner's syndrome, cachexia, muscular dystrophy, Huntington's disease, colitis, SARS, Kaposi sarcoma, liver tumor, breast tumor, glioma, Non-Hodgkin lymphoma, Chronic myelocytic leukemia; Hairy cell leukemia; Renal cell carcinoma; Liver tumor; Lymphoma; Melanoma, multiple sclerosis, Kaposi's sarcoma, papillomavirus, emphysema, bronchitis, periodontal disease, dementia, parturition, non small cell lung cancer, pancreas tumor, prostate tumor, acromegaly, psoriasis, ovary tumor, Fabry disease, lysosome storage disease.

In one embodiment, the method comprises administering a therapeutically-effective amount of a pharmaceutical composition comprising a fusion protein comprising a therapeutic active protein linked to a mucin-domain polypeptide and at least one pharmaceutically acceptable carrier to a subject in need thereof that results in greater improvement in at least one parameter, physiologic condition, or clinical outcome mediated by the therapeutic active protein of the fusion protein compared to the effect mediated by administration of a pharmaceutical composition comprising a therapeutic active protein not linked to mucin-domain polypeptide administered at a comparable dose. In one embodiment, the pharmaceutical composition is administered at a therapeutically effective dose. In another embodiment, the pharmaceutical composition is administered using multiple simultaneous or sequential doses using a therapeutically effective dose regimen (as defined herein) for the length of the dosing period.

As a result of the enhanced pharmacokinetic parameters of the fusion protein, as described herein such as extended half-life, the therapeutic active protein linked to a mucin-domain polypeptide may be administered using longer intervals between doses compared to the corresponding therapeutic active protein not linked to mucin-domain polypeptide to prevent, treat, alleviate, reverse or ameliorate symptoms or clinical abnormalities of the disease, disorder or condition or prolong the survival of the subject being treated.

A therapeutically effective amount of a fusion protein may vary according to factors such as the disease state, age, sex, and weight of the individual, and the ability of the antibody or antibody portion to elicit a desired response in the individual. A therapeutically effective amount is also one in which any toxic or detrimental effects of the fusion protein are outweighed by the therapeutically beneficial effects. A prophylactically effective amount refers to an amount of fusion protein required for the period of time necessary to achieve the desired prophylactic result.

In one embodiment, a method of treatment comprises administration of a therapeutically effective dose of a pharmaceutical composition comprising a fusion protein of the invention to a subject in need thereof that results in an increase in the half-life of the therapeutic active protein as compared to a comparable dose of the therapeutic active protein not linked to a fusion polypeptide of the invention.

In another aspect, the invention provides methods of making fusion proteins to improve ease of manufacture relative to half-life extension technologies requiring post-expression chemical coupling, and result in increased stability, increased water solubility, and/or ease of formulation, as compared to the native therapeutic active proteins. In one embodiment, the invention includes a method of increasing the aqueous solubility of a therapeutic active protein comprising the step of linking the therapeutic active protein to a mucin-domain polypeptide selected such that a higher concentration in soluble form of the resulting fusion can be achieved, under physiologic conditions or in a therapeutically acceptable formulation, compared to the therapeutic active protein not linked to a mucin-domain polypeptide. Factors that contribute to the property of mucin-domain polypeptide to confer increased water solubility of active protein when incorporated into a fusion protein include the high percentage of glycosylation, the type of glycans, and the charge on the amino acids of the mucin-domain polypeptide. In some embodiments, the method results in a fusion protein wherein the water solubility is at least about 50%, or at least about 60% greater, or at least about 70% greater, or at least about 80% greater, or at least about 90% greater, or at least about 100% greater, or at least about 150% greater, or at least about 200% greater, or at least about 400% greater, or at least about 600% greater, or at least about 800% greater, or at least about 1000% greater, or at least about 2000% greater, or at least about 4000% greater, or at least about 6000% greater under physiologic conditions, or in a therapeutically acceptable formulation, compared to the native therapeutic active protein.

Nucleic Acid Sequences

The present invention provides isolated polynucleic acids encoding fusion proteins and sequences complementary to polynucleic acid molecules encoding fusion proteins of the invention. In another aspect, the invention encompasses methods to produce polynucleic acids encoding fusion proteins of the invention and sequences complementary to fusion proteins of the invention, including homologous variants. In general, the invention provides methods of producing a polynucleotide sequence coding for a fusion protein and expressing the resulting gene product that include assembling nucleotides encoding each of the mucin-domain polypeptides and active proteins, linking the components in frame, incorporating the encoding gene into an appropriate expression vector, transforming an appropriate host cell with the expression vector, and causing the fusion protein to be expressed in the transformed host cell, thereby producing the fusion protein of the invention. Standard recombinant techniques in molecular biology can be used to make the polynucleotides and expression vectors of the present invention. In accordance with the invention, nucleic acid sequences that encode a fusion protein may be used to generate recombinant DNA molecules that direct the expression of fusion proteins in appropriate host cells.

ments hereinabove described in this paragraph, the gene can further comprise nucleotides encoding spacer sequences that may also encode cleavage sequence(s).

In one approach, a construct is first prepared containing the DNA sequence corresponding to a fusion protein. DNA encoding an active protein and/or a mucin polypeptide domain may be obtained from a cDNA library prepared using standard methods from tissue or isolated cells believed to possess the mRNA of an active protein and to express it at a detectable level. If necessary, the coding sequence can be obtained using conventional primer extension procedures as described in Sambrook, et al., supra, to detect precursors and processing intermediates of mRNA that may not have been reverse-transcribed into cDNA. Accordingly, DNA can be conveniently obtained from a cDNA library prepared from such sources. The encoding gene(s) may also be obtained from a genomic library or created by standard synthetic procedures known in the art (e.g., automated nucleic acid synthesis) using DNA sequences obtained from publicly available databases, patents, or literature references. Such procedures are well known in the art and well described in the scientific and patent literature. For example, sequences can be obtained from Chemical Abstracts Services (CAS) Registry Numbers (published by the American Chemical Society) and/or GenBank Accession Numbers available through the National Center for Biotechnology Information (NCBI) webpage, available on the world wide web at ncbi.nlm.nih.gov, that correspond to entries in the CAS Registry or GenBank database that contain an amino acid sequence of the active protein or of a fragment or variant of the active protein or of the mucin-domain polypeptide.

A gene or polynucleotide encoding the active protein, for example, can then be cloned into a construct, which can be a plasmid or other vector under control of appropriate transcription and translation sequences for high level protein expression in a biological system. In a later step, a second gene or polynucleotide coding for the mucin-domain polypeptide, for example, is genetically fused to the nucleotides encoding the N- and/or C-terminus of the active protein gene by cloning it into the construct adjacent and in frame with the gene(s) coding for the active protein.

The resulting polynucleotides encoding the fusion polypeptides can then be individually cloned into an expression vector. The nucleic acid sequence may be inserted into the vector by a variety of procedures. In general, DNA is inserted into an appropriate restriction endonuclease site(s) using techniques known in the art. Vector components generally include, but are not limited to, one or more of a signal sequence, an origin of replication, one or more marker genes, an enhancer element, a promoter, and a transcription termination sequence. Construction of suitable vectors containing one or more of these components employs standard ligation techniques which are known to the skilled artisan. Such techniques are well known in the art and well described in the scientific and patent literature.

Suitable vectors, hosts, and expression systems are well known to those skilled in the art of recombinant expression. Various vectors are publicly available. The vector may, for example, be in the form of a plasmid, cosmid, viral particle, or phage. Both expression and cloning vectors contain a nucleic
Several cloning strategies are envisioned to be suitable acid sequence that enables the vector to replicate in one or for performing the present invention, many of which can be more selected host cells, and further allows expression and used to generate a construct that comprises a gene coding for post-translational modification of the recombinant protein a fusion protein or its complement. In one embodiment, the within the host cell. cloning strategy would be used to create a gene that encodes 65 The present invention also provides a host cell for express a monomeric fusion protein that comprises an active protein ing the monomeric fusion protein compositions disclosed and a mucin-domain polypeptide. In the foregoing embodi herein. Examples of suitable eukaryotic host cells include, US 9,156,897 B2 35 36 but are not limited to yeast hosts such as Saccharomyces injection or saline can be provided so that the ingredients may cerevisiae, Pichia pastoris, and Hansenula polymorpha: be mixed prior to administration. insect hosts such as Spodoptera frugiperda Sf9, Spodoptera In another preferred embodiment, the pharmaceutical frugiperda Sf21, and High Five cells; and mammalian hosts composition is administered Subcutaneously. In this embodi such as mouse fibroblast cells (C 127-BPV), Chinese hamster ment, the composition may be supplied as a lyophilized pow ovary cells (CHO-DHFR, CHO-NEOSPLA, CHO-GS), and der to be reconstituted prior to administration. The composi mouse myeloma cells (NSO-GS). tion may also be supplied in a liquid form, which can be Expressed fusion proteins may be purified via methods administered directly to a patient. In one embodiment, the known in the art or by methods disclosed herein. Procedures composition is Supplied as a liquid in a pre-filled Syringe Such Such as gel filtration, affinity purification, salt fractionation, 10 that a patient can easily self-administer the composition. 
ion exchange chromatography, size exclusion chromatogra In another embodiment, the compositions of the present phy, hydroxyapatite adsorption chromatography, hydropho invention are encapsulated in liposomes, which have demon bic interaction chromatography and gel electrophoresis may strated utility in delivering beneficial active agents in a con be used; each tailored to recover and purify the fusion protein trolled manner over prolonged periods of time. Liposomes produced by the respective host cells. Methods of purification 15 are closed bilayer membranes containing an entrapped aque are described in Robert K. Scopes, Protein Purification: Prin ous Volume. Liposomes may also be unilamellar vesicles ciples and Practice, Charles R. Castor (ed.), Springer-Verlag possessing a single membrane bilayer or multilamellar 1994, and Sambrook, et al., supra. Multi-step purification vesicles with multiple membrane bilayers, each separated separations are also described in Baron, et al., Crit. Rev. from the next by an aqueous layer. The structure of the result Biotechnol. 10:179-90 (1990) and Below, et al., J. Chro ing membrane bilayer is such that the hydrophobic (non matogr. A. 679:67-83 (1994). polar) tails of the lipid are oriented toward the center of the Pharmaceutical Compositions bilayer while the hydrophilic (polar) heads orient towards the The present invention provides pharmaceutical composi aqueous phase. In one embodiment, the liposome may be tions comprising fusion proteins of the invention. In one coated with a flexible water soluble polymer that avoids embodiment, the pharmaceutical composition comprises the 25 uptake by the organs of the mononuclear phagocyte system, fusion protein and at least one pharmaceutically acceptable primarily the liver and spleen. Suitable hydrophilic polymers carrier. 
Fusion proteins of the present invention can be for for Surrounding the liposomes include, without limitation, mulated according to known methods to prepare pharmaceu PEG, polyvinylpyrrolidone, polyvinylmethylether, polym tically useful compositions, whereby the polypeptide is com ethyloxazoline, polyethyloxazoline, polyhydroxypropylox bined with a pharmaceutically acceptable carrier vehicle, 30 aZoline, polyhydroxypropylmethacrylamide, polymethacry Such as aqueous solutions or buffers, pharmaceutically lamide, polydimethylacrylamide, acceptable suspensions and emulsions. Examples of non polyhydroxypropylmethacrylate, polyhydroxethylacrylate, aqueous solvents include propyl ethylene glycol, polyethyl hydroxymethylcellulose hydroxyethylcellulose, polyethyl ene glycol and vegetable oils. Therapeutic formulations are eneglycol, polyaspartamide and hydrophilic peptide prepared for storage by mixing the active ingredient having 35 sequences as described in U.S. Pat. Nos. 6,316,024; 6,126, the desired degree of purity with optional physiologically 966; 6,056,973 and 6,043,094, the contents of which are acceptable carriers, excipients or stabilizers, as described in incorporated by reference in their entirety. Remington’s Pharmaceutical Sciences 16th edition, Osol, A. Liposomes may be comprised of any lipid or lipid combi Ed. (1980), in the form of lyophilized formulations or aque nation known in the art. For example, the vesicle-forming ous solutions. 40 lipids may be naturally-occurring or synthetic lipids, includ The pharmaceutical compositions can be administered ing phospholipids, such as phosphatidylcholine, phosphati orally, intranasally, parenterally or by inhalation therapy, and dylethanolamine, phosphatidic acid, phosphatidylserine, may take the form of tablets, lozenges, granules, capsules, phasphatidylglycerol, phosphatidylinositol, and sphingomy pills, ampoules, Suppositories or aerosol form. They may also elin as disclosed in U.S. Pat. Nos. 
6,056,973 and 5,874,104. take the form of Suspensions, Solutions and emulsions of the 45 The vesicle-forming lipids may also be glycolipids, cerebro active ingredient in aqueous or nonaqueous diluents, syrups, sides, or cationic lipids, such as 1,2-dioleyloxy-3-(trimethy granulates or powders. In addition, the pharmaceutical com lamino) propane (DOTAP); N-1-(2,3,-ditetradecyloxy)pro positions can also contain other pharmaceutically active com pyl-N,N-dimethyl-N-hydroxyethylammonium bromide pounds or a plurality of compounds of the invention. (DMRIE); N-1-(2,3-dioleyloxy)propyl-N,N-dimethyl-N- More particularly, the present pharmaceutical composi 50 hydroxyethylammonium bromide (DORIE); N-1-(2,3-dio tions may be administered for therapy by any suitable route leyloxy)propyl-N,N,N-trimethylammonium chloride including oral, rectal, nasal, topical (including transdermal, (DOTMA); 3 N—(N',N'-dimethylaminoethane) carbamoly aerosol, buccal and Sublingual), vaginal, parenteral (includ cholesterol (DC-Chol); or dimethyldioctadecylammonium ing Subcutaneous, Subcutaneous or intrathecally by infusion (DDAB) also as disclosed in U.S. Pat. No. 6,056,973. Cho pump, intramuscular, intravenous and intradermal), intravit 55 lesterol may also be present in the proper range to impart real, and pulmonary. It will also be appreciated that the pre stability to the vesicle as disclosed in U.S. Pat. Nos. 5,916,588 ferred route will vary with the therapeutic agent, condition and 5,874,104. and age of the recipient, and the disease being treated. For liquid formulations, a desired property is that the for In a preferred embodiment, the composition is formulated mulation be Supplied in a form that can pass through a 25, 28. in accordance with routine procedures as a pharmaceutical 60 30, 31, 32 gauge needle for intravenous, intramuscular, composition adapted for intravenous administration to intraarticular, or Subcutaneous administration. human beings. 
Typically, compositions for intravenous In other embodiments, the composition may be delivered administration are solutions insterile isotonic aqueous buffer. via intranasal, buccal, or Sublingual routes to the brain to Where the composition is to be administered by infusion, it enable transfer of the active agents through the olfactory can be dispensed with an infusion bottle containing sterile 65 passages into the CNS and reducing the systemic administra pharmaceutical grade water or saline. Where the composition tion. Devices commonly used for this is administered by injection, an ampoule of sterile water for are included in U.S. Pat. No. 6,715,485. Compositions deliv US 9,156,897 B2 37 38 ered via this route may enable increased CNS dosing or SYBR Green. The fragment corresponding to the gene of reduced total body burden reducing systemic toxicity risks interest was isolated from the second row of wells on the gel. associated with certain drugs. Preparation of a pharmaceuti C) Ligation Reaction of the Gene to pcDNA. cal composition for delivery in a subdermally implantable The prepared pcDNA (step A) was mixed with the DNA device can be performed using methods known in the art, Such from step B in the presence of T4 ligase and incubated at room as those described in, e.g., U.S. Pat. Nos. 3,992,518; 5.660, temperature for 30 minutes. Following the ligation, the prod 848; and 5,756,115. ucts were transformed into TOP10 cells (Invitrogen; chemi cally competent strain of E. coli) and the correct clone was EXAMPLES picked and stored as a glycerol stock at the -80° C. 10 5. Expression of mucinylated IL-1Ra and exendin-4 fusion The following examples are offered by way of illustration proteins and are not to be construed as limiting the invention as All the proteins were expressed in CHO cells using Fre claimed in any way. eStyleTMMAX Reagent (Invitrogen) following the manufac Example 1 15 turer's protocol. 
Briefly, a day prior to transfection the cells were seeded at 0.5x10 cells/mL, and on the day of transfec Design, Preparation, Expression, and Purification of tion they were adjusted to 1x10° cells/mL as recommended Mucin Domain Fusion Constructs by manufacturer. For a 1 liter transfection, two tubes (A and B) of media (OptiPROTM, Invitrogen) were prepared, each 1. Design of Mucinylated IL-1Ra Fusion Proteins containing about 19 ml. 1 mg of DNA was added to tube A, Fusion proteins of IL-1Ra with varying lengths of muci and 1 ml of FreeStyleTMMAX Reagent was added to tube B. nylation were designed. The length of the mucin domain was Immediately the contents of both tubes were mixed and incu systematically increased by varying the number of mucin bated at room temperature for 15 minutes. After the incuba tandem repeats. The mucinylated domains of the fusion pro tion period the mixture was added slowly to the 1 liter of CHO teins were based on the tandem repeat (TR) from the human 25 cells. After transfection, the cells were left for 6 to 7 days and MUC20 protein. Fusion proteins of IL-1Ra with 2TR, 4TR, then the Supernatant was collected. 6TR, 8TR, and 12TR were designed, designated RDB1813 6. Purification of Mucinylated IL-1Ra and Exendin-4 Fusion SEQID NO: 1 (protein); SEQID NO: 2 (DNA), RDB1814 Proteins SEQID NO:3 (protein); SEQID NO: 4 (DNA), RDB1826 Purification of the His-tagged expressed proteins was car SEQID NO: 5 (protein); SEQID NO: 6 (DNA), RDB1815 30 ried out on a nickel column. After binding the protein, the SEQ ID NO: 7 (protein); SEQ ID NO: 8 (DNA), and column was washed with up to 5 column volumes of buffer A RDB1816 SEQIDNO:9 (protein); SEQID NO: 10 (DNA), (50 mM Tris pH8 and 500 mMNaCl). The bound protein was respectively. A linker of 4 glycines (GGGG (SEQ ID NO: eluted with an increasing concentration of imidazole (20 to 27)) was inserted between the C-terminus of the IL-1Ra 500 mM). 
The purified protein was dialyzed overnight against sequence and the first mucin TR, and between each set of 2 35 PBS. mucin TR. A His-tag was added to the C-terminus of RDB 1826 was purified on an anti-FLAG column. After RDB1813, RDB1814, RDB1815, and RDB1816, and a binding the protein, the column was washed with up to 5 FLAG tag to the C-terminus of RDB1826 for ease of purifi column volumes of PBS. The protein was eluted at pH3 and cation. directly neutralized with Tris buffer pH7. The purified protein 2. Design of Mucinylated Exendin-4 Fusion Protein 40 was dialyzed overnight against PBS. RDB2203 A fusion protein of exendin-4 with 8TR from the human Example 2 MUC20 protein was designed: RDB2203 SEQID NO: 24. A linker of 4 glycines (GGGG (SEQID NO: 27)) was inserted Molecular Weights and pls of Mucin-IL-1Ra between the C-terminus of the exendin-4 sequence and the 45 Constructs first mucin TR, and between each set of 2 mucin TR. A His-tag was added to the C-terminus for ease of purification, IL-1Ra-mucin fusion proteins RDB1813 (IL1Ra 2TR), with a spacer (GGGGS (SEQID NO: 28)) between the final RDB1814 (IL1Ra 4TR), RDB1815 (IL1Ra 8TR), and TR and the His-tag. RDB1816 (IL1Ra 12TR) were characterized using SD 3. Gene Synthesis 50 S/PAGE (FIG. 1A). The apparent molecular weight of all the Synthesis of the genes for expression of the designed con constructs is significantly higher than the calculated molecu structs was carried out using standard methods. lar weight of the polypeptide sequence, consistent with the 4. Subcloning of the Synthesized Gene into a Mammalian expected high level of glycosylation. The respective calcu Expression Vector lated molecular weights of the polypeptides for RDB1813, A) Preparation of the Expression Vector pcDNATM (Invit 55 RDB1814, RDB1815, and RDB1816 are 22 kD, 26.2 kD, rogen). 
34.8kD, and 43.3 kD, and their respective apparent molecular 5ug of pcDNA was digested with BamHI and HindIII for weights based on their mobility on the gel are 35 kD. 45 kD. two hours at 37°C. The digest was treated with calf alkaline 65 kD, and 80 kD (FIG. 1A: arrows). phosphatase to remove the 5' phosphate, thus preventing reli The isolelectric points of constructs RDB1813 (IL1Ra gation of vector on itself. Buffer was exchange to remove salts 60 2TR), RDB1826 (IL1Ra 6TR), RDB1815 (IL1Ra 8TR), and from calf alkaline phosphatase reaction. Qiagen’s PCR RDB 1816 (IL1Ra 12TR) were measured using isoelectric cleanup kit was used following the manufacturer's Suggested focusing (FIG. 1B). All constructs were heterogenous with protocol. The DNA was eluted in 30 ul of H20. respect to charge, with multiple bands around the PI of the B) Preparation of the Gene of Interest. protein. The PI’s of the protein were largely in line with their The gene of interest was digested with BamHI and HindIII 65 calculated PI's based on the polypeptide sequence, Suggest for two hours 37° C. The digestion reaction was run on an ing the O-glycans are not heavily sialylated. The multiplicity E-Gel R. CloneWellTM apparatus (Invitrogen) using 0.8% of bands is most likely due to differences in N-glycosylation. US 9,156,897 B2 39 40 All constructs were further characterized by analytical gel Example 4 filtration on a Superdex 200 column (FIGS. 2-6). Gel filtra tion separates proteins based on their hydrodynamic Volume Binding Affinity of Mucin-IL-1Ra to IL-1RI under native conditions (i.e. unlike SDS/PAGE, it is non denaturing). Elution times can be calibrated with globular 5 IL-1RI Fc (fusion protein) was immobilized on a Bia protein standards Such that the apparent molecular weight for coreTM Sensor Chip using the Human Antibody Capture Kit an unknown globular protein can be calculated. As elution (GE Healthcare). 
Unmodified IL-1Ra (positive binding con time is more directly related to hydrodynamic radius rather trol), RDB1813 (IL1Ra 2TR), RDB1814 (IL1Ra 4TR), than actual molecular weight, apparent molecular weights RDB1815 (IL1Ra 8TR), RDB1816 (IL1Ra 12TR), and which are significantly higher than their calculated molecular 10 RDB1826 (IL1Ra 6TR) were flowed over the chip at 5 con weights Suggest non-globular (ie, elongated or rod-like) centrations: 20 nM, 6.6 nM, 2.2 nM, 0.74 nM and 0.24 nM. In structures, high levels of glycosylation, and/or high levels of the First cycle, the 0.24 nM concentration of the specific hydration. construct was flowed over the bound ligand on the surface of the chip for 180 seconds, after which a blank solution was On gel filtration, the apparent molecular weights for 15 passed over the surface to allow the analyte to dissociate. The RDB1813, RDB1814, RDB1826, RDB1815, and RDB1816 same procedure was repeated for an additional four times are, 42 kD. 50kD, 118 kD, 139 kD, and 230 kD, respectively using an increasing concentration from 0.74 to 20 nM for (FIGS. 2-6). The apparent molecular weights of all the mucin each construct. The resulting sensorgrams were analyzed constructs were significantly higher than both their calculated with the native instrument software to calculate the binding molecular weights (based upon the amino acid sequence affinities of the constructs (FIGS. 10-14). alone) and their mobility on SDS/PAGE. These observations Consistent with their activity data from the cell-based are highly consistent with both the high level of glycosylation assay, all the constructs retained potent binding affinity for and the expected rod-like structure of the mucin constructs. IL-1RI. 
The calculated binding constants (K) to IL-1RI for RDB1813 (IL1Ra 2TR), RDB1814 (IL1Ra 4TR), RDB1826 Example 3 25 (IL1Ra 6TR), RDB1815 (IL1Ra8TR), and RDB1816 (IL1Ra 12TR) were 240 pM, 64 p.M., 526 pM, 160 pM, and 900 pM, Antagonist Activities of Mucin-IL-1Ra Constructs respectively. Thus, the binding affinities of the mucinylated constructs are 4.6-fold (RDB1813), 1.2-fold (RDB1814), HEK-BlueTM IL-1B cells (Invivogen) are human embry 10-fold (RDB1826), 3.1-fold (RDB1815), and 17.3-fold onic kidney cells specifically designed to detect bioactive 30 (RDB1816) higher than the unmodified IL-1Ra, whose K, IL-1B in vitro by monitoring the IL-1,3-induced expression of was determined to be 52 pM. It is important to note that the an NF-kB/AP-1 secreted embryonic alkaline phosphatase construct with the weakest affinity to IL-1RI (i.e. RDB1816) (SEAP) reporter gene. SEAP can be readily monitored when had the largest number of mucintandem repeats in the series using the SEAP detection medium QUANT1-BlueTM (Invi (12TR). RDB1816's loss in affinity was therefore entirely 35 due to a slower association rate (K-1.5x10) when com Vogen). IL-1Ra inhibits this IL-1,3-induced signal through pared with unmodified IL-1Ra (K=2.8x10'), perhaps due to binding IL-1RI, preventing the binding of IL-1RAcP, and a slower tumbling rate, as the dissociation rate (K) of thus assembly of the full signaling complex. The ability of the RDB1816 (K-1.4x10) was the same as for the unmodified mucin-IL1Ra constructs RDB1813 (IL1Ra 2TR), RDB1814 IL-1Ra (K-1.5x10) (FIG. 14). (IL1Ra 4TR), RDB1826 (IL1Ra 6TR), RDB1815 (IL1Ra 40 8TR), and RDB1816 (IL1Ra 12TR) to inhibit IL-1,3-induced Example 5 SEAP was evaluated. IL1Ra (anakinra) that lacks a fused mucin domain was used as a positive antagonist control. 
In Vivo Efficacy of RDB1816 100 uL of media containing HEK-BlueTMIL-1 B cells were plated into 96-well microtiter plates to a final concentration of 45 The efficacy and duration of action of RDB1816 (IL1Ra 50,000 cells/well. Mucin-IL1Ra constructs RDB1813 12TR) was evaluated in the mouse collagen-antibody-in (IL1Ra 2TR), RDB1814 (IL1Ra 4TR), RDB1826 (IL1Ra duced-arthritis (CAIA) model. Unmodified IL-1Ra is active 6TR), RDB1815 (IL1Ra 8TR), and RDB1816 (IL1Ra 12TR) in this model when continuously infused at a rate of 6 mg/kg/ were prepared at initial concentrations of 12 nM (RDB1813), hr, but has no effect as a single 20 mg/kg dose due to its 4.8 nM (RDB1814), 34.5 nM (RDB1826), 2.4 nM 50 extremely short half-life. (RDB1815), and 11.6 nM (RDB1816), then serially diluted The experimental design for the mouse CAIA model is and added in duplicate test samples to the HEK-BlueTM depicted in FIG. 15, with an in of 8 mice per group. Briefly, IL-1Bcells. IL-1B at 0.015 nM was used in all test samples to arthritis is initiated in the mice by intravenous (IV) injection induce SEAP 0.117 nM IL-1 Balone, and 11.76 nM IL1Ra+ of an anti-collagen antibody cocktail. After 3 days, the 0.015 nM IL-1B were used in control samples. The samples 55 inflammatory response was boosted with an intraperitoneal were incubated at 37°C., for 20-24 hours and evaluated using (IP) injection of lipopolysaccharide (LPS), while at the same the QUANT1-BlueTM assay. time, the test compounds RDB1816 (20 mg/kg) and anakinra Mucin-IL1Ra constructs RDB1813 (IL1Ra 2TR), (unmodified IL-1Ra, 20 mg/kg), and a saline control were RDB1814 (IL1Ra 4TR), RDB1826 (IL1Ra 6TR), RDB1815 delivered subcutaneously (SC). Paw volume was measured (IL1Ra 8TR), and RDB1816 (IL1Ra 12TR) inhibited IL-1 B 60 across multiple days to assess the degree of inflammation. 
induced SEAP expression in a dose-dependent fashion, with The results of the mouse CAIA experiment are depicted in ICs values of 0.72 nM, 0.37 nM, 3.5 nM, 0.42 nM, and 0.53 FIG. 16. The anakinra-treated group showed no reduction in nM, respectively, comparing favorably with the control (un inflammation relative to the saline control. This is due to its modified IL1Ra) ICs value of 0.1-0.15 nM (FIGS. 7-9). short half-life resulting in low exposure, as a continuous Thus, the inhibitory activity of all mucinlylated constructs is 65 infusion of anakinra is efficacious in this model. In marked within a few fold of the unmodified IL (non-mucinylated) contrast, significant reduction in paw Volume was observed in protein control. the RDB1816-treated mice relative to both the saline-treated US 9,156,897 B2 41 42 and the anakinra-treated groups. Thus, mucinylation Suffi Example 8 ciently increased the exposure of the IL-1Ra molecule to elicit a pharmacodynamic effect. Bioactivity of RDB2203 The ability of RDB2203 to agonize the GLP-1 receptor was Example 6 measured using the DiscoveRx Pathhunter eXpress GLP-1 receptor cAMP assay. The assay was executed as per the Pharmacokinetics (PK) of RDB1815 and RDB1816 manufacturers instructions. The results indicate that RDB2203 is a potentagonist of the GLP-1 receptor, demon A pharmacokinetic study in rats was conducted to deter 10 strating an ECs of 1.5 nM, which is about 21-fold less potent mine the half life of two IL-1Ra-mucin fusion polypeptide than the unmodified exendin-4 (ECso of 0.07 nM) (FIG. 20). constructs: RDB1815 and RDB1816. A single dose of RDB 1815 was delivered subcutaneously (5.6 mg/Kg) or Example 9 intravenously (2.1 mg/Kg), and a single dose of RDB1816 Pharmacokinetic Profile of RDB2203 was delivered Subcutaneously (6.4 mg/Kg) or intravenously 15 (2.4 mg/Kg), with an in 3 rats per dosage group. 
Blood was RDB2203 was dosed in rats intravenously at 0.65 mg/kg collected at various time points over 6 days and analyzed for and Subcutaneously at 0.65 mg/kg and 7.0 mg/kg. Relative to IL-1Ra by ELISA. the un-mucinylated exendin-4", RDB2203 exhibited an From the pharmacokinetic data (FIG. 17), half lives of 17.9 increase in half-life (2.0 hr vs. 0.5 hr., iv) and reduced clear h and 13.9 h were calculated for RDB1815 and RDB1816, ance (74 ml/hr/kg vs. 200 ml/hr/kg, iv), resulting in an respectively when by IV, and 11.0 hand 8.0 h, respectively increase in total exposure' Ai, G. et al.: Pharmacokinetics of when delivered by SC. This represents an extension of half exendin-4 in Wistar rats; Journal of Chinese Pharmaceutical life between 10-fold and 14-fold over reported values for the Sciences; 17 (2008) 6-10 (FIG. 21). Via the subcutaneous unmodified IL-1Ra control (Anakinra) Consequently, the route, RDB2203 also displayed improved , exposure of both RDB1815 and RDB1816 are dramatically 25 >95% relative to 65% for the unmodified exendin-4 (FIG. increased, consistent with the efficacy observed in the mouse 21). CAIA model. The patent and scientific literature referred to herein estab lishes the knowledge that is available to those with skill in the Example 7 art. All United States patents and published or unpublished 30 United States patent applications cited herein are incorpo rated by reference. All published foreign patents and patent Molecular Weight of Mucin-Exendin-4 Construct applications cited herein are hereby incorporated by refer RDB2203 ence. All other published references, documents, manuscripts and scientific literature cited herein are hereby incorporated RDB2203 was characterized using SDS/PAGE and ana 35 by reference. lytical gel filtration on a Superdex 200 column. 
By SDS/ While this invention has been particularly shown and PAGE, the apparent molecular weight of RDB2203 is about described with references to preferred embodiments thereof, 45 kD, about twice its calculated polypeptide molecular it will be understood by those skilled in the art that various weight of 22.8 kD, consistent with a high level of glycosyla changes in form and details may be made therein without tion (FIG. 18: molecular markers on the left). By analytical 40 departing from the scope of the invention encompassed by the gel filtration, the apparent molecular weight of RDB2203 is appended claims. It should also be understood that the 120 kD (single peak between the 158 kD and 75 kD stan embodiments described herein are not mutually exclusive and dards), due to the large hydrodynamic radius imparted by the that features from the various embodiments may be combined highly glycosylated mucin domain (FIG. 19). in whole or in part in accordance with the invention.
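As an arithmetic check on the fold differences quoted in Examples 2, 4, 7, and 8 above, the following Python sketch recomputes them from the values stated in the text. It is illustrative only and not part of the patent disclosure: the names `fold`, `KD_PM`, `MW_KD`, and `EC50_NM` are assumptions introduced here, and the inputs are simply the numbers transcribed from the Examples.

```python
"""Recompute ratios quoted in the Examples (inputs transcribed from the text)."""

# Example 4: equilibrium binding constants (KD) to IL-1RI, in pM.
KD_PM = {
    "RDB1813": 240.0,
    "RDB1814": 64.0,
    "RDB1826": 526.0,
    "RDB1815": 160.0,
    "RDB1816": 900.0,
}
KD_UNMODIFIED_PM = 52.0  # unmodified IL-1Ra

# Examples 2 and 7: (apparent SDS/PAGE MW, calculated polypeptide MW), in kD.
MW_KD = {
    "RDB1813": (35.0, 22.0),
    "RDB1814": (45.0, 26.2),
    "RDB1815": (65.0, 34.8),
    "RDB1816": (80.0, 43.3),
    "RDB2203": (45.0, 22.8),
}

# Example 8: EC50 values at the GLP-1 receptor, in nM.
EC50_NM = {"RDB2203": 1.5, "exendin-4": 0.07}


def fold(value: float, reference: float) -> float:
    """Return value / reference, rejecting a non-positive reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return value / reference


# KD of each construct relative to unmodified IL-1Ra (higher = weaker binding).
kd_fold = {name: round(fold(kd, KD_UNMODIFIED_PM), 1) for name, kd in KD_PM.items()}

# Apparent-to-calculated molecular weight ratio on SDS/PAGE.
mw_ratio = {name: round(fold(app, calc), 1) for name, (app, calc) in MW_KD.items()}

# Loss of potency of RDB2203 relative to unmodified exendin-4.
ec50_fold = fold(EC50_NM["RDB2203"], EC50_NM["exendin-4"])

if __name__ == "__main__":
    for name, value in kd_fold.items():
        print(f"{name}: KD is {value}-fold that of unmodified IL-1Ra")
    for name, value in mw_ratio.items():
        print(f"{name}: apparent/calculated MW = {value}")
    print(f"RDB2203 EC50 is {ec50_fold:.1f}-fold that of exendin-4")
```

Under these inputs the KD ratios agree with the quoted 4.6-, 1.2-, 3.1-, and 17.3-fold figures; the RDB1826 ratio comes out at 10.1, which the text rounds to 10-fold.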

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 28

<210> SEQ ID NO 1
<211> LENGTH: 209
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polypeptide

<400> SEQUENCE: 1

Ser Ala Arg Arg Pro Ser Gly Arg Lys Ser Ser Lys Met Gln Ala Phe
1               5                   10                  15

Arg Ile Trp Asp Val Asn Gln Lys Thr Phe Tyr Leu Arg Asn Asn Gln
            20                  25                  30

Leu Val Ala Gly Tyr Leu Gln Gly Pro Asn Val Asn Leu Glu Glu Lys
        35                  40                  45

Ile Asp Val Val Pro Ile Glu Pro His Ala Leu Phe Leu Gly Ile His
    50                  55                  60

Gly Gly Lys Met Cys Leu Ser Cys Val Lys Ser Gly Asp Glu Thr Arg
65                  70                  75                  80

Leu Gln Leu Glu Ala Val Asn Ile Thr Asp Leu Ser Glu Asn Arg Lys
                85                  90                  95

Gln Asp Lys Arg Phe Ala Phe Ile Arg Ser Asp Ser Gly Pro Thr Thr
            100                 105                 110

Ser Phe Glu Ser Ala Ala Cys Pro Gly Trp Phe Leu Cys Thr Ala Met
        115                 120                 125

Glu Ala Asp Gln Pro Val Ser Leu Thr Asn Met Pro Asp Glu Gly Val
    130                 135                 140

Met Val Thr Lys Phe Tyr Phe Gln Glu Asp Glu Gly Gly Gly Gly Ser
145                 150                 155                 160

Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile
                165                 170                 175

Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro
            180                 185                 190

His Pro Val Ile Thr Glu Ser Arg Gly Ser Ser His His His His His
        195                 200                 205

His

<210> SEQ ID NO 2
<211> LENGTH: 633
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polynucleotide

<4 OOs, SEQUENCE: 2 tcc.gct cac gaccttctgg gcgaaaatct tctaaaatgc aggcctt.ccg gatttgggat 6 O gtgaat Caga aaacttittta Cctgaggaac alaccagctgg togctggata cctgcaggga 12 O ccaaacgtga atctggagga gaaaatcgac gtcgt.cccaa ticgaacctica cqctctgttt 18O

Ctgggaatcc atggcggcaa aatgtgtctg. tcc ttgttga aatctggcga Cagact aga 24 O

Ctgcagctgg aggctgttgaa tat caccgac Ctgtctgaga atcgtaalaca gga caaacgc 3OO tittgcc titta t cogct coga tagtggacca acaac ct citt togaatctgc tigcttgc cct 360 ggatggitttctgttgtaccgc tatggaggcc gat cagcctg tdt Ctctgac caatatgcCC 42O gatgagggag ticatggtgac aaaattctac titt Caggagg atgagggcgg aggcggttct 48O gctagtagcq agt cct ctdc ttct tcc.gat gigacct cacc ccgtgattac cqaatc.ccga 54 O gcttct tccg aatcttctgc ct citt cogac gg.cccacacic ctdtcat cac tdagagc.cgt. 6OO ggttcatcac accaccatca toaccactag toga 633

<210> SEQ ID NO 3
<211> LENGTH: 254
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polypeptide

<400> SEQUENCE: 3

Ser Ala Arg Arg Pro Ser Gly Arg Lys Ser Ser Lys Met Gln Ala Phe
1               5                   10                  15

Arg Ile Trp Asp Val Asn Gln Lys Thr Phe Tyr Leu Arg Asn Asn Gln
            20                  25                  30

Leu Val Ala Gly Tyr Leu Gln Gly Pro Asn Val Asn Leu Glu Glu Lys
        35                  40                  45

Ile Asp Val Val Pro Ile Glu Pro His Ala Leu Phe Leu Gly Ile His
    50                  55                  60

Gly Gly Lys Met Cys Leu Ser Cys Val Lys Ser Gly Asp Glu Thr Arg
65                  70                  75                  80

Leu Gln Leu Glu Ala Val Asn Ile Thr Asp Leu Ser Glu Asn Arg Lys
                85                  90                  95

Gln Asp Lys Arg Phe Ala Phe Ile Arg Ser Asp Ser Gly Pro Thr Thr
            100                 105                 110

Ser Phe Glu Ser Ala Ala Cys Pro Gly Trp Phe Leu Cys Thr Ala Met
        115                 120                 125

Glu Ala Asp Gln Pro Val Ser Leu Thr Asn Met Pro Asp Glu Gly Val
    130                 135                 140

Met Val Thr Lys Phe Tyr Phe Gln Glu Asp Glu Gly Gly Gly Gly Ser
145                 150                 155                 160

Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile
                165                 170                 175

Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro
            180                 185                 190

His Pro Val Ile Thr Glu Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser
        195                 200                 205

Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser
    210                 215                 220

Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val
225                 230                 235                 240

Ile Thr Glu Ser Arg Gly Ser Ser His His His His His His
                245                 250

<210> SEQ ID NO 4
<211> LENGTH: 768
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polynucleotide

<400> SEQUENCE: 4
tcc.gct cac gaccttctgg gcgaaaatct tctaaaatgc aggcct tcc.g gatttgggat 60
gtgaat caga aaactttitta Cctgaggaac alaccagctgg tcgctggata Cctgcaggga 120
ccaaacgtga atctggagga gaaaatcgac gtcgt.cccaa tcqaacctica cgctctgttt 180
Ctgggaatcc atggcggcaa. aatgtgtctg t ccttgttga aatctggcga cgagact aga 240
Ctgcagctgg aggctgttgaa tat caccgac Ctgtctgaga atcgtaalaca gga caaacgc 300
tittgcc titta t cogct coga tagtggacca acalacct Ctt tcqaatctgc tgcttgc cct 360
ggatggitttctgttgtaccgc tatggaggcc gat cagoctg tgtct ctdac caatatgcc c 420
gatgagggag ticatggtgac aaaattctac titt Caggagg atgaggg.cgg aggcggttct 480
gctagtagcg agt cct Ctgc ttct tcc.gat ggacct cacc cc.gtgattac cgaatcc.cga 540
gcttct tccg aatcttctgc citct tcc.gac ggcc.cacacc ctgtcat cac tgaga.gc.cgt. 600
ggtggcggtg gatctgctag tag tdaatca tctgctagta gtgacggccC acaccc.cgtg 660
attact gaga gtcgtgcctic titc.cgaatca tctgctagta gtgacggacc t caccc.cgtg 720
at cactgagt ccc.gtggctic at cacaccac catcatcacc act agtga 768

<210> SEQ ID NO 5
<211> LENGTH: 297
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polypeptide

<400> SEQUENCE: 5
Ser Ala Arg Arg Pro Ser Gly Arg Lys Ser Ser Lys Met Gln Ala Phe 1 5 10 15
Arg Ile Trp Asp Val Asn Gln Lys Thr Phe Tyr Leu Arg Asn Asn Gln 20 25 30
Leu Val Ala Gly Tyr Leu Gln Gly Pro Asn Val Asn Leu Glu Glu Lys 35 40 45
Ile Asp Val Val Pro Ile Glu Pro His Ala Leu Phe Leu Gly Ile His 50 55 60
Gly Gly Lys Met Cys Leu Ser Cys Val Lys Ser Gly Asp Glu Thr Arg 65 70 75 80
Leu Gln Leu Glu Ala Val Asn Ile Thr Asp Leu Ser Glu Asn Arg Lys 85 90 95
Gln Asp Lys Arg Phe Ala Phe Ile Arg Ser Asp Ser Gly Pro Thr Thr 100 105 110
Ser Phe Glu Ser Ala Ala Cys Pro Gly Trp Phe Leu Cys Thr Ala Met 115 120 125
Glu Ala Asp Gln Pro Val Ser Leu Thr Asn Met Pro Asp Glu Gly Val 130 135 140
Met Val Thr Lys Phe Tyr Phe Gln Glu Asp Glu Gly Gly Gly Gly Ser 145 150 155 160
Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Pro 165 170 175
Ser Arg Ala Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val 180 185 190
Ile Thr Pro Ser Arg Ala Gly Gly Gly Gly Ser Ser Glu Ser Ser Ala 195 200 205
Ser Ser Asp Gly Pro His Pro Val Ile Thr Pro Ser Arg Ala Ser Glu 210 215 220
Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Pro Ser Arg 225 230 235 240
Ala Gly Gly Gly Gly Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro 245 250 255
His Pro Val Ile Thr Pro Ser Arg Ala Ser Glu Ser Ser Ala Ser Ser 260 265 270
Asp Gly Pro His Pro Val Ile Thr Pro Ser Arg Ala Gly Gly Gly Gly 275 280 285
Ser Asp Tyr Lys Asp Asp Asp Asp Lys 290 295

<210> SEQ ID NO 6
<211> LENGTH: 897
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polynucleotide

<400> SEQUENCE: 6
tcc.gc.caggc ggcct tcc.gg Caggaaaagc agcaagatgc aggcctt.ccg gatctgggac 60
gtgalaccaga agacct tcta Cctg.cggaac alaccagctgg togc.cggct a tictgcaaggc 120
Cccaacgt.ca acctggagga gaagat.cgac gtcgt.cccta t cagcctica CCCCtgttc 180
Ctcggcatcc acggcggaaa gatgtgcctg agctg.cgtga agt ccggcga Calgacaagg 240
Ctccagotcg aggc.cgtgaa tat caccgac Ctgtc.cgaga accggaagca gga caag.cgg 300
titcgc.ctt catcaggit coga cagcggcc ct accacct cott togaatcc.gc cqcttgtc.ct 360
ggctggitttctgttgtaccgc tatggaggcc gaccagcctg tdt CCct cac caa.catgcct 420
gacgagggcg tatggtgac Caagttctac titcCaggagg acgaaggagg cqgcggctic C 480
agcgaatcca gcgc.ct coag catggcc cc catcCt9tca to acc cc tag cagggcct cc 540
gaaagctic.cg C cagcagoga tiggacct cat CCtgt catta Cacct agcag ggctggagga 600
ggaggcagct C cagt ccag cqctagotcc gacggacccc accc.cgtgat tacaccct cc 660
cgggct tccg agagcagcgc titc.ca.gcgat ggacct catc ccgtgat cac ccct tcc agg 720
gctggcggag gcggct coag cagagcago goctocagcg acggcc.ccca C cctgttgatt 780
acacct tccc gggc.ca.gcga gagct cogct agcagcgatg gaccc catcc cqtgat caca 840
cc.ca.gcaggg ccggaggcgg aggaagcgat tacaaggacg acgacgacaa gtagtga 897

<210> SEQ ID NO 7
<211> LENGTH: 344
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polypeptide

<400> SEQUENCE: 7
Ser Ala Arg Arg Pro Ser Gly Arg Lys Ser Ser Lys Met Gln Ala Phe 1 5 10 15
Arg Ile Trp Asp Val Asn Gln Lys Thr Phe Tyr Leu Arg Asn Asn Gln 20 25 30
Leu Val Ala Gly Tyr Leu Gln Gly Pro Asn Val Asn Leu Glu Glu Lys 35 40 45
Ile Asp Val Val Pro Ile Glu Pro His Ala Leu Phe Leu Gly Ile His 50 55 60
Gly Gly Lys Met Cys Leu Ser Cys Val Lys Ser Gly Asp Glu Thr Arg 65 70 75 80
Leu Gln Leu Glu Ala Val Asn Ile Thr Asp Leu Ser Glu Asn Arg Lys 85 90 95
Gln Asp Lys Arg Phe Ala Phe Ile Arg Ser Asp Ser Gly Pro Thr Thr 100 105 110
Ser Phe Glu Ser Ala Ala Cys Pro Gly Trp Phe Leu Cys Thr Ala Met 115 120 125
Glu Ala Asp Gln Pro Val Ser Leu Thr Asn Met Pro Asp Glu Gly Val 130 135 140
Met Val Thr Lys Phe Tyr Phe Gln Glu Asp Glu Gly Gly Gly Gly Ser 145 150 155 160
Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile 165 170 175
Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro 180 185 190
His Pro Val Ile Thr Glu Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser 195 200 205
Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser 210 215 220
Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val 225 230 235 240
Ile Thr Glu Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser 245 250 255
Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Ala Ser 260 265 270
Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu 275 280 285
Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser Ala Ser Ser 290 295 300
Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Ala Ser Ser Glu Ser 305 310 315 320
Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Gly 325 330 335
Ser Ser His His His His His His 340

<210> SEQ ID NO 8
<211> LENGTH: 1038
<212> TYPE: DNA
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polynucleotide

<400> SEQUENCE: 8
tcc.gct cac gaccttctgg gcgaaaatct tctaaaatgc aggcct tcc.g gatttgggat 60
gtgaat Caga aaactittitta Cctgaggaac alaccagctgg tcgctggata Cctgcaggga 120
ccaaacgtga atctggagga gaaaatcgac gtcgt.cccaa tcqaacctica cgctctgttt 180
Ctgggaatcc atggcggcaa. aatgtgtctg t ccttgttga aatctggcga cgagact aga 240
Ctgcagctgg aggctgttgaa tat caccgac Ctgtctgaga atcgtaalaca gga caaacgc 300
tittgcc titta tcc.gct coga tagtggacca acalacct Ctt tcqaatctgc tgcttgc cct 360
ggatggitttc tgttgtaccgc tatggaggcc gat cagoctg tgtct ctdac caatatgcc c 420
gatgagggag t catggtgac aaaattctac titt Caggagg atgaggg.cgg aggcggttct 480
gctagtagcg agt cct ctogc ttct tcc.gat ggacct cacc cc.gtgattac cgaatcc.cga 540
gcttct tcc.g aatcttctgc citct tcc.gac ggcc.cacacc ctgtcat cac tgaga.gc.cgt. 600
gatctgctag tag tdaatca tctgctagta gtgacggccC acaccc.cgtg 660
attact gaga gtcgtgcctic titc.cgaatca tctgctagta gtgacggacc t caccc.cgtg 720
at cactgagt CCC gttggcgg tggcggitt CC gct tcatctg aatct tcc.gc ttcatc.cgat 780
ggtc.cc catc ctgtcattac cgaat citcgt. gcct ctagog aat catcc.gc ttctagtgac 840
ggtc.cccacc ctgtcattac tgaatcc.cga ggcgg.cggtg gatctgcttic titc.cgaatca 900
tctgcttcta gtgacggacc acaccctgtc attaccgaga gtagggctt C atctgaatct 960
tcc.gctt cat ccgacggacc a catcctgtg attactgaat cacgaggctic at Cacac CaC 1020
CatCat Cacc act agtga 1038

<210> SEQ ID NO 9
<211> LENGTH: 434
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polypeptide

<400> SEQUENCE: 9
Ser Ala Arg Arg Pro Ser Gly Arg Lys Ser Ser Lys Met Gln Ala Phe 1 5 10 15
Arg Ile Trp Asp Val Asn Gln Lys Thr Phe Tyr Leu Arg Asn Asn Gln 20 25 30
Leu Val Ala Gly Tyr Leu Gln Gly Pro Asn Val Asn Leu Glu Glu Lys 35 40 45
Ile Asp Val Val Pro Ile Glu Pro His Ala Leu Phe Leu Gly Ile His 50 55 60
Gly Gly Lys Met Cys Leu Ser Cys Val Lys Ser Gly Asp Glu Thr Arg 65 70 75 80
Leu Gln Leu Glu Ala Val Asn Ile Thr Asp Leu Ser Glu Asn Arg Lys 85 90 95
Gln Asp Lys Arg Phe Ala Phe Ile Arg Ser Asp Ser Gly Pro Thr Thr 100 105 110
Ser Phe Glu Ser Ala Ala Cys Pro Gly Trp Phe Leu Cys Thr Ala Met 115 120 125
Glu Ala Asp Gln Pro Val Ser Leu Thr Asn Met Pro Asp Glu Gly Val 130 135 140
Met Val Thr Lys Phe Tyr Phe Gln Glu Asp Glu Gly Gly Gly Gly Ser 145 150 155 160
Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile 165 170 175
Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro 180 185 190
His Pro Val Ile Thr Glu Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser 195 200 205
Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser 210 215 220
Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val 225 230 235 240
Ile Thr Glu Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser 245 250 255
Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Ala Ser 260 265 270
Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu 275 280 285
Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser Ala Ser Ser 290 295 300
Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Ala Ser Ser Glu Ser 305 310 315 320
Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Gly 325 330 335
Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro 340 345 350
His Pro Val Ile Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser Ala Ser 355 360 365
Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Gly Gly Gly Gly 370 375 380

Pro Asp Thr Arg 20

<210> SEQ ID NO 12
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 12
Ile Thr Thr Thr Thr Thr Val Thr Pro Thr Pro Thr Pro Thr Gly Thr 1 5 10 15
Gln Thr Pro Thr Thr Thr Pro 20

<210> SEQ ID NO 13
<211> LENGTH: 17
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 13
Ile Thr Thr Thr Glu Thr Thr Ser His Asp Thr Pro Ser Phe Thr Ser 1 5 10 15
Ser

<210> SEQ ID NO 14
<211> LENGTH: 16
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 14
Ala Thr Pro Leu Pro Val Thr Asp Thr Ser Ser Ala Ser Thr Gly His 1 5 10 15

<210> SEQ ID NO 15
<211> LENGTH: 8
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 15
Thr Thr Ser Thr Thr Ser Ala Pro 1 5

<210> SEQ ID NO 16
<211> LENGTH: 29
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 16
Ala Thr Gly Ser Thr Ala Thr Pro Ser Ser Thr Pro Gly Thr Thr His 1 5 10 15
Thr Pro Pro Val Leu Thr Thr Thr Ala Thr Thr Pro Thr 20 25

<210> SEQ ID NO 17
<211> LENGTH: 23
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 17
Thr Thr Ala Ala Pro Pro Thr Pro Ser Ala Thr Thr Gln Ala Pro Pro 1 5 10 15
Ser Ser Ser Ala Pro Pro Glu 20

<210> SEQ ID NO 18
<211> LENGTH: 19
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 18
Glu Glu Ser Thr Thr Val His Ser Ser Pro Gly Ala Thr Gly Thr Ala 1 5 10 15
Leu Phe Pro

<210> SEQ ID NO 19
<211> LENGTH: 59
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 19
Ser Ser Ser Pro Thr Pro Ala Glu Gly Thr Ser Met Pro Thr Ser Thr 1 5 10 15
Tyr Ser Glu Gly Arg Thr Pro Leu Thr Ser Met Pro Val Ser Thr Thr 20 25 30
Leu Val Ala Thr Ser Ala Ile Ser Thr Leu Ser Thr Thr Pro Val Asp 35 40 45
Thr Ser Thr Pro Val Thr Asn Ser Thr Glu Ala 50 55

<210> SEQ ID NO 20
<211> LENGTH: 19
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 20
Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Pro 1 5 10 15
Ser Arg Ala

<210> SEQ ID NO 21
<211> LENGTH: 15
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 21
Ala Thr Asn Ser Glu Ser Ser Thr Val Ser Ser Gly Ile Ser Thr 1 5 10 15

<210> SEQ ID NO 22
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 22
Val Pro Thr Thr Thr Thr 1 5

<210> SEQ ID NO 23
<211> LENGTH: 10
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 23
Gln Thr Thr Gln Pro Ala Ala Thr Glu Ala 1 5 10

<210> SEQ ID NO 24
<211> LENGTH: 230
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic polypeptide

<400> SEQUENCE: 24
His Gly Glu Gly Thr Phe Thr Ser Asp Leu Ser Lys Gln Met Glu Glu 1 5 10 15
Glu Ala Val Arg Leu Phe Ile Glu Trp Leu Lys Asn Gly Gly Pro Ser 20 25 30
Ser Gly Ala Pro Pro Pro Ser Gly Gly Gly Gly Ser Ala Ser Ser Glu 35 40 45
Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg 50 55 60
Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile 65 70 75 80
Thr Glu Ser Arg Gly Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser Ala 85 90 95
Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Ala Ser Ser 100 105 110
Glu Ser Ser Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser 115 120 125
Arg Gly Gly Gly Gly Ser Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp 130 135 140
Gly Pro His Pro Val Ile Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser 145 150 155 160
Ala Ser Ser Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Gly Gly 165 170 175
Gly Gly Ser Ala Ser Ser Glu Ser Ser Ala Ser Ser Asp Gly Pro His 180 185 190
Pro Val Ile Thr Glu Ser Arg Ala Ser Ser Glu Ser Ser Ala Ser Ser 195 200 205
Asp Gly Pro His Pro Val Ile Thr Glu Ser Arg Gly Gly Gly Gly Ser 210 215 220
His His His His His His 225 230

<210> SEQ ID NO 25
<211> LENGTH: 19
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (4)..(6)
<223> OTHER INFORMATION: Any amino acid
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (8)..(9)
<223> OTHER INFORMATION: Any amino acid
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (11)..(12)
<223> OTHER INFORMATION: Any amino acid
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (14)..(14)
<223> OTHER INFORMATION: Any amino acid
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (16)..(18)
<223> OTHER INFORMATION: Any amino acid

<400> SEQUENCE: 25
Glu Glu Ser Xaa Xaa Xaa His Xaa Xaa Pro Xaa Xaa Thr Xaa Thr Xaa 1 5 10 15
Xaa Xaa Pro

<210> SEQ ID NO 26
<211> LENGTH: 10
<212> TYPE: PRT
<213> ORGANISM: Homo sapiens

<400> SEQUENCE: 26
Gly Val Thr Gly Thr Thr Gly Pro Ser Ala 1 5 10

<210> SEQ ID NO 27
<211> LENGTH: 4
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide

<400> SEQUENCE: 27
Gly Gly Gly Gly 1

<210> SEQ ID NO 28
<211> LENGTH: 5
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide

<400> SEQUENCE: 28
Gly Gly Gly Gly Ser 1 5

What is claimed is:

1. A fusion protein comprising a mucin-domain polypeptide linked to at least one active protein wherein the mucin-domain polypeptide has at least 4 tandem amino acid repeating units of at least 8 amino acids in length per tandem repeating unit, wherein the active protein is exendin-4; and wherein the half-life of the fusion protein is increased by two fold as compared to the half-life of the corresponding active protein that is not fused to the mucin-domain polypeptide, and wherein the fusion protein comprises the amino acid sequence of SEQ ID NO: [illegible].

* * * * *