US008877688B2

(12) United States Patent (10) Patent No.: US 8,877,688 B2 Vasquez et al. (45) Date of Patent: *Nov. 4, 2014

(54) RATIONALLY DESIGNED, SYNTHETIC (56) References Cited LIBRARIES AND USES THEREFOR U.S. PATENT DOCUMENTS 4.946,778 A 8, 1990 Ladner et al. (75) Inventors: Maximiliano Vasquez, Palo Alto, CA 5,118,605 A 6/1992 Urdea (US); Michael Feldhaus, Grantham, NH 5,223,409 A 6/1993 Ladner et al. 5,283,173 A 2f1994 Fields et al. (US); Tillman U. Gerngross, Hanover, 5,380,833. A 1, 1995 Urdea NH (US); K. Dane Wittrup, Chestnut 5,525.490 A 6/1996 Erickson et al. Hill, MA (US) 5,530,101 A 6/1996 Queen et al. 5,565,332 A 10/1996 Hoogenboom et al. (73) Assignee: Adimab, LLC, Lebanon, NH (US) 5,571,698 A 11/1996 Ladner et al. 5,618,920 A 4/1997 Robinson et al. (*) Notice: Subject to any disclaimer, the term of this 5,658,727 A 8, 1997 Barbas et al. patent is extended or adjusted under 35 (Continued) U.S.C. 154(b) by 747 days. FOREIGN PATENT DOCUMENTS This patent is Subject to a terminal dis claimer. DE 19624562 A1 1, 1998 WO WO-88.01649 A1 3, 1988 (21) Appl. No.: 12/404,059 (Continued) OTHER PUBLICATIONS (22) Filed: Mar 13, 2009 Rader et al. (Jul. 21, 1998) Proceedings of the National Academy of (65) Prior Publication Data Sciences USA vol. 95 pp. 8910 to 8915.* US 201O/OO5638.6 A1 Mar. 4, 2010 (Continued) Primary Examiner — Heather Calamita Related U.S. Application Data Assistant Examiner — Christian Boesen (63) Continuation-in-part of application No. 12/210,072, (74) Attorney, Agent, or Firm — Choate, Hall & Stewart LLP filed on Sep. 12, 2008. (57) ABSTRACT (60) Provisional application No. 60/993,785, filed on Sep. The present invention overcomes the inadequacies inherent in the known methods for generating libraries of antibody-en 14, 2007. coding polynucleotides by specifically designing the libraries (51) Int. Cl. with directed sequence and length diversity. The libraries are C40B 40/08 (2006.01) designed to reflect the preimmune repertoire naturally created (52) U.S. Cl. by the human immune system, with or without DH segments USPC ...... 506/17 derived from other species, and are based on rational design (58) Field of Classification Search informed by examination of publicly available databases of None antibody sequences. See application file for complete search history. 8 Claims, 26 Drawing Sheets

88. Exeter: &: Exists:

3. fiss:f3d

83. & st St.G R - 8.

33.

3.

3 . SSXX s s x s : s s Pereitage US 8,877,688 B2 Page 2

(56) References Cited 6,489,123 B2 12/2002 Osbournet al. 6,492,107 B1 12/2002 Kauffman et al. U.S. PATENT DOCUMENTS 6,492,123 B1 12/2002 Holliger et al. 6,492,160 B1 12/2002 Griffiths et al. 5,688,666 A 11/1997 Bass et al. 6,521.404 B1 2/2003 Griffiths et al. 5,695,941. A 12/1997 Brent et al. 6,531,580 B1 3/2003 Huse et al. 5,712, 120 A 1/1998 Rodriguez et al. 6,544,731 B1 4/2003 Griffiths et al. 5,723,323 A 3, 1998 Kauffman et al. 6,545,142 B1 4/2003 Winter et al. 5,733,743 A 3, 1998 Johnson et al. 6,555,313 B1 4/2003 Griffiths et al. 5,739.281 A 4/1998 Thogersen et al. 6.569,641 B1 5/2003 Kauffman et al. 5,750,373 A 5, 1998 Garrard et al. 6,582,915 B1 6/2003 Griffiths et al. 5,767,260 A 6, 1998 Whitlow et al. 6,589,527 B1 7/2003 Winter et al. 5,780,279 A 7, 1998 Matthews et al. 6,589,741 B2 T/2003 Pluckthun et al. 5,798,208 A 8, 1998 Crea 6,593,081 B1 7/2003 Griffiths et al. 5,811,238 A 9, 1998 Stemmer et al. 6,610,472 B1 8/2003 Zhu et al. 5,814,476 A 9, 1998 Kauffman et al. 6,653,443 B2 11/2003 Zhang et al. 5,817483. A 10, 1998 Kauffman et al. 6,664.048 B1 12/2003 Wanker et al. 5,821,047 A 10, 1998 Garrard et al. 6,680,192 B1 1/2004 Lerner et al. 5,824,514 A 10, 1998 Kauffman et al. 6,696,245 B2 2, 2004 Winter et al. 5,830,663 A 1 1/1998 Embleton et al. 6,696,248 B1 2/2004 Knappik et al. 5,837.242 A 1 1/1998 Holliger et al. 6,696,251 B1 2/2004 Wittrup et al. 5,837,500 A 11/1998 Ladner et al. 6,699,658 B1 3, 2004 Wittrup et al. 5,840,479 A 11/1998 Little et al. 6,706,484 B1 3, 2004 Knappik et al. 5,846,765 A 12/1998 Matthews et al. 6,753,136 B2 6/2004 Lohning 5,858,657 A 1/1999 Winter et al. 6,806,079 B1 10/2004 McCafferty et al. 5,858,671 A 1/1999 Jones 6,828,422 B1 12/2004 Achim et al. 5,863,765 A 1/1999 Berry et al. 6,833,441 B2 12/2004 Wang et al. 5,866,344 A 2/1999 Georgiou 6,846,634 B1 1/2005 Tomlinson et al. 5,869,250 A 2/1999 Cheng et al. 6,916,605 B1 7/2005 McCafferty et al. 5,871,907 A 2f1999 Winter et al. 6,919, 183 B2 7/2005 Fandlet al. 5,872,215 A 2f1999 Osbourne et al. 6,969,586 B1 1 1/2005 Lerner et al. 5,885,793 A 3, 1999 Griffiths et al. 7,005,503 B2 2/2006 Hua et al. 5,888,773 A 3, 1999 Jost et al. 7,063,943 B1 6/2006 McCafferty et al. 5,917,018 A 6/1999 Thogersen et al. 7,083,945 B1 8/2006 Chen et al. 5,922,545 A 7, 1999 Mattheakis et al. 7,094,571 B2 8/2006 Harvey et al. 5,928,868 A 7, 1999 Liu et al. 7,166,423 B1 1/2007 Miltenyi et al. 5,935,831 A 8/1999 Quax et al. 7,189,841 B2 3/2007 Lerner et al. 5.948,620 A 9, 1999 Hurd et al. 7,208,293 B2 4/2007 Ladner et al. 5,955,275 A 9, 1999 Kamb 7,435,553 B2 10/2008 Fandlet al. 5,962.255. A 10/1999 Griffiths et al. 7.465,787 B2 12/2008 Wittrup et al. 5,965,368 A 10/1999 Vidal et al. 7,569,357 B2 8, 2009 Kranz et al. 5,969.108 A 10/1999 McCafferty et al. 2002, 0004215 A1 1/2002 Osbournet al. 5,976,862. A 1 1/1999 Kauffman et al. 2002/0169284 A1 11/2002 Ashkenazi et al. 5,994,515 A 1 1/1999 Hoxie 2002/0197691 Al 12/2002 Sugiyama 88s. A 3. SE 2003.01.14659 A1 6, 2003 Winter et al. W. J. W. 2003. O130496 A1 7, 2003 Winter et al. 2866 A 58 SES al. 2003/0.148372 A1 8/2003 Tomlinson et al. 6,040.136w w A 3/2000 Garrard etal 2.89.2003. O190674 A1 10,858 2003 GriffithsA. et al.1 88: A 58 Ria. 2003/0228.302 A1 12/2003 Crea 6,072,036 A 6/2000 Marasco et al. 2003/0232333 A1 12/2003 Ladner 6,083,693. A 7/2000 Nandabalan et al. 2003/0232395 A1 12/2003 Hufton 6,114,147 A 9, 2000 Frenken et al. 2004/0038921 A1 2/2004 Kreutzer et al. 6,132,963 A 10/2000 Brent et al. 2004/0110941 A2 6/2004 Winter et al. 6,140,471. A 10/2000 Johnson et al. 2004/0146976 A1 7/2004 Wittrup et al. 6,159,705. A 12/2000 Trueheart et al. 2004/0157214 A1 8/2004 McCafferty et al. 6,171,795 B1 1/2001 Korman et al. 2004/0157215 A1 8/2004 McCafferty et al. 6,172,197 B1 1/2001 McCafferty et al. 2004/02196.11 A1 11/2004 Racher 6,180.336 B1 1/2001 Osbourn et al. 2005/02025 12 A1 9, 2005 Tomlinson et al. 6,187,535 B1 2/2001 LeGrain et al. 2006/0003334 A1 1/2006 Achim et al. 6,200,759 B1 3/2001 Dove et al. 2006/0008883 A1 1/2006 LaZaret al. 85. R 38: Wins s al 2006/00 1926.0 A1 1/2006 Lerner et al. 6.2915& B 9/2001 Winter et al. 2006/0166252 A1 7/2006 Ladner et al. 6.29.159 B1 9/2001 Winter et al. 2006/0234302 A1 10, 2006 Ladner et al. 6.29160 B 92001 Lerneret al. 2006, O257937 A1 11, 2006 Ladner 6.29,161 B1 9/2001 Lerner et al. 2007/003 1879 A1 2/2007 Ley et al. 6.291,650 B1 9, 2001 Winter et al. 2007/0099267 A1 5/2007 Harvey et al. 6,300.064 B1 10/2001 Knappik et al. 2007,0258954 A1 11/2007 Iverson et al. 6,300,065 B1 10/2001 Kieke et al. 2008. O153712 A1 6, 2008 Crea 6,319,690 B1 1 1/2001 Little et al. 2008.0171059 A1 7, 2008 Howland et al. 6,342.588 B1 1/2002 Osbournet al. 2008/0206239 A1 8/2008 Jones et al. 6,358,733 B1 3/2002 Motwani et al. 2009/0082213 A1 3/2009 Horowitz et al. 6,406,863 B1 6/2002 Zhu et al. 2010.0009866 A1 1/2010 Prinz et al. 6,410,246 B1 6/2002 Zhu et al. 2010/0292103 A1 11/2010 Ladner 6,410,271 B1 6/2002 Zhu et al. 2011/0009280 A1 1/2011 Hufton et al. 6,420,113 B1 7/2002 Buechler et al. 2011/0082054 A1 4/2011 Ladner 6,423.538 B1 7/2002 Wittrup et al. 2011/O136695 A1 6, 2011 Crea US 8,877,688 B2 Page 3

(56) References Cited Adimab Exhibit 2009, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Sauer, B. “Inducible Gene Targeting in Mice Using the U.S. PATENT DOCUMENTS Crd / lox System” (1998). Adimab Exhibit 2010, Filed in Interference No. 105,809, Filed Aug. FOREIGN PATENT DOCUMENTS 19, 2011 WO 03/02956 Published Apr. 10, 2003. WO WO-88.06630 A1 9, 1988 Adimab Exhibit 2011, Filed in Interference No. 105,809, Filed Aug. WO WO-94/07922 A1 4f1994 19, 2011- EP 02766425 Amendment before Examination filed Jun. WO WO-95/26400 A1 10, 1995 16, 2004. WO WO-97,0832O A1 3, 1997 Adimab Exhibit 2012, Filed in Interference No. 105,809, Filed Aug. WO WO-97/20923 A1 6, 1997 19, 2011- EP Publication No. EP1438400 Published Jun. 17, 2009. WO WO-97.49809 A1 12/1997 Adimab Exhibit 2013, Filed in Interference No. 105,809, Filed Aug. WO WO-98/49198 A1 11F1998 19, 2011 USSN 10,26264: Non-Final Rejection Issued Dec. 13, WO WO-9852976 A1 11, 1998 2007. WO WO-99/06834 A2 2, 1999 WO WO-99.285O2 A1 6, 1999 Adimab Exhibit 2014, Filed in Interference No. 105,809, Filed Aug. WO WO-99,36569 A1 7, 1999 19, 2011—Pogue and Goodnow "Gene Dose-dependent Matura WO WO-99/50461 A1 10, 1999 tion and Receptor Editing of B Cells Expressing Immumoglobulin WO WO-99.53049 A1 10, 1999 (Ig)G1 or IgM/IgG1 Tail Receptors' (2000). WO WO-99,55367 A1 11, 1999 Adimab Exhibit 2015, Filed in Interference No. 105,809, Filed Aug. WO WO-00, 18905 A1 4/2000 WO WO-OO,54057 A1 9, 2000 19, 2011—U.S. Appl. No. 10/262,646: Amendment in Reply to WO WO-01 (79229 A2 10, 2001 Action of Sep. 7, 2006. WO WO-01 (79481 A2 10, 2001 Adimab Exhibit 2016, Filed in Interference No. 105,809, Filed Aug. WO WO-2005/007121 A2 1, 2005 19, 2011—U.S. Appl. No. 10/262,646: Amendment in Reply to WO WO-2005O23993 A2 3, 2005 Action of Dec. 13, 2007. WO WO-2006,138700 A2 12/2006 Adimab Exhibit 2017. Filed in Interference No. 105,809, Filed Aug. WO WO-2007/056441 5/2007 19, 2011—Swers et al., "Shuffled antibody libraries created by in WO WO-2007/056441 A2 5/2007 vivo homologous recombination and yeast Surface display” (2004). WO WO-2008/O19366 A2 2, 2008 Adimab Exhibit 2018, Filed in Interference No. 105,809, Filed Aug. WO WO-2008/053275 5, 2008 19, 2011 U.S. Appl. No. 12/625,337: Dyax's Third Preliminary WO WO-2008/053275 A2 5, 2008 Amendment and Suggestion of Interference under 37 C.F.R. S WO WO-2008067547 A2 6, 2008 41.202(a) filed Jan. 24, 2011. WO WO-201OOO5863 A1 1, 2010 Adimab Exhibit 2019, Filed in Interference No. 105,809, Filed Aug. OTHER PUBLICATIONS 19, 2011—Agilent Technologies, Inc. v. Afjmetrix, Inc. Decided Jun. 4, 2009. Abbas, A.K. et al., Cellular and Molecular Immunology 4th Ed. W.B. Adimab Exhibit 2020, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Jones J. Robertson and Robert M. Currie v. Jos Tim Saunders Company, p. 133 (2000). mermans and Jean C. Raymond Decided May 5, 2010. Adams, G.P. and Schier, R., “Generating Improved Single-Chain Fv Adimab Exhibit 2021, Filed in Interference No. 105,809, Filed Aug. Molecules for Tumor Targeting” Journal of Immunological Methods, 19, 2011 U.S. Appl. No. 10/818,920: Ex parte Donald V. Smart 23 1:249-260 (1999). Appeal 2009-015036. Adams, G.P. and Weiner, L. M., “ therapy of Adimab Exhibit 2022, Filed in Interference No. 105,809, Filed Aug. cancer” Nature Biotechnology, 23(9) 1147-1157 (2005). 19, 2011—Dyax Motions List filed Jun. 28, 2011. Adimab Clean Copy of Claims, Filed in Interference No. 105,809, Adimab Exhibit 2023, Filed in Interference No. 105,809, Filed Aug. Filed May 19, 2011. 19, 2011—U.S. Appl. No. 10/262,646: Final Rejection Issued Oct. 3, Adimab Current List of Exhibits, Filed in Interference No. 105,809, 2008. Filed Jan. 20, 2012. Adimab Exhibit 2024, Filed in Interference No. 105,809, Filed Aug. Adimab Current List of Exhibits, Filed in Interference No. 105,809, 19, 2011—U.S. Appl. No. 10/262,646: Amendment in Reply to Filed Sep. 30, 2011. Action of Oct. 3, 2008 filed Apr. 3, 2009. Adimab Designation of Lead and Backup Counsel, Filed in Interfer Adimab Exhibit 2025, Filed in Interference No. 105,809, Filed Aug. ence No. 105,809, Filed May 19, 2011. 19, 2011—U.S. Appl. No. 12/625,337: Amendment and Response to Adimab Exhibit 2001, Filed in Interference No. 105,809, Filed Aug. Notice to File Missing Parts filed Aug. 17, 2010. 19, 2011 US Patent No. 7,700,302 Issued May 19, 1992. Adimab Exhibit 2026, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2002, Filed in Interference No. 105,809, Filed Aug. 19, 2011 U.S. Appl. No. 09/703,399, filed Oct. 31, 2000. 19, 2011—Boder and Wittrup–Boderetal"Yeast surface display for Adimab Exhibit 2027, Filed in Interference No. 105,809, Filed Aug. screening combinatorial polypeptide libraries” (1997). 19, 2011 USSN 2011/0009280 Published Jan. 13, 2011. Adimab Exhibit 2003, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2028, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Boder and Wittrup “Yeast Surface Display for Directed 19, 2011—Adimab's Reissue U.S. Appl. No. 13/213,302, filed Aug. Evolution of Protein Expression, Affinity, and Stability” (2000). 19, 2011. Adimab Exhibit 2004, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2029, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Schreuder et al., “Immobilizing proteins on the surface of 19, 2011—Patent No. 7,700,302: Amendment in Accordance with 37 yeast cells” (1996). C.F.R.S 1.73(b) filed Aug. 19, 2011. Adimab Exhibit 2005, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2030, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Liu et al. “Rapid Construction of Recombinant DNA by 19, 2011-Dyax's U.S. Appl. No. 10/262,646, filed Sep. 30, 2002. the Univector Plasmid-Fusion System” (2005). Adimab Exhibit 2031, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2006, Filed in Interference No. 105,809, Filed Aug. 19, 2011-Dyax's U.S. Appl. No. 60/326,320, filed Oct. 1, 2001. 19, 2011- Walhout et al., “GATEWAY Recombinational Cloning: Adimab Exhibit 2032, Filed in Interference No. 105,809, Filed Aug. Application to the Cloning of Large Numbers of Open Reading 19, 2011–Adimab's U.S. Appl. No. 1 1/593,957, filed Nov. 6, 2006. Frames or ORFeomes” (2000). Adimab Exhibit 2033, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2007, Filed in Interference No. 105,809, Filed Aug. 19, 2011–Adimab's U.S. Appl. No. 10/360,828, filed Feb. 7, 2003. 19, 2011—Smith, G. “Homologous Recombination Near and Far Adimab Exhibit 2034, Filed in Interference No. 105,809, Filed Aug. from DNA Breaks: Alternative Roles and Contrasting Views” (2001). 19, 2011–Adimab's U.S. Appl. No. 10/133,978, filed Apr. 25, 2002. Adimab Exhibit 2008, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2035, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Declaration of David M. Kranz, Ph.D. 19, 2011–Adimab's U.S. Appl. No. 10/072,301, filed Feb. 8, 2002. US 8,877,688 B2 Page 4

(56) References Cited Adimab Exhibit 2062, Filed in Interference No. 105,809, Filed Nov. 21, 2011—Cappellaro et al "Mating type-specific cell—cell recog OTHER PUBLICATIONS nition of Saccharomyces cerevisiae: cell wall attachment and active sites of a- and O-agglutinin' (1994). Adimab Exhibit 2036, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2063, Filed in Interference No. 105,809, Filed Nov. 19, 2011–Adimab's U.S. Appl. No. 10/071,866, filed Feb. 8, 2002. 21, 2011—Vander vaartet al., “Comparison of Cell Wall Proteins of Adimab Exhibit 2037, Filed in Interference No. 105,809, Filed Aug. Saccharomyces cerevisiae as Anchors for Cell Surface Expression of 19, 2011 U.S. Appl. No. 12/625,337, filed Nov. 24, 2009. Heterologous Proteins” (1997). Adimab Exhibit 2038, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2064, Filed in Interference No. 105,809, Filed Nov. 19, 2011–Assignment for Patent No. 7700302 filed Nov. 6, 2006. 21, 2011—Murai et al., “Construction of a Starch-Utilizing Yeast by Adimab Exhibit 2039, Filed in Interference No. 105,809, Filed Aug. Cell Surface Engineering” (1997). 19, 2011-Filing Receipt for U.S. Appl. No. 1 1/593,957 dated Dec. Adimab Exhibit 2065, Filed in Interference No. 105,809, Filed Nov. 4, 2006. 21, 2011—Ueda and Tanaka “Cell Surface Engineering of Yeast: Adimab Exhibit 2040, Filed in Interference No. 105,809, Filed Aug. Construction of Arming Yeast with Biocatalyst” (2000). 19, 2011—Hua et al., “Minimum Length of Sequence Homology Adimab Exhibit 2066, Filed in Interference No. 105,809, Filed Nov. Required for in Vivo Cloning by Homologous Recombination in 21, 2011—Shibasaki et al., "Construction of an engineered yeast Yeast” (1997). with glucose-inducible emission of green fluorescence from the cell Adimab Exhibit 2041, Filed in Interference No. 105,809, Filed Aug. surface” (2000). 19, 2011—U.S. Appl. No. 10/262,646: Notice of Abandonment dated Adimab Exhibit 2067, Filed in Interference No. 105,809, Filed Nov. Dec. 10, 2009. 21, 2011—Boder and Wittrup “Yeast Surface display system for Adimab Exhibit 2042, Filed in Interference No. 105,809, Filed Aug. antibody engineeirng” (1996). 19, 2011 U.S. Appl. No. 10/133,978: Office Action Restricing Adimab Exhibit 2068, Filed in Interference No. 105,809, Filed Nov. Claims dated Oct. 5, 2004. 21, 2011—Deposition of David M. Kranz, Ph.D. dated Oct. 14, 2011. Adimab Exhibit 2043, Filed in Interference No. 105,809, Filed Aug. Adimab Exhibit 2069, Filed in Interference No. 105,809, Filed Nov. 19, 2011 U.S. Appl. No. 09/703,399 Complete File History. 21, 2011—Deposition of Nathalie Scholler, M.D., Ph.D. dated Oct. Adimab Exhibit 2044, Filed in Interference No. 105,809, Filed Aug. 24, 2011. 19, 2011—U.S. Appl. No. 13/213,302 Second Amendment to Intro Adimab Exhibit 2070, Filed in Interference No. 105,809, Filed Nov. duce Missing Drawing Sheets filed on Aug. 19, 2011. 21, 2011 US Patent No. 6,699.658 Issued Mar. 2, 2004. Adimab Exhibit 2045, Filed in Interference No. 105,809, Filed Sep. Adimab Exhibit 2071, Filed in Interference No. 105,809, Filed Nov. 30, 2011 US Patent No. 7, 138,496 Issued Nov. 21, 2006. 21, 2011 US Patent No. 7,208.293 Issued Apr. 24, 2007. Adimab Exhibit 2046, Filed in Interference No. 105,809, Filed Sep. Adimab Exhibit 2072, Filed in Interference No. 105,809, Filed Nov. 30, 2011 US Patent No. 7,005,503 Issued Feb. 28, 2006. 21, 2011 International Publication No. WO 94/01567 Published Adimab Exhibit 2047, Filed in Interference No. 105,809, Filed Sep. Jan. 20, 1994. 30, 2011 U.S. Appl. No. 13/249,581: Complete File History of the Adimab Exhibit 2073, Filed in Interference No. 105,809, Filed Nov. CON of Reissue. 21, 2011 US Patent No. 6,423,538 Issued Jul 23, 2002. Adimab Exhibit 2048, Filed in Interference No. 105,809, Filed Sep. Adimab Exhibit 2074, Filed in Interference No. 105,809, Filed Nov. 30, 2011—U.S. Appl. No. 13/213,302: Fourth Preliminary Amend 21, 2011—Hendershot and Sitia "Immunoglobulin Assembly and ment filed Aug. 19, 2011. Secretion” (2004). Adimab Exhibit 2049, Filed in Interference No. 105,809, Filed Sep. Adimab Exhibit 2075, Filed in Interference No. 105,809, Filed Nov. 30, 2011 US Patent No. 6,610,472 Issued Aug. 26, 2003. 21, 2011—Hoogenboom and Chames “Natural and designer binding Adimab Exhibit 2050, Filed in Interference No. 105,809, Filed Sep. sites made by phage display technology” (2000). 30, 2011 U.S. Appl. No. 13/213,302: Third Preliminary Amend Adimab Exhibit 2076, Filed in Interference No. 105,809, Filed Nov. ment filed Aug. 19, 2011. 21, 2011 U.S. Appl. No. 1 1/593,957 Notice of Allowance dated Adimab Exhibit 2051, Filed in Interference No. 105,809, Filed Nov. Nov. 19, 2009. 21, 2011—Horwitz et al., “Secretion of functional antibody and Fab Adimab Exhibit 2077. Filed in Interference No. 105,809, Filed Nov. fragment from yeast cells” (1988). 21, 2011- Declaration of Eric T. Boder, Ph.D. filed Nov. 21, 2011. Adimab Exhibit 2052, Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2078, Filed in Interference No. 105,809, Filed Nov. 21, 2011 US Publication No. 2003/0232395 Published Dec. 18, 21, 2011—Second Declaration of David M. Kranz, Ph.D. filed Nov. 2003. 21, 2011. Adimab Exhibit 2053, Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2079, Filed in Interference No. 105,809, Filed Nov. 21, 2011- Website of Nathalie Scholler, MD., Ph.D. 21, 2011—U.S. Appl. No. 13/300,340: Reissue Application filed Adimab Exhibit 2054, Filed in Interference No. 105,809, Filed Nov. Nov. 18, 2011. 21, 2011—Raymond et al., “General Method for Plasmid Construc Adimab Exhibit 2080 corrected. Filed in Interference 105,809, tion Using Homologous Recombination” (1999). Filed Nov. 21, 2011 U.S. Appl. No. 13/300,308: corrected Reis Adimab Exhibit 2055, Filed in Interference No. 105,809, Filed Nov. Sue Application filed Nov. 18, 2011. 21, 2011 US Patent No. 6,300,065 Issued Oct. 9, 2001. Adimab Exhibit 2081, Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2056, Filed in Interference No. 105,809, Filed Nov. 21, 2011—Declaration of James Sheridan, Ph.D. filed Nov. 21, 2011. 21, 2011 US Patent No. 6,027,910 Issued Feb. 22, 2000. Adimab Exhibit 2082, Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2057, Filed in Interference No. 105,809, Filed Nov. 21, 2011–Curriculum Vitae for Eric T. Boder, Ph.D. 21, 2011 US Patent No. 6,696,251 Issued Feb. 24, 2004. Adimab Exhibit 2083, Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2058, Filed in Interference No. 105,809, Filed Nov. 21, 2011–Curriculum Vitae for David M. Kranz, Ph.D. 21, 2011 International Publication No. WO 94/18330 Published Adimab Exhibit 2084, Filed in Interference No. 105,809, Filed Nov. Aug. 18, 1994. 21, 2011 US Patent No. 5,223,409 Issued Jun. 29, 1993. Adimab Exhibit 2059, Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2085, Filed in Interference No. 105,809, Filed Nov. 21, 2011 US Patent No. 6,114,147 Issued Sep. 5, 2000. 21, 2011—Patel et al., “Parallel selection of antibody libraries on Adimab Exhibit 2060, Filed in Interference No. 105,809, Filed Nov. phage and yeast Surfaces via a cross-species display' (2011). 21, 2011—Schreuder et al., “Targeting of a Heterologous Protein to Adimab Exhibit 2086, Filed in Interference No. 105,809, Filed Nov. the Cell Wall of Saccharamyces cerevisiae' (1993). 21, 2011—Struhl et al., “High-frequency transformation of yeast: Adimab Exhibit 2061, Filed in Interference No. 105,809, Filed Nov. Autonomous replication of hybrid DNA molecules” (1979). 21, 2011—Imai and Yamamoto “The fission yeast mating pheromone Adimab Exhibit 2087, Filed in Interference No. 105,809, Filed Jan. P-factor: its molecular structure, gene structure, and ability to induce 20, 2012 Ueda and Tanaka "Genetic immobilization of proteins on gene expression and Garrest in the mating partner' (1993). the yeast cell surface” (2000). US 8,877,688 B2 Page 5

(56) References Cited Adimab Notice of Service of Priority Statement, Filed in Interference No. 105,809, Filed Aug. 26, 2011. OTHER PUBLICATIONS Adimab Opposition 1. Filed in Interference No. 105,809, Filed Nov. 21, 2011. Adimab Exhibit 2088, Filed in Interference No. 105,809, Filed Jan. Adimab Opposition 2, Filed in Interference No. 105,809, Filed Nov. 20, 2012 Deposition of Eric T. Boder, Ph.D. dated Dec. 20, 2011. 21, 2011. Adimab Exhibit 2089, Filed in Interference No. 105,809, Filed Jan. Adimab Opposition 3, Filed in Interference No. 105,809, Filed Nov. 20, 2012—Second Deposition of David M. Kranz, Ph.D. dated Jan. 3. 21, 2011. 2012. Adimab Opposition 4. Filed in Interference No. 105,809, Filed Nov. Adimab Exhibit 2090, Filed in Interference No. 105,809, Filed Jan. 21, 2011. 20, 2012—Schöndorf et al., “Characterization of the Complete Adimab Priority Statement, Filed in Interference No. 105,809, Filed Genome of the Tupaia (Tree Shrew) Adenovirus' (2002/2003). Aug. 19, 2011. Adimab Exhibit 2091, Filed in Interference No. 105,809, Filed Jan. Adimab Reply 1, Filed in Interference No. 105,809, Filed Jan. 20. 20, 2012—Availability emails for Dr. Sheridan, Dr. Boder, Dr. Kranz, 2012. Adimab Reply 2, Filed in Interference No. 105,809, Filed Jan. 20. Dr. Zhu and Dr. Scholler, between Wolf, Greenfield & Sacks, P.C. and 2012. Choate Hall & Stewart LLP. Adimab Reply 3, Filed in Interference No. 105,809, Filed Jan. 20. Adimab Exhibit 2092, Filed in Interference No. 105,809, Filed Jan. 2012. 20, 2012—Second Deposition of Nathalie Scholler, M.D., Ph.D. Adimab Reply 4, Filed in Interference No. 105,809, Filed Jan. 20. dated Jan. 12, 2012. 2012. Adimab Exhibit 2093, Filed in Interference No. 105,809, Filed Jan. Adimab Reply 5, Filed in Interference No. 105,809, Filed Jan. 20, 20, 2012 US Patent No. 7,700,302: Application for Reissue of 2012. Utility Patent filed on Sep. 30, 2011. Adimab Reply 6, Filed in Interference No. 105,809, Filed Jan. 20. Adimab Exhibit 2094, Filed in Interference No. 105,809, Filed Jan. 2012. 20, 2012—Davison et al., “Genetic content and evolution of Adimab Request for File Copies, Filed in Interference No. 105,809, adenoviruses” (2003). Filed May 19, 2011. Adimab Exhibit 2095, Filed in Interference No. 105,809, Filed Jan. Adimab Request for Oral Argument, Filed in Interference No. 20, 2012—U.S. Appl. No. 13/213.302 Filing Receipt for Reissue of 105,809, Filed Feb. 3, 2012. Patent No. 7,700,302 dated Oct. 11, 2011. Adimab Responsive Motion 6, Filed in Interference No. 105,809, Adimab Exhibit 2096, Filed in Interference No. 105,809, Filed Jan. Filed Sep. 30, 2011. 20, 2012 U.S. Appl. No. 13/300,308 Filing Receipt for Reissue of Adimab Substantive Motion 1, Filed in Interference No. 105,809, Patent No. 7,138,496 dated Dec. 26, 2011. Filed Aug. 19, 2011. Adimab Exhibit 2097, Filed in Interference No. 105,809, Filed Jan. Adimab Substantive Motion 2, Filed in Interference No. 105,809, 20, 2012 U.S. Appl. No. 13/213,302: Petition to Accept an Unin Filed Aug. 19, 2011. tentionally Delayed Priority Claim under 37 C.F.R.S 1.78(a)(3) filed Adimab Substantive Motion 3, Filed in Interference No. 105,809, Jan. 18, 2012. Filed Aug. 19, 2011. Adimab Exhibit 2098, Filed in Interference No. 105,809, Filed Jan. Adimab Substantive Motion 4 request to substitute count, Filed in 20, 2012 U.S. Appl. No. 13/300,340: Petition to Accept an Unin Interference No. 105,809, Filed Aug. 19, 2011. tentionally Delayed Priority Claim under 37 C.F.R.S 1.78(a)(3) filed Adimab Substantive Motion 5, Filed in Interference No. 105,809, Jan. 18, 2012. Filed Aug. 19, 2011. Adimab Exhibit 2099, Filed in Interference No. 105,809, Filed Jan. Adimab's Notice of Filing Preliminary Amendment, Filed in Inter 20, 2012 U.S. Appl. No. 13/300,308: Petition to Accept an Unin ference No. 105,809, Filed Sep. 8, 2011. tentionally Delayed Priority Claim under 37 C.F.R.S 1.78(a)(3) filed Akamatsu, Y. et al., “Construction of a human Ig combinatorial Jan. 19, 2012. library from genomic V segments and synthetic CDR3 fragments' J. Adimab Exhibit 2100, Filed in Interference No. 105,809, Filed Feb. Immunol. 51(9):4651-4659 (1993). 3, 2012–Adimab's Objections to II 26, 44-47 & 48-51 of Dyax Allen, J.B. et al., “Finding prospective partners in the library: the Exhibit 1112 filed Jan. 27, 2012. two-hybrid system and phage display find a match” TIBS, Adimab List of Exhibits, Filed in Interference No. 105,809, Filed 20:(12):511-516 (1995). Feb. 3, 2012. Alt, F.W. and Baltimore, D., "Joining of Immunoglobulin Heavy Adimab List of Proposed Motions, Filed in Interference No. 105,809, Chain Gene Segments: Implications from a Chromosome with Evi Filed Jun. 28, 2011. dence of Three D-JH Fusions” PNAS, 79:41 18-4122 (1982). Adimab Misc. Motion 7 Approved, Filed in Interference No. Alves, J. et al., “Accuracy of the EcoRV restriction endonuclease: 105,809, Filed Nov. 23, 2011. binding and cleavage studies with oligodeoxynucleotide Substrates Adimab Miscellaneous Motion 7, Filed in Interference No. 105,809, containing degenerate recognition sequences' Biochemistry, 34(35): Filed Nov. 23, 2011. 11191-1 1197 (1995). Adimab Miscellaneous Motion 8, Filed in Interference No. 105,809, Arden, B., "Conserved motifs in T-cell receptor CDR1 and CDR2: Filed Feb. 3, 2012. implications for ligand and CDS co-receptor binding” Current Opin Adimab Miscellaneous Motion 9, Filed in Interference No. 105,809, ion in Immunology, Current Biology LTD., 10(1):74-81 (1998). Filed Feb. 3, 2012. Aronheim, Ami et al., “Isolation of an AP-1 Repressor by a Novel Adimab Notice Designating Additional Backup Counsel, Filed in Method for Detecting Protein-Protein Interactions' Molecular and Interference No. 105,809, Filed Dec. 12, 2011. Cellular Biology, 17(6):3094-3102 (1997). Adimab Notice of Exhibit List Filed, Filed in Interference No. Aujame, L. et al., “High affinity human by phage display 105,809, Filed Jan. 20, 2012. Human Antibodies, 8(4): 155-168 (1997). Adimab Notice of Exhibit List Filed, Filed in Interference No. Ayala, M. et al., “Variable region sequence modulates periplasmic 105,809, Filed Feb. 3, 2012. export of a single-chain Fv antibody fragment in Escherichia coli Adimab Notice of Exhibits filed, Filed in Interference No. 105,809, BioTechniques, 18(5):832-838, 840-2 (1995). Filed Aug. 19, 2011. Balint, R.F. and Larrick, J.W., “Antibody engineering by parsimoni Adimab Notice of Exhibits Filed, Filed in Interference No. 105,809, ous mutagenesis' Gene, 137: 109-118 (1993). Filed Sep. 30, 2011. Barbas, C.F. 3rd et al., “Human autoantibody recognition of DNA” Adimab Notice of Real Party in Interest, Filed in Interference No. Proc. Natl. Acad. Sci., 92:2529-2533 (1995). 105,809, Filed May 19, 2011. Barbas, C.F. 3rdet al., “Semisynthetic combinatorial antibody librar Adimab Notice of Related Proceedings, Filed in Interference No. ies: a chemical solution to the diversity problem” Proceedings of the 105,809, Filed May 19, 2011. National Academy of Sciences of USA, 89:4457-4461 (1992). US 8,877,688 B2 Page 6

(56) References Cited Cattaneo, A. and Biocca, S., “The selection of intracellular antibod ies” TIBTECH, 17: 115-120 (1999). OTHER PUBLICATIONS Cha, J.H. et al., “Cell surface monkey CD9 antigen is a coreceptor that increases diphtheria toxin sensitivity and diphtheria toxin recep Barbas, C.F. 3rd, et al., “Assembly of combinatorial antibody librar tor affinity” Journal of Biological Chemistry, 275(10):6901-6907 ies on phage surfaces: The gene III site' Proc. Natl. Acad. Sci., (2000). 88:7978-7982 (1991). Chang, C.N. et al., “Expression of antibody Fab domains on Basu, M. et al., “Synthesis of compositionally unique DNA by ter bacteriophage surfaces. Potential use for antibody selection' J. minal deoxynucleotidyl transferase” Biochem. Biophys. Res. Immunol. 147(10):3610-3614. (1991). Comm., 111(3): 1105-1112 (1983). Bhatia, S.K. et al., “Rolling adhesion kinematics of yeast engineered Chang, H.C. et al., “A general method for facilitating heterodimeric to express selectins' Biotech. Prog., 19:1033-1037 (2003). pairing between two proteins: Application to expression of alpha and Binz, H.K. et al., “Engineering novel binding proteins from nonim beta T-cell receptor extracellular segments' Proc Natl. Acad. Sci., munoglobulin domains' Nat. Biotechnol. 23(10): 1257-1268 (2005). USA, 91:11408-11412 (1994). Bird, R.E. et al., “Single-chain antigen-binding proteins' Science, Charlton, H.R. et al., "Characterisation of a generic monoclonal 242(4877):423-426 (1988). antibody harvesting system for adsorption of DNA by depth filters Blakesley, et al., “Duplex Regions in "Single-stranded” oX174 DNA and various membranes' Bioseparation 8: 281-291 (1999). Are Cleaved by a Restriction Endonuclease from Haemophilus Chaudhary, V.K. et al., “A rapid method of cloning functional vari aegyptius' The Journal of Bilogical Chemistry, 252:7300-7306 able-region antibody genes in Escherichia coli as single-chain (1977). immunotoxins” Proc. Natl. Acad. Sci., 87(3):1066-1070 (1990). Boder, E.T. and Jiang, W., “Engineering Antibodies for Cancer Chen, C.M. et al., “Direct interaction of hepatitis C virus core protein Therapy” Annu. Rev. Chem. Biomol. Eng. 2:53-75 (2011). with the cellular lymphotoxin-beta receptor modulates the signal Boder, E.T. and Wittrup, K.D., "Optimal screening of surface-dis pathway of the lymphotoxin-beta receptor” Journal of Virology, played polypeptide libraries' Biotechnol Prog., 14(1):55-62 (1998). 71(12):9414-9426 (1997). Boder, E.T. and Wittrup, K.D., “Yeast surface display for screening Chen, W. et al., “Characterization of germline antibody libraries from combinatorial polypeptide libraries' Nat Biotechnol. 15(6):553-7 human umbilical cord blood and selection of monoclonal antibodies (1997). to viral envelope glycoproteins: Implications for mechanisms of Boder, E.T. et al., “Directed evolution of antibody fragments with immune evasion and design of vaccine immunogens' Biochem. monovalent femtomolar antigen-binding affinity” Proc Natl AcadSci Biophys. Res. Commun. 1-6 (2012). USA,97(20): 10701-5 (2000). Chiswell, David and McCaffery, John, "Phage antibodies: will new Borth, N. et al., “Efficient selection of high-producing subclones 'colliclonal antibodies replace monoclonal antibodies?” TIBTECH, during gene amplification of recombinant Chinese hamster ovary 10(3):80-84 (1992). cells by flow cytometry and cell sorting” Biotechnol. and Bioengin. Chothia, C. and Lesk, A.M.. “Canonical structures for the hypervari 71(4):266-273 (2000-2001). able regions of immunoglobulins' J. Mol. Biol. 196(4):901-917 Brachmann, Rainer K. and Boeke, J.D., “Tag games in yeast: the (1987). two-hybrid system and beyond”. Curr. Opin. Biotechnol. 8(5):561 Chothia, C. et al., “Conformations of immunoglobulin hypervariable 568 (1997). regions' Nature, 342(6252):877-883 (1989). Bradbury, A., “Display Technologies Expand Their Horizons' Chothia, C. et al., “Structural repertoire of the human VH segments' TIBTECH 17:137-138 (1999). J. Mol. Biol., 227(3):799-817 (1992). Bradbury, A., “Molecular Library Technologies at the Millenium”. Clackson, T. and Wells, J.A., “In vitro selection from protein and TIBTECH 18:132-133 (2000). peptide libraries' Elsevier Science Ltd., 12(5):173-184 (1994). Bradbury, A., “Recent advances in phage display: the report of the Clackson, T. et al., “Making antibody fragments using phage display Phage Club first meeting” Immunotechnology, 3(3):227-231 (1997). librarires' Nature, 352(6336):624-628 (1991). Bray, J.K. et al., "Optimized Torsion-Angle Normal Modes Repro Involved claims, Filed in Interference No. 105,809, Filed May 20, duce Conformational Changes More Accurately Than Cartesian 2011. Modes' Biophysical Journal 101 2966-2969 (2011). Co., M.S. and Queen, C., “Humanized antibodies for therapy'Nature, Breitling, F. et al., “A Surface expression vector for antibody Screen 351(6326):501-502 (1991). ing” Gene, 104(2): 147-153 (1991). Cochet O. et al., “Intracellular expression of an antibody fragment Brezinschek, H.P. et al., “Analysis of the human VH gene repertoire. neutralizing p21 ras promotes tumor regression'. Cancer Res. Differential effects of selection and Somatic hypermutation on human 58(6): 1170-1176 (1998). peripheral CD5(+)/IgM+ and CD5(-)/IgM+ B cells' The American Collins, A.M. et al., “Partitioning of rearranged Ig genes by mutation Society for Clinical Investigation, Inc., 99(10):2488-2501 (1997). analysis demonstrates D-D fusion and V gene replacement in the Broder, Y.C. et al., “The ras recruitment system, a novel approach to expressed human repertoire” J. Immunol. 172(1):340-348 (2004). the study of protein-protein interactions' Current Biology Collins, A.M. et al., “The reported germline repertoire of human 8(20): 1121-1124 (1998). immunoglobulin kappa chain genes is relatively complete and accu Burton, D.R. et al., “A large array of human monoclonal antibodies to rate” Immunogenetics, 60(11):669-676 (2008). type 1 human immunodeficiency virus from combinatorial libraries Corbett, S.J. et al., “Sequence of the human immunoglobulin diver of asymptomatic seropositive individuals' Proc. Natl. Acad. Sci., sity (D) segment locus: a systematic analysis provides no evidence 88(22): 10134-10137 (1991). for the use of DIR segments, inverted Degments, “minor D segments Canaan-Haden, L., “Purification and application of a single-chain Fv or D-D recombination” J. Mol. Bioi..., 270:587-597 (1997). antibody fragment specific to hepatitis B virus Surface antigen” Corrected Adimab Opposition 4. Filed in Interference No. 105,809, BioTechniques, 19(4) 606-608, 610, 612 passim(1995). Filed Nov. 23, 2011. Casset, Fet al., “A peptide mimetic of an anti-CD4 monoclonal Courtney, B.C. et al., “A phage display vector with improved stabil antibody by rational design” Biochemical and Biophysical Research ity, applicability and ease of manipulation”. Gene, 165(1): 139-140 Communications, 307(1):198-205, (2003). (1995). Castelli, L.A. et al., “High-level secretion of correctily processed Couto, J.R. et al., “Designing human consensus antibodies with beta-lactamase from Saccharomyces cera visiae using a high-copy minimal positional templates', Cancer Res., (23 Suppl):5973s-5977s number secretion vector' Biomolecular Research Institute, (1995). 142(1): 113-117 (1994). Crameri, R. and Blaser, K., "Cloning Aspergillus filmigatus allergens Caton, A.J. and Koprowski, H., “Influenza virus hemagglutinin-spe by the pluFo filamentous phage display system” Int Arch Allergy cific antibodies isolated from a combinatorial expression library are Immunol. 110(1):41-45 (1996). closely related to the immune response of the donor” Proc. Natl. Crameri, R. and Suter, M., “Display of biologically active proteins on Acad. Science, USA, 87(16):6450-6454 (1990). the Surface offilamentous phages: a cDNA cloning system for selec US 8,877,688 B2 Page 7

(56) References Cited Dyax Exhibit 1013, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Yeung and Wittrup "Quantitative Screening of Yeast Surface OTHER PUBLICATIONS Displayed Polypeptide Libraries by Magnetic Bead Capture” (2002). Dyax Exhibit 1014, Filed in Interference No. 105,809, Filed Aug. 19, tion of functional gene products linked to the genetic information 2011—Orr et al., “Rapid Method for Measuring Schiv Thermal Sta responsible for their production” Gene, 137(1):69-75 (1993). bility by Yeast Surface Display” (2003). Current List of Exhibits, Filed in Interference No. 105,809, Filed Dyax Exhibit 1015, Filed in Interference No. 105,809, Filed Aug. 19, Nov. 21, 2011. 2011—Griffin et al., “Blockade ofT Cell Activation Using a Surface Cwirla, S.E., et al., “Peptides on phage: a vast library of peptides for Linked Single-Chain Antibody to CTLA-4 (CD152)” (2000). identifying ligands' Proc. Natl. Acad. Sci. USA, 87(16):6378-6382 Dyax Exhibit 1016, Filed in Interference No. 105,809, Filed Aug. 19, (1990). Davies, J. and Riechmann, L., “Affinity improvement of single anti 2011—Kieke et al., “High Affinity T Cell Receptors from Yeast body VH domains: residues in all three hypervariable regions affect Display Libraries BlockT Cell Activation by Superantigens' (2001). antigen binding” Immunotechnology 2(3): 169-179 (1996). Dyax Exhibit 1017, Filed in Interference No. 105,809, Filed Aug. 19, de Haard et al., “A Large Non-immunized Human Fab Fragment 2011—Brophy et al., “A yeast display system for engineering func Phage Library That Permits Rapid Isolation and Kinetic Analysis of tional peptide-MHC complexes” (2003). High Affinity Antibodies' Journal of Biological Chemistry, Dyax Exhibit 1018, Filed in Interference No. 105,809, Filed Aug. 19, 274(26): 18218-18230 (1999). 2011—Starwalt et al., “Directed evolution of a single-chain class II De Jaeger, G. et al., “Analysis of the interaction between single-chain MHC product by yeast display” (2003). variable fragments and their antigen in a reducing intracellular envi Dyax Exhibit 1019, Filed in Interference No. 105,809, Filed Aug. 19, ronment using the two-hybrid system” FEBS Lett., 467(2-3):316 2011—van den Beucken et al., “Affnity maturation of Fab antibody 320 (2000). fragments by fluorescent-activated cell sorting of yeast-displayed de Kruif, J. et al., “Selection and application of human single chain Fv libraries” (2003). antibody fragments from a semi-synthetic phage antibody display Dyax Exhibit 1020, Filed in Interference No. 105,809, Filed Aug. 19, library with designed CDR3 regions” J. Mol. Biol. 248(1):97-105 2011—Brochure of the 13th Annual Phage and Yeast Display of (1995). Antibodies and Protein Conference. De Simone, A. et al., “Experimental free energy Surfaces reveal the Dyax Exhibit 1021, Filed in Interference No. 105,809, Filed Aug. 19, mechanisms of maintenance of protein solubility” PNAS, 108:52 2011 US Patent No. 6,610,472 Issued Aug. 26, 2003. 21057-21062 (2011). Dyax Exhibit 1022, Filed in Interference No. 105,809, Filed Aug. 19, Declaration BD.R. 203, Filed in Interference No. 105,809, Filed May 2011—Walhout et al., “GATEWAY Recombinational Cloning: 6, 2011. Application to the Cloning of Large Numbers of Open Reading Delves, P.J. "Antibody production: essential techniques' John Wiley Frames or ORFeomes” (2000). & Sons, New York, pp. 90-113 (1997). Dyax Exhibit 1023, Filed in Interference No. 105,809, Filed Aug. 19, Designation of Lead and Backup Counsel, Filed in Interference No. 2011–Swers et al., “Shuffled antibody libraries created in vivo 105,809, Filed May 20, 2011. homologous recombination and yeast Surface display' (2004). Dyax Corrected Substantive Motion 1. Filed in Interference No. Dyax Exhibit 1024, Filed in Interference No. 105,809, Filed Aug. 19, 105,809, Filed Sep. 2, 2011. 2011—Swers et al., “Integrated Mimicry of B Cell Antibody Dyax Exhibit 1000, Filed in Interference No. 105,809, Filed Aug. 19, Mutagenesis. Using Yeast Homologous Recombination” (2011). 2011—Dyax Exhibit List dated Aug. 19, 2011. Dyax Exhibit 1025, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1001, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Abbas et al., "Cellular and Molecular Immunology, 4th ed., 2011 US Patent No. 7,700,302 Issued Apr. 20, 2010. p. 43, Figure 3-1. W.B. Saunders Co. (2000). Dyax Exhibit 1002, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1026, Filed in Interference No. 105,809, Filed Aug. 19, 2011 U.S. Appl. No. 12/625,337, filed Nov. 24, 2009. 2011—Lewin, B. “Genes V, p.99, Oxford University Press (1994). Dyax Exhibit 1003, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1027. Filed in Interference No. 105,809, Filed Aug. 19, 2011–Curriculum Vitae for Nathalie Scholler (née Buonavista), 2011—Dranginis et al., "A Biochemical Guide to Yeast Adhesins: M.D., Ph.D. Glycoproteins for Social and Antisocial Occasions” (2007). Dyax Exhibit 1004, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1027A, Filed in Interference No. 105,809, Filed Aug. 2011—Boder and Wittrup “Yeast surface display for screening 19, 2011—Dranginiset al., “A Biochemical Guide to Yeast Adhesins: combinatorial polypeptide libraries” (1997). Glycoproteins for Social and Antisocial Occasions” (2007). Dyax Exhibit 1005, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1028, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Kieke et al., “Isolation of anti-T cell receptor schiv mutants by 2011—Sharifmoghadam, et al., “The fission yeast Map4 protein is a yeast surface display” (1997). novel adhesin required formating” (2006). Dyax Exhibit 1006, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1029, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Cho et al., “A yeast surface display system for the discovery 2011 International Publication No. WO 02/055718 Published Jul. of ligands that trigger cell activation” (1998). 18, 2002. Dyax Exhibit 1007, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1030, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Boder and Wittrup "Optimal Screening of Surface-Displayed 2011–Specification of U.S. Appl. No. 09/703,399 as filed. Polypeptide Libraries” (1998). Dyax Exhibit 1031, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1008, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Ma et al., “Plasmid construction by homologous recombina 2011—Kieke et al., “Selection of functional T cell receptor mutants tion in yeast” (1987). from a yeast surface-display library” (1999). Dyax Exhibit 1032, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1009, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Wood et al., “The synthesis and in vivo assembly of func 2011—VanAntwerp and Wittrup "Fine Affinity Discrimination by tional antibodies in yeast” (1985). Yeast Surface Display and Flow Cytometry” (2000). Dyax Exhibit 1033, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1010, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Weaver-Feldhaus et al., “Yeast mating for combinatorial Fab 2011—Holler et al., “In vitro evolution of a T cell recepto with high library generation and surface display” (2004). affinity for peptide / MHC” (2000). Dyax Exhibit 1034, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1011, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Shen et al., “Delineation of Functional Regions within the 2011—Shusta et al., “Directed evolution of a stable scaffold for T-cell Subunits of the Saccharomyces cerevisiae Cell Adhesion Molecule receptor engineering” (2000). a-Agglutinin” (2001). Dyax Exhibit 1012, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1035, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Boder et al., “Directed evolution of antibody fragments with 2011—Ma et al., “Association of Transport-Defective Light Chains monovalent femtomolar antigen-binding affinity” (2000). with Immunoglobulin Heavy Chain Binding Protein' (1990). US 8,877,688 B2 Page 8

(56) References Cited Dyax Exhibit 1060, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Souriau and Hudson " for cancer OTHER PUBLICATIONS diagnosis and therapy” (2001). Dyax Exhibit 1061, Filed in Interference No. 105,809, Filed Sep. 30, Dyax Exhibit 1036, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Liu et al., “Rapid Construction of Recombinant DNA by the 2011—Colby et al., “Development of a Human Light Chain Variable Univector Plasmid-Fusion System” (2000). Domain(V) Intracellular Antibody Specific for the Amino Terminus Dyax Exhibit 1062, Filed in Interference No. 105,809, Filed Sep. 30, of Huntingtin via Yeast Surface Display” (2004). 2011—Hua et al., “Construction of a modular yeast two-hybrid Dyax Exhibit 1037, Filed in Interference No. 105,809, Filed Aug. 19, cDNA library from human EST clones for the human genome protein 2011 US Patent No. 4.946,778 Issued Aug. 7, 1990. linkage map' (1998). Dyax Exhibit 1038, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1063, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Hamilton and Gerngross "Glycosylation engineering in 2011—Abbas et al., Cellular and Molecular Immunology, 4th ed., yeast: the advent of fully humanized yeast” (2007). W.B. Saunders Co. (2000). Dyax Exhibit 1039, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1064, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Bird et al., “Single-Chain Antigen-Binding Proteins” (1998). 2011–Burke et al., “Methods in Yeast Genetics” (2000). Dyax Exhibit 1065, Filed in Interference No. 105,809, Filed Sep. 30, Dyax Exhibit 1040, Filed in Interference No. 105,809, Filed Aug. 19, 2011—U.S. Appl. No. 12/625,337: Third Preliminary Amendment 2011—Boder and Wittrup “Yeast Surface Display for Directed Evo and Suggestion of an Interference under 37 C.F.R. S. 41.202(a) filed lution of Protein Expression, Affinity, and Stability” (2000). Jan. 24, 2011. Dyax Exhibit 1041, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1066, Filed in Interference No. 105,809, Filed Sep. 30, 2011—First Declaration of Nathalie Scholler, M.D., Ph.D. filed Aug. 2011—Alberts et al., “Molecular Biology of the Cell Third Edition” 19, 2011. (1994). Dyax Exhibit 1042, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1067, Filed in Interference No. 105,809, Filed Sep. 30, 2011 U.S. Appl. No. 1 1/593.957: Non-final Office Action dated 2011 U.S. Appl. No. 10/262,646, filed Sep. 30, 2002. Aug. 12, 2008. Dyax Exhibit 1068, Filed in Interference No. 105,809, Filed Sep. 30, Dyax Exhibit 1043, Filed in Interference No. 105,809, Filed Aug. 19, 2011 U.S. Appl. No. 60/326,320, filed Oct. 1, 2001. 2011—U.S. Appl. No. 1 1/593.957: Adimab's response to the Non Dyax Exhibit 1069, Filed in Interference No. 105,809, Filed Sep. 30, final Office Action filed Jan. 13, 2009. 2011—Adimab's Substantive Motion 2 (for no interference-in-fact) Dyax Exhibit 1044, Filed in Interference No. 105,809, Filed Aug. 19, filed Aug. 19, 2011. 2011- First Declaration of Li Zhu, Ph.D. dated Jan. 13, 2009. Dyax Exhibit 1070, Filed in Interference No. 105,809, Filed Sep. 30, Dyax Exhibit 1045, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Marks et al., “Bypassing Immunization Human Antibodies 2011 U.S. Appl. No. 1 1/593,957: Final Office Action dated Apr. 8, from V-gene Libraries Displayed on Phage” (1991). 2009. Dyax Exhibit 1071, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Abbas et al., Cellular and Molecular Immunology, Fourth Dyax Exhibit 1046, Filed in Interference No. 105,809, Filed Aug. 19, Edition—Section III Maturation, Activation, and Regulation of Lym 2011—U.S. Appl. No. 1 1/593.957: Adimab's response to Final phocytes (2000). Office Action filed Oct. 7, 2009. Dyax Exhibit 1072, Filed in Interference No. 105,809, Filed Sep. 30, Dyax Exhibit 1047, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Second Declaration of Nathalie Scholler, M.D., Ph.D. dated 2011- Second Declaration of Li Zhu, Ph.D. filed Oct. 7, 2009. Sep. 29, 2011. Dyax Exhibit 1048, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1073, Filed in Interference No. 105,809, Filed Sep. 30, 2011—U.S. Appl. No. 1 1/593.957: Request to Change Inventorship 2011—U.S. Appl. No. 10/262,646: Final Office Action with Restric under 37 C.F.R.S 148(b) filed Oct. 7, 2009. tion Requirement dated Mar. 9, 2005. Dyax Exhibit 1049. Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1074, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Sambrook et al., “Molecular Cloning: A Laboratory Manual” 2011—U.S. Appl. No. 10/262,646: Response to Restriction Require (1989). ment filed Apr. 8, 2005. Dyax Exhibit 1050, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1075, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Claim chart showing disclosures in the 472 patent for claim 2011—U.S. Appl. No. 12/625,337: Amendment and Response to 1 of the 302 patent. Notice to File Missing Parts filed Aug. 17, 2010. Dyax Exhibit 1051, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1076, Filed in Interference No. 105,809, Filed Nov. 21, 2011—Claim chart showing disclosure in the 472 patent for claims 2011–Curriculum Vitae for David M. Kranz, Ph.D. 2-15 of the 302 patent. Dyax Exhibit 1077, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1052, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Woods and Gietz. “High-Efficiency Transformation of 2011—Claim chart showing disclosures in Boder for claim 1 of the Plasmid DNA into Yeast'. Methods in Molecular Biology, vol. 177. 302 patent. Dyax Exhibit 1078, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1053, Filed in Interference No. 105,809, Filed Aug. 19, 2011—David M. Kranz, Ph.D. biography from the University of 2011—Claim chart showing disclosures in Boder for claims 2-15 of Illinois. the 302 patent. Dyax Exhibit 1079, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1054, Filed in Interference No. 105,809, Filed Aug. 19, 2011 US Patent Application 2001/0037016 Published Nov. 1, 2011—Claim chart showing disclosures in Boder and Walhout for 2001. claim 1 of the 302 patent. Dyax Exhibit 1080, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1055, Filed in Interference No. 105,809, Filed Aug. 19, 2011 US Patent Publication No. US 2002/0026653 Published Feb. 2011—Claim chart showing disclosures in Boder and Walhout for 28, 2002. claim 2-15 of the 302 patent. Dyax Exhibit 1081, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1056, Filed in Interference No. 105,809, Filed Aug. 19, 2011 US Patent Publication No. US 2002/0037280 Published Mar. 2011—An on-line document showing publication date of Methods in 28, 2002. Enzymology, vol. 328. Dyax Exhibit 1082, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1057, Filed in Interference No. 105,809, Filed Aug. 19, 2011—Alberts et al., Molecular Biology of the Cell. Forth Edition, 2011—Karu et al., “Recombinant Antibody Technology” (1995). pp. 293-4 (2001). Dyax Exhibit 1058, Filed in Interference No. 105,809, Filed Aug. 19, Dyax Exhibit 1083, Filed in Interference No. 105,809, Filed Nov. 21, 2011 U.S. Appl. No. 1 1/593,957: Complete File History. 2011—Alberts et al., Molecular Biology of the Cell. Forth Edition, Dyax Exhibit 1059, Filed in Interference No. 105,809, Filed Sep. 30, pp. 540-1 (2001). 2011—U.S. Appl. No. 12/625,337: Fourth Preliminary Amendment Dyax Exhibit 1084, Filed in Interference No. 105,809, Filed Nov. 21, filed Sep. 30, 2011. 2011—The Hena Protein-Protein Interaction Webpage print out. US 8,877,688 B2 Page 9

(56) References Cited Dyax Exhibit 1107, Filed in Interference No. 105,809, Filed Jan. 20, 2012–Decision on Motions and Order for Patent Interference No. OTHER PUBLICATIONS 104,424 dated Jul. 28, 2000. Dyax Exhibit 1108, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Exhibit 1085, Filed in Interference No. 105,809, Filed Nov. 21, 2012—Stemmer, W. "DNA shuffling by random fragmentation and 2011—Deposition of David M. Kranz, Ph.D. dated Oct. 14, 2011. reassembly: In vitro recombination for molecular evolution” (1993/ Dyax Exhibit 1086, Filed in Interference No. 105,809, Filed Nov. 21, 1994). 2011—Deposition of Nathalie Scholler, M.D., Ph.D. dated Oct. 24. Dyax Exhibit 1109, Filed in Interference No. 105,809, Filed Jan. 20, 2011. 2012–Deposition of Eric T. Boder, Ph.D. dated Dec. 20, 2011. Dyax Exhibit 1087, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1110, Filed in Interference No. 105,809, Filed Jan. 20, 2011—In-Fusion SMARTer Directional cDNA Library Construction 2012—Second Deposition of David M. Kranz, Ph.D. dated Jan. 3. Kit User Manual Clontech—Jun. 2011. 2012. Dyax Exhibit 1088, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1111, Filed in Interference No. 105,809, Filed Jan. 20, 2011 US Patent Publication No. US 2003/009 1995 Published May 2012—Second Deposition of Nathalie Scholler, M.D., Ph.D. dated 15, 2003. Jan. 12, 2012. Dyax Exhibit 1089, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1112, Filed in Interference No. 105,809, Filed Jan. 20, 2011 US Patent Publication No. US 2001/0041333 Published Nov. 2012—Fourth Declaration of Nathalie Scholler, M.D., Ph.D. dated 15, 2001. Jan. 19, 2012. Dyax Exhibit 1090, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit 1113, Filed in Interference No. 105,809, Filed Jan. 20, 2011—Sheets et al., “Efficient construction of a large nonimmune 2012–Decision of Yager Miscellaneous Motion 4, filed in Interfer phageantibody library: the production of high-affinity human single ence No. 104,718 filed Mar. 11, 2002. chain antibodies to protein ' (1998). Dyax Exhibit List, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Exhibit 1091, Filed in Interference No. 105,809, Filed Nov. 21, 2012. 2011—Third Declaration of Nathalie Scholler, M.D., Ph.D. dated Dyax Exhibit List, Filed in Interference No. 105,809, Filed Aug. 19. Nov. 20, 2011. 2011. Dyax Exhibit 1092, Filed in Interference No. 105,809, Filed Nov. 21, Dyax Exhibit List, Filed in Interference No. 105,809, Filed Sep. 30, 2011—Shusta et al., “Yeast Polypeptide Fusion Surface Display 2011. Levels Predict Thermal Stability and Soluble Secretion Efficiency” Dyax List of Exhibits, Filed in Interference No. 105,809, Filed Nov. (1999). 21, 2011. Dyax Exhibit 1093, Filed in Interference No. 105,809, Filed Jan. 20, Dyax list of proposed motions, Filed in Interference No. 105,809, 2012–Boder et al., “Yeast Surface Display of a Noncovalent MHC Filed Jun. 28, 2011. Class II Heterodimer Complexed With Antigenic Peptide' (2005). Dyax Miscellaneous Motion 1. Filed in Interference No. 105,809, Dyax Exhibit 1094, Filed in Interference No. 105,809, Filed Jan. 20, Filed Feb. 3, 2012. 2012 Pepper et al., “A Decade of Yeast Surface Display Technol Dyax Notice of Exhibits Served, Filed in Interference No. 105,809, ogy: Where Are We Now?” (2008). Filed Aug. 19, 2011. Dyax Exhibit 1095, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Notice of Filing of Priority Statement, Filed in Interference No. 2012—Alberts et al., Molecular Biology of the Cell. Fourth Edition 105,809, Filed Aug. 19, 2011. (2002). Dyax Notice of Service of Exhibits, Filed in Interference No. Dyax Exhibit 1096, Filed in Interference No. 105,809, Filed Jan. 20, 105,809, Filed Jan. 20, 2012. 2012—Lin et al., “Display of a functional hetero-oligomeric catalytic Dyax Notice of Service of Exhibits, Filed in Interference No. antibody on the yeast cell surface” (2002). 105,809, Filed Nov. 21, 2011. Dyax Exhibit 1097, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Notice of Service of Priority Statement, Filed in Interference 2012—Hubberstey and Wildeman, “Use of interplasmid recombina No. 105,809, Filed Aug. 26, 2011. tion to generate stable selectable markers for yeast transformation: Dyax Opposition 1. Filed in Interference No. 105,809, Filed Nov. 21. application to studies of actin gene control” (1990). 2011. Dyax Exhibit 1098, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Opposition 2, Filed in Interference No. 105,809, Filed Nov. 21. 2012—Dielbandhoesing et al., “Specific Cell Wall Proteins Confer 2011. Resistance to Nisin upon Yeast Cells” (1998). Dyax Opposition 3, Filed in Interference No. 105,809, Filed Nov. 21. Dyax Exhibit 1099, Filed in Interference No. 105,809, Filed Jan. 20, 2011. 2012–Sedlp (Saccharomyces cerevisiae S288c Protein NCBI Dyax Opposition 4. Filed in Interference No. 105,809, Filed Nov. 21. Reference Sequence: NP 010362. 1. 2011. Dyax Exhibit 1100, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Opposition 5. Filed in Interference No. 105,809, Filed Nov. 21. 2012 Tiplp Saccharomyces cerevisiae S288c Protein NCBI 2011. Reference Sequence: NP 009623.1. Dyax Opposition 6, Filed in Interference No. 105,809, Filed Nov. 21. Dyax Exhibit 1101, Filed in Interference No. 105,809, Filed Jan. 20, 2011. 2012—Deposition of David M. Kranz, Ph.D. dated Oct. 14, 2011. Dyax Priority Statement, Filed in Interference No. 105,809, Filed Dyax Exhibit 1102, Filed in Interference No. 105,809, Filed Jan. 20, Aug. 19, 2011. 2012 Tsubouchi and Roeder, “Budding yeast Hedl down-regulates Dyax Reply 1, Filed in Interference No. 105,809, Filed Jan. 20, 2012. the mitotic recombination machinery when meiotic recombination is Dyax Reply 2, Filed in Interference No. 105,809, Filed Jan. 20, 2012. impaired” (2006). Dyax Reply 3, Filed in Interference No. 105,809, Filed Jan. 20, 2012. Dyax Exhibit 1103, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Reply 4, Filed in Interference No. 105,809, Filed Jan. 20, 2012. 2012—Sample Campaign << Adimab, Webpage from www.adimab. Dyax Request for Oral Argument, Filed in Interference No. 105,809, com/science-and-technology/technology-overview? sample cam Filed Feb. 3, 2012. paign. Dyax Responsive Motion 4. Filed in Interference No. 105,809, Filed Dyax Exhibit 1104, Filed in Interference No. 105,809, Filed Jan. 20, Sep. 30, 2011. 2012 Corporate Overview << Adimab, Webpage from www.adimab. Dyax Second Notice of Discussions, Filed in Interference No. com/about-adimab/corporate-overview. 105,809, Filed Oct. 6, 2011. Dyax Exhibit 1105, Filed in Interference No. 105,809, Filed Jan. 20, Dyax Substantive Motion 1. Filed in Interference No. 105,809, Filed 2012—Manivasakamand Schiestl. High efficiency transformation of Aug. 19, 2011. Saccharomyces cerevisiae by electroporation (1993). Dyax Substantive Motion 2. Filed in Interference No. 105,809, Filed Dyax Exhibit 1106, Filed in Interference No. 105,809, Filed Jan. 20, Aug. 19, 2011. 2012—Kranz and Voss, "Restricted reassociation of heavy and light Dyax Substantive Motion3, Filed in Interference No. 105,809, Filed chains from hapten-specific monoclonal antibodies” (1981). Aug. 19, 2011. US 8,877,688 B2 Page 10

(56) References Cited Hasan, N. and Szybalski. W. “Control of cloned gene expression by promoter inversion in vivo: construction of improved vectors with a OTHER PUBLICATIONS multiple cloning site and the Ptac promoter' Gene, 56(1): 145-151 (1987). Ehlers, M.D., “Synapse structure: glutamate receptors connected by Hawkins, R.E. and Winter, G., “Cell selection strategies for making the shanks’ Current Biol. 9(22): R848-850 (1999). antibodies from variable gene libraries: trapping the memory pool' Fan, Z. and Mendelsohn, J., “Therapeutic application of anti-growth Eur, J. Immunol. 22(3):867-870 (1992). factor receptor antibodies' Curr. Opin. Oncol., 10(1):67-73 (1998). He, M. and Taussig, M.J., “Antibody-ribosome-mRNA (ARM) com Fan, Z. et al., “Three-dimensional structure of an Fv from a human IgM immunoglobulin' J. Mol. Biol., 228(1): 188-207 (1992). plexes as efficient selection particles for invitro display and evolution Feldhaus, M.J. et al., “Oligonucleotide-conjugated beads for of antibody combining sites' Nucleic Acids Res. 25(24):5132-5134 transdominant genetic experiments' Nucleic Acids Research, (1997). 28(2):534-543 (2000). Heddle, R.J. and Rowley, D., “Dog Immunoglobulins, I. Fellouse, F.A. et al., “High-throughput Generation of Synthetic Anti immunochemical characterization of dog serum, parotid saliva, bodies from Highly Functional Minimalist Phage-displayed Librar colostrum, milk and small bowel fluid” Immunology, 29(1):185-195 ies” J. Mol. Biol. 373(4):924-940 (2007). (1975). Fellouse, F.A. et al., “Molecular Recognition by a Binary Code” J. Hoet, R.M. et al., “Generation of high-affinity human antibodies by Mol, Biol. 348(5): 1153-1162 (2005). combining donor-derived and synthetic complementarity-determin Fellouse, F.A. et al., “Synthetic antibodies from a four-amino-acid ing-region diversity' Nat. Biotechnol. 23(3):344-348 (2005). code: a dominant role for tyrosine in antigen recognition' PNAS, Hoet, R.M. et al., “The importance of the light chain for the epitope 101 (34): 12467-12472 (2004). specificity of human anti-U1 Small nuclear RNA autoantibodies Fields, S. and Sternglanz, R. “The two-hybrid system: an assay for present in Systemic lupus erythematosus patients' Journal of Immu protein-protein interactions' Trends Genet., 10(8):286-292 (1994). nology, 163(6):3304-3312 (1999). Fields. S. and Song, O.. A novel genetic system to detect protein Holmes, P. and Al-Rubeai. M.. “Improved cell line development by a protein interactions” Nature, 340(6230):245–246 (1989). high throughput affinity capture Surface display technique to select Finley R.L., Jr. and Brent, R., “Interaction mating reveals binary and for high secretors' J. Immunol. Methods, 230(12): 141-147 (1999). ternary connections between Drosophila cell cycle regulators' Proc. Hoogenboom, H.R. and Winter, G., “By-passing immunisation. Natl. Acad. Sci. USA, 91(26): 12980-12984 (1994). Human antibodies from synthetic repertoires of germline VH gene Firth, A.E. and Patrick, W.M., “GLUE-IT and PEDEL-AA: new segments rearranged in vitro” J. Mol. Biol., 227(2):381-388 (1992). programmes for analyzing protein diversity in randomized libraries' Hoogenboom, H.R. et al., “Antibody phage display technology and Nucleic Acids Res., 36:W281-W285 (2008). Frazer, J. K., and J. D. Capra, “Immunoglobulins: Structure and its applications' Immunotechnology, 4(1):1-20 (1998). Function', in Fundamental Immunology, Fourth Edition, William E. Hoogenboom, H.R. et al., “Multi-subunit proteins on the surface of Paul, ed., Lippincot-Raven Publishers, Philadelphia, pp. 41-43 and filamentous phage: methodologies for displaying antibody (Fab) 51-52 (1999). heavy and light chains' Nucleic Acids Research, 19(15):4133-4137 Fromont-Racine, M. et al., “Toward a functional analysis of the yeast (1991). genome through exhaustive two-hybrid screens' Nat. Genet. Hoogenboom, H.R., “Designing and optimizing library selection 16(3):277-282 (1997). strategies for generating high-affinity antibodies' Trends Biotechnol. Frykman, S. and Srienc., F. "Quantitating secretion rates of indi 15(2):62-70 (1997). vidual cells: design of Secretion assays’ Biotechnol. & Bioeng. Horwitz A.H. et al., “Secretion of functional antibody and Fab frag 59(2):214-226 (1998). ments from yeast cells' Proc. Natl. Acad. Sci. USA, 85(22):8678 Fuh, G., “Synthetic antibodies as therapeutics' Expert Opin. Biol. 8682 (1988). Ther. 7(1): 73-87 (2007). Hoshino, Y. et al., “The rational design of a synthetic polymer Fundamental Immunology, William E. Paul, M.D. ed., 3rd Edi nanoparticle that neutralizes a toxic peptide in vivo” PNAS tion:292-295 (1993). 109(1):33-38 (2012). Fusco, et al., In vivo construction of cDNA libraries for use in the Hrincir, Z. and Chyikova et al., “Anticardiolipin antibodies in diffuse yeast two-hybrid system. Yeast, 15(8):715-720 (1999). connective tissue diseases in the IgG, IgM and IgA isotypes' Vnitr. Garcia, R.A. et al., “The neuregulin receptor ErbB-4 interacts with PDZ containing proteins at neuronal synapses' Proc. Natl. Acad Sci Lek., 36(11):1041-1049 (1990), translation (provided by USPTO) USA,97(7):3596-3601 (2000). pp. 1-13 (1999). Gietz et al., “Improved method for high efficiency transformation of Hua, S.B. et al., “Construction of a modular yeast two-hybrid cDNA intact yeast cells' Nucleic Acids Res., 20(6): 1425 (1992). library from human EST clones for the human genome protein link Gietz, R.D. and R.H. Schiestl, "Transforming Yeast with DNA” age map' Gene, 215(1): 143-152 (1998). Methods in Molecular and Cellular Biology (Invited Chapter), 5:255 Hua, S.B. et al., “Minimum length sequence homology required for 269 (1995). in vivo cloning by homologous recombination in yeast Plasmid, Gilfillan, S. et al., “Efficient immune responses in mice lacking 38(2):91-96 (1997). N-region diversity” Eur, J. Immunol. 25(11):3115-3122 (1995). Huang, D. and Shusta, E.V. et al., “Secretion and surface display of Griffiths, A.D. et al., “Human anti-self antibodies with high specific green fluorescent protein using the yeast Saccharomyces cerevisiae ity from phage display libraries” EMBO J., 12(2):725-734 (1993). Biotechnol. Prog., 21(2):349-357 (2005). Griffiths, A.D. et al., “Isolation of high affinity human antibodies Huse, W.D. et al., “Generation of a large combinatorial library of the directly from large synthetic repertoires.” EMBO J., 13(14):3245 immunoglobin repertoire in phage lambda” Science 3260 (1994). 246(4935): 1275-1281 (1989). Grimes E. et al., “Achilles' heel cleavage: creation of rare restriction Huston, J.S. et al., “Protein engineering of antibody binding sites: sites in lambda phage genomes and evaluation of additional opera recovery of specific activity in an anti-digoxin single-chain Fv ana tors, repressors and restriction/modification systems' Gene, logue produced in Escherichia coli Proc. Natl. Acad. Sci. USA 90(1):1-7 (1990). 85(16):5879-5883 (1988). Gushiken, F.C. et al., “Polymorphism of beta2-glycoprotein I at Ivanov, I.I. et al., “Development of the expressed Ig CDR-H3 reper codons 306 and 316 in patients with systemic lupus erythematosus toire is marked by focusing of constraints in length, amino acid use, and antiphospholipid syndrome' Arthritis & Rheumatism, and charge that are first established in early B cell progenitors. J. 42(6): 1189-1193 (1999). Immunol., 174(12):7773-7780 (2005). Hanes, J. et al., “Picomolar affinity antibodies from a fully synthetic Jackson, K.J., et al., “Identifying highly mutated IGHD genes in the naive library selected and evolved by ribosome display” Nat junctions of rearranged human immunoglobulin heavy chain genes.” Biotechnol. 18:(12): 1287-1292 (2000). J. Immunol. Methods, 324(1-2):26-37 (2007). US 8,877,688 B2 Page 11

(56) References Cited Lee, C.E., et al., “Reconsidering the human immunoglobulin heavy chain locus: 1. An evaluation of the expressed human IGHD gene OTHER PUBLICATIONS repertoire” Immunogenetics, 57(12):917-925 (2006). Lee, C.V. et al., “High-affinity human antibodies from phage-dis Jacobsson, K. and Frykberg, L., “Phage Display Shot-Gun Cloning played synthetic Fab libraries with a single framework scaffold' J. of Ligand-Binding Domains of Prokaryotic Receptors Approaches Mol. Biol. 340(5):1073-1093 (2004). 100% Correct Clones’ BioTechniqes, 2006): 1078, 1080-1081 Lee, S.Y. et al., “Microbial cell-surface display Trends Biotechnol. (1996). 21(1):45-52 (2003). Jirholt, P. et al., “Exploiting sequence space: shuffling in vivo formed Leonard, B. et al., "Co-expression of antibody fab heavy and light complementarity determining regions into a master framework'. chain genes from separate evolved compatible replicons in E. coli J. Gene, 215(2):471-476 (1998). Immunol. Methods, 317(1-2):56-63 (2006). Johns M. et al., “In vivo selection of slfv from phage display libraries' Lerner, R.A. et al., “Antibodies without immunization' Sci J. Immunol. Methods, 239(1-2): 137-151 (2000). ence.258(5086): 1313-314 (1992). Juul, L. et al., “The normally expressed kappa immunoglobulin light Lezcano, N. et al., “Dual Signaling Regulated by Calcyon, a D1 chain gene repertoire and somatic mutations studied by single-sided Dopamine Receptor Interacting Protein’ Science, 287(5458): 1660 specific polymerase chain reaction (PCR); frequent occurrence of 1664 (2000). features often assigned to autoimmunity Clin. Exp. Immunol. Li, C.H. et al., “beta-Endorphin omission analogs: dissociation of 109(1): 194-203 (1997). immunoreactivity from other biological activities' Proc Natl Acad Kaczorowski, T. and Szybalski, W., "Genomic DNA sequencing by Sci USA. 177(6):3211-3214 (1980). SPEL-6 primer walking using hexamer ligation' Gene, 223(1-2):83 Li, H. et al., “Biofuels: Biomolecular Engineering Fundamentals and 91 (1998). Advances' Annu. Rev. Chem. Biomol. Eng. 1:19-36 (2010). Kang, A.S. et al., “Linkage of recognition and replication functions Li, N et al., “B-Raf kinase inhibitors for cancer treatment' Current by assembling combinatorial antibody Fab libraries along phage Opinion in Investigational Drugs 8(6) 452-456 (2007). surfaces” Proc. Natl. Acad. Sci., 88(10):4363-4666 (1991). List of Exhibits, Filed in Interference No. 105,809, Filed Aug. 19, Kieke, M.C. et al., “Isolation of anti-T cell receptor scFv mutants by 2011. yeast surface display”. Protein Eng. 10(11): 1303-1310 (1997). Little, M. et al., “Generation of a large complex antibody library from Kieke, M.C. et al., “Selection of functional T cell receptor mutants multiple donors' J. Immunol Methods, 231(1-2):3-9 (1999). from a yeast Surface-display library” Proc. Natl. Acad. Sci. USA, Liu, Q, et al., “Rapid construction of recombinant DNA by the 96(10):5651-5656 (1999). univector plasmid-fusion system” Methods Enzymol. 328:530-49 Kim, S.C. et al., “Cleaving DNA at any predetermined site with (2000). adapter-primers and class-IIS restriction enzymes' Science, Love J.C. et al., “A microengraving method for rapid selection of 240(4851):504-506 (1988). single cells producing antigen-specific antibodies' Nature Kim, S.C., et al., “Structural requirements for Fokl-DNA interaction Biotechnol. 24(6):703-707 (2006). and oligodeoxyribonucleotide-instructed cleavage” J. Mol. Biol. Lowman, H.B. and Wells, J.A., "Affinity maturation of human growth 258(4):638-649 (1996). hormone by monovalent phage display” J. Mol. Biol. 234(3):564 Klein, R. et al., “Expressed human immunoglobulin kappa genes and 578 (1993). their hypermutation” Eur, J. Immunol. 23(12):3248-3262 (1993). Lowman, H.B. et al., “Selecting high-affinity binding proteins by Knappik. A. et al., “Fully synthetic human combinatorial antibody monovalent phage display” Biochemistry, 30(45): 10832-10838 libraries (HuCAL) based on modular consensus frameworks and (1991). CDRs randomized with trinucleotides' J. Mol. Biol., 296(1):57-86 Ma, H. et al., “Plasmid construction by homologous recombination in (2000). yeast Gene, 58(2-3):201-216 (1987). Koiwai, O. et al., “Isolation and characterization of bovine and mouse MacCallum, R.M. et al., “Antibody-antigen interactions: contact terminal deoxynucleotidyltransferase cDNAs expressible in mam analysis and binding site topography” J. Mol. Biol., 262(5):732-745 malian cells' Nucleic Acids Res., 14(14):5777-5792 (1986). (1996). Kontermann, R.E. and Müller, R., “Intracellular and cell Surface Manz, R. et al., “Analysis and sorting of live cells according to displayed single-chain diabodies”. J. Immunol. Methods, 226(1- secreted molecules, relocated to a cell-surface affinity matrix” Proc. 2): 179-188 (1999). Natl. Acad. Sci. USA. 92(6):1921-1925 (1995). Koob, M. and Szybalski. W. “Cleaving yeast and Escherichia coli Marks, J.D. et al., “By-passing Immunization. Human Antibodies genomes at a single site' Science, 250(4978):271-273 (1990). from V-gene Libraries Displayed on Phage” J. Mol. Biol. Koob, M. et al., “Conferring new specificity upon restriction 222(3):581-597 (1991). endonucleases by combining repressor-operator interaction and Marks, J.D. et al., “By-passing Immunization: building high affinity methylation” Gene, 74(1):165-167 (1988). human antibodies by chain shuffling” Biotechnology (NY), Koob, M. et al., “Conferring operator specificity on restriction 10(7):779-783 (1992). endonucleases.” Science, 241(4869): 1084-1086 (1988). Martin, A.C., "Accessing the Kabat antibody sequence database by Koob, M. et al., “RecA-AC: single-site cleavage of plasmids and computer” Proteins, 25(1): 130-133 (1996). chromosomes at any predetermined restriction site' Nucleic Acids Martin, A.C. and Thornton, J.M., "Structural families in loops of Res., 20021):5831-5836 (1992). homologous proteins: automatic classification, modelling and appli Kostrub, C.F. et al., “Use of gap repair in fission yeast to obtain novel cation to antibodies' J. Mol. Biol., 263(5):800-815 (1996). alleles of specific genes' Nucleic Acids Research, 26(20):4783-4784 Marzari, R. et al., “Extending filamentous phage host range by the (1998). grafting of a heterologous receptor binding domain' Gene, Kretzschmar, T. and von Riden, T., "Antibody discovery: phage 185(1):27-33 (1997). display” Curr. Opin. Biotechnol. 13(6):598-602 (2002). Matsuda, F. et al., “The complete nucleotide sequence of the human Kur, J. et al., “A novel method for converting common restriction immunoglobulin heavy chain variable region locus' J. Exp. Med., enzymes into rare cutters: integration host factor-mediated Achilles' 188(11):2151-2162 (1998). cleavage (IHF-AC)' Gene, 1.10(1):1-7 (1992). Mattila, P.S. et al., “Extensive allelic sequence variation in the J Lake, D.F. et al., “Generation of diverse single-chain proteins using a region of the human immunoglobulin heavy chain gene locus' Eur, J. universal (Gly4-Ser)3 encoding oligonucleotide' BioTechniques, Immunol. 9(:)2578-2582 (1995). 19(5):700-702 (1995). MazorY. et al., “Isolation of engineered, full-length antibodies from Lecrenier, N., et al., “Two-hybrid systematic screening of the yeast libraries expressed in Escherichia coli Nature Biotecnol. proteome” Bioessays. 20(1):1-5 (1998). 25(5):563-565 (2007). Lederman, S. et al., “A single amino acid Substitution in a common Mimran, A. et al., “GCN4-Based Expression System (pGES): African allele of the CD4 molecule ablates binding of the monoclonal Translationally Regulated Yeast Expression Vectors' BioTechniques, antibody, OKT4” Mol Immunol. (11): 1171-81 (1991). 28(3):552-554, 556, 558-560 (2000). US 8,877,688 B2 Page 12

(56) References Cited Onda, T. et al., “A phage display system for detection of T cell receptor-antigen interactions' Mol Immunol. 32(17-18): 1387-1397 OTHER PUBLICATIONS (1995). Order BD.R. 103 Limited Transfer of Jurisdiction, Filed in Interfer Moll, J.R. et al., “Designed heterodimerizing leucine zippers with a ence No. 105,809, Filed Jul. 25, 2011. ranger of pls and stabilities up to 10(-15) M” Protein Science, Order B.D.R. 104 Authorizing Reply Declaration, Filed in Interfer 10(3):649-55 (2001). ence No. 105,809, Filed Dec. 8, 2011. Mollova, S. et al., “Visualising the immune repertoire” BMC Sys Order B.D.R. 104 Regarding Adimab'S Reissue Application, Filed in tems Biology, 1(S1):P30 (2007). Interference No. 105,809, Filed Sep. 7, 2011. Muhle-Goll, C. et al., “The Leucine Zippers of the HLH-LZ Proteins Order BD.R. 104, Filed in Interference No. 105,809, Filed Jul. 21, Max and c-Myc Preferentially Form Heterodimers' Biochemistry, 2011. 34(41): 13554-13564 (1995). Order B.D.R. 104-Regarding Related Applications, Filed in Interfer Mullinax, R.L. et al., “Identification of human antibody fragment ence No. 105,809, Filed Jul. 1, 2011. clones specific for tetanus toxoid in a bacteriophage lambda Order B.D.R. 109(b) Authorizing Copies of Office Records, Filed in immunoexpression library” Proc. Natl. Acad. Sci., 87(20):8095 Interference No. 105,809, Filed May 23, 2011. 8099 (1990). Order B.D.R. 121 Authorizing Motion, Filed in Interference No. Mézard, C. et al., “Recombination between similar but not identical 105,809, Filed Aug. 4, 2011. DNA sequences during yeast transformation occurs within short Order B.D.R. 121 Authorizing Motions, Filed in Interference No. stretches of identity” Cell, 70(4):659-670 (1992). 105,809, Filed Sep. 1, 2011. Nakamura, Y. et al., “Development of novel whole-cell Order B.D.R. 121 Contingently Authorizing Responsive Motion, immunoadsorbents by yeast Surface display of the IgG-binding Filed in Interference No. 105,809, Filed Sep. 13, 2011. domain' Appl. Microbiol. Biotechnol. 57(4):500-505 (2001). Order B.D.R. 121(a) Authorizing Motions, Filed in Interference No. Nishigaki, K. et al., “Type II restriction endonucleases cleave single 105,809, Filed Jul. 1, 2011. stranded DNAs in general” Nucleic Acids Res, 13(16):5747-5760 Order BD.R. 5(a), Filed in Interference No. 105,809, Filed Sep. 27. (1985). 2011. Notice Designating pro hac vice counsel, Filed in Interference No. Ornstein, R.L. et al., “An optimized potential function for the calcu 105,809, Filed Oct. 6, 2011. lation of nucleic acid interaction energies I. Base stacking Notice of 4th Preliminary Amendment in Reissue Application, Filed Biopolyrners, 17:2341-2360 (1978). in Interference No. 105,809, Filed Sep. 30, 2011. Notice of Discussions, Filed in Interference No. 105,809, Filed Aug. Panka, D.J. et al., “Variable region framework differences result in 12, 2011. decreased or increased affinity of variant anti-digoxin antibodies' Notice of Exhibits List Filed, Filed in Interference No. 105,809, Filed Proc. Natl. Acad. Sci. USA, 85(9):3080-3084 (1988). Nov. 21, 2011. Parrott, M.B. et al., “Metabolically biotinylated adenovirus for cell Notice of Filing Continuation of Reissue Application, Filed in Inter targeting, ligand screening, and vector purification” Mol. Ther. ference No. 105,809, Filed Sep. 30, 2011. 8(4):688-700 (2003). Notice of Filing of Adimab Priority Statement, Filed in Interference Parthasarathy, R. et al., “An immobilized biotin ligase: surface dis No. 105,809, Filed Aug. 19, 2011. play of Escherichia coli BirA on Saccharomyces cerevisiae' Notice of Filing of Continuation Application, Filed in Interference Biotechnol. Prog., 21(6):1627-1631 (2005). No. 105,809, Filed Nov. 21, 2011. Pasqualini, R. and Ruoslahti, E., “Organ targeting in vivo using phage Notice of Filing of Reissue Application, Filed in Interference No. display peptide libraries' Nature, 380(6572):364-366 (1996). 105,809, Filed Aug. 19, 2011. Patrick, W.M. et al., “User-friendly algorithms for estimating com Notice of Filing of Reissue of 7005503, Filed in Interference No. pleteness and diversity in randomized protein-encoding libraries' 105,809, Filed Nov. 21, 2011. Protein Engineering, 16(6);451-457 (2003). Notice of Filing of Reissue of 7138496, Filed in Interference No. Pearson, B.M. et al., “Construction of PCR-ligated long flanking 105,809, Filed Nov. 21, 2011. homology cassettes for use in the functional analysis of six unknown Notice of related proceedings filed during interference, Filed in Inter open reading frames from the left and right arms of Saccharomyces ference No. 105,809, Filed Jun. 28, 2011. Notice of Related Proceedings, Filed in Interference No. 105,809, cerevisiae chromosome XV'Yeast, 14(4):391-399 (1998). Filed May 20, 2011. Persson, M.A. et al., “Generation of diverse high-affinity human Notice of Second Preliminary Amendment in Reissue Applicatio, monoclonal antibodies by repertoire cloning” Proc. Natl. Acad. Sci. Filed in Interference No. 105,809, Filed Aug. 19, 2011. USA, 88(6):2432-2436 (1991). Notice of Service of Dyax Exhibits, Filed in Interference No. Philibert, P. et al., “A focused antibody library for selected scFvs 105,809, Filed Sep. 30, 2011. expressed at high levels in the cytoplasm.” BMC Biotechnol. 7:81 Notice of Stipulated Extension, Filed in Interference No. 105,809, (2007). Filed Sep. 02, 2011. Phizicky, E.M. and Fields, S. et al., “Protein-protein interactions: Notice of Stipulation of Extension Time Period 3, Filed in Inter methods for detection and analysis” Microbiol. Rev. 59(1):94-123 ference No. 105,809, Filed Oct. 20, 2011. (1995). Notice of Stipulation of Extension Time Periods 3 & 4. Filed in Piatesi, A. et al., “Directed evolution for improved secretion of can Interference No. 105,809, Filed Nov. 1, 2011. cer-testis antigen NY-ESO-1 from yeast.” Protein Expr. Purif. Notice of Stipulation of Extension Time Periods 4, 5 & 6, Filed in 48(2):232-42 (2006). Interference No. 105,809, Filed Dec. 5, 2011. Pickens, L.B. et al., “Metabolic Engineering for the Production of Notice of Stipulation of Extension of Time Period 2, Filed in Inter Natural Products' Annu. Rev. Chem. Biomol. Eng. 2:211-236 ference No. 105,809, Filed Sep. 21, 2011. (2011). Notice of Stipulation of Extension of Time Period 3, Filed in Inter Pini, A. et al., “Design and use of a phage display library. Human ference No. 105,809, Filed Nov. 14, 2011. antibodies with Subnanomolar affinity against a marker of Notice of Stipulation of Extension of Time Periods 4, 5, & 6, Filed in angiogenesis eluted from a two-dimensional gel' Journal of Biologi Interference No. 105,809, Filed Jan. 12, 2012. cal Chemistry, 273(34): 21769-21776 (1998). Notice of Stipulation of Extension of Time Periods 5 and 6, Filed in Pluckthun, A., “Antibody engineering: Advances from the use of Interference No. 105,809, Filed Jan. 24, 2012. Escherichia coli expression systems’ Biotechnology (NY)9(6):545 Oldenburg, K.R. et al., “Recombination-mediated PCR-directed 551 (1991). plasmid construction in vivo in yeast' Nucleic Acids Res. 25(2):451 POA for Mr. Eric Marandett, Filed in Interference No. 105,809, Filed 452 (1997). Oct. 6, 2011. US 8,877,688 B2 Page 13

(56) References Cited adversely impacting momoclonal antibody recovery' Journal of Biotechnology, 128:813-823 (2007). OTHER PUBLICATIONS Roitt, I. et al., “Immunoglobulins: A Family of Proteins', in Immu nology, Sixth Edition, Mosby, Harcourt Publishers Limited, London, Podhajska A.J. and Szybalski W. “Conversion of the Fokl pp. 67-70 and 80 (2001). endonuclease to a universal restriction enzyme: cleavage of a phage Rudikoff, S. et al., “Single amino acid Substitution altering antigen M13mp7 DNA at predetermined sites.” Gene, 40(2-3): 175-182 binding specificity” Proc. Natl. Acad. Sci. USA, 79(6):1979-1983 (1985). (1982). Podhajska A.J. et al., “Conferring new specificities on restriction Ruiz, M. et al., “The human immunoglobulinheavy diversity (IGHD) enzymes: cleavage at any predetermined site by combining adaptor and joining (IGHJ) segments.” Exp. Clin. Irnrnunogenet, 16(3):173 oligodexynucleotide and class-IIS enzyme” Methods in Enzymol 184 (1999). ogy, 216(G):303-309 (1992). Ryu, D.D. and Nam, D.H., “Recent progress in biomolecular engi Powell, Richard and McLane, Kathryn Evans, “Construction, assem neering” Biotechnol Prog, 16(1):2-16 (2000). bly and selection of combinatorial antibody libraries.” Genetic Engi Saada, R. et al., “Models for antigen receptor gene rearrangement: neering with PCR (Horton and Tait, Eds. 1998), vol. 5 of the Current CDR3 length” Immunol. Cell Biol. 85(4):323-332 (2007). Innovations in Molecular Biol series, Horizon Scientific Press. Saviranta, P. et al., “Engineering the steroid-specificity of an anti Prabakaran, P. et al., “Expressed antibody repertoires in human cord 17beta-estradiol Fab by random mutagenesis and competitive phage blood cells: 454 sequencing and IMGT/High V-QUEST analysis of panning.” Protein Engineering, 11(2): 143-152 (1998). germline gene usage, junctional diversity, and somatic mutations' Sblattero, D. and Bradbury, A., “A definitive set of oligonucleotide Immunogenetics (2011). primers for amplifying human V regions' Immunotechnology, Prabakaran, P. et al., Supplemental “Expressed antibody repertoires 3(4): 271-278 (1998). in human cord blood cells: 454 sequencing and IMGT/High Sblattero, D. and Bradbury, A., “Exploiting recombination in single V-QUEST analysis of germline gene usage, junctional diversity, and bacteria to make large phage antibody libraries' Nat. Biotechnol. Somatic mutations' Immunogenetics (2011). 18(1):75-80 (2000). Proba, K. et al., “Antibody sclv fragments without disulfide bonds Scaviner, D. et al., “Protein displays of the human immunoglobulin made by molecular evolution”. J Mol Biol. 275(2):245-253 (1998). heavy, kappa and lambda variable and joining regions.” Exp. Clin. Pu, W.T. and Struhl, K., “Dimerization of leucine zippers analyzed by Immunogenet., 16(4):234-240 (1999). random selection”. Nucleic Acids Res. vol. 21 (18):4348-55 (1993). Schable, K.F. and Zachau, H.G., “The variable genes of the human Puga, A et al., “Aromatic hydrocarbon receptor interaction with the immunoglobulin kappa locus’ Biol. Chem. Hoppe Seyler, retinoblastoma protein potentiates repression of E2F-dependent tran 374(11): 1001-1022 (1993). Scription and cell cycle arrest' Journal Biological Chemistry, Schoonbroodt, S. et al., “Oligonucleotide-assisted cleavage and liga 275(4):2943-2950 (2000). tion: a novel directional DNA cloning technology to capture cDNAs. Posfai, G. and Szybalski. W. "A simple method for locating Application in the construction of a human immune antibody phage methylated bases in DNA using class-IIS restriction enxymes.” Gene, display library” Nucleic Acids Research, 33(9):e81:2-14 (2005). 74(1): 179-181 (1988). Seed, B., “Developments in expression cloning.” Current Opinion in Pörtner-Taliana, A. et al., “In vivo selection of single-chain antibod Biotechnology, 6(5):567-573 (1995). ies using a yeast two-hybrid system”. J. Immunol. Methods, 238(1- Sheets, M.D. et al., “Efficient construction of a large nonimmune 2):161-172 (2000). phage antibody library: The production of high-affinity human Qi, G.R. et al., “Restriction of single-stranded M13 DNA using single-chain antibodies to protein antigens.” Proc. Natl. Acad. Sci. Synthetic oligonucleotides: the structural requirement of restriction USA,95(11):6157-6162 (1998). enzymes.” Biochem. Cell Biol. 65(1):50-55 (1986). Shoji. H. et al., “Identification and characterization of a PDZ protein Rader, C and Barbas, C.F. 3rd, “Phage display of combinatorial that interacts with activin type II receptors.”. J.Biol. Chem. antibody libraries' Curr. Opin. Biotechnol. 8(4):503-508 (1997). 275(8):5485-5492 (1999). Rader, C. et al., “Aphage display approach for rapid antibody human Short, M.K. et al., “Contribution of antibody heavy chain CDR1 to ization: designed combinatorial V gene libraries' Proc. Natl. Acad. digoxin binding analyzed by random mutagenesis of phage-dis Sci. USA,95(15):89 10-8915 (1998). played Fab 26-10” J. Biol. Chem., 270(48):28541-28550 (1995). Rajan, S. and Sidhu, S., "Simplified Synthetic Antibody Libraries' Shusta, E.V. et al., “Directed evolution of a stable scaffold for T-cell Methods in Enzymology 2023-23 (2012). receptor engineering Nat. Biotechnol., 18(7):754-759 (2000). Rakestraw, J.A. and Wittrup, K.D., “Dissertation Abstracts Interna Shusta, E.V. et al., “Yeast Polypeptide Fusion Surface Display Levels tional”, 68(1 B):43, abstract only (2006). Predict Thermal Stability and Soluble Secretion Efficiency” J. Mol. Rakestraw, J.A. et al., “A Flow Cytometric Assay for Screening Biol. 292 949-956 (1999). Improved Heterologous Protein Secretion in Yeast.” Biotechnol. Sidhu, S.S. et al., “Phage-displayed antibody libraries of synthetic Prog. 22(4): 1200-1208 (2006). heavy chain complementarity determining regions' J. Mol. Biol. Rauchenberger, R. et al., “Human combinatorial Fab library yielding 338(2):229-310 (2004). specific and functional antibodies against the human fibroblast Skerra, A., “Alternative non-antibody scaffolds for molecular recog growth factor receptor 3” J. Biol. Chem., 278(40):38194-38205 nition” Current Opin. Biotechnol. 18(4): 295-304 (2007). (2003). Smith, G.P. and Petrenko, V.A., “Phage Display” Chern. Rev., Raymond, C.K. et al., “General method for plasmid construction 97(2):391-410 (1997). using homologous recombination” BioTechniques, 26(1): 134-138, Soderlind, E. et al., “Domain libraries: synthetic diversity for denovo 140-141 (1999). design of antibody V-regions' Gene, 160(2): 269-272 (1995). Real Party-in-interest, Filed in Interference No. 105,809, Filed May Soderlind, E. et al., “The immune diversity in a test tube—non 20, 2011. immunised antibody libraries and functional variability in defined Redeclaration, Filed in Interference No. 105,809, Filed May 23, protein scaffolds' Combinatorial Chemistry & High Throughput 2011. Screening, 4(5):409-416 (2001). Request for file copies, Filed in Interference No. 105,809, Filed May Souto-Carneiro, M.M. et al., “Characterization of the Hurnan Ig 20, 2011. Heavy Chain Antigen Binding Complementarity Determining Retter, I. et al., “VBASE2, an integrativeV gene database' Nucleic Region 3 Using a Newly Developed Software Algorithm, Acids Res., 33:D671-D674 (2005). JOINSOLVER. J. Immunol., 172(11):6790-6802 (2004). Rhoden, J.J. and Wittrup, K.D., “Dose Dependence of Intratumoral Standing Order, Filed in Interference No. 105,809, Filed May 6, Perivascular Distribution of Monoclonal Antibodies' Journal of 2011. Pharmaceutical Sciences 101(2): 860-867 (2012). Stewart, A.K. et al., “High-frequency representation of a single VH Riske, F. et al., “The use of chitosan as a flocculant in mammalian cell gene in the expressed human B cell repertoire” J. Exp. Med., culture dramatically improves clarification throughput without 177(2):409-418 (1993). US 8,877,688 B2 Page 14

(56) References Cited Volpe, J.M. and Kepler, T.B., “Genetic correlates of autoreactivity and autoreactive potential in human Ig heavy chains' Immunome OTHER PUBLICATIONS Res., 5:1 (2009). Volpe, J.M. et al., “SoDA: Implementation of a 3D Alignment Algo Stohl, W. and Hilbert, D.M.. “The discovery and development of rithm for Inference of Antigen Receptor Recombinations.” Bioinfor belimumab: the anti-BLyS-lupus connection” Nature Biology rnatics, 22(4):438-444 (2006). 30(1):69-77 (2012). Vugmeyster Y. "Biodistribution of 1251-Labeled Therapeutic Pro Suzuki, M. et al., “Light chain determines the binding property of teins: Application in Protein Drug Development Beyond Oncology” human anti-dsDNA IgG autoantibodies' Biochem. Biophys. Res. Journal of Pharmaceutical Sciences 99(2) 1028-1045 (2010). Commun., 271(1):240-243 (2000). Vugmeyster, Y, et al., “Complex Pharmacokinetics of a Humanized Swers, J.S. et al., “Shuffled antibody libraries created by in vivo Antibody Against Human Amyloid Beta Peptide, Anti-Abeta Ab2, in homologous recombination and yeast Surface display Nuc. Acids. Nonclinical Species' Pharm Res, 28:1696-1706 (2011). Res. 32(3), e36, 1-8 (2004). Walhout, A.J. et al., “GATEWAY recombinational cloning: applica SzybalskiW. and Skalka A., “Nobel prizes and restriction enzymes.” tion to the cloning of large numbers of open reading frames or Gene 4(3):181-182 (1978). ORFeomes” Methods in Enzymology, 328:575-92 (2000). Szybalski W. "Reasons and risks to study restriction/modification Wang, Y. et al., “Many human immunoglobulin heavy-chain IGHV enzymes form extreme thermophiles: chilly coldrooms, 13th Sample, gene polymorphisms have been reported in error Immunol. Cell. and 13-codon overlap' Gene, 112(1):1-2 (1992). Biol. 86(2):111-115 (epub 2007-2008). Szybalski W. “Universal restriction endonucleases: designing novel Weaver-Feldhaus, J.M. et al., “Yeast mating for combinatorial Fab cleavage specificities by combining adaptor oligodeoxynucleotide library generation and surface display” FEBS Lett., 564(1-2):24-34 and enzyme moieties' Gene, 40(2-3): 169-173 (1985). (2004). Szybalski. W. et al., “Class-IIS restriction enzymes—a review” Gene, Wentz, A.E. and Shusta, E.V., “A novel high-throughput screen 100:13-26 (1991). reveals yeast genes that increase secretion of heterologous proteins' Tavladoraki, P. et al., “Transgenic plants expressing a functional Appl. Environ. Microbiol. 73(4): 1189-1198 (2007). single-chain Fv antibody are specifically protected from virus attack” Winter G. and Milstein C., “Man-made antibodies' Nature, Nature, 366(6454):469-472 (1993). 349(6307):293-299 (1991). Terret, N.K. (1998) “Combinational Chemistry”, Oxford University Wu, H. et al., “Humanization of a murine monoclonal antibody by Press, pp. 2-5. simultaneous optimization of framework and CDR residues' J. Mol. Terskikh, A.V. et al., “Peptabody: A new type of high avidity binding Biol., 294(1): 151-162 (1999). protein’ Proc. Natl. Acad., 94(5):1663-1668 (1997). Wörn, A. and Plückthun, A., “An intrinsically stable antibody sclv Thielking, V, et al., "Accuracy of the EcoRI restriction endonuclease: fragment can tolerate the loss of both disulfide bonds and fold cor binding and cleavage studies with oligodeoxynucleotide Substrates rectly.” FEBS Lett., 427(3):357-361 (1998). containing degenerate recognition sequences' Biochemistry, Xu, J.L. and Davis, M.M., “Diversity in the CDR3 region of V(H) is 29(19):4682-4691 (1990). sufficient for most antibody specificities' Immunity, 13(1):37-45 Tomlinson, I.M. et al., “The repertoire of human germline VH (2000). sequences reveals about fifty groups of VH segments with different Yang, W.P. et al., “CDR walking mutagenesis for the affinity matu hypervariable loops' Journal of Molecular Biology, 227(3):776-798 ration of a potent human anti-HIV-1 antibody into the picomolar (1992). range” J. Molecular Biology, 254(3):392-403 (1995). Tomlinson, I.M. et al., “The structural repertoire of the human V Zemlin, M. et al., “Expressed murine and human CDR-H3 intervals kappa domain” EMBO J., 14(18):4628-4638 (1995). of equal length exhibit distinct repertoires that differ in their amino Tsurushita, N. et al., “Phage display vectors for in vivo recombination acid composition and predicted range of structures' J. Mol. Biol. of immunoglobulin heavy and light chain genes to make large 334(4): 733-749 (2003). combinatorial libraries' Gene, 172(1):59-63 (1996). Zhu D.L., “Oligodeoxynucleotide-directed cleavage and repair of a Ueda, M. and Tanaka, A., “Genetic immobilization of proteins on the single-stranded vector: a method of site-specific mutagenesis' Ana yeast cell surface” Biotechnology Advances, 8(2): 121-140 (2000). lytical Biochemistry, 177(1):120-124 (1989). Uetz, P. et al., “A comprehensive analysis of protein-protein interac Zhu, J. and Kahn, C.R., “Analysis of a peptide hormone-receptor tions in Saccharomyces cerevisiae' Nature, 403(6770):623-627 interaction in the yeast two-hybrid system” Proc. Natl. Acad. Sci. (2000). USA. 94(24): 13063-13068 (1997). Urlinger, S. et al., “Exploring the sequence space for tetracycline Zucconi, A. et al., “Domain repertoires as a tool to derive protein dependent transcriptional activators: novel mutations yield expanded recognition rules” FEBS Letters, 480(1):49-54 (2000). range and sensitivity” Proc. Natl. Acad. Sci. USA,97(14):7963-7968 de Kruif, John, et al. “Selection and Application of Human Single (2000). Chain Fv Antibody Fragments from a Semi-synthetic Phage Anti Van Holten, R.W. and Autenrieth, S.M.. “Evaluation of depth filtra body Display Library with Designed CDR3 Regions.” J. Mol. Biol. tion to remove prion challenge from an immune globulin prepara (1995) 248, 97-105. tion’ Vox Sanguinis 85:20-24 (2003). Fellouse, Frederic A. etal. “High-throughput Generation of Synthetic Vander Vaart, J.M. et al., “Comparison of cell wall proteins of Sac Antibodies from Highly Functional Minimalist Phage-displayed charomyces cerevisiae as anchors for cell Surface expression of Libraries.” J. Mol. Biol. (2007) 373,924-940. heterologous proteins' Appl. Environ. Microbiol. 63(2):615-620 Fellouse, Frederic A. et al. “Molecular Recognition by a Binary (1997). Code.” J. Mol, Biol. (2005) 348, 1153-1162. Vaswani, S.K. and Hamilton, R.G., “Humanized antibodies as poten Fellouse, Frederic A. et al. “Synthetic Antibodies from a Four tial therapeutic drugs' Ann. Allergy Athma Immunol. 81(2):105-115 Amino-Acid Code: A Dominant Role for Tyrosine in Antigen Rec (1998). ognition.” PNAS (2004) vol. 101. No. 34 12467-12472. Vendel, M.C. et al., “Secretion from bacterial versus mammalian Fuh, Germaine, “Synthetic Antibodies as Therapeutics.” Expert cells yields a recombinant scFv with variable folding properties” Opin. Biol. Ther. (2007) 7(1). Arch. Biochem. Biophys. 1-6 (2012). Jackson, Katherine, JL, et al. “Identifying Highly Mutated IGHD Virnekas, B. et al., “Trinucleotide phosphoramidites: ideal reagents Genes in the Junctions of Rearranged Human Immunoglobulin for the synthesis of mixed oligonucleotides for random mutagenesis' Heavy Chain Genes.” JIM 324 (2007) 26-37. Nucleic Acids Res., 22(25):5600-5607 (1994). Knappik. A. et al. "Fully Synthetic Human Combinatorial Antibody Visintin. M. et al., “Selection of antibodies for intracellular function Libraries (HuCAL) based on Modular Consensus Frameworks and using a two-hybrid in vivo system”. Proc. Natl. Acad. Sci. USA CDRs Randomized with Trinucleotids.” J. Mol. Biol (2000) 296, 96(21): 11723-1 1728 (1999). 57-86. US 8,877,688 B2 Page 15

(56) References Cited Shojaei, F. et al., “Unusually Long Germline D Genes Contribute to Large Sized CDR3H in Bovine Antibodies.” Mol. Immunol. 40:61 OTHER PUBLICATIONS 67 (2003). Solem, S.T. et al., “Antibody Repertoire Development in Teleosts—A Lee, CEH, etal. “Reconsidering the Human Immunoglobulin Heavy Review with Emphasis on Salmonids and Gadus morhua L.” Dev. chain Locus: 1. An Evaluation of the Expressed Human IGHD Gene Comp. Immunol. 30:57-76 (2006). Repertoire.” Immunogenetics (2006) 57,917-925. Wagner, B., “Immunoglobulins and Immunoglobulin Genes of the Lee, Chingwei V. et al. “High-affinity Human Antibodies from Horse.” Dev. Comp. Immunol. 30:155-164 (2006). Phage-displayed Synthetic Fab Libraries with a Single Framework Ye, J., “The Immunoglobulin IGHD Gene Locus in C57BL/6 Mice.” Scaffold.” J. Mol. Biol. (2004) 340, 1073-1093. Immunogenetics, 56:399-404 (2004). Sidhu, Sachdev S. et al. “Phage-displayed Antibody Libraries of Zhao, Y. et al., “The Bovine Antibody Repertoire.” Dev. Comp. Synthetic Heavy Chain Complementarity Determining Regions.” J. Immunol. 30:175-186 (2006). Mol. Biol. (2004) 338, 229-310. Bahler et al., "Clonal Salivary Gland Infiltrates Assciated with Zemlin, Michael, et al. “Expressed Murine and Human CDR-H3 Myoepithelial Sialadenitis (Sjögren's Syndrome) Begin as Nonma Intervals of Equal Length Exhibit Distinct Repertoires that Differ in lignant Antigen-Selected Expansions”. Blood, 91(6):1864-1872 Their Amino Acid Composition and Predicted Range of Structures.” (1998). J. Mol. Biol. (2003) 334, 733-749. Bakkus et al., “Evidence that Multiple Myeloma Ig Heavy Chain VDJ Bengten, E. et al., “Channel Catfish Immunoglobulins: Repertoire Genes Contain Somatic Mutations but Show no Intraclonal Varia and Expression.” Dev. Comp. Immunol. 30:77-92 (2006). tion”. Blood, 80(9):2326-2335 (1992). Briggemann, M. et al., “Immunoglobulin Heavy Chain Locus of the Barbas et al., “Molecular Profile of an Antibody Response to HIV-1 Rat: Striking Homology to Mouse Antibody Genes.” Proc. Natl. as Probed by Combinatorial Libraries”, J. Mol. Biol., 230:812-823 Acad. Sci. USA, 83:6075-6079 (1986). (1993). Butler, J.E. et al., “Antibody Repertoire Development in Swine.” Dev. Carrollet al., “Absence of IgV Region Gene Somatic Hypermutation Comp. Immunol. 30:199-221 (2006). in Advanced Burkitt's Lymphoma'. J. Immunol. 143(2):692-698 Das, S. et al., “Evolutionary Dynamics of the Immunoglobulin Heavy (1989). Chain Variable Region Genes in Vertebrates.” NIH Public Access Corbett et al., “Sequence of the Human Immunoglobulin Diversity Author Manuscript from Immunogenetics, 60(1):47-55 (2008). (D) Segment Locus: A Systematic Analysis Provides no Evidence for De Genst, E. et al., “Antibody Repertoire Development in Camelids.” the Use of DIR Segments, Inverted D Segments, Minor D Segments Dev. Comp. Immunol. 30:187-198 (2006). or D-D Recombination”, J. Mol. Biol., 270:587-597 (1997). Dirkes, G. et al., “Sequence and Structure of the Mouse IgH DQ525 Daviet al., “High Frequency of Somatic Mutations in the VH Genes Region.” Immunogenetics, 40:379 (1994). Expressed in Prolymphocytic Leukemia', Blood, 88 (10):3953-3961 Dooley, H. et al., “Antibody Repertoire Development in Cartilagi (1996). nous Fish.” Dev. Comp. Immunol. 30:43-56 (2006). DiPietro et al., “Limited number of immunoglobulin VH regions Friedman, M. L. et al., "Neonatal V, D, and J. Gene Usage in expressed in the mutant rabbit Alicia”, Eur, J. Immunol. 20:1401 Rabbit B Lineage Cells,” J. Immunol. 152: 632-641 (1994). 1404 (1990). Gerondakis, S. et al., “Immunoglobulin J. Rearrangement in a T-cell Esposito et al., “Phage display of a human antibody against Line Reflects Fusion to the D. Locus at a Sequence Lacking the Clostridium tetani toxin', Gene, 148:167-168 (1994). Nonamer Recognition Signal.” Immunogenetics, 28:255-259 (1998). Huang et al., “A Majority of Ig H Chain cDNA of Normal Human Ghaffari, S.H. et al., “Structure and Genomic Organization of a Adult Blood Lymphocytes Resembles cDNA for Fetal Ig and Natural Second Cluster of Immunoglobulin Heavy Chain Gene Segments in Autoantibodies”. J. Immunol. 151:5290-5300 (1993). the Channel Catfish.” J. Immunol. 162:1519-1529 (1999). Ivanovski et al., “Somatic Hypermutation, Clonal Diversity, and Gu, H. et al., “B Cell Development Regulated by Gene Rearrange Preferential Expression of the VH 51.pl/VL kV325 Immunoglobin ment: Arrest of Maturation by Membrane-Bound Du Protein and Gene Combination in Hepatitis C Virus-Associated Selection of D, Element Reading Frames.” Cell, 65:47-54 (1991). Immunocytomas”. Blood, 91(7): 2433-2442 (1998). Hayman, J. R. et al., “Heavy Chain Diversity Region Segments of the Lieber et al., “Lymphoid V(D)J recombination: Nucleotide insertion Channel Catfish: Structure, Organization, Expression and at signal joints as well as coding joints'. Proc. Natl. Acad. Sci. USA, Phylogenetic Implications.” J. Immunol. 164: 1916-1924 (2000). 85:8588-8592 (1988). Jenne, C.N. et al., “Antibody Repertoire Development in the Sheep.” Lieber, M.R., “Site-specific recombination in the immune system'. Dev. Comp. Immunol. 30: 165-174 (2006). FASEB.J., 5:2934-2944 (1991). Kurosawa, Y. et al., Identification of D Segments of Immunoglobulin Liu et al., “Normal Human IglD+IgM-Germinal Center B Cells can Heavy-Chain Genes and Their Rearrangement in T Lymphocytes, Express up to 80 Mutations in the Variable Region of their IglD Nature, 290:565-570 (1981). Transcripts”. Immunity, 4:603-613 (1996). Link, J. M. et al., “The Rhesus Monkey Immunoglobulin IGHD Matolcsy et al., “Molecular Characterization of IgA- and/or IgG Germline Repertoire.” Immunogenetics, 54:240-250 (2002). Switched Chronic Lymphocytic Leukemia B Cells”. Blood, Lundqvist, M. L. et al., “Immunoglobulins of the Non-Galliform 89(5):1732-1739 (1997). Birds: Antibody Expression and Repertoire in the Duck.” Dev. Comp. McIntosh et al., “Analysis of Immunoglobulin Gk Antithyroid Immunol. 30:93-100 (2006). Peroxidase Antibodies from Different Tissues in Hashimoto's Mage, R. G. et al., “B Cell and Anitbody Repertoire Development in Thyroiditis”, J. Clin. Endocrinol. Metab., 82(11):3818-3825 (1997). Rabbits: The Requirement of Gut-Associated Lymphoid Tissues.” Ruiz et al., “The Human Immunoglobulin Heavy Diversity (IGHD) Dev. Comp. Immunol. 30:137-153 (2006). and Joining (IGHJ) Segments'. Exp. Clin. Immunogenet., 16:173 Malecek, K. et al., “Somatic Hypermutation and Junctional Diversi 184 (1999). fication at Ig Heavy Chain Loci in the Nurse Shark.” J. Immunol. Sahota et al., “IgVH Gene Mutational Patterns Indicate Different 175.8105-8115 (2005). Tumor Cell Status in Human Myeloma and Monoclonal Gam Nguyen, V. K. et al., "Camel Heavy-Chain Antibodies: Diverse mopathy of Undetermined Significance”. Blood, 87(2):746-755 Germline V.H and Specific Mechanisms Enlarge the Antigen-bind (1996). ing Repertoire.” EMBOJ, 19:(5)921-930 (2000). Shimoda et al., “Natural polyreactive immunoglobulin A antibodies Ratcliffe, M.J.H. et al., “Antibodies, Immunoglobulin Genes and the produced in mouse Peyer's patches'. Immunology, 97:9-17 (1999). Bursa of Fabricius in Chicken B Cell Development.” Dev. Comp. Welschofetal., “Amino acid sequence based PCR primers for ampli Immunol. 30: 101-118 (2006). fication of rearranced human heavy and light chain immunoglobulin Shimizu, T. et al., "Biased Reading Frames on Pre-Existing DJ variable region genes”. J. Immunol. Meth., 179:203-214 (1995). Coding Joints and Preferential Nucleotide Insertions at V-DJ Sig Wen et al., “T cells recognize the VH complementarity-determining nal Joints of Excision Products of Immunoglobulin Heavy Chain region3 of the idiotypic protein of B cell non-Hodgkin's lymphoma', Gene Rearrangements.” EMBO.J., 11:(13)4869-4875 (1992). Eur, J. Immunol., 27: 1043-1047 (1997). US 8,877,688 B2 Page 16

(56) References Cited Hoogenboom et al., “By-passing Immunisation Human Antibodies from Synthetic Repertoires of Germline VH Gene Segments Rear OTHER PUBLICATIONS ranged in Vitro”. J. Mol. Biol., 227:381-388 (1992). Nissim et al., “Antibody fragments from a single pot phage display Winkler et al., “Analysis of immunoglobulin variable region genes from human IgG anti-DNA hybridomas'. Eur, J. Immunol., 22:1719 library as immunochemical reagents'. The EMBO Journal, 1728 (1992). 13(3):692-698 (1994). Patricket al., “User-friendly algorithms for estimating completeness Cioe, L., Cloning and Nucleotide Sequence of a Mouse Erythrocyte and diversity in randomized protein-encoding libraries'. Protein beta-Spectrin cDNA, Blood, 70:915-920 (1987). Engineering, 16(6);451-457 (2003). Interference No. 105809 Decision on Motions, dated Nov. 2, 2012. Scaviner et al., “Protein Displays of the Human Immunoglobulin Heavy, Kappa and Lambda Variable and Joining Regions'. Exp. Clin. Interference No. 105809 Dyax Resp to Order to Show Cause, dated Immunogenet., 16:234-240 (1999). Jan. 4, 2013. Knappik et al., “Fully Synthetic Human Combinatorial Antibody Kokubu F. et al., Complete structure and organization of Libraries (HuCAL) Based on Molecular Consensus Frameworks and immunoglobulin heavy chain constant region genes in a phylogeneti CDRs Randomized with Trinucleotides', J. Mol. Biol., 296:57-86 cally primitive vertebrate. The EMBO Journal, 7(7): 1979-1988 (2000). (1988). Winter, Greg, “Synthetic human antibodies and a strategy for protein McCormack, W.T. Comparison of latent and nominal rabbit IgVHal engineering”, FEBS Letters, 430:92-94 (1998). allotype cDNA sequences. J. Immunol., 141(6):2063-2071 (1988). McCafferty et al., “Phage antibodies: filamentous phage displaying antibody variable domains”, Nature, 348:552-554 (1990). Schwager, J. et al., Amino acid sequence of heavy chain from Mouquet et al., “Enhanced HIV-1 neutralization by antibody Xenopus levis IgM deduced from cDNA sequence: Implications for heteroligation', PNAS, published on line before printing, Jan. 4. evolution of immunoglobulin domains, Proc. Natl. Acad. Sci. USA, 2012, doi:10.1073/pnas. 11200591.09. 85:224.5-2249 (1988). Xu et al., “Diversity in the CDR3 Region of VH is Sufficient for Most Philibert et al., “A focused antibody library for selected scFvs Antibody Specificities'. Immunity, 13:37-45 (2000). expressed at high levels in the cytoplasm.” BMC Biotechnology, Zeng et al., “CD146, an epithelial-mesenchymal transition inducer, is 2007, 7:81. associated with triple-negative breast cancer', published on line before print Dec. 30, 2011, doi:1010.1073/pnas. 1111053108. * cited by examiner U.S. Patent Nov. 4, 2014 Sheet 1 of 26 US 8,877,688 B2

Figure 1

Construction by Yeast Fiordiogous Recorbination

*C83risis its S33 $crisis. Riki Eisatist Essy is tasis $tifices, Eis is star

SS F. Y.

H fissisterisatiitiEkittressiests is ESE

R&stritis. Eigestest Wigston U.S. Patent Nov. 4, 2014 Sheet 2 of 26 US 8,877,688 B2

Figure 2

3. 8 O %, 30.0%

O

1S.0%

O

Number of Anino Acids U.S. Patent Nov. 4, 2014 Sheet 3 of 26 US 8,877,688 B2

Figure 3

7(O -

tC

O) --ow

L3 length

U.S. Patent Nov. 4, 2014 Sheet 5 of 26 US 8,877,688 B2

Figure 5

y &sis tests yes sits & S&T, Šter & essessing her SS

& sessess eers stafsir essssss: as SSSSSSS ts &fs essess arrara arrarar arrararararew sets & Eris's ASYEAR get S f 32 Faits W$23 SSSSSSSSSSF

$88ssass&s Syssess&ssy Spies & SSR Artified U.S. Patent Nov. 4, 2014 Sheet 6 of 26 US 8,877,688 B2

Figure 6

TRP1

p14HC-3-23 7072 bp

s Signal peptide vhs-23 Terminator / primer binding site Not (3966) DTAVYYCAK \ Sfil (2918) \ Sfil(2951) VTVSS common to all J Nhei (29.72 higG1 m17, 1 N297A

U.S. Patent Nov. 4, 2014 Sheet 8 of 26 US 8,877,688 B2

Figure 8 ARSH4 CEN6

APr

p16LC-1-39 chassis F1 OR 6467 bp

Gal1 promotor iiii (263)

T3P x. Matalpha terminator A Sfi1 stuffer Nod (3361) Sil (2967) Sii (3OOO) JK4 Bswico 36 Constant Kappa

U.S. Patent Nov. 4, 2014 Sheet 11 of 26 US 8,877,688 B2

Expected EOussive

: r : r : ...--- S 6 7 8 S: G 2 3 4 5 S ; ; 8 9 20 2 28 23 i3 length (Kabat 95 to 102) U.S. Patent Nov. 4, 2014 Sheet 12 of 26 US 8,877,688 B2

Figure 11

40.0 to s

35. Expected

x 3G, x s

x x x

x x x x x x x x x

s

x x

3. war - sx

S 4. S S 7 8 } U.S. Patent Nov. 4, 2014 Sheet 13 of 26 US 8,877,688 B2

Figure 12

60.0

50.

43 -

3.

2O. :

1,

- O t 2 3. Amino Acids in N2 Segment U.S. Patent Nov. 4, 2014 Sheet 14 of 26 US 8,877,688 B2

Figure 13

25 -

2O) -

SO

O) -

SO -

() is

------

U.S. Patent Nov. 4, 2014 Sheet 16 of 26 US 8,877,688 B2

Figure 15

xYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY.

V

V 8

V

V

8 V

V

V 8 V

V S. S. 8 sistiki is: $8. V V

V U.S. Patent Nov. 4, 2014 Sheet 17 of 26 US 8,877,688 B2

Figure 16

3

****

×4, *** leist of Di Sigrist

U.S. Patent Nov. 4, 2014 Sheet 18 of 26 US 8,877,688 B2

Figure 17

Expsetsd & Cississi

yzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzý, &sis

Airistic Acisis in N2 Segret U.S. Patent Nov. 4, 2014 Sheet 19 of 26 US 8,877,688 B2

Figure 18

Settiisutists of it to YiCER3. (tritiser of attina asids U.S. Patent Nov. 4, 2014 Sheet 20 of 26 US 8,877,688 B2

Figure 19

:------3.

Sssssss s s ...... S. ss s W.SSS

SXXXX NNNNNNYNN

r s s s . s

YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY U.S. Patent Nov. 4, 2014 Sheet 21 of 26 US 8,877,688 B2

Figure 20

********?

U.S. Patent Nov. 4, 2014 Sheet 22 of 26 US 8,877,688 B2

Figure 21

S-38's

.

:

Percentage U.S. Patent Nov. 4, 2014 Sheet 23 of 26 US 8,877,688 B2

Figure 22

S8: 8

3341344r3:33,†133.1%,

-SS SS S-3S

U.S. Patent Nov. 4, 2014 Sheet 24 of 26 US 8,877,688 B2

Figure 23

''''''''''x'''''''''''''xy''''''''''yx''''''''''www.s...swiss-s-s-xxv-vs...... vxxx...'...'...-- ---....Y.YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY wys

%,222222&ºzzzzzzzzzzzzzzzzzzzzzzzzzzz??????????azzz, ž~~~~~~~~~;~~*~*~~~~zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz E. &rigin is kaist -3 YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYWYYYYYYYYYYYYYYNYWYYYYYYYYYYYYWWWWYYYYYYYYYYYYYYYYYYYY------rrrrry YYY www.www.YYYYYYYYYYYYYYYYWYYYYYYYYYYYYYYYYYYYYYYYYYY

U.S. Patent Nov. 4, 2014 Sheet 26 of 26 US 8,877,688 B2

Figure 25

4f6 clonal KD analysis

& irst Šlë

O ------Š & Cié 3

: 8 4 : ice S.

sch 3 ... it is & i. & 1 J Š 3. for it

3) 3OO SCO C US 8,877,688 B2 1. 2 RATIONALLY DESIGNED, SYNTHETIC variety of antigens). However, obtaining such libraries ANTIBODY LIBRARIES AND USES requires balancing the competing objectives of restricting the THEREFOR sequence diversity represented in the library (to enable Syn thesis and physical realization, potentially with oversam RELATED APPLICATION pling, while limiting the introduction of non-human sequences) while maintaining a level of diversity Sufficient to This application is a continuation-in-part of U.S. patent recognize a broad variety of antigens. Prior to the instant application Ser. No. 12/210,072 filed Sep. 12, 2008. This invention, it was known in the art that “although libraries application also claims priority to U.S. provisional applica containing heavy chain CDR3 length diversity have been tion Ser. No. 60/993,785, filed on Sep. 14, 2007, incorporated 10 reported, it is impossible to synthesize DNA encoding both herein in its entirety by this reference. the sequence and the length diversity found in natural heavy In accordance with 37 CFRS1.52(e)(5), the present speci chain CDR3 repertoires” (Hoet et al., Nat. Biotechnol., 2005, fication makes reference to a Sequence Listing (Submitted 23: 344, incorporated by reference in its entirety). electronically as a .txt file named “12404059SeqList2..txt on Therefore, it would be desirable to have antibody libraries Nov. 23, 2011). The .txt file was generated on Nov. 22, 2011 15 which (a) can be readily synthesized, (b) can be physically and is 300 kb in size. The entire contents of the Sequence realized and, in certain cases, oversampled, (c) contain Suffi Listing are herein incorporated by reference. cient diversity to recognize all antigens recognized by the preimmune human repertoire (i.e., before negative selection), BACKGROUND OF THE INVENTION (d) are non-immunogenic in humans (i.e., comprise sequences of human origin), and (e) contain CDR length and Antibodies have profound relevance as research tools and sequence diversity, and framework diversity, representative in diagnostic and therapeutic applications. However, the iden of naturally-occurring human antibodies. Embodiments of tification of useful antibodies is difficult and once identified, the instant invention at least provide, for the first time, anti antibodies often require considerable redesign or humaniza body libraries that have these desirable features. tion before they are suitable for therapeutic applications. 25 Previous methods for identifying desirable antibodies have SUMMARY OF THE INVENTION typically involved phage display of representative antibodies, for example human libraries derived by amplification of The present invention relates to, at least, synthetic poly nucleic acids from B cells or tissues, or, alternatively, Syn nucleotide libraries, methods of producing and using the thetic libraries. However, these approaches have limitations. 30 libraries of the invention, kits and computer readable forms For example, most human libraries known in the art contain including the libraries of the invention. In some embodi only the antibody sequence diversity that can be experimen ments, the libraries of the invention are designed to reflect the tally captured or cloned from the Source (e.g., B cells). preimmune repertoire naturally created by the human Accordingly, the human library may completely lack or immune system and are based on rational design informed by under-represent certain useful antibody sequences. Synthetic 35 examination of publicly available databases of human anti or consensus libraries known in the art have other limitations, body sequences. It will be appreciated that certain non-lim Such as the potential to encode non-naturally occurring (e.g., iting embodiments of the invention are described below. As non-human) sequences that have the potential to be immuno described throughout the specification, the invention encom genic. Moreover, certain synthetic libraries of the art suffer passes many other embodiments as well. from at least one of two limitations: (1) the number of mem 40 In certain embodiments, the invention comprises a library bers that the library can theoretically contain (i.e., theoretical of synthetic polynucleotides, wherein said polynucleotides diversity) may be greater than the number of members that encode at least 10 unique antibody CDRH3 amino acid can actually be synthesized, and (2) the number of members sequences comprising: actually synthesized may be so great as to preclude Screening (i) an N1 amino acid sequence of 0 to about 3 amino acids, of each member in a physical realization of the library, 45 wherein each amino acid of the N1 amino acid sequence thereby decreasing the probability that a library member with is among the 12 most frequently occurring amino acids a particular property may be isolated. at the corresponding position in N1 amino acid For example, a physical realization of a library (e.g., yeast sequences of CDRH3 amino acid sequences that are display, phage display, ribosomal display, etc.) capable of functionally expressed by human B cells; screening 10" library members will only sample about 10% 50 (ii) a human CDRH3 DH amino acid sequence, N- and of the sequences contained in a library with 10" members. C-terminal truncations thereof, or a sequence of at least Given a median CDRH3 length of about 12.7 amino acids about 80% identity to any of them; (Rocket al., J. Exp. Med., 1994, 179:323-328), the number of (iii) an N2 amino acid sequence of 0 to about 3 amino acids, theoretical sequence variants in CDRH3 alone is about 20'’7. wherein each amino acid of the N2 amino acid sequence or about 3.3x10" variants. This number does not account for 55 is among the 12 most frequently occurring amino acids known variation that occurs in CDRH1 and CDRH2, heavy at the corresponding position in N2 amino acid chain framework regions, and pairing with different light sequences of CDRH3 amino acid sequences that are chains, each of which also exhibit variation in their respective functionally expressed by human B cells; and CDRL1, CDRL2, and CDRL3. Finally, the antibodies iso (iv) a human CDRH3 H3-JHamino acid sequence, N-ter lated from these libraries are often not amenable to rational 60 minal truncations thereof, or a sequence of at least about affinity maturation techniques to improve the binding of the 80% identity to any of them. candidate molecule. In other embodiments, the invention comprises a library of Accordingly, a need exists for Smaller (i.e., able to be synthetic polynucleotides, wherein said polynucleotides synthesized and physically realizable) antibody libraries with encode at least about 10°unique antibody CDRH3 amino acid directed diversity that systematically represent candidate 65 sequences comprising: antibodies that are non-immunogenic (i.e., more human) and (i) an N1 amino acid sequence of 0 to about 3 amino acids, have desired properties (e.g., the ability to recognize a broad wherein: US 8,877,688 B2 3 4 (a) the most N-terminal N1 amino acid, if present, is ID NO: 15), IGHD-5-18 (polynucleotides encoding selected from a group consisting of R. G. P. L. S. A. V. SEQID NO: 15), IGHD6-13 (polynucleotides encoding K., I, Q, T and D; SEQ ID NOs: 7 and 8), IGHD6-19 (polynucleotides (b) the second most N-terminal N1 amino acid, if encoding SEQ ID NOs: 5 and 6), IGHD6-25 (SEQ ID present, is selected from a group consisting of G. P. R. 5 NO:514), IGHD6-6 (SEQID NO:515), and IGHD7-27 S, L, V, E, A, D, I, T and K; and (SEQ ID NO: 516), and N- and C-terminal truncations (c) the third most N-terminal N1 amino acid, if present, thereof; is selected from the group consisting of G. R. P. S. L. (iv) N2 is an amino acid sequence selected from the group A, V, T, E, D, K and F: consisting of G. P. R. A. S. L. T. V. GG, GP, GR, GA, GS, (ii) a human CDRH3 DH amino acid sequence, N- and 10 GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, C-terminal truncations thereof, or a sequence of at least PS, PL, PT, PV, RP AP, SP, LP, TP. VP, GGG, GAG, about 80% identity to any of them; GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, (iii) an N2 amino acid sequence of 0 to about 3 amino acids, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, wherein: GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, (a) the most N-terminal N2 amino acid, if present, is 15 AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, selected from a group consisting of G. P. R. L., S.A, T, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, V, E, D, F and H: RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ., (b) the second most N-terminal N2 amino acid, if SR, SS, ST, SV, TATR, TS, TT, TW, VD, VS, WSYS, present, is selected from a group consisting of G. P. R. AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, S, T, L, A, V. E.Y., D and K; and PTO, REL RPL, SAA, SAL, SGL, SSE, TGL, WGT, (c) the third most N-terminal N2 amino acid, if present, and combinations thereof, and is selected from the group consisting of G, P, S, R, L, (v) H3-JH is an amino acid sequence selected from the A, T, V, D, E, W and Q; and group consisting of AEYFQH (SEQ ID NO: 17), (iv) a human CDRH3 H3-JH amino acid sequence, N-ter EYFQH (SEQID NO: 583), YFQH (SEQID NO:584), minal truncations thereof, or a sequence of at least about 25 FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL 80% identity to any of them. (SEQID NO:585),YFDL (SEQIDNO:586), FDL, DL, In still other embodiments, the invention comprises a LAFDV (SEQIDNO:19), FDV, DV, V,YFDY (SEQID library of synthetic polynucleotides, wherein said polynucle NO:20), FDY, DY.Y., NWFDS (SEQIDNO: 21), WFDS otides encode at least about 10 unique antibody CDRH3 (SEQID NO:587), FDS, DS, S.YYYYYGMDV (SEQ amino acid sequences that are at least about 80% identical to 30 ID NO: 22), YYYYGMDV (SEQID NO:588), YYYG an amino acid sequence represented by the following for MDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: mula: 590), YGMDV (SEQ ID NO. 591), GMDV (SEQ ID NO. 592), and MDV, or a sequence of at least 80% identity to any of them. (i) X is any amino acid residue or no amino acid residue. 35 In still another embodiment, the invention comprises (ii) N1 is an amino acid sequence selected from the group wherein said library consists essentially of a plurality of poly consisting of G. P. R. A. S. L. T. V. GG, GP, GR, GA, GS, nucleotides encoding CDRH3 amino acid sequences that are GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, at least about 80% identical to an amino acid sequence rep PS, PL, PT, PV, RP AP, SP, LP, TP. VP, GGG, GPG, resented by the following formula: GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, 40 AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, (i) X is any amino acid residue or no amino acid residue, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, (ii) N1 is an amino acid sequence selected from the group LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, consisting of G. P. R. A. S. L. T. V. GG, GP, GR, GA, GS, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ., 45 GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, SR, SS, ST, SV, TATR, TS, TT, TW, VD, VS, WSYS, PS, PL, PT, PV, RP AP, SP, LP, TP. VP, GGG, GPG, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, PTO, REL, RPL, SAA, SAL. SGL, SSE, TGL, WGT, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, and combinations thereof; GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, (iii) DH is an amino acid sequence selected from the group 50 AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, consisting of all possible reading frames that do not LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, include a stop codon encoded by IGHD1-1 (SEQ ID RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ., NO: 501), IGHD1-20 (SEQ ID NO. 503), IGHD1-26 SR, SS, ST, SV, TATR, TS, TT, TW, VD, VS, WSYS, (polynucleotides encoding SEQ ID NOs: 13 and 14), AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, IGHD1-7 (SEQ ID NO. 504), IGHD2-15 (polynucle 55 PTO, REL RPL, SAA, SAL, SGL, SSE, TGL, WGT, otides encoding SEQID NO: 16), IGHD2-2 (polynucle and combinations thereof; otides encoding SEQ ID NOs: 10 and 11), IGHD2-21 (iii) DH is an amino acid sequence selected from the group (SEQ ID NOs: 505 and 506), IGHD2-8 (SEQ ID NO: consisting of all possible reading frames that do not 507), IGHD3-10 (polynucleotides encoding SEQ ID include a stop codon encoded by IGHD1-1 (SEQ ID NOs: 1-3), IGHD3-16 (SEQ ID NO: 508), IGHD3-22 60 NO: 501), IGHD1-20 (SEQ ID NO. 503), IGHD1-26 (polynucleotides encoding SEQ ID NO: 4), IGHD3-3 (polynucleotides encoding SEQ ID NOs: 13 and 14), (polynucleotides encoding SEQ ID NO: 9), IGHD3-9 IGHD1-7 (SEQ ID NO. 504), IGHD2-15 (polynucle (SEQID NO:509), IGHD4-17 (polynucleotides encod otides encoding SEQID NO: 16), IGHD2-2 (polynucle ing SEQ ID NO: 12), IGHD4-23 (SEQ ID NO. 510), otides encoding SEQ ID NOs: 10 and 11), IGHD2-21 IGHD4-4 (SEQID NO: 511), IGHD-4-11 (SEQID NO: 65 (SEQ ID NOs: 505 and 506), IGHD2-8 (SEQ ID NO: 511), IGHD5-12 (SEQ ID NO: 512), IGHD5-24 (SEQ 507), IGHD3-10 (polynucleotides encoding SEQ ID ID NO: 513), IGHD5-5 (polynucleotides encoding SEQ NOs: 1-3), IGHD3-16 (SEQ ID NO: 508), IGHD3-22 US 8,877,688 B2 5 6 (polynucleotides encoding SEQ ID NO: 4), IGHD3-3 N-terminal tail residue. In still another aspect, the N-terminal (polynucleotides encoding SEQ ID NO: 9), IGHD3-9 tail residue is selected from the group consisting of G, D, and (SEQID NO:509), IGHD4-17 (polynucleotides encod E. ing SEQ ID NO: 12), IGHD4-23 (SEQ ID NO. 510), In yet another aspect, the N1 amino acid sequence is IGHD4-4 (SEQID NO: 511), IGHD-4-11 (SEQID NO: selected from the group consisting of G. P. R. A. S. L. T. V. 511), IGHD5-12 (SEQ ID NO: 512), IGHD5-24 (SEQ GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, ID NO: 513), IGHD5-5 (polynucleotides encoding SEQ TG, VG, PP, PR, PA, PS, PL, PT, PV, RP AP, SP, LP, TP. VP, ID NO: 15), IGHD-5-18 (polynucleotides encoding GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, SEQID NO: 15), IGHD6-13 (polynucleotides encoding RGG, AGG, SGG, LGG, TGG, VGG, GGA, GGR, GGA, SEQ ID NOs: 7 and 8), IGHD6-19 (polynucleotides 10 GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, encoding SEQ ID NOs: 5 and 6), IGHD6-25 (SEQ ID AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, NO:514), IGHD6-6 (SEQID NO:515), and IGHD7-27 LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, (SEQ ID NO: 516), and N- and C-terminal truncations RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, thereof; TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, (iv) N2 is an amino acid sequence selected from the group 15 EKR, ISR, NTP, PKS, PRP, PTA, PTO, REL RPL, SAA, consisting of G. P. R. A. S. L. T. V. GG, GP, GR, GA, GS, SAL, SGL, SSE, TGL, WGT, and combinations thereof. In GL, GT, GV, PG, RG, AG, SG, LG, TG, VG, PP, PR, PA, certain other aspects, the N1 amino acid sequence may be of PS, PL, PT, PV, RP AP, SP, LP, TP. VP, GGG, GPG, about 0 to about 5 amino acids. GRG, GAG, GSG, GLG, GTG, GVG, PGG, RGG, In yet another aspect, the N2 amino acid sequence is AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, GGS, selected from the group consisting of G. P. R. A. S. L. T. V. GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, GG, GP, GR, GA, GS, GL, GT, GV, PG, RG, AG, SG, LG, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, TG, VG, PP, PR, PA, PS, PL, PT, PV, RP AP, SP, LP, TP. VP, LE, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, GGG, GPG, GRG, GAG, GSG, GLG, GTG, GVG, PGG, RH, RL, RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ., RGG, AGG, SGG, LGG, TGG, VGG, GGP, GGR, GGA, SR, SS, ST, SV, TATR, TS, TT, TW, VD, VS, WSYS, 25 GGS, GGL, GGT, GGV, D, E, F, H, I, K, M, Q, W, Y, AR, AS, AAE, AYH, DTL, EKR, ISR, NTP, PKS, PRP, PTA, AT, AY, DL, DT, EA, EK, FH, FS, HL, HW, IS, KV, LD, LE, PTO, REL, RPL, SAA, SAL. SGL, SSE, TGL, WGT, LR, LS, LT, NR, NT, QE, QL, QT, RA, RD, RE, RF, RH, RL, and combinations thereof, and RR, RS, RV, SA, SD, SE, SF, SI, SK, SL, SQ, SR, SS, ST, SV, (v) H3-JH is an amino acid sequence selected from the TA, TR, TS, TT, TW, VD, VS, WS, YS, AAE, AYH, DTL, group consisting of AEYFQH (SEQ ID NO: 17), 30 EKR, ISR, NTP, PKS, PRP, PTA, PTO, REL RPL, SAA, EYFQH (SEQID NO: 583), YFQH (SEQID NO:584), SAL, SGL, SSE, TGL, WGT, and combinations thereof. In FQH, QH, H, YWYFDL (SEQ ID NO: 18), WYFDL certain other aspects, the N2 sequence may be of about 0 to (SEQIDNO:585),YFDL (SEQIDNO:586), FDL, DL, about 5 amino acids. LAFDV (SEQIDNO:19), FDV, DV, V,YFDY (SEQID In yet another aspect, the H3-JH amino acid sequence is NO:20), FDY, DY.Y., NWFDS (SEQIDNO: 21), WFDS 35 selected from the group consisting of AEYFQH (SEQID NO: (SEQID NO:587), FDS, DS, S.YYYYYGMDV (SEQ 17), EYFQH (SEQID NO: 583), YFQH (SEQID NO:584), ID NO: 22), YYYYGMDV (SEQID NO:588), YYYG FQH, QH, HYWYFDL (SEQIDNO: 18), WYFDL (SEQID MDV (SEQ ID NO: 589), YYGMDV (SEQ ID NO: NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV 590), YGMDV (SEQ ID NO. 591), GMDV (SEQ ID (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), NO. 592), and MDV, or a sequence of at least 80% 40 FDY, DY.Y., NWFDS (SEQIDNO: 21), WFDS (SEQID NO: identity to any of them. 587), FDS, DS, S, YYYYYGMDV (SEQ ID NO: 22), In another embodiment, the invention comprises a library YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID of synthetic polynucleotides, wherein said polynucleotides NO:589),YYGMDV (SEQID NO:590), YGMDV (SEQID encode one or more full length antibody heavy chain NO:591), GMDV (SEQ ID NO. 592), and MDV. sequences, and wherein the CDRH3 amino acid sequences of 45 In other embodiments, the invention comprises a library of the heavy chain comprise: synthetic polynucleotides encoding a plurality of antibody (i) an N1 amino acid sequence of 0 to about 3 amino acids, CDRH3 amino acid sequences, wherein the percent occur wherein each amino acid of the N1 amino acid sequence rence within the central loop of the CDRH3 amino acid is among the 12 most frequently occurring amino acids sequences of at least one of the following i-i--1 pairs in the at the corresponding position in N1 amino acid 50 library is within the ranges specified below: sequences of CDRH3 amino acid sequences that are Tyr-Tyr in an amount from about 2.5% to about 6.5%: functionally expressed by human B cells; Ser-Gly in an amount from about 2.5% to about 4.5%: (ii) a human CDRH3 DH amino acid sequence, N- and Ser-Ser in an amount from about 2% to about 4%; C-terminal truncations thereof, or a sequence of at least Gly-Ser in an amount from about 1.5% to about 4%; about 80% identity to any of them; 55 Tyr-Ser in an amount from about 0.75% to about 2%; (iii) an N2 amino acid sequence of 0 to about 3 amino acids, Tyr-Gly in an amount from about 0.75% to about 2%; and wherein each amino acid of the N2 amino acid sequence Ser-Tyr in an amount from about 0.75% to about 2%. is among the 12 most frequently occurring amino acids In still other embodiments, the invention comprises a at the corresponding position in N2 amino acid library of synthetic polynucleotides encoding a plurality of sequences of CDRH3 amino acid sequences that are 60 antibody CDRH3 amino acid sequences, wherein the percent functionally expressed by human B cells; and occurrence within the central loop of the CDRH3 amino acid (iv) a human CDRH3 H3-JH amino acid sequence, N-ter sequences of at least one of the following i-i--2 pairs in the minal truncations thereof, or a sequence of at least about library is within the ranges specified below: 80% identity to any of them. Tyr-Tyr in an amount from about 2.5% to about 4.5%: The following embodiments may be applied throughout 65 Gly-Tyr in an amount from about 2.5% to about 5.5%; the embodiments of the instant invention. In one aspect, one Ser-Tyr in an amount from about 2% to about 4%; or more CDRH3 amino acid sequences further comprise an Tyr-Ser in an amount from about 1.75% to about 3.75%; US 8,877,688 B2 7 8 Ser-Gly in an amount from about 2% to about 3.5%; acid sequences are from about 2 to about 30, from about 8 to Ser-Ser in an amount from about 1.5% to about 3%; about 19, or from about 10 to about 18 amino acid residues in Gly-Ser in an amount from about 1.5% to about 3%; and length. In other aspects, the synthetic polynucleotides of the Tyr-Gly in an amount from about 1% to about 2%. library encode from about 10° to about 10", from about 107 In another embodiment, the invention comprises a library 5 to about 10", from about 10 to about 10'', from about 10 to of synthetic polynucleotides encoding a plurality of antibody about 10', or from about 10' to about 10' unique CDRH3 CDRH3 amino acid sequences, wherein the percent occur amino acid sequences. rence within the central loop of the CDRH3 amino acid In certain embodiments, the invention comprises a library sequences of at least one of the following i-i--3 pairs in the of synthetic polynucleotides, wherein said polynucleotides library is within the ranges specified below: 10 encode a plurality of antibody VKCDR3 amino acid Gly-Tyr in an amount from about 2.5% to about 6.5%: sequences comprising about 1 to about 10 of the amino acids Ser-Tyr in an amount from about 1% to about 5%; found at Kabat positions 89,90,91, 92,93, 94, 95,95A, 96, Tyr-Ser in an amount from about 2% to about 4%; and 97, in selected VKCDR3 amino acid sequences derived Ser-Ser in an amount from about 1% to about 3%; from a particular IGKV or IGKJ germline sequence. Gly-Ser in an amount from about 2% to about 5%; and 15 In one aspect, the synthetic polynucleotides encode one or Tyr-Tyr in an amount from about 0.75% to about 2%. more of the amino acid sequences listed in Table 33 or a In one aspect of the invention, at least 2, 3, 4, 5, 6, or 7 of sequence at least about 80% identical to any of them. the specified i-i-1 pairs in the library are within the specified In some embodiments, the invention comprises a library of ranges. In another aspect, the CDRH3 amino acid sequences synthetic polynucleotides, wherein said polynucleotides are human. In yet another aspect, the polynucleotides encode encode a plurality of unique antibody VKCDR3 amino acid at least about 10 unique CDRH3 amino acid sequences. sequences that are of at least about 80% identity to an amino In other aspects of the invention, the polynucleotides fur acid sequence represented by the following formula: ther encode one or more heavy chain chassis amino acid sequences that are N-terminal to the CDRH3 amino acid VK Chassis-L3-VK-X-JK), wherein: sequences, and the one or more heavy chain chassis 25 (i) VK Chassis is an amino acid sequence selected from sequences are selected from the group consisting of about the group consisting of about Kabat amino acid 1 to Kabatamino acid 1 to about Kabatamino acid 94 encoded by about Kabatamino acid 88 encoded by IGKV1-05 (SEQ IGHV1-2 (SEQ ID NO: 24), IGHV1-3 (SEQ ID NO: 423), ID NO: 229), IGKV1-06 (SEQID NO:451), IGKV1-08 IGHV1-8 (SEQID NOs: 424,425), IGHV1-18 (SEQID NO: (SEQID NO: 452,453), IGKV1-09 (SEQID NO:454), 25), IGHV1-24 (SEQID NO:426), IGHV1-45 (SEQID NO: 30 IGKV1-12 (SEQID NO: 230), IGKV1-13 (SEQID NO: 427), IGHV1-46 (SEQID NO: 26), IGHV1-58 (SEQID NO: 455), IGKV1-16 (SEQ ID NO:456), IGKV1-17 (SEQ 428), IGHV1-69 (SEQID NO: 27), IGHV2-5 (SEQID NO: ID NO:457), IGKV1-27 (SEQID NO. 231), IGKV1-33 429), IGHV2-26 (SEQ ID NO: 430), IGHV2-70 (SEQ ID (SEQID NO:232), IGKV1-37 (SEQIDNOs:458,459), NO: 431,432), IGHV3-7 (SEQID NO: 28), IGHV3-9 (SEQ IGKV1-39 (SEQ ID NO. 233), IGKV1D-16 (SEQ ID ID NO: 433), IGHV3-11 (SEQ ID NO: 434), IGHV3-13 35 NO: 460), IGKV1D-17 (SEQID NO: 461), IGKV1D (SEQID NO:435), IGHV3-15 (SEQID NO: 29), IGHV3-20 43 (SEQID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, (SEQ ID NO:436), IGHV3-21 (SEQID NO:437), IGHV3 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ 23 (SEQID NO:30), IGHV3-30 (SEQID NO:31), IGHV3 ID NO. 234), IGKV2-29 (SEQID NO:466), IGKV2-30 33 (SEQID NO:32), IGHV3-43 (SEQID NO:438), IGHV3 (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), 48 (SEQID NO:33), IGHV3-49 (SEQID NO:439), IGHV3 40 IGKV2D-26 (SEQID NO: 469), IGKV2D-29 (SEQ ID 53 (SEQ ID NO: 440), IGHV3-64 (SEQ ID NO. 441), NO: 470), IGKV2D-30 (SEQID NO: 471), IGKV3-11 IGHV3-66 (SEQ ID NO: 442), IGHV3-72 (SEQ ID NO: (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), 443), IGHV3-73 (SEQ ID NO. 444), IGHV3-74 (SEQ ID IGKV3-20 (SEQ ID NO. 237), IGKV3D-07 (SEQ ID NO: 445), IGHV4-4 (SEQ ID NO. 446, 447), IGHV4-28 NO: 472), IGKV3D-11 (SEQ ID NO: 473), IGKV3D (SEQID NO:448), IGHV4-31 (SEQID NO:34), IGHV4-34 45 20 (SEQ ID NO: 474), IGKV4-1 (SEQ ID NO. 238), (SEQ ID NO:35), IGHV4-39 (SEQID NO:36), IGHV4-59 IGKV5-2 (SEQID NOs: 475,476), IGKV6-21 (SEQID (SEQ ID NO:37), IGHV4-61 (SEQ ID NO:38), IGHV4-B NOs: 477), and IGKV6D-41, or a sequence of at least (SEQ ID NO:39), IGHV5-51 (SEQ ID NO: 40), IGHV6-1 about 80% identity to any of them; (SEQID NO:449), and IGHV7-4-1 (SEQID NO:450), or a (ii) L3-VK is the portion of the VKCDR3 encoded by the sequence of at least about 80% identity to any of them. 50 IGKV gene segment; and In another aspect, the polynucleotides further encode one (iii) X is any amino acid residue; and or more FRM4 amino acid sequences that are C-terminal to (iv) JK is an amino acid sequence selected from the group the CDRH3 amino acid sequences, wherein the one or more consisting of sequences encoded by IGJK1, IGJK2, FRM4 amino acid sequences are selected from the group IGJK3, IGJK4, and IGJK5, wherein the first residue of consisting of a FRM4 amino acid sequence encoded by 55 each IGJK sequence is not present. IGHJ1 (SEQID NO: 253), IGHJ2 (SEQID NO:254), IGHJ3 In still other aspects, X may be selected from the group (SEQ ID NO: 255), IGHJ4 (SEQID NO:256), IGHJ5 (SEQ consisting of F, L, I, R. W.Y. and P. IDNO: 257), and IGHJ6 (SEQID NO: 257), or a sequence of In certain embodiments, the invention comprises a library at least about 80% identity to any of them. In still another of synthetic polynucleotides, wherein said polynucleotides aspect, the polynucleotides further encode one or more 60 encode a plurality of VWCDR3 amino acid sequences that are immunoglobulin heavy chain constant region amino acid of at least about 80% identity to an amino acid sequence sequences that are C-terminal to the FRM4 sequence. represented by the following formula: In yet another aspect, the CDRH3 amino acid sequences are expressed as part of full-length heavy chains. In other VW Chassis-L3-Vi-JJ), wherein: aspects, the full-length heavy chains are selected from the 65 (i) VW Chassis is an amino acid sequence selected from the group consisting of an IgG1, IgG2, IgG3, and IgG4, or com group consisting of about Kabat amino acid 1 to about binations thereof. In one embodiment, the CDRH3 amino Kabat amino acid 88 encoded by IGV1-36 (SEQ ID US 8,877,688 B2 9 10 NO: 480), IG) V1-40 (SEQ ID NO: 531), IGV1-44 475, 476), IGKV6-21 (SEQ ID NOs: 477), and (SEQ ID NO: 532), IG) V1-47 (SEQ ID NO: 481), IGKV6D-41, or a sequence of at least about 80% iden IGV1-51 (SEQ ID NO: 533), IG) V10-54 (SEQ ID tity to any of them; NO:482), IGV2-11 (SEQID NOS:483,484), IGV2 (b) L3-VK is the portion of the VKCDR3 encoded by the 14 (SEQ ID NO: 534), IGAV2-18 (SEQ ID NO: 485), IGKV gene segment; and IGV2-23 (SEQID NOS:486,487), IGV2-8 (SEQID (c) X is any amino acid residue, and NO: 488), IGV3-1 (SEQ ID NO: 535), IGV3-10 (d) JK is an amino acid sequence selected from the group (SEQ ID NO: 489), IG) V3-12 (SEQ ID NO: 490), consisting of sequences encoded by IGJK1, IGJK2, IGV3-16 (SEQID NO:491), IG) V3-19 (SEQID NO: IGJK3, IGJK4, and IGJK5, wherein the first residue of 536), IGV3-21 (SEQIDNO:537), IGV3-25 (SEQID 10 each IGJK sequence is not present. NO: 492), IGV3-27 (SEQ ID NO: 493), IGV3-9 In some aspects, the VKCDR3 amino acid sequence com (SEQ ID NO: 494), IGV4-3 (SEQ ID NO: 495), prises one or more of the sequences listed in Table 33 or a IGV4-60 (SEQID NO: 496), IGV4-69 (SEQID NO: sequence at least about 80% identical to any of them. In other 538), IG2V5-39 (SEQIDNO:497), IGV5-45 (SEQID 15 aspects, the antibody proteins are expressed in a het NO: 540), IGV6-57 (SEQ ID NO: 539), IG2V7-43 erodimeric form. In yet another aspect, the human antibody (SEQ ID NO: 541), IGV7-46 (SEQ ID NO: 498), proteins are expressed as antibody fragments. In still other IGV8-61 (SEQID NO:499), IGV9-49 (SEQID NO: aspects of the invention, the antibody fragments are selected 500), and IG) V10-54 (SEQID NO: 482), or a sequence from the group consisting of Fab, Fab'. F(ab'). Fv fragments, of at least about 80% identity to any of them; diabodics, linear antibodies, and single-chain antibodies. (ii) L3-V) is the portion of the VCDR3 encoded by the In certain embodiments, the invention comprises an anti IGWV segment; and body isolated from the polypeptide expression products of (iii) JW is an amino acid sequence selected from the group any library described herein. consisting of sequences encoded by IGWJ1-01, IGWJ2 In still other aspects, the polynucleotides further comprise 01, IG) J3-01, IGN.J3-02, IGNJ6-01, IGN.J7-01, and 25 a 5' polynucleotide sequence and a 3' polynucleotide IGWJ7-02, and wherein the first residue of each IGJ). sequence that facilitate homologous recombination. sequence may or may not be deleted. In one embodiment, the polynucleotides further encode an In further aspects, the invention comprises a library of alternative scaffold. synthetic polynucleotides, wherein said polynucleotides In another embodiment, the invention comprises a library encode a plurality of antibody proteins comprising: of polypeptides encoded by any of the synthetic polynucle (i) a CDRH3 amino acid sequence of claim 1; and otide libraries described herein. (ii) a VKCDR3 amino acid sequence comprising about 1 to In yet another embodiment, the invention comprises a about 10 of the amino acids found at Kabat positions 89. library of vectors comprising any of the polynucleotide librar 90, 91, 92, 93, 94, 95, 95A, 96, and 97, in selected ies described herein. In certain other aspects, the invention VKCDR3 sequences derived from a particular IGKV or 35 comprises a population of cells comprising the vectors of the IGKJ germline sequence. instant invention. In still further aspects, the invention comprises a library of In one aspect, the doubling time of the population of cells synthetic polynucleotides, wherein said polynucleotides is from about 1 to about 3 hours, from about 3 to about 8 encode a plurality of antibody proteins comprising: 40 hours, from about 8 to about 16 hours, from about 16 to about (i) a CDRH3 amino acid sequence of claim 1; and 20 hours, or from 20 to about 30 hours. In yet another aspect, (ii) a VKCDR3 amino acid sequences of at least about 80% the cells are yeast cells. In still another aspect, the yeast is identity to an amino acid sequence represented by the Saccharomyces cerevisiae. following formula: In other embodiments, the invention comprises a library 45 that has a theoretical total diversity of N unique CDRH3 VK Chassis-L3-VK-X-JK*, wherein: sequences, wherein N is about 10° to about 10"; and wherein (a) VK Chassis is an amino acid sequence selected from the physical realization of the theoretical total CDRH3 diver the group consisting of about Kabat amino acid 1 to sity has a size of at least about 3N, thereby providing a about Kabatamino acid 88 encoded by IGKV1-05 (SEQ probability of at least about 95% that any individual CDRH3 ID NO: 229), IGKV1-06 (SEQID NO:451), IGKV1-08 50 sequence contained within the theoretical total diversity of (SEQID NO: 452,453), IGKV1-09 (SEQID NO:454), the library is present in the actual library. IGKV1-12 (SEQID NO: 230), IGKV1-13 (SEQID NO: In certain embodiments, the invention comprises a library 455), IGKV1-16 (SEQ ID NO:456), IGKV1-17 (SEQ of synthetic polynucleotides, wherein said polynucleotides ID NO:457), IGKV1-27 (SEQID NO. 231), IGKV1-33 encode a plurality of antibody VWCDR3 amino acid (SEQID NO:232), IGKV1-37 (SEQIDNOs:458,459), 55 sequences comprising about 1 to about 10 of the amino acids IGKV1-39 (SEQ ID NO. 233), IGKV1D-16 (SEQ ID foundat Kabat positions 89,90,91, 92,93, 94,95,95A,95B, NO: 460), IGKV1D-17 (SEQID NO: 461), IGKV1D 95C, 96, and 97, in selected VWCDR3 sequences encoded by 43 (SEQID NO: 462), IGKV1D-8 (SEQ ID NOs: 463, a single germline sequence. 464), IGKV2-24 (SEQ ID NO: 465), IGKV2-28 (SEQ In some embodiments, the invention relates to a library of ID NO. 234), IGKV2-29 (SEQID NO:466), IGKV2-30 60 synthetic polynucleotides encoding a plurality of antibody (SEQ ID NO: 467), IGKV2-40 (SEQ ID NO: 468), CDRH3 amino acid sequences, wherein the library has a IGKV2D-26 (SEQID NO: 469), IGKV2D-29 (SEQID theoretical total diversity of about 10° to about 10' unique NO: 470), IGKV2D-30 (SEQID NO: 471), IGKV3-11 CDRH3 sequences. (SEQ ID NO: 235), IGKV3-15 (SEQ ID NO: 236), In still other embodiments, the invention relates to a IGKV3-20, IGKV3D-07 (SEQID NO:472), IGKV3D 65 method of preparing a library of synthetic polynucleotides 11 (SEQID NO:473), IGKV3D-20 (SEQID NO.474), encoding a plurality of antibody VK amino acid sequences, IGKV4-1 (SEQID NO: 238), IGKV5-2 (SEQID NOs: the method comprising: US 8,877,688 B2 11 12 (i) providing polynucleotide sequences encoding: (i) providing polynucleotide sequences encoding: (a) one or more VK Chassis amino acid sequences (a) one or more VW Chassis amino acid sequences selected from the group consisting of about Kabat selected from the group consisting of about Kabat amino acid 1 to about Kabat amino acid 88 encoded residue 1 to about Kabat residue 88 encoded by by IGKV1-05 (SEQ ID NO: 229), IGKV1-06 (SEQ IGV1-36 SEQ ID NO: 480), IGV1-40 (SEQ ID ID NO: 451), IGKV1-08 (SEQ ID NO: 452, 453), NO: 531), IG) V1-44 (SEQ ID NO: 532), IGV1-47 IGKV1-09 (SEQ ID NO:454), IGKV1-12 (SEQ ID (SEQ ID NO: 481), IGV1-51 (SEQ ID NO: 533), NO: 230), IGKV1-13 (SEQID NO:455), IGKV1-16 IGV 10-54 (SEQ ID NO: 482), IGV2-11 (SEQ ID (SEQ ID NO. 456), IGKV1-17 (SEQ ID NO. 457), NO: 483, 484), IGV2-14 (SEQ ID NO: 534), IGKV1-27 (SEQ ID NO. 231), IGKV1-33 (SEQ ID 10 IGV2-18 (SEQ ID NO: 485), IG), V2-23 (SEQ ID NO: 232), IGKV1-37 (SEQ ID NOs: 458, 459), NO: 486, 487), IGV2-8 (SEQ ID NO: 488), IGKV1-39 (SEQID NO. 233), IGKV1D-16 (SEQID IGV3-1 (SEQ ID NO: 535), IGV3-10 (SEQ ID NO: 460), IGKV1D-17 (SEQ ID NO: 461), NO: 489), IG) V3-12 (SEQ ID NO: 490), IGV3-16 IGKV1D-43 (SEQ ID NO: 462), IGKV1D-8 (SEQ (SEQ ID NO: 491), IGV3-19 (SEQ ID NO: 536), ID NOs: 463,464), IGKV2-24 (SEQ ID NO: 465), 15 IGV3-21 (SEQ ID NO. 537), IGV3-25 (SEQ ID IGKV2-28 (SEQ ID NO. 234), IGKV2-29 (SEQ ID NO: 492), IG) V3-27 (SEQ ID NO: 493), IGV3-9 NO: 466), IGKV2-30 (SEQID NO: 467), IGKV2-40 (SEQ ID NO: 494), IGV4-3 (SEQ ID NO: 495), (SEQID NO: 468), IGKV2D-26 (SEQID NO:469), IGV4-60 (SEQ ID NO: 496), IGV4-69 (SEQ ID IGKV2D-29 (SEQID NO: 470), IGKV2D-30 (SEQ NO:538), IGV5-39 (SEQ ID NO: 497), IGV5-45 ID NO.471), IGKV3-11 (SEQIDNO:235), IGKV3 (SEQ ID NO. 540), IGV6-57 (SEQ ID NO: 539), 15 (SEQIDNO: 236), IGKV3-20 (SEQID NO. 237), IGV7-43 (SEQ ID NO: 541), IGV7-46 (SEQ ID IGKV3D-07 (SEQID NO: 472), IGKV3D-11 (SEQ NO: 498), IG) V8-61 (SEQ ID NO: 499), IGV9-49 ID NO: 473), IGKV3D-20 (SEQ ID NO: 474), (SEQID NO:500), and IG 10-54 (SEQID NO:482), IGKV4-1 (SEQ ID NO. 238), IGKV5-2 (SEQ ID or a sequence at least about 80% identical to any of NOs: 475,476), IGKV6-21 (SEQID NOs: 477), and 25 them; IGKV6D-41, or a sequence at least about 80% iden (b) one or more L3-VW sequences, wherein L3-V is tical to any of them; the portion of the VWCDR3 amino acid sequence (b) one or more L3-VKamino acid sequences, wherein encoded by the IGWV gene segment; L3-VK the portion of the VKCDR3 amino acid (c) one or more JW sequences, wherein JW is an amino sequence encoded by the IGKV gene segment; 30 acid sequence selected from the group consisting of (c) one or more X residues, wherein X is any amino acid amino acid sequences encoded by IG. J1-01 (SEQID residue; and NO:557), IGJ2-01 (SEQ ID NO:558), IGJ3-01 (d) one or more JKamino acid sequences, wherein.JK* (SEQ ID NO. 559), IGJ3-02 (SEQ ID NO. 560), is an amino acid sequence selected from the group IGJ16-01 (SEQ ID NO: 561), IGJ7-01 (SEQ ID consisting amino acid sequences encoded by IGKJ1 35 NO: 562), and IGJ7-02 (SEQID NO:563) wherein (SEQID NO:552), IGKJ2 (SEQID NO:553), IGKJ3 the first amino acid residue of each sequence may or (SEQ ID NO: 554), IGKJ4 (SEQ ID NO. 555), and may not be present; and IGKJ5 (SEQ ID NO: 556), wherein the first amino (ii) assembling the polynucleotide sequences to produce a acid residue of each sequence is not present; and library of synthetic polynucleotides encoding a plurality (ii) assembling the polynucleotide sequences to produce a 40 of human VW amino acid sequences represented by the library of synthetic polynucleotides encoding a plurality following formula: of human VK sequences represented by the following formula: VW Chassis-L3-Vi-JJ). In certain embodiments, the amino acid sequences 45 encoded by the polynucleotides of the libraries of the inven In some embodiments, the invention relates to a method of tion are human. preparing a library of synthetic polynucleotides encoding a The present invention is also directed to methods of pre plurality of antibody light chain CDR3 sequences, the method paring a synthetic polynucleotide library comprising provid comprising: ing and assembling the polynucleotide sequences of the (i) determining the percent occurrence of each amino acid 50 instant invention. residue at each position in selected light chain CDR3 In another aspect, the invention comprises a method of amino acid sequences derived from a single germline preparing the library of synthetic polynucleotides encoding a polynucleotide sequence; plurality of antibody CDRH3 amino acid sequences, the (ii) designing synthetic polynucleotides encoding a plural method comprising: ity of human antibody light chain CDR3 amino acid 55 (i) providing polynucleotide sequences encoding: sequences, wherein the percent occurrence of any amino (a) one or more N1 amino acid sequences of about 0 to acid at any position within the designed light chain about 3 amino acids, wherein each amino acid of the CDR3 amino acid sequences is within at least about 30% N1 amino acid sequence is among the 12 most fre of the percent occurrence in the selected light chain quently occurring amino acids at the corresponding CDR3 amino acid sequences derived from a single ger 60 position in N1 sequences of CDRH3 amino acid mline polynucleotide sequence, as determined in (i); and sequences that are functionally expressed by human B (iii) synthesizing one or more polynucleotides that were cells; designed in (ii). (b) one or more human CDRH3 DH amino acid In other embodiments, the invention relates to a method of sequences, N- and C-terminal truncations thereof, or a preparing a library of synthetic polynucleotides encoding a 65 sequence of at least about 80% identity to any of them; plurality of antibody VW amino acid sequences, the method (c) one or more N2 amino acid sequences of about 0 to comprising: about 3 amino acids, wherein each amino acid of the US 8,877,688 B2 13 14 N1 amino acid sequence is among the 12 most fre FIG. 3 depicts the length distribution of the CDRL3 quently occurring amino acids at the corresponding regions of rearranged human kappa light chain sequences position in N2 amino acid sequences of CDRH3 compiled from the NCBI database (Appendix A). amino acid sequences that are functionally expressed FIG. 4 depicts the length distribution of the CDRL3 by human B cells; and regions of rearranged human lambda light chain sequences (d) one or more human CDRH3 H3-JH amino acid compiled from the NCBI database (Appendix B). sequences, N-terminal truncations thereof, or a FIG. 5 depicts a schematic representation of the 424 clon sequence of at least about 80% identity to any of them; ing vectors used in the synthesis of the CDRH3 regions before and and after ligation of the DH-N2-JH segment (ii) assembling the polynucleotide sequences to produce a 10 library of synthetic polynucleotides encoding a plurality (DTAVYYCAR: SEQID NO:579; DTAVYYCAK: SEQ ID of human antibody CDRH3 amino acid sequences rep NO:578; SSASTK: SEQID NO:580). resented by the following formula: FIG. 6 depicts a schematic structure of a heavy chain vec tor, prior to recombination with a CDRH3. 15 FIG.7 depicts a schematic diagram of a CDRH3 integrated In one aspect, one or more of the polynucleotide sequences into a heavy chain vector and the polynucleotide and polypep are synthesized via split-pool synthesis. tide sequences of CDRH3 (SEQID NO: 581). In another aspect, the method of the invention further com FIG. 8 depicts a schematic structure of a kappa light chain prises the step of recombining the assembled synthetic poly vector, prior to recombination with a CDRL3. nucleotides with a vector comprising a heavy chain chassis FIG.9 depicts a schematic diagram of a CDRL3 integrated and aheavy chain constant region, to form a full-length heavy into a light chain vector and the polynucleotide and polypep chain. tide sequences of CDRL3 (SEQID NO: 582). In another aspect, the method of the invention further com FIG. 10 depicts the length distribution of the CDRH3 prises the step of providing a 5" polynucleotide sequence and domain (Kabat positions 95-102) from 96 colonies obtained a 3' polynucleotide sequence that facilitate homologous 25 by transformation with 10 of the 424 vectors synthesized as recombination. In still another aspect, the method of the described in Example 10 (observed), as compared to the invention further comprises the step of recombining the expected (i.e., designed) distribution. assembled synthetic polynucleotides with a vector compris FIG. 11 depicts the length distribution of the DH segment ing a heavy chain chassis and a heavy chain constant region, from 96 colonies obtained by transformation with 10 of the to form a full-length heavy chain. 30 424 vectors synthesized as described in Example 10 (ob In some embodiments, the step of recombining is per served), as compared to the expected (i.e., designed) distri formed in yeast. In certain embodiments, the yeast is S. Cer bution. evisiae. In certain other embodiments, the invention comprises a FIG. 12 depicts the length distribution of the N2 segment method of isolating one or more host cells expressing one or 35 from 96 colonies obtained by transformation with 10 of the more antibodies, the method comprising: 424 vectors synthesized as described in Example 10 (ob (i) expressing any of the human antibodies disclosed herein served), as compared to the expected (i.e., designed) distri in one or more host cells; bution. (ii) contacting the host cells with one or more antigens; and FIG. 13 depicts the length distribution of the H3-JH seg (iii) isolating one or more host cells having antibodies that 40 ment from 96 colonies obtained by transformation with 10 of bind to the one or more antigens. the 424 vectors synthesized as described in Example 10 (ob In another aspect, the method of the invention further com served), as compared to the expected (i.e., designed) distri prises the step of isolating one or more antibodies from the bution. one or more host cells that present the antibodies which FIG. 14 depicts the length distribution of the CDRH3 recognize the one or more antigens. In yet another aspect, the 45 domains from 291 sequences prepared from yeast cells trans method of the invention further comprises the step of isolat formed according to the method outlined in Example 10.4, ing one or more polynucleotide sequences encoding one or namely the co-transformation of vectors containing heavy more antibodies from the one or more host cells that present chain chassis and constant regions with a CDRH3 insert the antibodies which recognize the one or more antigens. (observed), as compared to the expected (i.e., designed) dis In certain other embodiments, the invention comprises a kit 50 tribution. comprising the library of synthetic polynucleotides encoding FIG. 15 depicts the length distribution of the Tail-N1 a plurality of antibody CDRH3 amino acid sequences, or any region from the 291 sequences prepared from yeast cells of the other sequences disclosed herein. transformed according to the protocol outlined in Example In still other aspects, the CDRH3 amino acid sequences 10.4 (observed), as compared to the expected (i.e., designed) encoded by the libraries of synthetic polynucleotides 55 distribution. described herein, or any of the other sequences disclosed FIG. 16 depicts the length distribution of the DH region herein, are in computer readable form. from the 291 sequences prepared from yeast cells trans formed according to the protocol outlined in Example 10.4 BRIEF DESCRIPTION OF THE DRAWINGS (observed), as compared to the theoretical (i.e., designed) 60 distribution. FIG. 1 depicts a schematic of recombination between a FIG. 17 depicts the length distribution of the N2 region fragment (e.g., CDR3) and a vector (e.g., comprising a chas from the 291 sequences prepared from yeast cells trans sis and constant region) for the construction of a library. formed according to the protocol outlined in Example 10.4 FIG. 2 depicts the length distribution of the N1 and N2 (observed), as compared to the theoretical (i.e., designed) regions of rearranged human antibody sequences compiled 65 distribution. from Jackson et al. (J. Immunol. Methods, 2007, 324; 26. FIG. 18 depicts the length distribution of the H3-JH region incorporated by reference in its entirety). from the 291 sequences prepared from yeast cells trans US 8,877,688 B2 15 16 formed according to the protocol outlined in Example 10.4 able region (heavy and light) is also recombined with a con (observed), as compared to the theoretical (i.e., designed) stant region, to produce a full-length immunoglobulin chain. distribution. While combinatorial assembly of the V, D and J gene FIG. 19 depicts the familial origin of the JH segments segments make a substantial contribution to antibody variable identified in the 291 sequences (observed), as compared to the region diversity, further diversity is introduced in vivo, at the theoretical (i.e., designed) familial origin. pre-B cell stage, via imprecise joining of these gene segments FIG. 20 depicts the representation of each of the 16 chassis and the introduction of non-templated nucleotides at the junc of the library (observed), as compared to the theoretical (i.e., tions between the gene segments. designed) chassis representation. VH3-23 is represented After a B cell recognizes an antigen, it is induced to pro twice; once ending in CAR and once ending in CAK. These 10 liferate. During proliferation, the B cell receptor locus under goes an extremely high rate of Somatic mutation that is far representations were combined, as were the ten variants of greater than the normal rate of genomic mutation. The muta VH3-33 with one variant of VH3-30. tions that occur are primarily localized to the Ig variable FIG. 21 depicts a comparison of the CDRL3 length from 86 regions and comprise Substitutions, insertions and deletions. sequences selected from the VKCDR3 library of Example 6.2 15 This somatic hypermutation enables the production of B cells (observed) to human sequences (human) and the designed that express antibodies possessing enhanced affinity toward sequences (designed). an antigen. Such antigen-driven Somatic hypermutation fine FIG.22 depicts the representation of the light chain chassis tunes antibody responses to a given antigen. amongst the 86 sequences selected from the library (ob Significant efforts have been made to create antibody served), as compared to the theoretical (i.e., designed) chassis libraries with extensive diversity, and to mimic the natural representation. process of affinity maturation of antibodies against various FIG. 23 depicts the frequency of occurrence of different antigens, especially antigens associated with diseases such as CDRH3 lengths in an exemplary library of the invention, autoimmune diseases, cancer, and infectious disease. Anti Versus the preimmune repertoire of Lee et al. (Immunogenet body libraries comprising candidate binding molecules that ics, 2006, 57: 917, incorporated by reference in its entirety). 25 can be readily screened against targets are desirable. How FIG. 24 depicts binding curves for 6 antibodies selected ever, the full promise of an antibody library, which is repre from a library of the invention. sentative of the preimmune human antibody repertoire, has FIG. 25 depicts binding data for 10 antibodies selected remained elusive. In addition to the shortcomings enumerated from a library of the invention binding to hen egg white above, and throughout the application, synthetic libraries that lysozyme. 30 are known in the art often suffer from noise (i.e., very large libraries increase the presence of many sequences which do DETAILED DESCRIPTION OF THE INVENTION not express well, and/or which misfold), while entirely human libraries that are known in the art may be biased The present invention is directed to, at least, synthetic against certain antigen classes (e.g., self-antigens). More polynucleotide libraries, methods of producing and using the 35 over, the limitations of synthesis and physical realization libraries of the invention, kits and computer readable forms techniques restrict the functional diversity of antibody librar including the libraries of the invention. The libraries taught in ies of the art. The present invention provides, for the first time, this application are described, at least in part, in terms of the a fully synthetic antibody library that is representative of the components from which they are assembled. human preimmune antibody repertoire (e.g., in composition In certain embodiments, the instant invention provides 40 and length), and that can be readily Screened (i.e., it is physi antibody libraries specifically designed based on the compo cally realizable and, in Some cases can be oversampled) sition and CDR length distribution in the naturally occurring using, for example, high throughput methods, to obtain, for human antibody repertoire. It is estimated that, even in the example, new therapeutics and/or diagnostics absence of antigenic stimulation, a human makes at least In particular, the synthetic antibody libraries of the instant about 107 different antibody molecules. The antigen-binding 45 invention have the potential to recognize any antigen, includ sites of many antibodies can cross-react with a variety of ing self-antigens of human origin. The ability to recognize related but different epitopes. In addition the human antibody self-antigens is usually lost in an expressed human library, repertoire is large enough to ensure that there is an antigen because self-reactive antibodies are removed by the donors binding site to fit almost any potential epitope, albeit with low immune system via negative selection. Another feature of the affinity. 50 invention is that screening the antibody library using positive The mammalian immune system has evolved unique clone selection, for example, by FACS (florescence activated genetic mechanisms that enable it to generate an almost cell sorter) bypasses the standard and tedious methodology of unlimited number of different light and heavy chains in a generating a hybridoma library and Supernatant screening. remarkably economical way, by combinatorially joining Still further, the libraries, or sub-libraries thereof, can be chromosomally separated gene segments prior to transcrip 55 screened multiple times, to discover additional antibodies tion. Each type of immunoglobulin (Ig) chain (i.e., Klight, W against other desired targets. Before further description of the light, and heavy) is synthesized by combinatorial assembly of invention, certain terms are defined. DNA sequences selected from two or more families of gene segments, to produce a single polypeptide chain. Specifically, 1. Definitions the heavy chains and light chains each consist of a variable 60 region and a constant (C) region. The variable regions of the Unless defined otherwise, all technical and scientific terms heavy chains are encoded by DNA sequences assembled from used herein have the meaning commonly understood by one three families of gene segments: variable (IGHV), joining of ordinary skill in the art relevant to the invention. The (IGHJ) and diversity (IGHD). The variable regions of light definitions below supplement those in the art and are directed chains are encoded by DNA sequences assembled from two 65 to the embodiments described in the current application. families of gene segments for each of the kappa and lambda The term “antibody' is used herein in the broadest sense light chains: variable (IGLV) and joining (IGLJ). Each vari and specifically encompasses at least monoclonal antibodies, US 8,877,688 B2 17 18 polyclonal antibodies, multi-specific antibodies (e.g., bispe- sis of the invention may contain certain modifications relative cific antibodies), chimericantibodies, humanized antibodies, to the corresponding germline variable domain sequences human antibodies, and antibody fragments. An antibody is a presented herein or available in public databases. These protein comprising one or more polypeptides Substantially or modifications may be engineered (e.g., to remove N-linked partially encoded by immunoglobulin genes or fragments of 5 glycosylation sites) or naturally occurring (e.g., to account for immunoglobulin genes. The recognized immunoglobulin allelic variation). For example, it is known in the art that the genes include the kappa, lambda, alpha, gamma, delta, epsi- immunoglobulin gene repertoire is polymorphic (Wang et al., lon and mu constant region genes, as well as myriad immu- Immunol. Cell. Biol., 2008, 86: 111; Collins et al., Immuno noglobulin variable region genes. genetics, 2008, DOI 10.1007/s00251-008-0325-Z, published "Antibody fragments” comprise a portion of an intact anti- online, each incorporated by reference in its entirety); chas body, for example, one or more portions of the antigen-bind- sis, CDRs (e.g., CDRH3) and constant regions representative ing region thereof. Examples of antibody fragments include of these allelic variants are also encompassed by the inven Fab, Fab', F(ab'), and Fv fragments, diabodies, linear anti- tion. In some embodiments, the allelic variant(s) used in a bodies, single-chain antibodies, and multi-specific antibodies particular embodiment of the invention may be selected based formed from intact antibodies and antibody fragments. on the allelic variation present in different patient popula An “intact antibody is one comprising full-length heavy- tions, for example, to identify antibodies that are non-immu and light-chains and an Fc region. An intact antibody is also nogenic in these patient populations. In certain embodiments, referred to as a “full-length, heterodimeric' antibody or the immunogenicity of an antibody of the invention may immunoglobulin. depend on allelic variation in the major histocompatibility The term “variable” refers to the portions of the immuno- 20 complex (MHC) genes of a patient population. Such allelic globulin domains that exhibit variability in their sequence and variation may also be considered in the design of libraries of that are involved in determining the specificity and binding the invention. In certain embodiments of the invention, the affinity of a particular antibody (i.e., the “variable domain(s) chassis and constant regions are contained on a vector, and a '). Variability is not evenly distributed throughout the vari- CDR3 region is introduced between them via homologous able domains of antibodies; it is concentrated in sub-domains recombination. of each of the heavy and light chain variable regions. These In some embodiments, one, two or three nucleotides may sub-domains are called “hypervariable' regions or “comple- follow the heavy chain chassis, forming either a partial (if one mentarity determining regions' (CDRs). The more conserved or two) or a complete (if three) codon. When a full codon is (i.e., non-hyperVariable) portions of the variable domains are present, these nucleotides encode an amino acid residue that called the “framework” regions (FRM). The variable domains is referred to as the “tail,” and occupies position 95. of naturally occurring heavy and light chains each comprise The “CDRH3 numbering system used herein defines the four FRM regions, largely adopting a B-sheet configuration, first amino acid of CDRH3 as being at Kabat position 95 (the connected by three hypervariable regions, which form loops “tail,” when present) and the last amino acid of CDRH3 as connecting, and in some cases forming part of the B-sheet 35 position 102. The amino acids following the “tail” are called structure. The hyperVariable regions in each chain are held “N1 and, when present, are assigned numbers 96, 96A,96B, together in close proximity by the FRM and, with the hyper- etc. The N1 segment is followed by the “DH' segment, which variable regions from the other chain, contribute to the for- is assigned numbers 97.97A,97B, 97C, etc. The DH segment mation of the antigen-binding site (see Kabat et al. Sequences is followed by the “N2' segment, which, when present, is of Proteins of Immunological Interest, 5th Ed. Public Health numbered 98, 98A, 98B, etc. Finally, the most C-terminal Service, National Institutes of Health, Bethesda, Md., 1991, 9 amino acid residue of the set of the “H3-JH” segment is incorporated by reference in its entirety). The constant designated as number 102. The residue directly before (N-ter domains are not directly involved in antigen binding, but minal) it, when present, is 101, and the one before (if present) exhibit various effector functions, such as, for example, anti- is 100. For reasons of convenience, and which will become body-dependent, cell-mediated cytotoxicity and complement apparent elsewhere, the rest of the H3-JH amino acids are activation. numbered in reverse order, beginning with 99 for the amino The “chassis” of the invention represent a portion of the acid just N-terminal to 100,99A for the residue N-terminal to antibody heavy chain variable (IGHV) or light chain variable 99, and so forth for 99B, 99C, etc. Examples of certain (IGLV) domains that are not part of CDRH3 or CDRL3, CDRH3 sequence residue numbers may therefore include the respectively. The chassis of the invention is defined as the following:

13 Amino Acid CDR-H3 with N1 and N2 (95) (96) (96A) (97) (97A) (97) (97C) (97D) (98) (99) (1 OO) (101) (102)

10 Amino Acid CDR-H3 without N1 and N2 (97) (97A) (97) (97C) (97D) (97E) (97) (97G) (101) (102)

DH H3 - JH

60 portion of the variable region of an antibody beginning with As used herein, the term “diversity” refers to a variety or a the first amino acid of FRM1 and ending with the last amino noticeable heterogeneity. The term “sequence diversity’ acid of FRM3. In the case of the heavy chain, the chassis refers to a variety of sequences which are collectively repre includes the amino acids including from about Kabat position sentative of several possibilities of sequences, for example, 1 to about Kabat position 94. In the case of the light chains 65 those found in natural human antibodies. For example, heavy (kappa and lambda), the chassis are defined as including from chain CDR3 (CDRH3) sequence diversity may refer to a about Kabat position 1 to about Kabat position 88. The chas- variety of possibilities of combining the known human DH US 8,877,688 B2 19 20 and H3-JH segments, including the N1 and N2 regions, to As described throughout the specification, the term form heavy chain CDR3 sequences. The light chain CDR3 “library” is used herein in its broadest sense, and also may (CDRL3) sequence diversity may refer to a variety of possi include the sub-libraries that may or may not be combined to bilities of combining the naturally occurring light chain vari produce libraries of the invention. able region contributing to CDRL3 (i.e., L3-VL) and joining As used herein, the term “synthetic polynucleotide refers (i.e., L3-JL) segments, to form light chain CDR3 sequences. to a molecule formed through a chemical process, as opposed As used herein, H3-JH refers to the portion of the IGHJ gene to molecules of natural origin, or molecules derived via tem contributing to CDRH3. As used herein, L3-VL and L3-JL plate-based amplification of molecules of natural origin (e.g., refer to the portions of the IGLV and IGLJ genes (kappa or immunoglobulin chains cloned from populations of B cells lambda) contributing to CDRL3, respectively. 10 via PCR amplification are not “synthetic' used herein). In As used herein, the term "expression' includes any step Some instances, for example, when referring to libraries of the involved in the production of a polypeptide including, but not invention that comprise multiple components (e.g., N1, DH. limited to, transcription, post-transcriptional modification, N2, and/or H3-JH), the invention encompasses libraries in translation, post-translational modification, and secretion. which at least one of the aforementioned components is syn As used herein, the term "host cell' is intended to refer to 15 thetic. By way of illustration, a library in which certain com a cell into which a polynucleotide of the invention. It should ponents are synthetic, while other components are of natural be understood that such terms refer not only to the particular origin or derived via template-based amplification of mol Subject cell but to the progeny or potential progeny of Such a ecules of natural origin, would be encompassed by the inven cell. Because certain modifications may occur in Succeeding tion. generations due to either mutation or environmental influ The term “split-pool synthesis” refers to a procedure in ences, such progeny may not, in fact, be identical to the parent which the products of a plurality of first reactions are com cell, but are still included within the scope of the term as used bined (pooled) and then separated (split) before participating herein. in a plurality of second reactions. Example 9, describes the The term “length diversity” refers to a variety in the length synthesis of 278 DH segments (products), each in a separate of a particular nucleotide or amino acid sequence. For 25 reaction. After synthesis, these 278 segments are combined example, in naturally occurring human antibodies, the heavy (pooled) and then distributed (split) amongst 141 columns for chain CDR3 sequence varies in length, for example, from the synthesis of the N2 segments. This enables the pairing of about 3 amino acids to over about 35 amino acids, and the each of the 278 DH segments with each of the 141 N2 seg light chain CDR3 sequence varies in length, for example, ments. As described elsewhere in the specification, these from about 5 to about 16 amino acids. Prior to the instant 30 numbers are non-limiting. invention, it was known in the art that it is possible to design “Preimmune' antibody libraries have similar sequence antibody libraries containing sequence diversity or length diversities and length diversities to naturally occurring diversity (see, e.g., Hoet et al., Nat. Biotechnol., 2005, 23: human antibody sequences before these sequences have 344: Kretzschmar and von Ruden, Curr. Opin. Biotechnol. undergone negative selection or Somatic hypermutation. For 2002 13:598; and Rauchenberger et al., J. Biol. Chem., 2003 35 example, the set of sequences described in Lee et al. (Immu 278: 38194, each of which is incorporated by reference in its nogenetics, 2006, 57: 917, incorporated by reference in its entirety); however, the instant invention is directed to, at least, entirety) is believed to represent sequences from the preim the design of synthetic antibody libraries containing the mune repertoire. In certain embodiments of the invention, the sequence diversity and length diversity of naturally occurring sequences of the invention will be similar to these sequences human sequences. In some cases, synthetic libraries contain 40 (e.g., in terms of composition and length). In certain embodi ing sequence and length diversity have been synthesized, ments of the invention, Such antibody libraries are designed to however these libraries contain too much theoretical diversity be small enough to chemically synthesize and physically to synthesize the entire designed repertoire and/or too many realize, but large enough to encode antibodies with the poten theoretical members to physically realize or oversample the tial to recognize any antigen. In one embodiment of the inven entire library. 45 tion, an antibody library comprises about 107 to about 10' As used herein, a sequence designed with “directed diver different antibodies and/or polynucleotide sequences encod sity' has been specifically designed to contain both sequence ing the antibodies of the library. In some embodiments, the diversity and length diversity. Directed diversity is not sto libraries of the instant invention are designed to include 10, chastic. 10, 10, 10°, 107,108, 10°, 1010, 10, 1012, 1013, 10-, 1015, As used herein, 'stochastic” describes a process of gener 50 10", 10'7, 10", 10', or 10 different antibodies and/or ating a randomly determined sequence of amino acids, which polynucleotide sequences encoding the antibodies. In certain is considered as a sample of one element from a probability embodiments, the libraries of the invention may comprise or distribution. encode about 10 to about 10, about 10 to about 107, about The term “library of polynucleotides' refers to two or more 107 to about 10, about 10 to about 10'', about 10' to about polynucleotides having a diversity as described herein, spe 55 10', about 10' to about 10", about 10' to about 107, or cifically designed according to the methods of the invention. about 107 to about 10 different antibodies. In certain The term “library of polypeptides’ refers to two or more embodiments of the invention, the diversity of the libraries polypeptides having a diversity as described herein, specifi may be characterized as being greater than or less than one or cally designed according to the methods of the invention. The more of the diversities enumerated above, for example greater term “library of synthetic polynucleotides’ refers to a poly 60 than about 10, 10, 10, 10, 107, 10, 10, 10, 10'', 10", nucleotide library that includes synthetic polynucleotides. 10, 10, 1015, 10, 107, 1018, 10°, or 1020 or less than The term “library of vectors' refers herein to a library of at about 10, 10, 10, 10°, 107,10,10,109, 10'', 10°, 10', least two different vectors. As used herein, the term “human 10, 10, 10", 107, 10, 10', or 10°. In certain other antibody libraries.” at least includes, a polynucleotide or embodiments of the invention, the probability of an antibody polypeptide library which has been designed to represent the 65 of interest being present in a physical realization of a library sequence diversity and length diversity of naturally occurring with a size as enumerated above is at least about 0.0001%, human antibodies. 0.001%, 0.01%, 0.1%, 1%. 5%, 10%, 20%, 30%, 40%, 50%, US 8,877,688 B2 21 22 60%, 70%, 80%, 85%, 90%, 95%,99%, 99.5%, or 99.9% (see expressed human libraries, because self-reactive CDRH3s Library Sampling, in the Detailed Description, for more are removed by the donors immune system via negative information on the probability of a particular sequence being selection. present in a physical realization of a library). The antibody Libraries of the invention containing “VKCDR3” libraries of the invention may also include antibodies directed 5 sequences and “V CDR3' sequences refer to the kappa and to, for example, self (i.e., human) antigens. The antibodies of lambda sub-sets of the CDRL3 sequences, respectively. the present invention may not be present in expressed human These libraries may be designed with directed diversity, to libraries for reasons including because self-reactive antibod collectively represent the length and sequence diversity of the ies are removed by the donor's immune system via negative human antibody CDRL3 repertoire. “Preimmune' versions 10 of these libraries have similar sequence diversities and length selection. However, novel heavy/light chain pairings may in diversities to naturally occurring human antibody CDRL3 some cases create self-reactive antibody specificity (Griffiths sequences before these sequences undergo negative selec et al. U.S. Pat. No. 5,885,793, incorporated by reference in its tion. Known human CDRL3 sequences are represented in entirety). In certain embodiments of the invention, the num various data sets, including the NCBI database (see Appendix ber of unique heavy chains in a library may be about 10, 50. 15 A and Appendix B for light chain sequence data sets) and 10?, 150, 10, 10, 10, 10°, 107, 10, 10, 100, 10'', 102, Martin, Proteins, 1996, 25: 130 incorporated by reference in 10, 10, 10, 10, 107, 10, 10, 10°, or more. In its entirety. In certain embodiments of the invention, Such certain embodiments of the invention, the number of unique CDRL3 libraries are designed to be small enough to chemi light chains in a library may be about 5, 10, 25, 50, 10, 150, cally synthesize and physically realize, but large enough to 500, 10, 10, 10, 10°, 107, 10, 10, 100, 10'', 1012, 10, encode CDRL3S with the potential to recognize any antigen. 10, 1015, 10, 107, 1018, 10, 1020, or more. In one embodiment of the invention, an antibody library As used herein, the term “human antibody CDRH3 librar comprises about 10 different CDRL3 sequences and/or poly ies at least includes, a polynucleotide or polypeptide library nucleotide sequences encoding said CDRL3 sequences. In which has been designed to represent the sequence diversity some embodiments, the libraries of the instant invention are and length diversity of naturally occurring human antibodies. 25 designed to comprise about 10", 10, 10, 10, 10, 107, or 10 “Preimmune’ CDRH3 libraries have similar sequence diver different CDRL3 sequences and/or polynucleotide sequences sities and length diversities to naturally occurring human encoding said CDRL3 sequences. In some embodiments, the antibody CDRH3 sequences before these sequences undergo libraries of the invention may comprise or encode about 10' to negative selection and Somatic hypermutation. Known about 10, about 10 to about 10, or about 10 to about 10 human CDRH3 sequences are represented in various data 30 different CDRL3 sequences. In certain embodiments of the sets, including Jackson et al., J. Immunol Methods, 2007, invention, the diversity of the libraries may be characterized 324; 26: Martin, Proteins, 1996, 25: 130; and Lee et al., as being greater than or less than one or more of the diversities Immunogenetics, 2006, 57: 917, each of which is incorpo enumerated above, for example greater than about 10", 10, rated by reference in its entirety. In certain embodiments of 10, 10, 10, 10, 107, or 10 or less than about 10", 10, 10, the invention, such CDRH3 libraries are designed to be small 35 10, 10, 10, 107, or 10. In certain embodiments of the enough to chemically synthesize and physically realize, but invention, the probability of a CDRL3 of interest being large enough to encode CDRH3s with the potential to recog present in a physical realization of a library with a size as nize any antigen. In one embodiment of the invention, an enumerated above is at least about 0.0001%, 0.001%, 0.01%, antibody library includes about 10° to about 10" different 0.1%, 1%. 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, CDRH3 sequences and/or polynucleotide sequences encod 40 85%, 90%, 95%, 99%, 99.5%, or 99.9% (see Library Sam ing said CDRH3 sequences. In some embodiments, the librar pling, in the Detailed Description, for more information on ies of the instant invention are designed to about 10, 10, 10. the probability of a particular sequence being present in a 10°, 107, 10, 10, 100, 10'', 1012, 10, 10, 105, or 100, physical realization of a library). The preimmune CDRL3 different CDRH3 sequences and/or polynucleotide libraries of the invention may also include CDRL3s directed sequences encoding said CDRH3 sequences. In some 45 to, for example, self (i.e., human) antigens. Such CDRL3s embodiments, the libraries of the invention may include or may not be present in expressed human libraries, because encode about 10 to about 10, about 10° to about 10, about self-reactive CDRL3s are removed by the donor’s immune 10 to about 10', about 10' to about 10', about 10° to system via negative selection. about 10, or about 10' to about 10 different CDRH3 As used herein, the term “known heavy chain CDR3 sequences. In certain embodiments of the invention, the 50 sequences” refers to heavy chain CDR3 sequences in the diversity of the libraries may be characterized as being greater public domain that have been cloned from populations of than or less than one or more of the diversities enumerated human B cells. Examples of Such sequences are those pub above, for example greater than about 10, 10, 10, 10, 107, lished or derived from public data sets, including, for 10, 10, 10, 10'', 10°, 10, 10, 10', or 10' or less than example, Zemlin et al., JMB, 2003, 334: 733; Lee et al., about 10, 10, 10, 10°, 107,10,10,109, 10'', 10°, 10, 55 Immunogenetics, 2006, 57: 917; and Jackson et al. J. Immu 10, 10', or 10'. In certain embodiments of the invention, nol. Methods, 2007, 324; 26, each of which are incorporated the probability of a CDRH3 of interest being present in a by reference in their entirety. physical realization of a library with a size as enumerated As used herein, the term “known light chain CDR3 above is at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%, sequences’ refers to light chain CDR3 sequences (e.g., kappa 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 60 or lambda) in the public domain that have been cloned from 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, 99%, populations of human B cells. Examples of Such sequences 99.5%, or 99.9% (see Library Sampling, in the Detailed are those published or derived from public data sets, includ Description, for more information on the probability of a ing, for example, the NCBI database (see Appendices A and particular sequence being present in a physical realization of B filed herewith). a library). The preimmune CDRH3 libraries of the invention 65 As used herein the term “antibody binding regions” refers may also include CDRH3s directed to, for example, self (i.e., to one or more portions of an immunoglobulin or antibody human) antigens. Such CDRH3s may not be present in variable region capable of binding an antigens). Typically, US 8,877,688 B2 23 24 the antibody binding region is, for example, an antibody light ing character of a heavy chain variable region (CDRH1, chain (or variable region or one or more CDRs thereof), an CDRH2 and CDRH3). CDRs contribute to the functional antibody heavy chain (or variable region or one or more activity of an antibody molecule and are separated by amino CDRs thereof), a heavy chain Fd region, a combined antibody acid sequences that comprise scaffolding or framework light and heavy chain (or variable regions thereof) Such as a regions. The exact definitional CDR boundaries and lengths Fab, F(ab), single domain, or single chain antibodies (ScPV), are subject to different classification and numbering systems. or any region of a full length antibody that recognizes an CDRs may therefore be referred to by Kabat, Chothia, contact antigen, for example, an IgG (e.g., an IgG1, IgG2, IgG3, or or any other boundary definitions, including the numbering IgG4 Subtype), IgA1, IgA2, Ig), IgE, or IgM antibody. system described herein. Despite differing boundaries, each The term “framework region” refers to the art-recognized 10 of these systems has some degree of overlap in what consti portions of an antibody variable region that exist between the tutes the so called “hypervariable regions” within the variable more divergent (i.e., hypervariable) CDRs. Such framework sequences. CDR definitions according to these systems may regions are typically referred to as frameworks 1 through 4 therefore differ in length and boundary areas with respect to (FRM1, FRM2, FRM3, and FRM4) and provide a scaffold for the adjacent framework region. See for example Kabat, the presentation of the six CDRs (three from the heavy chain 15 Chothia, and/or MacCallum et al., (Kabat et al., in and three from the light chain) in three dimensional space, to “Sequences of Proteins of Immunological Interest.” 5" Edi form an antigen-binding Surface. tion, U.S. Department of Health and Human Services, 1992: 1. The term “canonical structure” refers to the main chain Chothia et al., J. Mol. Biol., 1987, 196:901; and MacCallum conformation that is adopted by the antigen binding et al., J. Mol. Biol., 1996, 262: 732, each of which is incor (CDR) loops. From comparative structural studies, it has porated by reference in its entirety). been found that five of the six antigenbinding loops have The term “amino acid' or "amino acid residue typically only a limited repertoire of available conformations. refers to an amino acid having its art recognized definition Each canonical structure can be characterized by the Such as an amino acid selected from the group consisting of torsion angles of the polypeptide backbone. Correspon alanine (Ala or A); arginine (Arg or R); asparagine (ASn or dent loops between antibodies may, therefore, have very 25 N); aspartic acid (Asp or D); cysteine (CyS or C); glutamine similar three dimensional structures, despite high amino (Glin or Q); glutamic acid (Glu or E); glycine (Gly or G); acid sequence variability in most parts of the loops histidine (H is or H); isoleucine (Ile or I): leucine (Leu or L); (Chothia and Lesk, J. Mol. Biol., 1987, 196: 901; lysine (Lys or K); methionine (Metor M); phenylalanine (Phe Chothia et al., Nature, 1989, 342: 877; Martin and or F); proline (Pro or P); serine (Ser or S); threonine (Thr or Thornton, J. Mol. Biol., 1996, 263: 800, each of which is 30 T); tryptophan (Trp or W); tyrosine (TyrorY); and valine (Val incorporated by reference in its entirety). Furthermore, or V), although modified, synthetic, or rare amino acids may there is a relationship between the adopted loop struc be used as desired. Generally, amino acids can be grouped as ture and the amino acid sequences Surrounding it. The having a nonpolar side chain (e.g., Ala, Cys, Ile, Leu, Met, conformation of a particular canonical class is deter Phe, Pro, Val); a negatively charged side chain (e.g., Asp, mined by the length of the loop and the amino acid 35 Glu); a positively charged sidechain (e.g., Arg, His, Lys); or residues residing at key positions within the loop, as well an uncharged polar side chain (e.g., ASn, Cys, Gln, Gly, His, as within the conserved framework (i.e., outside of the Met, Phe, Ser, Thr, Trp, and Tyr). loop). Assignment to a particular canonical class can The term “polynucleotide(s) refers to nucleic acids such therefore be made based on the presence of these key as DNA molecules and RNA molecules and analogs thereof amino acid residues. The term “canonical structure' 40 (e.g., DNA or RNA generated using nucleotide analogs or may also include considerations as to the linear using nucleic acid chemistry). As desired, the polynucle sequence of the antibody, for example, as catalogued by otides may be made synthetically, e.g., using art-recognized Kabat (Kabat et al., in “Sequences of Proteins of Immu nucleic acid chemistry or enzymatically using, e.g., a poly nological Interest.” 5" Edition, U.S. Department of merase, and, if desired, be modified. Typical modifications Heath and Human Services, 1992). The Kabat number 45 include methylation, biotinylation, and other art-known ing scheme is a widely adopted Standard for numbering modifications. In addition, the nucleic acid molecule can be the amino acid residues of an antibody variable domain single-stranded or double-stranded and, where desired, in a consistent manner. Additional structural consider linked to a detectable moiety. ations can also be used to determine the canonical struc The terms “theoretical diversity”, “theoretical total diver ture of an antibody. For example, those differences not 50 sity', or “theoretical repertoire” refer to the maximum num fully reflected by Kabat numbering can be described by ber of variants in a library design. For example, given an the numbering system of Chothia et al. and/or revealed amino acid sequence of three residues, where residues one by other techniques, for example, crystallography and and three may each be any one of five amino acid types and two or three-dimensional computational modeling. residue two may be any one of 20 amino acid types, the Accordingly, a given antibody sequence may be placed 55 theoretical diversity is 5x20x5–500 possible sequences. into a canonical class which allows for, among other Similarly if sequence X is constructed by combination of 4 things, identifying appropriate chassis sequences (e.g., amino acid segments, where segment 1 has 100 possible based on a desire to include a variety of canonical struc sequences, segment 2 has 75 possible sequences, segment 3 tures in a library). Kabat numbering of antibody amino has 250 possible sequences, and segment 4 has 30 possible acid sequences and structural considerations as 60 sequences, the theoretical total diversity of fragment X would described by Chothia et al., and their implications for be 100x75x200x30, or 5.6x10 possible sequences. construing canonical aspects of antibody structure, are The term “physical realization” refers to a portion of the described in the literature. theoretical diversity that can actually be physically sampled, The terms “CDR', and its plural “CDRs', refer to a for example, by any display methodology. Exemplary display complementarity determining region (CDR) of which three 65 methodology include: phage display, ribosomal display, and make up the binding character of a light chain variable region yeast display. For synthetic sequences, the size of the physical (CDRL1, CDRL2 and CDRL3) and three make up the bind realization of a library depends on (1) the fraction of the US 8,877,688 B2 25 26 theoretical diversity that can actually be synthesized, and (2) K occurs in position one in 100% of the instances and P the limitations of the particular screening method. Exemplary occurs in position three in about 67% of the instances. In limitations of Screening methods include the number of vari certain embodiments of the invention, the sequences selected ants that can be screened in a particular assay (e.g., ribosome for comparison are human immunoglobulin sequences. display, phage display, yeast display) and the transformation 5 As used herein, the term “most frequently occurring amino efficiency of a host cell (e.g., yeast, mammalian cells, bacte acids' at a specified position of a sequence in a population of ria) which is used in a screening assay. For the purposes of polypeptides refers to the amino acid residues that have the highest percent occurrence at the indicated position in the illustration, given a library with a theoretical diversity of 10' indicated polypeptide population. For example, the most fre members, an exemplary physical realization of the library quently occurring amino acids in each of the three most (e.g., in yeast, bacterial cells, ribosome display, etc.; details 10 N-terminal positions in N1 sequences of CDRH3 sequences provided below) that can maximally include 10' members that are functionally expressed by human B cells are listed in will, therefore, sample about 10% of the theoretical diversity Table 21, and the most frequently occurring amino acids in of the library. However, if less than 10' members of the each of the three most N-terminal positions in N2 sequences library with a theoretical diversity of 10° are synthesized, of CDRH3 sequences that are functionally expressed by and the physical realization of the library can maximally 15 human B cells are listed in Table 22. include 10' members, less than 10% of the theoretical diver For the purposes of analyzing the occurrence of certain sity of the library is sampled in the physical realization of the duplets (Example 13) and the information content (Example library. Similarly, a physical realization of the library that can 14) of the libraries of the invention, and other libraries, a maximally include more than 10' members would “over “central loop' of CDRH3 is defined. If the C-terminal 5 sample' the theoretical diversity, meaning that each member amino acids from Kabat CDRH3 (95-102) are removed, then may be present more than once (assuming that the entire 10' the remaining sequence is termed the “central loop'. Thus, theoretical diversity is synthesized). considering the duplet occurrence calculations of Example The term “all possible reading frames' encompasses at 13, using a CDRH3 of size 6 or less would not contribute to least the three forward reading frames and, in Some embodi the analysis of the occurrence of duplets. A CDRH3 of size 7 ments, the three reverse reading frames. 25 would contribute only to the i-i-1 data set, a CDRH3 of size The term “antibody of interest” refers to any antibody that 8 would also contribute to the i-i-2 data set, and a CDRH3 of size 9 and larger would also contribute to the i-i-3 data set. has a property of interest that is isolated from a library of the For example, a CDR H3 of size 9 may have amino acids at invention. The property of interest may include, but is not positions 95-96-97-98-99-100-100A-101-102, but only the limited to, binding to a particular antigen or epitope, blocking first four residues (bolded) would be part of the central loop a binding interaction between two molecules, or eliciting a 30 and contribute to the pair-wise occurrence (duplet) statistics. certain biological effect. As a further example, a CDRH3 of size 14 may have the The term “functionally expressed” refers to those immu sequence: 95-96-97-98-99-100-100A-100B-100C-100D noglobulingenes that are expressed by human B cells and that 100E-100E-101-102. Here, only the first nine residues do not contain premature stop codons. (bolded) contribute to the central loop. The term “full-length heavy chain” refers to an immuno 35 Library screening requires a genotype-phenotype linkage. globulin heavy chain that contains each of the canonical The term “genotype-phenotype linkage' is used in a manner structural domains of an immunoglobulin heavy chain, consistent with its art-recognized meaning and refers to the including the four framework regions, the three CDRs, and fact that the nucleic acid (genotype) encoding a protein with the constant region. The term “full-length light chain” refers a particular phenotype (e.g., binding an antigen) can be iso to an immunoglobulin light chain that contains each of the 40 lated from a library. For the purposes of illustration, an anti canonical structural domains of an immunoglobulin light body fragment expressed on the Surface of a phage can be chain, including the four framework regions, the three CDRs, isolated based on its binding to an antigen (e.g., Ladner et al.). and the constant region. The binding of the antibody to the antigen simultaneously The term “unique,” as used herein, refers to a sequence that enables the isolation of the phage containing the nucleic acid is different (e.g. has a different chemical structure) from every 45 encoding the antibody fragment. Thus, the phenotype (anti other sequence within the designed theoretical diversity. It gen-binding characteristics of the antibody fragment) has been “linked to the genotype (nucleic acid encoding the should be understood that there are likely to be more than one antibody fragment). Other methods of maintaining a geno copy of many unique sequences from the theoretical diversity type-phenotype linkage include those of Wittrup et al. (U.S. in a particular physical realization. Pat. Nos. 6,300,065, 6,331,391, 6,423,538, 6,696,251, 6,699, For example, a library comprising three unique sequences 50 658, and US Pub. No. 20040146976, each of which is incor may comprise nine total members if each sequence occurs porated by reference in its entirety), Miltenyi (U.S. Pat. No. three times in the library. However, in certain embodiments, 7,166,423, incorporated by reference in its entirety), Fandl each unique sequence may occur only once. (U.S. Pat. No. 6,919,183, US Pub No. 20060234311, each The term "heterologous moiety' is used herein to indicate incorporated by reference in its entirety), Clausell-Tormos et the addition of a composition to an antibody wherein the 55 al. (Chem. Biol., 2008, 15: 427, incorporated by reference in composition is not normally part of the antibody. Exemplary its entirety), Love et al. (Nat. Biotechnol., 2006, 24: 703, heterologous moieties include drugs, toxins, imaging agents, incorporated by reference in its entirety), and Kelly et al., and any other compositions which might provide an activity (Chem. Commun., 2007, 14: 1773, incorporated by reference that is not inherent in the antibody itself. in its entirety). Any method which localizes the antibody As used herein, the term “percent occurrence of each 60 protein with the gene encoding the antibody, in a way in amino acid residue at each position” refers to the percentage which they can both be recovered while the linkage between of instances in a sample in which an amino acid is found at a them is maintained, is Suitable. defined position within a particular sequence. For example, given the following three sequences: 2. Design of the Libraries KVR 65 KYP The antibody libraries of the invention are designed to KRP, reflect certain aspects of the preimmune repertoire as natu US 8,877,688 B2 27 28 rally created by the human immune system. Certain libraries (v) one or more H3-JH segments, based on one or more of the invention are based on rational design informed by the IGHJ segments, and one or more N-terminal trunca collection of human V. D., and J genes, and other large data tions thereof (e.g., down to XXWG); bases of human heavy and light chain sequences (e.g., pub (c) one or more selected human antibody kappa and/or licly known germline sequences; sequences from Jackson et 5 lambda light chain chassis; and al., J. Immunol. Methods, 2007, 324; 26, incorporated by (d) a CDRL3 repertoire designed based on the human reference in its entirety; sequences from Lee et al., Immuno IGLV and IGLJ germline sequences, wherein “L” may genetics, 2006, 57: 917, incorporated by reference in its be a kappa or lambda light chain. entirety; and sequences compiled for rearranged VK and The heavy chain chassis may be any sequence with homol VW see Appendices A and B filed herewith). Additional 10 ogy to Kabat residues 1 to 94 of an immunoglobulin heavy information may be found, for example, in Scaviner et al., chain variable domain. Non-limiting examples of heavy Exp. Clin. Immunogenet., 1999, 16:234: Tomlinson et al., J. chain chassis are included in the Examples, and one of ordi Mol. Biol., 1992, 227: 799; and Matsuda et al., J. Exp. Med., nary skill in the art will readily recognize that the principles 1998, 188: 2151 each incorporated by reference in its entirety. 15 presented therein, and throughout the specification, may be In certain embodiments of the invention, cassettes represent used to derive additional heavy chain chassis. ing the possible V. D., and J diversity found in the human As described above, the heavy chain chassis region is fol repertoire, as well asjunctional diversity (i.e., N1 and N2), are lowed, optionally, by a “tail” region. The tail region com synthesized de novo as single or double-stranded DNA oli prises Zero, one, or more amino acids that may or may not be gonucleotides. In certain embodiments of the invention, oli selected on the basis of comparing naturally occurring heavy gonucleotide cassettes encoding CDR sequences are intro chain sequences. For example, in certain embodiments of the duced into yeast along with one or more acceptor vectors invention, heavy chain sequences available in the art may be containing heavy or light chain chassis sequences. No primer compared, and the residues occurring most frequently in the based PCR amplification or template-directed cloning steps tail position in the naturally occurring sequences included in from mammalian cINA or mRNA are employed. Through 25 the library (e.g., to produce sequences that most closely standard homologous recombination, the recipient yeast resemble human sequences). In other embodiments, amino recombines the cassettes (e.g., CDR3s) with the acceptor acids that are used less frequently may be used. In still other vector(s) containing the chassis sequence(s) and constant embodiments, amino acids selected from any group of amino regions, to create a properly ordered synthetic, full-length acids may be used. In certain embodiments of the invention, human heavy chain and/or light chain immunoglobulin the length of the tail is zero (no residue) or one (e.g., G/D/E) library that can be genetically propagated, expressed, dis amino acid. For the purposes of clarity, and without being played, and screened. One of ordinary skill in the art will bound by theory, in the naturally occurring human repertoire, readily recognize that the chassis contained in the acceptor the first 2/3 of the codon encoding the tail residue is provided vector can be designed so as to produce constructs other than by the FRM3 region of the VH gene. The amino acid at this full-length human heavy chains and/or light chains. For 35 position in naturally occurring heavy chain sequences may example, in certain embodiments of the invention, the chassis thus be considered to be partially encoded by the IGHV gene may be designed to encode portions of a polypeptide encod (2/3) and partially encoded by the CDRH3 (/3). However, for ing an antibody fragment or subunit of an antibody fragment, the purposes of clearly illustrating certain aspects of the so that a sequence encoding an antibody fragment, or Subunit 40 invention, the entire codon encoding the tail residue (and, thereof, is produced when the oligonucleotide cassette con therefore, the amino acid derived from it) is described herein taining the CDR is recombined with the acceptor vector. as being part of the CDRH3 sequence. In certain embodiments, the invention provides a synthetic, As described above, there are two peptide segments preimmune human antibody repertoire comprising about 10 derived from nucleotides which are added by TodT in the to about 10' antibody members, wherein the repertoire com- 45 naturally occurring human antibody repertoire. These seg prises: ments are designated N1 and N2 (referred to herein as N1 and (a) selected human antibody heavy chain chassis (i.e., N2 segments, domains, regions or sequences). In certain amino acids 1 to 94 of the heavy chain variable region, embodiments of the invention, N1 and N2 are about 0, 1, 2, or using Kabat’s definition); 3 amino acids in length. Without being bound by theory, it is (b) a CDRH3 repertoire, designed based on the human 50 thought that these lengths most closely mimic the N1 and N2 IGFID and IGHJ germline sequences, the CDRH3 rep lengths found in the human repertoire (see FIG. 2). In other ertoire comprising the following: embodiments of the invention, N1 and N2 may be about 4, 5, (i) optionally, one or more tail regions; 6, 7, 8, 9, or 10 amino acids in length. Similarly, the compo (ii) one or more N1 regions, comprising about 0 to about sition of the amino acid residues utilized to produce the N1 10 amino acids selected from the group consisting of 55 and N2 segments may also vary. In certain embodiments of fewer than 20 of the amino acid types preferentially the invention, the amino acids used to produce N1 and N2 encoded by the action of terminal deoxynucleotidyl segments may be selected from amongst the eight most fre transferase (TdT) and functionally expressed by quently occurring amino acids in the N1 and N2 domains of human B cells; the human repertoire (e.g., G. R. S. P. L. A.V. and T). In other (iii) one or DH segments, based on one or more selected 60 embodiments of the invention, the amino acids used to pro IGHD segments, and one or more N- or C-terminal duce the N1 and N2 segments may be selected from the group truncations thereof; consisting offewer than about 20, 19, 18, 17, 16, 15, 14, 13, (iv) one or more N2 regions, comprising about 0 to about 12, 11, 10,9,8,7,6, 5, 4, or 3 of the amino acids preferentially 10 amino acids selected from the group consisting of encoded by the activity of TdT and functionally expressed by fewer than 20 of the amino acids preferentially 65 human B cells. Alternatively, N1 and N2 may comprise amino encoded by the activity of TdT and functionally acids selected from any group of amino acids. It is not expressed by human B cells; and required that N1 and N2 be of a similar length or composition, US 8,877,688 B2 29 30 and independent variation of the length and composition of humans. In certain embodiments of the invention, it may be N1 and N2 is one method by which additional diversity may desirable to select chassis that correspond to germline be introduced into the library. sequences that are highly represented in the peripheral blood The DH segments of the libraries are based on the peptides of humans. In other embodiments, it may be desirable to encoded by the naturally occurring IGHD gene repertoire, select chassis that correspond to germline sequences that are with progressive deletion of residues at the N- and C-termini. less frequently represented, for example, to increase the IGHD genes may be read in multiple reading frames, and canonical diversity of the library. Therefore, chassis may be peptides representing these reading frames, and their N- and selected to produce libraries that represent the largest and C-terminal deletions are also included in the libraries of the most structurally diverse group of functional human antibod invention. In certain embodiments of the invention, DH seg 10 ies. In other embodiments of the invention, less diverse chas ments as short as three amino acid residues may be included sis may be utilized, for example, if it is desirable to produce a in the libraries. In other embodiments of the invention, DH smaller, more focused library with less chassis variability and segments as short as about 1, 2,4,5,6,7, or 8 amino acids may greater CDR variability. In some embodiments of the inven be included in the libraries. tion, chassis may be selected based on both their expression in The H3-JH segments of the libraries are based on the 15 a cell of the invention (e.g., a yeast cell) and the diversity of peptides encoded by the naturally occurring IGHJ gene rep canonical structures represented by the selected sequences. ertoire, with progressive deletion of residues at the N-termi One may therefore produce a library with a diversity of nus. The N-terminal portion of the IGHJ segment that makes canonical structures that express well in a cell of the inven up part of the CDRH3 is referred to herein as H3-JH. In tion. certain embodiments of the invention, the H3-JH segment 2.1.1. Design of the Heavy Chain Chassis Sequences may be represented by progressive N-terminal deletions of In certain embodiments of the invention, the antibody one or more H3-JH residues, downto two H3-JH residues. In library comprises variable heavy domains and variable light other embodiments of the invention, the H3-JH segments of domains, or portions thereof. Each of these domains is built the library may contain N-terminal deletions (or no deletions) from certain components, which will be more fully described down to about 6, 5, 4, 3, 2, 1, or 0 H3-JH residues. 25 in the examples provided herein. In certain embodiments, the The light chain chassis of the libraries may be any sequence libraries described herein may be used to isolate fully human with homology to Kabat residues 1 to 88 of naturally occur antibodies that can be used as diagnostics and/ortherapeutics. ring light chain (Kor) sequences. In certain embodiments of Without being bound by theory, antibodies with sequences the invention, the light chain chassis of the invention are most similar or identical to those most frequently found in synthesized in combinatorial fashion, utilizing VL and JL 30 peripheral blood (for example, in humans) may be less likely segments, to produce one or more libraries of light chain to be immunogenic when administered as therapeutic agents. sequences with diversity in the chassis and CDR3 sequences. Without being bound by theory, and for the purposes of In other embodiments of the invention, the light chain CDR3 illustrating certain embodiments of the invention, the VH sequences are synthesized using degenerate oligonucleotides domains of the library may be considered to comprise three or trinucleotides and recombined with the light chain chassis 35 primary components: (1) a VH “chassis', which includes and light chain constant region, to form full-length light amino acids 1 to 94 (using Kabat numbering), (2) the chains. CDRH3, which is defined herein to include the Kabat The instant invention also provides methods for producing CDRH3 proper (positions 95-102), and (3) the FRM4 region, and using such libraries, as well as libraries comprising one or including amino acids 103 to 113 (Kabat numbering). The more immunoglobulin domains or antibody fragments. 40 Design and synthesis of each component of the claimed anti overall VH structure may therefore be depicted schematically body libraries is provided in more detail below. (not to scale) as: 2.1. Design of the Antibody Library Chassis Sequences One step in building certain libraries of the invention is the (1) ... (94) (95) ... (102) (103) ... (113) selection of chassis sequences, which are based on naturally 45 occurring variable domain sequences (e.g., IGHV and IGLV). This selection can be done arbitrarily, or by the selection of WHChassis chassis that meet certain criteria. For example, the Kabat database, an electronic database containing non-redundant The selection and design of VH chassis sequences based on rearranged antibody sequences, can be queried for those 50 the human IGHV germline repertoire will become more heavy and light chain germline sequences that are most fre apparent upon review of the examples provided herein. In quently represented. The BLAST search algorithm, or more certain embodiments of the invention, the VH chassis specialized tools such as SoDA (Volpe et al., Bioinformatics, sequences selected for use in the library may correspond to all 2006, 22:438-44, incorporated by reference in its entirety), functionally expressed human IGHV germline sequences. can be used to compare rearranged antibody sequences with 55 Alternatively, IGHV germline sequences may be selected for germline sequences, using the V BASE2 database (Retter et representation in a library according to one or more criteria. al., Nucleic Acids Res., 2005, 33: D671-D674), or similar For example, in certain embodiments of the invention, the collections of human V. D., and J genes, to identify the germ selected IGHV germline sequences may be among those that line families that are most frequently used to generate func are most highly represented among antibody molecules iso tional antibodies. 60 lated from the peripheral blood of healthy adults, children, or Several criteria can be utilized for the selection of chassis fetuses. for inclusion in the libraries of the invention. For example, In certain embodiments, it may be desirable to base the sequences that are known (or have been determined) to design of the VH chassis on the utilization of IGHV germline express poorly in yeast, or other organisms used in the inven sequences in adults, children, or fetuses with a disease, for tion (e.g., bacteria, mammalian cells, fungi, or plants) can be 65 example, an autoimmune disease. Without being bound by excluded from the libraries. Chassis may also be chosen theory, it is possible that analysis of germline sequence usage based on their representation in the peripheral blood of in the antibody molecules isolated from the peripheral blood US 8,877,688 B2 31 32 of individuals with autoimmune disease may provide infor globulins, or antibody fragments. The methods of the inven mation useful for the design of antibodies recognizing human tion may be applied to select any IGHV germline sequence antigens. that has utility in an antibody library of the instant invention. In some embodiments, the selection of IGHV germline In certain embodiments of the invention, the VH chassis of sequences for representation in a library of the invention may the libraries may comprise from about Kabat residue 1 to be based on their frequency of occurrence in the peripheral about Kabat residue 94 of one or more of the following IGHV blood. For the purposes of illustration, four IGHV1 germline germline sequences: IGHV1-2 (SEQID NO: 24), IGHV1-3 sequences (IGHV1-2 (SEQID NO:24), IGHV1-18 (SEQID (SEQ ID NO: 423), IGHV1-8 (SEQ ID NO: 424, 425), NO:25), IGHV1-46 (SEQID NO: 26), and IGHV1-69 (SEQ IGHV1-18 (SEQID NO:25), IGHV1-24 (SEQID NO:426), ID NO: 27) comprise about 80% of the IGHV1 family reper 10 toire in peripheral blood. Thus, the specific IGHV1 germline IGHV1-45 (SEQID NO:427), IGHV1-46 (SEQID NO: 26), sequences selected for representation in the library may IGHV1-58 (SEQID NO:428), IGHV1-69 (SEQID NO: 27), include those that are most frequently occurring and that IGHV2-5 (SEQID NO:429), IGHV2-26 (SEQID NO:430), cumulatively comprise at least about 80% of the IGHV1 IGHV2-70 (SEQID NO: 431,432), IGHV3-7 (SEQID NO: family repertoire found in peripheral blood. An analogous 15 28), IGHV3-9 (SEQ ID NO. 433), IGHV3-11 (SEQID NO: approach can be used to select specific IGHV germline 434), IGHV3-13 (SEQ ID NO: 435), IGHV3-15 (SEQ ID sequences from any other IGHV family (i.e., IGHV1, NO:29), IGHV3-20 (SEQID NO:436), IGHV3-21 (SEQID IGHV2, IGHV3, IGHV4, IGHV5, IGHV6, and IGHV7). The NO:437), IGHV3-23 (SEQID NO:30), IGHV3-30 (SEQID specific germline sequences chosen for representation of a NO:31), IGHV3-33 (SEQID NO:32), IGHV3-43 (SEQID particular IGHV family in a library of the invention may NO:438), IGHV3-48 (SEQID NO:33), IGHV3-49 (SEQID therefore comprise at least about 100%, 99%, 98%, 97%, NO:439), IGHV3-53 (SEQID NO: 440), IGHV3-64 (SEQ 96%95%, 94%,93%, 92%,91%.90%, 89%, 88%, 87%,86%, ID NO: 441), IGHV3-66 (SEQ ID NO: 442), IGHV3-72 85%, 84%, 83%, 82%, 81%, 80%, 75%, 70%, 65%, 60%, (SEQID NO. 443), IGHV3-73 (SEQID NO: 444), IGHV3 55%, 50%, 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, 5%, 74 (SEQ ID NO. 445), IGHV4-4 (SEQ ID NO: 446, 447), or 0% of the particular IGHV family member repertoire found 25 IGHV4-28 (SEQID NO:448), IGHV4-31 (SEQID NO:34), in peripheral blood. IGHV4-34 (SEQID NO:35), IGHV4-39 (SEQID NO:36), In some embodiments, the selected IGHV germline IGHV4-59 (SEQID NO:37), IGHV4-61 (SEQID NO:38), sequences may be chosen to maximize the structural diversity IGHV4-B (SEQ ID NO:39), IGHV5-51 (SEQ ID NO: 40), of the VH chassis library. Structural diversity may be evalu IGHV6-1 (SEQID NO:449), and IGHV7-4-1 (SEQID NO: ated by, for example, comparing the lengths, compositions, 30 450). In some embodiments of the invention, a library may and canonical structures of CDRH1 and CDRH2 in the IGHV contain one or more of these sequences, one or more allelic germline sequences. In human IGHV sequences, the CDRH1 variants of these sequences, or encode an amino acid (Kabat definition) may have a length of 5, 6 or 7 amino acids, sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, while CDRH2 (Kabat definition) may have length of 16, 17. 97.5%, 97%, 96.5%,96%. 95.5%, 95%, 94.5%, 94%, 93.5%, 18 or 19 amino acids. The amino acid compositions of the 35 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, IGHV germline sequences and, in particular, the CDR 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, domains, may be evaluated by sequence alignments, as pre 73.5%, 70%. 65%, 60%, 55%, or 50% identical to one or sented in the Examples. Canonical structure may be assigned, more of these sequences. for example, according to the methods described by Chothia In other embodiments, the VH chassis of the libraries may et al., J. Mol. Biol., 1992, 227: 799, incorporated by reference 40 comprise from about Kabat residue 1 to about Kabat residue in its entirety. 94 of the following IGHV germline sequences: IGHV1-2 In certain embodiments of the invention, it may be advan (SEQID NO. 24), IGHV1-18 (SEQID NO: 25), IGHV1-46 tageous to design VH chassis based on IGHV germline (SEQ ID NO: 26), IGHV1-69 (SEQ ID NO: 27), IGHV3-7 sequences that may maximize the probability of isolating an (SEQ ID NO: 28), IGHV3-15 (SEQID NO: 29), IGHV3-23 antibody with particular characteristics. For example, with 45 (SEQ ID NO:30), IGHV3-30 (SEQID NO:31), IGHV3-33 out being bound by theory, in Some embodiments it may be (SEQ ID NO:32), IGHV3-48 (SEQID NO:33), IGHV4-31 advantageous to restrict the IGHV germline sequences to (SEQID NO:34), IGHV4-34 (SEQID NO:35), IGHV4-39 include only those germline sequences that are utilized in (SEQ ID NO:36), IGHV4-59 (SEQID NO:37), IGHV4-61 antibodies undergoing clinical development, or antibodies (SEQID NO:38), IGHV4-B (SEQID NO:39), and IGHV5 that have been approved as therapeutics. On the other hand, in 50 51 (SEQID NO: 40). In some embodiments of the invention, Some embodiments, it may be advantageous to produce a library may contain one or more of these sequences, one or libraries containing VH chassis that are not represented more allelic variants of these sequences, or encode an amino amongst clinically utilized antibodies. Such libraries may be acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, capable of yielding antibodies with novel properties that are 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, advantageous over those obtained with the use of “typical 55 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, IGHV germline sequences, or enabling studies of the struc 88%, 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, tures and properties of "atypical IGHV germline sequences 75%, 73.5%, 70%. 65%, 60%, 55%, or 50% identical to one or canonical structures. or more of these sequences. The amino acid sequences of One of ordinary skill in the art will readily recognize that a these chassis are presented in Table 5. variety of other criteria can be used to select IGHV germline 60 2.1.1.1. Heavy Chain Chassis Variants sequences for representation in a library of the invention. Any While the selection of the VH chassis with sequences based of the criteria described herein may also be combined with on the IGHV germline sequences is expected to Support a any other criteria. Further exemplary criteria include the abil large diversity of CDRH3 sequences, further diversity in the ity to be expressed at sufficient levels in certain cell culture VH chassis may be generated by altering the amino acid systems, solubility in particular antibody formats (e.g., whole 65 residues comprising the CDRH1 and/or CDRH2 regions of immunoglobulins and antibody fragments), and the thermo each chassis selected for inclusion in the library (see Example dynamic stability of the individual domains, whole immuno 2). US 8,877,688 B2 33 34 In certain embodiments of the invention, the alterations or Example 2 illustrates the application of this method to mutations in the amino acid residues comprising the CDRH1 heavy chains derived from a particular IGHV germline. One and CDRH2 regions, or other regions, of the IGHV germline of ordinary skill in the art will readily recognize that this sequences are made after analyzing the sequence identity method can be applied to any germline sequence, and can be within data sets of rearranged human heavy chain sequences 5 used to generate at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, that have been classified according to the identity of the 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, original IGHV germline sequence from which the rearranged 50, 55,60, 65,70, 75,80, 85,90, 95, 100, 1000, 10, 10, 10°, sequences are derived. For example, from a set of rearranged or more variants of each heavy chain chassis. antibody sequences, the IGHV germline sequence of each 2.1.2. Design of the Light Chain Chassis Sequences antibody is determined, and the rearranged sequences are 10 The light chain chassis of the invention may be based on classified according to the IGHV germline sequence. This kappa and/or lambda light chain sequences. The principles determination is made on the basis of sequence identity. underlying the selection of light chain variable (IGLV) ger Next, the occurrence of any of the 20 amino acid residues mline sequences for representation in the library are analo at each position in these sequences is determined. In certain gous to those employed for the selection of the heavy chain embodiments of the invention, one may be particularly inter 15 sequences (described above and in Examples 1 and 2). Simi ested in the occurrence of differentamino acid residues at the larly, the methods used to introduce variability into the positions within CDRH1 and CDRH2, for example if increas selected heavy chain chassis may also be used to introduce ing the diversity of the antigen-binding portion of the VH variability into the light chain chassis. chassis is desired. In other embodiments of the invention, it Without being bound by theory, and for the purposes of may be desirable to evaluate the occurrence of different illustrating certain embodiments of the invention, the VL amino acid residues in the framework regions. Without being domains of the library may be considered to comprise three bound by theory, alterations in the framework regions may primary components: (1) a VL “chassis', which includes impact antigen binding by altering the spatial orientation of amino acids 1 to 88 (using Kabat numbering), (2) the the CDRS. VLCDR3, which is defined herein to include the Kabat After the occurrence of amino acids at each position of 25 CDRL3 proper (positions 89-97), and (3) the FRM4 region, interest has been identified, alterations may be made in the including amino acids 98 to 107 (Kabat numbering). The VH chassis sequence, according to certain criteria. In some overall VL structure may therefore be depicted schematically embodiments, the objective may be to produce additional VH (not to scale) as: chassis with sequence variability that mimics the variability observed in the heavy chain domains of rearranged human 30 antibody sequences (derived from respective IGHV germline sequences) as closely as possible, thereby potentially obtain ing sequences that are most human in nature (i.e., sequences that most closely mimic the composition and length of human sequences). In this case, one may synthesize additional VH 35 In certain embodiments of the invention, the VL chassis of chassis sequences that include mutations naturally found at a the libraries include one or more chassis based on IGKV particular position and include one or more of these VH germline sequences. In certain embodiments of the invention, chassis sequences in a library of the invention, for example, at the VL chassis of the libraries may comprise from about a frequency that mimics the frequency found in nature. In Kabat residue 1 to about Kabat residue 88 of one or more of another embodiment of the invention, one may wish to 40 the following IGKV germline sequences: IGKV1-05 (SEQ include VH chassis that represent only mutations that most ID NO: 229), IGKV1-06 (SEQ ID NO: 451), IGKV1-08 frequently occur at a given position in rearranged human (SEQ ID NO: 452, 453), IGKV1-09 (SEQ ID NO:454), antibody sequences. For example, rather than mimicking the IGKV1-12 (SEQ ID NO: 230), IGKV1-13 (SEQ ID NO: human variability precisely, as described above, and with 455), IGKV1-16 (SEQ ID NO. 456), IGKV1-17 (SEQ ID reference to exemplary Tables 6 and 7, one may choose to 45 NO:457), IGKV1-27 (SEQ ID NO. 231), IGKV1-33 (SEQ include only top 19, 18, 17, 16, 15, 14, 13, 12, 11, 10,9,8,7, ID NO. 232), IGKV1-37 (SEQID NOs: 458, 459), IGKV1 6, 5, 4, 3, 2, or 1, amino acid residues that most frequently 39 (SEQ ID NO. 233), IGKV1D-16 (SEQ ID NO: 460), occur at each position. For the purposes of illustration, and IGKV1D-17 (SEQID NO:461), IGKV1D-43 (SEQID NO: with reference to Table 6, if one wished to include the top four 462), IGKV1D-8 (SEQID NOs: 463,464), IGKV2-24 (SEQ most frequently occurring amino acid residues at position 31 50 ID NO: 465), IGKV2-28 (SEQ ID NO. 234), IGKV2-29 of the VH1-69 sequence, then position 31 in the VH1-69 (SEQID NO: 466), IGKV2-30 (SEQID NO: 467), IGKV2 sequence would be varied to include S. N. T. and R. Without 40 (SEQ ID NO: 468), IGKV2D-26 (SEQ ID NO: 469), being bound by theory, it is thought that the introduction of IGKV2D-29 (SEQID NO:470), IGKV2D-30 (SEQID NO: diversity by mimicking the naturally occurring composition 471), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID of the rearranged heavy chain sequences is likely to produce 55 NO: 236), IGKV3-20 (SEQID NO:237), IGKV3D-07 (SEQ antibodies that are most human in composition. However, the ID NO:472), IGKV3D-11 (SEQID NO: 473), IGKV3D-20 libraries of the invention are not limited to heavy chain (SEQID NO.474), IGKV4-1 (SEQID NO: 238), IGKV5-2 sequences that are diversified by this method, and any criteria (SEQID NOs:475,476), IGKV6-21 (SEQID NOs:477), and can be used to introduce diversity into the heavy chain chas IGKV6D-41. In some embodiments of the invention, a library sis, including random or rational mutagenesis. For example, 60 may contain one or more of these sequences, one or more in certain embodiments of the invention, it may be preferable allelic variants of these sequences, or encode an amino acid to Substitute neutral and/or Smaller amino acid residues for sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, those residues that occur in the IGHV germline sequence. 97.5%, 97%, 96.5%,96%. 95.5%, 95%, 94.5%, 94%, 93.5%, Without being bound by theory, neutral and/or smaller amino 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, 88%, acid residues may provide a more flexible and less sterically 65 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, hindered context for the display of a diversity of CDR 73.5%, 70%. 65%, 60%, 55%, or 50% identical to one or Sequences. more of these sequences. US 8,877,688 B2 35 36 In other embodiments, the VL chassis of the libraries may incorporated by reference in its entirety). It is also known that comprise from about Kabat residue 1 to about Kabat residue both the DH region and the N1/N2 regions contribute to the 88 of the following IGKV germline sequences: IGKV1-05 CDRH3 functional diversity (Schroeder et al., J. Immunol. (SEQID NO: 229), IGKV1-12 (SEQID NO: 230), IGKV1 2005, 174: 7773 and Mathis et al., Eur J. Immunol., 1995, 25: 27 (SEQ ID NO. 231), IGKV1-33 (SEQ ID NO. 232), 3.115, each of which is incorporated by reference in its IGKV1-39 (SEQ ID NO. 233), IGKV2-28 (SEQ ID NO: entirety). For the purposes of the present invention, the 234), IGKV3-11 (SEQ ID NO: 235), IGKV3-15 (SEQ ID CDHR3 region of naturally occurring human antibodies can NO: 236), IGKV3-20 (SEQ ID NO. 237), and IGKV4-1 be divided into five segments: (1) the tail segment, (2) the N1 (SEQID NO. 238). In some embodiments of the invention, a segment, (3) the DH segment, (4) the N2 segment, and (5) the library may contain one or more of these sequences, one or 10 JH segment. As exemplified below, the tail, N1 and N2 seg more allelic variants of these sequences, or encode an amino ments may or may not be present. acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, In certain embodiments of the invention, the method for 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 94.5%, 94%, selecting amino acid sequences for the synthetic CDRH3 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, 89%, libraries includes a frequency analysis and the generation of 88%, 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 15 the corresponding variability profiles of existing rearranged 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one antibody sequences. In this process, which is described in or more of these sequences. The amino acid sequences of more detail in the Examples section, the frequency of occur these chassis are presented in Table 11. rence of a particular amino acid residue at a particular posi In certain embodiments of the invention, the VL chassis of tion within rearranged CDRH3s (or any other heavy or light the libraries include one or more chassis based on IGWV chain region) is determined. Amino acids that are used more germline sequences. In certain embodiments of the invention, frequently in nature may then be chosen for inclusion in a the VL chassis of the libraries may comprise from about library of the invention. Kabat residue 1 to about Kabat residue 88 of one or more of 2.2.1. Design and Selection of the DH Segment Repertoire the following IGWV germline sequences: IG) V3-1 (SEQ ID In certain embodiments of the invention, the libraries con NO:535), IGV3-21 (SEQIDNO:537), IGV2-14 (SEQID 25 tain CDRH3 regions comprising one or more segments NO:534), IGV1-40 (SEQID NO:531), IG) V3-19 (SEQID designed based on the IGHD gene germline repertoire. In NO:536), IGV1-51 (SEQID NO:533), IG) V1-44 (SEQID some embodiments of the invention, DH segments selected NO:532), IGNV6-57 (SEQID NO: 539), IGV2-8, IGV3 for inclusion in the library are selected and designed based on 25, IGV2-23, IG) V3-10, IGV4-69 (SEQ ID NO: 538), the most frequent usage of human IGHD genes, and progres IGV1-47, IGV2-11, IGV7-43 (SEQ ID NO: 541), 30 sive N-terminal and C-terminal deletions thereof, to mimic IGV7-46, IGV5-45 (SEQ ID NO. 540), IGV4-60, the in vivo processing of the IGHD gene segments. In some IGV10-54 (SEQ ID NO: 482), IGV8-61 (SEQ ID NO: embodiments of the invention, the DH segments of the library 499), IGV3-9 (SEQID NO: 494), IG) V1-36 (SEQID NO: are about 3 to about 10 amino acids in length. In some 480), IGV2-18 (SEQID NO:485), IG) V3-16 (SEQID NO: embodiments of the invention, the DH segments of the library 491), IGV3-27 (SEQID NO:493), IGV4-3 (SEQID NO: 35 are about 0,1,2,3,4,5,6,7,8,9, or 10amino acids in length, 495), IGV5-39 (SEQID NO:497), IGV9-49 (SEQID NO: or a combination thereof. In certain embodiments, the librar 500), and IGV3-12 (SEQ ID NO: 490). In some embodi ies of the invention may contain DH segments with a wide ments of the invention, a library may contain one or more of distribution of lengths (e.g., about 0 to about 10amino acids). these sequences, one or more allelic variants of these In other embodiments, the length distribution of the DH may sequences, or encode an amino acid sequence at least about 40 be restricted (e.g., about 1 to about 5 amino acids, about 3 99.9%, 99.5%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, amino acids, about 3 and about 5 amino acids, and so on). In 91%, 90%, 85%, 80%, 75%, 70%. 65%, 60%, 55%, or 50% certain embodiments of the library, the shortest DH segments identical to one or more of these sequences. may be about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino acids. In other embodiments, the VL chassis of the libraries may In certain embodiments of the invention, libraries may comprise from about Kabat residue 1 to about Kabat residue 45 contain DH segments representative of any reading frame of 88 of the following IG. V. germline sequences: IG) V3-1 any IGHD germline sequence. In certain embodiments of the (SEQID NO:535), IGNV3-21 (SEQID NO:537), IGV2-14 invention, the DH segments selected for inclusion in a library (SEQID NO:534), IGV1-40 (SEQID NO:531), IGV3-19 include one or more of the following IGHD sequences, or (SEQID NO:536), IGV1-51 (SEQID NO:533), IGV1-44 their derivatives (i.e., any reading frame and any degree of (SEQID NO:532), IGV6-57 (SEQID NO:539), IGV4-69 50 N-terminal and C-terminal truncation): IGHD3-10 (SEQ ID (SEQ ID NO: 538), IGV7-43 (SEQ ID NO: 541), and NOs: 1-3), IGHD3-22 (SEQ ID NOs: 239, 4, 240), IGHD6 IGV5-45 (SEQ ID NO. 540). In some embodiments of the 19 (SEQID NOs: 5, 6, 241), IGHD6-13 (SEQID NOs: 7, 8, invention, a library may contain one or more of these 242), IGHD3-3 (SEQID NOs: 243, 244, 9), IGHD2-2 (SEQ sequences, one or more allelic variants of these sequences, or ID NOs: 245, 10, 11), IGHD4-17 (SEQ ID NOs: 246, 12, encode an amino acid sequence at least about 99.9%, 99.5%, 55 247), IGHD1-26 (SEQID NOs: 13, 248 and 14), IGHD5-5/ 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 5-18 (SEQID NOs: 249,250, 15), IGHD2-15 (SEQID NOs: 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identical to 251, 16, 252), IGHD6-6 (encoded by SEQ ID NO: 515), one or more of these sequences. The amino acid sequences of IGHD3-9 (encoded by SEQ ID NO. 509), IGHD5-12 (en these chassis are presented in Table 14. coded by SEQID NO: 512), IGHD5-24 (encoded by SEQID 2.2. Design of the Antibody Library CDRH3 Components 60 NO: 513), IGHD2-21 (encoded by SEQ ID NOs: 505 and It is known in the art that diversity in the CDR3 region of 506), IGHD3-16 (encoded by SEQID NO. 508), IGHD4-23 the heavy chain is sufficient for most antibody specificities (encoded by SEQID NO. 510), IGHD1-1 (encoded by SEQ (Xu and Davis, Immunity, 2000, 13: 27-45, incorporated by ID NO: 501), IGHD1-7 (encoded by SEQ ID NO. 504), reference in its entirety) and that existing Successful libraries IGHD4-4/4-11 (encoded by SEQ ID NO: 511), IGHD1-20 have been created using CDRH3 as the major source of diver 65 (encoded by SEQ ID NO. 503), IGHD7-27, IGHD2-8, and sification (Hoogenboom et al., J. Mol. Biol., 1992, 227: 381: IGHD6-25. In some embodiments of the invention, a library Lee et al., J. Mol. Biol., 2004, 340: 1073 each of which is may contain one or more of these sequences, allelic variants US 8,877,688 B2 37 38 thereof, or encode an amino acid sequence at least about TABLE 1 - continued 99.9%, 99.5%, 99%,98.5%, 98%,97.5%, 97%,96.5%, 96%, Example of Progressive N- and C-terminal Deletions 95.5%, 95%, 94.5%, 94%, 93.5%,93%, 92.5%, 92%, 91.5%, of Reading Frame for Gene IGHD3-10, Yielding 91%, 90.5%, 90%, 89%, 88%, 87%. 86%, 85%, 84%, 83%, DH. Segments 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, 5 or 50% identical to one or more of these sequences. DH SEQ ID NO: For the purposes of illustration, progressive N-terminal GELL 61.2 and C-terminal deletions of IGHD3-10, reading frame 1, are GEL enumerated in the Table 1. N-terminal and C-terminal dele 10 tions of other IGHD sequences and reading frames are also ELL encompassed by the invention, and one of ordinary skill in the art can readily determine these sequences using, for example, the non-limiting exemplary data presented in Table 16. and/or In certain embodiments of the invention, the DH segments the methods outlined above. Table 18 (Example 5) enumer 15 selected for inclusion in a library include one or more of the ates certain DH segments used in certain embodiments of the following IGHD sequences, or their derivatives (i.e., any invention. reading frame and any degree N-terminal and C-terminal truncation): IGHD3-10 (SEQID NOs: 1-3), IGHD3-22 (SEQ TABLE 1. ID NOs: 239, 4, 240), IGHD6-19 (SEQ ID NOs: 5, 6, 241), IGHD6-13 (SEQ ID NOs: 7, 8, 242), IGHD3-03 (SEQ ID Example of Progressive N- and C-terminal Deletions NOs: 243, 244, 9), IGHD2-02 (SEQ ID NOs: 245, 10, 11), of Reading Frame 1 for Gene IGHD3-10, Yielding IGHD4-17 (SEQID NOs: 246, 12, 247), IGHD1-26 (SEQID DH. Segments NOs: 13, 248 and 14), IGHD5-5/5-18 (SEQ ID NOs: 249, DH SEO ID NO: 250, 15), and IGHD2-15 (SEQ ID NOs: 251, 16, 252). In 25 Some embodiments of the invention, a library may contain WLLWFGELL 1. one or more of these sequences, allelic variants thereof, or WLLWFGEL 593 encode an amino acid sequence at least about 99.9%, 99.5%, 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, WLLWFGE 594 94.5%, 94%, 93.5%,93%, 92.5%, 92%, 91.5%, 91%,90.5%, WLLLWFG 595 30 90%, 89%, 88%, 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, 77.5%, 75%, 73.5%, 70%. 65%, 60%, 55%, or 50% WLLWF 596 identical to one or more of these sequences. WLLW. 597 In certain embodiments of the invention, the DH segments WLL 35 selected for inclusion in a library include one or more of the following IGHD sequences, wherein the notation x' LLWFGELL 598 denotes the reading frame of the gene, or their derivatives LLWFGEL 5.99 (i.e., any degree of N-terminal or C-terminal truncation): IGHD1-26 1 (SEQIDNO: 13), IGHD1-26 3 (SEQID NO: LLWFGE 6 OO 40 14), IGHD2-2 2 (SEQ ID NO: 10), IGHD2-2 3 (SEQ ID LLWFG 6O1 NO: 11), IGHD2-15 2 (SEQ ID NO: 16), IGHD3-3 3 (SEQ ID NO: 9), IGHD3-10 1 (SEQ ID NO: 1), IGHD3 LLWF 6 O2 10 2 (SEQ ID NO: 2), IGHD3-10 3 (SEQ ID NO. 3), IGHD3-22 2 (SEQID NO: 4), IGHD4-17 2 (SEQID NO: LLW. 45 12), IGHD5-5 3 (SEQID NO: 15), IGHD6-13 1 (SEQ ID LWFGELL 603 NO: 7), IGHD6-13 2 (SEQID NO: 8), IGHD6-19 1 (SEQ ID NO. 5), and IGHD6-19 2 (SEQ ID NO: 6). In some LWFGEL 604 embodiments of the invention, a library may contain one or LWFGE 605 more of these sequences, allelic variants thereof, or encode an 50 amino acid sequence at least about 99.9%, 99.5%, 99%, LWFG 6 O6 98.5%, 98%,97.5%, 97%, 96.5%,96%. 95.5%, 95%, 94.5%, 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, LWF 89%, 88%, 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, WFGELL 6. Of 55 77.5%, 75%, 73.5%, 70%, 65%, 60%, 55%, or 50% identical to one or more of these sequences. WFGEL 608 In certain embodiments of the invention, the libraries are designed to reflect a pre-determined length distribution of N WFGE 609 and C-terminal deleted IGHD segments. For example, in WFG 60 certain embodiments of the library, the DH segments of the library may be designed to mimic the natural length distribu FGELL 610 tion of DH segments found in the human repertoire. For example, the relative occurrence of different IGHD segments FGEL 611 in rearranged human antibody heavy chain domains from Lee FGE 65 et al. (Immunogenetics, 2006, 57: 917, incorporated by ref erence in its entirety). Table 2 shows the relative occurrence of the top 68% of IGHD segments from Lee et al. US 8,877,688 B2 39 40 TABLE 2 ignored and the progressive deletion of the IGHD gene seg ments can occur as for any other segments, regardless of the Relative Occurrence of Top 68% of IGHD Gene Usage presence of unpaired cysteine residues. In still other embodi from Lee et al. ments of the invention, the cysteine residues can be mutated GHD Reading Sequence Relative to any other amino acid. Frame (Parent) SEQ ID NO : Occurrence 2.2.1.2. Design and Selection of DH Segments from Non Human Vertebrates GHD3-10 1 WLLWFGELL 1. 4.3 & In certain embodiment of the invention, DH segments from GHD3-10 2 YYYGSGSYYN 2 8. 4& non-human vertebrates may be used in conjunction with 10 human VH, N1, N2, and H3-JH segments to produce GHD3-10 3 ITMVRGWII 3 4. O. CDRH3s and/or antibodies in which all segments except the DH segment are synthesized with reference to human GHD3-22 2 YYYDSSGYYY 4. 15 68 sequences. Without being bound by theory, it is anticipated GHD6-191 GYSSGWY 5 7.43 that the extensive variability in the DH segment of antibodies, 15 for example as the result of Somatic hypermutation, may GHD6-19 2 GIAWAG 6 6. O. make this region more permissive to the inclusion of GHD6-13 1. GYSSSWY 7 8. 4& sequences that have non-human characteristics, without sac rificing the ability to recognize a broad variety of antigens or GHD6-13 2 GIAAAG 8 5.3% introducing immunogenic sequences. GHD3-33 ITIFGWWII 9 7.43 The general methods taught hereinare readily applicable to information derived from species other than humans. GHD2-22 GYCSSTSCYT 10 5.2% Example 16 presents exemplary DH segments from a variety of species and outlines methods for their inclusion in the GHD2-2 3 DIWWWPAAM 11 4.18 libraries of the invention. These methods may be readily GHD4-17 2 DYGDY 12 6.8% 25 applied to information derived from other species and/or Sources of information other than those presented in Example GHD1-26 1 GIWGATT 13 2.93. 16. For example, as IGHD sequence data becomes available for additional species (e.g., as a result of focused sequencing GHD1-26 3 YSGSYY 14 4.3 & efforts), one of ordinary skill in the art could use the teachings GHD5-5 3 GYSYGY 15 4.3 & 30 of this application to construct libraries with DH segments derived from these species. GHD2-15 2 GYCSGGSCYS 16 5.6% In certain embodiments of the invention, a library may contain one or more DH segments derived from the IGHD In certain embodiments, these relative occurrences may be genes presented in Table 55. As further enumerated in used to design a library with DH prevalence that is similar to 35 Example 16, these sequences can be selected according to one the IGHD usage found in peripheral blood. In other embodi or more non-limiting criteria, including diversity in length ments of the invention, it may be preferable to bias the library and sequence, maximal (or minimal) human'string content.” toward longer or shorter DH segments, or DH segments of a and/or the absence or minimization of T cell epitopes. Like particular composition. In other embodiments, it may be the human IGHD sequences discussed elsewhere in the appli desirable to use all DH segments selected for the library in 40 cation, the non-human IGHD segments of the invention may equal proportion. be deleted at their N- and/or C-termini to provide DH seg In certain embodiments of the invention, the most com ments with a minimal length of 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 monly used reading-frames of the ten most frequently occur amino acids. The length distribution, reading frame, and fre ring IGHD sequences are utilized, and progressive N-termi quency of inclusion of the non-human DH segments selected nal and C-terminal deletions of these sequences are made, 45 for inclusion in the library may be varied as presented for the thus providing a total of 278 non-redundant DH segments that human DH segments. Non-human DH segments include are used to create a CDRH3 repertoire of the instant invention those derived from non-human IGHD genes according to the (Table 18). In some embodiments of the invention, the meth methods presented herein, allelic variants thereof, and amino ods described above can be applied to produce libraries com acid and nucleotide sequence at least about 99.9%, 99.5%, prising the top 1,2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 50 99%, 98.5%, 98%, 97.5%, 97%, 96.5%, 96%, 95.5%, 95%, 17, 18, 19, 20, 21, 22, 23, 24, or 25 expressed IGHD 94.5%, 94%, 93.5%,93%, 92.5%, 92%, 91.5%, 91%,90.5%, sequences, and progressive N-terminal and C-terminal dele 90%, 89%, 88%, 87%. 86%, 85%, 84%, 83%, 82%, 81%, tions thereof. As with all other components of the library, 80%, 77.5%, 75%, 73.5%, 70%. 65%, 60%, 55%, or 50% while the DH segments may be selected from among those identical to one or more of these sequences. that are commonly expressed, it is also within the scope of the 55 IGHD segments may be obtained from multiple species, invention to select these gene segments based on the fact that including camel, shark, mouse, rat, llama, fish, rabbit, and so they are less commonly expressed. This may be advanta on. Non-limiting exemplary species from which IGHD seg geous, for example, in obtaining antibodies toward self-anti ments may be obtained include Mus musculus, Camelus sp., gens or in further expanding the diversity of the library. Alter Llama sp., Camelidae sp., Raja sp., Gingly mostoma sp., Car natively, DH segments can be used to add compositional 60 charhinus sp., Heterodontus sp., Hydrolagus sp., Ictalurus diversity in a manner that is strictly relative to their occur sp., Gallus sp., Bos sp., Marmaronetta sp., Aythya sp., Netta rence in actual human heavy chain sequences. sp., Equus sp., Pentalagus sp., Bunolagus sp., Nesolagus sp., In certain embodiments of the invention, the progressive Romerolagus sp., Brachylagus sp., Sylvilagus sp., Oryctola deletion of IGHD genes containing disulfide loop encoding gus, sp., Poelagus sp., Ovis sp., Sus sp., Gadus sp., Salmo sp., segments may be limited, so as to leave the loop intact and to 65 Oncorhynchus sp., Macaca sp., Rattus sp., Pan sp., Hexian avoid the presence of unpaired cysteine residues. In other chus sp., Heptranchias sp., Notorvinchus sp., Chlamydosela embodiments of the invention, the presence of the loop can be chus sp., Heterodontus sp. Pristiophorus sp., Pliotrema sp., US 8,877,688 B2 41 42 Squatina sp., Carcharia sp., Mitsukurina sp., Lamma sp., YYYYGMDV (SEQ ID NO: 588), YYYGMDV (SEQ ID Isurus sp., Carcharodon sp., Cetorhinus sp., Alopias sp., NO:589),YYGMDV (SEQID NO:590), YGMDV (SEQID Nebrius sp., Stegostoma sp., Orectolobus sp., Eucrossorhinus NO:591), GMDV (SEQID NO:592), MDV, and DV. In some sp., Sutorectus sp., Chiloscyllium sp., Hemiscyllium sp., embodiments of the invention, a library may contain one or Brachaelurus sp., Heteroscyllium sp., Cirrhoscyllium sp., 5 more of these sequences, allelic variations thereof, or encode Parascyllium sp., Rhincodon sp., Apristurus sp., Atelo an amino acid sequence at least about 99.9%, 99.5%, 99%, mycterus sp., Cephaloscyllium sp., Cephalurus sp., Dichich 98.5%, 98%,97.5%, 97%, 96.5%,96%. 95.5%, 95%, 94.5%, thy's sp., Galeus sp., Halaelurus sp., Haploblepharus sp., 94%, 93.5%, 93%, 92.5%, 92%, 91.5%, 91%, 90.5%, 90%, Parmaturus sp., Pentanchus sp., Poroderna sp., Schroeder 89%, 88%, 87%. 86%, 85%, 84%, 83%, 82%, 81%, 80%, ichthys sp., Scyliorhinus sp., Pseudotriakis sp., Scylliogaleus 10 77.5%, 75%, 73.5%, 70%. 65%, 60%, 60%, 55%, or 50% sp., Furgaleus sp., Hemitriakis sp., Mustelus sp., Triakis sp., identical to one or more of these sequences. Iago sp., Galeorhinus sp., Hypogaleus sp., Chaenogaleus sp., In other embodiments of the invention, the H3-JH segment Hemigaleus sp., Paragaleus sp., Galeocerdo sp., Prionace may comprise about 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or more amino sp., Sciolodon sp., Loxodon sp., Rhizoprionodon sp., Aprion acids. For example, the H3-JH segment of JH1 4 (Table 20) Odon sp., Negaprion sp., Hypoprion sp., Carcharhinus sp., 15 has a length of three residues, while non-deleted JH6 has an Isogomphodon sp., Triaenodon sp., Sphyrna sp., Echinorhi H3-JH segment length of nine residues. The FRM4-JH nus sp., Oxynotus sp., Squalus sp., Centroscyllium sp., region of the IGHJ segment begins with the sequence WG (Q/ Etmopterus sp., Centrophorus sp., Cirrhigaleus sp., Deania R)G (SEQID NO. 23) and corresponds to the portion of the sp., Centroscymnus sp., Scymnodon sp., Dallatias sp., Eupro IGHJ segment that makes up part of framework 4. In certain tomicrus sp., Isistius sp., Squaliolus sp., Heteroscymnoides 20 embodiments of the invention, as enumerated in Table 20, sp., Somniosus sp. and Megachasma sp. there are 28 H3-JH segments that are included in a library. In Publications discussing IGHD segments from additional certain other embodiments, libraries may be produced by species and/or methods of obtaining Such segments include, utilizing about 1,2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, for example, Ye. Immunogenetics, 2004, 56: 399; De Genstet 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 of the al., Dev. Comp. Immunol., 2006, 30: 187: Dooley and Fla- 25 IGHJ segments enumerated above or in Table 20. jnik, Dev. Comp. Immunol. 2006, 30:43; Bengton et al., Dev. 2.2.3. Design and Selection of the N1 and N2 Segment Rep Comp. Immunol., 2006, 30: 77: Ratcliffe, Dev. Comp. Immu ertoires nol., 2006, 30: 101; Zhao et al., Dev. Comp. Immunol., 2006, Terminal deoxynucleotidyl transferase (TdT) is a highly 30: 175; Lundqvist et al., Dev. Comp. Immunol., 2006, 30: conserved enzyme from vertebrates that catalyzes the attach 93: Wagner, Dev. Comp. Immunol. 2006, 30: 155; Mageet 30 ment of 5' triphosphates to the 3' hydroxyl group of single- or al., Dev. Comp. Immunol., 2006, 30: 137; Malecek et al., J. double-stranded DNA. Hence, the enzyme acts as a template Immunol., 2005, 175: 8105: Jenne et al., Dev. Comp. Immu independent polymerase (Koiwai et al., Nucleic Acids Res., nol., 2006,30: 165; Butleret al., Dev. Comp. Immunol., 2006, 1986, 14: 5777; Basu et al., Biochem. Biophys. Res. Comm., 30: 199: Solem et al., Dev. Comp. Immunol., 2006, 30: 57: 1983, 111: 1105, each incorporated by reference in its Das et al., Immunogenetics, 2008, 60: 47, and Kiss et al., 35 entirety). In vivo, TclT is responsible for the addition of nucle Nucleic Acids Res., 2006, 34: e132, each of which is incor otides to the V-D and D-Jjunctions of antibody heavy chains porated by reference in its entirety. (Alt and Baltimore, PNAS, 1982, 79: 4118; Collins et al., J. Given the degree of variability in N1 and N2, these seg Immunol., 2004, 172: 340, each incorporated by reference in ments might also be considered possible regions for Substi its entirety). Specifically, TaT is responsible for creating the tution with non-human sequences, that is, sequences with 40 N1 and N2 (non-templated) segments that flank the D (diver composition biases not arising from those of human terminal sity) region. deoxynucleotide transferase. The methods taught herein for In certain embodiments of the invention, the length and the identification and analysis of the N1 and N2 regions of composition of the N1 and N2 segments are designed ratio human antibodies are also readily applicable to non-human nally, according to statistical biases in amino acid usage antibodies. 45 found in naturally occurring N1 and N2 segments in human 2.2.2. Design and Selection of the H3-JH Segment Reper antibodies. One embodiment of a library produced via this toire method is described in Example 5. According to data com There are six IGHJ (joining) segments, IGHJ1 (SEQ ID piled from human databases (Jackson et al., J. Immunol. NO: 253), IGHJ2 (SEQ ID NO: 254), IGHJ3 (SEQ ID NO: Methods, 2007, 324: 26, incorporated by reference in its 255), IGHJ4 (SEQID NO:256), IGHJ5 (SEQID NO: 257), 50 entirety), there are an average of 3.02 amino acid insertions and IGHJ6 (SEQID NO: 258). The amino acid sequences of for N1 and 2.4 amino acid insertions for N2, not taking into the parent segments and the progressive N-terminal deletions account insertions of two nucleotides or less (FIG. 2). In are presented in Table 20 (Example 5). Similar to the N- and certain embodiments of the invention, N1 and N2 segments C-terminal deletions that the IGHD genes undergo, natural are restricted to lengths of Zero to three amino acids. In other variation is introduced into the IGHJ genes by N-terminal 55 embodiments of the invention, N1 and N2 may be restricted to "nibbling, or progressive deletion, of one or more codons by lengths of less than about 4, 5, 6, 7, 8, 9, or 10 amino acids. exonuclease activity. In some embodiments of the invention, the composition of The H3-JH segment refers to the portion of the IGHJ these sequences may be chosen according to the frequency of segment that is part of CDRH3. In certain embodiments of the occurrence of particular amino acids in the N1 and N2 invention, the H3-JH segment of a library comprises one or 60 sequences of natural human antibodies (for examples of this more of the following sequences: AEYFQH (SEQ ID NO: analysis, see, Tables 21 to 23, in Example 5). In certain 17), EYFQH (SEQID NO: 583), YFQH (SEQID NO:584), embodiments of the invention, the eight most commonly FQH, QH, HYWYFDL (SEQIDNO: 18), WYFDL (SEQID occurring amino acids in these regions (i.e., G. R. S. P. L. A. NO: 585), YFDL (SEQ ID NO: 586), FDL, DL, L, AFDV T, and V) are used to design the synthetic N1 and N2 seg (SEQ ID NO: 19), FDV, DV, V, YFDY (SEQ ID NO: 20), 65 ments. In other embodiments of the invention about the most FDY, DY.Y., NWFDS (SEQIDNO: 21), WFDS (SEQID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19 587), FDS, DS, S. YYYYYGMDV (SEQ ID NO: 22), most commonly occurring amino acids may be used in the US 8,877,688 B2 43 44 design of the synthetic N1 and N2 segments. In still other on the information taught herein, it is now possible to produce embodiments, all 20 amino acids may be used in these seg libraries that are more diverse or less diverse (i.e., more ments. Finally, while it is possible to base the designed com focused) by varying the number of distinct amino acid position of the N1 and N2 segments of the invention on the sequences used in the N1 pool and/or N2 pool. composition of naturally occurring N1 and N2 segments, this 5 As described above, many alternative embodiments are is not a requirement. The N1 and N2 segments may comprise envisioned, in which the compositions and lengths of the N1 amino acids selected from any group of amino acids, or and N2 segments vary from those presented in the Examples designed according to other criteria considered for the design herein. In some embodiments, Sub-stoichiometric synthesis of a library of the invention. A person of ordinary skill in the of trinucleotides may be used for the synthesis of N1 and N2 art would readily recognize that the criteria used to design any 10 segments. Sub-stoichiometric synthesis with trinucleotides is portion of a library of the invention may vary depending on described in Knappik et al. (U.S. Pat. No. 6,300,064, incor the application of the particular library. It is an object of the porated by reference in its entirety). The use of sub-stoichio invention that it may be possible to produce a functional metric synthesis would enable synthesis with consideration library through the use of N1 and N2 segments selected from of the length variation in the N1 and N2 sequences. any group of amino acids, no N1 or N2 segments, or the use 15 In addition to the embodiments described above, a model of N1 and N2 segments with compositions other than those of the activity of TdT may also be used to determine the described herein. composition of the N1 and N2 sequences in a library of the One important difference between the libraries of the cur invention. For example, it has been proposed that the prob rent invention and other libraries known in the art is the ability of incorporating a particular nucleotide base (A, C, G, consideration of the composition of naturally occurring T) on a polynucleotide, by the activity of TdT, is dependent on duplet and triplet amino acid sequences during the design of the type of base and the base that occurs on the strand directly the library. Table 23 shows the top twenty-five naturally preceding the base to be added. Jackson et al., (J. Immunol. occurring duplets in the N1 and N2 regions. Many of these Methods, 2007, 324: 26, incorporated by reference in its can be represented by the general formula (G/P)(G/R/S/P/L/ entirety) have constructed a Markov model describing this A/V/T) or (R/S/L/A/V/T)(G/P). In certain embodiments of 25 process. In certain embodiments of the invention, this model the invention, the synthetic N1 and N2 regions may comprise may be used to determine the composition of the N1 and/or all of these duplets. In other embodiments, the library may N2 segments used in libraries of the invention. Alternatively, comprise the top 2, 3, 4, 5, 6,7,8,9, 10, 11, 12, 13, 14, 15, 16, the parameters presented in Jackson et al. could be further 17, 18, 19, 20, 21, 22, 23, 24, or 25 most common naturally refined to produce sequences that more closely mimic human occurring N1 and/or N2 duplets. In other embodiments of the 30 Sequences. invention, the libraries may include duplets that are less fre 2.2.4. Design of a CDRH3 Library Using the N1, DH, N2, and quently occurring (i.e., outside of the top 25). The composi H3-JHSegments tion of these additional duplets or triplets could readily be The CDRH3 libraries of the invention comprise an initial determined, given the methods taught herein. amino acid (in certain exemplary embodiments, G, D, E) or Finally, the data from the naturally occurring triplet N1 and 35 lack thereof (designated herein as position 95), followed by N2 regions demonstrates that the naturally occurring N1 and the N1, DH, N2, and H3-JH segments. Thus, in certain N2 triplet sequences can often be represented by the formulas embodiments of the invention, the overall design of the (G)(G)(G/R/S/P/L/A/V/T), (G)(R/S/P/L/A/V/T)(G), or (R/S/ CDRH3 libraries can be represented by the following for P/L/A/V/T)(G) (G). In certain embodiments of the invention, mula: the library may comprise the top 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 most commonly occurring N1 and/or N2 triplets. In other embodi While the compositions of each portion of a CDRH3 of a ments of the invention, the libraries may include triplets that library of the invention are more fully described above, the are less frequently occurring (i.e., outside of the top 25). The composition of the tail presented above (G/D/E/-) is non composition of these additional duplets or triplets could 45 limiting, and that any amino acid (or no amino acid) can be readily be determined, given the methods taught herein. used in this position. Thus, certain embodiments of the inven In certain embodiments of the invention, there are about 59 tion may be represented by the following formula: total N1 segments and about 59 total N2 segments used to create a library of CDRH3s. In other embodiments of the invention, the number of N1 segments, N2 segments, or both 50 wherein X is any amino acid residue or no residue. is increased to about 141 (see, for example. Example 5). In In certain embodiments of the invention, a synthetic other embodiments of the invention, one may select a total of CDRH3 repertoire is combined with selected VH chassis about 0, 5, 10, 15, 20, 25, 30, 35,40, 45, 50,55, 60, 65,70, 75, sequences and heavy chain constant regions, via homologous 80, 85,90, 95, 100, 110, 120, 130, 140, 150, 160, 170, 180, recombination. Therefore, in certain embodiments of the 190, 200, 220, 240, 260, 280, 300, 320, 340,360, 380, 400, 55 invention, it may be necessary to include DNA sequences 420, 440, 460, 480, 500, 1000, 10, or more N1 and/or N2 flanking the 5' and 3' ends of the synthetic CDRH3 libraries, segments for inclusion in a library of the invention. to facilitate homologous recombination between the Syn One of ordinary skill in the art will readily recognize that, thetic CDRH3 libraries and vectors containing the selected given the teachings of the instant specification, it is well chassis and constant regions. In certain embodiments, the within the realm of normal experimentation to extend the 60 vectors also contain a sequence encoding at least a portion of analysis detailed herein, for example, to generate additional the non-nibbled region of the IGHJ gene (i.e., FRM4-JH). rankings of naturally occurring duplet and triplet (or higher Thus, a polynucleotide encoding an N-terminal sequence order) N regions that extend beyond those presented herein (e.g., CA(K/R/T)) may be added to the synthetic CDRH3 (e.g., using sequence alignment, the SoDA algorithm, and any sequences, wherein the N-terminal polynucleotide is database of human sequences (Volpe et al., Bioinformatics, 65 homologous with FRM3 of the chassis, while a polynucle 2006, 22:438-44, incorporated by reference in its entirety). otide encoding a C-terminal sequence (e.g., WG (Q/R)G; An ordinarily skilled artisan would also recognize that, based SEQ ID NO. 23) may be added to the synthetic CDRH3, US 8,877,688 B2 45 46 wherein the C-terminal polynucleotide is homologous with ments, sequences within this length range comprise about FRM4-JH. Although the sequence WG(Q/R)G (SEQID NO: 62% of the sequences of a library. 23) is presented in this exemplary embodiment, additional In certain embodiments of the invention, CDRH3s with a amino acids, C-terminal to this sequence in FRM4-JH may length of about 13 to about 15 residues comprise about 35% also be included in the polynucleotide encoding the C-termi to about 45% of the sequences of a library. In some embodi nal sequence. The purpose of the polynucleotides encoding ments, sequences within this length range comprise about the N-terminal and C-terminal sequences, in this case, is to 40% of the sequences of a library. facilitate homologous recombination, and one of ordinary 2.3. Design of the Antibody Library CDRL3 Components skill in the art would recognize that these sequences may be The CDRL3 libraries of the invention can be generated by longer or shorter than depicted below. Accordingly, in certain 10 one of several approaches. The actual version of the CDRL3 embodiments of the invention, the overall design of the library made and used in a particular embodiment of the CDRH3 repertoire, including the sequences required to facili invention will depend on objectives for the use of the library. tate homologous recombination with the selected chassis, can More than one CDRL3 library may be used in a particular be represented by the following formula (regions homolo embodiment; for example, a library containing CDRH3 gous with vector underlined): 15 diversity, with kappa and lambda light chains is within the Scope of the invention. CARK/T-X)-N1-(DH-N2-H3-JH In certain embodiments of the invention, a CDRL3 library WG(QR)Gl. is a VKCDR3 (kappa) library and/or a VCDR3 (lambda) In other embodiments of the invention, the CDRH3 reper library. The CDRL3 libraries described herein differ signifi toire can be represented by the following formula, which cantly from CDRL3 libraries in the art. First, they consider excludes the T residue presented in the schematic above: length variation that is consistent with what is observed in actual human sequences. Second, they take into consideration the fact that a significant portion of the CDRL3 is encoded by the IGLV gene. Third, the patterns of amino acid variation References describing collections of V. D., and J genes 25 within the IGLV gene-encoded CDRL3 portions are not sto include Scaviner et al., Exp. Clin, Immunogenet., 1999, 16: chastic and are selected based on depending on the identity of 243 and Ruiz et al., Exp. Clin. Immunogenet, 1999, 16:173, the IGLV gene. Taken together, the second and third distinc each incorporated by reference in its entirety. tions mean that CDRL3 libraries that faithfully mimic 2.2.5. CDRH3 Length Distributions observed patterns in human sequences cannot use a generic As described throughout this application, in addition to 30 design that is independent of the chassis sequences in FRM1 accounting for the composition of naturally occurring to FRM3. Fourth, the contribution of JL to CDRL3 is also CDRH3 segments, the instant invention also takes into considered explicitly, and enumeration of each amino acid account the length distribution of naturally occurring CDRH3 residue at the relevant positions is based on the compositions segments. Surveys by Zemlin et al. (JMB, 2003, 334: 733, and natural variations of the JL genes themselves. incorporated by reference in its entirety) and Lee et al. (Im 35 As indicated above, and throughout the application, a munogenetics, 2006, 57.917, incorporated by reference in its unique aspect of the design of the libraries of the invention is entirety) provide analyses of the naturally occurring CDRH3 the germline or "chassis-based aspect, which is meant to lengths. These data show that about 95% of naturally occur preserve more of the integrity and variability of actual human ring CDRH3 sequences have a length from about 7 to about sequences. This is in contrast to other codon-based synthesis 23 amino acids. In certain embodiments, the instant invention 40 or degenerate oligonucleotide synthesis approaches that have provides rationally designed antibody libraries with CDRH3 been described in the literature and that aim to produce “one segments which directly mimic the size distribution of natu size-fits-all” (e.g., consensus) libraries (e.g., Knappik, et al., J rally occurring CDRH3 sequences. In certain embodiments Mol Biol, 2000, 296:57: Akamatsu et al., JImmunol, 1993, of the invention, the length of the CDRH3s may be about 2 to 151: 4651, each incorporated by reference in its entirety). about 30, about 3 to about 35, about 7 to about 23, about 3 to 45 In certain embodiments of the invention, patterns of occur about 28, about 5 to about 28, about 5 to about 26, about 5 to rence of particular amino acids at defined positions within VL about 24, about 7 to about 24, about 7 to about 22, about 8 to sequences are determined by analyzing data available in pub about 19, about 9 to about 22, about 9 to about 20, about 10 to lic or other databases, for example, the NCBI database (see, about 18, about 11 to about 20, about 11 to about 18, about 13 for example, GI numbers in Appendices A and B filed here to about 18, or about 13 to about 16 residues in length. 50 with). In certain embodiments of the invention, these In certain embodiments of the invention, the length distri sequences are compared on the basis of identity and assigned bution of a CDRH3 library of the invention may be defined to families on the basis of the germline genes from which they based on the percentage of sequences within a certain length are derived. The amino acid composition at each position of range. For example, in certain embodiments of the invention, the sequence, in each germline family, may then be deter CDRH3s with a length of about 10 to about 18 amino acid 55 mined. This process is illustrated in the Examples provided residues comprise about 84% to about 94% of the sequences herein. of a the library. In some embodiments, sequences within this 2.3.1. Minimalist VKCDR3 Libraries length range comprise about 89% of the sequences of a In certain embodiments of the invention, the light chain library. CDR3 library is a VKCDR3 library. Certain embodiments of In other embodiments of the invention, CDRH3s with a 60 the invention may use only the most common VKCDR3 length of about 11 to about 17 amino acid residues comprise length, nine residues; this length occurs in a dominant pro about 74% to about 84% of the sequences of a library. In some portion (greater than about 70%) of human VKCDR3 embodiments, sequences within this length range comprise sequences. In human VKCDR3 sequences of length nine, about 79% of the sequences of a library. positions 89-95 are encoded by the IGKV gene and positions In still other embodiments of the invention, CDRH3s with 65 96-97 are encoded by the IGKJ gene. Analysis of human a length of about 12 to about 16 residues comprise about 57% kappa light chain sequences indicates that there are not strong to about 67% of the sequences of a library. In some embodi biases in the usage of the IGKJ genes. Therefore, in certain US 8,877,688 B2 47 48 embodiments of the invention, each of the five the IGKJ genes combinations are possible. Moreover, while it is believed that can be represented in equal proportions to create a combina maintaining the pairing between the VK chassis and the torial library of (MVK chassis)x(5 JK genes), or a library of L3-VK results in libraries that are more similar to human size Mx5. However, in other embodiments of the invention, it kappa light chain sequences in composition, the L3-VK may be desirable to bias IGKJ gene representation, for 5 regions may also be combinatorially varied with different VK example to restrict the size of the library or to weight the chassis regions, to create additional diversity. library toward IGKJ genes known to have particular proper 2.3.2. VKCDR3 Libraries of About 10 Complexity ties. While the dominant length of VKCDR3 sequences in As described in Example 6.1, examination of the first humans is about nine amino acids, other lengths appear at amino acid encoded by the IGKJ gene (position 96) indicated 10 measurable frequencies that cumulatively approach almost that the seven most common residues found at this position about 30% of VKCDR3 sequences. In particular, VKCDR3 of are L. Y. R. W. F. P. and I. These residues cumulatively lengths 8 and 10 represent about 8.5% and about 16%, respec account for about 85% of the residues found in position 96 in tively, of VKCDR3 lengths in representative samples (Ex naturally occurring kappa light chain sequences. In certain ample 6.2: FIG. 3). Thus, more complex VKCDR3 libraries embodiments of the invention, the amino acid residue at 15 may include CDR lengths of 8, 10, and 11 amino acids. Such position 96 may be one of these seven residues. In other libraries could account for a greater percentage of the length embodiments of the invention, the amino acid at this position distribution observed in collections of human VKCDR3 may be chosen from amongst any of the other 13 amino acid sequences, or even introduce VKCDR3 lengths that do not residues. In still other embodiments of the invention, the occur frequently in human VKCDR3 sequences (e.g., less amino acid residue at position 96 may be chosen from than eight residues or greater than 11 residues). amongst the top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, The inclusion of a diversity of kappa light chain length 16, 17, 18, 19, or 20 amino acids that occur at position 96, or variations in a library of the invention also enables one to even residues that never occur at position 96. Similarly, the include sequence variability that occurs outside of the amino occurrence of the amino acids selected to occupy position 96 acid at the VK-JK junction (i.e., position 96, described may be equivalent or weighted. In certain embodiments of the 25 above). In certain embodiments of the invention, the patterns invention, it may be desirable to include each of the amino of sequence variation within the VK, and/or JK segments can acids selected for inclusion in position 96 at equivalent be determined by aligning collections of sequences derived amounts. In other embodiments of the invention, it may be from particular germline sequences. In certain embodiments desirable to bias the composition of position 96 to include of the invention, the frequency of occurrence of amino acid particular residues more or less frequently than others. For 30 residues within VKCDR3 can be determined by sequence example, as presented in Example 6.1, arginine occurs at alignments (e.g., see Example 6.2 and Table 30). In some position 96 most frequently when the IGKJ1 (SEQID NO: embodiments of the invention, this frequency of occurrence 552) germline sequence is used. Therefore, in certain may be used to introduce variability into the VK Chassis, embodiments of the invention, it may be desirable to bias L3-VK and/or JK segments that are used to synthesize the amino acid usage at position 96 according to the origin of the 35 VKCDR3 libraries. In certain embodiments of the invention, IGKJ germline sequence(s) and/or the IGKV germline the top 1,2,3,4,5,6,7,8,9, 10, 11, 12, 13, 14, 15, 16, 17, 18, sequence(s) selected for representation in a library. 19, or 20 amino acids that occur at any particular position in Therefore, in certain embodiments of the invention, a mini a naturally occurring repertoire may be included at that posi malist VKCDR3 library may be represented by one or more of tion in a VKCDR3 library of the invention. In certain embodi the following amino acid sequences: 40 ments of the invention, the percent occurrence of any amino acid at any particular position within the VKCDR3 or a VK light chain may be about 0%, 1%, 2%. 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%. In these schematic exemplary sequences, VK Chassis rep 45 In certain embodiments of the invention, the percent occur resents any VK chassis selected for inclusion in a library of rence of any amino acid at any position within a VKCDR3 or the invention (e.g., see Table 11). Specifically, VK Chassis kappa light chain library of the invention may be within at comprises about Kabat residues 1 to 88 of a selected IGKV least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%. 10%, sequence. L3-VK represents the portion of the VKCDR3 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, encoded by the chosen IGKV gene (in this embodiment, 50 120%, 140%, 160%, 180%, or 200% of the percent occur Kabat residues 89-95). F, L, I, R, W, Y, and P are the seven rence of any amino acid at any position within a naturally most commonly occurring amino acids at position 96 of occurring VKCDR3 or kappa light chain domain. VKCDR3s with length nine, X is any amino acid, and JK is In some embodiments of the invention, a VKCDR3 library an IGKJamino acid sequence without the N-terminal residue may be synthesized using degenerate oligonucleotides (see (i.e., the N-terminal residue is substituted with F, L, I, R. W.Y. 55 Table 31 for IUPAC base symbol definitions). In some P. or X). Thus, in one possible embodiment of the minimalist embodiments of the invention, the limits of oligonucleotide VKCDR3 library, 70 members could be produced by utilizing synthesis and the genetic code may require the inclusion of 10 VK chassis, each paired with its respective L3-VK, 7 more or fewer amino acids at a particular position in the amino acids at position 96 (i.e., X), and one JK sequence. VKCDR3 sequences. An illustrative embodiment of this Another embodiment of the library may have 350 members, 60 approach is provided in Example 6.2. produced by combining 10 VK chassis, each paired with its 2.3.3. More Complex VKCDR3 Libraries respective L3-VK, with 7 amino acids at position 96, and all The limitations inherent in using the genetic code and 5 JK genes. Still another embodiment of the library may degenerate oligonucleotide synthesis may, in some cases, have 1,125 members, produced by combining 15 VK chassis, require the inclusion of more or fewer amino acids at a par each paired with its respective H3-JK, with 15 amino acids at 65 ticular position within VKCDR3 (e.g., Example 6.2, Table position 96 and all 5 JK genes, and so on. A person of 32), in comparison to those amino acids found at that position ordinary skill in the art will readily recognize that many other in nature. This limitation can be overcome through the use of US 8,877,688 B2 49 50 a codon-based synthesis approach (Virnekas et al. Nucleic invention is that, unlike the IGKV genes, the contribution of Acids Res., 1994, 22:5600, incorporated by reference in its the IGVW genes to CDRL3 (i.e., L3-VW) is not constrained to entirety), which enables precise synthesis of oligonucleotides a fixed number of amino acid residues. Therefore, while the encoding particular amino acids and a finer degree of control combination of the VK (including L3-VK) and JK segments, over the proportion of any particular amino acid incorporated with inclusion of position 96, yields CDRL3 with a length of at any position. Example 6.3 describes this approach in only 9 residues, length variation may be obtained within a greater detail. VWCDR3 library even when only the V (including L3-VW) In some embodiments of the invention, a codon-based and JW segments are considered. synthesis approach may be used to vary the percent occur As for the VKCDR3 sequences, additional variability may rence of any amino acid at any particular position within the 10 be introduced into the VWCDR3 sequences via the same VKCDR3 or kappa light chain. In certain embodiments, the methods outlined above, namely determining the frequency percent occurrence of any amino acid at any position in a of occurrence of particular residues within VWCDR3 VKCDR3 or kappa light chain sequence of the library may be sequences and synthesizing the oligonucleotides encoding about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, the desired compositions via degenerate oligonucleotide Syn 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 15 thesis or trinucleotides-based synthesis. 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 100%. In some 2.4. Synthetic Antibody Libraries embodiments of the invention, the percent occurrence of any In certain embodiments of the invention, both the heavy amino acid at any position may be about 1%, 2%. 3%, or 4%. and light chain chassis sequences and the heavy and light In certain embodiments of the invention, the percent occur chain CDR3 sequences are synthetic. The polynucleotide rence of any amino acid at any position within a VKCDR3 or sequences of the instant invention can be synthesized by kappa light chain library of the invention may be within at various methods. For example, sequences can be synthesized least about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, by split pool DNA synthesis as described in Feldhaus et al., 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, Nucleic Acids Research, 2000, 28: 534; Omstein et al., 120%, 140%, 160%, 180%, or 200% of the percent occur Biopolymers, 1978, 17: 2341; and Brenner and Lerner, rence of any amino acid at any position within a naturally 25 PNAS, 1992, 87: 6378 (each of which is incorporated by occurring VKCDR3 or kappa light chain domain. reference in its entirety). In certain embodiments of the invention, the VKCDR3 In Some embodiments of the invention, cassettes represent (and any other sequence used in the library, regardless of ing the possible V. D., and J diversity found in the human whether or not it is part of VKCDR3) may be altered to repertoire, as well as junctional diversity, are synthesized de remove undesirable amino acid motifs. For example, peptide 30 novo either as double-stranded DNA oligonucleotides, sequences with the pattern N-X-(S or T)-Z, where X and Zare single-stranded DNA oligonucleotides representative of the different from P. will undergo post-translational modification coding strand, or single-stranded DNA oligonucleotides rep (N-linked glycosylation) in a number of expression systems, resentative of the non-coding strand. These sequences can including yeast and mammalian cells. In certain embodi then be introduced into a host cell along with an acceptor ments of the invention, the introduction of N residues at 35 vector containing a chassis sequence and, in Some cases a certain positions may be avoided, so as to avoid the introduc portion of FRM4 and a constant region. No primer-based tion of N-linked glycosylation sites. In some embodiments of PCR amplification from mammalian cINA or mRNA or the invention, these modifications may not be necessary, template-directed cloning steps from mammalian cDNA or depending on the organism used to express the library and the mRNA need be employed. culture conditions. However, even in the event that the organ 40 2.5. Construction of Libraries by Yeast Homologous Recom ism used to express libraries with potential N-linked glyco bination Sylation sites is incapable of N-linked glycosylation (e.g., In certain embodiments, the present invention exploits the bacteria), it may still be desirable to avoid N-X-(S/T) inherentability of yeast cells to facilitate homologous recom sequences, as the antibodies isolated from Such libraries may bination at high efficiency. The mechanism of homologous be expressed in different systems (e.g., yeast, mammalian 45 recombination in yeast and its applications are briefly cells) later (e.g., toward clinical development), and the pres described below. ence of carbohydrate moieties in the variable domains, and As an illustrative embodiment, homologous recombina the CDRs in particular, may lead to unwanted modifications tion can be carried out in, for example, Saccharomyces cer of activity. evisiae, which has genetic machinery designed to carry out In certain embodiments of the invention, it may be prefer 50 homologous recombination with high efficiency. Exemplary able to create the individual sub-libraries of different lengths S. cerevisiae strains include EM93, CEN.PK2, RM11-1a, (e.g., one or more of lengths 5, 6, 7, 8, 9, 10, 11, or more) YJM789, and BJ5465. This mechanism is believed to have separately, and then mix the Sub-libraries in proportions that evolved for the purpose of chromosomal repair, and is also reflect the length distribution of VKCDR3 in human called 'gap repair or 'gap filling'. By exploiting this mecha sequences; for example, in ratios approximating the 1:9:2 55 nism, mutations can be introduced into specific loci of the distribution that occurs in natural VKCDR3 sequences of yeast genome. For example, a vector carrying a mutant gene lengths 8,9, and 10 (see FIG.3). In other embodiments, it may can contain two sequence segments that are homologous to be desirable to mix these sub-libraries at ratios that are dif the 5' and 3' open reading frame (ORF) sequences of a gene ferent from the distribution of lengths in natural VKCDR3 that is intended to be interrupted or mutated. The vector may sequences, for example, to produce more focused libraries or 60 also encode a positive selection marker, such as a nutritional libraries with particular properties. enzyme allele (e.g., URA3) and/or an antibiotic resistant 2.3.4. VCDR3 Libraries marker (e.g., Geneticin/G418), flanked by the two homolo The principles used to design the minimalist V.CDR3 gous DNA segments. Other selection markers and antibiotic libraries of the invention are similar to those enumerated resistance markers are known to one of ordinary skill in the above, for the VKCDR3 libraries, and are explained in more 65 art. In some embodiments of the invention, this vector (e.g., a detail in the Examples. One difference between the VCDR3 plasmid) is linearized and transformed into the yeast cells. libraries of the invention and the VKCDR3 libraries of the Through homologous recombination between the plasmid US 8,877,688 B2 51 52 and the yeast genome, at the two homologous recombination homologous base pairs may be used to facilitate recombina sites, a reciprocal exchange of the DNA content occurs tion. In other embodiments, between about 20 and about 40 between the wild type gene in the yeast genome and the base pairs are utilized. In addition, the reciprocal exchange mutant gene (including the selection marker gene(s)) that is between the vector and gene fragment is strictly sequence flanked by the two homologous sequence segments. By dependent, i.e. it does not cause a frame shift. Therefore, selecting for the one or more selection markers, the Surviving gap-repair cloning assures the insertion of gene fragments yeast cells will be those cells in which the wild-type gene has with both high efficiency and precision. The high efficiency been replaced by the mutant gene (Pearson et al., Yeast, 1998, makes it possible to clone two, three, or more targeted gene 14: 391, incorporated by reference in its entirety). This fragments simultaneously into the same vector in one trans mechanism has been used to make systematic mutations in all 10 formation attempt (Raymond et al., Biotechniques, 1999, 26: 6,000 yeast genes, or open reading frames (ORFs), for func 134, incorporated by reference in its entirety). Moreover, the tional genomics studies. Because the exchange is reciprocal, nature of precision sequence conservation through homolo a similar approach has also been used successfully to clone gous recombination makes it possible to clone selected genes yeast genomic DNA fragments into a plasmid vector (Iwasaki or gene fragments into expression or fusion vectors for direct et al., Gene, 1991, 109: 81, incorporated by reference in its 15 functional examination (El-Deiry et al., Nature Genetics, entirety). 1992, 1: 4549; Ishioka et al., PNAS, 1997, 94: 2449, each By utilizing the endogenous homologous recombination incorporated by reference in its entirety). machinery present in yeast, gene fragments or synthetic oli Libraries of gene fragments have also been constructed in gonucleotides can also be cloned into a plasmid vector with yeast using homologous recombination. For example, a out a ligation step. In this application of homologous recom human brain cDNA library was constructed as a two-hybrid bination, a target gene fragment (i.e., the fragment to be fusion library in vector pG4-5 (Guidotti and Zervos, Yeast, inserted into a plasmid vector, e.g., a CDR3) is obtained (e.g., 1999, 15: 715, incorporated by reference in its entirety). It has by oligonucleotides synthesis, PCR amplification, restriction also been reported that a total of 6,000 pairs of PCR primers digestion out of another vector, etc.). DNA sequences that are were used for amplification of 6,000 known yeast ORFs for a homologous to selected regions of the plasmid vector are 25 study of yeast genomic protein interactions (Hudson et al., added to the 5' and 3' ends of the target gene fragment. These Genome Res., 1997. 7: 1169, incorporated by reference in its homologous regions may be fully synthetic, or added via PCR entirety). In 2000, Uetz et al. conducted a comprehensive amplification of a target gene fragment with primers that analysis-of protein-protein interactions in Saccharomyces incorporate the homologous sequences. The plasmid vector cerevisiae (Uetz et al., Nature, 2000, 403: 623, incorporated may include a positive selection marker, such as a nutritional 30 by reference in its entirety). The protein-protein interaction enzyme allele (e.g., URA3), or an antibiotic resistance map of the budding yeast was studied by using a comprehen marker (e.g., Geneticin/G418). The plasmid vector is then sive system to examine two-hybrid interactions in all possible linearized by a unique restriction cut located in-between the combinations between the yeast proteins (Ito et al., PNAS, regions of sequence homology shared with the target gene 2000,97: 1143, incorporated by reference in its entirety), and fragment, thereby creating an artificial gap at the cleavage 35 the genomic protein linkage map of Vaccinia virus was stud site. The linearized plasmid vector and the target gene frag ied using this system (McCraithet al., PNAS, 2000,97:4879, ment flanked by sequences homologous to the plasmid vector incorporated by reference in its entirety). are co-transformed into a yeast host strain. The yeast is then In certain embodiments of the invention, a synthetic CDR3 able to recognize the two stretches of sequence homology (heavy or light chain) may be joined by homologous recom between the vector and target gene fragment and facilitate a 40 bination with a vector encoding a heavy or light chain chassis, reciprocal exchange of DNA content through homologous a portion of FRM4, and a constant region, to form a full recombination at the gap. As a consequence, the target gene length heavy or light chain. In certain embodiments of the fragment is inserted into the vector without ligation. invention, the homologous recombination is performed The method described above has also been demonstrated to directly in yeast cells. In some embodiments, the method work when the target gene fragments are in the form of single 45 comprises: stranded DNA, for example, as a circular M13 phage derived (a) transforming into yeast cells: form, or as single stranded oligonucleotides (Simon and (i) a linearized vector encoding a heavy or light chain Moore, Mol. Cell. Biol., 1987, 7: 2329; Ivanov et al., Genet chassis, a portion of FRM4, and a constant region, ics, 1996, 142: 693; and DeMarini et al., 2001, 30: 520., each wherein the site of linearization is between the end of incorporated by reference in its entirety). Thus, the form of 50 FRM3 of the chassis and the beginning of the constant the target that can be recombined into the gapped vector can region; and be double stranded or single stranded, and derived from (ii) a library of CDR3 insert nucleotide sequences that chemical synthesis, PCR, restriction digestion, or other meth are linear and double stranded, wherein each of the ods. CDR3 insert sequences comprises a nucleotide Several factors may influence the efficiency of homologous 55 sequence encoding CDR3 and 5’- and 3'-flanking recombination in yeast. For example, the efficiency of the gap sequences that are Sufficiently homologous to the ter repair is correlated with the length of the homologous mini of the vector of (i) at the site of linearization to sequences flanking both the linearized vector and the target enable homologous recombination to occur between gene. In certain embodiments, about 20 or more base pairs the vector and the library of CDR3 insert sequences: may be used for the length of the homologous sequence, and 60 and about 80 base pairs may give a near-optimized result (Hua et (b) allowing homologous recombination to occur between al., Plasmid, 1997, 38: 91; Raymond et al., Genome Res., the vector and the CDR3 insert sequences in the trans 2002, 12: 190, each incorporated by reference in its entirety). formed yeast cells, such that the CDR3 insert sequences In certain embodiments of the invention, at least about 5, 10, are incorporated into the vector, to produce a vector 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,33, 3435, 65 encoding full-length heavy chain or light chain. 36, 37, 38, 39, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,90, 95, As specified above, the CDR3 inserts may have a 5' flank 100, 110, 120, 130, 140, 150, 160, 170, 180, 187, 190, or 200 ing sequence and a 3’ flanking sequence that are homologous US 8,877,688 B2 53 54 to the termini of the linearized vector. When the CDR3 inserts encoding a signal sequence, Such as the omp A, phoA or pelB and the linearized vectors are introduced into a host cell, for signal sequence (Lei et al., J. Bacteriol., 1987, 169: 4379, example, a yeast cell, the 'gap' (the linearization site) created incorporated by reference in its entirety). These gene fusions by linearization of the vector is filled by the CDR3 fragment are assembled in a dicistronic construct, so that they can be insert through recombination of the homologous sequences at expressed from a single vector, and secreted into the periplas the 5' and 3' termini of these two linear double-stranded mic space of E. coli where they will refold and can be recov DNAs (i.e., the vector and the insert). Through this event of ered in active form. (Skerra et al., Biotechnology, 1991, 9: homologous recombination, libraries of circular vectors 273, incorporated by reference in its entirety). For example, encoding full-length heavy or light chains comprising vari antibody heavy chain genes can be concurrently expressed able CDR3 inserts is generated. Particular instances of these 10 with antibody light chain genes to produce antibodies or methods are presented in the Examples. antibody fragments. Subsequent analysis may be carried out to determine the In other embodiments of the invention, the antibody efficiency of homologous recombination that results in cor sequences are expressed on the membrane Surface of a rect insertion of the CDR3 sequences into the vectors. For prokaryote, e.g., E. coli, using a secretion signal and lipida example, PCR amplification of the CDR3 inserts directly 15 tion moiety as described, e.g., in US20040072740: from selected yeast clones may reveal how many clones are US2003.01.00023; and US2003.0036092 (each incorporated recombinant. In certain embodiments, libraries with mini by reference in its entirety). mum of about 90% recombinant clones are utilized. In certain Higher eukaryotic cells. Such as mammalian cells, for other embodiments libraries with a minimum of about 1%, example myeloma cells (e.g., NS/O cells), hybridoma cells, 5%. 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, Chinese hamster ovary (CHO), and human embryonic kidney 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, (HEK) cells, can also be used for expression of the antibodies 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of the invention. Typically, antibodies expressed in mamma recombinant clones are utilized. The same PCR amplification lian cells are designed to be secreted into the culture medium, of selected clones may also reveal the insert size. or expressed on the surface of the cell. The antibody or anti To verify the sequence diversity of the inserts in the 25 body fragments can be produced, for example, as intact anti selected clones, a PCR amplification product with the correct body molecules or as individual VH and VL fragments, Fab size of insert may be “fingerprinted with restriction enzymes fragments, single domains, or as single chains (scFV) (Huston known to cut or not cut within the amplified region. From a et al., PNAS, 1988, 85:5879, incorporated by reference in its gel electrophoresis pattern, it may be determined whether the entirety). clones analyzed are of the same identity or of the distinct or 30 Alternatively, antibodies can be expressed and screened by diversified identity. The PCR products may also be sequenced anchored periplasmic expression (APEX 2-hybrid surface directly to reveal the identity of inserts and the fidelity of the display), as described, for example, in Jeong et al., PNAS, cloning procedure, and to prove the independence and diver 2007, 104: 8247 (incorporated by reference in its entirety) or sity of the clones. FIG. 1 depicts a schematic of recombina by other anchoring methods as described, for example, in tion between a fragment (e.g., CDR3) and a vector (e.g., 35 Mazor et al., Nature Biotechnology, 2007, 25: 563 (incorpo comprising a chassis, portion of FRM4, and constant region) rated by reference in its entirety). for the construction of a library. In other embodiments of the invention, antibodies can be 2.6. Expression and Screening Systems selected using mammalian cell display (Ho et al., PNAS, Libraries of polynucleotides generated by any of the tech 2006, 103: 9637, incorporated by reference in its entirety). niques described herein, or other Suitable techniques, can be 40 The screening of the antibodies derived from the libraries expressed and Screened to identify antibodies having desired of the invention can be carried out by any appropriate means. structure and/or activity. Expression of the antibodies can be For example, binding activity can be evaluated by standard carried out, for example, using cell-free extracts (and e.g., immunoassay and/or affinity chromatography. Screening of ribosome display), phage display, prokaryotic cells (e.g., bac the antibodies of the invention for catalytic function, e.g., terial display), or eukaryotic cells (e.g., yeast display). In 45 proteolytic function can be accomplished using a standard certain embodiments of the invention, the antibody libraries assays, e.g., the hemoglobin plaque assay as described in U.S. are expressed in yeast. Pat. No. 5,798,208 (incorporated by reference in its entirety). In other embodiments, the polynucleotides are engineered Determining the ability of candidate antibodies to bind thera to serve as templates that can be expressed in a cell-free peutic targets can be assayed in vitro using, e.g., a BIA extract. Vectors and extracts as described, for example in U.S. 50 CORETM instrument, which measures binding rates of an Pat. Nos. 5,324.637; 5.492.817: 5,665,563, (each incorpo antibody to a given target orantigenbased on Surface plasmon rated by reference in its entirety) can be used and many are resonance. In vivo assays can be conducted using any of a commercially available. Ribosome display and other cell-free number of animal models and then Subsequently tested, as techniques for linking a polynucleotide (i.e., a genotype) to a appropriate, in humans. Cell-based biological assays are also polypeptide (i.e., a phenotype) can be used, e.g., ProfusionTM 55 contemplated. (see, e.g., U.S. Pat. Nos. 6,348,315; 6,261,804; 6.258,558: One aspect of the instant invention is the speed at which the and 6.214.553, each incorporated by reference in its entirety). antibodies of the library can be expressed and screened. In Alternatively, the polynucleotides of the invention can be certain embodiments of the invention, the antibody library expressed in an E. coli expression system, Such as that can be expressed in yeast, which have a doubling time of less described by Pluckthun and Skerra. (Meth. Enzymol., 1989, 60 than about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 178: 476: Biotechnology, 1991, 9:273, each incorporated by 19, 20, 21, 22, 23, or 24 hours. In some embodiments, the reference in its entirety). The mutant proteins can be doubling times are about 1 to about 3 hours, about 2 to about expressed for secretion in the medium and/or in the cytoplasm 4, about 3 to about 8 hours, about 3 to about 24, about 5 to of the bacteria, as described by Better and Horwitz, Meth. about 24, about 4 to about 6 about 5 to about 22, about 6 to Enzymol., 1989, 178: 476, incorporated by reference in its 65 about 8, about 7 to about 22, about 8 to about 10 hours, about entirety. In some embodiments, the single domains encoding 7 to about 20, about 9 to about 20, about 9 to about 18, about VH and VL are each attached to the 3' end of a sequence 11 to about 18, about 11 to about 16, about 13 to about 16, US 8,877,688 B2 55 56 about 16 to about 20, or about 20 to about 30 hours. In certain encoded by plasmids. In other embodiments of the invention, embodiments of the invention, the antibody library is the light chains may be integrated into the genome of the host expressed in yeast with a doubling time of about 16 to about cell. 20 hours, about 8 to about 16 hours, or about 4 to about 8 The coding sequences of the antibody leads may be hours. Thus, the antibody library of the instant invention can 5 mutagenized by a wide variety of methods. Examples of be expressed and Screened in a matter of hours, as compared methods of mutagenesis include, but are not limited to site to previously known techniques which take several days to directed mutagenesis, error-prone PCR mutagenesis, cassette express and screen antibody libraries. A limiting step in the mutagenesis, and random PCR mutagenesis. Alternatively, throughput of such screening processes in mammalian cells is oligonucleotides encoding regions with the desired mutations simply the time required to iteratively regrow populations of 10 can be synthesized and introduced into the sequence to be isolated cells, which, in Some cases, have doubling times mutagenized, for example, via recombination or ligation. greater than the doubling times of the yeast used in the current Site-directed mutagenesis or point mutagenesis may be invention. used to gradually change the CDR sequences in specific In certain embodiments of the invention, the composition regions. This may be accomplished by using oligonucleotide of a library may be defined after one or more enrichment steps 15 directed mutagenesis or PCR. For example, a short sequence (for example by Screening for antigen binding, or other prop of an antibody lead may be replaced with a synthetically erties). For example, a library with a composition comprising mutagenized oligonucleotide in either the heavy chain or about X % sequences or libraries of the invention may be light chain region, or both. The method may not be efficient enriched to contain about 2X %, 3X %, 4x %, 5x %, 6x %, 7x for mutagenizing large numbers of CDR sequences, but may %, 8x %, 9x96, 10x %, 20x %, 25x %, 40x96, 50x %, 60x96 be used for fine tuning of a particular lead to achieve higher 75x%, 80x96,90x %,95x%, or 99X% sequences or libraries affinity toward a specific target protein. of the invention, after one or more screening steps. In other Cassette mutagenesis may also be used to mutagenize the embodiments of the invention, the sequences or libraries of CDR sequences in specific regions. In a typical cassette the invention may be enriched about 2-fold, 3-fold, 4-fold, mutagenesis, a sequence block, or a region, of a single tem 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 100-fold, 1,000 25 plate is replaced by a completely or partially randomized fold, or more, relative to their occurrence prior to the one or sequence. However, the maximum information content that more enrichment steps. In certain embodiments of the inven can be obtained may be statistically limited by the number of tion, a library may contain at least a certain number of a random sequences of the oligonucleotides. Similar to point particular type of sequence(s), such as CDRH3S, CDRL3s, mutagenesis, this method may also be used for fine tuning of heavy chains, light chains, or whole antibodies (e.g., at least 30 a particular lead to achieve higher affinity towards a specific about 10, 10, 10, 10°, 107,10,10,109, 10'', 10°, 10, target protein. 10", 10", 10", 10'7, 10", 10', or 10'). In certain embodi Error-prone PCR, or “poison” PCR, may be used to ments, these sequences may be enriched during one or more mutagenize the CDR sequences by following protocols enrichment steps, to provide libraries comprising at least described in Caldwell and Joyce, PCR Methods and Appli about 10, 10, 10, 10, 10°, 107, 10, 10, 100, 10, 102, 35 cations, 1992, 2:28: Leung et al., Technique, 1989, 1: 11: 10", 10", 10", 10", 10'7, 10", 10' of the respective Shafikhani et al., Biotechniques, 1997, 23:304; and Stemmer sequence(s). et al., PNAS, 1994,91: 10747 (each of which is incorporated 2.7. Mutagenesis Approaches for Affinity Maturation by reference in its entirety). As described above, antibody leads can be identified Conditions for error prone PCR may include (a) high con through a selection process that involves screening the anti 40 centrations of Mn (e.g., about 0.4 to about 0.6 mM) that bodies of a library of the invention for binding to one or more efficiently induces malfunction of Taq DNA polymerase; and antigens, or for a biological activity. The coding sequences of (b) a disproportionally high concentration of one nucleotide these antibody leads may be further mutagenized in vitro or in substrate (e.g., dGTP) in the PCR reaction that causes incor vivo to generate secondary libraries with diversity introduced rect incorporation of this high concentration Substrate into the in the context of the initial antibody leads. The mutagenized 45 template and produces mutations. Additionally, other factors antibody leads can then be further screened for binding to such as, the number of PCR cycles, the species of DNA target antigens or biological activity, in vitro or in Vivo, fol polymerase used, and the length of the template, may affect lowing procedures similar to those used for the selection of the rate of misincorporation of “wrong nucleotides into the the initial antibody lead from the primary library. Such PCR product. Commercially available kits may be utilized for mutagenesis and selection of primary antibody leads effec 50 the mutagenesis of the selected antibody library, such as the tively mimics the affinity maturation process naturally occur “Diversity PCR random mutagenesis kit' (CLONTECHTM). ring in a mammal that produces antibodies with progressive The primer pairs used in PCR-based mutagenesis may, in increases in the affinity to an antigen. In one embodiment of certain embodiments, include regions matched with the the invention, only the CDRH3 region is mutagenized. In homologous recombination sites in the expression vectors. another embodiment of the invention, the whole variable 55 This design allows facile re-introduction of the PCR products region is mutagenized. In other embodiments of the invention back into the heavy or light chain chassis vectors, after one or more of CDRH1, CDRH2, CDRH3, CDRL1, CDRL2, mutagenesis, via homologous recombination. and/ CDRL3 may be mutagenized. In some embodiments of Other PCR-based mutagenesis methods can also be used, the invention, “light chain shuffling may be used as part of alone or in conjunction with the error prone PCR described the affinity maturation protocol. In certain embodiments, this 60 above. For example, the PCR amplified CDR segments may may involve pairing one or more heavy chains with a number be digested with DNase to create nicks in the double stranded of light chains, to select light chains that enhance the affinity DNA. These nicks can be expanded into gaps by other exo and/or biological activity of an antibody. In certain embodi nucleases such as Bal 31. The gaps may then be filled by ments of the invention, the number of light chains to which random sequences by using DNA Klenow polymerase at a the one or more heavy chains can be paired is at least about 2, 65 low concentration of regular substrates dGTP, dATP, dTTP, 5, 10, 100, 1000, 10, 10, 10°, 107, 10, 10, or 109. In and dCTP with one substrate (e.g., dGTP) at a disproportion certain embodiments of the invention, these light chains are ately high concentration. This fill-in reaction should produce US 8,877,688 B2 57 58 high frequency mutations in the filled gap regions. These represented. In certain embodiments, the instant invention method of DNase digestion may be used in conjunction with provides a rationally designed library with diversity so that error prone PCR to create a high frequency of mutations in the any given member is 95% likely to be represented in a physi desired CDR segments. cal realization of the library. In other embodiments of the The CDR or antibody segments amplified from the primary invention, the library is designed so that any given member is antibody leads may also be mutagenized in vivo by exploiting at least about 0.0001%, 0.001%, 0.01%, 0.1%, 1%. 5%, 10%, the inherentability of mutation in pre-B cells. The Ig genes in 20%, 30%, 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95%, pre-B cells are specifically Susceptible to a high-rate of muta 99%, 99.5%, or 99.9% likely to be represented in a physical tion. The Ig promoter and enhancer facilitate such high rate realization of the library. For a review, see, e.g., Firth and mutations in a pre-B cell environment while the pre-B cells 10 Patrick, Biomol. Eng., 2005, 22: 105, and Patricket al., Pro proliferate. Accordingly, CDR gene segments may be cloned tein Engineering, 2003, 16: 451, each of which is incorpo into a mammalian expression vector that contains a human Ig rated by reference in its entirety. enhancer and promoter. This construct may be introduced In certain embodiments of the invention, a library may have into a pre-B cell line, such as 38B9, which allows the muta a theoretical total diversity of X unique members and the tion of the VH and VL gene segments naturally in the pre-B 15 physical realization of the theoretical total diversity may con cells (Liu and Van Ness, Mol. Immunol., 1999, 36: 461, tain at least about 1X, 2X, 3X, 4X, 5X, 6X, 7X, 8X 9X, 10X, incorporated by reference in its entirety). The mutagenized or more members. In some embodiments, the physical real CDR segments can be amplified from the cultured pre-B cell ization of the theoretical total diversity may contain about 1X line and re-introduced back into the chassis-containing to about 2X, about 2X to about 3X, about 3X to about 4X, vector(s) via, for example, homologous recombination. about 4X to about 5X, about 5X to about 6X members. In In some embodiments, a CDR "hit' isolated from screen other embodiments, the physical realization of the theoretical ing the library can be re-synthesized, using degenerate total diversity may contain about 1Xto about 3X, or about 3X codons or trinucleotides, and re-cloned into the heavy or light to about 5X total members. chain Vector using gap repair. An assumption underlying all directed evolution experi 25 ments is that the amount of molecular diversity theoretically 3. Library Sampling possible is enormous compared with the ability to synthesize it, physically realize it, and screen it. The likelihood offinding In certain embodiments of the invention, a library of the a variant with improved properties in a given library is maxi invention comprises a designed, non-random repertoire mized when that library is maximally diverse. Patrick et al. wherein the theoretical diversity of particular components of 30 used simple statistics to derive a series of equations and the library (for example, CDRH3), but not necessarily all computer algorithms for estimating the number of unique components or the entire library, can be over-sampled in a sequence variants in libraries constructed by randomized oli physical realization of the library, at a level where there is a gonucleotide mutagenesis, error-prone PCR and in vitro certain degree of statistical confidence (e.g., 95%) that any recombination. They have written a Suite of programs for given member of the theoretical library is present in the physi 35 calculating library statistics, such as GLUE, GLUE-IT, cal realization of the library at least at a certain frequency PEDEL, PEDEL-AA, and DRIVeR. These programs are (e.g., at least once, twice, three times, four times, five times, described, with instructions on how to access them, in Patrick or more) in the library. et al., Protein Engineering, 2003, 16: 451 and Firth et al., In a library, it is generally assumed that the number of Nucleic Acids Res., 2008, 36: W281 (each of which is incor copies of a given clone obeys a Poisson probability distribu 40 porated by reference in its entirety). tion (see Feller, W. An Introduction to Probability Theory and It is possible to construct a physical realization of a library Its Applications, 1968, Wiley N.Y., incorporated by reference in which some components of the theoretical diversity (such in its entirety). The probability of a Poisson random number as CDRH3) are oversampled, while other aspects (VH/VL being Zero, corresponding to the probability of missing a pairings) are not. For example, consideralibrary in which 10 given component member in an instance of a library (see 45 CDRH3 segments are designed to be present in a single VH below), is e' where N is the average of the random number. chassis, and then paired with 10 VL genes to produce 10' For example, if there are 10° possible theoretical members of (=10*10) possible full heterodimericantibodies. If a physi a library and a physical realization of the library has 107 cal realization of this library is constructed with a diversity of members, with an equal probability of each member of the 10 transformant clones, then the CDRH3 diversity is over theoretical library being sampled, then the average number of 50 sampled ten-fold (=10/10), however the possible VH/VL times that each member occurs in the physical realization of pairings are undersampled by 10' (=10/10"). In this the library is 107/10°=10, and the probability that the number example, on average, each CDRH3 is paired only with 10 of copies of a given member is zero is eYe'=0.000045; or samples of the VL from the possible 10 partners. In certain a 99.9955% chance that there is at least one copy of any of the embodiments of the invention, it is the CDRH3 diversity that 10° theoretical members in this 10X oversampled library. For 55 is preferably oversampled. a 2.3X oversampled library one is 90% confident that a given 3.1. Other Variants of the Polynucleotide Sequences of the component is present. For a 3X oversampled library one is Invention 95% confident that a given component is present. For a 4.6X In certain embodiments, the invention relates to a poly oversampled library one is 99% confident a given clone is nucleotide that hybridizes with a polynucleotide taught present, and so on. 60 herein, or that hybridizes with the complement of a poly Therefore, if M is the maximum number of theoretical nucleotide taught herein. For example, an isolated polynucle library members that can be feasibly physically realized, then otide that remains hybridized after hybridization and washing M/3 is the maximum theoretical repertoire size for which one under low, medium, or high Stringency conditions to a poly can be 95% confident that any given member of the theoreti nucleotide taught herein or the complement of a polynucle cal library will be sampled. It is important to note that there is 65 otide taught herein is encompassed by the present invention. a difference between a 95% chance that a given member is Exemplary low stringency conditions include hybridiza represented and a 95% chance that every possible member is tion with a buffer solution of about 30% to about 35% forma US 8,877,688 B2 59 60 mide, about 1 M NaCl, about 1% SDS (sodium dodecyl comprising one or more libraries or sub-libraries of the inven sulphate) at about 37°C., and a wash in about 1x to about tion, in an amount in which the one or more libraries or 2xSSC (20xSSC=3.0 M NaC1/0.3 M trisodium citrate) at sub-libraries of the invention can be effectively screened and about 50° C. to about 55° C. from which sequences encoded by the one or more libraries or Exemplary moderate stringency conditions include 5 sub-libraries of the invention can be isolated, also fall within hybridization in about 40% to about 45% formamide, about 1 the scope of the invention. MNaCl, about 1% SDS at about 37°C., and a wash in about 3.3. Alternative Scaffolds 0.5x to about 1XSSC at abut 55° C. to about 60° C. In certain embodiments of the invention, the amino acid Exemplary high Stringency conditions include hybridiza products of a library of the invention (e.g., a CDRH3 or tion in about 50% formamide, about 1 M NaCl, about 1% 10 CDRL3) may be displayed on an alternative scaffold. Several SDS at about 37° C., and a wash in about 0.1XSSC at about of these scaffolds have been shown to yield molecules with 60° C. to about 65° C. specificities and affinities that rival those of antibodies. Optionally, wash buffers may comprise about 0.1% to Exemplary alternative scaffolds include those derived from about 1% SDS. fibronectin (e.g., AdNectin), the B-sandwich (e.g., iMab), The duration of hybridization is generally less than about 15 lipocalin (e.g., ), EETI-II/AGRP. BPTI/LACI-D1/ 24 hours, usually about 4 to about 12 hours. ITI-D2 (e.g., ), thioredoxin (e.g., peptide 3.2. Sub-Libraries and Larger Libraries Comprising the aptamer), protein A (e.g., Affibody), ankyrin repeats (e.g., Libraries or Sub-Libraries of the Invention DARPin), YB-crystallin/ubiquitin (e.g., ), CTLDs (e.g., As described throughout the application, the libraries of the Tetranectin), and (LDLR-A module). (e.g., ). Addi current invention are distinguished, in certain embodiments, tional information on alternative scaffolds are provided in by their human-like sequence composition and length, and Binz et al., Nat. Biotechnol., 2005 23: 1257 and Skerra, the ability to generate a physical realization of the library Current Opin. in Biotech., 2007 18:295-304, each of which is which contains all members of (or, in Some cases, even over incorporated by reference in its entirety. samples) a particular component of the library. Libraries comprising combinations of the libraries described herein 25 4. Other Embodiments of the Invention (e.g., CDRH3 and CDRL3 libraries) are encompassed by the invention. Sub-libraries comprising portions of the libraries In certain embodiments, the invention comprises a syn described herein are also encompassed by the invention (e.g., thetic preimmune human antibody CDRH3 library compris a CDRH3 library in a particular heavy chain chassis or a ing 10 to 10 polynucleotide sequences representative of the sub-set of the CDRH3 libraries). One of ordinary skill in the 30 sequence diversity and length diversity found in knownheavy art will readily recognize that each of the libraries described chain CDR3 sequences. herein has several components (e.g., CDRH3, VH, CDRL3, In other embodiments, the invention comprises a synthetic VL, etc.), and that the diversity of these components can be preimmune human antibody CDRH3 library comprising varied to produce sub-libraries that fall within the scope of the polynucleotide sequences encoding CDRH3 represented by invention. 35 the following formula: Moreover, libraries containing one of the libraries or sub libraries of the invention also fall within the scope of the invention. For example, in certain embodiments of the inven wherein IG/D/E/- is Zero to one amino acids in length, N1 tion, one or more libraries or sub-libraries of the invention is Zero to three amino acids, DH is three to ten amino acids may be contained within a larger library, which may include 40 in length, N2 is Zero to three amino acids in length, and sequences derived by other means, for example, non-human H3-JH is two to nine amino acids in length. or human sequence derived by stochastic or semi-stochastic In certain embodiments of the invention, G/D/E/- is rep synthesis. In certain embodiments of the invention, at least resented by an amino acid sequence selected from the group about 1% of the sequences in a polynucleotide library may be consisting of G, D, E, and nothing. those of the invention (e.g., CDRH3 sequences, CDRL3 45 In some embodiments of the invention, N1 is represented sequences, VH sequences, VL sequences), regardless of the by an amino acid sequence selected from the group consisting composition of the other 99% of sequences. In other embodi of G, R, S, P. L. A. V. T. (G/P)(G/R/S/P/L/A/V/T), (R/S/L/ ments of the invention, at least about 0.001%, 0.01%, 0.1%, A/V/T)(G/P), GG(G/R/S/P/L/A/VIT), G(R/S/P/L/A/V/T)G, 2%. 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 81%, (R/S/P/L/A/V/T)GG, and nothing. 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 50 In certain embodiments of the invention, N2 is repre 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% of the sented by an amino acid sequence selected from the group sequences in any polynucleotide library may be those of the consisting of G, R, S, P. L. A.V.T. (G/P)(G/R/S/P/L/A/V/T), invention, regardless of the composition of the other (R/S/L/A/V/T)(G/P), GG(G/R/S/P/L/A/VIT), G(R/S/P/L/A/ sequences. In some embodiments, the sequences of the inven V/T)G, (R/S/P/L/A/V/T)GG, and nothing. tion may comprise about 0.001% to about 1%, about 1% to 55 In some embodiments of the invention, DH comprises a about 2%, about 2% to about 5%, about 5% to about 10%, sequence selected from the group consisting of IGHD3-10 about 10% to about 15%, about 15% to about 20%, about 20% reading frame 1 (SEQID NO: 1), IGHD3-10 reading frame 2 to about 25%, about 25% to about 30%, about 30% to about (SEQID NO: 2), IGHD3-10 reading frame3 (SEQIDNO:3), 35%, about 35% to about 40%, about 40% to about 45%, IGHD3-22 reading frame 2 (SEQ ID NO: 4), IGHD6-19 about 45% to about 50%, about 50% to about 55%, about 55% 60 reading frame 1 (SEQID NO:5), IGHD6-19 reading frame 2 to about 60%, about 60% to about 65%, about 65% to about (SEQID NO: 6), IGHD6-13 reading frame 1 (SEQIDNO:7), 70%, about 70% to about 75%, about 75% to about 80%, IGHD6-13 reading frame 2 (SEQ ID NO: 8), IGHD3-03 about 80% to about 85%, about 85% to about 90%, about 90% reading frame 3 (SEQID NO:9), IGHD2-02 reading frame 2 to about 95%, or about 95% to about 99% of the sequences in (SEQ ID NO: 10), IGHD2-02 reading frame 3 (SEQID NO: any polynucleotide library, regardless of the composition of 65 11), IGHD4-17 reading frame 2 (SEQ ID NO: 12), IGHD1 the other sequences. Thus, libraries more diverse than one or 26 reading frame 1 (SEQ ID NO: 13), IGHD1-26 reading more libraries or sub-libraries of the invention, but yet still frame 3 (SEQ ID NO: 14), IGHD5-5/5-18 reading frame 3 US 8,877,688 B2 61 62 (SEQ ID NO: 15), IGHD2-15 reading frame 2 (SEQID NO: NO: 467) (1-95), IGKV1-27 (SEQ ID NO: 231) (1-95), 16), and all possible N-terminal and C-terminal truncations of IGKV1-16 (SEQID NO:456) (1-95), and truncations of said the above-identified IGHDs down to three amino acids. group up to and including position 95 according to Kabat. In certain embodiments of the invention, H3-JH com In some embodiments of the invention, F/L/I/R/W/Y is prises a sequence selected from the group consisting of AEY an amino acid selected from the group consisting of F, L, I, R, FQH (SEQID NO:17), EYFQH (SEQID NO:583), YFQH W, and Y. (SEQ ID NO:584), FQH, QH,YWYFDL (SEQ ID NO: 18), In certain embodiments of the invention, JK comprises a WYFDL (SEQID NO:585),YFDL (SEQIDNO:586), FDL, sequence selected from the group consisting of TFGQGT DLAFDV (SEQID NO:19), FDV, DV,YFDY (SEQID NO: KVEIK (SEQID NO:528) and TFGGGT (SEQIDNO:529). 20), FDY, DY, NWFDS (SEQ ID NO: 21), WFDS (SEQ ID 10 In some embodiments of the invention, the light chain NO: 586), FDS, DS, YYYYYGMDV (SEQ ID NO: 22), library comprises a kappa light chain library. YYYYGMDV (SEQ ID NO. 587), YYYGMDV (SEQ ID In certain embodiments of the invention, the polynucle NO:588),YYGMDV (SEQID NO:589), YGMDV (SEQID otide sequences further comprise 5' and 3' sequences which NO. 590), GMDV (SEQ ID NO:591), MDV, and DV. facilitate homologous recombination with a light chain chas In some embodiments of the invention, the sequences rep 15 S1S. resented by G/D/E/-N1 ext-DHIN2H3-JH comprise In some embodiments, the invention comprises a method a sequence of about 3 to about 26 amino acids in length. for producing a synthetic preimmune human antibody In certain embodiments of the invention, the sequences CDRH3 library comprising 107 to 10 polynucleotide represented by G/D/E/-N1 ext-DHIN2H3-JH com sequences, said method comprising: prise a sequence of about 7 to about 23 amino acids in length. a) selecting the CDRH3 polynucleotide sequences In some embodiments of the invention, the library com encoded by the CDRH3 sequences, as follows: prises about 10 to about 10" sequences. {0 to 5 amino acids selected from the group consisting of In certain embodiments of the invention, the library com fewer than ten of the amino acids preferentially prises about 10 sequences. encoded by terminal deoxynucleotidyl transferase In some embodiments of the invention, the polynucleotide 25 (TdT) and preferentially functionally expressed by sequences of the libraries further comprise a 5" polynucle human B cells, followed by otide sequence encoding a framework 3 (FRM3) region on (all possible N or C-terminal truncations of IGHD alone the corresponding N-terminal end of the library sequence, and all possible combinations of N and C-terminal wherein the FRM3 region comprises a sequence of about 1 to truncations, followed by about 9 amino acid residues. 30 {0 to 5 amino acids selected from the group consisting of In certain embodiments of the invention, the FRM3 region fewer than ten of the amino acids preferentially comprises a sequence selected from the group consisting of encoded by TclT and preferentially functionally CAR, CAK, and CAT. expressed by human B cells, followed by In some embodiments of the invention, the polynucleotide (all possible N-terminal truncations of IGHJ, down to sequences further comprise a 3' polynucleotide sequence 35 DXWG, wherein X is S. V. L, or Y}; and encoding a framework 4 (FRM4) region on the corresponding b) synthesizing the CDRH3 library described in a) by C-terminal end of the library sequence, wherein the FRM4 chemical synthesis, wherein a synthetic preimmune region comprises a sequence of about 1 to about 9 amino acid human antibody CDRH3 library is produced. residues. In certain embodiments, the invention comprises a syn In certain embodiments of the invention, the library com 40 thetic preimmune human antibody CDRH3 library compris prises a FRM4 region comprising a sequence selected from ing 107 to 10" polynucleotide sequences representative of WGRG (SEQID NO. 23) and WGQG (SEQ ID NO. 23). known human IGHD and IGHJ germline sequences encoding In some embodiments of the invention, the polynucleotide CDRH3, represented by the following formula: sequences further comprise an FRM3 region coding for a {0 to 5 amino acids selected from the group consisting of corresponding polypeptide sequence comprising a sequence 45 fewer than ten of the amino acids preferentially encoded selected from the group consisting of CAR, CAK, and CAT: by terminal deoxynucleotidyl transferase (TdT) and and an FRM4 region coding for a corresponding polypeptide preferentially functionally expressed by human B cells, sequence comprising a sequence selected from WGRG (SEQ followed by all possible N or C-terminal truncations of ID NO. 23) and WGQG (SEQID NO. 23). IGHD alone and all possible combinations of N and In certain embodiments of the invention, the polynucle 50 C-terminal truncations, followed by otide sequences further comprise 5' and 3' sequences which {0 to 5 amino acids selected from the group consisting of facilitate homologous recombination with a heavy chain fewer than ten of the amino acids preferentially encoded chassis. by TodT and preferentially functionally expressed by In some embodiments, the invention comprises a synthetic human B cells, followed by preimmune human antibody light chain library comprising 55 (all possible N-terminal truncations of IGHJ, down to polynucleotide sequences encoding human antibody kappa DXWG (SEQID NO:530), wherein X is S, V, L, orY}. light chains represented by the formula: In certain embodiments, the invention comprises a syn thetic preimmune human antibody heavy chain variable domain library comprising 107 to 10' polynucleotide In certain embodiments of the invention, IGKV (1-95) is 60 sequences encoding human antibody heavy chain variable selected from the group consisting of IGKV3-20 (SEQ ID domains, said library comprising: NO. 237) (1-95), IGKV1-39 (SEQ ID NO. 233) (1-95), a) an antibody heavy chain chassis, and IGKV3-11 (SEQ ID NO: 235) (1-95), IGKV3-15 (SEQ ID b) a CDRH3 repertoire designed based on the human NO: 236) (1-95), IGKV1-05 (SEQ ID NO: 229) (1-95), IGHD and IGHJ germline sequences, as follows: IGKV4-01 (1-95), IGKV2-28 (SEQ ID NO. 234) (1-95), 65 {0 to 5 amino acids selected from the group consisting of IGKV 1-33 (1-95), IGKV1-09 (SEQ ID NO: 454) (1-95), fewer than ten of the amino acids preferentially IGKV1-12 (SEQ ID NO: 230) (1-95), IGKV2-30 (SEQ ID encoded by terminal deoxynucleotidyl transferase US 8,877,688 B2 63 64 (TdT) and preferentially functionally expressed by Stranded coding polynucleotide sequences, single-stranded human B cells, followed by non-coding polynucleotide sequences, and double-stranded (all possible N or C-terminal truncations of IGHD alone polynucleotide sequences. and all possible combinations of N and C-terminal In certain embodiments, the invention comprises a syn truncations}, followed by thetic full-length preimmune human antibody library com {0 to 5 amino acids selected from the group consisting of prising about 107 to about 10" polynucleotide sequences fewer than ten of the amino acids preferentially representative of the sequence diversity and length diversity encoded by TaT and preferentially functionally found in known heavy chain CDR3 sequences. expressed by human B cells, followed by In certain embodiments, the invention comprises a method (all possible N-terminal truncations of IGHJ, down to 10 of selecting an antibody of interest from a human antibody DXWG (SEQID NO: 530), wherein X is S, V, L, or library, comprising providing a synthetic preimmune human Y}. antibody CDRH3 library comprising a theoretical diversity of In some embodiments of the invention, the synthetic pre (N) polynucleotide sequences representative of the sequence immune human antibody heavy chain variable domain library 15 diversity and length diversity found in known heavy chain is expressed as a full length chain selected from the group CDR3 sequences, wherein the physical realization of that consisting of an IgG1 full length chain, an IgG2 full length diversity is an actual library of a size at least 3(N), thereby chain, an IgG3 full length chain, and an IgG4 full length providing a 95% probability that a single antibody of interest chain. is present in the library, and selecting an antibody of interest. In certain embodiments of the invention, the human anti In some embodiments of the invention, the theoretical body heavy chain chassis is selected from the group consist diversity is about 107 to about 10 polynucleotide sequences. ing of IGHV4-34 (SEQID NO:35), IGHV3-23 (SEQID NO: 30), IGHV5-51 (SEQID NO: 40), IGHV1-69 (SEQID NO: EXAMPLES 27), IGHV3-30 (SEQID NO:31), IGHV4-39 (SEQID NO: 36), IGHV1-2 (SEQID NO: 24), IGHV1-18 (SEQ ID NO: 25 This invention is further illustrated by the following 25), IGHV2-5 (SEQ ID NO. 429), IGHV2-70 (SEQID NO: examples which should not be construed as limiting. The 431,432), IGHV3-7 (SEQ ID NO: 28), IGHV6-1 (SEQ ID contents of all references, patents and published patent appli NO:449), IGHV1-46 (SEQID NO: 26), IGHV3-33 (SEQID cations cited throughout this application are hereby incorpo NO:32), IGHV4-31 (SEQ ID NO:34), IGHV4-4 (SEQ ID rated by reference. NO: 446,447), IGHV4-61 (SEQID NO:38), and IGHV3-15 30 In general, the practice of the present invention employs, (SEQ ID NO: 29). unless otherwise indicated, conventional techniques of chem In some embodiments of the invention, the synthetic pre istry, molecular biology, recombinant DNA technology, PCR immune human antibody heavy chain variable domain library technology, immunology (especially, e.g., antibody technol comprises 107 to 10" polynucleotide sequences encoding ogy), expression systems (e.g., yeast expression, cell-free human antibody heavy chain variable domains, said library 35 expression, phage display, ribosome display, and PROFU comprising: SIONTM), and any necessary cell culture that are within the a) an antibody heavy chain chassis, and skill of the art and are explained in the literature. See, e.g., b) a synthetic preimmune human antibody CDRH3 library. Sambrook, Fritsch and Maniatis, Molecular Cloning Cold In some embodiments of the invention, the polynucleotide Spring Harbor Laboratory Press (1989); DNA Cloning, Vols. sequences are single-stranded coding polynucleotide 40 1 and 2. (D. N. Glover, Ed. 1985); Oligonucleotide Synthesis Sequences. (M. J. Gait, Ed. 1984); PCR Handbook Current Protocols in In certain embodiments of the invention, the polynucle Nucleic Acid Chemistry, Beaucage, Ed. John Wiley & Sons otide sequences are single-stranded non-coding polynucle (1999) (Editor); Oxford Handbook of Nucleic Acid Structure, otide sequences. Neidle, Ed., Oxford Univ Press (1999); PCR Protocols: A In some embodiments of the invention, the polynucleotide 45 Guide to Methods and Applications, Innis et al., Academic sequences are double-stranded polynucleotide sequences. Press (1990); PCR Essential Techniques: Essential Tech In certain embodiments, the invention comprises a popu niques, Burke, Ed., John Wiley & Son Ltd (1996): The PCR lation of replicable cells with a doubling time of four hours or Technique: RT-PCR, Siebert, Ed., Eaton Pub. Co. (1998); less, in which a synthetic preimmune human antibody reper Antibody Engineering Protocols (Methods in Molecular toire is expressed. 50 Biology), 510, Paul, S., Humana Pr (1996); Antibody Engi In some embodiments of the invention, the population of neering: A Practical Approach (Practical Approach Series, replicable cells are yeast cells. 169), McCafferty, Ed., Irl Pr(1996); Antibodies: A Labora In certain embodiments, the invention comprises a method tory Manual, Harlow et al., C.S.H.L. Press, Pub. (1999); of generating a full-length antibody library comprising trans Current Protocols in Molecular Biology, eds. Ausubel et al., forming a cell with a preimmune human antibody heavy chain 55 John Wiley & Sons (1992); Large-Scale Mammalian Cell variable domain library and a synthetic preimmune human Culture Technology, Lubiniecki, A., Ed., Marcel Dekker, antibody light chain library. Pub., (1990); Phage Display: A Laboratory Manual, C. Bar In some embodiments, the invention comprises a method bas (Ed.), CSHL Press, (2001); Antibody Phage Display, P of generating a full-length antibody library comprising trans O'Brien (Ed.), Humana Press (2001); Border et al., Nature forming a cell with a preimmune human antibody heavy chain 60 Biotechnology, 1997, 15:553; Border et al., Methods Enzy variable domain library and a synthetic preimmune human mol., 2000, 328: 430; ribosome display as described by antibody light chain library. Pluckthun et al. in U.S. Pat. No. 6,348,315, and ProfusionTM In certain embodiments, the invention comprises a method as described by Szostak et al. in U.S. Pat. Nos. 6.258,558: of generating an antibody library comprising synthesizing 6.261,804; and 6.214.553; and bacterial periplasmic expres polynucleotide sequences by split-pool DNA synthesis. 65 Sion as described in US2004.00584.03A1. Each of the refer In some embodiments of the invention, the polynucleotide ences cited in this paragraph is incorporated by reference in sequences are selected from the group consisting of single its entirety. US 8,877,688 B2 65 66 Further details regarding antibody sequence analysis using TABLE 3 Kabat conventions and programs to screen aligned nucleotide and amino acid sequences may be found, e.g., in Johnson et IGHV Characteristics and Occurrence in Antibodies from Peripheral Blood al., Methods Mol. Biol., 2004, 248: 11: Johnson et al., Int. Estimated Relative Immunol., 1998, 10: 1801; Johnson et al., Methods Mol. GHV Length of Length of Canonical Occurrence in Biol., 1995, 51: 1; Wu et al., Proteins, 1993, 16:1; and Martin, Germline CDRH1 CDRH2 Structures' Peripheral Blood Proteins, 1996, 25: 130. Each of the references cited in this paragraph is incorporated by reference in its entirety. GHV1-2 5 7 -3 37 GHV1-3 5 7 -3 15 10 2. Further details regarding antibody sequence analysis GHV1-8 5 7 -3 13 using Chothia conventions may be found, e.g., in GHV1-18 5 7 -2 25 Chothia et al., J. Mol. Biol., 1998,278: 457; Morea et al., GHV1-24 5 7 -U 5 Biophys. Chem., 1997, 68: 9; Morca et al., J. Mol. Biol., GHV1-45 5 7 -3 O 1998, 275: 269; Al-Lazikani et al., J. Mol. Biol., 1997, GHV1-46 5 7 -3 25 15 GHV1-58 5 7 -3 2 273: 927. Barre et al., Nat. Struct. Biol., 1994, 1: 915; GHV1-69 5 7 -2 58 Chothia et al., J. Mol. Biol., 1992, 227: 799; Chothia et GHV2-5 7 6 3-1 10 al., Nature, 1989, 342: 877; and Chothia et al., J. Mol. GHV2-26 7 6 3-1 9 Biol., 1987, 196:901. Further analysis of CDRH3 con GHV2-70 7 6 3-1 13 formation may be found in Shirai et al., FEBS Lett. GHV3-7 5 7 -3 26 1999, 455: 188 and Shirai et al., FEBS Lett., 1996, 399: GHV3-9 5 7 -3 15 1. Further details regarding Chothia analysis are GHV3-11 5 7 -3 13 described, for example, in Chothia et al., Cold Spring GHV3-13 5 6 -1 3 Harb. Symp. Quant Biol., 1987, 52: 399. Each of the GHV3-15 5 9 -4 14 references cited in this paragraph is incorporated by 25 GHV3-20 5 7 -3 3 GHV3-21 5 7 -3 19 reference in its entirety. GHV3-23 5 7 -3 8O Further details regarding CDR contact considerations are GHV3-30 5 7 -3 67 described, for example, in MacCallum et al., J. Mol. Biol. GHV3-33 5 7 -3 28 GHV3-43 5 7 -3 2 1996, 262: 732, incorporated by reference in its entirety. 30 GHV3-48 5 7 -3 21 Further details regarding the antibody sequences and data GHV3-49 5 9 -U 8 bases referred to herein are found, e.g., in Tomlinson et al., J. GHV3-53 5 6 -1 7 Mol. Biol., 1992, 227: 776, VBASE2 (Retter et al., Nucleic GHV3-64 5 7 -3 2 Acids Res., 2005, 33: D671); BLAST (www.ncbi.nlm.nih GHV3-66 5 7 -3 3 .gov/BLAST/); CDHIT (bioinformatics.ljcrf.edu/cd-hi/); 35 GHV3-72 5 9 -4 2 EMBOSS (www.hgmp.mrc.ac.uk/Software/EMBOSS/); GHV3-73 5 9 -4 3 PHYLIP (evolution.genetics.washington.edu/phylip.html); GHV3-74 5 7 -3 14 and FASTA (fasta.bioch. virginia.edu). Each of the references GHV4-4 5 6 -1 33 cited in this paragraph is incorporated by reference in its GHV4-28 6 6 2-1 1 entirety. 40 GHV4-31 7 6 3-1 25 GHV4-34 5 6 -1 125 GHV4-39 7 6 3-1 63 Example 1 GHV4-59 5 6 -1 51 GHV4-61 7 6 3-1 23 45 GHV4-B 6 6 2-1 7 Design of an Exemplary VH Chassis Library GHVS-51 5 7 -2 52 GHV6-1 7 8 3-5 26 This example demonstrates the selection and design of GHV7-4-1 5 7 -2 8 exemplary, non-limiting VH chassis sequences of the inven A apted from Chothia et al., J. Mol. Biol., 1992, 227: 799 tion. VH chassis sequences were selected by examining col 50 lections of human IGHV germline sequences (Scaviner et al., 2A apted from Table S1 of Wang et al., Immunol. Cell. Biol., 2008, 86: 111 Exp. Clin. Immunogenet., 1999, 16:234: Tomlinson et al., J. In the currently exemplified library, 17 germline sequences Mol. Biol., 1992, 227: 799; Matsuda et al., J. Exp. Med., were chosen for representation in the VH chassis of the library 1998, 188: 2151, each incorporated by reference in its (Table 4). As described in more detail below, these sequences entirety). As discussed in the Detailed Description, as well as 55 were selected based on their relatively high representation in below, a variety of criteria can be used to select VH chassis the peripheral blood of adults, with consideration given to the sequences, from these data sources or others, for inclusion in structural diversity of the chassis and the representation of the library. particular germline sequences in antibodies used in the clinic. Table 3 (adapted from information provided in Scaviner et 60 These 17 sequences account for about 76% of the total sample al., Exp. Clin. Immunogenet., 1999, 16:234. Matsuda et al., of heavy chain sequences used to derive the results of Table 4. J. Exp. Med., 1998, 188: 2151; and Wang et al. Immunol. As outlined in the Detailed Description, these criteria are Cell. Biol., 2008, 86: 111, each incorporated by reference in non-limiting, and one of ordinary skill in the art will readily its entirety) lists the CDRH1 and CDRH2 length, the canoni recognize that a variety of other criteria can be used to select cal structure and the estimated relative occurrence in periph 65 the VH chassis sequences, and that the invention is not limited eral blood, for the proteins encoded by each of the human to a library comprising the 17VH chassis genes presented in IGHV germline sequences. Table 4. US 8,877,688 B2 67 68 TABLE 4 TABLE 4-continued VH Chassis Selected for Use in the Exemplary Library VH Chassis Selected for Use in the Exemplary Library VH Relative Length of Length of VH Relative Length of Length of Chassis Occurrence CDRH1 CDRH2 Comment Chassis Occurrence CDRH1 CDRH2 Comment VH1-2 37 5 17 Among highes usage for VH4 family VH1 family VH4-59 51 5 16 Among highest usage in VH1-18 25 5 17 Among highes usage for VH4 family VH1 family VH4-61 23 7 16 Among highest usage in VH1-46 25 5 17 Among highes usage for 10 VH4 family VH1 family VH4-B 7 6 16 Not among highest usage in VH1-69 58 5 17 Highest usage for VH1 VH4 family, but has unique family. The four chosen structure (H1 of length 6). VH1 chassis represent The 6 chosen VH4 chassis about 80% of the VH1 account for close to 90% of repertoire. 15 the VH4 family repertoire VH3-7 26 5 17 Among highest usage in VHS-51 52 5 17 High usage VH3 family VH3-15 14 5 19 Not among highest usage, but it has unique structure In this particular embodiment of the library, VH chassis (H2 of length 19). Highest occurrence among those derived from sequences in the IGHV2, IGHV6 and IGHV7 with Such structure. germline families were not included. As described in the VH3-23 8O 5 17 Highest usage in VH3 Detailed Description, this exemplification is not meant to be family. VH3-30 67 5 17 Among highest usage in limiting, as, in some embodiments, it may be desirable to VH3 family include one or more of these families, particularly as clinical VH3-33 28 5 17 Among highest usage in information on antibodies with similar sequences becomes VH3 family 25 available, to produce libraries with additional diversity that is VH3-48 21 5 17 Among highest usage in VH3 family. The six chosen potentially unexplored, or to study the properties and poten VH3 chassis account for tial of these IGHV families in greater detail. The modular about 70% of the VH3 design of the library of the present invention readily permits repertoire. the introduction of these, and other, VH chassis sequences. VH4-31 25 7 16 Among highest usage in 30 VH4 family The amino acid sequences of the VH chassis utilized in this 125 5 16 Highest usage in VH4 family particular embodiment of the library, which are derived from 63 7 16 Among highest usage in the IGHV germline sequences, are presented in Table 5. The details of the derivation procedures are presented below. TABLE 5 Amino Acid Sequences for VH Chassis Selected for Inclusion in the Exemplary Library

SEO ID Chassis NO: FRM1 CDRH1 FRM2 CDRH2 FRM3

WH1-2 24 OVOLVOSG GYYMH WVROAPG WINPNSG RVTMTRDTSI AEWKKPGA OGLEWMG GTNYAOK STAYMELSRL SWKWSCKA FOG RSDDTAVYYC SGYTFT AR

WH1 - 18 25 OVOLVOSG SYGIS WVROAPG WISAYNG RVTMTTDTST AEWKKPGA OGLEWMG NTNYAOK STAYMELRSL SWKWSCKA LQG RSDDTAVYYC SGYTFT AR

WH1 - 46 26 OVOLVOSG SYYMH WVROAPG IINPSGG RVTMTRDTST AEWKKPGA OGLEWMG STSYAOK STVYMELSSL SWKWSCKA FOG RSEDTAVYYC SGYTFT AR

WH1 - 69 27 OVOLVOSG SYAIS WVROAPG GIIPIFG RVTITADKST AEWKKPGS OGLEWMG TANYAOK STAYMELRSL SWKWSCKA FOG RSEDTAVYYC SGGTFS AR

WH3-7 28 EVOLVESG SYWMS WVROAPG NIKODGS RFTISRDNAK GGLVOPGG KGLEWVA EKYYVDS NSLYLOMNSL SLRLSCAA WKG RAEDTAVYYC SGFTFS AR

VH3-15' 29 EVOLVESG NAWMS WVROAPG RIKSKTD RFTISRDDSK GGLWKPGG KGLEWVG GGTTDYA INTLYLOMNSL SLRLSCAA APWKG RAEDTAVYYC SGFTFS AR US 8,877,688 B2 69 70 TABLE 5- continued Amino Acid Sequences for VH Chassis Selected for Inclusion in the Exemplary Library

SEQ ID Chassis NO: FRM1 CDRH1 FRM2 CDRH2 FRM3

WH3 - 23 3O EVOLLESG SYAMS WVROAPG AISGSGG RFTISRDNSK GGLVOPGG KGLEWWS STYYADS NTLYLOMNSL SLRLSCAA WKG RAEDTAVYYC SGFTFS AK

WH3-3 O 31 QVOLVESG SYGMH WVROAPG WISYDGS RFTISRDNSK GGVVOPGR KGLEWWA NKYYADS NTLYLOMNSL SLRLSCAA WKG RAEDTAVYYC SGFTFS AR

WH3-33 32 QVOLVESG SYGMH WVROAPG WIWYDGS RFTISRDNSK GGVVOPGR KGLEWWA NKYYADS NTLYLOMNSL SLRLSCAA WKG RAEDTAVYYC SGFTFS AR

WH3 - 48 33 EVOLVESG SYSMN WVROAPG YISSSSS RFTISRDNAK GGLVOPGG KGLEWWS TIYYADS NSLYLOMNSL SLRLSCAA WKG RAEDTAVYYC SGFTFS AR

WH4 - 31 34 QWOLOESG SGGYY WIROHPG YIYYSGS RVTISWDTSK PGLVKPSO WS KGLEWIG TYYNPSL NOFSLKLSSW TLSLTCTV KS TAADTAVYYC SGGSIS AR

VH4-34° 35 QWOLOOWG GYYWS WIROPPG EIDHSGS RVTISWDTSK AGLLKPSE KGLEWIG TNYNPSL NOFSLKLSSW TLSLTCAW KS TAADTAVYYC YGGSFS AR

WH4 - 39 36 OLOLOESG SSSYY WIROPPG SIYYSGS RVTISWDTSK PGLWKPSE WG KGLEWIG TYYNPSL NOFSLKLSSW TLSLTCTV KS TAADTAVYYC SGGSIS AR

WH4 - 59 37 QWOLOESG SYYWS WIROPPG YIYYSGS RVTISWDTSK PGLWKPSE KGLEWIG TNYNPSL NOFSLKLSSW TLSLTCTV KS TAADTAVYYC SGGSIS AR

WH4 - 61 38 QWOLOESG SGSYY WIROPPG YIYYSGS RVTISWDTSK PGLWKPSE WS KGLEWIG TNYNPSL NOFSLKLSSW TLSLTCTV KS TAADTAVYYC SGGSWS AR

WH4 - 8 39 QWOLOESG SGYYW WIROPPG SIYHSGS RVTISWDTSK PGLWKPSE G KGLEWIG TYYNPSL NOFSLKLSSW TLSLTCAW KS TAADTAVYYC SGYSIS AR

WHS-51 4 O EVOLVOSG SYWIG WVROMPG IIYPGDS OVTISADKSI AEWKKPGE KGLEWMG DTRYSPS STAYLOWSSL SLKISCKG FOG KASDTAVYYC SGYST AR

The Original KT sequence in WH3-15 was mutated to RA (bold/underlined) and TT to AR (bold/underlined), in order to match other VH3 family members selected for inclusion in the library. The modification to RA was made so that no unique sequence stretches of up to about 20 amino acids are Created. Without being bound by theory, this modification is expected to reduce the Odds of introducing novel T-cell epitopes in the WH3-15-derived chassis sequence. The avoidance of T cell epitopes is an additional Criterion that can be considered in the design of certain libraries of the invention. 2The Original NHS motif in VH4-34 was mutated to DHS, in order to remove a possible N-linked glycosylation site in CDR-H2. In certain embodiments of the invention, for example, if the library is transformed into yeast, this may prevent unwanted N-linked glycosylation.

60 Table 5 provides the amino acid sequences of the seventeen additional nucleotide would make the resulting codon encode chassis. In nucleotide space, most of the corresponding ger one of the following two amino acids: Asp (if the codon is mline nucleotide sequences include two additional nucle GAC or GAT) or Glu (if the codon is GAA or GAG). One, or otides on the 3' end (i.e., two-thirds of a codon). In most cases, both, of the two 3'-terminal nucleotides may also be deleted in those two nucleotides are GA. In many cases, nucleotides are 65 the final rearranged heavy chain sequence. If only the A is added to the 3' end of the IGHV-derived gene segment in vivo, deleted, the resulting amino acid is very frequently a G. If prior to recombination with the IGHD gene segment. Any both nucleotides are deleted, this position is “empty, but US 8,877,688 B2 71 72 followed by a general V-D addition oran amino acid encoded human heavy chain antibody sequences was analyzed (Lee et by the IGHD gene. Further details are presented in Example al., Immunogenetics, 2006, 57: 917: Jackson et al., J. Immu 5. This first position, after the CAR or CAK motif at the nol. Methods, 2007, 324: 26) and they were classified by the C-terminus of FRM3 (Table 5), is designated the “tail.” In the origin of their respective IGHV germline sequence. As an currently exemplified embodiment of the library, this residue illustrative example, about 200 sequences in the data set may be G, D, E, or nothing. Thus, adding the tail to any exhibited greatest identity to the IGHV1-69 germline, indi chassis enumerated above (Table 5) can produce one of the cating that they were likely to have been derived from IGHV1-69. Next, the occurrence of amino acid residues at following four schematic sequences, wherein the residue fol each position within the CDRH1 and CDRH2 segments, in lowing the VH chassis is the tail: each germline family selected in Example 1 was determined. (1) VH Chassis-G 10 For VH1-69, these occurrences are illustrated in Tables 6 and (2) VH Chassis-D 7. Second, neutral and/or Smaller amino acid residues were (3) VH Chassis-E favored, where possible, as replacements. Without being (4) IVH Chassis bound by theory, the rationale for the choice of these amino These structures can also be represented in the format: acid residues is the desire to provide a more flexible and less VH Chassis-G/D/E/-, 15 sterically hindered context for the display of a diversity of wherein the hyphen symbol (-) indicates an empty or null CDR sequences. position. Using the CDRH3 numbering system defined in the Defi TABLE 6 nitions section, the above sequences could be denoted to have Occurrence of Amino Acid Residues at Each Position amino acid 95 as G, D, or E, for instances (1), (2), and (3), Within IGHV1-69-derived CDRH1 Sequences respectively, while the sequence of instance 4 would have no 31 32 33 34 35 position 95, and CDRH3 proper would begin at position 96 or S Y A. I S 97. In some embodiments of the invention, VH3-66, with A. 1 O 129 O O 25 C O 1 O O 2 canonical structure 1-1 (five residues in CDRH1 and 16 for D O 5 1 O O CDRH2) may be included in the library. The inclusion of E O O O O O VH3-66 may compensate for the removal of other chassis F O 9 1 8 O from the library, which may not express well in yeast under G O O 24 O 3 H 2 11 O O 4 some conditions (e.g., VH4-34 and VH4-59). 30 I 2 O O 159 1 K 3 O O O O Example 2 L O 1O 2 5 O M 1 O O O O N 21 2 2 O 27 Design of VH Chassis Variants with Variation Within P O O 1 O O CDRH1 and CDRH2 35 Q 1 1 O O 5 R 9 O O O 1 This example demonstrates the introduction of further S 133 3 7 O 129 diversity into the VH chassis by creating mutations in the T 12 1 10 O 12 V O O 7 13 O CDRH1 and CDRH2 regions of each chassis shown in W O O O O O Example 1. The following approach was used to select the Y O 142 1 O 1 positions and nature of the amino acid variation for each 40 chassis: First, the sequence identity between rearranged TABLE 7 Occurrence of Amino Acid Residues at Each Position Within IGHV1-69-derived CDRH2 Sequences

50 51 52 S2A S3 SS S6 S7 S8 S9 6O 61 62 63 64 65 G I I P I G T A N Y A Q K F Q G 4 3 132 O O 178 O O O O O O O O O O O O O O O O 11 O 1 21 O O O 2 O O 12 4 O O 2 O 1 1 4 O 2 O 1 1 O O O O O O O O 18O O O 155 O 3 1 O O O O O O 173 O O O 4 4 O 3 O O 4 O 1 3. O 34 O 2 1 O O O O O O O 4 1 5 O O 2 156 O 3 O 1 3 O 1 O O O O O O 3 2 O O 3 1 O O O O O O O O 5 O O 132 1 O O 8 O O O O O 15 O O 3 6 O O O O 1 O 1 O O O 173 2 O 164 O 1 4 O 3 O O O 13 O 9 O 3 5 8 7 O 2 O O 1 O O O 127 15 8 3 1 O O O O O 1 O 4 8 O O O O O O O O O O O O O O O O O O O 1 1 O O O 176 O O O 1 1 O US 8,877,688 B2 73 74 The original germline sequence is provided in the second - Continued row of the tables, in bold font, beneath the residue number (Kabat system). The entries in the table indicate the number of SYTIS (SEQ ID NO: 44) times a given amino acid residue (first column) is observed at SYAIN (SEQ ID NO: 45) the indicated CDRH1 (Table 6) or CDRH2 (Table 7) position. 5 For example, at position 33 the amino acid type G (glycine) is Similarly, the analysis that produced Table 7 provided a observed 24 times in the set of IGHV1-69-based sequences basis for choosing the following single-amino acid variant that were examined. Thus, applying the criteria above, Vari sequences for VH1-69 chassis CDRH2s: ants were constructed with N at position 31, L at position 32 (H can be charged, under some conditions), G and T at posi... 10 tion 33, no variants at position 34 and N at position 35, SIIPIFGTANYAOKFOG (SEQ ID NO: 46) resulting in the following VH1-69 chassis CDRH1 single amino acid variant sequences: GIAPIFGTANYAOKFOG (SEO ID NO: 47) 15 GIIPILGTANYAOKFOG (SEQ ID NO: 48) NYAIS (SEQ ID NO: 41) GIIPIFGTASYAQKFOG (SEQ ID NO: 49)

SLAIS (SEQ ID NO: 42) A similar approach was used to design and COnStruct Var ants of the other selected chassis; the resulting CDRH1 and 2O CDRH2 variants for each of the exemplary chassis are pro SYGIS (SEQ ID NO: 43) vided in Table 8. One of ordinary skill in the art will readily recognize that the methods described herein can be applied to create variants of other VH chassis and VL chassis.

TABLE 8

WH Chassis Wariants

SEO ID SEO ID SEQ ID SEQ ID Chassis CDRH1NO: CDRH2 No : Chassis CDRH1 NO: CDRH2 NO :

- 18. O SYGIS SO WISAYNGNT 56 3-48. O SYSMN 29 YISSSSSTI 136 NYAOKLQG YYADSWKG -18. 1 NYGIS 51 WISAYNGNT 56 3-48.1' NYSMN 3 O YISSSSSTI 136 NYAOKLQG YYADSWKG

-18.2 SNGIS 52 WISAYNGNT 56 3-48.2 IYSMN 31 YISSSSSTI 136 NYAOKLQG YYADSWKG

-18.3 SYAIS 53 WISAYNGNT 56 3-48.3 SNSMN 32 YISSSSSTI 136 NYAOKLQG YYADSWKG

- 18.4 SYGIT 54 WISAYNGNT 56 3-48. 4 SYEMN 33 YISSSSSTI 136 NYAOKLQG YYADSWKG

-18.5 SYGIH 55 WISAYNGNT 56 3-48.5 SYNMN 34 YISSSSSTI 136 NYAOKLQG YYADSWKG

-18. 6 SYGIS 50 sISAYNGNT 57 3-48. 6 SYSMT 35 YISSSSSTI 136 NYAOKLQG YYADSWKG

- 18.7 SYGIS 50 WISTYNGNT 58 3-48.7 SYSMN 29 TISSSSSTI 137 NYAOKLQG YYADSWKG

- 18.8 SYGIS 50 WISPYNGNT 59 3-48.8 SYSMN 29 YISGSSSTI 138 NYAOKLQG YYADSWKG

- 18.9 SYGIS 50 WISAYNGNT 60 3-48.9 SYSMN 29 YISSSSSTI 139 YYAOKLQG LYADSVKG

-2. O GYYMH 61 WINPNSGGT 67 3-7. O SYWMS 4O NIKODGSEK 152 NYAOKFOG YYWDSWKG

-2.1 DYYMH 62 WINPNSGGT 67 3-7.1 TYWMS 41 NIKODGSEK 152 NYAOKFOG YYWDSWKG

-2.2 RYYMH 63 WINPNSGGT 67 3-7. 2 NYWMS 42 NIKODGSEK 152 NYAOKFOG YYWDSWKG

-2.3 GSYMH 64 WINPNSGGT 67 3-7.3 SSWMS 43 NIKODGSEK 152 NYAOKFOG YYWDSWKG

-2.4 GYSMH 65 WINPNSGGT 67 3-7.4 SYGMS 44 NIKODGSEK 152 NYAOKFOG YYWDSWKG

US 8,877,688 B2 82 TABLE 8- continued

WH Chassis Wariants

SEQ ID SEQ ID SEQ ID SEO ID Chassis CDRH1NO: CDRH2 No : Chassis CDRH1 NO : CDRH2 NO:

5-519 SYWIG 218 IIYPGDSDT 228

Contains an N-linked glycosylation site which can be removed, if desired, as described herein.

As specified in the Detailed Description, other criteria can TABLE 9-continued be used to select which amino acids are to be altered and the identity of the resulting altered sequence. This is true for any IGKV Gene Characteristics and Occurrence in Antibodies from Peripheral Blood heavy chain chassis sequence, or any other sequence of the 15 invention. The approach outlined above is meant for illustra Estimated tive purposes and is non-limiting. Relative Occurrence Alternative CDRL1 CDRL2 Canonical in Peripheral Example 3 IGKV Gene Names Length Length Structures' Blood Design of an Exemplary VK Chassis Library IGKV4-1 B3 17 7 3-1-(1) 83 IGKVS-2 B2 11 7 2-1-(1) 1 IGKV6-21 A10, A26 11 7 2-1-(1) 6 This example describes the design of an exemplary VK IGKV6D-41 A14 11 7 2-1-(1) O chassis library. One of ordinary skill in the art will recognize 25 Adapted from Tomlinson et al. EMBO.J., 1995, 14:4628, incorporated by reference in its that similar principles may be used to design a VW library, or entirety, The number in parenthesis refers to canonical structures in CDRL3, if one assuming a library containing both VK and V chassis. Design of a VW the most common length (see Example 5 for further detail about CDRL3). *Estimated from sets of human VK sequences compiled from the NCBI database; full set of chassis library is presented in Example 4. GInumbers provided in Appendix A, As was previously demonstrated in Example 1, for IGHV germline sequences, the sequence characteristics and occur The 14 most commonly occurring IGKV germline genes rence of human IGKV germline sequences in antibodies from 30 (bolded in column 6 of Table 9) account for just over 90% of peripheral blood were analyzed. The data are presented in the usage of the entire repertoire in peripheral blood. From the Table 9. analysis of Table 9, ten IGKV germline genes were selected for representation as chassis in the currently exemplified TABLE 9 library (Table 10). All but V1-12 and V1-27 are among the top 35 10 most commonly occurring. IGKV germline genes VH2 IGKV Gene Characteristics and Occurrence in 30, which was tenth in terms of occurrence in peripheral Antibodies from Peripheral Blood blood, was not included in the currently exemplified embodi Estimated ment of the library, in order to maintain the proportion of Relative chassis with short (i.e., 11 or 12 residues in length) CDRL1 Occurrence 40 sequences at about 80% in the final set of 10 chassis. V1-12 Alternative CDRL1 CDRL2 Canonical in Peripheral was included in its place. V1-17 was more similar to other GKV Gene Names Length Length Structures' Blood? members of the V1 family that were already selected; there GKV1-05 L12 7 2-1-(U) 69 fore, V1-27 was included, instead of V1-17. In other embodi GKV1-06 L11 7 2-1-(1) 14 ments, the library could include 12 chassis (e.g., the ten of GKV1-08 L9 7 2-1-(1) 9 GKV1-09 L8 7 2-1-(1) 24 45 Table 10 plus V1-17 and V2-30), or a different set of any “N” GKV1-12 L5, L19 7 2-1-(1) 32 chassis, chosen strictly by occurrence (Table 9) or any other GKV1-13 L4, L18 7 2-1-(1) 13 criteria. The ten chosen VK chassis account for about 80% of GKV1-16 L 7 2-1-(1) 15 the usage in the data set believed to be representative of the GKV1-17 A30 7 2-1-(1) 34 GKV1-27 A2O 7 2-1-(1) 27 entire kappa light chain repertoire. GKV1-33 O8, O18 7 2-1-(1) 43 50 GKV1-37 O14, O4 7 2-1-(1) 3 TABLE 10 GKV1-39 O2, O12 7 2-1-(1) 147 GKV1D-16 L15 7 2-1-(1) 6 VK Chassis Selected for Use in the Exemplary Library GKV1D-17 L14 7 2-1-(1) 1 GKV1D-43 L23 7 2-1-(1) 1 CDR-L1 CDR-L2 Canonical Estimated Relative Occurrence GKV1D-8 L24 7 2-1-(1) 1 55 Chassis Length Length Structures in Peripheral Blood GKV2-24 A23 6 7 4-1-(1) 8 GKV2-28 A19, A3 6 7 4-1-(1) 62 WK1-5 11 7 2-1-(U) 69 GKV2-29 A18 6 7 4-1-(1) 6 WK1-12 11 7 2-1-(1) 32 GKV2-30 A17 6 7 4-1-(1) 30 WK1-27 11 7 2-1-(1) 27 GKV2-40 O1, O11 7 7 3-1-(1) 3 WK1-33 11 7 2-1-(1) 43 GKV2D-26 A8 6 7 4-1-(1) O WK1-39 11 7 2-1-(1) 147 60 GKV2D-29 A2 6 7 4-1-(1) 2O WK2-28 16 7 4-1-(1) 62 GKV2D-30 A. 6 7 4-1-(1) 4 WK3-11 11 7 2-1-(1) 87 GKV3-11 L6 1 7 2-1-(1) 87 WK3-15 11 7 2-1-(1) 53 GKV3-15 L2 1 7 2-1-(1) 53 WK3-20 12 7 6-1-(1) 195 GKV3-20 A27 2 7 6-1-(1) 195 VK4-1 17 7 3-1-(1) 83 GKV3D-07 L2S 2 7 6-1-(1) O GKV3D-11 L20 1 7 2-1-(U) O 65 GKV3D-2O A11 2 7 6-1-(1) 2 The amino acid sequences of the selected VK chassis enu merated in Table 10 are provided in Table 11. US 8,877,688 B2 83 84 TABLEE 11 Amino Acid Sequences for VK Chassis Selected for Inclusion in the Exemplary Library SEQ ID Chassis FRM1 CDRL1 FRM2 CDRL2 FRM3 CDRL31 NO :

WK1-5 DIOMTQS RASOSI WYOOKP DASSLE GVPSRFSGSGSGT OYNSY 229 PSTLSAS SSWLA. GKAPKL S EFTLTISSLOPDD S WGDRVTI LIY FATYYC TC

WK1-12 DIOMTQS RASQGI WYOOKP AASSLOS GVPSRFSGSGSGT QANSFP 23 O PSSWSAS SSWLA. GKAPKL DFTLTISSLOPED WGDRVTI LIY FATYYC TC

WK1-27 DIOMTQS RASQGI WYOOKP AASTLOS GVPSRFSGSGSGT KYNSAP 231 PSSLSAS SNYLA GKWPKL DFTLTISSLOPED WGDRVTI LIY WATYYC TC

WK1-33 DIOMTOS QASODI WYOOKP DASNLET GVPSRFSGSGSGT OYDNLP 232 PSSLSAS SNYLN GKAPKL DFTLTISSLOPED WGDRVTI LIY IATYYC TC

WK1-39 DIOMTQS RASOSI WYOOKP AASSLOS GVPSRFSGSGSGT OSYSTP 233 PSSLSAS SSYLN GKAPKL DFTLTISSLOPED WGDRVTI LIY FATYYC TC

WK2-28 DIVMTOS RSSOSL WYLOKP LGSNRAS GVPDRFSGSGSGT OALOTP 234 PLSLPWT LHSNGY GOSPOLLIY DFTLKISRWEAED PGEPASI NYLD WGVYYC SC

WK3 - 11 EIWLTQS RASOSV WYOOKP DASNRAT GIPARFSGSGSGT ORSNWP 235 PATLSLS SSYLA GOAPRL DFTLTISSLEPED PGERATL LIY FAWYYC SC

WK3-15 EIVMTOS RASOSV WYOOKP GASTRAT GIPARFSGSGSGT OYNNWP 236 PATLSWS SSNLA GOAPRILLIY EFTLTISSLOSED PGERATL FAWYYC SC

WK3-2O EIVLTOS RASOSV WYOOKP GASSRAT GIPDRFSGSGSGT OYGSSP 237 PGTLSLS SSSYLA GOAPRL DFTLTISRLEPED PGERATL LIY FAWYYC SC

WK4 - 1 DIVMTOS KSSOSV WYOOKP WASTRES GVPDRFSGSGSGT OYYSTP 238 PDSLAVS LYSSNN GOPPKL DFTLTISSLOAED GERATI KNYLA LIY WAVYYC NC

'Note that the portion of the IGKV gene Contributing to WKCDR3 is not considered part of the chassis as described herein. The WK chassis is defined as Kabat residues 1 to 88 of the IGKV- encoded sequence, or from the start of FRM1 to the end of FRM3. The portion of the WKCDR3 sequence Contributed by the IGKV gene is referred to herein as the L3-VK region.

Example 4 TABLE 12 IGAV Gene Characteristics and Occurrence in Peripheral Blood Design of an Exemplary V Chassis Library 55 Contribution of Estimated Relative IGV Alternative Canonical IGV, Gene to Occurrence in This example, describes the design of an exemplary VW Gene Name Structures' CDRL3 Peripheral Blood chassis library. As was previously demonstrated in Examples 1-3, for the VH and VK chassis sequences, the sequence IGV3-1 3R 11-7(*) 8 11.5 s C s 9 IGV3-21 3E 11-7(*) 9 1O.S characteristics and occurrence of human IgwV germline-de- 60 IGV2-14 2A2 14-7(A) 9 10.1 rived sequences in peripheral blood were analyzed. As with IGV1-40 1E 14-7(A) 9 7.7 the assignment of other sequences set forth hereinto germline IGV3-19 3L 11-7(*) 9 7.6 families, assignment of VW sequences to a germline family IGV1-51 1B 13-7(A) 9 7.4 IGV1-44 1C 13-7(A) 9 7.0 was performed via SoDA and VBASE2 (Volpe and Kepler, IGV6-57 6A 13-7(B) 7 6.1 Bioinformatics, 2006, 22:438; Mollova et al., BMS Systems 65 IGV2-8 2C 14-7(A) 9 4.7 Biology, 2007, 1S: P30, each incorporated by reference in its IGV3-2S 3M 11-7(*) 9 4.6 entirety). The data are presented in Table 12. US 8,877,688 B2 85 86 TABLE 12-continued blood (as extrapolated from analysis of published sequences corresponding to the GI codes provided in Appendix B) were IGV Gene Characteristics and Occurrence in Peripheral Blood first discarded. From the remaining 18 germline sequences, Contribution of Estimated Relative the top occurring genes for each unique canonical structure GV Alternative Canonical IGV, Gene to Occurrence in 5 and contribution to CDRL3, as well as any germline gene Gene Name Structures' CDRL3 Peripheral Blood represented at more than the 5% level, were chosen to con GV2-23 2B2 4-7(A) 9 4.3 stitute the exemplary V chassis. The list of 11 such GV3-10 3P 1-7(*) 9 3.4 sequences is given in Table 13, below. These 11 sequences GV4-69 4B 2-11(*) 7 3.0 GV1-47 1G 3-7(A) 9 2.9 represent approximately 73% of the repertoire in the exam GV2-11 2E 4-7(A) 9 1.3 '' ined data set (Appendix B). GV7-43 7A 4-7(B) 8 1.3 GV7-46 7B 4-7(B) 8 1.1 TABLE 13 GVS-45 5C 4-11(*) 8 1.O GV4-60 4A 2-11(*) 7 0.7 V. Chassis Selected for Use in the Exemplary Libr GV10-54 8A 4-7(B) 8 0.7 15 GV8-61 10A 3-7(C) 9 0.7 CDRL1 CDRL2 Canonical Relative GV3-9 3. 1-7(*) 8 O.6 Chassis Length Length Structure Occurrence GV1-36 1A 3-7(A) 9 0.4 GV2-18 2D 4-7(A) 9 O.3 V3-1 11 7 11-7(*) 11.5 GV3-16 3A 1-7(*) 9 O.2 V3-21 11 7 11-7(*) 1O.S GV3-27 1-7(*) 7 O.2 2O V2-14 14 7 14-7(A) 10.1 GV4-3 SA 4-11(*) 8 O.2 V1-40 14 7 14-7(A) 7.7 GVS-39 4C 2-11(*) 12 O.2 V3-19 11 7 11-7(*) 7.6 GV9-49 9A 2-12(*) 12 O.2 V1-51 13 7 13-7(A) 7.4 GV3-12 3I 1-7(*) 9 O.1 V1-44 13 7 13-7(A) 7.0 V6-57 13 7 13-7(B) 6.1 Adapted from Williams et al. J. Mol. Biol. 1996: 264, 220-32. The (*) indicates that the V4-69 12 11 12-11 (*) 3.0 canonical structure is entirely defined by the lengths of CDRs L1 and L2. When distinct structures are possible for identical L1 and L2 length combinations, the structure presentin 25 V7-43 14 7 14-7(B) 1.3 a given gene is set forth as A,B, or C. WS-45 11 11 14-11(*) 1.O *Estimated from a set of human V. sequences compiled from the NCBI database; full set of GI codes set forth in Appendix B. To choose a subset of the sequences from Table 12 to serve The amino acid sequences of the selected VW chassis enu as chassis, those represented at less than 1% in peripheral merated in Table 13 are provided in Table 14, below. TABL E 14 Amino Acid Sequences for Viv Chassis Selected for Inclusion in the Exemplary Library

Chassis FRM1 CDRL1 FRM2 CDRL2 FRM3 CDRL32 W.1 - 4 O QSVLTOP TGSSSN WYOOLP GN- - - - GVPDRFSGSKSG-- QSYDSSLSG SEO ID PSVSGAP IGAGYD GTAPKL SNRPS TSASLAITGLOAEDE NO: 531 GORVTIS ---VH LIY ADYYC C

W.1-44 QSVLTOP SGSSSN WYOOLP SN- - - - GVPDRFSGSKSG-- AAWDDSLNG SEO ID PSASGTP IGSNT - GTAPKL NORPS TSASLAISGLOSEDE NO: 532 GORVTIS ---VN LIY ADYYC C

W1-51 QSVLTOP SGSSSN WYOOLP DN - - - - GIPDRFSGSKSG-- GTWDSSLSA SEO ID PSVSAAP IGNNY - GTAPKL NKRPS TSATLGITGLOTGDE NO: 533 GOKVTIS - - -WS LIY ADYYC C

W2-14 QSALTOP TGTSSD WYOOHP EV- - - - GVSNRFSGSKSG-- SSYTSSSTL SEO ID ASVSGSP WGGYNY GKAPKL SNRPS NTASLTISGLOAEDE NO: 534 GOSITIS - - -WS MIY ADYYC C

V) 3-1 SYELTOP SGDKLG WYOOKP QD- - - - GIPERFSGSNSG-- QAWDSSTA SEO ID PSVSVSP DKY-- - GOAPWL SKRPS NTATLTISGTOAMDE NO: 535 GOTASIT - - -AS WIY ADYYC C

W3-19 SSELTOD QGDSLR WYOOKP GK- - - - GIPDRFSGSSSG-- NSRDSSGNH SEQ ID PAVSVAL SYY- - - GOAPWL NNRPS NTASLTITGAOAEDE NO: 536 GOTVRIT - - -AS MY ADYYC C

W3-21 SYWLTOP GGNNIG WYOOKP YD- - - - GIPERFSGSNSG-- OVWDSSSDH SEQ ID PSVSWAP SKS- - - GOAPWL SDRPS NTATLTISRWEAGDE NO: 537 GKTARIT - - -WH WIY ADYYC C US 8,877,688 B2 87 88 TABLE 14- Continued Amino Acid Sequences for W. Chassis Selected for Inclusion in the Exemplary Library

Chassis FRM1 CDRL1 FRM2 CDRL2 FRM3 CDRL3? W4- 69 OLVLTQS TLSSGH WHOOOP LNSDGS GIPDRFSGSSSG-- OTWGTGI- - SEO ID PSASASL SSYA- - EKGPRY HISKGD AERYLTISSLOSEDE NO: 538 GASWKLT - - - IA LMK ADYYC C V6-57 NFMLTOP TRSSGS WYOORP ED- - - - GWPDRFSGSIDSSSN QSYDSSN-- SEO ID HSVSESP IASNY- GSSPTT NORPS SASLTISGLKTEDEA NO: 539 GKTVTIS - - - VO MY DYYC C V5-45 OAVLTOP TLRSG I WYOOKP YKSDSD GVPSRFSGSKDASAN MIWHSSAS SEO ID ASLSASP NWGTYR GSPPOY KQOGS AGILLISGLOSEDEA NO: 54 O GASASLT - - - IY ILR DYYC C

W.7-43 QTVWTOE ASSTGA WFOOKP ST- - - - WTPARFSGSLLG-- LLYYGGAQ SEO ID PSLTVSP WTSGYY GOAPRA SNKHS GKAALTLSGWOPEDE NO: 541 GGTWTLT - - - PN LIY AEYYC C The last amino acid in CDRL1 of the V.3-1 chassis, S, differs from the corresponding One in the IGV3-1 germline gene, C. This was done to avoid having a potentially unpaired CYS (C) amino acid in the resulting synthetic light chain. Note that, as for the VK chassis, the portion of the IGV gene contributing to W.CDR3 is not considered part of the chassis as described herein. The V. chassis is defined as Kabat residues 1 to 88 of the IGV- encoded sequence, or from the start of FRM1 to the end of FRM3. The portion of the V.CDR3 sequence contributed by the IGV gene is referred to here in as the L3-V. region. Example 5 rated by reference in its entirety), with preference for repre 30 sentation in the library given to those IGHD genes most Design of a CDRH3 Library frequently observed in human sequences. Second, the degree of deletion on either end of the IGHD gene segments was This example describes the design of a CDHR3 library estimated by comparison with known heavy chain sequences, from its individual components. In nature, the CDRH3 using the SoDA algorithm (Volpe et al., Bioinformatics, sequence is derived from a complex process involving recom 35 2006, 22:438, incorporated by reference in its entirety) and bination of three different genes, termed IGHV. IGHD and sequence alignments. For the presently exemplified library, IGHJ. In addition to recombination, these genes may also progressively deleted DH segments, as short as three amino undergo progressive nucleotide deletions: from the 3' end of acids, were included. As enumerated in the Detailed Descrip the IGHV gene, either end of the IGHD gene, and/or the 5' end 40 tion, other embodiments of the invention comprise DH seg of the IGHJ gene. Non-templated nucleotide additions may ments with deletions to a different length, for example, about also occur at the junctions between the V, D and J sequences. 1, 2, 4, 5, 6, 7, 8, 9, or 10 amino acids. Table 15 shows the Non-templated additions at the V-Djunction are referred to as relative occurrence of IGHD gene usage in human antibody “N1, and those at the D-J junction are referred to as “N2. heavy chain sequences isolated mainly from peripheral blood The D gene segments may be read in three forward and, in 45 B cells (list adapted from Lee et al., Immunogenetics, 2006, Some cases, three reverse reading frames. 57: 917, incorporated by reference in its entirety). In the design of the present exemplary library, the codon (nucleotide triplet) or single amino acid was designated as a TABLE 1.5 fundamental unit, to maintain all sequences in the desired Usage of IGHD Genes Based on reading frame. Thus, all deletions or additions to the gene 50 Relative Occurrence in Peripheral Blood segments are carried out via the addition or deletion of amino acids or codons, and not single nucleotides. According to the Estimated Relative Occurrence CDRH3 numbering system of this application, CDRH3 GHD Gene in Peripheral Blood extends from amino acid number 95 (when present; see GHD3-10 117 55 GHD3-22 111 Example 1) to amino acid 102. GHD6-19 95 GHD6-13 93 Example 5.1 GHD3-3 82 GHD2-2 63 Selection of the DHSegments GHD4-17 61 GHD1-26 51 60 GHD5-5/5-18 49 In this illustrative example, selection of DH gene segments GHD2-15 47 for use in the library was performed according to principles GHD6-6 38 similar to those used for the selection of the chassis GHD3-9 32 GHDS-12 29 sequences. First, an analysis of IGHD gene usage was per GHDS-24 29 formed, using data from Lee et al., Immunogenetics, 2006, 65 GHD2-21 28 57: 917; Corbett et al., PNAS, 1982, 79: 4118; and Souto GHD3-16 18 Carneiro et al., J. Immunol., 2004, 172: 6790 (each incorpo US 8,877,688 B2 89 90 TABLE 15-continued TABLE 17 Usage of IGHD Genes Based on D Genes Selected for use in the Exemplary Library Relative Occurrence in Peripheral Blood Amino Acid Total Number Estimated Relative Occurrence GHD Gene Sequence SEQID NO: of Variants? IGHD Gene in Peripheral Blood GHD1-26 1 GIVGATT 13 15 IGHD4-23 1 GHD1-26 3 YSGSYY 14 10 IGHD1-1 GHD2-2 2 GYCSSTSCYT 10 93 IGHD1-7 GHD2-2 3 DIVVVPAAM 11 28 IGHD4-44-11° 10 GHD2-1S 2 GYCSGGSCYS 16 9 IGHD1-20 GHD3-33 ITIFGVVII 9 28 IGHD7-27 GHD3-10 1 WLLWFGELL 1 28 IGHD2-8 GHD3-10 2 YYYGSGSYYN 2 36 IGHD6-2S GHD3-10 3 ITMVRGVII 3 28 GHD3-22 2 YYYDSSGYYY 4 36 Although distinct genes in the genome, the nucleotide sequences of IGHD5-5 and IGH GHD4-17 2 DYGDY 12 6 18 are 100% identical and thus indistinguishable in rearranged VH sequences. 15 *IGHD4-4 and IGHD4-11 are also 100% identical. GHD5-53 GYSYGY 15 10 Adapted from Lee et al. Immunogenetics, 2006, 57.917, by merging the information for GHD6-13 1 GYSSSWY 7 15 distinct alleles of the same IGHD gene. GHD6-13 2 GIAAAG 8 10 *IGHD1-14 may also be included in the libraries of the invention. GHD6-191 GYSSGWY 5 15 GHD6-19 2 GIAVAG 6 10 The translations of the ten most commonly expressed The rea ing frame (RF) is specified as RF after the name of the gene. IGHD gene sequences found in naturally occurring human *In most cases the total number of variants is given by (N-1) times (N-2) divided by two, antibodies, in three reading frames, are shown in Table 16. where N is the total length in amino acids of the intact D segment, Those reading frames which occur most commonly in periph As detailed herein, the number of variants for segments containingaputative disulfide bond eral blood have been highlighted in gray. As in Table 15, data (two C or Cys residues) is limited in this illustrative embodiment, regarding IGHD sequence usage and reading frame statistics For each of the selected sequences of Table 17, variants were derived from Lee et al., 2006, and data regarding IGHD 25 were generated by systematic deletion from the N- and/or sequence reading frame usage were further complemented by C-termini, until there were three amino acids remaining. For data derived from Corbett et al., PNAS, 1982, 79: 4118 and example, for the IGHD4-17 2 above, the full sequence Souto-Cameiro et al., J. Immunol, 2004, 172: 6790, each of DYGDY (SEQ ID NO: 12) may be used to generate the which is incorporated by reference in its entirety. progressive deletion variants: DYGD (SEQ ID NO: 613), TABLE 16 Translations of the Ten Most Common Naturally Occurring IGHD Sequences, in Three Reading Frames (RF)

IGHD RF1 SEQID NO SEQ ID NO IGHD3-10 2 IGHD3-22 4 240 IGHD6-19 6 IGHD6-13

IGHD3-03 IGHD2-02 WIL#YQLLC 245 46

IGHD2-15 RILEww.LLL 251 # represents a stop codon, Reading frames highlighted in gray correspond to the most commonly used reading frames,

In the presently exemplified library, the top 10IGHD genes YGDY (SEQID NO: 614), DYG, GDY andYGD. In general, most frequently used in heavy chain sequences occurring in for any full-length sequence of size N, there will be a total of peripheral blood were chosen for representation in the library. (N-1)*(N-2)/2 total variants, including the original full Other embodiments of the library could readily utilize more 55 sequence. For the disulfide-loop-encoding segments, as or fewer D genes. The amino acid sequences of the selected exemplified by reading frame 2 of both IGHD2-2 and IGHD2-15, (i.e., IGHD2-2 2 and IGH2-15 2), the progres IGHD genes, including the most commonly used reading sive deletions were limited, so as to leave the loop intact i.e., frames and the total number of variants after progressive N only amino acids N-terminal to the first Cys, or C-terminal to and C-terminal deletion to a minimum of three residues, are 60 the second Cys, were deleted in the respective DH segment listed in Table 17. As depicted in Table 17, only the most variants. The foregoing strategy was used to avoid the pres commonly occurring alleles of certain IGHD genes were ence of unpaired cysteine residues in the exemplified version of the library. However, as discussed in the Detailed Descrip included in the illustrative library. This is, however, not tion, other embodiments of the library may include unpaired required, and other embodiments of the invention may utilize 65 cysteine residues, or the Substitution of these cysteine resi IGHD reading frames that occur less frequently in the periph dues with otheramino acids. In the cases where the truncation eral blood. of the IGHD gene is limited by the presence of the Cys

US 8,877,688 B2 95 96 TABLE 18-continued TABLE 19-continued DH Gene Segments Used in Length Distributions of DHSegments Selected for the Presently Exemplified Library Inclusion in the Exemplary Library DHSegment Number of Designation' Peptide SEQID NO: DH Size Occurrences IGHD6-13 2-3 IAA 9 12 IGHD6-13 2-4 AAAG 803 10 4 IGHD6-13 2-5 GIAA 804 IGHD6-13 2-6 IAAA 805 10 IGHD6-13 2-7 GIAAA 806 As specified above, based on the CDRH3 numbering sys IGHD6-13 2-8 IAAAG 807 tem defined in this application, IGHD-derived amino acids IGHD6-13 2-9 GIAAAG 8 (i.e., DH segments) are numbered beginning with position 97. The sequence designation isformatted as follows: (IGH DGene Name) (Reading Frame)- followed by positions 97A, 97B, etc. In the currently exem (Variant Number) plified embodiment of the library, the shortest DH segment *Note that the origin of certain variants is rendered somewhat arbitrary when redundant 15 segments are deleted from the library (i.e., certain segments may have their origins withmore has three amino acids: 97,97A and 97B, while the longest DH than one parent, including the one specified in the table), segment has 10 amino acids: 97,97A,97B, 97C,97D,97E, Table 19 shows the length distribution of the 278 DH 97F, 97G,97H and 97I. segments selected according to the methods described above. Example 5.2 TABLE 19 Selection of the H3-JH Segments Length Distributions of DHSegments Selected for Inclusion in the Exemplary Library There are six human germline IGHJ genes. During in vivo assembly of antibody genes, these segments are progressively Number of 25 deleted at their 5' end. In this exemplary embodiment of the DH Size Occurrences library, IGHJ gene segments with no deletions, or with 1, 2, 3, 78 4, 5, 6, or 7 deletions (at the amino acid level), yielding JH 64 segments as short as 13 amino acids, were included (Table 50 20). Other embodiments of the invention, in which the IGHJ 38 30 27 gene segments are progressively deleted (at their 5"/N-termi 2O nal end) to yield 15, 14, 12, or 11 amino acids are also contemplated. TABL E IGHU Gene Segments Selected for use in the Exemplary Library

ent H3-JH)-FRM4) SEO ID NO: H3 - JH SEQ ID NO: JH1 parent or JH1 1 AEYFQHWGQGTLVTVSS 253 AEYFOH 17 JH1 2 EYFOHWGOGTLVTVSS 8O8 EYFOH 83 O JH1 3 YFOHWGOGTLVTVSS 809 YFOH 831 JH1 4 FOHWGOGTLWTVSS 810 FOH JH1 5 OHWGOGTLVTVSS 811 OH JH2 parent or JH2 1 YWYFDLWGRGTLVTVSS 254 YWYFDL 18

JH2 2 WYFDLWGRGTLWTWSS 812 WYFDL 832

JH2 3 YFDLWGRGTLWTWSS 813 YFDL 833

JH2 4 FDLWGRGTLWTWSS 814 FDL

JH2 5 DLWGRGTLWTWSS 815 DL

JH3 parent or JH3 1 AFDWWGQGTMVTVSS 255 AFDW 19 JH3 2 FDWWGOGTMVTVSS 816 FDW

JH3 3 DWWGOGTMVTVSS 817 DV JH4 parent or JH4 1 YFDYWGQGTLVTVSS 256 YFDY 2O JH4 2 FDYWGOGTLWTVSS 818 FDY

JH4 3 DYWGOGTLVTVSS 819 DY

JH5 parent or JH5 1 NWFDSWGQGTLVTVSS 257 NWFDS 21 US 8,877,688 B2 97 98 TABLE 2O - continued IGHJ Gene Seciments Selected for use in the Exemplary Library

ent H3-JH)-FRM4 SEQ ID NO: H3- JH SEO ID NO: JH5 2 WFDSWGOGTLVTVSS 82O WFDS 834

JH5 3 FDSWGOGTLVTVSS 821 FDS

JH5 4 DSWGOGTLVTVSS 822 DS JH6 parent or JH6 1. YYYYYGMDVWGQGTTVTVSS 258 YYYYYGMDV 22

JH6 2 YYYYGMDVWGOGTTWTVSS 823 YYYYGMDV 835

JH6 3 YYYGMDWWGOGTTWTVSS 824 YYYGMDW 836

JH6 4 YYGMDVWGOGTTWTVSS 825 YYGMDW 837

JH6 5 YGMDVWGOGTTWTVSS 826 YGMDW 838

JH6 6 GMDVWGOGTTWTVSS 827 GMDW 839

JH6 7 MDVWGOGTTWTVSS 82.8 MDV

'H3-JH is defined as the portion of the IGHJ segment included within the Kabat definition of CDRH3; FRM4 is defined as the portion of the IGHJ segment encoding framework region four.

According to the CDRH3 numbering system of this appli ing naturally occurring N1 and N2 segments in human anti cation, the contribution of for example, JH6 1 to CDRH3. bodies. According to data compiled from human databases would be designated by positions 99F, 99E, 99D, 99C,99B, 30 (see, e.g., Jackson et al., J. Immunol. Methods, 2007, 324:26, 99A, 100, 101 and 102 (Y.Y.Y.Y.Y. G. M., D and V, respec incorporated by reference in its entirety), there are an average tively). Similarly, the JH4 3 sequence would contribute of about 3.02 amino acid insertions for N1 and about 2.4 amino acid positions 101 and 102 (D and Y, respectively) to amino acid insertions for N2, not taking into account inser CDRH3. However, in all cases of the exemplified library, the tions of two nucleotides or less. FIG. 2 shows the length JH segment will contribute amino acids 103 to 113 to the 35 distributions of the N1 and N2 regions in human antibodies. FRM4 region, in accordance with the standard Kabat num In this exemplary embodiment of the invention, N1 and N2 bering system for antibody variable regions (Kabat, op. cit. were fixed to a length of 0, 1, 2, or 3 amino acids. The 1991). This may not be the case in other embodiments of the naturally occurring composition of these sequences in human library. antibodies was used as a guide for the inclusion of different 40 amino acid residues. Example 5.3 The naturally occurring composition of single amino acid, two amino acids, and three amino acids N1 additions is Selection of the N1 and N2 Segments defined in Table 21, and the naturally occurring composition of the corresponding N2 additions is defined in Table 22. The While the consideration of V-D-J recombination enhanced 45 most frequently occurring duplets in the N1 and N2 set are by mimicry of the naturally occurring process of progressive compiled in Table 23. deletion (as exemplified above) can generate enormous diver sity, the diversity of the CDRH3 sequences in vivo is further amplified by non-templated addition of a varying number of TABLE 21 nucleotides at the V-Djunction and the D-Jjunction. 50 Composition of Naturally Occurring 1, 2, and N1 and N2 segments located at the V-D and D-Jjunctions, 3 Amino Acid N1 Additions respectively, were identified in a sample containing about Number of Number of Number of 2,700 antibody sequences (Jackson et al., J. Immunol. Meth Position Occur- Position Occur- Position Occur ods, 2007, 324: 26) also analyzed by the SoDA method of (CES 2 (CES 3 (CES 55 Volpe et al., Bioinformatics, 2006, 22:438-44; (both Jackson R 251 G 97 G 101 et al., and Volpe et al., are incorporated by reference in their G 249 P 67 R 66 entireties). Examination of these sequences revealed patterns C 173 R 67 P 47 in the length and composition of N1 and N2. For the construc 130 S 42 S 47 tion of the currently exemplified CDRH3 library, specific S 117 L 39 L 38 A. 84 V 33 A. 33 short amino acid sequences were derived from the above 60 V 62 E 24 V 28 analysis and used to generate a number of N1 and N2 seg K 61 A. 21 T 27 ments that were incorporated into the CDRH3 design, using I 55 D 18 E 24 Q 51 I 18 D 22 the synthetic scheme described herein. T 51 T 18 K 18 As described in the Detailed Description, certain embodi D 50 K 16 F 14 ments of the invention include N1 and N2 segments with 65 E 49 Y 16 I 13 rationally designed length and composition, informed by sta 3 H 13 W 13 tistical biases in these parameters that are found by compar US 8,877,688 B2 99 100 TABLE 21-continued TABLE 23-continued Composition of Naturally Occurring 1, 2, and Top Twenty-Five Naturally 3 Amino Acid N1 Additions* Occurring N1 and N2 Duplets Number of Number of Number of 5 Number of Cumulative Individual Position Occur- Position Occur- Position Occur Sequence Occurrences Frequency Frequency 1 (CES 2 (CES 3 (CES VG 6 O.378 O.O13 H 32 F 12 N 10 KG 6 O.389 O.O11 N 30 Q 11 Y 10 GW 5 O.400 O.O11 W 28 N 5 H 8 10 FP 5 O411 O.O11 Y 21 W 5 Q 5 LG 5 O422 O.O11 M 16 C 4 C 3 RS 5 O.433 O.O11 C 3 M 4 M 3 TP 5 0.444 O.O11 EG 5 O.4S5 O.O11 1546 530 530 15 *Defined as the sequence C-terminal to “CARX” (SEQ DNO: 840), or equivalent, of VH, wherein “X” is the “tail” (e.g., D, E, G, or no amino aci residue), Example 5.3.1

TABLE 22 Selection of the N1 Segments Composition of Naturally Occurring 1, 2, and 3 Analysis of the identified N1 segments, located at the junc Amino Acid N2 Additions* tion between V and D, revealed that the eight most frequently Number of Number of Number of occurring amino acid residues were G. R. S. P. L. A. T and V Position Occur- Position Occur- Position Occur 1 (CES 2 (CES 3 (CES (Table 21). The number of amino acid additions in the N1 25 segment was frequently none, one, two, or three (FIG. 2). The G 242 G 244 G 1S6 addition of four or more amino acids was relatively rare. P 219 P 138 C 79 Therefore, in the currently exemplified embodiment of the R 18O R 86 S S4 L 132 S 85 R 51 library, the N1 segments were designed to include Zero, one, S 123 T 77 49 two or three amino acids. However, in other embodiments, N1 A. 97 L 74 A. 41 30 segments of four, five, or more amino acids may also be T 78 A. 69 T 31 utilized. G and P were always among the most commonly V 75 V 46 V 29 E 57 E 41 D 23 occurring amino acid residues in the N1 regions. Thus, in the D 56 Y 38 E 23 present exemplary embodiment of the library, the N1 seg F S4 D 36 W 23 ments that are dipeptides are of the form GX, XG. PX, or XP, H S4 K 30 Q 19 35 where X is any of the eight most commonly occurring amino Q 53 F 29 17 I 49 W 27 Y 17 acids listed above. Due to the fact that G residues were N 45 H 24 H 16 observed more frequently than Presidues, the tripeptide Y 40 I 23 I 11 members of the exemplary N1 library have the form GXG. K 35 Q 23 K 11 W 29 N 21 N 8 GGX, or XGG, where X is, again, one of the eight most M 2O M 8 C 6 40 frequently occurring amino acid residues listed above. The C 6 C 5 M 6 resulting set of N1 sequences used in the present exemplary embodiment of the library, include the “Zero” addition 1644 1124 670 amounts to 59 sequences, which are listed in Table 24. *Defined as the sequence C-terminal to the D segment but not encoded by IGHJ genes, 45 TABL E 24 N1 Sequences Selected for Inclusion TABLE 23 in the Exemplary Library Top Twenty-Five Naturally Occurring N1 and N2 Duplets Segment 50 Type Sequences Number Number of Cumulative Individual Zero' (no addition) V segment 1. Sequence Occurrences Frequency Frequency joins directly to D segment GG 17 O.O37 O.O37 PG 15 O.O70 O.O33 Monomers G. P. R. A. S. L T, V 8 RG 15 O. 103 O.O33 55 PP 13 O.132 O.O29 Dimers GG, GP, GR, GA, GS, GL, GT, 28 GP 12 O.158 O.O26 GV, PG, RG, AG, SG, LG, TG, GL 11 O.182 O.O24 WG, PP, PR, PA, PS, PL, PT PT 10 O.204 O.O22 PV, RP AP, SP, LP, TP VP TG 10 O.226 O.O22 GV 9 O.246 O.O20 Trimers. GGG GPG, GRG, GAG, GSG 22 RR 9 O.266 O.O20 60 GLG GTG, GWG, PGG RGG, SG 8 O.284 O.O18 AGG, SGG LGG, TGG WGG, RP 7 O.299 O.O15 GGP, GGR, GGA, GGS, GGL, GGT, GGV G 6 O.312 O.O13 GS 6 O.325 O.O13 SR 6 O.338 O.O13 In accordance with the CDRH3 numbering system of the PA 6 O.352 O.O13 65 application, the sequences enumerated in Table 24 contribute LP 6 O.365 O.O13 the following positions to CDRH3: the monomers contribute position 96, the dimers to 96 and 96A, and the trimers to 96, US 8,877,688 B2 101 102 96A and 96B. In alternative embodiments, where tetramers occurring antibodies, and including N1 and N2 segments of and longer segments could be included among the N1 four, five, or more amino acids in the library. Tables 21 to 23 sequences, the corresponding numbers would go on to and FIG. 2 provides information about the composition and include 96C, and so on. length of the N1 and N2 sequences in naturally occurring antibodies that is useful for the design of additional N1 and Example 5.3.2 N2 regions which mimic the natural composition and length. In accordance with the CDRH3 numbering system of the application, N2 sequences will begin at position 98 (when Selection of the N2 Segments present) and extend to 98A (dimers) and 98B (trimers). Alter 10 native embodiments may occupy positions 98C,98D, and so Similarly, analysis of the identified N2 segments, located at O. the junction between D and J, revealed that the eight most frequently occurring amino acid residues were also G. R. S. P. Example 5.4 L.A.T and V (Table 22). The number of amino acid additions in the N2 segment was also frequently none, one, two, or three A CDRH3 Library (FIG. 2). For the design of the N2 segments in the exemplary 15 library, an expanded set of sequences was utilized. Specifi When the “tail” (i.e., G/D/E/-) is considered, the CDRH3 cally, the sequences in Table 25 were used, in addition to the in the exemplified library may be represented by the general 59 sequences enumerated in Table 24, for N1. formula:

TABL E 25 In the currently exemplified, non-limiting, embodiment of the library, G/D/E/- represents each of the four possible Extra Sedulences in N2 Additions terminal amino acid “tails': N1 can be any of the 59 Segment Number Number sequences in Table 24; DH can be any of the 278 sequences in Type Sequence New Total Table 18; N2 can be any of the 141 sequences in Tables 24 and 25 25; and H3-JH can be any of the 28 H3-JH sequences in Monomers D, E, F, H, I, K, M, Q, 1O 18 Table 20. The total theoretical diversity or repertoire size of W, Y this CDRH3 library is obtained by multiplying the variations Dimers AR, AS, AT, AY, DL, DT, 54 82 at each of the components, i.e., 4x59x278x141x28–2.59x EA, EK, FH FS, HL HW, 10. IS, KW LD, LE, LR, LS, 30 However, as described in the previous examples, redun LT, NR, NT, OE, QL, QT, RA, RD, RE, RF, RH, RL dancies may be eliminated from the library. In the presently RR, RS, RW, SA, SD, SE, exemplified embodiment, the tail and N1 segments were SF, SI, SK, SL, SO, SR, combined, and redundancies were removed from the library. SS, ST, SW, TA TR, TS For example, considering the VH chassis, tail, and N1 TT, TW, VD, WS, WS YS 35 regions, the sequence IVH Chassis-G may be obtained in Trimers AAE, AYH, DTL, EKR ISR, 18 4 O two different ways: VH Chassis+G+nothing or VH NTP, PKS PRP, PTA, PTO Chassis+nothing+G. Removal of redundant sequences REL RPL SAA SAL. SGL, resulted in a total of 212 unique IG/D/E/--N1 segments SSE, TGL, WGT out of the 236 possible combinations (i.e., 4 tailsx59 N1). 40 Therefore, the actual diversity of the presently exemplified The presently exemplified embodiment of the library, CDRH3 library is 212x278x141x28–2.11x10. FIG. 23 therefore, contains 141 total N2 sequences, including the depicts the frequency of occurrence of different CDRH3 “Zero” state. One of ordinary skill in the art will readily lengths in this library, versus the preimmune repertoire of Lee recognize that these 141 sequences may also be used in the N1 et al. region, and that Such embodiments are within the scope of the 45 Table 26 further illustrates specific exemplary sequences invention. In addition, the length and compositional diversity from the CDRH3 library described above, using the CDRH3 of the N1 and N2 sequences can be further increased by numbering system of the present application. In instances utilizing amino acids that occur less frequently than G. R. S. where a position is not used, the hyphen symbol (-) is P. L. A. T and V, in the N1 and N2 regions of naturally included in the table instead. TABL E 26 Examples of Designed CDRH3 Sequences According to the Library Exemplified in Examples 1 to 5

Tail N1 DH N2 H3 - JH CDRH3 95 96 96A96B 97 97A97B 97C 97D 97E97F 97G 97H97I 98 98A 98B 99E99D99C99B 99A 99 100 101102 Length No. 1 G - - - Y Y Y D. W. 6

No. 2 D G - - G Y C S G. G. S Y S Y ------F Q H 16

No. 3 E R - - I T I F G W - - - - G G ------Y F D Y 14

No. 4 P P - W L L W. F. G E L D D. L. 14

No. 5 G G. S G Y Y Y G. S G. S Y N P - - - - - A E Y F Q H 21

No. 6 D - - - R G W I I - - - - - M - - Y Y Y Y Y G M D W 16 US 8,877,688 B2 103 104 TABLE 26- continued Examples of Designed CDRH3 Sequences According to the Library Exemplified in Examples 1 to B

Tail N1 DH N2 H3 - JH CDRH3 95 96 96A96B 97 97A97B97C 97D 97E97F 97G 97H97I 98 98A 98B 99E99D 99C99B 99A 99 100 101102 Length

No. 7 E S G - Y Y Y D S S G Y Y T G L - - - - W Y F D L 21

No. 8 S - - D Y G D Y - - - - S I ------F D I 11

No. 9 P G - W. F. G ------P S - - - - Y Y G M D W 13

No. 10 - - - C S G. G. S C - - - A Y - - - - - N W F D P 13

Sequence Identifiers: No. 1 (SEQ ID NO: 542) ; No. 2 (SEQ ID NO : 543) ; No. 3 (SEQ ID NO: 544); No. 4 (SEQ ID NO: 545); No. 5 (SEQ ID NO: 546) ; No. 6 (SEQ ID NO: 547); No. 7 (SEQ ID NO: 548); No. 8 (SEQ ID NO: 549) ; No. 9 (SEQ ID NO: 55O) ; No. 10 (SEQ ID NO: 551).

Example 6 TABLE 27-continued Design of VKCDR3 Libraries IGK Gene Usage in Various Data Sets This example describes the design of a number of exem Gene Klein Juul LUA plary VKCDR3 libraries. As specified in the Detailed 25 IGKJ3 7.0% 8.0% 12.1% Description, the actual version(s) of the VKCDR3 library IGK4 26.0% 24.0% 26.5% made or used in particular embodiments of the invention will IGKS 6.0% 18.0% 8.0% depend on the objectives for the use of the library. In this example the Kabat numbering system for light chain variable regions was used. 30 Thus, a simple combinatorial of “MVK chassis and the 5 In order to facilitate examination of patterns of occurrence, IGKJ genes would generate a library of size Mx5. In the human kappa light chain sequences were obtained from the Kabat numbering system, for VKCDR3 of length nine, amino publicly available NCBI database (Appendix A). As for the acid number 96 is the first encoded by the IGKJ gene. Exami heavy chain sequences (Example 2), each of the sequences nation of the amino acid occupying this position in human obtained from the publicly available database was assigned to 35 sequences showed that the seven most common residues are its closest germline gene, on the basis of sequence identity. L. Y. R. W. F. P. and I, cumulatively accounting for about 85% The amino acid compositions at each position were then of the residues found in position 96. The remaining 13 amino determined within each kappa light chain Subset. acids account for the other 15%. The occurrence of all 20 amino acids at position 96 is presented in Table 28. Example 6.1 40 A Minimalist VKCDR3 Library TABLE 28 This example describes the design of a “minimalist” Occurrence of 20 Amino Acid Residues at VKCDR3 library, wherein the VKCDR3 repertoire is Position 96 in Human VK Data Set 45 restricted to a length of nine residues. Examination of the Type Number Percent Cumulative VKCDR3 lengths of human sequences shows that a dominant 333 22.3 22.3 proportion (over 70%) has nine amino acids within the Kabat Y 235 15.8 38.1 definition of CDRL3: positions 89 through 97. Thus, the R 222 14.9 52.9 currently exemplified minimalist design considers only W 157 1O.S 63.5 50 148 9.9 734 VKCDR3 of length nine. Examination of human kappa light 96 6.4 79.8 chain sequences shows that there are not strong biases in the C 90 6.O 85.9 usage of IGKJ genes; there are five such IKJ genes inhumans. Q 53 3.6 89.4 Table 27 depicts IGKJ gene usage amongst three data sets, N 39 2.6 92.0 namely Juul et al. (Clin. Exp. Immunol., 1997, 109: 194, H 31 2.1 94.1 55 V 21 1.4 95.5 incorporated by reference in its entirety), Klein and Zachau G 2O 1.3 96.8 (Eur. J. Immunol., 1993, 23: 3248, incorporated by reference C 14 O.9 97.8 in its entirety), and the kappa light chain data set provided in K 7 O.S 98.3 S 6 0.4 98.7 Appendix A (labeled LUA). A. 5 O.3 99.0 60 D 5 O.3 99.3 TABLE 27 E 5 O.3 99.7 T 5 O.3 1OOO IGKJ Gene Usage in Various Data Sets M O O.O 1OOO Gene Klein Juul LUA

IGK1 35.0% 29.0% 29.3% 65 To determine the origins of the seven residues most com IGK2 25.0% 23.0% 24.1% monly found in position 96, known human IGKJamino acid sequences were examined (Table 29). US 8,877,688 B2 105 106 TABL E 29 than G. In some embodiments, it may be advantageous to produce a minimalist pre-immune repertoire with a higher Known Human IGKU Amino Acid Sequences degree of conformational flexibility. Considering the ten VK Gene Sequence SEQ ID NO: chassis depicted in Table 11, one implementation of the mini malist VKCDR3 library would have 70 members resulting IGKJ1 WTFGOGTKVEIK 552 from the combination of 10 VK chassis by 7 junction (posi tion 96) options and one IGKJ-derived sequence (e.g., IGKJ4 IGKJ2 YTFGOGTKLEIK 553 (SEQ ID NO:555). Although this embodiment of the library IGKJ3 FTFGPGTKVDIK 554 has been depicted using IGKJ4 (SEQ ID NO: 555), it is 10 possible to design a minimalist VKCDR3 library using one of IGKJ4 LTFGGGTKWEIK 555 the other four IGKJ sequences. For example, another embodi IGKJS ITFGOGTRLEIK 556 ment of the library may have 350 members (10VK chassis by 7 junctions by 5 IGKJ genes). One of ordinary skill in the art will readily recognize that Without being bound by theory, five of the seven most 15 one or more minimalist VKCDR3 libraries may be con commonly occurring amino acids found in position 96 of structed using any of the IGKJ genes. Using the notation rearranged human sequences appear to originate from the first above, these minimalist VKCDR3 libraries may have amino acid encoded by each of the five human IGKJ genes, sequences represented by, for example: namely, W, Y, F, L, and I. JK1: VK Chassis-L3-VK-F/L/I/R/W/Y/P-TF Less evident were the origins of the P and R residues. GQGTKVEIK (SEQID NO: 528): Without being bound by theory, most of the human IGKV JK2: VK Chassis-L3-VK-F/L/I/R/W/Y/P-TF gene nucleotide sequences end with the sequence CC, which GQGTKLEIK (SEQID NO: 842): occurs after (i.e., 3' to) the end of the last full codon (e.g., that JK3: VK Chassis-L3-VK-F/L/I/R/W/Y/P-TFG encodes the C-terminal residue shown in Table 11). There PGTKVDIK (SEQID NO: 843; and fore, regardless of which nucleotide is placed after this 25 JK5: VK Chassis-L3-VK-F/L/I/R/W/Y/P-TF sequence (i.e., CCX, where X may be any nucleotide) the GQGTRLEIK (SEQ ID NO: 844). codon will encode a proline (P) residue. Thus, when the IGKJ gene undergoes progressive deletion (just as in the IGHJ of Example 6.2 the heavy chain; see Example 5), the first full amino acid is lost and, if no deletions have occurred in the IGKV gene, a P 30 A VKCDR3 Library of About 10 Complexity residue will result. To determine the origin of the arginine residue at position In this example, the nine residue VKCDR3 repertoire 96, the origin of IGKJ genes in rearranged kappa light chain described in Example 6.1 is expanded to include VKCDR3 sequences containing R at position 96 were analyzed. The lengths of eight and ten residues. Moreover, while the previ analysis indicated that Roccurred most frequently at position 35 ously enumerated VKCDR3 library included the VK chassis 96 when the IGKJ gene was IGKJ1 (SEQID NO: 552). The and portions of the IGKJ gene not contributing to VKCDR3, germline W (position 1: Table 29) for IGKJ1 (SEQ ID NO: the presently exemplified version focuses only on residues 552) is encoded by TGG. Without being bound by theory, a comprising a portion of VKCDR3. This embodiment may be single nucleotide change of T to C (yielding CGG) or A favored, for example, when recombination with a vector (yielding AGG) will, therefore, result in codons encoding Arg 40 which already contains VK chassis sequences and constant (R). A change to G (yielding GGG) results in a codon encod region sequences is desired. ing Gly (G). Roccurs about ten times more often at position While the dominant length of VKCDR3 sequences in 96 in human sequences than G (when the IGKJ gene is IGKJ1 humans is nine amino acids, other lengths appear at measur (SEQIDNO:552), and it is encoded by CGG more often than able rates that cumulatively approach almost 30% of kappa AGG. Therefore, without being bound by theory, C may 45 light chain sequences. In particular, VKCDR3 of lengths 8 originate from one of the aforementioned two Cs at the end of and 10 represent, respectively, about 8.5% and about 16% of IGKV gene. However, regardless of the mechanism(s) of sequences in representative samples (FIG. 3). Thus, a more occurrence, Rand Pare among the most frequently observed complex VKCDR3 library includes CDR lengths of 8 to 10 amino acid types at position 96, when the length of VKCDR3 amino acids; this library accounts for over 95% of the length is 9. Therefore, a minimalist VKCDR3 library may be repre 50 distribution observed in typical collections of human sented by the following amino acid sequence: VKCDR3 sequences. This library also enables the inclusion of additional variation outside of the junction between the VK and JK genes. The present example describes such a library. The library comprises 10 sub-libraries, each designed around In this sequence, VK Chassis represents any selected VK 55 one of the 10 exemplary VK chassis depicted in Table 11. chassis (for non-limiting examples, see Table 11), specifically Clearly, the approach exemplified here can be generalized to Kabat residues 1 to 88 encoded by the IGKV gene. L3-VK consider M different chassis, where M may be less than or represents the portion of the VKCDR3 encoded by the chosen more than 10. IGKV gene (in this embodiment, residues 89-95). F/L/I/R/ To characterize the variability within the polypeptide seg W/Y/P represents any one of amino residues F, L, I, R. W.Y. 60 ment occupying Kabat positions 89 to 95, human kappa light or P. In this exemplary representation, IKJ4 (minus the first chain sequence collections derived from each of the tenger residue) has been depicted. Without being bound by theory, mline sequences of Example 3 were aligned and compared apart from IGKJ4 (SEQID NO. 555) being among the most separately (i.e., within the germline group). This analysis commonly used IGKJ genes in humans, the GGGamino acid enabled us to discern the patterns of sequence variation at sequence is expected to lead to larger conformational flex 65 each individual position in each kappa light chain sequence, ibility than any of the alternative IGKJ genes, which contain grouped by germline. The table below shows the results for a GXGamino acid sequence, where X is an amino acid other sequences derived from IGKV1-39 (SEQID NO: 233). US 8,877,688 B2 107 108 TABLE 30 facilitate description of the library, the IUPAC code for degenerate nucleotides, as given in Table 31, will be used. Percent Occurrence of Amino Acid Types in IGKV1-39-Derived Sequences TABLE 31 Amino Acid P89 P90 P91 P92 P93 P94 P95 Degenerate Base Symbol Definition A. O O 1 O O 4 1 C O O O O O O O IUPAC Symbol Base Pair Composition D O O 1 1 3 O O E O 1 O O O O O 10 F O O O 5 O 2 O A. A (100%) G O O 2 1 2 O O C C (100%) H 1 1 O 4 O O O I O O 1 O 4 5 1 G G (100%) K O O O 1 2 O O T T (100%) L 3 O O 1 1 3 7 15 R A (50%) G (50%) M O O O O O 1 O N O O 3 2 6 2 O Y C (50%) T (50%) P O O O O O 4 85 W A (50%) T (50%) Q 96 97 O O O O O S C (50%) G (50%) R O O O O 5 O 2 S O O 8O 4 65 6 3 M A (50%) C (50%) T O O 9 O 10 65 1 K G (50%) T (50%) V O O O O O 1 1 W O O O O O O O B C (33%) G (33%) T(33%) (*) Y O O 2 8O O 3 O D A (33%) G (33%) T(33%) H A (33%) C (33%) T(33%) For example, at position 89, two amino acids, Q and L. 25 V A (33%) C (33%) G (33%) account for about 99% of the observed variability, and thus in N A (25%) C (25%) G (25%) T (25%) the currently exemplified library (see below), only Q and L were included in position 89. In larger libraries, of course, (*) 33% is short hand here for 1/3 (i.e., 33.3333. . .%) additional, less frequently occurring amino acid types (e.g., H), may also be included. 30 Using the VK1-39 chassis with VKCDR3 of length nine as Similarly, at position 93 there is more variation, with amino an example, the VKCDR3 library may be represented by the acid types S, T, N, R and I being among the most frequently following four oligonucleotides (left column in Table 32), occurring. The currently exemplified library thus aimed to with the corresponding amino acids encoded at each position include these five amino acids at position 93, although clearly of CDRL3 (Kabat numbering) provided in the columns on the others could be included in more diverse libraries. However, right. TABL E 32 Exemplary Oligonucleotides Encoding a VK1-39 CDR3 Library

Oligonucleotide Sequence 89 9 O 91 92 93 94. 95 95A 96 97

CWGSAAWCATHCMVTATCCTTWCACT LO EO ST FSY HNPRST ISTP - FY T (SEQ ID NO:

CWGSAAWCATHCMVTATCCTMTCACT LO EO ST FSY HNPRST ISTP - IL. T. (SEQ ID NO: 308)

CWGSAAWCATHCMVTABTCCTWGGACT LO EO ST FSY HNPRST ISTP - WR T (SEQ ID NO:

CWGSAAWCATHCMVTATCCTCBTACT LO EO ST FSY HNPRST ISTP PLR - T (SEQ ID NO: 310) because this library was constructed via standard chemical For example, the first codon (CWG) of the first nucleotide oligonucleotide synthesis, one is bound by the limits of the of Table 32, corresponding to Kabat position 89, represents genetic code, so that the actual amino acid set represented at 55 50% CTG and 50% CAG, which encode Leu (L) and Gln (Q), position 93 of the exemplified library consists of S, T, N. R. P respectively. Thus, the expressed polypeptide would be and H, with P and H replacing I (see exemplary 9 residue expected to have L and Q each about 50% of the time. Simi VKCDR3 in Table 32, below). This limitation may be over larly, for Kabat position 95A of the fourth oligonucleotide, come by using codon-based synthesis of oligonucleotides, as the codon CBT represents /3 each of CCT, CGT and CTT, described in Example 6.3, below. A similar approach was 60 corresponding in turn to /3 each of Pro (P), Lcu (L) and Arg followed at the other positions and for the other sequences: (R) upon translation. By multiplying the number of options analysis of occurrences of amino acid type per position, available at each position of the peptide sequence, one can choice from among most frequently occurring Subset, fol obtain the complexity, in peptide space, contributed by each lowed by adjustment as dictated by the genetic code. oligonucleotide. For the VK1-39 example above, the numbers As indicated above, the library employs a practical and 65 are 864 for the first three oligonucleotides and 1.296 for the facile synthesis approach using standard oligonucleotide Syn fourth oligonucleotide. Thus, the oligonucleotides encoding thesis instrumentation and degenerate oligonucleotides. To VK1-39 CDR3s of length nine contribute 3,888 members to US 8,877,688 B2 109 110 the library. However, as shown in Table 32, sequences with L 3,024. As depicted in Table 33, for the complete list of oligo or Rat position 95A (when position 96 is empty) are identical nucleotides that represent VKCDR3 of sizes 8,9, and 10, for to those with L or R at position 96 (and 95A empty). There- all 10 VK chassis, the overall complexity is about 1.3x10 or fore, the 3,888 number overestimates the LR contribution and 1.2x10 unique sequences after correcting for over-counting the actual number of unique members is slightly lower, at of the LR contribution for the size 9 VKCDR3. TABL E 33 Degenerate Oligonucleotides Encoding an Exemplary VKCDR3 Library

Junc CDR tion SEQ L3 Type Degenerate ID Chassis Length (1) Oligonucleotide NO: 89 90 91 92 93 94. 95 95A96 97

WK1-5 8 1 CASCASTMCVRTRSTT 259 HO HO SY DGHNRS AGST FY - - FY T WCTWCACT

WK1-5 8 2 CASCASTMCVRTRSTT 26 OHO HO SY DGHNRS AGST FY - - IL. T. WCMTCACT

WK1-5 8 3 CASCASTMCVRTRSTT 261 HO HO SY DGHNRS AGST FY - - WR T WCWGGACT

WK1-5 8 4 CASCASTMCVRTRSTT 262 HO HO SY DGHNRS AGST FY PS - - T WCYCTACT

WK1-5 9 1 CASCASTMCVRTRSTT 263 HO HO SY DGHNRS AGST FY PS - FY T WCYCTTWCACT

WK1-5 9 2 CASCASTMCVRTRSTT 264 HO HO SY DGHNRS AGST FY PS - IL. T. WCYCTMTCACT

WK1-5 9 3 CASCASTMCVRTRSTT 265 HO HO SY DGHNRS AGST FY PS - WR T WCYCTWGGACT

WK1-5 9 4 CASCASTMCVRTRSTT 266 HO HO SY DGHNRS AGST FY PS PS - T WCYCTYCTACT

WK1-5 10 1 CASCASTMCVRTRSTT 267 HO HO SY DGHNRS AGST FY PS PLR FY T WCYCTCBTTWCACT

WK1-5 10 2 CASCASTMCVRTRSTT 268 HO HO SY DGHNRS AGST FY PS PLRIL. T. WCYCTCBTMTCACT

WK1-5 10 3 CASCASTMCVRTRSTT 269 HO HO SY DGHNRS AGST FY PS PLRWR T WCYCTCB TWGGACT

WK1-12 8 1 CASCASDCTRVCARTT 27 OHO HQ ASTADGNST NS FL - - FY T TSTWCACT

WK1-12 8 2 CASCASDCTRVCARTT 271HO HQ ASTADGNST NS FL - - IL. T. TSMTCACT

WK1-12 8 3 CASCASDCTRVCARTT 272 HO HQ ASTADGNST NS FL - - WR T TSWGGACT

WK1-12 8 4 CASCASDCTRVCARTT 273 HO HQ ASTADGNST NS FL P - - T TSCCTACT

WK1-12 9 1 CASCASDCTRVCARTT 274 HO HQ ASTADGNST NS FL P - FY T TSCCTTWCACT

WK1-12 9 2 CASCASDCTRVCARTT 275 HO HQ ASTADGNST NS FL P - IL. T. TSCCTMTCACT

WK1-12 9 3 CASCASDCTRVCARTT 276 HO HQ ASTADGNST NS FL P - WR T TSCCTWGGACT

WK1-12 9 4 CASCASDCTRVCARTT 277 HO HQ ASTADGNST NS FL P PLR- T TSCCT CBTACT

WK1-12 10 1 CASCASDCTRVCARTT 278 HO HQ ASTADGNST NS FL P PLR FY T TSCCT CBTTWCACT

WK1-12 10 2 CASCASDCTRVCARTT 279 HO HQ ASTADGNST NS FL P PLRIL. T. TSCCT CBTMTCACT

WK1-12 10 3 CASCASDCTRVCARTT 28OHO HQ ASTADGNST NS FL P PLRWR T TSCCTCB TWGGACT