
Five computational developability guidelines for therapeutic antibody profiling

Matthew I. J. Raybould^a, Claire Marks^a, Konrad Krawczyk^a, Bruck Taddese^b, Jaroslaw Nowak^a, Alan P. Lewis^c, Alexander Bujotzek^d, Jiye Shi^e, and Charlotte M. Deane^a,1

^a Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom; ^b Department of Antibody Discovery and Protein Engineering, MedImmune, Cambridge CB21 6GH, United Kingdom; ^c Computational and Modelling Sciences, GlaxoSmithKline Research and Development, Stevenage SG1 2NY, United Kingdom; ^d Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, DE-82377 Penzberg, Germany; and ^e Chemistry Department, UCB Pharma, Slough SL1 3WE, United Kingdom

Edited by Dennis A. Carson, University of California, San Diego, La Jolla, CA, and approved January 10, 2019 (received for review July 4, 2018)

PNAS | March 5, 2019 | vol. 116 | no. 10 | 4025–4030
www.pnas.org/cgi/doi/10.1073/pnas.1810576116
BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Therapeutic mAbs must not only bind to their target but must also be free from “developability issues” such as poor stability or high levels of aggregation. While small-molecule drug discovery benefits from Lipinski’s rule of five to guide the selection of molecules with appropriate biophysical properties, there is currently no in silico analog for antibody design. Here, we model the variable domain structures of a large set of post-phase-I clinical-stage antibody therapeutics (CSTs) and calculate in silico metrics to estimate their typical properties. In each case, we contextualize the CST distribution against a snapshot of the human antibody gene repertoire. We describe guideline values for five metrics thought to be implicated in poor developability: the total length of the complementarity-determining regions (CDRs), the extent and magnitude of surface hydrophobicity, positive charge and negative charge in the CDRs, and asymmetry in the net heavy- and light-chain surface charges. The guideline cutoffs for each property were derived from the values seen in CSTs, and a flagging system is proposed to identify nonconforming candidates. On two mAb drug discovery sets, we were able to selectively highlight sequences with developability issues. We make available the Therapeutic Antibody Profiler (TAP), a computational tool that builds downloadable homology models of variable domain sequences, tests them against our five developability guidelines, and reports potential sequence liabilities and canonical forms. TAP is freely available at opig.stats.ox.ac.uk/webapps/sabdab-sabpred/TAP.php.

therapeutic monoclonal antibodies | immunoglobulin gene sequencing | developability guidelines | surface hydrophobicity | surface charge

Significance
Immunogenicity, instability, self-association, high viscosity, polyspecificity, or poor expression can all preclude an antibody from becoming a therapeutic. Early identification of these negative characteristics is essential. Akin to the Lipinski guidelines, which measure druglikeness in small molecules, our Therapeutic Antibody Profiler highlights antibodies that possess characteristics that are rare/unseen in clinical-stage mAb therapeutics. The only required input is the variable domain sequence. We show examples where our approach would have advised against manufacturing antibodies that were found to aggregate or have poor expression.

Author contributions: M.I.J.R., C.M., K.K., B.T., J.N., A.P.L., A.B., J.S., and C.M.D. designed research; M.I.J.R. performed research; B.T. and J.S. contributed new reagents/analytic tools; M.I.J.R., C.M., and C.M.D. analyzed data; and M.I.J.R., C.M., and C.M.D. wrote the paper.
Conflict of interest statement: A.P.L. is employed by GlaxoSmithKline plc, B.T. is employed by MedImmune Limited, A.B. is employed by Roche Diagnostics GmbH, and J.S. is employed by UCB Celltech. All four companies discover and sell antibody therapies.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).
1 To whom correspondence should be addressed. Email: [email protected]
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1810576116/-/DCSupplemental.
Published online February 14, 2019.
Downloaded by guest on September 26, 2021

Monoclonal antibodies (mAbs) are increasingly used as therapeutics targeting a wide range of membrane-bound or soluble antigens; of the 73 antibody therapies approved by the European Medicines Agency or Food and Drug Administration since 1986 (valid as of June 12, 2018), 10 were first approved in 2017 (1). There are many barriers to therapeutic mAb development, besides achieving the desired affinity to the antigen. These include intrinsic immunogenicity, chemical and conformational instability, self-association, high viscosity, polyspecificity, and poor expression. In vitro screening for these negative characteristics is now routine in industrial pipelines (2).

While some cases of poor developability are subtle in origin, others are less ambiguous. High levels of hydrophobicity, particularly in the highly variable complementarity-determining regions (CDRs), have repeatedly been implicated in aggregation, viscosity, and polyspecificity (2–8). Asymmetry in the net charge of the heavy- and light-chain variable domains is also correlated with self-association and viscosity at high concentrations (4, 9). Patches of positive (10) and negative (11) charge in the CDRs are linked to high rates of clearance and poor expression levels. Product heterogeneity (e.g., through oxidation, isomerization, or glycosylation) often results from specific sequence motifs liable to post- or cotranslational modification.

An improved understanding of the factors governing these biophysical properties has enabled the development of in silico assays, which are faster and cheaper than their experimental equivalents. Computational tools already facilitate the identification of sequence liabilities, for example, sites of lysine glycation (12), aspartate isomerization (13), asparagine deamidation (13), and the presence of cysteines or N-linked glycosylation sites (14). A primary focus in recent years has been on designing software that can better predict aggregation proclivity. Many algorithms designed for this purpose use only the antibody sequence (4, 7, 8), although some suggest an analogous equation to use if a structure is available (4). One purely structure-based method is the Structural Aggregation Propensity (SAP) metric (5), later included in the Developability Index (6). This has been shown to be able to detect aggregation-prone regions, such as surface patches (15), and to rank candidates relative to a known antibody developability profile (11), using a closely related antibody crystal structure. It is likely that SAP’s atomic-resolution analysis would be too sensitive to use in comparing static homology models of diverse antibodies, given the current accuracy of structure prediction (16).

An alternative approach to predict antibodies likely to have poor developability profiles is to highlight those candidates whose characteristics differ greatly from clinically tested therapeutic mAbs; a similar strategy in the field of pharmacokinetics led to the Lipinski rules for small-molecule drug design (17). Here, we build 3D models of a large set of post-phase-I therapeutics and survey their sequence and structural properties. These values are then contextualized against human immunoglobulin gene sequencing (Ig-seq) sequences and models, to see where therapeutics share and deviate from the properties of human mAbs.

Using the distributions of these properties, we build the Therapeutic Antibody Profiler (TAP), a computational tool that highlights antibodies with anomalous values compared with therapeutics. TAP builds a downloadable structural model of an antibody variable domain sequence and tests it against guideline thresholds of five calculated measures likely to be linked to poor developability. It also reports potential sequence liabilities and all non-CDRH3 loop canonical forms.

Results

Sequence Data. As a dataset of mAbs unlikely to suffer from developability issues, we used the variable domain heavy- and light-chain sequences of 137 clinical-stage antibody therapeutics (137 CSTs) (18). To contextualize the properties of the CST set, we retrieved Vander Heiden’s recent snapshot of the human antibody repertoire from the Observed Antibody Space database (19, 20) (human VdH Ig-seq). We also used a larger proprietary dataset procured by UCB Pharma Ltd. (human UCB Ig-seq). All comparisons in the paper are made to the Vander Heiden data, with UCB comparisons available in SI Appendix. Each human Ig-seq dataset was analyzed as a set of nonredundant heavy or light chains (human Ig-seq nonredundant chains) and as a set of nonredundant CDR sequences (human Ig-seq nonredundant CDRs). We chose these Ig-seq datasets as they contain simultaneously sequenced heavy and light chains and so are a promising starting point for realistic in silico pairing, required to make complete structural models.

Model Structures. High-quality structural information is critical to accurately predict the surface properties of antibodies. As crystal structures are often unavailable, or difficult to attain, accurate modeling is a necessary step of an effective antibody profiler. Accordingly, all our comparisons are made between models, even when crystal structures are available, to avoid a bias in terms of structural quality [modeling introduces a systematic bias toward higher values for our patches of surface hydrophobicity (PSH) metric; see SI Appendix, Figs. S9 and S10]. ABodyBuilder (21) was run on the 56 CSTs with a reference Protein Data Bank (PDB) (22) structure (as of May 4, 2018). Sequence-identical templates were not included, and each resulting model was aligned to its reference to evaluate the backbone rmsd across all IMGT (international ImMunoGeneTics information system) regions (SI Appendix, Methods). The mean framework and CDR rmsds (SI Appendix, Table S1) were commensurate with the current state of the art (16). For our structural property calculations, we class surface-exposed residues as having a side chain with relative accessible surface area (ASArel,X) ≥7.5%, compared with alanine-X-alanine for each residue X (23, 24). Using this definition, we identified all exposed residues in the models and PDB structures. Of the 7,057 exposed crystal structure residues, only 265 (3.76%) were wrongly assigned as buried in the models.

As these results suggest that ABodyBuilder models are accurate enough for our analysis, we used this software to model all 137 CSTs (137 CST models) and diverse subsets of paired human VdH Ig-seq chains (14,072 human VdH Ig-seq models) and paired human UCB Ig-seq chains (19,019 human UCB Ig-seq models). The pairing and modeling protocol was designed to capture the sequence and structural diversity in each dataset, within the constraints of modelability and computational expense (SI Appendix, Methods). We then performed a series of in silico assays to determine the TAP metrics.

IMGT CDR Lengths. Loop length has a major impact on the nature of antigen binding. For example, if an antibody has a long CDRH3 loop, it tends to form most of the interactions with an antigen, while shorter CDRH3 loops contribute to concave binding sites where other CDRs more often assist in binding (25). The 137 CST and human Ig-seq sequences were IMGT-numbered (26), and IMGT CDR definitions were used to split the sequences by region. The 137 CST CDRH3 loops had a median length of 12, compared with 15 for the human VdH Ig-seq dataset (Fig. 1). In the case of CDRL3 the distributions were closer, with a median length of 9 for the 137 CSTs and the human VdH Ig-seq data (SI Appendix, Fig. S1E).

To test whether hybridomal development might account for these findings, as mouse antibodies are known to tend to have shorter CDRH3 loops than human antibodies (27), we split the 137 CST dataset by developmental origin (SI Appendix, Fig. S3). Fully human therapeutics were disproportionately represented at longer CDRH3s (mean: 13.21, median: 12), compared with chimeric, humanized, or fully murine therapeutics (mean: 11.91, median: 12). However, both therapeutic subsets still have shorter CDRH3s than human-expressed antibodies.

The combined length of all CDRs for each antibody in the 137 CST dataset had a median value of 48 (SI Appendix, Fig. S4). The 137 CST total CDR length was highly correlated to CDRH3 length (Pearson’s correlation coefficient of +0.77, with a two-tailed P value of 2.44e−28). While neither human Ig-seq dataset is natively paired, our artificially paired human Ig-seq models had a total CDR length distribution similar to that of the CSTs (SI Appendix, Fig. S4), so CDR length should not bias comparisons in other metrics. As the total length of the CDRs captures both binding-site shape (lower value and more concave) and CDRH3 length (typically shorter in CSTs than our human Ig-seq heavy chains), this metric was selected for inclusion in the final five TAP guidelines.

Fig. 1. Comparing the CDRH3 length distributions of the 137 CSTs (red), 105,458 human VdH Ig-seq nonredundant CDRH3s (blue), and 551,193 human VdH Ig-seq nonredundant heavy chains (green). The CSTs have a lower median CDRH3 length. [Axes: Length (x), Proportion (y).]

Canonical Forms. In natural antibodies, all CDR loops, apart from CDRH3, are thought to fall into structural classes known as canonical forms (28, 29). We assigned length-independent canonical forms (Methods) to the 137 CST and human Ig-seq models. All assignable CST model CDRs were labeled with a canonical form also present in at least one human Ig-seq model dataset (SI Appendix, Figs. S5 and S6). Fewer than 19% of CST CDRs remained unassigned in each loop region, suggesting that,

despite engineering, a clear majority of therapeutic non-CDRH3 CDR loops adopt well-characterized canonical forms. TAP reports the canonical form of each modeled loop, highlighting if any cannot be assigned.

Hydrophobicity. Hydrophobicity in the CDR regions has been repeatedly linked to aggregation propensity in mAbs (2, 6–8). Using our homology models, we estimated the effective hydrophobicity of each residue by considering not only its degree of apolarity but also whether or not it is solvent-exposed [side-chain ASArel ≥7.5% (23, 24)]. As the energy of the hydrophobic effect is approximately proportional to the interface area (30), we developed a metric, PSH (Methods), that yields higher scores if hydrophobic residues tend to neighbor one another in a region, rather than being evenly separated. We evaluated PSH for the 137 CST and human Ig-seq models across two regions [the CDR vicinity (Methods) and the entire variable (Fv) region] and with five different hydrophobicity scales (31–35).

The results of all hydrophobicity scales were highly correlated (e.g., R² ≥0.91 between all scales in the CDR vicinity), and so we use the Kyte and Doolittle (31) scale for all subsequent comparisons. The mean CDR vicinity PSH values for the CST and human VdH Ig-seq distributions were 123.30 ± 16.60 and 133.76 ± 21.08, respectively (Fig. 2A). CSTs were noticeably underrepresented at higher CDR PSH values; galiximab is a rare example of a therapeutic antibody with a high value (Fig. 2B). A similar divergence occurred across the entire Fv region, with mean values of 357.69 ± 22.95 and 370.56 ± 24.45, respectively (SI Appendix, Fig. S7), implying the primary difference occurs within the CDRs. This supports the theory that the high concentration conditions under which therapeutics are stored may render them less tolerant of large exposed patches of hydrophobicity in the CDR vicinity and also suggests that a subset of natural human antibodies would be unsuitable therapeutic candidates. We therefore included the CDR vicinity PSH score as a TAP metric.

Fig. 2. (A) CDR vicinity PSH scores across the 137 CST (blue) and human VdH Ig-seq (red) models. The CSTs are underrepresented at higher PSH values. (B) Galiximab (Kyte and Doolittle CDR vicinity PSH score of 167.89) has a large surface-exposed patch of hydrophobicity in its CDRH3 loop. Heavy- and light-chain surfaces outside the CDR vicinity are colored in white. [Axes in A: CDR Vicinity PSH Score (Kyte & Doolittle) (x), Proportion (y).]

Charge. Surface patches of positive or negative charge have also been linked to negative biophysical characteristics (10, 11). We calculated two metrics designed to highlight regions of dense charge: the patches of positive charge (PPC) and patches of negative charge (PNC) measures (Methods). All surface residues were initially assigned the appropriate charge for their averaged pKa values, as neighboring residues appear to have a limited effect at pH 7.4 (4). The charge of residues found to be engaging in salt bridges was then revised to zero.

The 137 CST models tend to avoid patches of charge in their CDR vicinities, with 88.32% and 80.30% having PPC (Fig. 3A) and PNC (Fig. 3B) values below 1, respectively. The human VdH Ig-seq models displayed similar PPC and PNC distributions. Both PPC and PNC assays were carried forward as TAP metrics.

Fig. 3. Histograms of 137 CST (blue) and human VdH Ig-seq model (red) values for the (A) PPC and (B) PNC metrics in the CDR vicinity. In both measures, the datasets are biased away from higher scores. (C) Histogram of structural Fv charge symmetry parameter values. Both datasets show a bias away from negative values. [Axes: CDR Vicinity PPC Score (A), CDR Vicinity PNC Score (B), and Structural FvCSP Score (C) on x; Proportion on y.]

When mAbs have oppositely charged VH and VL chains, they typically have higher in vitro viscosity values (4). This

aggregate-inducing electrostatic attraction is captured at the sequence level by the Fv charge symmetry parameter (FvCSP) metric: the mAb tends to be more viscous if the product of net VH and VL charges is negative (4). Harnessing our structural models, we calculated a variant (the structural Fv charge symmetry parameter, SFvCSP), which only includes residues that are surface-exposed, and not locked in salt bridges, in the evaluation of net charge. In galiximab, for example, we “correct” the charge of arginine H108 and aspartic acid L56 to 0, as the model indicates that they form a salt bridge. The charges of the glutamic acid at position H6, the aspartic acids at positions H107, L98, and L108, and the histidine at position L40 are ignored as their side chains are buried. The FvCSP score for this antibody would be 0 (net heavy chain charge of 0, net light chain charge of −2.9), while the SFvCSP score is +2.0 (net heavy chain charge of +2, net light chain charge of +1). A similarly low percentage of CST models (21.9%) and human VdH Ig-seq models (20.8%) had negative SFvCSP scores (Fig. 3C), with mean values of 3.34 ± 7.44 and 3.67 ± 7.40, respectively. With such a bias away from negative products, we chose the SFvCSP as our final TAP property.

The Importance of Modeling. We then mined SAbDab (36) to find all of the human, nonengineered, nonredundant (at 100% sequence identity) X-ray crystal structures in the PDB (22). We found only 33 such mAbs (identities listed in Dataset S1), as most human mAb PDB entries involve some degree of engineering. Calculating their TAP metric values, we found approximately the same difference in mean CDR vicinity PSH score between therapeutic and human crystal structures as we did between therapeutic and human VdH Ig-seq models (−9.69 and −10.46, respectively; see SI Appendix, Table S2). However, if we had compared human structures to therapeutic models, we would not have detected a significant difference (therapeutic models: 123.30 ± 16.60; human structures: 124.61 ± 16.54). This systematic bias toward higher PSH values in models is seen most clearly when comparing the values for CST crystal structures with CST models (SI Appendix, Fig. S9).

Developability Guidelines. When comparing the TAP metric values obtained for the 56 CST structures and their corresponding models, we saw positive correlations across all metrics (SI Appendix, Fig. S10). This indicates that calculations performed on ABodyBuilder models are typically predictive of the results that would be obtained from a crystal structure, and therefore that threshold values derived from models are informative.

While CSTs predictably share many features in common with human antibodies, our CDR length and hydrophobicity distributions imply that not every human antibody would make a good therapeutic. Consequently, our developability guidelines were set solely by CST values across the five selected metrics (Table 1). An amber flag indicates that the antibody lies within the extremes of the distribution, whereas a red flag indicates a previously unobserved value for that property.

Table 1. TAP amber and red flag cut-off thresholds, with respect to the clinical-stage therapeutic distributions

Metric                   Amber flag region     Red flag region
1. Total CDR length      Bottom 5%, top 5%     Above or below
2. PSH, CDR vicinity     Bottom 5%, top 5%     Above or below
3. PPC, CDR vicinity     Top 5%                Above
4. PNC, CDR vicinity     Top 5%                Above
5. SFvCSP                Bottom 5%             Below

To confirm that these threshold definitions do not typically flag mAbs without developability issues, we identified a further 105 mAb therapies (105 CSTs, listed in Dataset S2), not included in the 137 CST dataset, that had advanced to at least phase II in clinical development.

Only eight of this set (7.69%) were assigned a red developability flag according to the boundaries set by the 137 CSTs, an average of 0.08 red flags per newly tested therapeutic (SI Appendix, Table S3). Erenumab received the most red flags: for total CDR length (60), CDR vicinity PSH (173.85), and CDR vicinity PPC (1.53). All other red-flagged therapeutics received only one: rafivirumab for total CDR length (60); intetumumab for CDR PSH (83.84); adacanumab, derlotuximab, , and teprotumumab for CDR PPC (2.67, 2.66, 2.48, and 3.16, respectively); and quilizumab for Fv charge asymmetry (−20.40). The low red-flagging rate confirms that these guideline characteristics are highly conserved across therapeutic-like antibodies. Incorporating both sets of CSTs into a larger dataset (242 CSTs) led to the new guideline values shown in Table 2. While most metrics were only slightly adjusted, the PPC thresholds changed quite significantly. As a result, we performed statistical sampling over our TAP metric distributions to give a sense of the error that might be inherent in these new threshold values (SI Appendix, Methods and Table S4). All 242 CST TAP metric values are listed in Dataset S3.

Table 2. TAP amber and red flag regions, as defined by the entire set of 242 CSTs

Metric                Amber flag region             Red flag region
Total CDR length      54 ≤ L ≤ 60                   L > 60
PSH, CDR vicinity     83.84 ≤ PSH ≤ 100.71          PSH < 83.84
                      156.20 ≤ PSH ≤ 173.85         PSH > 173.85
PPC, CDR vicinity     1.25 ≤ PPC ≤ 3.16             PPC > 3.16
PNC, CDR vicinity     1.84 ≤ PNC ≤ 3.50             PNC > 3.50
SFvCSP                −20.40 ≤ SFvCSP ≤ −6.30       SFvCSP < −20.40

PSH score is calculated with the Kyte and Doolittle (31) hydrophobicity scale. L, length.

Case Studies. We tested whether these updated guideline values could highlight candidates with developability problems by building models and running TAP on two datasets supplied by MedImmune (Fig. 4). A lead anti-NGF antibody, MEDI-578, showed minor aggregation issues during in vitro testing, of a level usually rectifiable in development, whereas the affinity-matured version, MEDI-1912, exhibited unrectifiably high levels of aggregation (37). This observation was rationalized through SAP score (6) values, indicating that a large hydrophobic patch on the surface was responsible. TAP assigns MEDI-578 an amber flag and MEDI-1912 a red flag, by a large margin, in the CDR vicinity PSH metric (Fig. 4A). The paper describes how back-mutation of three hydrophobic residues in MEDI-1912 to those of MEDI-578 led to MEDI-1912STT, fixing the aggregation issue while maintaining potency. TAP assigns MEDI-1912STT no developability flags (Fig. 4A).

A lead anti-IL13 candidate, AB008, had no developability issues, but the affinity-matured version, AB001, had very poor levels of expression (seven times lower than AB008) (11). The authors highlighted the role of four consecutive negatively charged residues in the L2 loop; mutation of the fourth negatively charged residue to neutral asparagine (AB001DDEN) was able to stabilize the loop backbone, mitigating the ionic repulsion of the DDE motif, and returning acceptable levels of expression. TAP assigns no developability flags to AB008 but a red flag to AB001 and an amber flag to AB001DDEN for its CDR vicinity PNC metric (Fig. 4B), again red-flagging the candidate with prohibitive developability issues. Both AB001 and AB008, confirmed monomers in solution (11), did not flag for CDR vicinity PSH score (Fig. 4A).
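The flagging logic implied by Table 2 can be sketched in a few lines of Python. This is a minimal illustration and not the TAP implementation: the dictionary layout, the metric keys, and the `flag` and `profile` function names are assumptions introduced here, and only the bounds printed in Table 2 are encoded. The galiximab values used in the example (CDR vicinity PSH of 167.89 and SFvCSP of +2.0) are the ones quoted in the text.

```python
# Table 2 bounds (242 CSTs). Each tail is stored as (amber_bound, red_bound).
# Lower tail: red if value < red_bound, amber if red_bound <= value <= amber_bound.
# Upper tail: amber if amber_bound <= value <= red_bound, red if value > red_bound.
# The key names below are hypothetical labels chosen for this sketch.
GUIDELINES = {
    "total_cdr_length": {"upper": (54, 60)},
    "psh_cdr_vicinity": {"lower": (100.71, 83.84), "upper": (156.20, 173.85)},
    "ppc_cdr_vicinity": {"upper": (1.25, 3.16)},
    "pnc_cdr_vicinity": {"upper": (1.84, 3.50)},
    "sfvcsp": {"lower": (-6.30, -20.40)},
}


def flag(metric, value):
    """Return 'green', 'amber', or 'red' for one metric value."""
    tails = GUIDELINES[metric]
    if "lower" in tails:
        amber_bound, red_bound = tails["lower"]
        if value < red_bound:
            return "red"
        if value <= amber_bound:
            return "amber"
    if "upper" in tails:
        amber_bound, red_bound = tails["upper"]
        if value > red_bound:
            return "red"
        if value >= amber_bound:
            return "amber"
    return "green"


def profile(metrics):
    """Flag every metric in a {metric_name: value} mapping."""
    return {name: flag(name, value) for name, value in metrics.items()}


# Galiximab values quoted in the text.
galiximab = {"psh_cdr_vicinity": 167.89, "sfvcsp": 2.0}
print(profile(galiximab))
```

Under these bounds galiximab falls in the upper amber band for CDR vicinity PSH and is unflagged for SFvCSP. Storing each tail separately keeps one-sided metrics (PPC, PNC, SFvCSP) and the two-sided PSH metric in the same structure.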

4028 | www.pnas.org/cgi/doi/10.1073/pnas.1810576116 Raybould et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 xoe o h A erc hrfr eedo h values the on depend therefore not alone. CSTs metrics are across TAP seen high-concentration antibodies The variation and to. human-expressed forces, exposed (including that sheer development conditions) temperature, storage during and stresses pH in of suffer range therapeutics as intuitive, a ther- somewhat good be a would make This would apeutic. antibody have human every trials not that clinical suggests of stage this development. therapeutic reached to amenable have characteristics assumption that developa- the with poor mAbs therapeutics, to that post-phase-I linked 242 properties across several bility analyzed have We Discussion DH op ulsml upti hw in shown is output sample S12 full Fig. A loop. CDRH3 Appendix, (SI the easily results S11 the be on of Fig. can interpretation liabilities guide quality help sequence model to accessed Estimated probable surface. and model Appendix, met- antibody charge, (SI TAP hydrophobicity S11A), molecular visualize the to interactive Fig. of user An each the allows Flags histograms. to s. viewer assigned accompanying 30 than are with less red) rics, of runtime or typical amber, a (green, with pro- detailed antibody a an returning of input, file an as sequences domain variable at sabpred/TAP.php available application, Application. Web as labeled are are MEDI-578/1912/1912STT AB-001/008/001DDEN legibility. metrics. for and A-001/008/DDEN all M-578/1912/1912STT, of without for as CSTs range versions labeled in the Engineered seen to respectively. previously return metrics, values AB001DDEN) and (MEDI-1912STT, PNC Amber issues and studies red-flagged asterisk. developability Case PSH are an thresholds. 
AB001) the by (MEDI-1912, guideline for issues CST labeled stud- developability 242 value prohibitive case the MEDI-1912STT PNC with delineate MedImmune and lines vicinity and MEDI-1912, dashed CDR red blue) MEDI-578, the flag). (light have CSTs assigned all by 242 (colored of ies set combined the 4. Fig. abude al. et Raybould B Proportion Proportion A yaayigteepoete,w aefudeiec that evidence found have we properties, these analyzing By 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.00 0.0 0.1 0.2 0.3 0.4 0.5 0.6 D iiiyPCmtisfor metrics PNC vicinity CDR (B) and PSH vicinity CDR (A) The 80 . .Fnly aoia om r sindt ahnon- each to assigned are forms canonical Finally, B). 0.0 * ...... 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 CDR VicinityPSHScore(Kyte &Doolittle) 0 2 4 160 140 120 100 A nyrqie h ev-adlight-chain and heavy- the requires only TAP . ehv akgdteTPit web a into TAP the packaged have We CDR VicinityPNCScore

opig.stats.ox.ac.uk/webapps/sabdab- A-008 A-DDEN M-1912STT A-001 A-008

PNAS | March 5, 2019 | vol. 116 | no. 10 | 4029

Our simple TAP guidelines will not capture the whole spectrum of developability issues. For example, they will not detect sources of immunogenicity or more subtle mechanisms that lead to poor stability. Nevertheless, we have shown that the TAP guidelines can selectively highlight antibodies with expression or aggregation issues (11, 37).

We intend to recalculate the threshold values regularly to include new mAbs that have entered phase II of clinical trials. This will also allow for the inevitable fluctuation in PSH, PPC, PNC, and SFvCSP values returned by CSTs, as the ABodyBuilder models improve as the number of antibodies in the PDB increases (36).

When enough CSTs are available, it may be possible to stratify therapeutic guidelines into subclasses. For example, separate thresholds could be considered for mAbs involving kappa or lambda light chains. Lambda light chains tend to contribute to higher average CDR vicinity PSH values across our 242 CST, 14,072 human VdH Ig-seq, and 19,019 human UCB Ig-seq models (SI Appendix, Table S5); DeKosky et al. (38) also found, across their 2,000 models of natively paired mAbs, that lambda CDRL3 loops are significantly more hydrophobic than their kappa equivalents. As we currently have only 25 lambda light-chain CSTs, we do not have enough data to safely determine a guideline threshold. Nevertheless, as around 90% of post-phase-I CSTs are derived from kappa light chains, this could suggest that hydrophobicity-driven developability issues are far more prevalent when using leads containing lambda light chains. Other subclasses could include clinical trial progression, active/discontinued trial status, or therapeutic species origin. At this stage, neither splitting by clinical progression (SI Appendix, Table S6) nor drug campaign status (SI Appendix, Table S7) leads to significant differences in mean metric values. Human and humanized mAbs have noticeably higher mean PSH values than chimeric or mouse mAbs (SI Appendix, Table S8), with the caveat that there are only 36 mAbs in the latter category.

As with the Lipinski rule of five, the thresholds themselves should not be interpreted as hard-and-fast rules, and the distance of red-flagged candidates outside the previously observed bounds should be taken into consideration. Advances in the development and formulation process may soon redefine the limits of permissible values (18).

Methods

All 242 CST sequences are supplied in Dataset S2, and the 551,193 heavy- and 1,359,745 light-chain nonredundant, “healthy” human VdH Ig-seq sequences can be obtained from the Observed Antibody Space database (20). The 4,587,907 heavy-chain and 7,120,000 light-chain nonredundant human UCB Ig-seq sequences are available as separated CDR and framework regions at opig.stats.ox.ac.uk/resources. The pairing/modeling protocol used to derive the human Ig-seq model datasets can be found in SI Appendix, Methods. Therapeutic models and human VdH Ig-seq models can be downloaded from antibodymap.org/structure.

The initial set of 137 CST antibody sequences was sourced from the supporting information of Jain et al. (18). The test set of 105 CST sequences was found through an extensive search of online resources, including the IMGT mAb (www.imgt.org/mAb-DB/) and Antibody Society (https://www.antibodysociety.org/late-stage-clinical-pipeline/) databases. The names, sequences, and metadata for each CST are supplied in Dataset S2, with PDB structures (where available) listed at opig.stats.ox.ac.uk/webapps/sabdab-sabpred/Therapeutic.html.

Canonical Forms. A length-independent canonical form clustering protocol (39) was run on the North-defined (40) CDR loops of a SAbDab (36) snapshot from September 26, 2017. Model loops were inferred to have identical canonical forms to the template used by ABodyBuilder (21).

Surface-Exposed Residues. Residues defined as “surface-exposed” have ≥7.5% relative exposure (24) across side-chain atoms, compared with the open-chain form alanine-R-alanine, as calculated with the Shrake and Rupley algorithm (23).
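The surface-exposure criterion in Methods (side-chain relative exposure of at least 7.5% against the alanine-R-alanine open-chain form) can be illustrated with a minimal sketch. It assumes that side-chain solvent-accessible areas have already been computed by some Shrake and Rupley implementation; the reference areas, residue names, and function names below are hypothetical placeholders chosen for illustration, not values or code from TAP.

```python
# Hypothetical reference side-chain areas (in square angstroms) for residue X
# in an extended Ala-X-Ala tripeptide. Illustrative numbers only.
REFERENCE_SIDE_CHAIN_AREA = {"LEU": 140.0, "LYS": 165.0, "SER": 80.0}

def relative_exposure(res_name, side_chain_area):
    """Side-chain area in the model divided by the Ala-X-Ala reference area."""
    return side_chain_area / REFERENCE_SIDE_CHAIN_AREA[res_name]

def is_surface_exposed(res_name, side_chain_area, threshold=0.075):
    """Apply the >=7.5% relative side-chain exposure cutoff from the text."""
    return relative_exposure(res_name, side_chain_area) >= threshold

# Example: a leucine with 14 A^2 of exposed side chain (10% of the assumed
# reference) passes the cutoff; one with 7 A^2 (5%) does not.
exposed = is_surface_exposed("LEU", 14.0)
buried = is_surface_exposed("LEU", 7.0)
```

Residues without a side chain (glycine) would need separate handling, which this sketch does not attempt.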

BIOPHYSICS AND COMPUTATIONAL BIOLOGY

CDR Vicinity. The “CDR vicinity” comprises every surface-exposed IMGT-defined CDR and anchor residue, and all other surface-exposed residues with a heavy atom within a 4-Å radius.

Salt Bridges. Salt bridges were defined as pairs of lysines/arginines and aspartic acids/glutamic acids with a N$^+$–O$^-$ distance ≤3.2 Å.

Hydrophobicity. Where $R_1$ and $R_2$ are two surface-exposed residues with a closest heavy-atom distance, $r_{12}$, <7.5 Å and $H(R,S)$ is the normalized hydrophobicity score (between 1 and 2) for residue $R$ in scheme $S$, the PSH metric can be calculated as $\sum_{R_1 R_2} \frac{H(R_1,S)\,H(R_2,S)}{r_{12}^2}$. The hydrophobicity scales tested were Kyte and Doolittle (31), Wimley and White (32), Hessa et al. (33), Eisenberg and McLachlan (34), and Black and Mould (35). Salt-bridge residues were assigned the same value as glycine in each hydrophobicity scale.

Charge. The following charges were assigned by sequence: aspartic acid, −1; glutamic acid, −1; lysine, +1; arginine, +1; and histidine, +0.1 (Henderson–Hasselbalch equation applied: pKa 6, pH 7.4, and rounded up to one decimal place). Tyrosine hydroxyl deprotonation was not considered. Salt-bridge residues were assigned a charge of 0. The PPC and PNC metrics are analogous in form to PSH, with $H(R,S)$ replaced by $|Q(R)|$, the absolute value of the charge assigned to residue $R$. SFvCSP values were calculated as $\left[\sum_{R_H} Q(R_H)\right]\left[\sum_{R_L} Q(R_L)\right]$, where $R_H$ and $R_L$ are surface-exposed VH and VL residues, respectively.

ACKNOWLEDGMENTS. We thank Sebastian Kelm and James Heads for their helpful comments concerning our metrics and Jinwoo Leem for his assistance in implementing the web application. This work was supported by the Engineering and Physical Sciences Research Council and Medical Research Council Grant EP/L016044/1, GlaxoSmithKline plc, MedImmune Limited, F. Hoffmann-La Roche AG, and UCB Celltech.
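The metric definitions given in Methods (histidine charge from the Henderson–Hasselbalch equation, the PSH pair sum, its charge analogs, and SFvCSP) can be sketched on toy data as follows. This is an illustrative sketch under stated assumptions rather than the TAP implementation: the residue records, coordinates, and hydrophobicity values are invented; each unordered residue pair is assumed to be counted once; PPC is assumed to run over positively charged residues only; and salt-bridge neutralization and the restriction to CDR vicinity residues are omitted.

```python
import math
from itertools import combinations

def histidine_charge(pka=6.0, ph=7.4):
    """Protonated fraction (Henderson-Hasselbalch), rounded up to 1 decimal."""
    fraction = 1.0 / (1.0 + 10.0 ** (ph - pka))  # about 0.038 at pH 7.4
    return math.ceil(fraction * 10.0) / 10.0

# Sequence-assigned charges from the text; other residues are neutral.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": histidine_charge()}

def charge(res):
    return CHARGE.get(res["aa"], 0.0)

def closest_distance(res_a, res_b):
    """Closest heavy-atom distance between two residues, in angstroms."""
    return min(math.dist(a, b) for a in res_a["atoms"] for b in res_b["atoms"])

def patch_score(residues, weight, cutoff=7.5):
    """Sum weight(R1) * weight(R2) / r12^2 over pairs with r12 < cutoff."""
    total = 0.0
    for r1, r2 in combinations(residues, 2):  # assumption: unordered pairs
        r12 = closest_distance(r1, r2)
        if r12 < cutoff:
            total += weight(r1) * weight(r2) / r12 ** 2
    return total

def sfvcsp(residues):
    """Product of the net VH and net VL charges."""
    q_heavy = sum(charge(r) for r in residues if r["chain"] == "H")
    q_light = sum(charge(r) for r in residues if r["chain"] == "L")
    return q_heavy * q_light

# Hypothetical surface-exposed residues; "h" is a normalized (1 to 2) score.
residues = [
    {"aa": "L", "chain": "H", "h": 2.0, "atoms": [(0.0, 0.0, 0.0)]},
    {"aa": "K", "chain": "H", "h": 1.5,
     "atoms": [(5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]},
    {"aa": "R", "chain": "H", "h": 1.0, "atoms": [(5.0, 3.0, 0.0)]},
    {"aa": "D", "chain": "L", "h": 1.0, "atoms": [(20.0, 0.0, 0.0)]},
]

psh = patch_score(residues, weight=lambda r: r["h"])
positive = [r for r in residues if charge(r) > 0]  # assumption for PPC
ppc = patch_score(positive, weight=lambda r: abs(charge(r)))
csp = sfvcsp(residues)
print(psh, ppc, csp)
```

In this toy set the aspartate lies more than 7.5 Å from every other residue, so it contributes to SFvCSP but to none of the pair sums.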

1. Antibody Society 2018. Approved antibodies. Available at https://www.antibodysociety.org/news/approved-antibodies/. Accessed June 12, 2018.
2. Jarasch A, et al. (2015) Developability assessment during the selection of novel therapeutic antibodies. J Pharm Sci 104:1885–1898.
3. Xu Y, et al. (2013) Addressing polyspecificity of antibodies selected from an in vitro yeast presentation system: A FACS-based, high-throughput selection and analytical tool. Protein Eng Des Sel 26:663–670.
4. Sharma VK, et al. (2014) In silico selection of therapeutic antibodies for development: Viscosity, clearance, and chemical stability. Proc Natl Acad Sci USA 111:18601–18606.
5. Chennamsetty N, Voynov V, Kayser V, Helk B, Trout BL (2009) Design of therapeutic proteins with enhanced stability. Proc Natl Acad Sci USA 106:11937–11942.
6. Lauer TM, et al. (2012) Developability index: A rapid in silico tool for the screening of antibody aggregation propensity. J Pharm Sci 101:102–115.
7. Jain T, et al. (2017) Prediction of delayed retention of antibodies in hydrophobic interaction chromatography from sequence using machine learning. Bioinformatics 33:3758–3766.
8. Obrezanova O, et al. (2015) Aggregation risk prediction for antibodies and its application to biotherapeutic development. mAbs 7:352–363.
9. Yadav S, Laue TM, Kalonia DS, Singh SN, Shire SJ (2012) The influence of charge distribution on self-association and viscosity behavior of monoclonal antibody solutions. Mol Pharm 9:791–802.
10. Datta-Mannan A, et al. (2015) Balancing charge in the complementarity-determining regions of humanized mAbs without affecting pI reduces non-specific binding and improves the pharmacokinetics. mAbs 7:483–493.
11. Popovic B, et al. (2017) Engineering the expression of an anti-interleukin-13 antibody through rational design and mutagenesis. Protein Eng Des Sel 30:303–311.
12. Haberger M, et al. (2014) Assessment of chemical modifications of sites in the CDRs of recombinant antibodies. mAbs 6:327–339.
13. Sydow JF, et al. (2014) Structure-based prediction of asparagine and aspartate degradation sites in antibody variable regions. PLoS One 9:e100736.
14. Petrescu AJ, Milac AL, Petrescu SM, Dwek RA, Wormald MR (2004) Statistical analysis of the protein environment of N-glycosylation sites: Implications for occupancy, structure, and folding. Glycobiology 14:103–114.
15. Courtois F, Agrawal NJ, Lauer TM, Trout BL (2016) Rational design of therapeutic mAbs against aggregation through protein engineering and incorporation of glycosylation motifs applied to bevacizumab. mAbs 8:99–112.
16. Almagro JC, et al. (2014) Second antibody modeling assessment (AMA-II). Proteins 82:1553–1562.
17. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Del Rev 23:3–25.
18. Jain T, et al. (2017) Biophysical properties of the clinical-stage antibody landscape. Proc Natl Acad Sci USA 114:944–949.
19. Vander Heiden JA, et al. (2017) Dysregulation of B cell repertoire formation in myasthenia gravis patients revealed through deep sequencing. J Immunol 198:1460–1473.
20. Kovaltsuk A, et al. (2018) Observed antibody space: A resource for data mining next-generation sequencing of antibody repertoires. J Immunol 201:2502–2509.
21. Leem J, Dunbar J, Georges G, Shi J, Deane CM (2016) ABodyBuilder: Automated antibody structure prediction with data-driven accuracy estimation. mAbs 8:1259–1268.
22. Berman HM, et al. (2000) The protein data bank. Nucleic Acids Res 28:235–242.
23. Shrake A, Rupley JA (1973) Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J Mol Biol 79:361–371.
24. Yang Zhu Z, Blundell TL (1996) The use of amino acid patterns of classified helices and strands in secondary structure prediction. J Mol Biol 260:261–276.
25. Tsuchiya Y, Mizuguchi K (2016) The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops. Prot Sci 25:815–825.
26. Lefranc M-P, et al. (2003) IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev Comp Immunol 27:55–77.
27. Shi B, et al. (2014) Comparative analysis of human and mouse immunoglobulin variable heavy regions from IMGT/LIGM-DB with IMGT/HighV-QUEST. Theor Biol Med Model 11:30.
28. Chothia C, Lesk AM (1987) Canonical structures for the hypervariable regions of immunoglobulins. J Mol Biol 196:901–917.
29. Chothia C, et al. (1989) Conformations of immunoglobulin hypervariable regions. Nature 342:877–883.
30. Reynolds JA, Gilbert DB, Tanford C (1974) Empirical correlation between hydrophobic free energy and aqueous cavity surface area. Proc Natl Acad Sci USA 71:2925–2927.
31. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105–132.
32. Wimley WC, White SH (1996) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Biol 3:842–848.
33. Hessa T, et al. (2005) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Nature 433:377–381.
34. Eisenberg D, McLachlan AD (1986) Solvation energy in protein folding and binding. Nature 319:199–203.
35. Black SD, Mould DR (1991) Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal Biochem 193:72–82.
36. Dunbar J, et al. (2014) SAbDab: The structural antibody database. Nucleic Acids Res 42:1140–1146.
37. Dobson CL, et al. (2016) Engineering the surface properties of a human monoclonal antibody prevents self-association and rapid clearance in vivo. Sci Rep 6:1–14.
38. DeKosky BJ, et al. (2016) Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires. Proc Natl Acad Sci USA 113:E2636–E2645.
39. Nowak J, et al. (2016) Length-independent structural similarities enrich the antibody CDR canonical class model. mAbs 8:751–760.
40. North B, Lehmann A, Dunbrack RL Jr (2011) A new clustering of antibody CDR loop conformations. J Mol Biol 406:228–256.
