<<

and Contact Residues Define and Size and Structure

This information is current as James W. Stave and Klaus Lindpaintner of September 26, 2021. J Immunol published online 24 June 2013 http://www.jimmunol.org/content/early/2013/06/22/jimmun ol.1203198 Downloaded from

Supplementary http://www.jimmunol.org/content/suppl/2013/06/24/jimmunol.120319 Material 8.DC1

Why The JI? Submit online. http://www.jimmunol.org/

• Rapid Reviews! 30 days* from submission to initial decision

• No Triage! Every submission reviewed by practicing scientists

• Fast Publication! 4 weeks from acceptance to publication

*average by guest on September 26, 2021

Subscription Information about subscribing to The Journal of is online at: http://jimmunol.org/subscription Permissions Submit copyright permission requests at: http://www.aai.org/About/Publications/JI/copyright.html Email Alerts Receive free email-alerts when new articles cite this article. Sign up at: http://jimmunol.org/alerts

The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852 Copyright © 2013 by The American Association of Immunologists, Inc. All rights reserved. Print ISSN: 0022-1767 Online ISSN: 1550-6606. Published June 24, 2013, doi:10.4049/jimmunol.1203198 The Journal of Immunology

Antibody and Antigen Contact Residues Define Epitope and Paratope Size and Structure

James W. Stave and Klaus Lindpaintner1

A total of 111 Ag–Ab x-ray crystal structures of large Ag and were analyzed to inform the process of eliciting or selecting functional and therapeutic Abs. These analyses illustrate that Ab contact residues (CR) are distributed in three prominent CR regions (CRR) on L and H chains that overlap but do not coincide with Ab CDR. The number of Ag and Ab CRs per structure are overlapping and centered around 18 and 19, respectively. The CR span (CRS), a novel measure introduced in this article, is defined as the minimum contiguous amino acid sequence containing all CRs of an Ag or Ab and represents the size of a complete structural epitope or paratope, inclusive of CR and the minimum set of supporting residues required for proper conformation. The most frequent size of epitope CRS is 50–79 aa, which is similar in size to L (60–69) and H chain (70–79) CRS. ∼ The size distribution of epitope CRS analyzed in this study ranges from 20 to 400 aa, similar to the distribution of independent Downloaded from protein domain sizes reported in the literature. Together, the number of CRs and the size of the CRS demonstrate that, on average, complete structural epitopes and paratopes are equal in size to each other and similar in size to intact protein domains. Thus, independent protein domains inclusive of biologically relevant sites represent the fundamental structural unit bound by, and useful for eliciting or selecting, functional and therapeutic Abs. The Journal of Immunology, 2013, 191: 000–000.

onoclonal Abs (mAbs) are important tools, diagnostic Purified, full-length native and recombinant are useful http://www.jimmunol.org/ reagents, and therapeutic drugs. High-value commercial for eliciting Abs that bind native proteins, but are often difficult M diagnostic, functional, and therapeutic mAbs must rec- to prepare in quantities and qualities sufficient for immunization ognize full-length native proteins folded in native conformations as and selection of mAbs. In addition, immunization with full-length they exist in patients or patient samples. Given their potential value, proteins results in a polyclonal response where the majority of strategies for developing mAbs to native protein Ags is of intense Abs are directed to a limited number of immunodominant sites of interest; however, the underlying biology dictating performance the protein (5, 6). In cases where the predefined site of interest makes the task of developing such Abs both complicated and costly is nonimmunodominant, or even nonimmunogenic, it is very diffi- (1). For an Ab to exert a desired biological effect, it is usually the cult to isolate mAbs with the required performance specifications. by guest on September 26, 2021 case that it must bind to a specific, predefined site. In such cases, Thus, there is a critical need for a means of directing Ab develop- epitope prediction algorithms are of limited value because ment to predefined functional epitopes of native protein Ags re- the Ab must bind to the predefined site regardless of its inherent gardless of their inherent . immunogenicity. Of paramount importance is binding at the target In the case of therapeutics, once candidate mAbs have been site to the three-dimensional conformation the protein exists in isolated with the requisite binding characteristics, the CDR can be under the conditions where the Ab must function. taken from the isolated mAb (7) and inserted into engineered Ab The most common method for developing Abs is to inject purified constructs having desired effecter function and characterized im- protein into animals and isolate mAbs that bind to native proteins. A munogenicity and toxicity profiles. It has been reported for six difficulty with this approach is the preparation of protein of suitable protein Ags that only a fraction of the CDR residues participate in purity and functional conformation. For decades, researchers have binding (8), and that Ab contact residues (CRs) in nine crystal attempted to use peptides as surrogates of full-length native proteins structures do not precisely correspond with residues modified by with little success (2). Anti-peptide Abs react well to their cognate in Vk (9). It is critical to identify specific peptides and may be useful for detecting linear epitopes in techniques residues involved in contacting Ag and understand how they are like , but Abs raised to unstructured peptides more often organized within CDR to engineer Ab reactivity. Thus, the native than not do not recognize the same sequence in native protein (3, 4). three-dimensional structure of both Ag epitope and Ab paratope are fundamental to the success of developing and engineering high- Antibody Discovery Research and Development, Strategic Diagnostics Inc., Newark, performance functional and therapeutic mAbs. DE 19702 With these considerations in mind, we undertook the analysis of 1Current address: Thermo Fisher Scientific Inc., Analytical Technologies, Waltham, MA. a large number of protein–Ab x-ray crystal structures to better Received for publication November 19, 2012. Accepted for publication May 23, understand both epitope and paratope size, structure, and interac- 2013. tion. A number of researchers have reported identifying CR of Abs Address correspondence and reprint requests to Dr. James W. Stave, Strategic Diag- and Ags by analysis of publicly available crystal structures (10–12). nostics Inc., 111 Pencader Drive, Newark, DE 19702. E-mail address: jstave@ A large body of work in this field is aimed at identifying CR as a sdix.com means of developing algorithms for predicting B cell epitopes (10, The online version of this article contains supplemental material. 13–17), whereas others have focused on analyzing or predicting Abbreviations used in this article: CR, contact residue; CRR, contact residue region; CR in CDR of Abs (8, 18–20). The majority of the existing studies CRS, contact residue span; PDB, Protein Data Bank. define two residues to be in contact when they are within a specified Copyright Ó 2013 by The American Association of Immunologists, Inc. 0022-1767/13/$16.00 distance of each other. Many researchers use 4 A˚ as the definition of

www.jimmunol.org/cgi/doi/10.4049/jimmunol.1203198 2 EPITOPE AND PARATOPE STRUCTURE

CR (10, 14–17), and the Molecular Modeling Database (21), at the of predefined functional sites represent the basic structural unit National Center for Biotechnology Information, defines protein in- useful for immunization, screening, and selection of functional teractions using a 4-A˚ cutoff. One group has published several and therapeutic mAbs. works using 6 A˚ (13, 20, 22) and reported that their conclusions were unaffected by cutoffs ranging from 4 to 6 A˚ (20). Materials and Methods ˚ Haste Andersen et al. (14) identified epitope CR using a 4-A cutoff A list of 110 unique Ab–Ag x-ray crystal structures representing 111 unique in 76 crystal structures composed of Ags .25 aa (29 of the structures Abs (structure 3LOH contains 2 different Abs bound to the same Ag at were variants of lysozyme). Using this approach, these researchers different sites) was manually curated from the Protein Data Bank (PDB) were able to determine the number of CRs per epitope, the maximum (24). To provide information regarding the full extent of Ag–Ab interaction, $ continuous sequence of CR within each epitope, and the distribution only structures containing 85 Ag aa were included in the analysis. Each Ab in the data set occurs only once. The resolution of all structures is #3.8 A˚ . of continuous CR per segment. These results highlighted the dis- The amino acid sequence for each Ab was numbered using Abysis and the continuous nature of protein Ag epitopes. Kabat convention (25) (http://www.bioinf.org.uk/abysis). CRs are defined as Padlan et al. (8) determined Ab CR in six crystal structures with Ag and Ab amino acid residues within 4 A˚ of each other (10). CRs were protein Ags and compared the sequence positions with CDR defined identified from PDB x-ray crystal structures using the Cn3D tool and Mo- lecular Modeling Database (21), found at the National Center for Biotech- by nucleic acid sequence variability (23). MacCallum et al. (12) nology Information Web site (http://www.ncbi.nlm.nih.gov/Structure/CN3D/ determined Ab CR sequence positions for 7 protein–Ab structures cn3d.shtml). In some analyses, CR at other distances are also presented. The (5 of which are lysozyme), and Almagro (19) determined numbers majority of the Abs analyzed in this article are described by terms denoting specific function including “neutralizing” (e.g., 3GBN, 3SE8), “therapeutic” and sequence positions of VL and VH CR for 19 protein Ags. Be- sides analyzing only a limited number of protein–Ag structures, these (e.g., 1N8Z, 1YY9), “inhibitory/antagonistic” (e.g., 3L95, 3HI6), “blocking” Downloaded from (e.g., 2QQN), “stimulating” (e.g., 3GO4), “mitogenic” (e.g., 1YJD), and studies did not report on CR sequence positions outside estab- “protective” (e.g., 3ETB). lished CDRs. Recently, Kunik and coworkers (20) identified Ab CR in 69 Ag–Ab structures with proteins $5 aa using a 6-A˚ cutoff and reported that ∼20% of CRs fall outside CDR. It is not clear Results how many of these structures contained large protein Ags, and the Ag and Ab CR

authors did not report sequence positions of the identified CR. The list of 110 unique Ab–Ag x-ray crystal structures analyzed in this http://www.jimmunol.org/ The existing studies define CR in different ways, and the number work is presented in Table I. A summary of the amino acids analyzed of unique structures analyzed is small. In addition, the number of Ag in this study is presented in Table II. The data set is composed of amino acids in the crystals are often not reported or are too few to 78,666 aa, of which 5.3% are identified as CR. Ab amino acids rep- draw conclusions about the nature of Ab binding to large protein Ags. resent 61% of the total and are divided almost equally between Furthermore, some studies report only on Ag CR, whereas others heavy (IgH) and light (IgL) chains. About half of all CRs belong to focus solely on Ab. Ab (52%) and 65% of these lie within the H chain. Overall, there In this article we apply a uniform analysis to both Ag and Ab are ∼7 CRs per L chain, 13 per H chain, and 18 per Ag; however, there proteins in a large number of x-ray crystal structures containing is significant variation between structures. The largest number of relatively large numbers of Ag amino acids ($85). The number of CRs in a single structure, including both Ag and Ab, is 101 (1BGX), by guest on September 26, 2021 structures analyzed is significantly greater than previous studies, whereas the smallest is 20 (2EXW). revealing the extent of variability and enabling generalization of the The frequency distribution of numbers of CR in all 111 Ag–Ab underlying structural architecture that supports epitopes and para- structures is shown in Fig. 1. CRs are distributed around ∼18–19 for topes. For each structure, we identify numbers of CR and CR span both Ag and Ab, and the distributions exhibit significant overlap. (CRS) for both Ag and Ab, and calculate CR ratios between Ag and Whereas Fig. 1 illustrates that the number of CRs across all 111 Ab, as well as H and L chains. In addition, we identify the sequence structures is distributed in a similar and overlapping manner for both positions of all Ab CRs and map them against CDR revealing CR Ag and Ab, there is only a modest correlation between the number regions (CRRs) that are similar to, but do not coincide with, CDR. of Ag and Ab CRs within individual structures (Pearson correlation A significant observation of these analyses is that CR of both coefficient, r2 = 0.5394). Ag and Ab are distributed in long, continuous linear sequences The ratio of Ab to Ag CR for each of the 111 structures is shown in (CRS) that are similar in size to each other, and similar to the size Fig. 2 and is represented as the log(Ab CR/Ag CR). Negative values of complete protein domains. For the first time, to our knowledge, result when there are more CR in the Ag (36 structures), and positive we define complete structural epitopes as the surface CR and the values result when there are greater numbers of Ab CR (59structures). minimum set of underlying contiguous amino acids required to Sixteen structures have equal numbers of Ag and Ab CRs. The form the intact native structure. On average, complete epitopes are structure having the highest ratio in the data set is 2J88 (21 Ab CRs, the size of protein domains, and intact protein domains inclusive 10 Ag CRs), and the lowest is 1ADQ (9 Ab CRs, 16 Ag CRs).

Table I. Ag–Ab crystal structures analyzed

PDB Structures Analyzed 1ADQ 1I9R 1ORQ 1YQV 2I9L 2VIS 3CVH 3GRW 3L5X 3N85 3S37 1AFV 1IQD 1OSP 1YY9 2J88 2VXT 3D85 3H3B 3L95 3NH7 3SE8 1BGX 1JPS 1QFU 1Z3G 2JEL 2W9E 3D9A 3H42 3LDB 3NID 3SM5 1CZ8 1JRH 1RJL 1ZTX 2NY1 2XQB 3EOA 3HB3 3LEV 3NPS 3SOB 1E6J 1LK3 1S78 2ADF 2NZ9 2XWT 3ETB 3HI6 3LIZ 3O2D 3T2N 1EGJ 1N8Z 1V7M 2ARJ 2Q8B 2ZCH 3FMG 3HMX 3LOH 3OR6 3TT1 1EZV 1NCD 1VFB 2DD8 2QQL 3B9K 3G04 3I50 3LZF 3R1G 3TT3 1FJ1 1NSN 1WEJ 2EXW 2QQN 3BSZ 3G6J 3IU3 3MA9 3RAJ 3VG9 1FNS 1OAZ 1YJD 2FD6 2R29 3C09 3GBN 3K2U 3MJ9 3RKD 4D9R 1FSK 1OB1 1YNT 2GHW 2R56 3CSY 3GI8 3KJ4 3MXW 3S35 4DAG The Journal of Immunology 3

Table II. Summary of amino acids analyzed

CR 4 A˚

Structure Total aa Total Mean Median Maximum Minimum IgL 23,470 751 7 6 27 0 IgH 24,384 1,397 13 12 26 4 All Ab 47,854 2,148 19 19 49 9 Ag 30,812 2,002 18 18 52 8 Ag+Ab 78,666 4,150 37 37 101 20 n = 111 Ag–Ab interactions.

Ag CRS In addition to counting the number of CRs within each structure, the amino acid sequence position of each CR was also recorded for both Ag and Ab. From the sequence positions, the CRSs were determined FIGURE 2. The distribution of the ratio of Ab to Ag CR for 111 x-ray for all 111 structures. Fig. 3 shows the distribution of the size of the crystal structures. CRS across all 111 Ags using 3.5, 4.0, and 4.5 A˚ as the definition of CR. Although there appears to be a trend toward longer distances Downloaded from longest contiguous sequence of CR being 5 (Fig. 4). These findings between CRs resulting in longer CRS, the effect is small using the are in good agreement with previous work (14). The 1FSK structure distances examined in this study. The large majority of Ag struc- is an example of an epitope containing a long, contiguous sequence tures (86.5%) have CRS ranging from 20 to 229 aa. Only 2 of the of 12 CRs (i.e., a linear peptide); however, the complete epitope 111 structures analyzed have all of the Ag CRs contained within contains a total of 17 CR spread over a CRS of 56 residues. Thus, a CRS of ,20 aa (2J88 and 3BSZ). The most common class of Ag epitopes are discontinuous individual and peptide CRs contained CRS in this data set is 50–79 residues. The average and median CRS http://www.jimmunol.org/ within protein regions ranging from ∼20 to 400 aa. of all 111 Ag structures is 125 and 95 aa, respectively. The size and distribution of native mAb neutralization epitopes There is a significantly higher average number of CR for large Ag is illustrated in Fig. 5. The number of CRs for 9 HIV-1 gp120 structures $230 aa (19.8, n = 51), compared with smaller Ags ,230 neutralizing Abs range from 13 to 39 and are contained within long aa (16.6, n = 60, p , 0.01), and there is a weak but significant CRS from 194 to 384 aa. Many of these mAbs contact the same correlation between Ag size and CRS (Pearson correlation coeffi- amino acid positions and the SUM line is a “heat map” indicating cient, r2 = 0.273). The relationship is readily seen in the distribution the number of different neutralizing mAbs contacting residues at of average CRS depicted in Supplemental Fig. 1. The epitope CRS each position. HIV-1 gp120 residues 281, 365, 366, 367, 368, 419, increases proportionally to the size of Ag in the crystal. Also, the 421, and 473 are contacted by five or six of the mAbs. Together, by guest on September 26, 2021 average CRS of Ags $230 aa (179) is significantly greater than Ags these 8 CRs represent the core of an epitope recognized by mul- ,230 aa (79). tiple broadly neutralizing mAbs spread over a linear amino acid A small but significant percent of the structures (11.7%) have Ag sequence of 193 residues. CRS .229 aa. The 442-aa CRS in structure 3TT3 is the longest in Although each CR epitope is unique, the number of CRs and the data set, whereas the 12-aa CRS in structure 2J88 is the shortest. pattern of sequence positions may be useful for grouping neutrali- The 12-aa CRS containing the 2J88 epitope contains 10 Ag CRs. In zation epitopes into classes. Structures 2NY1 and 3JWD have rel- this respect, the 2J88 epitope is the most like a small, contiguous atively few CRs located at almost identical positions and contained peptide of all 111 structures. Although 2J88 is the only structure within a 319-aa CRS. Similarly, structures 3NGB, 3SE9, 3SE8, and where the number of CR  CRS, the majority of the structures do indeed have epitope CRs that are clustered together in short, con- tiguous stretches ranging from 2 to 12 aa, with the most frequent

FIGURE 3. The distribution of Ag epitope CRS determined at 3.5, 4.0, and 4.5 A˚ .Thenumberofstructuresincludedineachdistributionisin parentheses. Only crystal structures having resolution less than the value FIGURE 1. Frequency of CR in Ag–Ab structures. The number of CRs used to determine CR are included. Figure annotations are based on the in the entire data set per group is given in parentheses. 4.0-A˚ distribution. 4 EPITOPE AND PARATOPE STRUCTURE

The ratio of IgH to IgL CRs for each of the 111 structures is depicted in Fig. 7. Three of the 111 structures have no CRs within the L chain (1Z3G, 2I9L, and 3GBN). The overwhelming majority of the structures (92) have more IgH CR than IgL; however, 11 Ab structures have more IgL CRs than IgH. The structure having the highest ratio in the data set is 3EOA (13 IgH and 1 IgL), and the lowest is 3HB3 (5 IgH and 10 IgL). A significant observation from these studies is the degree of vari- ability in the number and ratio of CRs between individual structures within an Ag–Ab complex. Although all the data taken together suggest a model of Ag–Ab interaction involving equal numbers of Ab and Ag CR, this is the case in only 16 of the 111 structures analyzed. Similarly, the composite data indicate that 65% of Ab CRs reside within H chains, but the IgH to IgL CR ratio within an in- dividual Ab is highly variable ranging from no IgL CRs to twice as many L chain CRs as H chain. It is commonly reported that the FIGURE 4. Frequency distribution of the longest contiguous sequence majority of Ab CRs reside within IgH, and the analyses reported of Ag CR in each of the 111 structures analyzed. in this article support these observations; however, there are 11 ex- amples in this data set of Abs having more CRs in IgL than IgH. Downloaded from NIH45-46 all have large numbers of CRs spread over very long CRS Although the large number of structures analyzed in this study does and located at very similar sequence positions. All of the HIV-1 Abs indeed enable construction of useful models of Ag–Ab interaction, the significant variability within and between structures limits the examined in this study bind and neutralize structurally intact infec- ability to predict individual CRs based on these composite data. tious virus illustrating the general phenomenon that functional Abs contact Ag residues in highly conformationally dependant structures Ab CRS http://www.jimmunol.org/ consisting of long CRS. The identification of CRs recognized by As with Ag (Fig. 3), Ab CRs, comprising the Ab paratope, are multiple neutralizing mAbs illustrates the utility of applying a uni- distributed along the IgH and IgL chains in a noncontiguous form analysis to multiple structures and lends credence to the use of manner. The CRS was determined for all 111 IgL and IgH struc- CR defined as a simple distance between amino acids as a means for tures, and the frequency was plotted in Fig. 8. The IgL CRs are identifying residues of viral pathogens that may be critical deter- typically contained within a CRS of 60–69 aa, whereas the IgH minants of . CRS is typically 70–79 residues. Abs make contact with Ag at three discrete regions along each IgH and IgL chain known as CDR, and Ab CR the predominant CRS depicted in Fig. 8 represents Ig chains having

The distribution of the number of CRs present within the H and L CR in both the first and third CDRs. Eleven IgH structures have by guest on September 26, 2021 chains of each Ab structure is shown in Fig. 6. Overall, 65% of the CRS $90 aa (e.g., 3VG9). In all 11 of these structures, there is a Ab CRs identified here are contained within the H chain. The fre- CR in amino acid sequence position 1 or 2, which accounts for the quency of CRs are distributed around 5–7 per L chain and 11–13 extraordinary length of these CRSs. Similarly, there are 3 IgL struc- per H chain. The minimum Ig structure retaining both IgH and IgL tures where amino acid sequence position 1 or 2 is a CR resulting in ∼ ∼ ∼ variable domains (Fv) is 230 aa ( 110 aa VL and 120 aa VH). CRS of 90–99 aa (e.g., 3SE8). The 2,148 Ab CRs identified here represent, on average, ∼8.4% of an intact Fv fragment. The total number of CRs per Ab, however, Ab CR sequence positions varies significantly between structures. Structures 1BGX and 3SE8 The sequence positions of the Ab CR are depicted in Fig. 9. All have 49 and 32 CRs per Ab, respectively, whereas structure 1ADQ PDB Ab sequences were renumbered according to the Kabat has only 9 Ab CRs. convention using Abysis. Abysis did not number structures 3SE8

FIGURE 5. CR epitope map of nine HIV-1 gp120 neutralizing mAbs. Red bars indicate gp120 amino acid sequence positions within 3.5 A˚ of any Ab residue; yellow bars indicate amino acids within 4 A˚ . The number of CRs and length of CRSs are based on 4 A˚ data. The SUM line was constructed by counting the number of structures having CRs at the indicated positions and applying a color scale (red . yellow . green) to the distribution. The Journal of Immunology 5

FIGURE 6. The frequency of CRs in Ab structures. Numbers in pa- FIGURE 8. The frequency distribution of the CRS of IgH and IgL rentheses indicate the total number of CRs per group in the entire set of chains. Numbers in parentheses indicate the total number of CRs per group 111 Ag–Ab structures. IgH+L is the summation of IgH and IgL. in the data set of 111 structures. Downloaded from and 3T2N, and these were omitted from further analysis. The IgH CRRs are defined in this article as contiguous sequences where sequence positions that are most frequently CRs are 100, 52, 31, 97, .10% of the structures have CR. With these boundaries, 91.6% of and 98. The most frequent IgL CR sequence positions are 32, 92, all IgL and 93.7% of all IgH CRs lie within the indicated CRR. IgH 91, 30, and 50. IgL positions 32, 92, and 91 are in agreement with CR sequence positions 1–2 and 73–79 lie outside of defined CDR/ previous findings (9). In this study, we add as predominant CR the CRR. Eleven structures have CRs within IgH region 73–79 (e.g., http://www.jimmunol.org/ IgL positions 30 and 50 and the IgH positions above. 1S78) and 11 within region 1–2 (see also Fig. 9). Region 3 contains Further inspection of the data reveals three prominent regions of more CR than regions 1 and 2 for both IgL and IgH. The third CR for both IgH and IgL chains. These CRRs, defined by distance region of IgH is the largest, containing 24.6% of all Ab CR. There between Ag and Ab residues, are organized in a manner similar to are many Ab structures in this data set that do not have CR in each established CDR defined by Ab gene sequence variability (23). of the six CRRs. There are 7 IgH (e.g., 2QQN) and 64 IgL struc- Although the great majority of the total number of Ab CRs iden- tures that do not make contact in all 3 CRRs. The sequence posi- tified in this study lie within these prominent regions, a small but tions of all CRs in relation to CDR and CRR for all 111 structures significant percentage of CRs have sequence positions that fall are shown in Supplemental Figs. 2 (IgL) and 3 (IgH). by guest on September 26, 2021 outside of these regions. In individual Abs, CRs generally do not occur as long, contiguous A more detailed analysis of Ab CRR is presented in Fig. 10. The sequences, but are scattered in a discontinuous manner within and upper panel is an analysis of IgL CRR and the lower panel is the across regions. It is interesting to note that in IgH and IgL CDR2/ same analysis for IgH. No CRs were identified beyond IgL se- CRR2, there are large numbers of structures having CRs at positions quence position 96 and IgH sequence position 103. The heat maps 50 and 52, but only a small number at position 51. Similarly, IgH highlight sequence positions where multiple structures have CR. position 29 is included within CRR1 even though only 1 Ab of 111 Sixty-three IgL and 59 IgH amino acid sequence positions, repre- contacts Ag at that position because the positions on either side are senting ∼54% of all positions within an Ab V region fragment significantly above the threshold. These observations illustrate the (Fv 230 aa), are CRs in at least 1 of the 111 structures analyzed. discontinuous nature of CR within CRR and CDR of individual Abs. Many of these sequence positions, however, are CRs in only a The fact that IgH positions 29 and 51 and IgL position 51 are in- small number of structures.

FIGURE 7. The ratio of IgH to IgL CR in 111 Ag–Ab structures. FIGURE 9. The distribution of Ab CRs along the IgH and IgL chains. Numbers in parentheses indicate the total CR per group in the entire set of Ab amino acid sequence positions from PDB files were renumbered using 111 structures. Abysis (25). 6 EPITOPE AND PARATOPE STRUCTURE Downloaded from

FIGURE 10. Analysis of IgL (upper) and IgH (lower) CRRs. Heat maps for IgL and IgH were constructed by counting the number of occurrences of each position in all structures and applying a color scale (red . yellow . green) to the distribution. Sequence positions were numbered using Abysis and the Kabat convention. CRRs are highlighted in gray.

cluded within CDR/CRR and yet are only infrequently CRs suggests of CDR reported by others (26). CRRs are defined by measuring http://www.jimmunol.org/ the possibility of a specific role for these positions apart from directly physical distance between atoms within crystal structures, whereas binding Ag, perhaps in properly positioning actual CRs. CDRs are defined by Ig gene sequence variability. The overlap further confirms the validity of both methodologies. Although the two ap- Comparison of CRR with CDR proaches point to similar structural architecture, there are important A comparison of the boundaries, number of sequence positions, and differences as well (Fig. 12, Supplemental Figs. 2 and 3). Ab engi- CR in each CRR with the corresponding CDR is presented in Fig. 11. neers relying solely on definitions of CDR may not give adequate Although CRR and CDR significantly overlap, the boundaries are consideration to sequence positions that are actually important CR. different. The differences between CDR and CRR boundaries are Two examples are IgL sequence position 49 and IgH position 30. greater for IgH than for IgL. There are more total sequence posi- These positions lie outside established CDR and yet frequently occur by guest on September 26, 2021 tions within CDR than CRR. IgH region 3 contains the largest as CR. Also, IgL positions 24, 25, 26, 90, and 97 and IgH positions 61 number of CRs for both CDR and CRR, followed by IgH region 2. and 63 do not contact Ag in this data set but are included within CDR. IgL region 2 contains the fewest CRs for both CDR and CRR. These observations point out the advantage of using multiple meth- Using the Kabat definitions (discussed in Ref. 25), 13.0% of all Ab odologies to characterize Ab binding residues and shed new light on CRs lie outside of established CDR boundaries and 7.0% outside of the organization of regions of Abs harboring important CR. CRR. Recently, Kunik et al. (20) reported ∼20% of the Ab residues that actually bind the Ag fall outside the CDR. Fig. 12 illustrates the Discussion overlap between CDR and CRR. The greatest difference between These analyses were undertaken to better understand the nature of the two is the inclusion of IgH residues 26–30 in CRR1 compared Ab binding to large protein Ags, and in so doing provide insight to with CDR1. Another major difference between the two is inclusion strategies for developing high-performance, high-value functional ofIgLposition49inCRR2comparedwithCDR2.TheCRRsde- and therapeutic Abs. Abs are used in a variety of ways, and the fined in this article contain a larger number of CR and are more structure of an Ag is influenced by the assay or application in which uniform in number of sequence positions in a region, and yet com- the Ab is used (4). In the Western blot, the Ag is typically boiled prise fewer total sequence positions compared with CDR; that is, in detergent and reducing agent, and the Ab must be capable of they are more densely populated with CR than CDR. recognizing the denatured form of the Ag. In sandwich immuno- The analyses presented in this article demonstrate a significant assay methods using minimally processed patient serum or plasma overlap between CRR and established CDR defined using com- samples, the Ab must be able to bind to the properly folded native pletely orthogonal methodologies. Indeed, the CRR boundaries re- Ag structure. Therapeutic Abs must bind to native protein struc- ported in this article are even more closely aligned with refinements tures as they exist in the patient. Functional Abs must bind to

FIGURE 11. Comparison of established CDR (Kabat) and CRR. The total number of CRs in the data set for each group is in parentheses. The Journal of Immunology 7

FIGURE 12. Overlap of CDR and CRR. Amino acid sequence position is numbered across the top using the Kabat convention. The number of CRs in each region (defined in Fig. 11) are indicated. The heat map was constructed by applying a color scale to the number of CRs (red . yellow . green).

native protein structures at specific sites in a way that influences CRs are contained within relatively long CRSs that come into close protein function. spatial proximity because of complex folding inherent within native In many applications, it is of greater interest to predict the sites on protein structures. This is true of both Ag and Ab proteins. In the case an Ag that will result in functional modulation or therapeutic effect of Abs, the CRSs are highly uniform in length (60–69 for IgL and than it is to predict sites that will elicit the production of Abs in an 70–79 for IgH), and CRs are organized into prominent CRRs. Ag immunized animal. Indeed, the site of interest may not be immu- CR organization and CRS are more variable; however, the most nodominant or even immunogenic, and yet an Ab to the epitope is common CRS class, 50–79 residues, is very similar in size to IgL and highly desired. B cell epitope prediction algorithms are of little value IgH CRS. Given that the function of all Abs is to bind, it seems rea- in such instances, and thus strategies for developing Abs to specific sonable that Ab CRs are organized into uniform CRRs and distributed sites of native proteins regardless of their inherent immunogenicity across CRSs that are highly homogeneous in length. In contrast, Downloaded from are critical. epitopes reside on Ags that have a variety of different functions, so An important aspect of these analyses is that the majority of the it is perhaps not surprising that Ag CR and CRS are more variable. Abs are described in functional and therapeutic terms providing The fact that the most frequent Ag, IgH, and IgL CRSs are very information about this important class of molecules. Another im- similar in size suggests a common underlying structural organiza- portant aspect of this work is that all of the Ags consist of $85 aa. tion. In addition, the similarity in size between the large majority of

Structures with large Ags were selected in an attempt to reveal the Ag CRSs (20–229 residues), as well as the minimum Ab structure http://www.jimmunol.org/ full extent of Ab interaction with large, natively folded proteins. In retaining complete binding (Fv 230 aa), further suggests common this data set, there is a significantly higher average number of CRs structural elements. for large Ag structures compared with smaller Ags, and the epitope The basic structural unit within a protein, Ag and Ab, is a domain. CRS increases proportionally to the size of Ag in the crystal. Given Domains fold independently, mediate specific biological function, the earlier considerations and the fact that 92% of the 111 Ags and combine to form larger modular proteins. The smallest domains analyzed in this study are less than full-length, it seems likely that are ∼35 aa and the largest can be several hundred (27). The size of the values reported in this article for CRS, and to a lesser extent CR, the most frequently occurring protein domain is ∼100 aa (28), very may be underestimates of actual values, and previously reported similar in size to the average (125 aa) and median (95 aa) CRSs of estimates of CR based on even smaller protein fragments are lower the 111 Ag structures analyzed in this article. In addition, the size by guest on September 26, 2021 still. distribution of complete protein domains (28, 29) is strikingly Inthis study,weidentifyAgandAbCRsinpubliclyavailablex-ray similar to the distribution of Ag CRS presented in Fig. 3. Together, crystal structures defined as amino acids within a distance of 4 A˚ . these observations suggest that complete structural epitopes, rep- Other researchers have performed similar analyses using this resented by CRS, are the size of intact protein domains. methodology (10, 14, 15); however, prior studies have been more It is widely recognized that Ab framework regions are required for limited in scope and have not focused exclusively on structures proper paratope structure and Ab binding, and complete functional comprising large protein Ags. In addition, the length of the linear paratopes are formed only when CDR and framework regions are sequence containing all CR, the CRS, has not been previously re- structurally organized into VH (∼120 aa) and VL (∼110 aa) domains ported. The large number of structures analyzed in this article that together form a functional binding unit (Fv). In a similar fashion, highlights the extent of variability associated with the number of functional protein Ags are made up of “frameworks” of domains (30), CRs and their relative ratios, the length of CRS, and the organization and functional epitope conformation is determined by the underlying of CR within prominent Ab CRRs. The large number of structures domain structure. When viewed in composite, the observations also highlights specific differences between CDR and CRR. The that the number of Ag CRs is equal to the number of Ab CRs, Ag identification of viral Ag sequence positions contacted by multiple CRSs are similar in size to IgH and IgL CRSs, and the size of the neutralizing Abs, and the organization of Ab CR into CRR similar majority of Ag CRSs is less than or equal to Fv suggest that Abs to established CDR lend credence to distance between CR as recognize structural epitopes that are similar in size to VL,VH,and a useful means of characterizing Ag–Ab interactions. Fv, that is, one to two protein domains. The observation that the distributions of the numbers of Ag and Large protein epitopes are a collection of not only CRs, but many Ab CR significantly overlap and are both centered around ∼18 or 19 more residues that are necessary to present a native structure that CRs per structure suggests that, in general, Abs contact regions on may be as large as, and larger than, the physical footprint of an large protein Ags that are similar in size to their own physical Ab. Attempting to elicit or select Abs that bind full-length, native dimensions. Thus, one characteristic of an Ag epitope is that the proteins with peptides and polypeptides smaller than an entire epitope two-dimensional surface area containing all CRs is less than or not only risks non-native Ag folding, but Abs elicited in this way, equal to the physical dimensions of the binding region of an Ab. although capable of binding native structure, may still not bind to the Davies et al. (11) estimated these dimensions from studies of ly- full-length protein because of steric hindrance imposed by Ag residues sozyme–Ab crystal structures as an oval with the approximate within the footprint of the Ab but not included in the . dimensions of 20 3 30 A˚ . These dimensions represent the “foot- In the analysis of the functional, neutralizing epitopes of HIV-1 print” of Ab paratopes and large protein Ag epitopes. presented in this article, the most broadly neutralizing epitopes are A unique aspect of this work is the characterization of both the Ag composed of structures of ∼380 aa. Immunization with smaller poly- and Ab CRS. The analysis of many Ag–Ab complexes indicates that peptides would likely result in Abs with less CRs and, arguably, less 8 EPITOPE AND PARATOPE STRUCTURE affinity or perhaps less broadly neutralizing activity. In instances 3. Irving, M. B., L. Craig, A. Menendez, B. P. Gangadhar, M. Montero, N. E. van Houten, and J. K. Scott. 2010. Exploring peptide mimics for the production of where Abs bind to CR on multiple domains, the size of an immunogen against discontinuous protein epitopes. Mol. Immunol. 47: 1137– required to elicit such an Ab would be expected to be larger still, 1148. to encompass all CRs and all supporting framework amino acids 4. Brown, M. C., T. R. Joaquim, R. Chambers, D. V. Onisk, F. Yin, J. M. Moriango, Y. Xu, D. A. Fancy, E. L. Crowgey, Y. He, et al. 2011. Impact of immunization required for maximum contact. technology and assay application on antibody performance—a systematic A great deal of effort is expended attempting to predict protein comparative evaluation. PLoS ONE 6: e28718. epitopes. Zhang and coworkers (31) recently reported an improved 5. Jemmerson, R. 1987. Multiple overlapping epitopes in the three antigenic regions of horse cytochrome c1. J. Immunol. 138: 213–219. method of predicting conformational epitopes based on a “thick 6. Newman, M. A., C. R. Mainhart, C. P. Mallett, T. B. Lavoie, and S. J. Smith-Gill. surface patch” model including an “adjacent residue distance” fea- 1992. Patterns of antibody specificity during the BALB/c immune response to hen eggwhite lysozyme. J. Immunol. 149: 3260–3272. ture. This concept of an epitope includes the idea of CRs layered on 7. Jones, P. T., P. H. Dear, J. Foote, M. S. Neuberger, and G. Winter. 1986. top of supporting amino acids but fails to appreciate that the structure Replacing the complementarity-determining regions in a human antibody with of the immediately underlying amino acids is, in turn, dependent on those from a mouse. Nature 321: 522–525. 8. Padlan, E. A., C. Abergel, and J. P. Tipper. 1995. Identification of specificity- even more amino acids comprising the complete protein domain. determining residues in antibodies. FASEB J. 9: 133–139. Currently, B cell epitope prediction is largely focused on physico- 9. Ramirez-Benitez, M. C., and J. C. Almagro. 2001. Analysis of antibodies of chemical attributes of a protein Ag recognized by Abs such as surface known structure suggests a lack of correspondence between the residues in contact with the antigen and those modified by somatic hypermutation. Proteins exposure, solvent accessibility, spatial configuration, and so on. It 45: 199–206. is unlikely, however, that such characteristics will fully account 10. Novotny´, J., M. Handschumacher, E. Haber, R. E. Bruccoleri, W. B. Carlson, D. W. Fanning, J. A. Smith, and G. D. Rose. 1986. Antigenic determinants in for issues of immunodominance and tolerance that influence Ab de- proteins coincide with surface regions accessible to large probes (antibody velopment and involve other components of the such domains). Proc. Natl. Acad. Sci. USA 83: 226–230. Downloaded from as help. 11. Davies, D. R., E. A. Padlan, and S. Sheriff. 1990. Antibody-antigen complexes. Annu. Rev. Biochem. 59: 439–473. When attempting to develop functional and therapeutic Abs, it 12. MacCallum, R. M., A. C. R. Martin, and J. M. Thornton. 1996. Antibody-antigen is less important to identify the sites on a protein that may elicit interactions: contact analysis and binding site topography. J. Mol. Biol. 262: large amounts of Ab and is more important to direct Ab development 732–745. 13. Schlessinger, A., Y. Ofran, G. Yachdav, and B. Rost. 2006. Epitome: database of to sites where binding would modulate biological function, re- structure-inferred antigenic epitopes. Nucleic Acids Res. 34(Database issue): gardless of the site’s inherent immunogenicity. Given that there are D777–D780. http://www.jimmunol.org/ 14. Haste Andersen, P., M. Nielsen, and O. Lund. 2006. Prediction of residues in a limited number of immunodominant sites on a protein, immuni- discontinuous B-cell epitopes using protein 3D structures. Protein Sci. 15: 2558– zation with full-length proteins, although ensuring proper confor- 2567. mation and maximizing the potential interaction between Ag and 15. Ponomarenko, J. V., and P. E. Bourne. 2007. Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct. Biol. 7: Ab, may not elicit Abs directed to the biologically relevant site. 64. Immunization with peptides and polypeptides representing less 16. Ansari, H. R., and G. P. Raghava. 2010. Identification of conformational B-cell than a complete structural epitope has proved time and again to epitopes in an antigen from its primary sequence. Res. 6: 6. 17. Zhao, L., and J. Li. 2010. Mining for the antibody-antigen interacting associa- elicit Abs that may be directed to the site of interest but either do tions that predict the B cell epitopes. BMC Struct. Biol. 10(Suppl. 1): S6. not bind to native protein or bind in a way that does not modulate 18.Collis,A.V.J.,A.P.Brouwer,andA.C.R.Martin.2003.Analysisofthe

antigen combining site: correlations between length and sequence composition by guest on September 26, 2021 function. To develop functional and therapeutic Abs, it is critical of the hypervariable loops and the nature of the antigen. J. Mol. Biol. 325: to focus Ab development to potentially weakly immunogenic, bi- 337–354. ologically relevant sites in a way that the resulting Abs bind to full- 19. Almagro, J. C. 2004. Identification of differences in the specificity-determining residues of antibodies that recognize of different size: implications length native proteins and effect a desired function. for the rational design of antibody repertoires. J. Mol. Recognit. 17: 132– The analyses presented in this article demonstrate that functional 143. and therapeutic Abs bind structures composed of linear strings of 20. Kunik, V., B. Peters, and Y. Ofran. 2012. Structural consensus among antibodies defines the antigen binding site. PLOS Comput. Biol. 8: e1002388. contiguous amino acids that are the size of protein domains. As 21. Madej, T., K. J. Addess, J. H. Fong, L. Y. Geer, R. C. Geer, C. J. Lanczycki, defined by the CRS, a structural epitope is significantly larger than C. Liu, S. Lu, A. Marchler-Bauer, A. R. Panchenko, et al. 2012. MMDB: 3D structures and macromolecular interactions. Nucleic Acids Res. 40(Database previous definitions of epitopes that are focused primarily on CR. issue): D461–D464. A structural epitope includes CRs as well as the minimum number 22. Ofran, Y., A. Schlessinger, and B. Rost. 2008. Automated identification of of contiguous amino acids required to yield a compact native complementarity determining regions (CDRs) reveals peculiar characteristics of CDRs and B cell epitopes. J. Immunol. 181: 6230–6235. structure that is independent of other protein regions or even other 23. Wu, T. T., and E. A. Kabat. 1970. An analysis of the sequences of the variable molecules. Thus, immunization with independent protein domains regions of Bence Jones proteins and myeloma light chains and their implications representing complete structural epitopes is indicated for develop- for antibody complementarity. J. Exp. Med. 132: 211–250. 24. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, ment of functional and therapeutic mAbs targeted to biologically I. N. Shindyalov, and P. E. Bourne. 2000. The protein data bank. Nucleic Acids relevant and potentially weakly immunogenic sites. Res. 28: 235–242. 25. Martin, A. C. R. 2010. Protein sequence and structure analysis of antibody variable domains. In Antibody Engineering, Vol. 2. R. Kontermann, and Acknowledgments S. Du¨bel, eds. Springer-Verlag, Berlin, p. 33–51. 26. Kirkham, P. M., and H. W. Schroeder, Jr. 1994. Antibody structure and We thank Dr. Ross Chambers for insight and review of the manuscript and the evolution of immunoglobulin V gene segments. Semin. Immunol. 6: 347– Dr. Fenglin Yin for review of the database and Ab sequence numbering. 360. 27. Boeckmann, B., M. C. Blatter, L. Famiglietti, U. Hinz, L. Lane, B. Roechert, and A. Bairoch. 2005. Protein variety and functional diversity: Swiss-Prot annotation Disclosures in its biological context. C. R. Biol. 328: 882–899. The authors have no financial conflicts of interest. 28. Wheelan, S. J., A. Marchler-Bauer, and S. H. Bryant. 2000. Domain size dis- tributions can predict domain boundaries. Bioinformatics 16: 613–618. 29. Jones, S., M. Stewart, A. Michie, M. B. Swindells, C. Orengo, and J. M. Thornton. 1998. Domain assignment for protein structures using a con- References sensus approach: characterization and analysis. Protein Sci. 7: 233–242. 1. Strohl, W. R. 2009. Therapeutic monoclonal antibodies: past, present and future. 30. Savageau, M. A. 1986. Proteins of Escherichia coli come in sizes that are In Therapeutic Monoclonal Antibodies. Z. An, ed. John Wiley & Sons, Inc., multiples of 14 kDa: domain concepts and evolutionary implications. Proc. Natl. Hoboken, NJ, p. 3–50. Acad. Sci. USA 83: 1198–1202. 2. van Regenmortel, M. H. V. 2009. What is a B-cell epitope? In Methods in 31. Zhang, W., Y. Xiong, M. Zhao, H. Zou, X. Ye, and J. Liu. 2011. Prediction of Molecular Biology, Protocols, Vol. 524. U. Reineke, and M. conformational B-cell epitopes from 3D structures by random forests with Schutkowski, eds. Humana Press, New York, p. 3–20. a distance-based feature. BMC Bioinformatics 12: 341.