Mechanisms of protein oligomerization, the critical role of insertions and deletions in maintaining different oligomeric states

Kosuke Hashimoto and Anna R. Panchenko1

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894

Edited by Barry H. Honig, Columbia University/Howard Hughes Medical Institute, New York, NY, and approved October 4, 2010 (received for review September 7, 2010)

The main principles of protein-protein recognition are elucidated oligomers in the Protein Structure Database (14). Analysis of by the studies of homooligomers which in turn mediate and reg- high-throughput protein-protein interaction networks showed ulate gene expression, activity of , ion channels, receptors, that there are significantly more self-interacting proteins than and cell-cell adhesion processes. Here we explore oligomeric states expected by chance (15), and that the efficiency of coaggregation of homologous proteins in various organisms to better understand between different protein domains positively correlates with the functional roles and evolutionary mechanisms of homooligo- their similarity (16) due to stability, foldability, or evolutionary merization. We observe a great diversity in mechanisms controlling constraints (17, 18). Different evolutionary scenarios of protein oligomerization and focus in our study on insertions and deletions oligomerization have been discussed in the literature. Some of in homologous proteins and how they enable or disable complex the scenarios propose evolutionary mechanisms that follow formation. We show that insertions and deletions which differenti- kinetic pathways, domain swapping, or formation of leucine zip- ate monomers and dimers have a significant tendency to be located pers (8, 19–22). Gene duplication may lead to oligomeric para- on the interaction interfaces and about a quarter of all proteins logs and may create in evolution new protein complexes with studied and forty percent of enzymes have regions which mediate unique specificities (7, 11, 23–25). It has also been shown that or disrupt the formation of oligomers. We suggest that relatively binding modes and symmetries of homooligomeric complexes small insertions or deletions may have a profound effect on are conserved only between close homologs, and homooligomers complex stability and/or specificity. Indeed removal of complex with dihedral symmetry evolved through their cyclic intermedi- enabling regions from protein structures in many cases resulted ates (7, 26). in the complete or partial loss of stability. Moreover, we find that As homooligomers play important functional roles, the forma- insertions and deletions modulating oligomerization have a lower tion of multiple oligomeric interfaces and symmetry requirements aggregation propensity and contain a larger fraction of polar, put additional constraints on the evolution of constituent mono- charged residues, glycine and proline compared to conventional mers. Mechanisms of homooligomerization are not very well interfaces and protein surface. Most likely, these regions may understood although recent studies have shed some light on mediate specific interactions, prevent nonspecific dysfunctional their main principles. For example, it has been shown how low- aggregation and preclude undesired interactions between close affinity cadherin dimers formed through β-strand swapping paralogs therefore separating their functional pathways. Last, can mediate and control highly specific intercellular adhesion we show how the presence or absence of insertions and deletions (10, 27). In another study it was proposed that β-barrel membrane on interfaces might be of practical value in annotating protein proteins can oligomerize through the weakly stable interfacial oligomeric states. beta-strands (28), while the C-terminal helix in certain p53 proteins might be essential for stabilizing tetramers (29). Finally, ∣ ∣ homodimer homooligomer protein protein structural evolution the manual inspection of insertions and deletions of homologous proteins in different oligomeric states revealed that for about ow proteins interact is a basic question in cell biology, funda- a quarter of them certain protein regions are responsible for Hmental studies of physico-chemical and evolutionary me- enabling or disabling the oligomeric interfaces (30). chanisms of protein specific binding involve homooligomers In this paper we explore different oligomeric states of homo- and analysis of their complex formation and functional roles in logous proteins to better understand the functional and evolution- providing specific cellular function. Recent studies have led to ary mechanisms of homoligomerization. We observe a great a greater awareness that homooligomers provide diversity and diversity in mechanisms of oligomerization and focus our study specificity of many pathways and may mediate and regulate gene on how insertions or deletions in proteins may influence complex expression, activity of enzymes, ion channels, receptors, and formation. We show that insertions and deletions which differ- – cell-cell adhesion processes (1 10). Transitions between different entiate monomers and dimers have a significant tendency to be oligomeric states may also be important in regulation of located on the interfaces and about a quarter of all studied and tumor formation, and oligomeric complexes between homo- proteins and 40% of enzymes have regions which may mediate logous proteins can integrate different pathways and provide the or disrupt the formation of homodimers. Therefore we suggest cross talk between them (5, 11). Moreover, homooligomers may that relatively small insertions or deletions may represent an undergo reversible transitions between different discrete confor- mations which preserve the symmetry of the complex and account for their cooperative binding properties and allosteric mechan- Author contributions: A.R.P. designed research; K.H. performed research; K.H. and A.R.P. isms in signal transduction (12). In addition, oligomerization analyzed data; and K.H. and A.R.P. wrote the paper. allows proteins to form large structures without increasing gen- The authors declare no conflict of interest. ome size and provides stability, while the reduced surface area of This article is a PNAS Direct Submission. the monomer in a complex can offer protection against denatura- Freely available online through the PNAS open access option. tion (1–3, 9, 13). 1To whom correspondence should be addressed. E-mail: [email protected]. Although protein oligomeric states are often quite difficult to This article contains supporting information online at www.pnas.org/lookup/suppl/ characterize experimentally, the majority of proteins form homo- doi:10.1073/pnas.1012999107/-/DCSupplemental.

20352–20357 ∣ PNAS ∣ November 23, 2010 ∣ vol. 107 ∣ no. 47 www.pnas.org/cgi/doi/10.1073/pnas.1012999107 Downloaded by guest on September 28, 2021 50 important evolutionary mechanism of oligomerization, and may Enabling profoundly affect complex stability and the development of new 40 Disabling specificities. Indeed our computational experiments have demon- 30 strated that removal of enabling regions from protein structures results in the complete or partial loss of dimer stability. Moreover, 20 enabling and disabling regions may allow proteins to develop new 10 specific interactions and prevent undesired interactions between 0 close paralogs, thereby facilitating the separation of their func- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 2324 25 tional pathways. This assumption is supported by analyses of Length of regions sequence and structure, composition, and by calcula- Fig. 1. The length distribution of the enabling and disabling regions in the tions of aggregation propensities and free energies of dissociation nonredundant set. of complexes. Lastly, we show how the enabling/disabling features might be of practical value in annotating protein oligomeric states. long whereas some of the disabling regions contain up to twenty five residues. Analysis of functional categories of families with Results enabling/disabling regions showed that families more fre- Identifying Mechanisms of Dimerization. We obtained 4,419 pairs quently contained such regions (72 out of 168 families, binomial of a dimer and a closest monomer from the same Conserved test p-value < 0.003) than other families (46 out of 199 families). Domain Database (CDD) family for the full dataset and 532 Although even one extra residue on an interface might affect pairs for the nonredundant dataset (see Materials and Methods). oligomer formation we concentrated on regions of four or more This list contained 367 different CDD families (348 families in residues that clearly mediate or disrupt the homodimer interface. the nonredundant dataset) encompassing a wide range of protein (Table S1 lists manually analyzed representatives of enabling and biological functions. About half of these families (168 families) disabling regions of four or more residues and all regions are represented enzymes and other major functional categories in- listed in Table S2). All monomeric states from this table (except cluded regulatory, signaling, and transport proteins. We analyzed for three entries) were confirmed using the (Inferred Biomolecu- pairs of dimers and monomers, their sequence, structural similar- lar Interaction Server) IBIS server (35) and dimer interfaces ity, and energy of dimer dissociation. The sequence similarity were additionally verified by the NOXclass method (36). We also between proteins in the dimeric and monomeric states varied con- estimated the free energy of dissociation of dimers with the PISA siderably with the highest fraction of proteins sharing 90–100% algorithm (ΔGdiss) which uses principles of chemical thermody- sequence identity between a dimer and a monomer (Fig. S1A), namics to detect macromolecular assemblies. We showed that which corresponds to the same proteins in different oligomeric many dimers were characterized by high binding affinity, ΔGdiss > states or cases where point mutation leads to the formation or 4 ∕ disruption of a dimer. Upon examination the majority of these kcal mol [this value corresponds to standard energy of disso- dimers were formed by domain swapping or through small scale ciation under the assumption that equilibrium concentrations of conformational changes. Point mutation can also have a consider- dimers and monomers are equal (37)], implying that they were able effect on dimer formation and implicate many diseases (31). quite stable. Moreover, we performed the excision of enabling For example, we found point mutations of glutamate receptors regions from the structures of protein dimers from the represen- which apparently stabilize their dimer interface and reduce tative set and recalculated the free energy of dissociation. We desensitization (1lb8-1mm6). Point mutation could also shift equi- showed that removal of enabling regions disrupted the dimer librium between different oligomeric states in Pyrococcus furiosus formation in about half of families and destabilized the dimer mutants (1iz5-1iz4) and disrupt the salt bridge stabilizing the in other cases (Table S1). The effect of destabilization depended functional dimer in nuclear hormone receptors (2pin-1n46). All considerably upon the . When we examined struc- these findings were supported by experimental studies (32–34). tural similarity in the aligned regions between proteins in two Other mechanisms observed in our set of dimers and monomers oligomeric states, it always turned out to be quite high (drmsd < included:—presence of common stabilizing ligands on interfaces 3 Å) indicating that the dimerization was not caused by intra- including ions (“ligand induced dimerization”);—regulation of chain conformational changes or involved domain swapping. dimerization through posttranslational modifications including After examining secondary structure propensities of enabling phosphorylation and disulfide bond formation between two sub- and disabling regions we found that the majority of these regions units; and —presence of insertions/deletions which favored constituted loops (54% of enabling and 59% of disabling regions dimeric or monomeric states. We will discuss the latter mechanism respectively), followed by the α-helix (37% and 30%), and in more detail. β-strand (9% and 11%) (Fig. 2 and Fig. S2). Moreover, loops and strands occurred significantly more often in enabling regions Mechanism of Dimerization Through Insertions/Deletions of Protein than on the overall protein surface (Fisher exact test p-value ≪ Regions. We analyzed differences in structures between mono- 0.05)(Table S3). In our previous study on glycosyltransferases we mers and dimers to assess whether or not the gapped or unaligned found that loops were also significantly more prevalent among residues were more frequently located on interface vs. other types enabling regions (24). of regions. We showed that the unaligned and gapped residues α β (inserted in the dimer compared to the monomer), occurred Loop -helix -strand 9% 11% more frequently on the interface than on the surface (P-values ≪ 54% 59% 10e−7)(Fig. S1B). This observation implies that they may play 37% 30% an important role in forming the dimer interface, constituting enabling regions. In addition, we noticed that for very remotely related monomers and dimers (of less than 20% identity) en- abling features might either be absent or obscured by the overall Enabling Disabling structural differences as a result of evolutionary divergence. Fig. 2. Percentage of secondary structures in the enabling and disabling We found 1,020 enabling regions and 351 disabling regions regions in the nonredundant set. Percentage of secondary structures in from the full dataset and 108 enabling and 49 disabling regions the full dataset is shown in Fig. S2. In all cases including the full dataset, more from the nonredundant dataset. Fig. 1 shows the length distribu- than half of enabling and disabling regions are located in loops, approxi- BIOPHYSICS AND

tion of these regions. Most regions are less than nine residues mately 30–40% in α-helices, and 10% in β-strands. COMPUTATIONAL BIOLOGY

Hashimoto and Panchenko PNAS ∣ November 23, 2010 ∣ vol. 107 ∣ no. 47 ∣ 20353 Downloaded by guest on September 28, 2021 Amino Acid Composition and Propensity to Aggregate for Enabling 8.0 Interface (align) and Disabling Regions. Previous studies showed that amino acid Surface composition on homooligomeric interfaces differed from those Enabling on surfaces and buried regions (38). We confirmed that most .0 Disabling of the dimer interfaces contained hydrophobic and some aro- .

matic amino acids (Fig. S3A). We also found that enabling and 06

disabling regions contained more polar and charged residues than Frequency the aligned interface regions, “conventional interfaces,” (Fig. 3) . (although these biases were not pronounced) and contained 04 significantly larger amounts of glycine and proline (Fig. 3 and 0.

Fig. S3B). Indeed, the effects of proline and glycine on the inhi- 02 bition of nonspecific aggregation and amyloid formation have −10 −8 −6 −4 −2 0 been confirmed previously (39). Aggregation Propensity To understand the role of enabling and disabling regions Fig. 4. Aggregation propensities calculated using the nonredundant set in specific biological interactions and in preventing nonspecific for four different regions: enabling, disabling, aligned interface, and overall aggregation, we calculated aggregation propensities using poten- molecular surface. The distributions of aggregation propensities are tials developed previously (40). Unlike amino acid composition smoothed by the Gaussian kernel density estimation. Conventional interfaces analysis the aggregation propensity calculations take into account have the highest aggregation propensity, whereas disabling regions have the residue context, the presence of specific patterns of alternat- the lowest aggregation propensity. Aggregation propensities for the full ing hydrophobic and hydrophilic residues, and residue charges. A dataset are also shown in Fig. S4B. recent study by Pechmann et al. (41) reported that interfaces of creating a protruding interaction surface whereas this β-sheet is protein complexes are more prone to aggregate than surface. missing in monomeric human proteins (Fig. 5). Consistent with this study we found similar trends for homooli- According to the second structural mechanism, which consti- gomers (Wilcoxon signed-rank test p-value < 10e−10, Fig. S4A) tutes the majority of the cases, existing secondary structure ele- even though homooligomers might be under selection pressure to ments are extended to form the interface. One example includes have lower aggregation propensity compared to heterooligomers glycoside (pfam10566), BtGH97a and BtGH97b. The (42). Importantly, we observed that while enabling regions have a somewhat higher propensity to aggregate than the protein surface difference between these two structures might be the result of their aggregation propensity is lower than that for conventional insertions of two loops in BtGH97a that strongly contribute to aligned interfaces (p-value < 10e−10) (Fig. 4 and Fig. S4B). In- form the homodimer interface (Fig. 6A). In addition, one of these terestingly, disabling regions have quite low aggregation propen- loops reaches close to the catalytic center of the counterpart of sity, considerably lower than that for enabling regions or con- the homodimer, implying its possible involvement in the catalytic ventional interfaces (p-value < 10e−10). reaction or substrate binding (Fig. S5A). Moreover, we found that the two enabling loops were present only in one subfamily Evolutionary and Structural Mechanisms to Form Enabling and Dis- BtGH97a and absent in the other BtGH97b (Fig. S5B). Based abling Regions. We analyzed enabling regions and found two on the facts that the evolutionary conservation of these loops typical structural mechanisms to enable extra residues to form is directly related to the catalytic mechanism and at the same time an interface. According to the first scenario an insertion in a these loops mediate the formation of oligomeric states, we can dimer (or deletion in a monomer, we do not have enough data conclude that catalytic mechanism and binding specificity can to infer ancestral states and distinguish these evolutionary events) be modulated through different functional oligomeric states. This creates a new secondary structure element or loop that functions conclusion has indeed been confirmed by recent studies (43) for as a new interaction surface. For example, dimeric extracellular glycosyl hydrolases and will be discussed further in the following glutamyl (cd00190, Table S1) has a loop which is sections of our paper. Fig. 6 B and C shows examples of the elongated through the insertion of an enabling region which β-strand and α-helix extensions. For example, in the pair of an forms a new β-sheet. The β-sheet extends in a vertical direction, aminoimidazole riboside kinase and a fructokinase (cd01167), the extension of two strands on the homodimer interface enables 0.4 A the formation of a β-sheet between two subunits. 0.2 MWCSQ I YAVTL 0.0 HRFEDKNPG* -0.2 Log odds ratio -0.4

-0.6 B 0.6 0.4 0.2 LIVCAYREHKWQ 0.0 SFTMDGNP -0.2 Enabling region in homodimer (1P3C) -0.4 Aligned region in monomer (1FQ3) Log odds ratio -0.6 -0.8 Fig. 5. Illustration of enabling features with insertion of new secondary structure elements, 1P3C—1FQ3 pair from the -like protease -1.0 family (cd00190). Two subunits of the homodimers are shown in light-blue Fig. 3. The amino acid propensities in the nonredundant set for (A) enabling and light-red. Enabling regions and their surrounding residues are repre- regions (B) disabling regions. The propensities of the full dataset are shown sented by the red tubes, whereas corresponding residues in the monomer in Fig. S3 C and D. The propensities are calculated as the log ratio between are represented by the yellow tubes. A small β-sheet contributing to the frequencies of a particular amino acid in enabling/disabling regions and enabling interaction (shown in red tube) is absent from the monomer (shown frequency of the same amino acid in aligned conventional interfaces. in the yellow tube).

20354 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1012999107 Hashimoto and Panchenko Downloaded by guest on September 28, 2021 (a) 2D73(homodimer) - 3A24(monomer) (b) 1TYY(homodimer) - 2QHP(monomer)

ABEnabling region in homodimer (2D73) Enabling region in homodimer (1TYY) CEnabling region in homodimer (3E3A) Aligned region in monomer (3A24) Aligned region in monomer (2QHP) Aligned region in monomer (3E0X)

Fig. 6. Illustration of enabling features with the extension of existing secondary structure elements. (A) 2D73—3A24 from the glycoside family (pfam10566). Two longer loops create an interface that is absent from the monomer (B) 1TYY—2QHP from the Fructokinases family (cd01167). The β-sheet is more extensive in the homodimer than in the monomer. (C) 3E3A—3E0X from the Esterases and lipases family (cd00312). Two α-helices are extended in the homodimer compared to the monomer. In all figures, the two subunits of the homodimers are shown in light-blue and light-red. Enabling regions are represented by red tubes, whereas corresponding regions in the monomer are shown by yellow tubes.

We analyzed the structural mechanisms preventing dimer for- abling features achieves a very good classification accuracy of mation as illustrated by the example of phosphonate monoester 0.70 sensitivity, 0.74 specificity, and 0.94 precision (Table 1) even hydrolase (cd00016). As can be seen in Fig. 7, a disabling region, though this approach does not explicitly use the information about shown in yellow, fills a part of the interaction surface and disrupts the level of similarity between unknown and annotated proteins. the dimer formation. The C-terminal region in the dimer, shown High precision of our prediction comes from the low rate of false in blue, forms the interface while in the monomer this interface is positives once the enabling/disabling features are detected in the covered by its own C-terminal region which increases its stability. family and does not drop considerably even for remote homology Interestingly, it has been shown experimentally that truncation range between unknown and annotated proteins. At the same of the C-terminal region that forms the homodimer interface time, similarity search methods including BLASTcan also predict severely affects the enzyme activity (44) and moreover, the effect oligomeric states by inference from homologs with the accuracy of the interface truncation is much larger than the effect of the considerably higher than expected from random assignments mutations. (sensitivity 0.74, specificity 0.53, precision 0.89). Annotating Unknown Oligomeric States. The experimental identifi- Discussion cation of oligomeric states is a tedious task while the computa- Mechanisms of Oligomerization Are Very Diverse. From our analysis tional annotation is complicated by the limited conservation of of different oligomeric states in proteins from the same family oligomeric states in evolution (7, 26). Detecting features which we observed several mechanisms which play key roles in oligomer- mediate or disrupt the oligomer formation can help in this respect ization: -domain swapping; -“ligand induced dimerization”; -point to predict or verify the existing oligomeric states and biological mutations on dimer interfaces;—posttranslational modifications binding modes. Using our representative set of dimers and mono- including phosphorylation on dimer interfaces and disulfide bond mers with enabling and disabling features from Table S1 we anno- formation between two subunits; and —insertions/deletions tated the homooligomeric states for all protein entries from our which favor dimeric or monomeric states. We focused in this paper set (excluding proteins from the representative set and protein on the latter mechanism and showed that unaligned and gapped pairs from the 90–100% identity bin, Fig. S1A). Although, as can residues occurred more frequently on the interface than on the be seen in Table S4, the accuracy of oligomeric state prediction depends considerably on the family, the simultaneous presence surface and about a quarter of all studied proteins and about of enabling and disabling features in different members of the 40% of all enzyme families had regions modulating the formation family considerably facilitates the prediction, achieving close to of dimers. Indeed, it was shown previously that oligomerization is 100% accuracy. Overall, the presence/absence of enabling or dis- especially important for enzyme activation/deactivation in meta- bolic pathways, for regulating their cellular location, and interac- A B tions with other components (3, 5, 6). The enabling and disabling Disabling regions may occur through insertion and deletion events in evolu- region tion and although these genetic events are relatively rare com-

Table 1. Prediction accuracy of oligomeric states

Error rate Sensitivity Specificity Precision 1- TN/ 90 TP/(TP + FN) TN/(FP + TN) TP/(TP + FP) (FP + TN)

Presence/absence of enabling 0.70 0.74 0.94 0.36 Homodimer interface and disabling features Interacting region of another subunit Percent identity 0.71 0.62 0.91 0.38 rmsd 0.72 0.60 0.90 0.40 Fig. 7. Illustration of disabling features of 2VQR—3B5Q from the phospho- GSAS 0.81 0.57 0.91 0.43 nate monoester hydrolase family (cd00016). (A) Front view of the interface in BLAST 0.74 0.53 0.89 0.47 2VQR (homodimer) with several disabling residues shown in yellow for 3B5Q (monomer). (B) Side view of the interface. Interface and surface on one Assignment of oligomeric states based on presence/absence of enabling/ subunit of the homodimer are shown in dark-red and light-red. The binding disabling features or based on the closest similarity to proteins with region (C-terminal region) of the other subunit is represented by the blue enabling/disabling features using different similarity metrics rmsd, GSAS, BIOPHYSICS AND

tube. A disabling region that fills a part of the interface is shown in yellow. sequence identity, and BLAST p-value COMPUTATIONAL BIOLOGY

Hashimoto and Panchenko PNAS ∣ November 23, 2010 ∣ vol. 107 ∣ no. 47 ∣ 20355 Downloaded by guest on September 28, 2021 pared to point mutations (45), it was shown previously that many At the same time enabling regions may prevent nonspecific dys- insertions and deletions are under strong selective pressure in functional aggregation, shielding “sticky” hydrophobic patches of proteins (46, 47). Our detailed analysis of biological functions, the conventional interface from the solvent. The effect of these structures, and evolutionary conservation of proteins in different regions is similar to the role of “gatekeeping” residues preventing oligomeric states from the studied families revealed several main nonspecific aggregation (41, 51). Disabling regions prevent any functional roles of enabling and disabling features. aggregation or association with other partners and might be spe- cific in the sense of preventing the participation in nonspecific Enabling Regions Are Important for Complex Stability. First of all, functional pathways assisting the functional separation of para- enabling features can be important for stability of dimers. Indeed, logs. In our previous paper we demonstrated that specific function the vast majority of dimers with enabling features from our repre- and binding selectivity in homodimers might be also sustained sentative set correspond to obligatory complexes and deletion of by disordered loops (52). We should mention, that although here enabling features in the majority of cases leads to considerable we did not study explicitly the amino acid substitutions which can destabilization of complexes. This scenario is realized for proteins govern the complex formation, previous studies clearly showed from the dihydropholate reductase family (DHFR, cd00209) that most of these substitutions involved aromatic and hydropho- which preferably exist as dimers in hyperthermophilic organisms bic residues which increased binding affinity and stabilized the and monomers in mesophiles. Moreover, monomers from this homooligomer (53, 54), but at the same time compromised the family have disabling regions so that they do not compromise specificity and reversibility of the complex formation. Finally, catalytic activity for stability [dimers have reduced catalytic activ- our analysis concludes that presence of enabling and disabling ity (48)]. The inspection of these dimers showed that they are features can be a strong predictor of the oligomeric state once stabilized through the enabling regions which form “clapping hands” conformations with an extensive set of contacts. We these features are found in the target protein, resulting in the should distinguish this mechanism of dimer formation from the lower false positive rate and high precision of the prediction. “ ” previously described domain swapping mechanism (19) which Materials and Methods includes the opening up of the monomeric conformation and Annotating Oligomeric States and Interaction Interfaces. First we extracted all exchanging identical regions between two monomers. While do- available protein chains from the Molecular Modeling Database (MMDB) main swapping involves substantial conformational changes and (55). Then we selected single domain chains, which had only one domain the stabilities of swapped dimers and monomers might be com- covering at least 80% of the entire chain. The domain definitions and bound- parable, dimers formed through enabling features may differ aries were obtained from the CDD version 2.17 (56), which is a curated considerably in their energy of stabilization from the monomers collection of multiple sequence alignments of protein domains defined as and as we showed do not involve large intrasubunit conforma- recurrent evolutionary units having specific functions and structures. For tional changes. each chain, oligomeric states and interaction interfaces were annotated and free energy of dissociation was calculated using the PISA program, which – Enabling and Disabling Regions Are Important for Developing New achieves 80 90% accuracy for the correct identification of macromolecular Specificities and Separation of Functional Pathways. In another assemblies (37). Because the majority of homooligomers in MMDB are dimers scenario, enabling/disabling regions may correspond to the speci- and higher order homooligomeric states are much more difficult to decipher using existing methods, in the current study we concentrated on homodimers ficity determining features or loops characteristic only for certain defined as assemblies consisting of two chains with at least 90% sequence subfamilies of dimers or monomers, which can provide high bind- identity. ing affinity to certain substrates while minimizing interactions with other unwanted partners. Examples of how enabling or dis- Comparison of Homologs in Different Oligomeric States. We searched for the abling regions can regulate the accessibility of substrates to the most similar monomer-dimer pair of proteins from the same CDD family. active site by imposing certain steric constraints include the viru- Similarity was assessed with the VAST program (57) using the gapped struc- lence factor choline-binding protein subfamily from the Metallo- tural alignment score (GSAS) (58). We compiled a full dataset including all beta-lactamase superfamily (cl00446) (49) or subfamilies from protein pairs and a nonredundant dataset (similar monomer/dimer chains Trypsin-like serine proteases (cd00190) (50). From an evolution- with BLAST p-value < 10e−07 were excluded). We then compared dimer and ary perspective, introducing enabling or disabling regions in a monomer structures for each pair of proteins to determine those features complex may allow the development of new specificities and which might mediate or disrupt the formation of the dimer or monomer. prevent undesired interactions between close paralogs, therefore Based on the VAST structure-structure superpositions between dimers and facilitating the separation of their functional pathways. Because monomers, all protein regions were classified into three categories: aligned, gapped, and unaligned regions. Here, gapped region refers to a set of con- oligomeric state often determines the binding affinity and activity secutive residues that are present in a dimer and absent from the monomer of proteins, developing new specificities in different organisms or (and vice versa), whereas unaligned region includes all the residues except for in paralogs from the same organism by changing the functional aligned or gapped residues. We also classified residues into three categories oligomeric forms could indeed be an important evolutionary in terms of their location on the structure: interface, surface, and buried mechanism. At the same time it is known that the activity of many residues. Interface, surface, and buried residues were defined by PISA. We oligomeric enzymes (many of which are presented in Table S1) searched for the regions that enabled or disabled the formation of interface and nuclear receptors are allosterically regulated, which can rein- in homodimers. An enabling region is defined as a gapped region on homo- force their specific binding (12). dimers where 80% of the gapped residues are also annotated as interface To support the specificity-defining role of enabling and dis- residues. A disabling region is defined as a gapped region on monomers that abling regions, we showed that while enabling regions have a is surrounded by two aligned residues, which corresponds to the interface somewhat higher propensity to aggregate than the protein surface, residues on the dimer. their aggregation propensity and binding affinity is lower than that for the conventional aligned interfaces. Disabling regions have Calculating Amino Acid Composition and Aggregation Propensity. We calcu- lated amino acid propensities to be located preferentially in different regions considerably lower aggregation propensity. Based on these results ðiÞ¼LnðF i ∕F i Þ F i F i of proteins as Propensity int sur , where int and sur are the frac- and amino acid composition analysis (showing that enabling/ tions of amino acid i on the interface and surface, respectively. The aggrega- disabling regions contain a large fraction of polar and charged tion propensity of regions was calculated using the Zyggregator method, residues as well as Gly and Pro), we can conclude that most which considers physico-chemical properties, hydrophobicity, charges, the likely, enabling regions may play an important role in mediating propensities to α-helix or β-sheet, as well as residue context, and the presence specific (in many cases of electrostatic nature) interactions includ- of specific patterns of alternating hydrophobic and hydrophilic residues (40). ing salt bridges, aromatic ring pairing, and cation-π interactions. (See SI Text for details).

20356 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1012999107 Hashimoto and Panchenko Downloaded by guest on September 28, 2021 Estimating Accuracy for Predicting Oligomeric States. Each protein was com- dicted oligomers. True negatives (TN) were defined as those which did not pared to the representative set of proteins with enabling/disabling features pass prediction criteria and were not assigned the wrong oligomeric state from the same family (Tables S1 and S4). Then the oligomeric state of the and false negatives (FN) as those which did not pass prediction criteria unknown protein was assigned based on either the presence/absence of and were not assigned the correct oligomeric state. Then the sensitivity enabling/disabling features or based on the closest similarity to annotated TP∕ðTP þ FNÞ was calculated as the number of TP divided by the sum of proteins with enabling/disabling features using different similarity metrics TP and FN. Specificity was estimated as the ratio TN∕ðFP þ TNÞ, error rate rmsd, GSAS, sequence identity, and BLAST p-value. CDD families were classi- fied into three types based on the presence of the enabling and disabling as one minus the specificity value and precision was calculated as the ratio ∕ð þ Þ features: families with only enabling regions, families with only disabling TP TP FP . regions, and families with both enabling and disabling regions. The assign- ment of oligomeric state was based on the fraction of enabling/disabling ACKNOWLEDGMENTS. We thank Steve Bryant for insightful discussions and regions aligned, if more than 80% of the enabling region was aligned to Tom Madej for careful reading of the manuscript. This work was supported the test protein, the protein was predicted to be a dimer and if more than by National Institutes of Health/Department of Health and Human Service 80% of the disabling region in a protein was aligned, it was predicted to (DHHS) (Intramural Research program of the National Library of Medicine). be a monomer (see details in Table S4). True positives (TP) were defined as K.H. was supported by a JSPS Research Fellowship from the Japan Society for correctly predicted oligomeric states, false positives (FP) as incorrectly pre- the Promotion of Science.

1. Cornish-Bowden AJ, Koshland DE, Jr (1971) The quaternary structure of proteins 31. Schuster-Bockler B, Bateman A (2008) Protein interactions in human genetic diseases. composed of identical subunits. J Biol Chem 246:3092–3102. Genome Biol 9:R9. 2. Jones S, Thornton JM (1995) Protein-protein interactions: a review of protein dimer 32. Sun Y, et al. (2002) Mechanism of glutamate receptor desensitization. Nature structures. Prog Biophys Mol Biol 63:31–65. 417:245–253. 3. Torshin I (1999) Activating oligomerization as intermediate level of signal transduc- 33. Matsumiya S, Ishino S, Ishino Y, Morikawa K (2003) Intermolecular ion pairs maintain tion: analysis of protein-protein contacts and active sites in several glycolytic enzymes. the toroidal structure of Pyrococcus furiosus PCNA. Protein Sci 12:823–831. Front Biosci 4:D557–570. 34. Estebanez-Perpina E, et al. (2007) Structural insight into the mode of action of a direct 4. Woodcock JM, Murphy J, Stomski FC, Berndt MC, Lopez AF (2003) The dimeric versus inhibitor of coregulator binding to the thyroid hormone receptor. Mol Endocrinol monomeric status of 14-3-3zeta is controlled by phosphorylation of Ser58 at the dimer 21:2919–2928. interface. J Biol Chem 278:36323–36327. 35. Shoemaker BA, et al. (2009) Inferred Biomolecular Interaction Server–a web server to 5. Mazurek S, Boschek CB, Hugo F, Eigenbrodt E (2005) Pyruvate kinase type M2 and its analyze and predict protein interacting partners and binding sites. Nucleic Acids Res role in tumor growth and spreading. Semin Cancer Biol 15:300–308. 38:D518–D524. 6. Koike R, Kidera A, Ota M (2009) Alteration of oligomeric state and domain architec- 36. Zhu H, Domingues FS, Sommer I, Lengauer T (2006) NOXclass: prediction of protein- ture is essential for functional transformation between and hydrolase with protein interaction types. BMC Bioinformatics 7:27. – the same scaffold. Protein Sci 18:2060 2066. 37. Krissinel E, Henrick K (2007) Inference of macromolecular assemblies from crystalline 7. Dayhoff JE, Shoemaker BA, Bryant SH, Panchenko AR (2010) Evolution of protein bind- state. J Mol Biol 372:774–797. – ing modes in homooligomers. J Mol Biol 395:860 870. 38. Ofran Y, Rost B (2003) Analyzing six types of protein-protein interfaces. J Mol Biol 8. Baisamy L, Jurisch N, Diviani D (2005) Leucine zipper-mediated homo-oligomerization 325:377–387. – regulates the Rho-GEF activity of AKAP-Lbc. J Biol Chem 280:15405 15412. 39. Rauscher S, Baud S, Miao M, Keeley FW, Pomes R (2006) Proline and glycine control 9. Goodsell DS, Olson AJ (2000) Structural symmetry and protein function. Annu Rev protein self-organization into elastomeric or amyloid fibrils. Structure 14:1667–1676. – Biophys Biomol Struct 29:105 153. 40. Tartaglia GG, Vendruscolo M (2008) The Zyggregator method for predicting protein 10. Chen CP, Posy S, Ben-Shaul A, Shapiro L, Honig BH (2005) Specificity of cell-cell aggregation propensities. Chem Soc Rev 37:1395–1401. adhesion by classical cadherins: Critical role for low-affinity dimerization through 41. Pechmann S, Levy ED, Tartaglia GG, Vendruscolo M (2009) Physicochemical principles – beta-strand swapping. Proc Natl Acad Sci USA 102:8531 8536. that regulate the competition between functional and dysfunctional association of 11. Reid AJ, Ranea JA, Orengo CA (2010) Comparative evolutionary analysis of protein proteins. Proc Natl Acad Sci USA 106:10159–10164. complexes in E.coli and yeast. BMC Genomics 11:79. 42. Chen Y, Dokholyan NV (2008) Natural selection against protein aggregation on self- 12. Changeux JP, Edelstein SJ (2005) Allosteric mechanisms of signal transduction. Science interacting and essential proteins in yeast, fly, and worm. Mol Biol Evol 25:1530–1533. 308:1424–1428. 43. Kitamura M, et al. (2008) Structural and functional analysis of a glycoside hydrolase 13. Miller S, Lesk AM, Janin J, Chothia C (1987) The accessible surface area and stability of family 97 enzyme from Bacteroides thetaiotaomicron. J Biol Chem 283:36328–36337. oligomeric proteins. Nature 328:834–836. 44. Jonas S, van Loo B, Hyvonen M, Hollfelder F (2008) A new member of the alkaline 14. Henrick K, Thornton JM (1998) PQS: a protein quaternary structure file server. Trends phosphatase superfamily with a formylglycine nucleophile: structural and kinetic Biochem Sci 23:358–361. characterization of a phosphonate monoester hydrolase/phosphodiesterase from 15. Ispolatov I, Yuryev A, Mazo I, Maslov S (2005) Binding properties and evolution of Rhizobium leguminosarum. J Mol Biol 384:120–136. homodimers in protein-protein interaction networks. Nucleic Acids Res 33:3629–3635. 45. Benner SA, Cohen MA, Gonnet GH (1993) Empirical and structural models for inser- 16. Wright CF, Teichmann SA, Clarke J, Dobson CM (2005) The importance of sequence tions and deletions in the divergent evolution of proteins. J Mol Biol 229:1065–1082. diversity in the aggregation and evolution of proteins. Nature 438:878–881. 46. Jiang H, Blouin C (2007) Insertions and the emergence of novel protein structure: a 17. Lukatsky DB, Shakhnovich BE, Mintseris J, Shakhnovich EI (2007) Structural similarity structure-based phylogenetic study of insertions. BMC Bioinformatics 8:444. enhances interaction propensity of proteins. J Mol Biol 365:1596–1606. 47. Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR (2007) Long-term trends in 18. Andre I, Strauss CE, Kaplan DB, Bradley P, Baker D (2008) Emergence of symmetry in evolution of indels in protein sequences. BMC Evol Biol 7:19. homooligomeric biological assemblies. Proc Natl Acad Sci USA 105:16148–16152. 19. Bennett MJ, Schlunegger MP, Eisenberg D (1995) 3D domain swapping: a mechanism 48. Dams T, et al. (2000) The crystal structure of dihydrofolate reductase from Thermotoga – for oligomer assembly. Protein Sci 4:2455–2468. maritima: molecular features of thermostability. J Mol Biol 297:659 672. 20. D’Alessio G (1995) Oligomer evolution in action? Nat Struct Biol 2:11–13. 49. Garau G, Lemaire D, Vernet T, Dideberg O, Di Guilmi AM (2005) Crystal structure of 21. Xu D, Tsai CJ, Nussinov R (1998) Mechanism and evolution of protein dimerization. phosphorylcholine esterase domain of the virulence factor choline-binding protein e Protein Sci 7:533–544. from streptococcus pneumoniae: new structural features among the metallo-beta-lac- – 22. Tiana G, Broglia RA (2002) Design and folding of dimeric proteins. Proteins 49:82–94. tamase superfamily. J Biol Chem 280:28591 28600. 23. Pereira-Leal JB, Teichmann SA (2005) Novel specificities emerge by stepwise duplica- 50. Estebanez-Perpina E, et al. (2000) Crystal structure of the caspase activator human tion of functional modules. Genome Res 15:552–559. B, a proteinase highly specific for an Asp-P1 residue. Biol Chem – 24. Hashimoto K, Madej T, Bryant SH, Panchenko AR (2010) Functional states of homoo- 381:1203 1214. ligomers: insights from the evolution of glycosyltransferases. J Mol Biol 399:196–206. 51. Rousseau F, Serrano L, Schymkowitz JW (2006) How evolutionary pressure against 25. Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA (2007) Evolution of protein complexes protein aggregation shaped chaperone specificity. J Mol Biol 355:1037–1047. by duplication of homomeric interactions. Genome Biol 8:R51. 52. Fong JH, et al. (2009) Intrinsic disorder in protein interactions: insights from a 26. Levy ED, Boeri Erba E, Robinson CV, Teichmann SA (2008) Assembly reflects evolution comprehensive structural analysis. PLoS Comput Biol 5:e1000316. of protein complexes. Nature 453:1262–1265. 53. Nishi H, Ota M (2010) Amino acid substitutions at protein-protein interfaces that 27. Katsamba P, et al. (2009) Linking molecular affinity and cellular specificity in cadherin- modulate the oligomeric state. Proteins 78:1563–1574. mediated adhesion. Proc Natl Acad Sci USA 106:11594–11599. 54. Grueninger D, et al. (2008) Designed protein-protein association. Science 319:206–209. 28. Naveed H, Jackups R, Jr, Liang J (2009) Predicting weakly stable regions, oligomeriza- 55. Wang Y, et al. (2007) MMDB: annotating protein sequences with Entrez’s 3D-structure tion state, and protein-protein interfaces in transmembrane domains of outer mem- database. Nucleic Acids Res 35:D298–D300. brane proteins. Proc Natl Acad Sci USA 106:12735–12740. 56. Marchler-Bauer A, et al. (2009) CDD: specific functional annotation with the Conserved 29. Joerger AC, et al. (2009) Structural evolution of p53, p63, and p73: implication for Domain Database. Nucleic Acids Res 37:D205–210. heterotetramer formation. Proc Natl Acad Sci USA 106:17705–17710. 57. Gibrat JF, Madej T, Bryant SH (1996) Surprising similarities in structure comparison. 30. Akiva E, Itzhaki Z, Margalit H (2008) Built-in loops allow versatility in domain- Curr Opin Struct Biol 6:377–385. domain interactions: lessons from self-interacting domains. Proc Natl Acad Sci USA 58. Kolodny R, Koehl P, Levitt M (2005) Comprehensive evaluation of protein structure 105:13292–13297. alignment methods: scoring by geometric measures. J Mol Biol 346:1173–1188. BIOPHYSICS AND COMPUTATIONAL BIOLOGY

Hashimoto and Panchenko PNAS ∣ November 23, 2010 ∣ vol. 107 ∣ no. 47 ∣ 20357 Downloaded by guest on September 28, 2021