Downloaded by guest on September 29, 2021 u rti neato aae otisattlo 8,762,166 of life. total of a tree contains the dataset across evolu- interaction the protein the study Our of and their dynamics experiments of tionary mapping principles dur- key scale change elucidate networks and these 21) how (20, structure. organisms, understand evolution living to ing all critical have organisms underlie is of experiments number it interactomes high-throughput organ- a Because com- in of that (15–19). interactomes the number in high-quality namely interactions large reported interactome, protein–protein A its of ism. proteins organism, network between particular plex interactions all a map in to efforts experimental biological or their perform create to and interactions functions. the interactions of per- ability to protein–protein the lead particular, destroy of could govern In sequences rewiring that acid proteins. amino vasive principles and between DNA basic interactions of evolution about of known evolution sequence DNA the is understanding little in genomes advances evolution, affects environment these life the Despite how ancestral (14). understand of and properties 13), determine (12, (11), functions their with between 10). (9, interactions elusive shapes remain mecha- organism evolution genomes, an in how of proteins into evolution insights demonstrated nistic sequence have on based alone studies at While data (6–8). evolution level sequence demonstrated and DNA have history the long studies a has phylogenetic diversity life’s extensive of study envi- The the and (2–5). other ronment many each of with interplay interact an that of components result molecular the are Most (1). ments T resilience network networks interaction protein–protein and evolution both of context environment. the in understanding structure for net- network implications molecular the have of findings change Our gradual topology. that work through and arises time interactome over resilient interactions competi- a of proteins and rewiring level coordinated family variable, protein a the the complex, exhibit at of more that find ability a We greater environment. in tive the survive resilient with more to associated a organism turn that find in we is bacteria, interactome In time. over net- failures to robust work more network become during to interactomes resilient that more interactomes meaning become evolution, of interactomes interactions. that resilience finds protein–protein of the and tree failures 8,762,166 on the focuses of across study total unknown. species Our a 1,840 largely involving of are interactomes life evolution study we during Here change and unpre- of environmental dictability to principles respond basic networks the interactome other, such each how with interact that mul- Bergman) proteins involve Aviv tiple phenotypes and most Airoldi Edoardo Although common by phenomenon. a reviewed biological is 2018; fluctuations 19, environmental October to review robustness for (sent 2018 18, December Feldman, W. Marcus by Contributed 94158 CA Francisco, San Biohub, Zuckerberg a aik Zitnik Marinka life of across tree interactomes the protein in resilience of Evolution www.pnas.org/cgi/doi/10.1073/pnas.1818013116 eateto optrSine tnodUiest,Safr,C 94305; CA Stanford, University, Stanford Science, Computer of Department ee euepoenitrcin esrdb hs large- these by measured interactions protein use we Here, spurred has interactions protein–protein of importance The genes associate to used been has information sequence DNA raim oaatterpeoye ocagn environ- changing to phenotypes of their ability fundamental adapt a to shows life organisms of diversity enormous he a | o Sosi Rok , ewr rewiring network c ˇ a acsW Feldman W. Marcus , | oeua evolution molecular | b,1 ecology n ueLeskovec Jure and , | b eateto ilg,Safr nvriy tnod A935 and 94305; CA Stanford, University, Stanford Biology, of Department section Appendix, (SI (12) pro- information species S2 sequence of ribosomal subunit history gene small from constructed evolutionary RNA life integrate of the tree We the (22) by species. vided dataset that the in into measured kinase– and interactions protein– interactions, substrate pathway regulatory direct exper- metabolic interactions, indicate interactions, including DNA edges interactions, protein–protein and physical biophysical proteins documented species’ repre- which imentally and in a network, species interactome indicate separate by nodes a interactions with Appendix species these infor- each (SI group sent interaction scale We S4). cross-species protein 1,840 Table a current from at all proteins mation 1,450,633 encompassing between species, interactions physical 1 (SI change become data size interaction protein conclu- network our more that when available. indicate or hold and still S8), will quality sen- Fig. sions not and data S8 are section that network Appendix, groups, to taxonomic (SI S7 across sitive Fig. species consistent and model are that S1 and section proteins and which incomplete Appendix, much-studied in are toward interactomes environments the biased natural Although with live. organisms relationships resilience this uncover use net- the and the evolution to between to and of relationship interactome function an resilience the of the resilience identify its determining We particular factor (23–26). critical interactome in a interactome, failures, work each of tion 1073/pnas.1818013116/-/DCSupplemental. y at online information supporting contains article This BY-NC-ND) (CC 4.0 NoDerivatives License distributed under is article access open This interest.y of conflict . no Of declare College authors Einstein The Albert A.B., and and M.W.F., University; R.S., Harvard M.Z., E.A., Reviewers: and per- data; J.L. analyzed and paper. y J.L. the R.S., and wrote M.Z., M.W.F., J.L. R.S., research; M.Z., designed research; J.L. formed and M.W.F., M.Z., contributions: Author owo orsodnemyb drse.Eal flmnsafr.d rjure@ or [email protected] Email: addressed. cs.stanford.edu. be may correspondence whom To hnetruheouinadhwteecagsafc their unpredictability. affect environmental changes to interactomes these response how how and and evolution reveal variable, through beneficial findings change complex, a Our in has habitats. survival interactome organism’s competitive resilient the highly on a impact that failures. find network inter-We against that robustness interactomes, hypothesis favoring life, study- evolve resilient longstanding actomes of more a By to tree for disease. leads evidence the providing evolution across or that species death find of 1,840 we breakdown cell from the to interactomes as ing lead interactome may the is underlies of failures interactions that property network critical to machinery a resilience molecular The of complexity. organismal structure the cap- interactions tures protein–protein of network interactome The Significance .Uigntoksine esuytentokorganiza- network the study we science, network Using ). a,c,1 y . y raieCmosAttribution-NonCommercial- Commons Creative ,oraaye ieresults give analyses our ), www.pnas.org/lookup/suppl/doi:10. NSLts Articles Latest PNAS eto 1and S1 section , c | Chan f8 of 1 y

EVOLUTION Results indicating that the interactome is not resilient to network Modeling Resilience of the Interactome. Natural selection has failures. influenced many features of living organisms, both at the level To fully characterize the interactome resilience of a species we of individual genes (27) and at the level of whole organisms (13). measure fragmentation of the species’ interactome across all pos- To determine how natural selection influences the structure of sible network failure rates (SI Appendix, Fig. S3). Consider the interactomes, we study the resilience of interactomes to network interactome of the pathogenic bacterium Haemophilus influenzae failures (23, 25, 26). Resilience is a critical property of an inter- and the interactome of humans, which have different resilience actome as the breakdown of proteins can fundamentally affect (Fig. 1C). In the H. influenzae interactome, on removing small the exchange of any biological information between proteins in a fractions of all nodes many network components of varying sizes cell (Fig. 1A). Network failure could occur through the removal appear, producing a quickly increasing Shannon diversity. In of a protein (e.g., by a nonsense mutation) or the disruption of a contrast, the human interactome fragments into a few small com- protein–protein interaction (e.g., by environmental factors, such ponents and one large component whose size slowly decreases as as availability of resources). The removal of even a small number small components break off, resulting in Shannon diversity that of proteins can completely fragment the interactome and lead increases linearly with the network failure rate (Fig. 1C). Thus, to cell death and disease (4, 5) (SI Appendix, section S5.1 and unlike the fragmentation of the H. influenzae interactome, the Table S3). Disruptions of interactions can thus affect the inter- human interactome stays together as a large component for very actome to the extent that its connectivity can be completely lost high network failure rates, providing evidence for the topological and the interactome loses its biological function and increases stability of the interactome. In general, the calculation of mod- the risk of disease (5). ified Shannon diversity over all possible network failure rates f We formally characterize the resilience of an interactome of a yields a monotonically increasing function that reaches its min- species by measuring how fragmented the interactome becomes imum value of 0 at f = 0 (i.e., a connected interactome) and its when all interactions involving a fraction f of the proteins maximum value of 1 at f = 1 (i.e., a completely fragmented inter- (nodes) are randomly removed from the interactome (Fig. 1A). actome) as the interactome becomes increasingly fragmented The resulting isolated network components then determine the with increasing network failure rate f (SI Appendix, section S5.3 interactome fragmentation. A network component is a con- and Fig. S3). We therefore define resilience of interactome G as nected subnetwork of the interactome in which any two nodes one minus the area under the curve defined by that function, can reach each other by a path of edges. The smaller the network Z 1 component is, the fewer nodes can be reached from any given Resilience(G) = 1 − Hmsh(Gf ) df , [2] node in the component. To characterize how the interactome 0 fragments into isolated components we use the Shannon diver- sity index (28–31), which we modify to ensure that the resilience which takes values between 0 and 1; a higher value indicates a of interactomes with different numbers of proteins can be com- more resilient interactome. pared (Fig. 1B and SI Appendix, section S5.2). In particular, when the interactome G is subjected to a network failure rate f , it Resilience of Interactomes Throughout Evolution. We characterize is fragmented into a number of isolated components of vary- systematically the resilience of interactomes for all species in ing sizes (SI Appendix, Fig. S1). We quantify connectivity of the the dataset (Fig. 1D and SI Appendix, Table S5). We find that species display varying degrees of interactome resilience to net- resulting fragmented interactome Gf by calculating the modified Shannon diversity on the resulting set of isolated components. work failures (Fig. 1E). At a global, cross-species scale, we find that a greater amount of genetic change is associated with a Let {C1, C2, ... , Ck } be k isolated components in Gf . The modi- more resilient interactome structure [locally weighted scatterplot fied Shannon diversity of Gf is then calculated as the entropy of smoothing (LOWESS) fit; R2 = 0.36; Fig. 1F], and this associ- {C1, C2, ... , Ck } as ation remains strong even after statistical adjustments for the k influence of many other variables (SI Appendix, section S8). The 1 X H (G ) = − p log p , [1] more genetic change a species has undergone, the more resilient msh f log N i i i=1 is its interactome. The evolution of a species, which is repre- sented by the total branch length from the root to the leaf taxon where N is the number of proteins in the interactome and representing that species in the tree of life (12), thus predicts pi = |Ci |/N is the proportion of proteins in the interactome G resilience of the species’ interactome, providing empirical evi- that are in component Ci . We can interpret pi as the proba- dence that interactome resilience is an evolvable property of bility of seeing a protein from component Ci when sampling organisms (26). This finding also suggests that the structure of one protein from the fragmented interactome Gf . That is, present-day interactomes reflects their history or that interac- Eq. 1 quantifies the uncertainty in predicting the component tomes must have a certain structure because that structure is well identity of a protein that is taken at random from the interac- suited to the network’s biological function. From an evolutionary tome. Finally, we use the normalization factor 1/ log N because standpoint, this finding points in the direction of topologically it corrects for differences in the numbers of proteins in the stable interactomes, which suggests that evolutionary forces may interactomes and ensures that interactomes of different species shape protein interaction networks in such a way that their large- can be compared. The range of possible Hmsh values is between scale connectivity, i.e., the network’s biological function, remains 0 and 1, where these limits correspond, respectively, to a con- largely unaffected by small network failures as long the failures nected interactome in which any two proteins are connected are random. by a path of edges and a completely fragmented interactome We also find that species from the same taxonomic domain in which every protein is its own isolated component. If the have more similar interactome resilience than species from fragmented interactome has one large component and only different domains (P = 6 · 10−11 for bacteria against eukary- a few small broken-off components, then the modified Shan- otes; see SI Appendix, Fig. S10 for comparisons between other non diversity is low, providing evidence that the interactome taxonomic groups). Furthermore, the degree of interactome has network structure that is resilient to network failures (23) resilience is significantly higher than expected by chance alone (SI Appendix, Fig. S2). In contrast, if the interactome breaks (SI Appendix, Fig. S9); that is, in a similar random net- into many small components, it becomes fragmented, and its work of identical size and degree distribution (P = 5 · 10−12), modified Shannon diversity is high (SI Appendix, Fig. S2), indicating that naturally occurring interactomes have higher

2 of 8 | www.pnas.org/cgi/doi/10.1073/pnas.1818013116 Zitnik et al. Downloaded by guest on September 29, 2021 Downloaded by guest on September 29, 2021 h neatm fa raimcnit falpyia neatosbtenpoen nteogns.We neatosinvolv- interactions When organism. diversity the in Shannon proteins Modified between (B) interactions physical components. all (f network of fraction consists certain organism a an of ing interactome The (A) distances. evolutionary 1. Fig. inke al. et Zitnik species’ the of resilience predicts species a of evolution species, all the the Across the on on S7). right to (far Fig. root protein–protein site Appendix, the Current per (SI fit; from species. substitutions (F PubMed (LOWESS every site) S1). NCBI failures of per section the network resilience Appendix, substitutions in interactome to (SI (nucleotide publications the interactome biases 1,000 length shows investigative gone least bars branch and have at of selection total circle with species notable the outside species to ancestral (E as The prone As us. represented S2). be species. to is section might Appendix, bacterial data available species (SI interaction a are a tree and species of the eukaryotic present-day Evolution in a the of (12). leaf corresponding of in eukarya interactomes connectivity 190 interactome only the and the and of archaea, lost, of loss 111 been neighborhood complete have small a interactomes A indicates older (D) 0 extinct, organisms. value selected resilience three and interactome, the resilient among ). S3 most Fig. the Appendix, indicates (SI 1 interactome value Resilience rate S5). section failure Appendix, network given a at

D

Interactome EF AB resilience Tree scale 0.3 Interactome x rti neato aao ,4 pce ossigo ,6,6 neatosb ,5,3 rtisrva h eiineo neatmsars vast across interactomes of resilience the reveal proteins 1,450,633 by interactions 8,762,166 of consisting species 1,840 of data interaction Protein axis).

ARCHAEA organism EUKARYOTA Eukaryotic = 5i hseape ftepoen r eoe rmteitrcoe h neatm rget noanme fisolated of number a into fragments interactome the interactome, the from removed are proteins the of example) this in 5/45 f (C . oosapiens Homo h eiineo h neatm nertsmdfidSanndiversity Shannon modified integrates interactome the of resilience The ) x xs aeo vrg 04 oersletitrcoeta h he pce ihtelatsbtttos(a left (far substitutions least the with species three the than interactome resilient more 20.4% a average on have axis) R

2 BACTERIA = a h ot(es)rsletitrcoe(hi eiinei .6 n .6,respectively) 0.267, and 0.461 is resilience (their interactome resilient (least) most the has influenzae) (H. 6;mr eei hneipisamr eiin neatm.Treseiswt h otnucleotide most the with species Three interactome. resilient more a implies change genetic more 0.36); H organism Bacterial msh esrshwteitrcoefamnsit sltdcomponents isolated into fragments interactome the how measures S5) section Appendix, (SI interaction A failed of 6proteins component Isolated of 4proteins component Isolated Modified Shannon C

Interactome resilience diversity, Hmsh 0.20 0.25 0.30 0.35 0.40 0.45 0.50 coelicolor Streptomyces Evolution (nucleotidesubstitutionspersite) melanogaster .026 .034 .042 .05.00 4.60 4.20 3.80 3.40 3.00 2.60 2.20 R p Fraction of proteins <9 2 =0.36

hspo hw h neatm eiinefr171 for resilience interactome the shows plot This ) in the component resilience Low interactome 0.05 0.15 0.25 . 10 hlgntcte hwn ,3 bacteria, 1,539 showing tree Phylogenetic ) Network failurerate, H H Modified Shannondiversity catarrhalis Moraxella -10 msh msh anguillarum Vibrio Isolated components =0.520 cosalpsil alr rates failure possible all across parvum Cryptosporidium NSLts Articles Latest PNAS resilience High interactome mellifera Apis f sapiens Homo rerio Danio | f8 of 3 f (SI

EVOLUTION resilience than their random counterparts. These findings are Relationship Between Interactome Resilience and Ecology. We next independent of genomic attributes of the species, such as genome ask whether there is a relationship between species’ interac- size and the number of protein-coding genes, and are not direct tome resilience and aspects of species’ ecology (SI Appendix, effects of network size, the number of interactions in each section S4). We examine the relationship between interac- species, broad-tailed degree distributions (23), or the presence tome resilience and the fraction of regulatory genes and find of hubs in the interactome networks (SI Appendix, Fig. S8 and that bacteria with more resilient interactomes have significantly Table S1). Furthermore, these findings are consistent across a more regulatory genes in their genomes (R2 = 0.32; Fig. 2A). variety of assays that are used to measure the interactome (SI Bacteria with highly resilient interactomes can also survive in Appendix, Table S2). more diverse and competitive environments, as revealed by

AB C R2 = 0.32 R2 = 0.21 R2 = 0.09 p < 7 . 10-4 p < 2 . 10-8 p < 4 . 10-2

Ralstonia Propionibacterium solanacearum acnes Streptococcus pneumoniae Thermotogate maritima Listeria Buchnera innocua aphidicola

Interactome resilience Interactome resilience Interactome resilience D Ecological habitat Oxygen requirement p = 0.01 p = 8 . 10-4

p = 0.03 p = 3 . 10-3 *ASH

*ASH *TMH *TM

*TMA Interactome resilience

Complexity of bacterial lifestyle Oxygen dependence

*T = Significant difference from Terrestrial (p < 0.05)

Fig. 2. Bacteria with more resilient interactomes survive in more complex, variable, and competitive environments. We use ecological information for 287 bacterial species (32) to examine the relationship between species’ interactome resilience and their ecology (SI Appendix, section S4). (A) Interactome resilience positively correlates with the fraction of regulatory genes in bacteria, an established indicator of environmental variability of species’ habitats (32) (R2 = 0.32). (B and C) For environmental viability of a species, we use a cohabitation index that records how many organisms populate each environment in which the species is viable (i.e., the level of competition in each viable environment) and an environmental scope index that records a fraction of the environments in which the species is viable (i.e., species’ environmental diversity) (32). The resilience of the interactome positively correlates with the level of cohabitation encountered by bacteria (R2 = 0.21), and bacteria with resilient interactomes tend to thrive in highly diverse environments (R2 = 0.09). (D) Terrestrial bacteria have the most resilient interactomes (P = 7 · 10−3), and host-associated bacteria have the least resilient interactomes (P = 4 · 10−6). In bacteria, interactome resilience is indicative of oxygen dependence. Aerobic bacteria have the most resilient interactomes (P = 8 · 10−4), followed by facultative and the anaerobic bacteria. Error bars indicate 95% bootstrap confidence interval; P values denote the significance of the difference of the means according to a Mann–Whitney U test.

4 of 8 | www.pnas.org/cgi/doi/10.1073/pnas.1818013116 Zitnik et al. Downloaded by guest on September 29, 2021 Downloaded by guest on September 29, 2021 rti aiis oo ad niae9%cndnebn o h OESfi;ga ie hwrno expectation. random al. show et lines Zitnik gray fit; LOWESS the for band confidence 95% indicate bands color families; in protein Lines mechanism. from evolutionary removed coordinated is a protein Appendix , (P central (SI the evolution the Isolated species in with when ). S5 originating length decrease arise Fig. that branch proteins’ Appendix , neighborhood (SI longest of metrics the the length network in with two branch components lineage calculating connected ( the the by of (gray) by family in number the interactome ordered is degree-adjusted in the 3 proteins protein the organism each by orthologous of whereas given of neighborhood length, are network sequence branch components the a shortest characterize by then the We with family S3). lineage protein section the the of represent tip We the tree. at located is 1 organism C ( S6). section Appendix, (SI neighborhood the and 1 species in (A events in members duplication S4) protein gene Fig. three through this , Appendix arise with In (SI which family neighborhoods (pink). proteins protein network “2” in-paralogs, hypothetical protein also and A evolve, are (green) (B) sequences Shown “1” speciation. protein time. organisms and after over diverge present-day 2 independently species to rewire arising can leads newly interactomes and two their (11) the As model pair. speciation-divergence protein protein the orthologous ancestral to single according a lineages example, two to rise gives that 3. Fig. species Ancestral C A B Tree scale Organism 1

Isolated network 0.1 yohtclpyoeei reilsrtsaseito event speciation a illustrates tree phylogenetic hypothetical A (A) interactomes. protein in changes structural network local mitigates Evolution Organism 3 Organism 1 components Protein Protein family( 0.00 0.10 0.20 0.30 0.40 Orthologous relationship A ‘ event Speciation Organism 2 Organism 2 .028 .032 .036 .040 .04.40 4.20 4.00 3.80 3.60 3.40 3.20 3.00 2.80 2.60 A ‘‘ .Tenihoho iedw-egtdb h eudnyo oa neatosgvsteefciesz of size effective the gives interactions local of redundancy the by down-weighted size neighborhood The S6). section Appendix, SI Evolution (nucleotidesubstitutionspersite) Organism 3 Protein-protein interaction = A’ A ‘‘‘ , 3 A’’ · species Ancestral Organism 2 10 A , Organism 1 A’’’ −8 ) A and neighborhood network Protein htwspeeti h neta pce ie iet proteins to rise gives species ancestral the in present was that P = 3 epciey Spearman’s respectively; 0.03, and C and h ubro sltdntokcmoet n h fetv ieo rti egbrod both neighborhoods protein of size effective the and components network isolated of number The D) Effective size=1.0 Isolated components=1.0 Isolated components e n D ij A hwteLWS to einageae ewr ercvle o 163poen rm2,224 from proteins 81,673 for values metric network median-aggregated of fit LOWESS the show Binary indicatorofinteractionbetween Number ofisolatedcomponentsarisingwhenallinteractionsinvolving Isolated networkcomponent Organism 2 Organism 1 raim1Ogns Organism 3 Organism 2 Organism 1 A A A ‘‘ ‘ Ortholog ‘ pair Ortholog ρ D akcreain,sgetn htlclitrcinnihohosrwr via rewire neighborhoods interaction local that suggesting correlation), rank pair Effective Rewiring ofinteractionsovertime

neighborhood size Paralog A pair =n 0.20 0.30 0.40 0.50 0.60 0.70 0 , Effective size=0.94 Isolated components=0.83 A A 00 /d , Evolution Organism 1 Organism 2 .028 .032 .036 .040 .04.40 4.20 4.00 3.80 3.60 3.40 3.20 3.00 2.80 2.60 A A 000 A A ‘‘ ‘ Ortholog d ,ec rmadfeetogns.I h hlgntctree, phylogenetic the In organism. different a from each ), A A Evolution (nucleotidesubstitutionspersite) pair ‘‘ =|N i and 1 deletion Node the interaction Addition of (A)| A 0 j and Degree ofprotein Ortholog pair A Paralog A 00 =1/d triple Paralog pair pnspeciation; upon Effective size=0.79 Isolated components=0.43 A i Organism 1 Є Σ Organism 2 N 1 (1 -1/d (A) A A NSLts Articles Latest PNAS ‘‘ ‘ Ortholog A A pair ‘‘‘ A of theinteraction Deletion A j A Є Σ 0 N fail and 1 (A) e ij ) A 00 oman form | f8 of 5

EVOLUTION exceptionally strong associations between the resilience and the of mutations in a resilient interactome might not change the net- level of cohabitation and the environmental scope (Fig. 2 B and work’s primary function, they might alter other network features, C). Furthermore, using a categorization of bacteria into five which can drive future adaptations as the environment of the groups based on their natural environments [NCBI classification species changes (33). Changes that are neutral concerning one for bacterial lifestyle (terrestrial, multiple, host cell, aquatic, spe- aspect of the network’s function could lead to innovation in other cialized) ordered by the complexity of each category (32)], we aspects, suggesting that a resilient interactome can harbor a vast find that terrestrial bacteria living in the most complex ecolog- reservoir of neutral mutations. ical habitats have the most resilient interactomes (P = 7 · 10−3; Mann–Whitney U test) and that host-associated bacteria have Structural Changes of Protein Network Neighborhoods. A resilient the least resilient interactomes (P = 4 · 10−6; Fig. 2D). Our anal- interactome may arise through changes in the network structure ysis further reveals that interactome resilience is indicative of of individual proteins over time (Fig. 3A). To investigate such oxygen dependence; the most resilient interactomes are those changes in local protein neighborhoods, we decompose species of aerobic bacteria (P = 8 · 10−4), followed by facultative and interactomes into local protein networks, using a 2-hop subnet- then anaerobic bacteria, which do not require oxygen for growth work centered around each protein in a given species as a local (Fig. 2D). representation of a protein’s direct and nearby interactions in These relationships suggest that molecular mechanisms that the species’ interactome (SI Appendix, section S6.1 and Fig. S4). render a species’ interactome more resilient might also allow it We obtain 81,673 protein network neighborhoods and then use to cope better with environmental challenges. In the network orthologous relationships between proteins to group them into context, high interactome resilience suggests that proteins can 2,224 protein families, with an average of 38 protein neighbor- interact with each other even in the face of high protein fail- hoods originating from 12 species in each family (SI Appendix, ure rate. High interactome resilience indicates that a species has section S3). Each family represents a group of orthologous a robust interactome, in which many mutations represent net- proteins that share a common ancestral protein (Fig. 3B). work failures that are neutral in a given environment, have no By examining protein families, we find that the number of iso- phenotypic effect on the network’s function, and are thus invis- lated network components in protein network neighborhoods ible to natural selection (26). However, neutral mutations may and the effective size of the neighborhoods (Fig. 3B and SI not remain neutral indefinitely, and a once-neutral mutation may Appendix, section S6.2) both decrease with evolution (P = 3 · have phenotypic effects in a changed environment and be impor- 10−8 and P = 0.03, respectively; Fig. 3 C and D), indicating tant for evolutionary innovation (25). Although a large number that protein neighborhoods become more connected during

AB Organism 1 Organism 2 * p < 10-33 * -0.215

Ortholog pair A A‘ * -0.098 Short evolutionary distance ( )

Evolution at Evolution at 0.016 * |{ }| IRR( ) = log , , , , |{ , , , , }| * -0.079 |{ }| |{ }| IRR( ) = log IRR( ) = log , |{ , }| |{ }| |{ }| IRR( ) = log |{ , }| Negative rate of change Positive rate of change Interaction rewiring rate (IRR) Protein Protein-protein interaction Randomized orthologous relationships Orthologous relationship Randomized evolutionary distances

Fig. 4. The rewiring rate of interactions in local protein neighborhoods varies with the topology of network motifs. (A) Interaction rewiring rate (IRR) measures the fold change between the probability of observing a particular network motif in the network neighborhood of protein A0 and the probability of observing the same motif in the neighborhood of an evolutionarily younger orthologous protein A. A positive (negative) rate of change indicates the motif becomes more (less) common over time (SI Appendix, section S7). Shown are the rewiring rates for interactions (i.e., edges; the number of interactors of A0 vs. A), triangle motifs touching the orthologous protein (yellow), square motifs touching the orthologous protein (green), and triangle motifs in the protein network neighborhood (orange). (B) Square motifs become more common in protein neighborhoods during evolution (P < 10−33), which is supported by a range of biological evidence (18, 37, 38). However, triangle motifs become less common over time (P < 10−33 for both types of triangle motifs). Gray bars indicate random expectation (SI Appendix, section S7), either for random orthologous relationships (dark gray) or for random evolutionary distances (light gray); error bars indicate 95% bootstrap confidence interval; and P values denote the significance of the difference of IRR distributions using a two-sample Kolmogorov–Smirnov test.

6 of 8 | www.pnas.org/cgi/doi/10.1073/pnas.1818013116 Zitnik et al. Downloaded by guest on September 29, 2021 Downloaded by guest on September 29, 2021 inke al. et Zitnik coe sn Rsfo i.4B Fig. from IRRs using the motif-based of actome, the our size extrapolating the by of estimate interactome human we power whole changes, of network share predictive motifs structural the of to square model new test likely produce To are thereby interactions. and proteins partners partners conclusion; interaction more same even shared the follow- multiple reach arguments (18) with Evolutionary duplication proteins). gene in ing sites binding (see the interactions of S6 motif Fig. square the a in manifests as which they other, interactome However, each with neighbors. directly their interact motifs: not of might square many share of with interfaces proteins similar number two hence supports interfaces; interactions the complementary protein–protein 38) require in (38), often 37, perspective change (18, structural of a evidence From rate biological of positive 0 = range (IRR this com- A evolution more during 4B). become neighborhoods Fig. interactions protein of in motifs mon square that find evolutionary also of protein’s studies the earlier confirms and age interactions a of between correlation number negative protein’s significant interactions This 4B). protein–protein = Fig. (IRR fewer species −0.215; younger 0.861 evolutionarily in of proteins with factor compared par- a average in on For species ticipate interactomes. older evolutionarily of in is evolution proteins interactions the example, of for rewiring mechanism that important suggesting an 4B), Fig. motifs; work ml ubro raim ihhg-oeaeprotein–protein high-coverage a with on organisms focused protein– of have most number networks of date, small biological To consisting of was species. analyses interactomes, 1,840 evolutionary perspective from of networks evolutionary dataset interaction protein a an by environmen- from enabled protein–protein to networks of investigation response interaction systematic networks organismal This these complexity. in and tal changes phenotypes how networks and affect interaction evolution protein–protein through how change reveal analyses Our Discussion (19). interactome human the from of complexity range the which establishing interactome, human of the to estimates of previous size three with be the number agreement to the good predict surprisingly humans in we in gene, interactions per of isoform splice one Assuming (SI pair ( protein evolution during each rewire of motifs species network younger S7.1 the interactomes section and the Appendix, older between the occurrences each IRR motif of times the the of derive and comparing number neighborhood by the protein each net- calculate in selected We appears for motif 4A). use (IRRs) (Fig. then rates motifs we rewiring work which interaction pairs, protein calculate 2,485,564 to in sec- Appendix , resulting (SI S3), species tion close orthologous evolutionarily identify from first pairs We protein 35). (34, we motifs interactomes, network the investigate in changes structural of mechanisms tionary Interactions. Protein–Protein of Rewiring Network undergone has change. that genetic species neighborhood more the the in interconnected and more different For becomes neigh- increasingly 3B): network become local (Fig. proteins’ borhoods the distance evolution increases, evolutionary species the of the as between species, model two in sug- network proteins neighborhoods orthologous molecular the a in changes gest structural These evolution. .Csel J afil F(08 ao e irba rusepn iest n alter and diversity Barab expand groups 2. microbial new Major (2018) JF Banfield CJ, Castelle 1. organization. life. of tree the of understanding our 370,000 s L lviZ 20)Ntokbooy nesadn h elsfunctional cell’s the Understanding biology: Network (2004) ZN Oltvai AL, asi ´ o nilsrto fitrcinitrae recognizing interfaces interaction of illustration an for neatos(5 7 9 n aepoe rca in crucial proved have and 39) 17, (15, interactions a e Genet Rev Nat .W n togsaitcleiec that evidence statistical strong find We ). 5:101–113. Cell acaoye cerevisiae Saccharomyces 172:1181–1197. hspeito is prediction This ∼160,000. eto S7.3 section Appendix, (SI P < 10 .cerevisiae S. osuyevolu- study To −33 IAppendix, SI o l net- all for 3) We (36). 150,000 inter- .016; ). n ua eoisadagatfo h onTmltnFoundation. Templeton John the Evolutionary, from Computational, grant for a and Center was Genomics M.W.F. Stanford Boeing, Human Biohub. the and Agency, by Zuckerberg Projects part Chan Research in and supported Advanced Initiative, supported were Science Defense J.L. Data NIH, and NSF, Stanford R.S., by M.Z., Pier- part Sharma. Emma Pratyaksh in Ting, Y. and Alice An, Fraser, B. Michael Hunter son, Goodman, Aaron thank we script, the with ACKNOWLEDGMENTS. shared are interactomes, processed from available the and including community paper, at this available in publicly is ology Availability. Data and Code analy- in additional provided and are methodology, ses statistical data, of description Detailed Methods and Materials studied poorly in proteins charac- orthologous organisms. experimentally their to extrap- from the proteins information facilitate terized and functional future the of exist, in olation structures dur- change may network change they observed how interactomes com- and currently how more why of a evolution, mechanisms ing in the also survive can with clarify to findings Our help associated a organism environment. competitive is has the or variable, turn, of change plex, ability in genetic greater which, more the interactome, findings undergone that resilient proposition The has more biological that failures. the organism protein for an to evidence resilience predicts interactome quantitative species offer species’ a network of the structural evolution of that of show predictor and important high- change an findings as Our evolution networks. light molecular fundamental of reveal principles interactomes structural that demonstrating by studies (40). properties dynamic interactome’s the on as connectivity well of interactome as stability large-scale topological in on how depends reveal both interactome the might dynamically (45–47) space change and time interactions on protein–protein more information yield how Additionally, might predictions. which give evolutionary resiliency, to of detailed adapted definition be complex could more resilience a interactome of available, measure becomes (44), our as data (43), protein-expression interactome dynamic the as information in well proteins detailed individual more dis- of As functions necessarily about (40–42). without could network function that the biological interactome connecting network’s the of the modifications alter the possible of other there measure fragmentation, are global Beyond a stability. iso- represents topological into interactome’s thus interactome and the components of lated fragmentation measures key resilience our datasets. that the in confidence biases possible providing to attributed thus many be Appendix S1), cannot by findings Table (SI explained and confounders S8, not Fig. network are and and (SI lineages genomic S10) phylogenetic data Fig. different interaction and Appendix, protein S2) Table of our Appendix, subsets con- (SI are different of results both our generalizability across However, and sistent evaluated. the and further positives be available, collected, can false are findings become of data genomes number interaction protein high more more a As to negatives. subject interactions currently plat- protein are same documented study. present the all experimentally the of of on Furthermore, limitation important tests proteins an scarce, unbiased of remain form by combinations mapped pairwise interactomes possible because is This neato aa uhas such data, interaction .HtlnE,e l 21)Acietr ftehmnitrcoedfie protein defines interactome human the of Architecture disease complex (2017) in al. concepts et biology EL, Network Huttlin (2016) S 4. Brunak CE, Thomas JX, Hu 3. u td rsnsa diinlprdg o evolutionary for paradigm additional an presents study Our The study. our of aspect important an is resilience Interactome omnte n ies networks. disease and communities comorbidities. a e Genet Rev Nat IAppendix SI o epu icsinadcmet ntemanu- the on comments and discussion helpful For otaeipeetto fsaitclmethod- statistical of implementation Software .cerevisiae, S. 17:615–629. l aaused data All snap.stanford.edu/tree-of-life. snap.stanford.edu/tree-of-life. . Nature 545:505–509. n humans. and musculus, Mus NSLts Articles Latest PNAS eto S8, section , | f8 of 7

EVOLUTION 5. Chen S, et al. (2018) An interactome perturbation framework prioritizes damaging 28. Sheldon AL (1969) Equitability indices: Dependence on the species count. Ecology missense mutations for developmental disorders. Nat Genet 50:1032–1040. 50:466–467. 6. Britten RJ (1986) Rates of DNA sequence evolution differ between taxonomic groups. 29. MaGuarran A (1988) Ecological Diversity and Its Measurement (Princeton Univ Press, Science 231:1393–1398. Princeton). 7. Yang Z, Rannala B (2012) Molecular : Principles and practice. Nat Rev 30. Goodwin SB, Spielman L, Matuszak J, Bergeron S, Fry W (1992) Clonal diversity Genet 13:303–314. and genetic differentiation of Phytophthora infestans populations in northern and 8. Marsit S, et al. (2017) Evolutionary biology through the lens of budding central Mexico. Phytopathology 82:955–961. comparative genomics. Nat Rev Genet 18:581–598. 31. Baczkowski A, Joanes D, Shamia G (1997) Properties of a generalized diversity index. 9. Studer RA, et al. (2016) Evolution of protein phosphorylation across 18 fungal species. J Theor Biol 188:207–213. Science 354:229–232. 32. Freilich S, et al. (2009) Metabolic-network-driven analysis of bacterial ecological 10. Sorrells TR, Booth LN, Tuch BB, Johnson AD (2015) Intersecting transcription networks strategies. Genome Biol 10:R61. constrain gene regulatory evolution. Nature 523:361–365. 33. Barve A, Wagner A (2013) A latent capacity for evolutionary innovation through 11. Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. exaptation in metabolic systems. Nature 500:203–206. Science 278:631–637. 34. Benson AR, Gleich DF, Leskovec J (2016) Higher-order organization of complex 12. Hug LA, et al. (2016) A new view of the tree of life. Nat Microbiol 1:16048. networks. Science 353:163–166. 13. Parks DH, et al. (2017) Recovery of nearly 8,000 metagenome-assembled genomes 35. Yin H, Benson AR, Leskovec J (2018) Higher-order clustering in networks. Phys Rev E substantially expands the tree of life. Nat Microbiol 2:1533–1542. 97:052306. 14. Mukherjee S, et al. (2017) 1,003 reference genomes of bacterial and archaeal isolates 36. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate expand coverage of the tree of life. Nat Biotechnol 35:676–683. in the protein interaction network. Science 296:750–752. 15. Rual JF, et al. (2005) Towards a -scale map of the human protein–protein 37. Ispolatov I, Yuryev A, Mazo I, Maslov S (2005) Binding properties and evolu- interaction network. Nature 437:1173–1178. tion of homodimers in protein–protein interaction networks. Nucleic Acids Res 16. Roguev A, et al. (2008) Conservation and rewiring of functional modules revealed by 33:3629–3635. an epistasis map in fission yeast. Science 322:405–410. 38. Keskin O, Tuncbag N, Gursoy A (2016) Predicting protein–protein interactions from 17. Venkatesan K, et al. (2009) An empirical framework for binary interactome mapping. the molecular to the proteome level. Chem Rev 116:4884–4909. Nat Methods 6:83–90. 39. Hart GT, Ramani AK, Marcotte EM (2006) How complete are current yeast and human 18. Arabidopsis Interactome Mapping Consortium, et al. (2011) Evidence for network protein-interaction networks? Genome Biol 7:120. evolution in an Arabidopsis interactome map. Science 333:601–607. 40. Siegal ML, Promislow DE, Bergman A (2007) Functional and evolutionary inference in 19. Rolland T, et al. (2014) A proteome-scale map of the human interactome network. gene networks: Does topology matter? Genetica 129:83–103. Cell 159:1212–1226. 41. Leclerc RD (2008) Survival of the sparsest: Robust gene networks are parsimonious. 20. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: Densification laws, Mol Syst Biol 4:213. shrinking diameters and possible explanations. Proceedings of the Eleventh ACM 42. Masel J, Siegal ML (2009) Robustness: Mechanisms and consequences. Trends Genet SIGKDD International Conference on Knowledge Discovery in Data Mining, ed 25:395–403. Grossman RL (ACM, New York), pp 177–187. 43. Agrawal M, Zitnik M, Leskovec J (2018) Large-scale analysis of disease pathways in the 21. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and human interactome. Pacific Symposium on Biocomputing, eds Altman RB, Dunker AK, shrinking diameters. ACM Trans Knowl Discovery Data 1:1–41. Hunter L, Ritchie MD, Klein TE (World Scientific, Singapore), Vol 23, pp 111–122. 22. Zitnik M, et al. (2019) Machine learning for integrating data in biology and medicine: 44. Deng Q, Ramskold¨ D, Reinius B, Sandberg R (2014) Single-cell RNA-seq reveals Principles, practice, and opportunities. Inf Fusion 50:71–91. dynamic, random monoallelic in mammalian cells. Science 343:193– 23. Albert R, Jeong H, Barabasi´ AL (2000) Error and attack tolerance of complex networks. 196. Nature 406:378–382. 45. Han JDJ, et al. (2004) Evidence for dynamically organized modularity in the yeast 24. Wagner A (2003) Does selection mold molecular networks? Sci Signaling 2003:pe41. protein–protein interaction network. Nature 430:88–93. 25. Wagner A (2005) Robustness, evolvability, and neutrality. FEBS Lett 579:1772–1778. 46. Smith C, Puzio RS, Bergman A (2014) Hierarchical network structure promotes 26. Wagner A (2013) Robustness and Evolvability in Living Systems (Princeton Univ Press, dynamical robustness. arXiv:1412.0709. Preprint, posted December 1, 2014. Princeton). 47. Smith C, Pechuan X, Puzio RS, Biro D, Bergman A (2015) Potential unsatisfiabil- 27. Vo TV, et al. (2016) A proteome-wide fission yeast interactome reveals network ity of cyclic constraints on stochastic biological networks biases selection towards evolution principles from to human. Cell 164:310–323. hierarchical architectures. J R Soc Interface 12:20150179.

8 of 8 | www.pnas.org/cgi/doi/10.1073/pnas.1818013116 Zitnik et al. Downloaded by guest on September 29, 2021