doi:10.1016/j.jmb.2010.09.029 J. Mol. Biol. (2010) 404, 158–171

Contents lists available at www.sciencedirect.com Journal of Molecular Biology journal homepage: http://ees.elsevier.com.jmb

Coevolution Predicts Direct Interactions between mtDNA-Encoded and nDNA-Encoded Subunits of Oxidative Phosphorylation Complex I

Moran Gershoni1†, Angelika Fuchs2†, Naama Shani1, Yearit Fridman1, Marisol Corral-Debrinski3, Amir Aharoni1, Dmitrij Frishman2⁎ and Dan Mishmar1⁎

1Department of Life Sciences and the National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
2Technische Universität München, Wissenschaftszentrum Weihenstephan, Am Forum 1, 85354 Freising, Germany
3Institut de la Vision, Université Pierre et Marie Curie Paris 6, Unité Mixte de Recherche S 592, 17 rue Moreau, Paris F-75012, France

Received 29 October 2009; received in revised form 5 September 2010; accepted 13 September 2010
Available online 22 September 2010
Edited by M. Sternberg

Keywords: coevolution; complex I; correlated analysis; mitochondrial DNA; mitochondria

Despite years of research, the structure of the largest mammalian oxidative phosphorylation (OXPHOS) complex, NADH–ubiquinone oxidoreductase (complex I), and the interactions among its 45 subunits are not fully understood. Since complex I harbors subunits encoded by mitochondrial DNA (mtDNA) and nuclear DNA (nDNA), with the former evolving ∼10 times faster than the latter, tight cytonuclear coevolution is expected and observed. Recently, we identified three nDNA-encoded complex I subunits that underwent accelerated amino acid replacement, suggesting their adjustment to the elevated mtDNA rate of change. Hence, they constitute excellent candidates for binding mtDNA-encoded subunits. Here, we further disentangle the network of physical cytonuclear interactions within complex I by analyzing subunit coevolution. Firstly, relying on the bioinformatic analysis of 10 protein complexes with solved structures, we show that signals of coevolution identified physically interacting subunits with nearly 90% accuracy, thus lending support to our approach. When applying this approach to cytonuclear interactions within complex I, we predict that the 'rate-accelerated' nDNA-encoded subunits of complex I, NDUFC2 and NDUFA1, likely interact with the mtDNA-encoded subunits ND5/ND4 and ND5/ND4/ND1, respectively. Furthermore, we predicted interactions among mtDNA-encoded complex I subunits. Using the yeast two-hybrid system, we experimentally confirmed the predicted interactions of human NDUFC2 with ND4, the interactions of human NDUFA1 with ND1 and ND4, and the lack of interaction of NDUFC2 with ND3 and NDUFA1, thus providing a proof of concept for our approach. Our study shows, for the first time, evidence for direct interactions between nDNA-encoded and mtDNA-encoded subunits of human OXPHOS complex I and paves the way toward deciphering subunit interactions within complexes lacking three-dimensional structures. Our subunit-interactions-predicting method, ComplexCorr, is available at http://webclu.bio.wzw.tum.de/complexcorr.
© 2010 Elsevier Ltd. All rights reserved.

*Corresponding authors.
† M.G. and A.F. contributed equally to this study.
Abbreviations used: OXPHOS, oxidative phosphorylation; complex I, NADH–ubiquinone oxidoreductase; mtDNA, mitochondrial DNA; nDNA, nuclear DNA; McBASC, McLachlan-based substitution correlation; OMES, observed minus expected squared; ELSC, explicit likelihood of subset covariation; ROC, receiver operating characteristic; AUC, area under the curve.

0022-2836/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.

Introduction

Subunit interactions within large protein complexes, such as the nuclear pore and the proteasome, are readily apparent from their crystal structures. However, the three-dimensional structures of many biologically important protein complexes have yet to be resolved, limiting our understanding of their subunit interactions and functionality. One such protein complex is the membrane-bound oxidative phosphorylation (OXPHOS) complex NADH–ubiquinone oxidoreductase (complex I).1 Complex I, the first and largest of the OXPHOS complexes (45 subunits in mammals), is the most common mutational target for mitochondrial dysfunction.2 This protein complex increased almost threefold in size, from the so-called 14 'core' subunits in Escherichia coli to 45 subunits in Homo sapiens, by gradual recruitment of subunits throughout evolution.3 Unlike the case for other OXPHOS complexes, clues for subunit interactions within the L-shaped mammalian complex I exist for the mitochondrial matrix arm but not for the membrane arm of the complex.4 Hence, much of the knowledge on subunit interactions and composition within the membrane arm originates from the investigation of its bacterial ortholog, NDH1.5,6 Furthermore, studies of complex I assembly have revealed subunit composition within subcomplexes. Still, such findings offer only general clues for specific interactions (Vogel et al.7 and references within). Since understanding subunit interactions within complex I is important for deciphering its function, and since disruption of complex I assembly causes diseases,7 alternative approaches are needed.

Four of the five OXPHOS complexes (i.e., complexes I, III, IV, and V) are composed of subunits encoded by the nuclear and mitochondrial genomes [nuclear DNA (nDNA) and mitochondrial DNA (mtDNA), respectively], whose mutation rates differ roughly 10-fold.8 Accordingly, elevated amino acid replacement rates indicating positive selection have been identified in nDNA-encoded subunits that closely interact with fast-evolving mtDNA-encoded subunits within OXPHOS complexes with experimentally determined three-dimensional structures, namely complex III, complex IV, and part of complex V.9–12 In a recent rigorous sequence analysis, we identified three complex I nDNA-encoded subunits that underwent accelerated amino acid replacement during the course of primate evolution and are thus likely candidates to interact with the fast-evolving mtDNA-encoded subunits.13 Since cytonuclear subunit interactions play important roles in disease and evolution,14,15 we sought to decipher such direct interactions within complex I. Here, we combined evolutionary and experimental approaches to analyze the interaction of the fast-evolving nDNA-encoded subunits NDUFC2 and NDUFA1 with the mtDNA-encoded subunits of complex I.

While coevolving amino acid residues were initially used to predict intramolecular contacts,16 several groups have recently attempted to employ this approach to predict residue contacts at the interface of interacting proteins. It has been suggested that the spatial distances between coevolving residue pairs harbored by interacting proteins are significantly smaller than the distances between random residue pairs.17,18 Analysis of correlated mutations assisted in the successful identification of interdomain or interprotein docking configurations.19,20 Moreover, coevolving amino acids were found to be prevalent among interacting residue pairs within and between proteins ('in silico two-hybrid system').21 Taken together, these findings indicate that although the coevolution signal may not be sufficient to discern actual contacting residue pairs, it may be informative for identifying physically interacting protein pairs. Hence, residue coevolution constitutes a promising approach to assessing protein–protein interactions within large complexes such as OXPHOS complex I.

Here, using a data set of 10 multisubunit protein complexes with resolved crystal structures, we detected a clear correlation between the presence of highly coevolving residues and physical interactions between subunits. This enabled us to extract optimal parameters and to define the criteria for predicting candidate interactions among mtDNA-encoded and nDNA-encoded subunits of complex I. We applied these criteria to all of the mtDNA-encoded subunits of complex I and to two candidate nDNA-encoded subunits (NDUFC2 and NDUFA1) thought to interact with them.13 This approach predicted interactions between NDUFC2 and two mtDNA-encoded complex I subunits (ND4 and ND5), interactions between NDUFA1 and three mtDNA-encoded subunits (ND1, ND4, and ND5), and other interactions involving mtDNA-encoded subunits. Utilizing the yeast two-hybrid system, we experimentally confirmed the interactions of human NDUFC2 with ND4, the interactions of NDUFA1 with ND1 and ND4, and the lack of interaction of NDUFC2 with ND3 and NDUFA1, thus providing a proof of concept for our approach. Furthermore, we identified specific amino acid residues in NDUFC2 and NDUFA1 that have undergone positive selection. Our analysis thus sheds light on cytonuclear subunit interactions within the enigmatic mammalian complex I. More generally, we present an experimentally supported predictive method for defining protein–protein interactions within large protein complexes with as yet unsolved structures.

Results and Discussion

Coevolving residues in subunit pairs predict physical interactions in complexes with resolved structures

To assess the usefulness of the coevolution signal as a tool for predicting subunit interactions, we generated a nonredundant set of multisubunit protein complexes with solved structures (Fig. S1). To this end, we selected protein structures from the Protein Data Bank database22 harboring four or more polypeptides (each longer than 15 amino acids) that correspond to different Swiss-Prot‡ accession numbers. Protein complexes sharing at least one pair of subunits with other complexes in the data set were excluded from consideration. The remaining data set contained 10 complexes (Table 1) harboring a total of 56 different protein subunits involved in 95 pairwise interactions.

Table 1. Complexes with known crystal structures used for an analysis of coevolving residues

Protein Data Bank ID | Description | Chains | Possible pairs(a) | Alignments(b) | Interactions(c)
1GW5 | AP2 clathrin adaptor core | A, B, M, S | 6 | 6 | 6
1W63 | AP1 clathrin adaptor core | A, B, M, S | 6 | 6 | 6
2BCJ | Gαq–GRK2–Gβγ complex | A, B, G, Q | 6 | 6 | 3
2J0S | Exon junction complex | A, C, D, T | 6 | 3 | 3
1LDK | Cul1–Rbx1–Skp1–F boxSkp2 SCF ubiquitin ligase complex | A, B, C, D, E | 10 | 10 | 5
2CV5 | Nucleosome core particle | A, B, G, H | 6 | 6 | 5
2CK3 | F1-ATPase | A, D, G, H, I | 10 | 6 | 4
1SFC | Synaptic fusion complex | A, B, C, D | 6 | 6 | 6
1BGY | Cytochrome bc1 complex | A, B, C, D, E, F, G, H, I, J | 45 | 34 | 18
1V54 | Cytochrome c oxidase | A, B, C, D, E, F, G, H, I, J, K, L | 66 | 15 | 10

(a) Number of possible protein pairs consisting of two different subunits for the given complex.
(b) Subset of protein pairs in a concatenated alignment with at least 10 sequences. Only these protein pairs could be used for correlated mutation analysis.
(c) Number of protein pairs in a concatenated alignment with at least 10 sequences and at least one residue contact. These protein pairs formed the subset of interacting complex subunits used for evaluating the prediction method.

‡ www.expasy.ch/sprot

To formulate the criteria for the use of coevolving amino acid residues to assess protein–protein interactions, we employed three different prediction algorithms: McLachlan-based substitution correlation (McBASC),23 observed minus expected squared (OMES),24 and explicit likelihood of subset covariation (ELSC).25 Briefly, the McBASC algorithm assigns higher importance to residues with low rates of change, whereas both the OMES and ELSC algorithms emphasize positions with high rates of change. In general, the McBASC algorithm yielded either only slightly higher correlation scores for interacting subunits (using the McLachlan matrix) or even higher correlation scores for noninteracting subunits (using the Miyata matrix). In contrast, both the OMES and ELSC algorithms assigned clearly higher correlation scores to interacting than to noninteracting proteins (Table 2). Specifically, the average maximal score for interacting proteins obtained by both OMES and ELSC was nearly 1.5-fold higher than that calculated for noninteracting proteins. Since the differences in the maximal coevolution scores were significant with both the OMES algorithm and the ELSC algorithm (Kolmogorov–Smirnov test: p<0.01 and p<0.05, respectively; Mann–Whitney test: p<0.001 and p<0.01, respectively), we concluded that both prediction methods consistently detected highly coevolving residue pairs within physically interacting subunits of large protein complexes. It is worth noting, consistent with previous reports, that strongly coevolving residue pairs within predicted coevolving subunits are, in most cases (99.8%), not in direct contact.24,26
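The OMES score can be illustrated with the two residue columns listed in Fig. 2 (position 30 of NDUFC2 and position 101 of ND5, one row per species in the concatenated alignment). The Python sketch below assumes the commonly used formulation OMES = (1/N_valid) Σ (N_obs − N_exp)², where N_exp(x, y) = C_x·C_y / N_valid is computed from the residue counts of each column and the sum runs over all combinations of residues observed in the two columns. The gap handling and the helper `concatenate_by_species` (which applies the at-least-10-sequences rule of Table 1, footnote b) are illustrative assumptions, not the ComplexCorr implementation, and because the score here comes from a single column pair it is not meant to reproduce the maximal scores in Tables 4–6.

```python
from collections import Counter
from itertools import product
from typing import Dict, Optional


def concatenate_by_species(
    aln_a: Dict[str, str], aln_b: Dict[str, str], min_sequences: int = 10
) -> Optional[Dict[str, str]]:
    """Fuse two alignments row by row, keeping only species present in both.

    Returns None when fewer than `min_sequences` species are shared
    (the cutoff stated in Table 1, footnote b).
    """
    shared = sorted(set(aln_a) & set(aln_b))
    if len(shared) < min_sequences:
        return None
    return {species: aln_a[species] + aln_b[species] for species in shared}


def omes_score(col_a: str, col_b: str, gap: str = "-") -> float:
    """Observed-minus-expected-squared covariation of two alignment columns.

    col_a[k] and col_b[k] hold the residues of the same species (row k).
    Rows with a gap in either column are ignored.
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != gap and b != gap]
    n_valid = len(pairs)
    if n_valid == 0:
        return 0.0
    observed = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    total = 0.0
    # Every combination of residues seen in the two columns contributes,
    # including combinations that were never observed together.
    for x, y in product(count_a, count_b):
        expected = count_a[x] * count_b[y] / n_valid
        total += (observed[(x, y)] - expected) ** 2
    return total / n_valid


# Residues per species as listed in Fig. 2: 23 mammals with L/M,
# Monodelphis domestica with I/L, and six nonmammalian vertebrates with S/L.
ndufc2_pos30 = "L" * 23 + "I" + "S" * 6
nd5_pos101 = "M" * 23 + "L" + "L" * 6

print(round(omes_score(ndufc2_pos30, nd5_pos101), 2))  # prints 3.37
print(omes_score(ndufc2_pos30, "M" * 30))  # invariant column: prints 0.0
```

A column that never varies scores zero, whereas the correlated L to I/S and M to L changes shown in Fig. 2 give a positive score, which is the kind of signal the OMES algorithm rewards.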

Table 2. Comparison of coevolution scores obtained with different prediction methods for interacting and noninteracting complex subunits

Method | Subunit pairs | Maximum | 0.99 quantile | 0.9 quantile | 0.75 quantile | Median | Mean
McBASC (Miyata matrix) | Interacting | 0.98 | 0.73 | 0.39 | 0.21 | 0.06 | 0.13
McBASC (Miyata matrix) | Noninteracting | 0.97 | 0.77 | 0.41 | 0.22 | 0.07 | 0.14
McBASC (McLachlan matrix) | Interacting | 0.94 | 0.70 | 0.40 | 0.25 | 0.10 | 0.15
McBASC (McLachlan matrix) | Noninteracting | 0.92 | 0.64 | 0.35 | 0.20 | 0.08 | 0.13
OMES | Interacting | 4.98 | 2.23 | 1.03 | 0.55 | 0.18 | 0.38
OMES | Noninteracting | 3.46 | 1.91 | 0.98 | 0.55 | 0.18 | 0.36
ELSC | Interacting | 15.97 | 9.06 | 4.91 | 2.92 | 1.20 | 1.93
ELSC | Noninteracting | 10.37 | 6.87 | 3.78 | 2.20 | 0.83 | 1.41

Listed are the average values of the descriptive features of the observed score distributions for all 66 interacting and 32 noninteracting protein pairs in our data set, respectively. All results are based on multiple alignments generated from BLAST outputs (see Materials and Methods).
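Table 2 condenses, for every subunit pair, the distribution of its residue-pair coevolution scores into a few descriptive features and then averages each feature over the interacting and over the noninteracting pairs. A minimal standard-library sketch of that summary follows. The function names and the use of `statistics.quantiles` with `method="inclusive"` are assumptions (other quantile interpolation rules give slightly different values), and the example scores are hypothetical rather than data from this study.

```python
import statistics
from typing import Dict, List


def score_features(scores: List[float]) -> Dict[str, float]:
    """Descriptive features of one subunit pair's residue-pair score distribution."""
    # 99 cut points at the 1st, 2nd, ..., 99th percentiles.
    pct = statistics.quantiles(scores, n=100, method="inclusive")
    return {
        "maximum": max(scores),
        "q0.99": pct[98],
        "q0.9": pct[89],
        "q0.75": pct[74],
        "median": statistics.median(scores),
        "mean": statistics.fmean(scores),
    }


def average_features(pair_scores: List[List[float]]) -> Dict[str, float]:
    """Average each feature over a group of subunit pairs (e.g., all interacting pairs)."""
    per_pair = [score_features(scores) for scores in pair_scores]
    return {key: statistics.fmean(f[key] for f in per_pair) for key in per_pair[0]}


# Hypothetical scores 0, 1, ..., 100 for a single subunit pair.
features = score_features([float(x) for x in range(101)])
print(features["q0.99"], features["median"])  # prints 99.0 50.0
```

Calling `average_features` once on the interacting pairs and once on the noninteracting pairs yields the two rows reported per method in Table 2.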

Optimizing the strategy to differentiate between interacting and noninteracting subunits

Since the OMES and ELSC algorithms demonstrated the best capability for distinguishing between interacting and noninteracting subunits, we used these methods for further analyses. To improve our ability to detect true protein–protein interactions using coevolution signals, we considered two prediction strategies: (a) the coevolution score of the single Nth best residue pair or (b) the average score of all N best residue pairs. Although it has been reported that the number of coevolving residue pairs can be influenced by the lengths of the proteins studied,16,27,28 we did not observe any notable influence of this parameter on the prediction of coevolving proteins (Fig. 1). A receiver operating characteristic (ROC) curve was calculated for every test case (see Materials and Methods and Fig. S2), and the quality of individual predictions was assessed by comparing the area under the curve (AUC). In general, predictive performance, as represented by the AUC value, decreased with the number of selected coevolving residue pairs, indicating that the number of significantly coevolving residues in interaction interfaces is rather small (Fig. 1). Overall, the best prediction using BLAST alignments (AUC=0.716) was obtained with the OMES algorithm using the correlation score of only the single strongest coevolving residue pair. Notably, ELSC performed slightly better when we analyzed a large number of correlations (ELSC and ELSC_AVG in Fig. 1a), while OMES retrieved slightly better predictions for a length-dependent residue pair threshold (OMES and OMES_AVG in Fig. 1b) and for small numbers of selected residues.

Fig. 1. Comparison of different prediction strategies for interacting complex subunits using either the best X correlated residue pairs (a) or the protein length/X best residue pairs (b). Shown for every prediction is the AUC (area under the ROC curve) obtained, where larger AUC values indicate better predictions. OMES_AVG and ELSC_AVG correspond to predictions where the average score of the N best predicted residue pairs is used for classification, while OMES and ELSC simply use the score of the Nth best residue pair. All results are based on multiple alignments generated from BLAST outputs (see Materials and Methods).

It has been previously suggested that different sequence alignment methods could influence sequence analyses, especially those performed for phylogenetic purposes.29 Importantly, using various multiple-sequence alignment algorithms had only a minimal effect on the predictions of interacting subunits. For both the OMES algorithm and the ELSC algorithm, the maximal AUC value changed only slightly after realigning all BLAST alignments using either T-Coffee30 or MUSCLE.31 Specifically, in the case of OMES, the AUC of 0.716 (BLAST alignments) increased to 0.725 (MUSCLE) and 0.723 (T-Coffee). For ELSC, the maximal AUC scores of 0.681 and 0.675 (MUSCLE and T-Coffee, respectively) closely matched the maximal AUC of 0.676 found for alignments generated from BLAST outputs. This observation could be attributed to the fact that the presence of a single strongly coevolving residue pair is only rarely affected by the method of alignment generation, even though the exact correlation score may change (Table S1). Our prediction method is thus highly robust to the choice of alignment tool; hence, the results presented below are based on the original BLAST and ClustalW alignments.

Accuracy assessment for the prediction of interacting subunits using selected score thresholds

Based on the analyses described above, we further investigated the classifications obtained using only the score of the highest-ranking residue pair of each subunit pair (Table S1). We selected several distinct score thresholds and classified all 95 subunit pairs in our data set into interacting and noninteracting subunits. For every score threshold chosen, we calculated specificity (i.e., the fraction of known noninteracting subunits that were predicted as noninteracting), sensitivity (i.e., the fraction of known interacting subunits that were predicted as interacting), and accuracy (i.e., the fraction of truly interacting subunits out of all subunits that were predicted to interact, a quantity also known as precision or positive predictive value). As expected, the higher the score threshold, the more specific and the less sensitive a prediction becomes. Both OMES and ELSC resulted in a prediction with 90.6% specificity and 33.3% sensitivity (OMES score threshold, 5.0; ELSC score threshold, 20) (Table 3). The best tradeoff between sensitivity and specificity was obtained with an OMES threshold of 3.5, resulting in predictions with 56.1% sensitivity and 81.3% specificity (for detailed results, see Table S2). Altogether, screening for coevolving residue pairs among proteins allowed the detection of truly physically interacting subunits with few false-positive signals. Nevertheless, our approach has somewhat limited sensitivity, as some truly interacting proteins were missed.

Table 3. Assessing the specificity, sensitivity, and accuracy obtained while predicting interacting subunits within complexes

Algorithm | Score threshold | Specificity (%) | Sensitivity (%) | Accuracy (%)
OMES | 5.0 | 90.6 | 33.3 | 88.0
OMES | 4.0 | 84.4 | 39.4 | 83.9
OMES | 3.5 | 81.3 | 56.1 | 86.0
OMES | 3.2 | 62.5 | 72.7 | 80.0
ELSC | 20 | 90.6 | 33.3 | 88.0
ELSC | 15 | 84.4 | 37.9 | 83.3
ELSC | 12 | 78.1 | 44.0 | 80.6

The analysis was performed using different score thresholds for the highest-scoring residue pair. All results are based on multiple alignments generated from BLAST outputs (see Materials and Methods).

Predicting subunit interactions within the membrane arm of OXPHOS complex I

As shown above, identifying coevolving residues within large protein complexes can accurately predict at least a subset of physically interacting subunits. We thus sought to predict subunit interactions within a protein complex with a poorly resolved crystal structure. We chose, as a test case, to focus on the interactions between mtDNA-encoded and nDNA-encoded subunits of the first and largest protein complex of the mitochondrial energy-producing machinery, namely complex I. The crystal structure of complex I has been solved only at low resolution, and the network of subunit interactions, including the interactions between mtDNA-encoded and nDNA-encoded subunits (cytonuclear interactions), is poorly understood.32 The choice to focus on cytonuclear interactions in complex I stems from their important role in diseases33 but also in evolutionary processes such as adaptation10 and the emergence of new species.34

We recently identified three nDNA-encoded subunits of complex I that underwent positive selection and, as such, were selected as the best candidates to interact with the fast-evolving mtDNA-encoded subunits.13 To identify candidate subunits involved in cytonuclear physical interaction within complex I, we predicted coevolving residues and analyzed them using the OMES and ELSC algorithms. As a first step, we applied our correlated mutations approach to identify the best candidates among the seven complex I mtDNA-encoded subunits (ND1–ND6 and ND4L) for interaction with one of the positively selected subunits (NDUFC2), chosen as a model for the current analysis. The approach was also applied to identify all possible interactions among mtDNA-encoded subunits. For every possible pair of subunits, considering either NDUFC2 or any of the mtDNA-encoded subunits, the highest-scoring residue pairs were utilized for the identification of candidate interacting and noninteracting subunits. Favoring specificity over sensitivity and using OMES and ELSC, we identified the ND4 and ND5 subunits as exhibiting the highest scores, thus corresponding to the most likely mtDNA-encoded candidates to interact with NDUFC2 (Table 4). Figure 2 exemplifies a representative coevolving pair of amino acids in NDUFC2 and ND5. Further candidates for interaction with NDUFC2 were ND6 (OMES score >4.0; ELSC score=13.9) and ND2 (ELSC score 14). Both ND4 and ND5 are known to share the same subcomplex with NDUFC2,35 thus further supporting their candidacy as interacting subunits.

Table 4. Coevolution of NDUFC2 and mtDNA-encoded complex I subunits

Subunit 1 | Subunit 2 | OMES | ELSC
NDUFC2 | ND1 | 4.0 | 13.0
NDUFC2 | ND2 | 3.6 | 14.1
NDUFC2 | ND3 | 3.0 | 11.4
NDUFC2 | ND4 | 5.1 | 16.1
NDUFC2 | ND4L | 3.6 | 13.2
NDUFC2 | ND5 | 4.4 | 15.6
NDUFC2 | ND6 | 4.0 | 13.2

The maximal coevolution scores obtained with the OMES and ELSC prediction methods for every subunit pair are summarized.

Species | Phylogenetic classification | Position 30 of human NDUFC2 | Position 101 of human ND5
Homo sapiens | Primates | L | M
Gorilla gorilla | Primates | L | M
Pan troglodytes | Primates | L | M
Pongo pygmaeus | Primates | L | M
Symphalangus syndactylus | Primates | L | M
Macaca mulatta | Primates | L | M
*Macaca mulatta–Macaca sylvanus | Primates | L | M
*Colobus angolensis–Colobus guereza | Primates | L | M
Nycticebus coucang | Primates | L | M
Lemur catta | Primates | L | M
Bos taurus | Non-primate mammals | L | M
Canis familiaris | Non-primate mammals | L | M
Sus scrofa | Non-primate mammals | L | M
Equus caballus | Non-primate mammals | L | M
*Spermophilus tridecemlineatus–Sciurus vulgaris | Non-primate mammals | L | M
Oryctolagus cuniculus | Non-primate mammals | L | M
Ovis aries | Non-primate mammals | L | M
Mus musculus | Non-primate mammals | L | M
Rattus norvegicus | Non-primate mammals | L | M
Loxodonta africana | Non-primate mammals | L | M
Dasypus novemcinctus | Non-primate mammals | L | M
Sorex araneus | Non-primate mammals | L | M
Cavia porcellus | Non-primate mammals | L | M
Monodelphis domestica | Non-primate mammals | I | L
Tetraodon nigroviridis | Nonmammalian vertebrates | S | L
Xenopus laevis | Nonmammalian vertebrates | S | L
Gallus gallus | Nonmammalian vertebrates | S | L
Danio rerio | Nonmammalian vertebrates | S | L
Oryzias latipes | Nonmammalian vertebrates | S | L
Takifugu rubripes | Nonmammalian vertebrates | S | L

Fig. 2. High-scoring coevolving residue positions (position 30 in human NDUFC2 and position 101 in human ND5) confirmed by the OMES and ELSC algorithms. An alignment of fused proteins (NDUFC2–ND5) was constructed using ClustalW, with sequences from the same species for both proteins concatenated. Importantly, performing the alignment using another independent tool (T-Coffee) did not alter the results. *Species pairs used because sequences of the two proteins were available only from closely related species: Macaca mulatta (NDUFC2) and Macaca sylvanus (ND5); Colobus angolensis (NDUFC2) and Colobus guereza (ND5); and Spermophilus tridecemlineatus (NDUFC2) and Sciurus vulgaris (ND5). Colors highlight the correlated changes.

The availability of orthologous sequences from multiple species allowed us to expand our analysis and to predict the interactions of the two other positively selected nDNA-encoded complex I subunits, NDUFA1 and NDUFA4, with mtDNA-encoded subunits. Our analysis revealed that ND1, ND4, and ND5 are likely to interact with NDUFA1 (Table 5). The analysis of NDUFA4, however, did not reveal clear candidates for interactions (data not shown). It is important to note that the orthologous sequences available for the analysis of NDUFA1 and NDUFA4 did not encompass exactly the same set of species as those analyzed in the case of NDUFC2. Since we aimed for high accuracy at the expense of sensitivity, some false-negative results are expected. Thus, the lack of high-scoring interactions for NDUFA4 could be due either to false-negative predictions or to an actual lack of interaction.

Table 5. Coevolution of NDUFA1 and mtDNA-encoded complex I subunits

Subunit 1 | Subunit 2 | OMES | ELSC
NDUFA1 | ND1 | 4.5 | 14.7
NDUFA1 | ND2 | 4.2 | 13.9
NDUFA1 | ND3 | 4.3 | 12.6
NDUFA1 | ND4 | 4.7 | 12.9
NDUFA1 | ND4L | 4.5 | 12.3
NDUFA1 | ND5 | 4.5 | 14.3
NDUFA1 | ND6 | 3.1 | 12.3

The maximal coevolution scores obtained with the OMES and ELSC prediction methods for every subunit pair are summarized.

We further predicted interactions among mtDNA-encoded complex I subunits. Analysis of candidates for direct interactions among these subunits resulted in relatively high correlation scores, possibly due to the elevated mtDNA mutation rate. Hence, we applied higher thresholds for the identification of interacting subunits (a score of 4.5 for OMES and of 15 for ELSC). Among the highest-scoring subunit pairs are all possible pairs formed by the three subunits ND1, ND3, and ND5. ND1 and ND5 exhibited strong coevolution signals with a number of other subunits, namely ND2 and ND4 in the case of ND1, and ND4 and ND6 in the case of ND5 (Table 6). Additionally, the ND4–ND6 pair had a remarkably high maximal ELSC score of 17, whereas its maximal OMES score (4.6) was only slightly above the threshold. Altogether, this implies that ND5 and ND1 serve as a 'minihub' for other mtDNA-encoded subunits, with ND5 being the best candidate to additionally interact with NDUFC2 and NDUFA1.

Table 6. Coevolution of mtDNA-encoded complex I subunits

Subunit 1 | Subunit 2 | OMES | ELSC
ND1 | ND3 | 6.6 | 17.8
ND1 | ND4 | 5.9 | 16.2
ND5 | ND6 | 5.4 | 15.8
ND1 | ND2 | 5.3 | 15.5
ND3 | ND5 | 4.9 | 15.1
ND4 | ND5 | 4.8 | 15.7
ND4 | ND6 | 4.6 | 17.0
ND1 | ND5 | 4.6 | 15.4

The maximal coevolution scores obtained with the OMES and ELSC prediction methods for those subunit pairs passing the required thresholds (4.5 for OMES and 15 for ELSC) are summarized.

Positive selection predicted cytonuclear subunit interactions but not coevolving residues

Positive selection has been identified in a subset of nDNA-encoded subunits of OXPHOS complexes that were in close contact with mtDNA-encoded subunits of these complexes.9 Thus, the action of positive selection on nDNA-encoded subunits was interpreted as compensation for the fast mtDNA mutation rate in order to maintain structure and function throughout evolution. We therefore reasoned that specific positively selected amino acid residues may play an important role in nDNA–mtDNA subunit interactions. As mentioned above, our analysis predicted interactions of mtDNA-encoded subunits with two of the three positively selected nDNA-encoded subunits (i.e., NDUFC2 and NDUFA1). Since both genes underwent positive selection within primates,13 specific amino acids that experienced positive selection in NDUFC2 and NDUFA1 were detected using SELECTON§ (a Web interface used to assess signatures of natural selection in proteins) based on sequence comparisons of orthologs from different primates, additional nonprimate mammals, and nonmammalian vertebrates (Fig. 3). When the analysis was performed on sequence alignments divided according to phylogenetic considerations (Table S3), we identified amino acids that were positively selected in all tested branches, as well as amino acids that were positively selected in specific branches, especially the primate branch. This suggests that whereas some amino acids possess a more general adaptive value, others play a lineage-specific role in adjusting to changing energy needs (Fig. 3a and b; Fig. S3). Interestingly, amino acids 40 and 44 in NDUFA1, which received the highest correlation scores with ND5 and ND1, map to the "species-specific" domain, which has previously been shown to be important for complex I assembly and cytonuclear interactions,36 thus supporting our predicted interactions.

Notably, there was no significant correlation between the positively selected amino acids and coevolving residues. Nevertheless, positive selection or other methods estimating amino acid replacement rates could be used as indicators of the evolutionary period during which most of the amino acid replacements accumulated, thus facilitating the choice of sequences for the correlated mutations test.

§ http://SELECTON.tau.ac.il/overview.html
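The classification steps described in the subsections on optimizing the prediction strategy and on accuracy assessment can be sketched as follows: each subunit pair is summarized either by the score of its Nth best residue pair or by the average of its N best residue pairs, the AUC is computed as the probability that a randomly chosen interacting pair outscores a randomly chosen noninteracting pair (the rank-based equivalent of the area under the ROC curve), and specificity, sensitivity, and accuracy are computed for a fixed score threshold. All function names are hypothetical, the rule "score ≥ threshold means interacting" is an assumption, and this is a minimal illustration rather than the ComplexCorr code. As a check, the counts implied by Table 3 for OMES at threshold 5.0 (22 of 66 interacting pairs and 3 of 32 noninteracting pairs predicted to interact) reproduce the reported specificity, sensitivity, and accuracy.

```python
from statistics import fmean
from typing import Sequence, Tuple


def nth_best(scores: Sequence[float], n: int) -> float:
    """Strategy (a): score of the single Nth best residue pair (n=1 gives the maximum)."""
    return sorted(scores, reverse=True)[n - 1]


def mean_of_n_best(scores: Sequence[float], n: int) -> float:
    """Strategy (b): average score of the N best residue pairs."""
    return fmean(sorted(scores, reverse=True)[:n])


def auc(interacting: Sequence[float], noninteracting: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney probability that an
    interacting pair outscores a noninteracting one (ties count one half)."""
    wins = 0.0
    for p in interacting:
        for q in noninteracting:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(interacting) * len(noninteracting))


def classify(pair_scores: Sequence[float], labels: Sequence[bool],
             threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), calling a pair interacting when score >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_interacting in zip(pair_scores, labels):
        predicted = score >= threshold
        if predicted and is_interacting:
            tp += 1
        elif predicted:
            fp += 1
        elif is_interacting:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Specificity, sensitivity and accuracy (as defined in the text), in percent."""
    specificity = 100 * tn / (tn + fp)
    sensitivity = 100 * tp / (tp + fn)
    accuracy = 100 * tp / (tp + fp)  # share of predicted interactions that are real
    return specificity, sensitivity, accuracy


# Counts implied by Table 3 for OMES at threshold 5.0 (66 interacting, 32 noninteracting pairs).
spec, sens, acc = threshold_metrics(tp=22, fp=3, tn=29, fn=44)
print(f"{spec:.1f} {sens:.1f} {acc:.1f}")  # prints 90.6 33.3 88.0
```

Sweeping `threshold` over a range of values and recording (100 − specificity, sensitivity) at each step traces the ROC curve whose area `auc` computes directly.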

[Fig. 3 panels: (a) NDUFC2, (b) NDUFA1, and (c) THRSP, each shown along residue positions; selection scale ranging from positive selection to purifying selection]

Fig. 3. Patterns of positively selected amino acid positions in NDUFC2 and NDUFA1 versus THRSP. Analysis was performed and statistical significance was estimated using the default settings of SELECTON (http://SELECTON.tau.ac.il/). Color codes represent the relative degrees of either positive selection (yellow-orange spectrum) or negative selection (pink-purple spectrum). (a) Positive selection in NDUFC2 in primates. (b) Positive selection in NDUFA1 in primates. (c) THRSP is mostly negatively selected.

To control for the possibility that the positive selection was merely a side effect of an accelerated mutation rate in the chromosomal location, we also performed positive selection analysis for THRSP, a gene located tail to tail with NDUFC2 (Fig. 3c). Our results indicate that THRSP is subject to negative, rather than positive, selection. As negative selection or neutrality applies to most protein-coding genes in humans, THRSP reflects the rule rather than the exception.37,38 Therefore, the positive selection effect noted is likely to be specific to NDUFC2 and NDUFA1.

Yeast two-hybrid experiments confirm the direct interactions of NDUFC2 and NDUFA1 with mtDNA-encoded subunits

To verify the predicted protein–protein interactions, we used a yeast two-hybrid assay. Since the traditional yeast two-hybrid assay is qualitative (nonquantitative) and since we were interested in cytonuclear interactions, we focused only on our strongest predictions for cytonuclear interactions. Accordingly, we amplified by reverse transcription PCR the cDNAs of human NDUFC2 and NDUFA1 from commercially available human muscle and colon RNA samples, respectively, and cloned the transcripts into the pAD plasmid (see Materials and Methods). Based on our bioinformatic predictions, we aimed to assess the interactions of mtDNA-encoded proteins with NDUFC2, with the mtDNA-encoded proteins ND4 and ND5 representing the best candidates for interaction and with ND3 predicted to have low odds of interaction with NDUFC2. Similarly, we tested the best candidates to interact with NDUFA1, namely ND1, ND5, and ND4 (see the text below). Since all mtDNA-encoded proteins rely on the mtDNA genetic code, we recoded ND1, ND3, ND4, and ND5 to include general cytoplasmic codons (see Materials and Methods). Each of these genes was cloned into the pBD plasmid. Since the construct harboring ND5 proved toxic to the yeast in our hands, we continued our experiments solely with constructs encoding ND3, ND4, ND1, NDUFC2, and NDUFA1.

Firstly, we tested the best predicted mtDNA-encoded protein interactions with NDUFC2. The results of these experiments (Fig. 4) show yeast growth on restrictive media (–L-W-H) in the combined presence of NDUFC2 and ND4, consistent with a direct interaction of these subunits (Fig. 4b). In contrast, we observed a lack of yeast growth when the combination of NDUFC2 and ND3 was introduced, consistent with a lack of interaction between these subunits, exactly as predicted (Fig. 4). Secondly, to assess whether the observed lack of interaction is true or due to lack of protein expression, we performed Western blot analysis. This analysis confirmed that all of the tested proteins were expressed while the yeast grew in –L-W restrictive media (Fig. 4c); therefore, the observed lack of interaction is likely to be true. Finally, we applied the yeast two-hybrid assay as described above to test the highest-ranking interactions of NDUFA1 with mtDNA-encoded subunits (ND4 and ND1), as well as the predicted lack of interaction between NDUFA1 and NDUFC2, with the latter cloned in pBD for this particular assay (Fig. 5). The results of these experiments show yeast growth on restrictive media (–L-W-H; Fig. 5b) in the combined presence of NDUFA1 and ND4 or ND1, consistent with a direct interaction of these subunits. Taken together, these findings offer experimental support for our bioinformatic predictions.

Fig. 4. NDUFC2 physically interacts with ND4 but not with ND3. Yeast cells cotransformed with plasmids pAD-NDUFC2 and pBD-ND3 (1) or pBD-ND4 (2), an empty pAD vector with pBD-ND4 (3), and pAD-NDUFC2 with an empty pBD plasmid (4) were serially diluted five times and spotted onto selective media lacking leucine and tryptophan (a; –L-W) to ensure plasmid integrity. Physical interaction was detected by growth on plates lacking histidine (b; –L-W-H). (c) Western blot analysis of pBD expression. Gel loaded with protein extract from yeast transformed with (from left to right): (1) NDUFC2 and empty pBD; (2) empty pAD with pBD-ND4; (3) NDUFC2 with pBD-ND4; (4) NDUFC2 with pBD-ND3; (5) untransformed yeast.

Fig. 5. NDUFA1 physically interacts with ND4 or ND1 but not with NDUFC2. Similarly to the results demonstrated in Fig. 4a and b, yeast cells were cotransformed with plasmids pAD-NDUFA1 and pBD-NDUFC2 (1), plasmids pAD-NDUFA1 and pBD-ND4 (2), plasmids pAD-NDUFA1 and pBD-ND1 (3), plasmid pAD-NDUFA1 and empty pBD (4), and plasmids pAD-NDUFC2 and pBD-ND4 (5). The transformed yeast cells were serially diluted five times and spotted onto selective media lacking leucine and tryptophan (a; –L-W) to ensure plasmid integrity. Physical interaction was detected by growth on plates lacking histidine (b; –L-W-H).

references within). Secondly, we predicted that NDUFA1, ND4, and ND5 would interact with several subunits, consistent with their known essentiality to the assembly of the membrane arm of

Proposed interaction network of NDUFC2, NDUFA1, and mtDNA-encoded complex I subunits Fig. 6. A proposed schematic model for the interaction network of NDUFC2, NDUFA1, and mtDNA-encoded Based on the bioinformatic predictions and yeast complex I subunits. The scheme is based on the compiled two-hybrid experiments, we propose a model for a predictions described in Tables 4, 5,and6. This figure network of interactions among the tested subunits demonstrates the networks of predicted interactions among (Fig. 6). This model is largely consistent with the tested subunits. The illustration does not depict the 7 actual size of the subunits. The thickness of the arrows analyses of complex I assembly intermediates . corresponds to the strength of the predictions, with the Firstly, both ND3 and ND2 were predicted to interact thickest arrows representing the best predicted interactions with ND1, consistent with the transient association predictions (based on the top-scoring correlated pairs of of these very subunits in the presence of the complex residues). Question mark indicates a lack of clear predictions I assembly factor NDUFAF1 (Vogel et al.7 and for the interaction of ND4L with any of the tested subunits. Coevolution in Oxidative Phosphorylation Complex I 167 complex I (Vogel et al.7 and references within). of protein complexes are resolved, a future version of Finally, ND5 was predicted to interact with several our prediction method will use complex-specific score mtDNA-encoded complex I subunits, consistent thresholds, supported by an empirically derived scale with a recent structural analysis of its Thermus for the number of expected interactions given a thermophilus ortholog Nqo12.5 However, in addition certain number of complex subunits. 
to these consistencies with the literature, our work We applied our prediction approach to cytonuclear put forward new insights into complex I cytonuclear subunit interactions in human mitochondrial subunit interactions: both the predictions and the OXPHOS complex I, a protein complex whose experimental data presented in the current work structure is poorly understood. This allowed us to support the interactions of NDUFA1 and NDUFC2 identify candidate interactions among mtDNA- with ND4 and the interactions of the former with encoded subunits, as well as with the positively ND1, but also the lack of interactions of NDUFA1 selected subunits NDUFC2 and NDUFA1,andto and NDUFC2.Inaddition,bothNDUFA1 and propose a model for these predictions (Fig. 6). For NDUFC2 were predicted to interact with ND5. these two subunits, positive selection mainly occurred Since ND4, ND5,andNDUFA1, as mentioned in primates, which led us to enrich for primate above, are essential for the assembly of the mem- sequences and hence better detect signals of coevolu- brane arm of complex I, it would be tempting to test tion. Since human complex I is far more complex than the essentiality of NDUFC2 to the assembly process. its bacterial and yeast40 counterpart and since subunit Indeed, we noticed that transient knockdown of interactions in the membrane arm of the complex are NDUFC2 in cell culture resulted in cell growth yet to be resolved, our bioinformatic approach retardation, providing clues consistent with this constitutes the first step towards disentangling cyto- hypothesis (M.G. et al., unpublished results). In nuclear subunit interactions within this large multi- summary, our findings not only support our subunit complex. 
Using the yeast two-hybrid system, predictive method for subunit interactions but also we confirmed the predicted interactions of NDUFC2 constitute the first experimental evidence for direct with ND4, the predicted interactions of NDUFA1 with cytonuclear subunit interactions within the human ND1 and ND4, and the predicted lack of interaction of OXPHOS complex I. These findings pave the path NDUFC2 with ND3 and NDUFA1. To the best of our towards understanding the molecular basis under- knowledge, we provide the first experimental support lying the known involvement of complex I cyto- for predictions of the direct interaction of mtDNA- nuclear interactions in diseases,33 reduced fitness, encoded and nDNA-encoded proteins within human reproductive barriers, and speciation events.34 complex I. More generally, analysis of coevolving residues, underlining combined bioinformatic and experimental approaches as described above, could be Conclusions utilized to resolve the network of subunit interactions in other large complexes. Understanding the networks In this study, we have demonstrated the usefulness of subunit interactions within complexes, in general, of correlated mutations for identifying candidate and in OXPHOS complex I, in particular, may shed physically interacting subunits within large multi- light on their functional roles in normal and disease subunit protein complexes. Our observations indicate conditions. that interacting subunits within protein complexes often possess a small number of highly coevolving residues that distinguish them from noninteracting Materials and Methods subunits. Hence, the presence of significantly coevol- ving residues could be used as a tool to identify at Blood samples least a subset of physically interacting subunits with a relatively high accuracy (up to nearly 90%). 
Specif- For RNA extraction, primate blood samples were ically, of the four tested prediction algorithms, two obtained from The Biblical Zoo (Jerusalem) and The (OMES and ELSC) retrieved predictions with high Monkey Park (Ben Shemen). The samples were taken only specificity and accuracy. It is worth noting that during preplanned medical procedures. The blood sam- certain complexes may differ remarkably in their ples were collected in standard tubes containing anti- general correlation level (Table S1). Alignments with coagulation factors and kept in ice (no more than a few a large number of sequences tend to retrieve higher hours) for RNA purification. correlation scores than alignments of fewer 39 sequences. Additionally, the inclusion of rather RNA purification diverse sequences resulted in increased correlation scores. Hence, there is a need to adjust the threshold RNA purification was performed using the Versagene levels individually for the identification of physically total RNA purification kit (Gentra), in accordance with the interacting subunits in each newly investigated manufacturer's protocol. Briefly, red cells underwent lysis complex. Ideally, once significantly more structures by RBC buffer [50 mM TRIS (pH 7.2), 0.5 mM EDTA] to 168 Coevolution in Oxidative Phosphorylation Complex I facilitate the isolation of white blood cells by centrifuga- ment (N95%) and sequences with more than 25% gaps were tion. Pellets of white blood cells were incubated in the removed. Finally, concatenated alignments with fewer than presence of a detergent/salt solution to eliminate endog- 10 sequences were removed from the analysis, resulting in enous RNase activity. The RNA-containing supernatant concatenated alignments for 95 protein pairs used for was loaded onto a purification column supplied by the kit further analyses. For all 95 protein pairs, concatenated for RNA binding. 
The bound RNA was washed in order to sequence alignments were generated based on the original dispose of proteins and DNA. Residual DNA was removed BLAST, T-Coffee, and MUSCLE alignments. by on-column DNase treatment. Finally, the purified RNA was eluted with diethylpyrocarbonate-treated water. Prediction of coevolving residues cDNA preparation As mentioned in Results and Discussion, we used three prediction methods: McBASC, OMES, and ELSC. These Total cDNA was produced from blood RNA extracts methods were selected because they performed best in a using the iScript cDNA synthesis kit (Bio-Rad), in comparative study on fusion proteins.26 The McBASC accordance with the manufacturer's protocol. The cDNA method was applied using either the Miyata substitution was in turn used for PCR amplification and sequencing of matrix42 or the McLachlan substitution matrix.43 With NDUFC2 and THRSP transcripts using specific primers each method, we extracted the maximum coevolution (for primer list and PCR conditions, see Table S4). score, the quantiles, and the mean scores for each possible pair of subunits. These values were grouped into two sets (corresponding to interacting and noninteracting sub- Test data set and multiple-sequence alignments units) and compared. Several properties of the obtained distribution of correlation scores gained for each align- To assess the coevolution signal as a predictive tool for ment and prediction method were assessed for their subunit interactions, we obtained a nonredundant set of ability to distinguish physically interacting complex multisubunit protein complexes with solved structure (Fig. subunits from those that do not interact (see Results and S1). The final data set contained 10 complexes (Table 1) Discussion). To this end, the number of residue pairs with β β α harboring a total of 56 different protein subunits. 
These aC –C distance (C in the case of glycine) of less than 8 Å subunits were involved in 95 pairwise interactions, while the was calculated from the corresponding Protein Data Bank total number of possible interactions (given the way these structure, and all protein pairs having at least one such subunits are distributed among the complexes) was 167. residue pair were counted as interacting. Accordingly, 66 When a protein complex harbored multiple copies of the protein pairs out of 95 in total were classified as same subunit, a representative subunit was selected for the interacting, while 32 protein pairs were categorized as analysis corresponding to the protein chain with the maximal noninteracting. For each test case, we calculated an ROC number of observed interactions with other subunits. curve (which plots the achieved true-positive rate against Homologous sequences for the analyzed polypeptides the false-positive rate), with any point above the diagonal were obtained with PSI-BLAST searches against the corresponding to better predictions than random (Fig. S2). ‖ unfiltered NCBI database, with three iterations and The quality of different predictions was compared using with the E-value threshold for inclusion of related the AUC measure, with AUC values above 0.5 indicating − 4 database sequences set to 1×10 . Following our previous predictions better than randomb. work dealing with the prediction of intramolecular contacts in membrane-bound proteins,41 multiple align- ments for every subunit were directly constructed from Sequence data of OXPHOS complex I mtDNA-encoded PSI-BLAST results by compiling all hits found using their subunits and nDNA-encoded subunits alignment to the reference sequence. 
Additionally, in accordance with the previously published standards for Orthologous sequences of the complex I subunit the prediction of intermolecular correlations,21 if more NDUFC2, NDUFA1, and the mtDNA-encoded subunits than one sequence was obtained from a given species, the ND1–ND6 and ND4L were obtained from the NCBI and sequence most similar to the reference sequence was kept ENSEMBLa databases. To gain additional sequence in the alignment. In general, species not contributing variability, we sequenced further orthologs from primates orthologous sequences to all analyzed subunits of a given currently not present in public databases (Table S3). complex were removed. Finally, to test the influence of Primers for PCR amplification and sequencing are alignment quality on the resulting prediction, we rea- specified in Table S4. In total, sequences from more than ligned all used sequences utilizing the multiple-alignment 30 species were available (Table S3) and used to generate programs T-Coffee30 and MUSCLE.31 multiple-sequence alignments by ClustalWb for each of In order to predict correlated mutations between two the proteins separately; these alignments were subse- proteins, we first performed multiple orthodox alignments quently concatenatedc. for each protein separately and then constructed a joint alignment by concatenating sequences of the same organ- ism for every possible protein pair within each complex. b Our final method, called ComplexCorr, is available at From these alignments, sequences with a high level of http://webclu.bio.wzw.tum.de/complexcorr similarity to another sequence in the concatenated align- a www.ensembl.org b www.clustal.org c Alignments are available at http://webclu.bio.wzw. ‖ www.ncbi.nlm.nih.gov tum.de/oxphos Coevolution in Oxidative Phosphorylation Complex I 169
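The coevolution scores above were computed with existing tools (McBASC, OMES, and ELSC). As a rough illustration only, the sketch below computes an OMES-style score for two alignment columns. OMES (observed minus expected squared, after Kass and Horovitz) compares observed residue-pair counts with those expected under independence. The sketch then takes the maximum over all column pairs that span the two subunits of a concatenated alignment. The maximum score is one of the per-pair summaries listed under Prediction of coevolving residues. This is not the ComplexCorr implementation. The normalization (chi-square sum divided by the number of gap-free rows), the gap handling, and the function names `omes` and `max_intersubunit_omes` are assumptions.

```python
from collections import Counter
from typing import Sequence

GAP = "-"


def omes(col_i: Sequence[str], col_j: Sequence[str]) -> float:
    """OMES-style covariation score for two alignment columns (assumed normalization).

    Rows with a gap in either column are ignored. For every residue pair (x, y),
    the expected count under independence is count_i(x) * count_j(y) / n_valid.
    Squared deviations of the observed counts are summed chi-square style and
    divided by n_valid.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != GAP and b != GAP]
    n_valid = len(pairs)
    if n_valid == 0:
        return 0.0
    observed = Counter(pairs)
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    chi2 = 0.0
    for x, cx in count_i.items():
        for y, cy in count_j.items():
            expected = cx * cy / n_valid  # always > 0 because cx, cy > 0
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    return chi2 / n_valid


def max_intersubunit_omes(rows: Sequence[str], len_a: int) -> float:
    """Highest score over column pairs spanning the two subunits.

    rows: concatenated alignment rows in which the first len_a columns belong
    to subunit A and the remaining columns to subunit B.
    """
    columns = list(zip(*rows))
    return max((omes(columns[i], columns[j])
                for i in range(len_a)
                for j in range(len_a, len(columns))), default=0.0)
```

For example, `omes("AABB", "CCDD")` returns 1.0 for two perfectly covarying columns. `omes("AABB", "CDCD")` returns 0.0 for independent columns. A fully conserved column also scores 0.0, which is one reason conservation influences covariance measures (ref. 24).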

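The structure-based labeling and ROC evaluation described under Prediction of coevolving residues can be sketched as follows. Only the 8 Å Cβ–Cβ cutoff, with the Cα fallback for glycine, comes from the text. The residue record format and the function names `chains_interact` and `roc_auc` are hypothetical. The AUC is computed through its rank (Mann–Whitney) formulation: the probability that a randomly chosen interacting pair outscores a randomly chosen noninteracting pair, with ties counted as one half. This equals the area under the ROC curve.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical residue record: (three-letter name, C-alpha coords, C-beta coords or None)
Residue = Tuple[str, Coord, Optional[Coord]]

CONTACT_CUTOFF = 8.0  # Angstrom, as stated in the Methods


def rep_atom(res: Residue) -> Coord:
    """C-beta coordinates, or C-alpha for glycine (which has no C-beta)."""
    name, ca, cb = res
    return ca if name == "GLY" or cb is None else cb


def chains_interact(chain_a: Sequence[Residue], chain_b: Sequence[Residue],
                    cutoff: float = CONTACT_CUTOFF) -> bool:
    """A subunit pair counts as interacting if any residue pair is closer than cutoff."""
    return any(math.dist(rep_atom(r1), rep_atom(r2)) < cutoff
               for r1 in chain_a for r2 in chain_b)


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC = P(interacting score > noninteracting score) + 0.5 * P(tie)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

In this sketch, each subunit pair's summary score (for example, its maximum coevolution score) goes into `scores_pos` or `scores_neg` according to `chains_interact`. An AUC above 0.5 then indicates better-than-random separation, as in the text.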
Prediction of interacting subunits within the mitochondrial OXPHOS complex I

Subunit interactions were evaluated for all possible combinations of NDUFC2 or NDUFA1 with mtDNA-encoded subunits and for interactions among mtDNA-encoded subunits. As described above, a concatenated alignment was constructed for every analyzed subunit pair by joining sequences of the same species. Redundant sequences with >95% identity to another sequence in the alignment were removed. Coevolving residues were predicted using the OMES and ELSC methods, which gave the best distinction between interacting and noninteracting subunits in the previous analysis. Finally, interacting subunits were predicted using thresholds optimized on complexes with solved structures (see Results and Discussion).

Detecting positively selected amino acids within the complex I subunits NDUFC2 and NDUFA1

We generated multiple cDNA sequence alignments with ClustalW and used SELECTON to identify positively selected residue positions. The list of sequences used is provided in Table S3. Briefly, we tested whether positive selection operates on NDUFC2 or NDUFA1, against a null hypothesis of no positive selection. The SELECTON server can detect selective forces even at individual amino acid sites. It uses the ratio of nonsynonymous (i.e., amino acid altering) to synonymous (i.e., silent) substitutions, known as the Ka/Ks ratio, to estimate both positive selection and purifying selection at each amino acid site. As a control for the analysis of positively selected amino acids, a corresponding alignment was generated for THRSP. The 3′ end of THRSP lies in a tail-to-tail orientation with the 3′ end of NDUFC2 in both birds and mammals.44

Yeast two-hybrid analysis

To assess predicted protein–protein interactions, we chose the yeast two-hybrid approach. To this end, we amplified the cDNA of human NDUFC2 by reverse transcription PCR with high-fidelity Pfu polymerase, using commercially available muscle RNA (Ambion) as template. We then cloned it into the pGEM-T vector (Promega). All mtDNA-encoded proteins used in this assay are coded by the mtDNA genetic code. We therefore recoded the human ND3 and ND5 genes to comply with general cytoplasmic codons (GenScript) and used the recoded ND4 and ND1 genes that were already available to us.45 The sequences of the recoded genes are available in Supplementary Materials. Protein–protein interactions were analyzed using the commercial GAL4 Two Hybrid vector kit (Stratagene), in accordance with the manufacturer's instructions. Plasmids were constructed using conventional PCR recombination methods. In brief, the transcripts of NDUFC2 or NDUFA1 were fused to a plasmid containing the GAL4 activating domain (pAD), while ND3, ND4, NDUFC2, and ND1 were fused to a plasmid containing the GAL4 binding domain (pBD). This generated the plasmids pAD-NDUFC2, pAD-NDUFA1, pBD-ND3, pBD-ND4, pBD-NDUFC2, and pBD-ND1, respectively, using a homologous recombination-based cloning procedure with six primers described in Table S5. The yeast strain PJ69-4A (MATa trp1-901 leu2-3,112 ura3-52 his3-200 gal4Δ gal80Δ GAL2-ADE2 met2::GAL7-lacZ LYS2::GAL1-HIS3) was cotransformed with plasmid pAD-NDUFC2 and either plasmid pBD-ND3 or plasmid pBD-ND4. The same procedure was applied for cotransformation with plasmid pAD-NDUFA1 and either plasmid pBD-ND1, plasmid pBD-ND4, or plasmid pBD-NDUFC2. Yeast cells harboring the plasmids were cultured in synthetic medium lacking tryptophan and leucine. Cells with an appropriate optical density (3e−6 cells/ml) were serially diluted, and positive interactions were detected by yeast growth on a medium lacking histidine.

Protein extraction

Overnight yeast cultures were centrifuged at 3220 rcf, and the pellet was resuspended in 200 μl of 20% trichloroacetic acid. Cell lysate was obtained by vigorously vortexing the pellet for 5 min with 425- to 600-μm acid-washed glass beads (Sigma). The cell lysate was centrifuged at 16,000 rcf. The pellet containing aggregated proteins was resuspended in loading buffer following 5 min of incubation at 95 °C, in preparation for electrophoresis.

Western blot analysis

To assess the protein expression of the fused pBD constructs, we first extracted proteins from yeast as described above. After resuspending the precipitated proteins, we loaded the samples onto a 10% polyacrylamide gel and ran them for 1 h at 120 V. The proteins were transferred onto a nitrocellulose membrane at 400 mA for 1 h using a Bio-Rad device. We blocked the membrane with 5% milk in TBS-T [0.02 M Tris-buffered saline (pH 7.6) and 0.1% Tween] for 1 h, followed by three 10-min washes in TBS-T. For detection, we incubated the membrane with a rabbit polyclonal IgG antibody raised against the Gal4 Binding Domain (a generous gift from Prof. Michal Shapira of Ben-Gurion University of the Negev; Santa Cruz Biotechnology). The membrane was incubated overnight at 4 °C with a 1:500 dilution of the primary antibody in 1% milk in TBS-T. After three 10-min washes with TBS-T, we incubated the membrane with the secondary antibody, peroxidase-labeled goat anti-rabbit IgG (KPL), diluted 1:50,000 in 1% milk–TBS-T solution. After three further 10-min washes with TBS-T, we used the EZ-ECL chemiluminescence detection kit (Biological Industries) for reaction with the peroxidase conjugate. After 5 min of incubation with the substrate, signals were visualized with a 3-min exposure on the LAS-300 Intelligent Dark Box (Rhenium; Fujifilm).

Supplementary information

Additional information as well as supplementary tables and figures can be accessed at: http://www.webclu.bio.wzw.tum.de/oxphos/
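The construction and filtering of the concatenated subunit-pair alignments, described under Test data set and multiple-sequence alignments and reused under Prediction of interacting subunits, can be sketched as below. The thresholds (>95% identity, >25% gaps, fewer than 10 sequences) come from the text. Everything else is an assumption: the input format (one aligned row per species for each subunit), the greedy order in which redundant rows are discarded, and the gap-excluding identity definition. The function names are hypothetical.

```python
from typing import Dict, Optional

GAP = "-"


def identity(s: str, t: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap
    (one common definition; the text does not state which one was used)."""
    compared = [(a, b) for a, b in zip(s, t) if a != GAP and b != GAP]
    if not compared:
        return 0.0
    return sum(a == b for a, b in compared) / len(compared)


def concatenate_pair(aln_a: Dict[str, str], aln_b: Dict[str, str],
                     max_identity: float = 0.95,
                     max_gap_fraction: float = 0.25,
                     min_sequences: int = 10) -> Optional[Dict[str, str]]:
    """Join the aligned rows of two subunits species by species, then filter.

    aln_a and aln_b map a species name to that species' aligned row.
    Returns None when too few rows survive the filters.
    """
    # Keep only species that contribute a sequence to both subunits.
    joined = {sp: aln_a[sp] + aln_b[sp] for sp in aln_a if sp in aln_b}
    # Remove rows in which more than 25% of positions are gaps.
    joined = {sp: row for sp, row in joined.items()
              if row and row.count(GAP) / len(row) <= max_gap_fraction}
    # Greedy redundancy filter: keep a row only if it is at most 95% identical
    # to every row kept before it (input order decides which duplicate survives).
    kept: Dict[str, str] = {}
    for sp, row in joined.items():
        if all(identity(row, other) <= max_identity for other in kept.values()):
            kept[sp] = row
    # Discard the subunit pair entirely if fewer than 10 sequences remain.
    return kept if len(kept) >= min_sequences else None
```

With these assumptions, a pair whose filtered alignment keeps at least 10 species is passed on to coevolution scoring; otherwise it is dropped, mirroring how the Methods reduced the test set to 95 protein pairs.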

Links

SELECTON web interface to assess signatures of selection in proteins (http://SELECTON.tau.ac.il/overview.html). Our final method, called ComplexCorr, is available at http://webclu.bio.wzw.tum.de/complexcorr. Additional information as well as supplementary tables and figures can be accessed at: http://webclu.bio.wzw.tum.de/oxphos
NCBI (www.ncbi.nlm.nih.gov)
Swiss-Prot (www.expasy.ch/sprot)
ENSEMBL (www.ensembl.org)
ClustalW (www.clustal.org)

Acknowledgements

This work was funded by grants from the Israel Science Foundation and the Bikurah FIRST Foundation (D.M.), by the Deutsche Forschungsgemeinschaft (A.F.), and by the EU Biosapiens Network of Excellence (D.F.). The authors also wish to thank the Negev Foundation for the scholarship of M.G. The generation and evaluation of the ND4 vector were financed by the French Agence Nationale pour la Recherche (ANR)/Maladies Rares (M.C.-D.). The authors express their special thanks to Drs. Nili Avni (The Biblical Zoo) and Tzachi Eizenberg (The Monkey Park) for providing blood and tissue samples of primates to the study, and to Prof. Michal Shapira (Ben-Gurion University of the Negev) for providing the antibodies for Western blot analysis.

References

1. Zickermann, V., Drose, S., Tocilescu, M. A., Zwicker, K., Kerscher, S. & Brandt, U. (2008). Challenges in elucidating structure and mechanism of proton pumping NADH:ubiquinone oxidoreductase (complex I). J. Bioenerg. Biomembr. 40, 475–483.
2. Schapira, A. H. (2006). . Lancet, 368, 70–82.
3. Gabaldon, T., Rainey, D. & Huynen, M. A. (2005). Tracing the evolution of a large protein complex in the , NADH:ubiquinone oxidoreductase (complex I). J. Mol. Biol. 348, 857–870.
4. Belogrudov, G. I. & Hatefi, Y. (1996). Intersubunit interactions in the bovine mitochondrial complex I as revealed by ligand blotting. Biochem. Biophys. Res. Commun. 227, 135–139.
5. Efremov, R. G., Baradaran, R. & Sazanov, L. A. (2010). The architecture of respiratory complex I. Nature, 465, 441–445.
6. Torres-Bacete, J., Sinha, P. K., Castro-Guerrero, N., Matsuno-Yagi, A. & Yagi, T. (2009). Features of subunit NuoM (ND4) in Escherichia coli NDH-1: topology and implication of conserved Glu144 for coupling site 1. J. Biol. Chem. 284, 33062–33069.
7. Vogel, R. O., Smeitink, J. A. & Nijtmans, L. G. (2007). Human mitochondrial complex I assembly: a dynamic and versatile process. Biochim. Biophys. Acta, 1767, 1215–1227.
8. Lynch, M., Koskella, B. & Schaack, S. (2006). Mutation pressure and the evolution of organelle genomic architecture. Science, 311, 1727–1730.
9. Grossman, L. I., Wildman, D. E., Schmidt, T. R. & Goodman, M. (2004). Accelerated evolution of the electron transport chain in anthropoid primates. Trends Genet. 20, 578–585.
10. Rand, D. M., Haney, R. A. & Fry, A. J. (2004). Cytonuclear coevolution: the genomics of cooperation. Trends Ecol. Evol. 19, 645–653.
11. Sackton, T. B., Haney, R. A. & Rand, D. M. (2003). Cytonuclear coadaptation in Drosophila: disruption of cytochrome c oxidase activity in backcross genotypes. Evol. Int. J. Org. Evol. 57, 2315–2325.
12. Willett, C. S. & Burton, R. S. (2004). Evolution of interacting proteins in the mitochondrial electron transport system in a marine copepod. Mol. Biol. Evol. 21, 443–453.
13. Mishmar, D., Ruiz-Pesini, E., Mondragon-Palomino, M., Procaccio, V., Gaut, B. & Wallace, D. C. (2006). Adaptive selection of mitochondrial complex I subunits during primate radiation. Gene, 378, 11–18.
14. Mishmar, D. & Zhidkov, I. (2010). Evolution and disease converge in the mitochondrion. Biochim. Biophys. Acta, 1797, 1099–1104.
15. Zhidkov, I., Livneh, E. A., Rubin, E. & Mishmar, D. (2009). mtDNA mutation pattern in tumors and human evolution are shaped by similar selective constraints. Res. 19, 576–580.
16. Gobel, U., Sander, C., Schneider, R. & Valencia, A. (1994). Correlated mutations and residue contacts in proteins. Proteins, 18, 309–317.
17. Pazos, F., Helmer-Citterich, M., Ausiello, G. & Valencia, A. (1997). Correlated mutations contain information about protein–protein interaction. J. Mol. Biol. 271, 511–523.
18. Yeang, C. H. & Haussler, D. (2007). Detecting coevolution in and among protein domains. PLoS Comput. Biol. 3, e211.
19. Madaoui, H. & Guerois, R. (2008). Coevolution at protein complex interfaces can be detected by the complementarity trace with important impact for predictive docking. Proc. Natl Acad. Sci. USA, 105, 7708–7713.
20. Filizola, M., Olmea, O. & Weinstein, H. (2002). Prediction of heterodimerization interfaces of G-protein coupled receptors with a new subtractive correlated mutation method. Protein Eng. 15, 881–885.
21. Pazos, F. & Valencia, A. (2002). In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins, 47, 219–227.
22. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H. et al. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
23. Olmea, O. & Valencia, A. (1997). Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold. Des. 2, S25–S32.
24. Fodor, A. A. & Aldrich, R. W. (2004). Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins, 56, 211–221.
25. Dekker, J. P., Fodor, A., Aldrich, R. W. & Yellen, G. (2004). A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments. Bioinformatics, 20, 1565–1572.
26. Halperin, I., Wolfson, H. & Nussinov, R. (2006). Correlated mutations: advances and limitations. A study on fusion proteins and on the Cohesin–Dockerin families. Proteins, 63, 832–845.
27. Grana, O., Baker, D., MacCallum, R. M., Meiler, J., Punta, M., Rost, B. et al. (2005). CASP6 assessment of contact prediction. Proteins, 61, 214–224.
28. Izarzugaza, J. M., Grana, O., Tress, M. L., Valencia, A. & Clarke, N. D. (2007). Assessment of intramolecular contact predictions for CASP7. Proteins, 69, 152–158.
29. Wong, K. M., Suchard, M. A. & Huelsenbeck, J. P. (2008). Alignment uncertainty and genomic analysis. Science, 319, 473–476.
30. Notredame, C., Higgins, D. G. & Heringa, J. (2000). T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217.
31. Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
32. Sazanov, L. A. & Walker, J. E. (2000). Cryo-electron crystallography of two sub-complexes of bovine complex I reveals the relationship between the membrane and peripheral arms. J. Mol. Biol. 302, 455–464.
33. Potluri, P., Davila, A., Ruiz-Pesini, E., Mishmar, D., O'Hearn, S., Hancock, S. et al. (2009). A novel NDUFA1 mutation leads to a progressive mitochondrial complex I-specific neurodegenerative disease. Mol. Genet. Metab. 96, 189–195.
34. Gershoni, M., Templeton, A. R. & Mishmar, D. (2009). Mitochondrial bioenergetics as a major motive force of speciation. BioEssays, 31, 642–650.
35. Hirst, J., Carroll, J., Fearnley, I. M., Shannon, R. J. & Walker, J. E. (2003). The nuclear encoded subunits of complex I from bovine heart mitochondria. Biochim. Biophys. Acta, 1604, 135–150.
36. Yadava, N., Potluri, P., Smith, E. N., Bisevac, A. & Scheffler, I. E. (2002). Species-specific and mutant MWFE proteins. Their effect on the assembly of a functional mammalian mitochondrial complex I. J. Biol. Chem. 277, 21221–21230.
37. Bustamante, C. D., Fledel-Alon, A., Williamson, S., Nielsen, R., Hubisz, M. T., Glanowski, S. et al. (2005). Natural selection on protein-coding genes in the . Nature, 437, 1153–1157.
38. Nielsen, R., Hubisz, M. J., Hellmann, I., Torgerson, D., Andrés, A. M., Albrechtsen, A. et al. (2009). Darwinian and demographic forces affecting human protein coding genes. Genome Res. 19, 838–849.
39. Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. (2009). Identification of direct residue contacts in protein–protein interaction by message passing. Proc. Natl Acad. Sci. USA, 106, 67–72.
40. Hunte, C., Zickermann, V. & Brandt, U. (2010). Functional modules and structural basis of conformational coupling in mitochondrial complex I. Science, 329, 448–451.
41. Fuchs, A., Kirschner, A. & Frishman, D. (2009). Prediction of helix–helix contacts and interacting helices in polytopic membrane proteins using neural networks. Proteins, 74, 857–871.
42. Miyata, T., Miyazawa, S. & Yasunaga, T. (1979). Two types of amino acid substitutions in protein evolution. J. Mol. Evol. 12, 219–236.
43. McLachlan, A. D. (1971). Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. J. Mol. Biol. 61, 409–424.
44. Wang, X., Carre, W., Zhou, H., Lamont, S. J. & Cogburn, L. A. (2004). Duplicated Spot 14 genes in the chicken: characterization and identification of polymorphisms associated with abdominal fat traits. Gene, 332, 79–88.
45. Ellouze, S., Augustin, S., Bouaita, A., Bonnet, C., Simonutti, M., Forster, V. et al. (2008). Optimized allotopic expression of the human mitochondrial ND4 prevents blindness in a rat model of mitochondrial dysfunction. Am. J. Hum. Genet. 83, 373–387.