
PNAS PLUS | SYSTEMS BIOLOGY

Proteome-wide remodeling of protein location and function by stress

KiYoung Lee^a,b,1,2, Min-Kyung Sung^c,1, Jihyun Kim^a,b, Kyung Kim^a,b, Junghyun Byun^a,b, Hyojung Paik^a, Bongkeun Kim^c, Won-Ki Huh^c,2, and Trey Ideker^d,e,2

^aDepartment of Biomedical Informatics, Ajou University School of Medicine, Suwon 443-749, Republic of Korea; ^bDepartment of Biomedical Sciences, Graduate School, Ajou University, Suwon 443-749, Republic of Korea; ^cDepartment of Biological Sciences and Research Center for Functional Cellulomics, Seoul National University, Seoul 151-747, Republic of Korea; and Departments of ^dMedicine and ^eBioengineering, University of California, San Diego, La Jolla, CA 92093

Edited by Wing Hung Wong, Stanford University, Stanford, CA, and approved June 19, 2014 (received for review October 9, 2013)

Protein location and function can change dynamically depending on many factors, including environmental stress, disease state, age, developmental stage, and cell type. Here, we describe an integrative computational framework, called the conditional function predictor (CoFP; http://nbm.ajou.ac.kr/cofp/), for predicting changes in subcellular location and function on a proteome-wide scale. The essence of the CoFP approach is to cross-reference general knowledge about a protein and its known network of physical interactions, which typically pool measurements from diverse environments, against gene expression profiles that have been measured under specific conditions of interest. Using CoFP, we predict condition-specific subcellular locations, biological processes, and molecular functions of the yeast proteome under 18 specified conditions. In addition to highly accurate retrieval of previously known gold standard protein locations and functions, CoFP predicts previously unidentified condition-dependent locations and functions for nearly all yeast proteins. Many of these predictions can be confirmed using high-resolution cellular imaging. We show that, under DNA-damaging conditions, Tsr1, Caf120, Dip5, Skg6, Lte1, and Nnf2 change subcellular location and RNA polymerase I subunit A43, Ino2, and Ids2 show changes in DNA binding. Beyond specific predictions, this work reveals a global landscape of changing protein location and function, highlighting a surprising number of proteins that translocate from the mitochondria to the nucleus or from endoplasmic reticulum to Golgi apparatus under stress.

dynamic function prediction | protein translocation | DTT and MMS | systems biology | bioinformatics

Significance

Protein location and function are dependent on diverse cell states. We develop a conditional function predictor (CoFP) for proteome-wide prediction of condition-specific locations and functions of proteins. In addition to highly accurate retrieval of condition-dependent locations and functions in individual conditions, CoFP successfully discovers dynamic function changes of yeast proteins, including Tsr1, Caf120, Dip5, Skg6, Lte1, and Nnf2, under DNA-damaging stresses. Beyond specific predictions, CoFP reveals a global landscape of changes in protein location and function, highlighting a surprising number of proteins that translocate from the mitochondria to the nucleus or from endoplasmic reticulum to Golgi apparatus under stress. CoFP has the potential to discover previously unidentified condition-specific locations and functions under diverse conditions of cellular growth.

A cellular response can induce striking changes in the subcellular location and function of proteins. As a recent example, the activating transcription factor-2 (ATF2) plays an oncogenic role in the nucleus, whereas genotoxic stress-induced localization within the mitochondria gives ATF2 the ability to play tumor suppressor, resulting in promotion of cell death (1). Changes in protein location are typically identified using a variety of experimental methods [e.g., protein tagging (2), immunolabeling (3), or cellular subfractionation of target organelles followed by mass spectrometry (4)]. Although highly successful, such measurements can be laborious and time-consuming, even for a single protein (all methods except mass spectrometry) and condition (all methods).

For these reasons and others, computational prediction of protein location and function has been a very active area of bioinformatic research. Early methods attempted to infer protein function based mainly on individual protein features, such as sequence similarity or structural homology (3, 5–17). These methods range from simple sequence–sequence comparisons to profile- or pattern-based supervised learning methods. Other methods predicted protein function using gene expression data (18, 19) based on the observation that proteins with similar patterns of expressions share similar functions (20). Another class of methods is based on text mining (21, 22).

Although such methods are still widely used for annotating general protein locations or functions, the recent availability of data about large-scale molecular networks, such as protein–protein interactions, has changed the functional prediction paradigm (7, 23). In reality, proteins seldom function alone (24). Therefore, a number of network-based methods have been developed that predict location or function based on a protein’s physically interacting or functionally related partners (25). Network-based methods follow either of two distinct approaches, which we call direct vs. module-based annotation schemes. Direct annotation methods propagate protein location or function annotations over a biological network based on the assumption that nearby proteins in the network have similar functions. Module-based methods first identify groups of functionally related genes or gene products using unsupervised clustering methods and then assign a representative function to each module based on the known locations or functions of its members (25).

Notably, all of these previous methods have difficulty predicting condition-specific or dynamic behavior. The main difficulty in predicting such dynamics is the lack of known protein locations and functions under the target condition(s), which are required for generating a prediction model in the training stage. One possible solution is to find dynamic network modules in gene expression networks constructed under specific conditions (26). However, it is difficult to assign representative locations or functions to the dynamic module, and one cannot assign a location or function to other proteins not belonging to the module. Here, we describe a general approach for predicting the dynamic proteome-wide, condition-dependent locations and functions of

Author contributions: K.L., W.-K.H., and T.I. designed research; K.L. and M.-K.S. performed research; K.L. contributed new reagents/analytic tools; K.L., M.-K.S., J.K., K.K., J.B., H.P., and B.K. analyzed data; and K.L., M.-K.S., W.-K.H., and T.I. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

^1K.L. and M.-K.S. contributed equally to this work.

^2To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1318881111/-/DCSupplemental.
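The contrast between the direct and module-based annotation schemes described in the introduction can be illustrated with a minimal Python sketch. It is not the implementation of any cited method: the toy network, the protein names P1 to P5, the module memberships, and the majority-vote rule are all assumptions chosen for illustration.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label (ties broken alphabetically), or None."""
    counts = Counter(labels)
    if not counts:
        return None
    return min(counts, key=lambda lab: (-counts[lab], lab))


def direct_annotation(network, known, protein):
    """Direct scheme: vote among the annotated direct neighbors of one protein."""
    neighbors = network.get(protein, ())
    return majority_label(known[n] for n in neighbors if n in known)


def module_annotation(modules, known):
    """Module scheme: give each module's representative label to its unannotated members."""
    predictions = {}
    for module in modules:
        label = majority_label(known[p] for p in module if p in known)
        if label is None:
            continue
        for p in module:
            if p not in known:
                predictions[p] = label
    return predictions


# Hypothetical toy data (not from the paper).
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
known = {"P2": "nucleus", "P3": "nucleus", "P4": "mitochondrion"}
modules = [{"P1", "P2", "P3"}, {"P4", "P5"}]  # assumed to come from a prior clustering step

direct_p1 = direct_annotation(network, known, "P1")
module_predictions = module_annotation(modules, known)
print(direct_p1, module_predictions)
```

Both functions rely only on annotations pooled across conditions, which is why neither scheme, in this simple form, can express a condition-specific change.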
www.pnas.org/cgi/doi/10.1073/pnas.1318881111 | PNAS Early Edition

[Figure 1, panels A–H. Panel labels recoverable from the extraction: (A) known locations and functions; (B) protein–protein interactions; (C) protein characteristics (sequences, chemistry, motifs, GO terms); (D) generate features and coherence models, with static feature S and network features N and L at network distances D = 1 and D = 2; (E) generate stress-specific protein interaction networks from time-series expression profiles in 18 conditions (untreated and stresses 1–17); (F) predict conditional and dynamic locations and functions, extract dynamic locations and functions, and build landscapes of dynamic locations and functions under stresses; (G, H) performance (AUC) for locations, processes, and functions using expression, interaction, and interaction + expression features.]

Fig. 1. Proteome-wide prediction of conditional locations and functions under stresses. (A) Generally known protein functions, including 18 subcellular locations, 33 biological processes, and 22 molecular functions. (B) Yeast protein–protein interactions accumulated from several databases. (C) Static information of proteins, including sequence, chemical properties, motifs, and GO terms (single-protein features). (D) Model generation after generating static single-protein (denoted S) and network features (denoted N and L) up to network distance D = 2. The best combination of features is selected for each functional category using a divide-and-conquer k-nearest-neighbor method classifier. (E) Stress-specific interaction networks in individual conditions are generated by assigning different functional coherence scores to each interaction of a protein depending on the interactor’s similarity in time series gene expression profiles. (F) After generating the selected features from D using
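The idea in panel E, weighting each interaction by how similar the two partners' expression time series are under a given condition, can be sketched as follows. This excerpt does not give the coherence model itself, so the sketch substitutes a plain Pearson correlation rescaled to [0, 1]. The function names, protein names, and expression values are hypothetical.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def condition_weights(interactions, expression):
    """Assumed stand-in for a coherence score: correlation mapped from [-1, 1] to [0, 1]."""
    return {
        (a, b): (pearson(expression[a], expression[b]) + 1) / 2
        for a, b in interactions
    }


def weighted_vote(protein, weights, known):
    """Sum interaction weights per known partner annotation and return the top label."""
    scores = defaultdict(float)
    for (a, b), w in weights.items():
        if protein not in (a, b):
            continue
        partner = b if a == protein else a
        if partner in known:
            scores[known[partner]] += w
    if not scores:
        return None
    return max(scores, key=lambda lab: (scores[lab], lab))


# Hypothetical static network and annotations, shared by every condition.
interactions = [("P1", "P2"), ("P1", "P3")]
known = {"P2": "nucleus", "P3": "mitochondrion"}

# Hypothetical time-series profiles for two conditions.
untreated = {"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 8], "P3": [4, 3, 2, 1]}
stress = {"P1": [1, 2, 3, 4], "P2": [4, 3, 2, 1], "P3": [1, 2, 3, 4]}

label_untreated = weighted_vote("P1", condition_weights(interactions, untreated), known)
label_stress = weighted_vote("P1", condition_weights(interactions, stress), known)
print(label_untreated, label_stress)
```

The same static network yields different predictions for P1 in the two conditions because only the edge weights change, which is the intuition behind cross-referencing pooled interaction data against condition-specific expression profiles.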