International Journal of Research ISSN NO:2236-6124

A Survey on Protein-Protein Interaction Network Methods and Challenges

B Madhav Rao1*, V Srinivasa Rao2
1 Research Scholar, Rayalaseema University, Kurnool, India
2 Department of CSE, V.R. Siddhartha Engineering College, Vijayawada 520007, India
*Corresponding author, Tel.: 9949582856

ABSTRACT

Bioinformatics is one of the prominent fields for analyzing biological processes. A main objective of bioinformatics is to identify diseases and analyze their causes. Protein-protein interactions (PPIs) are used to analyze the structure of protein sequences and to visualize them in 3D. Many methodologies have been used to detect cancer-causing proteins with PPI networks. PPI networks are also used in drug discovery for particular human diseases, based on protein interactions in human interaction networks. Each method has advantages and disadvantages, so we gather the analyses and results reported for large-scale data. These can guide future directions in data modeling and analysis using machine learning, deep learning and 3D visualization techniques. In this paper, different analysis methods are surveyed with these future directions in mind.

Keywords
Bioinformatics, Protein-Protein interaction, Machine learning, deep learning, 3d visualization, network construction.

1. INTRODUCTION

Biomedical applications are one of the major areas in the research field of information technology, where biological information is identified for quality analysis. Integrated genetic and biological information can be used for gene-based discovery of diseases, through which information related to the drugs can be known. Bioinformatics is an area that relates computer science and the biological sciences. Different biological terms and information techniques are applied to understand the association of molecules over very large data, so it is also termed a management information system for the biological sciences. Many tools for molecular biology integrate computer science and mathematics to preprocess the biological data held in large disease information databases. These tools process complex biological sequences from large databases at a very fast rate.

The main goals of research in this area are to maintain the data in a way that allows access to the information in the simplest format as required, and to retrieve new information whenever it is produced. So we need to develop a tool that is useful for analyzing the data and displaying the outcome in a human-readable format.

In the field of bioinformatics, research using PPI networks has received serious consideration and attention. A PPI can be defined as a connection formed between two or more proteins as a result of biochemical or electrostatic forces. The main aim of this paper is to present the different methodologies [22] that are used in breast cancer detection using purification of gene expressions. The paper also describes the various tools used for breast cancer PPI network analysis.

Most cellular functions are carried out by two or more proteins rather than a single one. Nowadays, many

Volume VIII, Issue I, January/2019 Page No:974


protein-protein interaction [1] datasets are publicly available for analysis. Modern computational and mathematical methods are required to analyze the complex networks formed by combinations of two or more proteins. This reduces the cost and time of analyzing protein sequences, with more accuracy than conventional methods. A valuable framework is required for a better view of interactions and their functional organization. PPIs underlie biological processes that involve several proteins. A main goal in proteomics is to unravel PPIs, which helps decode the molecular functionality underlying biological processes and understand human diseases at the root level [7]. To detect protein function we need to analyze the PPI network, and target drugs are prepared using this analysis. PPI network analysis mainly considers unbound proteins in a large number of cells [5].

2. IMPORTANCE OF PROTEIN-PROTEIN INTERACTION

Genes are present in the cells of the body. About 22,000 genes give rise to more than 50,000 different proteins, which take part in biological activities such as the following:

1) Cell activities depend on signal transduction, in which extracellular signals are carried to the interior of the cell through interactions between different signaling molecules.
2) Transport across membranes occurs when one protein carries another.
3) When enzymes interact with each other they produce small molecules and macromolecules; this is known as cell metabolism.
4) Muscle contraction consists of several interactions that are related to genetic diseases.

Steps for PPI network analysis:
1. Collect the data from a biological database.
2. Use the tools to predict interactions.
3. Visualize and analyze the data.
4. Find the targeted proteins using a bioinformatics approach.
5. Design the network graph.
6. Find the characteristics of the protein modules.
7. Find the relationships between the protein modules.
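The graph-related steps of the list above can be illustrated with a minimal sketch. It assumes the interactions are already available as protein pairs (the identifiers `P1` to `P5` are hypothetical toy data, not taken from any database), and it treats connected components as "protein modules", which is a simplifying assumption rather than the module definition used by any particular tool.

```python
from collections import defaultdict

# Hypothetical toy interactions; a real study would load pairs exported
# from a PPI database instead.
INTERACTIONS = [
    ("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
    ("P4", "P5"),
    ("P2", "P1"),  # same pair reported in reverse order
]


def build_graph(pairs):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    graph = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def find_modules(graph):
    """Return connected components (assumed here to stand in for modules)."""
    seen = set()
    modules = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules


def module_summary(graph, module):
    """Characterize a module by its size and its highest-degree protein."""
    hub = max(module, key=lambda p: (len(graph[p]), p))
    return {"size": len(module), "hub": hub, "hub_degree": len(graph[hub])}


graph = build_graph(INTERACTIONS)
modules = find_modules(graph)
summaries = [module_summary(graph, m) for m in modules]
```

On the toy data this yields two modules, one with three proteins and one with two. In practice a dedicated clustering method would replace `find_modules`, since connected components of a real interactome are usually too coarse to be informative.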

fig 1: Protein complex cores detected using COACH



PROTEIN PROTEIN INTERACTION METHODS FOR ANALYZING PROTEIN SEQUENCES

3. EXPERIMENTAL METHODS

In experimental methods, protein interactions are detected using biophysical and biochemical methods such as Co-IP (co-immunoprecipitation) [2].

3.1 YEAST-TWO-HYBRID SCREENING

This method finds interactions between two proteins. In this approach the two proteins are expressed in yeast cells as a "bait" and a "prey". If bait and prey do not interact, the reporter gene is not transcribed, so whether a relation between the two proteins is present is determined from the resulting reporter expression [24]. The method produces a large number of false positives and false negatives. In addition, protein expression is not the same in yeast and humans, so the physical interactions may differ.

3.2 AFFINITY PURIFICATION OR MASS SPECTROMETRY

Affinity purification or mass spectrometry is used to find stable interactions between proteins and to detect the functional relationship between protein sequences. The technique starts with purification of the selected target protein. Generally, in this method the targeted protein is expressed in cells. With this method PPIs can be analyzed both qualitatively and quantitatively [24].

4. COMPUTATIONAL METHODS

Many methods are used for analyzing and evaluating PPI networks. Most of them are derived from data mining, soft computing and biological structures. Some of the most frequently used methods are [3][7]:

4.1 PHYLOGENETIC PROFILING

Phylogenetic profiling detects similar presence/absence patterns of protein families across a large number of genomes. The method assumes that if proteins interact, they must co-evolve. A phylogenetic profile is generated for each protein [5]. With this method we can discover the pathways for unknown enzymes.

4.2 PREDICTION OF PROTEIN PAIRS USING SIMILAR PHYLOGENETIC TREES

In this technique, phylogenetic trees are generated for the given protein dataset to detect interactions between two or more proteins [5]. Phylogenetic trees give the evolutionary history of the given proteins. To predict protein interactions, the phylogenetic approach follows the mirror tree method, which identifies co-evolution between interacting proteins and reports the degree of similarity [8].
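The two co-evolution ideas in Sections 4.1 and 4.2 can be sketched in a few lines. The first function compares binary phylogenetic profiles (1 if a protein is present in a genome, 0 otherwise) with the Jaccard index. The second scores a "mirror tree" comparison as the Pearson correlation between the upper triangles of two evolutionary distance matrices. Both scoring choices, the function names and the toy numbers are illustrative assumptions, not the procedures of the cited works.

```python
import math


def jaccard(profile_a, profile_b):
    """Jaccard similarity between two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlate two distance matrices built over the same ordered species."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical profiles over six genomes.
profile_a = [1, 1, 0, 1, 0, 1]
profile_b = [1, 1, 0, 1, 0, 0]
profile_c = [0, 0, 1, 0, 1, 0]

# Hypothetical distance matrices over four species; dist_b is dist_a scaled by 1.5.
dist_a = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
dist_b = [[0, 3, 6, 9], [3, 0, 6, 9], [6, 6, 0, 9], [9, 9, 9, 0]]

similar_profiles = jaccard(profile_a, profile_b)
unrelated_profiles = jaccard(profile_a, profile_c)
tree_score = mirror_tree_score(dist_a, dist_b)
```

A high profile similarity or a tree score close to 1 is read as evidence of co-evolution, and therefore as a hint that the two proteins may interact. It is not proof of a physical interaction.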

4.3 ROSETTA STONE METHOD

This method is based on the concept that some single-domain proteins in one organism are fused to form a multi-domain protein in another organism. The method is also called gene fusion. Common fusion events are identified, and on that basis the proteins are evaluated to decide whether the two interact. If they interact, the method detects the homologous properties of the two proteins [22]. Its limitation is that it can be applied only to proteins in which such a domain arrangement exists.

4.4 INFERENCE OF THE INTERACTION FROM HOMOLOGOUS STRUCTURE

This model considers a known protein sequence structure and its neighbor proteins that are similar to the known proteins. Based on this similarity, protein clusters are formed using the complete linkage clustering algorithm: the distances between clusters are computed and arranged in a matrix, and the nearest clusters are considered the most likely to interact.

4.5 CLASSIFICATION METHOD

Classification is a widely used method in PPI networks and belongs to the data mining techniques. Protein pairs are classified into interacting and non-interacting pairs. The most used classifiers are Support Vector Machine (SVM) and Random Forest Decision (RFD).

4.6 IDENTIFICATION OF STRUCTURAL PATTERNS

This method uses a predefined set of known protein interfaces from the Protein Data Bank (PDB). In this data set, protein interfaces are defined as pairs of polypeptide fragments whose distance is less than a threshold slightly larger than the van der Waals radius of the atoms.

4.7 DOMAIN-PAIR EXCLUSION ANALYSIS

Bayesian methods are used to identify non-specific, promiscuous interactions, but it is difficult for them to identify interactions between specific domains. In domain-pair exclusion analysis, E-scores are calculated to find interactions between domains. A low E-score indicates that interaction between proteins containing the domains is unlikely, whereas a high E-score indicates that the two domains are likely to interact. This method does not account for false positives and false negatives in the experimental data [23].

4.8 SUPERVISED LEARNING PROBLEM

Supervised learning is a machine learning technique used in data mining. It can be used to predict the PPI network: known protein interactions supervise the learning of a function that predicts whether an interaction between two proteins is present in the given protein data [9][10].

5. PPIS EXPERIMENTAL DATABASE

PPI networks consist of interactions between proteins, ranging from a single interaction to hundreds of interactions. All these data are collected in databases specifically designed for biological data storage. These databases are frequently updated in order to provide accurate and complete interaction data to users, and the number of biological databases is increasing day by day. Databases are classified into three main categories: primary databases, meta-databases and prediction databases [12].

5.1 PRIMARY DATABASES

Primary databases collect protein interaction information that has been proven to exist by experimental methods. Examples of this type of database are: Database of Interacting Proteins (DIP), Molecular Interaction Database (MINT), MIPS Mammalian Protein Protein Interaction Database (MIPS-MPPI), Biological General Repository for Interaction Datasets (BioGRID), IntAct Molecular Interaction Database, MIPS Protein Interaction Resource on Yeast (MIPS-Mpact), and Human Protein Reference Database (HPRD) [13].

5.2 META-DATABASES

Meta-databases contain the information available in primary databases, along with some extra data. Examples of meta-databases are Protein Interaction Network Analysis (PINA) and Agile Protein Interaction Data Analyzer (APID). APID is useful for representing protein sequences in a graphical view [26]; it uses the multi-stage tool apinBrowser to produce graphical analysis.

5.3 CONSENSUS PATH DATABASE

This database (CPDB) takes a very small amount of time to compute a search and combine the required results. To filter the dataset, CPDB uses a mapping standard that supports the PPI analysis tool. It accepts multiple input datasets and generates many possible outcomes. CPDB produces a scalable graph and supports dynamic graph generation for the result.

5.4 MOLECULAR INTERACTION DATABASE (MINT)

Executing a query against this database requires Java to be installed in the browser. The UI of the MINT database presents a short review of the results and the utilization of different databases. MINT does not provide a graphical representation, in order to reduce loading time.

5.5 HDOCK SERVER

HDOCK [18] is a collection of multiple components such as third-party programs, a docking algorithm and scoring functions. It accepts both sequences and structures of proteins as input. After collecting the input, it performs a sequence similarity search against the PDB sequence database to find homologous protein sequences, and then filters the sequences common to both datasets.

5.6 STACKED AUTO ENCODER

A stacked autoencoder uses an unsupervised learning algorithm to process unlabelled data in artificial neural networks. An autoencoder is developed to learn hidden structures of unlabelled data. It takes 'X' as input and produces a reconstruction 'X^' as output. Each layer is trained separately, but takes the output of the previous layer as its input before generating its own output [25].

5.7 EPSILON-CP [17]

This is a structure prediction method. It is a contact prediction method that combines evolutionary information from multiple sequence alignments with physicochemical information from structure prediction methods and with sequence-based information. Combining multiple sources of information increases the prediction accuracy when compared to CASP11. EPSILON-CP uses stacking and trains a deep neural network to derive the relationships among the data.

5.8 PROTEOME SCALE INTERACTOME NETWORK MAPS

Network maps are characterized to determine how the resulting interactions affect the functionality of the biological system. One approach, known as the disease module hypothesis, is based on the observation that disease proteins are not scattered randomly in the interactome, but form topological modules where they tend to interact more with each other than with proteins outside of this neighborhood. Efforts in interactome mapping, with integration of isoforms and protein variants as well as quantitative, spatial, and temporal information, will permit a better understanding.

6. TOOLS USED FOR ANALYSIS OF PPI NETWORKS

Many tools are available to analyze protein-protein interaction networks. Some of them are:

Cytoscape is an open source tool for constructing and analyzing small as well as complex PPI networks. It can merge multiple protein networks into a single one. It produces a 3D structure representation of the PPI network. It supports predefined tools as well as user-defined tools for analyzing the PPI network.

PathBLAST is a PPI network analyzer and search tool for comparing PPI networks across different species to detect protein pathways.

APID (Agile Protein Interaction Data Analyzer) is another tool used for PPI network analysis.

BiogridPlugin2 is used to import data from the BioGRID database. Filters can be applied to select the specific PPI network to import.

7. CONCLUSION

Various methods are available for analyzing protein-protein interactions: computational methods from data mining and soft computing, and experimental methods such as Y2H and co-IP. Various databases are available for protein-protein interaction datasets; some are based on protein structure and some are sequence based. Experimental methods give less accuracy when compared with the machine learning techniques discussed in this paper. There is a possibility for analyzing large scale



protein datasets by using deep learning techniques. This survey paper gives an overview of the methods available in PPI network analysis.

8. REFERENCES

[1] Antonio Mora, Katerina Michalickova and Ian M Donaldson, "A survey of protein interaction data and multigenic inherited disorders", BMC Bioinformatics, vol. 14, pp. 1-7, 2013.
[2] Fiona Browne, Huiru Zheng, Haiying Wang and Francisco Azuaje, "From Experimental Approaches to Computational Techniques: A Review on the Prediction of Protein-Protein Interactions", Advances in Artificial Intelligence, Hindawi, vol. 2010, pp. 1-15, 2010.
[3] Tord Berggård, Sara Linse and Peter James, "Methods for the detection and analysis of protein–protein interactions", Proteomics, vol. 7, pp. 1-10, 2007.
[4] Ulrich Stelzl, Uwe Worm, Maciej Lalowski, Christian Haening, "A human protein-protein interaction network: A resource for annotating the Proteome", Cell Press, vol. 122, issue 6, pp. 957-968, 2005.
[5] Joan Planas-Iglesias, Jaume Bonet, Javier Garcia-Garcia, Manuel A. Marin-Lopez, Elisenda Felliu and Baldo Oliva, "Understanding Protein–Protein Interactions Using Local Structural Features", JMB Article, pp. 1-3, 2013.
[6] Marco Wiltgen, "Structural Bioinformatics: From the Sequence to Structure and Function", Current Bioinformatics, pp. 1-2, 2009.
[7] Hung Xuan Ta and Liisa Holm, "Computational approaches for predicting protein interaction networks: The wiring of protein", The Biochemical Society, pp. 1-3, 2011.
[8] Sun, Tanlin, "Sequence-based prediction of protein protein interaction using a deep-learning algorithm." BMC Bioinformatics, vol. 18.1, pp. 277, 2017.
[9] Akkoyun, Emrah, and Tolga Can, "Parallelization of the functional flow algorithm for prediction of protein function using protein-protein interaction networks." High Performance Computing and Simulation (HPCS), International Conference on. IEEE, 2011.
[10] Hu, Lun, "Efficiently predicting large-scale protein-protein interactions using MapReduce." Computational Biology and Chemistry, 2017.
[11] Sun, Peng, et al. "Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach." Smart Computing (SMARTCOMP), IEEE International Conference on. IEEE, 2017.
[12] Tovchigrechko, Andrey, and Ilya A. Vakser. "GRAMM-X public web server for protein–protein docking." Nucleic Acids Research, vol. 34.suppl_2, pp. W310-W314, 2017.
[13] May, Andreas, and Martin Zacharias. "Protein–protein docking in CAPRI using ATTRACT to account for global and local flexibility." Proteins: Structure, Function, and Bioinformatics, vol. 69.4, pp. 774-780, 2017.
[14] Zacharias, Martin. "Protein–protein docking with a reduced protein model accounting for side-chain flexibility." Protein Science, vol. 12.6, pp. 1271-1282, 2003.
[15] Carter, Phil, "Protein–protein docking using 3D-Dock in rounds 3, 4, and 5 of CAPRI." Proteins: Structure, Function, and Bioinformatics, vol. 60.2, pp. 281-288, 2005.
[16] Smith, Graham R., and Michael JE Sternberg. "Evaluation of the 3D-Dock protein docking suite in rounds 1 and 2 of the CAPRI blind trial." Proteins: Structure, Function, and Bioinformatics, vol. 52.1, pp. 74-79, 2003.
[17] Stahl, Kolja, Michael Schneider, and Oliver Brock. "EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction."
[18] Yan, Yumeng, "HDOCK: a web server for protein–protein and protein–DNA/RNA docking based on a hybrid strategy." Nucleic Acids Research, 2017.
[19] Liu X, Liu B, Huang Z, Shi T, Chen Y, Zhang J. SPPS: A sequence-based method for predicting probability of protein-protein interaction partners. PLoS One. 2012; 7(1):e30938. https://doi.org/10.1371/journal.pone.0030938 PMID: 22292078
[20] Angluin D, Aspnes J, Reyzin L. Network construction with subgraph connectivity constraints. Journal of Combinatorial Optimization. 2015; 29(2):418–432. https://doi.org/10.1007/s10878-013-9603-2
[21] Gautam Dey, Tobias Meyer, "Phylogenetic Profiling for Probing the Modular Architecture of the Human Genome", Cell Systems 1, August 26, 2015, Elsevier Inc., pp. 106-115, 2015.
[22] Huang D-S, Zhang L, Han K, Deng S, Yang K, Zhang H. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression. Curr Protein Pept Sci. 2014;15:553–560.
[23] V. Srinivasa Rao, K. Srinivas, G. N. Sujini, and G. N. Sunand Kumar, "Protein-Protein Interaction Detection: Methods and Analysis", Review Article, Hindawi Publishing Corporation, International Journal of Proteomics, Volume 2014, Article ID 147648, 12 pages. http://dx.doi.org/10.1155/2014/147648
[24] Anna Brückner, Cécile Polge, Nicolas Lentze, Daniel Auerbach and Uwe Schlattner, "Yeast Two-Hybrid, a Powerful Tool for Systems Biology", Review, Int. J. Mol. Sci. 2009, 10, 2763-2788; doi:10.3390/ijms10062763
[25] Hina Umbrin, Sabi latif, "A Survey on Protein Protein Interaction (PPI) Methods, Databases, Challenges and Future Directions", 2018 International Conference on Computing, Mathematics and Engineering Technologies – iCoMET 2018.

