Intrinsic-Overlapping Co-Expression Module Detection with Application to Alzheimer's Disease

Hazel Nicolette Manners (a), Swarup Roy (b, a, ∗), Jugal K Kalita (c)

(a) Dept of Information Technology, North Eastern Hill University, Shillong, Meghalaya, India
(b) Department of Computer Applications, Sikkim University, Gangtok, Sikkim, India
(c) Dept of Computer Science, University of Colorado, Colorado Springs, USA

∗ Corresponding author

Preprint submitted to Elsevier, November 1, 2018

Abstract

Genes interact with each other and may cause perturbation in the molecular pathways leading to complex diseases. Often, instead of any single gene, a subset of genes interact, forming a network, to share common biological functions. Such a subnetwork is called a functional module or motif. Identifying such modules, and the central key genes in them that may be responsible for a disease, may help design patient-specific drugs. In this study, we consider the neurodegenerative Alzheimer's Disease (AD) and identify potentially responsible genes from functional motif analysis. We start from the hypothesis that central genes in genetic modules are more relevant to the disease under investigation, and we identify hub genes from the modules as potential marker genes. Motifs or modules are often non-exclusive or overlapping in nature. Moreover, they sometimes show intrinsic or hierarchical distributions with overlapping functional roles. To the best of our knowledge, no prior work handles both situations in an integrated way. We propose a non-exclusive clustering approach, CluViaN (Clustering Via Network), that can detect intrinsic as well as overlapping modules from gene co-expression networks constructed using microarray expression profiles. We compare our method with existing methods to evaluate the quality of the modules extracted. CluViaN reports the presence of intrinsic and overlapping motifs in different species that have not been reported by other researchers. We further apply CluViaN to extract significant AD-specific modules and rank them based on the number of genes from a module involved in the disease pathways. Finally, top central genes are identified by topological analysis of the modules. We use data from two different AD phenotypes for experimentation. We observe that central genes, namely PSEN1, APP, NDUFB2, NDUFA1, UQCR10, PPP3R1 and a few more, play a significant role in AD. Interestingly, our experiments also find a hub gene, PML, which has recently been reported to play a role in plasticity, circadian rhythms and the response to proteins which can cause neurodegenerative disorders. MUC4, another hub gene that we found experimentally, is yet to be investigated for its potential role in AD. A software implementation of CluViaN in Java is available for download at https://sites.google.com/site/swarupnehu/publications/resources/CluViaNSoftware.rar.

Keywords: Functional Module, Clustering, Overlapping, Intrinsic, Co-expression, Alzheimer's Disease (AD).

1. Introduction

Alzheimer's disease is a neurodegenerative disease. It is a common form of dementia that leads to memory-related problems and changes in thinking and behavior. Studying biological pathways and identifying which genes take part in pathways leading to Alzheimer's Disease (AD) may help us understand what goes wrong when the disease strikes. In turn, it may help design effective therapeutic drug molecules for AD. To date, studies have revealed three key genes that may be linked to autosomal dominant or familial early-onset AD (FAD). These three genes are amyloid precursor protein (APP), presenilin 1 (PS1) and presenilin 2 (PS2). Apolipoprotein E (ApoE) has been found to be linked to late-onset Alzheimer's disease [1].
Mutations that are linked to APP and PS proteins can lead to the production of Abeta peptides (specifically Abeta42). The FAD-linked PS1 mutation has been found to lead to Endoplasmic Reticulum (ER) stress. It is important to identify additional key genes or possible regulators responsible for AD, if any.

Biological activities inside a cell are governed by a set of influential genes or proteins that regulate one another, forming network modules called regulatory or functional modules. Biologically, a regulatory or functional module [2] is a set of genes (a group of tightly interconnected nodes) that act collectively to perform a distinct biological function [3]. A module usually performs a useful biological task, but may even be responsible for complex diseases like cancer, Alzheimer's or Parkinson's. Regulatory modules are co-expressed, co-evolved and regulated by the same set of transcription factors to respond to different conditions. Some genes may even play multiple roles and become members of more than one module [4]. Much of a cell's activity is organized as a network of interacting modules. Identifying regulatory modules is crucial in understanding cellular responses to various external or internal stimuli or signals. In turn, the process may help uncover the disease mechanisms in a living organism [5, 6]. Identifying such modules is helpful for a system-level understanding of biological and cellular processes.

Clustering is a popular data analysis tool in genomic studies using gene-expression microarrays [7] because of its ability to group co-expressed genes with similar expression patterns, offering insights into various transcriptional and biological processes [8, 9, 10]. A large amount of work has been done to cluster functionally similar genes [11, 7, 12, 13, 14] by applying different forms of classical clustering algorithms along with their variations.
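
As a rough illustration of what "co-expressed genes with similar expression patterns" means in practice, the following minimal sketch scores pairs of genes by the Pearson correlation of their expression profiles and keeps the pairs above a cutoff. This is not the proximity measure or the clustering procedure proposed in this paper. Pearson correlation is used only as a familiar stand-in, and the gene names, expression values, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: gene name -> expression level across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # follows the same trend as geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],   # no clear relation to geneA or geneB
}

THRESHOLD = 0.9  # assumed cutoff, chosen only for this example


def coexpressed_pairs(profiles, threshold):
    """Return {(gene1, gene2): correlation} for pairs with |r| >= threshold."""
    pairs = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            pairs[(g1, g2)] = r
    return pairs


pairs = coexpressed_pairs(profiles, THRESHOLD)
for (g1, g2), r in pairs.items():
    print(f"{g1} and {g2} are co-expressed (r = {r:.2f})")
```

In this toy data only geneA and geneB pass the cutoff. A classical exclusive clustering algorithm would then assign each gene to exactly one group based on such pairwise scores.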
Recent advances in biological research have revealed that some genes or proteins play multiple functional roles in a cell. For example, it has been observed that the yeast gene CMR1/YDL156W participates in many DNA-metabolism processes such as replication, repair and transcription [15]. Out of 1,628 proteins in the hand-curated yeast complex data set [16], 207 proteins participate in more than one complex. It may not be possible to describe all these complexes using disjoint or fuzzy relationships [17, 18]. The genes responsible for such proteins are, therefore, expected to participate in different functional modules or complexes. They exhibit distinct overlapping and embedded structures. Effectively finding functional modules that exhibit these features is an important step towards unveiling disease sub-modules. Further, it is observed that target factor (TF) genes, which are the possible key genes for AD or other diseases, are central genes in subgraphs associated with modules. Detecting such central genes may help prioritize potential drugs as well as further analysis of disease pathways.

To achieve our goal, we contribute the following.

• As opposed to classical clustering techniques, which normally perform exclusive clustering, we propose a new non-exclusive [19] clustering algorithm, CluViaN (Clustering Via Network), to detect functionally enriched regulatory modules from a co-expression network in the presence of overlapping and intrinsic relationships.
• We also propose a new proximity measure to construct a weighted co-expression network.
• We use CluViaN as an intermediate step to discover AD sub-modules and rank them based on AD pathway enrichment scores.
• We further analyze top-ranked modules topologically to identify central or hub genes, which are the potential key genes for AD.

We organize the rest of the paper into different sections. Prior research on module finding techniques is reported in Section 2.
We introduce a new distance measure and the CluViaN algorithm in Section 3. Performance of our method and ranking of genes responsible for AD are presented in Section 4. In Section 5, we summarize our work with concluding remarks.

2. Module Detection Techniques

When a group of genes interact with each other, deviations in the interactions among them in the molecular pathways may lead to complex diseases. Highly interactive genes in these groups, often called network modules, may contribute significantly to the biological function or disease. A network module is a subgraph derived from a gene interaction graph. A subset of functionally cohesive genes is usually topologically highly interconnected in a large gene network. As a result, reconstructing a gene-gene interaction graph or network is the first step towards module detection. The inferred network can then be used for the next step of module extraction or clustering. Next, we discuss the problem of network inference and module detection in a more formal way. A genetic network is a graph, and thus mathematical and computational tools associated with graphs can be applied to it.

Definition 1 (Gene co-regulation Network). A gene co-regulation network can be defined as a graph $T = (G, E)$, where $G$ denotes the set of $N$ genes (nodes) $\{G_1, G_2, \cdots, G_N\}$ participating in a common gene product formation process or biological process, and $E$ is the set of edges $\{e_1, e_2, \cdots, e_m\}$ that correspond to the known interrelationships among the genes. An arc between two genes has a weight signifying their relative proximity. A higher weight means that two genes are more similar in their expression profiles. A positive weight indicates that the two genes are related.

The graph may be directed or undirected in nature, and is called a Gene Regulatory Network (GRN) or Gene Co-expression Network (GCN), respectively. Inferred
