Identification of Core Modules and Genes in Rheumatoid Arthritis Following Infliximab Therapy
Int J Hum Genet, 18(2): 172-179 (2018)
© Kamla-Raj 2018
DOI: 10.31901/24566330.2018/18.2.700

Jin Yin1, Li-Li Yang2 and Shi-Xiang Ren3

1Department of Orthopedics, The First Hospital of Kunming Calmette International Hospital, Kunming, 650000, Yunnan Province, China
2Spine Surgery, The Second Hospital of Jilin University, Changchun, 130000, Jilin Province, China
3Department of Orthopedics, Beijing Chaoyang Hospital Affiliated to Capital Medical University, Beijing, 100020, China

Address for correspondence:
Shi-Xiang Ren
Department of Orthopedics, Beijing Chaoyang Hospital Affiliated to Capital Medical University,
No.8 on Gongti South Road, Beijing, 100020, China
Telephone and Fax: 86-010-85231228
E-mail: [email protected]

KEYWORDS Attractor. Bioinformatics. Biomarker. Protein-Protein Interaction Network. Target

ABSTRACT The aim of this work was to investigate core modules and genes in rheumatoid arthritis (RA) following infliximab (IFX) therapy by combining systematic tracking of modules with the attract method. Core modules were determined by the attract method between the IFX group (ION) and the control group (CON). As a result, a total of 15 and 13 candidate modules were obtained for the IFX group and the control group, respectively. When matching candidate modules across the two groups, the researchers obtained 8 module pairs and named them modules. In detail, Module 1 had the highest differential MCD (ΔC), ΔC = 0.041. The attract method identified 2 core modules (Module 3 and Module 6) and 9 core genes (POLE2, CDC45, DLGAP5, KIF11, NCAPG, RPS5, RPL18A, RPL35 and RPS19). The findings might offer insight into the molecular mechanism underlying IFX therapy and provide potential biomarkers for the treatment and prognosis of RA.

INTRODUCTION

Tumor necrosis factor alpha (TNF-α) plays a crucial role in the pathogenesis of rheumatoid arthritis (RA), as proved by the clinical benefit of TNF-α-neutralizing therapy with either a TNF-α type II receptor–IgG1 fusion protein or a chimeric monoclonal antibody against TNF-α, such as infliximab (IFX) (Velascovelázquez et al. 2017). Generally, IFX is administered by intravenous infusion, typically at six- to eight-week intervals, and cannot be given by mouth because the digestive system would destroy the drug (Hemperly and Vande 2018). For RA patients, IFX seems to work by preventing TNF-α from binding to its receptor in the cells, but the specific molecular mechanism of the process is unclear. With the development of high-throughput technology and gene data analysis over the past decade, rapid progress has been made in discovering genetic associations with certain diseases (Bastida et al. 2018; Pegoraro and Misteli 2017). Gene expression studies reveal a complex, heterogeneous immune inflammatory response in immune-mediated inflammatory diseases, yet common signatures are characteristic of specific autoimmune diseases (Takeuchi 2017). Hence, using microarray data of RA patients may be a good way to predict response to IFX accurately and reliably, and further to uncover the functional mechanism of this drug.

Objectives

In this paper, to identify core modules and genes in RA following IFX therapy, the systematic tracking of modules and the attract method were combined. Firstly, the objective network for the IFX group (ION) and the objective network for the control group (CON) were constructed based on gene expression data, protein-protein interaction (PPI) data, and the Spearman correlation coefficient (SCC). Subsequently, modules were detected by calculating the module correlation density (MCD) between any pair of candidate modules, which were identified by the clique-merging algorithm. Finally, core modules were determined using the attract method from modules between the IFX group and the control group, and genes in the core modules were defined as core genes. The results might provide potential biomarkers for the detection and therapy of IFX-treated RA patients, and offer insight into the underlying molecular mechanisms of this process.

MATERIAL AND METHODS

Preparing Data

Gene Expression Data

In this paper, gene expression profiles with accession number E-GEOD-57405 (Rosenberg et al. 2014) for RA patients following IFX therapy were collected from the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/). E-GEOD-57405 comprised 19 RA samples before IFX treatment (control group) and 31 RA samples after IFX treatment (IFX group or experimental group), and was deposited on A-GEOD-13158 - [HT_HG-U133_Plus_PM] Affymetrix HT HG-U133+ PM Array Plate. To control the quality of the data, standard pre-treatments were conducted, including background correction, normalization, probe correction, and summarization of expression values (Bolstad et al. 2003; Irizarry et al. 2003). After converting the preprocessed probe-level data into gene symbols and removing duplicates, the researchers obtained a total of 17,352 genes in the gene expression data for subsequent analysis.

PPI Data

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) provides a critical assessment and integration of PPIs, including direct (physical) as well as indirect (functional) associations (Szklarczyk et al. 2014). Hence, the researchers acquired all human PPIs from the STRING database, including 16,730 genes and 1,048,576 interactions. Subsequently, genes or interactions without expression values and duplicated self-loops were removed, and interactions with a score < 0.2 were also discarded. A total of 5,665 genes and 28,176 interactions were retained. To make these interactions more reliable and more relevant to RA, the researchers extracted interactions whose two nodes both belonged to the gene expression data. Ultimately, 12,899 PPIs involving 3,332 nodes were obtained and denoted as the PPI data for further use in this study. Note that the PPI data for the IFX group and the control group were different due to the differences between IFX-treated samples and control samples.

Constructing ION and CON

Since there might be a number of false positive or non-effective interactions in the PPI data, SCC was implemented to re-weight these interactions. Here, SCC is a measure of the correlation between two variables, giving a value between -1 and +1 inclusive (Szmidt and Kacprzyk 2010). The SCC between genes i and j, S(i, j), was calculated as:

$$S(i, j) = \frac{1}{n-1} \sum_{k=1}^{n} \left( \frac{g(i,k) - \bar{g}(i)}{\sigma(i)} \right) \cdot \left( \frac{g(j,k) - \bar{g}(j)}{\sigma(j)} \right) \tag{1}$$

where n was the number of samples in the gene expression data; g(i, k) or g(j, k) was the expression level of gene i or j in sample k under a specific condition; ḡ(i) or ḡ(j) represented the mean expression level of gene i or j; and σ(i) or σ(j) stood for the standard deviation for the specific condition. If S(i, j) had a positive value, there was a positive linear correlation between i and j. For a PPI between i and j, the absolute SCC value was taken as its weight. Only the interactions with P < 0.05 were selected to construct the ION for the IFX group and the CON for the control group.

Identifying Modules

Systematic inference of modules between the IFX group and the control group comprised two steps: identifying candidate modules from the ION and CON using the clique-merging algorithm (Liu et al. 2009; Srihari and Leon 2013); and extracting modules from the candidate modules based on MCD and module pair matching (Srihari and Ragan 2013).

Exploring Candidate Modules

The clique-merging algorithm worked in two steps: in the first step, it found all the maximal cliques in the ION and CON, and in the second step, it merged highly overlapping cliques (Liu et al. 2014). Maximal cliques were determined by the cliques algorithm, which used a depth-first search strategy to enumerate all maximal cliques and effectively pruned non-maximal cliques during the enumeration process (Tomita et al. 2006). Cliques with too few genes were difficult and meaningless to study, and thus the researchers discarded cliques with fewer than 4 nodes (Sriganesh and Ragan 2013). In addition, some maximal cliques overlapped with one another, and the highly overlapping ones had to be integrated to reduce the result size. For each clique, the algorithm checked whether there was another clique with an overlap score higher than J, where J = 0.5 was the predefined overlap threshold (Srihari et al. 2013). If such a clique existed, the two cliques would be removed or merged. The refined maximal cliques were denoted as candidate modules. In particular, candidate modules for the IFX group were identified from the ION and, similarly, candidate modules for the control group from the CON.

Evaluating Modules

For the purpose of evaluating candidate modules in the IFX group and the control group, the MCD, d, of each candidate module under a specific condition was calculated as follows:

∑ S((i, j), M)

identified through the F-statistic. For gene i, F(i) was computed as:

$$F(i) = \frac{\frac{1}{K-1} \sum_{k=1}^{K} r_k \left[ u_k^{(i)} - u^{(i)} \right]^2}{\frac{1}{N-K} \sum_{k=1}^{K} \sum_{v=1}^{r_k} \left[ u_{vk}^{(i)} - u_k^{(i)} \right]^2} \tag{3}$$

where u_{vk}^{(i)} represented the expression value of gene i in replicate sample v = 1, …, r_k for each cell type k = 1, …, K; u_k^{(i)} and u^{(i)} stood for the cell-type and overall terms of the mixed-effect model; and N was the total number of samples. Large values of the F-statistic indicated a strong association, whereas a small F-statistic suggested that the gene showed minimal cell type-specific expression changes. To make the F-statistic more reliable, the researchers used a t-test to correct the log2-transformed F-statistics and obtain a P value for each potentially shared module originating from synexpression groups. After adjusting the P values on the basis of the false discovery rate (FDR) (Benjamini and Hochberg 1995), the researchers defined the modules with P < 0.05 as core modules between the IFX and control groups, and the genes in core modules as core genes.
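The re-weighting step described under "Constructing ION and CON" can be illustrated with a minimal sketch. It is not the authors' code: it assumes that Eq. (1) is applied to rank-transformed expression values (which is what makes the coefficient a Spearman correlation), it omits the P < 0.05 filter, and the gene names, the toy expression values and the helper names `ranks`, `correlation` and `weight_edges` are all hypothetical.

```python
from statistics import mean, stdev


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def correlation(x, y):
    """Eq. (1): sum of products of z-scores divided by n - 1.

    Raises ZeroDivisionError for a constant vector (standard deviation 0).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)


def weight_edges(expression, edges):
    """Map each PPI edge with two profiled genes to |S(i, j)|."""
    weights = {}
    for i, j in edges:
        if i in expression and j in expression:
            s = correlation(ranks(expression[i]), ranks(expression[j]))
            weights[(i, j)] = abs(s)
    return weights


# Toy data (made up): one expression vector per gene, one value per sample.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 5.0, 9.0, 20.0],
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],
}
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_X")]
weights = weight_edges(expression, edges)
```

In this sketch an edge whose endpoint has no expression profile (here the hypothetical `GENE_X`) is simply dropped, mirroring the removal of interactions without expression values. Running the function once on the IFX samples and once on the control samples would give the two sets of edge weights.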
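The two steps under "Exploring Candidate Modules" (enumerate maximal cliques, then merge highly overlapping ones) can be sketched as below. This is an illustration under stated assumptions rather than the published clique-merging algorithm: maximal cliques are enumerated with Bron-Kerbosch with pivoting, the overlap score is assumed to be the Jaccard index (the text does not define it), and overlapping cliques are always merged by union, whereas the text allows either removal or merging. The function names and the toy graph are hypothetical; only the minimum size of 4 and the threshold J = 0.5 come from the text.

```python
from itertools import combinations


def build_graph(groups):
    """Build an undirected adjacency map in which every group is fully connected."""
    adj = {}
    for group in groups:
        for u, v in combinations(group, 2):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def jaccard(a, b):
    """Assumed overlap score: shared nodes divided by all nodes."""
    return len(a & b) / len(a | b)


def merge_overlapping(cliques, min_size=4, threshold=0.5):
    """Drop cliques below min_size, then union pairs whose overlap exceeds threshold."""
    modules = [set(c) for c in cliques if len(c) >= min_size]
    merged = True
    while merged:
        merged = False
        for a in range(len(modules)):
            for b in range(a + 1, len(modules)):
                if jaccard(modules[a], modules[b]) > threshold:
                    modules[a] |= modules[b]
                    del modules[b]
                    merged = True
                    break
            if merged:
                break
    return modules


# Toy network: two 5-cliques sharing 4 nodes, a triangle, and a separate 4-clique.
adj = build_graph([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [7, 8, 9], [10, 11, 12, 13]])
cliques = maximal_cliques(adj)
modules = merge_overlapping(cliques)
```

On the toy network the triangle is discarded for having fewer than 4 nodes, and the two 5-cliques (Jaccard overlap 4/6) collapse into a single candidate module.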
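The gene-level F-statistic and the FDR adjustment described above can be sketched as follows. The sketch makes two simplifying assumptions: plain group means stand in for the terms of the mixed-effect model, and the intermediate t-test on log2-transformed F-statistics is skipped, so the P values passed to the adjustment are made-up inputs. The function names `f_statistic` and `benjamini_hochberg` are hypothetical.

```python
def f_statistic(groups):
    """One-way F-statistic for one gene.

    groups holds one list of expression values per cell type or condition.
    Assumption: group means replace the mixed-effect model estimates.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    overall = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    between = sum(len(g) * (m - overall) ** 2 for g, m in zip(groups, means)) / (k - 1)
    within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n - k)
    return between / within


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: one gene measured in two groups of three samples each.
f_value = f_statistic([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])

# Made-up module-level P values; modules below 0.05 after adjustment are kept.
module_pvalues = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(module_pvalues)
core = [i for i, p in enumerate(adjusted) if p < 0.05]
```

With these made-up inputs only the first module stays below 0.05 after adjustment, which shows how the correction can remove modules that would pass on their raw P values.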