Network‑Based Gene Function Inference Method to Predict Optimal Gene Functions Associated with Fetal Growth Restriction
Total Page:16
File Type:pdf, Size:1020Kb
MOLECULAR MEDICINE REPORTS 18: 3003-3010, 2018 Network‑based gene function inference method to predict optimal gene functions associated with fetal growth restriction KE-JUN YE1, JIE DAI1, LING-YUN LIU2 and MENG-JIA PENG1 Departments of 1Gynaecology and Obstetrics and 2Clinical Laboratory, Ruian People's Hospital, Ruian, Zhejiang 325200, P.R. China Received January 4, 2017; Accepted July 21, 2017 DOI: 10.3892/mmr.2018.9232 Abstract. The guilt by association (GBA) principle has been Introduction widely used to predict gene functions, and a network-based approach may enhance the confidence and stability of the Fetal growth restriction (FGR), the second primary cause analysis compared with focusing on individual genes. Fetal of perinatal mortality, is a clinical entity that affects 5-10% growth restriction (FGR), is the second primary cause of peri- of gestations (1). FGR has multiple heterogeneous causes, natal mortality. Therefore, the present study aimed to predict including maternal, fetal and placental factors (2). Effective the optimal gene functions for FGR using a network-based treatments for FGR have not been proposed, apart from the GBA method. The method was comprised of four parts: interruption of pregnancy (3). Consequently, early diagnosis Identification of differentially-expressed genes (DEGs) and prevention is of importance for patients with FGR, which between patients with FGR and normal controls based on may permit the etiological identification and adequate moni- gene expression data; construction of a co-expression network toring of fetal vitality, minimizing the risks associated with (CEN) dependent on DEGs, using the Spearman correlation prematurity and intrauterine hypoxia (1,4). Therefore, the coefficient algorithm; collection of gene ontology (GO) data identification of biological markers for the early diagnosis on the basis of a known confirmed database and DEGs; and and detection of FGR is required, in order to elucidate the prediction of optimal gene functions using the GBA algorithm, molecular mechanism underlying FGR. for which the area under the receiver operating characteristic At present, numerous diseases have been attributed to curve (AUC) was obtained for each GO term. A total of the differential expression of genes compared with normal 115 DEGs and 109 GO terms were obtained for subsequent controls [differentially expressed genes (DEGs)] (5). However, analysis. All DEGs were mapped to the CEN and formed genes frequently do not function individually; rather, they 6,555 edges. The results of GBA algorithm demonstrated that interact with other genes. A network-based approach is able to 78 GO terms had a good classification performance with AUC extract informative and notable genes via biological molecular >0.5. In particular, the AUC for 5 of the GO terms was >0.7, networks, including the co-expression network (CEN), rather and these were defined as optimal gene functions, including than focusing on individual genes (6,7). Providing that gene defense response, immune system process, response to stress, connections are based on guilt-determination, predictions of cellular response to chemical stimulus and positive regulation their gene functions may be conducted utilizing guilt-by-asso- of biological process. In conclusion, the results of the present ciation (GBA) method (8). The GBA is a basic element for study provided insights into the pathological mechanism predicting gene function, and typically uses the interactions underlying FGR, and provided potential biomarkers for early between any two genes for the purpose of investigation the role detection and targeted treatment of this disease. However, of novel genes in gene function categories. the interactions between the 5 GO terms remain unclear, and Therefore, the present study took Gene Ontology (GO) further studies are required. annotations and gene expression data as study objectives, and integrated the network approach with the GBA method, termed the network-based gene function inference method, for the purpose of predicting the optimal gene functions for FGR. These gene functions may be potential biomarkers for the early detection and targeted treatment of FGR. Correspondence to: Dr Ke-Jun Ye, Department of Gynaecology and Obstetrics, Ruian People's Hospital, 108 Wansong Road, Ruian, Zhejiang 325200, P.R. China Materials and methods E-mail: [email protected] Network‑based gene function inference method. The Key words: fetal growth restriction, gene ontology, co-expression network-based gene function inference method was comprised network, guilt by association, prediction of four steps: i) Identifying DEGs between patients with FGR and normal controls using Limma based on gene expression data; ii) constructing the CEN dependent on DEGs using the 3004 YE et al: OPTIMAL GENE FUNCTIONS IN FGR Spearman correlation coefficient (SCC); iii) collecting GO (geneontology.org) (18). There were 19,003 terms and data for FGR on the basis of a known confirmed database 18,402 genes in total for human beings. Notably, only one and DEGs; and iv) predicting gene functions using the GBA category (biological process) of GO was selected to be algorithm, for which the area under the receiver operating the study objective. In subsequent steps, the GO structure characteristic curve (AUC) was calculated for each GO term. was diffused, and filtered for GO terms on size ranging An AUC of 0.5 represents classification at chance levels, while from 20 to 1,000 genes after excluding those inferred from an AUC of 1.0 represents a perfect classification. In the gene electronic annotation, a range that generally gives stable function prediction literature, AUC >0.7 is considered to be performance (8,9). In addition, to make ensure that the GO optimal (9). Therefore, GO terms with AUC >0.7 were defined terms correlated closely to FGR, if a GO term had a number as optimal gene functions for patients with FGR in the present of DEGs <20, it was removed. Therefore, only GO terms study. including ≥20 DEGs were reserved. A total of 109 GO terms involved in 115 DEGs remained to be used in the following Identifying DEGs. Gene expression data [GSE24129 (2)] for analyses. human FGR were downloaded from the Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo) using the Predicting gene function. As mentioned above, the GBA accession number. GSE24129 was deposited on an Affymetrix method was employed to predict the important gene function Human Gene 1.0 ST Array (Affymetrix; Thermo Fisher in the progression of FGR. Taking the GO functional annota- Scientific, Inc., Waltham, MA, USA), and comprised normo- tions, a multi-functionality score (MFS) was assigned to each tensive pregnancies with or without FGR. The 8 examples with gene i in the CEN (8): FGR were attributed to the case group, whereas 8 cases without FGR were denoted as normal controls. In order to control the quality of GSE24129, standard pretreatments were performed, which included background correction, normalization, probe matching and summarization (10-12). Following conversion Where Numinx was the number of genes within GO group of pretreated data at the probe level into gene symbols and x, whose weighting had the effect of giving contribution to a removal of duplications, 14,398 genes were obtained for FGR GO group; and Numoutx was the number of genes outside the in total. GO group x in the CEN, whose weighting provided a corre- Subsequently, DEGs between the FGR samples and normal sponding weight to genes outside the GO group. Therefore, as controls were identified using the Limma package (13). The the only gene outside a large GO group, the score of the only lmFit function implemented in Limma was utilized to perform gene within a GO group was added to the score of another empirical Bayes statistics and false discovery rate calibration gene. Notably, weighting referred to the impact of measuring of the P-values on the data (14,15). Only genes which met the connectivity in a group through the number of contributions thresholds of P<0.05 and |log2Fold Change| >2 were defined as of the gene to that GO group. A 3-fold cross-validation was DEGs across FGR patients and normal controls. applied to determine an MFS ranked list score for genes as to how well they fitted with the known gene set, and computed Constructing CEN. Cytoscape is an open source software the AUC values for assessing the classification performances project for integrating biomolecular interactions with between FGR samples and normal controls. To the best high-throughput expression data and other molecular states of our knowledge, AUC has been introduced as a better into a unified conceptual network (16). Therefore, the DEGs measure for evaluating the predictive ability of machine were inputted into the Cytoscape software to visualize the learning in support vector machine (SVM) models compared CEN. In order to further evaluate the cooperated strength for with assessing the clinical classification performance (19). each interaction in the CEN, the SCC method was utilized (17). Consequently, the AUC values for GO terms were obtained, SCC is a measure of the correlation between two variables, and terms with AUC >0.7 were identified to be optimal gene giving a value between -1 and +1 inclusive. If the SCC analysis functions. returned a positive value, this indicated a positive linear corre- lation between two genes; otherwise, a negative correlation Results was indicated. For an interaction between gene i and j, the absolute SCC value was denoted