Gene Correlation Network Analysis to Identify Regulatory Factors in Sepsis Zhongheng Zhang1*† , Lin Chen2†, Ping Xu3†, Lifeng Xing1, Yucai Hong1 and Pengpeng Chen1
Zhang et al. J Transl Med (2020) 18:381
https://doi.org/10.1186/s12967-020-02561-z
Journal of Translational Medicine

RESEARCH Open Access

Abstract

Background and objectives: Sepsis is a leading cause of mortality and morbidity in the intensive care unit. Regulatory mechanisms underlying disease progression and prognosis are largely unknown. The study aimed to identify master regulators of mortality-related modules, providing potential therapeutic targets for further translational experiments.

Methods: The dataset GSE65682 from the Gene Expression Omnibus (GEO) database was used for bioinformatic analysis. Consensus weighted gene co-expression network analysis (WGCNA) was performed to identify modules of sepsis. The modules most significantly associated with mortality were further analyzed to identify master regulators among transcription factors and miRNAs.

Results: A total of 682 subjects with various causes of sepsis were included in the consensus WGCNA, which identified 27 modules. The network was well preserved among different causes of sepsis. Two modules, designated the black and light yellow modules, were found to be associated with mortality outcome. Key regulators of the black and light yellow modules were the transcription factors CEBPB (normalized enrichment score = 5.53) and ETV6 (NES = 6), respectively. The five miRNAs regulating the largest numbers of genes were hsa-miR-335-5p (n = 59), hsa-miR-26b-5p (n = 57), hsa-miR-16-5p (n = 44), hsa-miR-17-5p (n = 42), and hsa-miR-124-3p (n = 38). Clustering analysis in the two-dimensional space derived from manifold learning identified two subclasses of sepsis, which showed a significant association with survival in a Cox proportional hazards model (p = 0.018).
Conclusions: The present study showed that the black and light yellow modules were significantly associated with mortality outcome. Master regulators of these modules included the transcription factors CEBPB and ETV6. Analysis of miRNA-target interactions identified significantly enriched miRNAs.

Keywords: Sepsis, Intensive care unit, Gene co-expression network, Mortality

*Correspondence: [email protected]
†Zhongheng Zhang, Lin Chen and Ping Xu contributed equally to this work
1 Department of Emergency Medicine, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, No 3, East Qingchun Road, Hangzhou 310016, Zhejiang Province, China
Full list of author information is available at the end of the article

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Background

Sepsis is defined as an organ dysfunction syndrome caused by an uncontrolled inflammatory response to infection. Sepsis is a leading cause of mortality in hospitalized patients [1, 2], and accounts for 30% of case fatality in hospitalized patients [3]. Despite the high mortality and morbidity, few agents are proven to be effective for the treatment of sepsis. Thus, more regulatory factors need to be identified to provide potential targets for the design of effective therapeutic agents.

Several studies have used transcriptome analysis to investigate potential biological pathways regulating the pathogenesis of sepsis [4–8]. These studies were performed by differential gene expression analysis, followed by enrichment analyses against established functional pathways. In these analyses, genes were tested individually. The sensitivity to identify biologically meaningful genes can be low due to multiple testing adjustment. Sepsis is a heterogeneous syndrome and its pathogenesis involves hundreds of genes. In this situation, the individual contribution of a single gene is too small to be detected with a univariate test. In most diseases, genes function via networks of co-expressed genes with similar biological functions. Thus, identification of co-expression patterns could provide further insights into sepsis-associated biological pathways.

Weighted gene co-expression network analysis (WGCNA) is a systems biology approach used for finding gene clusters with highly correlated expression levels and for relating them to phenotypic traits [9]. Rather than relating thousands of genes to the clinical trait, WGCNA focuses on the relationship between a few modules and the trait [10, 11]. To the best of our knowledge, WGCNA has been used to explore co-expression patterns in a mouse model of sepsis [12], HIV infection [13], in vitro inflammatory cells [14], and pediatric sepsis [15]. Regulatory factors, including transcription factors and miRNAs, have not been systematically explored in adult sepsis.

The present study aimed to identify gene co-expression modules in sepsis by using consensus WGCNA (consensus across different causes of sepsis, including pneumonia and abdominal sepsis). These consensus modules were related to clinical traits and tested for enrichment in functional biological pathways. Potential regulators of these modules were explored by using well-curated databases.

Materials and methods

The GEO dataset and data preprocessing

The study used the publicly available dataset GSE65682 from the Gene Expression Omnibus (GEO) database. The dataset contained 802 samples including healthy controls, non-sepsis critically ill patients and sepsis patients. Furthermore, the sepsis patients could be categorized into pneumonia sepsis (n = 192), abdominal sepsis (n = 51) and others (n = 443) based on infection site. PAXgene blood RNA was isolated at intensive-care unit (ICU) admission and whole-blood leukocyte transcriptome profiling was performed on the Affymetrix Human Genome U219 Array. Key benefits of the gene chip include increased productivity and efficiency through parallel processing, excellent gene expression accuracy and reproducibility, and complete coverage of the annotated genome. Further details of the dataset can be found in other publications [16, 17]. Raw intensity expression data were preprocessed with the Robust Multi-array Average (RMA) method [18]. An advantage of this method is that normalization occurs at the probe

Consensus weighted gene co-expression network analysis

The first step in constructing a consensus WGCNA was to choose the soft thresholding power to which co-expression similarity was raised to calculate adjacency. We chose from a set of values from 4 to 20 based on the criterion of approximate scale-free topology [10]. Since the topological overlap matrices (TOM) of different sepsis types may have different statistical properties, we performed quantile normalization over the three types of sepsis (i.e. pneumonia, abdominal and other sepsis). The consensus TOM was calculated by taking the component-wise ("parallel") minimum of the TOMs in the individual datasets, and was then used as input to hierarchical clustering. Finally, modules were identified in the resulting dendrogram using the Dynamic Tree Cut algorithm [19]. This algorithm has several advantages, such as flexibility and the capability of identifying nested clusters. Modules with similar expression profiles were merged at the threshold of 0.25.

Gene significance was defined as the Student t-test statistic for testing differential expression between sepsis and healthy controls. The significance level was adjusted for multiple testing with Bonferroni correction. The dataset also contained critically ill patients without infection, such as those with major abdominal surgery. Because critical illness and severe infection share common pathways, a comparison between sepsis and non-infectious critical illness would omit some important genes. Thus, differential expression was tested between sepsis and healthy controls.

Relating consensus module to pneumonia-specific sepsis module

Modules specific to pneumonia sepsis were identified by the method described above. Pneumonia sepsis modules were then related to consensus modules. We calculated the overlaps of each pair of pneumonia-consensus modules, and used the hypergeometric test to assign a p-value to each of the pairwise overlaps. This is also known as the cross-tabulation based comparison of modules. This method is justified by the idea that if a module is well preserved and reproducible in all types of sepsis (pneumonia, abdominal and other sepsis), this module could represent the common pathways involved in the pathogenesis of sepsis [20].

Module preservation across all three datasets was explored by pairwise comparing eigengene networks in pneumonia,