A Cluster-Based Approach for Identifying Prognostic microRNA Signatures in Digestive System Cancers

International Journal of Molecular Sciences
Article

Jun Zhou 1, Xiang Cui 1, Feifei Xiao 2 and Guoshuai Cai 1,*

1 Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA; [email protected] (J.Z.); [email protected] (X.C.)
2 Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA; [email protected]
* Correspondence: [email protected]; Tel.: +1-803-777-4120

Citation: Zhou, J.; Cui, X.; Xiao, F.; Cai, G. A Cluster-Based Approach for Identifying Prognostic microRNA Signatures in Digestive System Cancers. Int. J. Mol. Sci. 2021, 22, 1529. https://doi.org/10.3390/ijms22041529

Academic Editor: Angelo Veronese
Received: 26 December 2020
Accepted: 29 January 2021
Published: 3 February 2021

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Cancer remains the second leading cause of death worldwide. Aberrant expression of miRNA has shown diagnostic and prognostic value in many kinds of cancer. This study aims to provide a novel strategy to identify reliable miRNA signatures and develop improved cancer prognostic models from reported cancer-associated miRNAs. We proposed a new cluster-based approach to identify distinct cluster(s) of cancers and corresponding miRNAs. Further, with samples from TCGA and other independent studies, we identified prognostic markers and validated their prognostic value in prediction models. We also performed KEGG pathway analysis to investigate the functions of miRNAs associated with the cancer cluster of interest. A distinct cluster with 28 cancers and 146 associated miRNAs was identified. This cluster was enriched for digestive system cancers. Further, we identified 8 prognostic miRNA signatures for STAD, 5 for READ, 18 for PAAD, 24 for LIHC, 12 for ESCA and 18 for COAD. These miRNA signatures demonstrated strong abilities in discriminating the overall survival time between the high-risk group and the low-risk group (p-value < 0.05) in both TCGA training and test datasets, as well as in four independent Gene Expression Omnibus (GEO) validation datasets. We also demonstrated that these cluster-based miRNA signatures are superior to signatures identified in single cancers for prognosis. Our study identified significant miRNA signatures with improved prognosis accuracy in digestive system cancers. It also provides a novel strategy for cancer prognostic marker selection and offers valuable methodological directions for similar research topics.

Keywords: miRNA; cluster analysis; digestive system cancers; prognostic marker

1. Introduction

Cancer is the second leading cause of death worldwide. The global cancer burden rose to 18.1 million new cases and 9.6 million deaths in 2018; the incidence of cancer is expected to increase by 50% by 2040, with approximately 27 million new cases per year [1]. The American Cancer Society estimates that there will be about 1,806,590 cancer cases diagnosed and 606,520 deaths from cancer in the US in 2020 [2]. New diagnostic tools and treatment guidelines have been extensively studied and developed; however, the survival outcomes of some digestive tract tumors are not showing corresponding improvement [3–5]. Therefore, it is still urgent to develop more reliable prognostic methods for cancer treatment guidance, especially for some digestive tract cancers, which are often asymptomatic at the early stages [6–10].

MicroRNAs (miRNAs) have been found to be important players in cancer development.
miRNAs are a class of noncoding RNAs of about 18–22 nucleotides (nt) in length, which play key functions in the regulation of vital biological processes such as cell division and death, cellular metabolism, intracellular signaling, immunity and cell movement [11–13]. Rich evidence has confirmed the causal link between the dysregulation of miRNAs and cancer [14], and miRNA signatures for particular cancers have been identified by comparing tumor samples and healthy controls. Studies also suggested that the pattern of miRNA expression is associated with cancer type, stage, and other clinical variables [15]. Furthermore, the prognostic value of miRNAs has been reported in multiple cancers including breast cancer [16], pancreatic cancer [17], hepatocellular carcinoma [18], prostate cancer [19], lung cancer [20] and others. Moreover, valuable diagnostic and prognostic biomarkers have been identified in several specific cancers by integrating datasets from different studies [21,22].

Although extensive studies on cancer-associated miRNAs have been conducted, most attention has been paid to associations between aberrant miRNA expression and individual cancer types. Evidence shows that several key miRNAs play similar and important roles in specific groups of cancers. For example, miR-21 is associated with the survival outcomes of multiple cancers, including hepatocellular carcinoma, colon cancer and others [23,24]. Furthermore, pan-cancer analysis has also discovered similar miRNA alterations among different cancers [25]. Therefore, we hypothesized that integrative analysis of miRNA signatures based on cancer clusters will identify more systematic and reliable biomarkers with improved prognostic power.
In this study, we provide a novel cluster-based approach to identify miRNA sets for improved prediction of overall cancer survival. The detailed study design of our analysis is displayed in Figure 1. With reported aberrantly expressed miRNAs, we identified a distinct cluster including 28 cancers and 146 associated miRNAs. Digestive system cancers were enriched in the identified cluster, including stomach adenocarcinoma (STAD), esophageal carcinoma (ESCA), liver hepatocellular carcinoma (LIHC), pancreatic adenocarcinoma (PAAD), colon adenocarcinoma (COAD) and rectum adenocarcinoma (READ). In this study, we focused on these digestive system cancers and selected prognostic miRNAs by analyzing RNA-seq data from The Cancer Genome Atlas (TCGA). The prognostic value of the identified miRNA signatures was validated on data from independent studies.
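To make the idea of clustering cancers by their reported miRNAs concrete, the following is a minimal sketch and not the authors' implementation. This section does not state the distance measure, linkage rule or cut-off that were used, so the Jaccard distance, average linkage, the 0.6 threshold, the helper names and the toy cancer and miRNA labels below are all illustrative assumptions.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of miRNA labels."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage_clusters(profiles, threshold):
    """Agglomerative clustering of cancers (assumed scheme, see lead-in).

    Starts with one cluster per cancer and repeatedly merges the two clusters
    with the smallest average pairwise Jaccard distance, stopping once that
    smallest distance exceeds `threshold`.
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1, c2):
        dists = [jaccard_distance(profiles[x], profiles[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


def shared_mirnas(profiles, cluster, min_cancers):
    """miRNAs reported in at least `min_cancers` cancers of one cluster."""
    counts = {}
    for cancer in cluster:
        for mirna in profiles[cancer]:
            counts[mirna] = counts.get(mirna, 0) + 1
    return sorted(m for m, n in counts.items() if n >= min_cancers)


# Hypothetical toy input: cancer label -> set of reported dysregulated miRNAs.
toy_profiles = {
    "cancer_A": {"mir_1", "mir_2", "mir_3", "mir_4"},
    "cancer_B": {"mir_1", "mir_2", "mir_3", "mir_5"},
    "cancer_C": {"mir_1", "mir_2", "mir_4", "mir_6"},
    "cancer_D": {"mir_7", "mir_8"},
    "cancer_E": {"mir_7", "mir_8", "mir_9"},
}

clusters = average_linkage_clusters(toy_profiles, threshold=0.6)
print(clusters)
print(shared_mirnas(toy_profiles, clusters[0], min_cancers=2))
```

On this toy input, the three cancers that share most of their miRNAs end up in one cluster and the remaining two in another. In a real analysis the cut-off would need to be justified, for example by inspecting the dendrogram, rather than fixed in advance.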
Figure 1. A flow chart of the study design. First, we utilized hierarchical clustering to identify clusters of cancers and associated miRNAs. In this study, we focused on the digestive
