Hindawi BioMed Research International Volume 2021, Article ID 6649660, 8 pages https://doi.org/10.1155/2021/6649660

Research Article That Predict Poor Prognosis in Breast Cancer via Bioinformatical Analysis

Qian Zhou,1 Xiaofeng Liu,1 Mingming Lv,1 Erhu Sun,1 Xun Lu,2 and Cheng Lu 1

1 Department of Breast, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing 210004, China
2 School of Public Health, Yale University, New Haven, CT 06520, USA

Correspondence should be addressed to Cheng Lu; [email protected]

Received 16 October 2020; Revised 1 March 2021; Accepted 7 April 2021; Published 19 April 2021

Academic Editor: Hassan Dariushnejad

Copyright © 2021 Qian Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background. Breast cancer is one of the most commonly diagnosed cancers worldwide, and it is now the leading cause of cancer death among females. The aim of this study was to find differentially expressed genes (DEGs) that can predict poor prognosis in breast cancer and serve as effective therapeutic targets for breast cancer patients via bioinformatical analysis. Methods. GSE86374, GSE5364, and GSE70947 were chosen from the GEO database. DEGs between breast cancer tissues and normal breast tissues were identified with GEO2R and Venn diagram software. Then, DAVID (Database for Annotation, Visualization, and Integrated Discovery) was used to analyze these DEGs by gene ontology (GO), including molecular function (MF), cellular component (CC), and biological process (BP), and by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. Next, STRING (Search Tool for the Retrieval of Interacting Genes) was used to investigate potential protein-protein interaction (PPI) relationships among the DEGs, and these DEGs were analyzed by Molecular Complex Detection (MCODE) in Cytoscape. After that, UALCAN, GEPIA (gene expression profiling interactive analysis), and KM (Kaplan–Meier plotter) were used to obtain prognostic information, and core genes were identified. Results. There were 96 upregulated genes and 98 downregulated genes in this study. Fifty-five upregulated genes were selected as hub genes in the PPI network. After validation in UALCAN, GEPIA, and KM, 5 core genes (KIF4A, RACGAP1, CKS2, SHCBP1, and HMMR) were found to be highly expressed in breast cancer tissues and associated with poor prognosis. They were differentially expressed between different subclasses of breast cancer. Conclusion. These five genes (KIF4A, RACGAP1, CKS2, SHCBP1, and HMMR) could be potential targets for therapy in breast cancer and for prediction of prognosis on the basis of bioinformatical analysis.

1. Introduction

Breast cancer is one of the most commonly diagnosed cancers worldwide, and it is now the leading cause of cancer death among females; incidence rates for breast cancer far exceed those for other cancers in both transitioned and transitioning countries [1]. The causes of breast cancer are related to both hereditary and genetic factors such as gender, age, family history, and hormone therapy. One of the major hallmarks of cancer is the disorder of gene expression [2, 3]. RNA is a critical factor for gene expression in the development of cancer. It has various forms including protein-coding mRNA and noncoding RNAs, for example, lncRNAs and miRNAs. Recent studies show that processing of RNA is changed in cancer [4]. Different genetic conditions, such as differential gene expression, can lead to different individualized treatments and different treatment effects. Bioinformatical analysis is a method that uses gene chips in public databases to analyze the characteristics of a type of disease; it can help researchers better understand the molecular mechanisms behind different types of cancer [5–7], find new potential targets for early diagnosis and therapy [8, 9], or discover new prognostic biomarkers [10, 11]. At present, there are several predictive biomarkers for breast cancer, for example, those defining triple-negative breast cancers (TNBC), which lack estrogen receptor (ER-), progesterone receptor (PR-), and amplification of human epidermal growth factor receptor 2 (HER2-). TNBC often has a poor therapeutic response and a poor prognosis [12]; more biomarkers are needed to help predict the prognosis of TNBC and to provide targets for treating the disease. The rapid development of biological and biomedical research has made this possible. In our study, with the help of bioinformatical analysis, we extended the knowledge related to breast cancer based on various large databases to identify genes that predict poor prognosis in breast cancer, especially TNBC.

2. Materials and Methods

2.1. Microarray Data Information. In our study, the gene expression profiles were downloaded from the GEO database (Gene Expression Omnibus), a free public database that contains many gene or microarray profiles (https://www.ncbi.nlm.nih.gov/geo/). We chose GSE86374, GSE5364, and GSE70947 and obtained their expression data for breast cancer and normal breast tissues [13]. GSE86374 was based on the GPL6244 platform ([HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]), GSE5364 was based on the GPL96 platform ([HG-U133A] Affymetrix Human Genome U133A Array), and GSE70947 was based on the GPL13607 platform (Agilent-028004 SurePrint G3 Human GE 8x60K Microarray).

2.2. Data Preprocessing and Analyzing of DEGs. All raw data were processed by the online tool GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) from GEO. Genes which met the cutoff criteria of adjusted P value < 0.05 and |logFC| ≥ 1.0 were considered as DEGs. DEGs with logFC > 0 were considered upregulated genes; conversely, DEGs with logFC < 0 were considered downregulated genes. After screening, we converted GB-ACC in GSE70947 into gene symbols. Subsequently, the data were checked in online Venn software to look for common DEGs among GSE86374, GSE5364, and GSE70947 (http://bioinformatics.psb.ugent.be/webtools/Venn/).

2.3. Analysis of DEGs in GO Enrichment and KEGG Pathway of Breast Cancer. GO analysis, which means gene ontology analysis, is now widely used to annotate genes or RNA, and it is a very useful method for large-scale functional enrichment research. GO terms are classified into different categories of gene function, namely, biological process (BP), molecular function (MF), and cellular component (CC). KEGG stores a large amount of data about biological pathways, diseases, and chemical substances and is widely used nowadays. In our study, we used DAVID (https://david.ncifcrf.gov/) to analyze DEG enrichment in BP, CC, and MF and in the KEGG pathways. P < 0.01 was considered statistically significant.

2.4. Protein-Protein Interaction (PPI) Network and Node Analysis in Breast Cancer. STRING is an online tool whose full name is Search Tool for the Retrieval of Interacting Genes (https://string-db.org/) [14]. We used it to obtain information on the interactions between proteins (medium confidence, 0.4). Then, we used the app Cytoscape to examine the correlation among them and find central genes in the PPI network (degree cutoff = 2, node score cutoff = 0.2, K-core = 2, max. depth = 100).

2.5. Survival Analysis of Core Genes in Breast Cancer. UALCAN is a comprehensive web resource for analyzing cancer data [15] (http://ualcan.path.uab.edu/index.html), and GEPIA is an online tool for gene expression profiling interactive analysis (http://gepia.cancer-pku.cn/). Both provide graphs and plots depicting gene expression and patient survival information based on gene expression. KM is an online survival analysis tool to rapidly assess the effect of certain genes on cancer prognosis using microarray data (https://kmplot.com/analysis/index.php?p=service&cancer=breast) [16]. In our study, we first searched all the hub genes on the UALCAN website to identify the ones associated with poor survival. Then, we used GEPIA to check whether these core genes are differentially expressed between breast cancer and normal breast tissues. After that, we used the KM plotter to assess their association with overall survival (OS) and relapse-free survival (RFS) among breast cancer patients.

2.6. Reanalysis of Genes on UALCAN. We reanalyzed the core genes across different subclasses of breast cancer, and their correlation, using UALCAN.

3. Results

3.1. Microarray Data Information. Three profiles (GSE86374, GSE5364, and GSE70947) were chosen from the GEO database in our study. GSE86374 included 124 breast cancer samples and 35 normal breast samples, GSE5364 included 183 breast cancer samples and 13 normal breast samples, and GSE70947 included 148 breast cancer samples and 148 normal breast samples. In total, there were 455 breast cancer samples and 196 normal breast samples in our study (Table 1).

3.2. Identification of DEGs in Breast Cancer. We used the GEO2R online tool to obtain DEGs in the three datasets. Based on the criteria of adjusted P value < 0.05 and |logFC| ≥ 1.0, there were 268 upregulated and 396 downregulated genes in GSE86374, 967 upregulated and 676 downregulated genes in GSE5364, and 920 upregulated and 956 downregulated genes in GSE70947. Subsequently, online Venn diagram software was used to identify common DEGs in these three datasets. We found 194 DEGs that were significantly differentially expressed in all three datasets, 96 significantly upregulated and 98 downregulated (Figure 1).

3.3. Analysis of GO Enrichment and KEGG Pathway in Breast Cancer. We analyzed all the 194 DEGs using DAVID for GO enrichment analysis, and results showed the following. (1) In biological processes (BP), DEGs were mainly enriched in microtubule-based movement, cell adhesion, collagen fibril organization, cerebral cortex development, chemokine-mediated signaling pathway, cellular response to amino acid stimulus, cellular response to lipopolysaccharide, activation of protein kinase activity, negative regulation of smooth muscle cell proliferation, mitotic cytokinesis, positive regulation of cytokinesis, mitotic spindle assembly, regulation of

Table 1: Statistics of the three microarray datasets derived from the GEO database.

GEO accession   Breast cancer   Normal   Total   Platform
GSE86374        124             35       159     GPL6244 [HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]
GSE5364         183             13       196     GPL96 [HG-U133A] Affymetrix Human Genome U133A Array
GSE70947        148             148      296     GPL13607 Agilent-028004 SurePrint G3 Human GE 8x60K Microarray
Total           455             196      651
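The DEG screening rule of Section 2.2 can be illustrated with a minimal sketch. It assumes the conventional reading of the cutoff (adjusted P value below 0.05 together with |logFC| of at least 1.0); the function name `classify_deg` and the four example rows are hypothetical and are not taken from the GEO2R output of this study.

```python
def classify_deg(adj_p, log_fc, p_cutoff=0.05, fc_cutoff=1.0):
    """Return 'up', 'down', or None for one gene.

    Assumed rule: a gene is a DEG when adj_p < p_cutoff and
    |log_fc| >= fc_cutoff; the sign of log_fc gives the direction.
    """
    if adj_p >= p_cutoff or abs(log_fc) < fc_cutoff:
        return None
    return "up" if log_fc > 0 else "down"


# Hypothetical rows: (gene symbol, adjusted P value, logFC).
rows = [
    ("GENE_A", 0.001, 2.3),   # significant, positive logFC
    ("GENE_B", 0.030, -1.4),  # significant, negative logFC
    ("GENE_C", 0.200, 3.0),   # large change but not significant
    ("GENE_D", 0.010, 0.4),   # significant but small change
]

up = [gene for gene, p, fc in rows if classify_deg(p, fc) == "up"]
down = [gene for gene, p, fc in rows if classify_deg(p, fc) == "down"]
print("upregulated:", up)
print("downregulated:", down)
```

Applying the same rule to each dataset separately would give the per-dataset upregulated and downregulated lists that are later intersected.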

[Figure 1 image: panels (a) and (b), each a three-set Venn diagram of GSE70947, GSE5364, and GSE86374.]

Figure 1: Venn diagram of DEGs common to all three GEO datasets: (a) upregulated genes; (b) downregulated genes.
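The Venn step amounts to a set intersection across the three datasets. The sketch below is only an illustration of that operation: the helper `common_degs` is a hypothetical name, and the gene lists are placeholders rather than the actual GEO2R results.

```python
def common_degs(*gene_lists):
    """Return the genes present in every list (the centre of the Venn diagram)."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return set()
    return set.intersection(*sets)


# Placeholder upregulated lists per dataset (not the real DEG lists).
up_by_dataset = {
    "GSE86374": ["GENE_1", "GENE_2", "GENE_3"],
    "GSE5364": ["GENE_1", "GENE_2", "GENE_4"],
    "GSE70947": ["GENE_1", "GENE_2", "GENE_3", "GENE_5"],
}

common_up = common_degs(*up_by_dataset.values())
print(sorted(common_up))
```

Running the intersection separately on the upregulated and downregulated lists keeps the direction of change consistent across datasets.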

attachment of spindle microtubules to kinetochore, positive regulation of cholesterol storage, positive regulation of macrophage-derived foam cell differentiation, lipoprotein transport, mitotic spindle assembly checkpoint, and collagen catabolic process. (2) In cellular component (CC), DEGs were mainly enriched in extracellular exosome, extracellular space, proteinaceous extracellular matrix, perinuclear region of cytoplasm, midbody, kinesin complex, extracellular matrix, sarcolemma, kinetochore, mitotic spindle, spindle microtubule, and centralspindlin complex. (3) In molecular function (MF), DEGs were mainly enriched in ATP binding, calcium ion binding, heparin binding, metalloendopeptidase activity, ATPase activity, microtubule motor activity, ATP-dependent microtubule motor activity (plus-end-directed), and drug binding (P < 0.01). Then, we analyzed these DEGs in the KEGG pathway. Results showed that DEGs were mainly enriched in the PPAR signaling pathway, cell cycle, ECM-receptor interaction, p53 signaling pathway, oocyte meiosis, pathways in cancer, focal adhesion, cytokine-cytokine receptor interaction, and progesterone-mediated oocyte maturation (Figure 2, P < 0.01).

3.4. Protein-Protein Interaction (PPI) Network and Node Analysis in Breast Cancer. We imported all the DEGs into STRING and obtained information on the interactions between proteins. There were 192 nodes and 2025 edges in the PPI network. We used Cytoscape for further study. With the help of MCODE in Cytoscape, we obtained 55 hub genes among these nodes, all of which were upregulated genes (Figure 3, P < 0.05).

3.5. Analysis of Central Genes by UALCAN, GEPIA, and KM Plotter. All 55 genes were analyzed in UALCAN; results showed that 5 of them were associated with significantly worse survival (Table 2, P < 0.01). They are CKS2, HMMR, KIF4A, RACGAP1, and SHCBP1. Then, we used GEPIA to compare their expression levels between breast cancer and normal breast tissues. Results revealed that all these genes were highly expressed in breast cancer. Then, we reanalyzed these core genes on KM plotter; results showed that high expression of each was associated with significantly poorer survival (Figures 4 and 5, P < 0.05).

3.6. Reanalysis of Genes on UALCAN. We reanalyzed the core genes in different subclasses of breast cancer in UALCAN. Results showed that these five genes were differentially expressed among breast cancer subclasses including TNBC (Figure 6). CKS2, KIF4A, RACGAP1, and SHCBP1 all correlate positively with HMMR (Figure 7). The correlation between HMMR and the other four genes is 0.62, 0.73, 0.71, and 0.59, respectively.

4. Discussion

In our research, we studied GSE5364, GSE70947, and GSE86374 together. We found five core genes that were common DEGs in the three datasets and were associated with significantly poor survival. They are KIF4A, RACGAP1, HMMR, CKS2, and SHCBP1. KIF4A (kinesin family member 4A) is a member of the kinesin 4 subfamily. It is a protein-coding gene; it is highly expressed in hematopoietic tissues, thymus, fetal liver, spleen, adult thymus, and bone marrow, and lower levels

[Figure 2 image: (a) bar chart of −log10(P.adjust) for enriched BP, MF, CC, and KEGG terms; (b) dot plot of GeneRatio for the same terms, with dot size showing counts and color showing P.adjust. Terms shown: p53 signaling pathway, cell cycle, PPAR signaling pathway, CXCR chemokine receptor binding, glycosaminoglycan binding, extracellular matrix structural constituent, mitotic spindle, collagen-containing extracellular matrix, spindle, mitotic sister chromatid segregation, nuclear division, and mitotic nuclear division.]

Figure 2: GO function enrichment and KEGG pathway analysis of DEGs (P < 0.01, count > 5, DAVID).
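To make the enrichment statistic concrete, the following sketch computes an upper-tail hypergeometric P value and a gene ratio for a single term. It is an illustration under stated assumptions, not the DAVID procedure itself: DAVID reports a modified Fisher exact P value (the EASE score), which is more conservative than the plain hypergeometric tail used here. The function name `enrichment_p`, the background size, the term size, and the overlap are all hypothetical; only the total of 194 DEGs comes from the text.

```python
from math import comb


def enrichment_p(overlap, n_degs, n_term, n_background):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    X is the number of term-annotated genes seen when n_degs genes are
    drawn from n_background genes, of which n_term carry the annotation.
    """
    total = comb(n_background, n_degs)
    upper = min(n_degs, n_term)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


n_degs = 194          # number of common DEGs reported in Section 3.2
n_background = 20000  # hypothetical background gene count
n_term = 120          # hypothetical number of genes annotated to the term
overlap = 19          # hypothetical number of DEGs annotated to the term

gene_ratio = overlap / n_degs
p_value = enrichment_p(overlap, n_degs, n_term, n_background)
print(f"GeneRatio = {gene_ratio:.3f}, P = {p_value:.3g}")
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that arise when factorials of this size are handled as floats.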

Figure 3: Protein-protein interaction network of DEGs constructed by STRING and analyzed in Cytoscape. (a) There were 192 nodes and 2025 edges in the PPI network. Nodes represent proteins and edges represent interactions between proteins. Purple circles represent upregulated genes and green circles represent downregulated genes; yellow circles highlight hub genes. (b) Hub genes of DEGs (degree cutoff = 2, node score cutoff = 0.2, K-core = 2, max. depth = 100).
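Two of the network settings named above, the STRING confidence threshold of 0.4 and a K-core of 2, can be sketched in a few lines. This is a simplified illustration only: MCODE additionally weights nodes and grows clusters from seed nodes, which is not reproduced here. The function names, the node labels, and the interaction scores are hypothetical.

```python
from collections import defaultdict


def build_graph(edges, min_score=0.4):
    """Keep undirected edges whose combined score reaches min_score."""
    graph = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def k_core(graph, k=2):
    """Repeatedly remove nodes with fewer than k neighbours."""
    core = {node: set(nbrs) for node, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            # Detach the node from its remaining neighbours, then drop it.
            for nbr in core.pop(node):
                core[nbr].discard(node)
            changed = True
    return core


# Hypothetical interactions: (protein, protein, combined score).
edges = [
    ("G1", "G2", 0.9),
    ("G2", "G3", 0.8),
    ("G3", "G1", 0.7),
    ("G3", "G4", 0.5),
    ("G4", "G5", 0.3),  # below the 0.4 threshold, so it is discarded
]

core = k_core(build_graph(edges), k=2)
degrees = {node: len(nbrs) for node, nbrs in core.items()}
print(degrees)
```

In this toy network only the triangle G1, G2, G3 survives the 2-core pruning, because G4 is left with a single neighbour once the low-confidence edge is removed.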

Table 2: The prognostic information of the 55 hub genes.

Category: Genes with significantly worse survival (P < 0.01)
Official gene symbols: KIF4A, RACGAP1, CKS2, SHCBP1, HMMR

Category: Genes without significantly worse survival (P > 0.05)
Official gene symbols: CDK1, AURKA, CENPE, NDC80, TACC3, CDC20, SPC25, BUB1B, BUB1, CCNB2, CENPF, ZWINT, DLGAP5, KIF20A, MAD2L1, TPX2, ECT2, KIF11, NEK2, CCNB1, KIF2C, KIF23, TYMS, UBE2C, TTK, TOP2A, NUSAP1, PRC1, TK1, PBK, CCNE2, FOXM1, RRM2, MCM4, SMC4, MKI67, ASPM, CEP55, CENPU, MYBL2, EZH2, KIF18A, RAD51AP1, KIF14, CDKN3

[Figure 4 image: box plots of CKS2, HMMR, KIF4A, RACGAP1, and SHCBP1 expression in BRCA; num(T) = 1085 and num(N) = 291 for each gene.]

Figure 4: Expression of the five core genes in breast cancer tissues compared with normal tissues. GEPIA was used to compare the expression level of these genes between breast cancer and normal tissues. All 5 genes were expressed at significantly higher levels in breast cancer specimens. Red indicates tumor and grey indicates normal tissue (P < 0.05).

[Figure 5 image: Kaplan–Meier curves (probability versus time in months) for the low- and high-expression groups of each core gene, with the number at risk listed below each panel. Upper row, RFS; lower row, OS. Hazard ratios and logrank P values shown in the panels:]

Gene      RFS HR              RFS logrank P   OS HR               OS logrank P
CKS2      1.72 (1.54–1.92)    < 1E-16         1.59 (1.28–1.97)    2e-05
HMMR      1.7 (1.52–1.9)      < 1E-16         1.69 (1.36–2.1)     1.3e-06
KIF4A     1.75 (1.57–1.96)    < 1E-16         1.8 (1.44–2.23)     8.8e-08
RACGAP1   1.96 (1.75–2.19)    < 1E-16         1.79 (1.44–2.22)    1.1e-07
SHCBP1    1.76 (1.58–1.97)    < 1E-16         1.76 (1.42–2.19)    2.1e-07

Figure 5: Prognostic information of the core genes. KM plotter was used to assess the prognostic value of the core genes; high expression of each was associated with significantly worse survival. OS: overall survival; RFS: relapse-free survival; P < 0.05.
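The survival curves compared by the KM plotter rest on the Kaplan–Meier estimator, which can be sketched as follows. The sketch assumes that patients are split into low and high groups at the median expression value; the function name `kaplan_meier` and the eight patient records are hypothetical, and the hazard ratio and logrank test reported by the KM plotter are not reproduced here.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all records that share the same time point.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            surv *= 1 - observed / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


# Hypothetical patients: (expression, follow-up in months, event flag).
patients = [
    (2.1, 30, 1), (3.5, 12, 1), (1.2, 60, 0), (4.8, 8, 1),
    (0.9, 72, 0), (3.9, 20, 0), (1.8, 45, 1), (5.2, 15, 1),
]

cut = median([expr for expr, _, _ in patients])
high = [(t, e) for expr, t, e in patients if expr > cut]
low = [(t, e) for expr, t, e in patients if expr <= cut]

high_curve = kaplan_meier([t for t, _ in high], [e for _, e in high])
low_curve = kaplan_meier([t for t, _ in low], [e for _, e in low])
print("high:", high_curve)
print("low:", low_curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at observed event times.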

[Figure 6 image: box plots of CKS2, HMMR, KIF4A, RACGAP1, and SHCBP1 expression (transcripts per million) in TCGA BRCA samples by subclass: normal (n = 114), luminal (n = 566), HER2 positive (n = 37), and triple negative (n = 116).]

Figure 6: Expression of core genes in different subclasses of breast cancer (UALCAN).

are found in the testis, heart, kidney, colon, and lung [17]. Diseases associated with KIF4A include mental retardation, X-linked 100 [18], and retinoblastoma [19]. HMMR (hyaluronan-mediated motility receptor) is a protein-coding gene. When hyaluronan binds to HMMR, the phosphorylation of PTK2/FAK1 occurs. It may also be involved in cellular transformation regulating extracellular-regulated kinase (ERK) activity and metastasis formation [20]. HMMR is expressed in breast tissue and forms a complex with BRCA1 and BRCA2. It is potentially associated with a higher risk of breast cancer. Diseases associated with HMMR include breast cancer [21] and fibrosarcoma [22]. RACGAP1 (Rac GTPase-activating protein 1) is a protein-coding gene which encodes a GTPase-activating protein (GAP). It is highly expressed in the thymus, testis, and placenta and expressed at lower levels in the spleen and peripheral blood lymphocytes; the highest levels of its expression were found in spermatocytes, while in the testis the expression is restricted to germ cells [23]. CKS2

[Figure 7 image: scatter plots of HMMR expression (TPM) against CKS2, KIF4A, RACGAP1, and SHCBP1 expression (TPM) in BRCA. Upper row legend: Tumor. Lower row legend: Caucasian, African American, Asian.]

Figure 7: CKS2, KIF4A, RACGAP1, and SHCBP1 all have positive correlation with HMMR (UALCAN).
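The text does not state which correlation coefficient underlies the values 0.62, 0.73, 0.71, and 0.59. Assuming a Pearson coefficient on expression values, the calculation can be sketched as below; the function name `pearson_r` and the TPM values are hypothetical and are not the TCGA data behind Figure 7.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical TPM values for two genes across six tumor samples.
hmmr_tpm = [5.0, 12.0, 20.0, 33.0, 41.0, 60.0]
other_tpm = [8.0, 10.0, 25.0, 22.0, 40.0, 45.0]

r = pearson_r(hmmr_tpm, other_tpm)
print(f"r = {r:.2f}")
```

A coefficient near 1 indicates that the two genes rise and fall together across samples, which is the sense in which the four genes are described as positively correlated with HMMR.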

(CDC28 protein kinase regulatory subunit 2 or cyclin-dependent kinase regulatory subunit) and SHCBP1 (SHC binding and spindle associated 1 or SHC SH2 domain-binding protein 1) are also protein-coding genes. Gene ontology annotations related to CKS2 and SHCBP1 both include protein binding [24, 25]. All five genes are protein-coding, so their protein levels in tissues can be detected by immunohistochemistry. In our study, HMMR is associated with poor prognosis in breast cancer; it is also known for its relationship with BRCA1 and BRCA2. CKS2, KIF4A, RACGAP1, and SHCBP1 all correlate positively with HMMR. Therefore, they have the potential to become combination biomarkers that indicate prognosis for different subtypes of breast cancer, especially TNBC, and to become targets for this disease.

However, there were some limitations in our study. First, although high expressions of KIF4A, RACGAP1, CKS2, SHCBP1, and HMMR were independent prognostic factors of poor OS in breast cancer, all the data in our research came from online databases, and further experiments in vitro and in vivo are needed to validate our findings. Second, there is a lack of external datasets to verify our findings, so false positives may exist. Third, we did not explore the potential roles these genes play in diagnosis and therapy, so more studies are needed, and further studies should focus on investigating the underlying mechanisms linking these core genes and different subclasses of breast cancer, especially TNBC.

5. Conclusions

The expressions of KIF4A, RACGAP1, CKS2, SHCBP1, and HMMR were high in breast cancer and were associated with poor OS and with different breast cancer subclasses. These candidates may provide underlying therapeutic targets to distinguish different subclasses of breast cancer and may become clinically diagnostic biomarkers in the near future. CKS2, KIF4A, RACGAP1, and SHCBP1 all correlate positively with HMMR; they may become a combined indicator of prognosis or targets for different subtypes of breast cancer. Our findings reinforce the importance of DEGs in breast cancer and provide new insights into novel strategies for therapy and prediction of prognosis in different subclasses of breast cancer. They will be useful for further clinical applications in breast cancer diagnosis, prognosis, and targeted therapy.

Data Availability

The data used to support the findings of this study are available from the GEO database (https://www.ncbi.nlm.nih.gov/geo/).

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Authors’ Contributions

Qian Zhou and Xiaofeng Liu contributed equally to this work.

Acknowledgments

This work was supported by the Nanjing Medical Science and Technique Development Foundation (ZKX17041), the Natural Science Foundation of Jiangsu Province (BK20161120), and the Maternal and Child Health Research Project of Jiangsu Province (F201628).

References

[1] H. Sung, J. Ferlay, R. L. Siegel et al., “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: a Cancer Journal for Clinicians, pp. 1–41, 2021.
[2] D. Hanahan and R. A. Weinberg, “The hallmarks of cancer,” Cell, vol. 100, no. 1, pp. 57–70, 2000.
[3] D. Hanahan and R. A. Weinberg, “Hallmarks of cancer: the next generation,” Cell, vol. 144, no. 5, pp. 646–674, 2011.
[4] G. J. Goodall and V. O. Wickramasinghe, “RNA in cancer,” Nature Reviews Cancer, vol. 21, no. 1, pp. 22–36, 2021.
[5] S. Udhaya Kumar, D. Thirumal Kumar, R. Bithia et al., “Analysis of differentially expressed genes and molecular pathways in familial hypercholesterolemia involved in atherosclerosis: a systematic and bioinformatics approach,” Frontiers in Genetics, vol. 11, p. 734, 2020.
[6] S. U. Kumar, D. T. Kumar, R. Siva, C. G. P. Doss, and H. Zayed, “Integrative bioinformatics approaches to map potential novel genes and pathways involved in ovarian cancer,” Frontiers in Bioengineering and Biotechnology, vol. 7, p. 391, 2019.
[7] S. Mishra, M. I. Shah, S. Udhaya Kumar et al., “Network analysis of transcriptomics data for the prediction and prioritization of membrane-associated biomarkers for idiopathic pulmonary fibrosis (IPF) by bioinformatics approach,” Advances in Protein Chemistry and Structural Biology, vol. 123, pp. 241–273, 2021.
[8] S. Udhaya Kumar, B. Rajan, D. Thirumal Kumar et al., “Involvement of essential signaling cascades and analysis of gene networks in diabesity,” Genes, vol. 11, no. 11, p. 1256, 2020.
[9] H. Yan, G. Zheng, J. Qu et al., “Identification of key candidate genes and pathways in multiple myeloma by integrated bioinformatics analysis,” Journal of Cellular Physiology, vol. 234, no. 12, pp. 23785–23797, 2019.
[10] J. Wan, S. Jiang, Y. Jiang et al., “Data mining and expression analysis of differential lncRNA ADAMTS9-AS1 in prostate cancer,” Frontiers in Genetics, vol. 10, p. 1377, 2019.
[11] D. Fu, B. Zhang, L. Yang, S. Huang, and W. Xin, “Development of an immune-related risk signature for predicting prognosis in lung squamous cell carcinoma,” Frontiers in Genetics, vol. 11, p. 978, 2020.
[12] O. T. Cox, S. J. Edmunds, K. Simon-Keller et al., “PDLIM2 is a marker of adhesion and β-catenin activity in triple-negative breast cancer,” Cancer Research, vol. 79, no. 10, pp. 2619–2633, 2019.
[13] K. Yu, K. Ganesan, L. K. Tan et al., “A precisely regulated gene expression cassette potently modulates metastasis and survival in multiple solid cancers,” PLoS Genetics, vol. 4, no. 7, article e1000129, 2008.
[14] D. Szklarczyk, A. L. Gable, D. Lyon et al., “STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets,” Nucleic Acids Research, vol. 47, no. D1, pp. D607–D613, 2019.
[15] D. S. Chandrashekar, B. Bashel, S. A. H. Balasubramanya et al., “UALCAN: a portal for facilitating tumor subgroup gene expression and survival analyses,” Neoplasia, vol. 19, no. 8, pp. 649–658, 2017.
[16] B. Györffy, A. Lanczky, A. C. Eklund et al., “An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients,” Breast Cancer Research and Treatment, vol. 123, no. 3, pp. 725–731, 2010.
[17] S. Oh, H. Hahn, T. A. Torrey et al., “Identification of the human homologue of mouse KIF4, a kinesin superfamily motor protein,” Biochimica et Biophysica Acta, vol. 1493, no. 1-2, pp. 219–224, 2000.
[18] M. H. Willemsen, W. Ba, W. M. Wissink-Lindhout et al., “Involvement of the kinesin family members KIF4A and KIF5C in intellectual disability and synaptic function,” Journal of Medical Genetics, vol. 51, no. 7, pp. 487–494, 2014.
[19] R. T. Yan and S. Z. Wang, “Increased chromokinesin immunoreactivity in retinoblastoma cells,” Gene, vol. 189, no. 2, pp. 263–267, 1997.
[20] X. F. Liu, T. K. Bera, C. Kahue et al., “ANKRD26 and its interacting partners TRIO, GPS2, HMMR and DIPA regulate adipogenesis in 3T3-L1 cells,” PLoS One, vol. 7, no. 5, article e38130, 2012.
[21] B. Kalmyrzaev, P. D. Pharoah, D. F. Easton, B. A. Ponder, A. M. Dunning, and for the SEARCH Team, “Hyaluronan-mediated motility receptor gene single nucleotide polymorphisms and risk of breast cancer,” Cancer Epidemiology, Biomarkers & Prevention, vol. 17, no. 12, pp. 3618–3620, 2008.
[22] C. L. Hall and E. A. Turley, “Hyaluronan: RHAMM mediated cell locomotion and signaling in tumorigenesis,” Journal of Neuro-Oncology, vol. 26, no. 3, pp. 221–229, 1995.
[23] N. Naud, A. Touré, J. Liu et al., “Rho family GTPase Rnd2 interacts and co-localizes with MgcRacGAP in male germ cells,” The Biochemical Journal, vol. 372, no. 1, pp. 105–112, 2003.
[24] H. E. Richardson, C. S. Stueland, J. Thomas, P. Russell, and S. I. Reed, “Human cDNAs encoding homologs of the small p34Cdc28/Cdc2-associated protein of Saccharomyces cerevisiae and Schizosaccharomyces pombe,” Genes & Development, vol. 4, no. 8, pp. 1332–1344, 1990.
[25] J. So, A. Pasculescu, A. Y. Dai et al., “Integrative analysis of kinase networks in TRAIL-induced apoptosis provides a source of potential targets for combination therapy,” Science Signaling, vol. 8, no. 371, p. rs3, 2015.