
Altaf et al. J of Biol Res-Thessaloniki (2021) 28:5
https://doi.org/10.1186/s40709-021-00136-7
Journal of Biological Research-Thessaloniki

RESEARCH Open Access

Genome-scale meta-analysis of breast cancer datasets identifies promising targets for drug development

Reem Altaf1*, Humaira Nadeem1, Mustafeez Mujtaba Babar2, Umair Ilyas3 and Syed Aun Muhammad4

Abstract

Background: Because of the highly heterogeneous nature of breast cancer, each subtype differs in response to several treatment regimens. This has limited the therapeutic options for metastatic breast cancer disease requiring exploration of diverse therapeutic models to target tumor specific biomarkers.

Methods: Differentially expressed breast cancer genes identified through extensive data mapping were studied for their interaction with other target proteins involved in breast cancer progression. The molecular mechanisms by which these signature genes are involved in breast cancer metastasis were also studied through pathway analysis. The potential drug targets for these genes were also identified.

Results: From 50 DEGs, 20 genes were identified based on fold change and p-value and the data curation of these genes helped in shortlisting 8 potential gene signatures that can be used as potential candidates for breast cancer. Their network and pathway analysis clarified the role of these genes in breast cancer and their interaction with other signaling pathways involved in the progression of disease metastasis. The miRNA targets identified through miRDB predictor provided potential miRNA targets for these genes that can be involved in breast cancer progression. Several FDA approved drug targets were identified for the signature genes easing the therapeutic options for breast cancer treatment.

Conclusion: The study provides a more clarified role of signature genes, their interaction with other genes as well as signaling pathways.
The miRNA prediction and the potential drugs identified will aid in assessing the role of these targets in breast cancer.

Keywords: Breast cancer, Microarray datasets, Pathway enrichment analysis, Gene ontology, miRNA, Drug-gene network

*Correspondence: [email protected]
1 Department of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Riphah International University, Islamabad 44000, Pakistan
Full list of author information is available at the end of the article

© The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Background

Cancer has been one of the leading causes of death for the past several years and is the second cause of mortality according to the American Cancer Society (ACS) statistics after cardiovascular, infectious and parasitic disorders. Breast cancer is one of the most commonly diagnosed life-threatening malignancies and remains the leading cause of cancer incidence and mortality in women globally [1].

Several factors have been attributed to the development of breast carcinoma. These include age, personal history of breast cancer, and reproductive, environmental and genetic factors. Increasing age enhances the risk of breast cancer development [2]. Having a personal history of breast cancer also contributes towards a greater risk of a second breast cancer that can be ipsilateral or contralateral. Family history of breast cancer can also enhance the risk of development of cancer in women. About 5–10% of women with breast cancer show an autosomal dominant inheritance while 20–25% have a positive family history [3]. Genetic predisposition alleles showing 40–85% of lifetime threat of breast cancer development include BRCA1 and BRCA2 mutations, TP53 mutations, PTEN, STK11, E-cadherin and neurofibromatosis (NF) [4].

The treatment strategies for breast cancer are largely determined by the status of progesterone receptor, estrogen receptor and the human epidermal growth factor receptor 2. Clinicopathological factors such as tumor grade, size and status of lymph node also determine the therapeutic plan; however, the biomarkers for tumor invasion and metastasis are of profound importance in order to formulate new markers and treatment strategies for breast carcinomas. This will aid in both current therapies and tumor prognosis [5].

With the aid of in silico bioinformatic approaches, the attainment of new treatment strategies has become easier. One such approach that has helped in identifying new markers in cancer therapy is the cDNA differential analysis [6]. In this study, 24 datasets were downloaded to analyze gene expression profiles in breast cancer and a functional analysis was performed to identify the differentially expressed genes (DEGs) between breast tumor cells and treated tissues. A genetic network was constructed, and pathway analysis and miRNA target identification were performed to understand the underlying molecular mechanisms and to identify potential therapeutic targets for breast cancer. Moreover, drug-gene network analysis has also been performed to identify potential drug targets for breast cancer.

Methods

Accession of gene expression data

The study focuses on the identification of potential breast cancer targets through a differential screening method. The datasets of breast cancer were accessed from the Gene Expression Omnibus database. The screening criteria were “organism: Homo sapiens”, and “experiment type: expression profiling by array”. The Affymetrix GeneChip Human Genome U133 Plus 2.0 Array (CDF: Hs133P_Hs_ENST, version 10) (Affymetrix, Inc., Santa Clara, CA, 95051, USA) platform was used. All datasets comprised GEO accession number, platform, sample type, number of samples and gene expression data. The array platform and hgu133plus2 annotation platform of probes were used to identify the differentially expressed genes. The software R and Bioconductor packages AffyQCReport, Affy, Annotate, AnnotationDbi, Limma, Biobase, AffyRNADegradation, hgu133plus2cdf, and hgu133a2cdf were used to perform the computational analysis.

Preprocessing and differential expression analysis of microarray datasets

The preprocessing of datasets was performed by preparing the phenodata files for each dataset in a recognizable format [8]. Using R version 3.1.3, the Bioconductor ArrayQualityMetrics package was utilized for the normalization of the data to a median expression level for each gene [7]. After normalization, the background correction was done for perfect match (pm) and mismatch (mm) by Robust Multi-array Analysis (RMA). The method was used to eliminate the artifacts and local noise. The expression value with a p-value < 0.15 was measured as marginal log transformation. Afterwards, summarization was performed by the RMA algorithm in order to measure the averages between probes in a probe set to attain the summary of intensities.

The quality of RNA in these microarray datasets was measured using the AffyRNADegradation package of Bioconductor, also called degradation analysis [9]. Lastly, the DEGs in each dataset were identified by pairwise comparison and the Benjamini–Hochberg method [10] was employed for multiple testing correction. The differentially expressed genes were shortlisted and ranked according to their p-values and resulting scores. The cut-off values set were p-value ≤ 0.05, FDR < 0.05 (False Discovery Rate) and absolute log fold change |logFC| > 1 [11] to calculate the moderated statistics.

Data curation and cluster analysis

The shortlisted genes obtained through differential expression analysis were further screened to confirm their role in breast cancer using diverse data sources such as PubMed (http://www.ncbi.nlm.nih.gov/pubmed), MeSH (http://www.ncbi.nlm.nih.gov/mesh), OMIM (Online Mendelian Inheritance in Man) (http://www.ncbi.nlm.nih.gov/omim), and PMC database (http://www.ncbi.nlm.nih.gov/pmc) [12]. Biomedical text mining helped in filtering significant disease specific genes. The CIMminer tool was used to perform the cluster analysis based on the expression values in each dataset using the Absolute Pearson correlation analysis. The cluster analysis revealed variations in gene expression levels between control and treated replicates [13].

Network analysis and identification of gene signatures

The protein–protein interaction network helped in identifying the interaction of each protein with other genes having different biological or molecular functions in a diseased state as compared to normal. The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) [14] and Human Annotated and Predicted Protein Inter-
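
The differential expression cut-offs described under preprocessing (Benjamini–Hochberg correction, p-value ≤ 0.05, FDR < 0.05 and absolute log fold change above 1) can be illustrated with a minimal sketch. This is not the study's pipeline, which was run in R with Bioconductor packages; it is a plain Python illustration, and the names `GeneStat`, `benjamini_hochberg` and `select_degs`, as well as the gene labels used to exercise it, are hypothetical.

```python
from typing import List, NamedTuple


class GeneStat(NamedTuple):
    """Per-gene summary statistics (hypothetical container for illustration)."""
    gene: str       # gene label
    log_fc: float   # log fold change between the two groups
    p_value: float  # raw (unadjusted) p-value


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each one is the minimum of p * m / rank over its own rank
    # and all higher ranks, capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(stats: List[GeneStat],
                p_cut: float = 0.05,
                fdr_cut: float = 0.05,
                lfc_cut: float = 1.0) -> List[str]:
    """Return genes passing all three cut-offs, ranked by raw p-value."""
    fdr = benjamini_hochberg([s.p_value for s in stats])
    hits = [
        s for s, q in zip(stats, fdr)
        if s.p_value <= p_cut and q < fdr_cut and abs(s.log_fc) > lfc_cut
    ]
    hits.sort(key=lambda s: s.p_value)
    return [s.gene for s in hits]
```

Calling `select_degs` on a list of `GeneStat` records returns the gene labels that satisfy all three thresholds, ordered by raw p-value. The default thresholds mirror the values quoted in the text, and each can be overridden through the keyword arguments.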