Research Article a Supervised Network Analysis on Gene
Total Page:16
File Type:pdf, Size:1020Kb
Hindawi Publishing Corporation Computational and Mathematical Methods in Medicine Volume 2014, Article ID 813067, 19 pages http://dx.doi.org/10.1155/2014/813067 Research Article A Supervised Network Analysis on Gene Expression Profiles of Breast Tumors Predicts a 41-Gene Prognostic Signature of the Transcription Factor MYB across Molecular Subtypes Li-Yu D. Liu,1 Li-Yun Chang,2 Wen-Hung Kuo,3 Hsiao-Lin Hwa,2 King-Jen Chang,3,4 and Fon-Jou Hsieh2,5 1 Biometry Division, Department of Agronomy, National Taiwan University, Taipei 106, Taiwan 2 Department of Obstetrics and Gynecology, College of Medicine, National Taiwan University, Taipei 100, Taiwan 3 Department of Surgery, College of Medicine, National Taiwan University, Taipei 100, Taiwan 4 Cheng Ching General Hospital, Taichung 400, Taiwan 5 Research Center for Developmental Biology and Regenerative Medicine, National Taiwan University, Taipei 100, Taiwan Correspondence should be addressed to Fon-Jou Hsieh; [email protected] Received 29 May 2013; Revised 7 October 2013; Accepted 20 October 2013; Published 3 February 2014 Academic Editor: Seiya Imoto Copyright © 2014 Li-Yu D. Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background. MYB is predicted to be a favorable prognostic predictor in a breast cancer population. We proposed to find the inferred mechanism(s) relevant to the prognostic features of MYB via a supervised network analysis. Methods. Both coefficient of intrinsic dependence (CID) and Galton Pierson’s correlation coefficient (GPCC) were combined and designated as CIDUGPCC. It is forthe univariate network analysis. Multivariate CID is for the multivariate network analysis. Other analyses using bioinformatic tools and statistical methods are included. Results. ARNT2 is predicted to be the essential gene partner of MYB. We classified four prognostic relevant gene subpools in three breast cancer cohorts as feature types I–IV. Only the probes in feature type II are the potential prognostic feature of MYB. Moreover, we further validated 41 prognosis relevant probes to be the favorable prognostic signature. Surprisingly, two additional family members of MYB are elevated to promote poor prognosis when both levels of MYB and ARNT2 decline. Both MYBL1 and MYBL2 may partially decrease the tumor suppressive activities that are predicted to be up-regulated by MYB and ARNT2. Conclusions. The major prognostic feature of MYB is predicted to be determined by the MYB subnetwork (41 probes) that is relevant across subtypes. 1. Introduction Miao et al. described a transient defect in mammary gland development in the mouse model with the genetic deletion Breast cancer (BC) has become a global health problem of MYB.TheysuggestedthatMYB is critical for tumor among women in recent years. Finding more cost effective growth and mammary carcinogenesis [2]. MYB transcription strategies to better control this problem is highly desirable. factors (TFs) in the MYB family are widely distributed in It should be noted that the nature of heterogeneity for eukaryotic organisms [3, 4]. MYB family members consist of breast cancer at molecular and clinical level [1]stillremains A-, B-, and C-MYBsindiversevertebrates.A-MYB (MYBL1) challenging to breast cancer care and prevention. plays a critical role in mammary gland development. In We selected MYB for the global network study because female mouse model, MYBL1 is expressed in breast duc- it is essential for mammary gland development and tumori- tal epithelium, mainly during pregnancy-induced ductal genesis [2]. However, the most important reason was that branching and alveolar development [5]. B-MYB (MYBL2), a our preliminary data suggested MYB to be a good prognostic mitotic regulator, could be implicated in breast tumorigenesis predictor among 181 infiltrating ductal breast carcinomas because it is detected in a wide variety of cancer cells and plays basedonKaplan-Meiersurvivalanalysis. anessentialroleduringcellcycleprogression[6–8]. It has 2 Computational and Mathematical Methods in Medicine been documented that C-MYB (MYB) plays different roles in ER(−)PR(+)HER(−) (5A), ER(−)PR(+)HER(+) (6A), and normal and cancer cells [9]. The findings of Thorner et al.[10] ER(−)PR(+)HER(?) (3A). In addition, we designated the indicated that C-MYB may not be behaving as an oncogene cohort of Groups IE and IIE containing ninety gene expres- in estrogen receptor positive (ER(+)) luminal breast tumors sion profiles as 90A cohort. The cohort from subcohorts and suggested that it may be behaving as a tumor suppressor for TN, ERBB2+, ER(−) PR(+)HER(−), ER(−)PR(+)HER(+), in this disease. All these findings described above indicate and ER(−)PR(+)HER(?) to make 91 gene expression profiles the important roles of MYB family members in relation to was designated as 91A cohort. For 181A cohort, it includes mammary gland and breast cancer development in the model 90A cohort and 91A cohort. systems. However, more studies in a genome-wide scale for finding the roles of MYB in breast cancers are essential to fill 2.2. Microarray Data Analyses. Aglobalviewofagene in the gaps for the current findings in the field. profileperbreasttumorspecimenwasanalyzedusingHuman Thisstudyaimedatreassessingthedevelopmentally 1A (version 2) oligonucleotide microarray (half a genome important transcription factor MYB mediated transcriptional size: 22 k) (Agilent technologies, USA). The heatmaps were regulatory networks in relation to breast cancer development displayed after unsupervised hierarchical clustering. For and clinical outcome. unsupervised hierarchical clustering, the log2 ratio for each gene was first centered by subtracting the median across 2. Materials and Methods all samples to discriminate the subclass of the dataset. The “hcluster” function in “stats” package was utilized to perform 2.1. Features of Surgical Specimens for Generating the Dataset the unsupervised clustering. We used the Euclidean distance of Gene Expression Profiles. We used immunohistochemical and the complete linkage as the default settings. Then, the (IHC) statuses for three biomarkers (i.e., estrogen receptor selected gene expression profiles were fed into the software (ER), progesterone receptor A (PR), and HER-2/neu (HER)) R2.15.1 for displaying gene list (Y axis) that is derived from as the classifiers to identify eight intrinsic subtypes. However, hierarchical clustering analysis on the gene profiles of selected for ERBB2 (IHC score: 2+), determination of Her-2/neu gene arrays (X axis)togeneratetheheatmaps.Theheatmapwas copy number by chromogenic in situ hybridization (CISH) produced by “rect” function to make the customized view was performed [12]. As such, IHC/CISH status was used for of the subcohorts. In addition, we used Gene Spring GX7.3.1 determining HER status. (Agilent Technologies, USA) for generating Venn diagrams Ninety specimens of primary infiltrating ductal breast andforretrievingupdatedgeneannotation.ANOVAhasthe carcinomas (IDCs) consist of group IE (i.e., ER(+)PR(+)) advantage of performing both dichotomous and multichoto- (61/90) and group IIE (i.e., ER(+)PR(−)) (29/90). mous analyses. ANOVA test for the relationship between Ninety-one samples of IDCs consist of triple negatives mRNA levels of MYB and the statuses of a clinical index of (TN) (i.e., ER(−)PR(−)HER(−)) (48/91), ERBB2+ (i.e., interest in a given population as well as the statistical methods ER(−)PR(−)HER(+)) (29/91), ER(−)PR(+)HER(−)(5/91), for establishing MYB transcriptionalregulatorynetworkwere ER(−)PR(+)HER(+) (6/91), and ER(−)PR(+)HER(?) (3/91). described previously [11–14].Weusedthesamedataanalyses Those samples were obtained from patients who underwent described above for analyzing other transcription factors of surgeryatNationalTaiwanUniversityHospital(NTUH) interest. We performed Kaplan-Meier survival analyses [15] between 1995 and 2007. The tumor samples for this study using “survival” package in R (version 2.15.1) using the gene were the remaining frozen samples from diagnostic purpose. profiles of 90A cohort, 91A cohort, and 181A cohort or the All patients provided informed consent according to the extracted gene pools of interest in the assigned cohorts. guidelines approved by the Institutional Review Board To quantify the weight of hazard ratios associated with (IRB) at NTUH (IRB number: 200706039R, Research Ethics prognostic gene signature and the traditional prognostic Committee at National Taiwan University Hospital, Taipei, factorsinagivencohortofinterest,bothunivariateand Taiwan). The survival status of this cohort was derived from multivariate COX proportional hazard (COXPH) regression the recent medical recording collected in 2011 (by WHK). models in R package (version 2.15.1) were performed. Other medical records of the patients were obtained from the great assistance from the office of medical record (Cancer 2.3. Features of the Adapted Network Analysis Based on the Registry, Medical Information Management Office, NTUH). Dataset of 181 Gene Expression Profiles. The DNA microarray At the time of this study, the record of cancer treatments for becomes mainstay technique used in medical research. In these cancer patients was only partially complete. recent years, we have designed the network analysis for full The microarray data for this study (181 gene expression prediction of a transcription network for a given transcription