IDENTIFICATION OF CELL SURFACE MARKERS WHICH CORRELATE WITH SALL4 IN A B-CELL ACUTE LYMPHOBLASTIC LEUKEMIA WITH t(8;14) DISCOVERED THROUGH BIOINFORMATIC ANALYSIS OF MICROARRAY GENE EXPRESSION DATA

Citable link: http://nrs.harvard.edu/urn-3:HUL.InstRepos:38962442

Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

ROBERT PAUL WEINBERG

A Thesis Submitted to the Faculty of The Harvard Medical School in Partial Fulfillment of the Requirements for the Degree of Master of Medical Sciences in Immunology

Harvard University
Boston, Massachusetts
June

Thesis Advisor: Dr. Li Chai
Department of Pathology
Brigham and Women’s Hospital
77 Francis Street
Boston, MA 02215

Author: Robert Paul Weinberg
Candidate MMSc in Immunology
Harvard Medical School
25 Shattuck Street
Boston, MA 02215

Abstract

Acute Lymphoblastic Leukemia (ALL) is the most common leukemia in children, causing significant morbidity and mortality annually in the U.S.
We performed exploratory data analysis on several microarray gene expression datasets publicly available in the Gene Expression Omnibus (GEO) repository, maintained by the National Center for Biotechnology Information of the National Library of Medicine at the National Institutes of Health (http://ncbi.nlm.nih.gov), looking for novel associations between the zinc-finger transcription factor SALL4 and leukemia. Through this data mining, we found a subset of B-cell ALL in which multiple cell surface markers correlate relatively highly with SALL4. However, in part because of the small number of samples in this group (n = 13), the results of these analyses must be considered with caution until they can be validated experimentally in the laboratory with living leukemia cells.
We evaluated the transcriptome changes in these leukemia datasets that are associated with SALL4 expression. The correlation analysis of the microarray data revealed that a small subset of B-cell ALL comprising 13 samples, mature B-cell acute lymphoblastic leukemia with the translocation t(8;14) [B-ALL with t(8;14)], has multiple cell surface marker genes that show relatively high correlation with SALL4 expression (|r| > 0.60), whereas the 16 other leukemia subsets show only low-to-moderate correlation of the same cell surface biomarkers with SALL4 (|r| < 0.45).
The microarray gene expression data were obtained using the Affymetrix gene chip HG-U133Plus2, a 3’ IVT oligonucleotide array for the detection of cDNA synthesized from mRNA extracted from the relevant human cells. The array consists of both Perfect Match and Mismatch probes for the detection and differential analysis of some 23,520 probe-gene pairs. The luminosity read-out from the gene chip assay then undergoes a number of statistical manipulations, including standardization and normalization, before the data are deposited in the GEO library.
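The correlation screen described in this abstract can be illustrated with a minimal sketch. It assumes log2-scale expression values stored per gene across the samples of one leukemia subgroup, computes the Pearson correlation coefficient r between SALL4 and each candidate surface marker, and keeps the markers with |r| > 0.60. The expression values, the marker names `MARKER_A` and `MARKER_B`, and the function names are hypothetical placeholders, not data or code from the thesis. The use of Pearson's r is also an assumption, since the abstract says only "correlation".

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlated_markers(expression, reference="SALL4", threshold=0.60):
    """Return {gene: r} for genes whose |r| with the reference exceeds threshold."""
    ref_values = expression[reference]
    hits = {}
    for gene, values in expression.items():
        if gene == reference:
            continue
        r = pearson_r(ref_values, values)
        if abs(r) > threshold:
            hits[gene] = r
    return hits


# Hypothetical log2 expression values for five samples of one subgroup.
toy_expression = {
    "SALL4":    [5.0, 6.0, 7.0, 8.0, 9.0],
    "MARKER_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises linearly with SALL4
    "MARKER_B": [3.0, 1.0, 3.0, 1.0, 3.0],   # no linear trend with SALL4
}

hits = correlated_markers(toy_expression)
print(hits)
```

With only a handful of samples per group, a correlation coefficient of this size has a wide confidence interval, which is consistent with the caution the abstract attaches to the n = 13 subset.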
Within each dataset the gene expression data are normalized, but special methods must be used to compare data between datasets from different experiments in the GEO repository. Some datasets include the raw luminosity read-outs. Most of this thesis focuses on one microarray gene expression dataset, GSE13159, which comprises some 2,096 samples taken from patients with acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), or myelodysplastic syndrome (MDS), and from normal healthy controls.
After identifying the B-ALL with t(8;14) subset, in which the cell surface markers correlate highly with SALL4, we used the limma package from the R-based Bioconductor platform to perform a linear regression analysis of differential gene expression across the transcriptome. The analysis reveals that this B-cell leukemia subset has differentially expressed genes that set it apart from the average gene expression pattern of the other lymphoblastic leukemias. Extensive bioinformatic analyses were carried out on this small group of samples, and the limitations of these analyses are examined further in the discussion section. Some preliminary functional genomic analysis was carried out on these differentially expressed genes (DEGs), which were grouped into specific Gene Ontology (GO) categories and KEGG pathways, including the hematopoietic pathway. These supporting data can be found in the attached appendices. The Gene Ontologies and KEGG pathways overlap to some extent among the 17 leukemia and myelodysplastic groups analyzed, an overlap that includes the hematopoietic pathway, but the B-ALL with t(8;14) subset differed from the other leukemias.
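As a rough illustration of what a per-gene differential expression comparison computes, the sketch below contrasts one subgroup with a reference group on log2-scale data and reports the log2 fold change with an ordinary Welch t-statistic. This is a simplified stand-in and not the limma method: limma fits a linear model per gene and moderates the gene-wise variances with an empirical Bayes step, which is omitted here. The gene names, expression values, function names, and the |log2 fold change| >= 1 cutoff (a 2-fold change) are assumptions made for the example.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for two independent samples with unequal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    if standard_error == 0:
        raise ValueError("t-statistic is undefined when both groups are constant")
    return (mean(group_a) - mean(group_b)) / standard_error


def differential_expression(subset, reference, min_abs_log_fc=1.0):
    """Return (gene, log2 fold change, t) rows, largest |fold change| first.

    Inputs map gene -> list of log2 expression values, so a difference of
    group means is a log2 fold change; 1.0 corresponds to a 2-fold change.
    """
    rows = []
    for gene, subset_values in subset.items():
        log_fc = mean(subset_values) - mean(reference[gene])
        if abs(log_fc) >= min_abs_log_fc:
            rows.append((gene, log_fc, welch_t(subset_values, reference[gene])))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Hypothetical log2 expression values: three samples per group, two genes.
subset_group = {
    "GENE_UP":   [9.0, 10.0, 11.0],
    "GENE_FLAT": [5.0, 6.0, 7.0],
}
reference_group = {
    "GENE_UP":   [6.0, 7.0, 8.0],
    "GENE_FLAT": [5.5, 6.0, 6.5],
}

de_results = differential_expression(subset_group, reference_group)
for gene, log_fc, t_stat in de_results:
    print(f"{gene}: log2FC={log_fc:.2f}, t={t_stat:.2f}")
```

In a small group, gene-wise variance estimates are unstable, which is the problem limma's variance moderation is designed to address and one reason an unmoderated statistic like this one should be read only as an illustration.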
SALL4 is a zinc-finger transcription factor important in maintaining the pluripotency of embryonic and hematopoietic stem cells, as evidenced in transgenic animal models and genetically modified cell lines with either deletion or forced over-expression of SALL4. Experimental evidence also suggests that SALL4 plays an important role in leukemogenesis and in oncogenic processes in other neoplasms. The association found between these specific cell surface biomarkers and SALL4 expression in the B-ALL with t(8;14) subset may facilitate future research on SALL4.
The iPathway tool (www.advaitabio.com) was used to further characterize the B-ALL with t(8;14) subset. It identified 549 differentially expressed genes (DEGs) relative to the normal samples, out of a total of 20,388 genes with measured expression. By KEGG analysis, these 549 DEGs have a significant impact on 34 biological pathways. Significant enrichment was also found for 1,431 Gene Ontology (GO) terms, 237 predicted miRNAs, and 57 diseases, based on uncorrected p-values. The DEGs were analyzed in the context of pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Gene Ontology Consortium database (GO), and the miRBase and TARGETSCAN databases. Some of the iPathway results are presented in the appendices. These results must be considered with caution given the significant limitations of this study.

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Acknowledgments
1. Chapter 1: Background
  1.1 The pluripotency-maintaining transcription factor SALL4
  1.3 The genetics of B-cell ALL with the t(8;14) translocation
2. Chapter 2: Data and Methods
  2.1 Microarray analysis of gene expression
  2.2 Bioinformatics and computational biology tools
  2.3 R programming language and Bioconductor software tools
  2.4 Data and computational results
  2.5 Brief discussion of results
3. Chapter 3: Discussion and Perspectives
  3.1 Discussion
  3.2 Limitations
  3.3 Future research paths
4. Bibliography
5. Appendices
  Appendix A Correlation of cell surface markers for other groups
  Appendix B Biology and genetics of acute lymphoblastic leukemias
  Appendix C Top 100 differentially expressed genes, B-ALL t(8;14) vs. normal
  Appendix D Top 100 DEGs for B-ALL without t(8;14)
  Appendix E DEGs and Gene Set Enrichment Analysis
  Appendix F DEGs with greater than 2-fold change from normal
  Appendix G KEGG pathway analysis on DEGs
  Appendix H Gene Ontology analysis of DEGs
  Appendix I Focused analysis of KEGG hematopoietic pathway
  Appendix J Putative modulating microRNAs based on DEG expression
  Appendix K Relative ranks of DEGs among 3 ALL groups
  Appendix L Master list of DEGs with greater than 2 log-fold change
  Appendix M Master list of DEGs found in B-ALL with t(8;14)
  Appendix N Master list of DEGs found