USE OF CHRONIC LYMPHOCYTIC LEUKEMIA RESEARCH CONSORTIUM DATA REPOSITORY AND GENE EXPRESSION OMNIBUS TO GENERATE AND TEST HYPOTHESES FOR BIOMARKER IDENTIFICATION AND DEVELOPMENT

THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Kristin Chelsea Keen, B.A.

The Ohio State University
2009

Thesis Committee:
Professor Kun Huang, Adviser
Professor Philip Payne

Approved by

Adviser
Pathology Graduate Program

ABSTRACT

Chronic lymphocytic leukemia (CLL) is the most common adult leukemia in the United States. There is no known cure for CLL. While biomarkers such as CD38, IGHV, and ZAP-70 have been found to correlate with disease progression, these biomarkers need further validation, and new biomarkers remain to be discovered. In this study, publicly available gene expression data from NCBI's Gene Expression Omnibus were used to identify genes whose expression is correlated with ZAP-70 and CD38 mRNA expression patterns. We also used the extensive data of the Chronic Lymphocytic Leukemia Research Consortium (CRC) to search for novel correlations between clinical markers and both disease progression and treatment outcome. We found several hundred genes with expression patterns correlated with ZAP-70. We also found several clinical, genetic, biologic, and immunologic CRC data fields that were significantly correlated, with weak to strong associations.

Dedicated to my family

ACKNOWLEDGEMENTS

I thank my adviser, Kun Huang, for his positive support and trust in me as a student and a researcher. I thank my committee member, Philip Payne, for his tangible enthusiasm and thoughtful advice. I thank Gulcin Ozer for contributing her time, effort, and experience to the correlation analysis of Aim 1. I thank Cenny Taslim for helping me with Matlab when I was in a pinch.
Lastly, I thank my husband, Jared Circle, and my mother, Valerie Keen, for the many hours they spent doing more than their share of caring for my daughter while I finished this project. I couldn't have done this without them.

VITA

2003: B.A., Life Sciences, Otterbein College

FIELDS OF STUDY

Major Field: Pathology

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgements
Vita
List of Figures
Chapters:
1. Introduction
   1.1 Specific Aims
2. Background and Significance
   2.1 Chronic Lymphocytic Leukemia
   2.2 The CLL Research Consortium
   2.3 Bioinformatics and CLL
3. Methods
   3.1 Methods for Specific Aim 1
   3.2 Methods for Specific Aim 2
4. Results
   4.1 Results for Specific Aim 1
   4.2 Results for Specific Aim 2
5. Conclusions
6. Discussion
References
Appendix A: Tables and Figures for CRC data repository and GEO analysis

LIST OF FIGURES

Figure 1: Valid and meaningful hypotheses from [1]
Figure 2: Flow chart for Aim 1 methods
Figure 3: Flow chart for Aim 2 methods
Figure 4: Fields queried from CRC research data repository for Aim 1
Figure 5: Fields analyzed and bins for these fields
Figure 6: Comparison pairs between CRC query datasheets and methodology for combining datasheets for analysis
Figure 7: GDS dataset information summary
Figure 8: Correlated CRC data fields, p ≤ 0.05, phi ≥ 0.3
Figure 9: Correlation gene lists for ZAP-70 and CD38 for GDS1388, GDS1454, and GDS2501, threshold 0.4
Figure 10: Randomness calculation for ZAP-70 and CD38 gene list intersections
Figure 11: Annotated IPA gene lists for correlated GDS2501 gene lists and intersected gene lists for ZAP-70 and CD38
Figure 12: IPA pathways showing only connected genes, for combined gene lists for ZAP-70
Figure 13: IPA pathway showing only connected genes, for combined gene lists for CD38
Figure 14: Ohio State University Medical Center clinical reference ranges, November 2008

CHAPTER 1

INTRODUCTION

Chronic lymphocytic leukemia (CLL) is the most common adult leukemia in the United States [2]. Although some leukemias and lymphomas can be cured, there is no known cure for CLL [3]. While some biomarkers, such as CD38, IGHV, and ZAP-70, have been found to correlate with disease progression, these biomarkers need further validation, and new biomarkers remain to be discovered [4].
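As an illustration of the correlation-and-intersection approach this study applies to GEO data, the Python below is a minimal sketch under stated assumptions, not the author's code. It assumes that each dataset is held in a hypothetical dictionary mapping gene symbols to per-sample expression values, that correlation is measured with the Pearson coefficient, and that a gene counts as correlated with the marker when |r| ≥ 0.4 (the threshold listed for Figure 9; counting negative correlations is also an assumption). The helper names `pearson`, `correlated_genes`, and `intersect_across_datasets` are hypothetical, and the toy values are invented rather than taken from any GDS dataset.

```python
from math import isnan, nan, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan  # correlation is undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def correlated_genes(expression, marker, threshold=0.4):
    """Genes whose profile correlates with `marker` at |r| >= threshold.

    expression: hypothetical layout, dict of gene symbol -> list of values,
    one value per sample, in the same sample order for every gene.
    """
    reference = expression[marker]
    hits = set()
    for gene, values in expression.items():
        if gene == marker:
            continue
        r = pearson(values, reference)
        # Assumption: negative correlations count too, hence abs(); NaN is skipped.
        if not isnan(r) and abs(r) >= threshold:
            hits.add(gene)
    return hits


def intersect_across_datasets(datasets, marker, threshold=0.4):
    """Gene-list intersection: genes correlated with `marker` in every dataset."""
    gene_lists = [correlated_genes(d, marker, threshold) for d in datasets]
    return set.intersection(*gene_lists) if gene_lists else set()


# Invented toy data: two "datasets" with four samples each (not real GEO values).
ds1 = {
    "ZAP70": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.1, 3.9, 6.2, 8.0],  # rises with ZAP70
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls with ZAP70 (r = -1)
    "GENE_C": [2.0, 1.0, 1.0, 2.0],  # uncorrelated here (r = 0)
}
ds2 = {
    "ZAP70": [3.0, 1.0, 4.0, 2.0],
    "GENE_A": [6.0, 2.0, 8.0, 4.0],  # proportional to ZAP70 (r = 1)
    "GENE_B": [1.0, 2.0, 2.0, 1.0],  # uncorrelated here (r = 0)
    "GENE_C": [3.5, 0.5, 4.5, 1.5],  # shifted copy of ZAP70 (r = 1)
}

print(sorted(correlated_genes(ds1, "ZAP70")))  # ['GENE_A', 'GENE_B']
print(sorted(correlated_genes(ds2, "ZAP70")))  # ['GENE_A', 'GENE_C']
print(sorted(intersect_across_datasets([ds1, ds2], "ZAP70")))  # ['GENE_A']
```

Intersecting the per-dataset lists keeps only genes that pass the threshold in every dataset. With many genes and few samples, some overlap is expected by chance alone, so the size of an intersection is only meaningful when compared against a random baseline.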
In this study, we use publicly available gene expression data from NCBI's Gene Expression Omnibus (GEO) to identify genes whose expression is correlated with ZAP-70 and CD38 mRNA expression patterns. We also use the extensive data collected by the Chronic Lymphocytic Leukemia Research Consortium (CRC) to search for novel correlations between clinical markers and both disease progression and treatment outcome. Our aims are as follows:

Specific Aim 1: Test previously formed hypotheses, generated with a knowledge engineering (KE)-based approach, using CRC data and correlation analysis as a novel method of biomarker discovery.

Specific Aim 2: Identify genes whose mRNA expression is correlated with the known CLL biomarkers ZAP-70 and CD38.

Based on previous studies using gene expression correlation and gene-list intersection across multiple datasets, we expect the expression of several genes to correlate with ZAP-70 and CD38 in multiple datasets [5, 6]. We also expect our correlation analysis of CRC data to confirm the KE-generated hypotheses and to identify new biomarkers of disease progression.

CHAPTER 2

BACKGROUND AND SIGNIFICANCE

2.1 Chronic Lymphocytic Leukemia

Chronic lymphocytic leukemia (CLL) is the most common adult leukemia in the United States. Nearly 100,000 Americans live with CLL, most of them over fifty years old. The incidence of CLL is increasing, and there is no known cure [7]. CLL usually develops slowly, and its symptoms resemble those of many more common conditions. This makes diagnosis difficult, and most patients learn that they have CLL only after a battery of tests. CLL is diagnosed through blood tests, including a white blood cell count and a complete blood count. Once a diagnosis is made, the disease must be staged to determine whether it is in the early, intermediate, or advanced stage.
Some patients remain in the early stages of the disease and live long lives without ever experiencing many of its worst symptoms [8, 9]. This results in two distinct groups of patients: those with advancing disease and those whose disease does not seem to progress. Patients with the non-progressive form of the disease seem not to need treatment until it begins to progress and they become more symptomatic [4].

Early determination of whether a patient has progressive or non-progressive CLL would therefore be valuable. If this information were known in advance, it could enable a better course of disease management and treatment [10], improving patients' conditions and possibly saving lives. Biomarkers have proven helpful in identifying patient groups in other diseases [9]. ZAP-70, CD38, and IGHV have been named in multiple studies as biomarkers of CLL disease progression [4, 11, 12]. A patient with a positive ZAP-70 test would be placed in the progressive group. While this is progress toward earlier characterization of an individual's disease state, ZAP-70 testing yields definitive results only when conducted during the later, symptomatic phases of disease progression [3]. A more useful approach would be to identify biomarkers or tests that can establish, early in the course of the disease, how likely a patient is to stop responding to treatment soon or to enter more rapid disease progression. A large-scale study of thousands of patients with CLL has produced a database containing hundreds of data fields.
This database could lead to the identification of new biomarkers or tests that help determine disease progression, detect early disease states, identify refractory disease, and predict patient response to treatment.

2.2 The CLL Research Consortium

The CLL Research Consortium (CRC) is a multi-site research group funded by the National Cancer Institute whose primary function is to study the genetic, biochemical, and immunologic origins of CLL [13, 14]. CRC studies have produced important new insights into CLL pathophysiology and treatment, as well as into the management of multi-site data repositories [10, 13-16]. The CRC's goals are to pursue new treatments for CLL and to examine phenotypic and biomarker relationships