Microarray Bioinformatics and Its Applications to Clinical Research
Total Page:16
File Type:pdf, Size:1020Kb
Microarray Bioinformatics and Its Applications to Clinical Research A dissertation presented to the School of Electrical and Information Engineering of the University of Sydney in fulfillment of the requirements for the degree of Doctor of Philosophy i JLI ··_L - -> ...·. ...,. by Ilene Y. Chen Acknowledgment This thesis owes its existence to the mercy, support and inspiration of many people. In the first place, having suffering from adult-onset asthma, interstitial cystitis and cold agglutinin disease, I would like to express my deepest sense of appreciation and gratitude to Professors Hong Yan and David Levy for harbouring me these last three years and providing me a place at the University of Sydney to pursue a very meaningful course of research. I am also indebted to Dr. Craig Jin, who has been a source of enthusiasm and encouragement on my research over many years. In the second place, for contexts concerning biological and medical aspects covered in this thesis, I am very indebted to Dr. Ling-Hong Tseng, Dr. Shian-Sehn Shie, Dr. Wen-Hung Chung and Professor Chyi-Long Lee at Change Gung Memorial Hospital and University of Chang Gung School of Medicine (Taoyuan, Taiwan) as well as Professor Keith Lloyd at University of Alabama School of Medicine (AL, USA). All of them have contributed substantially to this work. In the third place, I would like to thank Mrs. Inge Rogers and Mr. William Ballinger for their helpful comments and suggestions for the writing of my papers and thesis. In the fourth place, I would like to thank my swim coach, Hirota Homma. Hirota is a master coach at North Sydney Aussie Masters Swimming Club. Although the asthma, interstitial cystitis and cold agglutinin disease I had suffered were untreatable or incurable, Hirota taught and used "Total Immersion" swimming to increase my overall fitness level. Not only had I completely got rid off the asthma, interstitial cystitis and cold agglutinin disease one and a half years ago, but I can also swim like Michael Phelps at this time. Finally, I owe a special gratitude to my parents for continuous and unconditional support of all my understandings, the funds to pursue my PhD and many medical expenses. II Microarray Bioinformatics and Its Applications to Clinical Research Ilene Y Chen University of Sydney 2009 Abstract This thesis provides an overview of my research over the past three years into microarray gene expression data analysis in clinical research. This work involves using bioinformatics techniques to study disease-related microarray gene expression data. These bioinformatics methods contain novel features which are of considerable biological interest. Fundamental to this thesis is that the mass of numbers produced by DNA microarrays amounting to hundreds of data points for thousands or tens of thousands of genes are primarily driven by their genetic roots in the human genome. It enables the genome to become a unifying explanation for all of human biology and medicine. This thesis is composed of three parts. In the first part, I describe the fundamental research strategies in which DNA microarrays have started to affect clinical research. In the second part of the work, I provide a choice of bioinformatics techniques to support the use of genome-based expression profiling as a commonplace microarray platform for minimally invasive diagnostic tests and therapeutic interventions in the study of diseases. In the third part, I employ the bioinformatics techniques presented in the second part to identify the molecular biological mechanisms as well as to pinpoint potential targets for drug discovery and diagnostics. This work, as described in the three parts, leads to the facility of genome-based expression profiling as a commonplace platform for diagnosis and therapeutic interventions of almost any kind of disease in clinical practice in a straightforward manner. iii Microarray Bioinformatics and Its Applications to Clinical Research Ilene Y Chen University of Sydney 2009 Table of Contents Abstract (ii) List of Tables (viii) List of Figures (x) 1 DNA Microarray Technology and Clinical Research 1 1.1 Introduction 1 1.2 The DNA Microarray Technology 3 1.3 Making Microarray Platforms 3 1.4 Microarray Gene Expression Profiling 6 1.5 Study Designs 8 1.5.1 Retrospective Studies 8 1. 5. 2 Prospective studies 9 1.6 Exploratory Studies vs. Analytical Studies 10 1. 7 Choosing Among Research Strategies 12 2 Analysis of Variant Genes 14 2.1 Introduction 14 2.2 Measure of Association 14 2.3 Confounding 16 2.4 Source of Bias 18 '' '2.5 Matching 19 2.6 The Linear Logistic Regression Model 20 2. 7 The Fixed Effect Logistic Regression Model 25 2.8 Entrez Gene 28 3 Clustering of Gene Expression Profiles 32 3.1 Introduction 32 1v 3.2 Distance Measures 33 3.3 Hierarchical Clustering 34 3.4 K-means Clustering 35 4 DNA Microarrays and Endurance Training-induced Physiological Responses 37 4.1 Introduction 37 4.2 Microarray Data Analysis 39 4.3 Conclusion 47 5 DNA Microarrays for the Study of Cancer 48 5.1 Introduction 51 5.2 Microarray Data Analysis 51 5.3.1 Childhood Acute Lymphoblastic Leukaemia 5.3.2 Waldenstrom's Macroglobulinemia 55 5.3.3 Bladder Cancer 62 5.3.4 Prostate Cancer 70 5.3.5 Serous Ovarian Cancer 73 5.3.6 Breast Cancer 76 5.3 Conclusion 81 6 DNA Microarrays for the Study of Other Diseases 82 6.1 Introduction 82 6.2 Microarray Data Analysis 83 6.2.1 Interstitial Cystitis 84 6.2.2 Asthma 95 6.2.3 Obesity and Type 2 Diabetes 101 6.2.4 HIV/AID 106 6.2.5 C. Pneumoniae Infection 117 6.2.6 Cardiac Allograft Rejection 123 6. 2. 7 Kidney Allograft Rejection 132 6.2.8 Human Endometrial Receptivity throughout Menstrual Cycles 139 6.2.9 Ovarian Endometriosis 148 6.2.10 Inflammatory Acne 157 6.3 Conclusion 161 7 Conclusions 162 v 7.1 Summary of Research Contributions 162 7. 2 Perspectives 165 7.3 Conclusion Remarks 166 Bibliography 167 VI List of Tables 2.1 Gene specific information that can be retrieved through Entrez Gene, how the data are revealed, and some aspects of how those data are processed 29 4.1 Variant genes in vastus lateralis muscle biopsies from subjects following a 50-minute endurance training course at 20-week after undergoing the endurance training course. Variant genes are prioritized relating to the following criteria: the estimated odds ratio, OR, greater than or equal to unity 40 5.1 Diagnostic discriminating genes in bone marrow samples from pediatric patients with B-ALL, T-ALL and B-ALL through MLL/ AF4 rearrangement 53 5.2 Diagnostic discriminating genes whose expressions are up-regulated in plasma cells from patients with Waldenstrbm's macroglobulinemia, down-regulated in B lymphocytes from those with chronic lymphoblastic leukaemia, and moderately expressed in plasma cells from those with multiple myeloma 58 5.3 Highly differentially expressed genes in B lymphocytes from patients with Waldenstrbm's macroglobulinemia and chronic lymphoblastic leukaemia 59 5.4 Diagnostic discriminating genes in bladder biopsies from patients with superficial transitional cell carcinoma with or without CIS lesions, and muscle invasive carcinomas 65 5. 5 Diagnostic discriminating genes whose expressions are up-regulated in prostate epithelial cells of Gleason patterns 4 and 5, and moderately expressed from those with Gleason pattern 3 72 5.6 Highly differentially expressed genes in serous ovarian adenocarcinomas 75 vii 5. 7 Diagnostic discriminating genes whose expressions are up-regulated in tumorigenic breast cancer, moderately expressed in non-tumorigenic breast cancer, and down-regulated in normal brest epithelium 78 6.1 Significant differentially expressed genes in response to IC stimulus using fixed effect logistic regression model 85 6.2 Highly differentially expressed genes in the human umbilical cord blood-derived mast cells stimulated via the high-affinity IgE receptor of Fe episilon RI 93 6.3 Highly differentially expressed genes associated with the development of obese diabetes 98 6.4 Highly differentially expressed genes in ex vivo human CD4+T cell from patients infected with HIV-1 within 6 months of study 110 6.5 Highly differentially genes in the macrophages of 57BL/6 mice after infection with C. pneumoniae 119 6.6 Highly differentially expressed gene in cardiac rejecting allografts 125 6. 7 Highly differentially expressed genes in kidney biopsies from transplant patients undergoing acute rejection 135 6.8 Diagnostic discriminating genes whose expressions are up-regulated in the endometrium at mid-secretory phase, moderately expressed for those at fate-secretory phase, and down-regulated for those at early-secretory phase 143 6. 9 Highly differentially expressed genes associated with ovarian endometriosis lesions 151 6.10 Highly differentially expressed genes in inflammatory acne 159 Vlll List of Figures 1.1 Gene expression analysis of two tissue samples using spotted DNA microarray 4 1.2 Gene expression analysis using oligonucleotide microarray 4 1.3 A clinical trial 8 2.1 (a) Situations in which gene B is a confounder for an association between gene A and disease D. (b) Situations in which gene B is not a confounder for an association between gene A and diseaseD 17 4.1 Reference KEGG pathway representing the regulation of axon Guidance 43 4.2 Reference KEGG pathway representing the regulation of actin cytoskeleton 44 4.3 The KEGG reference pathway representing the regulation of PPAR signaling transduction 46 5.1 K-means clustering results of microarray gene expression profiling of bone marrow samples from pediatric patients with B-ALL, T-ALL and