A META ANALYSIS of STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA INTEGRATION a Dissertation Submitted in Partial Satisfaction of the Requirements for the Degree Of
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF CALIFORNIA SANTA CRUZ STEMNESS REVISITED: A META ANALYSIS OF STEM CELL SIGNATURES USING HIGH-THROUGHPUT DATA INTEGRATION A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in BIOINFORMATICS by Martina I. Koeva December 2009 The Dissertation of Martina I. Koeva is approved: Professor Joshua Stuart, Chair Professor Camilla Forsberg Professor Kevin Karplus Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright c by ! Martina I. Koeva 2009 Table of Contents List of Figures vii List of Tables x Abstract xii Dedication xiv Acknowledgments xv 1 Introduction 1 1.1 Motivation and problem statement . 1 1.2 Goals of the study . 3 1.3 Overview of main results . 3 1.4 Organization of the dissertation . 5 2 Stem cells 6 2.1 Normal stem cells . 7 2.1.1 Definitions, origins and key properties of normal stem cells . 7 2.1.2 Normal stem cell types . 10 2.1.3 Known functional pathways that regulate normal stem cell behavior 17 2.2 Relationship between stem cells and cancer . 23 2.2.1 Cancer and tumor heterogeneity . 23 2.2.2 Cancer evolution theories . 25 2.2.3 Cancer stem cells . 28 2.2.4 Shared mechanisms between normal and cancer stem cells . 30 2.2.5 Metastasis and stem cells . 32 2.3 Side populations . 35 2.4 High-throughput technologies and stem cells . 36 2.4.1 High-throughput technologies . 36 2.4.2 High-throughput data . 40 2.4.3 Public data repositories . 43 iii 2.5 Summary . 44 3 Meta-analysis 45 3.1 Overview of Meta-analysis . 45 3.2 Meta-analysis and microarray data . 46 3.3 Techniques for combining study-specific effects . 47 3.3.1 Vote counting . 48 3.3.2 Effect size combination . 49 3.4 Choice of techniques for the stemness meta-analysis . 50 3.5 Summary . 51 4 Previous expression-based approaches to stemness 52 4.1 Gene-level approaches to stemness . 53 4.1.1 Founder studies . 53 4.2 Global-level approaches to stemness . 57 4.3 Summary . 60 5 Stemness Meta-Analysis Method 61 5.1 Stemness Meta-Analysis method overview . 62 5.2 Input data sources . 68 5.2.1 Mouse profiling studies . 68 5.2.2 Human profiling studies . 73 5.3 Input gene modules . 78 5.3.1 Homolog gene families (modules) . 78 5.3.2 Functional gene modules . 81 5.4 Recurrence scoring . 84 5.4.1 Notation definitions . 85 5.4.2 General form of a recurrence score . 86 5.4.3 Simulation of synthetic module data . 88 5.4.4 Evaluation and selection of a recurrence score . 92 5.4.5 Significance of recurrence scores . 95 5.5 Diversity scoring . 98 5.5.1 Notation definitions . 98 5.5.2 Cell type and gene usage diversity . 99 5.5.3 Significance of diversity scores . 101 5.6 Specificity scoring . 102 5.7 Pattern classification . 103 5.8 Stemness “on” and stemness “off” modules . 106 5.9 Formulation of stemness index . 107 5.9.1 Notation definitions . 108 5.9.2 Cross-validation setup . 108 5.9.3 Binary switch and fractional scoring . 112 5.9.4 Log-likelihood approach . 119 iv 5.10 Summary . 125 6 Stemness mechanisms in mouse stem cells 127 6.1 Identification of recurrent modules from mouse dataset compendium . 128 6.1.1 Recurrent module swap control . 131 6.1.2 Cultured cell bias . 133 6.2 Selection of significant of cell-diversity scores . 135 6.3 Classification of modules using diversity and specificity scoring . 138 6.3.1 Single-gene stemness . 138 6.3.2 Module-level stemness . 138 6.4 Stemness modules . 147 6.4.1 Oncogenes: Myb family . 149 6.4.2 Tumor suppressor factors: Sfrp family . 150 6.4.3 NM23 family . 152 6.4.4 Chaperone roles: Heat shock (Hspa) and importin families . 155 6.4.5 Lineage-specific gene inhibition: Inhibitor of differentiation/DNA binding (Id) family . 157 6.5 Comparison with other global stemness methods . 159 6.6 Differentiation modules . 161 6.7 Summary . 166 7 Stemness mechanisms in human stem cells 167 7.1 Recurrent modules in a human stem cell compendium . 168 7.2 Cell type diversity assessment and classification of recurrent modules . 176 7.3 Human stemness modules . 180 7.3.1 Angiogenesis: FGFR/FLT/PDGFR family . 180 7.3.2 Heparan sulfate proteoglycans (HSPGs): Glypican family . 182 7.4 Mammalian stemness modules: Notch, TCF/LEF, Frizzled, Integrin and Chd families . 183 7.4.1 Cell adhesion and communication: Integrin alpha family . 183 7.4.2 Wnt pathway: Tcf/LEF and Frizzled families . 184 7.4.3 Chromatin: Chd/Smarca family . 185 7.5 Summary . 186 8 Applications of stemness mechanisms to stem cell and cancer classifi- cation 187 8.1 Motivation and overview . 188 8.2 Stem cell-like populations . 189 8.3 What about side populations? . 194 8.4 Are normal stemness mechanisms conserved in cancer stem cells? . 195 8.5 Stemness in metastasis . 199 8.6 Summary . 199 v 9 Conclusion 202 9.1 Discussion . 202 9.2 Future directions . 205 9.2.1 Application to human data . 205 9.2.2 Application to alternative splicing and miRNA data . 205 9.2.3 Addition of niche data . 206 9.2.4 Methodological improvements . 206 9.2.5 Possible biological experiments . 207 A Definitions of key terms 209 B Tables of stemness modules 213 Bibliography 218 vi List of Figures 2.1 Hierarchical organization of stem cell types within the ectoderm, meso- derm, and endoderm lineages. 8 2.2 Stem cells can undergo symmetric or asymmetric cell division. 10 2.3 Hematopoietic stem cell system differentiation tree. 12 2.4 General scheme of the fluorescence-activated cell-sorting (FACS) flow cy- tometry technique. 13 2.5 Wnt pathway activation scheme. 18 2.6 Notch pathway activation scheme. 20 2.7 TGF beta pathway example activation scheme. 22 2.8 Clonal evolution cancer expansion model. 26 2.9 Cancer stem cell expansion model. 27 2.10 Illustration of functional self-renewal assay. 29 2.11 Illustration of the role of Bmi1 in self-renewal. 31 2.12 A FACS plot for a toy example that illustrates the isolation and definition of a side population. 37 4.1 Overlap between the genes upregulated in four stem cell types in three stemness founder studies. 54 5.1 Overview of stemness meta-analysis method. 62 5.2 Published gene list (PGL) input form to Stemness Meta-Analysis Method (SMA). 64 5.3 Module-level view of stemness. 67 5.4 Classification of modules based on cell-diversity, specificity and gene- diversity scores. 69 5.5 Global inter-experiment similarities in mouse stem cell compendium. 74 5.6 Global inter-experiment similarities in human stem cell compendium. 77 5.7 Protein similarity network generation approach used for homolog family definition. 80 5.8 Possible pitfalls of the homolog module definition methodology. 80 5.9 Pattern classification procedure used to identify stemness modules . 104 vii 5.10 Visual illustration of the five-fold cross-validation setup. 109 5.11 Robustness of recurrence scores across cross-validation folds. 113 5.12 Precision-recall comparison of four stemness index scores. 116 5.13 Precision comparison between stemness index scores. 117 5.14 Precision-recall comparison of twelve stemness index scores, based on different feature types and elements. 123 5.15 Precision-recall comparison of the real stemness and differentiation fea- tures to a randomly selected feature set. 124 5.16 A box-plot comparison of the stemness indices of all stem cell and differ- entiated cell experiments in the mouse stem cell compendium. ..