LOGOS EX MACHINA: A REASONED APPROACH TOWARD CANCER
by
Andrew Avila, B.S., M.S.
A Dissertation In Biological Sciences
Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Approved
Lauren Gollahon, Chairperson of the Committee
Rich Strauss
Sean Rice
Boyd Butler
Richard Watson
Peggy Gordon Miller, Dean of the Graduate School

May, 2012
© 2012, Andrew Avila
Texas Tech University, Andrew Avila, May 2012

ACKNOWLEDGEMENTS
I wish to acknowledge the incredible support given to me by my major adviser, Dr. Lauren Gollahon. Without your guidance, surely I would not have made it as far as I have. Furthermore, the intellectual exchange I have shared with my advisory committee these long years has propelled me to new heights of inquiry I had not dreamed of even in the most lucid of my imaginings. That their continual intellectual challenges have provoked and evoked a subtle sense of natural wisdom is an ode to their efficacy in guiding the aspirant to the well of knowledge. For this initiation into the mysteries of nature I cannot thank my advisory committee enough.
I also wish to thank the Vice President of Research for the fellowship which sustained the initial couple of years of my residency at Texas Tech. Furthermore, my appreciation of the support provided to me by the Biology Department, financial and otherwise, cannot be overstated. Finally, I also wish to acknowledge the individuals working at the High Performance Computing Center; without your tireless support in maintaining the cluster I would not have completed the sheer amount of research that I have.
To my parents, there is nothing I can say that would enunciate my feelings of appreciation. Your ongoing love and support is remarkable if not extraordinary. Truly, I am thankful to have you.

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
I. INTRODUCTION
    Background and Significance
    Hypotheses
    Chapter Summaries
    Bibliography
II. DEVELOPMENT OF AN ABSTRACT LOGICAL MODEL FOR EXPRESSING GENETIC RELATIONSHIPS
    Introduction
        Boolean Networks
        Bayesian Networks
        Knowledge Based Approaches
        Answer Set Programming and Action Languages
        Summary
    Materials and Methods
    Results
    Discussion
    Bibliography
    Appendix A
    Appendix B
III. AUTOMATION OF LOGICAL MODEL GENERATION AND ANALYSIS FOR HOMO SAPIENS
    Introduction
        Public Genetic and Biochemical Databases
        Representation Formats
        Summary
    Materials and Methods
    Results
    Discussion
    Bibliography
    Appendix A
IV. ANALYSIS OF DIFFERENTIALLY EXPRESSED GENES IN HUMAN CANCER CELL LINES
    Introduction
        Microarray Meta-Analyses of Cancer
        Summary
    Materials and Methods
    Results
    Discussion
    Bibliography
    Appendix A
V. ANALYSIS OF DIFFERENTIALLY EXPRESSED GENES IN TISSUE SAMPLES OF BREAST CANCER
    Introduction
        Breast Cancer Initiation per the SMT
        Oncogenesis Via Chromothripsis
        Tumor Virology
        The Tissue Organization Field Theory of Cancer
        Summary
    Materials and Methods
    Results
    Discussion
    Bibliography
VI. CONCLUSIONS

ABSTRACT
Limitations in our current ability to integrate a diverse spectrum of genetic information in an effort to elucidate the underlying causes of cancer have spawned the need for a novel cancer modeling approach.
Public repositories of biological pathways and gene expression experiments were combined in order to provide a systems biology approach toward cancer. Furthermore, by unifying these sources of knowledge, the ability to predict expression levels of unmeasured genes was developed. This technique was then applied to a variety of cancer types in order to resolve commonalities between heretofore divergent (or disparate) cancers. The results generated in this manner revealed characteristics that challenge the current prevailing paradigm of cancer. Specifically, the significant upregulation of oncogenes and significant downregulation of tumor suppressor genes predicted by the Somatic Mutation Theory of Cancer were not found. In contrast, it was found that oncogenes were significantly downregulated and tumor suppressor genes were upregulated among the cancers examined. Furthermore, the results demonstrate the differential expression, in cancer cells, of genes involved in the cellular differentiation and wound healing processes. These results were used as a springboard to develop a novel oncogenesis hypothesis, named Umbracesis. In short, the Umbracesis hypothesis proposes that disruption of the wound healing process via carcinogens occurs in such a way as to prevent organismic homeostasis from being recovered or to prevent full re-differentiation of dedifferentiated cells. The former concept is implicated in inflammatory cancers, whereas the latter is implicated in cancers that show characteristics associated with embryonic tissues. It was concluded that the modeling approach developed within this study has implications beyond cancer and may be of use within other areas of biomedical concern.

LIST OF TABLES
2.1 The individual relationships of an instance of a model that was implemented in Appendix B.
2.2 The application of a variety of initial conditions to the model listed in Appendix B and their solutions.
4.1 Genes that are significantly underexpressed in cancer (cervical, breast, and mesothelioma); sorted from most significant to least significant based on distribution mean.
4.2 Genes that are significantly overexpressed in cancer (cervical, breast, and mesothelioma); sorted from most significant to least significant based on distribution mean.
4.3 Genes that are significantly underexpressed in cervical cancer (GEO Dataset GDS3233); sorted from most significant to least significant based on distribution mean.
4.4 Genes that are significantly overexpressed in cervical cancer (GEO Dataset GDS3233); sorted from most significant to least significant based on distribution mean.
4.5 Genes that are significantly underexpressed in breast cancer (GEO Dataset GDS820); sorted from most significant to least significant based on distribution mean.
4.6 Genes that are significantly overexpressed in breast cancer (GEO Dataset GDS820); sorted from most significant to least significant based on distribution mean.
4.7 Genes that are significantly underexpressed in mesothelioma (GEO Dataset GDS1220); sorted from most significant to least significant based on distribution mean.
4.8 Genes that are significantly overexpressed in mesothelioma (GEO Dataset GDS1220); sorted from most significant to least significant based on distribution mean.
5.1 Genes that are significantly underexpressed in breast cancer (GEO Dataset GDS3324); sorted from most significant to least significant based on distribution mean.
5.2 Genes that are significantly overexpressed in breast cancer (GEO Dataset GDS3324); sorted from most significant to least significant based on distribution mean.

LIST OF FIGURES
2.1 The network representing the causal relationship between the variables (a through i). Different arrows are used for edges leading to e in order to signify that they uniquely imply e and are not co-dependent. The relationship between a and i is of a non-direct, inverse type.
3.1 A gene cluster focusing on the gene IL1R1 and related elements. This is from the HeLa cervical cancer cell line (GEO Dataset GDS3233).
3.2 An expansion upon Figure 3.1 with a recursive depth of one. Any gene (or related element) that connects to a gene (or related element) in Figure 3.1 has been added to the figure.
3.3 An expansion upon Figure 3.1 with a recursive depth of two. Any gene (or related element) that connects to a gene (or related element) in Figure 3.2 has been added to the figure.
3.4 A network demonstrating the ability to deduce the expression level of protein complexes from the expression levels of genes. Specifically, this network demonstrates the binding of FASL to the FAS Receptor in order to produce the FASL FAS Receptor Monomer. The expression level of the FASL FAS Receptor Trimer is deduced from that of the FASL FAS Receptor Monomer.
4.1 A graph showing the differential expression of genes in cervical cancer (GEO Dataset GDS3233); specifically focusing on Semaphorin 5A and connected elements.
4.2 A graph showing the differential expression of genes in cervical cancer (GEO Dataset GDS3233); specifically focusing on Semaphorin 6D and connected elements.
4.3 A graph showing the differential expression of genes in breast cancer (GEO Dataset GDS820); specifically focusing on MED12 and connected elements.
4.4 A graph showing