BIOINFORMATICS MINING OF THE DARK MATTER PROTEOME FOR CANCER TARGETS DISCOVERY

by

Ana Paula Delgado

A Thesis Submitted to the Faculty of The Charles E. Schmidt College of Science in Partial Fulfillment of the Requirements for the Degree of Master of Science

Florida Atlantic University
Boca Raton, Florida
May 2015

Copyright 2015 by Ana Paula Delgado

ACKNOWLEDGEMENTS

I would first like to thank Dr. Narayanan for his continuous encouragement, guidance, and support during the past two years of my graduate education. It has truly been an unforgettable experience working in his laboratory. I also want to express gratitude to my external advisor, Professor Van de Ven from the University of Leuven, Belgium, for his constant involvement and assistance on my project. Moreover, I would like to thank Dr. Binninger and Dr. Dawson-Scully for their advice and for agreeing to serve on my thesis committee. I also thank Provost Dr. Perry for his involvement in my project. I thank Jeanine Narayanan for editorial assistance with the publications and with this dissertation. It has been a pleasure working with various undergraduate students, some of whom became lab mates, including Pamela Brandao, Maria Julia Chapado and Sheilin Hamid. I thank them for their expert help in the projects we were involved in.

Lastly, I want to express my profound thanks to my parents and brother for their unconditional love, support and guidance over the last couple of years. They were my rock when I was in doubt and never let me give up. I would also like to thank my boyfriend Spencer Daniel and my best friends for being part of an incredible support system.

ABSTRACT

Author: Ana Paula Delgado
Title: Bioinformatics Mining of the Dark Matter Proteome for Cancer Targets Discovery
Institution: Florida Atlantic University
Thesis Advisor: Dr. Ramaswamy Narayanan
Degree: Master of Science
Year: 2015

Mining the human genome for the discovery of therapeutic targets promises novel outcomes.
Over half of the proteins in the human genome, however, remain uncharacterized. These proteins offer potential for the discovery of new targets for diverse diseases. Additional targets for cancer diagnosis and therapy are urgently needed to help move away from the cytotoxic era to a targeted therapy approach. Bioinformatics and proteomics approaches can be used to characterize novel sequences in the genome database and to infer putative function. The hypothesis that the amino acid motifs and protein domains of the uncharacterized proteins can be used as a starting point to predict the putative function of these proteins provided the framework for the research discussed in this dissertation. Initially, a comprehensive atlas of 800 uncharacterized proteins was established using meta-analysis approaches. The novel genes in the atlas were characterized using a streamlined strategy involving genome-wide association studies, transcriptome- and proteome-based expression studies, motif and domain analysis, and interactome and pathway mapping tools. Druggable proteins such as enzymes, channel proteins, receptors and transporters, as well as secreted protein biomarkers, were identified among the novel proteins. Genome association studies show the involvement of these novel proteins in multiple diseases. An uncharacterized calcium-binding Carcinoma-Related EF-hand Protein (CREF) was chosen as a proof of concept for the approach. A comprehensive characterization of the CREF protein suggests drug therapy and pharmacogenomics potential in breast, liver and lung carcinomas. The atlas of the novel proteins generated in this study provides a rationale for the discovery of new targets for cancer and other diseases.

BIOINFORMATICS MINING OF THE DARK MATTER PROTEOME FOR CANCER TARGETS DISCOVERY

List of Tables
List of Figures
Specific Aims
Chapter 1: Background and Significance
  Gene and Cancer
  Bioinformatics and Cancer Gene Discovery
  Novel ESTs and Uncharacterized Proteins
  Illuminating the Dark Matter of the Human Genome
  Protein Domains, Motifs, and Fingerprints
  Genome Wide Association Studies
  Genome to Phenome Studies
  Pharmacogenomics
  Drug Discovery
  Biomarkers
  Significance
Chapter 2: Materials and Methods
  Genome Analysis
  Transcriptome Analysis
  Proteome Analysis
  Knowledge-based Datamining
  Textmining Query Definition
  GeneALaCart (LifeMap Discovery) Batch Analysis
  Data Analysis
  Statistics
Chapter 3: The Approach and Experimental Design
  Database Generation
  The Approach
  Expression Analysis
  The mRNA Expression
  The Protein Expression
  Protein Class
  Characterization of Protein Motifs
  Structure, Interaction and Pathway Identification
  GWAS and Genome to Phenome Analysis
  Putative Diagnostic/Druggable Targets
Chapter 4: Results
  Characterization of the Carcinoma Related EF-Hand Protein
  Expression Profiling of the CREF Protein
  GWAS of the CREF Gene
  Comprehensive Mutational Analysis of the CREF Gene
  Molecular Characterization of the CREF Gene
  Regulation of the CREF Gene Expression
  Effect of Knockout of Genes on CREF
  Effect of Knockdown of Genes on CREF
  Effect of Overexpression of Genes on CREF
  Effect of Mutations of Genes on CREF
  Effect of Methylation on CREF Gene Regulation
  MicroRNA Regulation of the CREF Gene
  Protein Motif and Domains Analysis
  Structural Characterization
  3D Modeling of the CREF Protein
  Templates Used by I-TASSER
  Proteins with Highly Similar Structure in PDB
  Post Translation Modification
  Interactome Mapping for the CREF Protein
  Involvement of CREF in Other Diseases