COLON CANCER AND ITS MOLECULAR SUBSYSTEMS: NETWORK APPROACHES TO DISSECTING DRIVER GENE BIOLOGY
by VISHAL N. PATEL
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Genetics CASE WESTERN RESERVE UNIVERSITY
August 2011
Case Western Reserve University
School of Graduate Studies
We hereby approve the dissertation* of Vishal N. Patel, candidate for
the degree of Doctor of Philosophy on July 6, 2011.
Committee Chair: Georgia Wiesner
Mark R. Chance
Sudha Iyengar
Mehmet Koyuturk
* We also certify that written approval has been obtained for any proprietary material contained therein
Table of Contents

I. List of Tables
II. List of Figures
III. List of Abbreviations
IV. Glossary
V. Abstract
VI. Colon Cancer and its Molecular Subsystems: Construction, Interpretation, and Validation
   a. Colon Cancer
      i. Etiology
      ii. Development
      iii. The Pathway Paradigm
      iv. Cancer Subtypes and Therapies
   b. Molecular Subsystems
      i. Introduction
      ii. Construction
      iii. Interpretation
      iv. Validation
   c. Summary
VII. Prostaglandin dehydrogenase signaling: One driver and an unknown path
   a. Introduction
      i. Colon Cancer
      ii. Mass spectrometry
   b. Methods
   c. Results
   d. Discussion
   e. Summary
VIII. Apc signaling with Candidate Drivers: Two drivers, many paths
   a. Introduction
   b. Methods
   c. Results and Discussion
   d. Summary
IX. Apc-Cdkn1a signaling: Two drivers, one path
   a. Introduction
   b. Methods
   c. Results
      i. Driver Gene Network Prediction
      ii. Single Node Perturbations
         1. mRNA profiling
         2. Proteomic profiling
   d. Discussion
   e. Summary
X. Molecular Subsystems in Cancer: The Present to the Future
XI. Appendix I: Prostaglandin dehydrogenase associated data
XII. Appendix II: Apc and Cdkn1a associated data
XIII. Appendix III: Apc and Cdkn1a additional methods
XIV. References
List of Tables

1. Examples of molecular subsystems
2. Published evidence of interactions in the Apc-Cdkn1a network
List of Figures

1. Diagram of colorectal cancer development
2. Pathways of Tp53
3. Manifolds used in cancer biology
4. Peptides to networks
5. Bivariate t distribution
6. Mixing angle shrinkage
7. Hpgd data processing flowchart
8. Genomic width of proteomic data
9. False discovery proportion for high-throughput validation
10. False discovery proportion for individual validation
11. Hsd11b2 and Selenbp1: alternative splicing candidates
12. Coexpression heatmap
13. HNF4A binding to proteomic targets
14. Mechanism by which Hpgd exerts its influence on both inflammation and neoplasia
15. PETALS flowchart
16. Blossom Algorithm
17. Calculating bimodality of coexpression
18. Apc blossom
19. Top 3 petals highly coexpressed with Apc proteomic targets
20. Hapln1 petal
21. Flowchart for mining pathways from PPI data
22. Apc-Cdkn1a pathway
23. Differential expression of the Apc-Cdkn1a network
24. Proteomic interactions of the Apc-Cdkn1a network
25. Proteomic coexpression of the Apc-Cdkn1a network
List of Abbreviations

Acronym or Abbreviation – Full Name

2D DIGE – Two-Dimensional In-Gel Electrophoresis
AP/MS – Affinity Purification Mass Spectrometry
BioGRID – Biological General Repository for Interaction Datasets
CAN – Candidate cancer driver (gene)
CGH – Comparative Genomic Hybridization
ChIP-chip – Chromatin Immunoprecipitation chip
COX – Cyclooxygenase
DIGE – See "2D DIGE"
DTT – Dithiothreitol
FAP – Familial Adenomatous Polyposis coli
FDP – False Discovery Proportion
FDR – False Discovery Rate
FWER – Family-wise Error Rate
GO – Gene Ontology
GWA – Genome-wide Association (study)
HPRD – Human Protein Reference Database
IBD – Inflammatory bowel disease
KEGG – Kyoto Encyclopedia of Genes and Genomes
KO – Knockout
KS – Kolmogorov-Smirnov
LC-MS/MS – Liquid chromatography coupled tandem mass spectrometry
MIAME – Minimum Information About a Microarray Experiment
MIPS – Mammalian Protein Interaction Database
mRNA – Messenger RNA
MT – Mutant
OMIM – Online Mendelian Inheritance in Man
Pfam – Protein Family (database)
PGE2 – Prostaglandin E2
PPI – Protein-protein interaction
PPIN – Protein-protein interaction network
PTM – Post-translational modification
SDS-PAGE – Sodium dodecyl sulfate polyacrylamide gel electrophoresis
siRNA – Small interfering RNA
SNP – Single Nucleotide Polymorphism
TDP – True Discovery Proportion
UC – Ulcerative colitis
WT – Wild type
Y2H – Yeast Two-Hybrid
Glossary

Adjacency matrix – Binary matrix; describes the node and edge set of a graph
Affinity Purification Mass Spectrometry – Process of pulling down a target protein using antibodies, followed by subsequent analysis of the interacting proteins via MS
Alternative splicing – Alternative patterns of combining exons at the pre-mRNA level; performed by the spliceosome
Bimodality – Statistical parameter quantifying the density of extreme correlation values between two distributions
Binomial probability – The probability of a success in a sequence of independent experiments
Biological Unit – A peptide, exon, or protein
Bivariate t distribution – A two-dimensional Student's t distribution
Blossom – A conglomeration of petals linked via a central regulatory gene
Bootstrap (density) – Approach to approximating the null distribution; simulates new experiments by randomly sampling from the given samples
Chromatin Immunoprecipitation chip – Cross-linking of proteins to DNA, followed by antibody pull-down of the target protein, and hybridization of bound DNA to an array
Chromatogram – Time-dependent trace of peptides eluting from a chromatographic column
Clustering coefficient – A measure of the degree to which nodes in a graph cluster together; a reflection of edge density among nodes
Coexpression – Correlation (Pearson's or Spearman's) of mRNA expression measurements
Comparative Genomic Hybridization – Array-based approach measuring abundance of DNA across the genome for the analysis of copy number changes
Complement (probability) – The probability of an event not happening
Cooperativity – A principle describing the functional interaction between genes
Correlation – Covariance normalized by each variable's variance such that the correlation ranges from -1 to 1
Covariance – A measure of how two variables change together
Degrees of freedom – A measure of the amount of information available to estimate an unknown parameter
Density – The distribution governing the likelihood of observing particular values of a random variable
Differential expression – Change in mRNA or protein expression when comparing two experimental groups
Differentially expressed – See "Differential expression"
Driver (gene) – A gene that, when mutated, contributes to growth and development of a tumor
Dysregulation – Patterns of mRNA/protein expression that deviate (e.g. in cancer) from "normal" or wild-type patterns
Edge – A line in an interaction graph depicting a relationship between two nodes
Eigenvalues – Scalar values associated with the eigenvectors of a matrix
Eigenvector – Vector, x, of a matrix, A; when x is multiplied by A, a scaled version of x is returned
Empirical null distribution – Null density estimated from the data
Empirical Type I Error Rate – Probability of a false positive; estimated from the data
False Discovery Rate – Probability of a false positive among the set of tests rejecting the null hypothesis
False negative – A situation where the null hypothesis should have been rejected, but was not
False positive – An incorrectly rejected null hypothesis
Family-wise Error Rate – A method of estimating the probability of making at least one false discovery among a family of tests (in multiple hypothesis testing)
Fisher's method (statistics) – Method of combining p-values by taking the sum of their logarithms
Gene Ontology – A hierarchical vocabulary for describing genes/proteins
Genome-wide Association – A study measuring thousands of polymorphisms across the genome to uncover genetic associations with disease
Genomic width – The length of a biological unit (in base pairs)
Heterarchy – A principle that treats all genes as equal
Hierarchy – A principle that gives importance to certain genes over others
High-dimensional – A situation in which the number of variables greatly exceeds the number of samples
Homologues – Genes with sequences conserved throughout evolution
Hotelling's T² – Multivariate statistical test; extension of the Student's t test
Hub – A gene/protein with many interactions
Hypomorph – Mutation that results in a gene with reduced function
Ion suppression – Co-elution of multiple intense ions prevents the fragmentation and sequencing of less abundant species
Joint distribution – Statistical distribution governing the behavior of a set of variables
Knockout – Homozygous deletion of a gene
Kuiper's test – Statistical test used to evaluate the similarity of two distributions
Liquid chromatography coupled tandem mass spectrometry – Chromatographic separation of a peptide mixture, followed directly by mass spectrometric fragmentation; allows for the identification of peptides
Manifold – A map or structure that organizes genes/proteins
Marginal (distribution) – The density governing a single variable in a multivariate analysis
Maximum likelihood estimation – A method of estimating the parameters of a statistical model
Mixing angle – A parameter (related to Pearson's correlation) in the bivariate t distribution that controls the correlation between the variables
Module – A group of genes organized according to similarity
Molecular Subsystem – A coupling of a manifold and a set of high-throughput molecular measurements
Multiple hypothesis testing – Significance testing of a large number of variables
Mutant – An organism, tissue, or cell possessing at least one (known) genetic aberration that differs from the "normal" or wild-type strain
Node – A vertex in an interaction graph; a protein in a PPI network
Null distribution – The distribution of a statistic under the null hypothesis
Parameterize – To obtain the parameters for a statistical model
Pathway – A series of biochemical processes whereby a (molecular) stimulus is transduced into a functional output
Petal – A subnetwork in a "blossom"
p-value – A statistical measure interpreted as the probability of seeing a value as large as the one observed if the null hypothesis were true
Regularization, Regularized – Modification of a matrix to allow for the unique computation of its inverse, often by introducing sparsity
Shrinkage – See "Regularization"
Significance testing – Use of p-values to evaluate the null hypothesis
Smith-Waterman Algorithm – Computational algorithm for performing local sequence alignments
Somatic mutation – A DNA mutation arising in somatic cells; not found in the germline
Sparsity – The introduction of zeros into a matrix
Spectral count – A measure of peptide intensity based on the number of MS/MS spectra collected for the parent ion
t location-scale distribution – The Student's t distribution parameterized by four variables: mean, degrees of freedom, location, and scale
Transcription factor – A protein that regulates DNA transcription
t statistic – A signal-to-noise measure between two experimental groups
Two-Dimensional In-Gel Electrophoresis – Procedure of separating proteins electrophoretically by isoelectric focusing and molecular weight
Type I Error Rate – Probability of a false positive; equivalent to the significance, or α, level of a test
Western blot – Gel electrophoresis-based approach for protein expression analysis; relies upon antibody targeting of the protein of interest
XIC – Extracted ion chromatogram: ion current for a predefined set of precursor masses
Colon Cancer and its Molecular Subsystems: Network Approaches to Dissecting Driver Gene Biology
Abstract by VISHAL N. PATEL
The progression of colorectal cancer is driven by the accumulation of mutations in a number of key genes, which synergize with each other to promote tumor growth. These co-regulatory genes are of particular importance as they provide alternative points for pharmacologic modulation of perturbed signaling pathways. In this work, we explored the signaling landscapes of three important regulatory genes – Hpgd, Apc, and Cdkn1a – using genetically engineered mice as models of colon cancer. We assayed the intestinal epithelium using genomic (mRNA arrays) and proteomic (2D gels and label-free proteomics) modalities.
To identify genes with a potential regulatory role, we employed protein interaction networks and mRNA coexpression networks to extrapolate from the proteomic measurements. In the process, we introduced novel bioinformatic approaches for testing network-based hypotheses in applications where the signaling pathway is known a priori. In addition, we introduced a novel statistical framework for interpreting label-free proteomic data in a genomic context. With the objective of identifying new regulatory targets, we found that the sparsity of proteomic data is ameliorated by relying on inferences from interaction networks and that these inferences are greatly improved by coupling proteomic data with genomic measurements.
1
Colorectal Cancer and its Molecular Subsystems: Construction, Interpretation, and Validation
Colon Cancer: Etiology
Colorectal cancer – a cancer of the large bowel – is the third leading cause of death among adult Americans, and it is estimated that 1 in 20 individuals will develop colorectal cancer in their lifetime 3. As the intestinal epithelium serves as a crucial medium for interaction with our environment (e.g. absorption of nutrients from food), it is not surprising that a variety of environmental triggers have been associated with an increased risk for developing colorectal cancer, such as smoking 4 and saturated fat intake 5, among others. However, environmental insults can explain only a fraction of cancer risk, and much of the remaining story arises from an individual's genetic background (see Figure 1). The most striking example of the genetic contribution to colorectal cancer can be found in the hereditary syndrome known as Familial Adenomatous Polyposis (FAP), in which numerous polyps grow throughout the large bowel at an early age, some of which invariably progress into adenocarcinomas 6. Known to run in an autosomal dominant fashion in families, the genetic basis for FAP was uncovered in the late 1980s as mapping uniquely to the gene Apc 7-9. In demonstrating the genetic basis for colorectal cancer, studies of FAP patients paved the way for recent genome-wide experiments searching for genetic risk factors 10.

Figure 1. Diagram of colorectal cancer development. An individual's genetic background and his/her environment contribute to the initiation of colorectal cancer by inducing somatic mutations in the epithelium and/or a chronic inflammatory response in the supportive tissue (called the lamina propria). The accumulation of genetic aberrations allows for clonal expansion of an aberrant precursor cell population, resulting in tumor formation.
Colon Cancer: Development
A lifetime of environmental insults, coupled with the natural aging process, results in
the accumulation of damage to the DNA of colonic epithelial cells. The genetic character of
non-hereditary colorectal cancer first became apparent in the early 1990s, when patterns of mutations were observed to correlate with disease severity: those aberrant crypt foci (ACFs) – the precursor epithelial lesion 11 – that progress to neoplasia are often characterized by mutations in Apc, and the growing tumor "builds" upon this first hit with mutations in additional genes, such as Kras and/or Tp53 12. By perturbing normal gene function, mutations can drastically alter the state of a cell, establishing new – and potentially uncontrollable – equilibria for cellular survival and proliferation. Cancer genes can be categorized as tumor suppressors if they suppress tumor growth or as oncogenes if they promote growth; consequently, mutated tumor suppressors (e.g. Tp53) often have reduced or abrogated functions, while mutated oncogenes (e.g. Kras) have enhanced or new functions in the tumor. While mutations in such genes that actively sustain tumor development – referred to as "drivers" – are necessary for tumor progression, an individual tumor may be littered with mutations in over 80 different genes 13, and these additional "passenger" mutations may have arisen simply due to the hyperproliferative and/or pro-mutagenic condition of the tumor.
While genetic aberrations are necessary for tumor initiation, promotion of tumor growth (see Figure 1) depends upon sustained injurious signals from the environment, which, in turn, induce further genetic damage. Though environmental insults can assault the epithelial cells directly, the supporting cells around the epithelium also mount a response to the insult by way of inflammation – a process observed in colorectal tumor specimens over a century ago 14. The chronic inflammatory response is characterized by an invasion of lymphocytes and other immune cells into the intestinal epithelium. As the inflammatory process is designed to repair tissue damage, the immune cells secrete a variety of communicatory molecules, called cytokines, which promote proliferation of the damaged epithelium, effectively allowing the cells to form a "wound" at the site of injury. By promoting cellular proliferation, inflammation creates an environment favorable to the accumulation of genetic insults. The inflammatory response can also damage DNA directly via the release of reactive oxygen and nitrogen species, free radical-containing molecules designed to damage invading bacteria and viruses 15.

Figure 2. Pathways of Tp53 (shown above as p53). Tp53 controls the transcription of Cdkn1a (shown above as p21), which, when mutated, also results in a predisposition to cancer. Tumorigenesis can also be triggered by mutations in Mdm2. Protein interactions are indicated by blue diamonds; red arrows indicate transcriptional induction. Genes with red boxes have been found to be mutated in the germline; those in green boxes have been found to be mutated only somatically. Reprinted by permission from Macmillan Publishers Ltd: Nature Medicine 2.
The Pathway Paradigm
While the accumulation of somatic mutations is the hallmark of cancer, it has
become increasingly clear that the mutant genes alone do not define the trajectory of the
disease. Indeed, patients whose tumors' mutational profiles are >85% distinct can still exhibit similar histopathologic grading and patient survival (personal analysis of publicly available
glioblastoma data 16 ). Given the apparent redundancy of gene function, the field of cancer biology relies upon a modular framework for molecular analysis, organizing genes according to signaling pathways : series of biochemical processes whereby a (molecular) stimulus is
transduced into a functional output (see Figure 2 for an example). Though individual genes
often appear to have their own identity (possessing unique functions and interaction
partners), their redundancy becomes apparent when considered in the context of a signaling
pathway. Mutated driver genes were recently found to co exist within signaling pathways 17 , supporting the hypothesis that multiple hits to a pathway or module are necessary to produce a functional effect. A more specific example involves the tumor suppressor Tp53 , a gene frequently mutated in late stage colorectal cancer 18 . As Tp53 controls the transcription of a cell cycle inhibitor, Cdkn1a (also known as p21 ), homozygous deletion of Cdkn1a in mouse models also results in a tumorigenic predisposition 19 ; amplification of Mdm2 can also
disrupt the Tp53 pathway (as in Figure 2).
In addition to functional redundancy, pathways also contain information on
interactions. As shown in Figure 2, genes can affect each other through direct, physical
interactions between their protein products, as well as via indirect regulation of gene
transcription. Often, mutant genes in separate pathways (which, therefore, do not interact with each other via the aforementioned mechanisms) synergize with each other, and these long-range genetic interactions are central to cancer biology, where dozens of mutations in myriad pathways are operating simultaneously. In fact, a mutant gene's functional role is strongly affected by the presence of these synergistic, accompanying mutations. A lone mutation in
Kras , for instance, leads only to self limiting lesions and/or ACFs, while a Kras mutation
secondary to a hit in Apc correlates with polyp formation and cancer 12 .
Cancer Subtypes and Therapies
In spite of the mutational heterogeneity found among colorectal cancer patients, it is clear that similarities between patients exist at more general phenotypic levels. Recently, various cancers have been clustered by their mRNA expression profiles 20 22 and proteomic signatures 23, 24 , pointing to the existence of molecular subtypes in cancer. Importantly, certain molecular subtypes show differences in outcome and response to therapy. For instance, it was recently found that a molecular therapy, cetuximab, targeting the epidermal growth factor receptor ( Egfr ) improved overall colorectal cancer survival, though resistance to therapy was widespread 25 . Interestingly, Egfr ’s mutational status alone is less helpful in understanding cetuximab efficacy than other molecular and genetic markers 26 . In particular,
oncogenic transformation of Kras correlates with cetuximab resistance in patients, as these
mutations activate Egfr related pathways independent of Egfr ’s mutational status 27 . By
understanding the genetic interactions and pathways linking Kras to Egfr , prescribing
practices for cetuximab therapy have been drastically altered, forcing physicians to consider a
patient’s tumor genotype prior to administering treatment.
Molecular Subsystems: Introduction
The use of signaling pathways and other maps (e.g. the genome, interaction networks) to organize data has been a characteristic feature in the era of high-throughput data analysis of cancer. This approach has heralded a new subfield that uses theoretical frameworks to couple biological measurements together. In these recent approaches, the molecules of interest – e.g. driver genes – are studied in the context of an organizing manifold, e.g. a signaling network, and we refer to this joint structure between molecules and manifold as a molecular subsystem.
The manifold imposes a structure onto the system components, and, thus, a
particular manifold represents one hypothesis for the architecture of driver gene
cooperativity. As a subsystem is defined, in part, by its constituent molecules, subsystems
can be constructed at individual levels of the molecular hierarchy to create networks of
interacting genes, RNAs, or proteins. Alternatively, subsystems traversing multiple
hierarchical levels exist, such as a regulatory network linking a transcription factor to the
gene products whose expression it controls.
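To make the coupling of molecules and manifold concrete, the joint structure described above can be sketched as a small data container. This is a schematic illustration only: the class name, edge list, and fold-change values below are hypothetical and are not part of this work's software.

```python
from dataclasses import dataclass

@dataclass
class MolecularSubsystem:
    """A manifold (edge list over molecules) coupled with measurements."""
    edges: list[tuple[str, str]]    # manifold: e.g. PPI edges
    measurements: dict[str, float]  # e.g. log2 fold changes per molecule

    def measured_edges(self):
        """Edges whose endpoints were both experimentally measured."""
        return [(a, b) for a, b in self.edges
                if a in self.measurements and b in self.measurements]

# Hypothetical PPI manifold coupled with illustrative mRNA fold changes
sub = MolecularSubsystem(
    edges=[("Apc", "Ctnnb1"), ("Tp53", "Mdm2"), ("Tp53", "Cdkn1a")],
    measurements={"Apc": -1.8, "Tp53": 0.4, "Cdkn1a": -0.9},
)
print(sub.measured_edges())  # [('Tp53', 'Cdkn1a')]
```

Subsystem analyses then operate on the overlap of the two facets – here, only the edge whose endpoints were both measured survives – which previews why incomplete measurements and incomplete manifolds must be considered jointly.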
In the following, we first discuss approaches for measuring and mapping the components of molecular subsystems (“Construction”), and then discuss the biological and statistical meaning of the resultant subsystem (“Interpretation”). Since a molecular subsystem has two facets – the molecules and the manifold – that require experimental exploration, we conclude with a section on this important, but often overlooked, point
(“Validation”).
Molecular Subsystems: Construction
Measurements
As molecular subsystems arise from the coordinated actions of many individual parts – be they nucleotides, amino acids, peptides, or proteins – we must be able to measure these numerous parts efficiently and en masse if we wish to study subsystems. However, our limited exploration of biological space prevents us from fully knowing all of a subsystem's components, and, consequently, our measurement tools must also be capable of revealing pieces unbeknownst to the observer.
Such tools have emerged in the past decade due, in no small part, to the sequencing of the genome. With the genome sequence, it became possible to create thorough compendia of genetic elements, and technologies soon followed for measuring these elements and their products. While some of these tools measure only pre specified targets
(e.g. microarrays), other tools – particularly in proteomics – take measurements stochastically from an experimental sample. Both types of technologies lend themselves to the study of molecular subsystems and have helped to usher in an era of “discovery science,” wherein large ‘omic datasets are explored to arrive at biological hypotheses ex post facto.
Array based technologies for measuring mRNA transcripts were among the first to emerge in the mid 1990s; for a review of the technology see Allison et al. 28 and Wheelan et al. 29 . The use of microarrays for measuring mRNA transcripts has become increasingly
commonplace as their costs have declined, and large repositories for expression profiling
experiments now exist (i.e. the Gene Expression Omnibus 30 ). Variants of expression profiling have been introduced for a variety of applications: arrays with probes spanning splice junctions for detecting alternative splicing31 ; probes targeting transposable elements 32 ; and probes targeting microRNA (miRNA) species 33 .
For proteins, antibody arrays are the intuitive analog to microarrays. However, antibodies are notorious for their inconsistency and lack of specificity 34 . Moreover, with
current estimates of the occurrence of alternative splicing, the number of potential proteins is estimated to be well over 300,000, and, thus, specific antibody production becomes very costly. The alternative modality that has dominated the field is based on mass spectrometry
(MS), wherein a sample is separated by liquid chromatography (LC) prior to ionization and sequencing by tandem rounds of MS. While DNA or RNA based experiments can tile their probes on an array, LC MS based approaches sample semi stochastically from a complex mixture of peptides. Though ripe for discovery, this critical step introduces a degree of uncertainty into the analysis that is still being addressed 35 . Nonetheless, steady improvements in chromatographic reproducibility and spectral sensitivity have allowed for the development of quantitative LC MS based methods where up to a few thousand proteins can be quantified in relative terms.
Figure 3. Three types of commonly used manifolds: (A) the genome, shown with peptide measurements made via LC-MS/MS; (B) a protein-protein interaction network, shown with mRNA measurements (data from Patel et al.); and (C) a coexpression network between proteomic targets and transcription factors, displayed as a heatmap.
Manifolds
While the aforementioned technologies provide point measurements of subsystem components, an organizing principle, which we refer to as a manifold , is also necessary to
describe the system. The genome has proven to be a surprisingly complex manifold due to
the wealth of information hidden within it. The most obvious structure in this manifold is
the gene itself. However, as our awareness of regulatory mechanisms grows – promoters, enhancers, miRNAs, et al. – we push the boundaries of where a gene starts and where a gene
stops 36 . The hidden regulation of the genome has consistently created problems in the field
of microarray analysis, where, for example, probes for a single gene often show inconsistent
differential expression across the length of the gene, strongly suggestive of alternative
splicing. Though this form of regulation was largely ignored early on, new array technologies with increased probe density across a gene are allowing for the improved detection of splice variants 31. Top-down proteomic approaches also show promise for unbiased detection of splice isoforms by making use of the genome, as illustrated in Figure 3A.
Early subsystem analysis of microarray data focused on the use of ontologies to organize the measurements. Initial studies 37 used the Gene Ontology (GO), the most widely used and well-curated ontology, to annotate individual molecular species, and the saturation of GO terms within a list of differentially expressed genes (or vice versa) remains a commonly used measure. While ontologies are used for finding a common annotation among a group of genes, their biological relevance is called into question as they are (1) biased towards well-studied genes and processes and (2) difficult to organize and interpret
(e.g. higher terms in the GO hierarchy are less biologically informative) 38 .
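The GO saturation measure described above is typically computed with a one-sided hypergeometric (Fisher) test of over-representation. The sketch below is illustrative rather than the exact procedure of the studies cited, and the gene identifiers are invented.

```python
from scipy.stats import hypergeom

def go_enrichment(de_genes, term_genes, background):
    """P(X >= k): over-representation of a GO term's genes among DE genes."""
    de, term, bg = set(de_genes), set(term_genes), set(background)
    k = len(de & term)   # DE genes annotated with the term
    M = len(bg)          # background: all genes measured on the array
    n = len(term & bg)   # term-annotated genes present in the background
    N = len(de)          # size of the DE list
    # Survival function at k-1 gives P(X >= k) without replacement
    return hypergeom.sf(k - 1, M, n, N)

# Toy example: 40 genes share a term; the DE list contains 15 of them
background = [f"g{i}" for i in range(1000)]
term = background[:40]
de = background[:15] + background[500:520]
p = go_enrichment(de, term, background)
print(f"enrichment p-value: {p:.2e}")
```

Expected overlap by chance here is under two genes, so observing 15 yields a vanishingly small p-value; in practice such p-values are corrected for multiple hypothesis testing across all GO terms.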
An alternative manifold to represent more complex relationships is one based on
networks , wherein a molecular unit may interact with any number of partners. A network is
constructed from a set of 1-dimensional elements, since only one dimension – the edge weight – is required to describe the relationship between any two points or nodes. The
genome itself can be represented as a network in which the edges are measured by the
inverse pairwise genomic distance between loci. Ontologies, as well, can be represented as
networks, as was done for the set of gene disease relationships found in OMIM (Online
Mendelian Inheritance in Man) to create a disease network 39 .
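As a sketch of the genome-as-network idea, the weighted adjacency matrix below uses inverse pairwise genomic distance as the edge weight; the loci and gene names are invented for illustration, and real usage would restrict pairs to the same chromosome.

```python
import numpy as np

# Hypothetical loci (chromosome start positions, in bp) for five genes
loci = {"geneA": 1_000, "geneB": 5_000, "geneC": 50_000,
        "geneD": 52_000, "geneE": 900_000}

names = list(loci)
pos = np.array([loci[g] for g in names], dtype=float)

# Edge weight = inverse pairwise genomic distance, so nearby loci are
# strongly connected and distant loci only weakly connected.
dist = np.abs(pos[:, None] - pos[None, :])
with np.errstate(divide="ignore"):
    adj = 1.0 / dist
np.fill_diagonal(adj, 0.0)  # no self-edges

# geneC and geneD (2 kb apart) link far more strongly than geneA and geneE
print(adj[names.index("geneC"), names.index("geneD")])  # strong edge
print(adj[names.index("geneA"), names.index("geneE")])  # weak edge
```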
Transcriptional, or coexpression, networks are commonly used as they can be constructed from a (sufficiently large) set of microarray data, calculating pairwise correlations between all genes (see Figure 3C). Coexpression manifolds have proven useful because coexpressed genes (1) may have a common regulator or (2) may be regulating each other.
With the genomic coverage of current microarrays, a coexpression network can provide a global picture of interactions among genes, as opposed to the more limited view provided by
Protein Protein Interaction (PPI) networks. However, coexpression networks have little meaning in the context of traditional pathways or interactions – two coexpressed genes may not physically interact or they may be in different cellular compartments – and, as such, they have been most useful in uncovering targets with strong transcriptional roles 40 .
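A minimal version of this construction, assuming a simulated expression matrix in place of real microarray data, computes all pairwise Pearson correlations and thresholds their absolute value to define edges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression matrix: 30 samples x 4 genes. Genes g1 and g2
# share a (hypothetical) common regulator; g3 and g4 are independent.
regulator = rng.normal(size=30)
expr = np.column_stack([
    regulator + 0.3 * rng.normal(size=30),  # g1
    regulator + 0.3 * rng.normal(size=30),  # g2
    rng.normal(size=30),                    # g3
    rng.normal(size=30),                    # g4
])

# Pairwise Pearson correlations between all genes (columns)
corr = np.corrcoef(expr, rowvar=False)

# Threshold |r| to obtain an unweighted coexpression adjacency matrix
adj = (np.abs(corr) > 0.7) & ~np.eye(4, dtype=bool)
print(adj.astype(int))
```

Only the g1-g2 pair survives the threshold, recovering the shared-regulator structure; the cutoff (here 0.7) is a design choice that trades edge density against false edges.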
Protein-protein interaction networks are derived from databases compiled from specific experimental probes of physical interactions among proteins. These databases are largely composed of interactions measured via yeast two-hybrid (Y2H) or affinity purification mass spectrometry (AP/MS). Some of the most widely used interaction databases are the Human Protein Reference Database (HPRD) 41, Biological General
Repository for Interaction Datasets (BioGRID) 42 , Biomolecular Interaction Network
Database (BIND) 43 , Database of Interacting Proteins (DIP) 44 , Online Predicted Human
Interaction Database (OPHID) 45 , Molecular Interaction database (MINT) 46 , and IntAct 47 .
AP/MS "pulls out" a complex of proteins interacting with a bait, and the set of interactions is modeled as either complete (with every protein interacting with every other protein) or as a "wheel and spoke" (with the bait alone interacting with every protein in the complex), though it is recognized that neither model is representative of most cases 48. The Y2H
method focuses on pairwise interactions generated through genomic cloning of specific
domains, and can be applied on a genome scale. These high throughput screens for
interactions have well known limitations that produce false positives (e.g. spurious
interactions, or two proteins interacting in a Y2H may not interact in vivo ) and false negatives
(weak interactions may be missed), and several groups – both academic (e.g. HPRD) and commercial (e.g. MetaCore, Ingenuity) – have provided manual curation of the data.
While the genome is a “complete” manifold, PPI networks are certainly not, and our current knowledge of human PPIs is estimated to cover only 10% of the true number of interactions 49 . While coupling multiple independent PPI networks together can serve to filter out the noise, multiplexing PPIs introduces additional biases, such as promoting well studied proteins – which tend to be disease associated – and artificially inflating their value as
“hubs” 50 . However, as discussed previously, the construction of molecular subsystems hinges not on the contribution of any single molecule or interaction, but, rather, on the properties of the molecular ensemble. Consequently, subsystems based on PPIs can often tolerate a high degree of noise in the underlying manifold or in the measured molecules.
Molecular Subsystems: Interpretation
In designing and, more importantly, interpreting a molecular subsystem, we make use of two well founded biological assumptions: hierarchy and heterarchy. Molecular heterarchies are systems in which there are no absolute governing molecules, but, rather, each
molecule is attributed a similar position of importance. The principle of heterarchy is implicit in the use of many manifolds, for the manifolds discussed here have no innate pecking order. Yet the biological tools at our disposal are not amenable to studying heterarchies, as mouse models, siRNA transfections, and similar tools are designed to target a single gene (or a few genes) at a time. To use these tools efficiently, we are inadvertently biased towards looking for hierarchy: we may cluster nodes by degree or unearth regulatory “hub” proteins to discover hidden hierarchy in networks 51, 52 .
The global hierarchy of molecular biology stems from the introduction of the central
dogma, which gave primacy to “lower” levels (DNA) as controlling or storing information
found at “higher” levels (RNA, protein). With the unceasing discovery of new forms of
regulation (e.g. RNA protein interactions, microRNA), our current understanding of reality
is far more complex than imagined by the central dogma. Nonetheless, the original principles
are still useful when integrating high throughput data sources: mutations in the cancer
genome, for example, that are in cis with changes in mRNA expression may have a greater functional role than those not transcribed 53 . Along these lines, considering how the protein
product may be affected by a mutated driver gene (e.g. via the introduction of a stop codon
or splicing alterations) can provide evidence as to the functional role of the mutation 17, 54 .
Examples
In Table I, we have listed a few commonly studied molecular subsystems; some use two forms of high throughput data to improve the power for biological inference. In the following section, we discuss the implications of using particular manifolds and provide examples from the literature.
Table I. Potential molecular subsystems and general questions that can be asked of them.

| Molecular species measured | Manifold | Pairs well with | Can be used to study |
| mRNA | Genome | Protein–DNA binding* | Transcriptionally active euchromatin 55 |
| mRNA | Genome | DNA dosage† | Transcriptionally active copy number variation |
| Peptides‡ | Genome | | Alternative splice variants that are translated |
| mRNA | Known pathway(s) | | Transcriptional activity of pathway(s) |
| SNPs | Coexpression network | | Transcriptional modules regulated by a SNP |
| mRNA | Coexpression network | miRNA binding (predictions) | Modules of genes regulated by a common miRNA species |
| SNPs | PPI network | mRNA data | Clustering of transcriptionally active SNPs 56 |
| Proteins | PPI network | mRNA data | Dysregulated network neighborhoods 24 |
| PTMs | PPI network | | Modularity of PTM patterns |
| Mutations | PPI network | | Genetic interactions in a pathway context 57 |
| Mutations | Disease network | | Phenotypically related genes 39 ; new candidate disease genes 58 |

*via ChIP-seq; †via CGH or SNP array; ‡digested from proteins in LC-MS/MS workflows
The genome is a useful manifold as it can serve as a defensible point of causation in certain instances. To interpret mRNA expression changes in obese human populations, for example, Emilsson et al. correlated these expression changes with genetic polymorphisms by way of SNP arrays 59 . This allowed expression changes with a genetic basis to be culled from
those that arise by secondary or indirect mechanisms. Genomic location is also a valuable
manifold in cancer studies, where genomic aberrations are frequent hallmarks of the
disease 60 . Relative amounts of DNA can be measured using comparative genomic hybridization (CGH) arrays, as was done for the analysis of glioblastoma samples by Bredel et al.; after mapping these measurements to the genome, they were used to provide insight into correlated patterns of genomic gains and losses 61 .
Leveraging the modular nature of coexpression manifolds, Akavia et al. found that genes with correlated patterns of mRNA expression can serve to identify causal mutations in melanoma 62 . Similarly, Horvath et al. traversed coexpression modules in glioblastoma to uncover a gene, Aspm, with a previously unexamined regulatory role 40 .
Compiling PPIs allows one to extend the linear signaling pathway paradigm to create
networks of physical interactions within which traditional pathways are embedded. Cerami et
al. capitalized on this idea to identify clusters of physically connected driver genes in
glioblastoma, providing support for the hypothesis that driver genes are mutated in different
pathways 63 . Nibbe et al. (2010) found that groups of proteins proximal (as measured by
protein protein interactions) to multiple driver genes show higher levels of mRNA
dysregulation than groups of proteins proximal to only a single driver gene, simultaneously
illustrating that multiple levels of information in the molecular hierarchy can be integrated to
identify hotspots in the signaling network 64 .
Molecular Subsystems: Validation
As discussed, subsystems are composed of two features – the molecular units and
the organizing manifold – and both components require experimental validation. One
approach to validation has been to demonstrate that analysis of a particular manifold can
triangulate functionally relevant molecules. Nibbe et al. (2009) showed PPI networks are
eminently useful in finding neighborhoods of dysregulated proteins in colorectal cancer 24 .
Lim et al. used a PPI subnetwork for inherited ataxias to identify Puratrophin-1, previously unknown to associate with these diseases. In addition, they validated their PPI manifold directly by testing randomly sampled interactions (which were compiled via Y2H) by coimmunoprecipitation 57 . The biological relevance of manifolds can also be supported by
demonstrating their evolutionary conservation, as was done for PPI networks 65 , serving as a global validation of the manifold.
An alternative approach to validation is motivated statistically, using aggregate measures of the subsystem to assess its importance. The assumption in these approaches is that a manifold molecule coupling should exhibit stronger coordinated changes (i.e. differential expression) than expected. Thus, when Nibbe et al. (2009) used exhaustive searches to identify transcriptional “hotspots” in PPI subnetworks, they tested the validity of these resultant hotspots statistically, permuting either the phenotype labels or the gene labels.
These empirical null distributions represent two different null hypotheses: phenotype label permutation models a null distribution in which the measurements have no association with the known phenotype categories; gene name permutation creates a null distribution in which the interaction pattern among genes/proteins is abrogated.
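A minimal sketch of the two permutation schemes, using toy data and a stand-in subnetwork statistic (the score function, matrix dimensions, and gene indices here are illustrative, not those of the cited study):

```python
import numpy as np

rng = np.random.default_rng(0)

def subnetwork_score(expr, labels, members):
    # Mean absolute difference in group means over the subnetwork's genes
    # (a stand-in for any subnetwork "hotspot" statistic)
    a = expr[members][:, labels == 0]
    b = expr[members][:, labels == 1]
    return float(np.abs(a.mean(axis=1) - b.mean(axis=1)).mean())

expr = rng.normal(size=(100, 10))     # 100 genes x 10 samples (toy data)
labels = np.array([0] * 5 + [1] * 5)  # phenotype labels
members = np.arange(8)                # candidate subnetwork gene indices

observed = subnetwork_score(expr, labels, members)

# Null 1: permute phenotype labels (no measurement-phenotype association)
null_pheno = [subnetwork_score(expr, rng.permutation(labels), members)
              for _ in range(1000)]
# Null 2: permute gene labels (subnetwork membership is abrogated)
null_gene = [subnetwork_score(expr, labels, rng.choice(100, size=8, replace=False))
             for _ in range(1000)]

p_pheno = (np.sum(np.array(null_pheno) >= observed) + 1) / 1001
p_gene = (np.sum(np.array(null_gene) >= observed) + 1) / 1001
```

The two p values answer different questions, which is why both nulls are worth computing for the same candidate subnetwork.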
In recently published work, we illustrated how proteomic data can be leveraged to statistically test the relevance of a candidate subnetwork 66 . As proteomic data is sparse and measures targets downstream of oncogenes and tumor suppressors, a second map is required to associate proteomic targets with a hypothesized subsystem. A coexpression network can be particularly useful in this situation, as strong coexpression of subsystem genes with proteomic targets provides evidence for a regulatory role of the upstream, subsystem genes. This relationship can be directly tested by calculating the strength of coexpression between the subsystem and the proteomic targets, and then calculating significance against the background level of coexpression between the subsystem and all measured molecules 67 .
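This test can be sketched as follows, assuming a toy expression matrix and hypothetical subsystem/target indices; random gene sets of matching size drawn from all measured genes serve as the background:

```python
import numpy as np

rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 12))   # 200 genes x 12 samples (toy data)
subsystem = np.arange(5)            # hypothesized subsystem gene indices
targets = np.arange(5, 15)          # proteomic target indices (hypothetical)

corr = np.corrcoef(expr)            # gene-by-gene coexpression matrix

def mean_abs_coexpression(rows, cols):
    # Strength of coexpression between two gene sets
    return float(np.abs(corr[np.ix_(rows, cols)]).mean())

observed = mean_abs_coexpression(subsystem, targets)

# Background: coexpression of the subsystem with random measured gene sets
pool = np.arange(5, 200)
null = np.array([mean_abs_coexpression(subsystem,
                                       rng.choice(pool, size=10, replace=False))
                 for _ in range(1000)])
p_value = (np.sum(null >= observed) + 1) / (len(null) + 1)
```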
Drawing from engineering systems theory, system structures can be tested by examining system properties upon failure of individual components, allowing us to gauge the
cascading effect of perturbations within the network architecture. This is analogous to the underpinnings of cancer biology, wherein we search for genes whose “failure” (i.e. tumor suppressors) critically impacts the functioning of the system. Using these principles, we demonstrated that the classic and most widely used model of gene failure – the knock-out mouse – can be re-envisioned in a subsystem context: the mutated gene is one component of a hypothesized molecular subsystem, and the knock-out mouse is a model perturbation of this subsystem. In our study, we tested a PPI network manifold by using mice mutated at two different subsystem genes, Apc and Cdkn1a 68 . Coupled with mRNA and proteomic measurements, we were able to test for (1) the differential expression of subsystem genes at the mRNA level and (2) the association – via PPIs or coexpression – of the subsystem with proteomic targets measured upon each of the two perturbations.
Summary
Herein, we have summarized a bioinformatic trend emerging in cancer biology, one in which ensembles of high throughput molecular measurements are studied in the context of an organizing map, or manifold , and we dub this marriage of molecule and manifold a molecular subsystem. As studies of molecular subsystems often link disparate analytical tools
and approaches, the field is becoming increasingly complex, especially for scientists
unfamiliar with the past decade of bioinformatics research. For example, differential
expression (of mRNA or protein) is a simple and frequent request of ‘omics experiments.
However, the interpretation of these results is far from obvious. Traditional biology suggests
that we simply cherry pick “targets” from this list for further experimental exploration,
though at the expense of discarding valuable data. In this context, a basic understanding of
manifolds – and the resultant molecular subsystems that can be studied with them –
prevents us from losing the forest for the trees, and we hope that our explanation of this concept proves useful both for scientists navigating an ‘omics experiment, as well as for experienced bioinformaticians developing new approaches for studying subsystems.
2 Prostaglandin dehydrogenase signaling: One Driver and an Unknown Path
Introduction: Colon Cancer and Inflammation
While cancer in general is thought to arise from a cascade of genetic aberrations, colorectal cancer in particular has long been recognized as also being an inflammatory disease 69 . Epidemiological evidence first suggested that patients who take aspirin regularly
have lower rates of colonic polyps 70 , implicating the cyclooxygenase (COX) pathway. The
COX pathway converts arachidonic acid into thromboxanes, prostacyclins, and
prostaglandins, with prostaglandin E2 (PGE 2) being the most abundant prostaglandin in
gastrointestinal tumors 71 . Non-selective inhibitors target both COX1 and COX2, reducing
thromboxane and prostaglandin synthesis respectively, though inhibition of the
constitutive isozyme, COX1, thins the gastric mucosa and increases the likelihood of
ulceration. Targeting COX2 specifically, e.g. with celecoxib, has ushered in a new line of
treatment for FAP patients, though the concomitant increases in thromboxane are
associated with cardiovascular side effects.
Consequently, alternative methods of pharmacologically modulating the COX
pathway are needed. Apart from the synthesis of prostaglandins, an alternative mode of
regulation lies in their degradation, which is catalyzed by 15 hydroxyprostaglandin
dehydrogenase (protein name: HPGD; gene name: Hpgd). Hpgd, which specifically
catabolizes PGE2, is ubiquitously downregulated in colorectal cancer 72 , supporting its role in
antagonizing the frequently upregulated COX2 inflammatory pathway 73 . In fact, low levels
of Hpgd in human tumors have been correlated with a resistance to COX2 inhibitors 74 .
Decreased levels of Hpgd are coupled, as expected, with increased levels of PGE2. Upon binding to its G protein-coupled receptor, EP2, PGE2 can then activate several key oncogenic
pathways: EGFR signaling 75 ; β catenin/TCF signaling 76, 77 ; PI3K/AKT signaling 78 ; and RAS
MAPK signaling 79 . Though these signaling pathways provide insight into the tumor suppressive mechanism of Hpgd, they constitute neither a comprehensive nor an unbiased list. In
identifying alternative regulatory mechanisms of the COX pathway, we herein employ a
proteomic approach to discovering perturbed pathways in mouse models of Hpgd−/−. The goal
is to provide an unbiased sampling of proteomic changes in colonic tissue as a function of
the knockout. With these data in hand, we hope to infer specific pathways, in addition to
those mentioned above, that could be therapeutically exploited for treatment. As seen in this
chapter, we will develop models of the system that flow from peptide level information, to
individual proteins, to networks of proteins that assist in reverse engineering the biology of
Hpgd−/− (Figure 1).
Introduction: Mass Spectrometry
In the past decade, liquid chromatography coupled mass spectrometry (LC-MS) has been increasingly used for proteomic profiling, as it allows for a “bottom up” ‘omics approach, wherein a complex protein sample is enzymatically digested, the resulting mixture of peptides is analyzed by LC-MS/MS (MS/MS indicating tandem mass spectrometry), and the component peptides are finally identified by comparing the observed spectra to theoretical spectra. With the advent of modern, high mass accuracy MS instruments, case and control samples used for comparing protein expression can be analyzed in a label-free mode,
without resorting to the chemical derivatization of peptides that has been common in the past 80 .
Although this simplifies the biochemical preparation, the resultant data is very complex.
This multi step workflow – and the massive amount of data generated – requires
computational pipelines for data analysis and management, and many such packages have
recently been developed (Trans-Proteomic Pipeline 81 , Rosetta Elucidator, Census 82 ).
Programs like msInspect 83 and the OpenMS Proteomics Pipeline 84 provide essential routines,
such as normalization, peak detection, and peak quantification. Commercial packages, e.g.
Rosetta Elucidator, couple this functionality with database searching, allowing the user to
annotate spectral features with peptides and protein assignments. After the spectra are
acquired, the remainder of the workflow is largely statistical, and each step of the process has warranted the development of application specific statistical routines: normalization of the
spectra 85, 86 ; database matching 87 ; the normalization and interpretation of database matching scores 88 ; and the assignment of peptides to parent proteins 35 .
One stage of the pipeline that has not been fully addressed, however, concerns the
interpretation of the resulting intensity information per peptide, and this relates to our
specific goal of understanding protein dysregulation. Simply put, peptides are not proteins.
While peptide level intensity information – be it in the form of spectral counts, XIC, p values, etc. – is technically informative and suggestive of protein level changes, it may be
only minimally biologically informative, as numerous possibilities exist as to why a particular
peptide exhibits differences in abundance between two biological conditions. Sophisticated
tools, such as PeptideProphet 88 , allow us to be more confident of a particular peptide being called “present” or “absent,” but they do not inform the scientist as to how the associated change in abundance should be interpreted biologically.
Thus, the question remains – and is by no means solved in the field – as to how one should interpret a change in abundance at the peptide level. A particular peptide may be found to differ between two biological conditions for a number of technical reasons: (1) ion suppression in the mass spectrometer in one condition alone, (2) misalignment of the chromatograms, and (3) technical variation. With sufficient biological or technical replicates, and the increasing accuracy of both mass spectrometers and chromatographic separation systems, many of these issues can be overcome, leaving room for the more compelling biological explanations for differences in the abundance of a peptide: (1) differences in abundance of the parent protein between the two biological conditions (the result typically of interest), (2) the gain/loss of that peptide’s exon (alternative splicing) in one sample, or (3) post-translational modifications (PTMs) on the peptide that were not searched for explicitly in the spectrum matching step.

Figure 1. The nested hierarchy of cooperativity as we assemble molecular subsystems from peptides to proteins to networks.

As we are interested in making inferences about the behavior of the parent protein, peptide level information is often summarized by one of two approaches: summing/averaging the
intensities of all daughter peptides 89 , or by selecting one or two “representative” peptides 90 .
Each of these approaches implicitly assumes that the regulation at hand is a change in whole protein abundance, and they ignore the more subtle possibilities of alternative splicing and unsearched PTMs. Given the widespread occurrence of these two forms of regulation 91,
92 , their consideration is critical to the proper analysis of label-free LC-MS/MS data. Thus, we are interested in both the global expression change, e.g. the overall change in protein expression, and the local changes that reflect the peptide-by-peptide behavior; both may be biologically informative.
To incorporate these issues into the label free quantification process, we introduce a statistical framework to quantify the dependency of a single peptide on the behavior of its sibling peptides, allowing one to prioritize “deviant” peptides, i.e. candidates for PTMs. In addition, we extend this framework in a genomic context to study the behavior of exons, thereby allowing one to identify deviant genomic regions, i.e. candidates for disease specific alternative splicing.
Methods
Sample preparation: 1D Gel Electrophoresis and Digestion
The colonic mucosa of Hpgd−/− mice (female, 2–3 months old) was harvested by scraping the intestinal surface with a glass slide; samples were collected from 5 Hpgd−/− and 5 wild-type paired mice on the FVB background. To improve yields in proteomic discovery,
the cell lysate (total of 20 µg protein) from each biological sample was separated via 1-D gel
electrophoresis using a 4–20% gradient SDS-PAGE gel system. Each of the 10 gel lanes was
subfractionated into 6 bands, resulting in a total of 60 gel bands. All gel bands were
rehydrated in a 1:1 acetonitrile:water mixture with mild shaking for 15 minutes, with the gel
bands being crushed simultaneously. After treatment with 100% acetonitrile for 5 minutes,
the crushed gel pieces were suspended in 50 mM ammonium bicarbonate buffer (pH 8)
before being subjected to reduction (dithiothreitol, 20 mM, 56 °C, 10 min) and alkylation
(iodoacetamide, 55 mM, room temperature, 30 min), successively. For tryptic cleavage of the
denatured proteins, a total of 500 ng of trypsin was added in 50 µL of 50 mM ammonium bicarbonate buffer (pH 8), and the digestion was left overnight at 37 °C. 1% formic acid was added to stop the proteolytic processing, and the resulting peptides were extracted from the gel pieces using a 50% acetonitrile/0.3% formic acid solvent; gel pieces were then dried and resolubilized in 0.1% formic acid/100% water. For all analyses, 5 µL of digest was used for
LC MS/MS processing.
LC MS/MS Analysis
The digests were analyzed by LC-MS/MS using an UltiMate 3000 LC system
(Dionex, San Francisco, CA) that was interfaced to an LTQ Orbitrap XL mass spectrometer
(Thermo Finnigan, Bremen, Germany). The platform was operated in the nano-LC mode using the standard nano-ESI API stack fitted with a picotip emitter (uncoated fitting, 10 µm spray orifice; New Objective, Inc., Woburn, MA). The solvent flowrate through the column was maintained at 300 nL/min using a 1:1000 splitter system. The protein digests (5 µL) were injected into a reversed phase C18 PepMap trapping column (0.3 x 5 mm, 5 µm particle
size, Dionex Inc.) equilibrated with 0.1% formic acid (FA)/2% acetonitrile (v/v) and washed for 5 min with the equilibration solvent at a flow rate of 25 µL/min, using an
isocratic loading pump operated through an autosampler. After the washing step, the
trapping column was switched in line with a reversed phase C18 Acclaim PepMap 100
column (0.075 x 150 mm, Dionex Inc.) and the peptides were chromatographed using a
linear gradient of acetonitrile from 5% to 50% in aqueous 0.1% formic acid over a period of
80 min at the above mentioned flowrate such that the eluate was directly introduced to the
mass spectrometer. A 100% acetonitrile elution step was subsequently performed for ten
minutes prior to resetting the analytical column to the initial equilibration conditions for 10
more minutes at the end of the chromatographic run, accounting for a total of 90 minutes of
LC MS/MS time. The mass spectrometer was operated in a data dependent MS to MS/MS switching mode, with the five most intense ions in each MS scan subjected to MS/MS analysis. The full scan was performed at 60000 resolution in the Orbitrap detector and the
MS/MS fragmentation scans were performed in the ion trap detector (IT) CID mode such that the total scan cycle frequency was approximately 1 second. The threshold intensity for the MS/MS trigger was set at 1000 and the fragmentation was carried out using the CID mode using a normalized collision energy (NCE) of 35. The data were entirely collected in the profile mode for the full scan and centroid mode for the MS/MS scans. Dynamic exclusion function for previously selected precursor ions was enabled during the analysis such that the following parameters were applied: repeat count of 2, repeat duration of 45 seconds, exclusion duration of 60 seconds and exclusion size list of 150. Xcalibur software
(version 2.0.7; Thermo Finnigan Inc., San Jose, CA) was used for instrument control, data acquisition, and data processing.
Chromatographic reproducibility and full-scan MS1 intensity robustness are critical to
the success of this method. In order to monitor this efficiently and exhaustively, spiked
external peptides from yeast enolase digest (400 fmoles on column load) were used to keep
track of the chromatographic performance of the LC MS/MS system.
Spectral Searching
Rosetta Elucidator (Rosetta Inpharmatics) was used to normalize and process the raw LC-MS/MS data. Using Mascot, mass spectra were searched against the IPI database
(with reversed sequences included for false discovery rate control), and matches were selected that had PeptideTeller probabilities >0.95. This resulted in 5701 peptides. After exporting
the results as a CSV file, the peptide intensities for each sample were log transformed and mean centered at a user-defined value chosen such that all resulting intensities are greater than zero (15 for our data).
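The log transform and mean centering step might look like the following sketch (the centering value of 15 is taken from the text; the intensity matrix and function name are illustrative):

```python
import numpy as np

def normalize_intensities(raw, center=15.0):
    # Log2-transform, then shift each sample (column) so its mean sits at
    # `center`, chosen so that all resulting intensities stay positive
    logged = np.log2(raw)
    return logged - logged.mean(axis=0, keepdims=True) + center

# Toy peptide-by-sample intensity matrix
raw = np.array([[1.0e6, 2.0e6],
                [4.0e5, 8.0e5],
                [2.0e6, 1.0e6]])
norm = normalize_intensities(raw)
```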
Genomic Mapping of Peptides
We used Ensembl (build 37) to provide a common genomic framework for mapping
peptides annotated by Elucidator. Though the solution to assigning peptides to their parent
proteins remains contested 35, 93 , we chose to use the assignments generated by ProteinTeller
(an incarnation of ProteinProphet) to illustrate the methods we develop. Peptides annotated with GI protein IDs were mapped to their associated gene level IDs using the gene2accession file available at NCBI. For the subset of proteins identified in our data by
Rosetta, corresponding exon sequences were downloaded from the Ensembl website.
Peptides within a protein were aligned to the protein’s exons using the Smith–Waterman algorithm and the BLOSUM80 matrix. Though most peptides were perfect alignments, this implementation allows for the possibility of polymorphisms or mutations that may result in slight changes in the coding sequence. The alignments were made in all three reading frames, and the alignment with the maximum score was chosen for each peptide.
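A minimal Smith–Waterman scorer illustrates the alignment step; for brevity it uses simple match/mismatch scores in place of the BLOSUM80 matrix, and the exon translation and peptide are made-up examples:

```python
def smith_waterman(a, b, match=5, mismatch=-4, gap=-8):
    # Minimal Smith-Waterman local alignment score (simple match/mismatch
    # scores stand in for the BLOSUM80 matrix used in the text)
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# A peptide aligning perfectly within a (hypothetical) exon translation
exon_aa = "MKVLAAGITPEQWRSTK"
peptide = "GITPEQWR"
score = smith_waterman(exon_aa, peptide)  # 8 matched residues x 5 = 40
```

In practice the exon sequence would be translated in all three reading frames and the frame with the maximum score retained, as described above.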
Whole Protein level Significance
To infer changes in protein abundance, we first assume that such regulation would
result in unidirectional changes shared by the majority of peptides measured from a protein.
To assess this, peptide level t statistics were first summarized at the exon level:

t_e = \frac{1}{k} \sum_{i=1}^{k} t_i

where the t statistic assigned to an exon, t_e, is defined as the average of the k member peptides’ t statistics. This prevents a single exon or domain from overshadowing undersampled regions of the protein. Then, the gene level statistic, t_g, is defined as the sum of the n_p exon level t statistics, where n_p only includes those exons that had daughter peptides measured in the LC-MS/MS experiment:

t_g = \sum_{e=1}^{n_p} t_e

We calculated the two-sided p value of a gene being differentially expressed as P(|t| ≥ |t_g|), approximating the null distribution with the bootstrap density estimate (10,000 bootstrap samples). Since a low number of peptides from a parent protein prevents the discrimination of global (i.e. whole gene) behavior from local (i.e. PTM or splicing) behavior, we limited significance testing to those proteins with ≥3 peptides (regardless of peptide length).
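The exon-average-then-sum aggregation can be sketched directly; the bootstrap null below (resampling mean-centered peptide statistics) is one plausible resampling scheme chosen for illustration, not necessarily the exact scheme used in the study:

```python
import numpy as np

def gene_level_t(peptide_t, exon_of):
    # Average peptide t statistics within each exon (t_e), then sum the
    # exon statistics to obtain the gene-level statistic t_g
    t_e = [np.mean([t for t, e in zip(peptide_t, exon_of) if e == ex])
           for ex in sorted(set(exon_of))]
    return float(np.sum(t_e))

# Toy protein: 5 peptides mapping to 3 exons
peptide_t = np.array([2.1, 1.8, 2.5, 0.4, 1.9])
exon_of = [1, 1, 2, 2, 3]
t_g = gene_level_t(peptide_t, exon_of)  # (2.1+1.8)/2 + (2.5+0.4)/2 + 1.9 = 5.3

# Illustrative bootstrap null: resample centered statistics, recompute t_g,
# and take the two-sided tail proportion P(|t| >= |t_g|)
rng = np.random.default_rng(2)
centered = peptide_t - peptide_t.mean()
null = np.array([gene_level_t(rng.choice(centered, size=len(centered)), exon_of)
                 for _ in range(2000)])
p_value = float(np.mean(np.abs(null) >= abs(t_g)))
```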
Peptide Level Significance
In analyzing differentially expressed peptides, we only consider peptides of length ≥9 amino acids to reduce the frequency of irreproducible events. While current LC-MS/MS pipelines output peptide level intensity, this information must be evaluated in a whole protein context to be biologically informative. At the individual peptide level, we are interested in cases where the behavior of a single peptide does not conform to the behavior of the rest of the protein, as such situations indicate local (as opposed to global) regulation of the protein, which may occur in the form of PTMs. Using t statistics as our summary
measure of choice, we must calculate the conditional probability of observing a peptide i given the behavior of the protein’s remaining peptides:
P(t_i \mid \tau_i, \Sigma_i) = \frac{P(t_i, \tau_i \mid \Sigma_i)}{P(\tau_i \mid \Sigma_i)}
where t_i is the t statistic associated with peptide i, and τ_i is the statistic associated with the rest of the protein, defined as:

\tau_i = \frac{1}{k-1} \sum_{j \neq i} t_j

τ_i represents the average t statistic of the sibling peptides (i.e. those from the same parent protein)
for peptide i. The covariance between peptides, Σ_i, must be included since the observed
abundance of a particular peptide depends not only on the abundance of its sibling peptides,
but also on the level of inter-peptide dependency. Thus, we are interested in calculating the
probability of observing a certain behavior for peptide i, given some observed behavior for
the remainder of the protein. As seen above, this requires knowing (1) the joint distribution
of t_i and τ_i, which is P(t_i, τ_i | Σ_i), (2) the marginal distribution of τ_i, P(τ_i | Σ_i), and (3) the
covariance, Σ_i, between all of the peptides.
The average (or sum) of t statistics tends towards a normal distribution if the component distributions’ degrees of freedom are sufficiently large. Given the small sample size in our dataset – and most proteomic datasets – this tendency is not always observed.
Furthermore, the patent lack of independence among peptides results in a sampling distribution that cannot be calculated analytically (i.e. from the convolution of the individual
t distributions). Thus, we approximate the distribution for τ_i by generating 10,000 bootstrap samples and using maximum likelihood estimation to fit a t location-scale distribution to the data (fixing the mean at zero); we call the resulting degrees of freedom ν_{τ_i} and the scale
parameter s_{τ_i}. While the scale parameter can be estimated robustly, the degrees of freedom is usually overfit to the particular instance of the bootstrap density, resulting in unstable
estimates for ν_{τ_i}. Thus, we conservatively fix the degrees of freedom, ν_{τ_i}, at 7.92, which serves as an empirical minimum bound for >95% of peptides in our data, regardless of any particular instance of the bootstrap density.
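With the degrees of freedom pinned at the empirical bound, the remaining scale parameter can be illustrated with a method-of-moments sketch (simpler than the maximum likelihood fit described in the text; the data and function name are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
NU_FLOOR = 7.92  # empirical minimum bound on the degrees of freedom (from the text)

def scale_given_df(tau_boot, nu=NU_FLOOR):
    # Method-of-moments scale for a zero-mean t location-scale distribution
    # with fixed degrees of freedom: Var = s^2 * nu / (nu - 2)
    return float(np.sqrt(tau_boot.var() * (nu - 2.0) / nu))

# Stand-in "bootstrap" draws of tau_i: a scaled Student's t sample
tau_boot = 0.5 * rng.standard_t(10, size=10000)
s_tau = scale_given_df(tau_boot)  # close to the generating scale of 0.5
```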
As bootstrap estimation preserves the correlation structure between peptides, this procedure has the added benefit of simplifying the probability as follows:
P(\tau_i \mid \Sigma_i) = \hat{P}(\tau_i)

where \hat{P}(\cdot) refers to the bootstrap density estimate; thus, Σ_i does not need to be explicitly calculated, as it is inherent in the estimated density.
Similarly, we use bootstrap samples to parametrize the distribution \hat{P}(t_i, \tau_i). We choose to use the bivariate t distribution as developed by Shaw and Lee because (1) it can be decomposed into the product of two t distributions as the correlation between the marginals approaches zero (i.e. the property of statistical independence) and (2) it allows us to specify different degrees of freedom on the marginals 94 . The latter property is particularly important in many real-world applications, since assuming equidensity of the tails is a strong and often unrealistic assumption. The distribution is specified as follows:

P(t_1, t_2) = C \, \alpha_1^{-\frac{\nu_1+1}{2}} \alpha_2^{-\frac{\nu_2+1}{2}} \left[ \Gamma\!\left(\tfrac{\nu_1+1}{2}\right) \Gamma\!\left(\tfrac{\nu_2+1}{2}\right) \, {}_2F_1\!\left(\tfrac{\nu_1+1}{2}, \tfrac{\nu_2+1}{2}; \tfrac{1}{2}; \tfrac{\gamma^2}{4\alpha_1\alpha_2}\right) + \gamma \, \Gamma\!\left(\tfrac{\nu_1}{2}+1\right) \Gamma\!\left(\tfrac{\nu_2}{2}+1\right) \, {}_2F_1\!\left(\tfrac{\nu_1}{2}+1, \tfrac{\nu_2}{2}+1; \tfrac{3}{2}; \tfrac{\gamma^2}{4\alpha_1\alpha_2}\right) \right]

with

\alpha_1 = 1 + \frac{t_1^2}{\nu_1 \cos^2\theta}, \qquad \alpha_2 = 1 + \frac{t_2^2}{\nu_2 \cos^2\theta}, \qquad \gamma = \frac{2\, t_1 t_2 \sin\theta}{\sqrt{\nu_1 \nu_2}\, \cos^2\theta}, \qquad C = \frac{1}{\pi \cos\theta \, \sqrt{\nu_1 \nu_2}\, \Gamma\!\left(\tfrac{\nu_1}{2}\right) \Gamma\!\left(\tfrac{\nu_2}{2}\right)}

Figure 2. The bivariate t distribution. The mixing angle, θ, controls the amount of density along the diagonal. The degrees of freedom control the density of the tails (i.e. the “flaring” of the tails).

where Γ(·) is the gamma function and {}_2F_1(·) is the hypergeometric function. The “mixing angle,” θ, of the two marginals is related to Pearson’s correlation coefficient as:

\rho = \sin\theta \cdot \frac{\sqrt{\nu_1 \nu_2}}{2} \cdot \frac{\Gamma\!\left(\frac{\nu_1 - 1}{2}\right) \Gamma\!\left(\frac{\nu_2 - 1}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right) \Gamma\!\left(\frac{\nu_2}{2}\right)}
ν_1 and ν_2 represent the degrees of freedom for the two statistics, t_1 and t_2, whose joint distribution we are modeling; the distribution is illustrated in Figure 2. For our data, these two statistics are t_i and τ_i / s_{τ_i}, where τ_i must be rescaled by s_{τ_i}. The degrees of freedom for τ_i was estimated above as ν_{τ_i}, and the degrees of freedom for t_i can be calculated analytically by the Welch–Satterthwaite equation:

\nu_i = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}
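The Welch–Satterthwaite calculation is straightforward to implement; with equal variances and sample sizes it recovers the pooled n1 + n2 − 2 degrees of freedom:

```python
def welch_satterthwaite(s1, n1, s2, n2):
    # Effective degrees of freedom for a two-sample t statistic with
    # unequal variances (sample SDs s1, s2; sample sizes n1, n2)
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

df = welch_satterthwaite(1.0, 5, 1.0, 5)  # equal case: n1 + n2 - 2 = 8
```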
To solve for the “mixing angle”, θ , we calculate Pearson’s correlation, ρ , between
t_i and τ_i for each bootstrap sample (10,000 pairs). However, the small sample size in our dataset – as in most proteomic datasets – results in inflated values of ρ (and θ) – a situation
well described in the statistics literature surrounding covariance estimation 95 . To account for overestimation of the sample covariance, we construct a regularized estimate of θ, which we call θ*. Regularization of the covariance is particularly important when specifying null bivariate distributions (Student’s t or normal), as these distributions concentrate along the diagonal as θ approaches π/2, resulting in an underestimation of the real error, and misleading p values, in tightly correlated data.
Current methods for covariance regularization introduce a degree of sparsity to the
sample covariance matrix by shrinking the eigenvalues to a prespecified target – a process which can be distilled into a linear combination of two matrices:
U^* = \lambda T + (1 - \lambda) U
Where U is the sample covariance matrix, T is the target matrix, and λ is the shrinkage parameter. While Schafer and Strimmer 96 outline several target matrices for different
applications, the choice of a particular shrinkage estimator requires assumptions that are not
always clear to the investigator. In addition, the estimation of λ is based on calculating variances of the sample (co)variances 96 , which is not straightforward in our case, as our estimates are based on bootstrap samples.
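The linear combination itself is simple to state in code. This sketch (our own illustration, not the pipeline's code) uses a diagonal target so that off diagonal covariances shrink toward zero while variances are preserved:

```python
import numpy as np

def shrink_covariance(U, T, lam):
    # U* = lam * T + (1 - lam) * U : linear combination of the sample
    # covariance U and the target T, with shrinkage parameter lam in [0, 1]
    return lam * T + (1.0 - lam) * U

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))            # few samples -> noisy covariance
U = np.cov(X, rowvar=False)            # sample covariance matrix
T = np.diag(np.diag(U))                # target: keep variances, zero correlations
U_star = shrink_covariance(U, T, 0.5)  # half-way shrinkage
```

At lam = 1 the estimate collapses onto the target; at lam = 0 it is the raw sample covariance.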
To circumvent these issues, we first empirically determine the level of shrinkage optimal for our data, and then use this information to construct the envelope for λ . We find
the empirically optimal shrinkage as follows:
θ*_i = argmin_θ ∫∫ [ f̂_i(t, τ; θ_i) − f_i(t, τ; θ_i) ]² dτ dt
Where f̂_i(t, τ) is the analytical bivariate t distribution of Shaw and Lee; f_i(t, τ) is the empirical distribution from the bootstrap samples for peptide i; we use the kernel density smoothed estimate of f_i(t, τ) for computational efficiency 97 . Thus, for each peptide, we obtain a new mixing angle, θ*, that minimizes the squared error. We now pool information across peptides to make inferences about the true value of θ, calculating the binned minimum bound of l = 1 − θ*/θ as follows:

l(b_j) = min( 1 − θ̂*_i/θ_i ),  for b_j − x ≤ θ_i < b_j + x

Where b_j are the bin centers and x their width; we use x = 0.10. The observed l̂(b) is used
to construct an envelope for λ̂(θ) of the form:

λ̂(θ) = min( 1, α + βθ + κθ² )

Which is suggested by Schafer and Strimmer for use in shrinking the sample covariance to the identity (zero correlation) target, and they set β = 0 96 . While parameterizing with α = 0.1235, κ = 0.1, and β = 0 minimizes the squared error to our estimate of l̂(b), we found that this conservative form of shrinkage still results in overestimates of the correlation: overestimated correlation (i.e. mixing angle) creates an over concentrated density along the diagonal, and then small values of t and τ slightly off diagonal result in exaggerated p values and false positive identifications (Figure 3, top). Since inflated covariance estimates are problematic only at high levels of correlation, we introduce the linear term in calculating λ̂(θ), which imposes an upper bound on the mixing angle. We parameterize λ̂(θ) using α = 0.025, κ = 0.17, and β = 0.25, as this preserves the structure of the Schafer Strimmer estimate for small mixing angles and produces an increasing amount of shrinkage as θ approaches 90°.

Figure 3. Top: peptide t statistics versus the calculated p value (based on the statistical framework herein). We see that poorly constrained mixing angles lead to false positive identifications, where a peptide with a small t statistic is called significant. Bottom: coefficient of shrinkage, 1 − λ, versus the mixing angle, θ. The empirical minimum bound, l̂(b), is shown by blue markers, which is fit to the Schafer Strimmer form of the shrinkage estimate, 1 − λ̂(θ) (dashed black line). We modified this shrinkage estimate by introducing a linear term to bound the estimate in the tails.

Thus, we calculate the shrunken mixing angle, θ̂ˢ, as:
θ̂ˢ_i = θ_i ( 1 − λ̂(θ_i) )
Where the target mixing angle is taken to be zero. Implicit in the use of a single function, λ̂(θ), to estimate the shrinkage for all peptides is the assumption that certain statistical properties, such as Var(θ_i), are universally shared. The estimated envelope, λ̂(θ), is shown in Figure 3 (bottom), along with l̂(b).
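A compact sketch of the envelope and the resulting shrunken angle (parameter values taken from the text; this is our own illustration, not the original code):

```python
import math

def shrinkage_envelope(theta, a=0.025, b=0.25, k=0.17):
    # lambda_hat(theta) = min(1, a + b*theta + k*theta^2); theta in radians
    return min(1.0, a + b * theta + k * theta ** 2)

def shrunken_angle(theta):
    # theta_hat_s = theta * (1 - lambda_hat(theta)); the target angle is zero
    return theta * (1.0 - shrinkage_envelope(theta))
```

Small angles are nearly untouched, while angles near π/2 are pulled strongly toward zero, which is the behavior needed to avoid over concentrated null densities.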
As the null densities P̂(t_i, τ_i) and P̂(τ_i) are estimated from bootstrap samples – a process that scrambles phenotype membership – the resultant probability of observing a particular peptide, P̂(t_i | τ_i), is aptly suited to hypothesis testing, with the null hypothesis being “no association with phenotype.” We calculate the p value for a peptide as P̂(t ≥ t_i | τ_i), i.e. the probability of observing a t statistic more extreme than t_i, given the observed value for τ_i.
Exon level Significance
The framework for evaluating the significance of a single peptide can be extended to study the behavior of exons, provided they have been proteomically probed. To provide a
better consensus estimate for the behavior of a particular exon, we limited our analysis to only those with ≥2 peptides (regardless of peptide length). As before, the statistic used to represent an individual exon is the average of its k constituent peptides:
t_e = (1/k) Σ_{i=1}^{k} t_i
The dependence between peptides prevents the distribution of t_e from being calculated analytically, and we determine its associated scale, s_te, by fitting 10,000 bootstrap estimates to a t location scale distribution via maximum likelihood. As before, the sensitivity of the degrees of freedom, ν_te, to the particular instance of the bootstrap distribution results in unreliable estimates, and, thus, we fix ν_te at 7.45 to capture >95% of all exons’ degrees of freedom. We are interested in the probability:
P̂(t_e | τ_e, Σ) = P̂(t_e | τ_e)
Where τe is defined as:
τ_e = ( 1/(n_p − 1) ) Σ_{f ≠ e} t_f
τ_e is the average of the t_f for the n_p − 1 exons that have daughter peptides measured in the experiment. The specification of the distribution for P̂(t_e | τ_e) follows as above for the peptide level analysis.
Statistics of False Discoveries and Multiple Testing
To evaluate our p values, we must calculate rates of false discoveries or errors, i.e. situations where the null hypothesis is incorrectly rejected. An issue unique to LC MS based analyses is the issue of sampling: peptides are measured semi stochastically from the complex mixture injected into the mass spectrometer, and, based on the data observed, the
existence of the parent protein is inferred. Additionally, within the mass spectrometer, a peptide’s signal can be suppressed by other, more intense peptides that co elute. Thus, a false discovery in an LC MS/MS pipeline may arise due to misassignment of a peptide, or to suppression by other peptides, or from other unknown issues. For a biological unit, U – representing a protein, exon, or peptide – we can estimate the rate of false positives by modeling “non biological” units that have a scrambled biological meaning. We synthesize “non biological units” by assigning random peptides to the unit, U, being tested; these simulated units are called non biological because the substitution of a random peptide produces a unit that has no clear biological interpretation. To calculate the error rate of a real unit, U, being called significant by the stochastic segregation of signal/noise alone, we find the proportion of “non biological units” that results in p values significant at the pre specified testing level, α.
For a significant protein (p value < α) with n peptides in e exons, we simulate a non biological protein by randomly collecting sets of peptides (of size n), summarizing them into e exons (as before), and calculating significance as was done for the “real” protein. When the unit of interest is a differentially expressed exon, only the peptides within the exon of interest are randomized, keeping τ_e constant and requiring recalculation of t_e and θ̂ˢ. For differentially expressed peptides, as well, only the peptide of interest is randomized; τ_i remains fixed, and θ̂ˢ – representing the covariance – is recalculated. For each tested unit, U,
we can calculate the proportion, α̂_U, that may arise due to non biological assignment of
peptides and stochastic segregation of signal:
α̂_U = 2 · #{ p_rand,i ≤ α/2 } / n_rand
Where n_rand is the number of non biological units generated (for accuracy, we set n_rand ≥ 1000); and p_rand,i is the p value for the iᵗʰ random unit. We use α/2 since we are using two sided hypothesis testing here. α̂_U represents the probability of a unit – protein, exon, or peptide – being marked as “significant” based on random segregation of signal in the measured data, and this randomization process takes into account the incorrect assignment of peptides to their parent protein. Though the α level itself is typically used to provide a false positive rate for an individual test, α̂_U serves as a more general false positive estimate as it takes into account the sampling issues unique to LC MS workflows. Thus, the proportion α̂_U represents the empirical type I error rate (the “hat” signifies that the parameter is estimated empirically). Use of α̂_U also accounts for any misspecifications in the underlying probability
model, which is important as the true form of this distribution cannot be known exactly.
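Given p values from randomized “non biological” units, the estimate itself is a one liner. This sketch (our own illustration with hypothetical uniform null p values) shows it recovering the nominal level:

```python
import numpy as np

def alpha_hat(p_rand, alpha=0.05):
    # Proportion of randomized units significant at alpha/2,
    # doubled because the underlying test is two sided.
    p_rand = np.asarray(p_rand)
    return 2.0 * np.mean(p_rand <= alpha / 2.0)

rng = np.random.default_rng(1)
p_rand = rng.uniform(size=10_000)   # well-calibrated null: uniform p values
a_hat = alpha_hat(p_rand, alpha=0.05)
```

When the randomization scheme is well calibrated, a_hat sits near the nominal α, mirroring the averages near the theoretical values reported in the Results.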
Each significant molecular unit, U – a protein, exon, or peptide – will have a unique
proportion of false positives, α̂_U, as the random p values depend on (1) the α level of the
unit being tested, (2) the number of daughter peptides, and, for exons and peptides, (3) the
intensities of the sibling peptides.
In multiple hypothesis testing, the probability of making a false discovery increases with an increasing number of comparisons. However, there is no absolute way to quantify or
control this rate. While family wise error rates (FWER) have been decried as being too
conservative, their calculation is intuitive and straightforward. False discovery rate (FDR)
control, on the other hand, has garnered much attention in the past 15 years for its practical
motivation, but its actual calculation is riddled with ambiguity, as the proportion of true null
hypotheses must be estimated from the distribution of test statistics or p values. No single
approach is appropriate in all settings, and we demonstrate how error rates must be considered with the validation strategy in mind.
The FWER calculates the probability of making a false discovery among a set, or family, of tests. If the probability of a single test being false is α , and n identical comparisons
are made (where tests are identical if they are tested at the same level, α ), then we are
interested in knowing the probability that the null hypothesis, H_o, is false (H_o = 0). The probability of all null hypotheses being incorrectly rejected is:

P(H_o^1 = 0, H_o^2 = 0, …, H_o^n = 0) = α^n

The probability that a null hypothesis is true is (1 − α), so the probability of no null hypothesis being incorrectly rejected, i.e. the probability of no false rejections, is (1 − α)^n = P(H_o^i = 1, ∀i). Thus, the probability of there being a false rejection in n comparisons is:

P(H_o^i = 0 for at least one i) = 1 − P(H_o^i = 1, ∀i) = 1 − (1 − α)^n
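For independent tests, the identity above translates directly (a minimal sketch):

```python
def fwer(alpha, n):
    # P(at least one false rejection among n independent tests at level alpha)
    return 1.0 - (1.0 - alpha) ** n
```

At α = 0.05, a single test carries a 5% chance of a false rejection, but 100 tests carry a greater than 99% chance, which is why multiplicity corrections exist.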
A number of alternative strategies consider the rate of false discoveries only among r rejected hypotheses, which has been referred to as a false discovery proportion 98 , or the tail probability for the proportion of false positives 99 . The false discovery rate, FDR, attempts to control the average proportion of false discoveries100 . The FWER has also been extended to allow for k false positives – k= 1 in the classical form of the FWER 99 . The deluge of
approaches for calculating an experimentally useful and simple parameter – the proportion
of false positives – is both overwhelming and misleading, as different approaches can
provide vastly different estimates of false discoveries (such as, in our experience, the FDR of
Benjamini and Hochberg versus the pFDR or q value of Storey 101 ). We argue that the interpretation of false discoveries, and the process by which they ought to be calculated, must depend on the validation strategy employed.
In the analysis of our Hpgd −/− data, we are interested in three separate validation strategies: high throughput validation (targeted LC MS/MS of differentially expressed peptides and proteins); network analysis (constructing a molecular subsystem from the set of candidate proteins); and one by one validation (PCR of alternative splicing candidates). Our high throughput validation strategy of choice is targeted LC MS/MS, and – like many high throughput platforms – it does not exhibit fluctuations in either (1) time or (2) cost upon the addition of another peptide/target. In other words, the “price” of making a single false positive claim in the discovery phase is negligible, and calculating the FWER – the probability of making at least one error – serves only as a statistical exercise with no practical value.
In a similar vein, when we construct molecular subsystems from lists of proteins, the resulting subsystem is often robust to perturbations in any individual unit 102 . In network analysis, subnetwork constructions can also be robust against the inclusion or exclusion of any individual node in the seed set 52, 103 . Therefore, the probability of making at least one incorrect claim is an uninformative calculation, as we are, instead, interested in the probability that the majority of the targets or seeds are correct . If we assume a uniform false positive rate for each test/protein, then this latter probability can be restated simply as the number of successes in r rejections, i.e. a cumulative binomial probability:
FDP_H(x, r) = P(≤ x successes in r rejections) = Σ_{i=0}^{x} C(r, i) p^i (1 − p)^{r−i},  where p = 1 − max_{i∈{1…r}} α̂_i
We will refer to this as a false discovery probability, FDP H, where the subscript “H” indicates “high throughput” validation. In validation approaches – high throughput technologies or network analysis – where “success” depends on the bulk of the targets being correct, we suggest that statements of the form above are more useful than those made by
FWERs or FDRs. One downside of the above approach is that it relies on a subjective definition of x. We choose to define x as x = r − ω, and use the maximum value of ω for which x/r exceeds a pre specified proportion (e.g. 0.70, so that 70% of the hypotheses are correctly rejected). The probability of success for a given set of hypotheses is set as one minus the maximum empirical type I error rate for the group, p = 1 − max_{i∈{1…r}} α̂_i. The complement of this
probability is more easily interpreted:
TDP_H = 1 − FDP_H
Which is a “true discovery probability.” The TDP H is the probability that at least x of our r targets are correct.
To tune the FDP_H(x, r), all r rejections are first sorted in increasing order by their p values. Then, we calculate FDP_H(g − ω, g) for increasing g ∈ [ω, r] with fixed ω. We fix the probability of success for each nested family of tests as p_g = 1 − max_{i∈{1…g}} α̂_i. Plotting FDP_H(g − ω, g) versus p values allows us to choose a comfortable level of FDP_H, and targets with p values outside this range can be removed from the analysis.
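The cumulative binomial form of FDP_H and the nested tuning scan can be sketched as follows (our own illustration; alpha_hats is a hypothetical stand-in for the per-unit α̂_i, assumed sorted by p value):

```python
from math import comb

def fdp_h(x, r, p):
    # P(<= x successes in r rejections), success probability p
    return sum(comb(r, i) * p ** i * (1 - p) ** (r - i) for i in range(x + 1))

def fdp_scan(alpha_hats, omega):
    # FDP_H(g - omega, g) for g = omega..r, with p_g = 1 - max(alpha_hat_1..g)
    out = []
    for g in range(omega, len(alpha_hats) + 1):
        p_g = 1.0 - max(alpha_hats[:g])
        out.append(fdp_h(g - omega, g, p_g))
    return out
```

The complement, 1 − FDP_H, then reads directly as the probability that at least x of the r targets are correct.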
The third validation strategy that we will also pursue is the “one by one” approach.
For the proteins that are candidates for alternative splicing, we design PCR primers specific to the spliced exon and look for multiple species via gel electrophoresis. Like other one by one approaches (e.g. Western blots), this strategy is costly and labor intensive, yet necessary.
In these situations, it is critical that we are confident about every candidate we choose to
pursue: we can only pursue a small number of candidates, and a large proportion of false
positives among this small subset would lead to disappointing results that may only reflect
the rate of false positives in the discovery phase, rather than true biological non existence.
Therefore, we calculate the false discovery probability for individual validation , FDP_I, using the empirical type I error rate for each exon, α̂_U, on g ordered p values, g ≤ r:

FDP_I = P(not all g hypotheses are true) = P(≥1 among g is a false positive)
= P(H_o^i = 0 for at least one i, i ∈ {1, …, g}) = 1 − ∏_{i=1}^{g} (1 − α̂_i)
Where the subscript “I” indicates that validation will be performed “individually,” or one by
one. This formulation of the FDP I uses empirical estimates of the local (individual) false
positive rates, α̂_i, which accounts for any misspecifications of the probability model.
The FDR literature focuses on allowing the investigator to choose a cut off
probability, i.e. “controlling” the level of FDR, the targets below which can be pursued for validation. In practice, however, investigators know a priori what a feasible number of targets for validation may be. With one by one strategies, the investigator may know that pursuing 5 proteins via Western blotting, for example, is economically feasible. Consequently, he or she is only interested in knowing the probability of an error among those 5 targets, and the
above formulation for FDP I is a straightforward measure in such applications.
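The FDP_I is a product over the individually validated candidates; this sketch (our own illustration, with hypothetical α̂ values) makes the rapid growth with group size concrete:

```python
def fdp_i(alpha_hats):
    # P(at least one false positive among the group) = 1 - prod(1 - alpha_hat_i)
    prob_all_true = 1.0
    for a in alpha_hats:
        prob_all_true *= 1.0 - a
    return 1.0 - prob_all_true
```

Two candidates at α̂ = 0.05 give an FDP_I just under 10%, while twelve candidates at α̂ ≈ 0.07 already push past 50% – the same order as the 57% quoted below for pursuing all 12 exons.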
Coexpression Network Construction
We hypothesized that the proteomic targets regulated by changes in abundance may be controlled transcriptionally. To uncover transcription factors that may be regulating these targets, we constructed a coexpression network using publicly available microarray data. We first downloaded a diverse set of experiments specific to the mouse intestinal epithelium from the Gene Expression Omnibus 30 ; we limited our search to experiments performed on the Affymetrix Mouse 430_2 platform. We downloaded a set of 20 experiments spanning a range of experimental perturbations and genotypes, for a total of 206 arrays. Each experiment was normalized independently using dChip, as it has been shown to preserve correlation structure throughout its normalization process 104, 105 . Outlier arrays were treated as missing data.
Next, the 206 arrays were used to build a bipartite coexpression graph between the proteomic targets and known transcription factors (TFs). We used the curated set of transcription factors found at
TfDb 106 . For genes covered by multiple probes, the average correlation profile of the probes was used as the representative, taking the average across the rows (proteomic targets) first, and then the columns (transcription factors).

Figure 4. Workflow for the analysis of Hpgd −/− vs wild type (WT) FVB mice. Proteins, exons, and peptides were considered in a hierarchical fashion (Venn diagram, bottom).
Figure 5. Genomic width of proteomic data. Peptides measured in our shotgun proteomic data were binned by their genomic width (number of base pairs); the length was also calculated for exons covered by ≥2 peptides and genes covered by ≥3 peptides (as assigned by Rosetta Elucidator). Each group was normalized by the total number of events observed in that category. Approximating normal distributions are shown as shadows. A curve of theoretical exponentially decaying noise is also plotted.

To test if a transcription factor has a specific association with the proteomic targets, we used the one sided Kolmogorov Smirnov (KS) test. For a given transcription factor, we compared the distribution of the proteomic targets to the background distribution (i.e. between the transcription factor and the rest of the probes on the array). We chose to examine positive correlations only – hence, using the one sided KS test – as these are more likely to be indicative of the proteomic targets being directly regulated by a transcription factor.
To further test the hypothesis that our set of proteomic targets shares a common regulator, we simulated a null distribution for a transcription factor’s KS statistic by calculating the KS statistic for stochastically generated sets of proteomic targets (of equal size); we calculated p values by counting the number of random sets whose KS statistic exceeded the real KS statistic. p values – from both the KS test itself and from the empirical null – were corrected for multiple hypothesis testing using the Bonferroni approach.
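The screen above can be sketched with a hand rolled one sided KS statistic and the empirical null just described (the correlation values and effect sizes here are synthetic and hypothetical; this is an illustration, not the original pipeline):

```python
import numpy as np

def ks_one_sided(targets, background):
    # D+ = max over x of [ECDF_background(x) - ECDF_targets(x)]:
    # large when target correlations sit to the RIGHT of the background.
    grid = np.sort(np.concatenate([targets, background]))
    ecdf_t = np.searchsorted(np.sort(targets), grid, side="right") / len(targets)
    ecdf_b = np.searchsorted(np.sort(background), grid, side="right") / len(background)
    return np.max(ecdf_b - ecdf_t)

rng = np.random.default_rng(2)
background = np.clip(rng.normal(0.0, 0.3, size=5000), -1, 1)  # TF vs all probes
targets = np.clip(rng.normal(0.4, 0.2, size=25), -1, 1)       # TF vs proteomic targets
d_real = ks_one_sided(targets, background)

# Empirical null: KS statistics for random target sets of equal size
null = np.array([ks_one_sided(rng.choice(background, 25, replace=False), background)
                 for _ in range(200)])
p_emp = np.mean(null >= d_real)
```

A genuinely right shifted target set yields a large D+ relative to the random-set null, and p_emp counts how often chance alone does as well.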
Results
Genomic Mapping and Differential Expression
10 mice with 6 gel fractions each resulted in 60 RAW files from the LC MS/MS analysis. In Rosetta, searching the resultant spectra against the IPI mouse database mapped
5701 peptides into 1210 proteins. Mapping these peptides onto their genomic coordinates resulted in the identification of 3813 exons spread across the 1210 proteins. The workflow is illustrated in Figure 4.
We then calculated the significance of differential expression for the 3 classes of events – proteins, exons, and peptides – which correspond to candidates that may be regulated by changes in abundance, alternative splicing, or PTMs, respectively. 42 proteins showed evidence for changes in abundance –i.e. differential expression of the majority of daughter peptides – and a larger set of 81 showed weaker evidence ( p value<0.05). As a differentially expressed protein must contain differentially expressed exons, we focused on those exons whose parent protein did not show evidence for differential expression as
candidates for alternative splicing. Hence, the 12 exons ( p value<0.05) do not intersect with
the set of 81 proteins with evidence for differential expression (Venn diagram at the bottom
of Figure 4). In a similar hierarchical fashion, exons contain differentially expressed peptides,
and we focus on the subset of 48 peptides not subsumed by such exons; we used the set of
76 exons with weaker evidence for differential expression ( p value<0.10) for exclusion.
Additionally, we organized the three events by their “genomic width,” i.e. their nucleotide length. The frequency of each event was normalized by the total number of that event observed. When plotted against the logarithm of the number of nucleotides, the three events form three roughly normal distributions (Figure 5). We also plotted a theoretical noise level (not based on real data) which is important in assessing the reproducibility of the three
events. The distribution of peptides, many of which have very short genomic widths, is strongly convolved with both biological and technical (i.e. the LC MS/MS pipeline) noise. As measurements are integrated across larger and larger spans of the genome, the noise level should decrease and the corresponding signal increase, allowing us to detect events at larger genomic widths.
Proteomic Targets
Tables I III (Appendix I) list the proteomic targets discovered for each type of event. As the candidate proteins and exons will be used in network analysis, we wanted to ensure that the majority of these targets are not false positives; similarly for peptides, as they are slated for validation via MassMatrix 107 . For this application, we use the false discovery
probability for high throughput validation , FDPH , tuning the proportion of true discoveries to be
>90% for proteins and peptides ( ω=4), and >80% for exons ( ω=2); different numbers of acceptable false positives were required due to the differences in size of the datasets. Based
on Figure 6, we removed targets for which the FDP_H exceeds 30%. For proteins, this led to the removal of 9 proteins with the highest α̂_i; in other words, there is a 70% probability
that more than 29 of the remaining 33 proteins are true. Four exons were removed, and
there is a 70% probability that at least 6 of the remaining 8 exons are true discoveries. 26
peptides were removed, and there is a similarly high (70%) probability that at least 18 out of
the remaining 22 are correct.
Figure 6. The FDP_H for proteins, exons, and peptides versus p values (left) and the targeted proportion of true discoveries (right). The choice of ω limits the maximum targeted proportion of correct rejections (i.e. the minimum number of false positives). When we aim to identify a large proportion of true discoveries, there is a low probability of this occurring and a high chance (high FDP_H) that many of those discoveries will be false. Controlling the FDP_H leads to a concomitant decrease in the size of true discoveries.

Since candidates for alternative splicing are slated for validation via PCR, we
calculate the false discovery probability for individual validation , FDP I, for the set of candidate
exons (Figure 7). When we want to be confident that all our targets are correct – we are
worried that we have at least one error among the group – then this probability grows quickly
with the size of the group, and there is a very high probability (57%) of at least one false
discovery being made if all 12 exons are pursued. Thus, we control the FDP I at 10%, which
limits our choice of candidates to Selenbp1 and Hsd11b2 . The interpretation of a 10% FDP I is
that there is a 90% probability that no false discovery has been made among these two
choices. The genomic mappings of the proteomic evidence for Selenbp1 and Hsd11b2 are shown in Figure 8.
Network Mining
Figure 7. The FDP_I is shown for proteins, exons, and peptides.

Setting a comfortable probability for FDP_H (30%) resulted in 33 proteins with evidence for changes in abundance (i.e. candidates for transcriptional/translational regulation) and 8 alternatively spliced candidates (i.e. 8 proteins each with a differentially expressed exon). We note that while transcriptional changes are one important form of regulation that affects protein abundance, observed changes in protein level can also be attributed to regulation of protein degradation and/or post transcriptional processing. Given Hpgd ’s intersection with Wnt signaling, however, we hypothesized that
Hpgd will have notable transcriptional effects and we chose to focus on this form of regulation alone. We hypothesized that the set of proteomic targets regulated by changes in abundance may be under the control of common transcription factors, and we used the topology of a coexpression network to examine this. To detect common transcription factors directly regulating the group, we limited analysis to only those proteomic targets with unidirectional changes, focusing on the subset of 25 upregulated targets. A murine intestine specific coexpression network was constructed, spanning 20 different experimental conditions and 206 arrays, yielding gene gene correlations that are robust to a wide range of experimental perturbations and genotypes. To examine whether a transcription factor (from
TfDb) specifically controls our proteomic targets, we used a one tailed Kolmogorov
Smirnov test between the proteomic specific correlations and the background correlations
(i.e. the correlations between the transcription factor and all the probes on the array). 966 transcription factors were tested at the α = 0.001/966 level, resulting in 46 significant hits with
the Bonferroni correction for multiple testing. The 46 by 25 matrix of proteins was
hierarchically clustered (complete linkage; Euclidean distance), allowing us to identify a subset
of 16 transcription factors and 18 proteomic targets with robust correlations (Figure 9). The
subset of 8 downregulated proteins did not exhibit significant correlations with any
transcription factors, likely due to the small size of the set.
Discussion
Genomic Mapping and Differential Expression
While the proteomics literature has focused on the detection of molecular events with either large or small genomic widths (proteins and peptides, respectively), we demonstrate that LC MS/MS based workflows offer the ability to probe an unexplored intermediary region: exons.
This is illustrated in Figure 5, where exons with ≥2 peptides form a consolidated and notable intermediary distribution between proteins and peptides.
Significance Testing and Error Rates

Figure 8. Genomic mappings of proteomic evidence for Selenbp1 (top) and Hsd11b2 (bottom) alternative splicing. WT peptides shown in red; Hpgd −/− peptides in blue. The arrow indicates the exon with ≥2 peptides and evidence for local differential expression. The genomic coordinates are for the positive strand, 5’ oriented to the left.

Calculating the significance of differential expression for the three categories of events – peptides,
exons, and proteins – yielded sets of 48, 12, and 42 targets, respectively. A less conservative type I error threshold (α = 0.10) was required for the analysis of peptides and exons since, as illustrated by Figure 5, these events are more highly convolved with noise in the data. This potentially higher rate of false positives can be addressed, however, by calculating empirical type I error rates, α̂_U, which take into account the sampling biases of LC MS/MS workflows and misspecification of our probability estimates. For all peptides with p values ≤ 0.05 (two sided test, α = 0.10), the average α̂_U was 0.104, with a standard deviation of 0.036. For all exons with p values ≤ 0.05 (two sided test, α = 0.10), the average α̂_U was 0.077, with a standard deviation of 0.033. For the 42 proteins (two sided test, α = 0.05), the average α̂_U was 0.059, with a standard deviation of 0.015. Thus, the average values of α̂_U are all close to their theoretical values, indicating that the underlying bootstrap distributions employed are accurate representations of a unique null case: the case when the event observed is due to the random segregation of peptides in the data.

Figure 9. Patterns of coexpression between 25 upregulated proteomic targets and the most significant transcription factors (α = 0.001/966). Hierarchical clustering reveals a highly coexpressed cluster of proteomic targets (CHCHD3 through RAB1B) driving the association with a cluster of transcription factors (FEM1C through TNFAIP3); clusters are indicated by blue on the dendrogram.
The trend has been to apply “FDR control” to the set of p values (or, in our case, the α̂_U) for the enticing idea of limiting the number of false positives in a high throughput dataset. However, the actual interpretation of such measures is ambiguous in the laboratory, where a “false positive” has an experimental, rather than statistical, meaning: false positives are those discoveries that would not validate in either (A) a reproduction of the original experiment or (B) an orthogonal experiment to measure the targets. We argue that the blanket application of FDR or FWER control is not appropriate in all settings, as the number of false positives depends on the particular validation strategy. Consequently, we use two measures of the probability of false discoveries among the set of rejected null
hypotheses: the FDP for high throughput validation (FDP H), a binomial cumulative probability that informs us about the “truth” of the majority of our targets; and the FDP for
individual validation (FDP I), which focuses on a single error being made.
Validation Strategies
As mentioned, the three sets of proteomic targets – proteins, exons, and peptides – require unique validation strategies. In this work, we pursue differentially expressed proteins using coexpression network analysis (discussed below), an approach aptly suited to uncovering transcriptional regulators of these targets.

Figure 10. HNF4A was confirmed by ChIP chip 1 to interact with the promoters of 6 proteomic targets.

Proteins with differentially expressed exons are posited to be candidates for alternative splicing, as local differential expression provides evidence as to the skipping or retention of that particular exon during mRNA processing. After accounting for multiple hypothesis testing, we have slated Selenbp1 and
Hsd11b2 for further study via PCR. Selecting primers to amplify select portions of these genes around the candidate exons is expected to result in multiple transcripts if alternative splicing exists, and these transcripts can be distinguished via gel electrophoresis.
Finally, differentially expressed peptides are candidates for demonstrating a wide variety of regulation, such as PTMs, as well as noise: incomplete tryptic digestion, partially
modified cysteines and/or methionines, ion suppression, etc. Modified peptides are expected
to show differences in intensity, as quantitation is typically based on the unmodified species
alone for computational efficiency. Searching the candidate peptides against common PTMs
in MassMatrix, however, confirmed that these peptide targets are buried deeply in the noise
inherent in LC MS/MS workflows (data not shown). This issue may be particular to our
disease model, as PTMs are often transient signal transducers, while the Hpgd −/− model represents a chronic inflammatory state at equilibrium (2–3 months of age). We believe that
modified peptide populations are more likely to be detected in short term studies where
changes in the gene expression profile have yet to be hardwired, such as an experiment
searching for acutely (<1 week) perturbed pathways following administration of a tyrosine
kinase inhibitor. For experiments designed with this particular target set in mind, an
assortment of strong inhibitors must be used to ensure stability of PTMs. Additionally,
differential expression of a lone peptide may arise due to misassignment of that peptide, and,
as such, experiments designed to target this peptide population ought to focus on unique
hits alone (avoiding the use of the Parsimony Principle in peptide assignment).
Signaling Pathways
Intracellular PGE2 must exit the cell through either passive diffusion or through
ATP dependent transport by way of MRP4 108 . MRP4 levels have been found to be increased
in a variety of human colonic tumors, as well as in polyps from Apc^Min/+ mice, with much higher levels of MRP4 expression than PGT expression 109 . This suggests that the
mechanisms for prostaglandin uptake and removal attempt to compensate for upstream
changes in prostaglandin synthesis (i.e. Cox1/2 ) or degradation (i.e. Hpgd ). As PGE2 removal
is ATP dependent, we expect to see an increase in the glycolytic capacity of the tissue in
response to decreased levels of Hpgd . Indeed, our proteomic results are saturated with such
targets: among alternatively spliced candidates, we find LONP (lon peptidase 1, involved in
mitochondrial DNA replication), PGD (6-phosphogluconate dehydrogenase, of the pentose
phosphate shunt), and IDH (isocitrate dehydrogenase, of the citric acid cycle). Among
candidates regulated by abundance, ALDOB (fructose-1,6-bisphosphate aldolase) stands out
as a metabolic enzyme, and ATP5F1 and ATP5H are members of ATP synthase, both of which are upregulated (1.5-fold change) upon knockout of Hpgd .
After PGE2 exits the cell, it can then bind to G-protein-coupled receptors, EP2 and
EP4, on neighboring cells. The subsequent increase in cAMP then triggers a number of
signaling cascades, and it was recently shown that PGE2 can activate RAP1A (i.e. increase the proportion of GTP-bound Rap1) in an EP2-dependent manner 110 . RAP1A operates in a variety of structural pathways: cell-cell adhesion 111 , migration, and invasion, suggesting that it may play a role in metastasis 112 .
In addition to RAP1A, we identified 6 additional members of the Ras superfamily:
RAB1B, RAB6, RAB11B, RAB5A, RAB14, and RAB18, all of which were upregulated (1.7
average fold change for the group) upon knockout of Hpgd . The RAB family of proteins
plays roles in regulating intracellular membrane traffic, with RAB1B 113 and RAB6 114 controlling exocytotic vesicular transport from the Golgi, and RAB11B, RAB5A, RAB18, and RAB14 playing roles in endosomal trafficking (though these roles are not necessarily distinct) 115, 116 . These changes suggest the cell compensates for Hpgd gene loss by upregulation of these Ras superfamily members.
Pathway Discovery
The discovery of cohorts of proteomic targets with related function suggested that the group of targets may (1) be under the control of a common regulator and/or (2) operate in related pathways. To uncover common regulators, we used a coexpression network specific to mouse intestinal epithelium to identify transcription factors that may control the expression of our proteomic targets. Though the activity of many transcription factors is transiently regulated post translationally, use of a coexpression network allows us to uncover transcription factors that show changes in abundance at the mRNA level – a more stable, long term form of regulation likely to be operative in these three month old mice. Using a
Kolmogorov Smirnov test, we compared the correlations between each transcription factor and the set of measured proteomic targets. Implicit in this approach is the assumed independence of the transcription factors, which is a tenuous biological assumption, and, consequently, we refrain from making group wise inferences about the resulting set of significant transcription factors, which are expected to be correlated with each other; rather, we focus on identifying individual transcription factors supported by multiple lines of
evidence.
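This per-transcription-factor comparison can be sketched in a few lines. The correlation values below are hypothetical, and the two-sample Kolmogorov-Smirnov D statistic is computed directly (no p-value is derived here) rather than via a statistics library:

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D statistic:
    the maximum distance between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        # empirical CDF of each sample evaluated at v
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical example: correlations of one transcription factor with
# the proteomic target set versus with background genes on the array.
tf_vs_targets = [0.82, 0.75, 0.91, 0.68, 0.79]
tf_vs_background = [0.05, -0.12, 0.30, 0.10, -0.25, 0.18]
print(ks_statistic(tf_vs_targets, tf_vs_background))  # → 1.0 (fully separated)
```

A large D indicates that the transcription factor's correlations with the target set are drawn from a different distribution than its correlations with the background, which is the signal sought here.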
We identified a set of significantly correlated transcription factors, which contained a
subcluster of 16 targets with higher coexpression connected to 18 of the upregulated
proteomic targets (Figure 9). Then, using DNA-protein interactions (in MetaCore), this set of 18 proteomic candidates was mined for known associations with transcription factors.
MetaCore's proprietary analysis returned a number of candidate subnetworks, with Hnf4a showing more interactions (based on data from ChIP-chip experiments) with our proteomic targets than any other transcription factor. Given Hnf4a 's strong correlation with the upregulated proteomic targets, we infer that Hnf4a may be upregulated upon knockout of Hpgd , as well.
Hnf4a (hepatocyte nuclear factor 4α) is a transcription factor that plays critical roles in intestinal cell differentiation and homeostasis 117 . Hnf4a may form a link between diet and intestinal health, as it was also recently discovered that Hnf4a can reversibly bind fatty acids, being neutrally affected by omega-6 fatty acids (i.e. linoleic acid) 118 , inhibited by omega-3 fatty acids, and activated by saturated fatty acids 119 . In a landmark connection with inflammation, a genome-wide association study in individuals of European ancestry recently revealed that the most significant polymorphism associated with ulcerative colitis, a common form of inflammatory bowel disease (IBD), lies in the 3' untranslated region of
Hnf4a 120 .
Hnf4a appears to play different roles in inflamed versus neoplastic epithelium. While
loss of Hnf4a alone increases both cellular proliferation and apoptosis in mouse intestinal
epithelium 121 , concomitant loss of Apc leads to a lower polyp burden than observed with wild-type Hnf4a 122 . In colorectal tumor biopsies from 40 patients, Hnf4a mRNA and protein levels were both increased on average 122 . Similarly, in the HT29 human colon cancer cell line, knocking down Hnf4a with siRNA or fatty acyl analogues also inhibits tumor growth both in vitro and in xenografts 119 . These results indicate that Hnf4a likely serves as a tumor suppressor in the context of a genetic background that is prone to tumorigenic insults.
While HNF4A is upregulated in tumors, perhaps damping tumor growth via its pro-apoptotic effects, inflamed tissue appears to downregulate HNF4A. In biopsies from ulcerative colitis patients, HNF4A mRNA levels were reduced relative to normal patients
(though the number of normal patients analyzed was much smaller than ulcerative colitis patients) 123 . In a conditional villin Cre system, Ahn et al. found that inflammatory stimuli reduce the expression of HNF4A and, in turn, mice mutated at Hnf4a have intestinal
epithelia more susceptible to damage by such stimuli. In fact, loss of Hnf4a induces an
inflammatory state, with elevated levels of a variety of cytokines (e.g. IL-1β, TNF-α,
IFN-γ) 123 . As the Hpgd^-/- model is highly inflammatory, HNF4A upregulation in our model may be a
compensatory mechanism to offset this state, as illustrated in Figure 11.
As mentioned, the Hpgd^-/- mouse has an increased susceptibility to colonic neoplasia
induced by mutations in Apc , implicating the Wnt signaling pathway in its pathogenesis 124 .
The increased EP2 receptor activation by PGE2 was shown to result in increased association of the Gα subunit of the EP2 receptor with axin, releasing GSK-3β and allowing it to
become inactivated 76 . The inactivation of GSK-3β leads to a concomitant freeing of β-catenin (CTNNB1), which can then translocate to the nucleus to activate transcription of the
Wnt target genes. In the Hpgd^-/- mouse, the loss of Apc is necessary to observe neoplasia, and one hypothesis is that Apc serves, in part, to sequester free CTNNB1 from the activating effects of prostaglandins 76 . From the coexpression network analysis, CTNNB1 was found to be significantly and strongly coexpressed alongside HNF4A, implying that CTNNB1 is also upregulated upon knockout of Hpgd . The upregulation of CTNNB1 is likely to occur independently of HNF4A, as loss of HNF4A in conditional mouse knockouts is followed by an increase in nuclear CTNNB1 in the small intestinal crypts and villi, indicating that
HNF4A opposes the transcriptional effects of CTNNB1 (see Figure 11) 121 .
The Hpgd^-/- mouse was first envisioned as a model for colorectal cancer, showing evidence of colonic neoplasia upon crossing with Apc^Min/+ mice, and the discovery of Hnf4a underscores the link between colorectal cancer and inflammation. It has long been known that patients with chronic inflammatory bowel diseases, IBDs (Crohn's disease, CD, and ulcerative colitis, UC), have an increased risk of developing colorectal cancer, though the pathogenesis differs from spontaneous colorectal cancer. While Apc is considered the
“gatekeeper” for sporadic colorectal cancer, Apc mutations are infrequent in neoplasias
arising from UC patients 125 , and Tp53 mutations, instead, are correlated with histological
progression from predysplastic to neoplastic epithelium 126, 127 . In light of our data, loss of Apc in UC may simply be a redundant mutation, given the upregulation of Ctnnb1 we hypothesize occurs in mouse pro-inflammatory epithelium.
Summary
Herein, we have introduced a novel approach to deconvoluting proteomic data, making use of the genome as a scaffold on which peptide-level information should be interpreted. In the analysis of label-free proteomic data from the intestine of the Hpgd^-/- mouse, we identified subsets of proteins regulated by changes in abundance and by changes in splicing. Conservatively limiting the probability of making a single error among the latter category, we present two candidates for alternative splicing – Selenbp1 and Hsd11b2 – to be
pursued using PCR. For the set of proteins regulated by abundance, we control the false
discovery probability of the entire group to distill our targets down to a set of 33 candidates.
To uncover common regulators of this set of targets, we constructed a coexpression
network specific to the mouse intestinal epithelium, thus revealing a tightly correlated cluster
of transcription factors. Among these, we found CTNNB1, a well-known player in cancer previously implicated in the pathogenesis of perturbed prostaglandin signaling; and HNF4A, a regulator of many cellular processes whose connections to colorectal cancer and inflammation are still being explored.
Given that HNF4A is a transcriptional controller of several of our upregulated proteomic targets (with tightly correlated mRNA expression patterns), our data suggest that
HNF4A itself is upregulated in response to knockout of Hpgd . Though a hypermorphic
mouse mutant has not yet been constructed, we hypothesize that constitutively active or
overexpressed HNF4A would result in a hyperproliferative and/or less apoptotic epithelium.
Additionally, it is known that Hpgd signaling relies, in part, on CTNNB1, and this was confirmed in our unbiased proteomic screen, as CTNNB1 was highly coexpressed with a large cluster of our proteomic targets. Our data suggest that both HNF4A and CTNNB1 are upregulated, and, based on previous mouse models, the upregulation of CTNNB1 likely occurs independently of HNF4A. In fact, we hypothesize that loss of Hpgd leads to a
neoplastic predisposition due to the
upregulation of CTNNB1, and the
increase in HNF4A serves to
counterbalance these effects.
Figure 11. Hypothesized mechanism by which 15-prostaglandin dehydrogenase ( Hpgd ) exerts its influence on both inflammation and neoplasia. A pro-inflammatory state (induced by loss of Hpgd ) results in decreased inhibition of both Hnf4a and Ctnnb1 . Tcf indicates the T cell family of transcription factors that induce transcription of Wnt target genes.
As this study was motivated by the hope of uncovering novel regulatory mechanisms – and drug targets – of prostaglandin signaling, Hnf4a is a promising target given its known ability to bind a variety of fatty acids 118 . If Hnf4a truly does play a tumor-suppressive role, the difficulty, then, is in
engineering a linoleic acid capable of activating the protein. Additionally, designing linoleic
acids that are not absorbed beyond the intestinal epithelium offers the hope of creating a
tissue-specific molecular therapy. Such a therapy would be of immense benefit to UC
patients and colorectal cancer patients alike, as it would provide a much needed alternative to
the current mainstays of therapy: NSAIDs and coxibs.
3 Apc Signaling with Candidate Drivers: Two Drivers, Many Paths
Introduction
It is clear that sporadic colorectal cancer – as well as other cancers – is largely the product of acquired somatic mutations 128 . Many of these mutations are functionally
relevant to the tumor ("driver" genes), and the most well-studied cancer driver gene remains Apc
(adenomatous polyposis coli), thought to be the first hit in the majority of nonhereditary
colon cancers 129 . While Apc is commonly known as an antagonist of β-catenin and WNT
signaling 130 , a growing body of evidence points to the importance of Apc in a variety of cellular contexts – from microtubule polymerization 131 to cell migration 132 . Apc also plays important roles in chromosome segregation and stability, localizing to spindles, kinetochores, and centrosomes in mitosis 133, 134 .
The myriad aspects of Apc signaling may not be relevant in all cellular contexts,
however, as signaling depends upon the background gene expression program and, in cancer
biology, is often the result of multiple mutations. In fact, mouse models mutated at two
driver genes simultaneously have shown a synergistic (i.e. non additive) increase in tumor
burden, such as in Pten Apc 135 , Kras Tgfb 136 , and Apc Trp53 137 . Such genetic synergy suggests
that the pathways emanating from the two genes intersect downstream, supporting the idea
that only a subset of all possible pathways are involved in a tumor harboring a mutation in
Apc. We hypothesize that these mutations have distinct synergistic effects on the cancer phenotype, such that the activities of these networks are greatly associated with the measured downstream changes in the proteome of the intestine. We argue that these measured molecular changes can be leveraged to elucidate which pathways are most relevant to the disease model at hand.
Reproduced from Bebek, G., Patel, V. et al. (2010). BioMed Central Open Access license agreement.
To prioritize the various pathways associated with a cancer driver gene, we have developed a computational framework to predict the set of pathways functionally related to
Apc signaling in mouse models. Our algorithm mines chains of proteins (simple paths) from a protein protein interaction network; these paths are then filtered by tissue specific mRNA coexpression and Gene Ontology (GO) annotation rule mining 138 . To identify biologically
relevant paths, we constrain our search space to pathways connected to previously identified
cancer driver (CAN) genes 17 , as many of these pairings are expected to be simultaneously mutated. The set of paths linking Apc to each CAN gene comprises a subnetwork, which we refer to as a petal in the Apc blossom . As each petal is based on in silico predictions, we then use publicly available functional genomic and proteomic data from the intestine of the Apc^1638N/+ mouse to assess the biological relevance of each petal. As proteins themselves are the mediators of cellular functions, we mapped proteome-level measurements (from 2D DIGE) to each petal, using mRNA-level coexpression to quantify the strength of the relationship.
Though transcriptional activity (i.e. mRNA level) does not strictly correlate with translational activity (i.e. protein level), coexpression information can still be helpful in uncovering regulatory hot spots in protein networks 40 . Testing each petal against such functional data
correlates gene and protein expression readouts with specific driver gene relationships,
thereby allowing the experimenter to identify the petal most likely to be operative in this
particular mouse model. The paper is organized as follows: (1) we introduce the
computational framework used to predict signaling pathways between Apc and other CAN
genes, resulting in the Apc blossom; (2) we introduce the concept of testing networks against functional data, specifically proteomic measurements; (3) we then discuss the expected bimodality of mRNA coexpression patterns, and present a novel parameter to measure this property; (4) we conclude with a discussion of the discovered Apc
Hapln1 petal.
Methods
Blooming the Blossom
Figure 1. Workflow for Proteomic Evaluation and Topological Analysis of a Mutated Locus' Signaling (PETALS). We begin with a cancer driver gene of interest (e.g. Apc ) and a set of putative signaling partners. The pathways between these two sets are predicted based on protein-protein interactions, coupled with mRNA coexpression and GO annotations. The prediction of pathways to the various signaling partners allows individual subnetworks to blossom into a flower with many petals. The biological relevance of each petal is assessed against proteomic evidence (i.e. 2D DIGE), using the bimodality of mRNA coexpression to quantify this association. This results in a ranking of petals, which can then be plucked for further experimental evaluation.
The Apc blossom is built using the Blossom algorithm and the PathFinder architecture 139 ; additional details can be found in Appendix III. A recent study compared various frameworks for detecting signaling pathways, and PathFinder was found to have the best recall rate 140 . In the Blossom algorithm, networks (i.e. pathways) connecting proteins of interest are built by integrating and mining multiple datasets. First, the network of publicly available interactions 41, 42, 47, 141 (>80,000 interactions) is filtered to remove less reliable interactions, i.e. likely false positives, and, then, new interactions are added to enrich the network to account for missing interactions, i.e. false negatives. To remove false positives, we use a logistic regression model that incorporates (i) the number of times a PPI is observed, (ii) coexpression measurements for the corresponding genes, (iii) the protein's small-world clustering coefficient, and (iv) the protein subcellular localization data of interacting partners 142 .
Coexpression values (Pearson’s correlation coefficient) are calculated from mRNA
expression profiles of the laser-capture-microdissected epithelium from the Apc^Min/+
mouse (series GSE422 143 ), providing coregulatory information specific to our tissue and organism of interest. The logistic regression model is trained on positive (1000 PPIs from the MIPS database of interactions 144 ) and negative training data sets (1000 randomly selected
PPIs not in MIPS). After accounting for false positives, we impute false negatives based on the observation that proteins with similar sequences share similar interaction partners in the same organism 145 . Thus, an interaction is inferred between two proteins if no record of
interaction exists and there exists at least one interaction between the protein families (based
on sequence information) of these two proteins (Pfam release 23.0 used 146 ). Repeating these
trials 100 times, an optimized cut-off point for the probability of a true interaction is set, and
a network of reliable interactions is formed (~ 30K PPIs).
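The family-based imputation step reduces to simple set logic. In this sketch, the protein names and Pfam family assignments are hypothetical placeholders:

```python
# Map each protein to its Pfam family (hypothetical assignments).
families = {"ProtA": "PF001", "ProtB": "PF002", "ProtC": "PF001", "ProtD": "PF003"}

# Observed, reliability-filtered interactions (the edge set E').
observed = {frozenset(("ProtA", "ProtB"))}

# Family pairs supported by at least one observed interaction.
family_pairs = {
    frozenset((families[p], families[q]))
    for p, q in (tuple(e) for e in observed)
}

def impute_edges(proteins):
    """Infer an edge between two proteins if they do not already interact
    but their Pfam families are known to interact."""
    inferred = set()
    for p in proteins:
        for q in proteins:
            if p >= q:  # each unordered pair once, no self-pairs
                continue
            edge = frozenset((p, q))
            if edge in observed:
                continue
            if frozenset((families[p], families[q])) in family_pairs:
                inferred.add(edge)
    return inferred

# ProtC shares ProtA's family, so it inherits A's interaction with ProtB.
print(impute_edges(list(families)))
```

In the actual pipeline this rule is applied over the full Pfam 23.0 assignments and the ~30K reliable PPIs, rather than four toy proteins.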
These steps resulted in a filtered network with predicted edges within which we
searched for pathways linking APC and CAN genes. GO Biological Process annotations are
used to generate functional association rules from known pathways 139 . Association rules are
tuples representing a noteworthy relationship, in this case functional relationships, between
two interacting proteins. For each protein, leaf terms on the GO term graph are used. In
addition, the average absolute coexpression is calculated for each path, and paths are then
filtered according to a set threshold (γ = 0.6). A p-value is then calculated for each path under the null hypothesis that association rules are distributed uniformly across paths, i.e. that any path is expected to carry the average number of rules. Significant paths ( p < 0.05) are merged into a subnetwork, thus representing a petal in the Apc blossom. An empty set is returned when there is no significant path.
Formally, let G(V, E) denote the PPI network gathered from publicly available interactions. Also, let G' and G'' be networks built on the same set of nodes, V, using the procedures outlined above, where false-positive interactions, F, are removed, E' = E \ F, to obtain G'(V, E'), and a set of additional interactions, H, are imputed (based on sequence information), E'' = E' ∪ H, to obtain G''(V, E'').
The objective of the proposed framework is to find a petal for a given protein ca
(e.g. Apc ) and each protein ci in the candidate set C ⊂ V. To reduce the search space, the Blossom algorithm employs a network-diameter heuristic. Namely, for each node pair (ca, ci), let di denote the length of the shortest path between them in G(V, E), i.e. the graph without
inferred edges. We then search G''(V, E'') for every path that connects ca and ci with path length no greater than di. This guarantees at least one path for consideration if the two nodes are connected.
The paths on the network are discovered using an all-paths depth-first search
( AllPathsDFS ), in which every path connecting ca and ci of length at most di is identified and collected into the set Φi. In the final step of the algorithm, these paths are compared against the null distribution for significance. For the shortest-path calculation, a single-source shortest-path solution is used
(i.e. Dijkstra's algorithm). The Blossom algorithm's run time is the same as that of the all-paths
depth-first search: O(V^dmax), where dmax = max{di : ci ∈ C}. The algorithm is shown in Figure 2.
Input: G(V, E), G'(V, E'), G''(V, E''), ca, C, pthreshold, γ
Output: G_ca
foreach ci ∈ C do
    di = ShortestPathDistance( G(V, E), ca, ci );
    if di = ∞ then
        di = ShortestPathDistance( G''(V, E''), ca, ci );
    end
    Φi = AllPathsDFS( G''(V, E''), ca, ci, di );
    forall φ ∈ Φi do
        if r(φ) ≥ γ and pφ < pthreshold then
            G_ca = G_ca ∪ φ
        end
    end
end
Figure 2. The Blossom algorithm that returns the blossom network for protein ca.
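A minimal sketch of the search in Figure 2, using a toy graph in place of the ~30K-edge filtered network. BFS stands in for Dijkstra since the network is unweighted, and the length bound is taken as inclusive so the shortest path itself qualifies; the node names are hypothetical:

```python
from collections import deque

def shortest_path_distance(graph, src, dst):
    """BFS shortest-path length in an unweighted graph."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # disconnected pair

def all_paths_dfs(graph, src, dst, max_len):
    """Enumerate every simple path from src to dst with at most max_len edges."""
    paths, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # prune paths that have hit the length bound
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple paths only
                stack.append((nxt, path + [nxt]))
    return paths

# Toy PPI network: Apc reaches a hypothetical CAN gene "Ci" two ways,
# but only the 2-edge route survives the diameter bound.
g = {"Apc": ["X", "Y"], "X": ["Ci"], "Y": ["Z"], "Z": ["Ci"], "Ci": []}
d = shortest_path_distance(g, "Apc", "Ci")       # 2
print(all_paths_dfs(g, "Apc", "Ci", d))          # → [['Apc', 'X', 'Ci']]
```

Each surviving path would then be scored for coexpression and association-rule significance before being merged into the petal.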
Plucking Petals
For a particular petal, a single node perturbation (e.g. a mutation at Apc ) within the
petal itself will perturb pathways that are expected to associate with the given petal more
strongly than others, assuming that the network predictions were accurate. To identify the
“best” petal in the Apc blossom, we employed a mouse mutant, Apc^1638N/+ , representing a perturbation at the stamen. As proteins are the ultimate mediators of function, targets from proteomic experiments – such as label-free or, in our case, 2D DIGE – represent an ideal dataset for assessing the downstream effects of such perturbations. However, proteomic technologies often sample the most abundant quartile of proteins 147 , while cancer network predictions – such as those in the Apc blossom – often focus on low-abundance signaling
proteins. To make inferences about these petals, a relational map must be used to connect the proteomic targets to the petal of interest. Coexpression networks are currently the most informative and accessible mapping available, as proteins correlated at the mRNA level are hypothesized to be coregulated. Thus, for a hypothesized petal, P , mRNA coexpression
(Pearson’s correlation coefficient) was calculated between the nodes, p ∈ P , and the 2D
DIGE targets, d ∈ D (where D ⊂ S , and S is the set of all genes on the array), measured in the Apc^1638N/+ mouse intestinal epithelium. The 2D DIGE targets' Mascot DAT files are
available through the Proteomics Identifications Database 148 , accession number 10638. The corresponding microarray data have been made publicly available through the Gene
Expression Omnibus 30 , accession number GSE19338. Two fractions, representing crypts
and villi, were available with 4 samples in each group (8 samples each, wild type and
Apc^1638N/+ ). The proteins identified within each fraction were pooled to arrive at a set of 31
2D DIGE targets. For calculating mRNA differential expression, Robust Multiarray
Averaging was used to normalize the samples; differential expression was calculated between
the 8 mutant samples versus the 8 wild type samples. For coexpression, the wild type and
Apc^1638N/+ microarray data were normalized by dChip 104 to avoid artificially inflating
coexpression values 105 .
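The coexpression mapping itself is just Pearson's correlation between expression profiles. A stdlib-only sketch, with made-up normalized values for one petal node and one 2D DIGE target across eight samples:

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical normalized expression values, one per sample
# (e.g. 4 wild-type followed by 4 mutant samples).
petal_node = [2.1, 2.4, 2.0, 2.6, 3.1, 3.3, 3.0, 3.4]
dige_target = [1.0, 1.2, 0.9, 1.3, 1.9, 2.1, 1.8, 2.2]
print(round(pearson(petal_node, dige_target), 3))  # strongly coexpressed
```

In the analysis proper, this coefficient is computed for every pair (p, d) with p ∈ P and d ∈ D over the dChip-normalized samples.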
Additionally, mRNA coexpression is more informative for nodes that are known to
be differentially expressed, as these nodes are regulated differently between wild type (WT ) and mutant (MT ) tissue; a node with low differential expression may have many
coexpression linkages simply due to its uniform expression profile, which is shared by the
majority of genes (as most genes are not differentially regulated). To focus on genes with
strong levels of both coexpression and differential expression, we compute the active
coexpression as follows:
r'_i = α_i · r_i
where r_i is the vector of coexpression between node i (in petal P ) and all other genes on the array; α_i is the activity of node i , defined as the scaled, absolute differential expression: