Computational Analysis of DNA Methylation and Gene Expression Patterns in Prostate Cancer

Ieva Rauluševičiūtė Computational Analysis of DNA Methylation and Gene Expression Patterns in Prostate Cancer Master’s thesis in Molecular Medicine Trondheim, June 2018 Supervisors: dr. Morten Beck Rye and prof. Finn Drabløs Norwegian University of Science and Technology Faculty of Medicine and Health Sciences Department of Clinical and Molecular Medicine ABSTRACT DNA methylation is an important contributor for prostate cancer development and progression. It has been studied experimentally for years, but, lately, high-throughput technologies are able to produce genome-wide DNA methylation data that can be analyzed using various computational approaches. Thus, this study aims to bioinformatically investigate different DNA methylation and gene expression patterns in prostate cancer. DNA methylation data from three datasets (TCGA, Absher and Kirby) was correlated with gene expression data in order to distinguish different regulation patterns. Classically, increased DNA methylation in promoter regions is being associated with gene expression downregulation. Although, results of the present project demonstrate another robust pattern, where DNA hypermethylation in promoter regions of 1,058 common genes is accompanied by upregulated expression. After analyzing expression and methylation values in the same samples from TCGA dataset, expression overcompensation in a dataset as an explanation for upregulation was excluded. Further reasons behind the pattern were investigated using TCGA DNA methylation data with extended list of probes and includes the presence of methylated positions in CpG islands, distance to transcription start sites and alternative TSSs. As compared with the downregulated genes in the classical pattern, upregulated genes were shown to have more positions in CpG islands and closer to TSSs. Moreover, the presence of alternative TSS in prostate was demonstrated, also disclosing the limitations of methylation detection systems. In conclusion, indications from a present study suggest a possible DNA methylation role in gene expression upregulation in prostate cancer, which prompts to reconsider the classical believe that DNA methylation always leads to gene suppression. iii iv ACKNOWLEDGEMENTS This thesis is a part of MSc in Molecular Medicine study program in Faculty of Medicine and Health Sciences (MH) at Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. I would like to thank my supervisor dr. Morten Beck Rye for an opportunity to work with him, for his continuous advices and expertise, incredible patience and support throughout this project. Thank You very much for all the lessons and fruitful discussions. I am also grateful to prof. Finn Drabløs for allowing me to join his research group in the Department of Clinical and Molecular Medicine. Thank You for the guidance and many suggestions. Finally, my education would not be possible without the support of my family, which I am thankful for. v vi TABLE OF CONTENTS LIST OF FIGURES ........................................................................................................... ix LIST OF TABLES .............................................................................................................. x ABBREVIATIONS ............................................................................................................ xi 1. INTRODUCTION ....................................................................................................... 1 1.1. Prostate cancer .................................................................................................... 2 1.1.1. Prostate cancer problem and epidemiology..................................................... 2 1.1.2. Prostate gland and prostate cancer development ............................................. 4 1.1.3. Current methods for diagnosis of prostate cancer. Pros and cons of methods for early diagnosis ....................................................................................................... 10 1.2. DNA methylation ............................................................................................... 12 1.2.1. General concepts of DNA methylation ......................................................... 12 1.2.2. DNA methylation changes in cancer ............................................................ 17 1.2.3. Current methods for detection and analysis of DNA methylation ................. 23 2. DATA AND METHODS ........................................................................................... 27 2.1. Research project design and workflow ............................................................. 27 2.2. Data selection ..................................................................................................... 29 2.2.1. DNA methylation data ................................................................................. 29 2.2.2. Gene expression data ................................................................................... 29 2.3. Programming languages .................................................................................... 30 2.3.1. Python ......................................................................................................... 30 2.3.2. R .................................................................................................................. 31 2.4. Gene set enrichment analysis ............................................................................ 32 2.5. Genome browser ................................................................................................ 33 2.6. DBTSS ................................................................................................................ 34 3. RESULTS .................................................................................................................. 35 3.1. Datasets of DNA methylation in prostate cancer .............................................. 35 3.2. PART I: DNA methylation and gene expression patterns and their significance 37 3.2.1. DNA methylation gain and loss in TCGA, Kirby and Absher DNA methylation datasets .................................................................................................... 37 3.2.2. DNA methylation and gene expression patterns in TCGA, Kirby and Absher datasets........................................................................................................................ 39 3.2.3. Consistency of genes, associated with multiple probes ................................. 40 3.2.4. Gene set enrichment analysis for overlapping genes ..................................... 41 3.3. PART II: Sample-to-sample comparison of gene expression and DNA methylation of genes in ‘Gain of methylation — upregulated expression’ regulation pattern ........................................................................................................................... 44 3.4. PART III: Analysis of ‘Gain of methylation — upregulated expression’ regulation pattern and comparison with ‘Gain of methylation — downregulated expression’ pattern ........................................................................................................ 45 vii 3.4.1. Genes, following only UPUP and UPDOWN regulation patterns ................. 45 3.4.2. Visualization of differentially and non-differentially methylated positions in Genome Browser ......................................................................................................... 47 3.4.3. Connection between methylated positions and transcription start sites .......... 51 3.4.4. Connection between hypermethylated positions and CpG islands ................. 54 3.4.5. Gene set enrichment analysis ....................................................................... 55 4. DISCUSSION ............................................................................................................ 57 4.1. Data selection ..................................................................................................... 57 4.2. PART I ............................................................................................................... 58 4.3. PART II .............................................................................................................. 59 4.4. PART III ............................................................................................................ 60 CONCLUSIONS ............................................................................................................... 63 REFERENCES.................................................................................................................. 65 viii LIST OF FIGURES Figure 1 Incidence rates for different cancer types in men in US from years 1975 to 2014 .... 2 Figure 2 From normal prostate to prostate cancer .................................................................. 5 Figure 3 BPH and prostate cancer share risk factors, but whether BPH is a premalignant condition is still unclear ........................................................................................................ 6 Figure 4 Molecular pathogenesis of prostate cancer .............................................................. 7 Figure 5 Ligand-dependent gene expression activation by androgen receptor. ....................... 8 Figure 6 DNA methylation ................................................................................................. 13 Figure 7 DNA methylation is one of the mechanisms that regulates gene expression .........

Computational Analysis of DNA Methylation and Gene Expression Patterns in Prostate Cancer

Interoperability in Toxicology: Connecting Chemical, Biological, and Complex Disease Data

The Function and Evolution of C2H2 Zinc Finger Proteins and Transposons

Appendix 2. Significantly Differentially Regulated Genes in Term Compared with Second Trimester Amniotic Fluid Supernatant

Supplementary Data

Identification of Novel Genes in Human Airway Epithelial Cells Associated with Chronic Obstructive Pulmonary Disease (COPD) Usin

Region Based Gene Expression Via Reanalysis of Publicly Available Microarray Data Sets

Bivariate Genome-Wide Association Analysis Strengthens the Role of Bitter Receptor Clusters on Chromosomes 7 and 12 in Human Bitter Taste

Functional Specialization of Human Salivary Glands and Origins of Proteins Intrinsic to Human Saliva

Deconvolution of Transcriptional Networks Identifies TCF4 As a Master Regulator in Schizophrenia

Host Cell Factors Necessary for Influenza a Infection: Meta-Analysis of Genome Wide Studies

Nazal Polipli Hastalarda Scgb1c1 (Lıgand-Bındıng Proteın Ryd5)

Comparative Transcriptome Analysis Reveals Key Epigenetic Targets in SARS-Cov-2 Infection