An Evaluation of Cancer Subtypes and Glioma Stem Cell Characterisation: Unifying Tumour Transcriptomic Features with Cell Line Expression and Chromatin Accessibility
Ewan Roderick Johnstone
EMBL-EBI, Darwin College
University of Cambridge

This dissertation is submitted for the degree of Doctor of Philosophy
Darwin College
December 2016

Dedicated to Klaudyna.

Declaration

• I hereby declare that except where specific reference is made to the work of others, the contents of this dissertation are original and have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other university.
• This dissertation is my own work and contains nothing which is the outcome of work done in collaboration with others, except as specified in the text and Acknowledgements.
• This dissertation is typeset in LaTeX using one-and-a-half spacing, contains fewer than 60,000 words including appendices, footnotes, tables and equations and has fewer than 150 figures.

Ewan Roderick Johnstone
December 2016

Acknowledgements

This work was funded by the Biotechnology and Biological Sciences Research Council (BBSRC, Ref: 1112564) and supported by the European Molecular Biology Laboratory (EMBL) and its outstation, the European Bioinformatics Institute (EBI).

I have many people to thank for assistance in preparing this thesis. First and foremost I must thank my supervisor, Paul Bertone, for his support and willingness to take me on as a student. My thanks are also extended to present and past members of the Bertone group, particularly Pär Engström and Remco Loos, who have provided a great deal of guidance over the course of my studentship. I must also thank the members of my thesis advisory committee, Steve Pollard, John Marioni and Jan Korbel, for their support and comments. Last but not least I thank my wife-to-be, Klaudyna Johnstone née Schmidt, for your significant patience and support.
A number of people contributed to the analysis set out in this thesis. Remco Loos contributed comments and notes on the work presented in Chapter 2. Harry Bulstrode contributed cell culture and ATAC-seq library preparation for the ATAC-seq datasets used in Chapter 4. Steven Pollard provided advice on glioma and GNS biology as well as supervision for H. Bulstrode. Paul Bertone contributed supervision, funding and manuscript revision, alongside cell culture and microarray data preparation for the GNS and NS exon array data presented in Chapter 3.

Abstract

The observation of significant cellular heterogeneity between tumours of the same type has inspired efforts to identify gene expression based signatures representative of this variation. Large sample size expression datasets have been used to describe discrete clusters of tumour samples that are intended to represent functionally and clinically divergent tumour subtypes. While many of these studies have identified reproducible subtypes, the definition of clear expression-based subtypes in glioma has been particularly elusive. Relating these tumour subtypes to expression signatures and phenotypes in cancer stem cells has also been difficult.

In this thesis I apply a novel coexpression analysis method to identify independent subtypes within independent coexpression modules. These modules relate to intuitive biological features, enabling module specific variation to be identified independently of a transcriptome wide subtype. This methodology is used to evaluate established subtypes in breast ductal carcinoma and glioma. In breast carcinoma the basal, luminal, Her2-enriched and claudin-low cancer subtypes are replicated, revealing functional expression differences that define each type. In glioma, dominant expression variation presents a grade associated axis of proneural to mesenchymal expression.
This axis is also present within individual tumours, suggesting that classification of individual tumours as discrete subtypes should not be assumed.

Analysis of glioma derived stem cell lines similarly reveals distinct proneural and mesenchymal clusters in both gene expression and chromatin accessibility. These signatures also unify phenotypes described in previous glioma stem cell analysis. Proneural signature genes suggest these lines are similar to normal glial progenitor cells, while mesenchymal expression largely relates to inflammatory and immune responses. Differential chromatin accessibility of signature genes enables the analysis of epigenetic control of subtype signature transcriptional networks. Complementing subtype analysis, I also compare glioma derived stem cells with neural stem cells to identify glioma specific features. Novel methods for ATAC-seq analysis are also described for the examination of chromatin accessibility.

These findings will further assist the translation of tumour subtypes to the clinic alongside deeper characterisation of glioma’s persistent and heterogeneous cancer stem cell population.

Table of contents

Table of contents
List of figures
List of tables
Nomenclature

1 Introduction
  1.1 Characterising glioma: tumours and stem cells
  1.2 An overview of glioma cellular diversity
    1.2.1 Cancer stem cells
    1.2.2 Inter and intratumoural expression variation
  1.3 Aims for this thesis

2 Methods
  2.1 Coexpression methods for cancer subtype analysis
  2.2 Transcriptomic analysis of glioma derived neural stem cells
  2.3 ATAC-seq analysis and its application to GNS and NS cells

3 Coexpression methods for cancer subtype analysis
  3.1 Introduction: The cancer subtype problem
  3.2 Results
    3.2.1 Correlation marker clustering: Coexpression analysis for the identification of independent subtypes
    3.2.2 Correlation marker clustering analysis identifies both general and cancer-specific features in tumour transcriptomes
    3.2.3 Breast ductal carcinoma expression modules allow precise identification of established tumour subtypes
    3.2.4 Coexpression module subclusters define established breast cancer subtypes and reveal their distinctive transcriptional features
    3.2.5 Glioma specific signatures identify components of intratumoural variation including a Proneural to Mesenchymal axis
  3.3 Discussion

4 Transcriptomic analysis of glioma derived neural stem cells
  4.1 Introduction
    4.1.1 Glioma stem cell culture
    4.1.2 The phenotypic diversity of glioma stem cells
    4.1.3 Characterisation of adherent GNS cells
    4.1.4 Outlook
  4.2 Results
    4.2.1 Comparing GNS to NS cells reveals glioma specific expression
    4.2.2 Verhaak et al. subtype centroid classification of GNS lines
    4.2.3 Expression modules identified in GNS lines correspond to glioma derived proneural and mesenchymal expression modules
    4.2.4 GNS proneural associated modules
    4.2.5 GNS mesenchymal associated modules
    4.2.6 Analysis of GNS proneural and mesenchymal modules in GNS-like cells
  4.3 Discussion

5 ATAC-seq analysis and its application to GNS and NS cells
  5.1 Introduction
  5.2 Results: Methods for ATAC-seq analysis
    5.2.1 An analysis of the Tn5 transposase DNA binding bias
    5.2.2 Considerations for ATAC-seq pre-processing
    5.2.3 Application of ATAC-seq for differential analysis of chromatin accessibility
  5.3 Results: Application of ATAC-seq to the characterisation of GNS and NS cells
    5.3.1 ATAC-seq GNS to NS cell line comparison
    5.3.2 ATAC-seq GNS subtype analysis reveals proneural and mesenchymal associated differences in chromatin accessibility
    5.3.3 Transcription factor motif enrichment in differentially accessible chromatin
    5.3.4 An examination of transcription factor motif footprint profiles
  5.4 Discussion

6 Discussion and Outlook
  6.1 Discussion
  6.2 Outlook and potential future work

References

7 Appendix
  7.1 Supplementary figures and tables

List of figures

3.1 Comparison of UPGMC and UPGMA linkage criteria
3.2 Centroid linkage module metrics across a range of cut off heights
3.3 Heatmap visualising coexpression modules derived from Cross-cancer, BRCA or glioma tumour data
3.4 BRCA scatterplots showing separation of PAM50 subtype samples
3.5 Independent intramodule variation identifies consensus subtypes
3.6 Basal module versus FOXC1 expression
3.7 Expression of genes that distinguish claudin-low samples
3.8 Expression of FOXA1 and ESR1 distinguish Her2-enriched samples
3.9 Glioma samples present a dominant proneural to mesenchymal axis
3.10 Expression of reduced mesenchymal and immune cell modules distinguishes mesenchymal and classical subtype samples
3.11 The proneural to mesenchymal axis can be found in other datasets
3.12 Expression of CMC modules between the tumour bulk and margin
4.1 A comparison of neurosphere vs adherent culture
4.2 CD133+ GSC neurospheres and CD133- adherent cell morphology
4.3 Heatmap illustrating genes differentially expressed between NS and GNS
4.4 Coexpression of GNS versus NS differentially expressed genes in glioma
4.5 Verhaak et al. centroid classification of glioma and GNS data
4.6 Verhaak et al. centroid genes in glioma and GNS data
4.7 Relative expression based classification of GNS data to Verhaak et al. subtypes
4.8 Variance and gene number of coexpression modules
4.9 Proneural and mesenchymal CMC modules separate GNS cells into two clusters
4.10 Principal component analysis of GNS array data
4.11 Heatmap showing expression of proneural and mesenchymal modules in GNS and NS data
4.12 Variation of genes within proneural modules
4.13 Variation of genes within mesenchymal modules
4.14 GNS proneural and mesenchymal modules in GNS-like GSCs in EGF/FGF and serum culture conditions
4.15 Serum induced mesenchymal transition in GSCs
4.16 GNS proneural and mesenchymal coexpression modules in additional datasets
5.1 Sequencing adaptor transposition into accessible chromatin via ATAC-seq