Transcriptional Characterization of Neural Stem Cells

Diva Tommei

Clare Hall College

A dissertation submitted to the University of Cambridge for the degree of Doctor of Philosophy

European Molecular Biology Laboratory
European Bioinformatics Institute
Wellcome Trust Genome Campus
[email protected]

To Claudia and Daniela.

This dissertation is my own work and includes nothing which is the outcome of work done in collaboration, except when specified in the text. It is not substantially the same as any other that I have submitted for a degree, diploma or other qualification and no part has already been, or is currently being submitted for any degree, diploma or other qualification.

This dissertation does not exceed the specified length limit of 60,000 words as defined by The Biology Degree Committee.

This dissertation has been typeset in 12pt Palatino with one-and-a-half spacing using LaTeX 2ε according to the specifications defined by the Board of Graduate Studies and the Biology Degree Committee.

Diva Tommei
March 25, 2013

Abstract

Tumours affecting the glial portion of brain parenchyma are termed gliomas and constitute the most frequent and lethal cancers affecting the central nervous system. Glioblastoma multiforme is the most aggressive glioma in adults and is classified by the World Health Organisation as a grade IV astrocytoma, characterised by widespread intra-tumoural heterogeneity. A recent advance in the study of gliomas has been the establishment of glioma-derived neural stem (GNS) cell lines that may represent the glioma cell of origin. While these cell lines show many similarities to normal neural stem (NS) cells, an important difference is their capacity to give rise to authentic glioma-like tumours when xenografted into subventricular strata of immunocompromised mice. Here I describe an in-depth characterisation of the transcriptome of GNS cells, to identify differences in gene expression between normal and glioma-derived cell lines that may underlie tumorigenesis. Analyses were carried out at the levels of gene expression, molecular signature profiling, transcript isoform detection and the quantitation of small non-coding RNAs, taking genetic alterations into account at both the karyotype and mutational level. Importantly, the cell lines studied were established from tumours with differing histology, allowing us to sample the breadth of the disease rather than focus only on the differences between unhealthy and healthy counterparts. We identified a large cohort of significantly differentially expressed genes and a smaller subset of strictly up- and down-regulated ones, including several known glioma oncogenes as well as novel candidates. An extensive glioblastoma pathway was manually curated to map the expression of our dataset onto the known and unknown glioblastoma-affected pathways. Interestingly, gene set enrichment analysis revealed a consistent up-regulation in the GNS lines of inflammatory genes belonging to the MHC class II family, suggesting an immune-evasion phenotype that has been noted in a number of early glioma studies.

Gliomas have been classified into a small number of subtypes on the basis of patient survival and response to therapy. We found that the expression signatures of GNS cell lines closely resembled the mesenchymal and proneural subtypes, as well as reflecting their known histopathological features. To characterise genes correlating with patient survival time, we tested for the association between survival time and gene expression in publicly available glioma and glioblastoma data sets and found four genes to be strongly positively correlated with patient survival time and patient age. Together these studies provide an in-depth analysis of a model of glioma pathology driven by an aberrant population of NS cells.

Finally, a package for the performance evaluation of eight leading microRNA target prediction algorithms was built using exon array and microRNA array data from the same GNS cell lines. These data were used to validate the target prediction algorithms experimentally, assessing their performance both individually and in combination. The combinatorial weight analysis allowed us to conclude that (i) tissue specificity bears a non-trivial weight in predicting what set of genes a certain microRNA regulates and, therefore, should be included in future versions of these algorithms, and that (ii) the ElMMo prediction algorithm fares better than any other combination of prediction algorithms.

Acknowledgements

On a professional note, I would like to thank all the members of my lab, starting from my supervisor Paul Bertone, who has given me the opportunity of conducting research at the University of Cambridge and has taught me over the years invaluable life lessons that I will always remember. An incredibly special thank you goes to Pär Engström, who supervised my work throughout my time as a doctoral student and has always been there for me in any matter scientific.

On a more personal note, I would like to thank my parents. Mum, thank you for the psychological, culinary, telephone, automotive and, above all, emotional support that you have given me during the years spent far from home. Without your care and affection I would never have made it to this day alive. Dad, thank you for the DNA we share, and for the constant mental, financial and philosophical support that you always give me unconditionally, like a jackhammer. Thank you for the incessant encouragement that I can always count on.

I also want to thank Sean Cheng for being there for me since March of 2009. Thanks for these past four years of exciting PhD life that you lived with me. Who knows what’s next for us. Thanks for being a great person, thanks for understanding what goes on in my mind, thanks for the love and affection that you give me every day and thanks for ruining movies by always anticipating what will happen next. I secretly love that.

Contents

Introduction
1 Glioblastoma
  1.1 Glial Cells in the Central Nervous System
  1.2 Glioblastoma Multiforme
  1.3 Primary and Secondary Glioblastomas
  1.4 Pathways Involved in Glioblastoma
  1.5 Pathway Crosstalk
2 Neurogenesis
  2.1 Radial Glia
  2.2 Neural Stem Cells
3 Brain Stem Cells
  3.1 The Cancer Stem Cell Hypothesis
  3.2 Brain Cancer Stem Cells
  3.3 Glioma Culture Systems
4 The Non-Coding RNA World
  4.1 MicroRNA regulation
  4.2 Target Prediction and Validation

Methods
5 Methods
  5.1 Tag-sequencing Data Processing
  5.2 Array Comparative Genomic Hybridization
  5.3 Differential Gene Expression
  5.4 Quantitative Real Time-PCR Validation
  5.5 Literature Mining
  5.6 Differential Isoform Expression
  5.7 Differential Long ncRNA Expression
  5.8 Glioma Expression Signatures
  5.9 External Dataset Expression Correlation
  5.10 Glioblastoma Pathway Construction
  5.11 MicroRNA Target Prediction Analysis

Results
6 Digital Transcriptome Profiling
  6.1 Clinical Data
  6.2 Tag mapping
  6.3 Copy Number Aberrations
  6.4 Core Differentially Expressed Genes
  6.5 Large-scale qRT-PCR Validation
  6.6 Literature Mining for Differentially Expressed Genes
  6.7 Isoform Differential Expression
  6.8 Long ncRNA Differential Expression
7 Dataset Correlation Analyses
  7.1 Enrichment Analysis
  7.2 Glioblastoma Expression Signatures
  7.3 Tumour Expression Correlation
  7.4 Survival Analysis
  7.5 Glioblastoma Pathway Analysis
8 MicroRNA Target Prediction Ensemble Software
  8.1 Principles
  8.2 Workflows
  8.3 Databases
  8.4 Filters
  8.5 Target Prediction Ensemble Analysis

Conclusions
9 Discussion
  9.1 Digital Profiling of GNS Cell Lines
  9.2 MicroRNA Target Prediction Analysis
  9.3 Concluding Remarks

Appendix A Differentially Expressed Genes
  A.1 Differential Expression
  A.2 Classified Differential Expression
  A.3 Quantitative RT-PCR
  A.4 Tag-seq vs qRT-PCR Correlation
Appendix B Literature Mining Script
Appendix C Long ncRNAs
Appendix D Glioblastoma Pathway
  D.1 Pathway Interactions
  D.2 Pathway Images
Appendix E Exon Array Data
Appendix F MicroRNA Array Data

List of Abbreviations
List of Figures
List of Tables
Bibliography

Introduction

Chapter 1

Glioblastoma

Contents
1.1 Glial Cells in the Central Nervous System
1.2 Glioblastoma Multiforme
1.3 Primary and Secondary Glioblastomas
1.4 Pathways Involved in Glioblastoma
1.5 Pathway Crosstalk

1.1 Glial Cells in the Central Nervous System

The Central Nervous System (CNS) is built up of neurons and special kinds of supporting cells called glial cells. Neurons are responsible for the functions that are unique to the nervous system, whereas the glial cells primarily serve the needs of the neurons. The functions of the glial cells are still not completely known. It has been established, however, that they are responsible for insulating neuronal processes and controlling the environment of neurons as well as taking part in repair processes. Importantly, glial cells are fundamental during the development of the nervous system by providing surfaces and scaffoldings for migrating neurons and outgrowing axons [10,73]. The accepted hypothesis that glial cells outnumber the 100 billion neurons present in the brain with a ratio of up to 50 glial cells per neuron [529] has recently been challenged and scaled down to a ratio that is closer to one glial cell per neuron [33,73]. Independently of the neuronal and glial counts, the importance of glial cells remains unquestioned and their functional roles continue to expand away from the conservative notion of a support structure for neurons [73,173,516]. In fact, although glial cells do not send precise signals over long distances, they can produce brief electric currents by opening the membrane channels for Ca2+ and producing "calcium signals". These signals can spread rapidly and thus influence many neurons almost simultaneously. Also, since neurotransmitter release depends on extracellular Ca2+ concentrations, glial cells can contribute to the coordination of synaptic activity [10,17,73,516]. Finally, glial activation and inflammation have been implicated in neurodegenerative diseases such as Alzheimer’s disease, Parkinson’s disease and multiple sclerosis [269].

Glial cells are usually divided into three categories: astrocytes, oligodendrocytes and microglial cells, each carrying functional as well as structural differences. Whilst astrocytes have many processes of various shapes and appear to serve a homeostatic function, oligodendrocytes tend to have fewer and shorter processes, their main function being the production of myelin sheaths for axon insulation. Microglial cells are much smaller than other glial cells and serve functions in the brain similar to those that macrophages¹ serve in the rest of the body [73].

In addition to the three main kinds described above, there are other specialised forms of glial cells. Schwann cells are present only in the peripheral nervous system and form myelin sheaths that influence axonal thickness, axonal transport and neurofilament content [73]. Ependymal cells are epithelial cylindrical cells that line the surface of the ventricles² of the CNS and are responsible for the production, transport and absorption of cerebrospinal fluid (CSF). These cells have recently become candidates for the location of the neural stem cell niche [81,328]. Müller cells are glial cells that reside in the vertebrate retina and have recently been observed to undergo dedifferentiation in vitro into multi-potent progenitor cells in fish and mouse, to then differentiate into a number of retinal cell types, although in vivo evidence is not conclusive [144]. They have also been shown to act as a light collector in the mammalian eye [53,298]. Bergmann cells are astrocytes that reside in the cerebellum and are responsible for the migration, dendritogenesis, synaptogenesis and maturation of Purkinje neurons³ [534]. Finally, pituicytes are glial cells that reside in the posterior of the pituitary gland, where they participate in the control of secretory events [434].

¹ Macrophages are white blood cells that protect the body from harmful bacteria, particles and dying cells by ingesting them in a process called phagocytosis.
² The ventricular system of the brain is composed of four communicating ventricles filled with cerebrospinal fluid that bathes and cushions the brain and communicates with the central canal of the spinal cord.
³ Purkinje cells are very large neurons in the cerebellum that release the inhibitory neurotransmitter gamma-aminobutyric acid (GABA) and are responsible for transmission of all motor coordination signals.


The numerous short and long processes of astrocytes amount to a very large exposed surface that makes these glial cells well suited for efficient exchange of molecules and ions. This fact, together with the intimate contacts that astrocytes establish with neurons, capillaries and the CSF, puts them in a unique position to control the environment in which neurons function [394]. Neuronal homeostasis is crucial because neurons are extremely sensitive to changes in the concentrations of ions and neurotransmitters, and because the very limited extracellular space⁴ means that even very small additional amounts of a substance produce a significant change in concentration. Extracellular neurotransmitter concentration must be kept very low at all times in order for synaptic release to generate a significant change in neurotransmitter concentration [73].

Astrocytes also contribute to the regulation of extracellular pH by removing CO2 [136] and help control extracellular osmotic pressure by maintaining the water balance of the brain [20]. Mechanisms for the control of extracellular osmolarity may involve the exchange of small neutral molecules such as taurine [384] and special channels for the transport of water, called aquaporins, that are present on the membrane of astrocytic processes in contact with capillaries [540,545]. By surrounding capillaries with their processes and forming very extensive tight junctions⁵ between the endothelial cells, astrocytes decrease the permeability of brain capillaries and help establish a fully functional blood-brain barrier⁶ [172,279,438].

Since their discovery, oligodendrocytes have been classified in many different ways due to their inherent morphological heterogeneity, including variable number of processes, thickness of myelin sheath, density of cytoplasm and clumping of nuclear chromatin [44]. The function of most oligodendrocytes is to produce myelin sheaths that wrap around axons in order to insulate them and speed up conduction of the nerve impulse [73]. This contributes to the much higher conduction speeds observed in myelinated axons versus unmyelinated ones, with the thickest myelinated axons conducting at about 120 metres per second and the slowest unmyelinated ones at 1 metre per second [44,73]. A myelin sheath consists almost exclusively of numerous layers of oligodendrocyte or Schwann cell membrane (lamellae) that, in the process of wrapping around the axon, squeeze away the cytoplasm, thus allowing the membrane layers to lie closely apposed [73]. Although the structure and function of the myelin sheath produced by oligodendrocytes and Schwann cells are identical, one oligodendrocyte can extend its processes to 50 axons, whilst each Schwann cell can only wrap its processes around one axon [44,413]. Recent observations have suggested that oligodendrocytes and Schwann cells are also responsible for the long-term functional integrity of axons [356]. Furthermore, there exists a pool of oligodendrocytes named "satellite" that are found next to neuronal cell bodies, where they cannot serve an insulating role; it is suggested instead that they regulate neuronal homeostasis in much the same way as astrocytes do [44].

⁴ Less than 20% of the total brain volume.
⁵ Type of junctional complexes present only in vertebrates that seal together the membranes of two cells, effectively preventing passage of water and ions.
⁶ Unlike most organs in our body, the composition of the extracellular fluid of the brain differs from that in the blood plasma.

1.2 Glioblastoma Multiforme⁷

Primary brain tumours comprise a wide range of histological and pathological entities, each with a distinct natural history [77,128]. For simplicity, CNS tumours are classified as gliomas or non-gliomas. Gliomas affect the glial portion of brain parenchyma and they are the most frequent and lethal cancers affecting the CNS [383]. Diagnosis of gliomas is heavily based on the predominantly affected cell type, which in turn is indicative of prognosis [128] (Table 1.1). Gliomas of astrocytic, oligodendroglial, oligoastrocytic⁸ and ependymal origin account for more than 70% of all brain tumours [383], with glioblastoma multiforme (GBM) being the most frequent (65%) and malignant of all adult gliomas [154,300,365]. Grade I tumours are well circumscribed and curative surgical resections may be attempted. However, most gliomas are characterised by diffuse infiltration of brain parenchyma, making surgical extirpation impossible [77]. Some lower-grade astrocytic tumours change their identity over time and turn into more aggressive forms. Depending on the histology of the tumour, patients undergo surgery, chemotherapy, radiotherapy or a combination of these treatments, although the chances of cure are very low [128].

⁷ The term glioblastoma is used synonymously with "glioblastoma multiforme".
⁸ Oligoastrocytic tumours are composed of roughly equivalent amounts of astrocyte-like and oligodendrocyte-like cells.

Table 1.1: Histological Types and Prognosis of Gliomas (y, years). Taken from Doyle et al 2005 [128].

                                                                                                                                        Survival rates (%)
Tumour cell type          Tumour                                WHO grade  Relative incidence                                            1y    2y    3y
Astrocytic tumours        Pilocytic astrocytoma                 I          10% of cerebral astrocytomas; most common in children         95.7  94.3  91.3
Astrocytic tumours        Diffuse astrocytoma                   II         10-15% of astrocytic tumours                                  73.9  61.8  46.9
Astrocytic tumours        Anaplastic astrocytoma                III        -                                                             60.3  44.0  29.4
Astrocytic tumours        Glioblastoma multiforme               IV         50-60% of astrocytomas; 12-15% of all intracranial neoplasms  29.3  8.7   3.3
Oligodendroglial tumours  Oligodendroglioma                     II         -                                                             89.7  83.4  70.5
Ependymal tumours         Ependymoma or anaplastic ependymoma   II-III     -                                                             87.6  81.4  70.6

In 1979 the World Health Organisation (WHO) published the first edition of Histological Typing of Tumours of the Central Nervous System with the aim of establishing a comprehensive four-tiered malignancy⁹ grading guideline for the evaluation of brain tumour progression. The WHO system is based on the appearance of specific characteristics, such as atypia¹⁰, mitosis, endothelial proliferation and necrosis. By reflecting the malignant potential of the tumour, these features help guide the choice of therapies [301]. The four grades are detailed as follows:

· Grade I lesions are considered low grade and include lesions with low proliferative potential that are often cured through surgical resection alone;

· Grade II lesions are also considered low grade, but they are infiltrative in nature and often recur despite surgical resection;

· Grade III lesions carry histological evidence of malignancy, such as atypia and fast mitotic activity, and are considered intermediate to high grade lesions;

· Grade IV lesions are considered high grade and are cytologically malignant, mitotically active, necrosis-prone neoplasms often associated with rapid pre- and post-operative disease evolution and a fatal outcome [154,300,301].

The WHO classifies glioblastoma as a grade IV astrocytoma, accounting for approximately 12-15% of all intracranial neoplasms and 60-75% of astrocytic tumours [154,301,383]. As the moniker "multiforme" implies, GBM is characterised by a widespread intratumoural heterogeneity that makes it extremely hard to understand and treat [154,383]. The histopathological features of GBM include nuclear atypia, cellular pleomorphism, mitotic activity, vascular thrombosis, microvascular proliferation and necrosis [154,301]. This complexity, combined with a putative cancer stem cell subpopulation and an incomplete atlas of epigenetic and genetic lesions, has contributed to making this cancer one of the most difficult to understand and to treat [154].

⁹ Medical term used to describe the state of tumours that are resistant to therapy, spread rapidly and have a destructive clinical course.
¹⁰ Term that indicates a general cellular abnormality.

The average life expectancy of a patient with glioblastoma lies between several weeks and several months after postoperative radiotherapy, with a more protracted course when temozolomide is given in addition to radiotherapy (Fig 1.1). Median survival is generally less than one year from the time of diagnosis and most patients die within two years; most long-term survivors were initially given an incorrect histological diagnosis [128,472]. Unless the neoplasm has developed from a lower-grade astrocytoma, the clinical history is shorter than 3 months in more than 50% of cases. In most European and North American countries, the incidence lies in the range of 3-4 new cases per 100,000 population per year, preferentially affecting adults and with a slightly higher incidence in men than in women [128,301].

Figure 1.1: Estimates of survival amongst GBM patients treated with radiotherapy alone or radiotherapy with the alkylating agent temozolomide. Taken from Stupp et al 2005 [472].

Although infiltrative spread is a common feature of all diffuse astrocytic tumours, glioblastoma is particularly notorious for its rapid invasion of the neighbouring brain structures [154]. Invading cells reside outside the contrast-enhancing rim of the tumour, thereby escaping surgical resection and evading radiotherapy. However, the generation of metastases outside of the CNS remains very rare because the subarachnoidal space¹¹ and CSF tend to remain unaffected [301]. The transforming growth factor β (TGFβ) and Akt signaling pathways have been reported to act as molecular mediators of glioblastoma invasion [245,527], and invasion may also be activated by hypoxia through the hypoxia-inducible factor HIF1¹² [219]. An important aspect of glioblastoma invasion is the production of a thick extracellular matrix to support migration and that of proteolytic enzymes to enhance invasion across this matrix [301]. The current standard of care for glioma patients involves resection of the tumour followed by extensive radiation therapy and chemotherapy with the alkylating agent temozolomide (or, in the United States, the nitrosourea drug carmustine), which grants the patient a median survival of 15 months [472].

¹¹ Interval between two membranes that protect the brain, the arachnoid membrane and the pia mater.
¹² Transcription factors that respond to decreased availability of oxygen in the cellular environment and are highly conserved transcriptional complexes of heterodimers constituted by an α and a β subunit.

Methods and Technologies in Cancer Genomics. The field of cancer genomics has been evolving at a very fast pace over the past decade, bringing in gargantuan amounts of fresh data to analyse from increasingly efficient technology platforms. A brief overview of the methods and technologies referred to in the rest of this chapter is given below. It should be noted that this overview is by no means a comprehensive representation of all the technologies currently deployed in the field of cancer genomics.

· Expression profiling determines the expression level of transcripts within a cell using platforms that can be divided into array and non-array. In the case of microarrays, the expression level is measured through probe-transcript interaction for those transcripts that are represented on the array. A priori knowledge of what sequences to measure is necessary in terms of the type of transcript (e.g. mRNAs), the specific transcripts of interest for the correct probe design, and the regions within that transcript (e.g. UTRs, single nucleotide polymorphisms (SNPs), etc.), making this platform inadequate for novel gene discovery. Unlike microarrays, RNA sequencing measures expression levels by averaging the number of sequenced "reads", fragments of up to several hundred base pairs, depending on the technology used, collected along the entire length of the transcript. Potentially any RNA population can be selected for sequencing, but the mRNA and small non-coding RNA populations are the most commonly measured. Due to its probe-free nature, sequencing is an appropriate platform for novel gene discovery.

· Epigenetic profiling measures the amount of DNA methylation that occurs at CpG islands in promoters using different types of assays on array and non-array platforms. In many high-throughput profiling assays, genomic DNA is treated with a bisulfite conversion kit that converts unmethylated cytosine residues to uracil and leaves methylated cytosine residues unaffected. This treatment yields single nucleotide resolution information about the methylation status of a segment of DNA, and various analyses, depending on the platform used, can be performed to retrieve this information.

· Exome sequencing is a selective method of DNA and RNA sequencing in which samples are enriched for the coding regions of the genome, or the "exome", and are subsequently sequenced. Targeted enrichment of the exome may be performed by capture, using hybridization to microarrays with probe sequences defined, for example, by the National Center for Biotechnology Information (NCBI) Consensus Coding Sequence (CCDS) database [358], or by amplicon-based PCR amplification employing sequencing adaptors to amplify specific loci. As coding regions constitute approximately 1% of the human genome [98], exome sequencing is a potentially efficient strategy for the identification of rare functional mutations, and is therefore clinically relevant for assessing the role of sequence variation in genetic disorders [56,98,358]. Although methods of enriching targeted genomic segments by hybridization have historically been limited by the large amounts of genomic DNA required and the modest throughput of the coupled sequencing platforms, advances in hybridization specificity and sequencing technology have since reduced the costs and increased the coverage of the exome sequencing process. However, cost effectiveness and completeness of the information obtained are still key considerations [98].

· Array comparative genomic hybridization is a method for the detection of copy number aberrations that uses an array platform containing thousands of defined DNA probes that are hybridised to a sample mixture containing DNA from tumour cells and DNA from a normal control, each labeled with a different fluorescent dye. Abnormal regions in the genome are then detected by calculating the ratio of the fluorescence intensity of the hybridised sample to that of the reference DNA.
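The intensity-ratio step described for array comparative genomic hybridization can be illustrated with a minimal sketch. It assumes per-probe fluorescence intensities have already been extracted and normalised; the function names, the example intensities and the ±0.3 log2 threshold are hypothetical choices made for illustration only and are not taken from the analyses in this dissertation.

```python
import math


def log2_ratios(tumour, reference):
    """Return the per-probe log2(tumour / reference) intensity ratio."""
    if len(tumour) != len(reference):
        raise ValueError("tumour and reference must have the same number of probes")
    return [math.log2(t / r) for t, r in zip(tumour, reference)]


def call_probe(log_ratio, threshold=0.3):
    """Label a probe as gain, loss or neutral.

    The threshold is an illustrative assumption, not a recommended value.
    """
    if log_ratio >= threshold:
        return "gain"
    if log_ratio <= -threshold:
        return "loss"
    return "neutral"


# Hypothetical intensities for three probes (tumour dye vs reference dye).
tumour = [200.0, 100.0, 50.0]
reference = [100.0, 100.0, 100.0]

ratios = log2_ratios(tumour, reference)
calls = [call_probe(r) for r in ratios]

for probe, (ratio, call) in enumerate(zip(ratios, calls), start=1):
    print(f"probe {probe}: log2 ratio = {ratio:+.2f} ({call})")
```

A log2 scale is commonly used because it makes gains and losses symmetric around zero: a doubling of signal gives +1 and a halving gives -1. In practice, calls are usually made on segments of neighbouring probes rather than on single probes.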

1.3 Primary and Secondary Glioblastomas

High-grade malignant gliomas are uniformly fatal despite the therapeutic aggressiveness with which they are treated. As already mentioned, a classical feature of these tumours is their widespread morphological and lineage heterogeneity. This plasticity is especially appreciable in gliomas presenting both astrocytic and oligodendroglial histopathological features, the basis of which, however, remains unknown [549].

Glioblastomas have been classified into two subtypes, primary (or de novo) and secondary glioblastomas [154,301,383,549]. Primary glioblastomas represent the majority (more than 90%) of diagnosed cases and they develop very rapidly without clinical or histopathological evidence of a pre-existing, less malignant precursor lesion [154,301,334,450]. Secondary glioblastomas, on the other hand, have a history of malignant progression from a lower-grade tumour such as diffuse astrocytoma (WHO grade II) or anaplastic¹³ astrocytoma (WHO grade III), with more than 70% of WHO grade II gliomas transforming into WHO grade III/IV diseases within five to ten years of diagnosis [154]. Only 5% of glioblastomas are classified as secondary and they tend to affect a younger cohort of patients (average age 45 years) compared to primary glioblastomas (average age 60 years) [154,301,383]. Interestingly, despite their distinct clinical histories, primary and secondary glioblastomas are morphologically and clinically indistinguishable and their prognoses are equally poor after age-adjusted analysis [154,383,390]. Presently, the tools used to study glioblastoma are primary tissue, genetically engineered mouse models (GEMMs) and glioma cell lines [308].

¹³ A cancer that is very poorly differentiated is called anaplastic.

Primary Tissue. The genetic events involved in the initiation and progression of glioblastoma are still unknown because of the limited availability of early-stage neoplastic tissue [308]. A number of studies have focused on large-scale sequencing of glioblastoma samples from different patients in an attempt to address this problem. In the 2008 study by Parsons et al [383], mutational data obtained from the exome sequencing of 22 human glioblastomas were analysed for copy number aberrations (CNAs) and integrated to identify glioblastoma candidate driver genes, i.e. genes that carry mutations providing a selective advantage to the tumour cell. Interestingly, in 12% of the glioblastoma patients, mutations were found in the active site of the isocitrate dehydrogenase 1 (IDH1) gene on the long arm of chromosome 2, a gene never previously associated with glioblastoma. IDH1 encodes an isocitrate dehydrogenase, which catalyses the oxidative decarboxylation of isocitrate to α-ketoglutarate, generating nicotinamide adenine dinucleotide phosphate (NADPH), a coenzyme used as a reducing agent in anabolic¹⁴ biosynthetic reactions [225]. Five isocitrate dehydrogenase genes exist in humans; three are localised in the mitochondria, while IDH1 is localised within the cytoplasm and peroxisomes. The function of IDH1 is to help relieve cellular stress from oxidative damage through the generation of NADPH. Mutations in IDH1 were observed preferentially in younger glioblastoma patients, on average 33 years of age, as opposed to wild-type carriers, on average 53 years of age, and most of them were found in patients with secondary glioblastomas. These patients had a longer median survival time of 3.8 years, compared to 1.1 years for patients with wild-type IDH1. A similar pattern was also observed in the subgroup of young patients with Tumour protein 53 (TP53) mutations. All mutations of IDH1 resulted in an activating amino acid substitution at an evolutionarily conserved residue located within the enzyme’s active site, reminiscent of known activating alterations in oncogenes such as BRAF, KRAS and PIK3CA [383].

¹⁴ An anabolic reaction, as opposed to a catabolic one, is a metabolic pathway that constructs molecules from smaller units and requires energy.

The study by Watanabe et al [522] carried these results further by finding a total of 130 IDH1 mutations involving amino acid 132 in 321 gliomas. These mutations specifically affected 88% of the low-grade diffuse astrocytomas, 82% of the secondary glioblastomas that developed through progression from low-grade diffuse or anaplastic astrocytoma, 79% of the oligodendrogliomas and 94% of the oligoastrocytomas. Interestingly, analyses of multiple biopsies from the same patient showed that an IDH1 mutation never occurred after the acquisition of a TP53 mutation, suggesting that IDH1 mutation is a very early event in gliomagenesis that may affect a common glial precursor cell population. IDH1 mutations were co-present with TP53 mutations in 63% of low-grade diffuse astrocytomas, whereas IDH1 mutations were found in only 10% of pilocytic astrocytomas, 5% of primary glioblastomas and none of the ependymomas. The frequent presence of IDH1 mutations in secondary glioblastomas and their near-complete absence in primary glioblastomas reinforces the concept that, despite their histological similarities, these subtypes are genetically and clinically distinct entities [522].

In an even larger study by Yan et al [535], the sequences of the IDH1 and closely related IDH2 genes were determined in 445 tumours of the CNS and 494 tumours that did not affect the CNS. In corroboration of the Parsons et al [383] and Watanabe et al [522] studies, mutations in the IDH1 gene were found in more than 70% of WHO grade II and III astrocytomas and oligodendrogliomas, as well as in the glioblastomas that developed from these lower-grade lesions. Each of these mutations affected amino acid 132 and reduced the enzymatic activity of the encoded protein [535]. Interestingly, tumours that did not carry mutations in the IDH1 gene often had mutations affecting the analogous amino acid (R172) of the closely related IDH2 gene that also reduced the enzymatic activity of the encoded protein, suggesting a form of functional redundancy between the two genes. Similarly to the results of Parsons et al [383], the tumours carrying IDH1 or IDH2 mutations showed distinctive genetic and clinical characteristics that resulted in better outcomes for those patients than for patients carrying wild-type IDH genes [535].

In a study by Zhao et al [548] the functional impact of the IDH1 mutation was assessed in cultured glioma cells. Using the human cytosolic IDH1 crystal structure reported by Xu et al in 2004 [532], this study showed that the tumour-derived IDH1 mutation impaired the affinity of the enzyme for its substrate and led to the formation of catalytically inactive heterodimers. Interestingly, when the expression of a mutant IDH1 was forced in cultured glioma cells, the formation of α-ketoglutarate was greatly reduced but levels of the subunit α of hypoxia-inducible factor 1 (HIF1A) were greatly increased. In fact, the transcription factor HIF1A is regulated by the product of the reaction catalysed by IDH1, α-ketoglutarate, and as a result the IDH1-mutated human glioma cultures displayed higher levels of HIF1A than wild-type IDH1 glioma cultures. This indicated that IDH1 may function as a tumour suppressor and, when inactivated by mutation, may contribute to tumourigenesis through induction of the HIF1 pathway [548]. In summary, detecting an IDH1 mutation in a patient has the clinical potential, for a subpopulation of mostly secondary and a few primary glioblastoma patients, to predict a protracted clinical course [383]. New treatments could be designed to take advantage of IDH1 alterations in these patients, especially since the inhibition of the IDH2 enzyme has recently been shown to increase the sensitivity of tumour cells to chemotherapeutic agents [225].

The glioblastoma cancer genome was the first to be characterised in the concerted efforts of the Cancer Genome Atlas project (TCGA) [326]. The aim was to "catalogue and discover major cancer-causing genome alterations in large cohorts of human tumours through integrated multi-dimensional analyses". The pilot project screened a total of 587 samples down to 206, which were used to conduct genome-wide analysis of DNA copy number and gene expression, and DNA methylation screening on a total of 2,305 assayed genes. Of the 206 chosen biospecimens, 21 were post-treatment glioblastoma cases and the remaining 185 represented predominantly primary glioblastomas. A total of 91 samples and 601 genes, inclusive of 7,932 exons, were chosen from the 206-sample pool for re-sequencing towards mutational analysis. Although the method of biospecimen selection ensured high-quality data, the stringency of the selection criteria may have introduced a degree of bias, since small samples and samples with high levels of necrosis were excluded [326].

In the TCGA study, a statistical analysis of mutation significance in the 91 matched glioblastoma-normal pairs selected for detection of somatic mutations in 601 selected genes found eight genes to be significantly mutated: TP53, PTEN, NF1, EGFR, ERBB2, RB1, PIK3R1 and PIK3CA. All the mutations involving TP53 were clustered in its DNA-binding domain, a known hotspot for TP53 mutations in human cancers. TP53 is an important transcription factor and tumour suppressor involved in most cell survival pathways and is for this reason often referred to as "the guardian of the genome" [133]. Given that 27 of the 72 untreated samples and 11 of the 19 treated samples harboured TP53 mutations, and given that most of the 91 samples were primary glioblastomas, one can conclude that TP53 mutation is a common event in primary glioblastoma [326]. Neurofibromin 1 (NF1) is a tumour suppressor that, when mutated in humans, is responsible for the development of neurofibromatosis type I and is associated with an increased risk of optic gliomas, astrocytomas and glioblastomas [35]. The protein encoded by NF1 is a negative regulator of the rat sarcoma (RAS) signaling pathway of small guanosine-5’-triphosphatases (GTPases) that activate the mitogen-activated protein kinase (MAPK) signaling cascade amongst other pathways [496]. Overall, at least 47 of the 206 samples harboured somatic NF1 inactivating mutations or deletions, confirming the relevance of this gene in sporadic human glioblastoma [326].
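The sample counts quoted above can be turned into frequencies with a few lines of arithmetic. The following is a minimal sketch in Python; the function name and the dictionary labels are illustrative assumptions, and only the counts themselves come from the text [326].

```python
def mutation_frequency(mutated: int, total: int) -> float:
    """Return the percentage of samples that carry a mutation."""
    if total <= 0 or not 0 <= mutated <= total:
        raise ValueError("need total > 0 and 0 <= mutated <= total")
    return 100.0 * mutated / total


# Counts as quoted in the text for the TCGA glioblastoma cohort [326].
# The labels are hypothetical and used only for printing.
TCGA_COUNTS = {
    "TP53, untreated": (27, 72),
    "TP53, treated": (11, 19),
    "TP53, all re-sequenced": (27 + 11, 72 + 19),
    "NF1, at least": (47, 206),
}

for label, (mutated, total) in TCGA_COUNTS.items():
    freq = mutation_frequency(mutated, total)
    print(f"{label}: {mutated}/{total} = {freq:.1f}%")
```

With these counts, TP53 is mutated in 37.5% of the untreated and roughly 57.9% of the treated re-sequenced samples, and NF1 is altered in roughly 22.8% of the 206 samples.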

Furthermore, within the TCGA dataset it was observed that mutation rates differed markedly between untreated and treated glioblastomas, averaging 1.4 and 5.8 somatic silent mutations per sample, respectively. The higher average in the treated samples was mostly due to the contribution of a cohort of seven hypermutated tumours treated with temozolomide or lomustine. The hypermutator phenotypes previously described in glioblastoma [79,204] were known to carry mutations in MutS homolog 6 (MSH6), a component of the post-replicative DNA mismatch repair (MMR) system. MSH6 heterodimerizes with MSH2 to form MutSα, which binds to DNA mismatches, thereby initiating DNA repair [6]. An analysis of the genes involved in mismatch repair within the TCGA dataset uncovered that six out of the seven hypermutated samples harboured mutations in at least one of the mismatch repair genes MLH1, MSH2, MSH6 or PMS2 [326].

Recurrent focal alterations found in the TCGA samples that had already been described and are common in glioblastoma are the amplification of Epidermal growth factor receptor (EGFR), the cyclin-dependent kinases (CDK) CDK4 and CDK6, PDGFRA, MDM2, MDM4, MET, MYCN, CCND2 and PIK3CA [104,255,262,292,374,423,429]. Interestingly, uncommon focal alterations were also found, such as the amplification of the serine/threonine protein kinase AKT3 and the homozygous deletions of NF1 and PARK2. V-akt murine thymoma viral oncogene homolog 3 (AKT3) belongs to the AKT family of serine/threonine kinases together with AKT1 and AKT2. The three AKT kinases are now known to represent central nodes in a variety of signaling cascades that regulate normal cellular processes such as cell size and growth, proliferation, survival, glucose metabolism, genome stability, and neo-vascularization. It is currently less clear, however, whether AKT1, AKT2, and AKT3 are functionally redundant or whether each carries out a specific functional role [45,46]. Parkinson protein 2 (PARK2) encodes a component of an E3 ubiquitin ligase complex that mediates the targeting of substrate proteins for proteasomal degradation. The functions carried out by PARK2 are currently unknown, but mutations in this gene are known to cause a familial form of Parkinson’s disease known as autosomal recessive juvenile Parkinson disease [229]. The most significant loss-of-heterozygosity (LOH15) event identified in the TCGA dataset was observed on the short arm of chromosome 17, where the TP53 gene resides [326].

Cytosine-phosphate-Guanine (CpG16) islands are regions of DNA in which a cytosine is linked to a guanine via a phosphodiester bond to form the dinucleotide CpG, which is repeated many times along a linear sequence. In formal definitions, a CpG island is a sequence of at least 200bp with a GC content above 50% and an observed/expected CpG ratio above 0.6 [102,159]. Methylation of carbon 5 of cytosine is a form of epigenetic modification that affects regulation of gene expression via non-sequence based interactions. Such methylated cytosines are present in the coding regions of mammalian genes and, over evolutionary time, spontaneously deaminate to become thymines [102,287]. In mammals, 70% to 80% of CpG cytosines are methylated [208]. Conversely, CpG islands in promoters tend to be unmethylated when genes are expressed, suggesting that methylation is an inhibiting event for gene expression [142,214]. Methylation of CpG sites within promoters has been observed as a mechanism of tumour suppressor gene silencing in a number of human cancers. In contrast, the hypomethylation of CpG sites has been associated with the over-expression of oncogenes within cancer cells [214].

Evaluation of methylation was conducted in the TCGA project across 91 glioblastoma samples. A pattern emerged between the methylation of the O-6-methylguanine-DNA methyltransferase (MGMT) promoter and the substitution spectrum of treated samples. MGMT is a DNA repair enzyme that repairs damaged alkylated guanine residues, and its promoter methylation status has already been associated with sensitivity to the alkylating agent temozolomide, which is the current standard of care for glioblastoma patients [79,204]. Amongst the 13 samples treated with alkylating agent that did not show MGMT methylation, the validated somatic mutations from GC to AT, caused by the spontaneous deamination of methylated cytosines to thymines, occurred in comparable amounts between CpG and non-CpG dinucleotides. However, in the six treated samples with MGMT methylation, the GC to AT transitions were found mostly in non-CpG dinucleotides. This pattern is consistent with a failure to repair the alkylated guanine residues caused by treatment: MGMT methylation shifts the mutation spectrum of treated samples towards a preponderance of GC to AT transitions at non-CpG sites [326].

The molecular mechanisms behind this pattern may be explained by the observation that the mutation spectra of mismatch repair genes reflected MGMT promoter methylation as well. In fact, in treated hypermutated samples with methylated MGMT, mismatch repair genes accumulated GC to AT transitions at non-CpG sites, which was not observed in any of the hypermutated tumours with unmethylated MGMT. Thus, mismatch repair deficiency and MGMT methylation status together could have powerful clinical implications in the context of treatment, raising the possibility that patients who initially respond to the frontline therapy in use today may evolve not only treatment resistance, but also an MMR-defective hypermutator phenotype. The fact that newly diagnosed glioblastomas with methylated MGMT respond well to treatment with alkylating agents is in part due to the initiation of many mismatch repair cycles that attempt to repair the alkylated guanines and in doing so lead to cell death, which is consistent with the observation that the mismatch repair genes themselves are mutated with GC to AT transitions at non-CpG sites [326]. Therefore, initial methylation of MGMT in this scenario would have an effect on two fronts: shifting the mutation spectrum that will affect mutations at mismatch repair genes, and increasing the selective pressure to lose mismatch repair function, resulting in aggressive recurrent tumours with a hypermutator phenotype [363]. These findings highlight the importance of designing selective strategies that target mismatch-repair-deficient cells in combination with alkylating agents, in order to prevent or minimise the emergence of treatment resistance [326].

15 The loss of normal function of one allele of a gene in which the other allele was already inactivated. In the context of oncogenesis it refers to when the remaining functional allele in a somatic cell of the offspring becomes inactivated by mutation.

16 The "CpG" notation is used to distinguish this linear dinucleotide from the CG base-pairing of cytosine and guanine.
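The formal CpG island criteria mentioned in this section can be made concrete with a short calculation. The sketch below is in Python and assumes the commonly used thresholds (a length of at least 200 bp, a GC content above 50% and an observed/expected CpG ratio above 0.6), where the ratio is computed as (number of CpG × sequence length) / (number of C × number of G). The function names and default parameters are illustrative assumptions and are not taken from the cited studies.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the true CpG count.
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three assumed criteria to one candidate sequence."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


print(looks_like_cpg_island("CG" * 100))   # CpG-dense, 200 bp
print(looks_like_cpg_island("AT" * 100))   # no G or C at all
```

In practice these criteria are evaluated over sliding windows along a chromosome rather than on a single sequence; the single-window form is kept here for brevity.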

Genetically Engineered Mouse Models

Murine gliomas that appear to develop in the absence of lower-grade precursors are very important disease models because they reproduce de novo conditions for the onset of glioblastoma. Genetically engineered mouse models (GEMMs) are useful research tools towards that end because they can accurately reproduce the initiation and progression stages of the human pathology upon introduction of a few mutations, although it is still debated whether they can accurately recreate the genomic and expression heterogeneity of the original human disease [308]. In some cases these mouse models helped predict the importance in human gliomas of events such as TP53 and NF1 inactivation [426,549]. Thus, GEMMs have been instrumental so far in the molecular understanding of the causes of human gliomagenesis. Mutations in Phosphatase and tensin homolog (PTEN), TP53 and Retinoblastoma 1 (RB1) have all been tested in mouse models expressing Cre specifically in the brain, and their phenotypical effects have all been cell-type as well as developmental-stage specific [100]. The use of the Cre-lox system allows specific recombination events to occur in genomic DNA. The Cre protein is a site-specific DNA recombinase that catalyses the recombination of DNA between specific loxP sites, each of which contains a directional core sequence. When cells express Cre, the DNA is cut at both loxP sites, and the result of the recombination depends on whether the lox sites are located on the same chromosome and whether they are arranged as inverted or direct repeats. Inverted lox sites on the same chromosome produce an inversion, while direct repeats cause a deletion. Lox sites on different chromosomes may cause translocation events [371].
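The dependence of the Cre-lox outcome on the arrangement of the two loxP sites can be summarised as a small decision rule. The sketch below is in Python and is a deliberate simplification: the function name and the string labels are assumptions made for illustration, and it encodes the standard textbook outcomes, in which inverted repeats on the same chromosome invert the intervening segment, direct repeats excise it, and sites on different chromosomes can give a translocation.

```python
def cre_lox_outcome(same_chromosome: bool, orientation: str) -> str:
    """Predict the rearrangement produced by Cre acting on two loxP sites.

    orientation is "direct" or "inverted" and describes the relative
    direction of the two loxP core sequences.
    """
    if orientation not in ("direct", "inverted"):
        raise ValueError("orientation must be 'direct' or 'inverted'")
    if not same_chromosome:
        # Simplification: orientation is ignored for sites in trans.
        return "translocation"
    if orientation == "direct":
        return "deletion"   # the floxed segment is excised
    return "inversion"      # the floxed segment is flipped in place


for same, orient in [(True, "direct"), (True, "inverted"), (False, "direct")]:
    print(same, orient, "->", cre_lox_outcome(same, orient))
```

A conditional knockout, for example, corresponds to the first case: a gene segment flanked by direct-repeat loxP sites is deleted only in cells that express Cre.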

Glioma Cell Lines

Cancer cell lines have been the historical standard both for exploring the biology of human tumours and as preclinical models for screening of potential therapeutic agents [261]. Due to the inherent difficulty in establishing and maintaining primary tumour cell cultures, established cell lines have traditionally been used to characterize the genomic aberrations identified in primary tumours [275]. However, it is possible that the genetic aberrations accumulated by repeatedly passaging the cells in vitro may cause their phenotypic characteristics to bear little resemblance to the primary human tumour [66,261]. The laboratory of Howard Fine has attempted a systematic genomic survey of five of the most commonly used glioma cell lines (A172, Hs683, T98G, U251, and U87) for the purpose of evaluating their similarity to primary gliomas [275]. Their research, conducted with high-density Single Nucleotide Polymorphism (SNP) arrays, showed that established glioma cell lines and primary tumours have significant differences in both genomic alterations and gene expression, indicating that glioma cell lines may not be an accurate representation or model system for primary gliomas [275]. The differences observed in the biological phenotype of glioma cell lines compared with primary tumours are further confounded by the lack of molecular hallmarks in serially passaged cell lines, such as the over-expression of EGFR, the silencing of cyclin-dependent kinase inhibitor (CDKN) CDKN2A and the loss of PTEN [207,445]. This explains why in vitro and in vivo cancer cell line-based preclinical therapeutic screening models have been poorly predictive of useful therapeutic agents and may have led to important misinterpretations of the relevance of aberrant signaling pathways within cell lines compared to primary tumours [261].

Nonetheless, cell lines do not contain the typical mixture of genetically distinct cells found in primary tumours and can therefore be more easily characterised. In fact, the problematic presence of non-tumour cells in primary tumours makes it harder to pinpoint the rare mutations that are not spread uniformly throughout the tumour [275]. On the basis of this claim, cancer cell line sequencing efforts have recently flourished, including work on the commonly studied grade IV glioma cell line U87MG [101].

With these pros and cons in mind, Lee et al [261] went on to search for a more biologically relevant model system for exploring glioma biology and for the screening of new therapeutic targets, which they found in neural stem (NS) cells. These cells have characteristics of continuous self-renewal, extensive migration and infiltration of brain parenchyma and the potential for full or partial differentiation, which are lost in glioma cell lines [261,286,347] and will be discussed in greater detail in chapter three (see Section 3.3).

Classification Systems

Several decades of experimentation on glioblastoma have highlighted that specific genetic lesions are more commonly observed in certain subclasses of glioblastoma. Primary glioblastoma typically harbours mutations in the EGFR receptor tyrosine kinase gene, the tumour suppressor PTEN and the cyclin-dependent kinase inhibitor CDKN2A, while secondary glioblastoma harbours mutations in Platelet-derived growth factor (PDGF) and the tumour suppressor TP53. However, the latter association is now starting to be considered a historical one, since an increasing number of studies are showing that TP53 mutations occur in a significant number of primary glioblastomas [383,549]. These alterations can become predictive of glioma subclasses. Glioblastomas with intact expression of the PTEN and EGFR vIII17 proteins, for example, correlate with increased EGFR kinase inhibitor response as compared to tumours expressing EGFR vIII but lacking PTEN [329].

Immunohistochemical markers have been important tools so far in the classification and diagnosis of malignant gliomas, with Glial fibrillary acidic protein (GFAP) and Oligodendrocyte lineage transcription factor 2 (OLIG2) being two of the most specific ones [154]. GFAP is universally expressed in astrocytic and ependymal tumours, and OLIG2 is an oligodendroglial as well as stem cell marker expressed at high levels only in diffuse gliomas [281,338,435]. Stem and progenitor cell markers have recently been investigated as novel markers. Intensive research efforts are attempting to uncover agents that may target subpopulations of these cells with high tumourigenic potential and increased resistance to current therapies [154]. The cell surface marker CD133 and other markers of stem cells, such as Nestin, Musashi and Sex determining region Y-box 2 (SOX2), have been shown to negatively correlate with outcome parameters [304].

In an attempt to optimise the association of different prognoses with different therapies, several studies have focused their efforts on building an accurate classification system [154,383,390]. Elucidating patterns between prognosis and specific genetic lesions would allow therapies to be tailored to the group of patients most likely to respond to them, an approach also known as "stratification of treatment" [383]. Genome-wide profiling studies such as the ones conducted by Freije et al in 2004 [148], Phillips et al in 2006 [390] and Verhaak et al in 2010 [511] have tried to categorise glioblastoma into molecular subclasses that could be predictive of survival outcomes. Analysis of microarray gene expression data for hundreds of high-grade glioma samples has shown that most tumours can be classified into a small number of subtypes correlated with survival and response to therapy.

17 The vIII mutant of EGFR is the most common in glioblastoma and results from a non-random 801bp in-frame deletion of exons 2 to 7 of the EGFR gene [371].

The largest such study to date [511] built a dataset from 200 GBM and two normal brain samples that was used to identify four glioblastoma subtypes named Proneural, Neural, Classical and Mesenchymal, each characterised by a distinct gene expression signature encompassing a set of 210 up-regulated genes. An independent set of 260 GBM expression profiles was compiled from the public domain, including TCGA and Phillips et al [390], and was used to successfully assess subtype reproducibility. The Proneural subtype was associated with younger age, Platelet-derived growth factor receptor α (PDGFRA) abnormalities, and IDH1 and TP53 mutations, all of which have previously been associated with secondary GBM and correlate with longer survival times [326]. In confirmation of this pattern, the Proneural subtype previously identified in the study by Phillips et al also included most grade III gliomas and 75% of lower-grade gliomas [390]. The Classical subtype was strongly associated with the astrocytic signature and contained all common genomic aberrations observed in GBM, such as chromosome 7 amplifications, chromosome 10 deletions, EGFR amplification, and deletion of the TP53-stabilising isoform of the cyclin-dependent kinase inhibitor CDKN2A:ARF [326]. As already observed in the study by Phillips et al, the Mesenchymal subtype was characterised by high expression of Chitinase 3-like 1 (CHI3L1) and Met proto-oncogene (MET), and by a lack of association with a specific signature, showing instead an equal correlation with the neural, astrocytic, and oligodendrocytic gene signatures. A striking characteristic of this class was the strong association with NF1 deletion, already known to induce GBM in Nf1;p53 double knockout mice [426] and shown to occur in a variety of tumours such as neurofibromas [550], but only recently observed in human GBMs [69,320]. The Proneural subtype was associated with a trend toward longer survival, and its samples did not show a survival advantage from aggressive treatment protocols, whereas a clear treatment effect was observed in the Classical and Mesenchymal samples. The results of this profiling-based classification study may therefore find important roles in suggesting different therapeutic strategies of high clinical impact, for example by extending the current biomarker assays for GBM to include subtyping tests for key genetic events, including NF1 and PTEN loss, IDH1 and Phosphoinositide-3-kinase (PI3K) mutation, and PDGFRA and EGFR amplification [511].

Other studies attempting to define distinct subgroups of glioma identified CpG island methylator phenotypes [363] and microRNA profiles [231] as part of the same goal towards a new therapeutic approach. In the former study, by Noushmehr et al, promoter DNA methylation was assessed in 272 glioblastomas from the TCGA dataset and validated in a different set of non-TCGA glioblastomas and low-grade gliomas. Three DNA methylation clusters were identified on array-based methylation assay platforms, and one of these formed a particularly tight cluster with a highly characteristic DNA methylation profile designated as the "glioma CpG island methylator phenotype" or G-CIMP. The G-CIMP sample cluster was highly enriched for the Proneural expression profile defined by Verhaak et al [511], and the 24 G-CIMP-positive patients were all significantly associated with IDH1 somatic mutations and a longer survival time, making G-CIMP status a potential predictor of improved patient survival. The authors claim that if a transacting factor were involved in the protection from methylation of the CpG island promoters in the G-CIMP cluster, then the loss of its function could provide a favourable context for the acquisition of specific genetic lesions such as IDH1 mutation. The two GBM subgroups identified through the G-CIMP status would therefore have important implications in the assessment of therapeutic strategies for different GBM patients [363].

The microRNA profiling study by Kim et al [231] analysed 261 microRNA expression profiles from TCGA, identifying five clinically and genetically distinct subclasses of glioblastoma that each related to a different neural precursor cell type: radial glia, oligoneuronal precursors, neuronal precursors, neuroepithelial/neural crest precursors and astrocyte precursors, suggesting a relationship between each subclass and a distinct stage of neural differentiation. Interestingly, when compared to the glioblastoma subclasses identified by Verhaak et al [511], the microRNA-based oligoneural, radial glial, and astrocytic subclasses were enriched in tumours from the Proneural, Classical, and Mesenchymal subtypes, respectively. MicroRNA-based consensus clustering also yielded robust survival differences, with oligoneural glioblastoma patients living significantly longer, and a distinct pattern of somatic mutations, with the oligoneural subclass enriched for IDH1 and Phosphoinositide-3-kinase receptor 1 (PIK3R1) mutations but lacking NF1 mutations. miR-9 and miR-222 displayed very high connectivity, i.e. the number of directly connected mRNAs, in the oligoneural and astrocytic precursor subclasses, respectively, suggesting that these microRNAs might serve as core regulators of subclass-specific gene expression in glioblastoma. Overall, this study provided strong evidence that glioblastomas can arise from the transformation of neural precursors at each of the stages represented by a microRNA-identified subclass, and that microRNAs are therefore useful for sub-classifying glioblastomas to generate accurate prognoses and for the development of molecular-based treatment decisions [231].

1.4 Pathways Involved in Glioblastoma

Glioblastoma often involves the concurrent deregulation of three core pathways: RTK/PI3K/PTEN signaling, p53 signaling and Rb-mediated control of cell cycle progression (Fig 1.2) [2,286,326,383]. Therefore, important genetic events in human glioblastoma are the deregulation of growth factor signaling pathways via amplification or mutational activation of receptor tyrosine kinases (RTKs), the activation of the PI3K pathway and the inactivation of the p53 and Rb tumour suppressor pathways [286,326]. The Rb and p53 pathways, which regulate the cell cycle primarily by governing the G1/S18 phase transition, are major targets of inactivating mutations in glioblastoma, and their absence renders tumours particularly susceptible to inappropriate cell division driven by constitutively active mitogenic signaling effectors, such as PI3K and MAP kinases [154].

The same three pathways were singled out in the TCGA project when mapping the somatic nucleotide substitutions, homozygous deletions and focal amplifications onto the major pathways implicated in glioblastoma. A statistical tendency was observed in which components within each pathway were altered in a mutually exclusive fashion, hinting that deregulation of one component relieves the selective pressure for additional alterations in the same pathway. Also, 74% of the samples harboured aberrations in all three pathways, suggesting a functional non-redundancy between the three as a key requirement for glioblastoma pathogenesis [326].

The study by Parsons et al in 2008 [383] detected critical genes within important cell signaling pathways in glioma: TP53, Mdm2 p53 binding protein homolog (MDM2) and Mdm4 p53 binding protein homolog (MDM4) in the p53 pathway; RB1, CDK4 and CDKN2A in the Rb pathway; and PIK3CA, PIK3R1, PTEN and IRS1 in the RTK/PI3K/PTEN pathway. All but one of the cancers bearing mutations in members of one of these three pathways did not show alterations in any of the other two, suggesting functional redundancy for these mutations in tumourigenesis [383].
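The two pathway-level observations described above, namely co-alteration of all three core pathways and mutual exclusivity of alterations within a pathway, can be illustrated on a toy example. The sketch below is in Python; the pathway membership follows the gene lists given in the text for Parsons et al [383], whereas the sample identifiers and their altered genes are entirely hypothetical and carry no biological meaning.

```python
from typing import Dict, Set

# Pathway membership as listed in the text [383].
PATHWAYS: Dict[str, Set[str]] = {
    "p53": {"TP53", "MDM2", "MDM4"},
    "Rb": {"RB1", "CDK4", "CDKN2A"},
    "RTK/PI3K/PTEN": {"PIK3CA", "PIK3R1", "PTEN", "IRS1"},
}


def altered_pathways(altered_genes: Set[str]) -> Set[str]:
    """Names of the pathways hit by at least one altered gene."""
    return {name for name, genes in PATHWAYS.items() if genes & altered_genes}


def fraction_with_all_pathways(samples: Dict[str, Set[str]]) -> float:
    """Fraction of samples with an alteration in every pathway."""
    if not samples:
        raise ValueError("no samples given")
    hits = sum(1 for genes in samples.values()
               if len(altered_pathways(genes)) == len(PATHWAYS))
    return hits / len(samples)


def is_mutually_exclusive(altered_genes: Set[str]) -> bool:
    """True if no pathway carries more than one altered gene."""
    return all(len(genes & altered_genes) <= 1 for genes in PATHWAYS.values())


# Hypothetical samples, invented purely for illustration.
TOY_SAMPLES: Dict[str, Set[str]] = {
    "S1": {"TP53", "CDKN2A", "PTEN"},
    "S2": {"MDM2", "CDK4", "PIK3CA"},
    "S3": {"TP53", "RB1"},
    "S4": {"TP53", "RB1", "PTEN", "PIK3R1"},
}

print(fraction_with_all_pathways(TOY_SAMPLES))
print({name: is_mutually_exclusive(g) for name, g in TOY_SAMPLES.items()})
```

In this toy set three of the four samples carry alterations in all three pathways, and only S4 breaks mutual exclusivity by carrying two altered genes in the RTK/PI3K/PTEN pathway. A real analysis would additionally test whether the observed exclusivity departs from what is expected by chance.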

RTK/PI3K/PTEN Pathway

Tumour cells acquire genomic alterations that greatly reduce their dependence on exogenous growth stimulation via transmembrane receptor contact with diffusible growth factors, cell to cell adhesion, or extracellular matrix. Most often these cells enable inappropriate cell division, survival, and motility through the constitutive activation of the PI3K and MAPK pathways. The predominant mechanism of mitogenic signaling activation for gliomas occurs through RTKs, which are high-affinity cell surface receptors that bind and transduce the signal from cytokines, growth factors and hormones, and through integrins, which are membrane-bound receptors that mediate the interaction between the extracellular matrix and the cytoskeleton [154]. Receptors that belong to the RTK family include EGFR, PDGFR, MET, FGFR and VEGFR [431]. The epidermal growth factor (EGF) and the platelet-derived growth factor (PDGF) pathways play important roles in both CNS development and gliomagenesis [154]. The EGFR family is composed of four structurally related members: EGFR, ERBB2, ERBB3, and ERBB4. The importance of RTKs is made obvious by the fact that 58 out of the 90 unique tyrosine kinase genes in the human genome encode receptor tyrosine kinase proteins [431]. EGFR gene amplification occurs in roughly 40% of all glioblastomas, and the amplified genes are frequently rearranged [154,262]. Alterations in this family were found in 41 of the 91 TCGA samples, including the vIII mutant, extracellular domain point mutations and cytoplasmic domain deletions [326]. The vIII EGFR mutant contains deletions of exons two to seven and occurs in 20-30% of all human glioblastomas, making it the most common EGFR mutant [154]. Although ERBB2 mutation was previously reported in only one glioblastoma, seven of the 91 TCGA samples harboured 11 somatic ERBB2 mutations, mostly missense19 mutations and one splice-site mutation. Unlike in breast cancer, however, no amplifications were observed for ERBB2 [326]. The PDGFR family of receptors contains two members, PDGFRα and PDGFRβ, that homodimerize or heterodimerize depending on which growth factor is bound to them [191]. PDGFRα and its ligands, PDGFA and PDGFB, are expressed in gliomas, particularly in high-grade tumours, while strong expression of PDGFRβ occurs in proliferating endothelial cells in glioblastoma [154]. In contrast to EGFR, amplification or rearrangement of PDGFR is much less common, and a relatively rare oncogenic deletion mutant lacking exons eight and nine is, similarly to EGFR vIII, constitutively active and enhances tumourigenicity [154]. Given the tumoural co-expression of PDGF and PDGFR, autocrine and paracrine loops may be the primary means by which this growth factor axis exerts its effects [154]. Although RTKs are known to signal through both the MAPK and PI3K pathways, GEMMs showed consistently high activation of just the PI3K pathway in high-grade astrocytomas, hinting that the downstream consequences of RTK activation may vary greatly [100].

18 Major checkpoint in the regulation of cell cycle beyond which the cell is committed to dividing.

Figure 1.2: KEGG Glioma Pathway [2]. Oncogenes and tumour suppressors are highlighted in red.

The PI3K complex belongs to class I of the phosphoinositide 3-kinase family and is composed of a catalytic subunit, PIK3CA, and a regulatory subunit, PIK3R1. Activating missense mutations in the adaptor binding and kinase domains of the catalytic subunit of class I PI3K complexes are known to occur in different types of tumours, including glioblastoma, while mutations in the regulatory subunits are less common [35,36,286]. Of the 91 samples from the TCGA study, nine carried somatic mutations in the regulatory subunit that clustered around the three amino acids acting as contact points for the catalytic subunit, suggesting the possibility that these mutations prevent inhibitory contacts between the two subunits and cause constitutive PI3K activity [326]. The PI3K family of enzymes phosphorylates the inositol ring of phosphatidylinositol20 on the three, four and five hydroxyl groups in many different combinations [157,440]. Class I PI3K complexes can be activated by RTKs or by small GTPases such as RAS [154]. Upon activation by RTKs, PI3K catalyses the phosphorylation of phosphatidylinositol (4,5)-bisphosphate to phosphatidylinositol (3,4,5)-trisphosphate (PIP3) [154,267]. PIP3 generated on the cytosolic side of the cell membrane acts as a docking site for the serine/threonine protein kinases of the Akt family and the 3-phosphoinositide dependent protein kinase-1 (PDPK1). Upon relocation, PDPK1 and mammalian target of rapamycin (mTOR) activate AKT1 via phosphorylation of two key residues, starting the AKT-mediated signaling cascade that promotes cell survival and proliferation [154]. AKT activation may be compromised via two other mechanisms in glioblastoma: dephosphorylation by the PH domain and leucine rich repeat protein phosphatase 1 (PHLPP) [74] or inhibition of phosphorylation through the C-terminal modulator protein (CTMP) inhibitor [307]. One of the targets of AKT1 is the Forkhead box O (FOXO) family of transcription factors, whose phosphorylation promotes their exclusion from the nucleus and reduces the expression of important target genes such as the cyclin-dependent kinase inhibitors CDKN1A and CDKN1B, both also directly targeted by AKT1, and the Rb family member p130 [154]. The action of class I PI3Ks is directly antagonized by PTEN through dephosphorylation of PIP3 and inhibition of AKT1 relocation, which strongly reduces AKT1-mediated cell cycle promotion [84,99,111]. Furthermore, 86% of the TCGA samples harboured at least one genetic event in the RTK/PI3K/PTEN pathway, and deletions of the PTEN lipid phosphatase gene were frequent. Within the RTK/PI3K/PTEN pathway, frequent aberrations were shown in EGFR, ERBB2, PDGFRA and MET [326]. Patients with PTEN deletions or activating mutations in the catalytic and regulatory subunits of class I PI3K complexes might benefit from PI3K or PDPK1 inhibitors, whereas patients in which the PI3K pathway is altered by AKT amplification might be refractory. Also, because multiple RTKs can be co-amplified in the same glioblastoma sample, anti-RTK therapies may need to be tailored to specific patterns of RTK mutation [326].

19 A missense mutation is a point mutation that results in a codon coding for a different amino acid.

20 Negatively charged phospholipid and a minor component in the cytosolic side of eukaryotic cell membranes.

PTEN is a major tumour suppressor that is inactivated in 50% of high-grade gliomas by mutations or epigenetic mechanisms, each resulting in uncontrolled PI3K signaling [154]. PTEN expression is completely extinguished in tumour cells of hGFAP-Cre+;p53lox/lox;Ptenlox/+ GEMMs21 that have developed anaplastic astrocytomas and glioblastomas, but it is present in the surrounding normal tissue and in the vessels supplying tumour tissue. Such an LOH effect is very frequent in human high-grade gliomas [549]. PTEN is located on the long arm of chromosome 10 and acts as the central negative regulator of the PI3K/AKT pathway due to its lipid phosphatase activity that affects RTK signaling [84,99,111]. As a consequence, the RTK/PI3K pathway is commonly affected by the biallelic inactivation of PTEN or LOH of the long arm of chromosome 10. Loss of PTEN most often results in constitutive activation of AKT1 but is not, in mature astrocytes, sufficient to drive proliferation and initiate gliomagenesis in the absence of other mutations. This suggests that the PI3K/AKT pathway is not sufficiently stimulated by the absence of its main negative regulator to elevate pathway activity in astrocytes [100]. Furthermore, PTEN may act to suppress transformation and tumour progression beyond regulation of PI3K signaling. In a study by Shen et al [453], quiescent cells from mouse model cell systems22 harboured high levels of nuclear PTEN, which appeared to fulfill important roles in the maintenance of genomic integrity, through centromere stabilisation and promotion of DNA repair [453]. In other studies, a number of PTEN point mutations found in familial cancer predisposition syndromes had no effect on enzyme activity and lay within sequences important for the localisation of PTEN. Analysis of such mutants has confirmed that aberrant sequestration of PTEN into either the nucleus or the cytoplasm compromises its tumour suppressor function [117,154,495].

21 In the study by Zheng et al [549], the hGFAP-Cre transgene was used to delete p53 alone or in combination with Pten in all CNS lineages using conditional p53 and Pten alleles, with modelling efforts directed towards the Ptenlox/+ genotype since broad CNS deletion of Pten results in lethal hydrocephalus in early mouse postnatal life.

22 This mouse system included mouse embryonic fibroblasts and mouse embryonic stem cells.

p53 Pathway

The tumour suppressor TP53 is the most commonly mutated gene in human cancers and a master regulator of cell survival pathways, as the sheer number of its protein interactors suggests (Fig 1.3). After activation by cellular stresses, TP53 functions to trans-activate genes that mediate cell cycle arrest, apoptosis, DNA repair, inhibition of angiogenesis and metastasis, and other p53-dependent activities [185]. In humans, TP53 is located on the short arm of chromosome 17 and may be inactivated directly by gene mutations or indirectly by alterations in genes that promote degradation of the TP53 protein [100,491,515]. TP53 signaling is commonly affected by biallelic inactivation of TP53 and sometimes via amplification of MDM2 or loss or mutation of the cyclin inhibitor CDKN2A:ARF [100].

Figure 1.3: Visualisation generated from the list of 345 interactors (orange) of TP53 (yellow) from the BioGRID 3.1 [62] repository for interaction datasets.

While MDM2 is a direct inhibitor of TP53 through its ubiquitin ligase activity, CDKN2A has been identified in at least three alternatively spliced variants, two of which encode isoforms that inhibit the CDK4 kinase. The remaining transcript, however, includes an alternate first exon that contains an alternate open reading frame (ARF) specifying a protein that is structurally unrelated to the products of the other variants and stabilises TP53 by sequestering MDM2 [233,491,515]. CDKN2A is predominantly inactivated by biallelic loss or hypermethylation in 50% to 70% of high-grade gliomas and roughly 90% of cultured glioma cell lines. Concordantly, the chromosomal region containing MDM2 is amplified in roughly 10% of primary glioblastomas, the majority of which contain intact TP53 [154]. Furthermore, the discovery of the MDM2-related gene MDM4, which inhibits TP53 and enhances the activity of MDM2, prompted the finding that 4% of glioblastomas carry MDM4 amplification with neither TP53 mutation nor MDM2 amplification [284]. Through CDKN2A:ARF mediated stabilisation and activation, TP53 is able to activate a potent cyclin-dependent kinase inhibitor, CDKN1A. By binding and inhibiting CDK4 and CDK6 complexes, CDKN1A acts as a regulator of the G1 progression of the cell cycle [160,233]. Another cyclin-dependent kinase inhibitor of the same family as CDKN2A is CDKN2C, which has recently been suggested to drive glioblastoma pathogenesis. CDKN2C inhibits the formation of the complexes between cyclins and the cyclin-dependent kinases CDK4 and CDK6 that are needed to keep the cell cycle from stalling at the G1 phase. Homozygous deletions of CDKN2C were reported in glioblastoma multiforme, as well as missense mutations that disturb its binding with CDK6. These GBM studies suggest that CDKN2C is a tumour suppressor that compensates for CDKN2A homozygous deletion by being up-regulated through the action of the E2F1 transcription factor [467].

In the TCGA project, inactivation of the p53 pathway occurred mostly in the form of TP53 mutations, but also of CDKN2A:ARF deletions and MDM2 and MDM4 amplifications. While genetic lesions in TP53 were mutually exclusive of those in MDM2 or MDM4, CDKN2A:ARF deletions were concurrent with TP53 mutations in 30% of the samples [326]. The best-characterised effector of TP53 is the transcriptional target CDKN1A. Although this gene has not been found to be altered in gliomas, its expression is frequently abrogated by TP53 functional inactivity as well as by mitogenic signaling through the PI3K and MAPK pathways [154]. Although TP53 mutation was historically associated with low-grade gliomas and secondary human glioblastomas, work done with GEMMs prompted the re-sequencing of both TP53 and PTEN to re-evaluate their combinatorial role in the disease [549]. PTEN had already been associated with primary glioblastoma [423], and the results of this work showed that 60% of the clinically annotated human primary glioblastomas with TP53 mutations also harboured a PTEN mutation or homozygous deletion, indicating that TP53, together with PTEN, is also a key player in human primary glioblastoma, as the TCGA data also report [326,549].


Rb Pathway

The importance of the inactivation of the Rb pathway in glioma progression is evidenced by the near-universal and mutually exclusive alteration of Rb pathway effectors and inhibitors in both primary and secondary glioblastoma [154]. RB1 is located on the long arm of chromosome 13 and, similarly to the p53 pathway, the Rb pathway is also affected by homozygous deletions of the CDKN2A locus. The CDKN2A gene is, with the exception of its alternate reading frame isoform CDKN2A:ARF, a negative regulator of p53 signaling as well as a regulator of the G1 checkpoint in the Rb-mediated progression of the cell cycle [154,398]. The RB1 gene is mutated in roughly 25% of high-grade astrocytomas, and the loss of the long arm of chromosome 13 characterises the transition from low to intermediate grade gliomas. Amplification of the CDK4 gene accounts for the functional inactivation of RB1 in roughly 15% of high-grade gliomas, and CDK6 is also amplified but at a lower frequency [154].

Deregulation of Rb signaling leading to G1/S progression appears to be a critical event in gliomagenesis whether or not inactivation of RB1 is an initiating event [100]. Cell cycle progression is regulated by the activities of complexes of cyclins and CDKs, which phosphorylate RB1 and block its growth-inhibitory functions [318]. G1 progression is controlled by the D-type cyclins, which form active complexes with CDK4 or CDK6, and E-type cyclins in association with CDK2 (Fig 1.4) [398]. Within the TCGA dataset, 77% of samples showed genetic alterations in the Rb pathway. Among these, the deletion of the CDKN2A/CDKN2B locus on the short arm of chromosome nine was the most common event, followed by amplification of the CDK4 locus [326]. CDKN2B and CDKN2A lie adjacent on the short arm of chromosome nine and together define a region that is frequently mutated and deleted in a wide variety of tumours [233]. CDKN2A and CDKN2B both form complexes with CDK4, CDK6 and cyclin D to block their activation and the progression of the cell cycle through the G1/S transition [382,454]. Interestingly, all samples with RB1 nucleotide substitutions lacked CDKN2A/CDKN2B locus deletion, suggesting that this type of RB1 inactivation obviates the genetic pressure for activation of upstream cyclin and cyclin-dependent kinase complexes. Thus, it would be reasonable to speculate that patients with deletions in CDKN2A or CDKN2B or with amplifications in CDK4 or CDK6 could benefit from treatment with CDK inhibitors, which would be unlikely to affect patients with RB1 mutation [326]. However, numerous in vitro and in vivo assays have demonstrated that the neutralisation of this pathway alone is insufficient to abrogate cell cycle control to the extent needed for cellular transformation, suggesting that other important cell cycle regulation pathways complement its activities in preventing gliomagenesis [154].

Figure 1.4: The Biocarta pathway for Rb signaling [61]. The cell cycle checkpoints at the G1/S and G2/M transitions prevent progression when DNA is damaged. The cyclin-dependent kinase CDK2 targets and phosphorylates RB1 to allow progression to the G1/S transition. When the cell is in a quiescent state, hypophosphorylated RB1 blocks proliferation by binding and sequestering the E2F family of transcription factors, which prevents the transactivation of genes essential for progression through the cell cycle [154]. Upon stimulation and activation of the MAPK cascade, cyclin D forms complexes with the cyclin-dependent kinases CDK4 and CDK6, and CDK2 is released from the inhibitory interaction with CDKN1B and binds to cyclin E. Together these activated complexes phosphorylate RB1, which stops inhibiting E2F transcription factors so that the cell cycle can proceed through the G1/S checkpoint [68,151].

1.5 Pathway Crosstalk

While the RTK/PI3K/PTEN, p53, and Rb pathways are often considered as distinct entities, there is significant crosstalk among them, reinforcing the inappropriate regulation of single pathway perturbations [154].

While 70% of secondary glioblastomas share the common event of IDH1 mutation, which initiates them into the development of the higher-grade pathology, primary glioblastomas arise de novo and lack a similar common initiating event. To address this, the study by Chow et al looked into the interplay of the three most important pathways in glioblastoma using GEMMs [100]. Mutations in the three tumour suppressors Pten, p53 and Rb1 were introduced in various combinations in astrocytes and neural precursors, and these eventually developed into astrocytomas ranging from grade III to grade IV [100,308]. Interestingly, none of the GEMMs carrying a deletion in only one of the three tumour suppressors developed high-grade astrocytomas. Only the deletion of p53 caused astrocytomas, with late onset and at low frequency. The earliest tumour onset was observed in triple knockout mice and the highest frequency of astrocytomas in double knockout mice that carried a p53 deletion. This observation supports a role for p53 inactivation in astrocytoma initiation, alongside other factors such as the low frequency of RB1 and PTEN mutations and high frequency of TP53 mutations in grade II human astrocytomas [300,526]. Furthermore, the earlier onset of high-grade gliomas in Pten:p53 knockout mice shows that Pten cooperates more efficiently with p53 mutation than Rb1 mutation does in double knockout mice. Interestingly, Pten and Rb1 mutations together fail to cause gliomagenesis in GEMMs, indicating that in the absence of other mutations these two pathways fail to cooperate [100]. Moreover, these pathways can negatively regulate each other, with TP53 inhibiting the activation of the FOXO transcription factors via the activation of the Serum/glucocorticoid regulated kinase 1 (SGK1). This kinase could, in fact, induce through phosphorylation the translocation of FOXO transcription factors out of the nucleus. This in turn would cause the inhibition of TP53 transcriptional activity via a FOXO-mediated increase in the association of TP53 with the nuclear export receptors translocating it to the cytoplasm [154,541].

In the study by Chow et al, array comparative genomic hybridization (CGH) showed a subset of focal and large-scale genomic aberrations typical of human glioblastoma subclasses. For example, p53:Pten tumours had acquired secondary amplifications of RTKs such as Pdgfra, Egfr and Met that are considered hallmarks of the human pathology. Also, p53:Pten:Rb1 tumours seemed to lose the selective pressure for RTK amplification, since only Pdgfra was found to be amplified. An understanding of how tumour suppressor losses induce secondary genomic alterations is key to the direction of glioma progression [100,308]. Thus, Rb1 inactivation seems to further cooperate with Pten and p53 deletions to generate high-grade gliomas with similar histological and biochemical signatures as in Pten:p53 double knockout mice, but with different selective pressure for RTK amplifications. Other genes affected by CNAs aside from the RTKs mentioned above are Cdk4, Cdk6, Ccnd1, Ccnd2 and Ccnd3, which all directly regulate the G1/S cell cycle checkpoint and thus Rb1 activity [68,100,382,454].


The complicated interplay among these critical molecules highlights the need for detailed dissection of the pathways that are aberrant in each tumour to accurately guide the choice of combination therapies that can simultaneously target multiple pathways [154].

Chapter 2

Neurogenesis

Contents
2.1 Radial Glia
2.2 Neural Stem Cells

2.1 Radial Glia

Early ultrastructural studies with electron microscopy revealed that, during the development of the mammalian brain, newly born neurons used the extended bipolar processes of radial glia as a structural support and guide to migrate to a new location [417]. However, the role of radial glia has recently been extended to that of a progenitor population that can divide in the developing cortex and possibly the entire CNS, producing daughter cells including neurons, astrocytes and glia [400]. Historically, radial glia were believed to co-exist with neural progenitors in the ventricular zone of the brain, but the past decade has seen an increase in the amount of experimental data challenging this theory, with radial glia isolated in vitro displaying neuronal differentiation capacity [310], and dividing precursors in the developing cortex displaying a radial glia phenotype, as revealed by morphology and markers such as Brain lipid binding protein (BLBP) and Vimentin [187,361]. After cortical neurogenesis is complete, radial glia retract their processes and convert to multi-polar astrocytes, leaving specialized forms of radial glia persisting in the adult CNS in locations such as the cerebellum, retina and adult hippocampus [343]. In mammalian species, for example, radial glia persist into adulthood in the form of Bergmann glia in the cerebellum and Müller glia in the retina [400]. In non-mammalian vertebrates the number of radial glia that persist throughout adulthood is much higher and more widely spread across the CNS, which might account for the considerable regenerative capacity seen in such species with respect to that in mammalian species [552].

During the development of the mammalian CNS, the steps that precede the appearance of the radial glia progenitor population are the following:

1. Commitment of an early population of cells to the neural lineage [400];

2. Induction of neuroectoderm, promoted by the absence of Bone morphogenetic protein (BMP) and Fibroblast growth factor 2 (FGF2), and composed of a progenitor population termed "neuroepithelial progenitor" (NEP) cells. Although continuous self-renewal has not been demonstrated for NEP cells, all cells of the CNS directly or indirectly derive from them [400];

3. NEP cells undergo a process of interkinetic nuclear migration, in which the nucleus oscillates between the apical and basal cellular membrane in synchrony with cell cycle progression, leading to the formation of a pseudo-stratified epithelium of which Sox1 is the earliest marker [389];

4. The neural plate defined by the action of NEP cells undergoes a morphogenetic movement that results in the formation of the neural tube;

5. Signaling molecules (i.e. retinoic acid (RA), BMP, notochord-derived Sonic hedgehog (SHH)) are secreted from the nearby tissues to establish a positional gradient that defines sub-regions of the CNS, in which distinct neuronal and glial subtypes are specified [72];

6. At approximately embryonic day 9.5-10.5 in mouse, a second morphologically distinct cell type appears, termed "radial glia" [400].

Structural features of radial glia that distinguish them from their earlier NEP progenitors are the bipolar morphology, with one extension at the luminal surface23 of the ventricular zone (VZ) and a longer process extending in the opposite direction through to the basement membrane adjacent to the pia mater24 [193] (Fig 2.1); the ovoid cell body, where the nucleus is positioned in the VZ adjacent to the lumen; electron lucent processes; abundant intermediate filaments25; numerous glycogen granules condensed at the end of the process closest to the luminal surface. A second relevant NEP-distinguishing feature of radial glia is the expression of markers characteristic of the astrocytic lineage, such as the Glutamate aspartate transporter (GLAST), BLBP and GFAP, as well as the consistent immunoreactivity shown towards the Nestin and Vimentin antibodies [122,310,400]. In figure 2.1, dividing NEP cells appear in blue and populate the VZ and the sub-ventricular zone (SVZ), while mature migrated neurons are highlighted in yellow and populate the stratum closest to the pia mater. Radial glia are shown in green and their process extends with a bipolar morphology across from the VZ to the pia mater to support mature neuronal migration in the developing cortex. The stratification of the NEP cell population in the VZ is concordant with cell cycle phase (G1, G1/G2, M), while the more superficial SVZ displays continued mitotic activity but does not host a similar cellular stratification. Therefore, the mature cortex eventually displays an inside-out pattern of layering, with the early-born neurons residing in the deeper layers and the late-born ones residing more superficially next to the pia mater [193].

23 This term indicates the surface that looks into the space defined by the interior of a tubular structure.

24 The CNS is enclosed in three connective tissue membranes called meninges, one of them being the pia mater that follows the surface of the brain and spinal cord closely, extending into all sulci and depressions of the surface [73].

Figure 2.1: Cross-section through the neural tube with morphological zones indicated on the left. NEP cells appear in blue, mature migrated neurons are highlighted in yellow and radial glia are shown in green. Adapted from Herrup et al 2007 [193].

25 Components of the cytoskeletal system that are distinguishable from microfilaments by the size of their diameter, 8-12 nanometers. They function as a tension-bearing element to help maintain cell shape and rigidity as well as anchor in place several organelles.


Radial glia are by no means a uniform cell population. In fact, they are found within the cortex and also throughout the developing brain and spinal cord. This spatial and temporal heterogeneity likely generates the diversity of cellular phenotypes within the nervous system. For example, region-specific expression of transcription factors in radial glia is likely to determine the fate of progeny towards one of the lineages of the CNS [249]. Furthermore, there are circumstances in which the radial glia phenotype is reacquired, such as after injury, during reprogramming and dedifferentiation in vitro, and following epigenetic disruptions in tumorigenesis [400].

Anatomically, a ventricular system is present within the cerebrum26 that is composed of four communicating compartments, or ventricles, filled by CSF [387]. The SVZ is a paired brain structure situated in the lining of the two lateral ventricles and is one of the major germinal layers during embryogenesis [18,41] together with the sub-granular zone (SGZ27), as well as the largest district in which NS cells with the characteristics of astrocytes persist after birth in the mammalian adult brain [18,332]. Several studies have demonstrated that radial glia not only give rise to multiple classes of brain cells, but also generate adult SVZ stem cells that maintain the neurogenic lineage in the adult brain [293,332,413], with a similar relationship proposed also between radial glia and hippocampal progenitors [223,449]. Specifically, these adult SVZ stem cells have been shown to arise from a subpopulation of radial glia present within the developing striatum and display characteristics intermediate between normal astrocytes and radial glia, hinting that NEP cells, radial glia and adult SVZ stem cells are the components of a continuous lineage with multipotent neural differentiation potential [107,332]. Although these in vitro studies have demonstrated the shared molecular and morphological characteristics between radial glia and adult SVZ stem cells (Fig 2.2) [107], several aspects of in vivo biology cannot be accounted for. For example, the fact that radial glia exist only transiently during fetal development makes it harder to validate experimentally whether they function as self-renewing stem cells. Also, the artificial environment of cultures may result in a unique synthetic cell state, and the combination of transcription factors expressed in cultured SVZ stem cells is not found in vivo. Therefore, it is fairest to term these precursor cells "radial glia-like NS cells", although in the rest of this thesis they will be referred to simply as NS cells [400].

26 Largest structure in the mammalian brain and composed of the white and grey matter in the cranial cavity.

27 Adult mammalian neural stem cells have also been isolated from the sub-granular zone (SGZ) of the dentate gyrus in the hippocampus, and the subcortical white matter [442].

Figure 2.2: Surface markers of radial glia are expressed by NS cell lines, indicating that these cells may provide the biological context to work with progenitors of the CNS. For example, GFAP is a type III intermediate filament; GLAST is an astrocyte-specific glutamate transporter; Prominin, also known as CD133, is a glycoprotein that is a neural and hematopoietic stem cell marker. Adapted from Conti et al 2005 [107].

Radial glia can be obtained from the dissociation of fetal CNS tissues and the subsequent establishment of primary cultures. The heterogeneity linked to these primary cultures was overcome by the development of cell type specific monoclonal antibodies like RC1, which reduces the presence of multiple immature and differentiated cell types and distinct radial glia subtypes. Fluorescence-activated cell sorting (FACS) can also be used with cell surface markers like CD15 and CD133, which, however, enable only the enrichment and not the isolation of radial glia subpopulations. Another technique uses reporter mice in which an endogenous radial glia promoter drives the expression of fluorescent reporters, and cells with activated reporter expression are then isolated using FACS. Although primary cell cultures are a useful tool to isolate and characterise radial glia cells directly from neural tissues, the mitogen-driven expansion of cells in vitro leads to the formation of NS cell lines that are an invaluable tool for molecular and biochemical studies on nervous system pathological models amongst others [400].

2.2 Neural Stem Cells

NS cells are defined as clonogenic cells capable of self-renewal and multipotent differentiation into the three main cell types of the CNS: neurons, astrocytes and oligodendrocytes. NS cells do not express the pluripotency-specific transcription factors Oct-4 and Nanog, but rather show expression of neural genes and lack the expression of mesoderm and endoderm specific genes. As shown in figure 2.3, pure NS cells can be derived from ES cells taken from the inner cell mass (ICM) of the blastocyst or from the SVZ and germinative area of adult and fetal brain tissue, respectively [400,403].

Figure 2.3: Sources of NS cells: (a) cultured indirectly starting from ES cells derived from the ICM of the blastocyst; (b) cultured directly from the dissociation of germinative areas of the fetal brain or SVZ of the adult brain. Adapted from Pollard et al 2007 [400].

NS cells can be isolated from the embryonic or adult mammalian brain of mice [297,427,525], [171] and humans [135,145,149,436,443,512], although their precise purification remains elusive since they cannot yet be unambiguously identified with markers. Until recently, NS cells were mainly defined by the expression of Nestin, a cytoplasmic intermediate filament protein discovered by Hockfield and McKay in 1985 [195], although it is now clear that Nestin identifies neural progenitors as well as stem cells. Nestin expression is lost in vitro with differentiation of NS cells and, in vivo, is retained postnatally only in proliferative zones. Direct isolation of NS cells from human fetal brain using flow cytometry for the cell surface marker Prominin-1 (CD133) was reported by Uchida et al. in 2000 [502]. CD133 was originally shown to be a hematopoietic stem cell marker, but it is also expressed in the SVZ of developing mice and humans on the apical membrane of cells lining the lateral ventricles, with more restricted expression than Nestin. In vitro, NS cell-like cells with a marked stem cell activity have been isolated from human fetal brain cells expressing CD133 [122]. In vivo, NS cells have been identified as radial glial cells, expressing the markers BLBP, GLAST and RC2, an antibody recognising the presence of Nestin [310].

Until the late 1990s, the only cell line that could consistently generate human neuronal cells in vitro was NTERA-2, a teratocarcinoma28 derived cell line that required complex manipulations to induce differentiation [22,397]. In 1997, Sah et al. established the first immortalized adherent human fetal neural precursor cell line using retrovirally expressed avian v-myc [441], which led to subsequent independent reports using similar strategies [114,145,513], until the possibility of expanding human fetal neural precursors in suspension cultures was explored [83,428,483]. The floating aggregates of cells were termed "neurospheres" [427], and only recently have adherent monoculture protocols been developed as an alternative.

Neurospheres With the exception of ES cells, it has always proven difficult to propagate stem cell cultures homogeneously ex vivo, since expansion tends to be accompanied by differentiation. In 1992 Weiss and Reynolds discovered that cells from fetal mouse CNS could be propagated in suspension culture with EGF as clusters of floating cells that they termed "neurospheres" (Fig 2.4). A neurosphere is a clonal, single cell-derived floating cluster of proliferating cells [427] that contains thousands of cells and is a mixture of stem and progenitor cells, of which at most 5% are stem cells [122]. The number of stem cells in a neurosphere is evaluated in a clonogenic assay by determining the number of singly dissociated cells from primary spheres that can give rise to secondary spheres that, in turn, can differentiate down all three lineages of the CNS (Fig 2.5) [494]. Therefore, the re-plating efficiency of neurospheres is a measure of the number of stem cells, and the size of the sphere reflects progenitor proliferative efficiency. This assay also allows for a quantitation of self-renewal, which is distinct from proliferation in that self-renewal involves a cell division with a cell fate decision, so that at least one daughter cell retains the full stem cell potential of the parent cell (see Fig 3.2). A multipotent secondary sphere can only form from a stem cell. Undifferentiated neurospheres can be extensively passaged in suspension, but when plated onto an adherent substrate in serum, they differentiate into the three main neural lineages of the CNS: neurons, astrocytes and oligodendrocytes [122].

Figure 2.4: (a,b) Contrast microscopy images of early phase neurosphere formation, in which individual cells form small clusters. (c,d) Immunofluorescence microscopy images in which (c) EGFR (green) and the Nestin protein (red) are detected on an intact neurosphere, and (d) nuclei (DAPI staining, blue) and cell mitosis (5-bromo-2’-deoxyuridine (BrdU) incorporation, green) are detected on a frozen neurosphere section. Image adapted from [375].

28 A germ cell tumor.

In their 1992 neurosphere assay, Weiss and Reynolds [427] employed a serum-free culture system in which the majority of primary differentiated CNS cells did not survive, but a small population of EGF-responsive cells was maintained in an undifferentiated state and proliferated to form clusters that could then be dissociated to form numerous secondary spheres or induced to differentiate into the three major cell types of the CNS. After approximately seven days in growth medium containing EGF, the isolated neurospheres measured 100-200 µm in diameter, were composed of 3,000-5,000 cells, and differentiated into the three primary CNS phenotypes when plated, as intact clusters or dissociated cells, without growth factors on an adhesive substrate (Figure 2.5).
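The two quantities discussed above can be made concrete with a little arithmetic. The following Python sketch is illustrative only and is not part of the cited studies: the function names and the example counts (40 secondary spheres from 1,000 plated cells) are hypothetical, and the doubling-time estimate assumes that a sphere grows exponentially from a single founder cell with no cell death, which is a simplification.

```python
import math


def sphere_forming_frequency(secondary_spheres: int, cells_plated: int) -> float:
    """Re-plating efficiency: fraction of singly dissociated cells that
    gave rise to a secondary sphere (hypothetical helper)."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return secondary_spheres / cells_plated


def implied_doubling_time_hours(final_cells: float, days: float,
                                founder_cells: float = 1.0) -> float:
    """Average doubling time implied by clonal growth from founder_cells to
    final_cells in the given number of days, assuming pure exponential
    growth without cell death (an assumption, not a measured value)."""
    doublings = math.log2(final_cells / founder_cells)
    return days * 24.0 / doublings


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"sphere-forming frequency: {sphere_forming_frequency(40, 1000):.1%}")
    # Sphere sizes quoted in the text: 3,000-5,000 cells after about seven days.
    for n_cells in (3000, 5000):
        td = implied_doubling_time_hours(n_cells, days=7)
        print(f"{n_cells} cells in 7 days -> about {td:.1f} h per doubling")
```

Under these assumptions a sphere of 3,000-5,000 cells after seven days corresponds to roughly 11.5-12.3 population doublings. Because spheres contain dying and differentiating cells, the true cycling time of the proliferating fraction is likely to differ from this average.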

Over the past decade, the use of the neurosphere assay has demonstrated that a population of cells exists in the fetal through to the adult mammalian CNS that can be isolated in culture and exhibits the critical stem cell attributes of proliferation, self-renewal, and the ability to give rise to a number of differentiated, functional progeny [116]. The neurosphere assay has proven to be an excellent technique for isolating NS cells and progenitor cells in order to investigate the differentiation and potential of cell lineages. These spheres can be dissociated, expanded and pooled in sufficient quantity for scientific inquiry, and lend themselves easily to sectioning for histology or immunocytochemistry applications and to cryopreservation [375]. Moreover, the recent finding that human brain tumours can be similarly cultured in neurosphere conditions creates the opportunity to understand the stem cell hierarchy of these neoplasms (see Section 3.2) [122]. Neurospheres contain a mix of differently committed cells, including radial glia, committed progenitors and differentiated astrocytes and neurons, that, upon removal of EGF, differentiate into the three main lineages of the CNS with a strong preference towards astrocytes [107]. This heterogeneity likely provides a niche that sustains a 3-4% fraction of stem cells, raising the question of whether it is the multipotent cells or the more differentiated ones within the mixed cellular environment that give rise to the three lineages [107,400]. In vivo, external signals such as secreted factors and cell-cell interactions mediated by integral membrane proteins and the extracellular matrix collectively control stem cell fate, defining the stem cell "niche", which has a powerful effect in maintaining the balance between quiescence, self-renewal and cell fate commitment [165]. In vitro, the neurosphere probably provides a mixture of cellular environments that resembles the complex niche sustaining NS cells in the mammalian brain. Therefore, the neurosphere discovery is invaluable in that it demonstrates the potential of the developing and adult CNS of rodents and primates to give rise to stem cells [107].

Figure 2.5: The neurosphere assay used to study neural precursor cells in culture. Cells are first isolated from embryonic or adult brain, then cultured in serum-free conditions in the presence of EGF and FGF2 to generate floating colonies. The primary neurospheres can then be dissociated and re-plated in EGF and FGF2 to generate secondary neurospheres, which can then be made to differentiate into the three primary lineages of the CNS by withdrawing the growth factors in adherent conditions. Adapted from Dirks et al 2008 [122].


A proof-of-principle experiment performed by Conti et al [107] demonstrated the presence of NS cells within neurospheres. It consisted in allowing passage 40 mouse neurospheres derived from fetal forebrain to attach to gelatin-coated plastic in the presence of EGF and FGF2. Since Conti et al had already shown that these conditions generate NS cells in adherent monoculture (see Section 2.2), the appearance of bipolar cells that were indistinguishable from NS cells, and that could be serially propagated as uniform RC2+/GFAP- populations and then induced to differentiate into astrocytes or neurons, led them to conclude that radial glia-like cells present in neurospheres give rise to NS cells in adherent culture in the presence of FGF2 and EGF. Conversely, they observed that NS cells of either ES cell or fetal brain origin readily formed neurospheres if detached from the substratum mechanically or due to overgrowth, confirming that NS cells, and thus radial glia, are likely the neurosphere forming stem cells, although in neurospheres they constitute only a fraction of the cell population. Analogously to the embryoid body (EB) differentiation observed in ES cell aggregates, the differentiation observed within neurospheres is presumably due to aggregation [125].

An important limitation of the neurosphere culture system, when it is used to screen compounds that affect NS cell expansion, is that human NS cells expand more slowly in suspension culture than their mouse counterparts and show variable cell death, which makes quantification of cell proliferation harder. A second important limitation of neurosphere assays is that it is difficult to identify the precise cellular target due to the presence of restricted progenitors and differentiated cell types, and real-time monitoring of cellular responses is not possible in aggregates. Finally, fusion of neurospheres is a common occurrence in suspension, which confounds analyses based solely on sphere numbers or size [404].
To summarise, the neurosphere paradigm is invaluable in that it has demonstrated the existence of progenitors within cultured tissues, but it is accompanied by several important shortcomings:

· the cellular complexity created by the mixed environment is a barrier to dissecting the mechanisms responsible for the self-renewal and commitment processes;

· the heterogeneity of the cellular population confounds global expression profiling experiments and makes it hard to identify a precise cellular target;

· neurospheres differentiate more readily into astrocytes than into neurons, both in vitro and when transplanted into mice;

· human NS cells cannot be properly screened for compounds affecting them because quantification is made difficult by their slow growth in suspension culture and their variable death rate;

· cellular responses cannot be monitored within aggregates, and fusion of neurospheres can confound results based on their number and size.

Therefore, a niche-independent environment is better suited for the growth of stem cell cultures, in that differentiation towards a specific lineage can always be traced back to the stem cells themselves [165]. Such an environment was produced when the presence of FGF2 and EGF alone was discovered to be sufficient for the continuous expansion of NS cells in adherent conditions [107].

Niche-independent NS cell derivation

The key to proper usage of ES cell-based technologies is the development of robust, reproducible and reliable protocols for controlling propagation and differentiation of cells, and an important goal in embryonic stem cell biology over the past 10 years has been to develop protocols that enable the conversion of mouse and human ES cells to the neural lineage [400].

The default model of neural induction proposes that the key event is the removal of BMP signaling, with no positive induction required [352]. In mouse ES cells, however, positive induction is necessary for differentiation to take place: these cells can be maintained in vitro through the addition of the extrinsic factors Leukemia inhibitory factor (LIF) and BMP, together with the intrinsic determinants Sox2, Oct-4 and Nanog, but they require replacement of LIF and BMP to specify the direction of differentiation. It was initially thought that exposure to RA and serum29 in suspension culture, following LIF withdrawal, was required to generate neurons [38,471], although later reports showed that RA was unnecessary and that neural precursors could be enriched in a serum-free basal medium [367]. During neural differentiation, ES cells are believed to undergo progressive lineage restrictions similar to those observed during normal fetal development, providing a means to isolate distinct neural precursor populations such as NEP cells and radial glia, as well as a platform to study the molecular events involved in the transitions between precursors [400]. Based on the view that neural differentiation of ES cells in vitro recapitulates neural development in vivo, several studies isolated an RC2 immunoreactive radial glia-like cell as the transient neural progenitor involved in the transition from NEP cells to neuronal and glial subtypes (Fig 2.6) [302,399].

29 Animal-derived fluid, most commonly drawn from a bovine fetus, that contains hormones and growth factors that allow cells in culture to proliferate.

Figure 2.6: Diagram to visualise the progressive lineage restriction of ES cells differentiating toward the neural phenotype in neurospheres, showing the transition of NEP cells to RC2 immunoreactive radial glia-like cells. Adapted from Pollard et al 2007 [400].

Likely due to paracrine LIF signaling, many ES cell neural differentiation protocols have the drawback of generating non-homogeneous cultures that include contaminating populations of non-neural cells and residual ES cells [400]. This effect can be overcome by adopting a "lineage selection" strategy, in which a reporter gene or drug resistance gene is expressed as a transgene30 under cell type specific promoter elements, such as the Sox1-GFP reporter construct for the isolation of NEP cells [32]. ES cell-derived cultures engineered to express the Sox1-GFP reporter are initially enriched in Sox1+ NEP cells but quickly differentiate into neurons and glia, because the niche environment recreated in neurospheres mimics in vivo developmental cues [400]. In order to isolate cells capable of undergoing symmetrical stem cell divisions without differentiation, Conti et al [107] devised a niche-independent protocol that uses adherent conditions to ensure homogeneity of the stem cell population, bypassing the formation of neurospheres. The unique characteristic of this protocol is the use of EGF in attached monolayer culture, which had previously been used only in the culturing of neurospheres. The protocol was successfully applied to the production of non-immortalized NS cells in adherent monolayer from mouse and human ES cells, as well as mouse and human fetal brain [107,481], and was later adapted for derivation from adult mouse brain [403], with the most recent optimisation made for drug screening applications from different regions of the fetal human CNS [198]. Other protocols for adherent monolayer culturing of NS cells have been explored by other researchers, but their approaches all differ in that they either generate immortalised cell lines [405], are tailored to derivation from the SGZ [34], or generate non-immortalized cell lines but limit the characterisation to primary cultures without demonstrating long-term stability and tripotent differentiation capacity [376,536].

30 A gene that does not belong to the wild type genome sequence but can be introduced from another organism naturally or by means of genetic engineering.

In the protocol described by Conti et al, prior to initiating differentiation, mouse ES cells are plated at relatively high density and cultured for 24 hours in standard ES cell medium containing LIF. To start monolayer differentiation, undifferentiated ES cells are dissociated and resuspended directly in N2B27 medium, a mixed formulation of basal media and supplements including and lacking LIF. Under this monoculture condition, ES cells lose pluripotent status and predominantly commit to a neural fate [539]. The culture is re-plated after seven days in basal medium with FGF2 alone or FGF2 and EGF, during which time residual undifferentiated ES cells are eliminated because the NS-A component of basal medium does not allow for ES cell propagation. However, since in this medium neural precursors associated into floating clusters, a lineage selection strategy was adopted (Fig 2.7).
Thus, after neural commitment was induced in monolayer in Sox1-GFP reporter ES cell lines, the neural precursors were maintained adherent in N2B27 medium, and the undifferentiated ES cells and non-neural differentiation products were eliminated via addition of puromycin. In these reporter cell lines, Sox1 is linked via an internal ribosome entry site (IRES) to a gene conferring puromycin resistance (Fig 2.8), so that the transition to NEP cells is marked by GFP fluorescence and the selection of these cells can be completed via the addition of puromycin. The subsequent addition of FGF2 and EGF leads to the outgrowth of a population of bipolar cells called LC1, during which process the ES cell-derived NEP cells gradually extinguish GFP expression and acquire the NS cell phenotype: they stop expressing the pluripotency markers Nanog and Oct-4 as well as Sox1, gain expression of Nestin, BLBP and Sox2 together with RC2 immunoreactivity, and lack expression of GFAP. These are the same markers expressed by the NS cell lines obtained from mouse fetal and adult tissues using the same protocol. To establish the presence of clonogenic NS cells, single cells were isolated from the LC1 culture and expanded as adherent cultures, which showed the same morphology, growth characteristics and markers as the LC1 population. Upon withdrawal of EGF and FGF2 and exposure to serum or BMP4, these NS cells differentiate into astrocytes. In contrast, removal of EGF followed by FGF2 gives rise to cells with immunochemical and electrophysiological properties of mature neurons. Importantly, even after prolonged expansion, NS cells maintain their potential to differentiate efficiently into neurons and astrocytes in vitro as well as upon transplantation into the adult brain [107]. To promote oligodendroglial differentiation, cells were cultured on laminin coated dishes in medium containing N2 supplement plus FGF2, PDGF and forskolin, a growth factor combination known to enhance oligodendrocyte progenitor proliferation, which efficiently differentiated NS cells into oligodendrocytes [165].

Figure 2.7: Protocol describing conversion of ES cells into immortalised NS cell lines. (a) Neural precursor differentiation is induced from ES cells in serum-free adherent monoculture by withdrawing LIF and detected via the Sox1-GFP reporter transgene. Sox1 is one of the earliest neural differentiation markers. (b) After seven days cells are re-plated in basal medium in the presence of either FGF2 alone or FGF2 and EGF. (c) Residual undifferentiated ES cells are eliminated from the culture through puromycin selection. (d) Neural precursors lose GFP expression as they become Sox1- and are re-plated in fresh medium in the presence of EGF and FGF2. They also attach and outgrow a population of bipolar cells, termed LC1. (e) Clonogenic NS cell lines are generated by plating single cells from the LC1 population. Adapted from Conti et al 2005 [107].

Figure 2.8: Representation of the Sox1-GFP reporter construct used in the niche-independent NS cell protocol.

To summarise the characterisation of the mouse NS cell lines derived by Conti et al:

· they express the astrocyte differentiation marker GFAP upon addition of serum or BMP4 and differentiate into astrocytes;

· they express neuronal markers type III β-tubulin (TUBB3), microtubule-associated protein 2 (MAP2) and ERBB2 upon removal of EGF and FGF2 (in this order) and differentiate into neurons. All NS cell lines produced in the study by Conti et al were electrophysiologically active and exhibited voltage-gated Na+ and Ca2+ conductance, typical of maturing nerve cells;

· they show no significant decline after many passages and retain diploid chromosome content throughout late passages, maintaining intact the differentiation potential in both the neuronal and glial direction;

· they do not form teratomas31, an important step towards the confirmation of the identity of NS cells. Unlike ES cells, in fact, NS cells are incapable of teratoma formation, and this observation was reproduced in an experiment by Conti et al [107] in which NS cells were transplanted into mouse fetal and adult brain and grafted onto mouse kidney, where they did not proliferate or give rise to teratomas. The absence of any histological evidence of unregulated proliferation or tumor formation was a clear confirmation of the identity of the NS cells.

Although the NS cell lines generated via the niche-independent NS cell protocol are capable of differentiating into all three lineages of the CNS, i.e. astrocytes, oligodendrocytes and neurons, the neuronal output is limited to the generation of large amounts of GABAergic neurons32 according to the results by Conti et al [107]. This differs vastly from the neuronal subtypes identified upon direct differentiation of ES cells with no intermediate expansion of the neural progenitors, which are skewed towards the generation of large amounts of glutamatergic neurons33 [55]. This discrepancy suggests that in vitro expansion of NS cells may restrict the neuronal subtype differentiation capacity of these cells, and further studies will have to address whether cell culture conditions can be altered for long-term expanded NS cells so that regional identities can be re-established [400].

31 Encapsulated germ cell tumor derived from pluripotent cells, with tissue or organ components resembling normal derivatives of all three germ layers.

32 Neurons that release the main CNS inhibitory neurotransmitter GABA.

Previous studies have shown the derivation of glial restricted progenitors in adherent culture using FGF2 for survival and expansion of the cells [76,212,276,367]. However, these cultures change their properties over time and should not be considered equivalent to expanded long-term stem cell lines [403]. In the NS cell derivation from mouse ES cells described by Conti et al, the absence of EGF causes caspase 3-led apoptosis and an immature neuronal phenotype. This phenomenon was found to be avoidable by culturing NS cells on laminin in the absence of EGF, although this directed NS cells towards neuroblast34 commitment, with differentiation upon mitogen withdrawal [290]. The NS cells obtained in the protocol by Conti et al exhibit phenotypic similarities to radial glia in that they show no expression of neuronal or astrocyte antigens, but uniform expression of the neural precursor markers Nestin, RC2, Vimentin, 3CB2, Lex1, Paired box gene 6 (Pax6) and Prominin (see Fig 2.2). In addition to this set of markers considered diagnostic for neurogenic radial glia, they express the neural precursor markers Sox2, Sox3 and Emx2, and the transcription factors Olig2 and Mash1. The absence of Sox1 and maintenance of Sox2 in NS cells is noteworthy, since the former marks all early neuroectodermal precursors and its absence in stem cells might indicate that Sox2 is playing the key role. In a study by Gomez-Lopez et al [169], the roles of Sox2 and Pax6 were investigated in mouse fetal forebrain-derived NS cells, showing that conditional deletion of either gene reduced the clonogenicity of these cells in a gene dosage-dependent manner. Cells heterozygous for either gene displayed moderate proliferative defects, but in the complete absence of Sox2, cells exited the cell cycle with concomitant down-regulation of the neural progenitor markers Nestin and Blbp, and ablation of Pax6 also caused major proliferative defects. However, a subpopulation of cells was able to expand continuously without Pax6, retaining the progenitor markers but displaying an altered capacity to differentiate into astrocytes and oligodendrocytes, highlighting the role of Pax6 beyond neurogenic competence. The findings of this study therefore suggest that Sox2 and Pax6 are both critical for self-renewal of differentiation-competent radial glia [169].

Time-lapse videomicroscopy of the NS cells derived by Conti et al [107] also demonstrated that the nuclei of these cells undergo interkinetic nuclear migration, a well-characterised feature of NEP and radial glia cells in vivo. Importantly, the mouse fetal brain, human ES cell and human fetal cortex-derived NS cells expressed the same radial glia and neurogenic markers as the mouse ES cell-derived NS cells, although the human cells exhibited moderate levels of GFAP, consistent with the known activity of the human GFAP promoter in radial glia and unlike the very feeble expression observed in mouse NS cells [310,418]. Human ES cell and fetal cortex-derived NS cells also proliferate more slowly than the mouse-derived ones, as in neurospheres, and after sequential withdrawal of EGF and FGF2 generate mixed populations of TuJ1+ neuron-like cells and GFAP+ cells, with pure populations of cells with typical astrocyte morphology and intense GFAP immunoreactivity readily produced after exposure to serum [107].

33 Neurons that release the main CNS excitatory neurotransmitter glutamate.

34 Neuroblasts are the descendants of NS cells in the SVZ that migrate into damaged brain areas after strokes or other brain injuries to generate regionally appropriate new neurons [290].

In a separate study, Sun et al [481] report the derivation and characterisation of human NS cell lines from human fetal cortex and spinal cord using a continuous adherent procedure that is more efficient than allowing primary cells to form neurospheres and subsequently isolating NS cells, as described in the protocol by Conti et al. In the protocol developed by Sun et al, primary cells are seeded onto laminin coated dishes in growth medium containing both EGF and FGF2. In these conditions cells readily attach and produce a morphologically heterogeneous population containing both Nestin+ neural precursors and Tuj1+ neurons. In order to enrich for undifferentiated neural precursors, the cells are temporarily transferred onto gelatin coated dishes, conditions in which neurons and committed neuronal progenitors fail to survive. Three weeks after initial plating, the primary human culture is homogeneously Nestin+ and Tuj1- [481].

Once established, human NS cells can be expanded continuously in monolayer culture, where they homogeneously express the immature neural precursor markers Nestin and Sox2. The long-term expansion of these cells can also occur successfully in EGF alone, confirming previous findings with mouse NS cells [403], where addition of FGF2 is essential for initial derivation but can be dispensed with during subsequent propagation. This suggests that EGF is the major mitogen for NS cell self-renewal, although a contribution of autocrine FGF2 is not excluded. However, neither human nor mouse primary cells produce stable cell lines unless they are exposed to FGF2 during the first 2-4 weeks after plating, suggesting that FGF2 may induce EGF responsiveness in NS cells. Furthermore, when human NS cells expanded in EGF only are exposed to the differentiation conditions described above, they are able to generate both neurons and glial cells [481]. Immunostaining of the derived human fetal NS cells showed the expression of markers that are hallmarks of radial glial cells, including BLBP, 3CB2, GLAST, Vimentin and GFAP, as well as the neural progenitor markers Nestin, Sox2, Pax6, Olig2 and CD133. Sox1 is transiently expressed in mouse and human neural precursor cells, but is not maintained in human fetal NS cells [107,302,481]. It takes, on average, one month to derive an adherent and morphologically homogeneous human NS cell population with a total number of approximately 2 million cells. At the time the research article by Sun et al was published, the group had successfully derived five human brain NS cell lines: CB192, CB516, CB525, CB541 and CB660 [481]. The differentiation potential of these human NS cells was assessed using protocols previously developed for mouse NS cells by Conti et al [107] and Glaser et al [165]:

· Neuronal differentiation was triggered by removing first EGF and then FGF2 from the growth medium. By the end of the fourth week of neuronal differentiation, many cells became Tuj1+ and exhibited thin elongated processes.

· To generate oligodendrocytes, cells were treated with basal medium supplemented with insulin-containing N2, forskolin, FGF2, and PDGF. From the third week, the supplements were changed to N2, PDGF, T3, and ascorbic acid, with subsequent withdrawal of PDGF inducing the appearance by the fifth week of 1-2% Olig-4+ cells bearing branched oligodendrocyte morphology.

· In the absence of EGF and FGF2 and presence of BMP4 or serum, a morphologically homogeneous astrocyte population was derived. The sole removal of EGF and FGF2 without addition of BMP or serum also led to NS cell differentiation into astrocytes, but with significant cell death ensuing. Recently it was reported that different types of human astrocytes may express distinct GFAP isoforms: adult SVZ astroglial cells express GFAPδ, while most other astrocytes, including the astrocytes derived from the human NS cells developed in the protocol by Sun et al, express GFAPα [432].
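The three differentiation regimes listed above are essentially sequences of media compositions, so they can be written down as data and compared step by step. The Python sketch below is a hypothetical encoding made for illustration, not a protocol specification: the names GROWTH_MEDIUM, PROTOCOLS and media_changes are invented here, the starting medium is reduced to its two mitogens, "BMP4 or serum" is treated as a single label, and timings and concentrations are deliberately omitted.

```python
# Mitogens present during NS cell expansion (base medium components are omitted).
GROWTH_MEDIUM = frozenset({"EGF", "FGF2"})

# Successive supplement sets for each lineage, as read from the list above.
# This is a simplified, hypothetical encoding for illustration.
PROTOCOLS = {
    "neuron": [
        frozenset({"FGF2"}),   # EGF removed first
        frozenset(),           # FGF2 removed next
    ],
    "oligodendrocyte": [
        frozenset({"N2", "forskolin", "FGF2", "PDGF"}),
        frozenset({"N2", "PDGF", "T3", "ascorbic acid"}),  # from the third week
        frozenset({"N2", "T3", "ascorbic acid"}),          # PDGF withdrawn
    ],
    "astrocyte": [
        frozenset({"BMP4 or serum"}),  # EGF and FGF2 absent
    ],
}


def media_changes(start, steps):
    """Return, for each step, the factors (added, removed) relative to the
    previous medium, each as an alphabetically sorted list."""
    changes = []
    current = start
    for step in steps:
        changes.append((sorted(step - current), sorted(current - step)))
        current = step
    return changes


if __name__ == "__main__":
    for lineage, steps in PROTOCOLS.items():
        print(lineage)
        for added, removed in media_changes(GROWTH_MEDIUM, steps):
            print(f"  add {added or 'nothing'}, remove {removed or 'nothing'}")
```

Laying the steps out this way makes the ordering constraint explicit: the neuronal regime differs from the astrocytic one mainly in that the two mitogens are withdrawn sequentially rather than together.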

In the protocol described by Sun et al [481], during the first four weeks after plating, primary human cells attach to the laminin substrate within 24 hours and start to proliferate in the presence of both EGF and FGF2. No extended cell proliferation is observed in medium containing only one of the two growth factors, so NS cell lines cannot be established under those conditions. In addition to EGF and FGF2, the laminin substrate is important for efficient human NS cell derivation, since primary cells grown on gelatin coated or uncoated dishes easily detach and tend to form neurospheres, resulting in slower proliferation, as described in the study by Conti et al [107]. The laminin substrate was also found to be optimal for human NS cell propagation, indicating that laminin may play important roles in regulating neural cell behaviour [481]. Finally, although the human fetal-derived NS cells exhibited some features of radial glia, the artificial nature of culture environments may result in unique cell populations in vitro, which may indicate, in turn, that NS cells do not have direct in vivo counterparts. In fact, as a further consideration, the combination of transcription factor expression in mouse NS cells is not routinely observed during normal development [403,481].

In order to shed light on the nature of NS cells and their relationship to endogenous cell types, Pollard et al [403] investigated whether cells capable of giving rise to NS cell cultures are restricted to developmental stages or may also be present in the adult mouse brain. Although both FGF2 and EGF were necessary for the derivation in culture of NS cells from adult mouse forebrain (containing the adult SVZ), once established, the stem cell lines could be maintained in added EGF alone. As already seen in mouse ES cell-derived NS cell cultures, the absence of EGF causes the majority of the NS cells to die of caspase 3-activated apoptosis and the rest to start differentiating towards the neuronal lineage, although never reaching the fully mature phenotype (Fig 2.9). In contrast, withdrawal of FGF2 did not result in any striking change in NS cell morphology or behaviour, with the exception of a slightly lower doubling time for EGF-only grown colonies, possibly due to higher cell death, although more detailed analyses are required [403]. For complete differentiation to neurons, NS cells had to be plated on laminin-coated plates in basal medium, withdrawing first EGF and then FGF2, since so long as EGF and FGF2 are supplied together no cell differentiation can occur and cells continue to self-renew. For differentiation to astrocytes, NS cells were exposed to 1% serum without EGF and FGF2, or exposed to BMP4 and LIF [399]. For differentiation to oligodendrocytes, the culture conditions involved re-culturing on laminin coated dishes in medium containing FGF2, PDGF and forskolin first, and adding T3 and ascorbic acid later, a procedure known to promote the differentiation and survival of oligodendrocytes [165]. Under these conditions, NS cells efficiently differentiated into oligodendrocytes, astrocytes and neurons, with similar outcomes seen for NS cells derived from fetal mouse brain, altogether suggesting that oligodendrocyte progenitor proliferation and differentiation are preserved in adult mouse-derived NS cells. Importantly, the capacity for oligodendroglial differentiation is maintained after transplantation. Therefore, the differentiation spectrum of these NS cells is not restricted to neurons and astrocytes but also extends to oligodendrocytes [165,399].

Figure 2.9: Roles of EGF and FGF2 in the derivation and maintenance of NS cells. Fetal forebrain progenitors or ES cell-derived neural progenitors (green) can be converted into NS cell lines (yellow) using a combination of EGF with FGF2. Once established, NS cells can be maintained in added EGF alone, whereas in FGF2 alone, they undergo differentiation (blue) and apoptosis. Adapted from Pollard et al 2006 [403] and Conti et al 2005 [107].


In light of their successful engraftment into adult mouse brain [107], the NS cells derived from ES cells or from primary CNS tissue (fetal and adult) show potential for the delivery of cell replacement and gene therapies. Furthermore, since homogeneous expansion of any stem cell in defined conditions has until now been a prerogative of ES cells, these NS cells provide an accessible system for characterisation, manipulation and analysis of the stem cell state, as well as a resource for direct comparison with ES cells [107]. A summary of commonalities and differences of lineage-restricted and pluripotent stem cells is shown in table 2.1.

Table 2.1: Summary of commonalities and differences between ES cells and NS cells. Adapted from Pollard et al 2006 [399]

Features | ES cells | NS cells
Species | Rodent and | Rodent and primate
Source | Blastocyst | ES cells, germinal areas of fetal brain, SVZ of adult brain
Growth factor dependence | LIF+BMP (serum free) | EGF+FGF2 (serum free)
Expansion in vitro | Immortal | Immortal
Clonogenic | Yes | Yes
Doubling time | 12 hours | 24 hours
Stem cell divisions | Symmetrical | Symmetrical
Karyotype | Stable diploid | Stable diploid
Niche dependence | None | None
In vivo counterpart | Similarities to ICM | Similarities to radial glia
Potency | Pluripotent | Multipotent
Genetic manipulation | Yes | Yes
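To give a feel for what the doubling times in Table 2.1 imply in practice, the short Python sketch below computes the theoretical fold expansion of a culture over a given period. It is an idealised illustration: the function name fold_expansion is hypothetical, and the calculation assumes uninterrupted exponential growth with no lag phase, cell death or passaging losses.

```python
def fold_expansion(hours: float, doubling_time_hours: float) -> float:
    """Theoretical fold increase in cell number after the given time,
    assuming ideal exponential growth (no lag phase, no cell death)."""
    if doubling_time_hours <= 0:
        raise ValueError("doubling_time_hours must be positive")
    return 2.0 ** (hours / doubling_time_hours)


if __name__ == "__main__":
    week = 7 * 24  # hours in one week
    # Doubling times taken from Table 2.1.
    for name, td in (("ES cells", 12), ("NS cells", 24)):
        print(f"{name}: {fold_expansion(week, td):,.0f}-fold in one week")
```

Over one week the 12 hour doubling time corresponds to 14 doublings and the 24 hour doubling time to 7, so the idealised difference in yield between the two cell types is 2^7, i.e. 128-fold.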

Neurosphere versus adherent culture method. NS cells offer a potential source for cell and tissue replacement therapy and for drug discovery, but no specific markers have been identified so far that distinguish them from their more differentiated progenitor cells. In a recent study by Sun et al [479], both NS cells and their progenitor cells were therefore isolated from embryonic or adult brain and spinal cord, and maintained in suspension culture as neurospheres as well as in adherent substrate-bound culture. The side-by-side comparison of these two culture systems was motivated by the need to understand how cell survival, proliferation, differentiation and passaging perform in each system for potential use in drug discovery and cell therapy. For example, the neurosphere culture method has been used extensively to study molecular and cellular mechanisms that control neurogenesis, differentiation and cancer proliferation [116,261], but a recent study showed that long-term neurosphere cultures exhibit altered differentiation and self-renewal capacities, as well as chromosomal instability [518]. Similarly, an earlier study by Pollard et al [401] had established that caution should also be exercised when extrapolating findings from in vitro adherent monolayer NS cell culture to in vivo development, because FGF2 induces a subset of cell surface markers that are not found in vivo. In that study, microarray-based expression profiling was used to identify a set of markers expressed by fetal mouse NS cells but not ES cells, and the cell surface protein CD44 was found to be differentially expressed, with higher expression levels in fetal mouse-derived NS cells. Although CD44 was expressed homogeneously by all NS cell lines derived in that study, appreciable numbers of CD44+ cells could not be found in the developing brain during neurogenic stages, nor in differentiating ES cell cultures. Moreover, CD44 expression was found to be induced by FGF2 in a subset of cells in primary culture, and this effect did not appear to be restricted to CD44, with many other NS cell markers found to be activated in vitro, including Adam12, Cadherin20, Cx3cl1, EGFR, Frizzled9, Kitl, Olig1, Olig2 and Vav3. Therefore, it was speculated that the self-renewing NS cell state may be generated in vitro following transcriptional resetting induced by FGF2, and that FGF2 does not act simply to maintain and expand pre-existing stem cells but also imparts significant changes to the transcriptional and cellular phenotype [401].

Since the ability of NS cells to self-renew and differentiate into the three major neural cell lineages makes them ideal candidates to treat impaired cells and tissues in neurodegenerative diseases, spinal cord injuries and stroke [544], methods that can scale up their production need to be evaluated accurately. Thus, the study by Sun et al [479] aimed at characterising the proliferative, differentiation and passaging capacities of the two methods of choice for culturing NS cells, i.e. the neurosphere and adherent monolayer culture methods. The starting material for this study was dissociated from E14 rat cortical neuroepithelium, early enough in the development of the CNS that proliferating NEP cells and NS cells coexist with their neuronal, neuroglial, and glial progenitors, as well as newly post-mitotic cells (Fig 3.1). These cells were expanded in serum-free media in the presence of the mitogenic growth factors FGF2 and EGF as described by several previous studies [107,165,403,481]. Sun et al [479] showed that NS cells and their progenitors grew significantly faster on adherent substrate than in neurospheres during the first four passages, but that their growth rate slowed and plateaued after the sixth or seventh passage, whereas in neurospheres growth kept increasing slowly but steadily for more than 10 passages.

Immunocytochemical analysis using multiple lineage-specific antibodies in early primary cultures for both neurosphere and adherent methods showed about 60% MAP2+ neurons and about 40% Nestin+ progenitors, indicating that both cultures were initially composed of neural progenitor and mature neuronal populations. However, proliferating neural progenitors became dominant in adherent culture over the next few days, reaching a Nestin+ progenitor percentage of 93.76% on the ninth day and clearly identifying the type of cells that eventually compose the culture [479]. To test the ability of cells from both culture systems to differentiate into neuronal and glial cells, growth factors were withdrawn for seven days and differentiation was assessed using antibodies against Tuj1, GFAP and Olig-4 for the identification of neurons, astrocytes and oligodendrocytes, respectively. The data gathered from passages 1, 3 and 5 revealed no significant differences in the differentiation potential of cells cultured with either method [479]. Self-renewal capacity was assessed in the two culture systems through a clonogenic assay that evaluated the capacity of primary neurospheres to form secondary neurospheres, as well as the capacity of single cells plated from the original adherent culture to form clones. The clone forming rates at passage 1 were found to be 63% for neurospheres and 44% for adherent culture; the rate for neurospheres was significantly higher than the 20% reported by Reynolds et al [525]. Interestingly, as discussed in the study by Marten et al [319], the fetal mouse forebrain germinal zone seems to contain two differently responding self-renewing cells, which in vitro grow neurospheres with greater diameters in the presence of EGF alone than in the presence of FGF.
Since both EGF and FGF were added in the culture systems developed by Sun et al, the two differently responsive stem cells might have been produced together, increasing the clone forming rates observed [479]. Finally, neurosphere cultures were found to have a greater passage potential than adherent cultures, possibly due to the three-dimensional reproduction of the niche environment experienced by the cells in vivo. In fact, the regulatory signals from a niche are provided by soluble factors and the ECM, and neurospheres are spheroid structures that consist of cells producing their own ECM molecules, such as laminins, fibronectin and chondroitin sulphate proteoglycans, as well as growth factors, beta integrins, epidermal growth factor receptors and cadherins. Therefore, the cell-cell and cell-matrix interactions in the three-dimensional structure of neurospheres can create an environment that is more physiologically relevant than the two-dimensional one in adherent culture systems. Even so, the robustness of cell proliferation in adherent cultures is probably due to the fibronectin and laminin substrates: fibronectin is a ubiquitous component of various types of ECM, and laminin was reported as one of five different substrates regulating neural differentiation of human ES cells, specifically stimulating their expansion in a dose-dependent manner [303]. In light of the increasing evidence that astrocytes [210] and endothelial cells [354,452] might also be important components of a niche for NS cells in addition to the ECM, the adherent culture system needs to be optimised in this regard.

Chapter 3

Brain Cancer Stem Cells

Contents
3.1 The Cancer Stem Cell Hypothesis
3.2 Brain Cancer Stem Cells
3.3 Glioma Culture Systems

3.1 The Cancer Stem Cell Hypothesis

The cancer stem cell hypothesis proposes that the organisation of cell lineages in tumours is hierarchical and that only a subpopulation of cells, termed "cancer stem cells", is responsible for tumour expansion. According to this hypothesis, stem cells, or cells that have acquired the ability to self-renew, accumulate genetic changes over long periods of time, escape from the control of their environment, and give rise to cancerous growth. One of the postulates of the cancer stem cell hypothesis is that a population of cells with stem cell-like features exists in tumours and that this population gives rise to the bulk of the tumour cells with more differentiated phenotypes [123,457]. Two types of cells could fulfill the role of tumour-initiating cell: adult stem cells, or their progenitors, which normally undergo a limited number of cell divisions but may aberrantly acquire the capacity to self-renew by accumulating genetic lesions, thereby becoming the long-lived target (Fig 3.1). Regardless of the cell of origin, the cancer stem cell is defined by its stem cell-like properties [486].

Figure 3.1: Stem cell differentiation hierarchy, highlighting a possible increase in plasticity that may be present within cancer populations. Such plasticity would enable bidirectional interconversion between cancer stem cells and non-cancer stem cells. Adapted from Gupta et al 2009 [182].

Normal adult stem cells are attractive candidates because they are tissue-specific, they can self-renew and, finally, they can differentiate into all cell types of the tissue of origin. Each division produces at least one daughter cell that maintains the indefinite capacity for cell division and a progenitor cell that has finite division capacity, ultimately differentiating into the mature cell types that constitute specific tissues. This type of cell division is referred to as "asymmetrical division" or "asymmetrical self-renewal" and is specifically characterised by adult stem cell divisions that produce a new adult stem cell together with a non-stem cell sister that becomes the progenitor for short-lived, differentiating, functional cells that in most cases mature to a terminal division arrest [420]. For the vast majority of adult tissues, it remains unclear how stem cells succeed in maintaining a precise balance between proliferation and differentiation in steady-state. To explain their long-term viability, it has been argued that tissue stem cells are maintained in a long-lived quiescent state, with most divisions supported by differentiating progenitor cells that ultimately exit the cell cycle and are replaced by stem cell progeny, which provides a mechanism for protecting stem cells from damage and loss throughout adult life (Fig 3.2). Clear evidence for asymmetric stem cell divisions is found in the invertebrates C. elegans and D. melanogaster, although recent lineage-tracing studies in mammals have shown that stem cells behave as an equipotent population, in which the balance between proliferation and differentiation is achieved through frequent and stochastic stem cell loss and replacement [240].
While the asymmetrical self-renewing properties of adult stem cells are particularly important for homeostatic control in adult tissues that undergo continuous cellular turnover such as epithelium and blood [457,486], the non-stem cells of most adult tissues go through a rapid cycle in which they are born, mature, expire, and are removed from the tissue by apoptosis. The fact that the rate of adult cellular turnover is much faster than that of tumour development is the basis for the hypothesis that non-stem cells cannot effectively initiate cancers, a concept that contrasts with the commonly held idea that cancers may arise from any tissue cell with equal likelihood [420]. Although adult stem cells are attractive candidates to fulfill the role of tumour-initiating cells, two of their properties are also considered limitations to that end: firstly, asymmetric division potentially limits the number of stem cells and therefore the incidence with which they could drive tumourigenesis, and secondly, the immortal DNA strand co-segregation^35 process reduces the rate at which they can accumulate mutations. Through DNA strand co-segregation, adult stem cells reduce their mutation rate by more than 1000-fold, avoiding all mutations that arise from replication errors that are not properly repaired [420].

Figure 3.2: (a) Asymmetric cell division, in which each stem cell (orange) generates one daughter stem cell and one daughter destined to differentiate (green). (b-c) Population strategies that provide dynamic control over the balance between stem cells and differentiated cells, a capacity that is necessary for repair after injury or disease. In this scheme, stem cells are defined by their "potential" to generate both stem cells and differentiated daughters, rather than their actual production of a stem cell and a differentiated cell at each division. (b) Symmetric cell division: each stem cell can divide symmetrically to generate either two daughter stem cells or two differentiated cells. (c) Combination of cell divisions: each stem cell can divide either symmetrically or asymmetrically. Adapted from Morrison et al 2006 [345].
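The difference between the division strategies in Figure 3.2 can be made concrete with a small simulation. This is a minimal sketch under simplifying assumptions that are not taken from the cited studies: every stem cell divides exactly once per generation, division outcomes are independent, and differentiated daughters are simply not tracked. The function name `simulate_stem_pool` and its parameters are hypothetical.

```python
import random


def simulate_stem_pool(n_stem, generations, p_sym_renew, p_sym_diff, seed=0):
    """Return the stem-cell count per generation, starting with `n_stem` cells.

    Each stem cell divides once per generation in one of three ways:
      - symmetric renewal (two stem daughters), with probability p_sym_renew
      - symmetric differentiation (no stem daughters), with probability p_sym_diff
      - asymmetric division (one stem daughter), otherwise
    """
    if p_sym_renew < 0 or p_sym_diff < 0 or p_sym_renew + p_sym_diff > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    history = [n_stem]
    for _ in range(generations):
        next_stem = 0
        for _ in range(n_stem):
            r = rng.random()
            if r < p_sym_renew:
                next_stem += 2  # both daughters remain stem cells
            elif r < p_sym_renew + p_sym_diff:
                next_stem += 0  # both daughters differentiate
            else:
                next_stem += 1  # one stem daughter, one differentiating daughter
        n_stem = next_stem
        history.append(n_stem)
    return history


if __name__ == "__main__":
    # (a) strictly asymmetric divisions: the pool size cannot change
    print("asymmetric only:", simulate_stem_pool(100, 10, 0.0, 0.0))
    # (b) balanced symmetric divisions: the pool is stable only on average
    print("symmetric only: ", simulate_stem_pool(100, 10, 0.5, 0.5))
    # (c) mixture of symmetric and asymmetric divisions
    print("mixed:          ", simulate_stem_pool(100, 10, 0.1, 0.1))
```

With strictly asymmetric divisions the stem cell number stays fixed, which illustrates why this mode limits the pool of potential tumour-initiating cells, whereas the population strategies let the pool drift, expand or contract when the probabilities are shifted.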

^35 Non-random segregation, at each cell division, of the set of chromosomes carrying the oldest DNA template strands.

Since all stem cells must self-renew and regulate the balance between self-renewal and differentiation, and cancer can be considered a disease of unregulated self-renewal, understanding the regulation of normal stem cell self-renewal is fundamental to understanding the regulation of cancer cell proliferation [458]. By maintaining at least some of the properties of their tissue of origin, cancer stem cells give rise to tumours that phenotypically resemble their origin, either by morphology or by expression of tissue-specific genes. However, what distinguishes cancerous tissue from normal tissue is the loss of homeostatic mechanisms that maintain normal cell numbers, and much of this regulation normally occurs at the stem cell level. The cancer stem cell hypothesis raises the important experimental implication that, if a population of biologically unique cancer stem cells exists, then tumour cells lacking stem cell properties will not be able to initiate self-propagating tumours, regardless of their differentiation status or proliferative capacity, which has an impact on the experimental definition of a cancer stem cell. Furthermore, the hypothesis raises the clinical implication that curative therapy will require complete elimination of the cancer stem cell population, since patients who show an initial response to treatment may ultimately relapse if even a small number of cancer stem cells survive. On the other hand, targeted therapies that eliminate the cancer stem cell population offer the potential for a cure [486].

The concept of the cancer stem cell initially arose from the observation that cancer tissues resemble developing tissues and that self-renewal mechanisms are common to cancer cells and stem cells. The definitive demonstration of cancer stem cells in human neoplasia was first made in 1994 in leukemia [122], a non-solid tumour found to harbour a stem cell hierarchy in which a minority of cells within the leukemic population possessed an extensive proliferation and self-renewal capacity not found in the rest of the leukemic cells. The putative cancer stem cells in leukemia were isolated and characterised on the basis of their phenotypical similarities to normal hematopoietic stem cells. The principle of uncovering significant similarities between putative cancer stem cells and normal stem cells was then extended to solid tumours, first in breast cancer and then in brain cancer, although normal stem cells, their differentiation hierarchy and the markers that identify them are not well characterized in most solid organs. Since then, a connection between stem cells and cancer has been found in other tissues, including mammary gland [15,288,289], gut [177,395,414], skin [75], bladder [89] and prostate [103]. So far, cancer stem cells have been defined on the basis of their ability to seed tumours in hosts, to self-renew and to generate differentiated non-cancer stem cell progeny. Accordingly, the number of cancer stem cells within a population of cancer cells can be measured by the number of cells that are required, at limiting dilutions, to seed new tumours. The pioneering studies in leukemia and later solid tumours showed that it is possible to use cell surface marker profiles to isolate cancer cell subpopulations that are enriched for or depleted of cancer stem cells. Subsequent reports showed that, after implantation in vivo, cancer stem cell-enriched populations generate tumours that are no longer enriched for cancer stem cells, implying that the cancer cells within a single tumour are naturally found in multiple states of differentiation with distinct tumour-seeding properties. The resulting possibility of bidirectional interconversion between cancer stem cell and non-cancer stem cell populations does not undermine the cancer stem cell hypothesis, since the two populations always retain their distinct identities and can be distinguished phenotypically and functionally at any moment (Fig 3.1) [182].
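The idea of measuring cancer stem cell numbers at limiting dilution can be sketched numerically. The example below is an illustration rather than the method of any cited study: it assumes the standard single-hit Poisson model, in which a recipient injected with d cells remains tumour-free with probability exp(-f d), where f is the frequency of tumour-seeding cells. The function names, search bounds and dose groups are hypothetical and chosen only for demonstration.

```python
import math


def log_likelihood(freq, groups):
    """Log-likelihood of `freq` under the single-hit Poisson model.

    `groups` is an iterable of (cells_injected, n_recipients, n_with_tumour).
    """
    total = 0.0
    for dose, n, k in groups:
        p_tumour = -math.expm1(-freq * dose)  # 1 - exp(-freq * dose)
        if k > 0:
            total += k * math.log(p_tumour)
        total -= (n - k) * freq * dose  # log(exp(-freq * dose)) per tumour-free recipient
    return total


def estimate_frequency(groups, lo=1e-9, hi=1.0, iterations=200):
    """Maximum-likelihood estimate of the tumour-seeding cell frequency.

    Uses ternary search over log-frequency within the assumed bounds [lo, hi];
    the log-likelihood is concave in log-frequency for this model.
    """
    a, b = math.log(lo), math.log(hi)
    for _ in range(iterations):
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if log_likelihood(math.exp(m1), groups) < log_likelihood(math.exp(m2), groups):
            a = m1
        else:
            b = m2
    return math.exp((a + b) / 2)


if __name__ == "__main__":
    # Hypothetical xenograft data: (cells injected, recipients, recipients with tumour)
    demo_groups = [(10, 6, 1), (100, 6, 3), (1000, 6, 5)]
    f = estimate_frequency(demo_groups)
    print(f"Estimated frequency: about 1 in {1 / f:.0f} cells")
```

For a single dose group the estimate reduces to f = -ln(fraction of tumour-free recipients) / dose, so a dose at which half of the recipients stay tumour-free corresponds to roughly 0.69 tumour-seeding cells per inoculum.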

One of the emerging caveats of the cancer stem cell hypothesis is the actual frequency of cancer stem cells. In normal tissues, somatic stem cells are inherently rare, and most of this type of data in cancer is derived from xenograft experiments, in which the frequency of human cancer stem cells is determined upon transplantation into a mouse environment. Differences between human and mouse stromal and support cells and cytokines, as well as distinct levels of immunological function in different immunodeficient recipient mouse strains, have led to conflicting frequencies of cancer stem cells ranging between 1% and 25%. This discrepancy has highlighted that the stem cell properties of cancer stem cells are inherent but might also be the result of interaction with the environmental milieu [393]. Thus, as suggested by recent findings, the number of cancer stem cells in a tumour may be a function of the cell type of origin, the stromal^36 microenvironment, accumulated somatic mutations and the stage of malignant progression reached by the tumour. In fact, an early report indicated that the proportion of leukemia stem cells varies up to 500-fold between patient samples. More recent reports have suggested that tumours that are relatively undifferentiated at the histopathological level may contain higher proportions of cancer stem cells than their more differentiated counterparts. Furthermore, the number of cancer stem cells may differ between tumour subtypes that arise from a single tissue, indicating that cancer stem cells may be as numerous as non-cancer stem cells in certain subtypes. However, the cancer stem cell hypothesis can be adapted to state that cancer cells can exist in at least two alternative phenotypic states with different tumour-seeding potentials, without imposing requirements on the number of cancer stem and non-stem cells needed for different tumour subtypes or developmental stages [182]. In vitro cultures in unattached conditions promote the growth of sphere-like cell aggregates that are routinely used for enrichment and propagation of stem cells. When tumours such as glioblastoma were cultured this way, only CD133+ cells, and not CD133- cells, were found to successfully grow these spheres, renamed "tumour spheres", which expressed neuronal stem cell markers, showed highly tumourigenic potential and were resistant to radiation. Despite the success of suspension cultures in enriching for stem cell-like cells in glioblastoma and other tumour types found to behave similarly, it is unknown how stem cells in this in vitro simulation find the correct niche to support the properties of normal tissue stem cells, such as self-renewal, multipotency, proliferation, and differentiation. Therefore, radiation and drug sensitivity assays performed in sphere cultures have to be carefully designed and interpreted, and comparing cells growing in adherent and suspension cultures may yield differences irrespective of cellular differentiation status [457].

^36 Connective tissue supporting the parenchymal cells of an organ.

A major paradox of the cancer stem cell model lies in the multi-step tumour progression model, in which one precursor population of pre-malignant cells evolves via mutation into a successor population that has a phenotypic advantage, such as an increased resistance to apoptosis or growth-inhibitory signals. The conventional depiction of the cancer stem cell model would state that the only cells within the precursor population that are qualified to evolve into a successor population are its stem cells, since only these cells are endowed with the self-renewal capabilities that are required to generate unlimited numbers of progeny [182]. This view, however, ignores the inherent properties of malignant cells, i.e. genomic instability and the ability to undergo rapid evolutionary changes [457]. Also, if the percentage of stem cells in the precursor cell population is small, then according to the cancer stem cell model the number of cells that can serve as targets for genetic evolution is just as small. This postulation is difficult to reconcile with observation, since the mutation rate required to complete cancer formation would have to be up to two orders of magnitude above the rates described so far for human tumour cells [182]. Thus, a major problem with this hypothesis is that it assumes stability within the tumour and does not consider the possibility that the cancer stem cell phenotype can be acquired. To this end, several studies have demonstrated that more differentiated cancer cells can acquire mutations or activate a transcription factor and become cancer stem cells [457]. This paradox may be resolved if the non-cancer stem cells in a precursor cell population could also serve as targets of mutation, leading to clonal succession and, therefore, tumour progression [182]. For example, when tumour spheres derived from highly vascular glioblastomas were transplanted into mice, the initial tumours demonstrated a low-grade glioma phenotype without any sign of angiogenesis. However, upon serial transplantation in vivo, the tumour cells developed a highly malignant phenotype, with extensive angiogenesis and necrosis present in the tumours. This finding highlights the well-established fact that tumour cells evolve and that, if more malignant and less differentiated cancer cells have a growth advantage, they will be selected for and expanded in the tumour. Therefore, as tumours progress, the line between cancer stem cells and the rest of the tumour cells might gradually become blurred and can even disappear [457]. Finally, the recent finding that, depending on the tumour analysed, glioblastoma cancer stem cells can be CD133+ or CD133- also emphasises that either we do not have good markers for cancer stem cells or all tumour cells are tumourigenic to varying degrees, which brings us back to what we already know about tumours being diverse, genetically unstable, and evolving due to the intratumoural diversity of cellular genotypes and phenotypes. Especially in light of the recent finding that normal human differentiated cells can be converted into functional pluripotent embryonic stem cells by expressing the right combination of transcription factors [215,226,462], it is possible that more differentiated cancer cells may acquire stem cell phenotypes. This means that eradication of tumours will likely be achieved by the successful targeting of all cancer cells using a cocktail of drugs effective against all cancer cell sub-populations and not only the stem cell type [457].

3.2 Brain Cancer Stem Cells

For many solid tumours within the CNS, evidence in support of the cancer stem cell hypothesis has emerged. In human glioblastoma, two separate studies, by Bao et al [40] and Piccirillo et al [392], have tried to discern the stem cell nature of this tumour, first identified in a pioneering study by Singh et al [458]. Glioblastomas are diffuse tumours that invade normal brain tissues and frequently recur from focal masses after radiation, suggesting that only a fraction of tumour cells is responsible for regrowth and supporting the cancer stem cell hypothesis in solid tumours. The heterogeneity of glioblastoma starts at the cellular level, with tumour-initiating cells classified according to whether or not they express Prominin (CD133): CD133+ stem cells, or more differentiated CD133- cells that include glioblastoma progenitor cells [121,458].

The study conducted by Singh et al [458] reports the identification and purification, from different phenotypes of human brain tumours, of a cancer stem cell found exclusively within the cell fraction expressing the neural stem cell surface marker CD133. Higher-grade gliomas showed an increased self-renewal capacity with respect to lower-grade gliomas and, importantly, the CD133+ cells could differentiate in culture into tumour cells that phenotypically resembled the tumour from the patient. Neurosphere assays were used to functionally characterize the tumour cell population and identified the CD133+ cell as representing a minority of the tumour cell population. This cell lacked the expression of neural differentiation markers, was necessary for the proliferation and self-renewal of the tumour in culture, and was capable of differentiating in vitro into cell phenotypes identical to the tumour in situ. Since the identified cancer stem cell markers CD133 and Nestin were also known to define normal NS cells, it was suggested that brain tumours can be generated from cancer stem cells that share a very similar phenotype with normal NS cells. This identification of a brain tumour cancer stem cell set the scene for subsequent investigations of the tumourigenic process in the CNS. In the study by Singh et al, cultures from 14 solid primary pediatric brain tumours were set up to favour stem cell growth, resulting in all tumours growing within the first 48 hours as clonally derived neurosphere-like clusters, the "tumour spheres", that continued to proliferate and expand the tumour cell culture over time. All primary tumour spheres were assessed for their ability to form secondary tumour spheres upon re-plating, and all successfully exhibited self-renewal abilities. The frequency of secondary tumour sphere generation correlated with the tumour's clinical aggressiveness and varied according to tumour pathological subtype. Both primary and secondary tumour spheres retained the expression of the NS cell markers Nestin and CD133, while failing to express the neural differentiation markers of choice for astrocytes (GFAP) and neurons (TUBB3) [458]. In neurosphere conditions, in fact, brain tumour cells express Nestin and CD133 but also markers of neural precursors such as Sox2, Notch and Jagged-1 [122]. By inducing differentiation in culture, Singh et al observed that the dissociated tumour spheres preferentially differentiated down the lineage that characterised the original tumour phenotype of the patient, losing the expression of Nestin and CD133 and gaining the markers of differentiation that reflected the immunophenotype of the original tumour [458]. This is a common finding amongst astrocytoma cells grown as neurospheres, which differentiate into GFAP+ astrocytes, and glioblastoma cells, which instead differentiate into GFAP+ astrocytes as well as TUBB3+ neurons, suggesting that the tumours derive from a cell with multi-lineage differentiation capacity, i.e. a stem cell, and not from a dedifferentiated astrocyte as previously thought [122]. When tumour cell cultures from the 14 pediatric tumours were sorted for CD133 expression, CD133+ tumour cells grew as non-adherent tumour spheres with continuous expansion of their cell population, while CD133- cells adhered to the culture dishes without proliferating or forming spheres, demonstrating that the stem cell properties resided in the CD133+ fraction [458]. With this study, Singh et al demonstrated that the CD133 marker can identify an exclusive subpopulation of brain tumour cells with NS cell activity, with three pieces of evidence supporting this view:

1. CD133+ cells generated clusters of clonally derived cells that resembled neurospheres, termed "tumour spheres";

2. CD133+ cells were capable of constant self-renewal, as well as proliferation;

3. CD133+ cells differentiated to recapitulate the phenotype of the tumour from which they were derived.

It must be noted that self-renewal and proliferation differ because the former is a cell division that must involve a cell fate decision, so that at least one of the two daughter cells retains the full stem cell potential of the parent cell (Fig 3.2). Exploring the molecular mechanisms involved in cell fate decisions of brain tumour stem cell divisions could have important implications in understanding clonal expansion and maintenance of these tumours [122]. Although cancer-initiating abilities clearly reside in the CD133+ fraction, not every CD133+ cell is capable of initiating sphere formation in vitro, demonstrating that not every CD133+ cell has stem cell properties [122] and implying the existence of a hierarchy, which will be functionally elucidated as more surface markers for NS cells emerge and additional subpopulations are identified [458].


The study by Bao et al [40] aimed at assessing whether the resistance of CD133+ cells to ionizing radiation is caused by the more efficient and active DNA repair mechanisms present in these cells compared with the cells constituting the bulk of the tumour. The observations in this study showed that after ionizing-radiation treatment of glioblastoma cells grown in vitro or as grafts in mice, the surviving fraction was enriched in CD133+ cells. These cells had the ability to reinitiate heterogeneous tumours when transplanted into other mice, thus demonstrating retention of their stem cell abilities. To assess the biological relevance of the enriched CD133+ cells, xenografts with a constant number of total cells but an increasing fraction of CD133+ cells were generated; these showed a dose-dependent decrease in tumour latency and an enhancement of tumour growth and vascularity. The subsequent irradiation of these xenografts demonstrated that viable tumour cells enriched for CD133+ cells could themselves form secondary tumours with decreased latencies, demonstrating that enrichment of CD133+ cells is crucial in glioma recurrence after radiotherapy. The CD133+ tumour cells showed characteristics consistent with cancer stem cells, i.e. neurosphere formation, expression of the neural and cancer stem cell markers CD133, Sox2, Musashi and Nestin, as well as multi-lineage differentiation with markers for astrocytes, neurons or oligodendrocytes. Furthermore, CD133+ cells derived from xenografts or biopsy specimens formed neurospheres, whereas CD133- cells rarely did. Finally, CD133+ tumour cells were highly tumourigenic in the brains of immunocompromised mice, producing tumours with characteristics of glioblastoma, whereas CD133- cells did not form detectable tumours even when implanted at high doses, showing that CD133+ subpopulations were enriched for characteristics of cancer stem cells, including tumourigenesis in vivo [40].
Although ionizing radiation damages tumour cells through several mechanisms, it kills cancer cells primarily through DNA damage, identifying DNA damage checkpoint responses as having essential roles in cellular radio-sensitivity [40]. Progression through the mitotic cycle is driven by cyclin-CDK complexes, which ensure that all phases of the cell cycle are executed in the correct order. Terminally differentiated neurons cannot undergo cell cycle re-entry and CDK activity is suppressed through interactions with two main families of inhibitory proteins, the INK4 family that exhibits selectivity for CDK4 and CDK6, and the CIP/KIP family that has a broader range of CDK inhibitory activity (Fig 3.3) [115]. As demonstrated in the study by Bao et al, activating phosphorylation of the checkpoint proteins Ataxia telangiectasia mutated (ATM), RAD17, CHK1 and CHK2 was significantly higher in CD133+ cells than in CD133- cells, indicating that the former show greater checkpoint activation in response to DNA damage. The fact that CD133+ glioma cells activated checkpoint responses to a greater extent than CD133- cells suggested that the resistance of CD133+ cells to ionizing radiation is due to preferential checkpoint activation. The finding that CD133+ cells became less resistant to radiation when the checkpoint kinases CHK1 and CHK2, which control the pauses in cell-cycle progression that are scheduled for DNA repair to occur (Fig 3.3), were pharmacologically inhibited pointed to a potential therapy targeting the resilient stem cell mass. The CD133+ resistant population should be targeted with DNA checkpoint blockers to make these cells radiosensitive and thus, in one therapy cycle, potentially wipe out the entire tumour mass (Fig 3.4) [123]. Together, the results reported by Bao et al showed that CD133+ cancer cells contribute to glioma radioresistance and tumour repopulation through preferential checkpoint response and DNA repair, and that targeting the checkpoint response in CD133+ cancer cells can overcome glioma radioresistance in vitro and in vivo, which may provide a therapeutic advantage to reduce brain tumour recurrence [40].

Figure 3.3: The cell cycle of eukaryotic cells can be divided into four successive phases: M for mitosis, S for DNA synthesis, and two gap phases, G1 and G2. In the G1 phase extracellular cues may induce either commitment to a further round of cell division or withdrawal from the cell cycle into G0 to embark on a differentiation pathway. The G1 phase is also involved in the control of DNA integrity before the onset of DNA replication. During the G2 phase the cell checks the completion of DNA replication and the genomic integrity before cell division starts. Adapted from Dehay et al 2007 [115].


Figure 3.4: Glioblastoma treatment with ionizing radiation. Following radiation, the bulk glioblastoma responds and the tumour shrinks. But CD133+ cells activate checkpoint controls for DNA repair more strongly than CD133- cells, resisting radiation and prompting the tumour to regrow. These cells could be targeted with DNA checkpoint blockers to make them radiosensitive. Adapted from Dirks et al 2006 [121].

The approach taken by Piccirillo et al [392] is based on the role of BMPs as soluble factors that induce mature astrocyte differentiation in brain tumour-initiating cells in glioblastoma. As already mentioned, these tumour-initiating cells represent a small fraction of glioblastoma cells that belong to the CD133 pool, display self-renewal in vitro, generate a large number of progeny, are multipotent and can perpetuate across serial transplantation. Given that BMPs favour the acquisition of an astroglial fate (see Section 2.2), the study aimed to assess their effect on weakening the tumour-forming ability of CD133+ cells by prompting their differentiation into astrocytes both in vitro and in vivo [392]. This approach implies that tumour populations at least partially retain a developmental hierarchy based on stem cells and remain able to respond to the normal signals inducing them to mature. Treatment of cultured glioblastoma progenitor cells or CD133+ cells with BMP reduced the size of the tumours grafted into mice and prolonged the survival of the animals because these cells were more mature and less invasive. Furthermore, CD133+ cells could not be recovered from these small tumours and they were also found to be incapable of serial engraftment, although some mice still died three months after treatment, showing that presumably some cancer stem cells escaped BMP treatment and were capable of re-initiating tumour formation (Fig 3.5) [123].

Amongst all BMPs assessed, BMP4 elicited the strongest effect by triggering a significant reduction in the stem-like, tumour-initiating precursors of human glioblastomas. In fact, transient in vitro exposure to BMP4 abolished the capacity of glioblastoma cells to establish intracerebral grafts, producing a pro-differentiation action predominantly in the astroglial direction and depleting the pool of tumour-initiating cells. Based on these results, it was inferred that the in vitro reduction in the stem-cell-like tumour-initiating cells would correspond to a similar decline in the ability of BMP4-treated cells to form tumours in vivo. Indeed, transient exposure of glioblastoma cells to BMP4 produced a significant decrease in their in vivo tumour-initiating ability, since it effectively blocked tumour growth and the associated mortality in 100% of the mice subjected to intracerebral grafting. It is hypothesised that BMPs activate their cognate receptors and trigger the Smad signaling cascade, causing a reduction in proliferation and increased expression of markers of neural differentiation, with no effect on cell viability. In fact, blocking endogenous BMP4 reduces Smad signaling³⁷ and increases glioblastoma cell growth, perhaps by regulating the balance between proliferation and differentiation, and favouring the production of the differentiated astroglial-like cells normally found within glioblastomas. The authors surmise that BMP4 may reduce the frequency of tumour-initiating stem cell-like cells by decreasing the symmetric cell divisions that generate two identical cells, by triggering differentiation of a subpopulation of tumour-initiating stem cell-like cells, or by blocking their proliferation and progeny. These events are not mutually exclusive and could all contribute to reducing the tumour-initiating stem cell-like population. Thus, in addition to controlling the activity of normal brain stem cells, the signaling system constituted by BMPs and their receptors may also act as a key inhibitory regulator of tumour-initiating, stem-like cells in glioblastomas.
Importantly, the results of the study by Piccirillo et al also identified BMP4 as a novel, non-cytotoxic therapeutic effector, which may be used to prevent growth and recurrence of glioblastomas in humans [392].

Both these studies added depth to the cancer stem cell hypothesis, illustrating the potential of re-examining cancer in this new light and highlighting the importance in cancer research of dissociating solid tumour samples into single cell suspensions to purify the stem cell fraction and test its response to treatment. Improved purification of the tumour, however, will be required because the true stem cells are probably a subpopulation of the CD133+ fraction, as demonstrated in the study by Piccirillo et al by the death of some mice three months after treatment [123]. Another study attempting to elucidate the role of CD133 was conducted in vivo on mice that were injected with 100 to 1,000 uncultured malignant brain tumour cells purified by bead sorting for CD133 [459]. These CD133+ cells reproduced a phenotypic copy of the patient’s original tumour and were also heterogeneous, with only a minority of cells expressing CD133. This suggested differentiation in vivo, making differentiation therapy look like a real option for brain tumour treatment and indicating that brain tumours of different types are also functionally heterogeneous for tumour-initiating ability [122].

³⁷ A signaling system in which membrane receptor protein kinases bound by the TGF-β ligand activate a family of receptor substrates, the Smad proteins, which assemble into multi-subunit complexes, enter the nucleus and activate transcription. This signaling pathway regulates a wide variety of cell-specific responses, depending on the "cellular context" in which the TGF-β family members and multifunctional hormones are acting [321].

Figure 3.5: Glioblastoma treatment with BMPs. BMPs normally cause NS cells to differentiate into astrocytes. When used to treat isolated glioblastoma CD133+ cells, they weaken their tumourigenicity both in vitro and, when engrafted into mice, in vivo. The knowledge that a tumour retains a developmental hierarchy suggests that targeting different cell populations is a promising therapeutic strategy. Adapted from Dirks et al 2006 [121].

Several other studies have since focused on the isolation of functional cancer stem cell markers in different types and stages of brain tumours. In a study by Balenci et al [39], the mammalian IQ motif containing GTPase activating protein 1 (IQGAP1), considered to be a scaffolding protein at the intersection of several signaling pathways controlling cell adhesion, polarization, directional migration and neuronal motility, was found to be a reliable marker of Nestin+ amplifying neural progenitors in rat brain. This protein is highly abundant in rat and human glioma cell lines and it specifies a subpopulation of amplifying tumour cells in glioblastoma-like tumours but not in tumours with oligodendroglioma features, making it a reliable marker to distinguish oligodendroglioma from glioblastoma. These findings suggest that the amplifying IQGAP1+ cancer cells are closer to a multipotent progenitor cell and that they represent the most aggressive cancer cell population in glioblastoma [39].

In a study by Ma et al [304], the expression of the stem cell markers CD133, Nestin, Sox2 and Musashi-1, amongst others, was investigated in 72 astrocytomas of different WHO grades, and the expression of these markers was found to correlate positively with the WHO grade of the astrocytomas. Finally, in a study by Wang et al [520], the stem-cell-like CD133+ fraction from 14 human glioblastomas was shown to include a subset of vascular endothelial-cadherin (CD144)-expressing cells with characteristics of endothelial progenitors capable of maturing into endothelial cells. The authors conclude that a subpopulation of cells within glioblastoma can give rise to endothelial cells via a CD133+/CD144+ endothelial progenitor intermediate included in the CD133+ cancer stem-cell-like fraction. This discovery opens up important clinical options, since the strong correlation between tumour grade and neoplastic vasculature in human gliomas indicates that agents blocking the endothelial transition of tumour cells may provide a novel therapeutic strategy.

Recognition of the forebrain SVZ astrocytes as the descendants of radial glia in the adult brain and their capacity to act as NS cells in vitro raises the prospect that this cell type might be responsible for tumour expansion, identifying it with the cancer stem cell population of the cancer stem cell hypothesis. Therefore, although it is not yet clear whether cancer-initiating events occur in NS cells, progenitors or differentiated cells, NS cells are attractive candidates. Their self-renewal abilities would allow an oncogene to more easily initiate uncontrolled proliferation, and their potential for transformation has been further considered based on the observations that human brain tumours frequently arise deep in the brain near the SVZ. In p53-/- mice, more proliferative activity is found in the SVZ and more neurospheres can be isolated from this region, suggesting an expansion of the NS cell pool, which may make the area more susceptible to neoplastic transformation [122]. Moreover, normal NS cells were found in the CD133+ population of the normal human fetal brain, again suggesting that the cell of origin of brain tumours may be a normal NS cell [458]. Much of the insight into brain tumour stem cells comes directly from NS cell research, because the discovery of such stem cells in the mammalian brain has laid the foundations for designing experiments aimed at assessing whether the hierarchy based on stem cells seen in the healthy adult brain also exists in human brain tumours. Future investigations will hopefully clarify where the brain tumour stem cell sits along the lineage hierarchy of cells [122].

In murine models of skin carcinogenesis, the tumour phenotype arising from overexpression of HRAS, a human gene involved in the regulation of cell division in response to growth factor stimulation, depended on the cell compartment in which it occurred, with suprabasal layers yielding benign tumours and hair follicle bulge regions, the putative location of skin stem cells, yielding invasive carcinomas. Along these lines, different cells of origin might give rise to different types of brain tumours, with the more benign ones arising from restricted progenitors and the more aggressive ones from stem cells or early progenitors. The cell of origin question is hampered by the limited definition of the normal NS cell hierarchy, especially by a lack of promoters that can specify gene expression in distinct compartments of the stem cell hierarchy and of cell surface markers that can distinguish stem cells from multipotent or lineage-restricted progenitors. Nestin, Sox2 and CD133 identify NS cells and progenitors but no definite lineage-restricted progenitor cells have been identified. GFAP marks a rare NS cell population as well as differentiated astrocytes, complicating the interpretation of these studies [122].

3.3 Glioma Culture Systems

Glioma Cancer Cell Lines

The historical use of cancer cell lines to delineate tumour biology and perform preclinical drug screening needs to be re-evaluated in light of the recently discovered stem cell component of solid tumours, in order to assess how well cancer cell lines reflect this characteristic, amongst others, compared with primary tumour cultures. The phenotypic characteristics and the multitude of genetic aberrations found within cancer cell lines repeatedly passaged in vitro often bear little resemblance to those found within the corresponding primary human tumour. To this end, the study by Lee et al [261] attempted to find a more biologically relevant model system for exploring glioma biology and for the screening of new therapeutic agents. In fact, little is understood about the similarities between glioma stem-like cells, human NS cells and the primary tumours, beyond the morphological similarities and the differentiation capacity of these cells [156,458,543]. Therefore, Lee et al undertook a series of experiments to identify and better characterise glioma tumour stem cells and their relationship with the primary tumour and traditional glioma cell lines [261].

Firstly, most cancer cell lines, including glioma lines, are grown in media containing serum, unlike NS cell cultures, which are serum-free since serum causes irreversible differentiation of NS cells [107,427]. In order to assess how primary tumour cells are affected by the presence or absence of serum, single cell suspensions of freshly resected and dissociated glioblastoma tissues were cultured under conditions optimal for propagation of normal NS cells, termed "NBE" conditions [107], as well as conditions optimal for growth of glioma cancer cell lines, termed "serum" conditions. Under these two conditions, profound biological differences were found in vitro:

· Cells in NBE conditions readily proliferated both as tumour spheres and as an adherent monolayer, as is seen with normal NS cells. In contrast, cells cultured in serum conditions formed a morphologically heterogeneous monolayer that within a month became homogeneous with a morphology reminiscent of fibroblasts or epithelial cells.

· Cells cultured in NBE proliferated at a constant rate regardless of passage number, whereas cells cultured in serum showed initial growth followed by a plateau phase, only to eventually proliferate at a much greater rate in later passages. To verify the extent of this behaviour, serum was added to NBE cells, which then recapitulated the growth pattern seen in the serum cell population, while serum cells switched to NBE culture media nearly ceased growing. This indicated that the initial culture of primary tumour cells in serum conditions leads to profound biological changes that cannot be subsequently reversed following transition to NBE conditions.

· Upon removal of EGF and FGF2 or addition of RA and serum, NBE cells stopped expressing the NS cell markers Nestin, Sox2 and Stage-specific embryonic antigen 1 (SSEA1) and started differentiating towards the glial and neuronal lineages, although 40% of cells co-stained for glial and neuronal markers, suggesting that differentiation pathways in these NBE cells are not entirely normal. Serum cells expressed few or no NS cell markers and did not respond to differentiation cues such as RA.

· In clonogenicity assays in which cells were tested for neurosphere formation upon single cell plating, NBE cells showed clonogenic frequencies reminiscent of NS cell-derived neurospheres, whereas serum cells failed to form neurosphere-like cells when plated in NBE conditions.

· The telomerase activities of NBE and serum-cultured cells differed: although both cell types retained telomeres as determined by fluorescence in situ hybridization (FISH), NBE cells had consistent telomerase activity regardless of passage number, whereas telomerase activity was lost when these cells were cultured in serum-containing media, consistent with what occurs in normal NS cells. Likewise, early passage serum cells did not have appreciable telomerase activity, which they regained, however, in later passages during the exponential growth phase.

Taken together, these data demonstrate that NBE cells, as observed in the two NBE cell lines followed in the study, retain many of the self-renewal and differentiation characteristics of NS cells, whereas serum-cultured cells do not [261]. Other important characteristics tested on these two cell types were:

· Tumourigenic potential in vivo. Upon injection into the brains of neonatal immunodeficient mice, NBE cells retained a tumourigenic potential independent of passage number with as few as 1,000 cells, whereas serum-cultured cells at early passages did not show any tumourigenicity. When established NBE cell-derived tumours were dissociated and cultured under NBE conditions to be injected again into the brains of new recipient mice, there was no loss of tumourigenic potential, whereas when the same xenograft-derived cells were grown under serum conditions all subsequent tumourigenic potential was lost. This suggested that the loss of tumourigenicity was due to the serum-culture conditions. Interestingly, although early passage serum-cultured cells did not form tumours, the late passage, exponentially growing, telomerase-positive ones did, at an increasing rate with progressive passage number.

· Phenocopy of the original human glioblastoma tumour. While the intracranial tumours generated by NBE cells demonstrated extensive infiltration along white matter tracts as observed in glioblastoma patients, all the tumours generated from late passage serum cells were well delineated with little tumour cell infiltration, a characteristic phenotypically identical to the human tumour xenografts generated from the standard glioma cell lines, demonstrating that only tumours derived from NBE cells phenocopy the critically important histopathological features of the original human glioblastoma tumours.

· Transcriptional activity. The gene expression profiles of NBE cells, serum cells, their derived xenograft tumours, and the original glioblastoma tumours show that the transcriptional landscape of NBE cells and their derivative xenograft tumours is more closely related to that of NS cells and parental tumours, while the transcriptional landscape of serum cells and their derivative xenografts is more closely related to that of glioma cell lines and their derivative tumours. These data demonstrate that NBE cells are remarkably similar to normal NS cells and their derivative tumours properly maintain many biological characteristics of the parental glioblastomas and other primary glioblastomas, whereas traditionally grown, serum-cultured cancer cell lines do not.

· Genomic changes. These were evaluated by performing SNP analysis and spectral karyotyping (SKY) on NBE and serum-cultured glioma cells. Deletion of the CDKN2A:ARF locus on chromosome 9p, loss of chromosome 10q, trisomy of chromosome 7, and local amplification of the EGFR locus are common genomic features in primary glioblastomas and they were found in all surgical samples. The serial genomic DNA profiles of NBE and serum-cultured cells at various passage numbers were analysed by SNP analysis and showed that even after one year of maintenance (more than 70 passages) in NBE culture conditions, NBE cells largely maintained their parental tumour genotype, while serum cells underwent significant genomic rearrangements as early as two months of culture (less than 10 passages). Intriguingly, LOH in chromosomes 4 and 17, which carry the hCDC4 and p53 genes, respectively, was found in most of the late passage serum cells and coincided with the onset of the increased proliferation, tumourigenicity and aneuploidy typical of these cells. The fact that the remaining copy of hCDC4 was found to be down-regulated hinted at the possibility that loss of this E3 ubiquitin ligase, which is involved in the regulation of the aurora kinase STK15 (up-regulated in these cells) and whose inactivation is known to cause aneuploidy [313,416], was partially responsible for the increased proliferation, tumourigenicity, and aneuploidy observed in late passage serum cells. In addition, loss of the wild-type p53 allele in late passage serum cells leaves only the mutant p53 allele found in both the NBE cells and the parental tumours, causing a well-known genomic instability [130,153]. These genomic changes further demonstrate the significant differences between serum cells and their matched parental tumours [261].

In conclusion, by using a model system derived from primary glioblastomas, Lee et al [261] have demonstrated that NBE-cultured cells derived from primary glioblastomas bear remarkable similarity to normal NS cells: they retain the ability to form neurospheres in vitro; have an indefinite self-renewal potential; can differentiate into the glial and neuronal lineages of the CNS; have gene expression profiles similar to NS cells; remain genetically stable over many passages in vitro; harbour all of the genetic aberrations found within the primary tumour; have gene expression profiles similar to the glioblastomas they were derived from; and appear to be the principal tumourigenic cell type that can recapitulate the overall in vivo phenotype of the parental glioblastoma. By contrast, cells derived from the same glioblastoma specimens but grown in serum-containing media lose all the characteristics of primary tumour cultures. Although they ultimately regain tumourigenic potential in later passages, they cannot recapitulate the phenotype of the original tumour and instead match the phenotypic and genotypic patterns found in most glioma cell lines.

Since several groups have reported that tumour stem cell-like cells can be isolated from established tumour cell lines by culture in serum-free media with selected growth factors [242,385], experiments need to be carried out to establish whether these cells maintain their tumour stem cell-like properties in differentiation-inducing conditions and whether cells with stem-like properties may emerge again through epigenetic reprogramming or selection of a subpopulation of cells with genomic instability.

Based on their findings, Lee et al propose that the inherent tumour stem cell population within primary glioblastomas is quickly lost in typical glioma culture conditions, and that the cells found following prolonged in vitro passages are the product of an outgrowth of a cell clone that has undergone profound de novo genetic and/or epigenetic changes. This indicates that NBE cells may be an optimal model system for understanding the biology of primary human tumours, for the preclinical screening of agents and to guide personalized tumour therapy. Table 3.1 below summarises the findings of this study [261].


Table 3.1: Summary of characteristics of NBE and serum-cultured glioblastoma cells. Adapted from Lee et al 2006.

Characteristic | NBE-cultured | Serum-cultured
Proliferation | Constant | Limited growth, plateau, exponential growth
Clonogenicity, tumourigenicity | Yes, regardless of passages | Not at early passages
Differentiation potential | Induced to become glial and neuronal lineages | Do not respond to differentiation stimuli
Telomerase activity | Positive | Negative initially, but became positive at late passages
Tumour histology | Extensive migration, phenocopy of primary human GBMs | Fail to show infiltration, like glioma lines
Global gene expression | Similar to primary human GBMs | Different from primary tumours but similar to common glioma lines
NSC-related genes | Nestin, Sox2, CD133, Musashi and Bmi | -
Genotype | Same as parental tumour regardless of passages | Additional alterations not found in parental tumour

Glioma Neural Stem Cells

The demonstration that the adult human brain maintains areas of radial glia populations that give rise to NS cells within the SVZ raised the prospect that these multipotent cells could be an alternative to the differentiated glia of the brain parenchyma as the cells responsible for glioma expansion [122,390,399]. As rare populations of stem cells are being discovered in different tissues, support grows for the cancer stem cell hypothesis, which states that cell lineage organization in tumours is hierarchical rather than stochastic and that only the sub-population of cancer stem cells is responsible for the expansion of the tumour [123,458]. Therefore, in vitro expansion of the putative brain cancer stem cells as stable cell lines would provide a powerful model system to study the human disease by giving insights into the origin of tumour heterogeneity and enabling further analysis of the self-renewal, commitment, and differentiation processes, which will hopefully lead to more targeted therapeutic strategies [404].


As mentioned in section 3.1, the neurosphere culture paradigm has been used successfully for enrichment of tumour-initiating cells from brain tumours including glioblastoma in serum-free media [156,261], an improvement on "classic" serum-cultured glioma cell lines that fail to model accurately the human disease [404]. However, neurosphere culture has several limitations:

· short-lived progenitor cells also proliferate in suspension culture and true clonal analysis is hampered by sphere aggregation;

· the spontaneous differentiation and cell death accompanying stem cell divisions in the sphere environment limit rigorous assessment of stem cell behaviour and marker analysis based on bulk populations;

· the true nature of the stem cell compartment across the spectrum of gliomas and their relationship to in vivo progenitors remains unclear.

Unlike neurosphere culture, adherent culture provides uniform access to growth factors, suppressing differentiation and enabling expansion of highly pure populations of stem cells. The methodology reported by Conti et al [107], Pollard et al [403] and Sun et al [481] (see section 2.2) for deriving and expanding mouse and human adherent NS cell lines in the presence of EGF and FGF2 was adopted by Pollard et al [404] in an experiment designed to test whether these same conditions enabled the isolation and expansion of stem cells from gliomas. In these conditions, six cell lines were expanded for at least one year and more than 20 passages, without any significant alteration in growth rate or known GBM-related genetic aberrations. Cell lines were established from histopathologically distinct types of tumour:

· three cases of glioblastoma multiforme: G144, G166 and GliNS2, with the glioma sample from patient number 144 established as a biological replicate in cell line G144ED, derived independently of G144 but using the same initial tumour sample. In all analyses performed, no striking differences in marker expression or behaviour were found between the two cell lines;

· one case of giant cell glioblastoma: G179;

· one case of anaplastic oligoastrocytoma: G174.

These cell lines were phenotypically characterised by immunocytochemistry to ensure their similarity to the fetal NS cells used as the normal counterpart, which confirmed consistent expression of the NS cell and neural progenitor markers Vimentin, Sox2, Nestin, CD44 and 3CB2, and by time-lapse videomicroscopy, which confirmed the dynamic changes in cell shape and the high motility typical of fetal NS cells, with G166 cells being less motile than G144 cells. Given these similarities the cell lines were termed glioma NS cell lines or GNS cell lines [404]. In comparing the efficiency of establishing adherent GNS cell lines with that of suspension culture as neurospheres, it was found that although most samples generated neurospheres upon initial plating, only two cell lines could be passaged further, possibly due to the increased differentiation and apoptosis that take place in neurosphere culture, whereas in adherent conditions the establishment of cultures was successful for all cell lines for at least 10 passages [404].

To determine whether the GNS cells displayed genomic alterations characteristic of glioblastoma, molecular cytogenetic analyses were performed using SKY, locus-specific FISH, and comparative genomic hybridization. These showed that G144 cultures exhibited simple numerical gains of chromosomes 7 and 19, together with loss of chromosomes 6, 8, and 15. These aberrations, with the exception of the loss of chromosome 8, are all commonly associated with glioblastoma [326,383]. Furthermore, comparative genomic hybridization for G144 revealed an amplification of the CDK4 locus on chromosome 12 and deletion of PTEN on chromosome 10. However, late passage G144 cultures did show a more complex and heterogeneous pattern of both numerical and structural chromosomal change. Similarly, G179 exhibited a more complex chromosomal constitution at later passages and, as in G144, polysomic gain of whole chromosome 7 was evident. Thus, although GNS cells do not display gross chromosome instability, alterations in whole chromosome copy number do occur following long-term in vitro expansion, an issue that can be circumvented by routinely using cultures that are expanded for no more than 20 passages [404].

In order to test the capacity of GNS cells to initiate tumour formation, cells from all GNS cell lines were injected intracranially into immunocompromised mice that were sacrificed at 5 and 20 weeks. The earlier time point revealed large numbers of engrafted human Nestin-immunoreactive cells, and the later time point showed the formation of large and highly vascularised tumours histopathologically similar to human glioblastoma tumours. The cellular heterogeneity was highlighted by immunostaining the xenografts for Nestin and GFAP, and by performing flow cytometry for CD133. Furthermore, transplantation of these cells after in vitro exposure to differentiation-promoting conditions delayed tumour formation. In most transplants a striking infiltration of the brain reminiscent of the human disease was observable, with the exception of cell line G166, which generated a more defined tumour mass. To determine the tumour-initiating potential of the GNS cell lines, transplants were carried out using 10-fold dilutions starting with 100 cells. One hundred cells were sufficient for engraftment of all GNS cell lines and, for G144 and G174, for the generation of an aggressive tumour mass, whereas 1,000 cells were required in the case of G166 and G179. Unlike GNS cells, normal fetal NS cells never generated tumours at any dilution. Finally, to determine whether the tumour-initiating cells could self-renew within the xenograft, serial transplantations from the tumour mass into secondary and tertiary recipients were carried out using G144, G144ED and G179. A tumour was generated in each case, demonstrating that long-term expanded GNS cell lines remain highly tumourigenic and are capable of forming tumours that appear to recapitulate the human disease [404].

Since the important defining property of stem cells is their ability to generate differentiated progeny, the differentiation programs of the GNS cells were evaluated, keeping in mind that the prevalent form of glioma is the astrocytoma, which contains GFAP+ astrocyte-like cells and can also contain anaplastic cell populations and, in some cases, an oligodendrocyte component [301]. For all GNS cell lines, the differentiation to Octamer-binding protein 4 (Oct4)+ oligodendrocytes or TuJ1+ neurons was fully suppressed in the presence of EGF and FGF2, in contrast to what is observed in glioma neurospheres. Upon growth factor withdrawal, NS cells differentiate into neurons (see Fig 2.9); in contrast, G144 and G179 GNS cells differentiated into Oct4+ or CNPase+ (footnote 38) oligodendrocyte-like cells, and TuJ1+ cells, respectively. Neuronal-like cells or oligodendrocytes were not apparent in G166 cultures, which continued to proliferate in the absence of EGF and FGF2 without clear differentiation. The tendency of G144 cells to differentiate into oligodendrocytes was surprising because efficient oligodendrocyte differentiation of mouse and human fetal NS cells requires exposure to thyroid hormone, ascorbic acid and PDGF [165]. Thus, the authors suggested that either G144 cells represented a corrupted tripotent state, with acquired genetic changes influencing the lineage choice toward oligodendrocyte commitment, or that they may have a distinct phenotype more similar to oligodendrocyte precursor cells than to NS cells. To distinguish between these two possibilities, G144 was tested for the expression of established markers of oligodendrocyte precursor cells, i.e. Olig2, Sox10, NG2 and PDGFRα, prior to and during differentiation. G144 cells stably exhibited an oligodendrocyte precursor-like phenotype, expressing these markers before differentiation was induced by growth factor withdrawal. In fact, upon re-examination based on histopathology and CNPase staining, the G144 tumour was shown to have a significant oligodendrocyte component even though it was originally diagnosed as a glioblastoma [404].

Footnote 38: CNPase is a 2′,3′-cyclic nucleotide 3′-phosphodiesterase that catalyses the in vitro hydrolysis of 2′,3′-cyclic nucleotides to produce 2′-nucleotides and has an in vivo function that remains to be elucidated. High CNPase expression is seen in oligodendrocytes and Schwann cells, accounting for roughly 4% of the total myelin protein in the CNS [137].

In determining whether GNS cells could generate astrocytes, G144 and G179 were observed to undergo a striking change in cell morphology seven days after addition of BMP4 and removal of EGF and FGF2, the protocol used for astrocyte differentiation of NS cells (see Fig 2.9). The majority of cells expressed high levels of GFAP, although the frequency was much lower for G166, and cell lines G144 and G179 expressed low levels of the neuronal marker Doublecortin (Dcx). Taken together, these observations showed that although GNS cells retain the capacity to differentiate, efficiency and lineage choice vary dramatically between lines [404].

GFAP is expressed in radial progenitors and radial glia in the developing primate nervous system, as well as in putative NS cells within the adult SVZ and, specifically, at low levels in human fetal NS cell lines. Since an alternatively spliced form, GFAPδ, was shown to mark human SVZ astrocytes [432], the behaviour of the GNS cell lines with respect to GFAPδ was assessed.
The expression level of GFAPδ mRNA was five times greater in G179 than in G144 and G166, with G179 cells also up-regulating GFAP expression following BMP treatment and G144 cells remaining, instead, predominantly negative. Co-expression of GFAPδ, Sox2 and Nestin was specific to G179 cells and, together with the ability to generate neuronal-like cells in vitro, these are all features conserved in adult SVZ astrocytes [443]. The fact that G166 cells lacked the expression of GFAPδ and oligodendrocyte precursor markers, but differentiated toward GFAP+ astrocytes in vitro, suggested similarity to a more restricted astrocyte precursor. Thus, despite their shared capacity to proliferate in response to EGF and FGF2 and the widespread expression of neural progenitor markers, there are underlying differences between GNS cell lines that may reflect the existence of distinct subtypes of regular neural progenitors, and GFAPδ may be of use in discriminating astrocyte-like cells that have stem cell properties [404].

To evaluate the relationship between the GNS cell lines and their correspondence to fetal NS cells, mRNA expression profiling was carried out using microarrays, and PCA (footnote 39) revealed that each GNS cell line had a transcriptional state more closely related to fetal NS cells than to adult brain tissue (Fig 3.6). Consistent with the other findings on the various GNS cell lines in the study by Pollard et al [404], the PCA confirmed that G179 and G166 have distinct expression profiles, both from one another and from G174, G144 and GliNS2. The microarray data also specifically confirmed the expression of the oligodendrocyte precursor cell markers Sox8, Sox10, Olig1, Olig2 and Nkx2.2 in G144, and the lower expression levels of the same markers observed in G179, which, instead, showed higher levels of GFAPδ expression. The fact that GliNS2 clustered closely with G144 and expressed the oligodendrocyte precursor markers suggested that the G144 phenotype may not be unique. The G174 cell line, instead, derived from an oligoastrocytoma, clustered closely with GliNS2 and G144 because it expressed higher levels of Olig2, although it failed to express Sox10 or the pluripotency markers Oct4 and Nanog. Finally, G166 showed higher levels of EGFR than any other cell line, perhaps explaining its resistance to differentiation upon EGF withdrawal or BMP treatment [404].

Figure 3.6: PCA of global mRNA expression in each GNS cell line (black; G144, G144ED, G166, G179, G174 and GliNS2), fetal NS cells (red; hf240, hf286 and hf289) and normal adult brain tissue (blue). The "a" and "b" are biological replicates of cell line hf240. Taken from Pollard et al 2009 [404].

Footnote 39: A common technique for finding patterns in high-dimensional data and expressing the data in such a way as to highlight similarities and differences. Mathematically, it is a procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components, whose number is less than or equal to the number of original variables.

The CD133 and CD15 cell surface markers are expressed on fetal and adult neural progenitors and brain tumour-initiating cells (see Section 3.2). For G144 and G179, heterogeneity for both markers was observed, similarly to fetal NS cells [480], whereas G166 was found to be negative, consistent with the microarray expression data. The fact that CD133 was not present on all GNS cell lines confirmed that this marker does not universally identify tumourigenic cells in malignant glioma. Unlike CD133 and CD15, the hyaluronic acid-binding protein CD44, characterised as an astrocyte precursor marker and recently demonstrated to also mark NS cells in vitro [401], was uniformly expressed in all GNS cell lines [404].

To demonstrate, as a proof of principle, the utility of GNS cells with respect to the shortcomings of the neurosphere assay (see Section 2.2), a small-scale chemical screen of known pharmaceutical drugs was carried out. Importantly, the results of this screen extended to human brain cancer stem cells the observation that mouse neurospheres are sensitive to modulation of neurotransmitter pathways. Future, more in-depth studies will have to determine whether the drugs found in the screen that modulate the serotonin pathway of the adherent GNS cell lines can limit growth of xenograft tumours in vivo [404].

In their paper, Pollard et al [404] have tackled two critical issues of the brain cancer stem cell hypothesis:

1. How to maintain and expand pure populations of cancer stem cells in vitro by expanding them as cell lines using the adherent culture methods previously established for fetal and human NS cells [107,481]. Specifically, Pollard et al have demonstrated that suspension culture is not a requirement for successful long-term propagation of tumour-derived stem cells, and that expansion in adherent conditions overcomes the limitations of the neurosphere culture paradigm, such as increased levels of differentiation and apoptosis.

2. Elucidation of the phenotypic similarities between GNS cells and the endogenous progenitors within the developing and adult nervous system, e.g. NEP cells, radial glia, glial progenitors, oligodendrocyte progenitor cells and SVZ astrocytes. For example, G144 cells strongly express markers of the oligodendrocyte precursor cell lineage and are biased toward oligodendrocyte differentiation, while G179 cells appear to be more similar to adult SVZ astrocytes, including expression of GFAPδ, although more specific markers are needed in order to confirm this. This perhaps indicates that the histological spectrum of different gliomas is dictated by the phenotype of the underlying tumour-initiating cells.

To summarise, when the same culture conditions used for NS cells were tested on glioma tumours, NS-like cells were isolated and propagated as GNS cells. Normal NS cells and GNS cells feature many commonalities, both in their morphology and in their immunoreactivity to radial glia markers. However, unlike NS cells, GNS cells require less exogenous growth factor for stable proliferation and recapitulate the pathology of the original gliomas when xenografted into immunocompromised mice. This system is highly unusual in that normal and diseased counterparts are morphologically and immunohistologically indistinguishable and yet the differentiation behaviour of the cancer stem cells is clearly aberrant. The data generated using this system, as described by Pollard et al [404], are consistent with tumour stem cells arising following transformation of oligodendroglial precursors or adult SVZ astrocytes, although it is considered equally plausible that short-lived progenitors or differentiated cells can be converted to a stem cell state through genetic or epigenetic disruptions. In conclusion, GNS cells provide a versatile and renewable resource to probe the biology of tumour-initiating cells and screen for agents that selectively and directly target them. Tumour stem cell self-renewal, migration, apoptosis and differentiation all represent potential therapeutic opportunities that are accessible in GNS cell cultures [404].

Chapter 4

The Non-Coding RNA World

Contents

4.1 MicroRNA regulation
4.2 Target Prediction and Validation

Non-coding RNAs are a growing subset of RNAs whose sizes, regulatory functions, range of regulated pathways and differential conservation within species distinguish them from the remainder of the coding and non-coding transcripts (mRNAs, tRNAs, rRNAs) classified as the traditional repertoire of RNAs expressed within a cell. This new set of regulatory molecules in the RNA world is in continual expansion, as new types - unique in length, tissue specificity and regulatory mechanisms - are discovered and classified in different species.

A broad distinction within this class of regulatory molecules is made in relation to their length, dividing them into long and short non-coding RNAs (Fig 4.1). As a transcriptional class, long non-coding (nc)RNAs were first described during the large-scale sequencing of full-length cDNA libraries in the mouse [368], and are today arbitrarily considered longer than 200 nucleotides due to a practical cut-off in RNA purification protocols that excludes small RNAs. Their specific roles are still in the process of being unveiled, although they have already been implicated in high-order chromosomal dynamics, telomere biology and sub-cellular structural organization [331]. While this class of molecules has been most extensively studied in animal species, studies of long non-coding RNAs in begin to emerge showing some conservation of mechanisms [31].

On the other hand, the first small non-coding RNAs discovered were endogenous small-interfering RNAs (siRNAs) in plants [355,505], followed by microRNAs (miRNAs) in C. elegans [264], repeat-associated small interfering RNAs (rasiRNAs) in D. melanogaster [24] and, only recently, piwi-associated RNAs (piRNAs) [23,164,523] and endogenous small interfering RNAs (endosiRNAs) in mouse [485,524]. Other classes are found in ciliates, such as scanRNAs (scnRNAs) that ensure the fidelity of genome inheritance to the next generation [336], and in plants, such as trans-acting small interfering RNAs (tasiRNAs) [509]. Today siRNAs are known to act as gene silencers if exogenously introduced in mammals, and microRNAs are well-established gene regulators in plants and , including the unicellular green alga C. reinhardtii. Also, piRNAs are understood to play a role in zebrafish - widening the regulatory horizons of such molecules - and rasiRNAs, now considered a subset of the piRNA class, have also been found in zebrafish [199]. In this thesis the focus will be maintained on the role of microRNA regulation in mammals.

Figure 4.1: Classes of non-coding RNAs (ncRNAs) discovered to date: siRNA, small interfering RNA; miRNAs, microRNAs; rasiRNAs, repeat-associated small interfering RNAs; piRNAs, piwi-associated RNAs; endosiRNAs, endogenous small interfering RNAs; scnRNAs, scanRNAs; tasiRNAs, trans-acting small interfering RNAs.

4.1 MicroRNA regulation

The role of microRNAs has been studied in many different tissues and pathways, as well as through various time points in the development of organisms. The first microRNA, lin-4, was identified in 1993 in a genetic screen for mutants that disrupted the timing of post-embryonic development in C. elegans [129], and today several hundred microRNAs are known to exist in the mammalian genome, regulating up to two thirds of protein-coding genes [150,256]. MicroRNAs are produced through an enzymatic process that starts in the nucleus and is driven by Drosha. The biogenesis of microRNAs in animal models starts when they are transcribed by RNA polymerase II as primary transcripts, termed "pri-microRNAs". The initiation step of "cropping" is mediated by the Drosha-DGCR8 complex, also known as the Microprocessor complex. Drosha and DGCR8 are both located mainly in the nucleus. The product of this nuclear processing step is a pre-microRNA of approximately 70 nucleotides, which possesses a short stem plus a 3' overhang of approximately two nucleotides. This structure might serve as a signature motif that is recognised by the nuclear export factor exportin-5. The pre-microRNA forms a transport complex together with exportin-5 and the GTP-bound form of Ran. Following export from the nucleus, the cytoplasmic RNase III Dicer carries out the second processing step, termed "dicing", to produce microRNA duplexes. The duplex is separated and usually one strand is selected as the mature microRNA, whereas the other strand is degraded. When the other strand is not degraded, it is indicated with the "*" mark in microRNA target prediction algorithms [232].

Transcriptional and Post-transcriptional Regulation

MicroRNA genes are encoded either in independent transcription units, in polycistronic clusters or within introns of protein-coding genes. Independently of Drosha, pre-microRNA hairpins can also be generated from introns through the combined action of the spliceosome and the lariat-debranching enzyme [239]. Animal microRNAs have been shown to function differently from plant microRNAs, with the former class binding to their target 3'UTRs by imperfect matching. In fact, the RNA-Induced Silencing Complex (RISC) mediates target mRNA cleavage when loaded with siRNAs or mature microRNAs that have perfect sequence complementarity to their target mRNA. The extent of the mismatch region varies amongst different microRNA-mRNA duplexes in animals, but it is rarely the case that sequence complementarity spans the entire microRNA sequence, meaning that transcript cleavage would seem to be just as rare a mechanism of repression. Because the extent of this complementarity is low in animals, there is a region spanning nucleotides two to eight from the 5' end of the microRNA that proves to be essential for the correct recognition of the target message. The 5'-most nucleotide of the microRNA sequence (m1) does not have a significant role in this recognition process: even when base pairing occurs between m1, usually a uridine, and the opposing nucleotide on the messenger (t1), usually an adenine, this feature alone does not contribute to a repressive effect on the target [364]. Interestingly, structural studies of the Argonaute (AGO; footnote 40)-siRNA complex demonstrated that the 5'-most nucleotide of the guide RNA (equivalent to the m1 base of microRNAs) is not base-paired but instead bound by AGO [291].

Due to its biological role, the region two to eight nucleotides from the 5' end of the microRNA sequence is termed the "seed" [273], which later proved an excellent name choice in view of the conservation found for it across most metazoan microRNAs [282]. A number of single-nucleotide mutation studies monitoring the repressive effect of microRNAs on their known targets also confirmed the biological importance of the seed region [237,469]. The number and variety of complementary seed sequences on the 3'UTRs of target genes determine the potential for modulating their expression by different types and numbers of microRNAs at various stages of the life of a cell [239].
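The seed-based recognition described above can be illustrated with a minimal Python sketch. It assumes that the seed is taken as microRNA nucleotides two to eight, as defined in the text, and that a candidate site is any exact reverse-complement match to the seed within a 3'UTR. The function names and the toy UTR are hypothetical, the let-7a sequence is used purely as an example, and this is not how any particular prediction tool is implemented.

```python
from typing import List

# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return microRNA nucleotides start..end (1-based, inclusive, from the 5' end)."""
    return mirna.upper()[start - 1:end]


def find_seed_sites(mirna: str, utr: str) -> List[int]:
    """Return 0-based start positions of exact seed matches in a 3'UTR.

    The UTR may be given as DNA or RNA; T is converted to U.
    """
    site = reverse_complement(seed(mirna))
    utr = utr.upper().replace("T", "U")
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping matches
    return positions


# Illustrative input only: let-7a and a made-up UTR fragment.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "AAACUACCUCAGGGCTACCTCA"
print(seed(LET7A), find_seed_sites(LET7A, TOY_UTR))
```

Exact matching is the simplest possible criterion; it ignores G:U pairs, site accessibility and conservation, all of which are discussed in the remainder of this section.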

Three types of seed sites have been observed to date, displaying a range of targeting efficiencies: "7mer-A1", lowest in the hierarchy, "7mer-m8", outperforming it, and "8mer" sites with the strongest repressive effects. The 7mer-A1 type matches microRNA positions two to seven and carries an adenine at position t1. The 7mer-m8 type matches the same positions but has an extra match at position eight of the microRNA sequence, which accounts for its outperformance of the 7mer-A1 type. Finally, the 8mer site has a complete match over positions two to eight, like the 7mer-m8 site, but outperforms it by also carrying an adenine at position t1 [359,446]. This relatively simplistic model, in which only the seed sequence and its closest nucleotides play a role in determining the efficacy of repression, has recently been refined to account for other features that seem to determine the identity of a microRNA target. Examples of these are the presence of specific secondary structures, specific 3'UTR sequences surrounding the target site and the extent of complementarity between the 3'UTR and the 3' end of the microRNA sequence, which can compensate for the mismatching regions [239].

Footnote 40: Family of proteins that act as the catalytic component of the RNA-induced silencing complex by using small RNA guides (microRNAs, siRNAs and piRNAs) to bind and cleave target mRNAs.

Another structural feature to account for is the G:U matches or "bulges" that may form along the recognition sites of some microRNAs [70]. In vivo experiments, in fact, seem to suggest that more than one G:U within the seed region could compromise the efficiency of target repression, a finding recently confirmed by large-scale whole-proteome microRNA impact analyses [446,469]. However, extensive complementarity with the 3' end of the microRNA sequence [469,538] and/or additional target pairing to microRNA nucleotides 13-16 [43,180] seem to counteract the compromising effect of G:U bulges on 7mer seed matches, although these 3' compensatory and 13-16nt supplementary sites, unlike the seed sequence, do not appear to be under selection pressure [71,272].

Of equal importance to the quality of the microRNA-target sequence match is the overall accessibility of the target site [224], which may easily be obstructed by RNA-binding proteins [222] or may require the un-pairing of the seed-flanking sequences because of the secondary structures adopted by the 3'UTR [224].

Various mechanisms have been documented to account for microRNA repression in animals: translational inhibition during the initiation or elongation of protein synthesis, degradation of the nascent peptide chain, mRNA sequestration to P-bodies (footnote 41) and mRNA degradation. Even though the choice of which mechanism is used for different microRNA-mRNA duplexes is still a matter of controversy, it is generally accepted that features of the microRNA sequences and the proteins bound to them play an important role in determining the method of suppression. The data available today are indicative of two main types of regulation: indirect effects and direct effects on translation [359]. Amongst the indirect mechanisms is mRNA degradation through deadenylation and decapping.
The direct effects on translation, instead, are divided into inhibition at the level of translation initiation, in which few or no ribosomes occupy the mRNA, which is thought to be sequestered or stored in P-bodies, and inhibition post-initiation, in which the silenced mRNAs sediment in the polyribosome fractions in a sucrose gradient [239]. This latter type of inhibition can be further divided into three scenarios: premature ribosome drop-off, stalled or slowed elongation and co-translational protein degradation [359]. Data from many separate studies support all the listed mechanisms, indicating that the inhibition scenario is a very complex one. AGO2, for example, has been shown to bind to the m7G cap of mRNAs through a domain similar in structure to the one on translation initiation factor eIF4E. This observation suggests that the recruitment of AGO2 to the mRNA's 3'UTR via microRNA pairing blocks m7G cap recognition by the translation initiation factor, which is needed to start translation, thus supporting the inhibition-of-initiation view of miR-mediated repression [239]. Even so, mRNAs inhibited by the action of miRs have been found to be associated with actively translating polysomes, reinforcing the idea that, depending on the mRNA-miR duplex, a different subset of these mechanisms is selected for miR-mediated repression [239].

Footnote 41: Distinct foci within the cytoplasm of the eukaryotic cell consisting of many enzymes involved in mRNA turnover.
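The hierarchy of seed site types described in this section (7mer-A1, 7mer-m8 and 8mer) can be made concrete with a small classifier. The sketch below is only an illustration under stated assumptions: it uses the widely quoted convention in which the core match covers microRNA positions two to seven, the m8 match adds position eight, and A1 means an adenine in the target opposite microRNA position one. The extra "6mer" label for a core match with neither feature, the function names and the toy UTR fragments are hypothetical additions, and G:U pairs and 3' compensatory pairing are ignored.

```python
from typing import List, Tuple

# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_sites(mirna: str, utr: str) -> List[Tuple[int, str]]:
    """Return (position, site_type) for every core seed match in the UTR.

    position is the 0-based index of the first UTR nucleotide of the
    6-nucleotide core match (pairing with microRNA positions 7..2).
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp(mirna[1:7])          # target of microRNA positions 2-7
    m8_partner = COMPLEMENT[mirna[7]]   # target base pairing with position 8
    hits = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        # The base 5' of the core in the UTR faces microRNA position 8.
        has_m8 = start > 0 and utr[start - 1] == m8_partner
        # The base 3' of the core in the UTR is t1; A1 sites carry an adenine.
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        hits.append((start, site_type))
        start = utr.find(core, start + 1)
    return hits


# Illustrative input only: let-7a against made-up UTR fragments.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
for fragment in ("GGCUACCUCAGG", "GGCUACCUCGGG", "GGAUACCUCAGG"):
    print(fragment, classify_sites(LET7A, fragment))
```

Returning the site type as a label rather than a score keeps the sketch neutral about how strongly each type represses its target; the text only establishes the ordering 8mer, then 7mer-m8, then 7mer-A1.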

4.2 Target Prediction and Validation

Initial efforts to characterise the properties of microRNAs focused on their comprehensive identification in sequenced eukaryotic genomes, the numbers of mRNAs targeted, and the degree of evolutionary conservation among different microRNA species. Whole-transcriptome and proteome analyses have reinforced the importance of target site evolutionary conservation by demonstrating a stronger down-regulation of the mRNAs having conserved sites [37,139,359,446]. Despite the surprisingly high 3'UTR conservation shared among vertebrates, very few target sites are conserved among the drosophilid and nematode lineages [91], and only nine orthologous targets of miR-7 were found to be shared between Drosophila and human (as an intersection from 97 Drosophila and 581 human predicted targets) [266]. Following the cataloguing of microRNAs in a variety of organisms, subsequent work shifted towards the prediction and characterisation of their mRNA targets and the functional consequences of these regulatory interactions. In this context, a comprehensive list of empirical rules for microRNA target prediction should not overlook the importance of site accessibility.

A fundamental challenge in the target prediction field has been to successfully predict the biologically functional targets while excluding false predictions [43]. To date, a number of sophisticated algorithms have been developed to address these questions and at least seven of these tools are publicly available, each one based on distinct criteria with varying false-positive rates. These tools include: Targetscan [272,273], EMBL [470], PicTar [127,247], EIMMo [155], miRanda [134,213], miRBase [179], PITA [224] and DIANA-microT [237,316]. Such methods differ considerably in their implementations of various prediction criteria; most search for complementary sequences between microRNAs and putative gene targets, while some consider physical and statistical hybridization properties, cross-conservation of regulatory RNAs between related species, etc. As a result, prediction methods differ widely in their relative degrees of accuracy and coverage and the results produced often disagree.

With predictions in the range of 300 evolutionarily conserved targets per mammalian microRNA family, it appears that microRNAs have the potential to modulate the expression of nearly all the mammalian mRNAs [43]. The strong evolutionary pressure acting to maintain the conserved target sites within most of the 3'UTRs [150] is avoided by some housekeeping genes through the acquisition of exceptionally short 3'UTRs that are depleted of target sites [71]. A fundamental step in the microRNA target prediction pipeline is the experimental validation of the predicted targets in order to confirm the validity of one approach over another. Several validation methods have been employed, including traditional genetic studies, rescue assays [70], reporter-gene constructs [237,273] and mutation studies [71,124,237]. In addition to confirming or rejecting hypotheses built on networks of predicted targets, these validation approaches represent the real bottleneck of the whole process since they are the most time-consuming and expensive. High-throughput approaches have also been developed, involving over-expression of microRNAs in cell lines followed by microarray profiling to detect down-regulated targets [283] and the reverse approach of depleting microRNAs to identify up-regulated targets [425].

Assaying only relative changes in target mRNA levels without measuring the corresponding protein abundance is not sufficient to characterise all functional targets [19]. Thus, it is necessary to demonstrate that, in addition to mediating the repression of gene expression through transcript degradation, microRNAs also directly repress translation [446]. Since proteins have different turnover rates, a microRNA may require more or less time to change their steady-state levels.
With the new version of the Stable Isotope Labeling by Amino acids in Cell culture (SILAC) protocol developed by Rajewsky et al [446], called pulse-SILAC (pSILAC), only differences in newly synthesized proteins are detected by assaying the changes in their steady-state levels. SILAC is a technique based on mass spectrometry that uses stable-isotope rather than radioactive labelling of amino acids to assay protein abundance in a cell. In both standard SILAC and pSILAC, two cell populations are grown in identical culture media except for the presence, in one of the two cultures, of an isotope-labeled amino acid. In standard SILAC the labeled amino acid is fed to the cell culture with the growth medium, so that it can be slowly incorporated into all newly synthesized proteins and the raw protein concentration is monitored. In pSILAC, instead, the labeled amino acid is fed to the cell culture for a short period of time (a pulse), so that only de novo protein production is monitored [446]. Specifically, the pSILAC protocol developed by Rajewsky et al [446] involves labelling two cell cultures (transfected and control) with a heavy and a medium-heavy version of the same amino acid, respectively. Eight hours after transfection the cultures are pulse-labeled, and the samples, combined at 32h post-transfection, are analysed by mass spectrometry. RNA from the same samples (8h and 32h) was analyzed by Affymetrix arrays. In this fashion, approximately 5,000 proteins were identified in HeLa cells with high confidence. The search for six-nucleotide motifs within the 3'UTRs of those mRNAs whose protein levels decreased the most (never exceeding 4-fold) revealed that the most significant motifs were exactly the seed sequences of each respective microRNA. This showed that, out of all proteins demonstrating reduced synthesis, the levels of the direct targets of microRNAs decreased the most, and that this reduction is directly linked to the presence of the 3'UTR target site. Nevertheless, a number of repressed proteins without seed sequences are still targeted, since their level is decreased, but prediction algorithms cannot identify them because they are non-canonical and seedless. Amongst the strongly repressed targets, a search revealed that these are mainly proteins synthesized on the . A potential reason is that only mRNAs from cytosolic free ribosomes are targeted to P-bodies for degradation [446]. Other observations were that a nine to 11 nucleotide mismatch along the mRNA-microRNA duplex was necessary for the protein production to be repressed, which was otherwise indistinguishable from that of mRNAs lacking the seed.
Although mismatches are deleterious to siRNA-mediated mRNA cleavage, they seem to correlate with increased repression of protein production by microRNAs. Also, the presence of multiple seeds seemed to have multiplicative repressive effects, with a higher impact when the seeds were proximal rather than distant. On average, the repression was found to be more pronounced for conserved than for non-conserved seed sites, indicating that there are determinants other than the seed that mediate efficient down-regulation of protein synthesis. Interestingly, as opposed to the seed enrichment observed in the down-regulated genes, no seed enrichment was observed in the up-regulated ones, arguing strongly against the microRNA-mediated activation observed in other works [446].
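The six-nucleotide motif search described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the statistical procedure used in the cited study: it assumes two hypothetical lists of 3'UTR sequences (one for the most repressed proteins, one for a background set) and ranks each hexamer by the difference between the fraction of repressed UTRs and the fraction of background UTRs that contain it. All function names and the toy sequences are invented for the example.

```python
from collections import Counter
from typing import List, Set, Tuple


def hexamers_present(utr: str, k: int = 6) -> Set[str]:
    """Return the set of distinct k-mers occurring in a UTR (T converted to U)."""
    utr = utr.upper().replace("T", "U")
    return {utr[i:i + k] for i in range(len(utr) - k + 1)}


def hexamer_enrichment(repressed_utrs: List[str],
                       background_utrs: List[str],
                       k: int = 6) -> List[Tuple[str, float]]:
    """Rank k-mers by (fraction of repressed UTRs) - (fraction of background UTRs).

    Each UTR contributes at most once per k-mer, so long UTRs with repeated
    motifs do not dominate the counts.
    """
    repressed_counts: Counter = Counter()
    background_counts: Counter = Counter()
    for utr in repressed_utrs:
        repressed_counts.update(hexamers_present(utr, k))
    for utr in background_utrs:
        background_counts.update(hexamers_present(utr, k))
    scores = {}
    for motif, count in repressed_counts.items():
        frac_repressed = count / len(repressed_utrs)
        # Counter returns 0 for motifs never seen in the background.
        frac_background = background_counts[motif] / len(background_utrs)
        scores[motif] = frac_repressed - frac_background
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data only: each "repressed" UTR contains UACCUC, the match to
# positions two to seven of let-7a; the background UTRs do not.
repressed = ["AAUACCUCAGG", "CCUACCUCUUU", "GUACCUCGAAA"]
background = ["AAAAAAAAAA", "GGGGCCCCGGGG", "UUUUAAAACCCC"]
print(hexamer_enrichment(repressed, background)[0])
```

A real analysis would replace the simple difference of fractions with a significance test and a correction for multiple testing across all 4,096 possible hexamers.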


Chapter 5

Methods

Contents
5.1 Tag-sequencing Data Processing
5.2 Array Comparative Genomic Hybridization
5.3 Differential Gene Expression
5.4 Quantitative Real Time-PCR Validation
5.5 Literature Mining
5.6 Differential Isoform Expression
5.7 Differential Long ncRNA Expression
5.8 Glioma Expression Signatures
5.9 External Dataset Expression Correlation
5.10 Glioblastoma Pathway Construction
5.11 MicroRNA Target Prediction Analysis

5.1 Tag-sequencing Data Processing

Instead of the traditional microarray expression profiling, Solexa sequencing was performed on total RNA from four GNS cell lines: G144ED, G144, G179 and G166, and two NS cell lines from fetal brain: CB541 and CB660 (Table 5.1). Tag-sequencing (Tag-seq) library preparation and sequencing was carried out using the Illumina DGE Tag Profiling kit according to the manufacturer’s instructions. We have submitted the data to ArrayExpress. The libraries were specifically constructed with the longSAGE protocol, in which first and second strand cDNA is synthesized with a biotinylated oligo deoxy-thymine (oligo-dT) onto streptavidin beads. The bound cDNA is cleaved at the 5’ end by an anchoring enzyme, such as NlaIII, that leaves a GTAC overhang on the poly-thymine (poly-T) strand. Sequencing adapters with an MmeI recognition site are then linked to the cDNA and MmeI digestion removes the 3’ portion of the cDNA from the beads, generating tags of constant length of 17nt that are then ligated to a 3’ adaptor and sequenced, as shown in figure 5.1 [490]. In the case of an Illumina sequencing platform, these constructs are hybridised onto the glass surface of a slide coated with complementary oligonucleotides and "bridge" amplified to then have them sequenced with fluorescence-labeled nucleotide analogs [47]. Because each tag measures expression levels that are not based on transcript length, a digit or "count" is associated to it and the technology is often referred to as "digital gene expression" tag profiling [490].

Table 5.1: Summary of cell lines investigated with the Tag-seq platform. (M=Male, F=Female)

Type of cell line   Name of cell line   Tissue                    Sex
GNS                 G144                GBM                       M
GNS                 G144ED              GBM                       M
GNS                 G166                GBM                       F
GNS                 G179                Giant cell glioblastoma   M
NS                  CB541               Fetal forebrain           -
NS                  CB660               Fetal forebrain           -

Figure 5.1: Step by step diagram of the longSAGE protocol used to generate reads that contain tag sequences derived from the mRNA pool.

The Tag-seq technology is based on the principles of solid-phase sequencing and Serial Analysis of Gene Expression (SAGE). In this way, Tag-seq imports the quantitative and unbiased transcriptome profiling characteristic of SAGE library construction as well as the sequencing depth, dynamic range and cost-effectiveness of the latest high-throughput sequencing platforms. Furthermore, Tag-seq is not affected by the cross-hybridization and dynamic range limitations typical of microarray technology, since no sequence-specific hybridization is required for expression detection and the dynamic range is limited in principle only by the sequencing depth of the platform [346,490]. Tag-seq is similar to RNA-seq in that neither technology requires pre-existing knowledge of the genome analysed, and Tag-seq has proven to perform comparably in terms of gene discovery and dynamic range [346,507]. Unlike RNA-seq, however, Tag-seq strictly monitors the 3’ end usage of transcripts and carries no information on their internal structure [194]. Finally, Tag-seq can be used as a strand-specific gene expression platform, which is not always true of RNA-seq, depending on the type of adaptor ligation performed [346].

We extracted tag sequences as the first 17nt of each read and counted the number of occurrences of each tag in each sample. Due to sequence errors introduced during library preparation and sequencing, highly expressed transcripts might have caused a significant number of tags to differ by a single base from the expected tag. To compensate for such errors, we used the Recount program [528], setting the Hamming distance parameter to 1. Recount uses an expectation maximization algorithm to estimate "true" tag counts, i.e. counts in the absence of error, based on observed tag counts and base call quality scores.
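The tag extraction and counting step can be illustrated with a minimal R sketch. The read sequences are invented for illustration, and hamming is a hypothetical helper that only flags observed tags lying one mismatch apart. It does not reproduce the expectation maximization correction performed by Recount.

```r
## Toy reads (hypothetical sequences, not taken from the study)
reads <- c(
  "ACGTACGTACGTACGTATCGTATGC",
  "ACGTACGTACGTACGTATCGTATGC",
  "ACGTACGTACGTACGTTTCGTATGC",  # same tag with a different base at position 17
  "TTTTGGGGCCCCAAAATTCGTATGC"
)

## The tag is the first 17nt of each read
tag.length <- 17
tags <- substr(reads, 1, tag.length)

## Number of occurrences of each tag in the sample
tag.counts <- table(tags)

## Hypothetical helper: Hamming distance between two tags of equal length
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

## Pairs of distinct observed tags that differ by a single base,
## i.e. candidates for the kind of error correction described above
observed <- names(tag.counts)
pairs <- t(combn(observed, 2))
neighbours <- pairs[mapply(hamming, pairs[, 1], pairs[, 2]) == 1, , drop = FALSE]
```

In this toy sample the tag seen twice and the tag seen once differ only at the last base, so a correction step would consider them together when estimating the "true" count.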

We excluded tags matching adapters or primers used in library construction and sequencing, as well as tags matching ribosomal RNA (rRNA) or mitochondrial sequence. Adapter and primer contamination was identified by running the TagDust program [258] with a target false discovery rate (FDR) of 1%. Tags matching ribosomal sequence were identified by using the Bowtie program [257] to align against a database consisting of all rRNA genes, including , from Ensembl 56 [147] and all ribosomal repeats in the UCSC Genome Browser RepeatMasker track for genome assembly GRCh37 [152]; only perfect matches to the extended 21nt tag sequence, consisting of the NlaIII site CATG followed by the observed 17nt tag, were accepted. Mitochondrial tags were similarly excluded by searching for perfect matches to the mitochondrial chromosome sequence. The parameters used to run the Bowtie program were the following:

-f parameter indicates that the query input files are FASTA files;

-n parameter set to 0 means that the alignments may have no more than n mismatches. Since we are only looking for perfect matches, n=0.

-y parameter specifies to the program to try as hard as possible to find valid alignments when they exist, which makes running this mode much slower;

-k 2 instructs the program to report up to two valid alignments;

-m 1 instructs the program to refrain from reporting any alignments for reads having more than one reportable alignment. This option is useful when the user wants to guarantee that reported alignments are unique, which is our case.

To assign tags to genes, we employed a hierarchical strategy based on the expectation that tags are most likely to originate from the 3’-most NlaIII site in known transcripts. Tags were assigned to transcripts using virtual tag data from the SAGE Genie database [65] and virtual tags extracted from Ensembl transcript models. The SAGE Genie annotation consisted of 105 sets of virtual tags obtained by scanning for NlaIII sites in cDNAs from RefSeq, Mammalian Gene Collection (MGC) and Genbank, then Expressed Sequence Tags (ESTs), UniGene consensus sequences and transfrags. The virtual tag sets are further classified based on the position of the tag relative to the 3’ end of the transcript and indicators of 3’ end reliability such as polyadenylation signal and poly-A tail. Since SAGE Genie does not cover Ensembl transcripts, we also extracted virtual tags from the 3’-most cut site in each Ensembl transcript. We used the CATG recognition sequence as a "signal" that indicated where to start counting 17nt ahead. This allowed us to extract that 17nt tag and know exactly from which transcript it was extracted. If the cut site was less than 17nt from the end of the transcript, we extended the sequence with adenine stretches, representing the poly-adenine (poly-A) tail. In these cases, additional upstream tags were also extracted until one tag fully contained in the Ensembl cDNA sequence was obtained. We disregarded Ensembl transcripts not belonging to a gene of biotype "protein_coding" or "processed_transcript". Ensembl annotates genes of several other biotypes, e.g. pseudogenes and small RNAs, but those annotations are not based on full-length transcript sequences, so we would not expect to find valid virtual tags in those transcripts.

For the majority of this study, we used a conservative subset of the virtual tags from SAGE Genie and Ensembl comprising 25,593 unique tags assigned to 15,103 genes (Table 5.2). Specifically, we used SAGE Genie tags extracted from the 3’-most cut site in RefSeq or MGC cDNAs having a poly-A tail or a polyadenylation signal, and Ensembl tags from transcripts of type "protein_coding" or "non_coding". Any virtual tags that mapped to multiple loci by these criteria were excluded. For certain analyses, we made use of more comprehensive virtual tag sets. In addition, we determined unique, perfect matches for tags to the genome using Bowtie as described above. We calculated a single expression value for each gene in each cell line by summing the counts of tags assigned to the gene.
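The extraction of a virtual tag from the 3’-most cut site of a transcript can be sketched in R as follows. This is a minimal illustration under stated assumptions: extract.virtual.tag is a hypothetical function name, the cDNA strings are invented, and the sketch covers only the 3’-most site with poly-A padding. It does not implement the additional rule of extracting upstream tags until one is fully contained in the cDNA.

```r
## Hypothetical helper: virtual tag downstream of the 3'-most CATG site
extract.virtual.tag <- function(cdna, site = "CATG", tag.length = 17) {
  hits <- gregexpr(site, cdna, fixed = TRUE)[[1]]
  if (hits[1] == -1) return(NA_character_)       # no cut site in this transcript
  start <- max(hits) + nchar(site)               # first base after the 3'-most site
  tag <- substr(cdna, start, start + tag.length - 1)
  missing <- tag.length - nchar(tag)
  if (missing > 0) {
    ## cut site closer than 17nt to the transcript end: pad with adenines,
    ## standing in for the poly-A tail
    tag <- paste0(tag, strrep("A", missing))
  }
  tag
}

## Toy cDNA sequences (invented for illustration)
full.tag   <- extract.virtual.tag("TTCATGACGTACGTACGTACGTAGG")
padded.tag <- extract.virtual.tag("TTCATGACGT")
last.site  <- extract.virtual.tag("CATGAAAACATGCCCCGGGGTTTTAAAACC")
no.site    <- extract.virtual.tag("AAAA")
```

Each returned tag can then be keyed to the transcript it came from, which is what allows observed tags to be assigned to genes by exact lookup.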

5.2 Array Comparative Genomic Hybridization

We re-analysed the array comparative genomic hybridization (CGH) data described by Pollard et al [404]. CGH was performed with Human Genome CGH Microarray 4x44K arrays (Agilent), using genomic DNA from each cell line hybridised in duplicate (dye swap) and normal human female DNA as reference (Promega). Log2 ratios were computed from processed Cy3 and Cy5 intensities reported by the software CGH Analytics (Agilent). We corrected for effects related to GC content and restriction fragment size using a modified version of the waves array CGH correction algorithm [271]. Log2 ratios were adjusted by sequential loess normalization on three factors: fragment GC content, fragment size, and probe GC content. These were selected after investigating dependence of log ratio on multiple factors, including GC content in windows of up to 500 kilobases centred around each probe. The Bioconductor package CGHnormaliter [506] was then used to correct for intensity dependence and log2 ratios scaled to be comparable between arrays using the "scale" method in the package limma [466]. Replicate arrays were averaged and the genome (GRCh37) segmented into regions with different copy number using the circular binary segmentation algorithm in the Bioconductor package DNAcopy [510], with the option undo.SD set to 1. Aberrations were called using the package CGHcall [503] [37] with the option nclass set to 4. CGH data are available from ArrayExpress [28] under accession E-MTAB-972.

Table 5.2: Classification of sequenced tags in each cell line.

                                                                             G144ED               G144                 G166                 G179                 CB541                CB660
Sequenced tags                                                               6,383,175            7,133,520            13,415,402           11,610,415           12,103,066           10,043,561
Filtered tags                                                                751,698 (11.78%)     912,971 (12.80%)     378,009 (2.82%)      347,537 (2.99%)      327,597 (2.71%)      382,819 (3.81%)
  Adapter                                                                    594,513 (9.31%)      765,484 (10.73%)     90,407 (0.67%)       45,318 (0.39%)       39,992 (0.33%)       248,799 (2.48%)
  Mitochondrial                                                              156,534 (2.45%)      147,245 (2.06%)      285,881 (2.13%)      302,148 (2.60%)      287,493 (2.38%)      133,935 (1.33%)
  Ribosomal RNA                                                              651 (0.01%)          242 (0.00%)          1,721 (0.01%)        71 (0.00%)           112 (0.00%)          85 (0.00%)
Tags assigned to a single locus                                              2,812,750 (44.07%)   2,640,558 (37.02%)   6,271,471 (46.75%)   5,603,364 (48.26%)   5,984,114 (49.44%)   3,708,853 (36.93%)
  Reference tags, unique                                                     1,420,009 (22.25%)   1,344,879 (18.85%)   2,970,554 (22.14%)   2,805,423 (24.16%)   2,894,539 (23.92%)   1,712,143 (17.05%)
  Reference tags, best                                                       628,707 (9.85%)      577,418 (8.09%)      1,783,685 (13.30%)   1,267,594 (10.92%)   1,485,211 (12.27%)   982,190 (9.78%)
  cDNA tags, unique                                                          146,442 (2.29%)      142,369 (2.00%)      295,764 (2.20%)      284,990 (2.45%)      261,411 (2.16%)      171,970 (1.71%)
  cDNA tags, best                                                            34,999 (0.55%)       34,472 (0.48%)       103,632 (0.77%)      105,787 (0.91%)      81,547 (0.67%)       49,374 (0.49%)
  Other SAGE Genie tags, unique                                              345,759 (5.42%)      322,034 (4.51%)      652,449 (4.86%)      684,718 (5.90%)      749,653 (6.19%)      455,045 (4.53%)
  Other SAGE Genie tags, best                                                72,658 (1.14%)       60,725 (0.85%)       158,601 (1.18%)      107,955 (0.93%)      148,854 (1.23%)      84,542 (0.84%)
  Tags not mapping to known transcriptome but uniquely to genome             164,176 (2.57%)      158,661 (2.22%)      306,786 (2.29%)      346,897 (2.99%)      362,899 (3.00%)      253,589 (2.52%)
Ambiguously mapping tags                                                     205,626 (3.22%)      185,115 (2.60%)      608,978 (4.54%)      418,267 (3.60%)      437,887 (3.62%)      388,085 (3.86%)
  Reference tags                                                             165,895 (2.60%)      148,512 (2.08%)      516,134 (3.85%)      337,710 (2.91%)      366,115 (3.02%)      288,253 (2.87%)
  cDNA tags                                                                  5,996 (0.09%)        4,464 (0.06%)        10,566 (0.08%)       6,455 (0.06%)        6,708 (0.06%)        40,908 (0.41%)
  Other SAGE Genie tags                                                      8,581 (0.13%)        6,717 (0.09%)        25,914 (0.19%)       14,668 (0.13%)       13,553 (0.11%)       12,508 (0.12%)
  Tags not mapping to known transcriptome but to multiple genomic locations  25,154 (0.39%)       25,422 (0.36%)       56,364 (0.42%)       59,434 (0.51%)       51,511 (0.43%)       46,416 (0.46%)
Unclassified tags                                                            2,613,101 (40.94%)   3,394,874 (47.59%)   6,156,944 (45.89%)   5,241,247 (45.14%)   5,353,468 (44.23%)   5,563,803 (55.40%)

5.3 Differential Gene Expression

We called the differentially expressed genes with the Bioconductor package DESeq that uses pseudo reference-based normalization [21]. DESeq compares sequencing-derived expression profiles using a method that is able to account for large variation between biological replicates, as can be expected with cancer samples. The R code that generated the differential expression calls is reported below, with ## signs indicating a comment:

    ## Define auxiliary functions
    read.gns.counts <- function(fn) {
      data <- read.delim(fn, quote="")
      counts <- data[,-(1:3)]
      colnames(counts) <- sapply(colnames(counts), substring, 7)
      counts <- as.matrix(counts)
      rownames(counts) <- data[,1]
      groups <- factor(c(rep("GNS",4),"NS","NS"))
      return( list(counts=counts, groups=groups, info=data[,1:3]) )
    }
    normalize.counts <- function(counts, ref=counts) {
      ## Median-of-ratios size factors relative to the per-gene geometric mean
      geomeans <- exp(rowMeans(log(ref)))
      size.factors <- apply(ref, 2, function(cnts)
        median((cnts/geomeans)[geomeans > 0]))
      counts <- t( t( counts ) / size.factors )
      return(counts)
    }
    ## Upload data in 'report' variable
    library("DESeq")
    count.fn <- "gene_counts_ref_best.txt"
    report <- function(de, counts, info) {
      x <- data.frame(info, log2fc = de$log2FoldChange, p=de$pval,
                      padj=de$padj, resVarA=de$resVarA,
                      resVarB=de$resVarB, counts)
      rownames(x) <- de$id
      return(x)
    }
    ## Read data
    gns.data <- read.gns.counts(count.fn)
    ## NOTE: the gene (row) and sample (column) selection indices i and j are
    ## not defined in the original listing; selecting all rows and columns
    ## here is an assumption made so that the listing runs
    i <- seq_len(nrow(gns.data$counts))
    j <- seq_len(ncol(gns.data$counts))
    ## Carry out DE analysis
    cds <- newCountDataSet( round(gns.data$counts[i,j]), gns.data$groups[j] )
    cds <- estimateSizeFactors( cds )
    cds <- estimateVarianceFunctions( cds )
    de <- nbinomTest( cds, "NS", "GNS")
    rm(cds)
    ## Compute normalized counts and fold-changes
    norm.counts <- normalize.counts( gns.data$counts[i,] )
    ## Make report table
    de.rep <- report(de, norm.counts, gns.data$info[i,])
    ## Write table to file (tab-separated)
    write.table(de.rep, "report_all.txt", row.names=FALSE, sep="\t", quote=FALSE)
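The normalisation performed by normalize.counts in the listing above can be followed on a small worked example. The sketch below repeats the same three steps (per-gene geometric mean as pseudo-reference, median ratio per library as size factor, division by the size factor) on an invented count matrix in which the second library is sequenced exactly twice as deeply as the first. The matrix and library names are assumptions for illustration only.

```r
## Toy count matrix (hypothetical): libB has twice the depth of libA
counts <- matrix(c(10, 20, 40, 100,
                   20, 40, 80, 200),
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("libA", "libB")))

## Pseudo-reference: geometric mean of each gene across libraries
geomeans <- exp(rowMeans(log(counts)))

## Size factor of a library: median ratio of its counts to the pseudo-reference
size.factors <- apply(counts, 2, function(cnts)
  median((cnts / geomeans)[geomeans > 0]))

## Normalised counts: each library divided by its size factor
norm.counts <- t(t(counts) / size.factors)
```

Because the only difference between the two toy libraries is depth, the size factors differ by a factor of two and the normalised columns coincide, which is the behaviour one would want before comparing expression between lines.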

To identify major differences common to G144, G166 and G179, we filtered the set of genes called differentially expressed at a FDR of 1%, further requiring:

1. each GNS line show at least a two-fold change compared to each NS line, with direction of change consistent among the lines;

2. an absolute expression level above 30 tpm (tags per million) in each GNS line (if up-regulated in GNS lines) or NS line (if down-regulated in GNS lines).
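The two filtering criteria can be sketched in R as follows. The expression values, adjusted p-values and the function name classify.core are hypothetical, and treating the FDR and fold-change thresholds as inclusive is an assumption, since the text does not specify it.

```r
## Toy normalised expression values in tags per million (hypothetical numbers)
tpm <- matrix(c(100, 120,  90, 10, 20,
                  5,   8,   2, 50, 40,
                100,  15,  90, 10, 20,
                 20,  25,  22,  5,  4),
              ncol = 5, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                              c("G144", "G166", "G179", "CB541", "CB660")))
padj <- c(geneA = 0.001, geneB = 0.005, geneC = 0.001, geneD = 0.001)

## Hypothetical helper applying the FDR, fold-change and expression filters
classify.core <- function(tpm, padj, gns, ns,
                          fdr = 0.01, min.fc = 2, min.tpm = 30) {
  gns.min <- apply(tpm[, gns, drop = FALSE], 1, min)
  gns.max <- apply(tpm[, gns, drop = FALSE], 1, max)
  ns.min  <- apply(tpm[, ns, drop = FALSE], 1, min)
  ns.max  <- apply(tpm[, ns, drop = FALSE], 1, max)
  significant <- !is.na(padj) & padj <= fdr
  ## Every GNS line at least min.fc above every NS line, and above min.tpm
  up   <- significant & gns.min >= min.fc * ns.max & gns.min > min.tpm
  ## Every NS line at least min.fc above every GNS line, and above min.tpm
  down <- significant & ns.min >= min.fc * gns.max & ns.min > min.tpm
  ifelse(up, "up", ifelse(down, "down", "none"))
}

core <- classify.core(tpm, padj,
                      gns = c("G144", "G166", "G179"),
                      ns  = c("CB541", "CB660"))
```

Comparing the minimum of one group with the maximum of the other is equivalent to checking every GNS/NS pair, and it enforces a consistent direction of change automatically. In the toy data, geneC fails because one GNS line does not reach the two-fold change, and geneD fails the expression threshold.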

We computed correlation between expression profiles using variance-stabilised expression values from DESeq. GO and KEGG pathway tests were performed with the Bioconductor packages GOstats [138] and SPIA [489], respectively. InterPro domain annotation was obtained from Ensembl 56 and tests carried out in R.

5.4 Quantitative Real Time-PCR Validation

Custom-designed TaqMan low-density array microfluidic cards (Applied Biosystems) were used to measure the expression of 93 genes in 22 cell lines by qRT-PCR. This gene set comprises 82 validation targets from Tag-seq analysis, eight glioma and developmental markers, and three endogenous control genes: 18S ribosomal RNA, TUBB and NDUFB10. The 18S gene was chosen because it is used by ABI as a manufacturing control, while TUBB and NDUFB10 were selected because they are present in our Tag-seq dataset and show low variation of expression across GNS and NS cell lines in an independent microarray expression dataset [404]. The 93 genes were interrogated using 96 different TaqMan assays (three of the validation targets required two different primer and probe sets to cover all known transcript isoforms matching differentially expressed tags). cDNA was generated using SuperScript III (Invitrogen) and real-time PCR carried out using TaqMan fast universal PCR master mix. The absence of amplification in the no-template control (NTC; horizontal slope) ensured that random contamination and reagent contamination were not affecting our samples. In compliance with the minimum information for publication of quantitative real-time PCR experiments (MIQE) [78], a full assay list with raw and normalised threshold cycle (Ct) values is provided in Appendix A.3. The data analysis was performed using the R package HTqPCR [131], which handles high-throughput qPCR data with a focus on data from TaqMan low-density arrays. Ct values were normalised to the average of the three control genes and potential outliers identified and filtered out using the plotCtBoxes function (Fig 5.2). Figure 5.3 shows the effect of the normalisation method employed on the endogenous controls, rearranging the Ct values to a smaller dynamic range. To capture biological variability within cell lines, we measured up to four independent RNA samples per line (indicated with the letters A, B, C, D in tables A.3 and A.4 in Appendix A.3). Figure 5.4 shows that the concordance between all our A, B biological replicates is very high and few outliers can be observed, identified with a gene name and highlighted in red in the figure. Differentially expressed genes were identified by the Wilcoxon rank sum test after averaging replicates.

Figure 5.2: Boxplot of normalised Ct values identifies outliers (empty circles).

Figure 5.3: Correlation scatter plot showing the effect of the normalization method on the raw Ct values measured for the three endogenous controls.


Figure 5.4: Scatter plots of each of our A and B biological replicates show a high degree of concordance between samples with very few outliers (highlighted in red).

In order to assess the repeatability, or intra-assay variation, of the expression levels measured with qRT-PCR for each of the N=96 genes, standard deviations (σ) were computed for the differences in gene expression levels between two replicates (Fig 5.5). Standard deviations ranged between 0.5 and 3.5 (0.5 < σ < 3.5). Letting the difference between expression levels measured for each gene in each replicate cell line be:

\[ D_i = X_{1i} - X_{2i} \tag{5.1} \]

where i=1, 2, ..., N and the mean of the differences is:

\[ \bar{D} = \frac{\sum D_i}{N} \tag{5.2} \]

then the standard deviation S for the differences can be computed as:

\[ S = \sqrt{\frac{\sum (D_i - \bar{D})^2}{N - 1}} \tag{5.3} \]
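Equations 5.1 to 5.3 translate directly into a few lines of R. The sketch below uses invented normalised Ct values for five assays in two replicates. The variable names are assumptions, and the final comparison simply confirms that the result coincides with the sample standard deviation returned by the base R function sd.

```r
## Hypothetical normalised Ct values for five assays in two replicates
x1 <- c(24.1, 27.3, 30.2, 22.8, 26.0)
x2 <- c(23.8, 27.9, 29.6, 23.1, 26.4)

d <- x1 - x2                                      # equation 5.1
d.bar <- sum(d) / length(d)                       # equation 5.2
s <- sqrt(sum((d - d.bar)^2) / (length(d) - 1))   # equation 5.3

## Same quantity computed with the built-in sample standard deviation
s.builtin <- sd(d)
```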


Figure 5.5: Dot plot of the standard deviations (blue) of the differences between expression levels in two replicates. Standard deviations were computed for each combination of replicate cell lines.

5.5 Literature Mining

A web script (provided in Appendix B) was implemented to automatically process the querying of the 748 differentially expressed genes (FDR < 10%) and therefore establish their role, if any so far had been assigned, in glioblastoma or other cancers. The script was modelled against a browser query frame type in the Visual Basic.NET programming language, which I deemed the best coding choice for this data-hefty application because of its "collection" objects, i.e. extremely powerful hash tables that make use of an ultra-fast key and vector content look-up. Moreover, the presence of the .NET Framework enriches this particular edition of Visual Basic with extensive client-provider and internet browsing libraries that I expected to need for this particular application.

The code I developed implements a process of hierarchical look-up, recapitulated in figure 5.6, that starts with the most glioblastoma-specific database amongst GBMbase [162], Google Scholar, Google Search, BioGraph [280], information Hyperlinked Over Proteins (iHOP) [197] and PubMed [410], and loops all the way to the least-specific database. In this way, the probability of finding an association between the queried gene and glioblastoma diminishes at every iteration, and an association is unlikely to be found by the time the execution arrives at the last and least-specific of the databases. This algorithm maximises the probability of finding a reported connection between any gene and any disease, provided minor disease-specific details are changed such as the identity of the first database, which needs to be the most disease-specific of the six.

If we call "case" the code that searches for the association between queried gene and glioblastoma in one database, and "iteration" the search loop completed from first to last database, then during each case a weighted index is accumulated to become, at the end of one iteration, a total weighted index, i.e. a score of how well that particular gene did in the search for its association to glioblastoma in the six databases. Therefore, the total weighted index represents those parameters used in the look-ups that yielded the most favourable results, i.e. which database successfully found an association between the gene and glioblastoma, how many hits were found, and in which part of the paper the hits were found. I assigned the greatest weight to associations found in the title, a lower weight to those found in the abstract, and the lowest to those found in the contents of a paper. In designing the weight structure I also decided to favour the associations found in the same sentence over those found more than one sentence apart. The latter, in fact, barely carry any relevance in the value cumulation process of the total weighted index.

In this particular application it was interesting to know whether the queried gene was implicated in any type of cancer, especially in case the literature did not report an association with glioblastoma. Therefore, at every iteration, the databases are also queried with parameters that identify the answer to the question "Is this gene implicated in any cancer?" and a separate index from the total weighted index is determined. To calculate this cancer index I searched for an association between the gene being queried and any of the cancers listed in a database that I compiled for this particular application. All the cancers listed in the National Cancer Institute A to Z List of Cancers [206] are present in the cancer database as a unique set of names. A search was performed in PubMed for every combination of queried gene symbol and cancer name from the compiled cancer database and, if the two terms appeared in the same sentence, this was considered an association and the cancer index for that combination incremented by one. At the end of the search, every gene-cancer combination possessed a cancer-index and the association was reported if the value of such index was above an arbitrarily chosen threshold value.

[Figure 5.6 flowchart. The question "Is this gene associated with GBM in <database>?" is asked in turn for GBMbase, iHOP, PubMed, BioGraph, Google Scholar and Google Search. A "Yes" leads to "How many hits?" (I ≧ 10 or 1 ≦ I < 10) and to a category assignment; a "No" leads to the next database. A further branch asks "Is this gene associated with cancers?" (I ≧ 1 or I = 0) and "Which cancers?", leading to Category 3 or Category 4. An arrow labelled "Probability of finding an association with GBM" runs alongside the sequence of databases.]

Figure 5.6: In this diagram the "cases" (code that searches for the association between queried gene and glioblastoma) and "iterations" (search loop completed from first to last database) structures in the execution code are made obvious. At every question the answers identify the parameters that are used to assign the queried gene to one of the four categories or skip the assignment for that particular case and go through the rest of the iteration. I = number of associations found for the gene symbol of the queried gene and the word "glioblastoma" or "GBM".

At the end of an iteration the total weighted index and the cancer index for every gene-cancer combination were available and, depending on their values, the queried gene was assigned to one of four categories, which answer one of the following questions:

Category 1. Does extensive literature implicate the gene in GBM?

Category 2. Does a limited amount of literature implicate the gene in GBM?

Category 3. Is there no literature implicating the gene in GBM, but does it appear to be implicated in other cancers?


Category 4. Is there no literature implicating the gene in any cancer?

If W = total weighted index and C = cancer index, then if W > 10 (internal parameter, does not appear in fig 5.6), the gene is assigned to category one; if 1 < W < 10, the gene is assigned to category two; if W = 0 and C > 10 (internal parameter, does not appear in fig 5.6) then the gene is assigned to category three; finally, if W = 0 and C = 0 then the gene is assigned to category four.
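The category rule stated above can be written as a small R function. This is a sketch under stated assumptions rather than the Visual Basic.NET code of Appendix B: assign.category is a hypothetical name, the thresholds are applied exactly as written, and combinations the text does not cover (for example W = 10, or W = 0 with 0 < C ≤ 10) are returned as NA instead of being guessed.

```r
## Hypothetical helper: category from total weighted index W and cancer index C
assign.category <- function(W, C) {
  if (W > 10) return(1L)                 # extensive literature on GBM
  if (W > 1 && W < 10) return(2L)        # limited literature on GBM
  if (W == 0 && C > 10) return(3L)       # no GBM literature, other cancers
  if (W == 0 && C == 0) return(4L)       # no literature for any cancer
  NA_integer_                            # combination not covered by the text
}

## Toy (W, C) pairs, invented for illustration
categories <- c(
  assign.category(25, 0),
  assign.category(5, 3),
  assign.category(0, 12),
  assign.category(0, 0),
  assign.category(10, 0)
)
```

Returning NA for the uncovered boundary cases makes the gaps in the rule visible instead of silently assigning such genes to a neighbouring category.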

5.6 Differential Isoform Expression

With the aim of establishing Tag-seq as a sensitive technique for the identification of 3'UTR transcript isoforms that are differentially expressed between GNS and NS cell lines, we evaluated the differential expression of 4,727 genes with multiple expressed tags using two alternative methods: the non-parametric χ² test, and the logarithmic algorithm adapted from Morrissy et al [346]. Non-parametric methods are applied when the assumptions of parametric methods about the underlying distribution of the data are not met, namely that the data are normally distributed or that the sample size is large enough to support the assessment of normality. Whilst the normality assumption is satisfied when tested against the 28,351 tags mapping over 16,025 genes in the six libraries altogether, sample size drops to a range of two to 14 tags per gene when assessing differential expression over a single transcript isoform. With a sample size smaller than 30, the use of non-parametric methods is often advisable. Due to the nature of the SAGE protocol, tags map at the 3'UTR of a gene and identify potential splicing isoforms. We conducted a χ² test with the chisq.test function in R on all tags mapping on to the same gene of the 12,794 tags expressed at higher than 10 tpm in at least one GNS and one NS library. In the second approach, following the logarithmic ratio method adapted by Morrissy et al [346], the isoform expression between the GNS and NS cell lines was evaluated by analyzing the expression of tags that mapped either twice on the same gene or, if more than twice, for all their pair combinations. Again, only tags with a minimum expression of 10 tpm in at least one GNS and one NS library were considered (Table 5.3).

Table 5.3: Summary of statistics using the χ² and logarithmic tests.

                                                                  Non-parametric   Logarithmic
                                                                  χ² test          ratio
Number of tags and tag-pairs mapped (2-14/gene)                   12,792           13,599
Number of genes identified by tags                                4,727            4,541
Number of multiple mapping tags at p < 0.01                       8,169            3,651
Number of genes differentially expressed between isoforms
identified by multiple mapping tags at p < 0.01                   2,682            2,040

The ratio of normalised expression for each tag pair was calculated in each GNS and NS library, and multiplied by the natural logarithm of the difference between the expression of each tag in the pair, ensuring that the more highly expressed tag pairs would also be more highly ranked (equations 5.4 and 5.5). For pairs of genes with positive ratio changes, the ratio was higher in GNS cell lines compared to NS cell lines, while pairs with negative values of ratio changes had a ratio that was higher in NS cell lines rather than GNS cell lines (equation 5.7). Gene pair ratio-change values ranged from -17.4 to 17.7 and no p-values were produced. The equations used are shown below, where S1 and S2 stand for any two isoforms identified by tag mapping:

\[ \frac{S1}{S2} = \ln\left[ \ln\left( S1_{expression} - S2_{expression} \right) \times \left( \frac{S1_{expression}}{S2_{expression}} \right) \right] \tag{5.4} \]

If the second isoform is more highly expressed than the first, the ratio is calculated as:

\[ \frac{S1}{S2} = -1 \times \ln\left[ \ln\left( S2_{expression} - S1_{expression} \right) \times \left( \frac{S2_{expression}}{S1_{expression}} \right) \right] \tag{5.5} \]

If

\[ 0 < \left( S1_{expression} - S2_{expression} \right) \mid \left( S2_{expression} - S1_{expression} \right) < 1 \tag{5.6} \]

then the pair was omitted. For each tag pair, the average and standard deviation was computed for all modified means in GNS cell lines, and separately for all means in NS cell lines. An overall measure of the change in the ratio between GNS and NS cell lines was computed as shown below:

\[ \delta\,\frac{GNS}{NS} = \ln\left( \frac{\mu_{GNS}}{\sigma_{GNS}} \right) - \ln\left( \frac{\mu_{NS}}{\sigma_{NS}} \right) \tag{5.7} \]

Dividing each mean by the standard deviation ensures that tag pairs with lower variance in their ratios are ranked higher than tag pairs with a higher variance [346]. The logarithmic algorithm uses the mean and standard deviation statistics to describe the ratios of expression change between the GNS and NS lines. Unlike the non-parametric method, this algorithm produces a ratio change for each pair of tags rather than for the sum over all tags and the absolute value tells the fold change whilst the sign the direction of the change. The 13,599 tag-pairs analysed pertained to a total of 4,541 genes, a number closely matching the 4,727 genes assessed with the non-parametric method. By ranking the tag-pairs according to magnitude of ratio change and setting a threshold at two-fold, the number of tag-pairs decreased to 3,651 but the number of genes being represented was still substantially high at 2,040, a number comparable to the 2,682 genes identified in the non-parametric method.
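The logarithmic ratio method of equations 5.4 to 5.7 can be sketched in R as follows. The function names and expression values are hypothetical. Two choices are assumptions beyond the text: pairs whose absolute difference is at most 1 are all returned as NA (the text only states the omission for differences between 0 and 1), and the toy values are chosen so that the mean ratio in each group is positive, since equation 5.7 takes the logarithm of mean over standard deviation.

```r
## Hypothetical helper: log ratio for one tag pair in one library (eq. 5.4, 5.5)
pair.log.ratio <- function(s1, s2) {
  if (abs(s1 - s2) <= 1) return(NA_real_)      # pair omitted (cf. eq. 5.6)
  if (s1 > s2) {
    log(log(s1 - s2) * (s1 / s2))              # equation 5.4
  } else {
    -1 * log(log(s2 - s1) * (s2 / s1))         # equation 5.5
  }
}

## Hypothetical helper: overall change in ratio between groups (eq. 5.7)
ratio.change <- function(gns.ratios, ns.ratios) {
  log(mean(gns.ratios) / sd(gns.ratios)) - log(mean(ns.ratios) / sd(ns.ratios))
}

## Toy expression (tpm) of two tags of one gene in three GNS and two NS libraries
gns.s1 <- c(200, 250, 220); gns.s2 <- c(20, 30, 25)
ns.s1  <- c(60, 70);        ns.s2  <- c(40, 45)

gns.ratios <- mapply(pair.log.ratio, gns.s1, gns.s2)
ns.ratios  <- mapply(pair.log.ratio, ns.s1, ns.s2)
delta <- ratio.change(gns.ratios, ns.ratios)
```

With these invented values the first tag dominates more strongly in the GNS libraries than in the NS libraries, so delta is positive. Swapping the two tags of a pair changes only the sign of the per-library ratio.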

5.7 Differential Long ncRNA Expression

We called differential expression with the Bioconductor package DESeq for all tags that mapped to a unique location in the reference human genome with the Bowtie program and filtered the results to exclude protein-coding transcripts. This analysis revealed 25 ncRNAs with strong evidence of differential expres- sion between GNS and NS lines. It is unlikely that these transcripts represent artifacts of the Tag-Seq procedure, because all coincide with published cDNA sequences or ESTs.

5.8 Glioma Expression Signatures

The Tag-seq data described in Parsons et al [383] was kindly provided by Dr. N. Papadopoulos. Following quality control, we excluded three out of 21 sequencing lanes. To treat the expression data in a similar manner as Verhaak et al [511], Tag-seq expression values were transformed by computing log fold change relative to the mean expression across the glioma tumour and xenograft samples. Only those genes identified by Verhaak et al [511] as signature genes for a subtype were used in the correlation calculations for that subtype. To obtain robust fold change estimates, the gene sets were further limited to those genes having a normalized expression level above 25 tags per million in any sample. This reduced the original set of 210 signature genes per subtype to 158, 167, 183 and 164 for "proneural", "neural", "classical" and "mesenchymal" subtypes, respectively. P-values were computed with the R function cor.test.
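The transformation and correlation steps can be illustrated with a short R sketch. All values are invented: the matrix tpm, the sample names and the centroid vector (standing in for the signature values of one subtype) are hypothetical. The pseudocount of 1 added before taking the logarithm is an assumption and is not stated in the text, and "in any sample" is read here as "in at least one sample".

```r
## Toy normalised expression (tpm) for six signature genes (hypothetical values)
tpm <- matrix(c(100, 120, 400,
                 50,  60, 110,
                200, 180, 190,
                 80, 100,  30,
                300, 260,  70,
                  5,   8,  10),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:6),
                              c("tumour1", "tumour2", "GNS1")))
tumour.samples <- c("tumour1", "tumour2")

## Hypothetical subtype centroid (log fold changes) for the same genes
centroid <- c(gene1 = 1.5, gene2 = 0.8, gene3 = 0.1,
              gene4 = -1.0, gene5 = -1.7, gene6 = 0.3)

## Keep signature genes expressed above 25 tpm in at least one sample
keep <- names(centroid)[apply(tpm[names(centroid), ], 1, max) > 25]

## Log fold change relative to the mean across tumour samples
pseudo <- 1                                   # assumption: avoids log of zero
ref <- rowMeans(tpm[keep, tumour.samples])
logfc <- log2((tpm[keep, "GNS1"] + pseudo) / (ref + pseudo))

## Correlation with the subtype centroid and its p-value
ct <- cor.test(logfc, centroid[keep])
```

The estimate and p-value are available as ct$estimate and ct$p.value. Repeating the last step with the centroid of each subtype would give one correlation per subtype for each cell line.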

5.9 External Dataset Expression Correlation

We used Affymetrix Exon 1.0 ST microarray data from The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov; [326]) for 397 glioblastomas and 10 non-neoplastic brain samples (dataset GBM). In addition, we used Affymetrix HG-U133A and HG-U133B data for 24 grade III and 50 grade IV gliomas from Freije et al [148] and 21 grade III and 55 grade IV gliomas from Phillips et al [390] (dataset HGG). For the survival analysis, the datasets from Gravendeel et al [176] and Murat et al [353] were used (Table 5.4). All tumour microarray data were from primary glioma samples obtained at initial diagnosis.

Table 5.4: Public gene expression datasets used to establish tumour expression correlations with the differentially expressed genes found through Tag-seq and with the clinical data available from TCGA in the survival analysis.

| Citation | Accession | Microarray platform (Affymetrix) | GBM | Grade III astrocytoma | Other grade III glioma | Grade I-II glioma | Non-neoplastic brain |
|---|---|---|---|---|---|---|---|
| TCGA [326] | n.a. | Exon 1.0 ST | 397 | 0 | 0 | 0 | 10 |
| Phillips et al [390] | GSE4271 | U133A,B | 55 | 21 | 0 | 0 | 0 |
| Freije et al [148] | GSE4412 | U133A,B | 50 | 8 | 16 | 0 | 0 |
| Gravendeel et al [176] | GSE16011 | U133 Plus 2.0 | 141 | 16 | 66 | 27 | 0 |
| Murat et al [353] | GSE7696 | U133 Plus 2.0 | 70 | 0 | 0 | 0 | 0 |

The diagnosis columns give the number of cases. Gravendeel et al. described 269 samples obtained at histologic diagnosis, from which we excluded 15 containing mostly non-neoplastic tissue and four lacking survival data; n.a. = not applicable.

For dataset GBM, we used processed (level 3) data from TCGA, consisting of one expression value per gene and sample. For the survival analysis datasets and the HGG dataset, the raw microarray data were processed with the Robust Multi-chip Average (RMA) method as implemented in the Bioconductor package affy [161] and probe-gene mappings were retrieved from Ensembl 68 [146]. For genes represented by multiple probesets, expression values were averaged across probesets for randomisation tests, heatmap visualisation and GNS signature score calculation. Differential expression was computed using limma [464]. Randomisation tests were conducted with the limma function geneSetTest, comparing log2(FC) for the sets of core up- or down-regulated genes against the distribution of log2(FC) for randomly sampled gene sets of the same size.
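The idea behind such a randomisation test can be illustrated with a short Python sketch. It is not the limma implementation of geneSetTest; it only shows the comparison of a gene set's mean log2(FC) against randomly drawn sets of the same size. The function name, the number of draws and the use of the mean as the set statistic are assumptions of the example.

```python
import random


def randomisation_p_value(log_fc, gene_set, n_draws=2000, alternative="up", seed=0):
    """Empirical p-value for a gene set against random sets of equal size.

    log_fc: dict gene -> log2 fold change for all genes on the array.
    alternative: "up" tests for unusually high values, "down" for low ones.
    """
    rng = random.Random(seed)
    genes = list(log_fc)
    members = [g for g in gene_set if g in log_fc]
    size = len(members)
    observed = sum(log_fc[g] for g in members) / size
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(genes, size)
        mean = sum(log_fc[g] for g in draw) / size
        if alternative == "up" and mean >= observed:
            hits += 1
        elif alternative == "down" and mean <= observed:
            hits += 1
    # Adding one to numerator and denominator keeps the estimate above zero.
    return (hits + 1) / (n_draws + 1)
```

A small p-value with `alternative="up"` would indicate that the set of core up-regulated genes has higher fold changes in the tumour data than expected for a random gene set.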

Survival analysis was carried out with the R library survival. To combine expression values of multiple genes for survival prediction, an approach inspired by Colman et al [105] was taken. The normalised expression values $x_{ij}$, where $i$ represents the gene and $j$ the sample, were first standardised to be comparable between genes by subtracting the mean across samples and dividing by the standard deviation, thus creating a matrix of z-scores:

$$z_{ij} = \frac{x_{ij} - \bar{x}_i}{\mathrm{SD}(x_i)} \qquad (5.8)$$

Using a set $U$ of $n_U$ genes up-regulated in GNS lines and a set $D$ of $n_D$ genes down-regulated in these cells, we then computed a GNS signature score $s_j$ for each sample $j$ by subtracting the mean expression of the down-regulated genes from the mean expression of the up-regulated genes:

$$s_j = \sum_{i \in U} \frac{z_{ij}}{n_U} - \sum_{i \in D} \frac{z_{ij}}{n_D} \qquad (5.9)$$
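Equations 5.8 and 5.9 can be written out as a short Python sketch. The function names and the data layout (a dictionary of per-gene expression lists) are assumptions of the example, as is the use of the sample standard deviation; a gene with identical values in all samples would need to be removed beforehand, since its standard deviation is zero.

```python
import statistics


def z_scores(expr):
    """Standardise each gene across samples (equation 5.8).

    expr: dict gene -> list of normalised expression values, one per sample.
    """
    z = {}
    for gene, values in expr.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD; an assumption of this sketch
        z[gene] = [(v - mean) / sd for v in values]
    return z


def signature_scores(expr, up_genes, down_genes):
    """GNS signature score per sample (equation 5.9)."""
    z = z_scores(expr)
    n_samples = len(next(iter(expr.values())))
    scores = []
    for j in range(n_samples):
        mean_up = sum(z[g][j] for g in up_genes) / len(up_genes)
        mean_down = sum(z[g][j] for g in down_genes) / len(down_genes)
        scores.append(mean_up - mean_down)
    return scores
```

Samples with a high score express the up-regulated set strongly and the down-regulated set weakly, so the score can be used as a single covariate in the survival model.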

IDH1 mutation calls for TCGA samples were obtained from Firehose data run version 2012-07-07 [143] and data files from the study by Verhaak et al updated 2011-11-28 [408].

5.10 Glioblastoma Pathway Construction

The pathway map was created in Cytoscape 3.0 [451]. Cytoscape is an open-source platform for complex network analysis and visualisation of network data, such as molecular interaction networks or biological pathways, that can be integrated with annotations, gene expression profiles and any other type of useful data. Cytoscape is available for download at www.cytoscape.org. In a Cytoscape network, nodes represent objects, e.g. proteins, and connecting edges represent relationships between objects, e.g. a protein’s physical interaction with another protein. Once this basic network is laid out, attributes can be assigned to nodes and edges to help the visualisation of categories of objects and types of relationships, respectively (Fig 5.7). Cytoscape networks can become extremely complex as layers of attributes are applied to nodes and edges in the form of different colours, shapes, thicknesses and other graphical features that end up representing an ever-increasing amount of biological information pertaining to that network. Finally, Cytoscape-generated networks can be analysed with the use of "plugins", pieces of independent software developed for specific applications on Cytoscape such as graph analysis, clustering, ontology analysis, etc. All plugins can be found at apps.cytoscape.org.


Figure 5.7: Schematic of a typical Cytoscape network, with nodes and edges representing proteins and the type of interaction between them, respectively. The colour and shape of the nodes can encode different characteristics of the proteins, such as their family, enzymatic activity or post-translational modifications. The colour and shape of the edges can encode the type of interaction between proteins, such as activating, inhibiting or coenzymatic. In this example, proteins A and B interact to form a complex that activates protein D, which is also repressed by protein C; once active, protein D activates protein E.

Cytoscape provides a series of pre-defined "layouts", i.e. algorithms that automatically lay a network out by generating arrangements of nodes and edges that either give it a specific appearance, such as the circular, grid and hierarchical layouts, or use the attribute information as a guide, such as the attribute circle, degree sorted circle, force-directed and group attributes layouts. For our purposes we used the edge-weighted force-directed layout, also known as "biolayout", which, once generated, was manually adjusted to obtain the best fit for image generation. To graph the network, the biolayout uses an algorithm that minimises the energy of the model, starting from an initial layout in which the positions of the nodes are randomly assigned. In every iteration, the algorithm then tries to improve the layout according to the energy model, using the first derivative of the energy function to compute a direction and a distance for the movement of each node. Because the generated graphs are large, the minimisation algorithm must keep the complexity of each iteration low; the algorithm of Barnes and Hut [42] is used in Cytoscape’s biolayout for this purpose.
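The principle of an energy-minimising layout can be illustrated with a small Python sketch. It is not Cytoscape's implementation: the energy function (weighted springs on edges plus pairwise repulsion), the constants and all function names are assumptions of the example, and the repulsion is computed naively over all node pairs rather than with the Barnes and Hut approximation.

```python
import math
import random

REST_LENGTH = 1.0   # assumed preferred edge length
REPULSION = 0.1     # assumed strength of node-node repulsion


def random_positions(nodes, seed=0):
    """Initial layout: node positions drawn at random."""
    rng = random.Random(seed)
    return {n: (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for n in nodes}


def _distance(p, q):
    # A small floor avoids division by zero for coincident nodes.
    return max(math.hypot(p[0] - q[0], p[1] - q[1]), 1e-9)


def energy(pos, edges):
    """Weighted spring energy on edges plus repulsion between all node pairs."""
    total = 0.0
    for a, b, weight in edges:
        total += 0.5 * weight * (_distance(pos[a], pos[b]) - REST_LENGTH) ** 2
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            total += REPULSION / _distance(pos[a], pos[b])
    return total


def gradient(pos, edges):
    """First derivative of the energy with respect to each node position."""
    grad = {n: [0.0, 0.0] for n in pos}

    def add(a, b, coeff):
        for axis in (0, 1):
            delta = coeff * (pos[a][axis] - pos[b][axis])
            grad[a][axis] += delta
            grad[b][axis] -= delta

    for a, b, weight in edges:
        d = _distance(pos[a], pos[b])
        add(a, b, weight * (d - REST_LENGTH) / d)
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            d = _distance(pos[a], pos[b])
            add(a, b, -REPULSION / d ** 3)
    return grad


def force_directed_layout(nodes, edges, iterations=200, step=0.1, seed=0):
    """Move nodes down the energy gradient, keeping only improving moves."""
    pos = random_positions(nodes, seed)
    current = energy(pos, edges)
    for _ in range(iterations):
        grad = gradient(pos, edges)
        trial = {n: (pos[n][0] - step * grad[n][0],
                     pos[n][1] - step * grad[n][1]) for n in pos}
        trial_energy = energy(trial, edges)
        if trial_energy < current:
            pos, current = trial, trial_energy   # accept the improving move
        else:
            step *= 0.5                          # otherwise shorten the step
    return pos
```

Each edge is given as a tuple `(source, target, weight)`, so a heavier edge pulls its two nodes more strongly towards the preferred distance.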

The glioblastoma pathway was constructed manually by integrating the information on nodes (genes) and edges (types of interactions between genes) from a variety of glioblastoma pathways found in the literature, with the purpose of generating an all-encompassing network containing all the known interactions that are important in glioblastoma. A fundamental rule that I followed was to check that every protein interaction had been experimentally validated, via literature research and the use of the BioGRID protein interaction database [62] and the IntAct molecule interaction database [205], before adding it to the pathway. If this was not the case, the interaction (edge) and the proteins involved (nodes) were not included in the final glioblastoma pathway. The information integrated to generate the final glioblastoma pathway was taken from four different sources:

· The pathway database of the Kyoto Encyclopedia of Genes and Genomes (KEGG) [3], a collection of manually drawn pathway maps;

· GBMbase [162], a collection of signalling pathways in glioblastoma based on the ones described in the TCGA project [326] including a searchable archive of glioblastoma gene publications;

· "Automated Network Analysis Identifies Core Pathways in Glioblastoma" by Cerami et al 2010 [86];

· "Malignant Astrocytic Glioma: Genetics, Biology, and Paths to Treatment" by Furnari et al 2007 [154].

I also included the information from important pathways in signal transduction that are known to be specifically affected in glioma, namely:

· The Mitogen-activated protein (MAP) kinase signalling cascade, taken from three sources:

– The Cell Signalling Technologies website [1]

– "Mitogen-activated protein (MAP) kinase pathways: regulation and physiological functions" by Pearson et al 2001 [388]

– The pathway database of the Kyoto Encyclopedia of Genes and Genomes [3]

· The p53 pathway, taken from three sources:

– The Panther Pathway repository [5]

– The pathway database of the Kyoto Encyclopedia of Genes and Genomes [4]

– The Biocarta pathway archive [58]


· The Rb1 pathway "RB tumour suppressor/checkpoint signalling in re- sponse to DNA damage" from the Biocarta pathway archive [60]

· The Pten pathway "PTEN dependent cell cycle arrest and apoptosis" from the Biocarta pathway archive [59]

The complete pathway has a total of 238 nodes and the colour mapping is based on fold change absolute values and directions taken from the differential gene expression analysis (see Results section 6.4). The colour mapping was performed using the VistaClara Cytoscape plug-in [234], which assigns colours to nodes matching expression data imported with the "Import Attribute/Expression Matrix File" function of Cytoscape. The assignment is done through the sigmoid function:

$$y = \frac{2}{1 + e^{-sx}} - 1 \qquad (5.10)$$

where the value of the constant $s$ determines the rate of change in the colour gradient, with smaller values allowing the colour to ramp very gradually, and larger values bringing the sigmoid function closer to a step function with a very abrupt colour change between the limiting values $y = -1$ and $y = 1$. The log2(FC) values range between −11 and 11 and a value of $s = 1$ was applied, with darker reds and greens indicating smaller absolute values of log2(FC), and brighter colours indicating greater ones. A pathway image with the same colour coding and gradient used in the tumour correlation heatmap (Fig 7.4) has also been generated (yellow and blue shades) to correlate the expression values of the differentially expressed genes between the two analyses.
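The effect of equation 5.10 on the colour gradient can be illustrated with a few lines of Python. This is not the VistaClara code: the function names, the 0 to 255 intensity scale and the choice of red for positive and green for negative values are assumptions made for the example.

```python
import math


def sigmoid_scale(x, s=1.0):
    """Equation 5.10: maps any real x into the open interval (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-s * x)) - 1.0


def fold_change_to_rgb(log2_fc, s=1.0):
    """Translate a log2 fold change into an (R, G, B) triple.

    Assumption of this sketch: positive values are drawn in red, negative
    values in green, and brightness grows with the absolute scaled value.
    """
    y = sigmoid_scale(log2_fc, s)
    intensity = round(255 * abs(y))
    return (intensity, 0, 0) if y > 0 else (0, intensity, 0)
```

With $s = 1$ the scaled value is already close to its limit for log2(FC) values of about 5 in absolute terms, so most of the visible gradient is spent on moderate fold changes. The sketch is intended for inputs in the range discussed here; very large negative products of `s` and `x` would overflow `math.exp`.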

5.11 MicroRNA Target Prediction Analysis

To answer the question "Is a subset of prediction algorithms better at predicting than any single algorithm?" we used exon array data and microRNA microarray data from a cohort of GNS cell lines that was made available to us. The candidate did not perform this analysis but rather used the data gathered from it in the ensemble analysis of microRNA target predictions. The exon microarray data were analysed in R using software packages from the Bioconductor project [465]. Background correction, quantile normalisation and calculation of probeset expression values from fluorescence data were performed using the Robust Multi-chip Average (RMA) method as implemented in the affy package [161]. We used the xmapcore system, based on the earlier exonmap package and x:map database [369], to associate microarray probesets with protein-coding genes annotated in Ensembl 58. Probeset filtering was applied such that all probe sequences were required to map to exonic gene components. We estimated the expression level of a gene as the median normalised intensity over its associated probesets. Differential gene expression was computed using the limma package [465], where statistical significance was determined with the moderated t-test (eBayes) and the resulting p-values were adjusted using the FDR method. The microRNA microarray data were background corrected using the method described by Edwards et al [132] and implemented in the limma package [465]. Following least-variant set normalisation with default parameters [482], differential expression was computed using the limma package.
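As an illustration of the p-value adjustment mentioned above, the following Python sketch implements the Benjamini-Hochberg procedure, assuming that this is what the "FDR method" refers to. The analysis itself relied on the R implementation; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then called significant at, for example, 10% FDR when its adjusted p-value is at most 0.1.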

Results

Chapter 6

Digital Transcriptome Profiling

Contents

6.1 Clinical Data
6.2 Tag mapping
6.3 Copy Number Aberrations
6.4 Core Differentially Expressed Genes
6.5 Large-scale qRT-PCR Validation
6.6 Literature Mining for Differentially Expressed Genes
6.7 Isoform Differential Expression
6.8 Long ncRNA Differential Expression

6.1 Clinical Data

The clinical data available for our cell lines are somewhat limited because patient consent forms prohibit surgeons from revealing them. However, we do know that all of our GNS cell lines are clinically classified as primary glioblastomas, not secondary. Consistent with this, the PCR experiments performed by Steven Pollard on our GNS cell lines to establish the presence or absence of IDH1 and IDH2 mutations found no evidence of any mutation in their respective loci (oral communication). In the paper by Pollard et al [404] the genetics of our GNS cell lines are described using SNP6.0 arrays, which identified "classic" mutations, in particular frequent loss of CDKN2A:ARF, gains of CDK4 and general glioblastoma-pertinent gene instability. The age of the patients is described in the paper and summarised in table 6.1 below.


Table 6.1: Summary of the available clinical data for our GNS cell lines (M=Male, F=Female, n.a=not applicable).

| Type of cell line | Name of cell line | Tissue | Type | Sex | IDH1 mutation | IDH2 mutation | Patient age (years) |
|---|---|---|---|---|---|---|---|
| GNS | G144 | GBM | Primary | M | No | No | 51 |
| GNS | G144ED | GBM | Primary | M | No | No | 51 |
| GNS | G166 | GBM | Primary | F | No | No | 74 |
| GNS | G179 | Giant cell GBM | Primary | M | No | No | 56 |
| NS | CB541 | Fetal forebrain | n.a. | n.a. | n.a. | n.a. | n.a. |
| NS | CB660 | Fetal forebrain | n.a. | n.a. | n.a. | n.a. | n.a. |

6.2 Tag mapping

We created one Tag-seq library per cell line and obtained between 6 and 13 million sequence reads from each (Table 6.2). Every read was formed by a first sequencing primer, the 17nt tag, and a second sequencing primer in this order (Fig 6.1).

Table 6.2: Summary of reads per cell line library.

| Type of cell line | Cell line | Number of reads |
|---|---|---|
| GNS | G144 | 7,133,520 |
| GNS | G144ED | 6,383,175 |
| GNS | G166 | 13,415,402 |
| GNS | G179 | 11,610,415 |
| NS | CB541 | 12,103,066 |
| NS | CB660 | 10,043,561 |

Figure 6.1: Diagram of the construct generated by the longSAGE protocol sent to the Illumina sequencer for the final step of Tag-seq.

Once the read sequences were received as FASTA files, we first extracted the 17nt tags they contained, then filtered these tags and, finally, aligned them to the genome. The entire process is summarised as a diagram in figure 6.2. Firstly, the 17nt tags were extracted from each read, separating the primer sequences from the tag sequences that actually interested us. Secondly, each tag sequence was counted and the resulting counts were adjusted for sequencing and library preparation errors that might have caused highly expressed transcripts to give rise to a significant number of tags differing from the expected tag sequence by just one base. Thirdly, the tags that were not going to map onto the reference genome (adapter tags formed by the ligation of two sequencing primers, and mitochondrial RNA), as well as rRNA sequences, were filtered out. As shown in figure 6.3, on average more than 90% of tags passed these filters, meaning that most of the sequenced data were available for us to use in subsequent analyses, the first one being alignment to the reference genome.

Figure 6.2: Step by step diagram of the extraction, filtering and mapping phases for reads and tags. The extracted tag population is coloured with different shades of grey that represent a different origin (adapter, mitochondrial or ribosomal tag). In the mapping phase, the diagram to the right represents the human reference genome, and some of the filtered tags (light grey) map onto known transcripts (gene A, gene B) and non-coding regions.

Figure 6.3: Proportion of tags remaining after the filtering steps for adapter contamination and mitochondrial or rRNA tags. Values are normalised across all cell lines.

Finally, the tag sequences contained within the recounted and filtered reads were mapped in two complementary ways. Note that, in order to maximise the ability of the aligner to map our short tag sequences to the reference genome, we included the CATG recognition site of the NlaIII anchoring enzyme at the 5’ end of every tag sequence; this generated 21nt tag sequences that were used for alignment to the reference genome. For each library, we mapped the resulting pool of 21nt tags to the human reference genome, as well as to a virtual tag-to-gene library that we assembled from already existing tag-to-gene libraries and a complementary one that we generated programmatically. The tag mapping strategy we adopted was a hierarchical one, as summarised in figure 6.4. This allowed us to generate sets of decreasing stringency, with respect to the number of tags mapping to known transcripts, that were best suited to different analyses. We first mapped all the tags obtained from the process described in fig 6.2 to known transcripts, then we mapped the remaining tags to cDNA libraries. Finally, we mapped the remaining tags to the genome for the potential discovery of new unannotated transcripts. As a result of this strategy, we collected in "Ref_uniq", our most stringent set, all the tags mapping to a single reference gene and in "Ref_best", our second most stringent set, all the tags included in the former with the addition of tags mapping to multiple genes of which, however, only one was a reference transcript. The conservative set of tag-to-transcript mappings ("Ref_uniq") comprised 25,593 unique tags assigned to 15,103 genes, and formed our primary dataset for further analysis. In addition, we created several more comprehensive tag mapping sets by including mappings to other cDNA sequences, expressed sequence tags, internal NlaIII sites and unannotated genomic regions ("cDNA_uniq" and "cDNA_best").
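The extraction and prefixing steps can be sketched in Python as follows. The primer sequences are passed in as parameters because they are not given here, and the function names are hypothetical; the correction of counts for single-base errors and the filtering of adapter, mitochondrial and rRNA tags are deliberately left out of this sketch.

```python
from collections import Counter

TAG_LENGTH = 17        # length of the tag within each read
ANCHOR_SITE = "CATG"   # recognition site added to the 5' end before alignment


def extract_tag(read, primer_5, primer_3):
    """Return the 17nt tag of a read shaped primer_5 + tag + primer_3, else None."""
    if not (read.startswith(primer_5) and read.endswith(primer_3)):
        return None
    tag = read[len(primer_5):len(read) - len(primer_3)]
    return tag if len(tag) == TAG_LENGTH else None


def count_alignment_tags(reads, primer_5, primer_3):
    """Count the 21nt sequences (CATG + tag) that would be sent to the aligner."""
    counts = Counter()
    for read in reads:
        tag = extract_tag(read, primer_5, primer_3)
        if tag is not None:
            counts[ANCHOR_SITE + tag] += 1
    return counts
```

Reads that do not match the expected construct are simply skipped, so the resulting counter only contains sequences of length 21.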

Figure 6.4: Diagram of the tag mapping strategy where each filtered tag is assigned to a more or less stringent set used for future analyses.
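The hierarchy can be expressed as a simple decision rule, sketched below in Python. Only the definitions of "Ref_uniq" and "Ref_best" follow the description in the text. The treatment of the cDNA tier, the labels "genome" and "unassigned", the handling of tags hitting several reference genes and the function name are assumptions made for illustration.

```python
def assign_tag(reference_genes, cdna_genes, genome_hits):
    """Return the most stringent set a tag falls into.

    reference_genes: reference genes the tag maps to.
    cdna_genes: other cDNA or EST derived genes the tag maps to.
    genome_hits: number of genomic locations the tag maps to.
    """
    ref = set(reference_genes)
    other = set(cdna_genes) - ref
    if len(ref) == 1:
        # A single reference gene: unique if nothing else is hit, otherwise
        # the reference gene is taken as the best of several candidates.
        return "Ref_uniq" if not other else "Ref_best"
    if not ref and len(other) == 1:
        return "cDNA_uniq"
    if not ref and len(other) > 1:
        return "cDNA_best"   # assumption: tie handling is not described
    if not ref and genome_hits > 0:
        return "genome"      # candidate unannotated transcript
    return "unassigned"      # includes tags hitting several reference genes
```

Because the function returns the most stringent label, the nested set "Ref_best" as described above corresponds to the union of tags labelled "Ref_uniq" and "Ref_best" here.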

As shown in figure 6.5, on average more than 55% of tags (green) in every cell line could not be classified as belonging to any of the sets identified with the hierarchical tag assignment process described above and summarised in fig 6.4. This showed that most of the transcriptional activity in our cell lines could not be attributed to known transcripts or cDNAs, and suggested that much of the regulation might be happening at the non-coding transcriptional level. On average, in every cell line, less than 45% of tags mapped perfectly to the reference nuclear genome, i.e. the tags contained within the "Ref_uniq", "Ref_best", "cDNA_uniq" and "cDNA_best" sets in the non-green portions of fig 6.5. This indicated that almost half of the transcriptional activity detected in our cell lines could be traced back to known gene functions, giving us the possibility to correlate gene expression in our cell lines with, for example, known cancer pathways or other data from similar experiments. Of these 45% of tags, on average 32% were specifically assigned to high-quality reference transcript models from the widely used Ensembl, RefSeq and Mammalian Gene Collection (MGC) databases (i.e., belonged to the "Ref_uniq" and "Ref_best" sets). Reassuringly, other Tag-seq studies have reported similar figures [194,346].

Given that glioblastoma is a heterogeneous tumour, and that our samples were taken from three distinct glioblastoma patients, we examined the correlation of gene expression between the cell lines (Fig 6.6) to see whether it would support or weaken that expectation. For this analysis we calculated a single expression value for each gene in each cell line by summing the counts of all tags assigned to that gene. We observed the following:

· The correlation between the technical replicates G144 and G144ED was the highest (Pearson R = 0.94), as expected since the two cell lines were derived from the same tumour but in different laboratories. Differences between these replicate measurements can be due to a number of factors in the procedure, from cell line establishment to tag sequencing.

· The correlation between the two NS cell lines was also high (R = 0.87).

· Correlations among the GNS cell lines G166, G179 and G144 were lower (R ranging from 0.78 to 0.82), reflecting larger differences in gene expression profiles than between the two replicates or the two NS cell lines. This was to be expected, since these GNS cell lines originated from histologically distinct tumours, while the NS cell lines presumably had wild-type transcriptional activity.

6.3 Copy Number Aberrations

Since the observed differences in gene expression can be caused by a number of factors, such as chromosomal aberrations (including rearrangements, losses and gains of DNA) and alterations in transcriptional regulation and mRNA degradation, we analysed array comparative genomic hybridization (aCGH) data for G144, G166 and G179 [404]. The candidate did not perform this analysis, but the results are shown nonetheless to give the reader a clearer understanding of the genomic identity of these cell lines. The aCGH method reveals net gains and losses in a cell population relative to the average ploidy. Previous analysis of chromosomal alterations in G144, G179 and G166 by SKY and CGH detected several alterations, mostly whole-chromosome gains and losses, and found that the lines do not show gross chromosomal instability when cultured. Specifically, the CGH data for G144 revealed the deletion of PTEN on chromosome 10 [404]. These findings were confirmed by our analysis of the aCGH data (Fig 6.7). We identified aberrations known to be common in glioblastoma, including gain of chromosome 7 (EGFR, which is over-expressed in 40% of glioblastomas [154,262], lies on chromosome 7) and losses of large parts of chromosomes 10, 13, 14 and 19 in more than one GNS line, as well as focal gain of CDK4 in G144 (arrow, chromosome 12 in figure 6.7) and focal loss of the CDKN2A-CDKN2B locus in G179 (arrow, chromosome 9 in figure 6.7) [52,326]. We found that both G166 and G179 had low copy numbers of chromosome 10, but could not establish a focal loss of PTEN in the way we established the focal gain of CDK4.

Figure 6.5: Proportion of filtered tags that, after alignment to the reference genome, cDNA libraries and virtual tag-to-gene libraries, are assigned to a set. Values are normalised across all cell lines.

Figure 6.6: Correlations of gene expression for all pairwise combinations of GNS/GNS and NS/NS cell lines.


Figure 6.7: Summary of the results of the CGH arrays performed on cell lines G144, G166 and G179. Dots indicate log2 ratios for array CGH probes along the genome, comparing each GNS line to normal female DNA. The coloured segments indicate gain (red) and loss (green) calls, with colour intensity proportional to mean log2 ratio over the segment. The X chromosome was called as lost in G144 and G179 because these two cell lines are from male patients; sex-linked genes were excluded from downstream analyses of aberration calls. Segments above the grey line at zero indicate gains and segments below it indicate losses over the entire genome.

On a global level, we found a correlation between aberrations and expression levels, but this trend was modest (Fig 6.8a). Among the 459 autosomal genes that we found to be up-regulated in GNS relative to NS lines by Tag-seq, there was a clear enrichment of gains (p = 0.002 with Fisher’s exact test) and depletion of losses (p = 0.002) compared to all autosomal genes expressed in the GNS or NS lines. Down-regulated genes showed an opposite, but weaker, trend (Table 6.3, Fig 6.8b). We also observed that many of the 29 genes that were found to distinguish GNS lines from NS lines by qRT-PCR had associated aberration calls that suggested their expression levels may be dictated by a loss or a gain in their copy numbers (Fig 6.8c). Overall, these results suggest that copy number changes are a significant cause of the observed gene expression changes. However, other factors may be more important, because only a minority of up-regulated genes (21%) showed evidence of gains. Thus, regulatory changes and other alterations not detectable by aCGH, such as balanced translocations and small mutations, likely play a major role in shaping the GNS transcriptome.

Table 6.3: Significance of the correlation found between CNAs and expression levels measured with Fisher’s exact test (p-value).

| | Gains | Losses |
|---|---|---|
| Up-regulated | 0.002163 | 0.002286 |
| Down-regulated | 0.06997 | 0.4257 |

Figure 6.8: (a) Curves show distributions of expression level differences between GNS and NS lines, stratified by aberration calls. The distributions for genes in segments without aberrations (neutral) peak near the zero mark, corresponding to an equal expression level in GNS and NS lines. Conversely, genes in lost and gained regions tend to be expressed at lower and higher levels, respectively. In each plot, log2(FC) is computed between the indicated GNS line and the mean of the two NS lines, and capped at (-8, 8) for visualisation purposes. To obtain robust FC distributions, genes with low expression (< 25 tpm) in both GNS and NS conditions were excluded; consequently, between 6,014 and 6,133 genes underlie each plot. (b) For each of the three gene sets listed in the legend (inset), bars represent the percentage of genes with the indicated copy number status. (c) Aberration calls for the 29 genes that were found to distinguish GNS from NS lines by qRT-PCR. Circles indicate focal (< 10 Mb) aberrations; boxes indicate larger chromosomal segments.

6.4 Core Differentially Expressed Genes

To identify genes with differing expression between GNS and NS cells, we used the Bioconductor package DESeq on the three GNS lines G144, G166 and G179, and the two NS lines CB541 and CB660. The DESeq package, unlike its predecessor edgeR, uses a method whose core assumption is that the mean is a good predictor of the variance, which implies that genes with similar expression levels will have similar variance across replicates. Given this assumption, this method, developed by Simon Anders [21], estimates a function that predicts the variance from the mean by calculating the sample mean and variance for each gene within replicates, and then fitting a curve to these data. With the diagnostic plot shown in fig 6.9 we verified that the estimated variance function followed the empirical variance well enough, as indicated by the red line representing the local regression fit, even though the spread of the single-gene variance values is considerable, as one should expect given that each variance value is estimated from just four values.

Figure 6.9: Plot of the estimates for each gene of the variance against the base levels, i.e. the count value for a tag divided by the total number of counts. The red line represents the fit from the local regression.

Having estimated and verified the variance-to-mean dependence, we then proceeded to look for differentially expressed genes using the DESeq package function nbinomTest. With this function we generated the following values for each gene: the mean expression level, as a joint estimate between conditions "N" (normal) and "T" (tumour) and as a separate estimate for each condition; the fold change (FC) from condition N to T; the natural logarithm (Ln) of the fold change; and the p-value for the statistical significance of this change. The nbinomTest function also computes an adjusted p-value, which corrects the p-value for multiple testing with the Benjamini-Hochberg procedure and thereby controls the FDR. We first plotted the computed fold changes against the mean values and coloured in red the genes that were significant at 1% FDR (Fig 6.10). Interestingly, these genes seemed to cluster at higher values of the mean and of the absolute value of Ln(FC), indicative of gene expression levels that were coherently higher across all GNS versus NS cell lines, or all NS versus GNS cell lines.

Figure 6.10: Plot of the fold change computed with the DESeq package function nbinomTest versus the mean for the contrast of the two conditions "N" (normal) and "T" (tumour). The red genes are significant at 1% FDR.

At an FDR of 10%, this analysis revealed 485 genes to be expressed at a higher average level in GNS cells (up-regulated) and 254 genes to be down-regulated (see Appendix A.1). To discern genes that are consistently up- or down-regulated among the GNS lines compared to the NS lines, and thus capture major gene expression changes common to G144, G166 and G179, we set strict criteria on fold changes and tag counts requiring that:

1. each GNS line show at least a two-fold change compared to each NS line, with the direction of change being consistent among the cell lines;

2. the absolute expression level be above 30 tpm in each GNS line for up-regulated genes, or in each NS line for down-regulated genes.

This stringent approach yielded 32 up-regulated and 60 down-regulated genes, referred to in the following as strictly up- and down-regulated genes, respectively, or "core" differentially expressed genes (Table 6.4).
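The two criteria can be written as a small Python filter. This sketch covers only the fold change and abundance rules and assumes it is applied to genes that already passed the 10% FDR threshold; the function name and argument layout are hypothetical.

```python
def classify_core_gene(gns_tpm, ns_tpm, min_fold=2.0, min_tpm=30.0):
    """Return "up", "down" or None for one gene.

    gns_tpm: normalised tag counts (tpm) in the GNS lines G144, G166, G179.
    ns_tpm: normalised tag counts (tpm) in the NS lines CB541, CB660.
    Products are compared instead of ratios so that zero counts are safe.
    """
    pairs = [(g, n) for g in gns_tpm for n in ns_tpm]
    if all(g >= min_fold * n for g, n in pairs) and all(g > min_tpm for g in gns_tpm):
        return "up"
    if all(n >= min_fold * g for g, n in pairs) and all(n > min_tpm for n in ns_tpm):
        return "down"
    return None
```

For example, with the normalised tag counts listed for BACE2 in Table 6.4 (423.1, 253.4 and 468.2 in the GNS lines against 0.0 and 2.7 in the NS lines) the function returns "up", and with those listed for ATP1A2 (37.5, 0.6 and 0.0 against 471.0 and 1346.4) it returns "down".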


Table 6.4: Genes (in alphabetical order) with large expression changes common to the GNS lines G144, G166 and G179, relative to the normal NS lines CB541 and CB660, obtained by filtering the genes differentially expressed at 10% FDR with stringent criteria. The columns log2(FC) and FDR give the differential expression results; the remaining columns give normalised tag counts per cell line.

| Gene symbol | Entrez gene ID | log2(FC) | FDR | G144ED | G144 | G166 | G179 | CB541 | CB660 |
|---|---|---|---|---|---|---|---|---|---|
| ADD2 | 119 | 6.32 | 5.1E-04 | 66.4 | 91.9 | 103.1 | 288.2 | 0.0 | 4.1 |
| ATP1A2 | 477 | -6.16 | 1.3E-08 | 39.0 | 37.5 | 0.6 | 0.0 | 471.0 | 1346.4 |
| BACE2 | 25825 | 7.97 | 5.5E-11 | 227.5 | 423.1 | 253.4 | 468.2 | 0.0 | 2.7 |
| C10orf11 | 83938 | 3.82 | 1.7E-03 | 195.7 | 603.7 | 175.1 | 152.8 | 20.5 | 23.6 |
| C5orf13 | 9315 | -3.29 | 4.7E-03 | 230.2 | 162.6 | 55.9 | 234.9 | 1985.9 | 962.7 |
| C9orf125 | 84302 | -3.78 | 6.1E-04 | 73.9 | 59.0 | 3.6 | 8.0 | 508.5 | 148.1 |
| C9orf64 | 84267 | 7.88 | 5.3E-10 | 114.7 | 156.8 | 164.1 | 752.5 | 2.9 | 0.0 |
| CA12 | 771 | -4.65 | 9.7E-05 | 45.9 | 28.1 | 22.6 | 30.3 | 945.6 | 415.1 |
| CACNG8 | 59283 | -4.77 | 5.7E-05 | 33.1 | 28.3 | 3.9 | 2.6 | 405.1 | 220.9 |
| CCND2 | 894 | -3.98 | 9.3E-05 | 158.0 | 125.2 | 0.6 | 0.0 | 827.8 | 496.4 |
| CD74 | 972 | 7.11 | 7.9E-13 | 3942.1 | 2107.8 | 93.4 | 1427.2 | 8.4 | 9.1 |
| CD9 | 928 | 4.87 | 2.7E-06 | 1058.3 | 1082.4 | 396.5 | 189.3 | 4.8 | 32.9 |
| CEBPB | 1051 | 3.29 | 5.9E-03 | 106.7 | 155.6 | 552.5 | 270.9 | 36.7 | 30.2 |
| CHCHD10 | 400916 | 3.70 | 9.0E-04 | 844.2 | 713.0 | 523.9 | 457.9 | 28.4 | 58.8 |
| CTSC | 1075 | 3.55 | 1.0E-03 | 937.2 | 731.4 | 311.7 | 1499.9 | 10.3 | 133.2 |
| CXXC4 | 80319 | -4.84 | 1.2E-03 | 23.5 | 21.1 | 0.0 | 0.0 | 219.5 | 200.8 |
| DDIT3 | 1649 | 4.40 | 4.6E-05 | 933.6 | 1345.3 | 748.9 | 250.5 | 26.9 | 47.3 |
| DNER | 92737 | -3.89 | 1.8E-03 | 90.5 | 523.2 | 9.3 | 847.4 | 8865.3 | 4813.8 |
| DTX4 | 23220 | -5.92 | 4.6E-05 | 6.8 | 7.8 | 3.2 | 0.0 | 128.8 | 312.5 |
| EDA2R | 60401 | -5.58 | 7.6E-04 | 1.4 | 6.2 | 9.7 | 0.0 | 306.7 | 208.8 |
| EPDR1 | 54749 | 3.09 | 5.6E-03 | 310.9 | 269.0 | 739.5 | 319.0 | 20.3 | 82.3 |
| FAM38B | 63895 | -Inf | 5.5E-11 | 0.0 | 0.0 | 0.0 | 0.0 | 778.6 | 166.8 |
| FAM69A | 388650 | 3.68 | 2.2E-03 | 513.2 | 393.1 | 155.0 | 439.3 | 27.9 | 23.3 |
| FBLN2 | 2199 | -Inf | 2.0E-07 | 0.0 | 0.0 | 0.0 | 0.0 | 232.1 | 117.5 |
| FOXG1 | 2290 | 3.30 | 6.8E-03 | 445.0 | 505.5 | 104.5 | 137.4 | 0.0 | 50.8 |
| FOXJ1 | 2302 | -4.19 | 3.6E-03 | 4.9 | 3.1 | 4.4 | 27.8 | 254.8 | 175.8 |
| FUT8 | 2530 | 3.32 | 4.6E-03 | 666.0 | 999.8 | 609.6 | 2352.7 | 95.8 | 168.6 |
| GABBR2 | 9568 | -9.50 | 1.2E-13 | 0.0 | 0.0 | 5.6 | 0.0 | 2569.7 | 143.1 |
| GPR158 | 57512 | -4.55 | 9.9E-06 | 103.8 | 117.4 | 0.0 | 53.3 | 2311.0 | 354.9 |
| GRIA1 | 2890 | -4.42 | 4.2E-05 | 33.1 | 68.7 | 2.7 | 73.4 | 1780.2 | 292.7 |
| HLA-DRA | 3122 | 5.68 | 6.6E-07 | 9744.0 | 5434.9 | 138.4 | 270.5 | 57.4 | 19.2 |
| HMGA2 | 8091 | -5.68 | 3.7E-05 | 0.0 | 0.0 | 0.0 | 23.7 | 233.3 | 576.3 |
| HOXD10 | 3236 | Inf | 3.9E-11 | 77.0 | 145.9 | 115.3 | 578.2 | 0.0 | 0.0 |
| IL17RD | 54756 | -4.38 | 8.0E-04 | 25.5 | 17.5 | 9.8 | 15.3 | 364.6 | 230.4 |
| IRX2 | 153572 | -4.65 | 7.2E-04 | 0.0 | 0.0 | 0.0 | 28.9 | 370.0 | 111.1 |
| KALRN | 8997 | -5.25 | 1.9E-04 | 0.0 | 6.3 | 2.5 | 7.1 | 302.9 | 106.2 |
| KCTD12 | 115207 | -4.77 | 4.0E-03 | 48.7 | 16.9 | 10.1 | 54.5 | 1349.0 | 147.8 |
| LAMA2 | 3908 | -5.21 | 4.4E-06 | 47.3 | 25.2 | 8.0 | 2.6 | 730.2 | 149.0 |
| LGALS3 | 3958 | 6.44 | 6.1E-07 | 2062.3 | 3254.0 | 2538.7 | 134.8 | 32.1 | 14.3 |
| LMO2 | 4005 | -4.82 | 2.5E-04 | 20.3 | 21.2 | 0.0 | 4.9 | 134.6 | 369.0 |
| LMO3 | 55885 | -Inf | 1.0E-08 | 0.0 | 0.0 | 0.0 | 0.0 | 136.9 | 446.0 |
| LMO4 | 8543 | 3.80 | 5.0E-04 | 553.8 | 1311.2 | 455.5 | 1149.8 | 16.7 | 122.5 |
| LPAR6 | 10161 | -3.62 | 5.6E-03 | 29.8 | 33.8 | 39.6 | 0.0 | 395.9 | 211.3 |
| LUM | 4060 | -4.26 | 3.2E-03 | 0.0 | 0.0 | 110.9 | 0.0 | 507.4 | 897.7 |
| LYST | 1130 | 5.73 | 3.5E-09 | 126.7 | 180.8 | 284.0 | 548.6 | 10.3 | 2.1 |
| MAF | 4094 | -4.79 | 9.6E-04 | 14.9 | 7.1 | 16.0 | 0.0 | 282.4 | 150.4 |
| MAN1C1 | 57134 | 3.59 | 1.8E-03 | 556.5 | 478.6 | 107.6 | 378.3 | 7.6 | 45.2 |
| MAP6 | 4135 | -3.25 | 7.6E-03 | 2.3 | 32.9 | 0.6 | 111.4 | 629.4 | 288.0 |
| MBP | 4155 | 6.47 | 2.7E-09 | 118.6 | 402.5 | 103.3 | 303.0 | 0.0 | 6.3 |
| MMP17 | 4326 | 3.58 | 1.0E-03 | 1173.5 | 864.3 | 422.6 | 340.8 | 5.1 | 84.8 |
| MMRN1 | 22915 | -8.80 | 1.6E-06 | 0.0 | 0.0 | 1.3 | 0.0 | 279.4 | 93.2 |
| MN1 | 4330 | -7.23 | 3.0E-07 | 21.6 | 4.2 | 0.6 | 0.0 | 146.8 | 384.1 |
| MT2A | 4502 | 4.13 | 1.3E-03 | 2039.2 | 1924.4 | 8308.0 | 5237.5 | 285.0 | 301.4 |
| MYL9 | 10398 | -3.85 | 4.3E-03 | 3.8 | 3.1 | 110.0 | 8.5 | 796.2 | 370.3 |
| NDN | 4692 | -3.46 | 2.3E-03 | 105.3 | 84.2 | 0.0 | 0.0 | 214.4 | 405.4 |
| NELL2 | 4753 | -3.25 | 1.0E-02 | 117.3 | 173.6 | 1.9 | 380.9 | 1073.3 | 2440.6 |
| NKX2-1 | 7080 | -5.52 | 1.6E-06 | 4.1 | 17.2 | 0.0 | 0.0 | 425.3 | 103.6 |
| NNMT | 4837 | 3.74 | 5.0E-04 | 519.9 | 172.4 | 918.1 | 175.6 | 15.2 | 47.4 |
| NPTX2 | 4885 | -6.47 | 1.8E-06 | 0.0 | 0.0 | 6.9 | 4.8 | 422.0 | 272.4 |
| NTN1 | 9423 | -4.22 | 1.5E-04 | 95.8 | 239.9 | 22.4 | 250.6 | 5458.0 | 966.2 |
| NTRK2 | 4915 | 3.61 | 2.6E-03 | 90.7 | 163.9 | 174.8 | 219.5 | 0.0 | 30.3 |
| ODZ2 | 57451 | -3.68 | 6.5E-03 | 27.0 | 25.0 | 2.5 | 36.0 | 212.8 | 329.5 |
| PDE1C | 5137 | 4.13 | 1.3E-04 | 578.2 | 430.3 | 1117.8 | 981.4 | 45.7 | 50.8 |
| PDZRN3 | 23024 | -4.40 | 9.4E-04 | 18.4 | 22.1 | 12.6 | 3.0 | 278.2 | 251.3 |
| PEG3 | 5178 | -5.14 | 6.0E-08 | 105.7 | 130.6 | 8.5 | 0.0 | 2681.0 | 594.6 |
| PI15 | 51050 | -6.45 | 9.3E-08 | 20.2 | 3.3 | 2.5 | 5.5 | 397.8 | 262.2 |
| PLA2G4A | 5321 | 7.34 | 1.9E-12 | 909.7 | 1226.5 | 483.7 | 755.9 | 0.0 | 10.1 |
| PLCH1 | 23007 | -3.41 | 2.4E-03 | 152.0 | 134.0 | 83.3 | 118.9 | 2002.9 | 384.3 |
| PLS3 | 5358 | 3.83 | 3.1E-04 | 360.2 | 776.6 | 476.6 | 367.2 | 0.0 | 75.6 |
| PMEPA1 | 56937 | 4.36 | 5.4E-05 | 240.8 | 123.3 | 621.8 | 297.2 | 11.7 | 22.1 |
| PRSS12 | 8492 | 4.80 | 2.1E-04 | 218.6 | 119.9 | 151.9 | 156.0 | 0.0 | 10.2 |
| PTEN | 5728 | -5.00 | 2.9E-03 | 2.7 | 0.0 | 14.8 | 0.0 | 191.7 | 126.9 |
| RAB6B | 51560 | -3.51 | 2.2E-03 | 71.7 | 52.1 | 29.4 | 38.4 | 754.8 | 160.7 |
| RGS5 | 8490 | -4.82 | 2.9E-04 | 4.1 | 13.5 | 6.9 | 3.0 | 343.5 | 112.5 |
| RTN1 | 6252 | -5.34 | 5.7E-05 | 8.1 | 5.5 | 0.6 | 19.3 | 385.7 | 286.3 |
| S100A6 | 6277 | 6.04 | 5.1E-14 | 2259.3 | 324.2 | 5709.9 | 512.7 | 12.3 | 53.4 |
| SALL2 | 6297 | -4.13 | 1.7E-03 | 72.9 | 38.6 | 16.5 | 11.6 | 194.2 | 581.9 |
| SDC2 | 6383 | -3.56 | 2.9E-03 | 96.7 | 41.3 | 53.0 | 98.9 | 1015.1 | 502.4 |
| SEMA6A | 57556 | -3.48 | 3.3E-03 | 68.1 | 50.0 | 3.1 | 0.0 | 257.1 | 140.0 |
| SIX3 | 6496 | -5.50 | 2.9E-06 | 2.7 | 26.1 | 0.0 | 0.0 | 203.9 | 594.6 |
| SLC4A4 | 8671 | -4.49 | 6.1E-05 | 36.0 | 38.9 | 13.1 | 18.8 | 925.2 | 145.3 |
| SLCO1C1 | 53919 | -Inf | 5.9E-07 | 7.7 | 0.0 | 0.0 | 0.0 | 166.2 | 151.8 |
| SLCO2A1 | 6578 | -5.61 | 1.0E-05 | 38.8 | 15.5 | 0.0 | 0.0 | 213.0 | 297.0 |
| SLIT2 | 9353 | -7.18 | 1.5E-03 | 1.4 | 0.0 | 0.7 | 5.2 | 178.8 | 400.8 |
| SPARCL1 | 8404 | -4.03 | 1.5E-03 | 44.2 | 39.0 | 4.8 | 282.9 | 1448.1 | 2097.6 |
| ST6GALNAC5 | 81849 | -8.25 | 3.4E-09 | 8.9 | 0.0 | 0.0 | 5.8 | 261.3 | 899.5 |
| SULF2 | 55959 | 3.31 | 5.3E-03 | 749.0 | 891.0 | 253.5 | 581.6 | 43.3 | 72.4 |
| SYNM | 23336 | -4.23 | 3.3E-03 | 1.4 | 0.0 | 0.0 | 36.4 | 238.8 | 212.8 |
| TAGLN | 6876 | -7.06 | 4.4E-11 | 1.9 | 3.1 | 9.4 | 4.1 | 1089.6 | 384.4 |
| TES | 26136 | -Inf | 2.5E-09 | 0.0 | 0.0 | 0.0 | 0.0 | 433.0 | 184.2 |
| TRAM1L1 | 133022 | -5.13 | 2.5E-03 | 0.0 | 0.0 | 0.0 | 11.3 | 121.7 | 146.3 |
| TUSC3 | 7991 | -3.74 | 5.9E-03 | 0.0 | 0.0 | 5.9 | 75.0 | 379.0 | 333.7 |

In a broad literature search covering each gene of the core set, we found that many of the up- and down-regulated genes appear in pathways known to be affected in glioma. In the up-regulated cohort we found:

· S100A6 encodes an EF-hand calcium binding protein linked to the regulation of cell proliferation, cytoskeletal organisation and metastasis. S100A6 is up-regulated in several malignancies. In gliomas, S100A6 appears to be specifically expressed by cancer stem cells [80,186].

· HLA-DRA and HLA-DRB form one of the MHC class II receptors, which present extracellular antigens to T-helper cells. CD74 or "invariant chain" takes the place of the extracellular peptide before MHC class II receptors are mature in the lumen of the endoplasmic reticulum (ER). Increased antigen presentation can help the immune system target cancer cells. HLA-DRA and HLA-DRB are induced by inflammatory cytokines and up-regulated in many tumours [421].

· MBP encodes myelin basic protein, so its expression in GNS lines probably reflects their glial nature.

· PMEPA1 can act as an oncogene by affecting the PI3K pathway, including PTEN down-regulation. PMEPA1 is up-regulated in multiple tumours, but there are no reports implicating it in glioma [460,531].

· LMO4 encodes a transcriptional regulator involved in the development of multiple organs, including the CNS. LMO4 expression is elevated in several cancers. LMO4 is well-characterised as an oncogene in breast cancer, but has not been studied in glioma. LMO4 is regulated through the PI3K pathway, which is known to be affected in glioma [?,340].

· The transcription factor gene CEBPB is a well-characterised glioma oncogene, recently suggested to be a master regulator of a mesenchymal gene expression signature associated with poor prognosis [85].

· CD9 is a membrane protein implicated in invasion [253] and regulation of EGF receptor activation [455]. Its expression in astrocytic tumours correlates with malignancy [220].

· MT2A encodes a metallothionein found to suppress apoptosis in cancer cell lines [110], potentially by causing TP53 to misfold [411]. Inversely correlated expression of MT2A and TP53 proteins has been observed in glioblastoma specimens [306].

· NTRK2 encodes a receptor for brain-derived neurotrophic factor, promoting differentiation, proliferation and survival [118]. In several tumour types, its expression correlates with poor prognosis and metastasis [118] and the protein has been detected in a subset of cells in astrocytomas [29]. As shown in figure 6.11, the Tag-seq data for NTRK2 identify a long and a shorter isoform of this gene; the shorter one is potentially the isoform lacking the kinase domain, which has also been implicated in the regulation of astrocyte morphology [366].

· FOXG1, a transcription factor gene involved in brain development, is commonly amplified in the childhood brain tumour medulloblastoma [9]. FOXG1 has been proposed to act as an oncogene in glioblastoma as well, by suppressing growth-inhibitory effects of TGFβ [448].

The set of core up-regulated genes also includes multiple genes suggested to play a role in other neoplasias, but for which we failed to find any studies implicating them in glioma. For five of these, previous studies have pointed to a role in motility and invasion: PLS3 [26,163], LMO4 [475], BACE2 [243,499], CTSC [113,530,551] and MMP17 [88,430]. Further examples of genes implicated in other cancers include the putative growth factor EPDR1 [8,360], which is highly expressed in some progenitor cell types [178], and PMEPA1, which can act as an oncogene by affecting the PI3K pathway and PTEN stability [460]. A subset of the core up-regulated genes have, to the best of our knowledge, not been implicated in cancer. One of these putative oncogenes is PLA2G4A, which encodes a cytoplasmic phospholipase (cPLA2α) involved in production of lipid signaling molecules with mitogenic and pro-inflammatory effects [192]. Another example is PDE1C, encoding a cyclic nucleotide phosphodiesterase that controls cellular levels of cAMP and cGMP and may regulate cell proliferation [126]. PDE1C expression has been observed in other glioma cell lines, but also in non-malignant astrocytes from rat brain [508]. Another putative novel oncogene is C10orf11. Disruptions of this gene have been associated with mental retardation, consistent with a developmental role [501]. Although the human gene is poorly characterised, the orthologous gene in Ciona intestinalis is required for embryogenesis and is a component of the Wnt/β-catenin signaling pathway, which is often activated in cancer [519].

Figure 6.11: NTRK2, which encodes a receptor for brain-derived neurotrophic factor, has been detected in the literature in a subset of cells in astrocytomas [29] and is found in our Tag-seq dataset through tags mapping onto the shortest isoform. Three tracks are visible in the figure: at the top, a customised track composed of 12 sub-tracks representing the expression levels of our GNS and NS cell lines, in red for tag mappings on the minus strand and in blue for tag mappings on the plus strand (the level of tag expression is indicated by the thickness of the line representing the mapped tag); at the centre, a track representing UCSC genes, RefSeq genes and Ensembl genes; at the bottom, a track showing a zoom-in of the region of the central track where the tag mapping takes place. The thicker lines in the bottom track indicate the 3'UTRs of the gene. The tags representing the NTRK2 gene map onto a region that identifies both a shorter isoform (potentially the one that lacks the kinase domain) and a longer isoform. Adapted from the UCSC genome browser (our Tag-seq track for hg19 is added by pasting the following line into the custom track box http://gns: [email protected]/\~engstrom/gns/tracks/hg19/gns\_tpm.bg.gz).

Similar observations were made for the core set of down-regulated genes:

· the TES gene (testis derived transcript) is a LIM-domain gene located at a fragile site and implicated in glioblastoma as well as prostate and head/neck cancer [294,349,444];

· the DNER gene (delta/notch-like EGF repeat containing) is a putative growth factor receptor found to be implicated in glioblastoma neurosphere formation and to reduce adipocyte cell proliferation [381,478];

· the PTEN gene is a tumour suppressor often down-regulated and/or deleted in gliomas [326];

· the TUSC3 gene (tumour suppressor candidate 3) is implicated in ovarian cancer [90].

6.5 Large-scale qRT-PCR Validation

To assess the accuracy of Tag-seq expression level estimates and investigate gene activity in a larger panel of cell lines, we assayed 82 core differentially expressed genes in 16 GNS cell lines (derived from independent tumour samples) and 6 normal NS cell lines (Appendix tables A.3 and A.4) by qRT-PCR, using custom-designed TaqMan microfluidic arrays. The 82 validation targets were selected from the 92 core differentially expressed genes based on the availability of TaqMan probes and considering prior knowledge of gene functions. For the cell lines assayed by both Tag-seq and qRT-PCR, measurements agreed remarkably well between the two technologies: the median Pearson correlation for expression profiles of individual genes was 0.91 and the differential expression calls were corroborated for all 82 genes.

Figure 6.12 shows the relationship between the qRT-PCR and the Tag-seq measurements in three panels, with four genes picked to highlight the dynamic range of the two platforms: the matrix metallopeptidase gene MMP17 encodes a membrane protein involved in the breakdown of the ECM and is a known oncogene in glioblastoma primary tumours [183]; the damage-inducible transcript DDIT3 is induced in response to a range of compounds that sensitise glioma cells to apoptosis [216]; MYL9 encodes one of the myosin light chains and its link with glioblastoma has yet to be established (it possibly acts via regulation of the cell migration process); the mannosidase gene MAN1C1 encodes one of the enzymes that hydrolyse mannose and has yet to be implicated in glioblastoma, although its sister gene MAN2C1 has been associated with a reduction of PTEN functionality in prostate cancer [190]. In panel a of figure 6.12 the high overall correlation between the two platforms is highlighted by the four data points that lie nearly on a straight line (R = 0.91); panel b shows the correlation values computed as the correlation between normalised Ct values and tag counts across the five GNS cell lines (table of correlation values available in Appendix A.4); panel c shows the mean fold-change between the GNS and NS cell lines as measured on each of the two platforms, plotted so that averaging reduces the influence of outliers.

Across the entire panel of cell lines, 29 of the 82 genes showed statistically significant differences between GNS and NS lines at an FDR of 5% (Fig 6.13). This set of 29 genes distinguishes GNS from NS lines and may, therefore, have broad relevance for elucidating tumourigenic properties of GNS cells. Figure 6.14 shows a histogram for each of these 29 genes. The height of each histogram bar represents the expression level measured via qRT-PCR for that gene and its designated probe (see the title of each histogram for the probe's name). Each expression level is calculated as the arithmetic mean across all biological replicates for each cell line. The bar colours match those used for GNS and NS cell lines in the bar displayed at the top of fig 6.14, separating the assayed GNS and NS cell lines into two groups along the x-axis.
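The per-gene agreement between the two platforms described above can be illustrated with a minimal sketch. It assumes, for illustration only, that both platforms provide one expression value per gene and cell line on a comparable (log) scale; the gene names and numbers below are hypothetical placeholders, not values from our dataset, and the code is not the analysis script used in this work.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression profiles: one value per cell line, same line order
# in both dictionaries (e.g. two GNS lines followed by two NS lines).
tagseq = {"GENE_A": [10.2, 9.8, 3.1, 2.5], "GENE_B": [2.0, 2.4, 9.6, 8.8]}
qpcr = {"GENE_A": [5.1, 4.7, 1.2, 0.9], "GENE_B": [0.8, 1.1, 6.0, 5.2]}

# One correlation per gene, then the median across genes.
per_gene_r = {gene: pearson(tagseq[gene], qpcr[gene]) for gene in tagseq}
median_r = median(per_gene_r.values())
```

Summarising with the median rather than the mean keeps a few poorly correlated genes from dominating the overall figure.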


Figure 6.12: Expression estimates correlate well between Tag-seq and qRT-PCR. (a) Expression values for MMP17, DDIT3, MYL9 and MAN1C1 in two GNS and two NS lines assayed by both technologies. Note the near-perfect agreement between Tag-seq (x-axis) and qRT-PCR (y-axis) over a wide dynamic range. (b) Histogram of correlation between Tag-seq and qRT-PCR for each of the 82 genes measured by qRT-PCR. (c) Fold-change estimates (indicating expression level in GNS lines relative to NS lines) from Tag-seq and qRT-PCR for the 82 genes. qRT-PCR confirmed greater than two-fold difference in expression (dashed lines at y = 1) for all genes.

Figure 6.13: Heatmap of 29 genes differentially expressed between 16 GNS and 6 NS cell lines. Colours indicate δδCt expression values, i.e. normalised expression on a log2 scale where zero corresponds to the average expression between the two groups (GNS and NS).


Figure 6.14: Expression levels of the 29 genes distinguishing GNS from NS lines, measured by qRT-PCR and presented as percent of NS geometric mean. Two histograms show the data for the two probes used to measure the expression levels of MYL9.

6.6 Literature Mining for Differentially Expressed Genes

An important assessment that adds value to our pathway analysis is the classification of every differentially expressed gene (FDR < 10%) as known to be associated with glioblastoma, known to be associated with cancer, or not yet linked to either. I extracted the information held in several literature-navigating resources, such as BioGraph [280], GBMbase [162] and iHOP [197], using a script described in detail in the Methods section and reported in Appendix B, and classified each gene into one of the following categories:

· Extensive amount of literature implicates the gene in glioblastoma;

· Limited amount of literature implicates the gene in glioblastoma;

· Unknown to be implicated in glioblastoma, but known in other cancers;

· Unknown to be implicated in any type of cancer.

Based on the information retrieved by the script, each of the differentially expressed genes was assigned to one of the four categories described above (Appendix A.2). The output of this query is summarised in table 6.5.

Table 6.5: Differentially expressed genes (FDR < 10%) assigned to a four-tier classification system. The unique total refers to the unique list of all the genes in each category.

Category | Number of genes
Extensive amount of literature implicating the gene in GBM | 238
Limited amount of literature implicating the gene in GBM | 94
Unknown to be implicated in GBM, but known in other cancers | 199
Unknown to be implicated in any type of cancer | 232
Unique Total | 748
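The four-tier assignment can be sketched as a simple decision rule. This is an illustration only and not the script reported in Appendix B: the input (a count of publications linking a gene to glioblastoma and a count linking it to any cancer), the cut-off separating "extensive" from "limited" evidence, and the gene names are all hypothetical assumptions.

```python
from collections import Counter

EXTENSIVE = "Extensive amount of literature implicating the gene in GBM"
LIMITED = "Limited amount of literature implicating the gene in GBM"
OTHER_CANCER = "Unknown to be implicated in GBM, but known in other cancers"
NO_CANCER = "Unknown to be implicated in any type of cancer"


def classify(gbm_hits, cancer_hits, extensive_cutoff=10):
    """Assign a gene to one tier from its literature hit counts.

    extensive_cutoff is a hypothetical threshold chosen for illustration.
    """
    if gbm_hits >= extensive_cutoff:
        return EXTENSIVE
    if gbm_hits > 0:
        return LIMITED
    if cancer_hits > 0:
        return OTHER_CANCER
    return NO_CANCER


# Hypothetical (GBM hits, cancer hits) per gene.
hits = {"GENE_A": (42, 300), "GENE_B": (3, 20), "GENE_C": (0, 7), "GENE_D": (0, 0)}
tiers = {gene: classify(*counts) for gene, counts in hits.items()}
summary = Counter(tiers.values())  # number of genes per tier
```

Checking the glioblastoma evidence before the general cancer evidence means a gene is always placed in the most specific tier that applies.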

A stripped-down version of the complete table available in Appendix A.2 highlights only the genes with limited or no evidence of implication in glioblastoma that appear in our integrated glioblastoma pathway (Table 6.6). Many of these genes had not been directly implicated in glioma, but they participate in glioma-related pathways and they differ in expression between GNS and NS lines. Thus, by considering expression changes in a pathway context, we identified additional candidate glioblastoma genes, such as the putative cell adhesion gene ITGBL1 [50]; the orphan nuclear receptor NR0B1, which is strongly up-regulated in G179 and is known to be up-regulated and to mediate tumour growth in Ewing's sarcoma [158]; and the genes PARP3 and PARP12, which belong to the PARP family of ADP-ribosyltransferase genes involved in DNA repair. The up-regulation of these PARP genes in GNS cells may have therapeutic relevance, as inhibitors of their homolog PARP1 are in clinical trials for brain tumours [270]. Our comparison between GNS and NS cell lines thus highlights genes and pathways that are known to be affected in glioma as well as novel candidates, suggesting that the GNS vs NS comparison is a promising approach for further understanding molecular aspects of glioma.

6.7 Isoform Differential Expression

To establish Tag-seq as a sensitive technique for the identification of transcript isoforms differentially expressed between our GNS and NS libraries, we performed a parametric and a non-parametric test on the genes identified by multiple tag mappings using the "Ref_best" collection (see Fig 6.5).

With the non-parametric χ2 test, 2,682 isoforms were found to be differentially expressed, as detected by the sum over all tags for each isoform. With the parametric approach, in which the logarithmic ratio of expression method adapted from Morrissy et al [346] was used, we computed a ratio change for each pair of tags that identified an isoform, and detected a total of 2,040 differentially expressed isoforms. The two methods share a total of 1,454 genes that have differentially expressed isoforms in the GNS cell lines with respect to the NS cell lines at significant levels (p-value < 0.01 for the χ2 test; FC > 2 for the logarithmic method), leaving 1,228 genes uniquely identified by the χ2 test (2,682 − 1,454) and 586 genes uniquely identified by the logarithmic method (2,040 − 1,454).
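The two kinds of test can be illustrated with a minimal sketch for a single gene represented by two tags. The layout of the contingency table, the pseudocount and the tag counts below are assumptions made for illustration; this is not the exact procedure adapted from Morrissy et al [346], nor the code used in this work.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for [[a, b], [c, d]].

    Assumed layout: rows are the two tags, columns are GNS and NS counts.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the survival function is erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def log2_ratio_change(tag1_gns, tag2_gns, tag1_ns, tag2_ns, pseudo=1.0):
    """log2 of the tag1/tag2 ratio in GNS relative to the same ratio in NS.

    The pseudocount (a hypothetical choice) avoids division by zero.
    """
    ratio_gns = (tag1_gns + pseudo) / (tag2_gns + pseudo)
    ratio_ns = (tag1_ns + pseudo) / (tag2_ns + pseudo)
    return math.log2(ratio_gns / ratio_ns)


# Hypothetical tag counts: tag 1 dominates in GNS, tag 2 dominates in NS.
stat, p_value = chi2_2x2(300, 50, 40, 200)
change = log2_ratio_change(300, 40, 50, 200)
significant_chi2 = p_value < 0.01
significant_log = abs(change) > 1  # more than a two-fold change in ratio

# Set sizes quoted in the text: genes found by one method but not the other.
shared = 1454
only_in_2682_set = 2682 - shared  # 1228
only_in_2040_set = 2040 - shared  # 586
```

The two criteria answer slightly different questions: the χ2 test asks whether the tag proportions differ beyond sampling noise, whereas the log ratio measures how large the shift is, which is why their gene lists overlap only partly.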

Table 6.6: Genes with limited evidence or no evidence of implication in glioblastoma that appear in our main glioblastoma pathway. In "Others" the genes with an important physiological role in the brain that do not appear in our glioblastoma pathway are listed.

Gene* | Log2(FC)** | Association with glioma | Implications in other neoplasms | Citations
CACNA1A | 7.1 | None | Prostate cancer (mouse model) | [235]
CACNA1C | -8.2 | None | Liver cancer | [25]
CACNG7 | -2.6 | None | None | -
CACNG8 | -2.6 | None | None | -
CAMK1D | -2.4 | None | Breast cancer | [51]
CPLX2 | 6.4 | None | None | -
DDIT3 (CHOP, GADD153) | 4.4 | Limited | General (cellular stress response) | [218,227,330,415]
DUSP16 | 4.2 | None | Burkitt's lymphoma | [265]
FGF19 | - | None | Liver, and colon cancer | [119]
ITGA4 (CD49D) | 3.0 | Limited | Chronic lymphocytic leukaemia, breast cancer and others | [244,263,309]
ITGBL1 | + | None | None | -
MAP3K5 (ASK1) | 5.1 | Limited | Gastric cancer and histiocytoma | [94,189,504]
NFATC2 (NFAT1) | + | Limited | Breast cancer | [64,95,311,348]
NFKBIZ | 5.1 | None | Liposarcoma | [170]
NR0B1 (DAX1) | + | None | Lung adenocarcinoma and Ewing's sarcoma | [?,236]
NR1D1 | 2.9 | None | Breast cancer | [246]
PARP3, PARP12 | 4.1, 2.9 | Homology*** | The PARP gene family is involved in DNA repair and several other processes related to tumourigenesis | [312,463]
PERP | 3.8 | None | Lung and skin cancer | [92,317]
PPEF1 | 4.4 | Limited | None | [296]
SNAP25 | 3.3 | None | Lung cancer | [175]
SYT1 | -2.5 | None | None | -
TNFRSF14 | 4.0 | None | Follicular lymphoma | [93]
TNFSF4 (OX40L) | 4.0 | None | Generally implicated in immune response to tumours | [424]

* Aliases are listed in parentheses. ** Gene expression log2(FC) between the GNS and NS lines compared by Tag-seq. Some genes were detected exclusively in GNS or NS lines (indicated in column two by + or -, respectively). *** The homolog PARP1 has been implicated in glioma.

In the attempt to focus on isoforms relevant to the pathways affected in glioma, we overlaid both lists of differentially expressed isoforms on the integrated glioblastoma pathway described in section 7.5 and shown in figure 7.8. This highlighted the presence of 48 matching genes from the logarithmic method and 57 matching genes from the χ2 test method, with 35 genes shared between the two methods. Each of the 35 genes, as well as the 13 genes identified exclusively through the logarithmic method and the 22 genes identified exclusively through the χ2 test method, was manually scrutinised on the University of California Santa Cruz (UCSC) genome browser by adding our tag mappings as a custom track together with the UCSC gene track and the Ensembl gene prediction track, both in "full" mode for the NCBI37.2/Ensembl 64 human genome assembly. For the following list of genes an alternative tag mapping-based isoform was observed, three of which are reported as adapted images from the UCSC genome browser in figures 6.15 and 6.16:


· AKT2: two reads map on the 3'UTR of the longest isoform whilst one read maps on the 3'UTR of a shorter subset of isoforms;

· AKT3: two reads map on the shortest isoform and one on the 3'UTR of the middle and longest isoforms;

· BMP7: one read maps on the 3'UTR of the longest isoform and one read on the 3'UTR of a middle-length set of isoforms;

· BRCA1: one read maps on the 3'UTR of the longest isoform, whilst one read maps on an alternatively spliced exon, contained in a small subset of isoforms and which is a single isoform itself;

· CANX: all six reads map onto the 3'UTR of the longest isoform, one on the 3'UTR of a shorter isoform;

· CTSB: three reads identify three different isoforms on their 3'UTRs;

· ERBB2: two reads map on the 3'UTRs of two different isoforms (but one is not recognised by Ensembl Gene Predictions);

· FGFR1: two reads map on the 3'UTRs of two different isoforms;

· GRIA2: three reads identify three sets of lengths of isoforms on their 3'UTRs with one identifying two of them;

· HLA-A: two reads identify two different isoforms on their 3'UTR;

· NBR1: one read maps on the 3'UTR of the longest isoform and one read on a consistently present exon, which identifies also a shorter isoform;

· NFYC: two reads identify 3'UTRs of two different isoforms with one mapping between the 3'UTRs and last intron;

· PTEN: two reads map on the 3'UTR of the longest isoform and on the 3'UTR of a much shorter isoform;

· RFX5: one read maps on the 3'UTR of two longer isoforms and one identifies a smaller subset of shorter ones on the 3'UTR;

· NTRK2: two reads mapping on alternative 3'UTRs;

· FGFR1OP: one read maps on the 3'UTR and one on the fifth exon, which is displayed as a 3'UTR box on the Ensembl Prediction track;


· METTL13: two reads mapping on alternative 3'UTRs;

· TSC22D2: two reads mapping on alternative 3'UTRs;

· RSRC2: two reads mapping on alternative 3'UTRs;

· TPM1: one read maps on the longest 3'UTR and the other on an internal exon that is alternatively spliced in one isoform.

We used the GenemiR package described in chapter 8 to find out how many genes, out of the 2,682 and 2,040 genes with differentially expressed isoforms, were predicted to harbour the same microRNA targeting sites in their 3'UTRs. The core functionalities of the GenemiR software package are to output a list of microRNAs when a list of genes is input and, vice versa, to output a list of genes when a list of microRNAs is input. The database used by these core functionalities consists of all the microRNA to mRNA predictions made by a maximum of eight leading algorithms, and it varies in size depending on which algorithms the user has selected for a specific search. The search performed in this instance used the union of the microRNA predictions from five of the most widely accepted target prediction algorithms: PicTar [247], PITA [224], Targetscan [272], miRanda [213] and DIANA-microT [315,316]. We used the union of the results from the five prediction algorithms rather than their intersection because the intersection set for each of the two lists of over 2,000 genes is empty. In fact, as explained in chapter 8, the algorithms available to predict microRNA to mRNA interactions generate such different outputs that it is extremely rare for all of them to agree on a prediction (unless the list of genes is so large that finding an intersection set becomes statistically possible, or fewer than three prediction algorithms are selected).

When input into GenemiR, the 2,682 genes found with the χ2 test yielded a list of 5,016 microRNAs. Similarly, the 2,040 genes found with the logarithmic method yielded a list of 4,463 microRNAs. When the two lists of microRNAs were intersected (2,358 microRNAs) and input again into the GenemiR package to find the genes targeted by those microRNAs, we found 765 genes to be regulated by the intersected list of microRNAs (Fig 6.17). Table 6.7 shows the microRNA predictions resulting from the GenemiR query, stratified by prediction algorithm and origin of the gene list (parametric vs. non-parametric method), as well as the results that are common to both methods and unique to each.
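The query logic described above (combining the predictions of several algorithms, looking up microRNAs for a gene list, intersecting two microRNA lists and looking up genes for the result) can be sketched as follows. This is a toy illustration and not the GenemiR implementation or its interface: the algorithm names, microRNA names, gene names and function names are all hypothetical.

```python
# Hypothetical toy predictions: algorithm -> set of (microRNA, gene) pairs.
predictions = {
    "algo_A": {("miR-1", "GENE1"), ("miR-2", "GENE2")},
    "algo_B": {("miR-1", "GENE1"), ("miR-3", "GENE2")},
    "algo_C": {("miR-2", "GENE3"), ("miR-3", "GENE1")},
}


def combine(preds, algorithms, mode="union"):
    """Merge the prediction sets of the selected algorithms."""
    sets = [preds[name] for name in algorithms]
    if mode == "union":
        return set.union(*sets)
    return set.intersection(*sets)


def mirnas_for_genes(pairs, genes):
    """All microRNAs predicted to target at least one of the given genes."""
    wanted = set(genes)
    return {mirna for mirna, gene in pairs if gene in wanted}


def genes_for_mirnas(pairs, mirnas):
    """All genes predicted to be targeted by at least one given microRNA."""
    wanted = set(mirnas)
    return {gene for mirna, gene in pairs if mirna in wanted}


algorithms = ["algo_A", "algo_B", "algo_C"]
union_pairs = combine(predictions, algorithms, mode="union")
strict_pairs = combine(predictions, algorithms, mode="intersection")  # empty here

# Two hypothetical gene lists, standing in for the two isoform gene lists.
mirnas_x = mirnas_for_genes(union_pairs, ["GENE1", "GENE2"])
mirnas_y = mirnas_for_genes(union_pairs, ["GENE2", "GENE3"])
shared_mirnas = mirnas_x & mirnas_y
regulated_genes = genes_for_mirnas(union_pairs, shared_mirnas)
```

Even in this toy case the intersection across all three algorithms is empty while the union is not, which mirrors the reason given above for working with the union set.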

Figure 6.15: BMP7 encodes a member of the TGFβ superfamily and is represented in our Tag-seq data by two tags that identify transcripts of different lengths. BMP7, like other members of the bone morphogenetic protein family, plays a key role in the transformation of mesenchymal cells into bone and cartilage. However, BMP7 has also been recently implicated in the suppression of the tumourigenicity of stem-like GBM cells when released from endogenous neural precursor cells [97]. The expression levels across our GNS cell lines (top) are in blue (plus strand) and red (minus strand). Image adapted from the UCSC genome browser.

Figure 6.16: The TPM1 gene, encoding a member of the tropomyosin family of actin-binding proteins, is represented by two tags, the rightmost of which shows a very high expression in the antisense strand and identifies the four longest 3'UTRs. The expression levels across our GNS cell lines (top) are in blue (plus strand) and red (minus strand). Image adapted from the UCSC genome browser.


Figure 6.17: Summary of the process of finding genes with differentially expressed isoforms (common to both the χ2 test and the logarithmic method) that are predicted to harbour the same microRNA targeting sites. One way to assess this is to find which genes are predicted to be regulated by the same microRNAs. Numbers in blue are the microRNAs predicted to regulate the 2,682 genes (green set) and the 2,040 genes (orange set). Numbers in black refer to genes. The 2,358 microRNAs common to both sets are predicted to regulate 765 genes that are common to the two methods. The prediction database used comes from the union of five prediction algorithms: PicTar [247], PITA [224], Targetscan [272], miRanda [213] and DIANA-microT [315,316].

Table 6.7: Summary of predicted microRNAs targeting differentially expressed isoforms of the genes identified with the parametric and logarithmic method. The predictions from each of the five algorithms (PicTar [247], PITA [224], Targetscan [272], miRanda [213] and DIANA-microT [315,316]) are shown in distinct rows.

Prediction algorithm | Non-parametric microRNAs | Parametric microRNAs | Common microRNAs | Unique to non-parametric | Unique to parametric
PicTar4 | 164 | 164 | 164 | - | -
PITA | 674 | 673 | 672 | miR-658, miR-886-3p | miR-937
TargetscanS | 148 | 148 | 147 | miR-615-3p | miR-450a
miRanda | 677 | 677 | 677 | - | -
DIANA micro-T | 510 | 508 | 503 | miR-151-1p, miR-324-5p, miR-369-5p, miR-423-3p, miR-602, miR-658, miR-941 | miR-146b-3p, miR-199b-5p, miR-566, miR-744, miR-877

Figure 6.18: Schematic of the localisation of the microRNA seeds given by the five prediction algorithms within the 3'UTRs of pairs of differentially expressed isoforms, each identified by a tag that maps onto it. Of the 765 genes with differentially expressed isoforms identified by both the χ2 test and the logarithmic method, 226/2,682 genes and 340/2,040 genes hosted at least one microRNA seed sequence between at least two tags.

To determine in more detail which of the genes with differentially expressed isoforms that were predicted to harbour the same microRNA targeting sites actually had one or more microRNA seed sequences in their 3'UTRs, we used the Ensembl gene coordinates and the union of all microRNA seed coordinates given by the same five prediction algorithms used earlier (Targetscan, miRanda, PITA and DIANA-microT) to validate which genes harboured which microRNA seed sequences. We found that, of the 765 genes predicted to harbour the same microRNA targeting sites, 226 of the 2,682 genes with differentially expressed isoforms (identified with the χ2 test) and 340 of the 2,040 genes with differentially expressed isoforms (identified with the logarithmic method) hosted a microRNA seed sequence between at least two tags (Fig 6.18). These numbers suggest that microRNA targeting is a widely used mechanism of isoform expression modulation in GNS cell lines. Since microRNA array data was available for four GNS cell lines (G7, G26, G144, G166) and four NS cell lines (CB660, CB130, CB152, CB171), we used it to cross-verify that the microRNA predictions found for the 765 genes with differentially expressed isoforms were reflected in experimental data (Appendix F). Of the 11 microRNAs that were predicted to regulate the 765 genes with differentially expressed isoforms and were also available in the microRNA array dataset, we found seven to be implicated in regulatory pathways in GBM (Table 6.8):


Table 6.8: MicroRNA array results for GNS cell lines with respect to NS cell lines. Based on a literature survey, these microRNAs would be interesting candidates to validate experimentally in a TLDA assay.

microRNA | log2(FC) | FDR
miR-128 | 0.9 | 0.000206443
miR-137 | 1 | 0.000121777
miR-34a | -1.5 | 0.000004.43
miR-26a | 1.4 | 0.0000143
miR-10b | 2.9 | 1.03E-08
miR-451 | 1 | 3.15E-05
miR-129-3p | 1 | 9.63E-05

· The levels of miR-128 have been found to be consistently lower in glioblastoma compared with normal brain tissue [259,364]. Opposite findings were observed in our microRNA array, in which miR-128 is up-regulated in GNS cells with respect to NS cells (Table 6.8). miR-128 directly targets the transcription factor E2F3a, which activates genes necessary for cell-cycle progression; by negatively regulating E2F3a, miR-128 can thus inhibit proliferation of brain cells. This microRNA also directly targets BMI1, a gene that is thought to act as an oncogene in glioblastoma by regulating tumour suppressors like P53 and CDKN2A. BMI1 also promotes stem cell renewal by acting as part of a Polycomb Silencing Complex to silence the expression of genes involved in differentiation and senescence, including the CDKN2A and CDKN1A tumour suppressors. Low levels of miR-128 in glioblastoma may contribute to glioma growth by allowing the increased expression of BMI1 to promote an undifferentiated self-renewing state. High levels of miR-128 in GNS cells with respect to NS cells may identify a regulation specific to the stem cell pool that has yet to be studied in detail. miR-128 is known to be highly expressed in neurons, but its role in the brain is still unknown; it is surmised to promote neuronal differentiation by preventing stem cell self-renewal. Our isoform data analysis with the non-parametric method showed that miR-128 is predicted to target the nerve growth factor receptor associated protein 1 gene NGFRAP1, a p75NTR-associated cell-death executor mediated by the common neurotrophin receptor p75NTR [350] that also plays a role in NGF-induced apoptosis in oligodendrocytes [351]. Analysis with the logarithmic method showed that miR-128 is predicted to target the Rab GTPase guanine nucleotide exchange factor GAPVD1, a gene fundamental for the activation of RAB5A during the engulfment of apoptotic cells [238] (Fig 6.19).

· miR-137 is one of the most down-regulated microRNAs in glioblastoma compared with normal brain tissue. In our microRNA microarray dataset, miR-137 is up-regulated in GNS cell lines with respect to NS cell lines. Since miR-137 directly targets CDK6, its over-expression in glioblastoma would induce cell-cycle arrest. The level of miR-137 increases upon differentiation of glioma neurosphere cultures, and its over-expression in these cells leads to the expression of markers consistent with neuronal differentiation. These data suggest that the lower expression of this microRNA in glioblastoma, and the higher expression in GNS cells with respect to NS cells, reflects the lack of tumour cell differentiation in the former and the presence of regulatory mechanisms downstream of miR-137 in the latter, since cancer stem cells are the least differentiated within the lineage hierarchy according to the cancer stem cell hypothesis [364]. In our microRNA prediction analysis, miR-137 was predicted to target the serine/threonine-protein kinase 40 gene STK40, a negative regulator of NFKB and p53-mediated gene transcription [200], and the transcription factor TCF4, which, in association with β-catenin, mediates Wnt signaling by trans-activating downstream target genes [11].

· miR-34a expression is down-regulated in glioblastoma and in p53-null mutant gliomas, since non-null p53 mutants, which are expressed in many tumours and can possess gain-of-function activities, do not regulate transcription of miR-34a. This suggests that miR-34a is a transcriptional target of P53 [278]. In our microRNA microarray dataset miR-34a is found to be down-regulated as well. Furthermore, miR-34a potently inhibits the protein expression of MET, which encodes the hepatocyte growth factor receptor tyrosine kinase, as well as MET 3'UTR reporter activity in glioma cells, medulloblastoma cells and astrocytes. miR-34a also inhibits Notch-1 and Notch-2 protein expression and their 3'UTR reporter activities, as well as CDK6 protein expression in glioma cells. Transient transfection of miR-34a into brain tumour cell lines inhibited cell proliferation, cell cycle progression, cell survival and cell invasion, but did not affect human astrocyte cell survival or cell cycle. miR-34a transfection also did not affect the protein levels of PDGFRA in any tested cell line, although miR-34a has predicted seed matches in the 3'UTR of PDGFRA.

Figure 6.19: Isoform detection by multi tag mapping of gene GAPVD1, a GTPase guanine nucleotide exchange factor essential during engulfment of apoptotic cells [238] and involved in the degradation of EGFR [473]. The expression levels across our GNS cell lines (top) are in blue (plus strand) and red (minus strand). Image adapted from the UCSC genome browser.


miR-34a-transfected cells generated xenografts that were statistically significantly smaller than control-miR-transfected xenografts, demonstrating that miR-34a expression inhibits in vivo glioblastoma xenograft growth [278]. In our microRNA prediction analysis miR-34a is predicted to target C1orf9 (chromosome 1 open reading frame 9); no p53 targeting is predicted, owing to the deletion of the p53 genomic location within the GNS cell lines.

· miR-26a is up-regulated in glioblastoma and targets PTEN through direct binding to the B2 and B3 sites in its 3'UTR, mediating translational repression and reduced steady-state levels of the protein. Western blotting demonstrated that miR-26a over-expression achieved a 50% knockdown of PTEN protein in two glioblastoma cell lines, accompanied by enhanced Akt signaling. In our microRNA microarray dataset we observe up-regulation of miR-26a as well. In addition to enhancing tumourigenesis, miR-26a effectively represses endogenous PTEN protein in a relevant PDGF-driven glioma model system. The miR-26a-mediated knockdown of EZH2, a histone methyltransferase, and SMAD1, a transcription factor, was also observed in glioblastoma [259,364]. In our analysis miR-26a was predicted to target the SMAD1 gene, of which two isoforms were detected through tag mapping (Fig 6.20).

· miR-10b is up-regulated in glioblastomas but its function has not been described yet. Increased levels of miR-10b in breast cancer correlated with the disease’s progression [259,364]. In our microRNA microarray dataset miR-10b is also up-regulated.

· miR-451 inhibited the growth of transfected glioblastoma cells, as detected by neurosphere formation assays [259,364]. We found miR-451 to be up-regulated in our microRNA microarray dataset, in line with its role as a cell cycle brake and growth inhibitor.

· miR-129-3p is found to be down-regulated in glioblastomas but its function remains unknown to date [259,364]. Interestingly, we observed miR-129-3p to be up-regulated in our microRNA microarray dataset, possibly indicating a different regulation in action at the stem cell level.

Figure 6.20: Isoform detection by multi tag mapping of gene SMAD1. The expression levels across our GNS cell lines (top) are in blue (plus strand) and red (minus strand). Image adapted from the UCSC genome browser.


6.8 Long ncRNA Differential Expression

In contrast to microarray expression profiling, Tag-seq is not limited by pre-selected probes targeting the known transcriptome, and we took advantage of this to discover differentially expressed ncRNAs. To this end, we called differential expression for a combination of tags that mapped to the genome, the transcriptome or virtual tags, and filtered the results using coding gene annotations. At an FDR of 10%, this analysis revealed 25 differentially expressed putative non-coding RNAs, 18 of which were up-regulated and the remaining 7 down-regulated (Appendix C). Five of these are putative long antisense RNAs that are known to be transcribed from the opposite strand of the protein-coding genes CDKN2B, CD27, PAX8, MCF2L2 and TXNRD1. Two are known long non-coding RNAs: HOTAIRM1 [547] and NEAT1 [314] (Table 6.9). CDKN2BAS, an antisense transcript to the tumour suppressor CDKN2B, is of particular interest because of its role in Polycomb-mediated repression of CDKN2B and CDKN2A [537]. We detected CDKN2BAS exclusively in G144, G144ED and G166, consistent with the locus being deleted in G179 according to aCGH data (see 6.3). The functions of the remaining differentially expressed non-coding RNAs are unknown, but a unifying feature of 15 of them is that they are located in gene deserts (megabase-sized genomic segments devoid of protein-coding genes in vertebrates; Appendix D). Several of these transcripts display an expression pattern similar to that of a protein-coding gene near the gene desert, suggesting that the transcripts may be functional RNAs regulating nearby genes [372] or may indicate transcription from active enhancers [230]. The coding genes exhibiting expression correlated with these non-coding RNAs are the cancer-related genes CTSC (Fig 6.21) [113,551] and DKK1 [461] and the developmental regulators IRX2, SIX3 and ZNF536 [412].

Table 6.9: Multiple putative long non-coding RNAs differentially expressed between GNS and NS cells at 10% FDR.

Category                      Up-regulated             Down-regulated
Known antisense transcripts   2 (over CDKN2B, CD27)    1 (over PAX8)
Other known ncRNAs            2 (HOTAIRM1, NEAT1)      0
Intronic RNAs                 2 (in CDKN2B, TXNRD1)    1 (in FAM38B)
Intergenic RNAs               9                        7

In the case of the Cathepsin C (CTSC) gene we were able to detect two different isoforms with Tag-seq, as well as a long non-coding RNA lying within a 150 kilobase distance from the two isoforms. The CTSC gene encodes a lysosomal cysteine protease that is part of the peptidase C1 family of proteins and is responsible for activating serine proteases in immune and inflammatory cells to function in processes of bone remodelling, epidermal homeostasis and antigen presentation [500]. During cancer progression cathepsins are secreted into the extracellular matrix, where they promote tumour invasion by cleaving components of the matrix and the basement membrane, thereby creating a passageway for the migration of cancer cells. The disruption of adherens junctions via cleavage of E-cadherin is another example. Cathepsins can also initiate proteolytic cascades in which they activate other proteases such as matrix metalloproteinases, which in turn promote invasion [166,167]. Members B, L and S of the cathepsin family have been identified as regulators of E-cadherin function through cleavage of its N-terminus. Knockout mouse mutants for cathepsins B, L or S show obvious defects such as decreased tumour cell proliferation, tumour invasion and tumour vascularity, while CTSC -/- knockout mice have more subtle defects, consisting of a failure to activate granzymes A and B in cytotoxic lymphocytes [167]. The hypothetical regulation exerted by the long non-coding RNA BC038205 on the CTSC isoforms could be one that keeps the expression of this gene high in GNS cell lines through a form of direct regulation, which is relieved when the non-coding RNA is absent or expressed at lower levels, as in NS cells.
Figure 6.21 comprises three panels. Panel a shows the location of the CTSC isoforms and of the non-coding RNA along the human reference genome. Panel b displays the expression levels detected by Tag-seq as a histogram, to highlight visually the relationship between them. Panel c is a correlation plot showing that the expression of the non-coding RNA BC038205 correlates with that of both the first and the second CTSC isoform. The correlation was calculated with the cor.test function in the R stats package, which uses Pearson's product-moment correlation coefficient to test for association between paired samples.

HOTAIRM1 is known to be strongly up-regulated in human NB4 promyelocytic cell lines and normal hematopoietic cells upon induction of granulocytic differentiation. It is also up-regulated in our GNS lines, as well as in the gliomas from the Parsons et al [383] Tag-seq data, with respect to our fetal human NS cell lines and the Parsons et al primary brain samples. Interestingly, its knock-down in NB4 cells causes down-regulation of the HOXA1 and HOXA4 genes, and these genes are up-regulated in our Tag-seq dataset (Fig 6.22).
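The Pearson test used for panel c of figure 6.21 can be illustrated with a minimal Python sketch. It is not the R code used in this work: the function names and the tag counts below are hypothetical and serve only to show the arithmetic. For the Pearson method, R's cor.test derives its p-value from the t statistic computed here, referred to a t distribution with n - 2 degrees of freedom; the sketch stops at the statistic because the Python standard library has no t distribution.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired samples of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: true correlation is 0."""
    if abs(r) >= 1:
        raise ValueError("t statistic is unbounded for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical normalised tag counts across five samples (not thesis data).
ncrna_counts = [1.0, 2.0, 3.0, 4.0, 5.0]
isoform_counts = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(ncrna_counts, isoform_counts)
t = t_statistic(r, len(ncrna_counts))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(ncrna_counts) - 2}")
```

With only a handful of cell lines the degrees of freedom are small, so even a high correlation coefficient should be read with caution.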


Figure 6.21: Correlated expression of CTSC and a nearby ncRNA. (a) CTSC (cathepsin C) is located in a gene desert harboring an uncharacterized ncRNA transcribed in the opposite direction (cDNA BC038205). Image adapted from the Ensembl Genome Browser [147]. (b) Both CTSC and the ncRNA have strongly elevated expression in the GNS lines relative to the NS lines, with highest levels in G179. (c) Correlation plots for the BC038205 ncRNA and the first isoform of CTSC (grey) and the second isoform of CTSC (blue).

Figure 6.22: Histogram displaying the normalised tag counts found in our Tag-seq dataset and the Parsons et al [383] Tag-seq dataset for the long ncRNA HOTAIRM1 and the surrounding HOX genes.

Chapter 7

Dataset Correlation Analyses

Contents
7.1 Enrichment Analysis
7.2 Glioblastoma Expression Signatures
7.3 Tumour Expression Correlation
7.4 Survival Analysis
7.5 Glioblastoma Pathway Analysis

7.1 Enrichment Analysis

The differential expression analysis revealed 485 genes to be up-regulated and 254 genes to be down-regulated between GNS and NS cell lines at an FDR of 10% (Appendix A.1). We performed Gene Set Enrichment Analysis (GSEA) to investigate which pathways were most highly represented in our set of 739 differentially expressed genes. To evaluate the enrichment of a group of genes that together define a pathway, the GSEA method tests whether those genes follow the same trend in the experimental dataset of interest. If a number of genes that belong to the same pathway change expression level, even moderately, this may indicate that the pathway is affected in the experimental setting under evaluation. Because the relationship between the genes involved in the same pathway is established a priori, the GSEA method has more statistical power than a per-gene statistic to detect small changes that affect the whole set. The GSEA method first ranks the genes in the dataset of interest according to a per-gene statistic such as a p-value. It then uses the complete ranked list to assess how the genes that belong to a specific pathway are distributed across it: randomly throughout the list, or primarily at the top or bottom. Three key elements define the GSEA method [474]:

1. Calculating an enrichment score that measures the degree to which a set of genes belonging to a pathway is overrepresented at the top or bottom of the entire ranked list (for example, a list of differentially expressed genes). The enrichment score corresponds to a weighted Kolmogorov-Smirnov-like statistic.

2. Estimating the significance of the enrichment score by permuting the phenotype labels and recomputing the enrichment score of the genes in the pathway each time. This generates a null distribution against which the p-value of the observed enrichment score is calculated.

3. Adjusting for multiple hypothesis testing when entire databases of pathways are evaluated at once, as in our case with the KEGG and Gene Ontology (GO) databases. In this case the enrichment score is first normalised to account for the size of the pathway, and then the proportion of false positives, or FDR, is calculated for each normalised enrichment score. The FDR associated with each normalised enrichment score corresponds to the estimated probability that a pathway with that normalised enrichment score represents a false positive finding.
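The first two elements can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GSEA software used for the analysis: the function names, the toy expression values, the zero-spread guard (1e-9) and the number of permutations are all hypothetical. The sketch ranks genes by the signal-to-noise ratio, walks down the ranked list accumulating a weighted running sum, and estimates a nominal p-value from phenotype-label permutations, using only the portion of the null distribution with the same sign as the observed score. Normalisation for gene set size and the FDR step are omitted.

```python
import random
import statistics


def signal_to_noise(values, labels, group_a="GNS", group_b="NS"):
    """Ranking metric: difference of group means over the sum of group SDs."""
    a = [v for v, lab in zip(values, labels) if lab == group_a]
    b = [v for v, lab in zip(values, labels) if lab == group_b]
    spread = statistics.stdev(a) + statistics.stdev(b)
    return (statistics.mean(a) - statistics.mean(b)) / (spread or 1e-9)


def enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum; returns its extreme value."""
    hits = [g for g in ranked_genes if g in gene_set]
    n_miss = len(ranked_genes) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    hit_total = sum(abs(metric[g]) ** weight for g in hits)
    running = best = 0.0
    for gene in ranked_genes:
        if gene in gene_set:
            # Fall back to equal steps if every hit has a metric of zero.
            step = abs(metric[gene]) ** weight / hit_total if hit_total else 1 / len(hits)
            running += step
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def score_for_labels(expression, labels, gene_set):
    """Rank genes by the metric for one labelling and score the gene set."""
    metric = {g: signal_to_noise(v, labels) for g, v in expression.items()}
    ranked = sorted(metric, key=metric.get, reverse=True)
    return enrichment_score(ranked, metric, gene_set)


def permutation_p_value(expression, labels, gene_set, n_perm=200, seed=0):
    """Nominal p-value from phenotype-label permutations (same-sign null only)."""
    observed = score_for_labels(expression, labels, gene_set)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null.append(score_for_labels(expression, shuffled, gene_set))
    same_sign = [s for s in null if (s >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, 1.0
    extreme = sum(1 for s in same_sign if abs(s) >= abs(observed))
    return observed, extreme / len(same_sign)


# Hypothetical expression values for three GNS and three NS samples.
labels = ["GNS", "GNS", "GNS", "NS", "NS", "NS"]
expression = {
    "HLA-DRA": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],
    "CD74": [8.2, 8.9, 8.5, 3.1, 2.7, 3.0],
    "HLA-DPA1": [7.5, 7.9, 7.2, 2.2, 2.9, 2.5],
    "GENE_A": [4.0, 4.2, 3.9, 4.1, 4.3, 3.8],
    "GENE_B": [5.1, 4.8, 5.3, 5.0, 5.2, 4.9],
    "GENE_C": [2.0, 2.4, 2.1, 6.8, 7.1, 6.5],
}
es, p = permutation_p_value(expression, labels, {"HLA-DRA", "CD74", "HLA-DPA1"})
print(f"ES = {es:.2f}, nominal p = {p:.3f}")
```

With three samples per phenotype there are only 20 distinct labellings, so the attainable p-values are coarse; this is one reason why small designs limit what label permutation can resolve.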

The enrichment analysis using GO [106] and the KEGG pathway database [2] confirmed the set of 739 differentially expressed genes to be enriched for pathways related to brain development, glioma and cancer (Tables 7.1 and 7.2). We also observed enrichment of regulatory and inflammatory genes, such as signal transduction components, cytokines, growth factors and DNA-binding factors. Several genes related to antigen presentation on MHC class I and II molecules were up-regulated in GNS cells, consistent with the documented expression of their corresponding proteins in glioma tumours and cell lines [120,174]. In line with these findings, affected pathways from the KEGG database included Antigen Processing and Presentation, Diabetes Mellitus Type I, Cytokine-cytokine receptor interaction, Neuroactive ligand-receptor interaction, MAPK signaling and, as expected, Glioma, a collection of genes involved in glioma formation (Table 7.2). The first two plots from the GSEA run that identified these pathways as significantly altered in our dataset are shown in figure 7.1. In the top panels of the figure the green curve represents the trend of the enrichment score (a number that reflects the degree to which a gene set is overrepresented at the top or bottom of a ranked list of genes) as the analysis walks down the ranked list. Both enrichment score curves show positive values because they indicate gene set enrichment at the top of the ranked list. The score at the peak of the plot (farthest from 0.0) is the enrichment score for the gene set. The middle portions of the two plots show where the members of the gene set appear in the ranked list of genes. The genes that appear in the ranked list before the peak enrichment score are the ones that contribute most to the enrichment score and are commonly referred to as the "leading edge subset". The bottom portions of the two plots show the value of the ranking metric as one moves down the list of ranked genes; the metric is by default the signal-to-noise ratio, which we kept in our GSEA runs. The ranking metric measures a gene's correlation with a phenotype: a positive value indicates correlation with the GNS phenotype and a negative value indicates correlation with the NS phenotype. Additional information from our GSEA runs is provided in the two plots described in figure 7.2.

Table 7.1: Selected Gene Ontology terms and InterPro domains enriched among differentially expressed genes.

Column groups (each reporting Genes and p-value): Differentially expressed, 739 genes; Up-regulated, 485 genes; Down-regulated, 254 genes.

A. Biological Process GO terms: Nervous system development; Cell differentiation; Cell proliferation; Cell adhesion; Cell migration; Immune response; Antigen processing and presentation; Cellular ion homeostasis
B. Molecular Function GO terms: Signal transducer activity; Receptor activity; Cytokine activity; Growth factor activity; MHC class II receptor activity; Sequence-specific DNA binding
C. InterPro domains: Immunoglobulin-like; MHC classes I/II-like antigen recognition protein; Homeobox

2.00E-04 n.s. n.s. n.s. 0.002 n.s. n.s. n.s. 54 18 9 3 45 2 0 0 n.s. 2.00E-07 3.00E-16 2.00E-05 0.058 7.00E-11 9.00E-04 3.00E-10 74 56 61 33 66 25 5 14 7.00E-07 1.00E-07 2.00E-12 0.014 3.00E-07 2.00E-08 0.008 1.00E-07 128 74 70 36 111 27 5 14
106 2.00E-10 17 4.00E-07 86 3.00E-04 44 62 3.00E-04 0.005 52 17 83 20 3.00E-04 5.00E-10 59 44 8.00E-07 0.011 0.014 45 2.00E-05 30 0 3.00E-08 0.026 n.s. 27 34 48 28 n.s. 0.053 17 14 n.s. 8.00E-06 0.002 n.s. 32 18 6.00E-06 35 n.s. 3 0.002 18 13 n.s. 0.012 n.s. 10 n.s.

P-values indicating the statistical significance of the enrichment of these terms were computed with Fisher's exact test and corrected for multiple testing using the Bonferroni method; n.s., not significant (p-value > 0.1).

Table 7.2: Representative KEGG pathways from signaling pathway impact analysis of gene expression differences between GNS and NS lines.

Pathway                                   Genes   p-value    Predicted status in GNS cell lines
Cytokine-cytokine receptor interaction    29      4.00E-12   Activated
Chemokine signaling pathway               15      5.00E-06   Activated
Neuroactive ligand-receptor interaction   21      2.00E-04   Inhibited
Antigen processing and presentation       11      7.00E-04   Activated
MAPK signaling pathway                    24      0.011      Activated
Glioma                                    10      0.013      Activated
ECM-receptor interaction                  10      0.041      Inhibited
Calcium signaling pathway                 15      0.041      Activated

P-values and status predictions were obtained by signaling pathway impact analysis [26], taking fold-change estimates and pathway topology into account. P-values were FDR-corrected for multiple testing.
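The term-enrichment p-values of Table 7.1 rest on Fisher's exact test followed by Bonferroni correction. The Python sketch below illustrates that calculation under stated assumptions: it uses the one-sided (over-representation) form of the test, and the background size, the number of annotated background genes and the number of terms tested are hypothetical placeholders, since those values are not given in this section.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value, P(X >= k).

    N: genes in the background; K: background genes annotated with the term;
    n: genes in the list of interest; k: annotated genes in that list.
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # Hypergeometric upper tail: k or more annotated genes in the list.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical example: 14 of 485 up-regulated genes carry a term that
# annotates 100 of 20,000 background genes; 1,000 terms were tested.
p_raw = fisher_enrichment_p(k=14, n=485, K=100, N=20000)
p_adj = bonferroni(p_raw, n_tests=1000)
print(f"raw p = {p_raw:.3g}, Bonferroni-adjusted p = {p_adj:.3g}")
```

Bonferroni correction is conservative when many overlapping terms are tested, which is consistent with several entries in Table 7.1 being reported as not significant.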
Surprisingly, via GSEA we detected an up-regulation in the GNS lines of several Major Histocompatibility Complex (MHC) class II genes, as well as related genes involved in antigen presentation on MHC class I complexes (Fig 7.3). Several studies have shown that MHC class I and II molecules are involved in aspects of human cancer pathology such as invasion and migration [327,421,546]. These two classes of molecules are a fundamental component of the adaptive immune response, and their misregulation potentially leads to faulty antigen recognition and processing.


Figure 7.1: Enrichment plots for the top two pathways revealed through GSEA of the KEGG database of pathways [2].

Figure 7.2: (a) The plot of nominal p-values, which estimate the statistical significance of the enrichment score for a single pathway, against the normalised enrichment score (NES) provides a quick way to grasp the number of enriched pathways that are significant. The FDR q-value represents the estimated probability that the normalised enrichment score represents a false positive finding. (b) The line graph of the enrichment scores across pathways provides a quick visual way to grasp the number of enriched gene sets at every enrichment score value. The first two peaks are negative global enrichment scores that represent the enrichment of their corresponding number of pathways in the NS phenotype. The last two peaks are positive global enrichment scores that represent the enrichment of their corresponding number of pathways in the GNS phenotype.

The endogenous pathway is used by any nucleated cell in the body to present endogenous, or cytosolic, fragments of proteins to cytotoxic CD8+ T cells. The receptors responsible for this presentation are the Major Histocompatibility Complex (MHC) class I molecules. By exposing cytosolic proteins outside the cellular membrane, MHC class I receptors can cause the activation of cytotoxic T cells against exogenous peptides derived from an infection. Healthy cells are ignored by the cytotoxic T cells thanks to the sensitivity of the recognition system between the loaded MHC class I receptor and the T cell receptor, while cells containing foreign proteins can be recognised and, therefore, killed. MHC class I molecules are heterodimers consisting of two polypeptide chains, the α chain and the β2-microglobulin chain, of which only the α chain is polymorphic. Loading of MHC class I receptors with peptides occurs inside the lumen of the endoplasmic reticulum (ER) [323].

The exogenous pathway is used by a cohort of specialised cells termed professional Antigen Presenting Cells, or APCs. These cells are macrophages, dendritic cells and B cells, and they present fragments of extracellular proteins to helper CD4+ T cells, triggering a cell-mediated (Th1) or humoral (Th2) response to peptides deriving from extracellular pathogens. The receptors responsible for this presentation are the MHC class II molecules. Like the MHC class I molecules, MHC class II molecules are heterodimers, but they consist of two homologous peptides, the α and β chains, both polymorphic and encoded by the HLA genes. Loading of both classes occurs inside the cell, but MHC class II molecules are loaded only when the vesicle generated at the ER fuses with a lysosome containing digested, endocytosed extracellular proteins. In the ER vesicle, the MHC class II molecule has its peptide-binding cleft blocked by CD74, a trimer called the "invariant chain", which prevents the binding of cellular peptides. Once this vesicle fuses with the lysosome, the invariant chain is unloaded from the MHC class II molecules, which are reloaded with a peptide from the lysosome via an MHC class II-like structure called HLA-DM. The stable MHC class II molecule thus formed is presented on the cell surface. The expression of MHC class II molecules is constitutive only on professional APCs but can be induced by cytokines such as IFNG on other cell types, including cancer cells [492]. All genes involved in the two pathways are listed in Table 7.3 [327], showing all alleles of the same Human Leukocyte Antigen (HLA) class.

Table 7.3: Summary of all MHC class I and II genes and the FC (FDR > 10%) of those measured in the Tag-seq dataset.

Cytosolic pathway        Exogenous pathway
Gene       log2(FC)      Gene                           log2(FC)
HLA-A      3.04          Alpha chains
HLA-B                    HLA-DMA                        Inf
HLA-C                    HLA-DQA
HLA-E      2.09          DPA1                           5.73
HLA-F                    DQA1                           Inf
HLA-G                    DRA                            5.68
HLA-K                    Beta chains
HLA-L                    HLA-DMB
B2M                      HLA-DOB
PSMB5                    HLA-DPB1                       3.83
PSMB6                    HLA-DQA2                       6.56
PSMB7                    HLA-DQB1                       4.60
PSMB8                    HLA-DQB2
PSMB9                    HLA-DRB1
PSMB10                   HLA-DRB3
TAP1       2.36          HLA-DRB4
TAP2                     HLA-DRB5                       8.36
CANX                     Invariant chains
CALR                     CD74, CLIP                     7.11
TAPBPL     2.57          Transcriptional co-activator
                         CIITA

In a study on ovarian cancer, the MHC class II receptor α polymorphic chain, encoded by the HLA-DRA gene, was found to be the most over-expressed gene [421]. Although the β chain, HLA-DRB, was also over-expressed, the almost complete lack of this polypeptide at the protein level, through an unknown post-transcriptional or post-translational mechanism of regulation, precluded the formation of a mature HLA-DR receptor, and a similar pattern was observed when staining brain tumour tissue [419]. It seems that compensatory mechanisms such as cytoplasmic over-expression of Ii/CD74 and reduction of HLA-DRB are in action to decrease the tumour's immunogenicity by decreasing the ability of MHC class II to present tumour-specific antigens to the host immune system. Interestingly, no such mechanism is observed in our study, in which HLA-DRA, HLA-DRB and CD74 are all over-expressed in the GNS cell lines with a fold-change much greater than two (Table 7.3). The same magnitude of up-regulation is observed for the MHC class II-like peptide HLA-DM, which aids in the unloading of CD74 and the loading of immature MHC class II receptors with the extracellular peptides present in the lysosome to form the mature molecule. Two other MHC class II αβ heterodimer receptors, HLA-DP and HLA-DQ, are over-expressed according to our dataset. Both homologue chains encoded by the HLA-DPA1 and HLA-DPB1 loci and by the HLA-DQA1 and HLA-DQB1 loci are over-expressed and have the potential to form a mature MHC class II receptor. Altogether these findings suggest that in the GNS cell lines the exogenous pathway is not affected by compensatory mechanisms of regulation and that the presentation of extracellular peptides can potentially be completed for recognition by T helper cells. As for MHC class I-mediated immunogenicity, our study identifies two important players as strongly up-regulated in GNS cell lines, HLA-A and TAP1. The protein encoded by TAP1 is an ATP binding cassette transporter involved in the pumping of degraded cytosolic peptides across the ER into the vesicles where MHC class I receptors such as HLA-A assemble. TAP1 has been shown to interact with TAPBP and HLA-A [386]. As with the expression observed for MHC class II molecules, the over-expression of these MHC class I molecules suggests that part of the endogenous pathway might also be activated in immunosurveillance response mechanisms in GNS cell lines. We find an overall transcriptional up-regulation of MHC class II molecules and of a smaller subset of MHC class I molecules, suggestive of the absence of transcriptionally active compensatory mechanisms. However, the available data do not allow us to determine what happens at the post-transcriptional and post-translational levels; this up-regulation would need to be checked against protein expression data.

7.2 Glioblastoma Expression Signatures

Analysis of microarray gene expression data for hundreds of high-grade glioma samples and a smaller number of xenografts has shown that most tumours can be classified into a small number of subtypes correlated with survival and response to therapy [148,390,511]. The largest such study to date identified four glioblastoma subtypes, each characterised by a distinct gene expression signature encompassing 210 genes [511]. The subtypes were named "proneural", "neural", "classical" and "mesenchymal" based on which genes were up-regulated in their respective expression signatures. To investigate whether these subtype signatures could be captured by Tag-seq data, we analysed Tag-seq expression data for three primary glioblastoma tumours, 11 xenografts and two normal brain samples, produced by Parsons et al [383] with the same Tag-seq protocol used on our GNS and NS cell lines. The correlations were highly significant (p < 0.01) for all tumour and xenograft samples and both normal brain samples (Fig 7.3), confirming that Tag-seq captures the subtype expression signatures previously observed in large microarray datasets. Specifically, of the tumour and xenograft samples, three were classified as proneural, seven as classical and three as mesenchymal. Both normal brain samples, but none of the glioblastoma samples, were classified as neural, consistent with this subtype being the least common and characterised by markedly better prognosis, expressing genes associated with normal brain and neurogenesis [390,511]. In comparing the GNS line expression profiles to the subtype signatures, we found that both G166 and G179 correlated strongly with the mesenchymal signature. Mesenchymal subtype markers with elevated expression in these two lines included MET, CD44, CD68 and CASP1 [511]. G144 did not correlate significantly with any of the four signatures, but showed a slight positive correlation (R = 0.07) with the proneural one. Supporting such a classification of G144 were several of the hallmarks of the proneural subtype emphasised by Verhaak et al [511], such as high expression of the oligodendrocytic development genes PDGFRA, NKX2-2 and OLIG2, as well as of the ERBB3, DCX and TCF4 genes, and low levels of the tumour suppressor CDKN1A. Overall, the Tag-seq data agree with the microarray results, which is reassuring given how different the two technologies are. When we checked in which of the four subtypes defined by Verhaak et al [511] our 29 genes fell (the genes distinguishing GNS from NS cell lines, measured via qRT-PCR), we found that only three of them, namely CEBPB, TES and PLS3, were represented among the original signature genes. Interestingly, however, each of the three genes fell within the mesenchymal subtype. The mesenchymal phenotype has recently been associated with neoplastic transformation in the CNS as a state in which cells acquire an uncontrolled ability to invade and stimulate angiogenesis [390,497]. Since the defining characteristics of the aggressiveness of GBM are invasion of the local brain parenchyma and the ability to stimulate novel angiogenesis [154,219], our findings reflect this GBM identity.
Although 26 of the 29 genes were not represented in the original 210 gene signature defined by Verhaak et al, the three genes that were present belonged to the mesenchymal subtype, suggesting that they are core players in establishing the characteristic aggressiveness of GBM (as confirmed by two separate platforms). CEBPB, which we find to be up-regulated in GNS lines with respect to NS lines in our qRT-PCR measurements (Fig 6.14), is an important player in the mesenchymal regulatory module together with STAT3 and is needed for mesenchymal transformation in human glioma cells [85]. The TES gene, which we instead find to be expressed only in NS cell lines (Fig 6.14), has recently been identified as a tumour suppressor that is methylated in tumours and is responsible for repressing cell growth. It is possibly inactivated by transcriptional silencing resulting from CpG island methylation [493] and, interestingly, is located on the long arm of chromosome 7, which we know is gained in our GNS cell lines. Thus, very effective transcriptional silencing mechanisms must be actively suppressing the TES gene. Finally, PLS3 encodes an actin-binding protein that has yet to be associated with any form of neoplasia and that we find to be up-regulated in our GNS cell lines with respect to NS cell lines as measured by qRT-PCR.

Figure 7.3: Correlation with glioblastoma subtype expression signatures for tissue samples and cell lines interrogated by Tag-seq. (a) Heatmap from the Verhaak et al [511] paper. (b) Colours indicate the correlation (Pearson R) between subtype-specific centroid values determined by Verhaak et al [511] and gene expression in the indicated samples measured by Tag-seq. Cells showing positive correlation are labeled with p-values indicating the significance of the correlation.
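The comparison summarised in figure 7.3 can be sketched as a correlation of each sample with every subtype centroid. The Python example below is a minimal illustration and not the procedure described in Methods: the rule of assigning the subtype with the highest Pearson R is an assumption, the function names are hypothetical, and the centroid and expression values are invented four-gene toy vectors rather than the 210 gene signatures.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_subtype(sample, centroids):
    """Return (subtype, R) for the centroid best correlated with the sample."""
    scores = {name: pearson_r(sample, values) for name, values in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical centroid values over the same four signature genes.
centroids = {
    "proneural": [1.0, 0.0, -1.0, 0.0],
    "mesenchymal": [-1.0, 0.0, 1.0, 0.0],
    "classical": [0.0, 1.0, 0.0, -1.0],
}
# Hypothetical centred expression values for one sample.
sample = [2.0, 0.1, -2.0, 0.0]

subtype, r = classify_subtype(sample, centroids)
print(f"best-matching subtype: {subtype} (R = {r:.3f})")
```

A sample such as G144, whose best correlation is close to zero, shows why the significance of R matters in addition to which centroid ranks first.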

7.3 Tumour Expression Correlation

To investigate whether the identified core set of differentially expressed genes, which included 32 up-regulated and 60 down-regulated genes, showed similar expression patterns in primary tumours as in GNS lines, we made use of a cohort of public microarray data (see Methods, table 5.4). The core set of differentially expressed genes included genes with established roles in glioblastoma, such as PTEN [326] and CEBPB [85], as well as others not previously implicated in the disease. We were interested in how each of these genes behaved in different settings: the stem cell component of the tumour (GNS cells), primary tumours and lower grade gliomas, given that these differ significantly from one another from a biological perspective. The tissues in the primary tumours comprise a heterogeneous mixture of cell types, whilst our GNS culture system has been designed to maintain the very specific stem cell component of the tumour in culture. Thus, we did not expect perfect agreement between tissue-based and cell-based results. Furthermore, assessing the behaviour of these same genes in lower grade gliomas could give us insights into the key players that determine the severity of the disease. Considering that our GNS cell lines have all been classified as primary GBMs, the differences observed from the lower grade glioma datasets will be especially meaningful in telling us how the same genes behave in the two different categories (if we approximate that a significant percentage of the lower grade gliomas transform into higher grade gliomas such as GBMs). Panels a and c of figure 7.4 compare the GBM data in the TCGA dataset to the non-neoplastic brain tissue data in the TCGA dataset (altogether referred to as dataset GBM in Methods), for the core up-regulated (a) and core down-regulated (c) genes. Panels b and d, on the other hand, compare the GBM (grade IV glioma) data to the grade III glioma data in the combined Phillips and Freije datasets (altogether referred to as dataset HGG in Methods), for the core up-regulated (b) and core down-regulated (d) genes. From this figure we observe a clear trend for the core up-regulated genes to be more highly expressed in glioblastoma tumours than in non-neoplastic brain tissue (Fig 7.4a; p = 0.02, randomisation test) and an opposite trend for the core down-regulated genes (Fig 7.4c; p = 3 × 10^-5).
This means that, of all the core differentially expressed genes that we measured with Tag-seq, a significant proportion of those more highly expressed in GNS cell lines than in NS cell lines are also more highly expressed in primary GBM than in non-neoplastic brain tissue (histogram bars in the direction of positive average log2(FC) identify these genes in panel a of figure 7.4). Similarly, a significant proportion of the core genes that we found to be down-regulated in our GNS cell lines relative to our NS cell lines are also down-regulated in primary GBM relative to non-neoplastic brain tissue (histogram bars in the direction of negative average log2(FC) identify these genes in panel c of figure 7.4). The colour of the histogram bars represents the significance of the comparison, with black indicating a p-value < 0.01 and grey a p-value ≥ 0.01.
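The randomisation test referred to above can be illustrated with a minimal sketch. It is not the procedure described in Methods: it assumes that the test statistic is the mean log2(FC) of the gene set and that the null distribution is obtained by drawing random gene sets of the same size from all measured genes. The function name, gene identifiers and toy values are hypothetical.

```python
import random
from statistics import mean

def randomisation_test(log2fc, gene_set, n_draws=10000, seed=0):
    """One-sided randomisation test for an upward shift in mean log2(FC).

    log2fc   : dict mapping gene name -> log2 fold change (tumour vs reference)
    gene_set : genes of interest (e.g. a core up-regulated set)
    Returns (observed mean, p-value), where the p-value is the fraction of
    random same-size gene sets whose mean is at least as large as observed.
    """
    rng = random.Random(seed)
    genes = sorted(log2fc)
    members = [g for g in gene_set if g in log2fc]  # skip unmeasured genes
    observed = mean(log2fc[g] for g in members)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(genes, len(members))
        if mean(log2fc[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_draws + 1)

# Toy data (hypothetical): 200 background genes centred on zero,
# plus five set members shifted upwards.
toy_rng = random.Random(1)
toy = {f"g{i}": toy_rng.gauss(0.0, 1.0) for i in range(200)}
core_up = ["u1", "u2", "u3", "u4", "u5"]
toy.update({g: 2.0 + 0.1 * i for i, g in enumerate(core_up)})
obs, p = randomisation_test(toy, core_up, n_draws=2000)
```

For a down-regulated gene set the same function can be applied after negating the log2(FC) values, so that the test looks for a downward shift.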


Figure 7.4: Core gene expression changes in GNS lines are mirrored in glioblastoma tumours. Gene expression in tumours for the core up-regulated (a, b) and down-regulated (c, d) genes. The gene sets were identified by comparison of Tag-seq expression profiles for GNS and NS cell lines. Bars depict average FC between glioblastoma and non-neoplastic brain tissue (a, c) and between glioblastoma and grade III astrocytoma (b, d). Black bars indicate genes with significant differential expression in the microarray data (p < 0.01). The heatmaps show expression in individual samples relative to the average level in non-neoplastic brain (a, c) or grade III astrocytoma (b, d). The side dots mark the genes that belong to the cohort of 29 differentially expressed genes measured via qRT-PCR and use the same colour gradient adopted in figure 6.13. One gene (CHCHD10) not quantified in the TCGA dataset is omitted from panel a.
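As a minimal sketch of how the bar heights and heatmap values in figure 7.4 relate to the underlying data, the snippet below assumes log2-scale expression values for a single gene, so that a difference of means is an average log2(FC). The function name and the toy values are hypothetical and are not taken from the datasets used here.

```python
from statistics import mean

def relative_to_reference(tumour, reference):
    """Compare one gene's log2 expression in tumours with a reference group.

    tumour    : log2 expression values, one per tumour sample
    reference : log2 expression values, one per reference sample
                (e.g. non-neoplastic brain or grade III astrocytoma)
    Returns (average log2 FC, per-sample values relative to the reference mean).
    """
    ref_mean = mean(reference)
    heatmap_row = [value - ref_mean for value in tumour]  # one heatmap cell per sample
    bar = mean(heatmap_row)                               # bar height for this gene
    return bar, heatmap_row

# Toy values (hypothetical): three tumours and two reference samples.
bar, row = relative_to_reference([8.0, 9.0, 10.0], [7.0, 7.0])
```

A positive bar corresponds to higher expression in the tumour group than in the reference group, and a negative bar to lower expression.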


The orientation of the bar represents the direction of FC and the height of the bar the magnitude of the FC (−3 < FC < 3). We hypothesised that expression of these genes might also differ between glioblastoma and less severe gliomas. Upon examination of the expression patterns for grade III gliomas and glioblastoma in the dataset HGG, we found the core up-regulated genes to be more highly expressed in glioblastoma than in grade III glioma (Fig 7.4b; p < 10⁻⁶), while the core down-regulated genes showed the opposite pattern (Fig 7.4d; p < 10⁻⁵). Thus, of the core differentially expressed genes we measured with Tag-seq, a significant proportion of those more highly expressed in GNS cell lines than in NS cell lines were also more highly expressed in grade IV gliomas (GBMs) than in grade III gliomas (histogram bars in the direction of positive average log2(FC) identify these genes in panel b of figure 7.4). This trend may indicate that the core up-regulated genes in our GNS cell lines that are also up-regulated in GBM primary tumours belong to a cohort of genes involved in regulatory networks already activated in grade III gliomas. The very fact that the same genes are similarly over-expressed in our GNS cell lines and in grade III gliomas may be an indication of a specific prognosis, namely progression to a higher grade glioma. At this level of analysis this is only a hypothesis, and it would be interesting to test it further and establish whether it can be used as a pre-diagnostic tool.

Similarly to the matching trends observed for the core up-regulated genes, a significant proportion of the differentially expressed genes that we found to be down-regulated in GNS cell lines relative to NS cell lines were also down-regulated in grade IV gliomas (GBMs) relative to grade III gliomas (histogram bars in the direction of negative average log2(FC) identify these genes in panel d of figure 7.4). This opposite trend is not surprising considering the hypothesis mentioned above. If the regulatory networks involved in the progression of the disease from grade III to grade IV have already been activated in grade III gliomas, then a set of complementary genes and regulatory networks must be silenced and rendered transcriptionally inactive. In this scenario, a tool that combines the two oppositely moving groups of genes may serve a pre-diagnostic purpose well. A dedicated study would be needed to verify the validity of this hypothesis.

The dots next to the gene names on the sides of the heatmaps in figure 7.4 highlight whether a gene also belongs to the cohort of 29 genes found via qRT-PCR to distinguish GNS cell lines from NS cell lines. The colour of the dot indicates the direction (red or blue) and the magnitude (intensity of red or blue) of the FC as measured via qRT-PCR, adopting the same colour scheme as the heatmap in 7.4. Table 7.4 summarises the information gathered in the literature searches for the 29 genes differentially expressed as measured via qRT-PCR; this information is expanded upon in the paragraphs below.

Up-regulated in GNS cell lines. The genes HOXD10, CD9, PLA2G4A, MT2A, SULF2, DDIT3, PLS3, CEBPB, PRSS12 and LYST are all highly expressed in both GNS cell lines and primary GBMs. By contrast, FOXG1, LMO4, ADD2 and PDE1C are up-regulated in GNS cell lines but down-regulated in primary GBMs.

Interestingly, FOXG1 is proposed to act as an oncogene in GBM by suppressing the growth inhibitory effects of TGFβ [448], so the up-regulation we observe in GNS cell lines (Fig 7.4a) could be a cell type specific regulation that especially affects the stem cell component of the tumour. This would corroborate the hypothesis that NS cells are candidates for tumour-initiating cells in GBM. This up-regulation is also not mirrored in grade III astrocytomas, indicating that different oncogenic factors are in action (Fig 7.4b).

The transcriptional regulator LMO4 is of particular interest as it is involved in the development of multiple organs, including the CNS, and its expression is elevated in several cancers [202,339,475,488,542]. LMO4 is especially well studied as an oncogene in breast cancer and is regulated through the phosphoinositide 3-kinase pathway [340], which is commonly affected in glioblastoma [326]. As with FOXG1, the up-regulation of LMO4 in GNS cell lines but not in primary GBM (Fig 7.4a) could reflect its oncogenic role in GBM tumour-initiating cells.
Interestingly, LMO4 could be an oncogene at work early in the progression of the disease from a lower to a higher grade, as its up-regulation is observed in grade III astrocytomas (Fig 7.4b).

The ADD2 gene encodes a cytoskeletal protein that interacts with FYN, a tyrosine kinase promoting cancer cell migration [456,533]. The up-regulation of this gene in GNS cell lines may reflect the invasiveness that characterises GBM, and is possibly lost through differentiation, since ADD2 appears to be down-regulated in primary GBM data (Fig 7.4a). The small level of up-regulation observed for this gene in grade III astrocytomas (Fig 7.4b) may reflect its activity early in the progression of the disease.

PDE1C is a cyclic nucleotide phosphodiesterase gene that we observe to be up-regulated in GNS cell lines and down-regulated in primary GBM (Fig 7.4a).

Table 7.4: Literature survey for the 29 genes found to distinguish GNS from NS lines across a panel of 21 cell lines. The table details whether each gene has previously been implicated in glioma or other neoplasias, and includes references to relevant publications. Columns: Gene (aliases); Category; Evidence in glioma?; Evidence in other cancers?; Selected references (PubMed IDs).

Gene          Category
ADD2          Up-regulated in GNS
CD9           Up-regulated in GNS
CEBPB         Up-regulated in GNS
DDIT3         Up-regulated in GNS
FOXG1         Up-regulated in GNS
HMGA2         Down-regulated in GNS
HOXD10        Up-regulated in GNS
IRX2          Down-regulated in GNS
LMO4          Up-regulated in GNS
LYST          Up-regulated in GNS
MAF           Down-regulated in GNS
MAP6          Down-regulated in GNS
MT2A          Up-regulated in GNS
MYL9          Down-regulated in GNS
NDN           Down-regulated in GNS
NELL2         Down-regulated in GNS
PDE1C         Up-regulated in GNS
PLA2G4A       Up-regulated in GNS
PLCH1         Down-regulated in GNS
PLS3          Up-regulated in GNS
PRSS12        Up-regulated in GNS
PTEN          Down-regulated in GNS
SDC2          Down-regulated in GNS
ST6GALNAC5    Down-regulated in GNS
SULF2         Up-regulated in GNS
SYNM          Down-regulated in GNS
TAGLN         Down-regulated in GNS
TES           Down-regulated in GNS
TUSC3         Down-regulated in GNS


Up-regulation of PDE1C has been associated with proliferation in other cell types through hydrolysis of cAMP and cGMP [126,437]. Up-regulation in GNS cell lines may foster the proliferation that characterises these cells, which may be lost in the more differentiated progeny that make up the primary tumour sample. In figure 7.4b we observe a down-regulation of PDE1C in grade III astrocytomas.

Altogether, FOXG1, LMO4, ADD2 and PDE1C may be part of the GNS expression profile that defines the stem cell identity of GBM and is therefore lost through the differentiation pathways undertaken by the remaining tumour cells, which do not retain their stem cell identity, in line with the cancer stem cell hypothesis.

Of the ten genes that are highly expressed in both GNS cell lines and primary GBMs, HOXD10 encodes a protein with a homeobox DNA-binding domain that is known to be involved in limb development and differentiation [82]. HOXD10 protein levels are suppressed by a microRNA (miR-10b) that is highly expressed in gliomas, and it has been suggested that HOXD10 suppression by miR-10b promotes invasion [476]. Interestingly, the HOXD10 mRNA up-regulation we observe in GNS cell lines and GBM tumours is not mirrored in grade III astrocytoma, perhaps reflecting the less invasive phenotype. In fact, miR-10b is present at higher levels in glioblastoma compared to gliomas of lower grade [476].

The CD9 gene encodes a cell-surface glycoprotein, or antigen, that has previously been implicated in glioma, with a role in adhesion and migration. In the CNS, CD9 is expressed in the myelin sheath and is believed to suppress the metastatic potential of some human tumours, including gliomas [221]. We find up-regulation of CD9 in GNS cell lines as well as primary GBM (Fig 7.4a), but down-regulation in grade III astrocytoma (Fig 7.4b), perhaps reflecting the lower invasive potential of the latter.
The PLA2G4A gene encodes a phospholipase enzyme that catalyses the hydrolysis of membrane phospholipids to lipid-based cellular hormones, which then regulate a variety of intracellular pathways. PLA2G4A has not been linked to GBM but has been implicated in other neoplasias [192,285,341]. We observed up-regulation of PLA2G4A in GNS cell lines as well as primary GBM (Fig 7.4a), but down-regulation in grade III astrocytoma (Fig 7.4b).

The MT2A gene encodes a metallothionein that has yet to be implicated in gliomas, but has been observed to interact with the kinase domain of a member of the PKC family in prostate cancer [422] and with TP53 in breast cancer epithelial cells, possibly regulating apoptosis in the latter [373]. For MT2A, too, we observed up-regulation in GNS cell lines as well as primary GBM (Fig 7.4a), but down-regulation in grade III astrocytoma (Fig 7.4b).

The SULF2 gene encodes a sulfatase that edits the sulfation status of heparan sulfate proteoglycans on the outside of cells and, in this way, regulates critical signaling pathways [433]. Dysregulation of SULF2 has been implicated in non-small cell lung cancer [268], pancreatic cancer [357], hepatocellular carcinoma [254], breast cancer [344] and gliomas, in which knock-down of the SULF2 gene resulted in decreased GBM growth in vivo in mice. At the molecular level, ablation of SULF2 resulted in decreased PDGFRα phosphorylation and decreased downstream MAPK signaling activity. Interestingly, in line with this observation, the proneural GBM subtype defined by Verhaak et al [511], which is characterised by aberrations in PDGFRα, showed the strongest SULF2 expression [391]. In our data SULF2 was up-regulated in GNS cell lines and primary GBM (Fig 7.4a), in line with the observations of Phillips et al [391] that made it a candidate oncogene. We also observed SULF2 to be strongly up-regulated in grade III astrocytoma (Fig 7.4b), indicating a possible regulatory impact of SULF2 early in the progression of the disease.

The DDIT3 gene encodes the pro-apoptotic protein CHOP, which is known to drive the down-regulation of the anti-apoptotic mitochondrial protein Bcl-2, thereby favouring apoptosis through the activation of and caspase 3. Studies in hepatoma and pheochromocytoma cell lines have shown that the transcription factor encoded by CEBPB (C/EBPβ) promotes the expression of DDIT3 [324] and thus of CHOP, which in turn can inhibit C/EBPβ by dimerising with it and acting as a dominant negative [324].
This interplay between CEBPB and DDIT3 may be relevant for glioma therapy development, as DDIT3 induction in response to a range of compounds sensitises glioma cells to apoptosis [217]. In line with the pro-apoptotic role of DDIT3 described above, we observed up-regulation of the gene in GNS cell lines and primary GBM (Fig 7.4a) and down-regulation in grade III astrocytoma (Fig 7.4b); similarly, we found CEBPB to be up-regulated in GNS cell lines and primary GBM (Fig 7.4a), and down-regulated in grade III astrocytoma (Fig 7.4b).

PLS3 (T-plastin) encodes a regulator of actin organisation, and its over-expression in the CV-1 fibroblast-like cell line resulted in partial loss of adherence [27]. The elevated levels of PLS3 expression we observe in GNS cell lines and primary GBM may thus be relevant to the invasive phenotype. Accordingly, the less invasive grade III astrocytoma shows down-regulation of PLS3 (Fig 7.4b).


The PRSS12 gene encodes a protease that can activate tissue plasminogen activator (tPA) [335], an enzyme that is highly expressed by glioma cells and has been suggested to promote invasion [168]. We observe a slight up-regulation of PRSS12 in GNS cell lines and primary GBM (Fig 7.4a), and a slight down-regulation in grade III astrocytomas (Fig 7.4b).

The LYST gene encodes a vesicular transport protein called the "lysosomal trafficking regulator" that so far has not been implicated in any neoplasia but is known to be associated with a rare recessive disorder (Chédiak-Higashi syndrome) caused by a microtubule polymerisation defect that decreases phagocytosis ability [87]. In line with the hypothetical increase of cellular activities and rates, and thus of vesicular transport, in GNS cells, we observed LYST to be up-regulated in GNS cell lines and primary GBM (Fig 7.4a), and slightly down-regulated in grade III astrocytomas (Fig 7.4b).

Down-regulated in GNS cell lines. In panels c and d of figure 7.4, the core down-regulated genes found through Tag-seq are used to compare the GBM data with the non-neoplastic tissue data (TCGA dataset) and the GBM (grade IV) data with the grade III astrocytoma data (HGG dataset). In this comparison, the genes that were also identified via qRT-PCR as differentially expressed between GNS and NS cell lines are highlighted by dots in the blue colour gradient. The trends in figure 7.4 show matching down-regulation in GNS cell lines and primary GBM for NELL2, TUSC3, ST6GALNAC5, PLCH1, NDN, MAP6 and PTEN, and non-matching trends (down-regulation in GNS cell lines but up-regulation in primary GBM) for MAF, MYL9, HMGA2, SDC2, SYNM, IRX2, TES and TAGLN.

The MAF gene encodes a transcription factor and oncoprotein that belongs to the same AP-1 super-family as JUN and FOS.
MAF has been associated with multiple myeloma, where its up-regulation is suggested to enhance myeloma proliferation and adhesion to the bone marrow [406]. Although MAF has not previously been linked to glioma, we find it to be down-regulated in GNS cell lines but slightly up-regulated in primary GBM, perhaps as an indicator of its cell type restricted proliferation-enhancing functions (Fig 7.4c). Accordingly, we observed a slight down-regulation of MAF in grade III astrocytoma with respect to primary GBM (Fig 7.4d).

The protein encoded by the MYL9 gene is a myosin light chain that has previously been associated with the stem cell component of lung adenocarcinoma [447] and medullary breast cancer [54] in gene expression profiling studies. MYL9 has never been implicated in glioma; we observed down-regulation of its expression in GNS cell lines, as opposed to up-regulation in primary GBM (Fig 7.4c), and a strong down-regulation in grade III astrocytoma (Fig 7.4d).

The HMGA2 gene encodes a transcriptional regulator that belongs to the non-histone family of structural proteins. The members of this family act as chromatin architectural factors and contain structural DNA-binding domains that allow them to act as transcriptional factors. We find HMGA2 to be down-regulated in GNS lines and up-regulated in primary GBM (Fig 7.4c), and slightly down-regulated in grade III astrocytoma (Fig 7.4d). Low or absent protein expression of HMGA2 has been observed in GBM compared to low grade gliomas [14], and HMGA2 polymorphisms have been associated with survival time in GBM [295].

The SDC2 gene encodes a transmembrane heparan sulfate proteoglycan that participates as an extracellular matrix receptor in cell proliferation, cell migration and cell-matrix interaction. Altered expression of SDC2 has been detected in esophageal carcinoma [201], colon carcinoma [184], fibrosarcoma [380], prostate cancer [108,407] and gliomas, where over-expression of SDC2 promotes membrane protrusion, migration, capillary tube formation and cell-cell interactions in microvascular endothelial cells [141]. We found SDC2 to be down-regulated in GNS cell lines and grade III astrocytoma, but slightly up-regulated in primary GBM (Fig 7.4).

The SYNM gene encodes a type IV intermediate filament protein that has recently been shown to interact with the LIM domain protein Zyxin, thereby possibly modulating cell adhesion and cell motility. Aberrant SYNM promoter methylation has been associated with early breast cancer relapse [362]. In gliomas, SYNM has been found to promote AKT-dependent GBM cell proliferation by antagonising protein phosphatase PP2A, the major regulator of AKT dephosphorylation [396].
We found SYNM to be down-regulated in GNS cell lines and grade III astrocytoma, but slightly up-regulated in primary GBM, although with a relatively high p-value (Fig 7.4).

The IRX2 gene is an Iroquois-class homeobox gene that has been associated with development of the vertebrate embryo and, in humans, specifically with the development of brain [322] and breast [274]. Over-expression of IRX2 has been detected in soft tissue sarcomas [7]. IRX2 has been proposed to enhance antitumour immune responses in that ex vivo pre-treatment of CD8+ T cells with IRX-2 provided protection from tumour-induced apoptosis [112]. In line with this suggested role of IRX2, we observed the gene to be strongly down-regulated in GNS cell lines and grade III astrocytoma, but up-regulated in primary GBM (Fig 7.4).

Finally, we found the TAGLN and TES genes to be absent or low in most GNS lines, but displaying the opposite trend in GBM tissue compared to normal brain (Fig 7.4c) or grade III astrocytoma (Fig 7.4d). Interestingly, the TES gene is located on chromosome 7, which is known to be gained in all our GNS cell lines (see section 6.3). A strong regulatory effect must therefore be acting in GNS cell lines to bring about the down-regulation of TES. Like SYNM, TES has been shown to interact with the LIM domain protein Zyxin; it is therefore believed to have a role in cell motility and is often found in focal adhesions [109]. The TAGLN gene encodes an actin-binding protein found in fibroblasts and smooth muscle, the expression of which is down-regulated in many cell lines and may be an early marker for the onset of transformation [260]. Both TAGLN and TES have been characterised as tumour suppressors in malignancies outside the brain, and TES is known to often be silenced by promoter hypermethylation in GBM [30,349].

Altogether, MAF, MYL9, HMGA2, SDC2, SYNM, IRX2, TES and TAGLN may be part of a GNS expression profile that helps define the stem cell identity of these cells and of the tumour that derives from them, in line with the cancer stem cell hypothesis. Interestingly, the expression pattern of most of these genes is mirrored in grade III astrocytomas.

Of the seven genes that are down-regulated in both GNS cell lines and primary GBMs, TUSC3 is a candidate tumour suppressor known to be silenced by promoter methylation in GBM, particularly in patients over 40 years of age. Loss or down-regulation of TUSC3 has been found in other cancers, such as colon cancer, where its promoter becomes increasingly methylated with age in the healthy mucosa [12].
These data suggest that transcriptional changes in healthy ageing tissue, such as TUSC3 silencing, may contribute to the more severe form of glioma in older patients. In line with its role as a tumour suppressor, we found TUSC3 to be down-regulated in GNS cell lines, primary GBM and grade III astrocytoma (Fig 7.4).

The PLCH1 gene encodes a member of the phospholipase C family of enzymes, which cleave phosphatidylinositol 4,5-bisphosphate to generate the second messengers inositol 1,4,5-trisphosphate and diacylglycerol. PLCH1 is thus involved in phosphoinositol signaling [228], just like the frequently mutated phosphoinositide 3-kinase complex [326]. We found PLCH1 to be down-regulated in GNS cell lines and primary GBM (Fig 7.4c) and slightly up-regulated in grade III astrocytoma (Fig 7.4d).


The PTEN lipid phosphatase gene is a well known tumour suppressor that is frequently deleted or mutated in GBM, and its down-regulation disrupts the RTK/PI3K/PTEN pathway, in which it is an instrumental player [326]. PTEN is mutated in a variety of other cancers besides gliomas, including prostate, breast and endometrial cancer and melanoma [377,378,439,514]. We observe the expression of PTEN to be down-regulated in GNS cell lines and primary GBM (Fig 7.4c) but up-regulated in grade III astrocytoma (Fig 7.4d).

The ST6GALNAC5 gene belongs to the sialyltransferase family of proteins that modify proteins and on the cell surface to alter cell-cell or cell-extracellular matrix interactions [498]. ST6GALNAC5 facilitates the transmigration of cancer cells through the blood-brain barrier and is known to mediate breast cancer metastasis to the brain [67]. In GBM, ST6GALNAC5 has already been observed to be down-regulated with respect to normal brain tissue [250]; we observed it to be down-regulated in GNS cell lines as well as primary GBM and slightly up-regulated in grade III astrocytoma (Fig 7.4).

The NDN gene is an imprinted, intronless gene expressed exclusively from the paternal allele that acts as a growth suppressor. Studies in mice suggest that the Necdin protein suppresses growth in postmitotic neurons [209,487]. Necdin has also been suggested to interact with and negatively regulate HIF1α, a factor that mediates cellular homeostatic responses, such as angiogenesis, to reduced O2 availability [342]. The expression of NDN is also known to be down-regulated through transcriptional control by STAT3 in human melanoma, prostate cancer and breast cancer [188]. We found NDN to be strongly down-regulated in GNS cell lines and primary GBM but strongly up-regulated in grade III astrocytoma (Fig 7.4).
The MAP6 gene encodes a microtubule-associated protein that is calmodulin-binding and calmodulin-regulated and is involved in microtubule stabilisation. MAP6 has yet to be associated with any neoplasia; we observed it to be down-regulated in GNS cell lines and primary GBM, and slightly up-regulated in grade III astrocytoma (Fig 7.4).

Finally, the NELL2 gene encodes a glycoprotein containing several EGF-like repeats that is found in the cytoplasm and is involved in neural cell growth and differentiation as well as oncogenesis [521]. Together with NELL1, NELL2 is predominantly expressed in neuroblastoma and other embryonal neuroepithelial tumours [305], but has yet to be characterised in glioblastoma. We observed its expression to be down-regulated in GNS cell lines and primary GBM, as well as grade III astrocytoma (Fig 7.4).

Overall, we can confirm that the set of core differentially expressed genes identified by Tag-seq defines an expression signature characteristic of glioblastoma and related to glioma histological grade.

7.4 Survival Analysis

To explore the relevance in glioma of the patterns observed in the GNS vs. NS cell line comparison, we integrated clinical information with tumour expression data. Although this analysis was not performed by the candidate, it is included in this thesis because it is highly relevant to the other material and analyses presented, and helps give the reader a more complete picture of the relevance of GNS cell lines in glioblastoma research.

We first tested for associations between gene expression and survival time using the TCGA dataset, consisting of 397 glioblastoma cases (see Methods section 5.9 for table 5.4). For each gene, we fitted a Cox proportional hazards model with gene expression as a continuous explanatory variable and computed a p-value by the score test (Table 7.5). The set of 29 genes found to distinguish GNS cells from NS cells across the 22 cell lines assayed by qRT-PCR was enriched for low p-values compared to the complete set of 18,632 genes quantified in the TCGA dataset (p = 0.02, one-sided Kolmogorov-Smirnov test), demonstrating that expression analysis of GNS and NS lines had enriched for genes associated with patient survival. Seven of the 29 genes had a p-value < 0.05 and, for six of these, the direction of the survival trend agreed with that of the GNS expression, such that greater similarity to the GNS expression pattern indicated poor survival. Specifically, DDIT3, HOXD10, PDE1C and PLS3 were up-regulated in GNS cells and expressed at higher levels in glioblastomas with poor prognosis, while PTEN and TUSC3 were down-regulated in GNS cells and expressed at lower levels in gliomas with poor prognosis.

Having observed this trend, we reasoned that we could obtain an even stronger and more robust association with survival by integrating expression information for multiple genes up- or down-regulated in GNS cell lines.
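The per-gene survival test described above can be illustrated with a small, self-contained sketch. It computes the score test of β = 0 for a Cox model with a single continuous covariate, using the partial-likelihood score and information evaluated at β = 0 and Breslow's handling of tied event times. This is an illustration under those assumptions, not the software used for the analysis; the function name and the toy data are hypothetical.

```python
import math

def cox_score_test(times, events, x):
    """Score test of beta = 0 in a Cox model with one continuous covariate.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    x      : covariate (e.g. expression of one gene) per patient
    Returns (U, chi-square statistic, p-value). U > 0 means that higher x
    goes with higher hazard, i.e. poorer survival.
    """
    u = 0.0      # score: sum over deaths of (x_i - risk-set mean of x)
    info = 0.0   # information: sum over deaths of risk-set variance of x
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        risk = [xj for tj, xj in zip(times, x) if tj >= t_i]
        m = sum(risk) / len(risk)
        u += x_i - m
        info += sum((xj - m) ** 2 for xj in risk) / len(risk)
    stat = u * u / info
    p = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 d.f.
    return u, stat, p

# Toy data (hypothetical): patients with higher expression die earlier.
u, stat, p = cox_score_test(
    times=[1, 2, 3, 4, 5, 6],
    events=[1, 1, 1, 1, 1, 1],
    x=[6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
)
```

The sign of U plays the same role as the sign of the fitted coefficient reported in Table 7.5: a positive value indicates that higher expression is associated with poorer survival.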
We therefore combined the expression values for DDIT3, HOXD10, PDE1C, PLS3, PTEN and TUSC3 into a single value per tumour sample, which we named the "GNS signature score" (see Methods section 5.9). This score was more strongly associated with survival (p = 10⁻⁶) than the expression levels of any of the six individual genes were (0.005


Table 7.5: Survival tests for the 29 genes found via qRT-PCR to distinguish GNS cell lines from NS cell lines.

                        TCGA dataset              Gravendeel dataset (GBM cases)
Gene        Category    Coefficient*  p-value     Probeset**     Coefficient*  p-value
ADD2        Up          -0.13         0.2858      237336_at      -0.17         0.1420
CD9         Up           0.18         0.0731      201005_at       0.17         0.0689
CEBPB       Up           0.19         0.1028      212501_at       0.17         0.0651
DDIT3       Up           0.17         0.0128      209383_at       0.09         0.2777
FOXG1       Up           0.13         0.0861      206018_at       0.11         0.0380
HMGA2       Down         0.13         0.1456      1561633_at     -0.84         0.2459
HOXD10      Up           0.12         0.0108      229400_at       0.15         0.0021
IRX2        Down        -0.19         0.2346      228462_at      -0.20         4.4 × 10⁻⁴
LMO4        Up           0.24         0.1046      209205_s_at     0.20         0.1435
LYST        Up           0.05         0.5590      203518_at       0.10         0.4151
MAF         Down         0.10         0.5873      209348_s_at     0.38         0.0074
MAP6        Down         0.16         0.3063      235672_at      -0.30         0.0087
MT2A        Up           0.16         0.1554      212185_x_at     0.27         0.0127
MYL9        Down         0.08         0.3764      201058_s_at     0.15         0.0252
NDN         Down        -0.04         0.4874      209550_at      -0.22         6.0 × 10⁻⁵
NELL2       Down         0.08         0.1021      203413_at       0.14         0.0215
PDE1C       Up           0.20         0.0105      236344_at       0.21         0.0134
PLA2G4A     Up          -0.06         0.3198      210145_at       0.30         2.9 × 10⁻⁴
PLCH1       Down         0.10         0.3165      214745_at       0.45         0.0094
PLS3        Up           0.13         0.0381      201215_at       0.30         0.0069
PRSS12      Up          -0.11         0.1865      213802_at       0.20         0.0296
PTEN        Down        -0.53         0.0047      228006_at      -0.40         0.0062
SDC2        Down         0.22         0.0044      212158_at       0.28         5.8 × 10⁻⁴
ST6GALNAC5  Down         0.01         0.9116      220979_s_at     0.08         0.2416
SULF2       Up          -0.11         0.1525      233555_s_at    -0.15         0.0930
SYNM        Down        -0.06         0.5620      212730_at       0.08         0.2613
TAGLN       Down         0.03         0.5947      205547_s_at     0.17         0.0030
TES         Down        -0.05         0.5759      202720_at       0.07         0.5499
TUSC3       Down        -0.14         0.0079      209227_at      -0.18         0.0060

* Fitted coefficient from Cox model; a positive coefficient indicates that higher expression is associated with poor survival and a negative coefficient indicates the opposite.
** For the Gravendeel dataset, the result for the most significant probeset interrogating the gene is shown.
Up = Up-regulated; Down = Down-regulated.
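One simple way to combine the six genes into a per-sample score is sketched below. The precise definition of the GNS signature score is given in Methods section 5.9 and is not reproduced here; this sketch assumes, for illustration only, that each gene is standardised across samples and that the score is the mean z-score of the up-regulated genes minus the mean z-score of the down-regulated genes. The function name and the toy expression values are hypothetical.

```python
from statistics import mean, pstdev

UP = ["DDIT3", "HOXD10", "PDE1C", "PLS3"]  # up-regulated in GNS lines
DOWN = ["PTEN", "TUSC3"]                   # down-regulated in GNS lines

def signature_scores(expr, up=UP, down=DOWN):
    """Combine several genes into one score per tumour sample.

    expr : dict mapping gene -> list of expression values, one per sample
    Returns a list of scores; a higher score means a more GNS-like profile
    (assumed definition: mean z of up genes minus mean z of down genes).
    """
    z = {}
    for gene in up + down:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd for v in values]  # standardise across samples
    n_samples = len(z[up[0]])
    return [
        mean(z[g][i] for g in up) - mean(z[g][i] for g in down)
        for i in range(n_samples)
    ]

# Toy data (hypothetical): three samples, ordered from least to most GNS-like.
toy_expr = {g: [1.0, 2.0, 3.0] for g in UP}
toy_expr.update({g: [3.0, 2.0, 1.0] for g in DOWN})
scores = signature_scores(toy_expr)
```

Subtracting the down-regulated genes means that both high expression of the up-regulated genes and low expression of the down-regulated genes raise the score, so a single number captures similarity to the GNS expression pattern.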

Figure 7.5: Association between GNS signature and other survival predictors. (a) Scatter plots demonstrate the correlation between GNS signature score and age at diagnosis for the TCGA (left) and Gravendeel (right) datasets. The regression line, Pearson correlation coefficient (r) and p-value indicating statistical significance of the correlation are shown. (b) GNS signature score for samples in the Gravendeel dataset, stratified by IDH1 mutation status and histological grade. Blue circles represent individual samples (independent cases) and grey boxplots summarise their distribution. Only cases with known IDH1 status are shown (127 mutated, 77 wild type).

To test whether these findings generalise to independent clinical sample groups, we examined the glioblastoma datasets described by Gravendeel et al [176] and Murat et al [353], consisting of 141 and 70 cases, respectively (see Methods section 5.9 for table 5.4). The GNS signature score was correlated with patient survival in both of these datasets (p = 3x10^-5 and p = 0.006, respectively; Fig 7.5a). At the level of individual GNS signature genes, five genes were significantly associated with survival (p < 0.05) in both of the two largest glioblastoma datasets we investigated (TCGA and Gravendeel): HOXD10, PDE1C, PLS3, PTEN and TUSC3 (Table 7.5). In addition to glioblastoma (grade IV) tumours, Gravendeel et al also characterised 109 grade I-III glioma cases (see Methods section 5.9). Inclusion of these data in survival analyses made the association with GNS signature even more apparent (Fig 7.5b). This is consistent with the observation made in section 7.3 whereby core transcriptional alterations in GNS cells correlated with histological grade of primary tumours. Analysis of data from the studies of Phillips et al [390] and Freije et al [148], which profiled both grade III and IV gliomas (see Methods section 5.9 for table 5.4), further confirmed the correlation between GNS signature and survival (Figure 7.5b). In summary, the association between GNS signature and patient survival was reproducible in five independent datasets comprising 867 glioma cases in total (see Methods section 5.9 for table 5.4).
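The exact definition of the GNS signature score is given in Methods section 5.9 and is not repeated here. As a minimal sketch of how six genes can be collapsed into one value per sample, the Python code below assumes a simple form: each gene is standardised across samples, genes down-regulated in GNS lines are sign-flipped, and the results are averaged. This form, and the names `SIGNATURE`, `zscores` and `signature_scores`, are illustrative assumptions and not the procedure used in this thesis; only the gene list and the Up/Down categories are taken from Table 7.5.

```python
from statistics import mean, pstdev

# Direction of each signature gene in GNS lines, as listed in Table 7.5.
SIGNATURE = {
    "DDIT3": +1, "HOXD10": +1, "PDE1C": +1, "PLS3": +1,  # up-regulated
    "PTEN": -1, "TUSC3": -1,                             # down-regulated
}

def zscores(values):
    """Standardise one gene across samples (mean 0, standard deviation 1)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def signature_scores(expression):
    """Return one score per sample.

    expression -- dict mapping gene symbol to a list of expression values,
                  one value per tumour sample (hypothetical input layout).
    The score is the mean of the signed z-scores, so a high score means the
    sample resembles the GNS expression profile under this assumed definition.
    """
    n_samples = len(next(iter(expression.values())))
    standardised = {gene: zscores(expression[gene]) for gene in SIGNATURE}
    return [
        mean(sign * standardised[gene][i] for gene, sign in SIGNATURE.items())
        for i in range(n_samples)
    ]
```

Under this assumed definition, a sample with high expression of the four up-regulated genes and low expression of PTEN and TUSC3 receives a high score, which matches the interpretation that higher scores indicate greater similarity to the GNS profile.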

In investigating relationships to known predictors of survival in glioma, we noted that the GNS signature scores correlated with patient age at diagnosis, suggesting that the GNS-related expression changes were associated with the more severe form of the disease observed in older patients (Figure 7.6a). Of the genes contributing to the GNS signature, HOXD10, PLS3, PTEN and TUSC3 correlated with age in both the TCGA and Gravendeel datasets.

Figure 7.6: Association between GNS signature score and patient survival. Kaplan-Meier plots illustrate the association between signature score and survival for (a) three independent glioblastoma datasets and (b) three datasets that include gliomas of lower grade (see Methods section 5.9). Higher scores indicate greater similarity to the GNS expression profile. Hazard ratios and log-rank p-values were computed by fitting a Cox proportional hazards model to the data. Percentile thresholds were chosen for illustration; the association with survival is statistically significant across a wide range of thresholds and the p-values given in the text and Table 7.5 were computed without thresholding, using the score as a continuous variable.
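Kaplan-Meier curves of the kind described in this caption are step functions obtained with the product-limit estimator. The Python sketch below shows that estimator for a single group of patients; it is a generic illustration and not the code used to produce the figure, and the function name and input layout are assumptions. To compare groups, the same function would be applied separately to the samples above and below a chosen score percentile.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function for one group.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the case was censored
    Returns a list of (time, survival probability) pairs, one per death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Censored cases lower the number at risk without producing a step, which is why they must be tracked separately from observed deaths.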

A mutation affecting codon 132 of the IDH1 gene is present in most grade III astrocytomas and a minority of glioblastomas, resulting in an amino acid change (R132H, R132S, R132C, R132G or R132L). The presence of this mutation is associated with lower age at disease onset and better prognosis [383,517]. As already mentioned in section 6.1, all 16 GNS lines profiled in this study were derived from glioblastoma tumours; the IDH1 locus was sequenced in each cell line (data not shown) and none of the cell lines appeared to harbour the mutation. We therefore investigated whether the GNS signature is characteristic of IDH1 wild-type glioblastomas. This analysis was possible because IDH1 status had been determined for most cases in the TCGA and Gravendeel datasets (Table 7.5) [176,326,511]. As expected, we found that gliomas with the IDH1 mutation tend to have lower GNS signature scores than IDH1 wild-type gliomas of the same histological grade (Fig 7.6b). However, we also found that the GNS signature bore a stronger survival association than the IDH1 status (Table 7.6). In fact, the signature remained a significant predictor of patient survival when controlling for IDH1 status (Table 7.6), demonstrating that it contributes independent information to the survival model and does not simply represent a transcriptional state of IDH1 wild-type tumours. This was evident in glioblastomas as well as grade I-III gliomas; the effect is thus not limited to grade IV tumours.

Table 7.6: Significance of survival association for GNS signature and IDH1 status.

                                                       Single covariate             Two covariates
Dataset                              Number of cases   GNS signature  IDH1 status   GNS signature  IDH1 status
TCGA (GBM cases)                     270               5.3x10^-5      0.0015        0.0091         0.1489
Gravendeel, GBM cases                118               2.7x10^-5      0.0031        9.2x10^-4      0.0840
Gravendeel et al, grade I-III cases  86                6.5x10^-4      0.5776        6.3x10^-4      0.5408

Wald test p-values, indicating association with survival, for each covariate in a Cox proportional hazards model with one or two covariates (GNS signature, IDH1 status or both). Cases with unknown IDH1 mutation status were excluded.

Finally, to investigate whether the correlation between GNS signature and age could be explained by the higher proportion of cases with IDH1 mutation among younger patients, we repeated the correlation analysis described above (Fig 7.6a), limiting the data to glioblastoma cases without IDH1 mutation. For the TCGA dataset, the correlation decreased somewhat (Pearson R = 0.25 compared to R = 0.36 for the full dataset) but remained highly significant (p = 6x10^-5), demonstrating that the correlation with age is only partially explained by IDH1 status. This result was confirmed in the Gravendeel dataset, where the effect of controlling for IDH1 status and grade was negligible (R = 0.38 compared to R = 0.39 for the full dataset including grade I-III samples). Among the individual signature genes, both HOXD10 and TUSC3 remained correlated with age in both datasets when limiting the analysis to IDH1 wild-type glioblastoma cases.
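The restricted correlation analysis described above amounts to recomputing the Pearson coefficient on a subset of cases. A minimal Python sketch follows; the function names and the boolean mask `keep` (for example, true for IDH1 wild-type glioblastoma cases) are hypothetical and only illustrate the comparison between the full and the restricted dataset.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_in_subset(scores, ages, keep):
    """Correlate signature score with age using only cases where keep is True."""
    pairs = [(s, a) for s, a, k in zip(scores, ages, keep) if k]
    subset_scores, subset_ages = zip(*pairs)
    return pearson_r(subset_scores, subset_ages)
```

Comparing `pearson_r(scores, ages)` with `correlation_in_subset(scores, ages, keep)` gives the two R values that are contrasted in the text.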

186 7.5 Glioblastoma Pathway Analysis Results

7.5 Glioblastoma Pathway Analysis

In order to identify differentially expressed genes within pathways known to be perturbed in glioma and pathways related to the progenitor cell state, I manually built and curated an integrated pathway map by gathering information about interactions and interactors from the literature and the databases described in the Methods chapters. The glioblastoma pathways that already exist in the literature, such as the KEGG "Glioma" pathway [3], are not as comprehensive as the present level of understanding of the disease requires. These existing pathways fall short in four specific ways:

1. they do not take into account the contribution from the stem cell component of the tumour and, thus, the progenitor cell state remains unrepresented;

2. core cell-cycle regulatory pathways that are very relevant in cancer, such as the MAPK cascade and the Integrin signal transduction network, are presented as zoomed-out blocks in which the regulatory genes involved are omitted;

3. they do not take into account the contribution from cancer-specific pathways such as apoptosis, angiogenesis and invasion, or glioblastoma-specific phenotypes such as antigen processing and presentation and mesenchymal transformation;

4. they are not available in formats other than poorly editable PDF/JPG/GIF images, so they cannot be used to overlay custom expression data and quickly highlight the relevant gene networks that are turned on or off.

In order to address these shortcomings and to visualise gene expression differences in a pathway context, I compiled an integrated glioblastoma pathway map that gathers the relevant information missing so far from the literature. The map includes the pathways most commonly affected in glioblastoma: (i) RTK/PI3K/PTEN signaling, (ii) p53 signaling and (iii) Rb-mediated control of cell cycle progression [154,326]; core cell-cycle regulatory pathways: (i) MAPK cascade and (ii) RTK signaling; and cancer-specific and glioblastoma-specific pathways: (i) antigen processing and presentation, (ii) apoptosis, (iii) angiogenesis, invasion and motility, and (iv) mesenchymal transformation [85].

Figure 7.7: The integrated glioblastoma pathway is subdivided into rough sections contained within the orange and blue boxes identifying the gene networks that participate in the pathway. In the orange boxes the "classic" glioblastoma pathways are highlighted: TP53, RB1 and PTEN signaling; in the blue boxes the cancer and glioblastoma-specific pathways are represented.

Figure 7.7 shows the integrated pathway in its default colours, without any overlaid expression data that would colour the gene nodes. Sections pertaining to different pathways are coloured in blue if the pathways are glioblastoma-specific or cancer-specific and in orange if they reflect the cross-talk between the three well-known pathways commonly affected in glioblastoma. The colour, line and endpoints of the edges represent the type of interaction between the nodes: activation (solid, green, arrow point), transcriptional activation (solid, green, diamond point), tentative activation (dashed, green, arrow point), inhibition (solid, red, T point), transcriptional inhibition (solid, red, delta point), includes (solid, black, circle point), becomes (solid, black, top half arrow point), simple interaction (solid, grey, no endpoint), leading to (dashed, grey, arrow), lets in (dashed, grey, no endpoint). The shape of the node represents its type: gene (round), complex (hexagon), family (diamond), molecule (vee), process (round rectangle). Finally, node colour distinguishes genes (grey if not overlaid with colour-coded expression data) from non-genes (beige). All the interactions described by the pathway can be found in Appendix D.1. To ensure that every interaction described in the pathway had been experimentally validated, I checked each of them in the interaction databases described in detail in the Methods section, such as BioGrid [62] and Intact [205]. Thus, this glioblastoma pathway is a representation of physically interacting proteins at work in the specific disease context.
When the interaction needs to occur with chromatin to describe the regulation properly, such as in the case of a transcription factor, either a node named "DNA" is described as one of the two interactors, or a special edge named "activates transcription of" or "inhibits transcription of" - identifiable by the diamond or delta endpoint, respectively - connects the two interactors. The entire pathway can be recreated from the list of interactions and interactors described in Appendix D.1, but an editable version - with extension CYS - can be downloaded from www.ebi.ac.uk/~diva/GBM-pathway.cys, where the latest and previous versions of the pathway are available. The CYS extension allows the pathway to be viewed and/or edited with network editing software that reads this format, such as Cytoscape [451]. The availability of the pathway for download and use by other researchers, as well as its constant curation, attempts to address one of the four main shortcomings of existing GBM pathways: the unavailability of formats other than images. Hopefully this resource will be used by researchers who want to overlay expression data on a comprehensive integrated GBM pathway to observe changes in the wider disease context.
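Because the pathway can be recreated from a list of interactions and interactors, it is natural to think of it as a table of typed edges. The Python sketch below is a hedged illustration of that idea and is not the format of Appendix D.1: it stores a few interactions as (source, interaction, target) triples, attaches the edge styles listed above, and writes them in Cytoscape's simple interaction format (SIF), in which each line holds a source node, a relationship type and a target node. The underscore-separated interaction labels and the choice of example edges are assumptions made for illustration.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "source interaction target")

# Edge styles as described in the text: (line, colour, endpoint).
EDGE_STYLE = {
    "activation": ("solid", "green", "arrow"),
    "transcriptional_activation": ("solid", "green", "diamond"),
    "tentative_activation": ("dashed", "green", "arrow"),
    "inhibition": ("solid", "red", "T"),
    "transcriptional_inhibition": ("solid", "red", "delta"),
}

# A few illustrative edges; the typing of each edge here is an assumption.
EXAMPLE_EDGES = [
    Edge("CEBPB", "activation", "STAT3"),
    Edge("CEBPB", "activation", "FOSL2"),
    Edge("PTEN", "inhibition", "AKT1"),
]

def to_sif(edges):
    """Serialise edges as tab-separated SIF lines, one interaction per line."""
    for edge in edges:
        if edge.interaction not in EDGE_STYLE:
            raise ValueError(f"unknown interaction type: {edge.interaction}")
    return "\n".join("\t".join(edge) for edge in edges)
```

Keeping the interaction type as an explicit column is what allows a network editor to map it to line style, colour and endpoint, as done in Figure 7.7.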

Naturally, we wanted to make use of the resource ourselves and used it to visualise gene expression changes in the context of the molecular networks represented. The complete pathway has a total of 245 nodes, of which 57 are not genes and the remaining 188 are single genes. The nodes representing genes can be recognised by their circular shape, which is unique to this category. Of the 57 non-gene nodes: 22 are families of proteins, represented by a diamond shape; eight are complexes of proteins, represented by a hexagon shape; six are non-proteic molecules, represented by a vee-like shape; the remaining 20 define end-processes such as "apoptosis", "cell signaling" and "protein synthesis" and are represented by round rectangular shapes. The only isoforms represented in the pathway are the two isoforms of gene CDKN2A, labeled as CDKN2A and CDKN2A:ARF (Table 7.7).

Table 7.7: Node assignment in the glioblastoma pathway. The colour of the nodes representing genes and isoforms is only grey at the default pathway level. When expression data are overlaid on the network the gene nodes will become coloured according to a customisable colour palette.

Node type              Number of nodes  Node shape       Node colour
Gene families          22               Diamond          Beige
Cellular Processes     20               Round Rectangle  Beige
Non-proteic molecules  6                Vee              Beige
Complexes              8                Hexagon          Beige
Genes                  186              Circle           Grey, default
Isoforms               2                Circle           Grey, default
Total                  245

The dataset of differentially expressed genes from the Tag-seq expression dataset contains 739 genes at an FDR of 10%. By overlaying the expression data for these 739 genes on the 188 gene nodes in the pathway, we found that 70 pathway genes were differentially expressed between GNS and NS cell lines at 10% FDR. Of these 70 genes, 51 were up-regulated in GNS cell lines (red colour gradient) and 19 were down-regulated in GNS cell lines (blue colour gradient) (Fig 7.8). The intensity of the colour reflects the value of the log2(FC) between GNS and NS cell lines, and the values defining the red and blue colour gradients were taken from the colour gradients defined in the qRT-PCR heatmap of figure 6.13 and the tumour expression correlation heatmap of figure 7.4.
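The overlay step described above is, in essence, an intersection between the pathway's gene nodes and the table of differentially expressed genes, followed by a mapping from log2(FC) to a colour. The Python sketch below illustrates this under stated assumptions: the function names, the saturation limit of 3 log2 units and the white-to-red and white-to-blue gradients are hypothetical and do not reproduce the palette of figures 6.13 and 7.4.

```python
def overlay(pathway_genes, log2fc):
    """Keep only differentially expressed genes that are nodes in the pathway.

    pathway_genes -- iterable of gene symbols present as gene nodes
    log2fc        -- dict mapping gene symbol to log2 fold change (GNS vs NS)
    """
    return {gene: log2fc[gene] for gene in pathway_genes if gene in log2fc}

def count_directions(overlaid):
    """Return (number up-regulated, number down-regulated)."""
    up = sum(1 for value in overlaid.values() if value > 0)
    down = sum(1 for value in overlaid.values() if value < 0)
    return up, down

def node_colour(value, limit=3.0):
    """Map a log2 fold change to a hex colour.

    Zero maps to white, positive values fade towards red and negative values
    towards blue; values beyond +/- limit are clipped (assumed limit).
    """
    x = max(-limit, min(limit, value)) / limit
    fade = int(round(255 * (1 - abs(x))))
    if x >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}{fade:02x}ff"
```

Applied to the real data, `overlay` would return the 70 pathway genes and `count_directions` the split into 51 up-regulated and 19 down-regulated genes reported above.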

Figure 7.8: Integrated GBM pathway with Tag-seq GNS dataset overlaid. The map is composed of 245 nodes, of which 188 nodes represent individual genes, shown as circles whose colour intensity correlates with FC expression changes.

Figure 7.9 displays a simplified and condensed version of the integrated GBM pathway of figure 7.8 that is focused on the pathways most frequently affected in glioblastoma, meant to highlight that we did capture some of the most common alterations in glioblastoma with the GNS/NS comparison. Such changes include up-regulation of EGFR and down-regulation of the tumour suppressor PTEN [326]. As for the complete pathway in figure 7.8, the colour gradient of the condensed pathway in figure 7.9 is taken from the colour gradients defined in the qRT-PCR heatmap of figure 6.13 and the tumour expression correlation heatmap of figure 7.4.

Figure 7.9: Affected p53, RB1 and PTEN/PI3K pathways. Genes are represented by circles coloured according to the expression FC measured between GNS and NS cell lines by Tag-seq.

Given the high heterogeneity typical of glioblastoma tumours, we envisioned another application for our GBM integrated pathway. Since every cell line originated from a separate patient (with the exception of G144ED and G144) and the heterogeneity of the tumour has recently been emphasised further by signature microarray studies like those of Verhaak et al [511] and Phillips et al [390], we mapped the Tag-seq measured expression levels for every GNS cell line on our GBM integrated pathway to try and visualise differences potentially unique to any GNS cell line and, therefore, to the tumour it originated from (single pathways in Appendix D.2). Figure 7.10 highlights the genes in our GBM pathway that are expressed in each individual GNS cell line, as measured via Tag-seq. The green nodes reflect the genes that have been measured and the intensity of the green colour correlates with the level of expression observed, where lighter greens signify smaller expression values and darker greens signify higher expression values (normalised digital tag counts range from 0 to 16,500) (Fig 7.10). For a clearer view of each of these pathways refer to Appendix D.2 Pathway Images.

By having the GNS cell line-specific GBM pathways next to each other in figure 7.10, it is possible to observe differences as well as commonalities in gene expression patterns. For example, in each GNS cell line, tumour suppressor PTEN is not expressed (blue shaded boxes) and the antigen presentation-related genes light up with different intensities, highlighting a more highly activated antigen presentation process at the transcriptional level in cell line G144 than in cell lines G166 and G179 (red shaded boxes). Also, since the CDKN2A-CDKN2B tumour suppressor locus is deleted in cell line G179 but retained in cell lines G144 and G166 - possibly in response to an oncogenic signal - the expression of its CDKN2A, CDKN2A:ARF and CDKN2B genes is very low (lighter green nodes) in the G179 pathway and much higher (darker green nodes) in the G144 and G166 pathways (green shaded boxes).
Finally, we can see that the orphan nuclear receptor NR0B1, which is known to be up-regulated and to drive tumour growth in Ewing’s sarcoma [158], is also highly expressed (darker green nodes) in our G179 pathway as opposed to the G144 and G166 pathways (yellow shaded boxes). The PARP3 and PARP12 genes, located below NR0B1 in the pathways and included within the yellow shaded box of figure 7.10, are highly expressed across all GNS pathways, consistent with the potential therapeutic role of these genes in GNS cells as inhibitors of their homolog PARP1, a property for which they are now being studied in brain tumour clinical trials [270].

Figure 7.10: Four integrated GBM pathways overlaid with Tag-seq expression level measures for each GNS cell line. Lighter green nodes indicate lower expression than darker green nodes. The blue box highlights the location of PTEN; the red box of the antigen presentation network of genes; the green box of the location of CDKN2A - CDKN2B locus genes; the yellow box of the location of genes NR0B1, PARP3 and PARP12.

In order to consider other sources of expression data that could highlight interesting patterns when compared to our Tag-seq dataset, we overlaid the exon and microarray expression data from the TCGA and HGG datasets (described in Methods section 5.9; all genes overlaid were found to be differentially expressed at FDR < 10%) on our integrated GBM pathway (Fig 7.11, 7.12). In figure 7.13 the three pathways appear next to each other to enable a quicker view of the notable differences and similarities. To allow for fair comparisons, in each pathway figure the colour gradient and range are identical to the ones originally used in heatmaps 6.13 and 7.4. It is interesting to notice through this comparison that the TCGA data overlaid GBM pathway (TCGA pathway) and the HGG data overlaid GBM pathway (HGG pathway) express a similar cohort of genes to the Tag-seq overlaid GBM pathway (GNS pathway). In fact, the PTEN node in the TCGA pathway is coloured in light blue, reflecting the slight down-regulation observed in figure 7.4 for primary GBM tumour samples with respect to non-neoplastic brain tissue. However, important differences can also be discerned, such as changes in expression of the three tumour suppressor genes TP53, PTEN and RB1 in the TCGA pathway with respect to the GNS and HGG pathways (blue shaded boxes in figure 7.13). TP53 appears to be up-regulated in the TCGA pathway as opposed to the GNS and HGG pathways, reflecting a higher expression in GBM primary tumours with respect to non-neoplastic brain tissue. This is a very interesting finding since in the TCGA dataset, TP53 is mutated in 38 of the 91 GBM samples, most of which are classified as primary GBMs [326]. PTEN, on the other hand, appears to be down-regulated in all three pathways, although at very different levels, reflected by the gradually descending intensities of the blue coloured nodes, with the highest level of down-regulation in the GNS pathway, followed by the TCGA pathway and, finally, the HGG pathway. In our Tag-seq dataset, in fact, PTEN was part of the core down-regulated genes.
In the TCGA dataset, 86% of the samples harboured at least one genetic event in the RTK/PI3K/PTEN pathway, with frequent deletions of PTEN. Interestingly, we can also see that the down-regulation of PTEN in the TCGA and HGG pathways is accompanied by the up-regulation of the AKT1 gene, highlighted by the red-orange coloured nodes. In fact, PTEN normally antagonises relocation of AKT1, thus reducing AKT1-mediated cell cycle promotion, and loss of PTEN most often results in constitutive activation of AKT1 [99] (yellow shaded boxes). Finally, RB1 seems to be up-regulated in the TCGA and HGG pathways and is undetected in the GNS pathway. Cell cycle progression is regulated by the activities of complexes of cyclins and CDKs, which phosphorylate RB1 and thus block its growth-inhibitory functions [318]. Within the TCGA dataset, 77% of samples showed genetic alterations in the Rb pathway, of which all samples with RB1 nucleotide substitution lacked CDKN2A/CDKN2B locus deletion - observable in the TCGA pathway by the red-orange colours of nodes representing CDKN2A genes (Fig 7.11). Amplification of the CDK4 locus was also a common finding in the TCGA dataset, CDK4 forming complexes with CDKN2A and CDKN2B to maintain cell cycle progression, which can be observed in the TCGA pathway as an orange node (Fig 7.11).

Other interesting patterns that emerge from the comparison of the three pathways in figure 7.13 are highlighted with the other coloured boxes. The HIF1A gene, responsible for regulating cellular responses to low oxygen levels, is up-regulated in the TCGA and HGG pathways but undetected in the GNS pathway (light blue shaded boxes). The SERPINB2 and SERPINE genes, responsible for the inhibition of angiogenesis and metastasis, seem to be strongly up-regulated in the GNS pathway but slightly down-regulated in the HGG pathway and undetected in the TCGA pathway (brown shaded boxes). The green shaded boxes highlight the regulatory loop for mesenchymal transformation that has been found to be active in GBM tumours. In this loop, CEBPB activates STAT3 and FOSL2 that together activate RUNX1 - also activated by DEC1 that is activated itself by FOSL2, which in turn activates JUN leading to mesenchymal transformation [85]. This regulatory loop is clearly active and up-regulated in the TCGA and HGG pathways, although only CEBPB appears as detected and up-regulated in the GNS pathway. The pink shaded boxes highlight the regulation of motility-relevant genes, such as those belonging to the SNARE complex, CALM1 and calcium channels of the CACN family, that appear to be strongly up-regulated in the GNS pathway but down-regulated in the TCGA and HGG pathways. Finally, the purple shaded boxes highlight the behaviour of the antigen processing and presentation-related genes such as HOXA, that are consistently up-regulated in all three pathways, although at different levels, with the GNS pathway showing the highest levels of up-regulation. This confirms the importance of the immune-evasion phenotype that is transcriptionally active in all three datasets.

Overall, this pathway approach allowed us to identify differentially expressed genes that participate in glioma-related pathways and highlight patterns between different tumour cell types. However, it should be made clear that the GBM integrated pathway is not a flawless representation of the gene networks and regulations that occur inside GBM tumour cells, but rather the most comprehensive representation to date of the known interactions. Since the pathway was manually built, specific sub-networks of interest can always be expanded by writing to the curator at [email protected] or by downloading the latest version of the pathway at www.ebi.ac.uk/~diva/GBM-pathway.cys and editing it directly. The manual curation of the pathway is nonetheless ongoing and new additions are released on a regular basis at the same address.

Figure 7.11: Integrated GBM pathway overlaid with TCGA dataset. The map is composed of 245 nodes, of which 188 represent individual genes, whose colours correlate with FC expression changes (http://cancergenome.nih.gov; [326]).

Figure 7.12: Integrated GBM pathway overlaid with HGG dataset. The map is composed of 245 nodes, of which 188 represent individual genes, whose colours correlate with FC expression changes [148,390].

Figure 7.13: Three integrated GBM pathways overlaid with the GNS, TCGA and HGG datasets (FDR < 10%). Blue boxes locate PTEN, TP53, RB1; yellow, AKT1; azure, HIF1A; brown, SERPINE; purple, antigen presentation genes; green, mesenchymal transformation; pink, motility genes.

Chapter 8

MicroRNA Target Prediction Ensemble Software

Contents
8.1 Principles
8.2 Workflows
8.3 Databases
8.4 Filters
8.5 Target Prediction Ensemble Analysis

8.1 Principles

Hundreds of microRNA sequences are now annotated throughout the human genome and are predicted to modulate the expression levels of thousands of mRNAs. Families of microRNA sequences can be identified through evolutionary conservation of sequence patterns in related species, and reciprocally, genes targeted by microRNAs are under selection pressure to retain the recognition sites needed for the annealing of microRNA to mRNA duplexes. Numerous regulatory target prediction algorithms have been developed to exploit these properties, but they vary widely in terms of criteria, accuracy and prediction coverage. Most prediction methods search for complementary sequences between microRNAs and putative gene targets, while some also consider physical and statistical hybridization properties or cross-conservation of regulatory RNAs between related species. As a result, there is very little overlap between the microRNA:mRNA annealing predicted by different algorithms. Researchers often ask themselves "which target prediction algorithm will predict the highest number of genes in my list?". Needless to say, this is a flawed approach, triggered by the absence of a unifying common theme in the prediction algorithms that would make them more accurate in their predictions. Rationally, if several algorithms can be developed that yield vastly different results, then we are missing an important variable in our equation. From this perspective, the existing prediction algorithms are not capable of simulating accurately and reproducibly what happens in Nature. Having said this, we were interested in finding out whether a single prediction algorithm or a superset of several prediction algorithms was best at predicting microRNA to mRNA annealing. During this analysis we stumbled upon an observation that might be a hint towards the identity of the missing variable in the prediction algorithm equation.

In order to determine which set of prediction algorithms was best at predicting microRNA:mRNA annealing, we needed experimentally validated data to help us separate the true positives from the false positives, which contribute greatly to the pool of predictions. We used exon array data and microRNA array data that were available to us from our GNS cell lines (Appendix E and F) to validate our findings. Since a large number of prediction algorithms are commonly used in research and they each predict a vast number of interactions, we built a tool called "GenemiR" that allows molecular biologists to access and manage microRNA predictions on a large scale and across different algorithms, and that would also help us address our research question. The GenemiR software exists as a command line binary executable for Unix-based systems or as a compiled local installation for Windows. Due to its length, the code of the binary executable could not be added to the appendix of this thesis, but it is available for anyone to view or download at www.ebi.ac.uk/~diva/GenemiR/genemir-code.txt. The executable for Windows can be downloaded at www.ebi.ac.uk/~diva/GenemiR/GenemiR.zip.

In order to address the need for unified search and presentation of microRNA to mRNA target interactions predicted across multiple algorithms, GenemiR relates human and mouse microRNAs with their predicted target genes as reported by eight leading predictors. At the core of the program are two primitives that allow it to be extremely fast in extracting lists of gene targets given a microRNA and, performing the reciprocal analysis, extracting lists of microRNAs predicted to repress one or more genes of interest (Fig 8.1). Many other auxiliary functions and filters allow the expert user to parametrize and optimise the analysis to make it as stringent or as loose as necessary. The target predictions in GenemiR’s database can be queried through a variety of selectable filters, allowing the user to include or exclude any combination of algorithms, group multiple microRNAs into the same query, and retrieve variable subsets of targeted genes (Fig 8.1). Finally, GenemiR allows the matching of prediction data to any gene expression dataset of interest, and provides a facility for loading external datasets determined by the user. Graphical plotting functions display target expression levels relative to the conditions of the experiment, such as tissue-specific transcriptional profiling or time course series. The software itself does not contain internal experimental data, but simply aggregates the output of prediction algorithms in a manner suitable to exploration and hypothesis testing, reducing the complexity of this approach by integrating microRNA and gene annotations, genome-wide expression data, and regulatory target predictions in a common analysis framework (Fig 8.1).

8.2 Workflows

The GenemiR software allows for the discovery of systemic or tissue-specific patterns that may be hidden in the microRNA targeting prediction data from eight leading algorithms. In order to do this, two flows of information are established that address reciprocal biological questions: "which genes are targeted by one or more specific microRNAs?" and "which microRNAs are predicted to target a given set of genes?". These issues are intimately related within the context of the same biological system, but are organised as two opposite flows of information in terms of program operation and layout (Fig 8.2). In the first workflow, a list of genes predicted to be targeted by a number of microRNAs is generated according to a user-defined selection of prediction algorithms (Workflow 1, figure 8.3). In the opposite workflow, a list of microRNAs is generated starting from a list of gene symbols or Genbank, RefSeq, EMBL, ENSG or ENST identifiers, according to the user-defined selection of prediction algorithms (Workflow 2, figure 8.3). Conversion from the identifiers contained originally in the prediction files is constantly active in both workflows, so that query results are always translated into a human-readable list of gene symbols.
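The two reciprocal workflows can be pictured as two lookups over the same set of predictions, one keyed by microRNA and one keyed by gene. The Python sketch below illustrates this idea only; it is not GenemiR's code, and the class name `PredictionIndex`, its methods and the `min_support` option (the minimum number of selected algorithms that must agree) are hypothetical. The microRNA and gene labels in any usage are placeholders.

```python
from collections import defaultdict

class PredictionIndex:
    """Bidirectional index over (algorithm, microRNA, gene) predictions."""

    def __init__(self):
        # microRNA -> gene -> set of algorithms predicting the pair
        self._targets = defaultdict(lambda: defaultdict(set))
        # gene -> microRNA -> set of algorithms predicting the pair
        self._regulators = defaultdict(lambda: defaultdict(set))

    def add(self, algorithm, mirna, gene):
        """Record that `algorithm` predicts `mirna` to target `gene`."""
        self._targets[mirna][gene].add(algorithm)
        self._regulators[gene][mirna].add(algorithm)

    def genes_for(self, mirnas, algorithms=None, min_support=1):
        """Workflow 1: genes predicted to be targeted by the given microRNAs."""
        return self._query(self._targets, mirnas, algorithms, min_support)

    def mirnas_for(self, genes, algorithms=None, min_support=1):
        """Workflow 2: microRNAs predicted to target the given genes."""
        return self._query(self._regulators, genes, algorithms, min_support)

    @staticmethod
    def _query(table, keys, algorithms, min_support):
        selected = None if algorithms is None else set(algorithms)
        hits = set()
        for key in keys:
            for partner, algos in table.get(key, {}).items():
                support = algos if selected is None else algos & selected
                if len(support) >= min_support:
                    hits.add(partner)
        return sorted(hits)
```

Storing each prediction in both directions trades memory for speed: either question is answered by a dictionary lookup rather than a scan over all predictions, and restricting the query to a subset of algorithms is a set intersection.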


Figure 8.1: Overview of GenemiR. (a) Queries based on various data types (the input section: microRNAs, expression data, genes, and identifiers) are executed against internal or external databases to generate other data types (the output section: genes, microRNAs, graphs and converted identifiers). The shaded boxes distinguish the core primitive functions from the auxiliary ones. (b) Relationships between databases used by GenemiR and the functions that operate on them. Solid lines depict databases of fixed size over the lifetime of the program’s execution, whereas the dashed lines indicate a dataset of variable dimensions according to the files that are inputted by the user.

8.3 Databases

All microRNAs and genes that could constitute a query are stored in the form of three internal databases:

1. Identifiers for all known human and mouse microRNAs;

2. Genbank, RefSeq [409], EMBL [252], ENSG, ENST [147] identifiers for all annotated human and mouse genes;


Figure 8.2: Internal organisation of the target prediction database of GenemiR. From top to bottom, the two conceptual workflows are consecutively displayed. Each of eight algorithms (circles) is represented by a different acronym: Diana-microT=DN-T; miRanda=MR; miRBase=MC; ElMMo=EM; PITA=PA; PicTar=PT5/6; TargetscanS=TSS. MicroRNAs (diamonds) and target genes (squares) are assigned random labels for illustrative purposes.

3. Gene symbols as determined by the HUGO Gene Nomenclature Committee for human [203] and the Mouse Genome Informatics (MGI) database resource for mouse [333].

All expression datasets that the user uploads into the software are referred to as "external databases" because they are not part of the internal databases: they are user-defined and variable in size. Gene annotation data are derived from the Ensembl database [147], and a utility script is provided with the program to allow automatic updating of files, via the Ensembl Perl API, to reflect the latest annotation release of the human and mouse genomes. The internal databases of human and mouse microRNAs and of gene identifiers are linked by the algorithms that predict which microRNAs regulate which genes and, vice versa, which genes are regulated by which microRNAs. These linked databases thus consist of the union of prediction results generated from the eight most widely used algorithms (considering PicTar 5-way and PicTar 6-way as separate algorithms) (Table 8.1). Thanks to the modular internal structure with which GenemiR was designed, this number can be increased as new algorithms are produced with new sets of predictions. The algorithms currently included are:

1. PicTar [127,247] identifies putative microRNA targets in vertebrates, C. elegans and Drosophila, using the principle of cross-conservation. The two versions, PicTar 5-way and PicTar 6-way, refer to the number of species in the comparison: 5-way includes the human, chimp, mouse, rat and dog genomes, and 6-way adds the chicken genome. PicTar employs pair-wise alignments to filter seed sequences conserved in 7 species of Drosophila and 8 species of vertebrates. In addition to sequence conservation, PicTar considers secondary evidence, such as co-expression and clustering of microRNAs, as well as target identification based on ontological parameters, such as expression in common cell types or developmental stages. Importantly, it also takes into account the number of seeds occurring within the same 3'UTR. The false-positive rate for this algorithm is estimated to be approximately 30% [48].

Figure 8.3: Workflows at the core of the primitive functions of the GenemiR software. The first workflow answers the question: "which microRNAs target this list of genes?" and returns a list of microRNAs filtered according to the Minimum or Cumulative filter or both. The second workflow answers the question: "which genes are targeted by this list of microRNAs?" and returns a list of genes filtered according to the settings of the Minimum or Cumulative filter or both.

2. TargetScanS is the second version of the mammalian microRNA target prediction algorithm developed by Lewis and colleagues following the publication of their original TargetScan approach [272,273]. The earlier program searched for a 7nt-long seed region starting from the second nucleotide of the 5' end of the microRNA sequence, calculated the free energy of hybridization using RNAfold [57], and computed a score based on the presence of multiple seeds capable of annealing to the same microRNA. The criteria used by TargetScanS are more specific, requiring a 6nt-long seed sequence preceded by an adenosine, positioned in a conserved region that, in turn, is surrounded by areas of lesser conservation. TargetScanS omits the use of free-energy calculations to predict hybridization affinity, and is estimated to have a false-positive rate of between 22 and 31% [48].

3. DIANA-microT [237,316] version 3.0 recognises only single seed sequences having a central stem-loop secondary structure, instead of selecting for near-perfect complementarity between the 5' seed region of the microRNA and the 3'UTR of the target message. In addition to searching for the classical annealing pattern in the 5' seed region, DIANA-microT also requires complementarity at the 3' end of the microRNA sequence [48].

4. miRanda [134,213] identifies putative targets throughout the human and Drosophila genomes. It implements a dynamic programming approach to compute local alignments of complementary microRNA and mRNA sequences. The algorithm assigns scores to putative seed sequences using a position matrix, giving a higher weight to those closer to the 5' end of the microRNA sequence. However, it is not constrained by the requirement for exact seed matches. The program takes into consideration free-energy annealing calculations (via RNAfold), and seed-sequence conservation. To account for tandem annealing of microRNAs to multiple consensus sequences in the targeted transcript, miRanda assigns a high score to alignment matches to the same as well as different microRNAs.

5. ElMMo [155] uses a general Bayesian method that scores the conservation of microRNA binding sites according to an evolutionary model. Bayesian methods start from the prior probability of an observation and update it as new data become available. This model assumes a phylogenetic relationship among several species.

6. miRBase [179] uses the miRanda algorithm to identify potential binding sites of a microRNA. Dynamic programming alignment is used to identify highly complementary sites that also require strict complementarity at the 5' seed region and thermodynamic stability, which is estimated for each target site. For inclusion in the database, conservation of the target site at the exact same position in at least two species is needed.

7. PITA [224] takes into consideration the strength of microRNA repression given target site accessibility. For each target site, it calculates an energy-based measure representing the difference between the free energy gained by the binding of the microRNA to the target and the free energy lost by un-pairing the nucleotides within the target site. The energy used to un-pair additional nucleotides flanking the target site is also taken into account.
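
Several of the algorithms listed above begin by locating seed matches in a 3'UTR. The following Python sketch shows only that shared first step, under simplifying assumptions: a perfect Watson-Crick match to a 7nt seed starting at the second nucleotide of the microRNA, with no conservation, free-energy or accessibility scoring. It does not reproduce any of the published tools; the function names and the toy 3'UTR are hypothetical, and the let-7a sequence is used purely as an example input.

```python
# RNA complement table: A<->U, C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=1, length=7):
    """Return the 3'UTR motif (5'->3') complementary to the microRNA seed.

    With the defaults the seed spans nucleotides 2-8 of the microRNA.
    """
    seed = mirna[start:start + length]
    # Reverse complement, because the two strands pair antiparallel.
    return seed.translate(COMPLEMENT)[::-1]


def count_seed_sites(mirna, utr, start=1, length=7):
    """Count perfect seed matches (overlaps allowed) in a 3'UTR given as RNA."""
    site = seed_site(mirna, start, length)
    return sum(utr.startswith(site, i) for i in range(len(utr) - len(site) + 1))


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"            # example microRNA sequence
TOY_UTR = "AAACUACCUCGGGCUACCUCAUU"          # hypothetical 3'UTR fragment

print(seed_site(LET7A))                      # CUACCUC
print(count_seed_sites(LET7A, TOY_UTR))      # 2
```

Counting sites rather than merely detecting one reflects the point made for PicTar and TargetScan that multiple seeds within the same 3'UTR contribute to the score.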

Table 8.1: microRNA target prediction algorithms used by GenemiR with number of microRNA:3'UTR interactions predicted. The original target identifiers refer to the identifiers used by a prediction algorithm to identify the targeted genes. The final target identifiers refer to the identifiers that are returned by any query of any prediction algorithm database.

| Prediction algorithm | Number of microRNAs | Number of targets | Original target identifiers | Final target identifiers |
|---|---|---|---|---|
| PITA | 678 | 22,974 | RefSeq | HGNC/MGI |
| PicTar 5-way | 178 | 9,334 | RefSeq | HGNC/MGI |
| PicTar 6-way | 130 | 3,585 | RefSeq | HGNC/MGI |
| DIANA-microT | 555 | 18,986 | ENSG/ENSMUSG | HGNC/MGI |
| ElMMo | 1,206 | 31,303 | RefSeq, EMBL | HGNC/MGI |
| miRanda | 1,100 | 32,641 | EMBL | HGNC/MGI |
| Microcosm | 694 | 34,507 | ENST/ENSMUST | HGNC/MGI |
| TargetscanS | 967 | 17,725 | HGNC/MGI | HGNC/MGI |

8.4 Filters

Critical to the successful outcome of user queries is the appropriate use of the built-in filtering functions. The purpose of the filter auxiliary functions (Fig 8.3) is to allow the user to specify an appropriate range of return results from each query, depending on either the number of different microRNAs expected to play a role in gene regulation (adjusted according to the Minimum and Cumulative filters), or the level of concordance the user enforces between the selected prediction algorithms. The user can at all times select a desired subset of algorithms upon which prediction results will be based, and the two types of filters available, Minimum and Cumulative, are applied to the prediction results of those algorithms.

The consequence of setting the Minimum filter to a given value depends on the type of query the user makes. If the query starts with a list of microRNAs, and therefore asks which genes are targeted by these microRNAs according to a number n of prediction algorithms, then setting the Minimum filter to a given value returns the set of genes targeted by at least that number of microRNAs, as reported by each of the selected algorithms. For example, if the inputted list contains m = 10 microRNAs and n = 3 prediction algorithms are selected, then the Minimum filter ranges from one to 10, and setting it to four causes it to report only those genes that are targeted by at least four microRNAs as predicted by each of the three algorithms alone. If the query were to start from a list containing g = 30 genes with n = 3 prediction algorithms selected, then the Minimum filter would range from one to 30, and setting it to 24 would cause it to report only those microRNAs that target at least 24 genes as predicted by each of the prediction algorithms alone (Fig 8.3). The number of results returned from the query therefore depends greatly on the prediction algorithms selected.

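
The behaviour of the Minimum filter for a microRNA-to-gene query can be sketched as follows. This is a hedged illustration rather than GenemiR's implementation: `minimum_filter`, its arguments and the toy predictions are hypothetical, and each algorithm's output is assumed to be a set of (microRNA, gene) pairs.

```python
def minimum_filter(predictions, mirnas, algorithms, threshold):
    """Return genes targeted by at least `threshold` query microRNAs
    according to EVERY selected algorithm taken on its own."""
    kept = None
    for algo in algorithms:
        # Count, per gene, how many query microRNAs this algorithm predicts.
        counts = {}
        for mirna, gene in predictions[algo]:
            if mirna in mirnas:
                counts[gene] = counts.get(gene, 0) + 1
        passing = {gene for gene, n in counts.items() if n >= threshold}
        # A gene must pass in all algorithms, hence the intersection.
        kept = passing if kept is None else kept & passing
    return kept or set()


# Hypothetical toy predictions for two algorithms.
toy = {
    "algo1": {("m1", "g1"), ("m2", "g1"), ("m1", "g2")},
    "algo2": {("m1", "g1"), ("m2", "g1"), ("m1", "g2"), ("m2", "g2")},
}
print(minimum_filter(toy, {"m1", "m2"}, ["algo1", "algo2"], threshold=2))
```

In this toy case only g1 survives a threshold of two, because algo1 predicts a single microRNA for g2. The gene-to-microRNA direction would follow the same logic with the roles of the two elements of each pair swapped.
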
The Cumulative filter acts in an alternative fashion: a gene is reported if the total number of microRNA predictions, summed over all of the chosen algorithms, is at least equal to the minimum value set by the user. This removes the requirement that every algorithm independently predict a set number of microRNA interactions; instead, the aggregate sum of all results is considered in determining which target genes to report. Using the Cumulative filter is a less stringent approach and, all things being equal, returns a higher number of results. For example, if the inputted list contains m = 10 microRNAs and n = 3 prediction algorithms are selected, then the Cumulative filter ranges from one to 10 × 3 = 30, and setting it to 20 causes it to report the genes for which at least 20 microRNA predictions are made across any combination of the selected algorithms, without the requirement that any particular algorithm predict a set threshold of microRNAs. If the query were to start from a list containing g = 30 genes with n = 3 prediction algorithms selected, then the Cumulative filter would range from one to 30 × 3 = 90, and setting it to 56 would cause it to report only those microRNAs with at least 56 gene predictions across any combination of the prediction algorithms (Fig 8.3).

While the Minimum and Cumulative filters operate on microRNAs, the possibility of selecting a subset of the eight prediction algorithms affords the user control over the number of algorithms required to report a given prediction, and thus over the criteria by which predictions are deemed accurate. For example, certain applications, such as experimental validation and cloning of microRNAs involved in particular pathways of interest, involve a significant amount of effort. If this is the ultimate goal of the investigation, then it is desirable to focus on a small set of high-scoring microRNA targets unanimously reported by several algorithms. Other searches, such as those performed for comparative genomic analyses, are limited only by computational feasibility and can therefore afford to include a greater number of putative regulators whose involvement may be predicted by only one or two algorithms.

Any combination of the above filtering methods is possible, keeping in mind that (i) they are always applied to the original set of data, and (ii) a given gene is included in the final results only if it passes the criteria imposed by each individual filter set by the user. Therefore, the utility of applying different filter combinations depends on the relevance of the biological question they help to answer.
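
The Cumulative filter described in this section can be sketched in the same spirit. As before, this is an assumption-laden illustration and not GenemiR's code: `cumulative_filter` and the toy data are hypothetical, and predictions are assumed to be sets of (microRNA, gene) pairs per algorithm.

```python
def cumulative_filter(predictions, mirnas, algorithms, threshold):
    """Return genes whose microRNA predictions, summed over all selected
    algorithms, reach `threshold` (maximum possible: m microRNAs x n algorithms)."""
    totals = {}
    for algo in algorithms:
        for mirna, gene in predictions[algo]:
            if mirna in mirnas:
                totals[gene] = totals.get(gene, 0) + 1
    return {gene for gene, total in totals.items() if total >= threshold}


# Hypothetical toy predictions: m = 2 microRNAs, n = 2 algorithms, so the
# filter can range from 1 to 2 x 2 = 4.
toy = {
    "algo1": {("m1", "g1"), ("m2", "g1"), ("m1", "g2")},
    "algo2": {("m1", "g1"), ("m2", "g1"), ("m1", "g2"), ("m2", "g2")},
}
print(cumulative_filter(toy, {"m1", "m2"}, ["algo1", "algo2"], threshold=3))
```

Here g2 passes a cumulative threshold of three even though one algorithm predicts only a single microRNA for it, which illustrates why this filter is the less stringent of the two.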

8.5 Target Prediction Ensemble Analysis

A question we wanted to ask was whether any single prediction algorithm fared better than combinations of prediction algorithms. Since the divergence between prediction results from different algorithms is large, establishing whether any combination of a subset of these algorithms fares better than any algorithm alone is a necessary step. Using our software tool GenemiR we tried to elucidate how different combinations of prediction algorithms fared with respect to single algorithms, and thus address the hypothesis suggested by Alexiou et al [16] that combinations of algorithms predict more accurately than a single algorithm alone. We addressed this problem using relevant experimental data from exon arrays and microRNA microarrays generated from the same GNS cell lines for which we had digital gene expression profiles. With this set of data we could link the down-regulation of expression observed for the 2,290 genes at FDR < 1% as measured in the exon arrays, to the up-regulation of expression observed for the 258 microRNAs at FDR < 1% as measured in the microRNA microarrays.

Our first interesting finding was that the final score of our analysis, representing how well a given subset of algorithms fared when asked to predict which microRNAs targeted the list of 2,290 down-regulated genes, worsened considerably if an initial filter was not applied that made our analysis tissue-specific. The ensemble analysis thus hinted, as mentioned earlier, at the identity of a variable missing from prediction algorithms: tissue specificity. So far prediction algorithms have not been designed with tissue specificity in mind. However, microRNA regulation is highly tissue-specific, and this has to be factored in when designing an algorithm that will accurately predict the interaction between a microRNA and an mRNA. Current prediction algorithms consider different cohorts of variables that all refer to one main concept: sequence complementarity. Secondary considerations are factors such as secondary structure and the number of microRNA seeds in a 3'UTR. There is no easy way to implement tissue-specificity variables within a target prediction algorithm because of the empirical nature of such variables. We have yet to understand what exactly triggers tissue specificity in microRNA regulation of mRNAs, and any tuning implemented by epigenetic mechanisms or feedback regulatory loops may still be too computationally intensive to simulate. One way to work around this problem at present is to filter the target predictions by a background gene list that removes most non tissue-specific predictions. Of course, this strategy will only yield partially accurate results, but it can improve the use of prediction results at the present time. As a result of this understanding, we used a background gene list ourselves in order to minimise the number of false positives caused by the brain tissue-specificity of the query.
The background gene list was retrieved from the exon array dataset and consisted of all 6,254 genes with an expression level that was measured at FDR < 1%. Running the algorithm without this background gene list worsened the final score tenfold.

The method we used to evaluate the performance and accuracy of each prediction algorithm in the GenemiR internal database is described below and summarised in Figure 8.4. We first ran the method on the predictions from each algorithm alone and then extended it to all combinations of prediction algorithm results. The score E, reflecting the accuracy of each prediction or combination of predictions, ranges between 0 and 1 (0 ≤ E ≤ 1), with 1 reflecting a perfectly matching set of predictions between the algorithm being evaluated and our experimental data. The score is calculated following these successive steps:

1. filter out, for each prediction algorithm and for each microRNA experimentally observed to be up-regulated, the genes that are predicted to be targeted by that microRNA but are not present in the background gene list;

2. generate a union list from the genes filtered in step 1 for the entire cohort of up-regulated microRNAs, for each prediction algorithm;

3. iterate over each gene in the union list and increase a cumulative counter every time the gene is predicted to be targeted by one of the up-regulated microRNAs, for each prediction algorithm. This counter is normalised to the number of microRNAs that the prediction contains from the list of 258 experimentally measured microRNAs;

4. sort the union gene list by the count calculated in step 3 to generate a "hit" list, representative of each prediction algorithm;

5. iterate over the hit list summing in a cumulative counter C1 the counts associated with all the genes in the list, which have been predicted by the prediction algorithm;

6. iterate over the hit list summing in a cumulative counter C2 the counts associated with the genes that are experimentally observed to be down-regulated in our exon array data;

7. generate a score value by dividing the cumulative counter C2 by C1 (E = C2/C1), representing a measure of how well a prediction algorithm fared based on how many genes it correctly predicted to be targeted in our GNS cell lines, considering the same cell line expression profile as a background for calculations.
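
Steps 1 to 7 can be summarised in a short Python sketch. It is one possible reading of the procedure, not the code used for the analysis: the function name, argument names and toy data are hypothetical, predictions are assumed to be a set of (microRNA, gene) pairs, and the normalisation in step 3 is assumed to be the number of up-regulated microRNAs that appear anywhere in the prediction set.

```python
def e_score(predictions, up_mirnas, background, down_genes):
    """Return (E, hit_list) for one prediction set.

    predictions: set of (microRNA, gene) pairs; for a combination of
                 algorithms, pass the union of their pair sets.
    """
    # Step 1: keep pairs whose microRNA is up-regulated and whose gene
    # is present in the background gene list.
    pairs = {(m, g) for m, g in predictions if m in up_mirnas and g in background}
    if not pairs:
        return 0.0, []

    # Number of up-regulated microRNAs covered by this prediction set,
    # used to normalise the per-gene counts (assumed reading of step 3).
    covered = len({m for m, _ in predictions if m in up_mirnas})

    # Steps 2-3: one normalised count per gene of the union list.
    counts = {}
    for _, gene in pairs:
        counts[gene] = counts.get(gene, 0) + 1
    weights = {gene: n / covered for gene, n in counts.items()}

    # Step 4: the hit list, highest weight first (ties broken by gene name).
    hit_list = sorted(weights.items(), key=lambda item: (-item[1], item[0]))

    # Steps 5-7: C1 sums every weight, C2 only those of down-regulated genes.
    c1 = sum(weight for _, weight in hit_list)
    c2 = sum(weight for gene, weight in hit_list if gene in down_genes)
    return c2 / c1, hit_list


# Hypothetical toy input: two up-regulated microRNAs, three background genes.
toy_predictions = {("m1", "g1"), ("m2", "g1"), ("m1", "g2"), ("m3", "g3"), ("m1", "g4")}
score, hits = e_score(toy_predictions, {"m1", "m2"}, {"g1", "g2", "g3"}, {"g1"})
print(round(score, 4), hits)
```

In the toy input, g1 is predicted for both up-regulated microRNAs (weight 1.0) and g2 for one (weight 0.5), so C1 = 1.5, C2 = 1.0 and E = 2/3. Because the same normalisation constant multiplies C1 and C2, it changes the hit-list weights but cancels in the ratio for any single prediction set.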

Scores for each single prediction algorithm are listed in Table 8.2 below. To evaluate whether combinations of different algorithms were more accurate at predicting than single algorithms, we performed the same steps 1-7 but with lists that resulted from the union of the genes predicted by all the user-defined prediction algorithms (the varying size of the predicted gene set in Figure 8.4). The results are listed in Table 8.3.

The purpose of the hit list is to "weight" the importance of the predicted genes. For example, if gene A has been predicted to be targeted by only nine of the 258 microRNAs, its weight when added to the cumulative score C1 will be only 9/258th of the weight added by gene B, if gene B were predicted to be targeted by all 258 microRNAs. This scoring system takes into consideration the possibility that not all prediction algorithms necessarily predict targets for all of the 258 up-regulated microRNAs, by always measuring the accuracy of a prediction set over the number of genes of the background gene list that are predicted by a particular algorithm.

Figure 8.4: Step by step diagram of the ensemble method adopted to find the score E (=C2/C1) of prediction accuracy for prediction algorithms. The method was applied to the predictions from all combinations of target prediction algorithms and an E-score was generated for each one. The red set of gene predictions varied in size depending on how many and which algorithms were being considered for a particular round.

Table 8.2: Single prediction algorithm ensemble analysis results. Displayed in descending order of E-score.

| Prediction name | E-score |
|---|---|
| ElMMo | 0.3181 |
| Diana-microT | 0.3122 |
| PITA | 0.3088 |
| TargetscanS | 0.3073 |
| PicTar 6 | 0.3020 |
| miRBase | 0.3008 |
| PicTar 5 | 0.2989 |
| miRanda | 0.2979 |

This analysis highlights the fact that very little improvement, in the order of thousandths, is achieved by combining prediction algorithms. The highest scoring single algorithms are ElMMo and Diana-microT. When used in combination with other prediction algorithms, the E-score of ElMMo always decreases, while the E-score of Diana-microT, in combination with one or two other prediction algorithms, seems to increase slightly. The analysis reveals that the best performing combination of algorithms does not outperform the best scoring single algorithm, ElMMo. However, this initial analysis should be followed by a series of other analyses in which other approaches to the evaluation of the score are taken, and the Minimum and Cumulative filters are taken into consideration as well. This approach would increase the number of combinations from the current 256 (2^8), because each combination of filters would have to be evaluated for all the 258 microRNAs, and would therefore require a global energy combinatorial optimisation algorithm such as a genetic algorithm, whereby a search that mimics the process of natural evolution is performed on a population of candidate solutions that evolve towards the best solution starting from a randomly selected population.
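
The size of the search space mentioned above can be checked with a few lines of Python. This is only a counting sketch: the list of names follows the labels used in Table 8.3, and `all_combinations` is a hypothetical helper. Note that 2^8 = 256 counts the empty selection as well, so there are 255 non-empty combinations to score, which matches the number of rows in Table 8.3.

```python
from itertools import combinations

ALGORITHMS = [
    "Diana-microT", "ElMMo", "miRBase", "miRanda",
    "PITA", "PicTar5", "PicTar6", "TargetscanS",
]


def all_combinations(names):
    """Yield every non-empty subset of `names`, smallest subsets first."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


combos = list(all_combinations(ALGORITHMS))
print(len(combos))          # 255 non-empty subsets
print(2 ** len(ALGORITHMS)) # 256 when the empty subset is included
```

An exhaustive loop over these subsets is cheap; it is only once filter settings are added as further dimensions that the space grows enough to motivate a heuristic search such as the genetic algorithm suggested in the text.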


Table 8.3: All combinations of prediction algorithms in descending order of E-score. The E-scores for the single algorithms are also present and highlighted along the list in bold characters.

Prediction algorithms involved in the combination set E-score ElMMo 0.3181 ElMMo PicTar6 0.3173 ElMMo PicTar5 0.3162 ElMMo miRBase 0.3160 Diana-microT ElMMo 0.3159 Diana-microT ElMMo PicTar6 0.3155 ElMMo PicTar5 PicTar6 0.3155 ElMMo miRBase PicTar6 0.3154 Diana-microT ElMMo PicTar5 0.3148 Diana-microT ElMMo miRBase 0.3147 ElMMo miRBase PicTar5 0.3145 Diana-microT ElMMo PicTar5 PicTar6 0.3144 ElMMo TargetscanS 0.3144 Diana-microT ElMMo miRBase PicTar6 0.3143 ElMMo miRBase PicTar5 PicTar6 0.3139 ElMMo PicTar6 TargetscanS 0.3139 Diana-microT ElMMo TargetscanS 0.3138 Diana-microT ElMMo miRBase PicTar5 0.3137 Diana-microT ElMMo PicTar6 TargetscanS 0.3135 Diana-microT ElMMo miRBase PicTar5 PicTar6 0.3134 ElMMo PicTar5 TargetscanS 0.3133 ElMMo miRBase TargetscanS 0.3132 Diana-microT ElMMo miRBase TargetscanS 0.3130 Diana-microT ElMMo PicTar5 TargetscanS 0.3130 ElMMo miRBase PicTar6 TargetscanS 0.3129 ElMMo PicTar5 PicTar6 TargetscanS 0.3129 Diana-microT ElMMo miRBase PicTar6 TargetscanS 0.3127 Diana-microT ElMMo PicTar5 PicTar6 TargetscanS 0.3127 ElMMo miRBase PicTar5 TargetscanS 0.3123 Diana-microT ElMMo miRBase PicTar5 TargetscanS 0.3123 Diana-microT 0.3122 ElMMo PITA 0.3122 Diana-microT ElMMo PITA 0.3122 Diana-microT ElMMo miRBase PicTar5 PicTar6 TargetscanS 0.3121 ElMMo PITA PicTar6 0.3120 Diana-microT ElMMo PITA PicTar6 0.3120 ElMMo miRBase PicTar5 PicTar6 TargetscanS 0.3120 Diana-microT ElMMo miRBase PITA 0.3118 Diana-microT ElMMo PITA PicTar5 0.3118 ElMMo miRBase PITA 0.3117 ElMMo PITA PicTar5 0.3117 Diana-microT ElMMo miRBase PITA PicTar6 0.3116 Diana-microT ElMMo PITA PicTar5 PicTar6 0.3116 ElMMo miRBase PITA PicTar6 0.3115 ElMMo PITA PicTar5 PicTar6 0.3115 Diana-microT ElMMo PITA TargetscanS 0.3115 ElMMo PITA TargetscanS 0.3114 Diana-microT ElMMo PITA PicTar6 TargetscanS 0.3114 Diana-microT ElMMo miRBase PITA PicTar5 0.3113 Diana-microT PicTar6 0.3113 ElMMo PITA PicTar6 TargetscanS 0.3113 ElMMo miRBase PITA PicTar5 0.3112 Diana-microT ElMMo miRBase PITA PicTar5 
PicTar6 0.3112 Diana-microT ElMMo miRBase PITA TargetscanS 0.3112 Diana-microT ElMMo PITA PicTar5 TargetscanS 0.3112 ElMMo miRBase PITA PicTar5 PicTar6 0.3110 ElMMo miRBase PITA TargetscanS 0.3110 ElMMo PITA PicTar5 TargetscanS 0.3110 Diana-microT ElMMo miRBase PITA PicTar6 TargetscanS 0.3110 Diana-microT ElMMo PITA PicTar5 PicTar6 TargetscanS 0.3110 ElMMo PITA PicTar5 PicTar6 TargetscanS 0.3109 Diana-microT ElMMo miRBase PITA PicTar5 TargetscanS 0.3108 ElMMo miRBase PITA PicTar6 TargetscanS 0.3108 Diana-microT ElMMo miRBase PITA PicTar5 PicTar6 TargetscanS 0.3107 Diana-microT ElMMo miRanda 0.3106 ElMMo miRBase PITA PicTar5 TargetscanS 0.3106 ElMMo miRBase PITA PicTar5 PicTar6 TargetscanS 0.3105 Diana-microT ElMMo miRanda PicTar6 0.3104 Diana-microT miRBase 0.3100 ElMMo miRanda 0.3100 Diana-microT ElMMo miRBase miRanda 0.3100 Diana-microT PicTar5 0.3100


Prediction algorithms involved in the combination set E-score Diana-microT ElMMo miRanda PicTar5 0.3100 Diana-microT ElMMo miRanda TargetscanS 0.3099 Diana-microT ElMMo miRanda PITA 0.3098 Diana-microT ElMMo miRBase miRanda PicTar6 0.3098 Diana-microT ElMMo miRanda PicTar5 PicTar6 0.3098 Diana-microT TargetscanS 0.3098 Diana-microT ElMMo miRanda PicTar6 TargetscanS 0.3098 Diana-microT PITA 0.3097 ElMMo miRanda PicTar6 0.3097 Diana-microT ElMMo miRanda PITA PicTar6 0.3097 Diana-microT ElMMo miRBase miRanda PITA 0.3095 Diana-microT ElMMo miRBase miRanda PicTar5 0.3095 Diana-microT ElMMo miRanda PITA PicTar5 0.3095 Diana-microT PITA PicTar6 0.3095 Diana-microT ElMMo miRBase miRanda TargetscanS 0.3095 Diana-microT ElMMo miRanda PITA TargetscanS 0.3095 Diana-microT ElMMo miRanda PicTar5 TargetscanS 0.3095 Diana-microT PicTar6 TargetscanS 0.3095 ElMMo miRanda PITA 0.3094 Diana-microT miRBase PicTar6 0.3094 Diana-microT ElMMo miRBase miRanda PITA PicTar6 0.3094 Diana-microT PicTar5 PicTar6 0.3094 Diana-microT ElMMo miRanda PITA PicTar5 PicTar6 0.3094 Diana-microT ElMMo miRBase miRanda PicTar6 TargetscanS 0.3094 Diana-microT ElMMo miRanda PITA PicTar6 TargetscanS 0.3094 Diana-microT ElMMo miRanda PicTar5 PicTar6 TargetscanS 0.3094 ElMMo miRBase miRanda 0.3093 ElMMo miRanda PicTar5 0.3093 ElMMo miRanda PITA PicTar6 0.3093 Diana-microT ElMMo miRBase miRanda PicTar5 PicTar6 0.3093 ElMMo miRanda TargetscanS 0.3093 Diana-microT miRBase PITA 0.3092 Diana-microT PITA PicTar5 0.3092 Diana-microT ElMMo miRBase miRanda PITA PicTar5 0.3092 Diana-microT PITA TargetscanS 0.3092 Diana-microT ElMMo miRBase miRanda PITA TargetscanS 0.3092 Diana-microT ElMMo miRanda PITA PicTar5 TargetscanS 0.3092 ElMMo miRanda PicTar6 TargetscanS 0.3092 Diana-microT ElMMo miRBase miRanda PITA PicTar6 TargetscanS 0.3092 Diana-microT ElMMo miRanda PITA PicTar5 PicTar6 TargetscanS 0.3092 ElMMo miRBase miRanda PITA 0.3091 ElMMo miRanda PITA PicTar5 0.3091 ElMMo miRBase miRanda PicTar6 0.3091 ElMMo miRanda 
PicTar5 PicTar6 0.3091 Diana-microT ElMMo miRBase miRanda PITA PicTar5 PicTar6 0.3091 ElMMo miRanda PITA TargetscanS 0.3091 Diana-microT ElMMo miRBase miRanda PicTar5 TargetscanS 0.3091 Diana-microT PITA PicTar6 TargetscanS 0.3091 Diana-microT miRBase PITA PicTar6 0.3090 ElMMo miRBase miRanda PITA PicTar6 0.3090 Diana-microT PITA PicTar5 PicTar6 0.3090 ElMMo miRanda PITA PicTar5 PicTar6 0.3090 Diana-microT ElMMo miRBase miRanda PITA PicTar5 TargetscanS 0.3090 ElMMo miRanda PITA PicTar6 TargetscanS 0.3090 Diana-microT ElMMo miRBase miRanda PicTar5 PicTar6 TargetscanS 0.3090 Diana-microT ElMMo miRBase miRanda PITA PicTar5 PicTar6 TargetscanS 0.3089 PITA 0.3088 ElMMo miRBase miRanda PITA PicTar5 0.3088 Diana-microT miRBase TargetscanS 0.3088 ElMMo miRBase miRanda TargetscanS 0.3088 Diana-microT miRBase PITA TargetscanS 0.3088 ElMMo miRBase miRanda PITA TargetscanS 0.3088 Diana-microT PicTar5 TargetscanS 0.3088 ElMMo miRanda PicTar5 TargetscanS 0.3088 Diana-microT PITA PicTar5 TargetscanS 0.3088 ElMMo miRanda PITA PicTar5 TargetscanS 0.3088 ElMMo miRBase miRanda PicTar5 0.3087 Diana-microT miRBase PITA PicTar5 0.3087 ElMMo miRBase miRanda PITA PicTar5 PicTar6 0.3087 ElMMo miRBase miRanda PicTar6 TargetscanS 0.3087 Diana-microT miRBase PITA PicTar6 TargetscanS 0.3087 ElMMo miRBase miRanda PITA PicTar6 TargetscanS 0.3087 ElMMo miRanda PicTar5 PicTar6 TargetscanS 0.3087 Diana-microT PITA PicTar5 PicTar6 TargetscanS 0.3087


Prediction algorithms involved in the combination set E-score ElMMo miRanda PITA PicTar5 PicTar6 TargetscanS 0.3087 PITA PicTar6 0.3086 Diana-microT miRBase PITA PicTar5 PicTar6 0.3086 ElMMo miRBase miRanda PITA PicTar5 TargetscanS 0.3086 Diana-microT miRBase PicTar5 0.3085 ElMMo miRBase miRanda PicTar5 PicTar6 0.3085 PITA TargetscanS 0.3085 Diana-microT miRBase PITA PicTar5 TargetscanS 0.3085 Diana-microT miRBase PicTar6 TargetscanS 0.3085 Diana-microT PicTar5 PicTar6 TargetscanS 0.3085 ElMMo miRBase miRanda PITA PicTar5 PicTar6 TargetscanS 0.3085 ElMMo miRBase miRanda PicTar5 TargetscanS 0.3084 PITA PicTar6 TargetscanS 0.3083 Diana-microT miRBase PITA PicTar5 PicTar6 TargetscanS 0.3083 miRBase PITA 0.3082 PITA PicTar5 0.3082 ElMMo miRBase miRanda PicTar5 PicTar6 TargetscanS 0.3082 miRBase PITA PicTar6 0.3081 Diana-microT miRBase PicTar5 PicTar6 0.3080 PITA PicTar5 PicTar6 0.3080 miRBase PITA TargetscanS 0.3080 Diana-microT miRBase PicTar5 TargetscanS 0.3080 PITA PicTar5 TargetscanS 0.3080 miRBase PITA PicTar6 TargetscanS 0.3079 PITA PicTar5 PicTar6 TargetscanS 0.3079 Diana-microT miRBase PicTar5 PicTar6 TargetscanS 0.3078 miRBase PITA PicTar5 0.3077 miRBase PITA PicTar5 TargetscanS 0.3076 miRBase PITA PicTar5 PicTar6 0.3075 miRBase PITA PicTar5 PicTar6 TargetscanS 0.3075 TargetscanS 0.3073 Diana-microT miRanda PITA TargetscanS 0.3071 Diana-microT miRanda PITA 0.3070 Diana-microT miRanda PITA PicTar6 TargetscanS 0.3070 Diana-microT miRanda PITA PicTar6 0.3069 Diana-microT miRBase miRanda PITA TargetscanS 0.3068 Diana-microT miRanda PITA PicTar5 TargetscanS 0.3068 PicTar6 TargetscanS 0.3068 Diana-microT miRBase miRanda PITA 0.3067 Diana-microT miRanda PITA PicTar5 0.3067 Diana-microT miRBase miRanda PITA PicTar6 0.3067 Diana-microT miRBase miRanda PITA PicTar6 TargetscanS 0.3067 Diana-microT miRanda PITA PicTar5 PicTar6 TargetscanS 0.3067 Diana-microT miRanda PITA PicTar5 PicTar6 0.3066 Diana-microT miRBase miRanda PITA PicTar5 TargetscanS 0.3066 Diana-microT 
miRBase miRanda PITA PicTar5 0.3065 Diana-microT miRBase miRanda PITA PicTar5 PicTar6 TargetscanS 0.3065 Diana-microT miRBase miRanda PITA PicTar5 PicTar6 0.3064 miRanda PITA TargetscanS 0.3060 miRanda PITA PicTar6 TargetscanS 0.3060 miRBase TargetscanS 0.3059 miRanda PITA 0.3058 miRBase miRanda PITA TargetscanS 0.3058 PicTar5 TargetscanS 0.3058 miRanda PITA PicTar5 TargetscanS 0.3058 miRanda PITA PicTar6 0.3057 miRBase miRanda PITA PicTar6 TargetscanS 0.3057 miRanda PITA PicTar5 PicTar6 TargetscanS 0.3057 miRBase miRanda PITA PicTar5 TargetscanS 0.3056 miRBase PicTar6 TargetscanS 0.3056 miRBase miRanda PITA 0.3055 miRanda PITA PicTar5 0.3055 PicTar5 PicTar6 TargetscanS 0.3055 miRBase miRanda PITA PicTar5 PicTar6 TargetscanS 0.3055 miRBase miRanda PITA PicTar6 0.3054 miRanda PITA PicTar5 PicTar6 0.3054 Diana-microT miRanda TargetscanS 0.3053 miRBase miRanda PITA PicTar5 0.3052 miRBase miRanda PITA PicTar5 PicTar6 0.3052 Diana-microT miRanda PicTar6 TargetscanS 0.3052 Diana-microT miRBase miRanda TargetscanS 0.3050 miRBase PicTar5 TargetscanS 0.3049 Diana-microT miRanda PicTar5 TargetscanS 0.3049 Diana-microT miRBase miRanda PicTar6 TargetscanS 0.3049


| Prediction algorithms involved in the combination set | E-score |
|---|---|
| Diana-microT miRanda PicTar5 PicTar6 TargetscanS | 0.3049 |
| Diana-microT miRBase miRanda PicTar5 TargetscanS | 0.3047 |
| miRBase PicTar5 PicTar6 TargetscanS | 0.3047 |
| Diana-microT miRBase miRanda PicTar5 PicTar6 TargetscanS | 0.3046 |
| Diana-microT miRanda | 0.3045 |
| Diana-microT miRanda PicTar6 | 0.3044 |
| Diana-microT miRBase miRanda | 0.3041 |
| Diana-microT miRanda PicTar5 | 0.3040 |
| Diana-microT miRBase miRanda PicTar6 | 0.3040 |
| Diana-microT miRanda PicTar5 PicTar6 | 0.3039 |
| Diana-microT miRBase miRanda PicTar5 | 0.3037 |
| Diana-microT miRBase miRanda PicTar5 PicTar6 | 0.3037 |
| PicTar6 | 0.3020 |
| miRanda TargetscanS | 0.3020 |
| miRanda PicTar6 TargetscanS | 0.3020 |
| miRBase miRanda TargetscanS | 0.3019 |
| miRBase miRanda PicTar6 TargetscanS | 0.3019 |
| miRanda PicTar5 TargetscanS | 0.3018 |
| miRanda PicTar5 PicTar6 TargetscanS | 0.3018 |
| miRBase miRanda PicTar5 TargetscanS | 0.3017 |
| miRBase miRanda PicTar5 PicTar6 TargetscanS | 0.3017 |
| miRBase PicTar6 | 0.3011 |
| miRBase | 0.3008 |
| miRBase PicTar5 PicTar6 | 0.3003 |
| miRBase PicTar5 | 0.3000 |
| PicTar5 PicTar6 | 0.2999 |
| PicTar5 | 0.2989 |
| miRBase miRanda PicTar5 PicTar6 | 0.2987 |
| miRBase miRanda PicTar6 | 0.2986 |
| miRBase miRanda PicTar5 | 0.2985 |
| miRBase miRanda | 0.2984 |
| miRanda PicTar5 PicTar6 | 0.2983 |
| miRanda PicTar6 | 0.2982 |
| miRanda PicTar5 | 0.2981 |
| miRanda | 0.2979 |

Chapter 9

Discussion

9.1 Digital Profiling of GNS Cell Lines

Glioblastoma multiforme is the most common primary brain tumour and the most aggressive glioma in adults. No effective treatment is yet available and the prognosis is therefore very poor, with a median survival time of 15 months [472]. The extensive cellular heterogeneity typical of glioblastomas is reflected in the consistently different molecular signatures and copy number variations observed in the subclasses present within the primary and secondary subtypes [154,334,450]. These observations are important because they point to the need to approach treatment research from a more molecular standpoint, one that allows treatment to be diversified and tailored to each patient's genome in order to obtain the most effective results. An important finding within the cancer stem cell field in the context of glioblastomas was the observation that these tumours contain a population of cells with similarities to NS cells. Until then, gliomas had been studied as cancer cell lines, in which the stem cell component of the tumour was hidden because the culture conditions caused it to differentiate, so that it could never be specifically targeted. NS cells give rise to both neurons and glia during the development of the nervous system and are present in restricted regions of the adult human brain, where they constitute proliferating germinative zones that are active throughout adulthood [248]. According to the cancer stem cell hypothesis, such stem cell-like populations are responsible for maintaining cancers, as well as for giving rise to the differentiated progeny responsible for the cellular diversity of many neoplasias, including glioblastoma [379]. If this is the case, isolating and characterising the glioma cancer stem cells will be key to developing efficient therapies for glioblastoma multiforme.


Before Pollard et al [404] adapted to glioma tumours the protocol developed by Conti et al [107] for the derivation of NS cells in adherent serum-free culture, glioma-related research was conducted on glioma cell lines (see section 1.3) and neurosphere cultures (see section 2.2). The derivation method for cancer cell lines involves serum-containing media, which select for proliferative cells that consequently lose their differentiation identity. The NS cell line derivation method instead uses a defined medium supplemented with the growth factors EGF and FGF2, which supports self-renewal. While neurosphere cultures have been instrumental in detecting and quantifying the stem cell component in gliomas, the immediate differentiation observed when neurospheres start adhering makes them a suboptimal tool for the long-term characterisation and manipulation of glioma stem cells. Adult human NS cells are difficult to study, but once it was discovered that fetal human NS cells can be isolated and maintained as untransformed cell lines in serum-free medium supplemented with growth factors, it followed that the same protocol could be used to obtain NS-like cells from gliomas and maintain them in vitro [404]. These cells, termed Glioma Neural Stem (GNS) cells, have a morphology similar to that of NS cells obtained in the same manner when grown as adherent cultures, and share other features with them, including expression of progenitor cell markers and the capacity to differentiate into multiple neural lineages. In contrast to NS cells, however, GNS cells harbour genetic aberrations characteristic of glioblastoma and form glioma-like tumours when transplanted into immunocompromised mice [404]. Thus, the derivation of NS cells from brain tissue forms the basis for culturing GNS cells from gliomas: by applying the protocol used to isolate fetal human NS cells, NS-like cells can be isolated from gliomas in adherent culture [404], providing a research platform that is better suited to the development of patient-tailored therapies.

In this thesis I show several lines of support for GNS lines being suitable models for the understanding of the molecular basis of glioma:

1. GNS/NS differential expression analysis highlights pathways in glioma;

2. GNS expression profiles capture molecular signatures of glioblastoma subtypes established by large microarray studies;

3. many core DE genes identified in the GNS/NS comparison show similar changes in expression data from primary glioblastomas and xenografts.


In order to reveal transcriptional changes that underlie glioblastoma, I performed an in-depth analysis of gene expression in malignant stem cells derived from patient tumours in relation to untransformed, karyotypically normal neural stem cells. These cell types are closely related and it has been hypothesised that gliomas arise by mutations in NS cells or in glial cells that have reacquired stem cell features [404]. We measured gene expression by high-throughput RNA tag sequencing (Tag-seq), a method which features high sensitivity and reproducibility compared to microarrays [490]. qRT-PCR validation further demonstrates that Tag-seq expression values are highly accurate. Other cancer samples and cell lines have recently been profiled with the same method [346,383], which should make those samples comparable to ours and to the analyses presented in this thesis. Through Tag-seq expression profiling of normal and cancer stem cells followed by qRT-PCR validation in a wider panel of 22 cell lines, we identified 29 genes strongly discriminating GNS from NS cells. Some of these genes have previously been implicated in glioma, including four with a role in adhesion and/or migration, CD9, ST6GALNAC5, SYNM and TES [63,241,251,477], and two transcriptional regulators, FOXG1 and CEBPB. This observation is in line with the gene ontology analysis, which revealed "Cell adhesion" and "Cell migration" as relevant biological processes amongst our set of differentially expressed genes (Fig 7.1), and with the fact that, although infiltrative spread is a common feature of all diffuse astrocytic tumours, glioblastoma is particularly notorious for its rapid invasion of neighboring brain structures [301]. Activation of the TGFβ and AKT pathways has been described as a possible molecular mediator of this invasion [245,527], and a number of other expression profiling studies have identified a subset of tumours with elevated expression of ECM components as well as intracellular proteins associated with cell motility [148,390]. FOXG1, which has been proposed to act as an oncogene in glioblastoma by suppressing growth-inhibitory effects of TGFβ [448], showed remarkably strong expression in all 16 GNS lines assayed by qRT-PCR. CEBPB was recently identified as a master regulator of a mesenchymal gene expression signature associated with poor prognosis in glioblastoma [85]. Studies in hepatoma and pheochromocytoma cell lines have shown that the transcription factor encoded by CEBPB (C/EBPβ) promotes expression of DDIT3 [140], another transcriptional regulator that we found to be up-regulated in GNS cells. DDIT3 encodes the protein CHOP, which in turn can inhibit C/EBPβ by dimerizing with it and acting as a dominant negative [85]. This interplay between CEBPB and DDIT3 may be relevant for glioma therapy development, as DDIT3 induction in response to a range of compounds sensitises glioma cells to apoptosis (see e.g. [216]).

Our results also corroborate a role in glioma for several other genes with limited prior links to the disease. This list includes PLA2G4A, HMGA2, TAGLN and TUSC3, all of which have been implicated in other neoplasias (Appendix A.2). PLA2G4A encodes a phospholipase that functions in the production of lipid signaling molecules with mitogenic and pro-inflammatory effects. In a subcutaneous xenograft model of glioblastoma, expression of PLA2G4A by the host mice was required for tumour growth [285]. For HMGA2, a transcriptional regulator down-regulated in most GNS lines, low or absent protein expression has been observed in glioblastoma compared to low-grade gliomas [285], and HMGA2 polymorphisms have been associated with survival time in glioblastoma [295]. TAGLN, another gene down-regulated in most GNS lines, encodes the actin-binding protein transgelin. TAGLN has been characterised as a tumour suppressor with lost expression in prostate, breast and colon cancers [30], but we only found one prior study on TAGLN in glioma, showing low expression in a glioma cell line from rats [181]. TUSC3 is commonly silenced by promoter methylation in glioblastoma, in particular in patients above 40 years of age. Loss or down-regulation of TUSC3 has been found in several other cancers, e.g. of the colon, where TUSC3 becomes hypermethylated with age [13]. The function of TUSC3 is unknown, but may relate to protein glycosylation [337]. The set of 29 genes found to generally distinguish GNS from NS cells also includes multiple genes implicated in other neoplasias, but without direct links to glioma, such as SULF2, NNMT and LMO4 (Appendix A.2). Of these, the transcriptional regulator LMO4 may be of particular interest, as it is well studied as an oncogene in breast cancer and regulated through the phosphoinositide 3-kinase pathway [340], which is commonly affected in glioblastoma [326]. SULF2 encodes an extracellular sulfatase that modulates interactions between growth factors and their receptors, with effects on multiple signaling pathways. It is up-regulated in several cancers, including a mouse model of glioma [211] and, as shown in our differential expression analysis, in many GNS lines.

Five of these 29 genes have not been directly implicated in cancer. This list comprises one gene down-regulated in GNS lines (PLCH1) and four up-regulated (ADD2, LYST, PDE1C and PRSS12). PLCH1 is involved in phosphoinositol signaling [228], like the frequently mutated phosphoinositide 3-kinase complex [326]. Interestingly, both PDE1C and PLCH1 are activated by intracellular Ca2+ [126,228], in line with these genes being involved in PI3K and cAMP regulation, which are both Ca2+-regulated pathways (see pathway figure 7.8). As noted above, PLA2G4A encodes a cytoplasmic phospholipase involved in production of lipid signaling molecules with mitogenic and pro-inflammatory effects [192]. ADD2 encodes a cytoskeletal protein that interacts with FYN, a tyrosine kinase promoting cancer cell migration [456,533]. For PDE1C, a cyclic nucleotide phosphodiesterase gene, we found higher expression to correlate with shorter survival after surgery. Up-regulation of PDE1C has been associated with proliferation in other cell types through hydrolysis of cAMP and cGMP [126,437]. PRSS12 encodes a protease that can activate tissue plasminogen activator (tPA) [335], an enzyme which is highly expressed by glioma cells and has been suggested to promote invasion [168]. Indeed, our Tag-seq data show that the tissue plasminogen activator gene PLAT is expressed in all the assayed GNS lines and strongly up-regulated in all except one (Appendix A.1). LYST, which encodes a cytoplasmic protein involved in lysosomal trafficking, has no clear role in cancer-related processes, although there is evidence of altered protein kinase C levels in LYST-deficient cells [168]. The GNS versus NS transcriptome comparison thus identified multiple genes known to play a role in glioma as well as several other genes likely to do so.

I have compared gene expression and non-coding RNA expression between cancer stem cells from glioblastoma and NS cells, and have evaluated correlations between the sets of highly significant differentially expressed genes found in our dataset and those in other glioblastoma cell lines established by The Cancer Genome Atlas project [326] and in lower grade gliomas from studies by Freije et al [148] and Phillips et al [390]. I have also examined how gene expression might be affected by the aberrations known to exist in glioblastoma cell genomes, and concluded that there was only a modest relationship between the presence of aberrations and the observed gene expression, so that no adjustment needed to be made to our differential expression calls. Interestingly, apart from the aberrations they carry by virtue of being cancer-derived, the GNS cell lines tend to show very low genomic instability, whereas in classic cancer cell lines instability typically increases over a short period of time and stabilises later on. GNS cell lines start accumulating chromosomal and genomic aberrations very late, after having passed the one hundred passage mark [404]. The four GNS lines that we profiled have been thoroughly phenotypically characterised and all give rise to glioma-like tumours when transplanted into immunocompromised mice [404]. In addition, the lines were established from tumours with differing histology, allowing us to sample the breadth of the disease. Indeed, our correlation analysis (Fig 6.6) showed that the GNS cell lines correlated less well with each other than did our biological replicates (the two cell lines established from the same parental tumour, but in different laboratories) or the two NS cell lines. This is expected, as G144, G166 and G179 originate from different tumours with histologically distinct properties, and it further confirms the presence of molecular variability amongst different glioblastoma patients: G144 was derived from a glioblastoma with a significant oligodendrocyte component, G166 from a case of glioblastoma multiforme and G179 from a giant cell glioblastoma.

Furthermore, by considering expression changes in a pathway context (Fig 7.8), we identified several genes that have not been directly implicated in glioma, but participate in glioma-related pathways. One example is ITGBL1, an integrin-related gene that is expressed by G166 and G179 and may play a role in cell adhesion [49]. Another is the orphan nuclear receptor NR0B1, which is strongly up-regulated in G179 and is known to be up-regulated and to mediate tumour growth in Ewing's sarcoma [158]. Other examples include three down-regulated calcium channel genes, CACNA1C, CACNG7 and CACNG8, and one up-regulated, CACNA1A. Deregulation of these genes could affect cellular calcium levels, which influence the activation status of protein kinase C and indirectly the MAPK pathway. The pathway analysis also highlighted the up-regulated genes PARP3 and PARP12 (Table 6.6), which belong to the PARP family of ADP-ribosyl transferase genes involved in DNA repair. The up-regulation of these genes may have relevance for therapy, as PARP inhibitors are currently in clinical trials for several cancers and there is evidence that PTEN loss, which is common in glioblastoma, may sensitise cells to PARP inhibitors [325].

Transcriptome analysis thus identified multiple genes of known significance in glioma pathology as well as several novel candidate genes and pathways. These results are further corroborated by survival analysis, which revealed a GNS expression signature associated with patient survival time in five independent datasets. This finding is compatible with the notion that gliomas contain a GNS component of relevance for prognosis. Five individual GNS signature genes were significantly associated with survival of glioblastoma patients in both of the two largest data sets: PLS3, HOXD10, TUSC3, PDE1C and the well-studied tumour suppressor PTEN. PLS3 (T-plastin) regulates actin organisation and its over-expression in the CV-1 cell line resulted in partial loss of adherence [26]. Elevated PLS3 expression in GNS cells may thus be relevant for the invasive phenotype. The association between transcriptional up-regulation of HOXD10 and poor survival is surprising, because HOXD10 protein levels are suppressed by a microRNA (miR-10b), which is highly expressed in gliomas, and it has been suggested that HOXD10 suppression by miR-10b promotes invasion [476]. Notably, in our microRNA microarray dataset miR-10b is up-regulated three-fold in GNS cell lines relative to NS cell lines (Table 6.8; Appendix F), and the HOXD10 mRNA up-regulation we observe in GNS lines also occurs in glioblastoma tumours, as shown by comparison with grade III astrocytoma (Fig 7.4b). Similarly to our findings, miR-10b is present at higher levels in glioblastoma compared to gliomas of lower grade [476]. It is conceivable that HOXD10 transcriptional up-regulation combined with post-transcriptional suppression is indicative of a regulatory program associated with poor prognosis in glioma. The survival analysis also made clear that tumours from older patients featured an expression pattern more similar to the GNS signature. One of the genes contributing to this trend, TUSC3, is known to be silenced by promoter methylation in glioblastoma, particularly in patients over 40 years of age [277]. Loss or down-regulation of TUSC3 has been found in other cancers, e.g. of the colon, where its promoter becomes increasingly methylated with age in the healthy mucosa [13]. Taken together, these data suggest that transcriptional changes in healthy aging tissue, such as TUSC3 silencing, may contribute to the more severe form of glioma in older patients. Thus, the molecular mechanisms underlying the expression changes described here are likely to be complex and varied. To capture these effects and elucidate their causes, transcriptome analysis of cancer samples will benefit from integration of diverse genomic data, including structural and nucleotide-level genetic alterations, as well as DNA methylation and other chromatin modifications.

To identify expression alterations common to most glioblastoma cases, other studies have profiled tumour biopsies in relation to non-neoplastic brain tissue [196,383,497]. While such comparisons have been revealing, their power is constrained by discrepancies between reference and tumour samples, for instance the higher neuronal content of normal brain tissue compared to tumours. Gene expression profiling of tumour biopsies further suffers from mixed signal due to a stromal cell component and heterogeneous populations of cancer cells, only some of which contribute to tumour progression and maintenance [379]. Part of a recent study bearing a closer relationship to our analysis examined gene expression in another panel of glioma-derived and normal NS cells [299], but included neurosphere cultures, which often contain a heterogeneous mixture of self-renewing and differentiating cells. We have circumvented these issues by profiling uniform cultures of primary malignant stem cell lines that can reconstitute the tumour in vivo [402], in direct comparison to normal counterparts of the same cell type [107,481]. While the resulting expression patterns largely agree with those obtained from glioblastoma tissues, there are notable differences. For example, we found the breast cancer oncogene LMO4 (discussed above) to be up-regulated in most GNS lines, although its average expression in glioblastoma tumours is low relative to normal brain tissue (Fig 7.4a). Similarly, TAGLN and TES were absent or low in most GNS lines, but displayed the opposite trend in glioblastoma tissue compared to normal brain (Fig 7.4c) or grade III astrocytoma (Fig 7.4d). Importantly, both TAGLN and TES have been characterised as tumour suppressors in malignancies outside the brain, and the latter is often silenced by promoter hypermethylation in glioblastoma [30,349]. An interesting hypothesis about the TES gene, which we could explore further with specific biochemical assays, is that a very robust mechanism must keep this gene silenced, assuming that the methylation mark reported for glioblastoma in the literature is maintained in our GNS cell lines. TES is located on chromosome 7, which is nearly always present in very high copy number; cell line G144, for example, carries more than 10 copies at late passages (Fig 6.7). If we could verify that the methylation mark is maintained in our GNS cell lines, this would be consistent with TES being silenced very early in the disease, before the chromosomal gains and aneuploidy, which may make it a good candidate for a glioblastoma-initiating event.

In assigning each GNS cell line to one of the four expression signature profiles defined by Verhaak et al [511], each match reflected the known histopathological features of the line, further supporting the need for patient stratification in assigning therapies. G166 and G179, both glioblastoma cell lines, were assigned to the mesenchymal signature, associated with the poorest prognosis. G144, with its high oligodendrocyte component, which is known to correlate positively with patient survival in glioblastoma, was instead assigned to the proneural signature, which is linked with neural markers and with the best prognosis of all. In comparing the GNS line expression profiles to the subtype signatures, we found that both G166 and G179 correlated strongly with the mesenchymal signature. Mesenchymal subtype markers with elevated expression in these two lines included MET, CD44, CD68 and CASP1 [511]. G144 did not correlate significantly with any of the four signatures, but showed a slight positive correlation with the proneural one. Supporting this classification of G144 were several of the hallmarks of the proneural subtype emphasised by Verhaak et al 2010 [511]: high expression of the oligodendrocytic development genes PDGFRA, NKX2-2 and OLIG2, as well as of ERBB3, DCX and TCF4, and low levels of the tumour suppressor CDKN1A. Such expression signature studies should be performed on an ever greater number of glioblastoma samples so that they can capture even finer differences than those proposed by Verhaak et al, who have already demonstrated the existence of different glioblastoma classes, each with a different associated prognosis. If each class were targeted by a molecular therapy tailored to it, patients in that class would be expected to benefit most from the treatment and to obtain better results.
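The correlation-based comparison of a cell line with subtype signatures described above can be illustrated with a minimal sketch. It assumes that each signature is reduced to a per-gene centroid and that a line is matched to the signature with the highest Pearson correlation; this is an illustration, not the procedure used in this thesis or by Verhaak et al. The gene names are taken from the text, but all numerical values are toy numbers invented for the example, not measurements.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def best_signature(profile, signatures):
    """Return the best-correlated signature name and all correlation scores."""
    genes = sorted(profile)
    scores = {
        name: pearson([profile[g] for g in genes], [centroid[g] for g in genes])
        for name, centroid in signatures.items()
    }
    return max(scores, key=scores.get), scores

# Toy centroids (hypothetical values): +1 marks a gene expected to be high, -1 low.
toy_signatures = {
    "mesenchymal": {"MET": 1.0, "CD44": 1.0, "PDGFRA": -1.0, "OLIG2": -1.0},
    "proneural": {"MET": -1.0, "CD44": -1.0, "PDGFRA": 1.0, "OLIG2": 1.0},
}
# Toy relative expression values for one hypothetical cell line.
toy_profile = {"MET": 2.0, "CD44": 1.5, "PDGFRA": -1.0, "OLIG2": -2.5}

best, scores = best_signature(toy_profile, toy_signatures)
print(best, {name: round(value, 3) for name, value in scores.items()})
```

With real data the centroids would span many more genes, and a weak best correlation (as reported for G144) would argue against a confident assignment.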

A Gene Ontology term analysis confirmed that the sets of differentially expressed genes were enriched for genes involved in processes related to brain development and cancer biology. We also observed enrichment of genes encoding regulatory and inflammatory proteins, such as signal transduction components, cytokines, growth factors and DNA binding proteins. In line with these findings, affected pathways from the KEGG database included Cytokine-cytokine receptor interaction, Neuroactive ligand-receptor interaction, MAPK signaling and, as expected, Glioma, a collection of genes involved in glioma formation. GSEA analysis revealed a consistent up-regulation of inflammatory genes in the GNS lines, belonging especially to the MHC class II family, suggesting an immune-evasion phenotype that has already been noted in a small number of early glioma studies [468,484]. The up-regulation in the GNS lines was seen in several MHC class II genes, as well as in related genes involved in antigen presentation on MHC class I complexes. Several works have already shown that MHC class I and II molecules are involved in aspects of human cancer pathology such as invasion and migration [327,421,546]. We find an overall transcriptional up-regulation of MHC class II molecules and of a smaller subset of MHC class I molecules, suggestive of the absence of transcriptionally active compensatory mechanisms. Follow-up proteomic studies could further explore the dynamics of this immunoevasion mechanism. Measuring specific protein expression levels would help us understand the level at which this regulation takes place; that such regulation exists is strongly suggested by the up-regulation observed, especially in MHC class II molecules. It would help us answer questions such as whether an excess of MHC class II molecules is also present on the surface of GNS cells, and whether there are any correlations with the expression signature profiles and the diagnosis associated with them. It is possible that glioblastoma samples carrying higher protein levels of MHC class II molecules would fall into the proneural signature: by exposing aberrant extracellular antigens, these molecules would elicit a more efficient immunological response against those cells, in line with the more favourable prognosis associated with this signature. On the other hand, it is also possible that little or no sample-dependent variation exists in the up-regulation of MHC class molecules, and therefore that there is no correlation between prognosis and MHC member protein levels.

To investigate whether non-coding RNA regulation plays an important role in the levels of gene expression observed for GNS cell lines, differential microRNA-targeted isoform expression and long non-coding RNA expression were evaluated. Assuming that differential isoform expression is due to microRNA regulation exerted on a specific isoform, and having identified approximately 2,000 isoform candidates for differential expression detected by multiple tag mappings, we checked which predicted microRNA sequences (from five of the leading algorithms) aligned to their 3'UTRs. An interesting candidate for isoform microRNA regulation is NTRK2, which encodes a receptor for brain-derived neurotrophic factor, promoting differentiation, proliferation and survival [118]. In several tumour types, NTRK2 expression correlates with poor prognosis and metastasis [118], and the protein has been detected in a subset of cells in astrocytomas [29]. The Tag-seq data demonstrate that GNS lines express a short NTRK2 isoform, potentially the one that lacks the kinase domain (Fig 6.11), which has been implicated in regulation of astrocyte morphology [366]. Other interesting candidate genes for microRNA regulation of isoform expression are the tumour suppressor gene BRCA1, involved in p53 signaling [370]; the genes AKT2 and AKT3, involved in glioblastoma tumour evasion phenotypes [245]; BMP7, a member of the TGF-β superfamily of bone morphogenetic proteins that plays a key role in the transformation of mesenchymal cells into bone and cartilage [245]; and ERBB2, HLA-A and PTEN. We found that 226 of the approx. 2,000 differentially expressed isoforms hosted at least one microRNA seed sequence between at least two tags, which points to microRNA regulation as a widely adopted mechanism of regulation of gene and isoform expression in GNS cell lines. An important follow-up experiment, for which data became available only recently in our laboratory, is the measurement of microRNA expression levels in our GNS cell lines on a microRNA microarray platform. Analysis of these data revealed that, of the 226 microRNAs identified on a prediction basis as regulators of isoforms within our GNS study, miR-10b (see discussion above) and miR-26a were consistently up-regulated in our GNS cell lines, and miR-137, miR-128, miR-34a, miR-129-3p and miR-451 consistently down-regulated, in line with the glioblastoma microRNA literature [96,278,364].
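The criterion of a microRNA seed site lying between two tags can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this thesis: the seed is taken as microRNA nucleotides 2 to 8, a site is an exact match of the seed's reverse complement in the 3'UTR, and tags are given as 0-based coordinates on the UTR sequence. The microRNA and UTR sequences, and the names `seed_site` and `sites_between_tags`, are hypothetical and serve only as an example.

```python
# Complement table for the RNA alphabet.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna):
    """DNA-alphabet target site complementary to microRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]
    reverse_complement = seed.translate(RNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("U", "T")

def sites_between_tags(utr, mirna, tag_positions):
    """Start positions of seed sites lying fully between the outermost tags."""
    site = seed_site(mirna)
    low, high = min(tag_positions), max(tag_positions)
    hits = []
    start = utr.find(site)
    while start != -1:
        if low <= start and start + len(site) <= high:
            hits.append(start)
        start = utr.find(site, start + 1)
    return hits

# Hypothetical sequences, not real annotations.
TOY_MIRNA = "UACGGAUCCAAGCUUACGUAGC"
TOY_UTR = "AAAA" + "GATCCGT" + "CCCCCCCC" + "GATCCGT" + "TTTT"

print(seed_site(TOY_MIRNA))
print(sites_between_tags(TOY_UTR, TOY_MIRNA, (0, 15)))
print(sites_between_tags(TOY_UTR, TOY_MIRNA, (0, 30)))
```

A site between two tags is present only in the longer isoform, which is why such sites are natural candidates for isoform-specific regulation.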

In assessing the regulation potentially exerted by long non-coding RNAs, we found 18 up-regulated and 7 down-regulated putative ncRNAs, three of which are known to be long antisense RNAs: CDKN2BAS, HOTAIRM1 and NEAT1. Although regulation by long non-coding RNAs still needs to be thoroughly elucidated, an interesting parallel was observed between the expression levels of HOTAIRM1 in our GNS cell lines and the expression trend reported in the literature for human NB4 promyelocytic cell lines. Upon induction of granulocytic differentiation, HOTAIRM1 becomes strongly up-regulated in human NB4 promyelocytic cell lines and normal hematopoietic cells, and its knock-down in NB4 cells causes down-regulation of the HOXA1 and HOXA4 genes [547]. In line with these observations, HOTAIRM1 is up-regulated in our GNS cell lines compared to our NS lines, and the HOXA1 and HOXA4 genes are also up-regulated (Fig 6.22), which potentially identifies them as conserved targets of HOTAIRM1, although an immunoprecipitation assay would be needed to confirm this.

Overall, the results presented in this thesis support GNS lines as a suitable model for understanding the molecular basis of glioblastoma, and the use of NS cell lines as controls in this setting. By this approach, we have identified several likely oncogenes and tumour suppressors that have not previously been associated with glioblastoma. With the advent of GNS cell lines, the glioblastoma research field is bound to be enriched by experiments whose results also represent the stem cell component of the cancer and therefore enable more accurately targeted therapeutic strategies. Outside the glioblastoma research field, similar concepts will be applied to the adherent culturing of other solid cancers with a preponderant stem cell component, such as other brain cancers and breast, colon, pancreatic and lymphoma cancers. The ability to work in an adherent culture that keeps the stem cell component of the cancer intact for long periods of time will allow the entire cancer research field to move faster towards effective therapeutic strategies.

9.2 MicroRNA Target Prediction Analysis

In this thesis I have also constructed a tool for the manipulation of microRNA target prediction data, and combined the exon array expression data (Appendix E) with the microRNA array expression data (Appendix E) to produce, using the GenemiR software tool, an ensemble analysis. The aim was to answer the question of whether any combination of prediction algorithms is more effective and accurate at predicting mRNA targets than any single prediction algorithm alone. microRNA target prediction results from weighing several different variables, each of which is treated differently depending on the algorithm at work, in terms of the importance assigned to it within the algorithm itself. Although several robust prediction algorithms have been developed, they all suffer from a very high percentage of false positives. Furthermore, the poor agreement so often observed between sets of results that originate from exactly the same input list but are processed by different algorithms tells us that more needs to be taken into account. Thus, the characterisation of microRNA function and of the regulation of mRNAs depends on a robust strategy for managing disparate results. Some prediction algorithms generate a large number of putative regulatory targets, many of which are exclusive to a particular method. Moreover, those results that do agree will not necessarily exhibit higher prediction accuracy. Even in the best case, where the degree of overlap is approximately 70% (between TargetScanS and PicTar), it is not obvious which algorithm produces the optimal set of target predictions. When predictions are made, they must then be viewed in context with other genomic information in order to elucidate regulatory function.

To address this research question, the hypothesis put forward by Alexiou et al. [16], that combinations of algorithms predict more accurately than a single algorithm alone, was challenged. The analysis described in this thesis shows that the target prediction algorithm ElMMo [155] is at the head of the list, sorted by accuracy score, of the 258 combinations tested. Although this analysis shows that ElMMo is the most accurate target prediction algorithm within the pool of eight tested, it must be noted that there is a substantial discrepancy between the score achieved by ElMMo and the next best score for a single target prediction algorithm. Before Diana-microT [237,316], the second highest solo score, many combinations of two to three target prediction algorithms appear earlier in the list, demonstrating that combinations can be more accurate than the remaining solo prediction algorithms. Although it is refreshing to see that one target prediction algorithm can fare better than the rest, the question remains as to whether the variables and factors that are taken into consideration, and the ones that are not, are weighted fairly enough in the algorithm to simulate the intricate regulation happening within the cell. In fact, during this analysis the accuracy score of each prediction algorithm decreased greatly when a background gene list was not used to filter out the genes that were predicted but were not tissue specific.
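The kind of ensemble comparison discussed here can be illustrated with a minimal Python sketch. It is not the GenemiR implementation: the function name `score_combinations`, the toy algorithm names and the toy miRNA-gene pairs are all hypothetical. Two modelling choices are assumptions made only for illustration: a combination of algorithms is taken to predict the pairs on which all of its members agree, and accuracy is taken to be the fraction of predicted pairs found in a validated set. The optional background list mimics the tissue-specific filter mentioned above.

```python
from itertools import combinations


def score_combinations(predictions, validated, background=None):
    """Rank every non-empty combination of algorithms by a simple accuracy score.

    predictions: dict mapping algorithm name -> set of (miRNA, gene) pairs
    validated:   set of (miRNA, gene) pairs treated as true targets
    background:  optional set of genes expressed in the tissue of interest
    """
    names = sorted(predictions)
    results = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Assumption: a combination predicts only the pairs all members agree on.
            pairs = set.intersection(*(predictions[name] for name in combo))
            if background is not None:
                # Drop predictions for genes absent from the tissue background.
                pairs = {(mir, gene) for mir, gene in pairs if gene in background}
            # Assumption: accuracy = validated predictions / all predictions.
            score = len(pairs & validated) / len(pairs) if pairs else 0.0
            results.append((combo, score, len(pairs)))
    # Best score first; ties go to the smaller combination, then alphabetically.
    return sorted(results, key=lambda r: (-r[1], len(r[0]), r[0]))


# Hypothetical toy data, not taken from the thesis.
predictions = {
    "algo_a": {("miR-1", "G1"), ("miR-1", "G2"), ("miR-2", "G3")},
    "algo_b": {("miR-1", "G1"), ("miR-2", "G3"), ("miR-2", "G4")},
    "algo_c": {("miR-1", "G2"), ("miR-2", "G4"), ("miR-3", "G5")},
}
validated = {("miR-1", "G1"), ("miR-2", "G3")}
background = {"G1", "G2", "G3"}

for combo, score, n_pairs in score_combinations(predictions, validated, background):
    print(combo, round(score, 2), n_pairs)
```

On this toy input the pair `algo_a` plus `algo_b` ranks first when no background list is supplied, whereas with the background filter `algo_b` alone reaches the same score. This is only meant to show how filtering can change the ranking, not to reproduce the scores reported in the thesis.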

While the results of this analysis indicate that some algorithms alone and some groups of algorithms are better than others, they also highlight the importance of re-assessing the factors that are taken into consideration when making microRNA target predictions. If we have reached the limit of what sequence-based algorithms can achieve, we ought to start thinking of adding another dimension to microRNA target prediction algorithms, one that factors in the tissue-specific regulatory actions of microRNAs.

9.3 Concluding Remarks

In this thesis I aimed to characterise the transcriptional landscape of glioma-derived neural stem (GNS) cells in the most comprehensive way possible. Several approaches were taken to examine the expression profiling data, and these gave insights into the stem cell component of the biology of these cells, which had previously been overlooked due to the suboptimal culturing systems for the maintenance of pluripotency. To this end I have approached the analysis from the point of view of both the coding and the non-coding transcriptome, trying to draw as many parallels and comparisons as possible with external datasets that could give insights into the characteristics that are unique to the GNS system. Pairing the available expression data with new DNA methylation data would allow us to gain insights into the epigenetic mechanisms at work. It is, in fact, important that future work in this field and with these cell lines is directed towards a deeper understanding of the mechanisms that elect and maintain them as tumour-initiating agents of the primary glioblastoma tumour.

Appendix A

Differentially Expressed Genes

A.1 Differential Expression

The differentially expressed genes identified with the DESeq Bioconductor package are listed below in alphabetical order. The values for each gene are reported for each cell line in order to give the reader an idea of the directionality of the fold change between the diseased lines and their normal counterparts. Rows with multiple Ensembl and/or Entrez gene IDs separated by commas correspond to cases where there was no one-to-one correspondence between Entrez and Ensembl genes.
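As a reading aid for the fold-change column, the following Python sketch shows how a log2 fold change relates to per-line normalised counts, and why it is reported as Inf or -Inf when one group has no counts at all. It assumes the fold change is the ratio of the mean normalised count in the diseased lines to the mean in the normal lines. The function name `log2_fold_change` and the numbers are hypothetical and are not taken from Table A.1, and DESeq's own estimates are computed from unrounded values.

```python
import math


def log2_fold_change(diseased_counts, normal_counts):
    """log2 of (mean diseased count / mean normal count).

    Returns +inf when only the diseased lines have counts, -inf when only
    the normal lines have counts, and NaN when both groups are empty.
    """
    diseased_mean = sum(diseased_counts) / len(diseased_counts)
    normal_mean = sum(normal_counts) / len(normal_counts)
    if diseased_mean == 0 and normal_mean == 0:
        return math.nan
    if normal_mean == 0:
        return math.inf
    if diseased_mean == 0:
        return -math.inf
    return math.log2(diseased_mean / normal_mean)


# Hypothetical normalised counts: three diseased lines versus two normal lines.
print(log2_fold_change([80.0, 120.0, 40.0], [10.0, 10.0]))  # higher in diseased lines
print(log2_fold_change([0.0, 0.0, 0.0], [5.0, 15.0]))       # detected only in normal lines
print(log2_fold_change([12.0, 3.0, 9.0], [0.0, 0.0]))       # detected only in diseased lines
```

A positive value therefore indicates higher expression in the diseased lines, a negative value higher expression in the normal lines, and an infinite value a gene detected in only one of the two groups.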

234 A.1 Differential Expression Appendix 25.9 170.3 1.0 12.7 2.0 422.8 218.6 8.8 1.0 12.3 1.0 0.0 13.0 6.7 44.7 39.4 0.0 79.4 39.5 339.9 1346.4 0.0 21.1 30.4 0.0 0.0 26.6 0.0 0.0 0.0 0.0 165.5 108.9 0.0 0.0 0.0 0.0 0.0 3.8 133.9 7.8 0.0 713.6 0.0 226.6 471.0 0.0 62.7 23.2 3.8 184.0 0.0 0.0 22.9 0.0 0.0 1.7 0.0 1.2 0.0 85.8 219.6 96.5 81.8 0.0 0.0 18.0 28.3 0.0 18.8 0.0 0.0 93.5 173.3 0.0 120.1 0.0 187.2 346.5 133.1 0.0 12.5 0.0 87.6 1.3 4.5 27.5 65.7 8.0 0.0 0.0 0.8 18.6 0.0 0.0 0.6 23.8 54.1 25.3 2.7 23.0 0.0 227.3 7.4 0.0 0.0 58.1 16.8 0.0 214.7 1784.3 0.0 33.4 128.1 1.6 0.0 74.6 94.7 0.0 36.2 37.5 1213.2 1329.5 3430.1 171.5 16.2 0.0 79.2 1.4 65.1 1.4 87.9 6.1 0.0 211.6 1585.7 0.0 42.1 55.6 1.4 5.4 156.7 120.8 0.0 129.9 39.0 1174.8 1771.8 1880.4 25.7 3.56E-02 4.98E-05 1.95E-07 1.16E-03 1.65E-02 2.33E-07 2.32E-02 4.18E-02 6.68E-02 8.93E-02 5.28E-13 1.07E-05 7.58E-02 5.48E-02 1.43E-04 5.14E-02 2.11E-02 4.54E-03 9.45E-02 1.74E-03 1.33E-08 8.49E-12 8.35E-03 7.20E-07 3.67E-02 p-value FDR G144ED G144 G166 G179 CB541 CB660 1.26E-03 4.53E-07 9.42E-10 1.80E-05 4.58E-04 1.14E-09 7.32E-04 1.56E-03 2.94E-03 4.23E-03 3.91E-16 8.11E-08 3.46E-03 2.26E-03 1.52E-06 2.09E-03 6.40E-04 9.53E-05 4.57E-03 3.00E-05 4.66E-11 1.37E-14 1.99E-04 4.12E-09 1.32E-03 ) FC ( 2 log 3.05 -Inf 8.09 4.25 5.44 -Inf -3.00 -3.35 5.86 3.56 10.27 Inf 3.30 3.74 -7.41 -Inf Inf -3.05 -Inf -3.96 -6.16 Inf 3.56 5.49 4.94 Classification of differentially expressed genes at 10% FDR. 
ENSG00000204334 ENSG00000225301 ENSG00000237493 ENSG00000204089 ENSG00000100813 ENSG00000159251 ENSG00000142303 ENSG00000078549 ENSG00000170214 ENSG00000169129 ENSG00000135744 ENSG00000198074 ENSG00000101935 ENSG00000136859 ENSG00000171388 ENSG00000171885 ENSG00000137727 ENSG00000102606 ENSG00000106819 ENSG00000168874 ENSG00000018625 ENSG00000160862 ENSG00000135454 ENSG00000095739 Table A.1: 23779,553158,55615 ENSG00000186654,ENSG00000241484 -5.71 1.48E-03 4.03E-02 0.0 0.0 3.1 0.0 5.1 103.5 285141 22985 70 81794 117 147 84632 183 57016 9949 23452 8862 361 57569 8874 54829 84913 477 563 2583 25805 Gene symbol(s) Entrez gene ID(s) Ensembl 56 gene ID(s) Differential expression results Normalised tag counts AC012215.1,UNC5DAC026410.6AC067930.1 137970AC092296.1ACTA2ADAMTS1 ENSG00000156687,ENSG00000233863ADAMTS4ADD2ADRA2A 59AGPAT9 ENSG00000227087 9510 -InfAIDA ENSG00000228381 9507ALDH1A3 ENSG00000229449 1.06E-08 1.71E-06ANGPTL1 119 150 ENSG00000154734 ENSG00000107796 0.0AP000280.1,C21orf62 84803 ENSG00000158859APOD 0.0ARC 64853 220 56245 ENSG00000075340 0.0 ENSG00000150594ARHGAP8,PRR5,PRR5- 9068ARHGAP8 ENSG00000138678 3.52 0.0ASNS 4.86 ENSG00000186063 2.06E-03 206.3 ENSG00000205929,ENSG00000239565ATAD3C 4.90 ENSG00000184254 347 5.09E-02 1.80E-03 57.2 ATP10B ENSG00000116194 56.7 2.88 -2.67 4.69E-02 23237 2.63E-04ATP1B2 28.4 Inf 1.05E-02 96.6B3GNT9 3.27E-03 140.1 1.30E-03 50.3 7.27E-02 3.62E-02 93.1 ENSG00000189058BACE2 -4.76 440 0.0 87.2 4.45E-04 219293 103.3 5.18 71.3 ENSG00000198576 19.3 InfBATF3 1.62E-02 4.65E-06 Inf 23120 79.9 405.7 31.1 0.0 0.0 0.0 1.62E-03 3.71E-04 482 13.6 57.5 7.29E-05 4.29E-02 0.0 40.2 2.25 0.7 ENSG00000215915 114.4 3.58E-15 3.65E-03 27.6 84752 ENSG00000070669 11.9 4.77 107.6 0.0 3.32E-12 65.0 146.7 3.1 16.4 -7.69 ENSG00000118322 25825 2.6 1.9 2.85E-03 0.0 18.5 618.4 4.2 1.59E-03 6.53E-02 130.0 54.1 0.0 55509 9.68E-06 ENSG00000129244 4.23E-02 349.3 30.6 51.1 486.3 0.0 ENSG00000237172 0.0 6.95E-04 0.0 0.0 0.0 60.5 5.1 401.7 3.18 ENSG00000182240 
484.9 0.0 -4.03 0.0 0.0 533.4 236.6 0.0 582.3 ENSG00000123685 0.0 3.52E-04 334.1 0.0 47.3 1.34E-02 0.0 9.9 1.22E-05 2.0 3.2 558.9 1.3 8.48E-04 6.90 0.0 2.33 114.8 31.7 0.0 1217.7 0.0 173.7 Inf 0.0 1.30E-04 12.5 27.8 4.47E-03 5.85E-03 88.4 9.28E-02 -2.46 6.6 3.0 0.0 6.8 619.4 7.54E-13 2.96 84.2 2.95E-10 0.0 6.07E-04 5.0 1064.7 7.97 669.9 13.8 2.02E-02 5.06E-04 1962.9 5.12 337.6 1.77E-02 90.9 417.6 823.1 118.7 478.9 1.22E-13 84.4 3.0 303.2 19.0 0.0 5.48E-11 207.0 9.08E-05 227.5 78.1 6.3 4.34E-03 256.1 8.1 1.0 212.0 423.1 85.8 69.3 0.0 0.0 123.3 253.4 504.6 816.5 468.2 5.0 0.0 10.7 581.6 0.0 83.6 46.6 2.7 0.0 3.8 ABATAC007405.8,LOC285141 AC012354.2 AC034102.1 AC068399.1 ACIN1 18ACTC1 ADAMTS10 ADCYAP1R1 ADRA1B ENSG00000183044AFAP1L2 AGT AKR1B10 AMMECR1 ANGPTL2 APLN AQP4 -2.61ARHGAP20 2.69E-04ARHGEF7 1.07E-02ASPN 769.4ATOH8 1682.2ATP1A2 48.1AZGP1 223.8B4GALNT1 6470.4BAMBI 1513.8 BC008001

235 A.1 Differential Expression Appendix 156.2 0.0 1.0 0.9 0.0 23.6 0.0 0.0 51.7 0.0 3.8 327.6 962.7 3.0 0.0 8.1 148.1 1.0 0.0 15.9 220.9 0.0 0.0 2.0 1.0 21.1 0.0 0.0 0.0 1.1 0.1 20.5 0.0 0.0 0.0 0.0 0.0 489.1 1985.9 0.0 0.0 0.0 508.5 3.0 0.0 103.0 405.1 0.0 6.0 169.0 0.0 0.0 0.0 739.4 0.0 229.0 126.8 152.8 115.1 93.4 358.3 46.0 1.6 95.7 234.9 0.0 0.0 0.0 8.0 14.8 135.5 0.0 2.6 0.0 166.1 0.0 88.5 13567.4 0.0 48.2 2.0 263.7 37.9 175.1 9.7 81.2 68.2 0.0 0.0 39.3 55.9 82.8 2446.3 133.2 3.6 98.1 6.5 0.6 3.9 5.5 121.7 0.0 38.2 826.6 4.5 0.0 244.2 1.6 35.9 603.7 57.9 0.0 35.9 53.0 154.4 53.1 162.6 25.6 0.0 63.9 59.0 15.6 6.1 0.0 28.3 187.0 10.9 0.0 0.0 0.0 20.3 0.0 198.7 0.0 5.4 195.7 67.8 3.6 8.2 9.6 264.9 41.9 230.2 29.7 0.0 39.2 73.9 15.9 0.0 1.4 33.1 124.2 4.1 0.0 0.0 13.3 4.27E-03 5.48E-11 2.74E-04 1.12E-07 7.76E-05 1.65E-03 2.19E-04 3.10E-04 5.81E-02 1.18E-02 4.18E-02 4.22E-02 4.75E-03 8.23E-02 1.24E-14 2.40E-02 6.10E-04 7.75E-02 1.02E-03 1.54E-03 5.68E-05 2.79E-04 1.60E-03 4.29E-05 9.86E-03 6.80E-15 p-value FDR G144ED G144 G166 G179 CB541 CB660 8.87E-05 1.20E-13 3.19E-06 5.19E-10 7.52E-07 2.80E-05 2.49E-06 3.76E-06 2.45E-03 3.01E-04 1.55E-03 1.58E-03 1.01E-04 3.81E-03 2.61E-18 7.73E-04 8.38E-06 3.54E-03 1.51E-05 2.57E-05 5.28E-07 3.26E-06 2.68E-05 3.70E-07 2.45E-04 4.58E-19 ) FC ( 2 log -5.65 Inf 7.34 7.02 Inf 3.82 Inf Inf 2.57 Inf 4.69 -2.70 -3.29 4.56 Inf 4.02 -3.78 4.40 Inf -8.15 -4.77 Inf 5.04 -Inf 6.38 8.81 ENSG00000182492 ENSG00000023445 ENSG00000116985 ENSG00000130303 ENSG00000174808 ENSG00000148655 ENSG00000148735 ENSG00000140025 ENSG00000203706 ENSG00000142698 ENSG00000188674 ENSG00000181744 ENSG00000134986 ENSG00000164463 ENSG00000204542 ENSG00000232956 ENSG00000165152 ENSG00000106733 ENSG00000100314 ENSG00000151067 ENSG00000142408 ENSG00000058404 ENSG00000137752,ENSG00000204397 ENSG00000180347 ENSG00000135127 ENSG00000108691 633 330 656 684 685 83938 79949 90141 574036 84970 389073 205428 9315 153222 29113 285958 84302 54981 164633 775 59283 816 
114769,440068,834 223075 92558 6347 Gene symbol(s) Entrez gene ID(s) Ensembl 56 gene ID(s) Differential expression results Normalised tag counts BIDBMP7BMPERBTBD11BTG1C10orf116 637C10orf90 655 168667C1S 121551C1orf187 ENSG00000015475C20orf103 694 10974 ENSG00000101144 ENSG00000164619C3 ENSG00000151136 118611C4orf32C5orf38 716 ENSG00000148671 374946 ENSG00000133639C6orf138 24141 ENSG00000154493C7orf16C8orf4 ENSG00000162490 718 132720 ENSG00000182326C9orf64 2.53 ENSG00000125869 153571CA12 -1.95 Inf 442213 -6.75CACNA1A 4.78E-03 4.86E-03 ENSG00000174749 10842 9.74E-02 2.97E-03CACNG7 ENSG00000125730 9.82E-02 103.7 7.12E-04 6.69E-02 ENSG00000186493 301.5 56892 6.23CALM1 5.34 2.27E-02 81.9 ENSG00000178729,ENSG00000244694 66.6 84267 0.0 InfCAPN6 145.7 4.44E-06 65.6 7.29E-08 ENSG00000106341 117.1CBLC 771 773 0.0 3.59E-04 9.85E-06 0.0 429.8 2.12E-04 3.07 0.0 0.0CCDC48 ENSG00000176907 223.8 3.25 59284 3.3 8.75E-03 42.6 ENSG00000165118 1.3 5.00CCKBR 53.9 6.6 302.7 -Inf 0.0 3.73E-03 801 197.9 28.3 4.97E-04 0.0 8.08E-02 228.7 106.7 3.62E-03 ENSG00000074410 ENSG00000141837 0.0 827 1.75E-02 152.5 185.3 220.6 5.37 24.6 7.73E-04 7.87E-02 10.2 ENSG00000105605 Inf 45.1 9.4 8.3 2.40E-02 8.2 23624 -5.52 0.0 112.4 0.0 79825 1.4 6.36E-04 27.1 44.3 0.0 ENSG00000198668 1.4 0.0 2.11E-02 2.15E-12 7.9 887 9.66E-04 8.6 25.1 0.0 7.99E-10 0.0 ENSG00000077274 -Inf 2.88E-02 0.0 142.9 2.0 0.0 2.7 0.0 ENSG00000142273 440.9 4.75 0.0 ENSG00000114654 0.0 0.0 0.0 1.64E-03 15.2 89.5 7.88 0.0 1.5 4.35E-02 0.0 20.2 ENSG00000110148 2.23E-08 125.8 0.0 19.1 0.0 3.31E-06 24.4 1.40E-12 0.0 0.0 -4.65 7.13 40.5 12.6 5.34E-10 585.6 2.3 -2.58 0.0 114.7 0.5 0.0 13.0 9.71E-07 1.84E-14 20.3 0.0 9.69E-05 1.10E-11 0.0 156.8 4.27E-03 -2.30 54.6 45.9 1.7 1.3 613.3 8.99E-02 0.0 164.1 -Inf 0.0 89.2 18.2 2647.7 3.99E-03 28.1 752.5 Inf 4.7 8.49E-02 27.4 -4.17 2.3 41.8 5.42E-07 2.9 2159.3 22.6 5.75E-05 53.3 22.1 1.79E-03 12.6 2745.1 4.37 1.39E-03 48.4 30.3 0.0 0.0 4.67E-02 1252.5 3.81E-02 824.4 2.5 0.0 945.6 4.1 0.0 1635.9 
3.25E-03 0.0 7.22E-02 415.1 149.8 14162.8 0.0 9.0 5.7 50.3 2936.0 0.0 78.7 76.2 187.2 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0.0 32.7 199.0 6.1 0.0 79.7 0.0 BEX5BGN BIRC3 BMP8B BST2 BTC 340542C10orf11 C10orf81 C14orf143 ENSG00000184515C1orf133 C1orf94 C2orf80 C3orf58 C5orf13 C5orf41 C6orf15 4.39C7orf40 C9orf125 1.64E-04C9orf95 7.07E-03CABP7 0.0CACNA1C 0.0CACNG8 CAMK2B 159.3 65.7CARD17,CASP1,CARD16 CCDC129 0.0CCDC64 6.8 CCL2

236 A.1 Differential Expression Appendix 0.0 0.0 0.0 1.0 6.5 9.1 16.4 0.0 1.5 67.2 1.0 10.4 71.5 0.0 0.0 11.1 24.5 0.0 4846.9 2365.8 1335.1 1.1 3.3 173.2 8.4 665.5 0.0 0.0 0.0 0.0 11.8 8.4 0.0 0.0 0.0 83.7 1.5 24.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 6989.8 0.0 156.7 989.8 42.3 19.2 136.2 22.1 0.0 71.7 180.8 1427.2 101.6 0.0 61.5 826.0 56.6 0.0 0.0 27.9 9.0 228.5 19.4 81.1 200.5 33.9 1177.6 108.2 10.7 51.6 0.0 4.1 6.9 143.8 1.3 101.3 107.8 93.4 116.2 0.0 46.3 258.6 20.7 0.0 0.0 57.0 55.8 53.9 217.8 10.5 642.9 0.0 227.1 0.0 0.0 119.2 0.0 0.6 0.0 17.2 81.5 0.0 2.4 2107.8 70.3 86.4 48.4 242.3 26.9 0.0 0.0 15.6 0.0 27.9 10.9 0.0 3.0 6.5 79.2 18.7 0.0 267.1 0.0 0.0 0.0 2.7 33.8 0.0 4.1 3942.1 78.3 143.8 23.9 345.8 34.5 0.0 0.0 25.7 0.0 8.1 6.8 0.0 95.9 7.7 249.9 13.1 0.0 225.2 0.0 0.0 1.16E-03 1.88E-04 4.58E-02 1.24E-03 4.12E-02 7.87E-13 1.77E-02 4.05E-02 2.90E-03 5.90E-02 9.75E-02 9.64E-02 1.36E-02 9.79E-03 9.18E-02 3.13E-03 9.83E-02 1.77E-02 9.64E-02 3.25E-07 1.69E-02 9.74E-03 1.61E-02 8.49E-02 2.82E-02 1.23E-07 p-value FDR G144ED G144 G166 G179 CB541 CB660 1.80E-05 2.09E-06 1.75E-03 1.95E-05 1.53E-03 6.35E-16 5.05E-04 1.50E-03 5.61E-05 2.50E-03 4.80E-03 4.70E-03 3.58E-04 2.43E-04 4.40E-03 6.11E-05 4.87E-03 5.11E-04 4.70E-03 1.64E-09 4.73E-04 2.39E-04 4.42E-04 3.99E-03 9.38E-04 5.80E-10 ) FC ( 2 log Inf Inf Inf 6.82 3.41 7.11 3.56 Inf 6.68 2.55 4.77 -Inf -Inf Inf Inf 4.21 2.76 Inf -3.11 -6.48 -3.07 6.38 -4.49 -1.99 -Inf -7.77 ENSG00000108688 ENSG00000152669 ENSG00000163823 ENSG00000196352 ENSG00000129226 ENSG00000019582 ENSG00000123146 ENSG00000071991 ENSG00000148600 ENSG00000123080 ENSG00000138028 ENSG00000154645 ENSG00000138615 ENSG00000134873 ENSG00000165215 ENSG00000134326 ENSG00000170293 ENSG00000018236 ENSG00000164692 ENSG00000168542 ENSG00000144810 ENSG00000145920 ENSG00000124772 ENSG00000157613 ENSG00000100122 ENSG00000103316 6354 10309 1230 1604 968 972 976 28513 92211 1031 10669 140578 8483 9071 1365 129607 152189 1272 1278 1281 1295 10814 57699 90993 1414 1428 
Gene symbol(s) Entrez gene ID(s) Ensembl 56 gene ID(s) Differential expression results Normalised tag counts CCND2CCNYCD248CD58CD70 894CD9 219771CDH13 57124CDH6CDKN2A 965 ENSG00000118971 ENSG00000108100CEBPB 970CHCHD10 ENSG00000174807 928 1012CHRDL1 ENSG00000116815CITED4 1004 1029 ENSG00000125726CLDN11 1051 ENSG00000010278CMAH ENSG00000140945 400916CMTM5 ENSG00000113361 91851 ENSG00000147889 -3.98CNKSR2 -2.29 163732CNTN6 ENSG00000172216 ENSG00000241579,ENSG00000242131 9.23E-07 -Inf 5010 3.93E-03 9.27E-05COL21A1 8.44E-02 ENSG00000101938 158.0 8418COL4A6 3.26 265.6 1.69E-03 ENSG00000179862 116173 125.2 4.45E-02CPAMD8 Inf 136.3 22866 2.4 2.05E-03 ENSG00000013297 0.6CPNE2 3.70 4.87 5.09E-02 189.6 Inf 27255CRB2 1.78E-16 40.4 0.0 ENSG00000168405 0.0 81578 199.7 ENSG00000166091 2.78E-13 -3.05 1.31E-05CRIP2 1.74E-08 7.23 0.0 1270.9 2.65E-11 36.8 1288 9.02E-04 ENSG00000149970 827.8 2.74E-06 0.0CRYBB2 8.17E-09 844.2 448.1 1058.3 2.29E-03 27151 3.29 496.4 120.4 ENSG00000134115 847.0 1.35E-14 0.0 0.0 5.53E-02 1082.4 713.0 ENSG00000124749 8.49E-12 165.3 74.7 221184 494.6 3.00 1.33E-04 396.5 815.1 4.6 1301.8 523.9 17.5 ENSG00000197565 5.94E-03 6.87 286204 189.3 114.4 22.5 4.8 457.9 848.2 ENSG00000160111 106.7 2.10E-03 45.7 1397 5.0 4.8 0.0 5.73 3.1 73.0 28.4 5.16E-02 1415 1206.3 ENSG00000140848 1.58E-05 155.6 1410.3 0.0 1.05E-03 32.9 0.0 1.0 0.0 5.82 58.8 552.5 ENSG00000148204 5.35 0.0 7.86E-11 1413.1 3.0 270.9 2.05E-08 0.7 4.52 139.1 0.0 ENSG00000182809 2.97E-03 2597.6 0.0 1.50E-03 36.7 ENSG00000244752 6.69E-02 6.6 4.3 42.9 -Inf 4.07E-02 1809.3 0.0 1.66E-04 4.23 30.2 70.1 176.1 60.7 7.15E-03 75.0 2.7 129.7 6.87E-06 0.0 948.0 -3.00 116.5 8.43E-07 46.4 5.15E-04 -Inf 8.58E-05 154.6 0.0 0.0 0.0 0.0 3.1 2.27E-03 0.0 -2.19 6.8 5.50E-02 1.24E-03 6.6 35.3 1.0 83.2 0.0 0.0 3.53E-02 -4.11 0.0 108.9 2.61E-03 1.4 0.0 0.0 6.09E-02 0.0 -2.56 3.8 21.9 2.21E-03 487.9 6.7 Inf 5.38E-02 2.0 0.0 1.0 0.0 45.6 4.4 1.91E-03 433.9 8.1 880.4 4.86E-02 0.0 1.85E-04 8.2 78.3 77.3 0.0 362.8 7.86E-03 3.7 
138.2 0.0 1.4 46.6 232.0 95.2 31.3 0.6 1114.3 170.8 32.8 16.9 0.0 863.3 7.5 45.8 16.2 106.5 64.3 277.6 0.0 63.5 345.0 0.0 0.0 CCL26CCL7 CCNO CCR1 CD55 CD68 10344CD74 CD97 CDH19 ENSG00000006606CDHR1 CDKN2C CGREF1 CHODL CILP CLDN10 CLDN3 CMPK2 InfCMTM8 8.09E-11CNTN1 2.07E-08COL1A2 0.0COL3A1 0.0COL8A1 CPLX2 355.9CPNE5 78.5CREB3L1 0.0CRYBB1 0.0 CRYM

237 A.1 Differential Expression Appendix 49.8 133.2 24.7 0.0 5.1 2.0 529.0 401.7 47.3 0.0 0.0 0.0 4813.8 39.3 0.0 312.5 16.9 0.0 22.4 104.6 75.1 15.1 6.1 167.8 2.1 63.5 0.0 10.3 0.0 0.0 0.0 0.0 263.6 0.0 26.9 0.0 0.0 0.0 8865.3 4.6 2.3 128.8 10.4 0.0 15.5 2.4 45.6 11.4 21.9 1134.0 6.9 2.3 0.0 1499.9 215.3 446.4 529.8 19.6 5664.5 0.0 250.5 88.4 27.6 64.9 847.4 257.8 16.3 0.0 18.5 24.4 157.1 613.9 2.5 128.0 124.8 89.1 7.4 487.4 0.0 311.7 62.2 17.3 5.6 72.7 612.9 0.0 748.9 29.5 33.9 46.5 9.3 2.2 86.9 3.2 181.5 78.6 166.2 551.2 4.5 0.6 1.9 7.2 138.8 0.0 0.0 731.4 0.0 0.0 0.0 10.9 145.2 0.0 1345.3 17.1 57.8 0.0 523.2 652.2 142.9 7.8 410.4 48.6 53.1 3.1 0.0 731.1 272.5 207.2 161.5 268.9 0.0 937.2 2.7 0.0 0.0 1.4 145.0 0.0 933.6 0.0 31.0 0.0 90.5 513.1 129.0 6.8 130.7 29.2 53.7 4.1 1.4 642.9 273.5 176.2 137.2 243.2 4.84E-02 1.03E-03 7.22E-02 1.44E-08 6.46E-07 5.57E-02 7.77E-02 3.65E-07 4.59E-05 1.65E-03 5.09E-03 5.31E-03 1.75E-03 1.48E-03 6.64E-04 4.59E-05 2.48E-03 8.39E-04 8.23E-02 9.29E-03 5.61E-02 3.67E-04 6.45E-02 8.54E-03 4.20E-03 1.72E-02 p-value FDR G144ED G144 G166 G179 CB541 CB660 1.89E-03 1.55E-05 3.24E-03 5.32E-11 3.56E-09 2.32E-03 3.57E-03 1.87E-09 4.13E-07 2.79E-05 1.10E-04 1.16E-04 3.03E-05 2.43E-05 9.15E-06 4.13E-07 4.66E-05 1.20E-05 3.81E-03 2.27E-04 2.34E-03 4.57E-06 2.81E-03 2.04E-04 8.68E-05 4.86E-04 ) FC ( 2 log -Inf 3.55 2.86 Inf 6.13 5.08 2.43 -Inf 4.40 Inf Inf Inf -3.89 3.79 6.17 -5.92 3.87 Inf 2.74 2.85 -4.73 4.43 3.24 -2.68 4.53 2.93 ENSG00000163599 ENSG00000109861 ENSG00000107562 ENSG00000081041 ENSG00000124875 ENSG00000185753 ENSG00000057019 ENSG00000011465 ENSG00000175197 ENSG00000203797 ENSG00000147202 ENSG00000107984 ENSG00000187957 ENSG00000135905 ENSG00000173852 ENSG00000110042 ENSG00000138166 ENSG00000121310 ENSG00000101210 ENSG00000115380 ENSG00000184349 ENSG00000155849 ENSG00000110675 ENSG00000197977 ENSG00000125746 ENSG00000082397 1493 1075 6387 2920 6372 159013 131566 1634 1649 8528 1730 22943 92737 55619 23333 23220 1847 55268 1917 
2202 1946 9844 55531 54898 24139 23136 Gene symbol(s) Entrez gene ID(s) Ensembl 56 gene ID(s) Differential expression results Normalised tag counts CTNNA2CXCL1CXCL14CXCL3CXXC4 1496CYB5R2 2919DCHS1 9547DDAH2 ENSG00000066032 2921DDIT4L 80319DHRS3 ENSG00000163739 51700 ENSG00000145824DKFZp434H1419,AC012513.4 8642 150967DLK1 ENSG00000163734 23564 ENSG00000168772DNM3 115265 ENSG00000166394DOCK5 ENSG00000226052 9249DRD2 ENSG00000166341 ENSG00000213722DUSP16 6.39 ENSG00000145358DYNC1I1 8788 7.23 26052 2.30E-06EDA2R ENSG00000162496 2.59 2.05E-04 80005EEF1D 59.1 2.34E-13 7.86 9.96E-11EFHD2 1813 7.14E-04 -5.50 ENSG00000185559 0.0 80824 95.6 2.27E-02EGFR ENSG00000197959 6.33 1.87E-16 1780 0.0 ENSG00000147459 1.94E-05 2.78E-13 0.0ELMO2 0.0 -2.12 1.24E-03 0.0 -3.37 60401 2.99E-04ELN 0.0 159.1 18.1 ENSG00000149295 -3.46 1.17E-02 1.2 ENSG00000111266 1936 4.68E-03ELTD1 -6.41 1.38E-03 9.5 0.0 0.0 9.64E-02 7.0 5.8 ENSG00000158560 79180 3.80E-02 1.51E-03 912.1EPAS1 34.6 91.8 4.09E-02 1.15E-03 5.20 29.7 2.2 2853.2 15.6 ENSG00000131080 1956 0.0 0.0 128.6 3.31E-02 71.8 0.0 63916 2452.6 1.4 21.1 91.0 ENSG00000104529 2.00E-09 0.0 23.3 3.8 0.0 0.0 -7.95 ENSG00000142634 3.87E-07 313.6 16.5 3.5 2006 0.0 4.45 44.5 49.4 64123 78.3 28.0 7.0 ENSG00000146648 -4.03 0.0 2.69E-11 10.7 2.7 ENSG00000062598 2034 1.3 20.3 8.17E-09 110.7 4.19E-05 239.5 73.8 0.0 8.58 1.0 3.76E-04 2.28E-03 37.0 4.16 0.0 592.3 50.3 1.42E-02 115.8 ENSG00000049540 ENSG00000162618 176.8 173.7 0.0 3.40 6.3 510.7 2.39E-10 51.5 128.5 3.36E-03 ENSG00000116016 5.55E-08 0.0 -4.02 7.40E-02 0.0 0.0 0.0 4.6 1.83E-03 18.9 62.7 4.73E-02 5.12 14.1 6.46E-04 0.0 233.7 14.9 0.0 2.54 0.0 2.12E-02 95.2 6.1 6.8 3.08E-05 21.1 11.1 60.9 2.94 40.7 20.1 1.77E-03 2.77E-03 -4.56 669.4 5.0 6.38E-02 191.6 1017.5 9.1 39.5 0.0 414.9 275.7 4.40E-03 175.8 40.5 1.78E-04 167.7 1.1 4.26 9.18E-02 4.8 -Inf 0.0 7.62E-03 251.5 49.9 0.0 6.9 5.5 0.0 0.0 -2.84 676.0 2.36E-06 5.2 87.7 1.52E-07 132.3 2.09E-04 9.9 84.9 0.0 1.92E-05 134.7 915.0 8.37E-05 90.2 0.0 0.0 
75.7 4.09E-03 20.7 11.4 1024.5 48.6 184.1 4.9 40.3 0.0 2.9 3.3 2.3 372.0 136.1 0.0 51.3 224.0 21.3 0.0 0.0 68.8 4.7 40.7 2186.6 184.9 166.7 5.0 CSGALNACT1CTLA4 CTSC CXCL12 55790CXCL2 CXCL6 CXorf38 DCBLD2 ENSG00000147408DCN DDIT3 DDO DIAPH2 DKK1 DNER DOCK10 7.68DPY19L1 DTX4 2.19E-07 2.65E-05DUSP5 359.0ECHDC2 256.3EEF1A2 0.6EFEMP1 EFNA5 54.1ELMO1 0.0ELMOD1 1.3 ELOVL2 EML2 EPB41L3

238 A.1 Differential Expression Appendix 469.4 0.0 52.4 10.9 102.7 1.0 12.4 0.0 2.3 28.1 23.3 2.7 389.9 14.1 0.0 196.7 2.0 6.6 175.8 168.6 0.0 0.0 143.1 0.0 3.8 0.0 902.5 0.0 0.0 2.3 82.2 0.0 3.8 0.0 11.3 48.0 27.9 0.0 12.2 17.5 0.0 29.7 0.0 0.0 254.8 95.8 0.0 0.0 2569.7 0.0 0.0 0.0 14.6 86.1 202.8 66.2 276.1 78.7 165.8 0.0 13.8 783.8 439.3 44.3 8.2 1235.4 0.0 0.0 84.3 95.0 27.8 2352.7 0.0 705.9 0.0 0.0 5.7 3.6 14.4 1.4 235.9 84.6 374.2 39.0 47.6 5.5 1.3 90.7 155.0 5.4 16.3 572.3 6.1 0.0 0.6 47.7 4.4 609.6 0.0 6.9 5.6 0.0 121.1 0.0 268.5 39.4 124.6 247.2 1338.9 1073.7 34.8 82.5 580.3 91.7 393.1 110.8 0.0 88.9 1514.3 0.0 939.1 0.0 3.1 999.8 180.1 1205.4 0.0 126.9 32.5 288.3 204.6 36.2 169.8 181.6 1061.8 886.3 6.0 93.6 602.4 36.5 513.2 74.7 2.7 30.4 2134.8 0.0 1699.4 0.0 4.9 666.0 110.9 1650.8 0.0 87.7 8.1 294.5 5.69E-03 2.35E-03 2.27E-02 1.78E-03 3.33E-02 8.49E-12 5.01E-02 3.11E-02 5.13E-04 1.58E-02 2.18E-03 1.59E-02 8.65E-03 2.71E-08 2.61E-12 1.86E-05 3.72E-11 8.94E-02 3.60E-03 4.63E-03 5.08E-04 2.06E-13 1.21E-13 4.15E-03 1.78E-02 9.52E-06 p-value FDR G144ED G144 G166 G179 CB541 CB660 1.26E-04 4.35E-05 7.06E-04 3.11E-05 1.16E-03 1.37E-14 2.00E-03 1.05E-03 6.80E-06 4.25E-04 3.96E-05 4.32E-04 2.08E-04 1.07E-10 2.46E-15 1.45E-07 6.75E-14 4.24E-03 7.16E-05 9.74E-05 6.70E-06 1.11E-16 5.68E-17 8.54E-05 5.17E-04 6.98E-08 ) FC ( 2 log -2.78 Inf 2.83 4.30 2.84 9.61 3.37 Inf 4.89 3.08 3.68 5.14 -4.63 5.32 Inf -Inf 8.40 3.74 -4.19 3.32 Inf Inf -9.50 Inf 4.71 Inf ENSG00000182580 ENSG00000196482 ENSG00000157557 ENSG00000131187 ENSG00000122591 ENSG00000154153 ENSG00000115363 ENSG00000047662 ENSG00000188916 ENSG00000144802,ENSG00000144815 ENSG00000154511 ENSG00000162981 ENSG00000138829 ENSG00000156804 ENSG00000132185 ENSG00000153266 ENSG00000137441 ENSG00000184922 ENSG00000129654 ENSG00000033170 ENSG00000089356 ENSG00000221946 ENSG00000136928 ENSG00000147402 ENSG00000136542 ENSG00000007237 2049 2104 2114 2161 84668 54463 84141 27146 642938 64332,91775 388650 151354,653602 2201 
114907 84824 55079 83888 752 2302 2530 5349 53822 9568 55879 11227 8522 Gene symbol(s) Entrez gene ID(s) Ensembl 56 gene ID(s) Differential expression results Normalised tag counts ERBB4ETS1EVC2F2RL1FAM129A 2066FAM150BFAM181A 2113FAM189A1 132884 2150FAM38B2,FAM38B,C18orf58 ENSG00000178568 116496 63895FAM5B 285016 ENSG00000134954 ENSG00000173040FAM70A 90050FBLN2 23359 ENSG00000135842 ENSG00000164251FBXO27 ENSG00000154864,ENSG00000168738,ENSG00000175388 ENSG00000189292 -InfFCGR2B,FCGR2C,FCGR2A ENSG00000140067 57795 2212,2213,9103FERMT3 ENSG00000104059 1.16E-13 55026FGF19 5.48E-11 ENSG00000072694,ENSG00000143226,ENSG00000244682 -4.00 0.0 2199FGFR1 6.26 126433 ENSG00000198797 5.23FOXG1 8.10E-05 0.0 Inf ENSG00000125355 1.33E-05 3.97E-03FOXQ1 9.12E-04 83706 8.1 1.13E-05 2.81 4.11 0.0FXYD1 338.7 ENSG00000163520 7.98E-04 2.87E-03 ENSG00000161243 -5.14 9965 121.4FXYD5 6.55E-02 29.7 0.0 219.8 1.72E-03 5.83E-04 8.1 2260 -InfFZD3 4.52E-02 1.95E-02 201.8 3.09E-04 0.7 13.2 ENSG00000149781 778.6 -Inf 43.9 0.0 2290 1.19E-02GABRA5 19.4 3.1 0.0 0.0 5.2 166.8 2.48E-03 94234GAL 71.2 ENSG00000162344 92.2 5.86E-02 0.0 3.09E-05 5348 39.5 0.0GALR1 4.1 59.4 ENSG00000077782 1.77E-03 4.7 2.65 250.9 1.3 53827 0.0 74.7 28.7 ENSG00000176165 6.01 489.4 314.9 2.2 0.0 0.6 ENSG00000164379 129.5 2.97E-03 4.0 7976 0.0 64.0 0.0 2558 -Inf 6.69E-02 1.99E-06 0.8 0.0 5.09 ENSG00000221857 0.0 13.7 1.84E-04 0.0 13.1 ENSG00000089327 0.0 202.7 51083 0.0 7.1 55.4 9.47E-10 1.64E-05 34.4 2587 4.18 1.95E-07 0.0 1.07E-03 114.1 ENSG00000104290 69.6 0.0 ENSG00000186297 0.8 0.0 0.6 15.7 78.2 -Inf 2.53E-03 356.7 45.6 0.0 ENSG00000069482 5.95E-02 163.8 -1.90 0.0 18.2 2.5 ENSG00000166573 0.0 3.78E-07 0.0 3.30 0.0 64.5 4.32E-05 4.51E-03 -Inf 5.2 26.0 41.3 9.34E-02 3.5 0.0 195.6 1.57E-04 1177.0 7.79 6.83E-03 4.65E-04 13.9 0.0 0.0 3.06 1288.2 232.1 445.0 1.66E-02 119.9 8.28E-12 147.2 0.0 117.5 4.9 0.0 505.5 -2.63 2.86E-09 3.46E-04 311.0 -6.00 0.0 269.6 1.32E-02 104.5 0.0 0.0 3067.1 45.6 2.29E-03 137.4 5.0 1.45E-03 673.5 
Inf 5.53E-02 1294.6 0.0 3.99E-02 0.0 Inf 0.0 98.9 1.6 0.0 8.1 1.01E-03 0.0 208.3 0.0 57.5 50.8 489.3 3.00E-02 7.97E-04 1.7 0.0 0.0 2.45E-02 49.7 11.2 0.0 86.2 0.0 9.1 52.7 1.9 0.0 2.0 96.5 0.0 240.7 37.7 0.0 0.0 262.9 0.9 82.6 0.0 65.5 0.0 0.0 0.0 0.0 EPDR1EPHB3 ESRRG ETS2 F12 FAM126A 54749FAM134B FAM176A FAM184B ENSG00000086289FAM196A,C10orf141 FAM55C,NFKBIZ FAM69A FAM84A,LOC653602 FBN2 FBXO32 FCRLA 3.09FEZF2 FGFBP2 1.22E-04FMNL1 5.55E-03 310.9FOXJ1 FUT8 269.0FXYD3 739.5FXYD7 319.0GABBR2 20.3GABRQ 82.3 GALNT5 GAS7

239 A.1 Differential Expression Appendix 4.0 0.0 0.0 108.5 7.2 1096.6 44.0 73.6 908.6 23.2 183.8 292.7 11.1 55.4 0.0 0.0 36.3 0.0 15.1 2.4 19.2 2.0 2.0 0.0 0.0 41.3 0.0 0.0 133.1 0.0 475.4 0.0 25.8 4802.2 82.2 76.9 1780.2 0.0 223.4 0.0 0.0 15.2 0.0 2.4 0.0 57.4 0.0 0.0 0.0 0.0 654.3 82.2 46.2 4.2 94.7 4880.1 448.1 0.0 1039.6 224.6 0.0 73.4 0.0 0.8 2.9 114.3 0.0 7.4 12.3 280.8 270.5 0.0 0.0 273.5 33.1 468.3 15.3 15.0 0.0 0.0 7220.3 18.4 0.0 345.3 127.8 4.4 2.7 0.0 3.8 1.0 180.4 0.0 13.7 0.6 8.1 138.4 0.0 28.9 19.5 2.5 4.7 0.0 1.6 37.2 336.4 886.6 0.0 9.5 316.9 16473.1 20.0 68.7 631.6 51.3 84.1 0.0 0.0 196.7 359.8 0.0 5434.9 995.1 173.2 631.7 33.6 0.0 7.5 5.5 32.7 468.5 899.6 1.4 39.2 393.2 7688.5 30.5 33.1 580.1 52.6 75.0 0.0 0.0 383.9 1007.2 0.0 9744.0 2885.9 94.6 479.4 22.7 5.18E-04 1.35E-02 9.74E-02 2.27E-02 1.37E-04 9.64E-02 3.42E-02 6.15E-02 7.64E-02 5.42E-09 9.74E-03 4.24E-05 4.35E-05 1.93E-02 3.22E-02 1.49E-06 3.56E-02 9.08E-05 1.61E-02 5.75E-05 6.58E-07 6.04E-11 2.24E-03 4.42E-11 7.17E-02 p-value FDR G144ED G144 G166 G179 CB541 CB660 6.94E-06 3.54E-04 4.77E-03 7.12E-04 1.43E-06 4.70E-03 1.20E-03 2.64E-03 3.49E-03 1.68E-11 2.40E-04 3.62E-07 3.84E-07 5.71E-04 1.11E-03 8.81E-09 1.27E-03 8.98E-07 4.38E-04 5.45E-07 3.67E-09 1.38E-13 4.09E-05 8.63E-14 3.20E-03 ) FC ( 2 log 4.06 Inf Inf -3.12 5.34 2.45 2.80 -3.99 -2.33 6.74 -3.99 -4.42 5.24 -2.89 Inf Inf -Inf Inf 3.83 6.56 5.68 8.36 6.06 Inf Inf ENSG00000162645 ENSG00000162654 ENSG00000187210 ENSG00000130055 ENSG00000131095 ENSG00000137563 ENSG00000165474 ENSG00000156689 ENSG00000127920 ENSG00000136235 ENSG00000164199 ENSG00000155511 ENSG00000163873 ENSG00000056998 ENSG00000164107 ENSG00000206337 ENSG00000124440 ENSG00000204257 ENSG00000223865 ENSG00000237541 ENSG00000204287 ENSG00000198502 ENSG00000152413 ENSG00000153807 ENSG00000106004 3123,3125,3126 ENSG00000196126 Inf 2.04E-10 4.89E-08 1105.3 381.8 3.1 87.8 0.0 0.0 2634 115361 2650 54857 2670 8836 2706 219970 2791 10457 84059 2890 2899 8908 9464 10866 64344 
3108 3115 3118 3122 3127 9456 3206 3202 Gene symbol(s) Entrez gene ID(s) Ensembl 56 gene ID(s) Differential expression results Normalised tag counts GBP3GBP5GDF15GEMGFPT2GJA1 2635GJC3 115362 9518GMPRGPC3 2669 ENSG00000117226 9945 ENSG00000154451GPR158GRB14 2697 ENSG00000130513GRIA3 349149 ENSG00000164949 2766GRM3 ENSG00000131459H1F0 2719 ENSG00000152661 57512HAPLN1 ENSG00000176402 2888HEPACAM ENSG00000137198 3.47 2892HLA-A ENSG00000147257 Inf ENSG00000151025 2913HLA-DPA1 1.53E-04 4.37HLA-DQA1 ENSG00000115290 3005 6.69E-03 3.88E-09 1404 0.0HLA-DQB1 4.64 6.87E-07 220296 ENSG00000125675 3.15E-07 2.25 0.0HLA-DRB3,HLA-DRB1,HLA- 3.72E-05 ENSG00000198822DRB4 0.0 288.9 1.68E-06 3105 -3.49 3113 4.84E-03 0.0 1.57E-04 ENSG00000189060HMGA2 Inf ENSG00000145681 1402.4 ENSG00000165478 9.82E-02 49.3 57.7 3117 1.89E-04HOXA1 24.3 3.72 1358.2 52.6 720.9 3119 8.00E-03 108.3 61.0 1.50E-03HOXA4 334.7 276.2 -5.79 15.6 37.5 4.05E-02 ENSG00000206503 -4.55 ENSG00000231389 2.86E-03 22.8HOXA7 37.7 31.1 0.0 130.2 6.54E-02 247.6 ENSG00000196735 9.1 1.24E-06 5.96 108.1 69.9 610.9 7.39E-08 1.19E-04 764.6 8091 243.2 86.2 ENSG00000179344 0.0 9.89E-06 7.17 0.0 15.7 140.1 116.4 103.8 3198 0.0 1.66E-03 0.0 -Inf 33.5 4.39E-02 6363.9 3.0 3201 117.4 0.0 7.45E-05 31.2 70.6 0.0 1206.7 3.59 3.71E-03 35.5 3204 6.40 0.0 Inf ENSG00000149948 4.70E-08 85.8 35.1 30.3 0.0 6.47E-06 4.5 ENSG00000105991 9.61E-04 53.3 0.0 0.0 4.55E-07 129.9 2.87E-02 4.6 3.91E-03 ENSG00000197576 0.0 3.04 4.98E-05 6.3 5.73 2311.0 93.7 8.39E-02 1.4 521.8 11.6 0.0 ENSG00000122592 60.7 82.3 354.9 Inf 3.02E-04 46.8 32.7 383.7 5.76E-11 1268.7 4.60 0.0 0.0 1.18E-02 65.6 1.53E-08 0.0 294.8 89.0 0.8 1.25E-03 1554.1 0.0 1.2 0.0 3.53E-02 3.26E-05 103.7 0.0 469.0 1033.9 198.2 0.0 1.84E-03 3.0 162.4 49.5 0.0 348.2 840.5 -5.68 89.6 0.0 66.0 535.7 243.3 Inf 347.0 0.0 13.5 3.13E-07 0.0 3.0 83.8 4.6 Inf 16.8 3.72E-05 0.0 0.0 0.0 2.21E-03 Inf 65.7 31.7 12.4 5.39E-02 4.31E-11 13.5 0.0 0.0 3.4 1.26E-08 5.79E-04 63.1 1.94E-02 10.9 0.0 0.0 7.4 51.7 100.0 
Table of differentially expressed genes (A.1 Differential Expression), continued. Columns: Gene symbol(s); Entrez gene ID(s); Ensembl 56 gene ID(s); differential expression results (log2(FC), p-value, FDR); normalised tag counts (G144ED, G144, G166, G179, CB541, CB660). Gene symbols listed, separated by semicolons (symbols sharing one row are joined by commas):

GBP1; GBP2; GBP4; GCNT1; GDPD2; GFAP; GGH; GJB2; GLYATL2; GNG11; GPNMB; GPR98; GRIA1; GRIK3; GYG2; HAND2; HCP5; HIF3A; HLA-DMA; HLA-DPB1; HLA-DQA2; HLA-DRA; HLA-DRB5; HOMER1; HOXA10; HOXA5

HOXB6; HOXB7; HOXB9; HOXC10; HOXC13; HOXD10; HOXD13; HOXD3; HOXD9; HPRT1; HPSE; HRCT1; HTATIP2; ICAM1; ID4; IFI27; IFI30; IFI6; IFITM2; IFITM8P; IGF2,INS,INS-IGF2,AC132217.2; IGFBP5; IGLON5; IGSF11; IGSF3; IKBKE; IL13RA2; IL17RD; IL1B; IL1R1; IL1RAPL1; IL33; IL4I1; IL6; IL8; INHBA; INHBB; INPP5D; INSIG1; IRAK1; IRX1; IRX2; ISYNA1; ITGA4; ITGBL1; ITIH5; ITM2A; JAM2; JPH1; KALRN; KANK1

KCNA2; KCNJ12; KCNK1; KCNK12; KCNMB2; KCTD12; KIAA1217; KLF6; L3MBTL4; LAMA2; LAMA4; LEMD1; LFNG; LGALS3; LHFPL3; LIF; LIFR; LINGO1; LMO2; LMO3; LMO4; LNX1; LOC283070,CAMK1D; LOXL4; LPAR6; LPHN2; LRAT; LRP4; LRRC2; LRRC55; LRRN2; LTF; LUM; LXN; LY96; LYST; MACROD2; MAF; MAL; MALL; MAN1C1; MAP3K5; MAP6; MAP7; MAPT; MARCH1; MARS; MARVELD3; MATN2; MBP; MEIS2; MEST

MET; MFAP2; MGC87042; MGLL; MIA; MICA; MICB; MID1IP1; MIPOL1; MKI67; MLC1; MLPH; MMP17; MMP7; MMRN1; MN1; MOCOS; MOSC2; MT1A; MT2A; MTTP; MX1; MXRA5; MYBL1; MYC; MYH3; MYL1; MYL9; MYO1B; NAMPT; NBL1; NCALD; NCAM1; NDE1; NDN; NEFM; NELL2; NFASC; NFATC2; NFE2L3; NFIA; NID1; NKX2-1; NKX6-2; NLGN4X; NMU; NNAT; NNMT; NOP16; NOTCH3; NOV; NPFFR2

NPTX2; NR0B1; NR1D1; NR4A2; NRBF2; NRG1; NRN1; NRP2; NTN1; NTRK2; NXPH1; OAS1; OAS2; OAS3; OASL; ODZ2; OGN; OLFM1; OTOR; OTX2; OXTR; P2RX7; PARP12; PARP3; PAX8; PCDH20; PCDHB12; PCDHB3; PCDHB4; PCOLCE2; PDE1C; PDE4B; PDGFA; PDGFRA; PDGFRB; PDPN; PDZRN3; PEA15; PERP; PHACTR3; PHLPP1; PI15; PIGA; PION; PITPNC1; PITX1; PKNOX2; PLA2G2A; PLA2G4A; PLAC9; PLCB1; PLCH1

PLD3; PLEKHA6; PLS1; PLS3; PLSCR1; PMEPA1; PMP2; PNMA2; PNP; PPAP2C; PPCS; PPEF1; PPHLN1; PPL; PPM1K; PPP4R1; PRIMA1; PROCR; PRSS12; PSORS1C1; PSPH; PTEN; PTGS1; PTPRD; PTPRH; PTPRR; PXDN; PYGL; RAB11FIP1; RAB38; RAB6B; RAB7L1; RAD50; RADIL; RARRES1; RARRES3; RASGEF1C; RASGRP3; REC8; RGAG4; RGL3; RGS17; RGS20; RGS5; RHBDF2; RHOBTB3; RHPN1; RIPK4; RNASEH1; RNF175; RP11-473I1.1; RP11-93B14.2,hCG_2018279

RP5-955M13.1,KCNG1; RP9; RPSAP52; RRAD; RRP7A; RRS1; RTN1; RTP4; S100A6; S100B; S1PR3; SALL1; SALL2; SCN11A; SCN1B; SCN5A; SDC2; SDK2; SELENBP1; SEMA3D; SEMA4G; SEMA6A; SERPINE2; SEZ6L; SFRP1; SFTA3; SGCD; SH3BGR; SHC4; SHISA2; SHOX2; SHROOM3; SIX3; SIX6; SKAP2; SLC15A2; SLC16A3; SLC26A2; SLC2A5; SLC38A1; SLC38A5; SLC46A1; SLC4A11; SLC4A4; SLCO1C1; SLCO2A1; SLITRK5; SMAGP; SMOC2; SNAP25; SNCAIP; SNHG12

SNX10; SNX22; SOD2; SOD3; SORCS2; SORCS3; SORL1; SOX10; SOX3; SPAG4; SPARCL1; SPINK2; SPINT1; SPP1; SPRED1; SPTBN5; SQRDL; SRGAP3; SRP9; SRPX; ST6GAL1; ST6GALNAC3; ST6GALNAC5; STAMBPL1; STC2; STEAP1; STEAP2; STRA6; STX3; STXBP5L; SULF1; SULF2; SYNGR1; SYNM; SYT1; SYTL4; SYTL5; TAGLN; TAPBPL; TBC1D8; TCEAL2; TCF7; TCF7L2; TECRL; TES; TESC; TF; TFAP2A; TFCP2; TGFA; TGM2; THBS2

THY1; TM4SF1; TMCO4; TMEM100; TMEM132D; TMEM158; TMEM176B; TMEM200B; TMEM38A; TMEM71; TMSB15A; TMSL1; TNC; TNFAIP2; TNFAIP3; TNFAIP6; TNFRSF14; TNFSF4; TNNC2; TNNI1; TNNI2; TOX; TPM1; TPMT; TRAF1; TRAM1L1; TRIB2; TRIB3; TRIM14; TRIM47; TRIM48; TRPM8; TSHZ3; TSLP; TSPAN13; TSPAN7; TSPAN9; TSTD1; TTF2; TTN; TTYH1; TUBA4A; TUBB4; TUSC3; UGT8; UNC80; VAX1; VCAM1; VIPR1; VIPR2; VIT; VSNL1

WBSCR17; XXbac-BPG116M5.1,C2,CFB; XYLT1; ZEB1; ZIC2; ZIC3; ZIC4; ZIC5; ZNF281; ZNF423; ZNF454; ZNF536; ZNF710; ZNF714; ZNF747


A.2 Classified Differential Expression

The genes called differentially expressed by the Bioconductor R package DESeq at an FDR below 10% are classified below into one of four categories, based on the results of the literature mining analysis:

· An extensive body of literature implicates the gene in glioblastoma (first column);

· A limited body of literature implicates the gene in glioblastoma (second column);

· The gene is not known to be implicated in glioblastoma (third column), but is implicated in other cancers, which are listed by type (fourth column);

· The gene is not known to be implicated in any type of cancer (fifth column).

250 A.2 Classified Differential Expression Appendix AC012354.2 AC034102.1 AC068399.1 ACTA2 AFAP1L2 AIDA ATP1A2 B4GALNT1 BMP8B C10orf11 C10orf81 C14orf143 C1orf187 C20orf103 C2orf80 C4orf32 C5orf41 C7orf16 C9orf125 C9orf95 CACNA1A CACNG7 CAMK1D CCDC129 CCDC64 CDHR1 non-small lung carcinoma cervical, endometrial ovarian cancer breast, BCLL gastric breast, pancreatic breast ovarian blood HeLa head, neck breast gastric cancer prostate,pancreas,carcinoma HL60 leukemia cells nasopharyngeal epithelial cell line prostate esophageal cancer gastric,lung neuroblastoma osteosarcoma myeloma ovarian, pancreatic, lung non small cell lung cancer, colon small cell lung breast ADAMTS1 AKR1B10 ANGPTL2 ARHGAP20 ARHGEF7 AZGP1 BEX5 BTG1 C6orf15 CAPN6 CDH19 CILP CLDN11 CMTM5 CNKSR2 CRIP2 CRYM CTSC DCBLD2 DHRS3 DOCK5 DYNC1I1 EEF1A2 EPHB3 F12 FBLN2 Classification of differentially expressed genes based on literature mining analysis. ABAT ADD2 BGN CNTN6 DDIT3 FUT8 GABRA5 GBP2 GJB2 HAPLN1 HLA-DPA1 HOXA5 HOXB6 HOXC10 IFITM2 IL17RD INSIG1 ITGA4 LAMA2 LHFPL3 MEST MMP17 MX1 NEFM NFATC2 NR0B1 Table A.2: Implicated in gliomaACIN1 Limited evidence in gliomaADAMTS4 Not implicated in gliomaALDH1A3 but other cancers CancerAPOD typeASNS ATOH8 ATP1B2 BID BMP7 C1S C3 Not implicated in cancers CALM1 CASP1 CCKBR CCL26 CCND2 CCR1 CD55 CD68 CD74 CD97 CDKN2A CEBPB CITED4 CNTN1 COL3A1 ACTC1AGTAPLN ADCYAP1R1AQP4ASPN ADRA1BATP10B BST2 ADAMTS10BAMBI DDAH2BIRC3 FCGR2B FXYD1BTC AMMECR1 GBP1C2 ARCCA12 GFPT2 ARHGAP8CAMK2B ATAD3C HAND2 BACE2CBLC glaucoma HLA-DMACCL2 HLA-DRB5 BTBD11 HOXA7CCL7 C5orf13 renal cell carcinomaCCNY HOXB9 C8orf4CD248 HOXC13 CARD17 colorectal CDH6CD58 breast cancer IGSF3 CLDN10 lung adenocarcinomaCD70 IL1RAPL1 breastCD9 IRX1 CMPK2 AC026410.6 CDH13 neuroblastoma CMTM8 KCNMB2 AC067930.1 CDKN2C esophageal LFNG COL8A1 CRYBB2CFB MAP3K5 thyroidCLDN3 MFAP2 head, neck AGPAT9 CTNNA2 MOSC2 renalCOL1A2 CXXC4 cell carcinoma AC092296.1 ADRA2A hepatocelluar carcinomaCOL4A6 NCALD DDIT4L NELL2 chronic 
myelogenous DOCK10 leukemia and CLL NFKBIZ B3GNT9 HeLa DUSP16 ANGPTL1 EDA2R NRN1 hepatocarcinoma rhabdoid tumour (kidney) BATF3 C1orf94 EPDR1 C1orf133 C10orf90 ETS2 endometrial FAM84A C10orf116 BMPER renal carcinoma FBN2 melanoma melanoma C5orf38 Burkitt’s lymphoma colorectal C3orf58 C21orf62 colorectal colon prostate,colon,breast C6orf138 C7orf40 squamous cell carcinoma CACNA1C C9orf64 CABP7 CACNG8 CCDC48 CARD16 CGREF1 CCNO

251 A.2 Classified Differential Expression Appendix CHCHD10 CHRDL1 COL21A1 CPNE2 CRB2 CSGALNACT1 CYB5R2 DDO DKFZp434H1419 DPY19L1 ECHDC2 ELMO2 ELN ELTD1 EPDR1 FAM126A FAM134B FAM176A FAM184B FAM196A FAM69A FBXO27 FCGR2C FXYD7 GABBR2 GALNT5 GBP4 follicular lymphoma lung pancreatic gastric,pancreatic,esophageal,cervical breast,head,neck,pancreatic,endometrial Human promyelocytic leukemia-cells breast,prostate pilocytic astrocytoma childhood ALL medulloblastoma,neuroblastoma hepatocellular,gastric,fibrosarcoma B-cell lymphoma soft tissue sarcoma alveolar rhabdomyosarcoma renal cell carcinoma non-small cell lung cancer prostate,colorectal squamous cell carcinoma,breast,pancreatic breast cancer CLL cervical,colorectal cervical squamous malignant fibrous histiocytomas pancreatic breast nasopharyngeal carcinoma,prostate non-small cell lung cancer FCGR2B FERMT3 FOXJ1 FXYD5 GJB2 GMPR GRB14 HLA-DPA1 HLA-DRB4 HTATIP2 IFI6 IL4I1 IRX2 ITM2A KANK1 KIAA1217 LEMD1 LMO4 LPHN2 LRP4 LUM MAF MAN1C1 MARVELD3 MGLL MIPOL1 MMRN1 NXPH1 OLFM1 PCDHB3 PERP PITPNC1 PMP2 PPEF1 PTPRR PYGL RARRES3 RRAD SCN1B SIX6 SLITRK5 SNX10 SQRDL SRPX SULF2 TAGLN TMEM71 TUSC3 Implicated in gliomaCPAMD8 Limited evidence in gliomaCTLA4 Not implicated in gliomaCXCL12 but other cancers CancerCXCL2 typeCXCL6 DKK1 DNER DUSP5 EFEMP1 EGFR EPAS1 Not implicated in cancers ERBB4 ETS1 FAM38B FGFBP2 FMNL1 FXYD3 GALR1 GDF15 GGH GPNMB GRIA3 HEPACAM HLA-A HLA-DQB1 HLA-DRB1 HMGA2 CREB3L1CXCL1CXCL14 OAS3CXCL3 OXTRDCN PDE4BDLK1 PIGADRD2 FCRLAEEF1D PKNOX2EFNA5 FGF19 PNP FOXQ1ELMO1 PSPHEPB41L3 PXDN GCNT1 RAD50ESRRG GJC3 RASGEF1CF2RL1 SALL1 GPC3FCGR2A HAND2 B SCN5A cellFGFR1 lymphoma HLA-DQA1FOXG1 SLC4A4 HOXA4 IFI27 liver,lung,colon SNAP25 colorectal,breastGAL IGSF11 SORL1GAS7 prostate SRGAP3GFAP INS HeLaGJA1 ITIH5 STAMBPL1 JPH1GRIA1 hepatocelluar carcinoma,liver SYT1 breast, neuroblastoma CHODL gastric,lung adenocarcinomaGRM3 KCNJ12 TM4SF1 L3MBTL4HIF3A ovarian,AML CMAH TNFAIP6 CPLX2 skin,epithelial,breast LMO2HLA-DPB1 
hepatocellular,gastrointestinal DIAPH2 HLA-DRA CXorf38 LOXL4HLA-DRB3 LRAT CPNE5 colon,breast, LRRC55 breast cancer colorectal CRYBB1 carcinoma DCHS1 EFHD2 prostate,stomach,breast breast DTX4 cancer LXN DNM3 MALL gastrointestinal,hepatocellular,BLL,T-ALL MAP7 MEIS2 head/neck FAM150B carcinoma,bladder MICB colon,colorectal,renal,mammary MLPH pancreatic EML2 ELMOD1 EVC2 ELOVL2 FAM129A prostate,gastric,colon,melanoma,medulloblastoma FAM181A FAM189A1 FAM70A oligodendrogliomas colon ovarian,embryonal carcinoma,myeloid leukemia cervical,osteosarcoma,squamous FZD3 cell carcinoma lung,breast FAM5B GABRQ FBXO32 FEZF2 GBP3

252 A.2 Classified Differential Expression Appendix GDPD2 GLYATL2 GPR158 GRIK3 H1F0 HLA-DQA2 IFITM8P IL33 ISYNA1 KALRN KCNK12 LINGO1 LOC285141 LYST MAP6 MID1IP1 MYL1 MYO1B NRBF2 OTOR PCDHB4 PDE1C PDZRN3 PLA2G4A PLCH1 PLEKHA6 breast leukemias pancreatic,colon pancreatic,T-cell lymphoblastic lymphoma murine breast lung adenocarcinoma,sarcoma colorectal,neuroblastoma lymphoma breast cancer non-small-cell lung cancer breast,lung,adenocarcinoma breast HeLa renal cell carcinoma breast,gastric breast stomach,colorectal melanoma melanoma,nasopharyngeal carcinoma,prostate gastrointestinal stromal tumour,lymphoma gastric,renal cell carcinoma prostate neuroblastoma colorectal lung MT1A MYBL1 NBL1 NFE2L3 NOP16 NR0B1 NTN1 ODZ2 PARP3 PCDH20 PITX1 PLD3 PLS3 PMEPA1 PPAP2C PPP4R1 PROCR PTPRH RAB38 RARRES1 REC8 RGS5 RNASEH1 RTN1 SEMA4G SHOX2 Implicated in gliomaHOMER1 LimitedHOXA1 evidence in glioma NotHOXB7 implicated in glioma but other cancersHOXD13 Cancer typeHOXD9 HPSE ID4 IGF2 IKBKE MN1IL1B IL6 NotINHBA implicated in cancers INPP5D JAM2 KCNK1 LAMA4 meningioma,ALLLIF LMO3 LPAR6 LTF MAL MATN2 MET GBP5 MICA MLC1 MT2A MYC HOXA10HOXD10HOXD3HPRT1ICAM1IFI30IGFBP5IL13RA2IL1R1 MXRA5IL8 MYH3INHBB NDE1IRAK1 NID1KCNA2 NPFFR2KLF6 NR1D1LGALS3 OAS2LIFR ovarian,colon OGNLNX1 adenocarcinoma PARP12LRRN2 AMLLY96 PHACTR3 PKI55 colon,gastricMARS Primary CNS lymphoma PLS1MBP PLSCR1 breastMIA prostate,breast PNMA2MKI67 GEM PPL hepatocarcinoma GNG11 MMP7 breast cancer PRIMA1MTTP HCP5 non-small-cell lung PTPRD cancerNAMPT RAB11FIP1 cancer GYG2 GPR98 RAB6B lung adenocarcinoma ovarian,colorectal RARRES3 IGLON5 gastrointestinal neuroendocrine RGS17 carcinomas INS-IGF2 HRCT1 KCNG1 esophageal RHOBTB3 RNF175 thyroid,non ITGBL1 small MACROD2 cell lung cancer cells,prostate SALL2 breast gastrointestinal MOCOS SEZ6L LOC100127888 SIX3 LRRC2 breast,neuroendocrine KCTD12 ovarian,breast lung,prostate,ovarian ganglioglioma MARCH1 pediatric AML,non-Hodgkinslymphoma MYL9 synovial sarcoma OASL lung NKX6-2 PION 
Chondrosarcomas PCDHB12 PCOLCE2 PDE1C PLAC9 PPCS PLCH1

253 A.2 Classified Differential Expression Appendix PPHLN1 PRSS12 PSORS1C1 RADIL RGL3 RHBDF2 RIPK4 RP9 RRS1 SCN11A SEMA6A SGCD SHISA2 SLC38A5 SLC4A11 SNHG12 SORCS2 SPTBN5 STX3 SYNGR1 TBC1D8 TECRL TESC TMCO4 TMEM100 TMEM132D TMEM158 TMEM176B TMEM200B TMEM38A TMSB15A TMSL1 TNNC2 pancreatic gastric,colon carcinoma colorectal various cancers lymphomas juvenile myelomonocytic leukemia, renal cancer esophageal,squamous-cell carcinoma,ovarian prostate colorectal,ovarian,hepatocellular,gastric breast pancreatic colon cancer,acanthoma nasopharyngeal carcinoma Hodgkin’s lymphoma,breast lung,AML,melanoma breast,prostate neuroblastoma,ALL melanoma meningioma SKAP2 SLC26A2 SLCO2A1 SPAG4 SPINK2 SPRED1 ST6GALNAC3 STC2 STEAP2 SULF1 SYTL4 TBC1D8 TCF7 TNFAIP2 TNFSF4 TRIB2 TSHZ3 TSPAN7 TTN ZIC5 Implicated in gliomaNCAM1 Limited evidence in gliomaNFASC Not implicated in gliomaNKX2-1 but other cancers CancerNMU typeNNMT NOV NR4A2 NRP2 OAS1 P2RX7 PDGFA Not implicated in cancers PDGFRB PEA15 PI15 PLCB1 PTGS1 RASGRP3 S100B SDC2 SEMA3D SFRP1 SLC15A2 SLCO1C1 SNCAIP SOD3 SOX3 ST6GAL1 NDNNFIANKX6-2NNATNOTCH3NPTX2NRG1NTRK2OTX2 SLC16A3PAX8 SLC2A5PDGFRA SMAGPPDPN SPARCL1PHLPP1 SPINT1PLA2G2A SRP9PTEN ST6GALNAC5PTPRD STEAP1 bladder cancerS100A6 breast,adenocarcinoma,colon,liver,lymphomas STRA6S1PR3 metastatic cell lines SULF2 PRSS12 SELENBP1 SYTL5 colorectal,pancreatic,lung colorectal,gastric,breast,stomach,ovarianSERPINE2 TCEAL2SHC4 TM4SF1 breast colorectal TNFRSF14SLC38A1 RGS20 SLITRK5 breast,prostate PPM1K TPMT RGAG4 SOD2 TRIB3 RAB7L1 colorectalSOX10 TSPAN13 lung adenocarcinoma,squamousSPP1 cell breast carcinoma TSPAN9 ZIC3 ovarian SDK2 ZNF423 pancreatic pancreatic RHPN1 RP11-473I1.1 RPSAP52 acute and lymphoblastic leukemia breast,colorectal cancer prostate,breast RTP4 ovarian meningioma SFTA3 SMOC2 neuroblastoma,CML SH3BGR SHROOM3 SLC46A1 SNX22 SORCS3 TAPBPL STEAP1B STXBP5L

254 A.2 Classified Differential Expression Appendix TNNI1 TNNI2 TOX TRAM1L1 TRIM14 TRIM48 TSTD1 TTF2 TUBB4 TUSC3 UNC5D UNC80 VAX1 VIPR2 VIT WBSCR17 XYLT1 ZNF281 ZNF454 ZNF536 ZNF710 ZNF714 ZNF747 Implicated in gliomaST6GALNAC5 LimitedSYNM evidence in glioma NotTES implicated in glioma but other cancersTFAP2A Cancer typeTGFA THBS2 TNC TPM1 TRIM47 TSLP TUBA4A NotVCAM1 implicated in cancers VSNL1 ZEB1 ZIC2 ZIC4 TCF7L2 TF TFCP2 TGM2 THY1 TNFAIP3 TRAF1 TRPM8 TTYH1 UGT8 VIPR1


A.3 Quantitative RT-PCR

Ct values were normalised to the mean of three endogenous control genes (18S rRNA, TUBB and NDUFB10). Values were capped at 37 prior to normalisation. Each sample was then normalised by subtracting the mean Ct of the three control assays for that sample and adding the global mean Ct of the control assays. The subtraction corresponds to a standard deltaCt normalisation and adjusts for differences in the amount of RNA among samples. The addition ensures that normalised values occupy approximately the same range as the original values; it is done only for convenience and does not affect the statistical tests, since all Ct values are incremented by the same amount. Replicated samples are marked with the suffixes (A) to (D). Raw and normalised Ct values are listed in the tables below.
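The normalisation described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the function name `normalise_ct`, the sample labels and the Ct values in `raw_example` are hypothetical, and the global mean is assumed to be taken over all control-assay values across all samples.

```python
from statistics import mean

# Control assays named in the text; everything else below is illustrative.
CONTROLS = ("18S", "TUBB", "NDUFB10")


def normalise_ct(raw, controls=CONTROLS, cap=37.0):
    """Cap Ct values, then apply deltaCt normalisation per sample.

    raw maps sample name -> {assay name -> raw Ct}.
    Returns the same structure with normalised Ct values.
    """
    # Step 1: cap every Ct value at `cap` before normalising.
    capped = {
        sample: {assay: min(ct, cap) for assay, ct in assays.items()}
        for sample, assays in raw.items()
    }
    # Step 2: mean Ct of the control assays within each sample.
    sample_means = {
        sample: mean(assays[c] for c in controls)
        for sample, assays in capped.items()
    }
    # Step 3: global mean Ct of the control assays (assumed: over all samples).
    global_mean = mean(
        assays[c] for assays in capped.values() for c in controls
    )
    # Step 4: subtract the per-sample control mean and add the global mean.
    return {
        sample: {
            assay: ct - sample_means[sample] + global_mean
            for assay, ct in assays.items()
        }
        for sample, assays in capped.items()
    }


# Hypothetical raw Ct values for two samples and one target assay.
raw_example = {
    "S1": {"18S": 12.0, "TUBB": 24.0, "NDUFB10": 27.0, "GENE_X": 40.0},
    "S2": {"18S": 14.0, "TUBB": 26.0, "NDUFB10": 29.0, "GENE_X": 30.0},
}
normalised = normalise_ct(raw_example)
```

Because the same global mean is added to every value, the difference between two samples for any assay is the same as under plain deltaCt normalisation; the addition only shifts the normalised values back into the range of the raw ones.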

256 A.3 Quantitative RT-PCR Appendix 34.8 31.8 29.7 25.8 30.9 12.2 24.7 27.7 35.8 23.5 28.7 27.6 30.1 27.6 40.0 34.3 18.8 28.7 24.2 23.9 32.9 30.7 28.9 30.2 34.1 31.6 29.8 25.0 30.4 11.9 24.3 28.1 36.0 23.1 28.1 26.8 31.1 27.5 40.0 34.2 18.8 28.4 23.1 23.8 32.6 29.5 29.3 29.9 27.4 32.6 31.0 24.9 40.0 14.8 26.5 26.2 40.0 23.7 36.6 26.0 33.7 33.1 30.3 33.7 18.6 25.3 25.1 24.9 33.1 31.3 30.2 29.7 26.7 30.9 29.9 24.3 40.0 12.9 24.7 22.7 33.5 23.7 33.6 21.4 27.0 29.8 30.3 32.6 19.8 24.7 26.2 23.7 34.3 34.4 29.7 27.1 28.7 26.9 28.9 21.8 31.7 10.9 23.0 29.0 34.6 33.5 25.6 22.9 29.3 32.6 40.0 31.6 19.1 40.0 26.6 25.4 27.8 28.2 27.7 27.5 31.2 29.2 30.5 22.2 33.0 11.1 24.7 29.8 36.9 37.0 26.3 24.7 29.8 32.0 40.0 30.5 22.4 40.0 28.3 26.6 29.6 29.6 28.1 28.1 36.2 31.1 30.0 22.4 34.8 13.0 22.6 27.8 32.9 33.9 24.8 24.2 27.6 30.8 40.0 33.8 22.9 40.0 27.9 25.6 28.4 30.5 27.9 28.0 31.4 31.1 28.6 22.8 32.3 12.1 21.8 24.0 35.0 35.3 26.1 24.3 31.8 30.3 40.0 32.0 21.3 40.0 27.3 24.6 29.7 30.9 27.2 28.3 35.9 32.0 35.2 26.9 40.0 18.9 23.9 25.6 36.0 40.0 29.6 23.1 28.6 37.0 29.8 35.2 19.8 40.0 27.8 25.6 31.0 32.0 30.6 31.0 37.0 31.4 32.6 25.2 40.0 14.9 23.5 25.5 34.4 40.0 28.2 21.4 26.3 34.2 29.8 40.0 19.7 40.0 26.5 25.9 29.6 32.0 29.5 28.6 25.8 26.7 30.4 31.3 29.0 11.2 25.9 24.8 28.3 23.0 27.0 25.5 23.9 25.9 25.5 24.1 22.1 25.2 26.3 23.8 26.1 26.0 28.1 25.7 27.1 26.3 32.4 29.7 28.8 12.0 25.3 22.6 27.3 23.3 25.0 30.7 24.4 24.9 26.4 23.8 25.6 26.8 29.0 23.7 25.8 27.0 28.5 25.7 24.4 27.0 31.5 35.2 28.3 12.3 25.6 24.8 28.6 22.9 27.6 26.7 24.5 26.0 26.1 25.0 20.6 23.8 26.0 23.3 25.8 26.9 27.8 26.0 26.8 32.6 30.7 32.4 40.0 12.5 25.6 28.4 33.8 24.3 29.8 22.3 29.6 32.9 30.6 40.0 21.9 40.0 27.6 25.4 31.0 28.9 30.4 28.3 29.6 30.9 31.4 40.0 40.0 11.3 27.2 30.4 35.5 23.3 31.0 27.4 31.0 30.3 31.1 40.0 20.6 40.0 26.0 25.2 34.1 28.9 32.5 30.1 31.9 31.1 32.0 40.0 40.0 12.4 27.9 29.9 35.2 25.2 31.2 27.0 28.6 29.1 31.9 40.0 23.3 40.0 28.1 26.9 33.4 29.9 32.8 30.0 28.9 32.5 32.4 40.0 40.0 13.0 26.4 35.2 
40.0 24.0 29.9 28.6 31.8 33.2 29.2 40.0 19.0 40.0 25.6 24.5 33.2 29.9 34.1 29.5 25.1 29.0 31.1 28.2 40.0 11.1 25.2 22.0 34.9 21.6 29.2 22.1 24.4 27.0 29.9 34.6 20.6 40.0 25.8 23.1 27.6 29.5 29.5 28.8 26.2 31.0 31.8 28.6 40.0 11.5 27.2 24.8 40.0 23.3 31.0 26.0 27.1 29.0 31.4 40.0 20.7 40.0 25.8 25.4 31.8 29.7 32.4 31.0 25.4 31.9 32.0 40.0 40.0 14.0 27.7 29.8 40.0 23.8 30.9 30.1 28.6 31.6 29.2 40.0 19.2 40.0 25.1 24.4 30.6 30.5 35.3 29.7 22.9 29.0 30.2 33.1 40.0 12.1 26.6 26.9 35.9 23.2 29.9 28.2 23.0 28.2 29.1 40.0 20.8 40.0 24.7 24.0 29.3 33.5 32.8 28.0 28.0 30.2 30.5 34.7 40.0 11.4 26.8 30.9 40.0 22.9 31.1 26.3 30.1 31.2 30.6 40.0 20.6 40.0 23.9 25.1 31.4 28.4 32.2 30.0 27.5 30.0 30.3 37.0 40.0 11.6 26.3 30.8 35.7 22.4 30.9 26.1 30.3 30.6 30.5 40.0 20.0 40.0 23.6 24.5 31.3 27.8 32.0 29.1 27.9 30.9 32.8 40.0 40.0 13.3 26.6 33.3 40.0 24.0 30.7 27.5 31.6 32.9 30.1 40.0 19.7 40.0 25.7 24.8 32.0 31.5 34.4 29.5 CategoryCore down ACore down BCore down 28.5 CCore up 28.3 28.5 AMarker 33.4 BCore 34.0 down 33.0 CNorm 31.7 29.4 DCore up 32.2 29.5 ACore down 25.6 B 26.2Core down 23.3 CCore 24.0 down 36.1Core 35.9 down 36.0 40.0Core up 35.7 34.0Core 28.9 down A 29.7 30.2Core down 29.5 BCore down ACore down BCore A up BCore down CCore up DCore down ACore down BCore down A B Core down Core down Raw Ct values. Abbreviations: "down" for down-regulated, "up" for up-regulated, and "Norm" for Normalisation control. 
Table A.3: MYL9-Hs00382913_m1 RGS5-Hs00186212_m1 CEBPB-Hs00270923_s1 PDGFRA-Hs00998026_m1 NKX2-1-Hs00163037_m1 18S-Hs99999901_s1 LMO4-Hs01086790_m1 SPARCL1-Hs00949886_m1 FBLN2-Hs00157482_m1 TUSC3-Hs00185147_m1 PEG3-Hs00377844_m1 HLA-DRA-Hs00219575_m1 PI15-Hs00210658_m1 DTX4-Hs00392288_m1 IRX2-Hs01383002_m1 SIX3-Hs00193667_m1 S100A6-Hs00170953_m1 FAM38B-Hs00926225_m1 PMEPA1-Hs00375306_m1 NDN-Hs00267349_s1 MN1-Hs00159202_m1 HMGA2-Hs00171569_m1 RAB6B-Hs00981572_m1 KALRN-Hs00610179_m1 Hs00229612_m1 Well Applied Biosystems assay ID1 Gene23 CHCHD10-Hs01369775_g14 CB130 CB130 CB130 CB152 Core CB152 ST6GALNAC5- CB152 up CB1525 CB171 CB171 CB171 CB192 CB5416 CB660 CB660 G2 26.77 C5orf13-Hs00854282_g1 G2 25.389 G7 25.9 CCND2-Hs00922419_g110 25.8 Core G7 down11 CTSC-Hs00175188_m1 25.5 G7 28.112 Core 25.3 down G7 27.313 DNER-Hs00294564_m1 24.3 32.014 G9 28.0 Core up 26.7 30.415 PLCH1-Hs00392783_m1 25.4 G9 28.116 Core 30.9 down 27.6 G14 23.1 26.217 SYNM-Hs00322391_m1 31.5 G14 27.0 31.9 Core 24.218 down 28.8 33.9 25.3 31.9 24.519 SALL2-Hs00413788_m1 29.2 26.1 29.0 28.0 32.720 Core 24.4 down 29.1 25.9 28.3 26.8 32.621 GPR158-Hs00393109_m1 24.2 28.6 28.9 26.1 31.9 Core 26.722 31.8 down 25.3 25.7 27.8 26.3 32.723 29.0 FUT8-Hs00189535_m1 28.8 Core 25.6 40.0 24.1 down 28.0 27.0 30.924 24.3 25.8 27.3 33.0 23.1 26.7 27.9 34.425 25.9 25.1 ASCL1-Hs00269932_m1 25.0 31.7 33.4 25.9 26.426 27.6 26.1 34.1 23.7 Core 25.3 up 31.5 24.0 30.9 24.027 EPDR1-Hs00378148_m1 27.9 25.7 34.8 23.9 25.8 40.0 32.1 Marker28 37.0 23.9 27.1 25.5 28.1 36.9 27.8 23.9 26.3 30.429 30.7 CXXC4-Hs00228693_m1 24.6 26.2 29.9 28.2 27.6 36.5 23.5 Core 26.3 26.6 up30 21.8 29.2 27.0 25.3 34.0 23.8 28.9 26.8 32.1 23.531 26.7 MAP6-Hs01023152_s1 22.3 22.3 37.0 22.9 32.0 Core 26.2 28.7 26.5 32.4 down32 22.2 26.1 28.6 21.9 30.3 27.7 21.5 31.8 27.0 27.7 35.6 24.733 NKX2-2-Hs00159616_m1 30.9 27.0 40.0 22.5 27.9 26.4 33.2 22.5 32.7 27.1 27.934 36.1 26.9 24.9 25.9 30.8 28.0 Core 28.6 30.7 down 32.7 36.0 31.1 24.635 27.4 26.9 33.6 
33.4 MMP17-Hs01108847_m1 30.7 30.7 Marker 26.2 26.5 27.2 25.4 26.0 34.836 27.6 28.1 33.2 26.4 26.9 34.1 24.4 24.3 26.5 27.5 29.1 36.9 37 24.4 24.6 OLIG2-Hs00300164_s1 26.1 29.5 25.1 23.3 24.3 30.1 Core 40.0 28.7 up 29.438 40.0 27.0 30.2 24.3 25.3 34.5 26.8 29.9 27.0 30.8 27.239 26.7 28.5 22.5 24.8 HOXD10-Hs00157974_m1 40.0 28.2 31.4 25.8 32.1 25.7 29.7 22.8 31.7 29.440 28.2 26.3 33.8 27.9 29.0 Marker 40.0 27.6 35.1 29.4 31.1 27.2 29.9 27.741 31.6 FOXG1-Hs01850784_s1 Core 27.4 28.2 30.5 37.0 up 27.9 29.0 26.3 24.7 32.7 28.742 31.4 25.5 28.2 28.5 29.1 27.7 26.5 40.0 29.6 21.8 28.9 29.843 28.6 NDUFB10-Hs00605903_m1 26.8 33.2 29.9 26.2 28.2 30.8 33.3 40.0 32.8 27.0 27.0 32.6 Core44 28.7 34.9 up 28.2 26.1 27.7 28.5 Norm 34.1 33.6 37.0 25.945 40.0 25.8 33.6 33.1 MMRN1-Hs00201182_m1 26.6 25.5 34.9 30.3 29.8 30.8 40.0 40.046 24.7 24.4 29.9 27.5 35.6 40.0 29.9 25.6 28.9 29.147 24.9 40.0 27.5 40.0 NTRK2-Hs00178811_m1 28.0 31.7 30.2 Core down 40.0 24.8 25.5 27.1 27.348 27.5 27.0 31.9 40.0 40.0 26.4 28.9 24.3 26.6 40.0 27.8 40.0 25.0 28.6 CD74-Hs00269961_m1 31.0 40.0 33.1 24.4 26.0 25.7 28.2 Core 28.2 27.4 33.6 up 40.0 25.9 30.1 25.1 27.3 28.1 32.2 29.0 27.0 27.2 32.1 34.9 40.0 27.6 40.0 25.5 29.5 28.6 26.7 30.7 40.0 26.8 36.0 40.0 28.0 27.5 25.9 28.9 26.3 40.0 25.2 Core 31.8 26.6 up 28.7 40.0 33.4 26.9 35.5 26.8 25.1 30.5 40.0 27.0 26.5 40.0 30.2 28.9 25.9 40.0 40.0 29.2 26.9 40.0 31.0 40.0 27.9 24.8 26.2 40.0 29.0 27.3 40.0 27.5 25.6 28.4 26.7 32.7 28.5 23.2 40.0 25.1 26.5 26.6 27.3 40.0 26.3 29.9 28.0 35.9 24.9 28.6 30.0 40.0 27.6 29.0 27.4 40.0 26.9 30.6 30.0 35.3 26.2 29.3 40.0 25.9 27.9 28.4 40.0 30.0 24.3 26.8 27.3 22.9 40.0 25.3 28.7 29.8 28.9 26.5 30.1 27.2 30.1 27.4 29.2 40.0 25.1 21.8 29.9 28.0 28.2 27.7 22.2 27.0 27.9 30.1 27.8 26.0 25.8 27.3 24.0 22.3 27.9 25.9 31.0 28.0 27.2 25.2 22.7 28.9 27.4 26.1 28.1 29.0 30.5 26.7 27.7 25.5 23.5 27.4 32.8 28.3 25.0 28.5 40.0 24.1 28.9 30.5 28.9 25.0 40.0 22.1 26.7 36.0 27.8 25.7 22.1 36.6 20.7 27.8 28.3 27.2 36.4 24.3 26.9 
25.5 22.4 40.0 25.2 30.1 25.5 23.3 36.0 23.7 25.2 29.1 25.5 40.0 24.0 24.2 25.0 36.9 21.8 25.0 27.6 32.4 24.1 26.8 33.9 25.2 22.7 23.9 25.0 26.5 24.3 24.8 23.6 21.9 27.1 26.9 27.9

257 A.3 Quantitative RT-PCR Appendix 29.7 26.5 23.8 31.9 28.9 28.2 26.7 27.3 30.8 26.5 22.3 29.6 25.3 22.0 33.9 40.0 29.8 27.0 24.3 29.1 31.1 26.6 29.9 28.5 29.6 26.3 23.9 31.7 28.6 27.9 26.2 26.2 30.4 26.1 22.5 29.3 25.3 21.8 34.1 35.9 29.2 28.9 23.8 27.5 30.0 26.3 29.1 28.4 27.3 29.3 23.7 30.8 22.1 30.5 28.9 30.2 26.4 27.9 19.0 30.9 25.9 23.1 34.9 40.0 29.6 23.7 26.8 32.9 25.4 27.9 30.0 28.9 23.0 27.3 24.0 28.1 21.6 30.1 28.2 26.7 24.1 26.4 19.1 27.9 24.9 22.5 33.1 33.4 27.7 22.3 25.6 33.4 23.0 28.2 28.9 29.3 31.0 27.7 24.4 26.8 29.3 25.2 30.9 26.3 22.3 26.7 20.0 29.6 25.9 24.2 28.7 37.0 27.8 22.2 33.1 29.5 26.4 26.7 28.5 30.7 33.0 28.5 25.3 27.1 37.2 27.9 33.5 26.7 24.9 28.3 22.4 29.3 27.3 26.9 33.2 40.0 29.4 24.5 34.2 28.7 29.4 27.7 30.5 31.0 29.0 26.4 23.1 26.6 36.9 29.4 34.3 25.2 23.1 28.4 21.4 29.1 26.5 26.6 31.4 40.0 29.8 26.1 30.2 28.9 27.3 25.0 28.7 29.6 32.5 26.8 23.3 25.4 40.0 27.8 30.3 24.9 23.4 26.6 20.3 27.1 25.7 25.3 30.8 40.0 28.1 23.7 28.5 28.9 22.4 24.6 28.4 28.6 40.0 25.9 23.9 27.2 29.7 27.0 29.2 26.4 26.1 27.0 22.2 32.4 26.0 24.4 32.2 36.6 31.9 32.6 29.4 30.9 32.2 28.9 28.7 30.2 40.0 25.8 23.0 26.8 27.4 27.4 27.8 25.6 25.4 27.4 22.5 29.8 25.6 23.4 31.5 40.0 31.9 30.9 27.4 29.6 28.7 27.2 28.2 29.0 24.6 27.9 26.7 26.3 21.8 26.0 27.0 23.3 22.5 25.5 24.2 26.8 28.0 24.1 32.6 25.6 28.5 27.9 28.0 25.5 29.8 31.8 31.5 32.4 24.6 25.6 27.8 25.7 23.6 26.4 28.4 23.0 23.4 25.8 24.4 26.7 29.7 25.6 32.4 26.1 28.5 29.4 29.6 25.8 30.8 30.3 32.7 32.1 25.5 26.1 26.4 26.1 21.2 25.2 26.6 22.9 22.1 25.1 23.3 26.4 27.6 26.1 32.5 25.1 27.9 27.7 29.9 25.2 31.8 37.0 30.7 34.9 29.9 27.4 26.8 30.4 21.8 27.1 28.5 22.7 21.5 25.5 23.9 28.4 26.4 22.9 34.8 32.6 29.8 29.9 40.0 26.9 32.0 28.1 31.1 29.9 32.8 25.1 31.3 33.1 20.0 27.6 28.2 26.4 31.5 25.2 23.9 28.8 26.3 28.9 33.0 26.9 30.1 28.5 33.8 29.0 34.1 28.8 31.5 31.8 33.4 26.0 30.9 34.9 20.7 29.7 29.6 26.8 32.6 26.4 25.3 27.7 26.5 29.7 32.6 29.1 30.8 30.9 33.4 28.6 36.0 29.4 31.7 33.6 31.6 26.5 27.9 36.0 22.0 26.4 27.5 24.6 
30.9 25.3 25.5 28.4 24.8 24.4 34.8 25.7 31.2 32.0 29.7 29.5 40.0 24.9 30.3 29.2 32.0 22.4 25.4 29.1 21.9 26.0 26.8 24.3 31.0 23.3 20.4 26.6 23.5 26.4 29.7 28.4 30.7 29.6 33.4 25.9 33.3 27.8 29.3 33.1 36.9 25.0 29.1 30.0 20.2 27.4 28.9 26.2 29.5 25.0 20.9 29.6 26.6 26.7 33.1 28.9 32.2 30.1 34.2 27.7 31.4 29.2 31.4 33.0 31.9 26.6 26.8 34.1 21.9 25.8 27.2 25.2 28.3 24.5 22.4 28.8 24.6 24.1 34.0 27.3 34.3 32.2 30.8 30.4 35.5 27.3 30.6 40.0 30.3 25.6 25.9 35.3 20.5 26.4 27.6 24.3 30.9 24.1 22.9 27.2 23.8 25.0 32.2 27.7 34.3 28.8 32.1 26.4 34.1 27.1 30.6 36.0 29.2 26.1 30.1 33.7 18.5 26.7 28.0 25.9 32.6 24.7 24.2 29.0 25.2 27.9 31.9 26.1 29.4 27.9 34.8 29.3 32.4 27.8 30.3 33.1 28.9 25.3 29.7 37.1 18.8 25.9 28.0 25.9 32.2 24.5 23.6 28.9 25.0 27.6 31.7 25.6 29.0 27.5 34.6 29.3 31.7 27.4 30.1 33.2 29.4 27.0 28.5 36.8 21.2 26.5 27.9 25.0 30.1 24.7 25.2 29.0 25.3 24.6 34.4 25.7 30.9 31.0 29.9 30.2 34.1 25.9 31.0 29.2 CategoryCore down ACore down BCore up CCore down ACore down BCore down CCore down DCore down ACore down BCore down CCore up Core down Core up ACore up BCore down ACore down BCore down ACore up BCore up CCore down D AMarker BCore up ACore up B Core up LUM-Hs00158940_m1 MAP6-Hs01929835_s1 SULF2-Hs00378697_m1 LMO2-Hs00277106_m1 TAGLN-Hs00162558_m1 MAF-Hs00193519_m1 EDA2R-Hs00939736_m1 SEMA6A-Hs00221174_m1 CA12-Hs01080909_m1 SDC2-Hs00299807_m1 MT2A-Hs02379661_g1 PDZRN3-Hs00392900_m1 FAM69A-Hs00961685_m1 PLS3-Hs00418605_g1 SALL2-Hs00826674_m1 TES-Hs00210319_m1 LPAR6-Hs00271758_s1 NNMT-Hs00196287_m1 PRSS12-Hs00186221_m1 NTN1-Hs00180355_m1 CHI3L1-Hs00609691_m1 ADD2-Hs00242289_m1 LYST-Hs00179814_m1 PLA2G4A-Hs00233352_m1 Well Applied Biosystems assay ID49 Gene5051 NELL2-Hs00196254_m152 CB130 CB130 CB13053 CB152 PDE1C-Hs01095694_m1 CB152 CB152 CB152 Core54 CB171 down CB171 CB171 CB192 CB54155 TUBB-Hs00962420_g1 CB660 CB660 24.5 G2 Core up56 24.3 G257 MAN1C1-Hs00220595_m1 24.758 G7 29.9 Norm 28.059 Core RTN1-Hs00382515_m1 G7 up 32.4 29.360 32.9 G761 29.1 NPTX2-Hs00383983_m1 
20.0 27.8 27.0 G762 24.9 Core down 21.3 26.9 27.963 G9 SOX10-Hs00366918_m1 24.3 Core 21.5 26.4 27.0 29.0 down64 G9 25.6 21.2 23.065 25.9 28.0 DDIT3-Hs00358796_g1 35.2 G14 25.3 Marker 20.666 23.2 29.0 29.2 G14 30.6 25.367 23.1 C9orf125-Hs00260558_m1 29.7 27.8 31.4 30.5 24.668 Core 22.7 28.8 25.7 30.8 up 40.0 30.5 23.869 SLIT2-Hs00191193_m1 Core 19.8 28.9 down 27.2 30.4 40.0 26.870 25.0 24.0 25.9 27.2 26.2 28.0 40.0 32.9 32.871 IL17RD-Hs00296982_m1 30.5 22.2 27.2 27.0 27.9 29.0 40.0 32.2 29.872 30.9 Core down 21.3 27.6 30.2 27.1 25.9 29.2 40.073 32.8 40.0 LGALS3-Hs00173587_m1 Core 30.7 29.3 down 20.4 26.974 31.1 26.4 28.4 35.2 36.7 36.9 31.7 33.1 28.0 21.075 23.8 TERT-Hs00972656_m1 26.8 29.0 27.6 32.8 40.0 30.7 Core 35.3 up 24.3 32.976 28.7 20.2 23.8 25.7 28.9 27.9 40.0 34.6 24.6 33.6 30.077 28.9 PTEN-Hs02621230_s1 21.5 25.5 25.5 25.7 26.8 27.3 40.0 34.8 22.9 34.2 25.778 23.0 Marker 23.3 28.7 29.0 25.3 25.8 34.0 28.5 40.0 33.7 25.5 20.8 23.8 24.579 26.0 KCTD12-Hs00540818_s1 31.4 26.1 26.9 21.1 30.3 40.0 27.1 35.4 26.8 23.680 25.4 26.1 Core 29.3 23.1 24.2 40.0 27.5 down 27.1 28.6 40.0 36.381 40.0 SOX2-Hs01053049_s1 23.4 24.8 20.6 24.4 26.4 Core 29.8 40.0 down 40.0 26.2 28.7 35.182 30.0 24.4 29.4 21.7 25.2 24.0 26.2 33.0 26.3 40.0 21.583 25.8 30.6 27.6 40.0 27.4 ODZ2-Hs00393060_m1 27.4 28.5 29.4 24.5 24.2 20.5 32.9 27.684 40.0 28.3 30.2 27.9 27.9 27.6 31.4 28.1 26.1 20.8 23.0 Marker 27.4 31.885 28.0 28.5 CACNG8-Hs01100182_m1 40.0 27.3 26.9 30.3 27.5 29.4 29.0 25.6 40.0 31.186 24.7 24.0 Core 40.0 26.3 33.3 27.4 down 30.5 28.0 37.0 26.8 26.6 25.2 87 Core FOXJ1-Hs00230964_m1 25.2 24.2 down 27.2 30.2 40.0 36.0 29.6 23.9 33.9 28.1 28.388 25.1 25.9 29.0 21.3 36.0 29.2 24.4 40.0 40.0 24.4 32.7 26.4 25.9 27.389 28.5 26.5 LMO3-Hs00375237_m1 40.0 21.7 40.0 40.0 33.0 28.8 23.7 30.2 Core 33.2 28.8 40.090 27.5 down 25.8 36.4 26.2 24.1 29.4 40.0 40.0 32.7 27.4 29.791 28.3 CD9-Hs01124025_g1 27.8 26.7 26.6 35.7 25.6 28.3 40.0 26.2 28.7 40.0 32.6 27.692 29.7 Core 27.5 27.1 28.2 28.0 
down 32.9 25.0 27.7 31.9 22.993 26.4 36.7 40.0 30.4 MYL9-Hs00697086_m1 29.8 29.5 27.0 28.1 30.7 32.7 25.0 29.3 26.9 31.0 20.494 36.5 27.6 33.7 26.5 26.4 27.6 24.0 26.8 26.7 28.595 26.9 Core 24.3 LAMA2-Hs00166308_m1 28.3 26.4 35.5 up 34.5 25.3 27.6 28.8 24.3 33.7 29.096 25.1 27.7 Core 22.5 down 33.6 33.6 27.3 24.8 27.3 29.9 25.5 26.2 30.5 34.5 BACE2-Hs00273238_m1 34.4 22.0 25.9 Core 27.5 34.8 25.4 25.1 26.2 40.0 down 29.4 32.4 26.3 40.0 26.0 24.4 22.7 40.0 30.4 35.3 24.2 32.2 27.8 25.7 35.3 34.8 35.4 23.5 29.9 23.1 32.7 Core 30.0 30.9 28.7 24.8 up 25.5 26.9 40.0 30.9 30.2 30.7 32.0 30.8 23.4 24.9 27.3 25.3 27.9 20.9 33.4 32.4 25.0 37.0 33.4 29.6 23.5 27.3 24.3 26.1 33.4 35.5 22.6 26.5 24.7 40.0 35.5 40.0 29.0 31.0 32.1 24.7 26.1 27.3 24.2 26.7 40.0 30.8 40.0 34.0 27.2 30.5 24.9 30.4 24.1 27.8 29.7 31.6 23.6 27.1 29.7 23.0 40.0 29.4 33.9 28.0 30.4 24.7 33.6 23.4 27.7 26.5 26.7 40.0 26.8 32.2 26.4 28.7 32.4 22.6 29.8 30.1 26.2 28.7 27.8 40.0 21.7 30.0 25.5 30.1 26.1 29.9 28.8 28.3 25.3 31.3 40.0 28.4 28.8 27.8 27.9 31.2 27.1 24.1 26.8 29.6 40.0 30.1 40.0 23.7 32.5 27.3 32.4 22.5 26.2 25.4 40.0 23.8 31.1 31.3 32.0 25.8 28.0 31.2 25.5 29.1 31.0 28.6 32.4 35.2 37.0 26.7 23.7 28.2 26.7 28.0 40.0 21.1 31.7 34.0 27.9 33.0 28.7 22.2 35.8 35.6 33.1 31.7 32.0 22.4 28.4 36.9 35.2 31.1 22.4 33.2 40.0 27.7 22.8 28.6 40.0 30.5 32.8 21.2 25.9 26.0 27.9 24.1 22.3 26.5 25.2 23.8 24.2 26.6 25.8 22.9 32.4 26.7 29.3 23.4 33.0 28.8 30.6 26.3 30.4 26.3 25.5 24.5 25.0

258 A.3 Quantitative RT-PCR Appendix 31.4 30.1 28.2 25.9 40.0 11.4 23.2 25.2 28.6 25.3 40.0 21.1 26.7 28.1 30.4 40.0 18.3 31.6 25.7 35.2 30.8 28.7 29.9 31.0 29.1 31.9 31.1 28.1 26.5 40.0 11.8 24.4 26.3 28.7 26.5 40.0 22.2 28.2 28.5 30.5 40.0 19.3 32.7 26.0 40.0 31.5 29.4 31.0 31.2 30.4 28.6 31.4 28.7 28.6 40.0 13.8 24.1 30.5 34.7 28.8 31.4 26.6 30.0 31.7 40.0 33.1 17.9 40.0 25.0 40.0 35.0 32.1 30.5 30.9 27.5 28.3 30.4 28.4 25.2 40.0 12.0 23.7 27.9 36.5 32.0 29.8 28.3 27.4 31.0 40.0 33.7 17.5 40.0 22.6 35.7 34.7 31.3 29.6 27.9 28.8 33.9 33.0 31.5 26.2 36.9 13.8 24.7 31.3 35.4 40.0 28.7 22.8 34.1 30.7 40.0 30.8 23.8 40.0 27.7 27.0 30.1 33.8 30.2 30.8 40.0 34.2 36.2 32.2 26.7 34.7 14.5 25.7 31.2 34.3 40.0 29.5 24.4 32.6 30.4 40.0 31.6 24.0 40.0 27.3 28.6 31.6 35.8 31.7 32.2 40.0 32.2 30.6 29.9 28.9 31.0 12.0 22.9 28.9 35.4 30.2 32.0 30.5 34.6 27.6 34.9 34.9 21.8 40.0 26.9 25.0 26.4 32.3 27.1 27.4 40.0 31.1 30.1 29.7 27.3 32.2 11.6 22.9 27.0 33.9 30.0 29.1 28.0 27.7 26.5 37.1 40.0 22.7 40.0 27.4 25.5 26.8 32.6 26.7 26.7 34.9 34.8 30.5 29.2 27.8 31.7 11.9 22.6 26.4 34.5 40.0 40.0 32.0 37.0 26.4 40.0 36.1 22.9 40.0 26.9 25.1 27.2 32.8 26.8 27.5 31.2 35.1 30.0 29.1 27.5 31.2 11.4 22.4 26.5 34.3 40.0 40.0 33.7 36.0 26.4 40.0 34.7 23.0 40.0 27.1 25.4 26.5 33.1 26.6 27.4 32.3 33.6 30.1 29.7 23.6 40.0 11.9 22.5 24.0 34.6 24.9 25.6 23.9 30.6 29.5 40.0 36.4 20.9 40.0 24.6 40.0 27.1 31.5 27.4 27.5 40.0 34.1 30.0 29.9 23.6 40.0 12.1 22.8 24.3 34.1 24.9 25.2 24.0 31.1 29.8 40.0 35.1 21.1 40.0 25.2 36.9 28.0 32.4 28.0 27.2 40.0 29.9 32.3 30.5 26.9 40.0 12.2 25.4 29.3 36.2 40.0 30.1 24.5 26.9 34.2 40.0 40.0 21.2 40.0 27.3 27.1 35.5 33.6 32.4 31.1 31.2 28.8 30.2 29.7 23.8 40.0 11.7 23.3 24.8 34.3 40.0 27.6 23.7 27.8 34.5 40.0 40.0 21.4 40.0 26.7 27.1 33.7 32.1 30.7 30.6 31.4 29.9 30.0 29.5 26.0 40.0 11.9 23.8 27.3 33.9 35.0 27.3 22.9 27.6 29.4 40.0 35.9 19.2 40.0 24.3 25.7 28.9 31.8 29.0 30.1 40.0 30.7 30.3 29.8 26.5 40.0 12.3 24.4 27.5 32.8 37.0 27.3 22.4 27.5 30.0 40.0 36.0 20.3 
40.0 25.4 25.9 30.5 32.4 29.5 30.7 40.0 27.4 31.5 30.2 24.4 40.0 13.0 24.1 24.0 35.9 32.9 28.5 27.7 26.6 30.5 40.0 34.8 20.0 40.0 23.0 25.8 27.3 35.9 28.0 28.3 27.5 27.7 31.3 29.9 24.9 35.8 11.7 24.4 25.1 36.7 34.1 27.8 27.9 28.9 30.4 36.4 34.5 19.9 40.0 22.4 25.9 26.4 35.4 27.9 28.5 32.8 31.9 29.9 27.8 27.3 34.8 12.2 23.6 28.9 30.7 25.0 33.9 40.0 32.2 33.8 28.7 30.5 17.8 36.4 26.4 40.0 33.4 27.2 33.0 31.8 26.0 29.1 30.0 28.2 25.3 40.0 11.9 23.4 26.2 30.1 24.4 32.9 35.3 30.6 32.8 28.4 29.9 18.3 36.9 25.5 40.0 31.8 27.5 33.2 30.3 23.4 33.1 32.9 30.7 22.3 35.3 12.4 22.3 27.4 31.9 40.0 24.9 35.0 28.9 28.0 40.0 40.0 27.3 40.0 27.6 30.1 27.9 33.2 29.3 27.4 40.0 33.8 32.1 30.0 22.5 35.3 12.0 21.9 26.4 31.0 40.0 23.6 33.4 32.3 28.5 40.0 40.0 29.2 40.0 27.9 29.4 28.0 33.0 29.8 27.5 40.0 36.1 31.6 30.4 25.0 31.5 13.1 25.3 25.6 35.5 23.5 28.9 26.1 30.8 27.7 40.0 34.0 20.1 29.4 25.1 24.2 34.9 32.0 30.7 30.6 31.9 34.0 31.0 30.6 24.7 31.1 12.9 24.5 26.9 40.0 23.2 28.9 27.0 28.9 27.8 40.0 33.0 18.5 29.0 24.1 23.8 32.2 29.9 28.9 30.5 30.1 CategoryCore down ACore down BCore down A 29.7Core up 31.1 B 25.5 26.1Marker A 32.9 30.3Core B down 33.1 40.0 ANorm 35.9 34.3 B 29.9Core up 31.4 A 33.5Core 33.7 down B 27.1 27.0Core down 28.8 A 28.6Core 36.0 down B 35.9 ACore down 30.9 33.5 BCore up 34.5 A 33.2 Core down BCore down ACore down BCore A down Core B up Core A down Core B up Core A down Core B down Core down Core down Core down Core down MYL9-Hs00382913_m1 RGS5-Hs00186212_m1 Hs00229612_m1 CEBPB-Hs00270923_s1 PDGFRA-Hs00998026_m1 NKX2-1-Hs00163037_m1 18S-Hs99999901_s1 LMO4-Hs01086790_m1 SPARCL1-Hs00949886_m1 FBLN2-Hs00157482_m1 TUSC3-Hs00185147_m1 PEG3-Hs00377844_m1 HLA-DRA-Hs00219575_m1 PI15-Hs00210658_m1 DTX4-Hs00392288_m1 IRX2-Hs01383002_m1 SIX3-Hs00193667_m1 S100A6-Hs00170953_m1 FAM38B-Hs00926225_m1 PMEPA1-Hs00375306_m1 NDN-Hs00267349_s1 MN1-Hs00159202_m1 HMGA2-Hs00171569_m1 RAB6B-Hs00981572_m1 KALRN-Hs00610179_m1 LUM-Hs00158940_m1 Well Applied Biosystems assay ID1 Gene23 
CHCHD10-Hs01369775_g14 G19 G19 Core ST6GALNAC5- up5 G21 G216 G23 26.27 C5orf13-Hs00854282_g1 G23 27.28 G24 26.99 G24 27.3 CCND2-Hs00922419_g110 G25 25.8 Core down 25.9 G2511 CTSC-Hs00175188_m1 26.1 G26 29.112 Core 26.0 down G26 28.113 DNER-Hs00294564_m1 26.5 G30 27.2 35.014 25.6 Core G30 27.8 up 30.5 25.215 G31 28.2 PLCH1-Hs00392783_m1 21.6 26.3 G31 25.216 Core 22.0 down 26.6 G32 21.6 21.917 27.9 SYNM-Hs00322391_m1 26.7 24.4 23.2 G32 31.4 Core 27.418 down 26.5 26.3 24.2 G144 28.4 30.019 26.5 G144 SALL2-Hs00413788_m1 26.7 25.5 27.3 26.0 32.4 G166 27.0 40.0 21.620 Core 26.4 26.3 down 27.4 G166 26.6 29.4 20.7 32.7 27.121 GPR158-Hs00393109_m1 26.8 G179 27.6 34.0 26.4 26.0 32.4 27.7 Core 24.522 G179 down 33.4 25.0 25.5 27.7 27.0 26.7 26.023 FUT8-Hs00189535_m1 27.9 40.0 22.5 27.2 28.7 Core 29.9 22.1 down 25.324 28.2 26.3 22.2 26.6 28.1 28.1 22.0 27.7 40.0 20.5 24.8 26.7 30.625 28.5 26.6 ASCL1-Hs00269932_m1 20.7 27.6 27.3 22.1 25.3 30.0 31.6 26.326 24.6 20.9 Core 27.2 26.9 24.0 up 27.9 27.9 30.5 34.3 20.627 EPDR1-Hs00378148_m1 24.0 28.3 23.6 24.2 28.2 29.4 36.9 40.0 21.2 Marker28 36.0 24.4 24.0 40.0 29.9 35.1 25.1 26.1 27.4 40.0 24.529 23.9 40.0 29.6 CXXC4-Hs00228693_m1 33.2 28.2 25.3 Core 29.4 24.3 24.5 32.3 28.0 29.3 up30 32.0 36.9 25.1 29.3 24.5 25.0 32.7 27.3 32.4 31.031 28.7 MAP6-Hs01023152_s1 30.7 25.1 40.0 23.9 26.1 31.1 27.4 Core 29.5 down 32.032 23.1 27.0 28.2 36.1 27.6 31.3 31.6 27.5 33.1 22.8 23.4 27.233 31.9 32.4 29.2 28.2 NKX2-2-Hs00159616_m1 29.7 27.0 26.3 28.6 28.7 21.2 27.1 33.3 30.2 30.434 28.2 32.5 26.2 31.6 Core 30.4 27.3 27.0 31.7 30.8 down 30.5 22.8 24.3 26.635 MMP17-Hs01108847_m1 31.5 29.4 26.0 32.0 28.5 Marker 31.2 24.7 30.3 26.5 24.336 27.9 33.9 26.0 28.8 40.0 29.2 28.7 29.1 26.8 27.9 28.3 35.4 27.337 23.1 27.6 OLIG2-Hs00300164_s1 29.3 30.9 27.2 29.6 40.0 Core 26.0 40.0 28.5 up 27.0 28.238 28.7 29.1 28.2 30.2 26.7 31.2 36.8 36.4 26.2 26.9 28.4 29.2 29.939 HOXD10-Hs00157974_m1 28.3 28.6 40.0 26.3 25.8 31.0 40.0 26.5 28.2 36.940 29.6 28.4 23.4 26.5 
Marker 26.9 40.0 27.8 40.0 29.6 28.9 30.9 29.641 23.4 26.2 FOXG1-Hs01850784_s1 Core 28.0 27.5 34.9 up 40.0 29.7 30.3 21.9 26.1 29.0 31.442 27.9 27.9 40.0 30.4 30.1 22.5 26.5 33.9 34.9 27.8 26.4 26.943 33.3 NDUFB10-Hs00605903_m1 30.8 28.6 22.2 28.3 29.1 29.2 28.5 26.8 26.7 33.9 Core44 32.4 31.0 27.7 up 22.6 22.1 29.1 28.5 25.5 25.4 27.4 Norm 32.645 28.9 28.1 MMRN1-Hs00201182_m1 22.8 29.1 30.9 40.0 24.8 26.1 25.0 28.4 28.846 26.8 28.9 40.0 25.8 26.5 25.4 24.4 28.2 25.1 26.0 27.5 30.8 27.747 27.4 26.9 NTRK2-Hs00178811_m1 24.2 35.1 Core 29.1 24.9 down 25.4 28.6 25.0 24.9 30.748 27.3 29.8 22.9 29.4 30.1 40.0 26.1 29.4 25.6 24.6 33.7 23.049 29.3 28.6 31.3 CD74-Hs00269961_m1 26.2 27.1 23.1 29.5 24.5 25.0 31.6 Core 31.6 24.6 29.8 up 26.250 30.4 22.9 29.5 25.2 25.3 26.8 40.0 25.3 33.7 25.9 31.3 23.1 28.9 25.3 24.8 NELL2-Hs00196254_m1 32.8 40.0 24.3 29.5 29.4 24.9 31.8 29.5 25.3 24.5 24.0 33.0 25.3 28.8 Core 36.9 28.6 21.8 up 31.2 25.1 27.1 24.6 30.7 34.6 25.5 29.2 21.2 28.9 25.7 35.0 29.5 40.0 24.8 Core 25.9 29.2 33.7 down 22.6 28.6 26.0 29.7 40.0 26.3 29.2 27.5 28.4 22.8 28.8 25.4 27.5 33.1 31.6 40.0 26.3 26.4 22.6 28.9 26.4 24.7 28.8 33.2 32.6 40.0 23.5 26.5 32.1 22.7 28.8 25.9 33.2 33.2 26.6 32.0 23.5 31.0 35.1 24.8 29.0 25.3 29.0 33.1 31.5 21.4 32.7 26.4 31.0 30.7 24.9 23.6 31.6 23.4 40.0 21.6 34.2 25.2 28.6 32.1 23.4 28.8 40.0 21.4 31.0 27.9 25.5 29.8 23.3 40.0 21.8 28.4 28.2 26.5 25.4 33.2 29.6 26.3 40.0 26.3 22.2 25.2 27.0 36.9 24.3 40.0 30.5 23.0 24.7 28.5 37.0 24.0 40.0 27.9 24.2 30.5 23.8 32.5 34.6 40.0 26.6 25.6 27.2 34.4 37.0 25.5 24.7 40.0 24.5 30.8 27.5 24.4 21.8 33.7 25.0 31.4 31.8 36.5 27.0 21.3 26.4 35.0 26.4 33.5 27.4 25.1 28.3 33.6 25.3 28.2 30.3 22.5 32.2 29.1 24.9 30.1 24.8 24.7 28.0 28.1 28.7 29.1 27.4 26.6 31.8 22.3 32.2 22.0 31.1

259 A.3 Quantitative RT-PCR Appendix 26.4 26.2 29.0 26.8 34.6 34.5 27.7 26.1 25.4 18.9 29.0 26.0 25.1 30.8 33.5 31.4 23.5 31.3 30.0 21.8 28.8 27.9 28.9 27.4 26.8 29.8 27.7 35.5 35.3 28.0 26.3 26.1 20.0 29.0 27.4 26.3 31.4 35.8 32.0 24.3 31.5 29.4 22.3 29.0 28.9 29.7 33.9 25.4 35.4 29.7 28.5 30.1 27.1 26.8 28.5 18.5 31.6 26.0 22.4 31.7 36.1 30.2 23.9 24.3 35.0 28.2 25.4 28.0 27.3 30.9 25.2 33.3 30.4 27.1 31.0 25.3 25.9 26.9 18.8 30.8 26.1 21.0 32.4 36.2 28.8 23.0 23.3 30.5 23.0 25.6 28.1 28.6 31.0 24.7 32.6 31.4 33.3 34.0 26.2 26.3 29.0 22.9 32.9 26.5 24.6 34.6 40.0 32.9 28.0 25.8 27.9 37.0 27.7 29.4 28.4 34.8 27.5 32.6 29.9 33.9 37.0 27.2 28.3 29.2 24.3 33.1 28.1 26.5 35.3 40.0 34.1 27.5 27.3 28.0 40.0 28.9 30.1 29.0 30.0 25.1 25.4 28.7 26.8 31.5 23.0 23.2 28.9 21.7 26.0 25.0 24.3 31.1 40.0 28.7 28.3 26.8 25.6 29.2 24.8 29.9 26.0 28.3 24.9 25.8 28.5 27.7 29.8 22.8 24.0 28.8 21.4 26.1 25.0 24.8 31.1 40.0 29.0 29.9 26.9 25.7 28.3 24.8 29.2 26.4 29.8 24.4 25.9 30.4 27.4 34.4 22.9 23.8 27.6 21.4 26.3 25.4 24.8 31.5 40.0 28.7 28.1 27.8 25.9 27.6 24.9 29.2 26.2 29.7 24.9 26.0 32.7 26.9 34.5 22.4 23.9 28.0 21.4 26.3 25.2 24.4 31.0 40.0 29.0 29.0 27.4 25.5 27.6 24.9 28.8 26.5 27.0 22.5 26.8 24.4 27.3 28.2 23.8 27.8 28.5 22.0 28.9 26.3 24.4 31.7 40.0 28.0 27.1 37.0 26.8 22.8 25.8 28.7 26.1 26.9 23.2 25.9 25.7 27.5 28.5 23.9 26.7 29.0 21.8 28.6 26.4 24.9 32.5 40.0 28.2 27.3 40.0 27.1 21.8 25.8 28.3 26.5 31.9 25.7 30.5 29.3 28.8 32.5 26.8 26.3 26.7 21.9 28.6 25.9 23.9 34.6 40.0 32.3 24.6 26.9 30.5 27.4 32.1 31.6 32.1 31.4 23.6 27.3 29.1 26.7 32.0 24.6 23.5 25.8 20.0 27.7 24.3 22.5 32.8 40.0 30.5 23.7 25.3 28.1 23.8 30.3 29.3 31.1 30.3 23.8 34.0 25.0 31.5 34.8 27.2 26.9 24.9 22.5 26.0 25.5 23.3 32.7 34.2 32.9 26.9 26.6 28.0 30.7 27.9 29.2 27.0 29.5 24.6 34.3 26.5 31.4 35.5 27.7 27.9 25.9 24.0 26.7 25.7 24.2 33.1 34.1 33.3 28.2 27.8 28.0 29.3 28.6 28.9 27.1 28.7 22.4 30.3 21.5 28.7 28.1 24.8 30.4 25.6 23.1 40.0 24.0 22.8 32.5 36.8 29.1 30.3 31.8 25.0 24.0 27.8 29.0 29.8 29.7 
22.6 30.8 21.0 28.0 28.0 23.6 31.3 25.2 22.8 35.8 25.3 21.6 33.4 34.1 28.7 32.7 31.0 25.4 24.3 26.7 29.3 30.1 30.1 26.3 35.5 27.2 30.1 26.7 26.1 26.5 26.2 21.9 40.0 27.7 22.9 37.0 36.5 31.2 27.1 24.5 35.6 36.9 25.2 28.4 29.4 28.9 24.0 31.1 24.7 27.2 28.9 25.6 25.5 25.9 20.9 33.9 24.9 22.8 33.6 34.2 29.7 23.8 24.9 34.2 33.1 26.5 29.6 29.4 25.4 25.7 25.9 33.0 28.6 28.2 23.3 23.3 36.0 21.5 26.7 25.2 25.4 33.1 40.0 29.3 30.6 33.5 26.2 27.8 25.2 30.6 28.9 25.0 26.1 26.3 40.0 27.8 27.9 22.4 22.4 35.1 21.7 26.4 23.7 24.4 31.7 40.0 28.9 30.0 30.3 26.6 29.9 24.0 29.8 29.3 26.5 24.8 30.5 30.1 28.0 27.8 26.3 28.6 27.0 23.0 28.3 24.8 22.7 32.6 40.0 29.9 28.3 25.7 29.0 25.9 27.7 30.3 29.2 25.7 23.6 30.3 28.1 27.9 27.2 26.6 30.1 26.0 22.6 29.0 25.0 21.6 33.8 40.0 29.6 27.9 24.2 29.2 31.2 26.5 29.8 28.1 CategoryCore down ACore up BCore A down BCore down ACore down BCore down ACore down BCore down A BCore down ACore up BCore down ACore up BCore up ACore B down ACore down BCore down ACore up BCore up ACore down BMarker ACore up B Core up Core up MAP6-Hs01929835_s1 SULF2-Hs00378697_m1 LMO2-Hs00277106_m1 TAGLN-Hs00162558_m1 MAF-Hs00193519_m1 EDA2R-Hs00939736_m1 SEMA6A-Hs00221174_m1 CA12-Hs01080909_m1 SDC2-Hs00299807_m1 MT2A-Hs02379661_g1 PDZRN3-Hs00392900_m1 FAM69A-Hs00961685_m1 PLS3-Hs00418605_g1 SALL2-Hs00826674_m1 TES-Hs00210319_m1 LPAR6-Hs00271758_s1 NNMT-Hs00196287_m1 PRSS12-Hs00186221_m1 NTN1-Hs00180355_m1 CHI3L1-Hs00609691_m1 ADD2-Hs00242289_m1 LYST-Hs00179814_m1 PLA2G4A-Hs00233352_m1 Well Applied Biosystems assay ID51 Gene5253 PDE1C-Hs01095694_m154 G1955 TUBB-Hs00962420_g1 G19 Core up56 G21 G2157 MAN1C1-Hs00220595_m1 G2358 24.4 Norm G2359 25.4 Core RTN1-Hs00382515_m1 up G24 27.360 G24 28.561 G25 NPTX2-Hs00383983_m1 20.4 24.9 27.6 G2562 Core 21.8 24.1 27.3 down G2663 20.0 23.6 SOX10-Hs00366918_m1 24.9 G26 Core 24.8 20.5 down 23.664 24.9 G30 25.3 21.2 24.4 31.865 DDIT3-Hs00358796_g1 29.8 G30 32.7 21.3 23.6 34.1 Marker66 28.3 G31 32.5 21.0 25.5 29.3 28.1 G3167 27.5 
C9orf125-Hs00260558_m1 21.7 25.9 29.2 31.8 G32 28.6 21.868 26.5 27.2 Core up 40.0 28.7 G32 29.3 20.8 25.8 26.369 SLIT2-Hs00191193_m1 Core 40.0 29.2 G144 down 29.1 20.9 28.0 24.270 40.0 34.1 G144 32.6 22.1 27.9 26.2 26.7 30.7 G166 40.0 37.071 31.9 IL17RD-Hs00296982_m1 20.7 28.0 25.3 27.2 31.6 G166 40.0 40.0 24.972 20.5 27.8 Core 25.2 27.1 G179 40.0 down 40.0 34.7 26.5 19.7 27.573 24.9 G179 27.7 LGALS3-Hs00173587_m1 40.0 36.5 27.2 25.5 Core 20.0 25.1 down 24.3 24.974 34.5 26.4 40.0 26.8 25.2 19.9 24.2 25.3 22.3 37.0 24.775 31.9 TERT-Hs00972656_m1 26.3 22.9 28.0 19.8 Core 33.3 24.8 27.1 up 28.7 24.9 31.2 27.076 27.1 23.8 31.9 28.9 23.5 27.2 28.9 40.0 30.3 25.5 29.0 25.977 PTEN-Hs02621230_s1 27.1 22.4 32.1 40.0 27.4 26.3 28.7 26.6 29.6 26.1 24.878 25.9 Marker 31.9 35.1 33.4 19.7 23.7 29.5 27.1 25.7 32.3 24.8 27.6 29.879 35.0 29.5 KCTD12-Hs00540818_s1 24.4 30.9 24.3 20.0 29.3 29.0 27.3 31.6 40.0 34.280 26.3 33.8 24.6 Core 27.0 27.0 31.5 23.2 40.0 down 28.1 26.9 27.081 33.3 32.3 23.2 SOX2-Hs01053049_s1 26.6 27.3 33.0 40.0 Core 28.0 22.8 down 30.7 31.7 23.3 26.7 82 28.9 21.0 27.4 40.0 28.1 40.0 27.0 32.2 26.6 29.7 21.1 31.683 40.0 26.8 31.9 27.1 ODZ2-Hs00393060_m1 29.2 26.6 33.3 25.6 32.9 21.0 36.5 30.4 32.084 30.7 28.5 25.7 34.2 24.7 32.6 30.3 21.1 36.6 Marker 30.2 29.5 26.185 CACNG8-Hs01100182_m1 28.9 24.6 27.9 25.0 40.0 32.0 31.4 32.0 32.4 25.9 29.086 22.5 28.6 Core 31.7 30.2 24.4 down 40.0 26.2 30.4 26.8 24.1 28.687 Core FOXJ1-Hs00230964_m1 31.4 31.8 23.7 down 26.4 27.2 24.2 24.4 40.0 31.2 28.8 25.6 30.4 25.888 23.9 26.2 32.9 24.3 29.9 31.7 26.4 23.7 40.0 30.5 26.3 22.0 33.989 26.6 LMO3-Hs00375237_m1 25.4 29.9 30.9 26.7 33.3 32.0 27.9 22.7 29.0 Core 27.2 35.890 down 24.8 30.3 27.3 28.6 29.8 28.1 24.5 32.1 24.4 27.8 31.491 26.1 28.6 24.8 34.9 CD9-Hs01124025_g1 29.3 28.2 24.5 32.5 24.6 33.3 28.1 30.2 25.992 29.8 29.3 24.2 28.0 Core 30.5 down 27.8 28.0 29.8 28.1 29.6 27.7 24.7 31.493 30.9 MYL9-Hs00697086_m1 28.1 30.4 30.5 26.9 25.9 40.0 29.5 27.6 23.5 30.994 30.7 30.9 
37.0 31.9 28.0 40.0 29.2 29.9 23.5 33.5 24.195 Core 30.4 33.6 LAMA2-Hs00166308_m1 31.0 32.1 up 31.9 31.1 22.8 36.4 27.6 30.7 32.9 23.1 24.896 34.8 29.4 24.9 Core 30.6 down 29.4 40.0 33.7 26.9 31.7 30.8 22.7 24.5 30.9 BACE2-Hs00273238_m1 34.4 Core 23.1 29.2 30.6 32.6 33.6 30.8 22.5 down 40.0 30.2 23.3 27.0 24.0 30.3 33.0 40.0 22.2 29.8 30.8 31.0 29.1 29.9 26.9 22.4 27.1 31.4 40.0 22.4 29.6 Core 34.7 29.4 up 30.9 23.5 29.4 26.3 31.1 31.3 37.0 22.0 31.3 36.9 32.6 24.4 27.9 34.8 27.3 40.0 22.5 30.0 27.9 37.0 36.9 29.6 24.3 27.1 29.7 28.1 25.9 27.6 24.8 26.7 34.9 28.6 29.9 25.6 29.6 22.8 28.9 30.5 25.1 25.6 29.7 30.3 25.0 26.6 23.3 32.1 30.7 29.4 34.7 40.0 22.4 24.5 28.6 30.6 28.9 29.9 36.9 30.0 36.0 22.5 27.7 27.0 30.4 26.9 29.4 30.4 31.5 22.9 26.2 32.2 27.9 32.9 30.4 22.7 33.0 24.6 31.8 28.0 27.4 28.0 26.4 32.1 32.2 22.5 28.4 21.9 28.6 32.8 26.5 32.5 27.6 22.3 27.3 35.2 40.0 26.0 33.4 30.6 22.6 33.8 40.0 40.0 25.5 31.2 22.9 32.2 26.6 29.6 31.9 22.6 40.0 30.4 29.2 28.8 23.0 30.8 31.9 40.0 27.4 35.3 25.2 33.0 27.7 35.9 34.7 23.9 32.8 29.9 29.4 34.3 29.8 22.3 25.7 28.2 29.6 23.0 26.1 32.7 30.0 24.2 27.3 28.9 34.0 23.0 26.9 27.5 34.0 25.8 35.0 25.6 27.7 27.1

35.2 32.1 30.1 26.2 31.3 12.6 25.0 28.1 36.2 23.8 29.0 28.0 30.5 28.0 37.4 34.7 19.2 29.1 24.6 24.3 33.3 31.0 29.3 30.6 35.0 32.4 30.7 25.9 31.2 12.7 25.1 28.9 36.8 24.0 29.0 27.6 32.0 28.4 37.9 35.1 19.7 29.3 24.0 24.7 33.4 30.4 30.2 30.8 26.6 31.8 30.2 24.1 36.2 14.0 25.7 25.4 36.2 22.9 35.8 25.2 32.9 32.3 29.5 32.9 17.8 24.5 24.3 24.1 32.3 30.5 29.4 28.9 26.3 30.5 29.6 24.0 36.7 12.6 24.4 22.4 33.2 23.3 33.3 21.1 26.6 29.5 29.9 32.3 19.5 24.4 25.9 23.4 34.0 34.1 29.3 26.8 29.4 27.6 29.6 22.5 32.4 11.6 23.7 29.7 35.3 34.2 26.3 23.6 30.0 33.3 37.7 32.3 19.8 37.7 27.3 26.1 28.5 28.9 28.4 28.2 30.5 28.4 29.8 21.5 32.2 10.4 24.0 29.1 36.2 36.3 25.6 23.9 29.1 31.3 36.3 29.8 21.6 36.3 27.5 25.9 28.8 28.9 27.3 27.4 36.0 30.9 29.8 22.2 34.6 12.8 22.4 27.6 32.7 33.7 24.6 24.0 27.4 30.6 36.8 33.6 22.7 36.8 27.7 25.4 28.2 30.3 27.7 27.8 31.9 31.5 29.1 23.2 32.8 12.5 22.2 24.4 35.4 35.7 26.5 24.7 32.3 30.8 37.4 32.5 21.7 37.4 27.7 25.0 30.1 31.3 27.6 28.7 33.4 29.4 32.7 24.3 34.4 16.4 21.3 23.1 33.4 34.4 27.0 20.6 26.0 34.4 27.3 32.6 17.3 34.4 25.3 23.1 28.5 29.5 28.1 28.5 36.1 30.4 31.6 24.2 36.1 14.0 22.5 24.5 33.5 36.1 27.2 20.5 25.4 33.3 28.9 36.1 18.8 36.1 25.5 25.0 28.7 31.0 28.5 27.7 26.7 27.6 31.3 32.1 29.8 12.1 26.7 25.6 29.2 23.9 27.9 26.4 24.8 26.7 26.3 25.0 23.0 26.1 27.2 24.7 27.0 26.8 29.0 26.5 27.2 26.5 32.6 29.9 29.0 12.1 25.5 22.8 27.5 23.4 25.2 30.8 24.5 25.0 26.6 24.0 25.7 27.0 29.1 23.9 25.9 27.1 28.6 25.9 24.9 27.4 32.0 35.7 28.8 12.8 26.1 25.3 29.1 23.4 28.1 27.2 25.0 26.5 26.6 25.5 21.1 24.3 26.5 23.8 26.3 27.3 28.3 26.5 26.8 32.6 30.7 32.4 37.0 12.5 25.7 28.5 33.8 24.4 29.8 22.3 29.6 32.9 30.7 37.0 21.9 37.0 27.6 25.4 31.0 29.0 30.4 28.3 29.5 30.8 31.3 36.9 36.9 11.2 27.1 30.3 35.5 23.3 30.9 27.4 30.9 30.2 31.1 36.9 20.5 36.9 25.9 25.1 34.1 28.8 32.4 30.0 30.2 29.5 30.3 35.4 35.4 10.8 26.3 28.3 33.6 23.5 29.6 25.4 27.0 27.4 30.3 35.4 21.7 35.4 26.5 25.3 31.8 28.2 31.1 28.4 29.4 32.9 32.9 37.5 37.5 13.4 26.9 35.6 
37.5 24.4 30.4 29.1 32.3 33.6 29.7 37.5 19.5 37.5 26.1 25.0 33.6 30.3 34.6 30.0 25.3 29.3 31.3 28.4 37.2 11.3 25.4 22.2 35.1 21.8 29.4 22.3 24.6 27.2 30.1 34.8 20.8 37.2 26.0 23.4 27.8 29.7 29.7 29.0 25.6 30.4 31.2 27.9 36.4 10.9 26.6 24.1 36.4 22.7 30.3 25.3 26.5 28.3 30.8 36.4 20.0 36.4 25.2 24.7 31.2 29.1 31.7 30.3 25.2 31.7 31.8 36.8 36.8 13.8 27.5 29.5 36.8 23.6 30.7 29.8 28.3 31.4 28.9 36.8 18.9 36.8 24.9 24.1 30.3 30.3 35.1 29.5 23.0 29.2 30.3 33.2 37.1 12.2 26.7 27.0 36.0 23.3 30.0 28.3 23.1 28.3 29.3 37.1 20.9 37.1 24.8 24.2 29.4 33.6 32.9 28.1 28.2 30.4 30.6 34.8 37.1 11.5 26.9 31.0 37.1 23.0 31.2 26.4 30.3 31.3 30.7 37.1 20.8 37.1 24.1 25.2 31.5 28.5 32.3 30.1 27.9 30.4 30.7 37.4 37.4 12.0 26.8 31.2 36.2 22.8 31.4 26.5 30.7 31.1 30.9 37.4 20.4 37.4 24.0 25.0 31.8 28.3 32.4 29.5 28.2 31.3 33.1 37.4 37.4 13.7 27.0 33.6 37.4 24.4 31.0 27.8 32.0 33.2 30.4 37.4 20.0 37.4 26.0 25.1 32.3 31.8 34.8 29.9 CategoryCore down ACore down BCore down 28.9 CCore up 28.8 28.6 AMarker 33.5 BCore 33.7 down 32.4 CNorm 31.9 29.8 DCore up 30.5 29.5 ACore down 25.6 B 26.7Core down 23.4 CCore 24.9 down 35.2Core 33.3 down 36.4 36.8Core up 34.9 34.7Core 28.6 down A 28.9 31.1Core down 29.9 BCore down ACore down BCore A up BCore down CCore up DCore down ACore down BCore down A B Core down Core down

Table A.4: Normalised Ct values. Abbreviations: "down" for down-regulated, "up" for up-regulated, and "Norm" for normalisation control.

MYL9-Hs00382913_m1 RGS5-Hs00186212_m1 CEBPB-Hs00270923_s1 PDGFRA-Hs00998026_m1 NKX2-1-Hs00163037_m1 18S-Hs99999901_s1 LMO4-Hs01086790_m1 SPARCL1-Hs00949886_m1 FBLN2-Hs00157482_m1 TUSC3-Hs00185147_m1 PEG3-Hs00377844_m1 HLA-DRA-Hs00219575_m1 PI15-Hs00210658_m1 DTX4-Hs00392288_m1 IRX2-Hs01383002_m1 SIX3-Hs00193667_m1 S100A6-Hs00170953_m1 FAM38B-Hs00926225_m1 PMEPA1-Hs00375306_m1 NDN-Hs00267349_s1 MN1-Hs00159202_m1 HMGA2-Hs00171569_m1 RAB6B-Hs00981572_m1 KALRN-Hs00610179_m1 Hs00229612_m1 Well Applied Biosystems assay ID1 Gene23 CHCHD10-Hs01369775_g14 CB130 CB130 CB130 CB152 Core CB152 ST6GALNAC5- CB152 up CB1525 CB171 CB171 CB171 CB192 CB5416 CB660 CB660 G2 27.17 C5orf13-Hs00854282_g1 G2 25.789 G7 26.1 CCND2-Hs00922419_g110 25.9 Core G7 down11 CTSC-Hs00175188_m1 25.3 G7 28.512 Core 24.6 down G7 27.713 DNER-Hs00294564_m1 24.5 32.314 G9 28.1 Core up 27.2 30.815 PLCH1-Hs00392783_m1 25.5 G9 26.516 Core 31.0 down 27.3 G14 23.5 26.117 SYNM-Hs00322391_m1 31.6 G14 26.4 32.3 Core 24.618 down 28.8 33.7 25.6 32.3 24.619 SALL2-Hs00413788_m1 29.7 26.5 28.4 28.5 32.820 Core 24.6 down 29.3 26.3 28.5 25.2 32.721 GPR158-Hs00393109_m1 24.0 29.5 29.3 26.2 32.4 Core 26.622 31.5 down 24.7 24.7 28.2 26.4 31.123 29.1 FUT8-Hs00189535_m1 28.1 Core 23.1 37.4 24.3 down 28.1 26.8 30.924 24.8 26.2 27.5 33.4 23.5 26.8 27.3 34.825 25.7 25.1 ASCL1-Hs00269932_m1 25.2 32.2 33.5 24.3 25.626 27.3 26.3 34.5 24.1 Core 26.2 up 29.9 24.7 31.0 24.027 EPDR1-Hs00378148_m1 27.3 26.2 34.9 24.1 25.5 36.1 32.0 Marker28 36.7 23.9 26.3 25.7 26.4 37.0 25.2 24.7 26.6 30.429 30.1 CXXC4-Hs00228693_m1 25.1 27.1 30.3 28.6 27.5 36.3 22.5 Core 26.7 26.9 up30 22.3 29.0 27.2 25.5 34.3 21.3 27.3 26.8 31.4 22.831 26.8 MAP6-Hs01023152_s1 22.4 22.7 37.4 23.7 32.4 Core 26.9 28.6 27.0 32.6 down32 22.0 26.3 29.0 22.7 28.7 27.3 20.6 31.9 27.1 27.8 36.0 23.933 NKX2-2-Hs00159616_m1 31.3 26.7 36.2 20.0 28.3 25.5 33.1 23.2 32.8 27.6 28.734 34.5 27.8 25.3 23.3 31.2 27.4 Core 28.8 30.4 down 32.8 35.7 31.5 24.435 27.8 27.1 32.6 
33.3 MMP17-Hs01108847_m1 29.9 30.8 Marker 26.4 26.6 26.5 25.2 26.5 32.236 28.0 27.5 34.1 27.2 26.9 34.2 25.1 23.5 27.0 27.3 29.5 37.3 37 24.5 25.1 OLIG2-Hs00300164_s1 26.3 28.6 25.6 23.0 25.0 29.9 Core 36.8 27.1 up 28.838 37.4 27.9 27.7 23.5 25.4 35.0 26.5 29.1 27.1 30.2 27.139 26.9 29.0 23.4 24.0 HOXD10-Hs00157974_m1 37.4 28.9 30.5 25.9 30.5 26.6 29.5 23.2 32.5 29.640 27.9 26.4 31.2 28.3 29.5 Marker 37.1 27.4 35.0 28.7 31.5 26.3 29.1 28.141 32.1 FOXG1-Hs01850784_s1 Core 27.9 28.7 28.9 37.1 up 28.6 26.4 27.2 24.1 32.7 28.542 29.7 25.2 28.4 28.6 29.0 28.2 26.8 36.8 28.9 22.0 29.4 30.243 27.8 NDUFB10-Hs00605903_m1 26.6 33.1 30.6 27.1 28.4 30.8 32.6 37.4 33.7 27.5 27.1 33.1 Core44 27.9 34.6 up 28.3 25.2 27.5 29.0 Norm 34.5 33.8 37.4 26.645 36.2 24.1 34.4 33.2 MMRN1-Hs00201182_m1 24.0 25.9 34.6 29.7 30.0 31.6 37.5 37.146 24.7 23.5 30.1 27.9 34.8 37.4 30.3 25.7 29.1 30.047 22.4 35.4 27.3 37.1 NTRK2-Hs00178811_m1 28.1 32.6 30.0 Core down 37.4 25.1 25.9 27.9 26.548 28.0 26.0 32.3 36.9 36.8 26.9 28.3 24.1 27.3 37.1 25.2 37.4 25.5 27.7 CD74-Hs00269961_m1 29.3 37.0 32.5 23.7 25.6 25.8 28.4 Core 28.6 24.9 33.7 up 37.4 26.0 30.0 25.8 26.5 28.6 32.4 28.8 27.4 28.0 32.5 34.7 36.7 28.5 37.1 25.6 28.7 28.6 26.5 30.8 37.5 25.9 34.3 36.2 28.3 28.2 25.3 29.3 25.5 37.1 25.0 Core 32.3 24.1 up 29.6 35.4 34.3 26.6 35.4 27.5 25.3 30.9 36.8 27.4 25.8 37.4 30.3 28.1 25.0 36.9 36.7 29.2 26.7 37.5 31.1 36.4 28.8 25.0 23.7 36.2 29.9 27.6 37.0 26.8 26.1 28.8 27.2 31.1 29.3 23.4 37.2 25.5 27.2 25.7 27.7 37.5 26.1 30.3 28.1 35.8 24.7 28.3 27.4 37.5 26.0 28.2 27.5 37.1 27.8 29.8 30.4 35.3 25.5 30.0 35.4 25.8 28.0 29.2 37.9 29.8 23.4 26.5 27.8 23.1 36.9 25.4 29.1 29.1 28.7 23.9 29.1 26.4 30.8 27.6 29.7 37.0 25.6 22.2 27.4 28.9 27.6 27.4 22.0 27.4 28.8 28.5 28.2 26.4 26.0 26.4 24.2 21.5 27.7 25.0 30.9 28.9 27.3 26.0 23.4 28.1 27.8 23.5 28.5 29.0 30.2 27.4 28.6 24.6 23.9 25.7 32.0 28.0 22.4 29.0 36.1 23.9 28.8 31.4 28.1 25.5 34.4 21.4 26.9 36.4 28.7 25.5 22.2 37.0 21.4 28.6 28.7 26.4 36.2 24.0 27.4 
26.2 21.5 36.3 24.4 30.3 25.2 20.8 36.7 24.6 24.4 29.5 26.3 36.7 24.4 25.0 24.8 36.1 20.8 25.4 26.9 33.2 21.6 27.5 34.3 25.7 22.4 23.7 24.2 25.7 25.2 25.5 24.0 21.6 26.3 27.8 28.3

30.0 26.9 24.2 32.3 29.3 28.6 27.1 27.7 31.2 26.9 22.7 30.0 25.7 22.4 34.3 37.4 30.2 27.4 24.7 29.5 31.5 27.0 30.3 28.9 30.4 27.2 24.8 32.6 29.5 28.8 27.1 27.1 31.3 27.0 23.4 30.2 26.2 22.7 35.0 36.7 30.1 29.7 24.7 28.4 30.9 27.2 29.9 29.3 26.5 28.5 22.9 30.0 21.3 29.7 28.1 29.4 25.6 27.1 18.2 30.1 25.1 22.3 34.1 36.2 28.8 22.9 26.0 32.1 24.6 27.1 29.2 28.1 22.7 27.0 23.7 27.7 21.3 29.7 27.9 26.3 23.8 26.1 18.8 27.6 24.6 22.2 32.8 33.0 27.4 21.9 25.3 33.1 22.7 27.8 28.5 28.9 31.7 28.4 25.1 27.5 30.0 25.9 31.6 27.0 23.0 27.4 20.7 30.2 26.6 24.9 29.4 37.7 28.5 22.9 33.8 30.2 27.1 27.4 29.2 31.4 32.3 27.8 24.6 26.4 36.3 27.2 32.8 25.9 24.2 27.5 21.7 28.6 26.6 26.2 32.5 36.3 28.7 23.7 33.5 27.9 28.6 26.9 29.7 30.2 28.8 26.2 22.9 26.4 36.7 29.1 34.1 25.0 22.9 28.2 21.2 28.9 26.3 26.4 31.2 36.8 29.6 25.9 30.0 28.7 27.1 24.8 28.5 29.4 32.9 27.2 23.7 25.8 37.4 28.2 30.7 25.4 23.8 27.0 20.8 27.5 26.2 25.7 31.2 37.4 28.6 24.1 28.9 29.3 22.8 25.0 28.8 29.0 34.4 23.4 21.3 24.6 27.1 24.4 26.6 23.8 23.5 24.4 19.7 29.9 23.4 21.8 29.6 34.0 29.3 30.0 26.8 28.3 29.6 26.3 26.1 27.7 36.1 24.9 22.0 25.8 26.4 26.5 26.9 24.7 24.5 26.5 21.6 28.8 24.7 22.5 30.6 36.1 31.0 30.0 26.5 28.7 27.7 26.3 27.3 28.1 25.5 28.7 27.5 27.2 22.6 26.9 27.8 24.2 23.4 26.3 25.0 27.6 28.8 25.0 33.4 26.5 29.4 28.8 28.9 26.4 30.7 32.6 32.4 33.3 24.7 25.7 27.9 25.9 23.7 26.6 28.5 23.1 23.6 25.9 24.6 26.8 29.8 25.7 32.5 26.2 28.6 29.6 29.7 25.9 30.9 30.4 32.8 32.3 26.0 26.6 26.9 26.6 21.6 25.7 27.1 23.3 22.6 25.5 23.8 26.9 28.1 26.6 33.0 25.6 28.4 28.2 30.4 25.7 32.3 37.5 31.1 35.4 29.9 27.5 26.9 30.5 21.9 27.1 28.6 22.7 21.5 25.5 23.9 28.4 26.4 22.9 34.9 32.6 29.9 30.0 37.0 27.0 32.0 28.1 31.1 29.9 32.7 25.0 31.2 33.1 19.9 27.6 28.1 26.4 31.4 25.1 23.9 28.7 26.2 28.8 33.0 26.8 30.0 28.4 33.7 28.9 34.0 28.8 31.4 31.7 31.8 24.4 29.3 33.3 19.1 28.1 28.0 25.2 31.0 24.8 23.7 26.1 24.8 28.1 31.0 27.5 29.2 29.3 31.8 27.0 34.4 27.8 30.1 32.0 32.1 27.0 28.4 36.4 22.4 26.9 28.0 25.1 
31.4 25.7 26.0 28.8 25.2 24.9 35.3 26.2 31.6 32.5 30.2 30.0 37.5 25.4 30.8 29.7 32.2 22.6 25.6 29.3 22.1 26.2 27.0 24.5 31.2 23.5 20.6 26.8 23.7 26.6 29.9 28.6 30.9 29.8 33.6 26.1 33.5 28.0 29.5 33.3 36.3 24.4 28.5 29.3 19.5 26.8 28.3 25.6 28.9 24.4 20.2 28.9 25.9 26.1 32.4 28.2 31.5 29.5 33.6 27.1 30.8 28.6 30.8 32.3 31.7 26.4 26.6 33.8 21.7 25.6 27.0 25.0 28.1 24.3 22.2 28.6 24.4 23.9 33.7 27.1 34.0 31.9 30.6 30.2 35.3 27.1 30.3 36.8 30.4 25.7 26.0 35.5 20.6 26.5 27.7 24.4 31.0 24.3 23.1 27.3 23.9 25.1 32.3 27.8 34.5 29.0 32.2 26.5 34.2 27.2 30.8 36.1 29.3 26.2 30.3 33.8 18.6 26.8 28.1 26.1 32.7 24.8 24.4 29.1 25.3 28.0 32.0 26.2 29.6 28.0 35.0 29.4 32.5 27.9 30.4 33.2 29.3 25.7 30.1 37.4 19.3 26.3 28.4 26.4 32.7 24.9 24.0 29.3 25.5 28.0 32.2 26.0 29.4 27.9 35.0 29.7 32.1 27.8 30.5 33.6 29.7 27.4 28.9 37.2 21.5 26.9 28.2 25.4 30.5 25.1 25.5 29.3 25.7 25.0 34.7 26.0 31.3 31.3 30.2 30.6 34.4 26.2 31.4 29.6 CategoryCore down ACore down BCore up CCore down ACore down BCore down CCore down DCore down ACore down BCore down CCore up Core down Core up ACore up BCore down ACore down BCore down ACore up BCore up CCore down D AMarker BCore up ACore up B Core up LUM-Hs00158940_m1 MAP6-Hs01929835_s1 SULF2-Hs00378697_m1 LMO2-Hs00277106_m1 TAGLN-Hs00162558_m1 MAF-Hs00193519_m1 EDA2R-Hs00939736_m1 SEMA6A-Hs00221174_m1 CA12-Hs01080909_m1 SDC2-Hs00299807_m1 MT2A-Hs02379661_g1 PDZRN3-Hs00392900_m1 FAM69A-Hs00961685_m1 PLS3-Hs00418605_g1 SALL2-Hs00826674_m1 TES-Hs00210319_m1 LPAR6-Hs00271758_s1 NNMT-Hs00196287_m1 PRSS12-Hs00186221_m1 NTN1-Hs00180355_m1 CHI3L1-Hs00609691_m1 ADD2-Hs00242289_m1 LYST-Hs00179814_m1 PLA2G4A-Hs00233352_m1 Well Applied Biosystems assay ID49 Gene5051 NELL2-Hs00196254_m152 CB130 CB130 CB13053 CB152 PDE1C-Hs01095694_m1 CB152 CB152 CB152 Core54 CB171 down CB171 CB171 CB192 CB54155 TUBB-Hs00962420_g1 CB660 CB660 24.8 G2 Core up56 24.8 G257 MAN1C1-Hs00220595_m1 24.858 G7 30.3 Norm 28.159 Core RTN1-Hs00382515_m1 G7 up 32.8 29.160 33.1 G761 28.5 NPTX2-Hs00383983_m1 
20.3 28.1 27.1 G762 25.1 Core down 21.7 27.4 27.663 G9 SOX10-Hs00366918_m1 24.8 Core 21.7 26.7 27.2 28.4 down64 G9 24.0 21.3 23.465 26.0 28.2 DDIT3-Hs00358796_g1 35.6 G14 25.3 Marker 20.466 23.3 28.7 29.7 G14 31.0 25.467 22.5 C9orf125-Hs00260558_m1 29.8 27.1 29.8 30.6 25.068 Core 22.9 28.6 25.9 30.7 up 37.4 30.6 23.969 SLIT2-Hs00191193_m1 Core 20.2 28.3 down 27.7 30.4 37.4 26.670 25.8 22.4 26.2 25.6 26.7 28.3 37.1 33.2 32.171 IL17RD-Hs00296982_m1 29.6 22.2 27.7 26.9 28.0 29.4 37.1 32.7 30.072 28.4 Core down 21.3 26.0 30.6 27.2 26.7 29.3 36.873 32.9 37.5 LGALS3-Hs00173587_m1 Core 30.5 29.7 down 20.8 26.874 31.6 25.4 28.5 34.6 36.8 35.3 31.0 33.5 25.5 21.175 23.8 TERT-Hs00972656_m1 27.1 29.1 27.4 33.0 36.8 31.4 Core 35.3 up 24.7 33.076 28.4 21.1 24.3 26.1 29.8 27.3 37.5 33.9 24.4 33.6 29.277 29.0 PTEN-Hs02621230_s1 20.6 24.7 25.6 25.8 25.9 27.5 35.4 35.0 23.4 35.1 26.178 20.4 Marker 24.0 28.5 26.4 26.2 25.9 34.4 29.0 36.9 34.2 25.7 21.2 23.5 24.979 26.4 KCTD12-Hs00540818_s1 30.8 25.2 26.7 20.9 28.7 37.0 26.3 33.7 26.6 24.580 25.5 23.5 Core 29.6 22.3 25.1 37.4 26.8 down 26.5 28.5 37.5 36.381 36.1 SOX2-Hs01053049_s1 23.8 24.9 21.3 24.8 27.1 Core 30.3 37.4 down 34.4 26.4 28.7 35.282 30.4 24.2 29.4 21.4 24.9 23.8 26.6 31.4 25.6 37.1 20.783 26.3 31.0 28.1 37.9 26.6 ODZ2-Hs00393060_m1 27.9 29.0 29.2 23.8 24.9 21.4 32.8 28.484 37.1 26.7 30.7 28.1 27.0 27.7 30.7 28.3 25.8 21.2 23.2 Marker 27.8 31.885 25.4 29.2 CACNG8-Hs01100182_m1 36.8 26.5 26.8 30.4 28.3 29.5 29.9 26.1 37.4 30.786 25.6 24.5 Core 36.4 26.3 33.4 26.4 down 30.3 27.0 36.8 26.0 25.0 25.6 87 Core FOXJ1-Hs00230964_m1 25.6 24.3 down 24.6 27.6 37.2 35.2 30.5 24.3 33.7 27.4 28.788 25.1 26.3 29.4 21.7 36.7 29.6 25.2 37.5 37.4 24.5 32.1 26.6 25.7 27.889 28.3 26.6 LMO3-Hs00375237_m1 36.7 21.8 36.1 35.4 33.4 28.1 24.6 29.4 Core 33.4 29.3 36.290 27.6 down 26.3 33.9 26.9 24.3 30.1 37.9 36.9 32.8 26.5 30.191 26.7 CD9-Hs01124025_g1 28.3 26.8 26.3 36.1 25.7 28.0 37.4 26.0 26.2 37.0 32.8 27.492 28.1 Core 27.5 26.3 27.4 27.7 
down 33.3 25.8 28.1 31.2 22.293 27.3 37.2 36.8 31.3 MYL9-Hs00697086_m1 29.7 29.6 26.8 27.4 31.4 32.8 24.1 29.6 27.3 31.4 20.694 36.6 26.8 33.1 26.5 26.1 28.1 21.4 27.0 26.8 28.995 27.6 Core 24.8 LAMA2-Hs00166308_m1 27.5 26.8 36.3 up 34.7 25.8 27.7 29.2 24.0 33.4 29.196 26.0 27.5 Core 20.8 down 32.6 34.0 27.4 24.0 28.1 28.3 25.9 25.5 29.8 34.7 BACE2-Hs00273238_m1 31.8 22.0 26.8 Core 27.8 33.1 25.8 25.8 27.0 36.1 down 29.4 32.8 26.5 36.8 26.4 24.4 22.4 34.4 30.8 35.3 24.7 32.0 26.8 25.7 35.6 35.3 34.8 22.7 30.3 23.6 32.0 Core 27.4 31.1 28.7 24.9 up 26.4 27.4 37.4 30.7 28.5 30.9 32.6 31.2 23.6 25.0 27.8 25.7 27.1 21.1 33.1 32.1 25.1 37.1 33.4 30.1 24.3 28.0 24.1 26.3 32.6 34.8 22.4 26.9 25.6 37.1 35.6 36.7 27.4 31.9 32.8 23.7 25.5 28.2 23.5 27.1 36.2 29.8 36.8 34.4 26.9 31.0 22.4 30.3 24.4 28.7 28.8 29.0 23.8 27.2 28.9 23.4 36.4 29.5 33.9 28.4 27.8 25.1 34.4 23.2 28.2 27.0 26.9 37.2 27.2 33.0 26.2 29.2 32.7 21.9 28.2 28.4 26.0 28.5 27.1 37.5 22.4 29.1 25.7 29.3 26.8 29.8 28.7 27.7 25.0 28.7 35.4 29.2 29.5 27.5 27.1 31.6 27.1 24.1 27.1 29.2 36.9 29.3 36.1 24.6 32.3 27.8 31.6 23.0 26.6 26.3 34.4 24.1 31.1 30.5 32.9 26.2 28.1 31.6 25.6 27.5 31.7 29.1 32.7 35.0 36.7 27.6 24.5 28.1 26.9 27.3 36.2 20.1 30.7 34.1 28.6 33.9 29.5 19.6 33.3 35.3 33.5 32.2 31.1 22.8 28.8 36.1 32.6 31.3 22.2 33.0 37.9 28.1 22.1 27.9 37.4 31.3 32.6 21.9 26.6 25.0 27.1 23.8 22.0 23.9 25.9 23.0 23.4 27.0 25.5 23.8 33.3 26.5 28.5 23.8 33.4 28.0 31.4 27.0 30.7 26.0 24.7 25.4 25.4

31.3 30.0 28.0 25.8 36.9 11.2 23.1 25.1 28.5 25.1 36.9 21.0 26.6 27.9 30.2 36.9 18.1 31.5 25.6 35.0 30.6 28.5 29.7 30.8 29.0 31.1 30.3 27.3 25.8 36.2 11.0 23.6 25.5 27.9 25.7 36.2 21.4 27.5 27.7 29.8 36.2 18.5 31.9 25.3 36.2 30.7 28.6 30.2 30.5 29.6 28.7 31.5 28.8 28.7 37.1 13.9 24.2 30.6 34.8 28.9 31.6 26.7 30.1 31.8 37.1 33.2 18.0 37.1 25.1 37.1 35.1 32.2 30.7 31.1 27.6 29.3 31.4 29.4 26.2 38.0 13.0 24.7 29.0 37.5 33.1 30.8 29.3 28.4 32.0 38.0 34.7 18.5 38.0 23.6 36.8 35.7 32.3 30.7 28.9 29.8 32.8 31.8 30.4 25.0 35.7 12.6 23.5 30.2 34.2 35.8 27.6 21.6 32.9 29.5 35.8 29.7 22.6 35.8 26.5 25.8 28.9 32.6 29.0 29.6 35.8 31.7 33.6 29.6 24.2 32.2 12.0 23.2 28.7 31.7 34.5 27.0 21.9 30.1 27.8 34.5 29.1 21.4 34.5 24.8 26.1 29.1 33.3 29.1 29.6 34.5 32.9 31.3 30.6 29.6 31.8 12.7 23.7 29.7 36.2 30.9 32.7 31.2 35.4 28.3 35.6 35.6 22.5 37.7 27.7 25.7 27.1 33.1 27.8 28.2 37.7 31.9 30.9 30.5 28.1 33.0 12.4 23.7 27.7 34.7 30.8 29.8 28.8 28.5 27.3 37.8 37.8 23.5 37.8 28.1 26.3 27.6 33.3 27.4 27.4 35.7 35.4 31.1 29.8 28.3 32.3 12.5 23.2 27.0 35.1 37.6 37.6 32.6 37.6 27.0 37.6 36.7 23.5 37.6 27.5 25.7 27.8 33.4 27.3 28.1 31.8 36.0 31.0 30.0 28.4 32.1 12.3 23.3 27.5 35.2 37.9 37.9 34.6 36.9 27.4 37.9 35.6 23.9 37.9 28.0 26.4 27.4 34.0 27.6 28.3 33.3 34.2 30.7 30.3 24.2 37.6 12.5 23.2 24.6 35.2 25.5 26.2 24.5 31.3 30.1 37.6 37.1 21.5 37.6 25.3 37.6 27.7 32.1 28.1 28.1 37.6 34.4 30.3 30.3 23.9 37.4 12.4 23.2 24.6 34.5 25.3 25.6 24.3 31.5 30.2 37.4 35.4 21.4 37.4 25.6 37.3 28.4 32.8 28.3 27.6 37.4 29.6 32.0 30.2 26.6 36.7 11.8 25.1 28.9 35.8 36.7 29.8 24.2 26.5 33.8 36.7 36.7 20.9 36.7 27.0 26.8 35.1 33.3 32.1 30.8 30.9 29.5 30.8 30.3 24.4 37.6 12.3 23.9 25.4 34.9 37.6 28.2 24.4 28.4 35.1 37.6 37.6 22.0 37.6 27.4 27.8 34.3 32.7 31.4 31.2 32.0 30.2 30.4 29.8 26.4 37.4 12.2 24.2 27.7 34.3 35.4 27.6 23.3 27.9 29.7 37.4 36.3 19.6 37.4 24.7 26.1 29.3 32.2 29.4 30.5 37.4 30.4 30.0 29.5 26.2 36.7 12.0 24.1 27.2 32.5 36.6 27.0 22.1 27.2 29.7 36.7 35.7 20.0 
36.7 25.1 25.6 30.2 32.1 29.2 30.4 36.7 27.1 31.1 29.9 24.0 36.6 12.6 23.8 23.6 35.5 32.5 28.1 27.4 26.2 30.2 36.6 34.5 19.6 36.6 22.6 25.4 26.9 35.5 27.6 27.9 27.1 28.2 31.8 30.4 25.4 36.3 12.2 24.9 25.5 37.2 34.6 28.3 28.4 29.3 30.9 36.9 35.0 20.4 37.5 22.8 26.4 26.9 35.8 28.4 29.0 33.3 32.1 30.0 27.9 27.4 35.0 12.3 23.7 29.0 30.8 25.1 34.0 37.1 32.3 33.9 28.8 30.6 17.9 36.5 26.5 37.1 33.6 27.3 33.1 31.9 26.1 29.4 30.3 28.5 25.5 37.3 12.1 23.7 26.5 30.4 24.7 33.2 35.6 30.9 33.1 28.6 30.2 18.6 37.2 25.8 37.3 32.1 27.7 33.5 30.6 23.6 33.4 33.2 31.1 22.7 35.7 12.7 22.6 27.7 32.2 37.3 25.2 35.4 29.3 28.4 37.3 37.3 27.6 37.3 28.0 30.5 28.3 33.5 29.6 27.7 37.3 34.7 33.0 30.9 23.4 36.2 12.9 22.8 27.3 31.9 37.9 24.5 34.3 33.2 29.4 37.9 37.9 30.1 37.9 28.8 30.3 28.9 33.9 30.7 28.4 37.9 35.7 31.2 30.0 24.6 31.0 12.7 24.8 25.2 35.0 23.0 28.5 25.7 30.4 27.3 36.6 33.6 19.7 28.9 24.7 23.7 34.5 31.6 30.3 30.2 31.4 34.3 31.3 30.9 25.0 31.4 13.2 24.8 27.3 37.3 23.5 29.3 27.3 29.2 28.1 37.3 33.4 18.8 29.3 24.4 24.1 32.5 30.2 29.2 30.8 30.4 CategoryCore down ACore down BCore down A 30.0Core up 30.7 B 26.4 26.5Marker A 33.2 30.4Core B down 33.6 36.6 ANorm 35.6 34.7 B 30.6Core up 31.0 A 33.9Core 34.3 down B 28.1 27.6Core down 29.5 A 29.3Core 33.5 down B 34.7 ACore down 31.9 33.7 BCore up 33.7 A 33.1 Core down BCore down ACore down BCore A down Core B up Core A down Core B up Core A down Core B down Core down Core down Core down Core down MYL9-Hs00382913_m1 RGS5-Hs00186212_m1 Hs00229612_m1 CEBPB-Hs00270923_s1 PDGFRA-Hs00998026_m1 NKX2-1-Hs00163037_m1 18S-Hs99999901_s1 LMO4-Hs01086790_m1 SPARCL1-Hs00949886_m1 FBLN2-Hs00157482_m1 TUSC3-Hs00185147_m1 PEG3-Hs00377844_m1 HLA-DRA-Hs00219575_m1 PI15-Hs00210658_m1 DTX4-Hs00392288_m1 IRX2-Hs01383002_m1 SIX3-Hs00193667_m1 S100A6-Hs00170953_m1 FAM38B-Hs00926225_m1 PMEPA1-Hs00375306_m1 NDN-Hs00267349_s1 MN1-Hs00159202_m1 HMGA2-Hs00171569_m1 RAB6B-Hs00981572_m1 KALRN-Hs00610179_m1 LUM-Hs00158940_m1 Well Applied Biosystems assay ID1 Gene23 
CHCHD10-Hs01369775_g14 G19 G19 Core ST6GALNAC5- up5 G21 G216 G23 26.67 C5orf13-Hs00854282_g1 G23 26.78 G24 27.89 G24 27.6 CCND2-Hs00922419_g110 G25 26.0 Core down 26.1 G2511 CTSC-Hs00175188_m1 26.5 G26 29.412 Core 25.6 down G26 27.613 DNER-Hs00294564_m1 26.2 G30 28.1 35.314 26.0 Core G30 28.1 up 30.0 25.815 G31 28.5 PLCH1-Hs00392783_m1 22.5 26.0 G31 25.316 Core 22.4 down 27.0 G32 22.1 22.217 28.2 SYNM-Hs00322391_m1 27.4 24.0 22.8 G32 31.7 Core 27.518 down 27.4 26.0 25.1 G144 27.9 30.419 27.1 G144 SALL2-Hs00413788_m1 27.1 25.8 28.2 26.3 32.0 G166 27.8 37.6 21.920 Core 26.7 25.9 down 27.1 G166 27.3 29.1 20.9 33.0 28.021 GPR158-Hs00393109_m1 27.2 G179 25.1 34.3 26.8 26.5 32.5 28.0 Core 25.122 G179 down 33.0 25.6 25.1 28.2 27.2 25.5 25.623 FUT8-Hs00189535_m1 28.8 37.9 22.2 26.8 28.8 Core 30.2 22.5 down 26.324 28.6 26.9 22.5 26.3 28.5 27.7 22.7 28.0 37.8 21.2 24.9 27.1 30.925 28.1 27.5 ASCL1-Hs00269932_m1 21.7 27.7 28.1 21.8 25.9 29.5 31.3 26.626 23.9 21.5 Core 27.7 24.4 24.3 up 27.6 28.8 30.9 34.6 21.427 EPDR1-Hs00378148_m1 23.9 27.9 24.2 24.5 28.5 30.0 37.1 35.8 21.9 Marker28 35.7 25.3 24.6 37.3 29.6 35.5 22.5 27.1 27.7 37.4 25.129 24.9 37.1 29.9 CXXC4-Hs00228693_m1 32.8 27.8 24.2 Core 30.0 25.1 25.1 32.8 28.1 29.9 up30 31.7 37.2 26.0 29.0 25.2 25.8 32.3 28.2 33.4 31.431 27.9 MAP6-Hs01023152_s1 30.3 25.4 37.4 21.4 26.8 30.8 28.0 Core 30.1 down 32.232 24.0 27.3 28.0 36.7 28.0 28.7 32.0 28.3 32.7 21.6 23.7 27.333 32.9 31.7 28.7 28.8 NKX2-2-Hs00159616_m1 30.0 27.7 26.6 27.4 29.0 22.2 27.6 33.9 31.1 30.134 27.8 30.0 26.8 31.4 Core 30.6 27.0 28.0 32.5 31.2 down 30.8 22.9 25.2 27.535 MMP17-Hs01108847_m1 30.3 29.9 25.6 32.7 28.7 Marker 31.9 25.1 30.4 27.1 23.536 28.2 33.5 26.3 29.8 34.5 29.3 29.6 29.4 27.6 27.1 27.9 35.1 28.037 23.0 28.1 OLIG2-Hs00300164_s1 29.9 31.0 27.9 29.7 35.8 Core 26.9 37.4 28.1 up 26.8 27.838 29.1 29.9 28.7 27.7 27.0 30.5 37.8 37.0 26.6 26.6 28.0 29.9 29.539 HOXD10-Hs00157974_m1 28.6 27.5 36.7 27.0 26.2 30.8 37.1 27.4 25.7 36.640 29.7 28.7 23.8 27.5 
Marker 27.6 38.0 28.1 37.4 28.8 27.7 31.4 29.141 24.0 26.7 FOXG1-Hs01850784_s1 Core 27.7 27.8 35.5 up 37.1 29.3 31.2 22.8 26.8 28.9 32.442 28.3 28.0 36.7 30.1 30.5 23.1 27.3 33.1 35.2 28.4 26.8 27.343 33.4 NDUFB10-Hs00605903_m1 31.2 28.9 22.9 25.8 28.7 30.2 28.8 26.4 27.4 33.7 Core44 33.1 30.2 27.8 up 23.4 23.0 29.7 28.0 25.2 26.3 26.3 Norm 32.345 29.3 25.6 MMRN1-Hs00201182_m1 23.2 29.9 30.8 37.9 25.2 26.7 26.0 28.8 28.546 27.1 29.6 37.3 26.4 25.3 26.2 24.7 28.8 24.8 26.1 27.6 28.3 28.047 27.0 27.7 NTRK2-Hs00178811_m1 23.8 36.1 Core 30.1 25.3 down 25.8 28.7 25.3 25.3 28.248 26.5 28.6 23.8 30.0 30.7 37.1 25.8 29.8 25.2 25.3 34.0 23.349 28.1 29.4 31.0 CD74-Hs00269961_m1 26.1 28.1 22.8 29.1 25.4 25.9 30.9 Core 31.1 24.8 30.5 up 26.550 31.4 23.3 29.2 25.6 25.9 26.9 37.9 25.5 31.2 26.6 31.2 23.7 29.3 25.6 25.5 NELL2-Hs00196254_m1 32.9 37.3 24.8 28.8 30.3 24.6 30.6 30.1 25.4 25.2 24.3 33.3 24.9 29.4 Core 36.1 28.4 22.1 up 30.9 25.6 24.6 24.1 31.7 34.7 25.1 30.0 21.8 29.3 25.3 34.9 30.4 37.5 25.1 Core 24.8 29.9 33.8 down 23.5 29.2 25.7 30.0 36.6 26.9 26.7 27.9 29.4 23.4 29.7 25.7 26.7 33.4 31.9 36.7 25.9 25.9 23.4 29.5 25.2 25.3 28.9 32.8 32.7 37.4 23.8 26.4 33.0 23.4 29.6 25.6 34.1 33.6 27.6 32.6 24.1 30.3 35.4 22.3 29.8 25.7 29.3 32.7 31.2 22.3 33.0 26.6 30.9 28.1 25.5 22.4 31.8 23.1 37.4 22.2 34.3 26.2 27.8 32.2 23.8 27.6 37.6 22.2 32.1 28.4 26.1 30.3 23.9 37.9 22.5 28.3 27.8 27.5 26.2 33.3 29.2 26.0 37.6 23.8 21.9 25.9 27.1 36.6 24.7 37.8 29.7 23.3 23.6 26.0 37.4 24.6 37.7 27.1 24.8 30.4 24.8 33.1 35.6 34.5 25.4 25.2 27.1 34.1 37.5 25.7 25.1 35.8 25.5 31.2 28.2 25.0 21.0 34.8 25.1 32.0 32.5 37.5 27.9 21.2 23.8 35.1 25.7 34.1 28.0 23.9 29.0 32.8 25.2 29.0 31.0 23.5 32.1 29.8 22.4 27.5 24.9 23.5 26.8 27.3 29.7 30.1 27.3 26.7 32.0 21.5 31.4 21.8 31.0

26.3 26.0 28.9 26.7 34.5 34.3 27.6 25.9 25.3 18.8 28.9 25.9 24.9 30.7 33.3 31.2 23.4 31.1 29.8 21.7 28.7 27.7 28.7 26.6 26.0 29.0 26.9 34.7 34.6 27.2 25.6 25.4 19.2 28.2 26.6 25.5 30.6 35.0 31.3 23.5 30.7 28.6 21.5 28.2 28.1 29.0 34.0 25.5 35.6 29.8 28.7 30.2 27.2 26.9 28.6 18.6 31.8 26.1 22.5 31.9 36.2 30.3 24.1 24.4 35.1 28.4 25.5 28.1 27.5 31.9 26.2 34.3 31.4 28.1 32.0 26.3 26.9 27.9 19.8 31.8 27.1 22.1 33.4 37.2 29.8 24.0 24.3 31.5 24.0 26.6 29.1 29.6 29.8 23.5 31.4 30.2 32.1 32.8 25.0 25.1 27.8 21.7 31.7 25.3 23.4 33.4 35.8 31.7 26.8 24.6 26.8 35.8 26.5 28.2 27.2 32.2 25.0 30.0 27.4 31.4 34.4 24.7 25.8 26.7 21.8 30.5 25.6 24.0 32.8 34.5 31.6 25.0 24.8 25.5 34.5 26.4 27.6 26.5 30.7 25.8 26.1 29.4 27.5 32.2 23.7 23.9 29.6 22.4 26.7 25.8 25.0 31.8 37.7 29.4 29.0 27.5 26.3 29.9 25.6 30.6 26.8 29.1 25.7 26.6 29.3 28.4 30.6 23.6 24.8 29.6 22.1 26.9 25.7 25.6 31.8 37.8 29.8 30.6 27.7 26.5 29.0 25.6 29.9 27.1 30.4 25.0 26.5 31.0 28.0 35.0 23.4 24.4 28.2 22.0 26.9 26.0 25.4 32.1 37.6 29.3 28.7 28.3 26.5 28.2 25.5 29.8 26.8 30.6 25.9 26.9 33.7 27.8 35.4 23.4 24.8 29.0 22.3 27.3 26.1 25.3 32.0 37.9 29.9 29.9 28.4 26.4 28.5 25.8 29.8 27.4 27.6 23.2 27.4 25.0 28.0 28.8 24.4 28.4 29.2 22.7 29.5 26.9 25.1 32.4 37.6 28.6 27.8 37.6 27.4 23.4 26.5 29.3 26.7 27.3 23.5 26.3 26.1 27.8 28.8 24.3 27.1 29.4 22.2 29.0 26.8 25.3 32.8 37.4 28.6 27.6 37.4 27.4 22.1 26.2 28.7 26.9 31.6 25.4 30.2 29.0 28.4 32.2 26.4 25.9 26.3 21.5 28.3 25.6 23.6 34.2 36.7 31.9 24.2 26.6 30.2 27.1 31.7 31.3 31.8 32.0 24.3 27.9 29.7 27.3 32.7 25.2 24.1 26.4 20.6 28.3 24.9 23.1 33.5 37.6 31.1 24.3 25.9 28.7 24.4 30.9 29.9 31.8 30.7 24.1 34.4 25.4 31.9 35.2 27.6 27.3 25.3 22.9 26.4 25.9 23.7 33.1 34.6 33.2 27.3 27.0 28.4 31.1 28.3 29.6 27.4 29.2 24.3 34.0 26.2 31.1 35.2 27.4 27.6 25.6 23.7 26.4 25.4 23.9 32.8 33.8 33.0 27.9 27.4 27.7 29.0 28.3 28.6 26.8 28.3 22.0 29.9 21.1 28.3 27.8 24.5 30.0 25.2 22.7 36.6 23.6 22.4 32.1 36.4 28.7 29.9 31.5 24.6 23.7 27.4 28.6 29.4 30.2 
23.1 31.3 21.4 28.5 28.4 24.1 31.8 25.6 23.3 36.2 25.7 22.1 33.9 34.5 29.2 33.2 31.5 25.9 24.8 27.2 29.7 30.6 30.2 26.4 35.6 27.3 30.2 26.9 26.2 26.6 26.4 22.0 37.1 27.8 23.1 37.1 36.6 31.4 27.2 24.6 35.7 37.1 25.3 28.5 29.5 29.1 24.3 31.3 24.9 27.5 29.2 25.9 25.8 26.2 21.2 34.2 25.1 23.1 33.8 34.5 30.0 24.1 25.1 34.5 33.4 26.8 29.9 29.6 25.7 26.1 26.2 33.3 28.9 28.6 23.6 23.6 36.3 21.8 27.0 25.6 25.8 33.4 37.3 29.7 30.9 33.8 26.6 28.1 25.5 31.0 29.3 25.9 27.0 27.2 37.9 28.7 28.8 23.3 23.3 36.0 22.6 27.3 24.6 25.3 32.6 37.9 29.7 30.9 31.2 27.5 30.7 24.9 30.7 30.2 26.1 24.3 30.1 29.7 27.5 27.4 25.9 28.1 26.5 22.6 27.9 24.4 22.3 32.2 36.6 29.5 27.9 25.3 28.5 25.5 27.3 29.8 28.7 26.0 23.9 30.6 28.4 28.2 27.5 26.9 30.4 26.3 22.9 29.3 25.3 21.9 34.2 37.3 29.9 28.3 24.5 29.5 31.6 26.8 30.1 28.4 CategoryCore down ACore up BCore A down BCore down ACore down BCore down ACore down BCore down A BCore down ACore up BCore down ACore up BCore up ACore B down ACore down BCore down ACore up BCore up ACore down BMarker ACore up B Core up Core up MAP6-Hs01929835_s1 SULF2-Hs00378697_m1 LMO2-Hs00277106_m1 TAGLN-Hs00162558_m1 MAF-Hs00193519_m1 EDA2R-Hs00939736_m1 SEMA6A-Hs00221174_m1 CA12-Hs01080909_m1 SDC2-Hs00299807_m1 MT2A-Hs02379661_g1 PDZRN3-Hs00392900_m1 FAM69A-Hs00961685_m1 PLS3-Hs00418605_g1 SALL2-Hs00826674_m1 TES-Hs00210319_m1 LPAR6-Hs00271758_s1 NNMT-Hs00196287_m1 PRSS12-Hs00186221_m1 NTN1-Hs00180355_m1 CHI3L1-Hs00609691_m1 ADD2-Hs00242289_m1 LYST-Hs00179814_m1 PLA2G4A-Hs00233352_m1 Well Applied Biosystems assay ID51 Gene5253 PDE1C-Hs01095694_m154 G1955 TUBB-Hs00962420_g1 G19 Core up56 G21 G2157 MAN1C1-Hs00220595_m1 G2358 24.8 Norm G2359 25.0 Core RTN1-Hs00382515_m1 up G24 28.260 G24 28.961 G25 NPTX2-Hs00383983_m1 20.7 25.2 27.9 G2562 Core 21.3 24.2 26.9 down G2663 20.9 24.1 SOX10-Hs00366918_m1 25.8 G26 Core 25.1 20.9 down 23.364 25.2 G30 24.8 21.5 24.1 32.165 DDIT3-Hs00358796_g1 30.1 G30 33.6 21.4 24.0 34.2 Marker66 27.8 G31 32.9 21.4 26.1 29.7 29.0 G3167 27.8 
C9orf125-Hs00260558_m1 21.3 25.5 28.8 32.2 G32 28.8 21.568 26.9 26.9 Core up 37.3 28.9 G32 29.8 21.2 26.5 26.769 SLIT2-Hs00191193_m1 Core 36.6 29.3 G144 down 28.7 21.5 28.9 24.870 37.9 34.5 G144 32.2 21.8 28.5 25.9 27.0 31.0 G166 37.3 36.671 32.3 IL17RD-Hs00296982_m1 21.1 28.7 25.6 26.8 31.1 G166 37.3 36.7 25.572 21.2 28.5 Core 25.9 28.0 G179 37.9 down 37.1 35.0 26.2 20.7 25.073 25.8 G179 28.1 LGALS3-Hs00173587_m1 37.3 37.0 27.8 25.8 Core 20.6 25.4 down 24.9 25.274 34.8 25.2 36.6 26.5 25.8 20.6 23.8 26.0 22.4 37.1 24.475 32.3 TERT-Hs00972656_m1 26.6 23.9 28.9 20.5 Core 34.2 25.5 27.6 up 29.1 25.2 31.8 26.576 27.7 21.2 32.3 26.3 23.6 26.8 28.5 37.6 31.2 26.4 29.8 26.277 PTEN-Hs02621230_s1 26.8 21.2 31.8 36.7 26.3 25.5 29.3 26.9 30.3 26.3 25.178 26.3 Marker 32.3 35.4 34.2 20.7 24.0 27.0 28.1 25.5 32.8 24.3 28.3 30.479 35.6 30.2 KCTD12-Hs00540818_s1 24.5 30.6 25.2 20.1 28.1 28.7 27.4 31.3 37.9 31.780 26.8 33.5 25.0 Core 27.3 27.3 31.8 22.5 37.6 down 29.1 26.2 26.681 32.1 32.7 23.4 SOX2-Hs01053049_s1 27.2 27.9 32.5 37.8 Core 27.7 22.7 down 30.8 32.3 23.4 26.6 82 29.2 22.0 28.4 37.9 29.0 37.7 27.4 31.8 27.1 29.2 21.7 30.883 37.6 27.2 32.2 24.5 ODZ2-Hs00393060_m1 29.3 27.2 33.7 25.2 33.8 21.8 37.3 30.7 31.684 30.6 28.2 24.5 34.8 24.4 31.8 30.6 21.8 37.3 Marker 30.3 30.4 26.585 CACNG8-Hs01100182_m1 29.8 25.0 28.1 22.5 38.0 29.4 31.9 32.3 32.3 26.6 29.686 23.1 28.7 Core 31.3 30.4 23.2 down 37.1 27.1 29.2 27.6 23.8 29.187 Core FOXJ1-Hs00230964_m1 31.0 31.9 24.0 down 27.0 27.9 25.3 24.7 36.2 32.3 28.5 25.9 30.8 26.388 23.5 26.9 30.4 24.9 29.6 32.0 26.0 23.8 36.9 31.2 25.9 22.9 34.189 27.4 LMO3-Hs00375237_m1 26.3 30.3 30.5 27.6 32.1 31.7 27.6 23.0 26.4 Core 26.4 35.190 down 25.4 31.0 28.1 28.9 30.2 28.5 24.8 33.1 25.1 26.6 31.191 25.9 28.9 25.0 34.8 CD9-Hs01124025_g1 29.9 28.9 24.6 32.8 25.3 33.5 28.5 30.4 26.192 30.7 29.0 24.6 29.0 Core 30.1 down 25.2 28.7 29.9 28.6 30.2 28.0 24.4 30.793 31.8 MYL9-Hs00697086_m1 28.2 31.3 31.0 26.5 24.8 37.3 30.3 28.2 23.2 31.294 30.6 31.5 
36.6 31.6 27.2 36.6 29.9 30.9 23.9 33.8 25.195 Core 31.1 33.3 LAMA2-Hs00166308_m1 31.4 33.0 up 29.4 31.7 23.4 36.5 27.5 31.4 33.3 23.2 25.596 35.2 30.2 24.5 Core 31.1 down 28.2 34.5 34.3 26.5 31.9 31.5 23.0 23.8 30.5 BACE2-Hs00273238_m1 34.0 Core 23.4 30.2 31.0 33.0 33.7 28.3 23.2 down 35.8 29.9 23.2 27.3 23.6 31.0 32.5 37.5 23.2 29.9 31.2 29.8 30.1 30.2 27.6 23.3 28.0 32.3 36.6 23.0 30.3 Core 33.9 28.9 up 31.8 23.9 30.4 26.9 31.2 31.6 36.7 22.8 30.9 37.8 33.2 24.7 28.7 34.7 27.6 37.4 23.3 30.1 28.2 36.2 37.3 30.4 24.4 27.9 29.8 28.7 23.4 28.3 25.1 25.9 34.8 28.8 30.7 26.0 27.1 23.3 28.6 31.4 23.9 25.2 29.9 27.7 24.6 26.5 22.9 32.4 31.3 28.2 35.6 37.5 22.1 25.5 28.3 31.2 27.7 30.6 37.3 31.0 35.6 22.9 28.1 28.0 31.2 27.0 29.7 31.4 31.2 23.5 26.8 32.3 28.5 30.3 30.5 21.9 33.4 24.2 31.9 27.6 28.2 27.2 26.9 30.9 32.8 22.8 28.7 21.8 29.3 32.0 26.1 32.2 27.4 23.0 27.9 36.2 34.5 25.7 33.3 31.0 23.5 34.7 37.1 35.8 25.9 31.8 23.5 32.8 27.2 30.5 31.1 23.4 38.0 31.1 28.9 29.4 23.8 31.5 31.8 37.1 27.8 36.0 22.6 30.5 28.3 35.1 35.4 22.7 31.6 30.8 26.8 34.2 30.4 23.4 26.7 27.0 30.3 23.2 26.2 33.7 30.8 23.4 26.6 26.3 34.1 22.9 26.8 26.3 33.2 26.8 34.9 25.7 27.0 26.9


A.4 Tag-seq vs qRT-PCR Correlation

The correlation between the qRT-PCR measurements from the 21 cell lines (16 GNS and 5 NS) and the Tag-seq measurements from the five cell lines (three GNS and two NS) was computed for each of the 82 core differentially expressed genes, as determined by Tag-seq on the three GNS and two NS cell lines. For each gene, the normalised Ct values and the tag counts were passed to the cor function from the R stats package.
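The per-gene calculation described above can be sketched as follows. This is a minimal illustration in Python rather than R, assuming paired measurements for the same cell lines; the function name `pearson` and the example values are hypothetical and are not taken from the thesis data. R's cor computes the Pearson coefficient by default, which is what the sketch reproduces.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for one gene across five cell lines (not thesis data).
qpcr_values = [1.0, 2.0, 3.0, 4.0, 5.0]
tag_counts = [12.0, 19.0, 33.0, 41.0, 48.0]
r = pearson(qpcr_values, tag_counts)
print(round(r, 4))
```

Repeating this call once per gene would give one coefficient per row of a table like Table A.5.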

Table A.5: Pearson correlation values between the normalised Ct values measured through qRT-PCR and the tag counts measured across the five GNS and NS cell lines assayed via Tag-seq.

Gene name    Pearson correlation    Gene name    Pearson correlation
PTEN         0.9973                 MAN1C1       0.9953
NDN          0.9942                 HMGA2        0.9932
HOXD10       0.9925                 TUSC3        0.9888
TES          0.9848                 SYNM         0.9823
SIX3         0.9800                 LYST         0.9797
MYL9         0.9777                 PLA2G4A      0.9757
IRX2         0.9749                 DDIT3        0.9748
MT2A         0.9741                 MMP17        0.9735
BACE2        0.9684                 MAF          0.9672
CA12         0.9653                 LMO3         0.9652
NKX2-1       0.9625                 IL17RD       0.9571
SEMA6A       0.9560                 CXXC4        0.9520
FAM69A       0.9475                 ADD2         0.9451
SPARCL1      0.9416                 LMO2         0.9391
CHCHD10      0.9388                 PLCH1        0.9362
PEG3         0.9345                 CCND2        0.9344
KALRN        0.9310                 FAM38B       0.9278
EDA2R        0.9270                 GPR158       0.9222
LMO4         0.9203                 CD9          0.9187
ODZ2         0.9168                 ST6GALNAC5   0.9139
MAP6         0.9116                 NPTX2        0.9047
CACNG8       0.9045                 SLIT2        0.9016
MMRN1        0.8906                 CD74         0.8906
TAGLN        0.8874                 PDE1C        0.8832
CEBPB        0.8813                 DNER         0.8811
FUT8         0.8804                 C9orf125     0.8772
RAB6B        0.8709                 PI15         0.8654
NTN1         0.8620                 SULF2        0.8602
KCTD12       0.8384                 PMEPA1       0.8322
LUM          0.8263                 SALL2        0.8078
NNMT         0.8034                 HLA-DRA      0.7884
RTN1         0.7777                 MN1          0.7714
RGS5         0.7679                 CTSC         0.7661
C5orf13      0.7595                 PLS3         0.7439
NELL2        0.7146                 NTRK2        0.7115
LPAR6        0.7090                 EPDR1        0.7053
LAMA2        0.6889                 SDC2         0.6595
S100A6       0.6580                 FOXG1        0.6051
PDZRN3       0.6032                 FBLN2        0.5795
PRSS12       0.5211                 LGALS3       0.5204
DTX4         0.4964                 FOXJ1        0.4454

Appendix B

Literature Mining Script

Note that text preceded by a single quotation mark (’), which appears in italic typeface, is a comment and not code to be executed.

Public Class Form1
    ’GBMbase iHOP PubMed BioGraph Google Scholar Google Search
    Dim genes() As String = {"SFTA3", "SLC4A4", "STRA6", "IL6", "SNX22", "COL21A1", "CA12", "CCND2", "SPINT1", "INHBA", "GFAP", "GPC3", "HOXC10", "PDE1C", "FAM70A", "GEM", "KALRN", "NTN1", "INHBB", "CCNO", "C10orf81", "CTNNA2", "APLN", "VAX1", "PCDHB4", "KCTD12", "ELN", "IFI30", "BMP8B", "CAMK2B", "LMO2", "TECRL", "RGS5", "NOV", "MOCOS", "SCN1B", "TNFSF4", "C14orf143", "LRRC2", "PLS3", "AP000280.1", "C21orf62", "C10orf116", "VIPR1", "ELMO1", "UGT8", "SH3BGR", "LIF", "MX1", "LRAT", "FXYD3", "CNTN6", "NNMT", "ZIC5", "GBP2", "TRIM47", "FAM196A", "C10orf141", "LMO4", "MYC", "RTP4", "C9orf125", "DPY19L1", "FCGR2B", "FCGR2C", "FCGR2A", "MAPT", "SLC15A2", "TGM2", "IRX2", "PCDH20", "ANGPTL1", "IL17RD", "ARC", "CHCHD10", "MAF", "CD55", "FBXO27", "PDZRN3", "ETS1", "SGCD", "CITED4", "MAP3K5", "STXBP5L", "ECHDC2", "CTSC", "MICA", "MMP17", "AC068399.1", "VIT", "TSHZ3", "CXXC4", "CABP7", "KCNMB2", "MT2A", "MYH3", "HOXD13", "STEAP2", "SLC4A11", "CCL7", "IFI27", "SPARCL1", "TNNI1", "OAS2", "TRPM8", "THBS2", "DOCK10", "ZIC2", "IFI6", "TNNI2", "CARD17", "CASP1", "CARD16", "ATOH8", "DNER", "FAM189A1", "SALL2", "C10orf11", "EEF1D", "WBSCR17", "TUBA4A", "RHBDF2", "PARP3", "MAN1C1", "DDO", "F12", "SPTBN5", "TGFA", "ZNF536", "HLA-DQB1", "IFITM2", "DNM3", "JPH1", "FAM69A", "CACNA1C", "ESRRG", "ITM2A", "SKAP2", "TRAM1L1", "NDN", "PLCH1", "KCNJ12", "NXPH1", "MYBL1", "DUSP5", "NTRK2", "RNF175", "HOMER1", "PTEN", "CMPK2", "TBC1D8", "TRIB3", "CDHR1", "SYNM", "LUM", "SEMA6A", "ADRA2A", "STAMBPL1", "TSPAN7", "PYGL", "FOXJ1", "ZNF454", "EML2", "GABRQ", "EPAS1", "ERBB4", "RAB6B", "LXN", "MYL9", "BGN", "FUT8", "GRIA3", "ARHGEF7", "SLC2A5", "MMP7", "TMEM132D", "PION", "C5orf13", "SOD2", "PDGFA", "LOXL4", "SPP1", "BATF3", "SNAP25", "SULF2", "LPAR6", "EPDR1", "EPHB3", "TSLP", "TMCO4", "SERPINE2", "TUSC3", "VSNL1", "ATAD3C", "MARCH1", "DKK1", "CEBPB", "NFE2L3", "TNNC2", "ODZ2", "OAS1",


"BEX5", "DIAPH2", "GBP3", "SORCS3", "SOX3", "FOXG1", "LRRN2", "ELMO2", "MAP6", "TRIM48", "CNKSR2", "MOSC2", "CRYBB2", "GJA1", "ELOVL2", "B4GALNT1", "C10orf90", "FBN2", "SMOC2", "ZIC3", "RARRES3", "GBP1", "EFEMP1", "GPR98", "PLD3", "OLFM1", "CCDC64", "MGLL", "THY1", "CPLX2", "FAM150B", "MKI67", "AC092296.1", "PHACTR3", "TNFRSF14", "NRG1", "RAB38", "ABAT", "NRBF2", "CLDN10", "C1orf94", "NELL2", "CYB5R2", "TNC", "HLA-A", "PPCS", "SORL1", "SHROOM3", "SLC38A1", "FXYD5", "CILP", "OGN", "MARVELD3", "APOD", "LEMD1", "DOCK5", "GBP4", "KCNA2", "FAM55C", "NFKBIZ", "RGL3", "HLA-DPB1", "XXbac-BPG116M5.1", "C2", "CFB", "NRP2", "HPSE", "CPNE5", "SMAGP", "FAM84A", "LOC653602", "NR4A2", "PERP", "ZNF281", "NID1", "ACIN1", "FOXQ1", "MATN2", "IRAK1", "NDE1", "COL8A1", "TFAP2A", "C1S", "ST6GAL1", "CD97", "ID4", "IGSF3", "NAMPT", "RP11-473I1.1", "EPB41L3", "NOTCH3", "GALNT5", "NLGN4X", "PITPNC1", "KLF6", "B3GNT9", "RIPK4", "S100B", "GYG2", "LOC283070", "CAMK1D", "ZNF747", "HOXA7", "NMU", "RAB7L1", "STX3", "SRGAP3", "CNTN1", "ATP1B2", "MAP7", "ADAMTS4", "F2RL1", "MIA", "MLC1", "ARHGAP20", "C4orf32", "EDA2R", "OTX2", "MEIS2", "SPRED1", "PPEF1", "TCF7", "TFCP2", "HPRT1", "TTF2", "SLITRK5", "PXDN", "MIPOL1", "SYT1", "CXCL14", "ETS2", "PPHLN1", "NBL1", "GDPD2", "SDC2", "ADAMTS10", "C6orf138", "ICAM1", "SELENBP1", "C7orf40", "GALR1", "SFRP1", "IGSF11", "MTTP", "RRS1", "RGAG4", "BTBD11", "PHLPP1", "RGS17", "NCAM1", "MID1IP1", "TMEM200B", "CRYBB1", "KANK1", "MT1A", "SHOX2", "PLS1", "H1F0", "SEMA3D", "ITGA4", "ZIC4", "SPINK2", "SLC38A5", "LINGO1", "FAM184B", "NR1D1", "SQRDL", "SPAG4", "HOXD9", "HAND2", "LIFR", "MYO1B", "TPMT", "PCDHB3", "SHISA2", "XYLT1", "REC8", "NFATC2", "TOX", "STC2", "FAM126A", "GAL", "CPAMD8", "HLA-DQA1", "SRPX", "GJB2", "HOXB6", "HIF3A", "MXRA5", "ITGBL1", "LGALS3", "C5orf38", "IL1RAPL1", "TSPAN13", "AC007405.8", "LOC285141", "TMSB15A", "ADAMTS1", "KCNK12", "PARP12", "DCHS1", "TSTD1", "SEZ6L", "SNHG12", "IL4I1", "CCDC48", "NKX6-2", "OAS3", "TUBB4", "CDH19", "GJC3", 
"TMEM100", "DDIT4L", "ARHGAP8", "PRR5", "PRR5-ARHGAP8", "DDAH2", "ADCYAP1R1", "C3orf58", "IL33", "IL1R1", "CD68", "MARS", "ALDH1A3", "C2orf80", "C7orf16", "KIAA1217", "AC067930.1", "CD248", "CMTM5", "HRCT1", "CCR1", "FAM129A", "PIGA", "PAX8", "GRB14", "ADD2", "CTLA4", "DYNC1I1", "KCNK1", "LRRC55", "PDGFRB", "SYTL5", "TMEM158", "PROCR", "PSPH", "CRIP2", "CBLC", "TMEM38A", "INSIG1", "ST6GALNAC3", "NOP16", "TPM1", "FAM176A", "PPM1K", "PNMA2", "OXTR", "TRAF1", "TRIB2", "TRIM14", "AQP4", "PEA15", "PRSS12", "LPHN2", "CD58", "NFASC", "CHRDL1", "AC026410.6", "IKBKE", "CRB2", "SRP9", "IFITM8P", "SLC16A3", "ANGPTL2", "COL4A6", "LFNG", "FZD3", "CDH6", "PPP4R1", "LAMA4", "LNX1", "EFNA5", "TMEM71", "RPSAP52", "SYNGR1", "GABRA5", "TESC", "TTYH1", "FAM181A", "HOXC13", "C1orf133", "LRP4", "FERMT3", "CDKN2C", "TSPAN9", "CXorf38", "HOXA1", "TTN", "TMEM176B", "CPNE2", "SALL1", "SLC26A2", "ZEB1", "GLYATL2", "RHOBTB3", "EFHD2", "MLPH", "MFAP2", "PTPRR", "RRP7A", "SNX10", "ZNF714", "MACROD2", "GMPR", "BMPER", "AIDA", "FAM5B", "PLCB1", "ADRA1B", "ELMOD1", "RP11-93B14.2", "hCG_2018279", "SYTL4", "NFIA", "CMAH", "PLAC9", "EVC2", "SCN11A", "RASGRP3", "RHPN1", "PTPRD", "PDE4B", "CXCL12", "CCKBR", "ISYNA1", "SDK2", "TNFAIP3", "ACTA2", "PCDHB12", "AMMECR1", "PKNOX2", "PTPRH", "DUSP16", "SULF1", "RP9", "GNG11", "C5orf41", "C9orf95", "DCBLD2", "RAD50", "RASGEF1C", "PCOLCE2", "PNP", "PLEKHA6", "HEPACAM",


"TAPBPL", "MEST", "EEF1A2", "C1orf187", "CALM1", "CREB3L1", "CCNY", "MGC87042", "TCF7L2", "ZNF710", "JAM2", "SCN5A", "SLC46A1", "TMSL1", "C20orf103", "HOXA5", "FMNL1", "CACNG7", "TF", "RNASEH1", "UNC80", "EGFR", "PTGS1", "ASPN", "SEMA4G", "ASNS", "FGFR1", "CHODL", "IRX1", "INPP5D", "PLSCR1", "NCALD", "CLDN3", "COL1A2", "GGH", "MICB", "FNDC5", "RAB11FIP1", "DKFZp434H1419", "AC012513.4", "CGREF1", "RP5-955M13.1", "KCNG1", "GFPT2", "IGFBP5", "CCDC8", "RADIL", "BID", "PPL", "RGS20", "BMP7"}

    Dim dbUrls() As String = {"www.gbmbase.org", "www.ncbi.nlm.nih.gov/pubmed", "www.ihop-net.org/UniPub/iHOP/", "biograph.be/", "scholar.google.com/scholar", "www.google.com"}
    Dim pageLoadTimes, nDB, ngene As Integer

    Public Event nextGene()
    Public Event nextDB()

    Private Sub ButtonStart_Click(sender As Object, e As EventArgs) Handles ButtonStart.Click
        ngene = -1
        RaiseEvent nextGene()
    End Sub

    Private Sub nextGene_() Handles Me.nextGene
        ngene += 1
        If ngene < genes.Count Then
            nDB = 4 ’0 ’-1+
            TextBox3.Text = genes(ngene)
            RaiseEvent nextDB()
        Else
            ’ End data collection
        End If
    End Sub

    Private Sub nextDB_() Handles Me.nextDB
        nDB += 1
        If nDB < dbUrls.Count Then
            pageLoadTimes = 0
            Select Case nDB
                Case 0, 3 : WebBrowser1.Navigate(dbUrls(nDB))
                Case 1 : WebBrowser1.Navigate(dbUrls(nDB) & "?term=" & genes(ngene) & " glioblastoma")
                Case 2 : WebBrowser1.Navigate(dbUrls(nDB) & "?search=" & genes(ngene) & " &field=all &ncbi_tax_id=9606&organism_syn=")
                Case 4 : WebBrowser1.Navigate(dbUrls(nDB) & "?as_vis=0&q=" & genes(ngene) & " +glioblastoma&hl=en&as_sdt=0,5")
                Case 5 : WebBrowser1.Navigate(dbUrls(nDB) & "#hl=en&gs_nf=1&cp=10&gs_id=r&xhr=t &q=" & genes(ngene) & "+NEAR+glioblastoma&fp=1")
            End Select
        End If
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted

        If nDB = 0 Then ’ GBMbase
            Select Case pageLoadTimes

                Case 0 ’ Search and fill the "search textbox"
                    For Each elem As HtmlElement In WebBrowser1.Document.All
                        Dim NameStr As String = elem.GetAttribute("name")

                        If ((NameStr IsNot Nothing) AndAlso (NameStr.Length <> 0)) Then
                            If NameStr.ToLower().Equals("search") Then
                                elem.InnerText = genes(ngene)
                            End If
                        End If
                    Next
                    WebBrowser1.Document.GetElementById("title_submit").InvokeMember("click")

                Case 1 ’ Search the link to Gene name
                    For Each elem As HtmlElement In WebBrowser1.Document.Links
                        Dim NameStr As String = elem.InnerText
                        If ((NameStr IsNot Nothing) AndAlso (NameStr.Length <> 0)) Then
                            If Trim(elem.InnerText) = genes(ngene) Then
                                elem.InvokeMember("Click")
                                Exit Select
                            End If
                        End If
                    Next

                Case 2 ’ Get the results from page
                    Dim st As String = WebBrowser1.DocumentText.ToString
                    Dim st1 As Integer = st.IndexOf("Glioblastoma multiforme Gene Publications:")
                    If st1 > 0 Then
                        TextBox3.Text = Convert.ToInt32(Trim(st.Substring(st1 + 49, 5)))
                        If TextBox3.Text <> "0" Then RaiseEvent nextGene() Else RaiseEvent nextDB()
                        Exit Sub
                    End If
            End Select

        ElseIf nDB = 1 Then ’ PubMed

            Select Case pageLoadTimes

                Case 0 ’ Get the results from page
                    Dim st As String = WebBrowser1.DocumentText.ToString
                    Dim stp As Integer = st.IndexOf("class=""result_count"">Results: ")
                    If stp > 0 Then
                        Dim stR As String = st.Substring(stp + 28, 20) ’ ex. : 1 to 20 of 87")
                        st = stR.Substring(0, stp) ’ ex. : 1 to 20 of 87
                        stp = st.LastIndexOf(" ")
                        stR = st.Substring(stp) ’ ex. 87
                        TextBox3.Text = Convert.ToInt32(Trim(stR))
                        If TextBox3.Text <> "0" Then RaiseEvent nextGene() Else RaiseEvent nextDB()
                        Exit Sub
                    End If

            End Select

        ElseIf nDB = 2 Then ’ iHOP

            Select Case pageLoadTimes

                Case 0 ’ Query for gene
                    For Each elem As HtmlElement In WebBrowser1.Document.Links
                        Dim NameStr As String = elem.InnerText
                        If ((NameStr IsNot Nothing) AndAlso (NameStr.Length <> 0)) Then
                            If Trim(elem.InnerText) = genes(ngene) Then
                                elem.InvokeMember("Click")
                                Exit Select
                            End If
                        End If
                    Next

                Case 1 ’ Get the result page
                    Dim st As String = WebBrowser1.DocumentText.ToString
            End Select

        ElseIf nDB = 3 Then ’ BioGraph
            Select Case pageLoadTimes

                Case 0 ’ Search and fill the "search textbox"
                    For Each elem As HtmlElement In WebBrowser1.Document.All
                        Dim NameStr As String = elem.GetAttribute("name")

                        If ((NameStr IsNot Nothing) AndAlso (NameStr.Length <> 0)) Then
                            If NameStr.ToLower().Equals("query") Then
                                elem.InnerText = genes(ngene) : Exit For
                            End If
                        End If
                    Next
                    For Each elem As HtmlElement In WebBrowser1.Document.All ’ Search and click submit button
                        Dim NameStr As String = elem.GetAttribute("name")

                        If ((NameStr IsNot Nothing) AndAlso (NameStr.Length <> 0)) Then
                            If NameStr.ToLower().Equals("commit") Then
                                elem.InvokeMember("click") : Exit For
                            End If
                        End If
                    Next

                Case 1 ’ Get the result page
                    Dim st As String = WebBrowser1.DocumentText.ToString

            End Select

        ElseIf nDB = 4 Then ’ Google Scholar
            Select Case pageLoadTimes

                Case 0 ’ Get the results from page
                    Dim st As String = WebBrowser1.DocumentText.ToString
                    Dim stp As Integer = st.IndexOf("About ")
                    If stp > 0 Then
                        Dim stR As String = st.Substring(stp + 24, 20) ’ ex. 18,700 results (0.03 sec)...
                        stp = stR.IndexOf("results")
                        st = stR.Substring(0, stp).Replace(",", "") ’ ex. 18,700 -> 18700
                        TextBox3.Text = Convert.ToInt32(Trim(st))
                        If TextBox3.Text <> "0" Then RaiseEvent nextGene() Else RaiseEvent nextDB()
                        Exit Sub
                    End If

            End Select
        ElseIf nDB = 5 Then ’ Google Search
            Select Case pageLoadTimes

                Case 0 ’ Get the results from page
                    Dim st As String = WebBrowser1.DocumentText.ToString
                    Dim stp As Integer = st.IndexOf("About ")
                    If stp > 0 Then
                        Dim stR As String = st.Substring(stp + 24, 20) ’ ex. 18,700 results (0.03 sec)...
                        stp = stR.IndexOf("results")
                        st = stR.Substring(0, stp).Replace(",", "") ’ ex. 18,700 -> 18700
                        TextBox3.Text = Convert.ToInt32(Trim(st))
                        If TextBox3.Text <> "0" Then RaiseEvent nextGene()
                        Exit Sub
                    End If

            End Select

        End If
        pageLoadTimes += 1

    End Sub

End Class
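The script above reads each hit count by taking a fixed-length substring after a marker such as "About " and then trimming it. The same extraction can be sketched with a regular expression, which does not depend on fixed character offsets. This is an illustrative Python sketch, not part of the original Visual Basic listing; the function name `parse_hit_count` is hypothetical, and the page format "About 18,700 results" is assumed from the comments in the script.

```python
import re


def parse_hit_count(page_text, marker="About "):
    """Return the integer that follows `marker` and precedes 'results', or None."""
    pattern = re.escape(marker) + r"([\d,]+)\s+results"
    match = re.search(pattern, page_text)
    if match is None:
        return None  # marker not found: treat as "no count available"
    # Remove thousands separators before converting, e.g. "18,700" -> 18700.
    return int(match.group(1).replace(",", ""))


# Example string modelled on the comment in the script (not a real page).
example = "<div>About 18,700 results (0.03 sec)</div>"
print(parse_hit_count(example))
```

Returning None when the marker is missing keeps "no result line found" distinct from a genuine count of zero, which the script treats as the signal to try the next database.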

Appendix C

Long ncRNAs

We detected 25 differentially expressed long non-coding RNAs. Several of these display an expression pattern similar to a neighbouring protein-coding gene, including cancer-associated genes DKK1 and CTSC [113,461,551] and developmental regulators IRX2, SIX3 and ZNF536 [412], suggesting that they may be functional RNAs regulating nearby genes [372] or represent transcription from active enhancers [230].

273 C. Long ncRNAs Appendix 0.0 0.0 58.3 0.0 0.0 0.0 1.0 283.9 0.0 0.0 136.9 0.0 0.0 0.0 0.0 21.8 206.3 0.0 0.0 24.7 18.0 243.9 126.3 0.0 129.8 125.8 2.2 121.6 389.5 328.5 49.6 0.0 158.7 33.6 0.0 0.0 0.0 0.0 96.2 0.0 174.7 18.0 0.0 0.0 0.0 0.0 54.0 0.0 4.3E-06 1.3E-02 2.0E-02 1.5E-02 1.1E-05 9.9E-07 8.7E-04 6.3E-04 p-value FDR7.9E-10 G144ED G144 G166 G179 CB5411.4E-05 CB660 2.5E-05 1.7E-05 2.8E-09 1.3E-10 5.4E-07 3.5E-07 ) FC ( 2 log Inf Inf -7.08 Inf Inf Inf 7.46 -Inf 19144990 18185590; 20541999 None None None None None 12082533; 15872005 Differentially expressed non-coding RNAs. In gene desert? PubMedID(s)No Differential expression resultsNo Normalised tag counts No No Yes Yes Yes Yes Yes 20137068; 21170033 4.32No 1.6E-06No 2.2E-03 123.4 None 627.4No 775.4 None 1927.1 18.5 4.42Yes None 91.8 No 3.2E-05 Inf 2.4E-02 None 151.4Yes 1.4E-04 Inf None 123.2 7.5E-02 332.4 1.7 135.3 1.4E-05 None Inf 9.3 1.3E-02 0.0 41.9 7.53 9.4 36.9 2.0E-04 105.5 9.9E-02 68.3 2.4E-11 Inf 51.8 0.0 2.3E-07 0.0 0.0 4.2 0.0 3.9E-05 0.0 0.0 2.8E-02 68.0 66.8 1.4 0.0 267.3 30.8 1381.8 0.0 0.0 0.0 30.7 0.0 5.9 99.7 0.0 0.0 Table C.1: Description HOTAIRM1, aHOXA ncRNA locus. fromthat the it There might regulateexpression. is HOXA evidence gene CDKN2BAS, anscript antisense toto tran- CDKN2B function that inrepression appears Polycomb-mediated ofCDKN2A and tumor CDKN2B. suppressors Transcript from the opposite strand of PAX8. Transcript from the opposite strand of TXNRD1. Transcript from geneDKK1. desert Correlated with near pression. DKK1 ex- Transcript from gene desert. Transcript betweenGRM5. CTSCexpression. Correlated and with CTSC The tagncRNA NCRMS, perhaps is detecting an NCRMS in isoform. NCRMS might be last a host intronand transcript mir-135a-2. of forincides The mir-1251 with tag a SINE also repeat. co- NEAT1, aparaspeckle ncRNA formationlated during and involved NS cell regu- in differentiation. Transcript from the opposite strand of CD27. Transcript from the opposite strand of MCF2L2. 
Transcript from(sense CDKN2B strand). intron Transcript from gene desert. Transcript downstream of PRRG4. Transcript from gene desert. BC031342 NR_003529 NR_015377 BM977209 BC015429 AK094154 BC038205 CX868766 TagCCGCCTTAATAAATGTA Accession TGGATAAACAAAATGAA ATGGCACCATATTGTGT TAGAACGGTGTTCCTCC ATTACATTTATGTCCTT GAAACATTCCAAACCTA GCTTTATTTTTTCTGCT AAATATTAGTTTTTCTT TACATAATTACTAATCA NR_028272 TATCCCCAAATAAACAA NR_015382 CCTGACCCTGCATCCCT BC013229 ACAAACAAAAGCCTTCC DB099927 TAACTGATCCTTAGATA BQ957425 CAAATAAACTTTATACC BC063641 GAGCCCAGACTAGATGG BF031226

274 C. Long ncRNAs Appendix 0.0 0.0 142.3 3.1 0.0 0.0 0.0 123.7 0.0 0.0 3.7 50.3 0.0 225.2 27.6 8.8 0.0 0.0 103.5 73.4 566.0 126.1 0.0 0.0 6.1 263.9 12.5 0.0 0.0 4.2 4.9E-06 7.5E-03 7.2E-04 1.9E-03 6.9E-02 p-value FDR G144ED G1449.3E-10 G166 G179 CB541 CB660 7.3E-06 4.2E-07 1.3E-06 1.2E-04 ) FC ( 2 log Inf Inf -Inf 6.14 Inf None None 17084678 None (1688605) In gene desert? PubMedID(s)Yes Differential expression resultsYes NoneYes Normalised tag counts 5.95 5.4E-05Yes 3.7E-02 4.2Yes 0.0 96.3 95.6No 0.0 2.1 NoYes None 17084678Yes -4.65 1.6E-05 -InfYes 1.4E-02 17084678 34.0 6.9E-08 27.7 1.8E-04 None -Inf 0.0 4.5 2.4 0.0 1.3E-05 1.2E-02 354.0 0.0 0.0 -Inf 224.1 0.0 0.0 2.9E-05 162.1 2.2E-02 0.0 173.2 0.0 0.0 0.0 44.3 0.0 125.7 0.0 80.4 35.2 Description Transcript from gene desert around FOXG1. Overlaps anto EST a similar regionpseudoautosomal gene of SPRY3. downstream of the Transcript from gene desert harbor- ing ZNF536 andlated with TSHZ3. ZNF536 expression. Corre- Transcript from gene desert around KCNF1. Transcript from gene desertstream down- of SIX3.SIX3 expression. Correlated with Transcript from geneDCBLD2. desert near Transcript downstreamF, of on HLA- oppositean strand. antisense transcript(NR_026972). to Overlaps HLA-F Intergenic transcript, or possiblyvery long a CACNG8 3’-UTRsion. exten- Might be amir-935. host transcript for Transcript fromstream gene of desert SIX3.SIX3 expression. up- Correlated with Transcript from gene desertstream down- of SIX3.SIX3 expression. Correlated with Transcript from geneIRX2. desert Correlated near pression. with IRX2 ex- BM679519 BG201257 GD259214 BG166405 DB349183 TagAAATATGGATAAATGTA CA425887 AAATTGGTGCTGTTGCT Accession GAATACAGATTAATCCT TATAATAATAATGCTTA TACACAATAAATATTTA TTTTTCATCAAGAGGAA GAAGGTCCCCCAGGGGT AK131287 ATCATCACGTGAGAGAT AK126832 GAGAGTGAATGTTTAAA CR623536 ATAATAAAAGTATTTTT AA993778

Appendix D

Glioblastoma Pathway

D.1 Pathway Interactions

Table D.1: Network interaction data for the integrated glioblastoma pathway. The first column identifies the first interactor; the second column identifies the interaction type; the third column identifies the second interactor. Sorted in alphabetical order on the first column interactor.

First interactor Interaction Second interactor AKT1 activates MAP2K7 AKT1 activates MDM2 AKT1 activates TSC2 AKT1 inhibits CDKN1A AKT1 inhibits CDKN1B AKT1 inhibits FOXO3 AKT1 inhibits TSC-complex APAF1 leads-to APOPTOSIS ATM activates CHEK1 ATM activates PRKDC ATM activates TP53 BASC-complex includes BRCA1 BASC-complex includes MSH6 BASC-complex leads-to APOPTOSIS BASC-complex leads-to DNA-REPAIR BAX inhibits BCL2 BCL2 inhibits APAF1 BDNF activates NTRK2 BID activates BAX BID activates CYCS BRCA1 interacts MSH6 BTRC inhibits NFKB BTRC inhibits PHLPP1 BUB1B inhibits IRS1 BUB1B leads-to CELL-GROWTH BUB1B leads-to PROTEIN-SYNTHESIS CACN activates PKC CACN includes CACNA1A CACN includes CACNA1C CACN includes CACNG7 CACN includes CACNG8 CACN lets-in Calcium CALM1 activates CAMK1D CALM1 activates CAMK2B CALM1 activates PPEF1 CALM1 interacts MAPT CALM1 interacts SNCA CALM1 interacts SYT1 CAMK1D activates AKT1 CAMK2A activates RAS CAMK2B interacts CAMK2A

276 D.1 Pathway Interactions Appendix

First interactor Interaction Second interactor CASP1 activates CASP3 CASP1 activates IL1B CASP3 activates PARP CASP8 activates BID CBL inhibits RTK CBL interacts CRK CCND-CDK4/6-complex activates RB1 CCND-CDK4/6-complex includes CCND1 CCND-CDK4/6-complex includes CCND2 CCND-CDK4/6-complex includes CDK4 CCND-CDK4/6-complex includes CDK6 CCNE-CDK2-complex activates RB1 CCNE-CDK2-complex includes CCNE1 CCNE-CDK2-complex includes CDK2 CDC25C activates CDK1 CDK1 leads-to G2-M-PROGRESSION CDKN1A inhibits CCND-CDK4/6-complex CDKN1A inhibits CCNE-CDK2-complex CDKN1B inhibits CCNE-CDK2-complex CDKN2A inhibits CCND-CDK4/6-complex CDKN2A:ARF inhibits MDM2 CDKN2C inhibits CCND-CDK4/6-complex CEBPB activates DEC1 CEBPB activates FOSL2 CEBPB activates STAT3 CEBPB activates_transcription CEBPB CHEK1 inhibits CDC25C CPLX2 inhibits SNARE-complex CPLX2 interacts STX1A CREBBP inhibits TP53 CREBBP interacts CREB1 CREBBP interacts JUN CREBBP interacts MYC CREBBP interacts NFATC2 CRK interacts GAB1 CYCS activates CASP9 Calcium activates CALM1 Calcium activates SYT1 Cytosolic-antigen activates HLA-A DAG activates PKC DDIT3 leads-to APOPTOSIS DEC1 activates RUNX1 DNA-DAMAGE activates ATM DNA-DAMAGE activates GADD45 DOCK1 interacts CBL DOCK1 interacts CRK DOCK1 interacts ERBB2 DOCK1 interacts PIK3R1 DUSP16 inhibits MAPK14 DUSP16 inhibits MAPK8 DUSP5 inhibits MAPK14 DUSP5 inhibits MAPK8 E2F1 leads-to G1-S-PROGRESSION EGF interacts EGFR EGFR activates CASP1 EIF4E leads-to CELL-GROWTH EIF4E leads-to PROTEIN-SYNTHESIS EIF4EBP1 inhibits EIF4E ELK1 interacts ELK4 ELK1 interacts SRF ELK4 interacts MYC ELK4 interacts SRF EP300 activates TP53 EPHB2 activates CCND-CDK4/6-complex EPHB2 inhibits TSC-complex EPHB2 leads-to CELL-CYCLE-PROGRESSION ERBB2 interacts EGFR ERBB2 interacts PIK3R1 ERRFI1 inhibits EGFR Endocytosed-antigen activates MHC-classII FAS leads-to APOPTOSIS FASLG activates FAS FGF activates FGFR FGF includes FGF19 FGF includes FGF23 FGF23 interacts CBL

277 D.1 Pathway Interactions Appendix

First interactor Interaction Second interactor FGFR includes FGFR1 FOSL2 activates DEC1 FOSL2 activates RUNX1 FOXO includes FOXO3 FOXO leads-to APOPTOSIS FOXO3 activates-transcription CDKN1B FOXO3 activates-transcription FASLG GAB1 activates PI3K-class1a GADD45 activates CCND-CDK4/6-complex GADD45 activates MAP3K4 GADD45 interacts PCNA GRB2 activates GAB1 GRB2 activates SOS1 HIF1A leads-to HYPOXIA HLA-A leads-to CD8-T-CELL-ACTIVATION HLA-DM activates MHC-classII HLA-DM includes HLA-DMA HLA-DM includes HLA-DMB HLA-DM inhibits CD74 HLA-DRA interacts CBL IFI30 activates Endocytosed-antigen IFNG activates IFI30 IFNG activates Immunoproteasome IGF1 activates IGF1R IGF2 leads-to APOPTOSIS IGFBP5 inhibits IGF2 IKBKE activates NFKB IKBKE interacts MAP3K14 IL1B activates IL1R1 IL1B leads-to APOPTOSIS IL1B leads-to DIFFERENTIATION IL1B leads-to INFLAMMATION IL1B leads-to PROLIFERATION IL1R1 activates PIK3R1 IL1R1 activates TRAF2 IL1R1 activates TRAF6 ILK activates AKT1 IP3 activates ITPR1 IRAK1 activates MAP3K14 IRAK1 activates MAP3K1 IRS1 activates PI3K-class1a ITGB activates ILK ITGB activates PTK2 ITGB activates SHC ITGB includes ITGA4 ITGB includes ITGBL1 ITPR1 lets-in Calcium Immunoproteasome activates Cytosolic-antigen JUN leads-to DIFFERENTIATION JUN leads-to INFLAMMATION JUN leads-to PROLIFERATION MAP2K1/2 activates MAPK1/2 MAP2K4 activates MAPK14 MAP2K6 activates MAPK14 MAP2K7 activates MAPK8 MAP3K14 tentatively-activates NFKB MAP3K4 activates MAP2K4 MAP3K4 interacts TRAF4 MAP3K5 activates AKT1 MAP3K7 activates MAP3K14 MAPK1 activates ELK1 MAPK1 activates MAPT MAPK1/2 activates RPS6KA3 MAPK1/2 inhibits MAP2K1/2 MAPK1/2 inhibits SOS1 MAPK1/2 inhibits TSC2 MAPK14 activates DDIT3 MAPK8 activates JUN MAPK8 inhibits NFATC2 MAPT interacts S100B MAPT interacts SNCA MAPT leads-to MICROTUBULE-DISASSEMBLY MDM2 inhibits RB1 MDM2 inhibits TP53 MDM4 inhibits TP53 MHC-classI includes HLA-A

278 D.1 Pathway Interactions Appendix

First interactor Interaction Second interactor MHC-classII includes HLA-DPA1 MHC-classII includes HLA-DPB1 MHC-classII includes HLA-DQA1 MHC-classII includes HLA-DQA2 MHC-classII includes HLA-DQB1 MHC-classII includes HLA-DRA MHC-classII includes HLA-DRB5 MHC-classII interacts CD74 MHC-classII leads-to CD4-T-CELL-ACTIVATION NFATC2 interacts EP300 NFKB leads-to ANTI-APOPTOSIS NFKB leads-to INFLAMMATION NFKB leads-to PROLIFERATION NFKBIZ inhibits NFKB NGF activates NTRK1 NR activates MAP2K6 NR includes NR0B1 NR includes NR1D1 NR includes NR4A2 NTRK1 activates TRAF6 NTRK1 interacts GRB2 NTRK1 interacts SHC1 NTRK2 activates TRAF6 NTRK2 interacts SHC1 PARP includes PARP12 PARP includes PARP3 PARP leads-to APOPTOSIS PDGF activates PDGFR PDGFR includes PDGFRA PDGFR includes PDGFRB PDGFR interacts CBL PDGFR interacts CRK PDGFR interacts PI3K-class1a PDK1 activates AKT1 PDPK1 activates AKT1 PEA15 interacts CASP8 PEA15 interacts MAPK1/2 PERP activates CYCS PHLPP1 inhibits AKT1 PI3K includes PI3K-class1a PI3K includes PI3K-class3 PI3K-class1a activates PIP2 PI3K-class1a interacts RB1 PI3K-class3 activates TORC1-complex PIP2 becomes DAG PIP2 becomes IP3 PIP2 becomes PIP3 PIP3 activates ILK PIP3 activates PDK1 PIP3 becomes PIP2 PIP3 interacts AKT1 PKC activates MAP2K1/2 PKC activates RAF1 PKC activates RAS PLC activates PIP2 PLCG activates PIP2 PLCG activates PKC PRKAB1 activates TSC-complex PRKDC activates TP53 PTEN inhibits AKT1 PTEN inhibits PIP3 PTEN inhibits PTK2 PTEN inhibits SHC PTEN inhibits TP53 PTEN interacts GAB1 PTEN interacts PIK3CA PTEN interacts PIK3R1 PTK2 activates MAP3K4 PTK2 activates PI3K-class1a PTK2 inhibits TSC-complex PTK2 interacts GRB2 PTK2 leads-to ACTIN-ORGANIZATION RAF1 activates MAP2K1/2 RAS activates PI3K RAS activates RAF1 RAS includes HRAS

279 D.1 Pathway Interactions Appendix

First interactor Interaction Second interactor RAS includes KRAS RAS includes NRAS RB1 inhibits E2F1 RHEB activates TORC1-complex RPS6KA3 activates BUB1B RPS6KA3 activates STK11 RPS6KA3 inhibits TSC2 RPS6KA3 interacts CREBBP RPS6KA3 interacts PEA15 RTK activates IRS1 RTK activates PI3K RTK activates PLCG RTK activates SHC RTK activates SRC RTK includes EGFR RTK includes FGFR RTK includes IGF1R RTK includes MET RTK includes PDGFR RTK interacts CRK RUNX1 activates JUN RUNX1 leads-to MESENCHYMAL-TRANSFORMATION S100B inhibits TP53 SERPINB2 leads-to INHIBITION-ANGIOGENESIS SERPINB2 leads-to INHIBITION-METASTASIS SERPINE2 interacts SERPINB2 SHC activates GRB2 SHC includes SHC1 SHC includes SHC4 SHC4 interacts NTRK2 SHISA5 interacts PERP SNARE-complex includes SNAP25 SNARE-complex includes STX1A SNCA interacts CAMK2B SNCA interacts SNCAIP SOS1 activates RAS SPRY2 inhibits CBL SPRY2 inhibits RAS SRC activates GRB2 SRC activates PI3K-class1a SRC inhibits PTEN SRF interacts DNA STAT3 activates FOSL2 STAT3 activates RUNX1 STAT3 activates_transcription STAT3 STK11 activates PRKAB1 SYT1 activates SNARE-complex TAB2 activates MAP3K7 TGFA activates EGFR TGFB activates TGFBR TGFBR activates NR TLR4 activates TRAF6 TNF activates TNFR TNF includes TNFSF4 TNFR activates TRAF2 TNFR includes TNFRSF14 TORC1-complex activates BUB1B TORC1-complex includes MLST8 TORC1-complex includes MTOR TORC1-complex includes RPTOR TORC1-complex inhibits EIF4EBP1 TORC2-complex activates AKT1 TORC2-complex includes MAPKAP1 TORC2-complex includes MLST8 TORC2-complex includes MTOR TORC2-complex includes RICTOR TORC2-complex leads-to ACTIN-ORGANIZATION TP53 activates BRCA1 TP53 activates HIF1A TP53 activates MDM2 TP53 activates S100B TP53 activates-transcription BAX TP53 activates-transcription CDKN1A TP53 activates-transcription FAS TP53 activates-transcription GADD45 TP53 activates-transcription IGFBP5

TP53 activates-transcription PTEN
TP53 activates-transcription SERPINE2
TP53 activates-transcription SHISA5
TP53 inhibits CDK1
TP53 inhibits CDK2
TP53 inhibits-transcription BCL2
TP53 inhibits-transcription TIMP3
TP53 interacts CDKN2C
TP53 leads-to APOPTOSIS
TRAF2 activates MAP3K5
TRAF4 activates MAPK8
TRAF6 activates IRAK1
TRAF6 interacts TAB2
TSC-complex includes TSC1
TSC-complex includes TSC2
TSC2 inhibits RHEB
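For readers who want to reuse the interaction list programmatically, the following is a minimal Python sketch of how the three-column rows could be parsed. It assumes the rows are stored one per line and separated by whitespace; the function names `parse_interactions` and `group_members` and the variable `sample_rows` are hypothetical and were not part of the original analysis.

```python
from collections import defaultdict


def parse_interactions(text):
    """Parse 'first-interactor interaction second-interactor' rows into triples.

    Lines that do not split into exactly three fields (for example the
    column header) are skipped.
    """
    triples = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        triples.append(tuple(parts))
    return triples


def group_members(triples):
    """Collect the members of each group node from its 'includes' rows."""
    members = defaultdict(list)
    for source, relation, target in triples:
        if relation == "includes":
            members[source].append(target)
    return dict(members)


# A few rows copied from the end of the table, for illustration only.
sample_rows = """
First interactor Interaction Second interactor
TSC-complex includes TSC1
TSC-complex includes TSC2
TSC2 inhibits RHEB
TP53 leads-to APOPTOSIS
"""

triples = parse_interactions(sample_rows)
print(group_members(triples))
```

Handling the `includes` relation separately reflects the way group nodes such as `TSC-complex` or `RTK` stand in for several genes in the table; whether downstream tools should expand these groups is left open here.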

D.2 Pathway Images

Figure D.1: Integrated GBM pathway overlaid with Tag-seq G144 expression data. The colour intensity of the nodes (green) indicates the magnitude of the expression.

Figure D.2: Integrated GBM pathway overlaid with Tag-seq G144ED expression data. The colour intensity of the nodes (green) indicates the magnitude of the expression.

Figure D.3: Integrated GBM pathway overlaid with Tag-seq G166 expression data. The colour intensity of the nodes (green) indicates the magnitude of the expression.

Figure D.4: Integrated GBM pathway overlaid with Tag-seq G179 expression data. The colour intensity of the nodes (green) indicates the magnitude of the expression.

Appendix E

Exon Array Data

Table E.1: Fold-changes measured by exon array and filtered at FDR<1% for GNS cell lines G7, G26, G144, G166 and NS cell lines CB130, CB152, CB171, CB660. The column "Average expression across samples" refers to the average between GNS and NS cell lines, i.e. a weighted average over samples that accounts for the number of samples per group.

Average expression across samples Fold-change p-value FDR Ensembl Gene ID Symbol
7.22 -2.67 0.000 0.000000 ENSG00000006042 TMEM98
9.22 -1.346 0.000 0.000000 ENSG00000018408 WWTR1
4.68 -3.21 0.000 0.000000 ENSG00000033122 LRRC7
8.79 -1.999 0.000 0.000000 ENSG00000052802 MSMO1
4.89 -1.417 0.000 0.000000 ENSG00000064692 SNCAIP
8.87 -2.08 0.000 0.000000 ENSG00000065308 TRAM2
6.38 -5.287 0.000 0.000000 ENSG00000065833 ME1
6.59 -3.859 0.000 0.000000 ENSG00000067715 SYT1
8.67 -1.769 0.000 0.000000 ENSG00000072110 ACTN1
6.31 -0.69 0.000 0.000000 ENSG00000072195 SPEG
7.29 -2.828 0.000 0.000000 ENSG00000075275 CELSR1
5.96 -3.66 0.000 0.000000 ENSG00000082397 EPB41L3
3.73 -1.96 0.000 0.000000 ENSG00000086991 NOX4
5.8 -1.574 0.000 0.000000 ENSG00000100078 PLA2G3
6.79 -2.284 0.000 0.000000 ENSG00000100626 GALNTL1
5.06 -1.815 0.000 0.000000 ENSG00000104044 OCA2
7.42 -1.469 0.000 0.000000 ENSG00000104219 ZDHHC2
4.75 -2.065 0.000 0.000000 ENSG00000104722 NEFM
5.16 -4.431 0.000 0.000000 ENSG00000104723 TUSC3
7.68 -1.989 0.000 0.000000 ENSG00000106829 TLE4
7.14 -4.683 0.000 0.000000 ENSG00000107438 PDLIM1
9.06 -4.638 0.000 0.000000 ENSG00000107796 ACTA2
6.54 -1.356 0.000 0.000000 ENSG00000111145 ELK3
7.83 -5.119 0.000 0.000000 ENSG00000113083 LOX
5.04 -2.938 0.000 0.000000 ENSG00000113319 RASGRF2
9.91 -2.22 0.000 0.000000 ENSG00000113657 DPYSL3
6.74 -2.677 0.000 0.000000 ENSG00000114805 PLCH1
7.37 -1.544 0.000 0.000000 ENSG00000122861 PLAU
7.69 -3.235 0.000 0.000000 ENSG00000128285 MCHR1
7.55 -2.544 0.000 0.000000 ENSG00000128591 FLNC
6.2 -4.327 0.000 0.000000 ENSG00000128641 MYO1B
8 -2.584 0.000 0.000000 ENSG00000129116 PALLD
5.75 -2.299 0.000 0.000000 ENSG00000131080 EDA2R
7.38 -1.771 0.000 0.000000 ENSG00000133216 EPHB2
4.88 -3.883 0.000 0.000000 ENSG00000135269 TES
6.32 -6.287 0.000 0.000000 ENSG00000135333 EPHA7
7.51 -1.44 0.000 0.000000 ENSG00000135540 NHSL1
8.34 -2.455 0.000 0.000000 ENSG00000136068 FLNB
3.82 -1.753 0.000 0.000000 ENSG00000137691 C11orf70
6.99 -1.835 0.000 0.000000 ENSG00000138771 SHROOM3
7.72 -2.708 0.000 0.000000 ENSG00000139278 GLIPR1
3.6 -1.412 0.000 0.000000 ENSG00000140057 AK7
8.29 -2.046 0.000 0.000000 ENSG00000140416 TPM1
4.82 -3.855 0.000 0.000000 ENSG00000141449 GREB1L

4.92 -2.647 0.000 0.000000 ENSG00000141682 PMAIP1
7.47 -1.782 0.000 0.000000 ENSG00000144730 IL17RD
7.37 -1.397 0.000 0.000000 ENSG00000145012 LPP
7.8 -1.172 0.000 0.000000 ENSG00000145431 PDGFC
4.45 -2.471 0.000 0.000000 ENSG00000145708 CRHBP
5.27 -1.439 0.000 0.000000 ENSG00000147234 FRMPD3
7.75 -4.124 0.000 0.000000 ENSG00000149591 TAGLN
5.48 -3.438 0.000 0.000000 ENSG00000151150 ANK3
6.55 -3.912 0.000 0.000000 ENSG00000151388 ADAMTS12
5.76 -4.527 0.000 0.000000 ENSG00000151572 ANO4
7.52 -5.064 0.000 0.000000 ENSG00000151892 GFRA1
8.11 -2.063 0.000 0.000000 ENSG00000152154 TMEM178A
3.22 -1.658 0.000 0.000000 ENSG00000153993 SEMA3D
5.1 -2.672 0.000 0.000000 ENSG00000155966 AFF2
5.84 -2.395 0.000 0.000000 ENSG00000156675 RAB11FIP1
5.63 -1.037 0.000 0.000000 ENSG00000157110 RBPMS
5.49 -2.472 0.000 0.000000 ENSG00000159217 IGF2BP1
5.97 -1.422 0.000 0.000000 ENSG00000159433 STARD9
6.62 -1.718 0.000 0.000000 ENSG00000162849 KIF26B
4.79 -3.556 0.000 0.000000 ENSG00000163071 SPATA18
7.66 -1.821 0.000 0.000000 ENSG00000163110 PDLIM5
6.5 -1.878 0.000 0.000000 ENSG00000163297 ANTXR2
7.09 -3.748 0.000 0.000000 ENSG00000163814 CDCP1
4.72 -1.731 0.000 0.000000 ENSG00000164002 DEM1
5.01 -1.766 0.000 0.000000 ENSG00000164932 CTHRC1
8.12 -1.339 0.000 0.000000 ENSG00000164938 TP53INP1
6.06 -5.065 0.000 0.000000 ENSG00000165588 OTX2
7.09 -2.589 0.000 0.000000 ENSG00000167081 PBX3
7.26 -1.998 0.000 0.000000 ENSG00000167693 NXN
8.27 -1.438 0.000 0.000000 ENSG00000169439 SDC2
6.61 -1.004 0.000 0.000000 ENSG00000170561 IRX2
4.48 -2.288 0.000 0.000000 ENSG00000172123 SLFN12
5.14 -4.167 0.000 0.000000 ENSG00000172260 NEGR1
7.6 -1.24 0.000 0.000000 ENSG00000172667 ZMAT3
5.4 -1.874 0.000 0.000000 ENSG00000173068 BNC2
6.66 -3.789 0.000 0.000000 ENSG00000173530 TNFRSF10D
4.85 -1.134 0.000 0.000000 ENSG00000173535 TNFRSF10C
6.29 -1.907 0.000 0.000000 ENSG00000174099 MSRB3
8.45 -1.552 0.000 0.000000 ENSG00000174136 RGMB
3.84 -1.408 0.000 0.000000 ENSG00000176040 TMPRSS7
7.2 -1.425 0.000 0.000000 ENSG00000176720 BOK
8.22 -1.108 0.000 0.000000 ENSG00000177119 ANO6
6.03 -1.552 0.000 0.000000 ENSG00000178573 MAF
6.78 -4.433 0.000 0.000000 ENSG00000184613 NELL2
4.67 -1.519 0.000 0.000000 ENSG00000184809 C21orf88
6.28 -1.561 0.000 0.000000 ENSG00000184985 SORCS2
5.2 -2.054 0.000 0.000000 ENSG00000185046 ANKS1B
6.45 -4.02 0.000 0.000000 ENSG00000185274 WBSCR17
8.51 -2.037 0.000 0.000000 ENSG00000185567 AHNAK2
7.01 -2.956 0.000 0.000000 ENSG00000189184 PCDH18
6.47 -2.842 0.000 0.000000 ENSG00000196730 DAPK1
7.77 -1.504 0.000 0.000000 ENSG00000196923 PDLIM7
5.87 -3.559 0.000 0.000000 ENSG00000198796 ALPK2
3.59 -2.326 0.000 0.000000 ENSG00000204764 RANBP17
5.82 -2.638 0.000 0.000000 ENSG00000204767 FAM196B
8.15 -1.501 0.000 0.000000 ENSG00000205213 LGR4
5.06 -1.915 0.000 0.000000 ENSG00000206538 VGLL3
4.48 -1.527 0.000 0.000000 ENSG00000213186 TRIM59
5.66 -2.628 0.000 0.000000 ENSG00000244694 PTCHD4
9.77 -1.446 0.000 0.000010 ENSG00000026025 VIM
8.93 -1.491 0.000 0.000010 ENSG00000035403 VCL
7.97 -3.845 0.000 0.000010 ENSG00000079931 MOXD1
5.7 -1.554 0.000 0.000010 ENSG00000080546 SESN1
11.03 -1.357 0.000 0.000010 ENSG00000099194 SCD
9.2 -1.406 0.000 0.000010 ENSG00000100345 MYH9
7.34 -1.425 0.000 0.000010 ENSG00000100918 REC8
4.65 -1.337 0.000 0.000010 ENSG00000107614 TRDMT1
7.08 -1.026 0.000 0.000010 ENSG00000109654 TRIM2
7.28 -1.459 0.000 0.000010 ENSG00000112144 ICK
5.52 -0.778 0.000 0.000010 ENSG00000115648 MLPH
8.11 -3.461 0.000 0.000010 ENSG00000117114 LPHN2
6.19 -1.353 0.000 0.000010 ENSG00000125257 ABCC4
8.36 -2.308 0.000 0.000010 ENSG00000129038 LOXL1
7.12 -2.728 0.000 0.000010 ENSG00000131378 RFTN1
4.4 -2.19 0.000 0.000010 ENSG00000134516 DOCK2
8.18 -1.64 0.000 0.000010 ENSG00000134871 COL4A2

5.66 -1.108 0.000 0.000010 ENSG00000135299 ANKRD6
6.65 -2.293 0.000 0.000010 ENSG00000139211 AMIGO2
6.01 -0.834 0.000 0.000010 ENSG00000143466 IKBKE
6.94 -1.227 0.000 0.000010 ENSG00000143772 ITPKB
5.93 -3.525 0.000 0.000010 ENSG00000147459 DOCK5
7.08 -1.588 0.000 0.000010 ENSG00000149269 PAK1
4.75 -1.397 0.000 0.000010 ENSG00000149948 HMGA2
8.76 -2.143 0.000 0.000010 ENSG00000150551 LYPD1
5.98 -1.944 0.000 0.000010 ENSG00000150687 PRSS23
8.52 -1.26 0.000 0.000010 ENSG00000150938 CRIM1
7.07 -1.701 0.000 0.000010 ENSG00000151474 FRMD4A
7.81 -1.827 0.000 0.000010 ENSG00000152104 PTPN14
7.53 -1.628 0.000 0.000010 ENSG00000153707 PTPRD
6.11 -1.021 0.000 0.000010 ENSG00000166016 ABTB2
6.79 -1.103 0.000 0.000010 ENSG00000166444 ST5
6.42 -0.938 0.000 0.000010 ENSG00000168140 VASN
6.31 -1.634 0.000 0.000010 ENSG00000171533 MAP6
6.1 -1.588 0.000 0.000010 ENSG00000171843 MLLT3
7.3 -1.553 0.000 0.000010 ENSG00000172175 MALT1
7.77 -1.621 0.000 0.000010 ENSG00000172638 EFEMP2
7.19 -1.756 0.000 0.000010 ENSG00000173848 NET1
6 -1.068 0.000 0.000010 ENSG00000180592 SKIDA1
6.47 -1.659 0.000 0.000010 ENSG00000182985 CADM1
6.33 -2.532 0.000 0.000010 ENSG00000187720 THSD4
9.32 -1.343 0.000 0.000010 ENSG00000188042 ARL4C
7.93 -1.344 0.000 0.000010 ENSG00000197702 PARVA
9.34 -1.438 0.000 0.000020 ENSG00000002586 CD99
7.95 -1.593 0.000 0.000020 ENSG00000065923 SLC9A7
5.65 -1.606 0.000 0.000020 ENSG00000071205 ARHGAP10
7.14 -1.517 0.000 0.000020 ENSG00000072401 UBE2D1
7.46 -1.853 0.000 0.000020 ENSG00000085377 PREP
6.82 -1.943 0.000 0.000020 ENSG00000103460 TOX3
9.2 -1.742 0.000 0.000020 ENSG00000112972 HMGCS1
5.82 -1.582 0.000 0.000020 ENSG00000114861 FOXP1
7.28 -0.983 0.000 0.000020 ENSG00000128739 SNRPN
7.58 -1.104 0.000 0.000020 ENSG00000129474 AJUBA
5.5 -1.192 0.000 0.000020 ENSG00000129682 FGF13
7.08 -1.836 0.000 0.000020 ENSG00000137942 FNBP1L
8.55 -1.31 0.000 0.000020 ENSG00000154380 ENAH
5.64 -2.757 0.000 0.000020 ENSG00000165659 DACH1
9.13 -1.323 0.000 0.000020 ENSG00000166033 HTRA1
7.58 -2.2 0.000 0.000020 ENSG00000168672 FAM84B
7.44 -1.906 0.000 0.000020 ENSG00000170175 CHRNB1
7.08 -0.917 0.000 0.000020 ENSG00000182197 EXT1
4.37 -1.48 0.000 0.000020 ENSG00000183778 B3GALT5
7.62 -3.42 0.000 0.000020 ENSG00000187955 COL14A1
10.42 -1.09 0.000 0.000020 ENSG00000196924 FLNA
3.77 -1.626 0.000 0.000020 ENSG00000203995 ZYG11A
8.82 -1.245 0.000 0.000020 ENSG00000213625 LEPROT
7.54 -1.127 0.000 0.000020 ENSG00000250588 IQCJ-SCHIP1
4.94 -1.757 0.000 0.000020 ENSG00000255994 C14orf162
8.77 -1.205 0.000 0.000030 ENSG00000065150 IPO5
7.74 -1.274 0.000 0.000030 ENSG00000073712 FERMT2
6.31 -1.635 0.000 0.000030 ENSG00000100592 DAAM1
8.63 -1.933 0.000 0.000030 ENSG00000125266 EFNB2
6.64 -3.548 0.000 0.000030 ENSG00000126010 GRPR
7.09 -4.473 0.000 0.000030 ENSG00000132429 POPDC3
8.83 -0.845 0.000 0.000030 ENSG00000137076 TLN1
5.45 -1.938 0.000 0.000030 ENSG00000137831 UACA
9.29 -1.364 0.000 0.000030 ENSG00000138448 ITGAV
7.34 -2.1 0.000 0.000030 ENSG00000139687 RB1
7.03 -2.562 0.000 0.000030 ENSG00000145536 ADAMTS16
8.41 -1.569 0.000 0.000030 ENSG00000148484 RSU1
7.15 -0.853 0.000 0.000030 ENSG00000150457 LATS2
7.54 -1.712 0.000 0.000030 ENSG00000153976 HS3ST3A1
6.68 -2.064 0.000 0.000030 ENSG00000154556 SORBS2
5.6 -1.892 0.000 0.000030 ENSG00000166450 PRTG
7 -1.416 0.000 0.000030 ENSG00000169047 IRS1
5.16 -1.289 0.000 0.000030 ENSG00000183840 GPR39
6.2 -1.112 0.000 0.000030 ENSG00000203772 SPRN
6.92 -0.86 0.000 0.000040 ENSG00000013364 MVP
6.65 -1.951 0.000 0.000040 ENSG00000069869 NEDD4
5.28 -1.552 0.000 0.000040 ENSG00000079102 RUNX1T1
6.07 -1.116 0.000 0.000040 ENSG00000085741 WNT11
6.6 -1.686 0.000 0.000040 ENSG00000099284 H2AFY2
7.93 -1.989 0.000 0.000040 ENSG00000101670 LIPG

6.88 -0.837 0.000 0.000040 ENSG00000107758 PPP3CB
7.91 -1.277 0.000 0.000040 ENSG00000128829 EIF2AK4
8.37 -0.912 0.000 0.000040 ENSG00000136478 TEX2
6.12 -1.235 0.000 0.000040 ENSG00000137502 RAB30
7.56 -1.128 0.000 0.000040 ENSG00000139668 WDFY2
3.8 -1.027 0.000 0.000040 ENSG00000150672 DLG2
7.69 -0.971 0.000 0.000040 ENSG00000163625 WDFY3
10.31 -1.446 0.000 0.000040 ENSG00000168615 ADAM9
7.77 -1.029 0.000 0.000040 ENSG00000181827 RFX7
8.07 -0.637 0.000 0.000040 ENSG00000198752 CDC42BPB
7.51 -0.797 0.000 0.000040 ENSG00000214717 ZBED1
6.19 -1.808 0.000 0.000050 ENSG00000066468 FGFR2
5.66 -2.079 0.000 0.000050 ENSG00000081803 CADPS2
7.49 -0.96 0.000 0.000050 ENSG00000087088 BAX
6.32 -1.122 0.000 0.000050 ENSG00000100968 NFATC4
8.22 -1.297 0.000 0.000050 ENSG00000102893 PHKB
5.64 -1.108 0.000 0.000050 ENSG00000108239 TBC1D12
9 -2.672 0.000 0.000050 ENSG00000112276 BVES
5.95 -2.589 0.000 0.000050 ENSG00000115232 ITGA4
4.59 -1.69 0.000 0.000050 ENSG00000124134 KCNS1
7.48 -2.329 0.000 0.000050 ENSG00000136928 GABBR2
8.18 -1.075 0.000 0.000050 ENSG00000150347 ARID5B
6.01 -2.228 0.000 0.000050 ENSG00000162745 OLFML2B
6.77 -1.239 0.000 0.000050 ENSG00000166073 GPR176
4.66 -3.976 0.000 0.000050 ENSG00000166342 NETO1
4.69 -1.97 0.000 0.000050 ENSG00000172403 SYNPO2
8.1 -0.895 0.000 0.000050 ENSG00000176014 TUBB6
8.3 -1.04 0.000 0.000060 ENSG00000100403 ZC3H7B
7.95 -1.065 0.000 0.000060 ENSG00000143344 RGL1
7.39 -1.509 0.000 0.000060 ENSG00000171862 PTEN
7.99 -0.895 0.000 0.000060 ENSG00000179820 MYADM
8.3 -0.826 0.000 0.000060 ENSG00000187079 TEAD1
5.29 -2.689 0.000 0.000070 ENSG00000101311 FERMT1
9.82 -0.993 0.000 0.000070 ENSG00000104549 SQLE
7.35 -0.867 0.000 0.000070 ENSG00000112655 PTK7
7.33 -0.735 0.000 0.000070 ENSG00000130338 TULP4
10.17 -0.996 0.000 0.000070 ENSG00000131236 CAP1
8.78 -1.538 0.000 0.000070 ENSG00000134824 FADS2
6.55 -0.792 0.000 0.000070 ENSG00000137936 BCAR3
9.79 -0.973 0.000 0.000070 ENSG00000150093 ITGB1
6.93 -1.288 0.000 0.000070 ENSG00000168502 SOGA2
8.38 -1.506 0.000 0.000070 ENSG00000179431 FJX1
8.32 -1.283 0.000 0.000070 ENSG00000197043 ANXA6
5.8 -1.297 0.000 0.000070 ENSG00000198113 TOR4A
7.19 -1.449 0.000 0.000070 ENSG00000205269 TMEM170B
9.7 -1.29 0.000 0.000080 ENSG00000071127 WDR1
7.75 -1.188 0.000 0.000080 ENSG00000073792 IGF2BP2
7.37 -0.762 0.000 0.000080 ENSG00000100139 MICALL1
6.25 -1.627 0.000 0.000080 ENSG00000140557 ST8SIA2
4.93 -2.465 0.000 0.000080 ENSG00000148798 INA
7.75 -1.245 0.000 0.000080 ENSG00000151612 ZNF827
6.56 -1.118 0.000 0.000080 ENSG00000151718 WWC2
7.26 -1.995 0.000 0.000080 ENSG00000174721 FGFBP3
7.34 -1.183 0.000 0.000080 ENSG00000175115 PACS1
7.06 -1.147 0.000 0.000080 ENSG00000182287 AP1S2
7.79 -1.122 0.000 0.000080 ENSG00000187239 FNBP1
6.27 -0.958 0.000 0.000080 ENSG00000229619 AC106722.1
10.55 -1.573 0.000 0.000090 ENSG00000101608 MYL12A
5.59 -1.066 0.000 0.000090 ENSG00000101665 SMAD7
6.15 -1.436 0.000 0.000090 ENSG00000107518 ATRNL1
5.53 -3.929 0.000 0.000090 ENSG00000144227 NXPH2
5.63 -0.965 0.000 0.000090 ENSG00000158246 FAM46B
6.62 -1.761 0.000 0.000090 ENSG00000165323 FAT3
7.74 -1.946 0.000 0.000100 ENSG00000102996 MMP15
8.65 -1.862 0.000 0.000100 ENSG00000104290 FZD3
6.36 -2.28 0.000 0.000100 ENSG00000151136 BTBD11
7.57 -2.996 0.000 0.000100 ENSG00000152661 GJA1
7.28 -0.967 0.000 0.000100 ENSG00000166912 MTMR10
7.44 -1.18 0.000 0.000110 ENSG00000058091 CDK14
8.62 -0.924 0.000 0.000110 ENSG00000073921 PICALM
5.74 -1.788 0.000 0.000110 ENSG00000152128 TMEM163
7.49 -1.413 0.000 0.000110 ENSG00000158966 CACHD1
5.56 -1.457 0.000 0.000110 ENSG00000165449 SLC16A9
3.3 -2.026 0.000 0.000110 ENSG00000170571 EMB
6.66 -0.922 0.000 0.000110 ENSG00000174307 PHLDA3
5.19 -1.959 0.000 0.000110 ENSG00000183691 NOG

6.73 -0.823 0.000 0.000120 ENSG00000011114 BTBD7
6.86 -0.624 0.000 0.000120 ENSG00000037757 MRI1
4.78 -1.277 0.000 0.000120 ENSG00000068781 STON1-GTF2A1L
9.49 -1.337 0.000 0.000120 ENSG00000075213 SEMA3A
4.78 -1.205 0.000 0.000120 ENSG00000108187 PBLD
7.01 -1.596 0.000 0.000120 ENSG00000109586 GALNT7
6.73 -0.668 0.000 0.000120 ENSG00000110237 ARHGEF17
6.37 -1.075 0.000 0.000120 ENSG00000126790 L3HYPDH
6.21 -1.532 0.000 0.000120 ENSG00000165996 PTPLA
6.16 -1.903 0.000 0.000120 ENSG00000180914 OXTR
9.41 -1.204 0.000 0.000120 ENSG00000196576 PLXNB2
5.41 -1.323 0.000 0.000120 ENSG00000197646 PDCD1LG2
6.14 -0.846 0.000 0.000130 ENSG00000090975 PITPNM2
6.87 -0.981 0.000 0.000130 ENSG00000148737 TCF7L2
7.63 -1.404 0.000 0.000130 ENSG00000177508 IRX3
6.23 -2.345 0.000 0.000140 ENSG00000104332 SFRP1
8.2 -1.368 0.000 0.000140 ENSG00000140682 TGFB1I1
7.55 -0.981 0.000 0.000140 ENSG00000148468 FAM171A1
6.12 -0.812 0.000 0.000140 ENSG00000162804 SNED1
7.73 -1.008 0.000 0.000140 ENSG00000196865 NHLRC2
7.36 -1.145 0.000 0.000150 ENSG00000055163 CYFIP2
6.64 -0.859 0.000 0.000150 ENSG00000072422 RHOBTB1
6.33 -1.765 0.000 0.000150 ENSG00000130176 CNN1
8.66 -0.711 0.000 0.000150 ENSG00000130638 ATXN10
8.32 -0.937 0.000 0.000150 ENSG00000139793 MBNL2
5.8 -1.718 0.000 0.000150 ENSG00000173320 STOX2
7.41 -1.382 0.000 0.000150 ENSG00000176788 BASP1
5.41 -4.49 0.000 0.000150 ENSG00000187714 SLC18A3
6.97 -0.57 0.000 0.000160 ENSG00000061273 HDAC7
8.41 -1.415 0.000 0.000160 ENSG00000163430 FSTL1
6.58 -1.178 0.000 0.000160 ENSG00000169436 COL22A1
8.98 -1.848 0.000 0.000170 ENSG00000135048 TMEM2
8.41 -1.185 0.000 0.000170 ENSG00000137710 RDX
4.5 -1.657 0.000 0.000170 ENSG00000154027 AK5
8.87 -0.936 0.000 0.000170 ENSG00000167460 TPM4
7.64 -1.238 0.000 0.000170 ENSG00000182319 PRAGMIN
8.22 -0.609 0.000 0.000170 ENSG00000182534 MXRA7
8.7 -1.291 0.000 0.000170 ENSG00000186575 NF2
7.62 -2.017 0.000 0.000180 ENSG00000084710 EFR3B
7.84 -0.796 0.000 0.000180 ENSG00000108091 CCDC6
8.51 -1.519 0.000 0.000180 ENSG00000148848 ADAM12
9.53 -0.645 0.000 0.000180 ENSG00000168175 MAPK1IP1L
7.32 -1.041 0.000 0.000180 ENSG00000170525 PFKFB3
7.31 -1.108 0.000 0.000180 ENSG00000178764 ZHX2
6.15 -0.992 0.000 0.000190 ENSG00000116675 DNAJC6
6.52 -1.174 0.000 0.000190 ENSG00000131370 SH3BP5
6.08 -0.978 0.000 0.000190 ENSG00000143816 WNT9A
6.26 -1.511 0.000 0.000190 ENSG00000255103 KIAA0754
6.71 -1.362 0.000 0.000200 ENSG00000122335 SERAC1
5.2 -3.22 0.000 0.000200 ENSG00000123119 NECAB1
5.45 -1.76 0.000 0.000200 ENSG00000138696 BMPR1B
6.28 -2.18 0.000 0.000200 ENSG00000146426 TIAM2
8.62 -0.877 0.000 0.000200 ENSG00000152558 TMEM123
6.89 -0.815 0.000 0.000210 ENSG00000068383 INPP5A
7.22 -1.283 0.000 0.000210 ENSG00000105974 CAV1
9.12 -1.876 0.000 0.000210 ENSG00000106484 MEST
7.38 -1.482 0.000 0.000210 ENSG00000113328 CCNG1
6.45 -1.103 0.000 0.000210 ENSG00000119681 LTBP2
6.12 -0.72 0.000 0.000210 ENSG00000132334 PTPRE
6.68 -1.403 0.000 0.000210 ENSG00000188158 NHS
7.53 -0.946 0.000 0.000230 ENSG00000107819 SFXN3
7.73 -0.772 0.000 0.000230 ENSG00000120254 MTHFD1L
7.79 -0.638 0.000 0.000230 ENSG00000122863 CHST3
3.92 -2.068 0.000 0.000230 ENSG00000158164 TMSB15A
7.94 -0.812 0.000 0.000230 ENSG00000185950 IRS2
7.35 -1.124 0.000 0.000240 ENSG00000067064 IDI1
7.4 -0.902 0.000 0.000240 ENSG00000124788 ATXN1
5.98 -0.83 0.000 0.000240 ENSG00000138131 LOXL4
7.87 -1.371 0.000 0.000240 ENSG00000187498 COL4A1
7.87 -0.797 0.000 0.000240 ENSG00000198951 NAGA
4.77 -1.178 0.000 0.000250 ENSG00000100285 NEFH
7.33 -0.837 0.000 0.000250 ENSG00000103335 PIEZO1
8.26 -2.268 0.000 0.000250 ENSG00000132470 ITGB4
8.65 -1.241 0.000 0.000250 ENSG00000181019 NQO1
5.67 -0.561 0.000 0.000250 ENSG00000198832 RP3-412A9.11
6.86 -0.871 0.000 0.000260 ENSG00000070269 C14orf101

5.37 -1.251 0.000 0.000260 ENSG00000113448 PDE4D
7.86 -0.688 0.000 0.000260 ENSG00000139645 ANKRD52
4.6 -1.255 0.000 0.000260 ENSG00000165626 BEND7
6.96 -0.585 0.000 0.000260 ENSG00000182541 LIMK2
6.14 -0.788 0.000 0.000270 ENSG00000014914 MTMR11
8.85 -1.086 0.000 0.000270 ENSG00000064666 CNN2
7.38 -0.682 0.000 0.000270 ENSG00000128283 CDC42EP1
8.39 -1.216 0.000 0.000270 ENSG00000162909 CAPN2
8.41 -0.967 0.000 0.000270 ENSG00000176903 PNMA1
5.62 -1.998 0.000 0.000270 ENSG00000182732 RGS6
6.79 -0.841 0.000 0.000280 ENSG00000151553 FAM160B1
7.41 -1.128 0.000 0.000280 ENSG00000170365 SMAD1
8.41 -1.273 0.000 0.000280 ENSG00000183722 LHFP
3.46 -1.447 0.000 0.000290 ENSG00000022556 NLRP2
2.61 -0.607 0.000 0.000290 ENSG00000133640 LRRIQ1
7.3 -1.36 0.000 0.000300 ENSG00000185339 TCN2
5.04 -1.653 0.000 0.000300 ENSG00000215475 SIAH3
9 -1.175 0.000 0.000310 ENSG00000107779 BMPR1A
8.88 -1.175 0.000 0.000310 ENSG00000117298 ECE1
8.39 -0.598 0.000 0.000310 ENSG00000198954 KIAA1279
5.41 -2.252 0.000 0.000320 ENSG00000124610 HIST1H1A
7.1 -1.379 0.000 0.000320 ENSG00000205726 ITSN1
7.71 -0.927 0.000 0.000330 ENSG00000102753 KPNA3
7 -0.851 0.000 0.000330 ENSG00000143553 SNAPIN
9.78 -0.783 0.000 0.000330 ENSG00000147416 ATP6V1B2
5.61 -2.025 0.000 0.000330 ENSG00000188517 COL25A1
6.51 -2.126 0.000 0.000340 ENSG00000152977 ZIC1
7.7 -0.908 0.000 0.000340 ENSG00000154001 PPP2R5E
4.97 -0.768 0.000 0.000340 ENSG00000203877 RIPPLY2
6.95 -1.345 0.000 0.000350 ENSG00000080298 RFX3
8.79 -1.528 0.000 0.000350 ENSG00000082781 ITGB5
8.19 -0.882 0.000 0.000350 ENSG00000175662 TOM1L2
4.95 -2.667 0.000 0.000360 ENSG00000089472 HEPH
9.11 -1.105 0.000 0.000360 ENSG00000102898 NUTF2
7.58 -1.796 0.000 0.000360 ENSG00000125430 HS3ST3B1
9.2 -1.289 0.000 0.000360 ENSG00000172380 GNG12
6.85 -1.643 0.000 0.000370 ENSG00000139263 LRIG3
8.35 -0.864 0.000 0.000370 ENSG00000165476 REEP3
6.9 -1.305 0.000 0.000380 ENSG00000040199 PHLPP2
8.79 -0.701 0.000 0.000380 ENSG00000120733 KDM3B
5 -1.693 0.000 0.000380 ENSG00000143473 KCNH1
6.27 -2.298 0.000 0.000380 ENSG00000173917 HOXB2
6.76 -0.673 0.000 0.000380 ENSG00000205011 AC073082.1
6.92 -1.127 0.000 0.000390 ENSG00000042493 CAPG
7.91 -0.553 0.000 0.000390 ENSG00000204138 PHACTR4
9.42 -0.581 0.000 0.000400 ENSG00000143742 SRP9
8.05 -1.162 0.000 0.000410 ENSG00000107249 GLIS3
4.6 -0.65 0.000 0.000410 ENSG00000172238 ATOH1
8.33 -1.572 0.000 0.000420 ENSG00000152767 FARP1
4.7 -2.758 0.000 0.000420 ENSG00000163762 TM4SF18
7.29 -1.321 0.000 0.000420 ENSG00000179242 CDH4
4.32 -1.336 0.000 0.000420 ENSG00000188316 ENO4
7.53 -0.977 0.000 0.000430 ENSG00000039560 RAI14
8.64 -1.21 0.000 0.000430 ENSG00000128791 TWSG1
7.73 -1.201 0.000 0.000430 ENSG00000130147 SH3BP4
6.59 -1.229 0.000 0.000440 ENSG00000113721 PDGFRB
7.62 -0.694 0.000 0.000440 ENSG00000150760 DOCK1
7.79 -0.655 0.000 0.000450 ENSG00000148411 NACC2
4.02 -1.254 0.000 0.000450 ENSG00000154760 SLFN13
6.46 -1.117 0.000 0.000450 ENSG00000188153 COL4A5
4.84 -1.145 0.000 0.000460 ENSG00000109339 MAPK10
9.11 -2.027 0.000 0.000460 ENSG00000118523 CTGF
7.55 -1.017 0.000 0.000460 ENSG00000137693 YAP1
7.01 -0.558 0.000 0.000460 ENSG00000186174 BCL9L
6.04 -1.369 0.000 0.000460 ENSG00000197565 COL4A6
7.85 -0.791 0.000 0.000460 ENSG00000198856 OSTC
3.62 -0.988 0.000 0.000470 ENSG00000155974 GRIP1
6.5 -0.997 0.000 0.000470 ENSG00000171055 FEZ2
6.12 -0.539 0.000 0.000470 ENSG00000171680 PLEKHG5
10.02 -1.536 0.000 0.000490 ENSG00000041982 TNC
7.6 -0.459 0.000 0.000490 ENSG00000166454 ATMIN
6.43 -0.794 0.001 0.000500 ENSG00000197261 C6orf141
8.15 -1.104 0.001 0.000510 ENSG00000119655 NPC2
5.83 -1.663 0.001 0.000510 ENSG00000146373 RNF217
3.77 -1.114 0.001 0.000510 ENSG00000150540 HNMT
7.94 -0.855 0.001 0.000510 ENSG00000159842 ABR

7.84 -0.72 0.001 0.000510 ENSG00000214265 SNURF
3.66 -1.512 0.001 0.000510 ENSG00000214919 AC104472.1
5.04 -1.875 0.001 0.000520 ENSG00000102935 ZNF423
6.22 -0.928 0.001 0.000530 ENSG00000106991 ENG
7.3 -0.798 0.001 0.000530 ENSG00000119977 TCTN3
8.03 -1.528 0.001 0.000530 ENSG00000120437 ACAT2
6.41 -1.877 0.001 0.000530 ENSG00000138829 FBN2
3.03 -0.791 0.001 0.000530 ENSG00000162614 NEXN
6.94 -2.214 0.001 0.000540 ENSG00000121005 CRISPLD1
6.77 -1.284 0.001 0.000540 ENSG00000164465 DCBLD1
4.6 -1.496 0.001 0.000540 ENSG00000172671 ZFAND4
5.97 -0.955 0.001 0.000550 ENSG00000162520 SYNC
6.25 -0.89 0.001 0.000550 ENSG00000163485 ADORA1
4.05 -1.925 0.001 0.000550 ENSG00000165983 PTER
7.78 -1.38 0.001 0.000550 ENSG00000180537 RNF182
7.99 -0.599 0.001 0.000560 ENSG00000166333 ILK
5.29 -0.835 0.001 0.000570 ENSG00000139832 RAB20
8.27 -1.102 0.001 0.000580 ENSG00000122026 RPL21
6.95 -0.781 0.001 0.000580 ENSG00000167123 CERCAM
5.84 -2.436 0.001 0.000580 ENSG00000205664 RP11-706O15.1
7.91 -1.209 0.001 0.000590 ENSG00000122884 P4HA1
4.18 -1.509 0.001 0.000590 ENSG00000172716 SLFN11
8.52 -0.9 0.001 0.000600 ENSG00000149485 FADS1
6.61 -0.601 0.001 0.000610 ENSG00000087903 RFX2
5.67 -0.674 0.001 0.000610 ENSG00000095539 SEMA4G
7.5 -0.85 0.001 0.000610 ENSG00000160584 SIK3
7.03 -0.509 0.001 0.000610 ENSG00000166507 NDST2
7.78 -0.812 0.001 0.000620 ENSG00000074181 NOTCH3
6.24 -0.729 0.001 0.000630 ENSG00000157322 CLEC18A
9.2 -1.574 0.001 0.000640 ENSG00000112186 CAP2
7.07 -2.615 0.001 0.000650 ENSG00000087510 TFAP2C
7.91 -0.832 0.001 0.000650 ENSG00000142173 COL6A2
8.58 -0.86 0.001 0.000650 ENSG00000197965 MPZL1
5.92 -0.73 0.001 0.000660 ENSG00000124831 LRRFIP1
6.9 -1.51 0.001 0.000670 ENSG00000104783 KCNN4
3.47 -0.881 0.001 0.000670 ENSG00000146038 DCDC2
5.8 -0.887 0.001 0.000670 ENSG00000198768 APCDD1L
6.88 -0.976 0.001 0.000680 ENSG00000148429 USP6NL
8.6 -1.479 0.001 0.000680 ENSG00000169855 ROBO1
6.76 -0.914 0.001 0.000680 ENSG00000181649 PHLDA2
7.55 -0.953 0.001 0.000690 ENSG00000014216 CAPN1
5.39 -0.948 0.001 0.000690 ENSG00000112183 RBM24
7.14 -0.869 0.001 0.000690 ENSG00000118263 KLF7
7.46 -0.671 0.001 0.000700 ENSG00000148498 PARD3
5.7 -1.198 0.001 0.000710 ENSG00000135824 RGS8
6.54 -1.323 0.001 0.000720 ENSG00000161958 FGF11
6.12 -1.585 0.001 0.000720 ENSG00000172059 KLF11
8.33 -0.669 0.001 0.000720 ENSG00000172081 MOB3A
6.15 -0.95 0.001 0.000720 ENSG00000239887 C1orf226
6.85 -1.301 0.001 0.000730 ENSG00000123096 SSPN
6.8 -0.95 0.001 0.000740 ENSG00000196935 SRGAP1
6.28 -0.714 0.001 0.000750 ENSG00000157335 CLEC18C
6.9 -2.327 0.001 0.000760 ENSG00000146411 SLC2A12
7.01 -1.299 0.001 0.000760 ENSG00000197093 GAL3ST4
7.62 -0.757 0.001 0.000770 ENSG00000137216 TMEM63B
5.99 -1.143 0.001 0.000770 ENSG00000144218 AFF3
8.19 -1.003 0.001 0.000780 ENSG00000173457 PPP1R14B
6.95 -1.347 0.001 0.000790 ENSG00000065534 MYLK
9.56 -1.273 0.001 0.000790 ENSG00000124942 AHNAK
8.75 -1.094 0.001 0.000790 ENSG00000160285 LSS
6.52 -1.65 0.001 0.000790 ENSG00000172469 MANEA
6.01 -0.921 0.001 0.000790 ENSG00000173715 C11orf80
4.28 -1.027 0.001 0.000790 ENSG00000197140 ADAM32
4.65 -1.509 0.001 0.000800 ENSG00000169946 ZFPM2
7.97 -1.128 0.001 0.000800 ENSG00000183098 GPC6
5.92 -1.632 0.001 0.000810 ENSG00000003436 TFPI
7.2 -0.498 0.001 0.000810 ENSG00000070366 SMG6
6.43 -0.515 0.001 0.000830 ENSG00000172346 CSDC2
4.78 -0.836 0.001 0.000830 ENSG00000228120 AP001631.10
8.58 -0.967 0.001 0.000840 ENSG00000145349 CAMK2D
8.74 -0.995 0.001 0.000850 ENSG00000068793 CYFIP1
7.38 -0.478 0.001 0.000850 ENSG00000197324 LRP10
6.39 -0.685 0.001 0.000860 ENSG00000065665 SEC61A2
6.27 -0.754 0.001 0.000860 ENSG00000140839 CLEC18B
4.96 -1.366 0.001 0.000870 ENSG00000150630 VEGFC
7.31 -0.864 0.001 0.000870 ENSG00000151929 BAG3

5.94 -1.001 0.001 0.000870 ENSG00000165406 MARCH8
7.37 -0.827 0.001 0.000880 ENSG00000126773 PCNXL4
8.75 -0.652 0.001 0.000890 ENSG00000138107 ACTR1A
8.13 -0.645 0.001 0.000900 ENSG00000132142 ACACA
2.65 -0.596 0.001 0.000900 ENSG00000151338 MIPOL1
7.65 -2.89 0.001 0.000910 ENSG00000145623 OSMR
4.78 -1.089 0.001 0.000930 ENSG00000164484 TMEM200A
8.17 -0.961 0.001 0.000930 ENSG00000180776 ZDHHC20
6.74 -1.118 0.001 0.000940 ENSG00000165512 ZNF22
3.35 -0.987 0.001 0.000950 ENSG00000174792 C4orf26
6.79 -1.011 0.001 0.000950 ENSG00000183580 FBXL7
7.28 -0.97 0.001 0.000960 ENSG00000109536 FRG1
5.53 -0.609 0.001 0.000960 ENSG00000123191 ATP7B
8.79 -0.678 0.001 0.000960 ENSG00000152022 LIX1L
6.78 -1.182 0.001 0.000960 ENSG00000164251 F2RL1
7.8 -0.836 0.001 0.000970 ENSG00000147100 SLC16A2
7.29 -0.674 0.001 0.000990 ENSG00000130643 CALY
6.22 -0.585 0.001 0.000990 ENSG00000198546 ZNF511
4.79 2.559 0.000 0.000000 ENSG00000005020 SKAP2
5.52 1.043 0.000 0.000000 ENSG00000008056 SYN1
8.25 3.134 0.000 0.000000 ENSG00000010278 CD9
4.2 2.503 0.000 0.000000 ENSG00000064763 FAR2
6.63 0.942 0.000 0.000000 ENSG00000085117 CD82
4.32 2.793 0.000 0.000000 ENSG00000086300 SNX10
4.71 2.29 0.000 0.000000 ENSG00000087303 NID2
7.32 0.863 0.000 0.000000 ENSG00000105379 ETFB
3.96 1.434 0.000 0.000000 ENSG00000105499 PLA2G4C
3.11 1.41 0.000 0.000000 ENSG00000105889 STEAP1B
5.57 1.454 0.000 0.000000 ENSG00000106789 CORO2A
5.47 1.651 0.000 0.000000 ENSG00000114948 ADAM23
5.72 0.898 0.000 0.000000 ENSG00000119431 HDHD3
7.21 4.479 0.000 0.000000 ENSG00000124785 NRN1
7.81 0.974 0.000 0.000000 ENSG00000128563 PRKRIP1
5.96 1.277 0.000 0.000000 ENSG00000128709 HOXD9
6.11 3.085 0.000 0.000000 ENSG00000128710 HOXD10
5.82 1.922 0.000 0.000000 ENSG00000128713 HOXD11
6.64 1.409 0.000 0.000000 ENSG00000130203 APOE
5.62 4.15 0.000 0.000000 ENSG00000134853 PDGFRA
3.79 1.54 0.000 0.000000 ENSG00000137727 ARHGAP20
5.02 1.678 0.000 0.000000 ENSG00000140022 STON2
5.67 2.034 0.000 0.000000 ENSG00000142494 SLC47A1
7.69 1.182 0.000 0.000000 ENSG00000147883 CDKN2B
4.8 2.142 0.000 0.000000 ENSG00000153029 MR1
3.98 1.624 0.000 0.000000 ENSG00000154493 C10orf90
7.05 1.693 0.000 0.000000 ENSG00000154537 FAM27C
4.8 1.635 0.000 0.000000 ENSG00000156395 SORCS3
5.16 1.341 0.000 0.000000 ENSG00000160179 ABCG1
5.79 2.664 0.000 0.000000 ENSG00000163235 TGFA
3.18 1.87 0.000 0.000000 ENSG00000164647 STEAP1
4.75 1.82 0.000 0.000000 ENSG00000167216 KATNAL2
5.26 2.322 0.000 0.000000 ENSG00000167306 MYO5B
5.67 2.245 0.000 0.000000 ENSG00000168234 TTC39C
5.37 1.902 0.000 0.000000 ENSG00000168779 SHOX2
8.11 1.311 0.000 0.000000 ENSG00000169919 GUSB
7.18 1.878 0.000 0.000000 ENSG00000170215 FAM27B
5.26 1.276 0.000 0.000000 ENSG00000171729 TMEM51
4.33 2.548 0.000 0.000000 ENSG00000173083 HPSE
6.17 1.425 0.000 0.000000 ENSG00000173805 HAP1
6.57 3.093 0.000 0.000000 ENSG00000175197 DDIT3
6.62 3.734 0.000 0.000000 ENSG00000176165 FOXG1
7.24 1.431 0.000 0.000000 ENSG00000178381 ZFAND2A
3.53 1.663 0.000 0.000000 ENSG00000178538 CA8
7.2 1.913 0.000 0.000000 ENSG00000182368 FAM27A
4.39 2.726 0.000 0.000000 ENSG00000186960 C14orf23
7.12 1.148 0.000 0.000000 ENSG00000187244 BCAM
5.4 1.328 0.000 0.000000 ENSG00000187764 SEMA4D
6.42 4.099 0.000 0.000000 ENSG00000189058 APOD
3.76 1.495 0.000 0.000000 ENSG00000196954 CASP4
5.08 1.747 0.000 0.000000 ENSG00000204634 TBC1D8
5.43 1.793 0.000 0.000000 ENSG00000214575 CPEB1
4.75 2.361 0.000 0.000000 ENSG00000223572 CKMT1A
4.81 2.497 0.000 0.000000 ENSG00000237289 CKMT1B
5.34 2.004 0.000 0.000000 ENSG00000258518 AC112502.1
6.52 1.423 0.000 0.000010 ENSG00000003096 KLHL13
5.29 2.135 0.000 0.000010 ENSG00000019549 SNAI2
4.56 1.944 0.000 0.000010 ENSG00000056736 IL17RB

8.68 0.866 0.000 0.000010 ENSG00000073578 SDHA
6.03 3.092 0.000 0.000010 ENSG00000089127 OAS1
3.57 1.6 0.000 0.000010 ENSG00000092607 TBX15
4.82 1.835 0.000 0.000010 ENSG00000102445 KIAA0226L
4.8 1.261 0.000 0.000010 ENSG00000103723 AP3B2
7.97 0.683 0.000 0.000010 ENSG00000106263 EIF3B
7.77 1.071 0.000 0.000010 ENSG00000121083 DYNLL2
6.83 1.603 0.000 0.000010 ENSG00000123146 CD97
6.29 0.672 0.000 0.000010 ENSG00000125746 EML2
7.19 0.883 0.000 0.000010 ENSG00000130305 NSUN5
7.03 2.957 0.000 0.000010 ENSG00000146072 TNFRSF21
4.7 1.327 0.000 0.000010 ENSG00000151117 TMEM86A
4.81 1.072 0.000 0.000010 ENSG00000153790 C7orf31
5.74 2.749 0.000 0.000010 ENSG00000166741 NNMT
5.11 1.937 0.000 0.000010 ENSG00000186088 PION
4.71 1.77 0.000 0.000010 ENSG00000187098 MITF
3.72 1.906 0.000 0.000010 ENSG00000204397 CARD16
7.31 1.009 0.000 0.000010 ENSG00000232098 AC012313.1
6.94 1.111 0.000 0.000020 ENSG00000072071 LPHN1
5.36 2.843 0.000 0.000020 ENSG00000089041 P2RX7
7.79 0.858 0.000 0.000020 ENSG00000089057 SLC23A2
5.99 1.358 0.000 0.000020 ENSG00000096433 ITPR3
8.6 0.812 0.000 0.000020 ENSG00000101474 APMAP
6.01 0.937 0.000 0.000020 ENSG00000103241 FOXF1
7.4 0.773 0.000 0.000020 ENSG00000105518 TMEM205
4.7 3.12 0.000 0.000020 ENSG00000106688 SLC1A1
4.11 1.188 0.000 0.000020 ENSG00000107864 CPEB3
6.82 1.511 0.000 0.000020 ENSG00000110841 PPFIBP1
6.66 1.511 0.000 0.000020 ENSG00000117280 RAB7L1
8.48 1.567 0.000 0.000020 ENSG00000125148 MT2A
5.93 2.858 0.000 0.000020 ENSG00000125820 NKX2-2
6.31 2.012 0.000 0.000020 ENSG00000134202 GSTM3
3.83 1.725 0.000 0.000020 ENSG00000134716 CYP2J2
4.06 1.958 0.000 0.000020 ENSG00000136237 RAPGEF5
7.69 2.759 0.000 0.000020 ENSG00000136842 TMOD1
6.32 0.922 0.000 0.000020 ENSG00000141026 MED9
6.37 1.279 0.000 0.000020 ENSG00000147889 CDKN2A
5.91 1.256 0.000 0.000020 ENSG00000158856 EPB49
7.98 1.855 0.000 0.000020 ENSG00000162873 KLHDC8A
6.97 0.685 0.000 0.000020 ENSG00000167106 FAM102A
6.31 3.628 0.000 0.000020 ENSG00000170689 HOXB9
5.8 1.171 0.000 0.000020 ENSG00000172183 ISG20
4.23 1.273 0.000 0.000020 ENSG00000172345 STARD5
7.02 1.316 0.000 0.000020 ENSG00000176490 DIRAS1
6.55 0.867 0.000 0.000020 ENSG00000177556 ATOX1
5.87 2.507 0.000 0.000020 ENSG00000196361 ELAVL3
5.22 0.873 0.000 0.000020 ENSG00000196476 C20orf96
5.9 1.137 0.000 0.000020 ENSG00000196683 TOMM7
8.22 1.421 0.000 0.000020 ENSG00000205362 MT1A
4.9 1.465 0.000 0.000030 ENSG00000018869 ZNF582
6.66 0.818 0.000 0.000030 ENSG00000101181 GTPBP5
5.69 1.972 0.000 0.000030 ENSG00000113296 THBS4
5.74 1.581 0.000 0.000030 ENSG00000119922 IFIT2
6.83 0.596 0.000 0.000030 ENSG00000130813 C19orf66
6.99 1.327 0.000 0.000030 ENSG00000145216 FIP1L1
5.26 1.11 0.000 0.000030 ENSG00000149927 DOC2A
4.84 1.528 0.000 0.000030 ENSG00000151090 THRB
7.77 1.094 0.000 0.000030 ENSG00000157259 GATAD1
3.99 0.896 0.000 0.000030 ENSG00000164398 ACSL6
4.64 2.353 0.000 0.000030 ENSG00000165443 PHYHIPL
5.03 1.62 0.000 0.000030 ENSG00000170941 AC135352.1
4.99 2.142 0.000 0.000030 ENSG00000182326 C1S
2.77 0.976 0.000 0.000030 ENSG00000188906 LRRK2
7.76 1.085 0.000 0.000030 ENSG00000198874 TYW1
4.54 1.754 0.000 0.000040 ENSG00000073910 FRY
6.26 2.104 0.000 0.000040 ENSG00000104419 NDRG1
7.3 1.434 0.000 0.000040 ENSG00000130066 SAT1
4.96 1.985 0.000 0.000040 ENSG00000132436 FIGNL1
3.52 0.689 0.000 0.000040 ENSG00000143226 FCGR2A
4.62 1.31 0.000 0.000040 ENSG00000143320 CRABP2
5.93 4.882 0.000 0.000040 ENSG00000156298 TSPAN7
5.53 2.923 0.000 0.000040 ENSG00000160862 AZGP1
6.28 2.132 0.000 0.000040 ENSG00000163638 ADAMTS9
4.94 1.336 0.000 0.000040 ENSG00000189164 ZNF527
6.19 1.071 0.000 0.000050 ENSG00000105649 RAB3A
5.66 1.003 0.000 0.000050 ENSG00000136002 ARHGEF4


Average expression across samples  Fold-change  p-value  FDR  Ensembl Gene ID  Symbol
6.21   1.475  0.000  0.000050  ENSG00000136243  NUPL2
5.64   0.992  0.000  0.000050  ENSG00000151093  OXSM
4.85   1.082  0.000  0.000050  ENSG00000155629  PIK3AP1
7.07   0.716  0.000  0.000050  ENSG00000157778  PSMG3
8.48   1.368  0.000  0.000050  ENSG00000162545  CAMK2N1
4.72   1.871  0.000  0.000050  ENSG00000164741  DLC1
5.24   1.051  0.000  0.000060  ENSG00000058866  DGKG
5.61   1.415  0.000  0.000060  ENSG00000073464  CLCN4
9.27   0.763  0.000  0.000060  ENSG00000086232  EIF2AK1
3.97   2.144  0.000  0.000060  ENSG00000108176  DNAJC12
7.3    1.402  0.000  0.000060  ENSG00000112852  PCDHB2
4.68   1.736  0.000  0.000060  ENSG00000137393  RNF144B
6.16   1.387  0.000  0.000070  ENSG00000116574  RHOU
8.42   1.11   0.000  0.000070  ENSG00000124226  RNF114
5.72   0.79   0.000  0.000070  ENSG00000129282  MRM1
6      1.143  0.000  0.000070  ENSG00000159228  CBR1
7.25   1.234  0.000  0.000070  ENSG00000166770  AC004696.2
6.42   0.938  0.000  0.000070  ENSG00000172840  PDP2
6.38   0.76   0.000  0.000070  ENSG00000174672  BRSK2
7.22   0.934  0.000  0.000070  ENSG00000179304  FAM156B
6.01   2.099  0.000  0.000070  ENSG00000182217  HIST2H4B
6.01   2.099  0.000  0.000070  ENSG00000183941  HIST2H4A
7.93   2.197  0.000  0.000070  ENSG00000196532  HIST1H3C
6.17   0.703  0.000  0.000070  ENSG00000196741  CXorf24
4.98   1.47   0.000  0.000070  ENSG00000214106  AC093726.6
5.49   0.968  0.000  0.000080  ENSG00000036672  USP2
5.53   1.341  0.000  0.000080  ENSG00000088035  ALG6
5.15   1.343  0.000  0.000080  ENSG00000156500  FAM122C
5.74   1.177  0.000  0.000080  ENSG00000163884  KLF15
6.04   1.258  0.000  0.000080  ENSG00000166402  TUB
7.17   0.939  0.000  0.000080  ENSG00000182646  FAM156A
6.64   4.408  0.000  0.000080  ENSG00000250349  RP5-972B16.2
6.97   0.713  0.000  0.000090  ENSG00000101189  C20orf20
3.92   1.212  0.000  0.000090  ENSG00000119514  GALNT12
3.71   0.824  0.000  0.000090  ENSG00000125804  FAM182A
7.55   0.989  0.000  0.000090  ENSG00000163156  SCNM1
3.42   1.18   0.000  0.000090  ENSG00000181016  C7orf53
6.35   0.891  0.000  0.000090  ENSG00000184205  TSPYL2
4.35   1.373  0.000  0.000090  ENSG00000189283  FHIT
5.51   1.475  0.000  0.000090  ENSG00000198270  TMEM116
7.84   1.309  0.000  0.000100  ENSG00000101224  CDC25B
5.52   0.686  0.000  0.000100  ENSG00000126950  TMEM35
4.4    2.366  0.000  0.000100  ENSG00000148735  PLEKHS1
5.04   1.951  0.000  0.000100  ENSG00000163644  PPM1K
7.83   1.683  0.000  0.000100  ENSG00000172115  CYCS
5.24   1.512  0.000  0.000100  ENSG00000204807  FAM27E2
6.23   1.594  0.000  0.000100  ENSG00000251369  AC003682.1
6.76   1.991  0.000  0.000100  ENSG00000255986  MT1JP
5.41   1.122  0.000  0.000110  ENSG00000131242  RAB11FIP4
7.78   0.733  0.000  0.000110  ENSG00000146676  PURB
6.69   2.507  0.000  0.000110  ENSG00000149294  NCAM1
4.24   0.751  0.000  0.000110  ENSG00000175697  GPR156
4.15   0.854  0.000  0.000110  ENSG00000214290  C11orf93
4.27   1.558  0.000  0.000120  ENSG00000082482  KCNK2
4.89   0.887  0.000  0.000120  ENSG00000100307  CBX7
2.71   0.772  0.000  0.000120  ENSG00000146856  AGBL3
5.69   0.606  0.000  0.000120  ENSG00000169136  ATF5
4.36   0.938  0.000  0.000120  ENSG00000175170  FAM182B
6.37   0.774  0.000  0.000120  ENSG00000255098  RP11-481A20.11
3.52   1.432  0.000  0.000130  ENSG00000003147  ICA1
4.87   1.498  0.000  0.000130  ENSG00000007944  MYLIP
5.81   0.772  0.000  0.000130  ENSG00000101104  PABPC1L
5      1.218  0.000  0.000130  ENSG00000102362  SYTL4
6.13   1.354  0.000  0.000130  ENSG00000158186  MRAS
4.22   0.902  0.000  0.000130  ENSG00000176273  SLC35G1
6.5    1.029  0.000  0.000130  ENSG00000256073  C21orf119
5.73   2.686  0.000  0.000140  ENSG00000117602  RCAN3
7.14   1.232  0.000  0.000140  ENSG00000141753  IGFBP4
5.52   0.559  0.000  0.000140  ENSG00000149599  DUSP15
4.73   1.741  0.000  0.000140  ENSG00000154734  ADAMTS1
3.69   1.648  0.000  0.000150  ENSG00000064225  ST3GAL6
5.21   1.771  0.000  0.000150  ENSG00000106004  HOXA5
6.07   1.193  0.000  0.000150  ENSG00000111364  DDX55
6.02   1.5    0.000  0.000150  ENSG00000120318  ARAP3
5.89   0.804  0.000  0.000150  ENSG00000178809  TRIM73
7.59   0.945  0.000  0.000150  ENSG00000184371  CSF1


Average expression across samples  Fold-change  p-value  FDR  Ensembl Gene ID  Symbol
6.4    0.769  0.000  0.000150  ENSG00000184857  TMEM186
5.64   1.271  0.000  0.000150  ENSG00000198814  GK
4.72   1.253  0.000  0.000150  ENSG00000213903  LTB4R
5.17   1.876  0.000  0.000160  ENSG00000008277  ADAM22
3.51   1.09   0.000  0.000160  ENSG00000118777  ABCG2
5.85   1.024  0.000  0.000160  ENSG00000164048  ZNF589
6.16   0.956  0.000  0.000160  ENSG00000175175  PPM1E
6.99   1.096  0.000  0.000160  ENSG00000232838  PET117
6.64   0.794  0.000  0.000170  ENSG00000102934  PLLP
6.78   0.622  0.000  0.000170  ENSG00000141582  CBX4
8.37   0.573  0.000  0.000170  ENSG00000143543  JTB
3.96   1.576  0.000  0.000170  ENSG00000158315  RHBDL2
6.45   1.095  0.000  0.000170  ENSG00000204524  ZNF805
6.43   1.7    0.000  0.000180  ENSG00000006652  IFRD1
5.02   1.218  0.000  0.000180  ENSG00000092068  SLC7A8
4.53   2.247  0.000  0.000180  ENSG00000152527  PLEKHH2
6.11   1.012  0.000  0.000180  ENSG00000164620  RELL2
5.74   1.411  0.000  0.000180  ENSG00000165795  NDRG2
6.38   2.236  0.000  0.000190  ENSG00000105409  ATP1A3
8.02   1.715  0.000  0.000190  ENSG00000106537  TSPAN13
6.58   1.894  0.000  0.000190  ENSG00000125144  MT1G
5.6    1.454  0.000  0.000190  ENSG00000164684  ZNF704
6.47   0.985  0.000  0.000200  ENSG00000065717  TLE2
3.93   1.031  0.000  0.000200  ENSG00000122862  SRGN
4.01   1.439  0.000  0.000200  ENSG00000137203  TFAP2A
6.21   1.126  0.000  0.000200  ENSG00000178878  APOLD1
5.99   0.934  0.000  0.000200  ENSG00000180953  ST20
8.5    1.354  0.000  0.000200  ENSG00000198743  SLC5A3
5.6    0.559  0.000  0.000200  ENSG00000251357  AP000350.10
7.33   0.633  0.000  0.000210  ENSG00000130733  YIPF2
5.89   0.896  0.000  0.000210  ENSG00000142528  ZNF473
5.73   1.626  0.000  0.000210  ENSG00000143162  CREG1
6.19   0.828  0.000  0.000210  ENSG00000146909  NOM1
5.34   0.95   0.000  0.000210  ENSG00000149150  SLC43A1
3.29   0.995  0.000  0.000210  ENSG00000165923  AGBL2
6.6    0.893  0.000  0.000210  ENSG00000175898  S1PR2
4.07   0.944  0.000  0.000210  ENSG00000188732  FAM221A
7.44   1.401  0.000  0.000210  ENSG00000231997  FAM27D1
5.25   0.581  0.000  0.000220  ENSG00000126583  PRKCG
6.21   1.077  0.000  0.000220  ENSG00000173875  ZNF791
3.79   1.662  0.000  0.000230  ENSG00000071073  MGAT4A
6.82   0.743  0.000  0.000230  ENSG00000106638  TBL2
7.47   1.573  0.000  0.000230  ENSG00000189060  H1F0
6.2    0.673  0.000  0.000240  ENSG00000106404  CLDN15
6.5    0.814  0.000  0.000240  ENSG00000106479  ZNF862
5.81   0.871  0.000  0.000240  ENSG00000106608  URGCP
5.85   0.819  0.000  0.000240  ENSG00000108852  MPP2
6.65   2.284  0.000  0.000240  ENSG00000154096  THY1
9.19   1.685  0.000  0.000240  ENSG00000166986  MARS
6.7    2.826  0.000  0.000240  ENSG00000175445  LPL
7.67   0.515  0.000  0.000240  ENSG00000184787  UBE2G2
5.33   1.325  0.000  0.000240  ENSG00000237198  FAM27E1
3.64   1.397  0.000  0.000240  ENSG00000250305  KIAA1456
7.42   0.921  0.000  0.000250  ENSG00000130254  SAFB2
4.71   0.878  0.000  0.000250  ENSG00000152049  KCNE4
4.05   1.314  0.000  0.000250  ENSG00000175906  ARL4D
3.91   0.779  0.000  0.000250  ENSG00000204086  RPA4
5.35   2.281  0.000  0.000250  ENSG00000205364  MT1M
4.05   1.053  0.000  0.000260  ENSG00000076554  TPD52
8.37   1.379  0.000  0.000260  ENSG00000085662  AKR1B1
8.61   0.814  0.000  0.000260  ENSG00000104687  GSR
6.29   1.082  0.000  0.000260  ENSG00000105926  MPP6
9.84   0.823  0.000  0.000260  ENSG00000146701  MDH2
5.59   0.957  0.000  0.000260  ENSG00000168301  KCTD6
5.9    0.865  0.000  0.000260  ENSG00000174939  ASPHD1
5.92   0.645  0.000  0.000260  ENSG00000204536  CCHCR1
6.69   0.769  0.000  0.000270  ENSG00000106785  TRIM14
6.99   0.808  0.000  0.000270  ENSG00000125875  TBC1D20
5.82   1.073  0.000  0.000270  ENSG00000164742  ADCY1
4.02   1.271  0.000  0.000270  ENSG00000196932  TMEM26
4.66   0.86   0.000  0.000270  ENSG00000197933  ZNF823
6      1.557  0.000  0.000270  ENSG00000198053  SIRPA
5.43   0.934  0.000  0.000270  ENSG00000215897  ZBTB8B
4.79   0.711  0.000  0.000280  ENSG00000124613  ZNF391
6.98   1.247  0.000  0.000280  ENSG00000126368  NR1D1
4.95   0.642  0.000  0.000290  ENSG00000125510  OPRL1


Average expression across samples  Fold-change  p-value  FDR  Ensembl Gene ID  Symbol
4.21   1.238  0.000  0.000290  ENSG00000166387  PPFIBP2
5.58   0.709  0.000  0.000290  ENSG00000186951  PPARA
7.3    1.284  0.000  0.000290  ENSG00000187372  PCDHB13
3.61   0.734  0.000  0.000290  ENSG00000196167  C11orf92
5.69   1.412  0.000  0.000300  ENSG00000068650  ATP11A
7.51   0.631  0.000  0.000300  ENSG00000106070  GRB10
5.19   0.553  0.000  0.000300  ENSG00000170379  FAM115C
5.03   0.896  0.000  0.000300  ENSG00000257755  orphan
4.33   1.526  0.000  0.000310  ENSG00000104324  CPQ
5.18   0.844  0.000  0.000310  ENSG00000107957  SH3PXD2A
6.49   0.752  0.000  0.000310  ENSG00000114859  CLCN2
7.06   1.16   0.000  0.000310  ENSG00000135457  TFCP2
5.35   1.852  0.000  0.000310  ENSG00000153714  LURAP1L
10.2   0.876  0.000  0.000310  ENSG00000171867  PRNP
3.1    1.481  0.000  0.000320  ENSG00000106560  GIMAP2
7.45   1.354  0.000  0.000320  ENSG00000150893  FREM2
5.65   0.632  0.000  0.000320  ENSG00000163132  MSX1
8      0.759  0.000  0.000320  ENSG00000178741  COX5A
5.45   1.098  0.000  0.000320  ENSG00000211445  GPX3
6.16   0.599  0.000  0.000330  ENSG00000083812  ZNF324
7.47   0.735  0.000  0.000330  ENSG00000145882  PCYOX1L
5.9    0.732  0.000  0.000330  ENSG00000155428  TRIM74
6.82   0.92   0.000  0.000340  ENSG00000106415  GLCCI1
5.56   0.658  0.000  0.000340  ENSG00000116039  ATP6V1B1
4.71   0.686  0.000  0.000340  ENSG00000132801  ZSWIM3
6.32   0.688  0.000  0.000340  ENSG00000134086  VHL
7.86   0.575  0.000  0.000350  ENSG00000113269  RNF130
5.23   1.823  0.000  0.000350  ENSG00000131015  ULBP2
4.27   1.325  0.000  0.000350  ENSG00000197757  HOXC6
7.01   1.393  0.000  0.000350  ENSG00000231360  AL592284.2
3.41   0.946  0.000  0.000360  ENSG00000025423  HSD17B6
5.67   1.773  0.000  0.000360  ENSG00000165118  C9orf64
4.77   0.903  0.000  0.000360  ENSG00000175911  AC127496.1
4.45   1.145  0.000  0.000360  ENSG00000232956  SNHG15
5.73   1.061  0.000  0.000370  ENSG00000135245  HILPDA
6.24   1.297  0.000  0.000370  ENSG00000135472  FAIM2
6.44   0.912  0.000  0.000370  ENSG00000154930  ACSS1
7.65   1.07   0.000  0.000370  ENSG00000168303  MPLKIP
5.48   0.809  0.000  0.000370  ENSG00000205358  MT1H
4.75   1.489  0.000  0.000380  ENSG00000112379  KIAA1244
4.21   1.081  0.000  0.000380  ENSG00000119547  ONECUT2
3.38   0.996  0.000  0.000380  ENSG00000144290  SLC4A10
5.43   0.988  0.000  0.000380  ENSG00000146833  TRIM4
7.01   1.023  0.000  0.000380  ENSG00000173276  ZNF295
8.3    0.683  0.000  0.000380  ENSG00000175193  PARL
4.84   0.881  0.000  0.000390  ENSG00000204856  FAM216A
6.52   1.323  0.000  0.000400  ENSG00000107829  FBXW4
6.84   0.91   0.000  0.000400  ENSG00000124228  DDX27
7.24   0.665  0.000  0.000410  ENSG00000101407  TTI1
6.79   0.747  0.000  0.000410  ENSG00000106245  BUD31
6.3    0.592  0.000  0.000410  ENSG00000142700  DMRTA2
6.5    0.915  0.000  0.000410  ENSG00000189046  ALKBH2
5.51   2.199  0.000  0.000410  ENSG00000215247
5.9    0.698  0.000  0.000420  ENSG00000006194  ZNF263
7.11   1.686  0.000  0.000420  ENSG00000120328  PCDHB12
7.42   0.68   0.000  0.000420  ENSG00000136213  CHST12
5.42   1.536  0.000  0.000420  ENSG00000151364  KCTD14
7.23   0.928  0.000  0.000420  ENSG00000198171  DDRGK1
6.44   0.8    0.000  0.000430  ENSG00000131368  MRPS25
8.34   0.747  0.000  0.000440  ENSG00000111678  C12orf57
6.57   0.838  0.000  0.000440  ENSG00000165661  QSOX2
5.88   1.608  0.000  0.000440  ENSG00000186868  MAPT
5.39   0.894  0.000  0.000440  ENSG00000253293  HOXA10
7.35   0.627  0.000  0.000450  ENSG00000156928  MALSU1
5.28   1.124  0.000  0.000450  ENSG00000172006  ZNF554
4.69   0.646  0.000  0.000450  ENSG00000177335  C8orf31
5.68   1.476  0.000  0.000450  ENSG00000206535  LNP1
6.92   0.94   0.000  0.000460  ENSG00000197608  ZNF841
3.25   1.454  0.000  0.000470  ENSG00000168811  IL12A
4.52   0.897  0.000  0.000470  ENSG00000176593  CTD-2368P22.1
3.63   1.342  0.000  0.000480  ENSG00000113389  NPR3
3.35   1.071  0.000  0.000480  ENSG00000176907  C8orf4
5.37   0.494  0.000  0.000480  ENSG00000177380  PPFIA3
7.44   0.526  0.000  0.000480  ENSG00000229212  RP11-561C5.4
4.52   1.355  0.000  0.000490  ENSG00000120915  EPHX2
4.77   1.002  0.000  0.000490  ENSG00000125388  GRK4


Average expression across samples  Fold-change  p-value  FDR  Ensembl Gene ID  Symbol
6.05   0.737  0.001  0.000500  ENSG00000103024  NME3
7.42   0.557  0.001  0.000500  ENSG00000105583  WDR83OS
4.38   0.804  0.001  0.000500  ENSG00000133401  PDZD2
6.69   0.801  0.001  0.000500  ENSG00000196652  ZKSCAN5
7.71   1.671  0.001  0.000510  ENSG00000070669  ASNS
6.84   0.567  0.001  0.000510  ENSG00000096070  BRPF3
5.69   1.106  0.001  0.000510  ENSG00000101187  SLCO4A1
5.48   0.832  0.001  0.000510  ENSG00000120158  RCL1
9.84   0.767  0.001  0.000510  ENSG00000163399  ATP1A1
5.71   1.081  0.001  0.000510  ENSG00000196981  WDR5B
4.19   1.925  0.001  0.000520  ENSG00000136514  RTP4
6.56   2.056  0.001  0.000520  ENSG00000156011  PSD3
4.5    0.937  0.001  0.000520  ENSG00000159882  ZNF230
5.22   0.836  0.001  0.000520  ENSG00000163362  C1orf106
5.09   2.391  0.001  0.000530  ENSG00000106511  MEOX2
5.05   1.189  0.001  0.000530  ENSG00000128604  IRF5
6.86   1.183  0.001  0.000530  ENSG00000163938  GNL3
6.95   0.971  0.001  0.000540  ENSG00000127952  STYXL1
7.12   1.905  0.001  0.000540  ENSG00000164649  CDCA7L
9.33   0.772  0.001  0.000540  ENSG00000187145  MRPS21
7.6    0.708  0.001  0.000550  ENSG00000068305  MEF2A
5.5    0.907  0.001  0.000550  ENSG00000124772  CPNE5
6.17   0.721  0.001  0.000550  ENSG00000129347  KRI1
4.03   0.802  0.001  0.000550  ENSG00000196482  ESRRG
5.88   0.673  0.001  0.000560  ENSG00000123358  NR4A1
4.08   0.7    0.001  0.000560  ENSG00000203815  AL358813.2
4.84   0.817  0.001  0.000570  ENSG00000110031  LPXN
6.18   0.467  0.001  0.000580  ENSG00000104731  KLHDC4
5.1    0.815  0.001  0.000580  ENSG00000144115  THNSL2
5      1.87   0.001  0.000590  ENSG00000104611  SH2D4A
5.53   0.752  0.001  0.000590  ENSG00000117266  CDK18
7.58   1.016  0.001  0.000590  ENSG00000171223  JUNB
6.64   0.746  0.001  0.000600  ENSG00000068400  GRIPAP1
5.05   0.96   0.001  0.000600  ENSG00000184208  C22orf46
8      0.613  0.001  0.000610  ENSG00000130717  UCK1
3.53   1.117  0.001  0.000610  ENSG00000135678  CPM
6.86   1.135  0.001  0.000610  ENSG00000257921  RP11-571M6.15
6.36   0.507  0.001  0.000610  ENSG00000258720
3.41   0.725  0.001  0.000620  ENSG00000140009  ESR2
5.09   1.82   0.001  0.000620  ENSG00000144847  IGSF11
3.89   2.44   0.001  0.000620  ENSG00000157214  STEAP2
7.27   0.911  0.001  0.000620  ENSG00000174243  DDX23
5.36   0.76   0.001  0.000620  ENSG00000213983  AP1G2
4.77   0.842  0.001  0.000630  ENSG00000095380  NANS
4.07   2.206  0.001  0.000630  ENSG00000152078  TMEM56
5.1    0.773  0.001  0.000640  ENSG00000188818  ZDHHC11
6.82   1.812  0.001  0.000650  ENSG00000120324  PCDHB10
5.77   1.023  0.001  0.000650  ENSG00000122877  EGR2
6.72   1.068  0.001  0.000650  ENSG00000203814  HIST2H2BF
4.4    1.535  0.001  0.000660  ENSG00000111846  GCNT2
4.62   1.74   0.001  0.000660  ENSG00000174007  CEP19
5.05   1.321  0.001  0.000660  ENSG00000180818  HOXC10
8.34   0.553  0.001  0.000660  ENSG00000214331  RP11-252A24.2
4.6    1.204  0.001  0.000670  ENSG00000185015  CA13
6.4    0.578  0.001  0.000680  ENSG00000148290  SURF1
6.12   1.141  0.001  0.000680  ENSG00000214357  NEURL1B
5.77   3.086  0.001  0.000690  ENSG00000154529  CNTNAP3B
6.46   0.53   0.001  0.000690  ENSG00000160299  PCNT
7.31   0.729  0.001  0.000700  ENSG00000087250  MT3
6.03   2.89   0.001  0.000700  ENSG00000106714  CNTNAP3
6.23   0.712  0.001  0.000700  ENSG00000181191  PJA1
8.21   0.906  0.001  0.000710  ENSG00000101210  EEF1A2
5.14   0.814  0.001  0.000710  ENSG00000160207  HSF2BP
5.97   0.818  0.001  0.000710  ENSG00000160908  ZNF394
5.69   0.548  0.001  0.000710  ENSG00000165655  ZNF503
5.24   2.121  0.001  0.000710  ENSG00000174607  UGT8
7.63   0.754  0.001  0.000720  ENSG00000104980  TIMM44
4.83   0.918  0.001  0.000720  ENSG00000162757  C1orf74
5.19   0.796  0.001  0.000730  ENSG00000062282  DGAT2
6.04   1.083  0.001  0.000730  ENSG00000064300  NGFR
3.4    1.382  0.001  0.000730  ENSG00000165071  TMEM71
3.53   1.144  0.001  0.000730  ENSG00000227471  AKR1B15
6.81   1.233  0.001  0.000740  ENSG00000111276  CDKN1B
6.24   0.731  0.001  0.000740  ENSG00000172366  FAM195A
6.03   1.004  0.001  0.000740  ENSG00000188283  ZNF383
5.05   0.755  0.001  0.000750  ENSG00000111199  TRPV4


Average expression across samples  Fold-change  p-value  FDR  Ensembl Gene ID  Symbol
7.26   1.393  0.001  0.000750  ENSG00000116991  SIPA1L2
8.23   0.568  0.001  0.000780  ENSG00000196367  TRRAP
6.49   1.202  0.001  0.000780  ENSG00000198729  PPP1R14C
4.34   0.843  0.001  0.000790  ENSG00000135127  CCDC64
6.54   1.345  0.001  0.000800  ENSG00000106689  LHX2
4.92   0.897  0.001  0.000800  ENSG00000221994  ZNF630
8.31   0.722  0.001  0.000810  ENSG00000101365  IDH3B
7.15   0.669  0.001  0.000810  ENSG00000114767  RRP9
6.95   0.503  0.001  0.000810  ENSG00000198258  UBL5
7.8    0.771  0.001  0.000820  ENSG00000078043  PIAS2
7.41   0.87   0.001  0.000820  ENSG00000111266  DUSP16
6.56   0.92   0.001  0.000820  ENSG00000112715  VEGFA
6.95   1.222  0.001  0.000830  ENSG00000123080  CDKN2C
5.81   0.756  0.001  0.000830  ENSG00000130158  DOCK6
5      0.858  0.001  0.000840  ENSG00000125352  RNF113A
4.69   2.173  0.001  0.000840  ENSG00000138642  HERC6
4.04   2.745  0.001  0.000860  ENSG00000138646  HERC5
6.46   0.684  0.001  0.000860  ENSG00000197037  ZNF498
8.15   0.797  0.001  0.000870  ENSG00000071462  WBSCR22
5.13   1.572  0.001  0.000870  ENSG00000133321  RARRES3
7.79   0.901  0.001  0.000870  ENSG00000160633  SAFB
7.56   0.741  0.001  0.000870  ENSG00000164933  SLC25A32
7.65   0.65   0.001  0.000880  ENSG00000125844  RRBP1
7.59   0.595  0.001  0.000880  ENSG00000164896  FASTK
4.47   1.076  0.001  0.000880  ENSG00000197857  ZNF44
7.31   1.007  0.001  0.000890  ENSG00000077348  EXOSC5
7.23   0.664  0.001  0.000900  ENSG00000101040  ZMYND8
3.75   1.724  0.001  0.000900  ENSG00000137752  CASP1
4.91   0.534  0.001  0.000900  ENSG00000143578  CREB3L4
6.86   1.846  0.001  0.000900  ENSG00000154856  APCDD1
4.11   0.919  0.001  0.000910  ENSG00000105708  ZNF14
4.74   1.522  0.001  0.000910  ENSG00000116761  CTH
7.33   0.778  0.001  0.000910  ENSG00000196365  LONP1
7.82   1.045  0.001  0.000920  ENSG00000013374  NUB1
6.08   0.799  0.001  0.000920  ENSG00000077713  SLC25A43
3.54   0.864  0.001  0.000930  ENSG00000157224  CLDN12
7.3    0.662  0.001  0.000930  ENSG00000160200  CBS
5.12   3.1    0.001  0.000930  ENSG00000227921  AL353791.1
4.02   1.202  0.001  0.000940  ENSG00000152433  ZNF547
6.71   0.572  0.001  0.000950  ENSG00000105559  PLEKHA4
6.45   1.93   0.001  0.000960  ENSG00000113205  PCDHB3
3.5    0.926  0.001  0.000970  ENSG00000136167  LCP1
2.97   0.557  0.001  0.000970  ENSG00000136839  OR13C9
3.02   0.567  0.001  0.000980  ENSG00000130783  CCDC62
4.96   1.806  0.001  0.000980  ENSG00000132498  ANKRD20A3
7.62   0.717  0.001  0.000980  ENSG00000146834  MEPCE
7.66   2.08   0.001  0.000980  ENSG00000196963  PCDHB16
4.6    1.779  0.001  0.000990  ENSG00000110436  SLC1A2
8.16   0.75   0.001  0.000990  ENSG00000137817  PARP6
7.33   1.172  0.001  0.000990  ENSG00000141576  RNF157
4.53   2.338  0.001  0.001000  ENSG00000018236  CNTN1
5.8    2.56   0.001  0.001000  ENSG00000100095  SEZ6L
7.36   0.859  0.001  0.001000  ENSG00000126947  ARMCX1
3.88   1.307  0.001  0.001000  ENSG00000138741  TRPC3
5.29   0.642  0.001  0.001000  ENSG00000163013  FBXO41

Appendix F

MicroRNA Array Data

The data generated with the Agilent microRNA array platform are reported below, with the name of each probed microRNA listed in the first column. MicroRNAs with an asterisk in their name represent sequences that originated from the opposite strand in the pri-microRNA secondary structure [71]. The measured expression value is shown for each GNS and NS cell line, together with a log fold change (LogFC), its p-value and the associated FDR, which represent the differential expression of each microRNA in GNS versus NS cell lines. Negative fold change values indicate greater average expression across the NS cell lines than across the GNS cell lines, whilst positive fold change values indicate greater average expression across the GNS cell lines. The FDR values are reported in scientific notation.
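The sign convention described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this appendix: that the per-sample expression values are on a log scale (so that the fold change is a difference of group means), and that the FDR is obtained with a Benjamini-Hochberg style adjustment. The function names and the example numbers are hypothetical and are not taken from the tables or from the software used for the analysis.

```python
from statistics import mean


def log_fold_change(gns_values, ns_values):
    """Difference of group means, assuming log-scale expression values.

    Positive: higher average expression in GNS lines.
    Negative: higher average expression in NS lines.
    """
    return mean(gns_values) - mean(ns_values)


def direction(log_fc):
    """Translate the sign of the log fold change into words."""
    if log_fc > 0:
        return "higher in GNS"
    if log_fc < 0:
        return "higher in NS"
    return "no difference"


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: used here only as a stand-in for whichever FDR
    procedure produced the values reported in the tables.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-scale values for one microRNA in two groups.
example_gns = [8.0, 9.0]
example_ns = [7.0, 7.0]
example_lfc = log_fold_change(example_gns, example_ns)
print(example_lfc, direction(example_lfc))
```

Under these assumptions, a microRNA is reported as up-regulated in GNS lines when the difference of means is positive, and the adjusted values never decrease as the raw p-values increase.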

300 F. MicroRNA Array Data Appendix 9.47E-03 8.70E-03 8.57E-03 8.44E-03 8.29E-03 8.04E-03 7.83E-03 7.64E-03 6.96E-03 6.23E-03 5.68E-03 5.21E-03 4.75E-03 4.70E-03 4.08E-03 3.67E-03 3.55E-03 3.49E-03 3.19E-03 3.05E-03 2.62E-03 2.43E-03 2.40E-03 2.15E-03 1.67E-03 1.60E-03 3.62E-03 3.26E-03 3.19E-03 3.12E-03 3.04E-03 2.91E-03 2.81E-03 2.72E-03 2.40E-03 2.10E-03 1.87E-03 1.69E-03 1.52E-03 1.49E-03 1.28E-03 1.13E-03 1.08E-03 1.06E-03 9.48E-04 8.95E-04 7.46E-04 6.89E-04 6.73E-04 5.89E-04 4.45E-04 4.21E-04 0.5 0.6 1.6 0.5 0.7 0.9 0.9 0.6 1 0.8 0.5 0.7 0.9 0.7 0.6 1 0.8 0.6 0.8 0.8 1.1 2.2 0.8 1.3 1.2 0.7 8.65 4.82 3.2 8.19 -2.01 2.85 -2.07 8.82 -0.98 0.84 6.17 1.99 0.53 0.87 2.03 -1.29 -2.2 10.32 3.02 -0.34 3.64 5.03 3.93 0.29 -0.59 -1.66 8.29 4.26 3.39 8.65 -1.47 4.08 -1.82 8.21 -1.25 0.46 6.39 1.55 0.32 1.35 2.19 -0.68 -2.53 10.47 2.72 -0.89 2.55 3.82 2.8 -0.43 1.23 -1.94 7.43 4.61 3.04 8.85 -2.17 4.74 -2.04 7.55 -1.24 -0.8 6.72 2.07 -0.85 2.09 2.79 -0.91 -2.03 10.44 1.7 -1.04 2.58 -1.42 3.14 -0.66 1.11 -1.3 7.98 5.11 2.71 8.99 -1.45 4.75 -1.33 7.96 -0.83 0.61 6.76 1.97 0.45 1.94 2.16 -0.73 -1.78 10.56 2.01 0 3.28 1.99 3.48 0.25 -0.3 -1.63 NS 8.25 4.6 -1.43 8.1 -1.2 2.08 -2.1 8.13 -1.92 0.78 5.88 2.34 0.34 0.61 2.27 -1.35 -1.74 9.87 2.88 -1.39 3.39 4.15 3.08 -1 0.24 -1.5 8.16 3.96 2.43 8.44 -2.06 3.74 -1.99 8.01 -0.95 0.69 6.06 1.67 0.3 1.44 2.39 -0.69 -2.77 10.19 2.14 -0.94 1.97 2.92 2.84 0.69 0.26 -2.25 6.46 4.47 5.09 8.39 -1.04 5.38 -1.86 6.45 -1.52 0.44 6.97 2.57 0.36 0.98 2.66 -0.66 -2.17 9.83 1.68 -0.94 1.33 -3.25 3.48 -0.64 0.64 -1.16 7.04 4.78 5.82 8.83 -1.43 5.63 -1.36 7.22 -0.67 1.33 7.2 2.33 0.01 0.22 2.56 -0.46 -1.95 10.24 2.54 -0.31 2.59 -0.68 3.39 0.02 0.87 -1.26 8.01 4.87 3.84 8.28 -0.93 5.11 -0.23 9.39 -0.56 0.51 8.16 2.84 0.77 1.36 3.28 0.01 -2.38 10.63 2.39 -0.71 3.69 -1 3.52 0.91 1.5 -1.55 7.54 4.61 3.83 7.88 -0.86 5.28 -1.01 8.82 0.28 0.7 7.76 3.18 0.88 1.66 3.25 -0.53 -1.98 10.1 1.62 -0.91 3.5 -1.96 3.63 -0.1 1.21 -1.28 7.85 4.39 4.79 8.89 -1.33 6.27 
-0.85 8.23 -0.72 1.03 5.06 2.52 1.81 1.96 2.15 -0.77 -1.64 10.2 3.49 1.3 3.01 4.76 3.74 1.09 1.25 -1.1 Differentially expressed microRNAs in GNS vs NS cell lines at FDR<1%. GNS 8.07 4.42 4.05 9 -1.64 6.64 -1.53 8.38 0.41 1.31 5.51 2.45 0.85 1.96 2.6 0.88 -1.55 10.41 3.52 0.9 3.02 5.25 3.5 0.36 1.65 -1.3 Table F.1: 8.32 5.02 6.34 9.6 -0.37 5.08 0.01 7.6 -0.78 1.91 8.04 2.92 1.37 2.26 3.52 0.81 -0.15 11.41 3.47 0.12 3.39 6.14 4.65 1.56 1.89 -0.56 8.36 5.25 4.63 9.55 -0.85 4.13 -1.78 7.57 0.17 1.72 7.99 2.15 1.32 1.89 2.9 0.5 -1.06 11.47 3.71 0.14 3.53 6.21 4.26 1.11 1.53 -0.96 9.3 6.23 4.82 9.65 -0.51 4.23 -0.62 8.61 0.23 1.62 6.89 2.92 1.38 1.97 3.43 0.06 -0.46 11.15 3.34 -0.13 5.08 5.33 4.59 1.89 2.03 0.08 9.14 6.33 4.93 9.65 -0.92 4.07 -1.06 8.65 -0.6 2.04 6.87 2.74 0.34 2.02 3.03 -0.09 -1.19 11.19 3.41 -0.17 5.06 5.05 4.38 1.71 2.38 -0.23 microRNAshsa-miR-17 hsa-miR-1271 G7A G7Bhsa-miR-513a-5p G26Ahsa-miR-148b G26B G144Ahsa-miR-3183 G144Bhsa-miR-134 G166A G166Bhsa-miR-3187 CB660Ahsa-miR-4284 CB660B CB130Ahsa-miR-493 CB130Bhsa-miR-138-2* CB152A CB152Bhsa-miR-221* CB171Ahsa-miR-498 CB171B LogFChsa-miR-4253 p-valuehsa-miR-3605-5p FDR hsa-miR-373* hsa-miR-1914 hsa-miR-3178 hsa-miR-107 hsa-miR-500a hsa-miR-188-3p hsa-miR-7-1* hsa-miR-18b hsa-miR-3911 hsa-miR-3177 hsa-miR-3154 hsa-miR-200b* hsa-miR-513bhsa-miR-192* 2.96hsa-miR-100* 3.19 2.39 2.65hsa-miR-200a* 2.36 3.12 4.18 1.62hsa-let-7d* 0.19 3.79 2.67 0.19 1.71 1.31hsa-miR-379* 2.75 -0.22 1.18 2.41 2.68hsa-miR-127-5p 0.27 2.83 0.64 1.61 -1.79 1.38hsa-miR-374b -0.31 3.09 2.78 -0.88 -1.06 -0.72 1.33hsa-miR-224* 0.9 8.9 3.12 -1.24 2.63 -0.91 -0.48hsa-miR-1224-5p -0.26 -0.41 4.82 9.08 1.31 3.04 3.62 0.55 0.37 3.94 -0.05hsa-miR-1236 4.48 2.25 8.44 1.62 4.18 3.15 1.01 -0.09 2.92 0.45 -0.58hsa-miR-204 8.47 0.83 -0.82 3.91 1.68 -0.9 2.02 -0.75hsa-miR-361-5p 0.99 8.24 -1.48 0.75 2.07 5.2 11.46 -2.36 -0.32 0.31 1.81 9.9 0.84hsa-miR-3121 -1.48 11.22 -1.06 8.21 0.49 4.09 0.83 1.99 10.99 -0.31 1.67 0.41 9.95 1.46hsa-miR-92a-1* 
-1.24 2.5 7.25 -1.25 11.69 4.34 -1.24 10.89 1.37 -0.6 2.17 2.4hsa-miR-3653 0.88 7.2 2.12 2.63 -1.32 1.29 7.66 10.96 -0.78 -0.69 2.23 4.09hsa-miR-765 0.57 1.6 1.02 5.53 9.64 2.28 6.27 8.09 1.54 -2.01 -0.47 0.9 0.23 0.96 -0.05 4.38hsa-miR-501-5p 5.48 0.57 1.65 9.5 0 3.11 1.42 7.75 -0.18 -2.27 3.41 4.5 2.37 7.74hsa-miR-422a 4.3 0.66 -1.62 -1.55 1.87 -0.07 3.32 0.76 3.31 9.29 8.36 -0.54hsa-miR-663b 0.43 4.8 1.42 -1.76 2.85 -0.73 7.7 2.19 1.46 -0.54 -0.14 0.45 -0.09 0.8 3.81 4.28 -0.76 9.81 -0.07hsa-miR-320a 5.58 3.19 -0.23 5.76 -1.14 0.81 1.93 1.01 3.26 -0.2 -1.49 -0.17 0.7 -0.74 0.35 -1.38 -0.65 7.27hsa-miR-138-1* 1.42 2.74 9.57 1.28 7.86 5.97 5.97 2.95 -1.25 -0.3 0.78 -0.32 0.6 3.50E-03 -0.21 -0.02 1.6 0.9 -0.32hsa-miR-29b-1* 0.18 7.65 2.73 -0.82 8.21 2.52 9.4 -0.57 1.09 3.87 9.23E-03 -0.64 2.92 -0.1 8.93 0.59 3.24E-03 5.17 -1.12 -1.36 7.09 3.12E-03hsa-miR-1246 3.04E-03 -0.67 1.36 -0.47 -0.1 0.14 -0.24 8.66E-03 -1.18 4.19 8.44E-03 8.29E-03 5.01 2.22 7.78 3.59 7.06 1.65hsa-miR-15a 9.41 -0.63 -1.3 0.7 7.18 0.52 4.08 1.48 1.1 5.05 -1.67 -1.53 -0.32 5.71 7.51 0.32 2.43 -1.02 4.72 7.73 3.59 2.81E-03 0.8 12 8.79 5.18 2.82E-03 11.61 -0.19 2.19 -0.7 -1.13 -0.56 7.83E-03 3.94 -0.96 7.27 2.81 -0.36 5.65 7.83E-03 0.29 2.95E-03 4.69 7.46 12.22 3.02 -1.61 -1.47 11.3 9.63 0.37 4.22 -0.94 1.91 8.10E-03 -0.46 7.14 11 -0.21 2.46 3.74 5.18 -1.93 0.7 4.43 -1.78 -0.71 0.5 3.94 8.36 9.7 1.86 7.45 -0.86 1.04 11.32 -0.41 4.69 2.35E-03 2.17 -2 -1.14 4.66 2.38 2.51E-03 -1.96 11.43 6.72 0.8 6.84E-03 7.09 -1.66 0.8 7.21E-03 0.33 0.7 9.56 5.46 3.21 11.49 -1.56 2.52 4.47 -1.7 2.08E-03 -2.41 6.37 -0.51 1.70E-03 1.1 3.98 10.99 0.91 6.20E-03 9.22 1.89 3.65 -1.69 5.25E-03 2.32 4.59 -1.06 -1.61 1.62E-03 11.26 0.42 6.78 4.44 1.21 0.71 0.6 0.51 -1.96 5.01E-03 10.89 2.28 -1.01 3.84 -1.67 0.9 6.41 1.51E-03 4.76 3.33 0.64 -1.41 0.9 9.83 4.74E-03 -0.82 2.17 -1.58 3.15 1.16E-03 6.99 1.32E-03 1.77 4.45 -1.46 3.74E-03 2.57 10.71 -0.35 -2.16 4.17E-03 2.65 0.8 6.98 -1.3 3.03 3.76 1.1 10.49 0.7 -0.68 1.10E-03 
0.6 3.60E-03 6.85 1.08E-03 2.14 -0.54 9.35E-04 11.08 2.89 1 9.72E-04 3.54E-03 3.17E-03 3.26E-03 7.16 10.95 0.7 2.84 4.23 7.67E-04 6.77E-04 2.67E-03 10.86 0.6 1.73 4.36 2.41E-03 10.71 7.11E-04 3.09 1.5 2.50E-03 0.8 6.51E-04 1.1 4.21E-04 2.34E-03 1.60E-03 4.78E-04 1.78E-03

301 F. MicroRNA Array Data Appendix 1.45E-03 1.15E-03 1.07E-03 9.47E-04 9.01E-04 8.87E-04 7.51E-04 6.56E-04 6.14E-04 6.14E-04 6.00E-04 5.42E-04 5.10E-04 4.90E-04 3.16E-04 2.77E-04 2.41E-04 2.34E-04 2.22E-04 2.06E-04 1.69E-04 1.49E-04 1.40E-04 1.26E-04 1.25E-04 1.22E-04 3.75E-04 2.85E-04 2.61E-04 2.24E-04 2.09E-04 2.04E-04 1.68E-04 1.43E-04 1.31E-04 1.32E-04 1.26E-04 1.11E-04 1.04E-04 9.87E-05 6.02E-05 5.21E-05 4.44E-05 4.24E-05 4.01E-05 3.67E-05 2.89E-05 2.48E-05 2.27E-05 2.00E-05 1.97E-05 1.89E-05 1.7 0.7 0.6 1.3 0.9 0.8 1.1 1.6 2.3 1 1.2 0.8 1 1.2 1 1 1 1 0.8 0.9 1.4 1.4 0.9 0.8 1.2 1.3 2.66 7.92 1.33 -2.02 -1.27 1.98 1.98 2.26 3.28 -0.6 8.13 -1.01 -1.47 1.29 1.27 4.76 8.42 6.59 2.32 0.81 -0.82 6.35 6.08 6.17 4 4.63 0.08 8 1.32 -0.94 -1.06 2.32 2.18 1.87 1.43 -1.93 7.45 -1.09 -1.47 0.2 1.99 4.08 8.77 6.78 2.37 0.17 0.06 6.31 5.76 5.96 4.8 4.92 0.86 7.17 1.38 -0.71 -1.25 2.23 1.32 2.97 -0.3 -0.83 8.28 -0.77 -0.5 3.43 0.7 4.18 8.86 7.27 2.54 0.2 3.78 6.51 5.78 6.15 4.83 5.29 2.14 7.8 0.98 -1.52 -0.9 2.43 2.24 3.6 1.96 -1.21 8.34 -0.74 -1.35 3.5 1.08 4.28 8.83 7.36 2.76 -0.02 4.17 6.75 5.89 6.22 4.74 4.65 NS 0.06 7.34 0.4 -0.91 -1.13 2.25 2.12 1.57 1.02 -1.54 7.26 -0.99 -1.1 0.49 1.27 3.89 7.89 6.17 2.36 -0.35 -1.64 5.9 5.42 5.77 3.9 3.64 -1.7 7.42 0.28 -1.96 -1.29 2.31 0.83 0.37 -1.6 -1.22 7.31 -0.74 -1.15 -0.76 1.93 3.78 8.44 6.7 2.66 0.08 -0.66 6.18 5.48 5.88 4.7 4.84 -2.93 7.45 0.91 -1.05 -0.72 3.13 2.41 -3.67 1.92 -1.1 8.32 -1.74 -0.82 2.11 0.2 5.87 8.81 5.9 1.65 1.37 4.86 4.83 6.45 6.11 4.57 5.34 -2.58 7.8 1.15 -1.62 -0.36 3.18 2.51 -1.76 2.61 -1.07 8.57 -1.57 -0.17 2.73 0.22 6.06 9.27 6.74 1.23 1.4 4.95 6.6 6.54 6.17 4.75 5.6 -2.71 7.27 0.18 -1.93 -1.83 3.01 2.62 1.48 3.3 -1.06 9.76 -0.71 -1.11 4.6 1.4 5.18 8.74 5.97 1.46 0.41 -2.21 7.29 6.66 6 5.9 5.95 -2.89 6.75 0.23 -2.15 -1.08 2.38 3.2 0.92 2.15 -1.02 9.9 -1.06 -0.96 4.32 1.66 5.01 8.08 5.66 1.47 0.32 -2.35 6.77 6.58 6.04 5.64 5.83 2.8 8.66 1.69 2.83 -0.73 3.81 3.24 4.37 3.63 -0.51 8.85 -0.25 0.31 -0.4 
[Table F.1, continued. Differentially expressed microRNAs in GNS vs NS cell lines at FDR<1%. Columns: microRNA; GNS samples G7A, G7B, G26A, G26B, G144A, G144B, G166A, G166B; NS samples CB660A, CB660B, CB130A, CB130B, CB152A, CB152B, CB171A, CB171B; LogFC; p-value; FDR. The numeric values could not be realigned with their rows in this text version and are omitted. The microRNAs listed on these pages, in extraction order, are given below.]

hsa-miR-514b-5p, hsa-miR-130b*, hsa-miR-365, hsa-miR-4323, hsa-miR-214, hsa-miR-3176, hsa-miR-1290, hsa-miR-671-5p, hsa-miR-545, hsa-miR-1973, hsa-miR-149*, hsa-miR-3162, hsa-miR-3200-5p, hsa-miR-3621, hsa-miR-34b, hsa-miR-760, hsa-miR-3679-5p, hsa-miR-26b, hsa-miR-340, hsa-miR-208b, hsa-miR-4304, hsa-miR-95, hsa-miR-101, hsa-miR-3196, hsa-miR-484, hsa-miR-371-5p, hsa-miR-4271, hsa-miR-187*, hsa-miR-340*, hsa-miR-374c, hsa-miR-4299, hsa-miR-129*, hsa-miR-185, hsa-miR-3607-5p, hsa-miR-590-5p, hsa-miR-135b, hsa-miR-20b*, hsa-miR-935, hsa-miR-3065-3p, hsa-miR-3937, hsa-miR-3188, hsa-miR-423-3p, hsa-miR-4322, hsa-miR-769-5p, hsa-miR-125a-3p, hsa-miR-128, hsa-miR-29b, hsa-miR-124*, hsa-miR-570, hsa-miR-4324, hsa-miR-629*, hsa-miR-137

hsa-miR-378*, hsa-miR-450b-5p, hsa-miR-3665, hsa-miR-23a, hsa-miR-4317, hsa-miR-148a*, hsa-miR-146b-5p, hsa-miR-660, hsa-miR-222, hsa-miR-181a, hsa-miR-449a, hsa-miR-1225-5p, hsa-miR-3663-3p, hsa-miR-15b*, hsa-miR-16, hsa-miR-29a*, hsa-miR-1226*, hsa-miR-193b*, hsa-miR-3174, hsa-miR-451, hsa-miR-146a, hsa-miR-3610, hsa-miR-3651, hsa-miR-29a, hsa-let-7g, hsa-miR-3934, hsa-miR-4270, hsa-miR-30e, hsa-miR-129-3p, hsa-miR-320b, hsa-miR-629, hsa-miR-140-5p, hsa-miR-711, hsa-miR-1207-5p, hsa-miR-598, hsa-miR-105, hsa-miR-139-3p, hsa-miR-490-5p, hsa-miR-200a, hsa-miR-195, hsa-miR-15b, hsa-miR-3652, hsa-miR-4306, hsa-let-7f, hsa-miR-642b, hsa-miR-7, hsa-let-7a, hsa-miR-4281, hsa-miR-3141, hsa-miR-193b, hsa-miR-425, hsa-miR-503

hsa-miR-762, hsa-miR-106b, hsa-miR-1285, hsa-miR-26a, hsa-miR-452, hsa-miR-483-5p, hsa-miR-589*, hsa-miR-224, hsa-miR-345, hsa-miR-584, hsa-miR-30d, hsa-miR-3615, hsa-miR-20b, hsa-miR-4298, hsa-miR-124, hsa-miR-30d*, hsa-miR-362-3p, hsa-miR-877, hsa-miR-320c, hsa-miR-423-5p, hsa-miR-488*, hsa-miR-32, hsa-miR-4327, hsa-miR-664*, hsa-miR-424, hsa-miR-1181, hsa-miR-542-3p, hsa-miR-93, hsa-miR-630, hsa-miR-29c, hsa-miR-188-5p, hsa-miR-1915*, hsa-miR-664, hsa-miR-550a*, hsa-miR-30b, hsa-miR-4318, hsa-miR-3607-3p, hsa-miR-30b*, hsa-miR-3194, hsa-miR-150*, hsa-miR-1915, hsa-miR-1273e, hsa-miR-454, hsa-miR-500a*, hsa-miR-199a-3p, hsa-let-7b, hsa-miR-3065-5p, hsa-miR-320d, hsa-miR-361-3p, hsa-miR-492, hsa-miR-200b, hsa-miR-320e

hsa-miR-34c-3p, hsa-miR-3609, hsa-miR-885-5p, hsa-miR-2276, hsa-miR-502-5p, hsa-miR-542-5p, hsa-miR-4257, hsa-miR-532-5p, hsa-miR-192, hsa-miR-29c*, hsa-miR-1208, hsa-miR-339-5p, hsa-miR-424*, hsa-miR-718, hsa-miR-10b*, hsa-miR-135a*, hsa-miR-339-3p, hsa-miR-10b, hsa-miR-96, hsa-miR-1469, hsa-miR-363, hsa-miR-183*, hsa-miR-194, hsa-miR-138, hsa-miR-502-3p, hsa-miR-148a, hsa-miR-25, hsa-miR-140-3p, hsa-miR-338-3p, hsa-miR-155, hsa-miR-362-5p, hsa-miR-186, hsa-miR-139-5p, hsa-miR-663, hsa-miR-1972, hsa-miR-191, hsa-miR-501-3p, hsa-miR-488, hsa-miR-199b-5p, hsa-miR-532-3p, hsa-miR-215, hsa-miR-874, hsa-miR-625, hsa-miR-450a, hsa-miR-129-5p, hsa-miR-378, hsa-miR-183, hsa-miR-196a, hsa-miR-3648, hsa-miR-182, hsa-miR-196b

List of Abbreviations

AGO Argonaute
AKT v-akt murine thymoma viral oncogene
APC Antigen Presenting Cell
ARF Alternate Reading Frame
ASR Age Standardized Rate
ATM Ataxia telangiectasia mutated
BLBP Brain lipid binding protein
BMP Bone Morphogenetic Protein
BrdU 5-bromo-2’-deoxyuridine
Ca2+ Calcium
CCDS Consensus Coding Sequence
CD133 Prominin
CD144 Vascular endothelial-cadherin
CDK Cyclin-Dependent Kinase
CDKN Cyclin-Dependent Kinase Inhibitor
CGH Comparative genomic hybridization
CHI3L1 Chitinase 3-like 1
CNA Copy Number Aberrations
CNS Central Nervous System
CO2 Carbon dioxide
CSF Cerebrospinal Fluid
Ct Cycle threshold
CTL Cytotoxic T Lymphocyte
CTMP C-terminal modulator protein
CpG Cytosine-phosphate-Guanine
DNA Deoxyribonucleic Acid
Dcx Doublecortin
ECM Extracellular matrix
EGF Epidermal Growth Factor
EGFR Epidermal Growth Factor Receptor
ES Embryonic Stem
EST Expressed Sequence Tags
FACS Fluorescence-Activated Cell Sorting
FC Fold change
FDR False Discovery Rate
FGF2 Fibroblast Growth Factor 2
FISH Fluorescence In Situ Hybridization
FOXO Forkhead box O
G-CIMP Glioma CpG Island Methylator Phenotype
GABA Gamma-aminobutyric acid
GBM Glioblastoma Multiforme
GEMM Genetically Engineered Mouse Model
GFAP Glial Fibrillary Acidic Protein
GFP Green Fluorescent Protein
GLAST Glutamate Aspartate Transporter
GO Gene Ontology
GSEA Gene Set Enrichment Analysis
GTP Guanosine-5’-triphosphate
GTPase Guanosine-5’-triphosphatase
HGNC HUGO Gene Nomenclature Committee
HIF1 Hypoxia Inducible Factor 1
HIF1A Hypoxia Inducible Factor 1, subunit α
ICM Inner Cell Mass
IDH1 Isocitrate Dehydrogenase 1
IQGAP1 IQ motif containing GTPase activating protein 1
IRES Internal Ribosome Entry Site
KEGG Kyoto Encyclopedia of Genes and Genomes
LIF Leukemia Inhibitory Factor
Ln Natural logarithm
LOH Loss of Heterozygosity
MAP2 Microtubule-associated protein 2
MAPK Mitogen Activated Protein Kinase
MDM2 Mdm2 p53 binding protein homolog
MDM4 Mdm4 p53 binding protein homolog
MGC Mammalian Gene Collection
MGI Mouse Genome Informatics
MGMT O-6-methylguanine-DNA methyltransferase
MIQE Minimum Information for publication of Quantitative real-time PCR Experiments
MMR Mismatch Repair
MSH6 MutS homolog 6
NADPH Nicotinamide Adenine Dinucleotide Phosphate
NCBI National Center for Biotechnology Information
NEP Neuroepithelial progenitor
NF1 Neurofibromin 1
NS Neural Stem
OCT4 Octamer-binding protein 4
OLIG2 Oligodendrocyte lineage transcription factor 2
ORF Open Reading Frame
PCA Principal Component Analysis
PDGF Platelet-Derived Growth Factor
PDGFR Platelet-Derived Growth Factor Receptor
PDPK1 3-phosphoinositide dependent protein kinase-1
PHLPP PH domain and leucine rich repeat protein phosphatase 1
PI3K Phosphoinositide-3-Kinase
PI3KR Phosphoinositide-3-Kinase Receptor
PIP3 phosphatidylinositol (3,4,5)-trisphosphate
PTEN Phosphatase and tensin homolog
Pax6 Paired box gene 6
RA Retinoic Acid
RAS Rat Sarcoma
RB1 Retinoblastoma 1
RISC RNA Induced Silencing Complex
RNA Ribonucleic Acid
RTK Receptor Tyrosine Kinase
SAGE Serial Analysis of Gene Expression
SCV Squared Coefficient of Variation
SGZ Subgranular Zone
SHH Sonic hedgehog
SILAC Stable Isotope Labeling by Amino acids in Cell culture
SKY Spectral Karyotyping
SNP Single Nucleotide Polymorphism
SOX1 Sex determining region Y-box 1
SOX2 Sex determining region Y-box 2
SSEA1 Stage-specific embryonic antigen 1
SVZ Subventricular Zone
TCGA The Cancer Genome Atlas
TGFβ Transforming Growth Factor β
TP53 Tumor Protein 53
TUBB type III β-tubulin
Tag-seq Tag sequencing
UCSC University of California Santa Cruz
UTR Untranslated Region
VZ Ventricular Zone
WHO World Health Organization
aCGH array comparative genomic hybridization
bp base pair
iHOP information Hyperlinked Over Proteins
mRNA messenger RNA
mm millimeter
ncRNAs non-coding RNAs
nt nucleotide
oligo-dT oligo deoxy-thymine
poly-A poly-adenine
poly-T poly-thymine
qRT-PCR quantitative Real-Time PCR
rRNA ribosomal RNA
tpm tags per million

List of Figures

1.1 Estimates of survival amongst GBM patients treated with radiotherapy alone or radiotherapy with the alkylating agent temozolomide. Taken from Stupp et al 2005 [472].
1.2 KEGG Glioma Pathway.
1.3 Visualisation generated from list of 345 interactors (orange) of TP53 (yellow) from the BioGRID 3.1 [62] repository for interaction datasets.
1.4 The Biocarta pathway for Rb signaling.

2.1 Cross-section through the neural tube.
2.2 Surface markers of radial glia are expressed by NS cell lines, indicating that these cells may provide the biological context to work with progenitors of the CNS.
2.3 Sources of NS cells: (a) ICM; (b) SVZ.
2.4 (a,b) Contrast microscopy images of early phase neurosphere formation. (c,d) Immunofluorescence microscopy images of EGFR and Nestin detected on an intact neurosphere.
2.5 Schematisation of the neurosphere assay used to study neural precursor cells in culture.
2.6 Diagram of the progressive lineage restriction of ES cells differentiating toward the neural phenotype.
2.7 Protocol describing conversion of ES cells into immortalised NS cell lines.
2.8 Representation of the Sox1-GFP reporter construct used in the niche-independent NS cell protocol.
2.9 Roles of EGF and FGF2 in the derivation and maintenance of NS cells.

3.1 Stem cell differentiation hierarchy.
3.2 Diagram of asymmetric and symmetric cell division.
3.3 Schematisation of cell cycle phases.
3.4 Glioblastoma treatment with ionizing radiation.
3.5 Glioblastoma treatment with BMPs.
3.6 PCA diagram of global mRNA expression in GNS cell lines.

4.1 Classes of non-coding RNAs discovered to date.

5.1 Schematisation of the longSAGE protocol.
5.2 Boxplot of normalised Ct values.
5.3 Correlation scatter plot of the raw Ct values vs endogenous controls.
5.4 Scatter plots of our A and B biological replicates.
5.5 Dot plot of the standard deviations of the differences between expression levels in two replicates.
5.6 Literature mining diagram of code functions.
5.7 Schematisation of a typical Cytoscape network.

6.1 Sequencing construct schematisation.
6.2 Diagram of the extraction, filtering and mapping phases for reads and tags.
6.3 Pie charts for the proportion of filtered tags.
6.4 Diagram of the tag mapping strategy.
6.5 Pie charts for the assignment of tags to genes.
6.6 Correlations for all combinations of cell lines.
6.7 CGH array analysis.
6.8 Curves show distributions of expression level differences between GNS and NS lines.
6.9 Plot of the estimates of the variance against the base levels for each gene.
6.10 Plot of the fold change versus the mean for normal vs tumour samples.
6.11 Tag mapping of NTRK2 on UCSC genome browser.
6.12 Expression estimates correlate well between Tag-seq and qRT-PCR.
6.13 Heatmap of 29 genes differentially expressed between 16 GNS and 6 NS cell lines.
6.14 Expression levels of the 29 genes distinguishing GNS from NS lines as percent of NS geometric mean.
6.15 Tag mapping of BMP7 on UCSC genome browser.
6.16 Tag mapping of TPM1 on UCSC genome browser.
6.17 Schematisation of the process of finding genes with differentially expressed isoforms.
6.18 Schematisation of the localisation of the microRNA seeds on the mapped isoforms.
6.19 Isoform detection by multi tag mapping of gene GAPVD1 on UCSC genome browser.
6.20 Isoform detection by multi tag mapping of gene SMAD1 on UCSC genome browser.
6.21 Correlated expression of CTSC and a nearby ncRNA.
6.22 Histogram of expression levels of HOTAIRM1 and surrounding HOX genes.

7.1 Enrichment plots for the top two pathways revealed through GSEA of the KEGG database of pathways [2].
7.2 GSEA plots of (a) nominal p-values vs normalised enrichment score and (b) line graph of the enrichment scores across pathways.
7.3 Correlation of Tag-seq interrogated cell lines with glioblastoma subtype expression signatures.
7.4 Core gene expression changes in GNS lines are mirrored in glioblastoma tumours.
7.5 Association between GNS signature and other survival predictors.
7.6 Association between GNS signature score and patient survival.
7.7 The integrated glioblastoma pathway subdivided into sections identifying the gene networks that participate in the pathway.
7.8 Integrated GBM pathway used to overlay the Tag-seq GNS dataset.
7.9 Affected p53, RB1 and PTEN/PI3K pathways.
7.10 Four integrated GBM pathways overlaid with Tag-seq expression level measures for each GNS cell line.
7.11 Integrated GBM pathway with the TCGA dataset overlaid.
7.12 Integrated GBM pathway with the HGG dataset overlaid.
7.13 Three integrated GBM pathways overlaid with the GNS, TCGA and HGG datasets.

8.1 Overview of GenemiR software.
8.2 Internal organisation of the target prediction database of GenemiR.
8.3 Workflows at the core of the primitive functions of the GenemiR software.
8.4 Step by step diagram of the ensemble method adopted to find the score E (=C2/C1) of prediction accuracy for prediction algorithms.

D.1 Integrated GBM pathway with G144 cell line Tag-seq expression data.
D.2 Integrated GBM pathway with G144ED cell line Tag-seq expression data.
D.3 Integrated GBM pathway with G166 cell line Tag-seq expression data.
D.4 Integrated GBM pathway with G179 cell line Tag-seq expression data.

List of Tables

1.1 Histological Types and Prognosis of Gliomas (y, years). Taken from Doyle et al 2005 [128]...... 6

2.1 Summary of commonalities and differences between ES cells and NS cells. Adapted from Pollard et al 2006 [399] ...... 53

3.1 Summary of characteristics of NBE and serum-cultured glioblas- toma cells. Adapted from Lee et al 2006...... 77

5.1 Summary of cell lines investigated with Tag-seq...... 95 5.2 Classification of sequenced tags in each cell line...... 99 5.3 Summary of statistics using the χ2 and logarithmic tests. . . . . 111 5.4 Public gene expression datasets used in thesis...... 113

6.1 Summary of the available clinical data for our GNS cell lines. . . 121 6.2 Summary of reads per cell line library...... 121 6.3 Significance of the correlation found between CNAs and expres- sion levels measured with Fisher’s exact test (p-value)...... 128 6.4 Table of genes with large expression changes common to the GNS cell lines...... 132 6.5 Differentially expressed genes assigned to a four-tier classifica- tion system...... 143 6.6 Genes with limited or no evidence of implication in glioblastoma that appear in our pathway...... 145 6.7 Summary of predicted microRNAs targeting differentially ex- pressed isoforms...... 151 6.8 MicroRNA array results for GNS cell lines with respect to NS cell lines...... 152 6.9 Differentially expressed ncRNAs...... 157

312 7.1 Selected Gene Ontology terms and InterPro domains enriched among differentially expressed genes...... 163 7.2 Representative KEGG pathways from signaling pathway impact analysis of gene expression differences between GNS and NS lines.164 7.3 Summary of all MHC class I and II genes...... 167 7.4 Literature survey for the 29 genes found to distinguish GNS from NS lines across a panel of 21 cell lines ...... 175 7.5 Survival tests for the 29 genes found via qRT-PCR to distinguish GNS cell lines from NS cell lines...... 183 7.6 Significance of survival association for GNS signature and IDH1 status...... 186 7.7 Node assignment in the glioblastoma pathway...... 190

8.1 microRNA target prediction algorithms used by GenemiR with number of microRNA:3'UTR interactions predicted. The origi- nal target identifiers refer to the identifiers used by a prediction algorithm to identify the targeted genes. The final target iden- tifiers refer to the identifiers that are returned by any query of any prediction algorithm database...... 209 8.2 Single prediction algorithm ensemble analysis results. Displayed in descending order of E-score...... 215 8.3 All combinations of prediction algorithms in descending order of E-score...... 216

A.1 Classification of differentially expressed genes at 10% FDR. . . . 235 A.2 Classification of differentially expressed genes based on litera- ture mining analysis...... 251 A.3 Raw Ct values. Abbreviations: "down" for down-regulated, "up" for up-regulated, and "Norm" for Normalisation control. . 257 A.4 Normalised Ct values. Abbreviations: "down" for down-regulated, "up" for up-regulated, and "Norm" for Normalisation control. . 261 A.5 Pearson correlation values between the normalised Ct values measured through qRT-PCR and the tag counts measured across the five GNS and NS cell lines assayed via Tag-seq...... 265

C.1 Differentially expressed non-coding RNAs...... 274

D.1 GBM pathway interaction data...... 276 E.1 Fold-changes measured by exon array for GNS cell lines. . . . . 286

F.1 Differentially expressed microRNAs in GNS vs NS cell lines at FDR<1%.

Bibliography

[1] Cell signaling technology. MAPK-ERK signaling cascade. http://www.cellsignal.com/reference/pathway/mapk_erk_growth.html. [2] Kegg pathways. http://www.genome.jp/kegg/pathway.html. [3] Kyoto encyclopedia of genes and genomes. Glioma pathway. http://www.genome.jp/kegg-bin/_pathway?hsa05214. [4] Kyoto encyclopedia of genes and genomes. P53 pathway. http://www.genome.jp/kegg/pathway/hsa/hsa04115.html. [5] Panter pathway. P53 pathway. http://www.pantherdb.org/pathway/pathwaydiagram. [6] S. Acharya, T. Wilson, S. Gradia, M. F. Kane, S. Guerrette, G. T. Marsischky, R. Kolodner, and R. Fishel. hMSH2 forms specific mispair-binding complexes with hMSH3 and hMSH6. Proceedings of the National Academy of Sciences of the United States of America, 93(24):13629–13634, Nov. 1996. [7] M. Adamowicz, B. Radlwimmer, R. Rieker, D. Mertens, M. Schwarzbach, P. Schraml, A. Benner, P. Lichter, G. Mechtersheimer, and S. Joos. Frequent amplifications and abundant expression of trio, nkd2, and irx2 in soft tissue sarcomas. Genes, Chromo- somes and Cancer, 45(9):829–838, 2006. [8] D. Adams, B. Hasson, A. Boyer-Boiteau, A. El-Khishin, and V. Shashoua. A peptide fragment of ependymin neurotrophic factor uses protein kinase c and the mitogen- activated protein kinase pathway to activate c-jun n-terminal kinase and a functional ap-1 containing c-jun and c-fos proteins in mouse nb2a cells. Journal of neuroscience research, 72(3):405–416, 2003. [9] A. Adesina, Y. Nguyen, V. Mehta, H. Takei, P. Stangeby, S. Crabtree, M. Chin- tagumpala, and M. Gumerlock. Foxg1 dysregulation is a frequent event in medul- loblastoma. Journal of neuro-oncology, 85(2):111–122, 2007. [10] C. Agulhon, J. Petravicz, A. McMullen, E. Sweger, S. Minton, S. Taves, K. Casper, T. Fiacco, and K. McCarthy. What is the role of astrocyte calcium in neurophysiology? Neuron, 59(6):932–946, 2008. [11] E. Ah Cho and G. Dressler. Tcf-4 binds [beta]-catenin and is expressed in distinct regions of the embryonic brain and limbs. 
Mechanisms of development, 77(1):9–18, 1998. [12] N. Ahuja, Q. Li, A. Mohan, S. Baylin, and J. Issa. Aging and dna methylation in colorectal mucosa and cancer. Cancer research, 58(23):5489–5494, 1998. [13] N. Ahuja, Q. Li, A. Mohan, S. Baylin, and J. Issa. Aging and dna methylation in colorectal mucosa and cancer. Cancer research, 58(23):5489, 1998. [14] T. Akai, Y. Ueda, Y. Sasagawa, T. Hamada, T. Date, S. Katsuda, H. Iizuka, Y. Okada, and K. Chada. High mobility group ic protein in astrocytoma and glioblastoma. Pathology-Research and Practice, 200(9):619–624, 2004. [15] M. Al-Hajj, M. Wicha, A. Benito-Hernandez, S. Morrison, and M. Clarke. Prospective identification of tumorigenic breast cancer cells. Proceedings of the National Academy of Sciences, 100(7):3983, 2003. [16] P. Alexiou, M. Maragkakis, G. Papadopoulos, M. Reczko, and A. Hatzigeorgiou. Lost in translation: an assessment and perspective for computational target iden- tification. Bioinformatics, 25(23):3049, 2009.

315 [17] N. Allen and B. Barres. Signaling between glia and neurons: focus on synaptic plas- ticity. Current opinion in neurobiology, 15(5):542–548, 2005. [18] A. Alvarez-Buylla and D. Lim. For the Long Run: Maintaining Germinal Niches in the Adult Brain. Neuron, 41(5):683–686, 2004. [19] V. Ambros and X. Chen. The regulation of genes and genomes by small RNAs. Development, 134(9):1635–41, 2007. [20] M. Amiry-Moghaddam and O. Ottersen. The molecular basis of water transport in the brain. Nature Reviews Neuroscience, 4(12):991–1001, 2003. [21] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome Biol, 11(10):R106, 2010. [22] P. Andrews. Retinoic acid induces neuronal differentiation of a cloned human embry- onal carcinoma cell line in vitro* 1. Developmental biology, 103(2):285–293, 1984. [23] A. Aravin, D. Gaidatzis, S. Pfeffer’t, M. Lagos-Quintana, P. Landgraf, and T. Tuschl. A novel class of small rnas bind to mili protein in mouse testes. Nature, 442:203–207, 2006. [24] A. Aravin, N. Naumova, A. Tulin, V. Vagin, Y. Rozovsky, and V. Gvozdev. Double- stranded rna-mediated silencing of genomic tandem repeats and transposable elements in the d. melanogaster germline. Current Biology, 11(13):1017–1027, 2001. [25] K. Archer, V. Mas, K. David, D. Maluf, K. Bornstein, and R. Fisher. Identifying genes for establishing a multigenic test for hepatocellular carcinoma surveillance in hepatitis c virus-positive cirrhotic patients. Cancer Epidemiology Biomarkers & Prevention, 18(11):2929–2932, 2009. [26] M. Arpin, E. Friederich, M. Algrain, F. Vernel, and D. Louvard. Functional differences between l-and t-plastin isoforms. The Journal of cell biology, 127(6):1995, 1994. [27] M. Arpin, E. Friederich, M. Algrain, F. Vernel, and D. Louvard. Functional differences between l-and t-plastin isoforms. The Journal of cell biology, 127(6):1995–2008, 1994. [28] ArrayExpress. www.ebi.ac.uk/arrayexpress. [29] M. Assimakopoulou, M. Kondyli, G. 
Gatzounis, T. Maraziotis, and J. Varakis. Neu- rotrophin receptors expression and jnk pathway activation in human astrocytomas. BMC cancer, 7(1):202, 2007. [30] S. Assinder, J. Stanton, and P. Prasad. Transgelin: an actin-binding protein and tu- mour suppressor. The international journal of biochemistry & cell biology, 41(3):482– 486, 2009. [31] P. Au, Q. Zhu, E. Dennis, and M. Wang. Long non-coding rna-mediated mechanisms independent of the rnai pathway in animals and plants. RNA biology, 8(3), 2011. [32] J. Aubert, M. Stavridis, S. Tweedie, M. O’Reilly, K. Vierlinger, M. Li, P. Ghazal, T. Pratt, J. Mason, D. Roy, et al. Screening for mammalian neural genes via fluorescence-activated cell sorter purification of neural precursors from sox1-gfp knock- in mice. Proceedings of the National Academy of Sciences of the United States of America, 100(Suppl 1):11836, 2003. [33] F. Azevedo, L. Carvalho, L. Grinberg, J. Farfel, R. El Ferreti, R. Leite, W. Jacob filho, R. lent, and S. herculano houzel. Equal numbers of neuronal and non-neuronal cells make the human brain an isometrically scaled-up primate brain. brain, 513:532–541, 2009. [34] H. Babu, G. Cheung, H. Kettenmann, T. Palmer, and G. Kempermann. Enriched monolayer precursor cell cultures from micro-dissected adult mouse dentate gyrus yield functional granule cell-like neurons. PLoS One, 2(4):e388, 2007. [35] A. Bader, S. Kang, and P. Vogt. Cancer-specific mutations in PIK3CA are oncogenic in vivo. Proceedings of the National Academy of Sciences of the United States of America, 103(5):1475, 2006. [36] A. Bader, S. Kang, L. Zhao, and P. Vogt. Oncogenic PI3K deregulates transcription and translation. Nature reviews cancer, 5(12):921–929, 2005. [37] D. Baek, J. Villen, C. Shin, F. D. Camargo, S. P. Gygi, and D. P. Bartel. The impact of microRNAs on protein output. Nature, 455(7209):64–71, 2008. [38] G. Bain, D. Kitchens, M. Yao, J. Huettner, and D. Gottlieb. Embryonic stem cells express neuronal properties in vitro. 
Developmental biology, 168(2):342–357, 1995. [39] L. Balenci. IQGAP1 Protein Specifies Amplifying Cancer Cells in Glioblastoma Mul- tiforme. Cancer Research, 66(18):9074–9082, Sept. 2006. [40] S. Bao, Q. Wu, R. McLendon, Y. Hao, Q. Shi, A. Hjelmeland, M. Dewhirst, D. Bigner, and J. Rich. Glioma stem cells promote radioresistance by preferential activation of the DNA damage response. Nature, 444(7120):756–760, 2006. [41] I. Barani, S. Benedict, and P. Lin. Neural stem cells: implications for the conven- tional radiotherapy of central nervous system malignancies. International Journal of Radiation Oncology* Biology* Physics, 68(2):324–333, 2007. [42] J. Barnes and P. Hut. A hierarchical 0 (n log iv) force-calculation algorithm. nature, 324:4, 1986. [43] D. P. Bartel. MicroRNAs: target recognition and regulatory functions. Cell, 136(2):215–33, 2009. [44] N. Baumann. Biology of oligodendrocyte and myelin in the mammalian central nervous system. Physiological Reviews, 2001. [45] A. Bellacosa, C. C. Kumar, A. Di Cristofano, and J. R. Testa. Activation of AKT kinases in cancer: implications for therapeutic targeting. Advances in cancer research, 94:29–86, 2005. [46] A. Bellacosa, J. R. Testa, R. Moore, and L. Larue. A portrait of AKT kinases: human cancer and animal models depict a family with strong individualities. Cancer biology & therapy, 3(3):268–275, Mar. 2004. [47] D. R. Bentley. Whole-genome re-sequencing. Current opinion in genetics and devel- opment, 16(6):545–52, 2006. [48] I. Bentwich. Prediction and validation of micrornas and their targets. FEBS letters, 579(26):5904–5910, 2005. [49] R. Berg, E. Leung, S. Gough, C. Morris, W. Yao, S. Wang, J. Ni, and G. Krissansen. Cloning and characterization of a novel-beta integrin-related cdna coding for the pro- tein tied. Genomics, 56(2):169–178, 1999. [50] R. Berg, E. Leung, S. Gough, C. Morris, W. Yao, S. Wang, J. Ni, and G. Krissansen. 
Cloning and characterization of a novel β integrin-related cdna coding for the protein tied (ÂŞten β integrin egf-like repeat domainsÂŤ) that maps to chromosome band 13q33: a divergent stand-alone integrin stalk structure. Genomics, 56(2):169–178, 1999. [51] A. Bergamaschi, Y. Kim, K. Kwei, Y. La Choi, M. Bocanegra, A. Langerød, W. Han, D. Noh, D. Huntsman, S. Jeffrey, et al. Camk1d amplification implicated in epithelial- mesenchymal transition in basal-like breast cancer. Molecular oncology, 2(4):327–339, 2008. [52] R. Beroukhim, G. Getz, L. Nghiemphu, J. Barretina, T. Hsueh, D. Linhart, I. Vivanco, J. Lee, J. Huang, S. Alexander, et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proceedings of the National Academy of Sciences, 104(50):20007–20012, 2007. [53] M. Berry, Z. Ahmed, B. Lorber, M. Douglas, and A. Logan. Regeneration of axons in the visual system. Restorative neurology and neuroscience, 26(2):147–174, 2008. [54] F. Bertucci, P. Finetti, N. Cervera, E. Charafe-Jauffret, E. Mamessier, J. Adélaïde, S. Debono, G. Houvenaeghel, D. Maraninchi, P. Viens, et al. Gene expression profiling shows medullary breast cancer is a subgroup of basal breast cancers. Cancer Research, 66(9):4636–4644, 2006. [55] M. Bibel, J. Richter, K. Schrenk, K. Tucker, V. Staiger, M. Korte, M. Goetz, and Y. Barde. Differentiation of mouse embryonic stem cells into a defined neuronal lin- eage. Nature neuroscience, 7(9):1003–1009, 2004. [56] L. Biesecker. Exome sequencing makes medical genomics a reality. Nature genetics, 42(1):13, 2010. [57] E. Bindewald and B. Shapiro. Rna secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers. Rna, 12(3):342–352, 2006. [58] Biocarta. P53 signaling pathway. http://www.biocarta.com/pathfiles/h_p53pathway.asp. [59] Biocarta. Pten dependent cell cycle arrest and apoptosis. http://www.biocarta.com/pathfiles/h_ptenpathway.asp. [60] Biocarta. 
Rb tumor suppressor/checkpoint signaling in response to dna damage. http://www.biocarta.com/pathfiles/h_rbpathway.asp. [61] Biocarta. www.biocarta.com. [62] BioGRID. Database of protein and genetic interactions. www.thebiogrid.org. [63] B. Boëda, D. Briggs, T. Higgins, B. Garvalov, A. Fadden, N. McDonald, and M. Way. Tes, a specific mena interacting partner, breaks the rules for evh1 binding. Molecular cell, 28(6):1071–1082, 2007. [64] S. Bonnet, S. Archer, J. Allalunis-Turner, A. Haromy, C. Beaulieu, R. Thompson, C. Lee, G. Lopaschuk, L. Puttagunta, S. Bonnet, et al. A mitochondria-k+ channel axis is suppressed in cancer and its normalization promotes apoptosis and inhibits cancer growth. Cancer cell, 11(1):37–51, 2007. [65] K. Boon, E. Osório, S. Greenhut, C. Schaefer, J. Shoemaker, K. Polyak, P. Morin, K. Buetow, R. Strausberg, S. De Souza, et al. An anatomy of normal and malignant gene expression. Proceedings of the National Academy of Sciences, 99(17):11287, 2002. [66] B. Borrell. How accurate are cancer cell lines? Nature, 463(7283):858, Feb. 2010. [67] P. Bos, X. Zhang, C. Nadal, W. Shu, R. Gomis, D. Nguyen, A. Minn, M. van de Vijver, W. Gerald, J. Foekens, et al. Genes that mediate breast cancer metastasis to the brain. Nature, 459(7249):1005–1009, 2009. [68] R. Bourgo, U. Ehmer, J. Sage, and E. Knudsen. RB deletion disrupts coordination between dna replication licensing and mitotic entry in vivo. Molecular biology of the cell, 22(7):931, 2011. [69] C. Brennan, H. Momota, D. Hambardzumyan, T. Ozawa, A. Tandon, A. Pedraza, and E. Holland. Glioblastoma subclasses can be defined by activity among signal transduction pathways and associated genomic alterations. PLoS ONE, 4(11):e7752, 2009. [70] J. Brennecke, D. R. Hipfner, A. Stark, R. B. Russell, and S. M. Cohen. bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in Drosophila. Cell, 113(1):25–36, 2003. [71] J. Brennecke, A. Stark, R. B. 
Russell, and S. M. Cohen. Principles of microRNA-target recognition. PLoS Biol, 3(3):e85, 2005. [72] J. Briscoe and J. Ericson. Specification of neuronal fates in the ventral neural tube. Current opinion in neurobiology, 11(1):43–49, 2001. [73] P. Brodal. The Central Nervous System: Structure and Function. Oxford Univ Pr, 2010. [74] J. Brognard, E. Sierecki, T. Gao, and A. C. Newton. PHLPP and a second isoform, PHLPP2, differentially attenuate the amplitude of Akt signaling by regulating distinct Akt isoforms. Molecular cell, 25(6):917–931, Mar. 2007. [75] K. Brown, D. Strathdee, S. Bryson, W. Lambie, and A. Balmain. The malignant capacity of skin tumours induced by expression of a mutant h-ras transgene depends on the cell type targeted. Current biology, 8(9):516–524, 1998. [76] O. Brustle, K. Jones, R. Learish, K. Karram, K. Choudhary, O. Wiestler, I. Duncan, and R. McKay. Embryonic stem cell-derived glial precursors: A source of myelinating transplants. Science, 285(5428):754–756, 1999. [77] J. Buckner, P. Brown, and B. O’Neill. Central nervous system tumors. Mayo Clinic Proceedings, 2007. [78] S. Bustin, V. Benes, J. Garson, J. Hellemans, J. Huggett, M. Kubista, R. Mueller, T. Nolan, M. Pfaffl, G. Shipley, et al. The miqe guidelines: minimum information for publication of quantitative real-time pcr experiments. Clinical chemistry, 55(4):611– 622, 2009. [79] D. Cahill, K. Levine, R. Betensky, P. Codd, C. Romany, L. Reavie, T. Batchelor, P. Futreal, M. Stratton, W. Curry, et al. Loss of the mismatch repair protein MSH6 in human glioblastomas is associated with tumor progression during temozolomide treatment. Clinical cancer research, 13(7):2038, 2007. [80] I. Camby, N. Nagy, M. Lopes, B. Schäfer, C. Maurage, M. Ruchoux, P. Murmann, R. Pochet, C. Heizmann, J. Brotchi, et al. Supratentorial pilocytic astrocytomas, astrocytomas, anaplastic astrocytomas and glioblastomas are characterized by a dif- ferential expression of s100 proteins. 
Brain pathology, 9(1):1–19, 1999. [81] M. Carlén, K. Meletis, C. Göritz, V. Darsalia, E. Evergren, K. Tanigaki, M. Amendola, F. Barnabé-Heider, M. S. Y. Yeung, L. Naldini, T. Honjo, Z. Kokaia, O. Shupliakov, R. M. Cassidy, O. Lindvall, and J. Frisén. Forebrain ependymal cells are Notch- dependent and generate neuroblasts and astrocytes after stroke. Nature neuroscience, 12(3):259–267, Mar. 2009. [82] E. Carpenter, J. Goddard, A. Davis, T. Nguyen, and M. Capecchi. Targeted disruption of hoxd-10 affects mouse hindlimb development. Development, 124(22):4505–4514, 1997. [83] M. Carpenter, X. Cui, Z. Hu, J. Jackson, S. Sherman, Å. Seiger, and L. Wahlberg. In vitro expansion of a multipotent population of human neural progenitor cells. Exper- imental neurology, 158(2):265–278, 1999. [84] A. Carracedo, A. Alimonti, and P. P. Pandolfi. PTEN Level in Tumor Suppression: How Much Is Too Little? Cancer Research, 71(3):629–633, Feb. 2011. [85] M. Carro, W. Lim, M. Alvarez, R. Bollo, X. Zhao, E. Snyder, E. Sulman, S. Anne, F. Doetsch, H. Colman, et al. The transcriptional network for mesenchymal transfor- mation of brain tumours. Nature, 463(7279):318–325, 2009. [86] E. Cerami, E. Demir, N. Schultz, and B. Taylor. Automated network analysis identifies core pathways in glioblastoma. PLoS ONE, 2010. [87] S. Certain, F. Barrat, E. Pastural, F. Le Deist, J. Goyo-Rivas, N. Jabado, M. Benker- rou, R. Seger, E. Vilmer, G. Beullier, et al. Protein truncation test of lyst reveals heterogenous mutations in patients with chediak-higashi syndrome. Blood, 95(3):979– 983, 2000. [88] V. Chabottaux, S. Ricaud, L. Host, S. Blacher, A. Paye, M. Thiry, A. Garofalakis, C. Pestourie, K. Gombert, F. Bruyere, et al. Membrane-type 4 matrix metallopro- teinase (mt4-mmp) induces lung metastasis by alteration of primary breast tumour vascular architecture. Journal of cellular and molecular medicine, 13(9b):4002–4013, 2009. [89] K. Chan, I. Espinosa, M. Chao, D. Wong, L. Ailles, M. Diehn, H. Gill, J. 
Presti, H. Chang, M. Van De Rijn, et al. Identification, molecular characterization, clinical prognosis, and therapeutic targeting of human bladder tumor-initiating cells. Proceed- ings of the National Academy of Sciences, 106(33):14016–14021, 2009. [90] C. Cheadle, M. Nesterova, T. Watkins, K. Barnes, J. Hall, A. Rosen, K. Becker, and Y. Cho-Chung. Regulatory subunits of pka define an axis of cellular prolifera- tion/differentiation in ovarian cancer cells. BMC Medical Genomics, 1(1):43, 2008. [91] K. Chen and N. Rajewsky. Deep conservation of microRNA-target relationships and 3’UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harb Symp Quant Biol, 71:149–56, 2006. [92] L. Z. L. Y. Z. Q. Chen K, Luo Z. Perp gene therapy attenuates lung cancer xenograft via inducing apoptosis and suppressing vegf. Cancer biology and therapy, 12, 2011. [93] A. J. S. T. S. C. B.-N. S. S. J. M. R. M. R. S. S. Q. H. P. J. T. A. R. T. L. W. S. K. C. J. B. C. M. M. G. R. H. D. Cheung KâĂŘJJ, Johnson NA. Acquired tnfrsf14 mutations in follicular lymphoma are associated with worse prognosis. Cancer Res, 70:9166–9174, 2010. [94] F. Chibon, O. Mariani, J. Derré, A. Mairal, J. Coindre, L. Guillou, X. Sastre, F. Pédeu- tour, and A. Aurias. Ask1 (map3k5) as a potential therapeutic target in malignant fibrous histiocytomas with 12q14–q15 and 6q23 amplifications. Genes, Chromosomes and Cancer, 40(1):32–37, 2004. [95] S. Chigurupati, R. Venkataraman, D. Barrera, A. Naganathan, M. Madan, L. Paul, J. Pattisapu, G. Kyriazis, K. Sugaya, S. Bushnev, et al. Receptor channel trpc6 is a key mediator of notch-driven glioblastoma growth and invasiveness. Cancer research, 70(1):418–427, 2010. [96] E. Chiocca. The many functions of microRNAs in glioblastoma. World neurosurgery, 2010. [97] S. Chirasani, A. Sternjak, P. Wend, S. Momma, B. Campos, I. Herrmann, D. Graf, T. Mitsiadis, C. Herold-Mende, D. Besser, et al. 
Bone morphogenetic protein-7 release from endogenous neural precursor cells suppresses the tumourigenicity of stem-like glioblastoma cells. Brain, 133(7):1961–1972, 2010. [98] M. Choi, U. Scholl, W. Ji, T. Liu, I. Tikhonova, P. Zumbo, A. Nayir, A. Bakkaloğlu, S. Özen, S. Sanjad, et al. Genetic diagnosis by whole exome capture and massively par- allel dna sequencing. Proceedings of the National Academy of Sciences, 106(45):19096– 19101, 2009. [99] L. M. Chow and S. J. Baker. PTEN function in normal and neoplastic growth. Cancer letters, 241(2):184–196, Sept. 2006. [100] L. M. Chow, R. Endersby, X. Zhu, S. Rankin, C. Qu, J. Zhang, A. Broniscer, D. W. Ellison, and S. J. Baker. Cooperativity within and among Pten, p53, and Rb pathways induces high-grade astrocytoma in adult brain. Cancer Cell, 19(3):305–16, 2011. [101] M. J. Clark, N. Homer, B. D. O’Connor, Z. Chen, A. Eskin, H. Lee, B. Merriman, and S. F. Nelson. U87MG decoded: the genomic sequence of a cytogenetically aberrant human cancer cell line. PLoS genetics, 6(1):e1000832, 2010. [102] N. M. Cohen, E. Kenigsberg, and A. Tanay. Primate CpG islands are maintained by heterogeneous evolutionary regimes involving minimal selection. Cell, 145(5):773–786, May 2011. [103] A. Collins, P. Berry, C. Hyde, M. Stower, and N. Maitland. Prospective identification of tumorigenic prostate cancer stem cells. Cancer research, 65(23):10946, 2005. [104] V. Collins. Amplified genes in human gliomas. In Seminars in cancer biology, volume 4, page 27, 1993. [105] H. Colman, L. Zhang, E. Sulman, J. McDonald, N. Shooshtari, A. Rivera, S. Popoff, C. Nutt, D. Louis, J. Cairncross, et al. A multigene predictor of outcome in glioblas- toma. Neuro-oncology, 12(1):49–57, 2010. [106] G. O. Consortium et al. Gene ontology: tool for the unification of biology. Nature genetics, 25(1):25–29, 2000. [107] L. Conti, S. M. Pollard, T. Gorba, E. Reitano, M. Toselli, G. Biella, Y. Sun, S. Sanzone, Q.-L. Ying, E. Cattaneo, and A. Smith. 
Niche-Independent Symmetrical Self-Renewal of a Mammalian Tissue Stem Cell. PLoS Biology, 3(9):e283, 2005. [108] H. Contreras, R. Ledezma, J. Vergara, F. Cifuentes, C. Barra, P. Cabello, I. Gallegos, B. Morales, C. Huidobro, and E. Castellón. The expression of syndecan-1 and-2 is associated with gleason score and epithelial-mesenchymal transition markers, e- cadherin and β-catenin, in prostate cancer. In Urologic Oncology: Seminars and Original Investigations, volume 28, pages 534–540. Elsevier, 2010. [109] A. Coutts, E. MacKenzie, E. Griffith, and D. Black. Tes is a novel focal adhesion protein with a role in cell spreading. Journal of cell science, 116(5):897–906, 2003. [110] Y. Cui, J. Wang, X. Zhang, R. Lang, M. Bi, L. Guo, and S. Lu. Ecrg2, a novel candidate of in the esophageal carcinoma, interacts directly with metallothionein 2a and links to apoptosis* 1,* 2,* 3. Biochemical and biophysical research communications, 302(4):904–915, 2003. [111] M. Cully, H. You, and A. Levine. Beyond PTEN mutations: the PI3K pathway as an integrator of multiple inputs during tumorigenesis. Nature Reviews Cancer, 2006. [112] M. Czystowska, J. Han, M. Szczepanski, M. Szajnik, K. Quadrini, H. Brandwein, J. Hadden, K. Signorelli, and T. Whiteside. Irx-2, a novel immunotherapeutic, protects human t cells from tumor-induced cell death. Cell Death & Differentiation, 16(5):708– 718, 2009. [113] C. Dang, M. Gottschling, K. Manning, E. O’Currain, S. Schneider, W. Sterry, E. Stockfleth, and I. Nindl. Identification of dysregulated genes in cutaneous squamous cell carcinoma. Oncology reports, 16(3):513–519, 2006. [114] L. De Filippis, G. Lamorte, E. Snyder, A. Malgaroli, and A. Vescovi. A novel, immor- tal, and multipotent human neural stem cell line generating functional neurons and oligodendrocytes. Stem cells, 25(9):2312–2321, 2007. [115] C. Dehay and H. Kennedy. Cell-cycle control and cortical development. Nature Re- views Neuroscience, 8(6):438–450, 2007. [116] L. Deleyrolle and B. 
Reynolds. Isolation, expansion, and differentiation of adult mam- malian neural stem and progenitor cells using the neurosphere assay. Methods Mol Biol, 549:91–101, 2009. [117] G. Denning, B. Jean-Joseph, C. Prince, D. Durden, and P. Vogt. A short n-terminal sequence of pten controls cytoplasmic localization and is required for suppression of cell growth. Oncogene, 26(27):3930–3940, 2007. [118] C. Desmet and D. Peeper. The neurotrophic receptor trkb: a drug target in anti-cancer therapy? Cellular and molecular life sciences, 63(7):755–759, 2006. [119] L. Desnoyers, R. Pai, R. Ferrando, K. Hötzel, T. Le, J. Ross, R. Carano, A. D’Souza, J. Qing, I. Mohtashemi, et al. Targeting fgf19 inhibits tumor growth in colon cancer xenograft and fgf19 transgenic hepatocellular carcinoma models. Oncogene, 27(1):85– 97, 2007. [120] T. Di Tomaso, S. Mazzoleni, E. Wang, G. Sovena, D. Clavenna, A. Franzin, P. Mor- tini, S. Ferrone, C. Doglioni, F. Marincola, et al. Immunobiological characterization of cancer stem cells isolated from glioblastoma patients. Clinical Cancer Research, 16(3):800–813, 2010. [121] P. Dirks. Stem cells and brain tumours. Nature, 444(7120):687–688, 2006. [122] P. Dirks. Brain tumour stem cells: the undercurrents of human brain cancer and their relationship to neural stem cells. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1489):139, 2008. [123] P. B. Dirks. Cancer: stem cells and brain tumours. Nature, 444(7120):687–8, 2006. [124] J. G. Doench and P. A. Sharp. Specificity of microRNA target selection in translational repression. Genes Dev, 18(5):504–11, 2004. [125] T. Doetschman, H. Eistetter, M. Katz, W. Schmidt, and R. Kemler. The in vitro development of blastocyst-derived embryonic stem cell lines: formation of visceral yolk sac, blood islands and myocardium. Journal of embryology and experimental morphology, 87(1):27, 1985. [126] S. Dolci, A. Belmonte, R. Santone, M. Giorgi, M. Pellegrini, E. Carosa, E. Pic- cione, A. 
Lenzi, and E. Jannini. Subcellular localization and regulation of type-1c and type-5 phosphodiesterases. Biochemical and biophysical research communications, 341(3):837–846, 2006. [127] G. Dominic, W. Yi-Lu, L. David, and R. Nikolaus. microrna target predictions across seven drosophila species and comparison to mammalian targets. 2005. [128] D. Doyle, G. Hanks, and N. Cherny. Oxford textbook of palliative medicine. Oxford University Press, USA, 2005. [129] T. Du. microPrimer: the biogenesis and function of microRNA. Development, 132(21):4645–4652, Sept. 2005. [130] A. Duensing and S. Duensing. Guilt by association? p53 and the development of ane- uploidy in cancer. Biochemical and biophysical research communications, 331(3):694– 700, 2005. [131] H. Dvinge and P. Bertone. Htqpcr: high-throughput analysis and visualization of quantitative real-time pcr data in r. Bioinformatics, 25(24):3325–3326, 2009. [132] D. Edwards. Non-linear normalization and background correction in one-channel cdna microarray studies. Bioinformatics, 19(7):825, 2003. [133] A. Efeyan and M. Serrano. p53: guardian of the genome and policeman of the onco- genes. Cell Cycle, 6(9):1006–1010, 2007. [134] A. J. Enright, B. John, U. Gaul, T. Tuschl, C. Sander, and D. S. Marks. MicroRNA targets in Drosophila. Genome Biol, 5(1):R1, 2003. [135] P. Eriksson, E. Perfilieva, T. Björk-Eriksson, A. Alborn, C. Nordborg, D. Peterson, and F. Gage. Neurogenesis in the adult human hippocampus. Nature medicine, 4(11):1313–1317, 1998. [136] J. Erlichman and J. Leiter. Glia modulation of the extracellular milieu as a factor in central CO2 chemosensitivity and respiratory control. Journal of Applied Physiology, 108(6):1803, 2010. [137] C. Esposito, M. Scrima, A. Carotenuto, A. Tedeschi, P. Rovero, G. D’Errico, A. Mal- fitano, M. Bifulco, and D. Anna Maria. Structures and micelle locations of the nonlipidated and lipidated c-terminal membrane anchor of 2’, 3’-cyclic nucleotide- 3’-phosphodiesterase. 
Biochemistry, 47(1):308–319, 2008. [138] S. Falcon and R. Gentleman. Using gostats to test gene lists for go term association. Bioinformatics, 23(2):257, 2007. [139] K. K. Farh, A. Grimson, C. Jan, B. P. Lewis, W. K. Johnston, L. P. Lim, C. B. Burge, and D. P. Bartel. The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science, 310(5755):1817–21, 2005. [140] T. Fawcett, H. Eastman, J. Martindale, and N. Holbrook. Physical and functional association between gadd153 and ccaat/enhancer-binding protein beta during cellular stress. Journal of Biological Chemistry, 271(24):14285–14289, 1996. [141] C. Fears, C. Gladson, and A. Woods. Syndecan-2 is expressed in the microvascula- ture of gliomas and regulates angiogenic processes in microvascular endothelial cells. Journal of Biological Chemistry, 281(21):14533–14536, 2006. [142] R. Feil and F. Berger. Convergent evolution of genomic imprinting in plants and mammals. Trends in Genetics, 23(4):192–199, 2007. [143] B. G. Firehose. Broad gdac firehose. [144] A. Fischer and R. Bongini. Turning müller glia into neural progenitors in the retina. Molecular neurobiology, pages 1–11, 2010. [145] J. Flax, S. Aurora, C. Yang, C. Simonin, A. Wills, L. Billinghurst, M. Jendoubi, R. Sidman, J. Wolfe, S. Kim, et al. Engraftable human neural stem cells respond to developmental cues, replace neurons, and express foreign genes. Nature biotechnology, 16:1033–1039, 1998. [146] P. Flicek, M. Amode, D. Barrell, K. Beal, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fairley, S. Fitzgerald, et al. Ensembl 2012. Nucleic acids research, 40(D1):D84–D90, 2012. [147] P. Flicek, M. Amode, D. Barrell, K. Beal, S. Brent, Y. Chen, P. Clapham, G. Coates, S. Fairley, S. Fitzgerald, et al. Ensembl 2011. Nucleic acids research, 39(suppl 1):D800, 2011. [148] W. Freije, F. Castro-Vargas, Z. Fang, S. Horvath, T. Cloughesy, L. Liau, P. Mischel, and S. Nelson. Gene expression profiling of gliomas strongly predicts survival. 
Cancer research, 64(18):6503, 2004. [149] R. Fricker, M. Carpenter, C. Winkler, C. Greco, M. Gates, and A. Björklund. Site- specific migration and neuronal differentiation of human neural progenitor cells after transplantation in the adult rat brain. The Journal of neuroscience, 19(14):5990, 1999. [150] R. Friedman, K. Farh, C. Burge, and D. Bartel. Most mammalian mrnas are conserved targets of micrornas. Genome research, 19(1):92–105, 2009. [151] M. Frolov and N. Dyson. Molecular mechanisms of E2F-dependent activation and pRB-mediated repression. Journal of cell science, 117(11):2173, 2004. [152] P. Fujita, B. Rhead, A. Zweig, A. Hinrichs, D. Karolchik, M. Cline, M. Goldman, G. Barber, H. Clawson, A. Coelho, et al. The ucsc genome browser database: update 2011. Nucleic acids research, 39(suppl 1):D876, 2011. [153] T. Fujiwara, M. Bandi, M. Nitta, E. Ivanova, R. Bronson, and D. Pellman. Cytoki- nesis failure generating tetraploids promotes tumorigenesis in p53-null cells. Nature, 437(7061):1043, 2005. [154] F. B. Furnari, T. Fenton, R. M. Bachoo, A. Mukasa, J. M. Stommel, A. Stegh, W. C. Hahn, K. L. Ligon, D. N. Louis, C. Brennan, L. Chin, R. A. DePinho, and W. K. Cavenee. Malignant astrocytic glioma: genetics, biology, and paths to treatment. Genes and development, 21(21):2683–710, 2007. [155] D. Gaidatzis, E. Van Nimwegen, J. Hausser, and M. Zavolan. Inference of mirna targets using evolutionary conservation and pathway analysis. BMC bioinformatics, 8(1):69, 2007. [156] R. Galli, E. Binda, U. Orfanelli, B. Cipelletti, A. Gritti, S. De Vitis, R. Fiocco, C. Foroni, F. Dimeco, and A. Vescovi. Isolation and characterization of tumorigenic, stem-like neural precursors from human glioblastoma. Cancer research, 64(19):7011, 2004. [157] G. Gallia, V. Rand, I. Siu, et al. PIK3CA gene mutations in pediatric and adult glioblastoma multiforme. Molecular cancer research, 4(10):709, 2006. [158] E. Garcia-Aragoncillo, J. Carrillo, E. Lalli, N. Agra, G. Gomez-Lopez, A. 
Pestana, and J. Alonso. Dax1, a direct target of ews/fli1 oncoprotein, is a principal regulator of cell-cycle progression in ewing’s tumor cells. Oncogene, 27(46):6034–6043, 2008. [159] M. Gardiner-Garden and M. Frommer. CpG islands in vertebrate genomes. Journal of molecular biology, 196(2):261–282, July 1987. [160] A. Gartel and S. Radhakrishnan. Lost in transcription: repression, mechanisms, and consequences. Cancer research, 65(10):3980, 2005. [161] L. Gautier, L. Cope, B. Bolstad, and R. Irizarry. affyÂŮanalysis of affymetrix genechip data at the probe level. Bioinformatics, 20(3):307–315, 2004. [162] GBMbase. A bioinformatics resource for glioblastoma multiforme. http://beta.gbmbase.org/page/welcome/display. [163] A. Giganti, J. Plastino, B. Janji, M. Van Troys, D. Lentz, C. Ampe, C. Sykes, and E. Friederich. Actin-filament cross-linking protein t-plastin increases arp2/3-mediated actin-based movement. Journal of cell science, 118(6):1255, 2005. [164] A. Girard, R. Sachidanandam, G. J. Hannon, and M. A. Carmell. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature, 442(7099):199–202, 2006. [165] T. Glaser, S. M. Pollard, A. Smith, and O. Brüstle. Tripotential Differentiation of Adherently Expandable Neural Stem (NS) Cells. PLoS ONE, 2(3):e298, Mar. 2007. [166] V. Gocheva and J. Joyce. Cysteine cathepsins and the cutting edge of cancer invasion. Cell cycle, 6(1):60–64, 2007. [167] V. Gocheva, W. Zeng, D. Ke, D. Klimstra, T. Reinheckel, C. Peters, D. Hanahan, and J. Joyce. Distinct roles for cysteine cathepsin genes in multistage tumorigenesis. Genes & development, 20(5):543–556, 2006. [168] K. Goh, W. Poon, D. Chan, and C. Ip. Tissue plasminogen activator expression in meningiomas and glioblastomas. Clinical neurology and neurosurgery, 107(4):296–300, 2005. [169] S. Gomez-Lopez, O. Wiskow, R. Favaro, S. Nicolis, D. Price, S. Pollard, and S. A. 
Sox2 and pax6 maintain the proliferative and developmental potential of gliogenic neural stem cells in vitro. Glia, 59:1588–1599, 2011. [170] M. Göransson, M. Andersson, C. Forni, A. Ståhlberg, C. Andersson, A. Olofsson, R. Mantovani, and P. Åman. The myxoid liposarcoma fus-ddit3 fusion oncoprotein deregulates nf-κb target genes by interaction with nfkbiz. Oncogene, 28(2):270–278, 2008. [171] E. Gould, A. Reeves, M. Graziano, and C. Gross. Neurogenesis in the neocortex of adult primates. Science, 286(5439):548, 1999. [172] A. Gourine and S. Kasparov. Astrocytes as brain interoceptors. Experimental Physiology, 96(4):411, 2011. [173] A. Gourine, V. Kasymov, N. Marina, F. Tang, M. Figueiredo, S. Lane, A. Teschemacher, K. Spyer, K. Deisseroth, and S. Kasparov. Astrocytes control breathing through pH-dependent release of ATP. Science, 329(5991):571, 2010. [174] M. Graeber, C. Tran, P. Wolz, R. Egensperger, S. Kösel, Y. Imai, K. Bise, S. Kohsaka, and P. Mehraein. Differential expression of mhc class ii molecules by microglia and neoplastic astroglia: relevance for the escape of astrocytoma cells from immune surveillance. Neuropathology and applied neurobiology, 24:293–301, 1998. [175] B. M. H. H. G. M. Graff L, Castrop F. Expression of vesicular monoamine transporters, synaptosomal-associated protein 25 and syntaxin1: a signature of human small cell lung carcinoma. Cancer Res, 61:2138–2144, 2001. [176] L. Gravendeel, M. Kouwenhoven, O. Gevaert, J. de Rooi, A. Stubbs, J. Duijm, A. Daemen, F. Bleeker, L. Bralten, N. Kloosterhof, et al. Intrinsic gene expression profiles of gliomas are a better predictor of survival than histology. Cancer research, 69(23):9065–9072, 2009. [177] A. Gregorieff and H. Clevers. Wnt signaling in the intestinal epithelium: from endoderm to cancer. Genes & development, 19(8):877–890, 2005. [178] C. Gregorio-King, J. McLeod, F. Collier, G. Collier, K. Bolton, G. Van Der Meer, J. Apostolopoulos, and M. Kirkland. 
Merp1: a mammalian ependymin-related protein gene differentially expressed in hematopoietic cells. Gene, 286(2):249–257, 2002. [179] S. Griffiths-Jones, H. K. Saini, S. van Dongen, and A. J. Enright. MiRBase: tools for microRNA genomics. Nucleic Acids Res, 36(Database issue):D154–8, 2008. [180] A. Grimson, K. K. Farh, W. K. Johnston, P. Garrett-Engele, L. P. Lim, and D. P. Bartel. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell, 27(1):91–105, 2007. [181] J. Gunnersen, V. Spirkoska, P. Smith, R. Danks, and S. Tan. Growth and migration markers of rat c6 glioma cells identified by serial analysis of gene expression. Glia, 32(2):146–154, 2000. [182] P. Gupta, C. Chaffer, and R. Weinberg. Cancer stem cells: mirage or reality? Nature medicine, 15(9):1010–1012, 2009. [183] C. Hagemann, J. Anacker, S. Haas, D. Riesner, B. Schömig, R. Ernestus, and G. Vince. Comparative expression pattern of matrix-metalloproteinases in human glioblastoma cell-lines and primary cultures. BMC research notes, 3(1):293, 2010. [184] I. Han, H. Park, and E. Oh. New insights into syndecan-2 expression and tumourigenic activity in colon carcinoma cells. Journal of Molecular Histology, 35(3):319–326, 2004. [185] K. L. Harms and X. Chen. The C terminus of p53 family proteins is a cell fate determinant. Molecular and cellular biology, 25(5):2014–2030, Mar. 2005. [186] M. Harris, H. Yang, B. Low, J. Mukherje, A. Guha, R. Bronson, L. Shultz, M. Israel, and K. Yun. Cancer stem cells are enriched in the side population cells in a mouse model of glioma. Cancer research, 68(24):10051–10059, 2008. [187] E. Hartfuss, R. Galli, N. Heins, and M. Götz. Characterization of cns precursor subtypes and radial glia. Developmental biology, 229(1):15–30, 2001. [188] R. Haviland, S. Eschrich, G. Bloom, Y. Ma, S. Minton, R. Jove, and W. Cress. Necdin, a negative growth regulator, is a novel stat3 target gene down-regulated in human cancer. PloS one, 6(10):e24923, 2011. [189] Y. 
Hayakawa, Y. Hirata, H. Nakagawa, K. Sakamoto, Y. Hikiba, H. Kinoshita, W. Nakata, R. Takahashi, K. Tateishi, M. Tada, et al. Apoptosis signal-regulating kinase 1 and cyclin d1 compose a positive feedback loop contributing to tumor growth in gastric cancer. Proceedings of the National Academy of Sciences, 108(2):780–785, 2011. [190] L. He, C. Fan, A. Kapoor, A. Ingram, A. Rybak, R. Austin, J. Dickhout, J. Cutz, J. Scholey, and D. Tang. [alpha]-mannosidase 2c1 attenuates pten function in prostate cancer cells. Nature Communications, 2:307, 2011. [191] C. Heldin, A. Ostman, A. Eriksson, A. Siegbahn, L. Claesson-Welsh, and B. Westermark. Platelet-derived growth factor: Isoform-specific signalling via heterodimeric or homodimeric receptor complexes. Kidney Int, 41(3):571–574, 1992. [192] M. Hernandez, M. Nieto, and M. Sanchez Crespo. Cytosolic phospholipase a2 and the distinct transcriptional programs of astrocytoma cells. Trends in neurosciences, 23(6):259–264, 2000. [193] K. Herrup and Y. Yang. Cell cycle regulation in the postmitotic neuron: oxymoron or new biology? Nature Reviews Neuroscience, 8(5):368–378, 2007. [194] M. S. Hestand, A. Klingenhoff, M. Scherf, Y. Ariyurek, Y. Ramos, W. van Workum, M. Suzuki, T. Werner, G. J. van Ommen, J. T. den Dunnen, M. Harbers, and P. A. t Hoen. Tissue-specific transcript annotation and expression profiling with complementary next-generation sequencing technologies. Nucleic acids research, 38(16):e165, 2010. [195] S. Hockfield and R. McKay. Identification of major cell classes in the developing mammalian nervous system. The Journal of neuroscience, 5(12):3310, 1985. [196] J. Hodgson, R. Yeh, A. Ray, N. Wang, I. Smirnov, M. Yu, S. Hariono, J. Silber, H. Feiler, J. Gray, et al. Comparative analyses of gene copy number and mrna expression in glioblastoma multiforme tumors and xenografts. Neuro-oncology, 11(5):477–487, 2009. [197] R. Hoffmann and A. Valencia. A gene network for navigating the literature. 
Nature genetics, 36(7):664–664, 2004. [198] L. Hook, J. Vives, N. Fulton, M. Leveridge, S. Lingard, M. Bootman, A. Falk, S. Pollard, T. Allsopp, D. Dalma-Weiszhausz, et al. Non-immortalized human neural stem (ns) cells as a scalable platform for cellular assays. Neurochemistry international, 2011. [199] S. Houwing, L. Kamminga, E. Berezikov, D. Cronembold, A. Girard, H. van den Elst, D. Filippov, H. Blaser, E. Raz, C. Moens, et al. A role for piwi and pirnas in germ cell maintenance and transposon silencing in zebrafish. Cell, 129(1):69–82, 2007. [200] S. Huang, L. Shu, M. Dilling, J. Easton, F. Harwood, H. Ichijo, and P. Houghton. Sustained activation of the jnk cascade and rapamycin-induced apoptosis are suppressed by p53/p21cip1. Molecular cell, 11(6):1491–1501, 2003. [201] X. Huang, D. Xiao, L. Xu, H. Zhong, L. Liao, Z. Xie, and E. Li. Prognostic significance of altered expression of sdc2 and cyr61 in esophageal squamous cell carcinoma. Oncology reports, 21(4):1123, 2009. [202] Z. Huang, Y. Kawase-Koga, S. Zhang, J. Visvader, M. Toth, C. Walsh, and T. Sun. Transcription factor lmo4 defines the shape of functional areas in developing cortices and regulates sensorimotor control. Developmental biology, 327(1):132–142, 2009. [203] HUGO. Gene nomenclature committee. http://www.genenames.org/. [204] C. Hunter, R. Smith, D. Cahill, P. Stephens, C. Stevens, J. Teague, C. Greenman, S. Edkins, G. Bignell, H. Davies, et al. A hypermutation phenotype and somatic MSH6 mutations in recurrent human malignant gliomas after alkylator chemotherapy. Cancer research, 66(8):3987, 2006. [205] E. B. Institute. Intact database. http://www.ebi.ac.uk/intact/main.xhtml. [206] N. C. Institute. A to z list of cancers. [207] N. Ishii, D. Maier, A. Merlo, M. Tada, Y. Sawamura, A. Diserens, and E. Meir. Frequent co-alterations of TP53, p16/CDKN2A, p14ARF, PTEN tumor suppressor genes in human glioma cell lines. Brain pathology, 9(3):469–479, 1999. [208] K. Jabbari and G. Bernardi. 
Cytosine methylation and CpG, TpG, (CpA) and TpA frequencies. Gene, 333:143–149, 2004. [209] P. Jay, C. Rougeulle, A. Massacrier, A. Moncla, M. Mattel, P. Malzac, N. Roëckel, S. Taviaux, J. Lefranc, P. Cau, et al. The human necdin gene, ndn, is maternally imprinted and located in the prader-willi syndrome chromosomal region. Nature genetics, 17(3):357–361, 1997. [210] J. Jiao and D. Chen. Induction of neurogenesis in non-conventional neurogenic regions of the adult cns by niche astrocyte-produced signals. Stem Cells, pages 2007–0513v1, 2008. [211] F. Johansson, H. Göransson, and B. Westermark. Expression analysis of genes involved in progression driven by retroviral insertional mutagenesis in mice. Oncogene, 24(24):3896–3905, 2005. [212] K. Johe, T. Hazel, T. Muller, M. Dugich-Djordjevic, and R. McKay. Single factors direct the differentiation of stem cells from the fetal and adult central nervous system. Genes & development, 10(24):3129, 1996. [213] B. John, A. J. Enright, A. Aravin, T. Tuschl, C. Sander, and D. S. Marks. Human MicroRNA targets. PLoS Biol, 2(11):e363, 2004. [214] P. Jones and P. Laird. Cancer-epigenetics comes of age. Nature genetics, 21(2):163–167, 1999. [215] A. Kanawaty and J. Henderson. Genomic analysis of induced pluripotent stem (ips) cells: routes to reprogramming. Bioessays, 31(2):134–138, 2009. [216] Y. Kang, I. Kim, E. Kim, M. Yoon, S. Kim, T. Kwon, and K. Choi. Paxilline enhances trail-mediated apoptosis of glioma cells via modulation of c-flip, survivin and dr5. Experimental & molecular medicine, 43(1):24, 2011. [217] Y. Kang, I. Kim, E. Kim, M. Yoon, S. Kim, T. Kwon, and K. Choi. Paxilline enhances trail-mediated apoptosis of glioma cells via modulation of c-flip, survivin and dr5. Experimental & molecular medicine, 43(1):24–34, 2011. [218] A. Kaul and W. Maltese. Killing of cancer cells by the photoactivatable protein kinase c inhibitor, calphostin c, involves induction of endoplasmic reticulum stress. 
Neoplasia (New York, NY), 11(9):823, 2009. [219] B. Kaur, F. Khwaja, E. Severson, S. Matheny, D. Brat, and E. Van Meir. Hypoxia and the hypoxia-inducible-factor pathway in glioma growth and angiogenesis. Neuro-oncology, 7(2):134, 2005. [220] M. Kawashima, K. Doh-ura, E. Mekada, M. Fukui, and T. Iwaki. Cd9 expression in solid non-neuroepithelial tumors and infiltrative astrocytic tumors. Journal of Histochemistry & Cytochemistry, 50(9):1195, 2002. [221] M. Kawashima, K. Doh-ura, E. Mekada, M. Fukui, and T. Iwaki. Cd9 expression in solid non-neuroepithelial tumors and infiltrative astrocytic tumors. Journal of Histochemistry & Cytochemistry, 50(9):1195–1203, 2002. [222] M. Kedde, M. J. Strasser, B. Boldajipour, J. A. Oude Vrielink, K. Slanchev, C. le Sage, R. Nagel, P. M. Voorhoeve, J. van Duijse, U. A. Orom, A. H. Lund, A. Perrakis, E. Raz, and R. Agami. RNA-binding protein Dnd1 inhibits microRNA access to target mRNA. Cell, 131(7):1273–86, 2007. [223] G. Kempermann, S. Jessberger, B. Steiner, and G. Kronenberg. Milestones of neuronal development in the adult hippocampus. Trends in Neurosciences, 27(8):447–452, 2004. [224] M. Kertesz, N. Iovino, U. Unnerstall, U. Gaul, and E. Segal. The role of site accessibility in microrna target recognition. Nature genetics, 39(10):1278–1284, 2007. [225] I. Kil, S. Kim, S. Lee, and J. Park. Small interfering RNA-mediated silencing of mitochondrial NADP+-dependent isocitrate dehydrogenase enhances the sensitivity of HeLa cells toward tumor necrosis factor-α and anticancer drugs. Free Radical Biology and Medicine, 43(8):1197–1207, 2007. [226] D. Kim, C. H. Kim, J. I. Moon, Y. G. Chung, M. Y. Chang, B. S. Han, S. Ko, E. Yang, K. Y. Cha, R. Lanza, and K. S. Kim. Generation of human induced pluripotent stem cells by direct delivery of reprogramming proteins. Cell Stem Cell, 4(6):472–6, 2009. [227] I. Kim, Y. Kang, M. Yoon, E. Kim, S. Kim, T. Kwon, I. Kim, and K. Choi. 
Amiodarone sensitizes human glioma cells but not astrocytes to trail-induced apoptosis via chop-mediated dr5 upregulation. Neuro-oncology, 13(3):267–279, 2011. [228] J. Kim, J. Choi, S. Lim, O. Kwon, J. Seo, S. Ryu, and P. Suh. Phospholipase c-eta 1 is activated by intracellular ca2+ mobilization and enhances gpcrs/plc/ca2+ signaling. Cellular Signalling, 2011. [229] S. Kim, M. Seong, B. Jeon, H. Ko, J. Kim, and S. Park. Phase analysis identifies compound heterozygous deletions of the PARK2 gene in patients with early-onset Parkinson disease. Clinical Genetics. [230] T. Kim, M. Hemberg, J. Gray, A. Costa, D. Bear, J. Wu, D. Harmin, M. Laptewicz, K. Barbara-Haley, S. Kuersten, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature, 465(7295):182–187, 2010. [231] T.-M. Kim, W. Huang, R. Park, P. J. Park, and M. D. Johnson. A developmental taxonomy of glioblastoma defined and maintained by MicroRNAs. Cancer Research, 71(9):3387–3399, May 2011. [232] V. N. Kim. MicroRNA biogenesis: coordinated cropping and dicing. Nature Reviews Molecular Cell Biology, 6(5):376–385, May 2005. [233] W. Kim and N. Sharpless. The regulation of INK4/ARF in cancer and aging. Cell, 127(2):265–275, 2006. [234] R. Kincaid, A. Kuchinsky, and M. Creech. Vistaclara: an expression browser plug-in for cytoscape. Bioinformatics, 24(18):2112–2114, 2008. [235] S. Kinney, D. Smiraglia, S. James, M. Moser, B. Foster, and A. Karpf. Stage-specific alterations of dna methyltransferase expression, dna hypermethylation, and dna hypomethylation during prostate cancer progression in the transgenic adenocarcinoma of mouse prostate model. Molecular Cancer Research, 6(8):1365–1374, 2008. [236] L. S. Kinsey M, Smith R. Nr0b1 is required for the oncogenic phenotype mediated by ews/fli in ewing’s sarcoma. Mol Cancer Res, 4:851–859, 2006. [237] M. Kiriakidou, P. T. Nelson, A. Kouranov, P. Fitziev, C. Bouyioukos, Z. Mourelatos, and A. Hatzigeorgiou. 
A combined computational-experimental approach predicts human microRNA targets. Genes Dev, 18(10):1165–78, 2004. [238] M. Kitano, M. Nakaya, T. Nakamura, S. Nagata, and M. Matsuda. Imaging of rab5 activity identifies essential regulators for phagosome maturation. Nature, 453(7192):241–245, 2008. [239] C. Klattenhoff and W. Theurkauf. Biogenesis and germline functions of piRNAs. Development, 135(1):3–9, 2008. [240] A. Klein and B. Simons. Universal patterns of stem cell fate in cycling adult tissues. Development, 138(15):3103, 2011. [241] T. Kolesnikova, A. Kazarov, M. Lemieux, M. Lafleur, S. Kesari, A. Kung, and M. Hemler. Glioblastoma inhibition by cell surface immunoglobulin protein ewi-2, in vitro and in vivo. Neoplasia (New York, NY), 11(1):77, 2009. [242] T. Kondo, T. Setoguchi, and T. Taga. Persistence of a small subpopulation of cancer stem-like cells in the c6 glioma cell line. Proceedings of the National Academy of Sciences of the United States of America, 101(3):781, 2004. [243] K. Kondoh, N. Tsuji, C. Kamagata, M. Sasaki, D. Kobayashi, A. Yagihashi, and N. Watanabe. A novel aspartic protease gene, alp56, is up-regulated in human breast cancer independently from the cathepsin d gene. Breast cancer research and treatment, 78(1):37–44, 2003. [244] A. Korshunov, R. Sycheva, and A. Golanov. Genetically distinct and clinically relevant subtypes of glioblastoma defined by array-based comparative genomic hybridization (array-cgh). Acta neuropathologica, 111(5):465–474, 2006. [245] D. Koul, R. Shen, S. Bergh, Y. Lu, J. de Groot, T. Liu, G. Mills, and W. Yung. Targeting integrin-linked kinase inhibits Akt signaling pathways and decreases tumor progression of human glioblastoma. Molecular cancer therapeutics, 4(11):1681, 2005. [246] C. R. E. C. B. M. C. D. Kourtidis A, Jain R. An rna interference screen identifies metabolic regulators nr1d1 and pbp as novel survival factors for breast cancer cells with the erbb2 signature. Cancer Res, 70:1783–1792, 2010. 
[247] A. Krek, D. Grün, M. Poy, R. Wolf, L. Rosenberg, E. Epstein, P. MacMenamin, I. Da Piedade, K. Gunsalus, M. Stoffel, et al. Combinatorial microrna target predictions. Nature genetics, 37(5):495–500, 2005. [248] A. Kriegstein and A. Alvarez-Buylla. The glial nature of embryonic and adult neural stem cells. Annual review of neuroscience, 32:149, 2009. [249] A. Kriegstein and M. Götz. Radial glia diversity: a matter of cell fate. Glia, 43(1):37–43, 2003. [250] R. Kroes, G. Dawson, and J. Moskal. Focused microarray analysis of glyco-gene expression in human glioblastomas. Journal of neurochemistry, 103(s1):14–24, 2007. [251] R. Kroes, H. He, M. Emmett, C. Nilsson, F. Leach, I. Amster, A. Marshall, and J. Moskal. Overexpression of st6galnacv, a ganglioside-specific alpha 2,6-sialyltransferase, inhibits glioma growth in vivo. Proceedings of the National Academy of Sciences, 107(28):12646, 2010. [252] T. Kulikova, P. Aldebert, N. Althorpe, W. Baker, K. Bates, P. Browne, A. Van Den Broek, G. Cochrane, K. Duggan, R. Eberhardt, et al. The embl nucleotide sequence database. Nucleic Acids Research, 32(suppl 1):D27–D30, 2004. [253] M. Lafleur, D. Xu, and M. Hemler. Tetraspanin proteins regulate membrane type-1 matrix metalloproteinase-dependent pericellular proteolysis. Molecular Biology of the Cell, 20(7):2030–2040, 2009. [254] J. Lai, D. Sandhu, C. Yu, T. Han, C. Moser, K. Jackson, R. Guerrero, I. Aderca, H. Isomoto, M. Garrity-Park, et al. Sulfatase 2 up-regulates glypican 3, promotes fibroblast growth factor signaling, and decreases survival in hepatocellular carcinoma. Hepatology, 47(4):1211–1222, 2008. [255] Y. Lam, E. di Tomaso, H.-K. Ng, J. Pang, M. Roussel, and N. Hjelm. Expression of p19 ink4d, cdk4, cdk6 in glioblastoma multiforme. British journal of neurosurgery, 14(1):28–32, 2000. [256] P. Landgraf, M. Rusu, R. Sheridan, A. Sewer, N. Iovino, A. Aravin, S. Pfeffer, A. Rice, A. Kamphorst, M. Landthaler, et al. 
A mammalian microrna expression atlas based on small rna library sequencing. Cell, 129(7):1401–1414, 2007. [257] B. Langmead, C. Trapnell, M. Pop, and S. Salzberg. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol, 10(3):R25, 2009. [258] T. Lassmann, Y. Hayashizaki, and C. Daub. Tagdust–a program to eliminate artifacts from next generation sequencing data. Bioinformatics, 25(21):2839, 2009. [259] S. Lawler and E. Chiocca. Emerging functions of micrornas in glioblastoma. Journal of neuro-oncology, 92(3):297–306, 2009. [260] D. Lawson, M. Harrison, and C. Shapland. Fibroblast transgelin and smooth muscle sm22α are the same protein, the expression of which is down-regulated in many cell lines. Cell motility and the cytoskeleton, 38(3):250–257, 1998. [261] J. Lee, S. Kotliarova, Y. Kotliarov, A. Li, Q. Su, N. M. Donin, S. Pastorino, B. W. Purow, N. Christopher, W. Zhang, J. K. Park, and H. A. Fine. Tumor stem cells derived from glioblastomas cultured in bFGF and EGF more closely mirror the phenotype and genotype of primary tumors than do serum-cultured cell lines. Cancer Cell, 9(5):391–403, May 2006. [262] J. Lee, I. Vivanco, R. Beroukhim, J. Huang, W. Feng, R. DeBiasi, K. Yoshimoto, J. King, P. Nghiemphu, Y. Yuza, et al. Epidermal growth factor receptor activation in glioblastoma through novel missense mutations in the extracellular domain. PLoS medicine, 3(12):e485, 2006. [263] K. Lee, W. Han, J. Kim, I. Shin, E. Ko, I. Park, D. Lee, K. Oh, and D. Noh. The cd49d+/high subpopulation from isolated human breast sarcoma spheres possesses tumor-initiating ability. International journal of oncology, 40(3):665, 2012. [264] R. Lee, R. Feinbaum, and V. Ambros. The c. elegans heterochronic gene lin-4 encodes small rnas with antisense complementarity to lin-14. Cell, 75(5):843–854, 1993. [265] S. Lee, N. Syed, J. Taylor, P. Smith, B. Griffin, M. Baens, M. Bai, K. Bourantas, J. Stebbing, K. Naresh, et al. 
Dusp16 is an epigenetically regulated determinant of jnk signalling in burkitt’s lymphoma. British journal of cancer, 103(2):265–274, 2010. [266] Y. S. Lee, S. Pressman, A. P. Andress, K. Kim, J. L. White, J. J. Cassidy, X. Li, K. Lubell, H. Lim do, I. S. Cho, K. Nakahara, J. B. Preall, P. Bellare, E. J. Sontheimer, and R. W. Carthew. Silencing by small RNAs is linked to endosomal trafficking. Nat Cell Biol, 11(9):1150–6, 2009. [267] S. J. Leevers, B. Vanhaesebroeck, and M. D. Waterfield. Signalling through phosphoinositide 3-kinases: the take centre stage. Current opinion in cell biology, 11(2):219–225, Apr. 1999. [268] H. Lemjabbar-Alaoui, A. van Zante, M. Singer, Q. Xue, Y. Wang, D. Tsay, B. He, D. Jablons, and S. Rosen. Sulf-2, a heparan sulfate endosulfatase, promotes human lung carcinogenesis. Oncogene, 29(5):635–646, 2009. [269] G. Lemke. Glial control of neuronal development. Annual review of neuroscience, 24:87–105, 2001. [270] C. Leonetti, A. Biroccio, G. Graziani, and L. Tentori. Targeted therapy for brain tumours: Role of parp inhibitors. Current cancer drug targets, 12(3):218, 2012. [271] F. Leprêtre, C. Villenet, S. Quief, O. Nibourel, C. Jacquemin, X. Troussard, F. Jardin, F. Gibson, J. Kerckaert, C. Roumier, et al. Waved acgh: to smooth or not to smooth. Nucleic acids research, 38(7):e94–e94, 2010. [272] B. Lewis, C. Burge, and D. Bartel. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microrna targets. Cell, 120(1):15–20, 2005. [273] B. P. Lewis, I. H. Shih, M. W. Jones-Rhoades, D. P. Bartel, and C. B. Burge. Prediction of mammalian microRNA targets. Cell, 115(7):787–98, 2003. [274] M. Lewis, S. Ross, P. Strickland, C. Snyder, and C. Daniel. Regulated expression patterns of irx-2, an iroquois-class homeobox gene, in the human breast. Cell and tissue research, 296(3):549–554, 1999. [275] A. Li, J. Walling, Y. Kotliarov, A. Center, M. E. Steed, S. J. Ahn, M. Rosenblum, T. Mikkelsen, J. C. 
Zenklusen, and H. A. Fine. Genomic changes and gene expression profiles reveal that established glioma cell lines are poorly representative of primary human gliomas. Molecular cancer research : MCR, 6(1):21–30, 2008. [276] M. Li, L. Pevny, R. Lovell-Badge, and A. Smith. Generation of purified neural precursors from embryonic stem cells by lineage selection. Current biology, 8(17):971–974, 1998. [277] Q. Li, A. Jedlicka, N. Ahuja, M. Gibbons, S. Baylin, P. Burger, J. Issa, et al. Concordant methylation of the er and n33 genes in glioblastoma multiforme. Oncogene, 16(24):3197, 1998. [278] Y. Li, F. Guessous, Y. Zhang, C. DiPierro, B. Kefas, E. Johnson, L. Marcinkiewicz, J. Jiang, Y. Yang, T. Schmittgen, et al. Microrna-34a inhibits glioblastoma growth by targeting multiple oncogenes. Cancer research, 69(19):7569, 2009. [279] S. Liebner, C. Czupalla, and H. Wolburg. current concepts of blood-brain barrier development. The International journal of developmental biology, 2011. [280] A. Liekens, J. De Knijf, W. Daelemans, B. Goethals, P. De Rijk, and J. Del-Favero. Biograph: unsupervised biomedical knowledge discovery via automated hypothesis generation. Genome biology, 12(6):R57, 2011. [281] K. Ligon, J. Alberta, A. Kho, J. Weiss, M. Kwaan, C. Nutt, D. Louis, C. Stiles, and D. Rowitch. The oligodendroglial lineage marker OLIG2 is universally expressed in diffuse gliomas. Journal of Neuropathology & Experimental Neurology, 63(5):499, 2004. [282] L. P. Lim, M. E. Glasner, S. Yekta, C. B. Burge, and D. P. Bartel. Vertebrate microRNA genes. Science, 299(5612):1540, 2003. [283] L. P. Lim, N. C. Lau, P. Garrett-Engele, A. Grimson, J. M. Schelter, J. Castle, D. P. Bartel, P. S. Linsley, and J. M. Johnson. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature, 433(7027):769–73, 2005. [284] L. Linares, A. Hengstermann, A. Ciechanover, S. Müller, and M. Scheffner. Hdmx stimulates Hdm2-mediated ubiquitination and degradation of p53. 
Proceedings of the National Academy of Sciences, 100(21):12009, 2003. [285] A. Linkous, E. Yazlovitskaya, and D. Hallahan. Cytosolic phospholipase a2 and lysophospholipids in tumor angiogenesis. Journal of the National Cancer Institute, 102(18):1398–1412, 2010. [286] M. M. Lino and A. Merlo. PI3Kinase signaling in glioblastoma. Journal of neuro-oncology, 103(3):417–427, July 2011. [287] R. Lister, M. Pelizzola, R. Dowen, R. Hawkins, G. Hon, J. Tonti-Filippini, J. Nery, L. Lee, Z. Ye, Q. Ngo, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 462(7271):315–322, 2009. [288] B. Liu, Y. Kim, V. Leatherberry, P. Cowin, and C. Alexander. Mammary gland development requires syndecan-1 to create a β-catenin/tcf-responsive mammary epithelial subpopulation. Oncogene, 22(58):9243–9253, 2003. [289] B. Liu, S. McDermott, S. Khwaja, and C. Alexander. The transforming activity of wnt effectors correlates with their ability to induce the accumulation of mammary progenitor cells. Proceedings of the National Academy of Sciences of the United States of America, 101(12):4158, 2004. [290] F. Liu, Y. You, X. Li, T. Ma, Y. Nie, B. Wei, T. Li, H. Lin, and Z. Yang. Brain injury does not alter the intrinsic differentiation potential of adult neuroblasts. The Journal of Neuroscience, 29(16):5075, 2009. [291] J. Liu, M. A. Valencia-Sanchez, G. J. Hannon, and R. Parker. MicroRNA-dependent localization of targeted mRNAs to mammalian P-bodies. Nat Cell Biol, 7(7):719–23, 2005. [292] W. Liu, Y. Fu, S. Xu, F. Ding, G. Zhao, K. Zhang, C. Du, B. Pang, and Q. Pang. c-Met expression is associated with time to recurrence in patients with glioblastoma multiforme. Journal of clinical neuroscience : official journal of the Neurosurgical Society of Australasia, 18(1):119–121, 2011. [293] X. Liu, A. Bolteus, D. Balkin, O. Henschel, and A. Bordey. 
Gfap-expressing cells in the postnatal subventricular zone display a unique glial phenotype intermediate between radial glia and astrocytes. Glia, 54(5):394–410, 2006. [294] X. Liu, M. Cicek, S. Plummer, E. Jorgenson, G. Casey, and J. Witte. Association of testis derived transcript gene variants and prostate cancer risk. The Journal of urology, 177(3):894–898, 2007. [295] Y. Liu, S. Shete, C. Etzel, M. Scheurer, G. Alexiou, G. Armstrong, S. Tsavachidis, F. Liang, M. Gilbert, K. Aldape, et al. Polymorphisms of lig4, btbd2, hmga2, and rtel1 genes involved in the double-strand break repair pathway predict glioblastoma survival. Journal of Clinical Oncology, 28(14):2467–2474, 2010. [296] D. Q. S. P. C. K. L. O. T. J. L. J. K. V. C. T. M. P. L. T. L. L. N. S. T. C.-L. Liu Q, Nguyen DH. Molecular properties of cd133+ glioblastoma stem cells derived from treatment-refractory recurrent brain tumors. J Neurooncol, 94:1–19, 2009. [297] C. Lois and A. Alvarez-Buylla. Proliferating subventricular zone cells in the adult mammalian forebrain can differentiate into neurons and glia. Proceedings of the National Academy of Sciences, 90(5):2074, 1993. [298] B. Lorber, A. Guidi, J. Fawcett, and K. Martin. Activated retinal glia mediated axon regeneration in experimental glaucoma. Neurobiology of Disease, 2011. [299] C. Lottaz, D. Beier, K. Meyer, P. Kumar, A. Hermann, J. Schwarz, M. Junker, P. Oefner, U. Bogdahn, J. Wischhusen, et al. Transcriptional profiles of cd133+ and cd133- glioblastoma-derived cancer stem cell lines suggest different cells of origin. Cancer research, 70(5):2030–2040, 2010. [300] D. Louis, H. Ohgaki, and O. Wiestler. The 2007 WHO classification of tumours of the central nervous system. Acta Neuropathologica, 2007. [301] N. Louis. WHO Classification of Tumours of the Central Nervous System. International Agency for Research on Cancer, 2007. [302] S. Lowell, A. Benchoua, B. Heavey, and A. Smith. 
Notch promotes neural lineage entry by pluripotent embryonic stem cells. PLoS biology, 4(5):e121, 2006. [303] W. Ma, T. Tavakoli, E. Derby, Y. Serebryakova, M. Rao, and M. Mattson. Cell-extracellular matrix interactions regulate neural differentiation of human embryonic stem cells. BMC developmental biology, 8(1):90, 2008. [304] Y.-H. Ma, R. Mentlein, F. Knerlich, M.-L. Kruse, H. M. Mehdorn, and J. Held-Feindt. Expression of stem cell markers in human astrocytomas of different WHO grades. Journal of neuro-oncology, 86(1):31–45, 2008. [305] K. MAEDA, S. MATSUHASHI, K. TABUCHI, T. WATANABE, T. KATAGIRI, M. OYASU, N. SAITO, and S. KURODA. Brain specific human genes, nell1 and nell2, are predominantly expressed in neuroblastoma and other embryonal neuroepithelial tumors. Neurologia medico-chirurgica, 41(12):582–589, 2001. [306] H. Maier, C. Jones, B. Jasani, D. Öfner, B. Zelger, K. Schmid, and H. Budka. Metallothionein overexpression in human brain tumours. Acta neuropathologica, 94(6):599–604, 1997. [307] S. Maira, I. Galetic, D. Brazil, S. Kaech, E. Ingley, M. Thelen, and B. Hemmings. Carboxyl-terminal modulator protein (ctmp), a negative regulator of pkb/akt and v-akt at the plasma membrane. Science, 294(5541):374, 2001. [308] C. L. Maire and K. L. Ligon. Glioma Models: New GEMMs Add “Class” with Genomic and Expression Correlations. Cancer Cell, 19(3):295–297, Mar. 2011. [309] A. Majid, T. Lin, G. Best, K. Fishlock, S. Hewamana, G. Pratt, D. Yallop, A. Buggins, S. Wagner, B. Kennedy, et al. Cd49d is an independent prognostic marker that is associated with cxcr4 expression in cll. Leukemia research, 35(6):750–756, 2011. [310] P. Malatesta, E. Hartfuss, and M. Gotz. Isolation of radial glial cells by fluorescent-activated cell sorting reveals a neuronal lineage. Development, 127(24):5253, 2000. [311] M. Mancini and A. Toker. Nfat proteins: emerging roles in cancer progression. Nature Reviews Cancer, 9(11):810–820, 2009. [312] B. A. Mangerich A. 
How to kill tumor cells with inhibitors of poly(adp-ribosyl)ation. Int J Cancer, 128:251–265, 2011. [313] J. Mao, J. Perez-losada, D. Wu, R. DelRosario, R. Tsunematsu, K. Nakayama, K. Brown, S. Bryson, and A. Balmain. Fbxw7/cdc4 is a p53-dependent, haploinsufficient tumour suppressor gene. Nature, 432(7018):775–779, 2004. [314] Y. Mao, H. Sunwoo, B. Zhang, and D. Spector. Direct visualization of the co-transcriptional assembly of a nuclear body by noncoding rnas. Nature cell biology, 13(1):95–101, 2010. [315] M. Maragkakis, P. Alexiou, G. Papadopoulos, M. Reczko, T. Dalamagas, G. Giannopoulos, G. Goumas, E. Koukis, K. Kourtis, V. Simossis, et al. Accurate microrna target prediction correlates with protein repression levels. BMC bioinformatics, 10(1):295, 2009. [316] M. Maragkakis, M. Reczko, V. Simossis, P. Alexiou, G. Papadopoulos, T. Dalamagas, G. Giannopoulos, G. Goumas, E. Koukis, K. Kourtis, et al. Diana-microt web server: elucidating microrna functions through target prediction. Nucleic acids research, 37(suppl 2):W273–W276, 2009. [317] I. R. B. R. A. L. Marques MR, Horner JS. Mice lacking the p53/p63 target gene perp are resistant to papilloma development. Cancer Res, 65:6551–6556, 2005. [318] C. Marshall. RAS and RHO GTPases in G1-phase cell-cycle regulation. Nature Reviews Molecular Cell Biology, 5(5):355–366, 2004. [319] D. Martens, V. Tropepe, and D. van der Kooy. Separate proliferation kinetics of fibroblast growth factor-responsive and epidermal growth factor-responsive neural stem cells within the embryonic forebrain germinal zone. The Journal of Neuroscience, 20(3):1085, 2000. [320] D. L. Masica and R. Karchin. Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival. Cancer Research, 71(13):4550–4561, July 2011. [321] J. Massagué and D. Wotton. Transcriptional control by the tgf-beta/smad signaling system. The EMBO Journal, 19:1745–1754, 2000. [322] K. Matsumoto, S. 
Nishihara, M. Kamimura, T. Shiraishi, T. Otoguro, M. Uehara, Y. Maeda, K. Ogura, A. Lumsden, and T. Ogura. The prepattern transcription factor irx2, a target of the fgf8/map kinase cascade, is involved in cerebellum formation. Nature neuroscience, 7(6):605–612, 2004. [323] M. Matsumura, D. Fremont, P. Peterson, I. Wilson, et al. Emerging principles for the recognition of peptide antigens by mhc class i molecules. Science (New York, NY), 257(5072):927, 1992. [324] K. McCullough, J. Martindale, L. Klotz, T. Aw, and N. Holbrook. Gadd153 sensitizes cells to endoplasmic reticulum stress by down-regulating bcl2 and perturbing the cellular redox state. Science Signalling, 21(4):1249, 2001. [325] B. McEllin, C. Camacho, B. Mukherjee, B. Hahm, N. Tomimatsu, R. Bachoo, and S. Burma. Pten loss compromises homologous recombination repair in astrocytes: implications for glioblastoma therapy with temozolomide or poly (adp-ribose) polymerase inhibitors. Cancer research, 70(13):5457–5464, 2010. [326] R. McLendon, A. Friedman, D. Bigner, and E. Van Meir. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 2008. [327] M. Mehling, P. Simon, M. Mittelbronn, R. Meyermann, S. Ferrone, M. Weller, and H. Wiendl. WHO grade associated downregulation of MHC class I antigen-processing machinery components in human astrocytomas: does it reflect a potential immune escape mechanism? Acta neuropathologica, 114(2):111–9, 2007. [328] K. Meletis, F. Barnabé-Heider, M. Carlén, E. Evergren, N. Tomilin, O. Shupliakov, and J. Frisén. Spinal cord injury reveals multilineage differentiation of ependymal cells. PLoS biology, 6(7):e182, 2008. [329] I. Mellinghoff, M. Wang, I. Vivanco, D. Haas-Kogan, S. Zhu, E. Dia, K. Lu, K. Yoshimoto, J. Huang, D. Chute, et al. Molecular determinants of the response of glioblastomas to egfr kinase inhibitors. New England Journal of Medicine, 353(19):2012–2024, 2005. [330] X. Meng, M. Leyva, M. Jenny, I. Gross, S. 
Benosman, B. Fricker, S. Harlepp, P. Hébraud, A. Boos, P. Wlosik, et al. A ruthenium-containing organometallic com- pound reduces tumor growth through induction of the endoplasmic reticulum stress gene chop. Cancer research, 69(13):5458–5466, 2009. [331] T. Mercer, M. Dinger, and J. Mattick. Long non-coding rnas: insights into functions. Nature Reviews Genetics, 10(3):155–159, 2009. [332] F. Merkle, A. Tramontin, J. García-Verdugo, and A. Alvarez-Buylla. Radial glia give rise to adult neural stem cells in the subventricular zone. Proceedings of the National Academy of Sciences of the United States of America, 101(50):17528, 2004. [333] MGI. Mouse genome informatics database. http://www.informatics.jax.org/. [334] P. S. Mischel, S. F. Nelson, and T. F. Cloughesy. Molecular analysis of glioblastoma: pathway profiling and its implications for patient therapy. Cancer biology and therapy, 2(3):242–7, 2003. [335] S. Mitsui, N. Yamaguchi, Y. Osako, and K. Yuri. Enzymatic properties and localiza- tion of motopsin (prss12), a protease whose absence causes mental retardation. Brain research, 1136:1–12, 2007. [336] K. Mochizuki, N. Fine, T. Fujisawa, and M. Gorovsky. Analysis of a piwi-related gene implicates small rnas in genome rearrangement in tetrahymena. Cell, 110(6):689–699, 2002. [337] E. Mohorko, R. Glockshuber, and M. Aebi. Oligosaccharyltransferase: the central enzyme of n-linked protein glycosylation. Journal of inherited metabolic disease, pages 1–10, 2011. [338] K. Mokhtari, S. Paris, L. Aguirre-Cruz, N. Privat, E. Criniere, Y. Marie, J. Hauw, M. Kujas, D. Rowitch, K. Hoang-Xuan, et al. Olig2 expression, gfap, p53 and 1p loss analysis contribute to glioma subclassification. Neuropathology and applied neurobiol- ogy, 31(1):62–69, 2005. [339] M. Montanez-Wiscovich, D. Seachrist, M. Landis, J. Visvader, B. Andersen, and R. Keri. Lmo4 is an essential mediator of erbb2/her2/neu-induced breast cancer cell cycle progression. Oncogene, 28(41):3608–3618, 2009. [340] M. E. 
Montanez-Wiscovich, D. D. Seachrist, M. D. Landis, J. Visvader, B. Andersen, and R. A. Keri. Lmo4 is an essential mediator of erbb2/her2/neu-induced breast cancer cell cycle progression. Oncogene, 28:3608–3618, 2009. [341] A. Moolwaney and O. Igwe. Regulation of the cyclooxygenase-2 system by interleukin- 1β through mitogen-activated protein kinase signaling pathways: A comparative study of human neuroglioma and neuroblastoma cells. Molecular brain research, 137(1):202– 212, 2005. [342] H. Moon, M. Ahn, J. Park, K. Min, Y. Kwon, and K. Kim. Negative regulation of hypoxia inducible factor-1α by necdin. FEBS letters, 579(17):3797–3801, 2005. [343] T. Mori, A. Buffo, and M. Götz. The novel roles of glial cells revisited: the contribution of radial glia and astrocytes to neurogenesis. Current topics in developmental biology, 69:67–99, 2005. [344] M. Morimoto-Tomita, K. Uchimura, A. Bistrup, D. Lum, M. Egeblad, N. Boudreau, Z. Werb, and S. Rosen. Sulf-2, a proangiogenic heparan sulfate endosulfatase, is upregulated in breast cancer. Neoplasia (New York, NY), 7(11):1001, 2005. [345] S. Morrison and J. Kimble. Asymmetric and symmetric stem-cell divisions in devel- opment and cancer. Nature, 441(7097):1068–1074, 2006. [346] A. S. Morrissy, R. D. Morin, A. Delaney, T. Zeng, H. Mcdonald, S. Jones, Y. Zhao, M. Hirst, and M. A. Marra. Next-generation tag sequencing for cancer gene expression profiling. Genome Research, 19(10):1825–1835, Oct. 2009. [347] C. M. Morshead and D. van der Kooy. Disguising adult neural stem cells. Current opinion in neurobiology, 14(1):125–131, Feb. 2004. [348] G. Mosieniak, B. Pyrzynska, and B. Kaminska. Nuclear factor of activated t cells (nfat) as a new component of the signal transduction pathway in glioma cells. Journal of neurochemistry, 71(1):134–141, 1998. [349] W. Mueller, C. Nutt, M. Ehrich, M. Riemenschneider, A. Von Deimling, D. Van Den Boom, and D. Louis. Downregulation of runx3 and tes by hypermethylation in glioblastoma. 
Oncogene, 26(4):583–593, 2006. [350] J. Mukai, T. Hachiya, S. Shoji-Hoshino, M. Kimura, D. Nadano, P. Suvanto, T. Hanaoka, Y. Li, S. Irie, L. Greene, et al. Nade, a p75ntr-associated cell death executor, is involved in signal transduction mediated by the common neurotrophin receptor p75ntr. Journal of Biological Chemistry, 275(23):17566, 2000. [351] J. Mukai, P. Suvant, and T. Sato. Nerve growth factor-dependent regulation of nade- induced apoptosis. Vitamins & Hormones, 66:385–402, 2003. [352] I. Muñoz-Sanjuán and A. H. Brivanlou. Neural induction, the default model and embryonic stem cells. Nature Reviews Neuroscience, 3(4):271–280, apr 2002. [353] A. Murat, E. Migliavacca, T. Gorlia, W. Lambiv, T. Shay, M. Hamou, N. De Tribolet, L. Regli, W. Wick, M. Kouwenhoven, et al. Stem cell-related self-renewal signa- ture and high epidermal growth factor receptor expression associated with resistance to concomitant chemoradiotherapy in glioblastoma. Journal of Clinical Oncology, 26(18):3015–3024, 2008. [354] N. Nakagomi, T. Nakagomi, S. Kubo, A. Nakano-Doi, O. Saino, M. Takata, H. Yoshikawa, D. Stern, T. Matsuyama, and A. Taguchi. Endothelial cells support survival, proliferation, and neuronal differentiation of transplanted adult ischemia- induced neural stem/progenitor cells after cerebral infarction. Stem Cells, 27(9):2185– 2195, 2009. [355] C. Napoli, C. Lemieux, and R. Jorgensen. Introduction of a chimeric chalcone synthase gene into petunia results in reversible co-suppression of homologous genes in trans. The Plant Cell Online, 2(4):279, 1990. [356] K. Nave. Axon-glial signaling and the glial support of axon function. Annual review of neuroscience, 2008. [357] R. Nawroth, A. Van Zante, S. Cervantes, M. McManus, M. Hebrok, and S. Rosen. Extracellular sulfatases, elements of the wnt signaling pathway, positively regulate growth and tumorigenicity of human pancreatic cancer cells. PLoS One, 2(4):e392, 2007. [358] S. Ng, K. Buckingham, C. Lee, A. Bigham, H. Tabor, K. 
Dent, C. Huff, P. Shannon, E. Jabs, D. Nickerson, et al. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics, 42(1):30–35, 2009. [359] C. B. Nielsen, N. Shomron, R. Sandberg, E. Hornstein, J. Kitzman, and C. B. Burge. Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA, 13(11):1894–910, 2007. [360] I. Nimmrich, S. Erdmann, U. Melchers, S. Chtarbova, U. Finke, S. Hentsch, I. Hoff- mann, M. Oertel, W. Hoffmann, and O. Muller. The novel ependymin related gene ucc1 is highly expressed in colorectal tumor cells. Cancer letters, 165(1):71–79, 2001. [361] S. Noctor, A. Flint, T. Weissman, W. Wong, B. Clinton, and A. Kriegstein. Divid- ing precursor cells of the embryonic cortical ventricular zone have morphological and molecular characteristics of radial glia. The Journal of neuroscience, 22(8):3161, 2002. [362] E. Noetzel, M. Rose, E. Sevinc, R. Hilgers, A. Hartmann, A. Naami, R. Knüchel, and E. Dahl. Intermediate filament dynamics and breast cancer: Aberrant promoter methylation of the synemin gene is associated with early tumor relapse. Oncogene, 29(34):4814–4825, 2010. [363] H. Noushmehr, D. Weisenberger, and K. Diefes. Identification of a CpG island methy- lator phenotype that defines a distinct subgroup of glioma. Cancer Cell, 2010. [364] J. Novakova, O. Slaby, and R. Vyzula. Microrna involvement in glioblastoma patho- genesis. Biochemical and biophysical research communications, 2009. [365] H. Ohgaki and P. Kleihues. Epidemiology and etiology of gliomas. Acta neuropatho- logica, 109(1):93–108, 2005. [366] K. Ohira, N. Funatsu, K. Homma, Y. Sahara, M. Hayashi, T. Kaneko, and S. Naka- mura. Truncated trkb-t1 regulates the morphology of neocortical layer i astrocytes in adult rat brain slices. European Journal of Neuroscience, 25(2):406–416, 2007. [367] S. Okabe, K. Forsberg-Nilsson, A. Spiro, M. Segal, and R. McKay. 
Development of neuronal precursor cells and functional postmitotic neurons from embryonic stem cells in vitro. Mechanisms of development, 59(1):89–102, 1996. [368] Y. Okazaki, M. Furuno, T. Kasukawa, J. Adachi, H. Bono, S. Kondo, I. Nikaido, N. Osato, R. Saito, H. Suzuki, et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cdnas. Nature, 420(6915):563–573, 2002. [369] M. Okoniewski, T. Yates, S. Dibben, and C. Miller. An annotation infrastructure for the analysis and interpretation of affymetrix exon array data. Genome biology, 8(5):R79, 2007. [370] P. Ongusaha, T. Ouchi, K. Kim, E. Nytko, J. Kwak, R. Duda, C. Deng, and S. Lee. Brca1 shifts p53-mediated cellular outcomes towards irreversible growth arrest. Onco- gene, 22(24):3749–3758, 2003. [371] P. C. Orban, D. Chui, and J. D. Marth. Tissue- and site-specific DNA recombination in transgenic mice. Proceedings of the National Academy of Sciences of the United States of America, 89(15):6861–6865, Aug. 1992. [372] U. Ørom, T. Derrien, M. Beringer, K. Gumireddy, A. Gardini, G. Bussotti, F. Lai, M. Zytnicki, C. Notredame, Q. Huang, et al. Long noncoding rnas with enhancer-like function in human cells. Cell, 143(1):46–58, 2010. [373] E. Ostrakhovitch, P. Olsson, S. Jiang, and M. Cherian. Interaction of metallothionein with tumor suppressor p53 protein. FEBS letters, 580(5):1235–1238, 2006. [374] T. Ozawa, C. W. Brennan, L. Wang, M. Squatrito, T. Sasayama, M. Nakada, J. T. Huse, A. Pedraza, S. Utsuki, Y. Yasui, A. Tandon, E. I. Fomchenko, H. Oka, R. L. Levine, K. Fujii, M. Ladanyi, and E. C. Holland. PDGFRA gene rearrangements are frequent genetic events in PDGFRA-amplified glioblastomas. Genes & Development, 24(19):2205–2218, Oct. 2010. [375] L. Pacey, J. Stead, A. Gleave, K. Tomczyk, and L. Doering. Neural stem cell culture: neurosphere generation, microscopical analysis and cryopreservation. Nat. Protoc, 215:1–14, 2006. [376] T. Palmer, P. Schwartz, P. Taupin, B. Kaspar, S. 
Stein, and F. Gage. Cell culture: Progenitor cells from human brain after death. Nature, 411(6833):42–43, 2001. [377] P. Pandolfi. Breast cancer - loss of pten predicts resistance to treatment. New England Journal of Medicine, 351(22):2337–2338, 2004. [378] K. Paraiso, Y. Xiang, V. Rebecca, E. Abel, Y. Chen, A. Munko, E. Wood, I. Fedorenko, V. Sondak, A. Anderson, et al. Pten loss confers braf inhibitor resistance to melanoma cells through the suppression of bim expression. Cancer research, 71(7):2750–2760, 2011. [379] D. Park and J. Rich. Biology of glioma cancer stem cells. Molecules and cells, 28(1):7–12, 2009. [380] H. Park, I. Han, H. Kwon, and E. Oh. Focal adhesion kinase regulates syndecan-2–mediated tumorigenic activity of ht1080 fibrosarcoma cells. Cancer research, 65(21):9899–9905, 2005. [381] J. Park, J. Jung, M. Seo, S. Kang, Y. Lee, and K. Kang. Dner modulates adipogenesis of human -derived mesenchymal stem cells via regulation of cell proliferation. Cell proliferation, 43(1):19–28, 2010. [382] D. Parry, D. Mahony, K. Wills, and E. Lees. Cyclin d-cdk subunit arrangement is dependent on the availability of competing ink4 and p21 class inhibitors. Molecular and cellular biology, 19(3):1775, 1999. [383] D. Parsons, S. Jones, X. Zhang, J. Lin, and R. Leary. An integrated genomic analysis of human glioblastoma multiforme. Science, 2008. [384] H. Pasantes-Morales and A. Schousboe. Role of taurine in osmoregulation in brain cells: mechanisms and functional implications. Amino Acids, 12(3):281–292, 1997. [385] L. Patrawala, T. Calhoun, R. Schneider-Broussard, J. Zhou, K. Claypool, and D. Tang. Side population is enriched in tumorigenic, stem-like cancer cells, whereas abcg2+ and abcg2- cancer cells are similarly tumorigenic. Cancer research, 65(14):6207, 2005. [386] K. Paulsson, M. Kleijmeer, J. Griffith, M. Jevon, S. Chen, P. Anderson, H. Sjögren, S. Li, and P. Wang.
Association of tapasin and copi provides a mechanism for the retrograde transport of major histocompatibility complex (mhc) class i molecules from the golgi complex to the endoplasmic reticulum. Journal of Biological Chemistry, 277(21):18266, 2002. [387] G. Paxinos, J. Mai, and S. O. service). The Human Nervous System. Elsevier Academic Press London, 2004. [388] G. Pearson, F. Robinson, T. Gibson, B. Xu, M. Karandikar, K. Berman, and M. Cobb. Mitogen-activated protein (map) kinase pathways: regulation and physiological func- tions. Endocrine reviews, 22(2):153–183, 2001. [389] L. Pevny and M. Placzek. Sox genes and neural progenitor identity. Current opinion in neurobiology, 15(1):7–13, 2005. [390] H. S. Phillips, S. Kharbanda, R. Chen, W. F. Forrest, R. H. Soriano, T. D. Wu, A. Misra, J. M. Nigro, H. Colman, L. Soroceanu, P. M. Williams, Z. Modrusan, B. G. Feuerstein, and K. Aldape. Molecular subclasses of high-grade glioma predict prog- nosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell, 9(3):157–173, Mar. 2006. [391] J. Phillips, E. Huillard, A. Robinson, A. Ward, D. Lum, M. Polley, S. Rosen, D. Row- itch, and Z. Werb. Heparan sulfate sulfatase sulf2 regulates pdgfrα signaling and growth in human and mouse malignant glioma. The Journal of clinical investigation, 122(3):911, 2012. [392] S. Piccirillo, B. Reynolds, N. Zanetti, G. Lamorte, E. Binda, G. Broggi, H. Brem, A. Olivi, F. Dimeco, A. Vescovi, et al. Bone morphogenetic proteins inhibit the tumorigenic potential of human brain tumour-initiating cells. Nature, 444(7120):761, 2006. [393] S. G. Piccirillo, E. Binda, R. Fiocco, A. L. Vescovi, and K. Shah. Brain cancer stem cells. Journal of molecular medicine, 87(11):1087–95, 2009. [394] R. Piet, L. Vargová, E. Syková, D. Poulain, and S. Oliet. Physiological contribution of the astrocytic environment of neurons to intersynaptic crosstalk. 
Proceedings of the National Academy of Sciences of the United States of America, 101(7):2151, 2004. [395] D. Pinto and H. Clevers. Wnt control of stem cells and differentiation in the intestinal epithelium. Experimental cell research, 306(2):357–363, 2005. [396] A. Pitre, N. Davis, M. Paul, A. Orr, and O. Skalli. Synemin promotes akt-dependent glioblastoma cell proliferation by antagonizing pp2a. Molecular biology of the cell, 23(7):1243–1253, 2012. [397] S. Pleasure, C. Page, and V. Lee. Pure, postmitotic, polarized human neurons derived from ntera 2 cells provide a system for expressing exogenous proteins in terminally differentiated neurons. The Journal of neuroscience, 12(5):1802, 1992. [398] E. Poch, R. Miñambres, E. Mocholí, C. Ivorra, A. Pérez-Aragó, C. Guerri, I. Pérez- Roger, and R. M. Guasch. RhoE interferes with Rb inactivation and regulates the proliferation and survival of the U87 human glioblastoma cell line. Experimental cell research, 313(4):719–731, Feb. 2007. [399] S. Pollard, A. Benchoua, and S. Lowell. Neural stem cells, neurons, and glia. Methods in enzymology, 418:151–169, 2006. [400] S. Pollard and L. Conti. Investigating radial glia in vitro. Progress in neurobiology, 83(1):53–67, 2007. [401] S. Pollard, R. Wallbank, S. Tomlinson, L. Grotewold, and A. Smith. Fibroblast growth factor induces a neural stem cell phenotype in foetal forebrain progenitors and during embryonic stem cell differentiation. Molecular and Cellular Neuroscience, 38(3):393– 403, 2008. [402] S. Pollard, K. Yoshikawa, I. Clarke, D. Danovi, S. Stricker, R. Russell, J. Bayani, R. Head, M. Lee, M. Bernstein, et al. Glioma stem cell lines expanded in adher- ent culture have tumor-specific phenotypes and are suitable for chemical and genetic screens. Cell Stem Cell, 4(6):568–580, 2009. [403] S. M. Pollard. Adherent Neural Stem (NS) Cells from Fetal and Adult Forebrain. Cerebral Cortex, 16(Supplement 1):i112–i120, July 2006. [404] S. M. Pollard, K. Yoshikawa, I. D. Clarke, D. 
Danovi, S. Stricker, R. Russell, J. Bayani, R. Head, M. Lee, M. Bernstein, J. A. Squire, A. Smith, and P. Dirks. Glioma stem cell lines expanded in adherent culture have tumor-specific phenotypes and are suitable for chemical and genetic screens. Cell Stem Cell, 4(6):568–80, 2009. [405] K. Pollock, P. Stroemer, S. Patel, L. Stevanato, A. Hope, E. Miljan, Z. Dong, H. Hodges, J. Price, and J. Sinden. A conditionally immortal clonal stem cell line from human cortical neuroepithelium for the treatment of ischemic stroke. Experimental neurology, 199(1):143–155, 2006. [406] R. Popovic and J. Licht. Mek and maf in myeloma therapy. Blood, 117(8):2300–2302, 2011. [407] A. Popović, A. Demirović, B. Spajić, G. Štimac, and D. B Krušlin. Expression and prognostic role of syndecan-2 in prostate cancer. Prostate cancer and prostatic diseases, 13(1):78–82, 2009. [408] T. D. Portal. Tcga data portal: An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in pdgfra, idh1, egfr and nf1. [409] K. Pruitt, T. Tatusova, and D. Maglott. Ncbi reference sequences (refseq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research, 35(suppl 1):D61–D65, 2007. [410] PubMed. Pubmed. http://www.ncbi.nlm.nih.gov/pubmed. [411] R. Puca, L. Nardinocchi, G. Bossi, A. Sacchi, G. Rechavi, D. Givol, and G. D’Orazi. Restoring wtp53 activity in hipk2 depleted mcf7 cells by modulating metallothionein and zinc. Experimental cell research, 315(1):67–75, 2009. [412] Z. Qin, F. Ren, X. Xu, Y. Ren, H. Li, Y. Wang, Y. Zhai, and Z. Chang. Znf536, a novel zinc finger protein specifically expressed in the brain, negatively regulates neuron differentiation by repressing retinoic acid-induced gene transcription. Molecular and cellular biology, 29(13):3633, 2009. [413] A. Quiñones Hinojosa and N. Sanai.
Cellular composition and cytoarchitecture of the adult human subventricular zone: a niche of neural stem cells. The Journal of comparative neurology, 2006. [414] F. Radtke and H. Clevers. Self-renewal and cancer of the gut: two sides of a coin. Science, 307(5717):1904, 2005. [415] B. Ragel, W. Couldwell, D. Gillespie, and R. Jensen. Identification of hypoxia-induced genes in a malignant glioma cell line (u-251) by cdna microarray analysis. Neurosur- gical review, 30(3):181–187, 2007. [416] H. Rajagopalan, P. Jallepalli, C. Rago, V. Velculescu, K. Kinzler, B. Vogelstein, and C. Lengauer. Inactivation of hcdc4 can cause chromosomal instability. Nature, 428(6978):77–81, 2004. [417] P. Rakic. Guidance of neurons migrating to the fetal monkey neocortex. Brain re- search, 33(2):471, 1971. [418] P. Rakic. Elusive radial glial cells: historical and evolutionary perspective. Glia, 43(1):19–32, 2003. [419] S. Ramaswamy, K. Ross, E. Lander, and T. Golub. A molecular signature of metastasis in primary solid tumors. Nature genetics, 33(1):49–54, 2002. [420] L. Rambhatla, S. Ram-Mohan, J. Cheng, and J. Sherley. Immortal dna strand cosegre- gation requires p53/impdh–dependent asymmetric self-renewal associated with adult stem cells. Cancer research, 65(8):3155, 2005. [421] L. B. Rangel, R. Agarwal, C. A. Sherman-Baust, V. Mello-Coelho, E. S. Pizer, H. Ji, D. D. Taub, and P. J. Morin. Anomalous expression of the HLA-DR alpha and beta chains in ovarian and other cancers. Cancer biology and therapy, 3(10):1021–7, 2004. [422] P. Rao, M. Jaggi, D. Smith, G. Hemstreet, and K. Balaji. Metallothionein 2a inter- acts with the kinase domain of pkcµ in prostate cancer. Biochemical and biophysical research communications, 310(3):1032–1038, 2003. [423] S. K. Rao, J. Edwards, A. D. Joshi, I.-M. Siu, and G. J. Riggins. A survey of glioblas- toma genomic amplifications and deletions. Journal of neuro-oncology, 96(2):169–179, 2010. [424] W. A. Redmond WL, Ruby CE. 
The role of ox40-mediated co-stimulation in t-cell activation and survival. Crit Rev Immunol, 29:187–201, 2009. [425] J. Rehwinkel, J. Raes, and E. Izaurralde. Nonsense-mediated mRNA decay: Target genes and functional diversification of effectors. Trends Biochem Sci, 31(11):639–46, 2006. [426] K. Reilly, D. Loisel, and R. Bronson. Nf1; Trp53 mutant mice develop glioblastoma with evidence of strain-specific effects. Nature Genetics, 2000. [427] B. Reynolds and S. Weiss. Generation of neurons and astrocytes from isolated cells of the adult mammalian central nervous system. Science, 255(5052):1707, 1992. [428] S. Riaz, E. Jauniaux, G. Stern, and H. Bradford. The controlled conversion of human neural progenitor cells derived from foetal ventral mesencephalon into dopaminergic neurons in vitro. Developmental brain research, 136(1):27–34, 2002. [429] M. J. Riemenschneider, R. Büschges, M. Wolter, J. Reifenberger, J. Boström, J. A. Kraus, U. Schlegel, and G. Reifenberger. Amplification and overexpression of the MDM4 (MDMX) gene from 1q32 in a subset of malignant gliomas without TP53 mutation or MDM2 amplification. Cancer Research, 59(24):6091–6096, Dec. 1999. [430] A. Rizki, V. Weaver, S. Lee, G. Rozenberg, K. Chin, C. Myers, J. Bascom, J. Mott, J. Semeiks, L. Grate, et al. A human breast cell model of preinvasive to invasive transition. Cancer research, 68(5):1378, 2008. [431] D. Robinson, Y. Wu, and S. Lin. The protein tyrosine kinase family of the human genome. biomedical science, 19:5548–5557, 2000. [432] R. Roelofs, D. Fischer, S. Houtman, J. Sluijs, W. Van Haren, F. Van Leeuwen, and E. Hol. Adult human subventricular, subgranular, and subpial zones contain astrocytes with a specialized intermediate filament cytoskeleton. Glia, 52(4):289–300, 2005. [433] S. Rosen and H. Lemjabbar-Alaoui. Sulf-2: an extracellular modulator of cell signaling and a cancer target candidate. Expert opinion on therapeutic targets, 14(9):935–949, 2010. [434] L. Rosso and J. Mienville. 
Pituicyte modulation of neurohormone output. Glia, 57(3):235–243, 2009. [435] A. Rousseau, C. Nutt, R. Betensky, A. Iafrate, M. Han, K. Ligon, D. Rowitch, and D. Louis. Expression of oligodendroglial and astrocytic lineage markers in diffuse gliomas: use of ykl-40, apoe, ascl1, and nkx2-2. Journal of Neuropathology & Experi- mental Neurology, 65(12):1149, 2006. [436] N. Roy, A. Benraiss, S. Wang, R. Fraser, R. Goodman, W. Couldwell, M. Neder- gaard, A. Kawaguchi, H. Okano, and S. Goldman. Promoter-targeted selection and isolation of neural progenitor cells from the adult human ventricular zone. Journal of neuroscience research, 59(3):321–331, 2000. [437] S. Rybalkin, C. Yan, K. Bornfeldt, and J. Beavo. Cyclic gmp phosphodiesterases and regulation of smooth muscle function. Circulation research, 93(4):280–291, 2003. [438] T. Saito, K. Shibasaki, M. Kurachi, S. Puentes, M. Mikuni, and Y. Ishizaki. Cere- bral capillary endothelial cells are covered by the VEGF-expressing foot processes of astrocytes. Neuroscience Letters, 2011. [439] A. Sakurada, H. Hamada, S. Fukushige, T. Yokoyama, K. Yoshinaga, T. Furukawa, S. Sato, A. Yajima, M. Sato, S. Fujimura, et al. Adenovirus-mediated delivery of the pten gene inhibits cell growth by induction of apoptosis in endometrial cancer. International journal of oncology, 15(6):1069–1074, 1999. [440] Y. Samuels, Z. Wang, A. Bardelli, N. Silliman, J. Ptak, S. Szabo, H. Yan, A. Gazdar, S. Powell, G. Riggins, et al. High frequency of mutations of the PIK3CA gene in human cancers. Science, 304(5670):554, 2004. [441] D. San, J. Ray, and F. Gage. Bipotent progenitor cell lines from the human cns. Nature biotechnology, 15(6):574–580, 1997. [442] N. Sanai, A. Alvarez-Buylla, and M. Berger. Neural stem cells and the origin of gliomas. New England Journal of Medicine, 353(8):811–822, 2005. [443] N. Sanai, A. Tramontin, A. Quiñones-Hinojosa, N. Barbaro, N. Gupta, S. Kunwar, M. Lawton, M. McDermott, A. Parsa, J. Verdugo, et al. 
Unique astrocyte ribbon in adult human brain contains neural stem cells but lacks chain migration. Nature, 427(6976):740–744, 2004. [444] M. Sarti, C. Sevignani, G. Calin, R. Aqeilan, M. Shimizu, F. Pentimalli, M. Picchio, A. Godwin, A. Rosenberg, A. Drusco, et al. Adenoviral transduction of testin gene into breast and uterine cancer cell lines promotes apoptosis and tumor reduction in vivo. Clinical cancer research, 11(2):806–813, 2005. [445] E. Schmidt, K. Ichimura, H. Goike, A. Moshref, L. Liu, and V. Collins. Mutational profile of the pten gene in primary human astrocytic tumors and cultivated xenografts. Journal of Neuropathology & Experimental Neurology, 58(11):1170, 1999. [446] M. Selbach, B. Schwanhausser, N. Thierfelder, Z. Fang, R. Khanin, and N. Rajewsky. Widespread changes in protein synthesis induced by microRNAs. Nature, 455(7209):58–63, 2008. [447] D. Seo, J. Sung, H. Cho, H. Yi, K. Seo, I. Choi, D. Kim, J. Kim, A. El-Aty, H. Shin, et al. Gene expression profiling of cancer stem cell in human lung adenocarcinoma a549 cells. Mol Cancer, 6(1):75, 2007. [448] J. Seoane, H. Le, L. Shen, S. Anderson, and J. Massagué. Integration of smad and forkhead pathways in the control of neuroepithelial and glioblastoma cell proliferation. Cell, 117(2):211–223, 2004. [449] B. Seri, J. García-Verdugo, B. McEwen, and A. Alvarez-Buylla. Astrocytes give rise to new neurons in the adult mammalian hippocampus. The Journal of Neuroscience, 21(18):7153, 2001. [450] R. Shai, T. Shi, T. J. Kremen, S. Horvath, L. M. Liau, T. F. Cloughesy, P. S. Mischel, and S. F. Nelson. Gene expression profiling identifies molecular subtypes of gliomas. Oncogene, 22(31):4918–4923, July 2003. [451] P. Shannon, A. Markiel, O. Ozier, N. Baliga, J. Wang, D. Ramage, N. Amin, B. Schwikowski, and T. Ideker. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research, 13(11):2498–2504, 2003. [452] Q. Shen, Y. Wang, E. Kokovay, G. Lin, S. Chuang, S.
Goderie, B. Roysam, and S. Temple. Adult svz stem cells lie in a vascular niche: a quantitative analysis of niche cell-cell interactions. Cell Stem Cell, 3(3):289–300, 2008. [453] W. Shen, A. Balajee, J. Wang, H. Wu, C. Eng, P. Pandolfi, and Y. Yin. Essential role for nuclear pten in maintaining chromosomal integrity. Cell, 128(1):157–170, 2007. [454] C. Sherr and J. Roberts. CDK inhibitors: positive and negative regulators of G1-phase progression. Genes & development, 13(12):1501, 1999. [455] S. L. Shi W, Fan H and D. R. The tetraspanin cd9 associates with transmembrane tgf-alpha and regulates tgf-alpha-induced egf receptor activation and cell proliferation. J. Cell Biol, (148):591–602, 2000. [456] T. Shima, N. Okumura, T. Takao, Y. Satomi, T. Yagi, M. Okada, and K. Nagai. Interaction of the sh2 domain of fyn with a cytoskeletal protein, β-adducin. Journal of Biological Chemistry, 276(45):42233–42240, 2001. [457] M. Shipitsin and K. Polyak. The cancer stem cell hypothesis: in search of definitions, markers, and relevance. Laboratory Investigation, 88(5):459–463, 2008. [458] S. Singh, I. Clarke, M. Terasaki, V. Bonn, C. Hawkins, J. Squire, and P. Dirks. Iden- tification of a cancer stem cell in human brain tumors. Cancer Research, 63(18):5821, 2003. [459] S. Singh, C. Hawkins, I. Clarke, and J. Squire. Identification of human brain tumour initiating cells. Nature, 2004. [460] P. Singha, I. Yeh, M. Venkatachalam, P. Saikumar, et al. Transforming growth factor beta-inducible gene tmepai converts tgf-beta from a tumor suppressor to a tumor promoter in breast cancer. Cancer research, 70(15):6377–6383, 2010. [461] D. Smadja, C. d’Audigier, L. Weiswald, C. Badoual, V. Dangles-Marie, L. Mauge, S. Evrard, I. Laurendeau, F. Lallemand, S. Germain, et al. The wnt antagonist dickkopf-1 increases endothelial progenitor cell angiogenic potential. Arteriosclerosis, thrombosis, and vascular biology, 30(12):2544–2552, 2010. [462] K. Smith, M. Luong, and G. Stein. 
Pluripotency: toward a gold standard for human es and ips cells. Journal of cellular physiology, 220(1):21–29, 2009. [463] B. J. M. D. C. B. G. R. Smith SJ, Long A. Pediatric high-grade glioma: identification of poly(adp-ribose) polymerase as a potential therapeutic target. Neuro-oncology, 13:1171–1177, 2011. [464] G. Smyth. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical applications in genetics and molecular biology, 3(1):3, 2004. [465] G. Smyth. Limma: linear models for microarray data. Bioinformatics and computational biology solutions using R and Bioconductor, pages 397–420, 2005. [466] G. Smyth and T. Speed. Normalization of cdna microarray data. Methods, 31(4):265–273, 2003. [467] D. A. Solomon, J.-S. Kim, W. Jean, and T. Waldman. Conspirators in a capital crime: co-deletion of p18INK4c and p16INK4a/p14ARF/p15INK4b in glioblastoma multiforme. Cancer Research, 68(21):8657–8660, Nov. 2008. [468] J. Soos, J. Krieger, O. Stüve, C. King, J. Patarroyo, K. Aldape, K. Wosik, A. Slavin, P. Nelson, J. Antel, et al. Malignant glioma cells use mhc class ii transactivator (ciita) promoters iii and iv to direct ifn-γ-inducible ciita expression and can function as nonprofessional antigen presenting cells in endocytic processing and cd4+ t-cell activation. Glia, 36(3):391–405, 2001. [469] A. Stark, J. Brennecke, N. Bushati, R. B. Russell, and S. M. Cohen. Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3’UTR evolution. Cell, 123(6):1133–46, 2005. [470] A. Stark, J. Brennecke, R. B. Russell, and S. M. Cohen. Identification of Drosophila MicroRNA targets. PLoS Biol, 1(3):E60, 2003. [471] C. Strübing, G. Ahnert-Hilger, J. Shan, B. Wiedenmann, J. Hescheler, and A. M. Wobus. Differentiation of pluripotent embryonic stem cells into the neuronal lineage in vitro gives rise to mature inhibitory and excitatory neurons. Mechanisms of Development, 53(2):275–287, 1995.
[472] R. Stupp, W. P. Mason, M. J. van den Bent, M. Weller, B. Fisher, M. J. B. Taphoorn, K. Belanger, A. A. Brandes, C. Marosi, U. Bogdahn, J. Curschmann, R. C. Janzer, S. K. Ludwin, T. Gorlia, A. Allgeier, D. Lacombe, J. G. Cairncross, E. Eisenhauer, R. O. Mirimanoff, European Organisation for Research and Treatment of Cancer Brain Tumor and Radiotherapy Groups, and National Cancer Institute of Canada Clinical Trials Group. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. The New England journal of medicine, 352(10):987–996, Mar. 2005. [473] X. Su, C. Kong, and P. Stahl. Gapex-5 mediates ubiquitination, trafficking, and degradation of epidermal growth factor receptor. Journal of Biological Chemistry, 282(29):21278–21284, 2007. [474] A. Subramanian, P. Tamayo, V. Mootha, S. Mukherjee, B. Ebert, M. Gillette, A. Paulovich, S. Pomeroy, T. Golub, E. Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43):15545–15550, 2005. [475] E. Sum, D. Segara, B. Duscio, M. Bath, A. Field, R. Sutherland, G. Lindeman, and J. Visvader. Overexpression of lmo4 induces mammary hyperplasia, promotes cell invasion, and is a predictor of poor outcome in breast cancer. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7659–7664, 2005. [476] L. Sun, W. Yan, Y. Wang, G. Sun, H. Luo, J. Zhang, X. Wang, Y. You, Z. Yang, and N. Liu. Microrna-10b induces glioma cell invasion by modulating mmp-14 and upar expression via hoxd10. Brain research, 1389:9–18, 2011. [477] N. Sun, T. Huiatt, D. Paulin, Z. Li, and R. Robson. Synemin interacts with the lim domain protein zyxin and is essential for cell adhesion and migration. Experimental cell research, 316(3):491–505, 2010. [478] P. Sun, S. Xia, B. Lal, C. Eberhart, A. Quinones-Hinojosa, J. Maciaczyk, W. Matsui, F. DiMeco, S. Piccirillo, A.
Vescovi, et al. Dner, an epigenetically modulated gene, regulates glioblastoma-derived neurosphere cell differentiation and tumor propagation. Stem Cells, 27(7):1473–1486, 2009. [479] T. Sun, X. Wang, S. Xie, D. Zhang, X. Wang, B. Li, W. Ma, and H. Xin. A comparison of proliferative capacity and passaging potential between neural stem and progenitor cells in adherent and neurosphere cultures. International Journal of Developmental Neuroscience, 2011. [480] Y. Sun, W. Kong, A. Falk, J. Hu, L. Zhou, S. Pollard, and A. Smith. Cd133 (prominin) negative human neural stem cells are clonogenic and tripotent. PloS one, 4(5):e5498, 2009. [481] Y. Sun, S. Pollard, L. Conti, M. Toselli, G. Biella, G. Parkin, L. Willatt, A. Falk, E. Cattaneo, and A. Smith. Long-term tripotent differentiation capacity of human neural stem (ns) cells in adherent culture. Molecular and Cellular Neuroscience, 38(2):245–258, 2008. [482] C. Suo, A. Salim, K. S. Chia, Y. Pawitan, and S. Calza. Modified least-variant set normalization for miRNA microarray. RNA, 16(12):2293–303, 2010. [483] C. Svendsen, M. ter Borg, R. Armstrong, A. Rosser, S. Chandran, T. Ostenfeld, and M. Caldwell. A new method for the rapid and long term growth of human neural precursor cells. Journal of neuroscience methods, 85(2):141–152, 1998. [484] Y. Takamura, H. Ikeda, T. Kanaseki, M. Toyota, T. Tokino, K. Imai, K. Houkin, and N. Sato. Regulation of mhc class ii expression in glioma cells by class ii transactivator (ciita). Glia, 45(4):392–405, 2004. [485] O. Tam, A. Aravin, P. Stein, A. Girard, E. Murchison, S. Cheloufi, E. Hodges, M. Anger, R. Sachidanandam, R. Schultz, et al. -derived small interfering rnas regulate gene expression in mouse oocytes. Nature, 453(7194):534, 2008. [486] B. Tan, C. Park, L. Ailles, and I. Weissman. The cancer stem cell hypothesis: a work in progress. Laboratory investigation, 86(12):1203–1207, 2006. [487] N. Taniguchi, H. Taniura, M. Niinobe, C. Takayama, K. Tominaga-Yoshino, A. Ogura, and K. 
Yoshikawa. The postmitotic growth suppressor necdin interacts with a calcium-binding protein (nefa) in neuronal cytoplasm. Journal of Biological Chem- istry, 275(41):31674–31681, 2000. [488] M. Taniwaki, Y. Daigo, N. Ishikawa, A. Takano, T. Tsunoda, W. Yasui, K. Inai, N. Kohno, and Y. Nakamura. Gene expression profiles of small-cell lung cancers: molecular signatures of lung cancer. International journal of oncology, 29(3):567–576, 2006. [489] A. Tarca, S. Draghici, P. Khatri, S. Hassan, P. Mittal, J. Kim, C. Kim, J. Kusanovic, and R. Romero. A novel signaling pathway impact analysis. Bioinformatics, 25(1):75, 2009. [490] P. A. C. t’Hoen, Y. Ariyurek, H. H. Thygesen, E. Vreugdenhil, R. H. A. M. Vossen, R. X. De Menezes, J. M. Boer, G.-J. B. Van Ommen, and J. T. Den Dunnen. Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Research, 36(21):e141–e141, Oct. 2008. [491] C. Thomas, G. Ely, C. D. James, R. Jenkins, M. Kastan, A. Jedlicka, P. Burger, and R. Wharen. Glioblastoma-related gene mutations and over-expression of functional epidermal growth factor receptors in SKMG-3 glioma cells. Acta neuropathologica, 101(6):605–615, June 2001. [492] J. Ting and J. Trowsdale. Genetic control of mhc class ii expression. Cell, 109(2):S21– S33, 2002. [493] E. Tobias, A. Hurlstone, E. MacKenzie, R. McFarlane, D. Black, et al. The tes gene at 7q31. 1 is methylated in tumours and encodes a novel growth-suppressing lim domain protein. Oncogene, 20(22):2844, 2001. [494] V. Tropepe, M. Sibilia, B. Ciruna, J. Rossant, E. Wagner, and D. Kooy. Distinct neural stem cells proliferate in response to egf and fgf in the developing mouse telencephalon. Developmental biology, 208(1):166–188, 1999. [495] L. Trotman, X. Wang, A. Alimonti, Z. Chen, J. Teruya-Feldstein, H. Yang, N. Pavletich, B. Carver, C. Cordon-Cardo, H. Erdjument-Bromage, et al. 
Ubiqui- tination regulates pten nuclear import and tumor suppression. Cell, 128(1):141–156, 2007. [496] A. B. Trovó-Marqui and E. H. Tajara. Neurofibromin: a general outlook. Clinical genetics, 70(1):1–13, July 2006. [497] C. Tso, P. Shintaku, J. Chen, Q. Liu, J. Liu, Z. Chen, K. Yoshimoto, P. Mischel, T. Cloughesy, L. Liau, et al. Primary glioblastomas express mesenchymal stem-like properties. Molecular cancer research, 4(9):607–619, 2006. [498] A. Tsuchida, T. Okajima, K. Furukawa, T. Ando, H. Ishida, A. Yoshida, Y. Nakamura, R. Kannagi, M. Kiso, and K. Furukawa. Synthesis of disialyl lewis a (lea) structure in colon cancer cell lines by a sialyltransferase, st6galnac vi, responsible for the synthesis of α-series gangliosides. Journal of Biological Chemistry, 278(25):22787–22794, 2003. [499] N. Tsuji, K. Kondoh, M. Furuya, D. Kobayashi, A. Yagihashi, Y. Inoue, T. Meguro, S. Horita, H. Takahashi, and N. Watanabe. A novel aspartate protease gene, alp56, is related to morphological features of colorectal adenomas. International journal of colorectal disease, 19(1):43–48, 2004. [500] V. Turk, B. Turk, G. Guncar, D. Turk, and J. Kos. Lysosomal cathepsins: struc- ture, role in antigen processing and presentation, and cancer. Advances in enzyme regulation, 42:285, 2002. [501] A. Tzschach, A. Bisgaard, M. Kirchhoff, L. Graul-Neumann, H. Neitzel, S. Page, A. Ahmed, I. Müller, F. Erdogan, H. Ropers, et al. Chromosome aberrations involving 10q22: report of three overlapping interstitial deletions and a balanced translocation disrupting c10orf11. European Journal of Human Genetics, 18(3):291–295, 2009. [502] N. Uchida, D. Buck, D. He, M. Reitsma, M. Masek, T. Phan, A. Tsukamoto, F. Gage, and I. Weissman. Direct isolation of human central nervous system stem cells. Pro- ceedings of the National Academy of Sciences, 97(26):14720, 2000. [503] M. Van De Wiel, K. Kim, S. Vosse, W. Van Wieringen, S. Wilting, and B. Ylstra. Cghcall: calling aberrations for array cgh tumor profiles. 
Bioinformatics, 23(7):892– 894, 2007. [504] J. Van Den Boom, M. Wolter, R. Kuick, D. Misek, A. Youkilis, D. Wechsler, C. Som- mer, G. Reifenberger, and S. Hanash. Characterization of gene expression profiles associated with glioma progression using oligonucleotide-based microarray analysis and real-time reverse transcription-polymerase chain reaction. The American journal of pathology, 163(3):1033–1043, 2003. [505] A. Van der Krol, L. Mur, M. Beld, J. Mol, and A. Stuitje. Flavonoid genes in petu- nia: addition of a limited number of gene copies may lead to a suppression of gene expression. The Plant Cell Online, 2(4):291, 1990. [506] B. van Houte, T. Binsl, H. Hettling, and J. Heringa. Cghnormaliter: a bioconduc- tor package for normalization of array cgh data with many cnas. Bioinformatics, 26(10):1366–1367, 2010. [507] F. van Ruissen and F. Baas. Serial analysis of gene expression (SAGE). Methods in molecular biology, 383:41–66, 2007. [508] S. Vatter, G. Pahlke, J. Deitmer, and G. Eisenbrand. Differential phosphodiesterase expression and cytosolic ca2+ in human cns tumour cells and in non-malignant and malignant cells of rat origin. Journal of neurochemistry, 93(2):321–329, 2005. [509] F. Vazquez, H. Vaucheret, R. Rajagopalan, C. Lepers, V. Gasciolli, A. Mallory, J. Hilbert, D. Bartel, and P. Crété. Endogenous trans-acting sirnas regulate the accumulation of arabidopsis mrnas. Molecular Cell, 16(1):69–79, 2004. [510] E. Venkatraman and A. Olshen. A faster circular binary segmentation algorithm for the analysis of array cgh data. Bioinformatics, 23(6):657–663, 2007. [511] R. G. W. Verhaak, K. A. Hoadley, E. Purdom, V. Wang, Y. Qi, M. D. Wilkerson, C. R. Miller, L. Ding, T. Golub, J. P. Mesirov, G. Alexe, M. Lawrence, M. O’Kelly, P. Tamayo, B. A. Weir, S. Gabriel, W. Winckler, S. Gupta, L. Jakkula, H. S. Feiler, J. G. Hodgson, C. D. James, J. N. Sarkaria, C. Brennan, A. Kahn, P. T. Spellman, R. K. Wilson, T. P. Speed, J. W. Gray, M. Meyerson, G. Getz, C. M. 
Perou, D. N. Hayes, and Cancer Genome Atlas Research Network. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell, 17(1):98–110, Jan. 2010. [512] A. Vescovi, E. Parati, A. Gritti, P. Poulin, M. Ferrario, E. Wanke, P. Frölichsthal- Schoeller, L. Cova, M. Arcellana-Panlilio, A. Colombo, et al. Isolation and cloning of multipotential stem cells from the embryonic human cns and establishment of trans- plantable human neural stem cell lines by epigenetic stimulation. Experimental neu- rology, 156(1):71–83, 1999. [513] A. Villa, E. Snyder, A. Vescovi, and A. Martínez-Serrano. Establishment and prop- erties of a growth factor-dependent, perpetual neural stem cell line from the human cns. Experimental neurology, 161(1):67–84, 2000. [514] R. Vlietstra, D. van Alewijk, K. Hermans, G. van Steenbrugge, and J. Trapman. Frequent inactivation of pten in prostate cancer cell lines and xenografts. Cancer Research, 58(13):2720–2723, 1998. [515] B. Vogelstein, D. Lane, A. Levine, et al. Surfing the p53 network. Nature, 408(6810):307–310, 2000. [516] A. Volterra. Astrocytes, from brain glue to communication elements: the revolution continues. Nature Reviews Neuroscience, 2005. [517] A. Von Deimling, A. Korshunov, and C. Hartmann. The next generation of glioma biomarkers: Mgmt methylation, braf fusions and idh1 mutations. Brain Pathology, 21(1):74–87, 2011. [518] V. Vukicevic, A. Jauch, T. Dinger, L. Gebauer, V. Hornich, S. Bornstein, M. Ehrhart- Bornstein, and A. Müller. Genetic instability and diminished differentiation capacity in long-term cultured mouse neurosphere cells. Mechanisms of ageing and development, 131(2):124–132, 2010. [519] S. Wada, M. Hamada, K. Kobayashi, and N. Satoh. Novel genes involved in canonical wnt beta-catenin signaling pathway in early ciona intestinalis embryos. Development, growth & differentiation, 50(4):215–227, 2008. [520] R. Wang, K. 
Chadalavada, J. Wilshire, U. Kowalik, K. Hovinga, A. Geber, B. Fligel- man, M. Leversha, C. Brennan, and V. Tabar. Glioblastoma stem-like cells give rise to tumour endothelium. Nature, 468(7325):829–833, 2010. [521] T. Watanabe, T. Katagiri, M. Suzuki, F. Shimizu, T. Fujiwara, N. Kanemoto, Y. Naka- mura, Y. Hirai, H. Maekawa, and E. Takahashi. Cloning and characterization of two novel human cdnas (nell1 and nell2) encoding proteins with six egf-like repeats. Ge- nomics, 38(3):273–276, 1996. [522] T. Watanabe, S. Nobusawa, P. Kleihues, and H. Ohgaki. Idh1 mutations are early events in the development of astrocytomas and oligodendrogliomas. The American journal of pathology, 174(4):1149, 2009. [523] T. Watanabe, A. Takeda, T. Tsukiyama, K. Mise, T. Okuno, H. Sasaki, N. Minami, and H. Imai. Identification and characterization of two novel classes of small rnas in the mouse germline: retrotransposon-derived sirnas in oocytes and germline small rnas in testes. Genes & development, 20(13):1732, 2006. [524] T. Watanabe, Y. Totoki, A. Toyoda, M. Kaneda, S. Kuramochi-Miyagawa, Y. Obata, H. Chiba, Y. Kohara, T. Kono, T. Nakano, et al. Endogenous sirnas from naturally formed dsrnas regulate transcripts in mouse oocytes. Nature, 453(7194):539–543, 2008. [525] S. Weiss, C. Dunne, J. Hewson, C. Wohl, M. Wheatley, A. Peterson, and B. Reynolds. Multipotent cns stem cells are present in the adult mammalian spinal cord and ven- tricular neuroaxis. The Journal of neuroscience, 16(23):7599, 1996. [526] P. Wen and S. Kesari. Malignant gliomas in adults. New England Journal of Medicine, 359(5):492–507, 2008. [527] W. Wick, U. Naumann, and M. Weller. Transforming growth factor-beta: A molec- ular target for the future therapy of glioblastoma. Current pharmaceutical design, 12(3):341–349, 2006. [528] E. Wijaya, M. Frith, Y. Suzuki, and P. Horton. Recount: expectation maximization based error correction tool for next generation sequencing data. 
In Genome Inform, volume 23, pages 189–201, 2009. [529] R. Williams and K. Herrup. The Control of Neuron Number. Annual Review of Neuroscience, 11(1):423–453, 1988. [530] P. Wolters, M. Laig-Webster, and G. Caughey. Dipeptidyl peptidase i cleaves matrix- associated proteins and is expressed mainly by mast cells in normal dog airways. American journal of respiratory cell and molecular biology, 22(2):183, 2000. [531] L. Xu, Y. Shi, G. Petrovics, C. Sun, M. Makarem, W. Zhang, I. Sesterhenn, D. McLeod, L. Sun, J. Moul, et al. Pmepa1, an androgen-regulated nedd4-binding pro- tein, exhibits cell growth inhibitory function and decreased expression during prostate cancer progression. Cancer research, 63(15):4299, 2003. [532] X. Xu, J. Zhao, Z. Xu, B. Peng, Q. Huang, E. Arnold, and J. Ding. Structures of hu- man cytosolic nadp-dependent isocitrate dehydrogenase reveal a novel self-regulatory mechanism of activity. Journal of Biological Chemistry, 279(32):33946–33957, 2004. [533] V. Yadav and M. Denning. Fyn is induced by ras/pi3k/akt signaling and is required for enhanced invasion/migration. Molecular carcinogenesis, 50(5):346–352, 2011. [534] K. Yamada and M. Watanabe. Cytodifferentiation of Bergmann glia and its relation- ship with Purkinje cells. Anatomical science international, 77(2):94–108, 2002. [535] H. Yan, D. Parsons, G. Jin, R. McLendon, B. Rasheed, W. Yuan, I. Kos, I. Batinic- Haberle, S. Jones, G. Riggins, et al. Idh1 and mutations in gliomas. New England Journal of Medicine, 360(8):765–773, 2009. [536] J. Yan, L. Xu, A. Welsh, G. Hatfield, T. Hazel, K. Johe, and V. Koliatsos. Extensive neuronal differentiation of human neural stem cell grafts in adult rat spinal cord. PLoS medicine, 4(2):e39, 2007. [537] K. Yap, S. Li, A. Muñoz-Cabello, S. Raguz, L. Zeng, S. Mujtaba, J. Gil, M. Walsh, and M. Zhou. Molecular interplay of the noncoding rna anril and methylated histone h3 lysine 27 by polycomb cbx7 in transcriptional silencing of ink4a. 
Molecular cell, 38(5):662–674, 2010. [538] S. Yekta, I. Shih, et al. Microrna-directed cleavage of hoxb8 mrna. Science, 304(5670):594, 2004. [539] Q. Ying and A. Smith. Defined conditions for neural commitment and differentiation. Methods in enzymology, 365:327–341, 2003. [540] A. Yool. Aquaporins: multiple roles in the central nervous system. The Neuroscientist, 13(5):470, 2007. [541] H. You, K. Yamamoto, and T. Mak. Regulation of transactivation-independent proapoptotic activity of p53 by foxo3a. Proceedings of the National Academy of Sci- ences, 103(24):9051, 2006. [542] J. Yu, K. Ohuchida, K. Nakata, K. Mizumoto, L. Cui, H. Fujita, H. Yamaguchi, T. Egami, H. Kitada, M. Tanaka, et al. Lim only 4 is overexpressed in late stage pancreas cancer. Mol Cancer, 7:93, 2008. [543] X. Yuan, J. Curtin, Y. Xiong, G. Liu, S. Waschsmann-Hogiu, D. Farkas, K. Black, and J. Yu. Isolation of cancer stem cells from adult glioblastoma multiforme. Oncogene, 23(58):9392–9400, 2004. [544] S. Yun, K. Byun, J. Bhin, J. Oh, L. Nhung, D. Hwang, and B. Lee. Transcriptional regulatory networks associated with self-renewal and differentiation of neural stem cells. Journal of cellular physiology, 225(2):337–347, 2010. [545] Z. Zador, O. Bloch, X. Yao, and G. Manley. Aquaporins: role in cerebral edema and brain water balance. Progress in brain research, 161:185–194, 2007. [546] D. Zagzag, K. Salnikow, L. Chiriboga, H. Yee, L. Lan, M. A. Ali, R. Garcia, S. De- maria, and E. W. Newcomb. Downregulation of major histocompatibility complex antigens in invading glioma cells: stealth invasion of the brain. Laboratory investiga- tion; a journal of technical methods and pathology, (3):328–41. [547] X. Zhang, Z. Lian, C. Padden, M. Gerstein, J. Rozowsky, M. Snyder, T. Gingeras, P. Kapranov, S. Weissman, and P. Newburger. A myelopoiesis-associated regula- tory intergenic noncoding rna transcript within the human hoxa cluster. Blood, 113(11):2526–2534, 2009. [548] S. Zhao, Y. Lin, W. Xu, W. Jiang, Z. 
Zha, P. Wang, W. Yu, Z. Li, L. Gong, Y. Peng, et al. Glioma-derived mutations in idh1 dominantly inhibit idh1 catalytic activity and induce hif-1{alpha}. Science Signalling, 324(5924):261, 2009. [549] H. Zheng, H. Ying, H. Yan, A. Kimmelman, and D. Hiller. p53 and Pten control neural and glioma stem/progenitor cell renewal and differentiation. Nature, 2008. [550] Y. Zhu, P. Ghosh, P. Charnay, D. Burns, and L. Parada. Neurofibromas in nf1: Schwann cell origin and role of tumor environment. Science, 296(5569):920, 2002. [551] Z. Zhuang, P. Jian, L. Longjiang, H. Bo, and X. Wenlin. Oral cancer cells with different potential of lymphatic metastasis displayed distinct biologic behaviors and gene expression profiles. Journal of Oral Pathology & Medicine, 39(2):168–175, 2010. [552] G. Zupanc and S. Clint. Potential role of radial glia in adult neurogenesis of teleost fish. Glia, 43(1):77–86, 2003.