ABSTRACT

Identifying and Predicting Molecular Signatures in Glioblastoma Using Imaging-Derived

Phenotypic Traits

by

Dalu Yang

This thesis addresses the relationship between molecular status of glioblastoma patients and imaging-derived phenotypic traits. Glioblastoma (GBM) is the most common and aggressive malignant brain tumor. GBM’s complex heterogeneity in expression leads to considerably various responses to current treatment among different patients. Thus, a deeper understanding of tumor biology and alternative personalized therapeutic intervention is urgently demanded. Magnetic Resonance Imaging (MRI) and histologic images are routinely used for GBM diagnosis. The imaging-derived phenotypic traits can be linked to tumor molecular signatures. In this thesis, we explore the imaging-genomic relationship in GBM via three approaches. First, we aim to find

MRI-derived texture features that best discriminate GBM molecular subtypes. Second, we aim to find gene networks that determine the radiologically-defined tumor sub- compartment volumes. The third approach aims to quantify GBM histologic hallmarks and correlate them with biological pathway activities. Our study shows that linking imaging traits with tumor molecular status has potential clinical relevance and provides biological insight. Acknowledgments

I would like to first thank my advisors, Dr. Arvind Rao and Dr. Ashok

Veeraraghavan for their generous guidance and support, without which it would not have been possible for me to work on such an interesting thesis topic. I am also very grateful to

Shivali Narang, Dr. Joonsang Lee, Peng Chu, Dr. Pankaj Singh, and Dr. Ganiraju

Manyam, who have provided me with data, links, and insights that are key to the success of the experiments in this thesis. I am thankful to my girlfriend who really helped me overcome the procrastination on writing. Finally, I would like to express my gratitude to all my family and friends for their support and understanding on my choice of pursuing a graduate degree.

Contents

Acknowledgments ...... iv

Contents ...... v

List of Figures ...... viii

List of Tables ...... ix

Chapter 1 ...... 10

Introduction ...... 10 1.1. Background ...... 10 1.2. Motivation ...... 12 1.3. Thesis Layout ...... 13

Chapter 2 ...... 15

Predicting Molecular Subtypes with MRI Texture Features ...... 15 2.1. Introduction ...... 15 2.2. Pre-processing of MRI scans ...... 17 2.3. Texture Feature Extraction ...... 20 2.3.1. Segmentation-based Fractal Texture Analysis (SFTA) ...... 20 2.3.2. Run-length matrix (RLM) ...... 22 2.3.3. Local binary patterns (LBP) ...... 23 2.3.4. Histogram of oriented gradients (HOG) ...... 25 2.3.5. Haralick texture features ...... 26 2.4. Classification Algorithm ...... 27 2.5. Experiments and Results ...... 28 2.5.1. Dataset ...... 28 vi

2.5.2. Prediction Performance Evaluation ...... 29 2.5.3. Results ...... 31

Chapter 3 ...... 34

Discovering Molecular Networks Underlying MRI Volumetric Features ...... 34 3.1. Introduction ...... 34 3.2. Methods and Materials ...... 36 3.2.1. Dataset ...... 36 3.2.2. Automated Tumor Segmentation in MRI ...... 37 3.2.3. Determination of Cutpoints on Tumor Subcompartment Volumetrics ...... 39 3.2.4. Identification of Gene Networks using Context-Specific Subnetwork Discovery ...... 40 3.2.5. Pathway Identification using IPA ...... 41 3.2.6. Identification of Perturbagens using Connectivity Map ...... 41 3.3. Results ...... 42 3.3.1. Tumor Subcompartment Volumes Stratify OS in a Statistically Significant Manner ...... 42 3.3.2. MIS Extracted by COSSY Algorithm Contains Closely Interacting Molecular Nodes ...... 43 3.3.3. Ingenuity Pathway Analysis Identifies Phenotype Related Pathways Containing MIS ...... 47 3.3.4. Connectivity Map Identifies Perturbagens Reversing Phenotype Associated Molecular signatures ...... 49

Chapter 4 ...... 51 Identifying Molecular Programs Underlying Features of Whole Slide Tissue Images in Glioblastoma ...... 51 4.1. Introduction ...... 51 4.2. Materials and Methods ...... 52 4.2.1. Whole Slide Tissue Image Data...... 52 4.2.2. Pathway Activity Data ...... 53 4.2.3. Extracting Image Tiles and Removing Artifact ...... 53 4.2.4. Segmenting Microvascular Proliferation ...... 54 4.2.5. Segmenting Necrosis ...... 56 vii

4.2.6. Correlating MVP and Necrosis Proportions with Pathway Activities ...... 57 4.3. Results ...... 58 4.3.1. Distribution of the MVP and necrosis proportions ...... 58 4.3.2. MVP and Necrosis Proportions are correlated with Abstract Processes of Different Pathways ...... 59

Chapter 5 ...... 62

Discussion and Future Directions ...... 62 5.1. Performance of Predicting GBM Subtypes...... 62 5.2. Biological Relevance of the MRI traits ...... 65 5.3. Pathways Correlating with Features derived from WSIs ...... 69 5.4. Potential Future Directions ...... 71

Chapter 6 ...... 73

Conclusions and Contributions ...... 73

References ...... 75

Appendix A ...... 85 Formulas for RLM features and Haralick features ...... 85

List of Figures

Figure 2.1 Brain MRI scan before and after N3 correction ...... 18

Figure 2.2 The pipeline of pre-processing step ...... 20

Figure 2.3 Segmentation-based fractal texture analysis (SFTA) feature extraction. 22

Figure 2.4 Computing the run-length matrix (RLM)...... 23

Figure 2.5 Computing local binary patterns (LBP)...... 24

Figure 2.6 Computing histogram of oriented gradients (HOG) features...... 25

Figure 2.7 Computing Haralick features...... 26

Figure 3.1 Getting volumetrics data from raw MRI scans...... 39

Figure 3.2 The Kaplan-Meier survival curves demonstrating that different tumor compartment volumes stratify survivals...... 42

Figure 3.3 Molecular interaction subnetworks associated with enhancing tumor volume–derived survival groups...... 44

Figure 4.1 Tiling and artifacts removal of a whole slide image. (a): Original whole slide image...... 54

Figure 4.2 Segmentation process for MVP ...... 56

Figure 4.3 Distributions of MVP and necrosis proportion among patients...... 58

List of Tables

Table 2.1 Area under the ROC curve (AUC) for evaluating the performance of classifiers...... 32

Table 3.1. The 20 MISs generated by COSSY algorithm, 5 for each tumor subcompartment...... 47

Table 3.2 Top canonical pathways identified by IPA based on tumor subcompartment volumes...... 48

Table 3.3 Top perturbagens identified by Connectivity Map for each tumor subcompartment...... 50

Table 4.1 Pathways that have abstract processes significantly correlated with necrosis proportions...... 60

Table 4.2 Pathways that have abstract processes significantly correlated with MVP proportions...... 61

Introduction

1.1. Background

Medical imaging techniques such as CT and MRI have been used in initial diagnosis of cancer and monitoring therapeutic effects. [1-3] Due to their noninvasive nature, imaging studies are part of a routine clinical procedure in cancer treatment. The imaging techniques are great tools for quantitatively assessing the morphological and anatomical information of tumor, as well as for guiding surgical and radiation therapies.

However, the lack of molecular detail in radiographic imaging studies has limited its effectiveness in discovering biological correlates of tumor development and in identifying new possible therapy methods. [4]

Microarray technology, on the other hand, has the ability to measure all the genes in the whole genome simultaneously. [5, 6] This technology allows the researchers to investigate the interaction between individual biomolecules and their influence to

10

11 physiological processes in a far more efficient way than modifying and studying single genes in a mouse model. [7] With the microarray technology, the heterogeneity of gene expression profiles among different tumors can be accurately evaluated, which is useful in tumor subtyping and finding new prognosis predictors. Although the microarray measurement can give researchers a better understanding of the tumor, it requires tissue samples from biopsy or surgery, making it not practical for all patients.

Therefore, imaging-genomics, the emerging research area that studies the relationship between phenotypic traits derived from medical images and the molecular characteristics, has been a promising new direction of cancer research. [8] A typical imaging-genomic study attempts to understand how molecular correlates, such as gene expression or mutation, determine the imaging phenotypes, and to find imaging features as surrogates of molecular biomarkers which are relevant to clinical outcomes. [9]

The conventional approach of extracting phenotypic traits from medical images mostly relies on human expert readers. [10-12] Although there are guidelines designed for reproducibility of the features extracted by the human readers, manual image analysis still have some disadvantages compared to fully-automatic computer analysis.

Inefficiency of manual feature extraction, limited feature set according to the guidelines, and noise introduced by inter-rater variability are some common problems that researchers are facing when using manually extracted image features. [8]

With the recent development of computer vision algorithm, automatic feature extraction are increasingly adopted into radiogenomic researches. Some initial studies have shown that computer generated features are capable of correlating with molecular 12 information. [13, 14] There is still enormous potential in computer vision algorithms for tackling radiogenomic problems.

1.2. Motivation

In 2015, nearly 23,000 patients in the United States were newly diagnosed with brain cancer, and over 15,000 patients with brain cancer died. [15] Generally, among all malignant brain tumor cases, more than 46% of adult patients have glioblastoma (GBM).

[16] GBM is the most common and aggressive type of malignant brain tumor, with a median survival of only 12-15 months. [17]

Recent studies have identified several distinct molecular alterations in GBM. [18]

This underlying molecular heterogeneity suggests it necessary to develop different treatment strategies for different patients. However, currently the only way to determine a patient’s tumor subtype is to perform an invasive brain biopsy followed by mRNA profiling [19]. A brain tumor biopsy can be both costly and risky. [20, 21]

The limitations in chemotherapy is yet another problem besides the molecular heterogeneity assessment of tumor. Currently temozolomide is the only approved chemotherapy for GBM. [17] Although the biological characteristics of GBM have been better understood in the recent years, GBM’s complex phenotype and heterogeneity in gene expression are still challenges searching for deeper understanding of tumor biology and alternative strategies for therapeutic intervention.

Hence, the studies in this thesis aim to investigate: 13

i) The feasibility of using image features automatically derived from tumor

MRI scans for discrimination of molecular subtypes and 12-month overall

survival (OS) status in GBM. In particular, the performance of five

commonly used two-dimensional (2D) texture feature descriptors for their

potential value in these classification tasks;

ii) The genes and pathways that are biologically relevant to GBM, based on

the stratification of overall survival in GBM patients by the four

radiologically defined tumor compartment features derived automatically

from MRI scans;

iii) Fully automatically assessed GBM hallmarks in microscopic images and

their relevance to biological pathways. In other terms, the correlation

between the extent of necrosis/ microvascular proliferation and pathway

activities.

1.3. Thesis Layout

In Chapter 2 I discuss the method and experiment results in predicting GBM subtypes with MRI texture features. In Chapter 3 I demonstrate how genes and biological pathways are identified based on radiologically defined tumor compartment features from

MRI scans. In Chapter 4 I extract clinical hallmarks from whole slide tissue images and identify pathway activities correlated with them. Chapter 5 discusses the implications of the experiment results and possible future works. Chapter 6 states the conclusion of this thesis and my contributions.

14

1.4. Contributions

The contributions of each chapter is are listed for an overview as follows:

Chapter 2:

 We show that specific texture features extracted from MRI images can be

used to predict subtypes and overall survival of GBM patients.

 We demonstrate that the prediction performance depends on the

combination of MRI modalities and scan directions.

Chapter 3:

 We show that automatically generated tumor compartment features from

MRI scans are informative in finding GBM tumor pathways.

 We discuss the feasibility of drug discovery method by integrating image

features, gene expression data, and patient survival information.

Chapter 4:

 We designed the algorithm for comprehensively assessing GBM tumor

hallmarks in whole slide tissue images.

 We identified GBM-relevant biological pathways correlated with the

quantitative assessment of the tumor hallmarks.

15

Predicting Molecular Subtypes with MRI Texture Features

2.1. Introduction

As mentioned in Chapter 1, GBM can be classified by distinct molecular alterations. Specifically, four molecular subtypes of GBM have been identified: classical, mesenchymal, neural, and proneural, each representing distinct molecular alterations. The four subtypes are identified using consensus hierarchical clustering on a gene expression dataset of 202 patients and 1,740 genes. [18] The four subtypes have different gene signatures and clinical correlations and are found to require potentially different treatment strategy. For example, intensive therapy (concurrent chemotherapy and radiation) results in better survival outcomes in patients of classical subtype, but not in patients with the proneural subtype. [18] A more recent study show that patients with mesenchymal signatures have significantly extended survival after immunotherapy, but patients of the other three subtypes do not. [22] In addition, these subtypes are also associated with distinctly different survival outcomes (for example, the proneural subtype has a better overall survival than the other molecular subtypes). [18] Although it is 16 crucial to determine the molecular subtypes before proper treatment, a biopsy is required for this task, introducing extra cost and risk as well as delaying the decisions on the best treatment strategy.

Unlike the brain biopsies, MRI has been routinely used for noninvasive assessment of brain cancer. An imaging based characterization presents a variety of advantages [8], such as permitting interrogation of tumor heterogeneity characteristics and an assessment of tumor pathophysiology. More recently, it has also been suggested that GBMs harbor multiple cell subpopulations, [23] hence an imaging based examination can also provide an insight into the molecular programs driving the disease. Gutman et al. demonstrated that manually extracted visual features from MRI are associated with the GBM molecular profiles and survival duration. [11]

Computer-based evaluation techniques (image analysis and machine learning algorithms) have shown enormous potential in tackling detection and classification tasks on radiologic data. [24] Of the many image analysis techniques, computer derived texture features from MRI scans, a characterization of gray-level tumor heterogeneity, have been of particular interest in tissue and tumor segmentation tasks in GBM. [25] Other studies have indicated that MRI texture analysis can differentiate GBM from other types of brain tumors, such as low-grade gliomas or malignant glioneuronal tumors. [14, 26] Also, texture features and gray-level tumor heterogeneity in MRI have been studied for survival prediction in GBM patients. [27, 28] These studies all suggest that clinical aspects of

GBM can be characterized quite effectively by MRI-derived tumor texture features.

However, the relationship between any specific measure of gray-level tumor heterogeneity from tumor MRI scans and the underlying tumor biology (molecular status) 17 in GBM is still unclear. Some recent studies have suggested that texture features from diagnostic (MRI) images might have predictive value for the determination of tumor molecular phenotype (subtype) status in breast tumors. [29] Hence, we sought to investigate the feasibility of using texture features from tumor MRI scans for discrimination of molecular subtypes and 12-month overall survival (OS) status in GBM.

In particular, we evaluated the performance of five commonly used two-dimensional

(2D) texture feature descriptors for their potential value in these classification tasks. We also aim to compare the information presented in different image planes (axial, coronal, or sagittal) and different MRI sequences (postcontrast T1 weighted or T2 FLAIR).

2.2. Pre-processing of MRI scans

MRI scans were processed in three steps: nonparametric intensity nonuniformity normalization (N3-correction), tumor segmentation, and image reslicing.

N3-correction corrects the inhomogeneity in each set of MRI scans. [30] Such inhomogeneity, or shading artifacts, is due to intensity variations caused by inhomogeneous radio-frequency excitation or spatial variation in the reception sensitivity.

N3-correction corrects this inhomogeneity by iteratively estimating the spatial variation of sensitivity and tissue intensity distribution. [30] N3-correction was performed using the Medical Image Processing, Analysis, and Visualization software (National Institutes of Health, Bethesda, MD). [31] The left hand side of Figure 2.1 shows the inhomogeneity before correction and the right hand side shows the corrected image. 18

Figure 2.1 Brain MRI scan before and after N3 correction

To extract tumor image features, tumor segmentation was done using the Medical

Imaging Interaction Toolkit software (Heidelberg, Germany). [32] The segmentation was performed on a slice-by-slice basis separately on the post-contrast T1-weighted and the

T2 FLAIR sequences of each patient, yielding two 3D tumor volumes per patient. For post-contrast T1-weighted sequences, the contrast enhancing tumor was segmented. For 19

T2 FLAIR sequences, the solid tumor as well as infiltrating tumor and edema that is delineated by increased intensity were segmented. No image registration is done between the T1-post contrast and the T2-FLAIR image sequences – thus geometrical distortions between the two have not been considered.

The pixel spacing in each slice as well as the spacing between slices are not identical among the scans from different patients. To account for the unequal resolutions, image reslicing was performed (using trilinear interpolation) so that the voxel dimensions in the resulting three-dimensional (3D) image were isotropic. The reslicing step was performed in MATLAB using the NIfTI toolbox (MathWorks; Natick, MA).

Two-dimensional (2D) texture features were evaluated on the image slices with maximum tumor area from each of the image planes (axial, coronal, or sagittal) and for each MRI sequence (T1 post-contrast or T2 FLAIR). A rectangular tumor region (that encloses the segmented tumor area at that slice) was extracted, and image features were computed. An illustration of the entire pre-processing step is shown in Figure 2.2. 20

Figure 2.2 The pipeline of pre-processing step

2.3. Texture Feature Extraction

We extracted five sets of global image features: segmentation-based fractal texture, run length matrix, local binary patterns, histogram of oriented gradients, and

Haralick texture features. The detailed methods for deriving these features are described as follows, and the parameters we used for each type of features are summarized in

Appendix A.

2.3.1. Segmentation-based Fractal Texture Analysis (SFTA)

SFTA uses fractal dimension analysis to describe the complexity and texture features in the image. [33] The binary decomposition step in SFTA captures the regions with high intensity variations. The complexity of these regions is quantified by the 21

Hausdorff’s fractal dimension. The Hausdorff’s fractal dimension of a 2D tumor area is defined as

log N(ϵ) D0 = − lim logϵ N(ϵ) = lim ϵ→0 ϵ→0 log ϵ−1 where N(ϵ) is the number of ϵ×ϵ squares needed to cover the 2D area. [34] For digital images, the value of D0 can be approximated by the box-counting algorithm. [35]

To compute SFTA features, we first find 8 intensity thresholds (t1, t2, …, t8) by applying multi-level Otsu algorithm. [36] The multi-level Otsu algorithm determines the intensity thresholds so that the between-class variance is maximized. Then we decomposed a grayscale tumor image into 16 binary images using the two-threshold binary decomposition algorithm. Subsequently, these thresholds are applied to the grayscale image to generate binary maps. For each binary map, we computed three values as texture feature descriptors: fractal dimension, mean gray level, and area. These steps are illustrated in Figure 2.3.

22

Figure 2.3 Segmentation-based fractal texture analysis (SFTA) feature extraction. A tumor region of interest (ROI) is decomposed to 16 binary images by applying the two- threshold binary decomposition algorithm. For each binary image, three features are computed: fractal dimension, area, and the average graylevel of the corresponding pixels in the original image. The 48 features are combined to form the SFTA feature descriptor.

‘I’ refers to the graylevel representation of the image. 풕풏 and 풕풏+ퟏ represent the upper and lower limits of the 퐧퐭퐡 intensity bin.

2.3.2. Run-length matrix (RLM)

An RLM essentially describes the distribution of graylevel runs and captures coarseness in a grayscale image. [37] In an RLM, the entry (푖, 푗) is the number of gray level runs with gray level 푖 and run-length 푗. A gray level run is defined as a set of connected and collinear pixels that are of the same gray level. The length of the run is the number of pixels in that run.

For each tumor image, we constructed four RLMs, one for each direction of the run (0°, 45°, 90°, and 135°). Figure 2.4 shows how a typical RLM is constructed. For the

RLM in each of the four directions, we extracted 11 RLM texture features by weighting 23 and summing different terms in the RLM. [38, 39] To obtain rotational invariance we summed the texture features from all four directions. The formulas for computing 11

RLM features are listed in Appendix A.

Figure 2.4 Computing the run-length matrix (RLM). The original tumor ROI is reduced to a 4-bit grayscale image (16 different intensities). RLM is constructed by counting the number of runs with each gray level and run-lengths. Eleven features can be computed by weighting and summing different entries of the matrix.

2.3.3. Local binary patterns (LBP)

LBP describes the texture features of an image by comparing the intensities of neighboring pixels in small patches. [40] Consider the 3×3 patch around one pixel in a grayscale image. If the intensity of a surrounding pixel was greater than the center pixel, a 1 is recorded as the comparison result for that pixel; otherwise a 0 is recorded. Because there are eight surrounding pixels in a 3×3 patch, we arranged the comparison results into an 8-bit binary number, which was then converted to a decimal ranging from 0 to

255. These steps are summarized in 24

We obtained a binary pattern (a decimal) for every pixel in the tumor image. Then we computed the histogram of these binary patterns across all the pixels in the image.

The histogram had 256 bins, and the height in each bin was the frequency with which the binary pattern occurred in the image. After normalization, the histogram served as LBP texture features of the tumor image, quantifying the distribution of the 256 binary patterns within 3×3 patches.

Figure 2.5 Computing local binary patterns (LBP). In step 1, for each pixel in the tumor region of interest (ROI), a ퟑ×ퟑ patch covering the pixel is extracted. In step 2 we compare the pixel intensity of the surrounding pixels with the center pixel. The comparison results are either 1 (the surrounding pixels are more intense) or 0 (the surrounding pixels are less intense) and are recorded in a ퟑ×ퟑ table. In step 3 the results in the table form in the indicated direction a binary string, which is then converted to a decimal number. Eventually each pixel has a decimal number describing its binary pattern. We use the normalized histogram of these decimal numbers as the LBP texture features. 25

Figure 2.6 Computing histogram of oriented gradients (HOG) features. To ensure that the final HOG features had the same dimension, we resized all the original tumor regions of interest (ROIs) to ퟒퟎ×ퟒퟎ. Then the gradient map is computed and divided into 25 8×8 cells. In each cell we compute the histogram of the gradient intensities in each of the 9 directions. The histograms are illustrated by the short white lines in each cell. In each 2×2 blocks (such as the one in red full lines and the one in black dotted lines) the histograms are normalized via the L2-Hys scheme.

2.3.4. Histogram of oriented gradients (HOG)

HOG captures the distribution of intensity gradients in small regions in an image.

[41, 42] Similar to other texture descriptors, HOG is a representation of the neighborhood structure of pixel intensities.

To compute HOG features, we first computed the gradient of the image. Then we divided the gradient map into non-overlapping square patches (called cells). In each cell, we constructed a histogram in which the bin centers are directions (20°, 40°, … , 180°), and the heights are a combination of frequency and gradient intensity in that direction.

Next, the cells (and the histograms) were grouped together to form blocks for normalization. In general, each cell is 8×8 pixels. Each block consists of four cells (thus the size of each block is 16×16) and the stride (block overlap) is 8 pixels. The four 26 histograms within each block were normalized together with the L2-Hys scheme

(clipping the unit length vector then renormalizing). [43] The concepts of cells, blocks, and histograms of gradients are illustrated in.Figure 2.6.

2.3.5. Haralick texture features

Haralick texture features quantitatively describe the homogeneity, luminance, complexity, etc. of an image through 13 numerical texture features extracted from the gray-level co-occurrence matrix (GLCM). [44] The entry (푖, 푗) in a GLCM is the total number of pixel pairs that have a gray level 푖 and 푗 in the image in a specific direction and distance. The 13 features are derived from the GLCM.

Figure 2.7 Computing Haralick features. The tumor region of interest (ROI) was reduced to a 4-bit grayscale image (16 different intensities). The entry (i,j) of a gray-level co-occurrence matrix (GLCM) can be computed by counting the total number of pixel pairs that have a gray level i and j in the image at the specific distance. For each GLCM, 13 Haralick features can be computed by weighting and summing different entries of the matrix. Finally we found a total of 52 Haralick features for each tumor ROI (by concatenating the 13 features across different distances). 27

To obtain rotational invariance, we summed the GLCM over four canonical directions (0°, 45°, 90°, and 135°). We repeated the GLCM calculation with different pixel distance offsets to obtain both fine and coarse features. Figure 2.7 shows the four

GLCMs computed for a tumor region of interest (ROI).

2.4. Classification Algorithm

We used random forest (RF) classifiers to predict GBM molecular subtypes and to predict 12-month survival status (alive or dead at 12-month time point). [45] RF is an ensemble learning algorithm that combines the individual predictions of a collection of individual learners (decision trees) to predict a target class. Each decision tree in the RF is trained via bootstrap resampling of the data. The sampling is stratified by the class composition of the original data, thus providing tradeoff of true positive and true negative rates. RF has several advantages over other classifiers: it can handle high dimensional data (e.g., 576 HOG features in our study), including situations where the number of features is much larger than the number of cases, in addition to providing intrinsic cross- validation via an “out-of-bag” estimation of error rate. [45]

For each combination of feature set, image plane and MRI sequence, we build five individual RF binary classifiers, one for each of the five prediction tasks: to discriminate each subtype vs. the rest, as well as for 12-month survival status (alive or dead at 12-month time point). As the occurrence of the subtypes does not necessarily preclude the occurrence of other subtypes, [23, 46] independently assessing the biology of the tumor for each molecular subtype is useful. No variable selection was done for classifier construction, rather, the classifier was trained over all variables in a feature set 28

(e.g. all 48 SFTA features). The algorithm for building random forest models as well as

“out-of-bag” validation procedure is summarized as follows [47]:

1. 500 bootstrap samples (with replacement, of size twice that of the minority class) are drawn from the 82 patients.

2. A binary decision tree classifier is trained on each bootstrap sample. For each node of the tree, best split is chosen from √p randomly selected features (‘p’ represents the total number of features in the feature set). All the trees are grown until the terminal nodes are pure.

3. Each tree classifier predicts a label for data points not included in the bootstrap sample used to create that tree. This held-out data, also referred to as the ‘out- of-bag’ (OOB) data, is used for estimation of classifier performance. On average, each data point (patient) is out-of-bag for one-third of the 500 trees, on which the predictions are aggregated by majority voting. The overall performance of the random forest model is evaluated based on these OOB predictions.

We used the "randomForest" package in R to build random forest models. [47]

2.5. Experiments and Results

2.5.1. Dataset

82 patients (26 females, 56 males) with primary GBM, were identified from the

Cancer Genome Atlas (TCGA). These patients had routine MR imaging which includes both post-contrast T1-weighted and T2-weighted fluid-attenuated inversion recovery 29

(FLAIR) MRI images available in their medical records (data obtained from The Cancer

Image Archive). All MRI images were acquired prior to treatment. All patients received standard of care which includes surgical resection followed by concurrent chemoradiation. [17] This study was IRB-exempt under the TCGA data use agreement.

Molecular subtype labels and survival data for these patients were obtained from the cBioPortal, a public domain repository of genomic and clinical data from the TCGA patient cohort. [48]

Of the 82 GBM patients, 21 had the classical subtype, 29 had the mesenchymal subtype, 12 had the neural subtype, and 20 had the proneural subtype. We also dichotomized the overall survival duration data into long survival (Overall Survival> 12 months, n = 49) and short survival (OS< 12 months, n = 33) categories.

The MRI sequences used included axial post-contrast T1-weighted (TE/TR = 2.1–

20 ms/4.94-3285.62 ms; slice thickness = 1.4–5 mm; spacing between slices = 0.7–6.5 mm; matrix size = 256×256 or 512×512; and pixel spacing = 0.47–1.02 mm) and axial

T2-FLAIR (TE/TR = 14–155 ms/400–36737.38 ms; slice thickness = 2.5–5 mm; spacing between slices = 2.5–7.5 mm; matrix size = 256×256 or 512×512; and pixel spacing =

0.43–1.09 mm).

2.5.2. Prediction Performance Evaluation

The performance of the five classifiers was evaluated with receiver operating characteristic (ROC) curves. ROC curves show how the false-positive prediction rates

(FPR) vary with the true-positive prediction rates (TPR), i.e., sensitivity vs. 1-specificity.

For each of the four molecular subtype classification tasks, we define the patients of that 30 specific subtype as positive cases and the patients of other subtypes as negative cases. For classification of 12-month survival status, we define the patients who are alive at 12- month point as positive cases and those are deceased as negative cases. For each random forest model, the FPR and TPR are computed with the following formulas:

# of True Positive cases TPR = # of actual positive cases

# False Positive cases FPR = # of actual negative cases

The area under the ROC curve (AUC) quantifies the classification performance with a single number. The value of AUC ranges from 0 to 1; a high AUC indicates good performance. In addition to the AUC, the p-value of the AUC given by Mann-Whitney U test is also used to evaluate the classifier relative to random classification (AUC=0.5).

[49] These p-values are adjusted for multiple testing using Benjamini-Hochberg procedure to assess significance. [50] The intent of the Benjamini-Hochberg procedure is to control the false discovery rate (FDR), which in our case is the expected proportion of classification performance AUCs that are erroneously called significant. To perform

Benjamini-Hochberg correction, we first fix a user-defined level 훼 and rank the p-values in increasing order. Let 푀 denote the total number of p-values, 푖 denote the rank of the p- value 푝푖, and AUCi denote the AUC associated with 푝푖. Then AUCi is called significant if it satisfies the following:

훼 ⋅ 푗 푝 ≤ 푝 , 퐿 = max { 푗: 푝 < } 푖 퐿 푗 푀

This correction with Benjamini-Hochberg procedure has the property FDR ≤ 훼. 31

2.5.3. Results

For each patient, three 2D ROIs (from slices of maximum area in axial, coronal, and sagittal planes, respectively) were used for feature extraction. For each tumor ROI, we obtained 48 SFTA features, 44 RLM features, 576 HOG features, 256 LBP features, and 52 Haralick features.

As discussed above, we implemented five binary classification tasks (each subtype vs. rest, as well as for 12-month survival status) with random forest (RF) classifiers. The classification task was performed for each of the four molecular subtypes and for 12-month survival status using each of the five texture descriptors. Table 2.1 summarizes the AUC values of the five texture descriptors for all four subtypes and for

12-month survival status, with both post-contrast T1-weighted and T2-weighted FLAIR modalities across three different anatomic planes (axial, sagittal, and coronal). The lower and upper bound of 95% confidence intervals (CI) as well as the p-value of the AUC are also noted in Table 2.1. For example, the result in the fifth entry in the first column of

Table 2.1 refers to the performance of Haralick features derived from the axial slice of the post-contrast T1 weighted image to predict the classical subtype. The first number

0.72 refers to the AUC, the numbers in the parentheses (0.59, 0.82) refer to the lower and upper limits of a 95% CI. The last number 0.00 refers to the p-value of the AUC and the superscript “ a ” marked on the p-value indicates that the AUC was statistically significant after correction for multiple testing (Benjamini & Hochberg with 훼 < 0.05).

32

Table 2.1 Area under the ROC curve (AUC) for evaluating the performance of classifiers. The performance for classifying each of the four glioblastoma subtypes as well as 12-month survival status using five texture feature sets is shown. In each cell the four numbers are organized as “AUC value (95% confidence limits), p-value”. Each column corresponds to the performance of applying different texture feature sets on different scan modalities and anatomic planes. Abbreviations: T1 post, postcontrast T1- weighted image; T2 FLAIR, T2-weighted fluid-attenuated inversion recovery image; SFTA, segmentation-based fractal texture analysis; HOG, histogram of oriented 33 gradients; RLM, run-length matrix; LBP, local binary patterns; HARA, Haralick texture features.

Discovering Molecular Networks Underlying MRI Volumetric Features

3.1. Introduction

Determining the subtype of GBM patients can help identifying the treatment needed for individual patient and potentially extend patient survival. In Chapter 2 we show that the texture features derived from MRI scans have predictive value in discriminating GBM subtypes. However, the lack of therapeutic tools for GBM is still a burden in improving patient outcomes. Currently temozolomide is the only approved chemotherapy for GBM. [17] The heterogeneity of GBM tumors constitutes the central challenge to understanding GBM tumor biology and the development of novel personalized therapeutics for this disease. [51, 52] As radiographic imaging is able to characterize the whole tumor body, it provides a holistic way to study GBM heterogeneities.

34

35

Magnetic resonance imaging (MRI) is routinely used for non-invasive assessment of GBM. Recent radiogenomic studies have correlated image features extracted from

MRI scans with both prognostic information [53, 54] and genomic information of GBM.

[11, 55-58] In particular, these studies show that quantitative assessment of GBM subcompartment features, such as necrosis, enhancing tumor, and edema, are good correlates with genomic alterations and/or gene expression. [55, 57, 59-61] Tumor- associated edema identified on fluid-attenuated inversion recovery (FLAIR) MRI scans identified specific genomic correlates of cellular invasion in GBM. [12] A radiogenomic association map of MRI features, messenger RNA (mRNA) expression, and DNA copy number variation identified GBM biomarkers associated with each MRI feature. [62]

In the above mentioned studies, most genomic correlates are identified individually without taking into account the interactions between multiple genes.

Tumorigenesis is driven by mutations in multiple genes, with a single mutation being generally insufficient endow the transformed cell with all traits necessary for its unimpeded growth. Current models of tumor development implicate networks of genes responsible for resistance to apoptosis, replicative immortality, immune evasion, etc.

These hallmarks can develop in multiple ways, with different regulatory and signaling pathways being de-regulated or co-opted in different tumors (or even within the same tumor). [63] Thus, it is necessary to consider molecular interaction networks when correlating the genomic information with imaging features.

The molecular correlates identified by imaging genomic analysis not only can provide a better understanding of biology in GBM development, but also can serve as potential molecular targets of small molecule compounds in the process of drug 36 discovery. The Connectivity Map [64] database is a database consisting of gene expression profiles in human cancer cell lines induced by treatment with small molecule perturbagens in varying concentrations and for varying duration. This database identifies perturbagens that induce (upregulate) the under-expressed genes and repress

(downregulate) the overexpressed genes. The identified perturbagens are thus potential candidates for modulating disease-associated phenotypes.

In this study, we aim to relate MRI-derived subcompartment volumes with genomics information to identify underlying molecular networks. Further, we hypothesize that the disease phenotype-associated molecular networks can be used to prioritize potential drug options pertinent to induced gene signatures for potential therapeutic strategies in GBM.

3.2. Methods and Materials

3.2.1. Dataset

Genomic information and clinical data for GBM patients were obtained from The

Cancer Genome Atlas (TCGA). [65] This study is exempt from Institutional Review

Board approval under a TCGA data use agreement. Imaging data were downloaded from the Cancer Imaging Archive [66]. Thirty-eight patients (12 female, mean age 51.67 years;

26 male, mean age 59.84 years) who had primary GBM and received standard treatment

(surgical resection followed by concurrent chemoradiation [67]) were included in our study. Baseline pre-operative axial MRI scans across four modalities: T1-weighted, contrast-enhanced T1-weighted, T2-weighted, and FLAIR, were available for this cohort. 37

Pre-contrast T1-weighted scans were acquired with TE = 6.36–15 ms, TR =

416.7–3256.32 ms, slice thickness = 3–5 mm, spacing between slices = 3–6.5 mm, matrix size = 256×256 or 512×512, and pixel spacing = 0.43–0.86 mm. Post-contrast T1- weighted scans were acquired with TE = 6.36–20 ms, TR = 416.664-3256.24 ms, slice thickness = 2.5–5 mm, spacing between slices = 2.5–6.5 mm, matrix size = 256×256 or

512×512; pixel spacing = 0.43–0.86 mm. T2-weighted scans were acquired with TE =

20–110.32 ms, TR = 3000–5780.36 ms, slice thickness = 1.5–5 mm, spacing between slices = 1.5–7.5 mm, matrix size = 256×256 or 512×512, and pixel spacing = 0.43–1.02 mm. FLAIR scans were acquired with TE = 120.3–155 ms, TR = 9002–10004 ms, slice thickness = 2.5–5 mm, spacing between slices = 2.5–6.5 mm, matrix size = 256×256 or

512×512, and pixel spacing = 0.43–0.94 mm.

3.2.2. Automated Tumor Segmentation in MRI

MRI scans were segmented automatically by Brain Tumor Image Analysis

(BraTumIA) software (http://www.nitrc.org/projects/bratumia/). [68] The software first performed skull-stripping on each of the four modalities. This was done by first registering a brain atlas image to a patient image. The resulting affine transformation matrix was then applied to the brain mask of the atlas to generate the initial brain mask for the patient image. Finally the brain mask was refined by a level-set method. [69] The images were co-registered using standard rigid registration by optimizing mutual information. [26]

To segment the skull-stripped and co-registered images into tumor subcompartments, voxel-based classification was followed by spatial regularization. [26] 38

For classification, a total of 256 features were extracted voxel-by-voxel. These 256 features consisted of intensities, first-order textures, location features, symmetry features, and statistics on intensity gradients. Based on the 256 features of a voxel, random forest classification provided a probabilistic label posterior for the voxel, i.e., one of seven compartments: three brain tissue compartments (cerebrospinal fluid, white matter, gray matter) and four pathologic (tumor) tissue subcompartments (non-enhancing tumor, enhancing tumor, necrosis, and edema). Spatial regularization was formulated as an energy minimization problem, with the energy as the sum of two terms. The first term depends on the label posterior of the individual voxels, generated from the classification stage. The second energy term accounts for the information in the neighborhood of a voxel. This term penalizes different labels of neighboring voxels, controls smoothness based on local intensity variation, and imposes prior knowledge of tissue adjacency (e.g., low probability of adjacency between necrosis and healthy tissue). A tumor segmentation mask can be obtained by optimizing the summed energy term. The volumes of each subcompartment were computed according to the tumor mask. [68]

The volumes of the four pathologic subcompartments were divided by the total volume of the brain for normalization to determine the proportions of each subcompartment (i.e., proportion of edema, proportion of necrosis, etc.). The normalized volumes of the four tumor subcompartments were used in subsequent survival and pathway analyses. Figure 3.1 illustrates the steps to get a tumor mask and the corresponding normalized compartment volumes. 39

Figure 3.1 Getting volumetrics data from raw MRI scans. Firstly four sequences of MRI scans (T1, post-contrast T1, T2, and FLAIR) are obtained. Then automatic tumor segmentation is performed using BraTumIA software. Based on the segmentation, the volumes of four tumor subcompartments (necrosis, non-enhancing tumor, enhancing tumor, and edema) are computed.

3.2.3. Determination of Cutpoints on Tumor Subcompartment Volumetrics

Using survival data from the 38 patients, we estimated a cutpoint on each subcompartment volume (as described in the previous subsection) using the K-Adaptive

Partitioning Scheme (KAPS) [70], yielding two (cutpoint-defined) groups for each subcompartment. The cutpoint for each subcompartment was defined as the volume value that divided the patients into two groups with significantly different overall survival duration. The statistical significance of the survival difference between each two cutpoint-defined groups was estimated by the log-rank test. 40

3.2.4. Identification of Gene Networks using Context-Specific Subnetwork

Discovery

To discover indications of specific oncogenic or de-regulated gene networks, the

Context-Specific Subnetwork Discovery (COSSY) algorithm [71] to determine gene subnetworks that are altered in the induced survival groups for each subcompartment.

The COSSY analysis was performed for each subcompartment individually.

Molecular networks (from human signaling pathways in the Kyoto Encyclopedia of Genes and Genomes [KEGG]) were partitioned into molecular interaction subnetworks

(MISs) using the online implementation of the COSSY analysis, iCOSSY

(http://icossy.korea.ac.kr/). An MIS defines a community in the network consisting of only closely interacting molecular nodes. To extract the MISs from a network, COSSY first built a dendrogram using a hierarchical agglomerative algorithm. [72] The dendrogram was recursively split from the top until the derived communities were of appropriate size (less than 15 nodes). Finally, the nodes in the smaller communities (less than 5 nodes) were merged into the closest communities. [71]

For each MIS, the gene expression data for all 38 patients were clustered on the basis of differential gene expression, and an entropy score was computed for each MIS.

The entropy score is low when a cluster contained mostly one phenotype of samples and high when the cluster contained an equal number of positive and negative samples.

Finally, the MISs were sorted by their entropy scores, and the top five MISs were recorded for further analysis. 41

3.2.5. Pathway Identification using IPA

The genes in the iCOSSY-identified MISs for each subcompartment volume were subjected to Ingenuity Pathway Analysis (IPA) (http://www.ingenuity.com) to identify biological pathways relevant to each of GBM subcompartments, molecular interactions and biologically functional pathways for specified genes. Within IPA, the Ingenuity

Knowledge Base (Genes Only) was used to look for experimentally observed interactions. For each of the four tumor subcompartments, we recorded the top canonical pathways with the lowest p-values identified by IPA.

3.2.6. Identification of Perturbagens using Connectivity Map

The Connectivity Map (http://www.broadinstitute.org/cmap/) database was queried to find perturbagens with reference gene-expression profiles similar to the queried profile. The reference profile of a perturbagen is a rank-ordered list of genes based on its induced differential expression. For each tumor subcompartment, the query signature consisted of genes from the corresponding MISs (identified via COSSY). The genes in the query signature are divided into two groups based on whether it should be up-regulated (under-expressed in the low survival group) or down-regulated (over- expressed in the low survival group). Comparing the query signature and the reference profile, the similarity measure “connectivity score,” ranging from +1 to -1, was computed on the basis of a nonparametric, rank-based Kolmogorov-Smirnov statistical method. [64,

73] A positive connectivity score means under-expressed query genes tend to appear near the top of reference rank list and down-regulated query genes near the bottom, and vice versa for a negative connectivity score. 42

3.3. Results

3.3.1. Tumor Subcompartment Volumes Stratify OS in a Statistically Significant

Manner

KAPS revealed that the four tumor subcompartments stratify overall survival. For all the four tumor subcompartments, the overall survival of the two groups defined by cut point in the subcompartment volumes were significantly different (Figure 3.2). The cut points for necrosis, non-enhancing, enhancing, and edema subcompartments were

0.0635%, 0.168%, 3.25%, and 8.5% of the total brain volume, respectively. For the non- enhancing tumor, enhancing tumor, and edema subcompartments, the short-term survival group had larger volumes and the long-term survival group had smaller volumes (p =

0.02, p = 0.0025, and p = 0.0194, respectively). Conversely, for the necrosis subcompartment, the short-term survival group had smaller volumes and the long-term survival group had larger volumes (p = 0.0399).

Figure 3.2 The Kaplan-Meier survival curves demonstrating that different tumor compartment volumes stratify survivals. In all the four graphs, red dotted lines (G2) are the groups with smaller volumes in that specific radiologically defined tumor compartment (necrotic, non-enhancing, enhancing, or edema). 43

3.3.2. MIS Extracted by COSSY Algorithm Contains Closely Interacting

Molecular Nodes

From the TCGA gene expression data and the phenotype labels generated by survival-based cutpoint estimation, the COSSY algorithm extracted 20 MISs that correspond to the subcompartment–based survival groups, 5 for each subcompartment

(Table 3.1). Figure 3a shows one of the five MISs differentiate high volume vs. low volume of enhancing tumor (Figure 3.3a). There are 13 genes in this MIS: RASA1,

NCK2, ABL1; the ephrin-B receptor tyrosine kinases ligand family EFNB1-3; the EphB receptor tyrosine kinase transmembrane glycoproteins EPHB1,2,3,4,6; WAS, and WASL.

The other genes, RASA1, NCK2, ABL1, WAS, and WASL also belong to the ephrin signaling pathway, which is known to be involved in GBM development. An MIS that differentiates the volume of non-enhancing tumor has 13 genes: EFNA1-5 and EPHA1-8

(Figure 3.3b).

An MIS that differentiates high volume vs. low volume of edema is also obtained.

This MIS contains 10 genes PPP2R3C, PPP2R2A, PPP2R5A, PPP2R5B, PPP2R5D,

AKT1, MTCP1, PPP1R3F, BCL2L1, and TBC1D4 (Figure 3.3c). Another MIS differentiates high vs. low volume of necrosis (Figure 3.3d), and includes 21 genes mostly comprising elements of the Notch signaling pathway: NCSTN, APH1A, APH1B,

PSENEN, NUMB, NUMBL, JAG1,2, NOTCH1, LFNG, MFNG, RFNG, DLL1,3 and 4,

MIR326, DTX2, DTX3L, DTX1,3, and 4.

44

Figure 3.3 Molecular interaction subnetworks associated with enhancing tumor volume– derived survival groups. (a), (b), (c), and (d) are examples showing the MISs identified by the iCOSSY algorithm that differentiate volume of enhancing tumor, non-enhancing tumor, edema, and necrosis, respectively. The red molecules were over-expressed and the green were down-regulated in the short-term survival groups.

45

Table 3.1-a

MISs associated with edema GCK, PDX1, ONECUT1, NEUROD1, NKX2-2, PAX6, NEUROG3, MIS1 HHEX, MNX1, FOXA2, HNF1B, NR5A2, NKX6-1, IAPP, PAX4

PPP2R3C, PPP2R2A, PPP2R5A, PPP2R5B, PPP2R5D, AKT1, MTCP1, MIS2 PPP1R3F, BCL2L1, TBC1D4

PLCG1, PLCG2, PRKCA, PRKD1, TEC, ICAM1, SH3BP2, MIR30B, MIS3 MIR30C1, MIR30C2, MIR101-1, MIR101-2, SH2D2A, EZH2

PGM1, PGM2, HK1, HK3, HKDC1, SORD, KHK, PHPT1, MPI, MIS4 GNPDA1, GNPDA2, AMDHD2, GFPT1, GFPT2, GNPNAT1

MIS5 CHMP1B, VPS4A, VPS4B, CHMP5, CHMP2B, CHMP2A, VTA1

Table 3.1-b

MISs associated with enhancing tumor PABPC4L, PABPC5, PABPC1, PABPC1L2B, PABPC3, PABPC1L, MIS1 PABPC4, MSI2, DAZAP1, MSI1, PAN2, TOB1, TOB2, BTG3, BTG4, BTG1, BTG2, PAN3, CNOT7, CNOT8

PRKDC, TP53, TNFRSF10B, BBC3, TP73, RCHY1, MIR223, RPRM, GTSE1, PIDD, PMAIP1, EI24, TP53I3, SHISA5, PERP, ZMAT3, BAI1, MIS2 CD82, DDB2, SESN3, SESN1, SESN2, STEAP3, RFWD2, CCNG1, PPM1D, TBPL2, TBP, TBPL1

PGM1, PGM2, HK1, HK3, HKDC1, SORD, KHK, PHPT1, MPI, MIS3 GNPDA1, GNPDA2, AMDHD2, GFPT1, GFPT2, GNPNAT1

RASA1, NCK2, ABL1, EFNB1, EFNB2, EFNB3, EPHB1, EPHB2, MIS4 EPHB3, EPHB4, EPHB6, WAS, WASL

ACTG1, FLNB, FLNC, PARVB, PARVG, MYH15, ARPC3, ARPC1A, MIS5 ARPC5L, RDX, DMD, TMSB4X, TMSB4Y

46

Table 3.1-c

MISs associated with necrosis

PDE1A, ADK, GMPS, DCK, TAS1R2, GNAT3, TAS1R3, TAS1R1, TAS2R39, TAS2R40, TAS2R41, TAS2R43, TAS2R31, TAS2R46, MIS1 TAS2R30, TAS2R19, TAS2R20, TAS2R50, TAS2R60, TAS2R42, TAS2R3, TAS2R4, TAS2R16, TAS2R1, TAS2R9, TAS2R8, TAS2R7, TAS2R13, TAS2R10, TAS2R14, TAS2R5, TAS2R38

MIS2 BLM, RMI1, RMI2, TOP3A, TOP3B, RPA4, RPA1, RPA2, RPA3

PRKAG2, RHEB, DDIT4, TSC2, RRAGB, RRAGA, RRAGD, RRAGC, MIS3 RPTOR, TSC1

GNAL, OR5I1, OR52J3, OR52R1, OR4X2, OR5P2, OR8I2, OR2D2, OR56A4, OR2AP1, OR4K14, OR6P1, OR10Z1, OR6K6, OR9A4, OR9A2, OR13C8, OR1N2, OR1N1, OR52I2, OR6B3, OR56B4, OR5D14, OR8H1, OR5R1, OR5AR1, OR10S1, OR9Q1, OR9Q2, OR5A2, OR5A1, OR4D11, OR6C74, OR1L4, OR1A2, OR2K2, OR10J1, OR11A1, OR10H3, OR10H2, OR10H1, OR7C1, OR4F3, OR4D1, OR2V1, OR2H1, OR4C12, OR8D1, OR8D2, OR10A4, OR4N4, OR10H5, OR2L13, OR14A16, MIS4 OR52L1, OR6S1, OR1J1, OR52A5, OR51M1, OR51Q1, OR52D1, OR52N4, OR10A6, OR5W2, OR5T1, OR8K1, OR4D10, OR10V1, OR6X1, OR6C76, OR4K2, OR4K17, OR5AU1, OR2M7, OR2G6, OR2A25, OR13C2, OR1L6, OR2A5, OR51A4, OR2A42, OR2T27, OR6C68, OR4F21, OR5B3, OR4Q3, OR10J3, OR13G1, OR2B3, OR2C1, OR8U9, OR51B4, OR4K1, OR11H1, OR4F17, OR4K15, OR8J3, OR51G2, OR10W1, OR2G2, OR1A1, OR1E2

NCSTN, APH1A, APH1B, PSENEN, NUMB, NUMBL, JAG1, JAG2, MIS5 NOTCH1, LFNG, MFNG, RFNG, DLL3, DLL1, DLL4, MIR326, DTX2, DTX3L, DTX1, DTX3, DTX4

47

Table 3.1-d

MISs associated with non-enhancing tumor ALDH1A3, HAL, CNDP2, CARNS1, HDC, CNDP1, GAD1, GAD2, MIS1 WBSCR22, METTL6, HEMK1, METTL2B

CCL5, CCL3L3, CCL3, CCL4L2, CCL4, IL8, CCR9, CCR1, IRF5, EGR1, MIS2 U2AF1

NAGK, RENBP, HEXA, HEXB, CHIT1, CHIA, GNS, NAGA, A4GALT, MIS3 B3GALNT1, GBGT1

IL4, IFNG, IL5, IL13, TBX21, GATA3, MAF, NOS2, PSME3, PSME1, MIS4 PSME2

EFNA1, EFNA2, EFNA3, EFNA4, EFNA5, EPHA2, EPHA1, EPHA3, MIS5 EPHA4, EPHA5, EPHA7, EPHA8, EPHA6

Table 3.1. The 20 MISs generated by COSSY algorithm, 5 for each tumor subcompartment.

3.3.3. Ingenuity Pathway Analysis Identifies Phenotype Related Pathways

Containing MIS genes

Using IPA, biological pathways represented by the genes in the MISs found by the COSSY algorithm were identified. For each subcompartment, the top five canonical pathways that contribute to the differences in the phenotype were identified based on the p-values computed within IPA (Table 3.2). MIS genes associated with edema were implicated with NOS/ROS production in macrophages, p70S6K signaling, and UDP-N- acetyl-D-galactosamine biosynthesis. MIS genes associated with enhancing tumor were implicated with p53 signaling, integrin, axonal guidance, and ephrin receptor signaling, as well as actin nucleation by ARP-WASP. Necrosis was implicated with nucleotide excision repair, cell cycle control of chromosomal replication, amyloid processing, and 48

Notch signaling. Non-enhancing tumor volume was associated with hypercytokinemia/hyperchemokinemia, axonal guidance signaling, communication between innate and adaptive immune cells, and ephrin receptor signaling.

Tumor Top Canonical Pathways Found by IPA Subcompartment

Maturity onset diabetes of young (MODY) signaling GDP-glucose biosynthesis Production of nitric oxide and reactive oxygen species in Edema macrophages p70S6K signaling UDP-N-acetyl-D-galactosamine biosynthesis II

p53 signaling Actin nucleation by ARP-WASP complex Enhancing Integrin signaling Axonal guidance signaling Ephrin receptor signaling

Nucleotide excision repair pathway Cell cycle control of chromosomal replication Necrosis Amyloid processing Notch signaling Gustation pathway

Hypercytokinemia/hyperchemokinemia in the pathogenesis of influenza Axonal guidance signaling Non-enhancing Communication between innate and adaptive immune cells Ephrin receptor signaling Ephrin A signaling

Table 3.2 Top canonical pathways identified by IPA based on tumor subcompartment volumes. The pathways shown in italics have already been confirmed as biologically relevant to glioblastoma by previous studies. 49

3.3.4. Connectivity Map Identifies Perturbagens Reversing Phenotype

Associated Molecular signatures

Table 3.3 shows the perturbagens with high connectivity scores for reversing the gene expression profiles of the subcompartment-defined short-term survival groups. For each subcompartment, the perturbagens with the ten highest connectivity scores are listed. These are the perturbagens that most strongly induce the under-expressed genes and also repress the overexpressed genes. The perturbagens that have been studied as candidates for GBM treatment are indicated in italic type.

50

Edema Necrosis

Molecule Name Score Molecule Name Score arachidonyltrifluoromethane 1 H-89 1 mycophenolic acid 0.998 metergoline 0.997 bromocriptine 0.915 adrenosterone 0.994 dantrolene 0.9 mitoxantrone 0.99 procyclidine 0.895 cyclopenthiazide 0.977 ergocalciferol 0.888 clemizole 0.974 sirolimus 0.874 mebendazole 0.964 citiolone 0.867 cyclizine 0.952 bephenium hydroxynaphthoate 0.866 metampicillin 0.944 amoxapine 0.858 clomipramine 0.944

Enhancing Non-enhancing

Molecule Name Score Molecule Name Score irinotecan 1 raloxifene 1 hydralazine 0.988 sirolimus 0.854 benserazide 0.973 tribenoside 0.848 cefotaxime 0.97 haloperidol 0.819 metanephrine 0.94 alpha-yohimbine 0.807 Prestwick-682 0.934 tamoxifen 0.794 antazoline 0.929 wortmannin 0.787 nafcillin 0.916 genistein 0.776 anisomycin 0.916 sulfacetamide 0.769 lithocholic acid 0.915 loperamide 0.769

Table 3.3 Top perturbagens identified by Connectivity Map for each tumor subcompartment. The Connectivity Map analysis was based on the upregulation and downregulation of genes in the molecular interaction subnetworks that differentiated high volumes and low volumes in the four tumor subcompartments. The perturbagens shown in italics have been studied as potential treatments for GBM.

51

Identifying Molecular Programs Underlying Features of Whole Slide Tissue Images in Glioblastoma

4.1. Introduction

Although MRI is used for non-invasive diagnosis and assessment of GBM, its resolution limited the investigation of tumor microscopic structures. Instead, the Whole

Slide Tissue Images (WSI) from biopsy or surgery can provide detailed information regarding the growth of tumor. In WSIs, GBM is characterized by the presence of two main hallmarks: pseudopalisading necrosis and microvascular proliferation (MVP) [74].

Pseudopalisading necrosis is the areas of clear zone of necrotizing cells surrounded by accumulated tumor cells [75]. Necrosis forms due to the hypoxia caused by the lack of blood supply when tumor grows too fast. Cells in the hypoxia area cannot survive, thus forming an empty zone which surround by tumor cells. MVP is the proliferation of enlarged blood vessels in the tissue, typically in regions adjacent to necrosis. [75] MVP is developed as a result of increased demand of blood by tumor cells. The blood vessel in 52

MVP region form multiple endothelial layers. An illustration of necrosis and MVP is in

Figure 4.2(a) and Figure 4.2(e).

In WSI the image features used are mainly morphometric features of nuclei. The statistics of the morphometric features are shown to be predictive of GBM molecular subtypes [76, 77] and correlate with oligodendrocyte gene expression signature. [78]

Features from manual annotation on WSI were less used, as a full annotation of WSI is too time consuming and subject to inter- and intra- observer variations. [79] An attempt to use a feature set based on manually annotated necrosis and angiogenesis found it highly correlated with the expression of mesenchymal master regulators. [80]

In this study, we aim to first compute quantitative measures on the extent of GBM necrosis and MVP from whole slide tissue images, with a fully automatic method developed in previous work by our group. [81] We hypothesize that the image-derived features are correlated with biological pathway activities that are relevant to the development of glioblastoma.

4.2. Materials and Methods

4.2.1. Whole Slide Tissue Image Data

We identified 158 patients (62 females and 96 males) with primary GBM from the The Cancer Genome Atlas (TCGA). The median diagnosis age of the patients is 58 yr. This study was IRB-exempt under TCGA data use agreement. For these patients, 346 diagnostic whole slide tissue images (WSI) as well as clinical data were obtained directly from TCGA. 53

4.2.2. Pathway Activity Data

The PAthway Recognition Algorithm using Data Integration on Genomic Models

(PARADIGM) algorithm [82] can infer pathway activities levels by modeling the curated pathway interactions of genes. For the 158 patients we identified, we obtained pathway activity data generated by PARADIGM pathway analysis of the mRNA expression and copy number. [83] This dataset contains 677 abstract processes entities covering of 116 pathways that are in the NCI Pathway Interaction Database collection. For each abstract process, the dataset has a score representing the Inferred Pathway Activity (IPA).

4.2.3. Extracting Image Tiles and Removing Artifact

Algorithm for the extraction of necrosis and MVP were developed by us in prior work [81]. Because of the memory limits and the artifacts of the data, we first extracted small patches from the whole slide tissue images. At 0.5μm resolution, we generated image tiles of size 3000×3000 pixels with a stride size of 1500 pixels (50% overlapping).

As we need to count the whole tissue area, we did a full tiling instead of random tiling described in previous work. [81] For each extracted tile, we estimated the proportion of artifacts (glass, pen marks, and tissue folds) by comparing the brightness and color histograms. We discarded the tiles with 30% or more area that is considered as artifacts.

The tiling and artifact removal process is shown in Figure 1. We denote the total number

푡 6 of tiles in a whole slide image as 푇 and the area of each tile as 퐴0 (= 9×10 pixels), where 푡 = 1, 2, ⋯ , 푇. 54

Figure 4.1 Tiling and artifacts removal of a whole slide image. (a): Original whole slide image. The grid size is 3000×3000 pixels. The stride size (overlapping between tiles) is 1500 pixels; (b), (c), and (d): Tiles kept for further analysis containing microcellular proliferation, necrosis, or normal tumor region, respectively. (e), (f), and (g): Tiles to be discarded because of too much glass or pen marks.

4.2.4. Segmenting Microvascular Proliferation

On the extracted the tiles after artifacts removal, we applied the MVP segmentation algorithm described in our prior work [81] to estimate the MVP area proportion in each tile. A brief summary of the algorithm is as follows: 1) perform adaptive thresholding of the image followed by morphological transformation to identify candidate MVP regions; 2) filter the image by mean and median filter several times, and detect local minima in the resulting intensity map; 3) discard the candidate MVP regions which does not contain local minima; 4) extract geometric properties (areas, diameters, 55 major/minor axes) of the remaining candidate regions and validate these properties through a decision tree (based on domain knowledge of clinicians) to determine whether to confirm the candidate regions as MVP or discard them. These steps are illustrated in

Figure 2. We sum up the total number of pixels in the confirmed regions as the MVP area

푡 of the tile and denote as A푀, 푡 = 1, 2, ⋯ , 푇. 56

Figure 4.2 Segmentation process for MVP: a) a zoomed-in area showing MVP; b) candidate MVP region after thresholding and morphological transformations; c) filtered intensity maps and local minima; d) segmented MVP region after decision tree. Segmentation process for necrosis: e) a zoomed-in area showing necrosis; f) binary image of segmented cells; g) cell-count profile of the region; h) segmented necrosis region after decision tree.

4.2.5. Segmenting Necrosis

We also segmented the necrosis using the algorithm described in the previous paper [81]. The algorithm for necrosis segmentation has these major steps: 1) perform cell segmentation by intensity thresholding and morphological filtering; 2) create cell- 57 count profile by counting number of cells in small windows in the tile; 3) apply threshold to the cell-count profile and keep the regions with local minima as candidate necrosis regions; 4) use a decision tree (based on domain knowledge) on several properties (area, number of cells in center region, and number of cells in surrounding region) of candidate necrosis region to confirm if it is necrotic. This process is also summarized in Figure 2.

Again, the total number of pixels in confirmed necrotic regions is used as the necrotic

푡 area of the tile and denote as A푁, 푡 = 1, 2, ⋯ , 푇.

4.2.6. Correlating MVP and Necrosis Proportions with Pathway Activities

Using the area terms defined previously, MVP proportion is computed as:

푇 푡 ∑푡=1 퐴푀 푃푀푉푃 = 푇 푡 ∑푡=1 퐴0

And similarly, necrosis proportion is computed as:

푇 푡 ∑푡=1 퐴푀 푃푁푒푐푟표푠푖푠 = 푇 푡 ∑푡=1 퐴0

Then Spearman correlation coefficients between the 677 IPAs and the above feature proportions are then computed. The corresponding p-values were used to assess the significance of the correlation. The significance testing were corrected by Benjamini-

Hochberg method with a false discovery rate of 0.25. All the calculations are performed in MATLAB (MathWorks, Natick, MA). 58

4.3. Results

4.3.1. Distribution of the MVP and necrosis proportions

By analyzing the whole slide images we are able to compute 푃푁푒푐푟표푠푖푠 and 푃푀푉푃 for each patient. From all the 158 patients, the detected max, median, and min MVP proportions are 50.3%, 9.08%, and 0.0811%, respectively. The detected max, median, and min necrosis proportions are 59.1%, 13.9%, and 0.39%, respectively. For about 60% of the cases, their MVP or proportions are less than 10%. The histogram of the MVP proportions is shown in Figure 4.3(a) and the histogram of necrosis proportions is shown in Figure 4.3(b).

(a) (b)

Figure 4.3 Distributions of MVP and necrosis proportion among patients.

59

4.3.2. MVP and Necrosis Proportions are correlated with Abstract Processes of

Different Pathways

We computed the Spearman correlation between both MVP and necrosis proportions and the 677 PARADIGM entities activation values. After Benjamini-

Hochberg correction at a false discovery rate of 0.25, 25 abstract process activities in 7 pathways are significantly correlated with necrosis proportions, while 21 abstract process activities in 5 pathways are significantly correlated with MVP proportions. The abstract processes and corresponding pathways, as well as their correlation coefficients, are listed in Table 4.1 and Table 4.2.

60

Correlation Pathway Abstract Process Coefficient VEGFR1 specific signals VEGFR1 homodimer 0.216 endothelial cell proliferation -0.212 Class I PI3K signaling events mediated by Akt JNK cascade 0.248 Insulin-mediated glucose transport Insulin responsive Vesicles 0.206 BARD1 signaling events ubiquitination 0.252 mRNA polyadenylation -0.219 Signaling events mediated by HDAC Class II nuclear import -0.225 Glypican 1 network fibroblast growth factor receptor signaling pathway 0.241 cell growth 0.241 neuron differentiation 0.221 PLK1 signaling events segregation -0.270 regulation of centriole-centriole cohesion 0.265 positive regulation of microtubule depolymerization -0.247 microtubule cytoskeleton organization -0.241 spindle stabilization -0.238 G2 phase of mitotic cell cycle -0.236 spindle elongation -0.232 regulation of mitotic centrosome separation -0.232 Golgi organization -0.232 C13orf34 -0.232 interphase -0.231 mitosis -0.226 spindle assembly -0.226 metaphase plate congression -0.211 ubiquitin-dependent protein catabolic process 0.287

Table 4.1 Pathways that have abstract processes significantly correlated with necrosis proportions.

61

Correlation Pathway Abstract Process Coefficient Regulation of p38-alpha and p38-beta response to insulin stimulus 0.207 Signaling events mediated by HDAC Class II nuclear import -0.206 Glypican 1 network fibroblast growth factor receptor signaling pathway 0.283 cell growth 0.283 neuron differentiation 0.257 Arf1 pathway actin filament polymerization -0.216 EntrezGene:1313 -0.209 PLK1 signaling events chromosome segregation -0.314 mitosis -0.304 positive regulation of microtubule depolymerization -0.284 interphase -0.281 G2 phase of mitotic cell cycle -0.280 spindle elongation -0.271 regulation of mitotic centrosome separation -0.271 Golgi organization -0.271 C13orf34 -0.270 spindle stabilization -0.270 spindle assembly -0.264 metaphase plate congression -0.261 microtubule cytoskeleton organization -0.240 regulation of centriole-centriole cohesion 0.226

Table 4.2 Pathways that have abstract processes significantly correlated with MVP proportions.

62

Discussion and Future Directions

5.1. Performance of Predicting GBM Subtypes

Texture features are capable of characterizing gray-level heterogeneity in the images. As a first attempt to understand what specific aspects of gray-level tumor heterogeneity in MRI scans are relevant to GBM molecular subtype, we evaluated five texture feature types for their ability to discriminate molecular subtype and 12-month OS status. Our results show that tumor derived texture features were predictive of 12-month survival status and molecular subtypes. With the appropriate combination of texture feature set, image plane (axial, coronal, or sagittal) and MRI sequence, the area under

ROC curve (AUC) values for predicting different molecular subtypes and 12-month survival status are: 0.72 for classical (with Haralick features on T1 post-contrast axial scan), 0.70 for mesenchymal (with HOG features on T2 FLAIR axial scan), 0.75 for neural (with RLM features on T2 FLAIR axial scan), 0.82 for proneural (with SFTA 63

features on T1 post-contrast coronal scan), and 0.69 for 12-month survival status (with

SFTA features on T1 post-contrast coronal scan). No single image feature type gave the universally best performance in discriminating all four molecular subtypes or for predicting 12-month survival status, suggesting that different feature characteristics are relevant to different prediction tasks. Specific investigation into the biological correlates of such predictive texture features (within contrast enhancing or non-enhancing, infiltrating regions of the tumor) could yield insights into the underlying tumor biology and is a subject of future study. Such investigation is now feasible given the availability of companion genomics and proteomics data underlying large-scale efforts such as the

Cancer Genome Atlas (TCGA).

Overall, SFTA gave the best performance in terms of AUC averaged across all four molecular subtypes, two imaging modalities, and three image-planes. The average

AUC was 0.61. SFTA is especially good in discriminating proneural cases with post- contrast T1-weighted images in coronal and sagittal plane, with AUCs of 0.78 and 0.82, respectively.

For post-contrast T1-weighted and T2-weighted FLAIR scans of GBM, the variation in intensities within the tumor suggest different tumor subcompartments (for example, necrosis has lower intensities in postcontrast T1-weighted images and edema has higher intensities in T2-weighted FLAIR image). Thus the binary maps generated by the thresholding step in SFTA are likely to capture the boundaries between different tumor compartments. By computing the fractal dimension, SFTA can describe the complexity of the boundaries, which may be related to subtype status. 64

The other texture descriptors have different advantages. HOG is the only descriptor that effectively could predict the mesenchymal subtype (on an axial T2- weighted FLAIR scan). As with most descriptors that are based on gradient, HOG emphasizes the object’s overall shape in addition to the texture features. Haralick texture features had the best performance in predicting the classical subtype, with an average

AUC of 0.62. LBP was able to discriminate the neural and proneural subtypes in axial

T1-weighted contrast, which makes LBP useful when only axial postcontrast T1- weighted scans are available. These results indicates that combining different texture features might have predictive value for non-invasive characterization of molecular status.

Most texture features performed well in classifying the neural subtype, with an average AUC of 0.64. However, the texture descriptors perform relatively poorly

(average AUC 0.56) for discriminating the mesenchymal subtype.

Our results also suggest that image features derived from non-axial scans (coronal and sagittal) have predictive value for molecular characterization. Figure 4 show that when predicting subtypes, SFTA is better with sagittal scans (average AUC = 0.63) and coronal scans (average AUC = 0.61) than with axial scans (average AUC = 0.60).

Though these differences are not appreciable and likely due to random variation, a deeper examination of the systematic differences in performance due to different image planes is essential to validate these findings. This variation in predictive value with image plane also suggests an investigation of 3D tumor texture features for these prediction tasks and their value relative to 2D texture features alone. The comparison of 2D vs. 3D texture 65

features is also necessary to assess and quantify the trade-off between computation efficiency and loss of information between 2D and 3D image texture characterizations.

Although SFTA features were mostly predictive regardless of the orientation of the slice (coronal, sagittal, or axial), the average AUCs of the LBP and Haralick features from the sagittal and coronal scans were substantially lower than the AUC from the axial scan. The lower AUCs from the non-axial planes is likely an artifact of the redundancy introduced (from the interpolation procedure) during the reslicing process. The computation of Haralick and LBP features both involve a step that compares each pixel’s gray level with those of its immediate neighbors. This step can capture subtle intensity variations in the axial scan, but is confounded by information loss in the sagittal and coronal planes, thereby explaining the low AUCs. These results suggest that a systematic evaluation of these feature sets in a controlled validation cohort (with isotropic pixel resolution) is required to appropriately characterize the predictive value of texture features across different image planes.

5.2. Biological Relevance of the MRI traits

The results in Chapter 3 indicate that the four radiologically defined tumor compartments stratify survival, and the MISs identified that differentiate phenotypes defined by tumor volumetrics belong to pathways that are biologically relevant to GBM.

Further, some of the small molecule perturbagens identified by tumor compartment stratified survival are confirmed to have clinical potential in GBM. 66

The MISs identified differentiating phenotypes defined by tumor volumetrics belong to pathways that are biologically relevant to GBM. Further, some of the small molecule perturbagens identified by our analysis of tumor subcompartment–stratified survival duration have clinical potential in GBM [84-91]. One MIS (Figure 3c) includes several genes which encode regulatory subunits of protein phosphatase 2A (PP2A): PP2A is an important serine threonine kinase that controls cell growth and division and enhances GBM tumor stem cell survival under hypoxic environment. [92] The PP2A regulator genes as well as some other genes in this MIS are mostly associated with

P70S6K signaling pathway.

Of the 19 pathways identified by IPA, seven were have known functions in GBM.

One of these, the p70S6K signaling pathway, is one of the five top pathways that differentially activated in large versus small GBM edema volumes. Stimulation of surface receptor tyrosine kinases by ligands such as EGF and IGF [93], as well as mechanical stimulation [94] can induce phosphorylation cascades which activate ribosomal S6K.

Elevated S6 kinase activity results in increased protein synthesis and cell survival. [95,

96] One might speculate that physiological responses to edema might activate stress signaling resulting in S6K pathway activation. The activation of p70S6K can inhibit autophagic cell death in GBM [97] and studies exploring treatment possibilities of inhibiting p70S6K are already underway. [84, 98, 99]

IPA implicated that ephrin receptor canonical pathways differentiate large and small non-enhancing tumor volumes. Membrane-bound ephrin receptor tyrosine kinases facilitate direct cell-cell signaling (juxtacrine signaling) important in the developmental 67

processes of the central nervous system [100-102] and have known involvement in GBM.

[103-107] The binding of the ephrin-B ligands and the receptors plays a role in the development of nervous system and blood vessels. [108] The activation of EphA2 receptor by ephrin-A1 ligand can suppress tumor proliferation and migration. [103] In

GBM, the low level of ephrin-A1 leads to the lack of activation of EphA2 receptor, and results in overexpression. [107] EphrinA5, a member of the ephrin A subclass of the ephrin family, is under-expressed in GBM. EphrinA5 can suppress GBM tumor growth by downregulating EGFR, suggesting a possible approach for treatment. [106]

p53 signaling, integrin signaling, and the above-mentioned ephrin receptor signaling have known roles in GBM progression and are altered between large and small enhancing tumor volumes. The p53 transcription factor plays a central role in the cellular stress response and is important in regulating cell cycle arrest, apoptosis and senescence.

[109] Inactivation of the p53 pathway permits uncontrolled tumor cell division [110] and mutation of TP53 is common in primary GBM. [111] Integrin signaling regulates cell adhesion, migration and invasion in multiple biological contexts, including GBM.

[112] Recent studies have also shown that abnormal integrin signaling can interfere with p53 signaling and enhance resistance to treatment with temozolomide. [113]

The amyloid processing pathway and Notch signaling differentiate large and small necrosis volumes and have known known functions in GBM pathobiology. The amyloid β protein inhibits cell proliferation in human GBM U87 cells. [114, 115] The

Notch signaling pathway is frequently deregulated in GBM, and inhibition of this pathway suppresses cell growth and induces differentiation. [116] Also, the Notch 68

pathway can link angiogenesis with GBM stem cell self-renewal. [117] Though the relationships between several of these pathways and GBM have not been demonstrated hitherto, this data mining approach creates multiple hypotheses about phenotype- associated gene signatures that underlie GBM behavior.

The perturbagens we identified via Connectivity Map are of interest as well.

Sirolimus (rapamycin) is the one of the top 10 perturbagens for edema volume–associated genes. Sirolimus induces autophagy by inhibiting growth signaling by mTOR. Sirolimus has been shown to reduce tumor cell proliferation and to enhance cell death in primary

GBM cell lines in vitro and to double survival time in a GBM xenograft model. [118]

Several clinical trials have been conducted to test the effectiveness of sirolimus in combination with receptor tyrosine kinase inhibitors. [84-86]

H-89, mitoxantrone, and mebendazole are clinically relevant perturbagens that can possibly reverse necrosis volume–associated genes. H-89 is a protein kinase A inhibitor which suppresses EGFRvIII-induced tumor cell growth and migration in human

GBM cell lines. [87, 88] Mitoxantrone is an intercalating agent which induces DNA double-strand breaks and is effective in breast cancer and non-Hodgkin lymphoma. [119]

Other studies have shown that local delivery of mitoxantrone through a surgically created cavity can prolong survival of patients with recurrent GBM. [120, 121] Mebendazole, an antiparasitic drug, has been shown to be effective in a mouse GBM tumor model and in two preclinical models. [122, 123] Irinotecan and anisomycin are two perturbagens identified as potentially reversing expression of genes associated with enhancing tumor volume. Phase II trials have been conducted of irinotecan in combination with other 69

chemicals for patients with primary or recurrent GBM. [89, 90] Anisomycin can downregulate the PP2A catalytic subunit and induce cell death in human GBM cell lines

[124] and sensitizes GBM cells to death receptor–induced apoptosis. [91]

We identified three perturbagens, including sirolimus, tamoxifen, and genistein, as potentially reversing non-enhancing tumor volume–associated genes. Sirolimus was also identified for the edema volume–associated genes. High-dose tamoxifen has gone through a Phase II trial for the treatment of GBM. [125] Genistein has been proven to inhibit cell growth, proliferation, and angiogenesis in human GBM cell lines. [126, 127]

The eight clinically relevant perturbagens demonstrate the effectiveness of our approach to identifying perturbagens via analysis of tumor compartments and associated gene expression. This suggests that the other perturbagens identified in Table 2 could be potential candidates for GBM treatment, at least for initial in vitro assessment.

5.3. Pathways Correlating with Features derived from WSIs

In Chapter 4, we have identified nine pathways that are correlated with MVP and necrosis proportions. Five of the nine pathways have been shown to have impact on

GBM development.

Vascular endothelial growth factors (VEGFs) and their receptors are among the most important regulators of angiogenesis. [128] The additional expression of VEGFs is one of the reasons for the structurally and functionally abnormality of tumor vasculature in GBM. Thus, VEGF have been a main target for treating GBM in multiple studies and clinical trials. [129-131] 70

Pathways involving Phosphoinositide 3-kinase (PI3K) complex is a link between growth factors and cell functions such as proliferation, differentiation, and apoptosis.

Activated PI3K generates signals that are involved in increased cell growth and division.

[132] Increased activation of PI3K pathway is observed in most of the GBM cases by other studies. [111, 133]

ADP ribosylation factor (ARF) is one of the tumor suppressor . Typically,

ARF pathway is inactivated or mutated in glioblastoma tumors. [134] The deletion of the locus involving ARF is a key event in GBM development. [135]

Polo-like kinase 1 (PLK1) is involved in many key cell functions, including mitotic entry, cell cycle progression and cytokinesis. [136] In GBM tumors, PLK1 level is elevated compared to normal tissue. [137] The inhibition of PLK1 can restricts the growth of brain cancer cells. Thus, PLK1 has been researched as are target for GBM treatment. [137]

Regarding the feature extraction, our method relies on the domain expertise of clinicians, which means there is subjectivity in the model. The better approach for generating the model is to automatically learn the decision parameters from the image, rather than manually assign them. However, our method is still the first to comprehensively assess the MVP and necrosis in WSIs for the entire image. We have shown that this quantitative assessment has value in revealing the tumor biology. 71

5.4. Potential Future Directions

Clinical utility requires the classifiers to have strong sensitivity and specificity characteristics. The most practical sensitivity-specificity combination should be determined by the specific application scenario. These measures demonstrate how the classifier tradeoff between sensitivity and specificity, while the AUCs characterize the classifier’s overall performance.

We also note that other factors, such as MR hardware (coil pattern) and scanning parameters (TE, TR, etc.), will also have a direct impact on the image intensity histogram. A future texture analysis study controlling these factors will lead to potential ways of improving prediction performance.

Some of these texture features can be generalized to compute on the irregular shaped ROIs instead of the current rectangular ones. The classification performance with irregular shaped ROIs can be different and is worth inspecting. It also remains to be determined how the variation in inter-rater tumor segmentation affects the prediction performance of texture features. A detailed study that examines this variation across multiple human segmentations is an essential next step.

When identifying the molecular networks underlying GBM tumor volumetrics, we do not have enough samples to validate our results separately. An essential next step is to perform the analysis on a validation cohort with matched clinical characteristics and corresponding gene expresion data, so that the survival cutoffs and the molecular programs underlying them can be validated. 72

We note that in our image dataset, the MRI acquisition protocols vary between patients because the TCIA data is multi-site. This may affect the accuracy of the automatic tumor segmentation and volumetrics results given by BraTumIA. A detailed study on a controlled patient cohort is essential to determine how the MRI acquisition protocol affect the tumor subcompartment volumes determined from the scans.

For the WSI images, we were advised to manually exclude the patients that are heavily overstained and understained, as the staining will affect the results. A more robust staining correction method needs to be developed.

In the feature extraction process of WSI images we study uses the clinically established features (MVP and necrosis) in the images. However, there can be information from the WSI images that are not previously captured by human eyes. Recent developments in the field of Deep Learning can offer some promising methods on automatically learn features from the images. 73

Conclusions

In the project described in Chapter 2, we evaluated the performance of different texture features for predicting GBM molecular subtypes and 12 month survival status.

The results show that several texture feature types were predictive of survival and molecular subtype. Our results suggest that different combinations of image features, MR sequence and image plane are predictive for different tasks.

In Chapter 3, our study indicate that radiologically-defined tumor compartments are correlated with molecular pathways that are biologically relevant to GBM. Further, small molecule perturbagens identified by molecular correlates of survival-associated tumor subcompartments may have the potential to identify novel therapeutic options in

GBM.

In Chapter 4, our study shows that two important clinical hallmarks of GBM –

MVP and necrosis – can be automatically assessed in a comprehensive manner. This assessment can lead to discoveries of key GBM pathways by correlating the features with pathway activities. 74

75

References

[1] M. Desmeules, T. Mikkelsen, and Y. Mao, "Increasing incidence of primary malignant brain tumors: influence of diagnostic methods," J Natl Cancer Inst, vol. 84, pp. 442-5, Mar 18 1992. [2] W. Schima, "MRI of the pancreas: tumours and tumour-simulating processes," Cancer Imaging, vol. 6, pp. 199-203, 2006. [3] M. Kriege, C. T. Brekelmans, C. Boetes, P. E. Besnard, H. M. Zonderland, I. M. Obdeijn, et al., "Efficacy of MRI and mammography for breast-cancer screening in women with a familial or genetic predisposition," N Engl J Med, vol. 351, pp. 427-37, Jul 29 2004. [4] A. M. Rutman and M. D. Kuo, "Radiogenomics: creating a link between molecular diagnostics and diagnostic imaging," Eur J Radiol, vol. 70, pp. 232-41, May 2009. [5] S. P. Fodor, R. P. Rava, X. C. Huang, A. C. Pease, C. P. Holmes, and C. L. Adams, "Multiplexed biochemical assays with biological chips," Nature, vol. 364, pp. 555-6, Aug 5 1993. [6] M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, "Quantitative monitoring of gene expression patterns with a complementary DNA microarray," Science, vol. 270, pp. 467-70, Oct 20 1995. [7] A. Hsiao and M. D. Kuo, "High-throughput biology in the postgenomic era," J Vasc Interv Radiol, vol. 17, pp. 1077-85, Jul 2006. [8] M. A. Mazurowski, "Radiogenomics: what it is and why it is important," J Am Coll Radiol, vol. 12, pp. 862-6, Aug 2015. [9] M. D. Kuo and N. Jamshidi, "Behind the numbers: Decoding molecular phenotypes with radiogenomics--guiding principles and technical considerations," Radiology, vol. 270, pp. 320-5, Feb 2014. [10] C. A. Karlo, P. L. Di Paolo, J. Chaim, A. A. Hakimi, I. Ostrovnaya, P. Russo, et al., "Radiogenomics of clear cell renal cell carcinoma: associations between CT imaging features and mutations," Radiology, vol. 270, pp. 464-471, 2014. [11] D. A. Gutman, L. A. Cooper, S. N. Hwang, C. A. Holder, J. Gao, T. D. Aurora, et al., "MR imaging predictors of molecular profile and survival: multi-institutional study of the TCGA glioblastoma data set," Radiology, vol. 267, pp. 560-9, May 2013. [12] P. O. Zinn, B. Mahajan, P. Sathyan, S. K. Singh, S. Majumder, F. A. Jolesz, et al., "Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme," PLoS One, vol. 6, p. e25451, 2011. [13] P. A. Eliat, D. Olivie, S. Saikali, B. Carsin, H. Saint-Jalmes, and J. D. de Certaines, "Can dynamic contrast-enhanced magnetic resonance imaging combined with texture analysis differentiate malignant glioneuronal tumors from other glioblastoma?," Neurol Res Int, vol. 2012, p. 195176, 2012. [14] E. I. Zacharaki, S. Wang, S. Chawla, D. Soo Yoo, R. Wolf, E. R. Melhem, et al., "Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme," Magn Reson Med, vol. 62, pp. 1609-18, Dec 2009. 76

[15] R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer statistics, 2015," CA Cancer J Clin, vol. 65, pp. 5-29, Jan-Feb 2015. [16] Q. T. Ostrom, H. Gittleman, J. Fulop, M. Liu, R. Blanda, C. Kromer, et al., "CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2008-2012," Neuro Oncol, vol. 17 Suppl 4, pp. iv1-iv62, Oct 2015. [17] R. Stupp, W. P. Mason, M. J. van den Bent, M. Weller, B. Fisher, M. J. Taphoorn, et al., "Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma," N Engl J Med, vol. 352, pp. 987-96, Mar 10 2005. [18] R. G. Verhaak, K. A. Hoadley, E. Purdom, V. Wang, Y. Qi, M. D. Wilkerson, et al., "Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1," Cancer Cell, vol. 17, pp. 98-110, Jan 19 2010. [19] G. P. Dunn, M. L. Rinne, J. Wykosky, G. Genovese, S. N. Quayle, I. F. Dunn, et al., "Emerging insights into the molecular and cellular basis of glioblastoma," Genes Dev, vol. 26, pp. 756-84, Apr 15 2012. [20] J. Vaquero, R. Martinez, and M. Manrique, "Stereotactic biopsy for brain tumors: is it always necessary?," Surg Neurol, vol. 53, pp. 432-7; discussion 437-8, May 2000. [21] J. Yuen, C. X. Zhu, D. T. Chan, R. Y. Ng, W. Nia, W. S. Poon, et al., "A sequential comparison on the risk of haemorrhage with different sizes of biopsy needles for stereotactic brain biopsy," Stereotact Funct Neurosurg, vol. 92, pp. 160-9, 2014. [22] R. M. Prins, H. Soto, V. Konkankit, S. K. Odesa, A. Eskin, W. H. Yong, et al., "Gene expression profile correlates with T-cell infiltration and relative survival in glioblastoma patients vaccinated with dendritic cell immunotherapy," Clin Cancer Res, vol. 17, pp. 1603-15, Mar 15 2011. [23] A. P. Patel, I. Tirosh, J. J. Trombetta, A. K. Shalek, S. M. Gillespie, H. Wakimoto, et al., "Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma," Science, vol. 344, pp. 1396-401, Jun 20 2014. [24] P. Sajda, "Machine learning for detection and diagnosis of disease," Annu Rev Biomed Eng, vol. 8, pp. 537-65, 2006. [25] D. Assefa, H. Keller, C. Menard, N. Laperriere, R. J. Ferrari, and I. Yeung, "Robust texture features for response monitoring of glioblastoma multiforme on T1-weighted and T2-FLAIR MR images: a preliminary investigation in terms of identification and segmentation," Med Phys, vol. 37, pp. 1722-36, Apr 2010. [26] S. Bauer, H. Lu, C. P. May, L.-P. Nolte, P. Büchler, and M. Reyes, "Integrated segmentation of brain tumor images for radiotherapy and neurosurgery International Journal of Imaging Systems and Technology Volume 23, Issue 1," International Journal of Imaging Systems and Technology, vol. 23, pp. 59-63, 01 2013. [27] M. Heydari and University of Alberta. Dept. of Computing Science. (2009). Prognosis of Glioblastoma multiforme using textural properties on MRI. Available: http://hdl.handle.net/10048/799 77

[28] M. Zhou, L. Hall, D. Goldgof, R. Russo, Y. Balagurunathan, R. Gillies, et al., "Radiologically defined ecological dynamics and clinical outcomes in glioblastoma multiforme: preliminary results," Transl Oncol, vol. 7, pp. 5-13, Feb 2014. [29] E. J. Sutton, J. H. Oh, B. Z. Dashevsky, H. Veeraraghavan, A. P. Apte, S. B. Thakur, et al., "Breast cancer subtype intertumor heterogeneity: MRI-based features predict results of a genomic assay," J Magn Reson Imaging, vol. 42, pp. 1398-406, Nov 2015. [30] J. G. Sled, A. P. Zijdenbos, and A. C. Evans, "A nonparametric method for automatic correction of intensity nonuniformity in MRI data," IEEE Trans Med Imaging, vol. 17, pp. 87-97, Feb 1998. [31] M. J. McAuliffe, F. M. Lalonde, D. McGarry, W. Gandler, K. Csaky, and B. L. Trus, "Medical Image Processing, Analysis and Visualization in clinical research," in Computer-Based Medical Systems, 2001. CBMS 2001. Proceedings. 14th IEEE Symposium on, 2001, pp. 381-386. [32] I. Wolf, M. Vetter, I. Wegner, T. Bottger, M. Nolden, M. Schobinger, et al., "The medical imaging interaction toolkit," Med Image Anal, vol. 9, pp. 594-604, Dec 2005. [33] A. F. Costa, G. Humpire-Mamani, and A. J. M. Traina, "An Efficient Algorithm for Fractal Analysis of Textures," in Graphics, Patterns and Images (SIBGRAPI), 2012 25th SIBGRAPI Conference on, 2012, pp. 39-46. [34] M. R. Schroeder, Fractals, chaos, power laws : minutes from an infinite paradise, Dover ed. Mineola, N.Y.: Dover Publications, 2009. [35] R. M. Rangayyan, F. Oloumi, and T. M. Nguyen, "Fractal analysis of contours of breast masses in mammograms via the power spectra of their signatures," Conf Proc IEEE Eng Med Biol Soc, vol. 2010, pp. 6737-40, 2010. [36] P.-S. Liao, T.-S. Chen, and P.-C. Chung, "A fast algorithm for multilevel thresholding," J. Inf. Sci. Eng., vol. 17, pp. 713-727, 2001. [37] M. M. Galloway, "Texture analysis using gray level run lengths," Computer graphics and image processing, vol. 4, pp. 172-179, 1975. [38] A. Chu, C. M. Sehgal, and J. F. Greenleaf, "Use of gray value distribution of run lengths for texture analysis," Pattern Recognition Letters, vol. 11, pp. 415-419, 1990. [39] X. Tang, "Texture information in run-length matrices," Image Processing, IEEE Transactions on, vol. 7, pp. 1602-1609, 1998. [40] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, pp. 971-987, 2002. [41] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 2005, pp. 886-893. [42] H. A. Rashwan, M. A. Mohamed, M. A. García, B. Mertsching, and D. Puig, "Illumination robust optical flow model based on histogram of oriented gradients," in Pattern recognition, ed: Springer, 2013, pp. 354-363. 78

[43] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International journal of computer vision, vol. 60, pp. 91-110, 2004. [44] R. M. Haralick, K. Shanmugam, and I. H. Dinstein, "Textural features for image classification," Systems, Man and Cybernetics, IEEE Transactions on, pp. 610- 621, 1973. [45] L. Breiman, "Random forests," Machine learning, vol. 45, pp. 5-32, 2001. [46] M. Gerlinger, A. J. Rowan, S. Horswell, J. Larkin, D. Endesfelder, E. Gronroos, et al., "Intratumor heterogeneity and branched evolution revealed by multiregion sequencing," New England Journal of Medicine, vol. 366, pp. 883-892, 2012. [47] A. Liaw and M. Wiener, "Classification and regression by randomForest," R news, vol. 2, pp. 18-22, 2002. [48] E. Cerami, J. Gao, U. Dogrusoz, B. E. Gross, S. O. Sumer, B. A. Aksoy, et al., "The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data," Cancer discovery, vol. 2, pp. 401-404, 2012. [49] X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez, et al., "pROC: an open-source package for R and S+ to analyze and compare ROC curves," BMC bioinformatics, vol. 12, p. 77, 2011. [50] Y. Benjamini and Y. Hochberg, "Controlling the false discovery rate: a practical and powerful approach to multiple testing," Journal of the Royal Statistical Society. Series B (Methodological), pp. 289-300, 1995. [51] M. K. Nicholas, R. V. Lukas, S. Chmura, B. Yamini, M. Lesniak, and P. Pytel, "Molecular heterogeneity in glioblastoma: therapeutic opportunities and challenges," Semin Oncol, vol. 38, pp. 243-53, Apr 2011. [52] J.-J. Zhu and E. T Wong, "Personalized medicine for glioblastoma: current challenges and future opportunities," Current molecular medicine, vol. 13, pp. 358-367, 2013. [53] W. B. Pope, J. Sayre, A. Perlina, J. P. Villablanca, P. S. Mischel, and T. F. Cloughesy, "MR imaging correlates of survival in patients with high-grade gliomas," AJNR Am J Neuroradiol, vol. 26, pp. 2466-74, Nov-Dec 2005. [54] M. A. Mazurowski, A. Desjardins, and J. M. Malof, "Imaging descriptors improve the predictive power of survival models for glioblastoma patients," Neuro Oncol, vol. 15, pp. 1389-94, Oct 2013. [55] P. O. Zinn, B. Mahajan, B. Majadan, P. Sathyan, S. K. Singh, S. Majumder, et al., "Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme," PLoS One, vol. 6, p. e25451, 2011. [56] R. R. Colen, M. Vangel, J. Wang, D. A. Gutman, S. N. Hwang, M. Wintermark, et al., "Imaging genomic mapping of an invasive MRI phenotype predicts patient outcome and metabolic dysfunction: a TCGA glioma phenotype research group project," BMC Med Genomics, vol. 7, p. 30, 2014. [57] B. M. Ellingson, "Radiogenomics and imaging phenotypes in glioblastoma: novel observations and correlation with molecular characteristics," Curr Neurol Neurosci Rep, vol. 15, p. 506, Jan 2015. [58] J. A. Carrillo, A. Lai, P. L. Nghiemphu, H. J. Kim, H. S. Phillips, S. Kharbanda, et al., "Relationship between tumor enhancement, edema, IDH1 mutational status, 79

MGMT promoter methylation, and survival in glioblastoma," AJNR Am J Neuroradiol, vol. 33, pp. 1349-55, Aug 2012. [59] H. X. Bai, A. M. Lee, L. Yang, P. Zhang, C. Davatzikos, J. M. Maris, et al., "Imaging genomics in cancer research: limitations and promises," Br J Radiol, vol. 89, p. 20151030, May 2016. [60] R. Stoyanova, M. Tachar, N. Erho, C. Lynne, S. Abraham, M. Patel, et al., "Radiogenomics of MRI-Guided Prostate Cancer Biopsy Habitats," Medical Physics, vol. 42, pp. 3605-3605, Jun 2015. [61] S. Yamamoto, D. D. Maki, R. L. Korn, and M. D. Kuo, "Radiogenomic Analysis of Breast Cancer Using MRI: A Preliminary Study to Define the Landscape," American Journal of Roentgenology, vol. 199, pp. 654-663, Sep 2012. [62] N. Jamshidi, M. Diehn, M. Bredel, and M. D. Kuo, "Illuminating radiogenomic characteristics of glioblastoma multiforme through integration of MR imaging, messenger RNA expression, and DNA copy number variation," Radiology, vol. 270, pp. 1-2, Jan 2014. [63] D. Hanahan and R. A. Weinberg, "Hallmarks of Cancer: The Next Generation," Cell, vol. 144, pp. 646-674, Mar 2011. [64] J. Lamb, E. D. Crawford, D. Peck, J. W. Modell, I. C. Blat, M. J. Wrobel, et al., "The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease," Science, vol. 313, pp. 1929-35, Sep 29 2006. [65] N. Cancer Genome Atlas Research, "Comprehensive genomic characterization defines human glioblastoma genes and core pathways," Nature, vol. 455, pp. 1061-8, Oct 23 2008. [66] "The Cancer Imaging Archive," https://public.cancerimagingarchive.net/ncia/, 2016. [67] R. Stupp, M. E. Hegi, W. P. Mason, M. J. van den Bent, M. J. Taphoorn, R. C. Janzer, et al., "Effects of radiotherapy with concomitant and adjuvant temozolomide versus radiotherapy alone on survival in glioblastoma in a randomised phase III study: 5-year analysis of the EORTC-NCIC trial," Lancet Oncol, vol. 10, pp. 459-66, May 2009. [68] N. Porz, S. Bauer, A. Pica, P. Schucht, J. Beck, R. K. Verma, et al., "Multi-modal glioblastoma segmentation: man versus machine," PLoS One, vol. 9, p. e96873, 2014. [69] S. Bauer, T. Fejes, and M. Reyes, "A skull-stripping filter for ITK," Insight Journal, 2012. [70] S.-H. Eo, S.-M. Hong, and H. Cho, "K-adaptive partitioning for survival data: The Kaps add-on package for R," arXiv preprint arXiv:1306.4615, 2013. [71] A. Saha, A. C. Tan, and J. Kang, "Automatic context-specific subnetwork discovery from large interaction networks," PLoS One, vol. 9, p. e84227, 2014. [72] A. Clauset, M. E. Newman, and C. Moore, "Finding community structure in very large networks," Physical review E, vol. 70, p. 066111, 2004. [73] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, et al., "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles," Proceedings of the National 80

Academy of Sciences of the United States of America, vol. 102, pp. 15545-15550, 2005. [74] P. C. Burger, BW, and S. Vogel F, "Surgical pathology of the nervous system and its coverings," 1991. [75] Y. Rong, D. L. Durden, E. G. Van Meir, and D. J. Brat, "‘Pseudopalisading’necrosis in glioblastoma: a familiar morphologic feature that links vascular pathology, hypoxia, and angiogenesis," Journal of Neuropathology & Experimental Neurology, vol. 65, pp. 529-539, 2006. [76] L. A. Cooper, J. Kong, D. A. Gutman, F. Wang, S. R. Cholleti, T. C. Pan, et al., "An integrative approach for in silico glioma research," IEEE Transactions on Biomedical Engineering, vol. 57, pp. 2617-2621, 2010. [77] L. A. Cooper, J. Kong, F. Wang, T. Kurc, C. S. Moreno, D. J. Brat, et al., "Morphological signatures and genomic correlates in glioblastoma," in 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2011, pp. 1624-1627. [78] J. Kong, L. A. Cooper, F. Wang, J. Gao, G. Teodoro, L. Scarpace, et al., "Machine-based morphologic analysis of glioblastoma using whole-slide pathology images uncovers clinically relevant molecular correlates," PloS one, vol. 8, p. e81049, 2013. [79] M. J. van den Bent, "Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician’s perspective," Acta neuropathologica, vol. 120, pp. 297-304, 2010. [80] L. A. Cooper, J. Kong, D. A. Gutman, W. D. Dunn, M. Nalisnik, and D. J. Brat, "Novel genotype-phenotype associations in human cancers enabled by advanced molecular platforms and computational analysis of whole slide images," Laboratory investigation, vol. 95, pp. 366-376, 2015. [81] H. S. Mousavi, V. Monga, G. Rao, and A. U. Rao, "Automated discrimination of lower and higher grade gliomas based on histopathological image analysis," Journal of pathology informatics, vol. 6, p. 15, 2015. [82] C. J. Vaske, S. C. Benz, J. Z. Sanborn, D. Earl, C. Szeto, J. Zhu, et al., "Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM," Bioinformatics, vol. 26, pp. i237-i245, 2010. [83] "Broad Institute TCGA Genome Data Analysis Center (2016): PARADIGM pathway analysis of mRNA expression and copy number data," B. I. o. M. a. Harvard, Ed., ed, 2016. [84] D. A. Reardon, A. Desjardins, J. J. Vredenburgh, S. Gururangan, A. H. Friedman, J. E. Herndon II, et al., "Phase 2 trial of erlotinib plus sirolimus in adults with recurrent glioblastoma," Journal of neuro-oncology, vol. 96, pp. 219-230, 2010. [85] M. G. Chheda, P. Y. Wen, F. H. Hochberg, A. S. Chi, J. Drappatz, A. F. Eichler, et al., "Vandetanib plus sirolimus in adults with recurrent glioblastoma: results of a phase I and dose expansion cohort study," Journal of neuro-oncology, vol. 121, pp. 627-634, 2015. [86] S. Phuphanich, J. Rudnick, R. Chu, J. Yu, and K. Black, "A phase I trial of gefitinib and sirolimus in adults with recurrent glioblastoma multiforme (GBM)," in ASCO Annual Meeting Proceedings, 2008, p. 2088. 81

[87] H. Feng, B. Hu, K. Vuori, J. Sarkaria, F. Furnari, W. Cavenee, et al., "EGFRvIII stimulates glioma growth and invasion through PKA-dependent serine phosphorylation of Dock180," Oncogene, vol. 33, pp. 2504-2512, 2014. [88] E.-Y. Moon, G.-H. Lee, M.-S. Lee, H.-M. Kim, and J.-W. Lee, "Phosphodiesterase inhibitors control A172 human glioblastoma cell death through cAMP-mediated activation of protein kinase A and Epac1/Rap1 pathways," Life sciences, vol. 90, pp. 373-380, 2012. [89] T. N. Kreisl, L. Kim, K. Moore, P. Duic, C. Royce, I. Stroud, et al., "Phase II trial of single-agent bevacizumab followed by bevacizumab plus irinotecan at tumor progression in recurrent glioblastoma," Journal of Clinical Oncology, vol. 27, pp. 740-745, 2009. [90] D. A. Reardon, A. Desjardins, K. B. Peters, S. Gururangan, J. H. Sampson, R. E. McLendon, et al., "Phase II study of carboplatin, irinotecan, and bevacizumab for bevacizumab naive, recurrent glioblastoma," Journal of neuro-oncology, vol. 107, pp. 155-164, 2012. [91] S. Xia, Y. Li, E. M. Rosen, and J. Laterra, "Ribotoxic stress sensitizes glioblastoma cells to death receptor–induced apoptosis: requirements for c-Jun NH2-terminal kinase and Bim," Molecular Cancer Research, vol. 5, pp. 783-792, 2007. [92] C. P. Hofstetter, J.-K. Burkhardt, B. J. Shin, D. B. Gürsel, L. Mubita, R. Gorrepati, et al., "Protein phosphatase 2A mediates dormancy of glioblastoma multiforme-derived tumor stem-like cells during hypoxia," PloS one, vol. 7, p. e30059, 2012. [93] B. K. Law, M. E. Waltner-Law, A. J. Entingh, A. Chytil, M. E. Aakre, P. Norgaard, et al., "Salicylate-induced growth arrest is associated with inhibition of p70(s6k) and down-regulation of c-Myc, cyclin D1, cyclin A, and proliferating cell nuclear antigen," Journal of Biological Chemistry, vol. 275, pp. 38261- 38267, Dec 8 2000. [94] D. J. Sherwood, S. D. Dufresne, J. F. Markuns, B. Cheatham, D. E. Moller, D. Aronson, et al., "Differential regulation of MAP kinase, p70(S6K), and Akt by contraction and insulin in rat skeletal muscle," American Journal of Physiology- Endocrinology and Metabolism, vol. 276, pp. E870-E878, May 1999. [95] J. J. Pei, W. L. An, X. W. Zhou, T. Nishimura, J. Norberg, E. Benedikz, et al., "P70S6 kinase mediates tau phosphorylation and synthesis," Febs Letters, vol. 580, pp. 107-114, Jan 9 2006. [96] O. J. Shah, S. Ghosh, and T. Hunter, "Mitotic regulation of ribosomal S6 kinase 1 involves Ser/Thr, Pro phosphorylation of consensus and non-consensus sites by Cdc2," Journal of Biological Chemistry, vol. 278, pp. 16433-16442, May 2 2003. [97] K. Fujiwara, E. Iwado, G. B. Mills, R. Sawaya, S. Kondo, and Y. Kondo, "Akt inhibitor shows anticancer and radiosensitizing effects in malignant glioma cells by inducing autophagy," International journal of oncology, vol. 31, pp. 753-760, 2007. [98] H. Aoki, Y. Takada, S. Kondo, R. Sawaya, B. B. Aggarwal, and Y. Kondo, "Evidence that curcumin suppresses the growth of malignant gliomas in vitro and in vivo through induction of autophagy: role of Akt and extracellular signal- 82

regulated kinase signaling pathways," Molecular pharmacology, vol. 72, pp. 29- 39, 2007. [99] L. Zhang, H. Wang, J. Zhu, K. Ding, and J. Xu, "FTY720 reduces migration and invasion of human glioblastoma cell lines via inhibiting the PI3K/AKT/mTOR/p70S6K signaling pathway," Tumor Biology, vol. 35, pp. 10707-10714, 2014. [100] K. K. Murai and E. B. Pasquale, "'Eph'ective signaling: forward, reverse and crosstalk," Journal of Cell Science, vol. 116, pp. 2823-2832, Jul 15 2003. [101] N.-J. Xu and M. Henkemeyer, "Ephrin reverse signaling in axon guidance and synaptogenesis," Seminars in Cell & Developmental Biology, vol. 23, pp. 58-64, Feb 2012. [102] E. N. Grossman, C. A. Giurumescu, and A. D. Chisholm, "Mechanisms of Ephrin Receptor Protein Kinase-Independent Signaling in Amphid Axon Guidance in Caenorhabditis elegans," Genetics, vol. 195, pp. 899-+, Nov 2013. [103] J. Wykosky, D. M. Gibo, C. Stanton, and W. Debinski, "EphA2 as a novel molecular marker and target in glioblastoma multiforme," Molecular Cancer Research, vol. 3, pp. 541-551, 2005. [104] D.-P. Liu, Y. Wang, H. P. Koeffler, and D. Xie, "Ephrin-A1 is a negative regulator in glioma through down-reguation of EphA2 and FAK," International journal of oncology, vol. 30, pp. 865-871, 2007. [105] J. Fukai, H. Yokote, R. Yamanaka, T. Arao, K. Nishio, and T. Itakura, "EphA4 promotes cell proliferation and migration through a novel EphA4-FGFR1 signaling pathway in the human glioma U251 cell line," Molecular cancer therapeutics, vol. 7, pp. 2768-2778, 2008. [106] J. Li, D. Liu, G. Liu, and D. Xie, "EphrinA5 acts as a tumor suppressor in glioma by negative regulation of epidermal growth factor receptor," Oncogene, vol. 28, pp. 1759-1768, 2009. [107] D. P. Zelinski, N. D. Zantek, J. C. Stewart, A. R. Irizarry, and M. S. Kinch, "EphA2 overexpression causes tumorigenesis of mammary epithelial cells," Cancer research, vol. 61, pp. 2301-2306, 2001. [108] K. Pruitt, G. Brown, T. Tatusova, and D. Maglott, "The reference sequence (RefSeq) database," 2012. [109] T. Li, N. Kon, L. Jiang, M. Tan, T. Ludwig, Y. Zhao, et al., "Tumor Suppression in the Absence of p53-Mediated Cell-Cycle Arrest, Apoptosis, and Senescence," Cell, vol. 149, pp. 1269-1283, Jun 8 2012. [110] F. B. Furnari, T. Fenton, R. M. Bachoo, A. Mukasa, J. M. Stommel, A. Stegh, et al., "Malignant astrocytic glioma: genetics, biology, and paths to treatment," Genes & development, vol. 21, pp. 2683-2710, 2007. [111] R. McLendon, A. Friedman, D. Bigner, E. G. Van Meir, D. J. Brat, G. M. Mastrogianakis, et al., "Comprehensive genomic characterization defines human glioblastoma genes and core pathways," Nature, vol. 455, pp. 1061-1068, 2008. [112] M. Giovanna and A. H. Kaye, "Integrins: molecular determinants of glioma invasion," Journal of Clinical Neuroscience, vol. 14, pp. 1041-1048, 2007. [113] H. Janouskova, A. Maglott, D. Y. Leger, C. Bossert, F. Noulet, E. Guerin, et al., "Integrin α5β1 plays a critical role in resistance to temozolomide by interfering 83

with the p53 pathway in high-grade glioma," Cancer research, vol. 72, pp. 3463- 3470, 2012. [114] H. Zhao, J. Zhu, K. Cui, X. Xu, M. O'Brien, K. K. Wong, et al., "Bioluminescence imaging reveals inhibition of tumor cell proliferation by Alzheimer's amyloid β protein," Cancer cell international, vol. 9, p. 15, 2009. [115] D. Paris, K. Townsend, A. Quadros, J. Humphrey, J. Sun, S. Brem, et al., "Inhibition of angiogenesis by Aβ peptides," Angiogenesis, vol. 7, pp. 75-85, 2004. [116] M. Kanamori, T. Kawaguchi, J. M. Nigro, B. G. Feuerstein, M. S. Berger, L. Miele, et al., "Contribution of Notch signaling activation to human glioblastoma multiforme," Journal of neurosurgery, vol. 106, pp. 417-427, 2007. [117] K. E. Hovinga, F. Shimizu, R. Wang, G. Panagiotakos, M. Van Der Heijden, H. Moayedpardazi, et al., "Inhibition of notch signaling in glioblastoma targets cancer stem cells via an endothelial cell intermediate," Stem cells, vol. 28, pp. 1019-1029, 2010. [118] A. Arcella, F. Biagioni, M. Antonietta Oliva, D. Bucci, A. Frati, V. Esposito, et al., "Rapamycin inhibits the growth of glioblastoma," Brain Res, vol. 1495, pp. 37-51, Feb 7 2013. [119] M. Senkal, J. Tonn, R. Scho, W. Schachenmayr, U. Eickhoff, M. Kemen, et al., "Mitoxantrone-induced DNA strand breaks in cell-cultures of malignant human astrocytoma and glioblastoma tumors," Journal of neuro-oncology, vol. 32, pp. 203-208, 1997. [120] A. Boiardi, A. Silvani, M. Eoli, E. Lamperti, A. Salmaggi, P. Gaviani, et al., "Treatment of recurrent glioblastoma: can local delivery of mitoxantrone improve survival?," Journal of neuro-oncology, vol. 88, pp. 105-113, 2008. [121] A. Boiardi, M. Eoli, A. Salmaggi, E. Lamperti, A. Botturi, G. Broggi, et al., "Systemic temozolomide combined with loco-regional mitoxantrone in treating recurrent glioblastoma," Journal of neuro-oncology, vol. 75, pp. 215-220, 2005. [122] R.-Y. Bai, V. Staedtke, C. M. Aprhys, G. L. Gallia, and G. J. Riggins, "Antiparasitic mebendazole shows survival benefit in 2 preclinical models of glioblastoma multiforme," Neuro-oncology, p. nor077, 2011. [123] R.-Y. Bai, V. Staedtke, T. Wanjiku, M. A. Rudek, A. D. Joshi, G. L. Gallia, et al., "Brain Penetration and Efficacy of Different Mebendazole Polymorphs in a Mouse Brain Tumor Model," Clinical Cancer Research, p. clincanres. 2681.2014, 2015. [124] J.-y. Li, J.-y. Huang, M. Li, H. Zhang, B. Xing, G. Chen, et al., "Anisomycin induces glioma cell death via down-regulation of PP2A catalytic subunit in vitro," Acta Pharmacologica Sinica, vol. 33, pp. 935-940, 2012. [125] H. I. Robins, M. Won, W. F. Seiferheld, C. J. Schultz, A. K. Choucair, D. G. Brachman, et al., "Phase 2 trial of radiation plus high-dose tamoxifen for glioblastoma multiforme: RTOG protocol BR-0021," Neuro-oncology, vol. 8, pp. 47-52, 2006. [126] A. Das, N. L. Banik, and S. K. Ray, "Flavonoids activated caspases for apoptosis in human glioblastoma T98G and U87MG cells but not in human normal astrocytes," Cancer, vol. 116, pp. 164-176, 2010. 84

[127] C. P. Haar, P. Hebbar, G. C. Wallace IV, A. Das, W. A. Vandergrift III, J. A. Smith, et al., "Drug resistance in glioblastoma: a mini review," Neurochemical research, vol. 37, pp. 1192-1200, 2012. [128] S.-P. Weathers and J. de Groot, "VEGF manipulation in glioblastoma," Oncology, vol. 29, pp. 719-719, 2015. [129] T. T. Batchelor, D. G. Duda, E. di Tomaso, M. Ancukiewicz, S. R. Plotkin, E. Gerstner, et al., "Phase II study of cediranib, an oral pan–vascular endothelial growth factor receptor tyrosine kinase inhibitor, in patients with recurrent glioblastoma," Journal of Clinical Oncology, p. JCO. 2009.26. 3988, 2010. [130] W. Pope, A. Lai, P. Nghiemphu, P. Mischel, and T. Cloughesy, "MRI in patients with high-grade gliomas treated with bevacizumab and chemotherapy," Neurology, vol. 66, pp. 1258-1260, 2006. [131] J. J. Raizer, S. Grimm, M. C. Chamberlain, M. K. Nicholas, J. P. Chandler, K. Muro, et al., "A phase 2 trial of single‐agent bevacizumab given in an every‐ 3‐week schedule for patients with recurrent high‐grade gliomas," Cancer, vol. 116, pp. 5297-5305, 2010. [132] R. Zoncu, A. Efeyan, and D. M. Sabatini, "mTOR: from growth signal integration to cancer, diabetes and ageing," Nature reviews Molecular cell biology, vol. 12, pp. 21-35, 2011. [133] H. Mao, D. G. LeBrun, J. Yang, V. F. Zhu, and M. Li, "Deregulated signaling pathways in glioblastoma multiforme: molecular mechanisms and therapeutic targets," Cancer investigation, vol. 30, pp. 48-56, 2012. [134] M. Nakamura, T. Watanabe, U. Klangby, C. Asker, K. Wiman, Y. Yonekawa, et al., "p14ARF deletion and methylation in genetic pathways to glioblastomas," Brain Pathology, vol. 11, pp. 159-168, 2001. [135] D. A. Solomon, J.-S. Kim, W. Jean, and T. Waldman, "Conspirators in a capital crime: co-deletion of p18INK4c and p16INK4a/p14ARF/p15INK4b in glioblastoma multiforme," Cancer research, vol. 68, pp. 8657-8660, 2008. [136] R. M. Golsteyn, K. E. Mundt, A. M. Fry, and E. A. Nigg, "Cell cycle regulation of the activity and subcellular localization of Plk1, a human protein kinase implicated in mitotic spindle function," The Journal of cell biology, vol. 129, pp. 1617-1628, 1995. [137] C. Lee, A. Fotovati, J. Triscott, J. Chen, C. Venugopal, A. Singhal, et al., "Polo‐ Like Kinase 1 Inhibition Kills Glioblastoma Multiforme Brain Tumor Cells in Part Through Loss of SOX2 and Delays Tumor Progression in Mice," Stem Cells, vol. 30, pp. 1064-1075, 2012. 85

Appendix A

Formulas for RLM features and Haralick features

Here we give detailed equations for run-length matrix (RLM) texture features as well as Haralick texture features. The 11 RLM features are described in equations

[A1~A11] and the 13 Haralick texture features are described in equations [A12~A24].

For a gray-level run-length matrix 푝 where 푝(푖, 푗) indicates the number of runs with pixels of gray level 푖 and run-length 푗. Let 푀 be the total number of gray levels, 푁 be the maximum run-length, 푛푟 be the total number of runs, and 푛푝 be the total number of pixels in the image. The 11 RLM features are defined as follows.

1. Short Run Emphasis (SRE):

푀 푁 1 푝(푖, 푗) SRE = ∑ ∑ 2 (A1) 푛푟 푗 푖=1 푗=1

SRE emphasize on the distribution of short greylevel runs. The value is large for fine texture patterns.

2. Long Run Emphasis (LRE):

푀 푁 1 LRE = ∑ ∑ 푝(푖, 푗) ⋅ 푗2 (A2) 푛푟 푖=1 푗=1

LRE emphasize on the distribution of long greylevel runs. The value is large for coarse texture patterns. 86

3. Gray-Level Nonuniformity (GLN):

2 푀 푁 1 GLN = ∑ (∑ 푝(푖, 푗)) (A3) 푛푟 푖=1 푗=1

GLN measures the non-uniformity of graylevels in the image.

4. Run Length Nonuniformity (RLN):

푁 푀 2 1 RLN = ∑ (∑ 푝(푖, 푗)) (A4) 푛푟 푗=1 푖=1

RLN measures the non-uniformity in the run-lengths.

5. Run Percentage (RP):

푛푟 RP = (A5) 푛푝

Run percentage measures the homogeneity of runs. The value is also large for fine texture patterns.

6. Low Gray-Level Run Emphasis (LGRE):

푀 푁 1 푝(푖, 푗) LGRE = ∑ ∑ 2 (A6) 푛푟 푖 푖=1 푗=1

LGRE emphasize on the low graylevel runs distribution.

7. High Gray-Level Run Emphasis (HGRE): 87

푀 푁 1 HGRE = ∑ ∑ 푝(푖, 푗) ⋅ 푖2 (A7) 푛푟 푖=1 푗=1

HGRE emphasize on the high graylevel runs distribution.

The following features aim to quantify the joint distribution of run-lengths and graylevels.

8. Short Run Low Gray-Level Emphasis (SRLGE):

푀 푁 1 푝(푖, 푗) SRLGE = ∑ ∑ 2 2 (A8) 푛푟 푖 ⋅ 푗 푖=1 푗=1

9. Short Run High Gray-Level Emphasis (SRHGE):

푀 푁 1 푝(푖, 푗) ⋅ 푖2 SRHGE = ∑ ∑ 2 (A9) 푛푟 푗 푖=1 푗=1

10. Long Run Low Gray-Level Emphasis (LRLGE):

푀 푁 1 푝(푖, 푗) ⋅ 푗2 LRLGE = ∑ ∑ 2 (A10) 푛푟 푖 푖=1 푗=1

11. Long Run High Gray-Level Emphasis (LRHGE):

푀 푁 1 LRHGE = ∑ ∑ 푝(푖, 푗) ⋅ 푖2 ⋅ 푗2 (A11) 푛푟 푖=1 푗=1

88

For computation of Haralick features, we use 푝 to indicate the gray-level co- occurrence matrix (GLCM). 푝(푖, 푗) indicates the (푖, 푗)th element of the GLCM. We use 푚 and 푛 to indicate the number of pixels in the specified dimension.

The 13 Haralick features are defined as follows:

1. Angular second moment (ASM):

푁 푁 1 ASM = ∑ ∑ 푝(푖, 푗)2 (A12) 푛푟 푖=1 푗=1

The ASM measures the homogeneity of an image. An image which has fewer gray levels generally is more homogeneous and has a higher ASM.

2. Contrast:

푁−1 2 Contrast = ∑ 푘 푝푥−푦(푘), 푘 = |푖 − 푗| (A13) 푘=0

Contrast measures the luminance difference presented in an image.

3. Correlation:

푁 푁 1 Correlation = ∑ ∑(푖푗)푝(푖, 푗) − 휇푥휇푦 (A14) 휎푥휎푦 푖=1 푗=1

Correlation measures the gray-level linear dependence.

4. Variance: 89

푁 푁 Variance = ∑ ∑(1 − 휇)2푝(푖, 푗) (A15) 푖=1 푗=1

Variance measures how the gray levels deviate from the mean value of 푝(푖, 푗).

5. Inverse difference moment (IDM):

푁 푁 1 IDM = ∑ ∑ 푝(푖, 푗) (A16) 1 + (푖 − 푗)2 푖=1 푗=1

IDM emphasizes on the locally homogeneous regions and reduces the weight on inhomogeneous regions.

6. Sum Average:

2푁−2 Sum Average = ∑ 푘 ⋅ 푝푥+푦(푘) (A17) 푘=0

Sum average estimates the mean of the sum histogram.

7. Sum Variance:

2푁−2 2 Sum Variance = ∑ (푘 − Sum Average) ⋅ 푝푥+푦 (푘) (A18) 푘=0

Sum variance estimates the spread of the sum histogram.

8. Sum Entropy:

2푁−2 Sum entropy = − ∑ 푝푥+푦(푘) log (푝푥+푦(푘)) (A19) 푘=0

Sum entropy quantifies the homogeneity of the sum histogram. 90

9. Entropy:

푁 푁 Entropy = − ∑ ∑ 푝(푖, 푗) log(푝(푖, 푗)) (A20) 푖=1 푗=1

Entropy quantifies the homogeneity of the image. Homogeneous images have lower entropy values.

10. Difference Variance:

푁−1 푁−1 2

Difference Variance = ∑ 푝|푥−푦| (푘 − ∑ 푙푝|푥−푦|(푘)) (A21) 푘=0 푖=0

Difference Variance measures the spread of the difference histogram.

11. Difference entropy:

푁−1 Difference Entropy = − ∑ 푝|푥−푦|(푘) log (푝|푥−푦|(푘)) (A22) 푘=0

Difference entropy measures the homogeneity of the difference histogram.

12. Cluster Shade:

푁 푁 3 Cluster shade = ∑ ∑(푖 + 푗 − 휇푥 − 휇푦) 푝(푖, 푗) (A23) 푖=1 푗=1

13. Cluster Prominence:

푁 푁 4 Cluster Prominence = ∑ ∑(푖 + 푗 − 휇푥 − 휇푦) 푝(푖, 푗) (A24) 푖=1 푗=1 91

Cluster shade and cluster prominence both measure the skewness, or lack of symmetry, of the image. 92