Original Research n Computer Applications 387 ------

1

ancer: C http://radiology.rsna.org/lookup http://radiology.rsna.org/lookup ung L www.rsna.org/rsnarights. ell C RSNA, 2012

personalized medicine. personalized q Supplemental material: /suppl/doi:10.1148/radiol.12111607/-/DC1 with clusters of coexpressed genes (metagenes) was de (metagenes) was genes of coexpressed with clusters map is created radiogenomics correlation a fined. First, and image features for a pairwise association between built are models of metagenes predictive Next, metagenes. linear regres using sparse by in terms of image features are models of image features predictive sion. Similarly, signif the prognostic Finally, built in terms of metagenes. in a evaluated are image features icance of the predicted outcomes. data set with survival public applied to a cohort of was strategy This radiogenomics and gene expression whom for NSCLC with patients 26 and (CT) computed tomography from 180 image features available. were (PET)/CT tomography emission positron 243 statistically significant pairwise correla were There and metagenes of NSCLC. image features tions between with in terms of image features predicted Metagenes were fourteen of 180 of 59%–83%. One hundred an accuracy value uptake and the PET standardized image features CT with an accuracy in terms of metagenes predicted were were image features of 65%–86%. When the predicted data set with survival mapped to a public gene expression and sharpness ranked edge shape, tumor size, outcomes, significance. highest for prognostic for identifying imaging bio strategy This radiogenomics im of novel evaluation rapid enable a more may markers to their translation accelerating thereby aging modalities, To identify prognostic imaging biomarkers in non–small in non–small biomarkers imaging identify prognostic To of a radiogenomics means by cell lung (NSCLC) and medical im gene expression integrates that strategy not avail outcomes are ages in patients for whom survival data in public gene expression survival leveraging able by data sets. for associating image features strategy A radiogenomics mall mall S Results: Purpose: Methods: Conclusion: Materials and on–

Identifying Prognostic Imaging Imaging Prognostic Identifying by Leveraging Public Biomarkers Data— Microarray Gene Expression and PreliminaryMethods Results N - radiology.rsna.org

n

S.K.P. (e-mail: (e-mail: S.K.P. This copy is for your personal non-commercial use only. To order presentation-ready presentation-ready order To personalonly. use non-commercial for your is copy This

). Note: at us contact clients, or colleagues distribution for copies to your Address correspondence to Volume 264: Number 2—August 2012 264: Volume RSNA, RSNA, 2012

From the Departments of Radiology (O.G., A.N.L., A.Q., A.Q., A.N.L., the Departments of Radiology (O.G., From

q Luxembourg; and a Henri Benedictus Fellow of the King Luxembourg; and a Henri Benedictus Fellow American Educational and the Belgian Baudouin Foundation Foundation. [email protected] GE Healthcare. O.G. is a fellow of the Fund for Scientific Re O.G. GE Healthcare. search Flanders (FWO-Vlaanderen); an Honorary Fulbright Scholar of the Commission for Educational Exchange and Belgium, America, between the United States of September 30; revision received November 26; accepted 2012. February 21, December 29; final version accepted Supported by Information Sciences in Imaging at Stanford, and the Center for Cancer Systems Biology at Stanford, Cardiothoracic Surgery (C.D.H., Y.X.), Y.X.), Cardiothoracic Surgery (C.D.H., CA 94305; Stanford, Rd, Welch 1201 School of Medicine, Affairs and Veterans Alto Palo Healthcare Alto, System, Palo 2011; revision requested August 5, Received Calif (C.D.H.). 1 and Electrical Engineering (J.X.), S.K.P.), S.N., D.L.R., Radiology: Daniel L. Rubin, MD, MS MD, Rubin, Daniel L. PhD Sandy Napel, PhD Plevritis, Sylvia K. Ann N. Leung, MD Leung, Ann N. PhD Xu, Yue MD Andrew Quon, Olivier Gevaert, PhD Olivier Gevaert, MS Jiajing Xu, MD Hoang, Chuong D. COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

ersonalized medicine aims to tailor biomarkers. Rapid advances in imaging abundantly available in public databases medical care to the individual by technologies permit better anatomic res- with clinical outcomes (11,12). To iden- Pcharacterizing molecular heteroge- olution and provide noninvasive measure- tify prognostic imaging biomarkers, we neity. Advances in high-throughput mo- ments of functional and physiologic tis- propose a radiogenomics strategy that lecular technologies promise to produce sue- and lesion-specific properties. Yet the integrates gene expression and medical biomarkers that will drive the future of ability to determine the clinical significance images of patients for whom survival patient-specific medical care (1–3). How- of newly identified image features relies outcomes are not available by leveraging ever, a limitation of these approaches is on clinical studies that can be costly and survival data in public gene expression the need to acquire tissue through inva- require long-term follow-up periods. We data sets. sive . In such , samples propose a novel radiogenomics strategy are often obtained from only a portion that may identify new imaging biomarkers of a generally heterogeneous lesion and when long-term clinical outcomes are not Materials and Methods cannot completely represent the le- immediately available. Our approach re- sion’s anatomic, functional, and physi- lies on first acquiring paired gene expres- Image and Microarray Data Collection ologic properties, particularly its size, sion data and medical images at diagnosis With institutional review board ap- location, and morphology. Yet many of from a study cohort, then leveraging the proval, we studied data in 26 patients these lesion-specific features are ac- public gene expression data that contain with NSCLC who underwent preopera- quired in routine imaging examinations clinical outcomes from a closely matched tive CT and fluorine 18 fluorodeoxyglu- and are known to be highly informative population (Fig 1a). A critical step in this cose (FDG) PET/CT between April 7, in diagnosis, clinical staging, and treat- approach involves predicting the image 2008, and May 21, 2010, and who had ment planning. Despite the importance features in the study cohort in terms of archived frozen tissue available for gene of these image features, only a handful gene signatures. We evaluate the prognos- expression analysis (Table E1 [online]). of studies (4–7) have generated a radi- tic significance of these gene signatures in We gathered preoperative thin-section ogenomics map that integrates the ge- public data sets for which gene expression CT and FDG PET/CT images, and a rep- nomic and image data. and survival data are available. Predicting resentative cross section of the excised Our work extends the use of radi- image features from gene signatures may ogenomics mapping to address the grow- not only enable immediate translational ing demand for prognostic image-based potential but may also suggest potential Published online before print molecular mechanisms that may give rise 10.1148/radiol.12111607 Content code:

to imaging phenotypes. Radiology 2012; 000:387–396 Advances in Knowledge As a proof-of-concept, we applied nn Fifty-six high-quality metagenes our radiogenomics strategy to a human Abbreviations: captured from non–small cell study cohort of patients with NSCLC in AUC = area under the receiver operating characteristic curve lung cancer (NSCLC) gene ex- whom we retrospectively acquired med- CI = confidence interval pression microarrays can be pre- ical images (CT and PET/CT images) FDG = fluorine 18 fluorodeoxyglucose dicted by CT image features with and gene expression microarray data. NSCLC = non–small cell lung cancer a mean accuracy of 72% (range, We focused on NSCLC because it is the RFS = recurrence-free survival 59%–83%). leading cause of cancer death, with an SUV = standardized uptake value nn Ten semantic, 104 computation- overall 5-year survival rate of 16% that Author contributions: ally derived CT image features, has not changed appreciably over the Guarantors of integrity of entire study, O.G., J.X., Y.X., A.Q., and the PET/CT standardized past 15 years (8). Also, imaging has an S.N., S.K.P.; study concepts/study design or data acquisi- uptake value of NSCLC can be important role in the management of tion or data analysis/interpretation, all authors; manuscript NSCLC and will likely have an increased drafting or manuscript revision for important intellectual predicted by using the 56 high- content, all authors; manuscript final version approval, all clinical role, on the basis of the results of quality metagenes, with an accu- authors; literature research, O.G., J.X., C.D.H., Y.X., A.Q., racy or area under the receiver the National Lung Screening Trial (9,10). S.K.P.; clinical studies, C.D.H., A.Q., S.K.P.; experimental operating characteristic curve of Moreover, gene expression of NSCLC is studies, J.X., C.D.H., Y.X., D.L.R., S.N., S.K.P.; statistical between 65% and 86%. analysis, O.G., S.K.P.; and manuscript editing, O.G., C.D.H., Y.X., A.Q., S.N., S.K.P. nn A prognostic signature of image biomarkers, composed of im- Implication for Patient Care Funding: aging features that are expressed nn Prognostically significant patient- This research was supported by the National Cancer Institute (grants R01 CA160251, U54 CA149145, and U01 in terms of their predictive gene specific molecular signatures may CA142555). expression signature, was identi- be predicted noninvasively from fied by leveraging publicly avail- image features, further advancing Potential conflicts of interest are listed at the end able microarray data with sur- the role of imaging in personal- of this article. vival outcomes. ized medicine. See also the editorial by Jaffe et al in this issue.

388 radiology.rsna.org n Radiology: Volume 264: Number 2—August 2012 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

Figure 1

Figure 1: Creation of radiogenomics map for non- small cell lung cancer (NSCLC). (a) Strategy for cre- ation and use of the radiogenomics map in NSCLC. Step 1 integrates the computed tomographic (CT) and positron emission tomographic (PET)/CT image and the gene microarray data from our study cohort. Step 2 maps the metagenes to publicly available microarray data with survival. Step 3 links image features expressed in terms of metagenes to public gene expression data. The dashes in this link high- light its indirectness, because by leveraging public gene expression data, we are able to associate the image features in the study cohort with survival, even without survival data in the study cohort. (b) Hierarchical clustering of radiogenomics correlations map with metagenes (in rows) and image features (in columns). Black squares = 243 significant asso- ciations between an image feature and a metagene with q , 5%. (c) Association between metagene 12 and the image feature for the internal air broncho- gram; (i) gene expression of genes in metagene 12; (ii) metagene 12 expression; (iii) presence of lesion was used for microarray analysis. From the same patients, we retrieved an internal air bronchogram, where ∗ = squamous Image and gene expression data were frozen tissue and extracted the RNA, lung carcinoma cases; and (iv) sample CT images gathered as described in Appendix E1 which was analyzed with gene expression of a lesion with (top) an internal air bronchogram (online). Briefly, CT and FDG PET/CT microarrays (Illumina HT-12). The gene present versus (bottom) a lesion with an internal air images were deidentified, and image fea- expression data were first clustered on bronchogram absent. For gene expression, red = tures were extracted manually by a radi- the basis of coexpression, and 56 clusters overexpression and green = underexpression; for ologist using a controlled vocabulary and were selected for subsequent analysis be- image features, blue = absence of the feature and computationally by using predefined ana- cause their cluster homogeneity was con- yellow = presence of the feature. lytical software tools to characterize the served in external data (Gene Expression lesion’s properties. This resulted in 153 Omnibus accession number, GSE8894) computational image features, 26 seman- (13). Each of the clusters was represented using Significance Analysis of Microar- tic image features, and a PET standard- by its metagene, which was defined as the rays (SAM) in R (version 1.28), with ized uptake value (SUV) for each patient’s first principal component of the cluster the false discovery rate (FDR) for mul- tumor. Figure E1 (online) shows a single (Fig E2 [online]). tiple hypothesis testing correction (14). representative CT cross section of each We used the two-class-unpaired and of the 26 nodules included in this study, Creating the Radiogenomics Correlation the continuous-response types of SAM Table E2 (online) lists all of the semantic Map for the binary- and continuous-valued annotations applied to each tumor, and Initially, we established a radiogenomics image features, respectively. The SAM Table E8 (online) provides a description correlation map for pairwise associations “D threshold” was set at 0.3, and 1000 of the computational features. between metagenes with image features, permutations were used to estimate the

Radiology: Volume 264: Number 2—August 2012 n radiology.rsna.org 389 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

FDR. A q value filter of 0.05 or less was analysis to cases of adenocarcinoma (n overexpression, which is consistent with used to identify statistically significant as- = 63), because they constituted a larger the reported observations that activating sociations between metagenes and image fraction of our study cohort. First, we KRAS mutations are exclusively associ- features. mapped each predicted image feature ated with adenocarcinoma (18) and that to the Lee et al data to assess its prog- Ras pathway activation is characteristic Creating the Radiogenomics Predictive nostic significance separately. Next, of adenocarcinoma (19) (Fig 1c). Such Models we applied Cox proportional hazards analyses of the radiogenomics map en- We built a predictive model of the meta- modeling and Kaplan-Meier survival able the synthesis of hypotheses that genes in terms of the image features, analysis to investigate the prognostic relate image phenotypes with genotype. using generalized linear regression with significance of predicted image features lasso regularization (glmnet package in (survival R package, version 2.35–8). Predictive Models of Metagenes in Terms R, version 1.5.1) (15). The regularization Kaplan-Meier curves were evaluated by of Image Features parameter was set such that at least 80% splitting the predictor at its median to We found that image features predicted of the deviance is captured by the model. identify a good versus poor prognostic all 56 metagenes with a mean accuracy Similarly, we predicted each image fea- group. We used Cox proportional haz- of 72% (range, 59%–83%; Table E4 [on- ture in terms of the metagenes. Depend- ards modeling to determine whether line]). In particular, metagene 43, which ing on the type of image feature, the the predicted image features added in- is enriched with genes in the hypoxia response variable was set as binomial, dependent information in the presence pathway (20,21), had the presence or multinomial, or Gaussian. The resulting of the clinical covariates, namely: age, absence of internal air bronchograms predictive models of image features ex- sex, smoking, nodal stage, and tumor at CT as its top-ranking predictive se- pressed in terms of metagenes can be size. Finally, we built a multivariate mantic feature. When we tried to pre- regarded as surrogates for image fea- survival model based on the predicted dict the 56 metagenes by using only the tures, and we define them as “predicted image features by using generalized lin- top 25 image features selected in an image features” (Fig 2a). We used leave- ear regression models with lasso regu- additional leave-one-out cross validation, one-out cross validation to assess the larization (glmnet package in R, version we found that all metagenes were pre- model’s performance. The performance 1.5.1) and evaluated its performance dicted with an average accuracy of 71% metric of the predicted semantic image with 10-fold cross validation. We in- (range, 56%–85%). In other words, the features was the AUC. The performance cluded the clinical covariates to deter- metagenes could be predicted from 25 metric for the predicted computational mine whether the predicted image fea- image features with reasonable accu- image features, which were continuously tures provided independent prognostic racy. We also investigated whether ei- valued, was termed “accuracy” and was value. ther semantic or computational image defined as 1 minus the error, where the features alone were sufficient to predict error was defined as the average absolute the metagenes with similar accuracy. In- error divided by the numeric range of the Results terestingly, the semantic features alone feature. Predictions with at least 65% predicted 21 of the 56 metagenes (mean AUC or 65% accuracy were selected for Radiogenomics Correlation Map of NSCLC accuracy, 74%; range, 61%–83%); how- subsequent analysis. Figure 1b shows a NSCLC radiogenom- ever, the computational features pre- ics correlation map of 243 statistically dicted all 56 metagenes (mean accuracy, Leveraging Public Gene Expression Data significant pairwise associations be- 71%; range, 56%–83%). This finding Sets for Identifying Image Biomarkers tween metagenes with CT image fea- suggests that, in our study cohort, se- Despite the lack of survival outcomes in tures and PET SUV (Table E3 [online]). mantic annotations were not needed for our study cohort, we identified candi- Several provocative associations were predicting the metagenes if the compu- date prognostic imaging biomarkers by identified. For example, the presence tational image features were available mapping the predicted image features or absence of the internal air broncho- (Table E4 [online]). to public availability of gene expression gram feature at CT was associated with data sets with clinical outcomes across metagene 12 (Fig 1c). The correspond- Predictive Models of Image Features in hundreds of patients (Fig 1a). In partic- ing gene cluster contains genes that are Terms of Metagenes ular, we used the NSCLC gene expres- specifically overexpressed in NSCLC, We found that 115 of 153 image fea- sion data set by Lee et al (13) because including KRAS (16,17). This particu- tures were predicted from the meta- it was a relevantly large study (n = 138), lar association suggests that the pres- genes with an accuracy of 65% or it contains clinical outcomes, and it has ence of an internal air bronchogram is greater. These image features consist- has a histologic composition of NSCLC related to the overexpression of KRAS ed of PET SUV, 10 semantic features, that is similar to that in our study co- and its targets. Interestingly, none of the and 104 computational features (Fig hort. Because prognostic signatures for six patients with squamous carcinoma 2b and Table E5 [online]). The top 10 adenocarcinoma and squamous car- in our study cohort had internal air predicted computational image feature cinoma differ, we limited our survival bronchograms, and none showed KRAS models had an average accuracy of 85%

390 radiology.rsna.org n Radiology: Volume 264: Number 2—August 2012 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

Figure 2

Figure 2: Multivariate modeling of image features in terms of metagenes. (a) Strategy for multivariate modeling of image features in terms of metagenes. Each image feature is modeled as a linear combination of metagenes, using L1 regularization to induce sparsity

in the number of metagenes that are selected. I1 = the first image feature ofk image features in total; Mj = jth metagene; fi = linear

regression for the ith image feature; a = regularization parameter; Wj (not shown) = weight for each metagene M1, M2, to Mn; and W = matrix with all weights. (b) Semantic features predicted by metagenes with an area under the receiver operating characteristic curve (AUC) of 65% or greater, based on leave-one-out cross-validation analysis. (c) Multivariate metagene prediction model for the presence versus absence of internal air bronchogram at CT. The top seven metagenes, representing 95% of the weight of the multivariate model, are shown; the top three metagenes are upregulated when an air bronchogram is present, and the bottom four metagenes are downregulated. The downregulated metagenes are enriched in hypoxia-related pathways; the upregulated metagenes contain a Ras signature and genes upregulated by Ras. (d) Receiver operating characteristic curve for the predicted presence versus absence of internal air bronchogram at CT, when expressed in terms of metagenes. (e) Multivariate model for internal air bronchogram correspond- ing to c. For gene expression, red = overexpression and green = underexpression; for image features, blue = absence of the feature and yellow = presence of the feature. CI = confidence interval.

(84%–86%) and were predominately 2d). The associated gene clusters that upregulation of KRAS-driven lung tu- associated with lesion size, edge shape, were upregulated in the presence of mors (23) and suggests that the pres- and edge sharpness. an internal air bronchogram were en- ence of internal air bronchograms may The most accurately predicted se- riched with Ras targets and confirmed be a biomarker of high Ras and low mantic feature was the presence or ab- our earlier univariate association of hypoxia pathway activity. sence of the internal air bronchogram this feature (19,22); the downregulated To validate our approach for pre- at CT (AUC = 86%, Fig 2c). This fea- gene clusters were enriched in hypoxia- dicting image features with a gene ture was predicted by using 10 meta- related pathways (20,21). This finding signature, we focused on the accuracy genes, seven of which captured 95% of is consistent with the observation that of predicting image features that de- the weight in the linear regression (Fig inactivation of HIF2alpha is related to scribe the tumor’s dimensions, because

Radiology: Volume 264: Number 2—August 2012 n radiology.rsna.org 391 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

Figure 3

Figure 3: Validation ofQ14 the predicted image features related to computed lesion size at CT. (a) Graph shows the correlation of the predicted image feature “lesion size” and the actual lesion size in the Lee et al (13) data (P < .0001). (b) Graph shows comparison of the accuracy of the predicted image features lesion size, minor axis, and major axis with their correlation with actual tumor size in the Lee et al data.

features remained prognostically signif- the actual tumor size was available in size [Fig 4a]), a finding consistent with icant (Table) in a multivariate analysis the Lee et al (13) data set. In partic- the literature (28). In addition, several that included clinical covariates such as ular, we focused on the gene signa- semantic image features were prognosti- age, nodal stage, and tumor size (28). tures that predict the three computed cally significant, including the presence Multivariate survival analysis of pre- CT image features: minor axis, major of satellite nodules in the primary tumor dicted image features.—We computed a axis, and lesion size. We found that lobe, lobulated margins, and pleural at- multivariate model of the predicted im- all three predicted image features tachment (Fig 4b and Table). We also age features that significantly correlated were statistically significantly corre- found that edge sharpness was corre- with 4-year RFS (log-rank P = .00166 lated with actual tumor size (Fig 3a). lated with RFS (hazard ratio, 0.35; 95% [Fig 5a) in the Lee et al (13) adeno- Moreover, this significance was also CI: 0.13, 0.91 [Fig 4c]); in particular, carcinoma data. The top-ranked image correlated with the accuracy of each blurry versus sharp edges were asso- features were edge sharpness, length of predictive model (Fig 3b). Interest- ciated with poor versus good survival. the major axis of the lesion, and lesion ingly, the upregulated genes for large In addition, the presence of an internal texture (Table E7 [online]). The multi- predicted tumor sizes were associated air bronchogram, which we previously variate model was also predictive of RFS with extracellular matrix remodeling associated with KRAS, was associated when clinical covariates were included (P , .0001) (24) and the epithelial-to- with poor RFS (hazard ratio, 3.1; 95% (Cox P = .0017; hazard ratio, 3.88; 95% mesenchymal transition (25–27). CI: 1.5, 6.6 [Fig 4d]). This is consistent CI: 1.7, 9.1). Because predicted tumor with the reported association between size was in this model, we repeated the Identification of Image Biomarkers by upregulation of KRAS and poor sur- analysis without all predicted tumor Leveraging Public Gene Expression Data vival (29,30). However, this appears to size features, and found that the model Univariate survival analysis of predicted contrast with the studies that correlate was still predictive of RFS (log-rank P image features.—We found 26 and 22 the presence of an internal air broncho- = .0257 [Fig 5b]). Even when the clin- predicted image features that were sig- gram, in stage 1 tumors with pleural re- ical variables were included, a combi- nificantly associated with recurrence- traction, with good prognosis (31). Yet nation of predicted image features was free survival (RFS) (log-rank P and Cox a recent study found a nonsignificant independently significant (CoxP = .020; P , .05 [Fig 4, Table) and overall sur- correlation between the presence of an hazard ratio, = 1.79; 95% CI: 1.1, 2.9). vival (Table E6 [online]), respectively, at air bronchogram and RFS, suggesting Top-ranking predicted image features 4 years from diagnosis, in a univariate that larger studies that can adjust for included computational features that survival analysis of the Lee et al (13) ad- confounding factors are needed (32). describe the lesion texture and edge enocarcinoma data. Among the 26 pre- Consistent with our observations, Travis sharpness and a semantic image feature dicted image features associated with et al (33) reported a positive correlation that describes the presence or absence RFS were all three image features that between the presence of KRAS muta- of entering airways (Table E7 [online]). characterize tumor dimensions on CT tions and the presence of an internal When we repeated this analysis using (ie, minor axis, major axis, and lesion air bronchogram. Overall, many image only semantic features, the resulting

392 radiology.rsna.org n Radiology: Volume 264: Number 2—August 2012 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

model was also significantly correlated Figure 4 with RFS (log-rank P = .0251 [Fig 5c]), and the top-ranking semantic features included the presence or absence of en- tering airways, lobulated margins, and pleural retraction (Table E7 [online]).

Discussion While there has been much recent ac- tivity in developing quantitative imaging biomarkers of disease, linking these biomarkers to clinical outcomes such as progression-free and overall survival and response to treatment is prob- lematic because of the length of time required to obtain these outcomes in study cohorts. Here, we demonstrate a radiogenomics strategy for rapidly identifying prognostically significant image biomarkers that requires only the paired acquisition of image and gene expression data and the existence of a large public gene expression data set where survival outcomes are avail- able. We demonstrated that once the image features can be predicted from metagenes, the clinical significance of the predicted image features can be ex- Figure 4: Univariate survival analysis for four predicted image features evaluated in the Lee et al (13) plored by mining existing public gene data for RFS. Predicted image features are expressed in terms of a multivariate gene expression signature. expression microarray databases that (a) Survival curves for predicted lesion size at CT, with top right image showing that a small CT lesion is contain clinical outcomes. As a proof- associated with good prognosis (green curve) and bottom right image showing that a large CT lesion is asso- of-concept, our radiogenomics strategy ciated with poor prognosis (red curve). (b) Survival curves for predicted presence versus absence of pleural identified image features that are known attachment at CT. (c) Survival curves based on predicted edge sharpness composite 1, with top right image to be prognostically significant without showing that high edge sharpness is associated with good prognosis (red curve) and bottom right image relying on paired observations of image showing that low edge sharpness is associated with poor prognosis (green curve). (d) Survival curves based features and survival outcomes. on predicted presence versus absence of internal air bronchogram. We applied our radiogenomics strategy in a cohort of 26 patients with NSCLC for whom we had medical features and PET SUV can be predicted even when clinical covariates that are images (CT and PET/CT images) and from a linear combination of the meta- known to be prognostically significant gene expression microarray data. Ini- genes with similar accuracy. For exam- were considered. This last analysis tially, we identified several statistically ple, we provided a gene signature for may prove to be valuable because many significant correlations between gene image-based tumor size that implicates emerging and evolving imaging technol- expression profiles of NSCLC and im- processes in extracellular modeling and ogies change during the time required aging phenotypes at CT and PET/CT. the epithelial-to-mesenchymal transi- to establish a clinical end point. Similar to Segal et al (4), who studied tion. This particular result suggests that A strength in our radiogenom- the radiogenomics of hepatocellular from cross-sectional observations of tu- ics strategy is the use of a large set carcinoma, we demonstrated that the mors, we may be able to identify mo- of computer-derived image features in metagenes of NSCLC can be predicted lecular characteristics of a single tumor addition to semantic features. We ex- from a linear combination of a small set as if it were observed longitudinally tracted computational features from CT of image features with a reasonable ac- with increasing size. Finally, we showed that describe tumor shape, edge shape, curacy to warrant further investigation that a linear combination of predicted and edge variability. Interestingly, we in a larger study that includes an inde- image features was prognostically sig- showed that the computational image pendent validation data set. Moreover, nificant in a publicly available gene ex- features were necessary and sufficient we demonstrated that many CT image pression data set of NSCLC outcomes, for accurate modeling of metagenes

Radiology: Volume 264: Number 2—August 2012 n radiology.rsna.org 393 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

Predicted Image Features That Are Significantly Related to RFS in the Data Set of Lee et al Univariate Survival Analysis Multivariate Survival Analysis* 95% CI for 95% CI for Significant Clinical Image Feature Cox P Value Hazard Ratio Hazard Ratio Cox P Value Hazard Ratio Hazard Ratio Variables

Poor prognosis image features Texture-Gabor-9 .011 2.56 1.22, 5.37 .002 3.70 1.59, 8.63 N stage, tumor size Texture-Gabor-10 .001 3.29 1.54, 7.02 .003 3.40 1.53, 7.57 N stage, tumor size Edge sharpness-WB3 .009 2.62 1.25, 5.50 .002 4.19 1.70, 10.35 N stage Internal air bronchogram† .002 3.11 1.46, 6.62 .020 2.85 1.18, 6.92 N stage, tumor size Satellite nodules in primary .048 2.05 0.99, 4.23 .009 3.07 1.32, 7.11 N stage, tumor tumor lobe† size Lobulated margin† .028 2.24 1.07, 4.68 .179 1.70 0.78, 3.7 N stage, tumor size Pleural attachment† .034 2.16 1.05, 4.47 .019 2.96 1.20, 7.31 N stage Edge shape-RDSmean .022 2.32 1.11, 4.85 .036 2.37 1.06, 5.29 N stage, tumor size Histogram-kurtosis .016 2.43 1.16, 5.09 .008 3.20 1.36, 7.53 N stage, tumor size Edge sharpness-Smin .008 2.66 1.27, 5.58 .014 2.85 1.23, 6.59 N stage, tumor size Lesion size .017 2.40 1.15, 5.02 .016 2.96 1.23, 7.12 N stage Major axis .001 3.61 1.66, 7.88 .002 3.66 1.61, 8.32 N stage Minor axis ,.001 3.79 1.74, 8.26 ,.001 5.36 2.14, 13.38 N stage Edge shape-Min3R .012 2.51 1.20, 5.26 .032 2.46 1.08, 5.60 N stage Edge shape-Mean2R .018 2.38 1.14, 4.98 .016 2.72 1.21, 6.13 N stage, tumor size Good prognosis image features Histogram-B7 .030 0.45 0.22, 0.95 .075 0.49 0.23, 1.07 N stage, tumor size Edge sharpness-WB13 .037 0.47 0.23, 0.97 .053 0.42 0.17, 1.01 N stage, Edge sharpness-WB19 .035 0.47 0.23, 0.96 .125 0.52 0.23, 1.20 N stage, tumor size Edge sharpness-SB29 .009 0.38 0.18, 0.81 .050 0.44 0.20, 1.00 N stage Entering airway† .005 0.36 0.17, 0.75 .023 0.36 0.15, 0.87 N stage Histogram-min .002 0.31 0.15, 0.66 .013 0.34 0.15, 0.80 N stage Edge sharpness-Wmax .039 0.47 0.22, 0.98 .029 0.41 0.18, 0.91 N stage, tumor size Edge sharpness-Wstd .028 0.45 0.21, 0.93 .015 0.36 0.16, 0.82 N stage, tumor size Edge sharpness-Composite1 .002 0.32 0.15, 0.68 .003 0.25 0.10, 0.63 N stage Edge sharpness-Composite2 .002 0.31 0.15, 0.66 .005 0.28 0.11, 0.69 N stage Edge shape-Max3R .019 0.42 0.20, 0.88 .143 0.53 0.23, 1.24 N stage

Note.—The univariate results refer to evaluating a predicted image feature without other clinical covariates. The multivariate results show the results of the univariate analysis after correcting for clinical covariates available in the Lee et al (13) data set. N stage refers to nodal stage. * Single image feature plus clinical covariates. † Semantic feature. and predicting survival outcomes. This available database for associating com- of reproducible semantic and computed finding alone suggests that the imaging putationally derived image features image features further permits images community may greatly benefit from with clinical outcomes (11). Moreover, to be minable like genomic data (34). the creation of a large-scale publicly the creation of a lexicon and ontology Over the longer term, a minable image

394 radiology.rsna.org n Radiology: Volume 264: Number 2—August 2012 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

Figure 5

Figure 5: Multivariate survival analysis for predicted image features evaluated in the Lee et al data set (13) for RFS. Graphs show Kaplan- Meier survival curves for selected multivariate models on the Lee et al external data set. Left: Multivariate model using all predicted image features. Middle: Multivariate model using all image features except the three size features: lesion size, minor axis, and major axis. Right: Multivariate model using only semantic features. database, together with data semantic features from a controlled vo- sard Bonnington. Other relationships: none to and clinical outcomes, would enable cabulary. Multiple radiologists would disclose. D.L.R. No potential conflicts of interest to disclose. S.N. No potential conflicts of inter- an analysis of the added value of im- capture variability in these interpreta- est to disclose. S.K.P. Financial activities related age and genomic features in clinical tions. Variability in selected regions of to the present article: institution received grant decision making. interest may result in variations in the from GE Healthcare and GE Medical Systems. The limitations of our radiogenom- computational features and their as- Financial activities not related to the present ar- ticle: none to disclose. Other relationships: none ics analysis relate to the data used as sociations with the image features. A to disclose. proof-of-concept. We analyzed data in larger study and an external validation a small cohort of patients with NSCLC, cohort will enable investigation of these References which did not fully capture the variabil- limiting factors underlying our analysis. ity of the disease at imaging and gene In summary, by mapping image fea- 1. Sotiriou C. Molecular biology in oncology expression profiling, nor the variability tures to gene expression data, we were and its influence on clinical practice: gene expression profiling [abstr]. Ann Oncol due to histologic subtype. Also owing able to leverage public gene expression 2009;20(Suppl 4):v10. to the absence of a high-quality thin- microarray data to assess prognosis section CT validation data set, we used and, where available, therapeutic re- 2. Pao W, Kris MG, Iafrate AJ, et al. Integration of cross-validation techniques to estimate sponse, as a function of image features. molecular profiling into the lung cancer clinic. Clin Cancer Res 2009;15(17):5317–5322. the correlation with survival. In addi- Even though we focused on CT and PET tion, we did not consider variability in images in patients with NSCLC, our 3. Gevaert O, De Moor B. Prediction of cancer image acquisition, reconstruction, and methods are sufficiently general that outcome using DNA microarray technology: past, present and future. Expert Opin Med interpretation. The CT-derived compu- they can be applied to other imaging Diagn 2009;3(2):157–165. tational image features may be sensitive modalities and diseases. to the methods of image formation, in- 4. Segal E, Sirlin CB, Ooi C, et al. Decoding global gene expression programs in liver cluding CT reconstruction kernel, sec- Acknowledgments: We are grateful to Andrew Gentles, PhD, and Professors Robert Tibshirani, cancer by noninvasive imaging. Nat Biotech- tion thickness, and pixel size. Similarly, PhD, and Maximilian Diehn, MD, PhD, for help- nol 2007;25(6):675–680. PET SUVs may show similar variations ful discussions. We are also grateful to Professor as a function of scanner design and re- Jhingook Kim and his colleagues for providing 5. Diehn M, Nardini C, Wang DS, et al. Identi- fication of noninvasive imaging surrogates for construction parameters. We limited additional clinical annotation for the Lee et al data set. brain tumor gene-expression modules. Proc the feature extraction of the PET image Natl Acad Sci U S A 2008;105(13):5213–5218. to the most important clinical value— Disclosures of Potential Conflicts of Interest: 6. Kuo MD, Gollub J, Sirlin CB, Ooi C, Chen X. namely, maximum SUV—yet it is cer- O.G. No potential conflicts of interest to dis- close. J.X. No potential conflicts of interest to Radiogenomic analysis to identify imaging tainly possible to extract a multitude disclose. C.D.H. No potential conflicts of inter- phenotypes associated with drug response of image features from PET. Finally, a est to disclose. A.N.L. No potential conflicts of gene expression programs in hepatocellular single thoracic radiologist identified a interest to disclose. Y.X. No potential conflicts carcinoma. J Vasc Interv Radiol 2007;18(7): region of interest in a representative of interest to disclose. A.Q. Financial activities 821–831. related to the present article: none to disclose. cross section for computational feature Financial activities not related to the present 7. Rutman AM, Kuo MD. Radiogenomics: creat- extraction and selected the descriptive article: has served as an expert witness for Has- ing a link between molecular diagnostics and

Radiology: Volume 264: Number 2—August 2012 n radiology.rsna.org 395 COMPUTER APPLICATIONS: Prognostic Imaging Biomarkers in Non–Small Cell Lung Cancer Gevaert et al

diagnostic imaging. Eur J Radiol 2009;70(2): 17. Riely GJ, Marks J, Pao W. KRAS mutations 27. Xi LQ, Lyons-Weiler J, Coello MC, et al. 232–241. in non-small cell lung cancer. Proc Am Tho- Prediction of lymph node metastasis by rac Soc 2009;6(2):201–205. analysis of gene expression profiles in pri- 8. Jemal A, Siegel R, Xu J, Ward E. Cancer sta- mary lung adenocarcinomas. Clin Cancer tistics, 2010. CA Cancer J Clin 2010;60(5): 18. Brose MS, Volpe P, Feldman M, et al. BRAF and Res 2005;11(11):4128–4135. 277–300. RAS mutations in human lung cancer and mela- noma. Cancer Res 2002;62(23):6997–7000. 28. Port JL, Kent MS, Korst RJ, Libby D, Pas- 9. National Lung Screening Trial Research Team, mantier M, Altorki NK. Tumor size predicts Aberle DR, Berg CD, et al. The National Lung 19. Bild AH, Yao G, Chang JT, et al. Onco- survival within stage IA non-small cell lung Screening Trial: overview and study design. genic pathway signatures in human cancer. Chest 2003;124(5):1828–1833. Radiology 2011;258(1):243–253. as a guide to targeted therapies. Nature 2006;439(7074):353–357. 29. Massarelli E, Varella-Garcia M, Tang X, et 10. National Lung Screening Trial Research al. KRAS mutation is an important predic- Team, Aberle DR, Adams AM, et al. Re- 20. Manalo DJ, Rowan A, Lavoie T, et al. Tran- tor of resistance to therapy with epidermal duced lung-cancer mortality with low-dose scriptional regulation of vascular endothelial growth factor receptor tyrosine kinase in- computed tomographic screening. N Engl J cell responses to hypoxia by HIF-1. Blood hibitors in non-small-cell lung cancer. Clin Med 2011;365(5):395–409. 2005;105(2):659–669. Cancer Res 2007;13(10):2890–2896. 11. Barrett T, Troup DB, Wilhite SE, et al. 21. Elvidge GP, Glenny L, Appelhoff RJ, Ratcliffe PJ, 30. Mascaux C, Iannino N, Martin B, et al. The NCBI GEO: archive for high-throughput Ragoussis J, Gleadle JM. Concordant reg- role of RAS oncogene in survival of patients functional genomic data. Nucleic Acids Res ulation of gene expression by hypoxia and with lung cancer: a systematic review of the 2009;37(Database issue):D885–D890. 2-oxoglutarate-dependent dioxygenase literature with meta-analysis. Br J Cancer inhibition: the role of HIF-1alpha, HIF- 12. Parkinson H, Sarkans U, Kolesnikov N, et 2005;92(1):131–139. 2alpha, and other pathways. J Biol Chem al. ArrayExpress update—an archive of mi- 2006;281(22):15215–15226. 31. Yoshino I, Nakanishi R, Kodate M, et al. croarray and high-throughput sequencing- Pleural retraction and intra-tumoral air-bron- based functional genomics experiments. 22. Sweet-Cordero A, Mukherjee S, Subra- chogram as prognostic factors for stage I pul- Nucleic Acids Res 2011;39(Database issue): manian A, et al. An oncogenic KRAS2 monary adenocarcinoma following complete D1002–D1004. expression signature identified by cross- resection. Int Surg 2000;85(2):105–112. species gene-expression analysis. Nat Genet 13. Lee ES, Son DS, Kim SH, et al. Prediction 2005;37(1):48–55. 32. Ikehara M, Saito H, Kondo T, et al. Compar- of recurrence-free survival in postoperative ison of thin-section CT and pathological find- non-small cell lung cancer patients by using 23. Mazumdar J, Hickey MM, Pant DK, et al. ings in small solid-density type pulmonary an integrated model of clinical informa- HIF-2alpha deletion promotes Kras-driven adenocarcinoma: prognostic factors from CT tion and gene expression. Clin Cancer Res lung tumor development. Proc Natl Acad Sci findings. Eur J Radiol 2012;81(1):189–194. 2008;14(22):7397–7404. U S A 2010;107(32):14182–14187. 33. Travis WD, Brambilla E, Noguchi M, et al. 14. Tusher VG, Tibshirani R, Chu G. Signifi- 24. Bergamaschi A, Tagliabue E, Sørlie T, et al. International association for the study of cance analysis of microarrays applied to the Extracellular matrix signature identifies breast lung cancer/american thoracic society/euro- ionizing radiation response. Proc Natl Acad cancer subgroups with different clinical out- pean respiratory society international multi- Sci U S A 2001;98(9):5116–5121. come. J Pathol 2008;214(3):357–367. disciplinary classification of lung adenocar- 15. Friedman J, Hastie T, Tibshirani R. Regu- 25. Charafe-Jauffret E, Ginestier C, Monville F, cinoma. J Thorac Oncol 2011;6(2):244–285. larization paths for generalized linear et al. Gene expression profiling of breast cell 34. Rubin DL, Rodriguez C, Shah P, Beaulieu C. models via coordinate descent. J Stat Softw lines identifies potential new basal markers. iPad: Semantic annotation and markup of 2010;33(1):1–22. Oncogene 2006;25(15):2273–2284. radiological images. AMIA Annu Symp Proc 16. Inoue A, Nukiwa T. Gene mutations in lung 26. Seo DC, Sung JM, Cho HJ, et al. Gene ex- 2008:626–630. cancer: promising predictive factors for the pression profiling of cancer stem cell in hu- success of molecular therapy. PLoS Med man lung adenocarcinoma A549 cells. Mol 2005;2(1):e13. Cancer 2007;6:75.

396 radiology.rsna.org n Radiology: Volume 264: Number 2—August 2012