Supplementary Figure 1

[Figure: schematic of GEM development. Recoverable panel labels: NCI-60 Panel; Breast Cancer Patient Set (N=251, Miller 2005); Previous Breast Cancer Patient Set (N=133, Hess et al. '05); Tumor Sample Taken, Profiling, TFAC Chemotherapy Treatment, Responders and Non-Responders; COXEN Feature Selection and Model Training; In vitro COXEN GEM and In vivo COXEN GEM for each of T, F, A, and C; In vitro T/FAC COXEN GEM, In vivo T/FAC COXEN GEM, and DLDA31 GEM; Prospective Applications of GEMs to the TFAC-Treated 100-Patient Set.]

SUPPLEMENTARY MATERIALS

Supplementary Methods

Gene expression profiling

Needle aspiration specimens of the cancer were placed into RNAlater™ solution (Qiagen) and stored at -80°C until further analysis. RNA was extracted using the RNeasy Kit™ (Qiagen, Valencia, CA). The amount and quality of RNA were assessed with a DU-640 UV spectrophotometer (Beckman Coulter, Fullerton, CA) and were considered adequate for further analysis if the OD260/280 ratio was >1.8 and the total RNA yield was >1 µg. A single-round T7 amplification was used to generate biotin-labeled cRNA for hybridization to Affymetrix U133A chips overnight at 42°C. Normalization and quality control assessment of the hybridization results were performed with SimpleAffy and other Bioconductor software packages (http://www.bioconductor.org); the percent present call had to be >30%, the scaling factor <3, and the 3'/5' ratios <3 for beta-actin and <1.3 for GAPDH.
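The acceptance criteria above can be collected into a single check. The following is a minimal sketch, not the authors' code: the `ArrayQC` record and its field names are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class ArrayQC:
    """Hypothetical per-sample QC record; field names are illustrative."""
    od_260_280: float        # OD260/280 ratio of the RNA
    rna_yield_ug: float      # total RNA yield in micrograms
    percent_present: float   # percent present call of the array
    scaling_factor: float    # array scaling factor
    actin_3_5_ratio: float   # 3'/5' ratio for beta-actin
    gapdh_3_5_ratio: float   # 3'/5' ratio for GAPDH


def passes_qc(qc: ArrayQC) -> bool:
    """Return True if the sample meets every threshold stated in the text."""
    return (
        qc.od_260_280 > 1.8
        and qc.rna_yield_ug > 1.0
        and qc.percent_present > 30.0
        and qc.scaling_factor < 3.0
        and qc.actin_3_5_ratio < 3.0
        and qc.gapdh_3_5_ratio < 1.3
    )
```

A sample failing any single criterion is rejected, which matches the conjunctive wording of the criteria.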

Clinical Nomogram

The clinical variable-based nomogram published by Rouzier et al. to predict pCR after preoperative chemotherapy combines information from patient age, tumor size, ER status, tumor grade, and the type and number of chemotherapy treatments (Rouzier R et al. Nomograms to predict pathologic complete response and metastasis-free survival after preoperative chemotherapy for breast cancer. J Clin Oncol, 2005). The equation for the probability of achieving pCR differed between patients who received anthracycline-based chemotherapy without paclitaxel and those who received anthracycline-based chemotherapy with paclitaxel:

i) For patients who received anthracycline-based chemotherapy without paclitaxel:

P = 1/(1 + exp(-X)).

ii) For patients who received anthracycline-based chemotherapy with paclitaxel:

P = 1/(1 + exp(-(2.311 - 0.7285*P + 1.4891*X))), with P = 1 for weekly paclitaxel and P = 2 for 3-weekly paclitaxel (inside the exponent, P denotes the paclitaxel schedule, not the probability).

In both equations, X = -5.85766 + 1.11027*V1 - 0.02247*V2 - 0.46632*V3 + 0.91987*V4 - 0.85084*V5, where V1 was the number of preoperative courses (3 or 4), V2 was age in years, V3 was T of the TNM classification (0, 1, 2, 3 or 4), V4 was histological grade (1, 2 or 3), and V5 was estrogen receptor status (0 or 1). This clinical response prediction calculator is freely accessible online at: http://www.mdanderson.org/pcr.
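The two nomogram equations can be evaluated directly. The sketch below uses the coefficients exactly as printed above; the function and argument names are our own, and the paclitaxel schedule is passed as a separate argument to avoid the double use of the symbol P.

```python
import math


def linear_predictor(n_courses, age_years, t_stage, grade, er_status):
    """X from the text: V1..V5 are courses, age, T stage, grade, ER status."""
    return (-5.85766
            + 1.11027 * n_courses
            - 0.02247 * age_years
            - 0.46632 * t_stage
            + 0.91987 * grade
            - 0.85084 * er_status)


def pcr_probability(x, paclitaxel_schedule=None):
    """Probability of pCR for linear predictor x.

    paclitaxel_schedule: None for anthracycline-based chemotherapy without
    paclitaxel, 1 for weekly paclitaxel, 2 for 3-weekly paclitaxel.
    """
    if paclitaxel_schedule is None:
        z = x
    else:
        z = 2.311 - 0.7285 * paclitaxel_schedule + 1.4891 * x
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative patient (values chosen only for demonstration):
# 4 courses, age 50, T2, grade 3, ER-negative.
x = linear_predictor(4, 50, 2, 3, 0)
p_without = pcr_probability(x)
p_weekly = pcr_probability(x, paclitaxel_schedule=1)
p_3weekly = pcr_probability(x, paclitaxel_schedule=2)
```

Because the schedule coefficient is negative, the weekly schedule always yields a higher predicted probability than the 3-weekly schedule for the same X.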

Using this nomogram, we calculated the probability of pCR for each patient and compared the nomogram-predicted pCR rates to the observed rates. The performance of the nomogram is described in terms of discrimination and calibration.

Discrimination (i.e., whether the relative ranking of individual predictions is in the correct order) was quantified with the area under the receiver operating characteristic (ROC) curve (AUC). The AUC is a summary measure of the ROC curve that reflects the ability of a test to discriminate between a diseased and a non-diseased subject across all possible levels of positivity. A 95% confidence interval (CI) was calculated for each AUC. The AUC ranges from 0 to 1, with 1 indicating perfect concordance, 0.5 indicating no better concordance than chance, and 0 indicating perfect discordance (Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982).
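The concordance interpretation of the AUC can be made concrete with a small example. This sketch computes the AUC as the fraction of (event, non-event) pairs in which the event case received the higher predicted probability, counting ties as one half. It is an illustration of the definition, not the routine used in the study, and it does not compute the confidence interval.

```python
def auc(scores, labels):
    """AUC as a pairwise concordance probability.

    scores: predicted probabilities; labels: 1 for event (e.g., pCR), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # correctly ordered pair
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of the size discussed here.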

Calibration (i.e., agreement between observed outcome frequencies and predicted probabilities) was studied from graphical representations of the relationship between the observed outcome frequencies and the predicted probabilities (calibration curves). A calibration curve can be approximated by a regression line with intercept α and slope β. These parameters can be estimated in a logistic regression model (LRM) with the event as outcome and the linear predictor as the only covariate. Well-calibrated models have α = 0 and β = 1. Therefore, a sensible measure of calibration is a likelihood ratio statistic testing the null hypothesis that α = 0 and β = 1. The statistic (the unreliability [U] statistic) has a χ² distribution with 2 degrees of freedom (Cox D: Two further applications of a model for binary regression. Biometrika, 1958).

We also evaluated the average error (E aver) and the maximal error (E max) between predictions and observations obtained from a calibration curve. All analyses were performed using R, version 2.8.0, with the Design, Hmisc, and Verification libraries (http://lib.stat.cmu.edu/R/CRAN/).
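To illustrate the calibration intercept, slope, and likelihood ratio test, the sketch below fits the logistic recalibration model by Newton-Raphson with NumPy. It is an assumption-laden illustration rather than the R routine used in the study: the function name is ours, the linear predictor is taken to be the logit of the predicted probability, and the average and maximal errors are not computed.

```python
import numpy as np


def calibration_fit(p_pred, y, n_iter=50):
    """Fit logit(P[y=1]) = alpha + beta * logit(p_pred).

    Returns (alpha, beta, lr_stat), where lr_stat is the likelihood ratio
    statistic for H0: alpha = 0 and beta = 1 (2 degrees of freedom).
    """
    p_pred = np.clip(np.asarray(p_pred, dtype=float), 1e-12, 1 - 1e-12)
    y = np.asarray(y, dtype=float)
    lp = np.log(p_pred / (1 - p_pred))          # linear predictor
    X = np.column_stack([np.ones_like(lp), lp])

    def loglik(coef):
        eta = X @ coef
        return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

    coef = np.array([0.0, 1.0])                 # start at perfect calibration
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ coef)))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hess, grad)
        coef = coef + step
        if np.max(np.abs(step)) < 1e-10:
            break

    lr_stat = 2.0 * (loglik(coef) - loglik(np.array([0.0, 1.0])))
    return coef[0], coef[1], lr_stat
```

The statistic would be compared with a χ² distribution with 2 degrees of freedom; the fit assumes the outcomes are not perfectly separated by the predictions.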

The “Co-eXpression ExtrapolatioN” (COXEN) algorithm

To calculate the in vitro COXEN GEM scores, the data were normalized using RMAExpress 1.0 software and standardized by subtracting the mean and dividing the result by the standard deviation for each probe-level measurement. The COXEN algorithm is composed of six distinct steps. The end result is what we term the “COXEN Score,” which reflects the predicted sensitivity of a particular cell line or tumor to the specific drug being evaluated by the algorithm. Generically, the steps for prediction of a drug’s activity in cells belonging to some Set 2 on the basis of its activity pattern in different cells of some Set 1 are as follows:

Step 1. Experimentally determine the drug’s pattern of activity in cells of Set 1.

Step 2. Experimentally measure molecular characteristics of the cells in Set 1.

Step 3. Select a subset of those molecular characteristics that most accurately predicts the drug’s activity in cell Set 1 (“chemosensitivity signature” selection).

Step 4. Experimentally measure the same molecular characteristics of the cells in Set 2.

Step 5. Among the molecular characteristics selected in Step 3, identify a subset that shows a strong pattern of “co-expression extrapolation” between cell Sets 1 and 2.

Step 6. Use a multivariate algorithm to predict the drug’s activity in Set 2 cells on the basis of the drug’s activity pattern in Set 1 and the molecular characteristics of Set 2 selected in Step 5. The output of the multivariate analysis is a COXEN Score.

In our first application of the COXEN algorithm, for example, Cell Sets 1 and 2 were the NCI-60 and a training set of human breast cancer patients (n=251; BR-251; Miller et al., An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci U S A, 2005); the Step 1 drug activities were those assessed by the DTP in the NCI-60; the “molecular characteristics” in Steps 2 and 4 were transcript expression levels, as assessed using Affymetrix HG-U133A microarrays; the algorithm in Step 3 was “Significance Analysis of Microarrays” (SAM) or similar statistical testing for differential expression; Step 5 was a novel “co-expression extrapolation” algorithm we developed; and Step 6 was a refined classification algorithm, “Misclassification-Penalized Posterior” (MiPP), which we recently introduced for selection of the best mathematical “models” for such predictions (Soukup et al., Robust classification modeling on microarray data using misclassification penalized posterior. Bioinformatics, 2005). MiPP generates the final COXEN Score. As will be discussed below, COXEN predictions for both cell line and clinical trial drug responses were prospectively and independently validated.

Identification of candidate “chemosensitivity biomarkers” in the NCI-60 panel (Step 3)

For each compound in the public NCI-60 drug database, we identified the top 20% most sensitive and the top 20% most resistant NCI-60 cell lines. Using slightly different percent cutoffs did not change the ultimate results appreciably (data not shown). After selection of NCI-60 cells sensitive and resistant to each of the two compounds, we used the “Significance Analysis of Microarrays” (SAM) test with a false discovery rate (FDR) of 0.1 to identify microarray probe sets expressed differentially between the two cell subsets. Those probe sets can be thought of as candidate “chemosensitivity biomarkers” based on the NCI-60 data.
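The selection of the two extreme cell-line groups can be sketched as follows. This is an illustration only: it assumes that a larger activity value means greater sensitivity (for example, a negative log GI50), which the text does not specify, and it does not implement the SAM test itself.

```python
import numpy as np


def split_sensitive_resistant(activity, fraction=0.2):
    """Return index arrays (sensitive, resistant) for the extreme cell lines.

    activity: one drug-activity value per cell line; by assumption, higher
    values indicate greater sensitivity. fraction: share of cell lines in
    each extreme group (0.2 in the text).
    """
    activity = np.asarray(activity, dtype=float)
    k = max(1, int(round(fraction * len(activity))))
    order = np.argsort(activity)     # ascending: most resistant first
    resistant = order[:k]
    sensitive = order[-k:]
    return sensitive, resistant
```

The two index sets would then define the groups compared in the differential expression test.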

Identification of co-expression extrapolation signatures (Step 5)

To parameterize each probe’s co-expression relationships between two studies mathematically, we calculated a “co-expression extrapolation coefficient” (CEEC), rc(j), for each probe j as follows. Using the probe expression data, we constructed two correlation matrices (of dimension n x n) for the set of n candidate chemosensitivity probes. The two correlation matrices, one for the NCI-60 and the other for the training patient set BR-251, were evaluated as U = [Uij]nxn and V = [Vij]nxn, where Uij and Vij are the correlation coefficients between probes i and j in the NCI-60 and BR-251, respectively. Then, rc(j) is defined as

$$ r_c(j) = \frac{\sum_{k=1}^{n} (U_{kj} - \bar{U}_j)(V_{kj} - \bar{V}_j)}{\sqrt{\sum_{k=1}^{n} (U_{kj} - \bar{U}_j)^2 \, \sum_{k=1}^{n} (V_{kj} - \bar{V}_j)^2}} $$

where $\bar{U}_j$ and $\bar{V}_j$ are the means of the column-j correlation coefficient vectors for the NCI-60 and BR-251. rc(j) reflects the degree of co-expression extrapolation of probe j with the set of n probes between the NCI-60 and BR-251 panels. If rc(j) exceeded a cut-off criterion (e.g., the 98th percentile of the corresponding random distribution generated by randomly shuffling the probe identities between the two sets), probe j was selected as a probe for co-expression extrapolation between the two panels. Since probe j was selected from the set of n candidate chemosensitivity predictors, it had that pharmacological characteristic as well. Note that the CEEC will be high if a probe’s co-expression network relationships with the other probes in the first set (i.e., NCI-60) are concordant with those in the second set (i.e., BR-251). The CEEC, rc(j), is thus simply the correlation coefficient between the column vectors Uj and Vj.
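Because the CEEC is the correlation between matching columns of the two probe-by-probe correlation matrices, it can be computed in a few lines. The sketch below is our own illustration with NumPy; the function name and the input layout (probes in rows, samples in columns, same probe order in both sets) are assumptions, and the permutation-based cut-off is not implemented.

```python
import numpy as np


def ceec(expr_set1, expr_set2):
    """Co-expression extrapolation coefficient r_c(j) for every probe j.

    expr_set1, expr_set2: arrays of shape (n_probes, n_samples) holding the
    same n probes in the same order (e.g., NCI-60 and BR-251). The number of
    samples may differ between the two sets.
    """
    U = np.corrcoef(expr_set1)   # n x n probe correlations in set 1
    V = np.corrcoef(expr_set2)   # n x n probe correlations in set 2
    Uc = U - U.mean(axis=0, keepdims=True)   # center each column j
    Vc = V - V.mean(axis=0, keepdims=True)
    num = (Uc * Vc).sum(axis=0)
    den = np.sqrt((Uc ** 2).sum(axis=0) * (Vc ** 2).sum(axis=0))
    return num / den
```

Probes whose coefficient exceeds the chosen percentile of a shuffled-probe null distribution would then be retained.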

Development of chemosensitivity prediction models for the in vitro COXEN GEM (Step 6)

We searched among the candidate biomarkers obtained from Steps 1-5 for ones that would form optimal parsimonious models for prediction of the compound’s activity. For that purpose, we used the “Misclassification-Penalized Posterior” (MiPP) algorithm. In brief, MiPP is based on stepwise incremental classification modeling for discovery of the most parsimonious prediction models. It includes double cross-validated evaluation of each trained prediction model. Model training can be performed using any of several classification algorithms, such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machines (SVMs), or logistic regression. In the current study we used LDA for most of the applications. In double cross-validation, the first cross-validation is based on random splitting of the whole data set into a training set and an independent test set for external model validation; the second is an n-fold cross-validation on the training set to avoid the pitfalls of a large-screening search and to obtain the most parsimonious optimal prediction models. Independent splits of the data result in multiple prediction models. The multiple models are then re-evaluated using a large number (e.g., 1000) of random splits into test and training sets to obtain confidence bounds on the accuracy of prediction. On the basis of those confidence bounds, the prediction performance and mean misclassification error rates (ER) are obtained for each of the candidate prediction models. The final prediction of a cell line as “sensitive” or “resistant” is determined by the cell’s (posterior) classification probability (CP) from each LDA prediction model. If CP > 0.5 based on the top three to five prediction models, the cell line is considered sensitive; if not, it is considered resistant. We found that optimal MiPP prediction models were often obtained with only a small number of probes, e.g., three or four, among the ones identified in Step 5.

Development of prediction models for the in vivo COXEN GEM (Revised Step 6)

The in vivo COXEN GEMs of the four drugs were developed using the same sets of biomarkers that were used to develop the in vitro COXEN GEMs for the four drugs. However, the multivariate LDA classification modeling was performed using the microarray data of the same previous human patient set used for developing the DLDA31 model (Supplementary Figure S2). This modeling was based on the assumption that the group of patients who responded to the combination chemotherapy should contain a good proportion of responders to each of the four compounds, whereas patients who did not respond to the combination chemotherapy did not respond to any of the four compounds. All other statistical modeling procedures were identical to those used for the in vitro COXEN GEMs.

Combination-drug GEM

Individual compound GEMs are combined to generate the prediction model for each combination chemotherapy under the assumption that the individual compounds in the combination act independently. That is, prediction of combination drug efficacy was obtained from the final single-drug prediction models, directly using each cell line’s classification probabilities from these models. Assuming two different drug compounds act independently, the combination chemosensitivity probability PAB of their combination treatment was derived as:

PAB = 1 – PA[resistant for drug A] x PB[resistant for drug B].

Here PA and PB are the chemosensitivity response probabilities based on the prediction models for compounds A and B, respectively, so that PAB is one minus the probability of being resistant to both drugs. Importantly, this model development and training did not use any clinical information or microarray data from the test sets used for GEM evaluation, thus maintaining strict independence between training and test data sets.
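The independence rule extends naturally from two drugs to any number of drugs. The sketch below assumes that each single-drug model returns a probability of sensitivity and that the probability of resistance is its complement; the function name is ours.

```python
def combination_sensitivity(p_sensitive):
    """Probability of sensitivity to a combination of independent drugs.

    p_sensitive: iterable of single-drug sensitivity probabilities.
    Returns 1 minus the product of the single-drug resistance probabilities.
    """
    prob_all_resistant = 1.0
    for p in p_sensitive:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        prob_all_resistant *= (1.0 - p)   # resistant to this drug as well
    return 1.0 - prob_all_resistant
```

For two drugs with sensitivity probabilities PA and PB this reduces to 1 - (1 - PA)(1 - PB), the expression given in the text.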


Supplementary Table 1: Genes and probe sets used in the DLDA30 and paclitaxel, 5-FU, doxorubicin, and cyclophosphamide COXEN models.

The list of probe sets included in the DLDA30 predictor.

| # | Gene Symbol | Affymetrix Probe Set ID | Transcript ID | GenBank ID | Gene Name |
|---|---|---|---|---|---|
| 1 | MAPT | 203929_s_at | Hs.101174.1 | AI056359 | Microtubule-associated |
| 2 | MAPT | 203930_s_at | g8400712 | NM_016835 | Microtubule-associated protein |
| 3 | BBS4 | 212745_s_at | Hs.26471.0 | AI813772 | Bardet-Biedl syndrome 4 |
| 4 | MAPT | 203928_x_at | Hs.101174.1 | AI870749 | Microtubule-associated protein |
| 5 | THRAP2 | 212207_at | Hs.4084.0 | BG426689 | Thyroid associated protein 2 |
| 6 | MGC5370 | 217542_at | Hs.168732.0 | BE930512 | Hypothetical protein MGC5370 |
| 7 | MAPT | 206401_s_at | g338684 | J03778 | Microtubule-associated protein |
| 8 | ? | 215304_at | Hs.159264.0 | U79293 | Human clone 23948 mRNA sequence |
| 9 | ZNF552 | 219741_x_at | g13443019 | NM_024762 | protein 552 |
| 10 | RAMP1 | 204916_at | g5032018 | NM_005855 | Receptor (calcitonin) activity modifying protein 1 |
| 11 | BECN1 | 208945_s_at | Hs.12272.0 | AF139131 | Beclin 1 (coiled-coil, myosin-like BCL2 interacting protein) |
| 12 | BTG3 | 213134_x_at | Hs.77311.1 | AI765445 | BTG family, member 3 |
| 13 | SCUBE2 | 219197_s_at | Hs.222399.0 | NM_020974 | Signal peptide, CUB domain, EGF-like 2 |
| 14 | MELK | 204825_at | g7661973 | NM_014791 | Maternal embryonic leucine zipper kinase |
| 15 | BTG3 | 205548_s_at | g5802989 | NM_006806 | BTG family, member 3 |
| 16 | AMFR | 202204_s_at | g5931954 | NM_001144 | Autocrine motility factor receptor |
| 17 | CTNND2 | 209617_s_at | g2661061 | AF035302 | Catenin (cadherin-associated protein), delta 2 (neural plakophilin-related arm-repeat protein) |
| 18 | GAMT | 205354_at | g7549759 | NM_000156 | Guanidinoacetate N-methyltransferase |
| 19 | CA12 | 204509_at | g8923149 | NM_017689 | Carbonic anhydrase XII |
| 20 | FGFR1OP | 214124_x_at | Hs.108548.1 | AL043487 | FGFR1 oncogene partner |
| 21 | KIAA1467 | 213234_at | Hs.6189.0 | AB040900 | KIAA1467 protein |
| 22 | MTRN | 219051_x_at | g13128999 | NM_024042 | Meteorin, glial cell differentiation regulator |
| 23 | FLJ10916 | 219044_at | g8922765 | NM_018271 | Hypothetical protein FLJ10916 |
| 24 | | 203693_s_at | g12669913 | NM_001949 | factor 3 |
| 25 | ERBB4 | 214053_at | Hs.390729 | AW772192 | V-erb-a erythroblastic leukemia viral oncogene homolog 4 (avian) |
| 26 | JMJD2B | 215616_s_at | Hs.301011.2 | AB020683 | Jumonji domain containing 2B |
| 27 | RRM2 | 209773_s_at | g12804874 | BC001886 | Ribonucleotide reductase M2 polypeptide |
| 28 | FLJ12650 | 219438_at | g13375663 | NM_024522 | Hypothetical protein FLJ12650 |
| 29 | GFRA1 | 205696_s_at | g4885268 | U97144 | GDNF family receptor 1 |
| 30 | IGFBP4 | 201508_at | g10835020 | NM_001552 | Insulin-like growth factor binding protein 4 |

The list of 25 biomarkers for training a COXEN gene expression model for Paclitaxel.

| # | Gene Symbol | Affymetrix Probe Set ID | Transcript ID | GenBank ID | Gene Name |
|---|---|---|---|---|---|
| 1 | DUSP3 | 201536_at | Hs.181046.0 | AL048503 | dual specificity phosphatase 3 |
| 2 | --- | 202377_at | Hs.23581.0 | AW026535 | --- |
| 3 | CCND1 | 208712_at | g179364 | M73554 | cyclin D1 |
| 4 | PDZD2 | 209493_at | g12751451 | AF338650 | PDZ domain containing 2 |
| 5 | POLE3 | 208828_at | g13278800 | BC004170 | polymerase (DNA directed), epsilon 3 (p17 subunit) |
| 6 | TFDP1 | 212330_at | Hs.79353.2 | R60866 | Dp-1 |
| 7 | PLCB4 | 203896_s_at | g4505866 | AL535113 | phospholipase C, beta 4 |
| 8 | MAP3K9 | 213927_at | Hs.170267.0 | AV753204 | mitogen-activated protein kinase kinase kinase 9 |
| 9 | ITGA3 | 201474_s_at | g4504746 | NM_002204 | integrin, alpha 3 (antigen CD49C, alpha 3 subunit of VLA-3 receptor) |
| 10 | RUVBL1 | 201614_s_at | g4506752 | NM_003707 | RuvB-like 1 (E. coli) |
| 11 | MTHFD2 | 201761_at | g13699869 | NM_006636 | methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2, methenyltetrahydrofolate cyclohydrolase |
| 12 | TFDP1 | 204147_s_at | g6005899 | NM_007111 | transcription factor Dp-1 |
| 13 | DIMT1L | 204405_x_at | g7657197 | NM_014473 | DIM1 dimethyladenosine transferase 1-like (S. cerevisiae) |
| 14 | DFFB | 206752_s_at | g4758149 | NM_004402 | DNA fragmentation factor, 40kDa, beta polypeptide (caspase-activated DNase) |
| 15 | CCND1 | 208711_s_at | g12652656 | BC000076 | cyclin D1 |
| 16 | ANP32B | 201306_s_at | g5454087 | NM_006401 | acidic (leucine-rich) nuclear phosphoprotein 32 family, member B |
| 17 | --- | 210524_x_at | g6683748 | AF078844 | --- |
| 18 | NDST1 | 202607_at | Hs.20894.0 | AL526632 | N-deacetylase/N-sulfotransferase (heparan glucosaminyl) 1 |
| 19 | FEN1 | 204767_s_at | g12653112 | BC000323 | flap structure-specific endonuclease 1 |
| 20 | HNRNPR | 208766_s_at | g12655184 | BC001449 | heterogeneous nuclear ribonucleoprotein R |
| 21 | RBM10 | 208984_x_at | g13278827 | BC004181 | RNA binding motif protein 10 |
| 22 | WWC1 | 213085_s_at | Hs.21543.0 | AB020676 | WW and C2 domain containing 1 |
| 23 | NUP188 | 212691_at | Hs.30002.3 | AW131863 | 188kDa |
| 24 | GNG11 | 204115_at | g4758447 | NM_004126 | guanine nucleotide binding protein (), gamma 11 |
| 25 | UGCG | 204881_s_at | g4507810 | NM_003358 | UDP- ceramide glucosyltransferase |

The list of 23 biomarkers for training a COXEN gene expression model for 5-fluorouracil.

| # | Gene Symbol | Affymetrix Probe Set ID | Transcript ID | GenBank ID | Gene Name |
|---|---|---|---|---|---|
| 1 | TRAP1 | 201391_at | g7706484 | NM_016292 | TNF receptor-associated protein 1 |
| 2 | NPM3 | 205129_at | g6857817 | NM_006993 | nucleophosmin/nucleoplasmin, 3 |
| 3 | DDX28 | 203785_s_at | g8922975 | NM_018380 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 28 |
| 4 | PHB2 | 201600_at | g6005853 | NM_007273 | prohibitin 2 |
| 5 | RNF24 | 204668_at | Hs.30524.0 | AL031670 | ring finger protein 24 |
| 6 | WHSC2 | 203112_s_at | g5032226 | NM_005663 | Wolf-Hirschhorn syndrome candidate 2 |
| 7 | RUVBL1 | 201614_s_at | g4506752 | NM_003707 | RuvB-like 1 (E. coli) |
| 8 | RNPS1 | 200060_s_at | g12804496 | BC001659 | RNA binding protein S1, serine-rich domain |
| 9 | TSFM | 212656_at | Hs.3273.0 | AF110399 | Ts translation elongation factor, mitochondrial |
| 10 | TUFM | 201113_at | g4507732 | NM_003321 | Tu translation elongation factor, mitochondrial |
| 11 | AUH | 205052_at | g4502326 | NM_001698 | AU RNA binding protein/enoyl-Coenzyme A hydratase |
| 12 | ATRN | 212517_at | Hs.194019.0 | AL132773 | attractin |
| 13 | APRT | 203219_s_at | g4502170 | NM_000485 | adenine phosphoribosyltransferase |
| 14 | FYN | 212486_s_at | Hs.169370.1 | N20923 | FYN oncogene related to SRC, FGR, YES |
| 15 | PMM2 | 203201_at | g4557838 | NM_000303 | phosphomannomutase 2 |
| 16 | RSL1D1 | 212018_s_at | Hs.85963.0 | AK025446 | ribosomal L1 domain containing 1 |
| 17 | RNF11 | 208924_at | g5931544 | AB024703 | ring finger protein 11 |
| 18 | DAAM1 | 216060_s_at | Hs.197751.2 | AK021890 | dishevelled associated activator of morphogenesis 1 |
| 19 | STOML2 | 215416_s_at | Hs.3439.1 | AC004472 | stomatin (EPB72)-like 2 |
| 20 | RPL13 | 212933_x_at | Hs.180842.1 | AA961748 | ribosomal protein L13 |
| 21 | DST | 212254_s_at | Hs.198689.0 | AI798790 | dystonin |
| 22 | GCN1L1 | 212139_at | Hs.75354.0 | D86973 | GCN1 general control of amino-acid synthesis 1-like 1 (yeast) |
| 23 | MTHFD2 | 201761_at | g13699869 | NM_006636 | methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 2, methenyltetrahydrofolate cyclohydrolase |

The list of 52 biomarkers for training a COXEN gene expression model for doxorubicin.

| # | Gene Symbol | Affymetrix Probe Set ID | Transcript ID | GenBank ID | Gene Name |
|---|---|---|---|---|---|
| 1 | NSL1 | 209484_s_at | g9295185 | AF201941 | NSL1, MIND complex component, homolog (S. cerevisiae) |
| 2 | PLLP | 204519_s_at | g7705754 | NM_015993 | plasma membrane proteolipid (plasmolipin) |
| 3 | GDPD5 | 213343_s_at | Hs.6748.0 | AL041124 | glycerophosphodiester phosphodiesterase domain containing 5 |
| 4 | GJB3 | 205490_x_at | Hs.98485.0 | NM_024009 | gap junction protein, beta 3, 31kDa |
| 5 | B4GALT4 | 212876_at | Hs.13225.1 | BF223021 | UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 4 |
| 6 | MOBKL1B | 214812_s_at | Hs.322903.0 | D80006 | MOB1, Mps One Binder kinase activator-like 1B (yeast) |
| 7 | TLE1 | 203221_at | Hs.28935.0 | NM_005077 | transducin-like enhancer of split 1 (E(sp1) homolog, Drosophila) |
| 8 | TROVE2 | 210438_x_at | g387656 | M25077 | TROVE domain family, member 2 |
| 9 | DNAJC7 | 202416_at | g4507712 | NM_003315 | DnaJ (Hsp40) homolog, subfamily C, member 7 |
| 10 | CTDSPL | 201906_s_at | g5031774 | NM_005808 | CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase-like |
| 11 | ARHGEF5 | 204765_at | g4885632 | NM_005435 | Rho guanine nucleotide exchange factor (GEF) 5 |
| 12 | TTRAP | 202266_at | g7705261 | NM_016614 | TRAF and TNF receptor associated protein |
| 13 | LGALS3 | 208949_s_at | g12654570 | BC001120 | lectin, galactoside-binding, soluble, 3 |
| 14 | PER1 | 202861_at | g4505712 | NM_002616 | period homolog 1 (Drosophila) |
| 15 | HPRT1 | 202854_at | g4504482 | NM_000194 | hypoxanthine phosphoribosyltransferase 1 |
| 16 | PDXK | 202671_s_at | g4505700 | NM_003681 | pyridoxal (pyridoxine, vitamin B6) kinase |
| 17 | TGFBR3 | 204731_at | g4507470 | NM_003243 | transforming growth factor, beta receptor III |
| 18 | TMEM184B | 202027_at | g7110634 | NM_012264 | transmembrane protein 184B |
| 19 | ERAL1 | 212087_s_at | Hs.3426.0 | AL562733 | Era G-protein-like 1 (E. coli) |
| 20 | TIPRL | 214773_x_at | Hs.137576.1 | AI983505 | TIP41, TOR signaling pathway regulator-like (S. cerevisiae) |
| 21 | EIF6 | 210213_s_at | g2809382 | AF022229 | eukaryotic translation initiation factor 6 |
| 22 | WSB1 | 210561_s_at | g5817189 | AL110243 | WD repeat and SOCS box-containing 1 |
| 23 | CTTN | 201059_at | g4885204 | NM_005231 | cortactin |
| 24 | MAPK6 | 207121_s_at | g4506090 | NM_002748 | mitogen-activated protein kinase 6 |
| 25 | GIPC1 | 207525_s_at | g5031714 | NM_005716 | GIPC PDZ domain containing family, member 1 |
| 26 | TERF2IP | 201174_s_at | g9507032 | NM_018975 | telomeric repeat binding factor 2, interacting protein |
| 27 | CPN1 | 206256_at | g4503010 | NM_001308 | carboxypeptidase N, polypeptide 1 |
| 28 | ERCC8 /// FLJ12595 | 205162_at | g4557466 | NM_000082 | excision repair cross-complementing rodent repair deficiency, complementation group 8 /// guanine nucleotide binding protein-like 3 (nucleolar)-like |
| 29 | LAMC2 | 202267_at | g5031846 | NM_005562 | laminin, gamma 2 |
| 30 | hCG_16001 /// RPL23A | 213084_x_at | Hs.194657.4 | BF125158 | similar to ribosomal protein L23A /// ribosomal protein L23a |
| 31 | VIPR1 | 205019_s_at | g4759307 | NM_004624 | vasoactive intestinal peptide receptor 1 |
| 32 | PAXIP1 | 212825_at | Hs.173854.0 | AI357401 | PAX interacting (with transcription-activation domain) protein 1 |
| 33 | INPP4B | 205376_at | g4504706 | NM_003866 | inositol polyphosphate-4-phosphatase, type II, 105kDa |
| 34 | ZNF207 | 200829_x_at | g4508016 | NM_003457 | zinc finger protein 207 |
| 35 | LMNA | 212089_at | Hs.77886.2 | M13452 | lamin A/C |
| 36 | APOB | 205108_s_at | g4502152 | NM_000384 | apolipoprotein B (including Ag(x) antigen) |
| 37 | CHKA | 204266_s_at | g4557454 | NM_001277 | choline kinase alpha |
| 38 | PPFIA1 | 202066_at | Hs.183648.0 | AA195259 | protein tyrosine phosphatase, receptor type, f polypeptide (PTPRF), interacting protein (liprin), alpha 1 |
| 39 | AVPR1B | 208260_at | g13112055 | NM_000707 | arginine vasopressin receptor 1B |
| 40 | TLK2 | 212997_s_at | Hs.57553.1 | AU151689 | tousled-like kinase 2 |
| 41 | TSPAN1 | 209114_at | g6434903 | AF133425 | tetraspanin 1 |
| 42 | GPRC5A | 203108_at | g12056470 | NM_003979 | G protein-coupled receptor, family C, group 5, member A |
| 43 | GRB14 | 206204_at | g4758477 | NM_004490 | growth factor receptor-bound protein 14 |
| 44 | ARHGAP19 | 212738_at | Hs.80305.0 | AV717623 | Rho GTPase activating protein 19 |
| 45 | CRLF3 | 205474_at | g7705331 | NM_015986 | cytokine receptor-like factor 3 |
| 46 | KIAA0947 | 209654_at | g13436178 | BC004902 | KIAA0947 protein |
| 47 | --- | 212444_at | Hs.288660.0 | AA156240 | --- |
| 48 | SLC25A16 | 214140_at | Hs.180408.1 | AI827990 | solute carrier family 25 (mitochondrial carrier; Graves disease autoantigen), member 16 |
| 49 | CTSF | 203657_s_at | g6042195 | NM_003793 | cathepsin F |
| 50 | MST1R | 205455_at | g4505264 | NM_002447 | macrophage stimulating 1 receptor (c-met-related ) |
| 51 | POLE3 | 208828_at | g13278800 | BC004170 | polymerase (DNA directed), epsilon 3 (p17 subunit) |
| 52 | CROP | 203804_s_at | g5174618 | NM_006107 | cisplatin resistance-associated overexpressed protein |

The list of 184 biomarkers for training a COXEN gene expression model for cyclophosphamide.

| # | Gene Symbol | Affymetrix Probe Set ID | Transcript ID | GenBank ID | Gene Name |
|---|---|---|---|---|---|
| 1 | BCL2L2 | 209311_at | g1944417 | D87461 | BCL2-like 2 |
| 2 | DHODH | 213632_at | Hs.94925.0 | M94065 | dihydroorotate dehydrogenase |
| 3 | MPHOSPH9 | 215731_s_at | Hs.86178.2 | X98258 | M-phase phosphoprotein 9 |
| 4 | EDC4 | 202496_at | g7657509 | NM_014329 | enhancer of mRNA decapping 4 |
| 5 | TAF15 | 202840_at | g4507352 | NM_003487 | TAF15 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 68kDa |
| 6 | EP400 | 212375_at | Hs.306094.0 | BE880591 | E1A binding protein p400 |
| 7 | DYRK4 | 212954_at | Hs.17154.0 | AF263541 | dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 4 |
| 8 | CES2 | 213509_x_at | Hs.282975.1 | AW157619 | carboxylesterase 2 (intestine, liver) |
| 9 | NAE1 | 202268_s_at | g4502168 | NM_003905 | NEDD8 activating enzyme E1 subunit 1 |
| 10 | C12orf24 | 204521_at | g9558740 | NM_013300 | 12 open reading frame 24 |
| 11 | FMNL1 | 204789_at | g5174400 | NM_005892 | formin-like 1 |
| 12 | RBL2 | 212331_at | Hs.79362.0 | X76061 | retinoblastoma-like 2 (p130) |
| 13 | CHD9 | 212615_at | Hs.10351.0 | AI742305 | chromodomain helicase DNA binding protein 9 |
| 14 | ERAL1 | 212087_s_at | Hs.3426.0 | AL562733 | Era G-protein-like 1 (E. coli) |
| 15 | RRBP1 | 201204_s_at | Hs.98614.0 | AA706065 | ribosome binding protein 1 homolog 180kDa (dog) |
| 16 | SFPQ | 201586_s_at | g4826997 | NM_005066 | splicing factor proline/glutamine-rich (polypyrimidine tract binding protein associated) |
| 17 | ATP6V1A | 201972_at | g6523820 | AF113129 | ATPase, H+ transporting, lysosomal 70kDa, V1 subunit A |
| 18 | NUP93 | 202188_at | g7661901 | NM_014669 | nucleoporin 93kDa |
| 19 | MYL6B | 204173_at | g4505302 | NM_002475 | myosin, light chain 6B, alkali, smooth muscle and non-muscle |
| 20 | MPHOSPH9 | 206205_at | g13430861 | NM_022782 | M-phase phosphoprotein 9 |
| 21 | C1orf114 | 206721_at | g10880974 | NM_021179 | open reading frame 114 |
| 22 | CIAPIN1 | 208424_s_at | g10092672 | NM_020313 | cytokine induced apoptosis inhibitor 1 |
| 23 | PFAS | 213302_at | Hs.105478.0 | AL044326 | phosphoribosylformylglycinamidine synthase |
| 24 | CBFB | 202370_s_at | g13124872 | NM_001755 | core-binding factor, beta subunit |
| 25 | SNRPF | 203832_at | g4507130 | NM_003095 | small nuclear ribonucleoprotein polypeptide F |
| 26 | SMARCD1 | 209518_at | Hs.79335.1 | AI869240 | SWI/SNF related, matrix associated, dependent regulator of , subfamily d, member 1 |
| 27 | CSNK2A2 | 203575_at | g4503096 | NM_001896 | , alpha prime polypeptide |
| 28 | MOBKL1B | 214812_s_at | Hs.322903.0 | D80006 | MOB1, Mps One Binder kinase activator-like 1B (yeast) |
| 29 | TRAT1 | 217147_s_at | Hs.138701.1 | AJ240085 | T cell receptor associated transmembrane adaptor 1 |
| 30 | SCCPDH | 201826_s_at | g7705766 | NM_016002 | saccharopine dehydrogenase (putative) |
| 31 | NAP1L1 | 204528_s_at | g4758755 | NM_004537 | assembly protein 1-like 1 |
| 32 | CYTH3 | 206523_at | g8670548 | NM_004227 | cytohesin 3 |
| 33 | TERT | 207199_at | g4507438 | NM_003219 | telomerase reverse transcriptase |
| 34 | ADH1A | 207820_at | g11496886 | NM_000667 | alcohol dehydrogenase 1A (class I), alpha polypeptide |
| 35 | LSR | 208190_s_at | g7706247 | NM_015925 | lipolysis stimulated lipoprotein receptor |
| 36 | CASC3 | 207842_s_at | g6678887 | NM_007359 | cancer susceptibility candidate 3 |
| 37 | ALDH1B1 | 209645_s_at | Hs.169517.0 | NM_000692 | aldehyde dehydrogenase 1 family, member B1 |
| 38 | GSN | 200696_s_at | g4504164 | NM_000177 | gelsolin (amyloidosis, Finnish type) |
| 39 | SMARCE1 | 211988_at | Hs.332848.0 | BG289800 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily e, member 1 |
| 40 | DAAM1 | 216060_s_at | Hs.197751.2 | AK021890 | dishevelled associated activator of morphogenesis 1 |
| 41 | TOP2A | 201292_at | Hs.156346.0 | AL561834 | (DNA) II alpha 170kDa |
| 42 | LGALS3 | 208949_s_at | g12654570 | BC001120 | lectin, galactoside-binding, soluble, 3 |
| 43 | EIF2C2 | 213310_at | Hs.324504.1 | AI613483 | Eukaryotic translation initiation factor 2C, 2 |
| 44 | GDPD5 | 213343_s_at | Hs.6748.0 | AL041124 | glycerophosphodiester phosphodiesterase domain containing 5 |
| 45 | KARS | 200840_at | g5031814 | NM_005548 | lysyl-tRNA synthetase |
| 46 | AZIN1 | 201772_at | g7706219 | NM_015878 | antizyme inhibitor 1 |
| 47 | NSL1 | 209484_s_at | g9295185 | AF201941 | NSL1, MIND kinetochore complex component, homolog (S. cerevisiae) |
| 48 | JAM3 /// LOC100133502 | 212813_at | Hs.55016.1 | AA149644 | junctional adhesion molecule 3 /// hypothetical protein LOC100133502 |
| 49 | PAXIP1 | 212825_at | Hs.173854.0 | AI357401 | PAX interacting (with transcription-activation domain) protein 1 |
| 50 | HNRNPA1 | 214280_x_at | Hs.249495.7 | X79536 | heterogeneous nuclear ribonucleoprotein A1 |
| 51 | TMED10 | 200929_at | g5803200 | NM_006827 | transmembrane emp24-like trafficking protein 10 (yeast) |
| 52 | RPN1 | 201011_at | g4506674 | NM_002950 | ribophorin I |
| 53 | CYP1A1 | 205749_at | g13325053 | NM_000499 | cytochrome P450, family 1, subfamily A, polypeptide 1 |
| 54 | TYMS | 202589_at | g4507750 | NM_001071 | thymidylate synthetase |
| 55 | MED1 | 203497_at | g4759265 | NM_004774 | mediator complex subunit 1 |
| 56 | CALCR | 207886_s_at | g4502546 | NM_001742 | calcitonin receptor |
| 57 | PSMD7 | 201705_at | g4506230 | NM_002811 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 7 |
| 58 | LAD1 | 203287_at | g5031844 | NM_005558 | ladinin 1 |
| 59 | CRLF3 | 205474_at | g7705331 | NM_015986 | cytokine receptor-like factor 3 |
| 60 | NFATC3 | 210555_s_at | g1835590 | U85430 | nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 3 |
| 61 | RPL6 | 200034_s_at | g4506656 | NM_000970 | ribosomal protein L6 |
| 62 | ELAC2 | 201767_s_at | g11875212 | NM_018127 | elaC homolog 2 (E. coli) |
| 63 | LANCL1 | 202020_s_at | g5174444 | NM_006055 | LanC lantibiotic synthetase component C-like 1 (bacterial) |
| 64 | HPRT1 | 202854_at | g4504482 | NM_000194 | hypoxanthine phosphoribosyltransferase 1 |
| 65 | FANCG /// VCP | 203564_at | g4759335 | NM_004629 | Fanconi anemia, complementation group G /// valosin-containing protein |
| 66 | PCDH1 | 203918_at | g4505630 | NM_002587 | protocadherin 1 |
| 67 | ZNF84 | 204453_at | g4508036 | NM_003428 | zinc finger protein 84 |
| 68 | PKIA | 204612_at | g5803126 | NM_006823 | protein kinase (cAMP-dependent, catalytic) inhibitor alpha |
| 69 | TAGLN3 | 204743_at | g10047091 | NM_013259 | transgelin 3 |
| 70 | ESPL1 | 204817_at | g6912453 | NM_012291 | extra spindle pole bodies homolog 1 (S. cerevisiae) |
| 71 | LSM6 | 205036_at | g5901997 | NM_007080 | LSM6 homolog, U6 small nuclear RNA associated (S. cerevisiae) |
| 72 | TMEFF1 | 205123_s_at | g4507548 | NM_003692 | transmembrane protein with EGF-like and two follistatin-like domains 1 |
| 73 | FOXE1 | 206912_at | g9257220 | NM_004473 | forkhead box E1 (thyroid transcription factor 2) |
| 74 | PDPN | 208233_at | g7019416 | NM_013317 | podoplanin |
| 75 | RPS27A /// UBA52 /// UBB /// UBC | 211296_x_at | g2647407 | AB009010 | ribosomal protein S27a /// ubiquitin A-52 residue ribosomal protein fusion product 1 /// ubiquitin B /// ubiquitin C |
| 76 | ITGB4 | 211905_s_at | g2293520 | AF011375 | integrin, beta 4 |
| 77 | hCG_16001 /// RPL23A | 213084_x_at | Hs.194657.4 | BF125158 | similar to ribosomal protein L23A /// ribosomal protein L23a |
| 78 | FNBP1L | 215017_s_at | Hs.239681.1 | AW270932 | formin binding protein 1-like |
| 79 | RAG2 | 215117_at | Hs.159376.0 | AW058148 | recombination activating gene 2 |
| 80 | GPAA1 | 215690_x_at | Hs.4742.1 | AL157437 | glycosylphosphatidylinositol anchor attachment protein 1 homolog (yeast) |
| 81 | ERGIC3 | 216032_s_at | Hs.169992.1 | AF091085 | ERGIC and golgi 3 |
| 82 | UST | 205139_s_at | g5032218 | NM_005715 | uronyl-2-sulfotransferase |
| 83 | KIAA0284 | 213242_x_at | Hs.182536.0 | AB006622 | KIAA0284 |
| 84 | IFI30 | 201422_at | g5453695 | NM_006332 | interferon, gamma-inducible protein 30 |
| 85 | TCTA | 203054_s_at | g11560140 | NM_022171 | T-cell leukemia translocation altered gene |
| 86 | DGAT1 /// LOC727765 | 203669_s_at | g7382489 | NM_012079 | diacylglycerol O-acyltransferase homolog 1 (mouse) /// similar to diacylglycerol O-acyltransferase homolog 1 (mouse) |
| 87 | CCNF | 204826_at | g4502620 | NM_001761 | cyclin F |
| 88 | TRADD | 205641_s_at | g13378136 | NM_003789 | TNFRSF1A-associated via death domain |
| 89 | TAF1A | 206613_s_at | g5032142 | NM_005681 | TATA box binding protein (TBP)-associated factor, RNA polymerase I, A, 48kDa |
| 90 | SAT1 | 210592_s_at | g338335 | M55580 | spermidine/spermine N1-acetyltransferase 1 |
| 91 | GRK4 | 210600_s_at | g971254 | U33054 | G protein-coupled receptor kinase 4 |
| 92 | DOCK2 | 213160_at | Hs.17211.0 | D86964 | dedicator of cytokinesis 2 |
| 93 | FEM1C | 213341_at | Hs.47367.0 | AI862658 | fem-1 homolog c (C. elegans) |
| 94 | PWP1 | 201608_s_at | g5902033 | NM_007062 | PWP1 homolog (S. cerevisiae) |
| 95 | NUCB1 | 200649_at | g12803104 | BC002356 | nucleobindin 1 |
| 96 | SPAG5 | 203145_at | g5453631 | NM_006461 | sperm associated antigen 5 |
| 97 | OPRK1 | 207553_at | g4505510 | NM_000912 | opioid receptor, kappa 1 |
| 98 | RAD51C | 209849_s_at | g2909800 | AF029669 | RAD51 homolog C (S. cerevisiae) |
| 99 | NPAS2 | 213462_at | Hs.296355.0 | AW000928 | neuronal PAS domain protein 2 |
| 100 | HNRNPC | 200014_s_at | g4758543 | NM_004500 | heterogeneous nuclear ribonucleoprotein C (C1/C2) |
| 101 | FUCA1 | 202838_at | g4503802 | NM_000147 | fucosidase, alpha-L- 1, tissue |
| 102 | EPHB4 | 202894_at | g4758289 | NM_004444 | EPH receptor B4 |
| 103 | KATNB1 | 203163_at | g5031816 | NM_005886 | katanin p80 (WD repeat containing) subunit B 1 |
| 104 | TP63 | 209863_s_at | g3644039 | AF091627 | tumor protein p63 |
| 105 | CTNND1 | 211240_x_at | g2224708 | AB002382 | catenin (cadherin-associated protein), delta 1 |
| 106 | ZNHIT3 | 212544_at | Hs.2210.0 | AI131008 | zinc finger, HIT type 3 |
| 107 | DGCR2 | 214198_s_at | Hs.2491.2 | AU150824 | DiGeorge syndrome critical region gene 2 |
| 108 | --- | 207875_at | g7705671 | NM_015882 | --- |
| 109 | NSL1 | 209483_s_at | Hs.24427.0 | AF255793 | NSL1, MIND kinetochore complex component, homolog (S. cerevisiae) |
| 110 | TICAM1 | 213191_at | Hs.29344.0 | AF070530 | toll-like receptor adaptor molecule 1 |
| 111 | FAM169A | 213954_at | Hs.91662.0 | AB020695 | family with sequence similarity 169, member A |
| 112 | RAD51C | 206066_s_at | g4506390 | NM_002876 | RAD51 homolog C (S. cerevisiae) |
| 113 | | 220266_s_at | g4758321 | AF105036 | Kruppel-like factor 4 (gut) |
| 114 | SNRPN /// SNURF | 201522_x_at | g13027651 | NM_003097 | small nuclear ribonucleoprotein polypeptide N /// SNRPN upstream reading frame |
| 115 | TSTA3 | 201644_at | g6598326 | NM_003313 | tissue specific transplantation antigen P35B |
| 116 | | 202248_at | g12652720 | BC000110 | E2F transcription factor 4, p107/p130-binding |
| 117 | HIRIP3 | 204504_s_at | g5803030 | NM_003609 | HIRA interacting protein 3 |
| 118 | LRRC37A /// LRRC37A2 | 206088_at | g7662181 | NM_014834 | leucine rich repeat containing 37A /// leucine rich repeat containing 37, member A2 |
| 119 | CNGA3 | 207261_at | g4502916 | NM_001298 | cyclic nucleotide gated channel alpha 3 |
| 120 | LRRC48 | 208140_s_at | g13775213 | NM_031294 | leucine rich repeat containing 48 |
| 121 | NR2F6 | 209262_s_at | g12803666 | BC002669 | subfamily 2, group F, member 6 |
| 122 | CTCF | 202521_at | g5729789 | NM_006565 | CCCTC-binding factor (zinc finger protein) |
| 123 | POLR2K | 202634_at | Hs.150675.0 | AL558030 | polymerase (RNA) II (DNA directed) polypeptide K, 7.0kDa |
| 124 | PHLPPL | 213407_at | Hs.173373.0 | AB023148 | PH domain and leucine rich repeat protein phosphatase-like |
| 125 | TCN1 | 205513_at | g4507406 | NM_001062 | transcobalamin I (vitamin B12 binding protein, R binder family) |
| 126 | PPP1R12A | 201603_at | Hs.16533.0 | AI817061 | protein phosphatase 1, regulatory (inhibitor) subunit 12A |
| 127 | DNAJC7 | 202416_at | g4507712 | NM_003315 | DnaJ (Hsp40) homolog, subfamily C, member 7 |
| 128 | OXCT1 | 202780_at | g4557816 | NM_000436 | 3-oxoacid CoA transferase 1 |
| 129 | LPHN1 | 203488_at | g7662323 | NM_014921 | latrophilin 1 |
| 130 | INHBB | 205258_at | g9257224 | NM_002193 | inhibin, beta B |
| 131 | PNPLA4 | 209740_s_at | g458225 | U03886 | patatin-like phospholipase domain containing 4 |

132 ITPKC 213076_at Hs.21453.0 D38169 inositol 
1,4,5-trisphosphate 3-kinase C 133 MAN1C1 214180_at Hs.8736.0 AW340588 mannosidase, alpha, class 1C, member 1 procollagen-lysine 1, 2-oxoglutarate 5- 134 PLOD1 200827_at g4557836 NM_000302 dioxygenase 1 proteasome (prosome, macropain) subunit, beta 135 PSMB10 202659_at g4506190 NM_002801 type, 10 ATPase, H+ transporting, lysosomal 42kDa, 136 ATP6V1C1 202872_at Hs.86905.0 AW024925 V1 subunit C1 137 C4orf8 203600_s_at g4506480 NM_003704 chromosome 4 open reading frame 8 138 FANCA 203806_s_at g4503654 NM_000135 Fanconi anemia, complementation group A 139 PTK6 206482_at g5174646 NM_005975 PTK6 protein tyrosine kinase 6 140 PTPRC 207238_s_at g4506306 NM_002838 protein tyrosine phosphatase, receptor type, C 141 ST7 207871_s_at g11761623 NM_018412 suppression of tumorigenicity 7 142 AZIN1 212461_at Hs.278614.1 BF793951 antizyme inhibitor 1 signal transducer and activator of transcription 143 STAT5B 212549_at Hs.24064.0 BE645861 5B 144 HSPA4 208814_at Hs.90093.2 AA043348 Heat shock 70kDa protein 4 145 NCKIPSD 216114_at Hs.302061.0 AL049430 NCK interacting protein with SH3 domain 146 KIF14 206364_at g7661877 NM_014875 kinesin family member 14 connector enhancer of kinase suppressor of Ras 147 CNKSR2 206731_at g7662367 NM_014927 2 148 DDX28 203785_s_at g8922975 NM_018380 DEAD (Asp-Glu-Ala-Asp) box polypeptide 28 proline/arginine-rich end leucine-rich repeat 149 PRELP 204223_at g4506040 NM_002725 protein 150 XK 206698_at g10835266 NM_021083 X-linked Kx blood group (McLeod syndrome) serpin peptidase inhibitor, clade H (heat shock 151 SERPINH1 207714_s_at g4757923 NM_004353 protein 47), member 1, (collagen binding protein 1) 152 WHSC1 209054_s_at g4378016 AF083389 Wolf-Hirschhorn syndrome candidate 1 153 C3orf63 209285_s_at Hs.23440.0 N38985 open reading frame 63 Gene Affymetrix GeneBank Transcript ID Gene Name Symbol Probe Set ID ID 154 --- 216621_at Hs.306307.0 AL050032 --- KH domain containing, RNA binding, signal 155 KHDRBS1 200040_at g5730026 NM_006559 transduction 
associated 1 156 TUBA1B 201090_x_at g5174476 NM_006082 tubulin, alpha 1b splicing factor proline/glutamine-rich 157 SFPQ 201585_s_at Hs.180610.0 NM_005066 (polypyrimidine tract binding protein associated) 158 MPHOSPH6 203740_at g5031918 NM_005792 M-phase phosphoprotein 6 159 KIF11 204444_at g13699823 NM_004523 kinesin family member 11 160 TSPAN9 205665_at g5729940 NM_006675 tetraspanin 9 161 AKAP3 207344_at g5454075 NM_006422 A kinase (PRKA) anchor protein 3 162 --- 207881_at g7705677 NM_015887 --- 163 RPN2 213399_x_at Hs.75722.2 AI560720 ribophorin II ATPase, Na+/K+ transporting, alpha 1 164 ATP1A1 220948_s_at g4502268 NM_000701 polypeptide 165 TMEM59 200620_at g4758571 NM_004872 transmembrane protein 59 166 PRKCSH 200707_at g4506076 NM_002743 protein kinase C substrate 80K-H 167 200750_s_at g4092053 AF054183 RAN, member RAS oncogene family 168 BCAP31 200837_at g10047078 NM_005745 B-cell receptor-associated protein 31 169 NCAPD2 201774_s_at Hs.5719.0 AK022511 non-SMC I complex, subunit D2 170 SNX2 202114_at g4507140 NM_003100 sorting nexin 2 171 NUTF2 202397_at g5031984 NM_005796 factor 2 172 DHX9 202420_s_at g13514819 NM_001357 DEAH (Asp-Glu-Ala-His) box polypeptide 9 173 RPP30 203436_at g5454023 NM_006413 ribonuclease P/MRP 30kDa subunit 174 TTC35 203584_at g7661909 NM_014673 tetratricopeptide repeat domain 35 protein disulfide isomerase family A, member 175 PDIA5 203857_s_at g5803120 NM_006810 5 176 IFNAR1 204191_at g10835182 NM_000629 interferon (alpha, beta and omega) receptor 1 177 MRPL28 204599_s_at g5453835 NM_006428 mitochondrial ribosomal protein L28 178 CEP290 205250_s_at g7662079 NM_014684 centrosomal protein 290kDa inositol polyphosphate-4-phosphatase, type II, 179 INPP4B 205376_at g4504706 NM_003866 105kDa macrophage stimulating 1 receptor (c-met- 180 MST1R 205455_at g4505264 NM_002447 related tyrosine kinase) 181 PRSS22 205847_at g11545838 NM_022119 protease, serine, 22 182 CPN1 206256_at g4503010 NM_001308 carboxypeptidase N, polypeptide 1 183 
PSEN1 207782_s_at g7549814 NM_007319 presenilin 1 184 CYTIP 209606_at g431327 L06633 cytohesin 1 interacting protein