S-TABLE 1

GENE SYMBOL   FULL NAME AND DESCRIPTION

ABCF1         ATP-binding cassette, sub-family F, member 1
              This may be regulated by tumor necrosis factor-alpha and play a role in enhancement of protein synthesis and the inflammation process

CORO1C        Coronin, actin binding protein, 1C
              This encodes a member of the WD repeat protein family. Members of this family are involved in a variety of cellular processes, including cycle progression, [...], [...], and gene regulation

DPP3          Dipeptidyl-peptidase 3
              This gene encodes a protein that is a member of the S9B family in clan SC of the serine proteases. Increased activity of this protein is associated with certain types of cancer

PREB          Prolactin regulatory binding-element protein
              This protein may act as a transcriptional regulator and is thought to be involved in some of the developmental abnormalities

UBE3A         Ubiquitin protein ligase E3A
              This gene encodes an E3 ubiquitin-protein ligase, part of the ubiquitin protein degradation system

PTDSS1        Phosphatidylserine synthase 1
              This gene is related to phosphorus metabolism and lipid biosynthesis

S-Table 1 - The 6-gene model: gene description

S-TABLE 2

Variable                                  HR      p-value

Six-gene model                            0.30    <0.00001
Primary tumor size (≤2 cm vs. >2 cm)      0.56    0.006
Node (negative vs. positive)              0.93    0.8
Age (<45 vs. ≥45 years)                   1.62    0.02
Chemotherapy exposure (no vs. yes)        1.89    0.04
ER (negative vs. positive)                0.74    0.25
Differentiation: intermediate vs. well    1.15    0.61
Differentiation: poorly vs. well          1.06    0.83

S-Table 2: Multivariable proportional-hazards analysis of the risk of distant metastasis as a first event in van de Vijver’s dataset, based on the six-gene model.

S-TABLE 3

Dataset     Total # of patients   Kaplan-Meier (p)   HR     Cancer type

GSE4573     130                   0.04               0.52   SCLC
GSE11117    41                    0.09               0.51   NSCLC
HLM         79                    0.03               0.52   NSCLC
MICH        177                   0.08               0.66   NSCLC
DFCI        82                    0.07               0.49   NSCLC
MSKCC       104                   0.09               0.51   NSCLC

S-Table 3 - Utilizing the 6-gene model to predict human lung cancer outcomes. (SCLC: Squamous Cell Lung Carcinoma; NSCLC: Non-small Cell Lung Cancer)
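The Kaplan-Meier p-values above compare survival between two classes of patients. As a minimal sketch of how such a comparison can be computed, the following implements a Kaplan-Meier estimator and a two-group log-rank test using only the Python standard library. The function names and the toy follow-up data are hypothetical and are not taken from the study; the software actually used by the authors is not stated in this supplement.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square, p-value), 1 d.f."""
    all_times = times1 + times2
    all_events = events1 + events2
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e}):
        n1 = sum(1 for u in times1 if u >= t)      # at risk in group 1
        n = sum(1 for u in all_times if u >= t)    # at risk overall
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e)
        d = sum(1 for u, e in zip(all_times, all_events) if u == t and e)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical toy data (years of follow-up; 1 = event, 0 = censored).
class1_times, class1_events = [6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]
class2_times, class2_events = [1, 2, 3, 4, 5, 8], [1, 1, 1, 0, 1, 1]

km_class1 = kaplan_meier(class1_times, class1_events)
chi2, p_value = logrank_test(class1_times, class1_events,
                             class2_times, class2_events)
```

The hazard ratios in the table would come from a separate proportional-hazards fit, which this sketch does not attempt.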

S-FIGURE 1

[Figure panels: A. van de Vijver; B. GSE4922; C. GSE2034. Each panel is divided into Class 1 and Class 2.]

S-Figure 1 - Breast cancer patients expressing the SpMGS were segregated into two groups by hierarchical clustering, based on the first bifurcation in the clustering dendrogram, and assigned to Class 1 and Class 2.
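Cutting a dendrogram at its first bifurcation is equivalent to running agglomerative clustering until exactly two clusters remain. The sketch below illustrates that idea with the Python standard library only. The Euclidean distance, the average linkage, the function names, and the toy expression profiles are all assumptions made for illustration; the supplement does not state which metric or linkage was used.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def split_at_first_bifurcation(profiles):
    """Merge the closest clusters until two remain.

    The two survivors are the branches directly below the dendrogram root.
    Expects at least two profiles.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 2:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical toy profiles (rows = patients, columns = signature genes).
toy_profiles = [
    [0.1, 0.2, 0.0],
    [0.0, 0.3, 0.1],
    [2.1, 1.9, 2.2],
    [2.0, 2.2, 1.8],
    [0.2, 0.1, 0.1],
]
group_a, group_b = split_at_first_bifurcation(toy_profiles)
```

Which of the two resulting groups is labeled Class 1 or Class 2 is a separate decision that depends on whether the group expresses the signature; the sketch only performs the split.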

S-FIGURE 2

[Kaplan-Meier plots, Class 1 vs. Class 2; x-axis: year.]

A. van de Vijver: probability of metastasis-free survival and probability of overall survival. Hazard ratio = 0.68, log-rank p = 0.07; hazard ratio = 0.72, log-rank p = 0.17.
No. at risk: Class 1: 120 108 96 71 52 31 16 8 and 120 114 102 80 58 38 20; Class 2: 175 146 118 95 56 37 20 and 175 167 145 111 69 45 22.

B. GSE4922 (overall survival) and C. GSE2034 (relapse-free survival). Hazard ratio = 0.57, log-rank p = 0.03; hazard ratio = 0.60, log-rank p = 0.024.
No. at risk: Class 1: 75 68 59 52 46 38 2; Class 2: 88 59 48 43 37 31 5; Class 1: 105 101 93 87 80 76 72 66; Class 2: 181 169 149 131 124 107 97 87.

S-Figure 2 - Kaplan-Meier analysis, among all breast cancer patients who expressed EMGS, of the probability of remaining free of metastases and of overall survival in the van de Vijver dataset (panel A), overall survival in the GSE4922 dataset (panel B), and relapse-free survival in the GSE2034 dataset (panel C). Patients who exhibited the metastatic signature (EMGS) were assigned to Class 2 (blue), while those who did not were assigned to Class 1 (red).

S-FIGURE 3

[Kaplan-Meier plots of the probability of overall survival by year, Class 1 vs. Class 2, one panel per dataset.]

GSE4573 dataset: Class 1 (n=45), Class 2 (n=85); HR=0.52, log-rank p=0.04
GSE11117 dataset: Class 1 (n=25), Class 2 (n=216); HR=0.5, log-rank p=0.09
HLM dataset: Class 1 (n=23), Class 2 (n=56); HR=0.52, log-rank p=0.03
MICH dataset: Class 1 (n=94), Class 2 (n=83); HR=0.66, log-rank p=0.08
DFCI dataset: Class 1 (n=38), Class 2 (n=44); HR=0.49, log-rank p=0.07
MSKCC dataset: Class 1 (n=58), Class 2 (n=46); HR=0.51, log-rank p=0.09

S-Figure 3 - Using the six-gene model to predict the outcomes of human lung cancer patients.

S-FIGURE 4

4T1 Mouse Metastases Model

Signature (79 SpMGS)

3 Human Breast Cancer Datasets: van de Vijver, GSE4922, GSE2034

6-gene-model

Validation

3 Original Human Breast Cancer Datasets: van de Vijver, GSE4922, GSE2034

Further Validation

3 New Human Breast Cancer Datasets: GSE1456, GSE2990, GSE7390

6 Human Lung Cancer Datasets: GSE4573, GSE11117, HLM, MICH, DFCI, MSKCC