Supporting Information

Supplementary Methods

Patients for whole genome sequencing and validation cohort. Heparinized bone marrow samples were obtained from 8 RAEB patients, with informed consent for whole genome sequencing (WGS), in accordance with the ethics review board of Shanghai Institute of Hematology. Briefly, the 8 patients comprised 4 with RAEB-1 and 4 with RAEB-2 (5 males and 3 females); 1 had a complex karyotype, 1 had +8 and 5 had a normal karyotype, and they were classified as intermediate to very high risk. Six patients died 4-23 months after diagnosis, from infection, hemorrhage, cerebral infarction or evolution to AML (for complete information see Table S1). The validation cohort consisted of 188 patients with various MDS subtypes, diagnosed and treated in Shanghai Ruijin Hospital and Shanghai No.6 People’s Hospital. All patients provided written informed consent before bone marrow and paired buccal samples were obtained.

DNA sample preparation. Mononuclear cells (MNC) were separated by Ficoll density gradient centrifugation from the 8 RAEB patients and the 188 MDS patients of the validation cohort. Subsequently, CD34+ cells from the 8 RAEB patients were isolated by magnetic cell separation (Miltenyi Biotec, Bergisch Gladbach, Germany) to a purity of 89-97.7% (average: 93.1%). Flow-through CD34- cells were also collected for analysis. Skin biopsies were obtained for analysis of the normal genome, and their DNA was extracted with the DNeasy Blood & Tissue Kit (Qiagen). Genomic DNA of CD34+ cells was isolated with the QuickGene DNA whole blood kit L (FUJIFILM, Life Science). Genomic DNA of MNC from the validation set was extracted with the Wizard® Genomic DNA Purification Kit (Promega).

DNA library preparation. Genomic DNA was sheared by sonication and adaptors were ligated to the resulting fragments. The adaptor-ligated templates were fractionated by agarose gel electrophoresis and fragments of the desired size were excised. The resulting fragments were amplified by ligation-mediated PCR, purified and subjected to DNA sequencing on the Illumina platform.

Massively parallel sequencing. Cluster generation on the Illumina cluster station proceeded as follows: template hybridization, isothermal amplification, linearization, blocking, denaturation and sequencing primer hybridization. The libraries were then deep-sequenced on the Illumina GAIIx and HiSeq2000, generating 2×120 bp (base pairs) paired-end reads following the manufacturer’s protocols. Image analysis and base calling were performed with Illumina RTA version 1.6 with default parameters.

Alignment, SNV/INDEL calling and quality control. The third-party software BWA (1) was used to align the paired-end reads to the reference genome (hg19, http://genome.ucsc.edu/) with default parameters. Variations, including SNVs and INDELs, were called with the Samtools software package (2) and, for cases, filtered with the recommended thresholds (SNV quality ≥20, INDEL quality ≥50 and at least 3 covering reads). To preserve filtering power and minimize the false discovery rate, looser criteria were applied to the control variations (SNV quality ≥10, INDEL quality ≥10 and ≥3 covering reads). For plotting figures and for the SNV-based clonality analysis, a strict threshold was adopted: SNV quality ≥100, case depth ≥30, mapping quality ≥55 and control depth ≥30.
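The three threshold sets above can be written down as simple predicates. The following is a minimal sketch in Python, assuming a hypothetical in-memory `Variant` record; the field and function names are illustrative and are not part of the BWA/Samtools pipeline described in the text. Only the numeric cutoffs are taken from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one called variant (layout is an assumption)."""
    kind: str       # "SNV" or "INDEL"
    quality: float  # variant quality reported by the caller
    depth: int      # number of reads covering the site


# Cutoffs quoted in the text.
CASE_MIN_QUALITY = {"SNV": 20, "INDEL": 50}
CONTROL_MIN_QUALITY = {"SNV": 10, "INDEL": 10}
MIN_DEPTH = 3


def passes_basic_filter(variant: Variant, is_case: bool) -> bool:
    """Apply the case (recommended) or control (looser) thresholds."""
    table = CASE_MIN_QUALITY if is_case else CONTROL_MIN_QUALITY
    return variant.quality >= table[variant.kind] and variant.depth >= MIN_DEPTH


def passes_strict_snv_filter(snv_quality: float, case_depth: int,
                             map_quality: float, control_depth: int) -> bool:
    """Strict SNV threshold used for figures and clonality analysis."""
    return (snv_quality >= 100 and case_depth >= 30
            and map_quality >= 55 and control_depth >= 30)
```

For example, an INDEL with quality 30 and depth 10 fails the case filter but passes the looser control filter, which is the asymmetry the text relies on to avoid calling germline variants somatic.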

Targeted resequencing. To determine the recurrent mutations in the highlighted genes, we designed PCR primers following the Fluidigm guidelines using iPLEX AssayDesigner software. The Fluidigm Access Array microfluidic platform was adapted to generate highly multiplexed libraries of tagged amplicons from MDS patients. Deep resequencing was performed on the Illumina GAIIx/MiSeq platform.

Somatic copy number variation (CNV) and uniparental disomy (UPD) detection. DNA from the tumor sample and the matched germline control of each case was prepared for hybridization to the Illumina high-density Genome Wide Human 660W Quad_v1 SNP array (657,366 probes) according to the manufacturer’s protocol. The raw intensity data (*.idat) files were analyzed using the Genotyping Module of Illumina Genome Studio software Version 2011.1. With this software, the normalized Log R Ratio (LRR) and B Allele Frequency (BAF) were extracted for all available probes in each sample. OncoSNP (3) (version 1.1) was used to detect somatic genomic alterations in paired samples. To verify the reliability of CNVs and UPDs, all reported alterations were plotted from their LRR and BAF values with R statistical software (www.r-project.org, version 2.15.1) and visually checked. Only somatic alterations that met the criteria proposed by OncoSNP and PennCNV (4) were kept for further study. CNVs were analyzed with regard to their chromosomal positions (Fig. S3), indicating that amplification of chr8 regions and deletions (DELs) or UPD of chr7 regions were the most common events. Cases A2 and A6 were found to have complex chromosomal aberrations (Table S6), harboring DELs of the TP53 gene-containing regions 17p13.3-p13.1 and 17p13.3-p11.2, respectively, in the presence of TP53 mutations on the remaining allele. By contrast, in the two cases with normal karyotype, UPD of regions 4q21.22-q35.2 and 7q32.1-q36.3 was found in A7 and amplification of 8p23.3-q24.3 in A8.

Clonality analysis. Clonality analysis was performed according to a previous report (5). The figures were plotted from high-quality somatic SNVs that passed the strict threshold described above.

Statistical analysis. Student’s t-test was used to compare the average number of coding sequence mutations between groups. The proportions of patients with mutations in RAEB and RCMD were compared with the chi-square test. Fisher’s exact test was applied to assess the co-occurrence of mutations in highly recurrent genes. Overall survival (OS) was defined as the time from the date of diagnosis to death, or to last follow-up for patients still alive (censored). Progression free survival (PFS) was calculated from diagnosis to disease progression (defined as relapse, or progression to acute leukemia or RAEB phase) or death, or to last follow-up for patients alive without progression (censored). The Kaplan-Meier method was used to estimate survival and time to progression. All p values were based on 2-sided tests. The statistical analyses were performed with the statistical software package SPSS 19.0 (SPSS Science, Chicago, IL, USA). Univariate analyses were performed among the 196 MDS patients to assess the impact of age, gender, WHO subtype, percentage of BM blasts, levels of hemoglobin, platelets and neutrophils, chromosomal aberrations and gene mutations on OS and PFS, and to screen the main prognostic factors. Male gender (p = 0.025), RAEB subtypes (p = 0.002), high percentage of BM blasts (p < 0.001), lower hemoglobin (p < 0.001) and occurrence of gene mutations (p = 0.001) were associated with adverse OS and, except for gender (p = 0.081), these factors were also associated with poor PFS (Table S11). Furthermore, when each of the gene mutations was analyzed separately, mutations of STAG2 (p = 0.007), cohesin family complex (p = 0.004), DNMT3A (p = 0.024), IDH1/IDH2 (p = 0.01), U2AF1 (p = 0.018), RUNX1 (p = 0.013) and TP53 (p = 0.048) predicted adverse PFS (Table S12). We then performed multivariate analyses of clinical parameters and mutated genes separately. Among clinical factors, percentage of BM blasts (hazard ratio [HR] = 2.30; 95% CI, 1.56-3.37; p < 0.001 for OS; HR = 2.29; 95% CI, 1.62-3.23; p < 0.001 for PFS) and hemoglobin (HR = 2.56; 95% CI, 1.44-4.57; p = 0.001 for OS; HR = 2.63; 95% CI, 1.46-4.73; p = 0.001 for PFS) were independent adverse prognostic factors for OS and PFS. Among mutated genes, STAG2 (HR = 4.21; 95% CI, 1.07-16.2; p = 0.04), IDH1/IDH2 (HR = 5.67; 95% CI, 1.26-10.9; p = 0.017) and RUNX1 (HR = 4.11; 95% CI, 1.03-5.80; p = 0.043) were independent adverse prognostic factors for PFS (Table S13).
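The Kaplan-Meier (product-limit) estimate used for OS and PFS can be illustrated with a short Python sketch. This is not the SPSS procedure used in the study; the function name and data layout are assumptions made for illustration. At each distinct event time the survival probability is multiplied by (1 - deaths / number at risk), and censored patients simply leave the risk set.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Illustration with the eight OS values listed in Table S1:
# six deaths (4, 10, 10, 14, 17, 23 months) and two patients alive at 10+ months.
os_months = [4, 10, 10, 14, 17, 23, 10, 10]
died = [1, 1, 1, 1, 1, 1, 0, 0]
curve = kaplan_meier(os_months, died)
```

With only eight patients the curve is coarse; it is meant to show the mechanics of censoring, not to reproduce any figure in this supplement.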


Supplementary Figures

Fig. S1. (A), (B) Impact of age on the number of somatic mutations in the genomes of bone marrow CD34+ cells among 8 RAEB cases. No significant correlation was observed between the age and the number of all mutations (p = 0.403) (A) or between the age and the number of non-silent mutations in coding sequences (p = 0.487) (B). (C) Proportion of nucleotide transitions and transversions (62.3% vs. 37.7%) in the genomes of bone marrow CD34+ cells among 8 RAEB cases analyzed with WGS. (D) Numbers of distinct SNVs in coding sequences among 8 RAEB cases analyzed with WGS.


Fig. S2. (A) Percentage of nucleotide transitions and transversions in two previously reported exome sequencing studies for MDS (65.0% vs. 35.0%) (MDS-exome-1 and MDS-exome-2) (6, 7) and in a recently reported exome sequencing/WGS study for AML (67.7% vs. 32.3%) (8). (B) Genomic mutation spectrum of SNVs in each of six mutation classes in previously reported studies for MDS (6, 7) and AML (8). Note that C→T is the most prevalent change.
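The transition/transversion split and the six mutation classes shown in Figs. S1 and S2 follow from a simple rule on the reference and alternate bases. The sketch below is an illustration in Python with hypothetical function names, not the code used to produce the figures. It assumes the common convention of reporting each substitution from the pyrimidine strand, so that G→A is counted as C→T.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def mutation_class(ref: str, alt: str) -> str:
    """Collapse an SNV into one of six classes: C>A, C>G, C>T, T>A, T>C, T>G."""
    if ref in PURINES:
        # Report the change on the complementary (pyrimidine) strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine changes are transitions."""
    return (ref in PURINES) == (alt in PURINES)


def ti_tv_fractions(snvs):
    """Return (transition fraction, transversion fraction) for (ref, alt) pairs."""
    transitions = sum(is_transition(ref, alt) for ref, alt in snvs)
    fraction = transitions / len(snvs)
    return fraction, 1 - fraction
```

For instance, `mutation_class("G", "A")` gives `"C>T"`, the class noted as most prevalent in Fig. S2.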


Fig. S3. Types and genomic distribution of somatic copy number variations (CNVs) in the genomes of bone marrow CD34+ cells among 7 RAEB cases. UPD: uniparental disomy. CNVs are shown in colored lines.


Fig. S4. Predicted domain structures of two EWSR1-ASXL1 transcripts, with fusions between sequences for N-terminal 483 aa or 431 aa of EWSR1 and the exon 3 or exon 1 of ASXL1, in a tail-to-tail manner, respectively. EAD: Gln/Pro/Thr-rich region; IQ: IQ domain, binds calmodulin; RRM: RNA recognition motif.


Fig. S5. Circos plot showing a landscape view of 287 somatic mutations of 38 genes in seven functional categories among 145 MDS cases. Ribbons connecting distinct categories of gene abnormalities reflect concurrent mutations in the two connected categories, whereas mutual exclusivity may exist between categories that are not connected.


Fig. S6. Frequencies of non-silent gene mutations in MDS vs. AML. Red: gene abnormalities in all MDS cases with mutations; Blue: gene mutations in RAEB; Green: gene mutations in RCMD; Purple: integrated data of gene mutations previously reported in three AML series (8-10).


Fig. S7. (A) Kaplan-Meier estimates of OS for five groups according to IPSS-R. 3-year OS rates for very low, low, intermediate, high and very high were 100%, 80.1±12.9%, 87.9±4.7%, 59.2±8.2% and 24.2±12.5%, respectively (p < 0.001). n: number of cases. (B) Survival analysis of MDS patients with distinct status of gene mutations. Kaplan-Meier estimates of OS for three subgroups according to mutations of 21 marker genes (with mutation frequencies ≥ 2.5%). 3-year OS rates for absence of mutation (N = 0), one mutation (N = 1) and ≥ 2 mutations (N ≥ 2) were 93.2±3.3%, 77.3±6.2% and 66.3±6.2%, respectively (p < 0.001).


Supplementary Tables

Table S1. Clinical characteristics of 8 RAEB patients

Patient ID | A8 | A7 | A6 | A5 | A4 | A3 | A2 | A1
WHO subtype | RAEB-2 | RAEB-2 | RAEB-2 | RAEB-2 | RAEB-1 | RAEB-1 | RAEB-1 | RAEB-1
Age (yr) | 74 | 63 | 61 | 48 | 80 | 62 | 53 | 43
WGS* consent | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Cytogenetics | 46,XX | 46,XY | N/A | 46,XY | 46,XY | 47,XX,+8 | 48,XY,+8,+8,+9,der(20)t(20;21),-21 | 46,XY

For the remaining rows the per-patient column order is not preserved in this copy of the table; the values are listed as they appear.

Sex: M (n = 5); F (n = 3)
BM blasts (%): 11, 14, 14, 17, 8, 8, 9, 7
BM CD34+ after sorting (%): 97.7, 93.5, 93.8, 89, 93, 92, 95, 91
ANC (×10^9/L): 0.67, 1.23, 0.82, 1.04, 2.01, 1.01, 0.98, 1.2
Hb (g/L): 108, 113, 76, 98, 67, 92, 76, 45
PLT (×10^9/L): 227, 263, 68, 27, 27, 87, 39, 50
IPSS-R score: N/A, 6.5, 5.5, 5.5, 4.5, 5, 5, 8
IPSS-R risk: Intermediate (n = 1); High (n = 4); Very High (n = 2); N/A (n = 1)
Therapy: Decitabine + chemotherapy; Decitabine + low-dose chemotherapy; Supportive care; Allo-HSCT
Status: Alive (n = 2); Dead (n = 6)
OS† (months): 10+, 10+, 17, 23, 10, 10, 14, 4
Cigarette: Quit for 30 years (n = 1); Yes (n = 1); No (n = 6)
Hepatitis: HAV (n = 1); No (n = 7)
Past history: Cholecystitis, hypertension, thyroid adenoma; Diabetes, pneumonectasis, gallstones; Hypertension, tuberculous pleuritis; Hypertension, arrhythmia; Hypertension, diabetes; Hypertension; Appendicitis; Urticaria

*WGS: Whole Genome Sequencing †OS: Overall Survival


Table S2. Sequencing depth and coverage

Patient ID | Sample | Depth | Coverage ≥1X | ≥2X | ≥4X | ≥8X | ≥10X | ≥15X | ≥20X
A1 | Bone Marrow | 33.0 | 99.1% | 98.7% | 97.9% | 95.7% | 94.3% | 89.4% | 82.0%
A1 | Skin | 37.4 | 99.0% | 98.6% | 97.5% | 94.9% | 93.1% | 87.6% | 80.4%
A2 | Bone Marrow | 48.9 | 99.9% | 99.9% | 99.7% | 99.5% | 99.3% | 98.3% | 96.0%
A2 | Skin | 33.8 | 99.8% | 99.6% | 99.3% | 98.6% | 98.0% | 95.4% | 90.5%
A3 | Bone Marrow | 36.6 | 99.7% | 99.6% | 99.4% | 98.9% | 98.5% | 96.9% | 93.8%
A3 | Skin | 31.1 | 99.7% | 99.5% | 99.0% | 97.2% | 95.6% | 88.7% | 77.5%
A4 | Bone Marrow | 23.3 | 99.7% | 99.3% | 97.7% | 91.9% | 87.8% | 75.0% | 60.9%
A4 | Skin | 20.2 | 99.9% | 99.8% | 99.3% | 97.3% | 95.6% | 89.4% | 80.5%
A5 | Bone Marrow | 32.5 | 99.8% | 99.7% | 99.4% | 98.4% | 97.5% | 93.5% | 85.9%
A5 | Skin | 31.1 | 99.9% | 99.8% | 99.5% | 98.3% | 97.3% | 92.8% | 84.3%
A6 | Bone Marrow | 31.6 | 99.4% | 98.8% | 97.0% | 90.9% | 87.0% | 76.3% | 66.1%
A6 | Skin | 30.9 | 99.1% | 98.0% | 94.7% | 85.5% | 80.6% | 69.5% | 60.8%
A7 | Bone Marrow | 33.0 | 99.9% | 99.9% | 99.7% | 99.0% | 98.3% | 95.0% | 88.1%
A7 | Skin | 34.7 | 99.9% | 99.8% | 99.6% | 98.6% | 97.6% | 92.8% | 83.3%
A8 | Bone Marrow | 31.9 | 99.8% | 99.7% | 99.5% | 98.9% | 98.2% | 94.5% | 85.7%
A8 | Skin | 32.8 | 99.7% | 99.6% | 99.4% | 98.2% | 97.0% | 90.9% | 79.6%


Table S3. Summary of SNVs in 8 RAEB patients

Patient ID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8
All SNVs | 1965 | 2329 | 2605 | 1290 | 1166 | 2972 | 2075 | 1437
Coding region
Missense | 7 | 12 | 18 | 10 | 7 | 12 | 10 | 6
Nonsense | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 0
Synonymous | 7 | 4 | 4 | 4 | 1 | 2 | 5 | 2
Noncoding, transcribed
5' UTR | 2 | 4 | 2 | 3 | 0 | 1 | 3 | 2
3' UTR | 9 | 12 | 18 | 7 | 4 | 18 | 24 | 6
Splice site | 1 | 0 | 1 | 1 | 0 | 2 | 0 | 0
Intronic | 655 | 820 | 813 | 459 | 365 | 957 | 656 | 466
Intergenic | 1282 | 1476 | 1746 | 805 | 788 | 1977 | 1375 | 955

Table S4. Summary of INDEL* in 8 RAEB patients

Patient ID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8
All INDELs | 1994 | 2377 | 2935 | 5905 | 2555 | 3226 | 2570 | 3060
CDS | 2 | 5 | 5 | 8 | 4 | 2 | 3 | 1
Noncoding, transcribed
5' UTR | 1 | 2 | 3 | 1 | 0 | 5 | 3 | 5
3' UTR | 13 | 15 | 13 | 40 | 14 | 18 | 18 | 20
Splice site | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
Intronic | 737 | 829 | 1045 | 2169 | 906 | 1239 | 952 | 1153
Intergenic | 1241 | 1526 | 1869 | 3687 | 1631 | 1961 | 1594 | 1881

*INDEL: Short insertion and deletion

Table S5. Genomic rearrangements in 8 RAEB patients

The first four value columns are intrachromosomal rearrangements.

Patient ID | Deletions (≥1000bp) | Deletions (50-1000bp) | ITX* | Insertion | Interchromosomal Rearrangement | Total
A1 | 1 | 3 | 0 | 0 | 1 | 5
A2 | 10 | 0 | 26 | 11 | 23 | 70
A3 | 1 | 0 | 0 | 1 | 3 | 5
A4 | 1 | 2 | 1 | 0 | 1 | 5
A5 | 2 | 2 | 10 | 0 | 65 | 79
A6 | 8 | 4 | 6 | 3 | 9 | 30
A7 | 0 | 0 | 0 | 0 | 0 | 0
A8 | 0 | 0 | 0 | 0 | 0 | 0
Average | 2.9 | 1.4 | 5.4 | 1.9 | 12.8 | 24.3
Total | 23 | 11 | 43 | 15 | 102 | 194

*ITX: Intra-chromosomal translocation


Table S6. Analysis of somatic CNV* and UPD in CD34+ cells from 7 RAEB patients†

Patient ID | Karyotype | Chromosome | Altered region | Size (Mb) | Gain/loss/UPD | Gene (position)‡
A1 | 46,XY | - | - | - | - | -
A2 | 48,XY,+8,+8,+9,der(20)t(20;21),-21 | 3 | p26.3-p12.3 | 77.7 | Loss | -
 | | 3 | p12.2-p12.1 | 6.4 | Loss | -
 | | 3 | q11.1-q11.2 | 2 | Loss | -
 | | 3 | q12.2-q23 | 42.5 | Loss | -
 | | 5 | q11.1-q11.1 | 0.7 | Loss | -
 | | 5 | q11.2-q11.2 | 0.9 | Loss | -
 | | 5 | q11.2-q11.2 | 0.9 | Loss | -
 | | 5 | q12.2-q12.3 | 2.5 | Loss | -
 | | 5 | q13.1-q13.2 | 1.4 | Loss | -
 | | 5 | q13.2-q35.3 | 110.6 | Loss | -
 | | 7 | p22.3-q36.3 | 159.1 | Loss | -
 | | 17 | p13.3-p13.1 | 7.6 | Loss | TP53 (17p13.1)
 | | 17 | p13.1-p13.1 | 0.2 | Loss | -
 | | 17 | p13.1-p13.1 | 0.8 | Loss | -
 | | 18 | q12.3-q12.3 | 1.6 | Loss | -
 | | 18 | q21.1-q23 | 32.7 | Loss | -
 | | 20 | q11.21-q11.21 | 1.5 | Gain | -
 | | 20 | q11.23-q13.2 | 16.8 | Loss | -
 | | 22 | q11.1-q12.2 | 13.6 | Gain | -
A3 | 47,XX,+8 | 8 | p23.3-q24.3 | 146.3 | Gain | SULF1 (8q13.2)
A5 | 46,XY | - | - | - | - | -
A6 | fail | 1 | p36.13-p36.13 | 0.9 | Loss | -
 | | 2 | q33.1-q33.1 | 0.8 | Loss | -
 | | 3 | q13.33-q24 | 21.7 | Loss | -
 | | 7 | p22.3-q11.22 | 69.7 | Loss | -
 | | 7 | q21.2-q21.3 | 2.5 | UPD | -
 | | 7 | q21.3-q31.31 | 22.8 | Loss | LAMB4 (7q31.1)
 | | 7 | q31.31-q32.3 | 13.5 | UPD | -
 | | 7 | q32.3-q36.1 | 20.3 | Loss | -
 | | 7 | q36.1-q36.3 | 7.5 | UPD | -
 | | 8 | p23.3-q24.3 | 146.3 | Gain | -
 | | 9 | p24.3-q34.3 | 141.1 | Gain | CIZ1 (9q34.11)
 | | 12 | p13.31-p12.1 | 15.9 | Loss | -
 | | 12 | p12.1-p12.1 | 0.9 | Loss | -
 | | 12 | q13.13-q13.2 | 2.4 | Loss | -
 | | 16 | p13.3-q24.3 | 90.2 | Gain | -
 | | 17 | p13.3-p11.2 | 17.5 | Loss | TP53 (17p13.1)
 | | 20 | p13-p13 | 3.3 | Loss | -
 | | 20 | p11.23-p11.23 | 0.7 | Loss | -
 | | 20 | q11.21-q13.13 | 19.2 | Loss | -
A7 | 46,XY | 4 | q21.22-q35.2 | 107.1 | UPD | -
 | | 7 | q32.1-q36.3 | 31.5 | UPD | -
A8 | 46,XX | 8 | p23.3-q24.3 | 146.3 | Gain | -

*CNV: Copy Number Variation
†CNV data for patient A4 was not available
‡Gene (position): Validated somatic mutations in CNV regions in 7 RAEB patients


Table S7. The total number of mutated genes in 8 RAEB patients

Patient ID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8
Cytogenetic Risk | Good | Very Poor | Int. | Good | Good | Very Poor* | Good | Good

Genes, by functional category:

1. Epigenetic modifiers: ASXL1, KAT6B, TET2†, FTSJD2, IDH2
2. Cohesin and cell adhesion: STAG2, USH2A, LAMB4, CDH10
3. Tumor suppressors: TP53, TMPRSS11A, TSSC1
4. Transcription factors: ZNF219, ZFP161, KDM5C, ZKSCAN7, ZNF391, SCML2, STAT4, FOXR1, ZUFSP, CEBPA, NKAP
5. G-protein modulators: OR10J5, PDCL3, OR10X1, ARHGAP28
6. Protein kinases: OSR1, RPS6KA2, PGK2, ERBB4, CDC42BPG, CAMKV, BRSK2
7. Spliceosome and RNA conformation: SRSF2, SNRNP200, HELZ
8. Transporters and ion channel: GRIA2, SLC7A11, GRIN3A, SCN9A, SLC22A4, SLC4A9, SLC6A12, ABCA5, GABRA4, SLC16A10, KCNK1, CACNA1B
9. Transmembrane: TMEM107, TMC8, TMEM79, TMEM132D
10. Ubiquitin/proteasome pathway: UBR2, RLIM, BRCC3, UBR4, HERC1
11. Skeleton and scaffold proteins: CAV3, CNTRL, KRT1, KRT15, MTUS2, PLS3, SHANK1
12. Regulate cell proliferation, differentiation and apoptosis: FAM96B, D4S234E, ASPM, NLRC4, SERPINA3, CARD14, IFRD1, PIK3AP1, CIZ1, PLEKHG5, SAMD9, ALAS2, PTPRD
13. Cell signaling: SCUBE1, SULF1, SH2B3‡, MPL, CRLF1
14. Genome stability: ATAD5, EEA1
15. Others: MYH2, ZFHX2, MTMR4, MRO, PCDHB5, SLAMF7, DNAJC17, CCDC74A, PLEKHF1, PRSS48, CRY2

*Very poor: Verified by CNV
†TET2: One mutation and one INDEL co-occurred in A4 patient
‡SH2B3: Two nonsense mutations co-occurred in A5 patient
Deepened color: INDELs
Gene names with gray background: gene mutations seen in VAF-defined subclone in CD34+ cells


Table S8. Intra- and inter-chromosomal fusion genes.

Patient ID | Fusion genes | Frame | Rearrangement type | Genomic breakpoint | Confirmed by RNAseq
A2 | EWSR1-ASXL1 | Out-of-frame | CTX* | chr22:29692515︱chr20:30971684 | +
A2 | MN1-MICAL3 | Out-of-frame | ITX | chr22:28151485︱chr22:18454310 | -
A2 | DNAH2-ARL15 | Out-of-frame | CTX | chr17:7649996︱chr5:53226738 | -
A2 | PER1-DNAH2 | Out-of-frame | ITX | chr17:8048691︱chr17:7650750 | +
A2 | MYH10-PIK3R1 | In-frame | CTX | chr17:8510250︱chr5:67524878 | -
A2 | ARL15-NTN1 | In-frame | CTX | chr5:53228901︱chr17:8967321 | -
A3 | DGKB-CDK15 | In-frame | CTX | chr7:14696397︱chr2:202730745 | -
A6 | PEMT-TMEM189-UBE2V1 | In-frame | CTX | chr17:17482036︱chr20:48718667 | -
A6 | KPNA1-ZBTB20 | Out-of-frame | ITX | chr3:122182765︱chr3:114162595 | -
A6 | CPA3-WWTR1 | In-frame | ITX | chr3:148585842︱chr3:149364465 | -

*CTX: Inter-chromosome translocation

Table S9. Fusion transcripts in 6 RAEB patients of RNA-seq

Samples analyzed: A1 (CD34+), A2 (CD34-), A3 (CD34-), A5 (CD34-), A6 (CD34-), A7 (CD34+), A7 (CD34-)

Fusion transcripts | Coding change
PER1-DNAH2 | Out-of-frame
ACOT13-SYN3 | Out-of-frame
C15orf57-CBX3 | Out-of-frame
EWSR1-ASXL1(1) | Out-of-frame
EWSR1-ASXL1(2) | Out-of-frame
NSUN3-NIT2 | Out-of-frame
HIRA-MICAL3 | Out-of-frame
FXR1-ATAD2 | Out-of-frame

Sample | A1 CD34+ | A2 CD34- | A3 CD34- | A5 CD34- | A6 CD34- | A7 CD34+ | A7 CD34-
Total fusion n. | 0 | 7 | 1 | 0 | 0 | 0 | 0

Grey color: fusion transcript detected


Table S10. Integrative analysis of recurrently mutated genes in 196 patients with different MDS subtypes and comparison of gene mutations between MDS and AML

Function | Genes | MDS patients with non-silent mutations, n (n = 196) | RCMD patients with non-silent mutations, n (n = 95) | RAEB patients with non-silent mutations, n (n = 89) | AML patients with non-silent mutations (8-10)
1. Cohesin complex | STAG2 | 13 (6.6%) | 2 (2.1%) | 11 (12.4%) | 3.0%
 | SMC3 | 5 (2.6%) | 3 (3.2%) | 2 (2.2%) | 3.5%
 | RAD21 | 3 (1.5%) | 1 (1.1%) | 2 (2.2%) | 3.5%
 | SMC1A | 2 (1.0%) | 2 (2.1%) | 0 | 2.5%
 | Total | 23 (11.7%) | 8 (8.4%) | 15 (16.9%) | 12.5%
2. DNA modifiers | TET2 | 27 (13.8%) | 12 (12.6%) | 13 (14.6%) | 9.5%
 | DNMT3A | 18 (9.2%) | 5 (5.3%) | 12 (13.5%) | 18.0%
 | IDH2/IDH1 | 5 (2.6%) | 1 (1.1%) | 4 (4.5%) | 17.0%
 | Total | 50 (25.5%) | 18 (17.0%) | 29 (32.6%) | 44.5%
3. Chromatin modifiers | ASXL1 | 28 (14.3%) | 11 (11.6%) | 17 (19.1%) | 3.7%
 | BCOR | 12 (6.1%) | 2 (2.1%) | 9 (10.1%) | 1.0%
 | EZH2 | 9 (4.6%) | 5 (5.3%) | 3 (3.4%) | 1.5%
 | Total | 49 (25.0%) | 18 (17.0%) | 29 (32.6%) | 6.2%
4. Spliceosome genes | U2AF1 | 29 (14.8%) | 8 (8.4%) | 20 (22.5%) | 4.0%
 | SF3B1 | 22 (11.2%) | 5 (5.3%) | 10 (11.2%) | 0.5%
 | SRSF2 | 11 (5.6%) | 0 | 11 (12.4%) | 0.5%
 | ZRSR2 | 6 (3.1%) | 3 (3.2%) | 3 (3.4%) | /
 | SNRNP200 | 1 (0.5%) | 0 | 1 (1.1%) | 0.5%
 | Total | 69 (35.2%) | 16 (16.9%) | 45 (50.6%)* | 5.5%
5. Transcription factors | RUNX1 | 17 (8.7%) | 6 (6.3%) | 11 (12.4%) | 6.5%
 | GATA2 | 5 (2.6%) | 1 (1.1%) | 4 (4.5%) | 1.0%
 | CEBPA | 4 (2.0%) | 1 (1.1%) | 3 (3.4%) | 14.3%
 | ETV6 | 2 (1.0%) | 0 | 2 (2.2%) | 1.0%
 | Total | 28 (14.3%) | 8 (8.5%) | 20 (22.5%)* | 22.8%
6. Activated signaling molecules | MPL | 6 (3.1%) | 1 (1.1%) | 3 (3.4%) | 0.5%
 | NRAS/KRAS | 6 (3%) | 1 (1.1%) | 5 (5.6%) | 8.7%
 | SH2B3 | 5 (2.6%) | 2 (2.1%) | 3 (3.4%) | 0.5%
 | CBL | 3 (1.5%) | 2 (2.1%) | 1 (1.1%) | 1.5%
 | PTPN11 | 3 (1.5%) | 0 | 3 (3.4%) | 4.5%
 | CALR | 2 (1.0%) | 1 (1.1%) | 1 (1.1%) | 1.0%
 | NF1 | 2 (1.0%) | 1 (1.1%) | 1 (1.1%) | 1.5%
 | JAK2 | 1 (0.5%) | 0 | 1 (1.1%) | 4.5%
 | FLT3 | 1 (0.5%) | 1 (1.1%) | 0 | 21.9%
 | KIT | 0 | 0 | 0 | 5.2%
 | Total | 28 (14.3%) | 9 (9.5%) | 18 (20.2%) | 49.8%
7. Tumor suppressors | TP53 | 20 (10.2%) | 5 (5.3%) | 12 (13.5%) | 4.0%
 | WT1 | 3 (1.5%) | 0 | 3 (3.4%) | 5.3%
 | PHF6 | 1 (0.5%) | 1 (1.1%) | 0 | 3.0%
 | Total | 24 (12.2%) | 6 (6.3%) | 15 (16.9%)* | 12.3%
8. NPM1 and other myeloid genes | SETBP1 | 4 (2.0%) | 2 (2.1%) | 2 (2.2%) | 1.0%
 | NPM1 | 5 (2.6%) | 3 (3.2%) | 2 (2.2%) | 24.2%
 | DST | 2 (1.0%) | 1 (0.5%) | 1 (0.5%) | 1.0%
 | BOD1L | 1 (0.5%) | 1 (0.5%) | 0 | 0.5%
 | FAM5C | 3 (1.5%) | 2 (1.0%) | 1 (0.5%) | 2.5%

*Compared with RCMD, chi-square test, p < 0.05


Table S11. Clinical variables related to survival and disease progression

Variable | Patients, n | 3-year OS (%) | OS p value | 3-year PFS (%) | PFS p value
Gender | 196 | | 0.025 | | 0.081
 Male | 109 | 52.3±7.8 | | 49.8±7.8 |
 Female | 87 | 75.2±7.9 | | 67.5±8.5 |
Age (yr) | 196 | | 0.141 | | 0.226
 < 60 | 113 | 66.5±7.3 | | 66.4±6.2 |
 ≥ 60 | 83 | 57.1±8.8 | | 45.8±10.2 |
WHO subtypes | 196 | | 0.002 | | 0.001
 RAEB-1/RAEB-2 | 89 | 33.0±9.8 | | 25.5±9.3 |
 RCMD | 95 | 82.7±6.1 | | 80.6±6.1 |
 RARS | 6 | 100 | | 100 |
 RCUD | 6 | 75.0±21.7 | | 75.0±21.7 |
BM blasts (%) | 196 | | <0.001 | | <0.001
 0-2 | 65 | 98.4±1.6 | | 92.7±3.6 |
 2-5 | 42 | 82.2±6.1 | | 75.0±6.9 |
 5-10 | 44 | 47.3±13.4 | | 46.8±11.6 |
 > 10 | 45 | 29.7±10.7 | | 23.6±10.0 |
Hb (g/L) | 196 | | <0.001 | | 0.005
 < 80 | 116 | 50.4±8.7 | | 47.5±7.9 |
 80-100 | 39 | 80.8±7.1 | | 70.5±8.3 |
 ≥ 100 | 41 | 86.0±7.6 | | 83.9±7.7 |
PLT (×10^9/L) | 196 | | 0.482 | | 0.936
 < 50 | 91 | 61.4±7.9 | | 59.1±7.4 |
 50-100 | 46 | 62.7±12.0 | | 50.6±13.5 |
 ≥ 100 | 59 | 73.7±6.7 | | 70.1±6.9 |
ANC (×10^9/L) | 196 | | 0.192 | | 0.504
 < 0.8 | 58 | 66.4±7.3 | | 66.1±7.2 |
 ≥ 0.8 | 138 | 62.3±7.2 | | 56.7±7.3 |
Mutations* | 196 | | 0.001 | | <0.001
 With mutations | 145 | 52.0±6.9 | | 45.2±6.9 |
 Without mutations | 51 | 93.8±3.5 | | 91.3±4.1 |
Number of mutations | 196 | | <0.001 | | <0.001
 0 mutation | 51 | 93.8±3.5 | | 91.3±4.1 |
 1 mutation | 69 | 65.9±9.2 | | 62.7±8.8 |
 ≥ 2 mutations | 76 | 41.1±9.4 | | 37.5±8.8 |
Cytogenetic aberrations | 192 | | 0.841 | | 0.35
 Normal | 113 | 63.1±6.7 | | 62.1±6.3 |
 Others | 79 | 63.4±9.5 | | 52.9±10.3 |
Mutations + cyt† | 196 | | 0.004 | | 0.006
 With mutations and cyt | 161 | 55.6±6.6 | | 49.6±6.7 |
 Without mutations or cyt | 35 | 90.3±5.3 | | 90.3±5.3 |
RAEB | 89 | | 0.083 | | 0.07
 With mutations | 81 | 31.1±9.4 | | 22.5±8.6 |
 Without mutations | 8 | 100 | | 100 |
RAEB | 89 | | 0.148 | | 0.099
 With mutations and cyt | 81 | 31.5±9.5 | | 22.9±8.8 |
 Without mutations or cyt | 8 | 100 | | 100 |
RCMD | 95 | | 0.268 | | 0.196
 With mutations | 53 | 76.1±10.2 | | 71.1±10.9 |
 Without mutations | 42 | 92.9±4.0 | | 90.1±4.7 |
RCMD | 95 | | 0.351 | | 0.374
 With mutations and cyt | 65 | 78.4±9.3 | | 74.4±9.9 |
 Without mutations or cyt | 30 | 89.2±5.9 | | 89.2±5.9 |
IPSS-R | 192 | | <0.001 | | <0.001
 Very low | 4 | 100 | | 100 |
 Low | 34 | 80.1±12.9 | | 78.8±11.6 |
 Intermediate | 66 | 87.9±4.7 | | 82.8±5.8 |
 High | 52 | 59.2±8.2 | | 58.5±7.9 |
 Very High | 36 | 24.2±12.5 | | 22.6±10.2 |

*Mutations in 38 genes †Cytogenetic aberrations


Table S12. Mutated genes associated with OS and PFS in 196 MDS patients

Function | Genes* | 18-months OS (%) | OS p value | 18-months PFS (%) | PFS p value
1. Cohesin complex | STAG2 | 29.9±15.4 | <0.001 | 24.9±13.6 | 0.007
 | SMC3 | 75.0±21.7 | 0.244 | 50.0±25.0 | 0.588
 | Cohesin | 46.0±12.6 | 0.016 | 44.7±11.4 | 0.004
2. DNA modifiers | DNMT3A | 44.7±14.3 | 0.076 | 37.6±13.8 | 0.024
 | TET2 | 68.1±10.2 | 0.671 | 60.1±12.0 | 0.511
 | IDH1/IDH2 | 40.0±21.9 | 0.001 | 40.0±21.9 | 0.01
3. Chromatin modifiers | EZH2 | 44.4±16.6 | 0.043 | 44.4±16.6 | 0.087
 | ASXL1 | 67.8±9.5 | 0.659 | 63.7±9.8 | 0.353
 | BCOR | 33.9±18.2 | 0.277 | 19.0±16.0 | 0.051
4. Spliceosome genes | SRSF2 | 55.6±16.6 | 0.552 | 55.6±16.6 | 0.197
 | ZRSR2 | 80.0±17.9 | 0.348 | 80.0±17.9 | 0.624
 | SF3B1 | 66.9±12.8 | 0.842 | 61.4±12.9 | 0.755
 | U2AF1 | 59.2±12.0 | 0.148 | 49.0±11.4 | 0.018
5. Transcription factors | RUNX1 | 40.8±18.6 | 0.026 | 37.5±17.5 | 0.013
 | GATA2 | 66.7±27.2 | 0.737 | 53.3±24.8 | 0.186
6. Activated signaling molecules | NRAS/KRAS | 50.0±25.0 | 0.338 | 50.0±25.0 | 0.494
 | MPL | 60.0±21.9 | 0.322 | 60.0±21.9 | 0.471
 | SH2B3 | 80.0±17.9 | 0.981 | 80.0±17.9 | 0.851
7. Tumor suppressors | TP53 | 54.8±13.2 | 0.016 | 51.3±13.8 | 0.048
8. NPM1 and other myeloid disease genes | NPM1 | 80.0±17.9 | 0.752 | 80.0±17.9 | 0.882

*Genes mutated in ≥ 5 patients

Table S13. Multivariate analysis for OS and PFS in 196 MDS patients

Group | Variables | OS p value | OS 95% CI | OS HR* | PFS p value | PFS 95% CI | PFS HR
Clinical parameters | BM blasts | <0.001 | 1.56-3.37 | 2.30 | <0.001 | 1.62-3.23 | 2.29
 | Hb | 0.001 | 1.44-4.57 | 2.56 | 0.001 | 1.46-4.73 | 2.63
Mutated genes | STAG2 | NS† | | | 0.04 | 1.07-16.2 | 4.21
 | Cohesin | NS | | | NS | |
 | IDH1/2 | 0.001 | 2.28-19.5 | 6.67 | 0.017 | 1.26-10.9 | 5.67
 | U2AF1 | / | | | NS | |
 | DNMT3A | / | | | NS | |
 | EZH2 | 0.001 | 2.43-27.9 | 8.23 | / | |
 | RUNX1 | 0.001 | 1.89-11.2 | 4.59 | 0.043 | 1.03-5.80 | 4.11
 | TP53 | 0.001 | 1.85-10.2 | 4.35 | NS | |

*HR: Hazard ratio †NS: No significance


Table S14. IPSS-R-M prognostic scoring system

Prognostic variable | 0 | 0.5 | 1 | 1.5 | 2 | 3 | 4
Cytogenetics* | Very Good | - | Good | - | Intermediate | Poor | Very Poor
Bone marrow blast (%) | ≤ 2 | - | > 2 - < 5 | - | 5-10 | > 10 | -
Hemoglobin (g/dL) | ≥ 10 | - | 8 - < 10 | < 8 | - | - | -
Platelets (×10^9/L) | ≥ 100 | 50 - < 100 | < 50 | - | - | - | -
ANC (×10^9/L) | ≥ 0.8 | < 0.8 | - | - | - | - | -
Number of mutations† | 0 | 1 | ≥ 2 | High risk mutation‡ | - | - | -

*Very Good: -Y, del(11q); Good: normal, del(5q), del(12p), del(20q), double including del(5q); Intermediate: del(7q), +8, +19, i(17q), any other single or double independent clones; Poor: -7, inv(3)/t(3q)/del(3q), double including -7/del(7q), complex: 3 abnormalities; Very Poor: >3 abnormalities
†Mutations in 21 genes: U2AF1, SF3B1, SRSF2, ZRSR2, STAG2, SMC3, TET2, DNMT3A, IDH1/IDH2, ASXL1, BCOR, EZH2, NPM1, RUNX1, GATA2, TP53, MPL, SH2B3 and NRAS/KRAS
‡Any mutation of IDH1/IDH2, EZH2, RUNX1 and TP53.

Table S15. IPSS-R-M prognostic risk categories/scores

Risk category | Patients, n | Risk Score* | 3-year survival (%)
Very Low | 13 | ≤ 2.0 | 100
Low | 33 | > 2.0-3.5 | 89.5±5.9
Intermediate | 56 | > 3.5-5.0 | 83.7±5.7
High | 42 | > 5.0-6.5 | 64.9±9.7
Very High | 48 | > 6.5 | 41.7±8.4

*IPSS-R-M risk score = IPSS-R risk score + 0.5
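The score bands in Table S15 translate directly into a lookup. The following Python sketch maps an already computed IPSS-R-M risk score to its category; the function name is hypothetical, and the first band is read as "≤ 2.0" on the assumption that the comparison sign lost from the table was "≤". How the score itself is assembled from the variables in Table S14 is not reproduced here.

```python
def ipss_r_m_category(score: float) -> str:
    """Map an IPSS-R-M risk score to the risk category bands of Table S15."""
    if score <= 2.0:
        return "Very Low"
    if score <= 3.5:
        return "Low"
    if score <= 5.0:
        return "Intermediate"
    if score <= 6.5:
        return "High"
    return "Very High"
```

Each upper bound is inclusive, matching the "> 2.0-3.5" style of the table, so a score of exactly 3.5 is "Low" and 3.6 is "Intermediate".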


Table S16. Re-stratification of MDS patients using M-based* and IPSS-R-M based system

Risk category | All MDS patients, n | RAEB-1/RAEB-2, n | RCMD/RCUD/RARS, n | 3-year survival (%)
IPSS-R | 192 | | |
 Very Low | 4 | 0 | 4 | 100
 Low | 34 | 0 | 34 | 80.1±12.9
 Intermediate | 66 | 12 | 54 | 87.9±4.7
 High | 52 | 40 | 12 | 59.2±8.2
 Very High | 36 | 35 | 1 | 24.2±12.5
M-based system | 196 | | |
 Low | 63 | 12 | 51 | 93.2±3.3
 Intermediate-1 | 47 | 23 | 24 | 69.4±11.2
 Intermediate-2 | 37 | 24 | 13 | 53.0±13.9
 High | 49 | 30 | 19 | 49.7±8.3
IPSS-R-M based system | 192 | | |
 Very Low | 13 | 0 | 13 | 100
 Low | 33 | 0 | 33 | 89.5±5.9
 Intermediate | 56 | 12 | 44 | 83.7±5.7
 High | 42 | 30 | 12 | 64.9±9.7
 Very High | 48 | 45 | 3 | 41.7±8.4

*Molecular-marker based system


References

1. Li H & Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754-1760.
2. Li H, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078-2079.
3. Yau C, et al. (2010) A statistical approach for detecting genomic aberrations in heterogeneous tumor samples from single nucleotide polymorphism genotyping data. Genome Biol 11(9):R92.
4. Wang K, et al. (2007) PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17(11):1665-1674.
5. Welch JS, et al. (2012) The origin and evolution of mutations in acute myeloid leukemia. Cell 150(2):264-278.
6. Papaemmanuil E, et al. (2011) Somatic SF3B1 mutation in myelodysplasia with ring sideroblasts. N Engl J Med 365(15):1384-1395.
7. Yoshida K, et al. (2011) Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 478(7367):64-69.
8. Ley T, et al. (2013) Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 368(22):2059-2074.
9. Shen Y, et al. (2011) Gene mutation patterns and their prognostic impact in a cohort of 1185 patients with acute myeloid leukemia. Blood 118(20):5593-5603.
10. Patel JP, et al. (2012) Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. N Engl J Med 366(12):1079-1089.

Other Supporting Information Files

Dataset S1 (XLSX)

Dataset S2 (XLSX)
