Identification of Recurring Tumor-Specific Somatic Mutations In
Total Page:16
File Type:pdf, Size:1020Kb
Leukemia (2011) 25, 821–827 & 2011 Macmillan Publishers Limited All rights reserved 0887-6924/11 www.nature.com/leu ORIGINAL ARTICLE Identification of recurring tumor-specific somatic mutations in acute myeloid leukemia by transcriptome sequencing PA Greif1,2,5, SH Eck3,5, NP Konstandin1,2,5, A Benet-Page`s3, B Ksienzyk1,2, A Dufour2, AT Vetter1, HD Popp2, B Lorenz-Depiereux3, T Meitinger3,4, SK Bohlander1,2,6 and TM Strom3,4,6 1Clinical Cooperative Group ‘Leukemia’, Helmholtz Zentrum Mu¨nchen, German Research Center for Environmental Health, Munich, Germany; 2Department of Medicine III, Universita¨tMu¨nchen, Munich, Germany; 3Institute of Human Genetics, Helmholtz Zentrum Mu¨nchen, German Research Center for Environmental Health, Neuherberg, Germany and 4Institute of Human Genetics, Technische Universita¨tMu¨nchen, Munich, Germany Genetic lesions are crucial for cancer initiation. Recently, whole the human trithorax homolog and histone methyltransferase genome sequencing, using next generation technology, was MLL and nucleophosmin (NPM1).2–7 used as a systematic approach to identify mutations in So far, most of the genes that were found mutated in AML genomes of various types of tumors including melanoma, lung and breast cancer, as well as acute myeloid leukemia (AML). were found through a candidate gene approach, because of their Here, we identify tumor-specific somatic mutations by sequenc- involvement in translocations or in hematopoetic differentiation. ing transcriptionally active genes. Mutations were detected by For example, CEBPA knockout mice show a block in myeloid comparing the transcriptome sequence of an AML sample with differentiation, and both MLL and NPM1 were initially found to the corresponding remission sample. Using this approach, we be involved in fusion genes that resulted from chromosomal found five non-synonymous mutations specific to the tumor translocations in leukemia patients.5–9 sample. They include a nonsense mutation affecting the RUNX1 gene, which is a known mutational target in AML, and a With the advent of next generation sequencing technology, missense mutation in the putative tumor suppressor gene the unbiased detection of tumor-specific somatic mutations TLE4, which encodes a RUNX1 interacting protein. Another became possible.10–15 Sequence analysis of an AML genome missense mutation was identified in SHKBP1, which acts resulted in the identification of recurring mutations in the gene downstream of FLT3, a receptor tyrosine kinase mutated in IDH1, encoding the enzyme isocitrate dehydrogenase 1.11 about 30% of AML cases. The frequency of mutations in TLE4 Metabolite screening of AML samples revealed that the related and SHKBP1 in 95 cytogenetically normal AML patients was 2%. 16 Our study demonstrates that whole transcriptome sequencing enzyme IDH2 is another mutational target. Despite its leads to the rapid detection of recurring point mutations in the technical feasibility, whole genome sequencing is still cost coding regions of genes relevant to malignant transformation. intensive, and therefore several alternative approaches of Leukemia (2011) 25, 821–827; doi:10.1038/leu.2011.19; targeted sequencing have been proposed, like the sequencing published online 22 February 2011 of coding regions. Although the size of a diploid human genome Keywords: acute myeloid leukemia; point mutations; TLE4; is about 6 Gbp, the transcriptome, as defined by the combined SHKBP1; RUNX1; transcriptome sequencing length of all mRNAs in a cell, is only 0.6 Gbp in size. This figure is based on the estimate that a cell contains about 300 000 transcripts, with an average length of 2000 bases.17,18 Sequenc- ing of only a few gigabases of the transcriptome should allow Introduction mutation detection in a large proportion of transcribed genes. Here we report that sequencing of an AML tumor and the Acute myeloid leukemia (AML) is the most frequent hemato- corresponding remission transcriptome allowed us to analyze logical malignancy in adults, with an annual incidence of 3 to 4 approximately 10 000 genes and to identify five tumor-specific cases per 100 000 individuals. Despite the increasing knowl- somatic mutations. edge about the molecular pathology of AML, the prognosis remains poor, with a 5-year survival of only 25–30%. Materials and methods Chromosomal aberrations in tumor cells are found in approxi- mately half of the AML patients, whereas the other half of the Case information patients has a normal karyotype (cytogenetically normal-AML).1 A diagnostic bone marrow sample was collected from a 69-year- Even though a growing number of submicroscopic genetic old patient, diagnosed with AML M1 in May 2008. The patient lesions is identified in AML, about 25% of cytogenetically was included in the AML Cooperative Group clinical trial, and normal-AML patients do not carry any of the currently known informed consent and ethical approval for scientific use of the mutations. The list of frequently affected genes includes the sample including genetic studies were obtained. After induction receptor tyrosine kinase FLT3, the transcription factor CEBPA, therapy using the sequential high-dose cytosine arabinoside and mitoxantrone (S-HAM) protocol, complete remission was Correspondence: Professor SK Bohlander, Department of Medicine III, achieved. After leukocyte recovery in July 2008, a remission University of Munich, Marchioninistr 15, Klinikum Grosshadern, sample from peripheral blood was taken. Munich, Bavaria 81377, Germany. E-mail: [email protected] 5These authors contributed equally to this work. 6Joint senior and corresponding authors. Sample preparation 6 Received 17 October 2010; revised 28 November 2010; accepted 10 Approximately 50 Â 10 cells from each sample were used for December 2010; published online 22 February 2011 mRNA extraction using Trizol (Invitrogen, Carlsbad, CA, USA). AML transcriptome sequencing PA Greif et al 822 The sequencing library was prepared using mRNA-Seq sample Distribution of reads across exonic and non-exonic preparation kit (Illumina, San Diego, CA, USA). In brief, mRNA regions was selected using oligo-dT beads (dynabeads, Invitrogen). The To determine the success of the RNA library preparation, we mRNA was then fragmented using metal ion hydrolysis and calculated the percentage of reads matching to known exons reversely transcribed using random hexamer primers. Following from the UCSC genome browser. For the AML sample, B63% of steps included end repair, adapter ligation, size selection and reads aligned to exons, B28.5% to introns and B7.5% to polymerase chain reaction enrichment. intergenic regions, whereas for the remission sample, B73.5% of reads aligned to exons, B20.5% to introns and B6% to intergenic regions (Figure 1b). The relatively high proportion of Sequence alignment intronic reads may stem from unspliced mRNAs. Variable Short-read alignment and consensus assembly were performed proportions of intronic and exonic reads were observed between 19 using the BWA (v.0.5.5) sequence-alignment program, with different preparations from the same samples, indicating that the default parameters and interactive trimming of low quality minor differences in RNA concentration and quality might bases at the end of reads (cut-off quality value q ¼ 15). We used strongly influence the competitive binding of shorter spliced and an expanded reference sequence comprising the human longer incompletely spliced mRNAs to oligo dT-beads. The genome assembly (build NCBI36/hg18) and all annotated splice values varied between the different chromosomes and the sites extracted from the University of California Santa Cruz number of reads mapping to exons were correlated with overall (UCSC) genome browser-known gene track. In total, we gene density on the chromosome (Supplementary Figure S1). generated 127 115 919 paired-end reads of 36 bp length for the AML sample, of which 95.08% aligned to the reference sequence, and 187 782 678 paired-end reads for the remission Expression analysis sample with 82 % aligning to the reference. Read mapping, Expression values were calculated as RPKM (reads per kilo-base subsequent assembly and variant calling were performed using of gene model per million mapped reads.21 In brief, the number the resequencing software packages BWA and SAMtools.19,20 of uniquely mapping reads (BWA mapping quality 40, B75 to During alignment, 31.27 and 39.81% apparently duplicated 85% of reads for both samples) for each gene was counted reads were removed from the AML and remission sample, and then normalized by gene length and the total number of respectively. reads generated in the experiment. As the reference set, Figure 1 (a) Histograms of the sequence coverage in a non-redundant gene set based on the Ensembl annotation (35 876 genes) for genes detected in both samples (left), the acute myeloid leukemia (AML, middle) and remission (right) samples. Minimum sequence coverage is plotted on the x-axis and number of genes is plotted on the y-axis. We sequenced 10 152 genes with an average coverage of 7 or greater, 6989 genes with an average coverage of 20 or greater and 5535 genes with an average coverage of at least 30 in both samples (left). The result obtained from the AML sample was 11 293 genes with an average coverage of 7 or greater, 7878 genes with an average coverage of 20 or greater and 6326 genes with an average coverage of 30 or greater (middle). The sequencing of the remission yielded 11 906 genes with an average coverage of 7 or greater, 8805 genes with an average coverage of 20 or greater and 7446 genes at an average coverage of at least 30 (right). The high proportion of genes detected in both samples indicates a good comparability of expression profiles. (b) Two pie charts showing the percentage of reads from the AML (left) and remission samples (right) that map to exons, introns or intergenic regions (see also Supplementary Figure S1). Leukemia AML transcriptome sequencing PA Greif et al 823 we used a non-redundant gene set based on the Ensembl gene We sequenced 4.35 and 5.54 Gbp of the tumor and remission annotations by merging all annotated transcripts from the same sample, respectively, on an Illumina GA IIx sequencer gene into a single ‘maximum coding sequence’.