Leukemia (2009) 23, 271–278
© 2009 Macmillan Publishers Limited. All rights reserved 0887-6924/09 $32.00
www.nature.com/leu
ORIGINAL ARTICLE

The molecular anatomy of the FIP1L1-PDGFRA fusion

C Walz1, J Score2, J Mix1, D Cilloni3, C Roche-Lestienne4, R-F Yeh5, JL Wiemels5, E Ottaviani3, P Erben1, A Hochhaus1, M Baccarani6, D Grimwade7, C Preudhomme8, J Apperley9, G Martinelli6, G Saglio3, NCP Cross2 and A Reiter1 on behalf of the European LeukemiaNet

1III Medizinische Universitätsklinik, Medizinische Fakultät Mannheim der Universität Heidelberg, Mannheim, Germany; 2Wessex Regional Genetics Laboratory, Salisbury and Human Genetics Division, University of Southampton, Southampton, UK; 3Division of Hematology and Internal Medicine, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy; 4Unité Inserm U837-Centre JP Aubert and Laboratoire de Génétique Médicale, CHRU, Lille, France; 5Laboratory for Molecular Epidemiology, Department of Epidemiology and Biostatistics, UCSF Medical School, San Francisco, CA, USA; 6Institute of Hematology and Medical Oncology Seragnoli, University of Bologna, Bologna, Italy; 7King’s College London School of Medicine, Department of Medical and Molecular Genetics, London, UK; 8Unité Inserm U837-Centre JP Aubert and Laboratoire d’Hematologie A, CHRU, Lille, France and 9Department of Haematology, Imperial College Faculty of Medicine, Hammersmith Hospital, London, UK

The FIP1L1-PDGFRA fusion gene is a recurrent molecular abnormality in patients with eosinophilia-associated myeloproliferative neoplasms. We characterized FIP1L1-PDGFRA junction sequences from 113 patients at the mRNA (n = 113) and genomic DNA (n = 85) levels. Transcript types could be assigned in 109 patients as type A (n = 50, 46%) or B (n = 47, 43%), which were created by cryptic acceptor splice sites in different introns of FIP1L1 (type A) or within PDGFRA exon 12 (type B). We also characterized a new transcript type C (n = 12, 11%) in which both genomic breakpoints fell within coding sequences creating a hybrid exon without use of a cryptic acceptor splice site. The location of genomic breakpoints within PDGFRA and the availability of AG splice sites determine the transcript type and restrict the FIP1L1 exons used for the creation of the fusion. Stretches of overlapping sequences were identified at the genomic junction site, suggesting that the FIP1L1-PDGFRA fusion is created by illegitimate non-homologous end-joining. Statistical analyses provided evidence for clustering of breakpoints within FIP1L1 that may be related to DNA- or chromatin-related structural features. The variability in the anatomy of the FIP1L1-PDGFRA fusion has important implications for strategies to detect the fusion at diagnosis or for monitoring response to treatment.
Leukemia (2009) 23, 271–278; doi:10.1038/leu.2008.310; published online 6 November 2008
Keywords: FIP1L1; PDGFRA; Eos-MPN

Introduction

The FIP1L1-PDGFRA fusion gene results from a cytogenetically invisible interstitial chromosomal deletion on chromosome band 4q12 and was identified as a recurrent molecular abnormality in patients with eosinophilia-associated myeloproliferative neoplasms (Eos-MPNs) diagnosed earlier as hypereosinophilic syndrome, chronic eosinophilic leukemia or systemic mastocytosis with eosinophilia.1–4 The pathogenesis of FIP1L1-PDGFRA-positive Eos-MPN is similar to BCR-ABL-positive chronic myeloid leukemia with constitutively increased activity of the fusion protein, potential evolution from chronic phase disease to blast crisis/secondary acute myeloid leukemia and excellent response to treatment with imatinib.1–9

As the underlying deletion at 4q12 is invisible by conventional karyotyping, fluorescence in situ hybridization or reverse transcriptase-(RT)-PCR is typically used to detect the fusion gene.1,2 Sequence analysis of amplified products obtained by PCR allows detailed characterization of the heterogeneous FIP1L1-PDGFRA junction sequences at both DNA and mRNA levels. The genomic breakpoints within FIP1L1 have been reported to occur in five different introns (introns 9–13; http://www.ensembl.org), whereas the genomic breakpoints within PDGFRA are exclusively found within exon 12. Two different types of fusion mRNA can be distinguished: type A fusions result from the use of a cryptic splice acceptor site in an FIP1L1 intron and type B fusions result from the use of a cryptic splice acceptor site in PDGFRA exon 12.1 Moreover, heterogeneity is increased by the insertion of FIP1L1 intron-derived nucleotides in type A fusion mRNAs. However, the number of cases reported to date is too low to draw any conclusion about the relative prevalence of the different fusion transcripts or likely mechanisms underlying formation of the causative intrachromosomal deletion.

Here, we report on a large cohort of patients, which was collected from several European laboratories within the framework of the European LeukemiaNet. We present data on (i) the identification of several new individual fusion transcripts that are of diagnostic significance, (ii) characterization of the anatomy of three general types of fusion transcripts, (iii) detailed analysis of the great variability of fusion transcripts and (iv) evidence that the fusion gene is created by illegitimate non-homologous end-joining (NHEJ) after DNA breakage. These results have important implications for diagnosis and development of sensitive assays for the monitoring of residual disease in FIP1L1-PDGFRA-positive patients during treatment.

Materials and methods

Correspondence: Professor A Reiter, III. Medizinische Universitätsklinik, Medizinische Fakultät Mannheim der Universität Heidelberg, Theodor-Kutzer-Ufer 1–3, 68167 Mannheim, Germany. E-mail: [email protected]
Received 11 August 2008; revised 3 September 2008; accepted 2 October 2008; published online 6 November 2008

Patients
Bone marrow and peripheral blood samples from 113 patients with FIP1L1-PDGFRA-positive Eos-MPN were analyzed. Informed consent was obtained from individual patients according to the Declaration of Helsinki.

Reverse transcription-PCR
Total leukocyte RNA was extracted and reverse transcribed with random hexamers using standard techniques. The quality of cDNA was tested by the amplification of standard genes, for example, ABL1, BCR or glucose-6-phosphate dehydrogenase (G6PD). Primers for the amplification of the FIP1L1-PDGFRA fusion gene by nested reverse transcription (RT)-PCR were used as published.1,10

Genomic long-template PCR
A single set of reverse primers derived from PDGFRA intron 12 was used in combination with various forward FIP1L1 primers, which were selected according to the fusion sequence on the RNA level. The long-template PCR reactions were performed following the manufacturer’s instructions (Roche Diagnostics, Mannheim, Germany). Primer sequences are listed in Supplementary Table 1.

Bubble-PCR
Bubble-PCR to amplify genomic junction fragments was carried out essentially as described using reverse primers derived from PDGFRA intron 13 and bubble-specific primers.10

Sequence analysis
PCR products were sequenced either directly or after cloning using the TOPO cloning kit (Invitrogen, Leiden, The Netherlands). The exon numbering was according to ENSEMBL transcript ENST00000337488, which contains 18 exons. Numbering of exons 9–13 is therefore different from the initial numbering reported by Cools et al.1 in which exons 7a, 8, 8a, 9 and 10 correspond to exons 9, 10, 11, 12 and 13 in this study. The exon numbering terminology and corresponding flanking sequences can be found in Figure 1a and Supplementary Table 2.

Figure 1 (a) Distribution of breakpoints within FIP1L1 and PDGFRA. Exon numbering is shown according to ENSEMBL transcript ENST00000337488 with 18 exons and also, for the most commonly involved exons, the exon numbering system used by Cools et al.1 (b) Creation of fusion transcripts A, B and C. Transcript type A is characterized by the insertion of a sequence between an intact FIP1L1 exon and a truncated PDGFRA exon 12. The inserted sequence almost always originates from the intron immediately downstream of the relevant FIP1L1 exon. Transcript type B is characterized by an mRNA fusion consisting of a normally spliced FIP1L1 exon fused with a PDGFRA sequence located immediately downstream of the first cryptic AG splice acceptor site following the genomic breakpoint within PDGFRA exon 12. Transcript type C results when both genomic breakpoints fall within coding sequences creating a hybrid exon without use of a cryptic acceptor splice site.
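Because two exon numbering systems are in use, a small lookup can help when comparing junctions reported under either scheme. The following is only an illustrative sketch: the dictionary holds just the five correspondences stated in the text (exons 7a, 8, 8a, 9 and 10 of Cools et al. correspond to ENSEMBL exons 9 to 13), and the function names are hypothetical.

```python
# Exon correspondences stated in the text: Cools et al. numbering -> ENSEMBL
# transcript ENST00000337488 numbering. Only these five pairs are given.
COOLS_TO_ENSEMBL = {"7a": 9, "8": 10, "8a": 11, "9": 12, "10": 13}
ENSEMBL_TO_COOLS = {ensembl: cools for cools, ensembl in COOLS_TO_ENSEMBL.items()}


def to_ensembl(cools_exon: str) -> int:
    """Translate a Cools et al. exon label into the ENSEMBL exon number."""
    if cools_exon not in COOLS_TO_ENSEMBL:
        raise ValueError(f"no stated correspondence for Cools exon {cools_exon!r}")
    return COOLS_TO_ENSEMBL[cools_exon]


def to_cools(ensembl_exon: int) -> str:
    """Translate an ENSEMBL exon number (9-13) into the Cools et al. label."""
    if ensembl_exon not in ENSEMBL_TO_COOLS:
        raise ValueError(f"no stated correspondence for ENSEMBL exon {ensembl_exon}")
    return ENSEMBL_TO_COOLS[ensembl_exon]


if __name__ == "__main__":
    for label in COOLS_TO_ENSEMBL:
        print(f"Cools exon {label} = ENSEMBL exon {to_ensembl(label)}")
```

Exons outside the stated range raise an error rather than being guessed, since the text gives no correspondence for them.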

Analysis of breakpoint features
Possible clustering of FIP1L1 breakpoints was assessed using scan statistics, which determines the likelihood of a number of breakpoints falling within a certain sliding ‘window’ along a breakpoint cluster region, known as a bandwidth. P-values are calculated by comparing the density of breakpoints falling within a bandwidth with a projected homogeneous distribution across the same area.11 The size of the bandwidth can greatly influence the significance of a cluster and therefore we used several methods described in Segal and Wiemels.12 Known DNA sequence or structural features (Supplementary Table 3) were mapped along the breakpoint cluster regions using EMBOSS/Fuzznuc or custom Perl scripts. Statistical proximity of breakpoints was determined by comparing the observed proximity of breakpoints to these features to 200 random simulations of the dispersal of breakpoints and the features within the space of the target breakpoint regions.

Results

FIP1L1-PDGFRA mRNA fusions
FIP1L1-PDGFRA fusion transcripts were amplified from all 113 patients by nested RT-PCR (Supplementary Table 4). Nine different FIP1L1 exons (exons 9–16 and 18) were found to be fused with PDGFRA exon 12 that was truncated in all cases to a variable degree (Figure 1a).

FIP1L1-PDGFRA genomic breakpoints
From 85 of the 113 patients, genomic DNA of adequate quality was available. The genomic breakpoints in those 85 patients were successfully identified by long-template and/or bubble PCR (Supplementary Table 4). Within FIP1L1, genomic breakpoints were identified within a sequence stretch of 60 kb ranging from intron 9 to 16. Breakpoints were predominantly located in FIP1L1 intron 10 (n = 18, 21%), intron 11 (n = 22, 26%), intron 12 (n = 10, 12%) or intron 13 (n = 16, 19%). Rarely, genomic breakpoints were found in FIP1L1 intron 9 (n = 3, 3%), exon 11 (n = 1, 1%), exon 12 (n = 4, 4%), exon 13 (n = 3, 3%), exon 14 (n = 1, 1%), intron 14 (n = 1, 1%), intron 15 (n = 2, 2%), exon 16 (n = 2, 2%) or intron 16 (n = 2, 2%). All genomic PDGFRA breakpoints fell exclusively within exon 12 between positions +13 and +106 with most (65/85; 76%) being found between positions +27 and +84 (position +1 refers to the first base in exon 12). In the majority of patients (55/85; 65%), genomic breakpoints could not be determined with absolute precision because of short stretches (1–5 bp) of homology between the two genes.

Classification of three different transcript types
The usage of different AG acceptor splice sites and the variable insertion of FIP1L1 intron-derived sequences of different length and origin between the exons of FIP1L1 and PDGFRA enabled three main different transcript types (A, B and C) to be identified from 109 patients (Figures 1b and 2).

Transcript type A. Transcript type A was identified in 50 patients (46%) and is characterized by the insertion of a sequence between an intact FIP1L1 exon and a truncated PDGFRA exon 12. The inserted sequences were variable in length (2–76 bp) and originated in all but one of 50 patients (98%) from the intron immediately downstream of the relevant FIP1L1 exon. The inserted sequence was always preceded by an AG splice acceptor site at the genomic level. The junction sequences between FIP1L1 and PDGFRA are therefore identical at the DNA and mRNA levels as the 3′-end of the inserted sequence corresponds to the genomic FIP1L1 breakpoint. Strikingly, of the 17 patients with mRNA fusions involving FIP1L1 exon 12, only one was type A, whereas none of the 21 patients with involvement of FIP1L1 exon 13 was type B with the majority of them being type A (17/21; 81%).

Atypical inserts were seen in three patients. One patient had an insert of 76 bp that was derived from two distinct regions of the same FIP1L1 intron 13 as illustrated in Figure 3. The second patient had an inserted sequence of 43 bp that was partly derived from chromosome band 4q33, indicating a complex rearrangement. The third patient had a transcript type A-like insertion of a FIP1L1 intron-derived sequence fused with a truncated FIP1L1 exon 16 (Figure 3). The variability of the FIP1L1 exons that were involved, the length and origin of inserts, and the length of the truncated PDGFRA exon 12, all lead to formation of unique FIP1L1-PDGFRA junction sequences in each individual with a type A junction.

Transcript type B. Transcript type B was identified in 47 patients (43%) and was characterized in the majority of patients (n = 43) by an mRNA fusion consisting of a normally spliced FIP1L1 exon fused with a PDGFRA sequence located immediately downstream of the first cryptic AG splice acceptor site following the genomic breakpoint within PDGFRA exon 12 (Figure 4). Between PDGFRA exon 12 positions +13 and +106, there are five AG dinucleotides located at positions +25/26, +43/44, +48/49, +83/84 and +100/101. The usage of these potential cryptic splice sites was +25/26 (4/43 patients; 9%), +43/44 (22/43; 51%), +48/49 (0%), +83/84 (13/43, 30%) and +100/101 (4/43, 9%). Because of the constraints dictated by the different reading frames, the sites at +25/26, +43/44 and +100/101 were only used in combination with FIP1L1 exons 10 and 11, whereas +83/84 was almost exclusively used in combination with FIP1L1 exon 12 (12/13, 92%). None of the four AG splice sites within the PDGFRA breakpoint cluster region, which are usually used for transcript type B, creates an in-frame fusion with FIP1L1 exon 13. Although the AG at position +48/49 is in-frame, this is not used as a splice site with any of the available FIP1L1 exons, presumably because it is not recognized by the splicing machinery. Two patients with involvement of FIP1L1 exon 11 showed PDGFRA breakpoints in the region between +45 and +84. As mentioned above, the next AG splice site following this region can regularly create an open-reading frame only with FIP1L1 exon 12. One of these transcripts is a special type B transcript with usage of an AG splice site located at the end of the FIP1L1 intron. The second patient has a sequence variant in PDGFRA exon 12 at position 83, in which the A of the AG splice site is substituted by a G and thus the next AG after the breakpoint in this patient is at +100/101.

Figure 2 Distribution of transcript types A, B or C with respect to which FIP1L1 exon was at the mRNA fusion junction. Note that only 109 of 113 patients are shown because the transcript type could not be assigned in four patients (either transcript type B or C).

When compared with the original sequence in ENSEMBL, 25 of 26 evaluable patients had a G instead of an A at position +48 of PDGFRA exon 12 leading to a substitution from a glutamine to arginine. The genomic PDGFRA breakpoint is located between 0 and 35 bp upstream of the PDGFRA AG splice site. The junction sequences at the DNA

Figure 3 Creation of atypical transcript type A in two patients.


Figure 4 Genetic structure of PDGFRA exon 12 in 5′–3′ direction. Codons are separated by dashes and AG splice sites are indicated in gray with their corresponding number. Tryptophan residues spanning the WW-like domain are indicated in gray.
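The reading-frame constraint that links FIP1L1 exons to particular AG sites can be made concrete with a short sketch. It uses the five AG positions given in the text and the end phases of the FIP1L1 exons listed in brackets in Table 1. The frame rule itself is an assumption made for illustration: position +1 of PDGFRA exon 12 is taken to be the first base of a codon, and the end phase is taken to be the number of bases left over after the last complete codon of the FIP1L1 exon. Under that assumption the sketch reproduces the pairings reported here, but it is not the authors' procedure.

```python
# Position of the "A" of each AG dinucleotide in PDGFRA exon 12 (from the text).
AG_SITES = [25, 43, 48, 83, 100]

# End phase of each FIP1L1 exon as listed in brackets in Table 1.
END_PHASE = {9: 0, 10: 2, 11: 2, 12: 0, 13: 1, 14: 2, 15: 1, 16: 2, 17: 2, 18: 0}


def in_frame(end_phase: int, ag_a_position: int) -> bool:
    """True if splicing after this AG keeps the PDGFRA reading frame.

    ASSUMPTION: exon 12 position +1 is codon position 1, so the first
    retained PDGFRA base (two bases after the A) must sit at the codon
    offset equal to the FIP1L1 end phase.
    """
    first_retained_base = ag_a_position + 2
    return (first_retained_base - 1) % 3 == end_phase % 3


def compatible_exons(ag_a_position: int) -> list[int]:
    """FIP1L1 exons whose end phase is in-frame with the given AG site."""
    return sorted(e for e, phase in END_PHASE.items() if in_frame(phase, ag_a_position))


if __name__ == "__main__":
    for site in AG_SITES:
        print(f"AG +{site}/{site + 1}: in-frame FIP1L1 exons {compatible_exons(site)}")
```

Under the stated assumption, the sites at +25/26, +43/44 and +100/101 pair with phase +2 exons such as 10 and 11, the site at +83/84 pairs with phase 0 exons such as 12, and only the unused site at +48/49 is in-frame with exon 13.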

and mRNA levels in the same individual are therefore different, but identical fusion transcripts are found in different patients, for example, fusion of FIP1L1 exon 10 or 11 to PDGFRA exon 12 position +45 was seen in a total of 22 patients, 11 for each FIP1L1 exon. Transcript type B was not observed with involvement of FIP1L1 exon 13 because this exon cannot create an in-frame fusion with available AG splice sites of PDGFRA exon 12. In one unusual case, the AG splice site was not located in PDGFRA but was provided by the end of the intronic FIP1L1 sequence immediately upstream of the breakpoint, whereas in the remaining three patients the A was derived from the intronic FIP1L1 sequence and the G from the exonic PDGFRA sequence.

Transcript type C. Transcript type C was identified in 12 patients (11%) in which both genomic breakpoints fell within coding sequences creating a hybrid exon without use of a cryptic acceptor splice site (exon 11, n = 1; exon 12, n = 4; exon 13, n = 4; exon 14, n = 1; exon 16, n = 1 and exon 18, n = 1). An additional four patients could not be assigned to either transcript type B or C because of lack of DNA. These patients are probably transcript type C but could possibly have been transcript type B if the AG splice site is provided by the end of the FIP1L1 intron or by an AG overlap, as seen in the three patients described in the previous paragraph.

The location of breakpoints within PDGFRA exon 12 determines the transcript type
For the assessment of the breakpoint distribution in PDGFRA exon 12, data from 109 patients were available: 85 that were directly amplified from genomic DNA and a further 24 (transcript type A: n = 4; type B: n = 18; type C: n = 2) for whom the breakpoint could be inferred from the mRNA fusion. The borders of various genomic breakpoint regions were defined by the location of the AG splice sites that are capable of maintaining an in-frame fusion. Breakpoints were located between +13 and +26 of PDGFRA exon 12 for 11 (10%) patients (region 1), between +27 and +44 for 27 (25%) patients (region 2), between +45 and +84 for 58 (53%) patients (region 3) and between +85 and +101 for 11 (10%) patients (region 4), respectively. Two patients showed breakpoints at +105 or +106.

In regions 1 (+13–26) and 4 (+85–101) of PDGFRA exon 12, the numbers of patients were too small to draw significant conclusions. Breakpoints in region 2 (+27–44) generated type B transcripts in the majority of patients (22/27, 82%) and involved only FIP1L1 exons 10 and 11. Breakpoints in region 3 (+45–84) gave rise to the majority of type A transcripts (40/50, 80%) with predominant involvement of FIP1L1 exons 10, 11 and 13 (33/40, 83%). Of interest, the type B transcripts associated with breakpoints in this region almost exclusively involved FIP1L1 exon 12 (12/13, 92%). Conversely, only one patient involving FIP1L1 exon 12 led to formation of a type A transcript (Table 1).

Clustering of breakpoints
We used scan statistics to determine the statistical significance of clustering. This procedure involves choosing a length of sequence, termed a ‘bandwidth’, and sliding this window along the breakpoint cluster regions. The frequency of breakpoints falling within the window is compared to a random distribution of breakpoints; if this number exceeds a defined threshold, then statistically significant clustering is achieved. The size of the bandwidth must be chosen carefully to prevent artifactual significance; for this reason, we used several statistical techniques12 to choose bandwidth, and we only report results as significant when they were robust to several bandwidths. Considering breakpoints of all types together, clustering as assessed by scan statistics was significant at a bandwidth of 660 bp (P = 0.01) becoming much more significant at higher bandwidths (P = 8 × 10⁻⁶ at a bandwidth of 6323 bp). Type A breakpoints were not significant at any bandwidth but type B were highly significant (P < 0.005 at bandwidths ranging from 1746 to 6915 bp). Transcript type C breakpoints were borderline significantly clustered (P = 0.06 at bandwidth of 1700 bp) but are structurally confined to the ends of exons and therefore the clustering is dependent on mRNA structure rather than DNA sequence features. Breakpoints within PDGFRA were highly clustered overall, and among transcripts type A (P = 0.01), but not for the B and C types (P ≈ 1.0). The number of clusters was assessed using a gap statistic as described.12 FIP1L1 showed four clusters overall, and for the type B variant, two. PDGFRA demonstrated evidence for five clusters overall, and four for type A variants (Figure 5). A search for two-way clustering between FIP1L1 and PDGFRA breakpoints did not demonstrate significant associations.

Known DNA sequence or structural features (Supplementary Table 3) were also mapped along the breakpoint cluster regions. There were no significant associations with repetitive elements (including LINE and Alu sequences); however, predicted matrix attachment regions colocalized to some degree with breakpoint cluster regions. To explore this further, 200 simulations were performed comparing the actual distribution of breakpoints and features to randomized assortments of the breakpoints and features. In this analysis, predicted scaffold/matrix attachment regions (by the SMARTest procedure13,14) were found to be over-represented 100 bp prior to the breakpoints (P = 0.01). Smaller known motifs, such as topoisomerase II sites, recombination site sequences and pyrimidine tracts (Supplementary Table 3), were

Table 1 Genomic breakpoints within PDGFRA exon 12, corresponding FIP1L1 exons and transcript type

Breakpoint location within PDGFRA exon 12 (each location is subdivided by transcript type A, B and C):
+13–26 (n = 11); +27–44 (n = 27); +45–84 (n = 58); +85–101 (n = 11); +102–106 (n = 2)

FIP1L1 exon (end phase): patient counts in the non-empty cells, from left to right; row total
9 (0): 1, 1, 1, 1; total 4
10 (+2): 2, 3, 11, 6, 1, 1, 1; total 25
11 (+2): 4, 1, 11, 12, 2a, 1, 1, 1; total 33
12 (0): 1, 1, 12, 2, 1; total 17
13 (+1): 2, 15, 2, 2; total 21
14 (+2): 1, 1; total 2
15 (+1): 2; total 2
16 (+2): 2, 1, 1; total 4
17 (+2): no patients
18 (0): 1; total 1

Column totals (A, B, C): +13–26: 3, 7, 1; +27–44: 2, 22, 3; +45–84: 40, 15, 3; +85–101: 4, 3, 4; +102–106: 1 (A), 1 (C); overall total 109

Gray boxes indicate breakpoints belonging to transcript type B, which cannot create an in-frame fusion with the available AG splice site.
aThe marked cases are special type B transcripts that skip the AG splice site at +83/84, which is not in-frame with FIP1L1 exon 11.
The end phase of each FIP1L1 exon is listed in brackets.
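The relationship between breakpoint region and transcript type summarized in Table 1 can be tallied directly from the column totals. The sketch below is a reader's aid under stated assumptions: the counts are copied from the Total row of Table 1 (with no type B case in the +102–106 region, as the text describes), the region borders follow the text, and the helper names are hypothetical.

```python
# Region borders within PDGFRA exon 12, as defined in the text.
REGIONS = [
    ("+13-26", 13, 26),
    ("+27-44", 27, 44),
    ("+45-84", 45, 84),
    ("+85-101", 85, 101),
    ("+102-106", 102, 106),
]

# Column totals from Table 1: patients per region and transcript type.
COUNTS = {
    "+13-26": {"A": 3, "B": 7, "C": 1},
    "+27-44": {"A": 2, "B": 22, "C": 3},
    "+45-84": {"A": 40, "B": 15, "C": 3},
    "+85-101": {"A": 4, "B": 3, "C": 4},
    "+102-106": {"A": 1, "B": 0, "C": 1},
}


def region_of(position: int) -> str:
    """Return the breakpoint region label for an exon 12 position."""
    for label, start, end in REGIONS:
        if start <= position <= end:
            return label
    raise ValueError(f"position +{position} lies outside the breakpoint cluster region")


def type_totals() -> dict[str, int]:
    """Total patients per transcript type, summed over all regions."""
    totals = {"A": 0, "B": 0, "C": 0}
    for per_type in COUNTS.values():
        for transcript_type, n in per_type.items():
            totals[transcript_type] += n
    return totals


def share_within_region(region: str, transcript_type: str) -> float:
    """Fraction of a region's patients that carry the given transcript type."""
    per_type = COUNTS[region]
    return per_type[transcript_type] / sum(per_type.values())


if __name__ == "__main__":
    print(type_totals())
    for label, _, _ in REGIONS:
        print(label, sum(COUNTS[label].values()), "patients")
```

Summing the columns this way recovers the 50 type A, 47 type B and 12 type C cases reported in the text, and 58 patients for the +45–84 region.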

[Figure 5 panels: gap statistic (y axis) plotted against number of clusters k, 1 to 5 (x axis). Top row: FIP1L1 breakpoints; FIP1L1, type A; FIP1L1, type B; FIP1L1, type C. Bottom row: PDGFRA breakpoints; PDGFRA, type A; PDGFRA, type B; PDGFRA, type C.]
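The scan-statistic idea used for the clustering analysis can be illustrated with a small simulation. This is a minimal sketch, not the method used in the study: the paper obtains P-values from the approximations cited in the Methods, whereas the code below substitutes a plain Monte Carlo comparison against uniformly scattered breakpoints. The breakpoint positions, the region length and all function names are hypothetical.

```python
import random


def max_window_count(positions, bandwidth):
    """Largest number of breakpoints covered by any window of the given width."""
    pts = sorted(positions)
    best = 0
    left = 0
    for right in range(len(pts)):
        # Shrink the window from the left until it spans at most `bandwidth`.
        while pts[right] - pts[left] > bandwidth:
            left += 1
        best = max(best, right - left + 1)
    return best


def scan_p_value(positions, region_length, bandwidth, n_sim=500, seed=1):
    """Monte Carlo P-value for the observed maximum window count.

    Null model (an assumption of this sketch): the same number of
    breakpoints scattered uniformly over the region.
    """
    rng = random.Random(seed)
    observed = max_window_count(positions, bandwidth)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.uniform(0, region_length) for _ in positions]
        if max_window_count(simulated, bandwidth) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical positions (bp) in a 60 kb region: ten clustered, five scattered.
    clustered = [1000 + 10 * i for i in range(10)] + [9000, 21000, 33000, 45000, 57000]
    print(scan_p_value(clustered, region_length=60_000, bandwidth=660))
```

Repeating the call over several bandwidths, as the text describes, shows how strongly the apparent significance depends on the window size.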

Figure 5 Assessment of number of clusters for FIP1L1 using gap statistics with hierarchical clustering algorithm using Euclidean distance and complete linkage.12 The gap statistic is calculated at each number of theoretical clusters (1, 2, …5), and where the gap statistic displays the lowest value, a characteristic ‘kink’ in the graph is displayed; this kink corresponds to the number of clusters that is best supported by the data. In this way, one can see that FIP1L1 breakpoints overall have four clusters, whereas FIP1L1-PDGFRA transcript type B has only two clusters.

not represented near breakpoints more often than by chance (data not shown).

Discussion

FIP1L1-PDGFRA has emerged as the second most common fusion gene in MPNs after BCR-ABL1; however, the creation and diversity of FIP1L1-PDGFRA fusion transcripts are much more complex. Within PDGFRA, all genomic breakpoints are tightly clustered between positions +13 and +106 of exon 12, whereas breakpoints in FIP1L1 are much more dispersed. Breakpoints within the same region of exon 12 are also seen in rare cases with variant PDGFRA fusions involving alternative partner genes.10,15–17 Rather than PDGFRA exon 12 being a site that is particularly prone to breakage, the most likely reason for this clustering is that breakpoints in this region specifically disrupt the autoinhibitory WW-like domain between its two tryptophan residues W559 and W586. Stover et al.18 have shown earlier that this disruption of the WW-like domain is essential for the activation of the PDGFRA kinase by FIP1L1, which, in contrast to other TK fusion partners, is unlikely to contain a known protein dimerization domain. FIP1L1 is therefore completely dispensable for PDGFRA activation. Of interest, we identified two patients with PDGFRA breakpoints that do not disrupt the autoinhibitory WW-domain, suggesting that at least in these cases alternative mechanisms contribute to tyrosine kinase activation. Moreover, recent data have shown

that FIP1L1-RARA fusion proteins form homodimers in a case of acute promyelocytic leukemia, suggesting that the dimerization of this fusion protein is presumably mediated by the FIP1L1 moiety.19 However, more experimental studies are required to confirm these results to finally assess the impact of FIP1L1 for oncoprotein activation.

The use of various cryptic AG splice sites in FIP1L1 and PDGFRA usually determines the transcript type, which is either with (transcript type A) or without (transcript type B) an inserted sequence derived from a FIP1L1 intron. We have demonstrated that the location of genomic breakpoints within PDGFRA exon 12 is strongly associated with the FIP1L1 exons used for the creation of the fusion gene and also the transcript type. In transcript type A, any FIP1L1 exon could theoretically be involved in the creation of the fusion gene if the intron sequence between the AG splice site and the genomic breakpoint can maintain an in-frame fusion. In addition, there is a significant relationship (Pearson correlation coefficient r = 0.84, P = 0.0042) for transcript type A between the relative frequency of involvement of specific FIP1L1 exons and the size of the intron immediately downstream of that exon (Figure 6).

Figure 6 Correlation between the size of the FIP1L1 intron and the number of breakpoints falling within that intron.

The formation of transcript type B depends on the creation of an in-frame fusion between the end of a FIP1L1 exon and PDGFRA sequence preceded by a cryptic splice site. FIP1L1 exon 12 is therefore used only if the PDGFRA breakpoint is located between positions +45 and +84 as only the AG site at position +83/84 was found to create an in-frame fusion with FIP1L1 exon 12. The AG splice site at position +107/108 would theoretically be available for an in-frame fusion; however, no genomic breakpoint with creation of transcript type B was found distal to the previously used AG splice site +100/101. The two genomic breakpoints between positions +102 and +106 create transcript types A and C, respectively. FIP1L1 exon 13 is not involved in transcript type B as it cannot create an in-frame fusion with the cryptic AG splice sites within PDGFRA exon 12.

The vast majority of type A transcripts are created in those regions of PDGFRA where transcript type B is less likely. For example, when PDGFRA breakpoints fell between positions +45 and +84, transcript type A was found in the majority of patients, whereas transcript type B was created almost exclusively by fusion with FIP1L1 exon 12. Both transcript types should be possible when the PDGFRA breakpoints fall between positions +27 and +44 of PDGFRA exon 12; however, transcript type B is created in over 80% of patients. This is most likely explained by the constraints on the breakpoint positions for transcript type A required to create a productive in-frame mRNA fusion, whereas the breakpoints can be much more variable for the creation of transcript type B. For transcript type C, the constraints are even tighter as the translocation has to generate an in-frame fusion exon without any splicing. Owing to the rarity of this fusion, it is not clear whether any factors might favor the creation of this transcript type but it appears that they are formed in regions where production of transcript type B is not possible.

DNA double-strand breaks (DSBs) are a dangerous threat to genomic integrity and can occur from a variety of exogenous agents (for example, ionizing radiation, topoisomerase II-specific drugs) or endogenous DNA-damaging lesions (for example, V(D)J recombinase, reactive oxygen species).20 There are two main hypotheses that have been put forward to explain how DSBs are formed. They could simply represent completely random events and thus, leukemia-associated fusion genes would be a consequence of biological selection for oncogenic products. Alternatively, DSBs might preferentially occur at particular genomic regions as they result from mechanisms that use either specific sequence targets or sequence-dependent secondary structures.

In mammalian cells, repair of DSB is performed by two competing mechanisms: homologous recombination, which requires regions of homology at the breakpoints, and NHEJ in which two non-homologous broken ends are ligated together. Because DSBs are often staggered, NHEJ is frequently associated with small deletions, duplications or insertions at the breakpoints.21–23 Chromosome translocations in myeloid disorders generally show features of NHEJ, for example, rearrangements of the MLL and NUP98 genes,23–25 the t(15;17), t(8;21) and inv(16) in acute myeloid leukemia26–29 and the t(9;22) in chronic myeloid leukemia.30 After repair, the newly created chimeric sequence has to fulfill several structural and functional requirements to be translated into a protein with oncogenic properties: first, the transcribed mRNA sequence has to be in-frame, which is facilitated by the use of cryptic AG splice sites by the cellular splicing machinery as discussed above. Both fusion partners have to retain the critical functional domains required for malignant transformation, for example, in the case of BCR-ABL1, BCR must provide an oligomerization domain, whereas the complete tyrosine kinase domain has to be retained within ABL1. As mentioned, it is much less clear what particular FIP1L1 domains are essentially required for PDGFRA activation, although the disruption of the autoinhibitory WW-like domain seems to be the most likely candidate.

FIP1L1-PDGFRA forms by an intrachromosomal deletion rather than a translocation but the sequence features at the genomic breakpoint junctions are also consistent with NHEJ following two DSBs. Statistical analyses provided some evidence for clustering of breakpoints within FIP1L1 that may be related to DNA- or chromatin-related structural features. Significant clustering was evident for all breakpoints when considered together, and also for type B breakpoints considered in isolation. No evidence for clustering of FIP1L1 breakpoints in type A or C cases was found; however, type A breakpoints were associated with predicted scaffold attachment regions, and within the PDGFRA breakpoint cluster region, type A breakpoints formed a single cluster proximal to a scaffold attachment region. Type B FIP1L1 breakpoints did not cluster near any of the structural features, but generally fell within introns close to exons 11, 12 or 13. The observed statistical clustering of type B breakpoints may therefore be governed by gene structure (that is, proper splicing and open-reading frame) and specifically

the positions of exon–intron boundaries rather than chromatin/DNA structural features.

In addition to providing information relevant for the molecular diagnosis of FIP1L1-PDGFRA-positive disease, our findings are relevant to the design of strategies to detect minimal residual disease by RT-PCR. Although nine different FIP1L1 exons were found to be fused with the truncated PDGFRA exon 12, only four FIP1L1 exons (exons 10–13) are involved in the vast majority of patients (100/113, 88%; Table 1). Jovanovic et al.6 developed a sensitive quantitative RT-PCR assay with FIP1L1 primers covering FIP1L1 exons 9–13 and a reverse PDGFRA primer overlapping exon 13/14. Only 19 of the 113 (17%) patients in this study could not be detected by this assay due to FIP1L1 breakpoints downstream of FIP1L1 exon 13.

In summary, our results demonstrate that the marked heterogeneity of the FIP1L1-PDGFRA fusion gene is not a random event. The location of the genomic breakpoint within a very short stretch of 94 bp within PDGFRA exon 12 and certain characteristics within FIP1L1 (intron size, structural features and availability of AG splice sites) determine the anatomy of the fusion gene, which is created by illegitimate NHEJ. Because the PDGFRA breakpoint cluster region is constrained by functionally relevant domains for the fusion gene, alternative genomic PDGFRA breakpoints seem unlikely. The great variability in the anatomy of the FIP1L1-PDGFRA fusion gene has important implications for strategies to detect the fusion in patients at diagnosis and also for the development of quantitative RT-PCR assays to monitor response to treatment with TK inhibitors.

Acknowledgements

This study was supported by the ‘Deutsche José Carreras Leukämie-Stiftung eV’, DJCLS R06/02 and H03/01, Germany, Leukaemia Research Fund, UK, the Competence Network ‘Acute and Chronic Leukemias’, sponsored by the German Bundesministerium für Bildung und Forschung (Projektträger Gesundheitsforschung; DLR eV - 01GI9980/6) and the ‘European LeukemiaNet’ within the 6th European Community Framework Programme for Research and Technological Development. R-FY and JLW were supported by NCI-CA89032. GM was supported by FIRB2006 and AIRC.

References

1 Cools J, DeAngelo DJ, Gotlib J, Stover EH, Legare RD, Cortes J et al. A tyrosine kinase created by fusion of the PDGFRA and FIP1L1 genes as a therapeutic target of imatinib in idiopathic hypereosinophilic syndrome. N Engl J Med 2003; 348: 1201–1214.
2 Vandenberghe P, Wlodarska I, Michaux L, Zachee P, Boogaerts M, Vanstraelen D et al.
… induction of major molecular responses and achievement of complete molecular remission in FIP1L1-PDGFRA positive chronic eosinophilic leukemia. Blood 2007; 109: 4635–4640.
7 Baccarani M, Cilloni D, Rondoni M, Ottaviani F, Messa F, Merante S et al. Imatinib mesylate induces complete and durable responses in all patients with the FIP1L1-PDGFRa positive hypereosinophilic syndrome. Results of a multicenter study. Haematologica 2007; 92: 1173–1179.
8 Metzgeroth G, Walz C, Score J, Siebert R, Schnittger S, Haferlach C et al. Recurrent finding of the FIP1L1-PDGFRA fusion gene in eosinophilia-associated acute myeloid leukemia and lymphoblastic T-cell lymphoma. Leukemia 2007; 21: 1183–1188.
9 Metzgeroth G, Walz C, Erben P, Popp H, Schmitt-Graeff AH, Haferlach C et al. Safety and efficacy of imatinib in chronic eosinophilic leukemia and hypereosinophilic syndrome - a phase-II study. Br J Haematol 2008; 143: 707–715.
10 Score J, Curtis C, Waghorn K, Stalder M, Jotterand M, Grand FH et al. Identification of a novel imatinib responsive KIF5B-PDGFRA fusion gene following screening for PDGFRA overexpression in patients with hypereosinophilia. Leukemia 2006; 20: 827–832.
11 Loader CR. Large deviation approximations to the distribution of scan statistics. Adv Appl Probab 1991; 23: 751–771.
12 Segal MR, Wiemels JL. Clustering of translocation breakpoints. J Am Stat Assoc 2002; 97: 66–76.
13 Frisch M, Frech K, Klingenhoff A, Cartharius K, Liebich I, Werner T. In silico prediction of scaffold/matrix attachment regions in large genomic sequences. Genome Res 2002; 12: 349–354.
14 Liebich I, Bode J, Reuter I, Wingender E. Evaluation of sequence motifs found in scaffold/matrix-attached regions (S/MARs). Nucleic Acids Res 2002; 30: 3433–3442.
15 Baxter EJ, Hochhaus A, Bolufer P, Reiter A, Fernandez JM, Senent L et al. The t(4;22)(q12;q11) in atypical chronic myeloid leukaemia fuses BCR to PDGFRA. Hum Mol Genet 2002; 11: 1391–1397.
16 Curtis CE, Grand FH, Musto P, Clark A, Murphy J, Perla G et al. Two novel imatinib-responsive PDGFRA fusion genes in chronic eosinophilic leukaemia. Br J Haematol 2007; 138: 77–81.
17 Walz C, Curtis C, Schnittger S, Schultheis B, Metzgeroth G, Schoch C et al. Transient response to imatinib in a chronic eosinophilic leukemia associated with ins(9;4)(q33;q12q25) and a CDK5RAP2-PDGFRA fusion gene. Genes Chromosomes Cancer 2006; 45: 950–956.
18 Stover EH, Chen J, Folens C, Lee BH, Mentens N, Marynen P et al. Activation of FIP1L1-PDGFRalpha requires disruption of the juxtamembrane domain of PDGFRalpha and is FIP1L1-independent. Proc Natl Acad Sci USA 2006; 103: 8078–8083.
19 Kondo T, Mori A, Darmanin S, Hashino S, Tanaka J, Asaka M. The seventh pathogenic fusion gene FIP1L1-RARA was isolated from a t(4;17)-positive acute promyelocytic leukemia. Haematologica 2008; 93: 1414–1416.
20 Bowater R, Doherty AJ. Making ends meet: repairing breaks in bacterial DNA by non-homologous end-joining. PLoS Genet 2006; 2: e8.
21 Hefferin ML, Tomkinson AE. Mechanism of DNA double-strand break repair by non-homologous end joining. DNA Repair (Amst) 2005; 4: 639–648.
22 Wiemels JL, Alexander FE, Cazzaniga G, Biondi A, Mayer SP, Greaves M. Microclustering of TEL-AML1 translocation break-
Clinical and molecular features of FIP1L1- points in childhood acute lymphoblastic leukemia. Genes PDFGRA (+) chronic eosinophilic leukemias. Leukemia 2004; 18: Chromosomes Cancer 2000; 29: 219–228. 734–742. 23 Gillert E, Leis T, Repp R, Reichel M, Hosch A, Breitenlohner I et al. 3 Pardanani A, Ketterling RP, Brockman SR, Flynn HC, Paternoster A DNA damage repair mechanism is involved in the origin of SF, Shearer BM et al. CHIC2 deletion, a surrogate for FIP1L1- chromosomal translocations t(4;11) in primary leukemic cells. PDGFRA fusion, occurs in systemic mastocytosis associated with Oncogene 1999; 18: 4663–4671. eosinophilia and predicts response to imatinib therapy. Blood 24 Reichel M, Gillert E, Nilson I, Siegler G, Greil J, Fey GH et al. Fine 2003; 102: 3093–3096. structure of translocation breakpoints in leukemic blasts with 4 Pardanani A, Brockman SR, Paternoster SF, Flynn HC, Ketterling chromosomal translocation t(4;11): the DNA damage-repair model RP, Lasho TL et al. FIP1L1-PDGFRA fusion: prevalence and of translocation. Oncogene 1998; 17: 3035–3044. clinicopathologic correlates in 89 consecutive patients with 25 Nebral K, Schmidt HH, Haas OA, Strehl S. NUP98 is fused to moderate to severe eosinophilia. Blood 2004; 104: 3038–3045. topoisomerase (DNA) IIbeta 180 kDa (TOP2B) in a patient with 5 Martinelli G, Malagola M, Ottaviani E, Rosti G, Trabacchi E, acute myeloid leukemia with a new t(3;11)(p24;p15). Clin Cancer Baccarani M. Imatinib mesylate can induce complete molecular Res 2005; 11: 6489–6494. remission in FIP1L1-PDGFR-a positive idiopathic hypereosino- 26 Reiter A, Saussele S, Grimwade D, Wiemels JL, Segal MR, Lafage- philic syndrome. Haematologica 2004; 89: 236–237. Pochitaloff M et al. Genomic anatomy of the specific reciprocal 6 Jovanovic JV, Score J, Waghorn K, Cilloni D, Gottardi E, translocation t(15;17) in acute promyelocytic leukemia. Genes Metzgeroth G et al. Low-dose imatinib mesylate leads to rapid Chromosomes Cancer 2003; 36: 175–188.

27 van der Reijden BA, Dauwerse HG, Giles RH, Jagmohan-Changur S, Wijmenga C, Liu PP et al. Genomic acute myeloid leukemia-associated inv(16)(p13q22) breakpoints are tightly clustered. Oncogene 1999; 18: 543–550.
28 Xiao Z, Greaves MF, Buffler P, Smith MT, Segal MR, Dicks BM et al. Molecular characterization of genomic AML1-ETO fusions in childhood leukemia. Leukemia 2001; 15: 1906–1913.
29 Wiemels JL, Greaves M. Structure and possible mechanisms of TEL-AML1 gene fusions in childhood acute lymphoblastic leukemia. Cancer Res 1999; 59: 4075–4082.
30 Zhang JG, Goldman JM, Cross NC. Characterization of genomic BCR-ABL breakpoints in chronic myeloid leukaemia by PCR. Br J Haematol 1995; 90: 138–146.

Supplementary Information accompanies the paper on the Leukemia website (http://www.nature.com/leu)
