Leukemia (2007) 21, 706–713 © 2007 Nature Publishing Group All rights reserved 0887-6924/07 $30.00 www.nature.com/leu ORIGINAL ARTICLE

Optimization of PCR-based minimal residual disease diagnostics for childhood acute lymphoblastic leukemia in a multi-center setting

VHJ van der Velden1, ER Panzer-Grümayer2, G Cazzaniga3, T Flohr4, R Sutton5, A Schrauder6, G Basso7, M Schrappe6, JM Wijkhuijs1, M Konrad2, CR Bartram4, G Masera3, A Biondi3, JJM van Dongen1

1Department of Immunology, Erasmus MC, Rotterdam, The Netherlands; 2Children's Cancer Research Institute and St Anna Kinderspital, Vienna, Austria; 3M Tettamanti Research Center, Pediatric Clinic, San Gerardo Hospital, University of Bicocca, Monza, Italy; 4Institute of Human Genetics, University of Heidelberg, Heidelberg, Germany; 5Children's Cancer Institute Australia for Medical Research, University of NSW, Sydney, Australia; 6Department of Pediatrics, University Hospital Schleswig-Holstein, Campus Kiel, Germany; 7Hemato-Oncology Laboratory, Department of Pediatrics, University of Padova, Italy

Minimal residual disease (MRD) diagnostics is used for treatment stratification in childhood acute lymphoblastic leukemia. We aimed to identify and solve potential problems in multicenter MRD studies to achieve and maintain consistent results between the AIEOP/BFM ALL-2000 MRD laboratories. As the dot-blot hybridization method was replaced by the real-time quantitative polymerase chain reaction (RQ-PCR) method during the treatment protocol, special attention was given to the comparison of MRD data obtained by both methods and to the reproducibility of RQ-PCR data. Evaluation of all key steps in molecular MRD diagnostics identified several pitfalls that resulted in discordant MRD results. In particular, guidelines for RQ-PCR data interpretation appeared to be crucial for obtaining concordant MRD results. The experimental variation of the RQ-PCR was generally less than three-fold, but logically became larger at low MRD levels below the reproducible sensitivity of the assay (<10⁻⁴). Finally, MRD data obtained by dot-blot hybridization were comparable to those obtained by RQ-PCR analysis (r² = 0.74). In conclusion, MRD diagnostics using RQ-PCR analysis of immunoglobulin/T-cell receptor gene rearrangements is feasible in multicenter studies but requires standardization; particularly strict guidelines for interpretation of RQ-PCR data are required. We further recommend regular quality control for laboratories performing MRD diagnostics in international treatment protocols.

Leukemia (2007) 21, 706–713. doi:10.1038/sj.leu.2404535; published online 8 February 2007

Keywords: minimal residual disease; real-time quantitative PCR; quality control; reproducibility; immunoglobulin; T-cell receptor

Introduction

Several studies have shown that detection of minimal residual disease (MRD) has prognostic relevance in childhood acute lymphoblastic leukemia (ALL).1–8 On the basis of MRD analysis during the early phases of treatment, preferably at two different time points, MRD-based risk groups can be recognized. Within the International BFM Study Group (I-BFM-SG), patients were classified according to MRD levels at day 33 and day 78 of therapy, and three risk groups could be distinguished: low-risk patients (LR), having MRD negativity at both time points (about 45% of patients; 5-year relapse rate of 2%); patients at high-risk (HR), having high (≥10⁻³) MRD levels at both time points (about 15% of patients; 5-year relapse rate of 80%); and the remaining patients at intermediate risk (IR; 5-year relapse rate of 22%).3,9 Of note, for recognition of LR patients, the MRD assay had to reach a sensitivity of at least 10⁻⁴.3,9 On the basis of these results, MRD diagnostics for treatment stratification is currently applied in many childhood ALL treatment protocols, including the ongoing AIEOP/BFM ALL-2000 and DCOG-ALL10 protocols.

Analysis of MRD is mostly performed using polymerase chain reaction (PCR) analysis of immunoglobulin (Ig) and T-cell receptor (TCR) gene rearrangements, as this method is applicable in the vast majority of childhood ALL patients and generally reaches sensitivities of 10⁻⁴ required for identification of LR patients.3,9,10 Within the I-BFM-SG, PCR analysis was initially followed by dot-blot hybridization using a radio-labeled junctional region-specific probe, resulting in a semi-quantitative analysis of MRD levels.3,9 In the meantime, real-time quantitative RQ-PCR analysis has become available and offers an easier, faster and more quantitative method for MRD analysis.11

MRD detection by PCR analysis of rearranged Ig/TCR genes is however a complex process, involving many steps (Figure 1). Identification of pitfalls in this process is of importance in order to ensure comparable MRD results between the MRD-PCR laboratories of multicenter national or international treatment protocols. Furthermore, the move from a laboratory research tool used for retrospective analysis of clinical trials to a diagnostic tool for stratification of patients necessitates uniformity in MRD data not only within single treatment protocols but also between different treatment protocols.

Within the MRD Task Force of the I-BFM-SG, we therefore aimed to identify and solve potential problems in multicenter MRD studies and to achieve and maintain consistent MRD results between the MRD-PCR laboratories participating in the AIEOP/BFM ALL-2000 protocol. To this end, we evaluated several steps in PCR-based MRD detection, including detection and sequencing of Ig/TCR gene rearrangements, MRD analysis of follow-up samples, and interpretation of RQ-PCR MRD data. As the dot-blot hybridization method was fully replaced by RQ-PCR techniques during the course of the AIEOP-BFM ALL-2000 protocol, we particularly focused on the comparison of MRD data obtained by both methods and on the reproducibility of the RQ-PCR methods, both experimental variation and variation in the interpretation of RQ-PCR data.
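The risk-group definition given in the Introduction can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the use of `None` to encode an MRD-negative result, and the premise that the assay reached a sensitivity of at least 10⁻⁴ are not taken from the protocol itself.

```python
from typing import Optional

# HR cutoff quoted in the text: MRD >= 10^-3 at both time points.
HIGH_MRD_THRESHOLD = 1e-3


def mrd_risk_group(day33: Optional[float], day78: Optional[float]) -> str:
    """Return 'LR', 'HR' or 'IR' from MRD levels at day 33 and day 78.

    Hypothetical encoding: None means MRD negative. The sketch assumes
    the assay was sensitive enough (at least 10^-4) for a negative
    result to be meaningful.
    """
    if day33 is None and day78 is None:
        return "LR"  # MRD negativity at both time points
    both_high = (
        day33 is not None and day33 >= HIGH_MRD_THRESHOLD
        and day78 is not None and day78 >= HIGH_MRD_THRESHOLD
    )
    if both_high:
        return "HR"  # high MRD levels at both time points
    return "IR"      # all remaining patients


print(mrd_risk_group(None, None))   # negative at both time points
print(mrd_risk_group(1e-2, 1e-3))   # high at both time points
print(mrd_risk_group(1e-2, None))   # everything else
```

The intermediate group is deliberately the fall-through case, mirroring the wording "the remaining patients".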

Correspondence: Prof Dr JJM van Dongen, Department of Immunology, Erasmus MC, University Medical Center Rotterdam, Dr Molewaterplein 50, 3015 GE Rotterdam, The Netherlands. E-mail: [email protected]
Received 10 May 2006; revised 6 September 2006; accepted 15 November 2006; published online 8 February 2007

Materials and methods

MRD analysis
DNA was isolated as described previously.12 The presence of IGK-Kde, TCRG and TCRD rearrangements in diagnostic samples was determined using various primer combinations.3,13,14 Complete IGH rearrangements were detected using five VH family primers in combination with one consensus JH primer.15 Sequence analysis was performed as described previously.3

MRD levels in follow-up samples were either analyzed by dot-blot hybridization3 or by RQ-PCR analysis. Four laboratories performed RQ-PCR analysis using the ABI Prism equipment (ABI Prism 7700, 7900 or 7000) and single PCR assay with hydrolysis (TaqMan) probes;11,15–17 one laboratory performed a nested PCR assay in which the second PCR was run on the Light Cycler using SYBR Green I.18 As these two approaches theoretically differ considerably, the results are shown separately where relevant.

Exchange of samples and data
Several steps of blinded testing were conducted on DNA samples from 18 patients with childhood ALL in the MRD laboratories of the AIEOP-ALL 2000 and ALL-BFM 2000 protocols: Vienna, Heidelberg/Hannover and Monza/Padova. The samples were not chosen randomly but selected by the Rotterdam MRD laboratory, based on their potential to identify and understand pitfalls that might cause discrepancies in MRD results. Also 20 RQ-PCR data files were selected by Rotterdam and circulated for independent interpretation.

All experiments were performed under routine conditions in parallel to the ongoing MRD diagnostics. All results were discussed in depth during closed meetings with participation of all laboratories, including scientists, technicians and clinicians. The regular and open discussions were essential for the learning process and for making agreements on the standardization of the MRD analysis.

RQ-PCR reproducibility experiments
To evaluate the reproducibility of the RQ-PCR analysis and the interpretation of RQ-PCR data, several analyses were performed in the MRD laboratories of Vienna, Heidelberg, Monza, Rotterdam and Sydney, the latter performing the MRD analysis for the I-BFM-SG-related Australian ANZCHOG Study VIII clinical trial. First, each of the five participating laboratories repeated the RQ-PCR MRD assays for a number of patient cases (total number of patients: 74). These repetitions were performed using new DNA dilutions but the same oligonucleotides, one to several months after the initial analysis. Second, the newly obtained RQ-PCR data were interpreted by both the executing laboratory and a second laboratory.

Data analysis
All data were analyzed by the department of Immunology, Rotterdam (VHJvdV). Data were presented non-blinded to facilitate the identification of the underlying causes of discrepancies and the discussion of how to overcome the pitfalls and achieve concordance.

Results and discussion

MRD diagnostics using PCR analysis of Ig/TCR gene rearrangements includes three main steps: (1) MRD-PCR target identification; (2) sensitivity testing; and (3) MRD analysis of follow-up samples (Figure 1). These three main steps were evaluated by comparing the results obtained in the laboratories of the I-BFM-SG MRD task force using centrally provided samples and data files.

Figure 1 Overview of all key steps in MRD analysis applying Ig/TCR gene rearrangements. (a) MRD-PCR target identification. (b) Sensitivity testing. (c) MRD analysis of follow-up samples.

Evaluation of step 1: MRD-PCR target identification

Potential MRD-PCR targets were identified by PCR-heteroduplex analysis in eight ALL patients. These eight patients were not chosen randomly, but were selected based on the availability of sufficient DNA, the presence of particular rearrangements and/or the presence of subclonal rearrangements. As shown in Table 1, a total of 40 clonal Ig/T-cell receptor rearrangements could be detected by at least one of the four participating laboratories. Twenty-five out of these 40 rearrangements (63%) were identified in all four laboratories. Discrepancies in target identification between the four laboratories were particularly caused by: 1, lack of detection of clonal Ig/TCR gene rearrangements; 2, sequencing errors; and 3, errors in sequence interpretation (Figure 1a).

Lack of detection of clonal Ig/TCR gene rearrangements. The detection of Ig/TCR gene rearrangements is dependent on factors such as the applied primer set, the PCR conditions, the amount of DNA input, the quality of the DNA and the amount of PCR product used for heteroduplex analysis. For example, the missed VH3-JH rearrangement in TF2 (see Table 1) might be owing to the use of a consensus FR3 primer instead of VH family-specific FR1 primers. Consequently, all four laboratories agreed to use the BIOMED-1 primer sets and PCR protocol.13

By PCR-heteroduplex analysis, several clonal PCR products showed a (very) weak band on the gel, suggesting a subclonal origin. In case of weak clonal bands in heteroduplex analysis (e.g., the Vγ3–Jγ2.3 rearrangement in TF9; Table 1), further identification of the rearrangements by sequencing was not performed in all laboratories, resulting in apparent discrepancies in the reported rearrangements between the four laboratories.

The presence of Ig/TCR gene rearrangements in a (minor) subclone may hamper its identification. To facilitate the interpretation of the PCR data, Southern blot-analysis was performed in one laboratory (Rotterdam).19,20 Indeed, the IGH and TCRD rearrangements in TF2 and the IGH rearrangement in TF12, which were detected by PCR in only some of the laboratories, appeared to be oligoclonal according to Southern-blot analysis.

It was agreed that the use of an oligoclonal PCR target should preferably be avoided for MRD analysis, because only a part of the leukemic clone will be monitored and it is not known at diagnosis which subclone may eventually cause a relapse. If alternative monoclonally appearing rearrangements are not available, oligoclonal or subclonal appearing IGH gene rearrangements should be checked for the presence of a common DH-JH stem. An ASO primer designed in the common DH-JH stem will enable the simultaneous monitoring of multiple subclones containing this common stem. By this approach, one can avoid false-negative MRD results that might occur owing to ongoing clonal evolution if VH-DH specific primers were used.

Sequencing errors resulting in an incorrect sequence of the junctional region. In three cases, the sequence of the junctional region appeared not to be correct. These sequences were not obtained in the MRD-PCR laboratory itself, but were outsourced via a company that performed the sequencing reaction as well as the sequence interpretation. It is therefore of importance to re-check commercially obtained sequences by evaluating the original sequencing file, which is now routinely performed.

To obtain a reliable junctional region, it was agreed that each clonal PCR product should preferably be sequenced from both directions. In case of doubt, a second (independent) clonal PCR product should be sequenced.

Sequence interpretation errors. In 16 out of 40 rearrangements, the interpretation of the sequences obtained from the detected clonal Ig/TCR gene rearrangements differed between the four laboratories. Such misinterpretation may result in non-optimal design of ASO primers. For appropriate analysis of junctional regions, it was therefore agreed to use databases available on the worldwide web, such as IMGT (http://imgt.cines.fr), V-BASE/DNAPLOT (http://vbase.mrc-cpe.cam.ac.uk), Blast (www.ncbi.nlm.nih.gov/BLAST/), or IgBlast (www.ncbi.nlm.nih.gov/igblast/).

Incorrect interpretation of the junctional region may also be owing to alignment of too short sequences, resulting in inappropriate recognition of the involved gene segment. Finally, it was agreed that at least one-third of a germline D-segment sequence, with a minimum of five nucleotides, should be present for assigning D-segments in the junctional region sequence.
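The agreed D-segment assignment rule (at least one-third of the germline D-segment, with a minimum of five nucleotides) can be written as a simple check. This is a minimal sketch: the function names are hypothetical, the germline lengths in the usage lines are illustrative values only, and "one-third" is assumed to mean that the matched stretch must be at least the germline length divided by three, rounded up.

```python
MIN_D_NUCLEOTIDES = 5  # absolute minimum stated in the agreed rule


def min_d_match_length(germline_d_length: int) -> int:
    """Smallest matched length that satisfies both agreed conditions."""
    one_third_rounded_up = (germline_d_length + 2) // 3  # integer ceiling of n / 3
    return max(MIN_D_NUCLEOTIDES, one_third_rounded_up)


def can_assign_d_segment(matched_nucleotides: int, germline_d_length: int) -> bool:
    """True if enough germline D nucleotides are present in the junction."""
    return matched_nucleotides >= min_d_match_length(germline_d_length)


# Illustrative germline lengths (assumed values, not taken from the article):
print(min_d_match_length(12))        # the five-nucleotide minimum dominates
print(min_d_match_length(31))        # the one-third criterion dominates
print(can_assign_d_segment(10, 31))  # too short for a 31-nt germline segment
```

For short D-segments the five-nucleotide floor is the binding condition, whereas for long segments the one-third criterion takes over.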

Table 1 Summary of the target identification results
Columns: Sample; Immunophenotype; Target 1 to Target 8; Overall (%). Samples: TF1, TF2, TF5, TF6, TF9, TF12, TF15 and TF16.
Notes: Identified Ig/TCR gene rearrangements. The number in parentheses refers to the number of laboratories that identified the involved rearrangement. Locus appeared to be oligoclonal based on Southern blot analysis.
Abbreviations: ALL, acute lymphoblastic leukemia; Ig/TCR, immunoglobulin/T-cell receptor.

Evaluation of step 2: sensitivity testing

First, diagnostic samples from 10 ALL cases as well as the sequence of one or two Ig/TCR gene rearrangements were provided. ASO primers were designed by the different laboratories and evaluated for their sensitivity (Table 2). Two potential variables were identified: the ASO primer design and the interpretation of the sensitivity of the RQ-PCR assay (Figure 1).

Table 2 Overview of MRD analysis in unknown samples

Sample | Target | Sensitivityᵃ | MRD-1ᵇ | MRD-2ᵇ | MRD-3ᵇ | MRD-4ᵇ
TF3ᶜ | VκIII-Kde | 10⁻⁵ (10⁻⁴–10⁻⁵) | 2 × 10⁻³ (10⁻²–10⁻³) | 10⁻⁵ (neg–10⁻⁵) | Neg (neg–neg) |
TF4ᶜ | Vγ10–Jγ2.3 | 10⁻⁴ (10⁻³–10⁻⁵) | 10⁻¹ (≥10⁻²–1) | 10⁻¹ (≥10⁻²–1) | |
TF7ᶜ | Intron-Kde | 10⁻⁴ (10⁻³–10⁻⁵) | 3 × 10⁻³ (10⁻²–10⁻³) | 10⁻⁴ (neg–<10⁻³) | |
TF8ᶜ | Vδ2–Dδ3 | 10⁻⁵ (10⁻⁴–10⁻⁵) | <10⁻⁴ (<2 × 10⁻⁵–10⁻³) | Neg (neg–neg) | |
TF10 | Vδ5–Jδ1 | 10⁻⁴ (10⁻⁴–10⁻⁴) | 2 × 10⁻¹ (>10⁻¹–1) | Neg (neg–neg) | |
TF11 | Vγ4–Jγ2.3 | 10⁻⁴ (10⁻⁴–10⁻⁵) | 10⁻¹ (10⁻¹–10⁻¹) | 10⁻⁴ (<10⁻⁴–10⁻⁴) | |
 | VκIII–Kde | 10⁻⁴ (10⁻⁴–10⁻⁴) | 10⁻¹ (10⁻¹–10⁻¹) | 10⁻⁴ (<10⁻⁴–3 × 10⁻⁴) | |
ᵈ ᵈ ᵈ ᵈ ᵈ
TF13 | VκII–Kde | 10⁻⁴ (10⁻⁴–10⁻⁵) | 3 × 10⁻⁴ (10⁻³–10⁻⁴) | <10⁻⁴ (neg–10⁻⁴) | |
TF14 | Vγ3–Jγ1.1 | 10⁻⁴ (10⁻²–10⁻⁴) | Neg (neg–3 × 10⁻⁴) | Neg (neg–10⁻⁴) | |
TF17 | VκII–Kde | 10⁻³ (10⁻³–10⁻⁵) | 10⁻³ (7 × 10⁻⁴–10⁻³) | 10⁻⁴ (5 × 10⁻⁵–2 × 10⁻⁴) | |
 | VH3–JH4 | 10⁻⁴ (10⁻⁴–10⁻⁴) | 10⁻³ (9 × 10⁻⁴–10⁻³) | 10⁻⁴ (9 × 10⁻⁵–3 × 10⁻⁴) | |
TF18 | Vδ1–Jδ1 | 10⁻⁴ (10⁻⁴–10⁻⁴) | 7 × 10⁻² (6 × 10⁻³–10⁻¹) | Neg (neg–<10⁻⁴) | Neg (neg–neg) | Neg (neg–neg)
 | Vγ11–Jγ2.3 | 5 × 10⁻⁵ (10⁻⁴–10⁻⁵) | 6 × 10⁻² (2 × 10⁻²–10⁻¹) | Neg (neg–<10⁻⁴) | Neg (neg–neg) | Neg (neg–neg)

Abbreviations: MRD, minimal residual disease; RQ-PCR, real-time quantitative polymerase chain reaction.
ᵃ Sensitivity of the MRD analysis. Data are presented as median (range).
ᵇ MRD level of follow-up samples. Data are presented as median (range). Neg: negative.
ᶜ Two laboratories applied RQ-PCR and the dot-blot method in parallel (analysis of TF3, TF4, TF7, and TF8). Both methods reached comparable sensitivities and resulted in comparable MRD results.
ᵈ Setup of a good RQ-PCR was not successful in one laboratory and consequently MRD levels could not be analyzed in the two follow-up samples.

ASO primer design. The specificity and characteristics of the ASO primer will affect the sensitivity. However, there was no straightforward relation between the designed ASO primers and the obtained sensitivities. Furthermore, in TF14 (Vγ3–Jγ1.1), the ASO primer designed by three laboratories was identical, but sensitivities obtained were not (lower sensitivity in one laboratory: 10⁻² versus 10⁻⁴). This likely was due to the use of different control DNA samples, resulting in variable levels of background amplification (non-specific amplification observed in control DNA).

RQ-PCR data interpretation. During the discussion of the RQ-PCR results, it was clear that the interpretation of sensitivity varied between laboratories. Therefore a set of guidelines for RQ-PCR interpretation was drafted, focusing on definitions for reproducibility and reproducible sensitivity, definition of maximal sensitivity, definition of background and criteria for acceptable standard curves. Considerations for the design of these guidelines have been published previously.11 For evaluation of the guidelines for interpretation of RQ-PCR sensitivity, RQ-PCR data files from 20 ALL patients were subsequently analyzed by all four laboratories and discordant results were obtained in 14 out of 20 cases. After discussion and modification of the guidelines for RQ-PCR sensitivity interpretation, re-interpretation of the 20 cases still showed a different interpretation in eight cases. Further evaluation of these guidelines was performed in the 'reproducibility experiments' (see below).

Evaluation of step 3: MRD analysis of follow-up samples

To identify potential pitfalls and problems in MRD detection in follow-up samples, initially samples from 10 ALL cases were exchanged. On the basis of the obtained results (Table 2), two major factors affecting the reported MRD results were recognized: 1, the obtained sensitivity of the assay; and 2, the interpretation of obtained RQ-PCR MRD data (Figure 1).

Sensitivity of the MRD analysis. Logically, the level of MRD that can be detected in follow-up samples is dependent on the sensitivity of the applied method. Therefore, in case of low sensitivity, a sample may be considered negative, whereas it can be found positive in a more sensitive experiment (e.g., the second follow-up sample of TF3; Table 2). It should be noted that MRD-based risk group stratification in the AIEOP/BFM ALL-2000 protocols requires the availability of two targets with a sensitivity of at least 10⁻⁴.

Interpretation of RQ-PCR MRD data. The interpretation of RQ-PCR MRD results varied between laboratories, in particular for low to negative MRD results. Guidelines for RQ-PCR MRD data interpretation were therefore drafted, focusing on criteria for MRD positivity, MRD negativity and criteria for the calculation of MRD levels. Considerations for the design of these guidelines have been published previously.11 To evaluate and optimize the guidelines for RQ-PCR MRD data interpretation, RQ-PCR data from 20 ALL patients were interpreted by all laboratories. Discordant MRD results were obtained in 12 out of 50 follow-up samples that were analyzed; this would have led to discrepant MRD-based risk group stratification in two out of the 20 cases. Re-interpretation of these data applying optimized guidelines still resulted in discordant MRD levels in six follow-up samples, all with very low MRD levels (<10⁻⁴). Yet, after re-interpretation of the data no differences were observed in MRD-based risk group stratification (based on the maximal MRD level of two MRD-PCR targets analyzed at two time points). Further evaluation of the guidelines was performed in the 'reproducibility experiments' (see below).

Further evaluation of Step 3: reproducibility of RQ-PCR experiments

The implementation of RQ-PCR-based MRD diagnostics during the course of the AIEOP/BFM ALL-2000 protocol and the observed variation in interpretation of RQ-PCR data (see above) necessitated the evaluation of the experimental reproducibility of the RQ-PCR methods as well as the RQ-PCR data interpretation in more detail. For evaluation of the experimental reproducibility, all five laboratories repeated the RQ-PCR MRD analysis of several ALL patients that were previously analyzed within the same laboratory. The results of these repeated MRD assays were interpreted by both the executing laboratory and a second laboratory.
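Two notions from step 3, the requirement of two targets with a sensitivity of at least 10⁻⁴ and the maximal MRD level across targets, can be illustrated as follows. This is a simplified sketch under assumptions: the `TargetResult` type, the field names and the `None` encoding for a negative result are hypothetical, and the actual guidelines for RQ-PCR data interpretation contain further criteria that are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional

# "Sensitivity of at least 10^-4": a numerically smaller value is more sensitive.
REQUIRED_SENSITIVITY = 1e-4


@dataclass
class TargetResult:
    """Result for one MRD-PCR target in one follow-up sample (hypothetical type)."""
    name: str
    sensitivity: float      # e.g. 1e-4
    mrd: Optional[float]    # None encodes a negative result


def has_two_sensitive_targets(targets: List[TargetResult]) -> bool:
    """True if at least two targets reach the required sensitivity."""
    sensitive = [t for t in targets if t.sensitivity <= REQUIRED_SENSITIVITY]
    return len(sensitive) >= 2


def maximal_mrd(targets: List[TargetResult]) -> Optional[float]:
    """Highest MRD value over all targets, or None if every target is negative."""
    positives = [t.mrd for t in targets if t.mrd is not None]
    return max(positives) if positives else None


# Illustrative values only, not patient data from the study:
sample = [TargetResult("target 1", 1e-4, 3e-4), TargetResult("target 2", 1e-5, None)]
print(has_two_sensitive_targets(sample), maximal_mrd(sample))
```

Taking the maximum over targets reflects the text's use of the "maximal MRD level" (highest MRD value for all targets analyzed per follow-up sample) as the basis for stratification.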

Experimental reproducibility of RQ-PCR. Repetition of RQ-PCR assays resulted in comparable reproducible sensitivities in 80 out of 136 cases (59%). However, in 22% of cases the repeated experiment showed a lower reproducible sensitivity, whereas an improved reproducible sensitivity was obtained in 18% of cases.

As shown in Figure 2, comparison of MRD levels of individual targets and maximal MRD levels (highest MRD value for all targets analyzed per follow-up sample) showed concordant results in the majority of samples (<3-fold difference between MRD level in initial and repeated experiment, see legend Figure 2). In the single PCR approach with hydrolysis (TaqMan) probes, discordant MRD results were obtained in 52 out of 198 samples (26%) and maximal MRD levels were discordant in 27 out of 104 samples (26%). In the nested PCR with SYBR Green I detection, discordant MRD results were obtained in 25 out of 72 samples (35%) and maximal MRD levels were discordant in 18 out of 46 samples (39%). It should be emphasized that the main difference between the 'single step PCR hydrolysis (TaqMan) probe' approach and the 'nested PCR SYBR Green I' approach is not related to the use of an ABI RQ-PCR machine or the Light Cycler, but is related to the single PCR versus nested PCR approach. Discordant results were mainly observed in samples with low MRD levels (<10⁻⁴). First, some samples were considered to be positive (but not quantifiable) in one experiment but negative in the other experiment. These differences can be explained by the fact that low MRD levels are often detected below the reproducible range of the RQ-PCR assay; by definition repetition of experiments with MRD results below the reproducible range may give different results. It should be noted that within protocols aimed at therapy reduction for MRD-based low-risk patients (such as the AIEOP/BFM ALL-2000 protocol), prevention of false-negative MRD data may result in some non-specific (background) amplification being interpreted as a very low positive MRD level. A second type of discordant results was observed in samples that were considered positive (but not quantifiable) in one experiment, whereas they could be quantified in the other experiment. These differences reflect variation in the reproducible sensitivity of the two experiments, which may result in identical MRD levels being quantified in the experiment with the highest reproducible sensitivity only (and being considered positive, not quantifiable in the other).

In order to prevent potential differences in MRD results, all five laboratories now use the single PCR approach employing hydrolysis probes (no nested PCR with SYBR Green I detection anymore).

MRD-based risk group stratification according to the AIEOP/BFM ALL-2000 protocol was identical between the initial and repeated experiment in 73% of cases (47 out of 64 patients in which both the day 33 and 3 months sample could be repeated with both targets). Discordant results concerned: HR→IR (2; 3%), IR→HR (2; 3%), LR→IR (2; 3%), IR→LR (4; 6%); all four patients who shifted between HR and IR groups had MRD levels just around the cutoff value. In the initial experiment, MRD-based risk group stratification could not be made in two patients (3%); both cases were MR in the repeated experiment. Five patients (8%), initially classified as LR (1) or IR (4), could not be stratified on the basis of the repeated experiment, owing to insufficient reproducible sensitivities in combination with negative MRD results. A comparable variation in MRD results has recently been reported for paired bone marrow samples.21

Figure 2 Experimental reproducibility of RQ-PCR MRD analysis. (a and b) Experiments performed using a single PCR and hydrolysis probes. (c and d) Experiments performed using a nested PCR approach and SYBR Green I dye. RQ-PCR data of the initial experiment (x-axis) are compared with RQ-PCR data from a second experiment (y-axis). MRD data are shown for each individual PCR target (a) and (c) as well as the maximal MRD level as assessed with two independent PCR targets for each analyzed follow-up sample (b) and (d). The lines in the quantitative part of the assay indicate the x = y, x = 3y and x = 0.33y axes.

Reproducibility of RQ-PCR MRD data interpretation. The RQ-PCR data of the repeated experiments were sent to a second laboratory for re-interpretation of the data using the guidelines for RQ-PCR data interpretation. In 50% of cases, the reported reproducible sensitivity was identical between the two laboratories. As shown in Figure 3a and c, the re-interpretation by the second laboratory resulted in comparable MRD levels in 72% (single PCR using hydrolysis probe) and 77% (nested PCR using SYBR Green) of cases, but some clear discrepancies were observed as well. Particularly, MRD levels were quantified by two laboratories, whereas they were considered 'positive, below reproducible sensitivity' in the other laboratories.

Re-evaluation of MRD-based risk group stratification gave concordant results in 81% of cases (56 out of 69 evaluated patients). Discordant results concerned: HR→IR (1; 1%), IR→HR (1; 1%), LR→IR (1; 1%), unclassifiable→IR (3; 4%), IR→unclassifiable (4; 6%) and LR→unclassifiable (3; 4%). The discrepancies in considering a patient unclassifiable according to MRD results were mainly owing to a different interpretation of the reproducible sensitivity of the MRD-PCR targets.

On the basis of these results, the guidelines for RQ-PCR data interpretation were re-evaluated and adapted, and discordant cases were re-interpreted by the laboratories again. This resulted in identical reproducible sensitivities in 68% of cases. The second round interpretation significantly improved the individual results for MRD levels, but some discrepancies remained (Figure 3b and d). Furthermore, concordance in MRD-based risk group stratification between the two laboratories was increased to 86% of cases. One patient was HR versus IR, and seven patients were considered as not appropriate for MRD-based stratification (based on the lack of two sensitive targets) by one laboratory but were stratified by the other laboratory. All cases with discordant results were subsequently discussed within the MRD Task Force and consensus on the interpretation was reached in all cases.
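The three-fold criterion used to call two quantitative MRD results concordant (the x = 3y and x = 0.33y lines in Figures 2 and 3) can be sketched as below. The function names are hypothetical, and the sketch assumes that both results are quantifiable positive values; results that are negative or positive but not quantifiable would need separate handling that is not shown.

```python
from typing import List, Tuple


def within_three_fold(initial: float, repeated: float) -> bool:
    """True if two quantifiable MRD levels differ by less than three-fold."""
    ratio = max(initial, repeated) / min(initial, repeated)
    return ratio < 3.0


def concordance_rate(pairs: List[Tuple[float, float]]) -> float:
    """Fraction of (initial, repeated) pairs that fall within three-fold."""
    concordant = sum(1 for first, second in pairs if within_three_fold(first, second))
    return concordant / len(pairs)


# Illustrative pairs only, not data from the study:
pairs = [(1e-3, 2e-3), (1e-3, 5e-3)]
print(within_three_fold(1e-3, 2e-3))  # two-fold apart
print(within_three_fold(1e-3, 5e-3))  # five-fold apart
print(concordance_rate(pairs))
```

Using the ratio of the larger to the smaller value makes the check symmetric, so it does not matter which experiment is treated as the initial one.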

Figure 3 Reproducibility of interpretation of RQ-PCR MRD data. (a and b) Experiments performed using a single PCR and hydrolysis probes. (c and d) Experiments performed using a nested PCR approach and SYBR Green I dye. RQ-PCR data were interpreted in both the executing laboratory (x-axis) as well as by a second laboratory (y-axis). In (a and c) the results of the first round analysis are shown, whereas (b) and (d) show the results after the second round, in which the adapted guidelines were used for data interpretation. The lines in the quantitative part of the assay indicate the x = y, x = 3y and x = 0.33y axes.

Further evaluation of Step 3: dot-blot hybridization versus RQ-PCR

During the course of the AIEOP-BFM ALL-2000 protocol, the dot-blot hybridization method was replaced by RQ-PCR analysis. Therefore, we compared MRD data obtained by both methods. To this end, 46 patients previously analyzed by dot-blot technology were re-analyzed using RQ-PCR (62 targets, 109 samples); particularly patients with detectable MRD levels were selected for this purpose. As shown in Figure 4, the data showed a good correlation between the two methods (y = 0.9993x + 0.1778; R² = 0.7381). It should be noted that quantification of very high 'MRD' levels (≥10⁻²) by the dot-blot method is not accurate and consequently all MRD levels higher than 10⁻² were reported as ≥10⁻². Of importance, in only two patients (4%) MRD-based risk group stratification would have been different between the two applied MRD methods.

Figure 4 Comparison between MRD results obtained by dot-blot hybridization (x-axis) and MRD data obtained by RQ-PCR analysis (y-axis). It should be noted that quantification of very high 'MRD' levels (≥10⁻²) by the dot-blot method is not accurate and consequently all MRD levels higher than 10⁻² were reported as ≥10⁻².

Conclusion

Within clinical treatment protocols, it is essential to obtain comparable MRD results in the involved laboratories. However, our data show that Ig/TCR-based MRD diagnostics is complex and that results may differ between laboratories. In our study, run in parallel to the MRD-based AIEOP/BFM ALL-2000 protocol, we identified several pitfalls in MRD analysis and made agreements on how to circumvent potential problems and to achieve and maintain uniform MRD data. Two topics appeared to be of utmost importance: standardization of experimental approaches and strict guidelines for interpretation of RQ-PCR data.

The standardization of the experimental approaches needs to address all steps of MRD analysis, including isolation of DNA, detection and identification of Ig/TCR gene rearrangements, interpretation of Ig/TCR sequences, ASO primer design, RQ-PCR technique and analysis of RQ-PCR data (Figure 1). Although it may not be necessary to replicate every step in exact detail, a certain level of standardization is required in order to achieve uniformity in MRD results, thereby ensuring the comparability of patient risk groups in different clinical trials. Given the complexity of the MRD-PCR procedure, it is advised to limit the number of MRD-PCR laboratories per treatment protocol. These laboratories need to have a detailed knowledge of the structure and composition of Ig/TCR genes and thorough experience in analyzing the rearrangement patterns. Furthermore, in order to achieve and maintain a minimal level of experience, the number of laboratories should preferably be limited to one laboratory per 10–14 million inhabitants (or one laboratory for smaller countries).

Guidelines for interpretation of RQ-PCR data are a prerequisite for clinical MRD studies and make it possible to compare results of different treatment protocols. The implementation of guidelines within this group was greatly facilitated by regular meetings with open discussion of non-blinded results. The guidelines for interpretation of RQ-PCR data as developed by the I-BFM-SG MRD task force are currently evaluated within the European Study Group on MRD detection in ALL (ESG-MRD-ALL), a consortium of 32 laboratories involved in MRD analysis of ALL patients. Within the ESG-MRD-ALL, the guidelines are further being optimized, in particular with respect to readability and practical applicability (van der Velden et al. Leukemia, in press).

The overall aims of the MRD Task Force were to identify and solve pitfalls in achieving consistent results in multicenter MRD studies. Indeed, the concordance in percentage of patients that could be MRD-stratified and the relative distribution of patients over the three MRD-based risk groups increased over time between the MRD laboratories of the AIEOP/BFM ALL2000 protocol (data not shown). In addition, the in-depth discussions of all results also contributed to achieving a higher level of efficiency in the MRD-PCR laboratories, because experimental procedures and approaches were attuned and optimized. Furthermore, the experience obtained in the distribution of samples and in the analysis and reporting of the results was highly valuable for the set-up of a quality control program. Such a program, consisting of two quality control rounds per year that focus on all laboratory aspects of Ig/TCR-based MRD analysis, is currently being organized by the ESG-MRD-ALL. This quality control program is required for the implementation of RQ-PCR based MRD diagnostics in clinical protocols.

Acknowledgements

We are grateful to Dr Martin Zimmermann (Hannover, Germany) for advice on statistical issues and to Marieke Comans-Bitter for preparing the figures. We acknowledge the Kind-Phillip Stiftung, BMBF, Deutsche Krebshilfe, St Anna Kinderkrebsforschung, Fondazione Tettamanti, Fondazione Cariplo, Fondazione Città Della Speranza, Associazione Italiana per la Ricerca sul Cancro (AIRC), MIUR PRIN 2005 no. 2005069388_001, NH & MRC and Cancer Council (Australia) for financial support.

References

1 Cave H, van der Werff ten Bosch J, Suciu S, Guidal C, Waterkeyn C, Otten J et al. Clinical significance of minimal residual disease in childhood acute lymphoblastic leukemia. European Organization for Research and Treatment of Cancer–Childhood Leukemia Cooperative Group. N Engl J Med 1998; 339: 591–598.
2 Coustan-Smith E, Sancho J, Hancock ML, Boyett JM, Behm FG, Raimondi SC et al. Clinical importance of minimal residual disease in childhood acute lymphoblastic leukemia. Blood 2000; 96: 2691–2696.

3 van Dongen JJM, Seriu T, Panzer-Grumayer ER, Biondi A, Pongers-Willemse MJ, Corral L et al. Prognostic value of minimal residual disease in acute lymphoblastic leukaemia in childhood. Lancet 1998; 352: 1731–1738.
4 Panzer-Grumayer ER, Schneider M, Panzer S, Fasching K, Gadner H. Rapid molecular response during early induction chemotherapy predicts a good outcome in childhood acute lymphoblastic leukemia. Blood 2000; 95: 790–794.
5 Knechtli CJ, Goulden NJ, Hancock JP, Grandage VL, Harris EL, Garland RJ et al. Minimal residual disease status before allogeneic bone marrow transplantation is an important determinant of successful outcome for children and adolescents with acute lymphoblastic leukemia. Blood 1998; 92: 4072–4079.
6 van der Velden VHJ, Joosten SA, Willemse MJ, van Wering ER, Lankester AW, van Dongen JJM et al. Real-time quantitative PCR for detection of minimal residual disease before allogeneic stem cell transplantation predicts outcome in children with acute lymphoblastic leukemia. Leukemia 2001; 15: 1485–1487.
7 Szczepanski T, Orfao A, van der Velden VHJ, San Miguel JF, van Dongen JJM. Minimal residual disease in leukaemia patients. Lancet Oncol 2001; 2: 409–417.
8 Marshall GM, Haber M, Kwan E, Zhu L, D, Xue C et al. Importance of minimal residual disease testing during the second year of therapy for children with acute lymphoblastic leukemia. J Clin Oncol 2003; 21: 704–709.
9 Willemse MJ, Seriu T, Hettinger K, d’Aniello E, Hop WC, Panzer-Grumayer ER et al. Detection of minimal residual disease identifies differences in treatment response between T-ALL and precursor B-ALL. Blood 2002; 99: 4386–4393.
10 Szczepanski T, Flohr T, van der Velden VHJ, Bartram CR, van Dongen JJM. Molecular monitoring of residual disease using antigen receptor genes in childhood acute lymphoblastic leukaemia. Best Pract Res Clin Haematol 2002; 15: 37–57.
11 van der Velden VHJ, Hochhaus A, Cazzaniga G, Szczepanski T, Gabert J, van Dongen JJM. Detection of minimal residual disease in hematologic malignancies by real-time quantitative PCR: principles, approaches, and laboratory aspects. Leukemia 2003; 17: 1013–1034.
12 Verhagen OJ, Wijkhuijs AJ, van der Sluijs-Gelling AJ, Szczepanski T, van der Linden-Schrever BE, Pongers-Willemse MJ et al. Suitable DNA isolation method for the detection of minimal residual disease by PCR techniques. Leukemia 1999; 13: 1298–1299.
13 Pongers-Willemse MJ, Seriu T, Stolz F, d’Aniello E, Gameiro P, Pisa P et al. Primers and protocols for standardized detection of minimal residual disease in acute lymphoblastic leukemia using immunoglobulin and T cell receptor gene rearrangements and TAL1 deletions as PCR targets: report of the BIOMED-1 CONCERTED ACTION: investigation of minimal residual disease in acute leukemia. Leukemia 1999; 13: 110–118.
14 Peham M, Panzer S, Fasching K, Haas OA, Fischer S, Marschalek R et al. Low frequency of clonotypic Ig and T-cell receptor gene rearrangements in t(4;11) infant acute lymphoblastic leukaemia and its implication for the detection of minimal residual disease. Br J Haematol 2002; 117: 315–321.
15 Verhagen OJ, Willemse MJ, Breunis WB, Wijkhuijs AJ, Jacobs DC, Joosten SA et al. Application of germline IGH probes in real-time quantitative PCR for the detection of minimal residual disease in acute lymphoblastic leukemia. Leukemia 2000; 14: 1426–1435.
16 van der Velden VHJ, Wijkhuijs JM, Jacobs DC, van Wering ER, van Dongen JJM. T cell receptor gamma gene rearrangements as targets for detection of minimal residual disease in acute lymphoblastic leukemia by real-time quantitative PCR analysis. Leukemia 2002; 16: 1372–1380.
17 van der Velden VHJ, Willemse MJ, van der Schoot CE, Hahlen K, van Wering ER, van Dongen JJM. Immunoglobulin kappa deleting element rearrangements in precursor-B acute lymphoblastic leukemia are stable targets for detection of minimal residual disease by real-time quantitative PCR. Leukemia 2002; 16: 928–936.
18 Nakao M, Janssen JW, Flohr T, Bartram CR. Rapid and reliable quantification of minimal residual disease in acute lymphoblastic leukemia using rearranged immunoglobulin and T-cell receptor loci by LightCycler technology. Cancer Res 2000; 60: 3281–3289.
19 Beishuizen A, Verhoeven MA, Mol EJ, Breit TM, Wolvers-Tettero IL, van Dongen JJ. Detection of immunoglobulin heavy-chain gene rearrangements by Southern blot analysis: recommendations for optimal results. Leukemia 1993; 7: 2045–2053.
20 Breit TM, Wolvers-Tettero IL, Beishuizen A, Verhoeven MA, van Wering ER, van Dongen JJ. Southern blot patterns, frequencies, and junctional diversity of T-cell receptor-delta gene rearrangements in acute lymphoblastic leukemia. Blood 1993; 82: 3063–3074.
21 van der Velden VH, Hoogeveen PG, Pieters R, van Dongen JJ. Impact of two independent bone marrow samples on minimal residual disease monitoring in childhood acute lymphoblastic leukaemia. Br J Haematol 2006; 133: 382–388.
