Published online 10 July 2014
Nucleic Acids Research, 2014, Vol. 42, No. 16 e129
doi: 10.1093/nar/gku607

Impact of next-generation sequencing error on analysis of barcoded plasmid libraries of known complexity and sequence

Claire T. Deakin1, Jeffrey J. Deakin1, Samantha L. Ginn1, Paul Young2, David Humphreys2, Catherine M. Suter2,3, Ian E. Alexander1,4,* and Claus V. Hallwirth1
1Gene Therapy Research Unit, Children’s Medical Research Institute and The Children’s Hospital at Westmead, Westmead, New South Wales 2145, Australia, 2Molecular Genetics Division, Victor Chang Cardiac Research Institute, Sydney, Darlinghurst, New South Wales 2010, Australia, 3Faculty of Medicine, University of New South Wales, Kensington, New South Wales 2052, Australia and 4Discipline of Paediatrics and Child Health, The Children’s Hospital at Westmead Clinical School, The University of Sydney, Westmead, New South Wales 2145, Australia

Received August 14, 2013; Revised June 10, 2014; Accepted June 24, 2014
ABSTRACT

Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes, when the exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, with a 100-barcode library also sequenced using a SOLiD platform. Libraries containing 1 and 10 barcodes were distinguished from false barcodes generated by sequencing error by a several log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved.

INTRODUCTION

Retroviral vectors, such as gammaretroviral and lentiviral vectors, have demonstrated great therapeutic potential, particularly for gene therapy applications targeting the hematopoietic compartment. Therapeutic efficacy following retroviral gene delivery to hematopoietic progenitor cells (HPCs) has been reported following trials of gene therapy for several genetic diseases (1–12), leukemia (13) and attenuation of graft-versus-host disease (14). Analyses of vector integration sites (ISs), which uniquely tag individual gene-marked HPC clones, are yielding important insights into clonal complexity, clonal dynamics and genotoxicity following gene therapy. For example, analysis of samples taken 12–102 months post-transplant from eight patients treated in the groundbreaking French SCID-X1 trial showed that diversity of reconstituted T cells correlated positively with the dose of genetically modified HPCs received by each patient (15). Additionally, the proportion of genetically modified HPCs that contributed to long-term hematopoiesis was estimated to be 1%. In the same and subsequent trials involving other disease indications, IS analysis has also been successfully used to investigate adverse events including leukemia, myelodysplasia and non-malignant clonal expansions (16–19). The underlying mechanism proved to be insertional mutagenesis and is now recognized as an important genotoxic risk associated with gene therapy applications using integrating vector systems. While indispensable for investigating the mechanism underlying the above adverse events, IS analysis has a number of limitations when used to assess clonal dynamics, including early and reliable detection of potentially pathological clonal expansions. These limitations include methodological complexity and, with the most widely used protocols involving use of both restriction endonucleases and extensive rounds
*To whom correspondence should be addressed. Tel: +61 2 9845 3071; Fax: +61 2 9845 1317; Email: [email protected]