Impact of Next-Generation Sequencing Error on Analysis of Barcoded Plasmid Libraries of Known Complexity and Sequence

Nucleic Acids Research, 2014, Vol. 42, No. 16, e129. doi: 10.1093/nar/gku607. Published online 10 July 2014.

Claire T. Deakin¹, Jeffrey J. Deakin¹, Samantha L. Ginn¹, Paul Young², David Humphreys², Catherine M. Suter²,³, Ian E. Alexander¹,⁴,* and Claus V. Hallwirth¹

¹Gene Therapy Research Unit, Children’s Medical Research Institute and The Children’s Hospital at Westmead, Westmead, New South Wales 2145, Australia
²Molecular Genetics Division, Victor Chang Cardiac Research Institute, Sydney, Darlinghurst, New South Wales 2010, Australia
³Faculty of Medicine, University of New South Wales, Kensington, New South Wales 2052, Australia
⁴Discipline of Paediatrics and Child Health, The Children’s Hospital at Westmead Clinical School, The University of Sydney, Westmead, New South Wales 2145, Australia

Received August 14, 2013; Revised June 10, 2014; Accepted June 24, 2014

*To whom correspondence should be addressed. Tel: +61 2 9845 3071; Fax: +61 2 9845 1317; Email: [email protected]

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes when the exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, with a 100-barcode library also sequenced using a SOLiD platform. In libraries containing 1 and 10 barcodes, the expected barcodes were distinguished from false barcodes generated by sequencing error by a several-log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs, multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved.

INTRODUCTION

Retroviral vectors, such as gammaretroviral and lentiviral vectors, have demonstrated great therapeutic potential, particularly for gene therapy applications targeting the hematopoietic compartment. Therapeutic efficacy following retroviral gene delivery to hematopoietic progenitor cells (HPCs) has been reported in trials of gene therapy for several genetic diseases (1–12), leukemia (13) and attenuation of graft-versus-host disease (14). Analyses of vector integration sites (ISs), which uniquely tag individual gene-marked HPC clones, are yielding important insights into clonal complexity, clonal dynamics and genotoxicity following gene therapy. For example, analysis of samples taken 12–102 months post-transplant from eight patients treated in the groundbreaking French SCID-X1 trial showed that the diversity of reconstituted T cells correlated positively with the dose of genetically modified HPCs received by each patient (15). Additionally, the proportion of genetically modified HPCs that contributed to long-term hematopoiesis was estimated to be 1%. In the same and subsequent trials involving other disease indications, IS analysis has also been used successfully to investigate adverse events including leukemia, myelodysplasia and non-malignant clonal expansions (16–19). The underlying mechanism proved to be insertional mutagenesis, which is now recognized as an important genotoxic risk associated with gene therapy applications using integrating vector systems.

While indispensable for investigating the mechanism underlying the above adverse events, IS analysis has a number of limitations when used to assess clonal dynamics, including for the early and reliable detection of potentially pathological clonal expansions. These limitations include methodological complexity and, with the most widely used protocols, which involve both restriction endonucleases and extensive rounds of polymerase chain reaction (PCR), the risk of detection biases that can reduce sensitivity and even preclude detection of certain clones (20). Despite efforts to address these limitations (20–24), there remains considerable impetus for the development of alternative methods with improved sensitivity and greater quantitative potential.

Barcoded vectors, containing random nucleotide (nt) sequences at defined positions, are a conceptually attractive alternative to IS analysis. Individual HPCs would be uniquely tagged provided the barcoded vector stock has sufficiently high complexity. Such an approach could offer methodological simplicity and, if minimal PCR cycles are used to amplify the barcode from genomic DNA, more reliable quantitation of clonal contributions. Given that doses in excess of 10⁶ transduced HPCs per kg of body weight have been used in hematopoietic gene therapy trials (2,4,6–10), an ideal barcode library may need to contain up to 10⁸ to 10¹⁰ different barcodes to ensure that HPC clones are uniquely tagged. Analyzing the diversity of such a highly complex barcode library would require the ability to accurately identify large numbers of unique barcode variants of unknown sequence, each individually present at low frequency.

The capacity of next-generation sequencing (NGS) to analyze tens to hundreds of millions of short sequence reads raises the possibility of identifying, and possibly quantifying, very large numbers of barcode variants recovered from genomic DNA extracted from clinical samples. The suitability of existing NGS technologies for this extremely demanding application is yet to be resolved. Current NGS technologies have higher error rates than traditional Sanger sequencing (25,26), and each of the platforms has a different error profile (27,28). Although several analyses of barcodes amplified from integrated retroviral vectors have been reported (29–36), it is at present unknown to what extent sequencing error might affect the analysis of complex barcoded libraries, and whether there is a limit to the degree of complexity that can be reliably resolved using contemporary NGS technologies. To address these questions empirically, we amplified barcodes of known sequence identity within mixtures of low to moderate complexity using minimal PCR cycles, and

elongation factor 1α (EF1α) promoter-enhancer fragment (Figure 1A). Oligonucleotides were synthesized to contain random nucleotides at defined positions and adaptor sequences for either the Illumina or the SOLiD platform (Supplementary Table S1). Annealing of either primer 5′-[phos]GGCACCCGTGCAC for the Illumina-compatible barcode or primer 5′-[phos]GCTGCTGTACGGCCAAGGCG for the SOLiD-compatible barcode produced an NsiI-compatible end at one end of the barcode insert. The complementary strands of both barcode inserts were synthesized using the 5′ → 3′ exo− Klenow Fragment (New England Biolabs), and an NsiI-compatible end was generated at the other end of the barcode insert by cleavage with PstI (New England Biolabs). After ligation of the insert with NsiI-linearized pEF1α.γc, the NsiI site was not reconstituted, which enabled digestion of the ligation product with NsiI to eliminate vector molecules that re-ligated without the barcode insert. Electrocompetent SURE cells (Agilent Technologies) were transformed with the ligation products to produce highly complex Illumina-compatible and SOLiD-compatible barcoded plasmid libraries, with complexities of ∼15 million and 1.8 million, respectively.

Production of defined barcode libraries

From the Illumina-compatible and SOLiD-compatible plasmid libraries, 119 and 100 individual plasmids, respectively, each containing a unique barcode, were isolated, quantified using a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific), and Sanger-sequenced using an AB 3730xl instrument (Australian Genome Research Facility). For all isolated plasmids, concentrations ranged from 36.3 to 235.7 ng/μl. Barcode libraries of defined complexity comprising known sequence identities were produced by mixing the plasmids containing these sequenced barcodes in equimolar proportions. For the Illumina-compatible barcode, plasmids containing unique and defined barcode sequences were mixed to provide libraries containing 10 known and 100 known barcode sequences,
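The abundance-based separation reported in the Abstract for the 1- and 10-barcode libraries can be illustrated with a simple rule: rank barcodes by read count and call as genuine every barcode above the largest log10 drop between consecutive ranks. This is a minimal sketch for illustration only, not the filtering pipeline used in this study; the function name `call_by_largest_gap` and the toy read counts are hypothetical.

```python
import math
from collections import Counter


def call_by_largest_gap(counts: Counter) -> list[str]:
    """Return the barcodes ranked above the largest log10 drop in read count.

    Hypothetical, illustrative heuristic: it assumes every genuine barcode is
    more abundant than every error-derived barcode, with a clear gap between.
    """
    # Highest count first; drop non-positive entries, which have no logarithm.
    ranked = [(bc, n) for bc, n in counts.most_common() if n > 0]
    if len(ranked) < 2:
        return [bc for bc, _ in ranked]
    best_i, best_drop = 0, -math.inf
    for i in range(len(ranked) - 1):
        # Fold change between neighbouring ranks, on a log10 scale.
        drop = math.log10(ranked[i][1]) - math.log10(ranked[i + 1][1])
        if drop > best_drop:
            best_i, best_drop = i, drop
    return [bc for bc, _ in ranked[: best_i + 1]]


# Toy read counts: three abundant barcodes and three rare error variants.
reads = Counter({"AAAA": 50000, "CCCC": 48000, "GGGG": 52000,
                 "AAAT": 12, "CCCA": 7, "TTTT": 3})
print(call_by_largest_gap(reads))  # ['GGGG', 'AAAA', 'CCCC']
```

The rule works only when such a gap exists. When expected and false barcodes overlap in abundance, as in the 100-barcode libraries, or when a false barcode outnumbers a known one, the largest drop no longer marks the boundary between them.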
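Clustering strategies of the kind mentioned in the Abstract typically fold rare sequence variants into a nearby, more abundant barcode. The sketch below uses the "directional" adjacency rule popularized by the UMI-tools package for UMI deduplication: barcode b is merged into barcode a when they differ at exactly one position and count(a) ≥ 2·count(b) − 1. It is a swapped-in, well-known technique shown for illustration, not necessarily the clustering applied in this study, and the function names and counts are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def directional_clusters(counts: Counter) -> dict[str, int]:
    """Collapse barcodes one mismatch away from a sufficiently abundant barcode.

    Edge a -> b if hamming(a, b) == 1 and count(a) >= 2 * count(b) - 1.
    Returns merged read counts keyed by each surviving (parent) barcode.
    """
    ranked = [bc for bc, _ in counts.most_common()]  # most abundant first
    assigned: set[str] = set()
    merged: dict[str, int] = {}
    for parent in ranked:
        if parent in assigned:
            continue
        assigned.add(parent)
        total = counts[parent]
        stack = [parent]
        # Walk directed edges outward from the parent, absorbing children.
        while stack:
            node = stack.pop()
            for other in ranked:
                if other in assigned:
                    continue
                if hamming(node, other) == 1 and counts[node] >= 2 * counts[other] - 1:
                    assigned.add(other)
                    total += counts[other]
                    stack.append(other)
        merged[parent] = total
    return merged


reads = Counter({"ACGTAC": 1000, "ACGTAA": 20, "ACGTTA": 3,
                 "TTTTTT": 900, "TTTTTA": 800})
print(directional_clusters(reads))
# {'ACGTAC': 1023, 'TTTTTT': 900, 'TTTTTA': 800}
```

Two genuine barcodes one mismatch apart with similar counts (TTTTTT and TTTTTA here) stay separate, while the rare neighbours of ACGTAC are absorbed. A false barcode that is more abundant than the barcode it derives from would itself become a cluster parent, which illustrates why rules of this type cannot correct systematic error of the kind described in the Abstract.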
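Whether a barcoded vector stock is complex enough to tag every transduced HPC uniquely is a birthday-problem calculation. A minimal sketch, assuming each cell receives exactly one barcode drawn independently and uniformly from N equally abundant variants (both simplifications); the 70 kg body weight is a hypothetical value used only to turn the per-kg dose quoted in the Introduction into a cell count, and the function names are hypothetical.

```python
import math


def fraction_uniquely_tagged(n_cells: int, library_complexity: int) -> float:
    """Probability that a given cell's barcode is carried by no other cell.

    Assumes uniform, independent draws from `library_complexity` variants.
    """
    # (1 - 1/N)^(n - 1), computed via log1p to stay accurate when N is huge.
    return math.exp((n_cells - 1) * math.log1p(-1.0 / library_complexity))


def expected_colliding_pairs(n_cells: int, library_complexity: int) -> float:
    """Expected number of cell pairs sharing a barcode: C(n, 2) / N."""
    return n_cells * (n_cells - 1) / (2 * library_complexity)


# Hypothetical patient: 10^6 transduced HPCs per kg x 70 kg body weight.
cells = 10**6 * 70
for complexity in (1_800_000, 15_000_000, 10**8, 10**10):
    print(f"N = {complexity:.1e}: "
          f"{fraction_uniquely_tagged(cells, complexity):.4f} uniquely tagged, "
          f"{expected_colliding_pairs(cells, complexity):.3g} shared pairs")
```

Under these assumptions, a library of 1.5 × 10⁷ variants (the complexity of the Illumina-compatible plasmid library) would leave fewer than 1% of 7 × 10⁷ cells uniquely tagged, whereas 10¹⁰ variants would leave more than 99%. Non-uniform barcode representation would lower these fractions.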
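Because the isolated plasmids are all pEF1α.γc derivatives carrying barcode inserts of the same design, they can be assumed to have essentially the same length, so mixing them in equimolar proportions amounts to pooling equal masses. A minimal sketch of that arithmetic under this equal-length assumption; the barcode names, the 100 ng per plasmid and the 20 ng/μl working concentration are hypothetical choices, not values from this study.

```python
def volumes_for_equal_mass(conc_ng_per_ul: dict[str, float],
                           ng_each: float) -> dict[str, float]:
    """Volume (μl) of each stock that delivers `ng_each` ng of plasmid.

    Equal mass equals equal moles only if all plasmids have the same
    length, which is assumed here.
    """
    return {name: ng_each / c for name, c in conc_ng_per_ul.items()}


def normalize_stocks(conc_ng_per_ul: dict[str, float], target_ng_per_ul: float,
                     final_ul: float) -> dict[str, tuple[float, float]]:
    """(stock μl, diluent μl) that bring each stock to a common concentration.

    Equal volumes of the normalized stocks can then be pooled.
    """
    plan = {}
    for name, c in conc_ng_per_ul.items():
        if c < target_ng_per_ul:
            raise ValueError(f"{name}: {c} ng/μl is below the target concentration")
        stock_ul = target_ng_per_ul * final_ul / c
        plan[name] = (stock_ul, final_ul - stock_ul)
    return plan


# Hypothetical stocks at the two ends of the reported 36.3 to 235.7 ng/μl range.
stocks = {"bc_low": 36.3, "bc_high": 235.7}
print(volumes_for_equal_mass(stocks, ng_each=100.0))
print(normalize_stocks(stocks, target_ng_per_ul=20.0, final_ul=20.0))
```

With these hypothetical inputs, 100 ng requires about 2.75 μl of the most dilute stock but only about 0.42 μl of the most concentrated one; normalizing stocks to a common working concentration before pooling equal volumes avoids pipetting such small volumes.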
