Quick viewing(Text Mode)

1 Genomic DNA Transposition Induced by Human PGBD5 Anton G

1 Genomic DNA Transposition Induced by Human PGBD5 Anton G

1 Genomic DNA transposition induced by human PGBD5

2 3 Anton G. Henssena, Elizabeth Henaffb, Eileen Jianga, Amy R. Eisenberga, Julianne R. Carsona, 4 Camila M. Villasantea, Mondira Raya, Eric Stilla, Melissa Burnsc, Jorge Gandarab, Cedric 5 Feschotted, Christopher E. Masonb, Alex Kentsis a,e,f,1 6 7 8 9 a Molecular Pharmacology Program, Sloan Kettering Institute, New York, NY; 10 b Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY; 11 c Boston Children’s Hospital, Boston, MA; 12 d Department of Human , University of Utah School of Medicine, Salt Lake City, UT; 13 e Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY; 14 f Weill Cornell Medical College, , New York, NY. 15 16 17 18 19 1 To whom correspondence may be addressed: Email: [email protected] 20 21 Author contributions: A.H. and A.K. designed the research, A.H., E.H., A.E., E.J., M.B., J.R.C, 22 C.V. performed the research, A.H., E.H., C.F., C.E.M., A.K. analyzed data, A.H. and A.K. wrote 23 the manuscript in consultation with all authors. 24 25 The authors declare no conflict of interest. 26 27 Data deposition: The data reported in this manuscript have been deposited to the Sequence Read 28 Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra/, accession number SRP061649), the Dryad 29 Digital Repository (http://datadryad.org, doi:10.5061/dryad.b2hc1), and the Zenodo computer 30 archive (http://dx.doi.org/10.5281/zenodo.22206). Detailed experimental protocols have been 31 posted at the Nature Protocol Exchange 32 (http://www.nature.com/protocolexchange/labgroups/38333). 33 34 This article contains supporting information. 35

1

36 Abstract

37 Transposons are that are found in nearly all , including

38 humans. Mobilization of DNA transposons by can cause genomic

39 rearrangements, but our knowledge of human derived from is limited. Here,

40 we find that the encoded by human PGBD5, the most evolutionarily conserved

41 -derived in vertebrates, can induce stereotypical cut-and-paste DNA

42 transposition in human cells. Genomic integration activity of PGBD5 requires distinct aspartic

43 acid residues in its transposase domain, and specific DNA sequences containing inverted

44 terminal repeats with similarity to piggyBac transposons. DNA transposition catalyzed by

45 PGBD5 in human cells occurs -wide, with precise transposon excision and preference for

46 at TTAA sites. The apparent conservation of DNA transposition activity by PGBD5

47 suggests that genomic remodeling contributes to its biological function.

2

48 Introduction

49 Transposons are genetic elements that are found in nearly all living organisms (1). They

50 can contribute to the developmental and adaptive regulation of and are a major

51 source of genetic variation that drives genome (2). In humans and other mammals, they

52 comprise about half of the nuclear genome (3). The majority of primate-specific sequences that

53 regulate gene expression are derived from transposons (4), and transposons are a major source of

54 structural genetic variation in human populations (5).

55 While the majority of genes that encode transposase enzymes tend to become

56 catalytically inactive and their transposon substrates tend to become immobile in the course of

57 organismal evolution, some can maintain their transposition activities (6, 7). In humans, at least

58 one hundred L1 long interspersed repeated sequences (LINEs) actively transpose in human

59 and induce structural variation (8), including somatic rearrangements in neurons that

60 may contribute to neuronal plasticity (9). The human -like transposase RAG1 catalyzes

61 somatic recombination of the V(D)J receptor genes in lymphocytes, and is essential for adaptive

62 immunity (10). The Mariner-derived transposase SETMAR functions in single-stranded DNA

63 resection during DNA repair and replication in human cells and can catalyze DNA transposition

64 in vitro (7, 11).

65 Among transposase enzymes that can catalyze excision and insertion of transposon

66 sequences, DNA transposases are distinct in their dependence only on the availability of

67 competent genomic substrates and cellular repair enzymes that ligate and repair excision sites, as

68 compared to which require of the mobilized sequences (12). Most

69 DNA transposases utilize an RNase H-like domain with three aspartate or glutamate residues

70 (so-called DDD or DDE motif) that catalyze magnesium-dependent hydrolysis of phosphodiester

3

71 bonds and strand exchange (13-15). The IS4 transposase family, which includes piggyBac

72 transposases, is additionally distinguished by precise excisions without modifications of the

73 transposon flanking sequences (16). The piggyBac transposase and its transposon were originally

74 identified as an insertion in lepidopteran Trichoplusia ni cells (17). The piggyBac transposon

75 consists of 13-bp inverted terminal repeats (ITR) and 19-bp subterminal inverted repeats located

76 3 and 31 base pairs from the 5’ and 3’ ITRs, respectively (18). PiggyBac transposase can

77 mobilize a variety of ITR-flanked sequences and has a preference for integration at TTAA target

78 sites in the host genome (15, 18-23).

79 Members of the piggyBac superfamily of transposons have colonized a wide range of

80 organisms (24), including a recent and likely ongoing invasion of the bat M. lucifugus (25). The

81 contains 5 paralogous genes derived from piggyBac transposases, PGBD1-5 (24,

82 26). PGBD1 and PGBD2 invaded the common mammalian ancestor, and PGBD3 and PGBD4

83 are restricted to primates, but are all contained as single coding , fused in frame with

84 endogenous host genes, such as the Cockayne syndrome B gene (CSB)-PGBD3 fusion (24, 27).

85 Thus far, only the function of PGBD3 has been investigated. CSB-PGBD3 is capable of binding

86 DNA, including endogenous piggyBac-like transposons in the human genome, but has no known

87 catalytic activity, though biochemical and genetic evidence indicates that it may participate in

88 DNA damage response (28, 29).

89 PGBD5 is distinct from other human piggyBac-derived genes by having been

90 domesticated much earlier in vertebrate evolution approximately 500 million years (My) ago, in

91 the common ancestor of cephalochordates and vertebrates (24, 30). PGBD5 is transcribed as a

92 multi-intronic but non-chimeric transcript predicted to encode a full-length transposase (30).

93 Furthermore, PGBD5 expression in both human and mouse appears largely restricted to the early

4

94 embryo and certain areas of the embryonic and adult brain (24, 30). These intriguing features

95 prompted us to investigate whether human PGBD5 has retained the enzymatic capability of

96 mobilizing DNA.

5

97 Results

98 Human PGBD5 contains a C-terminal RNase H-like domain that has approximately 20%

99 sequence identity and 45% similarity to the active lepidopteran piggyBac, ciliate piggyMac, and

100 mammalian piggyBat transposases (Figure 1) (24, 25, 31). In addition, the human genome

101 contains 2,358 sequence elements with similarity to the piggyBac transposable elements (Table 1

102 and Figure 2A). Specifically, MER75 (MER75, MER75A, MER75B) and MER85 elements

103 show considerably similar inverted terminal repeat sequences as compared to lepidopteran

104 piggyBac transposons (Table 2 and Figure 2B). A total of 328 piggyBac-like elements in the

105 human genome have intact inverted terminal repeats and exhibit duplications of their presumed

106 TTAA target sites (Table 1 and Figure 2C). We reasoned that even though the ancestral

107 transposon substrates of PGBD5 cannot be predicted due to its very ancient evolutionary origin

108 (~500 My), preservation of its transposase activity should confer residual ability to mobilize

109 distantly related piggyBac-like transposons. To test this hypothesis, we used a synthetic

110 transposon reporter PB-EF1-NEO comprised of a neomycin resistance gene flanked by T. ni

111 piggyBac ITRs (Figure 3B) (20, 32). We transiently transfected human embryonic kidney (HEK)

112 293 cells, which lack endogenous PGBD5 expression with the PB-EF1-NEO transposon reporter

113 in the presence of a plasmid expressing PGBD5, and assessed genomic integration of the

114 reporter using clonogenic assays in the presence of G418 to select cells with genomic integration

115 conferring neomycin resistance (Figure 3C, Figure 3 – figure supplement 1). Given the absence

116 of suitable to monitor PGBD5 expression, we expressed PGBD5 as an N-terminal

117 fusion with the green fluorescent protein (GFP). We observed significant rates of neomycin

118 resistance of cells conferred by the transposon reporter with GFP-PGBD5, but not in cells

119 expressing control GFP or mutant GFP-PGBD5 lacking the transposase domain (Figure 3C),

120 despite equal expression of all transgenes (Figure 3 – figure supplement 2). The efficiency of 6

121 neomycin resistance conferred by the transposon reporter with GFP-PGBD5 was approximately

122 4.5-fold less than that of the T. ni piggyBac-derived transposase (Figure 3D), consistent with

123 their evolutionary divergence. These results suggest that human PGBD5 can promote genomic

124 integration of a piggyBac-like transposon.

125 If neomycin resistance conferred by the PGBD5 and the transposon reporter is due to

126 genomic integration and DNA transposition, then this should require specific activity on the

127 transposon ITRs. To test this hypothesis, we generated transposon reporters with mutant ITRs

128 and assayed them for genomic integration (Figure 3B, Figure 3 – figure supplement 3). DNA

129 transposition by the piggyBac family transposases involves hairpin intermediates with a

130 conserved 5’-GGGTTAACCC-3’ sequence that is required for target site phosphodiester

131 hydrolysis (15). Thus, we generated reporter lacking ITRs entirely or containing

132 complete ITRs with 5’-ATATTAACCC-3’ predicted to disrupt the formation of

133 productive hairpin intermediates (15). To enable precise quantitation of mobilization activity, we

134 developed a quantitative genomic PCR assay using primers specific for the transposon reporter

135 and the endogenous human TK1 gene for normalization (Figure 3 – figure supplements 4 and 5).

136 In agreement with the results of the clonogenic neomycin resistance assays, we observed

137 efficient genomic integration of the donor transposons in cells transfected by GFP-PGBD5 as

138 compared to the minimal signal observed in cells expressing GFP control (Figure 3E).

139 of transposon ITRs from the reporter reduced genomic integration to background levels (Figure

140 3E). Consistent with the specific function of piggyBac family ITRs in genomic transposition,

141 of the terminal GGG sequence in the ITR significantly reduced the integration

142 efficiency (Figure 3E). These results indicate that specific transposon ITR sequences are required

143 for PGBD5-mediated DNA transposition.

7

144 DNA transposition by piggyBac superfamily transposases is distinguished from most

145 other DNA transposon superfamilies by the precise excision of the transposon from the donor

146 site and preference for insertion in TTAA sites (20, 32). To determine the structure of the donor

147 sites of transposon reporters mobilized by PGBD5, we isolated plasmid DNA from cells two

148 days after transfection, amplified the transposon reporter using PCR, and determined its

149 sequence using capillary Sanger sequencing (Figure 4 – figure supplement 1). Similar to the

150 hyperactive T. ni piggyBac, cells expressing GFP-PGBD5, but not those expressing GFP control

151 vector, exhibited robust excision of ITR-flanked transposon with apparently precise repair of the

152 donor plasmid (Figure 4A, 4B and Figure 4 – figure supplement 1). These results suggest that

153 PGBD5 is an active cut-and-paste DNA transposase.

154 To validate chromosomal integration and determine the location and precise structure of

155 the insertion of the reporter transposons in the human genome, we isolated genomic DNA from

156 G418-resistant HEK293 cells following transfection with PGBD5 and PB-EF1-NEO, and

157 amplified the genomic sites of transposon insertions using flanking-sequence exponential

158 anchored (FLEA) PCR, a technique originally developed for high-efficiency analysis of

159 retroviral integrations (33). We adapted FLEA-PCR for the analysis of genomic DNA

160 transposition by using unique reporter sequence to prime polymerase extension upstream of the

161 transposon ITR into the flanking human genome, followed by reverse linear extension using

162 degenerate primers, and exponential amplification using specific nested primers to generate

163 chimeric suitable for massively parallel single-molecule Illumina DNA sequencing

164 (Figure 5) (34). This method enabled us to isolate specific portions of the human genome

165 flanking transposon insertions, as evidenced by the reduced yield of amplicons isolated from

166 control cells lacking transposase vectors or expressing GFP (Figure 6 – figure supplement 1). To

8

167 identify the sequences of the transposon genomic insertions at single resolution, we

168 aligned reads obtained from FLEA-PCR Illumina sequencing to the human hg19 reference

169 genome and synthetic transposon reporter, and identified split reads that specifically span both

170 (Figure 5). These data have been deposited to the Sequence Read Archive

171 (http://www.ncbi.nlm.nih.gov/sra/, accession number SRP061649), with the processed and

172 annotated data available from the Dryad Digital Repository (35).

173 To infer the mechanism of genomic integration of transposon reporters, we analyzed the

174 sequences of the insertion loci to determine integration preferences at base pair resolution and

175 identify potential sequence preferences. We found that transposon amplicons isolated from cells

176 expressing GFP-PGBD5, but not those isolated from GFP control cells, were significantly

177 enriched for TTAA sequences, as determined by sequence entropy analysis (36) (Figure 6A). To

178 discriminate between potential DNA transposition at TTAA target sites and alternative

179 mechanisms of chromosomal integration, we classified genomic insertions based on target sites

180 containing TTAA and those containing other sequence motifs (Table 3). Consistent with the

181 DNA transposition activity of PGBD5, we observed significant induction of TTAA-containing

182 insertions in cells expressing GFP-PGBD5 and transposons with intact ITRs, as compared to

183 control cells expressing GFP, or to cells transfected with GFP-PGBD5 and mutant ITR

184 transposons (Table 3). Sequence analysis of split reads containing transposon-human junction at

185 TTAA sites revealed that, in almost every case examined (n = 65 out of 66), joining between

186 TTAA host and transposon DNA occurred precisely at the GGG/CCC terminal motif of the

187 donor transposon ITR (Figure 6C), in agreement with its requirement for efficient DNA

188 transposition (Figure 3E). Consistent with the genome-wide transposition induced by PGBD5,

189 we identified transposition events in all human , including both genic and

9

190 intergenic loci (Figure 6B). Thus, PGBD5 can mediate canonical cut-and-paste DNA

191 transposition of piggyBac transposons in human cells.

192 Requirement for transposon substrates with specific ITRs, their precise excision and

193 preferential insertion into TTAA-containing genomic locations are all consistent with the

194 preservation of PGBD5’s DNA transposase activity in human cells. Like other cut-and-paste

195 transposases, piggyBac superfamily transposases are thought to utilize a triad of aspartate or

196 glutamate residues to catalyze phosphodiester bond hydrolysis, but the catalytic triad of

197 aspartates previously proposed for T. ni piggyBac is apparently not conserved in the primary

198 sequence of PGBD5 (Figure 1) (14, 15, 24, 37). Thus, we hypothesized that distinct aspartic or

199 glutamic acid residues may be required for DNA transposition mediated by PGBD5. To test this

200 hypothesis, we used alanine scanning mutagenesis and assessed transposition activity of GFP-

201 PGBD5 mutants using quantitative genomic PCR (Figure 7 and Figure 7 – figure supplements 1-

202 3). This analysis indicated that simultaneous alanine mutations of D168, D194, and D386

203 reduced apparent transposition activity to background levels, similar to that of GFP control

204 (Figure 7A). We confirmed that the mutant GFP-PGBD5 have equivalent stability and

205 expression as the wild-type protein in cells by immunoblotting against GFP (Figure 7B).

206 Phylogenetic analysis of vertebrate piggyBac homologs from Danio rerio, Python bivittatus,

207 Xenopus tropicalis, Gallus gallus, Mus musculus and Homo sapiens showed that PGBD5

208 proteins are divergent from other piggyBac-like proteins (Figure 8), and include conservation of

209 functionally important D168, D194, and D386 residues that distinguish them from lepidopteran

210 piggyBac and human PGBD1-4 (Figure 8 - figure supplement 1). These results suggest that

211 PGBD5 represents a distinct member of the evolutionarily ancient piggyBac-like family of DNA

212 transposases.

10

213 Discussion

214 Our current findings indicate that human PGBD5 is an active piggyBac transposase that

215 can catalyze DNA transposition in human cells. DNA transposition by PGBD5 requires its C-

216 terminal transposase domain, and depends on specific inverted terminal repeats derived from the

217 lepidopteran piggyBac transposons (Figure 3). DNA transposition involves trans-esterification

218 reactions mediated by DNA hairpin intermediates (15). Consistent with the requirement of intact

219 termini of the piggyBac, Tn10, and Mu transposons (18), elimination or mutation of the terminal

220 GGG from the transposon substrates also abolishes the transposition activity of

221 PGBD5 (Figure 3). PGBD5-induced DNA transposition is precise with preference for insertions

222 at TTAA genomic sites (Figure 4). Since our analysis was limited to ectopically expressed

223 PGBD5 fused to GFP and episomal substrates derived from lepidopteran piggyBac transposons,

224 it is possible that endogenous PGBD5 may exhibit different activities on chromatinized

225 substrates in the human genome.

226 Current structure-function analysis indicates that PGBD5 requires three aspartate residues

227 to mediate DNA transposition (Figure 7), but its transposase domain appears to be distinct from

228 other piggyBac transposase enzymes with respect to its primary sequence (Figure 1 and Figure 8)

229 (14). Thus, the three aspartate residues required for efficient DNA transposition by PGBD5 may

230 form a catalytic triad that functions in phosphodiester bond hydrolysis, similar to the DDD motif

231 in other piggyBac family transposases, or alternatively may contribute to other steps in the

232 transposition reaction, such as synaptic complex formation, hairpin opening, or strand exchange

233 (14, 15, 18). In addition, we find that alanine mutations of the three required aspartate residues in

234 the PGBD5 transposase domain significantly reduce but do not completely eliminate genomic

235 integration of the transposon reporters (Figure 7). This could reflect residual catalytic activity

11

236 despite these mutations, or that PGBD5 expression may affect other mechanisms of DNA

237 integration in human cells.

238 The evolutionary conservation of the transposition activity of PGBD5 suggests that it

239 may have hitherto unknown biologic functions among vertebrate organisms. DNA transposition

240 is a major source of genetic variation that drives genome evolution, with some DNA

241 transposases becoming extinct and others domesticated to evolve exapted functions. The

242 evolution of transposons’ activities can be highly variable, with some organisms such as Z. mays

243 undergoing continuous genome remodeling and recent two-fold expansion through endogenous

244 retrotransposition, and Saccharomyces owing over half of their known spontaneous

245 mutations to transposons, and primate species including humans exhibiting relative extinction of

246 transposons (1).

247 Indeed, transposase-derived genes domesticated in humans have evolved to have

248 endogenous functions other than genomic transposition per se. For example, human RAG1 is a

249 domesticated Transib transposase that has retained its active transposase domain, and can

250 transpose ITR-containing transposons in vitro, but catalyzes somatic recombination of

251 immunoglobulin and T- receptor genes in lymphocytes across signal sequences that might be

252 derived from related transposons (38, 39). Human SETMAR is a Mariner-derived transposase

253 with a divergent DDN transposase domain that has retained its activity and

254 functions in double strand DNA repair by non-homologous end joining (7). The human genome

255 encodes over 40 other genes derived from DNA transposases (1, 3), including THAP9 that was

256 recently found to mobilize transposons in human cells with as of yet unknown function (40).

257 RAG1, THAP9 and PGBD5 are, to our knowledge, the only human proteins with demonstrated

258 transposase activity in human cells.

12

259 The distinct biochemical and structural features of PGBD5 indicated by our findings are

260 consistent with its unique evolution and function among human piggyBac-derived transposase

261 genes (24, 30). PGBD5 exhibits deep evolutionary conservation predating the origin of

262 vertebrates, including a preservation of genomic synteny across lancelet, lamprey, teleosts, and

263 amniotes (30). This suggests that while PGBD5 likely derived from an autonomous mobile

264 element, this ancestral copy was immobilized early in evolution and PGBD5 can probably no

265 longer mobilize its own genomic , at least in germline cells. The human genome contains

266 several thousands of miniature transposable elements (MITEs) with similarity to

267 piggyBac transposons (Figure 2 and Table 1) (1, 24). CSB-PGBD3 can bind to the piggyBac-

268 derived MER85 elements in the human genome (28, 29). Similarly, it is possible that PGBD5 can

269 act in trans to recognize and mobilize one or several related MITEs in the human genome.

270 Recently, single-molecular maps of the human genome have predicted thousands of mobile

271 element insertions, and the activity of PGBD5 or other endogenous transposases may explain

272 some of these novel variants (41, 42).

273 Given that both human RAG1 and ciliate piggyMac domesticated transposases catalyze

274 the elimination of specific genomic DNA sequences (10, 31), it is reasonable to hypothesize that

275 PGBD5’s biological function may similarly involve the excision of as of yet unknown ITR-

276 flanked sequences in the human genome or another form of DNA recombination. Since DNA

277 transposition by piggyBac family transposases requires substrate chromatin accessibility and

278 DNA repair, we anticipate that additional cellular factors are required for and regulate PGBD5

279 functions in cells. Likewise, just as RAG1-mediated DNA recombination of immunoglobulin

280 loci is restricted to B lymphocytes, and rearrangements of T-cell receptor genes to T

13

281 lymphocytes, potential DNA rearrangements mediated by PGBD5 may be restricted to specific

282 cell types and developmental periods.

283 PGBD5 localizes to the cell nucleus, and is expressed during embryogenesis and

284 neurogenesis, but its physiological function is not yet known (30). Generation of molecular

285 diversity through DNA recombination during nervous system development has been a long-

286 standing hypothesis (43, 44). The recent discovery of somatic retrotransposition in human

287 neurons (45-47), combined with our finding of DNA transposition activity by human PGBD5,

288 which is highly expressed in neurons, suggest that additional mechanisms of somatic genomic

289 diversification may contribute to vertebrate nervous system development.

290 Because DNA transposition is inherently topological and orientation of transposons can

291 affect the arrangements of reaction products (48), potential activities of PGBD5 can depend on

292 the arrangements of accessible genomic substrates, leading to both conservative DNA

293 transposition involving excision and insertion of transposon elements, as well as irreversible

294 reactions such as DNA elimination and chromosomal breakage-fusion-bridge cycles, as

295 originally described by McClintock (49). Finally, given the potentially mutagenic activity of

296 active DNA transposases, we anticipate that unlicensed activity of PGBD5 and other

297 domesticated transposases can be pathogenic in specific disease states, particularly in cases of

298 aberrant chromatin accessibility, such as cancer.

14

299 Materials and Methods

300 Reagents

301 All reagents were obtained from Sigma Aldrich if not otherwise specified. Synthetic

302 oligonucleotides were obtained from Eurofins MWG Operon (Huntsville, AL, USA) and purified

303 by HPLC.

304 Cell culture

305 HEK293 and HEK293T were obtained from the American Type Culture Collection

306 (ATCC, Manassas, Virginia, USA). The identity of all cell lines was verified by STR analysis

307 and lack of Mycoplasma contamination was confirmed by Genetica DNA Laboratories

308 (Burlington, NC, USA). Cell lines were cultured in DMEM supplemented with 10% fetal bovine

309 serum and 100 U / ml penicillin and 100 µg / ml streptomycin in a humidified atmosphere at 37

310 ˚C and 5% CO2.

311 Plasmid constructs

312 Human PGBD5 cDNA (Refseq ID: NM_024554.3) was cloned as a GFP fusion into the

313 lentiviral vector pReceiver-Lv103-E3156 (GeneCopoeia, Rockville, MD, USA). piggyBac

314 inverted terminal repeats (5’-

315 TTAACCCTAGAAAGATAATCATATTGTGACGTACGTTAAAGATAATCATGTGTAAAA

316 TTGACGCATG-3’ and 5’-CATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAA-3’),

317 as originally cloned by Malcolm Fraser and colleagues (18, 23), were cloned into PB-EF1-NEO

318 to flank IRES-driven neomycin resistance gene, as obtained from System Biosciences (Mountain

319 View, CA, USA). Plasmid encoding the hyperactive T. ni piggyBac transposase, as originally

320 generated by Nancy Craig and colleagues (50), was obtained from System Biosciences. Site-

321 directed PCR mutagenesis was used to generate mutants of PGBD5 and piggyBac, according to

15

322 manufacturer’s instructions (Agilent, Santa Clara, CA, USA). Plasmids were verified by

323 restriction endonuclease mapping and Sanger sequencing, and deposited in Addgene. Lentivirus

324 packaging vectors psPAX2 and pMD2.G were obtained from Addgene (51).

325 Cell transfection

326 HEK293 cells were seeded at a density of 100,000 cells per well in a 6-well plate, and

327 transfected with 2 μg of total plasmid DNA, containing 1 μg of transposon reporter (PB-EF1-

328 NEO or mutants) and 1 μg of transposase cDNA (pRecLV103-GFP-PGBD5 or mutants) using

329 Lipofectamine 2000, according to manufacturer’s instructions ( Technologies, CA, USA).

330 After 24 hours, transfected cells were trypsinized and re-plated for functional assays.

331 Quantitative RT-PCR

332 Upon transfection, cells were cultured for 48 hours and total RNA was isolated using the

333 RNeasy Mini Kit, according to manufacturer’s instructions (Qiagen, Venlo, Netherlands). cDNA

334 was synthesized using the SuperScript III First-Strand Synthesis System (Invitrogen,

335 Waltham, MA, USA). Quantitative real-time PCR was performed using the KAPA SYBR FAST

336 PCR polymerase with 20 ng template and 200 nM primers, according to the manufacturer’s

337 instructions (Kapa Biosystems, Wilmington, MA, USA). PCR primers are listed in

338 Supplementary File 1. Ct values were calculated using ROX normalization using the ViiA 7

339 software (Applied Biosystems).

340 Neomycin resistance colony formation assay

341 Upon transfection, cells were seeded at a density of 1,000 cells per 10 cm dish and

342 selected with G418 sulfate (2 mg/ml) for 2 weeks. Resultant colonies were fixed with methanol

343 and stained with Crystal Violet.

344

16

345 Transposon excision assay

346 Upon transfection, cells were cultured for 48 hours and DNA was isolated using the

347 PureLink Genomic DNA Mini Kit, according to manufacturer’s instructions (Life technologies,

348 Carlsbad, CA, USA). Reporter plasmid sequences flanking the neomycin resistance cassette

349 transposons were amplified using hot start PCR with an annealing temperature of 57 ˚C and

350 extension time of 2 minutes, according to the manufacturer’s instructions (New England Biolabs,

351 Beverly, MA, USA) using the Mastercycler Pro thermocycler (Eppendorf, Hamburg, Germany).

352 PCR primers are listed in Supp. Table 1. The PCR products were resolved using agarose gel

353 electrophoresis and visualized by ethidium bromide staining. Identified gel bands were extracted

354 using the PureLink Quick Gel Extraction Kit (Invitrogen) and Sanger sequenced to identify

355 excision products.

356 Quantitative PCR assay of genomic transposon integration

357 Upon transfection, cells were selected with puromycin (5 μg/ml) for 2 days to eliminate

358 non-transfected cells. After selection, cells were expanded for 10 days without selection and

359 genomic DNA isolated using PureLink Genomic DNA Mini Kit (Life technologies,

360 Carlsbad, CA, USA). Quantitative real-time PCR was performed using the KAPA SYBR FAST

361 PCR polymerase with 20 ng template and 200 nM primers, according to the manufacturer’s

362 instructions (Kapa Biosystems, Wilmington, MA, USA). PCR primers are listed in Supp. Table

363 1. Ct values were calculated using ROX normalization using the ViiA 7 software (Applied

364 Biosystems). We determined the quantitative accuracy of this assay using analysis of serial

365 dilution PB-E1-NEO plasmid as reference (Figure 2 – figure supplement 4).

366

17

367 Flanking sequence exponential anchored (FLEA) PCR

368 To amplify genomic transposon integration sites, we modified flanking sequence

369 exponential anchored (FLEA) PCR (33), as described in Figure 5 (34). First, linear extension

370 PCR was performed using 2 μg of genomic DNA and 100 nM biotinylated linear primer using

371 the Platinum HiFidelity PCR mix, according to manufacturer’s instructions (Invitrogen Corp.,

372 Carlsbad, CA, USA). Linear extension parameters for PCR were: 95 ˚C (45 sec), 62 ˚C (45 sec),

373 72 ˚C (3 min) for 30 cycles. Reaction products were purified by diluting the samples in a total

374 volume of 200 μl of nuclease-free water and centrifugation using the Amicon Ultra 0.5 ml 100K

375 at 12,000 g for 10 minutes at room temperature (EMD Millipore, Billerica, MA,

376 USA) purification. Retentate was bound to streptavidin ferromagnetic beads on a shaker at room

377 temperature overnight (Dynal, Oslo, Norway). Beads were washed with 40 μl of washing buffer

378 (Kilobase binder kit; Dynal), then water, then 0.1 N NaOH and finally with water again.

379 To anneal the anchor primer, washed beads were resuspended in a total volume of 20 μl

380 containing 5 μM anchor primer, 500 nM dNTP, and T7 DNA polymerase buffer (New England

381 Biolabs, Beverly, MA, USA). Samples were placed in a heating block pre-heated to 85 ˚C, and

382 allowed to passively cool to 37 ˚C. Once annealed, 10 units of T7 DNA polymerase (New

383 England Biolabs) was added and the mixtures were incubated for 1 h at 37 °C. Next, the beads

384 washed 5 times in water.

385 To exponentially amplify the purified products, beads were resuspended in a total volume

386 of 50 μl containing 500 nM of exponential and Transposon1 primers, and the Platinum

387 HiFidelity PCR mix. PCR was performed with the following parameters: 95 °C for 5 min,

388 followed by 35 cycles of 95 °C for 45 sec, 62 °C for 45 sec and 72 °C for 3 min. PCR products

389 were purified using the Invitrogen PCR purification kit (Invitrogen Corp., Carlsbad, CA, USA).

18

390 Second nested PCR was performed using 1/50th of the first exponential PCR product as template

391 using the Platinum HiFidelity PCR with 500 nM of exponential and Transposon2 primers. PCR

392 was performed with the following parameters: 35 cycles of 95 °C for 45 sec, 62 °C for 45 sec

393 and 72 °C for 3 min. Final PCR products were purified using the Invitrogen PCR purification kit,

394 according to the manufacturer’s instructions (Invitrogen Corp., Carlsbad, CA, USA).

395 Sequencing of transposon reporter integration sites

396 Equimolar amounts of purified FLEA-PCR amplicons were pooled, as measured using

397 fluorometry with the Qubit instrument (Invitrogen Carlsbad, CA) and sized on a 2100

398 BioAanalyzer instrument (Agilent Technologies, Santa Clara, CA). The sequencing library

399 construction was performed using the KAPA Hyper Prep Kit (KAPA biosystems, Wilmington,

400 MA) and 12 indexed Illumina adaptors from IDT (Coralville, IO), according to the

401 manufacturer’s instructions.

402 After quantification and sizing, libraries were pooled for sequencing on a MiSeq (pooled

403 library input at 10pM) on a 300/300 paired end run (Illumina, San Diego, CA). An average of

404 575,565 paired reads were generated per sample. The duplication rate varied between 56 and

405 87%. Because of the use of FLEA-PCR amplicons for DNA sequencing, preparation of Illumina

406 sequencing libraries is associated with the formation of adapter dimers (52). We used cutadapt to

407 first trim reads to retain bases with quality score > 20, then identify reads containing adapter

408 dimers and exclude them from further analyses (parameters –q 20 -b P7= -B

409 P5= –discard ; where is the P7 primer adapter with the specific barcode

410 for each library, and is the generic P5 adapter sequence:

411 GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCAT

412 T (53). Anchor primer sequences were then trimmed from the reads retained using cutadapt (-g

19

413 ^GTGGCACGGACTGATCNNNNNN). Filtered and trimmed reads were mapped to a hybrid

414 reference genome consisting of the hg19 full sequences and the PB-EF1-NEO

415 plasmid sequence using bwa-mem using standard parameters (54). Mapped reads were then

416 analyzed with LUMPY using split read signatures (55), and insertion loci were identified using

417 the called variants flagged as interchromosomal translocations (BND) between the plasmid

418 sequence and the human genome. Breakpoints were resolved to base-pair accuracy using split

419 read signatures when possible. Insertion loci were taken with 10 flanking base pairs and aligned

420 with MUSCLE to establish consensus sequence (55). Genomic distribution of insertion loci were

421 plotted using ChromoViz (https://github.com/elzbth/ChromoViz). All analysis scripts are

422 available from Zenodo (56).

423 Lentivirus production and cell transduction

424 Lentivirus production was carried out as described in (57). Briefly, HEK293T cells were

425 transfected using TransIT with 2:1:1 ratio of the pRecLV103 lentiviral vector, and psPAX2 and

426 pMD2.G packaging plasmids, according to manufacturer’s instructions (TransIT-LT1, Mirus,

427 Madison, WI). supernatant was collected at 48 and 72 hours post-transfection, pooled,

428 filtered and stored at -80 °C. HEK293T cells were transduced with virus particles at a

429 multiplicity of infection of 5 in the presence of 8 μg/ml hexadimethrine bromide. Transduced

430 cells were selected for 2 days with puromycin (5 μg/ml).

431 Western blotting

432 To analyze protein expression by Western immunoblotting, 1 million transduced cells

433 were suspended in 80 μl of lysis buffer (4% sodium dodecyl sulfate, 7% glycerol, 1.25% beta-

434 mercaptoethanol, 0.2 mg/ml Bromophenol Blue, 30 mM Tris-HCl, pH 6.8). Cells suspensions

435 were lysed using Covaris S220 adaptive focused sonicator, according to the manufacturer’s

20

436 instructions (Covaris, Woburn, CA). Lysates were cleared by centrifugation at 16,000 g for 10

437 minutes at 4 ˚C. Clarified lysates (30 μl) were resolved using sodium dodecyl sulfate-

438 polyacrylamide gel electrophoresis, and electroeluted using the Immobilon FL PVDF

439 membranes (Millipore, Billerica, MA, USA). Membranes were blocked using the Odyssey

440 Blocking buffer (Li-Cor), and blotted using the mouse and rabbit antibodies against GFP (1:500,

441 clone 4B10) and β-actin (1:5000, clone 13E5), respectively, both obtained from Cell Signaling

442 Technology (Beverly, MA). Blotted membranes were visualized using goat secondary antibodies

443 conjugated to IRDye 800CW or IRDye 680RD and the Odyssey CLx fluorescence scanner,

444 according to manufacturer’s instructions (Li-Cor, Lincoln, Nebraska).

445 Multiple analysis of DNA and protein sequences

446 The transposon annotation of the human genome (assembly hg19) was downloaded from

447 the UCSC website (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz) and

448 converted to the GFF3 annotation format. The sequences of the elements in the piggyBac-like

449 MER75, MER75A, MER75B, MER85, UCON29 and LOOPER families were extracted with 50

450 flanking base pairs using fastaFromBed from the BedTools genome analysis suite

451 (http://bedtools.readthedocs.org). The set of sequences for each family was aligned using

452 MUSCLE using standard parameters (58). Inverted terminal repeat sequences for each family

453 were defined as terminal sequences conserved amongst all family members measured using

454 multiple sequence alignments. We used a cutoff of 70% similarity to determine the fist position

455 of the ITR in the alignment. The multiple sequence alignments were then manually curated to

456 identify the set of “intact” elements defined by containing both the TTAA target site duplication

457 as well as both 3’ and 5’ ITRs aligning without gaps with the consensus ITR sequence. Multiple

458 sequence alignment of the ITR sequences was also performed with MUSCLE, and the sequence

21

459 identity matrix calculated using SIAS (http://imed.med.ucm.es/Tools/sias.html), with the

460 following measure of identity:

= 100 ∗ ℎ ℎ

461

462 Chromosome ideograms were made using the NCBI’s Genome Decoration Tool

463 (http://www.ncbi.nlm.nih.gov/genome/tools/gdp/). Multiple sequence alignment of protein

464 sequences was done using Clustal Omega (59, 60). Pairwise BLAST alignments-based Grishin’s

465 sequence distance analysis was done using BLAST (http://blast.ncbi.nlm.nih.gov/) and MEGA6

466 using standard parameters (61).

467 Statistical analysis

468 Statistical significance values were determined using two-tailed non-parametric Mann-

469 Whitney tests for continuous variables, and two-tailed Fisher exact test for discrete variables.

22

470 Acknowledgments

471 We are grateful to Alejandro Gutierrez, Marc Mansour, Thomas Look, Charles Roberts,

472 Daniel Bauer, Leo Wang, Hao Zhu and Michael Kharas for critical discussions, Nahum Meller

473 for cloning assistance, and Alan Chramiec for technical support. This work was supported by the

474 University of Essen Pediatric Oncology Research Program (A.H.), NIH K08 CA160660,

475 Burroughs Wellcome Fund, CureSearch for Children’s Cancer, and Hyundai Hope on Wheels

476 (A.K.), and the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts, the STARR

477 Consortium, the Bert L and N Kuggie Vallee Foundation, and the WorldQuant Foundation

478 (C.E.M.). We thank the MSKCC Integrated Genomics Core Facility and Bioinformatics Core

479 Facility for assistance with DNA sequencing and analysis (NIH P30 CA008748).

23

480 References

481 1. Feschotte C & Pritham EJ (2007) DNA transposons and the evolution of eukaryotic 482 genomes. Annu Rev Genet 41:331-368. 483 2. Cordaux R & Batzer MA (2009) The impact of retrotransposons on human genome 484 evolution. Nat Rev Genet 10(10):691-703. 485 3. Smit AF (1999) Interspersed repeats and other mementos of transposable elements in 486 mammalian genomes. Curr Opin Genet Dev 9(6):657-663. 487 4. Jacques PE, Jeyakani J, & Bourque G (2013) The majority of primate-specific regulatory 488 sequences are derived from transposable elements. PLoS Genet 9(5):e1003504. 489 5. Stewart C, et al. (2011) A comprehensive map of mobile element insertion 490 polymorphisms in humans. PLoS Genet 7(8):e1002236. 491 6. Munoz-Lopez M & Garcia-Perez JL (2010) DNA transposons: nature and applications in 492 genomics. Curr Genomics 11(2):115-128. 493 7. Liu D, et al. (2007) The human SETMAR protein preserves most of the activities of the 494 ancestral Hsmar1 transposase. Mol Cell Biol 27(3):1125-1132. 495 8. Kazazian HH, Jr. (2004) Mobile elements: drivers of genome evolution. Science 496 303(5664):1626-1632. 497 9. Erwin JA, Marchetto MC, & Gage FH (2014) Mobile DNA elements in the generation of 498 diversity and complexity in the brain. Nat Rev Neurosci 15(8):497-506. 499 10. Hiom K, Melek M, & Gellert M (1998) DNA transposition by the RAG1 and RAG2 500 proteins: a possible source of oncogenic translocations. Cell 94(4):463-470. 501 11. Shaheen M, Williamson E, Nickoloff J, Lee SH, & Hromas R (2010) Metnase/SETMAR: 502 a domesticated primate transposase that enhances DNA repair, replication, and 503 decatenation. Genetica 138(5):559-566. 504 12. Berg DE & Howe MM (1989) Mobile DNA (Am. Soc. Microbiol, Washington, DC). 505 13. Dyda F, Chandler M, & Hickman AB (2012) The emerging diversity of transpososome 506 architectures. Q Rev Biophys 45(4):493-521. 507 14. Keith JH, Schaeper CA, Fraser TS, & Fraser MJ, Jr. (2008) Mutational analysis of highly 508 conserved aspartate residues essential to the catalytic core of the piggyBac transposase. 509 BMC Mol Biol 9:73. 510 15. Mitra R, Fain-Thornton J, & Craig NL (2008) piggyBac can bypass DNA synthesis 511 during cut and paste transposition. The EMBO journal 27(7):1097-1109. 512 16. De Palmenaer D, Siguier P, & Mahillon J (2008) IS4 family goes genomic. BMC Evol 513 Biol 8:18. 514 17. Fraser MJ, Brusca JS, Smith GE, & Summers MD (1985) Transposon-mediated 515 mutagenesis of a baculovirus. Virology 145(2):356-361. 516 18. Elick TA, Lobo N, & Fraser MJ, Jr. (1997) Analysis of the cis-acting DNA elements 517 required for piggyBac transposable element excision. Mol Gen Genet 255(6):605-610. 518 19. Beames B & Summers MD (1990) Sequence comparison of cellular and viral copies of 519 host cell DNA insertions found in Autographa californica nuclear polyhedrosis virus. 520 Virology 174(2):354-363. 521 20. Fraser MJ, Cary L, Boonvisudhi K, & Wang HG (1995) Assay for movement of 522 Lepidopteran transposon IFP2 in insect cells using a baculovirus genome as a target 523 DNA. Virology 211(2):397-407.

24

524 21. Wang HG & Fraser MJ (1993) TTAA serves as the target site for TFP3 lepidopteran 525 transposon insertions in both nuclear polyhedrosis virus and Trichoplusia ni genomes. 526 Insect Mol Biol 1(3):109-116. 527 22. Fraser MJ, Smith GE, & Summers MD (1983) Acquisition of Host Cell DNA Sequences 528 by Baculoviruses: Relationship Between Host DNA Insertions and FP Mutants of 529 Autographa californica and Galleria mellonella Nuclear Polyhedrosis . J Virol 530 47(2):287-300. 531 23. Handler AM, McCombs SD, Fraser MJ, & Saul SH (1998) The lepidopteran transposon 532 vector, piggyBac, mediates germ-line transformation in the Mediterranean fruit fly. Proc 533 Natl Acad Sci U S A 95(13):7520-7525. 534 24. Sarkar A, et al. (2003) Molecular evolutionary analysis of the widespread piggyBac 535 transposon family and related "domesticated" sequences. Mol Genet Genomics 536 270(2):173-180. 537 25. Mitra R, et al. (2013) Functional characterization of piggyBat from the bat Myotis 538 lucifugus unveils an active mammalian DNA transposon. Proc Natl Acad Sci U S A 539 110(1):234-239. 540 26. Smit AF & Riggs AD (1996) Tiggers and DNA transposon fossils in the human genome. 541 Proc Natl Acad Sci U S A 93(4):1443-1448. 542 27. Newman JC, Bailey AD, Fan HY, Pavelitz T, & Weiner AM (2008) An abundant 543 evolutionarily conserved CSB-PiggyBac fusion protein expressed in Cockayne 544 syndrome. PLoS Genet 4(3):e1000031. 545 28. Bailey AD, et al. (2012) The conserved Cockayne syndrome B-piggyBac fusion protein 546 (CSB-PGBD3) affects DNA repair and induces both interferon-like and innate antiviral 547 responses in CSB-null cells. DNA Repair (Amst) 11(5):488-501. 548 29. Gray LT, Fong KK, Pavelitz T, & Weiner AM (2012) Tethering of the conserved 549 piggyBac transposase fusion protein CSB-PGBD3 to chromosomal AP-1 proteins 550 regulates expression of nearby genes in humans. PLoS Genet 8(9):e1002972. 551 30. Pavelitz T, Gray LT, Padilla SL, Bailey AD, & Weiner AM (2013) PGBD5: a neural- 552 specific -containing piggyBac transposase domesticated over 500 million years ago 553 and conserved from cephalochordates to humans. Mob DNA 4(1):23. 554 31. Baudry C, et al. (2009) PiggyMac, a domesticated piggyBac transposase involved in 555 programmed genome rearrangements in the ciliate Paramecium tetraurelia. Genes Dev 556 23(21):2478-2483. 557 32. Cary LC, et al. (1989) of baculoviruses: analysis of Trichoplusia 558 ni transposon IFP2 insertions within the FP-locus of nuclear polyhedrosis viruses. 559 Virology 172(1):156-169. 560 33. Pule MA, et al. (2008) Flanking-sequence exponential anchored-polymerase chain 561 reaction amplification: a sensitive and highly specific method for detecting retroviral 562 integrant-host-junction sequences. Cytotherapy 10(5):526-539. 563 34. Henssen A, Carson JR, & Kentsis A (2015) Transposon mapping using flanking sequence 564 exponential anchored (FLEA) PCR. Nature Protocol Exchange. 565 35. Henssen A, et al. (2015) Data from: Genomic DNA transposition induced by human 566 PGBD5. Dryad Digital Repository. doi:10.5061/dryad.b2hc1. 567 36. Crooks GE, Hon G, Chandonia JM, & Brenner SE (2004) WebLogo: a sequence logo 568 generator. Genome Res 14(6):1188-1190.

25

569 37. Nesmelova IV & Hackett PB (2010) DDE transposases: Structural similarity and 570 diversity. Adv Drug Deliv Rev 62(12):1187-1195. 571 38. Fugmann SD, Lee AI, Shockett PE, Villey IJ, & Schatz DG (2000) The RAG proteins 572 and V(D)J recombination: complexes, ends, and transposition. Annu Rev Immunol 573 18:495-527. 574 39. Landree MA, Wibbenmeyer JA, & Roth DB (1999) Mutational analysis of RAG1 and 575 RAG2 identifies three catalytic amino acids in RAG1 critical for both cleavage steps of 576 V(D)J recombination. Genes Dev 13(23):3059-3069. 577 40. Majumdar S, Singh A, & Rio DC (2013) The human THAP9 gene encodes an active P- 578 element DNA transposase. Science 339(6118):446-448. 579 41. Pendleton M, et al. (2015) Assembly and diploid architecture of an individual human 580 genome via single-molecule technologies. Nature methods 12(8):780-786. 581 42. Chaisson MJ, et al. (2015) Resolving the complexity of the human genome using single- 582 molecule sequencing. Nature 517(7536):608-611. 583 43. Dreyer WJ, Gray WR, & Hood L (1967) The Genetics, Molecular, and Cellular Basis of 584 Formation: Some Facts and a Unifying Hypothesis. Cold Spring Harb Symp 585 Quant Biol 32:353-367. 586 44. Wu Q & Maniatis T (1999) A striking organization of a large family of human neural 587 cadherin-like cell adhesion genes. Cell 97(6):779-790. 588 45. Coufal NG, et al. (2009) L1 retrotransposition in human neural progenitor cells. Nature 589 460(7259):1127-1131. 590 46. Evrony GD, et al. (2012) Single-neuron sequencing analysis of L1 retrotransposition and 591 somatic mutation in the human brain. Cell 151(3):483-496. 592 47. Upton KR, et al. (2015) Ubiquitous L1 mosaicism in hippocampal neurons. Cell 593 161(2):228-239. 594 48. Claeys Bouuaert C, Liu D, & Chalmers R (2011) A simple topological filter in a 595 eukaryotic transposon as a mechanism to suppress genome instability. Mol Cell Biol 596 31(2):317-327. 597 49. McClintock B (1942) The Fusion of Broken Ends of Chromosomes Following Nuclear 598 Fusion. Proc Natl Acad Sci U S A 28(11):458-463. 599 50. Li X, et al. (2013) piggyBac transposase tools for genome engineering. Proc Natl Acad 600 Sci U S A 110(25):E2279-2287. 601 51. Cudre-Mauroux C, et al. (2003) Lentivector-mediated transfer of Bmi-1 and telomerase 602 in muscle cells yields a duchenne myoblast cell line with long-term genotypic 603 and phenotypic stability. Hum Gene Ther 14(16):1525-1533. 604 52. ILLUMINA (2015) Paired-End Sample Preparation Guide. 605 53. Lindgreen S (2012) AdapterRemoval: easy cleaning of next-generation sequencing reads. 606 BMC Res Notes 5:337. 607 54. Li H & Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler 608 transform. Bioinformatics 26(5):589-595. 609 55. Layer RM, Chiang C, Quinlan AR, & Hall IM (2014) LUMPY: a probabilistic 610 framework for structural variant discovery. Genome Biol 15(6):R84. 611 56. Henaff E, Henssen A, Mason CE, & Kentsis A (2015) Genomic DNA transposition 612 induced by human PGBD5 (Accompanying Scripts for Computational Analyses), 613 doi:10.5281/zenodo.22206.

26

614 57. Kentsis A, et al. (2012) Autocrine activation of the MET receptor tyrosine kinase in acute 615 myeloid leukemia. Nat Med 18(7):1118-1122. 616 58. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high 617 throughput. Nucleic acids research 32(5):1792-1797. 618 59. Sievers F & Higgins DG (2014) Clustal omega. Current protocols in bioinformatics / 619 editoral board, Andreas D. Baxevanis ... [et al.] 48:3 13 11-13 13 16. 620 60. Thompson JD, Higgins DG, & Gibson TJ (1994) CLUSTAL W: improving the 621 sensitivity of progressive multiple sequence alignment through sequence weighting, 622 position-specific gap penalties and weight matrix choice. Nucleic acids research 623 22(22):4673-4680. 624 61. Tamura K, Stecher G, Peterson D, Filipski A, & Kumar S (2013) MEGA6: Molecular 625 Evolutionary Genetics Analysis version 6.0. and evolution 626 30(12):2725-2729.

627

27

628 Figures

629 Figure 1. Human PGBD5 is a distinct piggyBac-like transposase. Sequence alignment of 630 piggyBac-like transposases frog Uribo 2, bat piggyBat, lepidopteran piggyBac, and human 631 PGBD5. Catalytic residues conserved among piggyBac transposases are highlighted in red. 632 Human PGBD5 D168, D194, and D386 residues, identified in our study (Figure 7), are marked 633 in yellow.

634 Figure 2. Human piggyBac-like transposable elements have intact inverted terminal repeat 635 (ITR) sequences similar to the T. ni piggyBac transposon. (A) Chromosome ideogram 636 representing the distribution of annotated piggyBac-like elements in the human genome (version 637 hg19). (B) Multiple sequence alignment of the piggybac ITR sequence with the consensus ITR 638 sequences of the MER75 and MER85 families of piggyBac-like elements. (C) Chromosome 639 ideogram representing the distribution of piggyBac-like elements annotated in the human 640 genome (version hg19) with TTAA target site duplication as well as ITR sequences aligning with 641 the consensus (intact ITRs).

642 Figure 3. PGBD5 induces genomic integration of synthetic piggyBac transposons in human 643 cells. (A) Schematic of the human PGBD5 protein with its C-terminal transposase 644 domain, as indicated. (B) Schematic of synthetic transposon substrates used for DNA 645 transposition assays, including transposons with mutant inverted terminal repeat (ITR) marked 646 by triangles in red, and transposons lacking ITRs marked in blue. (C) Representative 647 photographs of Crystal Violet-stained colonies obtained after G418 selection of HEK293 cells 648 co-transfected with the transposon reporter plasmid along with transposase cDNA expression 649 vectors. (D) Quantification of G418-selection clonogenic assays, demonstrating the integration 650 activities of GFP-PGBD5, PGBD5 N-terminus, T.ni. piggyBac and GFP control (GFP-PGBD5 651 vs. GFP; p = 0.00031). (E) Quantification of genomic transposon integration using quantitative 652 PCR of GFP-PGBD5 and GFP expressing cells using intact (black), mutant (red), and deleted 653 (blue) ITR-containing transposon reporters (intact vs. mutant ITR; p = 0.00011). Error bars 654 represent standard errors of the mean of 3 biologic replicates.

655 Figure 4. PGBD5 precisely excises piggyBac transposons. (A) Representative agarose 656 electrophoresis analysis of PCR-amplified PB-EF1-NEO transposon reporter plasmid from 657 transposase-expressing cells, demonstrating efficient excision of the ITR-containing transposon 658 by PGBD5, but not GFP or PGBD5 N-terminus mutant lacking the transposase domain. T.ni 659 piggyBac serves as positive control. (B) Representative Sanger sequencing fluorogram of the 660 excised transposon, demonstrating precise excision of the ITR and associated duplicated TTAA 661 sequence, marked in red. demonstrating integrations of transposons (green) into human genome 662 (blue) with TTAA insertion sites and genomic coordinates, as marked.

28

663 Figure 5. Schematic of transposon specific flanking-sequence exponential anchored– 664 polymerase chain reaction amplification (FLEA-PCR) and massively parallel single 665 molecule sequencing assay for mapping and sequencing transposon insertions.

666 Figure 6. PGBD5 induces DNA transposition in human cells. (A) Analysis of the transposon 667 integration sequences, demonstrating TTAA preferences in integrations in cells expressing GFP- 668 PGBD5, but not GFP control. X-axis denotes sequence logo position, and y-axis 669 denotes information content in bits. (B) Circos plot of the genomic locations PGBD5-mobilized 670 transposons plotted as a function of chromosome number and transposition into genes (red) and 671 intergenic regions (gray). (C) Alignment of representative DNA sequences of identified genomic 672 integration sites,

673 Figure 7. Structure-function analysis of PGBD5-induced DNA transposition using alanine 674 scanning mutagenesis. (A) Quantitative PCR analysis of genomic integration activity of alanine 675 point mutants of GFP-PGBD5, as compared to wild-type and GFP control-expressing cells. 676 D168A, D194A, and D386A mutants (red) exhibit significant reduction in apparent activity (p = 677 0.00011, p = 0.000021, p = 0.000013 vs. GFP-PGBD5, respectively). Dotted line marks 678 threshold at which less than 1 transposon copy was detected per haploid human genome. Error 679 bars represent standard errors of the mean of 3 biological replicates. (B) Western immunoblot 680 showing equal expression of GFP-PGBD5 mutants, as compared to wild-type GFP-PGBD5 681 (green). β-actin (red) serves as loading control.

682 Figure 8. PGBD5 homologs are divergent from other piggyBac genes in vertebrates. 683 Phylogenetic reconstruction of the evolutionary relationships among piggyBac transposase- 684 derived genes in vertebrates, demonstrating the PGBD5 homologs represent a distinct subfamily 685 of piggyBac-like derived genes. Scale bar represents Grishin distance.

29

686 Tables

687 Table 1. Summary of annotated human piggyBac-like elements.

688

Total piggyBac-like elements Intact elements † MER75 475 144 MER75A 93 62 MER75B 114 27 MER85 905 95 UCON29 240 0 Looper 531 0 689

690 † denotes elements with intact ITR sequences that align with the consensus without gaps, and 691 contain a TTAA target site duplication.

30

692 Table 2. Sequence identity matrix of the piggyBac inverted terminal repeat sequences and 693 consensus sequences of the MER75 and MER85 human piggyBac-like elements.

694

piggyBac MER75 MER85 piggyBac 100% MER75 53% 100% MER85 56% 63% 100% 695

31

696 Table 3. Analysis of transposon integration sequences in human genomes induced by 697 PGBD5.

Intact Transposon Mutant Transposon

TTAA ITR Non-ITR TTAA ITR Non-ITR

Transposase

GFP-PGBD5 82% (65) † 18% (14) 11% (4) ‡ 89% (33)

GFP Control 17% (2) 83% (10) 40% (27) 60% (40)

698

699 Cells expressing GFP-PGBD5 and intact transposons exhibit significantly higher frequency of 700 genomic integration as compared to either GFP control, or GFP-PGBD5 with mutant 701 transposons, with 87% (69 out of 79) of sequences demonstrating DNA transposition of ITR 702 transposons into TTAA sites († p = 1.8 x 10-5). Mutation of the transposon ITR significantly 703 reduces ITR-mediated integration, with only 11% (4 out of 37) of sequences (‡ p = 0.0016). 704 Numbers in parentheses denote absolute numbers of identified insertion sites.

32

705 Figure supplements

706 Figure 3 – figure supplement 1. Assay for genomic integration of transposon reporters. 707 Schematic showing the procedure to assay for genomic integration of transposon reporters using 708 G418 selection to clone genomically-integrated neomycin resistant cells.

709 Figure 3 – figure supplement 2. GFP-PGBD5, PGBD5 N-terminus and T.ni. piggyBac are 710 equally expressed upon transfection in HEK293 cells. Quantitative RT-PCR specific to GFP- 711 PGBD5, PGBD5 N-terminus and T.ni. piggyBac show equal mRNA expression of all 3 712 transposases (PGBD5 N-terminus p = 0.17, T.ni. piggyBac p = 0.092).

713 Figure 3 – figure supplement 3. Sanger sequencing traces of the inverted terminal repeats 714 of the synthetic transposon reporter plasmids. Top to bottom: Intact transposon, deleted 715 transposon and mutant transposon. Black line indicates location of GGG to ATA mutations in 716 mutant transposon ITR.

717 Figure 3 – figure supplement 4. Quantitative assay of genomic integration of transposon 718 reporters. Cells were transfected as described in Materials and Methods. Next, cells were 719 expanded and genomic DNA was isolated. Quantitative real-time PCR was performed with 720 primers specific to the transposon sequence as well as to the TK1 reference gene.

721 Figure 3 – figure supplement 5. Quantitative genomic PCR standard curve for transposon 722 specific primers.

723 Figure 4 – figure supplement 1. Schematic of transposon excision assay. Cells were 724 transfected as described in Materials and Methods. Next DNA was isolated. PCR was performed 725 with primers specific to sequences flanking the transposon.

726 Figure 6 – figure supplement 1. Representative agarose gel image of amplicons from 727 flanking-sequence exponential anchored– polymerase chain reaction amplification (FLEA- 728 PCR). Arrow indicated expected size of degenerate anchor primer amplicons.

729 Figure 7 – figure supplement 1. Sanger sequencing trances of pRecLV103-GFP-PGBD5 730 D>A and E>A mutants (D168A, D175A, E188A, D192A, D194A, E203A, E205A, E236A) .

731 Figure 7 – figure supplement 2. Sanger sequencing trances of pRecLV103-GFP-PGBD5 732 D>A and E>A mutants (D241A, D244A, E284A, E285A, E287A, D303A, E365A, E373A)).

733 Figure 7 – figure supplement 3. Sanger sequencing trances of pRecLV103-GFP-PGBD5 734 D>A and E>A mutants (D386A, D387A, D425A, E439A, E444A, E449A, D450A).

735 Figure 8 – figure supplement 1. PGBD5 glutamic acid resitues D168, D194, and D386 are 736 conserved across species. Clustal O alignment of PGBD5 proteins across species (Danio rerio, 737 Xenopus tropicalis, Homo sapiens, Mus musculus, Gallus gallus, Python bivittatus). D168, 738 D194, and D386 residues are highlighted in yellow. 33

739 Supplementary files

740 Supplementary Table 1. Sequences of PCR primers

34

Figure 1

Uribo2 ------MAKRFYSAEEAAAHCMASSSEEFSGSDSEYVPPASESDSSTEESWCSS 48 Uribo2 ----LNRGETYALRKNELLAIKFFD---KKNVFMLTSIHDESVIREQRVGRPPKN----- 429 PiggyBat ------MSQHSDYSDDEFCADKLSNYSCDSDLENASTSDE-DSSDDEVMVRP 45 PiggyBat ----LKKGETKFIRKNDILLQVWQS---KKPVYLISSIHSAEMEESQNIDRTSKKKIV-- 397 PiggyBac ------MGSSLDDEHILSALLQSDDELVGEDSDSEISDHVSEDDVQSDTEEAFI 48 PiggyBac ----SRPVGTSMFCFDGPLTLVSYK---PKPAKMVYLLS--SCDEDASINESTGK----- 432 PiggyMac MFYNDEEEEDWGDKNEKDAEDEFQDLNPSEHKKRIPNNRPKLQPKPQKKKKEVQECEFAI 60 PiggyMac QQIYNQYHYAYYLNQSNELMLMYFQGTSEKEIALISNFLDNSLNEQHMWDISKQHYYVPH 591 PGBD5 ------MPDYRTRALFVQSTLGDSAGPELQLLSIVPGRDLQPSDSFTGP 43 PGBD5 PATPPARGQYQIKMKGNMSLICWYN----KGHFRFLTNAYSPVQQGVIIKRKSGEIP--- 367 . . : . . . . . * . . .

Uribo2 STVSALEEPMEVDEDVDDLEDQEAGDRADAA------79 Uribo2 --KPLCSKEYSKYMGGVDRTDQLQHYYNATRKTRAWYKKVGIYLIQMALRNSYIVYKAAV 487 PiggyBat RTLRRRRISSSSSDSESDIE ------65 PiggyBat --KPNALIDYNKHMKGVDRADQYLSYYSILRRTVKWTKRLAMYMINCALFNSYAVYKSVR 455 PiggyBac DEVHEVQPTSSGSEILDEQNVIEQPGSSLASN ------RILT 84 PiggyBac ---PQMVMYYNQTKGGVDTLDQMCSVMTCSRKTNRWPMALLYGMINIACINSFIIYSHNV 489 PiggyMac DDPIKRRQLANYQHVPVRLEAGEKMNINVQYDDGQVRKFGQQKAFEEKLIPISKSGTFIY 120 PiggyMac LKAPYMMYVYNKYKGGVDRRNSYVVKYRSRFPAKKWWQSVFERLFETAILNAYLIFRSYN 651 PGBD5 TRKMPPSAS------52 PGBD5 --CPLAVEAFAAHLSYICRYDDKYSKYFISHKPNKTWQQVFWFAISIAINNAYILYKMSD 425 * : : :. . : :. * *:: ::

Uribo2 ------AGGEPAWGPPCNFPP----EIPPFT------100 Uribo2 PGPK------491 PiggyBat ------GGREEWS-HVDNPP----VLEDFL------84 PiggyBat QRKMG------460 PiggyBac LPQRTIRGKNKHCWSTSKSTRRSRVSALNIVR ------116 PiggyBac SSKG------493 PiggyMac EASTGKLERKNPGRPREDTRSLFDDPKLQKMRNNYVPKLLQRITTNNTKE VDEEEGVQDP 180 PiggyMac PESSYRNKGQMRDFRINLMYQFAERYKSYEHQQEENGKNRFSYFAKIQPHTFIEGEEIVK 711 PGBD5 ------PGBD5 AYHVKR------431

Uribo2 ------TVPG-VKVDTS-NFEPINFFQLFMTEAILQDMVLY 133 Uribo2 ------LSYYK 496 PiggyBat ------GHQG-LNTDAV-INNIEDAVKLFIGDDFFEFLVEE 117 PiggyBat ------FKMFLKQTAIHWLTDDIP 478 PiggyBac ------SQRGPTRMCRN-IYDPLLCFKLFFTDEIISEIVKW 150 PiggyBac ------EKVQ 497 PiggyMac GVSWKPVKSVHEVPESFYMRQVFDKSASGPRSIDKSKIKSEYDAFRLFFDNDIYNTIIKH 240 PiggyMac CSECGNETKVFCQECTILKAEVVGLCHEKDTIKCQRFHEFMDFELDKNKEVIDKRKGKDP 771 PGBD5 ------AVDFFQLFVPDNVLKNMVVQ 72 PGBD5 ------YSRAQ 436 .:**. : . . ::

Uribo2 TNVYAEQYLTQNP------LPRYARAHAWHPTDIAEMKRFVGLTLAMGLIKANSL 182 Uribo2 YQLQILPALLFGGVEEQTVPEMPPSDNVARL---IGKHFIDTLPPTPG---KQRPQKGCK 550 PiggyBat SNRYYNQ--NRNN------FKLSKKSLKWKDITPQEMKKFLGLIVLMGQVRKDRR 164 PiggyBat EDMDIVPDLQPVPSTSGMRAKPPTSDPPCRLSMDMRKHTLQAIVGSGK ---KKNILRRCR 535 PiggyBac TNAEISLK------RRESMTGATFRDTNEDEIYAFFGILVMT-AVRKDNH 193 PiggyBac SRKKFMRNLYMSLTSSFMRKRLEAPTLKRYLRDNISNILPNEVPGTSDDSTEEPVMKKRT 557 PiggyMac TRERYQQKVEEQIYSYIHGMVHMGIRAKKPTLMQWEFTEYELEAYFAVQIFFGIVRLSNQ 300 PiggyMac YKPNFLEKLNQRTNAKGNQSSQKKESPLVNLLNKINDQIKQEARVKQEVKREDNTNKQTT 831 PGBD5 TNMYAKKF------QERFGSDGAWVEVTLTEMKAFLGYMISTSISHCESV 116 PGBD5 FGERLVRELLGLEDASPTH------455 :. . *: :.. : : . :: * .

Uribo2 ESYWDTTTVL------SIPVFSATMSRNRYQLLLRFLHF 215 Uribo2 VCRKR-----GIRRDTRYYCPKCPRNPGLCFKPCFEIYHTQLHY------589 PiggyBat DDYWTTEPWT------ETPYFGKTMTRDRFRQIWKAWHF 197 PiggyBat VCSVH-----KLRSETRYMCKFCN--IPLHKGACFEKYHTLKNY------572 PiggyBac MSTDDLFDRS------LSMVYVSVMSRDRFDFLIRCLRM 226 PiggyBac YCTYCP---SKIRRKANASCKKCK--KVICREHNIDMCQSCF------594 PiggyMac RDYWKSSARQKPIKKAETGRRKLRELAQEKMDRYAHWVTQRMSSIVSYEKFKTIRNCLNI 360 PiggyMac YIIPEVKSEDESSTDSDIYIQRTTNQRLIEIHQKIEQMSMCSEEFLNGSVANSQENFAER 891 PGBD5 LSIWSGGFYS------NRSLALVMSQARFEKILKYFHV 148 PGBD5 ------. :: :: : . ..

Uribo2 NNNATAVPPDQPGHDRLHKLRPLIDSLSERFAAVYTPCQNICIDESLLL----FKGRLQF 271 Uribo2 ------PiggyBat NNNADIVNES----DRLCKVRPVLDYFVPKFINIYKPHQQLSLDEGIVP----WRGRLFF 249 PiggyBat ------PiggyBac DD--KSIRPTLRENDVFTPVRKIWDLFIHQCIQNYTPGAHLTIDEQLLG----FRGRCPF 280 PiggyBac ------PiggyMac SG---AEALKLKGRDPIWKIRDFLNQMNMRFAKYYYPGEFITIDEGMIP----FAGKVQF 413 PiggyMac DENDQFNDNNGNDDNQFQFPQQRAQQIDDDDEQRNSKNEEQQKDFIKKMMEFADDEGSEN 951 PGBD5 VAFRSSQTTHG-----LYKVQPFLDSLQNSFDSAFRPSQTQVLHEPLIDEDPVFIA TCTE 203 PGBD5 ------: :: . : : : * :.* :: : .

Uribo2 RQYIPSKRARYGIKFYKLCESSSGYTSYFLIYEGKDSKLDPPGCPPDLTV ------SGKI 325 Uribo2 ------PiggyBat RVYNAGKIVKYGILVRLLCESDTGYICNMEIYCGEGKRLLET ------291 PiggyBat ------PiggyBac RMYIPNKPSKYGIKILMMCDSGTKYMINGMPYLGRGTQTNGVPLG------EYY 328 PiggyBac ------PiggyMac KVYNPDKPTKWGIKEYLLCDASNTYTFQLRLYHGQTMWNNDFKQTMFVNEEDTQHRTMEL 473 PiggyMac EEIQYPEDEADHFYQQLLQQEEQAIKYQQKKQLQQQLEEESERSNISHKSKKQQKLEQKF 1011 PGBD5 RELRKRKKRKFSLWVRQCSSTGFIIQIYVHLKEGGGPDGLDALKNKPQLHS------MV 256 PGBD5 ------: * ::.: ..:. *

Uribo2 VWELISPLLGQGFHLYVDNFYSSIPLFTALYCL--DTPACGTINRNRKGLPRALLDKK-- 381 Uribo2 ------PiggyBat IQTVVSPYTDSWYHIYMDNYYNSVANCEALMKN--KFRICGTIRKNR-GIPKDFQTIS-- 346 PiggyBat ------PiggyBac VKELSKPVHGSCRNITCDNWFTSIPLAKNLLQEPYKLTIVGTVRSNKREIPEVLKNSR-- 386 PiggyBac ------PiggyMac VLQMCKDYEHKAHKVVMDNYYSSWMLFRELRNR--GIGAVGTIRHNRTGLTKKDLTSKHF 531 PiggyMac IETSMRGIKQSQIQSNSEIGQDLQKIISASQDLNQISKQITESKGDNQNSQSDQ 1065 PGBD5 ARSLCRNAAGKNYIIFTGPSITSLTLFEEFEKQ--GIYCCGLLRARKSDCTGLPLSMLTN 314 PGBD5 ------: . : . .* : * :. .: . Figure 2 A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y

MER75 MER75A MER75B MER85 UCON29 Looper

BCMER75 MER75A MER75B 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y PiggyBac 5’ ITR TTAACCCT A G AAAG A T A MER85 MER75A 5’ ITR TTAACCCA TTTCCCA T T T MER75UCON29 5’ ITR TTAACCCTTTTCCCA T T T MER75BLooper 5’ ITR TTAACCCA TTTCCCA T T T MER85 5’ ITR TTAACCAATTT T G CCT

PiggyBac 3’ ITR TTA CCTTT TA GGG TTA A MER75A 3’ ITR AAA T GGGAAA T GGG TTA A MER75 3’ ITR AAA T GGG AAA A GGG TTA A MER75B 3’ ITR AAA T GGG AAA T GGG TTA A MER85 3’ ITR AAGGG T AAA T GGG TTA A intact MER75MER75 intact ITR MER75A intact ITR MER75B intact ITR MER85 intact ITR intact MER75A intact MER75B intact MER85 Figure 3 A B PGBD5 Synthetic T.ni. piggyBac trans- poson N-terminus Transposase domain 1 160 455 intact ITR ITR ITR NeoR C mutant ITR NeoR PGBD5 T. ni. GFP GFP-PGBD5 N-terminus piggyBac deleted ITR NeoR

D E intact ITR mutant ITR 14 deleted ITR 1.5 12 10

gene)

2 1.0 8

TK1 6

per cm 0.5 4

Transposon copies copies Transposon 2

(relative to to (relative G418 resistant colonies colonies G418 resistant 0.0 0 GFP GFP-PGBD5 GFP

GFP-PGBD5 T. ni. piggyBac PGBD5 N-terminus

Figure 5 Biotinylated primer

human genome transposon

anchor Streptavidin primer beads

exponential primer Transposon 1 primer

exponential Transposon 2 primer primer

Illumina MiSeq (2x300bp) sequencing 1. Align reads to human genome and transposon 2. Identify split reads

human genome transposon Figure 6

Chr 9 A B Chr 8 Chr 10 GFP-PGBD5 Chr 7 Chr 11 1.0 Chr 12 Chr 6

0.5 Chr 13 bits T A T Chr 5 T TT GT T TAC AA AA T A A Chr 14 GT G GT T A C T A CAT A AAC A G C C CAA CG A T TCG AC TC CT C T G

G G T G G G G G 0.0 CGGAGGCA CCC CGT 5 10 15 20 Chr 15 Chr 4 Chr 16 GFP 2.0 1.0 Chr 17 Chr 3 Chr 18 1.0 0.5 Chr 19 bits bits Chr 20 C Chr 2 C AG T C C GA C G C AT Chr 21 A T T T C T G CCT ACT C T C CG G A G A A A T AGA C G A G CGCAATG G C AGGCTT C 0.0 GTTTGACGTTAATGCATGAGTA Chr 22 5 10 15 20 X Chr 1 Y Transposon gene insertions intergenic insertions C Transposon Human genome Coordinate

GAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAAAAATTGCTAATAGAAAACCATACCTTTTCTCAGGCCT... chr2:145131779-145131780

GAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAAAATATGGTAGGAACAGCTGGCCACCAGCTAGCACTGGCAA... chr1:185687955-185687956

GAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAAAGTGTGATGATGATTATCATAAAAATAATTACAATTTTAACCT... chr4:148237037-148237038

GAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAAAATAAGTAATCTATATTTTTCAAGAATCAAGTAGCTTCATTT... chr6:55607722-55607723

GAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAAATTATGTAGATTTTGTTGTTGTTGTTGTTGCAGAAATGGCTCC... chr8:38090373-38090374

GAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAAAGAAATAAATCTCTTCCACATATTAAAACCACATTCATCCTGA... chr16:4131524-4131525 Figure 7 A

gene) 10

TK1 * * * 1 * *

Transposon copies copies Transposon

(relative to to (relative

GFP D168AD175AE188AD192AD194AE203AE205AE236AD241AD244AE284AE285AE287AD303AE365AE373AD386AD387AD425AE439AE444AE449AD450A

GFP-PGBD5 D194A+D386A B D168A+D194A+D386A MW(kDa)

75 GFP-PGBD5 50 Actin 37

D168AD194AD386A E365A HEK293T GFP-PGBD5 D194A+D386A

D168A+D194A+D386A Figure 8 Xenopus tropicalis PGBD5 Gallus gallus PGBD5 Danio rerio pgbd5 Python bivittatus PGBD5 Homo sapiens PGBD5 Mus musculus Pgbd5 Homo sapiens PGBD1 Bos taurus PGBD1 Ailuropoda melanoleuca PGBD1 Mus musculus Pgbd1 Bos taurus PGBD2 Homo sapiens PGBD2 Macaca mulatta PGBD2 Homo sapiens PGBD3 Macaca mulatta PGBD3 Homo sapiens PGBD4 Xenopus tropicalis PGBD4 Xiphophorus maculatus PGBD4 Danio rerio PGBD4 Trichoplusia ni PiggyBac Danio rerio PGBD3 Xenopus tropicalis PGBD3

0.2