Striking heterogeneity of somatic L1 retrotransposition in single normal and cancerous gastrointestinal cells

Katsumi Yamaguchi (a,1), Alisha O. Soares (a), Loyal A. Goff (a), Anjali Talasila (a), Jungbin A. Choi (a), Daria Ivenitsky (a), Sadik Karma (a), Benjamin Brophy (a), Scott E. Devine (b), Stephen J. Meltzer (c), and Haig H. Kazazian Jr. (a,1)

(a) Department of Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD 21205; (b) Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201; and (c) Department of Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD 21205

This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected in 2018.

Contributed by Haig H. Kazazian Jr., November 1, 2020 (sent for review September 16, 2020; reviewed by Mark A. Batzer and John V. Moran)

Somatic LINE-1 (L1) retrotransposition has been detected in early embryos, adult brains, and the gastrointestinal (GI) tract, and many cancers, including epithelial GI tumors. We previously found numerous somatic L1 insertions in paired normal and GI cancerous tissues. Here, using a modified method of single-cell analysis for somatic L1 insertions, we studied adenocarcinomas of colon, pancreas, and stomach, and found a variable number of somatic L1 insertions in tumors of the same type from patient to patient. We detected no somatic L1 insertions in single cells of 5 of 10 tumors studied. In three tumors, aneuploid cells were detected by FACS. In one pancreatic tumor, there were many more L1 insertions in aneuploid than in euploid tumor cells. In one gastric cancer, both aneuploid and euploid cells contained large numbers of likely clonal insertions. However, in a second gastric cancer with aneuploid cells, no somatic L1 insertions were found. We suggest that when the cellular environment is favorable to retrotransposition, aneuploidy predisposes tumor cells to L1 insertions, and retrotransposition may occur at the transition from euploidy to aneuploidy. Seventeen percent of insertions were also present in normal cells, similar to findings in genomic DNA from normal tissues of GI tumor patients. We provide evidence that: 1) The number of L1 insertions in tumors of the same type is highly variable, 2) most somatic L1 insertions in GI cancer tissues are absent from normal tissues, and 3) under certain conditions, somatic L1 retrotransposition exhibits a propensity for occurring in aneuploid cells.

somatic retrotransposition | gastrointestinal cancer | single cells

Significance
Using a modified method, we analyzed single cells from adenocarcinomas of the colon, pancreas, and stomach and their paired normal tissues for somatic LINE-1 retrotransposition events. We found that somatic retrotransposition varied significantly from patient to patient with cancers of the same type, as well as in different cells within a particular cancer.

Author contributions: K.Y., L.A.G., and H.H.K. designed research; K.Y., A.O.S., A.T., J.A.C., D.I., S.K., B.B., and S.E.D. performed research; S.J.M. contributed new reagents/analytic tools; S.J.M. provided samples; K.Y., A.O.S., and S.E.D. analyzed data; and K.Y. and H.H.K. wrote the paper.
Reviewers: M.A.B., Louisiana State University; and J.V.M., University of Michigan, Ann Arbor.
The authors declare no competing interest.
Published under the PNAS license.
1 To whom correspondence may be addressed. Email: [email protected] or hkazazi1@jhmi.edu.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2019450117/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.2019450117 | PNAS Latest Articles

While protein-coding DNA sequences make up only 1.5% of the human genome, remnants of transposable elements comprise nearly 50% (1). The 500,000 known LINE-1s (L1s), one group of retrotransposons, comprise 17% of total human DNA, but nearly all are inactive, the result of ancient truncated insertions and full-length insertions that previously mutated during human evolution. We humans have some 100 to 150 active L1s in our genomes (2), but only a handful of these are very active or “hot” (2, 3). They become mobile only when their DNA is transcribed into RNA, and this RNA is reverse-transcribed back into DNA and inserted into the genome. This process of retrotransposition has been modeled in cell culture (4). The L1-encoded proteins, endonuclease and reverse transcriptase, tend to act predominantly in cis on the L1 transcript that encoded them (5–7). Paradoxically, these autonomous elements also drive the insertion of nonautonomous Alus (8), SVAs (SINE-R, VNTR, and Alu) (9–11), processed pseudogenes (6, 7), and other small RNAs (12). This process of in trans retrotransposition depends upon the L1 endonuclease and reverse transcriptase encoded by the L1 ORF2 protein.

For many years, investigators believed that L1 retrotransposition occurred exclusively in germ cells and the very early human embryo, because so many L1s and Alus (together, >1.5 million) are fixed in the genome. In addition, it was difficult to find L1 RNA outside of the testis and ovary (13). However, in 2005, Muotri et al. (14) showed that L1s are mobile in certain regions of the brain, particularly the hippocampus of mice. Other studies of human brain, including of single cells, solidified the consensus that somatic retrotransposition of L1 occurs in the brain, but at low levels (∼0.2 insertions per neuron) (14–19). Somatic L1 insertion has also been found at very low levels in human adult colon, stomach, esophagus, and liver (20, 21), but such data are lacking for other organs.

In contrast, the level of L1 retrotransposition varies greatly among different types of epithelial tumors (21–31). Retrotransposition also varies from patient to patient within the same tumor type: Some tumors possess many somatic L1 insertions, while others have none (24, 25). In addition, investigators have reported an artifactual detection of somatic L1 insertions from both whole tissue and single-cell analysis. Chimera artifacts arise either during multiple displacement amplification (MDA) or the ligation and PCR steps of library preparation (32–35). Chimeras are sequences extending from a reference or polymorphic nonreference L1Hs to a downstream flanking sequence less than 5 to 10 kb away (33).

Here, we used a modified method to analyze single cells from adenocarcinomas of the colon, pancreas, and stomach for somatic L1 insertions. We found that many presumptive new insertions close to reference human L1s were, in fact, artifacts: They could not be validated in any cells. They were absent from genomic DNA (gDNA) of tissues and their 5′ ends could not be recovered by PCR of the 5′ flank. As others have reported, we found significant patient-to-patient variability in the number of somatic L1 insertions in tumors of the same type. Among insertions detected in tumors, about 17% (10 of 59) were also present in normal cells adjacent to the tumor, similar to previous findings in gDNA from normal tissues (21, 29).

Materials and Methods
To detect somatic L1 insertions in single cells, we modified previous methods (36) at both the wet bench and computational phases. These changes led to more efficient detection of somatic L1 insertions and improved accuracy in reducing chimeras. Briefly, we isolated single tumor and nontumor cell nuclei by FACS sorting. When possible, we separated aneuploid nuclei on the basis of DNA content [nuclei having a DNA content peak greater than a euploid peak: A peak between G1 and G2 (Fig. 1B) or a peak greater than G2 (Fig. 1D)]. Distance along the abscissa is proportional to DNA content. In Fig. 1B, aneuploid cells (∼240 to 260) have less DNA than euploid cells (∼260 to 280), while in Fig. 1D, aneuploid cells (∼260 to 280) have more DNA than euploid cells (∼180 to 200).

We then amplified single-cell DNA (∼12 pg) by MDA to 0.5 to 5 μg (37). MDA DNA was then quality-checked by PCR amplification of 16 different single-copy sites on different chromosomes. To use the MDA DNA for library production, we required a positive PCR product from all 16 sites, except for the colon cancer samples, where we worked with cells that had 14 to 16 positive sites. Even with the highest-quality MDA kit, this stringent criterion was fulfilled in only 0 to 50% of single-cell DNA tested per single FACS-sorting experiment.

After MDA and its quality-control procedure, we carried out asymmetric, one-sided PCR off the 3′ ends of human-specific L1s using the specific ACA sequence 94 bp upstream of the polyA tract (38). To obtain a very high fraction of 3′ flanking sequences of these L1s, we then circularized the product (Fig. 2) and carried out inverse PCR around the circle in both directions using two oligonucleotides, both of which were located on the L1Hs-specific sequence (SI Appendix, Supplemental Methods). Additional adapters were then added to the resulting linearized products, increasing the specificity for human-specific L1s by anchoring the 3′ end of the L1 primer to the specific G residue in L1Hs 9 bp upstream of the polyA tract. The resulting library was quality-checked for flanking sequences 3′ to human-specific L1s by topoisomerase cloning and Sanger sequencing of 5 to 10

Fig. 1. FACS analysis of nuclei from cases 352 and 6041. (A) The 352 normal pancreas nuclei; G2 phase (R11) nuclei were sorted. (B) The 352 pancreatic cancer nuclei; G2 phase (R11) and sub-G2 (R10) nuclei were sorted. (C) The 6041 normal gastric nuclei; G2 phase (R11) nuclei were sorted. (D) The 6041 gastric cancer nuclei; G2 phase (R11) and aneuploid (R15) nuclei were sorted. Red arrows indicate sites of aneuploid cells. Distance along the abscissa is proportional to DNA content. In B, aneuploid cells (∼240 to 260) have less DNA than euploid cells (∼260 to 280), while in D, aneuploid cells (∼260 to 280) have more DNA than euploid cells (∼180 to 200).

individually cloned products per library. A library chosen to proceed to next-generation sequencing contained clones of which 80 to 90% were located 3′ of different human-specific reference L1s in the human genome.

These high-quality libraries were then sequenced, and data from 30 to 50 million reads per library were analyzed using the L1-seq pipeline (36). Then, the most likely somatic L1 insertions present in one or more tumor cells, while absent in normal cells, were further analyzed manually at their specific genomic sites. If any “newly detected” L1 was located within a 5-kb window of a reference or known polymorphic human-specific L1, it was deemed likely to be a false-positive due to chimera formation. We chose only L1 insertions located at a greater distance from known genomic L1s as potential somatic insertions for validation at both 5′ and 3′ ends. Of these candidates, about 70 to 80% could be fully validated by 5′ and 3′ end sequences. For more detailed methods, see SI Appendix, Supplemental Methods.

Fig. 2. In A, tissue is prepared for FACS sorting and gDNA amplification by MDA. In B, a library is prepared consisting of the 3′ flanks of human-specific L1s for Illumina sequencing (Nex-gen seq, next-generation sequencing). See SI Appendix, Supplemental Methods for details of the procedure.

Results
Using this procedure, we analyzed the cells in one colon cancer case, four pancreatic cancer cases, and five gastric cancer cases (Table 1). The whole tissue of two of these samples in which somatic L1 insertions were discovered in single nuclei had been previously analyzed (21, 39). After careful analyses of many presumptive new insertions by computational and manual filtration, we failed to validate somatic L1 insertions in two pancreatic cancers and three gastric cancers. We successfully validated insertions in the colon cancer and the remaining two pancreatic and two gastric cancers.

First, we analyzed four colon cancer nuclei, three hepatic metastatic nuclei, and four normal liver nuclei, all from the same patient (Fig. 3). In addition to these 11 nuclei used for next-generation sequencing, we analyzed the 8 insertions found by sequencing in 11 other isolated nuclei: 6 from a colon cancer, 1 from a metastasis, and 4 from normal colon. In these nuclei, we validated the eight novel somatic L1 insertions, along with five other somatic insertions that had been observed in our previous bulk gDNA study of the same patient (21). Notably, these latter five insertions were not detected by next-generation sequencing in the present study, likely because the tissues used here were from different sites than those in the previous study. Eleven of 13 somatic insertions were present in the primary cancer, while 9 of the 13 were detected in the metastasis. Three of the 13 were also present in the normal liver (Fig. 3). Note that none of the eight new insertions found by next-generation sequencing had been identified in our previous study, but they were now detected in both single nuclei and gDNA.

Curiously, one insertion (023) was present in only one normal and one metastasis nucleus, yet it was detected in whole-tissue DNA from both normal and metastatic nuclei. Insertion 023, like the other 12 shown, was fully validated (Table 2). We could not detect this insertion in any of 10 cancer nuclei nor in the whole-cancer tissue DNA, suggesting that its concentration was low in the cancer tissue. This insertion is one of only two that were not detected in the primary cancer, but present in metastasis. We suggest that normal liver cells containing insertion 023 had potentially invaded and contaminated the liver metastasis, or that the insertion was very rare in cells of the tumor but selectively entered the metastasis.

Next, we focused on single nuclei from four pancreatic adenocarcinomas and their paired normal tissues. In the first case studied (A55) (Fig. 4A), we detected and validated nine somatic

Table 1. Somatic L1 insertions in cancer nuclei and bulk gDNA

Case | Total insertions | Primary tumor nuclei | Aneuploid tumor nuclei | Metastatic cancer nuclei | Normal nuclei | Tumor gDNA | Metastatic cancer gDNA | Normal gDNA
Colon cancer case 3329 | 8 | 7 | N.A. | 4 | 1 | 7 | 6 | 3
Pancreatic cancer case A55 | 9 | 8 | N.A. | 8 | 1 | 8 | 8 | 1
Pancreatic cancer case 287 | 0 | 0 | N.A. | N.A. | 0 | 0 | N.A. | 0
Pancreatic cancer case 325 | 0 | 0 | N.A. | N.A. | 0 | 0 | N.A. | 0
Pancreatic cancer case 352 | 7 | 3 | 7 | N.A. | 0 | 6 | N.A. | 4
Gastric cancer case 4087 | 0 | 0 | N.A. | N.A. | 0 | 0 | N.A. | 0
Gastric cancer case 5066 | 0 | 0 | 0 | N.A. | 0 | 0 | N.A. | 0
Gastric cancer case 6041 | 28 | 28 | 28 | N.A. | 0 | 28 | N.A. | 1
Gastric cancer case 6042 | 2 | 2 | N.A. | N.A. | 0 | 2 | N.A. | 0
Gastric cancer case 6075 | 0 | 0 | N.A. | N.A. | 0 | 0 | N.A. | 0

N.A., not available.
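The per-case totals in Table 1 can be tallied with a short script. The following is a minimal sketch, not part of the authors' pipeline: the dictionary keys and the helper name `summarize` are illustrative, and the values are transcribed from the "Total insertions" column.

```python
# Total somatic L1 insertions per case, transcribed from Table 1.
# Case labels are illustrative shorthand, not identifiers from the paper.
TOTAL_INSERTIONS = {
    "colon 3329": 8,
    "pancreatic A55": 9,
    "pancreatic 287": 0,
    "pancreatic 325": 0,
    "pancreatic 352": 7,
    "gastric 4087": 0,
    "gastric 5066": 0,
    "gastric 6041": 28,
    "gastric 6042": 2,
    "gastric 6075": 0,
}


def summarize(totals):
    """Return the cases without insertions and the min/max among the rest."""
    zero = sorted(case for case, n in totals.items() if n == 0)
    positive = [n for n in totals.values() if n > 0]
    return zero, min(positive), max(positive)


zero_cases, lowest, highest = summarize(TOTAL_INSERTIONS)
print(f"{len(zero_cases)} of {len(TOTAL_INSERTIONS)} tumors had no detected insertions")
print(f"range among the remaining tumors: {lowest} to {highest}")
```

The tally reflects only the newly detected insertions listed in Table 1; insertions carried over from earlier bulk gDNA studies are not included in these totals.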

[Fig. 3 panel. Rows: insertions 006, 007, 018, 023, 035, 037, 040, 041 (labeled "This study") and s02, s05, s15, s16, s17 (labeled "Ewing et al."). Columns: single nuclei and gDNA. Legend: cancer nuclei; metastatic liver nuclei; normal nuclei; euploid cancer nuclei; cancer bulk genomic DNA; metastatic liver bulk genomic DNA; normal bulk genomic DNA; PCR validated.]

Fig. 3. Somatic L1 insertions in a colon cancer patient. Red marker indicates PCR validated candidates. Each column indicates validated insertions in single nuclei or bulk gDNA. Thirteen insertions were found in 10 colorectal cancer nuclei, 4 metastatic nuclei from liver, and 8 normal liver nuclei. Three insertions (007, 037, and s05) were found in only one tumor cell, but were present in bulk tumor gDNA. Note that we found no insertions in three tumor cells and one metastatic cell.

L1 insertions. Interestingly, although we found the A55.139 insertion in several metastatic nuclei, many cancer nuclei, and in a single normal nucleus, this mosaic insertion was not detected in either normal or metastatic bulk DNA (Fig. 4A). The insertion in this patient was apparently present in low concentration in normal tissue, clonal in the primary cancer, but present in only a small fraction of metastatic tissue. In this case, two of the nine insertions were detected in normal pancreatic tissue.

This sample had been previously analyzed (39), wherein 20 de novo insertions were uncovered in the whole-tissue cancer DNA. We found only 6 of these 20 insertions in our current single-cell analysis. We then looked for the remaining 14 insertions in tumor gDNA, validating 8 of them in this different sample of whole tissue from the tumor. Although we missed these eight insertions in our single-cell analysis, we did find three new insertions not detected by Rodic et al. (39).

In the second pancreatic case (Fig. 4B, the 352 case), we found seven novel somatic L1 insertions. In this case, we detected an aneuploid peak by FACS analysis. Although we looked hard for aneuploid nuclei in all FACS sorts, we found the peaks only in this 352 pancreatic case and in two gastric cancer cases (Fig. 5 and Table 2). In the 352 case, we gated and sorted nuclei in R11 and R10 separately (Fig. 1); R11 contains euploid G2 phase nuclei; R10 contains both euploid S phase nuclei and aneuploid nuclei (here, we term these R10 nuclei aneuploid). We studied four of these aneuploid nuclei along with four nonaneuploid cancer nuclei and four normal nuclei in 352 samples. The seven insertions detected were much more prevalent in aneuploid nuclei than in euploid nuclei from the primary cancer. We detected those 7 insertions at a rate of 18 insertions in 28 possibilities (64%) in the aneuploid nuclei and a rate of only 3 of 28 (11%) in euploid nuclei. Moreover, all seven insertions were present in the aneuploid nuclei, but only three of seven were found in euploid nuclei. One of the seven insertions was detected in only one aneuploid nucleus and not in bulk DNA from the cancer or normal tissue, while another was found in only one aneuploid nucleus but was present in bulk DNA of the tumor. Four of the seven insertions present in two to four of the aneuploid cells (likely clonal in the tumor) were present in normal bulk DNA (Fig. 4B). Altogether, insertions A55.139 (in metastatic cells) and 352.001 (in an aneuploid cell) were present in few tumor nuclei, but absent from whole-tissue gDNA of the tumor, suggesting that our detection method has the ability to detect new L1 insertions that are in low concentration in the tissue and are not detected in gDNA.

In one (287) of the other two pancreatic cases in which we found no insertions by our modified method (Table 1), we also carried out whole-genome sequencing (WGS) followed by MELT analysis (mobile element locator tool, MELT v2.1.5) (40). In this case, we studied bulk gDNA from the normal/tumor pair along with MDA-amplified DNA from three single normal cells and three single tumor cells sequenced to 30× coverage. After MELT analysis, several strong somatic L1 candidate insertions were identified in single cells. None, however, could be validated with PCR assays. Thus, all of the L1 insertions identified in the normal/tumor pair by WGS were either artifacts or germline insertions.

In the nuclei of three gastric cancer patients, we obtained a number of potential positives from the 3′ ends data. However, many of these potential insertions turned out to be located close to other human-specific L1s in the genome. PCR of their 3′ and 5′ ends was completely unsuccessful, and we believe that all of these potential L1 insertions were false-positives due to chimera formation during library production. All told, three of five gastric cases had no somatic L1 insertions detected. In a fourth gastric cancer case, we found only two somatic L1 insertions.

In contrast, the fifth gastric cancer sample (6041) analyzed had aneuploid nuclei and many somatic L1 insertions. We studied 4 diploid cancer nuclei, 12 aneuploid cancer nuclei, and 4 normal nuclei. We found 28 validated L1 insertions in the cancer nuclei, both diploid and aneuploid, but none were present in normal nuclei (Fig. 5). Of these 28 insertions, 25 were validated at the 5′ end. The remaining three were thought to be accompanied by 5′ flanking genomic alterations and were not chimeras because they were detected in many cells and were present in bulk tissue gDNA. Only 1 of the 28 insertions was detected in normal gastric tissue. Because all 28 insertions were present in a majority of single cells tested, all 28 insertions appeared to be either clonal or to have occurred very early in the development of the tumor. Interestingly, 1 of 12 presumptive aneuploid cancer nuclei and 1 of 4 euploid tumor nuclei had only 1 of the 28 insertions, suggesting significant variability within the cells of the tumor.

Among the total of 59 de novo L1 insertions found in this study, none were present in an exon of a gene. Forty percent of the insertions were present in introns, only slightly greater than the fraction of the genome (30%) that is intronic space. Since none of the intronic insertions were in well-known driver genes, it is likely that few, if any, were driver mutations related to the etiology of the cancer.

Since we could not determine the 5′ end of 4 insertions, we have fully characterized 55 of the 59 insertions (Table 2). There were many interesting and unusual features of these insertions. The structures of many of the insertions were unique. We found just 1 of 55 (2%) that was a full-length L1. This fraction is much less than the 25 to 30% of recent Ta subset L1 insertions, polymorphic or fixed in the human genome (39). Interestingly, target site duplications were associated with only 43 of 55 (78%) of characterized insertions, while short target site deletions (target site truncations) were present in 10 insertions, and 2 insertions were blunt, having neither target site duplications nor

[Fig. 4 panels. (A) Rows: insertions A55.162, A55.167, A55.175, A55.181, A55.139, F2, D4, E11, A55.191. (B) Rows: insertions 352.001, 352.002, 352.004, 352.006, 352.008, 352.012, 352.027. Columns in both panels: single nuclei and gDNA. Legend: cancer nuclei; metastatic liver nuclei; normal nuclei; euploid cancer nuclei; euploid/aneuploid cancer nuclei; cancer bulk genomic DNA; metastatic liver bulk genomic DNA; normal bulk genomic DNA; PCR validated.]

Fig. 4. Somatic L1 insertions in pancreatic cancers A55 (A) and 352 (B). Note that insertion A55.139 was present in four of nine metastatic cells, but could not be detected in gDNA from the metastasis. This insertion must therefore be rare in the whole-tissue specimen, even though it was present in nearly half of the single-cell nuclei analyzed. Similarly, insertion 352.001 was present in one aneuploid tumor nucleus, but absent from whole-tissue gDNA from the tumor. In case 352, the insertions predominated in aneuploid nuclei, but were uncommon in euploid nuclei of the tumor.
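The contrast between aneuploid and euploid nuclei in case 352 can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the counts (18 of 28 and 3 of 28 insertion-by-nucleus possibilities, from 7 insertions assayed in 4 nuclei per group) are taken from the Results, but the one-sided Fisher's exact test is an illustration added here, not an analysis reported by the authors. It also treats each insertion-by-nucleus observation as independent, which is unlikely to hold for clonal insertions, so the P value should be read as descriptive only.

```python
from math import comb

# Case 352: 7 insertions assayed in 4 aneuploid and 4 euploid cancer nuclei,
# giving 28 insertion-by-nucleus possibilities per group (from the Results).
ANEUPLOID_DETECTED, ANEUPLOID_TOTAL = 18, 28
EUPLOID_DETECTED, EUPLOID_TOTAL = 3, 28


def fisher_one_sided(a, n1, b, n2):
    """P(X >= a) under the hypergeometric null for a 2x2 table.

    a of n1 detections in group 1, b of n2 detections in group 2.
    """
    successes = a + b
    population = n1 + n2
    upper = min(n1, successes)
    tail = sum(
        comb(successes, k) * comb(population - successes, n1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(population, n1)


rate_aneuploid = ANEUPLOID_DETECTED / ANEUPLOID_TOTAL
rate_euploid = EUPLOID_DETECTED / EUPLOID_TOTAL
p_value = fisher_one_sided(
    ANEUPLOID_DETECTED, ANEUPLOID_TOTAL, EUPLOID_DETECTED, EUPLOID_TOTAL
)

print(f"aneuploid detection rate: {rate_aneuploid:.1%}")
print(f"euploid detection rate:   {rate_euploid:.1%}")
print(f"one-sided Fisher exact P (illustrative only): {p_value:.2e}")
```

Using `math.comb` keeps the sketch free of third-party dependencies; a statistics library would give the same tail probability.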

target site truncations. Three L1 insertions in the gastric cancer case had 5′ extra nucleotides of 4, 28, and 157 nucleotides, while 1 pancreatic cancer insertion had 47 bp of a U5 small RNA sequence attached to its 5′ end (41). The 157 nucleotides attached to the 5′ end of the 6041 insertion 005 in chromosome 3 are present in the genome at chromosome 6 + 66260062. We hypothesize that these extra nucleotides are derived from a transcript joined to the truncated L1 by either RNA ligase RtcB (42) or template switching from one L1 RNA to the chromosome 6 RNA (43).

Not surprisingly, 14 of 55 (26%) insertions contained a 5′ inversion, and we found only two 3′ transductions. Our method, however, generally fails to detect most 3′ transduction events because the 3′ sequence of a transduction is repeated at least twice in the genome. Thus, the low number of 3′ transductions is not unexpected. For both of the 3′ transductions, we were able to

[Fig. 5 panels. (A) Rows: insertions 6041.001 to 6041.010, 6041.012, 6041.014 to 6041.020, 6041.022 to 6041.028, 6041.030, 6041.031, 6041.033. (B) Rows: insertions 6042.001, 6042.002. Columns in both panels: single nuclei and gDNA. Legend: cancer nuclei; normal nuclei; euploid/aneuploid cancer nuclei; aneuploid cancer nuclei; cancer bulk genomic DNA; normal bulk genomic DNA; PCR validated; *, 3′ validation only.]

Fig. 5. (A) Somatic L1 insertions present in single nuclei of euploid tumor cells, aneuploid tumor cells, normal cells, and normal and tumor whole gDNA. Note that nearly all insertions appear clonal in the aneuploid and euploid tumor cells, but are not observed in normal nuclei, suggesting that they occurred very early in tumor development. Only 1 of 28 insertions (018) was found in normal gDNA. (B) Data on the two insertions found in one cell and tissue gDNA of a gastric cancer case.

Table 2. Characteristics of 59 somatic L1 insertions found in single nuclei of GI cancers

Columns: Insertion no., Source, 5′ validation, Chr, Strand, Position, Length, Target site, Inversion, EN site

Colon cancer case 3329
006 Yes 11 - 6572407 Intron DNHD1 1,526 TSD 15 poly T 5′-TGTT/AT
007 Yes 12 - 320414 Intron SLC6A12 248 TSD 16 44 5′-ATTT/AC
018 Yes 20 + 41140789 Intron PTPRT 606 Blunt 5′-TATT/AA
023 Yes 5 - 82945979 Intron HAPLN1 380 TSD 19 226 5′-TCTT/AC
035 Yes 2 - 125942771 Intergenic 519 TSD 14 107 5′-TTTT/AT
037 Yes 3 - 137230618 Intergenic 110 TSD 16 5′-TTTC/AA
040 Yes 5 - 105210750 Intergenic 317 TST 1 5′-TTCA/AA
041 Yes X + 28282640 Intergenic 533 TSD 12 378 5′-TTTT/AA

Pancreatic cancer case A55
A55.162 Yes 3 + 144039103 Intergenic 1,018 TSD 14 692 5′-TGTT/AT
A55.167 Rodic et al. (39) Yes 7 - 32107312 Intron PDE1C 1,111 TSD 15 5′-TTCT/GT
A55.175 Rodic et al. (39) Yes 9 + 82623554 Intergenic 293 TSD 17 5′-TCTT/AA 3′ Transduction
A55.181 Rodic et al. (39) Yes 12 - 99119996 Intron APAF1 377 TSD 16 32 5′-TTTT/AG
A55.139 Yes 15 + 51868384 Intron DMXL2 520 TSD 579 5′-TTTA/AC
F2 Rodic et al. (39) Yes 2 + 197376760 Intron HECW2 152 TSD 6 N.D.
D4 Rodic et al. (39) Yes 7 + 55919204 Intron SEPT14 517 TSD 16 Yes N.D.
E11 Rodic et al. (39) Yes 8 + 15064582 Intron SGCZ 630 TSD 16 N.D.
A55.191 Yes 9 - 91457945 Intergenic 359 TSD 8 5′-CCTT/GG 3′ Transduction

Pancreatic cancer case 352
352.001 Yes 2 - 214902012 Intergenic 1,254 TSD 3 5′-TTTC/AT
352.002 Yes 3 + 1542434 Intron SPAG16 188 TSD 10 97 5′-TTTT/GT
352.004 Yes 5 - 100705559 Intergenic 558 TSD 13 5′-TTCT/AA
352.006 Yes 5 - 129475525 Intron CHSY3 424 TST 1 5′-AACA/AT
352.008 Yes 6 - 12829478 Intron PHACTR1 6,022 TSD 16 5′-TTCT/GT Full length
352.012 Yes 8 - 89386371 Intergenic 288 TST 3 5′-TTTT/AA
352.027 Yes X - 25147381 Intergenic 245 TSD 10 5′-TTGT/AA 47 bp U5 snRNA

Gastric cancer case 6041
6041.001 Yes 1 - 34906442 Intergenic 208 TSD 4 5′-TTTC/AT
6041.002 No 21 - 29066778 Intergenic
6041.003 Yes 2 - 40517543 Intron SLC8A1 282 TSD 8 5′-CTTT/GC
6041.004 Yes 2 - 193243466 Intergenic 209 Blunt 5′-TTTT/AA
6041.005 Yes 3 - 102622796 Intergenic 183 TSD 17 5′-TGGA/CA 5′ Extra nucleotides: 157 bp
6041.006 Yes 4 + 14257419 Intergenic 158 TST 6 5′-ATTT/GA
6041.007 Yes 4 + 117257394 Intergenic 547 TST 6 288 5′-TGTT/AA
6041.008 Yes 4 - 139700800 Intergenic 365 TST 24 5′-TTTT/AC
6041.009 Yes 5 + 50129488 Intron PARP8 413 TSD 14 111 5′-TTTC/AA
6041.010 Yes 5 - 121088311 Intergenic 305 TSD 14 173 5′-TTTT/GA
6041.012 Yes 6 + 127339102 Intron AK127472 211 TSD 16 5′-TTTT/AG 5′ Extra nucleotides: 28 bp
6041.014 No 7 - 9432408 Intergenic
6041.015 Yes 8 - 51632702 Intron SNTG1 588 TSD 7 5′-TTTC/AA
6041.016 Yes 9 + 29298432 Intergenic 129 TST 5 5′-TCTT/AA
6041.017 Yes 12 - 105956585 Intergenic 344 TST 5 5′-TTTT/GA
6041.018 Yes 13 - 68766406 Intergenic 439 TSD 17 1117 5′-TTTT/AA
6041.019 Yes 14 - 55210179 Intron SAMD4A 869 TSD 15 1346 5′-TTTT/AA
6041.020 Yes 15 - 25907230 Intergenic 201 TSD 17 355 5′-TTTA/AC
6041.022 No 15 - 68040194 Intron MAP2K5 3′ Transduction
6041.023 Yes X - 86935179 Intergenic 281 TST 1 5′-TTGT/AT
6041.024 Yes 18 - 6076317 Intron L3MBTL4 107 TSD 13 5′-TTAT/AA 5′ Extra nucleotides: 4 bp
6041.025 Yes 19 + 37362350 Intron ZNF345 871 TSD 3 5′-TTTC/AA
6041.026 Yes 2 - 58865999 ncRNA LINC01122 151 TST 1 5′-TTTA/AA
6041.027 Yes 2 + 154744852 Intron GALNT13 895 TST 3 5′-TTTT/AA
6041.028 Yes 3 - 81502240 Intergenic 707 TSD 16 5′-TTTT/CC
6041.030 Yes 7 - 123848772 Intron EU233817 275 TSD 2 5′-TATT/AT
6041.031 Yes 8 - 84907212 Intergenic 701 TSD 9 5′-CTTT/GA
6041.033 Yes 3 - 56019514 Intron ERC2 344 TSD 7 5′-TTTT/AA

Gastric cancer case 6042
6042.001 Yes 12 + 127715032 Intergenic 670 TSD 5 5′-TTTT/AA
6042.002 No 12 + 100608188 Intron 5′-TTTT/AT

The 5′ end of the insertion was identified for 55 of the 59 insertions. EN site, endonuclease-cleavage site; TSD, target site duplication; TST, target site truncation.
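Candidates of the kind listed in Table 2 were retained only if they lay outside a 5-kb window around known human-specific L1s (Materials and Methods). A minimal sketch of such a distance filter follows. It is an illustration, not the authors' L1-seq pipeline: the function names, the symmetric treatment of the window, and all coordinates in the usage example are assumptions made here for demonstration.

```python
from bisect import bisect_left

# Window size from Materials and Methods: candidates within 5 kb of a reference
# or known polymorphic human-specific L1 were treated as likely chimeras.
WINDOW_BP = 5_000


def build_index(known_l1s):
    """Group known L1 positions by chromosome and sort them for binary search."""
    index = {}
    for chrom, pos in known_l1s:
        index.setdefault(chrom, []).append(pos)
    for positions in index.values():
        positions.sort()
    return index


def is_likely_chimera(chrom, pos, index, window=WINDOW_BP):
    """True if a known L1 lies within +/- window bp of the candidate.

    Assumption: the window is applied symmetrically; the paper does not
    state whether its filter was directional.
    """
    positions = index.get(chrom, [])
    i = bisect_left(positions, pos - window)  # first known L1 at or past the left edge
    return i < len(positions) and positions[i] <= pos + window


# Hypothetical coordinates for illustration only (not taken from the study).
known = build_index([("1", 1_000_000), ("1", 5_000_000), ("X", 250_000)])
candidates = [("1", 1_003_000), ("1", 2_000_000), ("2", 1_003_000), ("X", 244_000)]

kept = [c for c in candidates if not is_likely_chimera(*c, known)]
print(kept)
```

Sorting each chromosome's positions once and using binary search keeps each lookup logarithmic in the number of known L1s, which matters when many candidates are screened against a genome-wide reference set.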

6 of 8 | www.pnas.org/cgi/doi/10.1073/pnas.2019450117 Yamaguchi et al. Downloaded by guest on September 24, 2021

find the insertion’s precursor, and both precursors were full-length L1s. One had previously been determined to be highly active, while the other is a nonreference L1 that has not yet been tested for retrotransposition in tissue culture.

Discussion

There have been numerous analyses of L1 insertions in paired cancer and normal tissues (21–31) and in brain neurons (15–19). De novo L1 insertions have been commonly detected in cancer tissues, especially in epithelial cancers. Nevertheless, insertions in these tumors have varied strikingly in number. Some tumors of a particular organ and cell type harbor many insertions, while others have none. A few somatic insertions have functioned as driver mutations in cancer, disrupting tumor-suppressor genes, such as APC and PTEN, and leading to adenocarcinoma of the colon (22, 27, 44). Studies of somatic L1 insertions in brain tissue have demonstrated a wide range in numbers of potential insertions in neurons. However, analyses of insertions in single neurons have revealed a very small number of L1 insertions in these brain cells (∼2 of 100 cells) (17, 19, 33).

Here, to our knowledge, we report a unique analysis of somatic L1 insertions in single cells from gastrointestinal (GI) cancers. Similar to findings observed in whole tissues (27, 30, 35), we observed a high degree of variability in de novo L1 insertions. Somatic L1 insertions were found in variable numbers both within similar tumor types and among different cells in a tumor. In the 10 tumors studied, we failed to detect any somatic L1 insertions in 2 of 4 pancreatic cancers and 3 of 5 gastric cancers; yet, in the remaining cases, we found a variable number of L1 insertions, ranging from a minimum of 2 to a maximum of 28 (Figs. 3–5).

In addition to the observed variable number of somatic L1 insertions among tumors of the same type, we found new insertions in single nuclei that were not found in whole-tissue DNA (Fig. 4). In pancreatic case A55, an L1 insertion was present in a normal nucleus, but not in normal tissue DNA. In the same case, insertions were found in four of nine metastatic nuclei, but not in metastatic tissue DNA. Likewise, in pancreatic case 352, an insertion was present in an aneuploid tumor nucleus, but not in whole-tumor DNA. We also found variability in the number of L1 insertions in single nuclei within the same case. In patient 6041, we found 1 of 12 aneuploid cancer cells with only one L1 insertion, while the 11 others harbored over 20 somatic L1 insertions. Likewise, 1 of 4 euploid tumor cells in that case possessed only 1 insertion, while the remaining 3 cells had 24 or more. Moreover, we found eight L1 insertions that were not present in single nuclei, but were present in whole-tissue DNA (Figs. 3–5), implying that there were at least two different cell types in the normal tissues, one of which had the L1 insertion and the other of which did not.

Aneuploid cells, which are present in about 70% of solid tumors, are caused by a variety of mechanisms (45). Clinical studies show that aneuploidy is associated with poorer survival than diploidy in many cancers, including prostate adenocarcinomas (46). In most of our samples, we found no cells that were obviously aneuploid, but in three, there were small numbers of aneuploid cells (Fig. 1 and Table 1). In pancreatic cancer case 352, there were many more L1 insertions in the aneuploid cells than in the euploid tumor cells (Fig. 4), while in gastric cancer 6041 with a great many somatic L1 insertions, the number in aneuploid cells was very high. However, in one gastric case, no insertions were found, even though aneuploid cells were analyzed. Whether increased L1 retrotransposition in aneuploid cells is a cause of poorer prognosis or merely a noncausal association remains to be seen. Recently, a whole-genome study of inflammatory bowel disease cases found a correlation between retrotransposon insertions and aneuploidy (47). Moreover, Bortvin and colleagues (48) have reported that increased L1 expression correlates with aneuploidy and oocyte attrition in mice. Furthermore, previous studies suggest that whereas retrotransposition can occur throughout the cell cycle (49), it predominates during S-phase (41–53). We suggest that the timing during replication at which euploid cells become aneuploid may in fact be important in facilitating retrotransposition (Fig. 6).

We studied de novo L1 insertions in single cells to determine whether many somatic L1 insertions originate in presumptive normal cells within a tissue, and these “normal” cells later become malignant. We postulated that those cells containing a new L1 insertion might be prone to become fast-replicating and cancerous. Our data indicate that the majority of somatic L1 insertions (83%) occur only after tumorigenesis is well underway. However, a significant percentage (17%) of these insertions were found both in the normal and tumor cells of GI cancer patients.
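The distinction drawn here, between insertions seen only in tumor cells and insertions shared by normal and tumor cells, amounts to a simple partition of the insertion calls. The following is a minimal sketch of that bookkeeping, not the analysis pipeline used in this study. The function names, the compartment labels ("normal", "tumor"), and the toy insertion identifiers are all hypothetical, and the toy counts are illustrative only and do not reproduce the data reported here.

```python
from collections import Counter


def classify_insertion(compartments):
    """Label one insertion by the compartments in which it was detected.

    `compartments` is a set of hypothetical labels such as {"normal", "tumor"}.
    """
    in_normal = "normal" in compartments
    in_tumor = "tumor" in compartments
    if in_normal and in_tumor:
        return "shared_normal_and_tumor"  # candidate early (pre-tumor) event
    if in_tumor:
        return "tumor_only"  # arose after tumorigenesis was underway
    if in_normal:
        return "normal_only"
    return "undetected"


def summarize(calls):
    """Return the fraction of insertions falling in each class."""
    labels = Counter(classify_insertion(c) for c in calls.values())
    total = sum(labels.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in labels.items()}


# Toy input (assumed, not from the paper): insertion ID -> compartments.
toy_calls = {
    "ins_A": {"tumor"},
    "ins_B": {"tumor"},
    "ins_C": {"normal", "tumor"},
    "ins_D": {"tumor"},
}

fractions = summarize(toy_calls)
for label, fraction in sorted(fractions.items()):
    print(f"{label}: {fraction:.0%}")
```

On real data, the same partition would be applied per patient, since an insertion can only be called shared when normal and tumor cells from the same individual were both examined.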

Fig. 6. Timing of L1 retrotransposition. (Upper) L1 retrotransposition occurs mostly during DNA replication in cell culture. L1 insertions are a common phenomenon in aneuploid cancer cells, and the point at which euploid cells become aneuploid may be important for the timing of L1 retrotransposition in these cells (Lower).

Such insertions could potentially initiate or drive tumorigenesis if they are present sufficiently early in the process to create driver mutations. Insertions that occur only after tumorigenesis is well underway produce significant heterogeneity within the cancer, but likely represent passenger mutations that could potentially influence tumor evolution and metastasis.

The reasons for the heterogeneity in somatic L1 insertions from one tumor to another are unknown. The ability of any particular L1 to retrotranspose is variable, and highly active L1s are polymorphic within the population. Many “hot” L1s are present in a minority of individuals, with allele frequencies of <50% (2, 3). Meanwhile, >60 proteins function as brakes on retrotransposition at different stages of development (54). However, we do not know whether particular tumor mutations affecting the ability of a protein to limit retrotransposition could have a major effect on the number of somatic L1 insertions in a particular cancer. WGS and improved single-cell technologies should provide answers to the relationship between tumor mutations and L1 retrotransposition, and the significance of aneuploidy in somatic L1 retrotransposition.

Data Availability. All study data are included in the article and supporting information.

ACKNOWLEDGMENTS. We thank Clement Ojo and Andrew Hannan for technical assistance; Hao Zhang for FACS analysis; and Kathleen Burns for helpful discussion. This study was supported in part by NIH Grants 2R01 GM099875 (to H.H.K.), R01 DK118250 (to S.J.M.), and R01 HG002898 (to S.E.D.). K.Y. was supported by the Debbie’s Dream Foundation-AACR Gastric Cancer Research Fellowship, Grant Number 19-40-41-YAMA.

1. E. S. Lander et al., Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. B. Brouha et al., Hot L1s account for the bulk of retrotransposition in the human population. Proc. Natl. Acad. Sci. U.S.A. 100, 5280–5285 (2003).
3. C. R. Beck et al., LINE-1 retrotransposition activity in human genomes. Cell 141, 1159–1170 (2010).
4. J. V. Moran et al., High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).
5. B. A. Dombroski, S. L. Mathias, E. Nanthakumar, A. F. Scott, H. H. Kazazian Jr., Isolation of an active human transposable element. Science 254, 1805–1808 (1991).
6. W. Wei et al., Human L1 retrotransposition: Cis preference versus trans complementation. Mol. Cell. Biol. 21, 1429–1439 (2001).
7. C. Esnault, J. Maestre, T. Heidmann, Human LINE retrotransposons generate processed pseudogenes. Nat. Genet. 24, 363–367 (2000).
8. M. Dewannieux, C. Esnault, T. Heidmann, LINE-mediated retrotransposition of marked Alu sequences. Nat. Genet. 35, 41–48 (2003).
9. E. M. Ostertag, J. L. Goodier, Y. Zhang, H. H. Kazazian Jr., SVA elements are nonautonomous retrotransposons that cause disease in humans. Am. J. Hum. Genet. 73, 1444–1451 (2003).
10. D. C. Hancks, J. L. Goodier, P. K. Mandal, L. E. Cheung, H. H. Kazazian Jr., Retrotransposition of marked SVA elements by human L1s in cultured cells. Hum. Mol. Genet. 20, 3386–3400 (2011).
11. J. Raiz et al., The non-autonomous retrotransposon SVA is trans-mobilized by the human LINE-1 protein machinery. Nucleic Acids Res. 40, 1666–1683 (2012).
12. A. J. Doucet, G. Droc, O. Siol, J. Audoux, N. Gilbert, U6 snRNA pseudogenes: Markers of retrotransposition dynamics in mammals. Mol. Biol. Evol. 32, 1815–1832 (2015).
13. E. M. Ostertag et al., A mouse model of human L1 retrotransposition. Nat. Genet. 32, 655–660 (2002).
14. A. R. Muotri et al., Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature 435, 903–910 (2005).
15. J. K. Baillie et al., Somatic retrotransposition alters the genetic landscape of the human brain. Nature 479, 534–537 (2011).
16. N. G. Coufal et al., L1 retrotransposition in human neural progenitor cells. Nature 460, 1127–1131 (2009).
17. G. D. Evrony et al., Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell 151, 483–496 (2012).
18. K. R. Upton et al., Ubiquitous L1 mosaicism in hippocampal neurons. Cell 161, 228–239 (2015).
19. J. A. Erwin et al., L1-associated genomic regions are deleted in somatic cells of the healthy human brain. Nat. Neurosci. 19, 1583–1591 (2016).
20. R. Shukla et al., Endogenous retrotransposition activates oncogenic pathways in hepatocellular carcinoma. Cell 153, 101–111 (2013).
21. A. D. Ewing et al., Widespread somatic L1 retrotransposition occurs early during gastrointestinal cancer evolution. Genome Res. 25, 1536–1545 (2015).
22. Y. Miki et al., Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 52, 643–645 (1992).
23. R. C. Iskow et al., Natural mutagenesis of human genomes by endogenous retrotransposons. Cell 141, 1253–1261 (2010).
24. E. Lee et al.; Cancer Genome Atlas Research Network, Landscape of somatic retrotransposition in human cancers. Science 337, 967–971 (2012).
25. S. Solyom et al., Extensive somatic L1 retrotransposition in colorectal tumors. Genome Res. 22, 2328–2338 (2012).
26. E. Pitkänen et al., Frequent L1 retrotranspositions originating from TTC28 in colorectal cancer. Oncotarget 5, 853–859 (2014).
27. E. Helman et al., Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 24, 1053–1063 (2014).
28. T. T. Doucet-O’Hare et al., LINE-1 expression and retrotransposition in Barrett’s esophagus and esophageal carcinoma. Proc. Natl. Acad. Sci. U.S.A. 112, E4894–E4900 (2015).
29. T. T. Doucet-O’Hare et al., Somatically acquired LINE-1 insertions in normal esophagus undergo clonal expansion in esophageal squamous cell carcinoma. Hum. Mutat. 37, 942–954 (2016).
30. J. M. C. Tubio et al., Mobile DNA in cancer: Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 345, 1251343 (2014).
31. B. Rodriguez-Martin et al.; PCAWG Structural Variation Working Group; PCAWG Consortium, Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 52, 306–319 (2020).
32. G. D. Evrony, E. Lee, P. J. Park, C. A. Walsh, Resolving rates of mutation in the brain using single-neuron genomics. eLife 5, e12966 (2016).
33. C. D. Treiber, S. Waddell, Resolving the prevalence of somatic transposition in Drosophila. eLife 6, e28297 (2017).
34. M. Kircher, Analysis of high-throughput ancient DNA sequencing data. Methods Mol. Biol. 840, 197–228 (2012).
35. M. A. Quail et al., A large genome center’s improvements to the Illumina sequencing system. Nat. Methods 5, 1005–1010 (2008).
36. A. D. Ewing, H. H. Kazazian Jr., High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res. 20, 1262–1270 (2010).
37. F. B. Dean et al., Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl. Acad. Sci. U.S.A. 99, 5261–5266 (2002).
38. J. Skowronski, T. G. Fanning, M. F. Singer, Unit-length line-1 transcripts in human teratocarcinoma cells. Mol. Cell. Biol. 8, 1385–1397 (1988).
39. N. Rodic et al., Retrotransposon insertions in the clonal evolution of pancreatic ductal adenocarcinoma. Nat. Med. 21, 1060–1064 (2015).
40. E. J. Gardner et al.; 1000 Genomes Project Consortium, The mobile element locator tool (MELT): Population-scale mobile element discovery and biology. Genome Res. 27, 1916–1929 (2017).
41. N. Gilbert, S. Lutz, T. A. Morrish, J. V. Moran, Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol. Cell. Biol. 25, 7780–7795 (2005).
42. J. B. Moldovan, Y. Wang, S. Shuman, R. E. Mills, J. V. Moran, RNA ligation precedes the retrotransposition of U6/LINE-1 chimeric RNA. Proc. Natl. Acad. Sci. U.S.A. 116, 20612–20622 (2019).
43. A. Buzdin et al., The human genome contains many types of chimeric retrogenes generated through in vivo RNA recombination. Nucleic Acids Res. 31, 4385–4390 (2003).
44. E. C. Scott et al., The mobile element locator tool (MELT): Population scale mobile element discovery and biology. Genome Res. 26, 745–755 (2016).
45. L. Sansregret, C. Swanton, The role of aneuploidy in cancer evolution. Cold Spring Harb. Perspect. Med. 7, a028373 (2017).
46. K. H. Stopsack et al., Aneuploidy drives lethal progression in prostate cancer. Proc. Natl. Acad. Sci. U.S.A. 116, 11390–11395 (2019).
47. S. Olafsson et al., Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684.e11 (2020).
48. S. Malki, G. W. van der Heijden, K. A. O’Donnell, S. L. Martin, A. Bortvin, A role for L1 retrotransposon in fetal oocyte attrition in mice. Dev. Cell 51, 658 (2019).
49. S. Kubo et al., L1 retrotransposition in nondividing and primary human somatic cells. Proc. Natl. Acad. Sci. U.S.A. 103, 8036–8041 (2006).
50. S. Boissinot, A. Entezam, L. Young, P. J. Munson, A. V. Furano, The insertional history of an active family of L1 retrotransposons in humans. Genome Res. 14, 1221–1231 (2004).
51. P. Mita et al., LINE-1 protein localization and functional dynamics during the cell cycle. eLife 7, e30058 (2018).
52. T. Sultana et al., The landscape of L1 retrotransposons in the human genome is shaped by pre-insertion sequence biases and post-insertion selection. Mol. Cell 74, 555–570.e7 (2019).
53. D. A. Flasch et al., Genome-wide de novo L1 retrotransposition connects endonuclease activity with replication. Cell 177, 837–851.e28 (2019).
54. J. L. Goodier, Restricting retrotransposons: A review. Mob. DNA 7, 16 (2016).
