SYSTEMS BIOLOGY

Genome-wide colocalization of RNA–DNA interactions and fusion RNA pairs

Zhangming Yan, Norman Huang, Weixin Wu, Weizhong Chen, Yiqun Jiang, Jingyao Chen, Xuerui Huang, Xingzhao Wen, Jie Xu, Qiushi Jin, Kang Zhang, Zhen Chen, Shu Chien, and Sheng Zhong

a Department of Bioengineering, University of California San Diego, La Jolla, CA 92093; b Institute of Engineering in Medicine, University of California, San Diego, La Jolla, CA 92093; c Division of Biological Sciences, University of California, San Diego, La Jolla, CA 92093; d Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA 17033; e Department of Ophthalmology, University of California, San Diego, La Jolla, CA 92093; f Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA 91010; and g Department of Medicine, University of California San Diego, La Jolla, CA 92093

Contributed by Shu Chien, December 19, 2018 (sent for review November 20, 2018; reviewed by Ning Jiang and Guo-Cheng Yuan)

Fusion transcripts are used as biomarkers in companion diagnoses. Although more than 15,000 fusion RNAs have been identified from diverse cancer types, few common features have been reported. Here, we compared 16,410 fusion transcripts detected in cancer (from a published cohort of 9,966 tumor samples of 33 cancer types) with genome-wide RNA–DNA interactions mapped in two normal, noncancerous cell types [using iMARGI, an enhanced version of the mapping of RNA–genome interactions (MARGI) assay]. Among the top 10 most significant RNA–DNA interactions in normal cells, 5 colocalized with the gene pairs that formed fusion RNAs in cancer. Furthermore, throughout the genome, the frequency of a gene pair to exhibit RNA–DNA interactions is positively correlated with the probability of this gene pair to present documented fusion transcripts in cancer. To test whether RNA–DNA interactions in normal cells are predictive of fusion RNAs, we analyzed these in a validation cohort of 96 lung cancer samples using RNA sequencing (RNA-seq). Thirty-seven of 42 fusion transcripts in the validation cohort were found to exhibit RNA–DNA interactions in normal cells. Finally, by combining RNA-seq, single-molecule RNA FISH, and DNA FISH, we detected a cancer sample with EML4-ALK fusion RNA without forming the EML4-ALK fusion gene. Collectively, these data suggest an RNA-poise model, where spatial proximity of RNA and DNA could poise for the creation of fusion transcripts.

fusion transcripts | RNA–DNA interactions | RNA-poise model

Significance

It remains formidable to predict what unreported pairs of RNA can form new fusion transcripts. By systematic mapping of chromatin-associated RNAs and their respective genomic interaction loci, we obtained genome-wide RNA–DNA interaction maps from two noncancerous cell types. The gene pairs involved in RNA–DNA interactions in these normal cells exhibited strong overlap with those with cancer-derived fusion transcripts. These data suggest an RNA-poise model, where the spatial proximity of one gene's transcripts and the other gene's genomic sequence poises for the creation of fusion transcripts. We validated this model with 96 additional lung cancer samples. One of these additional samples exhibited fusion transcripts without a corresponding fusion gene, suggesting that genome recombination is not a required step of the RNA-poise model.

Reviewers: N.J., University of Texas at Austin; and G.-C.Y., Dana Farber Cancer Institute.
Conflict of interest statement: S.Z. is a cofounder of Genemo, Inc.
This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).
Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo (accession no. GSE122690).
1 Z.Y., N.H., and W.W. contributed equally to this work.
2 To whom correspondence may be addressed. Email: shuchien@ucsd.edu or szhong@ucsd.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1819788116/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1819788116 | PNAS Latest Articles | 1 of 10
Downloaded by guest on October 2, 2021

Fusion transcripts are associated with diverse cancer types and have been proposed as diagnostic biomarkers (1–3). Companion tests and targeted therapies have been developed to identify and treat fusion-gene defined cancer subtypes (2, 4). Efforts of detection of fusion transcripts have primarily relied on analyses of RNA sequencing (RNA-seq) data (1, 3, 5–8). A recent study analyzed 9,966 RNA-seq datasets across 33 cancer types from The Cancer Genome Atlas (TCGA) and identified more than 15,000 fusion transcripts (4).

Despite the large number of gene pairs in identified fusion transcripts, it remains formidable to predict what unreported pair of genes may form a new fusion transcript. Recent analyses could not identify any distinct feature of fusion RNA forming gene pairs (9). Here, we report a characteristic pattern of 2D distribution of the genomic locations of the gene pairs involving RNA–DNA interactions that provides insights into understanding the creation of fusion transcripts.

Chromatin-associated RNAs (caRNAs) provide an additional layer of epigenomic information in parallel to DNA and histone modifications (10). The recently developed mapping of RNA–genome interactions (MARGI) technology enabled identification of diverse caRNAs and the respective genomic interacting locations of each caRNA (6). In this work, we developed an improved MARGI experimental pipeline called iMARGI. Compared with MARGI, iMARGI reduced the required amount of input cells by 100-fold to ∼5 million cells. We used iMARGI to map RNA–DNA interactions in human embryonic kidney (HEK) and human foreskin fibroblast (HFF) cells. The detected RNA–DNA interactions often appeared on the gene pairs involved in cancer-derived fusion transcripts. The widespread RNA–DNA interactions on the gene pairs involved in fusion transcripts suggest a model wherein the RNA of gene 1 by interacting with the genomic sequence of gene 2 is poised for being spliced into gene 2's nascent transcript and thus creating a fusion transcript. Consistent with this model, we identified an RNA fusion in a new cancer sample that does not involve the creation of a fusion gene.

Results

Characteristics of Genome-Wide RNA–DNA Interaction Maps. To systematically characterize caRNAs and their genomic interaction locations, we developed iMARGI, an enhanced version of the MARGI assay (10).

[Figure 1, panels A–E. Panel A divides the procedure into "In situ" and "In vitro" stages. Panel C values: proximal 37.6% (13.7 M), distal 10.4% (3.8 M), interchromosomal 52% (18.9 M). Panel D values, remote interactions: HEK 16.7% intra and 83.3% inter; HFF 17.2% intra and 82.8% inter. Panel E axes: RNA end (rows) and DNA end (columns), chr1 to chrY.]

Fig. 1. Overview of iMARGI method and data. (A) Schematic view of iMARGI experimental procedure. (B) An example of interchromosomal read pairs (horizontal lines), where the RNA ends (red bars) were mapped to the MALAT1 gene on chromosome 11, and the DNA ends (blue bars) were mapped to chromosome 14 near the KTN1 gene. (C) Proportions of proximal, distal, and interchromosomal read pairs in a collection of valid RNA–DNA interaction read pairs. M: million read pairs. (D) Ratios of inter- and intrachromosomal read pairs in HEK and HFF cells after removal of proximal read pairs. (E) Heatmap of an RNA–DNA interaction matrix in HEK cells. The numbers of iMARGI read pairs are plotted with respect to the mapped positions of the RNA end (row) and the DNA end (column) from small (blue) to large (red) scale, normalized in each row.
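The proximal, distal, and interchromosomal categories in Fig. 1C follow a simple rule stated in the Results: intrachromosomal read pairs whose RNA and DNA ends map within 200 kb of each other are proximal, those beyond 200 kb are distal, and the union of distal and interchromosomal pairs is called remote. As a minimal sketch of that rule (not the authors' pipeline), the Python below classifies read pairs and tabulates them. The tuple layout, the function names, the use of a single coordinate per end, the treatment of a distance of exactly 200 kb as proximal, and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter

CUTOFF_BP = 200_000  # 200-kb proximal/distal cutoff from the text


def classify_pair(rna_chrom, rna_pos, dna_chrom, dna_pos, cutoff=CUTOFF_BP):
    """Label one read pair as 'proximal', 'distal', or 'inter'.

    Assumption: each end is summarized by a single coordinate, and a
    distance of exactly `cutoff` counts as proximal.
    """
    if rna_chrom != dna_chrom:
        return "inter"
    if abs(rna_pos - dna_pos) <= cutoff:
        return "proximal"
    return "distal"


def summarize(pairs):
    """Return per-category counts and the inter fraction of remote pairs."""
    counts = Counter(classify_pair(*pair) for pair in pairs)
    remote_total = counts["distal"] + counts["inter"]
    remote_inter_fraction = counts["inter"] / remote_total if remote_total else 0.0
    return counts, remote_inter_fraction


# Hypothetical toy read pairs: (RNA chrom, RNA pos, DNA chrom, DNA pos).
toy_pairs = [
    ("chr11", 65_500_000, "chr14", 55_600_000),  # interchromosomal
    ("chr1", 1_000_000, "chr1", 1_150_000),      # 150 kb apart
    ("chr1", 1_000_000, "chr1", 5_000_000),      # 4 Mb apart
    ("chr2", 10, "chr2", 200_010),               # exactly 200 kb apart
]

counts, remote_inter_fraction = summarize(toy_pairs)
print(dict(counts), remote_inter_fraction)
```

Applied to real data, the same tally would give the proportions shown in Fig. 1C, and the inter fraction of remote pairs would correspond to the ratios in Fig. 1D.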

The main difference between MARGI and iMARGI is that iMARGI carries out the ligation steps in situ (Fig. 1A), whereas MARGI performs these ligation steps on streptavidin beads. We applied iMARGI to HEK and HFF cells to yield 361.2 million and 355.2 million 2 × 100-bp paired-end sequencing read pairs, respectively. These resulted in 36.3 million (HEK) and 17.8 million (HFF) valid RNA–DNA interaction read pairs, in which ∼35%, 10%, and 55% were proximal, distal, and interchromosomal interactions, respectively (Fig. 1C and SI Appendix, Fig. S1A). The proximal and distal interactions were defined as the intrachromosomal interactions where the RNA end and DNA end were mapped to within and beyond 200 kb, respectively. Following Sridhar et al. (10), we removed the proximal read pairs from further analysis because proximal interactions likely represent interactions between nascent transcripts and their neighboring genomic sequences. Hereafter, we refer to the union of distal and interchromosomal interactions as remote interactions. The rest of this paper deals only with remote interactions.

Among the remote RNA–DNA interactions, both cell types exhibited an ∼1:5 ratio of intra- and interchromosomal interactions (Fig. 1D). The 2D map of RNA–DNA interactions exhibited more interactions near the diagonal line (Fig. 1E and SI Appendix, Fig. S1D). Within each chromosome, the number of iMARGI read pairs negatively correlated with their genomic distances (Fig. 2A, red and blue circles).

Fig. 2. Summary of iMARGI data. (A) The number of genomic bin pairs with 10 or more iMARGI read pairs (y axis) is plotted against the genomic distance between the pair of genomic bins (x axis) in HEK cells (red circles) and HFF cells (blue circles). For comparison, the number of fusion transcript-contributing RNA pairs (Futra) (y axis) derived from 9,966 cancer samples is plotted against the genomic distance (x axis) between the genes involved in the fusion (purple circles). (B) Histogram of the number of iMARGI read pairs of every gene pair (x axis) across all of the intrachromosomal gene pairs with genomic distance >200 kb in HEK cells. Arrow: the gene pairs with the largest and the second largest number of iMARGI read pairs (FHIT-PTPRG and GPC5-GPC6).

Comparison of iMARGI with MARGI. We compared iMARGI with the previously described MARGI technology (10). iMARGI requires only ∼5 million cells to start the experiment, whereas MARGI requires ∼500 million cells. MARGI has two technical variations called pxMARGI and diMARGI, which differ by the degree of chromatin fragmentation (10). We compared iMARGI, pxMARGI, and diMARGI datasets generated from HEK293T cells.

First, we compared the distribution of the read pairs. These datasets had roughly comparable amounts of raw read pairs (SI Appendix, Table S1). iMARGI and pxMARGI produced similar amounts of valid interchromosomal (∼19 million) and distal (1–4 million) read pairs (SI Appendix, Table S1). diMARGI generated many fewer valid interchromosomal (∼0.5 million) and distal (∼26,000) read pairs.

Second, we compared by the numbers of discovered caRNAs. Under the smallest possible threshold (1 valid read pair), iMARGI revealed a similar amount of caRNAs to pxMARGI, with mRNA, lincRNA, pseudogene RNA, and antisense RNA as the most abundant types of caRNAs (SI Appendix, Fig. S2). diMARGI revealed severalfold fewer caRNAs in every RNA type (SI Appendix, Fig. S2), consistent with its many fewer usable read pairs.

Third, we compared by the "RNA attachment level" (10) on every genomic segment. We segmented the genome into 100-kb bins and counted the number of read pairs with the DNA end mapped to each bin. iMARGI and pxMARGI exhibited a strong correlation (Pearson correlation = 0.88, P value < 2.2 × 10^−16; SI Appendix, Fig. S3A). The correlation increased with the bin size (SI Appendix, Fig. S3 B and C). diMARGI data exhibited much weaker correlation to iMARGI data (SI Appendix, Fig. S3 D–F), likely also due to its very small amount of usable read pairs.

The Most Significant RNA–DNA Interactions Colocalized with Gene Pairs Forming Fusion Transcripts in Cancer. We set off to identify the most significant distal RNA–DNA interactions from the iMARGI data. Excluding extremely abundant noncoding RNAs, such as XIST, the top gene pair with the largest amount of interchromosomal and distal iMARGI read pairs in HEK cells was FHIT-PTPRG (Fig. 2B and SI Appendix, Fig. S4A). Investigating this gene pair, we found the reporting of FHIT-PTPRG fusion transcripts from kidney, lung, liver, head and neck, and prostate cancers (7). The second largest amount of interchromosomal and distal iMARGI read pairs was from GPC5-GPC6 (Fig. 2B and SI Appendix, Fig. S4B). Fusion transcripts from this second-ranked gene pair were reported from liver and prostate cancers (7). Notably, 5 of the top 10 gene pairs were reported as fusion transcripts in cancers (1, 7). These findings led us to systematically analyze the relationship between RNA–DNA interactions and fusion transcripts.

Nonuniform Distribution of RNA Pairs Contributing to Fusion Transcripts. We asked whether there is any global characteristic of the genome-wide distribution of the RNA pairs that contribute to fusion transcripts. To this end, we subjected the previously reported 16,410 fusion transcripts that were derived from 9,966 samples across 33 cancer types from the TCGA project to further analysis (4). On average, there were two fusion transcripts per sample (Fig. 3A). The 16,410 fusion transcripts corresponded to 15,144 unique RNA pairs. Hereafter we refer to these gene pairs as fusion transcript-contributing RNA pairs ("Futra pairs"). More than 95% (14,482 of 15,144) of Futra pairs occurred in only 1 of the 9,966 samples analyzed (Fig. 3B). These data confirmed the scarcity of recurrent gene pairs in fusion transcripts (11, 12).

We visualized the frequency of Futra pairs in a 2D heatmap, which we call a "fusion map" (Fig. 3C and SI Appendix, Fig. S5A). The 2D distribution was not uniform, with more intrachromosomal than interchromosomal gene pairs (odds ratio = 27.91, P value < 2.2 × 10^−16, χ² test). A total of 8,891 and 6,253 Futra pairs were intra- and interchromosomal, respectively. Chromosomes 1, 12, and 17 harbored the largest numbers of intrachromosomal gene pairs (SI Appendix, Fig. S5B). Chromosomes 1, 11, 12, 17, and 19 contributed the largest numbers of interchromosomal gene pairs (SI Appendix, Fig. S5C). Higher density of gene pairs appeared on the diagonal line of the fusion map, suggesting enrichment of gene pairs within chromosomes or large chromosomal domains (Figs. 3C and 4A). We quantified the relative distances between the Futra pairs. The number of intrachromosomal Futra pairs negatively correlated with their chromosomal distance (Fig. 2A, purple circles). Taken together, Futra pairs exhibited nonuniform distribution in the genome, characterized by enrichment of intrachromosomal pairs and preference for smaller genomic distances.

Differences Between the Genomic Locations of Futra Pairs and Genome Interactions. We asked to what extent Futra pairs may correlate with genome interactions. Forty-one percent (6,253 of 15,144) of Futra pairs were interchromosomal, whereas less than 15% of chromosomal conformation capture-derived read pairs were interchromosomal (13, 14). The intrachromosomal Futra pairs exhibited enrichment in large chromosomal domains (Figs. 3C and 4A and SI Appendix, Fig. S6A). These enriched chromosomal domains ranged from approximately 1/10th to 1/3rd of a chromosome in length, which is ∼10–20 times the typical size of topologically associated domains (TADs) (15). Taken together, Futra pairs exhibited different global distribution characteristics from those of genome interactions.

Genome-Wide Colocalization of Futra Pairs and RNA–DNA Interactions. We asked to what extent Futra pairs may coincide with genome-wide RNA–DNA interactions. We carried out a visual comparison of the 2D distribution of Futra pairs with that of RNA–DNA interactions (Figs. 1E and 3C and SI Appendix, Fig. S1D) and observed pronounced similarities (Fig. 4 A–C and SI Appendix, Fig. S6 A–C). For example, a set of 34 Futra pairs was enriched in a ∼7-Mb region on chromosome 9 (chr9) (32–39 Mb, Fig. 4D).

[Figure 3, panels A–C. Panel A axes: number of Futra pairs in a sample (x) and number of samples (y). Panel B values: one sample, 95.6% (14,482); more than one sample, 4.4% (662). Panel C color scale: number of Futra pairs, 0 to 256; rows and columns chr1 to chrY.]

Fig. 3. Overview of the 15,144 Futra pairs derived from 33 cancer types. (A) Distribution of the number of detected Futra pairs in each sample across the 9,966 cancer samples. (B) Pie chart of the numbers of Futra pairs that appeared in only one cancer sample (nonrecurring Futra pairs, red) and in multiple cancer samples (recurring Futra pairs, green). (C) The genomic distribution of Futra pairs. Genomic coordinates of all chromosomes are ordered on rows and columns. The numbers of Futra pairs of corresponding genomic positions are shown in a blue (small) to red (large) color scheme. Bin size: 10 Mb.
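Two of the summaries in Fig. 3 are simple tallies: the recurrence split in panel B (14,482 of 15,144 Futra pairs seen in only one sample) and the 10-Mb binned fusion map in panel C. The sketch below shows one way such tallies could be computed; it is not the authors' code. The input layouts, the function names, treating each Futra pair as an ordered pair of single gene coordinates, and all toy records are assumptions made for illustration.

```python
from collections import Counter

BIN_SIZE_BP = 10_000_000  # 10-Mb bins, as in the Fig. 3C caption


def recurrence_split(sample_to_pairs):
    """Count Futra pairs seen in exactly one sample vs. more than one."""
    n_samples = Counter()
    for pairs in sample_to_pairs.values():
        for pair in set(pairs):  # count each pair once per sample
            n_samples[pair] += 1
    once = sum(1 for n in n_samples.values() if n == 1)
    return once, len(n_samples) - once


def fusion_map(pair_coords, bin_size=BIN_SIZE_BP):
    """Tally Futra pairs per (row bin, column bin) cell of a 2D map.

    Assumption: each pair is ((chrom_a, pos_a), (chrom_b, pos_b)), with the
    first gene on rows and the second gene on columns.
    """
    cells = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in pair_coords:
        cells[(chrom_a, pos_a // bin_size), (chrom_b, pos_b // bin_size)] += 1
    return cells


# Hypothetical toy input: sample id -> Futra pairs called in that sample.
toy_samples = {
    "s1": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
    "s2": [("GENE_A", "GENE_B")],
    "s3": [("GENE_E", "GENE_F")],
}
once, recurrent = recurrence_split(toy_samples)

# Hypothetical gene coordinates for three pairs.
toy_coords = [
    (("chr7", 152_000_000), ("chr7", 70_000_000)),
    (("chr7", 152_500_000), ("chr7", 72_000_000)),
    (("chr2", 42_200_000), ("chr2", 29_500_000)),
]
cells = fusion_map(toy_coords)

# Arithmetic check of the nonrecurring share reported in Fig. 3B.
fraction_once = 14_482 / 15_144
print(once, recurrent, dict(cells), round(100 * fraction_once, 1))
```

A sparse counter keyed by bin pairs is used here instead of a dense matrix because most 10-Mb cells are expected to be empty when the large majority of pairs occur only once.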

This Futra-pair–enriched region colocalized with a chromosomal region enriched in RNA–DNA interactions (Fig. 4 B–D). Such colocalizations were observed in multiscale analyses using different resolutions, including 10-Mb (SI Appendix, Fig. S6 A–C), 1-Mb (Fig. 4 A–C), and 100-kb resolutions (Fig. 4D), as well as at the resolution of individual fusion pairs and read pairs (SI Appendix, Fig. S7). Four fusion transcripts, KMT2C-AUTS2, KMT2C-CALN1, KMT2C-CLIP2, and KMT2C-GTF2IRD, were formed between the KMT2C mRNA near the 152-Mb location of chromosome 7 (chr7: 152,000,000) and four mRNAs that were approximately 80 Mb away (chr7: 66,000,000–78,000,000) (Futra pairs; SI Appendix, Fig. S7). Correspondingly, a total of 73 iMARGI read pairs were mapped to KMT2C and the four fusion partners in HEK cells (iMARGI; SI Appendix, Fig. S7).

Fig. 4. Comparison of Futra pairs and iMARGI read pairs. (A) Distribution of intrachromosomal Futra pairs on chromosome 9. Genomic coordinates are plotted from top to bottom (rows) and from left to right (columns). (B and C) RNA–DNA interaction matrices in HEK cells (B) and HFF cells (C). The numbers of iMARGI read pairs are plotted with respect to the mapped positions of the RNA end (rows) and the DNA end (columns) from small (blue) to large (red) scale. (D) Detailed view of a 30-Mb region (yellow boxes in A–C). This genomic region and the contained genes are plotted twice in the top and the bottom lanes. In the track of Futra pairs, each blue line linking a gene from the top lane to a gene at the bottom indicates a Futra pair. In the track of iMARGI read pairs in HEK cells, each green line indicates an iMARGI read pair with the RNA end mapped to the genomic position in the top lane and the DNA end mapped to the bottom lane.

We quantified the overlaps between Futra pairs and iMARGI-identified RNA–DNA interactions. Among the 6,253 interchromosomal Futra pairs, 3,014 (48.2%) overlapped with RNA–DNA interactions in either HEK or HFF cells (odds ratio = 14.1, P value < 2.2 × 10^−16, χ² test). Among the 8,891 intrachromosomal Futra pairs, 7,427 (83.5%) overlapped with RNA–DNA interactions in either HEK or HFF cells (odds ratio = 35.44, P value < 2.2 × 10^−16, χ² test). These data pointed to a common feature of cancer-derived Futra pairs, which is their colocalization with RNA–DNA interactions in normal cells.

Cancer-Derived Futra Pairs That Colocalize with RNA–DNA Interaction in Normal Cells Do Not Form Fusion Transcripts in Normal Cells. A model that may explain the colocalization of RNA–DNA interactions and Futra pairs is that RNA–DNA interactions in the normal cells poise for creation of fusion transcripts. Recognizing that this model cannot be tested by perturbation due to the very small likelihood for a fusion transcript to occur in a cancer sample, we carried out two other tests. First, we tested whether the cancer-derived Futra pairs were detectable in normal cells. We reanalyzed the merged RNA-seq datasets from HEK293T cells (16) of more than 75 million 2 × 100-bp paired-end read pairs and ran STAR-Fusion (17) on these datasets, which reported a total of 8 Futra pairs. None of the previously derived 15,144 Futra pairs from TCGA RNA-seq data were detected in HEK293T cells. In addition, we specifically tested for EML4-ALK fusion transcripts, which were reported in nonsmall cell lung carcinoma (NSCLC) (18), and there were RNA–DNA interactions between EML4 RNA and the ALK genomic locus in HEK and HFF cells (see Fig. 6A). Neither PCR nor quantitative PCR analysis detected EML4-ALK fusion transcripts in HEK293T cells (SI Appendix, Fig. S8), whereas both assays detected fusion transcripts in a NSCLC cell line (H2228) (SI Appendix, Fig. S8). Taken together, these data suggest that, although cancer-derived Futra pairs colocalized with RNA–DNA interactions in normal cells, the fusion transcripts found in cancer are not present in the normal cells.

RNA–DNA Interactions in Normal Cells Are Predictive of Fusion Transcripts in New Cancer Samples. Next, we tested whether the RNA–DNA interactions in normal cells are predictive of fusion transcript formation in cancer. To this end, we analyzed a validation cohort comprising 96 new lung cancer samples from patients who were not part of the TCGA cohorts. We also analyzed a NSCLC cell line (H2228). RNA was extracted and targeted RNA-seq was carried out with Illumina's TruSight RNA Pan-Cancer Panel. Of these 96 samples, 27 did not yield sufficient RNA for sequencing, whereas the other 69 samples produced a sequencing library and yielded on average 3.9 million uniquely aligned read pairs per sample (Fig. 5A). STAR-Fusion (17) was applied to this dataset and it reported a total of 42 fusion transcripts from these 69 samples (Fig. 5 B and C). These 42 fusion transcripts included EML4-ALK and FRS2-NUP107 fusions, which were also reported from the 9,966 TCGA cancer samples, as well as 40 new fusion transcripts that were not previously documented. The small amount of recurring Futra pairs between these additional cancer samples and TCGA samples is expected from the small fraction of recurring Futra pairs across the TCGA samples (Fig. 3B).

Among these 42 Futra pairs detected from the validation cohort, 37 (88.1%) colocalized with RNA–DNA interactions in the assayed normal cells, supporting the idea that RNA–DNA interactions in the already assayed normal cells are predictive of Futra pairs in cancer (odds ratio = 106.51, P value < 2.2 × 10^−16, χ² test). We asked whether only intrachromosomal Futra pairs colocalized with RNA–DNA interactions. Nineteen of the 42 (45%) detected Futra pairs were interchromosomal (Fig. 5C), comparable to the proportion (41%) of interchromosomal Futra pairs detected from TCGA samples.

[Figure 5, panels A–E. Panel A: number of read pairs (×10^6) per sample, total and uniquely mapped, with H2228 as the last column. Panel B: number of fusion transcripts per sample (0 to 5). Panel C: number of Futra pairs by type, 23 intrachromosomal and 19 interchromosomal. Panels D and E: intersections of total Futra pairs with HEK iMARGI and HFF iMARGI for intra- and interchromosomal pairs.]

Fig. 5. Fusion transcripts detected from the new lung cancer samples. (A) The number of RNA-seq read pairs (light blue bar) and the uniquely mapped read pairs (dark blue bar) of each sample (column). (B) Number of detected fusion transcripts (y axis) in each sample (columns). (C) Numbers of intra- and interchromosomal Futra pairs detected from the 69 cancer samples. (D and E) Intersections of these intrachromosomal (D) and interchromosomal (E) Futra pairs to RNA–DNA interactions in HEK (green) and HFF (pink) cells. ∗: The 15 intrachromosomal Futra pairs that overlap with RNA–DNA interactions in both HEK and HFF (yellow-green) cells are ALK:EML4, RP11-557H15.4:SGK1, FRS2:NUP107, ACTN4:ERCC2, LRCH1:RP11-29G8.3, CUX1:TRRAP, CEP70:GSK3B, NIPBL:WDR70, KCTD1:SS18, CHST11:NTN4, RP1-148H17.1:TOP1, LMO7:LRCH1, NAV3:RP1-34H18.1, LPP:PPM1L, and MTOR-AS1:RERE. #: The 10 interchromosomal Futra pairs that overlap with RNA–DNA interactions in both HEK and HFF (yellow-green) cells are LIN52:PI4KA, LPP:OSBPL6, FCGBP:MT-RNR2, KMT2B:MALAT1, ETV6:TTC3, KTN1:MALAT1, FLNA:MALAT1, COL1A2:MALAT1, COL1A1:MALAT1, and MALAT1:TNFRSF10B.
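The overlap statistics quoted for these Futra pairs (for example, 37 of 42 pairs colocalizing with RNA–DNA interactions, reported with an odds ratio and a χ² test) rest on a 2 × 2 contingency table. The following is a minimal sketch of that arithmetic, assuming a Pearson χ² test with one degree of freedom and no continuity correction; the source states neither which variant was used nor the background counts. The function names and the background counts in the usage lines are hypothetical, so the printed odds ratio is not expected to match the reported value.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value (1 df, no correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# From the text: 37 of 42 validation-cohort Futra pairs overlapped.
futra_overlap, futra_total = 37, 42
futra_no_overlap = futra_total - futra_overlap

# HYPOTHETICAL background (not from the source): non-Futra gene pairs
# with and without RNA-DNA interactions.
background_overlap, background_no_overlap = 1_000, 20_000

print(round(futra_overlap / futra_total, 3))
print(odds_ratio(futra_overlap, futra_no_overlap,
                 background_overlap, background_no_overlap))
print(chi_square_2x2(futra_overlap, futra_no_overlap,
                     background_overlap, background_no_overlap))
```

The first row of the table holds Futra pairs with and without overlapping interactions, and the second row holds the comparison set; how that comparison set is defined determines the odds ratio, which is why it is left as an explicit placeholder here.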

6 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1819788116 Yan et al. Downloaded by guest on October 2, 2021 Downloaded by guest on October 2, 2021 i.6. Fig. B 5 (Fig. interchromoso- interactions RNA–DNA of D with 19) overlapped of pairs (18 Futra 95% mal of and (19 intrachromosomal percent of Eighty-three 23) samples. TCGA Futra from interchromosomal detected of pairs (41%) proportion 5C), the (Fig. to interchromosomal were comparable pairs Futra detected (45%) 42 a tal. et Yan measurement. no box: White (column). sample rearrangement. RNA-seq, 2 cancer without on bar: gene each based (Scale ALK transcripts in integral RNA. fusion test) indicate fusion EML4-ALK rearrangement of indicating DNA detection (C signals a rearrangement. of FISH without FISH, outcomes gene Apart (black) ALK negative of Break deletion and (ALK partial (red) FISH Positive DNA (B) and pairs. FuseFISH, read iMARGI (A, of line. information horizontal pairing a by linked is reads EML4-ALK because choosing test by creation this pairs for the transcripts fusion-susceptible for fusion from EML4-ALK step transcripts prerequisite Tumor. fusion an a in of with is Gene rearrangement Correlates Fusion genome ALK Without and Fusion EML4 RNA Between Interaction RNA–DNA RNA–DNA with pairs. pairs fusion-susceptible cells gene normal the in of call as interactions model We where creation this model. to model for RNA-poise refer we the poise the Hereafter, cancers. cells support in transcripts normal samples fusion in additional cancer of interactions new predictability RNA–DNA in the pairs and fusion cells, Futra cancer-derived normal pairs of in lack Futra transcripts the of interactions, colocalization intrachromosomal RNA–DNA the to and together, restricted not Taken was interactions. pairs Futra and actions C A DNA FISH FuseFISH RNA-seq and iMARGI RNAseq Gene read pairs H2228 ,sgetn htteclclzto fRADAinter- RNA–DNA of colocalization the that suggesting E), HEK N uinadDAbek (A, break. 
DNA and fusion RNA # 44 HFF chr2 H2228 ALK HEK 29.2 mb

M4MreBreakApart FISH Merge EML4 1 2 3 4 5 6 29.3 mb 7 8 9 10

11 29.4 mb 12 13 14 15

[Fig. 6. Recoverable caption text: (A) Genome tracks for ALK (Left) and EML4 (Right) on chr2, with Middle and Lower tracks. iMARGI read pairs aligned to the two genes. Red bars: RNA end. Blue bars: DNA end. RNA-seq read pairs (purple bars) from H2228 aligned to ALK (Left) and EML4 (Right). (B) Per-sample RNA-seq and FuseFISH results, marked as positive outcome, negative outcome, or no measurement; an asterisk marks a sample discussed in the text. (C) FuseFISH images of the no. 37 cancer sample from the EML4 and ALK channels. Arrows: colocalized … (D) A representative image of ALK Break Apart FISH, with colocalized red and yellow signals …]

We tested whether […] is a fusion-susceptible pair (Fig. 6A), EML4-ALK fusion transcripts are detected in one of our new tumor samples (sample no. 44) (Fig. 6B), and there is an FDA-approved diagnosis kit (Vysis ALK Break Apart FISH) based on DNA detection of the EML4-ALK fusion gene. FISH by Break Apart assays was performed at the Knight Diagnostic Laboratories, Oregon Health & Science University, according to standardized protocols. We subjected the remaining tissue from sample no. 44 to DNA FISH analysis. None of our eight attempts yielded any DNA FISH signal in the remaining tissue from either control or ALK probes. We therefore could not ascertain whether there was genome rearrangement in the only sample with detectable EML4-ALK fusion transcripts.

To identify other cancer samples that express EML4-ALK fusion transcripts, we reanalyzed our collection of 96 lung cancer samples with FuseFISH, a single-molecule fluorescence in situ hybridization (sm-FISH)–based method for the detection of fusion transcripts (19, 20). We carried out quantum dot-labeled sm-FISH (21) by labeling EML4 and ALK transcripts with quantum dots at 705 nm and 605 nm, respectively (SI Appendix, Fig. S9). The FISH probes were designed to hybridize to the

consensus exons shared among all 28 variants of EML4-ALK fusion transcripts that have been identified to date (22). Following prior literature (19, 20), fusion transcripts were detected by the colocalized sm-FISH signals targeting EML4 and ALK transcripts. In a positive control test, an average of 12 colocalized sm-FISH signals per cell were detected in a total of 22 H2228 cells (SI Appendix, Fig. S9) that were known to express EML4-ALK fusion transcripts (SI Appendix, Fig. S8) (23). In contrast, HEK293T cells exhibited on average zero colocalized signals per cell from 19 cells (SI Appendix, Fig. S9), consistent with the lack of such a fusion transcript in HEK293T cells (SI Appendix, Fig. S8).

In our collection of 96 tumor samples, only 57 had remaining tissues for FuseFISH analysis. These 57 samples included 39 that yielded RNA-seq data and 18 that did not yield RNA-seq data (Fig. 6B). The FuseFISH analysis detected EML4-ALK fusion transcripts in 2 samples, including sample no. 44, which was also analyzed by RNA-seq, and sample no. 37, which did not yield RNA-seq data (Fig. 6 B and C). To test whether sample no. 37 had ALK-related fusion genes, we subjected it together with 6 other randomly selected samples (nos. 18, 51, 56, 57, 63, and 65) to DNA recombination analysis using Vysis ALK Break Apart FISH. None of these 7 samples exhibited ALK-related fusion genes. More specifically, 1 sample (no. 57) failed to generate DNA FISH signals from four attempts and 1 sample (no. 56, negative for EML4-ALK fusion transcripts by RNA-seq and FuseFISH analyses) exhibited a partial deletion of the ALK gene, but no sign of ALK-related fusion genes (Fig. 6B, ∗). The other 5 samples, including sample no. 37, exhibited integral ALK genes (Fig. 6 B and D). Taken together, the lung cancer sample no. 37 expressed EML4-ALK fusion transcripts without having an EML4-ALK fusion gene. These data suggest that genome rearrangement is not a prerequisite step for the creation of fusion transcripts from fusion-susceptible pairs. In other words, the RNA-poise model does not require alterations of the DNA.

Discussion

Abundance of Genome Rearrangement-Independent Fusion Transcripts. Our RNA FISH and DNA FISH analyses revealed a cancer sample that contained a fusion transcript without the corresponding fusion gene. Such an example, although not often seen in the literature, may not be a rare case (11). The lack of reports is likely attributable to the research attention paid to the other side of the coin, i.e., the fusion transcripts created by fusion genes (2). Indeed, ∼36–65% of fusion transcripts derived from cancer RNA-seq data were attributable to genome rearrangement (Low Pass bars, figure S1A of ref. 1). However, this is likely an overestimate because when low-quality whole-genome sequencing (WGS) data were removed, only ∼30–45% of fusion transcripts had corresponding WGS reads (High Pass bars, figure S1A of ref. 1). These published results are consistent with the notion that fusion genes do not account for all observed fusion transcripts and suggest the occurrence of fusion transcripts independent of genome rearrangement. However, we recognize that to date, the sheer amount of validated fusion RNAs independent of genome rearrangement remains limited, which warrants future investigation.

The RNA-Poise Model Allows for Splicing Errors. Fusion transcripts can be created by two processes. The better-recognized process is through transcription of a fusion gene that was created by genome rearrangement. The less-recognized process is by RNA splicing errors, where two separate transcripts were spliced together (transsplicing) (24). Transsplicing does not involve genome rearrangement. A theoretical gap in the splicing error model is that transsplicing can happen only to two RNA molecules that are close to each other in 3D space; however, except for neighboring genes (11), the chances for two RNA molecules transcribed from distant chromosomal locations to meet in space are small. Therefore, it remains difficult to perceive a biophysical process in which fusion transcripts are created by splicing errors.

The RNA-poise model fills this theoretical gap. The preinstallation of gene 1's transcripts on gene 2's genomic sequence positions gene 2's nascent transcripts spatially close to gene 1's transcripts, allowing for the possibility of transsplicing. Furthermore, the majority of splicing events are cotranscriptional. The availability of transcripts of gene 1 during gene 2's transcription allows for the opportunity to perform cotranscriptional transsplicing.

Breaking Down the RNA-Poise Model by RNA–DNA Interactions. Remote RNA–DNA interactions could be created by at least two means. First, the caRNA can target specific genomic sequences, which could be mediated by tethering molecules (RNA targeting, Fig. 7). Second, the spatial proximity of the genomic sequences in 3D space could bring the nascent transcripts of one gene to the genomic sequence of another gene (RNA confinement, Fig. 7). Both means of RNA–DNA interactions provide spatial proximity between two RNA molecules and thus allow for splicing errors. In addition, the spatial proximity of two genes in the RNA confinement model could enhance the chances of genome rearrangement of the spatially close genomic sequences and thus create fusion genes (25). Thus, the RNA-poise model can be regarded as a union of two submodels, depending on the process of RNA–DNA interaction. One submodel (targeting-poise model, Fig. 7) could create fusion transcripts only by transsplicing. The other submodel (confinement-poise model, Fig. 7) could create fusion transcripts by either transsplicing or creation of fusion genes.

Fig. 7. RNA-poise model. In this model, the transcripts of one gene (RNA 1, red bar) can exhibit spatial proximity to another gene (RNA 2, purple bar) due to tethering (RNA targeting) or spatial proximity of the two genes (RNA confinement). Both cases could enhance splicing errors (gray arrows), whereas the proximity of genomic sequences may also facilitate gene fusion (gray arrow on the right), which subsequently produces fusion RNA. [Panel labels: Targeting-poise (RNA targeting); Confinement-poise (RNA confinement); RNA–DNA interaction; Fusion RNA biogenesis (trans-splicing, DNA re-arrangement); Gene 1, Gene 2, RNA 1, RNA 2, Pol II, Spliceosome, Fusion RNA.]

Materials and Methods

Reference Genome and Gene Annotations. The human genome assembly hg38/GRCh38 and Ensembl gene annotation release 84 (GRCh38.84) were used throughout all data analyses.

Public RNA-Seq Data. HEK293T RNA-seq datasets were downloaded from the NCBI BioProject database (accession nos. SRR2992206–SRR2992208 under project no. PRJNA305831) (16). The three datasets were merged.

TCGA Derived Fusion Transcripts. The TCGA RNA-seq–derived fusion transcripts were downloaded from the Tumor Fusion Gene Data Portal (www.tumorfusions.org) (26). Tier 1 and tier 2 fusion transcripts were used in our analyses. Genomic coordinates were converted to hg38 by liftOver. Following Davidson et al. (8), Futra pairs within 200 kb on hg38 were removed. The data of Futra pairs were mainly processed using R (27) with Bioconductor packages GenomicRanges (28) and InteractionSet (29).

Visualization of Futra Pairs. Heatmaps of the count matrix were plotted using Bioconductor package ComplexHeatmap (30). Genomic plots of Futra pairs were created with GIVE (31).

Constructing iMARGI Sequencing Libraries.

Nuclei preparation and chromatin digestion. Approximately 5 × 10⁶ cells were used for the construction of an iMARGI sequencing library. Cells were cross-linked in 1% formaldehyde at room temperature (RT) for 10 min with rotation. The cross-linking reaction was quenched with glycine at 0.2 M concentration and incubated at RT for 10 min. Cells were pelleted, washed using 1× PBS, and aliquoted into ∼5 × 10⁶ in each tube. To prepare nuclei, cross-linked cells were incubated in 1 mL of cell lysis buffer (10 mM Tris·HCl, pH 7.5, 10 mM NaCl, 0.2% Nonidet P-40, 1× protease inhibitor) on ice for 15 min and homogenized with dounce homogenizer pestle A for 20 strokes on ice. Nuclei were pelleted and weighed to estimate the pellet volume (10 mg of nuclei pellet was estimated to be about 10 µL). The nuclei pellet was incubated with SDS buffer (0.5× Cutsmart buffer, 0.5% SDS) at 1:3 vol ratio and 62 °C for 10 min with mixing, followed by immediate quenching with Triton X-100 at a final concentration of 1%. To digest chromatin, the washed nuclei pellet was resuspended in an AluI chromatin digestion mix [2.3 units/µL AluI (NEB), 0.3 unit/µL RNasinPlus (Promega) and 1× Cutsmart buffer] and incubated at 37 °C overnight with mixing. After chromatin digestion, 1 µL of RNase I (1:10 diluted in 1× PBS) was directly added to the reaction mixture and incubated at 37 °C for 3 min to fragment RNA. Nuclei were pelleted and washed twice using PNK wash buffer (20 mM Tris·HCl, pH 7.5, 10 mM MgCl2).

Ligations. To prepare the RNA and DNA ends for linker ligation, nuclei were incubated in 200 µL RNA 3′-end dephosphorylation reaction mix [0.5 unit/µL T4 PNK (NEB), 0.4 unit/µL RNasinPlus, 1× PNK phosphatase buffer, pH 6.5] at 37 °C for 30 min with mixing. Nuclei were washed twice with PNK buffer, resuspended in 200 µL DNA A-tailing mix [0.3 unit/µL Klenow Fragment (3′ → 5′ exo-) (NEB), 0.1 mM dATP, 0.1% Triton X-100, 1× NEB buffer 2] and incubated at 37 °C for 30 min with mixing. The same linker sequence as described in the previous MARGI paper was used (10). For in situ RNA-linker ligation, nuclei were resuspended in 200 µL ligation mix [38 µL adenylated and annealed linker, 10 units/µL T4 RNA ligase 2-truncated KQ (NEB), 1× T4 RNA ligase reaction buffer, 20% PEG 8000, 0.1% Triton X-100, 0.4 unit/µL RNasinPlus] and incubated at 22 °C for 6 h and then 16 °C overnight with mixing. After ligation, the nuclei were washed five times with PNK buffer to remove excess free linker. For in situ RNA–DNA proximity ligation, nuclei were resuspended in 2 mL of proximity ligation mixture [4 units/µL T4 DNA ligase (NEB), 1× DNA ligase reaction buffer, 0.1% Triton X-100, 1 mg/mL BSA (NEB), 0.5 unit/µL RNasinPlus] and incubated at 16 °C overnight.

Library construction. To reverse cross-linking, nuclei were washed twice with 1× PBS, resuspended in 250 µL of extraction buffer [1 mg/mL Proteinase K (NEB), 50 mM Tris·HCl, pH 7.5, 1% SDS, 1 mM EDTA, 100 mM NaCl] and incubated at 65 °C for 3 h. DNA and RNA were extracted by adding an equal volume of phenol:chloroform:isoamyl alcohol (pH 7.9, Ambion) followed by ethanol precipitation. The subsequent steps including removal of biotin from nonproximity ligated linkers, pulldown of RNA–DNA chimera, reverse transcription of RNA, DNA denaturation, circularization, oligo annealing and BamHI (NEB) digestion, and sequencing library generation were performed as previously described (10). iMARGI libraries were subsequently subjected to paired-end 100-cycle sequencing on an Illumina HiSeq 4000. The circularization and library construction strategy can phase RNA and DNA ends into Read 1 and Read 2 as shown in Fig. 1A, which is the same as the MARGI library configuration (10). Since the AluI restriction enzyme recognizes the “AGCT” sequence and leaves “CT” at the 5′ end of the cut, we expect the first two bases of the DNA end (Read 2) to be enriched with CT.

Analysis of iMARGI Sequencing Data.

Mapping iMARGI read pairs. The detailed iMARGI data-processing methods can be found in our GitHub repository (https://github.com/Zhong-Lab-UCSD/iMARGI methods). Briefly, they include three main steps. First, the read pairs were cleaned by in-house scripts. According to the library construction design, read pairs were filtered out if the 5′-most two bases of their DNA end (Read 2) were not CT. In addition, the first two bases of the RNA end (Read 1) were removed as they are random nucleotides. Then, the cleaned read pairs were mapped to the human genome (hg38), using bwa mem (version 0.7.17) with parameters “-SP5M” (32). Finally, pairtools (version v0.2.0, https://github.com/mirnylab/pairtools) and in-house scripts were used to parse, deduplicate, and filter the mapped read pairs. The valid read pairs that were mapped to genomic locations within 200 kb of each other were defined as proximal interactions, which were excluded from our analysis. GenomicRanges (28) and InteractionSet (29) were used for further analysis of iMARGI data.

Visualization of iMARGI read pairs. Heatmaps of the count matrix were plotted using Bioconductor package ComplexHeatmap (30). Genomic plots of iMARGI read pairs were created with GIVE (31) and Bioconductor package Gviz (33).

Intersection of iMARGI read pairs and Futra pairs. An iMARGI read pair was regarded as overlapping with a Futra pair when the RNA end was strand-specifically mapped to the gene body of one gene in the Futra pair and the DNA end was mapped to the gene body ±100 kb flanking regions of the other gene in the Futra pair.

FuseFISH Analysis.

Probe design. Oligonucleotide probes were designed to hybridize to exons 2–6 of EML4 RNA and exons 20–23 of ALK RNA. These exons were chosen because they were present in all of the observed variations of EML4-ALK fusion genes. These probes were 35–40 nt in length, with similar GC contents and melting temperatures.

Conjugation of quantum dots to oligonucleotide probes. Oligos were modified on the 5′ end with a primary amino group and a spacer of 30 carbons to minimize steric hindrance of probe–RNA hybridization. These probes were conjugated with quantum dots through the amino group using EDC reaction (34). The probes were subsequently purified with 0.2 µm membrane filtration and 100,000 molecular weight cutoff (MWCO). The retentate of the 100,000 MWCO purification was subjected to subsequent 0.2-µm membrane filtration to remove any aggregates. A final purification with dynabeads MyOne SILANE was used to remove any remaining unconjugated probes.

Hybridization of adherent cell lines. Probe hybridization in H2228 cells was carried out as previously described (19, 21). Briefly, probes were added to the hybridization solution and incubated with the cells at 37 °C overnight. Cells were washed and resuspended in 1× PBS for imaging.

Hybridization of tissue samples. Probe hybridization in tissues was carried out as previously described (35). Briefly, a hybridization solution with probes was added to the surface of the parafilm to form droplets. A tissue slice (5–10 µm in thickness) fixed on a glass coverslip was gently placed over the hybridization solution. The mixture was incubated at 37 °C overnight. The tissue was subsequently washed with wash buffer and resuspended in 1× PBS for imaging.

Imaging and analysis. Cells or tissues were imaged in 1× PBS through wide-field fluorescence imaging using an Olympus IX83 inverted microscope at 60× oil immersion objective (N.A. = 1.4). Image processing was carried out as previously described (36). Briefly, single transcripts were detected using an automated thresholding algorithm that searches for robust thresholds, where counts do not change within a range. Fusion transcripts were determined by searching for colocalization of detected transcripts by overlap between predicted centers within a radius.

RNA Sequencing and Analysis. RNA was extracted with Trizol from an H2228 cell line and lung cancer tissue samples of the approximate size of 3 mm × 3 mm × 30 µm per sample. RNA sequencing was carried out using the TruSight RNA Pan-Cancer Panel (Illumina) following the manufacturer's protocol. All the RNA-seq data, including the HEK293T public data, the H2228 cell line, and lung cancer sample sequencing data, were mapped to the human genome (hg38) using STAR (v2.5.4b) with default parameters (37). Fusion transcripts were called using STAR-Fusion (v0.8.0) (17), requiring both numbers of supporting discordant read pair and junction-spanning read larger than zero and the sum of them larger than 2.

ACKNOWLEDGMENTS. Male hTert-immortalized human foreskin fibroblasts (HFF-hTert-clone 6) are a cell line of the 4D Nucleome Tier 1 cells, provided by the Job Dekker laboratory (https://www.4dnucleome.org/cell-lines.html). This work is funded by NIH Grants R00HL122368 (to Z.C.), R01HL108735 (to S.C.), R01HL106579 (to S.C.), R01HL121365 (to S.C.), DP1HD087990 (to S.Z.), and NIH 4D Nucleome U01CA200147 (to S.C. and S.Z.).

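The overlap rule between an iMARGI read pair and a Futra pair, as stated under Intersection of iMARGI read pairs and Futra pairs, can be sketched as follows. The class and function names are hypothetical, and the coordinates in the usage note are toy values rather than real gene positions. Two readings are assumptions: "gene body ±100 kb flanking regions" is taken as the gene body extended by 100 kb on each side, and "strand-specifically" is taken to mean that the RNA end lies on the same strand as the gene.

```python
from dataclasses import dataclass

FLANK_BP = 100_000  # flank added on each side of the gene body for the DNA end


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class ReadEnd:
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def rna_end_in_gene(end: ReadEnd, gene: Gene) -> bool:
    """RNA end must fall in the gene body on the gene's strand (assumed rule)."""
    return (end.chrom == gene.chrom
            and end.strand == gene.strand
            and gene.start <= end.pos <= gene.end)


def dna_end_near_gene(end: ReadEnd, gene: Gene, flank: int = FLANK_BP) -> bool:
    """DNA end may fall in the gene body or its flanks; strand is ignored."""
    return (end.chrom == gene.chrom
            and gene.start - flank <= end.pos <= gene.end + flank)


def overlaps_futra_pair(rna: ReadEnd, dna: ReadEnd, g1: Gene, g2: Gene) -> bool:
    """True if the RNA end hits one gene and the DNA end hits the other."""
    return ((rna_end_in_gene(rna, g1) and dna_end_near_gene(dna, g2))
            or (rna_end_in_gene(rna, g2) and dna_end_near_gene(dna, g1)))
```

With toy genes `Gene("chrA", 1_000, 5_000, "+")` and `Gene("chrB", 500_000, 600_000, "-")`, a read pair whose RNA end sits at chrA:2,000 on the plus strand and whose DNA end sits at chrB:650,000 counts as overlapping, because the DNA end is inside the 100 kb flank.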
1. Gao Q, et al. (2018) Driver fusions and their implications in the development and treatment of human cancers. Cell Rep 23:227–238e3.
2. Mertens F, Johansson B, Fioretos T, Mitelman F (2015) The emerging complexity of gene fusions in cancer. Nat Rev Cancer 15:371–381.
3. Kumar-Sinha C, Kalyana-Sundaram S, Chinnaiyan AM (2015) Landscape of gene fusions in epithelial cancers: Seq and ye shall find. Genome Med 7:129.
4. Dai X, Theobard R, Cheng H, Xing M, Zhang J (2018) Fusion genes: A promising tool combating against cancer. Biochim Biophys Acta Rev Cancer 1869:149–160.
5. Stransky N, Cerami E, Schalm S, Kim JL, Lengauer C (2014) The landscape of kinase fusions in cancer. Nat Commun 5:4846.
6. Yoshihara K, et al. (2015) The landscape and therapeutic relevance of cancer-associated transcript fusions. Oncogene 34:4845–4854.
7. Torres-Garcia W, et al. (2014) Prada: Pipeline for RNA sequencing data analysis. Bioinformatics 30:2224–2226.
8. Davidson NM, Majewski IJ, Oshlack A (2015) Jaffa: High sensitivity transcriptome-focused fusion gene detection. Genome Med 7:43.
9. Lai J, et al. (2015) Fusion transcript loci share many genomic features with non-fusion loci. BMC Genomics 16:1021.
10. Sridhar B, et al. (2017) Systematic mapping of RNA-chromatin interactions in vivo. Curr Biol 27:610–612.
11. Varley KE, et al. (2014) Recurrent read-through fusion transcripts in breast cancer. Breast Cancer Res Treat 146:287–297.
12. Kim J, et al. (2015) Recurrent fusion transcripts detected by whole-transcriptome sequencing of 120 primary breast cancer samples. Genes Chromosomes Cancer 54:681–691.
13. Nagano T, et al. (2015) Comparison of hi-C results using in-solution versus in-nucleus ligation. Genome Biol 16:175.
14. van de Werken HJ, et al. (2012) Robust 4C-seq data analysis to screen for regulatory DNA interactions. Nat Methods 9:969–972.
15. Dixon JR, et al. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485:376–380.
16. Wynne JW, et al. (2017) Comparative transcriptomics highlights the role of the activator 1 transcription factor in the host response to ebolavirus. J Virol 91:e01174-17.
17. Haas B, et al. (2017) Star-fusion: Fast and accurate fusion transcript detection from RNA-seq. bioRxiv:120295. Preprint, posted March 24, 2017.
18. Vendrell JA, et al. (2017) Detection of known and novel ALK fusion transcripts in lung cancer patients using next-generation sequencing approaches. Sci Rep 7:12510.
19. Semrau S, et al. (2014) FuseFISH: Robust detection of transcribed gene fusions in single cells. Cell Rep 6:18–23.
20. Markey FB, Ruezinsky W, Tyagi S, Batish M (2014) Fusion FISH imaging: Single-molecule detection of gene fusion transcripts in situ. PLoS One 9:e93488.
21. Nguyen TC, et al. (2016) Mapping RNA-RNA interactome and RNA structure in vivo by MARIO. Nat Commun 7:12023.
22. Forbes SA, et al. (2017) Cosmic: Somatic cancer genetics at high-resolution. Nucleic Acids Res 45:D777–D783.
23. Martelli MP, et al. (2009) EML4-ALK rearrangement in non-small cell lung cancer and non-tumor lung tissues. Am J Pathol 174:661–670.
24. Li H, Wang J, Mor G, Sklar J (2008) A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science 321:1357–1361.
25. Gu W, Zhang F, Lupski JR (2008) Mechanisms for human genomic rearrangements. Pathogenetics 1:4.
26. Hu X, et al. (2018) Tumorfusions: An integrative resource for cancer-associated transcript fusions. Nucleic Acids Res 46:D1144–D1149.
27. R Core Team (2018) R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna), Version 3.5.1.
28. Lawrence M, et al. (2013) Software for computing and annotating genomic ranges. PLoS Comput Biol 9:e1003118.
29. Lun AT, Perry M, Ing-Simmons E (2016) Infrastructure for genomic interactions: Bioconductor classes for Hi-C, ChIA-PET and related experiments. F1000Res 5:950.
30. Gu Z, Eils R, Schlesner M (2016) Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32:2847–2849.
31. Cao X, Yan Z, Wu Q, Zheng A, Zhong S (2018) GIVE: Portable genome browsers for personal websites. Genome Biol 19:92.
32. Li H (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2. Preprint, posted May 26, 2013.
33. Hahne F, Ivanek R (2016) Visualizing genomic data using Gviz and bioconductor. Statistical Genomics. Methods in Molecular Biology, eds Mathé E, Davis S (Humana Press, New York), Vol 1418, pp 335–351.
34. Choi Y, et al. (2009) In situ visualization of gene expression using polymer-coated quantum-dot-DNA conjugates. Small 5:2085–2091.
35. Lyubimova A, et al. (2013) Single-molecule mRNA detection and counting in mammalian tissue. Nat Protoc 8:1743–1758.
36. Raj A, van den Bogaard P, Rifkin SA, van Oudenaarden A, Tyagi S (2008) Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods 5:877–879.
37. Dobin A, et al. (2013) STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21.
