Supplementary Data

A Comprehensive Approach Characterizing Fusion Proteins and Their Interactions Using Biomedical Literature

Somnath Tagore1, Alessandro Gorohovski1, Lars Juhl Jensen2 and Milana Frenkel-Morgenstern1*

1 The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, Safed 13195, ISRAEL
2 Cellular Network Biology Group, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, DENMARK
*[email protected]

1. Table S1: Root and relation tokens, Bibliography
2. Table S2: Action Tokens for Fusion PPI
3. Table S3: Synonyms for Fusions, Dictionary
4. Table S4: Rulebase
5. Table S5: Fusion tokens identified by ProtFus for 100 PubMed IDs
6. Table S6: Fusion PPIs tokens identified by ProtFus for 100 PubMed IDs

Table S1: Root and relation tokens, Bibliography

Root token   Relation tokens
activate     activates, activating, activator
block        blocks, blocking, blocked, blocks in, blocks with
chimera      chimeric, chimeric , chimeric , chimeric transcript
depend       dependent, depends on, depends to, depending on
domain       domains
express      expression, expressed in, expressed with, expresses in
family
fusion       fusions, fusion transcript, fusion transcripts, fusion protein, fusion proteins, fusion gene, fusion genes
gene         genes, gene fusion
interaction  interactions, interactions with
protein      proteins
reduce       reduced, reduced form
residue      residues
transcript   transcripts

Table S2: Action Tokens for Fusion PPI

Root token            Relation tokens
Abolish               abolish, abolishes, abolished, abolishing
Accelerate            accelerate, accelerates, accelerated, accelerating
Acceptor              acceptor
Accumulate            accumulate, accumulates, accumulated, accumulating, accumulation
Acetylate             acetylate, acetylates, acetylated, acetylating, acetylation
Activate              activate, activates, activiated, activating, activation, activator
Affect                affect, affects, affected, affecting
Alter                 alter, alters, altered, altering, alteration
Amplify               amplify, amplifies, amplified, amplifying, amplification
Apoptosis             apoptosis
Assemble              assemble, assembles, assembled, assembling
Associate             associate, associates, associated, associating, association
Attach                attach, attaches, attached, attaching, attachment
Attack                attack, attacks, attacked, attacking
Bind                  bind, binds, bound, binding
Block                 block, blocks, blocked, blocking
Carbamoylate          carbamoylate, carbamoylates, carbamoylated, carbamoylating, carbamoylation
Carboxylate           carboxylate, carboxylates, carboxylated, carboxylating, carboxylation
Catalyze              catalyze, catalyzes, catalyzed, catalyzing
Cleave                cleave, cleaves, cleaved, cleaving
Co-immunoprecipitate  co-immunoprecipitate, co-immunoprecipitates, co-immunoprecipitated, co-immunoprecipitating, co-immunoprecipitation, co-immunoprecipitations
Compare               compared, comparison, compared to
Complex               complex, complexes, complexed, complexing, complexation
Conjugate             conjugate, conjugates, conjugated, conjugating, conjugation
Contact               contact, contacts, contacted, contacting
Couple                coupled, coupled with, coupled to
Covalent              covalent link, covalently linked to
Deaccetylate          deaccetylate, deaccetyltes, deaccetylated, deaccetylating, deaccetylation
Deaminate             deaminate, deaminates, deaminated, deaminating, deamination
Decarboxylate         decarboxylate, decarboxylates, decarboxylated, decarboxylating, decarboxylation
Decrease              decrease, decreases, decreased, decreasing
Dehydrate             dehydrate, dehydrates, dehydrated, dehydrating, dehydration
Dehydrogenate         dehydrogenate, dehydrogenates, dehydrogenated, dehydrogenating, dehydrogenation
Demethylate           demethylate, demethylates, demethylated, demethylating, demethylation
Dephosphorylate       dephosphorylate, dephosphorylates, dephosphorylated, dephosphorylating, dephosphorylation
Deplete               deplete, depletes, depleted, depleting, depletion
Disassemble           disassemble, disassembles, disassembled, disassembling
Discharge             discharge, discharges, discharged, discharging
Dock                  dock, docks, docked, docking
Down-regulate         down-regulate, down-regulates, down-regulated, down-regulating, down-regulation
Downregulate          downregulate, downregulates, downregulated, downregulating, downregulation
Elevate               elevate, elevates, elevated, elevating, elevation
Enhance               enhance, enhances, enhanced, enhancing
Express               express, expresses, expressed, expressing, expression, express as
Formylate             formylate, formylates, formylated, formylating, formylation
Fusion                fusions, fusion proteins
Glycosylate           glycosylate, glycosylates, glycosylated, glycosylating, glycosylation
Hasten                hasten, hastenes, hastening, hastened
Heterodimerize        heterodimerize, heterodimerizes, heterodimerizing, heterodimerized, heterodimerization, heterodimer, heterodimers
Homodimerize          homodimerize, homodimerizes, homodimerizing, homodimerized, homodimerization, homodimer, homodimers
Hydrolyse             hydrolyse, hydrolyses, hydrolysing, hydrolysed, hydrolysis
Inactivate            inactivate, inactivates, inactivated, inactivating, inactivation
Incite                incite, incites, incited, inciting
Induce                induce, induces, induced, inducing, induction
Infect                infect, infects, infected, infecting
Influence             influence, influences, influencing, influenced
Inhibit               inhibit, inhibits, inhibited, inhibiting, inhibition, inhibitors
Initiate              initiate, initiates, initiated, initiating, initiation
Interact              interact, interacts, interacts, interacting, interaction
Impair                impair, impairs, impaired, impairing
Isomerize             isomerize, isomerizes, isomerized, isomerizing, isomerization
Ligate                ligate, ligates, ligated, ligating, ligation, ligand
Mediate               mediate, mediates, mediated, mediating
Methylate             methylate, methylates, methylated, methylating, methylation
Modify                modify, modifies, modified, modifying, modification
Modulate              modulate, modulates, modulating, modulated
Myogenesis            myogenesis
Overexpress           overexpress, overexpresses, overexpressed, overexpressing, overexpression
Oxidize               oxidize, oxidizes, oxidized, oxidizing, oxidation
Pair                  pair, pairs, paired, paring
Participate           participate, participates, participated, participating, participation
Peroxidize            peroxidize, peroxidizes, peroxidized, peroxidizing, peroxidation
Phosphorylate         phosphorylate, phosphorylates, phosphorylated, phosphorylating, phosphorylation
Prevent               prevent, prevents, prevented, preventing
Produce               produce, produces, produced, producing, production
Promote               promote, promotes, promoted, promoting, promotion
Protein               proteins, protein-protein, PPI
React                 react, reacts, reacted, reacting, reaction
Recognize             recognize, recognizes, recognized, recognizing, recognition
Recruit               recruit, recruits, recruited, recruiting
Regulate              regulate, regulates, regulated, regulating, regulation
Replace               replace, replaces, replaced, replacing
Repress               repress, represses, repressed, repressing, repression
Severe                severe, severes, severed, severing
Split                 splitting
Stimulate             stimulate, stimulates, stimulated, stimulating, stimulation, stimulator
Substitute            substitute, substitutes, substituted, substituting, substitution
Suppress              suppress, suppresses, suppressed, suppressing, suppression
Tether                Tether, tethers, tethered, tethering
Transactivate         transactivate, transactivates, transactivated, transactivating, transactivation, transactivator
Transaminate          transaminate, transaminates, transaminated, transaminating, transamination
Ubiquitinate          ubiquitinate, ubiquitinates, ubiquitinated, ubiquitinating, ubiquitination
Upregulate            upregulate, upregulates, upregulated, upregulating, upregulation, upregulator
Up-regulate           up-regulate, up-regulates, up-regulated, up-regulating, up-regulation, up-regulator
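
A minimal sketch of how the action tokens of Table S2 could be used: each relation token found in a sentence is mapped back to its root token. The dictionary below is a three-row subset of the table, and the names `ACTION_TOKENS`, `RELATION_TO_ROOT` and `find_action_roots` are illustrative assumptions, not part of ProtFus.

```python
import re

# Hypothetical three-row subset of Table S2: root token -> relation tokens.
ACTION_TOKENS = {
    "Bind": ["bind", "binds", "bound", "binding"],
    "Inhibit": ["inhibit", "inhibits", "inhibited", "inhibiting",
                "inhibition", "inhibitors"],
    "Phosphorylate": ["phosphorylate", "phosphorylates", "phosphorylated",
                      "phosphorylating", "phosphorylation"],
}

# Invert the table so that every surface form points at its root token.
RELATION_TO_ROOT = {
    relation.lower(): root
    for root, relations in ACTION_TOKENS.items()
    for relation in relations
}


def find_action_roots(sentence):
    """Return the root token of each action word, in order of appearance."""
    # Words are runs of letters, optionally containing hyphens.
    words = re.findall(r"[A-Za-z][A-Za-z\-]*", sentence.lower())
    return [RELATION_TO_ROOT[w] for w in words if w in RELATION_TO_ROOT]
```

Under these assumptions, a sentence containing "binds" and "inhibited" yields the root tokens `Bind` and `Inhibit`; the full table would be loaded the same way.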

Table S3: Synonyms for Fusions, Dictionary

Fusion proteins  Synonyms                                                                Alternate representations
EWS-FLI1         ews fli1, EWSR1 EWS, SH2D1B EAT2, ELK1, PDGFC SCDGF, ETV2 ER71,ETSRP71  EWSR1/FLI1, EWS FLI-1
Fc-containing    NA                                                                      NA
motif-GST        NA                                                                      NA
EPO-Fc           NA                                                                      NA
VpHsf-GFP        NA                                                                      NA
GLP-1            NA                                                                      NA
GLP-1            NA                                                                      NA
GFP-SlGGB1       NA                                                                      NA
receptor-Fc      NA                                                                      NA
LSL-tagged       NA                                                                      NA
LSL-tagged       NA                                                                      NA
Ag85B-ESAT6      NA                                                                      NA

Rep/VP rep VP0041, rep VP02_18990 Rep-VP Rep/VP NA NA NS1-epitope NA NA DISC1-Boymaw NA NA DISC1-Boymaw NA NA RUNX1 AML1,CBFA2, RUNX1T1 AML1T1,CBFA2T1,CDR,ETO,MTG8, TCF12 BHLHB20,HEB,HTF4, SON C21orf50,DBP5,KIAA1019,NREBP,HSPC310, CBFA2T3 AML1/ETO MTG16,MTGR2,ZMYND4 AML1-ETO AcrB-AcrA NA NA Antibody- cytokine NA NA SRRP1-GFP NA NA E3-ubiquitin NA NA PD-LIg NA NA PD-L1Ig NA NA receptor-Fc NA NA ZC3HC1 NIPA,HSPC216, PPID CYP40,CYPD, SFPQ NPM-ALK PSF NPM/ALK GluA2-GluK NA NA F-specific NA NA HN-specific NA NA His-SUMO NA NA VP2-VP3 NA NA VP2-VP3 NA NA his-tag NA NA TMPRSS2-ERG ERG TMPRSS2/ERG RET CDHF12,CDHR16,PTC,RET51, TRIM24 RNF82,TIF1,TIF1A, TRIM33 KIAA1113,RFG7,TIF1G, NCOA4 ARA70,ELE1,RFG, MAPK15 ERK7,ERK8, SH2B1 KIAA1299,SH2B, PDLIM7 ENIGMA, RET- RET/PTC2, RET/PTC ELE1, ELE1-RET RET-PTC BMP7-BMP2 BMPR2 PPH1, Bmpr2 BMPP7/BMP2 MAP-1 NA NA PROTEIN-ES1 NA NA RPS6-AtHD2B NA NA thioredoxin- tagged NA NA NKp80-Fc NA NA NKp80-Fc NA NA BCR-ABL1 SETBP1 KIAA0437, BTK, Bach2, ZNFN1A1 IKZF1, BCR/ABL1 BCR-ABL1 NA NA Vpr-IN vpr, VPRBP DCAF1,KIAA0800,RIP, Vpr/IN protein-NCX1 NA NA hepcidin- thioredoxin NA NA EML1 EMAP1,EMAPL,EMAPL1, EML2 EML4-ALK EMAP2,EMAPL2 EML4/ALK NAB2-STAT6 NA NA NAB2-STAT6 NA NA NAB2-STAT6 NA NA NAB2-STAT6 NA NA LDHC-RFP NA NA protein- nitroreductase NA NA antibody/Fc NA NA BCR-ABL NA NA catalase- lipoxygenase NA NA catalase- lipoxygenase NA NA KIR-Fc NA NA MYB-NFIB MYB/NFIB MYB/NFIB IFNalpha1- THYalpha1 NA NA poIFNalpha1- THYalpha1 NA NA poIFNalpha1- THYalpha1 NA NA envelope-NS1 NA NA Flag-tagged NA NA Flag-tagged NA NA MIIA-Strep NA NA hsp70-p24 NA NA EGFP-HttQ52 NA NA AdoMetDC- SpdSyn NA NA protein-tagged NA NA signaling- enhanced NA NA BCR-ABL NA NA CD19-specific NA NA Osteocalcin- fibronectin NA NA antibody- cytokine NA NA SP-target NA NA SP-LacA NA NA cell-permeable NA NA HA-Fc NA NA GFP-NACC1 NA NA flagellin-PAc NA NA EML4-ALK NA NA IL-2 NA NA cell-permeable NA NA B7-specific NA NA translocation- associated NA NA PgMYB1- mGFP5 NA NA CTCR3-MAML2 NA 
NA CCDC6-RET RET CDHF12,CDHR16,PTC,RET51, CCDC6-RETc, CCDC6/RETc, CCDC6-RET NA NA CCR5/CCR2b NA NA Alpha-TFEB NA NA peptide-Fc NA NA X-MLL NA NA RUNX1- RUNX1 RUNX1T1 RUNX1/RUNX RUNX1T1 1T1 androgen- regulated NA NA mucin-type NA NA OSCAR-Fc NA NA TREM2-Fc NA NA TREM2-Fc NA NA Mer/Fc NA NA FP-tubulin NA NA FP-Tub1 NA NA FUS-KLF17 NA NA EWS-FLI1 NA NA Antibody- cytokine NA NA CBFbeta- SMMHC NA NA IL-2 NA NA 2-STAT6 NA NA gamma-gliadin NA NA TNF-induced NA NA TTP-susceptible NA NA r-Cpae NA NA bio-functional NA NA PutAPX-GFP NA NA ZZ-AP NA NA TACI-Ig NA NA TPM3-ALK NA NA NAB2-STAT6 NA NA PSF-TFE3 TFE3 BHLHE33 PSF/TFE3 EML4-ALK NA NA EML4-ALK NA NA EML4-ALK NA NA Rec11-Rec10 NA NA S- NA NA GFP-HNF1alpha NA NA CD30- Immunoglobulin NA NA PML MYL,PP8675,RNF71,TRIM19, PRAM1, UBE2I UBC9,UBCE9, RARA NR1B1, SUMO1 SMT3C,SMT3H3,UBL1,OK/SW-cl.43, SUV39H1 KMT1A,SUV39H, PIAS2 PIASX, CSNK2A1 CK2A1, PML-RARa SUMO2 SMT3B,SMT3H2, RARA PML/RARA BCR-ABL1 NA NA BCR-ABL1 NA NA BCR-ABL1 NA NA TPM3-NTRK1 NA NA neoplasia- associated NA NA anti-CD19 NA NA PmERP15-EGFP NA NA above-mentioned NA NA TNS3-MAP3K3 ZFPM2-ELF5 TNS3/MAP3K3 ZFPM2-ELF5 MAP3K3-TNS3, TNS3-MAP3K3 ZFPM2/ELF5 TAT-Nanog NA NA TACI-Ig NA NA IL-2 NA NA IL-2 NA NA IL-2 NA NA EWSR1-WT1 NA NA M3-T4L NA NA mouse-porcine NA NA antibody- cytokine NA NA Antibody- cytokine NA NA antibody- cytokine NA NA cell-cell NA NA GST-CTD NA NA GPR84-Gialpha NA NA bi-functional NA NA NCOA4/RET NA NA FGFR4 JTK2,TKF, PAX3 HUP2, FAM193B PAX3-FOXO1 IRIZIO,KIAA1931 PAX3/FOXO1 MDR1-mApple NA NA EGFP-rab11a NA NA promoter-reporter NA NA JAGGED2-Fc NA NA SF301-mCherry NA NA synthase/phospha tase NA NA GPC3-targeted NA NA FN1-FGFR1 NA NA NKp30-Fc NA NA BCR-ABL NA NA N-terminal NA NA TCF3-PBX1 TCF3 BHLHB21,E2A,ITF1, PBX1 PRL, ANKS1B TCF3/PBX1 L20h-Ts3 L20h-Ts3 L20h/Ts3 L20h-Ts3 NA NA TMPRSS2-ERG ERG TMPRSS2/ERG TMPRSS2-ERG NA NA TMPRSS2-ERG NA NA TMPRSS2-ERG NA NA GFP-PGRMC2 NA NA LapA-GFP NA NA TMPRSS2-ERG NA NA gamete-specific NA NA PR8/WSN NA NA 
EWSR1-related NA NA EWSR1-PBX3 NA NA EWSR1-PBX3 NA NA EWSR1-related NA NA EWSR1- CREB3L1 NA NA FUS-CREB3L2 CREB3L2 BBF2H7 HUS/CREB3L1 GFP-VirB11 NA NA 18-kDa NA NA MYB-NFIB MYB/NFIB MYB/NFIB MYB-NFIB NA NA MYB-NFIB NA NA NKG2D-IgG1 NA NA TAT-gelonin NA NA F8-IL4 NA NA BCOR-CCNB3 NA NA ICAM1-Fc NA NA EML4-ALK NA NA mouse/human NA NA NOTCH3-Fc NA NA G-C5a NA NA BclS-GFP NA NA FH/Fc NA NA FH/Fc NA NA SNAP-tag NA NA SNAP-tagged NA NA single-chain NA NA in-frame NA NA EstA- autotransporter NA NA YFP-sarcomeric NA NA MECT1- MECT1/MAML MAML2 CRTC1 KIAA0616,MECT1,TORC1,WAMTP1 2 DOX-CYP NA NA GP16-EGFP NA NA CD19-specific NA NA CD9-GFP NA NA Pvs25-PvCSP NA NA TMPRSS2-ERG NA NA scFv425-sTRAIL NA NA Hv1a/GNA NA NA Pl1a/GNA NA NA Hv1a/GNA NA NA protein-hPXR NA NA PABD-YFP NA NA FC5-Fc NA NA luciferase-PAWP NA NA Rab3A/Rab22A TBC1D10B FP2461, Tbc1d10b Rab3A-Rab22A Single-chain NA NA GMCSF-NAg NA NA WWTR1-FOSB NA NA ABCD2-EGFP NA NA FAM131B- BRAF NA NA NPM-RAR NA NA Gag-Pol NA NA EWSR1-FLI1 EWSR1/FLI1 EWSR1/FLI1 Fc-tagged NA NA EWS/ETS EWSR1 EWS , FEV PET1 EWS-ETS PrgI-SipD NA NA beta-lactamase NA NA DT-IL3 NA NA DDX3X-PRKD1 NA NA ARID1A-PRKD1 NA NA LMP1/CD40 NA NA agonist/antagonis t NA NA protein-coding NA NA NP-CD40L NA NA 6xHis- TdsPLA2III NA NA SUMO-Tpz1 NA NA Azu-P450 NA NA RUNX1 AML1,CBFA2, RUNX1T1 AML1T1,CBFA2T1,CDR,ETO,MTG8,TCF12 BHLHB20,HEB,HTF4 , SON C21orf50,DBP5,KIAA1019,NREBP,HSPC310, CBFA2T3 MTG16,MTGR2,ZMYND4 , Cbfa2t3 Cbfa2t3h,Mtgr2 , AML1/ETO Usp18 Ubp43 , pir AML1-ETO GFP-scFv NA NA C-terminal NA NA PML-RARA NA NA PML-RARA NA NA PML MYL,PP8675,RNF71,TRIM19 , RARA NR1B1 , PML-RARalpha UBE2I UBC9,UBCE9 , Trim24 Tif1,Tif1a PML/RARalpha EWSR1-FEV NA NA EWSR1-FLI1 EWSR1/FLI1 EWSR1/FLI1 EML1 EMAP1,EMAPL,EMAPL1, EML2 EML4/ALK , EML4-ALK EMAP2,EMAPL2 EML4-ALK, IgG-IDS NA NA HIRMAb-IDS NA NA HIRMAb-IDS NA NA HIRMAb-IDS NA NA HIRMAb-IDS NA NA ERV-Flt3 NA NA scFvBaP1- SUMO NA NA receptor-Galpha NA NA pirB-cry2Aa NA NA L4-L5 NA NA SLK-LacZ NA NA SLK-LacZ NA NA SLK-LacZ NA 
NA KRT5/KRT8 NA NA Cancer-specific NA NA BCR-ABL NA NA tumor-specific NA NA MICA-Fc NA NA Trx-hGH NA NA FGFR3-TACC3 NA NA FGFR3-TACC3 NA NA GST-TvCyP1 NA NA B-C2 NA NA REST-VP16 NA NA Trx-hCTRP1 NA NA EWS-FLI1 NA NA EWS/FLI NA NA EaF82a-sGFP NA NA c-myc NA NA antibody- cytokine NA NA F8-IFNgamma NA NA antibody- cytokine NA NA FGFR3-TACC3 NA NA CD19-specific NA NA Fla-L2 NA NA TMPRSS2-ERG NA NA TMPRSS2-ERG NA NA His-tagged NA NA NPM-ALK NA NA TRPC6-V5 NA NA JAZF1-SUZ12 SUZ12 CHET9,JJAZ1,KIAA0160 , JAZF1 TIP27,ZNF802 JAZF1/SUZ12 EGFP-CPP NA NA EWSR1-NR4A3 NA NA TAF15-NR4A3 NA NA TCF12-NR4A3 NA NA TAT-gelonin NA NA TRIM24-BRAF Trim24 Tif1,Tif1a TRIM24/BRAF Cell-cell NA NA cell-cell NA NA GFP-STRS NA NA IL13Ralpha2- targeted NA NA CIC-DUX4 CIC/DUX4 CIC/DUX4 LAMTOR1- PRKCD NA NA
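
The "Alternate representations" column above pairs dash and slash spellings of the same fusion (for example TMPRSS2-ERG and TMPRSS2/ERG). A minimal sketch of one way to collapse such variants to a single key follows; `canonical_fusion` is a hypothetical helper, and upper-casing plus joining with a dash is an assumption made for illustration, not the normalization ProtFus is documented to use.

```python
import re


def canonical_fusion(name):
    """Collapse dash, slash and colon spellings of a fusion to one key."""
    # Split on any of the three separators and drop empty fragments.
    parts = [p for p in re.split(r"[-/:]", name.strip()) if p]
    # Assumed convention: upper-case partner names joined by a dash.
    return "-".join(p.upper() for p in parts)
```

With this helper, `canonical_fusion("TMPRSS2/ERG")` and `canonical_fusion("TMPRSS2-ERG")` give the same key, so both spellings can index one dictionary entry.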

Table S4: Rulebase

Description: Starting with space
Rule:        Should follow with a letter/token
Reg Ex:      \s\w+

Description: Tokens separated with Dash
Rule:        Should be considered as a fusion token
Reg Ex:      \w+(\-)\w+

Description: Tokens separated with Colon
Rule:        Should be considered as a fusion token
Reg Ex:      \w+(\:)\w+

Description: Tokens separated with Front Slash
Rule:        Should be considered as a fusion token
Reg Ex:      \w+(\/)\w+

Description: Tokens separated by space
Rule:        Should follow with a letter/token
Reg Ex:      \w+\s+\w+

Description: Tokens with fusion word occurrence
Rule:        Should be separated by space/tokens
Reg Ex:      \s\w+\s('fusion|fusions|fusion genes|gene fusion|fusion protein|fusion transcripts')\s

Description: Any Greek letter in middle
Rule:        Should be considered non-english token
Reg Ex:      \w+ [aßYÖE ] \w+

Description: Tokens with chimeric word occurrence
Rule:        Should be separated by space/tokens
Reg Ex:      \s\w+\s('chimeric|chimeric transcript|chimeric gene')\s

Description: Protein name has a number in the middle
Rule:        Should be considered an alpha-numeric token
Reg Ex:      \w+[0-9]\w+

Description: Tokens with transcript word occurrence
Rule:        Should be separated by space/tokens
Reg Ex:      \s\w+\s('transcripts')\s

Description: Tokens with chimeric word occurrence followed by an adjective or vice-versa
Rule:        Should be considered as a fusion token if adjective has a dash/colon/front slash
Reg Ex:      (\s(chimera|chimeric|chimeric gene|chimeric transcript)\s\w+(\-|\:|\/)\w+\s) OR (\s\w+(\-|\:|\/)\w+\s(chimera|chimeric|chimeric gene|chimeric transcript)\s)

Description: Last character is a dash
Rule:        Part of fusion protein
Reg Ex:      \w+ -

Description: Tokens with fusion word occurrence followed by an adjective or vice-versa
Rule:        Should be considered as a fusion token if adjective has a dash/colon/front slash
Reg Ex:      (\s(fusion|fusions|fusion transcript|fusion transcripts|fusion proteins|fusion genes)\s\w+(\-|\:|\/)\w+\s) OR (\s\w+(\-|\:|\/)\w+\s(fusion|fusions|fusion transcript|fusion transcripts|fusion proteins|fusion genes)\s)

Description: First character is a dash
Rule:        Part of fusion protein
Reg Ex:      -\w+

Description: Tokens with depend word occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(dependent|depends on|independent|depends to|depending on)\s\w+(\-|\:|\/)\w+\s)

Description: All letters are in uppercase
Rule:        N/A
Reg Ex:      [A-Z]+

Description: Tokens with express word occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(express|expression|expressed in|expressed with|expresses in)\s\w+(\-|\:|\/)\w+\s)

Description: Any natural number
Rule:        N/A
Reg Ex:      [0-9]+

Description: Tokens with interact word occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(interact|induce|initiate|modulate|produce|incite)\s\w+(\-|\:|\/)\w+\s)

Description: Tokens with negative words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(dephosphorylate|decarboxylate|demethylate|deaccetylate|deaminate|dehydrogenate)\s\w+(\-|\:|\/)\w+\s)

Description: First letter is in uppercase
Rule:        N/A
Reg Ex:      [A-Z]\w+

Description: Tokens with positive words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(phosphorylate|phosphorylated|phosphorylation|acetylate|acetylated|acetylation|carboxylate|carbamoylate|dephosphorylate|decarboxylate|methylate|formylate|glycosylate|ubiquitinate|transaminate)\s\w+(\-|\:|\/)\w+\s)

Description: Any roman letter in the middle
Rule:        N/A
Reg Ex:      \w+ [IVXDLCM]+ \w+

Description: Combination of alphabets and numbers. First character is an alphabet.
Rule:        N/A
Reg Ex:      \w+ [A-Za-z] \w+ [0-9]\w+

Description: Any roman letter
Rule:        N/A
Reg Ex:      [IVXDLCM]+

Description: Tokens with process words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(enhance|enhanced|enhancing|enhances|amplify|amplifies|elevate|express|promote|influence|react|mediate)\s\w+(\-|\:|\/)\w+\s)

Description: Mixture of uppercase and lowercase letters
Rule:        N/A
Reg Ex:      [A-Za-z]+

Description: Tokens with increase activity words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(activate|activates|activating|activator|accelerate|accelerating|affect|stimulate|regulate)\s\w+(\-|\:|\/)\w+\s)

Description: Two digit numbers
Rule:        N/A
Reg Ex:      [0-9][0-9]

Description: Tokens with decrease activity words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(block|blocks|blocking|blocks in|blocks with|blocked|attacks|attacked|abolish)\s\w+(\-|\:|\/)\w+\s)

Description: Combination of alphabets and numbers. First character is a number.
Rule:        N/A
Reg Ex:      \w+ [0-9] \w+ [A-Za-z]\w+

Description: Tokens with breakdown event words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(catalyze|cleave|disassemble)\s\w+(\-|\:|\/)\w+\s)

Description: First letter is in uppercase. Second letter is in lowercase.
Rule:        N/A
Reg Ex:      [A-Z][a-z]\w+

Description: Tokens with negation event words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(alter|decrease|deplete|discharge|downregulate|inactivate|inhibit|impair|modify|prevent|repress|suppress|tether)\s\w+(\-|\:|\/)\w+\s)

Description: Ranges from 0-9
Rule:        N/A
Reg Ex:      [0-9]

Description: Tokens with drill-down event words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(hydrolyse|isomerize|ligate|oxidize|peroxidize|transactivate|heterodimerize|homodimerize|split)\s\w+(\-|\:|\/)\w+\s)

Description: Any Greek letter
Rule:        N/A
Reg Ex:      \w+ [aßYÖE ]

Description: Tokens with roll-up event words occurrence preceded and succeeded by adjective token
Rule:        Should be considered as interaction
Reg Ex:      (\s\w+(\-|\:|\/)\w+\s(assemble|acceptor|accumulate|associate|attach|bind|complex|conjugate)\s\w+(\-|\:|\/)\w+\s)
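
A minimal sketch of how two rules from the rulebase above might be combined in Python: the delimiter rule (two word tokens joined by a dash, colon or front slash) and the rule that such a token counts as a fusion when a fusion keyword precedes or follows it. The names `PAIR`, `KEYWORDS`, `FUSION_RULE` and `find_fusion_tokens` are hypothetical, and the pattern is a simplified adaptation of the table's expressions rather than the exact ProtFus implementation.

```python
import re

# Delimiter rule: two word tokens joined by a dash, colon or front slash.
PAIR = r"\w+(?:\-|\:|\/)\w+"

# Fusion keywords from the rulebase, longest alternatives first so that
# "fusion transcripts" is not cut short to "fusion".
KEYWORDS = (r"(?:fusion transcripts|fusion transcript|fusion proteins"
            r"|fusion genes|fusions|fusion)")

# A candidate followed by a keyword, or a keyword followed by a candidate.
FUSION_RULE = re.compile(
    rf"(?P<before>{PAIR})\s{KEYWORDS}\b|\b{KEYWORDS}\s(?P<after>{PAIR})"
)


def find_fusion_tokens(text):
    """Return delimiter-joined tokens that sit next to a fusion keyword."""
    return [m.group("before") or m.group("after")
            for m in FUSION_RULE.finditer(text)]
```

Only the token directly adjacent to the keyword is reported, which mirrors the single-token shape of the rulebase expressions; handling enumerations such as "A-B and C-D fusions" would need an extra rule.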

Table S5: Fusion tokens identified by ProtFus for 100 PubMed IDs

Fusion_ID  gene1    gene2    pubmed    info
F000001    EZR      ROS1     24186139  CD74-ROS1 and EZR-ROS1 fusions were significantly associated with at least focal globular immunoreactivity and plasma membranous accentuation, respectively, and these patterns were specific to ROS1-rearranged cases.
F000002    ACSL3    ETV1     22575261  Subsequent molecular studies confirmed the presence of an associated ACTB-GLI1 fusion transcript.
F000003    ACTB     GLI1     24561789  Size and location of functional domains of the MLL wt, ACTN4 wt, and of the MLL-ACTN4 fusion protein.
F000004    ACTN4    MLL      23762276  Seven of the twelve fusion transcripts were classified as before endoreduplication; two, CTCF-SCUBE2 and BC041478-EXOSC10 were classified later. AGPAT5-MCPH1 and SUSD1-ROD1/PTBP3 and KLK5-CDH23 were undetermined, as their allelic copy number could not be resolved by array CGH or FISH. These structural rearrangements gave rise to at least twelve expressed fusion transcripts, confirmed by RT-PCR and Sanger sequencing: RGS22-SYCP1, CTAGE5-SIP1, PLXND1-TMCC1, SEC22B-NOTCH2, KLK5-CDH23, BC041478-EXOSC10, AGPAT5-MCPH1, SUSD1-ROD1/PTBP3, SGK1-SLC2A12, RHOJ-SYNE2, PUM1-TRERF1 and CTCF-SCUBE2
F000005    AFF1     MLL      20613748  The SCL45A3-BRAF fusion mRNA encodes a 329–amino acid protein that comprises only a C-terminal fragment of BRAF. By contrast, the ESRP1-RAF2 and AGTRAP-BRAF fusions encode proteins with substantial contribution of N-terminal sequences from the RAF fusion partner.
F000006    AGPAT5   MCPH1    23543667  In addition, common oncogenic fusions of RET and NTRK1 as well as PAX8/PPARγ and AKAP9-BRAF were also assessed by RT-PCR.
F000007    AGTRAP   BRAF     11310834  The hybrid MSN-ALK protein had a molecular weight of 125 kd and contained an active tyrosine domain.
F000008    AKAP9    BRAF     23093608  This situation also happens for the ANKHD1–PCDH1 fusion in the SK-BR-3 sample.
F000009    ALK      MSN      22101766  To further characterize the effects of the ARID1A-MAST2 fusion in MDA-MB-468 cells, we used shRNA targeting MAST2, which displayed efficient knockdown of ARID1A-MAST2 fusion transcript and protein (Fig. S3k–l).
F000010    ANKHD1   PCDH1    17543078  In addition to wild-type TFE3, ASPSCR1-TFE3 fusion transcripts (three type 1 and two type 2 transcripts) were detected in all cases.
F000011    ARID1A   MAST2    12378525  The BCAS4-BCAS3 fusion transcript was detected only in MCF7 cells, but the BCAS4 gene was also overexpressed in nine of 13 breast cancer cell lines.
F000012    ASPSCR1  TFE3     27095369  Monitoring BCR-ABL1 fusion transcripts has been widely used to reflect the molecular response to ABL inhibitor therapies and the progression of the disease for patients.
F000013    ATF1     EWSR1    27134074  We report a case of myeloproliferative neoplasm, unclassifiable (MPN-U) with BCR-JAK2 fusion confirmed by molecular studies.
F000014    ATIC     ALK      20951315  These structural changes lead to the formation of fusion genes RET-PTC, TRK(-T), and BRAF-AKAP9, which originate as a result of intrachromosomal or interchromosomal rearrangements and are found in papillary thyroid carcinoma.
F000015    BCAS4    BCAS3    21247443  Eight out of 27 fusion genes (BSG-NFIX, CCDC85C-SETD3, DHX35-ITCH, CMTM7-GLB1, LAMP1-MCF2L, NOTCH1-NUP214, PPP1R12A-SEPT10 and SUMF1-LRRFIP2) identified here were not associated with high-level gene amplifications
F000016    ABL1     BCR      9747873   Both Bcr-Abl fusion proteins exhibit an increased tyrosine kinase activity and their oncogenic potential has been demonstrated using in vitro cell culture systems as well as in in vivo mouse models
F000017    BCR      JAK2     18451133  Although wild-type CANT1 has two alternative first exons (exons 1 and 1a), only exon 1a was detected in CANT1-ETV4 fusion transcripts. A KLK2-ETV4 fusion protein containing the NH2-terminal KLK2 signal peptide would be secreted and could not function as a transcription factor
F000018    BIRC3    MALT1    13679433  The breakpoint is identical to the one previously reported in the CARS-ALK fusion
F000019    BRD3     NUTM1    23877438  Although the expression of SLC34A2-ROS1, EZR-ROS1, or KIF5B-RET fusion transcripts was not detected in any of the cases, the expression of CD74-ROS1 fusion transcripts was detected in one (0.9%) of the 114 NSCLCs.
F000020    BRD4     NUTM1    23578175  Expression of the CCDC6-RET fusion gene in LC-2/ad cells was demonstrated by the mRNA and protein levels, and the genomic breakpoint was confirmed by genomic DNA sequencing.
F000021    BSG      NFIX     15026324  CDH11-USP6 fusion transcripts were demonstrated only in ABC with t(16;17) but other ABCs had CDH11 or USP6 rearrangements resulting from alternate cytogenetic mechanisms.
F000022    C2orf44  ALK      11930009  The in-frame CDK6-MLL transcript is provocative with respect to a potential contribution of the predicted Cdk6-MLL fusion protein in the genesis of the ALL, which also contains an in-frame MLL-AF4 transcript.
F000023    CANT1    ETV4     21813156  Recent studies have identified a subgroup of undifferentiated soft tissue sarcomas with primitive round to plump spindle cell morphology and a t(4;19)(q35;q13.1) translocation resulting in the expression of a CIC-DUX4 fusion transcript, including 2 tumors previously reported by our laboratory (Cancer Genet Cytogenet 2009;195:1).
F000024    ALK      CARS     24142740  The peripheral blood leukocytes revealed the t(2;17;8)(p23;q23;p23) translocation and a CLTC-ALK fusion gene, which have never been reported in BPDCN or in any myeloid malignancies thus far.
F000025    CCDC6    RET      12917640  We report the cloning of a novel clathrin heavy-chain gene (CLTC)-TFE3 gene fusion resulting from a t(X;17)(p11.2;q23) in a renal carcinoma arising in a 14-year-old boy.
F000026    CCND1    TACSTD2  17950782  COL1A1-PDGFB fusion transcripts have been demonstrated in DFSP and giant cell fibroblastoma as well as their hybrid lesions [4] and [5].
F000027    CD74     ROS1     15735689  The first 103 bp of COL1A1 coding sequence are included in the COL1A1-USP6 fusion transcript, but this COL1A1-encoded protein is a short fragment due to a stop codon in the COL1A1 reading frame at the beginning of the USP6-contributed sequence. The novel fusion partners appear well suited to drive USP6 transcription in Proteoglycans; RNA-Binding Proteins; THRAP3 protein, human
F000028    CDH11    USP6     16931951  Recent cytogenetic and molecular analyses have shown that most LGFMSs have a characteristic chromosomal abnormality, t(7;16)(q33;p11), resulting in the FUS-CREB3L2 fusion gene.
F000029    CDK6     MLL      20075182  Although the CREB3L2-PPARG fusion is rare, its existence points to a limitation of the PPFP RT-PCR assay if the goal is to detect PPARγ fusions as markers of potential thyroid malignancy.
F000030    CHCHD7   PLAG1    17437281  We analyzed 55 primary salivary gland tumors including 22 mucoepidermoid carcinomas (MECs) to determine the association of MECT1/TORC1/CRTC1-MAML2 fusion transcript to tumor types, level of MEC differentiation and clinicopathologic parameters.
F000031    CIC      DUX4     19749740  We developed reverse transcription-polymerase chain reaction (RT-PCR) assays for CRTC1-MAML2, CRTC2-MAML2, and CRTC3-MAML2 fusions.
F000032    BRAF     CLCN6    11894114  In addition, one of the tumors expressed a cryptic CTNNB1-PLAG1 fusion transcript.
F000033    ALK      CLTC     15744350  We found that the t(1;19) in TS-2 fuses the 19p13 gene DAZAP1 (Deleted in Azoospermia-Associated Protein 1) to the 1q23 gene MEF2D (Myocyte Enhancer Factor 2D), leading to expression of reciprocal in-frame DAZAP1/MEF2D and MEF2D/DAZAP1 transcripts.
F000034    CLTC     TFE3     18850010  The FUS-DDIT3 fusion oncogene results from a t(12;16)(q13;p11) translocation and has a causative role in the initiation of myxoid/round cell liposarcomas (MLS/RCLS).
F000035    CNBP     USP6     10222653  By reverse transcription polymerase chain reaction (RT-PCR) of the RNA from the leukemic cells of the patient, DDX10-NUP98 and NUP98-DDX10 fusion transcripts were detected.
F000036    COL1A1   PDGFB    21835007  Both programs missed two fusion transcripts: DHX35-ITCH and NFS1-PREX1. BT474 RAB22A-MYO9B 20-19 56886176 17256205 8 20
F000037    COL1A1   USP6     24847761  In this study, we examined 75 primary CRCs and 121 primary lung cancers in the Japanese population for EIF3E-RSPO2 and PTPRK-RSPO3 fusion transcripts using RT-PCR and subsequent sequencing analyses.
F000038    COL1A2   PLAG1    12642866  Here we report the first experimental model of MLL. Murine bone marrow cells were retrovirally transduced to express the MLL-eleven nineteen leukemia (MLL-ENL) fusion protein.
F000039    CREB3L2  FUS      24285434  As similar gene fusions were reported in endometrial stromal sarcomas, we screened for potential gene abnormalities in JAZF1 and EPC1 by FISH and found two additional cases with EPC1-PHF1 fusions.
F000040    CREB3L2  PPARG    20526349  Here we used paired-end transcriptome sequencing to screen ETS rearrangement-negative prostate cancers for targetable gene fusions and identified the SLC45A3-BRAF (solute carrier family 45, member 3-v-raf murine sarcoma viral oncogene homolog B1) and ESRP1-RAF1 (epithelial splicing regulatory protein-1-v-raf-1 murine leukemia viral oncogene homolog-1) gene fusions.
F000041    CREBBP   KAT6A    21193423  The ETV6-ABL1 fusion gene has also been identified in 3 patients with chronic myeloproliferative neoplasms other than “chronic myeloid leukemia” (cMPN) as well as 7 patients with BCR-ABL1 negative acute lymphoblastic leukemia and 4 patients with acute myeloid leukemia
F000042    CRTC1    MAML2    16572202  Overproduction of IL3 has been reported in atypical CML following rearrangements of the IL3 gene upstream region in cells from patients with t(5;12) (q23–31;p13) translocation and ETV6-ACSL6 fusion
F000043    CRTC3    MAML2    20033038  Schematic diagram of the protein domains fused in the predicted ETV6–ITPR2 fusion protein.
F000044    CTNNB1   PLAG1    18068631  To better understand the cellular origin of breast cancer, we developed a mouse model that recapitulates expression of the ETV6-NTRK3 (EN) fusion oncoprotein, the product of the t(12;15)(p13;q25) translocation characteristic of human secretory breast carcinoma.
F000045    DAZAP1   MEF2D    17724745  To address this, we investigated the presence of FUS-ATF1, EWSR1-ATF1, and the highly related EWSR1-CREB1 fusion in a group of nine AFHs.
F000046    DDIT3    FUS      22570737  The first is a genetic action of the EWSR1-DDIT3 fusion protein, which results in binding to the functional C/EBP site within Opn and Col11a2 promoters through interaction of its DNA-binding domain and subsequent interference with endogenous C/EBPβ function.
F000047    DDX10    NUP98    23329308  Many studies showed that EWSR1/FEV, EWSR1/FLI and EWSR1/ERG fusion proteins played similar roles. EWSR1/SP3 t(2;22)(q31;q12) SP3 is a transcription factor belonging to the Sp/XKLF family able to recognize GCrich DNA motifs, found in many promoters and enhancers of housekeeping genes
F000048    DDX5     ETV4     9858836   The discovery of this translocation suggested that there might be a novel EWSR1-ETV4 fusion gene.
F000049    EIF3E    RSPO2    23480895  Fluorescence in situ hybridization mapping suggested the involvement of each of the 2 partner genes, and reverse transcriptase polymerase chain reaction revealed an in-frame EWSR1-NFATC1 transcript.
F000050    ELL      MLL      22467249  In addition, rare cases of Ewing sarcoma where EWSR1 becomes fused to another type of transcription factor have been reported: inv(22)(q12q12) resulted in an EWSR1-PATZ1 (POZ/BTB and A-T-hook containing zinc finger 1) fusion gene
F000051    ALK      EML4     18383210  A novel EWSR1-PBX1 fusion gene consisting of exons 1-8 of the 5'-end of EWSR1 and exons 5-9 of the 3'-end of PBX1 was shown to result from the translocation.
F000052    EPC1     PHF1     20815032  A EWSR1-POU5F1 fusion was identified in a pediatric soft tissue tumor by 3' Rapid Amplification of cDNA Ends (RACE) and subsequently confirmed in four additional soft tissue tumors in children and young adults.
F000053    ERC1     RET      21113140  Mapping analysis demonstrated that deletion of the C-terminus (SLIDE or SANT motives) of hSNF2H impaired, and deletion of the SNF2_N domain fully abrogated NIH3T3 cell transformation by EWSR1-SMARCA5.
F000054    ESRP1    RAF1     24388397  Common genomic events (i.e., trisomy 3 and extra EWSR1-WT1 and WT1-EWSR1 copies) probably contributed to disease pathogenesis and/or evolution of DSRCT.
F000055    ABL1     ETV6     19760602  How the new fusion gene contributes to tumorigenesis is unknown, but the finding of an EWSR1 rearrangement suggests that this, possibly even the EWSR1-ZNF444, is a defining pathogenetic feature of at least a subset of these tumors
F000056    ETV6     ITPR2    21424530  Functional characterization of the novel FAM131B-BRAF fusion demonstrated constitutive MEK phosphorylation potential and transforming activity in vitro.
F000057    ETV6     JAK2     24345920  Additionally, a FCHSD1-BRAF fusion was identified in a large congenital melanocytic nevus (LCMN) (13).
F000058    ETV6     NTRK3    11739186  The study demonstrates that the BCR-FGFR1 fusion may occur in patients with apparently typical CML.
F000059    ETV6     RUNX1    22837387  To determine whether other intrachromosomal FGFR-TACC fusion combinations exist in human GBM, we screened cDNA from an independent panel of 88 primary GBMs and discovered two additional cases (one harboring FGFR1-TACC1 and one FGFR3-TACC3), corresponding to 3 of 97 total GBMs (3.1%), including the GBM-1123 case
F000060    CREB1    EWSR1    21394649  The TCEA1-PLAG1, HMGA2-FHIT, and HMGA2-NFIB fusion transcripts were not detected.
F000061    DDIT3    EWSR1    16502585  We recently identified the FIP1L1-PDGFRA fusion gene in approx 50% of HES/CEL cases.
F000062    ERG      EWSR1    22570254  Genetic characterization of these ALK-positive tumors indicated that full-length ALK expression in two serous carcinoma patients is consistent with ALK gene copy number gain, whereas a stromal sarcoma patient carries a novel transmembrane ALK fusion gene: FN1-ALK.
F000063    ETV1     EWSR1    25806826  PAX3-FOXO1 (PAX3-FKHR) is the fusion protein produced by the genomic translocation that characterizes the alveolar subtype of Rhabdomyosarcoma, a pediatric sarcoma with myogenic phenotype.
F000064    EWSR1    FEV      18094413  Either FUS-ATF1 or EWSR1-ATF1 have been detected in the few cases published, pointing to the interchangeable role of FUS and EWSR1 in this entity.
F000065    EWSR1    FLI1     15640831  Cytogenetic analyses have identified a recurrent balanced translocation t(7;16) (q32-34;p11), later shown by molecular genetic approaches to result in a FUS/CREB3L2 fusion gene
F000066    EWSR1    NFATC2   16651630  Most MLS/RCLS carry a t(12;16) translocation, resulting in a FUS-DDIT3 fusion gene.
F000067    EWSR1    NR4A3    26148230  In acute myeloid leukemias harboring t(16;21), ERG function is deregulated due to a fusion with FUS/TLS resulting in the expression of a FUS-ERG oncofusion protein
F000068    EWSR1    PATZ1    23052255  We also report in lung cancer the GOPC-ROS1 fusion originally discovered and characterized in a glioma cell line.
F000069    EWSR1    PBX1     15642402  In this case, we show that, even with seemingly normal chromosome 8 on conventional cytogenetic analysis, the joining of 8q12.1 to 8q24.1, with subsequent PLAG1-HAS2 fusion, occurred.
F000070    EWSR1    POU5F1   24839999  Recently, two fusion genes were described in mesenchymal chondrosarcomas: a recurrent HEY1-NCOA2 found in tumors that had not been cytogenetically characterized and an IRF2BP2-CDX1 found in a tumor carrying a t(1;5)(q42;q32) translocation as the sole chromosomal abnormality.
F000071    EWSR1    SMARCA5  19136943  A fusion between exon 2 of EIF4E2 with exon 8 of HJURP generated the fusion transcript EIF4E2-HJURP and a fusion between exon 9 of HJURP with exon 25 of INPP4A yielded HJURP-INPP4A. Additionally, we found exon 10 of RC3H2 fused to exon 20 of RGS3. One chimeric transcript from Met 3 involves exon 9 of STRN4 with exon 2. Exon 1 of USP10 (red) is fused with exon 3 of ZDHHC7 (green
F000072    EWSR1    SP3      26980027  We hereby report an unusual gastric tumor arising from the pyloric wall of the stomach in a 9-year old child harboring the exceptionally rare translocation t(7;12) resulting in ACTB-GLI1 gene fusion
F000073    EWSR1    WT1      19528502  Although the expression of HMGA2-LPP fusion gene has been reported in lipomas, the reciprocal LPP-HMGA2 fusion gene has rarely been described.
F000074    EWSR1    YY1      19837271  We describe here the fourth reported case of lipoma showing a HMGA2-NFIB fusion, and the first one in a child.
However, in the pleomorphic adenoma expressing the HMGA2/WIF1 F00007 EWSR 171716 fusion transcript, we observed re-expression of HMGA2 wild-type 5 1 ZNF384 86 transcripts and very low levels of WIF1 expression. In our hospital's archives three more cases of MC were found, and we examined them looking for the supposedly more common HEY1-NCOA2 F00007 FAM13 231854 fusion, finding it in all three tumours but not in the case showing t(1;5) and 6 BRAF 1B 13 IRF2BP2-CDX1 gene fusion F00007 160344 7 BCR FGFR1 66 A PCM1-JAK2 fusion was recently characterised in MPDs F00007 187228 A JAZF1/PHF1 fusion gene was recently found in two tumors showing an 8 FGFR1 PLAG1 75 exchange between 6p and 7p rearrangement. F00007 268793 To date the JAZF1/SUZ12 gene fusion is by far the most frequent and 9 FGFR1 TACC1 82 seems to be the cytogenetic hallmark of ESN and LG-ESS. F00008 218848 KIAA1549-BRAF fusion transcripts have been detected in frozen tissue, 0 FGFR3 TACC3 20 however, methods for FFPE tissue have not been reported. The KIF5B-RET fusion leads to aberrant activation of RET kinase and is considered to be a new driver mutation of LADC because it segregates from mutations or fusions in EGFR, KRAS, HER2 and ALK, and a RET F00008 PDGFR 223276 tyrosine kinase inhibitor, vandetanib, suppresses the fusion-induced 1 FIP1L1 A 24 anchorage-independent growth activity of NIH3T3 cells. F00008 223474 With this system, we successfully identified a novel ALK fusion, KLC1- 2 ALK FN1 64 ALK. F00008 FOXO 210369 To our knowledge, this is the first description of a KLK2–ETV1 fusion 3 1 PAX3 22 event. F00008 191449 4 FRYL MLL 82 One patient had an MLL-ACTN4 fusion, 2 others an MLL-TET1 fusion The presence of coiled-coil domains in the resulting ktn1/ret fusion protein F00008 108504 suggests ligand-independent dimerization and thus constitutive activation 5 ATF1 FUS 14 of the ret TK domain. 
F00008 CREB3 236300 A LIFR-PLAG1 fusion was detected by RACE and then confirmed by 6 L1 FUS 11 FISH in one soft tissue ME tumor with tubular formation. F00008 268885 MALAT1-TFEB fusion gene was identified in 2 cases by polymerase chain 7 ERG FUS 08 reaction and direct sequencing. To focus on the identification of MLL-ACTN4 as a rare but recurrent MLL rearrangement, we present 2 cases of MLL-ACTN4 rearrangement, and F00008 124617 compare these patients with regard to diagnostic findings and clinical 8 GOPC ROS1 47 courses. Sequence analysis of reverse-transcriptional polymerase chain reaction F00008 186170 product revealed a novel variant form of MLL-ELL transcript in which 9 HAS2 PLAG1 60 MLL exon 10 was fused to ELL exon 3. F00009 209800 To check the expression of the MLL-EP300 fusion transcript in the bone 0 HEY1 NCOA2 53 marrow cells, reverse transcription PCR (RT-PCR) was performed Fifteen fusion transcripts were included: BCR-ABL1, PML-RARA, ZBTB16-RARA, RUNX1-RUNX1T1, CBFB-MYH11, DEK-NUP214, F00009 HMGA 230913 TCF3-PBX1, ETV6-RUNX1, MLL-AFF1, MLL-MLLT4, MLL-MLLT3, 1 2 LPP 11 MLL-MLLT10, MLL-ELL, MLL-MLLT1, and MLL-MLLT6 F00009 HMGA 229529 The resulting fusion genes, PICALM-AF10 and MLL-PICALM, have been 2 2 NFIB 41 found in aggressive hematologic malignancies Molecular analysis led to the identification of several MLL-SEPT6 fusion F00009 HMGA 184926 transcripts in all cases, including a novel MLL-SEPT6 rearrangement 3 2 WIF1 91 (MLL exon 6 fused with SEPT6 exon 2) In the April 2013 issue of Haematologica, Lee et al. have described the F00009 HOOK 243239 TET1 genomic breakpoints and clinical features of MLL-TET1 rearranged 4 3 RET 92 cases of acute leukemia F00009 IRF2BP 231507 Of the 13 patients, nine patients had KIF5B-RET, three patients had 5 CDX1 2 06 CCDC6-RET, and one patient had a novel NCOA4-RET fusion F00009 193184 To detect the NFATc2-EWSR1 fusion transcript, the primers 6 JAK2 PAX5 79 NFATc2_867_F and EWSR1_1561_R were used. 
Initially, three independent cases of MAST gene fusions were identified by F00009 241404 transcriptome analyses-ARID1A-MAST2, ZNF700-MAST1, and NFIX- 7 JAZF1 PHF1 25 MAST1 RT-PCR reactions to check the putative NIN-PDGFRB translocation were carried out with Immolase DNA polymerase heat activated (Bioline; London, United Kingdom) using NIN-1 to NIN-6 as forward primers and F00009 150873 PDGFRB-1 and PDGFRB-2 as reverse primers in separate and multiple 8 JAZF1 SUZ12 77 combinations. Finally, using quantitative reverse transcriptase PCR and immunohistochemistry (IHC) we identified the TPM3-NTRK1 rearrangement in a CRC clinical sample, therefore suggesting that this chromosomal translocation is indeed a low frequency recurring event in F00009 249627 CRC and that such patients might benefit from therapy with TRKA kinase 9 KIF5B RET 92 inhibitors F00010 100749 0 ALK KLC1 15 A single tumor exhibited a TPR/NTRK1 fusion (TRK-T2)

Table S6: Fusion PPIs tokens identified by ProtFus for 100 PubMed IDs

FP_ID  PubMed  Interaction Description
F000001  23719267  In mice, the FIG-ROS1 fusion gene has been shown to promote the formation of astrocytomas when ectopically expressed in the basal ganglia, and the EZR-ROS1 fusion gene has been shown to promote lung adenocarcinoma when ectopically expressed in lung epithelium
F000002  18594527  Expression of ACSL3 was also elevated in a panel of 'androgen-sensitive' (LAPC-4, LNCaP, MDA PCa2a, MDA PCa2b, and 22Rv1) versus 'androgen-insensitive' (PPC1, PC3, and DU145) prostate cancer cell line
F000003  12138125  We found that co-expression of Dyrk1 and Gli1 strongly induced Gli1-dependent gene transcription in the presence of the 3′GliBS-Luc reporter construct, but not in the presence of the mutant reporter construct, m3′GliBS-Luc
F000004  26288717  Recently, Ehrlicher and Pollak et al. demonstrated that in FSGS, a K255E mutation in ACTN4 changes the cellular biological properties in which increasing the affinity for actin increases cellular forces and work and decreases cellular movement
F000005  11886378  Hybridization of the MLL/AF4 probe combination to pure SEM cells resulted in 87% of the cells displaying the expected AF4x3/MLLx3/AF4 con MLLx2 hybridization pattern
F000006  23762276  Several genes involved in fusions have been reported to be fused or rearranged in other cases: AGPAT5, NOTCH2, PUM1, SEC22B, SGK1 and TRERF1 (all early or unclassified), while several are mutated at sequence level, notably SYNE2
F000007  20613748  The finding of mutated RAF genes in prostate cancer is consistent with a previous observation that oncogenic BRAFV600E can initiate prostate cancer in mouse models16 and may have major implications for therapy
F000008  23543667  The large amplification on 7 and 12 could potentially activate the RAS-RAF-MEK pathway by amplification of the BRAF gene located on chromosome 7 or amplification of KRAS located on chromosome 12
F000009  11310834  These findings indicate that MSN may act as an alternative fusion partner for activation of ALK in ALCL and provide further evidence that oncogenic activation of ALK may occur at different intracellular locations
F000010  23093608  BWA found six discordant alignments with MAPQ = 0 between ANKHD1 and both PCDH1 and ANKHD1–EIF4EBP3 (ENSG00000254996).
F000011  21622959  Juxtaposition of the ARID1A promoter would place control of MAST2 which is downstream of the RB1 pathway, as evidenced by the preponderance of E2F sites in the ARID1A promoter and by the observation that ARID1A is regulated in a cell cycle-dependent manner
F000012  23828314  In response, we demonstrate through both validation and extensive clinical experience that a break-apart strategy probe set for TFE3, including a chromosome X centromere probe as a control and a TFE3/ASPSCR1 dual-color, single-fusion reflex probe set, is an excellent test to aid in the identification of tumors with Xp11.2 rearrangement.
F000013  15884099  Simultaneous expression of the EWSR1-ATF1 and MITF-M transcripts in CCS has led to the proposal that the MITF-M promoter is transactivated by EWSR1-ATF1
F000014  20554525  Grb2 has been shown to bind NPM-ALK and ATIC-ALK in previous works
F000015  21305644  In the remaining cell lines, two fusion genes, BCAS4-BCAS3 and CCDC6-RET, were detected from the breast and thyroid cancer cell lines, MCF-7 and TPC-1, respectively, and thus validating the microarray's ability to detect fusion genes outside the group of positive controls
F000016  9747873  The SH2-containing adapter protein GRB10 interacts with BCR-ABL
F000017  23904814  It was demonstrated by preclinical studies that BCR-JAK2 induces STAT5 activation and elicits BCRxL gene expression
F000018  20440281  The BIRC3–MALT1 fusion protein and the overexpressed MALT1 protein in the t(14;18) have been shown to activate NF-κB and thereby promote cellular proliferation and resistance to apoptosis
F000019  18406326  Brd2- and Brd3-associated chromatin is significantly enriched in H4K5, H4K12, and H3K14 acetylation and contains relatively little dimethylated H3K9
F000020  26551281  In contrast, BRD4 is an important member of the bromodomain and extra-terminal domain proteins (the BET family) known to regulate cell cycle progression, survival signaling, chromatin structure, epigenetic memory and embryonic stem cell development
F000021  21247443  Interphase FISH showing amplified signals of BSG and NFIX (left) and NOTCH1 and NUP214 (right) in KPL-4
F000022  22327622  Complementary DNA (cDNA) sequencing identified 75 read pairs spanning the fusion junction (data not shown) and an 89.8-fold increase in 3′ ALK expression beginning at exon 20 relative to exons 1–19, suggesting that the C2orf44-ALK fusion transcript results in ALK kinase overexpression
F000023  18451133  Exclusive usage of CANT1 exon 1a as first exon in CANT1-ETV4 fusion transcripts might have various explanations, including the positions of breakpoints of the specific genomic rearrangement and the prostate-specific expression of transcripts starting at exon 1a
F000024  13679433  This alternative splicing occurs downstream of the breakpoint identified in IMT and does not affect the sequence of the CARS-ALK chimeric protein
F000025  23150705  The fusion of the intracellular kinase-encoding domain of RET to CCDC6 and NCOA4, among others, gives rise to ligand-independent activation of RET.
F000026  23382248  Expression of an RNA chimera fusing CCND1 and TROP2 (TACSTD2) transcripts has been demonstrated to result in immortalization and transformation of human epithelial cells
F000027  22215748  To confirm this activity, we showed that crizotinib also inhibits ROS1 phosphorylation in HEK 293 cells transfected with a CD74-ROS1 fusion gene expression construct
F000028  23769422  Interestingly, we have not encountered the fusion CDH11-USP6 in NF or soft tissue ABC, and the only soft tissue ABC characterized at the cytogenetics level showed a chromosomal rearrangement consistent with the presence of COL1A1-USP6
F000029  11930009  The central role of Cdk6 in cell cycle progression and its recurrent alteration in human cancer suggest that the CDK6-MLL juxtaposition may have been a cooperating mutation in leukemogenesis in patient 38
F000030  16736500  The fact that both the CHCHD7-PLAG1 and TCEA1-PLAG1 fusions are caused by cytogenetically cryptic rearrangements in tumors with different karyotypic abnormalities or normal karyotypes indicates that PLAG1 gene fusions are more common than originally suggested by conventional cytogenetics
F000031  21813156  One CIC-DUX4–positive tumor showed membranous CD99 positivity, 2 showed focal S100 positivity, and 1 showed focal CD57 positivity
F000032  23817572  Another new BRAF alteration was identified in ICGC_PA65, resulting in a three amino acid insertion (p.R506_insVLR) in the interdomain cleft of BRAF - a structural region linked to its activity17 and homodimerization
F000033  24142740  The tyrosine kinase domain of ALK is constitutively phosphorylated by the formation of CLTC-ALK, the tumorigenicity of which has been verified in vitro and in vivo
F000034  12917640  CLTC encodes a major subunit of clathrin, a multimeric protein on cytoplasmic organelles, and is a known recurrent fusion partner of the ALK tyrosine kinase gene in anaplastic large-cell lymphoma and inflammatory myofibroblastic tumors
F000035  15735689  The novel fusion partners appear well suited to drive USP6 transcription in the bone/mesenchymal context: osteomodulin is expressed strongly in osteoblastic lineages, and the COL1A1 promoter has an oncogenic role in the mesenchymal cancer dermatofibrosarcoma protuberans
F000036  11420709  In addition, COL1A1-PDGFB transfected cell supernatants significantly stimulated fibroblastic cell growth, through the activation of the PDGFB receptor pathway
F000037  15735689  Since the point of fusion is highly specific for PDGFB but spread over almost the entire locus for COL1A1, the role of the COL1A1 gene may be simply to up-regulate the expression of PDGFR, which acts as an auto- or paracrine growth factor
F000038  10987300  Consequently, COL1A2-PLAG1 encodes a full-length PLAG1 protein and a short, COOH-terminal-truncated, COL1A2 protein
F000039  21092583  Thus, it is expected that the novel FUS/CREB3L1 chimera will have a similar impact at the cellular level as the much more common FUS/CREB3L2 fusion protein.
F000040  20075182  Although the CREB3L2-PPARG fusion is rare, its existence points to a limitation of the PPFP RT-PCR assay if the goal is to detect PPARγ fusions as markers of potential thyroid malignancy
F000041  24798186  Although no samples in our series were positive for MLL-ELL, MLL-MLLT1, ZBTB16-RARA, RBM15-MKL1, or KAT6A-CREBBP, probably because of the low frequency of occurrence of these fusion genes, 6, 7, 9, 24, 25 and 26 the ability of this method to detect these rearrangements was verified by testing positive controls for each of them
F000042  17334997  The CRTC1-MAML2 fusion protein acts by inducing transcription of cAMP/CREB target genes, and this activity is crucial for the transforming properties of the protein.
F000043  18050304  Both gene fusions seem to result in an identical tumor phenotype and the fusion genes CRTC1-MAML2 and CRTC3-MAML2 may play a similar role in the development of mucoepidermoid carcinomas.
F000044  10029085  The t(3;8) results in promoter swapping between PLAG1 and the constitutively expressed gene for beta-catenin (CTNNB1), leading to activation of PLAG1 expression and reduced expression of CTNNB1.
F000045  15182431  In addition, exogenous expression of MEF2D-DAZAP1 and DAZAP1-MEF2D promoted the growth of HeLa cells.
F000046  20017906  In addition, CDK2 showed an increased affinity for cytoskeletal proteins in cells expressing FUS-DDIT3 and DDIT3.
F000047  23224603  Although two reciprocal chimeric products, NUP98-DDX10 and DDX10-NUP98, were predicted, only NUP98-DDX10 appears to be implicated in tumorigenesis.
F000048  18794152  Therefore, the FLJ35294-ETV1 and CANT1-ETV4 fusions can be categorized as class II gene fusions in prostate cancer which include rearrangements involving fusions from prostate-specific androgen-induced 5′ partner genes (21); whereas DDX5-ETV4 may represent a class IV gene fusion, in which non-tissue-specific promoter elements drive ETS gene expression
F000049  24847761  Although the expression of EIF3E-RSPO2 or PTPRK-RSPO3 was not detected in any of the NSCLCs, EIF3E-RSPO2 and PTPRK-RSPO3 fusion transcripts were detected in two CRCs and one CRC, respectively
F000050  10995463  These studies demonstrate the enhancing effect of MLL-ELL on the proliferative potential of myeloid progenitors as well as its causal role in the genesis of acute myeloid leukemias.
F000051  18594010  TAE684 inhibited the growth of one of three (H3122) EML4-ALK-containing cell lines in vitro and in vivo, inhibited Akt phosphorylation, and caused apoptosis.
F000052  16397222  The EPC1/PHF1 chimeric fusion led to an open reading frame containing 581 amino acid residues from EPC1, six additional amino acid residues upstream from the initial methionine, and the entire PHF1 protein sequences consisting of 567 amino acids, in total, 1,154 amino acids in the predicted chimeric protein
F000053  17690697  Interestingly, ERC1, H4(D10S170) and TPM3 are three PDGFRB partners in myeloid malignancies that are also involved in human papillary thyroid carcinoma: ERC1 and H4(D10S170) fuse with RET as a result of t(10;12)(q21;p13) and of inv(10)(q11.2q21), producing the ERC1-RET and the H4(D10S170)-RET autophosphorylated tyrosine kinase, respectively;4, 5 TPM3 rearranges with the nearby neurotrophic tyrosine kinase receptor type 1 (NTRK1/1q23) gene
F000054  20526349  NIH3T3 cells over-expressing SLC45A3-BRAF formed rapidly growing tumors in nude mice (Fig. 3b); however NIH3T3 cells over-expressing ESRP1-RAF1 did not form tumors (data not shown), which may reflect signaling differences between the different fusion products
F000055  10500813  Hyperdiploidy > 50 chromosomes and ETV6-CBFA2 fusions have been used to identify low-risk cases, and BCR-ABL and MLL-AF4 to define high-risk leukemias.
F000056  20033038  ETV6-ITPR2, an expressed, in frame fusion gene generated by a 15Mb inversion in the primary breast cancer PD3668a
F000057  17077140  Expression of TEL-JAK2 in primary human hematopoietic cells drives erythropoietin-independent erythropoiesis and induces myelofibrosis in vivo.
F000058  14668342  A highly conserved NTRK3 C-terminal sequence in the ETV6-NTRK3 oncoprotein binds the phosphotyrosine binding domain of insulin receptor substrate-1: an essential interaction for transformation.
F000059  20190817  ETV6/RUNX1 abrogates mitotic checkpoint function and targets its key player MAD2L1
F000060  18094413  CONCLUSION: We identified the presence of either EWSR1-CREB1 or EWSR1-ATF1 in all the cases, strengthening the concept of chromosomal promiscuity between AFH and clear cell sarcoma.
F000061  22570737  Overexpressing EWSR1-DDIT3 under the control of a CMV promoter using the pFLAG-CMV4 EWSR1-DDIT3 expression vector significantly repressed Opn and Col11a2 promoter activities by 79% and 78%, respectively; however, overexpressing EWSR1 and DDIT3 did not
F000062  8162068  Identical EWS nucleotide sequences found in the EWS/FLI-1 fusion transcripts are fused to portions of ERG encoding an ETS DNA-binding domain resulting in expression of a hybrid EWS/ERG protein.
F000063  10523827  When EWS/ETV1 or EWS/FLI1 expressing NIH3T3 cells are injected into SCID mice, tumors form more often and faster than with NIH-3T3 cells with empty vector controls.
F000064  17172842  EWS-FLI, EWS-ERG, and EWS-FEV caused NIH3T3 cells to exhibit anchorage-independent growth whereas EWS-ETV1 and EWS-ETV4 did not.
F000065  8516324  Deletion of either the EWS domain or the FLI1 corresponding to the DNA-binding domain totally abrogated the ability for EWS-FLI1 to transform 3T3 cells.
F000066  24993903  The presence of EWSR1-NFATC2 fusion, focal small-round-cell morphology, and CD99 immunopositivity in the reoccurrence favor an Ewing-like sarcoma
F000067  18855877  Western blots further show that an isoform of the native NR4A3 receptor lacking the C-terminal domain is very highly expressed in tumours positive for EWSR1/NR4A3, and co-transfections of this isoform along with EWSR1/NR4A3 indicate that it may negatively regulate the activity of the fusion protein on the PPARG promoter
F000068  22467249  The EWSR1 gene encodes a multifunctional protein, member of the ten-eleven translocation (TET) family of proteins, that is involved in various cellular processes, including gene expression, cell signalling and ribonucleic acid (RNA) processing and transport
F000069  18383210  In the EWSR1-PBX1 fusion gene detected, the 5′ transactivation domain of EWSR1 and the 3′ DNA-binding domain of PBX1 were retained
F000070  20203285  Knockdown of EWS-POU5F1 in the t(6;22) sarcoma-derived GBS6 cell line resulted in a significant decrease of cell proliferation because of G1 cell cycle arrest associated with p27(Kip1) up-regulation.
F000071  21113140  The EWSR1–SMARCA5 chimeric cDNA in the LITMUS38i vector was amplified by PCR to add the epitope tag FLAG and the Kozak consensus translation initiation sequences into the N-terminal region
F000072  23329308  SP3 is a transcription factor belonging to the Sp/XKLF family able to recognize GC-rich DNA motifs, found in many promoters and enhancers of housekeeping genes
F000073  12498708  In this issue of Cancer Cell, now show that the resulting EWS-WT1 gene-fusion product leads to overexpression of BAIAP3, a protein implicated in regulated exocytosis.
F000074  23630070  The putative EWSR1–YY1 encoded protein would contain, as other EWSR1-fusion-encoded proteins, the transactivation domain of EWSR1 and the DNA-binding domain of YY1
F000075  23329308  Transcription factor ZNF384/CIZ/NMP4 plays a role in bone metabolism and spermatogenesis
F000076  25040262  In vitro experiments demonstrated that FAM131B-BRAF is also an activator of the MAPK pathway
F000077  11746971  By analogy with data obtained from previously characterized fusion genes involving FGFR1 and BCR/ABL, it is likely that the oligomerization domain contributed by BCR is critical and that its dimerizing properties lead to aberrant FGFR1 signaling and neoplastic transformation.
F000078  18059337  Using primers located in exon 1 of FGFR1 and in exon 4 of PLAG1, we could show that 9 of the 10 cases with r(8) expressed FGFR1–PLAG1 fusion transcripts
F000079  21305644  Quantitative analyses of mitoses revealed that cells expressing FGFR3-TACC3 or FGFR1-TACC1 exhibit three to five times more errors in chromosomal segregation compared with control cells
F000080  26425723  Comprehensive genomic profiling of the original cervical biopsy was pursued to identify additional therapeutic options and revealed the following: FGFR3–TACC3 fusion (breakpoints at FGFR3 intron 18 and TACC3 intron 7), BRAF 3′ tandem duplication (breakpoint in intron 9 with duplication of exons 10–18), activating PIK3CA missense mutation (E545K), CDKN2A loss, and subclonal activating missense mutations in KRAS (G12C), and HRAS (G13R)
F000081  12842979  Recently, a novel tyrosine kinase that is generated from fusion of the Fip1-like 1 (FIP1L1) and PDGFR alpha (PDGFRA) genes has been identified as a therapeutic target for imatinib mesylate in hypereosinophilic syndrome (HES).
F000082  22570254  Encoded by the first 23 exons, the FN1 portion of FN1-ALK retains a diverse set of binding domains involved in fibronectin self-association and interaction with other ECM components, which could potentially provide strong activating signal to ALK
F000083  25806826  PAX3-FOXO1 may contribute to tumor formation by inhibiting the tumor suppressor activities which are characteristic of both FOXO family members and TGF-β pathways
F000084  18195096  MEIS1 expression was substantially greater in all of the leukemia cases than in the cells with MLL-FRYL
F000085  16849546  Because most sarcomas bearing unique chromosomal translocations are believed to originate from common progenitor cells, and because MPCs populate most organs, we expressed the sarcoma-associated fusion proteins FUS/TLS-CHOP, EWS-ATF1, and SYT-SSX1 in MPCs and tested the tumorigenic potential of these cells in vivo.
F000086  15640831  The case with the variant FUS/CREB3L1 fusion also had a break in exon 5 of the CREB gene, but as the breakpoint in FUS was in exon 9, a larger portion of FUS was included in the fusion gene
F000087  26148230  We found that FUS–ERG mainly binds non-promoter regions in a complex consisting of other ETS factors, GATA2, LMO2, LYL1, RUNX1, TAL1 and RNAPII
F000088  23052255  The GOPC-ROS1 fusion protein has been shown to have constitutively active kinase activity and its transforming potential has been demonstrated in a mouse transgenic model where it resulted in glioblastomas in an Ink4a;Arf-null background
F000089  15642402  In this case, we show that, even with seemingly normal chromosome 8 on conventional cytogenetic analysis, the joining of 8q12.1 to 8q24.1, with subsequent PLAG1-HAS2 fusion, occurred
F000090  24839999  The absence of HEY1-NCOA2 fusion in some cases has been explained as being due to methodological inadequacy (16,19) but the possibility of other disease-specific fusion gene(s), and thus pathogenetic heterogeneity in this diagnostic entity, should not be ruled out
F000091  16375854  These results indicate that in vivo overexpression of HMGA2-LPP promotes chondrogenesis by upregulating cartilage-specific collagen gene expression through the N-terminal DNA binding domains.
F000092  19837271  In the pediatric lipoma studied here, the sequence analysis of the HMGA2–NFIB fusion gene revealed that the chimeric transcript was identical to some described previously in pleomorphic adenomas of the salivary glands [24] and in lipomas
F000093  19837271  Taken together these data suggest that in pleomorphic adenoma PA37 cells, the identified HMGA2 fusion transcripts containing intronic HMGA2 sequences are expressed at lower levels than the HMGA2/WIF1 fusion transcript
F000094  17639057  The ability of HOOK3-RET protein to induce cell transformation was studied in an NIH3T3 cells focus assay
F000095  23185413  The IRF2BP2-CDX1 fusion is thus suggested to take part in MC tumorigenesis and/or progression
F000096  25515960  PAX5-JAK2 binds to PAX5 target loci and activates these genes
F000097  18722875  In the present study, we show that the low-grade endometrial stromal sarcoma cell line JHU-ESS1, established by Fresia et al. [9], which carries a der(7)t(6;7)(p21;p22), also harbors a JAZF1/PHF1 fusion gene
F000098  26879382  One out of six samples was positive both by RT-PCR and by FISH, indicating lower prevalence of JAZF1/SUZ12 gene fusion in extrauterine in comparison to uterine ESSs
F000099  22327624  The LADCs that were positive for the KIF5B-RET fusion showed twofold to 30-fold higher RET expression than non-cancerous lung tissues
F000100  22347464  Infection of 3T3 cells with the virus expressing KLC1-ALK readily produced multiple transformed foci in culture and subcutaneous tumors in a nude mouse tumorigenicity assay (Figure 4), confirming the potent transforming ability of KLC1-ALK