
Supporting Information

Lee et al. 10.1073/pnas.1309293111 (www.pnas.org/cgi/content/short/1309293111)

SI Materials and Methods

Genome Sequence and Annotation. We obtained the mouse genome sequence via the BSgenome.Mmusculus.UCSC.mm9 package in Bioconductor (1). We downloaded the corresponding genome annotation coordinates directly from www.genome.ucsc.edu (version mm9).

Information Content of Locus Expression Signatures. To assess how much information about downstream transcriptional regulation was contained in a given signature, without the need to specify a particular regulatory mechanism, we summed the squares of the t-values $t_{gm}$ corresponding to the regression coefficients $\beta_{gm}$:

$$\chi^2_m = \sum_g t_{gm}^2 .$$

To determine the statistical significance of the $\chi^2$ statistic, we constructed a null distribution as follows. We performed 100 independent permutations by randomizing the expression levels across genes for each tumor. We did not permute insertion loci because we wanted to preserve the correlation structure between insertion loci. For the randomized data sets, we performed multiple linear regression to calculate the t-value and corresponding $\chi^2$ statistic for each locus with the same method as for the actual data set. The mean $\chi^2$ statistic averaged over the 100 randomized data sets for each locus is shown in Fig. S1C as a purple bar.

To investigate the fraction of the variance in expression levels of each tumor that is accounted for by our locus expression signatures, we performed multiple linear regression of the mRNA expression levels for each held-out tumor on the locus expression signature (LES) matrix constructed using all other tumors and calculated coefficients of determination ($R^2$). We used either (i) the 13 insertion loci that occur in at least 10 tumors or (ii) the 87 insertion loci that occur in at least 3 tumors. For comparison, we also performed multiple linear regression using only the 25% most variable genes. To construct a null distribution, we used 100 independent random permutations of all genes for each tumor.

Permutation to Calculate False Discovery Rate. To calculate false discovery rates (FDR) in each analysis, we performed permutations of locus expression signatures across genes for each locus. Then, we applied the same procedure as used in each analysis to calculate statistics such as the t-value or P value for the randomized data sets. The FDR corresponding to a given P value threshold was computed as the ratio of the number of associations with a P value below threshold, averaged over 1,000 randomized data sets, to the number of associations with a P value below threshold for the real data set. For the t-value, we instead counted the associations whose absolute t-value exceeds a given t-value threshold.

Forward Selection of Gene Ontology Categories. For each gene ontology (GO) category, we applied the Wilcoxon–Mann–Whitney (WMW) test to detect differences in distribution between the locus expression signature values of genes within the GO category and those of the other genes. At each step, we subtracted the mean signature value of the genes in the gene set with the lowest P value from all genes in that gene set. The P values were then recalculated, and the procedure was repeated until even the most significantly regulated gene group had $P > 10^{-5}$, which corresponds to an FDR < 0.1%. Statistical significance was determined by performing independent random permutations of the signature values for each gene. The FDR corresponding to a given P value threshold was computed as the ratio of the number of GO categories with a P value below threshold, averaged over 50 randomized data sets, to the number of GO categories with a P value below threshold in the real data set. A 1% FDR based on the empirical permutation test corresponds to a WMW test $P < 10^{-4}$.

Low-Complexity Sequence Features (Fig. S4). To eliminate the potential confounding contribution from low-complexity sequence features to LESs, we calculated the frequency of each base and of the CpG dinucleotide across the transcribed region for each gene. Next, we computed the residuals from a multiple linear regression of each LES on these five frequencies (without an intercept).

We calculated DNA base composition and CpG content for 1-kb windows up to 200 kb upstream or downstream. The base composition indicator variable $N^{\mathrm{up}}_{gbi}$ for base b, gene g, and distance i from the TSS was defined as follows:

$$N^{\mathrm{up}}_{gbi} = \begin{cases} 1 & \text{if the base at locus } i \text{ is } b \\ 0 & \text{otherwise.} \end{cases}$$

The base composition $N^{\mathrm{down}}_{gbi}$ for the downstream sequence is calculated with the same procedure using the downstream sequence. The CpG content $N^{\mathrm{up}}_{g,\mathrm{CpG},i}$ and $N^{\mathrm{down}}_{g,\mathrm{CpG},i}$ for upstream and downstream sequences was calculated using the same procedure.

To calculate the base composition of the upstream sequence for each window, $N^{\mathrm{window,up}}_{gbw}$, we divided the 200-kb upstream sequence into 1-kb window intervals. For each such window, we calculated the upstream base composition as follows:

$$N^{\mathrm{window,up}}_{gbw} = \sum_{i \in w} N^{\mathrm{up}}_{gbi} .$$

Here, w represents the wth window of the upstream sequence for gene g. We also calculated the downstream base composition $N^{\mathrm{window,down}}_{gbw}$ and the CpG content $N^{\mathrm{window,up}}_{g,\mathrm{CpG},w}$ and $N^{\mathrm{window,down}}_{g,\mathrm{CpG},w}$ using the same procedure. We measured coefficients of determination ($R^2$) by regressing the LESs on all low-complexity sequence features for window w (without an intercept). We used the residuals of this model fit in further transcription factor (TF)-locus association analyses.

TF Binding Affinity Profiles. We used the convert2psam utility from the REDUCE Suite version 2.0 software package (www.bussemakerlab.org) to convert each position weight matrix (PWM) from JASPAR to a position-specific affinity matrix (PSAM) (2); pseudocounts equal to 1 were added to the PWM at each position, and the resulting base counts were divided by that of the most frequent base at each position to get an estimate for the relative affinity associated with each point mutation away from the optimal binding sequence. The resulting PSAM collection was used to compute a weighted promoter affinity for each gene. All putative individual binding sites in the genomic region from 200 kb upstream to 200 kb downstream of the TSS of each gene with a predicted relative affinity of at least 0.1 were identified and scored using the AffinityProfile utility in the REDUCE Suite.

Inferring Length Scale Parameters. For each choice of the regulatory scale parameter $\lambda$ in the range from 1,000 to 100,000 base pairs, we obtained a total weighted upstream affinity by summing the affinity of all upstream binding sites using a weight $\exp(-d/\lambda)$, where d is the (absolute) distance of a given binding site from the transcription start site (TSS). Then, we computed TF-specific and locus-specific parameters $\lambda^{\mathrm{up}}_{\phi m}$ that maximized the correlation coefficient between the total weighted upstream affinity and each LES, resulting in an optimized total weighted affinity. An analogous procedure was performed for the downstream sequence. The sum of the upstream and downstream total weighted affinities was used for mapping the locus-TF network and the drug-TF-locus network.

Myc Validation of Result. We downloaded gene expression profiles obtained by ref. 3 for transgenic mice that conditionally express the human MYC cDNA in T-cell lymphocytes (GEO accession number GSE10200). In this transgenic mouse, doxycycline treatment suppresses MYC expression. We used the two most extreme doxycycline concentrations of 0 and 20 ng/mL. To obtain an estimate for the differential expression level in response to inactivation of Myc, we subtracted the treatment/reference log2-ratio at 0 ng/mL from that at 20 ng/mL. These values served as the dependent variable in the regression on TF affinity profiles.

Mapping Drug-Locus Associations. Genome-wide mRNA expression data for cultured human cells treated with bioactive small molecules were downloaded from the Connectivity Map website (www.broadinstitute.org/cmap/). This collection contains 7,056 expression profiles for 1,309 distinct compounds. The experiments were carried out on two different Affymetrix GeneChip designs (HG-U133A and HT-HG_U133A) and in four different cell lines (the breast cancer epithelial cell line MCF7, the prostate cancer epithelial cell line PC3, the nonepithelial leukemia cell line HL60, and the nonepithelial melanoma cell line SKMEL5). We followed the preprocessing and normalization steps described in ref. 4 to obtain the expression log2 ratio between drug treatment and control. To combine the human drug response expression data with our mouse-based locus expression signatures, we needed to map human Affymetrix probe IDs to mouse RefSeq IDs. For this, we used human-mouse orthology tables downloaded from UCSC (www.genome.ucsc.edu/). We averaged over the Affymetrix probes mapping to the same mouse RefSeq ID, resulting in 9,757 genes shared between both data sets.

To obtain robust results, we filtered out noninformative genes using two criteria. First, only mouse genes showing a high variance across tumors (upper 50th percentile) were retained. Second, we deleted human genes whose expression was detected in neither treatment nor control. Next, we calculated averages of gene expression levels across profiles for the same compound in different cell types, resulting in 1,309 drug signatures. Genome-wide linear regression of each of these on the locus expression signatures was performed. To determine the statistical significance of each putative drug-locus association, we performed 100 random permutations of drug signatures and repeated the analysis. A 1% FDR corresponded to a regression coefficient whose t-value has an absolute value >7.

Statistical significance for TF-drug associations was also determined by performing 100 independent random permutations of drug response profiles, resulting in a 5% FDR for family-level and individual PSAMs at $P < 3.0 \times 10^{-3}$ and $P < 6.3 \times 10^{-5}$, respectively. We adopted the same statistical significance criterion for the drug-locus associations and TF-locus associations as in previous analyses.

Human Mutation Expression Signatures. The acute myeloid leukemia data set was downloaded from The Cancer Genome Atlas data portal (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp). We downloaded level 3 gene expression levels (Affymetrix HG-U133 platform) and level 2 somatic mutation data for 197 acute myeloid leukemia tumor samples and found that both data types were available for 194 tumor samples. For the somatic mutation data, we considered only nonsilent somatic mutations and determined, for each tumor, whether a gene has at least one nonsilent mutation. For each gene, we fit a linear model to explain its mRNA expression level in terms of the mutation status at the RUNX1 and TP53 loci. The coefficients of this model, taken together across all genes, then constituted the mutation expression signature for each locus. We mapped mouse insertion locus genes to their human orthologs using the human-mouse orthology tables downloaded from genome.ucsc.edu.
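As an illustration of the permutation-based FDR estimate described under Permutation to Calculate False Discovery Rate, the following Python sketch shuffles one signature across genes and computes the FDR as the ratio of the mean number of null associations below a P value threshold to the number of real associations below it. The function names and toy inputs are hypothetical and not taken from the original analysis; the statistics that produce the P values (regression, WMW test) are assumed to be computed elsewhere.

```python
import random


def permute_across_genes(signature, rng):
    """Return a copy of one locus expression signature with values shuffled across genes."""
    shuffled = list(signature)
    rng.shuffle(shuffled)
    return shuffled


def empirical_fdr(real_pvalues, null_pvalue_sets, threshold):
    """Estimate the FDR at a P value threshold.

    real_pvalues: P values of all associations in the real data set.
    null_pvalue_sets: one list of P values per randomized data set.
    Returns NaN when no real association passes the threshold
    (an assumption of this sketch; the ratio is undefined in that case).
    """
    n_real = sum(p < threshold for p in real_pvalues)
    if n_real == 0:
        return float("nan")
    null_counts = [sum(p < threshold for p in pvals) for pvals in null_pvalue_sets]
    mean_null = sum(null_counts) / len(null_counts)
    return mean_null / n_real


# Toy example: 2 real hits below 0.01, and on average 0.5 null hits.
real = [0.001, 0.002, 0.2, 0.5]
nulls = [[0.001, 0.3, 0.4, 0.9], [0.2, 0.3, 0.6, 0.7]]
fdr = empirical_fdr(real, nulls, threshold=0.01)
```

For thresholds on the t-value, the same ratio would be applied to counts of associations whose absolute t-value exceeds the threshold.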

1. Gentleman RC, et al. (2004) Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 5(10):R80.
2. Bussemaker HJ, Foat BC, Ward LD (2007) Predictive modeling of genome-wide mRNA expression: From modules to molecules. Annu Rev Biophys Biomol Struct 36:329–347.
3. Shachaf CM, et al. (2008) Genomic and proteomic analysis reveals a threshold level of MYC required for tumor maintenance. Cancer Res 68(13):5132–5142.
4. Lamb J, et al. (2006) The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science 313(5795):1929–1935.

Fig. S1. (A) Number of tumors carrying each insertion (Upper Right of matrix) and number of tumors carrying both insertions together (within matrix). The color represents –log10(P value) of the Fisher exact test of independence between the occurrences of each pair of insertion loci. (B) Pairwise Pearson correlation of each pair of LESs. (C and D) Information content of each LES. (C) χ2 statistic calculated by summing the squares of the t-values corresponding to each LES (blue) and the mean of the χ2 statistic averaged over 100 randomly permuted data sets (pink) (SI Materials and Methods). (D) The mean coefficient of determination across all tumors from multiple linear regression of the expression levels of each tumor on the LESs. Blue represents the results based on the 13 insertion loci that occur in at least 10 tumors, and red represents the results based on the 87 insertion loci that occur in at least 3 tumors. We performed the regressions on all genes, on the 25% most variable genes across tumors, and on permuted genes.
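The co-occurrence test in panel A can be sketched in Python as a Fisher exact test on a 2×2 table of tumor counts for one pair of insertion loci. This is a minimal illustration, not the authors' code: the caption does not say whether the test was one- or two-sided, so the two-sided variant is an assumption here, and the function names and counts are hypothetical.

```python
from math import comb, log10


def fisher_exact_two_sided(both, only_first, only_second, neither):
    """Two-sided Fisher exact P value for a 2x2 table of tumor counts.

    both: tumors carrying both insertions; only_first / only_second:
    tumors carrying exactly one of them; neither: tumors carrying none.
    """
    row1 = both + only_first          # tumors with the first insertion
    row2 = only_second + neither      # tumors without the first insertion
    col1 = both + only_second         # tumors with the second insertion
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x co-occurrences.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(both)
    # Sum over all tables that are at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def neg_log10_p(both, only_first, only_second, neither):
    """Value shown as the color scale, -log10(P value)."""
    return -log10(fisher_exact_two_sided(both, only_first, only_second, neither))


# Hypothetical counts: 3 tumors with both insertions, 3 with neither.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```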

Fig. S2. (A) LES values for genes in the surrounding regions of insertions for NOTCH1 and MYC insertions. (B) Cumulative distribution function plot of p19ARF and p53 expression signature values of genes in M phase of the mitotic cell cycle.

Fig. S3. Functional annotation using GO. Heatmap of significant GO categories for each LES at FDR 1%.
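The forward selection of GO categories described in SI Materials and Methods can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the WMW P value uses a two-sided normal approximation without tie correction, a `max_steps` guard is added to guarantee termination, and all function names, category names, and toy values are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wmw_pvalue(signature, members):
    """Two-sided WMW P value comparing category members with all other genes."""
    genes = list(signature)
    ranks = average_ranks([signature[g] for g in genes])
    n1 = sum(g in members for g in genes)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        return 1.0
    rank_sum = sum(r for g, r in zip(genes, ranks) if g in members)
    u = rank_sum - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs(z) / sqrt(2))


def forward_select(signature, categories, p_stop=1e-5, max_steps=100):
    """Pick the most significant category, center its genes, and repeat."""
    sig = dict(signature)  # work on a copy
    selected = []
    for _ in range(max_steps):
        pvals = {name: wmw_pvalue(sig, genes) for name, genes in categories.items()}
        best = min(pvals, key=pvals.get)
        if pvals[best] > p_stop:
            break
        selected.append((best, pvals[best]))
        members = [g for g in categories[best] if g in sig]
        mean = sum(sig[g] for g in members) / len(members)
        for g in members:
            sig[g] -= mean  # remove the explained shift before re-testing
    return selected


# Toy signature: category "A" is strongly shifted, category "B" is not.
toy = {f"g{i}": 5 + 0.01 * i for i in range(10)}
toy.update({f"g{i}": (i - 24.5) * 0.1 for i in range(10, 40)})
toy_categories = {
    "A": {f"g{i}" for i in range(10)},
    "B": {f"g{i}" for i in list(range(15, 20)) + list(range(30, 35))},
}
selected = forward_select(toy, toy_categories)
```

In this toy case only category "A" is selected; once its mean is subtracted, neither category remains significant and the loop stops.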

Fig. S4. Effect of low-complexity sequence on LESs. R2 from a linear regression model between each LES and the DNA base composition, including the frequency of each base and the CpG content. Red represents downstream and blue represents upstream of the TSS. The DNA base compositions for each gene are calculated in 1-kb windows at increasing distance from the TSS and are used as independent variables in a multiple linear model of the LESs to calculate R2 (SI Materials and Methods).
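The windowed base composition used as predictors here can be sketched in Python as below. This is a minimal illustration, assuming the input string is ordered by increasing distance from the TSS and that a CpG is counted only when both bases fall in the same window; the function name and the toy sequence are hypothetical.

```python
def window_composition(sequence, window=1000):
    """Count A, C, G, T and CpG in consecutive windows of a flanking sequence.

    Each returned dict corresponds to one window w and holds the sums of the
    per-position indicator variables over the positions in that window.
    """
    sequence = sequence.upper()
    counts = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        row = {base: chunk.count(base) for base in "ACGT"}
        row["CpG"] = chunk.count("CG")  # CG dinucleotides fully inside the window
        counts.append(row)
    return counts


# Toy example with 4-bp windows instead of 1-kb windows.
toy_windows = window_composition("ACGTCGAA", window=4)
```

For a 200-kb flanking region and the default 1-kb window this yields 200 rows per gene, one per window.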

Fig. S5. Validation of regulation under MYC expression signature. Scatter plot of –log10(P value) for each TF. The significance levels of differential TF activities in response to MYC inactivation (x axis) and in response to MYC insertion (y axis) are shown.

Lee et al. www.pnas.org/cgi/content/short/1309293111 7of12 aedu.Rdbxrpeet rg htfnto shsoedaeyaeihbtr n are and inhibitors deacetylase histone as function that are and represents hibitors box Red drug. same S6. Fig. e tal. et Lee rglcsascain.Sgiiatdu-ou soitosa D % ti ae nepeso rflsaeae vrtecl ye nresponse in types cell the over averaged profiles expression on based is It 1%. FDR at associations drug-locus Significant associations. Drug-locus www.pnas.org/cgi/content/short/1309293111 MYC seii,adgensosdusta ucina oosmrs niiosadare and inhibitors topoisomerase as function that drugs shows green and -specific,

AC153556.2 Mycn Myc Rras2 Med20|Ccnd3 mmu−mir−106a Notch1 Pvt1 Runx1 Gfi1 Rasgrp1 Pim1 Myb p19 ko MYCN p53 ko mebendazole clioquinol miconazole seneciphylline sulfapyridine Prestwick−983 0297417−0002B LY−294002 sirolimus acid etacrynic norcyclobenzaprine menadione corticosterone mycophenolic acid scriptaid vorinostat pyrimethamine rescinnamine emetine oxyphenbutazone valproic acid mesilate co−dergocrine puromycin clofilium tosylate beta−escin helveticoside lanatoside C ouabain bisacodyl 0179445−0000 MS−275 ellipticine CP−645525−01 rifabutin (−)− tretinoin fulvestrant A trichostatin HC toxin Prestwick−1080 SC−560 NS−398 STOCK1N−35696 cinchonine iopamidol CP−863187 AG−825tyrphostin mitoxantrone doxorubicin medrysone dioxybenzone famprofazone ginkgolide A nystatin CP−944629 daunorubicin irinotecan resveratrol methotrexate thioguanosine 0175029−0000 camptothecin GW−8510 chlorhexidine clindamycin chlortetracycline dirithromycin danazol 0173570−0000 6−benzylaminopurine dexverapamil 5109870 nilutamide dipyridamole mestranol (+)−chelidonine ionomycin etoposide deferoxamine cytochalasin B Prestwick−559 latamoxef calmidazolium suloctidil pyrvinium 8−azaguanine niclosamide lynestrenol gossypol methylbenzethonium chloride mephentermine tonzonium quinostatin benzethonium chloride alexidine procarbazine quinisocaine seii.Bu o hw rg htfnto sP3 in- PI3K as function that drugs shows box Blue -specific. 1 1 50510 5 0 −5 −10 −15 mmu-mir-106a t−value – specific. 8of12 othe to A B C


Fig. S7. All locus-TF-drug associations for each locus. (A) NOTCH1 locus. (B) AC153556 locus. (C) MED20/CCND3 locus. (D) MYCN locus. (E) MYC locus.

Fig. S8. Association between the mouse insertional locus and the human somatic mutation. (A) The association between each mouse locus expression signature and the human expression signature affected by the mutation status of TP53. The –log10(P value) from the linear model is shown. (B) The association between each mouse locus expression signature and the human expression signature affected by the mutation status of RUNX1.

Table S1. Genes with the most extreme values of the locus expression signature

p19                  p53                  Notch1               Rasgrp1              Gfi1
Slpi          0.77   Cdkn2a        1.66   Gm12253       2.05   Myl1          0.93   Myl1          1.89
RP23-395H4.4  0.62   Cdkn2a        1.45   Notch1        1.92   Pp11r         0.62   Tns4          1.11
NM_027222     0.61   Butr1         1.27   Dtx1          1.86   Dlk1          0.58   Gimap7        1.04
Reg1          0.58   Ifi27l2a      1.03   Aldh1b1       1.65   Slc6a13       0.55   Als2          0.95
Try10         0.58   Isg15         1.03   NR_002860     1.45   Cd8b1         0.52   Gfi1          0.92
Amy2          0.57   Sparc         0.90   Adam19        1.42   Ldhb          0.50   Mpp4          0.86
Vpreb3        0.56   Vpreb3        0.82   Spsb4         1.35   Rag1          0.47   Gpr68         0.81
Pnlip         0.56   Usp18         0.77   Rag1          1.35   Emb           0.47   Klra1         0.77
Il4i1         0.55   Retnlg        0.65   Cd163l1       1.28   Lmo4          0.44   Egr1          0.77
Pnliprp1      0.55   NM_027222     0.64   Ctla4         1.25   Rasgrp1       0.43   Rapsn         0.74
...                  ...                  ...                  ...                  ...
Treml1       −0.23   NM_175332    −0.55   S100a10      −1.09   Kcnf1        −0.35   Sycn         −0.64
Cdr2         −0.23   Xist         −0.55   Try10        −1.11   NM_027222    −0.35   Try5         −0.67
Cdkn2a       −0.24   Lck          −0.56   Prss2        −1.11   Egr4         −0.36   Ctrb1        −0.67
Serpine2     −0.24   Sh2d2a       −0.58   Myl1         −1.17   Ccdc72       −0.36   Pnliprp1     −0.69
Ednra        −0.25   Hdc          −0.62   Amy2         −1.18   Serpine2     −0.39   Pnlip        −0.70
Ddc          −0.25   Thy1         −0.64   Clps         −1.20   Vpreb3       −0.41   Zg16         −0.71
Ppbp         −0.28   Cd3d         −0.68   Zg16         −1.22   Slpi         −0.47   Try4         −0.72
Mpp4         −0.31   NM_009831    −0.69   Pnlip        −1.27   Homer2       −0.48   Try10        −0.78
Rsad2        −0.36   Cd3g         −0.71   Pnliprp1     −1.46   H2-Eb1       −0.63   Reg1         −0.79
Gfi1         −0.39   Trp53        −1.77   RP23-395H4.4 −1.50   H2-Aa        −0.81   Sycn         −0.64

Rras2                Myb                  AC153556             Mycn                 Myc
Rras2         2.03   Igfbp4        1.52   Rras2         0.98   Emb           0.85   Myl1          0.94
Lypla1        1.34   Trnp1         1.40   Satb1         0.96   Cd160         0.78   Gimap7        0.64
Pp11r         1.10   Cel           1.38   Ly6c1         0.86   Ddc           0.75   Grb7          0.57
NM_025427     0.73   Ela1          1.36   Ppbp          0.83   Cxcr3         0.72   Chst2         0.56
Nsg2          0.69   EG386551      1.35   Thy1          0.78   Thy1          0.70   Tns4          0.53
Ly6d          0.68   Ctrb1         1.33   Dlk1          0.76   Itgb7         0.64   Ccnd2         0.52
Satb1         0.66   Cpb1          1.33   Trat1         0.75   Nefh          0.64   Cd160         0.51
Cd163l1       0.65   Try5          1.31   Mgst2         0.72   Cd3d          0.59   Bmp7          0.47
Dntt          0.63   Ctrc          1.30   Xrcc6         0.67   Rapsn         0.59   Rapsn         0.46
Cyb5          0.58   Ela3          1.29   H2-Q2         0.66   Tesc          0.58   Gfi1          0.44
...                  ...                  ...                  ...                  ...
Hdc          −0.56   Pf4          −0.73   Tns4         −0.85   Try4         −1.20   Prg2         −0.43
Plac8        −0.56   Lgi2         −0.75   Trnp1        −0.86   Prss2        −1.24   Rras2        −0.43
Gpr68        −0.58   Smpdl3a      −0.77   Arg1         −0.87   Reg1         −1.29   Ddc          −0.46
Ccnd2        −0.63   Ly6d         −0.82   Xist         −0.89   Clps         −1.33   S100a9       −0.46
Trf          −0.64   Cxcr3        −0.87   Serpina3g    −0.93   Try10        −1.35   Pp11r        −0.47
Smpdl3a      −0.66   Ctsw         −0.87   Ccl24        −0.94   Pnlip        −1.35   Gpc1         −0.50
Dlk1         −0.68   Ddc          −1.11   Ccl8         −1.07   Amy2         −1.38   Cd163l1      −0.62
Tns4         −0.80   Ppbp         −1.12   Prg2         −1.12   Zg16         −1.39   Ly6d         −0.64
Klk8         −0.81   Myl1         −1.16   Igfbp4       −1.18   Pnliprp1     −1.68   Ccnd1        −0.68
Gal          −1.14   Ly6c1        −1.36   Retnla       −1.22   RP23-395H4.4 −1.76   Satb1        −0.80

Pvt1                 Runx1                Pim1                 Med20/Ccnd3          Mmu-mir-106a
Ccnd1         0.87   Cd163l1       1.11   RP23-395H4.4  1.02   Emb           0.66   Pnliprp1      0.88
Satb1         0.58   Emb           0.66   Pnliprp1      0.95   Lysmd2        0.51   Emb           0.86
Slc6a13       0.52   Prg2          0.63   Clps          0.86   Ly6k          0.50   Lypla1        0.84
Il18r1        0.45   Cd8b1         0.61   Pnlip         0.79   Itgb7         0.49   Amy2          0.72
Eras          0.45   Il18r1        0.57   Lypla1        0.73   Gimap7        0.48   Cd163l1       0.63
Eif2s3y       0.42   Nkg7          0.57   Try10         0.73   Il18r1        0.47   RP23-395H4.4  0.60
Gpc1          0.42   Lsp1          0.56   Try4          0.71   Thy1          0.45   Igf2bp3       0.60
Cdkn1c        0.40   Scin          0.55   Prss2         0.71   Kcnf1         0.44   Reg1          0.57
Cd163l1       0.39   Pp11r         0.53   Try5          0.71   Gimap3        0.44   Notch1        0.54
Nfil3         0.39   Ear4          0.51   Reg1          0.69   Rapsn         0.43   Ly6k          0.52
...                  ...                  ...                  ...                  ...
Prss2        −0.70   Myl1         −0.42   Hdc          −0.45   NM_027222    −0.46   Mgp          −0.41
Pnlip        −0.70   Ly6k         −0.43   Homer2       −0.46   H2-T22       −0.47   Lxn          −0.46
Reg1         −0.74   Il17rb       −0.43   Chst2        −0.47   Ccnd2        −0.48   Igfbp4       −0.47
Try10        −0.78   Egr4         −0.44   Rapsn        −0.47   Irf4         −0.49   Prg2         −0.47
Clps         −0.82   Klra1        −0.44   Cd160        −0.47   Plac8        −0.49   Gpr97        −0.48
Amy2         −0.84   Rapsn        −0.45   Ly6c1        −0.49   Adam19       −0.50   Dnajc6       −0.54
Zg16         −0.87   Reg1         −0.46   Tns4         −0.54   Notch1       −0.54   Gal          −0.55
RP23-395H4.4 −1.00   RP23-395H4.4 −0.52   Cd3d         −0.54   Gm12253      −0.57   Dlk1         −0.59
Myl1         −1.06   Gfi1         −0.67   Gal          −0.65   Myl4         −0.57   Ly6d         −0.69
Pnliprp1     −1.06   Pnliprp1     −0.69   Dlk1         −0.68   Myl1         −0.61   Ddc          −0.73

For each locus at which an insertion is present in at least 10 tumors in the data set, the table shows the genes with the 10 highest and 10 lowest expression signature values, separated by ellipses. Shown in bold is the gene designated as the primary target of the insertion or a gene whose function is directly related to that of the primary target.
