Identifying Cooperating Mirnas Via Kernel Based Interaction Tests

bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

miRCoop: Identifying Cooperating miRNAs via Kernel Based Interaction Tests

Gulden Olgun1 Oznur Tastan2* 1 Cancer Data Science Lab, National Cancer Institute, NIH, Bethesda, MD, 20892 2 Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul, Turkey * [email protected]

Abstract Although miRNAs can cause widespread changes in expression programs, single miRNAs typically induce mild repression on their targets. Cooperativity is reported as one strategy to overcome this con- straint. Expanding the catalog of synergistic miRNAs is critical for understanding gene regulation and for developing miRNA-based therapeutics. In this study, we develop miRCoop to identify synergistic miRNA pairs that have weak or no repression on the target mRNA, but when bound together, induce strong repression. miRCoop uses kernel-based interaction tests together with miRNA and mRNA target information. We apply our approach to kidney tumor patient data and identify 66 putative triplets. For 64 of these triplets, there is at least one common transcription factor that potentially regulates all participating RNAs of the triplet, supporting a functional association among them. Furthermore, we find that triplets are enriched for certain biological processes that are relevant to kidney cancer. Some of the synergistic miRNAs are very closely encoded in the genome, hinting a functional association among them. We believe miRCoop can aid our understanding of the complex regulatory interactions in different health and disease states of the cell and can help in designing miRNA-based therapies. Matlab code for the methodology is provided in https://github.com/guldenolgun/miRCoop. 1 Introduction stronger degree of repression of a single gene is achieved with multiple miRNAs than is possible by the individual miRNAs. Enright et al. [16] MicroRNAs (miRNAs) are small non-coding RNAs are the first to show the cooperativity of lin–4 that play extensive regulatory roles across different and let–7 in Drosophila experimentally. Krek et biological processes and pathologies such as can- al. [32] find that in a murine pancreatic cell line cer ([18, 44]). By binding the complementary se- that miR–375, miR–124, and let–7b cooperate to quence of the target messenger RNAs (mRNAs), repress the MTPN gene. Similarly, Lai et al. [33] miRNAs repress their target mRNA gene expres- reports synergistic interaction of miR–93 and miR– sion ([5]). Despite the widespread changes they 572 in repressing the p21 gene in human SK–Mel– have on the protein levels, single miRNAs typically 147 melanoma cells. Yang et al. [61] shows that induce modest repression on their targets ([49, 4]). miR–17–5p and miR–17–3p can synergistically in- Cooperation among miRNAs is one strategy to duce prostate tumor growth and invasion by re- boost their activity. This cooperativity can be repressing the same target TIMP3. Others also show alized in the form of multiple miRNAs collectively similar lines of evidence on the synergistic inter- co-regulating a cohort of genes that are part of a actions of miRNAs to regulate a mutual target functional module, or multiple miRNAs can jointly ([20, 57, 55, 54, 34]). co-target an individual gene to achieve cooperative control of a target gene (reviewed in [9]). In this Despite the accumulating evidence, our under- work, we focus on the latter case. standing of synergistic miRNAs that regulate a sin- There is a substantial amount of experimental gle gene is still rudimentary. Determining which evidence on the cooperativity of miRNAs where a pairs of miRNAs synergize and which mRNA is

1 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

the target of this cooperativity is critical for un- of miRNAs in the human genome using miRNA derstanding how miRNAs function and how they target predictions, structural predictions, molecu- regulate various gene expression programs. Fur- lar dynamics simulations, and mathematical mod- thermore, synergism can be exploited in designing eling methods. The main limitation of this method miRNA based therapeutics as pointed in a recent is that it does not incorporate the gene expres- survey by Lai et al. [34]. The single miRNA treat- sion profiles. The gene regulation changes dras- ments require high amounts of miRNAs which can tically in different conditions; thus, the identified provoke off-target effects while therapeutics based miRNAs are putative triplets and are not context- on synergistic miRNAs have the potential to over- dependent. CancerNet [40] makes use of the ex- come these unintended effects by enabling lower pression data but not for testing the statistical levels of each of the miRNAs to be administered interactions of the triplets. The miRNA-mRNA ([34]). expression profiles in each cancer are used to re- In this study, we develop miRCoop for detecting move the targets that are not expressed from the synergistic miRNAs and their mutual targets using protein-protein interaction (PPI) network and to the expression data and miRNA target information. filter miRNA-mRNA interactions that are not neg- We focus on a specific type of synergistic relation- atively correlated. CancerNet detects synergistic ship: the miRNA pairs which have weak or no re- miRNA pairs based on the functional similarity of pression on the target mRNA, but when together the targets of the miRNAs and the proximity of strongly repress its expression. Please note that these targets in the protein-protein interaction net- this does not cover all the synergistic relationships, work. This functional association does not neces- i.e., the case where miRNAs have moderate expres- sarily point to synergism in regulating a single gene, sion themselves can also be in a synergistic relation- but instead, it refers to a group of genes that are ship. Although there are methods that address the regulated together by the same pair of miRNAs. cooperativity of miRNAs - discussed below - there Finally, Zhang et al. [65] and Zhang et al. [71] is no work that specifically addresses this specific uses gene expression data and apply causal infer- type of synergistic relationship. We focus on this ence methods to find synergistic pairs. In Zhang et particular case because this set of miRNAs can un- al. [65], miRNA knockout is simulated using gene cover more novel RNAs that are important for a expression data to discover synergistic miRNAs. In disease or condition which cannot be captured by a more recent study, Zhang et al. [71] extends merely checking the miRNAs individual effect, and this work to incorporate the simulation of multi- they are important for miRNA-based therapies. ple knockouts of miRNAs. Zhang et al.’s method Several related computational methods exist in [65] relies on the IDA method, Interventional cal- the literature on miRNA cooperativity. However, culus when DAG is Absent. Similarly, miRsyn, the majority of these tools inspect the cooperativ- [71], makes use of the joint-IDA method to iden- ity of miRNAs in regulating a set of genes which tify the synergistic miRNAs. IDA learns causal are part of a functional module ([50, 64, 3, 8, 2, structure by the PC-algorithm [60] and measures 59, 36, 21, 13, 40, 14]) in contrast to the coopera- the effects with Pearl’s do-calculus method [11]. In tivity for regulating a single gene. Hon et al. [25] this method, the variables are assumed to follow is one of the early work that analyzes the combi- a multivariate Gaussian distribution, which might natorial interactions of miRNAs and their targets. not hold in the case of expression data. IDA also Although the study includes an analysis of the miR- assumes there are no hidden effects on the mR- NAs that mutually target a single gene, this work NAs, however, the main regulators of the mRNA does not aim at finding miRNAs that bind their are actually the transcription factors. The mirSyn target concurrently in a synergistic manner. The method also includes a filtering step that removes target relationships are established solely based on transcripts from the analysis that are not signifi- the predicted miRNA-mRNA interactions without cantly associated with survival. mirSyn filters the expression data. miRNAs and mRNA that are significantly differ- TriplexRNA aims at detecting synergistic miR- ent in their survival using a Cox regression model. NAs that bind to a mutual target gene [47]. This limits the applicability as patient survival data TriplexRNA assesses the synergism of the pairs are not available in many situations and the par-

2 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

ticipants of the synergistic relations are not neces- triplets’ RNAs reveal that most of them are individ- sarily associated with survival. Another limitation ually associated with kidney cancer, as well. Fur- is that the two works do not check if the binding thermore, we observe that some miRNA pairs are sites of the candidate miRNAs are overlapping or encoded very close on the genome hinting a tight not. If they are overlapping, it is unlikely that the regulation at the genome level and supporting a miRNAs can bind at the same time. functional link among these miRNAs. To check We cast the problem of finding synergistic miR- other coordinated regulatory evidence, we investi- NAs and their mutual target as a statistical depen- gate whether a shared transcription factor poten- dence discovery problem. Using matched miRNA tially coordinates the miRNAs and RNAs, and in- and mRNA expression data and considering the ex- terestingly, we observe that this holds for almost pression level of each miRNA and mRNA as a ran- all identified triplets. When we explore the rel- dom variable, miRCoop finds a specific three-way evance of the identified triplets to kidney cancer, statistical interaction among these random vari- our functional enrichment analysis on the largest ables. Here, the pairs of miRNAs have weak or no connected component of miRNA–miRNA–mRNA statistical dependence with the mRNAs, yet when network highlights critical processes related to kid- taken jointly, they are dependent on the mRNA. ney cancer, part of which are key driver genes. This case is similar to the example wherein adding sugar to coffee or stirring the coffee alone has weak or no effect on the coffee’s sweetness, but the joint 2 Materials and methods presence of the two has a strong effect ([48]). This specific case has not been the focus of previous miRCoop relies on miRNA–mRNA target predic- work. In testing whether the triplets of RNAs have tions and kernel statistical tests on the expression such relationships, we resort to the kernelized Lan- data. The overall pipeline of miRCoop is summa- caster three-variable interaction developed by Se- rized in Fig 1. The first step curates a list of pos- jdinovic et al. [48]. This test relies on mapping the sible triplets based on miRNA target predictions probability distributions into a reproducing kernel and their predicted mRNAs. In the second step, Hilbert space (RKHS). The effectiveness of embed- the candidate triplets are pruned based on expres- ding probability distributions are demonstrated in sion data. In the final stage, we test the remaining different machine learning and inference task (re- triplets for the statistical dependence relationships. viewed in [53] and [41]). The key idea of embed- Below, we detail these three steps. Following the ding probability distributions is to implicitly map miRCoop descriptions, we provide information on probability distributions into potentially infinite- the null model used for triplets and the data col- dimensional feature spaces using kernel functions lection and pre-processing steps when miRCoop is ([52]). Kernel embedding of the probability dis- applied to kidney cancer. tributions avoids the restrictive assumptions on the distributions such as variables to be discrete- 2.1 miRCoop Step 1: Identifying valued, or assumptions on the relationships among Candidate miRNA–mRNA Tar- the variables such as the linear relationship between get Interactions variables ([53]). The kernel trick helps us to deal with small sample sizes for the available expression miRCoop first identifies pairs of miRNA that can profiles, and it does not require any distributional bind to a common target mRNA (Fig 1 Step 1). assumptions. In addition to the expression profiles, A pair of miRNAs and their mutual target mRNA miRCoop makes use of the putative target informa- constitute a candidate triplet. As the interactions tion among the miRNAs and mRNAs. of newly cataloged miRNAs are not included in ex- We apply miRCoop to kidney cancer using tu- perimentally validated miRNA target interaction mor samples from The Cancer Genome Atlas databases, we resort to available target prediction (TCGA) project. This analysis discovers 66 pu- algorithms. For this, we use TargetScan ([1]) and tative triplets. We use several validation strategies miRanda ([7]). When using TargetScan, the 6–mer to check the existence of a regulatory relationship sites’ predictions are filtered since they are classi- among the triplets’ participants. Investigating the fied as poorly conserved ([1]). Similarly, when run-

3 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1. Find miRNA pairs that potentially target the same 2. Filter triplets based on miRNA and mRNA 3. Find cooperating miRNA pairs based on mRNA expression patterns kernel based three variable interaction test on expression data a. Find miRNA pairs predicted to target the same mRNA a. Filter RNAs with missing and low varying expression a. Apply two variable kernel independence test for miRNA1 - miRNA2 b. Keep triplets for which patient mRNA expression is pairwise miRNA-mRNA interactions significantly lower when both miRNA’s expression is up compared to the case where both are down. miRNA3 - miRNA4 b. Apply kernel based three variable Lancaster independence test for triplets

3’UTR mRNA

b. Filter miRNA pairs with overlapping binding regions

ΔLP = P XYZ - PXPYZ - PXZ PY - PXY PZ + 2PXPYPZ

miRNA1 miRNA2 mRNA expression mRNA miRNA1: X Y: miRNA 2 3’UTR mRNA

mRNA: Z miRNA4 miRNA3 Patient Group 1: Patient Group 2: Both miRNAs are Both miRNAs are 3’UTR mRNA upregulated downregulated

Figure 1: Overview of the methodology. Each box represents a step in the methodology. The first step identifies miRNA pairs that can target the same RNA based on miRNA target predictions. The second step illustrates how the RNAs are filtered based on expression profiles. The third step illustrates the idea of cooperative RNAs. δLP denotes the Lancaster interaction on the triple of random variables (X,Y,Z) is defined as the signed measure. PXYZ denotes the joint distributions of X,Y,Z random variables, while PX , PY and PZ are marginal distributions. If the joint distribution PXYZ factorizes into a product of marginals in any way, then δLP vanishes. This is used as a test statistic to test for independence.

ning miRanda, to reduce false-positive predictions, miRNAs are down-regulated. To filter for triplets we apply stringent parameter settings (a pairing where this expression pattern holds, we divide the score of > 155 and an energy score of < −20). samples into different expression subgroups based We consider only the miRNA–mRNA pairs that are on the expression states of the potential synergis- predicted by both algorithms. The triplets where tic miRNA pairs. The first group comprises the miRNA pairs have an overlapping binding site on samples where both miRNAs are upregulated, and the target mRNA are also excluded. the second group is the subset of samples where Although the target information prunes many both miRNAs are downregulated. In this analysis, combinations, at the end of this step, there is still miRNA expression in a sample is considered upreg- an extensive list of candidate triplets. For example, ulated if it is in the fourth quartile of all samples, when we apply to kidney cancer (see Results), there and it is deemed to be downregulated if it is in the are 31, 184 potentially interacting miRNA–mRNA first quartile. Having grouped the samples, for ev- pairs, and there are 163, 550 candidate miRNA– ery triplet in the candidate list, we check whether miRNA–mRNA triplets. the mRNA expression levels are significantly lower than the mRNA expression levels compared to the second group. To test for significance, we apply the 2.2 miRCoop Step 2: Expression one-sided Wilcoxon rank-sum test ([56]). This test Filter is used to determine whether two groups of expression values are selected from populations having We further narrow down the candidate list based the same mRNA expression distribution. We ap- on the expression profiles of the miRNAs and mR- ply this test to all possible triplets formed at the NAs measured over a set of samples, Fig 1 (Step end of Step 1 and retain only those for which the 2). miRNAs repress the expression of their tar- null hypothesis is rejected at 0.05 significance level. get mRNA genes. Therefore, we expect that if both miRNAs are upregulated, the mRNA expression will be lower compared to the case where both

4 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

2.3 miRCoop Step 3: Testing Statis- a signed measure that vanishes whenever P can be tical Interactions of RNAs with factorized as a product of its marginal distributions Kernel-Based Interaction Tests ([35]). When there are two variables, X and Y , the Lancaster interaction measure is defined as: For every triplet candidate that passed through Step 2, we perform statistical interaction tests on 4L P = PXY − PX PY . (1) RNA expression data to find putative miRCoop For three random variables, the Lancaster inter- triplets, Fig 1 (Step 3). We are looking for a high action is defined as: order interaction between the three RNA entities that is while each miRNA expression levels have 4LP = PXYZ −PXY PZ −PYZ PX −PXZ PY +2PX PY PZ . weak or no interaction on the mRNA expression, (2) the simultaneous interaction of the two miRNAs In the above equations, while PX , PY and PZ are with the mRNA expression levels is strong. marginal distributions of the random variables X, We denote the two miRNAs’ expression levels as Y and Z respectively. PXYZ denotes the joint dis- two random variables: X and Y and define the tributions of X,Y,Z random variables, PXZ , PXY expression level of the mRNA to be tested as a third and PYZ are pairwise marginal distributions. The random variable denoted as Z. We are interested in 4LP is zero whenever P can be factorized as a cases where the miRNAs are pairwise independent product of its marginal distributions. A statistical

with mRNA: X = Y and Y = Z but form a mutually | | dependency test can be constructed where the Lan- dependent triplet because of the following holds: caster interaction is the test statistic ([35, 48]). The

(¬((X, Y ) = Z)). Note that this also corresponds | null hypothesis, in this case, is that the Lancaster to checking for a V-structure, a directed graphical interaction is zero. The null distribution of the model with two independent parents pointing to a test statistic can be estimated using a permutation- single child, where the miRNAs’ expression levels based approach where the indices of the variable(s) (X and Y ) are the parent variables, and the child of interest are shuffled ([48]). variable is the mRNA expression level (Z) ([48]). Sejdinovic et al. [48] formulate a kernel-based To ensure that the miRNAs have weak influence three-variable Lancaster interaction test by em- individually, we first test for pairwise independence bedding the probability distributions into repro- among X and Z and Y and Z. In each of these ducing kernel Hilbert spaces (RHKS). In the gen- tests, the null hypothesis that the two variables are eral formulation of [48], the three variables to be independent. We continue with triplets, where the tested form a random vector of triplets (X,Y,Z) ∈ null hypothesis is accepted. On these triplets, we Rp ×Rp ×Rp, where p is the dimension. In our case apply three-variable kernel interaction tests pro- p = 1 as for each miRNA and mRNA there is only posed by [48] and considers only triplets that re- one measurement per sample. ject the null hypothesis. Here, we provide a brief The key idea of embedding probability distri- description of these kernel-based tests and later de- butions is to implicitly map probability distribu- tail on how we use this method for discovering miR- tions into potentially infinite dimensional feature Coop triplets. spaces using the positive definite kernel functions [41]. The positive definiteness of the kernel func- 2.3.1 Kernel Three-Variable Lancaster In- tion guarantees the existence of a dot product space teraction Test H and a feature mapping φ : X → H such that k(xi, xj) = hφ(xi), φ(xj)iH . This dot product can Interaction effects occur when the effect of one vari- be calculated implicitly with the kernel function able depends on the value of another variable. In without mapping φ explicitly. This is commonly this case, the repression effect of the miRNAs on referred as the kernel trick [45]. The mean em- the mRNA depend on the value of the coopera- bedding of a random variable X is defined as: tive miRNA’s expression. An interaction measure µX := EX k(X, ·) ∈ Fk. In [48], given obser- is used to test for the existing of such a statisti- vations Xi, an estimate of the mean embedding 1 Pn cal dependency. An interaction measure associated isµ ˜X = n i=1 k(Xi, ·). Given an i.i.d. sample n to a multidimensional probability distribution P is (Xi,Yi,Zi)i=1, the norm of the mean embedding µL

5 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

is empirically estimated using empirically centered triplets, where the null hypothesis of the three vari- kernel matrices. The empirically centered kernel able interaction is rejected (α = 0.01). We use the matrix K˜ for the kernel k is given by: Gaussian kernel as deﬁned:

2 ˜ − kxi − xjk Kij = hk(Xi, ·) − µ˜X , k(Xj, ·) − µ˜X i, (3) k(x , x ) = exp 2 (6) i j 2σ2 Sejdinovic et al. [48] arrives at the following test , where σ is a non-negative real number that deter- statistics for the two variable and three-variable de- mines the width of the kernel. We set σusing the pendency test using the kernel trick: median heuristic ([19]). In both tests, the null distribution is estimated by repeatedly permuting the 1 indices of the mRNA expressions of the samples (Z k 4 Pˆ k2 =k Pˆ − Pˆ Pˆ k2 = (K˜ ◦ L˜) L k⊗l XY X Y k⊗l n2 ++ variable), where we set the number of permutations (4) to 2, 000. 1 k 4 Pˆ k2 = (K˜ ◦ L˜ ◦ M˜ ) (5) L k⊗l⊗m n2 ++ 2.4 A Regression-Based Method Here, k denotes the kernel function defined in We use a simple alternative strategy to detect domain X that maps PX to H. Similarly, l and m synergistic miRNAs using linear regression, which are kernel functions defined in domains of Y and Z, formes a baseline. Starting from the candidate respectively. Using the kernel trick, computations triplets after Step 2 in the miRCoop pipeline, we on the probability distributions can be calculated build three regression models. In the first two mod- implicitly via kernel matrix operations. The ker- els, we model the relationship of miRNAs with the nel matrix for the kernel k is defined as the n × n mRNA separately by fitting a linear equation ( Eq 7 matrix Kij = k(Xi,Xj) computed over all sam- and Eq 8), where X indicates the first miRNA’s ple pairs, i, j. ◦ denotes an Hadamard (element- expression level and Y indicates the second one’s wise) product of matrices. ⊗ indicates the Kro- expression. In the third model, in addition to the necker product. 1 denotes an n × 1 vector and main effects of the individual miRNAs, we include n n ˜ A++ = Σi=1Σj=1Aij for a matrix A. K denotes an interaction term (X ∗ Y ) to measure synergy. the corresponding centered matrix of K, where A significant, negative, and large in magnitude re- ˜ 1 K = HKH and H = I − n 11 and I is the identity gression coefficient will indicate a synergy. These matrix. three models are given below: The same rules apply to the other gram matrices, n L and M. Given an i.i.d. sample (xi, yi, zi)i=1, the centered kernel matrices can be computed over Z = β0 + βxX + ε (7) the samples. Using equations 5, the norm of the Z = β0 + βyY + ε (8) embeddings can be computed using these centered Z = β0 + β1X + β2Y + βx,y(X ∗ Y ) + ε (9) matrices. We refer readers to [48] for the derivation of these equations. In the fist two models, a significant, negative, and small in magnitude coefficient suggests a weak 2.3.2 Applying Kernel-based Interaction or non-existing linear relationship between the Tests miRNA and the mRNA. To decide whether the magnitude of a coefficient is large or small, we need We form a random vector of triplets (X,Y,Z) ∈ to set a cut-off value. Looking at the distribution R × R × R. In the pairwise interaction tests, where of the coefficient values of all miRNAs ( Supple- we test for independence of miRNA expression lev- mentary File 1 Fig S5), we pick 0.25. We expect els with the mRNA (X-Z and Y -Z), we consider to see a negative correlation between the interact- triplets where the pairwise independence test fails ing miRNA-mRNA. Moreover, since we are inter- to be rejected (α = 0.01). For the three-variable ested in cases where the miRNA itself shows weak independence interaction test, we only consider interaction, we keep the triplets for which both

6 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

miRNA’s regression coefficients are small. Thus, asses the statistical significance of the statistic of we only consider cases where −0.25 < βx < 0 and interest (such as the number of TFs (See sections −0.25 < βy < in the first two models (Equations 3.2.2, 3.2.1, 3.3)), we obtain an empirical p-value by 7 and 8). Having filtered out triplets for which calculating the fraction of the cases in B random the above conditions on βx and βy does not hold, samplings where the test statistic is as extreme or we fit the third model (Equation 9). For this, we more extreme than the observed statistic. The null look for cases where βx,y in Eq. 9 is negative and model is constructed with 1, 000 samplings. large. In all cases, we only considered triplets with significant regression coefficients ( p-value < 0.05). The p-value is for the t-statistic of the hypothesis test with the null hypothesis that the corresponding coefficient is equal to zero against the alternative hypothesis that it is non-zero. 2.6 Data Collection and Processing 2.5 The Null Model for miRCoop Triplets We apply miRCoop on the patient–derived kid- Having identified the triplets, to check the va- ney cancer samples. We retrieve the precomputed lidity of different comparisons with the biological mRNA and miRNA expression levels from GDC data, we use a random sampling-based approach Data Portal on Sept 2nd 2017. mRNAs were quan- to obtain a random set of triplets that consti- tified using RNA-seq technology while miRNA’s tutes the null model. We use these randomized are using miRNA-seq technology. The details of triplets to obtain an empirical null distribution miRNA transcriptomic profiling are described in of the test statistic of interest in each case. We [12]. The details of data processing can be ac- construct the randomized triplets as follows: let cessed at https://docs.gdc.cancer.gov/. We S = t1, t2, ..., tn be the set of miRCoop triplets utilize the available Reads per million mapped identified. Let the i-th triplet member of S denoted reads (RPM) values and Fragments per Kilobase of by ti = (xi, yi, zi), which include the miRNA pairs, transcript per Million mapped reads upper quar- xi, yi and the mRNA zi. We sample a collection tile normalization (FPKM–UQ) for miRNA and of B random triplet sets of each with size m. Let mRNA expression data, respectively. We only use b b b b S = {t1, t2, . . . , tn} denote the b-th random triplet matched primary tumor solid samples that concur- b set. For the b-th sampling, ti is generated as fol- rently include mRNA and miRNA expression data, lows: the mRNA zi is retrieved, and one miRNA which contain 995 tumor patient samples. To filter pair of the triplet (xi or yi) is selected at random out the non-coding RNA genes from the RNASeq and retained, the paired miRNA triplet is though expression data and create an mRNA list, we used selected uniformly at random from the set of miR- GENCODE v27 annotations ([24]). NAs that can potentially target zi, Rz, as identified at the end of Step 2 of the miRCoop pipeline ex- We eliminate mRNAs with 30UTRs shorter than cluding the miRNA triplets (Fig 1). By repeating 12 nucleotides since two miRNAs cannot bind on this procedure for all samplings ∀b ∈ {1,...,B}, we such a short sequence at the same time. 30UTR se- B obtain a collection of B many triplet sets, Sbb=i the quences of mRNAs are obtained using ENSEMBL random counterparts of S. We must note that this BioMart ([15]). Genes with expression values lower procedure constructs a rather stringent null model. than 0.05 in more than 20% of the samples are fil- Note that if we were to select the mRNAs and miR- tered. Expression values are added with a constant NAs all at random, the miRNAs and mRNAs in 0.25 to deal with the 0 gene expression values, and these random triplets would likely be unrelated to data are log 2 transformed. We eliminate RNAs each other. With such a null model, attaining sig- that do not vary across samples. For this purpose, nificance would be essentially trivial. we filter RNAs with a median absolute deviation We calculate the p-value of the observed statis- (MAD) below 0.5. In the final set, we have 12,113 tic empirically based on this null distribution. To mRNAs and 349 miRNAs in kidney tumor samples.

7 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

3 Results miRNA miRNA Distance miRNA Family miR-132 miR-212 262 miR-132 3.1 Overview of the synergistic miR- miR-501 miR-502 4792 miR-500 NAs and their targets in Kidney miR-500a miR-532 5192 miR-500 Cancer Table 1: The list of identified synergistic miRNA We applied the methodology summarized in Fig 1 pairs in kidney cancer that are encoded close in the to find synergistic miRNAs and their targeting mR- genome and their family information. NAs. The number of candidate triplets that re- main after applying each of the filtering steps is provided in Supplementary Fig S1. The target transcription start and end sites within the genome prediction step produces 31, 184 potentially inter- are considered spatially close. We retrieve known acting miRNA–mRNA pairs and 163, 550 miRNA– transcription start and end sites of the miRNA miRNA–mRNA interaction combinations. Follow- from the Ensembl using biomaRt package [15]. We ing the remaining steps, we arrive at potential 66 observe that three of such pairs are very close, and synergistic triplets, which also contains 66 differ- these pairs form two different clusters (Table 1). ent synergistic miRNA pairs. The triplets include We find this number is not significantly high (p- 75 miRNAs. The summary statistics of the expres- value = 0.163). The reason could be that most sion values of these miRNAs are provided in Sup- reported since the synergistic miRNAs are not nec- plementary File 4. These triplets include 41 differ- essarily acting cis. However, those that we observe ent mRNAs, 27 of which are reported to be cancer close in the genome are interesting. These clustered driver genes for kidney cancer as reported in In- miRNAs are part of the miR–500 and miR–132 tOGen ([22]). The complete list of the identified miRNA family. There is evidence that miR–500 triplets is listed in Supplementary File 2 and the family takes part in kidney cancer ([39, 51, 17]). network for the potential miRNA-miRNA pairs is Genomic locations of these synergistic miR–500 provided in Fig 2. family miRNAs are illustrated in Fig 3.

3.2 Analysis of the Identified 3.2.2 Common TF Regulating the RNAs of Triplets the Triplets There is no available experimentally identified set Coordinated regulation of genes by a shared tran- of synergistic miRNAs to validate the triplets we scription factor often indicates a functional link identified. Thus, we devise several strategies to in- among the genes. To this end, we checked if the vestigate the miRCoop triplets in light of external identified triplets were potentially co-regulated by biological information. To check the statistical sig- a common TF. To be able to conduct this analysis, nificance of each of these findings, we use a null we first curated a set of regulatory interaction for model by generating random synergistic triplets, TFs regulating miRNA and mRNA. which we detail in Methods Section 2.4. Below Since none of the TF–miRNA databases are com- we describe the biological evidence relevant to the prehensive, we use two strategies to curate TFs reg- identified triplet set. ulating miRNAs. We first use data from available databases; this includes TransmiR ([29]), PuTmiR ([28]) and miRTrans ([27]) databases. miRTrans 3.2.1 Triplets Coded Close on the Genome database provides a list of TFs–miRNA interac- Recent studies reveal that miRNAs, which regu- tions for different tissues and cell types; we only late similar processes, are located nearby on the consider the renal cell lines HRE, human Renal genome ([64]). Others report that miRNAs that Epithelial Cells, and RPTEC, hTERT immortal- are proximally encoded are generally co-expressed, ized human renal proximal tubular epithelial cell [6], and this indicates a functional link. Thus, we lines. Secondly, we compile the list by analyzing investigate whether the identified synergistic pairs ChIP-Seq data made available by the ENCODE areclose on the genome. Specifically, 10 kb of the project ([26]) and TF motifs. We retrieve the

8 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

mir-320b-2 mir-30b mir-502 mir-26b mir-345 mir-1228 mir-1976 mir-4709 mir-33b mir-501 mir-3127 mir-6837 mir-6892 mir-134 mir-3677 mir-331

mir-4661 mir-500b mir-3130-1 mir-486-1 mir-766 mir-185 mir-328 mir-500a mir-103a-2 mir-4746 mir-532 mir-940 mir-935 mir-409 mir-509-2 mir-330 mir-188 mir-509-1 mir-103a-1 mir-584 mir-1248 mir-6806 mir-486-2

mir-3065 mir-616 mir-659 mir-212 mir-485 mir-615

mir-1226 mir-3130-2 mir-149 mir-125a mir-214 mir-5683 mir-1249 mir-132 mir-181d mir-19b-1

mir-504 mir-3170 mir-1301 mir-6511b-2 mir-197 mir-92a-2 mir-93 mir-6511b-1 mir-877 mir-146a

mir-550a-3 mir-3615 mir-296 mir-320b-1 mir-324 mir-629 mir-6842 mir-219a-1 mir-760 mir-1180

Figure 2: The network shows the miRNA synergistic pairs discovered for kidney cancer. Nodes are miRNAs and edges exist between them if they are participate in at least one miRCoop triplet.

Chr X Finally, we check if there is at least one common TF that regulates all RNAs (the miRNAs and the mRNA) that participate in a triplet interaction. In miR-532 miR-500-a miR-501 miR-502 checking this, we disregard POL2 as it will over- Figure 3: The red marks on the genome axis indi- lap with promoter regions without any relevance cate the miRNA positions of the synergistic miR- to the functional cooperation. We observe that for NAs that belong to the miR-500 family. The ﬁgure 64 of the 66 triplets, all triplets participants are co- was created with Gviz package in R ([23]). regulated by at least one common TF. This number was signiﬁcantly high at test level ( p-value = 0.04. The list of the TF–triplet relations is provided in Supplementary File 3. transcription start sites from Ensembl employing the biomaRt package[15] and 10, 000 upstream re- We inspect a subset of the TF–triplet regulations gions of transcription start of the gene in ques- more in detail. Recent studies report that the tran- tion using the R package GenomicRanges ([66]). scription factors, HIF1A ([30]), TFEB, TFE3 and We check if this region overlaps with the ChIP- MITF ([46]) are related to the kidney cancer. We Seq peaks in three kidney-related cell lines (HRE, check if there are any miRCoop triplets regulated HEK293B, HEK293) using the GLANET tool [70]. by these transcription factors. These four TFs are We compile the TF–miRNA interactions from the linked to a total of 27 triplets (out of 66) and the databases and discovered through analyzing ChIP- sub-network includes triplets that are coordinated Seq data. For the list of TFs regulating mRNAs, by multiple of these TFs Fig 4. Additionally, we ob- we conducted the same analysis on the ChIP-Seq serve that the mRNA pairs that are encoded very data. Furthermore, we scan the promoter regions closely on the genome (see Section 3.2.1) are among with the TFs’ binding motifs with JASPAR ([67]) them and have shared TFs. For example, TFE3 po- and Hocomoco v.11 ([68]) using the FIMO ([69]) tentially regulates both miR–501, miR–502, miR– tool with default parameter settings. If the TF has 500a, and miR–532. Apart from this, interestingly, a reported overlapping peak with the promoter and TFs related to the Xp11 translocation renal cell there is a matching motif inside the peak, we con- carcinoma disease (TFEB, TFE3 and MITF) ([46]) sider this as a TF–mRNA interaction. regulate the same triplets: (miR103a.1, miR486.1,

9 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

TFEB TFE3 MITF HIF1A

miR219a-1,miR6511b-1,CPT2 miR33b,miR501,EFNB2 miR3170,miR3615,LHFPL4

miR103a-1,miR486-1,PDZD8 miR501,miR502,RDH16 miR486-2,miR616,NFASC miR1228,miR3127,PIGQ miR132,miR212,FZD6

miR500a,miR532,TBC1D16 miR320b-2,miR6837,LDLRAP1 miR345,miR6892,FXYD3 miR1228,miR4709,DFFB

miR103a-1,miR330,AMOT miR760,miR877,SLC7A6 miR1301,miR149,BLOC1S1

miR3130-2,miR-3170,LHFPL4 miR1180,miR146a,LHFPL4

Figure 4: The network for the synergistic triplets regulated by common TFs. Each node denotes a triplet or a TF while an edge represents an interaction between them. TFs are represented by the diamond-shaped yellow symbol, and triplets are represented by the circle-shaped blue symbol.

PDZD8) and (miR103a.1, miR330, AMOT). the negative regulation of hydrolase activity, and the regulation of metal ion transport are all found 3.2.3 Functional Enrichment Analysis to be enriched (Fig 5 (b)). Among these, sodium and guanylate related processes are particularly re- Synergistic miRNAs are likely to regulate genes ported as important in kidney cancer ([31, 43, 42]). with the same or similar functions ([59]). To in- Also, the genes that are already known as the vestigate the potential biological process related to drivers of renal cancer are annotated with these the synergistic miRNAs, we conduct a functional processes. An example is Angiomotin (AMOT). enrichment analysis on them. To this end, we first [37], reports that AMOT is an oncogene in renal build a miRNA–miRNA–mRNA network. In this cancer. AMOT family members are also reported network, nodes are miRNAs and mRNAs, and an to promote cancer initiation in other cancers and edge exists between two RNAs whenever they are act as a tumor suppressor in some others ([62, 38]). connected through at least one miRCoop triplet. Interestingly, the brain related biological pro- A connected component analysis reveals that the cesses such as layer formation in the cerebral cor- network includes 13 connected components. Inter- tex, synapse organization, embryonic brain devel- estingly, the largest connected component covers a opment are found to be enriched (Fig 5(b)). It is very large fraction of the all miRNAs: of the 75 reported that brain metastasis is common for renal miRNAs that is part of the triplets, 41 of them cancer and in 15 % of cases, the tumor metastasizes were in this largest connected component (p-value to the brain ([58]). Additionally, brain metastasis = 0.032) (Fig 5 (a)) indicating that the cooper- is the main cause of morbidity and mortality in ative miRNAs are directly or indirectly linked to kidney cancer ([58]). These findings suggest that each other through their targets. mRNAs regulated by the synergistic triplets may Gene ontology (GO) enrichment analysis is GO participate in the mechanisms that promote brain analysis is conducted with clusterProfiler [63] pack- metastasis. age in R. We use the default background set that is all human genes annotated by GO Biological Pro- cess in the org.Hs.eg.db database [10]. We set a p- 3.3 Comparison with Other Meth- value cut-off of 0.05, an FDR cutoff 0.05 and used ods Bonferroni multiple hypothesis test correction. The GO enrichment analysis reveals that the set We first check the overlap of the identified triplets of mRNAs are enriched in processes that are asso- with with that of other computational meth- ciated with kidney cancer. For example, metabolic ods available in the literature: CancerNet ([40]), networks such as the receptor metabolic processes, TriplexRNA ([47]) and miRsyn [71] and a simple

10 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

DFFB

PIGQ

miR-4709 miR-1228 FXYD3 miR-3127 miR-3677 miR-550a-3 miR-345 OPRD1 IGFBP5 RSPH9

miR-26b miR-1180 miR-3130-1 miR-146a DAB2IP miR-504 TPT1 RPL10 miR-328 miR-6892 GUCA1B ZNF106 LHFPL4 miR-616 miR-134 miR-1226 miR-940 miR-185 LMTK2 KSR2 NFASC miR-500b miR-3615 miR-3130-2 miR-188 miR-3065 miR-486-2 PDZD8 miR-3170 miR-486-1

WTIP UNKL PPP1R16A miR-6837 WFDC3 miR-6806 miR-935 miR-1976 miR-409

miR-30b miR-103a-2 AMOT miR-103a-1 miR-320b-2

GTF2H2 LDLRAP1 miR-532 miR-330 TBC1D16

miR-500a

miR-584 SCN3B miR-93

miR-6842

(a) The largest connected component (b) Bar chart for enriched GO-terms

Figure 5: a) The the largest connected component of miRCoop triplet network. Blue nodes are miRNAs and pink colors are mRNAs of the triples. The gray nodes are dummy nodes to connect the triplet miRNAs to the target mRNA. b) Bar chart for the top 25 enriched biological process GO-terms (p-value < 0.05) found over the mRNA set of the largest connected component in part a). The x-axis shows log 2 transformed p-values.

regression-based model (See Section 2.4). Can- cerNet provides cancer-specific pre-computed synergistic pairs of miRNAs. Of the 66 synergistic miRNA pairs, miRCoop identifies, 35 of them are also suggested by CancerNet. This overlap is highly significant based on the sampling-based null model with a sampling size of 1, 000 (p-value ≤ 0.001). We also check the overlap with the synergistic pairs provided by TriplexRNA; however, we do not find a significant overlap. One reason for this minimal overlap could be that TriplexRNA triplets are not cancer-specific while the triplets we identified are. mirCoop do not have any overlapping triplets with the miRsyn and the regression-based model. The Figure 6: The Venn diagram for the number of sizes of the overlaps are summarized in the Venn synergistic miRNAs that are detected by different diagram in Fig 6. methods. In the TriplexRNA database, we utilize We also compare mirCoop with mirSyn [71]. We TEC predictions.For Cancernet predictions, kidney run miRSyn on the same data using the same data cancer specific results are used. miRsyn results that were described in Section 2.6 using suggested are obtained by running the miRCoop tool on the parameters. When considered only the pairwise TCGA kidney data. synergistic pairs, mirSyn return 5, 284 miRNA- miRNA-mRNA triplets that involves 1, 079 synergistic miRNAs. There was no overlap of the triplets with mirSyn, although there were 11 miRNAs that overlap. MiRsyn employs a filtering step based on

11 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Cox regression, thus eliminates transcripts that do 4 Conclusion not prognostic value, and the methods used are different. This could be partly the reason why there Post-transcriptional regulation of gene expression is no overlap. mediated by miRNAs is a fundamental mechanism to fine-tune the protein levels in the cell. The dys- regulation of miRNA expression is also associated We built the regression model (Section 2.4) to with many diseases, including cancer. miRNAs are constitute a baseline to assess whether mirCoop not only potential biomarkers, but are also inves- triplets could be identified with an easier strategy. tigated as promising therapeutic agents; therefore, This simple method results in 24 potential triplets, understanding the intricate ways they work is es- which do not have overlap with mirCoop triplets or sential for both expanding the knowledge on how any of the tools we compare the miRCoop against cells work and translational purposes. In this work, (shown in the Venn diagram provided in Supple- we focus on the synergistic target regulation of mentary File 1 Fig S2). miRNAs. To identify synergistic miRNAs pairs and their target mRNA, we propose a novel method: miRCoop. MiRCoop uses kernel-based statistical We also compare mirCoop with mirSyn and the interactions and finds triplets where miRNA ex- regression method in terms of the genome and the pression levels have no or only weak statistical de- TF relations. For the regression model, none of pendencies with the mRNA levels but are jointly the synergistic miRNAs are located close by on the dependent. miRCoop makes use of both RNA genome. We also find none of the triplets are regu- sequence information for target predictions. We lated with a common transcription factor. Thus, we demonstrate the use of miRCoop in kidney can- conclude that this simple strategy is not sufficient cer and find triplets with supporting biological evi- to solve this problem. This method has various dence on their relevance. The work presented here limitations, as well. It assumes that the nature of can be extended in several directions. the statistical interaction of the individual miRNAs and mRNAs are linear. Also, the linear regression Although it is possible to extend the Lancaster model assumes that the mRNA expression values interaction test to detect interactions of higher- are normally distributed (as the errors are assumed order, where more than two miRNAs regulate a mu- to be normally distributed in the standard regres- tual target, it would necessitate significantly higher sion). We also need to set arbitrary cutoffs on the computational resources. Therefore, at the mo- coefficients to decide the strong and weak interac- ment, we limit our analysis to the three-variable tions. interaction case. Finding higher-order interactions with ways to reduce the computation time can be a future line of research. We analyzed the synergistic miRNAs in miRsyn In this work, we focus on miRNAs that regulate that are encoded close-by on the genome. We a target gene directly which is those that simulta- observe that 10 out of 1, 079 synergistic miRNA neously bind to the target. There are other ways pairs suggested by miRsyn [71] are close-by on the in which pairs of miRNAs regulate a common tar- genome. The number of 10 was not significantly get. For example, while one miRNA targets the high at confidence level 0.05 when tested with the gene of interest directly, another miRNA can tar- permutation test. Next, we check if miRsyn triplets get the transcription factor that regulates this gene are co-regulated by common TFs. We conducted a ([9, 65]). In this case, one miRNA is acting directly statistical test exactly as we did in miRCoop to on the target, and the partner miRNA is acting in- check if this number is interesting hight. The syn- directly through the transcription factor. An addi- thetic interactions in the null model are constructed tional line of future work would be to extend miR- as described in section 2.5. The test fail to reject Coop to discover these multi-level regulations. the null hypothesis with p − value = 1.00. On av- Since there are no synergistic interaction datasets erage, (on average 3, 810 triplets are co-regulated that are readily available, we devise various strate- by at least one common TF) calculated over 1000 gies to check if the identified triplets and the indi- samplings. vidual RNAs are implicated in kidney cancer and

12 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

have any biological evidence supporting a func- NIH, NCI, Cancer Data Science Lab. tional theme or regulatory interaction among themselves. When experimentally validated synergistic miRNA pairs become available, the analysis can be References further improved in the light of the ground truth [1] Agarwal, V. et al. (2015). Predicting effec- interactions. For example, we set the kernel width tive microrna target sites in mammalian mrnas. using the median heuristic, but the width could be elife, 4, e05005. tuned when an extensive set of ground truth interactions become available. [2] An, J. et al. (2010). Identifying co-regulating In the present study, to establish target relations microrna groups. Journal of bioinformatics and between the miRNA and the mRNAs, we use two computational biology, 8(01), 99–115. target prediction algorithms. We resort to prediction algorithms because newly cataloged miRNAs [3] Antonov, A. V. et al. (2009). Geneset2mirna: – which is valid for the TCGA profile miRNAs finding the signature of cooperative mirna ac- - are not incorporated in the experimentally val- tivities in the gene lists. Nucleic acids research, idated miRNA target interaction datasets. To re- 37(suppl 2), W323–W328. duce the number of false positives, we only consider [4] Baek, D. et al. (2008). The impact of micrornas target interactions that are supported by both algo- on protein output. Nature, 455(7209), 64. rithms concurrently. The availability of experimentally validated miRNA-mRNA interactions should [5] Bartel, D. P. (2009). Micrornas: target recog- strengthen the outlined methodology. The number nition and regulatory functions. cell, 136(2), of synergistic triplets can be adjusted by chang- 215–233. ing the significance levels applied in the statistical [6] Baskerville, S. and Bartel, D. P. (2005). Mi- tests. For example, in a scenario where a large num- croarray profiling of micrornas reveals frequent ber of interactions can be verified experimentally at coexpression with neighboring mirnas and host scale, these settings can be adjusted to output more genes. Rna, 11(3), 241–247. triplets. In this study, we use all kidney cancer patients [7] Betel, D. et al. (2010). Comprehensive model- to have a large sample size. However, the set of ing of microrna targets predicts functional non- synergistic miRNAs could be different for different conserved and non-canonical sites. Genome bi- subpopulations of the same cancer type. As more ology, 11(8), R90. data accumulate, the analysis can be repeated over [8] Boross, G. et al. (2009). Human micrornas co- subtypes of cancer. silence in well-separated groups and have dif- Although we showcase results for kidney cancer, ferent predicted essentialities. Bioinformatics, the method can be applied to other diseases as well 25(8), 1063–1069. as healthy cells. The method can be extended to work with other species other than human when- [9] Bracken, C. P. et al. (2016). A network- ever the expression profiles of matched miRNAs biology perspective of microrna function and and mRNAs are available for over a set of samples, dysfunction in cancer. Nature Reviews Genet- and miRNA target predictions are available. ics, 17(12), 719. [10] Carlson, M. (2019). org.hs.eg.db: Genome Acknowledgments wide annotation for human. r package version 3.8.2. The results shown here are in whole or part based [11] Pearl, J. (2009). Causality. Cambridge univer- upon data generated by the TCGA Research Net- sity press. work: https://www.cancer.gov/tcga . We thank Sabanci University and Bilkent University for inter- [12] Chu, A. et al. (2016). Large-scale profiling of nal funding support. This research was supported micrornas for the cancer genome atlas. Nucleic in part by the Intramural Research Program of the acids research, 44(1), e3–e3.

13 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[13] Ding, J. et al. (2014). Microrna modules prefer [24] Harrow, J. et al. (2012). Gencode: the refer- to bind weak and unconventional target sites. ence human genome annotation for the encode Bioinformatics, 31(9), 1366–1374. project. Genome research, 22(9), 1760–1774.

[14] Ding, J. et al. (2017). Ccmir: a computa- [25] Hon, L. S. and Zhang, Z. (2007). The roles tional approach for competitive and cooperative of binding site arrangement and combinatorial microrna binding prediction. Bioinformatics, targeting in microrna repression of gene expres- 34(2), 198–206. sion. Genome biology, 8(8), R166.

[15] Durinck, S. et al. (2009). Mapping identiﬁers [26] Consortium, E. P. et al. (2012). An inte- for the integration of genomic datasets with the grated encyclopedia of dna elements in the hu- r/bioconductor package biomart. Nature proto- man genome. Nature, 489(7414), 57. cols, 4(8), 1184. [27] Hua, X. et al. (2017). mirtrans: a resource of [16] Enright, A. J. et al. (2003). Microrna targets transcriptional regulation on micrornas for hu- in drosophila. Genome biology, 5(1), R1. man cell lines. Nucleic acids research, 46(D1), D168–D174. [17] Fedorko, M. et al. (2016). Micrornas in the pathogenesis of renal cell carcinoma and [28] Bandyopadhyay, S. and Bhattacharyya, M. their diagnostic and prognostic utility as can- (2010). Putmir: a database for extracting cer biomarkers. The International journal of neighboring transcription factors of human mi- biological markers, 31(1), 26–37. crornas. BMC bioinformatics, 11(1), 190.

[18] Friedman, R. C. et al. (2009). Most mam- [29] Wang, J. et al. (2009). Transmir: a transcrip- malian mrnas are conserved targets of micror- tion factor–microrna regulation database. Nu- nas. Genome research, 19(1), 92–105. cleic acids research, 38(suppl 1), D119–D122.

[19] Fukumizu, K. et al. (2009). Kernel choice and [30] Sanchez, D. J. and Simon, M. C. (2018). Tran- classifiability for rkhs embeddings of probabil- scriptional control of kidney cancer. Science, ity distributions. In Advances in neural infor- 361(6399), 226–227. mation processing systems, pages 1750–1758. [31] Jeppesen, A. et al. (2010). Hyponatremia as [20] Gam, J. J. et al. (2018). A mixed antag- a prognostic and predictive factor in metastatic onistic/synergistic mirna repression model en- renal cell carcinoma. British journal of cancer, ables accurate predictions of multi-input mirna 102(5), 867. sensor activity. Nature communications, 9(1), 2430. [32] Krek, A. et al. (2005). Combinatorial microrna target predictions. Nature genetics, 37(5), 495– [21] Gennarino, V. A. et al. (2012). Identification 500. of microrna-regulated gene networks by expression analysis of target genes. Genome research, [33] Lai, X. et al. (2012). Computational analy- 22(6), 1163–1172. sis of target hub gene repression regulated by multiple and cooperative mirnas. Nucleic acids [22] Gonzalez-Perez, A. et al. (2013). Intogen- research, 40(18), 8818–8834. mutations identifies cancer drivers across tumor types. Nature methods, 10(11), 1081. [34] Lai, X. et al. (2019). Systems biology- based investigation of cooperating micrornas as [23] Hahne, F. and Ivanek, R. (2016). Statistical monotherapy or adjuvant therapy in cancer. genomics: methods and protocols. Chapter vi- Nucleic acids research. sualizing genomic data using Gviz and Biocon- ductor. New York: Springer New York, pages [35] Lancaster, H. O. and Seneta, E. (1969). Chi- 335–51. square distribution. Wiley Online Library.

14 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[36] Li, Y. et al. (2014). Mirsynergy: detecting syn- [47] Schmitz, U. et al. (2014). Cooperative gene ergistic mirna regulatory modules by overlap- regulation by microrna pairs and their identiﬁ- ping neighbourhood expansion. Bioinformatics, cation using a computational workﬂow. Nucleic 30(18), 2627–2635. acids research, 42(12), 7539–7552.

[37] Lv, M. et al. (2016). Angiomotin promotes [48] Sejdinovic, D. et al. (2013). A kernel test renal epithelial and carcinoma cell proliferation for three-variable interactions. In Advances in by retaining the nuclear yap. Oncotarget, 7(11), Neural Information Processing Systems, pages 12393. 1124–1132.

[38] Lv, M. et al. (2017). Angiomotin family mem- [49] Selbach, M. et al. (2008). Widespread changes bers: oncogenes or tumor suppressors? Interna- in protein synthesis induced by micrornas. national journal of biological sciences, 13(6), 772. ture, 455(7209), 58.

[39] Mangolini, A. et al. (2014). Diﬀerential ex- [50] Shalgi, R. et al. (2007). Global and lo- pression of microrna501-5p aﬀects the aggres- cal architecture of the mammalian microrna– siveness of clear cell renal carcinoma. FEBS transcription factor regulatory network. PLoS Open Bio, 4, 952–965. computational biology, 3(7), e131.

[40] Meng, X. et al. (2015). Cancernet: a database [51] Shu, X. et al. (2017). Microrna proﬁling in for decoding multilevel molecular interactions clear cell renal cell carcinoma tissues potentially across diverse cancer types. Oncogenesis, 4(12), links tumorigenesis and recurrence with obesity. e177. British journal of cancer, 116(1), 77. [41] Muandet, K. et al. (2017). Kernel mean em- [52] Smola, A. et al. (2007). A hilbert space embedding of distributions: A review and beyond. bedding for distributions. In International Con- Foundations and Trends R in Machine Learn- ference on Algorithmic Learning Theory, pages ing, 10(1-2), 1–141. 13–31. Springer. [42] Nagy, I. Z. et al. (1981). Intracellular na+: K+ ratios in human cancer cells as revealed by en- [53] Song, L. et al. (2013). Kernel embeddings ergy dispersive x-ray microanalysis. The Jour- of conditional distributions: A uniﬁed ker- nal of cell biology, 90(3), 769–777. nel framework for nonparametric inference in graphical models. IEEE Signal Processing Mag- [43] Oosterwijk, E. and Gillies, R. (2014). Tar- azine, 30(4), 98–111. geting ion transport in cancer. Philosophical Transactions of the Royal Society B: Biological [54] Wahdan-Alaswad, R. and Liu, B. (2013). “sis- Sciences, 369(1638), 20130107. ter” mirnas in cancers.

[44] Peng, Y. and Croce, C. M. (2016). The role of [55] Wang, S. et al. (2013). Functional co- micrornas in human cancer. Signal transduction operation of mir-125a, mir-125b, and mir- and targeted therapy, 1, 15004. 205 in entinostat-induced downregulation of erbb2/erbb3 and apoptosis in breast cancer [45] Schölkopf, B. et al. (1999). Advances in Kernel cells. Cell death & disease, 4(3), e556. Methods: Support Vector Learning. MIT Press. [56] Wilcoxon, F. (1945). Individual comparisons [46] Skala, S. L. et al. (2018). Detection of 6 tfeb- by ranking methods. biom bull 1: 80–83. amplified renal cell carcinomas and 25 renal cell carcinomas with mitf translocations: system- [57] Wu, S. et al. (2010). Multiple micrornas modu- atic morphologic analysis of 85 cases evaluated late p21cip1/waf1 expression by directly target- by clinical tfe3 and tfeb fish assays. Modern ing its 3 untranslated region. Oncogene, 29(15), Pathology, 31(1), 179. 2302.

15 bioRxiv preprint doi: https://doi.org/10.1101/769307; this version posted March 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[58] Wyler, L. et al. (2014). Brain metas- [69] Grant, C. E. et al. (2011). Fimo: scanning for tasis in renal cancer patients: metastatic occurrences of a given motif. Bioinformatics, pattern, tumour-associated macrophages and 27(7), 1017–1018. chemokine/chemoreceptor expression. British journal of cancer, 110(3), 686. [70] Otlu, B. et al. (2017). Glanet: genomic loci annotation and enrichment tool. Bioinformatics, [59] Xu, J. et al. (2010). Mirna–mirna synergistic 33(18), 2818–2828. network: construction via co-regulating functional modules and disease mirna topological [71] Zhang, J. et al. (2019). Identifying mirna syn- features. Nucleic acids research, 39(3), 825– ergism using multiple-intervention causal infer- 836. ence. bioRxiv, page 652180.

[60] Spirtes, Peter et al. (2000). Causation, prediction, and search. MIT press

[61] Yang, X. et al. (2013). Both mature mir-17- 5p and passenger strand mir-17-3p target timp3 and induce prostate tumor growth and invasion. Nucleic acids research, 41(21), 9688–9704.

[62] Yi, C. et al. (2013). The p130 isoform of angiomotin is required for yap-mediated hepatic epithelial cell proliferation and tumorigenesis. Sci. Signal., 6(291), ra77–ra77.

[63] Yu, G. et al. (2012). clusterproﬁler: an r package for comparing biological themes among gene clusters. Omics: a journal of integrative biology, 16(5), 284–287.

[64] Yuan, X. et al. (2009). Clustered micrornas’ coordination in regulating protein-protein interaction network. BMC systems biology, 3(1), 65.

[65] Zhang, J. et al. (2016). Identifying mirna synergistic regulatory networks in heterogeneous human data via network motifs. Molecular bioSystems, 12(2), 454–463.

[66] Lawrence, M. et al. (2013). Software for com- puting and annotating genomic ranges. PLoS computational biology, 9(8), e1003118.

[67] Khan, A. et al. (2017). Jaspar 2018: update of the open-access database of transcription factor binding proﬁles and its web framework. Nucleic acids research, 46(D1), D260–D266.

[68] Kulakovskiy, I. V. et al. (2017). Hocomoco: towards a complete collection of transcription factor binding models for human and mouse via large-scale chip-seq analysis. Nucleic acids research, 46(D1), D252–D259.