
Online Supplementary material

MicroRNA regulation of messenger-like non-coding RNAs: a network of mutual microRNA control

Yi Zhao2,*, Shunmin He1,4,*, Changning Liu2,*, Songwei Ru2, Haitao Zhao3, Zhen Yang2, Pengcheng Yang2, Xiongyin Yuan2, Shiwei Sun2, Dongbo Bu2, Jiefu Huang3, Geir Skogerbø1,§, Runsheng Chen1,§

1 Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
2 Bioinformatics Research Group, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
3 Department of Liver Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, CAMS & PUMC, Beijing, China
4 Graduate School of the Chinese Academy of Sciences, Beijing, China

miRNA target site prediction and mlRNA–mRNA expression analysis

Statistical analysis has demonstrated that the expression levels of predicted mRNA targets are sometimes reduced in tissues where the targeting miRNA is expressed [1, 2]. A demonstration of similar effects on messenger-like noncoding RNAs (mlRNAs) would provide evidence that miRNAs may also act on non-coding RNAs. However, directly repeating the analyses carried out on mRNAs is, for several reasons, not straightforward. Earlier studies may have provided lists of miRNA target sites in mRNAs from various model organisms (e.g. PicTar [3]), but the software is commonly not available, at least not in a form that makes a study of several thousand transcripts feasible, and target site prediction in mlRNAs has, to our knowledge, not yet been published. Further, most of the commonly used software [4, 5] includes a conservation filtering step and, despite appearing to be under purifying selection [6], mlRNAs are generally far less conserved than mRNAs, rendering most published software of little use even if available.

We therefore opted for the publicly available miRanda [7] software for miRNA target prediction, as this algorithm does not have a built-in conservation filter step. MicroRNA target site prediction in mRNAs is commonly restricted to the 3’UTR, which is a meaningless term in a non-protein-coding transcript. To determine which portion, if any, of an mlRNA would be most suitable for miRNA target prediction, we used PhastCons [8] scores to assess relative conservation levels. As shown in Figure S1, the distribution of average PhastCons scores for full-length mlRNAs deviates strongly from that of full-length mRNAs, but closely resembles that of the 3’UTR portions of mRNAs. Thus, in the following we used full-length mlRNA sequences for miRNA target prediction.

Figure S1. Conservation of mlRNAs relative to mRNAs. The figure shows the frequency distribution of average PhastCons scores for full-length mlRNA sequences compared to full-length mRNAs, 5’UTRs, 3’UTRs and coding sequences.

The RIKEN cDNA microarray systematically profiles the expression of 54,000 cDNA clones in 20 different mouse tissues [9], including a considerable number of FANTOM2 clones [10]. In the most recent version of the FANTOM3 database [11] there are approximately 68,000 cDNA clones corresponding to mRNAs and 34,000 clones corresponding to mlRNAs. For 15,734 mRNA clones and 10,326 mlRNA clones, expression profiles covering 15 tissues or more can be obtained from the RIKEN cDNA microarray data. For mlRNAs lacking expression data in only a few tissues, we set the expression levels in these tissues to 0 (i.e. corresponding to the average of all genes).

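The filtering and imputation step described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name `filter_profiles`, the use of `None` for a missing measurement, and the toy clone identifiers are all assumptions made for the example. It also assumes that expression values are expressed relative to the all-gene average, so that 0 corresponds to the average of all genes.

```python
from typing import Dict, List, Optional

MIN_TISSUES = 15  # keep clones with data in 15 tissues or more

def filter_profiles(
    profiles: Dict[str, List[Optional[float]]],
    min_tissues: int = MIN_TISSUES,
) -> Dict[str, List[float]]:
    """Keep clones measured in at least `min_tissues` tissues and
    replace their remaining missing values (None) with 0."""
    kept: Dict[str, List[float]] = {}
    for clone_id, values in profiles.items():
        n_measured = sum(v is not None for v in values)
        if n_measured >= min_tissues:
            kept[clone_id] = [0.0 if v is None else v for v in values]
    return kept

# Toy data: 20 tissues per clone (hypothetical clone IDs and values).
toy = {
    "clone_A": [0.5] * 18 + [None, None],  # 18 tissues measured, kept
    "clone_B": [1.2] * 10 + [None] * 10,   # 10 tissues measured, dropped
}
filtered = filter_profiles(toy)
```

In this toy input only `clone_A` passes the filter, and its two missing tissues are set to 0.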

We predicted miRNA target sites in the full-length sequences of the 10,326 mlRNAs using the miRanda v1.9 software [7] with the parameter -sc 120. To increase prediction accuracy and reduce the number of false positives, we introduced a conservation filter, selecting mlRNAs with a full-length average PhastCons score > 0.2. The miRanda score was then used to select approximately the top 100 mlRNA targets, which were used for further analysis. As target prediction with miRanda is assumed to be less accurate [12] than the software used in the most relevant similar study on mRNAs (PicTar; [2]), we also predicted target sites in the 10,981 mRNAs with 3’UTR annotation in the FANTOM3 data, and selected the top 200 mRNA targets for subsequent analysis and comparison.

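The selection of top-scoring targets after conservation filtering might look roughly like the sketch below. It is illustrative only: the `Hit` record, the function `top_targets`, and the transcript identifiers are hypothetical, and ranking each transcript by its single best site score is an assumption, since the text does not state how multiple sites per transcript were combined.

```python
from typing import Dict, List, NamedTuple

class Hit(NamedTuple):
    transcript: str   # mlRNA identifier (hypothetical IDs below)
    score: float      # miRanda score of one predicted site
    phastcons: float  # full-length average PhastCons score of the transcript

def top_targets(hits: List[Hit], min_phastcons: float = 0.2,
                top_n: int = 100) -> List[str]:
    """Return up to `top_n` transcripts ranked by their best site score,
    after discarding transcripts with average PhastCons <= min_phastcons."""
    best: Dict[str, float] = {}
    for h in hits:
        if h.phastcons > min_phastcons:
            best[h.transcript] = max(h.score, best.get(h.transcript, float("-inf")))
    ranked = sorted(best, key=lambda t: best[t], reverse=True)
    return ranked[:top_n]

hits = [
    Hit("ml1", 150.0, 0.35),
    Hit("ml1", 171.0, 0.35),
    Hit("ml2", 180.0, 0.10),  # fails the conservation filter
    Hit("ml3", 160.0, 0.25),
]
selected = top_targets(hits, top_n=2)
```

Here `ml2` has the highest score but is removed by the conservation filter, so the selection contains `ml1` followed by `ml3`.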

Subsequent analyses were carried out entirely in accordance with Sood et al. [2], where a detailed description of the method is provided. In brief, we first predicted targets for each miRNA, and then ranked expression levels among the 20 tissues for each target gene. This gave a vector (for each tissue) containing the rank of that tissue for all target genes included in the analysis. The probability that the median of this tissue vector differs from that of the vector generated from the background set (i.e. all mlRNAs with expression data from >15 tissues) was estimated using Wilcoxon’s one-sided test, and a score was calculated as the negative natural logarithm of this probability. A negative score was reported when the probability of the “less than” one-sided test was the smallest. Positive scores were reported in a corresponding manner. The score thus indicates the probability that the median expression level of the mlRNA targets is lower (or higher) when compared with the background set. All statistical tests were performed with R using default settings.

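The signed score described above can be illustrated with the following sketch. It is not the R code used in the study: it uses only the Python standard library and substitutes a normal approximation (with tie correction, without continuity correction) for the rank-sum p-value, so the numbers may differ slightly from those given by R's defaults. The function names and the toy rank vectors are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (lowest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks

def one_sided_p(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (p_less, p_greater) for a rank-sum test of x against y,
    using the normal approximation (an assumption of this sketch)."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    total = n + m
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n]) - n * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n * m / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var <= 0:  # all values identical: no evidence in either direction
        return 0.5, 0.5
    z = (u - n * m / 2) / math.sqrt(var)
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))
    return p_less, p_greater

def signed_score(target: Sequence[float], background: Sequence[float]) -> float:
    """Negative when the 'less than' test gives the smaller p-value
    (targets ranked lower than background), positive otherwise."""
    p_less, p_greater = one_sided_p(target, background)
    if p_less < p_greater:
        return math.log(p_less)
    return -math.log(p_greater)

# Toy input: rank of one tissue (1 = lowest of 20) for each gene.
target = [1, 2, 2, 3, 1, 4]
background = list(range(1, 21)) * 3
score = signed_score(target, background)
```

In the toy input the tissue ranks low for all target genes, so the resulting score is negative, which corresponds to reduced expression of the targets relative to the background set.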

The results are summarized in Table 1 and Figure 1 in the main text, and Figure S2 (below). We analysed the expression levels of predicted targets of 8 different miRNAs specifically expressed in 5 different tissues [13-15], constituting 10 different miRNA-tissue combinations. This resulted in 4 cases in which predicted mlRNA targets had significantly (p<0.05) reduced expression levels in the tissue where the targeting miRNA is expressed. A corresponding expression analysis of predicted mRNA targets produced two cases with a tendency towards down-regulation (p<0.05). As previous analyses of predicted mRNA targets were carried out on human material, our data from mice are not directly comparable. Nonetheless, miR-124a, which was shown to induce strongly reduced expression of human target mRNAs in the brain [1], displayed the same result for mouse target mRNAs (Figure S2). Both mRNA and mlRNA targets of the heart- and skeletal-muscle-specific miR-133a show significantly reduced expression in at least one of these tissues in the mouse, which is similar to the results from human target mRNAs [2]. Similarly, the expression of mRNA targets of miR-122a in the liver is strongly, though not significantly (p=0.08), reduced, as was also seen in human liver [2].

Table S1. Eight mouse tissue-specific miRNAs and their corresponding tissues of expression.

miRNA           Tissue
mmu-miR-122a    Liver
mmu-miR-124a    Brain
mmu-miR-133a    Heart, skeletal muscle
mmu-miR-153     Brain
mmu-miR-206     Heart, skeletal muscle
mmu-miR-208     Heart
mmu-miR-375     Pancreas
mmu-miR-376a    Pancreas

The significance scores obtained in this study (ranging from -7 to -4, corresponding to p values between 0.008 and 0.05) were considerably higher (i.e. “less significant”) than those reported by Sood et al. [2] (ranging from -17 to -8) (see Figure 1 in the main text, and Figure S2). This may partly have been caused by the microarray data set used, as a similar analysis of PicTar mRNA targets using the RIKEN cDNA microarray data consistently gave lower significance scores than when using the SymAtlas data (see: http://symatlas.gnf.org) (data not shown). However, most of the difference is likely due to the lower accuracy of miRNA target site prediction imparted by miRanda [12]. By comparison, analysis of PicTar-predicted mouse target mRNAs gave a much higher number of tissues with significantly reduced expression of targets of tissue-specific miRNAs.

[Figure S2: two panels, miR-133a and miR-124a. Ordinate: +/- ln (one-sided Wilcoxon p-value); abscissa: tissues.]

Figure S2. Tissue-specific effects of miRNAs on mRNA target expression. The figure shows a statistical analysis of the expression of mouse mRNA targets of two tissue-specific miRNAs across 20 tissues. The ordinate indicates the statistical significance value of the down-regulation.

The miRNA-mlRNA network

To get an impression of the miRNA-mlRNA network, we used miRanda (score >160) to predict target sites for 461 mouse miRNAs (miRBase v10.0) in the full-length sequences of the 10,326 mlRNAs used for the expression analysis. This prediction resulted in 158,025 putative target sites, corresponding to the same number of edges in a potential miRNA-mlRNA post-transcriptional regulatory network. On average, each mlRNA would be targeted by 15.3 miRNAs, whereas each miRNA would target an average of 343 mlRNAs. Only 156 mlRNAs were not predicted to be targeted by any of the 461 miRNAs, and the highest number of miRNAs potentially targeting one single mlRNA was 79. Only four miRNAs were not predicted to target any of the 10,326 mlRNAs. There were substantial differences in the number of targeted mlRNAs per miRNA; more than one hundred miRNAs target fewer than 30 mlRNAs, whereas around 40 miRNAs were predicted to have more than a thousand mlRNA targets.

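The averages quoted above follow directly from the reported totals, as the short sketch below shows. The totals are taken from the text; the helper `out_degrees` and the toy edge list with its identifiers are hypothetical and only indicate how per-miRNA target counts could be tallied from a list of predicted edges.

```python
from collections import Counter
from typing import Iterable, Tuple

# Totals reported in the text.
N_EDGES, N_MLRNAS, N_MIRNAS = 158_025, 10_326, 461

edges_per_mlrna = N_EDGES / N_MLRNAS  # about 15.3
edges_per_mirna = N_EDGES / N_MIRNAS  # about 343

def out_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count predicted target edges per miRNA from (miRNA, mlRNA) pairs."""
    return Counter(mirna for mirna, _ in edges)

# Toy edge list with hypothetical identifiers.
toy_edges = [("miR-a", "ml1"), ("miR-a", "ml2"), ("miR-b", "ml1")]
toy_degrees = out_degrees(toy_edges)
```

Note that both averages are computed per edge, i.e. per predicted target site.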

A similar analysis of the 10,981 mRNA 3’UTRs indicated that each mRNA would be targeted by 7.3 miRNAs, and each miRNA would on average target 174 mRNAs. The higher number of mlRNA targets probably reflects that the full-length mlRNA sequences (1930 nt) are generally more than twice (2.2x) as long as 3’UTRs (840 nt). Compared with the comparative study by Sethupathy et al. [12], in which miRanda predicted 572 targets/miRNA, our score threshold of 160, used for both the mlRNA and mRNA analyses, appears to be a relatively stringent condition.

A question of particular interest is whether pri-miRNAs might be under post-transcriptional control of other miRNAs. There are presently 33 mlRNAs that are known to encode a total of 36 miRNAs. To get an impression of the characteristics of the miRNA-miRNA network, we predicted target sites for these 36 miRNAs in the 33 miRNA-encoding mlRNAs. We obtained 162 miRNA-mlRNA interactions forming a continuous network (Figure 1 in the main text). Analysis of network topology with FANMOD [16] showed that a number of 3- and 4-node motifs were significantly enriched in the network (Table S2 and Figure S3), and a few miRNAs are predicted to have target sites within their own primary transcript sequences.

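How miRNA-mlRNA target predictions translate into a miRNA-miRNA network, and how auto-regulation and two-node feedback loops can be read off it, is sketched below. This is an illustration under stated assumptions and not the FANMOD analysis itself: the functions `mirna_graph`, `self_loops` and `feedback_pairs`, as well as the host transcript and miRNA names, are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

def mirna_graph(sites: Iterable[Edge], encodes: Dict[str, List[str]]) -> Set[Edge]:
    """Turn (miRNA, host mlRNA) target predictions into directed
    miRNA -> miRNA edges via the miRNAs each host transcript encodes."""
    edges: Set[Edge] = set()
    for mirna, host in sites:
        for product in encodes.get(host, []):
            edges.add((mirna, product))
    return edges

def self_loops(edges: Set[Edge]) -> List[str]:
    """miRNAs with a predicted site in their own primary transcript."""
    return sorted(a for a, b in edges if a == b)

def feedback_pairs(edges: Set[Edge]) -> List[Edge]:
    """Pairs of distinct miRNAs that are predicted to target each other."""
    return sorted((a, b) for a, b in edges if a < b and (b, a) in edges)

# Toy data with hypothetical identifiers.
encodes = {"hostA": ["miR-x"], "hostB": ["miR-y", "miR-z"]}
sites = [("miR-x", "hostA"), ("miR-x", "hostB"), ("miR-y", "hostA")]

graph = mirna_graph(sites, encodes)
auto = self_loops(graph)
loops = feedback_pairs(graph)
```

In the toy data, `miR-x` targets its own host transcript (auto-regulation), and `miR-x` and `miR-y` target each other's host transcripts (a feedback loop).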

Table S2. Significantly enriched subgraphs

ID       Frequency [Original]   Mean-Freq [Random]   Standard-Dev [Random]   Z-Score   p-Value
166      1.1919%                0.72063%             0.0021995               2.1426    0.014
590      0.79901%               0.39795%             0.0013821               2.9018    0.006
2076     1.6513%                1.0531%              0.0021141               2.8296    0.006
158      2.5568%                1.8225%              0.0028017               2.6208    0.003
18460    0.17756%               0.070099%            0.00041159              2.6108    0.014
2140     0.78125%               0.46183%             0.001333                2.3963    0.015
710      1.1009%                0.63749%             0.0022801               2.0322    0.035

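The Z-scores in Table S2 appear consistent with the usual definition, i.e. the observed subgraph frequency minus the mean frequency in the randomized networks, divided by the standard deviation of the randomized frequencies, provided the standard deviation column is read as a fraction rather than a percentage. The sketch below checks this for subgraph ID 166; the function name `z_score` is an assumption made for the example.

```python
def z_score(freq_original: float, mean_random: float, sd_random: float) -> float:
    """Standard score of the observed frequency against the random ensemble."""
    return (freq_original - mean_random) / sd_random

# Subgraph ID 166 from Table S2, with percentages converted to fractions.
z_166 = z_score(0.011919, 0.0072063, 0.0021995)
```

The value obtained this way agrees with the tabulated Z-score of 2.1426 to within rounding.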

Figure S3. Examples of regulatory subgraphs. A) Multiple-Input Motif + Auto-regulation. B) Feed Back Loop + Auto-regulation. C) Single-Input Motif + Auto-regulation. D) Feed Forward Loop + Auto-regulation.

References

1. Lim, L.P., et al., Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature, 2005. 433(7027): p. 769-773.
2. Sood, P., et al., Cell-type-specific signatures of microRNAs on target mRNA expression. PNAS, 2006. 103(8): p. 2746-2751.
3. Krek, A., et al., Combinatorial microRNA target predictions. Nat Genet, 2005. 37(5): p. 495-500.
4. Mishima, Y., et al., Differential Regulation of Germline mRNAs in Soma and Germ Cells by Zebrafish miR-430. Current Biology, 2006. 16(21): p. 2135-2142.
5. Wu, L., J. Fan, and J.G. Belasco, MicroRNAs direct rapid deadenylation of mRNA. Proceedings of the National Academy of Sciences, 2006. 103(11): p. 4034-4039.
6. Ponjavic, J., C.P. Ponting, and G. Lunter, Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res., 2007: p. gr.6036807.
7. Enright, A.J., et al., MicroRNA targets in Drosophila. Genome Biol, 2003. 5(1): p. R1.
8. Siepel, A., et al., Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005. 15(8): p. 1034-50.
9. Bono, H., et al., Systematic Expression Profiling of the Mouse Transcriptome Using RIKEN cDNA Microarrays. Genome Res., 2003. 13(6b): p. 1318-1323.
10. Okazaki, Y., et al., Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 2002. 420(6915): p. 563-73.
11. Carninci, P., et al., The Transcriptional Landscape of the Mammalian Genome. Science, 2005. 309(5740): p. 1559-1563.
12. Sethupathy, P., M. Megraw, and A.G. Hatzigeorgiou, A guide through present computational approaches for the identification of mammalian microRNA targets. Nat Meth, 2006. 3(11): p. 881-886.
13. Sempere, L.F., et al., Expression profiling of mammalian microRNAs uncovers a subset of brain-expressed microRNAs with possible roles in murine and human neuronal differentiation. Genome Biol, 2004. 5(3): p. R13.
14. Lagos-Quintana, M., et al., Identification of tissue-specific microRNAs from mouse. Curr Biol, 2002. 12(9): p. 735-9.
15. Poy, M.N., et al., A pancreatic islet-specific microRNA regulates insulin secretion. Nature, 2004. 432: p. 226-230.
16. Wernicke, S. and F. Rasche, FANMOD: a tool for fast network motif detection. Bioinformatics, 2006. 22(9): p. 1152-1153.