Investigating the competing endogenous RNA hypothesis Genome-wide and in Single Cells

by

Apratim Sahay

B.S. in Physics and Mathematics, University of Chicago (2008)

Submitted to the Department of Physics in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2015

Massachusetts Institute of Technology 2015. All rights reserved.

Author: Department of Physics, May 22nd, 2015 (signature redacted)

Certified by: Alexander van Oudenaarden, MIT Professor of Physics and Professor of Biology, Director, Hubrecht Institute for Developmental Biology, Thesis Supervisor (signature redacted)

Certified by: Jeff Gore, Latham Family Career Development Assistant Professor of Physics, Thesis Supervisor (signature redacted)

Accepted by: Professor Nergis Mavalvala, Associate Department Head of Physics (signature redacted)


Investigating the competing endogenous RNA hypothesis Genome-wide and in Single Cells

by

Apratim Sahay

Submitted to the Department of Physics on May 22nd, 2015, in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Abstract

The observation that microRNAs (miRNAs) can, through a titration mechanism, couple interactions of their common targets (competing endogenous RNAs or ceRNAs) has prompted a general "ceRNA hypothesis" that RNAs can regulate each other indirectly through global RNA-miRNA-RNA networks. These ceRNAs are said to "crosstalk" with each other by competing for common miRNAs. Although many individual ceRNAs have been found, fundamental questions about both the magnitude and generality of the crosstalk effect remain. In our study we combine RNA sequencing and single-molecule FISH (smFISH) approaches both to measure the magnitude of the crosstalk effect genome-wide by perturbing three known ceRNAs (Pten, Vapa, Cnot6l) and to identify mechanisms by which the crosstalk effect acts. We identify hundreds of putative ceRNAs and dissect the contributions of individual miRNAs in transmitting crosstalk. We demonstrate that while the crosstalk effect is pervasive, it nevertheless remains bounded by the size of the perturbation. Furthermore, we show that both the number and the affinity of shared miRNA binding sites between targets are crucial in determining the magnitude of the crosstalk strength. Using the smFISH data, we examined the single-cell expression profiles of pairs of ceRNAs and found that ceRNA gene expression is correlated only in the presence of active miRNAs. Additionally, on inspecting the intra-cellular localization of RNA molecules, we found a miRNA-dependent colocalization of ceRNAs, suggesting a new signature of crosstalk between ceRNAs that extends and modifies the original hypothesis.

Thesis Supervisor: Alexander van Oudenaarden Title: MIT Professor of Physics and Professor of Biology Director, Hubrecht Institute for Developmental Biology

Thesis Supervisor: Jeff Gore Title: Latham Family Career Development Assistant Professor of Physics

This work is dedicated to my grandparents

Gaur Priya Devi & Krishnanand Sahay, Veena Srivastava & Shailendra Nath Srivastava who instilled in me their love for the life of the mind

and the desire to share its fruits with others.

Acknowledgements

This thesis would not have been possible without the help, encouragement and support of many people to whom I owe a debt of gratitude. First and foremost, I thank Alexander van Oudenaarden, my thesis advisor, who welcomed me into his lab and gave me great freedom and support throughout my PhD. Alexander's grasp of experimental biophysics is truly broad and deep, as I found while he led the lab through the smFISH era, the RNA sequencing era and the single-cell sequencing era. Not only was he an inspiring scientist, but he also created a fantastic group of enormously talented students and post-docs in building 68 that buzzed with stimulating ideas. After introducing me to microRNAs and suggesting an experimental plan of attack, he then stepped back to let me find my own way. Always there to offer a suggestion, to share in excitement or to help think through a problem, he has been a great mentor. After his move to Utrecht, he offered me numerous opportunities to visit him there and work with another set of fantastic people. Finally, I am also thankful for the opportunity as a graduate student to be able to make mistakes. I will be forever grateful for Alexander's limitless patience throughout this process.

I sincerely thank my thesis committee members, Jeff Gore, Jeremy England and Mehran Kardar, for their support and advice throughout my graduate years. I thank Jeff in particular for his blend of unflappable enthusiasm and guidance during some of the more trying phases of research.

Next, I thank my wonderful collaborators: Joern Schmiedel, Yannan Zheng, Sandy Klemm and Dominic Gruen. Joern came to MIT a year into my thesis project and has helped shape and sharpen my ideas tremendously. His enthusiasm and dogged persistence in solving problems were a great boost whenever I was stuck in dark alleys. Yannan and I started and finished our PhDs together and have been through all the ups and downs of graduate student life together. She taught me a lot about microRNA biology and was an invaluable source of experimental guidance, especially in cell culturing and cloning. Dominic helped set up the RNA sequencing pipeline in Utrecht and generously shared his expertise in microRNA bioinformatic analysis. Sandy was a fantastic friend and a critical sounding board for hypotheses, and taught me the intricacies of live-cell FACS sorting.

My graduate life would not have been half as much fun without the tremendous people at the AvO lab: Dong Hyun Kim, who mentored me in worm biology when I first came to the lab and trained me in the dark arts of FISH. He and Christoph Engert were vital founts of friendship, mentorship and cheer. To the postdocs Stefan, Jeroen, Magda, Nick, Nikolai, Lenny, Shalev, Philipp, Anna, Gregor, Arjun and Scott, who took the time to provide critical advice on experiments, research, and life. To the amazing graduate students in the lab office who shared all the joys and frustrations of research and made the AvO lab fun and exciting: Ruizhen, Miaoqing, Bernardo, Clinton, Ni, Annalisa, Dylan, Kay, Juan, Shankar and Hyun. Lastly, Monica Wolf, Annemiek van Rooijen, Crystal, Cathy and Katie, who have meticulously taken care of any and all administrative issues that have cropped up.

During my time at MIT, I've been lucky to have some wonderful roommates and friends, Michelle, Andrew, Andrew Stecker, Arghavan and David, who have been fantastic at helping me keep a balanced life. To my friends on the squash courts, who have offered huge support and camaraderie over the years, thank you for helping me maintain my sanity: Najib, Ann, Pam, Jan, Frans, Christopher, Christoph, Justin and Mehmood.

Finally, I would like to thank my parents, Aparajita and Avinash, for being so amazingly supportive throughout my entire academic career, and life in general, and for providing countless opportunities to me. My sisters Ananya and Apoorva, for your love and feigned excitement at my research! My cousins Sunny, Pranay, and Abhilash, for their encouragement and shared geekdom. My extended family in India, for their tremendous support over the years. Lastly, my wife Liz, without whom I would never have been introduced to the world of biology, and without whose unwavering support none of this would have happened. Your intelligence, encouragement and limitless love make all things possible.

Table of Contents

Acknowledgements

List of Figures

1 Introduction
  1.1 MicroRNAs - discovery, biogenesis, target binding and competition
    1.1.1 Discovery of miRNA Regulation
    1.1.2 Biogenesis of miRNAs
    1.1.3 miRNAs: target binding and competition
  1.2 ceRNAs: Discovery
    1.2.1 Different types of endogenous ceRNAs
    1.2.2 3'UTRs as ceRNAs
    1.2.3 Circular RNAs
    1.2.4 Pseudogenes as ceRNAs
    1.2.5 Long non-coding RNAs (lncRNAs) as ceRNAs
  1.3 Modulators of crosstalk activity
    1.3.1 Abundance of miRNA binding sites and miRNA concentration
    1.3.2 MiRNA binding affinity
    1.3.3 MRE Accessibility and Local concentrations
    1.3.4 Post-transcriptional network effects
  1.4 Summary and Outline

2 Assessment of the ceRNA hypothesis with integrated genome-wide measurements reveals bounded yet pervasive crosstalk activity
  2.1 Results
    2.1.1 ODE biochemical model of crosstalk predicts that crosstalk strength should be bounded by 1
    2.1.2 Quantification of crosstalk following siRNA knockdown of sender
    2.1.3 Pervasive yet bounded mRNA Crosstalk upon siRNA knockdown
    2.1.4 Crosstalk strength correlates with the number of shared binding sites
    2.1.5 miRNAs hierarchically contribute to transmitting crosstalk
    2.1.6 Pten miRNAs have the greatest crosstalk power due to high [miRNA]:Target abundance ratios
    2.1.7 Transfecting Pten UTR as a sponge de-represses putative ceRNAs in a dose-dependent and miRNA-dependent manner
  2.2 Discussion and conclusions
  2.3 Methods and Materials
    2.3.1 Cell culture and siRNA Transfection
    2.3.2 RNA extraction
    2.3.3 RT-PCR
    2.3.4 Reporter Plasmid Construction
    2.3.5 Transient Transfection of plasmid
    2.3.6 FACS sorting
    2.3.7 RNA Sequencing
    2.3.8 RNASeq Data Analysis
    2.3.9 miRNA-mRNA Target prediction
    2.3.10 miRNA expression Data sources
    2.3.11 Target Abundance and Sequestration estimation
    2.3.12 GO term analysis
    2.3.13 TMM (Trimmed Mean of M-values) Normalization
  2.4 Supplementary Figures and Tables

3 A single molecule analysis of ceRNAs reveals miRNA-dependent correlation and colocalization
  3.1 Results
    3.1.1 Quantification of gene expression for Pten, Vapa and Cnot6l in single cells with 3-colour smFISH
    3.1.2 Presence of shared miRNAs generates correlated fluctuations of Pten ceRNAs in single cells
    3.1.3 Pten, Vapa, Cnot6l are mutually reciprocal ceRNAs
    3.1.4 Individual molecules of Pten ceRNAs are colocalized in a miRNA-dependent manner
  3.2 Discussion
  3.3 Methods
    3.3.1 Fluorescent in situ hybridization and imaging
    3.3.2 Image analysis
    3.3.3 siRNA transfection and cell culturing

4 MicroRNA-mediated control of expression noise
  4.1 Background
  4.2 Effects of microRNAs on gene expression noise
  4.3 Conclusions

5 Conclusions and Future Directions

References

Appendix A: Mathematical model of microRNA regulation by Joern Schmiedel

List of Figures

1.1 Canonical miRNA biogenesis pathway (adapted from Davis-Dusenbery & Hata, 2010)
1.2 Logic of the ceRNA language (adapted from Salmena, 2011)
1.3 Various types of validated competing endogenous RNAs (adapted from Tay & Pandolfi, 2014)
1.4 Extensive co-targeting of miRNAs: many targets share miRNA binding sites (adapted from Obermayer, 2014). The color of the edges indicates the number of pairs which share a given pair of miRNAs, while the size of the nodes indicates the total number of shared targets for a given miRNA

2.1 ODE biochemical model of miRNA-mediated crosstalk predicts that crosstalk strength should be bounded
2.2 siRNA knockdown of 3 different endogenous senders shows crosstalk strength is bounded by 1
2.3 Crosstalk is miRNA-mediated and pervasive on a genome-wide scale
2.4 Crosstalk strength of receivers with sender CNOT6L does not depend on their predicted number of shared binding sites with CNOT6L. Related to Figure 2.5
2.5 Crosstalk strength of receivers correlates with the predicted number of miRNA binding sites shared with the sender
2.6 Dissecting relative contributions of miRNAs in transmitting crosstalk
2.7 Greater miRNA:Target ratios underlie Pten's superior ability to send crosstalk
2.8 Derepression of Pten ceRNAs is detected upon modulating the levels of Pten 3' UTR with a transiently transfected synthetic reporter construct
2.9 Normalization is required for FACS-sorted RNAseq data, as reads from plasmid occupy a large percentage of total sequencing reads, leading to an overall offset in fold changes
2.10 Transfecting Pten UTR as a sponge derepresses putative ceRNAs in a dose-dependent and miRNA-dependent manner
2.11 Predicted TargetScan conserved miRNA binding sites in the 3'UTR of the ceRNAs chosen in this study
2.12 Crosstalk is microRNA-mediated and pervasive on a genome-wide scale. Related to Figure 2.3
2.13 Distribution of log2 fold changes (PTEN UTR/NULL) for all post TMM normalization is centered around zero in each bin, i.e. no bin-dependent effects are seen. Related to Figure 2.10

3.1 Measuring Pten, Vapa and Cnot6l gene expression in single cells with 3-colour single-molecule FISH
3.2 Crosstalk helps ceRNAs co-fluctuate in single cells, thereby tightening their stoichiometric ratios in the presence of active miRNAs
3.3 Pten does not lose correlation in DICER for a gene with which it doesn't share miRNAs
3.4 Measuring crosstalk strength with smFISH for 3 different senders in HCT116 and DICER -/-
3.5 Single molecule FISH shows Pten ceRNAs are colocalized in a DICER-dependent manner

4.1 Opposing noise effects of microRNA regulation at low and high gene expression
4.2 Noise model predictions for a microRNA regulated gene
4.3 microRNA-mediated intrinsic noise effects
4.4 Estimation of microRNA pool noise and noise effects for endogenous genes

5.1 Colocalization of ceRNAs can enhance crosstalk by increasing their local concentrations, hence promoting rates of miRNA association between ceRNAs, as free miRNAs are more likely to bind to nearby mRNA than other targets (adapted from Jens, 2015)

Chapter 1

Introduction

According to the central dogma of molecular biology, RNAs are passive messengers of genetic information, carrying out DNA instructions for protein production in cells. Past studies of gene regulatory networks focused on transcriptional regulation in the form of protein transcription factors binding to DNA, but increasing evidence suggests that post-transcriptional regulation is a significant part of the regulatory network. MicroRNAs are a class of short noncoding RNAs, 18-25 nucleotides in length, that inhibit their target genes by binding with imperfect complementarity to sites on the 3' untranslated regions (UTRs) of target RNA transcripts, decreasing expression of the target either by mRNA degradation or by translational inhibition (Bartel, 2009). Their discovery has dramatically increased the complexity of gene regulatory networks.

MicroRNAs can act in a combinatorial manner, as a single mRNA usually contains binding sites for multiple miRNAs. At the same time, an individual miRNA often targets up to 200 transcripts that are diverse in their function. Within the network of potential interactions that ensues, miRNAs have been thought to function mainly as fine tuners of gene regulation by weakly dampening protein output (Bartel 2004), but more recently attention has been directed to their system-level effects. In particular, if microRNAs act to negatively regulate RNAs, could RNAs themselves regulate microRNA levels? After all, each target binding site sequesters miRNAs from their other targets. The central mechanism underlying the ceRNA hypothesis proposed by (Salmena 2011) is the idea that RNA species are coupled through the binding sites they share for the same miRNAs. Therefore, they may have interactions that are not direct, but instead indirect and mediated by competition for, and depletion of, shared microRNA pools. Thus RNAs could be said to "crosstalk" with each other. Moreover, the hypothesis contends that these indirect RNA interactions result in a biologically important mRNA network, either by functional changes in protein levels, by inducing correlations between different RNA species, or by reducing noise in protein levels. This mechanism is believed to play a role in many biological processes, from cancer (Tay 2011) to cell differentiation (Cesana 2011).

In the next section, we discuss miRNA biology and literature summarizing the experimental evidence of RNA-RNA crosstalk, as well as the modulators of crosstalk activity.

1.1 MicroRNAs - discovery, biogenesis, target binding and competition

1.1.1 Discovery of miRNA Regulation

MicroRNAs were first discovered in the nematode C. elegans in 1993, where lin-4, a short non-coding RNA, was found to imperfectly base-pair to complementary sequences on the 3'UTR of the lin-14 transcript (Wightman 1993, Lee 1993) and block lin-14 gene expression. Reduction of LIN-14 protein resulted in mis-timing of the developmental stages of the animal. Lin-4 remained the only miRNA discovered until 2000, when another miRNA important in the development of C. elegans, let-7, was discovered (Reinhart et al., 2000). Analogues of lin-4 and let-7 were found in a wide range of other species, including humans, and in the following years over 1500 different miRNA sequences were discovered. A huge amount of research focused on the identification of target sites (Lewis 2005, Stark 2003), their likely cellular functions (Giraldez 2006, Vigorito 2007) and their biogenesis (Hutvagner and Zamore, 2002). MiRNAs have been ascribed roles in nearly every biological process, including apoptosis (Cimmino 2005), pluripotency (Subramanyam 2011), and cell-cycle control (Ivanovska 2008).

Figure 1.1 | Canonical miRNA biogenesis pathway (adapted from Davis-Dusenbery & Hata, 2010).

1.1.2 Biogenesis of miRNAs

In animals, miRNAs are transcribed by RNA Pol II as long primary transcripts (pri-miRNAs) with both a 5' cap and a 3' poly-adenylated end (Cai 2004). miRNA genes are often genomically clustered, such that pri-miRNA transcripts contain multiple mature miRNA sequences (Lau 2001). These primary miRNAs are recognized and clipped by the microprocessor complex, comprising the RNase III enzyme DROSHA (Lee 2002) and its co-factor, the RNA-binding protein DGCR8 (Gregory 2004), into hairpin loops 60-65 nt long (pre-miRNAs). These hairpin loops are bound and exported from the nucleus into the cytoplasm by Exportin-5. Once in the cytoplasm, the pre-miRNAs are bound by a second RNase III enzyme, DICER1, which cleaves the precursor loops into short double-stranded 20-24 nt RNAs (Grishok 2001) containing the mature miRNA "guide strand" and "passenger strand". In a less understood process, DICER1 loads the mature miRNA into the Argonaute complex (usually AGO2), which in turn recruits the RNA-induced silencing complex (RISC) (Sontheimer 2005). Upon loading of the miRNA into the RISC complex, the passenger strand of the double-stranded miRNA is usually degraded, while the guide strand bound to the silencing complex seeks out its complementary RNA sequence. As biogenesis consists of multiple steps, numerous mechanisms for modulating it have been shown, with implications for ceRNA competition that will be discussed later. In particular, overexpression of Ago2 was found to increase mature miRNA levels in some cells, while disruption of the DICER1 enzyme resulted in lowered levels of mature miRNAs (Diederichs 2007, Tay 2011). We will use cells lacking DICER1 as an important control in all our experiments in Chapters 2 and 3.

1.1.3 miRNAs: target binding and competition

The specificity of the target recognition process depends upon a crucial "seed" region of the miRNA (usually nt 2-7/8) recognizing as few as 6-7 nucleotides in the 3'UTR of the target mRNA (called the microRNA Response Element or MRE). In most cases, even a single mismatch in the seed sequence leads to disruption of miRNA binding (Lewis 2005). Even so, with so few nucleotides in the seed region responsible for target recognition, individual miRNAs potentially bind to a large number of target mRNAs. However, as with any bimolecular binding reaction of the form A + B ⇌ AB, the mass-action law dictates what proportion of targets would be bound, and thus repressed, by a miRNA. This relates the molecular concentrations of the miRNA and its targets to the Kd of the interaction. If miRNAs are limiting, then increasing the number of targets would result in lower occupancy per target. Put another way, each miRNA bound to a target necessarily prevents, to some extent, the binding of that miRNA to other target sites. Thus, target sites can be said to compete with each other for miRNAs. More generally, competition and saturation effects occur in other parts of the miRNA regulatory process. When mature miRNAs are loaded onto the Argonaute complex, there is competition for access due to the small number of molecules involved, which can lead to saturation conditions for the RISC machinery.
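The mass-action argument above can be made concrete with a small numerical sketch. It assumes a single miRNA species and a single pool of identical, independent target sites at equilibrium, with total concentrations and Kd expressed in the same (arbitrary) units; the function names and the parameter values are illustrative assumptions, not quantities measured in this thesis.

```python
import math


def bound_complex(m_total, t_total, kd):
    """Equilibrium concentration of miRNA-target complex for M + T <-> MT.

    Solves kd = (m_total - c) * (t_total - c) / c for c and returns the
    smaller (physical) root of the resulting quadratic.
    """
    s = m_total + t_total + kd
    return (s - math.sqrt(s * s - 4.0 * m_total * t_total)) / 2.0


def occupancy(m_total, t_total, kd):
    """Fraction of target sites that carry a bound miRNA."""
    return bound_complex(m_total, t_total, kd) / t_total


if __name__ == "__main__":
    # Illustrative values only: a fixed miRNA pool and a growing target pool.
    m_total, kd = 10.0, 5.0
    for t_total in (5.0, 50.0, 200.0):
        print(t_total, round(occupancy(m_total, t_total, kd), 3))
```

With the miRNA pool held fixed, the per-target occupancy falls as the target pool grows, which is the competition effect described in the text.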

The concept of competitive target inhibition by miRNAs inside the cell was first shown in 2007 by (Ebert 2007), who used plasmids overexpressing miRNA seed-sites (up to ~10,000 copies) to "sponge up" specific endogenous miRNAs, and thereby titrate away those miRNAs from their other targets, resulting in a specific up-regulation of the corresponding miRNA targets. Consistent with the limited power of miRNA repression, they measured a mild 1.5-2 fold up-regulation of the miRNA target. In order to stress the large number of strong binding sites for a single miRNA that had been introduced into the cell, they used the term "miRNA sponge". Later, (Seitz 2009) proposed that naturally occurring counterparts of these highly expressed artificial sponges may have a biological function, and that the role of a substantial fraction of computationally identified miRNA targets may be to sequester miRNAs, preventing them from binding to their authentic targets. Such sponges had been discovered in plants, where over-expression of the long non-coding RNA IPS1 sequesters miR-399 and results in the up-regulation of the miR-399 target gene (Franco-Zorrilla 2007). To what extent similar competition and saturation effects naturally occurred in animals remained unexplored.

1.2 ceRNAs: Discovery

Figure 1.2 | Logic of the ceRNA language (adapted from Salmena, 2011).

In 2010 the Pandolfi group devised a combined computational/experimental strategy to search for potential competing endogenous RNAs (termed ceRNAs) for the tumour-suppressor gene Pten, based on the number of predicted miRNA binding sites shared with other transcripts. This computational analysis identified over a hundred protein-coding genes that shared at least 7 miRNA binding sites with Pten. These genes were considered candidate ceRNAs for Pten. For a subset of these genes (6 out of the 8 genes tested) they demonstrated a depletion of expression upon Pten knockdown via siRNA and, conversely, an up-regulation upon overexpression of the PTEN 3'UTR. Specifically, the genes Vapa and Cnot6l were confirmed as bona fide bi-directional Pten ceRNAs, as transfecting the 3'UTRs of these mRNAs intensified their miRNA sponging and led to an increase in PTEN protein abundance. Such a change in PTEN protein levels was shown to have functional significance: it antagonized PI(3)K signaling and caused growth and tumor suppression (Tay 2011).
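The computational half of such a search can be sketched as a ranking of transcripts by the number of predicted miRNAs they share with a query gene. The sketch below is only an illustration of the idea, not the pipeline used by the original authors: the function name `rank_candidate_cernas`, the placeholder genes `geneA` to `geneC` and their miRNA sets are made up, and the threshold is lowered from the 7 shared sites quoted above so that the toy input produces a result.

```python
def rank_candidate_cernas(query, targeting, min_shared=2):
    """Rank transcripts by the number of miRNAs they share with `query`.

    targeting maps a transcript name to the set of miRNAs predicted to bind
    its 3'UTR. Returns (transcript, n_shared, shared_mirnas) tuples sorted by
    decreasing n_shared, keeping only transcripts with >= min_shared.
    """
    query_mirnas = targeting[query]
    ranked = []
    for gene, mirnas in targeting.items():
        if gene == query:
            continue
        shared = query_mirnas & mirnas
        if len(shared) >= min_shared:
            ranked.append((gene, len(shared), sorted(shared)))
    ranked.sort(key=lambda row: (-row[1], row[0]))
    return ranked


# Hypothetical toy predictions, for illustration only (not real data).
toy_targeting = {
    "Pten": {"miR-17", "miR-19", "miR-21", "miR-26", "miR-214"},
    "geneA": {"miR-17", "miR-19", "miR-26", "miR-7"},
    "geneB": {"miR-21", "miR-145"},
    "geneC": {"miR-372"},
}

if __name__ == "__main__":
    for gene, n_shared, shared in rank_candidate_cernas("Pten", toy_targeting):
        print(gene, n_shared, shared)
```

A real analysis would take the miRNA-target map from a prediction resource and would count binding sites rather than distinct miRNAs; both choices are simplifications made here.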

The authors went further and extrapolated that all kinds of RNA transcripts talk to each other in a miRNA-mediated language, and proposed a "crosstalk" hypothesis: RNAs sharing multiple MREs in their 3' UTRs (or in other ncRNAs) communicate with each other and regulate their expression levels by competing for a limited pool of miRNAs (Salmena 2011). Upregulating a given RNA would lead to an increase in the total number of MREs and thereby attract miRNA binding towards it. As a result, the targeting miRNAs would be sequestered, leading to the de-repression of other mRNAs sharing the same MREs. This indirect correlation between competing targets was termed the ceRNA or crosstalk effect (Figure 1.2).

While the ceRNA hypothesis was a natural consequence of target competition and sequestration, it nevertheless made a startling claim: a new, pervasive gene regulatory network must exist due to the highly promiscuous and clustered nature of miRNA-target binding (Karreth 2011, Sumazin 2011). These papers proposed that shared miRNA target sites link dense networks of thousands of genes in a regulatory complex and, moreover, that the expression of these genes is correlated in many cancer cell types. In order to test computational predictions of ceRNAs, individual ceRNAs were either down-regulated or over-expressed and the expression levels of other ceRNAs were measured. In this manner, many new ceRNAs were discovered. In the following section, we briefly discuss some classes of transcripts that have been identified as ceRNAs.


1.2.1 Different types of endogenous ceRNAs

Figure 1.3 | Various types of validated competing endogenous RNAs (adapted from Tay & Pandolfi, 2014).

1.2.2 3'UTRs as ceRNAs

3' UTRs are critical for mRNA stability and typically contain MREs for several different miRNAs. One can also view them as ceRNAs, because 3' UTRs regulate not only the stability of their own transcripts in cis, but are also likely to attract miRNAs away from transcripts with shared MREs, thereby regulating such transcripts in trans. This suggests that mutations or changes in the abundance, structure, or length of 3' UTRs may affect their ability to sponge miRNAs. Supporting this view, alternative polyadenylation of 3' UTRs has been observed, leading to their lengthening during embryogenesis and their shortening in proliferating cells (Mayr 2009) and in cancer (Mercer 2011). These changes in the length of 3' UTRs affect the interaction of miRNAs with such transcripts and affect protein output. Moreover, due to a reduced number of MREs, 3' UTR shortening will also modify the ability of these mRNAs to compete for and sequester miRNAs, and thereby to function as ceRNAs.

1.2.3 Circular RNAs

RNAs that are covalently linked at the ends to form circles had been described in plants (Sanger 1976), but a new class of noncoding circular RNAs (circRNAs) was recently identified and characterized in mammals (> 5000). These RNAs are processed by the spliceosome in an unusual head-to-tail fashion, resulting in circular transcripts that contain multiple miRNA binding sites and act as miRNA sponges to deplete the cell of specific miRNAs, essentially alleviating repression of the mRNAs they target (Memczak 2013, Hansen 2013). These studies found that a circRNA, ciRS-7, contained >70 MREs for the miRNA miR-7 and formed complexes with AGO in a miR-7 dependent manner. smFISH then showed that circRNA-miRNA complexes localize to P-bodies, suggesting that the complexes were being sequestered from the translational machinery. circRNAs have proven to be highly effective at sequestering miRNAs compared to their linear counterparts, partly because they are almost immune to miRNA-mediated target destabilization due to their inherent resistance to RNA exonucleases. Effective "supersponge" ceRNAs have precisely such properties: resistance to degradation, high expression levels, and multiple miRNA binding sites. Further characterization of this abundant class of non-coding RNAs will be necessary to determine how universal this mechanism is for sequestering miRNAs inside cells, and to establish their ceRNA function.

1.2.4 Pseudogenes as ceRNAs

Pseudogenes, a class of non-coding RNAs, are transcribed yet possess features such as premature stop codons, deletions/insertions, or frameshift mutations that prevent them from producing functional proteins. Hence they have been considered "junk" DNA. However, they are thought to act as "perfect sponges" because they possess many of the same MREs located on their ancestral genes; for example, PTENP1 is able to change the miRNA network normally involved in the regulation of PTEN (Tay 2011, Poliseno 2010). PTENP1, the processed pseudogene of PTEN, represents the first reported example of an RNA transcript that acts as a ceRNA for PTEN. Within the coding region, the PTENP1 sequence differs from the PTEN sequence by only 18 mismatches, thus PTEN-targeting microRNAs that bind to MREs are usually PTENP1-targeting as well. (Poliseno, 2011) tested the ceRNA activity of PTENP1 in prostate cancer cells, and showed that inhibiting the common microRNAs miR-17, -19, -21, -26 and -214 de-repressed PTENP1. Conversely, PTENP1 3'UTR overexpression led to the de-repression of PTEN. Another pseudogene acting as a ceRNA is the Oct4 pseudogene, Oct4-pg4 (Wang 2013). The Oct4 pseudogene was shown to sponge away miR-145, and hence upregulate Oct4. These studies have attributed a miRNA-sponge function to pseudogenes; however, the difficulty of reliably quantifying pseudogene expression (due to the aforesaid sequence similarity) has hindered attempts to quantitatively study their ceRNA function on a large scale.

1.2.5 Long non-coding RNAs (lncRNAs) as ceRNAs

Similar to pseudogenes, long non-coding RNAs (lncRNAs) don't have any protein-coding capacity, but are found pervasively across the transcriptome (~10,000). They are good candidates to act as ceRNAs because they are peppered with miRNA binding sites and have an ability to sequester miRNAs (Chi 2009). Moreover, lncRNAs are also known to display specific expression patterns in different tissues, developmental stages, cell types and diseases, and thus have been recognized as ideal candidates to tune post-transcriptional regulation (Guttman 2012). Two such lncRNAs that have been found to act as miRNA sponges are HULC and ROR. The lncRNA HULC has been shown to act as a ceRNA: it sequesters a set of miRNAs, including miR-372, and its overexpression reduces miR-372 expression and activity in the liver cancer cell line Hep3B. This miR-372 sequestration increases the translational level of the miR-372 target gene, PRKACB (Cesana, 2011). Recently, (Wang 2013) showed that lnc-RoR competes for miR-145 binding with the well-known core pluripotency factors Oct4, Nanog and Sox2 in pluripotent embryonic stem cells and thereby protects them from miR-145 induced degradation. Interestingly, lnc-RoR was expressed at a greater level (>100 copies/cell) than its target miRNA, miR-145 (10-20 copies/cell), suggesting that it acts as a good sponge.

1.3 Modulators of crosstalk activity

The size of the crosstalk effect depends upon whether a single ceRNA perturbation changes the total miRNA target pool appreciably enough to titrate miRNAs away from other shared targets and thereby relieve their miRNA-induced repression. Recent mathematical models of miRNA gene regulation (Bosia 2013, Figliuzzi 2013, Ala 2013) have aimed to quantitatively model ceRNA crosstalk through both steady-state and kinetic descriptions for a small number of interacting miRNA-ceRNA species. The quantitative predictions of these models may not sufficiently explain the magnitudes of the endogenously measured ceRNA effect, because they model a limited number of ceRNAs and rely on free kinetic parameters (transcription, degradation and association rates) that are difficult to ascertain experimentally (Ebert 2012). However, they illustrate some useful principles of miRNA-target competition: (i) the optimal regime for ceRNA crosstalk occurs when target concentrations are close to the Kd of the miRNA-target interaction; (ii) crosstalk between targets is intensified with a greater number of shared miRNAs; (iii) highly expressed targets that form a greater proportion of a miRNA's total target pool are better senders of crosstalk; (iv) ceRNA effects will be selective and hierarchical, depending on the particular affinities and binding strengths of miRNA-target pairs (Figliuzzi 2013); (v) ceRNA effects can be indirect, i.e. if ceRNA1 shares miRNA1 with ceRNA2, and ceRNA2 also shares miRNA2 with ceRNA3, then ceRNA1 will be indirectly coupled to ceRNA3 through ceRNA2 even though they do not share any miRNAs directly with each other (Ala 2013).

Quantitative prediction of the ceRNA effect in miRNA networks critically requires knowledge of the relative concentrations of miRNAs and targets in the cell. Both of these are experimentally difficult to measure. Absolute concentrations of miRNA have been reported to range up to 120,000 copies per cell in various cell types (Bissels et al., 2009; Calabrese et al., 2007; Denzler et al., 2014; Lim et al., 2003; Mukherji et al., 2011). Estimated total target concentrations for a given miRNA vary from 500 copies per cell to over 440,000 (Denzler et al., 2014; Loeb et al., 2012; Wee et al., 2012). Estimates of target abundance are made in silico and diverge widely. Consequently, differing target pool sizes predict very different characteristics of miRNA target competition networks. Recently, (Bosson 2014) critically advanced the field by making state-of-the-art measurements of both miRNA abundance and the total abundance of miRNA-binding sites.

1.3.1 Abundance of miRNA binding sites and miRNA concentration

Firstly, to directly determine bound miRNA target sites, (Bosson 2014) relied on cross-linking and immunoprecipitation (CLIP) of the Argonaute 2 (AGO2) protein to identify AGO2-bound mRNA and consequently target-site abundance in vivo. CLIP protocols first use ultraviolet (UV) light to induce protein-RNA cross-links; the AGO2 protein is then immunoprecipitated using a specific antibody, thus bringing down both the guide miRNAs and their bound targets. These are stringently purified to remove unbound RNA, digested into short RNAs, and prepared for sequencing. By quantifying the CLIP reads at each miRNA seed site they were able to specifically and reproducibly estimate the concentration of bound targets. Secondly, they measured miRNA concentrations with a small RNA-seq assay and normalized the counts to miR-295 copies per cell quantified by northern blot. With these data, they show that for the thirty most highly expressed miRNAs in ES cells, total 6-mer/7-mer/8-mer target pools were more abundant than all miRNA concentrations. Thus any perturbation of a ceRNA for those miRNAs is unlikely to titrate them away, as binding sites are already in excess. Similarly, (Denzler 2014) reported that even for the most highly expressed miRNA, miR-122, total target binding sites exceed miRNA levels; consequently miR-122 targets are not derepressed until unphysiologically high amounts of miR-122 sponges are added. These studies, done on primary cells, have considerably diminished the possibility of an appreciable ceRNA effect that is purely stoichiometric in nature. It is important to realize that these CLIP protocols (Bosson 2014) pool together millions of cells, yielding an average binding profile which may not be reflective of dynamic conditions in single cells.

Moreover, the studies by the Pandolfi group were done in cancer cell lines, which are known to have altered miRNA concentrations. Therefore, we cannot rule out ceRNA effects in all types of cells.

1.3.2 MiRNA binding affinity

The two main factors that affect miRNA-binding affinity are the number of miRNA binding sites on a target and the free energy of miRNA-target hybridization (ΔG). Given the variation in binding affinities across targets, miRNAs will preferentially bind targets with greater affinity before spreading to lower-affinity sites. Thus the total target pool is partitioned into hierarchical affinity classes that do not compete equally. Conceptually, all binding sites of the same affinity (Kd) "see" the same concentration of free miRNA, which means that they can be grouped together. Targets with affinity much greater than the rest of the pool would act in a simple 1:1 titration regime with the miRNA. Since high-affinity target sites bind the available miRNA pool more favorably, competition can occur without expression approaching the level of the total pool of weak and strong sites combined.
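The grouping of sites into affinity classes can be made concrete with a small equilibrium calculation. The sketch below is not taken from the thesis: it assumes independent binding sites, expresses concentrations and Kd values in the same arbitrary units, and uses hypothetical numbers for the miRNA total and the two site classes. It solves the mass balance for the free miRNA by bisection and then reports the fraction of sites occupied in each class.

```python
def free_mirna(total_mirna, site_classes, iterations=200):
    """Solve total = free + sum(sites * free / (kd + free)) for the free miRNA.

    site_classes is a list of (total_sites, kd) pairs, one per affinity class.
    The left-hand side increases with `free`, so bisection on [0, total] works.
    """
    def bound(free):
        return sum(sites * free / (kd + free) for sites, kd in site_classes)

    lo, hi = 0.0, total_mirna
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid + bound(mid) > total_mirna:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def occupancy(free, kd):
    """Fraction of sites with dissociation constant kd that are miRNA-bound."""
    return free / (kd + free)


# Hypothetical example: a small high-affinity class and a large low-affinity class.
classes = [(1_000.0, 10.0), (50_000.0, 1_000.0)]
free = free_mirna(5_000.0, classes)
high = occupancy(free, 10.0)     # occupancy of the high-affinity class
low = occupancy(free, 1_000.0)   # occupancy of the low-affinity class
```

Because every class sees the same free miRNA concentration, the high-affinity class is always occupied to a larger fraction than the low-affinity class, whatever the pool sizes chosen here.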

1.3.3 MRE Accessibility and Local concentrations

Going back to a binding reaction of the form A + B ⇌ AB, one notices that the relevant concentration of each species is not the global concentration (which assumes a well-mixed cellular environment); rather, binding probabilities are determined by local concentrations. If miRNAs or mRNAs are sequestered in sub-cellular structures, local concentrations may deviate from the average by a large factor. Structures such as P-bodies or RNA granules can harbor RNAs and miRNAs in small volumes, thereby concentrating them and possibly altering the binding and unbinding of miRNAs. While these phenomena are very difficult to quantify, altered local concentrations can change the competition between miRNAs and mRNAs and enhance the size of the ceRNA effect. Essentially, rather than competing for binding with the whole target pool, miRNAs could bind much more favorably to locally available target sites. smFISH studies allow us to quantify the localization of ceRNAs, which we do in Chapter 3.

1.3.4 Post-transcriptional network effects

Figure 1.4 | Extensive co-targeting of miRNAs: many targets share miRNA binding sites (adapted from Obermayer 2014). The color of the edges indicates the number of target pairs which share a given pair of miRNAs, while the size of the nodes indicates the connectivity, i.e. the total number of shared targets for a given miRNA.

A systematic analysis of the ceRNA effect is impeded by the complexity of natural miRNA-ceRNA regulatory networks. The ceRNA effect depends both on the underlying dynamical binding parameters of miRNAs and their target RNAs and on the topology of the network.

The miRNA-RNA network is known to be highly clustered: certain miRNAs often target genes in tandem, and consequently there are strong correlations in network connectivity (Figure 1.4). An implication of the highly interconnected nature of the miRNA-RNA target network is that perturbations of gene expression can potentially propagate through the network via a cascade of co-regulated target RNAs and miRNAs that share targets (Nitzan 2014). Pairs of miRNAs which have a greater number of shared targets would therefore act as key nodes in the ceRNA network. Conversely, certain ceRNAs which are commonly targeted by a large number of miRNA species may transmit crosstalk more readily than others.
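To illustrate how a perturbation could reach RNAs that share no miRNA with the perturbed sender, the following sketch treats the miRNA-target map as a bipartite network and counts the number of shared-miRNA "hops" between a sender and every reachable RNA. The network, the names `toy`, `shared_mirna_graph` and `coupling_distance`, and the use of hop count as a proxy for indirectness are all illustrative assumptions, not the analysis used in this thesis.

```python
from collections import deque


def shared_mirna_graph(targets_of):
    """Link two RNAs whenever at least one miRNA targets both of them."""
    neighbours = {}
    for rnas in targets_of.values():
        for rna in rnas:
            neighbours.setdefault(rna, set()).update(r for r in rnas if r != rna)
    return neighbours


def coupling_distance(neighbours, sender):
    """Breadth-first search: number of shared-miRNA hops from the sender."""
    distance = {sender: 0}
    queue = deque([sender])
    while queue:
        rna = queue.popleft()
        for other in neighbours.get(rna, ()):
            if other not in distance:
                distance[other] = distance[rna] + 1
                queue.append(other)
    return distance


# Hypothetical toy network: miRNA name -> set of RNAs carrying a site for it.
toy = {
    "miRNA1": {"ceRNA1", "ceRNA2"},
    "miRNA2": {"ceRNA2", "ceRNA3"},
    "miRNA3": {"ceRNA4"},
}
hops = coupling_distance(shared_mirna_graph(toy), "ceRNA1")
```

In this toy network ceRNA3 is two hops from ceRNA1 (coupled only through ceRNA2), while ceRNA4 is unreachable and therefore absent from `hops`.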

Whether small effects caused by propagation of the ceRNA effect are biologically meaningful remains to be investigated. Similar network propagation issues affect other gene regulatory mechanisms. It has often been observed that, after a gene perturbation (e.g. of a transcription factor, miRNA, or drug target), apparently unrelated genes (off-targets) change expression, i.e. genes whose connection to the perturbed gene is not traceable.

1.4 Summary and Outline

Following the discovery of transcripts that can sequester miRNAs, thereby releasing other targets from miRNA-mediated repression, a new principle of post-transcriptional gene regulation has been proposed. This layer of gene regulation works through competition for miRNA binding between different RNAs, and thus has the capability to form a large-scale regulatory network across the transcriptome. The competing endogenous RNA (ceRNA) or RNA-RNA crosstalk hypothesis seems an attractive explanation for the functionality of non-coding RNAs and pseudogenes, and many ceRNAs, both coding and non-coding, have so far been implicated in varied biological contexts, from cancer (Fang 2013) to muscle differentiation (Cesana 2011). Nonetheless, only a handful of ceRNAs have been experimentally identified and many features of the proposed ceRNA hypothesis remain unexamined. Our aim in this thesis is to address some of the fundamental questions about the generality and magnitude of the crosstalk mechanism. In Chapter 2 we describe the results of perturbing single ceRNAs (Pten, Vapa and Cnot6l) and quantifying the effects on the transcriptome, both to extract the size of the ceRNA effect and to test the contribution of specific microRNAs. As will be seen, the ceRNA effect is bounded yet pervasive across the transcriptome. We find that, in addition to the number of shared miRNA binding sites between the perturbed ceRNA and its targets, the affinity of shared miRNA-target binding is crucial in determining the magnitude of the ceRNA effect. Chapter 3 investigates three specific ceRNAs at the single-cell level with single-molecule resolution to explore how ceRNA co-regulation plays out in single cells. Unexpectedly, we find significant co-localization of these ceRNAs, which can enhance crosstalk locally through competition, thus allowing us to revise the original hypothesis. Moreover, we find that miRNA coupling between ceRNAs is capable of buffering their individual fluctuations and producing surprising correlations in gene expression. Chapter 4 studies the role of miRNAs in dampening fluctuations in protein levels (Schmiedel et al. 2015). We find that miRNA regulation provides a significant reduction in intrinsic protein noise at low expression levels, which scales with miRNA repression, but that variability in miRNA concentrations itself propagates to target fluctuations at higher expression levels.

Chapter 2

Assessment of the ceRNA hypothesis with integrated genome-wide measurements reveals bounded yet pervasive crosstalk activity

MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that play complex roles in the post-transcriptional regulation of gene expression. Individual genes are typically regulated by many distinct miRNAs, and conversely individual miRNAs often target multiple genes, leading to complex regulatory networks (Friedman 2009) that drive a large variety of cellular processes, from differentiation and proliferation to apoptosis and cancer [Yi 2008, Sluijter 2010, Cimmino 2005]. Several recent studies have added a new facet of post-transcriptional gene regulation: one that is mediated by transcripts with shared miRNA binding sites (Salmena 2011; Tay 2011; Tay 2014). This stems from the bidirectional effects between miRNAs and their target mRNAs, where a change in one transcript might affect the expression of other transcripts by sequestering miRNAs from their shared targets and thereby relieving miRNA repression of those other targets. These transcripts, coupled by their shared miRNAs, are said to 'crosstalk' or regulate each other by competing for common miRNAs. Based upon such a target competition and sequestration mechanism, the competing endogenous RNA (ceRNA) hypothesis proposes a rich network of protein-coding-independent regulatory interactions mediated by miRNAs.

Although many individual ceRNAs have been found, fundamental questions about the magnitude of the effect remain. The experimental setup usually consists of altering the level of a particular transcript, ncRNA or 3'UTR (a 'sender'), then measuring the change in other genes ('receivers') that share miRNA response elements (MREs) with the sender, and verifying that this change in receiver expression is miRNA dependent. In this way, perturbation of senders by siRNA knockdown or UTR overexpression indicates that specific receivers move in a correlated fashion (Tay 2011, Salmena 2011): they are reduced when senders are knocked down and are de-repressed when senders are upregulated. However, such a competition mechanism faces three major limitations in accounting for the magnitude of the observed ceRNA effects. Firstly, individual miRNAs have long been thought to confer limited repression (~2-fold; Bartel 2004, Baek 2008). Secondly, given the large target abundances in a cell, any sender perturbation is thought to add or subtract only very few sites from the total target pool of a targeting miRNA (Arvey 2011), implying that the change in the repressive influence of that miRNA on individual receivers would be muted, and thus any consequent crosstalk would be small. Thirdly, mathematical models predict an optimum regime in which crosstalk might be possible, namely when the regulating miRNA and its target binding sites are at nearly equal effective concentrations (modulo the binding Kd) (Jens 2015, Bosia 2013, Figliuzzi 2013). While estimates of miRNA concentrations exist (tens to 120,000 copies per cell) [Bissels 2009, Denzler 2014], estimates of total target abundances and binding affinities are highly variable, making it difficult to assess whether genes are susceptible to crosstalk in an endogenous environment. However, a recent study of ceRNA effects for the exceptionally highly expressed liver-specific miR-122 determined that no target competition occurs in vivo because of the large relative abundance of the miRNA target pools (Denzler 2014). Thus the hypothesis remains controversial despite a variety of examples, including pseudogenes (Poliseno 2010), circRNAs (Hansen 2011), and lncRNAs (Cesana 2011), which suggest the existence of ceRNA interactions.

The logic of crosstalk, combined with the highly interconnected network of miRNA-mRNA interactions, suggests that ceRNA effects should be pervasive across the transcriptome (Sumazin 2011). Since each sender typically sequesters multiple miRNAs, which in turn have other targets, perturbing the level of one sender could potentially change the expression of hundreds of RNAs competing for shared miRNAs. Signal propagation through miRNA → ceRNA → miRNA chains could take place, affecting distant receivers (Nitzan 2014, Bosia 2013). However, no widespread ceRNA effects have been shown experimentally. Existing studies typically focus on perturbing a sender and testing only a handful of ceRNAs. For example, after computationally searching for Pten ceRNAs based upon the number of shared miRNAs, (Tay 2011) found hundreds of possible ceRNA candidates but tested only a selected few (Vapa, Cnot6l, Serinc1, Znf460) that each shared at least 7 miRNA binding sites with Pten. Consequently, it has proved difficult to ascertain whether crosstalk is restricted to a select few sender-receiver pairs with high numbers of shared miRNAs, or only to those with favourable stoichiometric [miRNA] / target pool ratios, or whether crosstalk is instead a general phenomenon.

Identifying which miRNAs are involved in transmitting crosstalk between a particular sender and a receiver is crucial to refining the ceRNA hypothesis. Current methods to identify ceRNAs rely upon computational miRNA-mRNA target predictions. In particular, they emphasize the number of shared miRNAs between a sender-receiver pair (Salmena 2011, Ala 2013). However, each miRNA-mRNA interaction is affected differently by the strength of the miRNA-mRNA binding and by the local concentration of each interacting species. Thus the ability of a specific miRNA to transmit crosstalk will be influenced by its differential sequestration by the sender and its differential repression of the receiver, and not only by the number of shared miRNA binding sites. In the case of Pten ceRNAs, the miR-17, miR-19 and miR-26 families have been validated as transmitting crosstalk, but it remains unknown whether other miRNAs are functional in the Pten ceRNA network.

2.1 Results

In our study we used RNA sequencing to quantify both the magnitude and the extent of the crosstalk effect genome-wide by directly measuring the effect of perturbing three different senders on the transcriptome. The senders we chose to knock down (Pten, Vapa and Cnot6l) share many miRNA binding sites, and were each experimentally demonstrated to be putative ceRNAs, competing for miRNAs with each other in the colon carcinoma HCT 116 cell line. Genome-wide measurement of the transcriptome by RNA sequencing after the perturbation of senders allows an assessment of key features of the ceRNA hypothesis. In particular, it permits a quantification of the magnitude of crosstalk strength for thousands of potential receivers. Our work is focused on three major questions: (a) How large is the magnitude of crosstalk in an endogenous system? Are ceRNAs restricted to a few pairs, or are they widespread when thousands of sender-receiver pairs are tested? (b) What are the characteristics of a good sender? (c) Which miRNAs are involved in transmitting crosstalk, and what characteristics make them good at transmitting it?

We used a highly simplified model of miRNA regulation of a single sender-receiver pair, aimed at quantifying the magnitude of crosstalk interactions. The model predicts that crosstalk strength is bounded by 1 and is usually much smaller for reasonable binding parameters. On evaluating the crosstalk strength transcriptome-wide in our experiments, we found that crosstalk strength is indeed bounded for each of the senders, yet it is surprisingly pervasive across the genome, including hundreds of genes at all expression levels. We uncover putative ceRNAs for each sender based on the difference in crosstalk strength between HCT 116 and HCT 116 DICER -/- colon carcinoma cells. We further characterize the influence of shared miRNAs between senders and receivers upon the crosstalk strength and determine that crosstalk strength is intensified when sender-receiver pairs share more miRNAs. Using our quantification of crosstalk strength, we estimate the power of a miRNA to transduce crosstalk for each sender, and find that there is a hierarchy of miRNA crosstalk power, i.e. miRNAs differ in their ability to affect ceRNAs. Surprisingly, we find that the miRNAs targeting Pten have the highest crosstalk power of the three senders. We suggest that the ability of a gene to be a good sender of crosstalk (like Pten) depends upon its ability to sequester miRNAs and the overall stoichiometry of its [miRNA] / target pools.

We further find that we can modulate the levels of these putative ceRNAs by transfecting a plasmid carrying endogenous Pten 3'UTR sponges into the cells at varying levels. Specifically, we find that a subset of 'robust' Pten ceRNAs are both de-repressed in a dose-dependent manner and depleted when Pten is knocked down, suggesting that Pten exists in an optimal regime for crosstalk.

2.1.1 ODE biochemical model of crosstalk predicts that crosstalk strength should be bounded by 1

The endogenous molecular environment, consisting of numerous miRNAs and targets, is complicated; any perturbation of a sender changes the target pools of many different miRNAs targeting many receivers (Figure 2.1a). To characterize the strength of the ceRNA effect, we need to answer two questions: How does a change in the sender influence the free miRNA pool? How does the corresponding change in the miRNA pool influence the receiver? We sought to understand the simplest system, consisting of one sender, one transmitting miRNA and one receiver. In the simplest mass-action titration ODE model (analogous to Buchler 2008; Mukherji 2011) of two mRNAs regulated by one miRNA, which is recycled after interacting with its target, we take into account the dynamics of the free miRNA ($\mu$), the free mRNAs (ceRNAs) for the two targets ($m_1$ and $m_2$), and the complexes of the miRNA with its targets ($m_1\mu$ and $m_2\mu$). The model's parameters are the transcription and degradation rates of $m_{1,2}$ ($v_i$ and $d^m_i$, respectively), and the association, dissociation, and degradation rates of the complexes $m_{1,2}\mu$ ($k^{on}_i$, $k^{off}_i$ and $d^{m\mu}_i$). For illustration (but this simplification can be relaxed), all the transcription, degradation and association rates are assumed equal. Considering one target as the sender and the other as the receiver of crosstalk (Figure 2.1a), we would like to know the impact of varying the transcription rate of the single sender $m_1$ on the receiver $m_2$ (the derivative of the receiver level with respect to the sender level is what we term crosstalk strength).

$$\frac{d[m_i]}{dt} = v_i - d^m_i\,[m_i] - k^{on}_i\,[m_i][\mu] + k^{off}_i\,[m_i\mu] \tag{2.1}$$

$$\frac{d[m_i\mu]}{dt} = k^{on}_i\,[m_i][\mu] - k^{off}_i\,[m_i\mu] - d^{m\mu}_i\,[m_i\mu] \tag{2.2}$$

$$\mu_T = [\mu] + [m_1\mu] + [m_2\mu] \tag{2.3}$$

where we assumed that the total miRNA concentration is a constant, $\mu_T$. We can obtain the steady-state solutions for each species:

$$K_d = \frac{k^{off} + d^{m\mu}}{k^{on}} \tag{2.4}$$

where $K_d$ is the effective dissociation constant of the miRNA-target complex, and

$$[m_i] = \frac{1}{2}\left([m_i^0] - [\mu^*] - K_d^* + \sqrt{\left([m_i^0] - [\mu^*] - K_d^*\right)^2 + 4\,[m_i^0]\,K_d^*}\right) \tag{2.5}$$

Here we defined a microRNA "target load" $\omega = m_1/K_d + m_2/K_d$, which describes the sequestration of the miRNA by the two regulated mRNAs and captures the competition between those two co-regulated genes for the same miRNA. $[m_i^0] = v_i/d^m_i$ is the steady-state mRNA level without any microRNA regulation, and the effective miRNA concentration is $[\mu^*] = (d^{m\mu}/d^m)\,[\mu_T]$. The effect of the competing mRNA can be subsumed into an apparent $K_d^* = K_d\,(1 + \omega_2)$, corrected by the miRNA target load that the other mRNA contributes, $\omega_2 = m_2/K_d$.

The quantity we are interested in, crosstalk strength, is the sensitivity of $m_2$ to $m_1$ levels, $dm_2/dm_1$. To see how it varies with sender expression, we fixed all parameters but the transcription rate of the sender. For the steady-state solution of the model, the dissociation constant of the miRNA-target complex, $K_d$, dictates the threshold at which the miRNA is bound or unbound by the sender. Most miRNAs are bound when sender levels rise above the threshold, while they become unbound below it (Figure 2.1b). The model gives the steady-state concentration of the receiver as a function of the free miRNA concentration. As free miRNA levels decrease, the receiver becomes progressively unbound (Figure 2.1c). If there are too many sender molecules, then all miRNAs are bound by the sender, leaving the receiver free, and no crosstalk is observed. If there are too many miRNAs, then both ceRNAs are bound by the miRNA, and again no crosstalk is observed. Above the threshold, miRNA repression is lost and receiver levels grow almost linearly with the transcription rate of the sender, while their variation is maximal close to the threshold. Thus crosstalk is only present in a narrow range near the threshold $K_d$, where bound miRNA-mRNA complexes are most sensitive to free miRNA concentrations. Moreover, the model illustrates that both the binding affinities and the overall stoichiometry of the system dictate whether or not there is crosstalk between ceRNAs (Figure 2.1d) [similar to Figliuzzi 2013].

The magnitude of the crosstalk strength between the sender and receiver can be shown to be the product of two factors: the response of the miRNA level to perturbations of the transcription rate of the sender, and the response of the level of the receiver to perturbations of the miRNA level (see Appendix A). The former depends upon the fraction of miRNAs bound by the sender (Sequestration; determined by $K_d$), while the latter depends upon the relative repression conferred by a miRNA upon its target (Repression; determined by the rates of degradation and association and by the relative concentration of free miRNA). As the sequestration factor is always less than 1 (miRNAs are always bound to targets other than just the sender) and the repressive effect of the miRNA on the receiver mRNA is also always less than 1, their product is also less than 1. Thus, the crosstalk strength can be shown to be bounded by 1.

$$CS_{receiver} < \mathrm{Sequestration} \times \mathrm{Repression}_{receiver} < 1 \tag{2.6}$$

In simulations of the single sender-receiver model, in which we swept the parameters over biologically reasonable values from the literature, crosstalk strength was below 0.1 in about 90% of expression states in all systems. The simulations show that crosstalk is strongest when the expression of the sender is in the sender's ultra-sensitive regime and the expression of the receiver is below the receiver's ultra-sensitive regime. Though the single sender-receiver model is perhaps too simple, it does make a testable prediction: crosstalk strength in an endogenous system should be small and generally bounded by 1. To evaluate this general prediction we used RNA sequencing to quantify both the magnitude and the extent of the crosstalk effect genome-wide for three different sender mRNAs.
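The steady state of the one-miRNA, two-target model can be reproduced numerically in a few lines. The sketch below is only an illustration under stated assumptions: both targets share the same rates (the equal-rate simplification), the rate values and the sweep are hypothetical, the function names are invented here, and crosstalk strength is taken as the log-derivative d ln(receiver) / d ln(sender), i.e. a relative change of the receiver per relative change of the sender. Setting the time derivatives in Eqs. 2.1 and 2.2 to zero gives each free mRNA as a function of the free miRNA, and the conservation law (Eq. 2.3) is then solved for the free miRNA by bisection.

```python
import math


def steady_state(v_sender, v_receiver, mu_total, kd=1.0, d_m=1.0, d_complex=1.0):
    """Steady state of the one-miRNA, two-target titration model (Eqs. 2.1-2.3).

    Both targets share kd, d_m and d_complex (equal-rate simplification).
    Returns (free sender, free receiver, free miRNA).
    """
    alpha = d_complex / d_m

    def free_mrna(v, mu):
        # Eqs. 2.1-2.2 at steady state: m = (v / d_m) / (1 + alpha * mu / kd)
        return (v / d_m) / (1.0 + alpha * mu / kd)

    def excess(mu):
        # Eq. 2.3 rewritten as mu * (1 + m1/kd + m2/kd) - mu_total = 0
        load = (free_mrna(v_sender, mu) + free_mrna(v_receiver, mu)) / kd
        return mu * (1.0 + load) - mu_total

    lo, hi = 0.0, mu_total  # excess(lo) < 0 <= excess(hi); excess is increasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    mu = 0.5 * (lo + hi)
    return free_mrna(v_sender, mu), free_mrna(v_receiver, mu), mu


def crosstalk_strength(v_sender, v_receiver, mu_total, rel_step=1e-4, **rates):
    """Central finite-difference estimate of d ln(receiver) / d ln(sender)."""
    s_lo, r_lo, _ = steady_state(v_sender * (1 - rel_step), v_receiver, mu_total, **rates)
    s_hi, r_hi, _ = steady_state(v_sender * (1 + rel_step), v_receiver, mu_total, **rates)
    return math.log(r_hi / r_lo) / math.log(s_hi / s_lo)


# Hypothetical sweep of the sender transcription rate (arbitrary units, kd = 1).
sweep = [(v, crosstalk_strength(v, 1.0, 10.0)) for v in (0.1, 1.0, 5.0, 10.0, 20.0, 100.0)]
```

With equal rates for sender and receiver, the log-derivative computed this way stays between 0 and 1 for every point of the sweep, in line with the bound argued above; relaxing the equal-rate assumption would require passing separate rates per target.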

Figure 2.1 panels: (a) endogenous situation and minimal system (sender, transmitting miRNA, receiver), posing two questions: how does a change in the sender influence the miRNA pool, and how does the corresponding change in the miRNA pool influence the receiver? (b), (c) curves plotted against [miRNA] and [sender] on logarithmic axes. (d) CS = dm_2/dm_1 = dm_2/dTL × dTL/dm_1 < 1, i.e. the microRNA-mediated change in mRNA2 upon a change of target load (TL) times the change of target load upon a change in mRNA1, plotted against [sender].

Figure 2.1 | ODE biochemical model of miRNA-mediated crosstalk predicts that crosstalk strength should be bounded. (a) Generally, RNAs (wavy lines) in an endogenous system of multiple miRNAs (circles) interacting with many targets will sequester miRNAs and produce RNA competition effects. This competition between competing endogenous RNA (ceRNA) species for their miRNAs is termed 'crosstalk' or the ceRNA effect. (b) We study a minimal model with only one 'sender', one transmitting miRNA and one 'receiver' under simple mass-action kinetics to computationally ascertain how a change in the sender influences the miRNA pool and how the corresponding change in the miRNA pool influences the receiver under reasonable biochemical binding parameters. Steady-state concentrations in the system are obtained by fixing all parameters but the transcription rate of the sender. All binding parameters are assumed equal between sender and receiver. Sender and receiver expressions are normalized by their (equal) dissociation constants. Numerical simulations of the model show that bound miRNA-target complexes are formed and free miRNA declines as more sender target sites are introduced into the system, until the sender saturates the miRNA pool. Maximal change requires a free miRNA concentration around the dissociation constant (Kd) of the sender binding sites. The inset shows the derivative of the miRNA concentration with respect to the sender concentration, which is always negative because an increase in the sender always causes an increase in the level of bound miRNA-target complex. (c) Under repression by miRNAs, the receiver levels decline upon an increase of miRNA levels until they are maximally repressed. The inset shows the derivative of the receiver concentration with respect to the miRNA concentration; it is always negative, because an increase in [miRNA] always increases repression, and its magnitude peaks around the Kd of the receiver. (d) Combining the dynamics in (b) and (c), we obtain the response of the receiver to sender levels. The receiver is sensitive to variations in the level of its competitor (the sender) via the change of the free miRNA concentration [miRNA], and is progressively derepressed as the sender starts to sequester the miRNA. Its derivative is what we refer to as the crosstalk strength (CS), i.e. the relative change in the free level of the receiver upon a relative change in the sender. The inset depicts the crosstalk strength in this model (parameter set). The crosstalk strength increases in the regime where free and bound molecules have similar concentrations. Crosstalk is bounded by 1 because it is the product of two factors that are each less than 1: the fraction of miRNAs bound by the sender and the change in repression of the receiver upon changes in its target pool.

2.1.2 Quantification of crosstalk following siRNA knockdown of sender

Previous studies of ceRNAs have focused on only one sender or on only a few targets of a miRNA, even though a perturbation in ceRNA levels that changes miRNA activity would be expected to affect many hundreds of genes. To obtain a more comprehensive view of the effects of sender knockdown in an endogenous system, we knocked down a sender using siRNA and quantified the concomitant changes in the transcriptome using RNAseq (Figure 2.2a). These experiments were performed in triplicate using siRNA pools (a combination of four independent siRNAs) which have been specifically designed to achieve strong target knockdown and minimize off-target effects. We chose to knock down Pten, Vapa and Cnot6l as each of them has previously been shown to be a strong sender of crosstalk [Tay 2011]; moreover, they are targeted by many different validated miRNA families [figure], each of which, in turn, targets many different RNAs, thus allowing us to simultaneously (i) test thousands of possible sender-receiver pairs for crosstalk, (ii) isolate the contribution of specific miRNAs in transmitting crosstalk, and (iii) test the impact of shared miRNAs on crosstalk. Any siRNA knockdown experiment has confounding direct and indirect effects that are either a) due to off-target effects of siRNA transfection or b) not mediated through competition for miRNAs but instead due to the change in sender expression itself (Pten, for example, is a key antagonist of the PI3K-AKT/PKB signalling pathway). All our siRNA knockdown experiments were therefore performed in parallel with two essential controls:

a) with negative control siRNAs (gene expression levels following the knockdown were compared to expression data collected from three replicates that were transfected with negative control siRNA)

b) in the HCT 116 DICER -/- cell line.

The DICER -/- HCT 116 cell line has a deletion in exon 5 of the gene encoding the DICER enzyme, which is crucial for the processing of mature microRNAs [Cummins 2006]; accordingly, mature microRNAs are known to be significantly depleted in these cells [Tay 2011]. We thus expect crosstalk to be significantly reduced in the DICER -/- cells for any putative ceRNA, as observed previously [Tay 2011], allowing us to use this cell line as a control to eliminate non-miRNA-mediated fold changes.

After treating the cells with siRNA, we waited 24 hours to ensure a strong knockdown, extracted RNA and prepared RNA-sequencing libraries for each of the knockdowns. We sequenced on an Illumina HiSeq 2500 at a depth of roughly 20-30 million short reads per sample. We quantified gene expression in each condition using reads per kilobase per million mapped reads (RPKM) normalization and averaged RPKM over three biological replicates. To remove variability from low-abundance RNA species, we removed genes that had zero read counts in any library and measured fold changes for each gene between the sender knockdown libraries and the negative control libraries. We achieved a direct knockdown fold change of 70-80% for each of the three senders. A representative RPKM expression scatter plot of siPten versus the negative control sample (Figure 2.2b) shows that Pten is the most strongly differentially expressed gene. We also confirmed that siRNA-mediated gene silencing is independent of DICER processing and hence fully functional in the DICER -/- cell line, as comparable knockdown fold changes for the senders were observed in the DICER -/- cells.
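The quantification steps in this paragraph can be illustrated with a short Python sketch. It is not the pipeline used in this work: the function names (`rpkm`, `mean_rpkm`), the gene lengths and the read counts are hypothetical toy values, and RPKM is taken in its standard form (reads per kilobase of transcript per million mapped reads).

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def mean_rpkm(counts, gene_length_bp, library_sizes):
    """Average RPKM of one gene over replicate libraries."""
    values = [rpkm(c, gene_length_bp, n) for c, n in zip(counts, library_sizes)]
    return sum(values) / len(values)


# Hypothetical toy data: read counts in three replicates per condition.
library_sizes = [20_000_000, 25_000_000, 30_000_000]
gene_lengths = {"geneA": 2_000, "geneB": 1_500}
control = {"geneA": [400, 500, 600], "geneB": [0, 3, 2]}
knockdown = {"geneA": [80, 100, 120], "geneB": [1, 2, 4]}

# Drop genes with zero reads in any library, then compute the relative
# change in mean RPKM between the negative control and the knockdown.
knockdown_fraction = {}
for gene in control:
    if min(control[gene] + knockdown[gene]) == 0:
        continue  # gene has a zero count in at least one library
    ctrl = mean_rpkm(control[gene], gene_lengths[gene], library_sizes)
    kd = mean_rpkm(knockdown[gene], gene_lengths[gene], library_sizes)
    knockdown_fraction[gene] = (ctrl - kd) / ctrl
```

In this toy example `geneB` is discarded because one control replicate has no reads, which mirrors the zero-count filter described in the text.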

[Figure 2.2: (a) experimental schematic: HCT 116 and DICER -/- HCT 116 cells transfected with si-PTEN, si-VAPA or si-CNOT6L, 3 biological replicates, RNA extracted for RNAseq after 24 h to test thousands of sender-receiver pairs; (b) scatter plots of gene expression against the si-neg. control, in log2 (RPKM) and in RPKM; (c) volcano plots with x-axes PTEN Crosstalk Strength, VAPA Crosstalk Strength and CNOT6L Crosstalk Strength.]

Figure 2.2 | siRNA knockdown of 3 different endogenous senders shows crosstalk strength is bounded by 1. (a) Experimental system for quantifying crosstalk strength genome-wide upon siRNA knockdown of either Pten, Vapa or Cnot6l in HCT116 and miRNA-deficient DICER -/- HCT 116 cells. Each cell line was transfected with sender siRNA and negative control siRNA in parallel and their RNA was extracted after 24 hours. For each sample, RNAseq libraries were created and transcript expression was quantified by sequencing. All RNAseq experiments were performed with 3 biological replicates. (b) RNAseq mean expression (in units of log2 RPKM) scatter plot for the Pten knockdown and negative control in HCT 116 cells. Each dot represents the mean expression of one gene; all genes expressed at greater than 0.1 RPKM in the two libraries are shown (n=13,700 genes). The direct fold change in Pten (shown in green) due to the si-Pten knockdown was 80%. Crosstalk strength for each receiver gene is defined as its fold change normalized to the fold change of Pten (sender). Genes below the diagonal (purple line) have positive crosstalk strength as they are reduced upon Pten knockdown. The right panel is a zoomed-in version to highlight changes in genes with expression similar to Pten. The magnitude of a gene's crosstalk strength can be estimated as its distance from the diagonal relative to Pten's distance from the diagonal. Genes marked in light blue have a lower crosstalk strength than those marked in dark blue. Most genes fall along the diagonal and show no change in expression, i.e. no crosstalk. In contrast, the previously known Pten ceRNAs Cnot6l and Vapa both have positive CS and are marked in red for comparison. Expression is in units of RPKM. (c) Volcano plot of Crosstalk Strength versus P-value in each of the sender knockdowns. Crosstalk strength is bounded by 1 (dotted green line) but can have larger negative values. CS=1 for each of the senders (black dots) by construction. P-values are adjusted for multiple comparisons by the Benjamini and Hochberg false discovery rate (FDR) method with α = 0.05.

2.1.3 Pervasive yet bounded mRNA Crosstalk upon siRNA knockdown

Different receivers will in general respond differently to a change in sender levels, depending on exactly which miRNAs are sequestered by the sender and on the repressive effect of those miRNAs, and thus can exhibit more or less crosstalk. We wished to quantify the crosstalk strength between each sender and all its potential receivers for each of the 3 siRNA knockdown RNAseq datasets in the HCT and DICER cell lines. We defined the 'crosstalk strength' (CS) of a receiver with respect to a sender, in a given cell line and condition, as the ratio of the relative change in receiver levels after the sender knockdown to the relative change in sender levels after its siRNA knockdown. For example, for HCT 116 cells with Pten as the sender, for a receiver gene X we compare its mean expression in the negative control replicates (termed 'HN') to its mean expression in the siPten biological replicates (termed 'HP'):

$$CS^{\mathrm{cells=HCT,\ receiver}=X}_{\mathrm{sender=Pten}} = \frac{\text{fold change of gene } X \text{ in HN over HP}}{\text{fold change of Pten in HN over HP}} = \frac{(X_{HN} - X_{HP})/X_{HN}}{(Pten_{HN} - Pten_{HP})/Pten_{HN}}$$

This means that when the crosstalk strength is 0.1 and the sender levels are reduced by 80%, the receiver levels will be reduced by 8% through crosstalk. The crosstalk strength, so defined, depends on the relative direction (sign) of the fold change. Genes with positive crosstalk strength are depressed when the sender is knocked down, i.e. they co-vary with the sender as implied by the ceRNA hypothesis. Genes that are upregulated on sender knockdown have negative crosstalk strength and should not be considered putative ceRNAs.
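As a worked illustration of this definition, the following Python sketch computes the crosstalk strength from mean expression values. The function names and the expression values are hypothetical; only the formula follows the definition in the text.

```python
def relative_change(control, knockdown):
    """Fractional reduction of expression relative to the negative control."""
    return (control - knockdown) / control


def crosstalk_strength(receiver_ctrl, receiver_kd, sender_ctrl, sender_kd):
    """Relative change of the receiver divided by that of the sender."""
    return (relative_change(receiver_ctrl, receiver_kd)
            / relative_change(sender_ctrl, sender_kd))


# Hypothetical mean RPKM values: the sender drops by 80% (10 -> 2) and the
# receiver drops by 8% (50 -> 46), which corresponds to a CS of about 0.1.
cs_down = crosstalk_strength(50.0, 46.0, 10.0, 2.0)

# A receiver that is upregulated after sender knockdown has a negative CS.
cs_up = crosstalk_strength(50.0, 60.0, 10.0, 2.0)

# The sender itself has CS = 1 by construction.
cs_sender = crosstalk_strength(10.0, 2.0, 10.0, 2.0)
```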

We calculated the crosstalk strength for all 13,700 expressed genes in each of the sender libraries as described above and examined its distribution in the HCT 116 and DICER cell lines. Most genes showed no expression change on knocking down the Pten, Vapa or Cnot6l senders, so the CS distribution was centered around zero in both HCT and DICER. As expected from the reduction of miRNA activity in DICER -/- cells, the distribution of CS in DICER was substantially shifted towards smaller values than in HCT (Figure 2.3). Strikingly, however, crosstalk strength in all conditions was bounded by +1: almost no genes were down-regulated more than the sender itself, i.e. receiver gene expression fold changes were smaller than the 70-80% fold change of the sender (Figure 2.2c). Hundreds of genes had statistically significant (p<0.05) CS between 0.1 and 0.5, but relatively few had greater crosstalk strength that was also significant. We obtained the statistical significance of each gene's crosstalk strength by calculating z-values from our replicates and using the Benjamini-Hochberg method to adjust p-values for multiple comparison testing. Interestingly, genes with negative crosstalk strengths had comparatively higher p-values (more replicate variability), indicating that they tended to be expressed at lower levels. Taken together, our prediction that crosstalk strength should be bounded was supported by the genome-wide expression data.
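For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned in this paragraph, a minimal pure-Python sketch of the standard step-up procedure is given below. The function name and the example p-values are hypothetical, and the sketch does not reproduce how the z-values were derived from the replicates, which is not specified here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # remain monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```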

[Figure 2.3: probability density of crosstalk strength in HCT and DICER cells; x-axes: PTEN Crosstalk Strength, VAPA Crosstalk Strength and CNOT6L Crosstalk Strength.]

Figure 2.3 | Crosstalk is miRNA-mediated and pervasive on a genome-wide scale. (a) Probability density of the crosstalk strength distribution in both HCT (black) and DICER (red) for all genes expressed above 0.1 RPKM in each of the 3 senders. Observed crosstalk strength in all of the knockdowns is always less than 1. CS is higher in HCT cells compared to DICER for many genes, and more genes have negative CS in DICER. The inset shows the same distributions but with the number of genes whose CS HCT > CS DICER calculated for each of the 30 bins across HCT CS. This indicates that hundreds of genes exhibit miRNA-mediated crosstalk across the genome for each of the three senders with low-moderate crosstalk strength (0.1

To determine whether these widespread positively crosstalking genes were indeed miRNA-mediated, we selected only those genes whose crosstalk strength in HCT116 was greater than that in DICER. We found such genes across a range of crosstalk strengths, from low (n=440 Pten ceRNAs with CS=0.1) to high (n=65 Pten ceRNAs with CS=0.5) (Figure 2.3), indicating that putative ceRNAs were found pervasively across the transcriptome. In addition, on examining the expression range of these putative ceRNAs, we found that they were expressed across 3 orders of magnitude. These included some previously discovered ceRNAs (Cnot6l, Serinc1, Vapa, Zeb2) but also hundreds of novel ceRNAs (Figure ??). A GO-term analysis for putative Pten ceRNAs showed significant enrichment for a range of biological processes including "protein phosphorylation" and "regulation of phosphate metabolic process" (Table 2.3), which are also GO terms linked to the functional role of Pten, which acts as a tumor suppressor through the function of its phosphatase protein product.
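The selection rule for putative ceRNAs (crosstalk strength in HCT116 greater than in DICER) can be sketched as a simple filter. The function name, the optional `min_cs` threshold and the toy values below are hypothetical and serve only to illustrate the rule.

```python
def putative_cernas(cs_hct, cs_dicer, min_cs=0.0):
    """Genes whose CS in HCT exceeds their CS in DICER.

    `min_cs` is a hypothetical extra threshold on the HCT crosstalk
    strength, used here only to look at stronger candidates.
    """
    selected = []
    for gene, hct_value in cs_hct.items():
        dicer_value = cs_dicer.get(gene)
        if dicer_value is None:
            continue  # gene not quantified in the DICER libraries
        if hct_value > dicer_value and hct_value >= min_cs:
            selected.append(gene)
    return sorted(selected)


# Hypothetical crosstalk strengths for four genes in the two cell lines.
cs_hct = {"geneA": 0.45, "geneB": 0.12, "geneC": -0.30, "geneD": 0.20}
cs_dicer = {"geneA": 0.05, "geneB": 0.15, "geneC": -0.10, "geneD": 0.02}
candidates = putative_cernas(cs_hct, cs_dicer, min_cs=0.1)
```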

To assess whether these putative ceRNAs were actually responding to changes in miRNA levels due to depletion of the sender, we performed a miRNA enrichment analysis. miRNA-mediated crosstalk would require that these putative ceRNAs are enriched in binding sites for the miRNAs that target their particular senders. Indeed, we found many sender-targeting miRNAs that are enriched in their respective putative ceRNA lists (Table 2.1). These include miRNA families (miR-17, miR-19, miR-93, miR-26) previously implicated in Pten ceRNA networks [Poliseno 2010]. Intriguingly, we also found statistically significant miRNA enrichment in these ceRNA sets for miRNAs that are not known to have binding sites on the sender, suggesting that ceRNA effects can propagate via the interconnected miRNA-target networks.

2.1.4 Crosstalk strength correlates with the number of shared binding sites

We reasoned intuitively that if miRNAs of different families are sequestered by a sender, then each miRNA released upon sender knockdown would repress its targets independently, thus amplifying any crosstalk between the sender and receivers that share binding sites.

Indeed, our model, along with others [Ala 2013], suggests that crosstalk depends on the overlap of miRNA-binding sites between senders and receivers. Specifically, it increases with the number of shared MREs. To test this hypothesis, we first tested a weaker claim: genes that share multiple miRNA binding sites with the sender should have greater crosstalk strength than the set of all genes. The second, stronger claim we tested was: the more miRNA binding sites a receiver shares with the sender, the greater its crosstalk strength should be.

We tested the weaker claim by ranking genes exclusively by the number of miRNAs they share with the sender (independently for Pten, Vapa and Cnot6l). We counted all the TargetScan-predicted binding sites shared between any given mRNA and the sender, and then ranked the genes by the number of shared binding sites. We thus obtained a list of the top 500 "shared miRNA predicted ceRNAs" for each of Pten, Vapa and Cnot6l. We then compared the CS of these genes in HCT to that in DICER cells, and found that their HCT CS is significantly greater than their DICER CS for Pten and Vapa but not for Cnot6l (Figure 2.5a). This suggests that our measurement of crosstalk strength was miRNA-dependent and supports the hypothesized correlation between shared binding sites and crosstalk strength. To rule out any systematic CS bias in HCT versus DICER, we also checked the CS distribution within the three HCT 116 sender libraries. We found that the HCT crosstalk strengths of these "top 500 shared miRNA predicted ceRNAs" were significantly greater than those of the control set (consisting of all genes) for Pten and Vapa but not for Cnot6l (Figure 2.5b).
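The ranking and the one-sided distribution comparison described in this paragraph can be sketched as follows. This is an illustration under stated assumptions: the function names and toy data are hypothetical, and only the one-sided two-sample Kolmogorov-Smirnov statistic is computed (no p-value), using empirical cumulative distribution functions.

```python
from bisect import bisect_right


def top_shared(shared_sites, n):
    """Genes ranked by the number of binding sites shared with the sender."""
    ranked = sorted(shared_sites, key=lambda g: shared_sites[g], reverse=True)
    return ranked[:n]


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_shift_statistic(sample, reference):
    """One-sided KS statistic, max over x of F_reference(x) - F_sample(x).

    It is large when `sample` is shifted towards larger values than
    `reference`, and never negative.
    """
    s, r = sorted(sample), sorted(reference)
    return max(ecdf(r, x) - ecdf(s, x) for x in s + r)


# Hypothetical shared-site counts and crosstalk strengths.
shared_sites = {"geneA": 9, "geneB": 7, "geneC": 2, "geneD": 0}
top2 = top_shared(shared_sites, 2)

cs_top_hct = [0.3, 0.4, 0.5]
cs_top_dicer = [0.0, 0.1, 0.2]
d_plus = ks_shift_statistic(cs_top_hct, cs_top_dicer)
```

In practice a library routine for the two-sample K-S test would also return a p-value; the statistic alone is shown here to keep the sketch dependency-free.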

We caution that not all of these computationally predicted genes that share miRNA binding sites with a sender have positive CS. For example, 155 of the "top 500" genes that share more than 3 miRNA binding sites with Pten have negative crosstalk strength. Computational predictions of ceRNAs therefore have to be supplemented by experimental tests, owing to the high number of false positives in TargetScan binding-site predictions.

Because the second claim is more quantitative than a simple comparison, we wished to remove contamination from non-ceRNAs and required that our basic condition, HCT CS > DICER CS, be met. To further increase stringency, we took this list of candidate ceRNAs and required that they share at least four miRNA-binding sites with the sender. For Pten we found 858 genes and for Vapa 610 genes.

We then binned these receiver genes into quintiles of the number of shared miRNAs. For each of these quintile gene sets, we computed the median CS independently for each sender.

Consistent with the model, the crosstalk strength was shifted to higher levels in receivers that share more miRNA binding sites with the sender (Figure 2.5c). The greater CS of Pten may indicate a greater propensity to sequester miRNAs or a greater affinity for miRNAs (see discussion). These results confirmed that shared miRNA binding sites play a significant role in transmitting crosstalk between a sender and a receiver.
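The quintile analysis can be sketched in Python as below. The function name and toy data are hypothetical, and the bins are formed by rank (equal-sized groups after sorting by shared-site count), which is one possible reading of "quintiles" and is an assumption of this sketch.

```python
from statistics import median


def median_cs_by_quantile(shared_sites, cs, n_bins=5):
    """Median crosstalk strength in rank-based bins of shared-site count.

    Assumes at least `n_bins` genes, so that no bin is empty.
    """
    genes = sorted(shared_sites, key=lambda g: shared_sites[g])
    medians = []
    for b in range(n_bins):
        start = b * len(genes) // n_bins
        stop = (b + 1) * len(genes) // n_bins
        medians.append(median(cs[g] for g in genes[start:stop]))
    return medians


# Hypothetical receivers sharing 4 to 13 binding sites with the sender,
# with a crosstalk strength that grows with the number of shared sites.
shared_sites = {f"gene{i}": i for i in range(4, 14)}
cs = {gene: 0.01 * n for gene, n in shared_sites.items()}
medians = median_cs_by_quantile(shared_sites, cs)
```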

With this analysis, we found that Cnot6l shows no evidence of a dependence of crosstalk strength on the number of shared miRNAs (Figure 2.4): (i) the genes that share more than 8 miRNA binding sites with Cnot6l have lower crosstalk strength than those that share no miRNA binding sites; (ii) there is no increase in crosstalk strength for genes binned by the # of shared miRNAs with Cnot6l.

Figure 2.4 | Crosstalk strength distribution in HCT cells of "top 500" genes that share greater than 7 miRNA binding sites with CNOT6L (black) compared with the CS distribution for all genes (gray). The median crosstalk strength is smaller for "top 500" genes than that for all genes, indicating that CNOT6L crosstalk is not dependent on the number of shared miRNA.

[Figure 2.5: panels (a) and (b), distributions of PTEN and VAPA Crosstalk Strength in HCT 116 and DICER cells and for "all genes" versus "top 500 shared miRNA" genes; panel (c), median crosstalk strength versus Bins (# of Shared miRNA).]

Figure 2.5 | Crosstalk strength of receivers correlates with the predicted number of miRNA binding sites shared with the sender. (a) and (b) Crosstalk strength is higher for receivers that have the largest number of predicted miRNA binding sites in common with their respective senders, both between HCT116 and DICER and within HCT116 cells. (a) Cumulative distributions of crosstalk strengths with respect to each sender for receivers that share the most binding sites with the sender. The crosstalk strength distributions for this set of genes are shown in HCT116 and DICER ("top 500 shared miRNAs" indicates the ranked list of genes sharing at least 4 binding sites with the sender; see text). These genes show a significant increase in CS_HCT116 compared to CS_DICER; p < 10^-9 for the difference between the distributions was calculated by the one-sided Kolmogorov-Smirnov (K-S) test. (b) Same as above, but the crosstalk strength distributions with respect to each sender are for all genes in HCT116 and the set of "top 500 shared miRNA" genes, also in HCT116. These genes show a significant increase in CS compared to the 'all genes' background set, p < 10^-5 and p < 10^-4 for CS_Pten and CS_Vapa respectively (K-S test). (c) Genes that share the most binding sites with the respective senders were grouped into bins based on their # of shared binding sites (colored; the # of shared binding sites is indicated on the x-axis). Only those receivers with CS_HCT116 > CS_DICER were selected. The median crosstalk strength in each bin is reported (for each sender). The distribution of CS for each bin was significantly different from the preceding bin, with all p-values less than 10^-3 (K-S test). Each bin had at least 90 genes.

2.1.5 miRNAs hierarchically contribute to transmitting crosstalk

With these quantitative genome-wide measurements of crosstalk strength, we next turned to measuring the ability of a miRNA to transmit crosstalk. Given that different miRNAs vary in their concentrations, binding affinities and target abundances, all of which modulate their ability to transduce crosstalk, we wished to dissect their individual contributions.

To determine which miRNAs were involved in mediating crosstalk, and to what extent, we developed a metric to quantify the bulk effect of sender knockdown on the predicted targets of a miRNA. Rather than evaluating the crosstalk strength for a particular target of a miRNA, our metric characterizes the cumulative, concordant variation of all of its target genes. Specifically, for each miRNA, we calculated the difference between the median CS of its targets (genes that contain a predicted binding site for that miRNA) and its non-targets (genes that don't contain a binding site for that miRNA):

$$\text{CT power}_{\text{miRNA}_i} = \operatorname{median}\left(CS_{\text{targets of miRNA}_i}\right) - \operatorname{median}\left(CS_{\text{non-targets of miRNA}_i}\right) \qquad (2.7)$$

We term this shift in the CS distribution for targets versus non-targets the Crosstalk Power (CT power) of each miRNA (Figure 2.6a). Note that we do not require the crosstalk strength of any particular gene to be statistically significant, as we are interested in the cumulative effect of a miRNA on all its targets. Reassuringly, the crosstalk power of conserved miRNAs that have known binding sites on Pten and Vapa (again, not Cnot6l) is greater than the crosstalk power of miRNAs that are not predicted to have binding sites on these senders (Figure 2.6c). This suggests that crosstalk power can be used to discriminate between sender-targeting and non-sender-targeting miRNAs. The crosstalk power of the sender-targeting miRNA families is shown in Figure 2.6b. We found that miRNAs differ considerably in their ability to transmit crosstalk, as exemplified by miR-374ab and miR-875, which emerged as the miRNAs with the greatest crosstalk power for Pten and Vapa respectively. Strikingly, almost all Pten-targeting miRNAs have positive CT power, including many miRNAs that have greater crosstalk power than miR-17, miR-19, miR-20a and miR-26a, which have previously been shown to directly mediate crosstalk for Pten [Poliseno 2010].
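The crosstalk power metric of equation (2.7) is straightforward to express in code. In the sketch below the function name, gene names and crosstalk strengths are hypothetical; the computation itself is the difference of medians between the predicted targets and non-targets of one miRNA.

```python
from statistics import median


def crosstalk_power(cs, targets):
    """Median CS of a miRNA's targets minus median CS of its non-targets.

    `cs` maps gene -> crosstalk strength for one sender; `targets` is the
    set of genes with a predicted binding site for the miRNA.
    """
    target_cs = [value for gene, value in cs.items() if gene in targets]
    non_target_cs = [value for gene, value in cs.items() if gene not in targets]
    return median(target_cs) - median(non_target_cs)


# Hypothetical crosstalk strengths for six genes and one miRNA's targets.
cs = {"g1": 0.3, "g2": 0.2, "g3": 0.4, "g4": 0.0, "g5": 0.1, "g6": -0.1}
targets = {"g1", "g2", "g3"}
ct_power = crosstalk_power(cs, targets)
```

Because medians are used, a handful of strongly responding targets does not dominate the metric; it reflects a shift of the whole target distribution.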

Pten therefore has the ability to transmit crosstalk by sequestering many different miRNAs, allowing it to interact promiscuously with ceRNAs. In general, however, not all miRNAs that are predicted to have binding sites on the senders necessarily have a positive crosstalk power; nor do all miRNAs with positive crosstalk power necessarily have binding sites on the senders. For example, we uncovered 92 different miRNAs with positive crosstalk power for Pten and 67 different miRNAs with positive crosstalk power for Vapa that do not have any predicted binding sites on the respective genes.

One factor that our model suggests can influence the ability of a miRNA to transduce crosstalk for a sender is the cumulative number of its binding sites sequestered by that sender. A more effective sender of crosstalk would sequester many miRNAs (higher Kd) but would be only weakly repressed by them, enabling a greater contribution to free miRNA pools when the sender is perturbed. However, it is very difficult to quantitate miRNA sequestration on mRNAs experimentally. We therefore estimated the sequestration fraction bioinformatically for each of the miRNAs which target Pten (and similarly for Vapa and Cnot6l).

To do so, we calculated the ratio of the number of predicted TargetScan binding sites on Pten (scaled by Pten's expression) to the number of predicted TargetScan binding sites for that miRNA on all its other targets (scaled by their expression). This ratio quantifies, for each miRNA, the fraction potentially sequestered by Pten. As expected from the model, we find that miRNA crosstalk power for each sender is correlated with the fraction of miRNA binding sites sequestered by the sender (Figure 2.6d). Notably, those miRNAs which are sequestered by Pten the most also tend to have the greatest crosstalk power (miR-374ab, miR-410).
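The sequestration estimate described in this paragraph can be sketched as a ratio of expression-weighted site counts. The function name, the site counts and the expression values are hypothetical; the sketch follows the verbal definition (sites on the sender scaled by its expression, divided by sites on all other targets scaled by their expression).

```python
def sequestration_ratio(sites, expression, sender):
    """Expression-weighted sites on the sender over those on other targets.

    `sites` maps each predicted target of one miRNA to its number of
    predicted binding sites; `expression` maps genes to expression (RPKM).
    """
    sender_load = sites[sender] * expression[sender]
    other_load = sum(
        count * expression[gene]
        for gene, count in sites.items()
        if gene != sender
    )
    return sender_load / other_load


# Hypothetical miRNA with two sites on Pten and sites on two other targets.
sites = {"Pten": 2, "geneA": 1, "geneB": 3}
expression = {"Pten": 10.0, "geneA": 50.0, "geneB": 50.0}
ratio = sequestration_ratio(sites, expression, "Pten")
```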

[Figure 2.6: (a) histogram of PTEN Crosstalk Strength for let-7 targets and non-targets; (b) PTEN and VAPA miRNA CT power; (c) cumulative distributions of miRNA Crosstalk power for sender-targeting and background (bgr) miRNAs; (d) miRNA Crosstalk power versus miRNA sequestration %, with rho = 0.37 for PTEN miRNAs and rho = 0.35 for VAPA miRNAs.]

Figure 2.6 | Dissecting relative contributions of miRNAs in transmitting crosstalk. (a) Histogram of Pten crosstalk strength in HCT116 for predicted TargetScan targets of let-7 (red) and its non-targets (gray). The bulk contribution of let-7 in transmitting Pten crosstalk to all of its targets can be estimated by the difference in the medians of the two distributions. We defined this difference as the "Crosstalk power" (CT power) of the miRNA let-7 for the sender Pten. Crosstalk power can similarly be calculated for all 153 conserved miRNA families expressed in HCT cells from each of the sender crosstalk strength distributions. CT power is larger for those miRNAs whose targets suffer a large overall repression when the sender is knocked down. Only those genes with CS HCT > CS DICER were considered. (b) miRNA CT power for all the miRNAs which target Pten and Vapa shows the differential ability of sender-targeting miRNAs to transmit crosstalk. miRNAs with negative CT power are those whose targets tend to be up-regulated when the sender is knocked down, and are thus unlikely to be involved in the ceRNA effect. miRNAs which have shared binding sites in all three senders are in bold. See [table] for miRNA CT power and p-values for all 153 miRNA families. (c) Cumulative distributions of miRNA crosstalk power for all the miRNAs which target the sender (red) compared to the miRNAs which don't target the sender (black). (d) miRNA CT power for sender-targeting miRNAs is correlated with the miRNA's fraction of binding sites on the sender, i.e. its sequestration fraction. miRNAs with higher CT power also tend to be (relatively) highly sequestered by the sender.


2.1.6 Pten miRNAs have the greatest crosstalk power due to high [miRNA]:Target abundance ratios

A recent study using Argonaute CLIP assays [Bosson 2014] has shown that a higher miRNA:Target ratio is correlated with higher Argonaute binding on genes and, consequently, greater miRNA repression. It has been experimentally demonstrated that only the most abundant miRNAs confer significant repression, suggesting that ceRNAs are free from miRNAs when those miRNAs are present at low concentrations. Conversely, previous analyses of miRNA repression showed that miRNAs with lower miRNA:Target abundance ratios deliver minimal repression [Garcia 2011; Arvey 2010]. A possible explanation is that lowly expressed miRNAs have a low probability of finding their target sites on transcripts, because miRNA-target encounter occurs by mass action. Additionally, when microRNAs that are expressed at a low level have hundreds of different targets (i.e. have high target abundance), a single miRNA would have a limited repressive impact on any one gene.

We sought to investigate differences in the relative miRNA and target levels for our three senders. We first obtained miRNA expression profiles in HCT116 from a miRNA microarray [Yan 2011] and found that the average expression of miRNAs that target Pten was greater than that of Vapa- or Cnot6l-targeting miRNAs. Surprisingly, even though Cnot6l has many more predicted miRNA binding sites than Pten (44 and 24 respectively; Figure 2.7a), the average expression of Pten-targeting miRNAs is almost four times greater than the average expression of Cnot6l-targeting miRNAs (Figure 2.7c).

We next estimated Target Abundances (TA) for each miRNA by summing the predicted 6-mer, 7-mer and 8-mer binding sites on each of its targets, scaled by the RPKM expression of that target in HCT116 in our data [following Bosson 2014]. We averaged the target abundance over the miRNAs targeting each of Pten, Vapa and Cnot6l. Interestingly, we found the opposite hierarchy among the 3 senders: Pten had the lowest TA while Cnot6l had an average TA about 10-fold higher (Figure 2.7b). Thus, Pten has the greatest [miRNA]:TA ratio of the three, leading us to hypothesize that miRNAs targeting Pten might confer greater repression on their targets than those targeting Vapa and Cnot6l, rendering Pten ceRNAs more susceptible to crosstalk.
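The target abundance and the [miRNA]:TA ratio can be illustrated with the sketch below. All names and numbers are hypothetical: `site_counts` stands for the summed 6-mer, 7-mer and 8-mer predicted sites per target, and the miRNA expression values are in arbitrary relative units, so only comparisons between senders are meaningful.

```python
def target_abundance(site_counts, expression):
    """Sum of predicted sites on each target, weighted by target expression."""
    return sum(count * expression[gene] for gene, count in site_counts.items())


def mean_mirna_to_ta_ratio(mirna_expression, abundances):
    """Average [miRNA]:TA ratio over a set of sender-targeting miRNAs."""
    ratios = [mirna_expression[m] / abundances[m] for m in mirna_expression]
    return sum(ratios) / len(ratios)


# Hypothetical predicted sites per target for two miRNAs, and target RPKM.
sites = {
    "miR-x": {"geneA": 2, "geneB": 1},
    "miR-y": {"geneA": 1, "geneC": 4},
}
expression = {"geneA": 10.0, "geneB": 30.0, "geneC": 5.0}
abundances = {m: target_abundance(s, expression) for m, s in sites.items()}

# Hypothetical relative miRNA expression (e.g. microarray units).
mirna_expression = {"miR-x": 100.0, "miR-y": 15.0}
mean_ratio = mean_mirna_to_ta_ratio(mirna_expression, abundances)
```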

[Figure 2.7: panels (a)-(e) comparing PTEN, VAPA and CNOT6L; recoverable labels: Avg Target abundance, Avg Expression of targeting miRNA, [miRNA], Target Abundance; legend: Targeting miRNA, Background miRNA.]

Figure 2.7 | Greater miRNA:Target ratios underlie Pten's superior ability to send crosstalk. (a) TargetScan-based prediction of the number of different miRNA families targeting each of Pten, Vapa and Cnot6l. (b) Average target abundance of sender-targeting miRNAs (in log10 units). Target abundance (TA) for each of the 153 human conserved miRNAs was calculated (see methods) by summing the predicted 6-mer, 7-mer and 8-mer binding sites on each of its targets, scaled by the target's expression in HCT116. (c) Average miRNA expression for each set of sender-targeting miRNAs. miRNA expression in HCT116 cells is from a miRNA microarray dataset [Yan 2011] and is in relative units. Pten is targeted by highly expressed miRNAs compared to Cnot6l. (d) Median crosstalk power for all miRNAs which target Pten, Vapa and Cnot6l respectively. Pten miRNAs have greater crosstalk power. Crosstalk powers for each miRNA (for each sender) were calculated from the crosstalk strength distribution as in the text. (e) miRNAs with greater crosstalk power also have a higher [miRNA]:Target ratio, as exemplified by Pten, which has the greatest [miRNA]:Target ratio and miRNA crosstalk power of the three senders. Targeting miRNAs (red) are all those miRNAs which target the sender. Background miRNAs (white) are those miRNAs that don't have predicted binding sites on the sender.

We used miRNA crosstalk power as a proxy for repression and ranked the senders by our experimentally determined crosstalk power for each miRNA. Pten clearly emerged as the best sender of crosstalk: its miRNAs had much greater crosstalk power than those of the other two senders, Vapa and Cnot6l (Figure 2.7d), with about twice the median crosstalk power of Vapa's miRNAs. In fact, for each sender, we observed that the miRNAs which target it had both greater [miRNA]:TA and greater crosstalk power on average than the background set of miRNAs that did not target it (Figure 2.7e).

Thus, we conclude that senders such as Cnot6l, which are targeted largely by low-abundance miRNAs with comparatively more targets, have a much weaker ability to transmit crosstalk than a sender such as Pten. However, we are only making a comparative claim between the senders: as both the miRNA expression data and the target abundance estimates are in non-absolute concentrations, we cannot be certain whether miRNA concentrations are in excess of the target pool size or vice versa. Moreover, we observed no correlation between the [miRNA]:TA ratios and the miRNA crosstalk power within the Pten miRNAs alone, only an overall correspondence between the median miRNA crosstalk power and the average [miRNA]:TA of the three senders. We found individual miRNAs that had high Pten miRNA crosstalk power but a low [miRNA]:TA, and vice versa.

2.1.7 Transfecting Pten UTR as a sponge de-represses putative ceRNAs in a dose-dependent and miRNA-dependent manner

As Pten emerged as the strongest sender of crosstalk in the siRNA knockdown experiments, we wanted to exclude the possibility that transcriptional regulation via PTEN protein, a tumour suppressor and a key member of the PI3K/AKT-mTOR pathway, may have been a factor in the widespread crosstalk changes observed. We sought to clarify two questions: a) whether miRNA binding sites on the Pten 3'UTR were directly responsible for the crosstalk effects, and b) to what extent Pten ceRNAs would be de-repressed by modulating the amount of Pten 3'UTR, i.e. by varying the levels of MREs with an endogenous 3'UTR.

[Figure 2.8: (a) pTRE-Tight reporter constructs carrying the PTEN 3'UTR or a NULL 3'UTR; (b) flow cytometry of transfected cells, 20% transfection efficiency; (c) ZsGreen versus log10 mCherry [a.u.] for the PTEN 3'UTR and NULL 3'UTR transfections; (d) RT-PCR expression of Pten ceRNAs.]

Figure 2.8 | Derepression of Pten ceRNAs is detected upon modulating the levels of Pten 3'UTR with a transiently transfected synthetic reporter construct. (a) A synthetic two-color reporter construct for measuring the effect of Pten 3'UTR sponging in single cells. The construct consists of a bidirectional tetracycline-responsive promoter that drives the transcription of two fluorescent reporter proteins: ZsGreen and mCherry. We fused the Pten 3'UTR to ZsGreen, and the unmodified plasmid is used as a control (NULL 3'UTR). (b) Flow cytometry measurement of HCT116 cells transiently transfected with the Pten 3'UTR sponge plasmid and induced with doxycycline for 18h (cells positive for plasmid are in purple) indicates robust expression of the Pten 3'UTR sponge across 3 decades. Transfection efficiency is about 20%. (c) Pten 3'UTR is under robust repression throughout the plasmid expression range. It exerts a strong influence on ZsGreen levels, as seen by the difference in the transfer functions of the Pten 3'UTR and NULL 3'UTR transfections. Cells were binned by mCherry expression and the mean ZsGreen expression was calculated for each bin. (d) Total RNA from sorted cells (purple in (b)) carrying the Pten 3'UTR plasmid was probed for the expression of known Pten ceRNAs with RT-PCR. Expression of the indicated genes was normalized to their expression in the un-transfected cells. Data are mean ± sem (wt)

Constructing and transfecting Pten 3'UTR reporter sponge into HCT116 cells

The Pten 3'UTR contains predicted sites for 25 different miRNA families (http://www.targetscan.org) and there is direct evidence from RNA immunoprecipitation for Pten regulation by miR-106ab, miR-130 and the miR-17-92 cluster (which encodes the microRNAs miR-17, -18, -19a, -19b, -20a and -92) in HCT 116 [Tay 2011]. To explore the genome-wide effects on competing RNAs of sponging away miRNAs with an endogenous 3'UTR, we adapted a plasmid-reporter system previously developed in our lab [Mukherji 2011].

The plasmid contains two genes that encode fluorescent proteins (ZsGreen and mCherry), which are transcribed at identical levels from a common bidirectional tetracycline-inducible promoter, and contains multiple cloning sites for inserting any 3'UTR of interest (Figure 2.8a).

To probe the effect of microRNA sponging, we constructed a variant carrying the entire Pten 3'UTR fused to the ZsGreen gene. We transfected this plasmid into HCT116 cells and used the original plasmid (without the Pten 3'UTR) as a control (we call this the NULL UTR). The mCherry/ZsGreen fluorescence from the NULL UTR construct allows us to isolate the effect of the Pten 3'UTR sponge. In order to induce the promoter with doxycycline, we co-transfected these cells with the rtTA plasmid, as HCT 116 does not endogenously produce the rtTA transcription factor. We observed robust expression of the Pten 3'UTR sponge construct across 3 orders of magnitude on quantifying the single-cell fluorescence 18 hours later using a flow cytometer (Figure 2.8b). In principle, plasmid induction starts immediately after the addition of doxycycline, but we observed more de-repression of the confirmed Pten ceRNAs Vapa, Cnot6l and SERINC1 after 18h than after 12h or 36h [supp figure], possibly due to miRNA degradation timescales.

To ascertain whether overexpression of the Pten 3'UTR from a synthetic construct was capable of sponging away endogenous miRNAs, and thus derepressing other targets, we measured its effect on some previously established Pten ceRNAs. Taking the ratio of ZsGreen to mCherry across bins of mCherry fluorescence in the flow cytometry measurements for each of the Pten 3'UTR and NULL plasmid transfections allowed us to calculate the Pten 3'UTR fold repression across the transfection range. We observed that the Pten 3'UTR was under weak repression (up to 2-fold) throughout the transfection range (Figure 2.8c). FACS sorting only the mCherry-expressing cells and measuring the bulk RNA levels of the Pten ceRNAs Vapa, Cnot6l and SERINC1 with RT-PCR showed that they were derepressed by 40-80%, which confirmed that the Pten 3'UTR sponge was functionally engaging the miRNAs in the cell and competing with its known ceRNAs (Figure 2.8d). Additionally, in the RNAseq analysis we could discriminate between reads from the Pten 3'UTR and the Pten coding sequence, and found that Pten mRNA (cds) was increasingly derepressed as the exogenous Pten 3'UTR sponged away miRNAs from the endogenous Pten mRNA, confirming our observation that the Pten 3'UTR was under mild repression [Figure 2.10b]. Having observed an increase in Pten (coding sequence) expression throughout all the bins, we reasoned that the transfected Pten UTR sponges could also derepress other potential ceRNAs across the transcriptome.
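The fold-repression estimate from the flow cytometry data can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this work: the function names, the log10 bin edges and the definition of fold repression as the NULL mean ZsGreen divided by the Pten 3'UTR mean ZsGreen at matched mCherry are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict

def mean_zsgreen_by_bin(cells, edges):
    """cells: iterable of (mcherry, zsgreen) fluorescence pairs.
    edges: ascending bin edges in log10(mCherry) units (hypothetical choice).
    Returns {bin_index: mean ZsGreen} for every bin that contains cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for mcherry, zsgreen in cells:
        if mcherry <= 0:
            continue  # log10 is undefined; skip such events
        x = math.log10(mcherry)
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                sums[i] += zsgreen
                counts[i] += 1
                break
    return {i: sums[i] / counts[i] for i in counts}

def fold_repression_by_bin(null_cells, utr_cells, edges):
    """Assumed definition: mean ZsGreen (NULL) / mean ZsGreen (Pten 3'UTR),
    evaluated in bins of matched mCherry expression."""
    null_mean = mean_zsgreen_by_bin(null_cells, edges)
    utr_mean = mean_zsgreen_by_bin(utr_cells, edges)
    return {
        i: null_mean[i] / utr_mean[i]
        for i in null_mean
        if i in utr_mean and utr_mean[i] > 0
    }
```

A value above 1 in a bin means that ZsGreen carrying the Pten 3'UTR is lower than the NULL control at the same mCherry level, i.e. the 3'UTR is repressed in that bin.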

FACS sorting cells with varying amounts of Pten 3'UTR for RNASeq Assay

In order to isolate cell populations expressing varying amounts of Pten 3'UTR we used FACS sorting. We used mCherry fluorescence intensity to bin cells with similar transcriptional activity (which varies, e.g., with plasmid copy number), and hence similar levels of Pten 3'UTR sponge. For both the Pten 3'UTR and NULL 3'UTR transfections, we then FACS sorted 100,000 live cells in each of 4 different bins spanning 3 orders of magnitude (Figure 2.10a; see Methods), extracted RNA (500-1000 ng per bin) and quantified the transcriptome of each bin using RNA sequencing. Estimating fold changes was not straightforward, because plasmid reads in bins 2 and 3 made up as much as 30% of the total reads (Figure 2.9b) and, moreover, because repression made the expression of Pten 3'UTR and ZsGreen very different between the Pten UTR and NULL samples of each bin. Even after explicitly removing the reads coming from the plasmid and performing RPKM normalization, we observed an overall offset in the distribution of fold changes (Pten UTR/NULL) in bins 1, 2 and 3 (Figure 2.9c). We used the more appropriate TMM (trimmed mean of M-values) normalization method [Robinson 2010] to estimate a scale factor that removes this offset. After doing so, we could measure the fold changes of the transcriptome in each bin reliably, defining them as the ratio of the TMM-normalized values in each Pten 3'UTR bin to those in the corresponding NULL UTR bin.

Now that we could reliably infer the magnitude of fold changes caused by the sponging effect of the Pten UTR, we decided to explore the concordance between genes identified as putative Pten ceRNAs by the siPten knockdown and genes derepressed by Pten UTR overexpression. Such genes would be sensitive to the levels of Pten, and so would be very likely to interact with Pten through the crosstalk mechanism. To identify these genes, we first obtained the distribution of null fold changes from technical replicates in both the siPten knockdown and the Pten 3'UTR RNAseq data, and defined null thresholds for fold change (FC) and crosstalk strength (CS) of 1 standard deviation above 0 (Figure 2.10c). We only considered genes whose FC and CS were above these thresholds. These genes were therefore both reduced when Pten was knocked down and derepressed when miRNAs were sponged away by the Pten 3'UTR, making them sensitive to perturbations in Pten levels in both directions. We found 2305 genes meeting these criteria in bin 0, 2493 in bin 1, 2090 in bin 2 and 2470 in bin 3.
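The selection of genes that pass both null thresholds can be sketched as below. This is only an illustration of the thresholding logic: the function names are hypothetical, the inputs are assumed to be per-gene log-scale values, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import statistics

def null_threshold(replicate_log_fold_changes):
    """One standard deviation of the null distribution, i.e. of the log fold
    changes observed between technical replicates (assumed to be centered at 0)."""
    return statistics.pstdev(replicate_log_fold_changes)

def robust_cernas(cs, fc, cs_threshold, fc_threshold):
    """cs: gene -> crosstalk strength from the knockdown experiment.
    fc: gene -> fold change (Pten UTR / NULL) in one sorted bin.
    Keeps genes that exceed both null thresholds."""
    return sorted(
        gene
        for gene in cs
        if gene in fc and cs[gene] > cs_threshold and fc[gene] > fc_threshold
    )
```

Running `robust_cernas` once per sorted bin, with that bin's fold changes, would give one gene list per bin.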

[Figure 2.9, panels a-d: M-A plots for BIN 0 to BIN 3 under RPKM normalization and TMM normalization, with x-axis A = 0.5*log2(WT*NULL); plasmid read percentage (PTEN UTR vs NULL UTR) for BIN 0 to BIN 3; TMM factors for BIN 0 to BIN 3.]

Figure 2.9 | Normalization is required for FACS-sorted RNAseq data, as reads from the plasmid occupy a large percentage of total sequencing reads, leading to an overall offset in fold changes. (a) Schematic of two libraries A and B, with a small set of genes in library B taking an enormous number of sequencing reads and thereby reducing the sequencing "real estate" for the rest of the genes. An overall scale factor is required to normalize the library sizes. (b) Proportion of plasmid reads (mCherry + ZsGreen + Pten 3'UTR) among the total sequencing reads from the indicated sorted bins (in the Pten 3'UTR and the NULL 3'UTR data sets). Total RNA output from each bin is quite different, with reads from the plasmid taking up increasing sequencing real estate. (c) M (fold change) versus A (average expression) plots comparing RPKM values from the Pten UTR and NULL datasets for each bin show a clear offset from zero in bins 1, 2 and 3 (left panel). Genes indicated in red are in the middle 40% of M values and middle 90% of A values, and are used to estimate the TMM factor as described in the Methods. The green line shows the estimated TMM factor and is offset from zero in bins 1, 2 and 3. The panel on the right contains the same M-A plots with the offset removed after normalizing the fold changes with the TMM factors. The y-axis is in log scale. (d) Estimated TMM normalization factors from (c), used to normalize the library size of the respective bin.

We hypothesized that these genes were most likely to be 'robust Pten ceRNAs', and thus would show a bin-dependent signature of crosstalk. We observed increasing derepression of these robust Pten ceRNAs as more Pten UTRs were expressed in the system (Figure 2.10d). Notably, these robust Pten ceRNAs have a median fold change of 0.19 even when very few Pten UTR sponges are present (bin 1), and the median fold change increases to 0.27 when 10^3-fold more Pten UTR sponges are present (bin 3).

In order to verify that the fold changes in the transcriptome that we observed with the Pten 3'UTR plasmid were miRNA-dependent, we examined whether the magnitude of derepression was correlated with the overall functional efficiency of the miRNA binding sites (based on the context+ score of the site) in each bin. We relied on our siPten knockdown data to select those miRNAs that were involved in Pten crosstalk with high confidence. We had ascertained that miR-17, miR-19a, miR-20a and miR-130 were both strongly involved in transmitting Pten crosstalk to their other targets and highly enriched in the putative Pten ceRNAs. Moreover, they have been shown to physically bind to the Pten 3'UTR by RNA immunoprecipitation (RIP assays) [Poliseno 2010]. With our cleaner Pten 3'UTR overexpression system, we investigated whether these miRNAs were being directly sequestered by the Pten 3'UTR and thus derepressing their targets, as such a dependence would result in increasing bin-dependent fold changes of their targets. We analyzed the relationship between the bin-dependent derepression of the targets of these miRNAs and their site number, site type (6-, 7-, 8-nt sites), site position, and other determinants used by TargetScan to calculate total context+ scores of predicted miRNA targets [Lewis 2005; Garcia 2011]. Binding sites with greater context+ scores have been shown to be more effectively bound by miRNAs and repressed. When predicted targets of miR-17, miR-19a, miR-93 and miR-130 were distributed into 4 context+ score bins and the distribution of fold changes was plotted, the effect of increasing target derepression was clear in bins 2 and 3, but not so in bins 0 and 1 (Figure 2.10e). Thus, higher affinity of the miRNA binding sites in each receiver leads to greater crosstalk strength even for a fixed amount of the sender in each bin (PTEN 3'UTR level).

[Figure 2.10, panels a-e: schematic of transfection and flow sorting (MCHERRY, PTEN UTR, ZSGREEN) into BIN 0 to BIN 3 along log(mCherry); distributions of fold change and PTEN crosstalk strength; cumulative distributions of log fold change for targets of miR-17, miR-19a, miR-93 and miR-130 in BIN 1, BIN 2 and BIN 3 (WT/NULL).]

Figure 2.10 | Transfecting Pten UTR as a sponge derepresses putative ceRNAs in a dose-dependent and miRNA-dependent manner. (a) Schematic of FACS sorting: cells are transfected with bidirectional plasmids expressing mCherry and ZsGreen with the Pten 3'UTR or without it (NULL 3'UTR). The transfected cells are sorted on the flow sorter into 4 different bins depending on mCherry expression and collected for downstream RNAseq. (b) Expression (in RPKM) of mCherry, ZsGreen, Pten 3'UTR and Pten coding sequence in each bin for the cells transfected with the Pten 3'UTR plasmid. Pten coding sequence (RPKM) is increasingly upregulated in each bin, indicating that the Pten 3'UTR plasmid is capable of sponging away miRNAs.

Figure 2.10 (continued) | (c) Distribution of RNAseq fold changes for bin 1 (WT/NULL) and Pten CS. We refine the set of potential Pten ceRNAs by intersecting the sets of genes repressed in the Pten knockdown and derepressed in the Pten UTR transfection. A threshold for null changes ("OFC" or "OCS") is determined as 1 standard deviation of the fold change in the technical replicates (gray bar). Only genes that have positive Pten crosstalk strength (+CS) and positive fold changes in each bin are considered 'robust Pten ceRNAs', as they are sensitive to Pten levels in both directions, i.e. they are reduced when Pten is knocked down and derepressed when Pten 3'UTR sponges are introduced. (d) Cumulative distributions of fold change for genes in the intersection of the two datasets. Inset shows the median fold change for robust ceRNAs in each bin. Robust ceRNAs are increasingly derepressed in each bin. P-values for the difference in medians relative to the preceding bin were calculated by the Wilcoxon rank-sum test (P<0.05 (bin 1), P<0.01 (bin 2), P<10^-16 (bin 3)). (e) Cumulative distributions of fold changes for all targets of the indicated Pten miRNAs (those enriched in the list of Pten ceRNAs from the knockdown dataset) with increasing context+ scores (colour).

2.2 Discussion and conclusions

Recent experimental studies have suggested that miRNA-mediated competition between RNAs could be a new channel of post-transcriptional gene regulation, and that such RNA-RNA 'crosstalk' operates in many different biological contexts. Our study represents the first genome-wide measurement of crosstalk strength in response to the knockdown of three different genes. Previous studies of the ceRNA hypothesis have concentrated on only one or a few targets of a miRNA, even though a perturbation in a ceRNA that changes miRNA activity is expected to affect hundreds of targets. Quantifying the magnitude of crosstalk has also proven challenging, as existing studies rely on qPCR or luciferase assays, both of which have difficulty extracting precise fold changes due to issues with primer/enzyme efficiency or amplification biases. In order to fully test the generality and magnitude of the miRNA-mediated crosstalk hypothesis, it is necessary to perform perturbation experiments to see how altering the expression level of one 'sender' mRNA affects other 'receiver' mRNAs regulated by the same miRNAs. Thus it is essential to measure crosstalk strength transcriptome-wide using a quantitative assay.

Knocking down an individual mRNA is expected to affect the transcriptome widely, making it difficult to isolate miRNA-mediated effects. Thus we used the DICER -/- cell line, which has depleted levels of mature miRNAs, to isolate only those fold changes that were miRNA-dependent. Careful scrutiny of RNA-seq crosstalk strength measurements in HCT116 and DICER -/- yielded a high-confidence set of putative ceRNAs for each of the three senders. We studied whether this cohort of putative ceRNAs was actually miRNA-mediated and hence in accord with a ceRNA effect. Firstly, we found them to be enriched in miRNA-binding sites for their respective senders. Secondly, the hypothesis implies that the crosstalk effect should be stronger if genes share more miRNA binding sites with the sender. However, to the best of our knowledge, such a feature has not been experimentally demonstrated. We binned our list of putative ceRNAs by the number of shared miRNA binding sites and found that their crosstalk strength was correlated with the number of binding sites shared with their senders. We suspect that this feature implies that multiple miRNAs act cooperatively on receivers. Thirdly, different miRNAs have different total numbers of binding sites in the transcriptome, are sequestered to different extents by each sender, and therefore should differ in their ability to transmit crosstalk ("crosstalk power").

By considering the difference in CS distributions for targets vs non-targets, we ranked the crosstalk power of all miRNAs expressed in the cell line. miRNAs other than miR-17, miR-19 and miR-26 (tested by Tay (2011)) have greater PTEN crosstalk power; we therefore suggest that manipulating the levels of more highly ranked miRNAs from our list would be more effective in future studies. As an example, because PTEN ceRNAs have known oncogenic effects, miR-374ab, which we found to have the highest crosstalk power for PTEN, could be a useful target for miRNA-based cancer therapies (Cai 2013).

The originating studies of the crosstalk hypothesis had computationally identified many possible ceRNA candidates, but had experimentally tested only a few genes. Our data indicate that ceRNAs are pervasive across the transcriptome and broadly expressed (3-decade range of RPKM). The functional relevance of a broad class of ceRNAs may be the concordant buffering of many genes involved in similar biological functions. Indeed, in a GO-term analysis of putative ceRNAs, we observe shared functional roles of Pten, Vapa and Cnot6l ceRNAs with their respective senders. Such covariation among a broad class of ceRNAs when a sender is perturbed could help maintain a stoichiometric balance in pathways.

However, we caution that it is difficult to construct a null model for whether these covariations are themselves caused by transcriptional-level changes of the sender. Our candidate ceRNAs may not be true ceRNAs. A limitation of our analysis is its dependence on DICER -/- cells to control for spurious crosstalk effects that result from purely transcriptional network perturbations. For example, it is well known that regulatory network structures such as incoherent feed-forward loops can produce positive correlation between an mRNA and a targeting miRNA/transcription factor (Tsang 2007). How many of the ceRNA candidates identified in our analysis are directly repressed by targeting miRNAs is currently unknown.

Detailed experimental work is needed to examine these candidate ceRNAs; in particular, assays for miRNA binding and siRNA knockdown experiments can provide more conclusive evidence for ceRNA interactions in individual receiver-sender pairs. Combining our transcriptomic analysis with biochemical assays that identify binding partners will enable a greater understanding of the crosstalk mechanism.

The reported size of the ceRNA effect has widely been considered larger than expected from steady-state target competition alone, given the typically large number of targets (Broderick & Zamore 2014). We find that crosstalk strength, though substantial, is less than 0.5 for most genes, and is generally bounded by 1 for the three different senders that we tested. Crosstalk strength is nevertheless larger than we expected based on most sequestration models (including ours). For example, we estimate sequestration of most miRNAs by Pten to be less than 1% and Pten mRNA repression to be at most 2-fold (based on the PTEN 3'UTR sponging data). So CS < Sequestration × Repression implies CS < 2%. In order to explain the relatively large CS magnitude, we suggest two possibilities. Firstly, it remains unclear whether the total binding sites for a miRNA are truly in excess of miRNA concentrations locally. Estimates of the total average number of binding sites in the cell might be irrelevant to individual miRNA-target interactions that depend on local miRNA/target concentrations. A recent theoretical study (Figliuzzi 2013) also finds that substantial crosstalk requires a small number of competing target sites. They propose that ceRNA function may require a channel of 'stoichiometric decay', in which a bound miRNA needs to be destabilized or functionally depleted by other mechanisms such as trapping in P-bodies. Secondly, the topology of the ceRNA-miRNA network may play an important role, as strongly interconnected sender:miRNA:receiver subnetworks could enhance crosstalk. For example, the miRNAs (miR-17 and miR-19) that are strongly implicated in PTEN ceRNAs in our data are co-transcribed from polycistronic regions and tend to have similar sets of targets, suggesting that their repressive effects can amplify for a large number of ceRNAs (Yip 2014).

The discrepancy between the ceRNA effect we detect on overexpressing the PTEN 3'UTR (even at low amounts) and the lack of any detectable ceRNA effect on overexpression of synthetic miR-122 seed sites (Denzler 2014) may be due to at least two reasons. Firstly, we used a full-length endogenous PTEN 3'UTR (3.3 kb), which contains multiple binding sites for 25 miRNA families, while Denzler et al. used a short (125 bp) AldoA mRNA with a single miRNA binding site (miR-122). The functionality of miRNA target sites is affected by numerous 3'UTR properties, including the presence of multiple target sites in close proximity (Grimson 2007, Broderick 2011), the position of the site in the 3'UTR (Majoros 2007), and the synergistic repression by multiple miRNAs (Lai 2012). Thus an endogenous 3'UTR with multiple miRNA sites could have greater sequestration and miRNA-repression ability than AldoA. Moreover, in our system the PTEN 3'UTR sponge is under constant repression, with fold changes ranging from 2 to 3, unlike the miR-122 sponge, which exhibited a loss of repression at higher induction levels (from 2-fold to 0.1), suggesting that miR-122 was saturated. Other endogenous 3'UTRs we measured also had constant repression fold changes of 2-fold (Wee1) and 5-fold (Lats2) (cf. Chapter 4, Schmiedel 2015) at all induction levels, suggesting that endogenous 3'UTRs carrying more seed sites attract more miRNA repression. Secondly, the ceRNA effect depends on the cellular concentrations of miRNAs; our HCT116 cancer cell line has a different miRNA expression profile compared to primary cells. The oncogenic miR-17-92 cluster in particular, which we found to have high PTEN crosstalk power, is known to be significantly upregulated in the HCT116 cell line (Wang 2008).


2.3 Methods and Materials

2.3.1 Cell culture and siRNA Transfection

The HCT116 colorectal cancer cell line was obtained from ATCC (American Type Culture Collection). The HCT116 DICER exon5 -/- cell line was a kind gift from Dr. B. Vogelstein and was generated as described previously (Cummins 2006). HCT116 wild-type and HCT116 DICER -/- cells were grown in ATCC-formulated McCoy's 5a Medium Modified (Catalog No. 30-2007) plus 10% (v/v) FBS, penicillin/streptomycin (Gibco) and L-glutamine at 37 °C in a humidified atmosphere with 5% CO2. Cells were grown adherently in 10 cm dishes or 6-well plates at a seeding density of 1.0 x 10^5 cells/cm^2 until they were 50% confluent (40-50 hours), upon which they were trypsinized, re-plated and transfected with 25 nM siRNA for 24 hours. Transfection of siRNA oligonucleotides was performed with Dharmafect lipid transfection reagent according to the manufacturer's protocols. siRNAs were purchased from Dharmacon as SMARTpools. Titration of the siRNA and the transfection reagent was performed (data not shown), and the lowest working amounts of the siRNA (25 nM) and the transfection reagent were applied in the present study. With this protocol more than 90% of cells were positive for the fluorescent siGLO RISC-free control siRNA. A list of reagents used in this study is given below. A master mix was created for each individual condition in order to eliminate pipetting errors and to increase consistency between wells. Each siRNA was transfected in triplicate in each of HCT116 and DICER -/-, and all the knockdown experiments were done simultaneously to avoid an additional source of variation. After 24 hours cells were harvested for various assays.

Reagent                                        Source
McCoy's 5A Medium; Fetal Bovine Serum (FBS)    ATCC (30-2007, 30-2020)
Trypsin                                        ATCC (30-2101)
siGENOME siRNA pool for nontargeting 1         Dharmacon (Catalog D-001206)
5X siRNA buffer                                Dharmacon (Catalog B-002000)
SMARTpool si-Pten                              Dharmacon (Catalog M-003023)
SMARTpool si-Pten                              Dharmacon (Catalog M-021382)
SMARTpool si-Pten                              Dharmacon (Catalog M-016411)

2.3.2 RNA extraction

Total RNA was extracted from cells using Trizol reagent for the RT-PCR assay or using RNeasy (Qiagen) for RNA-sequencing assays, following the manufacturer's protocols. RNA pellets were resuspended in 20 µl RNase-free sterile water, and RNA quantity was assessed spectrophotometrically using the NanoDrop ND-1000 UV-VIS Spectrophotometer (Thermo Fisher). The RNA integrity number (RIN) was assessed with a 2100 Agilent Bioanalyzer to verify RNA quality for all experimental samples. Only samples with RIN > 9 were used for sequencing.

2.3.3 RT-PCR

mRNA levels of various transcripts were measured using RT-PCR. Reverse transcription into cDNA was done using a First Strand Synthesis kit (Invitrogen). RT-PCR was performed in triplicate reactions using SYBR Green mix (Applied Biosystems), run on an Applied Biosystems 7500 Real-Time PCR instrument. Levels of various genes after siRNA knockdown were measured with the ddCT method, using human Actin for normalization. The primers used are listed in a supplementary table.
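For reference, the ddCT calculation can be written out as a short sketch. The function and argument names are hypothetical, and the formula assumes a primer efficiency of 100% (a doubling of product per cycle); it is the standard textbook form of the method, not a transcript of the analysis used here.

```python
def ddct_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the ddCT method.
    ct_* are (mean) threshold cycles; 'ref' is the normalizer gene (e.g. Actin).
    Assumes perfect amplification efficiency (factor of 2 per cycle)."""
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)
```

A target that crosses threshold two cycles later after knockdown, with an unchanged reference gene, has a relative expression of 0.25.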


2.3.4 Reporter Plasmid Construction

Starting from a previously established reporter system (Mukherji 2011) built on the plasmid pTRE-Tight-BI (Clontech), we replaced eYFP with ZsGreen1-1 (Clontech) using the EcoRI and NdeI digestion sites. We received the psiCHECK2-Pten 3'UTR plasmid as a kind gift from Yvonne Tay. The Pten 3'UTR sequence was cloned from that plasmid using custom primers and inserted into the bidirectional plasmid at the ZsGreen MCS via the NdeI and XbaI digestion sites using standard cloning techniques. This reporter plasmid is referred to as the Pten 3'UTR sponge plasmid in the text. The "NULL" plasmid, which we used as a control, consists of the same construct without the Pten 3'UTR, i.e. just the plasmid containing the bidirectional tetracycline-responsive promoter that drives the transcription of the two fluorescent reporter proteins, ZsGreen and mCherry. All constructs were sequence confirmed.

2.3.5 Transient Transfection of plasmid

HCT116 cells were grown in 2 ml of culturing medium (antibiotic-free) in 6-well dishes for two days before the transfection. Pten 3'UTR or NULL plasmids were mixed with the rtTA plasmid at a ratio of 3:1 (1.5 µg reporter plasmid : 0.5 µg rtTA plasmid) and then co-transfected into the cells in a medium consisting of 10 µl Lipofectamine 2000 (Invitrogen) and 250 µl Opti-MEM. Six hours post-transfection, when the cells had stabilized, they were detached with trypsin, passaged onto 60 mm plates in 3 ml culturing medium and induced with 1 µg/ml doxycycline (Sigma). Live cells were taken for the flow sorting assay 18 hours post-induction.

2.3.6 FACS sorting

At the end of the transfection period, live cells were trypsinized, pelleted and resuspended into a single-cell suspension in McCoy's 5A medium. These transfected cells were sorted by FACS into ice-cold PBS + 3% FBS using a BD Biosciences Aria II flow cytometer in the following manner: (i) Single cells were gated using their FSC-A and SSC-A scatter profiles. (ii) Only those cells containing the reporter plasmid were chosen, based on their mCherry expression values. (iii) We collected cells into 4 different bins based on their mCherry expression values (see figure). 100,000 cells from each bin (the same bins were used for sorting both Pten UTR and NULL UTR) were sorted into Eppendorf tubes containing ice-cold PBS 1% FBS buffer and their RNA was extracted as above. This method gave a total of 500-1000 ng of RNA per bin. For analytic flow cytometry, cells were detached with 0.05% trypsin-EDTA, washed and resuspended in sterile 3% FBS PBS. Measurements were performed on a BD Biosciences LSR Fortessa platform.

2.3.7 RNA Sequencing

From isolated RNA, poly(A)+ RNA sequencing libraries were prepared using the Illumina TruSeq Stranded mRNA kit in the MIT BioMicro Center. The prepared libraries were multiplexed and sequenced on an Illumina HiSeq 2500 sequencer to obtain single-end 40-bp reads. On average we obtained 20 million reads per sample. For each sample, there were three biological replicates. Reads were aligned with the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009) using parameters [q (PHRED quality) = 30, l (seed length) = 30] to modENCODE integrated transcript models (hg19 version). We allowed a maximum edit distance of 2 (options "aln -n2") and used the flag "-uniq=1" to map only unique reads. The output was converted into SAM format using the BWA "samse" option and processed with a custom Perl script. Each library had 85% mapped reads. For the Pten 3'UTR plasmid transfection experiment, we split the Pten sequence into Pten 3'UTR and Pten cds, and the sequences of mCherry and ZsGreen were added to the hg19 transcript model. Reads were aggregated across isoforms, and expression per gene locus was calculated in reads per million mapped reads (RPM). Whenever expression was measured in RPKM, the length of the merged isoforms was used for normalization.
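The two expression units used here can be illustrated with a short sketch. The function names are hypothetical, and representing isoforms as half-open exon intervals whose union gives the merged length is an assumption made for the example; the custom Perl script mentioned above is not reproduced.

```python
def merged_length(exon_intervals):
    """Length in bp of the union of half-open [start, end) exon intervals,
    pooled over all isoforms of one gene locus (assumed representation)."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(exon_intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def rpm(gene_reads, total_mapped_reads):
    """Reads per million mapped reads."""
    return gene_reads * 1e6 / total_mapped_reads

def rpkm(gene_reads, total_mapped_reads, length_bp):
    """Reads per kilobase of merged transcript per million mapped reads."""
    return gene_reads * 1e9 / (total_mapped_reads * length_bp)
```

With 20 million mapped reads, a locus collecting 200 reads has an RPM of 10, and an RPKM of 5 if its merged isoform length is 2 kb.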

2.3.8 RNASeq Data Analysis

Genes with non-zero read counts in all of the libraries were retained, resulting in a total of 13,700 (out of 23,704) expressed genes. RPKM values were averaged over the 3 biological replicates. The coefficient of variation (CV) was estimated in a gene-independent manner by pooling all the CV measurements at a given expression level, as follows: loess regression was performed to obtain an error model relating the expression CV of each gene to its expression mean across all samples. The expression CV of each gene was then set to the value of the loess-fitted line at its expression mean. Significance of fold changes was assessed by calculating z-scores, and standard Benjamini-Hochberg multiple-hypothesis-corrected p-values were obtained.
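The last step, converting z-scores to corrected p-values, can be sketched as follows. The z-scores are taken as given inputs (how they are derived from the loess error model is not reproduced), the use of a two-sided normal test is an assumption, and the function names are hypothetical.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value of a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes would then be called significant when their adjusted p-value falls below a chosen false discovery rate.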

2.3.9 miRNA-mRNA Target prediction

Genes were labeled as predicted microRNA targets if they contain at least one predicted conserved microRNA binding site (TargetScan 6.2; Garcia 2011) for a microRNA seed family expressed in HCT116.

2.3.10 miRNA expression Data sources

For expression of miRNAs in HCT116, we used microarray data sets generated by [Yan 2011], which were downloaded from NCBI GEO (Series GSE26819).

2.3.11 Target Abundance and Sequestration estimation

For each conserved human miRNA, the total number of predicted 6-, 7- and 8-nt 3'UTR binding sites on each gene was weighted by the RPKM expression value of that gene in the untreated HCT116 RNAseq data, and the weighted counts were summed over all genes to yield the target abundance (TA) of that miRNA.

We estimated the fraction of miRNA i sequestered by Pten (and similarly for Vapa and Cnot6l) as

$$\mathrm{Sequestration}^{\mathrm{miRNA}_i}_{\mathrm{Pten}} = \frac{[\#\text{ of predicted miRNA}_i\text{ binding sites on Pten}] \times [\text{Pten RPKM expression}]}{\sum_j [\#\text{ of predicted miRNA}_i\text{ binding sites on gene } j] \times [\text{gene } j\text{ RPKM expression}]}$$
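A small sketch of the target abundance and sequestration estimates for a single miRNA is given below. The dictionary inputs, gene names and function names are hypothetical and chosen only for illustration; the site counts are assumed to come from the TargetScan predictions.

```python
def target_abundance(site_counts, rpkm):
    """Sum over genes of (# predicted sites for this miRNA) x (gene RPKM).
    site_counts: gene -> number of predicted binding sites for one miRNA.
    rpkm: gene -> expression in the untreated cells."""
    return sum(n * rpkm.get(gene, 0.0) for gene, n in site_counts.items())

def sequestration(sender, site_counts, rpkm):
    """Fraction of the miRNA's expression-weighted sites that sit on the sender."""
    ta = target_abundance(site_counts, rpkm)
    if ta == 0:
        return 0.0
    return site_counts.get(sender, 0) * rpkm.get(sender, 0.0) / ta
```

For example, a sender with 2 sites at 10 RPKM contributes 20 units; if the target abundance of the miRNA is 1000 units, the estimated sequestration is 2%.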

2.3.12 GO term analysis

GO term analysis was performed in R using the GOstats package [Falcon 2007]. For each set of putative Pten, Vapa or Cnot6l ceRNAs, we collected the GO terms associated with each mRNA in the set. For each term, we then computed a p-value using a hypergeometric test to indicate the enrichment of the term in the ceRNA set compared to the background set of all genes.
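The hypergeometric enrichment p-value for a single GO term can be written out explicitly. This is a plain illustration of the test statistic, not a reimplementation of GOstats (which additionally handles the GO graph structure); the function and argument names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric.
    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes in the ceRNA set
    k: ceRNA-set genes annotated with the GO term"""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

A small p-value indicates that the term is annotated to more genes in the ceRNA set than expected from drawing the same number of genes at random from the background.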

2.3.13 TMM (Trimmed Mean of M-values) Normalization

Methods for normalization of RNA-sequencing gene expression data commonly assume equal total expression between compared samples. However, the number of reads expected to map to a gene depends not only on the expression level and length of the gene, but also on the composition of the RNA population that is being sampled [Robinson 2010]. Thus, if a large number of genes are unique to, or highly expressed in, one experimental condition, the sequencing 'real estate' available for the remaining genes in that sample is decreased. If not adjusted for, this sampling artifact can skew any fold-change analysis. This is precisely the situation in our FACS-sorted sequencing dataset. By transfecting reporter plasmids into cells and inducing thousands of transcripts, we change the global gene expression to different extents in each bin. We sorted cells by their expression of mCherry transcripts, and consequently found a large (3-decade) increase in mCherry read counts between the untransfected bin 0 and the fully saturated bin 3 (Figure 2.9b). mCherry and ZsGreen reads combined were as much as 30% of the total reads in the last bin.

Define $Y_{gk}$ as the observed read count for gene $g$ in library $k$ and $N_k$ as the total number of reads for library $k$. Remove all instances of $Y_{gk} = 0$, as fold changes cannot be calculated for them. Then for each bin, let $k$ and $k'$ stand for the Pten UTR and control NULL UTR libraries. Define the gene-wise log-fold-changes $M_g$ between these two libraries as

$$M_g = \log_2 \frac{Y_{gk}/N_k}{Y_{gk'}/N_{k'}}$$

and the absolute expression levels $A_g$ as

$$A_g = \frac{1}{2} \log_2 \left( \frac{Y_{gk}}{N_k} \times \frac{Y_{gk'}}{N_{k'}} \right) \quad \text{for } Y_{g\bullet} \neq 0$$

If RPKM normalization introduced no bias, one would expect the distribution of M values to be centered around zero. This is not the case, owing to the distorting effect of the different amounts of plasmid RNA in each bin. To eliminate this effect, we robustly summarize the M and A values by a trimmed mean, i.e. the mean after removing the upper x% and lower x% of the data. We use a double trimming of both the M and A values: we trim the top 30% and bottom 30% of M values, and the top 5% and bottom 5% of A values. After trimming, we define the TMM factor as the mean M of the remaining genes. This TMM factor is then used to normalize the library size for library k. We estimated the TMM factor for bin 1 as 0.91, indicating that the Pten UTR library size had to be reduced by that factor. After performing this normalization, we find that the overall offset in fold changes is 0, as expected (Figure 2.9c). The TMM factor is reasonably stable for different choices of trim percentiles [data not shown].
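The double-trimming procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the thesis: the function name `tmm_factor`, its arguments and the use of NumPy are assumptions, and the sketch uses the plain (unweighted) mean of the retained M values as described in the text. Reporting the result on a linear scale as 2 raised to the mean M is also an assumption about how a factor such as 0.91 would be obtained.

```python
import numpy as np

def tmm_factor(y_k, y_ref, trim_m=0.30, trim_a=0.05):
    """Trimmed mean of M values between library k and a reference library.

    y_k, y_ref: per-gene read counts (same gene order) for the two libraries.
    Returns (mean log2 fold change of the retained genes, same on a linear scale).
    All names here are hypothetical and chosen for illustration only.
    """
    y_k = np.asarray(y_k, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    n_k, n_ref = y_k.sum(), y_ref.sum()      # library sizes N_k and N_k'

    keep = (y_k > 0) & (y_ref > 0)           # fold changes undefined for zero counts
    p_k = y_k[keep] / n_k
    p_ref = y_ref[keep] / n_ref

    m = np.log2(p_k / p_ref)                 # gene-wise log fold change M_g
    a = 0.5 * np.log2(p_k * p_ref)           # absolute expression A_g

    # Double trimming: keep genes inside the central quantiles of both M and A.
    m_lo, m_hi = np.quantile(m, [trim_m, 1 - trim_m])
    a_lo, a_hi = np.quantile(a, [trim_a, 1 - trim_a])
    inside = (m >= m_lo) & (m <= m_hi) & (a >= a_lo) & (a <= a_hi)

    log2_factor = m[inside].mean()
    return log2_factor, 2.0 ** log2_factor
```

In a toy library where one plasmid-like gene takes up a large share of the reads, the linear-scale factor returned by this sketch falls below 1, which matches the direction of the correction described above (the effective library size is reduced).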

2.4 Supplementary Figures and Tables

[Figure 2.11 panels: TargetScan maps of conserved sites for miRNA families broadly conserved among vertebrates in the 3' UTRs of (a) human PTEN, (b) human VAPA and (c) human CNOT6L (NM_144571, 3' UTR length 7042).]

Figure 2.11 | Predicted TargetScan conserved miRNA binding sites in the 3' UTR of the ceRNAs chosen in this study. (a) PTEN is targeted by 25 conserved miRNAs. (b) VAPA is targeted by 28 conserved miRNAs. (c) CNOT6L is targeted by 44 conserved miRNAs.

[Figure 2.12 panels: volcano plots with x-axes PTEN Crosstalk Strength, VAPA Crosstalk Strength and CNOT6L Crosstalk Strength.]

Figure 2.12 | Crosstalk is microRNA mediated and pervasive on a genome-wide scale. Related to Figure 2.3. Volcano plot of magnitude of Crosstalk Strength versus P-value in each of the sender-knockdowns for putative ceRNAs, i.e. only those genes with crosstalk strength in HCT 116 greater than in DICER cells (HCT CS > DICER CS). Data points for genes marked in green have P-value < 10^-3 and are statistically significant. The number of putative ceRNAs for the given sender is indicated in the legend.

[Figure 2.13 panels: BIN1, BIN2, BIN3; x-axis: Fold Change (WT/NULL).]

Figure 2.13 | Distribution of log2 fold changes (PTEN UTR/NULL) for all genes post TMM normalization is centered around zero in each bin, i.e. no bin-dependent effects are seen. Related to Figure 2.10.


Table 2.1 | MicroRNAs enriched in genes with positive PTEN Crosstalk Strength with hypergeometric p-value less than 0.05. MicroRNAs in bold are those that are predicted to target PTEN.

miRNA seed family | P-value | Enrichment factor
miR-200bc/429/548a | 0 | 1.87
miR-17/17-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d | 1.01E-13 | 1.8
miR-23abc/23b-3p | 4.02E-13 | 1.79
miR-340-5p | 4.33E-13 | 1.7
miR-101/101ab | 6.78E-13 | 1.9
miR-19ab | 9.20E-13 | 1.78
miR-181abcd/4262 | 4.15E-12 | 1.75
miR-144 | 1.41E-11 | 1.87
miR-300/381/539-3p | 8.74E-11 | 1.83
miR-590-3p | 4.63E-10 | 1.62
miR-130ac/301ab/301b/301b-3p/454/721/4295/3666 | 1.12E-09 | 1.8
miR-93/93a/105/106a/291a-3p/294/295/302abcde/372/373/428/519a/520be/520acd-3p/1378/1420ac | 1.55E-09 | 1.8
miR-30abcdef/30abe-5p/384-5p | 2.77E-08 | 1.58
miR-26ab/1297/4465 | 2.92E-08 | 1.72
miR-186 | 1.04E-07 | 1.68
miR-141/200a | 1.32E-07 | 1.74
miR-25/32/92abc/363/363-3p/367 | 1.55E-07 | 1.71
miR-15abc/16/16abc/195/322/424/497/1907 | 6.52E-07 | 1.53
miR-27abc/27a-3p | 1.13E-06 | 1.55
miR-216b/216b-5p | 1.65E-06 | 2.14
miR-148ab-3p/152 | 2.19E-06 | 1.71
miR-495/1192 | 2.56E-06 | 1.62
miR-96/507/1271 | 2.74E-06 | 1.57
miR-21/590-5p | 2.89E-06 | 2.13
miR-132/212/212-3p | 4.94E-06 | 1.94
miR-543 | 4.98E-06 | 1.69
miR-503 | 5.20E-06 | 2
miR-153 | 7.92E-06 | 1.69
miR-374ab | 8.67E-06 | 1.67
miR-205/205ab | 9.67E-06 | 1.86
miR-448/448-3p | 2.29E-05 | 1.67
miR-124/124ab/506 | 2.27E-05 | 1.42
miR-7/7ab | 3.60E-05 | 1.8
miR-410/344de/344b-1-3p | 6.57E-05 | 1.67
miR-155 | 7.06E-05 | 1.81
miR-433 | 9.37E-05 | 1.87
miR-1ab/206/613 | 0.0001 | 1.57
miR-221/222/222ab/1928 | 0.0001 | 1.76
miR-217 | 0.00018 | 1.86
miR-202-3p | 0.0001 | 1.58
miR-128/128ab | 0.0003 | 1.218
miR-320abcd/4429 | 0.0004 | 1.54
miR-140/140-5p/876-3p/1244 | 0.0005 | 1.8
miR-223 | 0.001 | 1.8
miR-544/544ab/544-3p | 0.001 | 1.58
miR-218/218a | 0.001 | 1.48
miR-151 | 0.001 | 1.411
miR-499-5p | 0.001 | 1.76
miR-199ab-5p | 0.002 | 1.61
miR-29abcd | 0.002 | 1.42
miR-139-5p | 0.003 | 1.74
miR-194 | 0.003 | 1.69
miR-494 | 0.003 | 1.56
miR-103a/107/107ab | 0.003 | 1.53
miR-208ab/208ab-3p | 0.003 | 2.06
miR-137/137ab | 0.004 | 1.4
miR-224 | 0.006 | 1.63
let-7/98/4458/4500 | 0.006 | 1.1
miR-421 | 0.007 | 1.59
miR-290-5p/292-5p/371-5p/293 | 0.01 | 1.63
miR-9/9ab | 0.01 | 1.35
miR-135ab/135a-5p | 0.01 | 1.47
miR-653 | 0.01 | 1.78
miR-142-3p | 0.01 | 1.66
miR-196abc | 0.01 | 1.75
miR-377 | 0.01 | 1.46
miR-18ab/4735-3p | 0.019 | 1.71
miR-425/425-5p/489 | 0.03 | 1.78
miR-138/138ab | 0.03 | 1.47
miR-24/24ab/24-3p | 0.04 | 1.12
miR-324-5p | 0.04 | 1.94
miR-33a-3p/365/365-3p | 0.04726543453742 | 1.64

Table 2.2 | MicroRNAs enriched in genes with positive VAPA Crosstalk Strength with hypergeometric p-value less than 0.05. MicroRNAs in bold are those that are predicted to target VAPA.

miRNA seed family | P-value | Enrichment factor
miR-17/17-5p/20ab/20b-5p/93/106ab/427/518a-3p/519d | 6.53E-08 | 1.8
miR-200bc/429/548a | 1.50E-07 | 1.8
miR-93/93a/105/106a/291a-3p/294/295/302abcde | 2.49E-07 | 1.94
miR-30abcdef/30abe-5p/384-5p | 2.90E-07 | 1.72
miR-202-3p | 1.60E-06 | 1.96
miR-300/381/539-3p | 2.80E-06 | 1.84
miR-186 | 1.09E-05 | 1.79
let-7/98/4458/4500 | 7.00E-05 | 1.7
miR-23abc/23b-3p | 7.86E-05 | 1.64
miR-19ab | 0.0001 | 1.62
miR-27abc/27a-3p | 0.0001 | 1.62
miR-590-3p | 0.0001 | 1.55
miR-217 | 0.0001 | 2.17
miR-148ab-3p/152 | 0.0002 | 1.8
miR-340-5p | 0.0003 | 1.53
miR-9/9ab | 0.0003 | 1.58
miR-144 | 0.001 | 1.64
miR-130ac/301ab/301b/301b-3p/454/721/4295/3666 | 0.002 | 1.63
miR-26ab/1297/4465 | 0.002 | 1.62
miR-25/32/92abc/363/363-3p/367 | 0.002 | 1.64
miR-128/128ab | 0.002 | 1.57
miR-101/101ab | 0.005 | 1.63
miR-182 | 0.007 | 1.5
miR-503 | 0.007 | 1.93
miR-374ab | 0.008 | 1.64
miR-141/200a | 0.01 | 1.59
miR-192/215 | 0.01 | 2.4
miR-543 | 0.01 | 1.61
miR-196abc | 0.01 | 2.04
miR-139-5p | 0.01 | 1.89
miR-15abc/16/16abc/195/322/424/497/1907 | 0.01 | 1.44
miR-155 | 0.02 | 1.75
miR-221/222/222ab/1928 | 0.03 | 1.72
miR-181abcd/4262 | 0.04 | 1.42


Table 2.3 | Table of Biological Processes Gene Ontology (GO) annotations significantly enriched in putative PTEN ceRNAs. Only the top 10 are shown.

GO Term | P-value | Enrichment | Description
GO:0001568 | 2.37E-09 | 2.73 | blood vessel development
GO:0036211 | 8.06E-09 | 1.59 | protein modification process
GO:0031323 | 8.72E-09 | 1.50 | regulation of cellular metabolic process
GO:2000112 | 2.69E-08 | 1.53 | regulation of cellular macromolecule biosynthetic process
GO:0072358 | 3.35E-08 | 2.66 | cardiovascular system development
GO:0019220 | 7.64E-08 | 1.89 | regulation of phosphate metabolic process
GO:0009891 | 1.07E-07 | 1.79 | positive regulation of biosynthetic process
GO:2001141 | 1.51E-07 | 1.50 | regulation of RNA biosynthetic process
GO:0009892 | 1.57E-07 | 1.70 | negative regulation of metabolic process
GO:0045944 | 1.76E-07 | 2.09 | positive regulation of transcription from RNA polymerase II promoter

Table 2.4 | Table of Biological Processes Gene Ontology (GO) annotations significantly enriched in putative VAPA ceRNAs. Only the top 10 are shown.

GO Term | P-value | Enrichment | Description
GO:0000279 | 2.20E-14 | 3.37 | M phase
GO:0051301 | 4.61E-11 | 3.08 | cell division
GO:0048285 | 1.61E-10 | 3.20 | organelle fission
GO:0000278 | 4.49E-09 | 3.86 | mitotic cell cycle
GO:0007067 | 2.50E-05 | 2.90 | mitosis
GO:0007098 | 3.62E-05 | 6.79 | centrosome cycle
GO:0000236 | 6.22E-05 | 3.89 | mitotic prometaphase
GO:0006796 | 6.50E-05 | 1.61 | phosphate-containing compound metabolic process
GO:0022403 | 6.56E-05 | 2.39 | cell cycle phase
GO:0009889 | 6.83E-05 | 1.45 | regulation of biosynthetic process

Table 2.5 | Table of Biological Processes Gene Ontology (GO) annotations significantly enriched in putative CNOT6L ceRNAs. Only the top 10 are shown.

GO Term | P-value | Enrichment | Description
GO:0045766 | 1.07E-05 | 6.68 | positive regulation of angiogenesis
GO:0009890 | 7.93E-05 | 1.95 | negative regulation of biosynthetic process
GO:0071276 | 0.0001 | 22.6913 | cellular response to cadmium ion
GO:0048714 | 0.0001 | 84.91 | positive regulation of oligodendrocyte differentiation
GO:0048514 | 0.0001 | 2.50 | blood vessel morphogenesis
GO:0071294 | 0.0002 | 18.91 | cellular response to zinc ion
GO:0010035 | 0.0002 | 2.54 | response to inorganic substance
GO:0001944 | 0.0003 | 2.29 | vasculature development
GO:0051918 | 0.0003 | 42.45 | negative regulation of fibrinolysis
GO:0031324 | 0.0005 | 1.69 | negative regulation of cellular metabolic process

Chapter 3

A single molecule analysis of ceRNAs reveals miRNA-dependent correlation and colocalization

In the preceding chapter, we presented the genome-wide measurement of crosstalk strength for three different senders (Cnot6l, Pten, Vapa) and identified key features that impact the magnitude of crosstalk strength: the number of shared miRNAs, the target abundance of miRNAs and the binding affinity of miRNAs. We were able to isolate these factors by profiling transcript abundances for a bulk population of cells after perturbing sender levels. However, the theoretical framework of the ceRNA hypothesis depends upon the relative concentrations of targets and miRNAs, the sequestration of miRNAs and the repression of miRNA targets. Each of these processes occurs within individual cells. The cell is a highly complex environment that cannot be approximated as well-mixed, owing to the presence of numerous sub-cellular structures. Local concentrations of miRNA binding sites and miRNAs may differ from the average by large amounts, thus affecting the rates of miRNA sequestration or repression.

Hence, the absolute intracellular concentrations of these species, their spatial localization, and their dynamics, along with other molecules involved in miRNA biogenesis (Argonaute2), have to be taken into account for a quantitative understanding of the ceRNA hypothesis. Moreover, bulk measurements of crosstalk in cells following sender knockdowns may mask other features of miRNA-mRNA coupling, such as the buffering of individual ceRNA fluctuations by their shared miRNAs. This chapter focuses on quantifying the endogenous transcription of known ceRNAs (Cnot6l, Pten, Vapa) in single cells with single-molecule resolution, using a single-molecule in situ hybridization (smFISH) assay, and on analyzing their spatial localization.

3.1 Results

3.1.1 Quantification of gene expression for Pten, Vapa and Cnot6l in single cells with 3-colour smFISH

In order to quantify the absolute abundance of ceRNAs in single cells with molecular resolution, we used the single-molecule RNA FISH (smFISH) method (Femino et al., 1998; Raj, Bogaard, et al., 2008), which labels each RNA molecule of a particular species with a set of fluorescently coupled complementary oligonucleotide probes. For each of Pten, Vapa and Cnot6l, we designed 25 to 48 fluorescently labeled probes, each 20 bases long, complementary to the coding sequence of the target transcript (Figure 3.1). Cnot6l had a shorter coding sequence that admitted only 25 probes. We hybridized individual probe sets for each of the three genes to fixed and permeabilized cells, stringently washed away unbound probes and finally imaged the cells under the fluorescence microscope. For simultaneous detection of three different genes, we labeled our probes with spectrally distinguishable fluorophores (Cy5, Alexa Fluor 594, and Cy5) and imaged with the appropriate filter sets.

The large number of fluorophores bound to a single mRNA results in diffraction-limited fluorescent spots corresponding to single transcripts (Figure 3.1). We found no obvious sub-cellular localization of these transcripts in the cytoplasm suggesting that ceRNAs are not uniquely captured in a particular structure. In order to quantify the expression level of endogenous mRNA in individual cells, we counted the fluorescent spots from the 3D images of cells using custom MATLAB scripts adapted from (Raj 2008). Each computationally identified molecule was assigned to a cell by manually tracing individual cell boundaries based on the DAPI nuclear staining signal.

To robustly estimate expression levels, we counted gene expression in 300-400 cells. We also performed identical smFISH experiments in the miRNA-deficient HCT 116 DICER -/- cell line. In HCT 116, the ceRNAs Cnot6l, Pten and Vapa were expressed at an average of 12, 28 and 82 molecules/cell respectively. Expression levels of Pten, Vapa and Cnot6l were largely unchanged in the miRNA-deficient DICER -/- cell line. Quantification of Pten and Vapa mRNA levels with our sensitive single-molecule method revealed a ratio of about 1:3, contrary to the previously reported ratio of 1:100 estimated by qPCR in the same HCT 116 cell line (Tay 2011, Ala 2013). We note that this reported disparity in expression levels had led many authors to conclude that the crosstalk mechanism could not allow a sender expressed at extremely low levels to affect the levels of a receiver expressed 100-fold higher [Ebert, Sharp 2012]. Having established our quantitative 3-colour smFISH dataset, we then proceeded to analyze the correlative structure of the single-cell gene expression of the three ceRNAs.

[Figure 3.1 panels: probe sets tiling the PTEN ORF (48 probes), the CNOT6L ORF (25 probes) and the VAPA ORF.]

Figure 3.1 | Measuring Pten, Vapa and Cnot6l gene expression in single cells with 3-colour single-molecule FISH. (a) Multiple 20-mer oligonucleotide probes for Pten, Vapa and Cnot6l were constructed and labeled with distinct dyes to allow simultaneous measurement of gene expression with a smFISH assay. (b) Spots corresponding to single mRNA molecules resulting from the transcription of the genes Vapa (red, detected with oligonucleotide probes coupled to Alexa 594) and Cnot6l (green, oligonucleotide probes coupled to Cy5) in HCT116 cells. Representative maximum-intensity z-projection. Diffraction-limited spots (molecules) in each channel were automatically identified with a custom MATLAB script and assigned to individual cells, which had been manually segmented based on DAPI nuclear staining.

3.1.2 Presence of shared miRNAs generates correlated fluctuations of Pten ceRNAs in single cells

We used the smFISH data to determine whether ceRNAs are correlated in individual cells, which would suggest that shared miRNAs co-regulate their fluctuations, or whether they vary independently, which would indicate that miRNA coupling occurs on a slower timescale than gene expression fluctuations. Previous studies of Pten and Vapa have shown that their gene expression levels are correlated across different bulk tumor samples (Ala 2013). However, any competition for and sequestration of miRNAs, and the consequent crosstalk, is a single-cell phenomenon, i.e. ceRNAs sponge miRNAs away from each other within a noisy intracellular environment consisting of different levels of ceRNAs, miRNAs and RISC/DICER enzymes. Thus it becomes necessary to study the expression of ceRNAs in single cells to investigate how the presence of shared miRNA binding sites in the three ceRNAs influences their gene expression. We plotted pairwise ceRNA gene expression in single cells for both the HCT116 and DICER datasets. Strikingly, we observed a significant correlation (Pearson correlation coefficient ρ ~ 0.40) between the gene expression of the three ceRNA pairs in HCT116 cells that was lost in the miRNA-deficient DICER cells (ρ ~ 0.10) (Figure 3.2b,c). Thus, a cell with low or high expression levels of one of the ceRNAs is likely to be in the corresponding expression state of the other ceRNAs only in the presence of functional miRNAs.
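The pairwise comparison above amounts to computing a Pearson correlation coefficient between the per-cell counts of two transcripts, with an uncertainty estimate. The following Python sketch shows one way to do this with a percentile bootstrap over cells (bootstrapped 95% confidence intervals are reported in Figure 3.2). The function name, the number of resamples and the use of NumPy are illustrative assumptions rather than the scripts used in this work.

```python
import numpy as np

def pearson_with_bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Pearson correlation of per-cell counts x and y with a percentile
    bootstrap confidence interval obtained by resampling cells.

    Hypothetical helper for illustration; x[i] and y[i] must come from the same cell.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rho = np.corrcoef(x, y)[0, 1]

    rng = np.random.default_rng(seed)
    n = len(x)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample cells with replacement
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]

    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return rho, (lo, hi)
```

Applied separately to the HCT116 and DICER -/- count tables for a given pair of transcripts, non-overlapping intervals would support the loss of correlation described above.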

In order to control for possible large-scale transcriptional network imbalances in the DICER cell line that might result in all genes fluctuating randomly, we performed smFISH on Twist1 and Pten. Twist1 is a highly expressed transcription factor that induces epithelial to mesenchymal transition (Yang 2010) and, importantly, has no predicted miRNA binding sites in common with Pten. Moreover, Twist1 had negligible Pten crosstalk strength in our RNA-seq si-Pten knockdown experiment, making it an attractive negative control. We found that Twist1 and Pten expression levels were significantly negatively correlated in HCT 116 cells (ρ = -0.31) and remained negatively correlated in DICER cells (ρ = -0.29) (Figure 3.3).

[Figure 3.2 panels: (a) schematic of independent versus miRNA-coupled genes, with extrinsic and intrinsic noise indicated; (b, c) scatter plots of single-cell mRNA counts for each pair of PTEN, VAPA and CNOT6L in HCT116 and DICER -/- cells; (d) distributions of Log2(Stoichiometric Ratio) for PTEN/VAPA, PTEN/CNOT6L and VAPA/CNOT6L.]

Figure 3.2 | Crosstalk helps ceRNAs co-fluctuate in single cells, thereby tightening their stoichiometric ratios in the presence of active miRNAs. (a) Two genes that are coupled by a common microRNA (red) will thereby also couple their endogenous "intrinsic" fluctuations and therefore have reduced deviations in their stoichiometric expression (upper marginal histogram) when compared to genes that don't share miRNAs (grey). (b,c) Using 3-colour smFISH to quantify expression of Pten, Vapa and Cnot6l in HCT116 and control DICER -/- single cells. Over 300 cells were analysed. Scatter plots of single-cell transcript counts of Pten, Vapa and Cnot6l for each pair of ceRNAs. The left column is smFISH data for untreated HCT116 and the right column that for control DICER -/-. Correlation coefficients are on the top right, and show a significant loss of correlation in DICER -/-. Error bars are bootstrapped 95% confidence intervals. (d) Using the data in (b,c) we computed the ratio of the transcript counts for two genes in each cell and refer to this as the stoichiometric ratio of the two genes. Red and black curves are the distributions of the log2(Stoichiometric Ratio) for each pair of genes in HCT116 and DICER cells respectively. The variances of these distributions are indicated in the top left.

[Figure 3.3 panels: (a) HCT116, ρ = -0.31; (b) DICER, ρ = -0.29; x-axis: PTEN mRNA.]

Figure 3.3 | Pten does not lose correlation in DICER for a gene with which it doesn't share miRNAs. (a,b) Scatter plot of gene expression in single cells for Pten and Twist1 in HCT116 (left) and DICER (right) cell lines. Twist1 was chosen as a control as it does not share any predicted miRNA binding sites with Pten and is highly expressed. The two genes remain negatively correlated in both cell lines. Correlation coefficient on the top right.

Another possible explanation for the observed difference in ceRNA correlations between HCT 116 and DICER is that their cell cycles proceed at different rates. However, we had cultured the two cell lines under identical conditions and found only a small difference in doubling time between them (~21 h and ~23 h for HCT116 and DICER respectively). Nevertheless, we accounted for a possible cell-cycle mechanism in explaining such a correlation by calculating the concentration of mRNAs (dividing the number of mRNA molecules in each cell by its cellular volume) and found a similar loss of ceRNA correlation in DICER cells compared to HCT 116 cells. Together, these data suggest that individual ceRNAs are correlated, i.e. they co-fluctuate with each other in single cells, due to the buffering effect of active miRNAs.

Stoichiometric ratio of ceRNAs is tightened by active miRNAs

We speculated that shared miRNAs could couple fluctuations of ceRNAs and thus regulate the stoichiometry of gene expression. By dynamically buffering individual fluctuations in each species via miRNA-mediated crosstalk, ceRNAs could have tighter stoichiometric ratios with each other than with genes that are not miRNA-regulated (Figure 3.2a). Cellular processes are acutely sensitive to changes in dosage for many genes, and thus ceRNAs may be used in pathways to minimize fluctuations. Pten, for example, is a haplo-insufficient gene, such that even moderate Pten down-regulation resulting from the loss of a single allele may be tumorigenic (Kwabi-Addo 2001). To compare the range of these ceRNA fluctuations in the HCT116 and DICER cell lines, we calculated a 'stoichiometric ratio', defined as the ratio between the individual mRNA counts for each ceRNA pair in each cell. Notably, the stoichiometric ratio is calculated for each single cell in our dataset, and is thus different from the Pearson correlation, which is defined for two mRNA count series over an entire cell population. When the distribution of 'stoichiometric ratio' values is plotted for each of the three ceRNA pairs (Pten & Vapa; Pten & Cnot6l; Vapa & Cnot6l) in HCT116 and DICER -/- cells, significant differences can be detected between the two cell lines. The distribution of the ceRNA stoichiometric ratio is tighter in HCT116 cells compared to DICER -/-, as measured by the variance of the distribution, implying that the loss of active miRNAs in DICER -/- causes ceRNAs to fluctuate independently of each other (Figure 3.2d).
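As a concrete illustration of the stoichiometric-ratio comparison, the sketch below computes the per-cell log2 ratio of two transcript counts and the variance of that distribution. The function names are hypothetical, and dropping cells in which either count is zero is an assumption made here because the log ratio is undefined for them; the text does not state how such cells were handled.

```python
import numpy as np

def log2_stoichiometric_ratio(counts_a, counts_b):
    """Per-cell log2 ratio of transcript counts for two genes.

    counts_a[i] and counts_b[i] are the mRNA counts of genes A and B in cell i.
    Cells with a zero count for either gene are dropped (assumption).
    """
    a = np.asarray(counts_a, dtype=float)
    b = np.asarray(counts_b, dtype=float)
    keep = (a > 0) & (b > 0)
    return np.log2(a[keep] / b[keep])

def ratio_variance(counts_a, counts_b):
    """Sample variance of the log2 stoichiometric ratio across cells."""
    return float(np.var(log2_stoichiometric_ratio(counts_a, counts_b), ddof=1))
```

Two genes that fluctuate in perfect proportion give a variance of zero, whereas independent fluctuations broaden the distribution, so a smaller value in HCT116 than in DICER -/- for the same gene pair corresponds to the tightening described above.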

3.1.3 Pten, Vapa, Cnot6l are mutually reciprocal ceRNAs

As ceRNAs share miRNA binding sites, they are expected to behave in a bidirectional manner, i.e. their interactions should be reciprocal. In order to study their reciprocal effects, we knocked down each of the three transcripts separately (with three biological replicates each) using 25 nM si-Pten, 25 nM si-Vapa or 25 nM si-Cnot6l, and counted the number of transcripts of Pten, Vapa and Cnot6l simultaneously using smFISH for each of the knockdowns. We had already quantified the crosstalk strength genome-wide for Pten, Vapa and Cnot6l, as described in Chapter 2, by knocking them down individually with siRNA and sequencing the transcriptomes; there we observed a significantly greater crosstalk strength in HCT116 compared to DICER -/- in 4 of the 6 possible sender-receiver pairs. Given that smFISH measurements yield absolute mRNA expression levels rather than relative RPKM values, we anticipated that quantifications of crosstalk strength would be more accurate when performed at single-molecule resolution. Pairwise analysis of scatter plots for each of the receivers reveals that they are each depleted when any individual sender is knocked down (Figure 3.4a,b,c).

[Figure 3.4 panels: (a-c) scatter plots of single-cell mRNA counts with marginal histograms for each knockdown, annotated with the fractional change in sender and receiver; (d) bar plots of crosstalk strength (CS) in DICER and HCT116 for each sender-receiver pair.]

Figure 3.4 | Measuring crosstalk strength with smFISH for 3 different senders in HCT116 and DICER -/-. (a) Single-cell mRNA counts with 3-colour FISH on Vapa, Pten and Cnot6l in WT (black) and 25 nM si-Vapa knockdown (violet). Each dot is the mRNA count/cell for the two indicated mRNA species. Marginal histograms for each mRNA in the two different conditions are on the top and right of each scatter plot. Bars indicate the mean expression in each single-cell distribution (black = WT and violet = si-Vapa). Knocking down Vapa by 60% results in a 33% change of Pten. (b,c) Same as (a) with 25 nM si-Pten knockdown (pink) and 25 nM si-Cnot6l knockdown (cyan). (d) Crosstalk strength for a receiver with respect to a sender is defined as in the text. Average CS measured in 3 different biological replicates for each sender-receiver pair in HCT and DICER -/- cells. Error bars are standard deviations of 3 independent sets of knockdown experiments.

For instance, Pten is reduced by 33% when Vapa levels are knocked down by 60% by si-Vapa, thus CS_{Pten,Vapa} = 0.53 (Figure 3.4a). Similarly, we could calculate the crosstalk strength for each of the six possible sender-receiver pairs. Even though Pten is not as highly expressed as Vapa, it again emerges as the best sender of crosstalk, corroborating our genome-wide RNA-sequencing results. Importantly, though senders suffer similar fold knockdowns in DICER -/- as in HCT116 cells, the receiver reduction (and consequently the crosstalk strength) is much weaker in DICER -/- cells for all 6 sender-receiver pairs, indicating that mature miRNAs are essential for the crosstalk mechanism (Figure 3.4d). Notably, we always measure a non-zero 'residual' crosstalk effect in DICER -/-, due to the attenuation but not elimination of mature microRNAs in the DICER -/- cell line as reported by TaqMan miRNA qPCR (Tay 2011). Taken together, we find that the ceRNA effect is indeed bi-directional and miRNA dependent.
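The calculation behind such a crosstalk strength value can be sketched as follows. The formal definition of crosstalk strength is the one given in Chapter 2. The form used here, the fractional change of the receiver divided by the fractional change of the sender (both computed from mean single-cell counts), is an assumption inferred from the worked example above (a 33% receiver reduction for a roughly 60% sender knockdown giving a value near 0.5). Function names are hypothetical.

```python
import numpy as np

def fractional_change(control_counts, knockdown_counts):
    """Fractional reduction of the mean per-cell count after knockdown."""
    control_mean = float(np.mean(control_counts))
    knockdown_mean = float(np.mean(knockdown_counts))
    return (control_mean - knockdown_mean) / control_mean

def crosstalk_strength(sender_ctrl, sender_kd, receiver_ctrl, receiver_kd):
    """Assumed form: fractional change in receiver / fractional change in sender."""
    return (fractional_change(receiver_ctrl, receiver_kd)
            / fractional_change(sender_ctrl, sender_kd))
```

Under this assumed form, a receiver that does not respond at all gives a value of 0 and a receiver that drops by the same fraction as the sender gives a value of 1, so the weaker receiver reduction in DICER -/- cells translates directly into a smaller crosstalk strength.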

3.1.4 Individual molecules of Pten ceRNAs are colocalized in a miRNA-dependent manner

On inspecting our smFISH dataset closely, we surprisingly found that some of the individual ceRNA molecules were colocalized with each other (Figure 3.5b). As discussed in the introduction, local concentrations of miRNAs and mRNAs can differ considerably from average cellular concentrations. If ceRNAs are colocalized with each other, or sequestered in miRNA processing machinery, then their competition for miRNAs could substantially increase, as their effective local concentrations would be much greater than the average concentration of all possible competing miRNA binding sites. Put another way, bound miRNAs released from a sender would have a greater propensity to bind to other receiver mRNAs in its vicinity than to diffuse to other far-away binding sites. We speculated that the high magnitude of crosstalk strength that we observed for the three reciprocal Pten, Vapa, Cnot6l ceRNAs might be explained by such colocalization of their transcripts.

Quantifying degree of colocalization between ceRNAs

In order to measure the degree of colocalization between ceRNAs, we used the 3-colour smFISH expression datasets to first identify the precise 3D location of the center of each diffraction-limited fluorescent spot. To do so, we fitted a Gaussian to each spot's intensity trace in each channel and thereby calculated the centre of each spot. The channels were aligned using TetraSpeck™ Microspheres (Invitrogen). For each spot in one channel, we then found all the spots in another channel that were within 2 pixels in the xy plane and within 2 z-planes, a tolerance that controls for possible stage drift during the imaging procedure. This method allows automated quantification of the number of colocalized transcripts in each single cell.
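The matching step described above can be sketched as a simple distance test between spot centres in two aligned channels. This is a minimal illustration under stated assumptions, not the MATLAB pipeline used here: spot centres are taken as already fitted and aligned, coordinates are given as (x, y, z) in pixels and z-planes, and "within 2 pixels in the xy plane" is interpreted as a Euclidean distance in xy. The function name is hypothetical.

```python
import numpy as np

def colocalized_mask(spots_a, spots_b, max_xy=2.0, max_z=2.0):
    """For each spot in channel A, report whether any spot in channel B lies
    within max_xy pixels in the xy plane and max_z planes along z.

    spots_a, spots_b: sequences of (x, y, z) spot centres (pixels, pixels, planes).
    Returns a boolean array with one entry per spot in channel A.
    """
    a = np.asarray(spots_a, dtype=float).reshape(-1, 3)
    b = np.asarray(spots_b, dtype=float).reshape(-1, 3)
    if len(a) == 0 or len(b) == 0:
        return np.zeros(len(a), dtype=bool)

    # Pairwise distances via broadcasting: rows index A spots, columns B spots.
    d_xy = np.hypot(a[:, None, 0] - b[None, :, 0],
                    a[:, None, 1] - b[None, :, 1])
    d_z = np.abs(a[:, None, 2] - b[None, :, 2])
    return ((d_xy <= max_xy) & (d_z <= max_z)).any(axis=1)
```

Summing the returned mask for the spots of one cell gives the number of colocalized transcripts of that species in the cell. The all-pairs comparison is quadratic in the number of spots, which is unproblematic at tens to hundreds of molecules per cell.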

[Figure 3.5 panels: (a) probe sets for the PTEN, CNOT6L and VAPA ORFs; (b) single z-slice of a 3-colour smFISH image; (c) bar plots of colocalization percentages, e.g. VAPA mRNA colocalized with PTEN mRNA and PTEN mRNA colocalized with VAPA mRNA.]

Figure 3.5 | Single molecule FISH shows Pten ceRNAs are colocalized in a DICER dependent manner. (a) Probe sets for the three genes Pten, Vapa and Cnot6l are coupled to different fluorophores, allowing detection of transcripts of the three genes simultaneously. A representative dual-colour image for Vapa and Cnot6l in HCT116 cells is shown (maximum intensity projection). (b) A single z-slice of a 3-colour FISH image for Pten, Vapa and Cnot6l in HCT116 cells. Arrows indicate colocalized transcripts for each pair of genes. (c) We computationally detected the precise 3D location of each transcript's intensity peak and calculated the percentage of transcripts that are colocalized between pairs of ceRNAs in HCT116 and the control DICER -/- in different experimental conditions (indicated below each bar plot). For example, the colocalization percentage of Vapa with Pten indicates the ratio of colocalized Vapa and Pten molecules to the total number of Vapa molecules. For each condition, more than 300 cells were analysed and the colocalization percentage represents the mean colocalization percentage of a cell.

For each pair of ceRNAs, we define the average colocalization fraction as follows:

$$\text{Colocalized fraction of ceRNA}_1 \text{ with ceRNA}_2 = \left\langle \frac{\#\text{ of colocalized transcripts of ceRNA}_1 \text{ with ceRNA}_2}{\text{total } \#\text{ of transcripts of ceRNA}_1} \right\rangle$$

where $\langle \cdot \rangle$ denotes the average over all the cells. Note that the colocalization fraction is not symmetric, as the denominators are different even though the numerators are identical, i.e. the colocalized fraction of Pten with Vapa will always be greater than the colocalized fraction of Vapa with Pten, because Vapa expression (denominator) is greater than Pten expression (denominator) even though the number of colocalized Pten and Vapa molecules (numerator) is identical.
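The asymmetry noted above is easy to see numerically. The sketch below averages the per-cell colocalized fraction over cells; the function name is hypothetical, the example counts are made up for illustration, and skipping cells that contain no transcripts of the first species is an assumption made to avoid division by zero.

```python
import numpy as np

def mean_colocalized_fraction(n_colocalized, n_total):
    """Average over cells of (colocalized transcripts of ceRNA1) / (all ceRNA1 transcripts).

    n_colocalized[i]: ceRNA1 transcripts in cell i that colocalize with ceRNA2.
    n_total[i]: total ceRNA1 transcripts in cell i.
    Cells without any ceRNA1 transcript are skipped (assumption).
    """
    c = np.asarray(n_colocalized, dtype=float)
    t = np.asarray(n_total, dtype=float)
    keep = t > 0
    return float(np.mean(c[keep] / t[keep]))

# Illustrative (made-up) counts for two cells: the same colocalized pairs,
# but different denominators for the two species.
colocalized_pairs = [2, 3]
pten_counts = [20, 30]
vapa_counts = [80, 60]
pten_with_vapa = mean_colocalized_fraction(colocalized_pairs, pten_counts)
vapa_with_pten = mean_colocalized_fraction(colocalized_pairs, vapa_counts)
```

With these toy numbers the fraction of Pten colocalized with Vapa is larger than the fraction of Vapa colocalized with Pten, purely because Pten is the less abundant species.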

In order to test whether the colocalization was miRNA dependent, we measured the colocalization fraction between each ceRNA pair in all our experimental conditions, and for both the HCT116 and DICER -/- cell lines. We found that the colocalization fraction for each ceRNA pair was significantly higher in HCT116 compared to DICER -/- in all the conditions, suggesting that miRNAs were partly responsible for colocalization (Figure 3.5c).

The fraction of Cnot6l colocalized with Vapa was surprisingly high and ranged from 25-40% in the siRNA knockdown conditions. Most other ceRNA pairs had colocalization fractions between 2-10% in HCT116. However, this is likely to be a lower bound for colocalization of ceRNA species over a cell cycle, because we only take snapshots of gene expression with smFISH. To test the specificity of our colocalization algorithm, and to exclude the possibility that the colocalization was independent of common miRNAs between ceRNAs, we used Twist1 as a negative control. We checked for colocalization between Pten and Twist1, which don't share any miRNA binding sites. We found no colocalization between the two, suggesting that colocalization was specific to ceRNA species. We also estimated a null model for random colocalization in the following manner: we took the probability of 2 transcripts randomly colocalizing to be the volume of a voxel occupied by a diffraction-limited spot divided by the cellular volume. The size of a voxel for a diffraction-limited spot is ~0.2 µm x 0.2 µm x 0.3 µm, while the volume of a cell is ~10 µm x 10 µm x 5 µm; thus the probability of random colocalization is negligible. Taken together, we find that colocalization of ceRNAs is miRNA dependent and differs considerably for each ceRNA pair.
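The arithmetic of this null model is short enough to spell out. The sketch below uses the voxel and cell dimensions quoted above; the function names are hypothetical, and the extension to an expected number of chance pairs (treating every pair of molecules as independently and uniformly placed) is an added simplifying assumption rather than part of the original estimate.

```python
def random_colocalization_probability(voxel_um=(0.2, 0.2, 0.3),
                                      cell_um=(10.0, 10.0, 5.0)):
    """Probability that two given transcripts share a voxel by chance:
    voxel volume divided by cell volume (dimensions in micrometres)."""
    voxel_volume = voxel_um[0] * voxel_um[1] * voxel_um[2]
    cell_volume = cell_um[0] * cell_um[1] * cell_um[2]
    return voxel_volume / cell_volume

def expected_random_pairs(n_a, n_b, p_pair):
    """Expected number of chance colocalized pairs between n_a and n_b molecules,
    assuming independent, uniform placement (simplifying assumption)."""
    return n_a * n_b * p_pair

p = random_colocalization_probability()
# Mean per-cell counts reported above for Pten (28) and Vapa (82) in HCT 116.
chance_pairs = expected_random_pairs(28, 82, p)
```

With these dimensions the per-pair probability is of order 10^-5, and even at the measured abundances of Pten and Vapa the expected number of chance pairs per cell stays well below one, consistent with treating random colocalization as negligible.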


3.2 Discussion

Here we used a smFISH assay to quantify endogenous transcription in single cells of known ceRNAs (Cnot6l, Pten, Vapa) with single-molecule resolution and analyzed their spatial localization. Our smFISH single-cell measurement of crosstalk strength for these three ceRNAs, which share at least 7 miRNA binding sites, is consistent with the previous chapter's population-level result. However, we measured the crosstalk effects of Vapa, Pten and Cnot6l on each other with much greater accuracy, and found that they affected each other reciprocally, both in mean-level changes and dynamically in single cells. In analyzing the single-cell expression profiles, we uncovered a miRNA-dependent correlation and stoichiometric co-variation of ceRNA expression in single cells, along with a miRNA-dependent colocalization of their mRNA molecules. These findings may have important implications for a crosstalk-based mechanism of post-transcriptional regulation.

Firstly, if microRNAs promote a stoichiometric balance among genes that share miRNA binding sites, then this could explain the paradox of weak miRNA repression of individual targets versus strong evolutionary selection of microRNA targeting. Stoichiometric balance is crucial within macromolecular complexes and cellular networks, where imbalances can lead to severe malfunctions. As microRNAs are known to extensively co-target functionally shared gene networks and proteins in macromolecular complexes, we suggest that microRNAs may be selected for their combinatorial regulation of many different ceRNAs together rather than of individual targets. The individual repressive effect of a miRNA on its shared targets would be correlated through the crosstalk channel and allow for stoichiometric expression of a large set of miRNA targets. Such a crosstalk-based co-regulatory mechanism at the transcript level would allow a flexible, adaptive mechanism for compensating environmental, genetic or random perturbations in mRNA abundance.

Secondly, our observation that ceRNAs exhibit reduced gene expression correlations in miRNA-deficient DICER -/- cells may be taken as a general signature of crosstalk that can help in their identification. Putative ceRNAs could be identified without perturbing the cell, i.e., without relying upon either down-regulating or up-regulating the levels of a particular sender and observing changes in a particular receiver. Instead, the intrinsic variability of sender transcript levels in a cell would correlate with the levels of a receiver through the shared miRNA crosstalk channel. Recent advances in single-cell sequencing technology have made it possible to measure the entire transcriptome of hundreds of cells, and thereby to compute single-cell correlations between all possible pairs of genes (Gruen, Kester 2014). Pairs of genes that lose correlation in DICER -/- cells when compared to HCT116 would thus be attractive ceRNA candidates. Such an unbiased, "loss of correlation" based approach to identifying ceRNAs would circumvent two major limitations of the sender perturbation strategy. The first is the reliance on microRNA-target predictions to identify putative ceRNAs. Computational target predictions are often noisy and have limited accuracy and consistency; in practice, false positives and false negatives in the target predictions often make it difficult to identify mRNAs with common targeting miRNAs. The second is that perturbing a sender mRNA causes a cascade of transcriptional and protein-level changes, which makes the construction of a null model challenging.
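The "loss of correlation" screen proposed above could be prototyped roughly as follows. This is a minimal sketch under stated assumptions, not an analysis performed in this thesis: it assumes per-cell counts are available for the same genes in both cell lines, uses the Pearson correlation as the measure of co-variation, and ranks gene pairs by the drop in correlation from wild type to DICER -/-. The function names, gene labels and toy counts are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_loss(wt_counts, ko_counts):
    """Rank gene pairs by r(wild type) - r(knockout), largest loss first.

    Both arguments map gene name -> list of per-cell counts.
    """
    genes = sorted(set(wt_counts) & set(ko_counts))
    rows = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            r_wt = pearson(wt_counts[gene_a], wt_counts[gene_b])
            r_ko = pearson(ko_counts[gene_a], ko_counts[gene_b])
            rows.append((gene_a, gene_b, r_wt, r_ko, r_wt - r_ko))
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Toy per-cell counts (made up for illustration only).
wt = {"geneA": [10, 20, 30, 40, 50],
      "geneB": [12, 18, 33, 41, 48],
      "geneC": [5, 9, 4, 8, 6]}
ko = {"geneA": [10, 20, 30, 40, 50],
      "geneB": [33, 41, 12, 48, 18],
      "geneC": [5, 9, 4, 8, 6]}

ranked = correlation_loss(wt, ko)
for gene_a, gene_b, r_wt, r_ko, loss in ranked:
    print(f"{gene_a}-{gene_b}: r_wt={r_wt:+.2f} r_ko={r_ko:+.2f} loss={loss:+.2f}")
```

In the toy data only the geneA-geneB pair is strongly correlated in the wild-type cells and loses that correlation in the knockout, so it ranks first. A real screen would additionally need a significance threshold and a control for correlations that change for reasons unrelated to miRNAs.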

3.3 Methods

3.3.1 Fluorescent in situ hybridization and imaging

Hybridization and washes were carried out according to previously established protocols (Femino 1998; Raj, 2008). Briefly, we hybridized probes for at least 18 hours at 30 °C and used wash buffers with a formamide concentration of 25%. Optimal washing conditions and probe concentrations were determined empirically for each gene. For nuclear staining, we used DAPI after the wash steps. Z-stacks of images were taken with a Nikon Ti-E inverted fluorescence microscope equipped with a 100x oil-immersion objective and a Photometrics Pixis 1024B CCD (charge-coupled device) camera using MetaMorph software (Molecular Devices, Downingtown, PA). The image-plane pixel dimension was 0.13 µm and the Z spacing between planes was 0.4 µm.


Table of smFISH experimental conditions

Treatment | Cell line | smFISH species
untreated | HCT116 and DICER -/- | Pten, Vapa, Cnot6l, Twist1
25 nM si-non-targeting negative control | HCT116 and DICER -/- | Pten, Vapa, Cnot6l
25 nM si-Pten | HCT116 and DICER -/- | Pten, Vapa, Cnot6l
25 nM si-Vapa | HCT116 and DICER -/- | Pten, Vapa, Cnot6l
25 nM si-Cnot6l | HCT116 and DICER -/- | Pten, Vapa, Cnot6l

3.3.2 Image analysis

The transcript distribution was measured by counting smFISH-labeled mRNA in single cells as previously described (Raj, Bogaard, et al., 2008). Briefly, a Laplacian of Gaussian (LoG) filter is applied to each optical plane of the image stack to enhance the fluorescent signal. To pick up true mRNA spots, the intensity threshold is chosen where the plot of the number of identified spots as a function of the threshold plateaus. The location of each mRNA spot is then taken to be the regional maximum pixel value of each connected region. Cell boundaries are manually traced using the DAPI and bright-field images. The number of mRNA spots located within the boundaries of an individual cell can thus be quantified.
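The threshold-selection step can be illustrated with a small sketch. It assumes that filtering and regional-maximum detection have already produced a list of candidate peak intensities, and it picks the threshold in the middle of the flattest stretch of the spots-versus-threshold curve. The function names, the window size and the toy intensities are hypothetical; the original pipeline (Raj, Bogaard, et al., 2008) may select the plateau differently, for example by visual inspection.

```python
def count_spots(peak_intensities, threshold):
    """Number of candidate peaks brighter than the threshold."""
    return sum(1 for value in peak_intensities if value > threshold)


def plateau_threshold(peak_intensities, thresholds, window=3):
    """Pick a threshold from the flattest part of the spots-vs-threshold curve.

    For every run of `window` threshold steps that still retains spots, measure
    how many spots are lost across the run and keep the run with the smallest
    loss. Returns (chosen threshold, spot counts at all thresholds).
    """
    counts = [count_spots(peak_intensities, t) for t in thresholds]
    best_start, best_drop = None, None
    for start in range(len(thresholds) - window):
        end = start + window
        if counts[end] == 0:
            continue  # ignore the trivial plateau where no spots survive
        drop = counts[start] - counts[end]
        if best_drop is None or drop < best_drop:
            best_start, best_drop = start, drop
    if best_start is None:
        raise ValueError("no threshold window retains any spots")
    return thresholds[best_start + window // 2], counts


# Toy peak intensities: many dim background maxima plus 12 bright spots.
background = list(range(1, 21)) * 2
true_spots = [80, 85, 88, 90, 95, 97, 100, 105, 110, 112, 115, 120]
peaks = background + true_spots

thresholds = list(range(0, 160, 10))
chosen, counts = plateau_threshold(peaks, thresholds)
print("chosen threshold:", chosen, "spots:", count_spots(peaks, chosen))
```

The design choice here is to prefer the first, lowest-threshold plateau, which keeps dim but genuine spots; on real images the plateau is rarely perfectly flat and the window size would need tuning per gene.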

3.3.3 siRNA transfection and cell culturing

Transfections and cell culturing were carried out as described in Chapter 2.

Chapter 4

MicroRNA-mediated control of protein expression noise

4.1 Background

MicroRNAs regulate a large number of genes in metazoan organisms (Friedman et al., 2009; Lewis et al., 2005; John et al., 2004; Lee et al., 1993; Wightman et al., 1993; Enright et al., 2003) by accelerating mRNA degradation and inhibiting translation (Guo et al., 2010; Lim et al., 2005). Although the physiological function of some microRNAs is known in detail (Lee et al., 1993; Wightman et al., 1993; Brennecke et al., 2003; Johnston and Hobert, 2003), it is not clear why microRNA regulation is so ubiquitous and conserved, since individual microRNAs only weakly repress the vast majority of their target genes (Baek et al., 2008; Selbach et al., 2008) and knockouts rarely result in mutant phenotypes (Miska et al., 2007). One proposed reason for this widespread regulation is the ability of microRNAs to provide robustness to gene expression (Bartel and Chen, 2004; Hornstein and Shomron, 2006), e.g. by buffering stochastic variability in gene expression (Ebert and Sharp, 2012).

¹ This chapter has been adapted from a paper entitled "MicroRNA control of protein expression noise" that has been published (Science 3 April 2015: 128-132) with lead author Jörn Schmiedel. My contribution was to aid in experimental design and in writing an earlier version of the final paper.

In this work we use mathematical modeling and single cell reporter assays to show that microRNAs - in conjunction with increased transcription - decrease protein expression noise for lowly expressed genes, but increase noise for highly expressed genes. Genes that are regulated by multiple microRNAs show more pronounced noise reduction. We estimate that hundreds of (lowly expressed) genes in mouse embryonic stem cells have reduced noise due to substantial microRNA regulation. Our findings therefore suggest that microRNAs confer precision to protein expression and thus offer plausible explanations for the commonly observed combinatorial targeting of endogenous genes by multiple microRNAs as well as the preferential targeting of lowly expressed genes.

Gene expression is inherently variable due to the stochasticity of all molecular reactions (Raj et al., 2006; see Figure 4.1a). Noise in the expression of a gene is thought to mainly originate from transcriptional dynamics (Blake et al., 2003; Raj, Peskin, et al., 2006), low numbers of mRNA molecules (Ozbudak et al., 2002; Bar-Even et al., 2006) or fluctuations that propagate to the gene from external sources, such as varying numbers of transcription factors or ribosomes (Pedraza and van Oudenaarden, 2005; Paulsson, 2004). Previous work has hypothesized that microRNAs should be able to reduce gene expression noise when their repressive post-transcriptional effects are antagonized by accelerated transcriptional dynamics (Ebert and Sharp, 2012; Noorbakhsh et al., 2013). However, this has not been shown experimentally, and since microRNA levels themselves are variable, the propagation of their fluctuations should theoretically contribute additional gene expression noise.

4.2 Effects of microRNAs on gene expression noise

To explore the effects of endogenous microRNAs on protein expression noise, we adapted a single-cell plasmid reporter system (Mukherji et al., 2011) to measure microRNA-dependent expression fluctuations in mouse embryonic stem cells (mESC). The plasmid contains two genes that encode fluorescent proteins (ZsGreen and mCherry), which are transcribed from a common bi-directional promoter (Figure 4.1b).


[Figure 4.1, panels A-E. Axes: ZsGreen intensity [a.u.]; mCherry intensity [a.u.] (in one ZsGreen bin); mCherry intensity mean [a.u.]. Legend: no 3'UTR; one bulged miR-20a site; one perfect miR-20a site; four bulged miR-20a sites.]

Figure 4.1 | Opposing noise effects of microRNA regulation at low and high gene expression. (a) Model scheme for the expression of a microRNA-regulated gene. The microRNA can reversibly bind the mRNA (not depicted) to inhibit its translation and decrease its stability. If the mRNA is degraded in the mRNA-microRNA complex, the microRNA is recycled. Noise in gene expression originates from the stochasticity of molecular reactions (intrinsic noise; jagged reaction arrows), or variability in the cellular machinery (extrinsic noise; external factors with fluctuating levels). (b) The plasmid reporter system. The plasmid carries a pTRE-Tight bi-directional promoter from which ZsGreen and mCherry are transcribed. The mCherry 3'UTR can be modified to contain no microRNA binding sites or a chosen number and type of sites. (c) Overlay of two flow cytometry measurements of mouse embryonic stem cells transiently transfected with two different variants of the reporter system, one with no mCherry 3'UTR (black) and the other with four bulged miR-20a binding sites in the mCherry 3'UTR (blue). For further processing we binned cells according to ZsGreen intensity (red vertical lines) and discarded cells in ZsGreen background (grey) (see Appendix C, Methods). a.u.: arbitrary units.


Figure 4.1 (continued) | Opposing noise effects of microRNA regulation at low and high gene expression. (d) Example of mCherry intensity distributions in one ZsGreen bin. In each bin we calculate the mean and noise, defined as the coefficient of variation (standard deviation divided by mean), of mCherry intensity distributions. (e) Noise of mCherry intensity as a function of mean mCherry intensity in each bin for three different miR-20a regulated constructs (blue) compared to respective unregulated constructs (black). Panels are ordered from left to right according to increasing repression of constructs by miR-20a (cf. Figure C.1). Dots and error bars represent data mean and bootstrapped standard deviation, respectively. Dashed lines and patches represent optimal model fit and 95% confidence interval, respectively.

To probe the effect of microRNAs, we constructed variants of the plasmid with different numbers and types of microRNA binding sites in the 3'UTR of the mCherry gene. We transfected plasmids into mESCs and quantified single-cell fluorescence two days later using a flow cytometer (Figure 4.1c). We used ZsGreen fluorescence intensity to bin cells with similar transcriptional activity (e.g. due to varying plasmid copy numbers), and in each bin we calculated the mean and noise of mCherry intensities over all cells in the bin (Figure 4.1d; see Appendix C, Materials and Methods and Supplementary Note). We define noise as the standard deviation of the protein expression distribution divided by its mean, which is an intuitive measure of the relative size of expression fluctuations.
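The binning procedure described above can be sketched in a few lines. This is an illustrative outline rather than the analysis code used for the figures: the function name, the log-spaced bin edges, the minimum cell count per bin and the synthetic data are all assumptions, and steps such as background subtraction and gating (Appendix C) are omitted.

```python
import random
import statistics


def noise_per_bin(zsgreen, mcherry, bin_edges, min_cells=50):
    """Return one (mean, cv, n_cells) tuple per sufficiently populated ZsGreen bin."""
    rows = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [m for z, m in zip(zsgreen, mcherry) if low <= z < high]
        if len(in_bin) < min_cells:
            continue  # too few cells for a stable estimate
        mean = statistics.fmean(in_bin)
        cv = statistics.pstdev(in_bin) / mean  # noise = standard deviation / mean
        rows.append((mean, cv, len(in_bin)))
    return rows


# Synthetic cells (made-up numbers, not thesis data): ZsGreen spans three
# decades and mCherry tracks it with log-normal scatter.
rng = random.Random(0)
zsgreen = [10 ** rng.uniform(2, 5) for _ in range(5000)]
mcherry = [z * rng.lognormvariate(0.0, 0.3) for z in zsgreen]

edges = [10 ** (2 + 0.25 * k) for k in range(13)]  # 12 log-spaced bins
rows = noise_per_bin(zsgreen, mcherry, edges)
for mean, cv, n_cells in rows:
    print(f"mean={mean:10.1f}  noise={cv:.2f}  cells={n_cells}")
```

Plotting the noise against the mean for a regulated and an unregulated construct, bin by bin, gives curves of the kind shown in Figure 4.1e. Narrower bins reduce the contribution of within-bin ZsGreen spread to the measured noise at the cost of fewer cells per bin.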

We started by assessing the effects of miR-20a, a microRNA endogenously expressed in mESC, on mCherry protein expression noise (Figure 4.1e). In cells with low mCherry expression, miR-20a regulation reduces noise compared to an unregulated control. In contrast, in cells with high mCherry expression, miR-20a regulation increases noise. These changes in mCherry noise are more pronounced for reporters where miR-20a repression of mCherry protein is stronger, e.g. when using perfect and multiple target sites (Figure 4.1f,g and Figure C.1).

We utilized a mathematical model in order to understand these opposing effects of microRNA regulation on protein expression noise (see Appendix C, Supplementary Model). In this work, we adopt the commonly used decomposition of total noise $\eta_{tot}$ into intrinsic noise $\eta_{int}$ and extrinsic noise $\eta_{ext}$ (Elowitz, Levine, et al., 2002; Swain et al., 2002). Here, the squared total noise is the sum of the squared intrinsic and extrinsic noise components:

$$\eta_{tot}^2 = \eta_{int}^2 + \eta_{ext}^2 \qquad (4.1)$$

[Figure 4.2, panels A-C. Panel titles: intrinsic noise, extrinsic noise, total noise. x-axis: protein expression [a.u.]. Legend: no mRNA-miRNA interaction; interaction.]

Figure 4.2 | Noise model predictions for a microRNA regulated gene. (a) Intrinsic noise due to low molecule numbers declines with increasing expression. MicroRNA regulation reduces intrinsic noise as a function of repression, due to the higher mRNA numbers necessary and the dampened propagation of noise from the mRNA to the protein level. (b) MicroRNA regulation results in additional extrinsic noise due to fluctuations in the microRNA pool that are propagated to the target gene, dependent on conferred repression and saturation of the microRNA pool (cf. Figure C.2). (c) The net influence of microRNA regulation is decreased total noise at low and increased total noise at high expression levels.

Intrinsic noise stems from the reactions internal to the expression of the gene and is dominated by transcriptional dynamics and low mRNA copy numbers. Extrinsic noise stems from fluctuations propagating from external factors to the gene (Figure 4.2a). The modeling results in two key predictions. Firstly, the model predicts that a microRNA-regulated gene (reg) has reduced intrinsic noise compared to an unregulated gene (unreg) at equal protein expression levels; the size of intrinsic noise reduction is approximately equal to the square root of microRNA-mediated fold-repression $r$ (Figure 4.2a):

$$\eta_{int}^{reg} \approx \frac{\eta_{int}^{unreg}}{\sqrt{r}} \qquad (4.2)$$

The model predicts that the effect and its size are independent of the mode of repression, since translational inhibition requires higher mRNA levels and therefore reduces intrinsic noise resulting from low mRNA copy numbers, while accelerated mRNA degradation dampens the propagation of noise from the mRNA to the protein level (see Appendix C, Supplementary Note 1; Ebert et al., 2012; Pedraza et al., 2005; Fraser et al., 2004). To achieve equal protein expression given increased mRNA turnover, there must be increased transcription rates. Reduction of intrinsic noise can therefore be understood as the combined effect of microRNA-mediated accelerated turnover and increased transcriptional activity (Ebert and Sharp, 2012). Secondly, the model predicts (Figure 4.2b and Figure C.2) that microRNA regulation acts as an additional extrinsic noise source given by

$$\eta_{ext} = \phi \, \eta_{\mu} \qquad (4.3)$$

where $\eta_{\mu}$ denotes the noise in the pool of regulating microRNAs (see Appendix C, Supplementary Model), and $\phi$ is the microRNA repression (see Figure C.2). The combined effects of decreased intrinsic and additional extrinsic noise result in decreased total noise at

low expression, but increased total noise at high expression (Figure 4.2c); and model fits, with the microRNA pool noise as the only free parameter, yield accurate agreement with the experimentally observed total noise profiles (Figure 4.1e-g).

To distinguish the effects of microRNA regulation on intrinsic and extrinsic noise experimentally, we modified our plasmid reporter system such that both ZsGreen and mCherry are regulated by miR-20a through identical 3'UTRs (Figure 4.3a and Figure C.3a). As a result of this design, both fluorescent reporters share the same regulatory inputs and cellular environment, and intracellular differences in their expression can only result from processes inherent to each gene, i.e. the processes that create intrinsic noise (Elowitz, Levine, et al., 2002; Swain et al., 2002). Results from this experimental design show that miR-20a regulation reduces intrinsic noise compared to an unregulated construct (Figure 4.3b and Figure C.3b). As predicted by our model, the intrinsic noise is reduced by the square root of fold-repression conferred by miR-20a (Figure 4.3c; see also Figure C.3d), confirming our results reported in Figure 4.1c. These results further imply that the observed increase in total noise at high mCherry expression must be due to additional extrinsic noise (Figure C.3c).
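For readers unfamiliar with the dual-reporter approach, the sketch below shows how intrinsic and extrinsic noise can be separated from paired measurements, using the standard estimators of Elowitz et al. (2002). It is an illustration only: whether the thesis analysis applies exactly these estimators, and how the two channels are normalized beforehand, is described in Appendix C and is not assumed here. The simulated cells and all parameter values are made up.

```python
import math
import random


def dual_reporter_noise(green, red):
    """Squared intrinsic, extrinsic and total noise from paired reporter values.

    Uses the estimators of Elowitz et al. (2002); both channels should be
    scaled to comparable means before calling.
    """
    n = len(green)
    mean_g, mean_r = sum(green) / n, sum(red) / n
    norm = mean_g * mean_r
    int_sq = sum((g - r) ** 2 for g, r in zip(green, red)) / n / (2 * norm)
    ext_sq = (sum(g * r for g, r in zip(green, red)) / n - norm) / norm
    tot_sq = (sum(g * g + r * r for g, r in zip(green, red)) / n - 2 * norm) / (2 * norm)
    return int_sq, ext_sq, tot_sq


# Simulated cells: a shared (extrinsic) factor multiplies both reporters,
# and each reporter has its own independent (intrinsic) scatter.
rng = random.Random(1)
green, red = [], []
for _ in range(20000):
    shared = rng.lognormvariate(0.0, 0.3)
    green.append(1000 * shared * (1 + rng.gauss(0.0, 0.1)))
    red.append(1000 * shared * (1 + rng.gauss(0.0, 0.1)))

int_sq, ext_sq, tot_sq = dual_reporter_noise(green, red)
print(f"intrinsic={math.sqrt(int_sq):.3f}  extrinsic={math.sqrt(ext_sq):.3f}  "
      f"total={math.sqrt(tot_sq):.3f}")
```

By construction the three estimates satisfy the decomposition in Equation 4.1, so the total noise printed here equals the square root of the sum of the squared intrinsic and extrinsic terms.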


[Figure 4.3, panels A-C. Axes: mean mCherry + ZsGreen intensity [a.u.]; sqrt(fold-repression). Legend: no 3'UTR; 1x bulged miR-20a; 1x perfect miR-20a; 4x bulged miR-20a.]

Figure 4.3 | microRNA-mediated intrinsic noise effects. (a) Modified plasmid reporter system where ZsGreen and mCherry have identical 3'UTRs, which allows quantification of expression-dependent intrinsic noise. (b) Intrinsic noise as a function of mean ZsGreen + mCherry intensities in each bin, showing that microRNA regulation reduces intrinsic noise. Dots and error bars represent data mean and bootstrapped standard deviation, respectively. Dashed lines and patches represent optimal model fit and 95% confidence interval, respectively. (c) Measured intrinsic noise reduction for bi-regulated constructs is proportional to the square root of fold-repression, as measured independently by mCherry-regulated constructs (cf. Figure C.1). Error bars indicate standard deviation of three biological replicates.

In summary, our data show that miR-20a regulation reduces intrinsic noise while it increases extrinsic noise of target genes, resulting in lower total noise at low expression but increased total noise at high expression levels.
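The crossover from reduced to increased total noise can be reproduced with a toy version of the model. The sketch below is a hedged illustration, not the model fitted in this chapter: it assumes that unregulated intrinsic noise scales as sqrt(b / p) with protein expression p, that baseline extrinsic noise is a constant, that regulation divides intrinsic noise by sqrt(r), and that the microRNA pool adds an extrinsic term equal to the pool noise times a fixed transmission factor, combined in quadrature. The parameter names and values (b, eta_ext0, phi, eta_mu) are illustrative assumptions; in the full model the transmission factor depends on repression and saturation (Figure C.2).

```python
import math


def total_noise(p, r=1.0, eta_mu=0.0, phi=0.0, b=50.0, eta_ext0=0.3):
    """Total noise at protein expression p for a toy noise model.

    r       fold-repression (1.0 means unregulated)
    eta_mu  noise of the microRNA pool
    phi     assumed factor transmitting pool noise to the target
    b       assumed scale of intrinsic noise, eta_int^2 = b / p when unregulated
    """
    eta_int = math.sqrt(b / p) / math.sqrt(r)        # intrinsic noise reduced by sqrt(r)
    eta_ext_sq = eta_ext0 ** 2 + (phi * eta_mu) ** 2  # extra extrinsic noise from the pool
    return math.sqrt(eta_int ** 2 + eta_ext_sq)       # squared components add


def crossover_expression(r, eta_mu, phi, b=50.0):
    """Expression level at which regulated and unregulated total noise are equal."""
    return b * (1.0 - 1.0 / r) / (phi * eta_mu) ** 2


regulated = dict(r=4.0, eta_mu=0.4, phi=0.75)
p_star = crossover_expression(**regulated)
for p in (1e2, p_star, 1e4):
    print(f"p={p:8.1f}  unregulated={total_noise(p):.3f}  "
          f"regulated={total_noise(p, **regulated):.3f}")
```

With these illustrative numbers the regulated gene is less noisy below the crossover and noisier above it, mirroring the qualitative behaviour in Figure 4.2c. Lower pool noise or stronger repression pushes the crossover to higher expression.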

Our analyses so far suggest that the reduction of intrinsic noise is a generic property of microRNAs as post-transcriptional repressors of protein expression, and therefore noise reduction should occur irrespective of the specific microRNAs or the molecular details of the mRNA-microRNA interaction. In contrast, additional extrinsic noise stems from the variability of the microRNA pools and should therefore depend on the specific microRNA. To investigate these hypotheses, we constructed reporters with binding sites for eight additional microRNAs that are endogenously expressed in mESC over a wide range (Figure C.4). Since the molecular details of mRNA-microRNA interactions do not affect microRNA-mediated noise effects, we chose perfect target sites to allow for high specificity with respect to the regulating microRNA pool and to optimize measurement signals. The data from all eight reporters consistently show intrinsic noise reduction as large as the square root of fold-repression (Figure C.3e), and we additionally confirmed this by directly measuring intrinsic noise reduction for miR-291a (cf. Figure 4.3c). We furthermore found that AU-rich elements, which induce post-transcriptional repression of protein expression due to binding of various co-factors (Barreau et al., 2005), also reduce intrinsic noise by the square root of fold-repression (Figure C.3f). These data therefore support the hypothesis that reduction of intrinsic noise is a generic property of microRNAs as post-transcriptional repressors that is independent of the specific identity of the regulating microRNA.

Next we used our mathematical model to extract the microRNA pool noise from the fits to the experimental data. We find that microRNA pool noise differs across all assayed microRNAs (Figure 4.4a), while estimates of microRNA pool noise for different constructs assaying the same microRNA are similar (Figure C.7), validating that our model fits can faithfully estimate microRNA pool noise. Although microRNA pool noise decreases for microRNAs that repress the reporters more strongly, it is still substantial even for the most highly expressed microRNAs in mESC (miR-290 cluster, including miR-290, miR-291a, miR-295; Marson et al., 2008). Interestingly, we find that the subset of assayed microRNAs with two independent gene copies producing the identical mature microRNA (Figure 4.4a, marked in red) tends to have lower microRNA pool noise compared to microRNAs that confer similar repression but only have one gene copy (Figure 4.4a, marked in black).


[Figure 4.4, panels A-F. Axes: fold-repression; mCherry mRNA levels [RPKM]; mRNA levels [RPKM]; percentage of genes expressed below (25, 50, 75, 90, 95, 99); mCherry intensity mean [a.u.]. Panel labels: mESC transcriptome; Wee1 3'UTR (mut, wt); Lats2 3'UTR (mut, wt); Casp2 3'UTR; Rbl2 3'UTR. Legend: crossover; endogenous expression; max. possible noise reduction.]

Figure 4.4 | Estimation of microRNA pool noise and noise effects for endogenous genes. (a) MicroRNA pool noise estimates from reporters with perfect target sites for nine different microRNAs endogenously expressed in mESC. Subsets of microRNAs with one (black) or two gene copies (red) show negative scaling of pool noise with conferred fold-repression, with the latter subset having lower noise levels.


Figure 4.4 (continued) | Estimation of microRNA pool noise and noise effects for endogenous genes. (b) MicroRNA pool noise estimates of individual pools of miR-16, miR-20a and miR-290 compared to mixed pools of miR-16/miR-20a and miR-20a/miR-290, as determined from a reporter regulated by two perfect target sites for the respective microRNA species. Red bars in columns for mixed pools show expected microRNA pool noise when individual microRNA sub-pools were fully correlated. (c) Total noise levels for the 3'UTR of the cell cycle regulator Wee1, wild-type (blue) and microRNA binding sites point-mutated (black) versions. (d) Total noise levels for the 3'UTR of the tumor suppressor Lats2, wild-type (blue) and microRNA binding sites point-mutated (black) versions. (e) Mapping fluorescent reporter levels to the transcriptome of mESC. (Upper panel) FACS sorting and least-squares regression were used to determine the conversion between mean mCherry fluorescence intensities and mCherry mRNA levels (as measured by RNA-seq). (Lower panel) The range covered by the fluorescent reporter system in relation to the transcriptome expression (n = 13751) in mESC (25% to ~99% of transcriptome expression). (f) Relative microRNA-mediated effects on total noise in assayed endogenous 3'UTRs compared to their point-mutated 3'UTR versions as a function of transcriptome expression. Blue line and area represent model-based extrapolation of noise effects to transcriptome expression (mean and 95% confidence interval based on parameter estimates of n=3 measurements). Black dots indicate crossover from reduced to increased total noise. Red dots indicate endogenous transcriptome expression of the respective gene in mESC. Red dashed lines indicate maximally expected reduction of total noise given the observed repression. Error bars in (a) & (b) indicate standard deviation of at least three biological replicates. In (c) & (d) dots and error bars represent data mean and bootstrapped standard deviation, respectively. Dashed lines and patches represent optimal model fit and 95% confidence interval.

This suggests that microRNA pools could have lower noise if they consist of independently transcribed microRNAs. We reasoned that these findings should extend to genes that are regulated by different microRNAs, where uncorrelated fluctuations between the different microRNAs can average out, resulting in lower noise of the overall pool. To test this hypothesis, we constructed reporters with a perfect target site for miR-20a and an additional perfect target site for either miR-16 or miR-290 in the mCherry 3'UTR and compared them to reporters with two perfect target sites for miR-16, miR-20a or miR-290, respectively.

When estimating microRNA pool noise from the total noise profiles (Figure C.8) we find that the noise levels in the mixed pools are lower than expected if the individual microRNA pools were fully correlated (see Appendix C, Methods) and can be lower than the noise in the individual microRNA pools (Figure 4.4b). Taken together these experiments show that although microRNA regulation increases extrinsic protein expression noise, mixed pools of microRNAs can attenuate this effect.
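The averaging argument can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the effective pool seen by a reporter is the simple sum of the sub-pools and that all pairs of sub-pools share one correlation coefficient; the actual expectation for fully correlated pools used in Figure 4.4b is defined in Appendix C and may weight the sub-pools differently. The function name and numbers are hypothetical.

```python
import math


def mixed_pool_noise(means, cvs, rho):
    """Noise (CV) of a summed pool whose sub-pools have pairwise correlation rho."""
    sds = [mean * cv for mean, cv in zip(means, cvs)]
    variance = sum(sd * sd for sd in sds)
    for i in range(len(sds)):
        for j in range(i + 1, len(sds)):
            variance += 2.0 * rho * sds[i] * sds[j]  # covariance between sub-pools
    return math.sqrt(variance) / sum(means)


# Two sub-pools of equal abundance, each with noise 0.4 (illustrative values).
means, cvs = [100.0, 100.0], [0.4, 0.4]
fully_correlated = mixed_pool_noise(means, cvs, rho=1.0)
independent = mixed_pool_noise(means, cvs, rho=0.0)
print(f"fully correlated: {fully_correlated:.3f}   independent: {independent:.3f}")
```

With full correlation the mixed pool is exactly as noisy as either sub-pool, whereas independent sub-pools of equal size reduce the noise by a factor of sqrt(2). Measured mixed-pool noise that falls below the fully correlated expectation therefore indicates that the sub-pools fluctuate at least partly independently.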


So far we investigated microRNA-mediated noise effects using nearly or fully complementary microRNA binding sites in an artificial 3'UTR setting. Endogenous microRNA targets, however, often harbor many binding sites, albeit with less complementarity, for different microRNAs in their 3'UTRs (Friedman et al., 2009; Enright et al., 2003; Krek et al., 2005; Stark et al., 2005). To test if our findings extend also to those situations, we constructed four mCherry reporters with the 3'UTRs for the genes Wee1, Lats2, Casp2 and Rbl2, which all have multiple binding sites for different microRNAs endogenously expressed in mESC. We then compared protein expression noise for constructs with the wild-type 3'UTRs to versions with point-mutated microRNA binding sites (see Appendix C, Methods). The microRNAs together confer between 3- and 5.5-fold repression for the wild-type 3'UTRs compared to the point-mutated 3'UTRs (Figure C.9a). For all wild-type 3'UTRs we observed reduced total noise at low and intermediate expression compared to the mutated 3'UTRs (Figure 4.4c,d and Figure C.9a). As observed for the artificial 3'UTR constructs, intrinsic noise for the wild-type 3'UTR constructs is reduced by the square root of fold-repression (Figure C.3g), indicating that our previous findings on the reduction of intrinsic noise can be extrapolated to endogenous microRNA targets. Interestingly, total noise is hardly increased at high expression levels, and the estimated noise levels for the mixed microRNA pools regulating the endogenous 3'UTRs are very low compared to the noise levels estimated for single microRNA pools (Figure C.9b), consistent with the findings above that mixing of different microRNA species results in lowered microRNA pool noise.

Finally, we determined whether the expression range covered by our reporter assay covers relevant expression levels of endogenous genes. We collected cells at different mCherry fluorescence intensities using fluorescence-activated cell sorting, and measured mCherry mRNA levels in conjunction with the whole transcriptome using mRNA sequencing (see Appendix C, Methods; Figure C.10a). We find that our reporter assay covers the range of 25% to 99% (~1 RPKM to ~500 RPKM) of expressed genes in mESC (Figure 4.4e), indicating that the noise effects observed in our reporter assay are relevant to endogenous genes. For all four 3'UTRs that we assayed with our reporter, reduction of total noise extends in a graded fashion up to the top 10% of the transcriptome expression distribution (Figure 4.4f).
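The conversion from reporter fluorescence to mRNA levels can be sketched as an ordinary least-squares fit. The example below assumes that the regression is carried out in log-log space, which is a natural choice for data spanning several decades but is an assumption here rather than a documented detail of the analysis. The function names and the calibration points are made up for illustration.

```python
import math


def fit_loglog(fluorescence, rpkm):
    """Least-squares line through (log10 fluorescence, log10 RPKM) points."""
    xs = [math.log10(f) for f in fluorescence]
    ys = [math.log10(v) for v in rpkm]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fluorescence_to_rpkm(f, slope, intercept):
    """Map a mean fluorescence intensity to an estimated mRNA level in RPKM."""
    return 10 ** (intercept + slope * math.log10(f))


# Hypothetical calibration: mean mCherry intensity of sorted fractions and the
# mCherry RPKM measured in each fraction (made-up values).
sorted_fraction_fluorescence = [1e2, 1e3, 1e4]
measured_rpkm = [1.0, 10.0, 100.0]

slope, intercept = fit_loglog(sorted_fraction_fluorescence, measured_rpkm)
estimate = fluorescence_to_rpkm(3e3, slope, intercept)
print(f"slope={slope:.2f}  intercept={intercept:.2f}  RPKM at 3e3 a.u. ~ {estimate:.1f}")
```

Once such a calibration is in hand, the fluorescence range of the reporter can be placed on the RPKM axis and compared with the cumulative distribution of endogenous expression, as in Figure 4.4e.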


While most microRNAs individually repress genes only to a small extent (Baek et al., 2008; Selbach et al., 2008), we find that hundreds of genes are substantially repressed (>2-fold) by the combinatorial action of microRNAs in mESC (Figure C.11), as determined from data comparing transcriptome expression between wild-type and microRNA-deficient Dicer knockout mESC (Leung et al., 2011). Furthermore, most of the highly repressed genes have low expression levels (see Figure C.11; Stark et al., 2005; Farh et al., 2005; Sood et al., 2006), suggesting that these genes should have reduced protein expression noise as a consequence of microRNA regulation.

4.3 Conclusions

Genome-scale analysis of microRNA binding data (Farh et al., 2005; Sood et al., 2006) has shown that microRNAs preferentially target lowly expressed genes that are dominated by intrinsic noise, while selectively avoiding ubiquitous and highly expressed genes that are more sensitive to extrinsic fluctuations. Our integrated theoretical and experimental approach has shown that microRNAs reduce intrinsic noise while increasing extrinsic noise.

Together, these results suggest that a common effect of microRNAs is to reduce gene expression noise. Our work has further shown that combinatorial microRNA regulation, a widely observed phenomenon in vivo (Friedman et al., 2009; Enright et al., 2003; Krek et al., 2005; Stark et al., 2005), enhances overall noise reduction by amplifying repression and buffering stochastic fluctuations in the abundance of single microRNAs. Combinatorial microRNA regulation may thus be a potent mechanism to reinforce cellular identity by reducing gene expression fluctuations that are undesirable for the cell.

The principle established in this work is that fluctuations in protein abundance can be effectively regulated at the level of transcription. Here, we have focused on the capacity of microRNAs to regulate gene expression noise; however, any translationally invariant mechanism that decreases the timescale of mRNA fluctuations will, in principle, produce a similar effect. This conceptual perspective provides a foundation for studying a broad range of transcriptional regulators as alternative instruments for controlling protein noise.

Chapter 5

Conclusions and Future Directions

It is now well established that miRNAs play an important role in gene regulation through either translational repression or mRNA degradation. Because they can target many different mRNA species, their impact may be even more extensive. In this thesis we have investigated the ceRNA hypothesis, which proposes a new layer of post-transcriptional gene regulation mediated by the titration of common miRNAs by competing targets. This RNA-RNA crosstalk effect is a subject of intense activity and indeed controversy. Indeed, it is difficult to imagine that perturbing the expression of individual miRNA targets, which make up only a small part of the total number of binding sites in the cell, could possibly influence enough miRNA to significantly change the repression of other targets. The focus of this work has been to interrogate key questions about the ceRNA mechanism: its generality, its dependence on shared miRNAs, and the size of the effect. We aimed to answer these questions by integrating three kinds of experiments: a) perturbing the levels of 3 known ceRNAs and systematically searching for miRNA-mediated crosstalk effects on the transcriptome; b) modulating the levels of binding sites in the cell by over-expressing an endogenous PTEN 3'UTR and sorting cells carrying specific amounts of the PTEN 3'UTR to isolate dose-dependent crosstalk effects; and c) quantifying the expression and spatial localization of ceRNAs in single cells.

While initial studies of the ceRNA hypothesis were restricted to a few computationally predicted ceRNAs, our results show that an appreciable crosstalk effect exists quite pervasively across the genome, i.e., the levels of hundreds of genes, across all expression scales, appear co-regulated along with the perturbed sender. By carefully selecting genes whose crosstalk was lowered in a miRNA-deficient control, we could ascertain that the effect was miRNA-mediated. More specifically, the size of the crosstalk effect can be correlated to the number of shared miRNAs and the quality of miRNA binding sites in the receiver genes. Thus both the overlap of miRNA binding sites and the affinity of those miRNAs are important determinants of crosstalk. In the case of VAPA, PTEN and CNOT6L, we found that shared miRNA binding sites made their interactions reciprocal: perturbations in each caused changes in the other. These findings suggest that combinatorial miRNA targeting could be a mechanism that cells use to concordantly shape the expression of an entire class of genes which may be functional in similar pathways or need to be expressed stoichiometrically.

[Figure 5.1 plot. Legible labels: binding equation; binding or unbinding; x-axis: free miRNA concentration F (units of Kd).]

Figure 5.1 | Colocalization of ceRNAs can enhance crosstalk by increasing their local concentrations, thereby promoting miRNA association between ceRNAs, as free miRNAs are more likely to bind to nearby mRNAs than to other targets (adapted from Jens 2015).

The size of the ceRNA effect was bounded by 1 for all receivers, for each of the three senders. That is, the fold change in a receiver was always lower than the fold change in the perturbed sender. The existence of a hard bound emerged naturally from our minimal ceRNA model because each receiver is only weakly repressed by a miRNA and each sender sequesters only a fraction of the total miRNA pool. However, the moderate crosstalk strength we measured for many genes was still larger than predicted by our minimal miRNA-ceRNA model and by other steady-state models of target competition. To examine this discrepancy in more detail, we used smFISH to measure the intracellular concentrations of these molecules at the molecular level and, surprisingly, found that different ceRNA species colocalize with each other. Thus, we hypothesize that the strong crosstalk for PTEN, VAPA and CNOT6L (between 0.2 and 0.5), and possibly other ceRNAs, might be explained by localization. Effectively, localization renders the available pool of interacting binding sites much smaller than the total, amplifying crosstalk between select ceRNAs. Put another way, colocalization of bound ceRNAs increases their local concentrations, making it more likely that a miRNA dynamically unbinding from one ceRNA will bind to another nearby ceRNA. We are currently working on extending our minimal model to take localization effects into account. Experimentally, one can apply new multiplexed smFISH technologies to search for colocalization between multiple ceRNAs (Lee 2014). Though more challenging, recently developed technologies to visualize the sub-cellular localization of miRNAs (Pitchiaya 2014) make it possible to probe the miRNAs we have identified and to search for spatial colocalization between ceRNA-miRNA pairs.

Genome-wide ceRNA studies have measured expression changes in population averages of cells after perturbing either the number of targets or the miRNAs, and have found crosstalk effects to be small. We think miRNA-mediated crosstalk effects are more visible in unperturbed single cells. As we found in Chapter 4, the pools of different miRNA families can themselves be quite noisy and propagate noise to their target proteins. Our work suggests that both miRNA pool noise and miRNA coupling between ceRNAs act as mechanisms that suppress the independent fluctuations of ceRNAs, leading to more correlated and even stoichiometric expression of genes in single cells. However, we caution that we only demonstrated this effect in fixed cells, by observing differences in ceRNA correlations between HCT 116 and miRNA-deficient DICER cells. Future studies could track ceRNA levels dynamically in single cells after antagonizing specific miRNAs to truly isolate which miRNAs are responsible for reduced fluctuations. Measuring correlations or noise would be a more sensitive measure of miRNA-induced interactions between ceRNAs than perturbing individual ceRNAs.

References

Ebert, M.S., Neilson, J.R., and Sharp, P.A. (2007). MicroRNA sponges: competitive inhibitors of small RNAs in mammalian cells. Nat. Methods 4, 721-726.

Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.

Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63.

Bartel DP. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-297.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Post-transcriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.

Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., and Horvitz, H.R. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.

Cai S, Han HJ, Kohwi-Shigematsu T. (2003). Tissue-specific nuclear architecture and gene expression regulated by SATB1. Nat Genet 34(1): 42-51.

Hansen, T.B., Kjems, J., and Damgaard, C.K. (2013). Circular RNA and miR-7 in cancer. Cancer Res. 73(18), 5609-5612.

Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI, Rubio-Somoza I, Leyva A, Weigel D, Garcia JA, Paz-Ares J. (2007). Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet 39: 1033-1037.

Seitz, H. (2009). Redefining microRNA targets. Curr. Biol. 19, 870-873.

Mayr, C., and Bartel, D.P. (2009). Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673-684.

Poliseno, L., Salmena, L., Zhang, J., Carver, B., Haveman, W.J., and Pandolfi, P.P. (2010). A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033-1038.

Cesana M, Cacchiarelli D, Legnini I, Santini T, Sthandier O, Chinappi M, Tramontano A, Bozzoni I. (2011). A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147: 358-369.

Lewis B, Shih I, et al. (2003). Prediction of mammalian microRNA targets. Cell 115(7): 787-798.

Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79.

Ebert, M.S., and Sharp, P.A. (2010). Emerging roles for natural microRNA sponges. Curr. Biol. 20, R858-R861.

Memczak S, et al. (2013). Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495: 333-338.

Brewster, R.C., Weinert, F.M., Garcia, H.G., Song, D., Rydenfelt, M., and Phillips, R. (2014). The transcription factor titration effect dictates level of gene expression. Cell 156, 1312-1323.

Buchler, N.E., and Louis, M. (2008). Molecular titration and ultrasensitivity in regulatory networks. J. Mol. Biol. 384, 1106-1119.

Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, Karreth F, Poliseno L, Provero P, Di Cunto F, Lieberman J, Rigoutsos I, Pandolfi PP. (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147, 344-357.

Karreth FA et al (2011) In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell, 147:382-395

Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J. (2011). An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147: 370-381.

Yi et al. (2008). A skin microRNA promotes differentiation by repressing 'stemness'. Nature 452, 225-229.

Sluijter, J.P.G. et al. (2010). MicroRNA-1 and -499 regulate differentiation and proliferation in human-derived cardiomyocyte progenitor cells. Arterioscler. Thromb. Vasc. Biol. 30, 859-868.

Cimmino, A. et al. (2005). miR-15 and miR-16 induce apoptosis by targeting Bcl2. Proc. Natl. Acad. Sci. USA 102, 13944-13949.

Jens M, Rajewsky N. (2015). Competition between target sites of regulators shapes post-transcriptional gene regulation. Nature Reviews Genetics 16, 113-126.

Nitzan M, et al. (2014). Interactions between distant ceRNAs in regulatory networks. Biophys. J. 106, 2254-2266.

Salmena, L., Poliseno, L., Tay, Y., Kats, L., and Pandolfi, P.P. (2011). A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell 146, 353-358.

Tay, Y., Rinn, J., and Pandolfi, P.P. (2014). The multilayered complexity of ceRNA crosstalk and competition. Nature 505, 344-352.

Figliuzzi, M., Marinari, E., and De Martino, A. (2013). MicroRNAs as a selective channel of communication between competing RNAs: a steady-state theory. Biophys. J. 104, 1203-1213.

Bosia C, Pagnani A, Zecchina R. (2013). Modelling competing endogenous RNA networks. PLoS ONE 8(6): e66609.

Denzler, R., Agarwal, V., Stefano, J., Bartel, D.P., and Stoffel, M. (2014). Assessing the ceRNA hypothesis with quantitative measurements of miRNA and target abundance. Mol. Cell 54, 766-776.

Cummins J.M. et al. (2006). The colorectal microRNAome. Proc. Natl. Acad. Sci. U.S.A. 103, 3687-3692.

Arvey A, Larsson E, Sander C, Leslie CS, Marks DS. (2010). Target mRNA abundance dilutes microRNA and siRNA activity. Mol Syst Biol 6: 363.

Li, H., and Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589-595.

Yan H, Choi AJ, Lee BH, Ting AH. (2011). Identification and functional analysis of epigenetically silenced microRNAs in colorectal cancer cells. PLoS One 6(6): e20628.

Garcia et al. (2011). Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nature Structural & Molecular Biology 18, 1139-1146.

Robinson M, Oshlack A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11: R25.

Falcon, S., and Gentleman, R. (2007). Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257-258.

Mukherji, S., Ebert, M.S., ..., van Oudenaarden, A. (2011). MicroRNAs can generate thresholds in target gene expression. Nat. Genet. 43: 854-859.

Levine E, McHale P, Levine H. (2007). Small regulatory RNAs may sharpen spatial expression patterns. PLoS Computational Biology 3(11): e233.

Yuan Y, Liu B, Xie P, Zhang MQ, Li Y, Xie Z, Wang X. (2015). Model-guided quantitative analysis of microRNA-mediated regulation on competing endogenous RNAs using a synthetic gene circuit. Proc Natl Acad Sci 112: 3158-3163.

Ala U, Karreth FA, Bosia C, Pagnani A, Taulli R, Leopold V, Tay Y, Provero P, Zecchina R, Pandolfi PP. (2013). Integrated transcriptional and competitive endogenous RNA networks are cross-regulated in permissive molecular environments. Proc Natl Acad Sci 110: 7154-7159.

R Core Team. (2011). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

Hausser J, Zavolan M. (2014). Identification and consequences of miRNA-target interactions- beyond repression of gene expression. Nat Rev Genet 15: 599-612.

Bosson AD, Zamudio JR, Sharp PA. (2014). Endogenous miRNA and target concentrations determine susceptibility to potential ceRNA competition. Mol Cell 56: 347-359.

Broderick JA, Zamore PD. (2014). Competitive endogenous RNAs cannot alter microRNA function in vivo. Mol Cell 54: 711-713.

Supplementary Note

Derivation of the mathematical model of microRNA regulation

In order to investigate the regulation of genes by microRNAs we build a kinetic model describing the expression of a gene that is regulated on the post-transcriptional level by a microRNA. We start with a previously published model of microRNA regulation [Mukherji et al., 2011] that we extend to include the competition of multiple mRNAs for the same microRNA regulation and the turnover of the microRNA (see later section). The model is an ordinary differential equation model that describes the temporal evolution of free mRNA levels $[m_i]$ as well as the levels of the complex between mRNAs and the microRNA $[m_i\mu]$ for an unlimited number of regulated genes. We denote different genes and the parameters associated with them by subscripts. We assume that mRNA is transcribed with the constant rate $v_i$ and constitutively degraded with a rate $d_i^{m} \cdot [m_i]$. The mRNA can bind to the free microRNA $\mu$ to reversibly form the complex $m_i\mu$, with the associated on-rate $k_i^{\mathrm{on}}$ and off-rate $k_i^{\mathrm{off}}$. When bound in the complex, the mRNA is degraded with the rate $d_i^{m\mu} \cdot [m_i\mu]$. By assuming mass-action kinetics, the ordinary differential equations for the free levels of an mRNA $m_i$ and the levels of the respective complex $m_i\mu$ can be written as

$$\frac{d[m_i]}{dt} = v_i - d_i^{m} \cdot [m_i] - k_i^{\mathrm{on}} \cdot [m_i] \cdot [\mu] + k_i^{\mathrm{off}} \cdot [m_i\mu], \tag{1}$$

$$\frac{d[m_i\mu]}{dt} = k_i^{\mathrm{on}} \cdot [m_i] \cdot [\mu] - k_i^{\mathrm{off}} \cdot [m_i\mu] - d_i^{m\mu} \cdot [m_i\mu]. \tag{2}$$

In the beginning we assume that the turnover of microRNAs is much slower, and therefore we treat the total microRNA concentration as constant. Consequently, the following conservation relation holds

$$[\mu^{T}] = [\mu] + \sum_{j=1}^{N} [m_j\mu], \tag{3}$$

where $[\mu^{T}]$, $[\mu]$ and $[m_j\mu]$ are the levels of total microRNA, free microRNA and all complexes formed by the microRNA with the regulated mRNAs $m_j$ with $j = 1, \dots, N$, respectively. Solving equation 2 for steady state, i.e. setting the time derivative of $[m_i\mu]$ equal to zero, we obtain

$$[m_i\mu] = \frac{[m_i] \cdot [\mu]}{K_i}, \tag{4}$$

where $K_i = (k_i^{\mathrm{off}} + d_i^{m\mu}) / k_i^{\mathrm{on}}$ is the dissociation constant of the mRNA-microRNA interaction. It follows from equation 4 that the concentrations of the complexes formed by two different mRNAs with the microRNA are related by

$$\frac{[m_x\mu]}{[m_y\mu]} = \frac{[m_x]}{[m_y]} \cdot \frac{K_y}{K_x}. \tag{5}$$

Using equations 3 and 5 we can solve the steady state of $[m_i\mu]$ as

$$[m_i\mu] = \frac{[m_i]}{K_i} \cdot \frac{[\mu^{T}]}{1 + w}. \tag{6}$$

Here we define the sum over all free levels of regulated mRNAs normalized by their respective dissociation constants

$$w = \sum_{j=1}^{N} \frac{[m_j]}{K_j} \tag{7}$$

as the microRNA workload. It follows from equations 4 and 6 that the inverse of one plus the microRNA workload is the fraction of free microRNA

$$\frac{[\mu]}{[\mu^{T}]} = \frac{1}{1 + w}. \tag{8}$$

The workload describes the sequestration of the microRNA by all regulated mRNAs and therefore captures the competition between co-regulated genes. With equation 6 we can write the steady state of the free mRNA levels implicitly as

$$[m_i] = \frac{v_i}{d_i^{m} + \dfrac{d_i^{m\mu} \cdot [\mu^{T}]}{K_i \cdot (1 + w)}} = \frac{[m_i^{0}]}{1 + \dfrac{d_i^{m\mu} \cdot [\mu^{T}]}{d_i^{m} \cdot K_i \cdot (1 + w)}}, \tag{9}$$

where we define

$$[m_i^{0}] = \frac{v_i}{d_i^{m}} \tag{10}$$

as the steady state concentration of the mRNA when it is not regulated by a microRNA. Further it is beneficial to also define the effective total microRNA concentration as

$$[\mu_i^{*}] = \frac{d_i^{m\mu}}{d_i^{m}} \cdot [\mu^{T}]. \tag{11}$$
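The steady-state relations above can be checked numerically. The following is a minimal sketch, not the thesis code: it integrates equations 1 and 2 for two mRNAs with a plain forward-Euler scheme and then compares the result with equations 4, 8 and 9. All parameter values and function names are illustrative assumptions, and the sketch assumes that $K_i$ is the effective constant $(k_i^{\mathrm{off}} + d_i^{m\mu})/k_i^{\mathrm{on}}$ that follows from setting equation 2 to zero.

```python
def simulate_to_steady_state(v, d_m, d_c, k_on, k_off, mu_total,
                             dt=0.01, steps=200_000):
    """Forward-Euler integration of equations 1 and 2.

    d_m: constitutive degradation rate constants of free mRNAs
    d_c: degradation rate constants of mRNAs bound in the complex
    Free microRNA follows from the conservation relation (equation 3).
    """
    n = len(v)
    m = [0.0] * n  # free mRNA levels
    c = [0.0] * n  # mRNA-microRNA complex levels
    for _ in range(steps):
        mu_free = mu_total - sum(c)
        for i in range(n):
            bind = k_on[i] * m[i] * mu_free
            unbind = k_off[i] * c[i]
            dm = v[i] - d_m[i] * m[i] - bind + unbind
            dc = bind - unbind - d_c[i] * c[i]
            m[i] += dt * dm
            c[i] += dt * dc
    return m, c


# Illustrative (assumed) parameters for two competing mRNAs.
params = dict(v=[1.0, 0.5], d_m=[0.1, 0.1], d_c=[0.3, 0.2],
              k_on=[1.0, 0.5], k_off=[0.2, 0.1], mu_total=2.0)

m, c = simulate_to_steady_state(**params)

# Effective dissociation constants implied by the steady state of equation 2.
K = [(koff + dc) / kon
     for koff, dc, kon in zip(params["k_off"], params["d_c"], params["k_on"])]

mu_free = params["mu_total"] - sum(c)
w = sum(mi / Ki for mi, Ki in zip(m, K))         # workload, equation 7
free_fraction = mu_free / params["mu_total"]     # left side of equation 8

print("workload w:", w)
print("free microRNA fraction:", free_fraction, "vs 1/(1+w):", 1.0 / (1.0 + w))
```

Forward Euler is used only to keep the sketch dependency-free; a stiff ODE solver would be the more usual choice for wider parameter ranges.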

To quantify the effect of the microRNA regulation on the free levels of an mRNA we introduce the measure of repression as

$$R_i = 1 - \frac{[m_i]}{[m_i^{0}]}. \tag{12}$$

Therefore repression of 0% ($R_i = 0$) means the free levels of the mRNA are not changed by the microRNA regulation, and repression of 100% ($R_i = 1$) means the levels of the free mRNA are completely suppressed by the microRNA regulation. Using the implicit expression for the steady state of the free mRNA (equation 9), the repression of regulated mRNAs can be re-written in terms of the workload as

$$R_i = 1 - \frac{[m_i]}{[m_i^{0}]} \tag{13}$$

$$= \frac{[\mu_i^{*}] / K_i}{1 + w + [\mu_i^{*}] / K_i} \tag{14}$$

$$= \frac{x_i}{1 + w + x_i}, \tag{15}$$

where $x_i = \dfrac{d_i^{m\mu} \cdot [\mu^{T}]}{d_i^{m} \cdot K_i}$ is the ratio between the maximal microRNA-mediated mRNA degradation rate constant (at zero microRNA workload, $w = 0$) and the constitutive mRNA degradation rate constant. Therefore, at a given workload of the microRNA, the repression of any regulated mRNA is simply determined by its ratio between the maximal microRNA-mediated degradation rate constant and the constitutive mRNA degradation rate constant. The workload at which each mRNA's repression is reduced to half of the maximal repression present at zero workload, $w = 0$, is

$$w\,\Big|_{R_i = \frac{1}{2} R_i(w = 0)} = 1 + x_i. \tag{16}$$

An increased ratio of microRNA-mediated to constitutive mRNA degradation rate constant therefore increases repression and also shifts the loss of repression to higher microRNA workload values.
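As a quick illustration of equations 15 and 16, the short sketch below evaluates repression as a function of workload and confirms that repression drops to half of its zero-workload value at $w = 1 + x_i$. The function names and the sample values of $x_i$ are assumptions chosen for illustration only.

```python
def repression(w, x):
    """Repression R_i at workload w for degradation ratio x (equation 15)."""
    return x / (1.0 + w + x)


def half_repression_workload(x):
    """Workload at which repression is half its w = 0 value (equation 16)."""
    return 1.0 + x


for x in (0.5, 2.0, 10.0):  # illustrative ratios x_i
    r_max = repression(0.0, x)
    w_half = half_repression_workload(x)
    print(f"x = {x}: R(w=0) = {r_max:.4f}, "
          f"R(w={w_half}) = {repression(w_half, x):.4f}")
```

A larger ratio gives both a higher repression at zero workload and a larger half-repression workload, which is the shift described in the text.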

The steady state of the mRNA can also be solved explicitly as

$$[m_i] = \frac{1}{2} \left( [m_i^{0}] - [\mu_i^{*}] - K_i (1 + w_i) + \sqrt{\left( [m_i^{0}] - [\mu_i^{*}] - K_i (1 + w_i) \right)^{2} + 4 \, [m_i^{0}] \, K_i (1 + w_i)} \right), \tag{17}$$

where

$$w_i = w - \frac{[m_i]}{K_i} \tag{18}$$

is the workload of the microRNA contributed by all regulated mRNAs except mRNA $m_i$. The competition of co-regulated mRNAs results in an apparent dissociation constant $\tilde{K}_i$ for each regulated mRNA, depending on the workload contributed by all co-regulated mRNAs:

$$\tilde{K}_i = K_i \cdot (1 + w_i). \tag{19}$$

Further, to quantify the influence of an mRNA towards the microRNA, we introduce the fraction of microRNA sequestered by mRNA $m_i$ as

$$S_i = \frac{[m_i] / K_i}{1 + w}. \tag{20}$$

Quantification of mRNA crosstalk

To investigate the coupling between co-regulated genes that share a common microRNA regulation, we introduce the measure of crosstalk strength. Crosstalk strength describes the relative change in the free levels of the receiver $m_r$ upon a relative change in the free levels of the sender $m_s$:

$$C_r^{s} = \frac{\partial \ln([m_r])}{\partial \ln([m_s])} = \frac{\partial [m_r]}{\partial [m_s]} \cdot \frac{[m_s]}{[m_r]}. \tag{44}$$

Using the implicit equation for the steady state of the free mRNA levels (equation 9) and the theorem on implicit differentiation, we can rewrite equation 44 and solve it as

[Equations 45 to 47 are not legible in the available reproduction.]

Crosstalk strength is always positive, because all terms in equation 47 are positive. And it is always smaller than 1, because

[Equation 48 is not legible in the available reproduction.]
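One way to carry out the implicit differentiation is sketched below. It is a reconstruction from equations 9, 11, 18 and 44, not necessarily the route taken in equations 45 to 48. It assumes that the free levels of all mRNAs other than the sender and the receiver are held fixed, and it writes $x_r$ for the receiver's ratio of maximal microRNA-mediated to constitutive degradation rate constants, so that equation 9 reads $[m_r](1 + x_r/(1+w)) = [m_r^0]$.

```latex
\begin{align*}
G([m_r],[m_s]) &\equiv [m_r]\left(1 + \frac{x_r}{1+w}\right) - [m_r^{0}] = 0,
  \qquad w = \frac{[m_s]}{K_s} + \frac{[m_r]}{K_r} + \dots \\[4pt]
\frac{\partial [m_r]}{\partial [m_s]}
  &= -\frac{\partial G / \partial [m_s]}{\partial G / \partial [m_r]}
   = \frac{\dfrac{[m_r]\, x_r}{K_s\,(1+w)^{2}}}
          {1 + \dfrac{x_r}{1+w} - \dfrac{[m_r]\, x_r}{K_r\,(1+w)^{2}}} \\[4pt]
C_r^{s} &= \frac{[m_s]}{[m_r]} \cdot \frac{\partial [m_r]}{\partial [m_s]}
   = \frac{\dfrac{[m_s]}{K_s}\, x_r}{(1+w)^{2} + x_r\,(1+w_r)},
  \qquad w_r = w - \frac{[m_r]}{K_r} \\[4pt]
\frac{[m_s]}{K_s} &\le w_r < 1 + w_r
  \;\Longrightarrow\; 0 < C_r^{s} < 1
\end{align*}
```

In this form every term is positive, and the numerator is smaller than the second term of the denominator alone, which gives the bound by 1. Dividing numerator and denominator by $(1+w)(1+w+x_r)$ recovers the expression in terms of repression and sequestration given as equation 49.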

Crosstalk strength can be reformulated in terms of repression and sequestration as

$$C_r^{s} = S_s \cdot \frac{R_r}{1 - R_r \cdot S_r}, \tag{49}$$

where $R_r$ is the repression of the receiver in the given state (cf. equation 12) and $S_r$ and $S_s$ are the fractions of microRNA sequestered by the receiver and the sender, respectively (cf. equation 20). Further it can be shown that, given a certain concentration of the sender $[m_s]$, crosstalk strength will be maximal when all concentrations of co-regulated mRNAs (including the receiver) are close to zero,

$$[m_j] \to 0 \quad \forall j \neq s. \tag{50}$$

Therefore crosstalk strength at a given concentration of the sender $[m_s]$ will always be equal to or less than

$$C_r^{s} \le S_s \cdot R_r^{\max}. \tag{51}$$
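The relation between the definition of crosstalk strength (equation 44) and its closed form (equation 49) can be cross-checked numerically. The sketch below is illustrative rather than the thesis code: it considers one sender and one receiver, treats the sender's free level as the imposed variable, obtains the receiver's free level from the explicit solution (equation 17), and compares a finite-difference estimate of the log-derivative with equation 49. All function names and parameter values are assumptions.

```python
import math


def receiver_free_level(m0_r, mu_eff_r, K_r, w_others):
    """Explicit steady state of the receiver (equation 17).

    m0_r:      unregulated steady-state level (equation 10)
    mu_eff_r:  effective total microRNA concentration (equation 11)
    w_others:  workload contributed by all other mRNAs (equation 18)
    """
    a = K_r * (1.0 + w_others)
    b = m0_r - mu_eff_r - a
    return 0.5 * (b + math.sqrt(b * b + 4.0 * m0_r * a))


def crosstalk_numeric(m_s, K_s, m0_r, mu_eff_r, K_r, rel_step=1e-5):
    """Central-difference estimate of d ln[m_r] / d ln[m_s] (equation 44)."""
    up = receiver_free_level(m0_r, mu_eff_r, K_r, m_s * (1 + rel_step) / K_s)
    down = receiver_free_level(m0_r, mu_eff_r, K_r, m_s * (1 - rel_step) / K_s)
    d_ln_sender = math.log(1 + rel_step) - math.log(1 - rel_step)
    return (math.log(up) - math.log(down)) / d_ln_sender


def crosstalk_formula(m_s, K_s, m0_r, mu_eff_r, K_r):
    """Closed form in terms of repression and sequestration (equation 49)."""
    m_r = receiver_free_level(m0_r, mu_eff_r, K_r, m_s / K_s)
    w = m_s / K_s + m_r / K_r                 # workload, equation 7
    x_r = mu_eff_r / K_r
    R_r = x_r / (1.0 + w + x_r)               # repression, equation 15
    S_s = (m_s / K_s) / (1.0 + w)             # sender sequestration, equation 20
    S_r = (m_r / K_r) / (1.0 + w)             # receiver sequestration, equation 20
    return S_s * R_r / (1.0 - R_r * S_r)


# Illustrative (assumed) values: sender level 2.0, K_s = 0.6,
# receiver m0 = 10.0, effective microRNA 6.0, K_r = 0.5.
args = (2.0, 0.6, 10.0, 6.0, 0.5)
print("finite difference:", crosstalk_numeric(*args))
print("equation 49:      ", crosstalk_formula(*args))
```

The two values should agree to within the finite-difference error, and both lie strictly between 0 and 1 for these inputs.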

Equation 51 can also be used to estimate the limits of crosstalk effects among several mRNAs. Case 1: When the receiver is sequentially influenced by multiple senders, all of which share the same microRNA regulation, the sum over all crosstalk strengths must be smaller than 1:

$$\sum_{s} C_r^{s} \le R_r^{\max} \cdot \sum_{s} S_s = R_r^{\max} \cdot \frac{w}{1 + w} \to R_r^{\max} < 1. \tag{52}$$

The sum over all crosstalk strengths from different senders can be re-written as the product of repression of the receiver times the sum over all fractions of microRNA sequestration by the senders (first step of equation 52). The sum over all fractions of microRNA sequestration must be smaller than 1 (final step of equation 52). Case 2: When the receiver is sequentially influenced by multiple senders, all of which share a common microRNA regulation with the receiver, but no common microRNA regulation among each other, the sum over all crosstalk strengths must be smaller than 1. Let us denote the different microRNA regulators with the index $k$, then we can formulate this as

Cr

References

[Baccarini et al., 2011] Baccarini, A., Chauhan, H., Gardner, T. J., Jayaprakash, A. D., Sachidanandam, R. and Brown, B. D. (2011). Kinetic analysis reveals the fate of a microRNA following target regulation in mammalian cells. Current biology : CB 21, 369-376.

[Baek et al., 2008] Baek, D., Villén, J., Shin, C., Camargo, F. D., Gygi, S. P. and Bartel, D. P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.

[Bruggeman et al., 2009] Bruggeman, F. J., Blüthgen, N. and Westerhoff, H. V. (2009). Noise management by molecular networks. PLoS Computational Biology 5, e1000506.

[Elf and Ehrenberg, 2003] Elf, J. and Ehrenberg, M. (2003). Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Research 13, 2475-2484.

[Gantier et al., 2011] Gantier, M. P., McCoy, C. E., Rusinova, I., Saulep, D., Wang, D., Xu, D., Irving, A. T., Behlke, M. A., Hertzog, P. J., Mackay, F. and Williams, B. R. G. (2011). Analysis of microRNA turnover in mammalian cells following Dicer1 ablation. Nucleic Acids Research 39, 5692-5703.

[Haley and Zamore, 2004] Haley, B. and Zamore, P. D. (2004). Kinetic analysis of the RNAi enzyme complex. Nature Structural & Molecular Biology 11, 599-606.

[Lim et al., 2003] Lim, L. P., Lau, N. C., Weinstein, E. G., Abdelhakim, A., Yekta, S., Rhoades, M. W., Burge, C. B. and Bartel, D. P. (2003). The microRNAs of Caenorhabditis elegans. Genes & Development 17, 991-1008.

[Mukherji et al., 2011] Mukherji, S., Ebert, M. S., Zheng, G. X. Y., Tsang, J. S., Sharp, P. A. and van Oudenaarden, A. (2011). MicroRNAs can generate thresholds in target gene expression. Nature Genetics 43, 854-859.

[Paulsson, 2004] Paulsson, J. (2004). Summing up the noise in gene networks. Nature 427, 415-418.

[Pedraza and van Oudenaarden, 2005] Pedraza, J. M. and van Oudenaarden, A. (2005). Noise propagation in gene networks. Science 307, 1965-1969.

[Schwanhäusser et al., 2011] Schwanhäusser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W. and Selbach, M. (2011). Global quantification of mammalian gene expression control. Nature 473, 337-342.