Research Collection

Doctoral Thesis

Interactions of the alternative splicing factor RBFOX with non-coding RNAs

Author(s): Stoltz, Moritz

Publication Date: 2015

Permanent Link: https://doi.org/10.3929/ethz-a-010496253

Rights / License: In Copyright - Non-Commercial Use Permitted


DISS. ETH NO. 22872

Interactions of the alternative splicing factor RBFOX with non-coding RNAs

A thesis submitted to attain the degree of

DOCTOR OF SCIENCES of ETH ZURICH

(Dr. sc. ETH Zurich)

Presented by

Moritz Stoltz

Master of Science in Chemistry, University of Basel

born on 24.07.1983

citizen of

Marburg, Germany

accepted on the recommendation of

Prof. Dr. Jonathan Hall

Prof. Dr. Gisbert Schneider

2015


Don't speak to me of anarchy or peace of calm revolt, man We're in a play of slow decay orchestrated by Boltzmann

It's entropy, it's not a human issue Entropy, it's matter of course Entropy, energy at all levels Entropy, from it you cannot divorce

And your pathetic moans of suffrage tend to lose all significance

Extinction, degradation; the natural outcomes of our ordered lives Power, motivation; temporary fixtures for which we strive

Something in our synapses assures us we're okay But in our disequilibrium we simply cannot stay

From “Entropy” by Bad Religion, written by Greg Graffin


Content

Summary
Zusammenfassung
Acknowledgments
1: Interactions of FOX with microRNAs
1: Introduction
1.1 Surface Plasmon Resonance spectroscopy (SPR)
1.1.1.: History of the development of SPR
1.1.2.: Theoretical background of SPR
1.1.3.: Experimental set-up of the SPR method
1.1.4.: Fitting experimental data from SPR experiments
1.2.: RNA Synthesis
1.2.1.: Oligonucleotide deprotection
1.2.2.: Activation and coupling of the phosphoramidites
1.2.3.: Capping of the unreacted reagents
1.2.4 Oxidation
1.2.5.: Workup and purification of oligonucleotides
1.3.: MicroRNA biogenesis
1.3.1 Nuclear processing
1.3.3 Cytoplasmic processing
1.3.4 The RNA induced silencing complex (RISC)
1.3.5.: Additional regulatory elements
1.3.6.: MiRNA in human disease
1.4.: RBFOX proteins
1.4.1.: Structure of the RBFOX/RNA complex
1.4.2.: RBFOX in alternative splicing
1.4.3.: Variation in the recognition of RNA by RBFOX proteins
1.4.4.: MicroRNA regulation of RBFOX-3
1.5.: Detection of new RNA binding motifs
1.5.1. Systematic Evolution of Ligands by Exponential Enrichment (SELEX)
1.5.2 Cross-linking immunoprecipitation (CLIP) based methods
Aim of the project
2. Results and Discussion
2.1.: ELISA Assay of RBFOX against pre-miRNAs
2.2: Synthesis of two RNA libraries
2.3: Immobilization of the RBFOX RRM on a biosensor
2.4.: Screen for new RNA binding motifs of RBFOX RRM
2.5.: Single base variants of the consensus RBFOX binding element
2.5.1.: Use of dimethyl-cytidine (dMC) and monomethyl-cytidine (mMC) to gain insights into RBP/RNA recognition
2.6. Interaction of RBFOX with precursor miRNAs
2.6.1. SPR studies of RBFOX/miRNA interactions
2.6.2.: Influence of RNA structure upon binding to RBFOX RRM
2.6.4.: The influence of the sugar pucker of the ribose on the binding affinity against RBFOX
2.7.: Effects of the alternative splicing factor RBFOX-2 on the biogenesis of pre-miR-20b
3.: Summary and Outlook
4.: Materials and Methods
4.1.: List of used Chemicals
4.2.: List of used Equipment
4.3.: Methods
5.: Supplementary tables and figures
5.1.: List of Abbreviations
5.2.: Tables
5.3.: Figures
6.: Manuscripts
6.1.: Development of a RNA negative control
6.2.: Rapid high-yield cell-free expression of quantitatively biotinylated proteins (Draft)
4.4.: List of tables
4.5.: List of tables
5.: References
CURRICULUM VITAE: MORITZ STOLTZ


Summary

RNA binding proteins (RBPs) play roles in the post-transcriptional regulation of RNA metabolism including (alternative) splicing, polyadenylation, stability, localization, degradation and translation. According to the OMIM (Online Mendelian Inheritance in Man) database, 150 of these RBPs are linked genetically to human diseases. Of these RBPs, 30 % are known to interact with messenger RNAs (mRNAs) and the rest interact, among others, with non-coding RNAs (ncRNAs). Currently, over 1500 human RBPs are assumed to interact solely with RNAs (in contrast to approx. 20’000 RBP-coding genes, which contain an RNA binding domain (RBD) somewhere in their sequence). The conservation between RBPs can be conveniently classified according to RBP function, with ribosomal proteins being the most conserved RBPs. RBPs usually contain one or more RBDs, which include the RNA recognition motif (RRM), the K-homology domain (KH-domain) and the zinc finger (ZnF). Each of those RBDs is characterized by a distinct topology. The RRM is by far the most common RBD. It interacts with approximately four nucleotides through stacking, electrostatic and hydrogen bonding interactions. The RBFOX proteins belong to a small family with a highly conserved RRM. The proteins are mostly known for their roles in alternative splicing. Recently, RBFOX-3 was shown to play a role in microRNA (miRNA) biogenesis. The commonly accepted consensus RNA binding motif (RBM) of the RBFOX family is (U)GCAUG(U), a heptaribonucleotide sequence which was discovered using Systematic Evolution of Ligands by Exponential Enrichment (SELEX) studies. It is commonly referred to as the Fox-binding element (FBE) and is present in many FOX-associated RNAs. However, as the FBE is not present in many RNAs to which RBFOX proteins bind, other non-canonical FBEs presumably exist.

In the work described in this thesis we developed a new method to identify RBMs for the RBFOX family RRM using Surface Plasmon Resonance spectroscopy (SPR). We designed a dedicated RNA library to determine new RBMs for RBFOX. The experiments revealed new RBMs for RBFOX. The new RBMs showed little preference for terminal nucleotide positions in a heptanucleotide sequence, and confirmed the importance of two flanking guanosines for the binding to the RRM. The most prominent sequences to bind to the RRM were GCUUG, GAAUG, GCACG and GCAAG. One motif bound more strongly to the RBFOX RRM than the FBE. Several of these motifs feature in the precursor sequences of miRNAs (pre-miRNAs) and we confirmed direct binding interactions using SPR. The binding interaction with pre-miR-20b was confirmed in cells and its function in a regulatory loop was elucidated. We investigated the effects of RNA structure on the binding affinity of RBFOX RRM in the context of short FBEs and also in the hairpins of pre-miRNAs.


Taken together, we developed a new SPR-based method to detect new RBMs and investigated the effects of RNA structure on the binding to the RBFOX RRM. We validated the new binding sites in a biologically relevant system.


Zusammenfassung

RNA-binding proteins are important regulators in the post-transcriptional regulation of RNA metabolism, including (alternative) splicing, polyadenylation, RNA stability, RNA localization, RNA degradation and translation. According to the “Online Mendelian Inheritance in Man” database, 150 of these RNA-binding proteins are linked to diseases. Of these 150 RNA-binding proteins, only about 30% interact with mRNA; the rest interact, among others, with ncRNA. Currently, over 1500 human RBPs are known that presumably interact only with RNA (in contrast to approximately 20500 RBP-coding genes, which possess RNA-binding domains and have additional functions such as DNA regulation), with an average homology between humans and yeast of 31%. The homology varies between proteins with different functions: ribosomal proteins, for example, show the highest conservation with 51% homology, in contrast to proteins that regulate ncRNA, with only 20% homology. RBPs usually contain one or more RNA-binding domains, e.g. the RNA recognition motif (RRM), the K-homology domain (KH-domain) and the zinc finger (ZnF). Each RBD adopts a similar spatial structure, by which particular RBDs can be identified. The RRM is by far the most widespread RBD; it interacts with approximately 4 nucleotides through π-interactions, hydrogen bonds and electrostatic interactions. One protein family with a very highly conserved RRM is the FOX protein family, which is best known for its role in alternative splicing; recently, a function in miRNA biogenesis has also become known. The strongest known RNA-binding motif was discovered by SELEX and is the (U)GCAUG(U) 7-mer. Unfortunately, many binding events from PAR-CLIP experiments could not be explained by this RBM.

To address this, we developed a new method to discover new RBMs for RBFOX using SPR spectroscopy. We designed an RNA library dedicated to RBFOX and synthesized a full pentamer RNA library for further studies. The experiments revealed new RBMs for RBFOX. They showed a high flexibility at the terminal bases and confirmed the importance of the two guanosines. The new motifs with mutations in the central region were GCUUG, GAAUG, GCACG and GCAAG. The affinities of 7-mer sequences with a single mutation were mostly independent of the substituted base. The only exception was the change of a U to a C in the fifth position, which showed a stronger affinity than other variations at that position. We applied the new motifs to explain the results of an ELISA assay for the interactions of RBFOX with pre-miRs. We confirmed the interactions of RBFOX with the hairpins and showed, through mutations in the RBMs, that these motifs bring about the interactions. Furthermore, we showed the effect of RNA structure on the binding to RBFOX. The affinity of hairpins decreased as the hairpin structure became more stable.

In summary, this work presents a new method to discover RBMs and shows the influence of structure on the affinity of RNA for RBFOX proteins. The new RBMs were confirmed in a biological system and could explain several interactions between pre-miRs and RBFOX-2 observed in the ELISA assay.


Acknowledgments

I would especially like to thank Prof. Dr. Jonathan Hall for giving me the opportunity to work on this project in his lab. I especially want to thank him for the last weeks, in which he helped tremendously with my thesis while always keeping his cool. Looking back on my working time, I want to thank him for his insights into industry, for a glance out of our “Elfenbeinturm” (ivory tower) into reality, and of course for his input into my research. And also thanks for his “English humor”, as Mario called it.

I would like to thank Prof. Gisbert Schneider for co-examining my thesis. I would also like to thank him for letting me use parts of his machinery and for leading such a friendly group. I made some friends in it.

… Mauro Zimmermann… for introducing me to RNA synthesis and having me as his desk neighbor for more than five years… poor guy. Thanks for not complaining much about listening to Bad Religion, Die Toten Hosen, The Killers etc. a lot.

…Sylvia Peleg… for always being a helpful, patient and friendly person with everything you need.

… Mario Rebhan… He introduced me to SPR. Thanks for your famous “1 month” introduction to SPR. Now I can assemble and reassemble an SPR even if I were to finish his 5 l barrel of wheat beer, which is still in the lab from his defense.

... Mirjam Menzi… It was very interesting how our relationship developed. From a rather complicated one at her time as a master student, to a good connection while being equal working colleagues. Especially in the last years, it was always relaxed and easy to work with you…. Thanks. Take good care of “my” little Mass-1!

…Hartmut “Hardy, Hartmützchen” Jahns… We still have to go for a dive. Even though we had some fights, we always got it sorted out easily afterwards. I’ll even forgive you your “spelling mistake”; to be honest, I am the last who can complain about those. Thanks for introducing our “group meeting derivative”… best idea we had. It was really nice working with you and having some drinks. And also a big thanks for pushing me in the last days of writing.

... Boris Günnewig... You are coming back to Switzerland a week too late! He was the first person who gave me a place to crash in Zürich, even before I had my own place. Thanks for the work on our paper… Sadly the second planned project did not happen. Enjoy your time down under.

…Luca Gebert… My diving buddy… The first one in Switzerland with whom I discovered the relaxing beauty of Switzerland’s lakes. Those were cold times: a semi-dry suit while it was snowing. Thanks for always being a relaxed person with his own subgroup. And of course, thanks for reading my thesis in the “endgame”!


...Julian Zagalak… Me Julie. We had some fun weekends. Oktoberfest and the Interboat at Lake Constance, you should not have left me there, spending 4h+ on the train station on your own is rather boring. Thanks for the one sailing trip we did together on a gorgeous day in May with an amazing view of the Alps, I got my worst sunburn ever. And of course, we had a fun time in Laax. Also thanks for our scientific cooperation, really a pleasure. And thanks in the end for helping me with my dissertation, how to do stuff and reading some of my work!

… Martina Roos… Thanks Martini for the fun times, for joking around and for getting tipsy when seeing a beer.

…Harry Towbin... Harry, THANKS… you are one of the most helpful and friendly persons I ever met. This combined with an amazing repertoire of scientific knowledge and ideas... priceless!

…Matije “the Lutscher” Lucic… We had fun times. Thanks for always having time for a chat and joking around about various topics.

…Yuluan Wang… Also thanks for always having time for a chat. Always great to joke around with sarcastic persons. And of course, also thanks for trying to make our group more social… I know, very frustrating sometimes.

…Afzal Dogar… Thanks for the scientific cooperation. And also thanks for always helping me with questions regarding literature. I still want to see you and Alok watch a cricket match “India vs. Pakistan” together.

…Helen Lightfoot... Thanks for our famous gin session in the “Gonzo”. I will never forget your quote the next… let’s call it “morning”.

…Andreas Brunschweiger… Thanks to the Erfolgsfan. Remember our evenings watching handball while always being well supplied with food.

…Ugo Pradère… My second diving buddy, ok mainly advanced snorkeling. Thanks for the fun times.

…the students of our lab… from Mirjam (yay, mentioned you twice) to Amany, thanks for the fun times.

…Jochen Imig… You are just a part of the last lab. Always joking together. I hope you’ll get your professorship soon.


…Erich Michel… For always providing me with enough protein and for our scientific discussions. All the best for your little family!

… Dr. Jan Hiss and Dr. Petra Schneider… Thanks for the help with the CD spectrometer.

…Dr. Klaus Wiehler, Chris Whalen and the Rest of the Sierra Sensors team…Thanks for your endless support whenever I had questions.

…the Newcomers…Alok, Meiling, Martina, Franz, Christian, Anna. Enjoy the time here and thanks for the brief time we had. Continue what we started and try to keep our “established” social events going (greasy breakfast, Weisswurstfrühstück, ski day etc.)

…To all my friends I made over the years in Zurich and around the world… You made it a great time to remember.

…last and definitely not least: to my family, my dad, my mum, my sisters Julia and Anne, my brother-in-law Stefan and of course Mieke and Paulinus (even though I will meet you for the first time at my defense). Thanks for always supporting me, for teasing me to write and apply and for having great times. Special thanks to my parents, who always supported me and made all this possible. My trip to New Zealand, my studies in Switzerland and Germany, my hobbies etc. THANKS!


1: Interactions of FOX proteins with microRNAs

1: Introduction

1.1 Surface Plasmon Resonance spectroscopy (SPR)

1.1.1.: History of the development of SPR

SPR is a label-free method used to detect interactions between an analyte and a ligand in real time. The first commercially available SPR machine was released in 1990 by Biacore. To my knowledge, the first experiments measuring affinity between RNA and proteins were conducted in 1997 [1, 2]. In these, Hendrix et al. used the binding of an immobilized RNA against a part of the “Regulator of expression of virion” (Rev) protein to test for the proper folding of the RNA. At about the same time, Hartmann et al. used a chemically-synthesized protein containing the α-helical structure responsible for the RNA contacts in the major groove [3, 4] to measure the affinity of aptamers obtained using the SELEX (Systematic Evolution of Ligands by Exponential Enrichment) method (see section 1.5.1.) against 2'-5'-oligoadenylate synthase, the enzyme responsible for the degradation of viral RNA [5-7]. They found low nanomolar (nM) affinity for the aptamers, whereas a random RNA showed a weaker affinity for the protein [2]. Most studies performed with SPR use immobilized biotinylated oligoribonucleotides rather than immobilized protein [8-10] for several reasons, including: i) RNA is in general a more robust molecule than protein; ii) RNA is easier to immobilize in an oriented, controlled fashion so as to generate minimal interference with the analyte (e.g. using a biotin phosphoramidite during solid phase synthesis); iii) the dextran surface matrix of the SPR sensor chips is negatively charged at pH 7, which leads to a repulsion of the RNA phosphate backbone [11], affecting the measurements by enhancing the effect of mass-transfer limitations; iv) short RNA sequences as ligands generally yield higher responses on binding to most full-length proteins, in contrast to large immobilized proteins binding comparatively smaller RNAs [12].

1.1.2.: Theoretical background of SPR

The SPR method measures changes in the refractive index in close proximity to a coated surface. The principles of SPR were first described in 1902 [13]. A surface plasmon wave (SPW) occurs at the boundary between a metal and a dielectric (a non-conducting or weakly conducting material). The dielectric can be either a solution, a gas or a solid [14]. In SPR experiments gold is most commonly used as the metal. The SPW is described in part by the propagation constant β (Equation 1):

$$\beta = \frac{\omega}{c}\sqrt{\frac{\varepsilon_M \varepsilon_D}{\varepsilon_M + \varepsilon_D}} \tag{1}$$

where:

ω = angular frequency of the SPW [Hz]
c = speed of light in vacuum [m s$^{-1}$]
ε_M, ε_D = the dielectric functions of the metal and the dielectric [15, 16]

The dielectric functions both depend on their respective refractive indices [17, 18]. Therefore the change of the propagation constant (Δβ), and thus the change of the SPW, is proportional to the change of the refractive index of the dielectric:

$$\Delta\beta \cong k\,\Delta n \tag{2}$$

with:

k = free-space wavenumber [m$^{-1}$] [18]
Δn = change of refractive index

This is only true where the change in refractive index occurs in the whole field of the SPW. If the change in the refractive index only occurs in parts of the magnetic field, another factor (between 1 and 0) has to be added to equation 2, taking into account in which fraction of the field the change takes place [19].
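Equations 1 and 2 can be checked numerically with a short sketch. The wavelength, the dielectric function of gold and the refractive index of the buffer below are illustrative assumptions and are not taken from this thesis; the function name `propagation_constant` is likewise hypothetical. For these assumed values the ratio Δβ/(kΔn) comes out of order one, which is the sense in which Equation 2 holds.

```python
import cmath
import math

def propagation_constant(wavelength_m, eps_metal, n_dielectric):
    """Equation 1: beta = (omega / c) * sqrt(eps_M * eps_D / (eps_M + eps_D))."""
    k0 = 2 * math.pi / wavelength_m      # free-space wavenumber, k = omega / c
    eps_d = n_dielectric ** 2            # non-absorbing dielectric: eps_D = n^2
    return k0 * cmath.sqrt(eps_metal * eps_d / (eps_metal + eps_d))

# Assumed, illustrative values (not from the thesis):
WAVELENGTH = 633e-9              # red light [m]
EPS_GOLD = complex(-11.7, 1.2)   # rough dielectric function of gold
N_BUFFER = 1.33                  # refractive index close to that of water

k = 2 * math.pi / WAVELENGTH
delta_n = 1e-3                   # small refractive index change at the surface

beta_before = propagation_constant(WAVELENGTH, EPS_GOLD, N_BUFFER)
beta_after = propagation_constant(WAVELENGTH, EPS_GOLD, N_BUFFER + delta_n)

# Change of the real part of beta, compared with the prediction k * delta_n.
delta_beta = (beta_after - beta_before).real
ratio = delta_beta / (k * delta_n)
print(f"delta_beta = {delta_beta:.3e} 1/m; delta_beta / (k * delta_n) = {ratio:.2f}")
```

The ratio moves closer to 1 as the magnitude of the metal dielectric function grows relative to that of the dielectric, so the proportionality constant in Equation 2 should be read as approximate.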


Figure 1. Principle of the SPR method. A light beam transmitted through a prism is totally reflected at the prism/gold layer at all wavelengths but the one interacting with the SPW. The wavelength suffering an intensity loss changes if the SPW changes. The change in the wavelength interacting with the SPW leads to an intensity dip at a different exit angle (A and B). The change of the wavelength experiencing the intensity dip is illustrated on the right side. ΔI illustrates the change in intensity if the detection occurs at a fixed wavelength.

[20]

The working principle of SPR relies on the interaction of this SPW with a light beam. When a light beam is directed through a prism where the metal is adjacent to a dielectric, the complete spectrum of the light except one wavelength is totally reflected at the prism/metal layer (Figure 1A). Light at this one wavelength interacts with the SPW and therefore loses intensity. Since the prism breaks up non-monochromatic light into single wavelengths, refracting them at different angles, this one wavelength can be determined by the angle of reflection (Figure 1B). Upon binding of an analyte to a ligand on the surface, the characteristics of the SPW change due to the change of refractive index (Equation 2), and therefore the wavelength which interacts with the SPW changes. This change can be followed in real time and represents the binding of the analyte to the ligand. Through this principle the change of the propagation constant is reported in arbitrary response units (RU) and provides a real time representation of the binding kinetics.

The SPR method can be used in a number of different ways. The most basic experiment yields a “yes/no” answer, detecting either presence or absence of binding [21, 22]. However, the power of the technique lies with the determination of kinetic data for a given binding process. Kinetic constants ranging over several orders of magnitude (kd: $10^{-6}$ to $10^{1}$ s$^{-1}$; ka: $10^{2}$ to $10^{7}$ M$^{-1}$ s$^{-1}$; KD: $10^{-1}$ to $10^{-13}$ M) [23] can be measured in this fashion. Furthermore, a wide variety of analytes can be tested in kinetic measurements, from proteins [24, 25] and antibodies [26, 27] through oligonucleotides [28, 29] to small molecules [30, 31] and even viruses and bacteria [32]. In addition, thermodynamic values can be obtained using van’t Hoff plots when kinetic data are measured over a temperature range [33]. Thus, SPR can provide mechanistic insights into binding events. Rich et al. showed the differences in binding kinetics for the binding of small-molecule agonists and antagonists to the estrogen receptor [34]. They showed that agonists bind with fast association rates ($1 \times 10^{6}$ M$^{-1}$ s$^{-1}$) in comparison to antagonists ($6.3 \times 10^{3}$ M$^{-1}$ s$^{-1}$). The slower association rates were suggestive of a conformational change in the receptor through complex formation.

Markgren et al. [35] used SPR to study the kinetic properties of known HIV-1 protease inhibitors, which were subsequently used as benchmarks for potential new drugs. They created 58 modifications of HIV-1 protease inhibitors, and showed the effects of modification on the binding kinetics. This approach showed that the most promising molecules have fast association and slow dissociation rate constants [36].

1.1.3.: Experimental set-up of the SPR method

The first step of an SPR experiment is immobilization of the ligand onto the surface. There are several methods available. Here the three principal immobilization methods are described.

1) The most commonly used capture assay is the biotin/avidin system. A streptavidin chip is used to immobilize a biotinylated ligand. This method has the advantage that the ligands are immobilized in a directed fashion through a single functional group. For some molecules it is difficult to achieve biotin labelling, in which case other capture assays based on antibodies or polyhistidine tags (His-tags) are used.

2) Amine coupling is a very commonly used method [37] (Figure 2). After activation of carboxylic groups on the chip surface, a ligand containing a primary amine is covalently bound to the surface. This method is broadly applicable but it has several drawbacks. Where a ligand has multiple accessible amine groups, immobilization occurs in a random fashion at the surface carboxylic groups. This can be problematic if the most reactive amine group is important for the binding to an analyte.


Figure 2. Schematic illustration of the amine coupling. The thick black line represents the chip surface and the thin black line the surface matrix

3) Direct coating on the gold surface using thiols [38]: In this method a thiol is attached to the ligand, forming a self-assembling monolayer (SAM) on the surface of the biosensor. This results in a very strong non-covalent bond (energy of approx. 45 kcal/mol[39]). The use of thiols to construct SAMs on a gold surface is well understood and thiols mostly form a very stable monolayer. The drawback of this approach is that the SAMs often have defects in the monolayer and the preparation of this method is more time demanding in comparison with the two previously-mentioned methods [40].

One channel of an SPR machine is always reserved as the reference channel. This accounts for non-specific binding and bulk shifts (changes of the refractive index) due to changes in the solvent concentration and addition of the analyte to the solvent during an experiment. In the preferred format, the channel is coated with a similar but inactive version of the ligand of interest (e.g. a mutated or unrelated RNA sequence or a mutated protein). If this is not possible, the channel is left with either the matrix alone or with only bound streptavidin. It is necessary to measure several blank injections throughout an experiment to account for solvent effects. Afterwards, both the buffer injection and the reference channel are subtracted from the signals of the ligand/analyte interaction. This method is called “double referencing”.
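The subtraction scheme can be sketched in a few lines. This is a minimal illustration under the assumption that all four traces are sampled at the same time points; the function name `double_reference` and the response values are hypothetical and do not come from the experiments in this thesis.

```python
def double_reference(active, reference, blank_active, blank_reference):
    """Subtract the reference channel, then the reference-subtracted blank."""
    corrected = []
    for a, r, ba, br in zip(active, reference, blank_active, blank_reference):
        analyte_signal = a - r    # removes bulk shift and non-specific binding
        blank_signal = ba - br    # what a buffer-only injection still shows
        corrected.append(analyte_signal - blank_signal)
    return corrected

# Hypothetical responses in RU at four time points (made-up numbers).
active = [0.0, 35.0, 52.0, 55.0]          # ligand channel, analyte injection
reference = [0.0, 12.0, 12.5, 12.5]       # reference channel, analyte injection
blank_active = [0.0, 1.0, 1.5, 1.5]       # ligand channel, buffer injection
blank_reference = [0.0, 0.5, 0.5, 0.5]    # reference channel, buffer injection

corrected = double_reference(active, reference, blank_active, blank_reference)
print(corrected)
```

In practice several blank injections are usually averaged before subtraction; a single blank is used here only to keep the sketch short.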

A typical experiment consists of 4 (or 5, including regeneration) steps (Figure 3). In the pre-injection phase the running buffer flows over the cell. This should produce a flat baseline. Next, the analyte is injected during the association phase, and an increase in signal upon binding of the analyte to the ligand can be observed as long as ka[A][L] > kd[LA]. The steady state phase is reached when ka[A][L] = kd[LA], with [A] the analyte concentration, [L] the free ligand concentration, [LA] the ligand/analyte complex concentration, ka the association rate constant and kd the dissociation rate constant. In this phase, the data for the steady state analysis are obtained (see 1.1.4.: Fitting experimental data from SPR experiments). In the dissociation phase, the analyte injection is switched to buffer and dissociation occurs. Here kd is determined straightforwardly if dissociation follows first order kinetics (Equation 7). However, if the surface density of the ligand is high and the analyte shows a strong affinity for the ligand, rebinding of the dissociated analyte can influence the observed dissociation rate constant; this can be accounted for in the data analysis.

Figure 3. Principle of a typical SPR experiment, including the pre-injection phase (-25 – 0 s), the association phase (0 -~20 s), the steady state phase (25 - 210 s) and the dissociation phase (220-300 s).

1.1.4.: Fitting experimental data from SPR experiments

Currently, the conventional method of analyzing SPR data is the fitting method, in which a set of SPR curves is fitted to a single value each of ka, kd and RUmax [41] using numerical integration. Before this method was developed, it was only possible to determine the kinetic data of a single binding curve in a simple 1:1 binding model [42, 43]. This development opened the possibility of using more complex binding models, such as the mass-transport limitation model (the diffusion of the analyte to the ligand is the rate determining step), the conformational change model (the ligand or analyte changes its structure upon binding) and the heterogeneous ligand/analyte interaction models (one analyte binds to two separate ligand binding sites, or two analytes compete for binding to the same ligand site).

Here, the basic Langmuir 1:1 model [42] is presented. The binding reaction occurs according to the following equation:

$$L + A \underset{k_d}{\overset{k_a}{\rightleftharpoons}} LA$$

The kinetics of the reaction can be described by the following differential equations.

$$\frac{d[A]}{dt} \cong 0 \tag{3}$$

$$\frac{d[L]}{dt} = k_d [LA]_t - k_a [A]_t [L]_t \tag{4}$$

$$\frac{d[LA]}{dt} = k_a [A]_t [L]_t - k_d [LA]_t \tag{5}$$

During the dissociation phase of the experiment:

$$\frac{d[LA]}{dt} = -k_d [LA]_t \quad \text{as } [A]_t = 0 \tag{6}$$

where:

[A] = concentration of the analyte
[L] = concentration of the immobilized ligand
[LA] = concentration of the formed complex
ka = association rate constant [M$^{-1}$ s$^{-1}$]
kd = dissociation rate constant [s$^{-1}$]

The concentration of the analyte ([A]) is known and can be assumed to be constant (Equation 3) throughout the association phase. Equation 4 describes the change of the free ligand concentration on the surface during the reaction and Equation 5 describes the change of the complex concentration during the experiment. The amount of [LA] is proportional to the response (RU) signal. Using numerical integration, the equations are used to fit the values ka, kd and [L] to the measured data over the whole concentration range. The program creates the binding curve for the simulated kinetic values and calculates the residuals (the difference between the simulated and the experimental data). The residuals are then iteratively minimized to obtain the final values for the best fit.
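The simulate-and-compare loop described above can be illustrated with a small sketch. It assumes that the response R is proportional to [LA] and that the free ligand is proportional to Rmax − R, so that Equation 5 becomes dR/dt = ka·C·(Rmax − R) − kd·R for a constant analyte concentration C. The integrator (a classical Runge-Kutta scheme), all parameter values and the function names are illustrative assumptions, not the algorithm of any particular evaluation software. A real fitting routine would additionally minimize the residual with an optimizer.

```python
import math

def simulate_association(ka, kd, r_max, conc, t_end, dt=0.01):
    """Integrate dR/dt = ka*C*(Rmax - R) - kd*R with classical RK4 steps."""
    def rate(r):
        return ka * conc * (r_max - r) - kd * r

    times, responses = [0.0], [0.0]
    r = 0.0
    for i in range(1, int(round(t_end / dt)) + 1):
        k1 = rate(r)
        k2 = rate(r + 0.5 * dt * k1)
        k3 = rate(r + 0.5 * dt * k2)
        k4 = rate(r + dt * k3)
        r += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        times.append(i * dt)
        responses.append(r)
    return times, responses

def sum_squared_residuals(params, datasets, t_end, dt=0.01):
    """Global residual: one (ka, kd, Rmax) set scored against all curves."""
    ka, kd, r_max = params
    total = 0.0
    for conc, measured in datasets:
        _, simulated = simulate_association(ka, kd, r_max, conc, t_end, dt)
        total += sum((s - m) ** 2 for s, m in zip(simulated, measured))
    return total

# Hypothetical "true" parameters used to generate noise-free test curves.
TRUE = (1.0e5, 1.0e-2, 100.0)                     # ka [1/(M s)], kd [1/s], Rmax [RU]
CONCENTRATIONS = [25e-9, 50e-9, 100e-9, 200e-9]   # analyte concentrations [M]
T_END = 60.0                                      # length of association phase [s]

datasets = [(c, simulate_association(*TRUE, c, T_END)[1]) for c in CONCENTRATIONS]

ssr_true = sum_squared_residuals(TRUE, datasets, T_END)
ssr_wrong = sum_squared_residuals((2.0e5, 1.0e-2, 100.0), datasets, T_END)
print(f"residual with true parameters:  {ssr_true:.3e}")
print(f"residual with ka doubled:       {ssr_wrong:.3e}")
```

Scoring all concentrations against one parameter set is what distinguishes this approach from fitting each binding curve separately.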

The dissociation rate constant kd can be calculated easily, since a normal analyte dissociation follows a first order kinetic decay (Equation 7). Equation 5 describes the change over time of the complex which contains a term for association and dissociation. During the dissociation phase the equation can be simplified, since [A] equals 0 and the rebinding of the dissociated analyte can be neglected in most cases. This yields equation 6, which through integration leads to equation 7, with [LA] ∝ Rt (Rt = response at a certain time “t”)

$$R_t = R_0\, e^{-k_d t} \tag{7}$$

In yet another commonly used model, an additional variable is accounted for in this set of equations: the so-called mass transfer rate (km). The model is used if the rate determining step in the association is the diffusion of the analyte from the bulk solution to the surface. This only occurs for very fast association rates and creates problems for data analysis, which can be avoided using high flow rates and low surface loading densities. High flow rates, however, increase the amount of analyte needed and also increase the rate of dissociation.
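Because Equation 7 is a single exponential, ln(R) falls linearly with time and kd can be read from the slope. The sketch below illustrates this on synthetic, noise-free data; the function name `fit_kd` and all numbers are hypothetical, and rebinding and mass transport are ignored.

```python
import math

def fit_kd(times, responses):
    """Least-squares slope of ln(R) versus t; Equation 7 gives slope = -kd."""
    logs = [math.log(r) for r in responses]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    r0 = math.exp(mean_y - slope * mean_t)   # back-extrapolated response at t = 0
    return -slope, r0

# Synthetic dissociation phase (hypothetical values).
kd_true = 0.02     # dissociation rate constant [1/s]
r0_true = 80.0     # response at the start of the dissociation phase [RU]
times = [float(t) for t in range(0, 81, 5)]
responses = [r0_true * math.exp(-kd_true * t) for t in times]

kd_est, r0_est = fit_kd(times, responses)
half_life = math.log(2) / kd_est             # time for half the complex to dissociate
print(f"kd = {kd_est:.4f} 1/s, R0 = {r0_est:.1f} RU, half-life = {half_life:.1f} s")
```

With noisy data the logarithm amplifies errors at low responses, so a direct non-linear fit of Equation 7 is usually the more robust choice.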

It is very difficult to determine kinetic data if analyte dissociation or association is very fast. Fast rate constants manifest themselves in sensograms with little or no curvature. Such nearly rectangular sensograms can be fitted by several exponential descriptions, and therefore the method yields data with large margins of error. In these cases it is usually better to use the so-called “steady-state” method of analysis. This method provides the KD for an interaction, but not the kinetic rate constants. To measure the KD value, the steady-state response is plotted against the analyte concentration and fitted to the Hill equation (Equation 8). The value of KD corresponds to the concentration at which the fitted plot of RU vs [A], drawn on a logarithmic concentration axis, changes its curvature (f’’(x) = 0; f’’’(x) ≠ 0) (Figure 4).

$$RU = \frac{[A]^n}{K_D + [A]^n} \tag{8}$$

where:

RU = response in steady state at a given concentration
[A] = analyte concentration
KD = dissociation constant
n = Hill coefficient (in case of a 1:1 Langmuir model n = 1)

For this type of analysis it is necessary that the signal of the sensogram indicates steady state on the surface of the sensor. To obtain reliable values, a maximal signal should also be reached, resulting in a flattening for high and low concentrations in the Hill plot.

Figure 4. Schematic illustration of the steady state analysis. The steady state analysis is on the left side and the sensograms are shown on the right. The steady state fit (red line) was done using the Hill equation.

1.2.: RNA Synthesis

Figure 5. Structures of the phosphoramidites (1-12), the activators (13, 14) and the linker group on the Controlled Porosity Glass (CPG) solid support (15) for the synthesis of DNA (1, 4, 7, 10, 13, 14, 15), 2’-OMe-RNA (2, 5, 8, 11, 13, 14, 15) and RNA (3, 6, 9, 12-15)

The chemical synthesis of oligonucleotides using a synthesizer is conducted on a solid support in the 3’ to 5’ direction and is fully automated. The development of chemically synthesized oligonucleotides made a leap forward when Beaucage et al. synthesized phosphoramidites for the synthesis of DNA [44] (Figure 5). The t-butyldimethylsilyl (TBDMS) group for the protection of the 2’-OH of the ribose ring had already been introduced earlier for RNA synthesis [45-47] (Figure 5). A second commonly used 2’-O protecting group is the [(triisopropylsilyl)oxy]methyl (TOM) protecting group [48, 49], which can be removed under milder deprotection conditions and shows a higher reactivity compared with the TBDMS analogue. This is reportedly due to the lower steric crowding of the TOM protecting group. The steric demand of the TOM protecting group is similar to that of the methoxy group of the 2’-OMe-RNA (2, 5, 8, 11). Alternative base and 2’-OH protecting groups are reviewed by Somoza [50]. The automated oligonucleotide synthesis is done in a four-step cycle (Figure 6).

Figure 6. General reaction cycle of the oligonucleotide synthesis including 1) deprotection, 2) activation and coupling, 3) capping and 4) oxidation.

1.2.1.: Oligonucleotide deprotection.

In a first step, the 5’-OH group of the coupled nucleoside is deprotected by cleaving the DMT group with 2% dichloroacetic acid (DCA) in dichloromethane (DCM) (Figure 7). The 5’-O of the ribose ring is protonated by the DCA, and the positive charge that develops upon cleavage is stabilized by the electron-rich aromatic rings of the DMT group (16).

Figure 7. Mechanism of the deprotection of the 5'-OH Group using DCA.

1.2.2.: Activation and coupling of the phosphoramidites

The key step in oligonucleotide synthesis is the coupling of the activated phosphoramidite with the CPG-attached nucleoside. In a first step, the tertiary amine of the phosphoramidite is protonated by the activator (tetrazole) (Figure 8). The phosphoramidite is then activated [51, 52] through the formation of a bond between the activator and the phosphorus, accompanied by the cleavage of the diisopropylamine group. The 5’-OH group of the support-bound nucleoside then attacks the phosphorus of the activated phosphoramidite. The tetrazole acts as a good leaving group (Figure 8), yielding the coupled nucleosides with the cyanoethyl group still intact.

Figure 8. Reaction mechanism of the tetrazole mediated activation and coupling of a phosphoramidite.

1.2.3.: Capping of the unreacted reagents.

In case of an incomplete coupling reaction, the unreacted 5’-OH group needs to be blocked to prevent the synthesis from continuing on the failed sequence. Otherwise, the full-length n-mer product would be accompanied by shortened sequences (n-1, n-2, etc.), which can be separated from the desired product only with difficulty, especially if the final product is, for example, a hairpin (around 60 nt).

Figure 9. Capping mechanism

Unreacted sequences are therefore capped using acetic anhydride and N-methylimidazole [53]. The N-methylimidazole creates a reactive intermediate (1-(1-methyl-1H-imidazol-3-yl)ethan-1-one) in situ through nucleophilic attack of the tertiary amine on a carbonyl group of the acetic anhydride. The carbonyl group of the new intermediate is attacked by an electron pair of the 5’-OH group. In a last step, the acetylated molecule is formed with expulsion of the N-methylimidazole (Figure 9).
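Why failure sequences matter for long oligonucleotides can be seen from a short calculation. The sketch below assumes a constant stepwise coupling efficiency and that the first nucleoside is already attached to the support, so that a chain of n nucleotides requires n-1 couplings; the efficiency values and the function name are hypothetical and are not measured values from this work.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains that coupled successfully in every cycle.

    Assumes a constant coupling efficiency per cycle and n - 1 couplings
    for an n-mer (the first nucleoside is pre-loaded on the support).
    """
    return coupling_efficiency ** (length_nt - 1)

# Hypothetical efficiencies for a 22 nt miRNA and a 60 nt hairpin
for eff in (0.98, 0.99, 0.995):
    short = full_length_fraction(22, eff)
    hairpin = full_length_fraction(60, eff)
    print(f"efficiency {eff:.3f}: 22-mer {short:.2f}, 60-mer {hairpin:.2f}")
```

Under these assumptions, even a coupling efficiency of 99% leaves only slightly more than half of the chains at full length for a 60 nt hairpin, which is why the capped failure sequences have to be removed efficiently during purification.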

1.2.4 Oxidation

In the final step of the reaction cycle, the P(III) is oxidized to P(V). This is typically done with iodine in water in the presence of lutidine or pyridine [53].

Figure 10. Oxidation mechanism

The iodine is attacked by an electron pair of the phosphorus, creating a positively charged intermediate. The iodide then leaves the intermediate upon nucleophilic attack of a water molecule. In the next step, the base removes the proton of the OH group, forming P(V) through formation of a P=O double bond (Figure 10).

1.2.5.: Workup and purification of oligonucleotides

The oligonucleotides are cleaved from the solid support using gaseous methylamine or aqueous ammonium hydroxide. In this process, all protecting groups except the 2’-TBDMS and the 5’-DMT groups are removed [54-56].

Figure 11. Cleavage from the solid support

The amine deprotonates the cyanoethyl group, leading to cleavage of the group and a negatively charged oxygen at the phosphorus. Afterwards, the methylamine attacks the carboxyl ester, cleaving the CPG from the 4,7-epoxyisoindole. The primary alcohol is deprotonated, forming a five-membered phosphotriester as the final product (Figure 11).

The 2’-TBDMS group is cleaved using triethylamine tris(hydrofluoride). After cleavage, the oligonucleotide is purified “DMT-on” using High Performance Liquid Chromatography (HPLC) with a reverse phase column. The remaining DMT group works as a marker, changing the retention time of the full-length oligonucleotides. The incompletely synthesized oligonucleotides do not carry the DMT group because of the capping step, and are therefore well separated by the column. After HPLC workup, the DMT group is removed with 40% aq. AcOH (Figure 7). The final product is purified again using reverse phase HPLC.

1.3.: MicroRNA biogenesis

MicroRNAs (miRNAs) are approx. 22 nt long non-coding RNAs [57, 58]. They are often highly conserved throughout species [59-61]. They function as posttranscriptional regulators of expression in many organisms by targeting messenger RNAs (mRNAs) [57, 58, 62]. They generally bind in a sequence specific manner to the 3’ UTR (3’ untranslated region) of their target mRNAs. The bases 2-8 of the miRNA are highly complementary to their target mRNAs, whereas the bases 9-22 may contain bulges and mismatches. The large number of miRNAs form a complex regulatory system, in which most miRNAs have different mRNA targets and mRNAs often have 3’ UTRs that accept different regulatory miRNA partners.
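The seed pairing described above can be sketched as a simple string search. The sketch assumes perfect Watson-Crick pairing of miRNA bases 2-8 and ignores G:U wobble pairs, site context and the pairing of bases 9-22. The let-7a sequence is the commonly cited mature sequence and is used here as an assumption; the 3’ UTR fragment and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna):
    """Reverse complement of miRNA bases 2-8, i.e. the sequence
    expected in the target mRNA (written 5' to 3')."""
    seed = mirna[1:8]  # bases 2-8 in 1-based numbering
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def find_seed_matches(mirna, utr):
    """0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]

let7a = "UGAGGUAGUAGGUUGUAUAGUU"          # assumed mature let-7a sequence
utr = "AAGCUACCUCAUUUGGCUACCUCGA"         # hypothetical 3' UTR fragment
print(seed_site(let7a))                   # sequence searched for in the UTR
print(find_seed_matches(let7a, utr))      # start positions of the matches
```

Because the seed is only seven nucleotides long, such matches occur frequently by chance, which is consistent with one miRNA having many potential mRNA targets.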

The first miRNA, lin-4, was discovered in C. elegans by Lee et al. [63] in 1993. They found that a non-coding gene produced two short RNA sequences (22 nt and 61 nt) and was able to regulate larval C. elegans development. At the same time, Wightman et al. [64] published that lin-14 was regulated by lin-4 through partially complementary binding sites in the 3’ UTR, thereby firmly establishing this novel regulatory mechanism.

1.3.1 Nuclear processing

The general process of miRNA biogenesis is understood rather well, even though many details are still unclear and additional factors may await discovery. The general process is shown in Figure 12. The first step in miRNA biogenesis is the transcription of the genes into the primary miRNAs (pri-miRNAs) by RNA polymerase II [65-67]. There are some exceptions in which the pri-miRNAs are transcribed by RNA polymerase III, e.g. in viral miRNAs and in the human C19 miRNA cluster [65, 68]. To protect pri-miRNAs from degradation, most of them are capped and polyadenylated [66] like mRNAs.

Pri-miRNAs contain a hairpin structure with a stem consisting of an approx. 33 nt stretch with strong base pairing and a single-stranded region, the terminal loop (TL). This structure is further processed by the microprocessor complex [69-72]. In humans the complex consists of the two dsRNA binding proteins, Drosha and DiGeorge critical region 8 (DGCR8). Drosha contains two RNase III domains, which cleave at the 3’ and the 5’ end of the stem, respectively, creating an approx. 2-nt overhang at the 3’ end. The cleavage sites are determined by the junction between the double-stranded RNA (dsRNA) of the stem and the ssRNA of the loop (apical junction) and by the junction between the stem and the 5’ flanking RNA sequence (basal junction). Drosha cleaves the pri-miRNA approx. 11 nt from the basal junction and 22 nt in 3’ direction of the apical junction. In this process DGCR8 functions as a molecular ruler to determine the cleavage site [73, 74]. Several studies have shown the importance of distinct motifs in a pri-miRNA. Auyeung et al. found that 78% of pri-miRs contain either a UG or a CNNC motif in the basal region or a UGUG motif in the loop region [75]. The CNNC motif is known to be recognized by SRp20 and DEAD-box RNA helicase p72 (DDX17) and to induce Dicer cleavage [76]. The resulting product is the so-called precursor microRNA (pre-miR), which is an approx. 60 nt RNA hairpin.

Figure 12. Canonical miRNA biogenesis pathway. The pri-miRNA is transcribed by RNA polymerase from the miRNA gene. It is processed by Drosha and DGCR8, forming the pre-miRNA, which is then exported into the cytoplasm by Exportin-5 and RAN-GTP. The pre-miRNA is released into the cytoplasm upon hydrolysis of the RAN-GTP to RAN-GDP. It is cleaved into the mature miRNA duplex by DICER and TRBP and subsequently loaded into the RISC complex containing an Argonaute protein amongst others. [77]

The pre-miRNA is exported into the cytoplasm through the nuclear pore complex (NPC) by Exportin 5 (EXP5) utilizing the RAN-GTP system. EXP5 recognizes the stem of the hairpin and forms a transport complex with RAN-GTP [78-80]. After transport to the cytoplasm, GTP is hydrolyzed to GDP, leading to dissociation of the complex and the release of the pre-miRNA into the cytoplasm [62].

1.3.3 Cytoplasmic processing

In the cytoplasm, the pre-miRNA is processed by Dicer through an intermediate miRNA duplex that gives rise to the mature 22 nt long miRNA [81]. Dicer is an endonuclease with several features similar to those of Drosha. It contains two RNase III-type domains (RIIID) at the C-terminal end for the catalytic cleavage [82]. The N-terminal end is a helicase and is responsible for the interaction with the terminal loop (TL) and the induction of processing [83, 84]. The Piwi-Argonaute-Zwille (PAZ) domain is separated from one of the RIIIDs by a positively charged helix which might act as a ruler for the pre-miRNA cleavage [85]. Dicer binds the 5’-phosphate and the 3’-OH groups [81, 82, 86] and cleaves the pre-miRs at a distance of 21-25 nt from the 3’ end (3’ counting rule) [81, 82, 87]. In mammals, Dicer also uses the 5’ end as a marker for cleavage and cleaves 22 nt from the 5’ end of the RNA. This rule only applies if the 5’ end is thermodynamically unstable [81]. A cofactor, TAR RNA binding protein (TRBP), also influences Dicer cleavage. It enhances Dicer processing and can create isomiRs (miRNAs with addition/deletion of 1-2 nt in the terminal region) by rearranging the structure of the hairpin [88-91].

1.3.4 The RNA induced silencing complex (RISC)

In the next step, either the 3’ or the 5’ strand of the miRNA duplex is incorporated into the RNA induced silencing complex (RISC). The RISC contains several proteins: Argonaute (AGO), a glycine-tryptophan repeat containing protein of 182 kDa (GW182), a poly(A) binding protein (PABP), CCR4-NOT and PAN2-PAN3 (Figure 13A) [92-95]. The AGO partner GW182 interacts with the PABP. The strand which is incorporated into the RISC is called the guide strand, whereas the non-incorporated strand is called the “passenger” or “star” strand. In a first step, the miRNA duplex is loaded into AGO, forming the precursor RISC [96-98]. A certain bias for the strand selection was found on the basis of the thermodynamic stability of the miRNA duplex. AGO selects the strand whose 5’-end is part of the thermodynamically less stable end of the miRNA duplex. This results in a uracil bias for the guide strand and a cytosine bias for the star strand [99-104]. This mechanism is challenged by recent studies which showed different strand concentrations depending on the tissue [105-107].

The AGO protein consists of three domains, the PAZ domain, the MID (middle) domain and the C-terminal PIWI (P-element induced wimpy testis) domain [108-110]. The 5’-phosphate group of the guide strand is bound by a pocket formed by the MID and PIWI domains [111, 112]. Uridine or adenosine is the preferred nt to be recognized [113]. The 3’ end is anchored in the PAZ domain. The two anchor points allow the RNA to keep a certain flexibility to bind to the target mRNA. The affinity of the interaction between the RISC and the target is presumably enhanced by the pre-organization of the 5’-end of the guide strand [114, 115].

The RISC complex is responsible for the regulation of mRNAs. Bases 2-8 of the guide strand recognize the target mRNA through Watson-Crick complementarity [116]. The other bases add additional interactions, but are not responsible for the target selectivity (Figure 13A).

Figure 13. A) Schematic illustration of the RISC complex and the targeting of the mRNA target. B) Process of the mediation of translational repression. C) Mechanism of the mRNA deadenylation which leads to mRNA degradation.[117]

For suppression of the mRNA target, several mechanisms have been described. In one, the target mRNA is degraded (Figure 13C). The degradation of the mRNA is enhanced through GW182. It interacts with the poly(A) binding proteins (PABP), which stabilize the poly(A) tail and thereby prevent the exonucleolytic cleavage of the mRNA. Furthermore, GW182 interacts with the deadenylase, which subsequently leads to deadenylation and mRNA decay [93-95, 118, 119] (Figure 13C). Repression of mRNA translation is a second mechanism [104, 120]. In this pathway the RISC complex prevents the assembly of the 40S and 60S ribosomal subunits, inhibits the initiation of translation at bound ribosomes and stops the elongation by disrupting the elongation factor (elFaE) [118] (Figure 13B). In addition to these two main mechanisms, one of the Argonaute proteins, AGO2, has been reported to cleave the mRNA where perfect complementarity occurs in the miRNA/mRNA duplex [111, 121, 122].

1.3.5.: Additional regulatory elements

The importance of miRNAs to mammalian development (mice) was shown through genetic deletion of Dicer. The mouse embryos with a disrupted DICER-1 gene showed severe reduction of size compared to wild type (wt) embryos at day 7.5 of embryonic development [123]. As no other function has been described for DICER, this implies that these phenotypes are likely due to a lack of processing of miRNAs. Another level of regulation in the miRNA biogenesis is a self-regulation between Drosha and DGCR8 [124]. The Microprocessor complex prevents translation of the DGCR8 mRNA by cleaving a stem loop structure in the second exon [124, 125]. The interaction between DGCR8 and Drosha also stabilizes Drosha [124]. Further regulatory elements are modifications at the posttranslational level. The Drosha/DGCR8 complex localization can be influenced by phosphorylation [126, 127] and acetylation of DGCR8 can influence the activity [128] and stability [129, 130] of the complex.

Other factors interfere with the processing to mature miRNAs at the pri- and pre-miRNA levels. One prominent factor is Lin28, which binds to the terminal loop region [131] of both pri- and pre-let-7 to suppress Drosha and Dicer processing [132, 133]. Let-7 itself reduces Lin28 levels through the miRNA pathway [134]. Therefore, let-7 and Lin28 form a feedback loop [135, 136]. Other feedback loops have been described, for example that between the alternative splicing factor SF2/ASF and miR-7. SF2/ASF enhances the Drosha processing of pri-miR-7 to pre-miR-7, whereas miR-7 targets SF2/ASF mRNA to reduce its protein levels [137].

A third example of a protein involved in miRNA processing is the heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1), which binds to the loop of pri-miR-18a through the UAGGA/U motif to induce processing [138, 139]. In contrast, the interaction of hnRNP A1 with pri-let-7a-1 inhibits its processing [140]. KSRP (KH-type splicing regulatory protein) is a component of both the Drosha and Dicer complexes and promotes the maturation of several miRNAs including let-7. The recognition occurs at the terminal loop through G-rich sequences [141, 142]. Another example of the regulation of miRNA biogenesis through the terminal loops is MCPIP1 (monocyte chemoattractant protein-induced protein 1). It cleaves the TL of some pre-miRs and therefore inhibits the production of a bona fide mature miRNA guide, leading to a degradation of the miRNAs. The miRNAs with the biggest increase in concentration upon MCPIP1 knockdown were miR-21, -26a, -146a and -155 [143].

Other proteins have also been described to regulate miRNA biogenesis through the interaction with the terminal loops, but their mechanisms remain to be clarified. FXR1P (fragile X-related protein 1) promotes the biogenesis of miR-9 and miR-124 by binding to the pre-miRs and also to Dicer [144]. TDP-43 (TAR DNA binding protein 43) increases levels of let-7b and decreases levels of miR-663 [145]. HuR (human antigen R) depletion resulted in enhanced levels of miR-7 in HeLa cells. The regulation of miR-7 is suggested to work through the binding of HuR to the intronic region hosting the miR-7 precursor [146].

1.3.6.: MiRNA in human disease

MiRNA dysfunction is correlated with several human diseases. The involvement of miRNAs in cancer has been shown in several studies [147-150]. A genome-wide study showed that 53% of all known miRNAs are found in regions connected to cancer [151]. The miRNAs play key roles in different processes in tumorigenesis [152] such as [153], proliferation [154], angiogenesis [155], migration [156] and invasion [157, 158]. MiRNAs can act as tumour suppressors (e.g. the let-7 family) [159] or as oncomiRs (e.g. the miR-17-92 cluster) in lymphoma, lung, colon and pancreatic cancer [160-162]. The interplay of let-7a and Lin28 is a well-studied example of the role of miRNAs and their regulation by proteins in cancer. Studies showed that Lin28 is overexpressed in cancer cells and that the let-7 levels are reduced [163]. Subsequently, the low let-7 levels lead to upregulation of oncogenic targets like c-MYC and K-RAS [164].

MiR-122 is an example of the role of miRNAs in viral infections. MiR-122 is a highly conserved miRNA with a full conservation of the mature sequence across 18 vertebrates [165]. The inhibition of miR-122 using an antimiR (miravirsen) was shown to reduce the levels of the Hepatitis C virus (HCV) in human HCV patients. Miravirsen was shown to inhibit the Dicer and Drosha processing of pre-/pri-miR-122 by strand invasion [166], in addition to its function of neutralizing miR-122 by hybridizing to the miRNA [167]. The suggested mechanism by which miR-122 protects viral RNA from being degraded is the recruitment of a RISC-like complex to miR-122 targets in the 5’-UTR of the viral RNA [168].

1.4.: RBFOX proteins

In mammals the RBFOX family (Feminizing on X) consists of three different family members, RBFOX1 (A2BP1), RBFOX2 (RBM9) and RBFOX3 (NeuN). RBFOX-1 is found in heart, skeletal muscle and brain [169, 170], RBFOX-2 is expressed in a wide variety of tissues [169] and RBFOX-3 is expressed in neurons [171]. All proteins of the RBFOX family are highly homologous between species and between different family members. The RNA recognition motif (RRM) of RBFOX-1 and RBFOX-2 is identical between mouse and human and shows only a slight variation between human, mouse, nematode, zebrafish and fruit fly [172] (Figure 14). Recent publications [173, 174] suggest that the N- and C-termini of RBFOX proteins play a role in their function and also in the recognition of RNA by an unknown mechanism. In general, proteins of the RBFOX family are located in the nucleus due to a nuclear localization signal (NLS) in the C-terminal highly conserved region [170, 175-178] (Figure 14B). However, immunochemical staining indicates that the RBFOX proteins are also found in the cytoplasm.

The binding of RBFOX to a sequence-specific RNA stretch was first shown in two studies [170, 179] using zebrafish RBFOX-1 and human RBFOX-1, respectively. In both studies the SELEX technique was used to show a highly selective binding of RBFOX to a (U)GCAUG sequence. This motif is highly enriched at alternative splicing sites [169, 170, 180, 181].
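Locating the (U)GCAUG element in a sequence of interest is a plain pattern search, sketched below. The sketch reports every GCAUG core, including overlapping ones, and notes whether it is preceded by the optional uridine; the input sequence and the function name are hypothetical, and DNA-style input (T instead of U) is converted as a convenience.

```python
import re

def find_fbe(rna):
    """Return (start, motif) for every GCAUG core in the sequence.

    start is the 0-based position of the G of the GCAUG core;
    motif is "UGCAUG" if the core is preceded by U, else "GCAUG".
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    # A lookahead is used so that overlapping cores (GCAUGCAUG) are all found
    for match in re.finditer(r"(?=GCAUG)", rna):
        start = match.start()
        preceded_by_u = start > 0 and rna[start - 1] == "U"
        hits.append((start, "UGCAUG" if preceded_by_u else "GCAUG"))
    return hits

example = "AAUGCAUGUCCGCAUGA"  # hypothetical sequence
for start, motif in find_fbe(example):
    print(start, motif)
```

A search of this kind only identifies candidate sites; whether a site is actually bound depends on its structural context and accessibility, which a string match does not capture.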

Figure 14. Conservation of different RBFOX family members across species. A) Map of conservation of the FOX-1 family RRM. Identical amino acids (AA) are marked in orange, similar AA are marked in yellow. B) Comparison of the identity of full-length RBFOX protein between mouse RBFOX-1, mouse RBFOX-2, zebrafish RBFOX-1 and C. elegans RBFOX-1. The RRM is marked in orange, the C- and N-terminal regions are marked in yellow. Percentages show the AA identity compared to mouse RBFOX-1. C) Comparison of the C-terminal AA sequences of the RBFOX-1 family and other proteins (hnRNP A1, hnRNP D, hnRNP F and TAP) containing the nuclear localization signal (NLS). [172]

1.4.1.: Structure of the RBFOX/RNA complex

The structure of the RBFOX RRM/UGCAUGU complex was solved in 2006 by our collaborators Auweter et al. [9] (Figure 15A) using nuclear magnetic resonance (NMR) spectroscopy. In this study the KD values of the protein/RNA complex and of several mutated RNA/RRM sequences were also determined using SPR spectroscopy. The RRM of RBFOX adopts the canonical RRM structure with a slight variation, a small two-stranded β-sheet in a loop region (Figure 16 C/D). The titration of the protein with the RNA consensus sequence (5’-U1G2C3A4U5G6U7-3’) revealed a 1:1 binding stoichiometry. The bases U5-U7 contact the canonical β-sheet, whereas the first 4 nt have contacts with two loops of the protein. Another interesting structural feature of the complex is the pucker of the sugar of the last 6 nucleosides. Even though the C3’-endo form is the most common form of RNA [182], in the RNA/RRM complex all sugars but that of U1 are in the C2’-endo form (Figure 16 C, D). The formation of the RNA/RRM complex is driven by several hydrophobic and electrostatic interactions (Figure 16 B). The four positively charged side chains (R194, K156, R127 and R184) of the protein interact with the negatively charged phosphate backbone of the RNA, and F126, F160 and H120 undergo base stacking with several bases. U1 and G2 stack with F126, the base U5 stacks with H120 and the base G6 with F160 [9]. The study also suggested a transition from a free RNA form to a bent form, although this was shown only for the short 7-mer RNA sequence and not for natural targets. This means the RBFOX RRM/UGCAUGU interaction follows a so-called induced fit model [183, 184].

F126 in the RBFOX-1 RRM also plays a crucial role in binding the RNA by an unusual binding mode. Mutations of F126 showed a strong decrease in affinity of the protein for UGCAUGU. The mutated RBFOX-1 RRM variants F126A, F126I and F126R all showed an approximate 1500-fold loss of affinity, whereas substitutions with other aromatic side chains (F126H, F126W and F126Y) showed much smaller changes (ten-fold, or less). The authors suggested that the π-stacking of the nucleobases with an aromatic side chain of the protein is essential for binding.

Comparison of the TOCSY (Total Correlation Spectroscopy) spectra of the RRMwt/RNA complex and the RRMF126A/RNA complex showed that the coupling between the hydrogens at atoms 5 and 6 (H5-H6 correlation; Figure 15 B, C) of U5 and U7 does not change, in contrast to the H5-H6 correlations of U1 and C3. Conversely, in the TOCSY spectra of the F160A mutant RRM (F160 is part of the β-sheet surface), the H5-H6 correlations of the U5 and U7 nt change drastically while the H5-H6 correlations of U1 and C3 remain nearly the same (Figure 15B). This indicated to the authors that the binding of the last three nucleotides is unaffected by the binding of the first four nucleotides and is mediated by the canonical β-sheet, whereas the binding of the last three bases has no effect on the binding of the first 4 bases. The first four nucleotides are bound to the RBFOX RRM by being wrapped around F126. The authors also mention that it is striking that in other human RBDs (531 RBDs in the Pfam database; www.sanger.ac.uk/software/pfam) the position equivalent to F126 is occupied by phenylalanine in 11%, by tyrosine in 9.8% and by tryptophan in 4.3%, all of which contain an aromatic side chain. Furthermore, the natural occurrence of those amino acids in vertebrates is considerably lower (4.0, 3.3 and 1.3%) than the numbers above. The fact that the positions equivalent to F126 of the RBFOX RRM in RRMs from different RBPs showed a certain preference for AAs with an aromatic side chain suggests to the authors that this binding mode of the nt to F126 is not an unusual mode within RNA:RRM complexes and is conserved in different proteins.
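The preference for aromatic residues at the position equivalent to F126 can be made explicit by dividing the quoted frequencies by the quoted background frequencies. The sketch below only restates the percentages given in the text; the pairing of the three background values with phenylalanine, tyrosine and tryptophan follows the order in which they are listed and is an assumption, as are the variable and function names.

```python
# Frequencies (in %) at the position equivalent to F126 in human RBDs
rrm_position = {"Phe": 11.0, "Tyr": 9.8, "Trp": 4.3}
# Natural occurrence (in %) in vertebrates; assignment by listing order (assumption)
background = {"Phe": 4.0, "Tyr": 3.3, "Trp": 1.3}

def enrichment(observed, expected):
    """Ratio of observed to expected frequency for each amino acid."""
    return {aa: observed[aa] / expected[aa] for aa in observed}

ratios = enrichment(rrm_position, background)
aromatic_total_ratio = sum(rrm_position.values()) / sum(background.values())

for aa, ratio in ratios.items():
    print(f"{aa}: {ratio:.2f}-fold over background")
print(f"all three aromatic residues: {aromatic_total_ratio:.2f}-fold")
```

Under this assumption each of the three aromatic residues is roughly three times more frequent at this position than expected from its general abundance.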

Figure 15. A) Interactions between selected bases and amino acids shown by the NMR structure. B) Overlay of sections of 2D TOCSY spectra showing the H5–H6 correlations of uracil and cytosine of solutions of 5’-UGCAUGU-3’ in the presence of one equivalent of Fox-1 (red), Fox-1 F126A (black), and Fox-1 F160A (blue). C) Structure of uracil and cytosine with numbering of the atoms. Adapted from [9].

Analysis of the NMR structure suggests that the specificity of the binding of RBFOX RRM to UGCAUGU stems from numerous inter- and intramolecular H-bonds (Figure 15A). This specificity was further probed by SPR measurements in which sensograms were recorded using immobilized RNA. U1 has one H-bond to R127 and one intramolecular H-bond to C3 (Figure 25). G2 shows three intermolecular H-bonds to I124 (2) and R184 (1) and two intramolecular H-bonds to A4, which has no further hydrogen bonds. C3 has one additional H-bond to N151 and the aforementioned intramolecular bond. U5 has one H-bond to each of N190 and T192. The most important nucleobase regarding H-bonds is G6. It has four intermolecular H-bonds (R118 (2) and T192 (2)), one H-bond to the O5 of the sugar of U5 and one intramolecular H-bond to the sugar of U7, which has no further H-bonds. The hydrogen bond pattern of U7 suggests that this position may be redundant for RNA binding to RBFOX.

Mutations of single nucleotides mostly confirmed those hypotheses [9]. The loss of free energy for one H-bond is predicted to be 4-7 kJ/mol [185]. Possible structures for single mutations in the RNA are suggested in Figure 27. The changes of the sequences of the FBE (UGCAUGU) will be shown in this work as YNX, where “Y” is the consensus base, “N” is the position within the consensus motif, and “X” is the newly introduced base. These suggestions do not account for changes in hydrophobic interactions, ionic interactions, π-stacking [186-188] and steric clashes. Furthermore, the sugar moieties, which can also play a role (U7, Figure 15A), are not considered. The ΔΔG values and the following hypotheses were put forward by Auweter et al. [9]. Compared to the UGCAUGU consensus RNA sequence, U1A and U1C yield ΔΔG values of 4.0 and 4.5 kJ/mol, respectively. This corresponds to the approximate energy loss of one H-bond. The mutation C3U leads to a ΔΔG of 14 kJ/mol, which is more consistent with a predicted loss of two H-bonds, according to Auweter et al. [9]. However, G2A shows a ΔΔG of 15 kJ/mol, which is lower than expected for the predicted loss of four H-bonds [9]. This could be explained by a strong influence of the π-stacking, for example from the higher electron density on the adenosine ring system [186-188]. The mutations A4P (P = purine) and A4I (I = inosine) lead to a loss of 5.2 and 13 kJ/mol, respectively. The A4P mutation should cause the loss of the H-bond from the exocyclic amino group, whereas A4I also loses the H-bond from the exocyclic amino function of G2 (Figure 27 wt4).

The U5C sequence was previously shown in SELEX studies to be another relatively tight binder of RBFOX [170]. SPR measurements by Auweter et al. of UGCACGU binding to the RBFOX-1 RRM show a ΔΔG of 3.9 kJ/mol [9], which indicates a loss of one H-bond. The mutation G6A is predicted to lose 4 H-bonds, which is consistent with the observed ΔΔG of 19 kJ/mol.
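The free energy differences quoted above can be related to fold changes in KD through the standard thermodynamic relation ΔΔG = RT ln(KD,mut/KD,wt). The following sketch applies this relation under the assumption of a temperature of 298.15 K, which is not stated in the text; the function names are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)

def ddg_from_fold_change(fold_change, temperature=298.15):
    """ddG in kJ/mol for a given ratio KD(mutant) / KD(wild type)."""
    return GAS_CONSTANT * temperature * math.log(fold_change)

def fold_change_from_ddg(ddg, temperature=298.15):
    """Ratio KD(mutant) / KD(wild type) for a given ddG in kJ/mol."""
    return math.exp(ddg / (GAS_CONSTANT * temperature))

# Values quoted in the text, converted under the assumed temperature
print(f"1500-fold loss of affinity: {ddg_from_fold_change(1500):.1f} kJ/mol")
for label, ddg in (("U1A", 4.0), ("U5C", 3.9), ("C3U", 14.0), ("G6A", 19.0)):
    print(f"{label}: {fold_change_from_ddg(ddg):.0f}-fold weaker binding")
```

Because the relation is logarithmic, the loss of a single H-bond (4-7 kJ/mol) already corresponds to a several-fold weaker binding, while a ΔΔG close to 19 kJ/mol corresponds to a change of more than three orders of magnitude at this temperature.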

Figure 16. Overview of the solution structure of the RBD of RBFOX-1 in complex with UGCAUGU. (A) Overlay of the final 30 structures superposed on the heavy atoms of the structured parts of the protein and of the RNA. The protein backbone is gray, the RNA backbone is orange, the phosphate groups are red, and the RNA bases are yellow. Only the ordered region of the protein (residues 116–194) is shown. (B) Surface (heavy atoms of residues 116–194) and stick (heavy atoms of the RNA) representation of the lowest energy structure. The protein surface is painted according to surface potential with red indicating negative charges and blue indicating positive charges. The RNA is colored as in panel (A). (C) The lowest energy structure in ribbon (protein backbone) and stick (RNA) representation. The color scheme is the same as in (A); important protein side chains involved in hydrophobic interactions with the RNA are represented as green sticks. (D) Same as (C) but rotated by 90° around the indicated axis [9].

1.4.2.: RBFOX in alternative splicing

The most studied function of the RBFOX family is the regulation of alternative splicing [172, 175, 181, 190-197]. A UGCAUGU stretch that is essential for alternative splicing was first reported in 1994 [198], before any RBFOX protein was characterized. In this study the inclusion of the EIIIB exon into the rat fibronectin (FN) gene was elucidated, and the importance of the UGCAUG repeat for alternative exon inclusion was shown for the first time. Several copies of this repeat were found approximately 500 nt downstream of the exon. The authors also stated that these repeats are well preserved (seven of nine) in human. When the sequence was mutated to UGACUG (mutated) or AGUCGU (scrambled), the exon was no longer included into the mRNA, neither in HeLa cells nor in F9 (teratocarcinoma) stem cells. They also examined data from an older study [199] involving the splicing of the calcitonin/calcitonin gene-related peptide (CGRP) mRNA. HeLa and thyroid C cells include exon 4, whereas F9 cells include exons 5 and 6. By mutating an area which is approximately 700 nt upstream of exon 4, exon 4 was included into the mRNA. These mutations removed one or more GCAUG repeats. Two later studies confirmed the significance of these observations [198]. One of them investigated the influence of the (U)GCAUGU repeats on alternative splicing [200], while the other investigated the influence of RBFOX-1 and RBFOX-2 upon the splicing of the CGRP mRNA [201].

Figure 17. A) General mechanism of the inhibition of exon inclusion upon binding of RBFOX in the upstream intronic flanking region (UIF). B) General mechanism of the exon inclusion upon binding of RBFOX in the downstream intronic flanking region (DIF). C) A model for repression of prespliceosome complex formation by the Fox-1 family. The repression of calcitonin-specific exon 4 of calcitonin/CGRP pre-mRNA in neuronal cells by the Fox-1 family involves two distinct regulatory events. First, the -34 element in the UIF region prevents E’ complex formation through repressing SF1 binding to the branch point. Second, the +45 exonic element blocks transition to E complex via inhibiting U2AF65 binding to the polypyrimidine tract (PY). [189]

In an independent study it was shown that two UGCAUGU repeats occur in the cis-acting RNA structure which is essential for N30 exon inclusion into the myosin II heavy chain-B (MHC-B). The element is relatively far (roughly 1.5 kb) downstream of the N30 exon. The included exon is only present in a neuron-specific isoform [202] of the protein.


Taken together, these studies suggest a position-dependent function of RBFOX-induced alternative splicing [172]. If the FOX binding element (FBE) is in the upstream intronic region, the exon is generally excluded, and if the FBE is in the downstream intronic region, the exon is generally included into the mature mRNA (Figure 17). This position-dependent mode of action is also described for two other alternative splice factors, NOVA [203] and PTB [204].
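The position rule described above can be written down as a small lookup. The following Python sketch is purely illustrative: the function name, the choice of the hexamer UGCAUG as the FBE and the "ambiguous" outcome for FBEs on both sides are assumptions made for this example, and real splicing outcomes depend on many additional factors.

```python
FBE = "UGCAUG"  # core FOX binding element used in this sketch (assumption)


def predict_exon_fate(upstream_intron: str, downstream_intron: str) -> str:
    """Apply the simplified position rule to the introns flanking an exon."""
    has_upstream_fbe = FBE in upstream_intron.upper()
    has_downstream_fbe = FBE in downstream_intron.upper()
    if has_upstream_fbe and has_downstream_fbe:
        return "ambiguous"   # the simple rule makes no prediction here
    if has_upstream_fbe:
        return "exclusion"   # FBE upstream of the exon: exon generally skipped
    if has_downstream_fbe:
        return "inclusion"   # FBE downstream of the exon: exon generally included
    return "no FBE"
```

For example, `predict_exon_fate("CCCC", "AAUGCAUGAA")` reports inclusion because the only FBE lies in the downstream intron.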

Several studies on alternative splicing by RBFOX have used a minigene construct encoding parts of the calcitonin gene and calcitonin gene-related peptide (CGRP) as a model. Zhou et al. showed that RBFOX-1/2 prevent one component of the splicing machinery, SF1 (splicing factor 1), from binding to the branch point of the target pre-mRNA [205, 206] and thereby, in a first step, repress the formation of the E’ complex (Figure 17C). In a second step, RBFOX-2 prevents the binding of Tra2 (transformer-2 protein) and SRSF6 (serine/arginine-rich splicing factor 6) to the splicing enhancers and therefore also inhibits the formation of the E complex (early complex) (Figure 17C) [189]. The E’ and E complexes are assembled early in the formation of the spliceosome, which removes introns during splicing [207-211]. A second study by Fukumura et al. focused on exon 9 of F1 as a model system. In this system RBFOX-1 inhibits exon inclusion by preventing the splicing of intron 9. This splicing is dependent on U2-snRNP (U2 small nuclear ribonucleoprotein). Fukumura and Sun et al. showed that the C-terminal domain of RBFOX-1 was important for splicing activity, whereas the N-terminal domain was not necessary for exon repression [174, 212, 213]. They tested the importance of the RRM for splicing by replacing the RRM with an MS2 coat protein. The MS2 coat protein is a protein from the MS2 bacteriophage which recognizes a 21 nt stem loop (GCGUACACCAUCAGGGUACGC) [214, 215], which was inserted downstream of exon 7 in a human SMN2 (survival of motor neuron 2) minigene [216]. The fusion protein could still promote splicing when the RRM was replaced and the C-terminal region of RBFOX was intact. This suggests a modular system for RBFOX, with the C-terminal end as the functional part and the RRM as the part for binding to the target [174].


1.4.3.: Variation in the recognition of RNA by RBFOX proteins

Importantly for our project, only 30-50% of all RBFOX-2 binding sites contain the consensus motif UGCAUGU [197, 218]. A recent study [217] revealed a broader spectrum of binding motifs for RBFOX using a new method, RNA Bind-n-Seq, in which 25 additional binding motifs (Table 1) were detected. Most of the new motifs show variations on the terminal bases (U1 and U7), as well as the U5C variation (numbering as above), which was also shown by SELEX studies to be well tolerated. Motifs with other variations are found at entries 15, 17, 20, 22 and 23 in Table 1.

Rank  Sequence    Rank  Sequence
1     UGCAUG      14    AUGCAU
2     GCAUGC      15    GAAUGC
3     GCAUGU      16    UGCAUC
4     GCAUGA      17    GCUUGC
5     AGCAUG      18    GCACGA
6     UGCACG      19    UGCAUU
7     CGCAUG      20    GAAUGU
8     AGCACG      21    GCAUUU
9     GCAUGG      22    UGCUUG
10    GGCAUG      23    GAAUGA
11    GCACGU      24    GCAUCU
12    UUGCAU      25    AGCAUC
13    GCACGC

Table 1. Ranking of new RNA binding motifs from Lambert et al. [217].

In another study [197], two further motifs were revealed: [CAG]C[AU]CAC and UGUGUG. In my view, both motifs seem doubtful. The first shows a high redundancy and does not feature the two key Gs in the sequence. The GUGUG motif is a known binding element of the splicing factor SUP-12 (human orthologue RBM38), which is known to interact coordinatively with RBFOX [194, 219, 220]. This complex is partially stabilized through electrostatic interactions between RBFOX and SUP-12 [221]. RBM38 is also expressed in mouse embryonic stem cells (mESC), which were used to determine the UGUGUG motif [222, 223], and in addition GUGUG is the second most frequent pentamer sequence within conserved regions in proximity to alternative splice sites [219]. It is therefore possible that this motif does not originate from RBFOX binding to the RNA stretch, but is an artefact occurring close to an RBFOX consensus motif, both of which are enriched in proximity to alternative splice sites. It would be of interest to evaluate the flanking RNA sequence of the UGUGUG motif for FBEs and to investigate whether SUP-12 is also bound to the RNA under the experimental conditions used by Jangi et al. [197], e.g. using MALDI (matrix-assisted laser desorption/ionization) mass spectrometry.
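The proposed check of the sequence flanking each UGUGUG site for FBEs could be sketched as follows. This is a minimal Python illustration and not an analysis that was performed in this work: the function name, the use of the pentamer GCAUG as the FBE core and the default window of 50 nt are assumptions chosen for the example.

```python
import re

FBE_CORE = re.compile("GCAUG")     # core of the FOX binding element
GU_MOTIF = re.compile("UGUGUG")    # motif reported in [197]


def gu_sites_near_fbe(rna: str, window: int = 50):
    """Return (start of UGUGUG hit, True if a GCAUG starts within `window` nt).

    Hits are found with non-overlapping regex matching; DNA input is
    converted to RNA so that genomic sequence can be passed directly.
    """
    rna = rna.upper().replace("T", "U")
    fbe_starts = [m.start() for m in FBE_CORE.finditer(rna)]
    results = []
    for hit in GU_MOTIF.finditer(rna):
        near = any(abs(pos - hit.start()) <= window for pos in fbe_starts)
        results.append((hit.start(), near))
    return results
```

A UGUGUG site for which the flag is `True` would be a candidate for a signal that originates from a neighbouring FBE rather than from the UGUGUG stretch itself.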


1.4.4.: MicroRNA regulation of RBFOX-3

In addition to the alternative splicing function of the RBFOX family, it was recently shown that RBFOX-3 alters the processing of pri-miRs to pre-miRs in P19 cells and that it also interacts with pri-miRs in mouse neural tissue [173, 175]. In a first step, the authors of the study performed a PAR-CLIP experiment in both P19 (embryonal carcinoma) cells and mouse neural tissue. Brain and the upper part of the spinal cord were taken from mice which had been injected with 4-thiouridine. The P19 cells were also labelled with 4-thiouridine in a conventional fashion. P19 cells and the mouse neural tissue were treated according to the general PAR-CLIP method [224] (1.5.: Detection of new RNA binding motifs). In the P19 cells, 9% (399) of the CLIP sites were designated as pri-miR clusters. The in vivo experiment with mouse neural tissue likewise showed RBFOX-3 binding to 157 pri-miR clusters, of which 90% overlapped with the miR clusters from P19 cells.

Changes in the expression levels of 97 miRNAs in P19 cells upon RBFOX-3 knockdown were probed using a microarray. The levels of 34 miRs were decreased upon RBFOX-3 knockdown and the levels of ten miRNAs were increased. The hairpin loci of 68% of the affected miRNAs were also targets of RBFOX in PAR-CLIP experiments. The authors took this as an indication of a correlation between RBFOX-3 binding to pri-miR loci and the expression of the corresponding miRNAs. They next focused on miR-15a and miR-485, taking miR-214 as a negative control for further investigations. Notably, none of those miRs contains a consensus FBE. Nevertheless, the microarray results for the three miRs (miR-15a, -485 and -214) were confirmed in an additional experiment, in which either an RBFOX-3 expression vector or an empty vector was transfected into P19 cells. RBFOX-3 expression caused increased levels of miR-15a and decreased levels of miR-485, whereas miR-214 showed no significant change. In a further step, they conducted a RIP (RNA immunoprecipitation) experiment in which they pulled down the RBFOX-3/RNA complex using an antibody against RBFOX-3, followed by PCR with pri-miR specific primers. The experiment showed an enrichment of pri-miR-15a and pri-miR-485 but not of pri-miR-214. To determine the binding sites of RBFOX, they mutated different positions in the loop and the stem of pri-miR-15a and pri-miR-485. Mutations extending base pairing in the stem close to the loop, deletions in the loop and mutations in the loop reduced binding of RBFOX-3 to pri-miR-15a. Mutations in the stem, especially in the region terminal to the loop, decreased the binding of RBFOX-3 to pri-miR-485.
The level at which RBFOX-3 regulates the biogenesis of miRs was determined by an in vitro processing assay, using nuclear extracts from undifferentiated P19 cells (no RBFOX-3 expression), differentiated P19 cells (RBFOX-3 expression) and differentiated P19 cells which had been pre-treated with an shRNA (short hairpin RNA) against RBFOX-3 (no RBFOX-3 expression). Pre-miR-15a was only produced in extracts containing RBFOX-3, whereas pre-miR-485 production was reduced upon RBFOX-3 expression. Finally, in a pulldown experiment, they showed enhanced recruitment of the Drosha-Dgcr8 complex to pri-miR-15a upon RBFOX-3 addition, whereas pri-miR-485 showed the opposite effect. They therefore concluded that RBFOX-3 regulates pre-miR levels by regulating the recruitment of the Drosha-Dgcr8 complex.

1.5.: Detection of new RNA binding motifs

Interactions between RBPs and RNAs may occur through predominantly structure-specific (e.g. DICER [225], AGO [121, 226], Vts1 [227], dsRBD3 from [76, 228]) or through sequence-specific [229] (Lin28 [230, 231], RBFOX family [170, 232], GLD-1 [233, 234]) contacts, although often both are expected to play important roles [235]. Knowledge of the interaction partners of an RBP enables us to surmise the function of the protein. For example, proteins which bind to miRNAs may play roles in their biogenesis, their stability [236, 237] or their cell/tissue localization [238]. To identify the specific RNA sequences with which RBPs interact, two main methods are used: SELEX (Systematic Evolution of Ligands by Exponential Enrichment) [170, 232, 239-241] and CLIP (cross-linking immunoprecipitation) procedures and variations thereof [242-244].

1.5.1. Systematic Evolution of Ligands by Exponential Enrichment (SELEX)

Aptamers are short oligonucleotides which bind with strong affinity to their target. In SELEX a randomized pool of chemically synthesized RNA or DNA is used to bind the query protein. The pool consists of a library of about 10¹³-10¹⁵ sequences, which carry common 5’ and 3’ ends as primer targets that flank a region of 20-80 random nucleotides. PCR is then used in iterative cycles to enrich for sequences which bind the protein favorably. The primers are also needed for the transcription of the DNA pool into the RNA pool in RNA SELEX experiments. Most experiments use a library with a completely randomized core sequence. However, if a specific structural element is known to bind the target, e.g. a stem loop structure, certain constant regions can be included. This was, for example, the approach taken by Davis et al. [245] to find aptamers binding to guanosine triphosphate (GTP). They introduced a 12 nt stem structure in the center of a 52 nt randomized sequence [246]. For RNA SELEX experiments, the DNA library is converted into an RNA library [247] using bacteriophage T7 polymerase [248-250]. Proteins or other targets to be queried for binding are then incubated with the library. Relatively high concentrations/amounts of the target are needed, which is a major drawback of the technique (Liu et al. used 10⁻⁸ M of EGR-1 (early growth response protein 1) for SELEX [251]). Aptamer-bound and unbound proteins are then separated using one of several different methods: sepharose or agarose [252, 253], ultrafiltration through nitrocellulose filters [239, 254] or magnetic beads [255-258]. Magnetic beads are coated, for example, with antibodies to capture the protein of interest or with streptavidin to bind biotin-labelled targets. The beads contain a ferromagnetic substance, which allows them to be separated using a magnetic stand. Ultrafiltration uses filters with a “size cut-off”, which retain molecules and complexes exceeding a certain size. Ultrafiltration of the SELEX mixture can lead to a non-specific enrichment of RNA interacting with the nitrocellulose filters. The aptamers bound to the target are eluted afterwards and amplified using PCR (DNA SELEX) or RT-PCR (RNA SELEX). The new pool is then prepared for a new round of SELEX. In the case of RNA SELEX, the DNA pool from RT-PCR has to be transcribed into RNA using T7 RNA polymerase, followed by RNA purification. In DNA SELEX the two strands of the double-stranded DNA have to be separated. A common method is the use of biotinylated antisense primers, leading to biotinylated antisense DNA strands and non-biotinylated template strands [259]. The dsDNA is captured using streptavidin and the strands are separated by heating or NaOH. The strand of interest is thereby released and can be used in the next SELEX round.
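The relation between pool size and the length of the random region can be made concrete with a short calculation. The Python sketch below is only a back-of-the-envelope illustration (the function name is an assumption): it asks up to which random-region length a pool could, at best, contain every possible sequence once.

```python
def max_fully_covered_length(pool_size: float) -> int:
    """Largest random-region length n for which 4**n <= pool_size."""
    n = 0
    while 4 ** (n + 1) <= pool_size:
        n += 1
    return n


# Number of distinct sequences for a few random-region lengths.
sequence_space = {n: 4 ** n for n in (20, 25, 40, 80)}
```

With pools of 10¹³ to 10¹⁵ molecules, complete coverage is only conceivable up to roughly 21 to 24 random nucleotides; longer random regions are necessarily sampled sparsely.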

The enrichment procedure is repeated several times, reducing the diversity of the aptamer pool. In each cycle the separation steps have to be adapted in order to capture only the strongest binders (Figure 18). The resulting aptamers are sequenced and then aligned to reveal common motifs, which represent the RNA binding motifs.


Figure 18. Scheme of the SELEX procedure. Each experiment uses a randomized RNA or DNA pool with about 10¹⁵ different sequences. In a first step the target is incubated with the RNA/DNA pool. After washing away unbound RNA/DNA molecules, the remaining molecules are enriched using PCR. These steps are repeated several times (6-20). After the final round of amplification, the aptamers are sequenced and analyzed. [241]

Various modifications to each of the experimental steps of the SELEX procedure exist and are comprehensively reviewed by Stoltenburg et al. [241] and Marshall et al. [260]. The SELEX technique has many positive aspects: it can be used for a wide variety of targets, automation is available, and several modifications [241] are available, e.g. photo-SELEX, in which the aptamer is crosslinked to the target [261]. However, it also has several drawbacks, including:

- Lack of a general protocol for each step, e.g. the number and the nature of the washing steps and the washing buffer have to be adapted for every experiment;
- It is a very time-consuming method;
- Thermodynamic (ΔG) or kinetic (ka, kd) data are not obtained from the method;
- Differences in PCR efficiency between binders can alter their ratios and therefore influence the results of each experiment by changing the number of sequence reads;
- The number of different RBM sequences is highly dependent on the number of SELEX rounds. Too many rounds of SELEX may lead to a narrow aptamer pool, whereas too few SELEX cycles can lead to a pool containing a large variety of weak binders. This might have happened for RBFOX, for which SELEX found a highly specific consensus sequence [170, 179] whereas PAR-CLIP showed a broad sequence specificity [173].
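The effect of unequal PCR efficiencies on the ratio of two binders can be estimated with an idealized amplification model. The following Python sketch assumes that each cycle multiplies the copy number by (1 + efficiency); the function names and the example efficiencies of 0.95 and 0.90 are assumptions for illustration and are not measured values.

```python
def amplified_copies(start_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized PCR: every cycle multiplies the copies by (1 + efficiency)."""
    return start_copies * (1.0 + efficiency) ** cycles


def ratio_after_pcr(eff_a: float, eff_b: float, cycles: int,
                    start_a: float = 1.0, start_b: float = 1.0) -> float:
    """Ratio of sequence A to sequence B after amplification."""
    return (amplified_copies(start_a, eff_a, cycles)
            / amplified_copies(start_b, eff_b, cycles))


# Two binders that start at equal abundance but amplify slightly differently.
bias_20_cycles = ratio_after_pcr(0.95, 0.90, cycles=20)
```

In this model a difference of only five percentage points in efficiency already shifts an initial 1:1 ratio to roughly 1.7:1 after 20 cycles, and the distortion compounds over successive SELEX rounds.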

1.5.2 Cross-linking immunoprecipitation (CLIP) based methods

Several CLIP-based approaches were developed recently to identify the RNA motifs to which RNA binding proteins bind. The two main CLIP procedures, HITS-CLIP (high-throughput sequencing cross-linking immunoprecipitation) [262-264] and PAR-CLIP (photoactivatable-ribonucleoside-enhanced crosslinking immunoprecipitation) [265, 266] (Figure 20), are described briefly below.

In the first step of a HITS-CLIP experiment [262-264] cells are irradiated at 254 nm to form covalent bonds between the RNA and proteins. The protein of interest and the covalently linked protein-RNA complexes are then extracted from cell lysates using immobilized (e.g. on magnetic beads) antibodies against the protein of interest. Unbound RNA is subsequently removed with stringent washing methods. The bound RNA is partially digested before RNA adapters are ligated to the 3’ ends of the RNA bound to the protein (adapter ligation), and then the protein is degraded. Next, 5’-adapter sequences are ligated to the RNA. The RNA is reverse transcribed by reverse transcriptase (RT) into cDNA (complementary DNA) using primers to the adapters. The pool of cDNA obtained is amplified using PCR and then subjected to high-throughput sequencing. The resulting DNA sequences are mapped to the genome to locate the gene sequence giving rise to the RNA that binds to the protein of interest.

In PAR-CLIP experiments cells are treated with 4-thiouridine (4-SU) or 6-thioguanosine (6-SG) (Figure 19), which is incorporated into newly transcribed RNA of the cell. Thanks to these artificial nucleosides, mild irradiation of the cells at 365 nm crosslinks RNA to proteins. The irradiation products of the bases with amino acids containing aromatic side chains are shown in Figure 19. The following steps are identical to those of HITS-CLIP. In addition to the information provided by HITS-CLIP, the PAR-CLIP method reveals which base crosslinks to the protein, because the cross-linked base gives rise to a mutated sequence: the base crosslinked to the protein induces a mutation in the cDNA during reverse transcription (Figure 6). Use of 4-SU gives rise to a C in the PCR products because the crosslinking reaction changes the secondary amine of the base to a tertiary amine (Figure 19). This leads to a new U-G pairing. Thus, the crosslinking of the 4-SU to an aromatic amino acid results in the incorporation of guanosine by the reverse transcriptase into the cDNA opposite the uridine [266]. Afterwards, a normal GC base pair is created through PCR (Figure 19). In contrast, in the HITS-CLIP experiment irradiation at 254 nm leads to a cross-linked product which still accepts adenosine as the base-pairing partner. Using 6-SG, on the other hand, leads to a G to A mutation in the final PCR product. This is accompanied by a lower mutation rate of about 26%, compared with about 90% for 4-SU [266, 267]. The observation for 6-SG cannot easily be explained by the cross-linked complex (Figure 19), and the mechanism of the mutation is therefore still unknown. The mutation after PCR amplification at the nucleoside which is cross-linked to the protein can yield single-nt resolution of where the RNA binds to the protein. When the sequence reads are aligned with the genome, the mutations in the reads in comparison with the genome show the precise site of crosslinking.

Figure 19. Scheme of the mechanism of U to C transition, adapted from Ascano et al [266]. Aromatic amino acid residues are marked in blue.
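How the diagnostic T to C transitions mark crosslink sites after alignment can be illustrated with a few lines of Python. The sketch below is a simplified assumption-laden example, not a published pipeline: it presumes an ungapped read that is already aligned to the same strand of the reference, written in the DNA alphabet, and the function name and `offset` parameter are hypothetical.

```python
def t_to_c_sites(reference: str, read: str, offset: int = 0):
    """Return reference positions where a T in the genome is read as C.

    `offset` is the genomic coordinate of the first aligned base, so the
    returned positions are genomic coordinates of putative crosslink sites.
    """
    sites = []
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base == "T" and read_base == "C":
            sites.append(offset + i)
    return sites
```

Positions reported for many independent reads at the same coordinate would point to a crosslinked uridine, whereas isolated mismatches are more likely sequencing errors.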


The CLIP procedures offer several advantages over SELEX. The methods can be applied to nearly all targets for which antibodies are available. In addition, no RNA or DNA library synthesis is necessary. The most significant advantage is that the system works on living cells and, in the case of HITS-CLIP, even in different tissues (if the target is expressed), and that the methods yield genome-wide results. However, CLIP methods also have some limitations. For example, the crosslinking efficiencies are typically low (1-5%) [263] and crosslinking is also biased [268]. The low crosslinking efficiencies mean that RNA targets of proteins expressed at low levels may be difficult to detect [263]. The accuracy of the results can also be affected by the ligation efficiency of the adapters. Furthermore, it is reported that T4 RNA ligase has a base-dependent bias for adapter ligation [269]. Finally, too extensive RNase digestion can lead to a loss of bona fide RNA binding sites [270].


Figure 20. Scheme of the workflow of the HITS-CLIP and PAR-CLIP procedures. For PAR-CLIP the cells are treated with the modified nucleoside. Afterwards the cells are irradiated at 365 nm (PAR-CLIP) or 254 nm (HITS-CLIP). After lysis, partial digestion and immunoprecipitation, 3’-adapter ligation is carried out. The protein is digested using proteinase, followed by 5’-adapter ligation. The RNA is reverse transcribed. For PAR-CLIP the modified base bound to the amino acid can lead to a transition from thymidine to cytidine (4-SU) or from guanosine to adenosine (6-SG), or no mutation can occur (read-through). For HITS-CLIP the binding site can either lead to a deletion or not affect the transcription. The cDNA is afterwards amplified using PCR, followed by high-throughput sequencing. Adapted from [271].


Aim of the project

We intended to develop a new general method for determining RNA binding motifs of RBPs without some of the disadvantages of the aforementioned techniques. We chose SPR spectroscopy to analyze the binding of a library of all possible 5-mer RNAs to a protein of interest. Such a method should be generally applicable. It can use a wide variety of immobilized proteins, using specific labelling (biotin, HIS-tag), immobilization with an antibody directly from the cell lysate or direct chemical immobilization using EDC/NHS (1.1 Surface Plasmon Resonance spectroscopy (SPR)). If the protein is only present in low concentrations, a plasmid can be used to overexpress the protein in the cell. The resulting binding events can only be due to sequence-specific interactions, since 5-mers should not possess secondary or tertiary structure. An advantage of SPR spectroscopy measurements is that preliminary information on the binding kinetics of the binders is obtained directly upon analysis or visual inspection of the binding response. A high complex stability, which appears as a slow dissociation rate in an SPR spectroscopy experiment, is especially important for potential drug targets [272-275]. The dissociation rate constant can easily be extracted from kinetic measurements, and a preliminary estimate can also be obtained from an assay aiming for a “yes/no” answer for the binding of the analyte to the ligand. The approach requires synthesis of a full library of all possible pentaribonucleotides, consisting of 1024 (n = 4⁵) different sequences. As a model system we chose the RBFOX family of proteins, since it is well studied, binds strongly to a well-defined motif, and the structure of the RNA/FOX RRM complex is known [9].

For a proof of concept study we designed a dedicated library for RBFOX. As the RBFOX-2 RRM binds to the UGCAUGU sequence as two independent tetranucleotide and trinucleotide motifs, we designed an RNA library in which the first four nucleotides are kept constant and the last three bases are varied (UGCANNN) and vice versa (NNNNUGU).


2. Results and Discussion

2.1.: ELISA Assay of RBFOX against pre-miRNAs

Harry Towbin in our lab developed an enzyme-linked immunosorbent assay (ELISA) to determine interactions between proteins and pre-miRNAs [276]. He used this assay to probe for new interactions between RBFOX-2 and pre-miRNAs (Figure 21) using a library of 95 human pre-miRNAs (Supplementary Table 2), which were selected for their conservation of the terminal loop (70) [277], in addition to a further 25 pre-miRNAs of interest in disease-related areas. The hairpins were synthesized with and without a 3’-biotin label by Dr. P. Wenter using 2’-O-TOM ([(triisopropylsilyl)oxy]methyl) synthesis chemistry [278]. A 384-well plate was coated with RBFOX RRM and the biotinylated hairpins were added afterwards. Horseradish peroxidase attached to streptavidin was then incubated with the plate. To determine the amount of hairpin bound, enhanced chemiluminescence substrate was added, which reacts with the peroxidase to yield a chemiluminescence signal that is a measure of the amount of analyte bound to the target. In another setup, using full-length native RBFOX as the analyte, a 384-well plate was coated with streptavidin, on which biotinylated hairpins were immobilized. HeLa cell lysate was added subsequently and incubated. Afterwards, RBFOX-2 antibody was added, followed by a peroxidase-conjugated secondary antibody. After addition of enhanced chemiluminescence substrate, the chemiluminescence was measured to indicate binding.

The results are shown in Figure 21B and C. A comparison of the screen obtained using the full-length RBM9 (RBFOX-2) protein with that of the RBFOX RRM showed that the latter gave a smaller number of signals (Figure 21 C, B), of which six pre-miRNAs showed strong binding (hsa-miR-32, -20b, -373, -107, -9-2, -9-1). Three of those binders (miR-32, -107, -20b) contained the consensus core motif GCAUG; the other binders could not be rationalized by any known consensus binding motif of the RBFOX family determined by SELEX. On the other hand, the results from the ELISA using the HeLa lysate yielded a more complex picture (Figure 21 C). The three pre-miRs with the GCAUG consensus motif are in positions 1, 3 and 8 of Table 2. Several miRNAs showed large effects in the screen: pre-miR-19a, -32 and -1-2. Pre-miR-19a was a rather weak binder against the RRM, but was the second strongest binder in the HeLa lysate-based assay. Pre-miR-1-2 was also a rather strong binder in the HeLa screen, but not for the recombinant protein. Pre-miR-15a showed no binding, although it is reported to interact through its terminal loop [173] with RBFOX-3, which contains the same RRM as RBFOX-2 [172], to enhance microprocessor recruitment. The overall differences between the results of the screens could have been the result of several factors. Additional binding events could be explained through binding of partner proteins of RBFOX, such as RBM38 (human ortholog of SUP-12) [194, 219-221], which is known to coordinatively bind RNA with RBFOX to regulate alternative splicing. The weaker binding seen with some miRNAs in the RBFOX-2 HeLa screen might have been due to competition with other factors binding to the RNA.


Figure 21. A) Schematic illustration of the RBFOX pre-miRNA ELISA. B) Normalized binding intensity of pre-miRNAs against immobilized RBFOX RRM. C) Normalized binding intensity of pre-miRNAs to RBFOX-2 from HeLa cell lysates (experiments were performed by Dr. Harry Towbin).


        HeLa screen                              Recombinant RBFOX RRM
Rank    Pre-miR           Norm. chemilum.        Pre-miR           Norm. chemilum.
1       hsa-miR-20b       0.81                   hsa-miR-107       0.89
2       hsa-miR-19a       0.72                   hsa-miR-32        0.58
3       hsa-miR-107       0.65                   hsa-miR-9-1       0.51
4       hsa-let-7i        0.60                   hsa-miR-9-2       0.40
5       hsa-miR-1-2       0.59                   hsa-miR-373       0.20
6       hsa-miR-25        0.50                   hsa-miR-20b       0.19
7       hsa-let-7g        0.49                   hsa-miR-181a-2    0.11
8       hsa-miR-32        0.45                   hsa-miR-15a       0.10
9       hsa-miR-181b-1    0.43                   hsa-miR-181b-1    0.10
10      hsa-miR-29c       0.41                   hsa-miR-299       0.08
11      hsa-miR-140       0.39                   hsa-miR-1-2       0.08
12      hsa-miR-103a-1    0.33                   hsa-miR-96        0.08
13      hsa-miR-181a-2    0.29                   hsa-miR-592       0.07
14      hsa-miR-134       0.29                   hsa-miR-604       0.07
15      hsa-miR-138-2     0.28                   hsa-miR-19a       0.07
16      hsa-miR-373       0.27                   hsa-miR-30c-1     0.06
17      hsa-miR-9-2       0.26                   hsa-miR-190       0.05
18      hsa-miR-148a      0.25                   hsa-miR-25        0.05

Table 2. Ranking of miRNAs binding to RBFOX-1 RRM and full length RBFOX from HeLa lysate. Chemiluminescence was normalized to the strongest measured binder. Bold marks the binders containing the GCAUG consensus sequence. Red and blue mark the pre-miRNAs that only occur as a strong binder in one of the ELISA formats.
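The comparison of the two rankings in Table 2 can be made explicit with a few lines of Python. The sketch below only re-uses the names listed in Table 2 (without the hsa- prefix); the variable and function names are assumptions, and the code is an illustration rather than the analysis that was originally performed.

```python
# Pre-miRNAs in the order given in Table 2 (strongest binder first).
hela_screen = [
    "miR-20b", "miR-19a", "miR-107", "let-7i", "miR-1-2", "miR-25",
    "let-7g", "miR-32", "miR-181b-1", "miR-29c", "miR-140", "miR-103a-1",
    "miR-181a-2", "miR-134", "miR-138-2", "miR-373", "miR-9-2", "miR-148a",
]
rrm_screen = [
    "miR-107", "miR-32", "miR-9-1", "miR-9-2", "miR-373", "miR-20b",
    "miR-181a-2", "miR-15a", "miR-181b-1", "miR-299", "miR-1-2", "miR-96",
    "miR-592", "miR-604", "miR-19a", "miR-30c-1", "miR-190", "miR-25",
]


def rank(name, ranking):
    """1-based rank of a pre-miRNA in a ranking, or None if it is absent."""
    return ranking.index(name) + 1 if name in ranking else None


shared = [m for m in hela_screen if m in rrm_screen]
hela_only = [m for m in hela_screen if m not in rrm_screen]
rrm_only = [m for m in rrm_screen if m not in hela_screen]

# (rank in HeLa screen, rank in RRM screen) for pre-miRNAs found in both lists.
rank_pairs = {m: (rank(m, hela_screen), rank(m, rrm_screen)) for m in shared}
```

Large differences between the two ranks of a shared pre-miRNA, as for pre-miR-19a, single out the candidates for which binding partners or competing factors in the lysate are most likely to play a role.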

For many miRNAs, the results from the assays could not be explained by the conventional consensus motif. These data therefore served as a strong justification for the development of a new method to identify binding motifs for RBPs using SPR spectroscopy and an RNA library of short motifs.


2.2: Synthesis of two RNA libraries

We synthesized two libraries. The first contained all possible 5-mers (4⁵ = 1024 sequences). The second was a dedicated library for the RBFOX RRM, in which we kept the first part of the motif intact and varied the second (UGCANNN; 4³ = 64 sequences) and then vice versa (NNNNUGU; 4⁴ = 256 sequences). This resulted in a library of 320 sequences.

To enumerate all possible sequences we wrote a Python program (Supplementary Figure 2).
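The enumeration itself is straightforward. The following Python sketch is not the program from Supplementary Figure 2; it is a minimal illustration using the standard library, and all names in it are assumptions.

```python
from itertools import product

BASES = "ACGU"


def all_kmers(k: int):
    """Return every RNA sequence of length k in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=k)]


# Library 1: all possible pentaribonucleotides (4**5 = 1024).
pentamers = all_kmers(5)

# Library 2: dedicated RBFOX library, UGCANNN (64) plus NNNNUGU (256).
dedicated = ["UGCA" + tail for tail in all_kmers(3)] \
          + [head + "UGU" for head in all_kmers(4)]

# Note: the consensus UGCAUGU is generated by both patterns in this sketch,
# so the 320 entries correspond to 319 distinct sequences.
distinct_dedicated = set(dedicated)
```

The two lists together contain 1024 + 320 = 1344 entries, which matches the total number of sequences in both libraries.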

For the synthesis of both RNA libraries we used a MerMade 192 from BioAutomation following standard solid phase synthesis procedures [279]. This comprised a universal support from Chemgenes, 2’-O-tert-butyldimethylsilyl (TBDMS) phosphoramidites from Thermo Scientific, 5-benzylthio-1H-tetrazole (BTT) from CarboSynth and 96-well synthesis plates from Orochem. The synthesis followed the four-step cycle described previously (1.2.: RNA Synthesis). For the purification of the oligoribonucleotides, we used an Agilent HPLC system with reverse phase columns from Waters.

For handling the library it was essential to be able to track easily which sequence was in which well of the plate after HPLC purification. The problem was that we sometimes observed two peaks or very broad peaks, so that two wells were occasionally needed for one crude product. In cooperation with Uwe Kirchhoff we developed a macro for the HPLC software so that each defined well of a 96-well plate would contain the desired product, while byproducts or excess product would be diverted into defined wells of other plates. Two HPLC-based purifications were carried out, one with the DMT group present, the other after deprotection with 40% acetic acid (AcOH) (Figure 7). We measured the mass of approx. 10% of the purified library sequences to confirm their identity. In a final step, we quantified the synthesized sequences at 260 nm using a Spectramax M2 from Molecular Devices on UV-transparent Costar plates. We calculated the extinction coefficients of the different sequences using the nearest neighbor method [280] and from these calculated the concentrations of the solutions (Supplementary Table 4, Supplementary Table 5).
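The quantification step combines the nearest neighbor estimate of the extinction coefficient with the Beer-Lambert law. The Python sketch below shows the calculation in its commonly used form (sum of the dinucleotide terms minus the sum of the internal mononucleotide terms). The function names are assumptions, the parameter tables are deliberately left to the caller because no published values are reproduced here, and the optical path length in a microplate well generally differs from 1 cm and has to be supplied.

```python
def nearest_neighbor_epsilon(seq: str, dimer_eps: dict, monomer_eps: dict) -> float:
    """Estimate the extinction coefficient of an oligonucleotide at 260 nm.

    dimer_eps   -- extinction coefficients of dinucleotides, keyed e.g. "UG"
    monomer_eps -- extinction coefficients of single nucleotides, keyed e.g. "U"
    Both tables must come from the literature and use the same units
    (typically L mol^-1 cm^-1).
    """
    seq = seq.upper()
    if len(seq) == 1:
        return monomer_eps[seq]
    pair_sum = sum(dimer_eps[seq[i:i + 2]] for i in range(len(seq) - 1))
    internal_sum = sum(monomer_eps[base] for base in seq[1:-1])
    return pair_sum - internal_sum


def concentration_molar(a260: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a260 / (epsilon * path_cm)
```

Given the absorbance of a well and the extinction coefficient of its sequence, `concentration_molar` returns the concentration in mol/L, which can then be converted to the units used in the supplementary tables.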

The synthesis was successful for about 90% of the sequences, which is rather low for such short sequences. Yields of the sequences varied widely, ranging from 1-237 nM (Supplementary Table 5). I assumed that the problem in synthesis efficiency was due to a technical error of the machine, i.e. variable vacuum pulses for the mixing of the reaction solution and the CPG in different wells.

We successfully synthesized two libraries with 1344 sequences in total, in sufficient amounts for screening for new RBMs of RBPs. All sequences that were analyzed by HPLC-MS showed the correct mass, so we can be confident that the wells of the libraries contain the assigned sequences.

2.3: Immobilization of the RBFOX RRM on a biosensor

To develop a universally applicable method for the detection of new RBMs, it is important to be able to immobilize the ligands reliably and reproducibly on the biosensor [281]. Our collaborator Dr. Erich Michel used a cell-free system to express amino acids 109-208 [281] of human RBFOX-2 containing a HIS tag [282] and a recognition sequence (AviTag) for the E. coli biotin ligase BirA [283], which in the presence of D-biotin attaches a biotin group specifically at the lysine of the recognition sequence (GLNDIFEAQKIEWHE). The product was purified using nickel-nitrilotriacetic acid (Ni-NTA) affinity chromatography, in which the resin interacts with the histidines of the HIS tag. The protein was cleaved from the HIS tag and eluted from the column using TEV (Tobacco Etch Virus) protease. This method of expression and purification worked well for the RBFOX RRM, but not for other proteins of interest in the project, such as GLD-1 [284, 285] or GLA-3 [286]. GLD-1 is a protein in C. elegans which promotes the silencing of translation of the tra2 mRNA. GLA-3 is a zinc-finger containing protein from C. elegans. We were interested in these proteins from our collaborators. In addition, the method to express biotinylated proteins is rather time-consuming, especially due to the purification procedures. We developed a new practical method to site-specifically label RBFOX RRM with biotin. The method is described in detail in the manuscript draft (6.2.: Rapid high-yield cell-free expression of quantitatively biotinylated proteins), attached as an appendix. One of the time-consuming steps in the conventional preparation method is the purification of the labeled protein. The necessity for tedious purification steps could be circumvented by capturing the target protein onto the surface of the biosensor directly from the cell lysate in which the labeling is performed.
This was initially not possible using the conventional route [281], since BirA biotinylates the endogenous biotin carboxyl carrier protein (BCCP), which would then also be deposited onto the biosensor chip surface. We circumvented this problem by using a bacterial S30 cell extract depleted of BCCP prior to labeling of the desired protein, in our case the RBFOX RRM. The BCCP-depleted cell extract was created by incubating S30 extracts with a streptavidin-(chitin-binding domain)3 fusion protein (SA-(CBD)3). The fusion protein uses the high affinity [287] of the streptavidin/biotin system (fM affinity) to quantitatively capture biotin-labeled BCCP. This SA-(CBD)3/BCCP complex is then removed from the lysate using chitin-coated magnetic beads (see 1.5.: Detection of new RNA binding motifs). This BCCP-depleted S30 extract was used to express the AviTag-fused protein of interest [281] in the presence of BirA and D-biotin, leading to selective biotinylation of the protein of interest, free of biotinylated by-products. We found it best to use the AviTag at the C-terminal end of the protein to avoid capturing N-terminal aborted translation products onto the biosensor surface, which could show unspecific interactions with the analyte.

The cell extract was filtered to remove cellular debris and was injected over a streptavidin-coated biosensor. As a comparison we used purified biotinylated FOX RRM and measured the affinity against the UGCAUGU sequence. Both the purified and the non-purified protein yielded good and reproducible sensograms. Figure 22 shows one example sensogram each for the measurements with the purified and with the non-purified RBFOX-1 RRM as a ligand against UGCAUGU.

Figure 22: SPR analysis of RNA binding to C-terminally biotinylated Fox (109–208). The biotinylated Fox construct was immobilized on the streptavidin-coated sensor surface either after purification (A) or directly from the crude cell-free reaction mixture (B). The binding experiments were recorded at 25 °C in SPR buffer (10 mM HEPES at pH 7.4, 200 mM NaCl, 3.4 mM EDTA) in a concentration series of 100, 50, 25, 12.5, 6.25, 3.13, 1.56, 0.78, 0.39, 0.2, 0.1 and 0.05 nM of the 5’-UGCAUGU-3’ RNA analyte. All injections were measured as duplicates and the resulting sensograms were fitted with a 1:1 Langmuir model that includes mass transfer and double referencing

The obtained kinetic data was highly reproducible and gave similar results for both cases (Table 3), and was also in agreement with data from literature [9].

Experiment            Coating [RU]   ka [M^-1 s^-1]   kd [s^-1]   KD [nM]
1st Purified Fox-1    172            5.2E6            0.027       5.2
2nd Purified Fox-1    165.5          1.33E6           0.0077      5.78
3rd Purified Fox-1    169.4          2.19E6           0.008       3.67
4th Purified Fox-1    151.6          3.4E6            0.0156      4.65
1st Crude Mix Fox-1   181.9          2.9E6            0.0128      4.45
2nd Crude Mix Fox-1   180.1          1.52E6           0.0079      5.2
3rd Crude Mix Fox-1   172.7          1.20E6           0.00577     4.81
4th Crude Mix Fox-1   164.1          1.52E6           0.00641     4.2

Table 3. Results of four independently conducted SPR concentration series experiments of the interaction between Fox-1(109–208) and 5’-UGCAUGU-3’ RNA. The target protein Fox-1(109– 208) was immobilized either from purified samples (purified Fox-1) or directly from the crude reaction mixture (crude mix Fox-1). All injections were measured as duplicates.
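
The equilibrium dissociation constants in Table 3 follow from the fitted rate constants through KD = kd/ka. The short Python sketch below recomputes KD for each row as a consistency check; the values are copied from Table 3, while the function and variable names are chosen here for illustration only.

```python
# Rate constants copied from Table 3: (label, ka [M^-1 s^-1], kd [s^-1], reported KD [nM]).
TABLE_3 = [
    ("1st purified", 5.2e6, 0.027, 5.2),
    ("2nd purified", 1.33e6, 0.0077, 5.78),
    ("3rd purified", 2.19e6, 0.008, 3.67),
    ("4th purified", 3.4e6, 0.0156, 4.65),
    ("1st crude mix", 2.9e6, 0.0128, 4.45),
    ("2nd crude mix", 1.52e6, 0.0079, 5.2),
    ("3rd crude mix", 1.20e6, 0.00577, 4.81),
    ("4th crude mix", 1.52e6, 0.00641, 4.2),
]


def dissociation_constant_nM(ka, kd):
    """Return KD = kd / ka, converted from M to nM."""
    return kd / ka * 1e9


for label, ka, kd, reported in TABLE_3:
    computed = dissociation_constant_nM(ka, kd)
    print(f"{label:14s} KD = {computed:5.2f} nM (reported: {reported} nM)")
```

The recomputed values agree with the tabulated KD values to within rounding of the reported rate constants.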

To optimize the conditions of the cell-free biotinylation, we varied the concentration of D-biotin in the S30 extract (0, 2.5, 5, 7.5, 10, 12.5 and 15 µM) and tested the effect of the biotin concentration on the coating of the biotinylated protein on the SPR chip surface. We injected a purified RBFOX RRM solution as a control, and cell lysates containing biotinylated RBFOX RRM for which the biotinylation was done at different D-biotin concentrations. After each coating step, we injected a 2 M NaCl solution three times to remove attached RNA from the RRM. As expected, the purified RBFOX signal was constant upon injection of NaCl, in contrast to a reduced signal for the non-purified protein. The signals from the coated non-purified RBFOX-1 RRM increased with decreasing biotin concentration. This can be rationalized by unreacted biotin remaining in the cell extracts.

Figure 23. SPR analysis of the direct immobilization of C-terminally biotinylated RBFOX-1 (AA 109–208) from the crude reaction mixture onto a streptavidin-coated SPR biosensor chip. The cell free reactions for production of the biotinylated target protein were carried out in presence of various amounts of biotin and each reaction was individually injected over the SPR biosensor surface which is indicated by color coding. The immobilization of purified biotinylated Fox-1 served as a reference. Blue arrows indicate the time points of sample injection into the channels of the sensor chip.

For low biotin concentrations, the free biotin should be nearly depleted from the extract. The capture of the biotinylated RBFOX RRM therefore faces almost no competition from free biotin. In contrast, for the 15 µM biotin concentration we can expect a large excess of biotin, so that the capture of a free biotin molecule is more likely than the capture of an RBFOX RRM. Because of its lower mass, one bound biotin does not lead to the same change in signal as the coating of one biotinylated FOX RRM, which we can observe in the sensogram (Figure 23).

The new method drastically reduced the effort required for the immobilization. Once the S30 extract is obtained, the biotinylation is done within 2.5-4 h, and the product can be applied in an SPR experiment, yielding the same result as purified protein. The cost of this method is negligible: the materials for a 100 µL reaction cost less than 0.5 CHF and yield enough material to coat several hundred SPR chips. The only drawback compared with the method including purification is the rather imprecise coating of the protein: the resulting signal is a combination of the coated protein and the attached RNA, and the bound RNA therefore has to be removed to determine the amount of protein bound to the surface.

2.4.: Screen for new RNA binding motifs of RBFOX RRM

The 320 sequences of the RRM RNA library were diluted to a uniform concentration of 1 µM in HEPES buffer. For SPR measurements we used a Mass-1 from Sierra Sensors. The chip was coated with biotinylated RBFOX RRM to a level of approximately 200 RU. Before, after and between plate measurements, a standard (UGCAUGU at 75 nM) was measured for normalization and as a control for the surface activity. Each analyte was measured for 2 minutes, followed by NaCl regeneration. The data were double-referenced (referenced to a buffer injection as well as to a streptavidin-coated surface) using the “Analyzer” software from Sierra Sensors. Each sensogram reached steady state (Figure 24a), which is essential for the data analysis.

Figure 24. Results of the RNA/RBFOX RRM screen. a) 256 sequences of NNNNUGU. b) 64 sequences of UGCANNN at a concentration of 1 µM. c) All sequences with normalized RU values plotted. A threshold of 7 RU (red line) was arbitrarily chosen as a significant binding event.

When the response (RU) of analytes at the same concentration is normalized to the mass of the analyte, the response can be used as an approximate measure of the binding affinity, provided that binding is at steady state. When the surface of the chip was fully saturated (signal at RUmax), however, we could not rank the binding affinity of individual sequences. In such cases, kinetic measurements at several concentrations were conducted. To rank the sequences by signal response, we extracted the response of each analyte shortly before the end of the analyte injection. We normalized the response to the mass using UGCAUGU (2181.36 g/mol) as the reference, and also normalized to the active surface loading by using the reference signal with the reference channel (Supplementary Table 1). The normalized responses were plotted to give an overview of the results (Figure 24C). As a cut-off for a meaningful response we chose a value of 7 RU, as indicated by the red line (Figure 24C). The sensograms were separated into different groups. In Figure 24A we see one very prominent binder (GGCAUGU; RU 13.5), followed by a group of 5 sequences (UGCAUGU, AGCAUGU, CGCAUGU, UGCAUGU, GGCUUGU) and then a second binder, GGAAUGU (6.7 RU). Interestingly, we also observed sensograms with obviously slow kd values (marked orange in Table 4). In Figure 24B we observed two distinct groups: one sequence at 7 RU (UGCAAGC) and seven other sequences (UGCAUGA, UGCAUGG, UGCACGG, UGCACGU, UGCACGA, UGCAUGC, UGCACGC).
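
The normalization described above can be sketched in a few lines of Python. The exact formula used for the screen is not given in this section, so the sketch assumes a simple proportional correction: the response is scaled by the ratio of the reference mass (UGCAUGU, 2181.36 g/mol) to the analyte mass, and then divided by the relative activity of the surface, estimated from the standard injections. All function and parameter names are hypothetical.

```python
REFERENCE_MW = 2181.36  # g/mol, molecular weight of UGCAUGU as given in the text
THRESHOLD_RU = 7.0      # cut-off for a meaningful response (red line in Figure 24C)


def normalize_response(response_ru, analyte_mw, standard_ru, standard_ru_initial):
    """Normalize a steady-state response to analyte mass and surface activity.

    Assumption: the SPR response scales linearly with analyte mass, and the
    surface activity scales linearly with the response of the standard
    injection relative to the first standard injection.
    """
    mass_corrected = response_ru * REFERENCE_MW / analyte_mw
    surface_activity = standard_ru / standard_ru_initial
    return mass_corrected / surface_activity


def is_hit(normalized_ru, threshold=THRESHOLD_RU):
    """Return True if the normalized response reaches the chosen cut-off."""
    return normalized_ru >= threshold


# Hypothetical example: an analyte slightly heavier than the reference,
# measured on a surface that has lost 10% of its initial activity.
value = normalize_response(8.0, 2220.0, standard_ru=9.0, standard_ru_initial=10.0)
print(round(value, 2), is_hit(value))
```

Such a ranking is only meaningful below saturation, which is why saturated sequences were followed up with kinetic measurements.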

Most of those 19 sequences contain the GCAUG consensus motif or the high-confidence GCACG core motif known from SELEX experiments [170]. Nevertheless, the experiment yielded three new core motifs: GCUUG, GAAUG and GCAAG. Variations in the terminal nucleotides are well tolerated, as expected from former studies [9, 170].

Entry   Sequence   RU (normalized)   KD [nM]        ka ×10^6 [M^-1 s^-1]   kd [s^-1]            ΔΔG [kJ/mol]
1       UGCAUGA    13.4              1.6 ± 0.1      4.85 ± 2.8             4.24 ± 2.3 ×10^-3    -1.67
2       UGCAUGC    11.6              4.18 ± 2.7     4.42 ± 2.57            11.4 ± 1.3 ×10^-3    0.67
3       UGCAUGU    11.5              3.18 ± 1.9     6.13 ± 4.5             1.3 ± 2.6 ×10^-3     0.00
4       UGCAUGG    13.4              2.6 ± 0.26     4.08 ± 1.9             1.04 ± 4.6 ×10^-3    -0.49
5       UGCACGA    12.5              3.34           9.3                    3.12 ×10^-2          0.12
6       UGCACGG    12.7              4.4            11.6                   5.1 ×10^-2           0.79
7       CGCAUGU    10.2              11.4 ± 6.8     7.48 ± 0.58            0.133 ± 0.041        3.11
8       AGCAUGU    11                20.4 ± 4.3     3.21 ± 1.35            5.95 ± 1.3 ×10^-2    4.53
9       UGCACGC    11.1              15.5           5.23                   8.18 ×10^-2          3.86
10      UGCACGU    11.8              35.8 ± 10.3    3.54 ± 2.45            9.8 ± 4.9 ×10^-2     5.90
11      GGAAUGU    7.5               188 ± 85.3     1.79 ± 0.93            0.28 ± 0.1           9.94
12      GGGUUGU    4.8               1174.5 ± 674   0.045 ± 0.0026         5.14 ± 2.47 ×10^-2   14.41
13      GGCUUGU    9.8               1475 ± 948     0.462 ± 0.42           0.34 ± 0.2           14.96
14      AGGUUGU    3.6               872            0.0247                 2.39 ×10^-2          13.68
15      UGCUUGU    4.5               226 ± 11.5     1.26 ± 0.34            0.28 ± 0.06          10.39
16      CGGUUGU    5.5               1022 ± 12.5    500 ± 499              500 ± 498            14.07
17      UGCCUGU    5.4               140 ± 31.2     1.55 ± 0.14            0.22 ± 0.068         9.22
18      GGCAUGU    15.1              19.4 ± 3.5     6.61 ± 3.7             0.33 ± 0.20          4.41
19      UGCAAGC    6.9               2330           650                    0.152                16.08

Table 4. Summary of screening data and kinetic measurements of the screen hits.

The sequences of interest were further tested in kinetic SPR spectroscopy measurements to validate the findings and to determine the binding affinities, using a chip surface coated with about 120 RU of RBFOX-1 RRM. The resulting kinetic constants were converted into the change in free energy (ΔΔG) with UGCAUGU as the reference:

\[ \Delta\Delta G = -RT \cdot \ln\left( \frac{K_D^{\mathrm{Ref}}}{K_D^{\mathrm{Analyte}}} \right) \]

For the analysis, we assumed no major changes in the structure of the RNA sequence, and we considered only the H-bonds and, in part, the size of the π-system for base stacking, as done in the paper by Auweter et al. [9]. We did not consider any steric clashes or changes in the structure of either the protein or the RNA. Changes in ionic and hydrophobic interactions are also not accounted for. The loss of one H-bond was assumed to yield a ΔΔG of 4-7 kJ/mol according to the literature [185].
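
As a worked illustration of this conversion, the following Python sketch computes ΔΔG from two KD values and translates it into an approximate number of lost H-bonds using the 4-7 kJ/mol range quoted above. The temperature used for the tabulated values is not stated in this section; the default of 298.15 K (the 25 °C given for the binding experiments in Figure 22) is an assumption, so the output can deviate slightly from the values in Table 4. Function names are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def delta_delta_g(kd_analyte_nM, kd_reference_nM, temperature_K=298.15):
    """Return ddG = -RT * ln(KD_ref / KD_analyte) in kJ/mol.

    Positive values mean weaker binding than the reference.
    The default temperature is an assumption (25 degrees C).
    """
    return -R_KJ * temperature_K * math.log(kd_reference_nM / kd_analyte_nM)


def hbond_estimate(ddg_kj, per_bond_kj=(4.0, 7.0)):
    """Return (min, max) number of lost H-bonds for a given ddG."""
    low, high = per_bond_kj
    return ddg_kj / high, ddg_kj / low


KD_REFERENCE = 3.18  # nM, UGCAUGU (Table 4, entry 3)
for sequence, kd in [("UGCAUGA", 1.6), ("UGCACGU", 35.8), ("GGAAUGU", 188.0)]:
    ddg = delta_delta_g(kd, KD_REFERENCE)
    fewest, most = hbond_estimate(ddg)
    print(f"{sequence}: ddG = {ddg:+.2f} kJ/mol, about {fewest:.1f} to {most:.1f} H-bonds")
```

A negative ΔΔG, as for UGCAUGA, indicates tighter binding than the reference, in which case the H-bond estimate is not meaningful.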

The four binders with variations in the last base (Entries 1-4) all yielded very similar ΔΔG values, which is in agreement with the known RRM/RNA structure [9] (Figure 15A), where the Watson-Crick edge of the base is not involved in H-bonding. Variation at the fifth position to a C (Entry 10), as known from SELEX experiments [170], was also well tolerated and showed no significant loss of binding. When the fifth position was changed to a C and the last position was mutated in addition, it is curious that we find a strong loss of binding for the pyrimidines in the last position (Entries 9, 10) but not for the purines (Entries 5, 6). This could indicate an influence of the stacking interactions, since the π-system of pyrimidine bases is smaller than that of the purines (Supplementary Figure 1). The new motif GGAAUGU shows a loss of affinity of 9.94 kJ/mol, which roughly corresponds to a loss of 2 H-bonds. The motif GGCUUGU has an energy loss of 14.96 kJ/mol, correlating with a loss of 2-3 H-bonds. The motif UGCAAGC is the weakest binder from the screen, with an energy loss of 16.08 kJ/mol, which may also represent a loss of 2-3 H-bonds.

With the three new core motifs, GCUUG, GAAUG and GCAAG, we tried to rationalize the hits from the ELISA assay (2.1.: ELISA Assay of RBFOX against pre-miRNAs). In the FOX RRM screen (Figure 21 B), with a cut-off of 0.04 normalized chemiluminescence (value of the 18th strongest binder, hsa-pre-miR-25) (Table 2), we observed a 9-fold enrichment for both GCAUG and GAAUG compared with the probability of the motif occurring by chance; 26% of the 18 pre-miRNAs contain either one of those motifs. With a more stringent cut-off (top 10 binders), we saw a 25.8-fold enrichment of the GCAUG motif compared with the probability of the motif occurring by chance, and 30% of the binders contained that motif. The GAAUG motif had only a 2.9-fold enrichment and was present in 10% of the binders. The other two motifs, and also the GCACG motif (known from SELEX experiments), did not feature significantly in this analysis.

The analysis of the data from the HeLa lysate screen (Figure 21 C) provided more hits. When using a cut-off for the 18 highest binders (0.24 norm. chemiluminescence), the GCAUG motif occurred in 22% of the binders and GAAUG in 17% (13-fold enrichment); in addition, GCAAG occurred in 6% of the RNAs (1.4-fold enrichment). Overall, 44% of the top 18 binders had one of the binding motifs. When analyzing the top 10 binders (0.4 norm. chemiluminescence), GCAUG occurred in 30% of the RNAs, as did GAAUG, which is a 25.8-fold enrichment. Furthermore, the GCAAG motif was present in 10% of the binders (2.9-fold enrichment). In 70% of cases, the top 10 binders harbored one of the three motifs. Even though GCAUG and GAAUG yielded the same number of pre-miRNA binders, they differ in affinity when the chemiluminescence is taken as a measure of binding strength. The pre-miRNAs containing GAAUG as the binding element show only about 90% of the normalized chemiluminescence of the pre-miRNAs containing the GCAUG sequence (average chemiluminescence GCAUG: 0.639; GAAUG: 0.581).
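
A motif enrichment of this kind can be sketched as the fraction of hits containing a motif divided by a background fraction. The text compares against the probability of the motif occurring by chance, whose calculation is not given here; the Python sketch below swaps in an empirical background (the fraction of all screened sequences containing the motif) as a stand-in. The sequences in the example are invented toy data, not pre-miRNAs from the screen, and all names are hypothetical.

```python
def fraction_with_motif(sequences, motif):
    """Return the fraction of sequences that contain the motif at least once."""
    if not sequences:
        return 0.0
    return sum(motif in seq for seq in sequences) / len(sequences)


def fold_enrichment(hits, background, motif):
    """Return the fraction in hits divided by the fraction in the background set.

    Returns None when the motif is absent from the background, because
    the ratio is undefined in that case.
    """
    background_fraction = fraction_with_motif(background, motif)
    if background_fraction == 0:
        return None
    return fraction_with_motif(hits, motif) / background_fraction


# Toy data (invented): 2 of 4 hits and 1 of 10 background sequences carry GCAUG.
toy_hits = ["AAGCAUGAA", "CCCCCCCC", "UUGCAUGUU", "GGGGGGGG"]
toy_background = ["AAGCAUGAA"] + ["ACACACACAC"] * 9
print(fold_enrichment(toy_hits, toy_background, "GCAUG"))
```

With only 10 or 18 hits, a single additional motif-containing sequence changes the percentage by 10 or about 6 points, so small differences in enrichment should be read with caution.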

HeLa lysate   Top 18 binders                                         Top 10 binders
Motif         Fold enrichment   pre-miRs containing the sequence     Fold enrichment   pre-miRs containing the sequence
GCAUG         /                 22%                                  25.8              30%
GAAUG         13                17%                                  25.8              30%
GCUUG         /                 /                                    /                 /
GCAAG         1.4               6%                                   2.9               10%

RBFOX RRM     Top 18 binders                                         Top 10 binders
Motif         Fold enrichment   pre-miRs containing the sequence     Fold enrichment   pre-miRs containing the sequence
GCAUG         9                 26%                                  25.8              30%
GAAUG         9                 26%                                  2.9               10%
GCUUG         /                 /                                    /                 /
GCAAG         /                 /                                    /                 /

Table 5. Statistical analysis of the new FBE in the pre-miRs from the ELISA assay.

Figure 25. Possible interactions of sub-structures for the new screening motifs. Amino Acids are marked in green, nucleic acids in blue.

Figure 26. Sensogram of 7 mer RNA sequences from the screen against RBFOX. The chip surfaces were coated between 100 and 140 RU using the Amine Chip from Sierra sensors. Each concentration was measured in duplicates in a 1:1 dilution series. The data were fitted using Scrubber in a 1:1 binding model including mass transfer limitations.

2.5.: Single base variants of the consensus RBFOX binding element

The data from the screen suggested new RNA motifs to which the RBFOX RRM binds. It also confirmed the findings from a different study using the “RNA Bind-n-Seq” method [217]. It suggested certain flexibility upon binding of RNA to the FOX RRM. We decided to test the influence of each possible single base mutation of the consensus FBE for its affinity against RBFOX RRM (Figure 27).

To determine the contribution of each base to binding of the RRM to the RNA, we synthesized all possible combinations of the 7 mer consensus sequence with a single mutation, except for the two guanosines, since they were crucial in our experiments and also in the experiments of others [9, 170, 217]. This led to a series of 15 RNAs (Table 6). The data was correlated with the possible structures shown in Figure 27. The biosensor surface was coated with 100-140 RU RBFOX RRM using the biotin/neutravidin system. After each injection of analyte, the surface was regenerated using 1M NaCl, and after each NaCl injection a buffer injection was performed for double referencing. The data were analyzed using Scrubber. The sensograms are shown in Figure 28.

All possible mutations in the first position (Entries 20-22) led to a loss of about 4 kJ/mol in binding. This indicated a possible loss of one H-bond. For these mutations we have to consider the base stacking with F126, as reported [9]. The π-stacking could be enhanced, since adenosine has a larger aromatic ring system than the uridine base and also a higher electron density. For the U1C mutation, one carbonyl group is exchanged for an NH2 group. This exchange leads to a loss of one H-bond and could also affect other interactions, such as hydrophobic interactions and π-stacking.

Entry   Sequence   ka ×10^6 [M^-1 s^-1]   kd [s^-1]            KD [nM]             ΔΔG [kJ mol^-1]
20      AGCAUGU    3.21 ± 1.35            5.95 ± 0.013         20.4 ± 4.3          4.53
21      CGCAUGU    7.48 ± 0.58            0.133 ± 0.041        11.4 ± 6.8          3.11
22      GGCAUGU    6.61 ± 3.7             0.33 ± 0.20          19.4 ± 3.5          4.40
23      UGAAUGU    0.82                   0.52                 636                 12.99
24      UGGAUGU    1.05                   0.5                  474                 12.19
25      UGUAUGU    0.47                   0.554                1180                14.42
26      UGCCUGU    1.55 ± 0.14            0.22 ± 0.068         140 ± 31.2          9.22
27      UGCGUGU    1.1 ± 0.15             0.4 ± 0.17           390.35 ± 205.65     11.50
28      UGCUUGU    1.26 ± 0.34            0.28 ± 0.06          226 ± 11.5          10.39
29      UGCAAGU    0.68 ± 0.035           0.46 ± 0.036         666.5 ± 91.5        12.99
30      UGCACGU    3.54 ± 2.45            9.8 ± 0.049          35.8 ± 10.3         5.90
31      UGCAGGU    0.83                   0.122                1460                14.93
32      UGCAUGU    6.13 ± 4.5             1.3 ± 2.6 ×10^-3     3.18 ± 1.9          0.00
33      UGCAUGC    4.42 ± 2.57            11.4 ± 1.3 ×10^-3    4.18 ± 2.7          0.66
34      UGCAUGG    4.08 ± 1.9             1.04 ± 4.6 ×10^-3    2.6 ± 0.26          -0.49
35      UGCAUGA    4.85 ± 2.8             4.24 ± 2.3 ×10^-3    1.6 ± 0.1           -1.67

Table 6. List of 15 ssRNAs with a single mutation marked in red and the kinetic data. The change in Gibbs free energy is calculated as described above with UGCAUGU sequence as the reference.

The variations at position 3 led to losses of 12-14 kJ/mol, consistent with the loss of 2-3 H-bonds. Variations at position 4 yielded a similar loss to the variations at position 3, with a difference in energy of 9-12 kJ/mol. The variation A4C abolished the intramolecular GA mismatch base pair of the wt consensus sequence [9], with the loss of 2 H-bonds. The U5C variation has been shown to be well tolerated in SELEX experiments against RBFOX; it was one of the strong binders in the experiment by Jin et al. [170]. In our experiments, we observed 5.90 kJ/mol weaker binding compared with the UGCAUGU sequence. This loss in energy corresponds to the loss of one hydrogen bond. Mutations to adenosine (U5A) and guanosine (U5G) showed a similarly weakened complex (ΔΔG of 12.99 and 14.93 kJ/mol, respectively). In the last position, the identity of the base appears to be unimportant, which is in agreement with the NMR structure of the RBFOX/UGCAUGU complex, in which the seventh base does not participate in any hydrogen bonding; only the sugar moiety is involved in H-bonding [9].

Figure 27. Possible structures of 7-mer RNA binding to RBFOX RRM, based on the UGCAUGU/RBFOX RRM structure from Auweter et al.

Our screens and the follow-up experiments showed a variety of new RNA motifs to which the RRM of RBFOX bound with nM affinities. In comparison with the consensus UGCAUGU sequence, the screen showed a high flexibility at the terminal bases. This is in agreement with the RNA/RBFOX structure [9], SELEX experiments [170, 179] and other recent studies [197, 217]. Also, variations in the bases at positions between the two guanosines were tolerated, which is also in agreement with literature [217]. The changes in affinity could partially be explained through the loss of H-bonds, when using the difference in Gibbs free energy.

Figure 28. Sensograms of RNA sequences containing one variation in comparison with the UGCAUGU motif. Data were measured in duplicates in a 1:1 dilution series starting from a concentration of 2000 nM. The curves were fitted using a 1:1 Langmuir binding model including mass-transport limitations.

It is interesting that the loss of energy was mostly independent of the substituted base but more dependent on the position of the mutation. The only exception was the U5C mutation, which was about 3-fold stronger than the other mutations.

2.5.1.: Use of dimethyl-cytidine (dMC) and monomethyl-cytidine (mMC) to gain insights into RBP/RNA recognition

To elucidate the importance of the exocyclic amino function in position 3 of the UGCAUGU sequence, which forms an intramolecular H-bond to U1 (Figure 27 wt1/wt3), we used (di)methyl-cytidine modifications, which we had previously applied to disturb base pairing in an RNA/RNA duplex without introducing a new nucleotide sequence [288] (6.: Manuscripts, 6.1.: Development of a RNA negative control). The dimethyl-cytidine (dMC) modification (Figure 29B) was expected to disturb the natural H-bonding of this group completely, and might also introduce steric clashes, further reducing the affinity for its protein target. In contrast, the monomethyl-cytidine (mMC) (Figure 29B) should be able to maintain one H-bond and should avoid potential steric clashes through a partial rotation of its C-N bond.

Sequence     ka [M^-1 s^-1]           kd [s^-1]          KD [nM]
UGCAUGU      (6.11 ± 1.6) ×10^6       0.013 ± 0.0026     3.18 ± 1.9
UGmMCAUGU    (1.965 ± 0.63) ×10^5     0.22 ± 0.078       1122 ± 37.5
UGdMCAUGU    1.27 ×10^6               0.44               348.00

Table 7. Kinetic data from SPR measurements of UGCAUGU, UGmMCAUGU and UGdMCAUGU against RBFOX RRM

We synthesized UGmMCAUGU and UGdMCAUGU and tested the affinity of the sequences with RBFOX RRM as the ligand. Both RNA sequences yielded a signal in SPR spectroscopy (Figure 29A), although both modifications resulted in a decrease in affinity compared with the FBE. The KD value increased by a factor of about 400 (mMC) and about 100 (dMC), which was mainly the result of faster dissociation rates, which increased by factors of 15 and 33 (mMC and dMC, respectively). For the dMC modification, the association rate changed less than the dissociation rate: we observed only a 5-fold reduced association rate for dMC, whereas mMC showed a 30-fold reduced association rate. This was in agreement with the expected disturbance of the contact between the RNA and the protein due to the methylation of the exocyclic amino function. The smaller effect on the association rate is also in agreement with the observation that complex formation is driven by several hydrophobic and electrostatic interactions [9], which should be unaffected by the introduced modifications. The affinity constants were in a similar range to those of sequences with other nucleotides at position 3 (C3A, C3G, C3U). This suggested a general disturbance of the interaction of C3 with the RNA/protein complex for any mutation in this position, and not only the loss of an intramolecular H-bond as shown by the methylated nucleobases.
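
The fold changes discussed above can be reproduced directly from the values in Table 7. The following Python sketch is only a bookkeeping aid; the dictionary layout and the function name are chosen here for illustration.

```python
# Mean values copied from Table 7 (ka in M^-1 s^-1, kd in s^-1, KD in nM).
WILD_TYPE = {"ka": 6.11e6, "kd": 0.013, "KD_nM": 3.18}
MODIFIED = {
    "mMC": {"ka": 1.965e5, "kd": 0.22, "KD_nM": 1122.0},
    "dMC": {"ka": 1.27e6, "kd": 0.44, "KD_nM": 348.0},
}


def fold_changes(variant, reference):
    """Return fold changes of a variant relative to the reference sequence."""
    return {
        "KD_increase": variant["KD_nM"] / reference["KD_nM"],
        "kd_increase": variant["kd"] / reference["kd"],   # faster dissociation
        "ka_decrease": reference["ka"] / variant["ka"],   # slower association
    }


for name, values in MODIFIED.items():
    changes = fold_changes(values, WILD_TYPE)
    summary = ", ".join(f"{key} = {val:.1f}x" for key, val in changes.items())
    print(f"{name}: {summary}")
```

The ratios come out close to the rounded factors quoted in the text.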

Figure 29. A) SPR sensograms of the dMC and mMC modified UGCAUGU sequences against RBFOX RRM coated on the sensor surface. The kinetic fit is done with a 1:1 Langmuir model. The concentration for the mMC starts from 300 nM and the 2000 nM for the dMC modified oligonucleotide in a 1:1 dilution series. B) wt: Interactions from the wild type RNA against RBFOX RRM, obtained from the NMR spectroscopy, C3dMC: Possible interaction of the dMC modified RNA against RBFOX RRM. C3mMC: Possible interaction of the mMC modified RNA and its keto-imine tautomer against RBFOX RRM.

Surprisingly, UGmMCAUGU showed a lower affinity for RBFOX RRM than the UGdMCAUGU sequence (Table 7). The main difference was due to the association rate: the association rate of the mMC-modified sequence was one order of magnitude slower than that of the dMC-modified sequence. The difference between the unmodified sequence and the dimethyl-cytidine sequence was not as strong. In contrast, the stability of the complex is similar for the two modified sequences, but both show an approximately 20-fold faster dissociation than the unmodified sequence. The slower association rate of the mMC sequence could indicate the need for a structural rearrangement upon binding [289]. An explanation based on entropy could also be considered: the dMC could displace more water molecules from the complex, leading to an entropically favored complex. A possible explanation for the weaker binding of the mMC-modified cytidine is shown in Figure 29B. Keto-imine tautomerism has been reported previously for cytidine and for heterocyclic aromatic amines [290-292]. In the imine form, the mMC nucleobase loses two possible H-bond interactions with the RRM, compared with one for the dMC sequence. In addition, the C-N bond of the exocyclic amine group presumably loses some rotational freedom because of its partial double-bond character. In previous studies, calculations of the stability predicted the enamine form of the cytidine to be the more stable form [290, 291].

Taken together, the data from the heptanucleotide screen, the binding data from the single base variants and the studies with the dMC/mMC-modified sequences have shown that the binding of the RBFOX RRM to RNA has a greater degree of redundancy than is suggested by the literature. The terminal bases (N1 and N7) show little sequence specificity, and the important motif is therefore reduced to a 5-mer core sequence, which is framed by two essential guanosines. Variations in the bases of the core sequence mostly yielded the same difference in energy (ΔΔG = 10-15 kJ/mol), with two exceptions, U5C and A4C, which showed a ΔΔG of 5 and 9 kJ/mol, respectively. In our hands, the only sequence with a mutation in the core sequence and an affinity similar to that of the consensus FBE is the UGCACGU sequence. The conservation of the pre-miR-32 sequences from 21 different species also supports the importance of the UGCACGU sequence. The alignment of the 21 different pri-miR-32 sequences from miRBase (www.mirbase.org) using CARNA [293, 294] (http://rna.informatik.uni-freiburg.de/CARNA/) showed a high conservation for the pre-miRNA. The only two mutations in the FBE in other species introduce the second strongest binder (UGCACGU) (Supplementary Figure 3).

2.6. Interaction of RBFOX with precursor miRNAs

2.6.1. SPR studies of RBFOX/miRNA interactions

To confirm the results from the ELISA and to test whether the FBEs are responsible for the binding events, we evaluated the binding of miRNA hairpins to RBFOX RRM using SPR. Three pre-miRs carrying the RBFOX core consensus motif GCAUG (pre-miR-32, -107, -20b) were tested on the Biacore T100 on a streptavidin surface coated with approx. 150 RU of RBFOX RRM. To test whether the FBE is responsible for the binding event, we introduced a G→A mutated FBE (GCAUG → GCAUA) into the hairpin and probed the mutated pre-miRNAs against the RBFOX RRM. To assess the effect of the RNA secondary structure, we measured the respective 7-mer FBE of each hairpin as a control.

The data showed a strong binding of the 7mer sequence of the respective pre-miRNAs with KDs of 1.8, 5.95, 7.71 nM (UGCAUGU, GGCAUGA, GGCAUGU) and a good kinetic fit. The corresponding pre-miRNAs (pre-miR-107, -20b) showed an approximate 1000-fold lower affinity for the RRM compared with their corresponding FBEs. Pre-miR-32 showed an approximately 100-fold weaker binding to RBFOX RRM compared to the 7-mer UGCAUGU sequence. This loss of affinity could be explained by the partially ds character of the RNA of the hairpins: indeed, the RRM has been suggested to be a single-stranded RNA binder [9, 295] (Figure 30). This suggests that in order to bind the RRM the RNA structure has to unwind and possibly transition to the bent form observed by NMR spectroscopy [9].

Figure 30. Affinity of RBFOX-binding heptanucleotides, pre-miR-20b, -32 and -107 to recombinant biotinylated FOX RRM domain measured by SPR. The panels show binding curves and predicted secondary structures of native and mutated precursors, where the FBE is depicted in red. Concentrations of UGCAUGU and UGCAUAU were 1.7, 2.3, 4.7, 9.4, 18.8, 37.5, 75, 150 nM and for the pre-miRNAs were 0, 117, 175, 263, 395, 1333, 2000, 3000 nM.

77

The necessity for the pre-miRNAs to adopt a certain structure is supported by the kinetic information gained from the kinetic measurements. The association rate is about 2 orders of magnitude slower for the hairpins compared with the heptanucleotides. In contrast, the dissociation is only 4-10 times faster for the pre-miRs (Table 8). According to the literature, slower association rates are indicative of the need for a structural rearrangement prior to binding [289], which supports the theory of a conformational change.

Sequence      KD         ka [M^-1 s^-1]   kd [s^-1]
pre-miR-107   1.99 µM    2.1 ×10^5        0.4
pre-miR-20b   3.6 µM     1.15 ×10^5       0.43
pre-miR-32    327.5 nM   1.063 ×10^5      0.03481
GGCAUGG       7.71 nM    1.44 ×10^7       0.11
GGCAUGA       5.95 nM    0.86 ×10^7       0.0504
UGCAUGU       1.8 nM     1.19 ×10^7       0.0215

Table 8. Summary of SPR data of RBFOX-1 RRM to the short 7mer FBEs and the full length pre-miRs.

We also confirmed the binding of the RBFOX RRM to pre-miRNAs (pre-miR-19a, -181b-1, -1-2 and -206) that harbor the non-consensus FBE (GAAUG). We confirmed that the FBE is responsible for the RBFOX RRM/pre-miRNA interaction by introducing a G→A mutated FBE (GAAUA) into the pre-miRNA. For those analytes, we used the MASS-1 machine from Sierra Sensors. We used the amine chip, coated with 1500-2500 RU neutravidin and then with 20-50 RU RBFOX RRM.

This series of pre-miRs showed unusual binding profiles, and we were not able to fit them to any common fitting model (1:1 Langmuir, heterogeneous ligand, heterogeneous analyte, induced fit) (Figure 31). However, all binding was lost when the presumed binding element in the hairpins was mutated (G→A mutation). We were unable to explain the problems with the fitting of the data. To see whether binding of the RBFOX RRM to the pre-miRNA changes the RNA structure, we measured CD (circular dichroism) spectra. We first obtained spectra for the pre-miRs (2.5 µM) alone and for the pure RBFOX RRM alone at different concentrations. We then measured CD spectra of RBFOX RRM at five concentrations from 0.5 to 5.5 µM in the presence of a constant RNA concentration of 2.5 µM. We subtracted the CD spectra of the protein alone from those of the pre-miRNA/RBFOX RRM mixtures. CD spectroscopy measures the differential absorption of left- and right-circularly polarized light by chiral molecules at a given wavelength. The absorption difference at around 260 nm is influenced by the π-stacking of the bases in the stem structure of the hairpin. Therefore, any change in the intensity of the absorption maxima around that wavelength may indicate a change in the structure of the hairpin [296-300]. Spectra at lower wavelengths are influenced by both the protein and the RNA (as can be seen from the spectra of the RNA and the protein alone), and it is therefore difficult to gain insight from this region of the spectra.

Figure 31. SPR sensograms of pre-miRNAs harboring the GAAUG FBE. Concentrations starting at 2000nM in a 1:1 dilution series.

The resulting CD spectra did not indicate any significant change in the structure of the RNA hairpins upon binding to RBFOX-1 RRM (Figure 32). If the structure changed, we would expect a dose-dependent change of the absorption bands in the CD spectra at the lowest protein concentrations (0 - 2.5 µM). A likely explanation for our inability to fit the data to a 1:1 binding model is that there are additional interactions of the structured RNA with the RBFOX RRM. The hairpins pre-miR-19a, -1-2 and -206 all show a continuously increasing RU.

Figure 32. CD spectra of pre-miRs at a concentration of 2.5 µM in the presence of RBFOX RRM (0.5 - 5.5 µM). The CD spectra of the protein and buffer were subtracted. From the pure RBFOX-1 RRM spectra only the HEPES buffer was subtracted [299].

In a 1:1 binding model we would expect a saturable signal. In addition, we observed a biphasic decrease of the signal intensity in the dissociation phase: directly after the end of the analyte injection there was a rapid decrease, followed by a slower decrease (Figure 31). Further evidence for multiple binding events on the wild-type pre-miRNAs came from sensograms obtained at high concentrations of G→A-mutated pre-miRNAs. At the highest concentration, a binding signal was observed at a response level similar to that of the wt pre-miR (Supplementary Figure 4), but no responses were recorded at the lower concentrations. This might indicate that long RNA aggregates at high concentrations [301] and forms a high-molecular-weight complex which interacts with the coated protein. In contrast to the aforementioned hairpins, hsa-pre-miR-181b-1 shows 1:1 Langmuir behavior (Figure 31). In contrast to the results obtained with the Mass-1 machine, the experiments on the Biacore T100 (Figure 30) yielded data which could be fitted. The main difference between the experiments is the surface of the SPR chips. The amine chips from Sierra Sensors that were used consist of a matrix of C18 molecules with carboxylic groups, in contrast to the dextran matrix of the Biacore chips.
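
To make the expectation for a 1:1 interaction concrete, the following Python sketch evaluates the standard 1:1 Langmuir relations: the steady-state response saturates at Rmax, and the dissociation phase is a single exponential. The biphasic function is a generic sum of two exponentials, included only to contrast with the single-exponential case; it is not the model that was fitted in this work, and all parameter values are hypothetical.

```python
import math


def steady_state_response(conc_nM, kd_nM, rmax):
    """1:1 Langmuir steady-state response: Req = Rmax * C / (KD + C)."""
    return rmax * conc_nM / (kd_nM + conc_nM)


def dissociation_1to1(r0, kd_rate, t):
    """Single-exponential dissociation expected for a 1:1 complex."""
    return r0 * math.exp(-kd_rate * t)


def dissociation_biphasic(r0, fast_fraction, k_fast, k_slow, t):
    """Phenomenological two-exponential decay (fast phase plus slow phase)."""
    fast = fast_fraction * math.exp(-k_fast * t)
    slow = (1.0 - fast_fraction) * math.exp(-k_slow * t)
    return r0 * (fast + slow)


# Hypothetical example: KD = 300 nM, Rmax = 40 RU, 1:1 dilution series from 2000 nM.
concentrations = [2000 / 2 ** i for i in range(6)]
for c in concentrations:
    print(f"{c:7.1f} nM -> Req = {steady_state_response(c, 300.0, 40.0):5.1f} RU")
```

In this picture, a response that keeps rising with concentration without approaching a plateau, or a dissociation phase with two distinct rates, points to binding events beyond a simple 1:1 interaction.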

To circumvent the problems with the low yields of the pre-miRNA synthesis, we tested a series of truncated pre-miRNAs (Supplementary Table 3). In contrast to the full-length hairpins, the truncated hairpins gave sensorgrams which could be fitted very well (Figure 33) and yielded reproducible data (Table 9). Furthermore, the truncated hairpins all showed a roughly 10-fold higher affinity than the corresponding wild-type pre-miRs (Table 8, Table 9).

A possible explanation for this difference is that the RBFOX RRM needs to partially unwind the stem structure in order to bind to the RNA; the energy required to unwind a longer stem is naturally higher.


Figure 33. SPR sensorgrams of truncated hairpins against RBFOX RRM. Concentrations start from 5 µM in a 1:1 dilution series. The data were measured in duplicate and fitted to a 1:1 Langmuir model including mass-transport limitations.

Sequence      ka [M⁻¹ s⁻¹]        kd [s⁻¹]             KD [nM]
tr-miR-32     228000 ± 2000       0.01055 ± 0.00015    46.40 ± 1.1
tr-miR-20b    642500 ± 139500     0.0984 ± 0.0068      158.50 ± 23.5
tr-miR-19a    265000.00           0.15                 580.00
tr-miR-1-2    402800.00           0.09                 217.78
tr-miR-206                                             83.20

Table 9. Kinetic data of truncated hairpins
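The relation between the rate constants and the equilibrium dissociation constant in Table 9 can be checked with a short Python sketch of the 1:1 Langmuir model. This is a simplified illustration and not the fitting routine of the instrument software: mass-transport limitation is left out, and the function names and the Rmax parameter are assumptions introduced here. Only the tr-miR-32 rate constants are taken from Table 9.

```python
import math


def equilibrium_kd(ka, kd):
    """Equilibrium dissociation constant K_D = kd / ka in M."""
    return kd / ka


def langmuir_response(t, conc, ka, kd, rmax, t_end):
    """Response (RU) of a 1:1 Langmuir interaction at time t (s).

    conc is the analyte concentration in M, ka in 1/(M s), kd in 1/s,
    rmax the saturation response, and t_end the end of the injection (s).
    Mass-transport limitation is deliberately left out of this sketch.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # plateau reached at this concentration
    if t <= t_end:
        return r_eq * (1.0 - math.exp(-k_obs * t))
    r_at_end = r_eq * (1.0 - math.exp(-k_obs * t_end))
    return r_at_end * math.exp(-kd * (t - t_end))


# Rate constants of tr-miR-32 as listed in Table 9.
KA_TR_MIR_32 = 228000.0   # 1/(M s)
KD_TR_MIR_32 = 0.01055    # 1/s

kd_nm = equilibrium_kd(KA_TR_MIR_32, KD_TR_MIR_32) * 1e9
print(f"K_D(tr-miR-32) = {kd_nm:.1f} nM")
```

The ratio kd/ka for tr-miR-32 comes out at about 46 nM, in agreement with the tabulated KD. In this model the signal saturates at Rmax for high analyte concentrations, which is the behavior the full-length hairpins failed to show.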


2.6.2.: Influence of RNA structure upon binding to RBFOX RRM

We observed a considerable difference between the affinity of the pre-miRNAs and the truncated hairpins for the RBFOX RRM.

Figure 34. Schematic illustration of the competition of the intramolecular RNA duplex formation and the RNA/protein complex formation. The RNA sequences represent the FBE (UGCAUGU) and the complementary strand (ACGUACA) in a hairpin structure.

The effect of the hairpin structure on the binding to RBFOX can be seen either as an effect of the accessibility of the binding site [302-304], as a competition between RNA structure formation and protein binding, or as a combination of both. If the FBE is located in the stem of a hairpin structure, the competition between RNA hybridization and RBP/RNA binding might be especially strong. In pre-miR-32 the RBM lies partially in the stem region, in contrast to pre-miR-20b. If RNA hybridization competes with the formation of the RNA/RBFOX complex (Figure 34), then at an equimolar ratio of the reactants the ratio between RNA/RNA and RNA/protein complexes should be reflected in the ratio between the two rate constants. The rate constant of the RNA hybridization depends, among other parameters, on the ΔG value of the RNA hairpin stability, and the rate constant of the RNA/protein complex formation on the ΔG value of the complex formation.

The stability of the hairpin can be obtained from UV melting experiments [305]. The absorption of a duplex at 260 nm increases sharply upon separation of the two strands, because the π-π stacking interactions between the bases, which reduce the absorption in the duplex, are lost [306]. The melting temperature is defined as the temperature at which 50% of the duplex is melted. The thermodynamic values can be extracted from the shape of the melting curves [305, 307].
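For a unimolecular hairpin, a simple two-state model connects the thermodynamic values to the melting curve: the folded fraction follows from ΔG = ΔH - TΔS, and the melting temperature is the temperature at which ΔG = 0. The Python sketch below illustrates this relation only; it is not the analysis routine used for the data in this work, and the ΔH and ΔS values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def fraction_folded(temp_k, dh, ds):
    """Folded fraction of a unimolecular hairpin in a two-state model.

    dh (kJ/mol) and ds (kJ/(mol K)) refer to folding (unfolded -> hairpin).
    """
    dg = dh - temp_k * ds
    k_fold = math.exp(-dg / (R * temp_k))
    return k_fold / (1.0 + k_fold)


def melting_temperature(dh, ds):
    """Temperature (K) at which half of the hairpin population is unfolded."""
    return dh / ds


# Hypothetical folding parameters, for illustration only.
dh_example = -200.0   # kJ/mol
ds_example = -0.600   # kJ/(mol K)
tm_example = melting_temperature(dh_example, ds_example)

for temp_c in (25, 50, 60, 70, 90):
    folded = fraction_folded(temp_c + 273.15, dh_example, ds_example)
    print(f"{temp_c:2d} °C: folded fraction = {folded:.3f}")
```

In this model a larger |ΔH| gives a steeper transition around the melting temperature, which is why the shape of the melting curve carries the thermodynamic information.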


Figure 35. Data from SPR and melting experiments. The associated fraction plot and the UV absorption were measured in phosphate buffer. SPR sensorgrams were measured on the MASS-1 with RBFOX RRM coated on the sensor surface.

We decided to use UV melting experiments and SPR measurements to examine the effect of the hairpin stability on the binding affinity of RBFOX to hairpins. We synthesized a series of truncated (tr) miR-32 and -20b variants (Supplementary Table 8). The SPR experiments yielded kinetic data for the RNA/protein complex formation and the melting experiments yielded thermodynamic data for the RNA hairpin formation (Figure 35, Supplementary Table 7). No correlation between the hairpin stability (ΔG) and the dissociation rate constant (kd) of the RBFOX/RNA complex could be shown. Consequently, the enthalpy (ΔH) and entropy (ΔS), from which ΔG is calculated, also showed no effect on kd for either tr-miR-32 or tr-miR-20b binding to the RBFOX RRM (Figure 36, correlation between ΔG/ΔH/ΔS and kd). We therefore conclude that the dissociation rate is independent of the hairpin stability. In contrast, we see a slower association of the RNA/RBFOX RRM complex if the hairpin exhibits a low Gibbs free energy, that is, a more stable hairpin (Figure 36, correlation between ΔG and ka). Consequently, a similar correlation is seen between the enthalpy (ΔH) and entropy (ΔS) and the association rate. The effect is more pronounced for tr-miR-32, in which the FBE lies mainly in the hairpin stem, than for tr-miR-20b, in which the FBE lies in the terminal loop (TL) (Supplementary Figure 8). This supports the theory that the competing RNA hybridization reaction is more pronounced in a stem structure.

Figure 36. Correlation between the thermodynamic values of the melting experiments, with the kinetic data obtained from SPR experiments. ΔG values represent the stability of the hairpin structure.

The experiments clearly show the importance of the position of the RBM within the RNA structure. If the RBM of an RBP which binds to ssRNA lies in a dsRNA region, the affinity can be drastically reduced because the formation of the RNA/protein complex is slowed down. In contrast, the stability of the RBFOX/RNA complex is not affected, as we can see from the dissociation rate constants.


2.6.4.: The influence of the sugar pucker of the ribose on the binding affinity against RBFOX

The NMR structure [9] of the RNA/RBFOX RRM complex showed that the sugars of the last six of the seven nucleotides adopt a C2’-endo conformation, in contrast to the C3’-endo conformation that is more usual for oligoribonucleotides. We decided to investigate the influence of the sugar pucker on binding to the RRM. Either the C2’ or the C3’ atom is displaced from the plane of the ribose ring, leading to a C2’-endo or C3’-endo conformation (Figure 37). The sugar conformation can be determined by ¹H-NMR spectroscopy from the ³J coupling between the H1’ and H2’ protons (³J(H1’-H2’) ≈ 8 Hz for C2’-endo, ³J(H1’-H2’) < 2 Hz for C3’-endo) (Figure 37).

Figure 37. Schematic illustration of the different sugar pucker conformations.

A recent study calculated the energies of the transition states between the C2’-endo and C3’-endo conformations, as well as the energy minima of both conformations [308], for RNA and DNA bases in the gas phase. The energy required to reach the transition state between C2’-endo and C3’-endo in RNA is in the range of 4 kcal/mol (rA = 3.7, rC = 3.8, rG = 3.7, rU = 4.3 kcal/mol). For rC and rU the C2’-endo conformation is less favorable than the C3’-endo form by 0.9 kcal/mol. In contrast, rA and rG showed no (rA) or only a small (0.1 kcal/mol for rG) preference for the C2’-endo conformation. The transition of DNA between the C2’-endo and C3’-endo conformations proceeds through two transition states with higher energies than the RNA transition states (dA = 4.5/4.9, dC = 3.9/4.9, dG = 4.7/5.0, dT = 4.5/5 kcal/mol). In all cases, the C3’-endo form is less favorable than the C2’-endo conformation (dA = 2.9, dC = 2.8, dG = 3.1, dT = 3.2 kcal/mol). The favorable energy of the C3’-endo conformation in ssRNA [309] was also confirmed by calculating the potential energy of RNA dimers [310], including van der Waals, electrostatic and torsional contributions. The conformation with the lowest energy was determined by varying the torsional angles. The authors found that the C3’-endo conformation was always favored over the C2’-endo conformation (rAA = 1.8, rCC = 5.8, rGG = 4.5, rUU = 1.8 kcal/mol). Other studies also showed that sugars can switch their conformation even if they are embedded in a dsRNA system and are themselves part of a mismatched base pair [311, 312]. It was also reported that the sugar pucker of DNA can change upon binding of a protein [313].

Sequence   Modification   ka [10⁶ M⁻¹ s⁻¹]   kd [s⁻¹]            KD [nM]          ΔΔG [kJ/mol]
UGCAUGU    -              2.04 ± 0.26        0.0111 ± 0.00267    5.37 ± 0.63      0
TGCAUGU    D1             2.10 ± 0.42        0.0024 ± 0.00037    1.14 ± 0.074     -3.84
UGCAUGU    D2             3.53 ± 0.669       0.0086 ± 0.0014     2.47 ± 0.37      -1.93
UGCAUGU    D3             4.40 ± 0.600       0.3507 ± 0.317      29.25 ± 5.15     4.203
UGCAUGU    D4             5.17 ± 2.25        0.0183 ± 0.011      3.16 ± 0.9       -1.31
UGCATGU    D5             4.46 ± 3.53        0.0093 ± 0.006      3.92 ± 3.1       -0.78
UGCAUGU    D6             6.23 ± 2.32        0.095 ± 0.0469      14.69 ± 3        2.49
UGCAUGT    D7             1.78 ± 0.72        0.063 ± 0.026       26.7 ± 12.31     3.98
UGCAUGU    OMe1           2.3 ± 0.55         0.0042 ± 0.0012     1.85 ± 0.26      -2.64
UGCAUGU    OMe2           1.88 ± 0.47        0.0084 ± 0.0031     4.45 ± 1.19      -0.47
UGCAUGU    OMe3           2.58 ± 0.06        0.017 ± 0.002       6.6 ± 0.92       0.511
UGCAUGU    OMe4           2.44 ± 0.66        0.0059 ± 0.0016     2.49 ± 0.72      -1.90
UGCAUGU    OMe5           3.49 ± 1.53        0.0155 ± 0.013      3.76 ± 1.67      -0.88
UGCAUGU    OMe6           3.60 ± 0.09        0.0165 ± 0.00215    4.37 ± 0.48      -0.51
UGCAUGU    OMe7           4.71 ± 0.99        0.0423 ± 0.011      8.98 ± 1.62      1.27

Table 10. Kinetic data of the 14 hybrid sequences containing one modified sugar binding to RBFOX RRM. DX marks the position at which an RNA nucleotide is replaced by a DNA nucleotide and OMeX marks the position at which an RNA nucleotide is replaced by a 2’-OMe RNA nucleotide.
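The ΔΔG values in Table 10 can be reproduced from the KD values if one assumes that they were calculated as ΔΔG = RT ln(KD,variant / KD,reference) at the measurement temperature of 25 °C. This relation is an assumption, since the calculation is not spelled out in the text; the function name in the following Python sketch is illustrative.

```python
import math

R = 8.314462618e-3   # gas constant in kJ/(mol K)
T = 298.15           # 25 °C in K, the temperature of the SPR experiments


def delta_delta_g(kd_variant, kd_reference, temp_k=T):
    """Change in binding free energy (kJ/mol) relative to the reference.

    Both K_D values must be given in the same unit. A negative result
    means that the variant binds more tightly than the reference.
    """
    return R * temp_k * math.log(kd_variant / kd_reference)


# K_D values in nM taken from Table 10.
KD_REFERENCE = 5.37
ddg_d1 = delta_delta_g(1.14, KD_REFERENCE)    # DNA nucleotide at position 1
ddg_d3 = delta_delta_g(29.25, KD_REFERENCE)   # DNA nucleotide at position 3

print(f"D1: {ddg_d1:+.2f} kJ/mol")
print(f"D3: {ddg_d3:+.2f} kJ/mol")
```

Under this assumption the calculation gives about -3.84 kJ/mol for D1 and about +4.20 kJ/mol for D3, matching the tabulated values within rounding.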

We synthesized two series of seven hybrid sequences based on the consensus motif, each carrying a single DNA nucleotide or a single 2’-OMe-RNA nucleotide at one position. The binding constant of each hybrid sequence to the RBFOX RRM was compared to the affinity of the all-RNA consensus motif. The sugars of the DNA nucleotides should adopt a C2’-endo form, whereas the sugars of the 2’-OMe-RNA nucleotides should keep the C3’-endo sugar pucker.

We measured the hybrid strands against the immobilized RBFOX RRM and fitted the obtained sensorgrams to a 1:1 Langmuir binding model including mass transfer (Figure 38). In the RBFOX RRM/RNA complex, nucleotides 2-7 of the motif adopt a C2’-endo conformation and the first nucleotide adopts the C3’-endo conformation [9]. The only significant changes in binding affinity to the RBFOX RRM are seen at positions 1, 3, 6 and 7 for the DNA hybrids and at position 1 for the 2’-OMe hybrids. A DNA nucleoside in position 1 leads to a gain in stability of 3.84 kJ/mol. In comparison with the reference sequence, the dissociation rate is lowered by a factor of 5, whereas the association rate of the RNA/protein complex is unaffected. A DNA nucleoside in position 3 weakens the binding by 4.2 kJ/mol. Here, too, the major difference is the dissociation rate, which increases about 30-fold, accompanied by a slightly enhanced association rate. The oligonucleotide/RBFOX RRM complexes with DNA nucleotides at positions 6 and 7 are less stable by 2.5 and 4 kJ/mol, respectively. Again, the major difference is the dissociation rate, which is not compensated by the enhanced association rate. The effect of the DNA nucleotides at the last two positions can be explained by the loss of intramolecular 2’-OH hydrogen bonding (Figure 15A). The effects for the UGCAUGU OMe6 and OMe7 sequences are less pronounced, which can be explained by the fact that the methoxy group can still act as a hydrogen-bond acceptor.

Our SPR data are not consistent with the sugar pucker observed in the RBFOX RRM/RNA structure from Auweter et al. [9]. This could be due to additional effects which are stronger than the effect of the sugar pucker, and to the experimental design, which might be inappropriate for investigating the influence of the sugar pucker of ribonucleotides on binding to RBFOX. It might be necessary to switch all of the last six nucleotides to a C2’-endo sugar pucker. Another problem could be that the short sequences do not have any secondary structure and therefore already exist in the C2’-endo conformation, owing to the low energy difference between the two major sugar pucker forms. This could be clarified by including the modifications in a structural framework, such as the truncated hairpins, in which the RNA is more likely to assume the C3’-endo conformation. An additional explanation could be that the difference in affinity is too small to be observed by SPR.


Figure 38. SPR sensorgrams of the 14 hybrid sequences and the consensus RNA sequence (Table 10) against RBFOX RRM coated on the SPR chip surface. Data were measured in duplicate in a twofold dilution series starting from a concentration of 100 nM. The simulated curves were fitted to a 1:1 Langmuir model including mass-transfer limitations.

2.7.: Effects of the alternative splicing factor RBFOX-2 on the biogenesis of pre-miR-20b

To determine whether RBFOX also binds to miRNAs in vivo, we performed UV crosslinking and immunoprecipitation (RIP) of RBFOX-2 in human adenocarcinoma cells (SW13) and detected the bound RNA using PCR with primers against pre- and pri-miRNAs (Supplementary Table 9). We found pre/pri-miR-107 and pre/pri-miR-32 to be 30-40-fold enriched compared to pre/pri-miRNAs not containing the FBE, whereas pre/pri-miR-20b was 10-fold enriched (Figure 39A). It should be noted that the pre-miRNA primers cannot discriminate between pre- and pri-miRNA levels, whereas the pri-miRNA primers are specific. This distinction would be critical for testing at which step of the miRNA biogenesis RBFOX interacts with the RNA.


Figure 39. A: Selective interaction of FOX-2 with FBE-containing precursors of miR-20b, miR-32, and miR-107 in SW13 cells. qRT-PCR data of FOX-2 immunoprecipitation (RIP) versus control beads (without antibody). Error bars indicate standard deviations of three independent experiments. (Experiments were performed by Julian Zagalak)

We wanted to examine whether the interactions between RBFOX-2 and the pre-miRNAs measured in the RIP experiments (Figure 39A) have any effect on the biogenesis of these miRNAs. Therefore we suppressed the levels of RBFOX-2 using an siRNA (small interfering RNA) against RBFOX-2. As a control for the natural biogenesis of the miRNAs we performed a mock transfection of HeLa cells (no RNA transfected). RNA was extracted from the cells and the small RNA fraction was separated on a denaturing polyacrylamide (PAA) gel. The obtained RNA was converted into cDNA, which was then sent for small RNA deep sequencing.

The experiment showed reduced levels of miR-107-3p and increased levels of miR-32-5p and miR-20b-3p upon RBFOX-2 knockdown (Figure 40A). A closer look at the sequence reads of the deep sequencing experiment (Figure 40B) revealed an effect on the 3p strand of miR-20b: the sequences showed an addition of 1 or 3 nt at the 5’ end of the 3p strand. In contrast, miR-17, a negative control without any FBE, did not show any such changes in the miRNA sequence upon RBFOX knockdown. Since RBFOX does not bind to miR-17, we assumed that the effect on the maturation of miR-20b upon RBFOX-2 knockdown was a direct effect of RBFOX-2 and not mediated through other factors influencing miRNA biogenesis. One possible explanation involves the effect of TRBP on the biogenesis of miRNAs. TRBP was recently shown to alter Dicer cleavage, thereby including additional bases at the 5’ end of 3p miRNAs [90]. If bound RBFOX-2 prevents TRBP from binding to the pre-miRNA, this could explain the experimental outcome. Additional experiments showed a regulatory loop between miR-20b and RBFOX-2 (data not shown). Similar loops were previously shown for SF2 and miR-7 [137] and for Lin28 and let-7 [314, 315].


Figure 40. A: Deep sequencing analysis upon FOX-2 knock-down of selected miRNAs with significantly altered expression levels. B: Analysis of small RNA sequence reads. The normalized sequence reads show intact miR-20b-5p in mock samples, which is down-regulated in FOX-2 knock-down samples. The miR-20b-3p reads increase upon knock-down of FOX-2 and carry an additional U at the 5’ end. (Experiments performed by Julian Zagalak, Dr. Afzal Dogar and Dr. Jochen Imig)

These experiments revealed a new function of RBFOX-2 in miRNA biogenesis, in addition to its function as an alternative splicing factor. miR-20b was shown to act with RBFOX-2 in a regulatory loop, which provides a second self-regulatory mechanism of RBFOX in addition to the alternative splicing of its own mRNA by RBFOX-2 to form an inactive isoform [316]. The deep sequencing data additionally showed that RBFOX-2/pre-miRNA interactions introduce a strand bias.


3.: Summary and Outlook

We developed a new method to determine the RNA binding motifs (RBM) of RBFOX using SPR spectroscopy and a library of short RNA sequences. In this project we synthesized one general RNA pentamer library and one heptamer library specific for RBFOX. The screen of the heptamer library against RBFOX at a single concentration revealed a certain degree of flexibility within the core sequence. Furthermore, we found that the terminal bases are recognized nonspecifically and that one variation within the core motif is tolerated, while the two guanosine residues are essential for binding. We confirmed the strongest binding sequences in a concentration series to obtain kinetic data of the RBFOX-RNA interaction. The resulting new motifs are in line with several recent studies that revealed a less specific binding of RBFOX to RNA.

To obtain an easy and reliable method for coating the SPR sensor chip with the ligand, we developed a new method to capture biotinylated protein directly from the cell lysate. The protein was expressed in a cell-free system and site-specifically labeled by the enzyme BirA. By depleting BCCP from the lysate, we ensured that only the biotinylated RBFOX RRM was extracted from the cell lysate, and we were able to capture the site-specifically biotin-labeled RBFOX RRM directly from the lysate onto the SPR chip surface. In comparison with the purified RBFOX RRM we could not see any distinct difference in the binding properties towards the FBE. With this method we are able to label a specific site on RBFOX within a day for less than 0.5 CHF, in amounts sufficient for several SPR measurements.

We rationalized the binding between pre-miRNAs and RBFOX-2 observed in an ELISA plate screening assay. We confirmed the binding events of the ELISA using SPR and showed that binding was abolished by a single base mutation in the FBE. The comparison between the affinities of the pre-miRNAs and of the corresponding FBEs for RBFOX showed strongly decreased binding for the pre-miRNAs. We elucidated the effect of the hairpin stability on the affinity of the RBFOX RRM for the RNA and showed that a more stable hairpin decreases the affinity of the RBFOX/RNA interaction. The dissociation rate of the RBFOX RRM/RNA complex was unaffected by the hairpin stability, whereas the association rate decreased (approx. 5-fold for tr-miR-20b and 10-fold for tr-miR-32). These experiments showed the importance of the position of the FBE within the secondary RNA structure for the binding of RBPs. We could also confirm the binding of hairpins harboring the non-consensus FBE GAAUG to the RBFOX RRM, but we were not able to fit the data to any standard model.

In a next step we tried to elucidate the effect of the sugar pucker on the binding kinetics of RBFOX. The RNA/RRM structure obtained by NMR showed that the RNA sugar moieties adopt a C2’-endo conformation and not the C3’-endo conformation common for RNA. We measured RNA/DNA and RNA/2’-OMe-RNA hybrids to introduce sugar moieties into the sequence which are expected to exhibit a C2’-endo pucker (DNA) or to mimic the effect of the 2’-OH group of the RNA (2’-OMe-RNA). Even though we measured differences in the binding affinity upon single modifications, the results could not be explained by the sugar pucker. We hypothesize that the loss of the 2’-OH groups has an effect which is stronger than the effect of the sugar pucker.

In in vivo experiments we investigated the effect of the RBFOX/pre-miRNA interactions on miRNA biogenesis. RBFOX-2 RIP assays in SW13 cells showed that RBFOX-2 binds to pre-miRNAs in vivo and affects the strand bias of miR-20b. In addition, RBFOX-2 influences the Dicer cleavage site between the 3p strand and the terminal loop, as additional bases appear at the 5’ end of the 3p strand upon RBFOX-2 knockdown. Furthermore, miR-20b-5p has a repressive effect on the levels of RBFOX-2, forming a regulatory loop between miR-20b and RBFOX-2.

Within this work we also developed a new negative control for RNA/RNA interactions. We introduced an N4-methylcytidine and an N4,N4-dimethylcytidine modification into the seed of a miRNA. The modification significantly reduced the effect of the miRNA on its target without introducing a new seed sequence.

The insights gained by our research can serve as a starting point for further investigation. The newly discovered role of RBFOX-2 in miRNA biogenesis is very intriguing. The biological implications of RBFOX-2 interacting with pre-miRNAs carrying a non-canonical FBE are of special interest. Three of those miRNAs (miR-181b-1, -1 and -206) have a role in the differentiation of myoblasts into myotubes and were shown to bind RBFOX-2 in primary human myoblasts. According to TargetScan, Ataxin1, which is known to interact with RBFOX-2, has predicted target sites for miR-19a, -32, -181b-1 and -20b. In particular, miR-19a has several highly conserved target sites in the 3’-UTR of the Ataxin1 gene. This might be an additional important regulatory network affecting cerebellar functions in humans.

The new SPR spectroscopy-based assay worked well once the library was fully synthesized. Unfortunately, we encountered problems in expressing proteins of interest and using them in this assay. Some of them were not expressed in our system (GLD-1) or did not fold properly, leading to nonspecific binding (GLA-3). To circumvent these problems, it could be of interest to pull down proteins from the cell lysate using specific antibodies captured on the biosensor chip surface. We tried to avoid this approach, since it would yield only small signals due to the high mass of the ligand/antibody complex.


4.: Materials and Methods

4.1.: List of used Chemicals

Chemical                                          Provider
2'-OMe Ac C Phosphoramidite                       Thermo Scientific
2'-OMe Bz A Phosphoramidite                       Thermo Scientific
2'-OMe iBu G Phosphoramidite                      Thermo Scientific
2'-OMe U Phosphoramidite                          Thermo Scientific
2-Propanol                                        Sigma Aldrich
3’-Biotin-TEG-CPG                                 Glen Research
30% (w/v) acrylamide/Bis solution 29:1 (3.3%C)    Bio-Rad
4-Triazolyl Uridine CED phosphoramidite           Chemgenes
5-(Benzylthio)-1H-Tetrazole Acetic acid           Chemgenes
Acetic Acid                                       Sigma Aldrich
Acetonitrile                                      Fluka
Agarose MP                                        Roche
Agencourt AMPure XP - PCR Purification            Beckman-Coulter
Ammonium hydroxide                                Sigma Aldrich
anti-FOX-2                                        Bethyl
anti-rabbit IgG peroxidase conjugate              KPL
Blocking Reagent                                  Roche
BM chemiluminescent substrate                     Roche
CAP A (with Lutidine)                             Biosolve
CAP B 16 %                                        Biosolve
Caspase-Glo 3/7 assay system                      Promega
Dichloroacetic Acid                               Sigma Aldrich
Dichloroethane                                    Sigma Aldrich
Dimethylamine                                     Acros
di-Sodium hydrogen phosphate dihydrate            Fluka
Dual-Glo Luciferase Assay System                  Promega
Dynabeads® Oligo(dT)25                            Life Technologies
EDTA (0.5 M), pH 8.0                              Life Technologies
Ethanol                                           Sigma Aldrich
HeLa cells                                        American Type Culture Collection
HEPES                                             ABCR-Chemicals
Hexafluoroisopropanol                             ABCR-Chemicals
Methanol                                          Sigma Aldrich
Methylamine gaseous                               Linde
Methylamine, 40 wt% solution in water             Acros
MinElute Gel Extraction Kit                       Qiagen
MinElute PCR purification kit                     Qiagen
Oligofectamine™ Transfection Reagent              Life Technologies
Oxidizer 0.02 M for AB                            Biosolve
PBS, pH 7.4                                       Life Technologies
Potassium chloride                                Fluka
psiCHECK2                                         Promega
RNeasy Mini Kit                                   Qiagen
Sodium chloride                                   VWR
Streptavidin                                      Jackson ImmunoResearch
SW13 cells                                        Cell Line Services
T4 Polynucleotide kinase                          New England Biolabs
T4 RNA Ligase 1 (ssRNA Ligase)                    New England Biolabs
T7 RNA polymerase                                 Roche
TheraPure Ac rC Phosphoramidite                   Thermo Scientific
TheraPure Bz dA Phosphoramidite                   Thermo Scientific
TheraPure Bz dC Phosphoramidite                   Thermo Scientific
TheraPure Bz rA Phosphoramidite                   Thermo Scientific
TheraPure iBu dG Phosphoramidite                  Thermo Scientific
TheraPure iBu rG Phosphoramidite                  Thermo Scientific
TheraPure rU Phosphoramidite                      Thermo Scientific
TheraPure T Phosphoramidite                       Thermo Scientific
Triethylamine                                     Acros
Triethylamine-hydrofluoric acid                   Sigma Aldrich
Trifluoroacetic acid                              Sigma Aldrich
Tris (1 M), pH 8.0                                Life Technologies
Tween 20                                          Applichem
Ultra-Pure Sequagel Urea Gel                      National Diagnostics
Universal UnyLinker Support (500 and 1000 Å)      Chemgenes

Table 11: List of used chemicals

4.2.: List of used Equipment

Equipment                                               Provider
1200 series HPLC and HPLC-MS System                     Agilent
96 well Assay block 2 mL                                Costar
96 well filter plate                                    Orochem
Amine SPR sensor chip                                   Sierra Sensors
Biacore T-100                                           Biacore
C1000 Thermal Cycler 96 well block                      BioRad
Cary 300 UV spectrometer                                Agilent
Chirascan™ Circular Dichroism                           Photophysics
Deepwellplate 96/500 sterile                            Eppendorf
DeltaRange XS6002S Balance                              Mettler Toledo
Filter Tips                                             Star Lab
Mass-1                                                  Sierra Sensors
MerMade 12                                              BioAutomation
MerMade 192                                             BioAutomation
Mithras LB940 plate reader                              Berthold
NanoDrop                                                Thermo Fisher
RiOs-DI 3 UV                                            Millipore
SA SPR sensor chip                                      Biacore
SpectraMax M                                            Molecular Devices
SPR-2                                                   Sierra Sensors
UV Plate, 96-well, no lid with UV transp. flat bottom   Costar
Waters XBridge OST C18 2.1 x 50 mm                      Waters
Waters XBridge OST C18 4.6 x 50 mm                      Waters

Table 12. List of used Equipment

4.3.: Methods

RNA ELISA screening assays

This ELISA-based screening assay was carried out as reported previously [276]. 96-well microtiter plates were coated with streptavidin/PBS overnight, followed by a blocking step (1% Top-Block in 12 mM Hepes, 80 mM KCl). We added the biotinylated RNAs from our library (8 nM in 20 mM Hepes, 80 mM KCl) and incubated at 4°C overnight. Plates were washed with water and a high-salt HeLa lysate was added (1/100 lysate dilution in 12 mM Hepes, 80 mM KCl, 1 mM MgCl2, 0.05% Tween-20, 10 µg/ml heparin, 0.1 mM DTT, 1% Top-Block). Incubation took place at room temperature for 2.5 h, followed by a washing step with water. Fixation was carried out using formaldehyde. RBFOX2 protein retained by the coated RNAs was measured using a specific antibody (anti-RBM9, Bethyl, Cat. No. A300-864A). A 1/5000 dilution of primary antibody was added (in 25 mM Hepes pH 7.2, 150 mM NaCl, 0.05% Tween-20, 1% Top-Block), followed by a 1/3000 dilution of secondary antibody (anti-rabbit IgG peroxidase conjugate, KPL, Cat. No. 074-1506). BM chemiluminescent substrate (Roche Applied Sciences, Cat. No. 11582950001) was added for detection on a Mithras LB940 plate reader (Berthold).

Cell Culture and Transfections

The HeLa (CCL-2) and SW13 (# 300349) cells were purchased from the American Type Culture Collection and Cell Line Services, respectively. The siRNAs against FOX2 targeted positions 1556 to 1574 (CCUGGCUAUUGCAAUAUUU) and 661 to 679 (CAGACACAAAGUAGUGAAA) of NM_001031695.2 and were synthesized by Eurofins Genomics. Mimics of miR-20b 5p and 3p and of miR-32 5p and 3p were from Dharmacon. The pre-miRNA (-20b, -20m, -32, -32m, -107 and -107m) sequences were synthesized using a procedure described previously [276]. The RNAs were transfected using Oligofectamine (#12252-011, Invitrogen) according to the manufacturer's instructions. A plasmid encoding human RNA binding protein (RBFOX2) transcript variant 1 (#SC320568), purchased from OriGene Technologies, was transfected using jetPEI™ (#101-10) from Polyplus-transfection according to the manufacturer's instructions.

RIP, qRT-PCR and northern blot analysis

RIP was performed as described previously [276]. Pre-miRNA primers were designed to amplify the miRNA stem-loops. qRT-PCR assays for measuring the expression levels of pre- and mature miRNAs were performed as described previously [276]. Northern blot analysis was performed as described previously [317]. Probes were radioactively labeled using an in vitro T7 transcription system.

Production of biotinylated human FOX RRM

The RRM domain of the human alternative splicing factor Fox-1, comprising amino acids 109-208, was subcloned from pET28-Fox [9] into the cell-free expression vector pCFX3 [318] using the NdeI and BamHI restriction sites. For expression of the biotinylated Fox RRM, the 15-amino acid E. coli biotin ligase recognition sequence GLNDIFEAQKIEWHE was introduced between the TEV cleavage site and the gene encoding Fox-1 using standard PCR mutagenesis. The resulting vectors were sequence-verified and amplified using a plasmid maxi prep kit (Macherey-Nagel). The E. coli biotin protein ligase BirA was cloned, expressed and purified as previously described [283]. Expression of non-biotinylated human Fox RRM was achieved in a 10 mL batch-mode cell-free synthesis reaction [318] which was conducted for 3.5 h. Biotinylated Fox RRM was separately expressed using the same cell-free procedure in the presence of 2 µM BirA and 400 µM d-biotin. Subsequently, both biotinylated and non-biotinylated Fox RRM were separately purified from the cell-free expression supernatant by Ni-NTA affinity chromatography using a 5 mL HisTrap column (GE Healthcare) equilibrated with buffer A (50 mM sodium phosphate, pH 7.4, 30 mM imidazole, 500 mM sodium chloride). The proteins were eluted in a 100 mL linear gradient of 30-500 mM imidazole in buffer B and were cleaved overnight at 4°C with 0.5 mg TEV protease, prepared as described [281]. The protein solutions were then passed over a 5 mL HisTrap column to remove the N-terminal (His)6-tagged GB1 domain and the (His)6-tagged TEV protease. The purified Fox proteins were then dialyzed overnight against 4 L of buffer C (10 mM Tris-HCl, pH 7.2, 20 mM NaCl) and were subsequently concentrated in a 3 kDa Vivaspin-20 centricon (Sartorius) to 100 µM. Biotinylation of the purified Fox proteins was verified using mass spectrometry.


Cell culture and transfections for 6.: Manuscripts

6.1.: Development of a RNA negative control [288]

HeLa cells (ATCC, #CCL-2), obtained from LGC (Molsheim, FR), were maintained in Dulbecco's Modified Eagle's medium (Gibco, Invitrogen, Basel, CH) supplemented with 10% fetal bovine serum (FBS; Sigma-Aldrich, Buchs, CH). The siRNA against Renilla (siRen; 5’ GAGCGAAGAGGGCGAGAAAUU) was from Dharmacon (Chicago, USA) and the control siRNA (siCon; #AM4640) was from Ambion (Austin, USA). RNAs were transfected using Oligofectamine (#12252-011, Invitrogen, Basel, CH) according to the manufacturer's instructions. Dual luciferase reporter plasmids were generated by cloning the target sites of miR-106a (CDKN1A; NM_000389.4; 3’ UTR: 1051-1200) and miR-34a (SIRT1; NM_001142498.1; 3’ UTR: 1381-1500) into the psiCHECK-2 vector (#C8021, Promega, Dübendorf, CH). For the luciferase assays, HeLa cells were seeded in white 96-well plates and RNAs were transfected after 8 h at the indicated doses. All transfections were performed in triplicate. DNA (20 ng of plasmid/well) was transfected using jetPEI (#101-10, Polyplus, Illkirch, FR) according to the manufacturer's protocol. After 48 h the supernatants were removed and firefly substrate (15 µl; Dual-Glo® Luciferase Assay System, Promega, Dübendorf, CH) was added. Luminescence was measured on a microtiter plate reader (Mithras LB940, Berthold Technologies, Bad Wildbad, DE). After 30 min, 15 µl of Renilla substrate per well was added and the measurement was repeated. Values were normalized against the normalization luciferase and the corresponding Oligofectamine mock control, respectively. Caspase-3/7 activity was measured in lysates of transfected cells as previously described [319].

RNA library synthesis

The RNA synthesis was done according to the standard RNA synthesis protocol using “DMT-on”. We used approx. 2 mg of 500 Å universal support on 96-well filter plates on a MerMade 192. The 5’-DMT group was deprotected using two times 150 µl of 3% DCA in DCE. We used three coupling steps with 50 µl of 0.08 M phosphoramidite and 140 µl of 0.24 M BTT in dry ACN for 80 s each. Coupling was followed by capping of the unreacted 5’-OH groups with a 1:1 mixture of capping reagents A and B (300 µl) for 50 s. Cap A is a mixture of THF/lutidine/acetic anhydride (8:1:1) and Cap B consists of 16% N-methylimidazole in THF. After capping, the P(III) was oxidized using 150 µl of 0.02 M iodine in THF/pyridine/water (7:2:1) for 50 s. The oligonucleotides were cleaved and deprotected using 1 bar gaseous methylamine at 65°C for 2 h. The oligonucleotides were washed from the solid support with a 1:1 mixture of ethanol and water (3 x 200 µl). To the solution, 20 µl of a 1 M solution of TRIS buffer was added. The solution was reduced under vacuum to dryness. The 2’-TBDMS group was removed using 130 µl of a mixture of NMP/TEA/TEA·3HF (60:30:40) at 70°C for 2 h. The reaction was quenched with 160 µl isopropoxytrimethylsilane for 30 min. The mixture was reduced under vacuum for 2 h. Finally, 240 µl water was added. The aqueous phase was applied to a Waters column (XBridge OST C18, 10 x 50 mm, 2.5 μm) at 60°C in a TEAA/methanol running buffer (gradient 20% to 70% MeOH). The collected fractions were reduced for 5 h under vacuum, and 40% AcOH was added at RT for 2 h, followed by evaporation to dryness under vacuum overnight. The fractions were dissolved in 200 µl water and again applied to a Waters column on an Agilent HPLC in the same running buffer with a gradient of 5% to 25% MeOH. Approx. 10% of the samples were analyzed using a Waters column (Acquity OST C18, 2.1 x 50 mm, 1.7 μm) on an Agilent HPLC-MS using a running buffer of 0.1 M HFIP, 8.6 mM TEA and MeOH (gradient 5% to 40% MeOH). The concentration was measured using 10 µl of the solution diluted to 100 µl in a UV-permeable 96-well plate on a SpectraMax Plus. Pipetting was done using a Hamilton StarLet pipetting robot.
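As a rough illustration of the concentration measurement, the sketch below converts an absorbance reading of the diluted sample into the concentration of the undiluted stock with the Beer-Lambert law. The tenfold dilution (10 µl to 100 µl) is from the text. The extinction coefficient, the optical path length in the plate and the absorbance value are hypothetical placeholders, not values from this work.

```python
def stock_concentration_uM(a260, extinction_per_M_cm, path_cm, dilution_factor=10.0):
    """Beer-Lambert law: c = A / (epsilon * l), corrected for the dilution.

    a260                -- absorbance of the diluted sample at 260 nm
    extinction_per_M_cm -- molar extinction coefficient (assumed, sequence dependent)
    path_cm             -- optical path length in the well (assumed)
    dilution_factor     -- 10 µl diluted to 100 µl gives a factor of 10
    """
    molar = a260 * dilution_factor / (extinction_per_M_cm * path_cm)
    return molar * 1e6  # mol/l -> µmol/l

# Hypothetical reading: A260 = 0.5, epsilon = 100000 l mol^-1 cm^-1, path length 1 cm.
stock = stock_concentration_uM(0.5, 100000.0, 1.0)
print(f"stock concentration: {stock:.1f} µM")
```

In a plate reader the path length depends on the fill volume, so it has to be measured or corrected for rather than assumed to be 1 cm.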

Synthesis of RNA hairpins and short RNAs.

Additional short RNAs and mixed strands were synthesized following the protocol described above, but on a MerMade 12 with single columns. For RNA hairpins, 2000 Å solid support was used. The short sequences were synthesized using 500 Å solid support.

Synthesis of modified RNA sequences.

The modified dMC/mMC sequences were also synthesized using the protocol described above. At the position of the mutation, 4-triazoleuridine phosphoramidite was used instead of the cytidine phosphoramidite. For mMC, the regular protocol was followed and the triazole was substituted by methylamine. For dMC, the CPG was removed after the synthesis cycle and incubated with 200 µl of a 1:1 mixture of 40% dimethylamine in water and 33% dimethylamine in ethanol for 30 min at 50°C. Afterwards, either 200 µl ammonium hydroxide was added and the mixture was incubated at 65°C for 1 h, or the mixture was reduced to dryness and subjected to the methylamine procedure as described before.

SPR Experiments Sierra Sensors; Immobilization

Every SPR experiment was done at 25°C. Immobilization was done on a MASS-1 or SPR-2 from Sierra Sensors. The amine chip was coated at a flow rate of 12.5 µl/min using a PBS buffer at a pH of 7.5. All 16 channels were injected with a solution of 1 M NaCl and 1 M NaOH for 2 min. Afterwards, 100 µl of a mixture of EDC/NHS (200 mM/100 mM) was injected. Streptavidin was coated using 100 µl in an acetate buffer (10 mM sodium acetate) at a pH of 5.5, resulting in a response of approx. 2500 RU. Before capturing the biotinylated RBFOX RRM, the running buffer was switched to a HEPES buffer (10 mM HEPES at pH 7.4, 200 mM NaCl, 3.4 mM EDTA, 0.01 % (v/v) Tween 20). Afterwards, approx. 10 µl of a 75 nM solution of biotinylated RBFOX RRM in HEPES buffer was injected only on the second channel, resulting in a response of approx. 200 RU. The amount of injected ligand varied depending on the desired coating. Kinetic measurements of short sequences were done using a surface loading of approx. 130 RU, hairpins were measured with a coating of approx. 50 RU, and the screen, as described, at approx. 200 RU.

SPR Experiments Sierra Sensors; SPR screen

The analytes were all at a concentration of 1 µM in HEPES buffer (10 mM HEPES at pH 7.4, 200 mM NaCl, 3.4 mM EDTA, 0.01 % (v/v) Tween 20). 50 µl of the analyte solution was injected over both channels, followed by a dissociation time of 240 s. After each injection, 2 M NaCl was injected for one minute for regeneration. For double referencing, HEPES buffer was injected before, after and in between each plate. For normalization, a solution of 100 nM UGCAUGU was likewise injected before, after and in between each plate.
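The referencing and normalization described here can be sketched as follows. This is a generic illustration of double referencing (reference channel subtraction followed by buffer blank subtraction) and of scaling to a standard injection; it is not the processing script used for the screen, and all names and numbers are hypothetical.

```python
def double_reference(active, reference, blank_active, blank_reference):
    """Subtract the reference channel, then the reference-corrected buffer blank.

    All arguments are sensorgrams sampled at the same time points (RU).
    """
    return [
        (a - r) - (ba - br)
        for a, r, ba, br in zip(active, reference, blank_active, blank_reference)
    ]

def normalize_to_standard(response, standard_response):
    """Express a response relative to the response of the standard injection."""
    return response / standard_response

# Hypothetical sensorgram fragments (RU) for an analyte and a buffer blank.
corrected = double_reference(
    active=[10.0, 12.0],
    reference=[2.0, 2.0],
    blank_active=[1.0, 1.0],
    blank_reference=[0.5, 0.5],
)
print(corrected)
print(normalize_to_standard(corrected[-1], 19.0))
```

Scaling every plate to the same standard (here the 100 nM UGCAUGU injection) is one way to compensate for a gradual loss of surface activity over a long screen.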

SPR Experiments Sierra Sensors; Kinetic measurements

100 µl of the analyte was injected at a flow rate of 25 µl/min with a dissociation time of 480 s. Afterwards 50 µl of a 2M NaCl solution was added for regeneration. After every injection, a buffer injection was added for double referencing.

sequence  starting concentration [nM]  sequence  starting concentration [nM]
UGCAUGA  500   GGAAUGU  4000
UGCAUGC  500   GGGUUGU  4000
UGCAUGU  100   GGCUUGU  4000
UGCAUGG  500   AGGUUGU  4000
UGCACGA  500   UGCUUGU  4000
UGCACGG  1000  CGGUUGU  4000
CGCAUGU  500   UGCCUGU  4000
AGCAUGU  500   GGCAUGU  4000
UGCACGC  1000  UGCAAGC  4000
UGCACGU  500

Table 13. Starting concentrations for the kinetic measurements of the SPR screen hits against RBFOX RRM.

SPR Data fitting

The data were fitted with Scrubber3.0 using a 1:1 Langmuir model including mass transfer. The data were referenced against a streptavidin channel. The average buffer injection was subtracted from the signal to account for bulk shift.
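For orientation, the shape of a 1:1 Langmuir sensorgram can be sketched with the closed-form solution of the rate equation dR/dt = ka·C·(Rmax − R) − kd·R. This is a simplified illustration: it omits the mass transfer term that was included in the actual fits, it does not perform any fitting, and the rate constants are hypothetical.

```python
import math

def langmuir_response(t, conc, ka, kd, rmax, t_assoc):
    """Response (RU) of a 1:1 interaction at time t (s), without mass transfer.

    conc    -- analyte concentration (M)
    ka, kd  -- association (M^-1 s^-1) and dissociation (s^-1) rate constants
    rmax    -- response at surface saturation (RU)
    t_assoc -- duration of the analyte injection (s)
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # steady-state response at this concentration
    if t <= t_assoc:
        return r_eq * (1.0 - math.exp(-k_obs * t))
    r_end = r_eq * (1.0 - math.exp(-k_obs * t_assoc))
    return r_end * math.exp(-kd * (t - t_assoc))

# Hypothetical constants: KD = kd / ka = 100 nM.
ka, kd, rmax = 1.0e5, 1.0e-2, 100.0
for t in (0.0, 60.0, 240.0, 300.0, 480.0):
    print(t, round(langmuir_response(t, 1.0e-7, ka, kd, rmax, t_assoc=240.0), 2))
```

At an analyte concentration equal to KD the steady-state response is half of Rmax, which is a convenient sanity check for any fitted parameter set.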

SPR Experiments Biacore

The SPR measurements were carried out using a Biacore T100 at 25°C. The running buffer was a HEPES buffer (200 mM NaCl, 10 mM HEPES, 3.4 mM EDTA, 0.01 % Tween 20) at a pH of 7.3 for both coating and measurements. The biotinylated FOX RRM was coated on the second flow cell of a Biacore SA chip to 140 RU using a 70 nM FOX RRM solution at a flow rate of 25 µL min⁻¹. Each analyte was measured in duplicate in a 1:1 dilution series at a flow rate of 25 µL min⁻¹ with an injection time of 150 s (syn-pre-20b/syn-pre-107), 220 s (single-stranded sequences) or 320 s (syn-pre-32) and a dissociation time of at least 280 s. After each injection, the surface was regenerated using a 1 M NaCl solution for 1 min. The data were double referenced and fitted to a 1:1 Langmuir model including mass transfer using Scrubber3.0 (BioLogicSoftware, http://www.biologic.com).
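A 1:1 dilution series (equal volumes of sample and buffer at every step) halves the concentration with each step. The short sketch below lists the resulting concentrations. The starting concentration and the number of points are hypothetical, since they depend on the analyte.

```python
def dilution_series(start_nM, n_points, factor=2.0):
    """Concentrations (nM) of a serial dilution, highest concentration first.

    factor=2.0 corresponds to a 1:1 dilution (one part sample, one part buffer).
    """
    return [start_nM / factor ** i for i in range(n_points)]

# Hypothetical example: six points starting at 1000 nM.
series = dilution_series(1000.0, 6)
print(series)
```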

UV melting experiments

UV melting experiments were conducted on a Varian Cary 300 Bio spectrometer equipped with a 6x6 Multicell Block Peltier and a thermo controller. We used 160 µl Hellma microcuvettes. The RNA was measured at a concentration of 2.5 µM in a phosphate buffer (100 mM NaCl, 100 mM sodium phosphate, 0.1 mM Na2EDTA, pH 7.0) at 260 nm. The temperature ramp rate was 1 K min⁻¹ between 5°C and 95°C. The melting temperature was calculated using the maximum of the first derivative of the plot of absorbance vs. temperature. The thermodynamic values were obtained from the van’t Hoff plot of ln K vs. 1/T with –ΔH/R as the slope and ΔS/R as the intercept. K is calculated using the associated fraction [320]:

$$K_T = \frac{f_T}{(1 - f_T)^2 \cdot c}$$

$$f(T) = \frac{A_s(T) - A(T)}{A_s(T) - A_d(T)} = \frac{[AB]}{[A] + [AB]}$$

With:
A_s(T): upper baseline
A_d(T): lower baseline
A(T): absorption at temperature T
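The evaluation can be sketched in a few lines: the associated fraction is computed from the baselines, converted into K, and ln K is regressed against 1/T. This is an illustrative sketch, not the evaluation script used here. It assumes a bimolecular equilibrium with equal strand concentrations and takes c as the total concentration of each strand, c = [A] + [AB]; the function names and the numbers in the example are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def fraction_associated(a, a_s, a_d):
    """f(T) from the absorbance a and the upper (a_s) and lower (a_d) baselines."""
    return (a_s - a) / (a_s - a_d)

def association_constant(f, c):
    """K = f / ((1 - f)^2 * c), with c the total concentration per strand (M)."""
    return f / ((1.0 - f) ** 2 * c)

def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def vant_hoff(temps_K, fractions, c):
    """Return (dH in J/mol, dS in J/(mol K)) from ln K versus 1/T.

    The slope of the plot is -dH/R and the intercept is dS/R.
    """
    x = [1.0 / t for t in temps_K]
    y = [math.log(association_constant(f, c)) for f in fractions]
    slope, intercept = linear_fit(x, y)
    return -slope * R, intercept * R

# Hypothetical single point: absorbance halfway between the two baselines.
f_mid = fraction_associated(0.9, a_s=1.0, a_d=0.8)
print(f_mid, association_constant(f_mid, 2.5e-6))
```

Only the transition region, where f is well away from 0 and 1, should enter the regression, because ln K becomes very sensitive to baseline errors near the plateaus.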

Circular Dichroism

The CD spectra were recorded with a Chirascan™ Circular Dichroism Spectrometer at 25°C. We used a HEPES buffer (60 mM NaCl, 10 mM HEPES, 3.4 mM EDTA, 0.01 % Tween 20, pH 7.4) to record our spectra. The oligonucleotides were at a concentration of 2.5 µM. RBFOX RRM was added to the oligonucleotide solution in 1 µM steps from 0.5 µM to 5.5 µM. The spectra were recorded in triplicate in 1 nm steps from 200 to 320 nm. From the RNA/protein spectra, the spectra of the buffer and of the pure RBFOX RRM were subtracted, and the resulting spectra were smoothed using the Savitzky-Golay algorithm [321].
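Savitzky-Golay smoothing replaces each point by the value of a low-order polynomial fitted to its neighbours. The sketch below uses the classic 5-point quadratic window with the coefficients (−3, 12, 17, 12, −3)/35. The window length and polynomial order used for the spectra in this work are not stated, so both are assumptions, as is the choice to leave the two points at each edge unchanged.

```python
# Convolution coefficients of the 5-point quadratic/cubic Savitzky-Golay filter.
SG5_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0

def savgol5(y):
    """Smooth a spectrum sampled at equal steps; edge points are left as they are."""
    smoothed = list(y)
    for i in range(2, len(y) - 2):
        window = y[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed

# Hypothetical noisy ellipticity values recorded in 1 nm steps.
spectrum = [0.0, 1.2, 1.9, 3.3, 3.8, 5.1, 6.0]
print([round(v, 3) for v in savgol5(spectrum)])
```

Because the filter fits a quadratic, it reproduces any locally quadratic band shape exactly and therefore distorts peak heights less than a plain moving average of the same width.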

Small RNA sequencing

Total RNA was isolated with the MirVana Kit and TRIzol reagent according to the manufacturer’s protocol. An input of 1 µg total RNA was processed for small RNA sequencing. Additionally, four calibrator oligonucleotides (Cal 01-04, 5 fmol each) were added as a reference, as described previously with slight modifications. Briefly, to get the identity of longer fragments, total RNA was dephosphorylated using FastAP (Fermentas) and radiolabelled with 200 U/mL T4 polynucleotide kinase, 3’-phosphatase free (Roche), and radioactive ATP ([γ-32P]-ATP, 6000 Ci/mmol, 10 mCi/mL; 10 µL in a final volume of 100 µL; Perkin-Elmer) for 1 h at 37°C. Subsequently, RNA was separated by denaturing PAA (15%) gel electrophoresis and exposed to a phospho-imager screen. Fragments of 20 to 35 nt were excised and eluted in 0.4 M NaCl at 4°C overnight under agitation. The RNA was recovered by ethanol precipitation with addition of Glycoblue (Ambion). Thereafter, the preadenylated 3’-adapter (IDT DNA technologies, 5’-TGGAATTCTCGGGTGCCAAGG-3’) was ligated with truncated T4 RNA Ligase 2 (1-249, K227Q, final concentration 2000 U/µL, NEB) overnight on ice. After separation on a denaturing 15% PAA gel, the ligated RNA was cut out of the gel and extracted in 0.4 M NaCl overnight. Again, the RNA was recovered by ethanol precipitation with addition of Glycoblue (Ambion), followed by ligation of the 5’-RNA adapter (5’-GUUCAGAGUUCUACAGUCCGACGAUC-3’) with T4 RNA ligase (Fermentas) at 37°C for 1 h and another 12% PAA gel purification with elution in 0.4 M NaCl overnight and ethanol precipitation. RNA was converted into cDNA with SuperScript III reverse transcriptase (Invitrogen) according to the manufacturer’s description. The library amplification step was performed with the minimal number of PCR cycles that still enabled us to see the DNA product on the agarose gel (2.5%) in the pilot PCR. PCR was performed using Taq polymerase (DreamTaq, Thermo) with Illumina TruSeq Small RNA Sample Prep Kits. The final PCR product was extracted using QIAEX II (Qiagen) according to the manufacturer’s protocol. The recovered DNA was submitted to Illumina single-end 50-cycle sequencing on a HiSeq 2000 instrument.

RNA sequencing data analysis

We obtained on average 24 million sequencing reads per sample. Adapter sequences were removed from the sequencing reads using the fastx clipper tool from the FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit). Unclipped reads and clipped reads shorter than 13 nucleotides were discarded, retaining over 95% of the reads for further analysis. Clipped reads were aligned to the Ensembl reference version 37 release 72 (GRCh37.72) using Bowtie v.1.0.0 [322]. We configured the short read aligner to allow one mismatch per read and to report a maximum of one hit per input read by randomly assigning ambiguous reads to one of the genomic regions they aligned to (bowtie parameters: -l 50 -n 1 -e 30 -m 10 -k 1 --best --strata --nomaqround). Reads were annotated with the Ensembl annotation for GRCh37.72 as well as the mature microRNA annotations provided by miRBase (Release 20) using HTSeq-count from the Python package HTSeq (http://www-huber.embl.de/users/anders/HTSeq). Data normalization and differential expression tests were carried out using the Bioconductor package DESeq2 [323]. Samples were analyzed in a paired test setting to account for batch effects using generalized linear models. Data normalization and dispersion estimation were performed using all expressed genes, but lowly expressed genes (base mean < 10) were excluded from the final test for differential expression in order to lose as little statistical power as possible to multiple hypothesis testing. The p-values were adjusted to control the false discovery rate under 5% using the Benjamini-Hochberg procedure.
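The last two steps (excluding lowly expressed genes, then adjusting the remaining p-values) can be illustrated with a minimal pure-Python sketch of the Benjamini-Hochberg procedure. In the analysis itself these steps were carried out within DESeq2; the gene names, base means and p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def filter_by_base_mean(genes, min_base_mean=10.0):
    """Keep genes whose base mean is at least min_base_mean before testing."""
    return [g for g in genes if g["base_mean"] >= min_base_mean]

# Hypothetical test results: name, mean normalized count, raw p-value.
results = [
    {"name": "gene_a", "base_mean": 250.0, "p": 0.01},
    {"name": "gene_b", "base_mean": 3.0, "p": 0.001},  # dropped: base mean < 10
    {"name": "gene_c", "base_mean": 40.0, "p": 0.04},
    {"name": "gene_d", "base_mean": 900.0, "p": 0.03},
    {"name": "gene_e", "base_mean": 15.0, "p": 0.005},
]
kept = filter_by_base_mean(results)
padj = benjamini_hochberg([g["p"] for g in kept])
significant = [g["name"] for g, q in zip(kept, padj) if q < 0.05]
print(significant)
```

Removing genes before the adjustment reduces the number of hypotheses m, which is why the filter preserves power for the genes that remain.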

5.: Supplementary tables and figures

5.1.: List of Abbreviations

A2BP1 Ataxin2 binding protein 1
ACN Acetonitrile
AcOH Acetic acid
ASO antisense oligonucleotides
BTT 5-Benzylthio-tetrazole
CD circular dichroism
CGRP calcitonin gene-related peptide
CLIP cross-linking immunoprecipitation
DCA dichloroacetic acid
DCE dichloroethane
DCM dichloromethane
dMC dimethyl cytidine
DMT Dimethoxytrityl
DNA Deoxyribonucleic acid
FBE FOX binding element
FOX feminizing locus on X
HITS-CLIP High-throughput sequencing cross-linking immunoprecipitation
LNA locked nucleic acids
MALDI Matrix-assisted laser desorption/ionization
MeOH Methanol
mESC mouse embryonic stem cells
miRNAs microRNAs
mMC monomethyl cytidine
mRNA messenger RNA
ncRNA non-coding RNA
NLS nuclear localisation signal
NMP N-Methyl-2-pyrrolidone
NMR Nuclear magnetic resonance
NPC nuclear pore complex
nt nucleotides
PAR-CLIP Photoactivatable-Ribonucleoside-Enhanced Crosslinking Immunoprecipitation
PCR Polymerase chain reaction
RBD RNA binding domain
RBP RNA binding protein
RIP RNA immunoprecipitation
RISC RNA induced silencing complex
RNA Ribonucleic acid
RRM RNA recognition motif
RT room temperature
RT-PCR Real time Polymerase chain reaction
RU Response Units
SAM self-assembling monolayer
SELEX Systematic Evolution of Ligands by Exponential Enrichment
SPR Surface Plasmon Resonance
SPW Surface Plasmon wave
TBDMS t-butyldimethylsilyl
TEA Triethylamine
TL terminal loop
TOCSY Total correlation spectroscopy
TOM [(triisopropylsilyl)oxy]methyl
TRIS Tris(hydroxymethyl)-aminomethane

Table 14. List of abbreviations used.

5.2.: Tables

Sample; RU; RU normalized to standard; RU normalized to mass; RU normalized to mass and standard

GGCAUGU 13,5 15,43 13,26 15,15

UGCAUGA 11,4 13,61 11,28 13,46

UGCAUGG 11,4 13,61 11,19 13,37

UGCAUGU 10,7 12,78 10,70 12,77

UGCAUGC 10,5 12,54 10,50 12,54

UGCACGG 11 12,05 10,81 11,84

UGCACGU 10,6 11,62 10,60 11,62

UGCACGA 10,6 11,62 10,49 11,49

AGCAUGU 9,8 11,20 9,69 11,08

UGCACGC 10,1 11,07 10,10 11,07

UGCAUGU 9 10,75 8,90 10,63

CGCAUGU 9,2 10,22 9,20 10,22

GGCUUGU 8,6 9,83 8,53 9,75

GGAAUGU 6,7 7,66 6,51 7,44

UGCAAGC 7 7,00 6,93 6,93

GGAUUGU 5,1 5,83 5,01 5,72

GGCCUGU 5 5,71 4,96 5,67

CGGUUGU 5 5,56 4,96 5,51

UGCCUGU 4,5 5,37 4,50 5,37

GGUAUGU 4,8 5,33 4,71 5,23

CACUUGU 4,3 4,85 4,38 4,93

UGCACCU 4,6 4,84 4,69 4,93

GGGUUGU 4,4 4,89 4,29 4,76

CGCUUGU 4,2 4,67 4,24 4,72

UGCAAGU 4,7 4,70 4,65 4,65

UGCUUGU 3,8 4,54 3,80 4,54

CACGUGU 4 4,51 4,00 4,51

CGAAUGU 4,1 4,56 4,06 4,51

UGCAAGA 4,4 4,40 4,31 4,31

UGCACUA 3,9 4,27 3,93 4,31

UGCAAGG 4,1 4,10 3,98 3,98

UGCACCC 3,6 3,79 3,67 3,86

UGCAUCG 3,2 3,82 3,20 3,82

UACCUGU 3,7 3,75 3,74 3,79

UGCACCG 3,5 3,68 3,50 3,69

CCAUUGU 3,3 3,57 3,36 3,63

AGGUUGU 3,2 3,66 3,14 3,59

UGCAUCC 2,8 3,34 2,85 3,41

AGCUUGU 2,9 3,31 2,90 3,31

UGCACUG 2,9 3,18 2,90 3,18

GGAGUGU 2,8 3,20 2,70 3,09

UUCCUGU 2,5 3,03 2,53 3,06

UGAAUGU 2,8 3,07 2,74 3,00

UGUUUGU 2,5 2,99 2,50 2,98

GUAAUGU 2,7 3,00 2,67 2,97

GCGAUGU 2,5 2,99 2,45 2,93

UGGCUGU 2,5 2,99 2,45 2,93

GGACUGU 2,6 2,97 2,55 2,92

UGCAGUU 2,6 2,89 2,60 2,89

GGUUUGU 2,6 2,89 2,58 2,87

UGCAUCA 2,3 2,75 2,32 2,77

GCCUUGU 2,1 2,51 2,12 2,53

CGGGUGU 2,3 2,56 2,24 2,49

UGCACCA 2,3 2,42 2,32 2,44

UGGAUGU 2,1 2,51 2,04 2,44

GCGUUGU 2 2,39 1,98 2,37

CACCUGU 2 2,25 2,04 2,30

UGCAUAC 1,9 2,27 1,91 2,29

UGAUUGU 2,1 2,30 2,08 2,28

CGGAUGU 2 2,22 1,96 2,18

AGCCUGU 1,9 2,17 1,90 2,17

CCAGUGU 2 2,16 2,00 2,16

UGCAUUA 1,9 2,14 1,91 2,16

GGGAUGU 2 2,22 1,93 2,14

CGCCUGU 1,9 2,11 1,92 2,13

UGCACAU 2 2,11 2,01 2,12

GGCGUGU 1,9 2,17 1,85 2,12

CGGCUGU 1,9 2,11 1,89 2,10

GCGCUGU 1,7 2,03 1,69 2,02

UGUAUGU 1,7 2,03 1,68 2,01

UCACUGU 1,7 1,86 1,73 1,90

UGCACAA 1,8 1,89 1,79 1,89

GGUGUGU 1,7 1,89 1,66 1,84

UGCAUCU 1,5 1,79 1,53 1,82

GCAUUGU 1,5 1,79 1,50 1,79

UGCACUC 1,6 1,75 1,63 1,79

AGUUUGU 1,5 1,71 1,50 1,71

GUCUUGU 1,2 1,68 1,21 1,70

UCUCUGU 1,5 1,64 1,54 1,69

UGCAGUG 1,5 1,67 1,47 1,64

AGUAUGU 1,4 1,60 1,38 1,58

UGCACAG 1,5 1,58 1,48 1,56

GUAUUGU 1,4 1,56 1,40 1,55

GAGUUGU 1,4 1,58 1,37 1,55

UGGUUGU 1,3 1,55 1,28 1,52

AGUGUGU 1,3 1,49 1,28 1,46

GAUAUGU 1,3 1,46 1,29 1,45

UGCAUAA 1,2 1,43 1,20 1,43

UGCGUGU 1,2 1,43 1,18 1,41

UGCACUU 1,2 1,32 1,22 1,34

GGUCUGU 1,2 1,33 1,19 1,32

UGCAACU 1,3 1,30 1,31 1,31

UGCAACG 1,3 1,30 1,29 1,29

AGGAUGU 1,1 1,26 1,07 1,22

CGCGUGU 1,1 1,22 1,09 1,21

UUGUUGU 1 1,21 0,99 1,20

UGCAGGG 1,1 1,22 1,06 1,18

AAGCUGU 1,1 1,19 1,09 1,18

GGGGUGU 1,1 1,22 1,05 1,17

AGUCUGU 1 1,14 1,00 1,14

GAGGUGU 1 1,13 0,96 1,09

GGGCUGU 1 1,11 0,97 1,08

UGCACAC 1 1,05 1,01 1,06

GAUUUGU 0,9 1,01 0,90 1,01

GUAGUGU 0,9 1,00 0,88 0,98

UUGCUGU 0,8 0,97 0,79 0,96

AAGUUGU 0,9 0,97 0,89 0,96

AUGCUGU 0,8 0,96 0,80 0,95

AUGUUGU 0,8 0,96 0,80 0,95

CGUUUGU 0,9 0,94 0,91 0,94

UGCAGUC 0,8 0,89 0,80 0,89

UGCAACC 0,8 0,80 0,81 0,81

GUACUGU 0,7 0,78 0,70 0,78

UGCAGGU 0,7 0,78 0,69 0,76

UUCUUGU 0,6 0,73 0,61 0,74

GCCCUGU 0,6 0,72 0,61 0,72

UUCGUGU 0,6 0,73 0,60 0,72

UGUCUGU 0,6 0,72 0,60 0,72

UUGAUGU 0,6 0,73 0,59 0,71

UUGGUGU 0,6 0,73 0,58 0,71

UAGCUGU 0,7 0,71 0,69 0,70

UGCAACA 0,7 0,70 0,70 0,70

GCUUUGU 0,6 0,69 0,61 0,69

CGACUGU 0,6 0,67 0,60 0,67

GAUGUGU 0,6 0,68 0,59 0,66

UGCAGUA 0,6 0,67 0,59 0,66

AACUUGU 0,6 0,65 0,60 0,65

UGCAGGA 0,6 0,67 0,58 0,65

AACGUGU 0,6 0,65 0,59 0,64

CGUAUGU 0,6 0,62 0,60 0,62

GCACUGU 0,5 0,60 0,50 0,60

GCAAUGU 0,5 0,60 0,49 0,59

GACUUGU 0,5 0,56 0,50 0,56

GUCAUGU 0,4 0,56 0,40 0,56

CGUCUGU 0,5 0,52 0,51 0,53

UACGUGU 0,5 0,51 0,50 0,50

GCAGUGU 0,4 0,48 0,39 0,47

AUGGUGU 0,4 0,48 0,39 0,47

UGUGUGU 0,4 0,48 0,39 0,47

CACAUGU 0,4 0,45 0,40 0,45

CAAUUGU 0,4 0,45 0,40 0,45

CCAAUGU 0,4 0,43 0,40 0,44

UGACUGU 0,4 0,44 0,40 0,43

UGAGUGU 0,4 0,44 0,39 0,43

GUCCUGU 0,3 0,42 0,30 0,43

CUCAUGU 0,4 0,42 0,41 0,42

AAGAUGU 0,4 0,43 0,39 0,42

ACACUGU 0,4 0,42 0,40 0,42

ACCAUGU 0,4 0,42 0,40 0,42

UUCAUGU 0,3 0,36 0,30 0,36

UGCAUAU 0,3 0,36 0,30 0,36

GCCGUGU 0,3 0,36 0,30 0,36

UGCAUAG 0,3 0,36 0,30 0,35

GCGGUGU 0,3 0,36 0,29 0,35

AGAAUGU 0,3 0,35 0,29 0,35

UGCAUUG 0,3 0,34 0,30 0,34

AGCGUGU 0,3 0,34 0,29 0,34

CGAUUGU 0,3 0,33 0,30 0,33

GAGCUGU 0,3 0,34 0,29 0,33

AGGGUGU 0,3 0,34 0,29 0,33

GAGAUGU 0,3 0,34 0,29 0,33

AAAUUGU 0,3 0,32 0,30 0,32

ACCCUGU 0,3 0,31 0,31 0,32

AAAGUGU 0,3 0,32 0,29 0,32

ACCGUGU 0,3 0,31 0,30 0,31

UACUUGU 0,3 0,30 0,30 0,31

UGCAAAU 0,3 0,30 0,30 0,30

UAGAUGU 0,3 0,30 0,29 0,30

UAGGUGU 0,3 0,30 0,29 0,30

GUGAUGU 0,2 0,28 0,20 0,28

UUUGUGU 0,2 0,23 0,20 0,23

GAUCUGU 0,2 0,23 0,20 0,23

AGGCUGU 0,2 0,23 0,20 0,22

GACAUGU 0,2 0,23 0,20 0,22

GACGUGU 0,2 0,23 0,20 0,22

CCACUGU 0,2 0,22 0,20 0,22

CAUUUGU 0,2 0,22 0,20 0,22

AAGGUGU 0,2 0,22 0,19 0,21

AAUCUGU 0,2 0,21 0,20 0,21

ACAUUGU 0,2 0,21 0,20 0,21

UGCAAUG 0,2 0,21 0,20 0,21

CUAGUGU 0,2 0,21 0,20 0,21

ACAAUGU 0,2 0,21 0,20 0,21

CGUGUGU 0,2 0,21 0,20 0,21

ACAGUGU 0,2 0,21 0,20 0,21

UACAUGU 0,2 0,20 0,20 0,20

UGCAAAC 0,2 0,20 0,20 0,20

GUCGUGU 0,1 0,14 0,10 0,14

AUCGUGU 0,1 0,12 0,10 0,12

AUGAUGU 0,1 0,12 0,10 0,12

GCUAUGU 0,1 0,11 0,10 0,11

CAACUGU 0,1 0,11 0,10 0,11

CGAGUGU 0,1 0,11 0,10 0,11

AACAUGU 0,1 0,11 0,10 0,11

CUACUGU 0,1 0,10 0,10 0,11

CUAUUGU 0,1 0,10 0,10 0,11

UAGUUGU 0,1 0,10 0,10 0,10

UAAAUGU 0,1 0,10 0,10 0,10

UGCAGCC 0 0,00 0,00 0,00

UGCAUUU 0 0,00 0,00 0,00

UGCAGGC 0 0,00 0,00 0,00

AAACUGU 0 0,00 0,00 0,00

CAGUUGU 0 0,00 0,00 0,00

CAAAUGU 0 0,00 0,00 0,00

AACCUGU 0 0,00 0,00 0,00

CAUCUGU 0 0,00 0,00 0,00

CAUGUGU 0 0,00 0,00 0,00

ACCUUGU 0 0,00 0,00 0,00

GACCUGU 0 0,00 0,00 0,00

GCUGUGU 0 0,00 0,00 0,00

CUAAUGU 0 0,00 0,00 0,00

CUCUUGU 0 0,00 0,00 0,00

UUAGUGU 0 0,00 0,00 0,00

UAAUUGU 0 0,00 0,00 0,00

UCUGUGU 0 0,00 0,00 0,00

UGGGUGU 0 0,00 0,00 0,00

UGCAAAG -0,1 -0,10 -0,10 -0,10

UGCAAAA -0,1 -0,10 -0,10 -0,10

AAUGUGU -0,1 -0,10 -0,10 -0,10

UGCAAUA -0,1 -0,11 -0,10 -0,10

CUCGUGU -0,1 -0,10 -0,10 -0,11

CAGGUGU -0,1 -0,11 -0,10 -0,11

CAAGUGU -0,1 -0,11 -0,10 -0,11

AUUUUGU -0,1 -0,11 -0,10 -0,11

UGCAUUC -0,1 -0,11 -0,10 -0,11

AGAUUGU -0,1 -0,12 -0,10 -0,12

AUAAUGU -0,1 -0,12 -0,10 -0,12

AUCAUGU -0,1 -0,12 -0,10 -0,12

UUAUUGU -0,1 -0,12 -0,10 -0,12

AUCUUGU -0,1 -0,12 -0,10 -0,12

AUCCUGU -0,1 -0,12 -0,10 -0,12

GUGCUGU -0,1 -0,14 -0,10 -0,14

UAAGUGU -0,2 -0,20 -0,20 -0,20

UGCAAUC -0,2 -0,21 -0,20 -0,21

AAAAUGU -0,2 -0,22 -0,20 -0,21

CAGAUGU -0,2 -0,22 -0,20 -0,21

CUCCUGU -0,2 -0,21 -0,21 -0,21

UCGAUGU -0,2 -0,22 -0,20 -0,22

UGCAGCA -0,2 -0,22 -0,20 -0,22

UCUUUGU -0,2 -0,22 -0,21 -0,23

AUUCUGU -0,2 -0,23 -0,20 -0,23

GCUCUGU -0,2 -0,23 -0,20 -0,23

AGACUGU -0,2 -0,24 -0,20 -0,23

UUUCUGU -0,2 -0,23 -0,20 -0,23

AUAUUGU -0,2 -0,24 -0,20 -0,24

AUACUGU -0,2 -0,24 -0,20 -0,24

UUACUGU -0,2 -0,24 -0,20 -0,24

GAACUGU -0,2 -0,25 -0,20 -0,24

UAACUGU -0,3 -0,30 -0,30 -0,30

AAUUUGU -0,3 -0,31 -0,30 -0,31

UGCAAUU -0,3 -0,32 -0,30 -0,32

CAGCUGU -0,3 -0,32 -0,30 -0,32

UAUGUGU -0,3 -0,33 -0,30 -0,33

UCGGUGU -0,3 -0,33 -0,30 -0,33

CAUAUGU -0,3 -0,32 -0,30 -0,33

UGCAGCG -0,3 -0,33 -0,29 -0,33

UGCAGCU -0,3 -0,33 -0,30 -0,33

AUUGUGU -0,3 -0,34 -0,30 -0,34

AUUAUGU -0,3 -0,34 -0,30 -0,34

UUUAUGU -0,3 -0,35 -0,30 -0,35

UUUUUGU -0,3 -0,35 -0,30 -0,35

AUAGUGU -0,3 -0,36 -0,30 -0,35

CUGGUGU -0,3 -0,37 -0,30 -0,37

CUGUUGU -0,3 -0,37 -0,30 -0,37

CUGCUGU -0,3 -0,37 -0,30 -0,37

AAUAUGU -0,4 -0,42 -0,40 -0,41

UAUCUGU -0,4 -0,44 -0,40 -0,44

UCGCUGU -0,4 -0,44 -0,40 -0,44

UCUAUGU -0,4 -0,44 -0,41 -0,45

UUAAUGU -0,4 -0,48 -0,40 -0,48

GAAUUGU -0,4 -0,49 -0,40 -0,49

UGCAGAA -0,5 -0,55 -0,49 -0,54

UAUAUGU -0,5 -0,55 -0,50 -0,55

UCAGUGU -0,5 -0,55 -0,50 -0,55

UAUUUGU -0,5 -0,55 -0,51 -0,55

UCGUUGU -0,5 -0,55 -0,51 -0,55

UCCAUGU -0,5 -0,55 -0,51 -0,56

CCGAUGU -0,4 -0,56 -0,40 -0,56

UCCUUGU -0,5 -0,55 -0,51 -0,56

CCGCUGU -0,4 -0,56 -0,40 -0,57

CCCAUGU -0,4 -0,56 -0,41 -0,57

CCCUUGU -0,4 -0,56 -0,41 -0,58

CCCCUGU -0,4 -0,56 -0,41 -0,58

ACGAUGU -0,5 -0,59 -0,49 -0,58

ACGUUGU -0,5 -0,59 -0,50 -0,59

GAAGUGU -0,5 -0,62 -0,49 -0,60

GAAAUGU -0,5 -0,62 -0,49 -0,60

CUGAUGU -0,5 -0,62 -0,50 -0,62

UGCAGAC -0,6 -0,66 -0,59 -0,65

UCAAUGU -0,6 -0,66 -0,60 -0,66

UCAUUGU -0,6 -0,66 -0,61 -0,67

UCCCUGU -0,6 -0,66 -0,62 -0,68

ACGGUGU -0,6 -0,71 -0,59 -0,69

CCGGUGU -0,5 -0,70 -0,50 -0,70

UCCGUGU -0,7 -0,77 -0,71 -0,78

AGAGUGU -0,7 -0,82 -0,68 -0,80

GUGGUGU -0,6 -0,84 -0,58 -0,82

ACUUUGU -0,7 -0,82 -0,71 -0,84

ACUCUGU -0,7 -0,82 -0,71 -0,84

CCCGUGU -0,6 -0,84 -0,61 -0,85

CUUAUGU -0,7 -0,86 -0,71 -0,88

CUUCUGU -0,7 -0,86 -0,72 -0,89

ACUAUGU -0,8 -0,94 -0,81 -0,95

UGCAGAG -0,9 -0,99 -0,87 -0,96

GUGUUGU -0,7 -0,98 -0,69 -0,97

CCGUUGU -0,7 -0,98 -0,71 -0,99

CCUAUGU -0,7 -0,98 -0,71 -1,00

CUUUUGU -0,8 -0,98 -0,82 -1,01

CUUGUGU -0,9 -1,11 -0,91 -1,12

UGCAGAU -1,1 -1,21 -1,09 -1,19

ACGCUGU -1,1 -1,29 -1,10 -1,29

ACUGUGU -1,2 -1,41 -1,20 -1,41

GUUCUGU -1 -1,40 -1,01 -1,42

CCUCUGU -1 -1,40 -1,03 -1,45

CCUUUGU -1,2 -1,68 -1,24 -1,73

GUUGUGU -1,4 -1,96 -1,39 -1,95

GUUAUGU -1,4 -1,96 -1,40 -1,96

CCUGUGU -1,5 -2,11 -1,52 -2,13

GUUUUGU -2,1 -2,95 -2,12 -2,98

GCCAUGU -6,5 -7,76 -6,50 -7,76

Supplementary Table 1. List of all 320 sequences from the RNA screen. The RU response was normalized to the mass of the analyte and the surface density of the chip.
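The mass normalization can be illustrated with a short sketch. It assumes that the response is scaled by the ratio of the mass of the reference heptamer UGCAUGU to the mass of the analyte, and that the mass of an oligonucleotide with free 5’- and 3’-hydroxyl groups is the sum of standard average nucleotide residue masses minus 61.96 g/mol. Both are assumptions rather than a documented procedure; they reproduce the tabulated pentamer masses (for example AAAAA, 1584.09 g/mol) and the mass-normalized values of the first table rows to within rounding.

```python
# Average masses (g/mol) of nucleotide monophosphate residues within an RNA chain.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
# Correction for an oligonucleotide with 5'-OH and 3'-OH termini (assumed).
TERMINAL_CORRECTION = -61.96

def oligo_mass(seq):
    """Approximate average mass (g/mol) of an unmodified RNA oligonucleotide."""
    return sum(RESIDUE_MASS[base] for base in seq) + TERMINAL_CORRECTION

def mass_normalized(ru, seq, reference="UGCAUGU"):
    """Scale a response to the mass of the reference sequence (assumed scheme)."""
    return ru * oligo_mass(reference) / oligo_mass(seq)

for seq, ru in (("GGCAUGU", 13.5), ("UGCAUGA", 11.4), ("UGCAUGU", 10.7)):
    print(seq, round(oligo_mass(seq), 2), round(mass_normalized(ru, seq), 2))
```

Since the SPR response is roughly proportional to the bound mass, this scaling makes the responses of heptamers with different base compositions comparable on a per-molecule basis.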

Name; Sequence (5’ → 3’)

H-Ras GGUGGUGGUGGGCGCCGUCGGUGUGGGCAAGAGUGCGCUGACCAUCCUUUUUU[BioTEG-Q]

hsa-let-7a-1 UGAGGUAGUAGGUUGUAUAGUUUUAGGGUCACACCCACCACUGGGAGAUAACUAUACAAUCUACUGUCUUUC[BioTEG-Q]

hsa-let-7a-2 UGAGGUAGUAGGUUGUAUAGUUUAGAAUUACAUCAAGGGAGAUAACUGUACAGCCUCCUAGCUUUCC[BioTEG-Q]

hsa-let-7c UGAGGUAGUAGGUUGUAUGGUUUAGAGUUACACCCUGGGAGUUAACUGUACAACCUUCUAGCUUUCC[BioTEG-Q]

hsa-let-7d AGAGGUAGUAGGUUGCAUAGUUUUAGGGCAGGGAUUUUGCCCACAAGGAGGUAACUAUACGACCUGCUGCCUUUC[BioTEG-Q]

hsa-let-7f-1 UGAGGUAGUAGAUUGUAUAGUUGUGGGGUAGUGAUUUUACCCUGUUCAGGAGAUAACUAUACAAUCUAUUGCCUUCCC[BioTEG-Q]

hsa-let-7f-2 UGAGGUAGUAGAUUGUAUAGUUUUAGGGUCAUACCCCAUCUUGGAGAUAACUAUACAGUCUACUGUCUUUCC[BioTEG-Q]

hsa-let-7g UGAGGUAGUAGUUUGUACAGUUUGAGGGUCUAUGAUACCACCCGGUACAGGAGAUAACUGUACAGGCCACUGCCUUGC[BioTEG-Q]

hsa-let-7i UGAGGUAGUAGUUUGUGCUGUUGGUCGGGUUGUGACAUUGCCCGCUGUGGAGAUAACUGCGCAAGCUACUGCCUUGCU[BioTEG-Q]

hsa-miR-101-1 CAGUUAUCACAGUGCUGAUGCUGUCUAUUCUAAAGGUACAGUACUGUGAUAACUGAA[BioTEG-Q]

hsa-miR-101-2 CGGUUAUCAUGGUACCGAUGCUGUAUAUCUGAAAGGUACAGUACUGUGAUAACUGAA[BioTEG-Q]

hsa-miR-103-1 UCGGCUUCUUUACAGUGCUGCCUUGUUGCAUAUGGAUCAAGCAGCAUUGUACAGGGCUAUGA[BioTEG-Q]

hsa-miR-106b UAAAGUGCUGACAGUGCAGAUAGUGGUCCUCUCCGUGCUACCGCACUGUGGGUACUUGCUGC[BioTEG-Q]

hsa-miR-107 UCAGCUUCUUUACAGUGUUGCCUUGUGGCAUGGAGUUCAAGCAGCAUUGUACAGGGCUAUCA[BioTEG-Q]

hsa-miR-10a UACCCUGUAGAUCCGAAUUUGUGUAAGGAAUUUUGUGGUCACAAAUUCGUAUCUAGGGGAAUA[BioTEG-Q]

hsa-miR-10b UACCCUGUAGAACCGAAUUUGUGUGGUAUCCGUAUAGUCACAGAUUCGAUUCUAGGGGAAUA[BioTEG-Q]

hsa-miR-1-2 ACAUACUUCUUUAUGUACCCAUAUGAACAUACAAUGCUAUGGAAUGUAAAGAAGUAUGUAU[BioTEG-Q]

hsa-miR-125b-1 UCCCUGAGACCCUAACUUGUGAUGUUUACCGUUUAAAUCCACGGGUUAGGCUCUUGGGAGCU[BioTEG-Q]

hsa-miR-128-1 CGGGGCCGUAGCACUGUCUGAGAGGUUUACAUUUCUCACAGUGAACCGGUCUCUUU[BioTEG-Q]

hsa-miR-133a-1 GCUGGUAAAAUGGAACCAAAUCGCCUCUUCAAUGGAUUUGGUCCCCUUCAACCAGCUG[BioTEG-Q]

hsa-miR-134 UGUGACUGGUUGACCAGAGGGGCAUGCACUGUGUUCACCCUGUGGGCCACCUAGUCACCAA[BioTEG-Q]

hsa-miR-135a-2 UAUGGCUUUUUAUUCCUAUGUGAUAGUAAUAAAGUCUCAUGUAGGGAUGGAAGCCAUGAA[BioTEG-Q]

hsa-miR-136 ACUCCAUUUGUUUUGAUGAUGGAUUCUUAUGCUCCAUCAUCGUCUCAAAUGAGUCU[BioTEG-Q]

hsa-miR-137 ACGGGUAUUCUUGGGUGGAUAAUACGGAUUACGUUGUUAUUGCUUAAGAAUACGCGUAG[BioTEG-Q]

hsa-miR-138-2 AGCUGGUGUUGUGAAUCAGGCCGACGAGCAGCGCAUCCUCUUACCCGGCUAUUUCACGACACCAGGGUU[BioTEG-Q]

hsa-miR-140 CAGUGGUUUUACCCUAUGGUAGGUUACGUCAUGCUGUUCUACCACAGGGUAGAACCACGG[BioTEG-Q]

hsa-miR-142 CAUAAAGUAGAAAGCACUACUAACAGCACUGGAGGGUGUAGUGUUUCCUACUUUAUGGA[BioTEG-Q]

hsa-miR-147b GUGGAAACAUUUCUGCACAAACUAGAUUCUGGACACCAGUGUGCGGAAAUGCUUCUGCUA[BioTEG-Q]

hsa-miR-148a AAAGUUCUGAGACACUCCGACUCUGAGUAUGAUAGAAGUCAGUGCACUACAGAACUUUGU[BioTEG-Q]

hsa-miR-153-2 UCAUUUUUGUGAUGUUGCAGCUAGUAAUAUGAGCCCAGUUGCAUAGUCACAAAAGUGAUC[BioTEG-Q]

hsa-miR-15a UAGCAGCACAUAAUGGUUUGUGGAUUUUGAAAAGGUGCAGGCCAUAUUGUGCUGCCUCA[BioTEG-Q]

hsa-miR-16-1 UAGCAGCACGUAAAUAUUGGCGUUAAGAUUCUAAAAUUAUCUCCAGUAUUAACUGUGCUGCUGAA[BioTEG-Q]

hsa-miR-181a-2 AACAUUCAACGCUGUCGGUGAGUUUGGGAUUUGAAAAAACCACUGACCGUUGACUGUACC[BioTEG-Q]

hsa-miR-181b-1 AACAUUCAUUGCUGUCGGUGGGUUGAACUGUGUGGACAAGCUCACUGAACAAUGAAUGCAAC[BioTEG-Q]

hsa-miR-181c AACAUUCAACCUGUCGGUGAGUUUGGGCAGCUCAGGCAAACCAUCGACCGUUGAGUGGAC[BioTEG-Q]

hsa-miR-18a UAAGGUGCAUCUAGUGCAGAUAGUGAAGUAGAUUAGCAUCUACUGCCCUAAGUGCUCCUUCUGG[BioTEG-Q]

hsa-miR-18b UAAGGUGCAUCUAGUGCAGUUAGUGAAGCAGCUUAGAAUCUACUGCCCUAAAUGCCCCUUCUGGC[BioTEG-Q]

hsa-miR-190 UGAUAUGUUUGAUAUAUUAGGUUGUUAUUUAAUCCAACUAUAUAUCAAACAUAUUCCU[BioTEG-Q]

hsa-miR-193b CGGGGUUUUGAGGGCGAGAUGAGUUUAUGUUUUAUCCAACUGGCCCUCAAAGUCCCGCU[BioTEG-Q]

hsa-miR-196b UAGGUAGUUUCCUGUUGUUGGGAUCCACCUUUCUCUCGACAGCACGACACUGCCUUCA[BioTEG-Q]

hsa-miR-19a AGUUUUGCAUAGUUGCACUACAAGAAGAAUGUAGUUGUGCAAAUCUAUGCAAAACUGA[BioTEG-Q]

hsa-miR-204 UUCCCUUUGUCAUCCUAUGCCUGAGAAUAUAUGAAGGAGGCUGGGAAGGCAAAGGGACGU[BioTEG-Q]

hsa-miR-20b CAAAGUGCUCAUAGUGCAGGUAGUUUUGGCAUGACUCUACUGUAGUAUGGGCACUUCCAG[BioTEG-Q]

hsa-miR-21 UAGCUUAUCAGACUGAUGUUGACUGUUGAAUCUCAUGGCAACACCAGUCGAUGGGCUGU[BioTEG-Q]

hsa-miR-214 UGCCUGUCUACACUUGCUGUGCAGAACAUCCGCUCACCUGUACAGCAGGCACAGACAGGCAGU[BioTEG-Q]

hsa-miR-22 AGUUCUUCAGUGGCAAGCUUUAUGUCCUGACCCAGCUAAAGCUGCCAGUUGAAGAACUGU[BioTEG-Q]

hsa-miR-224 CAAGUCACUAGUGGUUCCGUUUAGUAGAUGAUUGUGCAUUGUUUCAAAAUGGUGCCCUAGUGACUACA[BioTEG-Q]

hsa-miR-24-1 UGCCUACUGAGCUGAUAUCAGUUCUCAUUUUACACACUGGCUCAGUUCAGCAGGAACAG[BioTEG-Q]

hsa-miR-25 AGGCGGAGACUUGGGCAAUUGCUGGACGCUGCCCUGGGCAUUGCACUUGUCUCGGUCUGA[BioTEG-Q]

hsa-miR-28 AAGGAGCUCACAGUCUAUUGAGUUACCUUUCUGACUUUCCCACUAGAUUGUGAGCUCCUGGA[BioTEG-Q]

hsa-miR-299 UGGUUUACCGUCCCACAUACAUUUUGAAUAUGUAUGUGGGAUGGUAAACCGCUU[BioTEG-Q]

hsa-miR-29a ACUGAUUUCUUUUGGUGUUCAGAGUCAAUAUAAUUUUCUAGCACCAUCUGAAAUCGGUUA[BioTEG-Q]

hsa-miR-29b-1 CUGGUUUCAUAUGGUGGUUUAGAUUUAAAUAGUGAUUGUCUAGCACCAUUUGAAAUCAGUGUU[BioTEG-Q]

hsa-miR-29b-2 CUGGUUUCACAUGGUGGCUUAGAUUUUUCCAUCUUUGUAUCUAGCACCAUUUGAAAUCAGUGUU[BioTEG-Q]

hsa-miR-29c UGACCGAUUUCUCCUGGUGUUCAGAGUCUGUUUUUGUCUAGCACCAUUUGAAAUCGGUUAUG[BioTEG-Q]

hsa-miR-302a ACUUAAACGUGGAUGUACUUGCUUUGAAACUAAAGAAGUAAGUGCUUCCAUGUUUUGGUGA[BioTEG-Q]

hsa-miR-302c UUUAACAUGGGGGUACCUGCUGUGUGAAACAAAAGUAAGUGCUUCCAUGUUUCAGUGG[BioTEG-Q]

hsa-miR-30a UGUAAACAUCCUCGACUGGAAGCUGUGAAGCCACAGAUGGGCUUUCAGUCGGAUGUUUGCAGC[BioTEG-Q]

hsa-miR-30c-1 UGUAAACAUCCUACACUCUCAGCUGUGAGCUCAAGGUGGCUGGGAGAGGGUUGUUUACUCC[BioTEG-Q]

hsa-miR-30c-2 UGUAAACAUCCUACACUCUCAGCUGUGGAAAGUAAGAAAGCUGGGAGAAGGCUGUUUACUCU[BioTEG-Q]

hsa-miR-31 AGGCAAGAUGCUGGCAUAGCUGUUGAACUGGGAACCUGCUAUGCCAACAUAUUGCCAU[BioTEG-Q]

hsa-miR-32 UAUUGCACAUUACUAAGUUGCAUGUUGUCACGGCCUCAAUGCAAUUUAGUGUGUGUGAUAUUU[BioTEG-Q]

hsa-miR-323 AGGUGGUCCGUGGCGCGUUCGCUUUAUUUAUGGCGCACAUUACACGGUCGACCUCU[BioTEG-Q]

hsa-miR-331 CUAGGUAUGGUCCCAGGGAUCCCAGAUCAAACCAGGCCCCUGGGCCUAUCCUAGAA[BioTEG-Q]

hsa-miR-34b UAGGCAGUGUCAUUAGCUGAUUGUACUGUGGUGGUUACAAUCACUAACUCCACUGCCAUCA[BioTEG-Q]

hsa-miR-34c AGGCAGUGUAGUUAGCUGAUUGCUAAUAGUACCAAUCACUAACCACACGGCCAGG[BioTEG-Q]

hsa-miR-361 UUAUCAGAAUCUCCAGGGGUACUUUAUAAUUUCAAAAAGUCCCCCAGGUGUGAUUCUGAUUU[BioTEG-Q]

hsa-miR-365-1 GAGGGACUUUUGGGGGCAGAUGUGUUUCCAUUCCACUAUCAUAAUGCCCCUAAAAAUCCUUAU[BioTEG-Q]

hsa-miR-370 CAGGUCACGUCUCUGCAGUUACACAGCUCACGAGUGCCUGCUGGGGUGGAACCUGGU[BioTEG-Q]

hsa-miR-373 ACUCAAAAUGGGGGCGCUUUCCUUUUUGUCUGUACUGGGAAGUGCUUCGAUUUUGGGGUGU[BioTEG-Q]

hsa-miR-375 GCGACGAGCCCCUCGCACAAACCGGACCUGAGCGUUUUGUUCGUUCGGCUCGCGUGA[BioTEG-Q]

hsa-miR-376a-1 GUAGAUUCUCCUUCUAUGAGUACAUUAUUUAUGAUUAAUCAUAGAGGAAAAUCCACGU[BioTEG-Q]

hsa-miR-377 AGAGGUUGCCCUUGGUGAAUUCGCUUUAUUUAUGUUGAAUCACACAAAGGCAACUUUUGU[BioTEG-Q]

hsa-miR-378 CUCCUGACUCCAGGUCCUGUGUGUUACCUAGAAAUAGCACUGGACUUGGAGUCAGAAGGC[BioTEG-Q]

hsa-miR-379 UGGUAGACUAUGGAACGUAGGCGUUAUGAUUUCUGACCUAUGUAACAUGGUCCACUAACU[BioTEG-Q]

hsa-miR-382 GAAGUUGUUCGUGGUGGAUUCGCUUUACUUAUGACGAAUCAUUCACGGACAACACUUUU[BioTEG-Q]

hsa-miR-383 AGAUCAGAAGGUGAUUGUGGCUUUGGGUGGAUAUUAAUCAGCCACAGCACUGCCUGGUCAGA[BioTEG-Q]

hsa-miR-411 UAGUAGACCGUAUAGCGUACGCUUUAUCUGUGACGUAUGUAACACGGUCCACUAACC[BioTEG-Q]

hsa-miR-423 UGAGGGGCAGAGAGCGAGACUUUUCUAUUUUCCAAAAGCUCGGUCUGAGGCCCCUCAGU[BioTEG-Q]

hsa-miR-424 CAGCAGCAAUUCAUGUUUUGAAGUGUUCUAAAUGGUUCAAAACGUGAGGCGCUGCUAUA[BioTEG-Q]

hsa-miR-425 AAUGACACGAUCACUCCCGUUGAGUGGGCACCCGAGAAGCCAUCGGGAAUGUCGUGUCCGCCC[BioTEG-Q]

hsa-miR-452 CUUACAACUGUUUGCAGAGGAAACUGAGACUUUGUAACUAUGUCUCAGUCUCAUCUGCAAAGAAGUAAGUG[BioTEG-Q]

hsa-miR-495 GAAGUUGCCCAUGUUAUUUUCGCUUUAUAUGUGACGAAACAAACAUGGUGCACUUCUU[BioTEG-Q]

hsa-miR-497 CAGCAGCACACUGUGGUUUGUACGGCACUGUGGCCACGUCCAAACCACACUGUGGUGUUAGA[BioTEG-Q]

hsa-miR-539 GGAGAAAUUAUCCUUGGUGUGUUCGCUUUAUUUAUGAUGAAUCAUACAAGGACAAUUUCUUUUU[BioTEG-Q]

hsa-miR-582 UUACAGUUGUUCAACCAGUUACUAAUCUAACUAAUUGUAACUGGUUGAACAACUGAACC[BioTEG-Q]

hsa-miR-592 UUGUGUCAAUAUGCGAUGAUGUGUUGUGAUGGCACAGCGUCAUCACGUGGUGACGCAACA[BioTEG-Q]

hsa-miR-604 GCUUGACCUUCCACGCUCUCGUGUCCACUAGCAGGCAGGUUUUCUGACACAGGCUGCGGAAUUCAGGAC[BioTEG-Q]

hsa-miR-628 AUGCUGACAUAUUUACUAGAGGGUAAAAUUAAUAACCUUCUAGUAAGAGUGGCAGUCGA[BioTEG-Q]

hsa-miR-7-1 UGGAAGACUAGUGAUUUUGUUGUUUUUAGAUAACUAAAUCGACAACAAAUCACAGUCUGCCAUA[BioTEG-Q]

hsa-miR-873 GCAGGAACUUGUGAGUCUCCUAUUGAAAAUGAACAGGAGACUGAUGAGUUCCCGGGA[BioTEG-Q]

hsa-miR-874 GGCCCCACGCACCAGGGUAAGAGAGACUCUCGCUUCCUGCCCUGGCCCGAGGGACCGA[BioTEG-Q]

hsa-miR-876 UGGAUUUCUUUGUGAAUCACCAUAUCUAAGCUAAUGUGGUGGUGGUUUACAAAGUAAUUCAUA[BioTEG-Q]

hsa-miR-9-1 UCUUUGGUUAUCUAGCUGUAUGAGUGGUGUGGAGUCUUCAUAAAGCUAGAUAACCGAAAGU[BioTEG-Q]

hsa-miR-9-2 UCUUUGGUUAUCUAGCUGUAUGAGUGUAUUGGUCUUCAUAAAGCUAGAUAACCGAAAGU[BioTEG-Q]

hsa-miR-96 UUUGGCACUAGCACAUUUUUGCUUGUGUCUCUCCGCUCUGAGCAAUCAUGUGCAGUGCCAAUAUG[BioTEG-Q]

Supplementary Table 2. List of all pre-miRs from the ELISA screen.

Name and Sequence; calc. mass [g/mol]
tr-pre-miR-20b GA, CAGGUAGUUUUGGCAUAACUCUACUG 8265,99
tr-pre-miR-20b, CAGGUAGUUUUGGCAUGACUCUACUG 8281,99
tr-pre-miR-32 GA, AAGUUGCAUAUUGUCACGGCCUCAAUGCAAUUU 10476,32
tr-pre-miR-32, AAGUUGCAUGUUGUCACGGCCUCAAUGCAAUUU 10492,32
tr-pre-miR-miR-1-2 GA, UUUAUGUACCCAUAUGAACAUACAAUGCUAUGGAAUAUAAA 13071,95
tr-pre-miR-miR-1-2, UUUAUGUACCCAUAUGAACAUACAAUGCUAUGGAAUGUAAA 13087,95
tr-pre-miR-miR-19a GA, UUGCACUACAAGAAGAAUAUAGUUGUGCAA 9629,90
tr-pre-miR-miR-19a, UUGCACUACAAGAAGAAUGUAGUUGUGCAA 9645,90
tr-pre-miR-miR-206 GA, UUUAUAUCCCCAUAUGGAUUACUUUGCUAUGGAAUAUAAG 12689,62
tr-pre-miR-miR-206, UUUAUAUCCCCAUAUGGAUUACUUUGCUAUGGAAUGUAAG 12705,62

Supplementary Table 3. List of synthesized tr-pre-miRs.

Sequence; content [nmol]; molarity [nM]; calc. mass [g/mol]; Sequence; content [nmol]; molarity [nM]; calc. mass [g/mol]

AAAAA 13,40 67000,57 1584,09 GAAAA 16,75 83740,52 1600,09

AAAAC 28,47 142373,44 1560,06 GAAAC 21,89 109441,68 1576,06

AAAAG 54,83 274163,59 1600,09 GAAAG 17,60 88013,35 1616,09

AAAAU 17,53 87644,72 1561,05 GAAAU 20,46 102275,12 1577,05

AAACA 17,57 87852,42 1560,06 GAACA 14,75 73749,16 1576,06

AAACC 11,36 56814,67 1536,03 GAACC 16,99 84925,00 1552,03

AAACG 17,00 84984,77 1576,06 GAACG 18,59 92950,23 1592,06

AAACU 15,22 76107,01 1537,02 GAACU 20,00 100011,49 1553,02

AAAGA 17,67 88371,97 1600,09 GAAGA 18,92 94595,49 1616,09

AAAGC 15,27 76367,90 1576,06 GAAGC 18,10 90510,72 1592,06

AAAGG 16,31 81567,34 1616,09 GAAGG 21,49 107459,61 1632,09

AAAGU 18,58 92891,13 1577,05 GAAGU 20,96 104778,49 1593,05

AAAUA 16,69 83426,05 1561,05 GAAUA 18,13 90626,67 1577,05

AAAUC 20,22 101079,74 1537,02 GAAUC 21,09 105461,68 1553,02

AAAUG 20,22 101086,21 1577,05 GAAUG 23,41 117031,25 1593,05

AAAUU 18,16 90791,06 1538,01 GAAUU 19,91 99529,10 1554,01

AACAA 16,93 84660,35 1560,06 GACAA 14,68 73412,70 1576,06

AACAC 20,13 100664,81 1536,03 GACAC 16,37 81869,60 1552,03

AACAG 15,07 75356,32 1576,06 GACAG 16,73 83640,56 1592,06

AACAU 13,22 66099,38 1537,02 GACAU 22,98 114876,03 1553,02

AACCA 21,98 109906,06 1536,03 GACCA 19,51 97574,03 1552,03

AACCC 14,60 73006,45 1512,00 GACCC 18,89 94441,08 1528,00

AACCG 16,26 81298,17 1552,03 GACCG 18,00 89992,62 1568,03

AACCU 9,54 47710,45 1512,99 GACCU 19,85 99230,56 1528,99

AACGA 19,97 99866,17 1576,06 GACGA 24,28 121386,62 1592,06

AACGC 25,72 128577,25 1552,03 GACGC 25,90 129518,24 1568,03

AACGG 31,96 159786,49 1592,06 GACGG 23,24 116217,52 1608,06

AACGU 28,20 141005,16 1553,02 GACGU 25,88 129390,25 1569,02

AACUA 28,74 143695,74 1537,02 GACUA 13,56 67793,06 1553,02

AACUC 15,05 75240,92 1512,99 GACUC 17,58 87911,51 1528,99

AACUG 16,96 84819,70 1553,02 GACUG 17,90 89514,81 1569,02

AACUU 12,96 64806,00 1513,98 GACUU 20,89 104460,60 1529,98

AAGAA 14,57 72846,19 1600,09 GAGAA 31,02 155113,36 1616,09

AAGAC 25,95 129764,86 1576,06 GAGAC 20,13 100631,14 1592,06

AAGAG 12,81 64059,71 1616,09 GAGAG 23,59 117971,47 1632,09

AAGAU 22,93 114667,46 1577,05 GAGAU 25,70 128507,79 1593,05

AAGCA 19,10 95494,72 1576,06 GAGCA 23,32 116609,51 1592,06

AAGCC 20,93 104655,35 1552,03 GAGCC 27,17 135849,36 1568,03

AAGCG 24,47 122363,44 1592,06 GAGCG 7,14 35690,23 1608,06

AAGCU 14,18 70908,70 1553,02 GAGCU 29,66 148288,23 1569,02

AAGGA 30,06 150305,68 1616,09 GAGGA 17,09 85427,48 1632,09

AAGGC 21,50 107496,63 1592,06 GAGGC 15,95 79740,77 1608,06

AAGGG 13,41 67055,81 1632,09 GAGGG 17,99 89970,74 1648,09

AAGGU 10,17 50861,16 1593,05 GAGGU 22,31 111529,18 1609,05

AAGUA 15,64 78187,33 1577,05 GAGUA 18,16 90787,24 1593,05

AAGUC 14,20 70998,55 1553,02 GAGUC 24,13 120660,51 1569,02

AAGUG 20,83 104143,93 1593,05 GAGUG 21,16 105801,19 1609,05

AAGUU 22,76 113810,01 1554,01 GAGUU 23,29 116456,92 1570,01

AAUAA 16,00 79975,22 1561,05 GAUAA 7,06 35308,80 1577,05

AAUAC 20,20 100981,43 1537,02 GAUAC 21,49 107447,95 1553,02

AAUAG 10,98 54896,65 1577,05 GAUAG 25,12 125600,29 1593,05

AAUAU 31,14 155694,50 1538,01 GAUAU 27,96 139793,71 1554,01

AAUCA 32,19 160960,47 1537,02 GAUCA 15,49 77462,41 1553,02

AAUCC 18,50 92520,79 1512,99 GAUCC 21,20 106023,91 1528,99

AAUCG 17,79 88957,50 1553,02 GAUCG 18,04 90196,38 1569,02

AAUCU 17,75 88762,98 1513,98 GAUCU 21,50 107518,68 1529,98

AAUGA 17,55 87769,99 1577,05 GAUGA 19,36 96779,12 1593,05

AAUGC 16,35 81739,23 1553,02 GAUGC 19,98 99923,51 1569,02

AAUGG 26,98 134924,16 1593,05 GAUGG 18,28 91397,70 1609,05

AAUGU 30,73 153638,94 1554,01 GAUGU 18,44 92191,43 1570,01

AAUUA 25,12 125590,08 1538,01 GAUUA 14,15 70748,19 1554,01

AAUUC 36,38 181899,30 1513,98 GAUUC 28,89 144472,15 1529,98

AAUUG 39,38 196924,42 1554,01 GAUUG 27,98 139915,70 1570,01

AAUUU 29,27 146373,50 1514,97 GAUUU 8,30 41520,35 1530,97

ACAAA 19,24 96208,70 1560,06 GCAAA 10,91 54530,35 1576,06

ACAAC 23,23 116160,30 1536,03 GCAAC 13,34 66708,80 1552,03

ACAAG 12,24 61205,37 1576,06 GCAAG 16,19 80945,53 1592,06

ACAAU 12,28 61412,28 1537,02 GCAAU 17,50 87496,25 1553,02

ACACA 14,59 72968,75 1536,03 GCACA 6,60 33010,94 1552,03

ACACC 18,63 93166,58 1512,00 GCACC 8,82 44076,95 1528,00

ACACG 18,82 94088,66 1552,03 GCACG 11,30 56521,32 1568,03

ACACU 24,99 124964,08 1512,99 GCACU 19,42 97081,56 1528,99

ACAGA 9,69 48437,04 1576,06 GCAGA 23,71 118560,41 1592,06

ACAGC 17,53 87640,13 1552,03 GCAGC 28,94 144702,67 1568,03

ACAGG 20,94 104689,55 1592,06 GCAGG 33,71 168548,81 1608,06

ACAGU 19,81 99047,55 1553,02 GCAGU 17,34 86716,27 1569,02

ACAUA 11,44 57177,95 1537,02 GCAUA 17,18 85922,18 1553,02

ACAUC 21,25 106236,99 1512,99 GCAUC 14,52 72587,27 1528,99

ACAUG 15,07 75368,73 1553,02 GCAUG 20,54 102709,76 1569,02

ACAUU 8,82 44083,26 1513,98 GCAUU 13,13 65646,83 1529,98

ACCAA 7,93 39663,91 1536,03 GCCAA 17,74 88709,92 1552,03

ACCAC 15,80 78978,39 1512,00 GCCAC 14,69 73463,44 1528,00

ACCAG 20,39 101968,05 1552,03 GCCAG 17,64 88202,42 1568,03

ACCAU 24,44 122175,57 1512,99 GCCAU 14,37 71831,90 1528,99

ACCCA 9,58 47900,03 1512,00 GCCCA 23,79 118958,43 1528,00

ACCCC 16,88 84413,90 1487,97 GCCCC 22,96 114780,00 1503,97

ACCCG 27,71 138561,15 1528,00 GCCCG 24,69 123433,77 1544,00

ACCCU 25,87 129355,77 1488,96 GCCCU 19,93 99628,83 1504,96

ACCGA 12,30 61518,38 1552,03 GCCGA 16,73 83634,71 1568,03

ACCGC 14,63 73152,89 1528,00 GCCGC 14,05 70228,84 1544,00

ACCGG 15,12 75590,36 1568,03 GCCGG 18,49 92443,04 1584,03

ACCGU 11,89 59435,98 1528,99 GCCGU 14,53 72663,04 1544,99

ACCUA 8,53 42651,42 1512,99 GCCUA 17,68 88393,38 1528,99

ACCUC 22,61 113066,17 1488,96 GCCUC 16,84 84204,19 1504,96

ACCUG 20,77 103850,08 1528,99 GCCUG 18,52 92610,24 1544,99

ACCUU 37,89 189455,34 1489,95 GCCUU 13,75 68728,76 1505,95

ACGAA 28,64 143220,23 1576,06 GCGAA 22,38 111880,97 1592,06

ACGAC 17,34 86701,76 1552,03 GCGAC 31,25 156257,52 1568,03

ACGAG 19,06 95278,21 1592,06 GCGAG 25,39 126963,79 1608,06

ACGAU 9,03 45154,30 1553,02 GCGAU 25,86 129287,12 1569,02

ACGCA 10,79 53958,90 1552,03 GCGCA 18,41 92072,84 1568,03

ACGCC 20,48 102379,79 1528,00 GCGCC 15,22 76088,93 1544,00

ACGCG 15,54 77696,36 1568,03 GCGCG 18,11 90574,67 1584,03

ACGCU 14,14 70690,94 1528,99 GCGCU 16,28 81404,70 1544,99

ACGGA 9,03 45169,01 1592,06 GCGGA 19,71 98548,01 1608,06

ACGGC 10,93 54654,71 1568,03 GCGGC 16,96 84779,70 1584,03

ACGGG 9,63 48148,66 1608,06 GCGGG 21,94 109690,21 1624,06

ACGGU 10,01 50044,52 1569,02 GCGGU 11,74 58720,00 1585,02

ACGUA 22,28 111386,09 1553,02 GCGUA 19,38 96878,00 1569,02

ACGUC 19,79 98935,71 1528,99 GCGUC 16,29 81456,27 1544,99

ACGUG 14,32 71620,36 1569,02 GCGUG 19,96 99818,43 1585,02

ACGUU 15,31 76532,09 1529,98 GCGUU 15,66 78310,35 1545,98

ACUAA 13,67 68352,04 1537,02 GCUAA 11,54 57705,65 1553,02

ACUAC 14,12 70589,02 1512,99 GCUAC 16,22 81098,05 1528,99

ACUAG 8,99 44969,21 1553,02 GCUAG 14,16 70809,23 1569,02

ACUAU 7,00 34994,43 1513,98 GCUAU 11,28 56379,58 1529,98

ACUCA 9,07 45353,62 1512,99 GCUCA 14,90 74481,10 1528,99

ACUCC 21,01 105041,15 1488,96 GCUCC 14,62 73122,48 1504,96

ACUCG 16,95 84755,12 1528,99 GCUCG 19,89 99432,09 1544,99

ACUCU 18,75 93725,78 1489,95 GCUCU 13,14 65720,11 1505,95

ACUGA 14,90 74497,62 1553,02 GCUGA 19,06 95277,92 1569,02

ACUGC 26,28 131410,13 1528,99 GCUGC 25,20 126004,00 1544,99

ACUGG 16,15 80746,79 1569,02 GCUGG 18,90 94495,61 1585,02

ACUGU 17,20 86002,79 1529,98 GCUGU 14,95 74751,89 1545,98

ACUUA 12,78 63888,82 1513,98 GCUUA 11,28 56383,56 1529,98

ACUUC 16,95 84773,45 1489,95 GCUUC 13,08 65384,79 1505,95

ACUUG 14,20 71006,85 1529,98 GCUUG 14,39 71926,19 1545,98

ACUUU 15,53 77636,24 1490,94 GCUUU 10,60 52977,35 1506,94

AGAAA 15,72 78601,22 1600,09 GGAAA 15,56 77777,27 1616,09

AGAAC 16,46 82285,01 1576,06 GGAAC 13,11 65543,31 1592,06

AGAAG 21,26 106277,62 1616,09 GGAAG 12,25 61233,08 1632,09

AGAAU 13,71 68562,03 1577,05 GGAAU 13,70 68489,93 1593,05

AGACA 14,69 73451,76 1576,06 GGACA 11,67 58328,05 1592,06

AGACC 28,04 140183,22 1552,03 GGACC 25,30 126510,75 1568,03

AGACG 18,95 94762,51 1592,06 GGACG 22,23 111164,18 1608,06

AGACU 22,97 114852,81 1553,02 GGACU 24,47 122355,80 1569,02

AGAGA 7,76 38815,28 1616,09 GGAGA 13,03 65146,69 1632,09

AGAGC 18,42 92081,33 1592,06 GGAGC 16,67 83368,45 1608,06

AGAGG 9,75 48756,28 1632,09 GGAGG 11,45 57269,05 1648,09

AGAGU 11,35 56772,71 1593,05 GGAGU 10,56 52795,36 1609,05

AGAUA 13,14 65718,72 1577,05 GGAUA 15,28 76391,95 1593,05

AGAUC 18,11 90567,07 1553,02 GGAUC 14,28 71416,59 1569,02

AGAUG 19,00 94983,29 1593,05 GGAUG 13,26 66308,23 1609,05

AGAUU 17,45 87249,64 1554,01 GGAUU 10,76 53814,08 1570,01

AGCAA 8,89 44455,93 1576,06 GGCAA 20,64 103223,76 1592,06

AGCAC 25,54 127695,08 1552,03 GGCAC 21,55 107773,44 1568,03

AGCAG 11,49 57448,80 1592,06 GGCAG 30,79 153959,87 1608,06

AGCAU 11,48 57406,03 1553,02 GGCAU 9,15 45745,96 1569,02

AGCCA 11,14 55695,94 1552,03 GGCCA 12,06 60294,61 1568,03

AGCCC 23,96 119779,34 1528,00 GGCCC 13,61 68062,00 1544,00

AGCCG 13,75 68765,61 1568,03 GGCCG 15,58 77906,04 1584,03

AGCCU 30,12 150593,02 1528,99 GGCCU 12,00 60001,91 1544,99

AGCGA 7,64 38218,85 1592,06 GGCGA 15,10 75507,56 1608,06

AGCGC 16,82 84079,26 1568,03 GGCGC 14,01 70039,64 1584,03

AGCGG 32,45 162246,98 1608,06 GGCGG 12,61 63061,87 1624,06

AGCGU 16,99 84971,07 1569,02 GGCGU 13,36 66823,69 1585,02

AGCUA 15,76 78783,22 1553,02 GGCUA 20,03 100128,41 1569,02

AGCUC 25,70 128502,04 1528,99 GGCUC 15,86 79301,40 1544,99

AGCUG 17,60 87985,61 1569,02 GGCUG 17,29 86453,26 1585,02

AGCUU 17,14 85688,66 1529,98 GGCUU 12,94 64692,92 1545,98

AGGAA 8,99 44926,77 1616,09 GGGAA 19,85 99262,62 1632,09

AGGAC 14,70 73508,28 1592,06 GGGAC 15,83 79154,08 1608,06

AGGAG 9,51 47528,97 1632,09 GGGAG 17,35 86754,20 1648,09

AGGAU 20,06 100300,98 1593,05 GGGAU 15,03 75149,44 1609,05

AGGCA 14,86 74295,07 1592,06 GGGCA 34,59 172949,09 1608,06

AGGCC 17,82 89094,49 1568,03 GGGCC 16,60 82980,23 1584,03

AGGCG 25,17 125845,54 1608,06 GGGCG 15,85 79230,09 1624,06

AGGCU 19,30 96495,72 1569,02 GGGCU 14,62 73123,97 1585,02

AGGGA 7,43 37150,79 1632,09 GGGGA 18,08 90418,59 1648,09

AGGGC 16,12 80616,94 1608,06 GGGGC 19,75 98759,69 1624,06

AGGGG 9,64 48178,86 1648,09 GGGGG 16,20 81014,91 1664,09

AGGGU 14,53 72658,17 1609,05 GGGGU 21,88 109375,17 1625,05

AGGUA 13,81 69069,05 1593,05 GGGUA 20,94 104707,88 1609,05

AGGUC 10,65 53247,51 1569,02 GGGUC 20,34 101708,08 1585,02

AGGUG 9,81 49031,58 1609,05 GGGUG 15,25 76228,15 1625,05

AGGUU 16,38 81918,26 1570,01 GGGUU 14,31 71556,39 1586,01

AGUAA 9,60 48010,46 1577,05 GGUAA 15,54 77716,98 1593,05

AGUAC 12,61 63033,82 1553,02 GGUAC 14,86 74286,47 1569,02

AGUAG 14,74 73711,35 1593,05 GGUAG 10,29 51434,35 1609,05

AGUAU 19,62 98113,11 1554,01 GGUAU 8,18 40901,62 1570,01

AGUCA 10,76 53802,23 1553,02 GGUCA 14,04 70222,32 1569,02

AGUCC 27,97 139861,50 1528,99 GGUCC 16,74 83675,24 1544,99

AGUCG 14,90 74508,06 1569,02 GGUCG 12,78 63908,23 1585,02

AGUCU 12,72 63590,62 1529,98 GGUCU 11,37 56834,50 1545,98

AGUGA 11,41 57063,58 1593,05 GGUGA 16,28 81380,00 1609,05

AGUGC 14,65 73262,44 1569,02 GGUGC 12,36 61788,65 1585,02

AGUGG 11,69 58444,57 1609,05 GGUGG 8,74 43683,02 1625,05

AGUGU 15,84 79214,62 1570,01 GGUGU 9,04 45179,18 1586,01

AGUUA 12,47 62374,06 1554,01 GGUUA 9,64 48209,34 1570,01

AGUUC 20,88 104387,86 1529,98 GGUUC 8,93 44659,34 1545,98

AGUUG 15,36 76821,87 1570,01 GGUUG 7,08 35387,14 1586,01

AGUUU 34,65 173247,30 1530,97 GGUUU 7,48 37388,53 1546,97

AUAAA 21,90 109475,38 1561,05 GUAAA 15,75 78748,56 1577,05

AUAAC 35,57 177829,69 1537,02 GUAAC 14,38 71882,21 1553,02

AUAAG 11,70 58515,36 1577,05 GUAAG 14,53 72638,68 1593,05

AUAAU 17,93 89642,12 1538,01 GUAAU 14,57 72828,40 1554,01

AUACA 17,99 89972,76 1537,02 GUACA 16,31 81545,47 1553,02

AUACC 30,11 150529,26 1512,99 GUACC 18,06 90294,91 1528,99

AUACG 13,77 68847,96 1553,02 GUACG 11,29 56465,83 1569,02

AUACU 14,89 74434,91 1513,98 GUACU 11,76 58787,45 1529,98

AUAGA 18,00 90008,89 1577,05 GUAGA 12,33 61653,26 1593,05

AUAGC 14,39 71927,33 1553,02 GUAGC 11,37 56870,11 1569,02

AUAGG 34,19 170944,16 1593,05 GUAGG 8,96 44817,95 1609,05

AUAGU 14,56 72819,04 1554,01 GUAGU 7,50 37508,82 1570,01

AUAUA 15,53 77671,47 1538,01 GUAUA 15,15 75756,09 1554,01

AUAUC 36,16 180807,04 1513,98 GUAUC 12,33 61655,55 1529,98

AUAUG 22,59 112964,72 1554,01 GUAUG 14,65 73257,00 1570,01

AUAUU 22,11 110552,31 1514,97 GUAUU 10,82 54114,36 1530,97

AUCAA 13,54 67724,78 1537,02 GUCAA 14,45 72226,86 1553,02

AUCAC 18,95 94725,76 1512,99 GUCAC 19,56 97782,92 1528,99

AUCAG 22,46 112294,18 1553,02 GUCAG 12,35 61772,12 1569,02

AUCAU 14,87 74333,44 1513,98 GUCAU 12,33 61630,22 1529,98

AUCCA 26,88 134394,99 1512,99 GUCCA 14,81 74025,96 1528,99

AUCCC 28,61 143067,29 1488,96 GUCCC 14,74 73718,65 1504,96

AUCCG 28,79 143950,44 1528,99 GUCCG 12,00 60005,62 1544,99

AUCCU 17,98 89881,34 1489,95 GUCCU 9,91 49568,93 1505,95

AUCGA 24,31 121554,43 1553,02 GUCGA 13,79 68930,16 1569,02

AUCGC 27,48 137382,98 1528,99 GUCGC 12,00 59991,10 1544,99

AUCGG 13,75 68762,78 1569,02 GUCGG 12,64 63189,70 1585,02

AUCGU 18,81 94056,98 1529,98 GUCGU 16,25 81228,63 1545,98

AUCUA 15,81 79034,06 1513,98 GUCUA 12,56 62789,56 1529,98

AUCUC 17,54 87700,37 1489,95 GUCUC 12,46 62301,47 1505,95

AUCUG 17,46 87300,23 1529,98 GUCUG 10,11 50547,96 1545,98

AUCUU 13,84 69199,07 1490,94 GUCUU 10,12 50612,46 1506,94

AUGAA 6,23 31129,21 1577,05 GUGAA 17,30 86516,94 1593,05

AUGAC 24,29 121450,07 1553,02 GUGAC 10,62 53111,43 1569,02

AUGAG 22,39 111932,00 1593,05 GUGAG 14,86 74295,61 1609,05

AUGAU 12,72 63590,62 1554,01 GUGAU 8,85 44236,96 1570,01

AUGCA 28,02 140081,48 1553,02 GUGCA 16,88 84400,79 1569,02

AUGCC 37,94 189701,86 1528,99 GUGCC 13,29 66439,78 1544,99

AUGCG 26,77 133837,59 1569,02 GUGCG 11,27 56328,13 1585,02

AUGCU 18,85 94273,15 1529,98 GUGCU 12,03 60174,84 1545,98

AUGGA 20,89 104466,48 1593,05 GUGGA 10,77 53866,26 1609,05

AUGGC 20,51 102528,93 1569,02 GUGGC 16,68 83417,32 1585,02

AUGGG 109,76 548803,92 1609,05 GUGGG 15,15 75744,59 1625,05

AUGGU 25,66 128294,45 1570,01 GUGGU 12,42 62092,93 1586,01

AUGUA 19,11 95555,26 1554,01 GUGUA 13,94 69718,07 1570,01

AUGUC 26,70 133492,14 1529,98 GUGUC 7,86 39282,41 1545,98

AUGUG 57,66 288295,47 1570,01 GUGUG 9,20 46021,53 1586,01

AUGUU 26,32 131578,96 1530,97 GUGUU 8,05 40225,32 1546,97

AUUAA 19,86 99305,06 1538,01 GUUAA 14,27 71350,81 1554,01

AUUAC 29,05 145245,24 1513,98 GUUAC 12,77 63850,10 1529,98

AUUAG 15,70 78482,14 1554,01 GUUAG 8,40 41983,94 1570,01

AUUAU 10,67 53328,40 1514,97 GUUAU 12,52 62603,83 1530,97

AUUCA 13,56 67790,90 1513,98 GUUCA 9,26 46279,06 1529,98

AUUCC 14,76 73776,03 1489,95 GUUCC 13,07 65337,49 1505,95

AUUCG 17,92 89586,86 1529,98 GUUCG 7,93 39641,45 1545,98

AUUCU 25,33 126647,54 1490,94 GUUCU 7,23 36128,94 1506,94

AUUGA 38,39 191958,88 1554,01 GUUGA 8,24 41202,64 1570,01

AUUGC 18,62 93120,02 1529,98 GUUGC 7,76 38823,47 1545,98

AUUGG 13,77 68831,91 1570,01 GUUGG 7,45 37266,88 1586,01

AUUGU 10,84 54199,94 1530,97 GUUGU 7,13 35667,54 1546,97

AUUUA 16,07 80366,12 1514,97 GUUUA 11,65 58260,71 1530,97

AUUUC 55,78 27749,79 1490,94 GUUUC 7,52 37594,20 1506,94

AUUUG 21,70 108492,63 1530,97 GUUUG 7,04 35181,52 1546,97

AUUUU 13,93 69659,45 1491,93 GUUUU 10,02 50080,50 1507,93

CAAAA 11,48 57414,51 1560,06 UAAAA 21,90 109522,89 1561,05

CAAAC 20,82 104102,22 1536,03 UAAAC 15,07 75339,30 1537,02

CAAAG 18,98 94887,95 1576,06 UAAAG 13,10 65479,83 1577,05

CAAAU 11,58 57918,55 1537,02 UAAAU 11,12 55578,11 1538,01

CAACA 23,76 118786,30 1536,03 UAACA 8,34 41720,60 1537,02

CAACC 21,81 109055,04 1512,00 UAACC 9,06 45291,61 1512,99

CAACG 16,65 83228,94 1552,03 UAACG 12,29 61426,34 1553,02

CAACU 14,58 72894,44 1512,99 UAACU 12,31 61569,98 1513,98

CAAGA 13,60 67979,56 1576,06 UAAGA 7,14 35720,20 1577,05

CAAGC 29,61 148029,76 1552,03 UAAGC 12,69 63459,49 1553,02

CAAGG 16,42 82097,45 1592,06 UAAGG 13,35 66740,14 1593,05

CAAGU 18,63 93146,55 1553,02 UAAGU 21,82 109115,45 1554,01

CAAUA 11,34 56682,74 1537,02 UAAUA 28,57 142855,22 1538,01

CAAUC 15,63 78170,17 1512,99 UAAUC 12,53 62673,34 1513,98

CAAUG 18,08 90392,93 1553,02 UAAUG 12,51 62574,43 1554,01

CAAUU 12,33 61667,87 1513,98 UAAUU 26,91 134572,94 1514,97

CACAA 21,84 109220,75 1536,03 UACAA 11,88 59382,73 1537,02

CACAC 16,12 80582,68 1512,00 UACAC 16,66 83312,11 1512,99

CACAG 17,13 85632,11 1552,03 UACAG 14,07 70372,14 1553,02

CACAU 12,15 60772,64 1512,99 UACAU 16,85 84242,98 1513,98

CACCA 20,24 101195,90 1512,00 UACCA 9,28 46401,47 1512,99

CACCC 23,67 118332,31 1487,97 UACCC 22,71 113553,43 1488,96

CACCG 14,91 74544,36 1528,00 UACCG 18,60 93008,87 1528,99

CACCU 14,60 73006,08 1488,96 UACCU 26,63 133125,05 1489,95

CACGA 13,30 66519,98 1552,03 UACGA 22,24 111185,26 1553,02

CACGC 25,30 126514,91 1528,00 UACGC 15,98 79903,04 1528,99

CACGG 16,42 82089,72 1568,03 UACGG 16,24 81200,52 1569,02

CACGU 17,53 87635,37 1528,99 UACGU 10,60 52977,94 1529,98

CACUA 24,43 122128,20 1512,99 UACUA 19,15 95744,86 1513,98

CACUC 26,04 130180,16 1488,96 UACUC 16,00 79989,46 1489,95

CACUG 15,71 78564,46 1528,99 UACUG 29,05 145241,37 1529,98

CACUU 13,00 65005,77 1489,95 UACUU 15,27 76343,28 1490,94

CAGAA 10,55 52767,09 1576,06 UAGAA 10,91 54554,34 1577,05

CAGAC 11,43 57163,27 1552,03 UAGAC 12,07 60364,36 1553,02

CAGAG 12,17 60869,30 1592,06 UAGAG 14,11 70546,21 1593,05

CAGAU 7,58 37884,17 1553,02 UAGAU 18,73 93631,42 1554,01

CAGCA 8,54 42684,46 1552,03 UAGCA 17,13 85669,08 1553,02

CAGCC 10,08 50395,41 1528,00 UAGCC 13,67 68331,13 1528,99

CAGCG 10,14 50717,59 1568,03 UAGCG 17,26 86322,05 1569,02

CAGCU 14,62 73097,11 1528,99 UAGCU 17,88 89375,91 1529,98

CAGGA 10,89 54459,48 1592,06 UAGGA 22,71 113565,90 1593,05

CAGGC 27,39 136943,12 1568,03 UAGGC 20,78 103904,69 1569,02

CAGGG 12,28 61404,53 1608,06 UAGGG 30,94 154706,39 1609,05

CAGGU 10,52 52599,39 1569,02 UAGGU 28,03 140147,60 1570,01

CAGUA 10,63 53144,99 1553,02 UAGUA 24,55 122735,53 1554,01

CAGUC 20,50 102483,21 1528,99 UAGUC 24,13 120631,55 1529,98

CAGUG 13,37 66829,84 1569,02 UAGUG 5,83 29129,82 1570,01

CAGUU 24,07 120340,57 1529,98 UAGUU 15,63 78145,85 1530,97

CAUAA 16,82 84101,32 1537,02 UAUAA 18,75 93750,85 1538,01

CAUAC 13,03 65143,60 1512,99 UAUAC 17,02 85117,40 1513,98

CAUAG 25,23 126152,37 1553,02 UAUAG 18,14 90675,28 1554,01

CAUAU 15,26 76280,32 1513,98 UAUAU 11,01 55048,53 1514,97

CAUCA 23,29 116430,43 1512,99 UAUCA 12,49 62469,53 1513,98

CAUCC 21,05 105225,94 1488,96 UAUCC 16,40 81976,37 1489,95

CAUCG 8,87 44341,15 1528,99 UAUCG 26,54 132704,96 1529,98

CAUCU 14,10 70521,90 1489,95 UAUCU 17,32 86624,61 1490,94

CAUGA 22,28 111409,74 1553,02 UAUGA 20,46 102276,78 1554,01

CAUGC 23,82 119076,70 1528,99 UAUGC 18,45 92235,64 1529,98

CAUGG 24,97 124840,37 1569,02 UAUGG 15,16 75796,41 1570,01

CAUGU 12,77 63836,51 1529,98 UAUGU 12,43 62170,51 1530,97

CAUUA 20,87 104358,05 1513,98 UAUUA 9,50 47489,19 1514,97

CAUUC 13,42 67115,04 1489,95 UAUUC 9,60 48009,66 1490,94

CAUUG 10,65 53238,39 1529,98 UAUUG 12,79 63964,63 1530,97

CAUUU 34,06 170313,11 1490,94 UAUUU 13,99 69939,10 1491,93

CCAAA 19,09 95446,19 1536,03 UCAAA 15,60 78003,18 1537,02

CCAAC 26,12 130617,44 1512,00 UCAAC 20,87 104361,15 1512,99

CCAAG 18,81 94033,87 1552,03 UCAAG 21,88 109415,47 1553,02

CCAAU 10,25 51231,28 1512,99 UCAAU 11,36 56776,61 1513,98

CCACA 27,27 136338,09 1512,00 UCACA 19,72 98608,91 1512,99

CCACC 24,92 124610,92 1487,97 UCACC 13,72 68575,62 1488,96

CCACG 12,89 64433,01 1528,00 UCACG 16,18 80911,29 1528,99

CCACU 15,13 75669,22 1488,96 UCACU 20,13 100653,25 1489,95

CCAGA 22,00 110017,82 1552,03 UCAGA 13,62 68123,37 1553,02

CCAGC 19,21 96029,57 1528,00 UCAGC 9,91 49538,28 1528,99

CCAGG 13,30 66476,57 1568,03 UCAGG 15,83 79148,92 1569,02

CCAGU 11,20 56002,48 1528,99 UCAGU 12,72 63616,02 1529,98

CCAUA 16,30 81479,20 1512,99 UCAUA 11,22 56116,25 1513,98

CCAUC 16,42 82119,40 1488,96 UCAUC 13,62 68089,59 1489,95

CCAUG 15,89 79444,70 1528,99 UCAUG 12,41 62059,84 1529,98

CCAUU 14,47 72373,11 1489,95 UCAUU 16,43 82168,40 1490,94

CCCAA 12,03 60171,86 1512,00 UCCAA 18,68 93419,74 1512,99

CCCAC 12,38 61923,12 1487,97 UCCAC 19,87 99355,04 1488,96

CCCAG 15,50 77484,65 1528,00 UCCAG 16,90 84483,96 1528,99

CCCAU 16,90 84492,48 1488,96 UCCAU 17,53 87629,73 1489,95

CCCCA 7,17 35868,74 1487,97 UCCCA 11,51 57541,19 1488,96

CCCCC 9,67 48327,13 1463,94 UCCCC 12,05 60241,17 1464,93

CCCCG 9,78 48922,85 1503,97 UCCCG 12,49 62437,69 1504,96

CCCCU 16,60 83008,76 1464,93 UCCCU 13,99 69946,56 1465,92

CCCGA 16,62 83107,15 1528,00 UCCGA 9,68 48401,44 1528,99

CCCGC 13,25 66244,82 1503,97 UCCGC 16,77 83858,44 1504,96

CCCGG 20,83 104140,86 1544,00 UCCGG 22,39 111953,85 1544,99

CCCGU 14,63 73144,51 1504,96 UCCGU 15,33 76628,98 1505,95

CCCUA 13,28 66393,46 1488,96 UCCUA 15,60 78019,16 1489,95

CCCUC 16,33 81663,42 1464,93 UCCUC 32,38 161898,72 1465,92

CCCUG 19,92 99596,31 1504,96 UCCUG 17,71 88563,88 1505,95

CCCUU 9,13 45666,44 1465,92 UCCUU 14,19 70925,59 1466,91

CCGAA 10,89 54435,92 1552,03 UCGAA 13,35 66766,68 1553,02

CCGAC 7,97 39873,99 1528,00 UCGAC 25,83 129160,92 1528,99

CCGAG 8,27 41369,14 1568,03 UCGAG 16,84 84207,13 1569,02

CCGAU 10,73 53655,04 1528,99 UCGAU 32,87 164352,98 1529,98

CCGCA 28,63 143140,59 1528,00 UCGCA 26,05 130273,60 1528,99

CCGCC 22,22 111118,64 1503,97 UCGCC 25,95 129729,85 1504,96

CCGCG 25,94 129688,54 1544,00 UCGCG 31,45 157274,19 1544,99

CCGCU 9,59 47952,40 1504,96 UCGCU 19,20 96008,49 1505,95

CCGGA 20,82 104088,74 1568,03 UCGGA 14,57 72874,10 1569,02

CCGGC 19,29 96461,34 1544,00 UCGGC 39,19 195935,31 1544,99

CCGGG 50,11 250526,85 1584,03 UCGGG 32,59 162935,40 1585,02

CCGGU 15,82 79113,72 1544,99 UCGGU 11,90 59522,60 1545,98

CCGUA 11,40 56989,61 1528,99 UCGUA 14,94 74707,84 1529,98

CCGUC 13,85 69227,98 1504,96 UCGUC 49,88 249400,82 1505,95

CCGUG 10,13 50626,29 1544,99 UCGUG 18,80 94017,04 1545,98

CCGUU 20,63 103168,75 1505,95 UCGUU 33,49 167446,83 1506,94

CCUAA 9,25 46260,97 1512,99 UCUAA 8,46 42284,05 1513,98

CCUAC 34,97 174873,91 1488,96 UCUAC 17,33 86653,28 1489,95

CCUAG 24,12 120604,95 1528,99 UCUAG 7,17 35826,39 1529,98

CCUAU 13,57 67842,68 1489,95 UCUAU 10,00 50010,38 1490,94

CCUCA 27,84 139201,98 1488,96 UCUCA 25,48 127401,59 1489,95

CCUCC 11,21 56034,05 1464,93 UCUCC 34,19 170928,65 1465,92

CCUCG 30,66 153312,23 1504,96 UCUCG 32,11 160542,41 1505,95

CCUCU 31,14 155678,16 1465,92 UCUCU 31,27 156344,86 1466,91

CCUGA 22,87 114358,81 1528,99 UCUGA 14,72 73620,10 1529,98

CCUGC 19,55 97751,53 1504,96 UCUGC 11,77 58845,62 1505,95

CCUGG 21,81 109073,59 1544,99 UCUGG 11,97 59830,74 1545,98

CCUGU 20,66 103286,77 1505,95 UCUGU 8,46 42304,80 1506,94

CCUUA 14,66 73315,65 1489,95 UCUUA 18,15 90727,65 1490,94

CCUUC 15,71 78565,61 1465,92 UCUUC 11,16 55815,58 1466,91

CCUUG 17,82 89109,73 1505,95 UCUUG 11,59 57959,31 1506,94

CCUUU 13,96 69777,80 1466,91 UCUUU 11,84 59201,33 1467,90

CGAAA 12,69 63472,56 1576,06 UGAAA 10,30 51524,16 1577,05

CGAAC 9,35 46762,78 1552,03 UGAAC 7,05 35250,32 1553,02

CGAAG 17,31 86547,11 1592,06 UGAAG 20,35 101757,42 1593,05

CGAAU 13,52 67604,09 1553,02 UGAAU 26,83 134170,76 1554,01

CGACA 17,46 87313,53 1552,03 UGACA 20,16 100787,69 1553,02

CGACC 13,97 69862,86 1528,00 UGACC 20,95 104758,17 1528,99

CGACG 11,56 57790,85 1568,03 UGACG 35,01 175065,16 1569,02

CGACU 12,67 63346,79 1528,99 UGACU 16,53 82630,83 1529,98

CGAGA 6,19 30925,87 1592,06 UGAGA 20,75 103729,99 1593,05

CGAGC 8,55 42774,20 1568,03 UGAGC 27,92 139607,67 1569,02

CGAGG 7,08 35406,72 1608,06 UGAGG 40,64 203187,71 1609,05

CGAGU 9,65 48266,20 1569,02 UGAGU 20,99 104925,18 1570,01

CGAUA 6,81 34073,31 1553,02 UGAUA 24,28 121377,27 1554,01

CGAUC 9,68 48411,36 1528,99 UGAUC 23,48 117402,37 1529,98

CGAUG 15,13 75666,83 1569,02 UGAUG 14,08 70420,56 1570,01

CGAUU 17,15 85770,90 1529,98 UGAUU 8,67 43331,87 1530,97

CGCAA 11,37 56863,80 1552,03 UGCAA 38,66 193292,33 1553,02

CGCAC 13,71 68560,96 1528,00 UGCAC 33,52 167615,87 1528,99

CGCAG 8,36 41784,33 1568,03 UGCAG 35,40 176981,63 1569,02

CGCAU 13,15 65744,96 1528,99 UGCAU 34,41 172035,31 1529,98

CGCCA 10,89 54468,30 1528,00 UGCCA 19,87 99334,38 1528,99

CGCCC 22,05 110255,16 1503,97 UGCCC 40,18 200882,57 1504,96

CGCCG 16,38 81919,01 1544,00 UGCCG 37,16 185788,85 1544,99

CGCCU 11,77 58874,97 1504,96 UGCCU 25,09 125456,69 1505,95

CGCGA 10,62 53084,68 1568,03 UGCGA 22,43 112153,20 1569,02

CGCGC 14,35 71758,46 1544,00 UGCGC 16,41 82072,23 1544,99

CGCGG 18,29 91440,33 1584,03 UGCGG 17,57 87843,97 1585,02

CGCGU 15,69 78469,09 1544,99 UGCGU 16,10 80500,92 1545,98

CGCUA 18,63 93136,77 1528,99 UGCUA 45,85 229244,20 1529,98

CGCUC 21,55 107737,30 1504,96 UGCUC 32,61 163039,26 1505,95

CGCUG 10,97 54826,06 1544,99 UGCUG 35,88 179412,73 1545,98

CGCUU 12,52 62617,31 1505,95 UGCUU 49,17 245847,98 1506,94

CGGAA 10,82 54110,06 1592,06 UGGAA 45,22 226118,04 1593,05

CGGAC 14,12 70584,51 1568,03 UGGAC 22,23 111154,41 1569,02

CGGAG 11,75 58742,00 1608,06 UGGAG 23,34 116689,76 1609,05

CGGAU 11,68 58405,04 1569,02 UGGAU 40,52 202581,50 1570,01

CGGCA 9,48 47397,49 1568,03 UGGCA 19,47 97332,27 1569,02

CGGCC 20,91 104565,14 1544,00 UGGCC 20,00 100022,54 1544,99

CGGCG 19,72 98590,99 1584,03 UGGCG 15,95 79737,03 1585,02

CGGCU 19,03 95149,13 1544,99 UGGCU 13,62 68104,02 1545,98

CGGGA 16,38 81906,90 1608,06 UGGGA 29,08 145386,88 1609,05

CGGGC 14,22 71110,85 1584,03 UGGGC 44,98 224884,84 1585,02

CGGGG 12,11 60539,69 1624,06 UGGGG 26,31 131540,89 1625,05

CGGGU 10,09 50466,68 1585,02 UGGGU 40,90 204504,26 1586,01

CGGUA 10,17 50853,85 1569,02 UGGUA 54,59 272927,53 1570,01

CGGUC 14,73 73650,66 1544,99 UGGUC 45,00 224997,80 1545,98

CGGUG 11,60 57999,87 1585,02 UGGUG 27,00 134981,40 1586,01

CGGUU 15,16 75819,53 1545,98 UGGUU 45,44 227206,24 1546,97

CGUAA 9,30 46524,46 1553,02 UGUAA 48,30 241521,21 1554,01

CGUAC 10,57 52871,77 1528,99 UGUAC 46,73 233628,16 1529,98

CGUAG 10,91 54532,47 1569,02 UGUAG 27,83 139148,34 1570,01

CGUAU 12,91 64529,98 1529,98 UGUAU 9,39 46973,02 1530,97

CGUCA 13,73 68633,35 1528,99 UGUCA 22,25 111254,20 1529,98

CGUCC 23,08 115378,19 1504,96 UGUCC 22,40 112014,42 1505,95

CGUCG 11,00 54995,02 1544,99 UGUCG 26,44 132184,62 1545,98

CGUCU 20,12 100595,61 1505,95 UGUCU 17,50 87506,53 1506,94

CGUGA 7,75 38757,12 1569,02 UGUGA 27,58 137922,54 1570,01

CGUGC 12,74 63690,30 1544,99 UGUGC 28,30 141519,82 1545,98

CGUGG 15,58 77875,53 1585,02 UGUGG 15,30 76492,34 1586,01

CGUGU 19,46 97277,83 1545,98 UGUGU 28,39 141930,09 1546,97

CGUUA 8,42 42085,55 1529,98 UGUUA 14,25 71257,24 1530,97

CGUUC 13,09 65445,88 1505,95 UGUUC 54,65 273256,90 1506,94

CGUUG 23,25 116246,05 1545,98 UGUUG 26,94 134685,61 1546,97

CGUUU 17,50 87496,01 1506,94 UGUUU 39,93 199662,90 1507,93

CUAAA 10,32 51600,76 1537,02 UUAAA 14,08 70376,87 1538,01

CUAAC 17,06 85298,38 1512,99 UUAAC 29,56 147790,42 1513,98

CUAAG 8,88 44398,22 1553,02 UUAAG 15,28 76382,13 1554,01

CUAAU 12,02 60085,04 1513,98 UUAAU 21,49 107468,50 1514,97

CUACA 11,22 56109,70 1512,99 UUACA 22,28 111409,94 1513,98

CUACC 14,27 71329,26 1488,96 UUACC 20,51 102530,60 1489,95

CUACG 13,07 65345,45 1528,99 UUACG 21,88 109412,71 1529,98

CUACU 14,53 72641,14 1489,95 UUACU 21,02 105116,11 1490,94

CUAGA 10,56 52786,09 1553,02 UUAGA 24,46 122290,35 1554,01

CUAGC 9,63 48135,18 1528,99 UUAGC 20,39 101971,69 1529,98

CUAGG 14,52 72623,64 1569,02 UUAGG 14,11 70538,71 1570,01

CUAGU 15,20 76012,14 1529,98 UUAGU 16,21 81042,81 1530,97

CUAUA 20,08 100377,03 1513,98 UUAUA 12,97 64847,62 1514,97

CUAUC 23,46 117277,32 1489,95 UUAUC 14,81 74068,04 1490,94

CUAUG 11,13 55631,04 1529,98 UUAUG 15,63 78160,47 1530,97

CUAUU 11,74 58677,29 1490,94 UUAUU 13,52 67605,22 1491,93

CUCAA 6,76 33778,31 1512,99 UUCAA 18,61 93057,77 1513,98

CUCAC 16,76 83789,91 1488,96 UUCAC 14,71 73532,10 1489,95

CUCAG 17,00 85012,21 1528,99 UUCAG 16,06 80312,59 1529,98

CUCAU 15,69 78445,41 1489,95 UUCAU 12,89 64428,56 1490,94

CUCCA 17,18 85911,72 1488,96 UUCCA 14,23 71130,75 1489,95

CUCCC 16,18 80916,42 1464,93 UUCCC 13,64 68217,90 1465,92

CUCCG 16,26 81317,29 1504,96 UUCCG 19,45 97267,38 1505,95

CUCCU 32,26 161304,34 1465,92 UUCCU 14,57 72865,75 1466,91

CUCGA 11,47 57359,02 1528,99 UUCGA 12,49 62450,24 1529,98

CUCGC 16,05 80259,58 1504,96 UUCGC 14,02 70104,09 1505,95

CUCGG 13,72 68615,88 1544,99 UUCGG 15,28 76393,75 1545,98

CUCGU 10,04 50187,96 1505,95 UUCGU 16,17 80827,22 1506,94

CUCUA 17,15 85740,40 1489,95 UUCUA 17,02 85092,64 1490,94

CUCUC 17,66 88320,18 1465,92 UUCUC 16,31 81557,94 1466,91

CUCUG 15,43 77138,30 1505,95 UUCUG 15,72 78591,39 1506,94

CUCUU 17,96 89792,97 1466,91 UUCUU 13,83 69162,10 1467,90

CUGAA 18,22 91105,18 1553,02 UUGAA 15,45 77273,54 1554,01

CUGAC 15,43 77173,82 1528,99 UUGAC 13,72 68604,88 1529,98

CUGAG 12,49 62473,06 1569,02 UUGAG 14,13 70644,77 1570,01

CUGAU 13,26 66318,47 1529,98 UUGAU 13,52 67613,44 1530,97

CUGCA 21,03 105127,35 1528,99 UUGCA 12,18 60880,28 1529,98

CUGCC 10,58 52909,54 1504,96 UUGCC 12,90 64479,53 1505,95

CUGCG 20,10 100484,63 1544,99 UUGCG 14,19 70933,55 1545,98

CUGCU 20,31 101565,15 1505,95 UUGCU 15,26 76292,96 1506,94

CUGGA 18,81 94033,12 1569,02 UUGGA 16,42 82086,03 1570,01

CUGGC 11,55 57742,48 1544,99 UUGGC 13,21 66043,85 1545,98

CUGGG 17,04 85202,85 1585,02 UUGGG 11,13 55665,33 1586,01

CUGGU 14,21 71032,63 1545,98 UUGGU 14,12 70588,01 1546,97

CUGUA 16,46 82280,93 1529,98 UUGUA 13,69 68432,03 1530,97

CUGUC 19,33 96665,32 1505,95 UUGUC 16,06 80281,08 1506,94

CUGUG 19,76 98812,66 1545,98 UUGUG 13,75 68756,97 1546,97

CUGUU 20,49 102438,13 1506,94 UUGUU 13,15 65773,37 1507,93

CUUAA 18,82 94101,86 1513,98 UUUAA 15,37 76847,10 1514,97

CUUAC 17,18 85895,80 1489,95 UUUAC 14,70 73514,32 1490,94

CUUAG 18,61 93045,65 1529,98 UUUAG 16,13 80643,82 1530,97

CUUAU 17,35 86772,06 1490,94 UUUAU 14,87 74331,44 1491,93

CUUCA 15,13 75652,50 1489,95 UUUCA 15,89 79457,61 1490,94

CUUCC 18,25 91273,51 1465,92 UUUCC 13,86 69320,21 1466,91

CUUCG 10,81 54062,90 1505,95 UUUCG 16,83 84129,08 1506,94

CUUCU 13,69 68452,29 1466,91 UUUCU 12,55 62762,72 1467,90

CUUGA 14,27 71367,57 1529,98 UUUGA 12,26 61286,39 1530,97

CUUGC 20,93 104668,38 1505,95 UUUGC 16,21 81028,05 1506,94

CUUGG 15,01 75068,15 1545,98 UUUGG 11,14 55712,43 1546,97

CUUGU 18,65 93259,42 1506,94 UUUGU 14,17 70859,13 1507,93

CUUUA 16,80 83976,03 1490,94 UUUUA 28,23 141138,71 1491,93

CUUUC 17,98 89886,81 1466,91 UUUUC 35,59 177947,92 1467,90

CUUUG 20,62 103091,87 1506,94 UUUUG 18,42 92093,47 1507,93

CUUUU 16,49 82464,36 1467,90 UUUUU 14,72 73614,91 1468,89

Supplementary Table 4. List of RNA sequences for the pentamer library with synthesis yields and calculated mass.
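
The calculated masses in the table above can be reproduced from the base composition alone. The following is a minimal sketch in Python, not the procedure used in this work: it assumes standard average residue masses for the four ribonucleotides and a correction of -61,96 g/mol for an oligonucleotide with free 5'- and 3'-hydroxyl ends. The function names `calc_mass` and `molarity_nM` are hypothetical. The molarity helper further assumes that the content column is an amount in nmol dissolved in 200 µL; this volume is not stated in the table and is only inferred from the constant ratio between the two columns.

```python
from collections import Counter

# Standard average masses (g/mol) of ribonucleotide residues (assumed values,
# not taken from the thesis text).
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}

# Assumed correction for an oligonucleotide with free 5'-OH and 3'-OH ends.
END_CORRECTION = -61.96


def calc_mass(seq: str) -> float:
    """Return the average molecular mass (g/mol) of an RNA sequence."""
    counts = Counter(seq.upper())
    unknown = set(counts) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unexpected characters in sequence: {sorted(unknown)}")
    total = sum(RESIDUE_MASS[base] * n for base, n in counts.items())
    return round(total + END_CORRECTION, 2)


def molarity_nM(content_nmol: float, volume_uL: float = 200.0) -> float:
    """Convert an amount in nmol to a concentration in nM.

    The default volume of 200 µL is an assumption inferred from the table.
    1 nmol/µL corresponds to 1e6 nM.
    """
    return content_nmol / volume_uL * 1e6


if __name__ == "__main__":
    print(calc_mass("AAAAA"))    # 1584.09, as tabulated for AAAAA
    print(calc_mass("AAAAUGU"))  # 2212.43, as tabulated for AAAAUGU
    print(molarity_nM(13.40))    # close to the tabulated 67000,57 nM
```

Small deviations between the recomputed and tabulated molarities are expected, because the content values are printed with only two decimals.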

Sequence content [nmol] molarity [nM] mass calc. [g/mol] Sequence content [nmol] molarity [nM] mass calc. [g/mol]

AAAAUGU 15,64 78195,49 2212,43 GGAAUGU 17,28 86393,66 2244,43

AAACUGU 38,10 190514,91 2188,40 GGACUGU 18,77 93830,70 2220,40

AAAGUGU 16,20 81019,11 2228,43 GGAGUGU 12,96 64784,95 2260,43

AAAUUGU 25,60 128000,00 2189,39 GGAUUGU 15,59 77929,16 2221,39

AACAUGU 16,65 83244,68 2188,40 GGCAUGU 11,25 56241,03 2220,40

AACCUGU 20,29 101453,49 2164,37 GGCCUGU 24,77 123854,66 2196,37

AACGUGU 11,22 56087,55 2204,40 GGCGUGU 17,93 89644,97 2236,40

AACUUGU 23,72 118601,40 2165,36 GGCUUGU 88,67 443333,33 2197,36

AAGAUGU 15,78 78887,48 2228,43 GGGAUGU 10,19 50948,51 2260,43

AAGCUGU 18,19 90934,45 2204,40 GGGCUGU 10,00 50000,00 2236,40

AAGGUGU 13,47 67362,92 2244,43 GGGGUGU 73,55 367741,94 2276,43

AAGUUGU 11,55 57742,78 2205,39 GGGUUGU 16,25 81241,18 2237,39

AAUAUGU 19,47 97345,13 2189,39 GGUAUGU 61,18 305913,98 2221,39

AAUCUGU 23,97 119833,56 2165,36 GGUCUGU 22,49 112462,91 2197,36

AAUGUGU 24,44 122222,22 2205,39 GGUGUGU 10,32 51622,00 2237,39

AAUUUGU 23,67 118351,06 2166,35 GGUUUGU 7,38 36879,43 2198,35

ACAAUGU 84,41 422074,47 2188,40 GUAAUGU 16,30 81520,31 2205,39

ACACUGU 22,60 113005,78 2164,37 GUACUGU 11,32 56614,51 2181,36

ACAGUGU 88,71 443572,40 2204,40 GUAGUGU 14,93 74666,67 2221,39

ACAUUGU 23,05 115226,34 2165,36 GUAUUGU 76,00 380000,00 2182,35

ACCAUGU 28,95 144729,34 2164,37 GUCAUGU 13,86 69306,93 2181,36

ACCCUGU 30,47 152351,10 2140,34 GUCCUGU 58,85 294245,72 2157,33

ACCGUGU 19,62 98091,04 2180,37 GUCGUGU 7,58 37900,87 2197,36

ACCUUGU 8,12 40601,50 2141,33 GUCUUGU 60,66 303283,58 2158,32

ACGAUGU 17,10 85481,68 2204,40 GUGAUGU 10,68 53405,99 2221,39

ACGCUGU 82,35 411764,71 2180,37 GUGCUGU 13,88 69393,94 2197,36

ACGGUGU 6,29 31460,67 2220,40 GUGGUGU 14,84 74189,00 2237,39

ACGUUGU 7,68 38418,08 2181,36 GUGUUGU 5,28 26382,98 2198,35

ACUAUGU 64,68 323392,61 2165,36 GUUAUGU 73,30 366486,49 2182,35

ACUCUGU 8,53 42662,63 2141,33 GUUCUGU 39,34 196716,42 2158,32

ACUGUGU 7,07 35344,83 2181,36 GUUGUGU 56,23 281134,75 2198,35

ACUUUGU 13,53 67630,06 2142,32 GUUUUGU 12,21 61055,63 2159,31

AGAAUGU 4,50 22503,16 2228,43 UAAAUGU 49,71 248571,43 2189,39

AGACUGU 75,57 377838,58 2204,40 UAACUGU 126,03 630140,85 2165,36

AGAGUGU 19,07 95372,75 2244,43 UAAGUGU 52,10 260501,98 2205,39

AGAUUGU 15,73 78645,83 2205,39 UAAUUGU 31,00 155020,08 2166,35

AGCAUGU 22,16 110807,11 2204,40 UACAUGU 21,27 106353,59 2165,36

AGCCUGU 24,35 121739,13 2180,37 UACCUGU 27,76 138787,88 2141,33

AGCGUGU 6,93 34647,89 2220,40 UACGUGU 21,79 108961,59 2181,36

AGCUUGU 21,21 106051,87 2181,36 UACUUGU 15,78 78893,74 2142,32

AGGAUGU 7,67 38341,97 2244,43 UAGAUGU 26,47 132372,21 2205,39

AGGCUGU 11,12 55587,39 2220,40 UAGCUGU 13,53 67634,25 2181,36

AGGGUGU 9,00 44979,92 2260,43 UAGGUGU 10,08 50406,50 2221,39

AGGUUGU 7,86 39300,13 2221,39 UAGUUGU 18,20 91008,17 2182,35

AGUAUGU 9,15 45758,35 2205,39 UAUAUGU 17,46 87287,02 2166,35

AGUCUGU 9,44 47175,14 2181,36 UAUCUGU 24,88 124386,72 2142,32

AGUGUGU 14,97 74831,76 2221,39 UAUGUGU 35,93 179670,33 2182,35

AGUUUGU 5,52 27604,87 2182,35 UAUUUGU 26,24 131215,47 2143,31

AUAAUGU 33,17 165865,99 2189,39 UCAAUGU 33,50 167507,00 2165,36

AUACUGU 29,49 147469,22 2165,36 UCACUGU 17,98 89908,26 2141,33

AUAGUGU 9,97 49871,47 2205,39 UCAGUGU 40,86 204279,60 2181,36

AUAUUGU 14,48 72395,83 2166,35 UCAUUGU 15,40 76989,87 2142,32

AUCAUGU 10,39 51972,79 2165,36 UCCAUGU 36,02 180120,48 2141,33

AUCCUGU 22,41 112071,54 2141,33 UCCCUGU 41,33 206666,67 2117,30

AUCGUGU 21,51 107563,03 2181,36 UCCGUGU 16,67 83359,25 2157,33

AUCUUGU 15,13 75644,70 2142,32 UCCUUGU 19,33 96650,72 2118,29

AUGAUGU 10,76 53805,77 2205,39 UCGAUGU 30,10 150500,72 2181,36

AUGCUGU 20,52 102616,28 2181,36 UCGCUGU 22,14 110720,00 2157,33

AUGGUGU 14,00 70013,57 2221,39 UCGGUGU 41,48 207418,40 2197,36

AUGUUGU 6,11 30559,35 2182,35 UCGUUGU 33,25 166268,66 2158,32

AUUAUGU 29,32 146614,58 2166,35 UCUAUGU 40,35 201731,60 2142,32

AUUCUGU 26,65 133237,82 2142,32 UCUCUGU 10,91 54574,64 2118,29

AUUGUGU 7,09 35470,67 2182,35 UCUGUGU 13,13 65653,50 2158,32

AUUUUGU 6,31 31550,07 2143,31 UCUUUGU 14,07 70336,39 2119,28

CAAAUGU 16,57 82833,79 2188,40 UGAAUGU 26,61 133063,43 2205,39

CAACUGU 17,57 87833,83 2164,37 UGACUGU 20,56 102790,01 2181,36

CAAGUGU 58,81 294036,06 2204,40 UGAGUGU 17,64 88186,81 2221,39

CAAUUGU 29,42 147116,74 2165,36 UGAUUGU 19,05 95264,62 2182,35

CACAUGU 18,55 92732,56 2164,37 UGCAUGU 20,44 102202,64 2181,36

CACCUGU 21,73 108653,85 2140,34 UGCCUGU 45,06 225283,63 2157,33

CACGUGU 70,82 354122,94 2180,37 UGCGUGU 49,21 246060,61 2197,36

CACUUGU 75,15 375729,65 2141,33 UGCUUGU 47,58 237888,20 2158,32

CAGAUGU 23,66 118294,36 2204,40 UGGAUGU 19,83 99168,98 2221,39

CAGCUGU 22,42 112098,01 2180,37 UGGCUGU 141,85 709259,26 2197,36

CAGGUGU 111,40 556980,06 2220,40 UGGGUGU 131,19 655954,09 2237,39

CAGUUGU 105,21 526074,50 2181,36 UGGUUGU 113,77 568831,17 2198,35

CAUAUGU 104,37 521870,70 2165,36 UGUAUGU 25,88 129395,60 2182,35

CAUCUGU 55,71 278538,81 2141,33 UGUCUGU 47,05 235258,36 2158,32

CAUGUGU 58,27 291329,48 2181,36 UGUGUGU 92,64 463203,46 2198,35

CAUUUGU 46,63 233139,53 2142,32 UGUUUGU 104,03 520174,17 2159,31

CCAAUGU 16,14 80701,75 2164,37 UUAAUGU 28,76 143775,10 2166,35

CCACUGU 20,13 100641,03 2140,34 UUACUGU 32,43 162154,29 2142,32

CCAGUGU 11,92 59612,52 2180,37 UUAGUGU 24,31 121525,89 2182,35

CCAUUGU 50,87 254372,16 2141,33 UUAUUGU 29,12 145580,11 2143,31

CCCAUGU 13,31 66561,51 2140,34 UUCAUGU 55,34 276700,43 2142,32

CCCCUGU 72,91 364561,40 2116,31 UUCCUGU 31,39 156937,80 2118,29

CCCGUGU 17,88 89396,41 2156,34 UUCGUGU 23,58 117910,45 2158,32

CCCUUGU 19,90 99497,49 2117,30 UUCUUGU 85,08 425382,26 2119,28

CCGAUGU 77,91 389536,62 2180,37 UUGAUGU 111,14 555710,31 2182,35

CCGCUGU 21,78 108907,56 2156,34 UUGCUGU 23,66 118322,98 2158,32

CCGGUGU 20,99 104968,94 2196,37 UUGGUGU 11,31 56565,66 2198,35

CCGUUGU 20,75 103750,00 2157,33 UUGUUGU 10,62 53120,46 2159,31

CCUAUGU 17,86 89291,10 2141,33 UUUAUGU 27,46 137292,82 2143,31

CCUCUGU 83,98 419898,82 2117,30 UUUCUGU 125,93 629663,61 2119,28

CCUGUGU 78,28 391401,27 2157,33 UUUGUGU 90,33 451669,09 2159,31

CCUUUGU 9,42 47115,38 2118,29 UUUUUGU 36,20 181021,90 2120,27

CGAAUGU 111,21 556050,07 2204,40 UGCAAAA 39,42 197093,79 2211,44

CGACUGU 0,61 3034,90 2180,37 UGCAAAC 29,67 148340,55 2187,41

CGAGUGU 14,33 71671,39 2220,40 UGCAAAG 27,39 136971,35 2227,44

CGAUUGU 14,02 70114,94 2181,36 UGCAAAU 46,92 234578,15 2188,40

CGCAUGU 12,93 64643,40 2180,37 UGCAACA 31,95 159774,96 2187,41

CGCCUGU 9,14 45714,29 2156,34 UGCAACC 18,16 90824,26 2163,38

CGCGUGU 20,19 100940,44 2196,37 UGCAACG 24,39 121944,04 2203,41

CGCUUGU 18,33 91639,87 2157,33 UGCAACU 27,99 139969,83 2164,37

CGGAUGU 10,00 50000,00 2220,40 UGCAAGA 43,52 217600,00 2227,44

CGGCUGU 10,73 53674,12 2196,37 UGCAAGC 51,52 257604,17 2203,41

CGGGUGU 31,94 159703,70 2236,40 UGCAAGG 32,94 164705,88 2243,44

CGGUUGU 9,30 46497,76 2197,36 UGCAAGU 17,24 86197,18 2204,40

CGUAUGU 11,10 55524,08 2181,36 UGCAAUA 30,29 151466,67 2188,40

CGUCUGU 21,51 107547,17 2157,33 UGCAAUC 33,25 166272,19 2164,37

CGUGUGU 10,73 53651,27 2197,36 UGCAAUG 34,77 173863,64 2204,40

CGUUUGU 7,56 37781,11 2158,32 UGCAAUU 21,37 106857,14 2165,36

CUAAUGU 12,23 61150,07 2165,36 UGCACAA 13,39 66947,96 2187,41

CUACUGU 10,17 50842,27 2141,33 UGCACAC 9,15 45749,61 2163,38

CUAGUGU 10,80 54000,00 2181,36 UGCACAG 9,90 49490,54 2203,41

CUAUUGU 9,80 48985,51 2142,32 UGCACAU 17,02 85081,24 2164,37

CUCAUGU 10,84 54185,69 2141,33 UGCACCA 41,75 208774,58 2163,38

CUCCUGU 65,36 326812,82 2117,30 UGCACCC 36,02 180101,18 2139,35

CUCGUGU 33,21 166037,74 2157,33 UGCACCG 19,71 98569,16 2179,38

CUCUUGU 57,16 285806,45 2118,29 UGCACCU 15,40 76998,37 2140,34

CUGAUGU 13,57 67836,26 2181,36 UGCACGA 12,18 60919,54 2203,41

CUGCUGU 19,08 95409,84 2157,33 UGCACGC 6,60 33009,71 2179,38

CUGGUGU 12,75 63732,93 2197,36 UGCACGG 10,79 53939,39 2219,41

CUGUUGU 10,69 53435,11 2158,32 UGCACGU 7,32 36585,37 2180,37

CUUAUGU 13,10 65507,25 2142,32 UGCACUA 10,32 51594,20 2164,37

CUUCUGU 10,84 54193,55 2118,29 UGCACUC 8,96 44805,19 2140,34

CUUGUGU 77,07 385343,51 2158,32 UGCACUG 10,81 54037,27 2180,37

CUUUUGU 14,19 70967,74 2119,28 UGCACUU 9,50 47500,00 2141,33

GAAAUGU 16,19 80927,84 2228,43 UGCAGAA 20,27 101333,33 2227,44

GAACUGU 26,65 133240,22 2204,40 UGCAGAC 19,01 95043,73 2203,41

GAAGUGU 30,25 151245,09 2244,43 UGCAGAG 26,23 131129,48 2243,44

GAAUUGU 11,31 56573,71 2205,39 UGCAGAU 14,41 72067,04 2204,40

GACAUGU 15,40 76986,30 2204,40 UGCAGCA 22,14 110724,64 2203,41

GACCUGU 17,54 87687,69 2180,37 UGCAGCC 14,41 72025,72 2179,38

GACGUGU 17,15 85754,58 2220,40 UGCAGCG 35,50 177507,60 2219,41

GACUUGU 14,78 73881,67 2181,36 UGCAGCU 30,16 150778,82 2180,37

GAGAUGU 14,04 70221,07 2244,43 UGCAGGA 12,04 60191,52 2243,44

GAGCUGU 25,61 128057,55 2220,40 UGCAGGC 237,18 1185911,18 2219,41

GAGGUGU 9,62 48118,28 2260,43 UGCAGGG 61,14 305697,84 2259,44

GAGUUGU 16,54 82702,70 2221,39 UGCAGGU 14,47 72358,90 2220,40

GAUAUGU 10,51 52535,76 2205,39 UGCAGUA 130,15 650746,27 2204,40

GAUCUGU 17,11 85550,79 2181,36 UGCAGUC 85,67 428355,96 2180,37

GAUGUGU 12,15 60762,94 2221,39 UGCAGUG 66,11 330535,46 2220,40

GAUUUGU 6,68 33424,66 2182,35 UGCAGUU 23,64 118195,05 2181,36

GCAAUGU 12,07 60335,20 2204,40 UGCAUAA 40,85 204266,67 2188,40

GCACUGU 86,16 430792,68 2180,37 UGCAUAC 27,18 135918,37 2164,37

GCAGUGU 13,77 68847,80 2220,40 UGCAUAG 29,42 147107,44 2204,40

GCAUUGU 15,01 75036,08 2181,36 UGCAUAU 53,97 269832,40 2165,36

GCCAUGU 25,95 129729,73 2180,37 UGCAUCA 25,94 129683,00 2164,37

GCCCUGU 74,95 374750,83 2156,34 UGCAUCC 20,58 102875,40 2140,34

GCCGUGU 13,83 69147,29 2196,37 UGCAUCG 34,56 172809,67 2180,37

GCCUUGU 12,21 61049,28 2157,33 UGCAUCU 98,58 492879,26 2141,33

GCGAUGU 14,38 71897,29 2220,40 UGCAUGA 118,39 591955,62 2204,40

GCGCUGU 27,62 138118,02 2196,37 UGCAUGC 95,05 475272,16 2180,37

GCGGUGU 14,08 70414,20 2236,40 UGCAUGG 55,12 275620,44 2220,40

GCGUUGU 111,67 558333,33 2197,36 UGCAUGU 21,38 106901,62 2181,36

GCUAUGU 10,42 52086,33 2181,36 UGCAUUA 36,64 183218,71 2165,36

GCUCUGU 14,34 71680,00 2157,33 UGCAUUC 193,81 969065,85 2141,33

GCUGUGU 55,21 276060,61 2197,36 UGCAUUG 38,06 190308,37 2181,36

GCUUUGU 104,02 520121,95 2158,32 UGCAUUU 54,83 274150,66 2142,32

Supplementary Table 5. List of RNA sequences synthesized for the RBFOX library
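
As a plausibility check on the list above, the last number of each entry is reproduced by a standard average-mass calculation for an oligoribonucleotide. The following is a minimal Python sketch, assuming the commonly used average residue masses for RNA with free 5'- and 3'-hydroxyl ends (A 329.21, C 305.18, G 345.21, U 306.17 g/mol, and an end correction of -61.96 g/mol). The function name `rna_mass` is illustrative and not part of the original work.

```python
# Average masses (g/mol) of ribonucleotide residues within a chain.
# Assumption: 5'-OH / 3'-OH oligonucleotide, hence the -61.96 end correction.
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
END_CORRECTION = -61.96


def rna_mass(sequence):
    """Return the calculated average mass (g/mol) of an RNA sequence."""
    return sum(RESIDUE_MASS[base] for base in sequence) + END_CORRECTION


if __name__ == "__main__":
    # Three entries taken from Supplementary Table 5 for comparison.
    for seq in ("AACAUGU", "UUUUUGU", "UGCAUGU"):
        print(seq, f"{rna_mass(seq):.2f}")
```

For the three sequences printed here, the tabulated masses are 2188,40, 2120,27 and 2181,36, respectively.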

Sequence KD (kinetic) KD (steady state) Standard deviation R²

pre-miR-107 1.97 µM 2.11 µM 0.37 µM 0.9941

pre-miR-20b 3.56 µM 3.52 µM 0.46 µM 0.9952

pre-miR-32 332.5 nM 363.19 nM 33.2 nM 0.9972

Supplementary Table 6. Comparison of steady state and kinetic SPR analysis.

Name TM [°C] ΔH [kcal/mol] ΔS [cal/mol K] ΔG [kcal/mol] KD [nM] ka [M−1 s−1] kd [s−1]

tr-miR-20b UU-AG 70,73 -64,94 -188,85 -6,37 562,67 115000,00 0,05725

tr-miR-20b AC-UG 57,94 -65,67 -198,35 -4,15 146,50 380000,00 0,056

tr-miR-20b AC-UA 54,01 -74,87 -228,82 -3,91 158,92 314850,00 0,0504

tr-miR-20b AC 48,88 -60,69 -188,46 -2,24 600,00 310000,00 0,186

tr-miR-20b 42,95 -33,42 -105,71 -0,63 158,50 642500,00 0,0984

tr-miR-32 CU-AA 66,02 -79,40 -234,11 -6,79 407,00 164000,00 0,067

tr-miR-32 50,84 -64,17 -198,06 -2,74 46,40 228000,00 0,01055

tr-miR-32 AU-GG 48,60 -25,37 -78,85 -0,91 69,35 959800,00 0,06656

tr-miR-32 CU-AA 2 x CG 43,04 -24,71 -78,15 -0,47 25,60 1830000,00 0,0468

tr-miR-32 CG 40,40 -31,10 -99,00 -0,40 37,20 1380000,00 0,051

Supplementary Table 7. Kinetic and Thermodynamic data of truncated hairpins. Kinetic constants were measured with a MASS-1 against RBFOX proteins. The thermodynamic values
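
The columns of the table above are linked by two textbook relations: the Gibbs equation ΔG = ΔH − TΔS and the ratio KD = kd/ka. A minimal Python sketch of both follows, assuming that the free energies are quoted at 37 °C (310.15 K). This assumption is not stated in the table, but it reproduces the tabulated ΔG values to two decimals for the rows checked. The ratio kd/ka matches the tabulated KD exactly for some rows (e.g. tr-miR-20b AC) and only approximately for others, so the tabulated KD may have been obtained differently, for example by averaging replicate fits. The function names are illustrative.

```python
T_37C = 310.15  # K; assumption: free energies are quoted at 37 degrees C


def gibbs_energy(dh_kcal, ds_cal, temp_k=T_37C):
    """dG = dH - T*dS with dH in kcal/mol and dS in cal/(mol K); returns kcal/mol."""
    return dh_kcal - temp_k * ds_cal / 1000.0


def kd_from_rates(ka, kd):
    """Equilibrium dissociation constant KD = kd / ka, converted from M to nM."""
    return kd / ka * 1e9


if __name__ == "__main__":
    # Rows copied from Supplementary Table 7: (name, dH, dS, ka, kd)
    rows = [
        ("tr-miR-20b UU-AG", -64.94, -188.85, 115000.0, 0.05725),
        ("tr-miR-20b AC", -60.69, -188.46, 310000.0, 0.186),
        ("tr-miR-32", -64.17, -198.06, 228000.0, 0.01055),
    ]
    for name, dh, ds, ka, kd in rows:
        print(name, round(gibbs_energy(dh, ds), 2), round(kd_from_rates(ka, kd), 1))
```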

name sequence mass calc.

tr-miR-20b CAGGUAGUUUUGGCAUGACUCUACUG 8281,99

tr-miR-20b A-C CCGGUAGUUUUGGCAUGACUCUACUG 8257,96

tr-miR-20b A-C U-A CCGGUAGAUUUGGCAUGACUCUACUG 8281,00

tr-miR-20b A-C U-G CCGGUAGUGUUGGCAUGACUCUACUG 8297,00

tr-miR-20b UU-AG CAGGUAGAGUUGGCAUGACUCUACUG 8344,07

tr-miR-32 AAGUUGCAUGUUGUCACGGCCUCAAUGCAAUUU 10492,32

tr-miR-32 AU-GG AAGUUGCAUGUUGUCACGGCCUCAAUGCAGGUU 10547,36

tr-miR-32 C-G AAGUUGCAUGUUGUCACGGCCUCAAUGGAAUUU 10532,35

tr-miR-32 CU-AA AAGUUGCAUGUUGUCACGGCAACAAUGCAAUUU 10539,39

tr-miR-32 CU-AA 2xC-G AAGUUGCAUGUUGUCACGGCAAGAAUGGAAUUU 10619,45

Supplementary Table 8. List of truncated hairpins used for the correlation between hairpin strength and protein-RNA affinity. Variations are marked in red.

Name Sequence

miR-10a-pre-F TACCCTGTAGATCCGAATTTGTG

miR-10a-pre-R TATTCCCCTAGATACGAATTTGTG

miR-18b-pre-F GGTGCATCTAGTGCAGTTAGTGA

miR-18b-pre-R GCCAGAAGGGGCATTTAGG

miR-107-pre-F TCTCTGCTTTCAGCTTCTTTACAGT

miR-107-pre-R GTACAATGCTGCTTGAACTCCAT

miR-20b-pre-F CAAAGTGCTCATAGTGCAGGT

miR-20b-pre-R CTGGAAGTGCCCATACTACAG

miR-32-pre-F GCACATTACTAAGTTGCATGTTGTC

miR-32-pre-R TCACACACACTAAATTGCATTG

Supplementary Table 9. List of primers

5.3.: Figures

Supplementary Figure 1. Natural RNA bases

Supplementary Figure 2. Python code to obtain all possible n-mers
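
A minimal sketch of such an enumeration is given here. It uses `itertools.product` over the four RNA bases and is not necessarily the implementation used in this work. The helper `library` and its `prefix` and `suffix` parameters are illustrative assumptions, chosen to mirror entries of the form NNNNUGU and UGCANNN that appear in Supplementary Table 5.

```python
from itertools import product

RNA_BASES = "ACGU"


def all_nmers(n, alphabet=RNA_BASES):
    """Return every sequence of length n over the alphabet, in lexicographic order."""
    return ["".join(letters) for letters in product(alphabet, repeat=n)]


def library(prefix="", n_variable=4, suffix="UGU"):
    """Embed all n-mers between a fixed prefix and suffix (hypothetical helper)."""
    return [prefix + core + suffix for core in all_nmers(n_variable)]


if __name__ == "__main__":
    print(len(all_nmers(4)))          # number of 4-mers over four bases (4**4)
    print(library("", 4, "UGU")[:3])  # first few NNNNUGU sequences
    print(library("UGCA", 3, "")[:3])  # first few UGCANNN sequences
```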

Supplementary Figure 3. Alignment of the pri-miR-32 sequences using CARNA. The alignment shows the high level of conservation for the mature miRNA sequences. The colors show the number of possible base pairing partners. The first green column is the mutated site within the FBE.

Supplementary Figure 4. SPR spectroscopy sensorgrams of pre-miRs and mutated pre-miRs with an indication of unspecific binding at high concentrations

Supplementary Figure 5. Results from ELISA screen. Binders with binding motifs (GCAUG, GAAUG, GCACG) are marked with a red dot

Supplementary Figure 6. Comparison of kinetic vs. steady-state analysis of pre-miRs against RBFOX

Supplementary Figure 7. Change in absorption of RNA upon addition of 20 µM RBFOX solution to the RNA

Supplementary Figure 8. Calculated structures of hairpins used in this work with their corresponding ΔG values. Mfold was used for calculation. The FBE is shown in green and the mutations are shown in red. Deletions are marked by neighboring bases in italics.

6.: Manuscripts

6.1.: Development of a RNA negative control

Results and Discussion

A number of modified nucleosides with methylated nucleobases have been described which disable W-C pairing (e.g. N1-methylG, N3-methylU) [324, 325]. Methylation of the 4-amino group of cytidine yields a nucleoside with modified base-pairing properties (Figure 41 A).

Figure 41. Watson-Crick pairing of RNAs containing guanine with natural and methylated cytidines. A: Canonical G-C base pair. B: Guanine N4-methylcytosine (mMC) base pair. C: One possible conformation of a guanine N4,N4-dimethylcytosine (dMC) base pair. D: Melting profiles of duplexes comprising RNAs from Table 15 (black squares: RNA duplex [1-2]; red dots: RNA duplex [1-3]; green triangles: RNA duplex [1-4]; blue triangles: RNA duplex [1-5]). E: The thermodynamic TM of duplexes is indicated by arrows, determined from the theoretical equilibrium curve f(T).

Whereas mono-methylation (N4-methylcytidine; mMC) permits a regular G-C base-pair if the methyl group adopts a trans conformation with N3 (Figure 41 B), dimethylation (N4,N4-dimethylcytidine; dMC) disrupts the W-C base-pair and therefore would be expected to lower the strength of the pairing interaction considerably (Figure 41 C). We hypothesized that a dMC-nucleotide in the seed of a miRNA would likely stack into the A-form helix and base-pair weakly with G in the passenger strand, jutting its dimethylamine group into the major groove and thereby permitting loading into the RISC in analogous fashion to a G-U wobble [326]. However, binding to the mRNA in the subsequent targeting step would be so weakened by the partial base-pair in the seed that a significant loss of silencing activity would likely be experienced, similar to that of a mismatch control. As most miRNAs contain at least one C-nt in the seed, it is not necessary to expand the repertoire of bases beyond C. An additional attractive feature of the dMC modification is that it could be conveniently incorporated into RNAs using the convertible nucleoside approach [327], employing a 4-triazolcytidine phosphoramidite during automated synthesis. Using dimethylamine instead of gaseous methylamine during oligonucleotide deprotection would then create the dMC at the desired position in the strand. This versatile synthesis strategy also offers the opportunity to incorporate other functionality into the major groove of miRNAs for a variety of chemical biology applications.

The convertible nucleoside, 4-triazolcytidine phosphoramidite, was synthesized as previously described [328]. Solid phase RNA synthesis was performed under standard conditions. The 4-triazole group in selected cytidines of RNA sequences was then substituted by monomethyl- or dimethyl-amine with concomitant deprotection, while still on the CPG solid support. Removal of RNA protecting groups under standard conditions uses gaseous methylamine; thus, the synthesis of mMC-containing RNAs was achieved without changes to the usual protocol. Substitution of triazole with dimethylamine was performed using aqueous dimethylamine, followed by gaseous methylamine to remove any traces of remaining base and phosphodiester protecting groups. RNAs were purified by RP-HPLC and then detritylated. Modified sequences were characterized by electrospray mass spectrometry. Synthesis yields for unmodified RNAs from a 50 nmol synthesis were in the range of 6-8 nmol, whereas yields of dMC-modified and mMC-modified RNAs were 1-2 nmol and 1-10 nmol, respectively.

Sequence number or name Sequence (5' → 3') Mass calc. (g/mol) Mass (g/mol)
1 GUGUCUAAACUAUC 4391.7 4391.7
2 GAUAGUUUAGACAC 4454.8 4453.01
3 GAUAGUUUAGAAAC 4478.8 4478.13
4 GAUAGUUUAGACdMAC 4482.8 4482.35
5 GAUAGUUUAGACmMAC 4468.8 4468.15
6 GAUAGUUUAGACAA 4478.8 4478.13
7 GAUAGUUUAGACACdM 4482.8 4482.35
8 GAUAGUUUAGACACmM 4468.8 4468.15
9 GAUAGUUUAGAAAA 4502.8 4502.21
10 GAUAGUUUAGACdMACdM 4510.8 4509.92
11 GAUAGUUUAGACmMACmM 4482.8 4482.02
wt-34a (5p strand) UGGCAGUGUCUUAGCUGGUUGU 7029.2 7028.2
hsa-miR-34a (3p strand) CAAUCAGCAAGUAUACUGCCCU 6945.3 6943.8
mm-34a (5p strand) UGGAAGUGUCUUAGCUGGUUGU 7053.2 7052.3
mMC-34a (5p strand) UGGCmMAGUGUCUUAGCUGGUUGU 7043.2 7042.2
dMC-34a (5p strand) UGGCdMAGUGUCUUAGCUGGUUGU 7057.2 7056.0
wt-106a (5p strand) AAAAGUGCUUACAGUGCAGGUAG 7434.5 7433.8
hsa-miR-106a (3p strand) CUGCAAUGUAAGCACUUCUUAC 6923.1 6922.0
mm-106a (5p strand) AAAAGUGAUUACAGUGCAGGUAG 7458.6 7457.6
mMC-106a (5p strand) AAAAGUGCmMUUACAGUGCAGGUAG 7448.6 7447.6
dMC-106a (5p strand) AAAAGUGCdMUUACAGUGCAGGUAG 7462.6 7461.0

Table 15. Oligoribonucleotide Sequences Used in the Investigation

A series of RNAs (Table 15) was assembled to determine the consequences of incorporated N-methyl cytidines on binding affinity in short RNA duplexes. Control sequences bearing standard mismatch permutations were prepared for comparison. Melting curves from RNA- and modified RNA-duplexes (Figure 41 D) were transformed into an associated fraction plot calculated from the upper and lower baselines (Figure 41 E), and melting temperatures (TM) were calculated for f = 0.5 (Table 16). Values of ΔG° were obtained from a plot of ln Ka vs. 1/T and the Gibbs equation. The complementary 14-nt duplex [1-2] yielded a TM of 61.2 °C. A single C→A mismatch base-change at position 12 (duplex [1-3]) decreased the TM by 8.9 °C, corresponding to a ΔΔG° of 2.2 kcal mol−1, which is within the typical range for an A-G mismatch, depending on the flanking base-pairs [329]. The melting curve of the mMC-containing RNA duplex [1-5] is similar to that of the parent duplex [1-2] (TM values of 60.9 °C and 61.2 °C, respectively). Previous reports of the thermodynamic properties of RNA duplexes containing N4-methylcytidine appear to be inconsistent. The results of studies on cytosine nucleobases using NMR spectroscopy [330] showed that the N4-methyl group favors a syn-conformation with N3; forcing the anti-conformation necessary for a W-C base-pair would therefore be an energetically destabilizing element during hybridization. Indeed, a considerable loss of binding affinity was found for a single mMC modification close to the center of an 11-nt RNA duplex [331]. However, N4-methylcytidine in a self-complementary 13-nt RNA duplex showed a considerable increase in duplex stability [324], whereas in an imperfect RNA duplex from the HIV rev-response element N4-methylcytidine caused no change in TM [332], consistent with our data. It is likely therefore that the base-pairs flanking an mMC-G base-pair determine whether the modification is stabilizing or destabilizing, as is also the case for mismatched base-pairs [329, 333].
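
The van 't Hoff analysis mentioned above (ln Ka plotted against 1/T, followed by the Gibbs equation) can be sketched as follows. The example assumes a two-state transition of two non-self-complementary strands at equal concentration, so that Ka = 2f / ((1 − f)² Ct) with Ct the total strand concentration. This model, the function names and the synthetic input values are assumptions for illustration and are not taken from the experimental procedure of this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ka_from_fraction(f, ct):
    """Ka (1/M) for A + B <=> AB from the associated fraction f (0 < f < 1).

    Assumes equal concentrations of both strands; ct is the total strand
    concentration in M.
    """
    return 2.0 * f / ((1.0 - f) ** 2 * ct)


def vant_hoff_fit(temps_k, ka_values):
    """Fit ln(Ka) = -dH/(R T) + dS/R by ordinary least squares.

    Returns (dH in kcal/mol, dS in kcal/(mol K)).
    """
    x = [1.0 / t for t in temps_k]
    y = [math.log(k) for k in ka_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -R_KCAL * slope, R_KCAL * intercept


def gibbs_energy(dh, ds, temp_k=310.15):
    """Gibbs equation dG = dH - T*dS (kcal/mol), by default at 37 degrees C."""
    return dh - temp_k * ds


if __name__ == "__main__":
    # Synthetic example values (hypothetical, not measured data).
    true_dh, true_ds = -100.0, -0.27
    temps = [320.0, 325.0, 330.0, 335.0, 340.0]
    kas = [math.exp(-true_dh / (R_KCAL * t) + true_ds / R_KCAL) for t in temps]
    dh, ds = vant_hoff_fit(temps, kas)
    print(f"dH = {dh:.2f} kcal/mol, dS = {1000 * ds:.1f} cal/(mol K), "
          f"dG(37 C) = {gibbs_energy(dh, ds):.2f} kcal/mol")
```

In this model Ka equals 4/Ct at f = 0.5, which is why the melting temperature read off at f = 0.5 depends on the strand concentration.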

The TM of the RNA duplex [1-4] containing a single dMC modification at C12 is 51.7 °C, giving a ΔTM of -9.6 °C and a ΔΔG° of 3.4 kcal mol−1 in comparison to the parent duplex [1-2]. Thus, in this particular 14-nt duplex the dMC modification is more destabilizing than a standard mismatch by approximately 1 kcal mol−1. Modification of base-pairs at the terminal position of the RNA duplex (mismatch-[1-6], mMC-[1-8], dMC-[1-7]) barely affected the free energy of hybridization, as expected. Furthermore, melting curves and energy calculations of the doubly modified duplexes [1-9], [1-10] and [1-11] were similar to those of the singly-modified sequences. In summary, the dMC modification fulfills three of the important requirements for a negative control for miRNA experiments: i) it does not constitute a new RNA sequence and therefore does not trigger a new mRNA target signature; ii) it is easily incorporated into RNAs; iii) it causes a large destabilization of base-pairing affinity with complementary sequences, including presumably the mRNA substrates of miRNAs in RISC.
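
The differences quoted above follow directly from the values in Table 16. A short Python sketch, with illustrative names, recomputes ΔTM and ΔΔG° relative to the parent duplex [1-2] from the rounded table entries. With these rounded inputs the ΔTM of duplex [1-4] comes out as -9.5 °C, marginally different from the -9.6 °C given in the text, which presumably reflects rounding of the underlying values.

```python
# (TM in degrees C, dG at 37 degrees C in kcal/mol), copied from Table 16
DUPLEXES = {
    "[1-2]": (61.2, -16.8),  # parent duplex
    "[1-3]": (52.3, -14.6),  # C->A mismatch at position 12
    "[1-4]": (51.7, -13.4),  # dMC at position 12
    "[1-5]": (60.9, -17.2),  # mMC at position 12
}


def destabilization(name, parent="[1-2]", table=DUPLEXES):
    """Return (delta TM, delta-delta G) of a duplex relative to the parent."""
    tm, dg = table[name]
    tm0, dg0 = table[parent]
    return round(tm - tm0, 1), round(dg - dg0, 1)


if __name__ == "__main__":
    for name in ("[1-3]", "[1-4]", "[1-5]"):
        print(name, destabilization(name))
```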

RNA duplex Sequence (5’→3’) of modified RNA TM [°C] ΔG°37°C [kcal mol−1]
[1-2] GAUAGUUUAGACAC 61.2 ± 0.2 -16.8 ± 0.3
[1-3] GAUAGUUUAGAAAC 52.3 ± 0.1 -14.6 ± 0.1
[1-4] GAUAGUUUAGA(dMC)AC 51.7 ± 0.03 -13.4 ± 0.1
[1-5] GAUAGUUUAGA(mMC)AC 60.9 ± 0.1 -17.2 ± 0.3
[1-6] GAUAGUUUAGACAA 60.2 ± 0.1 -16.2 ± 0.2
[1-7] GAUAGUUUAGACA(dMC) 60.1 ± 0.2 -19.0 ± 0.4
[1-8] GAUAGUUUAGACA(mMC) 61.1 ± 0.2 -18.1 ± 0.1
[1-9] GAUAGUUUAGAAAA 51.2 ± 0.3 -13.3 ± 0.2
[1-10] GAUAGUUUAGA(dMC)A(dMC) 51.1 ± 0.03 -13.8 ± 0.1
[1-11] GAUAGUUUAGA(mMC)A(mMC) 59.2 ± 0.1 -16.3 ± 0.2

Table 16. Melting temperatures and free energy changes of hybridization from annealing oligoribonucleotides.

We next investigated the properties of two miRNAs containing N-methylated cytidines in their seed regions, specifically whether they would be accepted into RISC. MiR-106a and miR-34a play important roles in regulating genes involved in cancer [334, 335]. Natural mimics of miR-106a and miR-34a were prepared by annealing RNA sequences for their 5p- and 3p-strands as defined in the miRNA database miRBase (Figure 42 A; www.mirbase.org). A series of modified guide strands of miR-106a and miR-34a were synthesized bearing a single mMC, a single dMC and a C→A mismatch in their seed regions at positions 8 and 4, respectively (Table 15). These were also annealed to the appropriate passenger strand sequences. Finally, commercial miRNA mimics for miR-106a and miR-34a were acquired, as well as two siRNAs: one targeting Renilla luciferase (siRen) to control for transfection efficiency and correct functioning of the reporter gene, and an unrelated siRNA (siCon) which we have previously used to control for transfection toxicity [336]. The double-stranded RNAs (dsRNAs) were tested in series in three types of cellular assays. Two dual-luciferase reporters, each carrying in its 3’ UTR a single complementary target sequence to miR-34a or miR-106a, assayed for the siRNA-like activity of the miRNAs (Figure 42 B). A second pair of reporters was constructed to assay their miRNA activity, each containing a unique site for miR-106a and miR-34a using a bona fide target sequence from two validated target sites in the 3’ UTRs of CDKN1A (P21) [337] and SIRT1 [338], respectively. Finally, for the miR-34a series of reagents an apoptosis assay was employed to show the functional consequences of methylated cytidines on the miR-34a-mediated induction of apoptosis in HeLa cells.

Figure 42. Methylated cytidines in microRNA (miRNA) duplexes are accepted into the RNA induced silencing complexes (RISC) and show varying levels of biological activity. (A) Structure of mimics miR-106a and miR-34a. (B) Schematic representation of miR-106a and miR-34a targeting complementary sites embedded in parts of the SIRT1 and P21 3’ untranslated regions (UTRs), respectively. (C) and (D) HeLa cells transfected with luciferase reporter plasmids shown in (B) were treated after 24 hours with increasing doses (0, 2, 9 and 36 nM) of analogs of miR-106a and miR-34a. Relative luciferase activity was measured 48 hours after plasmid transfections, and residual luciferase activity is plotted after normalization to that of the 0 nM treatment [mean of triplicate transfections ± standard deviation (SD)]. (E) Schematic representation of miR-34a and miR-106a analogs targeted to luciferase reporter genes bearing a single miRNA target site from P21 and SIRT1, respectively, in their 3’ UTRs. (F) and (G) HeLa cells transfected with luciferase reporter plasmids shown in (E) were treated after 24 hours with analogs of miR-106a and miR-34a. Relative luciferase activity was measured 48 hours after plasmid transfections, and residual luciferase activity is plotted after normalization to that of the 0 nM treatment (mean of triplicate transfections ± SD). (H) Caspase 3/7 activity was measured from lysates of HeLa cells 72 hours after transfection with miR-34a analogs. Caspase 3/7 activity is plotted after normalization to that of the 0 nM treatment (mean of triplicate transfections ± SD).
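
The normalization described in the caption (triplicate readings expressed relative to the 0 nM treatment, reported as mean ± SD) can be sketched in Python as below. The function name is illustrative, and the readings in the usage example are made-up placeholder numbers, not data from this work. The sketch also assumes that the SD is the sample standard deviation of the normalized replicates, which the caption does not specify.

```python
from statistics import mean, stdev


def normalize_to_untreated(readings_by_dose, untreated_dose=0):
    """Express replicate readings relative to the mean of the untreated wells.

    readings_by_dose maps dose (nM) -> list of replicate readings.
    Returns dose -> (mean, sample SD) of the normalized values.
    """
    baseline = mean(readings_by_dose[untreated_dose])
    result = {}
    for dose, readings in readings_by_dose.items():
        normalized = [r / baseline for r in readings]
        result[dose] = (mean(normalized), stdev(normalized))
    return result


if __name__ == "__main__":
    # Placeholder triplicates (hypothetical values for illustration only).
    example = {0: [100.0, 110.0, 90.0], 2: [80.0, 85.0, 75.0], 36: [50.0, 55.0, 45.0]}
    for dose, (avg, sd) in normalize_to_untreated(example).items():
        print(f"{dose} nM: {avg:.2f} +/- {sd:.2f}")
```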

HeLa cells were co-transfected with reporter constructs and dsRNAs at three concentrations, and normalized residual luciferase activity was measured 48 h later. In each experiment, the siCon control showed little inhibitory activity whereas siRen strongly reduced luciferase activity, indicating good transfection efficiency and no untoward toxicity. Unmodified miR-106a (wt-106a) and miR-34a (wt-34a) inhibited their respective targets efficiently, similar to the commercially available mimics mimic-106a and mimic-34a (Figure 42 C, D). MiR-106a and miR-34a containing C→A mismatches in the seeds (mm-106a and mm-34a, respectively) showed heavily reduced activity in comparison to the wild-type counterparts. An siRNA guide strand employs its entire length in the recognition of an mRNA target sequence; therefore, a single mismatched base is not always expected to abolish all silencing activity unless it is located at a critical position. The doubly methylated dMC-106a also showed a significant reduction in silencing activity compared to its wild-type analog, whereas dMC-34a only showed activity at the highest dose. MiRNAs with a single mMC in the seed (mMC-106a, mMC-34a) showed similar levels of activity to their unmodified counterparts, indicating that the mMC-G base-pair is intact and that positioning of the N4-methyl into the major groove of the seed duplex region does not adversely affect the silencing activity of the RISC. This contrasts with miRNAs that are modified at the minor groove binding sites in the seed, for example with 2’-O-methyl substituents on the ribose. Here the modification abolishes RISC activity, probably due to poor recognition by proteins of the RISC [339, 340]. The silencing activity of our seed-modified dsRNAs was most affected in assays reporting miRISC activity, as would be expected for a mechanism which depends on W-C recognition involving a seed region of only approximately 8 base-pairs. MiR-106a and miR-34a have been shown to regulate conserved target sites in the 3’ UTRs of P21 and SIRT1, respectively. We constructed a luciferase reporter for miR-106a and miR-34a containing approximately 150 nt of these UTRs, including the validated target sites (Figure 42 E).
Co-transfection of the reporter with siRen showed efficient silencing of luciferase, whereas the wild-type miRNAs wt-106a and wt-34a and the commercial mimics showed the typical low level of inhibition expected for miRNA-regulation of a single target site in a UTR (Figure 42 F, G). Both mismatched and dMC-modified miRNAs were inactive in this assay. In contrast, mMC-106a was slightly less potent and mMC-34a was slightly more potent than their wild-type counterparts, once again confirming that the placement of groups in the major groove is not detrimental to miRNA activity. We have previously shown that miR-34a induces caspase 3/7 in HeLa cells causing apoptosis [336]. We tested the miR-34a dsRNAs in the caspase assay and observed that the trend in activity against the SIRT1 reporter construct was mirrored at the functional level (Figure 42 H). Thus, mimic-34a, wt-34a and mMC-34a were all able to induce caspase 3/7 in HeLa cells at 72 h, whereas mm-34a and dMC-34a were not.

In summary, we have elaborated a method by which N-alkylated cytidines may be introduced into synthetic miRNA reagents using the convertible nucleoside approach. We have demonstrated that an N4-monomethylated cytidine in two short related RNA sequences does not adversely affect hybridization affinity, though this may depend on its flanking base-pairs. We have shown with two examples that N4-monomethylated cytidine in the seed of a miRNA is accepted by the RISC and that the protrusion of the methyl group into the major groove does not hinder silencing of complementary targets or bona fide miRNA target sites. This bodes well for the introduction of other functionality into miRNAs as tools for biochemistry or chemical biology. The introduction of the N4,N4-dimethylcytidine into an RNA perturbs the formation of a W-C base-pair and lowers its binding affinity for a complementary sequence. We demonstrated that miRNAs with a single dMC in the seed region, both at position 8 (in miR-106a) and at position 4 (in miR-34a), show reduced silencing activity, probably from reduced binding affinity to mRNAs. The modified C-base therefore satisfies all of the aforementioned criteria for use in negative miRNA controls, most importantly that a new seed sequence with target signature has not been introduced.

6.2.: Rapid high-yield cell-free expression of quantitatively biotinylated proteins (Draft)

Abstract Site-specific protein biotinylation is an important post-translational modification that finds ubiquitous applications in protein purification, detection and immobilization. However, the fast and efficient preparation of quantitatively and specifically modified proteins in sufficient amounts is still a common bottleneck. To overcome these frequent limitations, we here present a highly efficient Escherichia coli-based cell-free expression system which rapidly provides mg-quantities of specifically and quantitatively biotinylated [eukaryotic] proteins. Our system uses the newly designed cell-free expression vectors pCFX4, pCFX5A and pCFX5C for batch-mode expression of the target proteins either as N- or C-terminal fusion to the AviTag in the presence of biotin and biotin-protein ligase BirA. The employed S30 extract was prepared following a novel procedure which completely removes the only endogenous biotinylated E. coli protein BCCP. We applied our cell-free expression system to produce 0.2–0.6 mg of various biotinylated eukaryotic RNA-binding proteins per mL reaction mixture and used NMR spectroscopy to demonstrate that the obtained target proteins are natively folded and specifically and quantitatively biotinylated. The absence of endogenous biotinylated E. coli proteins further allows rapid immobilization of the unpurified biotinylated target protein directly from the crude reaction mixture to avidin/streptavidin-containing materials. We illustrate the potential of our system for high-throughput applications with the preparation of biotinylated human Fox-1(109–208) protein and determination of its RNA-binding affinity by surface plasmon resonance in less than six hours.

Introduction

The covalent attachment of the cofactor biotin to the Nε-amino group of specific lysine residues in biotin-dependent carboxylases is an essential post-translational modification found in diverse prokaryotic and eukaryotic metabolic pathways [341, 342] and is catalyzed by highly specific biotin-protein ligases [343]. In Escherichia coli this reaction is carried out by the bifunctional ligase/repressor protein BirA [344], which specifically modifies the biotin carboxyl carrier protein (BCCP) subunit of the acetyl-CoA carboxylase complex [345, 346]. It was discovered that biotin and biotinylated polypeptides bind with an extraordinary femtomolar dissociation constant to avidin and streptavidin [287, 347], which has resulted in many implementations of this strong interaction for purification, immobilization and detection of biotinylated proteins [348-351]. An important and widespread application includes the immobilization of specifically biotinylated proteins on surfaces coated with avidin or streptavidin, which has the advantage of being very rapid, highly efficient and extremely stable, and of preserving the structural and functional integrity of the target protein [350]. Biotinylation of target proteins for these applications can be achieved chemically, enzymatically or by protein engineering techniques. Chemical biotinylation approaches are very rapid and efficient but also unspecific and modify all reactive functionalities that are accessible to the biotinylation reagent [352]. Despite their obvious convenience, these reagents typically yield inhomogeneously biotinylated samples and can result in functional inactivation of proteins by modification of important residues [353]. To enable specific biotinylation of proteins at defined sites, alternative protein engineering approaches have been proposed such as stop codon-mediated C-terminal incorporation of cytidin(biotin)-puromycin [354], quadruplet-encoded incorporation of biotinylated p-aminophenylalanine [355] and intein-mediated C-terminal native chemical ligation with cysteine-biotin [350, 356]. However, these and other similar approaches are often tedious and time-consuming [350, 356], provide only partial biotinylation [350, 357, 358] and low protein yields [354, 355, 357]. A widely employed alternative to these protein engineering approaches is based on biotin-protein ligases which specifically catalyze biotinylation of a single lysine contained within defined target polypeptide sequences [350, 359, 360]. This specific reaction is often accomplished with the E. coli BirA enzyme, which biotinylates the lysine residue in the optimized 15-mer polypeptide substrate GLNDIFEAQKIEWHE, also known as AviTag [350, 360, 361]. Co-expression of an AviTag-fusion to the protein of interest with BirA enables simple and specific in-vivo biotinylation; however, the in-vivo biotinylation efficiency is typically rather low and provides incompletely modified target proteins [353, 360].
Increased biotinylation efficiency can be achieved when the target protein fused to the AviTag is first expressed and purified and then separately in-vitro biotinylated with purified BirA [350].

To overcome frequent limitations in the production of specifically biotinylated proteins, we here present a fast and efficient cell-free approach for high-yield expression and simultaneous quantitative biotinylation of eukaryotic proteins. We employ E. coli cell extracts devoid of the biotinylated BCCP protein which ensures that the only biotinylated species in the expression reaction is the protein of interest. This approach allows direct application of the biotinylated target protein from the crude reaction mixture to avidin/streptavidin-containing materials without interference from other undesired biotinylated proteins.

Results and discussion

Setup for cell-free expression of biotinylated proteins

The starting platform for high-yield production of specifically biotinylated proteins is based on the previously described E. coli-based batch-mode cell-free expression system [281]. To adjust this system to the production of biotinylated proteins, we have generated a set of new cell-free expression vectors (Fig. 1 A-C) that allow expression of the gene of interest either as N- or C-terminal fusion to the 15 amino acid polypeptide GLNDIFEAQKIEWHE which is also known as the AviTag [360]. A short flexible amino acid linker sequence between the AviTag and the target protein provides independent structural mobility (see below). To increase the overall cell-free translation efficiency and protein solubility, the AviTag-containing target proteins are expressed as N-terminal fusion constructs with the (His)6-GB1 domain [281]. If desired, either a thrombin cleavage site in the vector pCFX4 (Fig. 1A) or a TEV cleavage site in the vectors pCFX5A (Fig. 1B) and pCFX5C (Fig. 1C) allows facile proteolytic removal of the (His)6-GB1 domain from the target protein. An additional Factor Xa cleavage site further enables specific removal of the His-tag while preserving the solubility-enhancing GB1 fusion to the target protein (Fig. 1A-C).

Fig. 1. Schematic representation of the new vectors designed for cell-free expression of biotinylated proteins and western blot analysis of the reaction supernatant obtained after cell-free production of various biotinylated eukaryotic target proteins in these vectors. A) – C) All three new cell-free expression vectors pCFX4 (A), pCFX5A (B) and pCFX5C (C) encode an N-terminal (His)6-GB1 fusion that strongly enhances solubility and production yields of the target protein [REF?]. If required, the (His)6-GB1 fusion can be easily removed with an engineered cleavage site for thrombin (in pCFX4) or TEV protease (in pCFX5A and pCFX5C) while a Factor Xa site allows selective removal of the (His)6-tag and preserves the solubility-enhancing GB1 domain in the final construct. The gene of interest can be inserted into the multiple-cloning site (MCS) to result in either an N-terminal (in pCFX4 and pCFX5A) or C-terminal (in pCFX5C) fusion to the 15-amino acid AviTag which is an ideal/efficient biotinylation substrate of the biotin-protein ligase BirA. T7 RNAP is the RNA polymerase from bacteriophage T7, RBS is the ribosomal binding site and ATG the start codon for translation. D) Western blot analysis of soluble biotinylated target proteins obtained from expression in the new cell-free vectors. Reactions were carried out for 2.5 h at 30°C in the presence of 2 μM BirA and 400 μM biotin, unless indicated otherwise. A total of 0.75 μL of the reaction supernatant was applied per lane and protein biotinylation was detected with a streptavidin-alkaline phosphatase conjugate. Bands corresponding to target proteins and biotin carboxyl carrier protein (BCCP) are indicated by arrowheads. Symbols used: kDa protein molecular weight marker; N negative control; 1 human Fox(109–208) in pCFX4 produced in absence of BirA and biotin; 2 human Fox(109–208) in pCFX4; 3 human Fox(109–208) in pCFX5A; 4 human Fox(109–208) in pCFX5C; 5 human full-length Fox in pCFX5A; 6 C. 
elegans Gld1(201–336) in pCFX5A; 7 C. elegans Gld1(136–336) in pCFX5A; 8 human EPRS-R1R2(749–875) in pCFX5A; 9 human full-length Lin28 in pCFX5A; 10 human full-length Lin28 in pCFX5C.

We carry out cell-free expressions for 2.5 h at 30 °C in batch-mode and supplement the reaction mixture with D-biotin and E. coli biotin-protein ligase BirA, which efficiently catalyzes specific biotinylation of the lysine sidechain contained within the AviTag peptide [361]. The utilized BirA is obtained from routine overexpression in E. coli cells and provides yields of ca. 20 mg of purified and highly active BirA per liter of cell culture. Supplementation of the reaction mixture with a total of 2 μM BirA and 400 μM biotin assures efficient and quantitative biotinylation of the target proteins (see below).

Analytical cell-free expression of biotinylated eukaryotic RNA-binding proteins

We selected various eukaryotic proteins with different RNA-binding domains such as the RNA-recognition motif (RRM) [362], hnRNP K-homology domain (KH domain) [363], double-stranded RNA-binding domain [364] or zinc finger domain [365] as targets for analytical cell-free production and biotinylation in small scale. The target proteins were expressed from either pCFX4, pCFX5A or pCFX5C vectors in 50 μL cell-free reactions and the yields of soluble biotinylated protein were then qualitatively analyzed by western blot using a streptavidin-alkaline phosphatase conjugate for detection. A first analysis showed that all selected target proteins could be produced in soluble and biotinylated form (Fig. 1D). Comparison of the N- and C-terminal biotinylation of the target proteins in either pCFX4, pCFX5A or pCFX5C indicated similar N- and C-terminal biotinylation efficiencies of the individual target proteins (Fig. 1D). However, N-terminal biotinylation revealed the presence of prematurely aborted translation products for some target proteins which remained in solution throughout the reaction duration (Fig. 1D, lanes 2,3,8). These abortive products are likely to be also present in the corresponding C-terminally biotinylated target proteins but remain undetected due to the absence of the AviTag in incompletely translated constructs. The presence of soluble, non-native biotinylated protein constructs can potentially cause problems when all biotinylated species are directly transferred from the crude reaction mixture to avidin- or streptavidin-containing materials. In those cases it may be advisable to employ C-terminal biotinylation of the target protein as only the completely translated polypeptide will contain the biotinylated AviTag.

The western blot analysis further indicated significant amounts of an undesired biotinylated protein with an apparent molecular weight of ca. 22 kDa to be present in all cell-free reactions, including the negative control reaction which was devoid of BirA and biotin (Fig. 1D). This western blot band arises from the 16.7 kDa biotin carboxyl carrier protein (BCCP) subunit of acetyl-CoA carboxylase which is present in conventional E. coli S30 cell extracts. BCCP is the only endogenous biotinylated protein in E. coli and is known to migrate in denaturing polyacrylamide gels with an apparent molecular weight of ca. 22.5 kDa [345, 366]. The presence of biotinylated BCCP in the S30 cell extract prevents exclusive binding of the biotinylated target protein directly from the crude reaction mixture to avidin- or streptavidin-containing materials as BCCP, possibly bound to the multi-subunit acetyl-CoA carboxylase complex, will co-purify on these materials and may interfere with and distort the intended downstream applications. To enable rapid and efficient cell-free expression of biotinylated proteins for direct immobilization from the crude reaction mixture, we therefore set out to establish a system which is devoid of the unwanted endogenous BCCP.

Depletion of BCCP from S30 cell extracts

BCCP is an essential component of the E. coli acetyl-CoA carboxylase multi-enzyme complex and is required for cell viability [367]. Deletion of the BCCP-encoding accB gene from the genome of E. coli is therefore not feasible, which is why we investigated two alternatives to remove BCCP from cell extracts (Fig. 2). In a first approach (Fig. 2A), a genomically encoded (His)6-tag was introduced into the accB gene of the E. coli BL21 (DE3) Star strain, resulting in a BCCP fusion containing a C-terminal (His)6-tag. Cell extracts prepared from this E. coli BL21 (DE3) Star accB::(His)6 strain were slowly passed over Ni-NTA beads to selectively remove the (His)6-tagged BCCP protein; however, removal of BCCP could not be achieved with this approach presumably because the (His)6-tag was inaccessible in the acetyl-CoA carboxylase multi-enzyme complex.

Fig. 2. Schematic representation of affinity chromatography-based approaches to remove the endogenous biotin carboxyl carrier protein (BCCP) from Escherichia coli cell extracts. A) Site-specific insertion of a genomically encoded (His)6-tag in the accB gene of E. coli provides an affinity handle that allows subsequent removal of the (His)6-tagged protein from the cell extract by passage of the lysate over Ni-NTA beads. B) A fusion protein of streptavidin with three C-terminal chitin-binding domains (SA-(CBD)3) is added to E. coli cell extract where it forms a high affinity complex with endogenous BCCP which is subsequently removed from the cell extract by slow passage over chitin beads.

We therefore investigated an alternative approach (Fig. 2B) in which conventional S30 cell extract is incubated with a streptavidin-(chitin-binding domain)3 fusion, SA-(CBD)3, that binds endogenous biotinylated BCCP and enables its selective removal from the extract by taking advantage of the CBD-chitin bead interaction. The extraordinary femtomolar dissociation constant of the complex between streptavidin and biotin [287] ensures quantitative capture of endogenous BCCP by the SA-(CBD)3 fusion while the three consecutive chitin-binding domains provide avidity to retain the formed complex between BCCP and SA-(CBD)3 on the chitin beads. We have prepared the required SA-(CBD)3 fusion protein by recombinant expression in E. coli, purified it under denaturing conditions to withdraw bound biotin from the streptavidin domain and refolded it by dialysis resulting in a final yield of ca. 16 mg purified fusion protein per liter of cell culture. The protocol for treatment of cell extracts with SA-(CBD)3 was stepwise optimized and the removal of BCCP was analyzed by western blot (Fig. 3). In the initial setup (Fig. 3A) we incubated increasing amounts of SA-(CBD)3 with a fixed volume of cell extract for 60 min on ice followed by addition of chitin beads and further incubation for 15 min at 20 °C. Analysis of the supernatant by western blot (Fig. 3A) indicated a strong depletion of BCCP from the cell extract already in presence of ca. 3.1 μM monomeric SA-(CBD)3; however, a faint band of residual BCCP was still observable even after treatment with 12.5 μM SA-(CBD)3.

We therefore explored an alternative setup where S30 cell extract is slowly and repetitively passed over a column containing chitin beads that were pre-saturated with SA-(CBD)3 (Fig. 3B). The amounts of endogenous BCCP remaining in the cell extract were investigated at each step of the procedure by western blot which indicated that significant fractions of BCCP were continuously withdrawn from the extract after each column passage; however, small amounts of BCCP also remained in the extract after the last step (Fig. 3B). The incremental withdrawal of BCCP at each passage suggests that the exposure of BCCP to immobilized SA-(CBD)3 may have been too short to quantitatively bind BCCP, which may be attributed to reduced access of BCCP to the streptavidin domain of immobilized SA-(CBD)3. To overcome this limitation, we again modified the protocol and incubated the cell extract overnight at 4 °C with free SA-(CBD)3 to ensure its quantitative binding of endogenous BCCP (Fig. 3C). This incubation step could be directly combined with the overnight dialysis against S30 buffer which is an integral part of the standard S30 extract preparation protocol. The cell extract was then slowly passed over chitin beads and the quantitative removal of the BCCP/SA-(CBD)3 complex could be confirmed by western blot analysis of the extract before and after the procedure (Fig. 3C). This final approach has subsequently been used for routine preparation of a novel S30 cell extract devoid of endogenous BCCP.

Fig. 3. Optimization of the SA-(CBD)3-mediated BCCP removal from cell extracts. Three different protocols for binding of SA-(CBD)3 to BCCP and affinity removal of the formed complex from the cell extract were analyzed by western blot for their efficiency of BCCP clearance: A) Various amounts of SA-(CBD)3 were incubated for 1 h on ice with a fixed volume of cell extract. After incubation with chitin beads for 15 min at 20 °C, the supernatant was removed and was analyzed by western blot. B) Cell extract was slowly passed four times at 4 °C over chitin beads saturated with SA-(CBD)3 and was analyzed by western blot (1) before, (2) after the first, (3) the second, (4) the third and (5) the fourth passage over chitin beads. C) The cell extract was incubated overnight with SA-(CBD)3 at 4 °C and was then slowly passed over chitin beads at 4 °C. Each step was analyzed by western blot: (1) cell extract containing SA-(CBD)3 before and (2) after passage over chitin beads; (3) cell extract in absence of SA-(CBD)3 before and (4) after passage over chitin beads; (5) chitin beads after passage with cell extract containing SA-(CBD)3. M indicates the protein molecular weight marker.

Preparative cell-free production and analysis of biotinylated RNA-binding proteins

In the following, our novel BCCP-depleted S30 extract replaced the conventional S30 extract as basis for continued cell-free production of biotinylated proteins. Using this optimized setup, we repeated the small-scale production and western blot analysis of biotinylated RNA-binding proteins, which indicated the absence of endogenous BCCP and revealed the target proteins as the only biotinylated species in the reaction mixture (Fig. 4A). To analyze the extent and specificity of target protein biotinylation in this reaction setup, we conducted preparative scale cell-free production of N-terminally and of C-terminally biotinylated human Fox-1(109–208) and of N-terminally biotinylated human EPRS-R1R2(749–875) using 15N-labeled amino acids and obtained yields of 0.2–0.6 mg of biotinylated protein per mL reaction mixture after purification. The corresponding unbiotinylated samples of the three target proteins were prepared as a control in a separate cell-free reaction devoid of biotin and BirA. Mass spectrometric analysis of the individual target protein preparations revealed the expected mass difference of 226.2 Da between the control and the biotinylated constructs (Fig. 4B). In addition, all biotinylated samples gave rise to a peak corresponding to the mass of a mono-biotinylated protein and did not show a detectable peak at the mass expected of the unmodified construct. These results suggest quantitative and specific biotinylation of the target proteins. To further support these findings, we recorded and analyzed 2D [15N,1H]-HSQC spectra of the 15N-labeled protein samples (Fig. 4 C-E). The nearly perfect superposition of the well-dispersed resonances in the 2D [15N,1H]-HSQC spectra of N-terminally biotinylated Fox-1(109–208) and the corresponding unbiotinylated control protein (Fig. 4C) indicates that both samples share the same native fold. The few spectral differences originate from the changed chemical environment of the amide groups near the specific biotinylation site in the AviTag. An overlay of the 2D [15N,1H]-HSQC spectra of N- and C-terminally biotinylated Fox-1(109–208) (Fig. 4D) demonstrates native folding of the target protein independent of N- or C-terminal fusion to the AviTag. Here, the observed spectral differences are due to different linker residues in N- and C-terminal AviTag fusions and to the altered chemical environment of the nearby amide groups.
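The expected mass difference quoted above can be checked with a short calculation. The sketch below assumes that BirA attaches biotin through an amide bond to the lysine side chain, so the added mass is that of free biotin (C10H16N2O3S) minus one water molecule. The rounded average atomic masses and the function names are illustrative choices, not taken from the original work.

```python
# Rounded average atomic masses in g/mol (assumed standard values).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Elemental compositions: free D-biotin and the water lost on amide bond formation.
BIOTIN = {"C": 10, "H": 16, "N": 2, "O": 3, "S": 1}
WATER = {"H": 2, "O": 1}


def average_mass(formula):
    """Sum the average atomic masses for an elemental composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def biotinylation_mass_shift():
    """Mass added to a protein by one amide-linked biotin (biotin minus water)."""
    return average_mass(BIOTIN) - average_mass(WATER)


shift = biotinylation_mass_shift()
print(f"Expected mass shift per biotin: {shift:.2f} Da")
```

With these rounded atomic masses the shift comes out at about 226.3 Da, within rounding of the 226.2 Da reported for the mono-biotinylated constructs.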

Direct comparison of the 2D [15N,1H]-HSQC spectra of N-terminally biotinylated human EPRS-R1R2(749–875) and the respective unbiotinylated control protein also indicates an identical and native fold for both protein samples (Fig. 4E). The few shifted resonances are again caused by the changed chemical environment near the biotinylation site. All spectra of the three biotinylated target proteins display a distinct set of resonances that correspond to a homogeneous protein population (Fig. 4C–E) which in turn strongly supports quantitative and specific target protein biotinylation with our system.

Fig. 4. Analysis of biotinylated proteins produced from BCCP-depleted cell-free reaction mixtures. All reactions were carried out for 2.5 h at 30 °C with BCCP-depleted cell extract and in presence of 2 μM BirA and 400 μM biotin. A) Western blot analysis of biotinylated target proteins obtained from a BCCP-depleted reaction mixture. Each lane corresponds to 0.75 µL of the reaction supernatant. Arrowheads indicate the bands corresponding to the target proteins. Symbols used: M protein molecular weight marker; N negative control; 1 human Fox(109–208) in pCFX5A; 2 human Fox(109–208) in pCFX5C; 3 human full-length Fox in pCFX5A; 4 C. elegans Gla3(36–75) in pCFX5C; 5 C. elegans Gld1(201–336) in pCFX5A; 6 C. elegans Gld1(136–336) in pCFX5A; 7 human EPRS-R1R2(749–875) in pCFX5A; 8 human full-length Lin28 in pCFX5A; 9 human full-length Lin28 in pCFX5C. B) Mass spectrometric analysis of human Fox(109–208) containing either an N-terminal (Fox-NBio) or C-terminal (Fox-CBio) biotinylation tag and of EPRS-R1R2(749–875) with an N-terminal biotinylation tag (R1R2-NBio). As a negative control, the same constructs were cell-free expressed in absence of BirA and biotin. Shown is an overlay of the biotinylated (grey lines) and unbiotinylated (black lines) constructs after proteolytic removal of the (His)6-GB1 domain with TEV protease. C)–E) NMR characterization of uniformly [15N]-labeled and biotinylated Fox(109–208) and EPRS-R1R2(749–875) constructs after removal of the (His)6-GB1 domain. C) Overlay of 2D [15N, 1H]-HSQC spectra of 225 μM Fox(109–208) with N-terminal AviTag prepared in presence (black signals) and absence of BirA and biotin (red signals). D) Superposition of 2D [15N, 1H]-HSQC spectra of 225 μM Fox(109–208) with N-terminal biotinylation (red signals) and 105 μM Fox with C-terminal biotinylation (black signals). E) Overlay of 2D [15N, 1H]-HSQC spectra of 100 μM EPRS-R1R2(749–875) with N-terminal AviTag prepared in presence (red signals) and absence of BirA and biotin (black signals).

Rapid target protein preparation for surface plasmon resonance measurements

The absence of endogenous biotinylated BCCP protein in our cell-free production system assures that the protein of interest is the only biotinylated species in the reaction mixture. This feature enables rapid application of the produced biotinylated target protein directly from the crude reaction mixture onto avidin- or streptavidin-containing materials. To demonstrate the feasibility of this approach, we applied our system to rapidly produce and immobilize biotinylated human Fox-1(109–208) on streptavidin-coated chips for the determination of its RNA-binding affinity by surface plasmon resonance. Human Fox-1 is an important regulator of alternative splicing and specifically recognizes UGCAUG-RNA elements with its RNA-recognition motif [9]. A previous surface plasmon resonance (SPR) study employed immobilized biotin-5’-CUCUGCAUGU-3’ RNA to investigate its binding to human Fox-1(109–208) [9] and provides a direct reference for our inverse approach in which we investigate the interaction between immobilized human Fox-1(109–208) and free 5’-UGCAUGU-3’ RNA.

Fig. 5. Surface plasmon resonance analysis of RNA binding to C-terminally biotinylated Fox(109–208). The biotinylated Fox construct was immobilized on the streptavidin-coated sensor surface either after purification (A) or directly from the crude cell-free reaction mixture (B). The binding experiments were recorded at 25 °C in SPR buffer (10 mM HEPES at pH 7.4, 200 mM NaCl, 3.4 mM EDTA) in a concentration series of 100, 50, 25, 12.5, 6.25, 3.13, 1.56, 0.78, 0.39, 0.2, 0.1 and 0.05 nM of the 5’-UGCAUGU-3’ RNA analyte. All injections were measured as duplicates and the obtained sensorgrams were fitted with a 1:1 Langmuir model that includes mass transfer and double referencing.
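To make the fitting model concrete, the following minimal sketch generates the twofold dilution series from the caption and evaluates the association phase of a simple 1:1 Langmuir model. It is a simplification: the mass transfer term and double referencing used for the actual fits are left out, and the function names are hypothetical. The example parameters are those of the first purified Fox-1 experiment in Table 17.

```python
import math


def dilution_series(top_nM=100.0, points=12, factor=2.0):
    """Serial dilution starting at top_nM (100, 50, 25, ... nM for factor 2)."""
    return [top_nM / factor**i for i in range(points)]


def langmuir_response(t_s, conc_nM, ka, kd, rmax):
    """Association-phase response of a 1:1 Langmuir model (no mass transfer).

    R(t) = Req * (1 - exp(-(ka*C + kd) * t)), with Req = Rmax * ka*C / (ka*C + kd).
    ka in M^-1 s^-1, kd in s^-1, conc_nM in nM, t_s in seconds.
    """
    conc_M = conc_nM * 1e-9
    k_obs = ka * conc_M + kd
    r_eq = rmax * ka * conc_M / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


# Example parameters: first purified Fox-1 experiment in Table 17.
KA, KD_RATE, RMAX = 5.2e6, 0.027, 7.58

for conc in dilution_series():
    # Response after 120 s of association (the injection time is an assumption).
    print(f"{conc:8.3f} nM -> {langmuir_response(120.0, conc, KA, KD_RATE, RMAX):.3f} RU")
```

In this model the equilibrium response reaches half of RUmax when the analyte concentration equals KD = kd/ka, which is why the series is chosen to bracket the expected nanomolar affinity.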

We initiated our investigations by immobilization of purified N-terminally biotinylated Fox-1(109–208) on the streptavidin-coated SPR sensor chip. The response due to the interaction with 5’-UGCAUGU-3’ RNA was then measured in a concentration series from 50 pM to 100 nM of the RNA analyte (Fig. 5A) and indicated a KD of ca. 5.2 nM in presence of 200 mM NaCl, which is in agreement with the available results [9]. An important consideration before aiming for the direct immobilization of the biotinylated target protein from crude reaction mixtures is the presence of residual free biotin which will compete for the available streptavidin binding sites on the SPR sensor chip and therefore reduce immobilization of the desired target protein. Batch-mode cell-free protein expression typically accumulates ca. 1–20 μM of target protein in the reaction mixture [281] which implies large residual quantities of the initially employed 400 μM biotin. The ideal reaction setup would comprise an equimolar ratio of initial biotin to final target protein so that the bulk of the employed biotin will eventually become covalently attached to the target protein during the cell-free reaction.
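The size of the biotin excess can be illustrated with a short calculation. The sketch assumes one biotin per target protein and complete biotinylation, and it ignores any biotin consumed by other components of the extract. The function name is hypothetical.

```python
def residual_biotin(biotin_uM, protein_uM):
    """Return (free biotin in µM, fraction of the initial biotin left unreacted).

    Assumes one biotin is attached per protein molecule and the reaction
    goes to completion.
    """
    consumed = min(biotin_uM, protein_uM)
    free = biotin_uM - consumed
    fraction = free / biotin_uM if biotin_uM > 0 else 0.0
    return free, fraction


# Typical batch-mode yields of 1-20 µM target protein with 400 µM initial biotin.
for protein in (1.0, 5.0, 10.0, 20.0):
    free, fraction = residual_biotin(400.0, protein)
    print(f"{protein:5.1f} µM protein: {free:6.1f} µM free biotin ({fraction:.1%} unreacted)")
```

Even at the upper end of the typical yield range, at least 95 % of the initial 400 μM biotin stays free under these assumptions, which is the motivation for lowering the biotin concentration in reactions intended for direct immobilization.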

To improve the target protein immobilization from crude reaction mixtures, we therefore assessed various ratios of biotinylated target protein to free biotin by production of C-terminally biotinylated Fox-1(109–208) in absence and in presence of 2.5, 5, 7.5, 10, 12.5 or 15 μM biotin and passed the reaction mixtures directly over the streptavidin-coated SPR sensor chip. As expected, the sensorgrams indicated a significantly stronger response from reactions with lower initial biotin concentrations (Fig. 6), which can be rationalized by a proportionally increased immobilization of biotinylated target protein and decreased binding of free biotin. The reactions which initially contained 2.5 and 5 μM biotin gave responses that closely approached that of the purified reference biotinylated Fox-1(109–208) (Fig. 6).

Fig. 6. SPR analysis of the direct immobilization of C-terminally biotinylated Fox-1(109–208) from the crude reaction mixture onto a streptavidin-coated surface plasmon resonance sensor chip. The cell free reaction mixtures for production of the biotinylated target protein were carried out in presence of various amounts of biotin which is indicated by color coding. The immobilization of purified biotinylated Fox-1 served as a reference. Blue arrows indicate the time points of sample injection into the channels of the sensor chip.

We therefore established a maximum concentration of 5 μM biotin in cell-free reactions which are intended for direct target protein immobilization. Additional supplementation of the reaction mixture with thrombin or TEV protease further allows proteolytic removal of the N-terminal (His)6-GB1 fusion from the target protein. To assess the proteolytic cleavage efficiency, we added various amounts of TEV protease either at the beginning or after cell-free production of C-terminally biotinylated (His)6-GB1-Fox-1(109–208) and continued incubation of the reaction mixture for 1.5 h at 20 °C. Western blot analysis revealed quantitative cleavage of the fusion construct in solutions containing 60 μg/mL TEV protease.

To test the suitability of direct target protein immobilization from the crude reaction mixture for rapid SPR measurements, we expressed C-terminally biotinylated Fox-1(109–208) for 2.5 h in presence of 5 μM biotin and subsequently removed the N-terminal (His)6-GB1 fusion tag by incubation for 1.5 h with 60 ng/μL TEV protease. A total of 130 μL of a 1:500 dilution of the reaction supernatant in SPR buffer was passed over each channel of the streptavidin-coated SPR sensor surface to immobilize the target protein. The interaction of the immobilized Fox-1(109–208) with 5’-UGCAUGU-3’ RNA was then analyzed in a concentration series from 50 pM to 100 nM of the RNA analyte (Fig. 5B) and indicated a KD of 4.5 nM which nicely agrees with the results obtained with the purified biotinylated Fox-1(109–208) (Table 17).
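The sample consumption implied by this immobilization protocol follows from simple dilution arithmetic. The sketch below assumes that the usable supernatant volume equals the reaction volume and that one "measurement" corresponds to one coated channel. Both are simplifying assumptions, and the function names are hypothetical.

```python
def supernatant_per_channel_uL(injected_uL=130.0, dilution_factor=500.0):
    """Undiluted reaction supernatant consumed to coat one sensor channel."""
    return injected_uL / dilution_factor


def channels_per_reaction(reaction_uL=100.0, injected_uL=130.0, dilution_factor=500.0):
    """Number of channels that one cell-free reaction could coat in principle."""
    return reaction_uL / supernatant_per_channel_uL(injected_uL, dilution_factor)


print(f"Supernatant per channel: {supernatant_per_channel_uL():.2f} µL")
print(f"Channels per 100 µL reaction: {channels_per_reaction():.0f}")
```

Each channel therefore consumes only about a quarter of a microliter of crude supernatant, so a 100 μL reaction is sufficient for several hundred immobilizations.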

Experiment            Coating [RU]   ka [M⁻¹ s⁻¹]   kd [s⁻¹]   RUmax   KD [nM]
1st Purified Fox-1    172            5.2E6          0.027      7.58    5.2
2nd Purified Fox-1    165.5          1.33E6         0.0077     6.18    5.78
3rd Purified Fox-1    169.4          2.19E6         0.008      5.71    3.67
4th Purified Fox-1    151.6          3.4E6          0.0156     6.16    4.65
1st Crude Mix Fox-1   181.9          2.9E6          0.0128     7.11    4.45
2nd Crude Mix Fox-1   180.1          1.52E6         0.0079     5.85    5.2
3rd Crude Mix Fox-1   172.7          1.20E6         0.00577    5.65    4.81
4th Crude Mix Fox-1   164.1          1.52E6         0.00641    4.92    4.2

Table 17. Results of four independently conducted SPR concentration series experiments of the interaction between Fox-1(109–208) and 5’-UGCAUGU-3’ RNA. The target protein Fox-1(109–208) was immobilized either from purified samples (purified Fox-1) or directly from the crude reaction mixture (crude mix Fox-1). All injections were measured as duplicates.

This demonstrates the exceptional suitability of our cell-free production system for rapid and efficient production and immobilization of biotinylated target proteins for SPR measurements. Starting from the gene, we can prepare exclusively biotinylated eukaryotic proteins for direct application onto streptavidin- or avidin-containing materials within 2.5–4 h which enables further SPR analysis of their interactions with ligands in less than 2 h. Economically speaking, a typical 100 μL cell-free reaction costs less than 0.5 CHF and yields enough biotinylated protein to easily allow ca. 400 individual SPR measurements.
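The dissociation constants in Table 17 can be cross-checked from the fitted rate constants through KD = kd/ka. The sketch below recomputes KD for each experiment and summarizes both immobilization routes. The mean and standard deviation are computed here for illustration only and are not values reported in the original text.

```python
from statistics import mean, stdev

# (ka in M^-1 s^-1, kd in s^-1) for each experiment, copied from Table 17.
RUNS = {
    "purified Fox-1": [(5.2e6, 0.027), (1.33e6, 0.0077), (2.19e6, 0.008), (3.4e6, 0.0156)],
    "crude mix Fox-1": [(2.9e6, 0.0128), (1.52e6, 0.0079), (1.20e6, 0.00577), (1.52e6, 0.00641)],
}


def kd_nM(ka, kd):
    """Equilibrium dissociation constant KD = kd / ka, converted from M to nM."""
    return kd / ka * 1e9


kd_values = {name: [kd_nM(ka, kd) for ka, kd in runs] for name, runs in RUNS.items()}

for name, values in kd_values.items():
    formatted = ", ".join(f"{v:.2f}" for v in values)
    print(f"{name}: KD = {formatted} nM; mean {mean(values):.2f} nM, SD {stdev(values):.2f} nM")
```

The recomputed values agree with the tabulated KD values to within rounding, and the two immobilization routes give overlapping results in the range of about 3.7 to 5.8 nM.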

Conclusion

We here document the rapid and efficient cell-free production of N- or C-terminally biotinylated eukaryotic proteins. The introduction of new cell-free expression vectors allows flexible expression of the target proteins as either N- or C-terminal fusions to the AviTag which are biotinylated in situ by supplementation of the reaction mixture with biotin and BirA. We have applied our production system to prepare milligram amounts of natively folded eukaryotic RNA-binding proteins and used NMR spectroscopy to demonstrate their quantitative and specific biotinylation. The establishment of a novel procedure to deplete E. coli S30 cell extracts of endogenous biotinylated BCCP protein provided the basis for an advanced production system that guarantees the target protein to be the only biotinylated protein in the reaction mixture. This cell-free expression system was further optimized for direct target protein application from the crude reaction mixture onto streptavidin- or avidin-containing materials by adjusting the employed initial biotin concentration for an enhanced ratio of biotinylated target protein to unreacted biotin. This system was applied for production of C-terminally biotinylated Fox-1(109–208) which was immobilized straight from the crude reaction mixture onto a streptavidin-coated sensor chip for subsequent SPR experiments. We determined a KD of ca. 4.5 nM for the complex between immobilized Fox-1(109–208) and 5’-UGCAUGU-3’ RNA which nicely agrees with results obtained with purified Fox-1(109–208) and the available literature values [9]. The combination of our cell-free production system with SPR measurements enables extremely rapid analysis of protein-ligand interactions and allows going from gene to kinetic characterization of protein interactions in less than six hours. The high productivity of our cell-free system renders it very economical and rapidly provides enough biotinylated protein for more than a thousand SPR measurements per mL of reaction mixture. These features strongly support the suitability of our protein production system for high-throughput applications.

Materials and Methods

Preparation of the vectors pCFX4, pCFX5A and pCFX5C and cloning of target genes

All constructs were sequence-verified at Microsynth (Switzerland). The cell-free expression vector pCFX4 encodes for a thrombin-cleavable N-terminal (His)6-GB1-Avitag fusion with the gene of interest (GOI) (Fig. 1A) and was prepared by site-directed mutagenesis of the vector pCFX1 [281] with the oligonucleotide primers Avi_Fwd and Avi_Rev. Site-directed mutagenesis of the vector pCFX3 [281] using the oligonucleotide primers pCFX5A_Fwd and pCFX5A_Rev yielded pCFX5A which encodes a TEV-cleavable N-terminal (His)6-GB1-Avitag fusion with the GOI (Fig. 1B). The vector pCFX5C encodes for a TEV-cleavable N-terminal (His)6-GB1- and C-terminal Avitag-fusion with the GOI (Fig. 1C) and was generated by ligation of the annealed and phosphorylated oligonucleotides 5CBio and 3CBio into XhoI-BamHI restriction-digested pCFX7D [368]. Full-length murine Fox-1 was PCR-amplified with the oligonucleotide primers Fox_FL_Fwd and Fox_FL_Rev from the plasmid pcDNA3.1-FLAG-mouseA2BP1 (obtained from D. L. Black) and transferred into NdeI-XhoI digested pCFX5A. For N-terminal biotinylation, the RNA-binding domain (RBD) of human Fox-1 comprising residues 109-208 was subcloned from pet28A-Fox(109-208) [9] into pCFX4 and pCFX5A using the NdeI-XhoI restriction sites. For preparation of the C-terminally biotinylated Fox-RBD the gene encoding Fox-1(109-208) was PCR-amplified from pET28A with T7 promoter and Fox_RRM_Rev_5C primers and was transferred into pCFX5C using the NdeI-BamHI restriction sites. The human glutamyl-prolyl-tRNA synthetase (EPRS) domains R1R2 comprising residues 749-875 were PCR-amplified from pCFX3-R1R2 [368] and transferred into NdeI-BamHI digested pCFX5A. The zinc finger domain ZnF1 of Gla-3 comprising residues 36-75 was PCR-amplified from C. elegans cDNA (obtained from M. Hengartner) with the Gla3_ZnF1_Fwd and Gla3_ZnF1_Rev primers and inserted into pCFX5C using the NdeI-BamHI restriction sites. The Gld-1 construct comprising residues 136-336 was amplified from C. elegans cDNA (obtained from M. Hengartner) with the Gld1_Qua_Fwd and Gld1_Rev primer pair and inserted into pCFX5A using the NcoI-BamHI restriction sites, while the shorter construct of residues 201-336 was amplified with the Gld1_KH_Fwd and Gld1_Rev primer pair and was inserted into pCFX5A using the NdeI-BamHI sites. Full-length human Lin28 was amplified from pET28a-Lin28 (obtained from F. Laughlin) with either the Lin28_Fwd/Lin28_Rev_5A or Lin28_Fwd/Lin28_Rev_5C primer pairs and was inserted into NcoI-BamHI digested pCFX5A and pCFX5C, respectively.

Cloning, expression and purification of biotin-protein ligase BirA

The gene encoding full-length biotin-protein ligase BirA was PCR-amplified from E. coli BL21 (DE3) genomic DNA using the BirA_Fwd and BirA_Rev primer pair (Supplementary Table 1) and was inserted into pET14b using the NdeI-BamHI restriction sites. E. coli BL21 (DE3) cells containing the plasmid pET14b-BirA were then grown overnight with shaking at 30 °C in 40 mL Luria Bertani (LB) medium containing 100 mg/L carbenicillin. 20 mL of this preculture was used as inoculum for 2 L LB medium containing 100 mg/L carbenicillin and was grown at 30 °C to an OD600 of ca. 0.75 when BirA expression was induced with 0.25 mM IPTG. After 4 h the cells were harvested by centrifugation (20 min at 6000 × g and 4 °C) and were resuspended on ice in 30 mL buffer RBA (50 mM sodium phosphate at pH 7.4, 30 mM imidazole, 500 mM sodium chloride, 1 mM DTT, 0.2 mM PMSF) followed by two passages through a French Press at 16000 psi (Thermo Electron Corporation). The cell debris was removed by centrifugation (30 min at 30000 × g and 4 °C) and the cleared supernatant was applied on a 5 mL HisTrap HP column (GE Healthcare) with a flowrate of 2.5 mL/min. After washing with 80 mL buffer BA (50 mM sodium phosphate at pH 7.4, 30 mM imidazole, 500 mM sodium chloride), BirA was eluted with a 100 mL linear gradient from 30-500 mM imidazole in buffer BA. Eluted fractions showing significant absorption at 280 nm were analyzed by SDS-PAGE and those containing pure BirA were pooled and dialyzed overnight in a 12-14 kDa MWCO SpectraPor4 dialysis membrane against 2 L storage buffer (50 mM HEPES-KOH at pH 8.0, 100 μM DTT, 100 μM EDTA, 10 μM NaN3, 5 % (v/v) glycerol). After continued dialysis for 6 h on the next day with 2 L of fresh storage buffer, the protein solution was concentrated at 4000 × g and 4 °C in a 5 kDa Vivaspin-20 ultrafiltration device (Sartorius Stedim Biotech GmbH) to ca. 100 μM (12 mL final volume containing a total of 40 mg BirA) and was stored in appropriate aliquots at -80 °C.
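The yield figures in this protocol can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the numbers given above (40 mg BirA from 2 L of culture, concentrated to ca. 100 μM in 12 mL). The function names are hypothetical, and because the concentration is approximate the implied molar mass is only a rough plausibility check.

```python
def yield_mg_per_liter(total_mg, culture_L):
    """Purified protein obtained per liter of cell culture."""
    return total_mg / culture_L


def implied_molar_mass_kDa(total_mg, conc_uM, volume_mL):
    """Molar mass implied by a mass, a molar concentration and a volume.

    Amount in µmol = µM * L; mg per µmol is numerically equal to kDa (kg/mol).
    """
    amount_umol = conc_uM * volume_mL / 1000.0
    return total_mg / amount_umol


print(f"Yield: {yield_mg_per_liter(40.0, 2.0):.0f} mg BirA per liter of culture")
print(f"Implied molar mass: {implied_molar_mass_kDa(40.0, 100.0, 12.0):.1f} kDa")
```

The result of 20 mg per liter matches the yield stated in the Results section, and the implied molar mass of roughly 33 kDa is of the same order as that of monomeric BirA (about 35 kDa), as expected for an approximate concentration.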

Cloning, expression and purification of the streptavidin-(chitin-binding domain)3 fusion construct SA-(CBD)3

The Streptomyces avidinii Streptavidin gene was PCR-amplified from pET21a-Streptavidin-Alive [369] using the SA_Fwd and SA_Rev primer pair, inserted into pET21a using the NdeI-XhoI restriction sites and then subcloned into pET19b using the XbaI-XhoI sites to yield pET19b-SA. The expression vector pEM1, which encodes a TEV-cleavable N-terminal (His)6-GB1 fusion domain, was constructed by subcloning the XbaI-BamHI insert from pCFX3 [281] into pET19b. The gene encoding the Bacillus circulans chitin-binding domain (CBD) was amplified in three separate PCR reactions from the plasmid pCFX11B [368] using the primer pairs CBD1_Fwd/CBD1_Rev, CBD2_Fwd/CBD2_Rev and CBD3_Fwd/CBD3_Rev, respectively. The CBD1 PCR product was restriction digested with NotI-SapI, the CBD2 product with SapI-NcoI and the CBD3 product with NcoI-BamHI. The three digested PCR products were then simultaneously ligated into NotI-BamHI digested pEM1 and then further subcloned into pET19b-SA using the NotI-EcoRI sites to yield pET19b-SA-(CBD)3. The final expression vector pEM1-His-GB1-SA-(CBD)3 was generated by subcloning the NdeI-EcoRI insert from pET19b-SA-(CBD)3 into pEM1. E. coli BL21(DE3) pLysS cells containing pEM1-His-GB1-SA-(CBD)3 were then grown overnight at 37°C in 7.5 mL LB medium containing 1% (w/v) glucose and 50 mg/L carbenicillin. This preculture was used as inoculum for 2 L prewarmed LB medium containing 1% (w/v) glucose and 50 mg/L carbenicillin and was grown at 37°C to an OD600 of ca. 0.8, at which point protein expression was induced for 5 h with 1 mM IPTG. The cells were harvested by centrifugation (10 min at 6000 × g and 4°C), resuspended in 20 mL buffer SA (50 mM Tris-HCl at pH 7.2, 20 mM imidazole, 500 mM sodium chloride, 8 M urea, 1 mM 2-ME) and passed twice through a French Press at 16000 psi. The obtained cell lysate was incubated for 5 min at 60°C with gentle agitation and was then centrifuged for 30 min at 35000 × g to remove cell debris.
The supernatant was then filtered through a 0.45 μm sterile filter (Sarstedt AG) and passed over a 5 mL HisTrap HP column with a flow rate of 1 mL/min. After washing with 160 mL buffer SA, His-GB1-SA-(CBD)3 was eluted with a 100 mL linear gradient from 30-500 mM imidazole in buffer SA. Fractions containing significant amounts of His-GB1-SA-(CBD)3 were identified by SDS-PAGE and were pooled and supplied with 10 mM DTT and 5 mM EDTA. The His-GB1-SA-(CBD)3 protein was refolded by overnight dialysis in a 3.5 kDa MWCO SpectraPor3 dialysis membrane at 4°C against 4 L of refolding buffer (50 mM Tris-HCl at pH 7.2, 100 mM each of glutamate and arginine, 200 mM sodium chloride, 1 mM DTT, 100 μM EDTA) and the N-terminal (His)6-GB1 fusion was subsequently cleaved by addition of 3 mL of a 0.5 mg/mL TEV protease solution, which was prepared as previously described [281]. The SA-(CBD)3 protein was separated from (His)6-GB1 by passing the proteolytically cleaved solution over a 5 mL HisTrap HP column. The flow-through containing SA-(CBD)3 was supplied with 10 mM DTT and 200 μM EDTA and was dialyzed for one day in a 3.5 kDa MWCO SpectraPor3 dialysis membrane at 4°C against 4 L of storage buffer (20 mM Tris-acetate at pH 8.2, 60 mM potassium acetate, 5% (v/v) glycerol, 1 mM DTT). The final sample contained ca. 20 μM SA-(CBD)3 in a total volume of 50 mL and was stored at 4°C.

Analytical removal of BCCP from S30 cell extracts

S30 cell extract for analytical removal of endogenous BCCP was prepared as previously described [281].

We investigated removal of BCCP by three approaches: incubation of cell extract with SA-(CBD)3, followed by addition of chitin beads and collection of the supernatant (approach A; Fig. 4A); passing cell extract over SA-(CBD)3-saturated chitin beads (approach B; Fig. 4B); or incubating cell extract with SA-(CBD)3 and subsequent passage over chitin beads (approach C; Fig. 4C). In approach A, 100 μL of a solution containing 20 % (v/v) S30 extract and various amounts of SA-(CBD)3 in S30 buffer was incubated on ice for 1 h with gentle agitation, followed by supplementation with 50 μL chitin beads and incubation for 15 min with gentle agitation at 20°C. A total of 50 μL of the supernatant was then precipitated with ice-cold acetone and resuspended in 100 μL SDS-PAGE loading buffer (150 mM Tris-HCl at pH 6.8, 6 % (v/v) 2-ME, 1.2 % (v/v) SDS, 30 % (v/v) glycerol, bromophenol blue) for western blot analysis (see Fig. 4A). In approach B, 1 mL of cell extract was slowly passed three times at 4°C over 2.5 mL of 50 % (v/v) chitin beads pre-saturated with 5 mL of 20 μM SA-(CBD)3 and pre-equilibrated in S30 buffer. A total of 20 μL of the cell extract was precipitated with acetone at various steps of the procedure and was resuspended in 200 μL SDS-PAGE loading buffer for western blot analysis (see Fig. 4B). In approach C, 1 mL S30 cell extract was supplemented with 5 μM SA-(CBD)3 and was incubated overnight at 4°C followed by a single slow passage (one drop per second) at 4°C over 2.5 mL of 50 % (v/v) chitin beads equilibrated in S30 buffer. A total of 20 μL extract was removed before and after passage over chitin beads, precipitated with acetone and resuspended in 200 μL SDS-PAGE loading buffer for western blot analysis. To analyze retention of BCCP on the chitin beads, 250 μL of the beads were suspended in 250 μL SDS-PAGE loading buffer.

Preparation of BCCP-depleted S30 cell extracts

S30 extract for cell-free protein expression was produced from E. coli BL21 (DE3) Star cells based on a modification of the previously described detailed protocol [281]. To avoid contamination with RNases, all equipment was treated sequentially with RNase-AWAY (Molecular BioProducts) and DEPC-treated water and all solutions except the culture media were prepared with DEPC-treated water. Unless stated otherwise, all preparative steps after cell growth were performed on ice while centrifugation and dialysis were carried out at 4°C.

E. coli BL21 (DE3) Star cells were grown in 4.8 L of PYG medium (5.6 g/L KH2PO4, 28.9 g/L K2HPO4, 10 g/L yeast extract, and 1% (w/v) glucose) at 37°C to an OD600 of ca. 0.8 and were harvested by centrifugation (10 min at 5000 × g). The cell pellet was resuspended twice in 500 mL S30 buffer (10 mM Tris-OAc at pH 8.2, 60 mM KOAc, 14 mM Mg(OAc)2, 7.15 mM 2-mercaptoethanol, 1 mM DTT) and once with 2 mL S30 buffer per gram of cells before disruption by a single passage through a French Press at 16000 psi. After centrifugation of the cell lysate (30 min at 30000 × g), ca. 90 % of the supernatant was carefully removed by pipetting without disturbing the pellet and was incubated for 40 min at 30°C with 0.25 volumes of pre-incubation mixture (293.3 mM Tris-OAc at pH 8.2, 84 mM PEP, 13.17 mM ATP, 9.24 mM Mg(OAc)2 and 6.67 U/mL of pyruvate kinase). The pre-incubation supernatant (10 min at 4000 × g) was dialyzed for 1.5 h against S30 buffer, supplied with 5 μM SA-(CBD)3 and further dialyzed overnight against 2 L of fresh S30 buffer. The extract was slowly passed with a flow rate of one drop per second over 12.5 mL chitin beads pre-equilibrated in S30 buffer and residual cell extract was washed with 5 mL S30 buffer from the chitin bead material. The obtained extract was finally concentrated in a 10 kDa MWCO Vivaspin-20 ultrafiltration device to an A260 of ca. 300 and was cleared by centrifugation (10 min at 4000 × g) to remove insoluble particles. Appropriate aliquots of the S30 extract were frozen by immersion into liquid nitrogen and stored at -80 °C until needed.
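The contribution of the pre-incubation mixture to the composition of the extract follows from a simple dilution. The sketch below assumes that "0.25 volumes" means 0.25 parts of mixture added to 1 part of supernatant (a dilution factor of 0.25/1.25); the helper name and the dictionary are illustrative only. The numbers are the contribution of the mixture alone and do not include Tris-OAc or Mg(OAc)2 already present in the extract from the S30 buffer.

```python
def diluted_conc(stock_conc: float, stock_volumes: float, sample_volumes: float = 1.0) -> float:
    """Concentration contributed by a stock after mixing it with a sample.

    stock_volumes and sample_volumes are relative volumes (any common unit).
    """
    return stock_conc * stock_volumes / (stock_volumes + sample_volumes)


# Composition of the pre-incubation mixture as stated in the text.
PREINCUBATION_MIX = {
    "Tris-OAc (mM)": 293.3,
    "PEP (mM)": 84.0,
    "ATP (mM)": 13.17,
    "Mg(OAc)2 (mM)": 9.24,
    "pyruvate kinase (U/mL)": 6.67,
}

# Assumption: 0.25 volumes of mixture per 1 volume of supernatant.
contributions = {name: diluted_conc(c, 0.25) for name, c in PREINCUBATION_MIX.items()}

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
```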

Cell-free expression/biotinylation and sample preparation of target proteins

The cell-free reaction mixture for protein expression is based on a previously described batch-mode protocol [370] and contained 58 mM HEPES-KOH at pH 8.2, 217 mM potassium acetate, 175 μg/mL E. coli tRNA (Sigma), 3.25% (v/v) PEG-8000 (Fluka), 11 mM magnesium acetate (Applichem), 2.1 mM DTT (Applichem), 2.1 mM 2-ME (Applichem), 1.2 mM ATP (Applichem), 0.86 mM each of GTP (Fluka), CTP and UTP (both Applichem), 80 mM creatine phosphate (Sigma), 5.8 μM creatine kinase from rabbit muscle (Roche), 3.8 mM sodium azide (Fluka), 1.5 mM of each of the 20 proteinogenic amino acids (Spectra Stable Isotopes), 68 μM folinic acid (Sigma), 640 μM cAMP (Sigma), 0.65 μM T7 RNA polymerase, 10 ng/μL template plasmid and 30% (v/v) S30 cell extract. For production of biotinylated target proteins, the cell-free reaction mixture contained BCCP-depleted S30 cell extract and was supplemented with 0.5–2 μM BirA and 5–400 μM D-biotin. Small-scale reactions (25–50 μL) were incubated in 1.5 mL tubes and large-scale reactions (5–10 mL) were incubated in 15 mL tubes for 2.5 h at 30°C with gentle agitation in a Thermomixer Comfort (Eppendorf). Small-scale cell-free reactions of proteins for SPR measurements were subsequently centrifuged for 3 min at 14,000 × g and the supernatant was diluted with HEPES buffer and directly loaded on the SPR chip. Optional proteolytic removal of the N-terminal (His)6-GB1 fusion in target proteins for SPR measurements was achieved by addition of 30 ng TEV protease per μL of reaction mixture either at the start or after 2.5 h of a prolonged cell-free reaction with a total duration of 4 h. Target proteins from large-scale cell-free reactions were purified from the cleared supernatant (5 min at 4,000 × g) by Ni-NTA affinity chromatography using a 5 mL HisTrap HP column mounted on an ÄKTA prime FPLC system (GE Healthcare) as described previously [370]. Subsequent proteolytic removal of the N-terminal (His)6-GB1 fusion using TEV protease, transfer into the final buffer by dialysis and sample concentration by ultrafiltration were conducted as described previously [281].
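Several components of the cell-free reaction are specified per unit volume, so the amounts needed for a given reaction scale can be tabulated directly. The following is a minimal sketch using only the per-volume figures quoted above (30% (v/v) S30 extract, 10 ng/μL template plasmid, 30 ng/μL TEV protease, 0.5–2 μM BirA); the function name and the returned keys are illustrative and not part of the original protocol.

```python
def reaction_amounts(volume_uL: float, bira_uM: float = 2.0) -> dict:
    """Amounts of selected components for a cell-free reaction of the given volume."""
    return {
        "S30 extract (µL)": 0.30 * volume_uL,      # 30 % (v/v)
        "template plasmid (ng)": 10.0 * volume_uL,  # 10 ng per µL
        "TEV protease (ng)": 30.0 * volume_uL,      # 30 ng per µL (optional step)
        "BirA (pmol)": bira_uM * volume_uL,         # µM * µL gives pmol
    }


# A small-scale (50 µL) and a large-scale (10 mL) reaction from the ranges above.
small = reaction_amounts(50.0)
large = reaction_amounts(10_000.0)

for label, amounts in (("50 µL", small), ("10 mL", large)):
    print(label, {k: round(v, 1) for k, v in amounts.items()})
```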

Analysis of protein expression and biotinylation

SDS-PAGE analysis was carried out using 12 % SDS-Tris-Laemmli gels [371] and a Tris-Tricine running buffer (100 mM Tris, 100 mM Tricine, 0.1 % (w/v) SDS). The protein samples in SDS-PAGE loading buffer were incubated at 95°C for 3 min before application on the gel. For Western blot analysis, the SDS-PAGE gels were incubated for 5 min in Towbin transfer buffer (192 mM glycine, 25 mM Tris, 20 % (v/v) methanol, 0.1 % (w/v) SDS) and blotted for 50 min at 20 V onto an Optitran BA-S83 nitrocellulose membrane (Whatman). After blocking the membrane for 45 min with 20 mL TTBS (100 mM Tris-HCl at pH 7.5, 150 mM NaCl, 0.1 % (v/v) Tween-20) containing 5 % (w/v) milk powder (Biorad), the membrane was incubated for 90 min with 20 mL of a 1:5000 dilution of streptavidin-alkaline phosphatase (Sigma) in TTBS and was then developed in 20 mL alkaline phosphatase buffer (100 mM Tris-HCl at pH 9.5, 100 mM NaCl, 5 mM MgCl2) containing 100 μM BCIP (Roche) and 100 μM NBT (Sigma). Uniformly 15N-labeled protein samples for NMR analysis were produced by employing 15N-labeled amino acids (Spectra Stable Isotopes) in the cell-free reaction mixture. After purification, the protein samples were concentrated to a final volume of ca. 500 μL in NMR buffer containing 5 % (v/v) D2O. NMR spectra were recorded at 20°C (R1R2) or 25°C (Fox_C) on a Bruker DRX-500 spectrometer equipped with a triple-resonance cryoprobe with shielded z-gradient coils. Spectra were processed with TOPSPIN 2.0 (Bruker-Biospin) and were analyzed using the program CARA [372].

Surface plasmon resonance experiments

SPR measurements were carried out on a MASS-1 system equipped with an SPR Amine chip (Sierra Sensors, Germany). The sensor surface was coated with 1400–2000 RU of streptavidin in PBS (Sigma-Aldrich) at 25°C and a flow rate of 12.5 µl/min according to the manufacturer’s recommendations. The biotinylated target protein was immobilized on the chip surface using a flow rate of 12.5 µl/min in SPR buffer (10 mM HEPES at pH 7.4, 200 mM NaCl, 3.4 mM EDTA, 0.01 % (v/v) Tween 20). Channel A was coated only with streptavidin and served as a reference cell. Binding experiments were carried out at 25°C in SPR buffer with a flow rate of 25 µl/min. The concentration series was started with 100 nM RNA analyte, which was diluted twofold in each subsequent measurement. All injections were measured as duplicates and the surface was regenerated with 2 M NaCl after each injection. The raw data were double referenced and analyzed using Scrubber 2.0 (BioLogic Software) with a 1:1 Langmuir model including a term for mass transfer.
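The concentration series and the shape of the expected sensorgrams can be sketched as follows. This is a simplified illustration, not the Scrubber fitting routine: it implements the ideal 1:1 Langmuir model without the mass-transfer term used in the actual analysis, and the rate constants, maximal response and injection time are hypothetical values chosen only to produce a plausible curve. The number of dilution steps (12) is taken from the series listed in the caption of Figure 22.

```python
import math


def twofold_series(start_nM: float, n_points: int) -> list[float]:
    """Concentrations of a twofold dilution series, highest first."""
    return [start_nM / 2 ** i for i in range(n_points)]


def langmuir_response(t_s: float, conc_nM: float, ka: float, kd: float,
                      rmax: float, t_inject_end_s: float) -> float:
    """Response (RU) of an ideal 1:1 interaction at time t_s after injection start.

    ka is in 1/(M*s), kd in 1/s. Mass-transfer effects are NOT modelled here.
    """
    conc_M = conc_nM * 1e-9
    k_obs = ka * conc_M + kd                 # observed association rate
    r_eq = rmax * ka * conc_M / k_obs        # steady-state response
    t_assoc = min(t_s, t_inject_end_s)
    response = r_eq * (1.0 - math.exp(-k_obs * t_assoc))
    if t_s > t_inject_end_s:                 # dissociation phase
        response *= math.exp(-kd * (t_s - t_inject_end_s))
    return response


series = twofold_series(100.0, 12)

# Hypothetical parameters for illustration only (KD = kd/ka = 1 nM).
KA, KD_RATE, RMAX, T_END = 1.0e6, 1.0e-3, 50.0, 120.0
for conc in series[:3]:
    ru = langmuir_response(T_END, conc, KA, KD_RATE, RMAX, T_END)
    print(f"{conc:6.2f} nM: {ru:.1f} RU at the end of the injection")
```

In a real analysis the rate constants are obtained by globally fitting all concentrations of the series at once, after subtracting the reference channel and blank injections.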

Preparation of RNA

RNA was synthesized on a MerMade 12 (Bioautomation Corporation) on a 50 nmol scale under standard conditions on UnySupport controlled pore glass (CPG) (Glen Research) with a pore size of 500 Å. Phosphoramidites were purchased from Thermo Fisher Scientific, CAP A/CAP B were purchased from BioSolve and 5-benzylthio-1H-tetrazole (BTT) from CarboSynth. After synthesis the RNA was cleaved from the CPG using gaseous methylamine at 65°C and 1.5 bar pressure for 90 min. The RNA was eluted with 50 % (v/v) ethanol and concentrated to dryness. Desilylation was conducted using triethylamine (TEA) : 1-methyl-2-pyrrolidone (NMP) : TEA.3HF (3:6:4) at 70°C for 90 min. Isopropoxytrimethylsilane was added, followed by 400 μl diethyl ether. The supernatant was removed and the solid was dissolved in 200 μl water. Crude RNA was purified by HPLC (Agilent 1200 series, Agilent Technologies) using an XBridge OST column (Waters). Dimethoxytrityl groups were cleaved in 40 % (v/v) acetic acid and separated from the RNA by HPLC. The purified RNA samples were analyzed on an Agilent 6130 Series Quadrupole LC/MS (Agilent Technologies) with electrospray ionization and the yield was determined by UV absorption (Nanodrop, Thermo Scientific).

Acknowledgments

We thank Dr. Doug Black for providing the plasmid pcDNA3.1-FLAG-mouseA2BP1 encoding murine full-length Fox-1, Dr. M. Hengartner for C. elegans cDNA, Dr. F. Laughlin for pET28a-Lin28, and Dr. Peter Hunziker and Dr. Serge Chesnov from the FGCZ for ESI-MS analysis of protein preparations.

References

1. Samols D, et al. (1988) Evolutionary conservation among biotin enzymes. J Biol Chem 263(14):6461-6464.

2. Tong L (2013) Structure and function of biotin-dependent carboxylases. Cell Mol Life Sci 70(5):863-891.

3. Campbell JW & Cronan JE, Jr. (2001) Bacterial fatty acid biosynthesis: targets for antibacterial drug discovery. Annu Rev Microbiol 55:305-332.

4. Eisenberg MA, Prakash O, & Hsiung SC (1982) Purification and properties of the biotin repressor. A bifunctional protein. J Biol Chem 257(24):15167-15173.

5. Choi-Rhee E & Cronan JE (2003) The biotin carboxylase-biotin carboxyl carrier protein complex of Escherichia coli acetyl-CoA carboxylase. J Biol Chem 278(33):30806-30812.

6. Fall RR & Vagelos PR (1975) Biotin carboxyl carrier protein from Escherichia coli. Methods Enzymol 35:17-25.

7. Green NM (1963) Avidin. 1. The Use of (14-C)Biotin for Kinetic Studies and for Assay. Biochem J 89:585-591.

8. Piran U & Riordan WJ (1990) Dissociation rate constant of the biotin-streptavidin complex. J Immunol Methods 133(1):141-143.

9. Hofmann K, Finn FM, Friesen HJ, Diaconescu C, & Zahn H (1977) Biotinylinsulins as potential tools for receptor studies. Proc Natl Acad Sci U S A 74(7):2697-2700.

10. de Boer E, et al. (2003) Efficient biotinylation and single-step purification of tagged transcription factors in mammalian cells and transgenic mice. Proc Natl Acad Sci U S A 100(13):7480-7485.

11. Chattopadhaya S, Tan LP, & Yao SQ (2006) Strategies for site-specific protein biotinylation using in vitro, in vivo and cell-free systems: toward functional protein arrays. Nat Protoc 1(5):2386-2398.

12. Fernandez-Suarez M, Chen TS, & Ting AY (2008) Protein-protein interaction detection in vitro and in cells by proximity biotinylation. J Am Chem Soc 130(29):9251-9253.

13. Elia G (2010) Protein biotinylation. Curr Protoc Protein Sci Chapter 3:Unit 3.6.

14. Kay BK, Thai S, & Volgina VV (2009) High-throughput biotinylation of proteins. Methods Mol Biol 498:185-196.

15. Agafonov DE, Rabe KS, Grote M, Voertler CS, & Sprinzl M (2006) C-terminal modifications of a protein by UAG-encoded incorporation of puromycin during in vitro protein synthesis in the absence of release factor 1. Chembiochem 7(2):330-336.

16. Watanabe T, Muranaka N, Iijima I, & Hohsaka T (2007) Position-specific incorporation of biotinylated non-natural amino acids into a protein in a cell-free translation system. Biochem Biophys Res Commun 361(3):794-799.

17. Lesaicherre ML, Lue RY, Chen GY, Zhu Q, & Yao SQ (2002) Intein-mediated biotinylation of proteins and its application in a protein microarray. J Am Chem Soc 124(30):8768-8769.

18. Taki M, Sawata SY, & Taira K (2001) Specific N-terminal biotinylation of a protein in vitro by a chemically modified tRNA(fmet) can support the native activity of the translated protein. J Biosci Bioeng 92(2):149-153.

19. Lue RY, Chen GY, Hu Y, Zhu Q, & Yao SQ (2004) Versatile protein biotinylation strategies for potential high-throughput proteomics. J Am Chem Soc 126(4):1055-1062.

20. Schatz PJ (1993) Use of peptide libraries to map the substrate specificity of a peptide-modifying enzyme: a 13 residue consensus peptide specifies biotinylation in Escherichia coli. Biotechnology (N Y) 11(10):1138-1143.

21. Cull MG & Schatz PJ (2000) Biotinylation of proteins in vivo and in vitro using small peptide tags. Methods Enzymol 326:430-440.

22. Beckett D, Kovaleva E, & Schatz PJ (1999) A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation. Protein Sci 8(4):921-929.

23. Michel E & Wüthrich K (2012) High-yield Escherichia coli-based cell-free expression of human proteins. J Biomol NMR 53(1):43-51.

24. Clery A, Blatter M, & Allain FH (2008) RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol 18(3):290-298.

25. Valverde R, Edwards L, & Regan L (2008) Structure and function of KH domains. FEBS J 275(11):2712-2726.

26. Masliah G, Barraud P, & Allain FH (2013) RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence. Cell Mol Life Sci 70(11):1875-1895.

27. Hall TM (2005) Multiple modes of RNA recognition by zinc finger proteins. Curr Opin Struct Biol 15(3):367-373.

28. Li SJ & Cronan JE, Jr. (1992) The gene encoding the biotin carboxylase subunit of Escherichia coli acetyl-CoA carboxylase. J Biol Chem 267(2):855-863.

29. Baba T, et al. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol 2:2006.0008.

30. Auweter SD, et al. (2006) Molecular basis of RNA recognition by the human alternative splicing factor Fox-1. EMBO J 25(1):163-173.

31. Michel E, Skrisovska L, Wüthrich K, & Allain FH (2013) Amino acid-selective segmental isotope labeling of multidomain proteins for structural biology. Chembiochem 14(4):457-466.

32. Howarth M, et al. (2006) A monovalent streptavidin with a single femtomolar biotin binding site. Nat Methods 3(4):267-273.

33. Michel E & Wüthrich K (2012) Cell-free expression of disulfide-containing eukaryotic proteins for structural biology. FEBS J 279(17):3176-3184.

34. Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227(5259):680-685.

35. Keller R (2004) The Computer Aided Resonance Assignment Tutorial (Cantina Verlag, Goldau, Switzerland).

4.4.: List of tables FIGURE 1. PRINCIPLE OF SPR METHOD. A LIGHT BEAM TRANSMITTED THROUGH A PRISM IS TOTALLY REFLECTED AT THE PRISM/GOLD LAYER AT ALL WAVELENGTHS BUT THE ONE INTERACTING WITH THE SPW. THE WAVELENGTH SUFFERING AN INTENSITY LOSS CHANGES IF THE SPW CHANGES. THE CHANGE IN WAVELENGTH INTERACTING WITH THE SPW LEADS TO AN INTENSITY DIP AT A DIFFERENT EXIT ANGLE (A AND B). THE CHANGE OF THE WAVELENGTH EXPERIENCING THE ENERGY DIP IS ILLUSTRATED ON THE RIGHT SITE. ΔI ILLUSTRATES THE CHANGE IN INTENSITY IF THE DETECTION OCCURS AT A FIXED WAVELENGTH...... 17 FIGURE 2. SCHEMATIC ILLUSTRATION OF THE AMINE COUPLING. THE THICK BLACK LINE REPRESENTS THE CHIP SURFACE AND THE THIN BLACK LINE THE SURFACE MATRIX ...... 19 FIGURE 3. PRINCIPLE OF A TYPICAL SPR EXPERIMENT, INCLUDING THE PRE-INJECTION PHASE (-25 – 0 S), THE ASSOCIATION PHASE (0 -~20 S), THE STEADY STATE PHASE (25 - 210 S) AND THE DISSOCIATION PHASE (220-300 S)...... 20 FIGURE 4. SCHEMATIC ILLUSTRATION OF THE STEADY STATE ANALYSIS. THE STEADY STATE ANALYSIS IS ON THE LEFT SIDE AND THE SENSOGRAMS ARE SHOWN ON THE RIGHT. THE STEADY STATE FIT (RED LINE) WAS DONE USING THE HILL EQUATION ...... 23 FIGURE 5. STRUCTURES OF THE PHOSPHORAMIDITES (1-12), THE ACTIVATORS (13, 14) AND THE LINKER GROUP ON THE CONTROLLED POROSITY GLASS (CPG) SOLID SUPPORT (15) FOR THE SYNTHESIS OF DNA (1, 4, 7, 10, 13, 14, 15), 2’-OME-RNA (2, 5, 8, 11, 13, 14, 15) AND RNA (3, 6, 9, 12-15) ...... 24 FIGURE 6. GENERAL REACTION CYCLE OF THE OLIGONUCLEOTIDE SYNTHESIS INCLUDING 1) DEPROTECTION, 2) ACTIVATION AND COUPLING, 3) CAPPING AND 4) OXIDIZING...... 25 FIGURE 7. MECHANISM OF THE DEPROTECTION OF THE 5'-OH GROUP USING DCA...... 26 FIGURE 8. REACTION MECHANISM OF THE TETRAZOLE MEDIATED ACTIVATION AND COUPLING OF A PHOSPHORAMIDITE...... 27 FIGURE 9. CAPPING MECHANISM ...... 27 FIGURE 10. OXIDATION MECHANISM ...... 28 FIGURE 11. CLEAVAGE FROM THE SOLID SUPPORT ...... 29 FIGURE 12. CANONICAL MIRNA BIOGENESIS PATHWAY. 
THE PRI-MIRNA IS TRANSCRIBED BY RNA POLYMERASE FROM THE MIRNA GENE. IT IS PROCESSED BY DROSHA AND DGCR8 FORMING THE PRE-MIRNA WHICH IS THEN EXPORTED INTO THY CYTOPLASM BY EXPORTIN-5 AND RAN-GTP. THE PRE-MIRNA IS RELEASED INTO THE CYTOPLASM UPON HYDROLYSIS OF THE RAN-GTP TO RAN-GDP. IT IS CLEAVED INTO THE MATURE MIRNA DUPLEX BY DICER AND TRBP AND SUBSEQUENTLY LOADED INTO THE RISC COMPLEX CONTAINING AN ARGONAUT PROTEIN AMONGST OTHERS.[77] ...... 31 FIGURE 13. A) SCHEMATIC ILLUSTRATION OF THE RISC COMPLEX AND THE TARGETING OF THE MRNA TARGET. B) PROCESS OF THE MEDIATION OF TRANSLATIONAL REPRESSION. C) MECHANISM OF THE MRNA DEADENYLATION WHICH LEADS TO MRNA DEGRADATION.[117] ...... 34 FIGURE 14. CONSERVATION OF DIFFERENT RBFOX FAMILY MEMBERS ACROSS SPECIES. A) MAP OF CONSERVATION OF THE FOX-1 FAMILY RRM. IDENTICAL AMINO ACIDS (AA) ARE MARKED IN ORANGE, SIMILAR AA ARE MARKED IN YELLOW. B) COMPARISON OF THE IDENTITY OF FULL LENGTH RBFOX PROTEIN BETWEEN MOUSE RBFOX-1, MOUSE RBFOX-2, ZEBRAFISH RBFOX-1 AND C. ELEGANS RBFOX-1. THE RRM IS MARKED IN ORANGE, THE C AND N-TERMINAL REGIONS ARE MARKED IN YELLOW. PERCENTAGES NUMBERS SHOW THE AA IDENTITY COMPARED TO MOUSE RBFOX-1. C) COMPARISON OF THE C-TERMINAL AA SEQUENCES OF THE RBFOX-1 FAMILY AND OTHER PROTEINS (HNRNP A1, HNRNP D, HNRNP F AND TAP) CONTAINING THE NUCLEAR LOCALIZATION SIGNAL (NLS).[172] ...... 38 FIGURE 15. A) INTERACTIONS BETWEEN SELECTED BASES AND AMINO ACIDS SHOWN BY THE NMR STRUCTURE. B) OVERLAY OF SECTIONS OF 2D TOCSY SPECTRA SHOWING THE H5–H6 CORRELATIONS OF URACIL AND CYTOSINE OF SOLUTIONS OF 5’-UGCAUGU-3’ IN THE PRESENCE OF ONE EQUIVALENT OF FOX-1 (RED), FOX-1 F126A (BLACK), AND FOX-1 F160A (BLUE). C) STRUCTURE OF URACIL AND CYTIDINE WITH NUMBERING OF THE ATOMS. ADAPTED FROM[9] ...... 40 FIGURE 16. OVERVIEW OF THE SOLUTION STRUCTURE OF THE RBD OF RBFOX-1 IN COMPLEX WITH UGCAUGU. (A) OVERLAY OF THE fiNAL 30 STRUCTURES SUPERPOSED ON THE HEAVY ATOMS OF THE STRUCTURED PARTS OF THE PROTEIN AND OF THE RNA. 
THE PROTEIN BACKBONE IS GRAY, THE RNA BACKBONE IS ORANGE, THE PHOSPHATE GROUPS ARE RED, AND THE RNA BASES ARE YELLOW. ONLY THE ORDERED

176

REGION OF THE PROTEIN (RESIDUES 116–194) IS SHOWN. (B) SURFACE (HEAVY ATOMS OF RESIDUES 116– 194) AND STICK (HEAVY ATOMS OF THE RNA) REPRESENTATION OF THE LOWEST ENERGY STRUCTURE. THE PROTEIN SURFACE IS PAINTED ACCORDING TO SURFACE POTENTIAL WITH RED INDICATING NEGATIVE CHARGES AND BLUE INDICATING POSITIVE CHARGES. THE RNA IS COLORED AS IN PANEL (A). (C) THE LOWEST ENERGY STRUCTURE IN RIBBON (PROTEIN BACKBONE) AND STICK (RNA) REPRESENTATION. THE COLOR SCHEME IS THE SAME AS IN (A), IMPORTANT PROTEIN SIDE CHAINS INVOLVED IN HYDROPHOBIC INTERACTIONS WITH THE RNA ARE REPRESENTED AS GREEN STICKS. (D) SAME AS (C) BUT ROTATED BY 90% AROUND THE INDICATED AXIS[9]...... 42 FIGURE 17. A) GENERAL MECHANISM OF THE INHIBITION OF EXON INCLUSION UPON BINDING OF RBFOX IN THE UPSTREAM INTRONIC FLANKING REGION (UIF). B) GENERAL MECHANISM OF THE EXON INCLUSION UPON BINDING OF RBFOX IN THE DOWNSTREAM INTRONIC FLANKING REGION (DIF). C) A MODEL FOR REPRESSION OF PRESPLICEOSOME COMPLEX FORMATION BY THE FOX-1 FAMILY. THE REPRESSION OF CALCITONIN-SPECIfiC EXON 4 OF CALCITONIN/CGRP PRE-MRNA IN NEURONAL CELLS BY THE FOX-1 FAMILY INVOLVES TWO DISTINCT REGULATORY EVENTS. FIRST, THE -34 ELEMENT IN THE UIF REGION PREVENTS E’ COMPLEX FORMATION THROUGH REPRESSING SF1 BINDING TO THE BRANCH POINT. SECOND, THE +45 EXONIC ELEMENT BLOCKS TRANSITION TO E COMPLEX VIA INHIBITING U2AF65 BINDING TO POLYPYRIMIDINE TRACT (PY). [189] ...... 43 FIGURE 18. SCHEME OF THE SELEX MECHANISM. EACH EXPERIMENT USES A POOL OF A RANDOMIZED RNA OR DNA POOL WITH ABOUT 1015 DIFFERENT SEQUENCES. IN A FIRST STEP THE TARGET IS INCUBATED WITH THE RNA/DNA POOL. AFTER WASHING AWAY UNBOUND RNA/DNA MOLECULES, THE REMAINING MOLECULES ARE ENRICHED USING PCR. THESE STEPS ARE REPEATED SEVERAL TIMES (6-20). AFTER THE FINAL ROUND OF AMPLIFICATION, THE APTAMERS ARE SEQUENCED AND ANALYZED. [241] ...... 49 FIGURE 19. SCHEME OF THE MECHANISM OF U TO C TRANSITION, ADAPTED FROM ASCANO ET AL [266]. 
AROMATIC AMINO ACID RESIDUES ARE MARKED IN BLUE...... 51 FIGURE 20. SCHEME OF THE WORK FLOW OF THE HITS-CLIP AND PAR-CLIP PROCEDURE. FOR PAR-CLIP THE CELLS ARE TREATED WITH THE MODIFIED NUCLEOBASE. AFTERWARDS THE CELLS ARE IRRADIATED WITH 365 OR 254 NM WAVELENGTH RESPECTIVELY. AFTER LYSIS, PARTIAL DIGESTION AND IMMUNOPRECIPITATION 3’-ADAPTER LIGATION IS DONE. THE PROTEIN IS DIGESTED USING PROTEINASE FOLLOWED BY 5’ ADAPTER LIGATION. THE RNA IS REVERSE TRANSCRIBED. FOR PAR-CLIP THE MODIFIED BASE BOUND TO THE AMINO ACIDS CAN EITHER LEAD TO A TRANSITION FROM IN THYMIDINE TO CYTIDINE (4SU), OR FROM GUANOSINE TO ADENOSINE (6SG) OR NO MUTATION CAN OCCUR (READ- THROUGH). FOR HITS-CLIP THE BINDING SITE CAN EITHER LEAD TO A DELETION OR CAN ALSO NOT AFFECT THE TRANSCRIPTION. THE CDNA IS AFTERWARDS AMPLIFIED USING PCR FOLLOWED BY HIGH- THROUGHPUT SEQUENCING. ADAPTED FROM [271] ...... 53 FIGURE 21. A) SCHEMATIC ILLUSTRATION OF RBOX PRE-MIRNA ELISA. B) NORMALIZED BINDING INTENSITY OF PRE-MIRNAS AGAINST IMMOBILIZED RBFOX RRM. C) NORMALIZED BINDING INTENSITY OF PRE-MIRNAS TO RBFOX-2 FROM HELA CELL LYSATES (EXPERIMENTS WERE PERFORMED BY DR. HARRY TOWBIN)...... 57 FIGURE 22: SPR ANALYSIS OF RNA BINDING TO C-TERMINALLY BIOTINYLATED FOX (109–208). THE BIOTINYLATED FOX CONSTRUCT WAS IMMOBILIZED ON THE STREPTAVIDIN-COATED SENSOR SURFACE EITHER AFTER PURIFICATION (A) OR DIRECTLY FROM THE CRUDE CELL-FREE REACTION MIXTURE (B). THE BINDING EXPERIMENTS WERE RECORDED AT 25 °C IN SPR BUFFER (10 MM HEPES AT PH 7.4, 200 MM NACL, 3.4 MM EDTA) IN A CONCENTRATION SERIES OF 100, 50, 25, 12.5, 6.25, 3.13, 1.56, 0.78, 0.39, 0.2, 0.1 AND 0.05 NM OF THE 5’-UGCAUGU-3’ RNA ANALYTE. ALL INJECTIONS WERE MEASURED AS DUPLICATES AND THE RESULTING SENSOGRAMS WERE FITTED WITH A 1:1 LANGMUIR MODEL THAT INCLUDES MASS TRANSFER AND DOUBLE REFERENCING ...... 61 FIGURE 23. 
SPR ANALYSIS OF THE DIRECT IMMOBILIZATION OF C-TERMINALLY BIOTINYLATED RBFOX-1 (AA 109–208) FROM THE CRUDE REACTION MIXTURE ONTO A STREPTAVIDIN-COATED SPR BIOSENSOR CHIP. THE CELL FREE REACTIONS FOR PRODUCTION OF THE BIOTINYLATED TARGET PROTEIN WERE CARRIED OUT IN PRESENCE OF VARIOUS AMOUNTS OF BIOTIN AND EACH REACTION WAS INDIVIDUALLY INJECTED OVER THE SPR BIOSENSOR SURFACE WHICH IS INDICATED BY COLOR CODING. THE IMMOBILIZATION OF PURIFIED BIOTINYLATED FOX-1 SERVED AS A REFERENCE. BLUE ARROWS INDICATE THE TIME POINTS OF SAMPLE INJECTION INTO THE CHANNELS OF THE SENSOR CHIP...... 62

177

FIGURE 24. RESULTS OF THE RNA/RBFOX RRM SCREEN. A) 256 SEQUENCES OF NNNNUGU. B) 64 SEQUENCES OF UGCANNN AT A CONCENTRATION OF 1MM. C) ALL SEQUENCES WITH NORMALIZED RU VALUES PLOTTED. A THRESHOLD OF 7 RU (RED LINE) WAS ARBITRARILY CHOSEN AS A SIGNIFICANT BINDING EVENT...... 64 FIGURE 25. POSSIBLE INTERACTIONS OF SUB-STRUCTURES FOR THE NEW SCREENING MOTIFS. AMINO ACIDS ARE MARKED IN GREEN, NUCLEIC ACIDS IN BLUE...... 68 FIGURE 26. SENSOGRAM OF 7 MER RNA SEQUENCES FROM THE SCREEN AGAINST RBFOX. THE CHIP SURFACES WERE COATED BETWEEN 100 AND 140 RU USING THE AMINE CHIP FROM SIERRA SENSORS. EACH CONCENTRATION WAS MEASURED IN DUPLICATES IN A 1:1 DILUTION SERIES. THE DATA WERE FITTED USING SCRUBBER IN A 1:1 BINDING MODEL INCLUDING MASS TRANSFER LIMITATIONS...... 69 FIGURE 27. POSSIBLE STRUCUTRES OF 7 MER RNA BINDING TO RBFOX RRM BASED ON THE UGCAUGU/RBFOX RRM STRUCUTRE FROM AUWETER ET AL...... 72 FIGURE 28. SENSOGRAMS OF RNA SEQUENCES CONTAINING ONE VARIATION IN COMPARISON WITH THE UGCAUGU MOTIF. DATA WERE MEASURED IN DUPLICATES IN A 1:1 DILUTION SERIES STARTING FROM A CONCENTRATION OF 2000 NM. THE CURVES WERE FITTED USING A 1:1 LANGMUIR BINDING MODEL INCLUDING MASS-TRANSPORT LIMITATIONS...... 73 FIGURE 29. A) SPR SENSOGRAMS OF THE DMC AND MMC MODIFIED UGCAUGU SEQUENCES AGAINST RBFOX RRM COATED ON THE SENSOR SURFACE. THE KINETIC FIT IS DONE WITH A 1:1 LANGMUIR MODEL. THE CONCENTRATION FOR THE MMC STARTS FROM 300 NM AND THE 2000 NM FOR THE DMC MODIFIED OLIGONUCLEOTIDE IN A 1:1 DILUTION SERIES. B) WT: INTERACTIONS FROM THE WILD TYPE RNA AGAINST RBFOX RRM, OBTAINED FROM THE NMR SPECTROSCOPY, C3DMC: POSSIBLE INTERACTION OF THE DMC MODIFIED RNA AGAINST RBFOX RRM. C3MMC: POSSIBLE INTERACTION OF THE MMC MODIFIED RNA AND ITS KETO-IMINE TAUTOMER AGAINST RBFOX RRM...... 75 FIGURE 30. AFFINITY OF RBFOX-BINDING HEPTANUCLEOTIDES, PRE-MIR-20B, -32 AND -107 TO RECOMBINANT BIOTINYLATED FOX RRM DOMAIN MEASURED BY SPR. 
THE PANELS SHOW BINDING CURVES AND PREDICTED SECONDARY STRUCTURES OF NATIVE AND MUTATED PRECURSORS, WHERE THE FBE IS DEPICTED IN RED. CONCENTRATIONS OF UGCAUGU AND UGCAUAU WERE 1.7, 2.3, 4.7, 9.4, 18.8, 37.5, 75, 150 NM AND FOR THE PRE-MIRNAS WERE 0, 117, 175, 263, 395, 1333, 2000, 3000 NM...... 77 FIGURE 31. SPR SENSOGRAMS OF PRE-MIRNAS HARBORING THE GAAUG FBE. CONCENTRATIONS STARTING AT 2000NM IN A 1:1 DILUTION SERIES...... 79 FIGURE 32. CD SPECTRA OF PRE-MIRS AT A CONCENTRATION OF 2.5 MM IN THE PRESENCE OF RBFOX RRM (0.5 - 5.5 MM). THE CD SPECTRA OF THE PROTEIN AND BUFFER WERE SUBTRACTED. FROM THE PURE RBFOX-1 RRM SPECTRA ONLY THE HEPES BUFFER WAS SUBTRACTED[299] ...... 80 FIGURE 33. SPR SENSOGRAMS OF TRUNCATED HAIRPINS AGAINST RBFOX RRM. CONCENTRATIONS STARTING FROM 5 µM IN A 1:1 DILUTION SERIES. THE DATA WERE MEASURED IN DUPLICATES AND FITTED TO A 1:1 LANGMUIR MODEL INCLUDING MASS-TRANSPORT LIMITATIONS...... 82 FIGURE 34. SCHEMATIC ILLUSTRATION OF THE COMPETITION OF THE INTRAMOLECULAR RNA DUPLEX FORMATION AND THE RNA/PROTEIN COMPLEX FORMATION. THE RNA SEQUENCES REPRESENT THE FBE (UGCAUGU) AND THE COMPLEMENTARY STRAND (ACGUACA) IN A HAIRPIN STRUCTURE...... 83 FIGURE 35. DATA FROM SPR AND MELTING EXPERIMENTS. ASSOCIATED FRACTION PLOT AND UV ABSORPTION WAS MEASURED IN PHOSPHATE BUFFER. SPR SENSOGRAMS WERE MEASURED ON THE MASS-1 WITH RBFOX RRM COATED ON THE SENSOR SURFACE...... 84 FIGURE 36. CORRELATION BETWEEN THE THERMODYNAMIC VALUES OF THE MELTING EXPERIMENTS, WITH THE KINETIC DATA OBTAINED FROM SPR EXPERIMENTS. ΔG VALUES REPRESENT THE STABILITY OF THE HAIRPIN STRUCTURE...... 85 FIGURE 37. SCHEMATIC ILLUSTRATION OF THE DIFFERENT SUGAR PUCKER CONFORMATIONS...... 86 FIGURE 38. SPR SENSOGRAMS OF THE 14 HYBRID SEQUENCES AND THE CONSENSUS RNA SEQUENCE (TABLE 10) AGAINST RBFOX RRM COATED ON THE SPR CHIP SURFACE. DATA WERE MEASURED IN DUPLICATES AND IN A TWOFOLD DILUTION SERIES STARTING FROM A CONCENTRATION OF 100 NM. 
THE SIMULATED CURVES ARE FITTED TO A 1:1 LANGMUIR MODEL INCLUDING MASS-TRANSFER LIMITATIONS...... 89
FIGURE 39. A: SELECTIVE INTERACTION OF FOX-2 WITH FBE-CONTAINING PRECURSORS OF MIR-20B, MIR-32, AND MIR-107 IN SW13 CELLS. QRT-PCR DATA OF FOX-2 IMMUNOPRECIPITATION (RIP) VERSUS CONTROL BEADS (WITHOUT ANTIBODY). ERROR BARS INDICATE STANDARD DEVIATIONS OF THREE INDEPENDENT EXPERIMENTS. (EXPERIMENTS WERE PERFORMED BY JULIAN ZAGALAK) ...... 90
FIGURE 40. A: DEEP SEQUENCING ANALYSIS UPON FOX-2 KNOCK-DOWN OF SELECTED MIRNAS WITH SIGNIFICANTLY ALTERED EXPRESSION LEVELS. B: ANALYSIS BY SMALL RNA SEQUENCE READS. THE NORMALIZED SEQUENCE READS SHOW INTACT MIR-20B-5P READS IN MOCK SAMPLES, WHICH ARE DOWN-REGULATED IN FOX-2 KNOCK-DOWN SAMPLES. THE 3P MIR-20B READS INCREASE UPON KNOCK-DOWN OF FOX-2 AND WITH ADDITION OF U ON 5ʼ END. (EXPERIMENTS PERFORMED BY JULIAN ZAGALAK, DR. AFZAL DOGAR AND DR. JOCHEN IMIG) ...... 91
FIGURE 41. WATSON-CRICK PAIRING OF RNAS CONTAINING GUANINE WITH NATURAL AND METHYLATED CYTIDINES. A: CANONICAL G-C BASE PAIR. B: GUANINE N4-METHYLCYTOSINE (MMC) BASE PAIR. C: ONE POSSIBLE CONFORMATION OF A GUANINE N4,N4-DIMETHYLCYTOSINE (DMC) BASE PAIR. D: MELTING PROFILES OF DUPLEXES COMPRISING RNAS FROM TABLE 1 (BLACK SQUARES: RNA DUPLEX [1-2]; RED DOTS: RNA DUPLEX [1-3]; GREEN TRIANGLES: RNA DUPLEX [1-4]; BLUE TRIANGLES: RNA DUPLEX [1-5]). E: THE THERMODYNAMIC TM OF DUPLEXES IS INDICATED BY ARROWS, DETERMINED FROM THE THEORETICAL EQUILIBRIUM CURVE F(T)...... 147
FIGURE 42. METHYLATED CYTIDINES IN MICRORNA (MIRNA) DUPLEXES ARE ACCEPTED INTO THE RNA INDUCED SILENCING COMPLEXES (RISC) AND SHOW VARYING LEVELS OF BIOLOGICAL ACTIVITY. (A) STRUCTURE OF MIMICS MIR-106A AND MIR-34A. (B) SCHEMATIC REPRESENTATION OF MIR-106A AND MIR-34A TARGETING COMPLEMENTARY SITES EMBEDDED IN PARTS OF THE SIRT1 AND P21 3’ UNTRANSLATED REGIONS (UTRS), RESPECTIVELY. (C) AND (D) HELA CELLS TRANSFECTED WITH LUCIFERASE REPORTER PLASMIDS SHOWN IN (B) WERE TREATED AFTER 24 HOURS WITH INCREASING DOSES (0, 2, 9 AND 36 NM) OF ANALOGS OF MIR-106A AND MIR-34A. RELATIVE LUCIFERASE ACTIVITY WAS MEASURED 48 HOURS AFTER PLASMID TRANSFECTIONS, AND RESIDUAL LUCIFERASE ACTIVITY IS PLOTTED AFTER NORMALIZATION TO THAT OF THE 0 NM TREATMENT [MEAN OF TRIPLICATE TRANSFECTIONS ± STANDARD DEVIATION (SD)]. (E) SCHEMATIC REPRESENTATION OF MIR-34A AND MIR-106A ANALOGS TARGETED TO LUCIFERASE REPORTER GENES BEARING A SINGLE MIRNA TARGET SITE FROM P21 AND SIRT1, RESPECTIVELY, IN THEIR 3’ UTRS. (F) AND (G) HELA CELLS TRANSFECTED WITH LUCIFERASE REPORTER PLASMIDS SHOWN IN (E) WERE TREATED AFTER 24 HOURS WITH ANALOGS OF MIR-106A AND MIR-34A. RELATIVE LUCIFERASE ACTIVITY WAS MEASURED 48 HOURS AFTER PLASMID TRANSFECTIONS, AND RESIDUAL LUCIFERASE ACTIVITY IS PLOTTED AFTER NORMALIZATION TO THAT OF THE 0 NM TREATMENT (MEAN OF TRIPLICATE TRANSFECTIONS ±SD). (H) CASPASE 3/7 ACTIVITY WAS MEASURED FROM LYSATES OF HELA CELLS 72 HOURS AFTER TRANSFECTION WITH MIR-34A ANALOGS. CASPASE 3/7 ACTIVITY IS PLOTTED AFTER NORMALIZATION TO THAT OF THE 0 NM TREATMENT (MEAN OF TRIPLICATE TRANSFECTIONS ±SD)...... 152

SUPPLEMENTARY FIGURE 1. NATURAL RNA BASES ...... 139
SUPPLEMENTARY FIGURE 2. PYTHON CODE TO OBTAIN ALL POSSIBLE N-MERS ...... 140
SUPPLEMENTARY FIGURE 3. ALIGNMENT OF THE PRI-MIR-32 SEQUENCES USING CARNA. THE ALIGNMENT SHOWS THE HIGH LEVEL OF CONSERVATION FOR THE MATURE MIRNA SEQUENCES. THE COLORS SHOW THE NUMBER OF POSSIBLE BASE PAIRING PARTNERS. THE FIRST GREEN COLUMN IS THE MUTATED SITE WITHIN THE FBE...... 141
SUPPLEMENTARY FIGURE 4. SPR SPECTROSCOPY SENSOGRAMS OF PRE-MIRS AND MUTATED PRE-MIRS WITH AN INDICATION FOR UNSPECIFIC BINDING AT HIGH CONCENTRATIONS ...... 142
SUPPLEMENTARY FIGURE 5. RESULTS FROM ELISA SCREEN. BINDERS WITH BINDING MOTIFS (GCAUG, GAAUG, GCACG) ARE MARKED WITH A RED DOT ...... 143
SUPPLEMENTARY FIGURE 6. COMPARISON OF KINETIC VS STEADY STATE ANALYSIS OF PRE-MIRS AGAINST RBFOX ...... 144
SUPPLEMENTARY FIGURE 7. CHANGE IN ABSORPTION OF RNA UPON ADDITION OF 20 µM RBFOX SOLUTION TO THE RNA ...... 145

SUPPLEMENTARY FIGURE 8. CALCULATED STRUCTURES OF HAIRPINS USED IN THIS WORK WITH THEIR CORRESPONDING ΔG VALUES. MFOLD WAS USED FOR CALCULATION. THE FBE IS SHOWN IN GREEN AND THE MUTATIONS ARE SHOWN IN RED. DELETIONS ARE MARKED BY NEIGHBORING BASES IN ITALICS...... 146

4.5.: List of tables

TABLE 1. RANKING OF NEW RNA BINDING MOTIFS FROM LAMBERT ET AL. [217]...... 45
TABLE 2. RANKING OF MIRNAS BINDING TO RBFOX-1 RRM AND FULL LENGTH RBFOX FROM HELA LYSATE. CHEMILUMINESCENCE WAS NORMALIZED TO THE STRONGEST MEASURED BINDER. BOLD MARKS THE BINDERS CONTAINING THE GCAUG CONSENSUS SEQUENCE. RED AND BLUE MARK THE PRE-MIRNAS THAT ONLY OCCUR AS A STRONG BINDER IN ONE OF THE ELISA FORMATS...... 58
TABLE 3. RESULTS OF FOUR INDEPENDENTLY CONDUCTED SPR CONCENTRATION SERIES EXPERIMENTS OF THE INTERACTION BETWEEN FOX-1(109–208) AND 5’-UGCAUGU-3’ RNA. THE TARGET PROTEIN FOX-1(109–208) WAS IMMOBILIZED EITHER FROM PURIFIED SAMPLES (PURIFIED FOX-1) OR DIRECTLY FROM THE CRUDE REACTION MIXTURE (CRUDE MIX FOX-1). ALL INJECTIONS WERE MEASURED AS DUPLICATES...... 62
TABLE 4. SUMMARY OF SCREENING DATA AND KINETIC MEASUREMENTS OF THE SCREEN HITS ...... 65
TABLE 5. STATISTICAL ANALYSIS OF THE NEW FBE IN THE PRE-MIRS FROM THE ELISA ASSAY...... 67
TABLE 6. LIST OF 15 SSRNAS WITH A SINGLE MUTATION MARKED IN RED AND THE KINETIC DATA. THE CHANGE IN GIBBS FREE ENERGY IS CALCULATED AS DESCRIBED ABOVE WITH UGCAUGU SEQUENCE AS THE REFERENCE...... 71
TABLE 7. KINETIC DATA FROM SPR MEASUREMENTS OF UGCAUGU, UGMMCAUGU AND UGDMCAUGU AGAINST RBFOX RRM ...... 74
TABLE 8. SUMMARY OF SPR DATA OF RBFOX-1 RRM TO THE SHORT 7MER FBES AND THE FULL LENGTH PRE-MIRS...... 78
TABLE 9. KINETIC DATA OF TRUNCATED HAIRPINS ...... 82
TABLE 10. KINETIC DATA OF THE 14 HYBRID SEQUENCES CONTAINING ONE MODIFIED SUGAR BINDING TO RBFOX RRM. DX MARKS THE POSITION AT WHICH AN RNA NUCLEOTIDE IS REPLACED BY A DNA NUCLEOTIDE AND OMEX MARKS THE POSITION AT WHICH AN RNA NUCLEOTIDE IS REPLACED BY A 2’-OME RNA NUCLEOTIDE...... 87
TABLE 11. LIST OF USED CHEMICALS ...... 95
TABLE 12. LIST OF USED EQUIPMENT ...... 96
TABLE 13. STARTING CONCENTRATION FOR THE KINETIC MEASUREMENTS OF THE SPR SCREEN HITS AGAINST RBFOX RRM...... 100
TABLE 14. LIST OF ABBREVIATIONS USED...... 105
TABLE 15. OLIGORIBONUCLEOTIDE SEQUENCES USED IN THE INVESTIGATION...... 149
TABLE 16. MELTING TEMPERATURES AND FREE ENERGY CHANGES OF HYBRIDIZATION FROM ANNEALING OLIGORIBONUCLEOTIDES...... 150
TABLE 17. RESULTS OF FOUR INDEPENDENTLY CONDUCTED SPR CONCENTRATION SERIES EXPERIMENTS OF THE INTERACTION BETWEEN FOX-1(109–208) AND 5’-UGCAUGU-3’ RNA. THE TARGET PROTEIN FOX-1(109–208) WAS IMMOBILIZED EITHER FROM PURIFIED SAMPLES (PURIFIED FOX-1) OR DIRECTLY FROM THE CRUDE REACTION MIXTURE (CRUDE MIX FOX-1). ALL INJECTIONS WERE MEASURED AS DUPLICATES. THIS DEMONSTRATES THE EXCEPTIONAL SUITABILITY OF OUR CELL-FREE PRODUCTION SYSTEM FOR RAPID AND EFFICIENT PRODUCTION AND IMMOBILIZATION OF BIOTINYLATED TARGET PROTEINS FOR SPR MEASUREMENTS. STARTING FROM THE GENE, WE CAN PREPARE EXCLUSIVELY BIOTINYLATED EUKARYOTIC PROTEINS FOR DIRECT APPLICATION ONTO STREPTAVIDIN- OR AVIDIN-CONTAINING MATERIALS WITHIN 2.5–4 H WHICH ENABLES FURTHER SPR ANALYSIS OF ITS INTERACTIONS WITH LIGANDS IN LESS THAN 2 H. ECONOMICALLY SPEAKING, A TYPICAL 100 µL CELL-FREE REACTION COSTS LESS THAN 0.5 CHF AND YIELDS ENOUGH BIOTINYLATED PROTEIN TO EASILY ALLOW CA. 400 INDIVIDUAL SPR MEASUREMENTS...... 166

SUPPLEMENTARY TABLE 1. LIST OF ALL 320 SEQUENCES FROM RNA SCREEN. RU RESPONSE WAS NORMALIZED TO MASS OF THE ANALYTE AND THE SURFACE DENSITY OF THE CHIP ...... 114
SUPPLEMENTARY TABLE 2. LIST OF ALL PRE-MIRS FROM THE ELISA SCREEN ...... 117
SUPPLEMENTARY TABLE 3. LIST OF SYNTHESIZED TR-PRE-MIRS ...... 117
SUPPLEMENTARY TABLE 4. LIST OF RNA SEQUENCES FOR THE PENTAMER LIBRARY WITH SYNTHESIS YIELDS AND CALCULATED MASS...... 131
SUPPLEMENTARY TABLE 5. LIST OF RNA SEQUENCES SYNTHESIZED FOR THE RBFOX LIBRARY ...... 136
SUPPLEMENTARY TABLE 6. COMPARISON OF STEADY STATE AND KINETIC SPR ANALYSIS...... 136
SUPPLEMENTARY TABLE 7. KINETIC AND THERMODYNAMIC DATA OF TRUNCATED HAIRPINS. KINETIC CONSTANTS WERE MEASURED WITH A MASS-1 AGAINST RBFOX PROTEINS. THE THERMODYNAMIC VALUES ...... 136
SUPPLEMENTARY TABLE 8. LIST OF TRUNCATED HAIRPINS FOR THE CORRELATION BETWEEN HAIRPIN STRENGTH AND PROTEIN RNA AFFINITY. VARIATIONS ARE MARKED IN RED...... 137
SUPPLEMENTARY TABLE 9. LIST OF PRIMERS ...... 138

5.: References

1. Hendrix, M., et al., Direct observation of aminoglycoside-RNA interactions by surface plasmon resonance. J Am Chem Soc, 1997. 119(16): p. 3641-8.
2. Hartmann, R., et al., Activation of 2′-5′ Oligoadenylate Synthetase by Single-stranded and Double-stranded RNA Aptamers. Journal of Biological Chemistry, 1998. 273(6): p. 3236-3246.
3. Malim, M.H., et al., Functional dissection of the HIV-1 Rev trans-activator—Derivation of a trans-dominant repressor of Rev function. Cell, 1989. 58(1): p. 205-214.
4. Malim, M.H. and B.R. Cullen, HIV-1 structural gene expression requires the binding of multiple Rev monomers to the viral RRE: implications for HIV-1 latency. Cell, 1991. 65(2): p. 241-8.
5. Minks, M.A., et al., Activation of 2',5'-oligo(A) polymerase and protein kinase of interferon-treated HeLa cells by 2'-O-methylated poly(inosinic acid) · poly(cytidylic acid). Correlations with interferon-inducing activity. J Biol Chem, 1980. 255(13): p. 6403-7.
6. Minks, M.A., et al., Structural requirements of double-stranded RNA for the activation of 2',5'-oligo(A) polymerase and protein kinase of interferon-treated HeLa cells. J Biol Chem, 1979. 254(20): p. 10180-3.
7. Ghosh, S.K., et al., Cloning, sequencing, and expression of two murine 2'-5'-oligoadenylate synthetases. Structure-function relationships. J Biol Chem, 1991. 266(23): p. 15293-9.
8. Law, M.J., et al., The role of RNA structure in the interaction of U1A protein with U1 hairpin II RNA. RNA, 2006. 12(7): p. 1168-78.
9. Auweter, S.D., et al., Molecular basis of RNA recognition by the human alternative splicing factor Fox-1. EMBO J, 2006. 25(1): p. 163-73.
10. Katsamba, P.S., S. Park, and I.A. Laird-Offringa, Kinetic studies of RNA-protein interactions using surface plasmon resonance. Methods, 2002. 26(2): p. 95-104.
11. Cooper, M.A. and M.A. Cooper, Sensor surfaces and receptor deposition, in Label-Free Biosensors. 2009: Cambridge University Press.
12. Davis, T.M. and W.D. Wilson, Surface plasmon resonance biosensor analysis of RNA-small molecule interactions, in Methods in Enzymology, J.B. Chaires and M.J. Waring, Editors. 2001, Academic Press. p. 22-51.
13. Wood, R.W., XLII. On a remarkable case of uneven distribution of light in a diffraction grating spectrum. Philosophical Magazine Series 6, 1902. 4(21): p. 396-402.
14. Hippel, A.R.V., Dielectric Materials and Applications. 1995, London: Artech House.
15. Boardman, A.D., Electromagnetic Surface Modes. 1982, Chichester: John Wiley & Sons Ltd.
16. Raether, H., Surface Plasmons on Smooth and Rough Surfaces and on Gratings. Springer Tracts in Modern Physics, ed. Y. Chen, Fujimori, A., Kühn, J.H., Müller, Th., Steiner, F., Stwalley, W.C., Trümper, J.E., Wölfle, P., Woggon, U. Höhler, Gerhard (Ed.). Vol. 111. 1988, Berlin Heidelberg: Springer-Verlag.
17. Park, W.-D., Optical Constants and Dispersion Parameters of CdS Thin Film Prepared by Chemical Bath Deposition. Transactions on Electrical and Electronic Materials, 2012. 13(4): p. 196-199.
18. Parriaux, O. and G. Voirin, Plasmon wave versus dielectric waveguiding for surface wave sensing. Sensors and Actuators A: Physical, 1990. 23(1–3): p. 1137-1141.
19. Homola, J., Present and future of surface plasmon resonance biosensors. Analytical and bioanalytical chemistry, 2003. 377(3): p. 528-39.
20. Rebhan, M., Affinity-based Structural Studies of microRNA Precursors, in D-CHAB. 2013, ETH Zürich: Zürich.
21. Homola, J., Surface Plasmon Resonance Based Sensors. Springer Series on Chemical Sensors and Biosensors, ed. J. Homola. Vol. 4. 2006, Berlin Heidelberg: Springer-Verlag.

22. Hamalainen, M.D., et al., Label-free primary screening and affinity ranking of fragment libraries using parallel analysis of protein panels. J Biomol Screen, 2008. 13(3): p. 202-9.
23. Myszka, D.G., Kinetic analysis of macromolecular interactions using surface plasmon resonance biosensors. Curr Opin Biotechnol, 1997. 8(1): p. 50-7.
24. Koehnke, J., et al., Crystal structures of beta-neurexin 1 and beta-neurexin 2 ectodomains and dynamics of splice insertion sequence 4. Structure, 2008. 16(3): p. 410-21.
25. Kieffer, C., et al., Two distinct modes of ESCRT-III recognition are required for VPS4 functions in lysosomal protein targeting and HIV-1 budding. Dev Cell, 2008. 15(1): p. 62-73.
26. Zeller, J., et al., CGRP function-blocking antibodies inhibit neurogenic vasodilatation without affecting heart rate or arterial blood pressure in the rat. Br J Pharmacol, 2008. 155(7): p. 1093-103.
27. Reisser-Rubrecht, L., et al., High-affinity uranyl-specific antibodies suitable for cellular imaging. Chem Res Toxicol, 2008. 21(2): p. 349-57.
28. Julien, K.R., et al., Conformationally restricted nucleotides as a probe of structure-function relationships in RNA. RNA, 2008. 14(8): p. 1632-43.
29. Mackay, H., et al., Targeting the inverted CCAAT Box-2 of the topoisomerase IIalpha gene: DNA sequence selective recognition by a polyamide-intercalator as a staggered dimer. Bioorg Med Chem, 2008. 16(4): p. 2093-102.
30. Lewis, P., et al., Dynamics of saxitoxin binding to saxiphilin c-lobe reveals conformational change. Toxicon, 2008. 51(2): p. 208-17.
31. Giannetti, A.M., B.D. Koch, and M.F. Browner, Surface plasmon resonance based assay for the detection and characterization of promiscuous inhibitors. J Med Chem, 2008. 51(3): p. 574-80.
32. Wang, H., et al., In vitro and in vivo properties of adenovirus vectors with increased affinity to CD46. J Virol, 2008. 82(21): p. 10567-79.
33. de Mol, N.J., et al., Surface plasmon resonance thermodynamic and kinetic analysis as a strategic tool in drug design. Distinct ways for phosphopeptides to plug into Src- and Grb2 SH2 domains. J Med Chem, 2005. 48(3): p. 753-63.
34. Rich, R.L., et al., Kinetic analysis of estrogen receptor/ligand interactions. Proceedings of the National Academy of Sciences, 2002. 99(13): p. 8562-8567.
35. Markgren, P.-O., et al., Determination of Interaction Kinetic Constants for HIV-1 Protease Inhibitors Using Optical Biosensor Technology. Analytical Biochemistry, 2001. 291(2): p. 207-218.
36. Markgren, P.-O., et al., Relationships between Structure and Interaction Kinetics for HIV-1 Protease Inhibitors. Journal of Medicinal Chemistry, 2002. 45(25): p. 5430-5439.
37. Johnsson, B., S. Löfås, and G. Lindquist, Immobilization of proteins to a carboxymethyldextran-modified gold surface for biospecific interaction analysis in surface plasmon resonance sensors. Analytical Biochemistry, 1991. 198(2): p. 268-277.
38. Wang, R., et al., Immobilisation of DNA probes for the development of SPR-based sensing. Biosensors and Bioelectronics, 2004. 20(5): p. 967-974.
39. Dubois, L.H. and R.G. Nuzzo, Synthesis, Structure, and Properties of Model Organic Surfaces. Annual Review of Physical Chemistry, 1992. 43(1): p. 437-463.
40. Love, J.C., et al., Self-assembled monolayers of thiolates on metals as a form of nanotechnology. Chem Rev, 2005. 105(4): p. 1103-69.
41. Morton, T.A., D.G. Myszka, and I.M. Chaiken, Interpreting complex binding kinetics from optical biosensors: a comparison of analysis by linearization, the integrated rate equation, and numerical integration. Anal Biochem, 1995. 227(1): p. 176-85.
42. O'Shannessy, D.J., et al., Determination of Rate and Equilibrium Binding Constants for Macromolecular Interactions Using Surface Plasmon Resonance: Use of Nonlinear Least Squares Analysis Methods. Analytical Biochemistry, 1993. 212(2): p. 457-468.
43. Rich, R.L. and D.G. Myszka, Extracting kinetic rate constants from binding responses, in Label-Free Biosensors. 2009: Cambridge University Press.
44. Beaucage, S.L. and M.H. Caruthers, Deoxynucleoside phosphoramidites—A new class of key intermediates for deoxypolynucleotide synthesis. Tetrahedron Letters, 1981. 22(20): p. 1859-1862.
45. Ogilvie, K.K., N. Theriault, and K.L. Sadana, Synthesis of oligoribonucleotides. Journal of the American Chemical Society, 1977. 99(23): p. 7741-7743.
46. Usman, N., et al., The automated chemical synthesis of long oligoribonucleotides using 2'-O-silylated ribonucleoside 3'-O-phosphoramidites on a controlled-pore glass support: synthesis of a 43-nucleotide sequence similar to the 3'-half molecule of an Escherichia coli formylmethionine tRNA. Journal of the American Chemical Society, 1987. 109(25): p. 7845-7854.
47. Usman, N., R.T. Pon, and K.K. Ogilvie, Preparation of ribonucleoside 3′-O-phosphoramidites and their application to the automated solid phase synthesis of oligonucleotides. Tetrahedron Letters, 1985. 26(38): p. 4567-4570.
48. Pitsch, S., et al., Reliable Chemical Synthesis of Oligoribonucleotides (RNA) with 2′-O-[(Triisopropylsilyl)oxy]methyl(2′-O-tom)-Protected Phosphoramidites. Helvetica Chimica Acta, 2001. 84(12): p. 3773-3795.
49. Pitsch, S., et al., Fast and Reliable Automated Synthesis of RNA and Partially 2′-O-Protected Precursors ('Caged RNA') Based on Two Novel, Orthogonal 2′-O-Protecting Groups, Preliminary Communication. Helvetica Chimica Acta, 1999. 82(10): p. 1753-1761.
50. Somoza, A., Protecting groups for RNA synthesis: an increasing need for selective preparative methods. Chem Soc Rev, 2008. 37(12): p. 2668-75.
51. Stutz, A., C. Höbartner, and S. Pitsch, Novel Fluoride-Labile Nucleobase-Protecting Groups for the Synthesis of 3′(2′)-O-Aminoacylated RNA Sequences. Helvetica Chimica Acta, 2000. 83(9): p. 2477-2503.
52. Welz, R. and S. Müller, 5-(Benzylmercapto)-1H-tetrazole as activator for 2′-O-TBDMS phosphoramidite building blocks in RNA synthesis. Tetrahedron Letters, 2002. 43(5): p. 795-797.
53. Caruthers, M.H., et al., Chemical synthesis of deoxyoligonucleotides by the phosphoramidite method. Methods Enzymol, 1987. 154: p. 287-313.
54. Boal, J.H., et al., Cleavage of oligodeoxyribonucleotides from controlled-pore glass supports and their rapid deprotection by gaseous amines. Nucleic Acids Res, 1996. 24(15): p. 3115-7.
55. Guzaev, A.P. and M. Manoharan, A conformationally preorganized universal solid support for efficient oligonucleotide synthesis. J Am Chem Soc, 2003. 125(9): p. 2380-1.
56. Reddy, M.P., N.B. Hanna, and F. Farooqui, Ultrafast Cleavage and Deprotection of Oligonucleotides Synthesis and Use of CAc Derivatives. Nucleosides and Nucleotides, 1997. 16(7-9): p. 1589-1598.
57. Ha, M. and V.N. Kim, Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol, 2014. 15(8): p. 509-24.
58. Treiber, T., N. Treiber, and G. Meister, Regulation of microRNA biogenesis and function. Thromb Haemost, 2012. 107(4): p. 605-10.
59. Luo, Y., Z. Guo, and L. Li, Evolutionary conservation of microRNA regulatory programs in plant flower development. Developmental Biology, 2013. 380(2): p. 133-144.
60. Altuvia, Y., et al., Clustering and conservation patterns of human microRNAs. Nucleic Acids Research, 2005. 33(8): p. 2697-2706.
61. Chen, K. and N. Rajewsky, Deep conservation of microRNA-target relationships and 3'UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harb Symp Quant Biol, 2006. 71: p. 149-56.
62. Krol, J., I. Loedige, and W. Filipowicz, The widespread regulation of microRNA biogenesis, function and decay. Nature reviews. Genetics, 2010. 11(9): p. 597-610.

63. Lee, R.C., R.L. Feinbaum, and V. Ambros, The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell, 1993. 75(5): p. 843-854.
64. Wightman, B., I. Ha, and G. Ruvkun, Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell, 1993. 75(5): p. 855-62.
65. Borchert, G.M., W. Lanier, and B.L. Davidson, RNA polymerase III transcribes human microRNAs. Nat Struct Mol Biol, 2006. 13(12): p. 1097-101.
66. Cai, X., C.H. Hagedorn, and B.R. Cullen, Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA, 2004. 10(12): p. 1957-1966.
67. Lee, Y., et al., MicroRNA genes are transcribed by RNA polymerase II. EMBO J, 2004. 23(20): p. 4051-60.
68. Pfeffer, S., et al., Identification of microRNAs of the herpesvirus family. Nat Methods, 2005. 2(4): p. 269-76.
69. Denli, A.M., et al., Processing of primary microRNAs by the Microprocessor complex. Nature, 2004. 432(7014): p. 231-5.
70. Gregory, R.I., et al., The Microprocessor complex mediates the genesis of microRNAs. Nature, 2004. 432(7014): p. 235-40.
71. Han, J., et al., The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev, 2004. 18(24): p. 3016-27.
72. Morlando, M., et al., Primary microRNA transcripts are processed co-transcriptionally. Nat Struct Mol Biol, 2008. 15(9): p. 902-909.
73. Han, J., et al., Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell, 2006. 125(5): p. 887-901.
74. Zeng, Y., Principles of micro-RNA production and maturation. Oncogene, 2006. 25(46): p. 6156-62.
75. Auyeung, V.C., et al., Beyond secondary structure: primary-sequence determinants license pri-miRNA hairpins for processing. Cell, 2013. 152(4): p. 844-58.
76. Mori, M., et al., Hippo signaling regulates microprocessor and links cell-density-dependent miRNA biogenesis to cancer. Cell, 2014. 156(5): p. 893-906.
77. Zagalak, J., New interactions between RNA-binding proteins and non-coding stem-loop RNAs, in D-CHAB. 2015, ETH Zürich: Zürich.
78. Strambio-De-Castillia, C., M. Niepel, and M.P. Rout, The nuclear pore complex: bridging nuclear transport and gene regulation. Nat Rev Mol Cell Biol, 2010. 11(7): p. 490-501.
79. Lund, E., et al., Nuclear Export of MicroRNA Precursors. Science, 2004. 303(5654): p. 95-98.
80. Bohnsack, M.T., K. Czaplinski, and D. Gorlich, Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA, 2004. 10(2): p. 185-191.
81. Park, J.E., et al., Dicer recognizes the 5' end of RNA for efficient and accurate processing. Nature, 2011. 475(7355): p. 201-5.
82. Zhang, H., et al., Single processing center models for human Dicer and bacterial RNase III. Cell, 2004. 118(1): p. 57-68.
83. Tsutsumi, A., et al., Recognition of the pre-miRNA structure by Drosophila Dicer-1. Nat Struct Mol Biol, 2011. 18(10): p. 1153-8.
84. Gu, S., et al., The loop position of shRNAs and pre-miRNAs is critical for the accuracy of dicer processing in vivo. Cell, 2012. 151(4): p. 900-11.
85. MacRae, I.J., et al., Structural Basis for Double-Stranded RNA Processing by Dicer. Science, 2006. 311(5758): p. 195-198.
86. Tian, Y., et al., A phosphate-binding pocket within the platform-PAZ-connector helix cassette of human Dicer. Mol Cell, 2014. 53(4): p. 606-16.
87. MacRae, I.J., K. Zhou, and J.A. Doudna, Structural determinants of RNA recognition and cleavage by Dicer. Nat Struct Mol Biol, 2007. 14(10): p. 934-40.

88. Lee, H.Y. and J.A. Doudna, TRBP alters human precursor microRNA processing in vitro. RNA, 2012. 18(11): p. 2012-9.
89. Fukunaga, R., et al., Dicer partner proteins tune the length of mature miRNAs in flies and mammals. Cell, 2012. 151(3): p. 533-46.
90. Wilson, R.C., et al., Dicer-TRBP Complex Formation Ensures Accurate Mammalian MicroRNA Biogenesis. Molecular Cell, 2015. 57(3): p. 397-407.
91. Chakravarthy, S., et al., Substrate-Specific Kinetics of Dicer-Catalyzed RNA Processing. Journal of Molecular Biology, 2010. 404(3): p. 392-402.
92. Kuzuoglu-Ozturk, D., et al., The GW182 protein AIN-1 interacts with PAB-1 and subunits of the PAN2-PAN3 and CCR4-NOT deadenylase complexes. Nucleic Acids Res, 2012. 40(12): p. 5651-65.
93. Huntzinger, E., et al., The interactions of GW182 proteins with PABP and deadenylases are required for both translational repression and degradation of miRNA targets. Nucleic Acids Res, 2013. 41(2): p. 978-94.
94. Fabian, M.R., et al., miRNA-mediated deadenylation is orchestrated by GW182 through two conserved motifs that interact with CCR4-NOT. Nat Struct Mol Biol, 2011. 18(11): p. 1211-7.
95. Braun, J.E., et al., GW182 proteins directly recruit cytoplasmic deadenylase complexes to miRNA targets. Mol Cell, 2011. 44(1): p. 120-33.
96. Mourelatos, Z., et al., miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev, 2002. 16(6): p. 720-8.
97. Tabara, H., et al., The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell, 1999. 99(2): p. 123-32.
98. Hammond, S.M., et al., Argonaute2, a link between genetic and biochemical analyses of RNAi. Science, 2001. 293(5532): p. 1146-50.
99. Khvorova, A., A. Reynolds, and S.D. Jayasena, Functional siRNAs and miRNAs exhibit strand bias. Cell, 2003. 115(2): p. 209-16.
100. Schwarz, D.S., et al., Asymmetry in the Assembly of the RNAi Enzyme Complex. Cell, 2003. 115(2): p. 199-208.
101. Chendrimada, T.P., et al., TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature, 2005. 436(7051): p. 740-4.
102. Maniataki, E. and Z. Mourelatos, A human, ATP-independent, RISC assembly machine fueled by pre-miRNA. Genes Dev, 2005. 19(24): p. 2979-90.
103. Daschkey, S., et al., MicroRNAs Distinguish Cytogenetic Subgroups in Pediatric AML and Contribute to Complex Regulatory Networks in AML-Relevant Pathways. PLoS ONE, 2013. 8(2): p. e56334.
104. Huntzinger, E. and E. Izaurralde, Gene silencing by microRNAs: contributions of translational repression and mRNA decay. Nat Rev Genet, 2011. 12(2): p. 99-110.
105. Guennewig, B., et al., Synthetic pre-microRNAs reveal dual-strand activity of miR-34a on TNF-α. RNA, 2013.
106. Chiang, H.R., et al., Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev, 2010. 24(10): p. 992-1009.
107. Yang, J.-S. and E.C. Lai, Alternative miRNA Biogenesis Pathways and the Interpretation of Core miRNA Pathway Mutants. Molecular Cell. 43(6): p. 892-903.
108. Schirle, N.T. and I.J. MacRae, The Crystal Structure of Human Argonaute2. Science, 2012. 336(6084): p. 1037-1040.
109. Elkayam, E., et al., The structure of human argonaute-2 in complex with miR-20a. Cell, 2012. 150(1): p. 100-10.
110. Nakanishi, K., et al., Structure of yeast Argonaute with guide RNA. Nature, 2012. 486(7403): p. 368-374.
111. Parker, J.S., S.M. Roe, and D. Barford, Structural insights into mRNA recognition from a PIWI domain-siRNA guide complex. Nature, 2005. 434(7033): p. 663-6.

112. Ma, J.B., et al., Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature, 2005. 434(7033): p. 666-70.
113. Frank, F., N. Sonenberg, and B. Nagar, Structural basis for 5'-nucleotide base-specific recognition of guide RNA by human AGO2. Nature, 2010. 465(7299): p. 818-22.
114. Parker, J.S., et al., Enhancement of the seed-target recognition step in RNA silencing by a PIWI/MID domain protein. Mol Cell, 2009. 33(2): p. 204-14.
115. Wang, Y., et al., Structure of the guide-strand-containing argonaute silencing complex. Nature, 2008. 456(7219): p. 209-13.
116. Bartel, D.P., MicroRNAs: Target Recognition and Regulatory Functions. Cell, 2009. 136(2): p. 215-233.
117. Gebert, L., Targeting the biogenesis and function of miR-122: development of analytical approaches and discovery of novel inhibitors, in D-CHAB. 2013, ETH Zürich: Zürich.
118. Fabian, M.R. and N. Sonenberg, The mechanics of miRNA-mediated gene silencing: a look under the hood of miRISC. Nat Struct Mol Biol, 2012. 19(6): p. 586-593.
119. Ding, X.C. and H. Grosshans, Repression of C. elegans microRNA targets at the initiation level of translation requires GW182 proteins. EMBO J, 2009. 28(3): p. 213-222.
120. Iwasaki, S. and Y. Tomari, Argonaute-mediated translational repression (and activation). Fly (Austin), 2009. 3(3): p. 204-6.
121. Song, J.-J., et al., Crystal Structure of Argonaute and Its Implications for RISC Slicer Activity. Science, 2004. 305(5689): p. 1434-1437.
122. Liu, J., et al., Argonaute2 is the catalytic engine of mammalian RNAi. Science, 2004. 305(5689): p. 1437-41.
123. Bernstein, E., et al., Dicer is essential for mouse development. Nat Genet, 2003. 35(3): p. 215-7.
124. Han, J., et al., Posttranscriptional crossregulation between Drosha and DGCR8. Cell, 2009. 136(1): p. 75-84.
125. Triboulet, R., et al., Post-transcriptional control of DGCR8 expression by the Microprocessor. RNA, 2009. 15(6): p. 1005-11.
126. Tang, X., et al., Glycogen synthase kinase 3 beta (GSK3beta) phosphorylates the RNAase III enzyme Drosha at S300 and S302. PLoS One, 2011. 6(6): p. e20391.
127. Tang, X., et al., Phosphorylation of the RNase III enzyme Drosha at Serine300 or Serine302 is required for its nuclear localization. Nucleic Acids Res, 2010. 38(19): p. 6610-9.
128. Wada, T., J. Kikuchi, and Y. Furukawa, Histone deacetylase 1 enhances microRNA processing via deacetylation of DGCR8. EMBO Rep, 2012. 13(2): p. 142-9.
129. Tang, X., et al., Acetylation of drosha on the N-terminus inhibits its degradation by ubiquitination. PLoS One, 2013. 8(8): p. e72503.
130. Cheng, T.L., et al., MeCP2 suppresses nuclear microRNA processing and dendritic growth by regulating the DGCR8/Drosha complex. Dev Cell, 2014. 28(5): p. 547-60.
131. Piskounova, E., et al., Determinants of MicroRNA Processing Inhibition by the Developmentally Regulated RNA-binding Protein Lin28. Journal of Biological Chemistry, 2008. 283(31): p. 21310-21314.
132. Newman, M.A., J.M. Thomson, and S.M. Hammond, Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA, 2008. 14(8): p. 1539-49.
133. Thornton, J.E. and R.I. Gregory, How does Lin28 let-7 control development and disease? Trends Cell Biol, 2012. 22(9): p. 474-82.
134. Moss, E.G. and L. Tang, Conservation of the heterochronic regulator Lin-28, its developmental expression and microRNA complementary sites. Developmental Biology, 2003. 258(2): p. 432-442.
135. Wang, X., et al., Regulation of let-7 and its target oncogenes (Review). Oncology Letters, 2012. 3(5): p. 955-960.

136. Shaik Syed Ali, P., et al., Recognition of the let-7g miRNA precursor by human Lin28B. FEBS Letters, 2012. 586(22): p. 3986-3990.
137. Wu, H., et al., A Splicing-Independent Function of SF2/ASF in MicroRNA Processing. Molecular Cell, 2010. 38(1): p. 67-77.
138. Burd, C.G. and G. Dreyfuss, RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. EMBO J, 1994. 13(5): p. 1197-204.
139. Guil, S. and J.F. Caceres, The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a. Nat Struct Mol Biol, 2007. 14(7): p. 591-6.
140. Michlewski, G. and J.F. Caceres, Antagonistic role of hnRNP A1 and KSRP in the regulation of let-7a biogenesis. Nat Struct Mol Biol, 2010. 17(8): p. 1011-8.
141. Gherzi, R., et al., The role of KSRP in mRNA decay and microRNA precursor maturation. Wiley Interdiscip Rev RNA, 2010. 1(2): p. 230-9.
142. Trabucchi, M., et al., The RNA-binding protein KSRP promotes the biogenesis of a subset of microRNAs. Nature, 2009. 459(7249): p. 1010-4.
143. Suzuki, H.I., et al., MCPIP1 ribonuclease antagonizes dicer and terminates microRNA biogenesis through precursor microRNA degradation. Mol Cell, 2011. 44(3): p. 424-36.
144. Xu, X.L., et al., FXR1P but not FMRP regulates the levels of mammalian brain-specific microRNA-9 and microRNA-124. J Neurosci, 2011. 31(39): p. 13705-9.
145. Buratti, E., et al., Nuclear factor TDP-43 can affect selected microRNA levels. FEBS J, 2010. 277(10): p. 2268-81.
146. Lebedeva, S., et al., Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol Cell, 2011. 43(3): p. 340-52.
147. Jansson, M.D. and A.H. Lund, MicroRNA and cancer. Molecular Oncology, 2012. 6(6): p. 590-610.
148. Sassen, S., E.A. Miska, and C. Caldas, MicroRNA—implications for cancer. Virchows Archiv, 2008. 452(1): p. 1-10.
149. Cheng, C.J., et al., MicroRNA silencing for cancer therapy targeted to the tumour microenvironment. Nature, 2015. 518(7537): p. 107-110.
150. Hayes, J., P.P. Peruzzi, and S. Lawler, MicroRNAs in cancer: biomarkers, functions and therapy. Trends in Molecular Medicine. 20(8): p. 460-469.
151. Calin, G.A., et al., Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A, 2004. 101(9): p. 2999-3004.
152. van Kouwenhove, M., M. Kedde, and R. Agami, MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nat Rev Cancer, 2011. 11(9): p. 644-56.
153. Lima, R.T., et al., MicroRNA regulation of core apoptosis pathways in cancer. European Journal of Cancer, 2011. 47(2): p. 163-174.
154. Negrini, M., M.S. Nicoloso, and G.A. Calin, MicroRNAs and cancer--new paradigms in molecular oncology. Curr Opin Cell Biol, 2009. 21(3): p. 470-9.
155. Urbich, C., A. Kuehbacher, and S. Dimmeler, Role of microRNAs in vascular diseases, inflammation, and angiogenesis. Cardiovasc Res, 2008. 79(4): p. 581-8.
156. Gregory, P.A., et al., MicroRNAs as regulators of epithelial-mesenchymal transition. Cell Cycle, 2008. 7(20): p. 3112-8.
157. Nicoloso, M.S., et al., MicroRNAs the micro steering wheel of tumour metastases. Nat Rev Cancer, 2009. 9(4): p. 293-302.
158. Olson, P., et al., MicroRNA dynamics in the stages of tumorigenesis correlate with hallmark capabilities of cancer. Genes Dev, 2009. 23(18): p. 2152-65.
159. Boyerinas, B., et al., The role of let-7 in cell differentiation and cancer. Endocr Relat Cancer, 2010. 17(1): p. F19-36.
160. Garzon, R., G. Marcucci, and C.M. Croce, Targeting MicroRNAs in Cancer: Rationale, Strategies and Challenges. Nature reviews. Drug discovery, 2010. 9(10): p. 775-789.
161. Lujambio, A. and S.W. Lowe, The microcosmos of cancer. Nature, 2012. 482(7385): p. 347-55.

162. Ota, A., et al., Identification and characterization of a novel gene, C13orf25, as a target for 13q31-q32 amplification in malignant lymphoma. Cancer Res, 2004. 64(9): p. 3087-95.
163. Barh, D., et al., Microrna let-7: an emerging next-generation cancer therapeutic. Current Oncology, 2010. 17(1): p. 70-80.
164. O'Donnell, K.A., et al., c-Myc-regulated microRNAs modulate E2F1 expression. Nature, 2005. 435(7043): p. 839-843.
165. Jopling, C., Liver-specific microRNA-122: Biogenesis and function. RNA Biol, 2012. 9(2): p. 137-42.
166. Gebert, L.F., et al., Miravirsen (SPC3649) can inhibit the biogenesis of miR-122. Nucleic Acids Res, 2014. 42(1): p. 609-21.
167. Janssen, H.L., et al., Treatment of HCV infection by targeting microRNA. N Engl J Med, 2013. 368(18): p. 1685-94.
168. Shimakami, T., et al., Stabilization of hepatitis C virus RNA by an Ago2–miR-122 complex. Proceedings of the National Academy of Sciences, 2012. 109(3): p. 941-946.
169. Underwood, J.G., et al., Homologues of the Caenorhabditis elegans Fox-1 protein are neuronal splicing regulators in mammals. Mol Cell Biol, 2005. 25(22): p. 10005-16.
170. Jin, Y., et al., A vertebrate RNA-binding protein Fox-1 regulates tissue-specific splicing via the pentanucleotide GCAUG. EMBO J, 2003. 22(4): p. 905-12.
171. Kim, K.K., R.S. Adelstein, and S. Kawamoto, Identification of neuronal nuclei (NeuN) as Fox-3, a new member of the Fox-1 gene family of splicing factors. J Biol Chem, 2009. 284(45): p. 31052-61.
172. Kuroyanagi, H., Fox-1 family of RNA-binding proteins. Cell Mol Life Sci, 2009. 66(24): p. 3895-907.
173. Kim, K.K., et al., Rbfox3 controls the biogenesis of a subset of microRNAs. Nat Struct Mol Biol, 2014. 21(10): p. 901-10.
174. Sun, S., et al., Mechanisms of activation and repression by the alternative splicing factors RBFOX1/2. RNA, 2012. 18(2): p. 274-83.
175. Nakahata, S. and S. Kawamoto, Tissue-dependent isoforms of mammalian Fox-1 homologs are associated with tissue-specific splicing activities. Nucleic Acids Res, 2005. 33(7): p. 2078-89.
176. Yang, G., et al., Regulated Fox-2 isoform expression mediates protein 4.1R splicing during erythroid differentiation. Blood, 2008. 111(1): p. 392-401.
177. Chook, Y.M. and G. Blobel, Karyopherins and nuclear import. Curr Opin Struct Biol, 2001. 11(6): p. 703-15.
178. Boillee, S., et al., Onset and progression in inherited ALS determined by motor neurons and microglia. Science, 2006. 312(5778): p. 1389-92.
179. Ponthier, J.L., et al., Fox-2 Splicing Factor Binds to a Conserved Intron Motif to Promote Inclusion of Protein 4.1R Alternative Exon 16. Journal of Biological Chemistry, 2006. 281(18): p. 12468-12474.
180. Brudno, M., et al., Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res, 2001. 29(11): p. 2338-48.
181. Castle, J.C., et al., Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nature Genetics, 2008. 40(12): p. 1416-25.
182. Williams, A.A., et al., Impact of Sugar Pucker on Base Pair and Mispair Stability. Biochemistry, 2009. 48(50): p. 11994-12004.
183. Williamson, J.R., Induced fit in RNA-protein recognition. Nat Struct Biol, 2000. 7(10): p. 834-7.
184. Kurisaki, I., M. Takayanagi, and M. Nagaoka, Combined mechanism of conformational selection and induced fit in U1A-RNA molecular recognition. Biochemistry, 2014. 53(22): p. 3646-57.
185. Fersht, A.R., The hydrogen bond in molecular recognition. Trends in Biochemical Sciences, 1987. 12(0): p. 301-304.

186. Guallar, V. and K.W. Borrelli, A binding mechanism in protein–nucleotide interactions: Implication for U1A RNA binding. Proceedings of the National Academy of Sciences of the United States of America, 2005. 102(11): p. 3954-3959.
187. Luscombe, N.M., R.A. Laskowski, and J.M. Thornton, Amino acid–base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Research, 2001. 29(13): p. 2860-2874.
188. Nobeli, I., et al., On the molecular discrimination between adenine and guanine by proteins. Nucleic Acids Research, 2001. 29(21): p. 4294-4309.
189. Zhou, H.-L. and H. Lou, Repression of Prespliceosome Complex Formation at Two Distinct Steps by Fox-1/Fox-2 Proteins. Molecular and Cellular Biology, 2008. 28(17): p. 5507-5516.
190. Zhang, C., et al., Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev, 2008. 22(18): p. 2550-63.
191. Yeo, G.W., et al., Alternative splicing events identified in human embryonic stem cells and neural progenitors. PLoS Comput Biol, 2007. 3(10): p. 1951-67.
192. Baraniak, A.P., J.R. Chen, and M.A. Garcia-Blanco, Fox-2 mediates epithelial cell-specific fibroblast growth factor receptor 2 exon choice. Mol Cell Biol, 2006. 26(4): p. 1209-22.
193. Minovitsky, S., et al., The splicing regulatory element, UGCAUG, is phylogenetically and spatially conserved in introns that flank tissue-specific alternative exons. Nucleic Acids Res, 2005. 33(2): p. 714-24.
194. Kuroyanagi, H., et al., The Fox-1 family and SUP-12 coordinately regulate tissue-specific alternative splicing in vivo. Mol Cell Biol, 2007. 27(24): p. 8612-21.
195. Hakim, N.H., et al., Alternative splicing of Mef2c promoted by Fox-1 during neural differentiation in P19 cells. Genes Cells, 2010. 15(3): p. 255-267.
196. Gallagher, T.L., et al., Rbfox-regulated alternative splicing is critical for zebrafish cardiac and skeletal muscle functions. Dev Biol, 2011. 359(2): p. 251-61.
197. Jangi, M., et al., Rbfox2 controls autoregulation in RNA-binding protein networks. Genes Dev, 2014. 28(6): p. 637-51.
198. Huh, G.S. and R.O. Hynes, Regulation of alternative pre-mRNA splicing by a novel repeated hexanucleotide element. Genes Dev, 1994. 8(13): p. 1561-74.
199. Emeson, R.B., et al., Alternative production of calcitonin and CGRP mRNA is regulated at the calcitonin-specific splice acceptor. Nature, 1989. 341(6237): p. 76-80.
200. Hedjran, F., et al., Control of alternative pre-mRNA splicing by distributed pentameric repeats. Proc Natl Acad Sci U S A, 1997. 94(23): p. 12343-7.
201. Zhou, H.L., A.P. Baraniak, and H. Lou, Role for Fox-1/Fox-2 in mediating the neuronal pathway of calcitonin/calcitonin gene-related peptide alternative RNA processing. Mol Cell Biol, 2007. 27(3): p. 830-41.
202. Kawamoto, S., Neuron-specific alternative splicing of nonmuscle myosin II heavy chain-B pre-mRNA requires a cis-acting intron sequence. J Biol Chem, 1996. 271(30): p. 17613-6.
203. Ule, J., et al., An RNA map predicting Nova-dependent splicing regulation. Nature, 2006. 444(7119): p. 580-586.
204. Llorian, M., et al., Position-dependent alternative splicing activity revealed by global profiling of alternative splicing events regulated by PTB. Nat Struct Mol Biol, 2010. 17(9): p. 1114-1123.
205. Sickmier, E.A., et al., Structural basis for polypyrimidine tract recognition by the essential pre-mRNA splicing factor U2AF65. Mol Cell, 2006. 23(1): p. 49-59.
206. Sharma, S., A.M. Falick, and D.L. Black, Polypyrimidine tract binding protein blocks the 5' splice site-dependent assembly of U2AF and the prespliceosomal E complex. Mol Cell, 2005. 19(4): p. 485-96.
207. Black, D.L., Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem, 2003. 72: p. 291-336.

208. Horowitz, D.S., The mechanism of the second step of pre-mRNA splicing. Wiley interdisciplinary reviews. RNA, 2012. 3(3): p. 331-50.
209. Matera, A.G. and Z. Wang, A day in the life of the spliceosome. Nat Rev Mol Cell Biol, 2014. 15(2): p. 108-121.
210. Jamison, S.F., A. Crow, and M.A. Garcia-Blanco, The spliceosome assembly pathway in mammalian extracts. Molecular and Cellular Biology, 1992. 12(10): p. 4279-4287.
211. Seraphin, B. and M. Rosbash, Identification of functional U1 snRNA-pre-mRNA complexes committed to spliceosome assembly and splicing. Cell, 1989. 59(2): p. 349-358.
212. Fukumura, K., et al., U1-independent pre-mRNA splicing contributes to the regulation of alternative splicing. Nucleic Acids Res, 2009. 37(6): p. 1907-14.
213. Fukumura, K., et al., Tissue-specific splicing regulator Fox-1 induces exon skipping by interfering E complex formation on the downstream intron of human F1gamma gene. Nucleic Acids Res, 2007. 35(16): p. 5303-11.
214. Carey, J., P.T. Lowary, and O.C. Uhlenbeck, Interaction of R17 coat protein with synthetic variants of its ribonucleic acid binding site. Biochemistry, 1983. 22(20): p. 4723-4730.
215. Johansson, H.E., L. Liljas, and O.C. Uhlenbeck, RNA Recognition by the MS2 Phage Coat Protein. Seminars in Virology, 1997. 8(3): p. 176-185.
216. Hua, Y., et al., Antisense masking of an hnRNP A1/A2 intronic splicing silencer corrects SMN2 splicing in transgenic mice. Am J Hum Genet, 2008. 82(4): p. 834-48.
217. Lambert, N., et al., RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol Cell, 2014. 54(5): p. 887-900.
218. Yeo, G.W., et al., An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol, 2009. 16(2): p. 130-7.
219. Amrane, S., et al., Backbone-independent nucleic acid binding by splicing factor SUP-12 reveals key aspects of molecular recognition. Nat Commun, 2014. 5.
220. Kuwasako, K., et al., RBFOX and SUP-12 sandwich a G base to cooperatively regulate tissue-specific splicing. Nat Struct Mol Biol, 2014. 21(9): p. 778-86.
221. Mackereth, C.D., Splicing factor SUP-12 and the molecular complexity of apparent cooperativity. Worm, 2014. 3(4): p. e991240.
222. Zhou, F., et al., Genome-scale proteome quantification by DEEP SEQ mass spectrometry. Nat Commun, 2013. 4: p. 2171.
223. Li, X.Y., X.S. Cui, and N.H. Kim, Transcription profile during maternal to zygotic transition in the mouse embryo. Reprod Fertil Dev, 2006. 18(6): p. 635-45.
224. Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-41.
225. Vermeulen, A., et al., The contributions of dsRNA structure to Dicer specificity and efficiency. RNA, 2005. 11(5): p. 674-82.
226. Ma, J.-B., K. Ye, and D.J. Patel, Structural basis for overhang-specific small interfering RNA recognition by the PAZ domain. Nature, 2004. 429(6989): p. 318-322.
227. Oberstrass, F.C., et al., Shape-specific recognition in the structure of the Vts1p SAM domain with RNA. Nat Struct Mol Biol, 2006. 13(2): p. 160-167.
228. Ramos, A., et al., RNA recognition by a Staufen double-stranded RNA-binding domain. EMBO J, 2000. 19(5): p. 997-1009.
229. Auweter, S.D., F.C. Oberstrass, and F.H.-T. Allain, Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Research, 2006. 34(17): p. 4943-4959.
230. Wilbert, M.L., et al., LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Molecular cell, 2012. 48(2): p. 195-206.
231. Stefani, G., et al., A novel mechanism of LIN-28 regulation of let-7 microRNA expression revealed by in vivo HITS-CLIP in C. elegans. RNA, 2015. 21(5): p. 985-96.
232. Shibata, H., D.P. Huynh, and S.M. Pulst, A novel protein with RNA-binding motifs interacts with ataxin-2. Hum Mol Genet, 2000. 9(9): p. 1303-13.

233. Brummer, A., et al., Modeling the binding specificity of the RNA-binding protein GLD-1 suggests a function of coding region-located sites in translational repression. RNA, 2013. 19(10): p. 1317-26.
234. Wright, J.E., et al., A quantitative RNA code for mRNA target selection by the germline fate determinant GLD-1. EMBO J, 2011. 30(3): p. 533-45.
235. Gupta, A. and M. Gribskov, The Role of RNA Sequence and Structure in RNA–Protein Interactions. Journal of Molecular Biology, 2011. 409(4): p. 574-587.
236. Hasan, A., et al., Systematic Analysis of the Role of RNA-Binding Proteins in the Regulation of RNA Stability. PLoS Genet, 2014. 10(11): p. e1004684.
237. Vo, D.T., et al., The oncogenic RNA-binding protein Musashi1 is regulated by HuR via mRNA translation and stability in glioblastoma cells. Mol Cancer Res, 2012. 10(1): p. 143-55.
238. Long, R.M., et al., An Exclusively Nuclear RNA-Binding Protein Affects Asymmetric Localization of ASH1 mRNA and Ash1p in Yeast. The Journal of Cell Biology, 2001. 153(2): p. 307-318.
239. Tuerk, C. and L. Gold, Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science, 1990. 249(4968): p. 505-10.
240. Ellington, A.D. and J.W. Szostak, In vitro selection of RNA molecules that bind specific ligands. Nature, 1990. 346(6287): p. 818-822.
241. Stoltenburg, R., C. Reinemann, and B. Strehlitz, SELEX—A (r)evolutionary method to generate high-affinity nucleic acid ligands. Biomolecular Engineering, 2007. 24(4): p. 381-403.
242. Hafner, M., et al., Transcriptome-wide Identification of RNA-Binding Protein and MicroRNA Target Sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-141.
243. Jensen, K.B. and R.B. Darnell, CLIP: crosslinking and immunoprecipitation of in vivo RNA targets of RNA-binding proteins. Methods Mol Biol, 2008. 488: p. 85-98.
244. Weyn-Vanhentenryck, S.M., et al., HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism. Cell reports, 2014. 6(6): p. 1139-52.
245. Davis, J.H. and J.W. Szostak, Isolation of high-affinity GTP aptamers from partially structured RNA libraries. Proceedings of the National Academy of Sciences, 2002. 99(18): p. 11616-11621.
246. Ashrafuzzaman, M., Aptamers as both drugs and drug-carriers. Biomed Res Int, 2014. 2014: p. 697923.
247. James, W., Aptamers, in Encyclopedia of Analytical Chemistry, R.A. Meyers, Editor. 2000, John Wiley & Sons Ltd: Chichester. p. 4848–4871.
248. Milligan, J.F., et al., Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Research, 1987. 15(21): p. 8783-8798.
249. Sousa, R. and S. Mukherjee, T7 RNA Polymerase, in Progress in Nucleic Acid Research and Molecular Biology. 2003, Academic Press. p. 1-41.
250. Chamberlin, M., J. McGrath, and L. Waskell, New RNA polymerase from Escherichia coli infected with bacteriophage T7. Nature, 1970. 228(5268): p. 227-31.
251. Liu, J. and G.D. Stormo, Combining SELEX with quantitative assays to rapidly obtain accurate models of protein–DNA interactions. Nucleic Acids Research, 2005. 33(17): p. e141-e141.
252. Liu, J. and G.D. Stormo, Combining SELEX with quantitative assays to rapidly obtain accurate models of protein–DNA interactions. Nucleic Acids Research, 2005. 33(17): p. e141.
253. Tombelli, S., et al., Aptamer-based biosensors for the detection of HIV-1 Tat protein. Bioelectrochemistry, 2005. 67(2): p. 135-141.
254. Bianchini, M., et al., Specific oligobodies against ERK-2 that recognize both the native and the denatured state of the protein. Journal of Immunological Methods, 2001. 252(1–2): p. 191-197.
255. Lupold, S.E., et al., Identification and Characterization of Nuclease-stabilized RNA Molecules That Bind Human Prostate Cancer Cells via the Prostate-specific Membrane Antigen. Cancer Research, 2002. 62(14): p. 4029-4033.

256. Stoltenburg, R., C. Reinemann, and B. Strehlitz, FluMag-SELEX as an advantageous method for DNA aptamer selection. Analytical and Bioanalytical Chemistry, 2005. 383(1): p. 83-91.
257. Haukanes, B.I. and C. Kvam, Application of magnetic beads in bioassays. Biotechnology (N Y), 1993. 11(1): p. 60-3.
258. Olsvik, O., et al., Magnetic separation techniques in diagnostic microbiology. Clin Microbiol Rev, 1994. 7(1): p. 43-54.
259. Wilson, R., Preparation of Single-Stranded DNA from PCR Products with Streptavidin Magnetic Beads. Nucleic Acid Therapeutics, 2011. 21(6): p. 437-440.
260. Marshall, K.A. and A.D. Ellington, [14] In vitro selection of RNA aptamers, in Methods in Enzymology. 2000, Academic Press. p. 193-214.
261. Johnson, L. and P.D. Gershon, RNA binding characteristics and overall topology of the vaccinia poly(A) polymerase-processivity factor-primer complex. Nucleic Acids Research, 1999. 27(13): p. 2708-2721.
262. Ule, J., et al., CLIP Identifies Nova-Regulated RNA Networks in the Brain. Science, 2003. 302(5648): p. 1212-1215.
263. Darnell, R.B., HITS-CLIP: panoramic views of protein–RNA regulation in living cells. Wiley Interdisciplinary Reviews - RNA, 2010. 1(2): p. 266-286.
264. Licatalosi, D.D., et al., HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature, 2008. 456(7221): p. 464-469.
265. Pfeffer, S., et al., Identification of microRNAs of the herpesvirus family. Nat Meth, 2005. 2(4): p. 269-276.
266. Ascano, M., et al., Identification of RNA–protein interaction networks using PAR-CLIP. Wiley interdisciplinary reviews. RNA, 2012. 3(2): p. 159-177.
267. Hafner, M., et al., Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 2010. 141(1): p. 129-141.
268. Fecko, C.J., et al., Comparison of Femtosecond Laser and Continuous Wave UV Sources for Protein–Nucleic Acid Crosslinking. Photochemistry and Photobiology, 2007. 83(6): p. 1394-1404.
269. Zhuang, F., et al., Structural bias in T4 RNA ligase-mediated 3'-adapter ligation. Nucleic Acids Res, 2012. 40(7): p. e54.
270. Konig, J., et al., Protein-RNA interactions: new genomic technologies and perspectives. Nat Rev Genet, 2011. 13(2): p. 77-83.
271. König, J., et al., Protein–RNA interactions: new genomic technologies and perspectives. Nat Rev Genet, 2012. 13(2): p. 77-83.
272. Copeland, R.A., D.L. Pompliano, and T.D. Meek, Drug-target residence time and its implications for lead optimization. Nat Rev Drug Discov, 2006. 5(9): p. 730-739.
273. Lu, H. and P.J. Tonge, Drug–target residence time: critical information for lead optimization. Current Opinion in Chemical Biology, 2010. 14(4): p. 467-474.
274. Pan, A.C., et al., Molecular determinants of drug–receptor binding kinetics. Drug Discovery Today, 2013. 18(13–14): p. 667-673.
275. Swinney, D.C., The role of binding kinetics in therapeutically useful drug action. Curr Opin Drug Discov Devel, 2009. 12(1): p. 31-9.
276. Towbin, H., et al., Systematic screens of proteins binding to synthetic microRNA precursors. Nucleic Acids Research, 2013. 41(3): p. e47.
277. Michlewski, G., et al., Posttranscriptional Regulation of miRNAs Harboring Conserved Terminal Loops. Molecular Cell, 2008. 32(3): p. 383-393.
278. Pitsch, S., P.A. Weiss, and L. Jenny, Nucleoside with triple substituted silyloxymethyl group as a protection group on the 2' oxygen; protection groups are not subject to isomerization and give higher coupling yields. 1999, Google Patents.
279. Sproat, B.S., RNA synthesis using 2'-O-(tert-butyldimethylsilyl) protection, in Oligonucleotide Synthesis - Methods and Applications, P. Herdewijn, Editor. 2005, Humana Press: New York.

280. Tataurov, A.V., Y. You, and R. Owczarzy, Predicting ultraviolet spectrum of single stranded and double stranded deoxyribonucleic acids. Biophysical Chemistry, 2008. 133(1–3): p. 66-70.
281. Michel, E. and K. Wuthrich, High-yield Escherichia coli-based cell-free expression of human proteins. J Biomol NMR, 2012. 53(1): p. 43-51.
282. Hochuli, E., et al., Genetic Approach to Facilitate Purification of Recombinant Proteins with a Novel Metal Chelate Adsorbent. Nat Biotech, 1988. 6(11): p. 1321-1325.
283. Chapman-Smith, A., et al., The C-terminal domain of biotin protein ligase from E. coli is required for catalytic activity. Protein Sci, 2001. 10(12): p. 2608-17.
284. Lee, M.H. and T. Schedl, Identification of in vivo mRNA targets of GLD-1, a maxi-KH motif containing protein required for C. elegans germ cell development. Genes Dev, 2001. 15(18): p. 2408-20.
285. Lehmann-Blount, K.A. and J.R. Williamson, Shape-specific Nucleotide Binding of Single-stranded RNA by the GLD-1 STAR Domain. Journal of Molecular Biology, 2005. 346(1): p. 91-104.
286. Kritikou, E.A., et al., C. elegans GLA-3 is a novel component of the MAP kinase MPK-1 signaling pathway required for germ cell survival. Genes Dev, 2006. 20(16): p. 2279-92.
287. Piran, U. and W.J. Riordan, Dissociation rate constant of the biotin-streptavidin complex. J Immunol Methods, 1990. 133(1): p. 141-3.
288. Guennewig, B., et al., Properties of N(4)-methylated cytidines in miRNA mimics. Nucleic Acid Ther, 2012. 22(2): p. 109-16.
289. Rich, R.L., et al., Kinetic analysis of estrogen receptor/ligand interactions. Proc Natl Acad Sci U S A, 2002. 99(13): p. 8562-7.
290. Remko, M., P.T. Van Duijnen, and M. Swart, Theoretical study of molecular structure, tautomerism, and geometrical isomerism of N-methyl- and N-phenyl-substituted cyclic imidazolines, oxazolines, and thiazolines. Structural Chemistry, 2003. 14(3): p. 271-278.
291. Tomic, K., J. Tatchen, and C.M. Marian, Quantum chemical investigation of the electronic spectra of the keto, enol, and keto-imine tautomers of cytosine. J Phys Chem A, 2005. 109(37): p. 8410-8.
292. Mason, S.F., The Tautomerism of N-Heteroaromatic Amines. Journal of the Chemical Society, 1959(Mar): p. 1281-1288.
293. Sorescu, D.A., et al., CARNA—alignment of RNA structure ensembles. Nucleic Acids Research, 2012. 40(W1): p. W49-W53.
294. Dal Palù, A., M. Möhl, and S. Will, A Propagator for Maximum Weight String Alignment with Arbitrary Pairwise Dependencies, in Principles and Practice of Constraint Programming – CP 2010, D. Cohen, Editor. 2010, Springer Berlin Heidelberg. p. 167-175.
295. Lunde, B.M., C. Moore, and G. Varani, RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol, 2007. 8(6): p. 479-90.
296. Fu, Y., et al., MBNL1-RNA recognition: contributions of MBNL1 sequence and RNA conformation. Chembiochem, 2012. 13(1): p. 112-9.
297. Tan, R. and A.D. Frankel, Circular dichroism studies suggest that TAR RNA changes conformation upon specific binding of arginine or guanidine. Biochemistry, 1992. 31(42): p. 10288-10294.
298. Circular Dichroism: Principles and Applications. 2nd ed. 2000, New York: Wiley-VCH. 912.
299. Fox, J.W. and K.P. Wong, Changes in the conformation and stability of 5 S RNA upon the binding of ribosomal proteins. Journal of Biological Chemistry, 1978. 253(1): p. 18-20.
300. Kelly, S.M. and N.C. Price, The Use of Circular Dichroism in the Investigation of Protein Structure and Function. Current Protein & Peptide Science, 2000. 1(4): p. 349-384.
301. Kim, I., et al., Rapid purification of RNAs using fast performance liquid chromatography (FPLC). RNA, 2007. 13(2): p. 289-294.

302. Gao, F.B., et al., Selection of a subset of mRNAs from combinatorial 3' untranslated region libraries using neuronal RNA-binding protein Hel-N1. Proc Natl Acad Sci U S A, 1994. 91(23): p. 11207-11.
303. Levine, T.D., et al., Hel-N1: an autoimmune RNA-binding protein with specificity for 3' uridylate-rich untranslated regions of growth factor mRNAs. Mol Cell Biol, 1993. 13(6): p. 3494-504.
304. Hiller, M., et al., Using RNA secondary structures to guide sequence motif finding towards single-stranded regions. Nucleic Acids Res, 2006. 34(17): p. e117.
305. Mergny, J.L. and L. Lacroix, Analysis of thermal melting curves. Oligonucleotides, 2003. 13(6): p. 515-37.
306. De Mesmaeker, A., et al., Antisense Oligonucleotides. Accounts of Chemical Research, 1995. 28(9): p. 366-374.
307. Borer, P.N., et al., Stability of ribonucleic acid double-stranded helices. J Mol Biol, 1974. 86(4): p. 843-53.
308. Huang, M., et al., Improvement of DNA and RNA Sugar Pucker Profiles from Semiempirical Quantum Methods. Journal of Chemical Theory and Computation, 2014. 10(4): p. 1538-1545.
309. Eichhorn, C.D., et al., Unraveling the structural complexity in a single-stranded RNA tail: implications for efficient ligand binding in the prequeuosine riboswitch. Nucleic Acids Res, 2012. 40(3): p. 1345-55.
310. Broyde, S.B., et al., Classical potential energy calculations for ApA, CpC, GpG, and UpU. The influence of the bases on RNA subunit conformations. Biopolymers, 1975. 14(8): p. 1597-1613.
311. Gao, X.L. and D.J. Patel, Antitumour drug-DNA interactions: NMR studies of echinomycin and chromomycin complexes. Q Rev Biophys, 1989. 22(2): p. 93-138.
312. Yi-Brunozzi, H.Y., et al., A ribose sugar conformational switch in the LTR-retrotransposon Ty3 polypurine tract-containing RNA/DNA hybrid. J Am Chem Soc, 2005. 127(47): p. 16344-5.
313. Nishinaka, T., et al., Base pair switching by interconversion of sugar puckers in DNA extended by proteins of RecA-family: a model for homology search in homologous genetic recombination. Proceedings of the National Academy of Sciences of the United States of America, 1998. 95(19): p. 11071-6.
314. Newman, M.A., J.M. Thomson, and S.M. Hammond, Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA, 2008. 14(8): p. 1539-1549.
315. Yang, X., et al., Double-negative feedback loop between reprogramming factor LIN28 and microRNA let-7 regulates aldehyde dehydrogenase 1-positive cancer stem cells. Cancer Res, 2010. 70(22): p. 9463-72.
316. Damianov, A. and D.L. Black, Autoregulation of Fox protein expression to produce dominant negative splicing factors. RNA, 2010. 16(2): p. 405-16.
317. Koscianska, E., et al., High-resolution northern blot for a reliable analysis of microRNAs and their precursors. ScientificWorldJournal, 2011. 11: p. 102-17.
318. Michel, E. and K. Wuthrich, High-yield Escherichia coli-based cell-free expression of human proteins. J Biomol NMR. 53(1): p. 43-51.
319. Dogar, A.M., H. Towbin, and J. Hall, Suppression of latent transforming growth factor (TGF)-beta1 restores growth inhibitory TGF-beta signaling through microRNAs. J Biol Chem, 2011. 286(18): p. 16447-58.
320. Martin, F.H., O.C. Uhlenbeck, and P. Doty, Self-complementary oligoribonucleotides: adenylic acid-uridylic acid block copolymers. J Mol Biol, 1971. 57(2): p. 201-15.
321. Savitzky, A. and M.J.E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 1964. 36(8): p. 1627-1639.
322. Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol, 2009. 10(3): p. R25.

323. Anders, S. and W. Huber, Differential expression analysis for sequence count data. Genome Biol. 11(10): p. R106.
324. Micura, R., et al., Methylation of the nucleobases in RNA oligonucleotides mediates duplex-hairpin conversion. Nucleic Acids Res., 2001. 29: p. 3997.
325. Chiu, Y.L. and T.M. Rana, siRNA function in RNAi: a chemical modification analysis. RNA, 2003. 9(9): p. 1034-48.
326. Gu, S., et al., Thermodynamic stability of small hairpin RNAs highly influences the loading process of different mammalian Argonautes. Proc. Natl. Acad. Sci. U.S.A., 2011. 108: p. 9208.
327. MacMillan, A.M. and G.L. Verdine, Synthesis of Functionally Tethered Oligodeoxynucleotides by the Convertible Nucleoside Approach. J. Org. Chem., 1990. 55: p. 5931.
328. Shah, K., H. Wu, and T.M. Rana, Synthesis of Uridine Phosphoramidite Analogs: Reagents for Site-Specific Incorporation of Photoreactive Sites into RNA Sequences. Bioconjugate Chem., 1994. 5: p. 508.
329. Davis, A.R. and B.M. Znosko, Thermodynamic Characterization of Single Mismatches Found in Naturally Occurring RNA. Biochemistry, 2007. 46(46): p. 13425-13436.
330. Engel, J.D. and P.H. Von Hippel, Effects of methylation on the stability of nucleic acid conformations. Monomer level. Biochemistry, 1974. 13: p. 4143.
331. Allerson, C.R., S.L. Chen, and G.L. Verdine, A Chemical Method for Site-Specific Modification of RNA: The Convertible Nucleoside Approach. Journal of the American Chemical Society, 1997. 119(32): p. 7423-7433.
332. Grasby, J.A., et al., Synthesis and applications of oligoribonucleotides containing N4-methylcytidines. Nucleosides, Nucleotides Nucleic Acids, 1995. 14: p. 1129.
333. Kierzek, R., M.E. Burkard, and D.H. Turner, Thermodynamics of Single Mismatches in RNA Duplexes. Biochemistry, 1999. 38: p. 14214.
334. O'Donnell, K.A., et al., c-Myc-regulated microRNAs modulate E2F1 expression. Nature, 2005. 435: p. 839.
335. Hermeking, H., The miR-34 family in cancer and apoptosis. Cell Death Differ, 2009. 17(2): p. 193-199.
336. Dogar, A.M., H. Towbin, and J. Hall, Suppression of latent TGF-beta1 restores growth inhibitory TGF-beta signaling through microRNAs. Journal of Biological Chemistry, 2011.
337. Ivanovska, I., et al., MicroRNAs in the miR-106b Family Regulate p21/CDKN1A and Promote Cell Cycle Progression. Mol. Cell. Biol., 2008. 7: p. 2167.
338. Yamakuchi, M., M. Ferlito, and C.J. Lowenstein, miR-34a repression of SIRT1 regulates apoptosis. Proc. Natl. Acad. Sci. U.S.A., 2008. 36: p. 13421.
339. Jackson, A.L., et al., Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA, 2006. 12: p. 1179.
340. Manoharan, M., et al., Unique gene-silencing and structural properties of 2'-fluoro-modified siRNAs. Angew. Chem., Int. Ed., 2011. 50: p. 2284.
341. Samols, D., et al., Evolutionary conservation among biotin enzymes. J Biol Chem, 1988. 263(14): p. 6461-4.
342. Tong, L., Structure and function of biotin-dependent carboxylases. Cell Mol Life Sci, 2013. 70(5): p. 863-91.
343. Campbell, J.W. and J.E. Cronan, Jr., Bacterial fatty acid biosynthesis: targets for antibacterial drug discovery. Annu Rev Microbiol, 2001. 55: p. 305-32.
344. Eisenberg, M.A., O. Prakash, and S.C. Hsiung, Purification and properties of the biotin repressor. A bifunctional protein. J Biol Chem, 1982. 257(24): p. 15167-73.
345. Choi-Rhee, E. and J.E. Cronan, The biotin carboxylase-biotin carboxyl carrier protein complex of Escherichia coli acetyl-CoA carboxylase. J Biol Chem, 2003. 278(33): p. 30806-12.
346. Fall, R.R. and P.R. Vagelos, Biotin carboxyl carrier protein from Escherichia coli. Methods Enzymol, 1975. 35: p. 17-25.

347. Green, N.M., Avidin. 1. The Use of (14-C)Biotin for Kinetic Studies and for Assay. Biochem J, 1963. 89: p. 585-91.
348. Hofmann, K., et al., Biotinylinsulins as potential tools for receptor studies. Proc Natl Acad Sci U S A, 1977. 74(7): p. 2697-700.
349. de Boer, E., et al., Efficient biotinylation and single-step purification of tagged transcription factors in mammalian cells and transgenic mice. Proc Natl Acad Sci U S A, 2003. 100(13): p. 7480-5.
350. Chattopadhaya, S., L.P. Tan, and S.Q. Yao, Strategies for site-specific protein biotinylation using in vitro, in vivo and cell-free systems: toward functional protein arrays. Nat Protoc, 2006. 1(5): p. 2386-98.
351. Fernandez-Suarez, M., T.S. Chen, and A.Y. Ting, Protein-protein interaction detection in vitro and in cells by proximity biotinylation. J Am Chem Soc, 2008. 130(29): p. 9251-3.
352. Elia, G., Protein biotinylation. Curr Protoc Protein Sci, 2010. Chapter 3: p. Unit 3 6.
353. Kay, B.K., S. Thai, and V.V. Volgina, High-throughput biotinylation of proteins. Methods Mol Biol, 2009. 498: p. 185-96.
354. Agafonov, D.E., et al., C-terminal modifications of a protein by UAG-encoded incorporation of puromycin during in vitro protein synthesis in the absence of release factor 1. Chembiochem, 2006. 7(2): p. 330-6.
355. Watanabe, T., et al., Position-specific incorporation of biotinylated non-natural amino acids into a protein in a cell-free translation system. Biochem Biophys Res Commun, 2007. 361(3): p. 794-9.
356. Lesaicherre, M.L., et al., Intein-mediated biotinylation of proteins and its application in a protein microarray. J Am Chem Soc, 2002. 124(30): p. 8768-9.
357. Taki, M., S.Y. Sawata, and K. Taira, Specific N-terminal biotinylation of a protein in vitro by a chemically modified tRNA(fmet) can support the native activity of the translated protein. J Biosci Bioeng, 2001. 92(2): p. 149-53.
358. Lue, R.Y., et al., Versatile protein biotinylation strategies for potential high-throughput proteomics. J Am Chem Soc, 2004. 126(4): p. 1055-62.
359. Schatz, P.J., Use of peptide libraries to map the substrate specificity of a peptide-modifying enzyme: a 13 residue consensus peptide specifies biotinylation in Escherichia coli. Biotechnology (N Y), 1993. 11(10): p. 1138-43.
360. Cull, M.G. and P.J. Schatz, Biotinylation of proteins in vivo and in vitro using small peptide tags. Methods Enzymol, 2000. 326: p. 430-40.
361. Beckett, D., E. Kovaleva, and P.J. Schatz, A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation. Protein Sci, 1999. 8(4): p. 921-9.
362. Clery, A., M. Blatter, and F.H. Allain, RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol, 2008. 18(3): p. 290-8.
363. Valverde, R., L. Edwards, and L. Regan, Structure and function of KH domains. FEBS J, 2008. 275(11): p. 2712-26.
364. Masliah, G., P. Barraud, and F.H. Allain, RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence. Cell Mol Life Sci, 2013. 70(11): p. 1875-95.
365. Hall, T.M., Multiple modes of RNA recognition by zinc finger proteins. Curr Opin Struct Biol, 2005. 15(3): p. 367-73.
366. Li, S.J. and J.E. Cronan, Jr., The gene encoding the biotin carboxylase subunit of Escherichia coli acetyl-CoA carboxylase. J Biol Chem, 1992. 267(2): p. 855-63.
367. Baba, T., et al., Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol Syst Biol, 2006. 2: p. 2006 0008.
368. Michel, E., et al., Amino acid-selective segmental isotope labeling of multidomain proteins for structural biology. Chembiochem, 2013. 14(4): p. 457-66.
369. Howarth, M., et al., A monovalent streptavidin with a single femtomolar biotin binding site. Nat Methods, 2006. 3(4): p. 267-73.


370. Michel, E. and K. Wüthrich, Cell-free expression of disulfide-containing eukaryotic proteins for structural biology. FEBS J, 2012. 279(17): p. 3176-84.
371. Laemmli, U.K., Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 1970. 227(5259): p. 680-5.
372. Keller, R., The Computer Aided Resonance Assignment Tutorial. 2004, Goldau, Switzerland: Cantina Verlag.


CURRICULUM VITAE: MORITZ STOLTZ

Personal Information:

Karstlernstrasse 14, 8048 Zurich
Date of Birth: July 24th, 1983
Citizenship: German

Education:

10/2009 – 07/2015 PhD Thesis, ETH Zurich, Institute of Pharmaceutical Science, Switzerland, Prof. Jonathan Hall

Thesis Title: “Interactions of the alternative splicing factor RBFOX with non-coding RNAs.”

02/2008 – 03/2009 Master Studies, Chemistry, University of Basel, Basel, Switzerland
Master thesis under supervision of Prof. B. Giese, Institute of Organic Chemistry

Thesis Title: “Direction dependence of electron transport through peptides”

Completed with Master in Science

02/2007 – 12/2007 Master Studies, Chemistry, University of Otago, Dunedin, New Zealand

Completed with the Postgraduate Diploma in Science with Credit

10/2003 – 12/2006 Diploma Studies, Chemistry, University of Kiel, Kiel, Germany

Intermediate Diploma in 2005

08/1993 – 05/2003 Grammar school, König Wilhelm Gymnasium Höxter, Germany

Work experience:

10/2009 – present ETH Zurich, Switzerland

• Development of a method to determine new RNA binding motifs of proteins
• Investigation of binding mechanisms of RNA protein complexes
• Synthesis of RNA libraries


Publications:

B. Giese, M. Wang, J. Gao, Stoltz M., P. Müller, and M. Graber, Electron Relay in Peptides, J. Org. Chem. 2009, 74 (10), 3621–3625.

Stoltz, M., Guennewig, B., Menzi, M., Dogar, A. M., Hall J., Properties of N4-Methylated Cytidines in miRNA Mimics, Nucleic Acid Ther. 2012, 22 (2), 109-116.

Stoltz, M., Zagalak J., Dogar, M., Hall J., Rationalizing non-canonical RNA-binding motifs of RBFOX, in preparation. 2015.

Afzal M. Dogar, Julian A. Zagalak, Moritz Stoltz, Andreas Brunschweiger, Harry Towbin, Jochen Imig, Erich Michel, Pejman Mohammadi, Frédéric H.-T. Allain, Jonathan Hall, RBFOX2 induces microRNA strand bias for self-regulation, in revision. 2015.

Michel E., Stoltz M., Hall J., Allain F., Rapid high yield cell free expression of quantitatively biotinylated proteins, in preparation. 2015.
