EXAMINATION OF LOCAL AND LONG DISTANCE EFFECTS OF NUCLEOTIDE SUBSTITUTIONS IN SECONDARY RNA STRUCTURES

by

Daniela Giachetti

A Thesis Submitted to the Faculty of

The Wilkes Honors College

in Partial Fulfillment of the Requirements for the Degree of

Bachelor of Science in Liberal Arts and Sciences

with a Concentration in Biology

Wilkes Honors College of

Florida Atlantic University

Jupiter, Florida

May 2019

EXAMINATION OF LOCAL AND LONG DISTANCE EFFECTS OF NUCLEOTIDE SUBSTITUTIONS IN SECONDARY RNA STRUCTURES

by

Daniela Giachetti

This thesis was prepared under the direction of the candidate’s thesis advisor, Dr. Catherine Trivigno, and has been approved by the members of her supervisory committee. It was submitted to the faculty of The Honors College and was accepted in partial fulfillment of the requirements for the degree of Bachelor of Science in Liberal Arts and Sciences.

SUPERVISORY COMMITTEE:

______Dr. Catherine Trivigno

______Dr. Monica Maldonado

______Dean Ellen Goldey, Wilkes Honors College

______Date

Acknowledgments

I would first like to thank my family for their continuous support throughout my years at the Honors College and during this thesis. Moreover, I would like to thank Dr. Trivigno for turning this idea into a reality. It has been an absolute pleasure to be mentored by someone who has remained as patient, understanding and resourceful as she has throughout my entire research. I cannot be more grateful for all her help; this truly would not have been possible without her guidance.

ABSTRACT

Author: Daniela Giachetti

Title: Examination of Local and Long Distance Effects of Nucleotide Substitutions in Secondary RNA Structures

Institution: Wilkes Honors College of Florida Atlantic University

Thesis Advisor: Dr. Catherine Trivigno

Degree: Bachelor of Science in Biology

Concentration: Biology

Year: 2019

Researchers are currently designing RNAs for diagnostic and therapeutic applications, and an understanding of the factors that affect the stability of these RNAs could inform the design of new RNA constructs. This project used in silico experiments to investigate the effects of different nucleotide substitutions on the stability of RNAs using EteRNA, a publicly available online computer modeling program developed by researchers at Carnegie Mellon and Stanford Universities to study RNA folding. It was determined that while many types of single nucleotide substitutions can alter the stability of secondary structures, such as loops, the stability of adjacent structures was not affected.

Table of Contents

Introduction
Body
Structure of RNA
Function of RNA
Riboswitches
Catalytic RNA
Secondary Structures
Relevance
Specific Aims
Methods/Materials
Results
Discussion/Future Work
Conclusion
References

List of Tables

Table 1- Energy States/Changes in Energy for Loops

List of Figures

Figure 1- Deoxyribose vs Ribose
Figure 2- Peptide Synthesis
Figure 3- Secondary RNA Structures
Figure 4- EteRNA interface/Overview of tRNA
Figure 5a- AU closing orientation w/out boost
Figure 5b- AU closing orientation w/ boost
Figure 6a- GC closing orientation w/out boost
Figure 6b- GC closing orientation w/ boost
Figure 7a- CG closing orientation w/out boost
Figure 7b- CG closing orientation w/ boost
Figure 8a- Starting Energies for GC orientation
Figure 8b- Starting Energies for CG orientation
Figure 8c- Starting Energies for AU orientation
Figure 9- Comparison of Boost Effects on Loop Sizes

Introduction:

Ribonucleic acid (RNA) is a molecule that plays many unique roles within a cell and exists in several forms. Understanding the relationship between the structure and function of RNA is fundamental to genetics as well as to medical therapies created to combat genetic diseases. In order to fully understand the processes that occur in our cells, we must consider the relationships between the structure and function of the molecules that participate in them. The discovery of the “Watson-Crick” structure of DNA in 1953, which revealed how DNA could replicate genetic information from generation to generation, initiated research on RNA. Scientists did not truly begin to examine RNA until they discovered that cells must have specific machinery to place amino acids in a specified order in proteins. Five years later, in 1958, the initial understanding of transfer RNA (tRNA) was developed, and shortly after, messenger RNA (mRNA) was hypothesized by Francois Jacob in 1959. From that point onward, researchers understood that a cell’s specific abilities depended largely on its mRNA content (James Darell, 2011). This thesis aims to highlight factors that contribute to the stability of RNA secondary structures, explored through an interactive open-source modeling program called EteRNA.

Structure

Ribonucleic acid (RNA) is a polymeric molecule that has various biological roles in coding, decoding, regulation and expression of genes. RNA is a nucleic acid made up of hundreds or thousands of smaller single-unit molecules known as nucleotides. RNA is typically a single-stranded biopolymer composed of ribonucleotides linked by phosphodiester bonds, forming strands of varying lengths. A ribonucleotide in the RNA chain is composed of three parts: a phosphate group, a five-carbon sugar (ribose) and a nitrogenous base. There are four different nitrogenous bases in RNA: adenine, uracil, guanine and cytosine. Similar to DNA, RNA exhibits complementary base pairing; adenine is complementary to uracil and guanine is complementary to cytosine. The phosphate groups and the ribose sugars form the polymer’s backbone, while the bases attach to the backbone. The ribose sugar of RNA is a cyclic five-carbon structure with a single oxygen in its ring. The hydroxyl group attached to the 2’ carbon of the ribose makes RNA more prone to hydrolysis. RNA hydrolysis is a reaction in which a phosphodiester bond in the sugar-phosphate backbone is broken, which in turn cleaves the RNA molecule (Denise Ferrier, 2017).

Although DNA and RNA both carry genetic information, there are many differences between the two molecules. First and foremost, DNA contains the sugar deoxyribose while RNA contains the sugar ribose. The difference between the two sugars is that ribose has one more hydroxyl group than deoxyribose. DNA has a hydrogen attached to the 2’ carbon instead of the hydroxyl, making DNA more stable than RNA. Another structural difference is that DNA is usually a double-stranded molecule consisting of long chains of nucleotides, whereas RNA is typically a single-stranded molecule with shorter chains of nucleotides. In addition, RNA and DNA have slightly different base pairings, since DNA contains the base thymine instead of the uracil seen in RNA. This difference in bases affects the complementary base pairing of each molecule. RNA helices are thought to be composed primarily of Watson-Crick base pairs; however, other helical forms have been observed. Watson-Crick base pairing is a model in which purines (adenine/guanine) always pair with pyrimidines (cytosine/uracil), and each purine binds to one particular pyrimidine: adenine binds to uracil while guanine binds to cytosine. This is often referred to as complementary base pairing (Denise Ferrier, 2017).
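The pairing rule described above can be written out as a short lookup. The following is a minimal sketch, not part of the thesis experiments; the function names are chosen here for illustration, and the strand is read in reverse because the two strands of a helix run antiparallel.

```python
# Watson-Crick partners in RNA: each purine pairs with one specific pyrimidine.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the strand that would pair with `strand` in an antiparallel helix."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def is_watson_crick(base_a: str, base_b: str) -> bool:
    """True when the two bases form a canonical A-U or G-C pair."""
    return RNA_COMPLEMENT.get(base_a.upper()) == base_b.upper()
```

For instance, `is_watson_crick("A", "U")` is true, while `is_watson_crick("G", "U")` is false because a G-U pair is not a Watson-Crick pair.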

Figure 1 - Deoxyribose vs. Ribose. This figure illustrates that ribonucleotides contain the pentose sugar ribose while deoxyribonucleotides contain deoxyribose. RNA contains the base uracil instead of thymine, which is found in DNA. https://courses.lumenlearning.com/microbiology/chapter/structure-and-function-of-/

Function

Aside from their structural differences, DNA and RNA also perform different functions in humans. For example, DNA is responsible for storing and transferring genetic information, while RNAs can code for amino acids as well as act as a messenger between DNA and ribosomes in order to make proteins. In short, DNA can be thought of as a “storage device” that allows the blueprint of life to be passed down through many generations, while RNA functions as the “reader” that decodes this storage device in a multi-step process, with different specialized RNAs involved in each step. Below we highlight the many different types of RNA associated with this process as well as others (Sarah Woodson, 2015).

Of the many different types of RNA molecules, the three most well-known and commonly studied are messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). These three types of RNA differ from each other in size, function and special structural modifications. Messenger RNA comprises around 5% of the RNA in a cell, yet is the most heterogeneous type of RNA in size and base sequence. mRNA is considered “coding” RNA, given that it carries genetic information from DNA for use in protein synthesis. In eukaryotes, for example, this involves transport of mRNA out of the nucleus and into the cytosol. In addition to its protein-coding regions, mRNA contains untranslated regions at its 5’ and 3’ ends. The special characteristics of mRNA include a long sequence of adenine nucleotides on the 3’ end as well as a cap on the 5’ end consisting of a molecule of 7-methylguanosine attached through an unusual triphosphate linkage. Transfer RNA is a small chain of about 80 nucleotides that assists in translation by transferring specific amino acids that correspond to the mRNA sequence. Each tRNA serves as an adaptor molecule that carries its specific amino acid, attached to its 3’ end, to the site of protein synthesis. At this site, it recognizes the genetic code sequence on an mRNA molecule that specifies which amino acid is added to the growing polypeptide chain. Interestingly, tRNA has a distinctive folded structure with three hairpin loops that form the shape of a three-leafed clover. This special shape allows tRNA to carry out its specific function of transferring the appropriate amino acid to the growing chain. tRNA makes up around 15% of the RNA in a cell. Lastly, ribosomal RNA constitutes the catalytic component of the ribosome. In the cytoplasm, rRNA and protein components come together to form a nucleoprotein complex called the ribosome, which binds mRNA and synthesizes proteins. rRNA makes up around 80% of the RNA in a cell (Denise Ferrier, 2017).

Figure 2 - Peptide Synthesis. This picture illustrates the process of peptide synthesis. The mRNA code arrives at the ribosome. Each three-letter section of the mRNA is termed a codon. Each codon is read by the ribosome and codes for a specific amino acid. The ribosome moves along the mRNA reading each codon, and a tRNA is signaled to pick up the amino acid specified by the codon. The tRNA delivers the correct amino acid to the ribosome, landing on the mRNA codon, and releases its amino acid to attach to the growing peptide chain. This process is also known as translation. http://www2.nau.edu/lrm22/lessons/protein_synthesis/protein_synthesis.htm
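The codon-by-codon reading described in Figure 2 can be illustrated with a small sketch. It is a simplification under stated assumptions: the codon table below is deliberately partial (only a handful of standard codons are listed), the function name is hypothetical, and reading simply starts at the first base of the sequence supplied.

```python
# Partial codon table: only a few standard codons are listed for illustration.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time and collect the encoded amino acids."""
    peptide = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        residue = CODON_TABLE[codon]  # raises KeyError for codons not listed above
        if residue == "Stop":
            break  # a stop codon ends the growing peptide chain
        peptide.append(residue)
    return peptide
```

Calling `translate("AUGUUUGGCUAA")` reads the codons AUG, UUU, GGC and UAA in turn and stops at the last one.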

In addition to the previously described types of RNA, RNA can be divided into two broad categories, coding RNA (cRNA) and noncoding RNA (ncRNA). Although many people realize the significance of mRNA in transcription, tRNA in translation and rRNA in the composition of ribosomes, many do not realize the significant roles RNA has in other essential tasks. A few examples of noncoding RNAs in eukaryotes are small nuclear RNA (snRNA), microRNA (miRNA), short interfering RNA (siRNA), small nucleolar RNA (snoRNA), and piwi-interacting RNA (piRNA). snRNAs are found in the nucleus, typically bound to proteins in complexes called small nuclear ribonucleoproteins, and play a critical role in gene regulation through RNA splicing. miRNAs function in gene regulation and have been shown to inhibit gene expression by repressing translation. siRNAs are relatively small, only 21-25 base pairs in length, and inhibit gene expression specifically by incorporating into a unique complex which inhibits transcription. snoRNAs are located inside the nucleolus, which is where rRNA processing and ribosomal assembly take place. snoRNAs function to process rRNA molecules, which often results in methylation and pseudouridylation of specific nucleosides. piRNAs interact with the piwi family of proteins; the function of these molecules involves chromatin regulation and the suppression of transposon activity in germline and somatic cells (Suzanne Clancy, 2008).

Riboswitches

Another interesting feature of RNA is its ability to function in gene regulation through ‘riboswitches,’ otherwise known as ‘RNA switches.’ Most living organisms must be able to sense environmental stimuli and convert these input signals into appropriate cellular responses. Riboswitches have become recognized as a fundamental element in controlling the response to environmental stimuli through their function as genetic switches. Riboswitches are composed of two domains, the aptamer domain and the expression platform. The aptamer domain acts as the receptor that binds specific ligands, while the expression platform acts directly on gene expression through its ability to transition between secondary structures in response to ligand binding. Essentially, riboswitches are regions of mRNA that contain specific ligand-binding domains along with a variable sequence, termed the expression platform, that enables regulation of the downstream coding sequences. Binding to the riboswitch sensor occurs if the metabolite abundance exceeds a specific threshold, which in turn induces a conformational change in the expression platform. Riboswitches are primarily located in the 5’ untranslated region of bacterial mRNA. However, that is not always the case, given that researchers have discovered that in eukaryotic mRNA, the thiamine pyrophosphate (TPP) riboswitch regulates splicing at the 3’ end.

Typically, riboswitches are categorized into families and classes based upon the type of ligand they bind as well as their secondary structure. Within a family of riboswitches related by the type of ligand they recognize, members are further distinguished into classes based upon the pattern that defines the ligand-binding pocket. Aside from the differences in their classification, riboswitches are universally important for their role in regulating transcription. Their role in bacteria matters for humans as well because it allows for the development of novel antibiotics aimed at stopping bacterial pathogens from thriving in humans. Furthermore, drugs can be designed to affect riboswitches through their ligand-binding interactions, which makes riboswitches a powerful tool in research (Alexander Serganov, 2013).

Catalytic RNA

Aside from RNA’s ability to act as both a carrier of genetic information and as a riboswitch molecule, RNA can also function as a catalyst of chemical reactions. Catalytic RNAs, or ribozymes, are molecules that accelerate chemical reactions and are made up of RNA rather than protein. The discovery of catalytic RNA was a surprise, since RNA at first seemed ill-suited to be a catalyst. However, RNA is now recognized as an active catalyst in biology in various contexts, such as small ribozymes, the self-splicing of specific introns, tRNA processing, and the catalytic centers of the ribosome and spliceosome. It is rather remarkable that a nucleic acid can function as an enzyme, given that its components cover a small fraction of the chemical space available to a typical catalytic protein. Yet ribozymes can accelerate reactions a million-fold or more. Ribozymes are responsible for catalyzing two of the cell’s most important reactions, the condensation of amino acids in the peptidyl transferase center of the ribosome and the splicing of mRNA in eukaryotes. Aside from their responsibility for important chemical reactions in cells, ribozymes are also essential to hypotheses about the early development of life on Earth. In fact, they are the key component that supports the RNA world hypothesis proposed by Crick and Orgel. While no one can be sure how life truly formed, the discovery of the peptidyl transferase ribozyme is critical evidence for the proposed theory (Timothy Wilson, 2015).

Secondary Structures

Aside from the numerous complex functions RNA has in the cell, the structure of RNA is rather complex, and it can take on three-dimensional shapes to perform specific roles. RNA’s structure is similar to a protein’s structure in that it can be described at the primary, secondary, tertiary and quaternary levels. However, RNA forms local stable structures, called structural motifs, that are linked and constrained by tertiary interactions. At the primary level, RNA motifs can be identified by short sequences in functional RNAs such as tRNA or rRNA. The secondary level can be identified both by base-paired regions (helices) and closed non-paired regions (loops). The tertiary structure of RNA describes its overall 3D shape and is determined by long-range intramolecular interactions such as kissing hairpin loops, pseudoknots, ribose zippers, tetraloop-tetraloop interactions and many more, which will be described later. This tertiary structure can also be mediated by intermolecular interactions with ligands such as metals, small molecules and other macromolecules. RNA motifs have been defined as “directed and ordered stacked arrays of non-Watson-Crick base pairs forming distinctive foldings of the phosphodiester backbones of the interacting RNA strands and as a discrete sequence or combination of base juxtapositions found in naturally occurring RNAs in unexpectedly high abundance” (Donna Hendrix, 2006).

Although these motifs are highly prevalent, the number of distinct identifiable motifs is relatively small. For example, examination of RNA’s three-dimensional structure revealed that it can be composed of a combination of recurring motifs joined by different types of tertiary interaction motifs. In addition, RNA’s interactions with metal ions, small molecules, proteins and other RNAs can also be associated with specific motifs. Motifs are usually characterized by secondary structure type, such as hairpin loops, internal loops, and junction loops. Hairpin loops link the 3’ and 5’ ends of a double helix; structurally, they must close with a Watson-Crick pair, and they vary in length from 2 to 14 nucleotides. The most common and well-studied hairpin loops are tetraloops. Another motif is the internal loop, which is defined as a loop that separates a double-helical RNA into two segments by inclusion of residues that do not follow Watson-Crick pairing in at least one strand of the duplex. A special case of the internal loop is the ‘bulge loop,’ which can be separated into two types, symmetrical and asymmetrical. A symmetrical loop has the same number of nucleotides on each side, whereas an asymmetrical loop has a different number of nucleotides on the opposing strands. Lastly, junction loops, otherwise known as multi-loops, are formed by the intersection of three or more double helices. An important feature of junction loops is the coaxial (continuous) stacking of the helices, which usually occurs opposite the longest strand. Common examples of junction loops are found in tRNA and specific ribozymes (Donna Hendrix, 2006).
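To make the idea of a hairpin loop concrete, the sketch below locates hairpin loops in a secondary structure written in dot-bracket notation. This notation is an assumption introduced here, not something used in the thesis: matching parentheses mark paired bases and dots mark unpaired bases. The function names are likewise illustrative.

```python
def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            partner = stack.pop()
            pairs[index] = partner
            pairs[partner] = index
    return pairs


def hairpin_loops(structure: str) -> list[tuple[int, int, int]]:
    """Return (closing_i, closing_j, loop_size) for every hairpin loop.

    A hairpin loop is a run of unpaired bases enclosed by one closing base
    pair, with no other base pairs inside it. Positions are 0-based.
    """
    loops = []
    for i, j in pair_table(structure).items():
        enclosed = structure[i + 1:j]
        if i < j and enclosed and all(symbol == "." for symbol in enclosed):
            loops.append((i, j, len(enclosed)))
    return sorted(loops)
```

In a string such as `((((....))))`, the single hairpin is closed by the pair at positions 3 and 8 and contains four unpaired bases, which corresponds to a tetraloop.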

Figure 3 - Secondary RNA structures. Fallmann, Jörg, et al. “Recent Advances in RNA Folding.” Vol. 261, 2017, pp. 97–104.

Relevance

Despite the variety of motif structures, a common and primary function of motifs is their ability to bind a ligand, whether for structural stabilization, as a cofactor, or simply as a signal. RNA molecules have recently become attractive as potential drug targets because small molecules can bind to defined sites in the RNA chain, thereby modulating or blocking its function. The function of many RNAs depends on their interactions with other molecules in the cell. For example, many regulatory RNA molecules only carry out their function by interacting with ligands. Ligand binding is important for various types of RNA, such as ribozymes and riboswitches, including splicing functions, as well as for RNA-RNA intermolecular and tertiary interactions. Moreover, ligands bind to a given motif with high selectivity and specificity, which is useful for RNA’s real-world applications (Wedekind & McKay, 2003).

The rise of antibiotic-resistant bacteria is a serious health threat for which many researchers seek a solution. While RNA was under-appreciated for many years for its role in drug development, many researchers have now turned to RNA for new drug discovery. RNA is present in almost every living cell yet has a relatively simple structure, which makes it an enticing target for drug discovery. Another reason RNA is used for drug development is the useful function of the previously described riboswitches. Simply put, researchers utilize riboswitches by designing small molecules that can disrupt the key RNA-signaling molecule interaction in a way that would “turn off the switch” and kill the targeted bacteria (Liszweski, 2018). In addition, a new generation of drugs including enzymes, antibodies and immunostimulatory antigens has been administered through mRNA, which acts to deliver genetic information. This mRNA can be designed to express proteins that assist the body in producing its own “medicine.” The ability of mRNA to engage the cell’s protein-expression machinery makes it a valuable template for an anticancer therapeutic or an immunostimulating antigen (Connelly et al, 2016). Currently, Moderna Therapeutics is using mRNA to develop a vaccine against cytomegalovirus (CMV), which is the leading cause of disabling infections in newborns. While there are no vaccines approved for CMV, this team of researchers engineered and produced an mRNA vaccine candidate called mRNA-1647, which encodes six viral proteins in one. Past attempts were unable to include two of the important antigens required to block CMV infection. While this study is still underway, it illustrates the important role RNA has in drug development and therapy (Liszweski, 2018).

The process of RNA folding is rather complex and still poorly understood. However, the structure of an RNA sequence is an excellent predictor of its biological function. As a result, for the various RNA molecules to function properly, they must fold into specific three-dimensional shapes. Similar to proteins, where the amino acid sequence determines the structure, the ribonucleotide sequence determines the pattern of base pairs (secondary structure) and the global shape (tertiary structure). The dominating process in protein folding is driven by hydrophobic forces, while a hierarchical folding process is observed in RNA, with base pairs and secondary structures forming rapidly and the tertiary structure forming more slowly. The secondary structure elements are formed by intramolecular interactions of nucleotides, specifically through hydrogen bonds between the base pairs. Typically, RNA folding follows the standard base pairing known as Watson-Crick base pairing. Guanine and cytosine form three hydrogen bonds, while adenine and uracil form only two. This is relevant when considering their energy contributions to the overall structure, with GC pairs contributing more than AU pairs. However, the primary source of energy is not the hydrogen bonds but the stacking interaction of the pi electron systems of the aromatic rings of the nucleobases. “As a result, almost all RNAs form highly stable, well-defined secondary structures, while protein structures often remain flexible or are only marginally stable at room temperature” (Fallmann, 2017).
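The hydrogen bond counts given above can be tallied for a stem with a few lines of code. This is only an illustrative count with hypothetical names; it is not an energy calculation, since, as noted, stacking interactions rather than hydrogen bonds dominate the folding energy.

```python
# Hydrogen bonds per Watson-Crick pair, as stated in the text.
HYDROGEN_BONDS = {frozenset("GC"): 3, frozenset("AU"): 2}


def stem_hydrogen_bonds(pairs: list[tuple[str, str]]) -> int:
    """Sum the hydrogen bonds over the Watson-Crick pairs of a stem."""
    total = 0
    for base_a, base_b in pairs:
        key = frozenset((base_a.upper(), base_b.upper()))
        if key not in HYDROGEN_BONDS:
            raise ValueError(f"{base_a}-{base_b} is not a Watson-Crick pair")
        total += HYDROGEN_BONDS[key]
    return total
```

A stem made of two GC pairs and one AU pair, for example, totals 3 + 3 + 2 = 8 hydrogen bonds.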

Aside from base pairing, other factors also contribute to the overall stability and formation of RNA helices. For example, GU wobble base pairs do not follow standard Watson-Crick pairing yet are frequently seen throughout RNA structures. In addition to wobble base pairs, long-range interactions (pseudoknots/kissing hairpins), which occur when a stem or loop region interacts with another non-adjacent stem or loop, also contribute to the stability of an RNA structure (Fallmann, 2017).

Another important factor to consider in the process of RNA folding is the use of metals or other cations to overcome the charge repulsion of the phosphate groups on the backbone of RNA. The strong electrostatic repulsion between the closely packed phosphate anions on the backbone often unwinds RNA. However, positive ions promote folding by reducing the repulsion between RNA phosphates and thus assist in stabilizing the folded conformation of RNA. The strong attractive interactions between cations and RNA are generally represented by the formation of different bound states in solution. Bound states in RNA systems are often categorized into two distinct classes depending on the strength of their interaction: diffusely bound and site bound. Diffusely bound ions, from both univalent and divalent salts, interact with the strong anionic field around the RNA backbone via long-range interactions through solvent. In site binding, ions are trapped near the surface of RNA when their translational energy is not sufficient to overcome strong local attractive forces. Though this problem is far from solved, it has been established that small ions such as Mg2+ are most efficient in stabilizing RNA tertiary structures, while larger ions produce more swollen, loosely packed structures (Herschlag, 2015).

Specific Aims:

This paper aims to explore three questions relating to the stability and energy of an RNA structure. The first aim is to determine whether a single nucleotide substitution within an RNA loop affects the energy and stability of that loop. The second is to examine whether the size and closing nucleotide pair of a loop affect its stability and energy. The third is to determine whether the change induced by the nucleotide substitution within the loop affects the energy of the rest of the molecule.

Materials and Methods:

Eterna Training and Folding:

The materials and methods used throughout the experiment are primarily based upon the online software program EteRNA. The EteRNA project is a massive open laboratory developed by researchers at Carnegie Mellon and Stanford Universities to recruit public users to design molecules for biomedical research. The program allows its participants to design, simulate and manipulate RNA structures and also to control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. New users must first complete a series of tutorials that guide them through the various tools the software provides while teaching concepts related to RNA folding and stability. Upon completion, users are able to perform their own experiments and design new RNA constructs. Users may rearrange the sequence by placing the four different nucleotides at various positions; this in turn alters the RNA’s energy, stability and shape.

Nucleotide stability experiment:

The aim of the experiment was to determine whether specific nucleotide substitutions in secondary RNA structures, such as end loops, would alter the stability and energy of the loop itself as well as of the overall structure. The energy of each loop in EteRNA is determined by the last paired base of the stack and the first unpaired base of the loop. With this in mind, the strategy of boosting the end loop was used. Boosting is a nucleotide substitution that lowers the energy of a given molecule: the loop’s boost points are located and the required nucleotides are placed there. The change in stability and energy of the end loops was observed by modifying the first unpaired base of a given loop, placing the nucleotide guanine as the boost. Once the boost was placed, a change in energy was observed within the loop. This process was completed on several different RNA structures and several different loop sizes to ensure consistency. To determine whether the boost also affected adjacent structures, the change in energy within the loop was subtracted from the change in energy of the overall structure. Of note, all loops contained adenine nucleotides per the baseline conditions set by EteRNA, except where nucleotide substitutions (boosts) were placed.
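The substitution step itself can be sketched in code. In the experiments the boost was placed by hand in the EteRNA interface, so the helper names, the 1-based positions and the toy sequence used below are assumptions made purely for illustration.

```python
def baseline_loop(size: int) -> str:
    """An all-adenine loop, matching the baseline condition described above."""
    return "A" * size


def apply_boost(sequence: str, position: int, boost: str = "G") -> str:
    """Return a copy of `sequence` with `boost` placed at a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise IndexError("position lies outside the sequence")
    return sequence[:position - 1] + boost + sequence[position:]
```

For a toy hairpin consisting of a closing C, a four-adenine loop and a closing G (`"C" + baseline_loop(4) + "G"`), the first unpaired base is position 2, so `apply_boost(..., 2)` replaces that adenine with guanine and leaves every other base unchanged.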

RNA Structures Used:

The RNA structures used throughout the experiment were: Lysine Riboswitch, Leadzyme Ribozyme, RNA polymerase, Ribosomal RNA, 5S Ribosomal RNA, Cobalamin Riboswitch, and Saccharomyces cerevisiae tRNA.

EteRNA Strategy and Folding Guides:

Stack Guide, Loop Guide, EteRNA Dictionary, Puzzle Solving Guide, and the Boost Guide were all written by EteRNA users. These guides were used as a resource to assist with the folding strategies of RNA structures and can be found at https://eternagame.org/web/strategy_guides/.

Results:

EteRNA interface allows for visualization of nucleotides, shape and energy of RNAs.

Figure 4 represents an overview of an RNA molecule from Saccharomyces cerevisiae, captured from the EteRNA software. The four nucleotide bases within RNA are represented by unique colors, as shown with the arrow located at the bottom of Figure 4. Yellow, blue, red and green are used for adenine, uracil, guanine and cytosine, respectively. The total energy of the molecule is displayed at the top left of the screen as -79.98 kcal. The experiments were conducted on each loop within this figure; however, the loop of interest indicated by the arrow (BP 182-190) serves as a sample for demonstration purposes. The associated loop energy is displayed at the top left of the screen as 4.8 kcal. To examine the change in stability and energy of the end loop, the first unpaired base (183) of the loop was modified by placing the nucleotide guanine as the boost.

Figure 5a displays the loop of interest with an AU closing base pair orientation without the boost. The starting energy of the molecule is -78.38 kcal and the loop energy is 5.6 kcal, as indicated by the arrows. The loop energy depends on the closing base pair orientation, which will be further demonstrated throughout this chapter. Figure 5b displays the same loop with the guanine nucleotide substitution at base 183. The final energy of the overall structure is -79.18 kcal and the loop energy is 4.8 kcal. This demonstrates that the nucleotide substitution lowered the energy of the loop and thereby increased its stability. Specifically, the change in loop energy was -0.8 kcal (from 5.6 kcal to 4.8 kcal). To examine whether this nucleotide substitution affected the overall structure, the change in energy within the loop was subtracted from the change in energy of the overall structure (-0.8 kcal), giving a difference of 0. As a result, the nucleotide substitution had no effect on adjacent secondary structures.

Figure 6a displays the same loop of interest, here with a GC closing base pair orientation. The starting energy of the molecule is -79.98 kcal and the starting loop energy is 4.8 kcal prior to the boost, as indicated by the arrows. This differs from the energy found with the AU closing orientation, demonstrating that the closing base pair orientation does affect the energy of a given loop. Figure 6b displays the GC-oriented loop with the nucleotide substitution at base 183. Upon addition of the boost, the loop energy decreased to 3.5 kcal and the overall energy of the molecule decreased to -81.28 kcal, as indicated by the arrows. This demonstrates again that a single nucleotide substitution within a given loop affects the energy and stability of that loop. However, when adjacent structures were examined, it was again found that the change in energy solely reflects the loop that was modified.

The same process was repeated on the same loop (BP 182-190) with the CG closing base pair orientation, as shown in Figure 7a. The starting energy of the overall structure is -80.68 kcal and the starting loop energy is 4.4 kcal, prior to the boost. The starting loop energy with the CG orientation is lower, and the loop therefore more stable, than with the previously observed AU and GC orientations. Once again, this illustrates that the closing base pair orientation affects the energy and stability of the loop. Figure 7b displays the CG-oriented loop with the guanine boost added at base 183. The final energy of the overall structure is -81.38 kcal and the final loop energy is 3.7 kcal. The loop energy changed from 4.4 kcal to 3.7 kcal upon the nucleotide substitution, demonstrating that the addition of a boost does affect the stability and energy of the modified loop. However, the stability of adjacent structures was not affected by the addition of the boost.
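The bookkeeping behind Figures 5 through 7 can be sketched in a few lines of Python. This is a minimal illustration and not part of the EteRNA software: the function name energy_changes and the dictionary layout are hypothetical, and the only inputs are the energies quoted in the text for the loop at BP 182-190. Decimal arithmetic is used so that values such as -0.80 and -0.8 compare exactly.

```python
from decimal import Decimal

# Energies (kcal) as quoted in the text for the loop at BP 182-190,
# before and after the guanine boost at base 183.
# Tuple layout (hypothetical): (total_before, total_after, loop_before, loop_after).
observations = {
    "AU": ("-78.38", "-79.18", "5.6", "4.8"),
    "GC": ("-79.98", "-81.28", "4.8", "3.5"),
    "CG": ("-80.68", "-81.38", "4.4", "3.7"),
}


def energy_changes(total_before, total_after, loop_before, loop_after):
    """Return (loop change, total change, residual) in kcal.

    The residual is the part of the total change that is not accounted
    for by the modified loop; a residual of 0 means no other part of
    the structure changed in energy.
    """
    d_loop = Decimal(loop_after) - Decimal(loop_before)
    d_total = Decimal(total_after) - Decimal(total_before)
    return d_loop, d_total, d_total - d_loop


results = {name: energy_changes(*vals) for name, vals in observations.items()}

for name, (d_loop, d_total, residual) in results.items():
    print(f"{name}: loop {d_loop} kcal, total {d_total} kcal, residual {residual} kcal")
```

For the three closing orientations examined, the residual is 0 in every case, which is the basis for the statement that adjacent structures were not affected.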

Figure 4 - EteRNA interface - Overview of tRNA

Figure 5a - AU closing orientation without boost (BP 182-190)

Figure 5b - AU closing orientation with boost (BP 182-190)

Figure 6a - GC closing orientation without boost (BP 182-190)

Figure 6b - GC closing orientation with boost (BP 182-190)

Figure 7a - CG closing orientation without boost (BP 182-190)

Figure 7b - CG closing orientation with boost (BP 182-190)

Starting energy of loops correlates with size of loop and nucleotide pair closure.

Figures 8a, 8b and 8c demonstrate the correlation of the starting loop energy with both the number of nucleotide bases within the loop and the closing base pair orientation. A total of 34 loops in 11 different RNAs were used for the experiment. Each unique loop size was found to have an associated energy. Figure 8a illustrates the loop energies found for each unique size, all with the same GC closing pair orientation. For example, all loops examined with 6 nucleotide bases had the same starting energy of 4.5 kcal. All loops with 7 nucleotide bases had a starting energy of 4.5 kcal, loops with 8 nucleotide bases had a starting energy of 4.3 kcal, loops with 9 nucleotide bases had a starting energy of 4.8 kcal, and so on. This phenomenon of consistent starting loop energies for a given loop size was seen for all the loops examined. As illustrated in Figures 8a, 8b and 8c, there is an upward trend: as the number of nucleotide bases within a loop increases, the energy of the loop increases as well. In simpler terms, the bigger the loop, the less stable the structure.
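As a rough check of the upward trend, the sketch below fits a least-squares line to the GC starting energies and lists the loop sizes at which the energy drops relative to the next smaller size examined. It is illustrative only: the values for sizes 6 through 9 are quoted in the text, while the pairing of the remaining Figure 8a bar labels with sizes 10 through 22 is an assumption made for this example, and the helper name least_squares_slope is hypothetical.

```python
# Starting loop energies (kcal) for GC-closed loops, keyed by loop size.
# Sizes 6-9 are quoted in the text; the pairing of the other Figure 8a
# bar labels with sizes 10-22 is an assumption made for this sketch.
gc_start_energy = {
    6: 4.5, 7: 4.5, 8: 4.3, 9: 4.8, 10: 4.5,
    12: 5.4, 13: 5.5, 15: 5.68, 17: 5.84, 22: 6.15,
}


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


points = sorted(gc_start_energy.items())
slope = least_squares_slope(points)

# Sizes whose energy is lower than that of the next smaller size examined.
dips = [b for (a, ea), (b, eb) in zip(points, points[1:]) if eb < ea]

print(f"slope: {slope:.3f} kcal per nucleotide")
print("sizes that break the upward trend:", dips)
```

A positive slope corresponds to the statement that larger loops have higher starting energies, and the listed sizes are the places where that trend is locally broken.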

Figures 8b and 8c show the same loops examined in Figure 8a, each with a different closing pair orientation. Interestingly, the same trends appear. Each unique loop size has an associated starting energy, and again an upward trend was seen in which a greater number of nucleotides within a loop corresponds to a higher energy. Minor inconsistencies were seen with sizes 8 and 10 within each closing orientation. These inconsistencies may be a result of the programming of EteRNA; however, this is still to be determined. Comparing Figures 8a, 8b and 8c reveals the impact of the closing pair orientation on a given loop size. For example, loop size 6 with the GC orientation has a starting energy of 4.5 kcal, while the CG and AU orientations have starting energies of 4.1 kcal and 5.3 kcal, respectively. These results demonstrate that the CG orientation is the most stable, while AU is the least.
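The effect of the closing pair can be made explicit by expressing each starting energy relative to the GC value for the same loop. The sketch below does this for the two loops whose energies are quoted in the text for all three orientations. The function name offsets_from_gc is hypothetical, and treating the loop at BP 182-190 as a 9-nucleotide loop is an assumption based on its GC starting energy of 4.8 kcal.

```python
from decimal import Decimal

# Starting loop energies (kcal) quoted in the text, by loop size and
# closing base pair. The size-9 entry is the loop at BP 182-190
# (Figures 5a, 6a and 7a); labelling it as size 9 is an assumption.
start_energy = {
    6: {"GC": "4.5", "CG": "4.1", "AU": "5.3"},
    9: {"GC": "4.8", "CG": "4.4", "AU": "5.6"},
}


def offsets_from_gc(energies):
    """Energy of each closing pair relative to GC for one loop size."""
    gc = Decimal(energies["GC"])
    return {pair: Decimal(e) - gc for pair, e in energies.items()}


offsets = {size: offsets_from_gc(e) for size, e in start_energy.items()}

for size, by_pair in offsets.items():
    # Lowest energy first, i.e. most stable closing pair first.
    ranked = sorted(by_pair, key=by_pair.get)
    print(size, {p: str(v) for p, v in by_pair.items()}, "order:", ranked)
```

In both loops the CG closing pair sits 0.4 kcal below GC and the AU closing pair sits 0.8 kcal above it, so the ordering CG, GC, AU from most to least stable does not depend on which of the two loop sizes is considered.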

[Figure 8a: bar chart, "Starting Energy of Loops Correlates with Size and Nucleotide Closing Pair". x-axis: # of nucleotides in loop (6, 7, 8, 9, 10, 12, 13, 15, 17, 22); y-axis: Energy of loop (kcal); series: GC closing pair orientation. Bar labels (kcal), in order of loop size: 4.5, 4.5, 4.3, 4.8, 4.5, 5.4, 5.5, 5.68, 5.84, 6.15.]

Figure 8a - Comparison of starting energies for different size loops with GC closing pair orientation.

[Figure 8b: bar chart, "Starting Energy of Loops Correlates with Size and Nucleotide Closing Pair". x-axis: # of nucleotides in loop (6, 7, 8, 9, 10, 12, 13, 15, 17, 22); y-axis: Energy of loop (kcal); series: CG closing pair orientation. Bar labels (kcal), in order of loop size: 4.1, 4.1, 3.9, 4.4, 4.1, 5, 5.1, 5.28, 5.44, 5.75.]

Figure 8b - Comparison of starting energies for different size loops with CG closing pair orientation.

[Figure 8c: bar chart, "Starting Energy of Loops Correlates with Size and Nucleotide Closing Pair". x-axis: # of nucleotides in loop (6, 7, 8, 9, 10, 12, 13, 15, 17, 22); y-axis: Energy of loop (kcal); series: AU closing pair orientation. Bar labels (kcal), in order of loop size: 5.3, 5.3, 5.1, 5.6, 5.3, 6.2, 6.3, 6.48, 6.64, 6.95.]

Figure 8c - Comparison of starting energies for different size loops with AU closing pair orientation.

Change in loop energy due to single nucleotide substitution (boost) correlates with size and nucleotide base pair closure of loop.

It was previously noted that the starting energy of each loop was related to the size as well as the closing base pair of the loop. Similarly, the change in loop energy due to the addition of the guanine boost correlates with size and closing base pair orientation, as demonstrated by Figure 9. It became evident that the change in energy induced by the nucleotide substitution for loops > 6 nucleotide bases long was the same within each closing pair orientation. This trend is revealed in Figure 9: all loops > 7 nucleotide bases with CG, GC and AU orientations showed a decrease in energy of 1.3 kcal, 0.7 kcal and 0.8 kcal, respectively. As expected, loops with exactly 6 nucleotide bases were impacted more heavily by the addition of the guanine boost, given their smaller size. However, while Figures 8a-c revealed that loop sizes 6, 7 and 10 had the same starting energy, these loops were not affected equally by the addition of the boost. This may be due to the differences in the size of the loops.

The impact of the closing base pair on the change in energy follows a distinct trend independent of the loop size. It was found that the CG orientation is stabilized most by the addition of the boost, while GC is stabilized the least. Prior to this finding, it was anticipated that the CG orientation would be most stabilized and the AU orientation least stabilized, based upon the starting energies illustrated in Figure 8. However, this was not seen: the energy decreased by 0.7 kcal for the GC orientation and by 0.8 kcal for the AU orientation. It became evident that both the size of the loop and the closing base pair orientation affect the change in energy produced by the boost.
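The comparison between the anticipated and observed orderings can be written as two sorts. The sketch below is a minimal illustration using only values quoted in the text; the variable names are hypothetical, and pairing size-6 starting energies with the decreases reported for the larger loops is a simplification, since the ordering of starting energies by closing pair is what matters here.

```python
# Values (kcal) quoted in the text. Starting energies are those given for
# loop size 6; boost-induced decreases are those reported for the larger loops.
start_energy = {"CG": 4.1, "GC": 4.5, "AU": 5.3}
boost_decrease = {"CG": 1.3, "GC": 0.7, "AU": 0.8}

# Anticipated order: the lower the starting energy, the larger the
# expected stabilization from the boost.
anticipated = sorted(start_energy, key=start_energy.get)

# Observed order: largest decrease in loop energy first.
observed = sorted(boost_decrease, key=boost_decrease.get, reverse=True)

print("anticipated (most to least stabilized):", anticipated)
print("observed    (most to least stabilized):", observed)
```

The two orderings agree on CG as the most stabilized orientation but differ in the last position, where GC rather than AU shows the smallest decrease.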

[Figure 9: bar chart, "The Change in Loop Energy due to Boost Correlates with Size and Closing Base Pair". x-axis: # of nucleotides in loop (6, 7, 8-22); y-axis: |Change in energy after boost|, abs. val. (kcal); series: CG orientation, GC orientation, AU orientation. Bar labels (kcal): 4.3, 3.7, 2.8, 1.3, 1.3, 0.8, 0.8, 0.7, 0.7.]

Figure 9 - Comparison of the effects of the boost on the different loop sizes with each base pair closing orientation.

Changes in stability of the rest of the RNA molecule were not observed for the local nucleotide substitutions examined

To determine whether the nucleotide substitution affected any adjacent RNA structures, simple subtraction was used. Table 1 is a representative sample of 11 loops taken from the total of 34 loops studied. The table lists the energies of the loops and of the overall structure before and after the nucleotide substitution. To assess whether there was a change in stability of any adjacent structures after the addition of the boost, the change in energy of the loop was subtracted from the total change in energy of the overall molecule. Refer to Table 1, row 1, the Riboswitch Cobalamin (BP 75-80). The starting energy of the loop was 4.5 kcal, which decreased to 0.2 kcal upon addition of the boost, a change in loop energy of -4.3 kcal. The starting energy of the molecule was -25.19 kcal, which decreased to -29.49 kcal, a total change in energy for the entire RNA of -4.3 kcal. This illustrates that the only change in energy for the overall molecule was within the loop that was modified. While many types of single nucleotide substitutions may alter the stability of secondary structures, the stability of adjacent structures was not affected here.
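The subtraction for the Riboswitch Cobalamin loop can be written out explicitly. The symbols below are notation introduced here for illustration only and do not appear in the EteRNA interface.

\[
\begin{aligned}
\Delta E_{\text{loop}} &= 0.2 - 4.5 = -4.3\ \text{kcal} \\
\Delta E_{\text{total}} &= -29.49 - (-25.19) = -4.30\ \text{kcal} \\
\Delta E_{\text{adjacent}} &= \Delta E_{\text{total}} - \Delta E_{\text{loop}} = -4.30 - (-4.3) = 0\ \text{kcal}
\end{aligned}
\]

A value of 0 for the last line means that the whole change in the energy of the molecule is accounted for by the modified loop.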

Table 1 - Energy states and changes in energy for representative examples of loops.

Discussion:

Ribonucleic acid has many different roles and functions within a cell, which are largely dependent on its structure. The purpose of this study was to discover factors that affect the stability and energy of RNA secondary motifs. Two factors were found to affect the stability of secondary RNA structures: the number of nucleotides within a loop and the closing base pair orientation of the loop. It was observed that a greater number of nucleotide bases within a loop decreases the stability of that loop. A more stable loop will use the CG closing orientation, as opposed to GC or AU, to allow for the lowest energy within the loop. Lastly, it was determined that a single nucleotide substitution can make a secondary RNA structure more stable. However, this stabilization within a given secondary structure did not affect the stability of adjacent parts of the molecule. Although this was true for the loops examined, it might not hold true for all loops. It is worth noting that all starting loops contained only A residues. This is because these are easier to incorporate into synthesized RNA molecules in the lab, a goal of EteRNA scientists. It is possible that loops containing other nucleotide bases would behave differently.

Future work regarding new RNA constructs will include studying the UA closing base pair orientation to better understand the trends in the data. Unfortunately, due to time constraints, the UA orientation was not examined. In addition, to assess more factors that affect the stability of secondary structures, additional experiments using nucleotide substitutions other than the single G could be conducted. Lastly, it would be interesting to examine the effects of loop size and base pair orientation on the stability of internal loops rather than end loops.

Conclusion:

Understanding the structure, design and shape of RNA is crucial to uncovering the fundamental principles underlying life’s building blocks. The EteRNA program has served as a research ground for many researchers, like myself, to help generate new RNA constructs as well as knowledge that may aid researchers in biological discovery. The scientists behind the development of the EteRNA software transition from simulation to biology by choosing the best designs created by users to be synthesized at Stanford University. Although there has been significant and growing interest in the role of RNA as a messenger and regulator of cellular functions, there is still much to learn about its purpose and design. It is my hope that the results found within this thesis will aid researchers in synthesizing future RNA constructs.

Works Cited

Alberts, Bruce. “The RNA World and the Origins of Life.” Current Neurology and Neuroscience Reports, U.S. National Library of Medicine, 1 Jan. 1970, www.ncbi.nlm.nih.gov/books/NBK26876/.

Boniecki, Michal. “SimRNA: a Coarse-Grained Method for RNA Folding Simulations and 3D Structure Prediction.” OUP Academic, Oxford Academic, 19 Dec. 2015, academic.oup.com/nar/article/44/7/e63/2467805.

Chen, Yu, and Gabriele Varani. “RNA Structure.” The Canadian Journal of Chemical Engineering, Wiley-Blackwell, 17 June 2010, onlinelibrary.wiley.com/doi/full/10.1002/9780470015902.a0001339.pub2.

Chen, Shi-Jie, and Xiaojun Xu. “A Method to Predict the Structure and Stability of RNA/RNA Complexes.” Ncbi.nlm.nih, 2017, www.ncbi.nlm.nih.gov/pmc/articles/PMC5508871/.

Clancy, Suzanne. “RNA Functions.” Nature News, Nature Publishing Group, 2008, www.nature.com/scitable/topicpage/rna-functions-352.

Dandekar, Thomas, et al. RNA Motifs and Regulatory Elements. Springer Berlin, 2013.

Darnell, James (Rockefeller University). RNA: Life’s Indispensable Molecule. Cold Spring Harbor Laboratory, 2011.

Edwards, Andrea L. “Riboswitches: A Common RNA Regulatory Element.” Nature News, Nature Publishing Group, www.nature.com/scitable/topicpage/riboswitches-a-common-rna-regulatory-element-14262702.

Fallmann, Jörg, et al. “Recent Advances in RNA Folding.” Vol. 261, 2017, pp. 97–104.

Ferrier, Denise R. Biochemistry. Lippincott Williams & Wilkins, 2017.

Herschlag, Daniel. “From Static to Dynamic: the Need for Structural Ensembles and a Predictive Model for RNA Folding and Functioning.” Science Direct, 2015.

Herschlag, Daniel, et al. “The Story of RNA Folding, as Told in Epochs.” Cold Spring Harbor Perspectives in Biology, 2018, cshperspectives.cshlp.org/content/10/10/a032433.full.

Phillips, Anna, et al. “LigandRNA: Computational Predictor of RNA–Ligand Interactions.” US National Library of Medicine National Institutes of Health, 2013, www.ncbi.nlm.nih.gov/pmc/articles/PMC3860260/.

Lee, Jeehyoung. “RNA design rules from a massive open laboratory.” Proceedings of the National Academy of Sciences of the United States of America, 17 Jan 2014, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926058/.

Serganov, Alexander, and Evgeny Nudler. “A Decade of Riboswitches.” Vol. 152, no. 1, 2013, pp. 17–24.

Wilson, Timothy J, and David M.J. Lilley. “RNA Catalysis--Is That It?” US National Library of Medicine National Institutes of Health, 2015, www.ncbi.nlm.nih.gov/pmc/articles/PMC4371269/.

Woodson, Sarah A. “RNA Folding Retrospective: Lessons from Ribozymes Big and Small.” RNA, Cold Spring Harbor Lab, 1 Jan. 1970, rnajournal.cshlp.org/content/21/4/502.full.

"Solve Puzzles. Invent Medicine." Eterna. N.p., n.d. Web. 18 Sept. 2018. “Stack Guide” written by user: dimension9, “Loop Guide” written by user: mbp21 , “EteRNA dictionary” written by user: Eli Fisker “Puzzle solving guide” written by user: Eli Fisker, and “Boost Guide” written by user: Brourd.
