SHOWCASE ON RESEARCH

Synthetic RNA Biology
Aleksandra Filipovska and Oliver Rackham*
Harry Perkins Institute of Medical Research and School of Chemistry and Biochemistry, University of Western Australia, Nedlands, WA 6009
*Corresponding author: [email protected]

AUSTRALIAN BIOCHEMIST Vol 46 No 2 August 2015

Synthetic biology is a dynamic new field focused on designing and building new biomolecules, networks and systems, and redesigning existing biological systems for useful purposes (1). Significant milestones to date include the creation of the first cell controlled by an entirely artificial genome (2), the efficient production of the antimalarial drug precursor artemisinic acid in yeast (3) and the generation of cells with unnatural genetic codes (4). Further advances require the development of new components that interact in a predictable manner, both with each other and with existing or designed cellular networks (5). These components need to be easily engineered and to have broad functionalities. The unique characteristics of RNA have seen it used in many synthetic biology applications to date and have stimulated the emergence of ‘RNA synthetic biology’ (6).

Why RNA?

The chemical structure of RNA closely resembles that of DNA, but differs primarily in two respects. Firstly, thymine is replaced by uracil, an unmethylated form of thymine. This considerably reduces the energetic costs in synthesising RNA within the cell but eliminates the opportunity for proofreading deaminated cytidines. Secondly, while DNA contains deoxyribose, RNA contains ribose, which carries a hydroxyl group attached to the 2′ position of the pentose ring. These hydroxyl groups make RNA more prone to hydrolysis than DNA and as such RNA is less stable. However, the 2′ hydroxyl provides a massive gain in chemical functionality as it enables the backbone to participate as a hydrogen bond donor as well as an acceptor. RNA can take on both single-stranded and, by complementary base pairing, double-stranded structures. These characteristics enable RNA to take on many additional structures and functions compared to DNA, which are reflected in the myriad of biological roles that have been identified to date. This is exemplified by the fact that RNA can have catalytic activity, as first identified by Cech in 1982, and this ability of RNA structures to catalyse enzymatic reactions has been found to be widespread. Most importantly, RNA is the only macromolecule in nature that can boast both catalytic and information storage abilities. These observations illustrate RNA’s remarkable functionality and provide evidence for potential evolutionary roles in the origin of life.

Engineering Rules for RNA

From an engineering perspective, RNA combines some of the most favourable characteristics of both DNA and proteins. Like proteins, RNA can adopt a wide variety of structures and a small number of residues can achieve specific recognition of other molecules. However, like DNA, the rules of base pairing are well understood and, although diverse structures are possible, the prevalence of canonical base-pairing and the easily predicted free energies of RNA structures enable structured RNAs to be rationally designed (6). These rules have been used extensively to understand the roles of RNA structures in living cells and even enable ‘RNA origami’: the design of complex, self-assembling RNA structures in vitro.

Because of RNA’s place at the heart of the Central Dogma of Molecular Biology, most of its roles in nature revolve around its ability to act as a template for protein synthesis and to regulate gene expression more generally. Many modes of RNA-mediated gene regulation exist in nature and most have been adopted in innovative synthetic biology applications.

Small RNAs are important modulators of gene expression and, since their discovery three decades ago in bacteria, it has become clear that they are ubiquitous across all domains of life. The specificity that can be achieved through canonical base pairing and the plasticity in their evolution provide major advantages for the adoption of small RNAs to modulate RNA metabolism in nature. Likewise, these characteristics make small RNAs valuable tools in synthetic biology. Isaacs et al. provided the first example of RNA synthetic biology using small RNAs, dubbed ‘riboregulators’, to activate translation of bacterial mRNAs by revealing previously occluded start codons (6). In many eukaryotic systems endogenous microRNAs (miRNAs) regulate the bulk of the transcriptome, and although the rules for target identification are not entirely understood, the ability to design artificial miRNAs from the simple rules of Watson-Crick base pairing has made them attractive research tools. In synthetic biology applications RNA interference (RNAi) has been used to build circuits that enable the robust, tunable and reversible silencing of gene expression in mammalian cells. In a particularly interesting example, Benenson and colleagues devised a miRNA-based circuit to discriminate cancer cells from non-transformed cells by incorporating target sites for three miRNAs that are expressed at particularly low levels in cervical cancer cells into an mRNA expressing the cell death protein Bax, enabling the selective killing of cancerous cells (7).

Small RNA-mediated gene silencing systems in eukaryotic cells are thought to have originated as a means of preventing the spread of viruses and, not surprisingly, analogous systems have recently been found to exist in bacteria and archaea (8). In these organisms small DNA fragments from invading phage and plasmids are archived in the genome as clustered, regularly interspaced short palindromic repeat (CRISPR) sequences. When transcribed these can act as templates for the recognition and cleavage of potentially harmful plasmids and phage as part of an RNA-protein complex (8). The CRISPR system has recently been co-opted to alter sequences in genomes, to build artificial transcription factors, and to label specific genomic regions in eukaryotic cells (9). These recent examples provide an exciting taste of the potential applications of RNA synthetic biology. In the following sections we discuss in more detail examples of synthetic biology approaches using RNA to which our laboratories have contributed.

Engineered Ribosomes

The amazing variety in protein structure and function is achieved by connecting a set of 20 common amino acids in different permutations. Although many different functions are possible, the chemical structures of the 20 common amino acids do not show particular chemical diversity. This has prompted researchers to investigate the new protein functions that might be possible in proteins containing unnatural amino acid building blocks. Translation defines life by linking amino acids according to mRNA instructions. By bridging the mRNA codons and their encoded amino acids, tRNAs and the aminoacyl-tRNA synthetases that load them in essence set the genetic code. Pioneering work from Schultz and colleagues used an ‘orthogonal’ tRNA-synthetase pair, that is, a functional set that does not cross-react with the natural tRNAs or synthetases in the host organism, to expand the genetic code. They demonstrated that mutating the active site of an orthogonal tyrosyl-tRNA synthetase enabled it to load a variety of unnatural amino acids onto its target tRNA that were subsequently added into proteins in Escherichia coli (4). The orthogonal tRNA was modified so that it decodes UAG stop codons, resulting in incorporation of an unnatural amino acid and subsequent elongation of the nascent protein, instead of termination of protein synthesis. A number of different unnatural amino acids have been incorporated into proteins using this approach, enabling the site-specific labelling of proteins with biophysical probes, photocrosslinking reagents, fluorescent groups, heavy atoms and orthogonal reactive groups (4). This approach has since been used to expand the genetic codes of yeast, mammalian cells, the worm Caenorhabditis elegans and the fruit fly Drosophila melanogaster (10). In an example of how these technologies can be applied in synthetic biology, a photocaged lysine residue was incorporated into the T7 RNA polymerase in mammalian cells (11). Irradiation of cells expressing the photocaged polymerase enables the specific transcription of mRNAs or shRNAs (short hairpin RNAs) in a spatially controlled manner. The incorporation of amino acids with unnatural chemistries provides an approach to selectively introduce new functions to living cells.

A major limitation to expanding the genetic code of living cells is the lack of available codons, because in almost all organisms all 64 codons are used to specify the incorporation of the 20 canonical amino acids or to terminate protein synthesis. Even the use of the rare, and comparatively weak, UAG stop codon is limited because the UAG-suppressing tRNA must always compete with the translation release factor (RF1) that also recognises the UAG stop codon and promotes hydrolysis of peptidyl-tRNA to release the peptide chain. Recently, detailed genome manipulation (12) or the removal of all UAG stop codons in the E. coli genome (13) have enabled the deletion of RF1 and thereby efficient incorporation of unnatural amino acids at UAG codons. These studies have now opened the door to completely reassigning one codon in E. coli; however, further genetic code expansion will require alternative approaches. To bypass this limitation, recent work has focused on engineering the ribosome itself.

In bacteria the selection of mRNAs for translation occurs via direct interactions between the mRNA-binding site (MBS) of the rRNA and the ribosome-binding site (RBS) of the mRNA within the ribosome (Fig. 1a). Directed evolution has been used to create interactions between a set of mRNAs and ribosomes so that they can function orthogonally in living cells (Fig. 1b) (14). Mutations in endogenous rRNA are poorly tolerated and often lethal. However, because orthogonal ribosomes (O-ribosomes) are not linked to translation of the proteome, they can be extensively mutated to alter and explore their properties (15). This has enabled O-ribosomes to be used to perform large-scale mutagenesis of the rRNA residues that make up the interface between large and small ribosomal subunits (16), to build synthetic gene regulatory circuits controlled at the level of translation (17), and to enhance tRNA recognition of the UAG stop codon over translation termination (18), enabling efficient site-specific unnatural amino acid incorporation at UAG codons (Fig. 1c). Importantly, large-scale mutagenesis of the decoding centre of the O-ribosome led to the discovery of mutant ribosomes that are able to use quadruplet codons as efficiently as triplet codons and, when used in combination with a UAG suppressor tRNA, can be used to incorporate two distinct unnatural amino acids into the same protein (Fig. 1d) (19). Furthermore, the discovery of an O-ribosome mutant that allows the efficient suppression of specific UGA stop codons with a selenocysteine-specific tRNA has opened up the possibility of recoding the UGA stop codon in living cells (Fig. 1e) (20). These advances provide evidence that an orthogonal translational system might be used to encode the synthesis of new types of ligands and polymers in vivo, allowing the addition and evolution of new programmable functions and resulting in phenotypes not discovered in cells that merely encode L-amino acid proteins.

Programmable RNA-binding Proteins

RNA-binding proteins play essential roles in the lifecycles of RNA. As such, the use and engineering of RNA-binding proteins could allow the manipulation of many different aspects of gene expression. Most RNA-binding proteins recognise their targets using a combination of sequence and structure, which has meant that designing RNA-binding proteins with new specificities has proven particularly challenging (21). Studies of RNA-binding repeat proteins have now paved the way to create designer proteins that target endogenous RNAs of interest. Proteins of the Pumilio and FBF homology (PUF) family interact with RNA via an array of repeats of a three alpha helix bundle of 36 or more amino acids (Fig. 2a) (22). The most striking and useful feature of the PUF domain is

that each repeat binds to a single base in its RNA target, in a modular fashion. Amino acids at positions 12 and 16 of the PUF repeat bind the Watson-Crick edge of each RNA base via hydrogen bonding or van der Waals contacts and are base-specific, such that specific pairs of amino acids bind specific bases (22). Recent studies have elucidated the complete code for recognition of all four bases and shown that pairs of amino acids can be swapped to make designer RNA-binding proteins with the potential to target any RNA sequence of interest (Fig. 2c) (23).

Using the PUF code for RNA recognition, proteins have been engineered to recognise endogenous RNAs in order to track their localisation, cleave them, and to activate or repress their translation (24). In an interesting example, Wang et al. used PUF proteins to create engineered splicing factors (24). A PUF domain was designed to bind upstream of a splice site in the Bcl-X mRNA, which encodes a mitochondrial outer membrane protein that is involved in programmed cell death (apoptosis). Alternative splicing of this transcript enables the production of two distinct isoforms, Bcl-XL and Bcl-XS, which act as an apoptotic inhibitor or an apoptotic activator, respectively. By fusing a designed PUF domain to a splicing inhibition domain it was possible to increase the production of Bcl-XS and induce apoptosis in cancer cells (24). This illustrates the potential for designer RNA-binding proteins to modulate gene expression at the level of RNA and how they can provide unique opportunities for fine-tuning the expression of RNAs. As well as providing a rapid response, control at the RNA level is particularly useful because of its reversibility and because some aspects of gene expression can only be controlled post-transcriptionally, such as the nuclear retention or cytoplasmic localisation of mRNAs (25). The availability of designer RNA-binding proteins will provide new tools to create synthetic networks that are controlled at the level of RNA and to improve our understanding of the complex programs of gene expression in living cells.

Computational studies of another family of RNA-binding repeat proteins, known as pentatricopeptide repeat (PPR) proteins, have predicted that they also bind their targets in a modular and sequence-specific manner (26,27). PPR proteins contain a repeated motif that is typically 35 amino acids in length and folds into two anti-parallel alpha helices (Fig. 2b) (28). Natural proteins have been observed to contain between two and thirty

Fig. 1. Engineered ribosomes.
a. A cartoon representation of the structure of the bacterial ribosome containing both mRNA and tRNAs. Ribosomal proteins are coloured green, the large subunit rRNA in pink, the small subunit rRNA in purple, the mRNA in orange, and the E-site and P-site tRNAs in blue and yellow, respectively. The intimate interaction between the ribosome-binding site (RBS) in the mRNA and the mRNA-binding site (MBS, coloured red) is shown in the expanded view. The molecular details are modelled from the crystal structure of the Thermus thermophilus ribosome (Protein Data Bank accession code 4V4J; image created with PyMOL, http://www.pymol.org).
b. The engineered orthogonal ribosome (O-ribosome) only translates an otherwise silent orthogonal mRNA (O-mRNA), enabling the ribosome to be manipulated without affecting cell health. Subsequent evolution of O-ribosomes enabled the selection of ribosomes that enhanced the incorporation of unnatural amino acids at UAG stop codons (c), the decoding of quadruplet codons (d), and the incorporation of selenocysteine at UGA codons flanked by a selenocysteine insertion sequence (SECIS) in the mRNA (e).
Panel labels: UAG with an unnatural amino acid (c); AGGA with a quadruplet anticodon (d); UGA with SelB, SECIS and selenocysteine (e).
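The decoding changes summarised in Fig. 1c to 1e can be illustrated with a toy translation routine. This is only a sketch under stated assumptions: the function name `translate`, the one-letter placeholders `X` and `Z` for unnatural amino acids, and the rule that a listed quadruplet always wins over the overlapping triplet are illustrative choices, not part of the published systems. Competition with release factors, SECIS dependence and decoding efficiency are not modelled.

```python
# Toy model of genetic code expansion (illustrative assumptions only).
BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, reassigned=None, quadruplets=None):
    """Translate an mRNA string (A, C, G, U) read from its first base.

    reassigned  -- hypothetical map of triplet codon -> residue letter,
                   e.g. {"UAG": "X"} for a suppressed amber codon.
    quadruplets -- hypothetical map of four-base codon -> residue letter,
                   e.g. {"AGGA": "Z"}; assumed to take priority over triplets.
    """
    reassigned = reassigned or {}
    quadruplets = quadruplets or {}
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        quad = mrna[i:i + 4]
        if quad in quadruplets:          # four-base decoding step
            protein.append(quadruplets[quad])
            i += 4
            continue
        codon = mrna[i:i + 3]
        residue = reassigned.get(codon, STANDARD_CODE[codon])
        if residue == "*":               # unsuppressed stop codon terminates
            break
        protein.append(residue)
        i += 3
    return "".join(protein)


message = "AUGAGGAGCUUAGUAA"
print(translate(message))                                # standard decoding
print(translate(message, {"UAG": "X"}, {"AGGA": "Z"}))   # expanded decoding
```

With the standard code the example message is read in triplets, whereas supplying both a reassigned UAG and an AGGA quadruplet shifts the reading frame at AGGA and reads through the UAG, which is the combination described for incorporating two distinct unnatural amino acids into one protein.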


Fig. 2. Universal codes for RNA-protein recognition.
a. Proteins of the Pumilio and FBF homology (PUF) family interact with RNA via tandem repeat elements, in an ‘anti-parallel’ configuration. Capping repeats (designated 1′ and 8′) stabilise the RNA-binding repeats. A cartoon representation of the structure of the human PUF protein, PUM1 (Protein Data Bank accession code 1M8Y), is shown with alternating repeats coloured in green and purple, and its RNA target in orange.
b. Consensus pentatricopeptide repeat (cPPR) proteins interact with RNA in the opposite configuration to PUF proteins. A helix-nucleating cap at the N-terminus and a solvating helix at the C-terminus stabilise the RNA-binding repeats. A cartoon representation of the structure of a cPPR protein designed to bind a poly(C) RNA (Protein Data Bank accession code 4WSL) is coloured as for (a). Individual RNA nucleotides are modelled into the apo-protein structure (as PPR proteins adopt a more compact fold upon association with RNA the nucleotides cannot be linked into a contiguous RNA chain). Structure images were created with PyMOL, http://www.pymol.org.
c. Amino acids at positions 12 and 16 of each PUF repeat recognise specific RNA bases, while amino acids at positions 4 and 34 of each cPPR repeat are responsible for binding specificity. The pairs shown in the panel are:
U: PUF repeat N12 and Q16; cPPR repeat N4 and D34
C: PUF repeat S12 and R16; cPPR repeat N4 and S34
A: PUF repeat C12 and Q16; cPPR repeat T4 and N34
G: PUF repeat S12 and E16; cPPR repeat T4 and D34

individual PPRs. Some PPR proteins appear to consist almost entirely of tandem PPRs, while some contain other domains, such as endonuclease or protein interaction domains (28). Statistical correlations between specific PPR residues and RNA bases within their binding sites have elucidated a potential code for RNA recognition by PPR proteins based on the identities of amino acids at positions 4 and 34 and particular bases within the RNA footprint (26,27). A key limitation in studying and engineering PPR proteins is their highly insoluble nature when expressed in heterologous systems. This problem severely delayed the elucidation of the mechanisms of PPR-RNA binding and their potential use as tools. To better understand the modes by which PPR proteins bind RNA, and to develop PPR scaffolds that would enable robust and reliable recognition of RNA targets of interest, synthetic PPR domains were designed based on the conservation of residues within PPRs throughout evolution. These consensus PPRs (cPPRs) are highly soluble and, via an appropriate choice of amino acids at positions 4 and 34, can be designed to bind RNA in a predictable, sequence-specific manner (Fig. 2c) (29).

The stability of cPPRs enabled the structures of four of these proteins to be solved, providing new insights into RNA recognition by PPR proteins (29). Although the cPPRs varied only in the identities of the amino acids at positions 4 and 34, variations in their curvature highlight the capacity of individual PPR repeats to pack variably against each other and indicate that the RNA-binding residues at positions 4 and 34 might influence the overall architecture of PPR arrays. This flexibility could also reflect a pre-requisite to accommodate RNAs of various

sequences, in particular if they contain purines rather than pyrimidines as these would require pockets that allow deeper insertion. Interestingly, the positions of RNA nucleotides modelled into the cPPR structures showed that their phosphate groups were at hydrogen bonding distance from lysine residues at position 12 of each repeat. Mutant cPPRs where the lysines at position 12 of each repeat were changed to either serine or aspartate no longer bound RNA, confirming that these lysines play a key role in stabilising bound RNA (29). These observations illustrate a recurring theme in synthetic biology: that engineering biomolecules often reveals new insights into the basic functioning of natural systems.

The natural PPR proteins characterised to date operate in mitochondria, chloroplasts and nuclei; locations where the most common RNA-directed tool, RNA interference, cannot function or functions poorly to target RNAs. As proteins that contain PPRs have often been observed to contain many other domains with diverse roles in RNA metabolism, such as RNA cleavage, modification, and control of translation (28), these proteins are likely to be structurally compatible for fusion with partner proteins. These qualities will likely be very useful for making new research tools to manipulate aspects of RNA biology that have been neglected due to a lack of appropriate reagents and also for controlling gene networks to build cells with new properties in synthetic biology.

Future Directions

Decades of work in the RNA field have provided a compendium of RNAs with diverse functions in molecular recognition, catalysis and macromolecular scaffolding, as well as an understanding of their associated proteins. These building blocks provide powerful tools to address the new challenges provided by the ethos of synthetic biology and have been integral in the development of the field to date. The diverse characteristics and roles of RNA that have been discovered in nature and evolved in the laboratory illustrate the fact that the many possible applications of RNA in synthetic biology are likely to be limited predominantly by our imaginations.

References

1. Benner, S.A., and Sismour, A.M. (2005) Nat. Rev. Genet. 6, 533-543
2. Gibson, D.G., Glass, J.I., Lartigue, C., Noskov, V.N., Chuang, R.Y., Algire, M.A., Benders, G.A., Montague, M.G., Ma, L., Moodie, M.M., Merryman, C., Vashee, S., Krishnakumar, R., Assad-Garcia, N., Andrews-Pfannkoch, C., Denisova, E.A., Young, L., Qi, Z.Q., Segall-Shapiro, T.H., Calvey, C.H., Parmar, P.P., Hutchison, C.A., 3rd, Smith, H.O., and Venter, J.C. (2010) Science 329, 52-56
3. Ro, D.K., Paradise, E.M., Ouellet, M., Fisher, K.J., Newman, K.L., Ndungu, J.M., Ho, K.A., Eachus, R.A., Ham, T.S., Kirby, J., Chang, M.C., Withers, S.T., Shiba, Y., Sarpong, R., and Keasling, J.D. (2006) Nature 440, 940-943
4. Liu, C.C., and Schultz, P.G. (2010) Annu. Rev. Biochem. 79, 413-444
5. Filipovska, A., and Rackham, O. (2008) ACS Chem. Biol. 3, 51-63
6. Isaacs, F.J., Dwyer, D.J., and Collins, J.J. (2006) Nat. Biotechnol. 24, 545-554
7. Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., and Benenson, Y. (2011) Science 333, 1307-1311
8. Wiedenheft, B., Sternberg, S.H., and Doudna, J.A. (2012) Nature 482, 331-338
9. Mali, P., Esvelt, K.M., and Church, G.M. (2013) Nat. Methods 10, 957-963
10. Chin, J.W. (2014) Annu. Rev. Biochem. 83, 379-408
11. Hemphill, J., Chou, C., Chin, J.W., and Deiters, A. (2013) J. Am. Chem. Soc. 135, 13433-13439
12. Johnson, D.B., Xu, J., Shen, Z., Takimoto, J.K., Schultz, M.D., Schmitz, R.J., Xiang, Z., Ecker, J.R., Briggs, S.P., and Wang, L. (2011) Nat. Chem. Biol. 7, 779-786
13. Lajoie, M.J., Rovner, A.J., Goodman, D.B., Aerni, H.R., Haimovich, A.D., Kuznetsov, G., Mercer, J.A., Wang, H.H., Carr, P.A., Mosberg, J.A., Rohland, N., Schultz, P.G., Jacobson, J.M., Rinehart, J., Church, G.M., and Isaacs, F.J. (2013) Science 342, 357-360
14. Rackham, O., and Chin, J.W. (2005) Nat. Chem. Biol. 1, 159-166
15. Filipovska, A., and Rackham, O. (2013) FEBS Lett. 587, 1189-1197
16. Rackham, O., Wang, K., and Chin, J.W. (2006) Nat. Chem. Biol. 2, 254-258
17. Rackham, O., and Chin, J.W. (2005) J. Am. Chem. Soc. 127, 17584-17585
18. Wang, K., Neumann, H., Peak-Chew, S.Y., and Chin, J.W. (2007) Nat. Biotechnol. 25, 770-777
19. Neumann, H., Wang, K., Davis, L., Garcia-Alai, M., and Chin, J.W. (2010) Nature 464, 441-444
20. Thyer, R., Filipovska, A., and Rackham, O. (2013) J. Am. Chem. Soc. 135, 2-5
21. Mackay, J.P., Font, J., and Segal, D.J. (2011) Nat. Struct. Mol. Biol. 18, 256-261
22. Wang, X., McLachlan, J., Zamore, P.D., and Hall, T.M. (2002) Cell 110, 501-512
23. Filipovska, A., Razif, M.F., Nygard, K.K., and Rackham, O. (2011) Nat. Chem. Biol. 7, 425-427
24. Wang, Y., Cheong, C.G., Hall, T.M., and Wang, Z. (2009) Nat. Methods 6, 825-830
25. Filipovska, A., and Rackham, O. (2011) RNA Biol. 8, 978-983
26. Barkan, A., Rojas, M., Fujii, S., Yap, A., Chong, Y.S., Bond, C.S., and Small, I. (2012) PLoS Genet. 8, e1002910
27. Yagi, Y., Hayashi, S., Kobayashi, K., Hirayama, T., and Nakamura, T. (2013) PLoS ONE 8, e57286
28. Schmitz-Linneweber, C., and Small, I. (2008) Trends Plant Sci. 13, 663-670
29. Coquille, S., Filipovska, A., Chia, T., Rajappa, L., Lingford, J.P., Razif, M.F., Thore, S., and Rackham, O. (2014) Nat. Commun. 5, 5729
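As a supplementary illustration of the modular recognition codes described for PUF repeats (positions 12 and 16) and cPPR repeats (positions 4 and 34), the lookup can be sketched as a small design helper. The residue pairs are those listed for Fig. 2c. The function name `design_repeats`, the tuple layout, and the simplification that each base has exactly one preferred pair are assumptions made for illustration. Real designs also depend on the scaffold, capping repeats and stacking residues, none of which are modelled here.

```python
# Sketch of "one repeat, one base" design using the pairs listed for Fig. 2c.
PUF_CODE = {   # base -> (residue at position 12, residue at position 16)
    "U": ("N", "Q"), "C": ("S", "R"), "A": ("C", "Q"), "G": ("S", "E"),
}
CPPR_CODE = {  # base -> (residue at position 4, residue at position 34)
    "U": ("N", "D"), "C": ("N", "S"), "A": ("T", "N"), "G": ("T", "D"),
}


def design_repeats(target_rna, scaffold):
    """Return [(repeat_number, base, first_residue, second_residue), ...].

    target_rna is written 5' to 3'. Repeats are numbered from the protein
    N-terminus. PUF proteins bind RNA anti-parallel, so repeat 1 reads the
    3'-most base; cPPR proteins bind in the opposite configuration, so
    repeat 1 reads the 5'-most base.
    """
    if scaffold == "PUF":
        code, bases = PUF_CODE, target_rna[::-1]
    elif scaffold == "cPPR":
        code, bases = CPPR_CODE, target_rna
    else:
        raise ValueError("scaffold must be 'PUF' or 'cPPR'")
    design = []
    for number, base in enumerate(bases, start=1):
        if base not in code:
            raise ValueError(f"unsupported base: {base!r}")
        design.append((number, base) + code[base])
    return design


for repeat in design_repeats("UGUAUAUA", "PUF"):
    print(repeat)
```

For an eight-base target the PUF design lists repeat 1 against the last base of the sequence and repeat 8 against the first, while the same call with `"cPPR"` keeps the order of the target unchanged.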
