Linking RNA Sequence, Structure, and Function on Massively Parallel High-Throughput Sequencers

Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

Sarah K. Denny1 and William J. Greenleaf1,2,3

1Stanford University Department of Genetics, Stanford, California 94305 2Stanford University Department of Applied Physics, Stanford, California 94025 3Chan Zuckerberg Biohub, San Francisco, California 94158 Correspondence: [email protected]

SUMMARY

High-throughput sequencing methods have revolutionized our ability to catalog the diversity of RNAs and RNA–protein interactions that can exist in our cells. However, the relationship between RNA sequence, structure, and function is enormously complex, demonstrating the need for methods that can provide quantitative thermodynamic and kinetic measurements of macromolecular interaction with RNA, at a scale commensurate with the sequence diversity of RNA. Here, we discuss a class of methods that extend the core functionality of DNA sequencers to enable high-throughput measurements of RNA folding and RNA–protein interactions. Top- ics discussed include a description of the method and multiple applications to RNA-binding proteins, riboswitch design and engineering, and RNA tertiary structure energetics.

Outline

1 Background 4 Conclusion 2 Techniques References 3 Applications

Editors: Thomas R. Cech, Joan A. Steitz, and John F. Atkins Additional Perspectives on RNA Worlds available at www.cshperspectives.org

Copyright # 2018 Cold Spring Harbor Laboratory Press; all rights reserved Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 1 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press S.K. Denny and W.J. Greenleaf

1 BACKGROUND for these multiple structural states at an immense scale— commensurate with the combinatorial sequence complexity Our ability to catalog the immense diversity of RNAs of nucleic acids. And although a variety of high-throughput that exist in the cell has accelerated at a spectacular rate with methods have been developed to measure RNA structure the advent of high-throughput sequencing methods, allow- and RNA–protein binding, few allow for reporting of these ing the mapping of a universe of previously uncharacterized interactions in terms of the thermodynamic and kinetic transcripts. Likewise, sequencing methods have enabled the constants that must govern physical behaviors in the cell. mapping of the diverse proteins that interact with RNA in Transcriptome-wide RNA structure measurements have vivo, providing a window into a network of protein–RNA relied on proximity cross-linking (Ramani et al. 2015) or interactions critical to gene expression regulation (Butter base accessibility (Ding et al. 2014), each of which provides et al. 2009; Castello et al. 2012). However, our understand- constraints on the behavior of the ensemble of RNA struc- ing of the detailed relationship between the sequence of tures, but these methods require averaging over many RNA and its structure and function is limited, both in terms different structures and thus cannot quantify occupancy of how complex RNAs fold into 3D structures and also how across different structural states and cannot be easily read RNA sequence and structures determine protein–RNA in- out in terms of binding energies and kinetic constants. teractions. A predictive, quantitative understanding for In vivo approaches to quantify RNA–protein interac- these functional determinants (both 3D structure and pro- tions through cross-linking and immunoprecipitation tein interaction) will be needed before we can truly under- (CLIP, eCLIP, iCLIP, etc.) (Hafner et al. 2010; Van Nostrand stand, and engineer, RNA functions. et al. 2016) have mapped enrichment of RNA-binding However, the sequence–structure–function relation- protein (RBP) binding at high throughput, but often ship in RNA is complex—perhaps not as fundamentally suffer from poor sequence resolution, biased cross-linking combinatorial and interactive as protein, but much more efficiency, large variability in the abundance of different challenging than double-stranded, or even single-stranded, potential binding sites, and indirect targeting of binding DNA. RNA makes diverse intramolecular interactions— interactions with an antibody. In vitro approaches can solve Watson-Crick base pairs, noncanonical base pairs, tertiary a number of these uncertainties, and medium-throughput interactions, etc.—that define the 3D structures RNA takes approaches for quantitative measurement have been en- on (Turner et al. 1988; Tinoco and Bustamante 1999), abled by microfluidic methods (Martin et al. 2012). High- which in turn sets the landscape for its function. For exam- er-throughput measurements of relative RBP affinity have ple, critical RNAs in translation, splicing, and chromatin been achieved with sequencing-enrichment approaches like organization and compaction must form precise tertiary RNAcompete, RNA-bind-and-seq, RNA SELEX, HiTS- structures to achieve their biological function (Staley and Kin, and HiTS-Eq (Ellington and Szostak 1990; Ray et al. Guthrie 1998; Kieft et al. 2001; Noller 2005). In higher 2009; Lambert et al. 2014; Jankowsky and Harris 2017; Lou eukaryotes, these biological functions are rarely achieved et al. 2017). These methods have enabled measurements in the absence of RNA-binding proteins, which help to of relative RBP affinity across randomized libraries; howev- coordinate and regulate gene expression at multiple stages er, they often require immense sequencing resources for in the life cycle of an RNA (Moore 2005; Keene 2007), and readout and the enrichment readout is not as generalizable further entangle the RNA folding process with the physical as direct measurements of biophysical parameters. determinants of RNA–protein interactions. Furthermore, A recently developed class of techniques has enabled RNA sequences form conformationally dynamic structures, direct, fluorescence-based, quantitative measurement of with biologically functional conformational transitions and binding affinityand kineticsto RNA, with the same or larger multiple nonnative folded states (Winkler and Breaker experimental capacity as sequencing-based approaches. 2005; Solomatin et al. 2010). Protein binding can affect Two highly related approaches in this class are RNA-MaP the occupancyand rate of exchange between these structural (quantitative analysis of RNA on a massively parallel array) states, while conversely the structure of the RNA helps to (Buenrostro et al. 2014) and HiTS-RAP (high-throughput define protein binding affinity and specificity. Therefore, sequencing and RNA affinity profiling) (Tome et al. 2014); there exists a need for methods that can begin to provide In this review, we will refer to this class of methods collec- generalizable, broad insights into the complex relationship tively as RNA-HiTS, for an RNA array on a high-through- between RNA sequence, dynamic conformations, and pro- put sequencer. tein binding, creating a principled framework for under- These methods make use of the core functionality of standing RNA’s diverse functions within the cell. the high-throughput DNA sequencers (i.e., clonal amplifi- However, the challenge is vast. A productive approach cation and massively parallel fluorescence quantification) to to these problems requires measurements that can account enable high-throughput, fluorescence-based detection of

2 Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press RNA Studies on High-Throughput Sequencers nucleic acid substrates for diverse applications. RNA-HiTS 2 TECHNIQUES was born from work on DNA–protein interactions that 2.1 Illumina Platforms Generate Arrays of Sequence- used a high-throughput sequencer to characterize protein Identified DNA Clusters on the Surface of a Chip binding to a vast swath of DNA sequences (Nutiu et al. 2011)—the first demonstration of the power of the sequenc- During the process of high-throughput sequencing on ing instrument for quantitative determination of affinity an Illumina GAII(x) DNA Sequencer, millions of individ- landscapes of mutant nucleic acid sequences. Other appli- ual DNA strands are first immobilized on a flow cell surface, cations of sequencing infrastructure include selection of then clonally amplified to generate a “cluster” of DNA con- specific oligonucleotide sequences within a pool of targets taining approximately 1000 identical molecules (Fig. 1A). (Matzas et al. 2010), detection of mRNA sequence using During the sequencing process, the nucleic acid sequence of fluorescently labeled transfer RNAs (tRNAs) (Uemura et al. each clonal cluster is determined by step-wise sequencing- 2010), and determination of the length of poly(A) tails by-synthesis of one strand of each DNA molecule on the among different classes of mRNAs (Subtelny et al. 2014). surface of the flow cell. Fluorescently labeled nucleotides are RNA-HiTS technology substituted diverse RNA targets for incorporated such that each incorporation cycle allows de- the original dsDNA targets in Nutiu et al. (2011), ultimately termination of the identity of one base of the DNA sequence enabling equally comprehensive and quantitative character- (Fig. 1A). Typical sequencing reactions proceed for at least ization of binding to RNA (Buenrostro et al. 2014; Tome 30 cycles per read (up to 300 cycles), and multiple reads can et al. 2014). be performed starting from different priming sequences to

A DNA sequencing on an lllumina sequencing platform Library variants Cluster generation Sequencing by synthesis Cycle 1 Cycle 2 ... Cycle N

G A T CCG A T G A T C × 107

Flow cell surface TIRF images of fluorescent nucleotide incorporation:

C A T ... G G C A T G Oligonucleotide N = 103 DNA covalently attached to molecules per cluster flow cell surface

B RNA generation on a sequencing flow cell C RNA polymerase stalling methods

(1) 2nd-strand (2) RNA polymerase (3) RNA extension Streptavidin ssDNA synthesis initiation and stalling “roadblock” Tus binding to Ter site makes Biotin on Primer Nascent “roadblock” RNA primer +DNA pol. (step 1) +dNTPs RNA pol. +NTPs E. coli RNAP T7 RNA polymerase Sequencing flow cell

Figure 1. Harnessing the Illumina sequencing platform for systems-level biochemistry of RNA. (A) Schematic for DNA sequencing on an Illumina platform. DNA fragments with sequencing adapters on 5′ and 3′ ends anneal to complementary oligonucleotides covalently attached to the sequencing flow cell surface. Molecules are clonally amplified by polymerase chain reaction (PCR) to form “clusters” on the chip surface, which are then sequenced by sequential incorporation of reversibly terminated, fluorescent nucleotides. (B) RNA-HiTS extends the Illumina sequencing platform by transcribing clusters of DNA into RNA directly on the surface of the sequencing chip. (1) The second strand of DNA is regenerated with a DNA polymerase. (2) An RNA polymerase is initiated by allowing binding to an initiation sequence designed into the DNA fragment. (3) The RNA polymerase is allowed to extend until it stalls at a “roadblock,” resulting in stable display of the nascent RNA transcript (gray). (C) Two methods for stalling the RNA polymerase. (Left) A biotinylated primer is used for second-strand synthesis (see B); streptavidin is then bound to this biotin before transcription, making a “roadblock” that stalls the Escherichia coli RNA polymerase during the extension step. (Right) A Tus protein binds a Ter sequence element designed into each DNA fragment, which stalls T7 RNA polymerase.

Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 3 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press S.K. Denny and W.J. Greenleaf detect barcodes for demultiplexing or to read from both to its DNA template. In an alternate stalling scheme, the ends of the molecule for paired-end sequencing. replication terminator protein Tus is bound to a 32-bp The GAII hardware platform integrates fluidics han- Ter sequence element engineered into the DNA library dling and thermal control that enables molecular biology (Mohanty et al. 1996; Mulugu et al. 2001). T7 RNA poly- reactions to take place on the surface of the chip, as well merase then stalls when it encounters the bound Tus pro- as total internal reflection fluorescence (TIRF) imaging tein (Fig. 1C) (Tome et al. 2014). for detection of nucleotide incorporation across clusters After transcription, DNA oligos may be annealed to in massive throughput. Initial implementations of RNA- common sequences on all transcripts to help stabilize the HiTS methods hijacked these instruments directly (Buen- desired secondary structure of the RNA, to provide a read- rostro et al. 2014; Tome et al. 2014). However, the GAII out of transcription efficiency (Fig. 2A), or to otherwise platform is antiquated and unsupported, and newer, more label the transcribed RNA. economical Illumina instruments like the MiSeq allow much more rapid sequencing turnaround. However, these 2.3 Performing Kinetic and Thermodynamic newer instruments are highly integrated and do not support Measurements and Quantifying Images the facile implementation of nonstandard molecular biology and imaging workflows. Recently, our laboratory has The TIRF imaging platform enables direct measurement of developed a modified, custom-built imaging station based association of any fluorescently labeled binding partner on the original Illumina GAII that decouples the platform to every RNA cluster on the surface of the chip. This direct used for sequencing from the instrument used for custom observation of binding allows relatively straightforward biochemistry applications (She et al. 2017). This imaging measurement of kinetic and thermodynamic parameters. station is built using components from the original Illumina For example, the equilibrium dissociation constants (Kd) GAII(x) but is designed to interface with Illumina MiSeq can be measured by applying different concentrations of chips, with “home brew” image analysis software and hard- the binding partner to the chip, waiting sufficient time for ware interfaces. equilibration, then imaging fluorescence observed at each cluster (Fig. 2B,C). Quantifying the bound fluorescence across concentrations and fitting these data to a binding 2.2 In Situ Transcription to Display Nascent curve enables inference of the Kd and the standard free- Transcripts of RNA at High Throughput o energy change of forming the bound complex (ΔG =RT The conversion of the native DNA array of a postse- log Kd) (Fig. 2C). Dissociation rate constant (koff ) can like- quenced chip to an RNA array requires in situ transcrip- wise be directly measured by subsequently diluting the fluo- tion of the DNA clusters on the surface of the flow cell. rescently bound binding partner from solution, then Broadly, this in situ transcription is performed in three sequentially imaging of the loss of bound fluorescence steps: beginning with single-stranded DNA molecules co- over time. In principle, association rate constants can be valently attached to the sequencing flow cell surface, (1) measured in a similar manner, or can be calculated from second-strand synthesis of the DNA molecules by primer kon = koff/Kd. annealing and extension by a DNA polymerase is carried A necessary step to quantifying fluorescence of each out, (2) an RNA polymerase is initiated in a sequence- cluster is to align the positions of clusters obtained from specific manner, and then (3) a nascent RNA transcript the sequencing data with each fluorescent image (Fig. 2D). is extended until the RNA polymerase reaches a “road- An initial cross-correlation provides a rough alignment that block” that stably halts the polymerase on the DNA tem- accounts for overall offsets between the sequencing data and plate (Fig. 1B). the images, and then subsequent iterations of cross-corre- This “roadblock” has been implemented in two distinct lation of progressively finer grids of subtiles then account ways. In the first, a biotinylated primer can be used to prime for any optical aberrations introduced in different imaging second-strand synthesis, and subsequently this biotin may stations. Together, these values are fit to a continuous offset be bound by streptavidin before RNA polymerase initiation map that can be used to obtain precise alignment of se- and extension (Fig. 1C) (Buenrostro et al. 2014). The RNA quence data and experimental images (She et al. 2017). polymerase (in this case Escherichia coli RNAP) stalls when Subsequently, subtiles are fit to a sum of 2D Gaussian func- it encounters this terminal biotin–streptavidin roadblock tions to determine the integrated fluorescence of each clus- (Greenleaf et al. 2008), allowing stable display of the nascent ter. This strategy also allows for a subtile-specific RNA transcript. Furthermore, this workflow aims to allow background fluorescence value to be determined to account only a single RNAP molecule to initiate at the engineered for any differences in overall fluorescence in different re- promoter, producing a single piece of RNA stably tethered gions of the chip.

4 Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press RNA Studies on High-Throughput Sequencers

A BC RNA variants Equilibrium binding series []= 1 nM 10 nM 100 nM 1000 nM Fluorescently labeled binding partner

Sequencing flow cell Bound complex Image quantification Transcribed RNA:

Measured Kd (nM) 1 500 Kd 20 Quantified fluorescence

----Cluster locations 1 10 100 1000 [ ] (nM)

D Image quantification Bound fluorescence Cluster centers Quantified fluorescence (imaging station) (sequencing data) Image and data, aligned for each sequence

Hierarchical 2D cross- Gaussian correlation fitting

151 250 151 250 0 99 Cluster sequence has RNAP initiation site Cluster sequence no RNAP initiation site

Figure 2. Performing kinetic and thermodynamic measurements and quantifying images. (A) After in situ transcription, transcribed RNAs may be labeled with fluorescent oligonucleotides complementary to a common sequence to all library variants (simulated data below). (B) Application of a fluorescently labeled binding partner to the flow cell enables detection of the thermodynamics and kinetics of binding to each transcribed library variant. (C) A typical equilibrium binding experiment proceeds by applying a specific concentration of the binding partner to the flow cell, waiting for equilibrium, detecting bound fluorescence at each cluster of RNA, and then repeating at multiple concentrations of binding partner. After image quantification, fluorescent measurements are fit to a binding isotherm fi fl to obtain the observed equilibrium dissociation constant, Kd.(D) Image quanti cation work ow. First, images are aligned with the sequencing data through hierarchical cross-correlation. Images are then fit to a sum of 2D Gaussians, with each Gaussian centered at the position of each cluster. Scale bar, 2.5 µm.

2.4 Library Considerations and Requirements lection of a diverse set of natural sequences. A completely randomized nucleic acid sequence library is often not The ﬁrst step in any RNA-HiTS experiment is to generate a feasible—experiments can generally probe on the order diverse library of sequenceable DNA fragments that will of 106 different sequences at most (assuming an average serve as the basis of the RNA array. The generation of this of 10 replicate clusters per sequence variant and 107 total DNA library has two distinct steps. First, one must design clusters per chip), which corresponds to a completely and synthesize the variable region that will be transcribed in random sequence of only 10 bases. Structured RNA motifs situ. Next, this variant library must be converted into a like hairpin stems will often exceed this limit and have sequenceable library. We will deal with each of these steps considerable sequence constraints to maintain base- in turn. pairing, making targeted sequence variations as opposed Variable DNA sequences that serve as the core of a to random N-mers often more informative and interpret- RNA variant library can be created in three ways: (1) a full able. One way to focus investigational throughput around or partial randomization of nucleic acid synthesis, (2) an interesting functional variations is to produce a library array-based programmed library synthesis, and (3) the se- of sequence variants “centered” on a known consensus se-

Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 5 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press S.K. Denny and W.J. Greenleaf quence motif with error-prone polymerase chain reaction spurious results. A per-base error rate of ∼0.1% or more can (PCR) (Cirino et al. 2003; Tome et al. 2014), or with doped- be expected for Illumina sequencing. For a variable region in synthesis (Buenrostro et al. 2014), each of which can of 100 nt, such a rate might produce misassignment of introduce single, double, and some fraction of triple and a substantial fraction (1/10th) of library variants. One strat- higher-order mutations to the known motif. However, these egy to remove the impact of sequencing error is to introduce classes of “dirty” synthesis are constrained by the statistics a unique molecular identifier (UMI) to each molecule with- of random incorporation: In general, the consensus se- in the library, comprising a random sequence of 16 nt. quence will be highly represented, whereas higher-order On incorporation of this UMI to each memberof the library, mutations will have a much lower representation depending the library is diluted to ∼8×105 molecules and then ream- on sequence length and the error rate. plified, creating a “bottlenecked” population of PCR ampli- More recently, high-throughput oligo synthesis plat- cons, with each UMI represented many times in the library. forms have given researchers access to programmed parallel Sequencing this bottlenecked library allows for confident synthesis of up to 106 unique oligonucleotide sequences. assessment of the specific sequence variant associated These commercially available technologies allow research- with each UMI, and downstream analysis requires only ers to design a library of sequence variants that will each be the sequence of the UMI to identify the molecular variant. synthesized independently with approximately equal repre- In general, the number of sequenced clusters should sentation. This synthesis method allows the designer to ask exceed the number of variants by at least 10-fold, in order multiple questions within each library—that is, by introduc- to make multiple distributed measurements for each mo- ing variation in multiple known consensus sequences to lecular variant in the library to enable a principled under- assess evolutionary paths between the sequences or by standing of measurement noise. changing the sequences flanking the consensus sequence to assess the context dependence of mutational effects. Cur- rently, commercially available oligonucleotide pools are 3 APPLICATIONS generally limited to sequences <200 nt in length, and 3.1 Sequence and Structural Determinants whereas synthesis error rates can introduce additional un- of Protein–RNA Interactions wanted variation, efforts for improving fidelity are ongoing. 3.1.1 MS2 Coat Protein to Mutated RNA Targets A final method takes advantage of the diversity of sequence variants inherent to biological genomes. For ex- An overarching goal of RNA-HiTS technology has been to ample, the Saccharomyces cerevisiae genome can be enzy- make a generalizable, predictive model of protein–RNA matically digested to form the variable transcribed region in interaction affinity across any possible RNA sequence var- an RNA-HiTS experiment (She et al. 2017). This approach iant. The first protein targeted for this analysis was the MS2 enabled assessment of binding to targets embedded in their coat protein (Buenrostro et al. 2014), a protein that was physiologically relevant, local sequence context that could initially discovered to play a bifunctional role in viral coat affect local structure formation. assembly and translational repression in the bacteriophage The second step in library generation requires the con- MS2. The high affinity and specificity of this interaction has struction of a DNA fragment amenable to sequencing and made it useful in biotechnology applications including af- transcription, which is achieved by adding common se- finity purification, live-cell RNA imaging, and synthetic quences to the entire library of variants at the 5′ and 3′ biology (Bardwell and Wickens 1990; Bertrand et al. end. Each library variant must have an RNAP initiation 1998). The MS2 protein binds an RNA motif consisting site at the 5′ end of the variable transcribed region, and if of a stem loop structure with a single bulged residue (Fig. using Ter/Tus stalling, the 32-bp Ter element at the 3′ end 3A). Starting with this consensus sequence, RNA-HiTS en- of the transcribed region. Both stalling methods require a abled the quantitative measurement of MS2 interaction sufficient number of base pairs between the stall site of with all single, double, and a subset of triple and higher- the RNAP polymerase and the 3′ end of the RNA target, order mutants to comprehensively assess the contribution as the RNAP polymerase transcripts have a footprint that of each residue to the free-energy landscape of the binding leaves approximately 25 bases of RNA within the exit chan- process. The effects of each single mutant revealed high nel (Greenleaf et al. 2008), preventing access by the fluores- mutational sensitivity to a subset of single stranded residues cently labeled binding partner. important for docking into the single-stranded protein. The DNA sequencing error rate can present an issue for Double-mutant effects revealed epistasis between specific an RNA-HiTS experiment, especially for a library of highly base-paired residues, showing the importance of maintain- related sequences, as sequencing errors can lead to clusters ing base-pairing throughout the stem for MS2 protein rec- being assigned to the wrong molecular variant, producing ognition and binding.

6 Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press RNA Studies on High-Throughput Sequencers

A BC MS2 RNA kon + U k A C off G A G C 0.3 A C ) ) T T G C B B U A

A U Δ G ( k Unbound C G – ΔΔ G ( k A U –6.7 5′ 3′ 5′ 3′ ΔG‡ effect ΔG‡ ΔG‡ ΔG on Bound on off Δ ‡ avg. effect of transition/transversion Goff effect avg. effect of disrupting base pair

D PUM2 F Sequence: UGUACUAAUA

C U ′ Binding register 1 5′ A A A 3 U G U U A ΔΔ G1 C A Binding register 2 ′ ′ 5 G U A U A U A 3 ΔΔ RNA U G2 5′ 3′ Ensemble free-energy: U G U A U A U A C A ΔΔ Binding register 3 Gi ′ 3′ ΔΔG = RT log ∑ exp – 5 U A U A U A ΔΔG U G 3 i RT

E 0.0 ) T

Avg. effect of mutation: B 5′ 3′ C A Binding register N 5′ 3′ U G U A U A A U ΔΔG

– ΔΔ G ( k –5.3 N

Figure 3. Sequence and structural determinants of protein–RNA interactions. (A)(Left) Crystal structure of the MS2 coat protein–RNA interaction (PDB: 2BU1) (Grahn et al. 2001). (Right) Primary sequence and secondary structure of the consensus MS2 RNA binding site. (B) Linear regression analysis of single and double mutants of the consensus MS2 RNA binding site attributed effects to either primary sequence (base colors) or to secondary structure (base pair colors) (Buenrostro et al. 2014). (C) Free-energy diagram showing the unbound, transition, and bound states. Each Δ ‡ Δ Δ ‡ –Δ ‡ Δ G term is linked to the kinetics of forming the complex, with G = Goff Gon. Mutations can affect the G by Δ ‡ Δ ‡ affecting Goff (dotted line), Gon (dashed line), or both. (D) Crystal structure of the RNA-binding domain of human PUM2 protein in complex with RNA (PDB: 3Q0Q) (Lu and Hall 2011), with the primary sequence of the consensus site indicated. (E) The average effect of each single mutation to the consensus site (data from I Jarmoskaite, SK Denny, P Vaidyanathan, et al., in prep.). (F) Multiple binding registers can give rise to a stable interaction between PUM2 and RNA for a given RNA sequence. Residues pointing down contribute to protein binding, whereas residues pointing up are “flipped out.” The ΔΔG of each binding register is calculated with an energy term for each residue contacting the protein, with a penalty for having flipped out residues, as well as other coupling terms. The final effect for a given sequence is the ensemble free-energy of each possible binding register.

‡ These observations allow direct quantification of pri- (Fig. 3C). Each of these ΔG terms is linked to the kinetics of mary sequence and secondary structure determinants of forming the complex, with the first step affecting the asso- protein recognition and binding using a simple regression ciation rate constant, and the second step affecting the Δ ‡ Δ ‡ model that estimates the effect of a transition or a trans- dissociation rate constant: Gon =RT log kon and Goff = version of each residue, and the effect of converting a base RT log koff. Thus, measuring the kinetic rate constants in pair to either a noncanonical base pair or to an otherwise addition to equilibrium binding allows quantification of disrupted base pair (Fig. 3B). This full model was predictive mutational effects on each of these steps of the binding of the higher-order mutation effects. process (Fig. 3C). In the MS2 coat protein experiment, dis- The free-energy change in forming a bound complex sociation rate constants (koff ) were measured for each mu- can be decomposed into two contributions: the free-energy tant, revealing that mutation effects predominantly resulted change between the starting state and the transition state from changes in the association rate constant, kon, suggest- Δ ‡ ( Gon), and the free-energy change between the transition ing that MS2 must wait for the formation of a binding fi Δ ‡ Δ Δ ‡ – Δ ‡ state and the nal state ( Goff), such that G = Goff Gon competent RNA secondary structure before association.

Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 7 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press S.K. Denny and W.J. Greenleaf

Mutational effects can be used to constrain the func- the 3′ UTR of genes (Wickens et al. 2002; Van Nostrand tional landscape in which evolution must operate to gener- et al. 2016). Binding preference for this consensus motif is ate or maintain high affinity binding. Many double mutants conferred by a highly modular protein structure composed within the MS2 hairpin had very small effects and thus of eight amino acid repeats that each interact with a single represent equivalent “solutions” to the evolutionary “prob- RNA nucleotide (Fig. 3D) (Wang et al. 2001). Thus, this lem” of creating a stable RNA–protein interaction. Howev- system provides an ideal testbed for developing an additive er, the mutational paths that connect each pair of double model aimed at predicting binding free-energy from prima- mutants were observed to have very different “roughness” ry nucleic acid sequence. in this functional binding space, depending on the order However, the high-precision measurements afforded by in which the mutations accumulated. These types of data RNA-HiTS analysis revealed a substantially more complex provide quantitative insight into complexities and path de- picture of the binding process, even for this simple interac- pendences of natural and engineered methods for evolving tion, than could be captured by a simple linear model high-affinity binding partners. (I Jarmoskaite, SK Denny, P Vaidyanathan P, et al., in prep.). These complexities stemmed from three primary sources: (1) RNA secondary structure formation can affect 3.1.2 RNA Aptamers for GFP and NELF the accessibility of the binding site; (2) alternate binding RNA-HiTS allows characterization of synthetic RNA– registers with “flipped-out” residues can form stable com- protein interactions. For example, Tome and colleagues plexes; and finally (3) subtle, yet pervasive, energetic cou- characterized the specificity of two RNA aptamers that pling can occur between residues within the binding site. were each originally identified through in vitro selection: Systematic identification and quantification of these dis- aptamers to the green fluorescent protein (GFP) and to the tinct contributions resulted in a two-step model that first Drosophila negative elongation factor (NELF-E) (Shui et al. enumerates the possible binding configurations, then pre- 2012; Pagano et al. 2014; Tome et al. 2014). Variants of each dicts the binding free-energy of each configuration using a aptamer were generated with error-prone PCR, resulting linearly additive model with coupling terms (Fig. 3E,F). The in single, double, and some higher order mutants of each final observed affinity for any sequence is thus derived from sequence. Examining the effects on binding affinity of each the ensemble of free energies for all binding configurations aptamer to their respective proteins identified regions with (Fig. 3F), modulated by secondary structure formation as substantial sensitivity to single point mutants. The summed predicted by algorithms such as Vienna fold (Gruber et al. effects of these single point mutations were predictive of 2008). This model predicted Puf occupancy that was line- the effects of higher-order mutations in the NELF aptamer, arly related to the observed occupancy of these sites as supporting an additive model whereby each residue con- determined in an in vivo cross-linking experiment, high- tributes additively to the affinity of this interaction. How- lighting both the usage of this fully predictive model and ever, this linearity was not observed in the GFP aptamer, supporting the importance of thermodynamics in in vivo which had sensitivity to changes in base-paired regions, settings (Van Nostrand et al. 2016). leading to significant epistasis between residues. These observations show the ability of RNA-HiTS to quantify and 3.2 Protein–RNA Interactions across Transcriptome define the potentially diverse binding mechanisms that may Targets come out of an in vitro selection. Mutational analysis of the GFP aptamer also revealed two point mutants with several- The investigation of the relationship between protein bind- fold higher affinity than the original aptamer, showing ing in the context of in vivo–derived RNAs is another ex- RNA-HiTS’s capacity to optimize affinity over the original citing application of RNA-HiTS. The immense capacity of selected aptamer. We envision that, in the future, the RNA- these instruments opens the possibility of placing an entire HiTS technology may be used as part of the in vitro selec- eukaryotic genome on the chip, and probing binding tion process (i.e., to characterize a broader diversity of to every theoretically possible transcript. This approach, putative RNA targets at earlier rounds of selection). termed the transcribed genome array (TGA), was used to examine binding preferences of the RNA binding domain of Vts1 from S. cerevisiae (She et al. 2017), a member of 3.1.3 Puf Binding to Designed Library of RNA the Smaug family of proteins with conserved RNA binding Puf family proteins are versatile posttranscriptional regula- domains implicated in mediating RNA decay throughout tors found throughout eukaryotes (Quenault et al. 2011). eukaryotes (Aviv et al. 2003; Oberstrass et al. 2006; Rendl The specificity of Puf–RNA interactions is thought to et al. 2008, 2012; Riordan et al. 2011). The entire S. cerevi- be driven by an ∼8-nt consensus sequence found within siae genome was fragmented into ∼100-nt fragments result-

8 Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press RNA Studies on High-Throughput Sequencers ing in coverage of >30× per nucleotide. This library of standing of these aptamers and their coupling to confor- overlapping fragments formed the substrate for in vitro mational changes. transcription in the RNA-HiTS experiment. To make progress on this question, the RNA-HiTS Using this unbiased, transcriptome-wide approach, platform was used to measure the response of tens of approximately 300 sites within the genome were expected thousands of rationally designed RNA riboswitches. These to be significantly occupied within the cell, based on the designs came from scientific “enthusiasts” playing the on- apparent Kds observed and the estimated physiological pro- line game EteRNA. This crowd-based rational approach tein concentration (130 nM); two-thirds of these sites fell enables massively parallel, yet hypothesis-driven science, into the transcribed strand of an open reading frame. These as originally shown for RNA secondary structure folding gene targets were significantly overlapping with those from (Lee et al. 2014). EteRNA players were tasked with design- in vivo cross-linking and immunoprecipitation (RIP-seq) ing RNA sequences that would satisfy the “challenge” of experiments, suggesting that in vitro identified targets re- forming the prespecified aptamer domain and changing capitulate in vivo binding events in aggregate (Aviv et al. another domain’s conformation in response to the effector 2006; Hogan et al. 2008). The functional relevance of these molecule (see eternagame.org/web/). The player-generated binding events was assessed by comparing the expression designs were tested using RNA-HiTS to detect the response between vts1Δ and wild-type cells by RNA-seq. These data of each riboswitch, resulting in a score for each riboswitch; showed that genes associated with TGA-identified binding players could then use these scores in subsequent rounds sites had significantly increased average expression, al- to optimize their designs. In combination with RNA-HiTS, though RIP-identified targets displayed no significant aver- this crowd-sourced approach produced thermodynamical- age expression change. Targets common to both analyses ly optimal sensors, with binding of the aptamer domain had the greatest expression increase, suggesting that TGA fully coupled to changes in the conformation of the readout may help uncover the true positives within RIP-identified domain (JOL Andresen et al., in prep.). targets. In addition, TGA identified many new targets, vir- tually all of which were lowly expressed in vivo, highlighting 3.4 Understanding Sequence Contributions a crucial difference between TGA and in vivo immunopre- to RNA Tertiary Structure cipitation experiments: Cross-linking and pulldown experiments are often limited to probing only targets within A last compelling application of RNA-HiTS is in develop- genes with high expression in the conditions that the cells ing a predictive model for the RNA folding process. had been grown. Instead, TGA allows investigation of af- Although the first step of hierarchical RNA folding (i.e., finity to all possible transcripts simultaneously. secondary structure formation) may be predicted fairly accurately for the lowest free-energy state with readily available software (Turner et al. 1988; Gruber et al. 2008), mod- els to predict RNA tertiary structure from this secondary 3.3 Design and Engineering of RNA Devices structure are substantially more limited in scope. Current Using Massively Parallel Rational Design approaches have focused on obtaining structural character- The flexibility of RNA-HiTS enables investigation of RNA ization of the “modules” of RNA tertiary structure—that beyond its capacity to bind proteins. One exciting area of is, recurrent RNA elements like helices and junctions that application is in the field of biological engineering wherein have similar crystal structures in different RNAs—and then RNA is used as a programmable tool for control of biolog- using these structures like Legos to infer the conformation ical systems. The motivation for this molecular engineering of larger RNAs (Jaeger et al. 2001; Petrov et al. 2013; Miao stemmed from the discovery of natural RNA riboswitches, and Westhof 2017). However, the inherent flexibility of which are conceptually composed of two pieces: the struc- RNA limits the applicability of these approaches to achieve tured aptamer that binds a small-molecule “effector,” and thermodynamic understanding of the stability of tertiary a second region that undergoes a conformational change structures (Herschlag et al. 2015). Furthermore, this ap- in response to the effector, generally resulting in a change proach is limited by the small number of elements for which of gene expression through various mechanisms (Tucker high-quality crystallographic information exists. and Breaker 2005). The separation between these two To approach these challenges, RNA-HiTS was used functions (i.e., the aptamer domain and the resulting to measure the sequence dependence of tertiary structure functional consequence) supports that RNAs might enable formation at high throughput (Denny et al. 2018; Yessel- modular programming of specific computations within the man et al. 2018). RNA elements like helices and two-way cell (Kim and Smolke 2017). However, the design of these junctions were introduced as “guest” motifs into a “host” devices is currently limited by imperfect predictive under- system—together, they formed a minimal RNA tertiary as-

Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 9 ..DnyadWJ Greenleaf W.J. and Denny S.K. 10 Downloaded from http://cshperspectives.cshlp.org/ MS2 PUM2 GFPapt NELFapt Vts1 Riboswitches tectoRNAs

Substrate

MS2 Aptamer binding site Single, double mutants Single, double, and Variants of RNA of consensus Single and double Single and double Genomic fragments of Rationally designed higher-order mutants helices and junctions sequence, in multiple mutants of in vitro mutants of in vitro S. cerevisiae riboswitches from of consensus inserted into tectoRNA flanking sequence and selected sequence selected sequence online game EteRNA sequence. structure contexts. scaffold ieti ril as article this Cite onOctober2,2021-PublishedbyColdSpringHarborLaboratoryPress Binding partner tectoRNA- binding MS2 coat protein PUM2 RNA- EGFP protein NELF-E protein Vts1 RNA- MS2 coat protein partner binding domain binding domain +/- effector odSrn abPrpc Biol Perspect Harb Spring Cold

Observations Kd, koff Kd, koff Kd Kd Kd, obs Kd Kd, koff

Interpretations Primary sequence and Primary sequence Primary sequence Primary sequence Binding sites of Vts1 in Riboswitches that fully Classes of conformational secondary structure effects, alternate effects from single effects from single the yeast genome; couple binding of behavior of RNA determinants of binding, binding registers, mutants; these effects mutants; the sum of comparison with aptamer domain to junctions determined in a linear regression. secondary structure were not predictive of each single-mutant functional gene change in conformation through thermodynamic effects, and other double-mutant effects. effect predicted double- targets. of MS2 binding fingerprints. epistatic contributions. mutant effects. site. Has endogenous RNA binding? Ye s No o:10.1101/cshperspect.a032300 doi: Figure 4. Summary of RNA-HiTS applications. Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press RNA Studies on High-Throughput Sequencers sembly (tectoRNA), whose formation quantitatively de- atic understanding of these other effects is fundamentally pends on the conformational preferences of the integrated impossible without a “baseline” of expected occupancy of element (Jaeger and Leontis 2000; Geary et al. 2008). For sites derived from thermodynamic parameters measured in two-way junction elements, “thermodynamic fingerprints” the absence of these confounders. were constructed that represented a multidimensional read- More broadly, we anticipate other areas of applications out of the junction’s conformational behavior. Unbiased of “ systems biochemistry” approaches like RNA-HiTS. clustering of junction thermodynamic fingerprints revealed Multiple groups have used sequencing arrays to investigate classes of junctions with similar conformational preferenc- DNA–protein interactions (Nutiu et al. 2011; Boyle et al. es. Differences in conformational behavior were largely 2017), and we anticipate that extending this approach with driven by differences in secondary structure (as expected multicolor imaging capabilities will begin to unravel the from Bailor et al. 2010), but also by previously unrecognized nature of transcription factor binding cooperativity, and primary sequence attributes like mismatch identity. Con- nucleic acid binding cooperative more generally. Beyond formational classes were also useful for inferring structural transcription factors, proteins that can be engineered to behavior. Existing crystallographic conformations were as- bind diverse nucleic acid targets (e.g., Cas9, Cpf1, TALEN, sociated with specific thermodynamic fingerprints, ulti- Cas13, and AGO proteins) are of special interest. The mately enabling the identification of structural ensembles potential binding targets of each of these programmable describing the flexible and dynamic behavior of junctions proteins span an immense combinatorial space, making with similar thermodynamic fingerprints. These “boot- high-throughput and quantitative investigations a neces- strapped” ensembles, when combined with ensembles for sary step in characterizing their behavior for applied and RNA helices and a statistical mechanics model for tectoR- therapeutic purposes. NA assembly formation, were predictive of measured ther- With methods such as RNA-HiTS, the biochemical modynamic fingerprints and outperformed the use of static and biophysical realms have been brought into the high- structures for predicting these behaviors. throughput era. Now, we as RNA biologists must direct the immense bandwidth afforded by these approaches to “carve nature at its joints.” Thus, a significant challenge remains in 4 CONCLUSION designing and engineering RNA libraries to provide maxi- Initial applications of the RNA-HiTS platform focused mal insight into the physical underpinnings of complex on dissecting the sequence and structural determinants RNA behaviors. of RNA–protein interactions. Collectively, these studies allowed accurate quantification of the subtle contributions to binding affinity of primary sequence effects, alternate REFERENCES binding registers, secondary structure effects, and other epistatic relationships between residues not directly involved Aviv T, Lin Z, Lau S, Rendl LM, Sicheri F, Smibert CA. 2003. The RNA- binding SAM domain of Smaug defines a new family of post-transcrip- in base-pairing (Fig. 4). The quantitative contributions of tional regulators. Nat Struct Biol 10: 614–621. each of these elements were identified in the case of the GFP Aviv T, Lin Z, Ben-Ari G, Smibert CA, Sicheri F. 2006. Sequence-specific aptamer and Vts1; and these contributions were further recognition of RNA hairpins by the SAM domain of Vts1p. Nat Struct Mol Biol 13: 168–176. modeled for any arbitrary sequence in the cases of MS2, Bailor MH, Sun X, Al-Hashimi HM. 2010. Topology links RNA second- NELF-E, and PUM2. In the future, diverse other protein– ary structure with global conformation, dynamics, and adaptation. RNA interactions may be interrogated in such a system, Science 327: 202–206. Bardwell VJ, Wickens M. 1990. Purification of RNA and RNA–protein allowing a more complete mapping of the diverse physical fi – complexes by an R17 coat protein af nity method. Nucleic Acids Res mechanisms that determine RNA protein interactions. 18: 6587–6594. RNA-HiTS data can complement in vivo cross-linking Bertrand E, Chartrand P, Schaefer M, Shenoy SM, Singer RH, Long RM. and immunoprecipitation experiments, as seen in the cases 1998. Localization of ASH1 mRNA particles in living yeast. Mol Cell 2: 437–445. of Vts1 and Pum2, by identifying likely false-positive tar- Boyle EA, Andreasson JOL, Chircus LM, Sternberg SH, Wu MJ, Guegler gets, as well as binding sites in lowly expressed transcripts CK, Doudna JA, Greenleaf WJ. 2017. High-throughput biochemical that are not able to be probed with cross-linking methods. profiling reveals sequence determinants of dCas9 off-target binding 114: – The manner by which the in vitro occupancy predicted and unbinding. Proc Natl Acad Sci 5461 5466. Buenrostro JD, Araya CL, Chircus LM, Layton CJ, Chang HY, Snyder MP, by RNA-HiTS measurements is affected by additional ef- Greenleaf WJ. 2014. Quantitative analysis of RNA–protein interactions fects in vivo—such as RNA localization and sequestration, on a massively parallel array reveals biophysical and evolutionary landscapes. Nat Biotechnol 32: 562–568. cooperativity with other RNA binding proteins, as well as – “ ” — Butter F, Scheibe M, Mörl M, Mann M. 2009. Unbiased RNA protein helicases that might periodically reset the binding state interaction screen by quantitative proteomics. Proc Natl Acad Sci 106: remain to be systematically addressed. However, a system- 10626–10631.

Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 11 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press S.K. Denny and W.J. Greenleaf

Castello A, Fischer B, Eichelbaum K, Horos R, Beckmann BM, Strein C, Martin L, Meier M, Lyons SM, Sit RV, Marzluff WF, Quake SR, Chang Davey NE, Humphreys DT, Preiss T, Steinmetz LM, et al. 2012. Insights HY. 2012. Systematic reconstruction of RNA functional motifs with into RNA biology from an atlas of mammalian mRNA-binding pro- high-throughput microfluidics. Nat Methods 9: 1192–1194. teins. Cell 149: 1393–1406. Matzas M, Stähler PF, Kefer N, Siebelt N, Boisguérin V, Leonard JT, Keller Cirino PC, Mayer KM, Umeno D. 2003. Generating mutant libraries A, Stähler CF, Häberle P, Gharizadeh B, et al. 2010. High-fidelity gene using error-prone PCR. Methods Mol Biol 231: 3–9. synthesis by retrieval of sequence-verified DNA identified using high- Denny SK, Bisaria N, Yesselman JD, Das R, Herschlag D, Greenleaf WJ. throughput pyrosequencing. Nat Biotechnol 28: 1291–1294. 2018. High-throughput investigation of diverse junction elements in Miao Z, Westhof E. 2017. RNA structure: Advances and assessment of 3D RNA tertiary folding. Cell 174: 377–390.e20. structure prediction. Annu Rev Biophys 46: 483–503. Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM. 2014. Mohanty BK, Sahoo T, Bastia D. 1996. The relationship between se- In vivo genome-wide profiling of RNA secondary structure reveals quence-specific termination of DNA replication and transcription. novel regulatory features. Nature 505: 696–700. EMBO J 15: 2530–2539. Ellington AD, Szostak JW. 1990. In vitro selection of RNA molecules that Moore MJ. 2005. From birth to death: The complex lives of eukaryotic bind specific ligands. Nature 346: 818–822. mRNAs. Science 309: 1514–1518. Geary C, Baudrey S, Jaeger L. 2008. Comprehensive features of natural Mulugu S, Potnis A, Shamsuzzaman, Taylor J, Alexander K, Bastia D. and in vitro selected GNRA tetraloop-binding receptors. Nucleic Acids 2001. Mechanism of termination of DNA replication of Escherichia coli – 98: Res 36: 1138–1152. involves helicase contrahelicase interaction. Proc Natl Acad Sci – Grahn E, Moss T, Helgstrand C, Fridborg K, Sundaram M, Tars K, Lago 9569 9574. 309: H, Stonehouse NJ, Davis DR, Stockley PG, et al. 2001. Structural basis Noller HF. 2005. RNA structure: Reading the ribosome. Science – of pyrimidine specificity in the MS2 RNA hairpin-coat-protein com- 1508 1514. plex. RNA 7: 1616–1627. Nutiu R, Friedman RC, Luo S, Khrebtukova I, Silva D, Li R, Zhang L, fi Greenleaf WJ, Frieda KL, Foster DAN, Woodside MT, Block SM. 2008. Schroth GP, Burge CB. 2011. Direct measurement of DNA af nity landscapes on a high-throughput sequencing instrument. Nat Biotech- Direct observation of hierarchical folding in single riboswitch ap- 29: – tamers. Science 319: 630–633. nol 659 664. Oberstrass FC, Lee A, Stefl R, Janis M, Chanfreau G, Allain FH-T. 2006. Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL. 2008. The fi Vienna RNA websuite. Nucleic Acids Res 36: W70–W74. Shape-speci c recognition in the structure of the Vts1p SAM domain with RNA. Nat Struct Mol Biol 13: 160–167. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp A-C, Munschauer M, et al. 2010. Pagano JM, Kwak H, Waters CT, Sprouse RO, White BS, Ozer A, Szeto K, Shalloway D, Craighead HG, Lis JT. 2014. Defining NELF-E RNA Transcriptome-wide identification of RNA-binding protein and mi- binding in HIV-1 and promoter-proximal pause regions. PLoS Genet croRNA target sites by PAR-CLIP. Cell 141: 129–141. 10: e1004090. Herschlag D, Allred BE, Gowrishankar S. 2015. From static to dynamic: Petrov AI, Zirbel CL, Leontis NB. 2013. Automated classification of RNA The need for structural ensembles and a predictive model of RNA 3D motifs and the RNA 3D Motif Atlas. RNA 19: 1327–1340. folding and function. Curr Opin Struct Biol 30: 125–133. Quenault T, Lithgow T, Traven A. 2011. PUF proteins: Repression, acti- Hogan DJ, Riordan DP, Gerber AP, Herschlag D, Brown PO. 2008. vation and mRNA localization. Trends Cell Biol 21: 104–112. Diverse RNA-binding proteins interact with functionally related Ramani V, Qiu R, Shendure J. 2015. High-throughput determination of sets of RNAs, suggesting an extensive regulatory system. PLoS Biol 6: RNA structure by proximity ligation. Nat Biotechnol 33: 980–984. e255. Ray D, Kazan H, Chan ET, Peña Castillo L, Chaudhry S, Talukder S, Jaeger L, Leontis NB. 2000. Tecto-RNA: One-dimensional self-assem- 39: – Blencowe BJ, Morris Q, Hughes TR. 2009. Rapid and systematic anal- bly through tertiary interactions. Angew Chem Int Ed 2521 ysis of the RNA recognition specificities of RNA-binding proteins. Nat 2524. Biotechnol 27: 667–670. Jaeger L, Westhof E, Leontis NB. 2001. TectoRNA: Modular assembly Rendl LM, Bieman MA, Smibert CA. 2008. S. cerevisiae Vts1p induces units for the construction of RNA nano-objects. Nucleic Acids Res 29: – deadenylation-dependent transcript degradation and interacts with 455 463. the Ccr4p-Pop2p-Not deadenylase complex. RNA 14: 1328–1336. fi – Jankowsky E, Harris ME. 2017. Mapping speci city landscapes of RNA Rendl LM, Bieman MA, Vari HK, Smibert CA. 2012. The eIF4E-binding protein interactions by high throughput sequencing. Methods 118– 119: – protein Eap1p functions in Vts1p-mediated transcript decay. PLoS 111 118. ONE 7: e47121. Keene JD. 2007. RNA regulons: Coordination of post-transcriptional Riordan DP, Herschlag D, Brown PO. 2011. Identification of RNA rec- 8: – events. Nat Rev Genet 533 543. ognition elements in the Saccharomyces cerevisiae transcriptome. Nu- Kieft JS, Zhou K, Jubin R, Doudna JA. 2001. Mechanism of ribosome cleic Acids Res 39: 1501–1509. 7: – recruitment by hepatitis C IRES RNA. RNA 194 206. She R, Chakravarty AK, Layton CJ, Chircus LM, Andreasson JOL, Dam- Kim CM, Smolke CD. 2017. Biomedical applications of RNA-based de- araju N, McMahon PL, Buenrostro JD, Jarosz DF, Greenleaf WJ. 2017. vices. Curr Opin Biomed Eng 4: 106–115. Comprehensive and quantitative mapping of RNA–protein interac- Lambert N, Robertson A, Jangi M, McGeary S, Sharp PA, Burge CB. 2014. tions across a transcribed eukaryotic genome. Proc Natl Acad Sci RNA Bind-n-Seq: Quantitative assessment of the sequence and 114: 3619–3624. structural binding specificity of RNA binding proteins. Mol Cell 54: Shui B, Ozer A, Zipfel W, Sahu N, Singh A, Lis JT, Shi H, Kotlikoff MI. 887–900. 2012. RNA aptamers that functionally interact with green fluorescent Lee J, Kladwang W, Lee M, Cantu D, Azizyan M, Kim H, Limpaecher A, protein and its derivatives. Nucleic Acids Res 40: e39. Yoon S, Treuille A, Das R, et al. 2014. RNA design rules from a massive Solomatin SV, Greenfeld M, Chu S, Herschlag D. 2010. Multiple native open laboratory. Proc Natl Acad Sci 111: 2122–2127. states reveal persistent ruggedness of an RNA folding landscape. Na- Lou T-F, Weidmann CA, Killingsworth J, Tanaka Hall TM, Goldstrohm ture 463: 681–684. AC, Campbell ZT. 2017. Integrated analysis of RNA-binding protein Staley JP, Guthrie C. 1998. Mechanical devices of the spliceosome: Mo- complexes using in vitro selection and high-throughput sequencing tors, clocks, springs, and things. Cell 92: 315–326. and sequence specificity landscapes (SEQRS). Methods 118–119: 171– Subtelny AO, Eichhorn SW, Chen GR, Sive H, Bartel DP. 2014. Poly(A)- 181. tail profiling reveals an embryonic switch in translational control. Na- Lu G, Hall TMT. 2011. Alternate modes of cognate RNA recognition by ture 508: 66–71. human PUMILIO proteins. Structure 19: 361–367. Tinoco I, Bustamante C. 1999. How RNA folds. J Mol Biol 293: 271–281.

12 Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press RNA Studies on High-Throughput Sequencers

Tome JM, Ozer A, Pagano JM, Gheba D, Schroth GP, Lis JT. 2014. Com- Robust transcriptome-wide discovery of RNA-binding protein binding prehensive analysis of RNA–protein interactions by high-throughput sites with enhanced CLIP (eCLIP). Nat Methods 13: 508–514. sequencing-RNA afﬁnity proﬁling. Nat Methods 11: 683–688. Wang X, Zamore PD, Hall TM. 2001. Crystal structure of a Pumilio Tucker BJ, Breaker RR. 2005. Riboswitches as versatile gene control ele- homology domain. Mol Cell 7: 855–865. ments. Curr Opin Struct Biol 15: 342–348. Wickens M, Bernstein DS, Kimble J, Parker R. 2002. A PUF family Turner DH, Sugimoto N, Freier SM. 1988. RNA structure prediction. Ann portrait: 3′UTR regulation as a way of life. Trends Genet 18: 150– Rev Biophys Biophys Chem 17: 167–192. 157. Uemura S, Aitken CE, Korlach J, Flusberg BA, Turner SW, Puglisi JD. Winkler WC, Breaker RR. 2005. Regulation of bacterial gene expression 2010. Real-time tRNA transit on single translating ribosomes at codon by riboswitches. Annu Rev Microbiol 59: 487–517. resolution. Nature 464: 1012–1017. Yesselman JD, Denny SK, Bisaria N, Herschlag D, Greenleaf WJ, Das R. Van Nostrand EL, Pratt GA, Shishkin AA, Gelboin-Burkhart C, Fang MY, 2018. RNA tertiary structure energetics predicted by an ensemble Sundararaman B, Blue SM, Nguyen TB, Surka C, Elkins K, et al. 2016. model of the RNA double helix. bioRxiv doi: 10.1101/341107.

Cite this article as Cold Spring Harb Perspect Biol doi: 10.1101/cshperspect.a032300 13 Downloaded from http://cshperspectives.cshlp.org/ on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press

Linking RNA Sequence, Structure, and Function on Massively Parallel High-Throughput Sequencers

Sarah K. Denny and William J. Greenleaf

Cold Spring Harb Perspect Biol published online October 15, 2018

Subject Collection RNA Worlds

Alternate RNA Structures Structural Biology of Telomerase Marie Teng-Pei Wu and Victoria D'Souza Yaqiang Wang, Lukas Susac and Juli Feigon Approaches for Understanding the Mechanisms Structural Insights into Nuclear pre-mRNA of Long Noncoding RNA Regulation of Gene Splicing in Higher Eukaryotes Expression Berthold Kastner, Cindy L. Will, Holger Stark, et al. Patrick McDonel and Mitchell Guttman Principles and Practices of Hybridization Capture What Are 3′ UTRs Doing? Experiments to Study Long Noncoding RNAs That Christine Mayr Act on Chromatin Matthew D. Simon and Martin Machyna Linking RNA Sequence, Structure, and Function Single-Molecule Analysis of Reverse on Massively Parallel High-Throughput Transcriptase Enzymes Sequencers Linnea I. Jansson and Michael D. Stone Sarah K. Denny and William J. Greenleaf Extensions, Extra Factors, and Extreme CRISPR Tools for Systematic Studies of RNA Complexity: Ribosomal Structures Provide Regulation Insights into Eukaryotic Translation Jesse Engreitz, Omar Abudayyeh, Jonathan Melanie Weisser and Nenad Ban Gootenberg, et al. Nascent RNA and the Coordination of Splicing Relating Structure and Dynamics in RNA Biology with Transcription Kevin P. Larsen, Junhong Choi, Arjun Prabhakar, et Karla M. Neugebauer al. Combining Mass Spectrometry (MS) and Nuclear Beyond DNA and RNA: The Expanding Toolbox of Magnetic Resonance (NMR) Spectroscopy for Synthetic Genetics Integrative Structural Biology of Protein−RNA Alexander I. Taylor, Gillian Houlihan and Philipp Complexes Holliger Alexander Leitner, Georg Dorn and Frédéric H.-T. Allain For additional articles in this collection, see http://cshperspectives.cshlp.org/cgi/collection/

Discovering and Mapping the Modified Structural Basis of Nuclear pre-mRNA Splicing: Nucleotides That Comprise the Epitranscriptome Lessons from Yeast of mRNA Clemens Plaschka, Andrew J. Newman and Kiyoshi Bastian Linder and Samie R. Jaffrey Nagai

For additional articles in this collection, see http://cshperspectives.cshlp.org/cgi/collection/