technology feature

Tool box
Getting structure
Box 1: Long noncoding RNAs regulating genes
Box 2: Possible roles for long noncoding RNA

Long noncoding RNAs: the search for function

Monya Baker

Transcripts are easy to find: sorting out what they do is a challenge.

In the late 1990s, before the publication of the human genome, John Rinn, then at Yale University, was hunting for protein genes on chromosome 22 with his graduate student adviser Michael Snyder. The only genes they found were ones that had already been discovered, but their arrays identified a steady stream of transcribed regions with no apparent purpose. These long noncoding RNAs (lncRNAs) came from genome regions that were known to lack protein genes. The transcripts also lacked open reading frames and other properties necessary for them to be translated into protein.

Most scientists at the time dismissed such transcripts as noise, but Rinn kept doing experiments. “I started cloning them,” he recalls, “and I realized that if I could clone them, they must be stable.” And if they were stable, he thought, perhaps they were functional, too. In 2004, Rinn took a postdoctoral position with Howard Chang at Stanford University, after the two came up with a scheme to learn what, if anything, these mysterious transcripts were doing.

This work led eventually to the discovery of a noncoding RNA they named HOTAIR (ref. 1). This 2.2-kilobase spliced RNA transcript interacts with the protein complex polycomb to modify chromatin and repress transcription of the human HOX genes, which regulate development. How exactly it does so is still unclear.

What is clear is that HOTAIR is just one of thousands of lncRNAs. Although less than 2% of a mammalian genome codes for protein, studies consistently show that half or even more of the genome is transcribed. Partly because suitable research tools are in their infancy, scientists are only beginning to uncover the functions of these transcripts (Box 1).

Long noncoding RNAs (lncRNAs) could be a byproduct of transcription (i), a scaffold linking proteins (ii) or a guide bringing proteins to specified parts of the genome (iii). The same lncRNA can function simultaneously as a scaffold and a guide. POL II, RNA polymerase II. (Credit: J. Rinn)

Function focus

Everyone has favorite analogies for how lncRNAs might function. Chang recently showed that HOTAIR serves as a ‘modular scaffold’, assembling a molecular cargo of specific combinations of enzymes that are equipped to regulate target genes (ref. 2). Rinn likens some of these scaffolds to an air traffic controller, guiding regulatory machinery to the appropriate spots in the genome. His work has shown that hundreds of lncRNAs are physically associated with polycomb and other chromatin-modifying complexes (ref. 3). That, he says, would explain why the same protein complexes act on different sequences in different cells. Other researchers have suggested an effector component: lncRNA binds to a protein, changing its structure and activating it (ref. 4). Tom Cech, a scientist at the University of Colorado at Boulder who won the Nobel Prize for his work on RNA, believes that each of these mechanisms and more may be in play.

The possibilities seem endless (Box 2). Some lncRNAs may even enhance transcription through chromosome looping or other means (refs. 5,6). “I don’t know why people think that lncRNAs are all doing one thing,” says Rinn. “They are just new types of genes, and their repertoire of functions I think will rival the proteome.”

© 2011 Nature America, Inc. All rights reserved.

nature methods | VOL.8 NO.5 | MAY 2011 | 379 technology feature

BOX 1 LONG NONCODING RNAs REGULATING GENES

Although researchers are still struggling to uncover the mechanisms and protein partners of long noncoding RNAs (lncRNAs), evidence of their importance in basic biology is pouring in.

Last year, John Rinn at the Broad Institute and colleagues used a specially designed microarray to test the expression of about 900 lncRNAs in human fibroblasts and pluripotent stem cells. Just over 100 lncRNAs were induced when fibroblasts were reprogrammed to pluripotency; a similar number of lncRNAs were repressed. The team focused on a couple of dozen lncRNAs that were more upregulated in induced pluripotent stem cells than in embryonic stem cells, reasoning that these would be particularly important for reprogramming. Previously published data indicated that Oct4, an essential transcription factor for pluripotency, bound sites in the genome identified as encoding lncRNAs. Gain- and loss-of-function experiments showed that at least one of these, called lincRNA-RoR, for long intergenic noncoding RNA and regulator of reprogramming, was essential for a variety of functions, including reprogramming as well as modulating genes known to respond to oxidative stress, DNA damage and p53, a protein that regulates the cell cycle and is implicated in about half of all human cancers (ref. 12).

Another set of experiments hinted at extensive regulatory systems featuring lncRNAs. Several lncRNAs were found to be regulated by p53; one transcript, lincRNA-p21, binds a protein known as heterogeneous nuclear ribonucleoprotein K (hnRNP-K). Almost 600 genes are affected in common by all three: p53, lincRNA-p21 and hnRNP-K (ref. 13).

This March, Howard Chang at Stanford University described how a lncRNA called HOTTIP can help to activate the transcription of several HOXA genes in vivo. The molecule is transcribed from the tip of the HOXA locus, where it binds adaptor proteins and sets nearby chromatin marks to drive transcription. When HOTTIP was knocked down, the proteins MLL1 and WDR5 were not observed on the transcription start sites of 5′ HOXA genes as they normally are. However, these effects could not be rescued by expressing HOTTIP from another region on the genome, indicating that the lncRNA must work from the chromosome on which it is transcribed. Indeed, chromosome conformation capture techniques have indicated that chromatin looping brings the RNA into close proximity to the activated genes (ref. 3).

Because their functions are difficult to study, long noncoding RNAs are generally classified by their origins. They seem to come from everywhere in the genome: the noncoding side of genes, alongside and between the protein-coding regions and, especially, the long stretches in genomes where no protein-coding genes are thought to exist at all. Noncoding transcripts are traditionally classified as long at around 200 nucleotides, an arbitrary distinction based on RNA purification technologies. Most are thousands of nucleotides long.

Nowadays, new putative lncRNAs are generally identified by an RNA sequencing technique called RNA-seq, which uses high-throughput sequencing to profile cell transcripts. Chromatin analysis also helps. Work by Rinn and others showed that histone methylation patterns that are characteristic of transcribed protein genes also apply to lncRNAs, and resulting ‘chromatin-state maps’ have now been used to flag thousands of putative lncRNAs (ref. 6).

So many lncRNAs are being identified that it is difficult to know what to study in depth, says John Mattick, a genome biologist at the University of Queensland. His tactic is to use microarray data to find transcripts with the greatest change in expression between tissues; differences of more than 20-fold are not uncommon, he says. “When you have a mountain of things to look at, you just want the ones that are sticking well above the pack.” And though the work is still in early stages, it seems to be paying off. Knocking down or ectopically expressing these transcripts changes cells’ phenotypes for about half of the cases he has investigated, he says.

Still, the debate over just what proportion of all noncoding RNAs are functional is surprisingly fierce. “It is difficult to discriminate functional transcripts from those that may be byproducts of other processes,” says Tim Hughes, a genome biologist at the University of Toronto, “but many transcripts that come from intergenic regions are starting to look like real signals. They show up relatively consistently in different experiments, contain splice junctions and are present in high numbers,” says Hughes. Still, he advocates caution. Differential expression could be explained by activity in nearby protein-coding genes, for example. “Higher abundance presumably increases the likelihood that a transcript is functional, but it’s not really proof,” he says. “Ultimately we have to go in and do experiments to demonstrate that things have function.”

Advances in uncovering the function of lncRNAs could be self-reinforcing, says Chang. Standard approaches often neglect noncoding genes. Whole-exon capture used in disease association studies, for example, restricts sequencing to regions that code for proteins. And when researchers find that a transcription factor binds to an intergenic region, their first instinct is often to investigate the nearest ‘protein gene’. “The knowledge that there are long noncoding RNA genes could change someone’s strategy,” says Chang.

Tool box

As more researchers begin to investigate lncRNAs, there is a greater need to annotate them so that researchers know what to look for, know if they find something new and know how to name what they find. In general, microarrays cannot distinguish between different forms of a transcript, and sequencing often indicates multiple ‘isoforms’ of a transcript without indicating which is the most biologically relevant. “Protein-coding RNA has been studied for long enough that most major forms of the transcripts are known, but lncRNAs are too new for that,” says Rinn. Right now, he says, it is not always clear where a lncRNA gene starts or stops.

RNA FISH probes (green) that bind the lncRNA Xist (right) can be displaced using specially designed sequences of locked nucleic acids (left). (Credit: K. Sarma, J. Lee lab)
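Mattick's tactic of picking transcripts with the largest expression change between tissues can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `rank_by_fold_change`, the transcript and tissue names, the intensity values and the pseudocount are hypothetical, and only the 20-fold figure comes from the article. It is not his group's actual pipeline.

```python
def rank_by_fold_change(expression, min_fold=20.0, pseudocount=1.0):
    """Rank transcripts by their largest between-tissue fold change.

    expression: dict mapping transcript name -> dict of tissue -> intensity.
    A pseudocount (an illustrative choice) keeps near-zero intensities
    from producing unbounded ratios.
    """
    ranked = []
    for transcript, by_tissue in expression.items():
        values = [v + pseudocount for v in by_tissue.values()]
        fold = max(values) / min(values)  # highest tissue vs. lowest tissue
        if fold >= min_fold:
            ranked.append((transcript, fold))
    # Largest fold change first: the ones "sticking well above the pack".
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical microarray intensities for three made-up transcripts.
example = {
    "lnc_A": {"brain": 400.0, "liver": 9.0, "heart": 12.0},
    "lnc_B": {"brain": 50.0, "liver": 45.0, "heart": 60.0},
    "lnc_C": {"brain": 3.0, "liver": 250.0, "heart": 7.0},
}
top = rank_by_fold_change(example)
for name, fold in top:
    print(f"{name}: {fold:.1f}-fold")
```

As Hughes cautions in the text, a large fold change only prioritizes candidates; it does not show that a transcript is functional.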


BOX 2 POSSIBLE ROLES FOR LONG NONCODING RNA

Junk. Sloppy machinery means some sequences are transcribed unnecessarily.

Byproduct. The act of transcription may help to prepare the genome for future transcripts or open the DNA to activate nearby genes.

Scaffold. Various protein complexes may need to work in unique combinations; noncoding RNA keeps the proteins together.

Guide. Noncoding RNA may guide complexes to the right spots in the genome.

Effector. An intimate collaboration of RNA and protein allows a protein to modify chromatin or otherwise regulate gene expression.

Enhancer or activator. The promoters of some protein genes may get a boost from noncoding RNAs.

lncRNAs allow proteins to regulate different genes in different cells, says John Rinn, of the Broad Institute. (Credit: Members of J. Rinn laboratory)

Several cataloguing and annotation efforts are underway. Early this year, Mattick established a database (http://www.lncrnadb.org/) especially for long noncoding RNAs backed by experimental data. Entries are manually curated from the research literature and linked to the University of California Santa Cruz Genome Browser and Noncoding RNA Expression database (ref. 7). FANTOM (functional annotation of the mammalian genome), a large international consortium led by scientists at the RIKEN Yokohama Institute, has documented tens of thousands of noncoding RNA transcripts in mouse tissues at several stages of development.

The Havana team of the ENCODE (Encyclopedia of DNA elements) consortium is manually annotating lncRNAs in the human genome. The field is so new that just naming the transcripts can be difficult, says Jennifer Harrow at the Wellcome Trust Sanger Institute, who coordinates the Havana team. They use a combination of RNA-seq data, chromatin-state maps and computer algorithms to identify lncRNAs, but there is much more work to be done, Harrow says. Transcriptional evidence alone cannot show what a lncRNA is doing.

Meanwhile, experimental tools designed for other applications are being extended to lncRNAs. Companies such as Active Motif and Millipore sell RNA immunoprecipitation kits to purify proteins and to identify bound lncRNAs in ribonucleoprotein complexes. Many lncRNAs are now represented on microarrays from Agilent and Life Technologies. Life Technologies also sells TaqMan quantitative PCR (qPCR) assays to precisely evaluate the expression of certain lncRNAs. The RNAi (RNA


Figure (image not reproduced): panels ASH1-E1 and URE2; tracks V1, S1 and PARS plotted against nucleotide position; legend: paired, unpaired. (Credit: Howard Chang)

Parallel analysis of RNA structure (PARS) sequences fragments produced by nucleases to simultaneously identify the structures of many long RNA transcripts.
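The V1, S1 and PARS tracks named in the figure suggest how per-nucleotide read counts from the two nuclease digests might be combined. The sketch below is a minimal illustration under stated assumptions, not the published PARS pipeline: it assumes V1 reads mark paired or stacked positions and S1 reads mark unpaired ones, and it combines them as a log2 ratio. The function names, read counts, pseudocount and threshold are all hypothetical.

```python
import math


def pars_scores(v1_reads, s1_reads, pseudocount=1.0):
    """Per-nucleotide log2 ratio of V1 (assumed paired) to S1 (assumed unpaired) reads.

    Positive scores lean toward paired, negative toward unpaired.
    The pseudocount is an illustrative choice that avoids division by zero.
    """
    if len(v1_reads) != len(s1_reads):
        raise ValueError("V1 and S1 tracks must cover the same positions")
    return [
        math.log2((v + pseudocount) / (s + pseudocount))
        for v, s in zip(v1_reads, s1_reads)
    ]


def call_structure(scores, threshold=1.0):
    """Label each position; the threshold is hypothetical, not from the article."""
    calls = []
    for score in scores:
        if score >= threshold:
            calls.append("paired")
        elif score <= -threshold:
            calls.append("unpaired")
        else:
            calls.append("ambiguous")
    return calls


# Made-up read counts for five consecutive nucleotides.
v1 = [30, 28, 2, 1, 10]
s1 = [3, 1, 25, 31, 10]
scores = pars_scores(v1, s1)
calls = call_structure(scores)
for position, (score, call) in enumerate(zip(scores, calls), start=1):
    print(position, round(score, 2), call)
```

Positions with similar counts from both nucleases are left as ambiguous here rather than forced into a call, which reflects the noise that real digestion data would be expected to carry.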

interference) Consortium maintains a reference set of RNA sequences for knocking down lncRNAs. These tools are useful, but more are needed, says Jeannie Lee, who studies noncoding RNA at Harvard Medical School. For example, the machinery for knocking down RNA is mostly in the cytoplasm, but many lncRNAs are in the nucleus, a fact that makes loss-of-function experiments inefficient.

High on scientists’ wish lists are techniques that can be used to identify both the genome regions and the proteins with which lncRNAs interact. Long noncoding RNAs do not always undergo canonical base-pairing, so the sequence of a transcript yields few clues about how it interacts with the genome. In December 2010, Lee and scientists from Harvard University and reagents company Exiqon showed that non-natural nucleic acids could be used to watch how Xist, a 17-kilobase lncRNA and one of the first discovered, interacts with the X chromosome. Her team used locked nucleic acids that were complementary to two sections of Xist, then used fluorescent probes to observe how the lncRNA and associated proteins disassociated and reassociated with the genome (ref. 8). Though Xist is an unusual lncRNA (it coats most of the inactive X chromosome), Lee believes the technique will be generally applicable. In fact, she says, as most lncRNAs are smaller than Xist, the search for disruptive sequences might be easier and less expensive.

Lee has also described a genome-wide technique for identifying lncRNAs bound to particular proteins (ref. 9). In chromatin immunoprecipitation–sequencing (ChIP-seq) assays, researchers use antibodies to pull transcription factors from cell lysates, then wash away unbound material and analyze the bound DNA to learn where transcription factors bind on the genome. RNA immunoprecipitation followed by sequencing (RIP-seq) exploits a similar idea, but instead uses antibodies to pull ribonucleoproteins from cell lysates and determines which RNA molecules are associated with them. Lee used the technique to identify more than 9,000 lncRNAs that interact with the polycomb complex in embryonic stem cells. Getting the system to work required a lot of optimization, says Lee. Her team had to try several batches of antibodies from several vendors before finding one with sufficiently high affinity and specificity. Even with a good antibody, she says, data are inherently noisy, making high-quality controls particularly important. “Your pulldowns won’t mean a thing unless you have something to compare them to,” she says.

Figure (pie chart, image not reproduced): total noncoding RNA genes = 16,592. Legend: lincRNA, snoRNA, antisense, miscRNA, snRNA, uncategorized processed transcript, microRNA, other. Slice values: 932; 933; 1,187; 5,089; 1,521; 1,756; 3,230; 1,944 (the assignment of values to categories is not recoverable from the text). (Credit: J. Harrow, Havana project, preliminary Gencode 7 data)

Long noncoding RNAs are just one of many noncoding transcripts being annotated. lincRNA, long intergenic noncoding RNA; snRNA, small nuclear RNA; snoRNA, small nucleolar RNA; and miscRNA, miscellaneous RNA.

Getting structure

Figuring out what precisely lncRNAs are doing requires more than the identification of a transcript’s protein partners, says Tom Cech, of the University of Colorado: “Even for the most well-studied noncoding RNAs, the field is still grappling with the question of what are the relevant RNA structures.” Computational tools are often used to assess the structures of smaller RNAs, but lncRNAs represent a more difficult challenge.

Unlike protein-coding genes and shorter RNAs, lncRNAs are poorly conserved between species, which makes it harder to translate results from one organism to another and also lends an additional degree of uncertainty about whether a given lncRNA is functional. Many researchers suspect that although sequences are not well conserved, the structures they form may well be. If this is true, structural information could provide a more meaningful way to classify lncRNAs. With enough data, structures might become reliable indicators of function, and so guide researchers toward better follow-up experiments.

Not surprisingly, several researchers are working on the problem. Together with Eran Segal at the Weizmann Institute of Science in Israel, Chang recently described


a technique that uses high-throughput sequencing to assess the structure of the lncRNAs in an entire yeast transcriptome (ref. 10). In the technique, called parallel analysis of RNA structure (PARS), transcripts are first digested with a set of two nucleases that cleave RNA in certain single-stranded or stacked-base conformations; the digested fragments are then sequenced and used to determine which sections of the RNA exist in single-stranded and other conformations. Chang is currently working out ways to use PARS in living cells, the better to compare the lncRNA transcriptome under different conditions.

Separately, a team of researchers from the University of California Santa Cruz published a similar technique called Frag-seq, for fragmentation sequencing, using only one nuclease (ref. 11). Analysis of digested fragments over the entire mouse transcriptome successfully mapped single-stranded regions in multiple ncRNAs whose structures are known.

Using sequencing to identify structure is far more challenging than using it to identify transcripts, says Chang. Even simple things could make a big improvement. For example, says Cech, a nuclease that precisely cuts only double-stranded RNA could make for more accurate analysis.

These genome-wide approaches are useful, says Cech. But he suspects that before functional classes can be definitively assigned, more details will need to be carefully worked out for a few systems. What we need, he says, are “multiple examples that can be taken down to the structural and mechanistic level so that we have the same sort of understanding as we do for transcription and RNA splicing and translation and other cellular processes. Until we drill down to that level, we don’t have much understanding.”

Howard Chang of Stanford University believes that understanding the structure of long noncoding RNA could reveal much about its function. (Credit: Mark Yamaguma, Stanford University)

1. Rinn, J.L. et al. Cell 129, 1311–1323 (2007).
2. Tsai, M.C. et al. Science 329, 689–693 (2010).
3. Khalil, A.M. et al. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
4. Wang, X., Song, X., Glass, C.K. & Rosenfeld, M.G. Cold Spring Harb. Perspect. Biol. 3, a003756 (2011).
5. Ørom, U.A. et al. Cell 143, 46–57 (2010).
6. Guttman, M. et al. Nature 458, 129–140 (2009).
7. Amaral, P.P., Clark, M.B., Gascoigne, D.K., Dinger, M.E. & Mattick, J.S. Nucleic Acids Res. 39, D146–D151 (2011).
8. Sarma, K., Levasseur, A., Aristarkhov, A. & Lee, J.T. Proc. Natl. Acad. Sci. USA 107, 22196–22201 (2010).
9. Zhao, J. et al. Mol. Cell 40, 939–953 (2010).
10. Kertesz, M. et al. Nature 467, 103–107 (2010).
11. Underwood, J.G. et al. Nat. Methods 7, 995–1001 (2010).
12. Loewer, S. et al. Nat. Genet. 42, 1113–1117 (2010).
13. Huarte, M. et al. Cell 142, 409–419 (2010).

Jeannie Lee at Harvard Medical School is characterizing Xist, one of the first long noncoding RNAs to have been discovered.

Monya Baker is technology editor for Nature and Nature Methods ([email protected]).
