INVESTIGATION

UP-TORR: Online Tool for Accurate and Up-to-Date Annotation of RNAi Reagents

Yanhui Hu,*,† Charles Roesel,†,‡ Ian Flockhart,*,† Lizabeth Perkins,*,† Norbert Perrimon,*,§ and Stephanie E. Mohr*,†,1 *Department of Genetics and † RNAi Screening Center, Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, ‡ Program, Northeastern University, Boston, Massachusetts 02115, and §Howard Hughes Medical Institute, Boston, Massachusetts 02115

ABSTRACT RNA interference (RNAi) is a widely adopted tool for loss-of-function studies but RNAi results only have biological relevance if the reagents are appropriately mapped to genes. Several groups have designed and generated RNAi reagent libraries for studies in cells or in vivo for Drosophila and other species. At first glance, matching RNAi reagents to genes appears to be a simple problem, as each reagent is typically designed to target a single gene. In practice, however, the reagent–gene relationship is complex. Although the sequences of oligonucleotides used to generate most types of RNAi reagents are static, the reference genome and gene annotations are regularly updated. Thus, at the time a researcher chooses an RNAi reagent or analyzes RNAi data, the most current interpretation of the RNAi reagent–gene relationship, as well as related information regarding specificity (e.g., predicted off-target effects), can be different from the original interpretation. Here, we describe a set of strategies and an accompanying online tool, UP-TORR (for Updated Targets of RNAi Reagents; www.flyrnai.org/up-torr), useful for accurate and up-to-date annotation of cell-based and in vivo RNAi reagents. Importantly, UP-TORR automatically synchronizes with gene annotations daily, retrieving the most current information available, and for Drosophila, also synchronizes with the major reagent collections. Thus, UP-TORR allows users to choose the most appropriate RNAi reagents at the onset of a study, as well as to perform the most appropriate analyses of results of RNAi- based studies.

NA interference (RNAi) is an effective tool to study gene icant need to keep RNAi reagent annotations up to date with Rfunction. In particular, genome-scale RNAi screens in new genome assemblies and gene annotations. mammalianandDrosophilaculturedcells,aswellasinvivo A large number of cell-based RNAi screens have been in Drosophila and Caenorhabditis elegans, have made contri- performed using various genome-scale RNAi reagent libraries butions to a number of areas of study (Kamath et al. 2003; (Mohr et al. 2010). RNAi reagents for Drosophila cells are Dietzl et al. 2007; Boutros and Ahringer 2008; Mohr et al. usually long (100–500 bp) double-stranded RNAs (dsRNAs) 2010; Perrimon et al. 2010; Qu et al. 2011; Mohr and Perrimon made by PCR using a genomic or cDNA template, followed by 2012). RNAi screening is dependent not only on the avail- in vitro transcription. In the cell, dsRNAs are processed by ability of RNAi reagents but also on accurate information the endogenous RNAi machinery, generating active RNAi regarding the predicted gene targets of the reagents. reagents, i.e., small dsRNA segments typically 20–22 bp Large-scale RNAi libraries are available for a number of in length with a 2-bp 39 overhang (Clemens et al. 2000; model systems. Although different types of RNAi reagents Hammond et al. 2000). In Drosophila, dsRNAs can be easily are used in different systems, there is a common and signif- introduced into cultured cells (Clemens et al. 2000; Hammond et al. 2000). Several large-scale facilities, including the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, Boutros lab at German Cancer Research Cen- Copyright © 2013 by the Genetics Society of America ter (DKFZ), RNAi Core at New York University, and Sheffield doi: 10.1534/genetics.113.151340 Manuscript received March 21, 2013; accepted for publication May 24, 2013 RNAi Screening Facility (SRSF), support Drosophila cell- Supporting information is available online at http://www.genetics.org/lookup/suppl/ based RNAi screening and offer genome-wide libraries with doi:10.1534/genetics.113.151340/-/DC1. 1Corresponding author: New Research Building Room 336, 77 Avenue Louis Pasteur, multiple dsRNAs-per-gene coverage. For mammalian cells, Boston, MA 02115. E-mail: [email protected] RNAi screens are done using synthesized short interfering RNAs

Genetics, Vol. 195, 37–45 September 2013 37 (siRNAs), endoribonuclease-prepared short interfering RNAs RNAi reagents and endogenous sequences other than the tar- (esiRNAs), or plasmid- or viral-encoded short hairpin RNAs get (Kulkarni et al. 2006; Moffat et al. 2007). As the sequen- (shRNAs) (Root et al. 2006; Kittler et al. 2007; Micklem and ces of genes and transcripts change at each gene annotation Lorens 2007). Similar to Drosophila cell screens, mammalian release, annotation of potential OTEs can also change over screens are typically performed in individual labs or in con- time. Correcting for changes is not simply a matter of keeping junction with one of several academic screening facilities that up with new gene names and synonyms. Updates can change provide automation and database support for screens. predictions as to the target gene, the number of predicted off RNAi reagents have also been developed for in vivo screens targets, isoform specificity, etc. As a result, it is critically im- in various systems. In C. elegans,RNAiissystemic,andgene portant to regularly update the annotation of RNAi reagents expression can be knocked down efficiently by feeding worms and make this information readily accessible to the research- with bacteria expressing a long dsRNA (Fraser et al. 2000). A ers who plan, execute, and analyze RNAi-based experiments. genome-scale RNAi feeding library is available (Kamath et al. Several tools are available for the design of RNAi reagents, 2003) and widely used for functional studies. For Drosophila, including SnapDragon for long dsRNAs (Flockhart et al. 2006, in vivo RNAi relies on transgenic flies carrying RNAi transgenes 2012), DSIR for siRNAs (Vert et al. 2006; Filhol et al. 2012), that can be combined with the Gal4/UAS system for develop- and E-RNAi and NEXT-RNAi (Arziman et al. 2005; Horn and mental, stage- and/or tissue-specific knockdown (Dietzl et al. Boutros 2010; Horn et al. 2010) for long dsRNAs and siRNAs. 2007). Drosophila in vivo RNAi reagents are either long dsRNA Nevertheless, a web-based tool that addresses the dynamic hairpins, for which gene fragments are cloned as an inverted nature of gene annotation has not previously been available. repeat, or short hairpins synthesized as oligonucleotides and Although E-RNAi can be used to evaluate long dsRNAs and then cloned into an expression vector (Perrimon et al. 2010). siRNAs, the reference gene information for Drosophila in E- Altogether, 90% of annotated Drosophila genes are targeted RNAi is currently out of date (FlyBase release 5.19 from July by fly RNAi collections from the Vienna Drosophila RNAi 2009). NEXT-RNAi was designed to be integrated into a back- Center (VDRC), National Institute of Genetics (NIG) RNAi end design/annotation pipeline and there is not currently an Resources in Japan, and the Transgenic RNAi Project (TRiP) openly accessible web-based user interface for the approach. at Harvard Medical School (Dietzl et al. 2007; Ni et al. 2008, In addition, NEXT-RNAi does not distinguish between RNAi 2009, 2011; Yamamoto 2010). Several large-scale transgenic reagents generated from genome DNA vs. cDNA templates, RNAi screens have been successfully performed (reviewed in a feature that is relevant to accurate annotation. Perrimon et al. 2010) and numerous in vivo Drosophila RNAi To best support community needs, the ideal tool would be projects are ongoing. based on regular, automated retrieval of new genome assem- Obtaining meaningful results from RNAi-based studies is blies and gene annotation releases. The ideal tool would also entirely reliant upon appropriate identification of the sequence- handle the dynamic nature of reagent collections via regular, specific gene target(s) of the reagent. Target identification automatic retrieval of new reagent information from major might appear to be a simple problem but this is not necessarily public resources. To meet these needs, we developed a tool the case. Even though sequences associated with RNAi reagents that allows users to query existing RNAi reagents from various are static (e.g., the sequences of oligonucleotides used to make sources based on the current gene annotation. The tool also a library do not change), the reference sequences and gene allows researchers to query up-to-date information regarding annotations, including gene boundaries, exon–intron bound- gene target using user-provided RNAi reagent sequences. aries, and nomenclature, are constantly being updated. Reeval- uations of existing RNAi libraries have shown that by the time of reanalysis, a percentage of reagents do not target any gene or Materials and Methods are no longer predicted to be specific(Hornet al. 2010; Qu Data sources et al. 2011). For a genome-wide C. elegans RNAi feeding library made available in 2003, for example, reanalysis in 2011 Reference gene information is downloaded from the follow- revealed that 18% of reagents needed to be reannotated ing sources: FlyBase for gene anno- (Qu et al. 2011). tation (ftp://ftp.flybase.net/releases/current/); WormBase for For Drosophila, FlyBase is the primary resource of integrated C. elegans gene annotation (ftp://ftp..org/pub/ genetic and genomic information, and FlyBase makes regular wormbase/releases/current-production-release/species/c_elegans/ corrections and additions to gene models (FlyBase Consortium PRJNA13758/); RefSeq for human and mouse gene annota- 2003) Since January 2008, FlyBase has released updated gene tion (ftp://ftp.ncbi.nlm.nih.gov/refseq/release/). annotations 10 times per year. Because several years can pass RNAi reagent information is queried and downloaded between the design of RNAi reagents and their use or data from the following sources: FlyRNAi database for information analysis, many new FlyBase annotations are released between regarding DRSC and TRiP reagents (http://www.flyrnai.org/); reagent design and experimental design, and even more be- GenomeRNAi ftp site for DKFZ library (http://b110-wiki. tween reagent design and data analysis. Off-target effects dkfz.de/signaling/wiki/download/attachments/917513/ (OTEs) are also relevant to the annotation of RNAi reagents. Annotation_1stPCR_fulllibrary_HD2.xls); NIG catalog for OTEs are induced by unintended cross-hybridization between NIG RNAi transgenic lines (http://www.shigen.nig.ac.jp/fly/nigfly/);

38 Y. Hu et al. VDRC catalog for VDRC RNAi transgenic lines (http://stockcenter. HTML, JavaScript, Java servlets, and Lucene. A Perl program vdrc.at/control/fullCatalogueExcel). provided as part of the JBrowse download converts annota- tions from the GFF3 format to the JBrowse format. Data annotation pipeline

The data annotation pipeline (Figure 1) includes the follow- Results ing: (1) A module for automatic retrieval of reference genes “ ” and reagent information. This module downloads informa- Reference genes are moving targets that change tion from corresponding locations daily. The annotation over time pipeline is triggered whenever there is a new release from For D. melanogaster, FlyBase is the primary resource of FlyBase or National Center for Biotechnology Information integrated genetic and genomic information, including up- (NCBI) RefSeq, and/or when any new reagents become to-date genome assemblies and gene annotations (FlyBaseCon- available. (2) A module that processes reference gene in- sortium 2003). Since the first assembly of the D. melanogaster formation for each species, respectively, to assemble a gene genome published in 2000, four subsequent genome assem- lookup table, the BLASTable database of genomic sequence blies, with the most recent one in February 2007, have oc- for virtual PCR, as well as the BLASTable database of tran- curred (Myers et al. 2000; Celniker et al. 2002; Hoskins et al. script sequences for virtual PCR of the reagents made from 2007). In addition to updates to the genome assembly, there the cDNA library and on-target/off-target gene search. (3) A have been numerous updates since 2000 to gene annotations. module that processes RNAi reagent information from each Particularly given the new availability of next-generation se- source to assemble the RNAi reagent lookup table and a quencing approaches, gene annotations continue to change, BLASTable database of RNAi reagent sequences used when for example due to the addition of newly identified genes and the end user queries UP-TORR by gene sequence. (4) A module newly identified isoforms of previously identified genes. Thus, that assembles the sequences of dsRNA reagents by in silico despite the fact that Drosophila is arguably the best annotated PCR. With the exception of the majority of NIG reagents, genome among multicellular species, our knowledge of the fly which have sequences assembled based on sequence valida- genome and proteome continues to improve. Indeed, since the tion data, long dsRNA reagents have not been sequenced availability of the fifth genome assembly (i.e., over the last 6 and thus, the most accurate information is the sequences years or so), the FlyBase consortium has released 49 updates to of PCR primers. Virtual PCR is performed with each new Drosophila gene annotations. FlyBase release for all relevant reagents using either geno- Exemplifying the extent of changes, for the gene annota- mic sequence or transcript sequence, depending on how the tion release issued on September 7, 2012, 123 genes and 578 reagents were initially generated, by BLASTing the primer protein-coding transcripts were changed relative to the pre- sequences against the corresponding BLASTable database. vious release. Moreover, the number and type of changes to (5) A module that matches the sequences of RNAi reagents gene annotations vary with each release. To obtain a more to transcript sequences by BLAST. (6) A module that sum- comprehensive picture of gene annotation changes, we looked marizes the on-target/off-target search results based on at changes to the gene annotation over the period of 1 year user-defined parameters and presents the summary table to (FlyBase version r5.34 vs. 5.44). On the gene level, 412 new the end user. (7) A module that matches the gene sequences genes were added, 12 genes were retired, and the genome submitted by the end user to reagent sequences by BLAST. (8) location of 2287 genes was changed. On the transcript level, A module that aligns reagents to genomic sequences of refer- 3407 new transcripts were added, 833 transcripts were retired, ence genes and reformats the information about the reference and the specific sequences of 2902 transcripts were changed. gene and RNAi reagents into the Generic Feature Format 3 Thus, for a Drosophila RNAi reagent designed at the beginning of (GFF3) for upload to JBrowse, facilitating visual display of this period, there is an 30% chance that the sequence of the gene/reference alignment to the end user. gene target had changed a year later. Given that the time from RNAi reagent design to availability of the reagent for experiments Software can be months, and the practical reality that many RNAi reagents The BLAST program from NCBI (Altschul et al. 1990) is are put to use several years after they were designed, these among the research applications already installed on the changes have a significant impact on RNAi reagent annotation. Orchestra platform at Harvard Medical School. The BLAST Notably, gene annotation changes can affect not just the on- parameters for virtual PCR: -W 10 -e 1 -G 5 -E 2; cutoff target predictions for a given RNAi reagent but also the number of for virtual PCR: 100% identity; BLAST parameters for on- predictedOTEsassociatedwithagivenreagentand/orwhether target/off-target searches: -W 14 -e 10 -G 5 -E 2 -F F; cutoff or not it is predicted to target all isoforms of the target gene. For for on-target search: 27 bp or longer with $98% identity; a summary of annotation changes in FlyBase and WormBase over cutoff for off-target search: 15-bp alignment or longer. the past 5 years, see Supporting Information, Table S1. JBrowse was downloaded from jbrowse.org/install (Skinner Dynamic annotation of RNAi reagents et al. 2009). More detailed information can be found at jbrowse.org/developer. Programs for reagent annotation were When a large amount of information is involved (in this written in Perl and the user interface was developed using case, information surrounding the sequence and targets of

UP-TORR: RNAi Reagent Annotation Tool 39 Figure 1 UP-TORR annotation pipeline. Features of the pipeline include auto- mated daily checks for new reagents and gene annotations from the relevant public sources. When updates are avail- able, the information is downloaded and used to rebuild the underlying databases and lookup tables, allowing UP-TORR to provide the most updated interpretation of the relationship between RNAi reagents and gene annotations.

RNAi reagents), the typical approach is to use a back-end then BLASTed against transcript sequences to identify the cur- database to store the information. At the DRSC, the back- rent on-target and off-target predictions. The process is similar end storage is a relational MySQL database (Flockhart et al. for in vivo long hairpin reagents generated by TRiP, except 2006, 2012) in which a couple dozen tables are used to that for these, transcript sequences are used to generate the store information regarding gene annotations associated virtual PCR product, as the template used to generate these with DRSC and TRiP RNAi reagents. Updating gene anno- was cDNA rather than genomic DNA. For the in vivo long tations as frequently as FlyBase releases updates is not trivial hairpin reagents generated by NIG, because most reagent and as a result, such databases are usually out of sync with sequences were assembled by end-to-end , for these the most current release, a situation that is acceptable for reagents we skip the virtual PCR step and go directly to BLAST- most RNAi reagents but potentially misleading for a subset ing RNAi sequences against transcript sequences. When a user of reagents for which the corresponding gene annotations enters a pair of primers for analysis, the user can specify if have changed significantly. Moreover, forever associating the genomicDNAortranscriptsequencesshouldbeusedinthe RNAi reagent with its originally intended target might bias virtual PCR step. For shRNA reagents, both the 21-bp sense- interpretation of RNAi results, even when information about strand and antistrand sequences, which originated as synthetic alternative targets is also presented. oligonucleotides, can be directly BLASTed against transcript To address this issue, we developed a new strategy and sequences (Figure 1). developed a dynamic annotation tool that is “blind” to the During the reagent “live reannotation” process, UP-TORR original target gene annotations, basing the final reports pre- is designed to answer the following questions: (1) What are sented online solely on updated information. The tool, which all of the possible gene targets? (2) Does the reagent target all we named UP-TORR for updated targets of RNAi reagents, isoforms or only some isoforms of the gene? (3) What region daily and automatically accesses the ftp sites available at of the transcript(s) does the reagent target, i.e., the 59-UTR, FlyBase, WormBase, as well as RefSeq database at NCBI and CDS, or 39-UTR? (4) Are there potential off-target genes that whenever a new release is available, retrieves all of the new share a certain level of sequence similarity? The on-target sequence and gene annotation information. Thus, at any given matches are relative to the full reagent sequence with at least time, a query of UP-TORR will generate the most updated 17 bp matches for shRNA and 27 bp perfect match for long results available. For cell-based RNAi reagents from the DRSC dsRNA, whereas off-target matches can be as short as 15 bp and DKFZ as well as in vivo long hairpin reagents generated matches. The user can specify the cutoffs at the user interface. by VDRC and Ahringer lab, PCR primer sequences are aligned Using this tool, we reannotated all the RNAi reagents to the up-to-date genome assembly sequence, generating vir- generated at DRSC, DKFZ, VDRC, NIG, and TRiP based on tual PCR products. The sequences of these PCR products are FlyBase release 5.49 (Table 1). We found that a percentage

40 Y. Hu et al. of the reagents no longer met the original design goal. For example, within the TRiP shRNA collection, 3% of reagents

were predicted at the time of our reannotation with UP-TORR target to target multiple genes. Some of these are due to high se- No gene quence similarity of the paralogous genes such as His1, His2A, His2B,andHis3 families, respectively, making it impossible to design gene-specific RNAi reagents. Additionally, the Drosoph- ilagenomeismorecompactthanthemammaliangenome, and some genes are located close to each other or fully overlap Target pseudogene on the genome as well as at the transcript level. For example, the genes cup and CG34310 are both located at 6663968– 6674780 on the + strand of chromosome 2L. Their transcripts are also identical and the only difference is the protein-coding regions (Figure 2A). In cases like this, it is impossible to design any RNAi reagent targeting one gene but not the other. An- 5 OTEs (19 bp) Reagents with other example is eIF-2gamma and Su(var)3-9. These genes . partially overlap on both the genome and transcript levels. TRiP reagent HMS00279 happened to target exons shared oject; VDRC, Vienna Drosophila RNAi Center. by the two genes; therefore, the library could be improved by targeting the regions specifictoeachgene(Figure2B).

In addition, 0.8% of reagents do not target any genes in the genes release we were testing. They aligned to introns (Figure 2C), Target multiple intergene regions (Figure 2D), or pseudogenes (Figure 2E) duetothechangesintheintron–exon boundary, gene bound- ary, or gene retirement. Our comparison of FlyBase releases (r5.34 and r5.44) shows that 3407 new transcripts were added and 833 transcripts were removed. Thus, it is more likely that a new Target 1 gene, isoform will be added than that an existing isoform will be not all isoforms retired. An RNAi reagent may fail to target all isoforms even though it was initially designed to be isoform unspecific. According to FlyBase release 5.49, 38% of fly genes have more than one isoform. We found that 90% of TRiP shRNA reagents RNAi reagent collections still target all isoforms, whereas 6% target one or a subset of all isoforms isoforms based on current isoform annotation. Some of these Target 1 gene, reagents are limited by the genes themselves, which lack exons common among all isoforms (Figure 2F), whereas others could be improved (Figure 2G) by targeting regions shared by all isoforms. Because isoforms can be expressed specifically in cer- tain tissues or under certain pathological conditions, and/or Caenorhabditis elegans might have divergent functions, providing annotation at the isoform level is important for the appropriate identification of and y 11,725 10,328 (88%) 532 (5%) 436 (4%) 416 (4%) 2 (0.02%) 427 (4%) y 2483 2255 (91%) 114 (5%) 72 (3%) 8 (0.3%) 2 (0.1%) 40 (2%) y 4132 3738 (90%) 242 (6%) 120 (3%) 0 (0%) 2 (0.1%) 30 (1%) y 21,808 18,607 (85%) 962 (4%) 1357 (6%) 745 (3%) 60 (0.3%) 822 (4%) y 10,748 9135 (85%) 431 (4%) 378 (4%) 67 (1%) 27 (0.3%) 777 (7%) fl fl fl fl RNAi reagents and interpretation of RNAi results. fl Online features of UP-TORR Drosophila To provide researchers with the most current and accurate annotation of RNAi reagents, we developed a freely accessible web-based application. To accommodate the full spectrum of community needs regarding reagent identification and live reannotation, we have provided users with five different ways to query UP-TORR. After selecting the species (Drosophila, C. elegans, mouse, or human) from the appropriate menu tab, users can (1) enter the gene-specificregionofanRNAire- agent sequence (i.e., a 19–21 bp sense/antisense strand cor- responding to a siRNA or short hairpin, or a DNA sequence DKFZDRSC-Genome LibraryDRSC-Followup LibraryNIG dsRNA, cell based dsRNA, cell based dsRNA, cell 24,037 based 9448 dsRNA, 19,615 transgenic (82%) 20,016 8519 (90%) 1664 (7%) 15,948 (80%) 470 (5%) 979 (4%) 816 (4%) 296 (3%) 1466 552 (7%) (2%) 15 (0.2%) 84 (0.4%) 394 (2%) 14 (0.2%) 1695 (7%) 78 (0.4%) 149 (2%) 1708 (9%) RNAi collection Reagent type All reagents Table 1 Summary of major public DKFZ, German Cancer Research Center; DRSC, Drosophila RNAi Screening Center; NIG, National Institute of Genetics (Japan); TRiP, Transgenic RNAi Pr TRiP-LongHairPin dsRNA, transgenic TRiP-ShortHairPin shRNA, transgenic VDRC-GD Library dsRNA, transgenic VDRC-KK Library dsRNA, transgenic corresponding to a dsRNA); (2) enter PCR primers for dsRNA, Ahringer Library dsRNA, worm feeding 16,256 11,002 (68%) 678 (4%) 1733 (11%) 1074 (7%) 283 (2%) 2843 (16%)

UP-TORR: RNAi Reagent Annotation Tool 41 42 Y. Hu et al. then choose the proper PCR template (genomic DNA or Discussion cDNA); (3) enter a list of RNAi reagent IDs (e.g.,DRSC There is a necessary passage of time between the design of amplicon ID, GenomeRNAi amplicon ID, TRiP stock ID, RNAi reagents and their use, as well as between design and NIG stock ID, VDRC transformant ID or Ahringer primer analysis of results (and later reinterpretation of RNAi data, pair ID); (4) enter a list of gene identifiers for which all such as in meta-analyses) (Horn et al. 2010; Qu et al. 2011). relevant reagents will be retrieved (e.g., FlyBase FBgn IDs, As we have presented, gene annotations change over time CG numbers, and/or gene symbols); or (5) enter the se- (Table S1), leading to changes in what the latest evidence quence to be targeted (e.g., a full-length transcript or exon suggests is the appropriate interpretation of RNAi on-target sequence). For query types 1–3, in which an RNAi reagent and off-target potential. The UP-TORR approach and accom- is the input, UP-TORR returns a summary of all of the panying freely accessible user interface make it possible for potentially targeted genes, including gene identifiers such researchers to identify RNAi reagents and/or interpret the as FlyBase FBgn number for fly and NCBI Entrez GeneID results of RNAi studies based on the most current annotation for other species, gene symbol, and gene isoform informa- available from FlyBase. Our analysis of RNAi reagents from tion,aswellastheregionandlocationofeachisoformthat all the public RNAi collections show that a small percentage is targeted. UP-TORR also reports the number of possible of RNAi reagents did not meet initial design goals upon off-target genes, which is hyperlinked to detailed informa- reannotation (i.e., they were no longer predicted to be gene tion about the genes (Figure 3). For query types 4 and 5, in specific and isoform nonspecific with regards to the intended which a target gene is the input, all of the RNAi reagents target gene). By comparing the different FlyBase releases, deemed relevant by the live reannotation are reported, we further found that the coding CDS are less likely to change along with a similar summary of information about isoform as compared with untranslated regions (59-or39-UTRs). This specificity and predicted OTEs. These search options allow likely reflectsthefactthatithashistoricallybeeneasierboth users to retrieve all the available RNAi reagents quickly computationally and experimentally to identify coding sequen- without searching individual resources. In addition, users ces than to identify full-length transcripts. can easily compare all RNAi reagents available for a given Because UP-TORR checks for updates at FlyBase, WormBase, gene and select the best one(s). There has been ongoing as well as RefSeq database daily and incorporates these new effort evaluating the efficiency of TRiP RNAi transgenic lines data, facilitating what we refer to as a live reannotation of RNAi by phenotyping and/or qPCR analysis. To help UP-TORR reagent information, the tool will be valuable to anyone users select the most efficient reagent(s), TRiP stock IDs interested in designing, analyzing, or reanalyzing RNAi results, are hyperlinked to a page that includes validation results. including results from high-throughput screens. We recognize, Withquerytype5,inadditiontofullgeneortranscript however, that results from UP-TORR or any other up-to-date sequences, users can also enter specificexonordomain comparison with the current annotation of genomes and/or sequences to identify reagents, specifically targeting the transcriptomes does not necessarily provide the “final word” on transcript region of interest. For all query types, results are RNAi on-target and off-target effects. For example, RNAi treat- hyperlinked to an instance of JBrowse, where alignment of ments can have generalized, gene nonspecificeffects(Muller the RNAi reagents with genes and transcripts is displayed et al. 2008). In addition, SNPs (Chen et al. 2009), RNA editing visually. Users also have the option to download a summary (Rodriguez et al. 2012), and chimeric transcripts (Frenkel- table of results and supporting information. Morgenstern et al. 2013) can complicate the prediction of the Finally, we note that when the output species is Drosophila on-target as well as off-target genes of RNAi reagents. Never- or C. elegans, the output page from a DRSC Integrative Ortho- theless, UP-TORR is the first tool available to address the issue log Prediction Tool (DIOPT) (flyrnai.org/diopt) or DIOPT dis- of genome annotation and RNAi sequences. Importantly, the eases and traits (DIOPT-DIST) search (flyrnai.org/diopt-dist) tool provides up-to-date annotation for RNAi reagents targeting (Hu et al. 2011) has been modified to include a button that human (Figure S1) and mouse genes, as well as for Drosophila carries the gene list forward from DIOPT or DIOPT-DIST to and C. elegans, and could easily be expanded to include UP-TORR. We expect this should help facilitate identification more species. In the future, this tool might be applied to of RNAi reagents relevant to conserved and disease-related other methods (e.g., Transcription Activator-Like Effectors genes.

Figure 2 Issues associated with RNAi reagents and annotations. (A and B) Examples of reagents that target multiple genes. (A) TRiP line GL00327 targets both cup and CG34310 because both the genome and transcript sequences of these two genes fully overlap. (B) TRiP line HMS00279 targets the common exon shared by both the eIF-2gamma and Su(var)3-9 genes. Since the transcript sequence of both of these genes only partially overlap, it is possible to design specific RNAi reagents targeting either eIF-2gamma or Su(var)3-9.(C–E) Examples of reagents that do not target any gene. (C) TRiP reagent HMS00286 aligns to the intron of gene loqs due to a change in the intron–exon junction(s) of loqs gene annotation. (D) TRiP reagent HMS01233 aligns to an intergene region due to a change in the Parp gene boundary. (E) TRiP line HMS00620 aligns to the newly annotated pseudogene CR43361. (F and G) Examples of reagents that do not target all isoforms. (F) TRiP reagent HMS00621 targets five of the eight isoforms of gene CG42724. CG42724 lacks any common exon shared by all isoforms. (G) TRiP reagent HMS01241 targets two of the four isoforms of gene qkr54B. An improved reagent can be designed against the exons shared by all four isoforms.

UP-TORR: RNAi Reagent Annotation Tool 43 Figure 3 UP-TORR user interface. At UP-TORR, the user first specifies a species (Drosophila, C. elegans, mouse, or human) by selecting the appropriate menu tab. The example shown is for fly genes (see Figure S1 for an example using human RNAi reagent sequences). (A) At the appropriate starting page, the user (1) enters the gene-specific sequence region of an RNAi reagent; (2) enters PCR primers for dsRNA and selects the relevant PCR template (genomic DNA or cDNA); (3) enters a list of RNAi reagent IDs (e.g., DRSC or TRiP IDs, NIG IDs, GenomeRNAi IDs; see Table 1); (4) enters a list of gene identifiers for which all relevant reagents will be retrieved (e.g., FlyBase FBgn IDs); or (5) enters a specific sequence (e.g., a full-length transcript or exon sequence). (B) UP-TORR outputs a table that summarizes gene and isoform specificity of the reagents and provides information about the target region, location, and alignment length. (C) Alignment results are hyperlinked to an instance of JBrowse that visually displays an alignment of the RNAi reagents with genes and transcripts. (D) Reagent identifiers are hyperlinked to detailed information pages with reagent sequence(s), on-target gene(s), and off- target gene(s) information. (E) Reagent identifiers on the detail information page are hyperlinked to verification and phenotype data at TRiP RSVP.

(TALE) (Christian et al. 2010) and Clustered Regularly support from the National Center for Research Resources and Interspaced Short Palindromic Repeats (CRISPRs) (Cong Office of Research Infrastructure Programs (NCRR/ORIP) R24 et al. 2013)) for which gene annotation impacts interpre- RR032668. S.E.M. is also supported in part by the Dana tation of the reagents. Farber/Harvard Cancer Center and N.P. is an investigator of the Howard Hughes Medical Institute. Acknowledgements Note added in proof: See Kulathinal, 2013 (pp. 7–8) in this issue, for a related work. We thank the members of the Drosophila RNAi Screening Center (DRSC), Transgenic RNAi Project (TRiP), and Perrimon Literature Cited lab, in particular Laura Holderbaum and Dong Yan, for helpful comments and support. We thank Thomas Micheler from the Altschul,S.F.,W.Gish,W.Miller,E.W.Myers,andD.J.Lipman, Vienna Drosophila RNAi Center for help with data access. We 1990 Basic local alignment search tool. J. Mol. Biol. 215: 403–410. also thank all of the past and current users of the DRSC and Arziman, Z., T. Horn, and M. Boutros, 2005 E-RNAi: a web appli- cation to design optimized RNAi constructs. Nucleic Acids Res. TRiP for providing valuable feedback. This work was supported 33: W582–W588. inlargepartbytheNationalInstituteofGeneralMedical Boutros, M., and J. Ahringer, 2008 The art and design of genetic Sciences R01 GM067761 and GM084947, with additional screens: RNA interference. Nat. Rev. Genet. 9: 554–566.

44 Y. Hu et al. Celniker, S. E., D. A. Wheeler, B. Kronmiller, J. W. Carlson, A. Halpern Kittler, R., V. Surendranath, A. K. Heninger, M. Slabicki, M. Theis et al., 2002 Finishing a whole-genome shotgun: release 3 of the et al., 2007 Genome-wide resources of endoribonuclease- Drosophila melanogaster euchromatic genome sequence. Genome prepared short interfering RNAs for specific loss-of-function studies. Biol. 3: RESEARCH0079. Nat. Methods 4: 337–344. Chen, D., J. Berger, M. Fellner, and T. Suzuki, 2009 FLYSNPdb: Kulkarni,M.M.,M.Booker,S.J.Silver,A.Friedman,P.Honget al., a high-density SNP database of Drosophila melanogaster. Nu- 2006 Evidence of off-target effects associated with long dsRNAs cleic Acids Res. 37: D567–570. in Drosophila melanogaster cell-based assays. Nat. Methods 3: Christian, M., T. Cermak, E. L. Doyle, C. Schmidt, F. Zhang et al., 833–838. 2010 Targeting DNA double-strand breaks with TAL effector Micklem, D. R., and J. B. Lorens, 2007 RNAi screening for thera- nucleases. Genetics 186: 757–761. peutic targets in human malignancies. Curr. Pharm. Biotechnol. Clemens, J. C., C. A. Worby, N. Simonson-Leff, M. Muda, T. Maehama 8: 337–343. et al., 2000 Use of double-stranded RNA interference in Drosophila Moffat, J., J. H. Reiling, and D. M. Sabatini, 2007 Off-target ef- cell lines to dissect signal transduction pathways. Proc. Natl. Acad. fects associated with long dsRNAs in Drosophila RNAi screens. Sci. U S A 97: 6499–6503. Trends Pharmacol. Sci. 28: 149–151. Cong, L., F. A. Ran, D. Cox, S. Lin, R. Barretto et al., 2013 Multiplex Mohr, S., C. Bakal, and N. Perrimon, 2010 Genomic screening genome engineering using CRISPR/Cas systems. Science 339: with RNAi: results and challenges. Annu. Rev. Biochem. 79: 819–823. 37–64. Dietzl, G., D. Chen, F. Schnorrer, K. C. Su, Y. Barinova et al., Mohr, S. E., and N. Perrimon, 2012 RNAi screening: new ap- 2007 A genome-wide transgenic RNAi library for conditional proaches, understandings, and organisms. Wiley Interdiscip. gene inactivation in Drosophila. Nature 448: 151–156. Rev. RNA 3: 145–158. Filhol,O.,D.Ciais,C.Lajaunie,P.Charbonnier,N.Foveauet al., Muller, P., M. Boutros, and M. P. Zeidler, 2008 Identification of 2012 DSIR: assessing the design of highly potent siRNA by test- JAK/STAT pathway regulators–insights from RNAi screens. ing a set of cancer-relevant target genes. PLoS One 7: e48057. Semin. Cell Dev. Biol. 19: 360–369. Flockhart, I., M. Booker, A. Kiger, M. Boutros, S. Armknecht et al., Myers, E. W., G. G. Sutton, A. L. Delcher, I. M. Dew, D. P. Fasulo 2006 FlyRNAi: the Drosophila RNAi screening center data- et al., 2000 A whole-genome assembly of Drosophila. Science base. Nucleic Acids Res. 34: D489–D494. 287: 2196–2204. Flockhart,I.T.,M.Booker,Y.Hu,B.McElvany,Q.Gillyet al., Ni,J.Q.,M.Markstein,R.Binari,B.Pfeiffer,L.P.Liuet al., 2012 FlyRNAi.org–the database of the Drosophila RNAi screen- 2008 Vector and parameters for targeted transgenic RNA ing center: 2012 update. Nucleic Acids Res. 40: D715–719. interference in Drosophila melanogaster. Nat. Methods 5: FlyBase Consortium, 2003 The FlyBase database of the Drosoph- 49–51. ila genome projects and community literature. Nucleic Acids Ni, J. Q., L. P. Liu, R. Binari, R. Hardy, H. S. Shim et al., 2009 A Res. 31: 172–175. Drosophila resource of transgenic RNAi lines for neurogenetics. Fraser, A. G., R. S. Kamath, P. Zipperlen, M. Martinez-Campos, M. Genetics 182: 1089–1100. Sohrmann et al., 2000 Functional genomic analysis of C. ele- Ni, J. Q., R. Zhou, B. Czech, L. P. Liu, L. Holderbaum et al., 2011 A gans chromosome I by systematic RNA interference. Nature genome-scale shRNA resource for transgenic RNAi in Drosoph- 408: 325–330. ila. Nat. Methods 8: 405–407. Frenkel-Morgenstern, M., A. Gorohovski, V. Lacroix, M. Rogers, K. Perrimon, N., J. Q. Ni, and L. Perkins, 2010 In vivo RNAi: today Ibanez et al., 2013 ChiTaRS: a database of human, mouse and and tomorrow. Cold Spring Harb. Perspect. Biol. 2: a003640. fruit fly chimeric transcripts and RNA-sequencing data. Nucleic Qu,W.,C.Ren,Y.Li,J.Shi,J.Zhanget al., 2011 Reliability analysis Acids Res. 41: D142–151. of the Ahringer Caenorhabditis elegans RNAi feeding library: Hammond, S. M., E. Bernstein, D. Beach, and G. J. Hannon, a guide for genome-wide screens. BMC Genomics 12: 170. 2000 An RNA-directed nuclease mediates post-transcriptional Rodriguez, J., J. S. Menet, and M. Rosbash, 2012 Nascent-seq gene silencing in Drosophila cells. Nature 404: 293–296. indicates widespread cotranscriptional RNA editing in Drosoph- Horn, T., and M. Boutros, 2010 E-RNAi: a web application for the ila. Mol. Cell 47: 27–37. multi-species design of RNAi reagents–2010 update. Nucleic Root, D. E., N. Hacohen, W. C. Hahn, E. S. Lander, and D. M. Acids Res. 38: W332–W339. Sabatini, 2006 Genome-scale loss-of-function screening with Horn, T., T. Sandmann, and M. Boutros, 2010 Design and evalu- a lentiviral RNAi library. Nat. Methods 3: 715–719. ation of genome-wide libraries for RNA interference screens. Skinner, M. E., A. V. Uzilov, L. D. Stein, C. J. Mungall, and I. H. Genome Biol. 11: R61. Holmes, 2009 JBrowse: a next-generation genome browser. Hoskins, R. A., J. W. Carlson, C. Kennedy, D. Acevedo, M. Evans- Genome Res. 19: 1630–1638. Holm et al., 2007 Sequence finishing and mapping of Drosoph- Vert, J. P., N. Foveau, C. Lajaunie, and Y. Vandenbrouck, 2006 An ila melanogaster heterochromatin. Science 316: 1625–1628. accurate and interpretable model for siRNA efficacy prediction. Hu,Y.,I.Flockhart,A.Vinayagam,C.Bergwitz,B.Bergeret al., BMC Bioinformatics 7: 520. 2011 An integrative approach to ortholog prediction for disease- Yamamoto, M. T., 2010 Drosophila Genetic Resource and Stock focused and other functional studies. BMC Bioinformatics 12: 357. Center; The National BioResource Project. Exp. Anim. 59: Kamath, R. S., A. G. Fraser, Y. Dong, G. Poulin, R. Durbin et al., 125–138. 2003 Systematic functional analysis of the Caenorhabditis el- egans genome using RNAi. Nature 421: 231–237. Communicating editor: L. M. McIntyre

UP-TORR: RNAi Reagent Annotation Tool 45 GENETICS

Supporting Information http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.113.151340/-/DC1

UP-TORR: Online Tool for Accurate and Up-to-Date Annotation of RNAi Reagents

Yanhui Hu, Charles Roesel, Ian Flockhart, Lizabeth Perkins, Norbert Perrimon, and Stephanie E. Mohr

Copyright © 2013 by the Genetics Society of America DOI: 10.1534/genetics.113.151340

Figure S1 UP-TORR user interface for human RNAi reagents. For human RNAi reagents, UP-TORR supports input of the gene- specific region from one or more specific RNAi reagent sequence (e.g. siRNA or shRNA 21-mer sequences). (A) The user first selects human tab, then enters the gene-specific sequence regions of RNAi reagents; (B) UP-TORR outputs a table that summarizes gene and isoform specificity of the reagents and provides information about the target region, location and alignment length. (C) Genes are hyperlinked to NCBI EntrezGene page. (D) Transcripts are hyperlinked to NCBI RefSeq page.

2 SI Y. Hu et al.

Table S1 Summary of changes in gene and transcript annotations at FlyBase and WormBase over the past six years.

FlyBase release r5.3 r5.6 r5.16 r5.26 r5.34 r5.44 r5.50

2007- 2008- 2009- 2010- 2011- 2012- 2013- Date Aug Mar Mar Mar Mar Mar Mar

All genes 15096 15089 15073 14721 15183 15569 16093

Genes added (compared with r5.50) 2929 2833 2414 2148 1348 722 NA

Genes retired (compared with r5.50) 1932 1829 1394 776 588 371 NA

Genes with different genome location (compared with r5.50) 10487 10508 9985 9372 8811 7728 NA

All transcripts 21707 21984 22661 22938 24523 27055 30933

Transcripts added (compared with r5.50) 13674 13211 11572 10447 8395 5195 NA

Transcripts retired (compared with r5.50) 4498 4262 3300 2452 2124 1498 NA

Transcripts with different sequence (compared with r5.50) 13730 14040 14098 13665 13471 12889 NA

Transcripts with different CDS sequence (compared with r5.50) 47 4 2 1 0 1 NA

Genes with any transcript changes (compared with r5.50) 10095 10200 10016 9613 9247 8258 NA

WormBase release WS170 WS190 WS200 WS212 WS225 WS230 WS237

2007- 2008- 2009- 2010- 2011- 2012- 2013- Date Feb May Mar Mar Apr Mar Apr

All transcripts 27322 27417 27687 28278 30745 31249 28983

Transcripts added (compared with WS237) 8574 7128 6672 5538 4483 2928 NA

Transcripts retired (compared with WS237) 6913 5562 5376 4833 6245 5194 NA

Transcripts with different CDS sequence (compared with WS237) 3162 2440 2048 1603 1077 618 NA

Y. Hu et al. 3 SI