UC San Diego UC San Diego Previously Published Works

Title Extending gene ontology in the context of extracellular RNA and vesicle communication.

Permalink https://escholarship.org/uc/item/7wt8r2d9

Journal Journal of biomedical semantics, 7(1)

ISSN 2041-1480

Authors Cheung, Kei-Hoi; Keerthikumar, Shivakumar; Roncaglia, Paola; et al.

Publication Date 2016

DOI 10.1186/s13326-016-0061-5

Peer reviewed

Cheung et al. Journal of Biomedical Semantics (2016) 7:19 DOI 10.1186/s13326-016-0061-5

RESEARCH Open Access

Extending gene ontology in the context of extracellular RNA and vesicle communication

Kei-Hoi Cheung1,2,21*, Shivakumar Keerthikumar3,21, Paola Roncaglia4,22, Sai Lakshmi Subramanian5,21, Matthew E. Roth5,21, Monisha Samuel3, Sushma Anand3, Lahiru Gangoda3, Stephen Gould6,21,23, Roger Alexander7,21, David Galas7,21, Mark B. Gerstein8,9,10,21, Andrew F. Hill3,24, Robert R. Kitchen8,21, Jan Lötvall11,24, Tushar Patel12,21, Dena C. Procaccini13,21, Peter Quesenberry14,21,24, Joel Rozowsky8,10,21, Robert L. Raffai15,21, Aleksandra Shypitsyna4,22, Andrew I. Su16,21, Clotilde Théry17,24, Kasey Vickers18,21, Marca H.M. Wauben19,24, Suresh Mathivanan3,21,24, Aleksandar Milosavljevic5,21 and Louise C. Laurent20,21*

Abstract

Background: To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA data/metadata will be more easily annotated and queried because they will be based on a shared set of terms and relationships relevant to extracellular research.

Methods: By following a consensus-building process, we have worked with several academic societies/consortia, including ERCC, ISEV, and ASEMV, to identify and approve a set of exRNA and extracellular vesicle-related terms and relationships that have been incorporated into GO. In addition, we have initiated an ongoing process of extracting gene product annotations associated with these terms from Vesiclepedia and ExoCarta, converting the extracted annotations to Gene Association File (GAF) format for batch submission to GO, and curation of the submitted annotations by the GO Consortium. As a use case, we have incorporated some of the GO terms into annotations of samples from the exRNA Atlas and implemented a faceted search interface based on such annotations.

Results: We have added 7 new terms and modified 9 existing terms (along with their synonyms and relationships) to GO. Additionally, 18,695 unique coding gene products (mRNAs and proteins) and 963 unique non-coding gene products (ncRNAs) associated with the terms “extracellular vesicle”, “extracellular exosome”, “apoptotic body”, and “microvesicle” were extracted from ExoCarta and Vesiclepedia. These annotations are currently being processed for submission to GO.

Conclusions: As an inter-community effort, we have made a substantial update to GO in the exRNA context. We have also demonstrated the utility of some of the new GO terms for sample annotation and metadata search.

Keywords: Ontology, Extracellular RNA, Extracellular vesicle, Metadata, Faceted search, Atlas

* Correspondence: [email protected]; [email protected]
1 Department of Emergency Medicine, Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA
20 Department of Reproductive Medicine, University of California, San Diego, La Jolla, CA, USA
Full list of author information is available at the end of the article

© 2016 Cheung et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Extracellular RNAs (exRNAs) are broadly defined as RNAs that are present in the acellular portions of biofluids, such as body fluids (blood, cerebrospinal fluid, bile, lymph, vitreous humour, amniotic fluid, ascites, pleural, pericardial and peritoneal fluids, etc.), secretions (saliva, urine, sweat, tears, milk, seminal fluid, etc.), and cell and tissue culture supernatants. RNA sequencing analyses of exRNAs demonstrate that they represent almost the entire range of cellular RNA species, including rRNAs, tRNAs, mRNAs, miRNAs, piRNAs, lncRNAs, and circular RNAs ([1–5]). However, the profiles of cellular RNAs and exRNAs are not identical, as some cellular RNAs appear to be highly enriched in the exRNA fraction, while others appear to be significantly underrepresented, and still others lie between these extremes of enrichment or exclusion.

Among the many reasons cells might release exRNAs, perhaps the most intriguing possibility is that exRNAs might contribute to intercellular communication. Export of exRNAs may also be used to eliminate undesired RNAs from the originating cell. Finally, some exRNAs might be generated in a nonspecific manner, either by living cells (e.g. by a ‘bulk flow’ process) or as a consequence of cell death (reviewed in [6]).

ExRNAs appear to be universally associated with carrier vehicles, likely due to the rapid degradation of unprotected RNAs in biofluids ([4, 7–10]). The biochemical properties of these carriers are likely to be the primary determinant of the types and specific identities of exRNAs that are secreted from cells, as well as their stability in the extracellular milieu. The first exRNA carriers identified were RNA viruses, which carry not only viral RNAs, but also varying levels of host-encoded RNAs. For example, retroviruses typically carry two host tRNAs and sub-stoichiometric levels of host mRNAs from the cell in addition to two copies of the viral RNA genome and key viral proteins ([11–16]). In fact, retrovirally infected cells produce more “empty” virus-like particles (VLPs) than infectious virions. These VLPs have failed to encapsulate the viral RNA genome, but can carry as much as 10 kb of host-encoded RNA ([17]). More recently, it has been discovered that virally uninfected cells also release RNAs into the extracellular space, and that these exRNAs are associated with extracellular vesicles (EVs), lipoproteins (LPPs, most commonly HDLs ([10, 18]) and LDLs ([18, 19])), and ribonucleoprotein particles (RNPs, most commonly Ago2-containing RNPs ([9, 20])). While the biogenic mechanisms underlying the release of exRNA-containing EVs, LPPs, and RNPs are still being investigated, it is clear that they are not generated by a mechanism that is common to all of them.

Evidence for EVs and exRNA was first provided more than seventy-five years ago by Albert Claude’s observation that uninfected chick and mammalian cells release RNA-containing vesicles ([21]). However, these non-viral EVs remained largely uninvestigated for decades. EVs re-entered the literature in the late 1960s with descriptions of calcifying matrix vesicles released by chondrocytes ([22]) and vesicular ‘dust’ released by platelets ([23]). These and other such vesicles are now commonly referred to as ‘exosomes,’ a term coined by Trams et al. in 1981 ([24]) to refer to secreted vesicles that “may serve a physiologic function”, including both small vesicles of ~100 nm in diameter, and larger vesicles of ~600 nm diameter or greater.

This first definition of the term ‘exosome’ has been subsequently overlooked at least twice, first in 1987 by investigators studying the vesicular secretion of the transferrin receptor, who adopted a more restrictive definition of the term, conflating it with a delayed mode of vesicle secretion in which the vesicles bud at endosome membranes to create a multivesicular body (MVB), followed later by MVB fusion with the plasma membrane to release the vesicles into the extracellular space ([25]). In 1997, investigators studying an RNA-degrading complex adopted the term ‘exosome’, this time for an entirely unrelated intracellular biochemical entity ([26]). Not surprisingly, other investigators have come to different conclusions about which definition holds scientific precedent, resulting in variable use of the term ‘exosome’ in different laboratories. EV-related nomenclatures are further complicated by the common use of additional terms for secreted vesicles that are variably associated with different biophysical properties or biogenesis pathways, as well as terms based on the cell type or tissue of origin. The former include ‘ectosomes’ (which refer to EVs that are produced by budding from the plasma membrane ([27, 28])) and ‘microvesicles’ (which are frequently operationally defined as EVs that pellet at moderate centrifugation speeds (~10,000–20,000 xg) [29]), while the latter include ‘prostasomes’ ([30]), ‘epididymosomes’ ([31]), ‘immunosomes’ ([32, 33]), ‘oncosomes’ ([34]), and ‘platelet dust’ ([23]). There are even some vesicle names that refer to observed biological activities (e.g. ‘tolerosomes’ ([35]) and ‘calcifying matrix vesicles’ ([22])).

The International Society for Extracellular Vesicles (ISEV) has previously attempted to clarify the nomenclature in this field. Its primary achievements have been to (1) introduce the term ‘extracellular vesicle (EV)’ as a general term intended to encompass all secreted vesicles, and encourage its broad acceptance, and (2) encourage the use of broad-definition terms until a more comprehensive understanding of the biogenesis and molecular compositions of different types of vesicles is developed. Given the inconsistent vesicle nomenclatures prior to these ISEV efforts, these were major advances. However, some inconsistencies still persist. For example, some investigators use the term ‘microvesicle’ for larger EVs (>200 nm diameter) and ‘exosome’ for smaller EVs (~30–200 nm diameter), while other investigators reject these size-based definitions and adopt a set of biogenic definitions in which ‘microvesicle’ describes vesicles that bud from the plasma membrane while ‘exosome’ describes vesicles that bud into endosomes and are secreted only later upon MVB fusion with the plasma membrane.

Therefore, investigators are currently forced to make a choice between these competing definitions, the first being practical but mechanistically barren, the latter impractical but mechanistically appealing. Unfortunately, there is as yet no unambiguous way to distinguish between vesicles that bud from the plasma membrane and those that bud at the endosome membrane based on either biophysical properties or molecular content. This lack of standard nomenclatures creates i) a problem for individual researchers to annotate/share their data in an unambiguous manner and ii) a barrier to productive interactions with the broader research community, most prominently the biomedical, genomics, and computational biology communities. To address such issues, we initiated our metadata standard efforts within the Metadata Working Group (MWG) of the Extracellular RNA Communication Consortium (ERCC) funded by the National Institutes of Health (NIH). As part of these efforts, MWG matched metadata terms to existing biomedical ontologies and identified the exRNA-relevant terms that were absent from major ontologies such as the Gene Ontology (GO).

Use of ontologies in metadata annotation

As described in [36], the MWG of the ERCC has developed the data and metadata standards for annotating exRNA profiling data for submission to the Data Management and Resource Repository (DMRR) of the ERCC. In particular, a process has been established to submit experiment data to DMRR along with metadata in standard machine-readable formats (using Linked Data technologies). The standards cover metadata about donors, biosamples, experiments, studies, and analysis steps. Such metadata enable targeted selection of samples of interest (e.g. specific health condition of the donor, biofluid or cell/tissue type, library preparation method, and sequencing assay) for integrative analyses. The metadata also help organize the data for efficient interactive as well as programmatic access (e.g. REST Application Programming Interfaces (APIs)).

The MWG identified existing biomedical ontologies accessible through the NCBO BioPortal [37] as a source of commonly accepted terms for annotating exRNA datasets. Such ontology-based metadata annotation not only allows semantic retrieval/query of data in ERCC data sources, but also allows DMRR data to be integrated with other data sources with metadata annotated using the same ontologies. For ontologies to be useful for biological applications, it is critical that the relevant ontologies contain meaningful and broadly accepted terms, and that the relationships between the included terms be both accurate and accepted by the pertinent scientific community. For an ontology such as GO, which is frequently used for functional enrichment analysis of genomic datasets, it is also important that terms be associated with specific gene products (coding and non-coding RNAs and proteins) in an empirically supported manner.

The Gene Ontology (GO) Consortium (GOC; http://www.geneontology.org) is a community-based bioinformatics effort. This Consortium develops, maintains and extends two interconnected resources: the Gene Ontology itself, and a database of GO annotations that associate specific gene products with concepts in the Gene Ontology. As of March 2016, there are >42,000 GO terms for describing concepts relevant to gene product function in a species-independent manner, providing not only comprehensive coverage of biological concepts but also community-wide agreement on how those should be used to describe gene functions across all organisms. The GO is organized into three aspects [38]: these are graph structures composed of classes for molecular functions, the biological processes these contribute to, the cellular locations where these occur (cellular components), and the relationships connecting these classes. A ‘GO annotation’ describes the association between a gene product and an ontology class, as well as references to the evidence supporting the association. In most cases, a GO annotation is a statement about gene function, but because the GO cellular component aspect describes a “cellular and subcellular anatomy” it can have broader applications beyond the originally specified GO annotation usage of “where a gene product is active.” Thus an association of an exRNA to a cellular component term does not necessarily imply a function per se, though a functional role may of course later be discovered.

The GOC continuously provides enhancements to the ontology and annotation database, as well as its tools and policies, ensuring that the ontology and annotations are consistent, and accurately reflect the current state of biological knowledge.

In recent years, the GO has expanded the number and type of relationships used to connect terms within the ontology as well as between GO and other ontologies. A version of GO containing all relationships, including information from the Uber anatomy ontology (UBERON) [39], the Chemical Entities of Biological Interest ontology (ChEBI) [40], the Plant Ontology for plant structure/stage (PO) [41], the Phenotypic Quality Ontology

(PATO) [42], and the Sequence Ontology (SO) [43], is called go-plus and is available at http://geneontology.org/page/download-ontology. The GOC also makes other versions of the ontology available at this site.

Methods

As mentioned above, when we initiated our metadata efforts, there was a significant gap between the needs of the exRNA community and the terms and relationships present in GO. At that time, the only relevant terms in the Cellular Component branch were ‘extracellular vesicular exosome’ (GO:0070062) and ‘prominosome’ (GO:0071914), neither of which was widely used in the exRNA community. In addition to lacking the most commonly used terms for extracellular vesicles, this version of GO lacked terms specific for exRNA-containing ‘Lipoprotein Particles’ (LPPs) and ‘Ribonucleoprotein Particles’ (RNPs).

Defining exRNA-related terms based on community consensus

Our goal was to extend the cellular component branch of GO to incorporate relevant and empirically supported terms and relationships that would be useful to and broadly accepted in the exRNA field, while both maintaining the internal consistency of GO and leveraging the knowledge previously encoded in the GO. Initially focusing on the major area of controversy (vesicles), we surveyed the literature for relevant references and sought input from domain experts, including the leadership of the International Society for Extracellular Vesicles (ISEV), the American Society for Exosomes and Microvesicles (ASEMV), and the ERCC. This process resulted in three alternative proposals, which ranged from inclusion of only the most general of terms, to inclusion of both general and moderately specific terms, to additional inclusion of highly specific terms (Fig. 1). As shown in Fig. 1, boxes represent GO terms, ovals represent synonyms, and lines connecting boxes represent the ‘is-a’ relationship (a dashed line connecting a box and an oval corresponds to a synonymous relationship). The gray boxes and black lines represent existing terms and relationships, while the red boxes and red lines represent the proposed terms and relationships. The option-3 set of terms/relationships is a subset of the option-2 set, which is a subset of option 1. We then polled the membership of these societies either during their annual (ASEMV) or semi-annual (ERC Consortium) meetings or through an electronic survey (ISEV). The results of the ISEV poll are shown (Table 1), and are concordant with the consensuses reached by the other two groups: Option 2 (set B + set C) is the preferred choice.

Fig. 1 The three different options for extending GO to include exRNA terms and relationships

We then worked with the GOC team to incorporate the consensus recommendations, as well as terms relevant to LPPs, as supported by the literature. As part of the GO update, the GOC editorial team routinely adds new ontology classes, or modifies existing ones, based on requests from GO curators or from community experts. In the latter case, like our case here, changes to the ontology are proposed using a batch request upon direct consultation (see [44]).

Adding gene product annotations

The GOC accepts contributions of annotations from external groups, and provides guidance in the ‘Documentation’ section of its website (http://geneontology.org). The ‘Documentation’ section includes details of the recommended file format, called GAF (Gene Association Format). Annotations submitted to the GOC undergo automated checks, followed by manual review by GO curators, in this case from the Protein Function Content team at EMBL-EBI.

As sources of gene product annotations in the exRNA context, ExoCarta [45] and Vesiclepedia [46] are online databases that catalogue proteins, RNAs and lipids identified specifically in exosomes and other extracellular vesicle types, respectively. Recently, ExoCarta was updated with a feature covering the ISEV minimal experimental requirements for the definition of extracellular vesicles. In collaboration with the GOC editorial team, we have mapped the ExoCarta and Vesiclepedia gene products (coding and non-coding RNAs and proteins) onto their corresponding RNAcentral (http://www.rnacentral.org) and UniProt (http://www.uniprot.org/) identifiers using the newly extended GO Cellular Component terms.

Incorporating new GO terms into exRNA Atlas as a use case

The exRNA Atlas is a central data repository developed and maintained by the DMRR that distributes data provided by the ERCC. The first public release of the exRNA Atlas, in early 2016, contained exRNA-seq profiles of over 500 samples generated by ERCC members, analyzed uniformly using standard in-house analysis pipelines and quality-controlled using standards agreed by the Consortium. The exRNA Atlas browser enables efficient searching and sub-selection of exRNA profiles for retrieval and integrative analysis. To facilitate integration of the datasets into the exRNA Atlas, data and metadata standards have been developed by the Metadata Working Group of the ERCC. The metadata standards build on metadata models we previously developed during the course of the NIH Epigenome Roadmap project, the International Human Epigenome Consortium (IHEC) project, and in collaboration with the ENCODE3 project and NCBI. The metadata include core objects such as biosamples, donors, experimental protocols, studies and analysis methods. The GenboreeKB document modeling system was used to define syntactic and semantic models for ontology-rich multi-faceted metadata based on the lightweight JSON syntax. The latest release of the exRNA Metadata Standard associates various biomedical ontologies with both the attribute (key) values and the attributes themselves, and supports dynamic validation against ontologies available in NCBO BioPortal. More than twenty data elements within the metadata model are associated with ontologies, such as biofluids, disease types, anatomical locations, cell culture sources, cell lines, tissues, and other properties needed to describe biosamples and donors. A number of these elements reflect the clinical focus of the ERCC, defined by ontologies such as the Systematized Nomenclature of Medicine - Clinical Terms (SNOMEDCT) (https://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html) for biofluid type as well as anatomical location of the biosamples, and the Human Disease Ontology (DOID) [47] for disease types. To effectively identify the source from which the exRNA is extracted, the new GO terms have been used to annotate the biosamples deposited in the exRNA Atlas.

Results and discussions

ExRNA terms and relationships added to GO

Based on the option-2 proposal shown in Fig. 1, the ERCC group worked with EMBL-EBI members of the GOC editorial team to add the proposed terms (including synonyms) and relationships to GO. Figure 2 shows these terms (with GO IDs) and relationships in red. Any future terms added to GO and representing types of vesicles that are part of the extracellular region will be automatically classified under GO:1903561 ‘extracellular vesicle’.
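As a rough illustration of why classification under ‘extracellular vesicle’ matters for retrieval, the sketch below collects the gene products annotated to a term or to any of its ‘is-a’ descendants. The edge list and the gene product names are hypothetical placeholders chosen for this example, not GO content or ExoCarta/Vesiclepedia data, and term names stand in for GO identifiers.

```python
from collections import defaultdict

# Illustrative 'is-a' edges as (child, parent) pairs; assumed for this sketch.
IS_A_EDGES = [
    ("extracellular exosome", "extracellular vesicle"),
    ("microvesicle", "extracellular vesicle"),
    ("apoptotic body", "extracellular vesicle"),
]

# Hypothetical direct annotations as (gene product, term) pairs.
ANNOTATIONS = [
    ("geneA", "extracellular exosome"),
    ("geneB", "microvesicle"),
    ("geneC", "extracellular vesicle"),
    ("geneD", "apoptotic body"),
    ("geneA", "microvesicle"),
]


def descendants(term, edges):
    """Return every term reachable from `term` by following is-a edges downward."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def annotated_products(term, edges, annotations):
    """Gene products annotated to `term` directly or through a descendant term."""
    terms = {term} | descendants(term, edges)
    return sorted({product for product, t in annotations if t in terms})


print(annotated_products("extracellular vesicle", IS_A_EDGES, ANNOTATIONS))
print(annotated_products("microvesicle", IS_A_EDGES, ANNOTATIONS))
```

A query on the general term returns products annotated to any of the more specific vesicle types, whereas a query on a leaf term returns only its direct annotations; a newly added child term would be picked up simply by adding one edge.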

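The GAF conversion step described under ‘Adding gene product annotations’ might look roughly like the following sketch. It assumes the 17-column, tab-separated layout of GAF 2.1; the example record (accession, symbol, reference, evidence code, date, assigning group) is a made-up placeholder rather than a real annotation, and the helper names are hypothetical.

```python
# Column order assumed for a GAF 2.1 line (17 tab-separated columns).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]


def to_gaf_line(record):
    """Serialise one annotation dict; columns that are not supplied stay empty."""
    unknown = set(record) - set(GAF_COLUMNS)
    if unknown:
        raise KeyError(f"unknown GAF columns: {sorted(unknown)}")
    return "\t".join(str(record.get(col, "")) for col in GAF_COLUMNS)


def to_gaf(records):
    """Build a GAF document: version header followed by one line per record."""
    lines = ["!gaf-version: 2.1"] + [to_gaf_line(r) for r in records]
    return "\n".join(lines) + "\n"


# Placeholder record: every value except the GO ID and aspect is invented.
example = {
    "db": "UniProtKB",
    "db_object_id": "P00000",        # placeholder accession
    "db_object_symbol": "GENE1",     # placeholder symbol
    "go_id": "GO:1903561",           # 'extracellular vesicle'
    "db_reference": "PMID:0000000",  # placeholder reference
    "evidence_code": "IDA",
    "aspect": "C",                   # cellular component
    "db_object_type": "protein",
    "taxon": "taxon:9606",
    "date": "20160101",
    "assigned_by": "ExampleGroup",   # placeholder
}

print(to_gaf([example]), end="")
```

Keeping the column order in a single list makes it easy to check every line for the expected number of fields before a batch is submitted for the automated checks.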
Table 1 ISEV poll results

Rank        1      2      3      4
Option 1    16     86     89     21
Option 2    162    36     8      6
Option 3    33     83     91     5
Option 4    1      7      24     180

580 ISEV members received the poll, 212 answered it. The first question was to rank the four options by order of preference. As shown in Fig. 1, the proposed options were: Option 1 = Set A + B + C; Option 2 = Set B + C; Option 3 = Set C; Option 4 = none of the above. Number of answers for each

Fig. 2 The list of exRNA-related terms, synonyms and relationships that have been added to GO

Gene product annotations extracted from ExoCarta and Vesiclepedia

At the time of writing this manuscript, 18,695 coding (mRNA and protein) and 963 non-coding (ncRNA) gene products (including isoforms) were annotated with the GO terms ‘extracellular vesicle’, ‘extracellular exosome’, ‘apoptotic body’ and ‘microvesicle’, identified in different species (Table 2), from ExoCarta and Vesiclepedia. The Protein Function Content team at EMBL-EBI is currently working on importing annotations from ExoCarta and Vesiclepedia into the GOC. We believe that the gene product annotations extracted from these resources will further aid the scientific community in downstream cellular component enrichment analysis, leading to understanding the molecular mechanism of extracellular vesicle biogenesis, as well as sorting and secretion under normal and pathophysiological microenvironments. While granular/specific annotations would be more informative, ontological inferencing can be applied to a given term (extracellular vesicle) to automatically retrieve its annotations and those associated with its descendant terms (e.g. extracellular exosome). GO query engines like QuickGO (http://www.ebi.ac.uk/QuickGO/) support this type of query. We anticipate that updating the gene products associated with each GO term will be an ongoing process.

Table 2 Summary of the gene product annotations extracted from ExoCarta and Vesiclepedia

GO-Term                  GO-ID         Coding gene products (mRNAs or proteins)   Non-coding gene products (ncRNAs)
Extracellular exosome    GO:0070062    10,827                                     953
Apoptotic body           GO:0097189    168
Microvesicle             GO:1903561    14,668                                     148
Extracellular vesicle    GO:1990742    4029

Total no. of unique coding gene products: 18,695
Total no. of unique non-coding gene products: 963

The use of GO terms to annotate exRNA metadata and implement faceted search within the exRNA Atlas browser

We have implemented a search interface that allows users to browse exRNA Atlas biosamples based on different facets of metadata, including the type of biofluid, cell culture source, disease, and exRNA source. The facets (categories of ontology concepts) provide an effective means for navigating/filtering a large set of exRNA profile data based on different aspects of the samples. The metadata terms linked to ontologies in BioPortal have been augmented to include the newly defined exRNA terms (Fig. 3). By annotating the exRNA sources using these community-agreed terms and by exposing a new facet (exRNA source) through the ExRNA browser, we have enabled more precise and biologically meaningful search of the ExRNA Atlas.

Fig. 3 The new GO terms have been used to annotate the biosamples deposited in the ExRNA Atlas based on the exRNA Source. The figure is a screenshot of the faceted search of biosamples in the ExRNA Atlas using the exRNA Source metadata property (facet)

Significance of our results and other possible extensions

These substantial changes to GO reflect the current state of the exRNA field, representing the current literature from the perspective of the consensus among three major professional societies in this field (ISEV, ASEMV, and ERCC). We believe that the new exRNA-relevant terms, together with their relationships to each other and with the existing GO terms, as well as the associated gene products, will be useful for annotating the increasing number of manuscripts and datasets being generated by the exRNA community. We also expect that they will prove useful for literature mining, data repository integration, and for computational analysis of exRNA data. Given the rapid developments in this field, we view the current version of the relevant areas of the ontology to be a work in progress. We anticipate frequent future amendments to incorporate as well as refine additional terms, relationships, and associated gene products as the community develops a more precise and comprehensive understanding of the biogenesis, biophysical properties, molecular composition, and functions of different types of exRNA-associated EVs, LPPs, and RNPs. To this end, the ERCC group aims to continually interrogate the literature and to solicit input from professional societies and the wider exRNA community. The ERCC has launched the ExRNA Portal (exrna.org), a publicly accessible website and community forum for dissemination of information, sharing of tools and data (primarily through the ExRNA Atlas), and aggregation of input. Future proposed amendments will be posted on the ExRNA Portal and will be subject to a public

comment period. As described above, in the metadata associated with datasets submitted to the ExRNA Atlas, we permit and request depositors not only to tag datasets with the appropriate current GO terms, but also to include more precise keywords, so that over time, frequently used keywords can be evaluated as possible new GO terms.

In addition, our ExRNA Atlas use case (faceted search) shows the power of exploiting multiple ontologies in a sample annotation. For example, our metadata includes a description of extracellular RNAs isolated from different types of biofluids. Ontologies such as Uberon [48], NanoParticle Ontology [49], SNOMEDCT and Experimental Factor Ontology [50] contain concepts corresponding to different types of “Biofluid” (or “Bodily Fluid”). These concepts can be used alongside the “extracellular space” concept in GO to describe extracellular RNAs that are isolated from different types of biofluids.

Conclusions

In this manuscript, we have described an inter-community effort to make a substantial update to GO, focusing on terms and relationships pertinent to the new field of exRNA biology. We have demonstrated how these new GO terms can be used to annotate and search metadata associated with high-throughput datasets, such as RNAseq data. In addition, we anticipate that these terms will be useful as keywords for annotation and querying of the literature. We have also initiated a systematic, literature-supported process for associating gene products with each exRNA-related term, such that they can be used for cellular component enrichment analysis of omics datasets (e.g. RNAseq and gene expression data). We recognize that fundamental questions regarding the composition, biogenesis, and functions of exRNA-containing EVs, lipoproteins, and ribonucleoprotein complexes remain to be answered. Therefore, we anticipate that frequent future updates to GO will be necessary to accurately reflect continued progress in this rapidly advancing field.

Abbreviations

ASEMV: American Society for Exosomes and Microvesicles; ChEBI: chemical entities of biological interest; DMRR: data management and resource repository; ERCC: Extracellular RNA Communication Consortium; EV: extracellular vesicle; ExRNA: extracellular ribonucleic acid; GAF: Gene Association Format; GO: gene ontology; GOC: Gene Ontology Consortium; HDL: high density lipoprotein; ISEV: International Society for Extracellular Vesicles; LDL: low density lipoprotein; LPP: lipoprotein particle; mRNA: messenger ribonucleic acid; lncRNA: long non-coding ribonucleic acid; miRNA: micro ribonucleic acid; MVB: multivesicular body; NCBO: National Center for Biomedical Ontology; ncRNA: non-coding RNA; PATO: phenotypic quality ontology; PR: protein ontology; piRNA: Piwi-interacting ribonucleic acid; RNP: ribonucleoprotein; rRNA: ribosomal ribonucleic acid; SO: sequence ontology; tRNA: transfer ribonucleic acid; VHDL: very high density lipoprotein; VLDL: very low density lipoprotein.

Authors’ contributions

KHC initiated the project of extending GO in the context of exRNA. He also coordinated collaboration between ERCC and GOC. LL provided her domain expertise for guiding the consensus building process and the project as a whole. SK, MS, SA, LG and SM contributed gene product annotations from Vesiclepedia and ExoCarta. PR and AS, who are members of GOC, curated the terms and gene product annotations submitted to GO. SLS, MR and AM are involved in the development of the ExRNA Atlas, and have developed the metadata search use case. KHC, LL, SK, SM, PR, SG, SLS, MR and AM have contributed to the writing of the manuscript. KC, LL, RR, KV, SLS, MR, RA, DG, AS, RK, JR, MG, DP, TP, PQ and AM are members of ERCC. JL, AH, SM, MW, PQ, and CT are members of ISEV and handled the ISEV poll. SG is a member of ASEMV. These communities support the project. All the authors reviewed the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported in part by the NIH Common Fund, through the Office of Strategic Coordination and the Office of the NIH Director (AM: U54DA036134, LCL: UH2TR000906 and U01HL126494, TP: UH3TR000884, and RR: U19CA179512). Paola Roncaglia and Aleksandra Shypitsyna are funded by the Gene Ontology Consortium (P41 grant from the National Human Genome Research Institute (NHGRI), grant 5U41HG002273-14). Aleksandra Shypitsyna is also funded by the European Molecular Biology Laboratory (EMBL). We would like to give special thanks to Dena C. Procaccini for coordinating meetings and conference calls facilitating discussion and collaboration as part of the manuscript writing. We would like to thank Tony Sawford (Protein Function Development team, EMBL-EBI) for running the set of ExoCarta and Vesiclepedia annotations through the validation pipeline for inclusion in the GOA database. We would also like to thank Xandra O. Breakefield for her insights and comments. Finally, we would like to acknowledge the community support offered by the following consortia and societies: the Extracellular RNA Communication (ERC) Consortium, the Gene Ontology Consortium (GOC), the International Society for Extracellular Vesicles (ISEV), and the American Society for Exosomes and Microvesicles (ASEMV).

Author details

1 Department of Emergency Medicine, Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA. 2 VA Connecticut Healthcare System, West Haven, CT, USA. 3 Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC 3086, Australia. 4 European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. 5 Bioinformatics Research Laboratory, Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA. 6 Department of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 7 Pacific Northwest Diabetes Research Institute, Seattle, WA, USA. 8 Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA. 9 Department of Computer Science, Yale University, New Haven, CT, USA. 10 Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA. 11 University of Gothenburg, Gothenburg, Sweden. 12 Mayo Clinic, Jacksonville, FL, USA. 13 Division of Neuroscience and Behavior, National Institute on Drug Abuse (NIDA), Rockville, MD, USA. 14 University Medicine Comprehensive Cancer Center, Providence, RI, USA. 15 Department of Surgery, University of California San Francisco and VA Medical Center, San Francisco, CA, USA. 16 Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA. 17 Institut Curie, PSL Research University, INSERM U932, Paris, France. 18 Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA. 19 Department of Biochemistry & Cell Biology, Utrecht University, Utrecht, Netherlands. 20 Department of Reproductive Medicine, University of California, San Diego, La Jolla, CA, USA. 21 Extracellular RNA Communication Consortium (ERCC). 22 Gene Ontology Consortium (GOC). 23 American Society for Exosomes and Microvesicles (ASEMV). 24 International Society for Extracellular Vesicles (ISEV).

Received: 17 December 2015 Accepted: 4 April 2016

References Competing interests 1. Huang X, Yuan T, Tschannen M, et al. Characterization of human plasma- The authors declare that they have no competing interests. derived exosomal RNAs by deep sequencing. BMC Genomics. 2013;14:319. Cheung et al. Journal of Biomedical Semantics (2016) 7:19 Page 9 of 9

2. Li Y, Zheng Q, Bao C, et al. Circular RNA is enriched and stable in exosomes: a promising biomarker for cancer diagnosis. Cell Res. 2015;25(8):981–4.
3. Takahashi K, Yan IK, Kogure T, et al. Extracellular vesicle-mediated transfer of long non-coding RNA ROR modulates chemosensitivity in human hepatocellular cancer. FEBS Open Bio. 2014;4:458–67.
4. Skog J, Wurdinger T, van Rijn S, et al. Glioblastoma microvesicles transport RNA and proteins that promote tumour growth and provide diagnostic biomarkers. Nat Cell Biol. 2008;10(12):1470–6.
5. Gezer U, Ozgur E, Cetinkaya M, et al. Long non-coding RNAs with low expression levels in cells are enriched in secreted exosomes. Cell Biol Int. 2014;38(9):1076–9.
6. Patton JG, Franklin JL, Weaver AM, et al. Biogenesis, delivery, and function of extracellular RNA. J Extracell Vesicles. 2015;4:27494.
7. Valadi H, Ekstrom K, Bossios A, et al. Exosome-mediated transfer of mRNAs and microRNAs is a novel mechanism of genetic exchange between cells. Nat Cell Biol. 2007;9(6):654–9.
8. Nolte-’t Hoen EN, Buermans HP, Waasdorp M, et al. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions. Nucleic Acids Res. 2012;40(18):9272–85.
9. Arroyo JD, Chevillet JR, Kroh EM, et al. Argonaute2 complexes carry a population of circulating microRNAs independent of vesicles in human plasma. Proc Natl Acad Sci U S A. 2011;108(12):5003–8.
10. Vickers KC, Palmisano BT, Shoucri BM, et al. MicroRNAs are transported in plasma and delivered to recipient cells by high-density lipoproteins. Nat Cell Biol. 2011;13(4):423–33.
11. Taylor JM. An analysis of the role of tRNA species as primers for the transcription into DNA of RNA tumor virus genomes. Biochim Biophys Acta. 1977;473(1):57–71.
12. Dunn MM, Olsen JC, Swanstrom R. Characterization of unintegrated retroviral DNA with long terminal repeat-associated cell-derived inserts. J Virol. 1992;66(10):5735–43.
13. Coffin JM. Structure, replication, and recombination of retrovirus genomes: some unifying hypotheses. J Gen Virol. 1979;42(1):1–26.
14. Vogt VM, Eisenman R. Identification of a large polypeptide precursor of avian oncornavirus proteins. Proc Natl Acad Sci U S A. 1973;70(6):1734–8.
15. Duesberg PH, Martin GS, Vogt PK. Glycoprotein components of avian and murine RNA tumor viruses. Virology. 1970;41(4):631–46.
16. Rifkin D, Compans RW. Identification of the spike proteins of Rous sarcoma virus. Virology. 1971;46(2):485–9.
17. Petry H, Goldmann C, Ast O, Luke W. The use of virus-like particles for gene transfer. Curr Opin Mol Ther. 2003;5(5):524–8.
18. Wagner J, Riwanto M, Besler C, et al. Characterization of levels and cellular transfer of circulating lipoprotein-bound microRNAs. Arterioscler Thromb Vasc Biol. 2013;33(6):1392–400.
19. Li C, Chen C, Zhou F, et al. GW24-e2517 Low density lipoprotein contained microRNAs as potential novel atherogenic factors. Heart. 2013;99 Suppl 3:A110.
20. Turchinovich A, Weiz L, Langheinz A, Burwinkel B. Characterization of extracellular circulating microRNA. Nucleic Acids Res. 2011;39(16):7223–33.
21. Claude A. Concentration and purification of chicken tumor I agent. Science. 1938;87(2264):467–8.
22. Anderson HC. Vesicles associated with calcification in the matrix of epiphyseal cartilage. J Cell Biol. 1969;41(1):59–72.
23. Wolf P. The nature and significance of platelet products in human plasma. Br J Haematol. 1967;13(3):269–88.
24. Trams EG, Lauter CJ, Salem Jr N, Heine U. Exfoliation of membrane ecto-enzymes in the form of micro-vesicles. Biochim Biophys Acta. 1981;645(1):63–70.
25. Johnstone RM, Adam M, Hammond JR, et al. Vesicle formation during reticulocyte maturation. Association of plasma membrane activities with released vesicles (exosomes). J Biol Chem. 1987;262(19):9412–20.
26. Mitchell P, Petfalski E, Shevchenko A, et al. The exosome: a conserved eukaryotic RNA processing complex containing multiple 3'→5' exoribonucleases. Cell. 1997;91(4):457–66.
27. Hess C, Sadallah S, Hefti A, et al. Ectosomes released by human neutrophils are specialized functional units. J Immunol. 1999;163(8):4564–73.
28. Cocucci E, Meldolesi J. Ectosomes and exosomes: shedding the confusion between extracellular vesicles. Trends Cell Biol. 2015;25(6):364–72.
29. Crescitelli R, Lasser C, Szabo TG, et al. Distinct RNA profiles in subpopulations of extracellular vesicles: apoptotic bodies, microvesicles and exosomes. J Extracell Vesicles. 2013;2. doi:10.3402/jev.v2i0.20677.
30. Brody I, Ronquist G, Gottfries A. Ultrastructural localization of the prostasome - an organelle in human seminal plasma. Ups J Med Sci. 1983;88(2):63–80.
31. Sullivan R, Saez F. Epididymosomes, prostasomes, and liposomes: their roles in mammalian male reproductive physiology. Reproduction. 2013;146(1):R21–35.
32. Perrin P, Sureau P, Thibodeau L. Structural and immunogenic characteristics of rabies immunosomes. Dev Biol Stand. 1985;60:483–91.
33. Derdak SV, Kueng HJ, Leb VM, et al. Direct stimulation of T lymphocytes by immunosomes: virus-like particles decorated with T cell receptor/CD3 ligands plus costimulatory molecules. Proc Natl Acad Sci U S A. 2006;103(35):13144–9.
34. Di Vizio D, Kim J, Hager MH, et al. Oncosome formation in prostate cancer: association with a region of frequent chromosomal deletion in metastatic disease. Cancer Res. 2009;69(13):5601–9.
35. Karlsson M, Lundin S, Dahlgren U, et al. “Tolerosomes” are produced by intestinal epithelial cells. Eur J Immunol. 2001;31(10):2892–900.
36. Subramanian SL, Kitchen RR, Alexander R, et al. Integration of extracellular RNA profiling data using metadata, biomedical ontologies and Linked Data technologies. J Extracell Vesicles. 2015;4:27497.
37. Noy NF, Shah NH, Whetzel PL, et al. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009;37(Web Server issue):W170–3.
38. Ashburner M, Ball C, Blake JA, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.
39. Mungall CJ, Torniai C, Gkoutos GV, et al. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 2012;13(1):R5.
40. Hastings J, de Matos P, Dekker A, et al. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2013;41(Database issue):D456–63.
41. Cooper L, Walls RL, Elser J, et al. The plant ontology as a tool for comparative plant anatomy and genomic analyses. Plant Cell Physiol. 2013;54(2):e1.
42. Gkoutos GV, Green EC, Mallon AM, et al. Using ontologies to describe mouse phenotypes. Genome Biol. 2005;6(1):R8.
43. Eilbeck K, Lewis SE, Mungall CJ, et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6(5):R44.
44. Alam-Faruque Y, Hill DP, Dimmer EC, et al. Representing kidney development using the gene ontology. PLoS One. 2014;9(6):e99864.
45. Keerthikumar S, Chisanga D, Ariyaratne D, Al Saffar H, Anand S, Zhao K, Samuel M, Pathan M, Jois M, Chilamkurti N, Gangoda L, Mathivanan S. ExoCarta: A Web-Based Compendium of Exosomal Cargo. J Mol Biol. 2016;428(4):688–92.
46. Kalra H, Simpson RJ, Ji H, et al. Vesiclepedia: a compendium for extracellular vesicles with continuous community annotation. PLoS Biol. 2012;10(12):e1001450.
47. Schriml LM, Arze C, Nadendla S, et al. Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(Database issue):D940–6.
48. Dahdul WM, Balhoff JP, Blackburn DC, et al. A unified anatomy ontology of the vertebrate skeletal system. PLoS One. 2012;7(12):e51070.
49. Thomas DG, Pappu RV, Baker NA. NanoParticle Ontology for cancer nanotechnology research. J Biomed Inform. 2011;44(1):59–74.
50. Malone J, Holloway E, Adamusiak T, et al. Modeling sample variables with an experimental factor ontology. Bioinformatics. 2010;26(8):1112–8.