D1186–D1194 Nucleic Acids Research, 2019, Vol. 47, Database issue Published online 8 November 2018 doi: 10.1093/nar/gky1036 ECO, the Evidence & Conclusion Ontology: community standard for evidence information Michelle Giglio1,*, Rebecca Tauber1, Suvarna Nadendla1, James Munro1, Dustin Olley1, Shoshannah Ball1, Elvira Mitraka1, Lynn M. Schriml 1, Pascale Gaudet 2, Elizabeth T. Hobbs3, Ivan Erill3, Deborah A. Siegele4, James C. Hu5, Chris Mungall6 and Marcus C. Chibucos1
1Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA, 2Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, 3Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA, 4Department of Biology, Texas A&M University, College Station, TX 77840, USA, 5Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX 77840, USA and 6Molecular Ecosystems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Received September 14, 2018; Revised October 12, 2018; Editorial Decision October 15, 2018; Accepted October 16, 2018
ABSTRACT biomolecular sequence, phenotype, neuroscience, behav- ioral, medical and other data. The process of biocuration The Evidence and Conclusion Ontology (ECO) con- involves making one or more descriptive statements about tains terms (classes) that describe types of evi- a biological/biomedical entity, for example that a protein dence and assertion methods. ECO terms are used performs a particular function, that a specific position in in the process of biocuration to capture the evidence DNA is methylated, that a particular drug causes an adverse that supports biological assertions (e.g. gene prod- effect or that bat wings and human arms are homologous. uct X has function Y as supported by evidence Z). Such statements, or assertions, are represented as annota- Capture of this information allows tracking of an- tions in a database by representing the entities and asser- notation provenance, establishment of quality con- tions using unique identifiers, controlled vocabularies and trol measures and query of evidence. ECO con- other systematic tools. Since knowing the scientific basis for tains over 1500 terms and is in use by many lead- an assertion is of vital importance, a fundamental part of the curation process is to document the evidence (2,3). Ev- ing biological resources including the Gene Ontol- idence comes from many sources including, but not limited ogy, UniProt and several model organism databases. to, field observations, high-throughput sequencing assays, ECO is continually being expanded and revised sequence orthology determination, clinical notes, mutage- based on the needs of the biocuration commu- nesis experiments and enzymatic activity assays. The Ev- nity. The ontology is freely available for download idence and Conclusion Ontology (ECO) provides a con- from GitHub (https://github.com/evidenceontology/) trolled vocabulary for the systematic capture of evidence in- or the project’s website (http://evidenceontology. formation. org/). Users can request new terms or changes to ex- An annotation statement consists of at least three ele- isting terms through the project’s GitHub site. ECO ments: the object being annotated, an aspect of the object is released into the public domain under CC0 1.0 Uni- that is being asserted and the evidence supporting the as- versal. sertion. The Gene Ontology (GO) (4,5) is in common use for capture of three aspects of gene products: molecular function, biological process and location in cellular compo- INTRODUCTION nent. To illustrate an example annotation (Figure 1), imag- ine a paper is published that experimentally characterizes Biocuration is the process whereby information about bi- the function of Protein X. A curator reads the paper and ological entities is collected and stored. Ideally, this is assigns a GO function term to Protein X based on the exper- done electronically, in a standardized format, and kept in imental evidence in the paper. This information is captured a central biological data repository so that it may be eas- with appropriate GO and ECO terms and deposited in a ily accessed by the research community (1). Over the past repository accessible by others. This is an example of biocu- few decades, many such databases have been built to host
*To whom correspondence should be addressed. Tel: 410 706 7694; Fax: 410 706 1482; Email: [email protected] Present address: Elvira Mitraka, Computer Science, Dalhousie University, Halifax, NS, Canada.