The Future of Curation at dictyBase

Petra Fey, Siddhartha Basu, Robert J. Dodson, and Rex L. Chisholm

dictyBase, Northwestern University, Chicago, IL, USA

Abstract

The complete dictyBase overhaul and the introduction of a state-of-the-art software infrastructure will allow curators to begin annotating new biological features. They will also be able to use existing annotations to represent and connect data in novel ways. Curated interactions via the Gene Ontology (GO) will be used to represent protein-protein interactions. Curators already privately annotate spatial expression with the Dictyostelium anatomy ontology. We also recently started annotating Dictyostelium disease orthologs with their respective Disease Ontology (DO) terms. The updated database will also allow GO annotations to be represented with "GO extensions", which add deeper context to those annotations. In the near future, HTML5 technology will transform the way curators add annotations to the database by allowing them to edit gene pages directly. Furthermore, it will open the door for direct community annotations on the gene page by interested users.

Basu S, Fey P, Jimenez-Morales D, Dodson RJ, Chisholm RL. dictyBase 2015: Expanding data and annotations in a new software environment. Genesis. 2015 Jun 19. doi: 10.1002/dvg.22867

Gene Ontology (GO) Extensions

Until recently, GO annotations consisted of the term, the evidence, any interaction partners, and a reference. To add more context to annotations, the GO Consortium developed Annotation Extensions. These capture deeper information, such as under what conditions or during which developmental stage a process, activity, or localization occurs. The result is annotations that form sentence-like structures and give more complete answers to actual biological questions.

A. Gene Summary Page GO Display

[Figure: "Gene Ontology Annotations for ctnnA". The figure lists manual annotations in three groups:
Molecular Function, e.g. beta-catenin binding binds aarA; beta-catenin binding binds Q02248; actin filament binding.
Biological Process, e.g. mitotic cytokinesis genetically interacts with mhcA; sorocarp stalk development; cellular protein localization localizes dcsA during sorocarp stalk development; cellular protein localization localizes rgaA, ctxA, mhcA during sorocarp stalk development.
Cellular Component, e.g. cell cortex; cell-cell junction; basal cortex.]

GO annotations displayed on the Gene Summary page. The example depicts a well-annotated gene with manual annotations in each of the three GO aspects. When manual annotations are available, only those are shown on this page; see below for all GO annotations. The relations of the annotation extensions (such as binds, during, regulates, at) are shown in bold black.

B. GO Page Display

[Figure: "Gene Ontology Annotations for p2xA", showing the Biological Process aspect. The page has filter tabs All GO, Manual GO, Experimental GO, and Electronic GO. Its columns are GO term + Extension, Evidence, With, Reference, Date, and Source. Example rows include:
calcium ion transmembrane transport (IGI; Sivaramakrishnan & Fountain, 2012)
cation transport (IBA; Gaudet et al., 2010)
cation transmembrane transport (IEA; UniProtKB-KW:KW-0406, GO_REF:0000037)
cellular hypotonic response (IGI; Ludlow et al., 2009)
ion transport (IDA; Ludlow et al., 2009)
negative regulation of GTPase activity (IDA; Parkinson et al., 2013)
response to ATP (IEA; InterPro:IPR001429, GO_REF:0000002)]

Display of all annotations on the GO page. Only the Biological Process aspect is shown. Tabs at the top allow easy filtering of annotations by evidence code class. In this example, all annotations are selected. The view therefore shows a mixture of experimental annotations (IMP, IGI, IDA), manual annotations (all of the above plus IBA), and electronic annotations (IEA). GO terms and annotation extensions are found in the first column. Annotation extensions are generally associated with experimental annotations. As of July 2015, we have made 330 annotation extensions that will soon be imported into dictyBase.

Protein-Protein Interactions

[Figure: interaction network of calmodulin (calA). Binding partners or groups shown include calcium-binding protein, Calcineurin (CN), Ca2+-dependent adhesion protein, ABC transporter, ribosomal large subunit L19, Histone H1, rasGAP, cysteine-rich protein, vinculin B, actin, F-actin, WW domain-containing protein, myosin, protein kinase, aminopeptidase, and nucleomorphin.]

Graphic Visualization of Protein-Protein Interactions. This example illustrates the interactions of calmodulin (calA) with its binding partners. It also shows further interactions of some of those partners with other proteins. Solid black lines indicate an experimentally annotated physical interaction, whereas gray dotted lines represent electronically and manually inferred annotations. Names in black are direct interacting partners of calmodulin. Names in gray are either secondary partners or partners whose direct binding is inferred. Protein names or groups are listed on the right. Note that this represents current annotations and therefore is unlikely to show the full range of calmodulin binding partners. To visualize interactions, we will use Cytoscape (http://cytoscape.org), an open-source software project for integrating biomolecular interaction networks that allows visualization with different layouts. All interaction data will also be downloadable from dictyBase.
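The caption distinguishes direct, experimentally supported partners (black) from secondary or inferred partners (gray). As a rough illustration, a breadth-first search from calA recovers those tiers from a simple edge list. This is a minimal sketch under stated assumptions. The partner names are hypothetical placeholders, not curated dictyBase interactions. Exporting to Simple Interaction Format (SIF), one of the plain-text formats Cytoscape can import, is likewise our assumption and not a description of the planned dictyBase download.

```python
from collections import deque

# Hypothetical edge list (protein_a, protein_b, evidence_kind). Partner names are
# placeholders, not curated dictyBase data. "experimental" stands for the solid
# black lines in the figure, "inferred" for the gray dotted lines.
INTERACTIONS = [
    ("calA", "partnerA", "experimental"),
    ("calA", "partnerB", "experimental"),
    ("calA", "partnerC", "inferred"),
    ("partnerA", "partnerD", "experimental"),
    ("partnerB", "partnerE", "inferred"),
]


def build_graph(interactions):
    """Build an undirected adjacency map: protein -> {neighbor: evidence_kind}."""
    graph = {}
    for a, b, kind in interactions:
        graph.setdefault(a, {})[b] = kind
        graph.setdefault(b, {})[a] = kind
    return graph


def hops_from(graph, center, max_hops=2):
    """Breadth-first search: map each protein within max_hops to its distance."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond secondary partners
        for neighbor in graph.get(node, {}):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def label_color(graph, center, protein, hops):
    """Black for experimentally supported direct partners, gray otherwise."""
    if hops == 1 and graph[center][protein] == "experimental":
        return "black"
    return "gray"


def to_sif(interactions):
    """Serialize edges as tab-separated SIF lines: source, interaction type, target."""
    return "\n".join(f"{a}\t{kind}\t{b}" for a, b, kind in interactions)


graph = build_graph(INTERACTIONS)
dist = hops_from(graph, "calA")
colors = {p: label_color(graph, "calA", p, d) for p, d in dist.items() if d > 0}
print(sorted(p for p, c in colors.items() if c == "black"))  # ['partnerA', 'partnerB']
print(to_sif(INTERACTIONS).splitlines()[0])
```

Capping the search at two hops mirrors the figure, which shows calmodulin's partners and some of their partners but not the wider network.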

Such graphical illustrations make it possible to glean protein networks and to infer biological functions. Calmodulin is already well characterized. Even so, the visualized interactions illustrate the full complexity of its network. These include interactions with the cytoskeleton, direct binding of several protein kinases and nuclear proteins, binding of calcium-regulated proteins, and interactions with diverse other proteins. From this illustration alone, we could infer the central role calmodulin plays in basic biological processes such as cell growth and chemotaxis.

Spatial Expression

[Figure: "Spatial Expression for png". Each anatomy term is listed with the reference Gosain et al., 2014 and a Dicty Life link:
prestalk region of the standing slug
prestalk region of the migratory slug
prestalk AB core region of the early culminant
basal disc of the late culminant
stalk cells]

Spatial Expression Annotations. The Dictyostelium discoideum anatomy ontology has been available for many years (Gaudet et al., BMC Genomics 9:130, 2008). We now intend to use that ontology to annotate spatial expression during development from published papers. Anatomy terms with their references will be listed in a new section on the gene page. Each term will also link to its term page, which lists all genes annotated to that term and allows all annotations to be easily downloaded. The Dicty Life links will be implemented in the future; below are some preliminary ideas of how we might display this data.

[Figure: (A) the Dictyostelium life cycle with the annotated stage highlighted. (B) A detailed early culminant with labeled cell types: pstA region, pstO region, prestalk AB core region, upper cup region, anterior-like cells, stalk tube, prespore region, and basal region.]

Graphical View of Spatial Expression. We can envision several ways to graphically show expression in conjunction with the life cycle; one possibility is sketched here. (A) Clicking the Dicty Life link next to an annotation on the gene page will display the life cycle, highlighting the stage named in the anatomy term. (B) Clicking the highlighted stage will display a graphic showing that life stage in detail. For example, the graphic above would be displayed when clicking the link next to the annotation prestalk AB core region of the early culminant. All cell types at that stage are marked in different colors for easy identification. We could also envision showing RNA expression superimposed on the life cycle, in addition to the manual annotations. The detailed image of the early culminant was adapted from Fig. 4 of Gaudet et al. 2008.

Disease Annotations

Gene Information for sodA
Gene Name: sodA
Name Description: SuperOxide Dismutase A
Gene ID: DDB_G0267420
Gene Product: superoxide dismutase
Protein Name: SOD
Description: superoxide dismutase of the SOD1 family; expressed at constant levels throughout the life cycle and upregulated upon oxidative stress; enriched in prespore cells
Disease: amyotrophic lateral sclerosis type 1; dictyBase 2015; H.s. ortholog: P00441

Disease Ortholog Annotations. Dictyostelium has a large number of direct human orthologs that have been linked to diseases. dictyBase curators have recently started privately annotating those Dictyostelium orthologs with the respective Disease Ontology (DO) term via the DO ID (e.g. DOID:0060193 in the above example). The reference is often a PubMed ID, when researchers describe the protein relationship and mention the disease in a publication. Curators also inspect dictyBase ortholog mappings and, after further analysis, make the annotation, as shown in the example above. The human orthologs link to UniProt, where a summary of the disease and further informative links are typically available. The Disease Ontology will be housed in the database and will thus be subject to dictyBase searches. In addition, all disease orthologs will be downloadable.
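The caption says the Disease Ontology will be searchable and all disease orthologs downloadable. The Python sketch below shows one way those two features could fit together. It assumes one flat record per annotation. The class name, field names, and tab-separated layout are hypothetical illustrations, not the dictyBase schema or download format. Only the sodA values come from the example above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class DiseaseOrthologAnnotation:
    """One gene-to-DO-term link (hypothetical schema for illustration)."""
    gene_name: str
    gene_id: str
    do_id: str
    disease: str
    human_ortholog: str  # UniProt accession of the human ortholog
    reference: str


# Values taken from the sodA example; the record layout itself is assumed.
ANNOTATIONS = [
    DiseaseOrthologAnnotation(
        gene_name="sodA",
        gene_id="DDB_G0267420",
        do_id="DOID:0060193",
        disease="amyotrophic lateral sclerosis type 1",
        human_ortholog="P00441",
        reference="dictyBase 2015",
    ),
]


def search_by_do_id(annotations, do_id):
    """Return all annotations made to a given Disease Ontology ID."""
    return [a for a in annotations if a.do_id == do_id]


def to_tsv(annotations):
    """Render annotations as a tab-separated download with a header row."""
    buffer = io.StringIO()
    names = [f.name for f in fields(DiseaseOrthologAnnotation)]
    writer = csv.DictWriter(buffer, fieldnames=names, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for annotation in annotations:
        writer.writerow(asdict(annotation))
    return buffer.getvalue()


hits = search_by_do_id(ANNOTATIONS, "DOID:0060193")
print(to_tsv(hits))
```

Keying the search on the DO ID rather than the disease name keeps lookups stable if a term's label is revised in the ontology.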