<<

Roncaglia et al. Cilia (2017) 6:10 DOI 10.1186/s13630-017-0054-8 Cilia

RESEARCH Open Access The Ontology of eukaryotic cilia and fagella Paola Roncaglia1,2*† , Teunis J. P. van Dam3,4*†, Karen R. Christie2,5, Lora Nacheva6, Grischa Toedt7, Martijn A. Huynen3, Rachael P. Huntley1,8, Toby J. Gibson7 and Jane Lomax1,2,9

Abstract Background: Recent research into ciliary structure and function provides important insights into inherited termed and other cilia-related disorders. This wealth of knowledge needs to be translated into a computa- tional representation to be fully exploitable by the research community. To this end, members of the (GO) and SYSCILIA Consortia have worked together to improve representation of ciliary substructures and processes in GO. Methods: Members of the SYSCILIA and Gene Ontology Consortia suggested additions and changes to GO, to refect new knowledge in the feld. The project initially aimed to improve coverage of ciliary parts, and was then broadened to cilia-related biological processes. Discussions were documented in a public tracker. We engaged the broader cilia community via direct consultation and by referring to the literature. Ontology updates were implemented via ontol- ogy editing tools. Results: So far, we have created or modifed 127 GO terms representing parts and processes related to eukaryotic cilia/fagella or prokaryotic fagella. A growing number of biological pathways are known to involve cilia, and we continue to incorporate this knowledge in GO. The resulting expansion in GO allows more precise representation of experimentally derived knowledge, and SYSCILIA and GO biocurators have created 199 annotations to 50 ciliary . The revised ontology was also used to curate mouse proteins in a collaborative project. The revised GO and annotations, used in comparative ‘before and after’ analyses of representative ciliary datasets, improve enrich- ment results signifcantly. Conclusions: Our work has resulted in a broader and deeper coverage of ciliary composition and function. These improvements in ontology and annotation will beneft all users of GO enrichment analysis tools, as well as the ciliary research community, in areas ranging from microscopy image annotation to interpretation of high-throughput studies. We welcome feedback to further enhance the representation of cilia biology in GO. Keywords: , , Gene Ontology, , , , Microscopy image annotation, Systems biology

Background letter to the Royal Society, he reported the existence of Te lens-making skills of pro- , also describing their beating cilia and fagella vided him with the highest magnifcation microscopes [1]. Tat these two are homologous to each that had yet to be made. With these tools, in a 1676 other became clear when Irene Manton used electron microscopy to reveal the typical 9 + 2 arrangement of the doublets in motile [2]. How- *Correspondence: [email protected]; [email protected] †Paola Roncaglia and Teunis J. P. van Dam contributed equally to this work ever, the full biomedical signifcance of these organelles 2 The Gene Ontology Consortiumhttp://geneontology.org has only begun to be established with the realization 4 Theoretical Biology and Bioinformatics, Department of Biology, Faculty that non-motile primary cilia of vertebrates are the site of Science, Utrecht University, Utrecht, The Netherlands Full list of author information is available at the end of the article of many critical signalling pathways, notably for sonic

© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Roncaglia et al. Cilia (2017) 6:10 Page 2 of 13

hedgehog which plays key roles in embryonic develop- and Joubert syndromes [12] and the BBSome complex ment [3], as well as being sensory devices for many of for Bardet-Biedl syndrome [13]. GO annotations form a our basic [4]. Tereafter, cilia research rapidly knowledgebase, refecting the collected information from entered the era of ciliopathies-inherited diseases involv- a vast body of literature. Te knowledge capture of cili- ing defects in cilia, gaining intense interest from human ary protein functions and subcellular localizations will be geneticists in addition to the broader biological research even more relevant as new disorders are classifed as cili- areas in which these organelles play key roles [5, 6] (see opathies [14]. As such, GO is indispensable when study- Additional fle 1). ing the cilium from a systems biology perspective. However, primary cilia were often dismissed as the Until a decade ago, the cilium was a little-appreciated “’s appendix”, rarely discussed in textbooks or research in the vertebrate cell, and the paucity of infor- papers, and even more rarely depicted in diagrams of the mation in the literature was mirrored by a limited num- numerous types of diferentiated cell types that possess ber of corresponding concepts and annotations in the them; many aspects of cilia biology remain poorly under- Gene Ontology. Due to the importance of GO in provid- stood. In addition, much of the older knowledge is not ing cellular functional and contextual information for available electronically and therefore not accessible to be large-scale genomic and proteomic analyses, ciliary fac- applied in modern discovery programmes, which tors were efectively excluded from many contemporary typically use whole genome approaches to link candidate systematic surveys of the cell. Ten, more recently, an mutations to gene functional annotation. increasing focus on ciliary research highlighted the need One of the indispensable resources for function anno- to improve representation and capture of cilia-related tation used in genome research is the Gene Ontology knowledge in GO. Some of this knowledge has been (GO). Te GO is a computational representation of bio- included in the SysCilia standard (SCGS) database that logical knowledge that defnes concepts used to describe captures known human cilium in a relatively sim- aspects of gene function, and the relationships between ple list with genes and their location in the cilium [15]. In these concepts. It consists of three main branches: this article, we report on the steps we have taken towards Molecular Function (e.g. ‘ciliary neurotrophic factor a major revision of ciliary component and process terms receptor activity’), Biological Process (e.g. ‘ciliary transi- in GO, and on the curation of human ciliary proteins that tion zone assembly’) and Cellular Component (e.g. ‘cili- was made possible by such revision. ary transition zone’). Biocurators can then associate GO terms with specifc gene products (proteins and ) Methods to capture experimental fndings from the scientifc lit- Ontology development erature [7, 8]; these associations are known as GO anno- Members of the SYSCILIA Consortium [16] contacted tations. GO annotations are used widely by researchers the Gene Ontology Consortium (GOC) editorial team as a way to generate hypotheses from data, in particular to discuss the need for a more complete and up-to-date via enrichment analysis. For example, the PANTHER formal representation of ciliary composition and biology. online resource [9] hosts a tool to perform GO enrich- A team at Mouse Genome Informatics had also begun a ment analysis on user-defned gene sets, to help identify project focused on annotation of mouse ciliary proteins the biological processes or cellular components enriched and encountered the need for additional GO develop- in the set. Using this type of approach, the role of the ment in this area (Christie and Blake [17]). A working DNA-binding protein RFX2 in spermatogenesis has been group was formed involving GO editors, GO biocurators assessed and confrmed [10], while specifc ciliary func- and members of SYSCILIA. Engagement of the larger tions were shown to be present in the ampulla and isth- cilia research community was ensured in multiple ways, mus of the bovine [11]. A well-structured GO including communicating with SYSCILIA and other representation of the cilium and cilium-mediated pro- researchers and referring to a broad corpus of literature. cesses greatly afects the ability to capture information Opinions outside the working group were especially from the literature, and hence the quality of the outcomes sought in debatable cases. of data analyses. Furthermore, the more fne-grained the SYSCILIA provided an initial list of suggestions for representation, the more informative, insightful and use- new terms to be added in GO, as well as changes to exist- ful a GO enrichment analysis can be. Tis is especially ing terms. Initially, the scope of the work was restricted true for the cilium, where a gene product’s compartmen- to ciliary subcellular components, but as curation of rel- talization and biological process can be quite restricted evant literature progressed, the efort was soon broad- and highly specifc. For instance, many proteins involved ened to cover cilia-related biological processes as well. in ciliopathies are located in particular ciliary substruc- To record discussions on ontology development, and tures, such as the transition zone for Meckel-Gruber to allow members of GO and SYSCILIA outside the Roncaglia et al. Cilia (2017) 6:10 Page 3 of 13

working group to contribute, we used a public tracker dataset [26] was obtained from CilDB [27] and the result- on the GitHub GO repository, specifcally devoted to ing list of Ensembl protein identifers were converted to ontology requests [18]. Te outcome of such discussions gene symbols in Ensembl biomart (version 86) [28]. Term was the incorporation of new classes (terms) in GO, or enrichment analysis was performed using the Ontolo- the modifcation of existing classes. Te modifcations gizer 2.1 [29] using the Parent–Child–Union method and ranged from simple changes, such as the addition of a applying the Bonferroni multiple testing correction. A synonym, to more complex ones, such as creating links custom R script was used to generate graphs to compare with other ontology classes. GO editors then imple- two term enrichment analyses for the same dataset with mented these additions and changes manually via the diferent combinations of GO and UniProt-GOA versions ontology editing tools Protégé [19] or OBO-Edit [20]. to investigate the efects of improvements in ontology and Also, some pattern-based classes (mostly to represent annotations separately and combined. Te fnal graphs regulation of ciliary processes and localization to ciliary were processed in Adobe Illustrator for enhanced clarity. components) were added using an automated generator All scripts, required fles, and instructions to obtain third of GO terms called TermGenie [21]. party software are available on GitHub (https://github. com/JohnvanDam/GeneOntologySupplement). Annotation procedure Human ciliary proteins were manually associated with Results GO terms according to recommended GO annotation Improvements to cilia/fagellar Gene Ontology terms procedures [22]. Annotation is performed by GO biocu- As part of the SYSCILIA research Consortium [16], we rators, who read relevant scientifc articles and associate examined the status of the cilia representation in GO gene products with GO classes based on experimental at the end of 2012. Several discrepancies with current evidence. Te resulting annotations consist of a protein knowledge were highlighted, the main ones being as fol- identifer, a GO term, an evidence code (based on the lows: (a) eukaryotic fagella were represented by the same type of knowledge available, see [23]), and a reference concepts as prokaryotic fagella; (b) eukaryotic fagella to the scientifc literature (mostly via a PubMed identi- were treated as separate from eukaryotic cilia; (c) two fer). Where appropriate, the expressivity of annotations distinct terms existed for ‘cilium axoneme’ and ‘axoneme’, was increased by capturing information related to cell with the latter not being connected to the higher-order types such as ‘respiratory epithelial cell’ (by referring to cilium structure; (d) the detailed substructure of the the Cell Ontology term CL:0002368), or anatomical loca- organelle, as well as basic cilia-related processes, were tions such as ‘’ (using the Uberon term largely undocumented in GO, therefore limiting useful- UBERON:0003126), as detailed in [24]. Te Protein2GO ness of the resource in many areas of basic research, but tool provided by EMBL-EBI was used to associate gene especially in the feld of ciliopathies. products with GO classes [25]. As part of this ciliary Te issues above were addressed in collaboration curation efort, human proteins from the SYSCILIA Gold with the Gene Ontology (GO) Consortium. As a result, Standard set [15] were annotated to both ciliary and non- many improvements were made to the ontology. Te ciliary GO terms, to fully capture the experimental infor- links between terms for eukaryotic fagellum and bacte- mation provided. Where the same literature provided rial fagellum were removed, a term for archaeal fagel- knowledge about ciliary genes from other species (e.g. rat lum was added, and we merged the eukaryotic fagellum or mouse), these genes were also annotated. and cilium terms into GO:0005929 ‘cilium’. Overall, 30 GO terms specifcally related to prokaryotic fagella, and Term enrichment analysis covering subcellular components as well as biological Two versions of GO were downloaded from the Gene processes, are currently available in the Gene Ontology. Ontology Consortium archive ftp-server (2012-12-01 Tey are listed in Additional fle 2, and include 10 terms and 2017-01-01) in the OBO format (ftp://ftp.geneon- added or modifed as part of this project. Previous anno- tology.org/go/ontology-archive/). In addition, we down- tations to cilium/fagellum terms were re-assigned where loaded time-matched Gene Ontology annotation data necessary based on taxonomy (i.e. bacterial, archaeal or from UniProt-GOA (http://www.ebi.ac.uk/GOA; see the eukaryotic). Frequently Asked Questions on http://www.geneontol- In Fig. 1, we provide a graphical representation of the ogy.org for this and other options to access older versions cilium, and highlighting some of the ontology terms that of gene association fles). Specifcally, we downloaded were added or modifed as part of this project. We cap- UniProt-GOA version 116 as a time match for the 2012- tured up-to-date knowledge about well-defned struc- 12-01 ontology fle, and UniProt-GOA version 164 (2017- tures by adding terms to represent the Y-shaped linkers 01-16) for the 2017-01-01 ontology fle. Te Ross et al. in the transition zone, the central pair of in Roncaglia et al. Cilia (2017) 6:10 Page 4 of 13

that occur in the literature and that do not refer to spe- a cifc structures, but rather to observed ciliary subcom- Ciliary tip partments, such as the ‘inversin compartment’ [32], ‘ciliary tip’ [33] and ‘ciliary base’ [34]. In Additional fle 3, we provide a full list of GO terms currently available to the scientifc community to describe ciliary subcompart- Ciliary membrane Ciliary shaft ments and main cilia-related biological processes, for a grand total of 180 classes as of January 2017. Of these, Axoneme 65% (117 terms) were created or modifed as part of the Inversin compartment ontology development project described here. While the Plasma membrane curation of human ciliary proteins using GO terms is Y-shaped link Ciliary tranzition zone described below, it is worth noting here that 54% of all Ciliary pocket existing cilia-related GO terms applicable for mammalian Ciliary base Ciliary transition fiber Ciliary basal body annotation have been used to annotate mouse proteins in a parallel complementary efort (Christie and Blake [17]). Daughter * We examined how cilia types were categorized in GO, Ciliary rootlet and revised and expanded that classifcation signifcantly. Previously, GO:0005929 ‘cilium’ had two children, ‘motile cilium’ and ‘primary cilium’, with descendants ‘motile primary cilium’ and ‘nonmotile primary cilium’. Tat categorization was thus trying to capture both b Ciliary membrane Outer arm and sensory aspects of cilia at the same time. However, Inner dynein arm stalk in doing so, it did not allow for a complete and correct Radial spoke head representation of current knowledge. For example, spe- Axonemal central pair projection C2 axonemal microtubule cialized cilia in vertebrate embryos, e.g. nodal cilia of Axonemal Axonemal central bridge central pair the mouse or cilia in the Kupfer’s vesicle of zebrafsh, C1 axonemal microtubule B axonemal microtubule are motile, but have an axoneme confguration of 9 0, A axonemal microtubule + Axonemal link often found in non-motile cilia [35]; conversely, kinocilia Fig. 1 Schematic representation of the cilium and its main parts. display a 9 + 2 axonemal structure, but are considered Components in bold indicate new terms in GO; components in italics non-motile [4]. Also, motile cilia have been shown to indicate pre-existing GO terms that were modifed to improve them. have a variety of sensory functions [36]. a Schematic overview of a cilium. b Cross section of a cilium with We reviewed the literature, and resolved to classify a 9 2 axoneme. *‘Daughter centriole’ is a new synonym of ‘ciliary + cilia based primarily on the presence or absence of motil- basal body’ ity, and secondarily on their axonemal confguration. Te role of cilia in sensory pathways, when present, should instead be captured by annotating to appropriate biologi- the 9 + 2 axoneme, transition fbres and many more (see cal process terms, rather than trying to embed it in a cel- Additional fle 3). To address another major concern, the lular component term. Te classifcation we implemented term ‘cilium axoneme’ was merged into ‘axoneme’, and is consistent with the recent one by Takeda and Narita, ‘axoneme’ was made a part_of ‘cilium’ (via an intermedi- who proposed an eight-category system based on axone- ate link with the grouping term ‘ciliary part’). As a result, mal confguration, motility of the cilium, and number of all axonemal substructures are now correctly placed in cilia per cell [37]. For the Cellular Component branch the ‘cilium’ branch of GO, and annotations to axonemal of GO, only the structural aspects of axonemal confgu- sub-components can now be propagated to ‘cilium’, with ration and motility are relevant, so we simplifed to a a positive impact on data analysis (e.g. enrichment stud- four-category system. A similar four-category classifca- ies). Figure 2 shows the Gene Ontology representation of tion was also proposed by Ibañez-Tallon et al. [38] and GO:0005930 ‘axoneme’. supported by Fisch and Dupuis-Williams [39]. We also Similarly, we updated the representation of the well- consulted directly with some experts in the cilia commu- studied mammalian fagellum by placing it under nity, and presented our proposal at the international Cilia a new, descriptive term ‘9 + 2 motile cilium’ (see below) 2016 conference held in Amsterdam, Te Netherlands and by adding missing connections to some of its sub- [40]. Fig 4 shows the current ontology structure. Note structural components; the improved hierarchy is shown that the GO classifcation does not aim to include indi- in Fig. 3. We also implemented several ontology terms vidual terms for the totality of axonemal confgurations Roncaglia et al. Cilia (2017) 6:10 Page 5 of 13

a [Term] id: GO:0005930 name: axoneme namespace: cellular_component alt_id: GO:0035085 alt_id: GO:0035086 def: "The bundle of microtubules and associated proteins that forms the core of cilia (also called flagella) in eukaryotic cells and is responsible for their movements." [GOC:bf, GOC:cilia, ISBN:0198547684] comment: Note that cilia and eukaryotic flagella are deemed to be equivalent. In diplomonad species, such as Giardia, the axoneme may extend intracellularly up to 5um away from the plane of the plasma membrane. synonym: "ciliary axoneme" EXACT [] synonym: "cilium axoneme" EXACT [] synonym: "flagellar axoneme" EXACT [] synonym: "flagellum axoneme" EXACT [] xref: Wikipedia:Axoneme is_a: GO:0044430 ! cytoskeletal part is_a: GO:0044441 ! ciliary part relationship: has_part GO:0005874 ! microtubule relationship: part_of GO:0005856 ! relationship: part_of GO:0097014 ! ciliary plasm

c b

Fig. 2 Details of the Gene Ontology term ‘axoneme’. a Full ontology stanza in OBO format. Documentation on relationship types and ontology format is available via [30]. b Placement of ‘axoneme’ within the Gene Ontology. The term itself and its link to ‘ciliary part’ are highlighted in light blue. Dark blue arrows and “I” indicate is_a relationships; orange arrows and “p” indicate part_of relationships. The grey arrow and rectangle connect- ing ‘axoneme’ and ‘microtubule’ indicate has_part relationship. c Overview of main axonemal substructures in GO. These are is_a children terms of ‘axoneme part’. Terms with a ‘ ’ sign have children themselves. Terms in bold in b, c have computable defnitions [31]. b, c were obtained with the + Graph Editor function of the OBO-Edit ontology editing tool [20]

observed in nature (such as 9 + 4 axonemes in Hensen’s structuring GO processes in agreement with Reactome node in rabbit embryos [41], or some unusual structures (and vice versa) increases interoperability and optimizes observed in insects [42]), but still allows the capture of the engagement of feld researchers, while still maintain- less common instances as specifcally as possible, as well ing specifc scopes for each resource (in GO, representa- as ones where fne structure or motility are not known. tion of pathways focuses on processes encoded by gene Due to the increasing number of cellular pathways products, whereas in Reactome it is centred on transfor- in which cilia are known to be involved, the Biologi- mations of chemical entities). Te Reactome entry for cal Process branch of GO was also in need of improve- ‘Assembly of the primary cilium’ was revised recently, ment. We focussed mainly on two distinct areas: cilium and captures up-to-date knowledge [45]. We worked with organization and multiciliation. Within the frst area, Reactome editors to improve integration with GO in this we revised the ontology under the branch area; for example, Reactome renamed their entry to ‘Cil- (GO:1903887 ‘cilium assembly’) by aligning it with the ium Assembly’ to refect applicability to cilium subtypes manually curated Reactome Pathway Database. Reac- in agreement with GO classifcation. New GO terms tome entries are authored by expert biologists in collabo- were created as necessary, and links among GO terms ration with Reactome editorial staf, and cross-reference were added, resulting in a richer representation of the to many bioinformatics databases [43, 44]. Terefore, biological events that lead to the formation of a cilium. Roncaglia et al. Cilia (2017) 6:10 Page 6 of 13

GO terms that had corresponding Reactome entries were a cross-referenced with appropriate Reactome identif- b ers, and vice versa. (Due to the diferent natures of these resources, not all terms can be linked efectively.) We also extended the cilium assembly ontology representation by including the formation of the intermediate ciliary vesicle as observed in vertebrates [46] (Fig. 5). GO terms avail- able to describe details of the cilium assembly process are included in Additional fle 3. Te revision of the overall ‘cilium organization’ process branch of GO (GO:0044782) impacted an existing term, ‘cilium morphogenesis’. We found that, in view of the new, more detailed representation of ciliary processes in GO, the meaning of ‘cilium morphogenesis’ now referred Fig. 3 Details of the Gene Ontology term ‘sperm fagellum’. a Place- to a mixture of ‘cilium assembly’ and its parent term ‘cil- ment of ‘sperm fagellum’ within the Gene Ontology. The term itself ium organization’. We removed the now-redundant class and its link to the parent ‘9 2 motile cilium’ are highlighted in light + blue. Dark blue arrows and “I” indicate is_a relationships. b Overview ‘cilium morphogenesis’, and worked with GO biocurators of main sperm fagellum substructures in GO (part_of children terms). to rehouse its previous annotations (to several diferent Obtained with the Graph Editor function of the OBO-Edit ontology species) under the most appropriate terms. editing tool [20]. Documentation on relationship types is available Among cilia-related processes, we also focused on via [30] those that lead to formation of multiciliated cells.

Fig. 4 Details of the Gene Ontology term ‘cilium’ and its is_a descendants. The term ‘cilium’ itself is highlighted in light blue. Dark blue arrows and “I” indicate is_a relationships. Obtained with the Graph Editor function of the OBO-Edit ontology editing tool [20]

Ciliary vesicle formation Ciliary basal body docking Ciliary transition zone assembly Cilium elongation

Axoneme assembly Fig. 5 Cilium assembly. In vertebrates, the ciliary vesicle forms at the tips of the ciliary transition fbres attached to the basal body. The ciliary vesicle then fuses with the plasma membrane forming the ciliary pocket and ciliary membrane. The axoneme extends from the basal body and the transi- tion zone is assembled with its distinctive Y-shaped links and ciliary necklace. Further axonemal assembly causes the cilium to elongate Roncaglia et al. Cilia (2017) 6:10 Page 7 of 13

Following discussions with members of the cilia research genes from the SCGS involved in ciliary movement, pri- community, it became clear that the distinction between marily and genes involved in axoneme assem- uniciliated and multiciliated cells was biologically impor- bly [15] (DNAH1, DNAH11, DNAH5, DNAH9, DNAI1, tant. However, this feature could not be incorporated as DNAI2, CCDC114, CCDC39, CCDC40, DISC1, NME8, such in the Cellular Component branch of GO, as the and PCM1; UniProt identifers Q9P2D7, Q96DT5, cilia in multiciliated cells are generally not structurally Q8TE73, Q9NYC9, Q9UI46, Q9GZS0, Q96M63, distinct from those in singly ciliated cells. Rather, ‘mul- Q9UFE4, Q4G0X9, Q9NRI5, Q8N427 and Q15154, ticiliation’ is a complex and multifaceted cell diferentia- respectively). Our literature searches identifed 27 rel- tion process that occurs in specifc tissues or organisms, evant papers for these genes, as well as two additional and that was previously only minimally represented in papers focused on two genes (ARMC4 and DNAH7, with GO. We improved its description in several ways, for UniProt IDs Q5T2S8 and Q8WXX0) that are also asso- example by adding to the branch of ‘de novo centriole ciated with primary ciliary dyskinesia. From these 29 assembly’ (see Additional fle 3). It is also important to papers (Additional fle 4; also see below), we made 157 note that, when capturing the role of multiciliation pro- annotations, 89 of which were to ciliary GO terms for teins via GO annotation, curators can increase expressiv- 40 human genes (Additional fles 5, 6; also see below). A ity of their annotations, wherever possible, to indicate the few of these papers also included experimental charac- specifc cell type(s) in which the protein functions. Tis terization of mouse genes; annotations made for mouse is accomplished by referring to the Cell Ontology [47], genes are included in the annotation project described by which provides a broad coverage of ciliated cell classes, Christie and Blake [17]. and using a compositional approach described by Hunt- In the process of making phylogenetic annotations, ley et al. [24]. as described below, we identifed proteins in Chla- Another area that received attention was ‘cilium- mydomonas reinhardtii that had been experimentally dependent cell motility’ (GO:0060285). Terms related to studied and could be used to infer functions for unchar- bacterial, archaeal, and eukaryotic fagellar/ciliary-based acterized homologs in and other . Most cell motility were made distinct from each other. We of these proteins are inner arm or outer arm axonemal carried out a revision to better describe the mechanism dyneins or the cytoplasmic type dyneins involved in of mobility, including cases that do not involve fagel- intrafagellar transport (IFT). Tus, we annotated 13 lated cells, such as ‘amoeboid ’ (observed papers (Additional fle 7) with experimental characteriza- in e.g. the sperm of C. elegans [48, 49]). Overall, 5 new tions of ciliary dyneins from Chlamydomonas reinhardtii. terms were added to account for instances of non-cili- Tis produced 74 annotations (55 to ciliary terms) to 16 ated sperm motility (generic ‘sperm motility’, ‘amoeboid dynein genes, as well as 3 other genes (Additional fles 8, sperm motility’, and regulation terms for the latter); these 9). We also annotated four additional papers (Additional are not included in the list of cilia-related terms available fle 4) targeting the human genes DYNC2H1 and WDR60 in Additional fle 3. (UniProt IDs Q14204 and Q8WVS4). Tis follow-up Overall, as part of the work described in this paper, work making literature-based annotations generated 42 we added 76 new ontology terms related to cilia or fa- more annotations to 10 additional human genes, bringing gella, and modifed 51 existing ones. Additional fle 3 our total to 199 GO annotations (Additional fle 6) for 50 provides the complete list of cilia- and fagella-related human genes (Additional fle 5). Cellular Component and Biological Process terms that Concurrent to our eforts, Christie and Blake have fully are now available for data analysis and to capture cili- curated 134 mouse ciliary genes, all of which correspond ary and fagellar biology. Full details of ontology terms to human genes on the SCGS list, as of December 2016 (including synonyms and relationships with other (Christie and Blake [17]). Amongst the genes targeted for terms) are publicly accessible via the GO browsers annotation in that project were the majority of the dynein AmiGO and QuickGO [50, 51]. Te ontology may be genes on the list of mouse homologs of SCGS human downloaded freely from http://geneontology.org/page/ genes, focusing on those not previously well annotated. download-ontology. While many of the GO annotations for these genes were to processes that are afected when cilia are disrupted, Concurrent gene annotation eforts such as ‘determination of left/right symmetry’ or ‘cilium In order for the improved ontology to have an impact, movement’, some were to terms useful for the phyloge- genes and gene products need to be annotated using netic annotation of dynein proteins. these new terms. Using the ontology for annotation also Tis solid base of experimental annotations for human helps to clarify what terms are needed in the ontology. and Chlamydomonas dynein genes, as well as a few For our annotation efort, we started with a set of twelve from mouse, allowed us to make detailed phylogenetic Roncaglia et al. Cilia (2017) 6:10 Page 8 of 13

annotations using the Phylogenetic Annotation and a better representation of up-to-date knowledge of the INference Tool [52] of the sequences within the seven cilium. To illustrate the respective contribution of the PANTHER protein families [9] containing ciliary dynein progress in GO annotation and ontology development, genes (Additional fle 10). A couple of the smaller dynein we performed GO term enrichment analysis using the families had previously been annotated, but our addi- current ontology but the old 2012 gene annotations, and tional annotations allowed propagation of GO terms pro- then using the current annotations but the old ontology viding specifcity with respect to which type of dynein version from 2012 (see Additional fle 11). Tese analy- complex(es) are relevant. However, the majority of the ses clearly show the signifcant impact of the progress in dynein sequences, including those in the large families both gene annotation and ontology development, on the for dynein heavy chains (PTHR10676), dynein inter- ranking as well as the p values of relevant ciliary terms. mediate chains (PTHR12442), or dynein light chains In our second analysis, Ross et al. describe a gene (PTHR11886), had not previously been phylogenetically expression study of human airway epithelial cells cul- annotated. Tus, our annotations provided the basis for tured at an air–liquid interface [26]. Te culture condi- comprehensive phylogenetic annotation of the ciliary tions cause diferentiation into multiciliated cells; thus, dynein genes. Up-to-date GO annotations may be freely the gene expression dataset is expected to refect the downloaded from the GO website [53] or using QuickGO molecular processes involved in cilium assembly, the [51]. process of forming cilia. In the 2012 state of GO ontol- ogy and annotations, ciliary-related terms are already sig- Efects of Gene Ontology and protein annotation nifcantly overrepresented (Fig. 6b). However, using the improvements on term enrichment analyses current version of GO, we fnd more relevant GO terms In order to assess the efects of our improvements on the descriptive of the processes that the experiments were practical utility of the GO resource for ciliary research- designed to examine, such as ‘cilium organization’ and ers, we performed GO term enrichment analysis on two ‘cilium assembly’ (Fig. 6b). Overall, the overrepresented published datasets using versions of GO ontology and ciliary terms have not only become higher in ranking annotations from December 2012, when we started the with smaller p values, but also more specifc. project, and January 2017, and comparing the results. We used the software package Ontologizer [29] to perform Discussion GO term enrichment analyses using the correspond- Te importance of cilia in a vast array of cell types across ing sets of Gene Ontology annotations from UniProt , and their role in an ever-growing number of [25]. Two datasets were considered: the SYSCILIA Gold human diseases and disorders, prompted us to address Standard of cilia genes [15], and a gene expression data- the gap between current knowledge on ciliary structures set of reassembling motile cilia in epithelial cells by and processes and the Gene Ontology (GO), the most Ross et al. [26]. widely used tool to represent this knowledge computa- Te SCGS is a standardized list of verifed ciliary genes tionally and make it available to the biomedical research for use in systems biology approaches [15]. Te improve- community. Our efort increased the number of ontology ments in ontology are refected in two ways in a GO classes available to describe cilia, fagella and the events term enrichment analysis for this dataset (Fig. 6a). Terms they participate in, and allowed a signifcant improve- directly related to the cilium appear consistently higher ment in the curation coverage of mammalian ciliary in the ranking. Using the current state of GO ontology factors. and annotations, ‘cilium’ is now the top ranking term. Our project enables a more consistent representation Equally important is the observed lower p value (6.1e−72 of knowledge by providing the community with an ontol- in December 2012 vs. 1.5e−214 in January 2017). A sig- ogy structure that includes a standardized set of con- nifcant contribution to the improvement of observed cepts that are carefully defned and related to each other. p values is brought by the concurrent mouse annota- In fact, while term usage in the scientifc literature can tion efort by Christie and Blake [17], in which the list of sometimes be ambiguous, GO requires its classes to be genes targeted for annotation was based on the SCGS. unambiguously defned. An example is the frequent use Mouse annotations were subsequently transferred to of “axonemal localization” in articles, meaning “localiza- their human 1-to-1 orthologs and assigned an evidence tion along the length of the cilium”. However, “axonemal code ‘Inferred from Sequence Orthology’ (ISO), accord- localization” could also be interpreted such that a protein ing to an established pipeline described in [54]. Te specifcally is “part of” the ciliary axonemal microtubule ontology development and annotation work described structures. Te former interpretation of the term may be in this paper, and the mouse annotation project carried clear to scientists comfortable with cilia research, but not out by Christie and Blake, act synergistically towards to those new to or outside of the feld. Te formalization Roncaglia et al. Cilia (2017) 6:10 Page 9 of 13

2012−12−01 2017−01-01 a SYSCILIA gold standard cell projection 2.0e−92 1.5e−214 cilium cell projection part 4.4e−76 3.3e−157 ciliary part cilium 6.1e−72 3.6e−146 cell projection microtubule organizing center 3.4e−61 1.0e−136 cell projection part cilium part 4.7e−60 9.2e−114 cilium organization axoneme 1.0e−59 6.4e−111 cell projection organization cytoskeletal part 3.0e−59 1.1e−97 cytoskeletal part cilium morphogenesis 4.9e−55 1.7e−93 organelle assembly microtubule−based process 1.0e−49 1.5e−91 microtubule−based process cellular component assembly involved in morphogenesis 4.2e−48 1.3e−90 cell projection assembly non−membrane−bounded organelle 2.5e−43 5.8e−87 single−organism organelle organization intracellular non−membrane−bounded organelle 3.9e−43 2.3e−63 cilium assembly cell projection organization 6.1e−32 6.4e−62 intracellular non−membrane−bounded organelle microtubule basal body 7.2e−29 3.8e−52 cytoskeleton cytoskeleton 8.7e−29 3.8e−52 cellular component organization or biogenesis cilium assembly 1.6e−27 5.8e−52 non−membrane−bounded organelle cell projection assembly 4.1e−27 1.4e−49 cellular component organization cellular component organization or biogenesis 6.7e−27 7.2e−48 signaling pathway multi−organism process 7.4e−26 7.0e−42 microtubule organizing center part organelle part 1.0e−24 1.2e−41 organelle part cellular developmental process 2.3e−24 3.9e−40 membrane docking smoothened signaling pathway 3.0e−24 2.4e−38 microtubule organizing center part 1.3e−22 1.2e−37 microtubule−based protein transport centrosome 2.9e−22 1.1e−35 protein transport along microtubule developmental process 3.3e−20 1.3e−35 microtubule−based movement microtubule cytoskeleton 4.2e−20 3.5e−33 single−organism cellular process axoneme part 1.4e−18 2.3e−30 cellular component assembly cellular component morphogenesis 2.6e−17 8.0e−29 ciliary basal body flagellum 5.0e−15 9.6e−29 intraciliary transport particle macromolecular complex 1.2e−14 5.5e−25 organelle

Ross, et al. 2007 b axoneme 3.0e−07 2.8e−26 cilium organization microtubule 1.1e−05 3.0e−24 cilium cilium 1.7e−05 1.1e−17 cell projection assembly microtubule cytoskeleton 1.4e−04 1.5e−16 cilium assembly flagellum 2.5e−04 4.3e−14 ciliary part cilium morphogenesis 1.3e−03 5.2e−10 microtubule−based process microtubule−based flagellum 3.0e−03 9.6e−10 intraciliary transport particle microtubule−based movement 7.1e−03 1.5e−09 protein transport along microtubule cell projection 1.3e−02 2.4e−09 intraciliary transport particle B ER to Golgi transport vesicle 1.7e−02 5.7e−09 organelle assembly ER to Golgi transport vesicle membrane 1.7e−02 5.8e−09 microtubule−based protein transport axoneme part 1.9e−02 9.1e−08 axonemal dynein complex assembly iron ion binding 2.3e−02 7.1e−07 cytoskeletal part MHC protein complex 3.1e−02 1.2e−06 cell projection organization tetrapyrrole binding 3.5e−02 1.5e−06 cell projection cell projection part 4.4e−02 1.9e−06 axoneme cilium part 4.6e−02 2.9e−06 microtubule cytoskeleton MHC class II receptor activity 6.9e−02 5.8e−06 cytoskeleton protein folding 7.3e−02 9.7e−06 microtubule−based movement microtubule basal body 1.2e−01 6.7e−05 cell projection part oxygen binding 1.5e−01 1.2e−04 protein complex localization cellular component assembly involved in morphogenesis 2.2e−01 1.8e−04 MHC protein complex integral to lumenal side of membrane 2.9e−01 2.5e−04 cytoskeleton−dependent intracellular transport hair follicle development 3.3e−01 1.0e−03 single−organism cellular process phenol−containing compound metabolic process 3.3e−01 1.2e−03 microtubule bundle formation integral to organelle membrane 3.5e−01 1.9e−03 centrosome cobalt ion transport 3.5e−01 3.8e−03 lumenal side of endoplasmic reticulum membrane xenobiotic metabolic process 3.6e−01 2.0e−02 cellular component assembly 4.0e−01 3.4e−02 ciliary basal body intrinsic to organelle membrane 4.2e−01 4.6e−02 peptide antigen binding Fig. 6 Comparison of GO term enrichment analyses of ciliary datasets using versions of GO from 2012 and 2017. Green squares: GO terms that rank higher using the current version of GO; red squares: terms that rank lower; grey squares: terms that have dropped out of the top 30 ranked results; white squares: terms that are among the top 30 when using the current version of GO, but not the 2012 one. p values were corrected using the Bonferroni multiple testing correction. Terms in grey are not signifcantly enriched. a Term enrichment analyses of the SYSCILIA gold standard. Cilia-specifc terms rank higher. The improvement of the Gene Ontology and advancement in gene annotations have also been assessed respective of each other, see Additional fle 11. b Term enrichment analyses of the Ross et al. dataset. Overrepresented terms gained smaller p values but have also become more descriptive of the experiments, e.g. ‘cilium organization’, ‘cellular component assembly involved in morphogenesis’ and ‘cilium assembly’ Roncaglia et al. Cilia (2017) 6:10 Page 10 of 13

in GO must be accessible to a broad scientifc commu- cilium’ is a specialized type of ‘ciliary transition zone’. nity, and in this case includes several terms to denote Tese improvements greatly increased our ability to cap- specifc regions of the cilium. For instance, we defned ture experimental work characterizing mouse models the sporadically used term ‘ciliary shaft’ to correspond that advances the understanding of a devastating human to the protruding part of the cilium, and thus this term disease. In this manner, annotation of mouse genes fed is often a better representation of what is meant when a back into the development of the ontology, both to clarify protein is observed to “localize to the axoneme”. previously existing terms or to create new terms when Some of the new GO terms we implemented will make needed (Christie and Blake [17]). it easier to represent experimental fndings from the lit- Many of the ontology revisions we made also improve erature when resolution issues prevent assignment to information available for other species, and further well-defned ciliary compartments. For example, GO improvements can be made as the need arises. Notably, now provides the term ‘ciliary base’ that denotes a more for protein families where experimental characterization general location when the experimental (e.g. microscopy) is lacking in human and mouse (such as some dyneins), observations are not precise enough to defne protein we curated experimental information available from localization to more specifc ciliary compartments such a non-mammalian organism (Chlamydomonas rein- as the basal body, transition fbres or transition zone. hardtii). Tese experimental annotations also enabled Importantly, the ontology development we carried the phylogenetic inference of GO annotations via a out also improved connections among existing classes. dedicated and validated pipeline, both to species of bio- Tis has a positive downstream efect on data analysis. medical interest and also to many more species where For example, by connecting ‘axoneme’ to ‘cilium’ via the direct characterization of ciliary proteins is unlikely. We part_of relationship, pre-existing GO annotations to the also worked to refect the fact that cilia have not been former are automatically inferred to the latter, improving observed in some taxonomic groups, e.g. some types of the sensitivity of enrichment analyses. Similarly, merging plants (including Magnoliophyta, Coniferophyta, and terms that represent the same entity (such as ‘cilium axo- Gnetales), slime molds (Dictyostelium), and most fungi neme’ and ‘axoneme’) solved the issue of fragmentation (including Ascomyceta). In such cases, we applied com- of GO annotations over multiple terms. Tis, too, posi- putational rules to prevent usage of some general ciliary tively impacts data analysis. terms (e.g. ‘cilium’, ‘cilium assembly’ and ‘cilium move- Tere is always the potential to add more terms as ment’) for annotation in non-ciliated species. Te pres- new knowledge emerges or where more precise rep- ence of these taxonomic rules helps ensure correctness resentation of existing knowledge is requested by the of annotations [57], as checks can be applied both during community. For example, species-specifc axonemal manual annotation of experimental literature and during arrangements that are not currently present in GO (e.g. phylogenetic annotation pipelines. 9 + 4 axonemes in Hensen’s node in rabbit embryos [41]) Another way that our work improved the information could be incorporated if deemed useful to support data available for other species was in areas of the ontology analysis. where we uncovered faws in the original scope of GO Te improved GO vocabulary is being actively used terms or the structure relating GO terms to each other, to describe experimental fndings for human and mouse such that the addition of new terms was required in order ciliary proteins, consistent with the focus of the GO to provide clarity. One such area was that of fagella gen- Consortium on representing human biology. In this way, erally, where the previous ontology structure had con- ciliary genes and gene products are now being integrated fated bacterial fagella with those of eukaryotes, and also into gene and protein networks to provide productive made an inappropriate distinction between eukaryotic insight into biomedical studies where cilia and fagella are cilia versus eukaryotic fagella. Te resolution of this involved. Some of the GO terms we created or modifed problem generated new terms or clarifed existing ones have already been used to annotate human genes in the specifcally for the use in annotation of either bacterial SYSCILIA Gold Standard set. or archaeal species, as appropriate. In addition, the term Terms of the improved GO vocabulary have also ‘cilium or fagellum-dependent cell motility’, a grouping been used extensively to annotate ciliary proteins of the term for cell motility via any type of cilia or fagella, was mouse, one of the best systems for generation of mod- marked with a tag indicating that it is inappropriate for els for human genetic diseases [55, 56]. For example, the manual annotation as eukaryotic cilia and bacterial fa- many publications describing research into mouse mod- gella never co-exist in the same organism; thus it should els of retinal degeneration provided impetus to improve always be possible for the biocurator to select the appro- the representation of the photoreceptor cilium, includ- priate more specifc term based on which type of organ- ing the knowledge that the ‘photoreceptor connecting ism is being annotated. Roncaglia et al. Cilia (2017) 6:10 Page 11 of 13

We uncovered another logical faw in the ontology Additional fles while trying to make a connection between ‘sperm motil- ity’ and ‘cilium-dependent cell motility’. We realized Additional fle 1. Number of publications on ciliopathies as recorded in PubMed using the search term ‘ciliopath*’ (to include ciliopathy, ciliopa- that there is more than one mechanism of sperm motil- thies, etc.). Points represent available data (incomplete data for 2017 is ity, either fagellated or amoeboid (note that the non- represented by a dark grey point). A local polynomial regression ft of the fagellated sperm present in many plant species are not publication data allows for predicting the number of publications for 2017 themselves motile cells and are instead moved by the (black line; standard errors of the ft are represented as a grey ribbon). pollen tube). Tus, our addition of GO terms to describe Additional fle 2. Prokaryotic fagella-related cellular component and biological process terms in the Gene Ontology (GO). Full list of terms amoeboid sperm motility will be useful to properly anno- available in the Gene Ontology (GO) to describe cellular components tate gene products involved in the motility of amoeboid and biological processes related to prokaryotic fagella and available sperm in such as C. elegans. as of January 2017. Terms are listed in 3 separate sections pertinent to cellular components, motility and general organization of the fagellum, respectively. GO ID, Gene Ontology term identifer; GO term name, Gene Conclusions Ontology term descriptive label; Aspect, branch of the Gene Ontology Te enhanced ciliary ontology, and the improvements in (CC cellular component, BP biological process); Created/modifed by cilia =GO project, indicates if a term= was created or modifed as part of the breadth and depth of gene annotation, will allow more work described in this article. All GO terms can be browsed via AmiGO precise knowledge representation, which in turn will gen- (http://amigo.geneontology.org/amigo) or QuickGO (http://www.ebi. erate more informative results from data analyses. Te ac.uk/QuickGO/), and the full ontology can be downloaded via http:// geneontology.org/page/download-ontology. latter can potentially include re-analysis of existing data- Additional fle 3. Cilia-related cellular component and biological process sets, maximizing the usefulness of experimental work for terms in the Gene Ontology (GO). Full list of terms available in the Gene the scientifc community and ultimately leading to sig- Ontology (GO) to describe cilia-related cellular components and biologi- nifcant advances in our understanding of biology. Tis is cal processes as of January 2017, ordered alphabetically by term name. GO ID, Gene Ontology term identifer; GO term name, Gene Ontology especially important considering the increasing focus on term descriptive label; Created/modifed by cilia GO project, indicates ciliopathies, as apparent from the steady yearly increase if a term was created or modifed as part of the work described in this in the number of publications on the subject since 2006 article. All GO terms can be browsed via AmiGO (http://amigo.geneontol- ogy.org/amigo) or QuickGO (http://www.ebi.ac.uk/QuickGO/), and the (see Additional fle 1). Te advantages of applying similar full ontology can be downloaded via http://geneontology.org/page/ focused curation approaches to cell organelles were also download-ontology. shown recently for the [58]. Additional fle 4. Papers used to make annotations for human ciliary Our work lays solid foundations for the usefulness of genes. Full list of papers with corresponding PubMed IDs that were used for annotations for human ciliary genes. The frst 29 papers comprise the GO (and GO annotations) as a powerful resource for cili- set initially selected for the annotation project. The last four were curated ary researchers. In fact, beyond the informative classes later and specifcally targeted to fll in gaps in experimental annotations of to describe cilia structure and processes such as cilium certain orthology groups in order to be able to propagate highly specifc dynein-related annotations to other sequences within those orthology assembly, which were the object of this project, the GO groups. also represents other processes relevant to this organelle. Additional fle 5. Human genes annotated. List of all human genes A partial list includes signalling pathways, developmental annotated during this project, ordered alphabetically by Gene Symbol, processes and sensory events involving cilia. In with UniProtKB ID and Gene Name information, and presence within fact, due to the numerous roles the cilium plays in many the SYSCILIA Gold Standard (SCGS) set also included. The numbers of experimental annotations made by this work to either cilia subset terms developmental and signalling pathways, many processes or to other GO terms are indicated. The Comment feld indicates which involving ciliary function may still beneft from improve- genes were targeted in the initial set of papers, which were targeted for ment of ontology and annotation. Also, because the efort supplemental annotations, and also which genes are candidates to add to the SCGS based on the experimental work annotated by this project. described here focused mostly on mammals, there is still Additional fle 6. Gene Association File for Human Annotations. File in space in GO to expand representation of ciliary struc- GAF 2.0 format (http://www.geneontology.org/page/go-annotation-fle- tures found in other species. Input from research experts format-20) containing all the human annotations made by this project. on these individual processes will be needed, since they Annotations for the 33 publications listed in Additional fle 3 were extracted by grepping for the relevant PMID’s from the goa_human.gaf possess the specialized knowledge to help guide ontology fle generated on 3/13/2017 as downloaded from http://www.geneontol- development to refect the biology accurately. Research ogy.org/page/download-annotations on 3/20/2017. A subsequent step to communities within the ciliary feld are invited to collab- flter the annotations by source (column 15) to either SYSCILIA_CCNET or GO_Central generated this fle of annotations generated for human genes orate in joint projects with the GO consortium to tackle for this project. specifc areas of GO related to cilia. Te GO consortium Additional fle 7. Papers used to make annotations for Chlamydomonas also welcomes individual contributions by external experts reinhardtii ciliary genes. Full list of papers with corresponding PubMed (see http://geneontology.org/page/contributing-go). IDs that were used for annotations for Chlamydomonas reinhardtii ciliary genes. Roncaglia et al. Cilia (2017) 6:10 Page 12 of 13

Im Neuenheimer Feld 234, 69120 Heidelberg, Germany. 7 Structural and Com- Additional fle 8. Chlamydomonas reinhardtii genes annotated. List of putational Biology Unit, European Molecular Biology Laboratory, Meyerhofstr. all Chlamydomonas reinhardtii genes annotated during this project, with 1, 69117 Heidelberg, Germany. 8 Present Address: Centre for Cardiovascular the UniProtKB ID, status of the UniProt record, Gene Symbol (if available), Genetics, University College London, London WC1E 6JF, UK. 9 Present Address: and Protein Name information. The numbers of experimental annotations SciBite Limited, BioData Innovation Centre, Wellcome Genome Campus, made by this work to either cilia subset terms or to other GO terms are Cambridge CB10 1DR, UK. indicated. The Comment feld indicates the experimentally demonstrated location in the axoneme as captured by the experimental annotations Acknowledgements generated by this project. The authors wish to thank Gregory Pazour, Peter Satir, George Witman, Kristen Additional fle 9. Gene Association File for Chlamydomonas reinhardtii Verhey, Pete Lefebvre, Robert Bloodgood, and SYSCILIA partners (in particular annotations. File in GAF 2.0 format (http://www.geneontology.org/ Rachel H. Giles (UMC Utrecht) and Oliver Blacque (University College Dublin)) page/go-annotation-fle-format-20) containing all the Chlamydomonas for precious feedback on the biology of cilia; David Hill (MGI) for useful discus- reinhardtii annotations made by this project. These annotations were sions on placement of some ciliary ontology terms; Karen Rothfels and Peter extracted from the goa_uniprot_all_noiea.gaf fle generated on 2/13/2017 d’Eustachio (Reactome) for aligning GO and Reactome on cilia assembly as downloaded from http://www.geneontology.org/page/download- processes; Prudence Mutowo-Meullenet and Tony Sawford (EMBL-EBI) for annotations on 3/17/2017. Note that subsequent to this being generated, assistance with the Protein2GO curation tool; Dragana Mitic Potkrajac and Ana all annotations for A8JDH8 were transferred to Q4AC22, a more complete Stelkic (CCNET) for contributing to gene annotation eforts. sequence for DHC9 and all annotations for A8JDN3 were transferred to Q39604, the SwissProt reviewed entry for IDA4. Additionally, in response Competing interests to our suggestion, A8JDN3 was merged into Q39604 as it was identical in The authors declare that they have no competing interests. sequence. Availability of data and materials Additional fle 10. Panther protein families. List of Panther protein The GO resource (ontology and annotations) is fully and freely available via families (version 11.1) containing dynein subunit sequences, included the GO Consortium website http://www.geneontology.org. In this paper, we Panther family ID and family name, the total number of sequences in each provide Additional fles detailing the new and modifed ontology terms, as family, the human and Chlamydomonas genes present in each. We have well as the gene annotations, that are part of this work. indicated the annotation date as well as the impact of our experimental annotations on the phylogenetic annotations propagated by the PAINT Ethics approval and consent to participate curation process. Not applicable. Additional fle 11. Additional GO term enrichment analysis illustrating the efect of improved GO ontology and annotations separately. Six GO Funding enrichment comparisons are shown of which two are also depicted in PR, KRC and RPH were funded by NIH NHGRI Grant U41 HG002273 to the Fig. 6. In the four additional comparisons, the 2012 state of GO ontology Gene Ontology Consortium (PIs: JA Blake, JM Cherry, S. Lewis, PW Sternberg and annotations is compared to either the current ontology using old and P. Thomas). PR and RPH were supported in part by EMBL-EBI Core funds. annotations from 2012, or the ontology from 2012 but with the current KRC was also funded by NIH NHGRI Grant U41 HG 000330 to the Mouse annotations. The former comparison illustrates the efects of the improved Genome Database. TJPvD, MAH, GT and TJG were funded by the European GO terminology and the underlying connections of terms, while the latter Union Seventh Framework Programme (FP7/2007-2013) under Grant Agree- comparison illustrates the efects of increased gene annotations over the ment No. 241955 (SYSCILIA). TJPvD, MAH were also funded by the Virgo same period of time. p values are corrected using the Bonferroni multiple consortium (FES0908). TJG was also supported by EMBL core funding. TJPvD testing correction. Terms in grey are not signifcantly enriched. was also supported by Utrecht Bioinformatics Center. JL was supported by EMBL Core funds.

Abbreviations Publisher’s Note BP: biological process; CC: cellular component; EMBL-EBI: European Molecular Springer Nature remains neutral with regard to jurisdictional claims in pub- Biology Laboratory, European Bioinformatics Institute; GO: Gene Ontol- lished maps and institutional afliations. ogy; GOC: Gene Ontology Consortium; MF: molecular function; OBO: open biomedical ontologies; PAINT: Phylogenetic Annotation and INference Tool; Received: 3 May 2017 Accepted: 30 October 2017 PANTHER: Protein ANalysis THrough Evolutionary Relationships; SCGS: SYSCILIA Gold Standard; SYSCILIA: a systems biology approach to dissect cilia function and its disruption in human genetic disease; UniProt-GOA: Gene Ontology Annotation Database at the Universal Protein Resource.

Authors’ contributions References PR, TJPvD, KRC, and TG wrote the manuscript. PR, TJPvD, KRC, and JL developed 1. Leewenhoeck AV. Observation, communicated to the publisher by Mr. the ontology. KRC and RPH contributed training in GO curation practice and Antonie van Leewenhoeck in a Dutch letter of the 9 Octob. 1676 here reviewed GO annotations. LN, GT, and KRC performed GO curation of ciliary English’d: concerning little animals by him observed in rain-well-sea and proteins. MAH and TG conceived of the study and participated in the design snow water; as also in water wherein pepper had lain infused. Phil Trans. and coordination. MAH helped draft the manuscript. All authors read and 1977;12:821–31. approved the fnal manuscript. 2. Manton I. The fne structure of plant cilia. Symp Soc Exp Biol. 1952;6:306–19. Author details 3. Huangfu D, Liu A, Rakeman AS, Murcia NS, Niswander L, Anderson KV. 1 European Molecular Biology Laboratory, European Bioinformatics Institute Hedgehog signalling in the mouse requires intrafagellar transport pro- (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK. teins. Nature. 2003;426:83–7. 2 The Gene Ontology Consortiumhttp://geneontology.org. 3 Centre for Molec- 4. Falk N, Lösl M, Schröder N, Gießl A. Specialized cilia in mammalian sen- ular and Biomolecular Informatics, Radboud University Medical Center, PO sory systems. Cells. 2015;4:500–19. Box 9101, 6500 HB Nijmegen, The Netherlands. 4 Theoretical Biology and Bioin- 5. Waters AM, Beales PL. Ciliopathies: an expanding disease spectrum. formatics, Department of Biology, Faculty of Science, Utrecht University, Pediatr Nephrol. 2011;26(7):1039–56. Utrecht, The Netherlands. 5 The Jackson Laboratory, 600 Main Street, Bar 6. Hildebrandt F, Benzing T, Katsanis N. Ciliopathies. N Engl J Med. Harbor, ME 04609, USA. 6 Fakultät Biowissenschaften, Universität Heidelberg, 2011;364:1533–43. Roncaglia et al. Cilia (2017) 6:10 Page 13 of 13

7. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene 32. Shiba D, Yamaoka Y, Hagiwara H, Takamatsu T, Hamada H, Yokoyama T. ontology: tool for the unifcation of biology. The Gene Ontology Consor- Localization of Inv in a distinctive intraciliary compartment requires the tium. Nat Genet. 2000;25:25–9. C-terminal -homolog-containing region. J Cell Sci. 2009;122:44–54. 8. Gene Ontology Consortium. Gene Ontology Consortium: going forward. 33. Satir P. STUDIES ON CILIA: II. Examination of the distal region of the ciliary Nucleic Acids Res. 2015;43:D1049–56. shaft and the role of the flaments in motility. J Cell Biol. 1965;26:805–34. 9. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, et al. PANTHER 34. Reiter JF, Blacque OE, Leroux MR. The base of the cilium: roles for transi- version 11: expanded annotation data from Gene Ontology and Reac- tion fbres and the transition zone in ciliary formation, maintenance and tome pathways, and data analysis tool enhancements. Nucleic Acids Res. compartmentalization. EMBO Rep. 2012;13:608–18. 2017;45:D183–9. 35. Hirokawa N, Tanaka Y, Okada Y, Takeda S. Nodal fow and the generation 10. Kistler WS, Baas D, Lemeille S, Paschaki M, Seguin-Estevez Q, Barras E, et al. of left-right asymmetry. Cell. 2006;125:33–45. RFX2 Is a major transcriptional regulator of Spermiogenesis. PLoS Genet. 36. Bloodgood RA. Sensory reception is an attribute of both primary cilia and 2015;11:e1005368. motile cilia. J Cell Sci. 2010;123:505–9. 11. Maillo V, de Frutos C, O’Gaora P, Forde N, Burns GW, Spencer TE, et al. 37. Takeda S, Narita K. Structure and function of vertebrate cilia, towards a Spatial diferences in gene expression in the bovine oviduct. Reproduc- new taxonomy. Diferentiation. 2012;83:S4–11. tion. 2016;152:37–46. 38. Ibañez-Tallon I, Heintz N, Omran H. To beat or not to beat: roles of cilia in 12. Garcia-Gonzalo FR, Corbit KC, Sirerol-Piquer MS, Ramaswami G, Otto EA, development and disease. Hum Mol Genet. 2003;12:R27–35. Noriega TR, et al. A transition zone complex regulates mammalian cili- 39. Fisch C, Dupuis-Williams P. Ultrastructure of cilia and fagella—back to the ogenesis and ciliary membrane composition. Nat Genet. 2011;43:776–84. future! Biol Cell. 2011;103:249–70. 13. Shefeld VC. The blind leading the obese: the molecular pathophysi- 40. Roncaglia P, van DamTeunis JP, Christie KR, Nacheva L, Huynen MA, ology of a human obesity syndrome. Trans Am Clin Climatol Assoc. Huntley RP, Gibson T, Lomax J. Towards the gene ontology of the eukary- 2010;121:172–81. otic cilia and fagella [v1; not peer reviewed] (poster). F1000Research. 14. Boldt K, van Reeuwijk J, Lu Q, Koutroumpas K, Nguyen T-MT, Texier Y, et al. 2016;5:2451. An organelle-specifc protein landscape identifes novel diseases and 41. Feistel K, Blum M. Three types of cilia including a novel 9 4 axoneme on molecular mechanisms. Nat Commun. 2016;7:11491. the notochordal plate of the rabbit embryo. Dev Dyn. 2006;235:3348–58.+ 15. van Dam TJ, Wheway G, Slaats GG, Huynen MA, Giles RH. The SYSCILIA 42. Baccetti B, Dallai R, Giusti F. The spermatozoon of Arthropoda. VI. gold standard (SCGSv1) of known ciliary components and its applications Ephemeroptera. J Ultrastruct Res. 1969;29:343–9. within a systems biology consortium. Cilia. 2013;2:7. 43. Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw 16. SYSCILIA Consortium. http://www.syscilia.org/. Accessed 3 Nov 2017. R, et al. The reactome pathway knowledgebase. Nucleic Acids Res. 17. Christie KR, Blake JA. Sensing the cilium, digital capture of ciliary data for 2016;44:D481–7. comparative genomics investigations. Submitted to Cilia. 44. Milacic M, Haw R, Rothfels K, Wu G, Croft D, Hermjakob H, et al. Annotat- 18. GitHub GO repository issue tracker. https://github.com/geneontology/ ing cancer variants and anti-cancer therapeutics in reactome. Cancers go-ontology/issues. Accessed 3 Nov 2017. (Basel). 2012;4:1180–211. 19. Musen MA, Protégé Team. The Protégé Project: a look back and a look 45. Rothfels K. Assembly of the primary cilium. Reactome—a curated knowl- forward. AI Matters. 2015;1:4–12. edgebase. Biol Pathways. 2014;51:932. 20. Day-Richter J, Harris MA, Haendel M, Gene Ontology OBO-Edit Working 46. Sorokin S. and the formation of rudimentary cilia by fbroblasts Group, Lewis S. OBO-Edit—an ontology editor for biologists. Bioinformat- and smooth muscle cells. J Cell Biol. 1962;15:363–77. ics. 2007;23:2198–200. 47. Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, 21. Dietze H, Berardini TZ, Foulger RE, Hill DP, Lomax J, Osumi-Sutherland D, et al. The Cell Ontology 2016: enhanced content, modularization, and et al. TermGenie—a web-application for pattern-based ontology class ontology interoperability. J Biomed Semantics. 2016;7:44. generation. J Biomed Semantics. 2014;5:48. 48. Nelson GA, Roberts TM, Ward S. Caenorhabditis elegans spermatozoan 22. Balakrishnan R, Harris MA, Huntley R, Van Auken K, Cherry JM. A guide locomotion: with almost no . J Cell Biol. to best practices for Gene Ontology (GO) manual annotation. Database 1982;92:121–31. (Oxford). 2013;2013:bat054. 49. Smith HE. sperm motility. WormBook. 2014;1–15. 23. Guide to GO Evidence Codes|Gene Ontology Consortium. http://www. 50. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, et al. geneontology.org/page/guide-go-evidence-codes. Accessed 21 Apr AmiGO: online access to ontology and annotation data. Bioinformatics. 2017. 2009;25:288–9. 24. Huntley RP, Harris MA, Alam-Faruque Y, Blake JA, Carbon S, Dietze H, et al. 51. Binns D, Dimmer E, Huntley R, Barrell D, O’Donovan C, Apweiler R. A method for increasing expressivity of Gene Ontology annotations QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics. using a compositional approach. BMC Bioinform. 2014;15:155. 2009;25:3045–6. 25. Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, 52. Gaudet P, Livstone MS, Lewis SE, Thomas PD. Phylogenetic-based propa- Martin MJ, et al. The GOA database: Gene Ontology annotation updates gation of functional annotations within the Gene Ontology Consortium. for 2015. Nucleic Acids Res. 2015;43:D1057–63. Brief Bioinform. 2011;12:449–62. 26. Ross AJ, Dailey LA, Brighton LE, Devlin RB. Transcriptional profling of 53. Download Annotations|Gene Ontology Consortium. http://geneontol- mucociliary diferentiation in human airway epithelial cells. Am J Respir ogy.org/page/download-annotations. Accessed 21 Apr 2017. Cell Mol Biol. 2007;37:169–85. 54. Barrell D, Dimmer E, Huntley RP, Binns D, O’Donovan C, Apweiler R. 27. Arnaiz O, Malinowska A, Klotz C, Sperling L, Dadlez M, Koll F, et al. The GOA database in 2009–an integrated Gene Ontology Annotation Cildb: a knowledgebase for and cilia. Database. resource. Nucleic Acids Res. 2009;37:D396–403. 2009;2009:bap022. 55. Rao Damerla R, Gabriel GC, Li Y, Klena NT, Liu X, Chen Y, et al. Role of cilia 28. Herrero J, Mufato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, in structural birth defects: insights from ciliopathy mutant mouse models. et al. Ensembl comparative genomics resources. Database (Oxford). Birth Defects Res C Embryo Today. 2014;102:115–25. 2016;2016:bav069. 56. Norris DP, Grimes DT. Mouse models of ciliopathies: the state of the art. 29. Bauer S, Grossmann S, Vingron M, Robinson PN. Ontologizer 2.0—a mul- Dis Model Mech. 2012;5:299–312. tifunctional tool for GO term enrichment analysis and data exploration. 57. Deegan JI, Dimmer EC, Mungall CJ. Formalization of taxon-based con- Bioinformatics. 2008;24:1650–1. straints to detect inconsistencies in annotation and ontology develop- 30. Ontology Documentation|Gene Ontology Consortium. http://geneontol- ment. BMC Bioinform. 2010;11:530. ogy.org/page/ontology-documentation. Accessed 21 Apr 2017. 58. Mutowo-Meullenet P, Huntley RP, Dimmer EC, Alam-Faruque Y, 31. Mungall CJ, Bada M, Berardini TZ, Deegan J, Ireland A, Harris MA, et al. Sawford T, Jesus Martin M, et al. Use of Gene Ontology Annotation to Cross-product extensions of the Gene Ontology. J Biomed Inform. understand the peroxisome proteome in humans. Database (Oxford). 2011;44:80–6. 2013;2013:bas062.