IOS Press

James P. McCusker a,1, John S. Erickson a, Katherine Chastain a, Sabbir Rashid a, Rukmal Weerawarana a, Marcello Bax a and Deborah L. McGuinness a

a Computer Science, Rensselaer Polytechnic Institute, Troy, NY, US

E-mails: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved 2 J. McCusker et al. / What is a Knowledge Graph?

What is a Knowledge Graph?

Abstract. Knowledge graphs have enjoyed a resurgence in research interest after the development of several commercial projects, such as Google’s knowledge graph. However, the use of the term has evolved and may now refer to a wide range of graphs that may not include clear and unambiguous definitions or references. To provide clarity to knowledge graph research, we survey the literature for current efforts that may inform a knowledge graph definition, and then use that review along with our work to synthesize a definition that is relevant and informative to current knowledge graph research, while constraining the research space that may be considered a knowledge graph. We define a knowledge graph as “A graph, composed of a set of assertions (edges labeled with relations) that are expressed between entities (vertices), where the meaning of the graph is encoded in its structure, the relations and entities are unambiguously identified, a limited set of relations are used to label the edges, and the graph encodes the provenance, especially justification and attribution, of the assertions.” We evaluate a wide variety of knowledge resources, graphs, and ontologies to determine if they qualify under our definition, and find that while expressing knowledge as a graph structure and unambiguous denotation of entities and relations in the graph are common, it is less common to trace provenance of encoded knowledge, and less common to constrain the relations used when expressing that knowledge. We created our Knowledge Graph Catalog to support this effort, and make it available to the public to search and contribute new knowledge graphs.

Keywords: Knowledge Graphs

1. Introduction

Google introduced its Knowledge Graph project in 2012 [1] in order to enhance the quality of its search results, but the project has also reignited interest in knowledge graph research. Google has leveraged existing knowledge graphs, such as DBpedia and Freebase, and has also opened up the process of contributing to the graph by ingesting RDFa and microdata formats from the Web pages it indexes, based on the vocabularies published by schema.org. The success of the Google Knowledge Graph, and its use of semantic technologies, has led to a resurgence in the use of the term in semantic research to describe similar projects. However, the term “knowledge graph” remains underspecified, and in many cases simply refers to any directed labeled graph. The pre-Semantic Web conceptualization of knowledge graphs provides us with guidance as to what might currently “count” as a knowledge graph and also describes capabilities that do not yet exist in current knowledge graphs. From this synthesis, we propose an updated definition along with a set of knowledge graph requirements. We include the requirement that knowledge graphs represent attributable knowledge: they need to include information about where the knowledge came from, as opposed to containing "bare statements" with no justification or provenance. We discuss how knowledge graphs as defined are a crucial component for the future of the Web and have great potential for transformational change in data science and domain sciences.

Knowledge graphs provide an opportunity to expand our understanding of how knowledge can be managed on the Web and how that knowledge can be distinguished from more conventional Web-based data publication schemes such as Linked Data [2]. In recent years, knowledge graphs have grown increasingly prominent through commercial and research applications on the Web. Google was one of the first to promote a semantic metadata organizational model described as a “knowledge graph,” and many other organizations have since used the term in published research on

knowledge management and graph databases. Our purpose with this paper is to survey the evolving notion of a knowledge graph, to describe the general space, and to provide an explicit operational description of a knowledge graph. We begin with a review of recent definitions of knowledge graphs, knowledge graph analysis and construction algorithms, and commercial, research, nonprofit, and government knowledge graphs. These new knowledge graphs do not strictly adhere to original knowledge graph theory [3], but instead have followed a looser, more flexible definition. We present a more descriptive view of current, practical knowledge graphs, and discuss their potential for evolution and impact.

2. Related Work

Rospocher, et al. present knowledge graphs as collections of facts about entities, typically derived from structured data sources such as Freebase [4]. They cite a dearth of event representations in current knowledge graphs as a shortcoming that limits knowledge graphs to encyclopedic items such as birth and death dates, and attribute it primarily to the difficulty of obtaining temporal data about entities in a structured manner. Recent surveys, such as those by Hogenboom, et al. [5] and Deng, et al. [6], provide overviews of numerous methods for event extraction from a variety of sources including social media, news, academic publications, and even images and video, indicating that there is great interest in finding ways to interpret and include such temporal data in a more structured format. Another review by Nickel et al. explores machine learning methods for knowledge graphs, but limits its definition to directed labeled graphs, with the ability to optionally pre-define the schema. The authors also review, but do not take a position on, the use of the closed versus open world assumptions.

van de Riet and Meersman [3], Stokman and de Vries [7], and Zhang [8] present a formal theory of knowledge graphs as a specialization of semantic networks where meaning is expressed as structure, statements are unambiguous, and a limited set of relation types is used. These requirements also minimize redundancy within the knowledge graph, which simplifies analytical operations (including reasoning and queries). Popping explores the use of knowledge graphs, and their challenges at the time, in network text analysis [9]. Following Zhang, Popping defines the knowledge graph as a type of semantic network that uses only a few types of relations, but also asserts that additional knowledge may be added to the graph. Ehrlinger [10] selected some representative definitions that demonstrate the lack of a common core understanding of the concept. Farber, et al. [11] and Huang, et al. [12] define a knowledge graph as being an RDF graph. Paulheim [13] argues that "knowledge graphs are supposed to cover at least a major portion of the domains that exist in the world, and are not supposed to be restricted to only one domain." But while general knowledge graphs such as DBpedia do not focus on a single domain, this should not mean that all knowledge graphs must be general. On the contrary, we believe that graphs created for specific domains such as biology can be considered knowledge graphs if they follow the other requirements. More recently, many works report on automatically building knowledge graphs out of textual medical knowledge and medical records [14–17].

3. A Definition of “Knowledge Graph”

The knowledge graph platforms reviewed in this paper do not strictly adhere to the definition of knowledge graph set out by van de Riet and Meersman [3], Stokman and de Vries [7], and Zhang [8]. Since usage has evolved, it is appropriate to develop a definition that follows how the term is currently used. Implicit in the name “knowledge graph” is, of course, that a knowledge graph represents knowledge, and does so using a graph structure. Stokman and de Vries [7] and Zhang [8] posit useful definitions and requirements for knowledge graphs as a starting point:

– Knowledge graph meaning is expressed as structure.
– Knowledge graph statements are unambiguous.
– Knowledge graphs use a limited set of relation types.

Note that the graph (in a knowledge graph) is a directed labeled graph. Without direction or labels, it would be impossible to encode any significant meaning into the structure of a graph. In order for knowledge graph statements to be unambiguous, they need to be composed of unambiguous units.

– All identified entities in a knowledge graph, including types and relations, must be identified using global identifiers with unambiguous denotation.

One example of this kind of identifier is the Uniform Resource Identifier (URI), as used in RDF [18].

While the “limited set of relation types” proposed by van de Riet et al. addressed a specific set of non-decomposable, essential relations, in the context of an open world knowledge system this should be taken to mean a core set of essential classes and relations that are true regardless of context. For instance, a person is a patient only within the context of a medical encounter. Similarly, in microbiology, there are many proteins that act on other proteins within certain parts of the cell. They are inactive in other parts, so the localization of the protein is context that is needed to understand its behavior.

It is important to consider context, as many relations that seem simple and binary are actually more complex. This is often seen in DBpedia, where many relationships, like spouses and children, are expressed as simple triples of the form:

    John_Lennon spouse Yoko_Ono.

However, there is another triple in the graph:

    John_Lennon spouse Cynthia_Lennon.

Taken together, these triples suggest that Lennon was married to two people at once. Of course, John divorced Cynthia and married Yoko later on. The spouse relationship is actually more complex than modeled in DBpedia, and one reason is that the relation needs information about the time context. The relationship has a beginning, and sometimes an end. If a person has had multiple marriages, that information needs to be added to the relationship, not the person. Expressing these relationships in a vocabulary with limited relations that follow the criteria we introduced above and will revisit below might look more like this:

    John_Lennon hasRole
        [ a Spouse ;
          startTime [ a TimeInstant ; hasValue "1962-08-23" ] ;
          endTime [ a TimeInstant ; hasValue "1968-11-08" ] ;
          inRelationTo Cynthia_Lennon ] ,
        [ a Spouse ;
          startTime [ a TimeInstant ; hasValue "1969-03-20" ] ;
          inRelationTo Yoko_Ono ] .

The relations used here allow for elaboration at every level. Limiting knowledge graphs to a set of essential relations forces knowledge graph editors to think compositionally, making the graph structure more durable as additional knowledge and context are added. The encoding uses the essential term hasRole and includes the contextual temporal information about when the spouse relationship actually held, thus allowing a statement about John Lennon’s spouse relationship to Cynthia DURING a particular context to always be evaluated as true.

In practice, the knowledge graph literature and the practical knowledge graphs we reviewed either aggregate knowledge from many secondary sources, using Natural Language Processing (NLP) extraction when the sources are unstructured text, or use a semantic Extract, Transform, and Load (ETL) process from structured databases [19]. Some knowledge graphs rely on crowdsourcing of their information, a form of distributed curation. At no point do we see a case

where the knowledge does not have a theoretical, citable source or some other recorded justification. Since knowledge graphs nominally represent knowledge, we argue that some criteria for inclusion of content and its provenance should be encoded in the graph. This is especially true for knowledge graphs gathered from other sources, as the sources themselves must have some justification for publishing their assertions.

– Knowledge graphs must include explicit provenance.

In many cases, the justification for inclusion of assertions appeals to authority, through the citation of the resource the knowledge was extracted from. Authority, at least in scientific research, is only a shortcut for validating knowledge, and good knowledge graphs should encode as much justification for their assertions as they can. We consider graphs without provenance, whether attribution or justification, to be bare statement graphs. Bare statement graphs are not true knowledge graphs, according to our definition, since they do not provide a way to confirm that statements are justified or are even believed by their originators; this is a minimal (but not sufficient [20]) criterion for “knowledge” in a knowledge graph.

– Knowledge graphs may include uncertainty assessments.

Some knowledge graphs go further in modeling knowledge by providing uncertainty assessments of the knowledge asserted [21]. This can be useful when dealing with scientific knowledge graphs, where competing hypotheses and theories are supported to varying degrees, which may change as new evidence comes to light.

We have therefore identified the following hierarchy of graph types. The basic graph that we build on is a directed labeled graph.

Graph: A set of assertions (edges labeled with relations) that are expressed between entities (vertices) where the meaning of the graph is encoded in its structure.

Unambiguous Graph: A graph where the relations and entities are unambiguously identified.

Knowledge Graph: An Unambiguous Graph with a limited set of relations used to label the edges that encodes the provenance, especially justification and attribution, of the assertions.

All the resources we reviewed are Graphs, in the above sense.

4. Knowledge Graph Methods

Corby and Zucker present an abstract knowledge graph querying machine they call KGRAM [22], but do not define knowledge graphs beyond being directed labeled graphs. This work appears to be an abstraction of graph query methods, and KGRAM can be viewed as a generalization and extension of the RDF graph query language SPARQL [23]. Wang et al. [24] discuss projecting generalized knowledge graphs into hyperplanes, but also focus only on the labeled directed graph requirement of knowledge graphs. Pujara et al. use probabilistic soft logic (PSL) to manage uncertainty in knowledge graphs that have been extracted from uncertain sources [25]. They argue that many current knowledge graphs do not always clearly identify entities, relying instead on labels that can differ due to spelling variations. Their task of “knowledge graph identification” has the goal of identifying a set of true assertions from noisy extractions. They do not claim to manage the provenance of the resulting knowledge graph assertions. Lin et al. attempt link prediction for automated knowledge graph construction but rely only on a directed, labeled graph model of knowledge graphs [26]. Hakkani-Tur et al. use statistical language understanding to pose structured questions against the Freebase knowledge graph, focusing on improving relation detection in the queries [27].

Benedek et al. have presented a collaborative knowledge graph construction tool called “Conceptipedia”, building on their “WikiNizer” project [28]. This project uses visual mind mapping techniques and concept similarity analysis to suggest cross-knowledge graph mappings between collaborators. Weiderman and Kritzinger

[29] refer to knowledge graphs as a synonym for concept maps, but do not expand further on the topic, nor do they cite any work in knowledge graphs.

5. A Meta-Knowledge Graph

To organize this paper, we created what we call the Knowledge Graph Catalog (KGC) (http://graphs.whyis.io). The KGC is a meta-knowledge graph that collects metadata about published knowledge graphs and resources that resemble knowledge graphs. It currently describes key features of each graph, the API, publisher, and lifecycle status, and provides a faceted browser of the knowledge resources reported here. We have made an effort to cover as many knowledge graphs as we could find, but readers can contribute to KGC and suggest knowledge graphs and similar resources that are not yet in the catalog through a form on the web site.

5.1. An Ontology of Knowledge Resources

We also developed a knowledge graph catalog ontology (http://graphs.whyis.io/ns) to support the KGC itself. It includes relevant attributes and definitions from this paper and a hierarchy of qualifying knowledge resource types.

6. Knowledge Graphs

We have surveyed 37 potential knowledge graphs so far (including the Knowledge Graph Catalog, or KGC), and have found that 7 of them (also including KGC) fulfill all four requirements of the knowledge graph definition presented in Section 3. We show how each resource fulfills those requirements in Table 1, along with the publisher and production status. They are found in commercial, academic, nonprofit, and government settings. A number of knowledge graphs are experimental or retired, but a significant number are in active production, as shown in Figure 2.

Figure 1. Venn diagram of required features in knowledge graphs. "Meaning as structure" is implemented by all surveyed knowledge graphs, but many graphs do not limit their relations, nor do they track provenance in a meaningful way.

Figure 2. Knowledge graph proportions by publisher and status.

6.1. Academic Knowledge Graphs

The Gene Ontology (GO) may be considered more of a knowledge graph than an ontology. It embodies a hierarchy of biological processes, cellular locations, and molecular functions into which a number of genes and proteins have been classified or annotated. These annotations have been curated by domain experts, and the evidence for each is recorded using a GO-specific provenance encoding [30]. Other ontologies in OBO

Foundry also encode knowledge, but do not provide provenance of their assertions. BioPortal is a web-based application for accessing and sharing biomedical ontologies. It is the largest such repository, with more than 700 ontologies to date. This set includes ontologies that were developed in OWL, OBO, and other formats, as well as a large number of medical terminologies that the US National Library of Medicine distributes. It supports dereferencing of URIs for whole ontologies and individual terms in the ontologies [31]. UniProt KB is an excellent source of protein data that has been manually verified by experts, and provides citations for all assertions made in the graph. It also provides mappings to other gene and protein URI schemes, relationships with other similar proteins, and details about protein interactions. However, the lack of a visual browser on the KG website limits its usability compared to other knowledge graph platforms. Furthermore, it lacks provenance information about the expert who completed the manual verification and knowledge graph construction [32]. The Knowledge Graph Catalog (KGC) was discussed in Section 5.

6.2. Commercial Knowledge Graphs

Google introduced its Knowledge Graph project in 2012, and has used it to improve query result relevancy and its overall search experience. Google has leveraged existing knowledge graphs, such as DBpedia and Freebase, and has also opened up the process of contributing to the graph by ingesting RDFa and microdata formats from the Web pages it indexes, based on the vocabularies published by schema.org [1]. The Knowledge Vault, a research project funded by Google, handles the knowledge graph uncertainty that results from automated fact extraction from Web pages, and attempts to fuse data from multiple sources into a single knowledge graph [33].

6.3. Nonprofit Knowledge Graphs

The Nexus Knowledge Graph is a schema-driven knowledge graph that uses the W3C PROV ontology to manage provenance about contributed knowledge. It focuses on neuroscience, but its developers claim the core technology will apply to other domains [34].

7. Other Graph Resources

While one could argue that the following resources are knowledge graphs, they do not fulfill the complete definition, even if they meet some of the requirements. All of these resources express meaning as structure, but they all fail to provide one or more of unambiguous identifiers, limited sets of relations, or knowledge provenance.

7.1. Academic Graph Resources

BabelNet is a multilingual knowledge graph that attempts automated entity disambiguation across its languages, and provides an integration of Wikipedia and WordNet [35]. Chemical Entities of Biological Interest (ChEBI) relies on hand-annotation and affirmation of concepts, which ensures that the data contained within is accurate according to domain scientists, at the expense of taking longer to provide updates to the knowledge graph. One of the primary focuses of the original effort was to unite a variety of molecular representation and nomenclature standards using unambiguous identifiers. This gives it credibility and reusability as a resource in the relevant chemical and biochemical communities. Previous versions overloaded some of the relationship terms, which made it difficult to represent the information in an ontology graph format, but more recent versions have worked to resolve these issues [36–38]. ConceptNet is a knowledge graph of things people know, and computers should know, expressed in various natural languages. ConceptNet uses PostgreSQL as its database [39]. DBpedia is a large-scale transformation of Wikipedia into a knowledge graph. Since it is curated from crowdsourced Wikipedia infoboxes, it contains an unlimited set of relations. It also relies on Wikipedia for the provenance of any changes to the knowledge graph, since it is generated from there [40]. Elementary (later, DeepDive) is a

framework for developing knowledge graphs using natural language processing algorithms to extract and infer new knowledge [41]. PROSPERA is another tool that attempts scalable extraction with high precision and recall [42]. Read the Web/Never Ending Learning uses semi-supervised natural language processing techniques to build a knowledge graph against a set of entity types. In the web interface, each assertion is linked back to one or more web pages it was extracted from [43]. ReVerb is a knowledge extraction tool that extracts binary relationships from text without the need for a pre-defined ontology. This has the benefit of completeness, but allows for ambiguous expressions of knowledge within its output [44]. GeoLink presents an extensive collection of data related to geography research. The application developed around the graph (http://demo.geolink.org) is well implemented, and provides the ability to plot individual measurements on a map, a particularly useful feature for a geography knowledge graph. However, it appears that some of the data in the graph is not fully linked; for example, certain physical measurements may not have associated scientists, or vice versa. Additionally, GeoLink does not use a system of nanopublications, and thus does not preserve provenance of some of the nodes in the graph [45]. The Neurocommons project attempted to represent all biological knowledge relating to neuroscience research as a common knowledge graph by integrating a number of biological databases. The project was still in development when funding ended [46]. The XLore system claims to be a fully bilingual (Chinese and English) knowledge graph that focuses on extracting subClassOf and instanceOf relations from free text [47]. Bio2RDF was originally designed as a meta-search over a variety of existing biomedical vocabularies and ontologies. While it provides good unification of synonyms over a variety of source ontologies, keeping the sources separated is also important to this effort, and so there is less overall integration, and the representation is a direct mapping of the source data into RDF [48, 49]. YAGO (Yet Another Great Ontology) is considered by some researchers to be a knowledge graph, although it originated as a large, general-purpose ontology. While it aggregates knowledge from many sources, there are no published descriptions of whether or how provenance is tracked in YAGO [50].

7.2. Commercial Graph Resources

Cyc, started in 1987, is the world’s longest-running artificial intelligence project. Cyc is principally based in predicate logic and provides default values for logic (inferred values can be overridden by explicit assertions). Cyc has over 15,000 predicates, which precludes it from having a limited set of relations [51]. OpenCyc, which was discontinued in 2017, was an RDF-based open source subset of Cyc [52]. Freebase is a knowledge graph of over 3B facts and 58M topics (Freebase.com web site, April 2016) that was open to public access and curation and formed the basis for the Google Knowledge Graph. It has since been retired and is available as an RDF download [53]. IOS Press has developed a linked data portal for its publication metadata using the BIBO ontology. Since it is the authority for publication metadata, it does not directly encode provenance of this metadata itself [54]. Linked Life Data is constructed from a number of biomedical databases, using its own internal vocabulary for a data model [55]. Open Knowledge Graph was a project that aggregated the output of the SEKI@home project. SEKI@home is a crowd-sourced knowledge graph that aggregates from multiple sources, maintaining entity-level provenance using the PROV Ontology [56]. Probase is an automatically generated taxonomy of classes and instances extracted from the web [57]. The Suggested Upper Merged Ontology (SUMO) attempts to provide an "all in one" graph of knowledge, and includes partial mappings to WordNet and DBpedia. It uses the Knowledge Interchange Format, and only provides partial serializations in OWL [58]. The Bing Entity Search API is the first public API to be released from Microsoft’s Satori knowledge graph project. It focuses on resolving entities and providing information about them to API users [59]. The Facebook Graph API provides access to the "social graph" expressed in the Facebook social networking platform. It encodes social knowledge, including, for users who have the right permissions, access to social interactions via the Facebook platform [60]. The popular Wolfram Alpha natural language and mathematics engine uses a version of a conventional ontology, implemented using symbolic programming. The Wolfram Alpha API is planned to expose some of the underlying ontology to developers in a future release [61]. Walmart has funded research into a knowledge resource as well, extracting structured knowledge from Wikipedia. The effort seems to be similar to DBpedia, but has not yet produced any public output [62]. Upper Mapping and Binding Exchange Layer (UMBEL) is designed to be a source of entity content and to provide a coherent ontology for aligning specific entity knowledge within a broader context. UMBEL used to be distributed and maintained by Structured Dynamics LLC. The UMBEL Ontology has not been updated since the company ceased operations in 2016 [63]. Unigraph attempts to aggregate knowledge graphs from across the web, but it is unclear from the documentation how entities and relations are identified [64].

7.3. Government Graph Resources

The United States Geological Survey Geographic Names Information System (GNIS) has an experimental linked data representation that uses GeoSPARQL to provide geospatial indexing of geographic features [65].

7.4. Nonprofit Graph Resources

Ontobee has a web interface for querying and visualizing the details and hierarchy of a specific ontology term. It is able to dereference a single ontology term URI, and then display the HTML information in a browser. Statistics and other detailed information are generated and displayed. A SPARQL web interface is provided for custom queries [66]. The Environment Ontology (ENVO) is an ontology of classes relating to environmental research. It is an OWL ontology and part of the OBO Foundry, and is available through Ontobee [67]. The ESKG Knowledge Graph provides a thorough interface to NASA’s archived Earth science studies and data. Additionally, mathematical units and concepts are expressed using unambiguous identifiers. However, there is no user interface provided, and the knowledge graph is generated automatically and may not be error-free [68]. Wikidata is a knowledge graph developed by the Wikimedia Foundation as an effort to provide structured data to Wikipedia and other efforts. It has developed a language-independent identifier system for entities, and all information is available as an RDF graph. It allows and encourages references on a per-assertion basis to provide supporting evidence. It also tracks who creates and modifies facts in the knowledge graph [69].

8. Future Potential

Usually, knowledge graphs are not distinguished from bare statement graphs, in that they do not encode or publish the epistemology (2) of knowledge asserted in the graph. We see this as troubling because it does not privilege knowledge; in most existing knowledge graphs, supported and unsupported assertions are given equal weight. Moving forward, there is an opportunity to leverage existing vocabularies, including the Provenance Ontology (PROV-O) [70] and the Nanopublications Framework [71], to improve the clarity, transparency, and utility of knowledge graphs. A nanopublication is a set of RDF graphs: an assertion graph (the knowledge), a provenance graph (the justification), and an attribution graph (the believer). While justified true belief is not sufficient for knowledge, most other proposals, including a causal linkage between the justification, assertion, and believer, are well supported within provenance vocabularies. Added to a knowledge graph, provenance graphs can expand to provide room for whatever epistemic criteria are desired.

(2) Epistemology defines why something is known.

There is an interesting overlap between what is considered a “knowledge graph” and what is an

Table 1
A breakdown of the features of the reviewed knowledge graphs.

| Status | Publisher | Name | Structured meaning | Unambiguous | Limited relations | Tracks provenance |
|---|---|---|---|---|---|---|
| Experimental | Carnegie Mellon University | Read the Web | Yes | No | Yes | No |
| Experimental | Max Planck Institute for Informatics | PROSPERA | Yes | No | No | No |
| Experimental | University of Washington | ReVerb | Yes | No | No | No |
| Experimental | Google | Knowledge Vault | Yes | Yes | Yes | Yes |
| Experimental | | Probase | Yes | No | No | No |
| Experimental | Walmart | Walmart Lab’s Social Genome | Yes | No | No | No |
| Experimental | United States Geological Survey | Geographic Names Information System | Yes | Yes | No | No |
| Experimental | Blue Brain Project | Nexus KnowledgeGraph | Yes | Yes | Yes | Yes |
| Production | Luminoso Technologies, Inc. | ConceptNet | Yes | Yes | No | Yes |
| Production | European Bioinformatics Institute | Chemical Entities of Biological Interest (ChEBI) | Yes | Yes | Partial | Yes |
| Production | | UniProt KB | Yes | Yes | Yes | Yes |
| Production | Laval University | BIO2RDF | Yes | Yes | Partial | |
| Production | Leipzig University | DBpedia | Yes | Yes | Yes | No |
| Production | Max Planck Institute for Informatics | Yet Another Great Ontology | Yes | Yes | No | No |
| Production | National Science Foundation | EarthCube GeoLink | No | No | No | Yes |
| Production | OBO Foundry | Gene Ontology | Yes | Yes | Yes | Yes |
| Production | National Center for Biomedical Ontology | BioPortal | Yes | Yes | Yes | Yes |
| Production | Rensselaer Polytechnic Institute | Knowledge Graph Catalog | Yes | Yes | Yes | Yes |
| Production | Sapienza University of Rome | BabelNet | Yes | Yes | No | No |
| Production | Tsinghua University | XLore | Yes | Yes | No | No |
| Production | Articulate Software | Suggested Upper Merged Ontology | Yes | Partial | No | Partial |
| Production | Cycorp | Cyc | Yes | Yes | Yes | No |
| Production | Facebook | Facebook Graph API | Yes | No | No | Yes |
| Production | Google | Google Knowledge Graph | Yes | Yes | Yes | Yes |
| Production | INGENIOSITY LTD | Unigraph | Yes | No | Yes | No |
| Production | IOS Press | LD Connect | Yes | Yes | No | Yes |
| Production | | Entity Search API | Yes | No | No | No |
| Production | Ontotext | Linked Life Data | Yes | Yes | Yes | No |
| Production | Thomson Reuters | Thomson Reuters Knowledge Graph Feed | Yes | Yes | Don’t Know | Don’t Know |
| Production | Wolfram Alpha | Wolfram Alpha Internal Knowledge Graph | Yes | No | Yes | Yes |
| Production | Earth Science Information Partners | Earth Science Knowledge Graph | Yes | Yes | No | Yes |
| Production | EDM Council | Financial Industry Business Ontology (FIBO) | Yes | Yes | No | Yes |
| Production | OBO Foundry | Environment Ontology | Yes | Yes | No | Yes |
| Production | | Ontobee | Yes | Yes | No | Yes |
| Production | Wikimedia Foundation | Wikidata | Yes | Yes | Yes | No |
| Retired | Science Commons | Neurocommons | Yes | Yes | No | Yes |
| Retired | Stanford University | Elementary/DeepDive | Yes | No | No | No |
| Retired | Cycorp | OpenCyc | Yes | Yes | No | No |
| Retired | Google | Freebase | Yes | Yes | Yes | No |
| Retired | | Open Knowledge Graph | Yes | Yes | Yes | No |
| Unmaintained | Structured Dynamics LLC | Upper Mapping and Binding Exchange Layer | Yes | Yes | Yes | No |
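The four feature columns of Table 1 correspond to the four requirements in the definition given in the abstract, so a row can be checked mechanically. The following is a minimal sketch of such a check, not the authors' evaluation procedure. The names `Features`, `unmet_criteria`, and `qualifies` are hypothetical, and treating "Partial" and "Don't Know" as unmet is an assumption made here for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Features:
    """One row of Table 1; each field holds the cell text as printed."""
    structured_meaning: str
    unambiguous: str
    limited_relations: str
    tracks_provenance: str


def unmet_criteria(features: Features) -> list[str]:
    """Return the criteria whose cell is anything other than a plain "Yes".

    Assumption (not from the paper): "Partial" and "Don't Know" count as unmet.
    """
    return [name for name, value in asdict(features).items() if value != "Yes"]


def qualifies(features: Features) -> bool:
    """A resource satisfies the definition only if no criterion is unmet."""
    return not unmet_criteria(features)


# A few rows copied from Table 1.
TABLE_1_SAMPLE = {
    "Gene Ontology": Features("Yes", "Yes", "Yes", "Yes"),
    "DBpedia": Features("Yes", "Yes", "Yes", "No"),
    "ReVerb": Features("Yes", "No", "No", "No"),
    "Chemical Entities of Biological Interest (ChEBI)": Features(
        "Yes", "Yes", "Partial", "Yes"
    ),
}

if __name__ == "__main__":
    for name, features in TABLE_1_SAMPLE.items():
        gaps = unmet_criteria(features)
        verdict = "meets all four criteria" if not gaps else "unmet: " + ", ".join(gaps)
        print(f"{name}: {verdict}")
```

Under these assumptions, a resource such as DBpedia fails only on provenance tracking, which matches the observation in the abstract that provenance is among the less commonly satisfied requirements.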

ontology. The most commonly accepted definition of an ontology is “an explicit specification of a conceptualization” [72]. To a large degree, knowledge graphs conform to this definition, but ontologies generally describe generalities (classes, properties, and roles) with less focus on content about specific instances. For example, most ontologies that include content related to descriptions of world landmarks would describe the landmark class and its related properties but would typically not mention the Eiffel Tower, whereas a knowledge graph that covers the domain of Parisian landmarks would. Conversely, knowledge graph approaches can be used to improve the credibility of ontologies by encoding the epistemology of the statements in the ontology.

9. Conclusion

Knowledge graphs are an increasingly critical component of the Semantic Web and serve as information hubs for general use as well as for domain-specific applications. Most knowledge graphs seek to aggregate knowledge from third-party sources, whether from external databases, from data aggregated through crawling the Web, or through the application of entity and relationship extraction methods. Knowledge graphs are not simply aggregations of RDF or linked data, but critically provide time-invariant information about entities of general interest. Their structures tend to be focused on a limited set of relations adhering to a coherent knowledge model, setting them apart from the linked data cloud in general, which has usually relied on the open framework of the Semantic Web to accommodate a completely free-form use of vocabularies and ontologies. Although some knowledge graphs track the provenance of their content, rigorous provenance is by no means a universal characteristic. We argue that knowledge graphs should prioritize the epistemology of the knowledge they contain (how we know what we know) and that Nanopublications are a suitable framework in which to do so. Semantic publishing that does not provide a level of statement epistemology can be considered to produce “Bare Statement” graphs. Since so many knowledge graphs are curated from third parties, and because of the nature of publishing on the Web (Anyone can say Anything about Any subject), it will become critical to avoid the use of such “Bare Statement” graphs as knowledge graphs increase in popularity. We also show that, while most knowledge resources can be considered graphs in the sense defined here, not all can be considered unambiguous graphs, in that they do not use unambiguous identifiers and do not use a limited set of relations. We hope that these definitions provide a means for setting expectations for knowledge resources and help guide and refine the scope of knowledge graph research.

Acknowledgements

This work was funded by the National Spectrum Consortium’s Dynamic Spectrum Access Policy Development project, NIEHS Award 0255-0236-4609 / 1U2CES026555-01, NSF Award OAC-1640840, the IBM Research AI Horizons Network, and the Gates Foundation through HBGDki.

References

[1] A. Singhal, Introducing the knowledge graph: things, not strings, Official Google Blog, May (2012), Accessed: 2016-04-11. https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html.
[2] C. Bizer, T. Heath and T. Berners-Lee, Linked data-the story so far, Semantic Services, Interoperability and Web Applications: Emerging Concepts (2009), 205–227.
[3] R. van de Riet and R. Meersman, Knowledge Graphs, in: Linguistic Instruments in Knowledge Engineering: Proceedings of the 1991 Workshop on Linguistic Instruments in Knowledge Engineering, Tilburg, the Netherlands, 17-18 January 1991, North-Holland, 1992, p. 97.
[4] M. Rospocher, M. van Erp, P. Vossen, A. Fokkens, I. Aldabe, G. Rigau, A. Soroa, T. Ploeger and T. Bogaard, Building event-centric knowledge graphs from news, Web Semantics: Science, Services and Agents on the World Wide Web (2016), ISSN 1570-8268. doi:10.1016/j.websem.2015.12.004. http://www.sciencedirect.com/science/article/pii/S1570826815001456.
[5] F. Hogenboom, F. Frasincar, U. Kaymak, F. de Jong and E. Caron, A Survey of event extraction methods from text for decision support systems, Decision Support Systems (2016), ISSN 0167-9236. doi:10.1016/j.dss.2016.02.006. http://www.sciencedirect.com/science/article/pii/S0167923616300173.
[6] J. Deng, F. Qiao, H. Li, X. Zhang and H. Wang, An Overview of Event Extraction from Twitter, in: Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2015 International Conference on, 2015, pp. 251–256. doi:10.1109/CyberC.2015.24.
[7] F.N. Stokman and P.H. de Vries, Structuring Knowledge in a Graph, in: Human-Computer Interaction, Springer Science + Business Media, 1988, pp. 186–206. doi:10.1007/978-3-642-73402-1_12.
[8] L. Zhang, Knowledge graph theory and structural parsing, Twente University Press, 2002.
[9] R. Popping, Knowledge Graphs and Network Text Analysis, Social Science Information 42(1) (2003), 91–106. doi:10.1177/0539018403042001798.
[10] L. Ehrlinger and W. Wöß, Towards a Definition of Knowledge Graphs, in: SEMANTiCS (Posters, Demos, SuCCESS), 2016.
[11] M. Färber, F. Bartscherer, C. Menne and A. Rettinger, Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO, Semantic Web (2016), 1–53.
[12] Z. Huang, J. Yang, F. van Harmelen and Q. Hu, Constructing disease-centric knowledge graphs: a case study for depression (short version), in: Conference on Artificial Intelligence in Medicine in Europe, Springer, 2017, pp. 48–52.
[13] H. Paulheim, Knowledge graph refinement: A survey of approaches and evaluation methods, Semantic Web 8(3) (2017), 489–508.
[14] L. Shi, S. Li, X. Yang, J. Qi, G. Pan and B. Zhou, Semantic health knowledge graph: semantic integration of heterogeneous medical knowledge and services, BioMed Research International 2017 (2017).
[15] A. Lamurias, J.D. Ferreira, L.A. Clarke and F.M. Couto, Generating a tolerogenic cell therapy knowledge graph from literature, Frontiers in Immunology 8 (2017), 1656.
[16] M. Rotmensch, Y. Halpern, A. Tlimat, S. Horng and D. Sontag, Learning a health knowledge graph from electronic medical records, Scientific Reports 7(1) (2017), 5994.
[17] P. Ernst, A. Siu and G. Weikum, KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences, BMC Bioinformatics 16(1) (2015), 157.
[18] R. Cyganiak, D. Wood and M. Lanthaler, RDF 1.1 concepts and abstract syntax, W3C Recommendation, Feb (2014).
[19] J.P. McCusker, J.A. Phillips, A. Beltrán, A. Finkelstein and M. Krauthammer, Semantic web data warehousing for caGrid, BMC Bioinformatics 10(Suppl 10) (2009), 2. doi:10.1186/1471-2105-10-S10-S2.
[20] E.L. Gettier, Is Justified True Belief Knowledge?, Analysis 23(6) (1963), 121–123. doi:10.1093/analys/23.6.121.
[21] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun and W. Zhang, Knowledge vault, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14, Association for Computing Machinery (ACM), 2014. doi:10.1145/2623330.2623623.
[22] O. Corby and C.F. Zucker, The KGRAM Abstract Machine for Knowledge Graph Querying, in: 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Institute of Electrical & Electronics Engineers (IEEE), 2010. doi:10.1109/WI-IAT.2010.144.
[23] S. Harris, A. Seaborne and E. Prud’hommeaux, SPARQL 1.1 query language, W3C Recommendation 21 (2013).
[24] Z. Wang, J. Zhang, J. Feng and Z. Chen, Knowledge Graph Embedding by Translating on Hyperplanes, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[25] J. Pujara, H. Miao, L. Getoor and W. Cohen, Knowledge Graph Identification, in: Lecture Notes in Computer Science, Springer Science + Business Media, 2013, pp. 542–557. doi:10.1007/978-3-642-41335-3_34.
[26] Y. Lin, Z. Liu, M. Sun, Y. Liu and X. Zhu, Learning Entity and Relation Embeddings for Knowledge Graph Completion, in: AAAI, 2015, pp. 2181–2187.
[27] D. Hakkani-Tur, L. Heck and G. Tur, Using a knowledge graph and query click logs for unsupervised learning of relation detection, in: 2013 IEEE International Conference on Acoustics Speech and Signal Processing, Institute of Electrical & Electronics Engineers (IEEE), 2013. doi:10.1109/ICASSP.2013.6639289.
[28] A. Benedek, C. Goodman and G. Lajos, The ‘Conceptipedia’ of Visual Semantic Wikinizers: A Reference Model For Collaborative Conceptualization, INNODOCT/13 “New changes in technology and innovation”, 109.
[29] M. Weideman and W. Kritzinger, Concept Mapping vs. Web Page Hyperlinks As an Information Retrieval Interface: Preferences of Postgraduate Culturally Diverse Learners, in: Proceedings of the 2003 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on Enablement Through Technology, SAICSIT ’03, South African Institute for Computer Scientists and Information Technologists, Republic of South Africa, 2003, pp. 69–82. ISBN 1-58113-774-5. http://dl.acm.org/citation.cfm?id=954014.954022.
[30] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin and G. Sherlock, Gene Ontology: tool for the unification of biology, Nature Genetics 25(1) (2000), 25–29. doi:10.1038/75556.
[31] M. Salvadores, P.R. Alexander, M.A. Musen and N.F. Noy, BioPortal as a dataset of linked biomedical ontologies and terminologies in RDF, Semantic Web 4(3) (2013), 277–284, ISSN 1570-0844. doi:10.3233/SW-2012-0086.
[32] UniProt: the universal protein knowledgebase, Nucleic Acids Research 45(D1) (2016), 158–169. doi:10.1093/nar/gkw1099.
[33] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun and W. Zhang, Knowledge vault, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14, ACM Press, 2014. doi:10.1145/2623330.2623623.
[34] Blue Brain Nexus, Last accessed 6/27/2018. https://bbp-nexus.epfl.ch/staging/home.
[35] R. Navigli and S.P. Ponzetto, BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network, Artificial Intelligence 193 (2012), 217–250. doi:10.1016/j.artint.2012.07.001.
[36] K. Degtyarenko, P. de Matos, M. Ennis, J. Hastings, M. Zbinden, A. McNaught, R. Alcantara, M. Darsow, M. Guedj and M. Ashburner, ChEBI: a database and ontology for chemical entities of biological interest, Nucleic Acids Research 36(Database) (2007), 344–350. doi:10.1093/nar/gkm791.
[37] P. de Matos, R. Alcántara, A. Dekker, M. Ennis, J. Hastings, K. Haug, I. Spiteri, S. Turner and C. Steinbeck, Chemical Entities of Biological Interest: an update, Nucleic Acids Research 38(suppl_1) (2009), 249–254. doi:10.1093/nar/gkp886.
[38] J. Hastings, P. de Matos, A. Dekker, M. Ennis, B. Harsha, N. Kale, V. Muthukrishnan, G. Owen, S. Turner, M. Williams and C. Steinbeck, The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013, Nucleic Acids Research 41(D1) (2012), 456–463. doi:10.1093/nar/gks1146.
[39] R. Speer and C. Havasi, ConceptNet 5: A Large Semantic Network for Relational Knowledge, in: The People’s Web Meets NLP, Springer Berlin Heidelberg, 2013, pp. 161–176. doi:10.1007/978-3-642-35085-6_6.
[40] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P.N. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer et al., DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6(2) (2015), 167–195, ISSN 1570-0844. doi:10.3233/SW-140134.
[41] F. Niu, C. Zhang, C. Ré and J. Shavlik, Elementary, International Journal on Semantic Web and Information Systems 8(3) (2012), 42–73. doi:10.4018/jswis.2012070103.
[42] N. Nakashole, M. Theobald and G. Weikum, Scalable knowledge harvesting with high precision and high recall, in: Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11, ACM Press, 2011. doi:10.1145/1935826.1935869.
[43] T. Mitchell, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, W. Cohen, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, J. Welling, E. Hruschka, P. Talukdar, B. Yang, J. Betteridge, A. Carlson, B. Dalvi and M. Gardner, Never-ending learning, Communications of the ACM 61(5) (2018), 103–115. doi:10.1145/3191513.
[44] A. Fader, S. Soderland and O. Etzioni, Identifying Relations for Open Information Extraction, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, Association for Computational Linguistics, Stroudsburg, PA, USA, 2011, pp. 1535–1545. ISBN 978-1-937284-11-4. http://dl.acm.org/citation.cfm?id=2145432.2145596.
[45] A. Krisnadhi, Y. Hu, K. Janowicz, P. Hitzler, R. Arko, S. Carbotte, C. Chandler, M. Cheatham, D. Fils, T. Finin, P. Ji, M. Jones, N. Karima, K. Lehnert, A. Mickle, T. Narock, M. O’Brien, L. Raymond, A. Shepherd, M. Schildhauer and P. Wiebe, The GeoLink Modular Oceanography Ontology, in: The Semantic Web - ISWC 2015, Springer International Publishing, 2015, pp. 301–309. doi:10.1007/978-3-319-25010-6_19.
[46] A. Ruttenberg, J.A. Rees, M. Samwald and M.S. Marshall, Life sciences on the Semantic Web: the Neurocommons and beyond, Briefings in Bioinformatics 10(2) (2009), 193–204. doi:10.1093/bib/bbp004.
[47] Z. Wang, J. Li, Z. Wang, S. Li, M. Li, D. Zhang, Y. Shi, Y. Liu, P. Zhang and J. Tang, XLore: A large-scale English-Chinese bilingual knowledge graph, in: Proceedings of the 2013 International Conference on Posters & Demonstrations Track - Volume 1035, CEUR-WS.org, 2013, pp. 121–124.
[48] F. Belleau, M.-A. Nolin, N. Tourigny, P. Rigault and J. Morissette, Bio2RDF: Towards a mashup to build bioinformatics knowledge systems, Journal of Biomedical Informatics 41(5) (2008), 706–716. doi:10.1016/j.jbi.2008.03.004.
[49] A. Callahan, J. Cruz-Toledo, P. Ansell and M. Dumontier, Bio2RDF Release 2: Improved Coverage, Interoperability and Provenance of Life Science Linked Data, in: The Semantic Web: Semantics and Big Data, Springer Berlin Heidelberg, 2013, pp. 200–212. doi:10.1007/978-3-642-38288-8_14.
[50] F.M. Suchanek, G. Kasneci and G. Weikum, Yago, in: Proceedings of the 16th international conference on World Wide Web - WWW 2007, ACM Press, 2007. doi:10.1145/1242572.1242667.
[51] D.B. Lenat, CYC: A large-scale investment in knowledge infrastructure, Communications of the ACM 38(11) (1995), 33–38.
[52] M.A. Sicilia, E. García, S. Sánchez and E. Rodríguez, On integrating learning object metadata inside the OpenCyc knowledge base, in: Advanced Learning Technologies, 2004. Proceedings. IEEE International Conference on, IEEE, 2004, pp. 900–901.
[53] K. Bollacker, C. Evans, P. Paritosh, T. Sturge and J. Taylor, Freebase, in: Proceedings of the 2008 ACM SIGMOD international conference on Management of data - SIGMOD '08, ACM Press, 2008. doi:10.1145/1376616.1376746.
[54] Welcome To LD Connect, Our Linked Data Portal. http://ld.iospress.nl/.
[55] V. Momtchev, D. Peychev, T. Primov and G. Georgiev, Expanding the pathway and interaction knowledge in linked life data, in: Proc. of International Semantic Web Challenge, 2009.
[56] T. Steiner and S. Mirea, SEKI@home, or Crowdsourcing an Open Knowledge Graph, in: Proceedings of the First International Workshop on Knowledge Extraction and Consolidation from Social Media (KECSM2012), 2012, p. 7.
[57] W. Wu, H. Li, H. Wang and K.Q. Zhu, Probase, in: Proceedings of the 2012 international conference on Management of Data - SIGMOD '12, ACM Press, 2012. doi:10.1145/2213836.2213891.
[58] A. Pease, I. Niles and J. Li, The suggested upper merged ontology: A large ontology for the semantic web and its applications, in: Working notes of the AAAI-2002 workshop on ontologies and the semantic web, Vol. 28, 2002, pp. 7–10.
[59] R. Qian, Understand Your World with Bing, Bing Search Blog, 2013, Accessed: 2016-04-11. http://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/.
[60] J. Weaver and P. Tarjan, Facebook Linked Data via the Graph API, Semantic Web 4(3) (2013), 245–250, ISSN 1570-0844. doi:10.3233/SW-2012-0078.
[61] S. Wolfram, The Reform of Mathematical Notation, Computation, Mathematical Notation, and Linguistics, Inc., Mathematica, Version 8 (2013), 23.
[62] O. Deshpande, D.S. Lamba, M. Tourn, S. Das, S. Subramaniam, A. Rajaraman, V. Harinarayan and A. Doan, Building, maintaining, and using knowledge bases, in: Proceedings of the 2013 international conference on Management of data - SIGMOD '13, ACM Press, 2013. doi:10.1145/2463676.2465297.
[63] M.K. Bergman and F. Giasson, UMBEL ontology, Technical Report, Structured Dynamics, 2008. http://umbel.org.
[64] The World’s Knowledge Graph. https://unigraph.io/.
[65] B. Regalia, K. Janowicz, G. Mai, D. Varanka and E.L. Usery, GNIS-LD: Serving and Visualizing the Geographic Names Information System Gazetteer as Linked Data, in: The Semantic Web, Springer International Publishing, 2018, pp. 528–540. doi:10.1007/978-3-319-93417-4_34.
[66] E. Ong, Z. Xiang, B. Zhao, Y. Liu, Y. Lin, J. Zheng, C. Mungall, M. Courtot, A. Ruttenberg and Y. He, Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration, Nucleic Acids Research 45(D1) (2016), 347–352. doi:10.1093/nar/gkw918.
[67] P.L. Buttigieg, E. Pafilis, S.E. Lewis, M.P. Schildhauer, R.L. Walls and C.J. Mungall, The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation, Journal of Biomedical Semantics 7(1) (2016). doi:10.1186/s13326-016-0097-6.
[68] McGibbney, L. J., Jiang and A. B., ESIP’s Earth Science Knowledge Graph (ESKG) Testbed Project: An Automatic Approach to Building Interdisciplinary Earth Science Knowledge Graphs to Improve Data Discovery, 2017. http://adsabs.harvard.edu/abs/2017AGUFMIN33C0131M.
[69] D. Vrandečić and M. Krötzsch, Wikidata, Communications of the ACM 57(10) (2014), 78–85. doi:10.1145/2629489.
[70] L. Moreau, P. Groth, J. Cheney, T. Lebo and S. Miles, The rationale of PROV, Web Semantics: Science, Services and Agents on the World Wide Web 35 (2015), 235–257. doi:10.1016/j.websem.2015.04.001.
[71] P. Groth, A. Gibson and J. Velterop, The anatomy of a nanopublication, Information Services and Use 30(1–2) (2010), 51–56.
[72] T.R. Gruber, A translation approach to portable ontology specifications, Knowledge Acquisition 5(2) (1993), 199–220. doi:10.1006/knac.1993.1008.