Freebase QA: Information Extraction Or Semantic Parsing?

Total Page:16

File Type:pdf, Size:1020Kb

Freebase QA: Information Extraction Or Semantic Parsing? Freebase QA: Information Extraction or Semantic Parsing? Xuchen Yao 1 Jonathan Berant 3 Benjamin Van Durme 1,2 1Center for Language and Speech Processing 2Human Language Technology Center of Excellence Johns Hopkins University 3Computer Science Department Stanford University Abstract We characterize semantic parsing as the task of deriving a representation of meaning from lan- We contrast two seemingly distinct ap- guage, sufficient for a given task. Traditional proaches to the task of question answering information extraction (IE) from text may be (QA) using Freebase: one based on infor- coarsely characterized as representing a certain mation extraction techniques, the other on level of semantic parsing, where the goal is to semantic parsing. Results over the same derive enough meaning in order to populate a test-set were collected from two state-of- database with factoids of a form matching a given the-art, open-source systems, then ana- schema.1 Given the ease with which reasonably lyzed in consultation with those systems’ accurate, deep syntactic structure can be automat- creators. We conclude that the differ- ically derived over (English) text, it is not surpris- ences between these technologies, both ing that IE researchers would start including such in task performance, and in how they “features” in their models. get there, is not significant. This sug- Our question is then: what is the difference be- gests that the semantic parsing commu- tween an IE system with access to syntax, as com- nity should target answering more com- pared to a semantic parser, when both are targeting positional open-domain questions that are a factoid-extraction style task? While our conclu- beyond the reach of more direct informa- sions should hold generally for similar KBs, we tion extraction methods. will focus on Freebase, such as explored by Kr- ishnamurthy and Mitchell (2012), and then others 1 Introduction such as Cai and Yates (2013a) and Berant et al. Question Answering (QA) from structured data, (2013). We compare two open-source, state-of- such as DBPedia (Auer et al., 2007), Freebase the-art systems on the task of Freebase QA: the (Bollacker et al., 2008) and Yago2 (Hoffart et semantic parsing system SEMPRE (Berant et al., al., 2011), has drawn significant interest from 2013), and the IE system jacana-freebase (Yao both knowledge base (KB) and semantic pars- and Van Durme, 2014). ing (SP) researchers. The majority of such work We find that these two systems are on par with treats the KB as a database, to which standard each other, with no significant differences in terms database queries (SPARQL, MySQL, etc.) are is- of accuracy between them. A major distinction be- sued to retrieve answers. Language understand- tween the work of Berant et al. (2013) and Yao ing is modeled as the task of converting natu- and Van Durme (2014) is the ability of the for- ral language questions into queries through inter- mer to represent, and compose, aggregation oper- mediate logical forms, with the popular two ap- ators (such as argmax, or count), as well as in- proaches including: CCG parsing (Zettlemoyer tegrate disparate pieces of information. This rep- and Collins, 2005; Zettlemoyer and Collins, 2007; resentational capability was important in previous, Zettlemoyer and Collins, 2009; Kwiatkowski et closed-domain tasks such as GeoQuery. The move al., 2010; Kwiatkowski et al., 2011; Krishna- to Freebase by the SP community was meant to murthy and Mitchell, 2012; Kwiatkowski et al., 1 2013; Cai and Yates, 2013a), and dependency- So-called Open Information Extraction (OIE) is simply a further blurring of the distinction between IE and SP, where based compositional semantics (Liang et al., 2011; the schema is allowed to grow with the number of verbs, and Berant et al., 2013; Berant and Liang, 2014). other predicative elements of the language. 82 Proceedings of the ACL 2014 Workshop on Semantic Parsing, pages 82–86, Baltimore, Maryland USA, June 26 2014. c 2014 Association for Computational Linguistics provide richer, open-domain challenges. While lexicon is used to map NL phrases to KB predi- the vocabulary increased, our analysis suggests cates, and then predicates are combined to form a that compositionality and complexity decreased. full logical form by a context-free grammar. Since We therefore conclude that the semantic parsing logical forms can be derived in multiple ways from community should target more challenging open- the grammar, a log-linear model is used to rank domain datasets, ones that “standard IE” methods possible derivations. The parameters of the model are less capable of attacking. are trained from question-answer pairs. 2 IE and SP Systems 3 Analysis jacana-freebase2 (Yao and Van Durme, 2014) 3.1 Evaluation Metrics treats QA from a KB as a binary classification Both Berant et al. (2013) and Yao and problem. Freebase is a gigantic graph with mil- Van Durme (2014) tested their systems on lions of nodes (topics) and billions of edges (re- the WEBQUESTIONS dataset, which contains lations). For each question, jacana-freebase 3778 training questions and 2032 test questions first selects a “view” of Freebase concerning only collected from the Google Suggest API. Each involved topics and their close neighbors (this question came with a standard answer from “view” is called a topic graph). For instance, Freebase annotated by Amazon Mechanical Turk. for the question “who is the brother of justin Berant et al. (2013) reported a score of 31.4% bieber?”, the topic graph of Justin Bieber, con- in terms of accuracy (with partial credit if inexact taining all related nodes to the topic (think of the match) on the test set and later in Berant and Liang “Justin Bieber” page displayed by the browser), is (2014) revised it to 35.7%. Berant et al. focused selected and retrieved by the Freebase Topic API. on accuracy – how many questions were correctly Usually such a topic graph contains hundreds to answered by the system. Since their system an- thousands of nodes in close relation to the central swered almost all questions, accuracy is roughly topic. Then each of the node is judged as answer identical to F1. Yao and Van Durme (2014)’s sys- or not by a logistic regression learner. tem on the other hand only answered 80% of all Features for the logistic regression learner are test questions. Thus they report a score of 42% first extracted from both the question and the in terms of F1 on this dataset. For the purpose of topic graph. An analysis of the dependency comparing among all test questions, we lowered parse of the question characterizes the question the logistic regression prediction threshold (usu- word, topic, verb, and named entities of the ally 0.5) on jacana-freebase for the other 20% main subject as the question features, such as of questions where jacana-freebase had not pro- qword=who. Features on each node include the posed an answer to, and selected the best-possible types of relations and properties the node pos- prediction with the highest prediction score as the sesses, such as type=person. Finally features answer. In this way jacana-freebase was able from both the question and each node are com- to answer all questions with a lower accuracy of bined as the final features used by the learner, such 35.4%. In the following we present analysis re- as qword=who type=person. In this way the as- sults based on the test questions where the two | sociation between the question and answer type systems had very similar performance (35.7% vs. is enforced. Thus during decoding, for instance, 35.4%).4 The difference is not significant accord- if there is a who question, the nodes with a per- ing to the paired permutation test (Smucker et al., son property would be ranked higher as the an- 2007). swer candidate. 3.2 Accuracy vs. Coverage SEMPRE3 is an open-source system for training semantic parsers, that has been utilized to train a First, we were interested to see the proportions of semantic parser against Freebase by Berant et al. questions SEMPRE and jacana-freebase jointly (2013). SEMPRE maps NL utterances to logical and separately answered correctly. The answer to 4 forms by performing bottom-up parsing. First, a In this setting accuracy equals averaged macro F1: first the F1 value on each question were computed, then averaged 2https://code.google.com/p/jacana/ among all questions, or put it in other words: “accuracy with 3http://www-nlp.stanford.edu/software/ partial credit”. In this section our usage of the terms “accu- sempre/ racy” and “F1” can be exchanged. 83 Accuracy vs. Coverage 70 jacana (F1 = 1) jacana (F1 0.5) jacana-freebase ≥ √ √ SEMPRE × × 60 √ 153 (0.08) 383 (0.19) 429 (0.21) 321 (0.16) EMPRE S 136 (0.07) 1360 (0.67) 366 (0.18) 916 (0.45) 50 × Table 1: The absolute and proportion of ques- Accuracy 40 tions SEMPRE and jacana-freebase answered correctly (√) and incorrectly ( ) jointly and sep- 30 × arately, running a threshold F1 of 1 and 0.5. 20 0 10 20 30 40 50 60 70 80 90 100 Percent Answered many questions in the dataset is a set of answers, Figure 1: Precision with respect to proportion of for example what to see near sedona arizona?. questions answered Since turkers did not exhaustively pick out all pos- sible answers, evaluation is performed by comput- ing the F1 between the set of answers given by a threshold from 1 to 0, we select those questions the system and the answers provided by turkers. with an answer confidence score above the thresh- With a strict threshold of F1 = 1 and a permis- old and compute accuracy at this point.
Recommended publications
  • A Comparative Evaluation of Geospatial Semantic Web Frameworks for Cultural Heritage
    heritage Article A Comparative Evaluation of Geospatial Semantic Web Frameworks for Cultural Heritage Ikrom Nishanbaev 1,* , Erik Champion 1,2,3 and David A. McMeekin 4,5 1 School of Media, Creative Arts, and Social Inquiry, Curtin University, Perth, WA 6845, Australia; [email protected] 2 Honorary Research Professor, CDHR, Sir Roland Wilson Building, 120 McCoy Circuit, Acton 2601, Australia 3 Honorary Research Fellow, School of Social Sciences, FABLE, University of Western Australia, 35 Stirling Highway, Perth, WA 6907, Australia 4 School of Earth and Planetary Sciences, Curtin University, Perth, WA 6845, Australia; [email protected] 5 School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Perth, WA 6845, Australia * Correspondence: [email protected] Received: 14 July 2020; Accepted: 4 August 2020; Published: 12 August 2020 Abstract: Recently, many Resource Description Framework (RDF) data generation tools have been developed to convert geospatial and non-geospatial data into RDF data. Furthermore, there are several interlinking frameworks that find semantically equivalent geospatial resources in related RDF data sources. However, many existing Linked Open Data sources are currently sparsely interlinked. Also, many RDF generation and interlinking frameworks require a solid knowledge of Semantic Web and Geospatial Semantic Web concepts to successfully deploy them. This article comparatively evaluates features and functionality of the current state-of-the-art geospatial RDF generation tools and interlinking frameworks. This evaluation is specifically performed for cultural heritage researchers and professionals who have limited expertise in computer programming. Hence, a set of criteria has been defined to facilitate the selection of tools and frameworks.
    [Show full text]
  • Knowledge Graphs on the Web – an Overview Arxiv:2003.00719V3 [Cs
    January 2020 Knowledge Graphs on the Web – an Overview Nicolas HEIST, Sven HERTLING, Daniel RINGLER, and Heiko PAULHEIM Data and Web Science Group, University of Mannheim, Germany Abstract. Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowl- edge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chap- ter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap. Keywords. Knowledge Graph, Linked Data, Semantic Web, Profiling 1. Introduction Knowledge Graphs are increasingly used as means to represent knowledge. Due to their versatile means of representation, they can be used to integrate different heterogeneous data sources, both within as well as across organizations. [8,9] Besides such domain-specific knowledge graphs which are typically developed for specific domains and/or use cases, there are also public, cross-domain knowledge graphs encoding common knowledge, such as DBpedia, Wikidata, or YAGO. [33] Such knowl- edge graphs may be used, e.g., for automatically enriching data with background knowl- arXiv:2003.00719v3 [cs.AI] 12 Mar 2020 edge to be used in knowledge-intensive downstream applications.
    [Show full text]
  • New Generation of Social Networks Based on Semantic Web Technologies: the Importance of Social Data Portability
    Workshop on Adaptation and Personalization for Web 2.0, UMAP'09, June 22-26, 2009 New Generation of Social Networks Based on Semantic Web Technologies: the Importance of Social Data Portability Liana Razmerita1, Martynas Jusevičius2, Rokas Firantas2 Copenhagen Business School, Denmark1 IT University of Copenhagen, Copenhagen, Denmark2 [email protected], [email protected], [email protected] Abstract. This article investigates several well-known social network applications such as Last.fm, Flickr and identifies social data portability as one of the main technical issues that need to be addressed in the future. We argue that this issue can be addressed by building social networks as Semantic Web applications with FOAF, SIOC, and Linked Data technologies, and prove it by implementing a prototype application using Java and core Semantic Web standards. Furthermore, the developed prototype shows how features from semantic websites such as Freebase and DBpedia can be reused in social applications and lead to more relevant content and stronger social connections. 1 Introduction Social networking sites are developing at a very fast pace, attracting more and more users. Their application domain can be music sharing, photos sharing, videos sharing, bookmarks sharing, professional networks, and others. Despite their tremendous success social networking websites have a number of limitations that are identified and discussed within this article. Most of the social networking applications are “walled websites” and the “online communities are like islands in a sea”[1]. Lack of interoperability between data in different social networks applications limits access to relevant content available on different social networking sites, and limits the integration and reuse of available data and information.
    [Show full text]
  • How Much Is a Triple? Estimating the Cost of Knowledge Graph Creation
    How much is a Triple? Estimating the Cost of Knowledge Graph Creation Heiko Paulheim Data and Web Science Group, University of Mannheim, Germany [email protected] Abstract. Knowledge graphs are used in various applications and have been widely analyzed. A question that is not very well researched is: what is the price of their production? In this paper, we propose ways to estimate the cost of those knowledge graphs. We show that the cost of manually curating a triple is between $2 and $6, and that the cost for automatically created knowledge graphs is by a factor of 15 to 250 cheaper (i.e., 1g to 15g per statement). Furthermore, we advocate for taking cost into account as an evaluation metric, showing the correspon- dence between cost per triple and semantic validity as an example. Keywords: Knowledge Graphs, Cost Estimation, Automation 1 Estimating the Cost of Knowledge Graphs With the increasing attention larger scale knowledge graphs, such as DBpedia [5], YAGO [12] and the like, have drawn towards themselves, they have been inspected under various angles, such as as size, overlap [11], or quality [3]. How- ever, one dimension that is underrepresented in those analyses is cost, i.e., the prize of creating the knowledge graphs. 1.1 Manual Curation: Cyc and Freebase For manually created knowledge graphs, we have to estimate the effort of pro- viding the statements directly. Cyc [6] is one of the earliest general purpose knowledge graphs, and, at the same time, the one for which the development effort is known. At a 2017 con- ference, Douglas Lenat, the inventor of Cyc, denoted the cost of creation of Cyc at $120M.1 In the same presentation, Lenat states that Cyc consists of 21M assertions, which makes a cost of $5.71 per statement.
    [Show full text]
  • Knowledge Extraction Part 3: Graph
    Part 1: Knowledge Graphs Part 2: Part 3: Knowledge Graph Extraction Construction Part 4: Critical Analysis 1 Tutorial Outline 1. Knowledge Graph Primer [Jay] 2. Knowledge Extraction from Text a. NLP Fundamentals [Sameer] b. Information Extraction [Bhavana] Coffee Break 3. Knowledge Graph Construction a. Probabilistic Models [Jay] b. Embedding Techniques [Sameer] 4. Critical Overview and Conclusion [Bhavana] 2 Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING ACTIVE RESEARCH FUTURE RESEARCH DIRECTIONS 3 Critical Overview SUMMARY SUCCESS STORIES DATASETS, TASKS, SOFTWARES EXCITING ACTIVE RESEARCH FUTURE RESEARCH DIRECTIONS 4 Why do we need Knowledge graphs? • Humans can explore large database in intuitive ways •AI agents get access to human common sense knowledge 5 Knowledge graph construction A1 E1 • Who are the entities A2 (nodes) in the graph? • What are their attributes E2 and types (labels)? A1 A2 • How are they related E3 (edges)? A1 A2 6 Knowledge Graph Construction Knowledge Extraction Graph Knowledge Text Extraction graph Construction graph 7 Two perspectives Extraction graph Knowledge graph Who are the entities? (nodes) What are their attributes? (labels) How are they related? (edges) 8 Two perspectives Extraction graph Knowledge graph Who are the entities? • Named Entity • Entity Linking (nodes) Recognition • Entity Resolution • Entity Coreference What are their • Entity Typing • Collective attributes? (labels) classification How are they related? • Semantic role • Link prediction (edges) labeling • Relation Extraction 9 Knowledge Extraction John was born in Liverpool, to Julia and Alfred Lennon. Text NLP Lennon.. Mrs. Lennon.. his father John Lennon... the Pool .. his mother .. he Alfred Person Location Person Person John was born in Liverpool, to Julia and Alfred Lennon.
    [Show full text]
  • Adding Realtime Coverage to the Google Knowledge Graph
    Adding Realtime Coverage to the Google Knowledge Graph Thomas Steiner1?, Ruben Verborgh2, Raphaël Troncy3, Joaquim Gabarro1, and Rik Van de Walle2 1 Universitat Politècnica de Catalunya – Department lsi, Spain {tsteiner,gabarro}@lsi.upc.edu 2 Ghent University – IBBT, ELIS – Multimedia Lab, Belgium {ruben.verborgh,rik.vandewalle}@ugent.be 3 EURECOM, Sophia Antipolis, France [email protected] Abstract. In May 2012, the Web search engine Google has introduced the so-called Knowledge Graph, a graph that understands real-world entities and their relationships to one another. Entities covered by the Knowledge Graph include landmarks, celebrities, cities, sports teams, buildings, movies, celestial objects, works of art, and more. The graph enhances Google search in three main ways: by disambiguation of search queries, by search log-based summarization of key facts, and by explo- rative search suggestions. With this paper, we suggest a fourth way of enhancing Web search: through the addition of realtime coverage of what people say about real-world entities on social networks. We report on a browser extension that seamlessly adds relevant microposts from the social networking sites Google+, Facebook, and Twitter in form of a panel to Knowledge Graph entities. In a true Linked Data fashion, we inter- link detected concepts in microposts with Freebase entities, and evaluate our approach for both relevancy and usefulness. The extension is freely available, we invite the reader to reconstruct the examples of this paper to see how realtime opinions may have changed since time of writing. 1 Introduction With the introduction of the Knowledge Graph, the search engine Google has made a significant paradigm shift towards “things, not strings” [3], as a post on the official Google blog states.
    [Show full text]
  • Question Answering Benchmarks for Wikidata Dennis Diefenbach, Thomas Tanon, Kamal Singh, Pierre Maret
    Question Answering Benchmarks for Wikidata Dennis Diefenbach, Thomas Tanon, Kamal Singh, Pierre Maret To cite this version: Dennis Diefenbach, Thomas Tanon, Kamal Singh, Pierre Maret. Question Answering Benchmarks for Wikidata. ISWC 2017, Oct 2017, Vienne, Austria. hal-01637141 HAL Id: hal-01637141 https://hal.archives-ouvertes.fr/hal-01637141 Submitted on 17 Nov 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Question Answering Benchmarks for Wikidata Dennis Diefenbach1, Thomas Pellissier Tanon2, Kamal Singh1, Pierre Maret1 1 Universit´ede Lyon, CNRS UMR 5516 Laboratoire Hubert Curien fdennis.diefenbach,kamal.singh,[email protected] 2 Universit´ede Lyon, ENS de Lyon, Inria, CNRS, Universit´eClaude-Bernard Lyon 1, LIP. [email protected] Abstract. Wikidata is becoming an increasingly important knowledge base whose usage is spreading in the research community. However, most question answering systems evaluation datasets rely on Freebase or DBpedia. We present two new datasets in order to train and benchmark QA systems over Wikidata. The first is a translation of the popular SimpleQuestions dataset to Wikidata, the second is a dataset created by collecting user feedbacks. Keywords: Wikidata, Question Answering Datasets, SimpleQuestions, WebQuestions, QALD, SimpleQuestionsWikidata, WDAquaCore0Questions 1 Introduction Wikidata is becoming an increasingly important knowledge base.
    [Show full text]
  • Farewell Freebase: Migrating the Simplequestions Dataset to Dbpedia
    Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia Michael Azmy,∗ Peng Shi,∗ Jimmy Lin, and Ihab F. Ilyas David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada fmwazmy,p8shi,jimmylin,[email protected] Abstract Question answering over knowledge graphs is an important problem of interest both commer- cially and academically. There is substantial interest in the class of natural language questions that can be answered via the lookup of a single fact, driven by the availability of the popular SIMPLEQUESTIONS dataset. The problem with this dataset, however, is that answer triples are provided from Freebase, which has been defunct for several years. As a result, it is difficult to build “real-world” question answering systems that are operationally deployable. Furthermore, a defunct knowledge graph means that much of the infrastructure for querying, browsing, and ma- nipulating triples no longer exists. To address this problem, we present SIMPLEDBPEDIAQA, a new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SIMPLEQUESTIONS entities and predicates from Freebase to DBpedia. Although this mapping is conceptually straightforward, there are a number of nuances that make the task non-trivial, owing to the different conceptual organizations of the two knowledge graphs. To lay the foundation for future research using this dataset, we leverage recent work to provide simple yet strong baselines with and without neural networks. 1 Introduction Question answering over knowledge graphs is an important problem at the intersection of multiple re- search communities, with many commercial deployments. To ensure continued progress, it is important that open and relevant benchmarks are available to support the comparison of various techniques.
    [Show full text]
  • Wikigraphs: a Wikipedia Text - Knowledge Graph Paired Dataset
    WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset Luyu Wang* and Yujia Li* and Ozlem Aslan and Oriol Vinyals *Equal contribution DeepMind, London, UK {luyuwang,yujiali,ozlema,vinyals}@google.com Freebase Paul Hewson Abstract WikiText-103 Bono type.object.name Where the streets key/wikipedia.en “Where the Streets Have have no name We present a new dataset of Wikipedia articles No Name” is a song by ns/m.01vswwx Irish rock band U2. It is the ns/m.02h40lc people.person.date of birth each paired with a knowledge graph, to facili- opening track from their key/wikipedia.en 1987 album The Joshua ns/music.composition.language music.composer.compositions Tree and was released as tate the research in conditional text generation, 1960-05-10 the album’s third single in ns/m.06t2j0 August 1987. The song ’s ns/music.composition.form graph generation and graph representation 1961-08-08 hook is a repeating guitar ns/m.074ft arpeggio using a delay music.composer.compositions people.person.date of birth learning. Existing graph-text paired datasets effect, played during the key/wikipedia.en song’s introduction and ns/m.01vswx5 again at the end. Lead David Howell typically contain small graphs and short text Evans, better vocalist Bono wrote... Song key/wikipedia.en known by his ns/common.topic.description stage name The (1 or few sentences), thus limiting the capabil- Edge, is a people.person.height meters The Edge British-born Irish musician, ities of the models that can be learned on the songwriter... 1.77m data.
    [Show full text]
  • Semantic Networks for Engineering Design: a Survey
    Semantic Networks for Engineering Design: A Survey A Preprint Ji Han Serhad Sarica School of Engineering Data-Driven Innovation Lab University of Liverpool Singapore University of Technology and Design United Kingdom Singapore, 487372 [email protected] [email protected] Feng Shi Jianxi Luo Amazon Web Services Data-Driven Innovation Lab United Kingdom Singapore University of Technology and Design [email protected] Singapore, 487372 [email protected] 12th December, 2020 Abstract There have been growing uses of semantic networks in the past decade, such as leveraging large- scale pre-trained graph knowledge databases for various natural language processing (NLP) tasks in engineering design research. Therefore, the paper provides a survey of the research that has employed semantic networks in the engineering design research community. The survey reveals that engineering design researchers have primarily relied on WordNet, ConceptNet, and other common- sense semantic network databases trained on non-engineering data sources to develop methods or tools for engineering design. Meanwhile, there are emerging efforts to mine large scale technical publication and patent databases to construct engineering-contextualized semantic network databases, e.g., B-Link and TechNet, to support NLP in engineering design. On this basis, we recommend future research directions for the construction and applications of engineering-related semantic networks in engineering design research and practice. Keywords: Semantic data processing, Knowledge management, Design informatics, Semantic network, Knowledge base 1 1 INTRODUCTION Design knowledge retrieval, representation, and management are considered significant activities in engineering design because design is essentially a knowledge-intensive process (Bertola and Teixeira, 2003; Chandrasegaran et al., 2013).
    [Show full text]
  • Semantic Web Technologies Public Knowledge Graphs
    Semantic Web Technologies Public Knowledge Graphs Heiko Paulheim Previously on “Semantic Web Technologies” • Linked Open Data – We know the principles – We have seen examples for some datasets • Today – A closer look on actual examples – Some useful, large-scale resources 10/30/20 Heiko Paulheim 2 Growing Interest in Knowledge Graphs Sources: Google 10/30/20 Heiko Paulheim 3 Introduction • Knowledge Graphs on the Web • Everybody talks about them, but what is a Knowledge Graph? – I don’t have a definition either... Journal Paper Review, (Natasha Noy, Google, June 2015): “Please define what a knowledge graph is – and what it is not.” 10/30/20 Heiko Paulheim 4 Definitions • Knowledge graphs could be envisaged as a network of all kind things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets. (Blumauer, 2014) • We define a Knowledge Graph as an RDF graph. (Färber and Rettinger, 2015) • Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities. (Kroetsch and Weikum, 2016) • [...] systems exist, [...], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph. (Pujara et al., 2013) Ehrlinger and Wöß: Towards a Definition of Knowledge Graphs. 2016 10/30/20 Heiko Paulheim 5 Definitions • My working definition: a Knowledge Graph – mainly describes instances and their relations in a graph • Unlike an ontology • Unlike, e.g., WordNet – Defines possible classes and relations in a schema or ontology • Unlike schema-free output of some IE tools – Allows for interlinking arbitrary entities with each other • Unlike a relational database – Covers various domains • Unlike, e.g., Geonames Paulheim: Knowledge graph refinement: A survey of approaches and evaluation methods, 2017.
    [Show full text]
  • OK Google, What Is Your Ontology? Or: Exploring Freebase Classification to Understand Google’S Knowledge Graph
    OK Google, What Is Your Ontology? Or: Exploring Freebase Classification to Understand Google’s Knowledge Graph Niel Chah a;1 a University of Toronto, Faculty of Information Abstract. This paper reconstructs the Freebase data dumps to understand the under- lying ontology behind Google’s semantic search feature. The Freebase knowledge base was a major Semantic Web and linked data technology that was acquired by Google in 2010 to support the Google Knowledge Graph, the backend for Google search results that include structured answers to queries instead of a series of links to external resources. After its shutdown in 2016, Freebase is contained in a data dump of 1.9 billion Resource Description Format (RDF) triples. A recomposition of the Freebase ontology will be analyzed in relation to concepts and insights from the literature on classification by Bowker and Star. This paper will explore how the Freebase ontology is shaped by many of the forces that also shape classifica- tion systems through a deep dive into the ontology and a small correlational study. These findings will provide a glimpse into the proprietary blackbox Knowledge Graph and what is meant by Google’s mission to “organize the world’s information and make it universally accessible and useful”. Keywords. ontology, Google, Freebase, Knowledge Graph, classification 1. Introduction This paper’s title is a riff on a key phrase “OK, Google” that is uttered by users to trigger Google’s virtual assistant services. Through products such as the Google Home appli- arXiv:1805.03885v2 [cs.IR] 22 May 2018 ance and a mobile application, the Google Assistant taps into the structured data in the proprietary Google Knowledge Graph (KG) to answer queries like “What is the capital of Canada?”.
    [Show full text]