Knowledge-Based Question Answering As Machine Translation

Total Page:16

File Type:pdf, Size:1020Kb

Knowledge-Based Question Answering As Machine Translation Knowledge-Based Question Answering as Machine Translation Junwei Bao† ∗, Nan Duan‡ , Ming Zhou‡ , Tiejun Zhao† †Harbin Institute of Technology ‡Microsoft Research baojunwei001@gmail.com nanduan, mingzhou @microsoft.com { tjzhao@hit.edu.cn} Abstract 2013; Poon, 2013; Artzi et al., 2013; Kwiatkowski et al., 2013; Berant et al., 2013); Then, the answer- A typical knowledge-based question an- s are retrieved from existing KBs using generated swering (KB-QA) system faces two chal- MRs as queries. lenges: one is to transform natural lan- Unlike existing KB-QA systems which treat se- guage questions into their meaning repre- mantic parsing and answer retrieval as two cas- sentations (MRs); the other is to retrieve caded tasks, this paper presents a unified frame- answers from knowledge bases (KBs) us- work that can integrate semantic parsing into the ing generated MRs. Unlike previous meth- question answering procedure directly. Borrow- ods which treat them in a cascaded man- ing ideas from machine translation (MT), we treat ner, we present a translation-based ap- the QA task as a translation procedure. Like MT, proach to solve these two tasks in one u- CYK parsing is used to parse each input question, nified framework. We translate questions and answers of the span covered by each CYK cel- to answers based on CYK parsing. An- l are considered the translations of that cell; un- swers as translations of the span covered like MT, which uses offline-generated translation by each CYK cell are obtained by a ques- tables to translate source phrases into target trans- tion translation method, which first gener- lations, a semantic parsing-based question trans- ates formal triple queries as MRs for the lation method is used to translate each span into span based on question patterns and re- its answers on-the-fly, based on question patterns lation expressions, and then retrieves an- and relation expressions. The final answers can be swers from a given KB based on triple obtained from the root cell. Derivations generated queries generated. A linear model is de- during such a translation procedure are modeled fined over derivations, and minimum er- by a linear model, and minimum error rate train- ror rate training is used to tune feature ing (MERT) (Och, 2003) is used to tune feature weights based on a set of question-answer weights based on a set of question-answer pairs. pairs. Compared to a KB-QA system us- Figure 1 shows an example: the question direc- ing a state-of-the-art semantic parser, our tor of movie starred by Tom Hanks is translated to method achieves better results. one of its answers Robert Zemeckis by three main director of director of 1 Introduction steps: (i) translate to ; (ii) translate movie starred by Tom Hanks to one of it- Knowledge-based question answering (KB-QA) s answers Forrest Gump; (iii) translate director of computes answers to natural language (NL) ques- Forrest Gump to a final answer Robert Zemeckis. tions based on existing knowledge bases (KBs). Note that the updated question covered by Cell[0, Most previous systems tackle this task in a cas- 6] is obtained by combining the answers to ques- caded manner: First, the input question is trans- tion spans covered by Cell[0, 1] and Cell[2, 6]. formed into its meaning representation (MR) by The contributions of this work are two-fold: (1) an independent semantic parser (Zettlemoyer and We propose a translation-based KB-QA method Collins, 2005; Mooney, 2007; Artzi and Zettle- that integrates semantic parsing and QA in one moyer, 2011; Liang et al., 2011; Cai and Yates, unified framework. The benefit of our method This∗ work was finished while the author was visiting Mi- is that we don’t need to explicitly generate com- crosoft Research Asia. plete semantic structures for input questions. Be- 967 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 967–976, Baltimore, Maryland, USA, June 23-25 2014. c 2014 Association for Computational Linguistics Hanks, Film.Actor.Film, Forrest Gump 6 and (iii) director of Forrest Gump ⟹ Robert Zemeckis i2 Forrest Gump, Film.Film.Director, Robert h Cell[0, 6] (ii) movie starred by Tom Hanks ⟹ Forrest Gump Zemeckis 6 are three ordered formal triples i0 (i) director of ⟹ director of corresponding to the three translation steps in Cell[2, 6] Figure 1. We define the task of transforming question spans into formal triples as question Cell[0, 1] translation. denotes one final answer of . A Q director of movie starred by Tom Hanks h ( ) denotes the ith feature function. • i · λ denotes the feature weight of h ( ). Figure 1: Translation-based KB-QA example • i i · According to the above description, our KB- QA method can be decomposed into four tasks as: sides which, answers generated during the transla- (1) search space generation for ( ); (2) ques- tion procedure help significantly with search space H Q tion translation for transforming question spans in- pruning. (2) We propose a robust method to trans- to their corresponding formal triples; (3) feature form single-relation questions into formal triple design for h ( ); and (4) feature weight tuning for queries as their MRs, which trades off between i · λ . We present details of these four tasks in the transformation accuracy and recall using question { i} following subsections one-by-one. patterns and relation expressions respectively. 2 Translation-Based KB-QA 2.2 Search Space Generation We first present our translation-based KB-QA 2.1 Overview method in Algorithm 1, which is used to generate Formally, given a knowledge base and an N- ( ) for each input NL question . KB L question , our KB-QA method generates a set H Q Q Q of formal triples-answer pairs , as deriva- {hD Ai} Algorithm 1: Translation-based KB-QA tions, which are scored and ranked by the distribu- 1 for l = 1 to do |Q| tion P ( , , ) defined as follows: 2 for all i, j s.t. j i = l do hD Ai|KB Q j − 3 ( i ) = ; H Q ∅ j M 4 T = QT rans( , ); exp i=1 λi hi( , , , ) Qi KB { ·M hD Ai KB Q } 5 foreach formal triple t T do exp λ h ( 0 , 0 , , ) ∈ d 0 , 0 ( P) i=1 i i 6 create a new derivation ; hD A i∈H Q { · hD A i KB Q } 7 d. = t.eobj ; A P P 8 d. = t ; denotes a knowledge base1 that stores a D { } • KB 9 update the model score of d; set of assertions. Each assertion t is in j 10 insert d to ( i ); ID ID ∈ KB H Q the form of esbj, p, eobj , where p denotes 11 end {ID ID} 12 end a predicate, esbj and eobj denote the subject 13 2 end and object entities of t, with unique IDs . 14 for l = 1 to do |Q| 15 for all i, j s.t. j i = l do − ( ) denotes the search space , . 16 for all m s.t. i m < j do ≤ m j •H Q {hD Ai} D 17 for dl ( ) and dr ( ) do is composed of a set of ordered formal triples ∈ H Qi ∈ H Qm+1 18 update = dl. + dr. ; j Q A A t1, ..., tn . Each triple t = esbj, p, eobj i 19 T = QT rans( update, ); { } { } ∈ Q KB denotes an assertion in , where i and 20 foreach formal triple t T do D KB ∈ j denotes the beginning and end indexes of 21 create a new derivation d; 22 d. = t.eobj ; A S S the question span from which t is trans- 23 d. = dl. dr. t ; D D D { } formed. The order of triples in denotes 24 update the model score of d; j D 25 insert d to ( ); the order of translation steps from to . H Qi Q A 26 E.g., director of, Null, director of 1, Tom end h i0 h 27 end 1We use a large scale knowledge base in this paper, which 28 end contains 2.3B entities, 5.5K predicates, and 18B assertions. A 29 end 16-machine cluster is used to host and serve the whole data. 30 end 2 Each KB entity has a unique ID. For the sake of conve- 31 return ( ). nience, we omit the ID information in the rest of the paper. H Q 968 The first half (from Line 1 to Line 13) gen- Algorithm 2: -based Question Translation QP erates a formal triple set T for each unary span 1 T = ; j ∅ , using the question translation method 2 foreach entity mention e do i Q ∈ Q Q ∈ Q j j 3 pattern = replace e in with [Slot]; QT rans( , ) (Line 4), which takes as the Q Q Q i i 4 foreach question pattern do Q KB Q QP input. Each triple t T returned is in the form of 5 if pattern == pattern then ∈ j Q QP 6 = Disambiguate(e , predicate); esbj, p, eobj , where esbj’s mention occurs in i , E Q QP { } Q 7 foreach e do ∈ E p is a predicate that denotes the meaning expressed 8 create a new triple query q; j by the context of e in , e is an answer of 9 q = e, predicate, ? ; sbj i obj { QP } j Q 10 i = AnswerRetrieve(q, ); based on esbj, p and . We describe the im- {A } KB i 11 foreach i do Q KB A ∈ {A } plementation detail of QT rans( ) in Section 2.3. 12 create a new formal triple t; · 13 t = q.esbj , q.p, ; The second half (from Line 14 to Line 31) first { A} 14 t.score = 1.0; updates the content of each bigger span j by con- Qi 15 insert t to T ; catenating the answers to its any two consecutive 16 end j smaller spans covered by (Line 18).
Recommended publications
  • Knowit VQA: Answering Knowledge-Based Questions About Videos
    The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) KnowIT VQA: Answering Knowledge-Based Questions about Videos Noa Garcia,1 Mayu Otani,2 Chenhui Chu,1 Yuta Nakashima1 1Osaka University, Japan, 2CyberAgent, Inc., Japan {noagarcia, chu, n-yuta}@ids.osaka-u.ac.jp otani mayu@cyberagent.co.jp Abstract We propose a novel video understanding task by fusing knowledge-based and video question answering. First, we in- troduce KnowIT VQA, a video dataset with 24,282 human- generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence rea- /HRQDUG+DYH\RXQRWLFHGWKDW+RZDUGFDQWDNHDQ\WRSLFDQGXVHLWWRUHPLQG soning together with knowledge-based questions, which need \RXWKDWKHZHQWWRVSDFH" of the experience obtained from the viewing of the series to be 6KHOGRQ,QWHUHVWLQJK\SRWKHVLV/HW¶VDSSO\WKHVFLHQWLILFPHWKRG answered. Second, we propose a video understanding model /HRQDUG2ND\+H\+RZDUGDQ\WKRXJKWVRQZKHUHZHVKRXOGJHWGLQQHU" by combining the visual and textual video content with spe- +RZDUG$Q\ZKHUHEXWWKH6SDFH6WDWLRQ2QDJRRGGD\GLQQHUZDVDEDJIXOO cific knowledge about the show. Our main findings are: (i) the RIPHDWORDI%XWKH\\RXGRQ¶WJRWKHUHIRUWKHIRRG\RXJRWKHUHIRUWKHYLHZ incorporation of knowledge produces outstanding improve- ments for VQA in video, and (ii) the performance on KnowIT 9LVXDO +RZPDQ\SHRSOHDUHWKHUHZHDULQJJODVVHV"2QH VQA still lags well behind human accuracy, indicating its 7H[WXDO :KRKDVEHHQWRWKHV:KRKDVEHHQWRWKHVSDFH"SDFH" +R+RZDUGZDUG usefulness for studying current video modelling limitations. 7HPSRUDO +RZGRWKH\+RZGRWKH\ILQLVKWKHFRQYHUVDWLRQ" ILQLVKWKHFRQYHUVDWLRQ"66KDNLQJKDQGVKDNLQJKDQGV Introduction .QRZOHGJH :KRRZQVWKHSODFHZKHUHWKH:KRRZQVWKHSODFHZKHUHWKH\DUHVWDQGLQJ"\DUHVWDQGLQJ"66WXDUWWXDUW Visual question answering (VQA) was firstly introduced in (Malinowski and Fritz 2014) as a task for bringing to- Figure 1: Types of questions addressed in KnowIt VQA. gether advancements in natural language processing and image understanding.
    [Show full text]
  • Wikidata As a Knowledge Graph for the Life Sciences
    Washington University School of Medicine Digital Commons@Becker Open Access Publications 2020 Wikidata as a knowledge graph for the life sciences Andra Waagmeester Malachi Griffith Obi L. Griffith et al. Follow this and additional works at: https://digitalcommons.wustl.edu/open_access_pubs FEATURE ARTICLE SCIENCE FORUM Wikidata as a knowledge graph for the life sciences Abstract Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing. ANDRA WAAGMEESTER†, GREGORY STUPP†, SEBASTIAN BURGSTALLER- MUEHLBACHER, BENJAMIN M GOOD, MALACHI GRIFFITH, OBI L GRIFFITH, KRISTINA HANSPERS, HENNING HERMJAKOB, TOBY S HUDSON, KEVIN HYBISKE, SARAH M KEATING, MAGNUS MANSKE, MICHAEL MAYERS, DANIEL MIETCHEN, ELVIRA MITRAKA, ALEXANDER R PICO, TIMOTHY PUTMAN, ANDERS RIUTTA, NURIA QUERALT-ROSINACH, LYNN M SCHRIML, THOMAS SHAFEE, DENISE SLENTER, RALF STEPHAN, KATHERINE THORNTON, GINGER TSUENG, ROGER TU, SABAH UL-HASAN, EGON WILLIGHAGEN, CHUNLEI WU AND ANDREW I SU* Introduction advance efforts by the open-data community to Integrating data and knowledge is a formidable build a rich and heterogeneous network of scien- challenge in biomedical research. Although new tific knowledge. That knowledge network could, scientific findings are being discovered at a in turn, be the foundation for many computa- *For correspondence: asu@ rapid pace, a large proportion of that knowl- tional tools, applications and analyses.
    [Show full text]
  • Knowledge-Based Question Answering As Machine Translation
    Knowledge-Based Question Answering as Machine Translation Junwei Baoy ∗, Nan Duanz , Ming Zhouz , Tiejun Zhaoy Harbin Institute of Technologyy Microsoft Researchz baojunwei001@gmail.com fnanduan, mingzhoug@microsoft.com tjzhao@hit.edu.cn Abstract 2013; Poon, 2013; Artzi et al., 2013; Kwiatkowski et al., 2013; Berant et al., 2013); Then, the answer- A typical knowledge-based question an- s are retrieved from existing KBs using generated swering (KB-QA) system faces two chal- MRs as queries. lenges: one is to transform natural lan- Unlike existing KB-QA systems which treat se- guage questions into their meaning repre- mantic parsing and answer retrieval as two cas- sentations (MRs); the other is to retrieve caded tasks, this paper presents a unified frame- answers from knowledge bases (KBs) us- work that can integrate semantic parsing into the ing generated MRs. Unlike previous meth- question answering procedure directly. Borrow- ods which treat them in a cascaded man- ing ideas from machine translation (MT), we treat ner, we present a translation-based ap- the QA task as a translation procedure. Like MT, proach to solve these two tasks in one u- CYK parsing is used to parse each input question, nified framework. We translate questions and answers of the span covered by each CYK cel- to answers based on CYK parsing. An- l are considered the translations of that cell; un- swers as translations of the span covered like MT, which uses offline-generated translation by each CYK cell are obtained by a ques- tables to translate source phrases into target trans- tion translation method, which first gener- lations, a semantic parsing-based question trans- ates formal triple queries as MRs for the lation method is used to translate each span into span based on question patterns and re- its answers on-the-fly, based on question patterns lation expressions, and then retrieves an- and relation expressions.
    [Show full text]
  • Building Domain-Specific Search Engines for Investigative Decision Support
    Knowledge Graph Completion Introduction and motivation We have our ‘constructed’ knowledge graph, now what? freebase: Seattle 2 Introduction and motivation Problem 1: Wrong/missing triples 3 Introduction and motivation Problem 2: Many nodes refer to the same underlying entity 4 For Web extractions, noise is inevitable • Thousands of web domains • Many page formats • Distracting & irrelevant content • Purposeful obfuscation • Poor grammar & spelling • Tables To reach its potential, a constructed KG must be completed or identified 5 Noise Analysis • Extractors found to offer a collective tradeoff between multiple dimensions • Noise is rarely ‘random’! Glossary Regex Landmark CRF NER Easy to define 4 2 4 4 4 Site coverage All All Short Tail All All Precision 2-3 3-4 4 2-3 3 Recall 3-4 2 1 2 1 6 ENTITY RESOLUTION 7 Definitions and alternate names • Common sense: – Which entities refer to the same thing? • Slightly more formal: – Which mentions (aka records, instances, nodes, surface strings…) refer to the same underlying entity? • Rigorous mathematical/logical definition – Doesn’t exist, or unknown! Just like other hard AI problems... • Why try to solve the problem aka why is it a problem? 8 Applications: A Web of Linked ‘Data’ 9 Applications: Schema.org ▪ Schema.org is an RDF ontology from which triples (with Web- dereferencable URIs) can be embedded in HTML pages http://schema.org/ 10 Applications: Google Knowledge Graph https://developers.google.com/knowledge-graph/ 11 SUB-COMMUNITIES 12 Entity Linking/Canonicalization • Name of an entity
    [Show full text]
  • HOLMES: a Hybrid Ontology-Learning Materials Engineering System
    HOLMES: A Hybrid Ontology-Learning Materials Engineering System Miguel Francisco Miravite Remolona Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2018 © 2018 Miguel Francisco Miravite Remolona All rights reserved ABSTRACT HOLMES: A Hybrid Ontology-Learning Materials Engineering System Miguel Francisco Miravite Remolona Designing and discovering novel materials is challenging problem in many domains such as fuel additives, composites, pharmaceuticals, and so on. At the core of all this are models that capture how the different domain-specific data, information, and knowledge regarding the structures and properties of the materials are related to one another. This dissertation explores the difficult task of developing an artificial intelligence-based knowledge modeling environment, called Hybrid Ontology-Learning Materials Engineering System (HOLMES) that can assist humans in populating a materials science and engineering ontology through automatic information extraction from journal article abstracts. While what we propose may be adapted for a generic materials engineering application, our focus in this thesis is on the needs of the pharmaceutical industry. We develop the Columbia Ontology for Pharmaceutical Engineering (COPE), which is a modification of the Purdue Ontology for Pharmaceutical Engineering. COPE serves as the basis for HOLMES. The HOLMES framework starts with journal articles that are in the Portable Document Format (PDF) and ends with the assignment of the entries in the journal articles into ontologies. While this might seem to be a simple task of information extraction, to fully extract the information such that the ontology is filled as completely and correctly as possible is not easy when considering a fully developed ontology.
    [Show full text]
  • Trusted, Transparent, Actually Intelligent Technology Overview
    Trusted, Transparent, Cyc is a revolutionary AI platform with human reasoning, knowledge, Actually Intelligent and logic at enterprise scale Technology Overview info@cyc.com Contents 1 Cyc is AI that Reasons Logically and Explains Itself Transparently 2 2 There’s More to AI Than Machine Learning 2 3 Neither of those approaches to AI (ML’s and Cyc’s) are new 3 4 So. What Exactly Is Cyc? 5 5 Cyc’s Knowledge Base 6 5.1 Cyc’s Representation is More than Just a Knowledge Graph . ....... 8 5.2 Expressive Knowledge Representation—What It Is and Why It Matters . 9 5.3 The Degrees of Expressivity of a Knowledge RepresentationLanguage . 11 5.4 KnowledgeReuse ................................ 11 5.5 Cyc’s Knowledge is Easy to Customize for Individual Clients........ 12 5.6 Embracing Inconsistency in the Knowledge Base . ....... 12 6 Inference in Cyc 13 6.1 ReasoningEfficiently .............................. 14 7 IntelligentDataSelectionandVirtualDataIntegration 15 8 Cyc Applications 16 8.1 WhenShouldanEnterpriseUseCyc?. 18 9 Natural Language Understanding: The Next Frontier 19 10 Machine Learning or Cyc? Often the Best Answer is “Both” 20 11 Deploying Cyc 22 1 1 Cyc is AI that Reasons Logically and Explains Itself Transpar- ently Cyc® is a differentiated, mature Artificial Intelligence software platform with a suite of vertically-focused products. Unlike Machine Learning (ML)—what most people are referring to when they talk about AI these days—Cyc’s power is rooted in its ability to reason logically. ML harnesses big data to do statistical reasoning, whereas Cyc harnesses knowledge to do causal reasoning. And Cyc works: leaders within Global Fortune 500 enterprises operating in the highly- regulated and highly-competitive healthcare, financial services, and energy industries trust Cyc in applications where their existing investments in ML and data science did not suffice.
    [Show full text]
  • Enhancing Neural Data-To-Text Generation Models with External Background Knowledge
    Enhancing Neural Data-To-Text Generation Models with External Background Knowledge Shuang Chen1,∗ Jinpeng Wang2, Xiaocheng Feng1, Feng Jiang1;3, Bing Qin1, Chin-Yew Lin2 1 Harbin Institute of Technology, Harbin, China 2 Microsoft Research Asia 3 Peng Cheng Laboratory hitercs@gmail.com, fjinpwa, cylg@microsoft.com, fxcfeng, qinbg@ir.hit.edu.cn, fjiang@hit.edu.cn Abstract Infobox Description Recent neural models for data-to-text genera- Nacer Hammami (born December 28, 1980) is an Algerian football player who is tion rely on massive parallel pairs of data and currently playing for MC El Eulma in the text to learn the writing knowledge. They of- Algerian Ligue Professionnelle 1. ten assume that writing knowledge can be ac- quired from the training data alone. However, Entity Linking when people are writing, they not only rely Subject Relation Object Guelma country Algeria on the data but also consider related knowl- (Q609871) (P17) (Q262) edge. In this paper, we enhance neural data-to- defender instance of association football position text models with external knowledge in a sim- (Q336286) (P31) (Q4611891) …… … ple but effective way to improve the fidelity MC El Eulma league Algerian Ligue Professionnelle 1 of generated text. Besides relying on parallel (Q2742749) (P118) (Q647746) data and text as in previous work, our model External Background Knowledge from Wikidata attends to relevant external knowledge, en- Figure 1: An example of generating description from coded as a temporary memory, and combines a Wikipedia infobox. External background knowledge this knowledge with the context representa- expanded from the infobox is helpful for generation. tion of data before generating words.
    [Show full text]
  • Danish in Wikidata Lexemes
    Danish in Wikidata lexemes Finn Arup˚ Nielsen Cognitive Systems, DTU Compute, Technical University of Denmark Kongens Lyngby, Denmark Abstract 2 Wikidata lexemes Wikidata introduced support for lexico- Wikidata stores lexeme data on a new type of pages graphic data in 2018. Here we describe the prefixed with the letter `L' and further identified lexicographic part of Wikidata as well as with an integer, e.g., the Danish lexeme gentagelse experiences with setting up lexemes for the (repetition) has the identifier \L117" and available Danish language. We note various possible for view and edit at https://www.wikidata.org/ annotations for lexemes as well as discuss wiki/Lexeme:L117. On the same page, multiple various choices made. senses and forms for the lexeme may be defined. 1 Introduction They are identified by suffixes to the lexeme iden- tifier, e.g., the plural indefinite form gentagelser Wikipedia's structured sister Wikidata (Vrandeˇci´c would be identified as \L117-F3", while the first and Kr¨otzsch, 2014) at https://www.wikidata. sense|if it was defined|would have been identi- org/ supports interlinking different language ver- fied as \L117-S1". Forms and senses are defined sions of Wikipedia as well as several other Wikime- separately, so it is currently difficult to define a dia sites, such as Wikibooks and Wikimedia Com- specific sense for a specific form. Lexemes, forms mons. One wiki that has been missing from the list and senses may be associated with properties, and is Wiktionary, | the dictionary wiki. Wiktionary these properties are identified with integer prefixed has a structure different from the other wikis as with the letter `P'.
    [Show full text]
  • Leveraging Entity Types and Properties for Knowledge Graph Exploitation
    DEPARTMENT OF INFORMATICS UNIVERSITYOF FRIBOURG (SWITZERLAND) Leveraging Entity Types and Properties for Knowledge Graph Exploitation THESIS Presented to the Faculty of Science of the University of Fribourg (Switzerland) in consideration for the award of the academic grade of Doctor scientiarum informaticarum by ALBERTO TONON from ITALY Thesis No: 2018 UniPrint 2017 Accepted by the Faculty of Science of the University of Fribourg (Switzerland) upon the recommendation of Prof. Dr. Krisztian Balog and Dr. Gianluca Demartini. Fribourg, May 22, 2017 Thesis supervisor Dean Prof. Dr. Philippe Cudré-Mauroux Prof. Dr. Christian Bochet iii Declaration of Authorship Title: Leveraging Entity Types and Properties for Knowledge Graph Exploitation I, Alberto Tonon, declare that I have authored this thesis independently, without illicit help, that I have not used any other than the declared sources/resources, and that I have explicitly marked all material which has been quoted either literally or by content from the used sources. Signed: Date: v “Educating the mind without educating the heart is no education at all.” — Aristotle Acknowledgments There are many people without whom I would never have achieved this goal. All of them con- tributed in their own way and proved to be essential to my academic growth, to my personal development, and to my mental health. Here I want to thank them all. Thanks to my supervisor, Professor Philippe Cudré-Mauroux. Philippe provided me with inestimable advice and guidance, he also showed great patience, and always understood the issues that each Ph.D. candidate had. One of Philippe’s best achievements is, in my opinion, the creation of a familiar and friendly group where students and researchers can freely interact, provide feedback, collaborate, and have a lot of fun.
    [Show full text]
  • Reading and Reasoning with Knowledge Graphs
    Reading and Reasoning with Knowledge Graphs Matthew Gardner CMU-LTI-15-014 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: Tom Mitchell, Chair William Cohen Christos Faloutsos Antoine Bordes Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy In Language and Information Technologies Copyright c 2015 Matthew Gardner Keywords: Knowledge base inference, machine reading, graph inference To Russell Ball and John Hale Gardner, whose lives instilled in me the desire to pursue a PhD. iv Abstract Much attention has recently been given to the creation of large knowledge bases that contain millions of facts about people, things, and places in the world. These knowledge bases have proven to be incredibly useful for enriching search results, answering factoid questions, and training semantic parsers and relation extractors. The way the knowledge base is actually used in these systems, however, is somewhat shallow—they are treated most often as simple lookup tables, a place to find a factoid answer given a structured query, or to determine whether a sentence should be a positive or negative training example for a relation extraction model. Very little is done in the way of reasoning with these knowledge bases or using them to improve machine reading. This is because typical probabilistic reasoning systems do not scale well to collections of facts as large as modern knowledge bases, and because it is difficult to incorporate information from a knowledge base into typical natural language processing models. In this thesis we present methods for reasoning over very large knowledge bases, and we show how to apply these methods to models of machine reading.
    [Show full text]
  • A Framework for Semantic Similarity Measures to Enhance Knowledge Graph Quality
    A Framework for Semantic Similarity Measures to enhance Knowledge Graph Quality Zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften der Fakultät für Wirtschaftswissenschaften Karlsruher Institut für Technologie (KIT) genehmigte Dissertation von Dipl.-Ing. Ignacio Traverso Ribón Tag der mündlichen Prüfung: 10. August 2017 Referent: Prof. Dr. York Sure-Vetter Korreferent: Prof. Dr. Maria-Esther Vidal Serodio Abstract Precisely determining similarity values among real-world entities becomes a building block for data driven tasks, e.g., ranking, relation discovery or integration. Semantic Web and Linked Data initiatives have promoted the publication of large semi-structured datasets in form of knowledge graphs. Knowledge graphs encode semantics that describes resources in terms of several aspects or resource characteristics, e.g., neighbors, class hierarchies or attributes. Existing similarity measures take into account these aspects in isolation, which may prevent them from delivering accurate similarity values. In this thesis, the relevant resource characteristics to determine accurately similarity values are identified and considered in a cumulative way in a framework of four similarity measures. Additionally, the impact of considering these resource characteristics during the computation of similarity values is analyzed in three data-driven tasks for the enhancement of knowledge graph quality. First, according to the identified resource characteristics, new similarity measures able to combine two or more of them are described. In total four similarity measures are presented in an evolutionary order. While the first three similarity measures, OnSim, IC-OnSim and GADES, combine the resource characteristics according to a human defined aggregation function, the last one, GARUM, makes use of a machine learning regression approach to determine the relevance of each resource characteristic during the computation of the similarity.
    [Show full text]
  • Artificial Intelligence in Education Promises and Implications for Teaching and Learning
    Artificial Intelligence In Education Promises and Implications for Teaching and Learning Wayne Holmes, Maya Bialik, Charles Fadel © Center for Curriculum Redesign – All Rights Reserved The Center for Curriculum Redesign, Boston, MA, 02130 Copyright © 2019 by Center for Curriculum Redesign All rights reserved. Printed in the United States of America Artificial Intelligence in Education Promises and Implications for Teaching and Learning ISBN-13: 978-1-794-29370-0 ISBN-10: 1-794-29370-0 A pdf version with active links is available at: http://bit.ly/AIED- BOOK Keywords: Education technology, EdTech, artificial intelligence, machine learning, deep learning, AI, AIED, curriculum, standards, competencies, competency, CBL, deeper learning, knowledge, skills, character, metacognition, meta-learning, mindset, 21st century education, social- emotional skills, 21st century competencies, education redesign, 21st century curriculum, pedagogy, learning, jobs, employment, employability, eduployment, Education 2030, fourth Industrial Revolution, exponential technology, personalized learning, competency-based learning, individualized learning, adaptive learning. © Center for Curriculum Redesign – All Rights Reserved Introduction: The Context Artificial intelligence (AI) is arguably the driving technological force of the first half of this century, and will transform virtually every industry, if not human endeavors at large.1 Businesses and governments worldwide are pouring enormous sums of money into a very wide array of implementations, and dozens of start-ups are being funded to the tune of billions of dollars. Funding of AI startup companies worldwide, from 2013 to 2017 (in millions of U.S. dollars). Source: Statista2 It would be naive to think that AI will not have an impact on education—au contraire, the possibilities there are profound yet, for the time being, overhyped as well.
    [Show full text]