Knowledge Graph Construction and Applications for Web Search and Beyond
Recommended publications
- What Do Wikidata and Wikipedia Have in Common? An Analysis of Their Use of External References
Alessandro Piscopo, Pavlos Vougiouklis, Lucie-Aimée Kaffee, Christopher Phethean, Jonathon Hare, and Elena Simperl (University of Southampton, United Kingdom).
Abstract: Wikidata is a community-driven knowledge graph, strongly linked to Wikipedia. However, the connection between the two projects has been sporadically explored. We investigated the relationship between the two projects in terms of the information they contain by looking at their external references. Our findings show that while only a small number of sources is directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use less Anglo-American-centred sources. These results [...]
From the introduction: [...] one hundred thousand. These users have gathered facts about around 24 million entities and are able, at least theoretically, to further expand the coverage of the knowledge graph and continuously keep it updated and correct. This is an advantage compared to a project like DBpedia, where data is periodically extracted from Wikipedia and must first be modified on the online encyclopedia in order to be corrected. Another strength is that all the data in Wikidata can be openly reused and shared without requiring any attribution, as it is released under a CC0 licence.
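As an illustration of the kind of reference comparison the abstract describes (a minimal sketch, not the authors' actual pipeline), one can reduce each project's reference URLs to their domains and measure both direct URL reuse and domain-level overlap. The URL lists below are invented for illustration:

```python
from urllib.parse import urlparse

def domains(urls):
    """Reduce reference URLs to host names, dropping a leading 'www.'."""
    return {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}

# Hypothetical reference lists; in the study these would be harvested from
# Wikidata reference statements and Wikipedia citation templates.
wikidata_refs = [
    "https://viaf.org/viaf/113230702",
    "https://www.nytimes.com/2016/11/09/us/politics/election.html",
]
wikipedia_refs = [
    "https://www.nytimes.com/2016/11/09/us/politics/election.html",
    "https://www.bbc.co.uk/news/election-us-2016-37920175",
]

shared_urls = set(wikidata_refs) & set(wikipedia_refs)             # direct reuse
shared_domains = domains(wikidata_refs) & domains(wikipedia_refs)  # same-domain overlap
jaccard = len(shared_domains) / len(domains(wikidata_refs) | domains(wikipedia_refs))

print(f"directly reused URLs: {len(shared_urls)}")
print(f"shared domains: {sorted(shared_domains)}, Jaccard: {jaccard:.2f}")
```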
- A Comparative Evaluation of Geospatial Semantic Web Frameworks for Cultural Heritage
Ikrom Nishanbaev, Erik Champion, and David A. McMeekin (Curtin University, Perth, Australia, and affiliated institutions). Heritage; received 14 July 2020, accepted 4 August 2020, published 12 August 2020.
Abstract: Recently, many Resource Description Framework (RDF) data generation tools have been developed to convert geospatial and non-geospatial data into RDF data. Furthermore, there are several interlinking frameworks that find semantically equivalent geospatial resources in related RDF data sources. However, many existing Linked Open Data sources are currently sparsely interlinked. Also, many RDF generation and interlinking frameworks require a solid knowledge of Semantic Web and Geospatial Semantic Web concepts to successfully deploy them. This article comparatively evaluates features and functionality of the current state-of-the-art geospatial RDF generation tools and interlinking frameworks. This evaluation is specifically performed for cultural heritage researchers and professionals who have limited expertise in computer programming. Hence, a set of criteria has been defined to facilitate the selection of tools and frameworks.
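To make concrete what the surveyed RDF generation tools produce, here is a minimal sketch using Python's rdflib and the GeoSPARQL vocabulary. The geo: namespace and the asWKT/hasGeometry properties are the standard GeoSPARQL terms; the ex: base IRI and the HeritageSite class are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
EX = Namespace("http://example.org/heritage/")  # hypothetical base IRI

g = Graph()
g.bind("geo", GEO)
g.bind("ex", EX)

site = EX["colosseum"]
geometry = EX["colosseum/geom"]

g.add((site, RDF.type, EX.HeritageSite))
g.add((site, RDFS.label, Literal("Colosseum")))
g.add((site, GEO.hasGeometry, geometry))
# A Well-Known Text point, typed with GeoSPARQL's wktLiteral datatype
g.add((geometry, GEO.asWKT,
       Literal("POINT(12.4924 41.8902)", datatype=GEO.wktLiteral)))

print(g.serialize(format="turtle"))
```

Interlinking frameworks would then try to match ex:colosseum against semantically equivalent resources in other Linked Open Data sources.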
- Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to Boost Hospitalization Prediction
Raphaël Gazzotti, Catherine Faron-Zucker, Fabien Gandon, Virginie Lacroix-Hugues, and David Darmon (Université Côte d'Azur, Inria, CNRS, I3S, Sophia Antipolis, France). In: SAC 2020, 35th ACM/SIGAPP Symposium on Applied Computing, March 2020, Brno, Czech Republic. DOI: 10.1145/3341105.3373932. HAL Id: hal-02389918, https://hal.archives-ouvertes.fr/hal-02389918 (deposited 16 December 2019).
The paper studies the injection of automatically selected DBpedia subjects into electronic medical records to boost hospitalization prediction.
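A minimal sketch of the enrichment idea, assuming DBpedia's public SPARQL endpoint and the dct:subject property (both real); the bag-of-words EMR dictionary is a toy stand-in for the paper's actual vector representations:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def dbpedia_subjects(resource_iri):
    """Fetch the dct:subject categories of a DBpedia resource."""
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?subject
        WHERE {{ <{resource_iri}> <http://purl.org/dc/terms/subject> ?subject }}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["subject"]["value"] for b in results["results"]["bindings"]]

# Hypothetical use: enrich a bag-of-words EMR representation with KG subjects.
emr_features = {"chest pain": 1, "aspirin": 1}
for subj in dbpedia_subjects("http://dbpedia.org/resource/Aspirin"):
    emr_features[subj] = 1  # each selected subject becomes an extra binary feature
```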
- How to Keep a Knowledge Base Synchronized with Its Encyclopedia Source
Jiaqing Liang, Sheng Zhang, and Yanghua Xiao (School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University; Shuyan Technology; Shanghai Internet Big Data Engineering Technology Research Center; Xiaoi Research). Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17).
Abstract: Knowledge bases are playing an increasingly important role in many real-world applications. However, most of these knowledge bases tend to be outdated, which limits the utility of these knowledge bases. In this paper, we investigate how to keep the freshness of the knowledge base by synchronizing it with its data source (usually encyclopedia websites). A direct solution is revisiting the whole encyclopedia periodically and rerunning the entire pipeline of the construction of the knowledge base, like most existing methods.
From the introduction: However, most of these knowledge bases tend to be outdated, which limits their utility. For example, in many knowledge bases, Donald Trump is only a businessman even after the inauguration. Obviously, it is important to let machines know that Donald Trump is the president of the United States so that they can understand that the topic of an article mentioning Donald Trump is probably related to politics. Moreover, new entities are continuously emerging, and most of them are popular, such as iPhone 8. However, it is hard for a knowledge base to cover these entities in time even if the encyclopedia websites have already covered them.
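The alternative to periodic full re-crawls can be illustrated with a small sketch: spend a fixed crawl budget on the pages most likely to have changed since the last visit. The paper learns a predictive update model; the sketch below substitutes a simple historical change-rate estimate, and all names and the data layout are hypothetical:

```python
import heapq
import time

def change_rate(edit_timestamps):
    """Estimate updates per day from a page's past edit timestamps (epoch seconds)."""
    if len(edit_timestamps) < 2:
        return 0.0
    span_days = (max(edit_timestamps) - min(edit_timestamps)) / 86400
    return (len(edit_timestamps) - 1) / max(span_days, 1.0)

def pick_pages_to_revisit(pages, budget):
    """Spend a limited crawl budget on the pages most likely to be stale.

    `pages` maps a page title to (last_visit_epoch, past_edit_timestamps).
    Priority = estimated change rate x time elapsed since our last visit.
    """
    now = time.time()
    scored = [
        (change_rate(edits) * (now - last_visit), title)
        for title, (last_visit, edits) in pages.items()
    ]
    return [title for _, title in heapq.nlargest(budget, scored)]
```

Under this scheme, a frequently edited page that has not been visited for a while is fetched first, while rarely changing pages consume almost no budget.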
- One Knowledge Graph to Rule Them All? Analyzing the Differences Between DBpedia, YAGO, Wikidata & Co.
Daniel Ringler and Heiko Paulheim (University of Mannheim, Data and Web Science Group).
Abstract: Public Knowledge Graphs (KGs) on the Web are considered a valuable asset for developing intelligent applications. They contain general knowledge which can be used, e.g., for improving data analytics tools, text processing pipelines, or recommender systems. While the large players, e.g., DBpedia, YAGO, or Wikidata, are often considered similar in nature and coverage, there are, in fact, quite a few differences. In this paper, we quantify those differences, and identify the overlapping and the complementary parts of public KGs. From those considerations, we can conclude that the KGs are hardly interchangeable, and that each of them has its strengths and weaknesses when it comes to applications in different domains.
From Section 1, "Knowledge Graphs on the Web": The term "Knowledge Graph" was coined by Google when they introduced their knowledge graph as a backbone of a new Web search strategy in 2012, i.e., moving from pure text processing to a more symbolic representation of knowledge, using the slogan "things, not strings". Various public knowledge graphs are available on the Web, including DBpedia [3] and YAGO [9], both of which are created by extracting information from Wikipedia (the latter exploiting WordNet on top), the community-edited Wikidata [10], which imports other datasets, e.g., from national libraries, as well as from the discontinued Freebase [7], the expert-curated OpenCyc [4], and NELL [1], which exploits pattern-based knowledge extraction from a large Web corpus.
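The overlap analysis the abstract describes can be sketched as set arithmetic over identity links between two KGs. The sketch below is a simplified stand-in for the paper's method, with invented entity identifiers; note that because interlinks such as owl:sameAs are incomplete in practice, the measured overlap is only a lower bound:

```python
def overlap_report(kg_a, kg_b, same_as):
    """Compare two KGs' entity sets through an identity-link mapping.

    kg_a, kg_b: sets of entity identifiers; same_as: dict mapping
    identifiers in kg_a to their equivalents in kg_b (e.g., harvested
    from owl:sameAs links or Wikidata sitelinks).
    """
    linked_a = {a for a in kg_a if same_as.get(a) in kg_b}
    return {
        "overlap": len(linked_a),
        "only_in_a": len(kg_a) - len(linked_a),
        "only_in_b": len(kg_b) - len({same_as[a] for a in linked_a}),
    }

# Tiny hypothetical example: three DBpedia entities, two mapped into Wikidata.
dbpedia = {"dbr:Berlin", "dbr:Paris", "dbr:Mannheim"}
wikidata = {"wd:Q64", "wd:Q90"}
links = {"dbr:Berlin": "wd:Q64", "dbr:Paris": "wd:Q90"}
print(overlap_report(dbpedia, wikidata, links))
# -> {'overlap': 2, 'only_in_a': 1, 'only_in_b': 0}
```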
- Wikipedia Knowledge Graph with DeepDive
Thomas Palomares, Youssef Ahres, Juhana Kangaspunta, and Christopher Ré. The Workshops of the Tenth International AAAI Conference on Web and Social Media (Wiki), Technical Report WS-16-17.
Abstract: Despite the tremendous amount of information on Wikipedia, only a very small amount is structured. Most of the information is embedded in unstructured text and extracting it is a non-trivial challenge. In this paper, we propose a full pipeline built on top of DeepDive to successfully extract meaningful relations from the Wikipedia text corpus. We evaluated the system by extracting company-founders and family relations from the text. As a result, we extracted more than 140,000 distinct relations with an average precision above 90%.
The paper is organized as follows: first, it reviews related work and gives a general overview of DeepDive; second, starting from the data preprocessing, it details the general methodology used; it then presents two applications that follow this pipeline, along with their specific challenges and solutions; finally, it reports the results of these applications and discusses the next steps to continue populating Wikidata and to improve the current system to extract more relations with a high precision.
From the background: until recently, populating large knowledge bases relied on direct contributions from human volunteers as well as on the integration of existing repositories such as Wikipedia infoboxes. These methods are limited by the available structured data and by human power.
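DeepDive itself expresses extraction declaratively over a factor graph and learns feature weights from distant supervision; a drastically simplified, pattern-only stand-in for the candidate-generation step of the company-founder application might look like the sketch below (the pattern and the sentences are invented, and a real system would combine many such noisy signals rather than trust one pattern):

```python
import re

# One hypothetical surface pattern for "<Company> was founded by <Person>".
FOUNDER_PATTERN = re.compile(
    r"(?P<company>[A-Z][\w&.]*(?:\s+[A-Z][\w&.]*)*)\s+was founded by\s+"
    r"(?P<founder>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)

def extract_founder_candidates(sentences):
    """Yield (company, founder) candidate pairs from raw sentences."""
    for sentence in sentences:
        for m in FOUNDER_PATTERN.finditer(sentence):
            yield m.group("company"), m.group("founder")

sentences = ["Apple was founded by Steve Jobs in Cupertino."]
print(list(extract_founder_candidates(sentences)))
# -> [('Apple', 'Steve Jobs')]
```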
- Learning Ontologies from RDF Annotations
Alexandre Delteil, Catherine Faron-Zucker, and Rose Dieng (ACACIA project, INRIA, Sophia Antipolis, France).
Abstract: In this paper, we present a method for learning ontologies from RDF annotations of Web resources by systematically generating the most specific generalization of all the possible sets of resources. The preliminary step of our method consists in extracting (partial) resource descriptions from the whole RDF graph gathering all the annotations. In order to deal with algorithmic complexity, we incrementally build the ontology by gradually increasing the size of the resource descriptions we consider.
From the introduction: The Semantic Web, expected to be the next step that will lead the Web to its full potential, will be based on semantic metadata describing all kinds of Web resources. RDF is the emerging Web standard for annotating resources. Since all RDF annotations are gathered inside a common RDF graph, the problem which arises is the extraction of a description for a given resource from the whole RDF graph. After a brief description of the RDF data model (Section 2) and of RDF Schema (Section 3), Section 4 presents several criteria for extracting partial resource descriptions. In order to deal with the intrinsic complexity of building a generalization hierarchy, we propose an incremental approach, gradually increasing the size of the descriptions we consider. The principle of the approach is explained in Section 5 and detailed more deeply in Section 6.
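A toy rendering of the core operation, under the simplifying assumption that descriptions are restricted to depth 1: the most specific generalization of a set of resources is then just the intersection of their (property, value) descriptions. The triples are invented, and the paper's method additionally generalizes the values themselves and grows the description depth incrementally:

```python
def description(graph, resource):
    """Depth-1 resource description: the set of (property, value) pairs.

    `graph` is a set of (subject, property, value) triples; larger depths
    would recurse into the values, which is where the combinatorics arise.
    """
    return {(p, o) for (s, p, o) in graph if s == resource}

def most_specific_generalization(graph, resources):
    """The most specific description shared by all given resources:
    the intersection of their individual descriptions."""
    descriptions = [description(graph, r) for r in resources]
    return set.intersection(*descriptions) if descriptions else set()

triples = {
    ("doc1", "dc:creator", "Rose"), ("doc1", "rdf:type", "Report"),
    ("doc2", "dc:creator", "Rose"), ("doc2", "rdf:type", "Article"),
}
# Shared description -> a candidate concept "things created by Rose"
print(most_specific_generalization(triples, ["doc1", "doc2"]))
# -> {('dc:creator', 'Rose')}
```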
- Knowledge Graphs on the Web – An Overview
Nicolas Heist, Sven Hertling, Daniel Ringler, and Heiko Paulheim (Data and Web Science Group, University of Mannheim, Germany). January 2020, arXiv:2003.00719 [cs.AI].
Abstract: Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowledge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chapter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap.
Keywords: Knowledge Graph, Linked Data, Semantic Web, Profiling
From the introduction: Knowledge Graphs are increasingly used as means to represent knowledge. Due to their versatile means of representation, they can be used to integrate different heterogeneous data sources, both within as well as across organizations [8,9]. Besides such domain-specific knowledge graphs, which are typically developed for specific domains and/or use cases, there are also public, cross-domain knowledge graphs encoding common knowledge, such as DBpedia, Wikidata, or YAGO [33]. Such knowledge graphs may be used, e.g., for automatically enriching data with background knowledge to be used in knowledge-intensive downstream applications.
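Profiling numbers such as instance counts can be reproduced against the public endpoints the chapter covers. A minimal sketch against Wikidata's SPARQL service (the endpoint URL is the real public one; Q5 and Q515 are Wikidata's classes for human and city):

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def count_instances(class_qid):
    """Count direct instances (wdt:P31) of a Wikidata class."""
    query = f"SELECT (COUNT(?x) AS ?n) WHERE {{ ?x wdt:P31 wd:{class_qid} }}"
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "kg-profiling-sketch/0.1"},  # courtesy UA string
        timeout=60,
    )
    resp.raise_for_status()
    return int(resp.json()["results"]["bindings"][0]["n"]["value"])

for qid in ["Q5", "Q515"]:  # Q5 = human, Q515 = city
    print(qid, count_instances(qid))
```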
- A Question Answering System Using Graph-Pattern Association Rules (QAGPAR) on YAGO Knowledge Base
Wahyudi (Faculty of Information Technology, University of Andalas, Padang, Indonesia; School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia), Masayu Leylia Khodra, Ary Setijadi Prihatmanto, and Carmadi Machbub (School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia).
Abstract: A question answering system (QA system) was developed that uses graph-pattern association rules on the YAGO knowledge base. The answer, as output of the system, is provided based on a user question as input. If the answer is missing or unavailable in the database, then graph-pattern association rules are used to get the answer. The architecture of this question answering system is as follows: question classification, graph component generation, query generation, and query processing. The question answering system uses association graph patterns in a waterfall model. In this paper, the architecture of the system is described, specifically discussing its reasoning and performance capabilities. The result of this research is that rules with high confidence and correct logic produce correct answers, and vice versa.
ACM CCS (2012) Classification: Computing methodologies → Artificial intelligence → Knowledge representation and reasoning → Reasoning about belief and knowledge
Keywords: graph pattern, question answering system, system architecture
From the introduction: Knowledge bases provide information about a large variety of entities, such as people, countries, rivers, etc. Moreover, knowledge bases also contain facts related to these entities, e.g., who lives where, or which river is located in which city [1].
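The four-stage waterfall architecture can be sketched as a pipeline of stubs. Everything below, including the heuristics, the pattern shape, and the rule-based fallback, is a hypothetical stand-in for the paper's components, meant only to show how the stages hand off to each other:

```python
def classify_question(question: str) -> str:
    """Map a question to a coarse expected answer type (toy heuristic)."""
    return "person" if question.lower().startswith("who") else "entity"

def generate_graph_pattern(question: str, answer_type: str) -> dict:
    """Build a triple pattern with a variable in the answer slot."""
    # Hypothetical YAGO-style pattern for "Who was born in Ulm?"
    return {"s": "?answer", "p": "wasBornIn", "o": "Ulm", "type": answer_type}

def generate_query(pattern: dict) -> str:
    return (f"SELECT {pattern['s']} WHERE "
            f"{{ {pattern['s']} {pattern['p']} {pattern['o']} }}")

def process_query(query: str) -> list:
    """Run the query against the knowledge base; on an empty result,
    fall back to answers inferred from graph-pattern association rules."""
    kb_answers: list = []  # stand-in for a real YAGO lookup
    if not kb_answers:
        kb_answers = ["answer inferred via association rules"]
    return kb_answers

q = "Who was born in Ulm?"
print(process_query(generate_query(generate_graph_pattern(q, classify_question(q)))))
```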
- Knowledge Graph Identification
Jay Pujara, Hui Miao, Lise Getoor (Dept. of Computer Science, University of Maryland, College Park, MD), and William Cohen (Machine Learning Dept., Carnegie Mellon University, Pittsburgh, PA).
Abstract: Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included into a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify co-referent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a synthetic Linked Data corpus derived from the MusicBrainz music community and a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time.
From the introduction: The web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge.
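PSL grounds weighted first-order rules into hinge-loss terms over continuous truth values in [0, 1], using the Łukasiewicz relaxation of conjunction. The sketch below reproduces that arithmetic for two toy rules; the confidences, rule weights, and the Kyrgyzstan example labels are invented, and a real PSL model would minimize the weighted sum of these distances over all truth assignments rather than score a fixed one:

```python
def lukasiewicz_and(*truths):
    """Łukasiewicz t-norm used by PSL for conjunctions of soft truth values."""
    return max(0.0, sum(truths) - (len(truths) - 1))

def distance_to_satisfaction(body_truth, head_truth):
    """How far a soft rule body -> head is from being satisfied."""
    return max(0.0, body_truth - head_truth)

# Hypothetical soft truth values:
extracted = 0.9    # extractor confidence for the candidate label "country"
label_truth = 0.4  # current soft truth of Lbl(kyrgyzstan, country)
mutex_truth = 0.7  # soft truth of Lbl(kyrgyzstan, bird), incompatible with "country"

# Rule 1 (weight 1.0): CandLbl(e, l) -> Lbl(e, l)
# Rule 2 (weight 2.0): Lbl(e, l1) & Mutex(l1, l2) -> !Lbl(e, l2)
penalty = (1.0 * distance_to_satisfaction(extracted, label_truth)
           + 2.0 * distance_to_satisfaction(
                 lukasiewicz_and(mutex_truth, 1.0),  # Mutex(bird, country) = 1.0
                 1.0 - label_truth))                 # !Lbl(e, country)
print(f"weighted distance to satisfaction: {penalty:.2f}")  # -> 0.70
```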
- YAGO – Yet Another Great Ontology (YAGO: A Large Ontology from Wikipedia and WordNet)
Presentation by Besnik Fetahu (UdS), February 22, 2012, on the paper by Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum.
Outline: 1. Introduction & Background (ontology review, usage of ontologies, related work); 2. The YAGO Model (aims of YAGO, representation models, semantics); 3. Putting it all to work (information extraction approaches, quality control); 4. Evaluation & Examples; 5. Conclusions.
Ontology review: knowledge represented as a set of concepts within a domain, and the relationships between those concepts. In general, an ontology has the following components: entities, relations, domains, rules, axioms, etc.
Word sense disambiguation: Wikipedia's categories, hyperlinks, and disambiguation articles were used to create a dataset of named entities (Bunescu, R., and Pasca, M., 2006).
- Falcon 2.0: An Entity and Relation Linking Tool over Wikidata
Ahmad Sakor (L3S Research Center and TIB, University of Hannover, Germany), Kuldeep Singh (Cerence GmbH and Zerotha Research, Aachen, Germany), Anery Patel (TIB, University of Hannover, Germany), and Maria-Esther Vidal (L3S Research Center and TIB, University of Hannover, Germany). In: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19-23, 2020, Virtual Event, Ireland. DOI: 10.1145/3340531.3412777.
Abstract: The Natural Language Processing (NLP) community has significantly contributed to the solutions for entity and relation recognition from a natural language text, and possibly linking them to proper matches in Knowledge Graphs (KGs). Considering Wikidata as the background KG, there are still limited tools to link knowledge within the text to Wikidata. In this paper, we present Falcon 2.0, the first joint entity and relation linking tool over Wikidata. It receives a short natural language text in the English language and outputs a ranked list of entities and relations annotated with the proper candidates in Wikidata. The candidates are represented by [...]
From the introduction: Entity Linking (EL), also known as Named Entity Disambiguation (NED), is a well-studied research domain for aligning unstructured text to its structured mentions in various knowledge repositories (e.g., Wikipedia, DBpedia [1], Freebase [4], or Wikidata [28]). Entity linking comprises two sub-tasks. The first task is Named Entity Recognition (NER), in which an approach aims to identify entity labels (or surface forms) in an input sentence.
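A sketch of calling the tool as a web service. The endpoint URL, payload shape, and response keys below are assumptions based on the public Falcon 2.0 demo hosted at labs.tib.eu; they are not confirmed by this excerpt, so check the current documentation before relying on them:

```python
import requests

# Assumed endpoint of the public Falcon 2.0 service; "mode=long" is assumed
# to request both entity and relation candidates.
FALCON2_API = "https://labs.tib.eu/falcon/falcon2/api?mode=long"

def link_text(text):
    """Send a short English text and return the raw linking result."""
    resp = requests.post(FALCON2_API, json={"text": text}, timeout=60)
    resp.raise_for_status()
    # Expected (unverified) keys: "entities_wikidata", "relations_wikidata"
    return resp.json()

print(link_text("What is the operating income for Qantas?"))
```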