Towards Knowledge Graph Creation from Thousands of Wikis
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
- One Knowledge Graph to Rule Them All? Analyzing the Differences Between DBpedia, YAGO, Wikidata & Co.
One Knowledge Graph to Rule them All? Analyzing the Differences between DBpedia, YAGO, Wikidata & Co. Daniel Ringler and Heiko Paulheim, University of Mannheim, Data and Web Science Group. Abstract: Public Knowledge Graphs (KGs) on the Web are considered a valuable asset for developing intelligent applications. They contain general knowledge which can be used, e.g., for improving data analytics tools, text processing pipelines, or recommender systems. While the large players, e.g., DBpedia, YAGO, or Wikidata, are often considered similar in nature and coverage, there are, in fact, quite a few differences. In this paper, we quantify those differences, and identify the overlapping and the complementary parts of public KGs. From those considerations, we can conclude that the KGs are hardly interchangeable, and that each of them has its strengths and weaknesses when it comes to applications in different domains. 1 Knowledge Graphs on the Web: The term "Knowledge Graph" was coined by Google when they introduced their knowledge graph as a backbone of a new Web search strategy in 2012, i.e., moving from pure text processing to a more symbolic representation of knowledge, using the slogan "things, not strings". Various public knowledge graphs are available on the Web, including DBpedia [3] and YAGO [9], both of which are created by extracting information from Wikipedia (the latter exploiting WordNet on top), the community-edited Wikidata [10], which imports other datasets, e.g., from national libraries, as well as from the discontinued Freebase [7], the expert-curated OpenCyc [4], and NELL [1], which exploits pattern-based knowledge extraction from a large Web corpus.
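As a loose illustration of how overlap between public KGs can be probed, the sketch below counts how many entities in a local N-Triples excerpt carry owl:sameAs links into another graph. This is not the paper's actual methodology; the rdflib-based approach and the file name dbpedia_excerpt.nt are assumptions made for the example.

```python
# Minimal sketch: estimate how many entities in a KG excerpt are explicitly
# linked to another KG via owl:sameAs, one simple proxy for "overlapping parts".
from rdflib import Graph
from rdflib.namespace import OWL

def sameas_linked_fraction(dump_path: str) -> float:
    """Fraction of subject entities in the dump that have an owl:sameAs link."""
    g = Graph()
    g.parse(dump_path, format="nt")          # hypothetical local N-Triples file
    subjects = {s for s, _, _ in g}
    linked = {s for s, _, _ in g.triples((None, OWL.sameAs, None))}
    return len(linked) / len(subjects) if subjects else 0.0

if __name__ == "__main__":
    print(sameas_linked_fraction("dbpedia_excerpt.nt"))
```

Entities without such links would then hint at complementary coverage; the paper itself performs a far more careful comparison.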
- Wikipedia Knowledge Graph with DeepDive
Wikipedia Knowledge Graph with DeepDive. Thomas Palomares, Youssef Ahres, Juhana Kangaspunta, Christopher Ré. In: The Workshops of the Tenth International AAAI Conference on Web and Social Media, Wiki: Technical Report WS-16-17. Abstract: Despite the tremendous amount of information on Wikipedia, only a very small amount is structured. Most of the information is embedded in unstructured text and extracting it is a non-trivial challenge. In this paper, we propose a full pipeline built on top of DeepDive to successfully extract meaningful relations from the Wikipedia text corpus. We evaluated the system by extracting company-founders and family relations from the text. As a result, we extracted more than 140,000 distinct relations with an average precision above 90%. This paper is organized as follows: first, we review the related work and give a general overview of DeepDive. Second, starting from the data preprocessing, we detail the general methodology used. Then, we detail two applications that follow this pipeline along with their specific challenges and solutions. Finally, we report the results of these applications and discuss the next steps to continue populating Wikidata and improve the current system to extract more relations with a high precision. Introduction: With the perpetual growth of web usage, the amount of unstructured data grows exponentially. Extracting facts and assertions to store them in a structured […] Background & Related Work: Until recently, populating the large knowledge bases relied on direct contributions from human volunteers as well as integration of existing repositories such as Wikipedia info boxes. These methods are limited by the available structured data and by human power.
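The DeepDive pipeline described above learns to score candidate relations; the toy sketch below only illustrates the very first step of such a pipeline (proposing candidate company-founder pairs from sentence patterns). It does not use DeepDive, and the regular expression and example sentence are invented for illustration.

```python
# Toy candidate generation for (company, founder) pairs, far simpler than the
# DeepDive-based pipeline in the paper: one regular expression proposes pairs,
# which a real system would then score with learned features and confidences.
import re

FOUNDER_PATTERN = re.compile(
    r"(?P<company>[A-Z][\w&.\- ]+?) was founded by "
    r"(?P<founder>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)

def extract_founder_candidates(sentences):
    """Yield (company, founder) candidate pairs from plain-text sentences."""
    for sentence in sentences:
        for match in FOUNDER_PATTERN.finditer(sentence):
            yield match.group("company"), match.group("founder")

if __name__ == "__main__":
    demo = ["Acme Corp was founded by Jane Doe in 1999."]   # invented sentence
    print(list(extract_founder_candidates(demo)))           # [('Acme Corp', 'Jane Doe')]
```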
- Knowledge Graphs on the Web – an Overview
Knowledge Graphs on the Web – an Overview. Nicolas Heist, Sven Hertling, Daniel Ringler, and Heiko Paulheim. Data and Web Science Group, University of Mannheim, Germany. January 2020; arXiv:2003.00719v3 [cs.AI], 12 Mar 2020. Abstract: Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowledge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chapter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap. Keywords: Knowledge Graph, Linked Data, Semantic Web, Profiling. 1. Introduction: Knowledge Graphs are increasingly used as means to represent knowledge. Due to their versatile means of representation, they can be used to integrate different heterogeneous data sources, both within as well as across organizations. [8,9] Besides such domain-specific knowledge graphs which are typically developed for specific domains and/or use cases, there are also public, cross-domain knowledge graphs encoding common knowledge, such as DBpedia, Wikidata, or YAGO. [33] Such knowledge graphs may be used, e.g., for automatically enriching data with background knowledge to be used in knowledge-intensive downstream applications.
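To make the node-and-edge model in this abstract concrete, here is a minimal sketch of a knowledge graph as a set of (subject, predicate, object) triples using rdflib; the example.org entities and relations are invented and do not come from DBpedia or Wikidata.

```python
# Minimal sketch of the node/edge model described above: entities are nodes
# identified by URIs, and every edge is a (subject, predicate, object) triple.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")   # illustrative namespace, not a real KG

g = Graph()
g.add((EX.Mannheim, RDF.type, EX.City))          # edge: Mannheim -type-> City
g.add((EX.Mannheim, EX.locatedIn, EX.Germany))   # edge: Mannheim -locatedIn-> Germany
g.add((EX.Mannheim, RDFS.label, Literal("Mannheim")))

# Traversing the outgoing edges of one node:
for s, p, o in g.triples((EX.Mannheim, None, None)):
    print(s, p, o)
```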
- The Culture of Wikipedia
Good Faith Collaboration: The Culture of Wikipedia. Joseph Michael Reagle Jr., foreword by Lawrence Lessig. The MIT Press, Cambridge, MA. Web edition, Copyright © 2011 by Joseph Michael Reagle Jr., CC-NC-SA 3.0. Wikipedia's style of collaborative production has been lauded, lambasted, and satirized. Despite unease over its implications for the character (and quality) of knowledge, Wikipedia has brought us closer than ever to a realization of the centuries-old pursuit of a universal encyclopedia. Good Faith Collaboration: The Culture of Wikipedia is a rich ethnographic portrayal of Wikipedia's historical roots, collaborative culture, and much-debated legacy. "Reagle offers a compelling case that Wikipedia's most fascinating and unprecedented aspect isn't the encyclopedia itself — rather, it's the collaborative culture that underpins it: brawling, self-reflexive, funny, serious, and full-tilt committed to the project, even if it means setting aside personal differences. Reagle's position as a scholar and a member of the community makes him uniquely situated to describe this culture." —Cory Doctorow, Boing Boing. "Reagle provides ample data regarding the everyday practices and cultural norms of the community which collaborates to produce Wikipedia. His rich research and nuanced appreciation of the complexities of cultural digital media research are well presented."
- Knowledge Graph Identification
Knowledge Graph Identification. Jay Pujara, Hui Miao, Lise Getoor (Dept. of Computer Science, University of Maryland, College Park, MD 20742) and William Cohen (Machine Learning Dept., Carnegie Mellon University, Pittsburgh, PA 15213). Abstract: Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included into a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify co-referent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a synthetic Linked Data corpus derived from the MusicBrainz music community and a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time. 1 Introduction: The web is a vast repository of knowledge, but automatically extracting that knowledge at scale has proven to be a formidable challenge.
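The paper performs joint reasoning with probabilistic soft logic (PSL); the sketch below is a deliberately simplified, hypothetical stand-in that only combines extraction confidences with one ontological constraint (disjoint labels) to decide which candidate facts to keep. The threshold, the toy disjointness set, and the example extractions are assumptions, not the paper's model.

```python
# Drastically simplified stand-in for knowledge graph identification: instead of
# joint PSL inference, candidate (entity, label, confidence) extractions are
# filtered by a confidence threshold and one ontological constraint
# (mutually exclusive labels), keeping the higher-confidence label.
from collections import defaultdict

DISJOINT = {("City", "Person"), ("Person", "City")}   # toy ontology assumption

def identify(candidates, threshold=0.5):
    """Return entity -> list of (label, confidence) facts judged consistent."""
    scores = defaultdict(dict)
    for entity, label, conf in candidates:
        if conf >= threshold:
            scores[entity][label] = max(conf, scores[entity].get(label, 0.0))
    graph = {}
    for entity, labels in scores.items():
        kept = []
        for label, conf in sorted(labels.items(), key=lambda kv: -kv[1]):
            if all((label, other) not in DISJOINT for other, _ in kept):
                kept.append((label, conf))
        graph[entity] = kept
    return graph

if __name__ == "__main__":
    extractions = [("Springfield", "City", 0.9),
                   ("Springfield", "Person", 0.6),
                   ("Springfield", "Person", 0.3)]
    print(identify(extractions))   # {'Springfield': [('City', 0.9)]}
```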
- Google Knowledge Graph, Bing Satori and Wolfram Alpha
An Investigation of the Accuracy of Knowledge Graph-based Search Engines: Google Knowledge Graph, Bing Satori and Wolfram Alpha. Farouk Musa Aliyu and Yusuf Isah Yahaya. International Journal of Scientific & Engineering Research, Volume 12, Issue 1, January 2021, ISSN 2229-5518. Abstract: In this paper, we carried out an investigation on the accuracy of two knowledge-graph-driven search engines (Google Knowledge Graph and Bing's Satori) and a computational knowledge system (Wolfram Alpha). We used a dataset consisting of a list of books and their correct authors and constructed queries that will retrieve the author(s) of a book given the book's name. We evaluate the result from each search engine and measure their precision, recall and F1 score. We also compared the result of these two search engines to the result from the computational knowledge engine (Wolfram Alpha). Our result shows that Google performs better than Bing, while both Google and Bing perform better than Wolfram Alpha. Keywords: Knowledge Graphs, Evaluation, Information Retrieval, Semantic Search Engines. 1 Introduction: Search engines have played a significant role in helping web users find their search needs. Most traditional search engines answer their users by presenting a list of ranked documents which they believe are the most relevant to their […] leveraging knowledge graphs, or how reliable are the results outputted by the semantic search engines? 2. How are the accuracies of semantic search engines as compared with computational knowledge engines?
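Since the evaluation above rests on precision, recall and F1, here is a small, self-contained sketch of how those scores could be computed for a single book query from the set of retrieved authors versus the correct authors. The helper function and the example sets are illustrative and are not taken from the paper's dataset.

```python
# Sketch of the precision/recall/F1 computation described above, applied to the
# authors a search engine returns for one book query versus the correct authors.
def precision_recall_f1(retrieved: set, relevant: set):
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Invented example: authors returned for the query "Good Omens"
retrieved = {"Terry Pratchett", "Neil Gaiman", "Stephen Fry"}   # one spurious hit
relevant = {"Terry Pratchett", "Neil Gaiman"}                   # ground truth
print(precision_recall_f1(retrieved, relevant))                 # approx. (0.667, 1.0, 0.8)
```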
- Teaching in the Fandom Realm Through Literacy: An Approach for English Teachers in Medellín, Colombia
Teaching in the Fandom Realm through Literacy: An Approach for English Teachers in Medellín, Colombia. María Elízabeth Agudelo Ramírez. Universidad Pontificia Bolivariana, Escuela de Educación y Pedagogía, Facultad de Educación, Licenciatura Inglés - Español, Medellín, 2020. Undergraduate thesis submitted for the degree of Licenciatura Inglés - Español. Advisor: Raúl Alberto Mora Vélez, Doctor of Philosophy in Secondary and Continuing Education. Declaration (May 25, 2020): "I declare that this undergraduate thesis has not been previously submitted to obtain a degree, either in the same form or with variations, at this or any other university" (Art. 92, paragraph, Régimen Estudiantil de Formación Avanzada). Acknowledgements: This undergraduate thesis is dedicated to my beloved parents, my brother, my grandmother, and my aunt Rosa. Thanks for your unwavering support. Table of contents: Introduction; Situating the Problem; Conceptual Framework; Approaching Fandom in the ELT Classroom: A Proposal (Group Work: Fandom in the Classroom; Fandom and Wikis; Fanfiction in Fandom; Fandom and Literacy; My Experience as a Fan); Conclusion; Bibliography. Abstract: This critical literature review is part of a larger research geared to people's second languages appropriation, led by the group Literacies in Second Languages Project (LSLP), of which the author is a member. The article first situates the problem of pursuing new approaches that foster students' literacy practices in second languages within (and outside) the classroom.
- How Existing Social Norms Can Help Shape the Next Generation of User-Generated Content
Everything I Need To Know I Learned from Fandom: How Existing Social Norms Can Help Shape the Next Generation of User-Generated Content. Abstract: With the growing popularity of YouTube and other platforms for user-generated content, such as blogs and wikis, copyright holders are increasingly concerned about potential infringing uses of their content. However, when enforcing their copyrights, owners often do not distinguish between direct piracy, such as uploading an entire episode of a television show, and transformative works, such as a fan-made video that incorporates clips from a television show. The line can be a difficult one to draw. However, there is at least one source of user-generated content that has existed for decades and that clearly differentiates itself from piracy: fandom and "fan fiction" writers. This note traces the history of fan communities and the copyright issues associated with fiction that borrows characters and settings that the fan-author did not create. The author discusses established social norms within these communities that developed to deal with copyright issues, such as requirements for non-commercial use and attribution, and how these norms track to Creative Commons licenses. The author argues that widespread use of these licenses, granting copyrighted works "some rights reserved" instead of "all rights reserved," would allow copyright holders to give their consumers some creative freedom in creating transformative works, while maintaining the control needed to combat piracy. However, the author also suggests a more immediate solution: copyright holders, in making decisions concerning copyright enforcement, should consider using the norms associated with established user-generated content communities as a framework for drawing a line between transformative work and piracy.
- Towards a Knowledge Graph for Science
Towards a Knowledge Graph for Science (Invited Article). Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, and Markus Stocker. TIB Leibniz Information Centre for Science and Technology and L3S Research Centre, Leibniz University of Hannover, Hannover, Germany. Abstract: The document-centric workflows in science have reached (or already exceeded) the limits of adequacy. This is emphasized by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. This presents an opportunity to rethink the dominant paradigm of document-centric scholarly information communication and transform it into knowledge-based information flows by representing and expressing information through semantically rich, interlinked knowledge graphs. At the core of knowledge-based information flows is the creation and evolution of information models that establish a common understanding of information communicated between stakeholders as […] Keywords: Knowledge Graph, Science and Technology, Research Infrastructure, Libraries, Information Science. ACM Reference Format: Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, and Markus Stocker. 2018. Towards a Knowledge Graph for Science: Invited Article. In WIMS '18: 8th International Conference on Web Intelligence, Mining and Semantics, June 25–27, 2018, Novi Sad, Serbia. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3227609.3227689. 1 Introduction: The communication of scholarly information is document-centric.
- A Portrait of Fandom Women in the Contemporary Internet Age
Daughters of the Digital: A Portrait of Fandom Women in the Contemporary Internet Age. A thesis presented to the Honors Tutorial College, Ohio University, in partial fulfillment of the requirements for graduation from the Honors Tutorial College with the degree of Bachelor of Science in Journalism, by Delaney P. Murray, April 2020. This thesis has been approved by the Honors Tutorial College and the Department of Journalism: Dr. Eve Ng, Associate Professor, Media Arts & Studies and Women's, Gender, and Sexuality Studies (Thesis Adviser); Dr. Bernhard Debatin, Director of Studies, Journalism; Dr. Donal Skinner, Dean, Honors Tutorial College. Abstract: Media fandom — defined here by the curation of fiction, art, "zines" (independently printed magazines) and other forms of media created by fans of various pop culture franchises — is a rich subculture mainly led by women and other marginalized groups that has attracted mainstream media attention in the past decade. However, journalistic coverage of media fandom can be misinformed and include condescending framing. In order to remedy negatively biased framing seen in journalistic reporting on fandom, I wrote my own long-form feature showing the modern state of fandom based on the generation of late millennial women who engaged in fandom between the early age of the Internet and today. This piece is mainly focused on the modern experiences of women in fandom spaces and how they balance a lifelong connection to fandom, professional and personal connections, and ongoing issues they experience within fandom. My study is also contextualized by my studies in the contemporary history of media fan culture in the Internet age, beginning in the 1990's and to the present day.
- For the Field
Of the People, For the People: Exploring Wikipedia as a Site for Community Building. Introduction: The COVID-19 pandemic has heightened the need for equal access to factual information about the virus. Free and open sources of information are increasingly important in our digital age. Although Wikipedia has often been touted as an unreliable source in the academic context, it can provide a great starting place for those interested in researching a given topic. Additionally, due to its inherently collaborative structure, this could be a site to help build community or extend an existing community, from the physical into the digital. Through a brief examination of the player/editor created and maintained Nukapedia, a wiki-site hosted by Fandom.com and dedicated to the videogame Fallout 4, I will demonstrate that a community can be formed even through asynchronous editing projects. I have experienced the power of this game in building community as a long-time fan of this series who has played through three of its titles: Fallout: New Vegas, Fallout 3, and Fallout 4. My long-time partner and I even had our first "official date" on the midnight release of Fallout 4 at our local EB Games. Although I am not an especially talented videogame player, I do enjoy the storylines and aesthetic of its fictional world. These collaborative projects based on collecting, organizing, and sharing knowledge can be employed by library professionals for community programming. As mentioned by Snyder in Edit-a-thons and Beyond (2018), organizing "edit-a-thon events" can be a way to meet institutional goals and even engage with new patrons.
- Exploiting Semantic Web Knowledge Graphs in Data Mining
Exploiting Semantic Web Knowledge Graphs in Data Mining. Inaugural dissertation for the academic degree of Doctor of Natural Sciences (Doktor der Naturwissenschaften) at the University of Mannheim, presented by Petar Ristoski, Mannheim, 2017. Dean: Dr. Bernd Lübcke, Universität Mannheim. Referee: Professor Dr. Heiko Paulheim, Universität Mannheim. Co-referee: Professor Dr. Simone Paolo Ponzetto, Universität Mannheim. Date of the oral examination: 15 January 2018. Abstract: Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in that field are knowledge-intensive and can often benefit from using additional knowledge from various sources. Therefore, many approaches have been proposed in this area that combine Semantic Web data with the data mining and knowledge discovery process. Semantic Web knowledge graphs are a backbone of many information systems that require access to structured knowledge. Such knowledge graphs contain factual knowledge about real-world entities and the relations between them, which can be utilized in various natural language processing, information retrieval, and data mining applications. Following the principles of the Semantic Web, Semantic Web knowledge graphs are publicly available as Linked Open Data. Linked Open Data is an open, interlinked collection of datasets in machine-interpretable form, covering most of the real-world domains. In this thesis, we investigate the hypothesis that Semantic Web knowledge graphs can be exploited as background knowledge in different steps of the knowledge discovery process, and in different data mining tasks. More precisely, we aim to show that Semantic Web knowledge graphs can be utilized for generating valuable data mining features that can be used in various data mining tasks.
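As a hypothetical illustration of the kind of feature generation mentioned in this abstract, the sketch below turns an entity's outgoing knowledge-graph edges into binary "predicate=object" features that a downstream data mining model could consume; the thesis itself studies far richer strategies (e.g., graph embeddings), and the example.org data is invented.

```python
# Hypothetical sketch of knowledge-graph-based feature generation: every
# outgoing (predicate, object) pair of an entity becomes one binary feature
# string that a downstream data mining model can consume.
from rdflib import Graph, Namespace

def binary_features(graph: Graph, entities):
    """Map each entity URI to its set of 'predicate=object' feature strings."""
    return {
        entity: {f"{p}={o}" for _, p, o in graph.triples((entity, None, None))}
        for entity in entities
    }

if __name__ == "__main__":
    EX = Namespace("http://example.org/")   # invented data, not a real KG
    g = Graph()
    g.add((EX.Mannheim, EX.locatedIn, EX.Germany))
    g.add((EX.Mannheim, EX.type, EX.City))
    print(binary_features(g, [EX.Mannheim]))
```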