Extending Linked Open Data Resources Exploiting Wikipedia As Source of Information


Università degli Studi di Milano
DIPARTIMENTO DI INFORMATICA
Scuola di Dottorato in Informatica – XXV ciclo

PhD thesis

Extending Linked Open Data resources exploiting Wikipedia as source of information

Student: Alessio Palmero Aprosio (Matricola R08605)
Advisor: Prof. Ernesto Damiani
Co-Advisors: Alberto Lavelli, Claudio Giuliano
Year 2011–2012

Abstract

DBpedia is a project aiming to represent Wikipedia content in RDF triples. It plays a central role in the Semantic Web, due to the large and growing number of resources linked to it. Currently, the information contained in DBpedia is mainly collected from Wikipedia infoboxes, sets of attribute-value pairs that summarize a Wikipedia page. The extraction procedure requires Wikipedia infoboxes to be manually mapped onto the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes in the English Wikipedia have been mapped to the corresponding classes in DBpedia, and the same procedure has subsequently been applied to other languages to create the localized versions of DBpedia. However, (i) the number of accomplished mappings is still small and limited to the most frequent infoboxes, as the task is done manually by the DBpedia community; (ii) mappings need maintenance, because Wikipedia articles change constantly and quickly; and (iii) infoboxes are compiled manually by Wikipedia contributors, so more than 50% of Wikipedia articles lack one. As a demonstration of these issues, only 2.35M Wikipedia pages are classified in the DBpedia ontology (using a class different from the top-level owl:Thing), although the English Wikipedia contains almost 4M pages. This shows a clear problem of coverage, and the issue is even worse in other languages (such as French and Spanish).

The objective of this thesis is to define a methodology to increase the coverage of DBpedia in different languages, using various techniques to reach two goals: automatic mapping and DBpedia dataset completion. A key aspect of our research is multi-linguality in Wikipedia: we bootstrap the available information through cross-language links, starting from the available mappings in some pivot languages, and then extend the existing DBpedia datasets (or create new ones from scratch) by comparing the classifications across languages. When the DBpedia classification is missing, we train a supervised classifier using DBpedia as training data. We also use the Distant Supervision paradigm to extract the missing properties directly from the Wikipedia articles. We evaluated our system using a manually annotated test set and some existing DBpedia mappings excluded from the training. The results demonstrate the suitability of the approach in extending the DBpedia resource. Finally, the resulting resources are made available through a SPARQL endpoint and as a downloadable package.

Acknowledgments

This thesis would not have been possible without the help and guidance of some valuable people, who assisted me in many ways. First of all, I would like to express my gratitude to my advisors Alberto Lavelli and Claudio Giuliano, who supported me with patience and enthusiasm. I thank Bernardo Magnini, head of the Human Language Technology unit at Fondazione Bruno Kessler, who accepted me despite my particular academic situation. I also wish to thank all the members of the HLT group for always being present, both as inspiring colleagues and as precious friends. I am also thankful to Prof. Silvio Ghilardi and Prof. Ernesto Damiani, from the University of Milan, for their availability and collaboration. I would like to thank Volha Bryl, Philipp Cimiano and Alessandro Moschitti for agreeing to be my thesis referees and for their valuable feedback. I gratefully thank Elena Cabrio, Julien Cojan and Fabien Gandon from INRIA (Sophia Antipolis) for letting me be a part of their research work. Outside research, I thank all the friends and flatmates I met during these years, for adding precious moments to my everyday life in Trento. Finally, I thank my parents and family for their constant and loving support throughout all my studies.

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
  1.1 The context
  1.2 DBpedia
  1.3 The problem
    1.3.1 Coverage expansion
    1.3.2 Automatic mapping
  1.4 The solution
  1.5 Interacting with the Semantic Web
  1.6 Contributions
    1.6.1 DBpedia expansion
    1.6.2 Question Answering
  1.7 Structure of the thesis
2 Linked Open Data
  2.1 Origins
  2.2 Linked Data principles
  2.3 Linked Data in practice
    2.3.1 Resource Description Framework
    2.3.2 Resource Description Framework in Attributes
    2.3.3 SPARQL query language
    2.3.4 Processing RDF data
  2.4 The LOD cloud
  2.5 Resources
    2.5.1 Wikipedia
    2.5.2 DBpedia
    2.5.3 Wikidata
3 Related work
  3.1 LOD Resources
    3.1.1 YAGO
    3.1.2 Freebase
  3.2 Entity classification
  3.3 Schema matching
  3.4 Distant supervision
  3.5 Question answering
4 Pre-processing data
  4.1 Filtering Wikipedia templates
  4.2 Wikipedia and DBpedia entities representation
    4.2.1 Building the entity matrix
    4.2.2 Assigning DBpedia class to entities
5 Automatic mapping generation for classes
  5.1 Infobox mapping
  5.2 Experiments and evaluation
6 Automatic mapping generation for properties
  6.1 Problem Formalization
  6.2 Workflow of the System
  6.3 Pre-processing
    6.3.1 Cross-language information
    6.3.2 DBpedia dataset extraction
    6.3.3 Template and redirect resolution
    6.3.4 Data Extraction
  6.4 Mapping extraction
  6.5 Inner similarity function
    6.5.1 Similarity between object properties
    6.5.2 Similarity between datatype properties
  6.6 Post-processing
  6.7 Evaluation
7 Extending DBpedia coverage on classes
  7.1 Kernels for Entity Classification
    7.1.1 Bag-of-features Kernels
    7.1.2 Latent Semantic Kernel
    7.1.3 Composite Kernel
  7.2 Experiments
    7.2.1 Pre-processing Wikipedia and DBpedia
    7.2.2 Benchmark
    7.2.3 Latent Semantic Models
    7.2.4 Learning Algorithm
    7.2.5 Classification Schemas
    7.2.6 Results
8 Extending DBpedia coverage on properties
  8.1 Workflow
  8.2 Pre-processing
    8.2.1 Retrieving sentences
    8.2.2 Selecting sentences
    8.2.3 Training algorithm
  8.3 Experiments and evaluation
9 Airpedia, an automatically built LOD resource
  9.1 Mapping generation
    9.1.1 Classes (released April 2013)
    9.1.2 Properties (released May 2013)
  9.2 Wikipedia page classification
    9.2.1 Version 1 (released December 2012)
    9.2.2 Version 2 (released June 2013)
    9.2.3 Integration with the Italian DBpedia
  9.3 DBpedia error reporting
10 Case study: QAKiS
  10.1 WikiFramework: collecting relational patterns
  10.2 QAKiS: a system for data answer retrieval from natural language questions
    10.2.1 NE identification and Expected Answer Type (EAT)
    10.2.2 Typed questions generation
    10.2.3 WikiFramework pattern matching
    10.2.4 Query selector
  10.3 Experimental evaluation
  10.4 Demo
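The abstract counts a page as classified only when it carries a class more specific than owl:Thing, and notes that the final resources are exposed through a SPARQL endpoint. Below is a minimal sketch of that access pattern, run against the public DBpedia endpoint as a stand-in, since the thesis's own endpoint URL is not given in this excerpt.

```python
# Minimal sketch: asking a DBpedia-style SPARQL endpoint how many pages
# carry a class more specific than owl:Thing, the notion of "classified"
# used in the abstract. The public DBpedia endpoint stands in for the
# thesis's own endpoint, whose URL is not given in this excerpt.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT (COUNT(DISTINCT ?page) AS ?classified) WHERE {
        ?page rdf:type ?class .
        FILTER (?class != owl:Thing)
        FILTER (STRSTARTS(STR(?class), STR(dbo:)))
    }
""")

result = sparql.query().convert()
print(result["results"]["bindings"][0]["classified"]["value"])
```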
Recommended publications
  • Wikimedia Conferentie Nederland 2012 Conferentieboek
    http://www.wikimediaconferentie.nl, WCN 2012 conference book, CC-BY-SA. Program (talk titles translated from Dutch; tracks: Wikipedia in education / Technical side of wikis / Wiki communities):
    - 9:30–9:40: Opening
    - 9:45–10:30: Lydia Pintscher, "Introduction to Wikidata – the next big thing for Wikipedia and the world"
    - 10:45–11:30: Jos Punie, "The use of Wikimedia Commons and Wikibooks in interactive teaching formats for secondary education" / Ralf Lämmel, "Community and ontology support for 101wiki" / Sarah Morassi & Ziko van Dijk, "Wikimedia Nederland, what is that exactly?"
    - 11:45–12:30: Tim Ruijters, "A passion for learning, the Dutch Wikiversity" / Sandra Fauconnier, "Projects of Wikimedia Nederland in 2012 and beyond" / lightning session with discussion and Q&A
    - Lunch
    - 13:15–14:15: Jimmy Wales
    - 14:30–14:50: Wim Muskee, "Wikipedia in Edurep" / Amit Bronner, "Bridging the Gap of Multilingual Diversity" / Lotte Belice Baltussen, "Open Culture Data: a bottom-up initiative from the heritage sector"
    - 14:55–15:15: Teun Lucassen, "School pupils on Wikipedia" / Gerard Kuys, "Finding topics with DBpedia" / Finne Boonen, "Do you stay or do you leave?"
    - 15:30–16:15 (two slots): Laura van Broekhoven & Jan Auke Brink, "Research interns translate research into Wikipedia articles" / Jeroen De Dauw, "Structured Data in MediaWiki" / Jan-Bart de Vreede, "Wikiwijs compared to Wikiversity and Wikibooks"
    - 16:20–17:15: Award ceremony of Wiki Loves Monuments Nederland
    - 17:20–17:30: Closing
    - 17:30–18:30: Drinks
    The excerpt breaks off in the book's own table of contents (Organization, Foreword, and the sessions above).
  • Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia Based on DBpedia
    Towards a Korean DBpedia and an Approach for Complementing the Korean Wikipedia based on DBpedia
    Eun-kyung Kim (1), Matthias Weidl (2), Key-Sun Choi (1), Sören Auer (2)
    (1) Semantic Web Research Center, CS Department, KAIST, Korea, 305-701
    (2) Universität Leipzig, Department of Computer Science, Johannisgasse 26, D-04103 Leipzig, Germany

    Abstract. In the first part of this paper we report on experiences in applying the DBpedia extraction framework to the Korean Wikipedia. We improved the extraction of non-Latin characters and extended the framework with pluggable internationalization components in order to facilitate the extraction of localized information. With these improvements we almost doubled the amount of extracted triples. We also present the results of the extraction for Korean. In the second part, we present a conceptual study aimed at understanding the impact of international resource synchronization in DBpedia. In the absence of any information synchronization, each country would construct its own datasets and manage them through its own users; moreover, cooperation across the various countries would be adversely affected.

    Keywords: Synchronization, Wikipedia, DBpedia, Multi-lingual

    1 Introduction. Wikipedia is the largest encyclopedia of mankind and is written collaboratively by people all around the world. Everybody can access this knowledge as well as add and edit articles. Right now Wikipedia is available in 260 languages and the quality of the articles has reached a high level [1]. However, Wikipedia only offers full-text search over this textual information. For that reason, different projects have been started to convert this information into structured knowledge, which can be used by Semantic Web technologies to ask sophisticated queries against Wikipedia.
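The localization work above and the thesis's own bootstrapping both lean on Wikipedia's cross-language links. A minimal sketch of collecting them through the standard MediaWiki API follows (a real endpoint; the page title is only an example):

```python
# Minimal sketch: fetching cross-language links for a Wikipedia page via
# the MediaWiki API, the kind of information used to carry DBpedia
# classifications across language editions. The page title is an example.
import requests

def langlinks(title: str, lang: str = "en") -> dict:
    """Return {language code: localized title} for a Wikipedia page."""
    resp = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "langlinks",
            "titles": title,
            "lllimit": "500",
            "format": "json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return {ll["lang"]: ll["*"] for ll in page.get("langlinks", [])}

print(langlinks("Seoul").get("ko"))  # the Korean title of the article
```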
  • Chaudron: Extending DBpedia with Measurement (Julien Subercaze)
    Chaudron: Extending DBpedia with measurement
    Julien Subercaze, Univ Lyon, UJM-Saint-Etienne, CNRS, Laboratoire Hubert Curien UMR 5516, F-42023 Saint-Etienne, France
    14th European Semantic Web Conference (eds. Eva Blomqvist, Diana Maynard, Aldo Gangemi), May 2017, Portoroz, Slovenia. HAL Id: hal-01477214, https://hal.archives-ouvertes.fr/hal-01477214, submitted on 27 Feb 2017.

    Abstract. Wikipedia is the largest collaborative encyclopedia and is used as the source for DBpedia, a central dataset of the LOD cloud. Wikipedia contains numerous numerical measures on the entities it describes, as per the general character of the data it encompasses. The DBpedia Information Extraction Framework transforms semi-structured data from Wikipedia into structured RDF, but it offers only limited support for handling measurement in Wikipedia. In this paper, we describe the automated process that enables the creation of the Chaudron dataset. We propose an alternative to the traditional mapping-based extraction from the Wikipedia dump, also using the rendered HTML to avoid the template transclusion issue.
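Chaudron's actual pipeline parses rendered HTML; the toy sketch below only illustrates the final step it targets, pulling numeric value/unit pairs out of already-rendered infobox text. The regex and the unit list are invented for this example.

```python
# Toy illustration, not Chaudron's actual pipeline: extracting numeric
# value/unit pairs from rendered infobox text. The regex and unit list
# are invented for this example.
import re

MEASURE = re.compile(
    r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<unit>km2|km|kg|m|GHz|MW)\b"
)

def extract_measures(text: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs found in plain text."""
    pairs = []
    for match in MEASURE.finditer(text):
        value = float(match.group("value").replace(",", "."))
        pairs.append((value, match.group("unit")))
    return pairs

print(extract_measures("Area: 105.25 km2, Elevation: 38 m"))
# [(105.25, 'km2'), (38.0, 'm')]
```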
  • Building a Visual Editor for Wikipedia
    Building a Visual Editor for Wikipedia
    Trevor Parscal and Roan Kattouw, Wikimania D.C. 2012

    The people: Trevor Parscal (Lead Designer and Engineer, Wikimedia), Roan Kattouw (Data Model Engineer, Wikimedia), Rob Moen (User Interface Engineer, Wikimedia), Inez Korczynski (Edit Surface Engineer, Wikia), Christian Williams (Edit Surface Engineer, Wikia), James Forrester (Product Analyst, Wikimedia). Parsoid team: Gabriel Wicke (Lead Parser Engineer, Wikimedia), Subbu Sastry (Parser Engineer, Wikimedia).

    The complexity problem: [chart of active editors over time] the number of active editors grew to roughly 20,000 by 2007 and has stagnated since; many first edits are mere experiments ("just messing around", "Testing testing 123...").

    The review problem: balancing the ecosystem between the difficulty of editing and the difficulty of reviewing.

    The expert problem: wikitext enthusiasts. Exit strategy: preference for wikitext declines as the capabilities of visual tools grow; the open question is to what extent. (Slides illustrated with CC-BY-SA images from Wikimedia Commons.)
  • Wiki-MetaSemantik: A Wikipedia-Derived Query Expansion Approach Based on Network Properties
    Wiki-MetaSemantik: A Wikipedia-derived Query Expansion Approach based on Network Properties
    D. Puspitaningrum (1), G. Yulianti (1), I.S.W.B. Prasetya (2)
    (1) Department of Computer Science, The University of Bengkulu, WR Supratman St., Kandang Limun, Bengkulu 38371, Indonesia
    (2) Department of Information and Computing Sciences, Utrecht University, PO Box 80.089, 3508 TB Utrecht, The Netherlands

    Abstract. This paper discusses the use of Wikipedia for building semantic ontologies to do Query Expansion (QE) in order to improve the search results of search engines. In this technique, selecting related Wikipedia concepts becomes important. We propose the use of network properties (degree, closeness, and PageRank) to build an ontology graph of the user query concept, derived directly from Wikipedia structures. The resulting expansion system is called Wiki-MetaSemantik. We tested this system against other online thesauruses and ontology-based QE in both individual and meta-search engine setups. Despite that, our system has to build a Wikipedia ontology graph …

    From the introduction: Pseudo-Relevance Feedback (PRF) query expansions suffer from several drawbacks such as query-topic drift [8][10] and inefficiency [21]. Al-Shboul and Myaeng [1] proposed a technique to alleviate topic drift caused by word ambiguity and synonymous uses of words by utilizing semantic annotations in Wikipedia pages, enriching queries with context-disambiguating phrases. Also, in order to avoid expansion of mistranslated words, a query expansion method using link texts of a Wikipedia page has been proposed [9]. Furthermore, not all hyperlinks are helpful for the QE task, though (e.g. …)
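A minimal sketch of the ranking idea in the excerpt above, computing the three named network properties over a toy concept graph. The graph itself and the way the scores are combined are assumptions; the paper's actual construction from Wikipedia structures is not shown here.

```python
# Minimal sketch, assuming an already-built graph of Wikipedia concepts
# around a query term. The graph and the score combination are invented
# for illustration.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("jaguar", "felidae"), ("jaguar", "panthera"), ("panthera", "felidae"),
    ("jaguar", "jaguar cars"), ("jaguar cars", "coventry"),
])

# The three network properties the paper names: degree, closeness, PageRank.
degree = dict(G.degree())
closeness = nx.closeness_centrality(G)
pagerank = nx.pagerank(G)

# One plausible way to combine them into a single expansion score.
score = {n: degree[n] * closeness[n] * pagerank[n] for n in G}
expansion = sorted((n for n in G if n != "jaguar"),
                   key=score.get, reverse=True)[:3]
print(expansion)  # top candidate expansion terms for the query "jaguar"
```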
  • Fouilla: Navigating DBpedia by Topic (Tanguy Raynaud, Julien Subercaze, Delphine Boucard, Vincent Battu, Frédérique Laforest)
    Fouilla: Navigating DBpedia by Topic
    Tanguy Raynaud, Julien Subercaze, Delphine Boucard, Vincent Battu, Frédérique Laforest, Univ Lyon, UJM Saint-Etienne, CNRS, Laboratoire Hubert Curien UMR 5516, Saint-Etienne, France
    CIKM 2018, Oct 2018, Turin, Italy. HAL Id: hal-01860672, https://hal.archives-ouvertes.fr/hal-01860672, submitted on 23 Aug 2018.

    Abstract. Navigating large knowledge bases made of billions of triples is very challenging. In this demonstration, we showcase Fouilla, a topical Knowledge Base browser that offers a seamless navigational experience of DBpedia. We propose an original approach that leverages both structural and semantic contents of Wikipedia to enable a topic-oriented filter on DBpedia entities.

    From the introduction: … only the triples that concern this topic. For example, a user is interested in Italy through the prism of Sports while another through the prism of World War II. For each of these topics, the relevant triples of the Italy entity differ. In such circumstances, faceted browsing offers no solution to retrieve the entities relative to a defined topic if the knowledge graph does not explicitly contain an adequate …
  • Wikipedia Editing History in DBpedia (Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin)
    Wikipedia editing history in DBpedia: extracting and publishing the encyclopedia editing activity as linked data
    Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin, Université Côte d'Azur, Inria, CNRS, I3S, France; Wimmics, Sophia Antipolis, France
    IEEE/WIC/ACM International Joint Conference on Web Intelligence (WI'16), Oct 2016, Omaha, United States. HAL Id: hal-01359575, https://hal.inria.fr/hal-01359575, submitted on 2 Sep 2016.

    Abstract. DBpedia is a huge dataset essentially extracted from the content and structure of Wikipedia. We present a new extraction producing a linked data representation of the editing history of Wikipedia pages. This supports custom querying and combining with other data, providing new indicators and insights. For example, the French editing history dump represents 2 TB of uncompressed data. The extraction is performed by a stream in Node.js with a MongoDB instance; it takes 4 days to extract 55 GB of RDF in Turtle on 8 Intel(R) Xeon(R) CPU E5-…
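A minimal sketch of the target representation described above: emitting one revision record as RDF Turtle with rdflib. The vocabulary under ex: is invented for illustration; the paper's actual predicates are not given in this excerpt.

```python
# Minimal sketch: one revision record as RDF Turtle via rdflib. The ex:
# vocabulary is invented for illustration; the paper's actual predicates
# are not given in this excerpt.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/wikihistory/")
DBR = Namespace("http://dbpedia.org/resource/")

g = Graph()
g.bind("ex", EX)

rev = EX["revision/123456"]  # an invented revision identifier
g.add((rev, EX.ofPage, DBR["France"]))
g.add((rev, EX.timestamp,
       Literal("2016-09-02T10:15:00", datatype=XSD.dateTime)))
g.add((rev, EX.contributor, Literal("ExampleUser")))

print(g.serialize(format="turtle"))
```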
  • What You Say Is Who You Are. How Open Government Data Facilitates Profiling Politicians
    What you say is who you are. How open government data facilitates profiling politicians
    Maarten Marx and Arjan Nusselder, ISLA, Informatics Institute, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands

    Abstract. A system is proposed and implemented that creates a language model for each member of the Dutch parliament, based on the official transcripts of the meetings of the Dutch Parliament. Using expert-finding techniques, the system allows users to retrieve a ranked list of politicians based on queries such as news messages. The high quality of the system is due to extensive data cleaning and transformation, which could have been avoided had the data been available in an open, machine-readable format.

    1 Introduction. The Internet is changing from a web of documents into a web of objects. Open and interoperable (linkable) data are crucial for web applications that are built around objects. Examples of objects featuring prominently in (mashup) websites are traditional named entities like persons, products and organizations [6,4], but also events and unique items such as houses. The success of several mashup sites is simply due to the fact that they provide a different grouping of already (freely) available data. Originally the data could only be grouped by documents; the mashup allows for groupings by objects which are of interest in their specific domain. Here is an example from the political domain. Suppose one wants to know more about Herman van Rompuy, the new EU "president" from Belgium. As he is a former member of the Belgian parliament and several governments, the parliamentary proceedings are an important primary source of information.
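A minimal sketch of the expert-finding idea described above: one unigram language model per speaker, with speakers ranked by the log-likelihood of a query under their model. The transcripts, smoothing choice and names are invented examples.

```python
# Minimal sketch: one unigram language model per politician, ranked by
# query log-likelihood with add-one smoothing. Transcripts are invented.
import math
from collections import Counter

speeches = {
    "Member A": "european budget reform budget stability",
    "Member B": "education schools teachers education funding",
}

models = {who: Counter(text.split()) for who, text in speeches.items()}
vocab = set().union(*models.values())

def log_likelihood(query: str, counts: Counter) -> float:
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + len(vocab)))  # add-one smoothing
        for w in query.split()
    )

query = "budget reform"
ranking = sorted(models, key=lambda who: log_likelihood(query, models[who]),
                 reverse=True)
print(ranking)  # ['Member A', 'Member B']
```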
  • Federated Ontology Search (Vasco Calais Pedro, CMU-LTI-09-010)
    Federated Ontology Search
    Vasco Calais Pedro, CMU-LTI-09-010
    Language Technologies Institute, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, www.lti.cs.cmu.edu
    Thesis committee: Jaime Carbonell (chair), Eric Nyberg, Robert Frederking, Eduard Hovy (Information Sciences Institute)
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies. Copyright © 2009 Vasco Calais Pedro.
    Dedication: For my grandmother, Avó Helena. I am sorry I wasn't there.

    Abstract. An ontology can be defined as a formal representation of a set of concepts within a domain and the relationships between those concepts. The development of the Semantic Web initiative is rapidly increasing the number of publicly available ontologies. In such a distributed environment, complex applications often need to handle multiple ontologies in order to provide adequate domain coverage. Surprisingly, there is a lack of adequate frameworks for using multiple ontologies transparently while abstracting from the particular ontological structures they employ. Given that any ontology represents the views of its author or authors, using multiple ontologies requires us to deal with several significant challenges, some stemming from the nature of knowledge itself, such as cases of polysemy or homography, and some stemming from the structures that we choose to represent such knowledge with. The focus of this thesis is to explore a set of techniques that allow us to overcome some of the challenges found when using multiple ontologies, thus making progress towards a functional information access platform for structured sources.
  • Collaborative Integration, Publishing and Analysis of Distributed Scholarly Metadata
    Collaborative Integration, Publishing and Analysis of Distributed Scholarly Metadata
    Dissertation for the degree of Doctor rerum naturalium (Dr. rer. nat.), Faculty of Mathematics and Natural Sciences, Rheinische Friedrich-Wilhelms-Universität Bonn, submitted by Sahar Vahdati from Tabriz, Iran. Bonn, 2019. First reviewer: Prof. Dr. Sören Auer; second reviewer: Prof. Dr. Rainer Manthey. Date of defense: 17.01.2019; year of publication: 2019.

    Abstract. Research is becoming increasingly digital, interdisciplinary, and data-driven, and affects different environments in addition to academia, such as industry and government. Research output representation, publication, mining, analysis, and visualization are taken to a new level, driven by the increased use of Web standards and digital scholarly communication initiatives. The number of scientific publications produced by new players and the increasing digital availability of scholarly artifacts and associated metadata are other drivers of the substantial growth in scholarly communication. The heterogeneity of scholarly artifacts and their metadata, spread over different Web data sources, poses a major challenge for researchers with regard to search, retrieval and exploration. For example, it has become difficult to keep track of relevant scientific results, to stay up-to-date with new scientific events and running projects, and to find potential future collaborators. Thus, assisting researchers with a broader integration, management, and analysis of scholarly metadata can lead to new opportunities in research and to new ways of conducting research. The data integration problem has been extensively addressed by communities in the Database, Artificial Intelligence and Semantic Web fields. However, a share of the interoperability issues is domain-specific, and new challenges with regard to schema, structure, or domain arise in the context of scholarly metadata integration.
  • Working with MediaWiki (Yaron Koren)
    Working with MediaWiki, by Yaron Koren. Published by WikiWorks Press. Copyright © 2012 by Yaron Koren, except where otherwise noted. Chapter 17, "Semantic Forms", includes significant content from the Semantic Forms homepage (https://www.mediawiki.org/wiki/Extension:Semantic_Forms), available under the Creative Commons BY-SA 3.0 license. All rights reserved. Library of Congress Control Number: 2012952489. ISBN: 978-0615720302. First edition, second printing: 2014. Ordering information: http://workingwithmediawiki.com. Printing is handled by CreateSpace (https://createspace.com), a subsidiary of Amazon.com. Cover design by Grace Cheong (http://gracecheong.com).

    Contents (excerpt):
    1 About MediaWiki: history of MediaWiki; community and support; available hosts
    2 Setting up MediaWiki: the MediaWiki environment; download; installing; setting the logo; changing the URL structure; updating MediaWiki
    3 Editing in MediaWiki: tabs; creating and editing pages; page history; page diffs; undoing; blocking and rollbacks; deleting revisions; moving pages; deleting pages; edit conflicts
    4 MediaWiki syntax: wikitext; interwiki links; including HTML; templates; parser and tag functions; variables; behavior switches
    5 Content organization: categories; namespaces; redirects; subpages and super-pages; special pages
    6 Communication: talk pages; LiquidThreads; Echo & Flow; handling reader comments; chat; emailing users
    7 Images and files: uploading; displaying images; image galleries …
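The templates covered in chapter 4 are the machinery behind the infoboxes this thesis page keeps returning to. A small sketch of reading template parameters out of raw wikitext with the mwparserfromhell library (a real parser, though not one the book prescribes; the wikitext snippet is an invented example):

```python
# Small sketch: reading infobox parameters out of raw wikitext with
# mwparserfromhell. The wikitext snippet is an invented example.
import mwparserfromhell

wikitext = """{{Infobox settlement
| name       = Trento
| population = 117417
}}"""

code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    if str(template.name).strip().lower().startswith("infobox"):
        for param in template.params:
            print(str(param.name).strip(), "=", str(param.value).strip())
# name = Trento
# population = 117417
```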
  • Wikidata Through the Eyes of DBpedia
    Wikidata through the Eyes of DBpedia
    Ali Ismayilov (a), Dimitris Kontokostas (b), Sören Auer (a), Jens Lehmann (a), Sebastian Hellmann (b)
    (a) University of Bonn and Fraunhofer IAIS; (b) Universität Leipzig, Institut für Informatik, AKSW (e-mail: {lastname}@informatik.uni-leipzig.de)
    Semantic Web, IOS Press. Editor: Aidan Hogan, Universidad de Chile, Chile. Solicited reviews: Denny Vrandecic, Google, USA; Heiko Paulheim, Universität Mannheim, Germany; Thomas Steiner, Google, USA.

    Abstract. DBpedia is one of the earliest and most prominent nodes of the Linked Open Data cloud. DBpedia extracts and provides structured data for various crowd-maintained information sources, such as over 100 Wikipedia language editions as well as Wikimedia Commons, by employing a mature ontology and a stable and thorough Linked Data publishing lifecycle. Wikidata, on the other hand, has recently emerged as a user-curated source for structured information which is included in Wikipedia. In this paper, we present how Wikidata is incorporated in the DBpedia eco-system. Enriching DBpedia with structured information from Wikidata provides added value for a number of usage scenarios. We outline those scenarios and describe the structure and conversion process of the DBpediaWikidata (DBW) dataset.

    Keywords: DBpedia, Wikidata, RDF

    1 Introduction. In the past decade, several large and open knowledge bases were created. A popular example, DBpedia [6], extracts information from more than one hundred Wikipedia language editions and Wikimedia Commons […] community provide more up-to-date information. In addition to the independent growth of DBpedia and Wikidata, there are a number of structural complementarities as well as overlaps with regard to identifiers, structure, schema, curation, publication coverage and data freshness that are analysed throughout this manuscript.