Using Shape Expressions (Shex) to Share RDF Data Models and to Guide Curation with Rigorous Validation B Katherine Thornton1( ), Harold Solbrig2, Gregory S
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Proceedings of the 11Th International Symposium on Open Collaboration
Proceedings of the 11th International Symposium on Open Collaboration August 19-21, 2015 San Francisco, California, U.S.A. General chair: Dirk Riehle, Friedrich-Alexander University Erlangen-Nürnberg Research track chairs: Kevin Crowston, Syracuse University Carlos Jensen, Oregon State University Carl Lagoze, University of Michigan Ann Majchrzak, University of Southern California Arvind Malhotra, University of North Carolina at Chapel Hill Claudia Müller-Birn, Freie Universität Berlin Gregorio Robles, Universidad Rey Juan Carlos Aaron Shaw, Northwestern University Sponsors: Wikimedia Foundation Google Inc. University of California Berkeley In-cooperation: ACM SIGSOFT ACM SIGWEB Fiscal sponsor: The John Ernest Foundation The Association for Computing Machinery 2 Penn Plaza, Suite 701 New York New York 10121-0701 ACM COPYRIGHT NOTICE. Copyright © 2014 by the Association for Computing Machin- ery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be hon- ored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or [email protected]. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750- 4470 (fax). -
Validating RDF Data Using Shapes
83 Validating RDF data using Shapes a Jose Emilio Labra Gayo a University of Oviedo, Spain Abstract RDF forms the keystone of the Semantic Web as it enables a simple and powerful knowledge representation graph based data model that can also facilitate integration between heterogeneous sources of information. RDF based applications are usually accompanied with SPARQL stores which enable to efficiently manage and query RDF data. In spite of its well known benefits, the adoption of RDF in practice by web programmers is still lacking and SPARQL stores are usually deployed without proper documentation and quality assurance. However, the producers of RDF data usually know the implicit schema of the data they are generating, but they don't do it traditionally. In the last years, two technologies have been developed to describe and validate RDF content using the term shape: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL). We will present a motivation for their appearance and compare them, as well as some applications and tools that have been developed. Keywords RDF, ShEx, SHACL, Validating, Data quality, Semantic web 1. Introduction In the tutorial we will present an overview of both RDF is a flexible knowledge and describe some challenges and future work [4] representation language based of graphs which has been successfully adopted in semantic web 2. Acknowledgements applications. In this tutorial we will describe two languages that have recently been proposed for This work has been partially funded by the RDF validation: Shape Expressions (ShEx) and Spanish Ministry of Economy, Industry and Shapes Constraint Language (SHACL).ShEx was Competitiveness, project: TIN2017-88877-R proposed as a concise and intuitive language for describing RDF data in 2014 [1]. -
V a Lida T in G R D F Da
Series ISSN: 2160-4711 LABRA GAYO • ET AL GAYO LABRA Series Editors: Ying Ding, Indiana University Paul Groth, Elsevier Labs Validating RDF Data Jose Emilio Labra Gayo, University of Oviedo Eric Prud’hommeaux, W3C/MIT and Micelio Iovka Boneva, University of Lille Dimitris Kontokostas, University of Leipzig VALIDATING RDF DATA This book describes two technologies for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the two, and some example applications. RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic Web offers a bold, new take on how to organize, distribute, index, and share data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an information revolution, and also like the Web, it is encumbered by data quality issues. The quality of Semantic Web data is compromised by the lack of resources for data curation, for maintenance, and for developing globally applicable data models. At the enterprise scale, these problems have conventional solutions. Master data management provides an enterprise-wide vocabulary, while constraint languages capture and enforce data structures. Filling a need long recognized by Semantic Web users, shapes languages provide models and vocabularies for expressing such structural constraints. -
Semantics Developer's Guide
MarkLogic Server Semantic Graph Developer’s Guide 2 MarkLogic 10 May, 2019 Last Revised: 10.0-8, October, 2021 Copyright © 2021 MarkLogic Corporation. All rights reserved. MarkLogic Server MarkLogic 10—May, 2019 Semantic Graph Developer’s Guide—Page 2 MarkLogic Server Table of Contents Table of Contents Semantic Graph Developer’s Guide 1.0 Introduction to Semantic Graphs in MarkLogic ..........................................11 1.1 Terminology ..........................................................................................................12 1.2 Linked Open Data .................................................................................................13 1.3 RDF Implementation in MarkLogic .....................................................................14 1.3.1 Using RDF in MarkLogic .........................................................................15 1.3.1.1 Storing RDF Triples in MarkLogic ...........................................17 1.3.1.2 Querying Triples .......................................................................18 1.3.2 RDF Data Model .......................................................................................20 1.3.3 Blank Node Identifiers ..............................................................................21 1.3.4 RDF Datatypes ..........................................................................................21 1.3.5 IRIs and Prefixes .......................................................................................22 1.3.5.1 IRIs ............................................................................................22 -
Understanding the Challenges of Collaborative Evidence-Based
Do you have a source for that? Understanding the Challenges of Collaborative Evidence-based Journalism Sheila O’Riordan Gaye Kiely Bill Emerson Joseph Feller Business Information Business Information Business Information Business Information Systems, University Systems, University Systems, University Systems, University College Cork, Ireland College Cork, Ireland College Cork, Ireland College Cork, Ireland [email protected] [email protected] [email protected] [email protected] ABSTRACT consumption has evolved [25]. There has been a move WikiTribune is a pilot news service, where evidence-based towards digital news with increasing user involvement as articles are co-created by professional journalists and a well as the use of social media platforms for accessing and community of volunteers using an open and collaborative discussing current affairs [14,39,43]. As such, the boundaries digital platform. The WikiTribune project is set within an are shifting between professional and amateur contributions. evolving and dynamic media landscape, operating under Traditional news organizations are adding interactive principles of openness and transparency. It combines a features as participatory journalism practices rise (see [8,42]) commercial for-profit business model with an open and the technologies that allow citizens to interact en masse collaborative mode of production with contributions from provide new avenues for engaging in democratic both paid professionals and unpaid volunteers. This deliberation [19]; the “process of reaching reasoned descriptive case study captures the first 12-months of agreement among free and equal citizens” [6:322]. WikiTribune’s operations to understand the challenges and opportunities within this hybrid model of production. We use With these changes, a number of challenges have arisen. -
Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing. -
Introduction to Linked Data and Its Lifecycle on the Web
Introduction to Linked Data and its Lifecycle on the Web Sören Auer, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Amrapali Zaveri AKSW, Institut für Informatik, Universität Leipzig, Pf 100920, 04009 Leipzig {lastname}@informatik.uni-leipzig.de http://aksw.org Abstract. With Linked Data, a very pragmatic approach towards achieving the vision of the Semantic Web has gained some traction in the last years. The term Linked Data refers to a set of best practices for publishing and interlinking struc- tured data on the Web. While many standards, methods and technologies devel- oped within by the Semantic Web community are applicable for Linked Data, there are also a number of specific characteristics of Linked Data, which have to be considered. In this article we introduce the main concepts of Linked Data. We present an overview of the Linked Data lifecycle and discuss individual ap- proaches as well as the state-of-the-art with regard to extraction, authoring, link- ing, enrichment as well as quality of Linked Data. We conclude the chapter with a discussion of issues, limitations and further research and development challenges of Linked Data. This article is an updated version of a similar lecture given at Reasoning Web Summer School 2011. 1 Introduction One of the biggest challenges in the area of intelligent information management is the exploitation of the Web as a platform for data and information integration as well as for search and querying. Just as we publish unstructured textual information on the Web as HTML pages and search such information by using keyword-based search engines, we are already able to easily publish structured information, reliably interlink this informa- tion with other data published on the Web and search the resulting data space by using more expressive querying beyond simple keyword searches. -
Linked Data Services for Internet of Things
International Conference on Recent Advances in Computer Systems (RACS 2015) Linked Data Services for Internet of Things Jaroslav Pullmann, Dr. Yehya Mohamad User-Centered Ubiquitous Computing department Fraunhofer Institute for Applied Information Technology FIT Sankt Augustin, Germany {jaroslav.pullmann, yehya.mohamad}@fit.fraunhofer.de Abstract — In this paper we present the open source “LinkSmart Resource Framework” allowing developers to II. LINKSMART RESOURCE FRAMEWORK incorporate heterogeneous data sources and physical devices through a generic service facade independent of the data model, A. Rationale persistence technology or access protocol. We particularly consi- The Java Data Objects standard [2] specifies an interface to der the integration and maintenance of Linked Data, a widely persist Java objects in a technology agnostic way. The related accepted means for expressing highly-structured, machine reada- ble (meta)data graphs. Thanks to its uniform, technology- Java Persistence specification [3] concentrates on object-rela- agnostic view on data, the framework is expected to increase the tional mapping only. We consider both specifications techno- ease-of-use, maintainability and usability of software relying on logy-driven, overly detailed with regards to common data ma- it. The development of various technologies to access, interact nagement tasks, while missing some important high-level with and manage data has led to rise of parallel communities functionality. The rationale underlying the LinkSmart Resour- often unaware of alternatives beyond their technology bounda- ce Platform is to define a generic, uniform interface for ries. Systems for object-relational mapping, NoSQL document or management, retrieval and processing of data while graph-based Linked Data storage usually exhibit complex, maintaining a technology and implementation agnostic facade. -
The Opencitations Data Model
The OpenCitations Data Model Marilena Daquino1;2[0000−0002−1113−7550], Silvio Peroni1;2[0000−0003−0530−4305], David Shotton2;3[0000−0001−5506−523X], Giovanni Colavizza4[0000−0002−9806−084X], Behnam Ghavimi5[0000−0002−4627−5371], Anne Lauscher6[0000−0001−8590−9827], Philipp Mayr5[0000−0002−6656−1658], Matteo Romanello7[0000−0002−7406−6286], and Philipp Zumstein8[0000−0002−6485−9434]? 1 Digital Humanities Advanced research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna fmarilena.daquino2,[email protected] 2 Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna 3 Oxford e-Research Centre, University of Oxford [email protected] 4 Institute for Logic, Language and Computation (ILLC), University of Amsterdam [email protected] 5 Department of Knowledge Technologies for the Social Sciences, GESIS - Leibniz-Institute for the Social Sciences [email protected], [email protected] 6 Data and Web Science Group, University of Mannheim [email protected] 7 cole Polytechnique Fdrale de Lausanne [email protected] 8 Mannheim University Library, University of Mannheim [email protected] Abstract. A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with differ- ent nuances, generates inconsistencies in data. Adoption of a single data model would facilitate data integration tasks regardless of the data sup- plier or context application. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. -
Classifying Wikipedia Article Quality with Revision History Networks
Classifying Wikipedia Article Quality With Revision History Networks Narun Raman∗ Nathaniel Sauerberg∗ Carleton College Carleton College [email protected] [email protected] Jonah Fisher Sneha Narayan Carleton College Carleton College [email protected] [email protected] ABSTRACT long been interested in maintaining and investigating the quality We present a novel model for classifying the quality of Wikipedia of its content [4][6][12]. articles based on structural properties of a network representation Editors and WikiProjects typically rely on assessments of article of the article’s revision history. We create revision history networks quality to focus volunteer attention on improving lower quality (an adaptation of Keegan et. al’s article trajectory networks [7]), articles. This has led to multiple efforts to create classifiers that can where nodes correspond to individual editors of an article, and edges predict the quality of a given article [3][4][18]. These classifiers can join the authors of consecutive revisions. Using descriptive statistics assist in providing assessments of article quality at scale, and help generated from these networks, along with general properties like further our understanding of the features that distinguish high and the number of edits and article size, we predict which of six quality low quality Wikipedia articles. classes (Start, Stub, C-Class, B-Class, Good, Featured) articles belong While many Wikipedia article quality classifiers have focused to, attaining a classification accuracy of 49.35% on a stratified sample on assessing quality based on the content of the latest version of of articles. These results suggest that structures of collaboration an article [1, 4, 18], prior work has suggested that high quality arti- underlying the creation of articles, and not just the content of the cles are associated with more intense collaboration among editors article, should be considered for accurate quality classification. -
An Introduction to RDF
An Introduction to RDF Knowledge Technologies 1 Manolis Koubarakis Acknowledgement • This presentation is based on the excellent RDF primer by the W3C available at http://www.w3.org/TR/rdf-primer/ and http://www.w3.org/2007/02/turtle/primer/ . • Much of the material in this presentation is verbatim from the above Web site. Knowledge Technologies 2 Manolis Koubarakis Presentation Outline • Basic concepts of RDF • Serialization of RDF graphs: XML/RDF and Turtle • Other Features of RDF (Containers, Collections and Reification). Knowledge Technologies 3 Manolis Koubarakis What is RDF? •TheResource Description Framework (RDF) is a data model for representing information (especially metadata) about resources in the Web. • RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web (e.g., a book or a person). • RDF is intended for situations in which information about Web resources needs to be processed by applications, rather than being only displayed to people. Knowledge Technologies 4 Manolis Koubarakis Some History • RDF draws upon ideas from knowledge representation, artificial intelligence, and data management, including: – Semantic networks –Frames – Conceptual graphs – Logic-based knowledge representation – Relational databases • Shameless self-promotion : The closest to RDF, pre-Web knowledge representation language is Telos: John Mylopoulos, Alexander Borgida, Matthias Jarke, Manolis Koubarakis: Telos: Representing Knowledge About Information Systems. ACM Trans. Inf. Syst. 8(4): 325-362 (1990). Knowledge Technologies 5 Manolis Koubarakis The Semantic Web “Layer Cake” Knowledge Technologies 6 Manolis Koubarakis RDF Basics • RDF is based on the idea of identifying resources using Web identifiers and describing resources in terms of simple properties and property values. -
Linked Data Vs Schema.Org: a Town Hall Debate About the Future of Information
Linked Data vs Schema.org: A Town Hall Debate about the Future of Information Schema.org Resources Schema.org Microdata Creation and Analysis Tools Schema.org Google Structured Data Testing Tool http://schema.org/docs/full.html http://www.google.com/webmasters/tools/richsnippets The entire hierarchy in one file. Microdata Parser Schema.org FAQ http://tools.seomoves.org/microdata http://schema.org/docs/faq.html This tool parses HTML5 microdata on a web page and displays the results in a graphical view. You can see the full details of each microdata type by clicking on it. Getting Started with Schema.org http://schema.org/docs/gs.html Drupal module Includes information about how to mark up your content http://drupal.org/project/schemaorg using microdata, using the schema.org vocabulary, and advanced topics. A drop-in solution to enable the collections of schemas available at schema.org on your Drupal 7 site. Micro Data & Schema.org: Guide To Generating Rich Snippets Wordpress plugins http://seogadget.com/micro-data-schema-org-guide-to- http://wordpress.org/extend/plugins/tags/schemaorg generating-rich-snippets A list of Wordpress plugins that allow you to easily insert A very comprehensive guide that includes an schema.org microdata on your site. introduction, a section on integrating microdata (use cases) schema.org, and a section on tools and useful Microdata generator resources. http://www.microdatagenerator.com A simple generator that allows you to input basic HTML5 Microdata and Schema.org information and have that info converted into the http://journal.code4lib.org/articles/6400 standard schema.org markup structure.